Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-06-06 Thread Neil Armstrong

On 11/05/2024 23:56, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
Cc: Steev Klimaszewski 
Cc: Alexey Minnekhanov 

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
   implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
   Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: 
https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
   builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
   silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
   (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
   them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
   soc: qcom: pdr: protect locator_addr with the main mutex
   soc: qcom: pdr: fix parsing of domains lists
   soc: qcom: pdr: extract PDR message marshalling data
   soc: qcom: add pd-mapper implementation
   remoteproc: qcom: enable in-kernel PD mapper

  drivers/remoteproc/qcom_common.c|  87 +
  drivers/remoteproc/qcom_common.h|  10 +
  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
  drivers/soc/qcom/Kconfig|  15 +
  drivers/soc/qcom/Makefile   |   2 +
  drivers/soc/qcom/pdr_interface.c|  17 +-
  drivers/soc/qcom/pdr_internal.h | 318 ++---
  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
  12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,


Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Neil Armstrong  # on SM8550-HDK
Tested-by: Neil Armstrong  # on SM8650-QRD

Thanks,
Neil
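
As a rough illustration of the "static per-platform data" idea in the cover letter quoted above, the userspace C sketch below models a table that maps one platform to its protection domains and the services each domain hosts. Every name in it (the platform string, the domain paths, the service names and the pd_find_domains() helper) is hypothetical and not taken from the patch series. The in-kernel implementation answers QMI requests rather than direct function calls, so this only shows the shape of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one protection domain and the services it hosts. */
struct pd_domain {
	const char *name;             /* illustrative domain path */
	const char *const *services;  /* NULL-terminated service list */
};

/* Hypothetical model: the static table for one platform. */
struct pd_platform {
	const char *compatible;       /* illustrative platform id */
	const struct pd_domain *domains;
	size_t num_domains;
};

static const char *const root_services[] = { "example/servreg", NULL };
static const char *const charger_services[] = {
	"example/servreg", "example/battery", NULL
};

static const struct pd_domain example_domains[] = {
	{ "example/adsp/root_pd", root_services },
	{ "example/adsp/charger_pd", charger_services },
};

static const struct pd_platform example_platform = {
	"vendor,example-soc",
	example_domains,
	sizeof(example_domains) / sizeof(example_domains[0]),
};

/*
 * Store up to max names of domains offering the service in out[] and
 * return how many domains matched in total.
 */
static size_t pd_find_domains(const struct pd_platform *p, const char *service,
			      const char **out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < p->num_domains; i++) {
		for (const char *const *s = p->domains[i].services; *s; s++) {
			if (strcmp(*s, service) == 0) {
				if (found < max)
					out[found] = p->domains[i].name;
				found++;
				break;
			}
		}
	}
	return found;
}
```

Because the table is constant data, nothing has to be re-read when firmware directories appear later, which is the weakness of the JSON-based userspace daemon that the cover letter points out.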



Re: [PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot

2024-06-05 Thread Google
On Wed, 05 Jun 2024 16:26:44 -0400
Steven Rostedt  wrote:

> 
> Received some minor bug reports from the kernel test robot. First I started
> cleaning up some of the sparse warnings. There's many more, but most changes
> are not really helping anything, but just quieting the warnings.
> 
> But the reports from kernel test robot need to be fixed.

All looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you!

> 
> Steven Rostedt (Google) (6):
>   ftrace: Declare function_trace_op in header to quiet sparse warning
>   ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
>   ftrace: Assign RCU list variable with rcu_assign_ptr()
>   ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
>   function_graph: Make fgraph_do_direct static key static
>   function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
> 
> 
>  include/linux/ftrace.h | 3 +++
>  kernel/trace/fgraph.c  | 4 +++-
>  kernel/trace/ftrace.c  | 4 ++--
>  kernel/trace/ftrace_internal.h | 9 +
>  kernel/trace/trace.h   | 1 -
>  5 files changed, 17 insertions(+), 4 deletions(-)


-- 
Masami Hiramatsu (Google) 



[PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot

2024-06-05 Thread Steven Rostedt


Received some minor bug reports from the kernel test robot. First I started
cleaning up some of the sparse warnings. There's many more, but most changes
are not really helping anything, but just quieting the warnings.

But the reports from kernel test robot need to be fixed.

Steven Rostedt (Google) (6):
  ftrace: Declare function_trace_op in header to quiet sparse warning
  ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
  ftrace: Assign RCU list variable with rcu_assign_ptr()
  ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
  function_graph: Make fgraph_do_direct static key static
  function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled


 include/linux/ftrace.h | 3 +++
 kernel/trace/fgraph.c  | 4 +++-
 kernel/trace/ftrace.c  | 4 ++--
 kernel/trace/ftrace_internal.h | 9 +
 kernel/trace/trace.h   | 1 -
 5 files changed, 17 insertions(+), 4 deletions(-)



[PATCH v4 07/11] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-06-05 Thread Björn Töpel
From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
 
return 0;
 }
-- 
2.43.0




Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-30 Thread classabbyamp
I've tested this applied on top of kernel 6.8.11 on an X13s over the 
past week and it's been working well.


--
classabbyamp



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-30 Thread Steven Rostedt
On Thu, 30 May 2024 16:02:37 +0300
Ilkka Naulapää  wrote:

> applied your patch and here's the output.
> 

Unfortunately, it doesn't give me any new information. I added one more
BUG_ON; want to try this? Otherwise, I'm pretty much at a loss. :-/

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
return NULL;
 
ti->flags = 0;
+   ti->magic = 20240823;
 
return &ti->vfs_inode;
 }
 
 static void tracefs_free_inode(struct inode *inode)
 {
-   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+   struct tracefs_inode *ti = get_tracefs(inode);
+
+   BUG_ON(ti->magic != 20240823);
+   kmem_cache_free(tracefs_inode_cachep, ti);
 }
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = {
.rmdir  = tracefs_syscall_rmdir,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
return;
 
ti = get_tracefs(inode);
+   BUG_ON(ti->magic != 20240823);
if (ti && ti->flags & TRACEFS_EVENT_INODE)
eventfs_set_ef_status_free(dentry);
iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
return dentry;
 }
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
+   }
+   return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9059b8b11bb6 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,12 +9,15 @@ enum {
 struct tracefs_inode {
unsigned long   flags;
void*private;
+   unsigned long   magic;
struct inodevfs_inode;
 };
 
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
 {
-   return container_of(inode, struct tracefs_inode, vfs_inode);
+   struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, vfs_inode);
+   BUG_ON(ti->magic != 20240823);
+   return ti;
 }
 
 struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
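
The debugging idea in the patch above (stamp a magic value at allocation time, then verify it whenever the wrapper is recovered from the embedded inode) can be tried out in userspace. The following is a minimal sketch under stated assumptions: struct example_inode, EXAMPLE_MAGIC and the example_*() helpers are made-up stand-ins for the tracefs structures, the container_of() macro is a simplified local version, and a mismatch returns an error instead of calling BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_MAGIC 20240823UL

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the generic VFS inode. */
struct inode {
	unsigned long i_ino;
};

/* Hypothetical wrapper that embeds the generic inode, like tracefs_inode. */
struct example_inode {
	unsigned long flags;
	unsigned long magic;      /* canary written at allocation time */
	struct inode vfs_inode;
};

/* Allocate a wrapper, stamp the canary, hand out the embedded inode. */
static struct inode *example_alloc_inode(void)
{
	struct example_inode *ei = calloc(1, sizeof(*ei));

	if (!ei)
		return NULL;
	ei->magic = EXAMPLE_MAGIC;
	return &ei->vfs_inode;
}

/* Return 1 if the wrapper around this inode still carries the canary. */
static int example_inode_is_valid(struct inode *inode)
{
	struct example_inode *ei =
		container_of(inode, struct example_inode, vfs_inode);

	return ei->magic == EXAMPLE_MAGIC;
}

/* Free the wrapper; refuse (return -1) if the canary does not match. */
static int example_free_inode(struct inode *inode)
{
	struct example_inode *ei =
		container_of(inode, struct example_inode, vfs_inode);

	if (ei->magic != EXAMPLE_MAGIC)
		return -1;        /* the kernel patch would BUG_ON() here */
	ei->magic = 0;            /* make a later double free detectable */
	free(ei);
	return 0;
}
```

In the kernel case, an inode that was never allocated as a tracefs_inode has unrelated memory in front of it, so the canary check is likely (not guaranteed) to fail at the first point where the wrong object is handled rather than later in an RCU callback.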



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-29 Thread Steven Rostedt
On Wed, 29 May 2024 14:47:57 -0400
Steven Rostedt  wrote:

> Let me make a debug patch (that crashes on this issue) for that kernel,
> and perhaps you could bisect it?

Can you try this on 6.6-rc1 and see if it gives you any other splats?

Hmm, you can switch it to WARN_ON and that way it may not crash the
machine, and you can use dmesg to get the output.

Thanks,

-- Steve


diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
return NULL;
 
ti->flags = 0;
+   ti->magic = 20240823;
 
return &ti->vfs_inode;
 }
 
 static void tracefs_free_inode(struct inode *inode)
 {
-   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+   struct tracefs_inode *ti = get_tracefs(inode);
+
+   BUG_ON(ti->magic != 20240823);
+   kmem_cache_free(tracefs_inode_cachep, ti);
 }
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = {
.rmdir  = tracefs_syscall_rmdir,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
return;
 
ti = get_tracefs(inode);
+   BUG_ON(ti->magic != 20240823);
if (ti && ti->flags & TRACEFS_EVENT_INODE)
eventfs_set_ef_status_free(dentry);
iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
return dentry;
 }
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
+   }
+   return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9f6f303a9e58 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,6 +9,7 @@ enum {
 struct tracefs_inode {
unsigned long   flags;
void*private;
+   unsigned long   magic;
struct inodevfs_inode;
 };
 



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-29 Thread Steven Rostedt
On Wed, 29 May 2024 21:36:08 +0300
Ilkka Naulapää  wrote:

> applied your patch without others, so trace and panic there.
> Screenshot attached. Also tested kernels backward and found out that

Bah, it's still in an RCU callback, which doesn't tell us why a
normal inode is being sent to the trace inode free list.

> this trace bug first triggered on 6.6-rc1.

Hmm, that's when eventfs was added.

> 
> Let me know if you need more assistance with this.

Let me make a debug patch (that crashes on this issue) for that kernel,
and perhaps you could bisect it?

Thanks!

-- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-28 Thread Steven Rostedt
On Tue, 28 May 2024 07:51:30 +0300
Ilkka Naulapää  wrote:

> yeah, the cache_from_obj tracing bug (without panic) has been
> displayed quite some time now - maybe even since 6.7.x or so. I could
> try checking a few versions back for this and try bisecting it if I
> can find when this started.
> 

OK, so I don't think the commit your last bisect hit is the cause of
the bug. It added a delay (via RCU) and is causing the real bug to blow
up more.

Can you add this patch to v6.9.2? Hopefully it crashes in a better
location, so that we can find where the mixup happened.

You may need to add the other commit too, if this doesn't trigger.

Thanks,

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 417c840e6403..7af3f696696d 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -50,6 +50,7 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
list_add_rcu(&ti->list, &tracefs_inodes);
spin_unlock_irqrestore(&tracefs_inode_lock, flags);
 
+   ti->magic = 20240823;
return &ti->vfs_inode;
 }
 
@@ -66,6 +67,7 @@ static void tracefs_free_inode(struct inode *inode)
struct tracefs_inode *ti = get_tracefs(inode);
unsigned long flags;
 
+   BUG_ON(ti->magic != 20240823);
spin_lock_irqsave(&tracefs_inode_lock, flags);
list_del_rcu(&ti->list);
spin_unlock_irqrestore(&tracefs_inode_lock, flags);
@@ -271,16 +273,6 @@ static const struct inode_operations tracefs_file_inode_operations = {
.setattr= tracefs_setattr,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   simple_inode_init_ts(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -448,6 +440,17 @@ static const struct super_operations tracefs_super_operations = {
.show_options   = tracefs_show_options,
 };
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   simple_inode_init_ts(inode);
+   }
+   return inode;
+}
+
 /*
  * It would be cleaner if eventfs had its own dentry ops.
  *
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index f704d8348357..dda7d2708e30 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -16,6 +16,7 @@ struct tracefs_inode {
};
/* The below gets initialized with memset_after(ti, 0, vfs_inode) */
struct list_headlist;
+   unsigned long   magic;
unsigned long   flags;
void*private;
 };



Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-28 Thread Daniel Bristot de Oliveira
On 5/20/24 07:42, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 

Acked-by: Daniel Bristot de Oliveira 

Thanks
-- Daniel



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää
yeah, the cache_from_obj tracing bug (without panic) has been
displayed quite some time now - maybe even since 6.7.x or so. I could
try checking a few versions back for this and try bisecting it if I
can find when this started.

--Ilkka

On Tue, May 28, 2024 at 1:31 AM Steven Rostedt  wrote:
>
> On Fri, 24 May 2024 12:50:08 +0200
> "Linux regression tracking (Thorsten Leemhuis)"  
> wrote:
>
> > > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > > quick display of a kernel trace dump before the shutdown/reboot
> > > completed. Starting from version 6.8.10 and continuing into version
> > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > > preventing the shutdown or reboot from completing and leaving the
> > > machine stuck.
>
> You state "Before kernel version 6.8.10, the bug caused ...". Does that
> mean that a bug was happening before v6.8.10? But did not cause a panic?
>
> I just noticed your second screen shot from your report, and it has:
>
>  "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from inode_cache"
>
> So somehow a tracefs_inode was allocated from the inode_cache and is
> now being freed by the tracefs_inode logic? Did this happen before
> 6.8.10? If so, this code could just be triggering an issue from an
> unrelated bug.
>
> Thanks,
>
> -- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää
I tried 6.10-rc1 and it still ends up in a panic

--Ilkka


On Tue, May 28, 2024 at 12:44 AM Steven Rostedt  wrote:
>
> On Mon, 27 May 2024 20:14:42 +0200
> Greg KH  wrote:
>
> > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> > > Hi Steven,
> > >
> > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> > > panic inducing commit:
> > >
> > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are options)
> > >
> > > I reverted that commit to 6.9.2 and now it only serves the trace but
> > > the panic is gone. But I can live with it.
> >
> > Steven, should we revert that?
> >
> > Or is there some other change that we should take to resolve this?
> >
>
> Before we revert it (as it may be a bug in mainline), Ilkka, can you
> test v6.10-rc1?  If it exists there, it will let me know whether or not
> I missed something.
>
> Thanks,
>
> -- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Steven Rostedt
On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > quick display of a kernel trace dump before the shutdown/reboot
> > completed. Starting from version 6.8.10 and continuing into version
> > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > preventing the shutdown or reboot from completing and leaving the
> > machine stuck.

You state "Before kernel version 6.8.10, the bug caused ...". Does that
mean that a bug was happening before v6.8.10? But did not cause a panic?

I just noticed your second screen shot from your report, and it has:

 "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from inode_cache"

So somehow a tracefs_inode was allocated from the inode_cache and is
now being freed by the tracefs_inode logic? Did this happen before
6.8.10? If so, this code could just be triggering an issue from an
unrelated bug.

Thanks,

-- Steve
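
To make the "Wrong slab cache" message quoted above more concrete, here is a small userspace sketch of the check it implies: remember which cache an object came from and compare that with the cache it is being freed to. This is an assumption-laden model and not how the kernel allocator is implemented. The kernel derives the owning cache from slab metadata, whereas this sketch stores a per-object header, and struct example_cache and the example_cache_*() helpers are invented for illustration. The object sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cache descriptor: a name and a fixed object size. */
struct example_cache {
	const char *name;
	size_t obj_size;
};

/* Header placed before each object; the union keeps the payload aligned. */
union example_hdr {
	const struct example_cache *owner;
	max_align_t align;
};

/* Cache names borrowed from the quoted warning; sizes are made up. */
static const struct example_cache inode_cache = { "inode_cache", 64 };
static const struct example_cache tracefs_inode_cache = {
	"tracefs_inode_cache", 96
};

/* Allocate one zeroed object and record which cache it belongs to. */
static void *example_cache_alloc(const struct example_cache *c)
{
	union example_hdr *h = calloc(1, sizeof(*h) + c->obj_size);

	if (!h)
		return NULL;
	h->owner = c;
	return h + 1;
}

/*
 * Free obj, which the caller claims belongs to cache c. Returns 0 when the
 * claim is right and -1 (after printing a warning) when the object really
 * came from a different cache. The memory is released in both cases.
 */
static int example_cache_free(const struct example_cache *c, void *obj)
{
	union example_hdr *h = (union example_hdr *)obj - 1;
	int ret = 0;

	if (h->owner != c) {
		fprintf(stderr, "Wrong cache: %s but object is from %s\n",
			c->name, h->owner->name);
		ret = -1;
	}
	free(h);
	return ret;
}
```

Freeing an object allocated from inode_cache through tracefs_inode_cache trips the check, which mirrors the situation described in the report: a plain inode ending up on the tracefs free path.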



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Steven Rostedt
On Mon, 27 May 2024 20:14:42 +0200
Greg KH  wrote:

> On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> > Hi Steven,
> > 
> > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> > panic inducing commit:
> > 
> > 414fb08628143 (tracefs: Reset permissions on remount if permissions are options)
> > 
> > I reverted that commit to 6.9.2 and now it only serves the trace but
> > the panic is gone. But I can live with it.  
> 
> Steven, should we revert that?
> 
> Or is there some other change that we should take to resolve this?
> 

Before we revert it (as it may be a bug in mainline), Ilkka, can you
test v6.10-rc1?  If it exists there, it will let me know whether or not
I missed something.

Thanks,

-- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Greg KH
On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> Hi Steven,
> 
> I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> panic inducing commit:
> 
> 414fb08628143 (tracefs: Reset permissions on remount if permissions are options)
> 
> I reverted that commit to 6.9.2 and now it only serves the trace but
> the panic is gone. But I can live with it.

Steven, should we revert that?

Or is there some other change that we should take to resolve this?

thanks,

greg k-h



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää
Hi Steven,

I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
panic inducing commit:

414fb08628143 (tracefs: Reset permissions on remount if permissions are options)

I reverted that commit to 6.9.2 and now it only serves the trace but
the panic is gone. But I can live with it.

--Ilkka

On Sun, May 26, 2024 at 8:42 PM Ilkka Naulapää  wrote:
>
> hi,
>
> I took 6.9.2 and applied that 0bcfd9aa4dafa to it. Now the kernel is
> serving me both problems; the trace and the panic as the pic shows.
>
> > To understand this, did you do anything with tracing? Before shutting down,
> > is there anything in /sys/kernel/tracing/instances directory?
> > Were any of the files/directories permissions in /sys/kernel/tracing 
> > changed?
>
> And to answer your question, I did not do any tracing or so and the
> /sys/kernel/tracing is empty.
> Just plain boot-up, no matter if in full desktop or in bare rescue
> mode, ends up the same way.
>
> --Ilkka
>
> On Fri, May 24, 2024 at 8:19 PM Steven Rostedt  wrote:
> >
> > On Fri, 24 May 2024 12:50:08 +0200
> > "Linux regression tracking (Thorsten Leemhuis)"  
> > wrote:
> >
> > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > > > quick display of a kernel trace dump before the shutdown/reboot
> > > > completed. Starting from version 6.8.10 and continuing into version
> > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > > > preventing the shutdown or reboot from completing and leaving the
> > > > machine stuck.
> >
> > Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on
> > remount if permissions are options"), which added a "iput" callback to the
> > dentry without calling iput, leaving stale inodes around.
> >
> > This is fixed with:
> >
> >   0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()")
> >
> > Try adding just that patch. It will at least make it go back to what was
> > happening before 6.8.10 (I hope!).
> >
> > -- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Steven Rostedt
On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > quick display of a kernel trace dump before the shutdown/reboot
> > completed. Starting from version 6.8.10 and continuing into version
> > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > preventing the shutdown or reboot from completing and leaving the
> > machine stuck.

Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on
remount if permissions are options"), which added a "iput" callback to the
dentry without calling iput, leaving stale inodes around.

This is fixed with:

  0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()")

Try adding just that patch. It will at least make it go back to what was
happening before 6.8.10 (I hope!).

-- Steve



Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Steven Rostedt
On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> [CCing a few people]
> 

Thanks for the Cc.

> On 24.05.24 12:31, Ilkka Naulapää wrote:
> > 
> > I have encountered a critical bug in the Linux vanilla kernel that
> > leads to a kernel panic during the shutdown or reboot process. The
> > issue arises after all services, including `journald`, have been
> > stopped. As a result, the machine fails to complete the shutdown or
> > reboot procedure, effectively causing the system to hang and not shut
> > down or reboot.  

To understand this, did you do anything with tracing? Before shutting down,
is there anything in /sys/kernel/tracing/instances directory?
Were any of the files/directories permissions in /sys/kernel/tracing changed?

> 
> Thx for the report. Not my area of expertise, so take this with a grain
> of salt. But given the versions you mention in your report and the
> screenshot that mentioned tracefs_free_inode I suspect this is caused by
> baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
> options"). A few fixes for it will soon hit mainline and are meant to be
> backported to affected stable trees:
> 
> https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
> https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/
> 
> You might want to try them – or recheck once they hit the stable trees
> you care about. If they don't work, please report back.

There's been quite a bit of updates in this code, but this looks new to me.
I have more fixes that were just pulled by Linus today.

  https://git.kernel.org/torvalds/c/0eb03c7e8e2a4cc3653eb5eeb2d2001182071215

I'm not sure how relevant that is for this. But if you can reproduce it
with that commit, then this is a new bug.

-- Steve



Re: How to properly fix reading user pointers in bpf in android kernel 4.9?

2024-05-24 Thread Bagas Sanjaya
[also Cc: bpf maintainers and get_maintainer output]

On Thu, May 23, 2024 at 07:52:22PM +0300, Marcel wrote:
> This seems to have been a long-standing problem with the Linux kernel in
> general. bpf_probe_read should have worked for both kernel and user pointers,
> but it fails with an access error when reading a user pointer.
> 
> I know there's a patch upstream that fixes this by introducing new helpers
> for reading kernel and userspace pointers. I tried to backport them to my
> kernel, but with no success. Tools like bcc fail to use them and instead
> report that the arguments sent to the helpers are invalid. I assume this is
> because ARG_CONST_STACK_SIZE and ARG_PTR_TO_RAW_STACK are handled differently
> in the 4.9 Android version than in the upstream version, but I'm not sure
> that this is the cause. I've left my patch below, along with a link to the
> kernel I'm working on; maybe someone can take a look and give me a hand
> (the patch isn't applied yet).

What upstream patch? Has it already been in mainline?

> 
> <https://github.com/nitanmarcel/android_kernel_oneplus_sdm845-bpf>
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 744b4763b80e..de94c13b7193 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -559,6 +559,43 @@ enum bpf_func_id {
> */
> BPF_FUNC_probe_read_user,
>  
> +   /**
> +   * int bpf_probe_read_kernel(void *dst, int size, void *src)
> +   * Read a kernel pointer safely.
> +   * Return: 0 on success or negative error
> +   */
> +   BPF_FUNC_probe_read_kernel,
> +
> + /**
> +  * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +  * Copy a NUL terminated string from user unsafe address. In case the string
> +  * length is smaller than size, the target is not padded with further NUL
> +  * bytes. In case the string length is larger than size, just count-1
> +  * bytes are copied and the last byte is set to NUL.
> +  * @dst: destination address
> +  * @size: maximum number of bytes to copy, including the trailing NUL
> +  * @unsafe_ptr: unsafe address
> +  * Return:
> +  *   > 0 length of the string including the trailing NUL on success
> +  *   < 0 error
> +  */
> + BPF_FUNC_probe_read_user_str,
> +
> + /**
> +  * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +  * Copy a NUL terminated string from unsafe address. In case the string
> +  * length is smaller than size, the target is not padded with further NUL
> +  * bytes. In case the string length is larger than size, just count-1
> +  * bytes are copied and the last byte is set to NUL.
> +  * @dst: destination address
> +  * @size: maximum number of bytes to copy, including the trailing NUL
> +  * @unsafe_ptr: unsafe address
> +  * Return:
> +  *   > 0 length of the string including the trailing NUL on success
> +  *   < 0 error
> +  */
> + BPF_FUNC_probe_read_kernel_str,
> +
>   __BPF_FUNC_MAX_ID,
>  };
>  
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index a1e37a5d8c88..3478ca744a45 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
>   .arg3_type  = ARG_ANYTHING,
>  };
>  
> -BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
> +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void  __user *, unsafe_ptr)
>  {
>   int ret;
>  
> @@ -115,6 +115,27 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
>  };
>  
>  
> +BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr)
> +{
> + int ret;
> +
> + ret = probe_kernel_read(dst, unsafe_ptr, size);
> + if (unlikely(ret < 0))
> + memset(dst, 0, size);
> +
> + return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
> + .func   = bpf_probe_read_kernel,
> + .gpl_only   = true,
> + .ret_type   = RET_INTEGER,
> + .arg1_type  = ARG_PTR_TO_RAW_STACK,
> + .arg2_type  = ARG_CONST_STACK_SIZE,
> + .arg3_type  = ARG_ANYTHING,
> +};
> +
> +
>  BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
>  u32, size)
>  {
> @@ -487,6 +508,69 @@ static const struct bpf_func_proto 
>
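
The string-copy semantics spelled out in the helper comments quoted above (no padding when the string is shorter than size, truncation to size - 1 bytes plus a NUL when it is longer, and a return value that counts the trailing NUL) can be sketched in plain C. This is only an illustration: example_copy_str() is a hypothetical name, it works on ordinary pointers and has none of the fault handling that the real BPF helpers need for unsafe addresses, and treating a size of 0 as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a NUL-terminated string into dst, writing at most size bytes.
 * Returns the number of bytes written including the trailing NUL,
 * or -1 if size is 0 (an assumption of this sketch).
 */
static long example_copy_str(char *dst, size_t size, const char *src)
{
	size_t i;

	if (size == 0)
		return -1;

	/* Copy until the source ends or only room for the NUL is left. */
	for (i = 0; i + 1 < size && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';      /* always terminate; the rest of dst is untouched */

	return (long)(i + 1);
}
```

A caller can tell truncation may have happened when the return value equals size, since that is also what an exact fit returns.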

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing a few people]

On 24.05.24 12:31, Ilkka Naulapää wrote:
> 
> I have encountered a critical bug in the Linux vanilla kernel that
> leads to a kernel panic during the shutdown or reboot process. The
> issue arises after all services, including `journald`, have been
> stopped. As a result, the machine fails to complete the shutdown or
> reboot procedure, effectively causing the system to hang and not shut
> down or reboot.

Thx for the report. Not my area of expertise, so take this with a grain
of salt. But given the versions you mention in your report and the
screenshot that mentioned tracefs_free_inode I suspect this is caused by
baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
options"). A few fixes for it will soon hit mainline and are meant to be
backported to affected stable trees:

https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/

You might want to try them – or recheck once they hit the stable trees
you care about. If they don't work, please report back.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Here are the details of the issue:
> 
> - Affected Versions: Before kernel version 6.8.10, the bug caused a
> quick display of a kernel trace dump before the shutdown/reboot
> completed. Starting from version 6.8.10 and continuing into version
> 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> preventing the shutdown or reboot from completing and leaving the
> machine stuck.
> 
> - Symptoms:
>   - In normal shutdown/reboot scenarios, the kernel trace dump briefly
> appears as the last message on the screen.
>   - In rescue mode, the kernel panic message is displayed. Normally it
> is not shown.
> 
> Since `journald` is stopped before this issue occurs, no textual logs
> are available. However, I have captured two pictures illustrating
> these related issues, which I am attaching to this email for your
> reference. Also added my custom kernel config.
> 
> Thank you for your attention to this matter. Please let me know if any
> additional information is required to assist in diagnosing and
> resolving this bug.
> 
> Best regards,
> 
> Ilkka Naulapää



[PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-21 Thread Björn Töpel
From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
 
return 0;
 }
-- 
2.40.1
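
The commit message above argues that the page table walk must be bracketed by the hotplug read-lock so that memory cannot be removed mid-walk. The following is a single-threaded toy model of that bracket, not kernel code: every model_* name is hypothetical, and the "lock" is only a reader counter that a hot-remove path would have to see at zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the memory hotplug lock: counts active readers. */
struct model_hotplug_lock {
	int readers;
};

void model_get_online_mems(struct model_hotplug_lock *l)
{
	l->readers++;
}

void model_put_online_mems(struct model_hotplug_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* In this model, hot-remove may only proceed when no reader is active. */
bool model_can_hot_remove(const struct model_hotplug_lock *l)
{
	return l->readers == 0;
}

/* Walker callback; reports whether hot-remove was possible mid-walk. */
typedef bool (*model_walk_fn)(const struct model_hotplug_lock *l);

/* Mirrors the shape of the patched function: lock, walk, unlock. */
bool model_ptdump_show(struct model_hotplug_lock *l, model_walk_fn walk)
{
	bool removable_during_walk;

	model_get_online_mems(l);
	removable_during_walk = walk(l);
	model_put_online_mems(l);

	return removable_during_walk;
}
```

Passing model_can_hot_remove as the walker shows that removal is blocked for the duration of the walk and allowed again afterwards.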




[PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-19 Thread Yang Li
The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.

Signed-off-by: Yang Li 
---
 kernel/trace/rv/rv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 2f68e93fff0b..df0745a42a3f 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
*mdef, bool sync)
 
 /**
  * rv_disable_monitor - disable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success.
  */
@@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef)
 
 /**
  * rv_enable_monitor - enable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success, error otherwise.
  */
-- 
2.20.1.7.g153144c
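
For readers unfamiliar with the convention the patch above follows, here is a small illustration of a kernel-doc comment with one "@name:" line per parameter. The struct and function are hypothetical stand-ins, not the rv.c code, and the -EINVAL path is an assumption added only to make the example self-contained.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitor definition; one field only. */
struct demo_monitor_def {
	int enabled;
};

/**
 * demo_enable_monitor - enable a given demo monitor
 * @mdef: Pointer to the monitor definition structure.
 *
 * Return: 0 on success, -EINVAL if @mdef is NULL.
 */
int demo_enable_monitor(struct demo_monitor_def *mdef)
{
	if (!mdef)
		return -EINVAL;

	mdef->enabled = 1;
	return 0;
}
```

The "@mdef:" line is what the documentation tooling matches against the function signature; leaving it out is the kind of mismatch the patch addresses.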




Re: [PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()

2024-05-18 Thread Google
On Sat, 18 May 2024 15:54:49 -0700
Jeff Johnson  wrote:

> Fix the 'make W=1' warning:
> 
> WARNING: modpost: missing MODULE_DESCRIPTION() in 
> kernel/trace/preemptirq_delay_test.o
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers")

Thanks,

> Signed-off-by: Jeff Johnson 
> ---
>  kernel/trace/preemptirq_delay_test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/preemptirq_delay_test.c 
> b/kernel/trace/preemptirq_delay_test.c
> index 8c4ffd076162..cb0871fbdb07 100644
> --- a/kernel/trace/preemptirq_delay_test.c
> +++ b/kernel/trace/preemptirq_delay_test.c
> @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void)
>  
>  module_init(preemptirq_delay_init)
>  module_exit(preemptirq_delay_exit)
> +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency 
> tracers");
>  MODULE_LICENSE("GPL v2");
> 
> ---
> base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532
> change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b
> 


-- 
Masami Hiramatsu (Google) 



[PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()

2024-05-18 Thread Jeff Johnson
Fix the 'make W=1' warning:

WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o

Signed-off-by: Jeff Johnson 
---
 kernel/trace/preemptirq_delay_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c
index 8c4ffd076162..cb0871fbdb07 100644
--- a/kernel/trace/preemptirq_delay_test.c
+++ b/kernel/trace/preemptirq_delay_test.c
@@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void)
 
 module_init(preemptirq_delay_init)
 module_exit(preemptirq_delay_exit)
+MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency tracers");
 MODULE_LICENSE("GPL v2");

---
base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532
change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b




Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-17 Thread Daniel Bristot de Oliveira
Hi Yang

On 5/17/24 11:14, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 
> ---
>  kernel/trace/rv/rv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 2f68e93fff0b..df0745a42a3f 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
> *mdef, bool sync)
>  
>  /**
>   * rv_disable_monitor - disable a given runtime monitor
> + * @mdef: Pointer to the monitor definition structure.

This change is for the mainline kernel; why are you using -next in the
Subject?

-- Daniel



[PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-17 Thread Yang Li
The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.

Signed-off-by: Yang Li 
---
 kernel/trace/rv/rv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 2f68e93fff0b..df0745a42a3f 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
*mdef, bool sync)
 
 /**
  * rv_disable_monitor - disable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success.
  */
@@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef)
 
 /**
  * rv_enable_monitor - enable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success, error otherwise.
  */
-- 
2.20.1.7.g153144c




[PATCH v3 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-16 Thread Kris Van Hees
Signed-off-by: Kris Van Hees 
Reviewed-by: Nick Alcock 
---
Changes since v2:
 - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
 - Use $(real-prereqs) rather than $(filter-out ...)
---
 scripts/Makefile.vmlinux | 16 
 1 file changed, 16 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..afe8287e8dda0 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,22 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+  cmd_modules_builtin_ranges = \
+   $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.modinfo vmlinux.map vmlinux.o.map FORCE
+   $(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0




Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Oscar Salvador
On Tue, May 14, 2024 at 04:04:43PM +0200, Björn Töpel wrote:
> From: Björn Töpel 
> 
> During memory hot remove, the ptdump functionality can end up touching
> stale data. Avoid any potential crashes (or worse), by holding the
> memory hotplug read-lock while traversing the page table.
> 
> This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
> Hold memory hotplug lock while walking for kernel page table dump").
> 
> Signed-off-by: Björn Töpel 

Reviewed-by: Oscar Salvador 

Funny enough, it seems arm64 and riscv are the only ones holding the
hotplug lock here.
I think we have the same problem on the other arches as well (at least
on x86_64 that I can see).

If we end up needing the lock in those as well, I would rather have a
central function in the generic mm code that takes the lock and then
calls an arch-specific ptdump_show function, so the lock is not
scattered. But that is another story.
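
The centralised design suggested here could look roughly like the sketch below: one generic entry point owns the locking and calls an arch hook that assumes the lock is held. This is only an illustration of the idea, not an existing kernel interface. All names are hypothetical, and the lock is modelled as a plain reader counter so the example runs in userspace.

```c
#include <assert.h>

/* Toy reader counter standing in for the memory hotplug lock. */
static int model_mem_hotplug_readers;

static void model_get_online_mems(void)
{
	model_mem_hotplug_readers++;
}

static void model_put_online_mems(void)
{
	assert(model_mem_hotplug_readers > 0);
	model_mem_hotplug_readers--;
}

/* Arch hook: dumps page tables and assumes the caller holds the lock. */
typedef int (*arch_ptdump_show_fn)(void *ctx);

/* Generic entry point: the only place that takes and drops the lock. */
int generic_ptdump_show(arch_ptdump_show_fn arch_show, void *ctx)
{
	int ret;

	model_get_online_mems();
	ret = arch_show(ctx);
	model_put_online_mems();

	return ret;
}

/* Example arch hook: records whether it ran with the lock held. */
int demo_arch_ptdump_show(void *ctx)
{
	int *saw_lock_held = ctx;

	*saw_lock_held = model_mem_hotplug_readers > 0;
	return 0;
}
```

With this shape an architecture only supplies the hook, so a new port cannot forget the locking.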

 

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread David Hildenbrand

On 14.05.24 16:04, Björn Töpel wrote:

From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Signed-off-by: Björn Töpel 
---
  arch/riscv/mm/ptdump.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
  
  static int ptdump_show(struct seq_file *m, void *v)

  {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
  
  	return 0;

  }


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb




[PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Björn Töpel
From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
 
return 0;
 }
-- 
2.40.1




Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-05-13 Thread Michal Koutný
On Mon, Apr 08, 2024 at 04:58:18PM GMT, Michal Koutný  wrote:
> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).
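
To make the quoted "32k to 4M" figure concrete, here is a small arithmetic sketch. It assumes a configuration without CONFIG_BASE_SMALL and mirrors the constants as I understand them from include/linux/threads.h (0x8000 as the default, 4 * 1024 * 1024 as the limit when long is wider than 4 bytes). The model_* names are hypothetical, and the sketch ignores any boot-time adjustment of pid_max based on CPU count.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PID_MAX_DEFAULT	0x8000L			/* 32768, "32k" */
#define MODEL_PID_MAX_LIMIT_64	(4L * 1024 * 1024)	/* 4194304, "4M" */

/* Upper end of the allowed range (pid_max_max in the quoted text). */
long model_pid_max_limit(size_t sizeof_long)
{
	assert(sizeof_long == 4 || sizeof_long == 8);

	return sizeof_long > 4 ? MODEL_PID_MAX_LIMIT_64 : MODEL_PID_MAX_DEFAULT;
}

/* Initial pid_max before (patched == 0) and after (patched != 0) the change. */
long model_initial_pid_max(size_t sizeof_long, int patched)
{
	if (patched)
		return model_pid_max_limit(sizeof_long);

	return MODEL_PID_MAX_DEFAULT;
}
```

Under these assumptions a 64-bit build moves from 32768 to 4194304, while a 32-bit build is unchanged because its limit already equals the old default.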

Out of curiosity I dug out the commit
acdc721fe26d ("[PATCH] pid-max-2.5.33-A0") v2.5.34~5
that introduced the 32k default. The commit message doesn't explain the
sudden change, though.
Previously, the limit was 1G of pids (i.e. effectively no default limit
like the intention of this series).

Honestly, I expected more enthusiasm or reasons against removing the
default value of pid_max. Is this really not of interest to anyone?

(Thanks, Andrew, for your responses. I don't plan to pursue this further
should there be no more interest in having fewer default limit values in
the kernel.)

Regards,
Michal




Re: [PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-12 Thread Masahiro Yamada
On Sun, May 12, 2024 at 7:44 AM Kris Van Hees  wrote:
>
> Signed-off-by: Kris Van Hees 
> Reviewed-by: Nick Alcock 
> ---
> Changes since v1:
>  - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
> ---
>  scripts/Makefile.vmlinux | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index c9f3e03124d7f..54095d72f7fd7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -36,6 +36,23 @@ targets += vmlinux
>  vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
> +$(call if_changed_dep,link_vmlinux)
>
> +# module.builtin.ranges
> +# ---
> +ifdef CONFIG_BUILTIN_MODULE_RANGES
> +__default: modules.builtin.ranges
> +
> +quiet_cmd_modules_builtin_ranges = GEN $@
> +  cmd_modules_builtin_ranges = \
> +   $(srctree)/scripts/generate_builtin_ranges.awk \
> + $(filter-out FORCE,$+) > $@


$(filter-out FORCE,$+)

  ->

$(real-prereqs)



> +
> +vmlinux.map: vmlinux
> +
> +targets += modules.builtin.ranges
> +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
> +   $(call if_changed,modules_builtin_ranges)
> +endif
> +
>  # Add FORCE to the prequisites of a target to force it to be always rebuilt.
>  # ---
>
> --
> 2.43.0
>
>


-- 
Best Regards
Masahiro Yamada



Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-12 Thread Steev Klimaszewski
On Sat, May 11, 2024 at 4:56 PM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> Cc: Steev Klimaszewski 
> Cc: Alexey Minnekhanov 
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7: 
> https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c|  87 +
>  drivers/remoteproc/qcom_common.h|  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig|  15 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|  17 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
> 
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
> ---
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>

I've tested this over the weekend on my Thinkpad X13s with a number of
reboots and seems to do the correct thing in v8 as well.
Tested-by: Steev Klimaszewski 



[PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-11 Thread Kris Van Hees
Signed-off-by: Kris Van Hees 
Reviewed-by: Nick Alcock 
---
Changes since v1:
 - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
---
 scripts/Makefile.vmlinux | 17 +
 1 file changed, 17 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..54095d72f7fd7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,23 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+  cmd_modules_builtin_ranges = \
+   $(srctree)/scripts/generate_builtin_ranges.awk \
+ $(filter-out FORCE,$+) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
+   $(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0




[PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper

2024-05-11 Thread Dmitry Baryshkov
Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/qcom_common.c| 87 +
 drivers/remoteproc/qcom_common.h| 10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  3 ++
 drivers/remoteproc/qcom_q6v5_mss.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_pas.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_wcss.c |  3 ++
 6 files changed, 109 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..8c8688f99f0a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,6 +26,7 @@
 #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev)
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
+#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev)
 
 #define MAX_NUM_OF_SS   10
 #define MAX_REGION_NAME_LENGTH  16
@@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev);
 
+static void pdm_dev_release(struct device *dev)
+{
+   struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+   kfree(adev);
+}
+
+static int pdm_notify_prepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+   struct auxiliary_device *adev;
+   int ret;
+
+   adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+   if (!adev)
+   return -ENOMEM;
+
+   adev->dev.parent = pdm->dev;
+   adev->dev.release = pdm_dev_release;
+   adev->name = "pd-mapper";
+   adev->id = pdm->index;
+
+   ret = auxiliary_device_init(adev);
+   if (ret) {
+   kfree(adev);
+   return ret;
+   }
+
+   ret = auxiliary_device_add(adev);
+   if (ret) {
+   auxiliary_device_uninit(adev);
+   return ret;
+   }
+
+   pdm->adev = adev;
+
+   return 0;
+}
+
+
+static void pdm_notify_unprepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+
+   if (!pdm->adev)
+   return;
+
+   auxiliary_device_delete(pdm->adev);
+   auxiliary_device_uninit(pdm->adev);
+   pdm->adev = NULL;
+}
+
+/**
+ * qcom_add_pdm_subdev() - register PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Register @pdm so that Protection Domain mapper service is started when the
+ * DSP is started too.
+ */
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   pdm->dev = &rproc->dev;
+   pdm->index = rproc->index;
+
+   pdm->subdev.prepare = pdm_notify_prepare;
+   pdm->subdev.unprepare = pdm_notify_unprepare;
+
+   rproc_add_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev);
+
+/**
+ * qcom_remove_pdm_subdev() - remove PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Remove the PD Mapper subdevice.
+ */
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   rproc_remove_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev);
+
 MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h
index 9ef4449052a9..b07fbaa091a0 100644
--- a/drivers/remoteproc/qcom_common.h
+++ b/drivers/remoteproc/qcom_common.h
@@ -34,6 +34,13 @@ struct qcom_rproc_ssr {
struct qcom_ssr_subsystem *info;
 };
 
+struct qcom_rproc_pdm {
+   struct rproc_subdev subdev;
+   struct device *dev;
+   int index;
+   struct auxiliary_device *adev;
+};
+
 void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
void (*rproc_dumpfn_t)(struct rproc *rproc,
struct rproc_dump_segment *segment, void *dest, 
size_t offset,
@@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr,
 const char *ssr_name);
 void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr);
 
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+
 #if IS_ENABLED(CONFIG_QCOM_SYSMON)
 struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
   const char *name,
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_a
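
The error handling in pdm_notify_prepare() above follows an ownership rule that is easy to miss: before the init step succeeds the memory is freed directly, and after it the object is reference counted, so it is released by dropping the reference. The userspace sketch below models only that rule. All model_* names are hypothetical and nothing here calls the real auxiliary bus API.

```c
#include <assert.h>
#include <stdlib.h>

struct model_adev {
	int refcount;
	int *freed;	/* set to 1 by the release hook, for observation only */
};

/* Release hook: runs when the last reference is dropped. */
static void model_release(struct model_adev *adev)
{
	*adev->freed = 1;
	free(adev);
}

/* After init the object is owned by its reference count. */
static void model_init(struct model_adev *adev)
{
	adev->refcount = 1;
}

static void model_uninit(struct model_adev *adev)
{
	assert(adev->refcount > 0);
	if (--adev->refcount == 0)
		model_release(adev);
}

/* Returns 0 on success; fail_add simulates a failing add step. */
int model_prepare(int fail_add, int *freed, struct model_adev **out)
{
	struct model_adev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return -1;

	adev->freed = freed;
	model_init(adev);

	if (fail_add) {
		/* Past init: drop the reference instead of calling free(). */
		model_uninit(adev);
		return -2;
	}

	*out = adev;
	return 0;
}

/* Safe to call twice: the slot is cleared after the first teardown. */
void model_unprepare(struct model_adev **slot)
{
	if (!*slot)
		return;

	model_uninit(*slot);
	*slot = NULL;
}
```

Clearing the slot in model_unprepare() corresponds to the pdm->adev = NULL assignment in the patch, which makes a repeated unprepare a no-op.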

[PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-11 Thread Dmitry Baryshkov
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.
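
As a rough illustration of what "static per-platform data" can mean, the sketch below keeps a constant table per platform and answers service-to-domain lookups from it, with no files read at run time. It is not taken from the series: the structure layout, the demo_* names and the compatible, service and domain strings are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One service-to-domain mapping (hypothetical layout). */
struct demo_pd_entry {
	const char *service;
	const char *domain;
};

/* All mappings for one platform, keyed by a compatible string. */
struct demo_platform {
	const char *compatible;
	const struct demo_pd_entry *entries;
	size_t num_entries;
};

static const struct demo_pd_entry demo_soc_a_entries[] = {
	{ "demo/service_x", "demo/domain_1" },
	{ "demo/service_y", "demo/domain_2" },
};

static const struct demo_platform demo_platforms[] = {
	{ "vendor,demo-soc-a", demo_soc_a_entries, 2 },
	{ "vendor,demo-soc-b", NULL, 0 },	/* platform that needs no mapping */
};

/* Returns the domain for a service on a platform, or NULL if none is known. */
const char *demo_lookup_domain(const char *compatible, const char *service)
{
	size_t i, j;

	assert(compatible != NULL && service != NULL);

	for (i = 0; i < sizeof(demo_platforms) / sizeof(demo_platforms[0]); i++) {
		const struct demo_platform *p = &demo_platforms[i];

		if (strcmp(p->compatible, compatible) != 0)
			continue;

		for (j = 0; j < p->num_entries; j++) {
			if (strcmp(p->entries[j].service, service) == 0)
				return p->entries[j].domain;
		}
		return NULL;
	}

	return NULL;
}
```

Because the table is compiled in, the lookup works regardless of when or where firmware files become available, which is the property the cover letter is after.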

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
Cc: Steev Klimaszewski 
Cc: Alexey Minnekhanov 

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
  Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: 
https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c|  87 +
 drivers/remoteproc/qcom_common.h|  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig|  15 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|  17 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov 




Re: kernel BUG in ptr_stale

2024-05-09 Thread Kent Overstreet
On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered an issue in Linux kernel 6.7. Attached to 
> this email is a PoC file for the issue.

This (and several of your others) are fixed in Linus's tree.

> 
> Stack dump:
> 
> bcachefs (loop1): mounting version 1.7: (unknown version) 
> opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names
> ----[ cut here ]
> kernel BUG at fs/bcachefs/buckets.h:114!
> invalid opcode:  [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline]
> RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188
> Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72 
> 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8 
> 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23
> RSP: 0018:c90007c4ec38 EFLAGS: 00010246
> RAX: 0004 RBX: 0080 RCX: c90002679000
> RDX: 0004 RSI: 83ccf3b3 RDI: 0006
> RBP:  R08: 0006 R09: 1028
> R10: 0080 R11:  R12: 1028
> R13: 88804dee5100 R14:  R15: 88805b1a4110
> FS:  7f79ba8ab640() GS:88807ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  
>  bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012
>  bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215
>  bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline]
>  bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297
>  journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322
>  journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411
>  bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752
>  bch2_sb_clean_validate_late+0x14b/0x1e0 fs/bcachefs/sb-clean.c:32
>  bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160
>  bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691
>  bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
>  bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
>  bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
>  legacy_get_tree+0x109/0x220 fs/fs_context.c:662
>  vfs_get_tree+0x93/0x380 fs/super.c:1771
>  do_new_mount fs/namespace.c:3337 [inline]
>  path_mount+0x679/0x1e40 fs/namespace.c:3664
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount fs/namespace.c:3863 [inline]
>  __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f79b9a91b3e
> Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 
> 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5
> RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e
> RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90
> RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050
> R10: 0181c050 R11: 0202 R12: 20011a00
> R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0
>  
> Modules linked in:
> ---[ end trace  ]---
> 
> 
> Thank you for taking the time to read this email and we look forward to 
> working with you further.
> 
> 
> 
> 
> 
> 





[PATCH] tracing: Fix trace_pid_list_free() kernel-doc

2024-05-06 Thread Jeff Johnson
make C=1 reports:

kernel/trace/pid_list.c:458: warning: Function parameter or struct member 
'pid_list' not described in 'trace_pid_list_free'

Add the missing parameter to the trace_pid_list_free() kernel-doc.

Signed-off-by: Jeff Johnson 
---
 kernel/trace/pid_list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..19b271a12c99 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -451,6 +451,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
 
 /**
  * trace_pid_list_free - Frees an allocated pid_list.
+ * @pid_list: The pid list to free.
  *
  * Frees the memory for a pid_list that was allocated.
  */

---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240506-trace_pid_list_free-kdoc-e2bf15be84ee




Re: [PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-05-06 Thread Hou Tao



On 4/26/2024 10:39 PM, Hou Tao wrote:
> From: Hou Tao 
>
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
>
>   [ cut here ]
>   WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
>   Modules linked in:
>   CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
>   RIP: 0010:__alloc_pages+0x2bf/0x380
>   ..
>   Call Trace:
>
>    ? __warn+0x8e/0x150
>    ? __alloc_pages+0x2bf/0x380
>    __kmalloc_large_node+0x86/0x160
>    __kmalloc+0x33c/0x480
>    virtio_fs_enqueue_req+0x240/0x6d0
>    virtio_fs_wake_pending_and_unlock+0x7f/0x190
>    queue_request_and_unlock+0x55/0x60
>    fuse_simple_request+0x152/0x2b0
>    fuse_direct_io+0x5d2/0x8c0
>    fuse_file_read_iter+0x121/0x160
>    __kernel_read+0x151/0x2d0
>    kernel_read+0x45/0x50
>    kernel_read_file+0x1a9/0x2a0
>    init_module_from_file+0x6a/0xe0
>    idempotent_init_module+0x175/0x230
>    __x64_sys_finit_module+0x5d/0xb0
>    x64_sys_call+0x1c3/0x9e0
>    do_syscall_64+0x3d/0xc0
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>    ..
>
>   ---[ end trace  ]---
>
> The warning is triggered as follows:
>

SNIP
> @@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
> iov_iter *iter,
>   size_t nbytes = min(count, nmax);
>  
>   err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
> -   max_pages);
> +   max_pages, fc->use_pages_for_kvec_io);
>   if (err && !nbytes)
>   break;

I just found out that flush_kernel_vmap_range() should be called before a
DMA read or write operation, and invalidate_kernel_vmap_range() after a DMA
read operation, if the kvec I/O is backed by a vmalloc() area. I will update
this in v4.
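
The call ordering described above can be sketched in userspace C. This is only
an illustration under stated assumptions: the mock_* helpers are hypothetical
stand-ins that merely log the order in which they are called (they are not the
kernel functions), kvec_io() is an illustrative name rather than a function
from the patch, and "read" here means data flowing from the device into the
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Log of calls, one letter per step: F=flush, R/W=transfer, I=invalidate. */
static char call_log[64];

/* Hypothetical stand-in for flush_kernel_vmap_range(); only logs the call. */
static void mock_flush_kernel_vmap_range(void *vaddr, int size)
{
	(void)vaddr;
	(void)size;
	strcat(call_log, "F");
}

/* Hypothetical stand-in for invalidate_kernel_vmap_range(); only logs. */
static void mock_invalidate_kernel_vmap_range(void *vaddr, int size)
{
	(void)vaddr;
	(void)size;
	strcat(call_log, "I");
}

/* Hypothetical stand-in for the DMA transfer itself. */
static void mock_dma_transfer(void *vaddr, int size, int is_read)
{
	(void)vaddr;
	(void)size;
	strcat(call_log, is_read ? "R" : "W");
}

/* One I/O on a buffer assumed to live in a vmalloc() area. */
static void kvec_io(void *vaddr, int size, int is_read)
{
	assert(vaddr != NULL && size > 0);

	/* Make CPU-side changes visible to the physical pages before any I/O. */
	mock_flush_kernel_vmap_range(vaddr, size);

	mock_dma_transfer(vaddr, size, is_read);

	/* Only a read into the buffer can leave stale cache lines behind. */
	if (is_read)
		mock_invalidate_kernel_vmap_range(vaddr, size);
}
```

With these stand-ins, a read logs flush, transfer, invalidate, while a write
logs only flush and transfer.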
>  
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f239196103137..d4f04e19058c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -860,6 +860,9 @@ struct fuse_conn {
>   /** Passthrough support for read/write IO */
>   unsigned int passthrough:1;
>  
> + /* Use pages instead of pointer for kernel I/O */
> + unsigned int use_pages_for_kvec_io:1;
> +
>   /** Maximum stack depth for passthrough backing files */
>   int max_stack_depth;
>  
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 322af827a2329..36984c0e23d14 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1512,6 +1512,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>   fc->delete_stale = true;
>   fc->auto_submounts = true;
>   fc->sync_fs = true;
> + fc->use_pages_for_kvec_io = true;
>  
>   /* Tell FUSE to split requests that exceed the virtqueue's size */
>   fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,




Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread Huacai Chen
On Mon, May 6, 2024 at 3:00 PM maobibo  wrote:
>
>
>
> On 2024/5/6 上午9:53, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
> >>
> >> PARAVIRT option and pv ipi is added on guest kernel side, function
> >> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> >> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> >> it will call function kvm_para_available() to detect current hypervisor
> >> type. Now only KVM type detection is supported, the paravirt function can
> >> work only if current hypervisor type is KVM, since there is only KVM
> >> supported on LoongArch now.
> >>
> >> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> >> virtual IPI sender, ipi message is stored in DDR memory rather than
> >> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> >> at the same time like X86 KVM method. Hypercall method is used for IPI
> >> sending.
> >>
> >> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> >> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> >> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig    |   9 ++
> >>   arch/loongarch/include/asm/hardirq.h  |   1 +
> >>   arch/loongarch/include/asm/paravirt.h |  27 
> >>   .../include/asm/paravirt_api_clock.h  |   1 +
> >>   arch/loongarch/kernel/Makefile|   1 +
> >>   arch/loongarch/kernel/irq.c   |   2 +-
> >>   arch/loongarch/kernel/paravirt.c  | 151 ++
> >>   arch/loongarch/kernel/smp.c   |   4 +-
> >>   8 files changed, 194 insertions(+), 2 deletions(-)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 54ad04dacdee..0a1540a8853e 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/hardirq.h 
> >> b/arch/loongarch/include/asm/hardirq.h
> >> index 9f0038e19c7f..b26d596a73aa 100644
> >> --- a/arch/loongarch/include/asm/hardirq.h
> >> +++ b/arch/loongarch/include/asm/hardirq.h
> >> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> >>   typedef struct {
> >>  unsigned int ipi_irqs[NR_IPI];
> >>  unsigned int __softirq_pending;
> >> +   atomic_t message ____cacheline_aligned_in_smp;
> >>   } ____cacheline_aligned irq_cpustat_t;
> >>
> >>   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..58f7b7b89f2c
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread maobibo




On 2024/5/6 上午9:53, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervisor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virtual IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|   9 ++
  arch/loongarch/include/asm/hardirq.h  |   1 +
  arch/loongarch/include/asm/paravirt.h |  27 
  .../include/asm/paravirt_api_clock.h  |   1 +
  arch/loongarch/kernel/Makefile|   1 +
  arch/loongarch/kernel/irq.c   |   2 +-
  arch/loongarch/kernel/paravirt.c  | 151 ++
  arch/loongarch/kernel/smp.c   |   4 +-
  8 files changed, 194 insertions(+), 2 deletions(-)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+     This changes the kernel so it can modify itself when it is run
+     under a hypervisor, potentially improving performance significantly
+     over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
  typedef struct {
 unsigned int ipi_irqs[NR_IPI];
 unsigned int __softirq_pending;
+   atomic_t message ____cacheline_aligned_in_smp;
  } ____cacheline_aligned irq_cpustat_t;

  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
 per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
 }

-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   se

[PATCH] kernel/module: disable cfi for do_mod_ctors

2024-05-06 Thread Joey Jiao
A CFI failure is reported when both CONFIG_CONSTRUCTORS and CONFIG_CFI_CLANG
are enabled:

CFI failure at do_init_module+0x100/0x384 (target:
tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable CFI for do_mod_ctors() to avoid the CFI check on the
mod->ctors[i]() calls.

Signed-off-by: Joey Jiao 
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2
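
For context, the indirect call that the patch above exempts from CFI checking
can be sketched in plain userspace C. This is a hedged illustration, not the
kernel code: struct module_sketch, ctor_fn_t, do_mod_ctors_sketch() and
sample_ctor() are hypothetical names, and NOCFI_SKETCH only approximates the
kernel's __nocfi by using Clang's no_sanitize("cfi") attribute (it expands to
nothing on other compilers).

```c
#include <assert.h>
#include <stddef.h>

/* Approximation of __nocfi for this sketch; a no-op outside Clang. */
#if defined(__clang__)
#define NOCFI_SKETCH __attribute__((no_sanitize("cfi")))
#else
#define NOCFI_SKETCH
#endif

/* Constructors take no arguments and return nothing. */
typedef void (*ctor_fn_t)(void);

/* Hypothetical reduction of the module fields the loop needs. */
struct module_sketch {
	ctor_fn_t *ctors;
	unsigned int num_ctors;
};

/* Counts how many constructors have run. */
static int ctor_calls;

/* Example constructor used to exercise the loop. */
static void sample_ctor(void)
{
	ctor_calls++;
}

/*
 * Call every constructor through a function pointer. With CFI enabled,
 * each of these indirect calls would normally be type-checked; the
 * attribute asks the compiler to skip that check in this function.
 */
NOCFI_SKETCH
static void do_mod_ctors_sketch(struct module_sketch *mod)
{
	unsigned int i;

	assert(mod != NULL);
	for (i = 0; i < mod->num_ctors; i++)
		mod->ctors[i]();
}
```

The attribute is scoped to the one function that makes the indirect calls, so
CFI checking elsewhere is unaffected.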




Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-05 Thread Huacai Chen
Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> PARAVIRT option and pv ipi is added on guest kernel side, function
> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current hypervisor
> type. Now only KVM type detection is supported, the paravirt function can
> work only if current hypervisor type is KVM, since there is only KVM
> supported on LoongArch now.
>
> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> virtual IPI sender, ipi message is stored in DDR memory rather than
> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> at the same time like X86 KVM method. Hypercall method is used for IPI
> sending.
>
> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|   9 ++
>  arch/loongarch/include/asm/hardirq.h  |   1 +
>  arch/loongarch/include/asm/paravirt.h |  27 
>  .../include/asm/paravirt_api_clock.h  |   1 +
>  arch/loongarch/kernel/Makefile|   1 +
>  arch/loongarch/kernel/irq.c   |   2 +-
>  arch/loongarch/kernel/paravirt.c  | 151 ++
>  arch/loongarch/kernel/smp.c   |   4 +-
>  8 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 54ad04dacdee..0a1540a8853e 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>  typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> +   atomic_t message ____cacheline_aligned_in_smp;
>  } ____cacheline_aligned irq_cpustat_t;
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3a7620b66bc6..c9bfeda89e40 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
>

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-30 Thread Chris Lew




On 4/26/2024 6:36 PM, Dmitry Baryshkov wrote:
> On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:
>>
>> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
>>> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
>>> b/drivers/remoteproc/qcom_q6v5_adsp.c
>>> index 1d24c9b656a8..02d0c626b03b 100644
>>> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
>>> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
>>> @@ -23,6 +23,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>>   #include 
>>>   #include 
>>>
>>> @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
>>>   int ret;
>>>   unsigned int val;
>>>
>>> - ret = qcom_q6v5_prepare(&adsp->q6v5);
>>> + ret = qcom_pdm_get();
>>>   if (ret)
>>>   return ret;
>>
>> Would it make sense to try and model this as a rproc subdev? This
>> section of the remoteproc code seems to be focused on making specific
>> calls to set up and enable hardware resources, whereas pd mapper is
>> software.
>>
>> sysmon and ssr are also purely software and they are modeled as subdevs
>> in qcom_common. I'm not an expert on remoteproc organization but this
>> was just a thought.
>
> Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance

Both sysmon and ssr have some kind of global states that they manage 
too. Each subdev functionality tends to be a mix of per-remoteproc 
instance management and global state management.


If pd-mapper was completely global, pd-mapper would be able to 
instantiate by itself. Instead, instantiation is dependent on each 
remoteproc instance properly getting and putting references.


The pdm subdev could manage the references to pd-mapper for that 
remoteproc instance.


On the other hand, I think Bjorn recommended this could be moved to 
probe time in v4. The v4 version was doing the reinitialization-dance, 
but I think the recommendation could still apply to this version.
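
To illustrate the suggestion above, here is a minimal userspace sketch of a
global, reference-counted service whose references are held per remoteproc by
a small subdev-like object. Everything in it is an assumption made for
illustration: pdm_get_sketch(), pdm_release_sketch(), struct pdm_subdev_sketch
and its start/stop helpers are hypothetical names, not the API from the patch,
and the locking that real code would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Global state: one service shared by every remoteproc instance. */
static int pdm_users;
static bool pdm_running;

/* Take a reference; the first user brings the service up. */
static int pdm_get_sketch(void)
{
	if (pdm_users == 0)
		pdm_running = true;
	pdm_users++;
	return 0;
}

/* Drop a reference; the last user tears the service down. */
static void pdm_release_sketch(void)
{
	assert(pdm_users > 0);
	if (--pdm_users == 0)
		pdm_running = false;
}

/* Hypothetical per-remoteproc object that owns at most one reference. */
struct pdm_subdev_sketch {
	bool holds_ref;
};

/* Called when the owning remoteproc starts. */
static int pdm_subdev_start(struct pdm_subdev_sketch *sub)
{
	int ret = pdm_get_sketch();

	if (ret == 0)
		sub->holds_ref = true;
	return ret;
}

/* Called when the owning remoteproc stops; safe to call twice. */
static void pdm_subdev_stop(struct pdm_subdev_sketch *sub)
{
	if (sub->holds_ref) {
		pdm_release_sketch();
		sub->holds_ref = false;
	}
}
```

Keeping the reference inside the per-instance object means each remoteproc
driver would not need its own get/put calls and error labels.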




>> Thanks!
>> Chris
>>
>>> + ret = qcom_q6v5_prepare(&adsp->q6v5);
>>> + if (ret)
>>> + goto put_pdm;
>>> +
>>>   ret = adsp_map_carveout(rproc);
>>>   if (ret) {
>>>   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
>>> @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
>>>   adsp_unmap_carveout(rproc);
>>>   disable_irqs:
>>>   qcom_q6v5_unprepare(&adsp->q6v5);
>>> +put_pdm:
>>> + qcom_pdm_release();
>>>
>>>   return ret;
>>>   }









BUG: unable to handle kernel paging request in do_split

2024-04-29 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team discovered an issue in Linux kernel 6.7. A PoC file for the 
issue is attached to this email.

Stack dump:
BUG: unable to handle page fault for address: ed110c2fd97f
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 7ffd0067 P4D 7ffd0067 PUD 0
Oops:  [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 24082 Comm: syz-executor.3 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 make_indexed_dir+0x1158/0x1540 fs/ext4/namei.c:2342
 ext4_add_entry+0xcd0/0xe80 fs/ext4/namei.c:2454
 ext4_add_nondir+0x90/0x2b0 fs/ext4/namei.c:2795
 ext4_symlink+0x539/0x9e0 fs/ext4/namei.c:3436
 vfs_symlink fs/namei.c:4464 [inline]
 vfs_symlink+0x3f6/0x640 fs/namei.c:4448
 do_symlinkat+0x245/0x2f0 fs/namei.c:4490
 __do_sys_symlink fs/namei.c:4511 [inline]
 __se_sys_symlink fs/namei.c:4509 [inline]
 __x64_sys_symlink+0x79/0xa0 fs/namei.c:4509
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f191329002d
Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f191402a028 EFLAGS: 0246 ORIG_RAX: 0058
RAX: ffda RBX: 7f19133cbf80 RCX: 7f191329002d
RDX:  RSI: 2e40 RDI: 20001640
RBP: 7f19132f14d0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 000b R14: 7f19133cbf80 R15: 7f191400a000
 
Modules linked in:
CR2: ed110c2fd97f
---[ end trace  ]---
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554

Code disassembly (best guess):
   0:   d2 0f   rorb   %cl,(%rdi)
   2:   85 38   test   %edi,(%rax)
   4:   0b 00   or (%rax),%eax
   6:   00 8b 45 00 89 84   add%cl,-0x7b76ffbb(%rbx)
   c:   24 84   and$0x84,%al
   e:   00 00   add%al,(%rax)
  10:   00 41 8dadd%al,-0x73(%rcx)
  13:   45 ff 48 8d rex.RB decl -0x73(%r8)
  17:   1c c3   sbb$0xc3,%al
  19:   48 b8 00 00 00 00 00movabs $0xdc00,%rax
  20:   fc ff df
  23:   48 89 damov%rbx,%rdx
  26:   48 c1 ea 03 shr$0x3,%rdx
* 2a:   0f b6 14 02 movzbl (%rdx,%rax,1),%edx <-- trapping 
instruction
  2e:   48 89 d8mov%rbx,%rax
  31:   83 e0 07and$0x7,%eax
  34:   83 c0 03add$0x3,%eax
  37:   38 d0   cmp%dl,%al
  39:   7c 08   jl 0x43
  3b:   84 d2   test   %dl,%dl
  3d:   0f  .byte 0xf
  3e:   85 ef   test   

[PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-04-28 Thread Bibo Mao
Add a PARAVIRT option and PV IPI support on the guest kernel side. The
function pv_ipi_init() installs the IPI sending and IPI receiving hooks. It
first checks whether the system is running in VM mode. If so, it calls
kvm_para_available() to detect the current hypervisor type. Only KVM
detection is supported for now, so the paravirt functions work only when the
hypervisor is KVM, which is currently the only hypervisor supported on
LoongArch.

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
virtual IPI sender, the IPI message is stored in DDR memory rather than in
emulated HW. IPI multicast is supported, and 128 vCPUs can receive IPIs at
the same time, like the x86 KVM method. A hypercall is used for IPI sending.

With the virtual IPI receiver, HW SW0 is used rather than the real IPI HW.
Since each vCPU has a separate HW SW0, like the HW timer, there is no trap on
IPI interrupt acknowledge. And since the IPI message is stored in DDR, there
is no trap when fetching the IPI message.
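
The idea of keeping the IPI message in ordinary memory can be sketched in
userspace C11. This is a hedged illustration only: the array, the function
names and the counter are hypothetical, notify_cpu_sketch() merely stands in
for the hypercall, and skipping the notification when bits are already
pending is an assumption of this sketch rather than something stated in the
description above.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH    4
#define NR_ACTIONS_SKETCH 32

/* One pending-action bitmap per CPU, kept in plain memory. */
static atomic_uint ipi_message[NR_CPUS_SKETCH];

/* Counts how often the (mock) hypercall was issued. */
static unsigned int hypercalls_sent;

/* Hypothetical stand-in for the hypercall that kicks the target vCPU. */
static void notify_cpu_sketch(unsigned int cpu)
{
	(void)cpu;
	hypercalls_sent++;
}

/* Sender: record the action bit, notify only if nothing was pending. */
static void send_ipi_sketch(unsigned int cpu, unsigned int action)
{
	unsigned int old;

	assert(cpu < NR_CPUS_SKETCH && action < NR_ACTIONS_SKETCH);
	old = atomic_fetch_or(&ipi_message[cpu], 1u << action);
	if (old)
		return;	/* an earlier sender already notified this CPU */
	notify_cpu_sketch(cpu);
}

/* Receiver: fetch and clear all pending actions with one memory access. */
static unsigned int receive_ipi_sketch(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return atomic_exchange(&ipi_message[cpu], 0);
}
```

Because the receiver only touches memory, fetching the pending actions needs
no exit to the hypervisor in this model.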

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+     This changes the kernel so it can modify itself when it is run
+     under a hypervisor, potentially improving performance significantly
+     over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message ____cacheline_aligned_in_smp;
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index 000000000000..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index 000000000000..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Dmitry Baryshkov
On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:
>
>
>
> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> > b/drivers/remoteproc/qcom_q6v5_adsp.c
> > index 1d24c9b656a8..02d0c626b03b 100644
> > --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >
> > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
> >   int ret;
> >   unsigned int val;
> >
> > - ret = qcom_q6v5_prepare(&adsp->q6v5);
> > + ret = qcom_pdm_get();
> >   if (ret)
> >   return ret;
>
> Would it make sense to try and model this as a rproc subdev? This
> section of the remoteproc code seems to be focused on making specific
> calls to set up and enable hardware resources, whereas pd mapper is
> software.
>
> sysmon and ssr are also purely software and they are modeled as subdevs
> in qcom_common. I'm not an expert on remoteproc organization but this
> was just a thought.

Well, the issue is that the pd-mapper is a global instance, not a per-remoteproc one.
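
To illustrate the point about a single global instance, here is a minimal user-space sketch, assuming one reference-counted service shared by all remoteprocs. The names sketch_pdm_get(), sketch_pdm_release(), pdm_users and pdm_running are hypothetical and are not the qcom_pdm_get() / qcom_pdm_release() implementation from the patch; real kernel code would also need locking, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global state: one service shared by every remoteproc. */
int pdm_users;
bool pdm_running;

/* The first user starts the service; later users only take a reference. */
int sketch_pdm_get(void)
{
	if (pdm_users++ == 0)
		pdm_running = true;
	return 0;
}

/* The last user to drop its reference stops the service. */
void sketch_pdm_release(void)
{
	assert(pdm_users > 0);
	if (--pdm_users == 0)
		pdm_running = false;
}
```

With two DSPs holding a reference, the service keeps running after the first release and stops only after the second one.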

>
> Thanks!
> Chris
>
> >
> > + ret = qcom_q6v5_prepare(&adsp->q6v5);
> > + if (ret)
> > + goto put_pdm;
> > +
> >   ret = adsp_map_carveout(rproc);
> >   if (ret) {
> >   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
> >   adsp_unmap_carveout(rproc);
> >   disable_irqs:
> >   qcom_q6v5_unprepare(&adsp->q6v5);
> > +put_pdm:
> > + qcom_pdm_release();
> >
> >   return ret;
> >   }
>


-- 
With best wishes
Dmitry



Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Chris Lew




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> b/drivers/remoteproc/qcom_q6v5_adsp.c
> index 1d24c9b656a8..02d0c626b03b 100644
> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> @@ -23,6 +23,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>
> @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
>   int ret;
>   unsigned int val;
>
> - ret = qcom_q6v5_prepare(&adsp->q6v5);
> + ret = qcom_pdm_get();
>   if (ret)
>   return ret;

Would it make sense to try and model this as a rproc subdev? This 
section of the remoteproc code seems to be focused on making specific 
calls to set up and enable hardware resources, whereas pd mapper is 
software.

sysmon and ssr are also purely software and they are modeled as subdevs 
in qcom_common. I'm not an expert on remoteproc organization but this 
was just a thought.

Thanks!
Chris

>
> + ret = qcom_q6v5_prepare(&adsp->q6v5);
> + if (ret)
> + goto put_pdm;
> +
>   ret = adsp_map_carveout(rproc);
>   if (ret) {
>   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
>   adsp_unmap_carveout(rproc);
>   disable_irqs:
>   qcom_q6v5_unprepare(&adsp->q6v5);
> +put_pdm:
> + qcom_pdm_release();
>
>   return ret;
>   }





Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread Google
Hi LuMingYin,

Thanks for finding the problem! But please make a commit message
following Documentation/process/submitting-patches.rst


On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> 
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
> named code and tmp are defined. At line 1437, a new dynamic memory area is 
> allocated using the function kcalloc. When the if statement at line 1467 
> evaluates to true, the program jumps to the out label at line 1469. Within 
> this function, there are two labels: out and fail. The difference between 
> these two labels is that fail additionally frees the dynamic memory area 
> pointed to by the variable code. Therefore, the program should jump to the 
> fail label instead of the out label. This commit fixes this bug.
> 

For example, you must break lines at about 70 characters. Also,
please don't refer to line numbers, because they change easily
(function names are OK). Since this bug is a very clear mistake,
you can just explain it as follows.


 If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
 jumps to 'out' instead of 'fail' by mistake. As a result, the 'tmp'
 buffer is not freed in this case and its memory leaks.

 Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and the second one explains
how to fix it.
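
The difference between the two labels can be shown with a small user-space sketch. This is only an illustration of the goto-cleanup pattern under simplified assumptions: sketch_parse_arg() and its parameters are hypothetical, and the body is not the real traceprobe_parse_probe_arg_body().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the parser: 'tmp' is scratch memory that
 * only the 'fail' label releases; 'out' is the plain exit.
 */
int sketch_parse_arg(bool fmt_alloc_fails, bool *tmp_freed)
{
	char *tmp, *fmt;
	int ret = 0;

	assert(tmp_freed != NULL);
	*tmp_freed = false;

	tmp = calloc(16, 1);
	if (!tmp) {
		ret = -ENOMEM;
		goto out;	/* nothing allocated yet, so 'out' is fine */
	}

	/* Simulate the failing allocation of the format string. */
	fmt = fmt_alloc_fails ? NULL : malloc(8);
	if (!fmt) {
		ret = -ENOMEM;
		goto fail;	/* 'goto out' here would leak 'tmp' */
	}
	free(fmt);		/* the sketch does not keep the result */

fail:
	free(tmp);
	*tmp_freed = true;
out:
	return ret;
}
```

Once 'tmp' has been allocated, every error path has to go through 'fail'; jumping straight to 'out' skips the free.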

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit number with git blame.

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
> *argv, ssize_t *size,
>   parg->fmt = kmalloc(len, GFP_KERNEL);
>   if (!parg->fmt) {
>   ret = -ENOMEM;
> - goto out;
> + goto fail;
>   }
>   snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>parg->count);
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu (Google) 



[PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

Hi,

The patch set aims to fix the warning related to an abnormal size
parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
Besides the abnormal size parameter for kmalloc, the gfp parameter is
also questionable: GFP_ATOMIC is used even when the allocation occurs
in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
allocation is initiated by the kworker. For more details, please check
the individual patches.

As usual, comments are always welcome.

Change Log:

v3:
 * introduce use_pages_for_kvec_io for virtiofs. When the option is
   enabled, fuse will use iov_iter_extract_pages() to construct a page
   array and pass the pages array instead of a pointer to virtiofs.
   The benefit is twofold: the length of the data passed to virtiofs is
   limited by max_pages, and there is no memory copy compared with v2.

v2: 
https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
  * limit the length of ITER_KVEC dio by max_pages instead of the
    newly-introduced max_nopage_rw. Using max_pages makes the ITER_KVEC
    dio consistent with other rw operations.
  * replace the kmalloc-allocated bounce buffer with a bounce buffer
    backed by scattered pages when the length of the bounce buffer for
    ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with
    fragmented memory, the ITER_KVEC dio can be handled normally by
    virtiofs. (Bernd Schubert)
  * merge the GFP_NOFS patch [1] into this patch-set and use
    memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
    (Benjamin Coddington)

v1: 
https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/

[1]: 
https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/

Hou Tao (2):
  virtiofs: use pages instead of pointer for kernel direct IO
  virtiofs: use GFP_NOFS when enqueuing request through kworker

 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c | 25 -
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.29.2




[PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  [ cut here ]
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ..
  Call Trace:
   
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ..
   
  ---[ end trace  ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). Currently, the maximal read size for a kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs needs a DMA-able address,
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passes the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.
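
The arithmetic behind the warning can be checked with a small user-space sketch. It assumes 4 KiB pages and a MAX_PAGE_ORDER of 10; both values are assumptions for illustration (the kernel limit is configurable), and sketch_get_order() / sketch_alloc_would_warn() are hypothetical helpers rather than the kernel's own get_order().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT     12	/* assumes 4 KiB pages */
#define SKETCH_MAX_PAGE_ORDER 10	/* assumed limit; configurable in the kernel */

/* Smallest order such that (page size << order) >= size; size must be > 0. */
unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	size = (size - 1) >> SKETCH_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Non-zero when a physically contiguous allocation of 'size' exceeds the limit. */
int sketch_alloc_would_warn(size_t size)
{
	return sketch_get_order(size) > SKETCH_MAX_PAGE_ORDER;
}
```

Under these assumptions a 10MB request needs an order-12 allocation, which is above the limit, while a 4MB request (order 10) still fits.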

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However, it will affect
the maximal read size for normal reads. Writes initiated from the kernel
on virtio-fs have a similar problem, but currently there is no way to
limit fc->max_write in the kernel.

So instead of limiting both max_read and max_write in the kernel,
introduce use_pages_for_kvec_io in fuse_conn and set it to true in
virtiofs. When use_pages_for_kvec_io is enabled, fuse will use pages
instead of a pointer to pass the KVEC_IO data.
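
As a rough sketch of the effect of passing pages: once each request carries at most max_pages pages, a large kernel buffer is split into several bounded requests instead of one huge one. The helper below is hypothetical, assumes 4 KiB pages and a page-aligned buffer, and is not code from fs/fuse.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumes 4 KiB pages */

/*
 * Number of requests needed when each request carries at most max_pages
 * pages. Hypothetical helper; assumes a page-aligned buffer.
 */
size_t sketch_num_requests(size_t len, unsigned int max_pages)
{
	size_t per_req = (size_t)max_pages * SKETCH_PAGE_SIZE;

	assert(max_pages > 0);
	return (len + per_req - 1) / per_req;
}
```

For example, with an illustrative max_pages of 256, a 10MB buffer becomes 10 requests of 1MB each, so no single request needs a 10MB bounce buffer.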

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao 
---
 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce41576407..82b77c5d8c643 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1471,13 +1471,17 @@ static inline size_t fuse_get_frag_size(const struct 
iov_iter *ii,
 
 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
   size_t *nbytesp, int write,
-  unsigned int max_pages)
+  unsigned int max_pages,
+  bool use_pages_for_kvec_io)
 {
size_t nbytes = 0;  /* # bytes already packed in req */
ssize_t ret = 0;
 
-   /* Special case for kernel I/O: can copy directly into the buffer */
-   if (iov_iter_is_kvec(ii)) {
+   /* Special case for kernel I/O: can copy directly into the buffer.
+* However if the implementation of fuse_conn requires pages instead of
+* pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+*/
+   if (iov_iter_is_kvec(ii) && !use_pages_for_kvec_io) {
unsigned long user_addr = fuse_get_user_addr(ii);
size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
 
@@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
iov_iter *iter,
size_t nbytes = min(count, nmax);
 
err = fuse_get_user_pages(>ap, iter, , write,
-

Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-26 Thread Alexey Minnekhanov




On 24.04.2024 12:27, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
   builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
   silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
   (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
   them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404



I've tested this series on an sdm660 device, with the userspace pd-mapper
service disabled, and don't see any regressions, e.g. Wi-Fi/BT
still come online and work as before.

Debug logs:
https://paste.sr.ht/~minlexx/bd03db4c582a3275078ce4fd05ea76ce46a52b8e

Missing cdsp_root and adsp_sensors PDs are not currently an issue,
because those are not enabled yet on SDM660 or hard to test, so

Tested-by: Alexey Minnekhanov 

--
Regards,
Alexey Minnekhanov
postmarketOS developer



[PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread lumingyindetect
From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>

At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
named code and tmp are defined. At line 1437, a new dynamic memory area is 
allocated using the function kcalloc. When the if statement at line 1467 
evaluates to true, the program jumps to the out label at line 1469. Within this 
function, there are two labels: out and fail. The difference between these two 
labels is that fail additionally frees the dynamic memory area pointed to by 
the variable code. Therefore, the program should jump to the fail label instead 
of the out label. This commit fixes this bug.

Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
---
 kernel/trace/trace_probe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dfe3ee6035ec..42bc0f362226 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
*argv, ssize_t *size,
parg->fmt = kmalloc(len, GFP_KERNEL);
if (!parg->fmt) {
ret = -ENOMEM;
-   goto out;
+   goto fail;
}
snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
 parg->count);
-- 
2.25.1




Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Dmitry Baryshkov
On Thu, 25 Apr 2024 at 10:08, Steev Klimaszewski  wrote:
>
> Hi Dmitry,
>
> On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
>  wrote:
> >
> > Protection domain mapper is a QMI service providing mapping between
> > 'protection domains' and services supported / allowed in these domains.
> > For example such mapping is required for loading of the WiFi firmware or
> > for properly starting up the UCSI / altmode / battery manager support.
> >
> > The existing userspace implementation has several issues. It doesn't play
> > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> > firmware location is changed (or if the firmware was not available at
> > the time pd-mapper was started but the corresponding directory is
> > mounted later), etc.
> >
> > However this configuration is largely static and common between
> > different platforms. Provide in-kernel service implementing static
> > per-platform data.
> >
> > Unlike previous revisions of the patchset, this iteration uses static
> > configuration per platform, rather than building it dynamically from the
> > list of DSPs being started.
> >
> > To: Bjorn Andersson 
> > To: Konrad Dybcio 
> > To: Sibi Sankar 
> > To: Mathieu Poirier 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-remotep...@vger.kernel.org
> > Cc: Johan Hovold 
> > Cc: Xilin Wu 
> > Cc: "Bryan O'Donoghue" 
> > --
> >
> > Changes in v7:
> > - Fixed modular build (Steev)
> > - Link to v6: 
> > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
> >
> > Changes in v6:
> > - Reworked mutex to fix lockdep issue on deregistration
> > - Fixed dependencies between PD-mapper and remoteproc to fix modular
> >   builds (Krzysztof)
> > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> > - Fixed kerneldocs (Krzysztof)
> > - Removed extra pr_debug messages (Krzysztof)
> > - Fixed wcss build (Krzysztof)
> > - Added platforms which do not require protection domain mapping to
> >   silence the notice on those platforms
> > - Link to v5: 
> > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
> >
> > Changes in v5:
> > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> > Lew)
> > - pd_mapper: reworked to provide static configuration per platform
> >   (Bjorn)
> > - Link to v4: 
> > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
> >
> > Changes in v4:
> > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> > - Added configuration for sm6350 (Thanks to Luca)
> > - Removed RFC tag (Konrad)
> > - Link to v3: 
> > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
> >
> > Changes in RFC v3:
> > - Send start / stop notifications when PD-mapper domain list is changed
> > - Reworked the way PD-mapper treats protection domains, register all of
> >   them in a single batch
> > - Added SC7180 domains configuration based on TCL Book 14 GO
> > - Link to v2: 
> > https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
> >
> > Changes in RFC v2:
> > - Swapped num_domains / domains (Konrad)
> > - Fixed an issue with battery not working on sc8280xp
> > - Added missing configuration for QCS404
> >
> > ---
> > Dmitry Baryshkov (6):
> >   soc: qcom: pdr: protect locator_addr with the main mutex
> >   soc: qcom: pdr: fix parsing of domains lists
> >   soc: qcom: pdr: extract PDR message marshalling data
> >   soc: qcom: qmi: add a way to remove running service
> >   soc: qcom: add pd-mapper implementation
> >   remoteproc: qcom: enable in-kernel PD mapper
> >
> >  drivers/remoteproc/Kconfig  |   4 +
> >  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
> >  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
> >  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
> >  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
> >  drivers/soc/qcom/Kconfig|  14 +
> >  drivers/soc/qcom/Makefile   |   2 +
> >  drivers/soc/qcom/pdr_interface.c|   6 +-
> >  drivers/soc/qcom/pdr_internal.h | 318 ++---
> >  drivers/soc/qcom/qcom_pd_mapper.c   | 656
> >  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
> >  drivers/soc/qcom/qmi_interface.c|  67 
> >  include/linux/soc/qcom/pd_mapper.h  |  28 ++
> >  include/linux/soc/qcom/qmi.h|   2 +
> >  14 files changed, 1193 insertions(+), 302 deletions(-)
> > ---
> > base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
> >
> > Best regards,
> > --
> > Dmitry Baryshkov 
> >
> >
> I've tested this series over a large number of reboots, and the p-d
> devices(?) do always seem to come up (with the pd-mapper service
> disabled) on my Thinkpad X13s.  One less service to run in userland!
> Tested-by: Steev Klimaszewski 

Thank you!

-- 
With best wishes
Dmitry



Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Steev Klimaszewski
Hi Dmitry,

On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> Unlike previous revisions of the patchset, this iteration uses static
> configuration per platform, rather than building it dynamically from the
> list of DSPs being started.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> --
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (6):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: qmi: add a way to remove running service
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/Kconfig  |   4 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
>  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
>  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
>  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
>  drivers/soc/qcom/Kconfig|  14 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|   6 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 656
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  drivers/soc/qcom/qmi_interface.c|  67 
>  include/linux/soc/qcom/pd_mapper.h  |  28 ++
>  include/linux/soc/qcom/qmi.h|   2 +
>  14 files changed, 1193 insertions(+), 302 deletions(-)
> ---
> base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>
>
I've tested this series over a large number of reboots, and the p-d
devices(?) do always seem to come up (with the pd-mapper service
disabled) on my Thinkpad X13s.  One less service to run in userland!
Tested-by: Steev Klimaszewski 



[PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-24 Thread Dmitry Baryshkov
Request the in-kernel protection domain mapper to be started before
starting a Qualcomm DSP and release it once the DSP is stopped. Once all
DSPs are stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/Kconfig  |  4 
 drivers/remoteproc/qcom_q6v5_adsp.c | 11 ++-
 drivers/remoteproc/qcom_q6v5_mss.c  | 10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  | 12 +++-
 drivers/remoteproc/qcom_q6v5_wcss.c | 12 +++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..a0ce552f89a1 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -181,6 +181,7 @@ config QCOM_Q6V5_ADSP
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_PIL_INFO
select QCOM_MDT_LOADER
@@ -201,6 +202,7 @@ config QCOM_Q6V5_MSS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_PIL_INFO
@@ -221,6 +223,7 @@ config QCOM_Q6V5_PAS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_PIL_INFO
select QCOM_MDT_LOADER
@@ -243,6 +246,7 @@ config QCOM_Q6V5_WCSS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_PIL_INFO
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
int ret;
unsigned int val;
 
-   ret = qcom_q6v5_prepare(&adsp->q6v5);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+   ret = qcom_q6v5_prepare(&adsp->q6v5);
+   if (ret)
+   goto put_pdm;
+
ret = adsp_map_carveout(rproc);
if (ret) {
dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
adsp_unmap_carveout(rproc);
 disable_irqs:
qcom_q6v5_unprepare(&adsp->q6v5);
+put_pdm:
+   qcom_pdm_release();
 
return ret;
 }
@@ -478,6 +485,8 @@ static int adsp_stop(struct rproc *rproc)
if (handover)
qcom_adsp_pil_handover(&adsp->q6v5);
 
+   qcom_pdm_release();
+
return ret;
 }
 
diff --git a/drivers/remoteproc/qcom_q6v5_mss.c 
b/drivers/remoteproc/qcom_q6v5_mss.c
index 1779fc890e10..791f11e7adbf 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1581,10 +1582,14 @@ static int q6v5_start(struct rproc *rproc)
int xfermemop_ret;
int ret;
 
-   ret = q6v5_mba_load(qproc);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+   ret = q6v5_mba_load(qproc);
+   if (ret)
+   goto put_pdm;
+
dev_info(qproc->dev, "MBA booted with%s debug policy, loading mpss\n",
 qproc->dp_size ? "" : "out");
 
@@ -1613,6 +1618,8 @@ static int q6v5_start(struct rproc *rproc)
 reclaim_mpss:
q6v5_mba_reclaim(qproc);
q6v5_dump_mba_logs(qproc);
+put_pdm:
+   qcom_pdm_release();
 
return ret;
 }
@@ -1627,6 +1634,7 @@ static int q6v5_stop(struct rproc *rproc)
dev_err(qproc->dev, "timed out on wait\n");
 
q6v5_mba_reclaim(qproc);
+   qcom_pdm_release();
 
return 0;
 }
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 54d8005d40a3..653e54f975fc 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -261,10 +262,14 @@ static int adsp_start(struct rproc *rproc)
struct qcom_adsp *adsp = rproc->priv;
int ret;
 
-   ret = qcom_q6v5_prepare(&adsp->q6v5);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+

[PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-24 Thread Dmitry Baryshkov
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (6):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: qmi: add a way to remove running service
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/Kconfig  |   4 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
 drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
 drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
 drivers/soc/qcom/Kconfig|  14 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|   6 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 656 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 drivers/soc/qcom/qmi_interface.c|  67 
 include/linux/soc/qcom/pd_mapper.h  |  28 ++
 include/linux/soc/qcom/qmi.h|   2 +
 14 files changed, 1193 insertions(+), 302 deletions(-)
---
base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov 




Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-12 Thread Michal Koutný
On Thu, Apr 11, 2024 at 03:03:31PM -0700, Andrew Morton wrote:
> A large increase in the maximum number of processes.

The change from (some) default to effective infinity is the crux of the
change. Because that is only a number.
(Thus I don't find the number's 12700% increase alone a big change.)

The actual maximum number of processes is "workload dependent" and hence
should be determined based on the particular workload.

> Or did I misinterpret?

I thought you saw an issue with projection of that number into sizings
based on the default. Which of them comprises the large change in your
eyes?

Thanks,
Michal



Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-11 Thread Andrew Morton
On Thu, 11 Apr 2024 17:40:02 +0200 Michal Koutný  wrote:

> Hello.
> 
> On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton wrote:
> > That seems like a large change.
> 
> In what sense is it large?

A large increase in the maximum number of processes.  Or did I misinterpret?





[PATCH 3/5] openrisc: traps: Don't send signals to kernel mode threads

2024-04-11 Thread Stafford Horne
OpenRISC exception handling sends signals to user processes on floating
point exceptions and trap instructions (for debugging) among others.
There is a bug where the trap handling logic may send signals to kernel
threads. We should not send these signals to kernel threads; if that
happens, we treat it as an error.

This patch adds conditions to die if the kernel receives these
exceptions in kernel mode code.
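
The decision described above can be modelled in user space as a small sketch.
The flag bits, code values, and every name below (FLAG_*, CODE_*,
handle_fpe_trap()) are placeholders for illustration only; the real bit
definitions live in the OpenRISC SPR headers and the real signal codes in the
kernel's siginfo headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits; not the real SPR_FPCSR_* values. */
enum {
	FLAG_IVF = 1 << 0,	/* invalid operation */
	FLAG_OVF = 1 << 1,	/* overflow */
	FLAG_UNF = 1 << 2,	/* underflow */
	FLAG_DZF = 1 << 3,	/* divide by zero */
	FLAG_IXF = 1 << 4,	/* inexact */
	FLAG_ALL = 0x1f,
};

enum fpe_code { CODE_UNK, CODE_INV, CODE_OVF, CODE_UND, CODE_DIV, CODE_RES };
enum trap_action { ACTION_SIGNAL_USER, ACTION_DIE };

struct trap_result {
	enum trap_action action;
	enum fpe_code code;
};

/* Pick one code, checking flags in the same order as the patch. */
static enum fpe_code fpe_code_from_flags(unsigned long flags)
{
	if (flags & FLAG_IVF)
		return CODE_INV;
	if (flags & FLAG_OVF)
		return CODE_OVF;
	if (flags & FLAG_UNF)
		return CODE_UND;
	if (flags & FLAG_DZF)
		return CODE_DIV;
	if (flags & FLAG_IXF)
		return CODE_RES;
	return CODE_UNK;
}

/* User mode: report a code and clear the flags. Kernel mode: fatal. */
static struct trap_result handle_fpe_trap(bool from_user_mode,
					  unsigned long *fpcsr)
{
	struct trap_result res = { ACTION_DIE, CODE_UNK };

	assert(fpcsr != NULL);
	if (!from_user_mode)
		return res;	/* flags are left untouched on this path */

	res.action = ACTION_SIGNAL_USER;
	res.code = fpe_code_from_flags(*fpcsr);
	*fpcsr &= ~(unsigned long)FLAG_ALL;
	return res;
}
```

When several flags are set at once, only the first match in the chain is
reported, which mirrors the if/else-if ordering in do_fpe_trap().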

Fixes: 27267655c531 ("openrisc: Support floating point user api")
Signed-off-by: Stafford Horne 
---
 arch/openrisc/kernel/traps.c | 48 ++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index 88fe27e4c10c..211ddaa0c5fa 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -180,29 +180,39 @@ asmlinkage void unhandled_exception(struct pt_regs *regs, int ea, int vector)
 
 asmlinkage void do_fpe_trap(struct pt_regs *regs, unsigned long address)
 {
-	int code = FPE_FLTUNK;
-	unsigned long fpcsr = regs->fpcsr;
-
-	if (fpcsr & SPR_FPCSR_IVF)
-		code = FPE_FLTINV;
-	else if (fpcsr & SPR_FPCSR_OVF)
-		code = FPE_FLTOVF;
-	else if (fpcsr & SPR_FPCSR_UNF)
-		code = FPE_FLTUND;
-	else if (fpcsr & SPR_FPCSR_DZF)
-		code = FPE_FLTDIV;
-	else if (fpcsr & SPR_FPCSR_IXF)
-		code = FPE_FLTRES;
-
-	/* Clear all flags */
-	regs->fpcsr &= ~SPR_FPCSR_ALLF;
-
-	force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+	if (user_mode(regs)) {
+		int code = FPE_FLTUNK;
+		unsigned long fpcsr = regs->fpcsr;
+
+		if (fpcsr & SPR_FPCSR_IVF)
+			code = FPE_FLTINV;
+		else if (fpcsr & SPR_FPCSR_OVF)
+			code = FPE_FLTOVF;
+		else if (fpcsr & SPR_FPCSR_UNF)
+			code = FPE_FLTUND;
+		else if (fpcsr & SPR_FPCSR_DZF)
+			code = FPE_FLTDIV;
+		else if (fpcsr & SPR_FPCSR_IXF)
+			code = FPE_FLTRES;
+
+		/* Clear all flags */
+		regs->fpcsr &= ~SPR_FPCSR_ALLF;
+
+		force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+	} else {
+		pr_emerg("KERNEL: Illegal fpe exception 0x%.8lx\n", regs->pc);
+		die("Die:", regs, SIGFPE);
+	}
 }
 
 asmlinkage void do_trap(struct pt_regs *regs, unsigned long address)
 {
-	force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+	if (user_mode(regs)) {
+		force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+	} else {
+		pr_emerg("KERNEL: Illegal trap exception 0x%.8lx\n", regs->pc);
+		die("Die:", regs, SIGILL);
+	}
 }
 
 asmlinkage void do_unaligned_access(struct pt_regs *regs, unsigned long address)
-- 
2.44.0




Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-11 Thread Michal Koutný
Hello.

On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton wrote:
> That seems like a large change.

In what sense is it large?

I tried to look up the code parts that depend on this default and either
added the other patches or mentioned the impact (that part could be more
thorough) in the commit message.

> It isn't clear why we'd want to merge this patchset.  Does it improve
> anyone's life and if so, how?

- kernel devs who don't care about policy
  - policy should be decided by distros/users, not in kernel

- users who need many threads
  - current default is too low
  - this is one more place to look at when configuring

- users who want to prevent fork-bombs
  - current default is ineffective (too high), false feeling of safety
  - i.e. they should configure appropriate mechanism appropriately


I thought that the first point alone would be convincing and that only
scaling impact might need clarification.

Regards,
Michal



[PATCH v2 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-10 Thread Andrew Davis
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly as part of a threaded IRQ handler.
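
As a rough user-space model of the new receive path: drain the hardware FIFO
and hand each message to the consumer right away, with no intermediate software
queue. Everything here (struct hw_fifo, rx_drain(), the callback type, the FIFO
depth) is a hypothetical stand-in and not the driver's API; the threaded IRQ
context itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_FIFO_DEPTH 4		/* hypothetical depth, for illustration */

/* Stand-in for the hardware mailbox FIFO. */
struct hw_fifo {
	uint32_t slots[HW_FIFO_DEPTH];
	size_t head;
	size_t count;
};

typedef void (*rx_callback_t)(void *ctx, uint32_t msg);

static int hw_fifo_empty(const struct hw_fifo *f)
{
	return f->count == 0;
}

static uint32_t hw_fifo_read(struct hw_fifo *f)
{
	uint32_t msg;

	assert(f->count > 0);
	msg = f->slots[f->head];
	f->head = (f->head + 1) % HW_FIFO_DEPTH;
	f->count--;
	return msg;
}

/* Deliver every pending message immediately; return how many were sent. */
static size_t rx_drain(struct hw_fifo *f, rx_callback_t cb, void *ctx)
{
	size_t n = 0;

	while (!hw_fifo_empty(f)) {
		cb(ctx, hw_fifo_read(f));
		n++;
	}
	return n;
}

/* Example consumer: accumulate the message values. */
static void sum_messages(void *ctx, uint32_t msg)
{
	*(uint64_t *)ctx += msg;
}
```

With the software queue gone there is no "queue full" state to track, so the
disable-IRQ and re-enable-from-workqueue dance in the old code has no
counterpart in this loop.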

Signed-off-by: Andrew Davis 
---
 drivers/mailbox/Kconfig|   9 ---
 drivers/mailbox/omap-mailbox.c | 107 ++---
 2 files changed, 5 insertions(+), 111 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-   int "Mailbox kfifo default buffer size (bytes)"
-   depends on OMAP2PLUS_MBOX
-   default 256
-   help
- Specify the default size of mailbox's kfifo buffers (bytes).
- This can also be changed at runtime (via the mbox_kfifo_size
- module parameter).
-
 config ROCKCHIP_MBOX
bool "Rockchip Soc Integrated Mailbox Support"
depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..46747559b438f 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
 	u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-	spinlock_t		lock;
-	struct kfifo		fifo;
-	struct work_struct	work;
-	struct omap_mbox	*mbox;
-	bool full;
-};
-
 struct omap_mbox_match_data {
 	u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
 	const char		*name;
 	int			irq;
-	struct omap_mbox_queue	*rxq;
 	struct omap_mbox_device *parent;
 	struct omap_mbox_fifo	tx_fifo;
 	struct omap_mbox_fifo	rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
 	bool			send_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_irq_t irq)
 	mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-	struct omap_mbox_queue *mq =
-			container_of(work, struct omap_mbox_queue, work);
-	u32 msg;
-	int len;
-
-	while (kfifo_len(&mq->fifo) >= sizeof(msg)) {
-		len = kfifo_out(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
-
-		mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-		spin_lock_irq(&mq->lock);
-		if (mq->full) {
-			mq->full = false;
-			omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-		}
-		spin_unlock_irq(&mq->lock);
-	}
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-	struct omap_mbox_queue *mq = mbox->rxq;
 	u32 msg;
-	int len;
 
 	while (!mbox_fifo_empty(mbox)) {
-		if (unlikely(kfifo_avail(&mq->fifo) < sizeof(msg))) {
-			omap_mbox_disable_irq(mbox, IRQ_RX);
-			mq->full = true;
-			goto nomem;
-		}
-
 		msg = mbox_fifo_read(mbox);
-
-		len = kfifo_in(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
+		mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
 	}
 
-	/* no more messages in the fifo. clear IRQ source. */
+	/* clear IRQ source. */
 	ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-	schedule_work(&mbox->rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-   void (*work)(struct work_struct *))
-{
-   struct omap_mbox_queue *mq;
- 

[PATCH v2 fs/proc/bootconfig 2/2] fs/proc: Skip bootloader comment if no embedded kernel parameters

2024-04-08 Thread Paul E. McKenney
From: Masami Hiramatsu 

If the "bootconfig" kernel command-line argument was specified or the
kernel was built with CONFIG_BOOT_CONFIG_FORCE, but there are no
embedded kernel parameters, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file.  This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.
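
The condition can be sketched in user space as follows. The helper
append_bootloader_comment() and its parameters are hypothetical; only the
format string and the shape of the check are taken from the diff below, with
has_extra_options standing in for cmdline_has_extra_options().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model: emit the bootloader comment only when
 * embedded parameters exist and the bootloader command line is non-empty.
 * Returns the snprintf() result, or 0 when nothing is written.
 */
static int append_bootloader_comment(char *dst, size_t size,
				     bool has_extra_options,
				     const char *boot_cmdline)
{
	assert(dst != NULL && boot_cmdline != NULL);
	if (size == 0)
		return 0;

	dst[0] = '\0';
	if (!has_extra_options || boot_cmdline[0] == '\0')
		return 0;

	return snprintf(dst, size, "# Parameters from bootloader:\n# %s\n",
			boot_cmdline);
}
```

A reader that finds no such comment can then fall back to /proc/cmdline, as the
commit message describes.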

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Paul E. McKenney 
---
 fs/proc/bootconfig.c   | 2 +-
 include/linux/bootconfig.h | 1 +
 init/main.c| 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
index e5635a6b127b0..87dcaae32ff87 100644
--- a/fs/proc/bootconfig.c
+++ b/fs/proc/bootconfig.c
@@ -63,7 +63,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t size)
 			dst += ret;
 		}
 	}
-	if (ret >= 0 && boot_command_line[0]) {
+	if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) {
 		ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n",
 			       boot_command_line);
 		if (ret > 0)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index ca73940e26df8..e5ee2c694401e 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -10,6 +10,7 @@
 #ifdef __KERNEL__
 #include 
 #include 
+bool __init cmdline_has_extra_options(void);
 #else /* !__KERNEL__ */
 /*
  * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
diff --git a/init/main.c b/init/main.c
index 2ca52474d0c30..881f6230ee59e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -487,6 +487,11 @@ static int __init warn_bootconfig(char *str)
 
 early_param("bootconfig", warn_bootconfig);
 
+bool __init cmdline_has_extra_options(void)
+{
+   return extra_command_line || extra_init_args;
+}
+
 /* Change NUL term back to "=", to make "param" the whole string. */
 static void __init repair_env_string(char *param, char *val)
 {
-- 
2.40.1




Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot
Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: arm-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 
8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090903.3jz667sn-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sysctl.c:23:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:522:36: warning: arithmetic between different 
enumeration types ('enum node_stat_item' and 'enum lru_list') 
[-Wenum-enum-conversion]
 522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
 |   ~~~ ^ ~~~
>> kernel/sysctl.c:1819:14: error: initializing 'void *' with an expression of 
>> type 'const int *' discards qualifiers 
>> [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
1819 | .extra2 = &pid_max_max,
 |   ^~~~
   1 warning and 1 error generated.
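
For readers unfamiliar with this diagnostic, here is a stand-alone sketch of
the pattern: a table field typed 'void *' initialised from the address of a
'const int'. The struct, the variable, and the explicit cast are illustrative
assumptions only; this is not necessarily how the patch should be fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table entry with an untyped bound pointer. */
struct table_entry {
	const char *name;
	void *extra2;
};

static const int limit_max = 4 * 1024 * 1024;

/*
 * Writing ".extra2 = &limit_max" would silently drop 'const', which is what
 * the compiler complains about. An explicit cast makes the intent visible;
 * the consumer must then treat the pointee as read-only.
 */
static struct table_entry entry = {
	.name = "example_limit",
	.extra2 = (void *)&limit_max,
};

/* Read the bound back through a const-qualified pointer. */
static int entry_upper_bound(const struct table_entry *e)
{
	assert(e != NULL && e->extra2 != NULL);
	return *(const int *)e->extra2;
}
```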


vim +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= &panic_timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= &sysctl_writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= &print_fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reb

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot
Hi Michal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: alpha-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090849.mgj3z0xi-...@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/sysctl.c:1819:35: warning: initialization discards 'const' qualifier 
>> from pointer target type [-Wdiscarded-qualifiers]
1819 | .extra2 = &pid_max_max,
 |   ^


vim +/const +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= &panic_timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= &sysctl_writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= &print_fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reboot-cmd",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1653 .data   
= reboot_command,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1654 .maxlen 
= 256,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1655 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1656 .proc_handler   
= proc_dostring,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1657 },
^1da177e4c3f41 Linus Torvalds  2005-04-16  1658 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1659 .procname   

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Andrew Morton
On Mon,  8 Apr 2024 16:58:18 +0200 Michal Koutný  wrote:

> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

That seems like a large change.

It isn't clear why we'd want to merge this patchset.  Does it improve
anyone's life and if so, how?




[PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný
pid_max is a per-pidns (thus global too) limit on the number of tasks the
kernel admits. The knob can be configured by the admin in the range between
pid_max_min and pid_max_max (sic). The default value sits between
those and it typically equals max(32k, 1k*nr_cpus).

The nr_cpu scaling was introduced in commit 72680a191b93 ("pids:
increase pid_max based on num_possible_cpus") to accommodate kernel's own
helper tasks (before workqueues). Generally, 1024 tasks/cpu cap is too
much if they were all running and it is also too little when they are
idle (memory being bottleneck).

The kernel also provides other mechanisms to restrict number of tasks --
threads-max sysctl and RLIMIT_NPROC with memory-scaled defaults and
generic pids cgroup controller (the last one being the solution of
fork-bombs, with qualified limits set up by admin).

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

This change thus increases initial pid_max from 32k to 4M (x86_64
defconfig).
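
The arithmetic behind these numbers can be checked with a small user-space
sketch. The constants mirror the values visible in the diff below (0x8000,
1024 per CPU, 4 * 1024 * 1024 on 64-bit without CONFIG_BASE_SMALL); the
function and macro names are made up for illustration.

```c
#include <assert.h>

#define OLD_PID_MAX_DEFAULT	0x8000			/* 32k */
#define OLD_PIDS_PER_CPU	1024			/* old per-CPU scaling */
#define PID_MAX_LIMIT_64BIT	(4 * 1024 * 1024)	/* 4M */

/* Old default: max(32k, 1k * nr_cpus), capped at the limit. */
static int old_default_pid_max(int nr_cpus)
{
	int scaled;

	assert(nr_cpus > 0);
	scaled = OLD_PIDS_PER_CPU * nr_cpus;
	if (scaled < OLD_PID_MAX_DEFAULT)
		scaled = OLD_PID_MAX_DEFAULT;
	if (scaled > PID_MAX_LIMIT_64BIT)
		scaled = PID_MAX_LIMIT_64BIT;
	return scaled;
}

/* New default after this patch: the end of the allowed range. */
static int new_default_pid_max(void)
{
	return PID_MAX_LIMIT_64BIT;
}

/* Relative increase in percent, truncated toward zero. */
static int percent_increase(int before, int after)
{
	assert(before > 0);
	return (int)(((long long)after - before) * 100 / before);
}
```

For a machine with up to 32 CPUs the old default is 32768, so the new default
of 4194304 is 128 times larger, i.e. a 12700% increase.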

This affects the size of the structure that alloc_pid/idr_alloc_cyclic
eventually uses and of the structure that kernel tracing uses with
'record-tgid' (~16 MiB).

Signed-off-by: Michal Koutný 
---
 include/linux/pid.h |  4 ++--
 include/linux/threads.h | 15 -------
 kernel/pid.c|  8 +++-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..0d191ac02958 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -106,8 +106,8 @@ extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 enum pid_type);
 
-extern int pid_max;
-extern int pid_max_min, pid_max_max;
+extern int pid_max_min, pid_max;
+extern const int pid_max_max;
 
 /*
  * look up a PID in the hash table. Must be called with the tasklist_lock
diff --git a/include/linux/threads.h b/include/linux/threads.h
index c34173e6c5f1..43f8f38a0c13 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -22,25 +22,18 @@
 
 #define MIN_THREADS_LEFT_FOR_ROOT 4
 
-/*
- * This controls the default maximum pid allocated to a process
- */
-#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
-
 /*
  * A maximum of 4 million PIDs should be enough for a while.
  * [NOTE: PID/TIDs are limited to 2^30 ~= 1 billion, see FUTEX_TID_MASK.]
  */
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
-   (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
+   (sizeof(long) > 4 ? 4 * 1024 * 1024 : 0x8000))
 
 /*
- * Define a minimum number of pids per cpu.  Heuristically based
- * on original pid max of 32k for 32 cpus.  Also, increase the
- * minimum settable value for pid_max on the running system based
- * on similar defaults.  See kernel/pid.c:pid_idr_init() for details.
+ * Define a minimum number of pids per cpu. Mainly to accommodate
+ * smpboot_register_percpu_thread() kernel threads.
+ * See kernel/pid.c:pid_idr_init() for details.
  */
-#define PIDS_PER_CPU_DEFAULT   1024
 #define PIDS_PER_CPU_MIN   8
 
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..24ae505ac3b0 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -60,10 +60,10 @@ struct pid init_struct_pid = {
}, }
 };
 
-int pid_max = PID_MAX_DEFAULT;
+int pid_max = PID_MAX_LIMIT;
 
 int pid_max_min = RESERVED_PIDS + 1;
-int pid_max_max = PID_MAX_LIMIT;
+const int pid_max_max = PID_MAX_LIMIT;
 /*
  * Pseudo filesystems start inode numbering after one. We use Reserved
  * PIDs as a natural offset.
@@ -652,9 +652,7 @@ void __init pid_idr_init(void)
/* Verify no one has done anything silly: */
BUILD_BUG_ON(PID_MAX_LIMIT >= PIDNS_ADDING);
 
-   /* bump default and minimum pid_max based on number of cpus */
-   pid_max = min(pid_max_max, max_t(int, pid_max,
-   PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+   /* bump minimum pid_max based on number of cpus */
pid_max_min = max_t(int, pid_max_min,
PIDS_PER_CPU_MIN * num_possible_cpus());
pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
-- 
2.44.0




[PATCH 0/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný
TL;DR excerpt from commit 02/03:

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

More details are in that commit's message. The other two commits are
related preparation and less related refresh in code that somewhat
references pid_max.

Michal Koutný (3):
  tracing: Remove dependency of saved_cmdlines_buffer on PID_MAX_DEFAULT
  kernel/pid: Remove default pid_max value
  tracing: Compare pid_max against pid_list capacity

 include/linux/pid.h   |  4 ++--
 include/linux/threads.h   | 15 ---
 kernel/pid.c  |  8 +++-
 kernel/trace/pid_list.c   |  6 +++---
 kernel/trace/pid_list.h   |  4 ++--
 kernel/trace/trace_sched_switch.c | 11 ++-
 6 files changed, 20 insertions(+), 28 deletions(-)


base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
-- 
2.44.0




Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-01 Thread Andrew Davis

On 4/1/24 6:39 PM, Hari Nagalla wrote:
> On 3/25/24 12:20, Andrew Davis wrote:
>> The kernel FIFO queue has a couple issues. The biggest issue is that
>> it causes extra latency in a path that can be used in real-time tasks,
>> such as communication with real-time remote processors.
>>
>> The whole FIFO idea itself looks to be a leftover from before the
>> unified mailbox framework. The current mailbox framework expects
>> mbox_chan_received_data() to be called with data immediately as it
>> arrives. Remove the FIFO and pass the messages to the mailbox
>> framework directly.
>
> Yes, this would definitely speed up the message receive path. With RT Linux,
> the irq runs in thread context, so that is OK. But with non-RT the whole
> receive path runs in interrupt context. So, I think it would be appropriate
> to use a threaded_irq()?


I was thinking the same at first, but it seems some mailbox drivers use
threaded context and others use non-threaded context. Since all we do in the
IRQ context anymore is call mbox_chan_received_data(), which is supposed to be
IRQ safe, it should be fine either way. So for now I just kept this using the
regular IRQ context as before.

If that does turn out to be an issue then let's switch to threaded.

Andrew



Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-01 Thread Hari Nagalla

On 3/25/24 12:20, Andrew Davis wrote:
> The kernel FIFO queue has a couple issues. The biggest issue is that
> it causes extra latency in a path that can be used in real-time tasks,
> such as communication with real-time remote processors.
>
> The whole FIFO idea itself looks to be a leftover from before the
> unified mailbox framework. The current mailbox framework expects
> mbox_chan_received_data() to be called with data immediately as it
> arrives. Remove the FIFO and pass the messages to the mailbox
> framework directly.

Yes, this would definitely speed up the message receive path. With RT
Linux, the irq runs in thread context, so that is OK. But with non-RT
the whole receive path runs in interrupt context. So, I think it would
be appropriate to use a threaded_irq()?




[PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-03-25 Thread Andrew Davis
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly.

Signed-off-by: Andrew Davis 
---
 drivers/mailbox/Kconfig|   9 ---
 drivers/mailbox/omap-mailbox.c | 103 +
 2 files changed, 3 insertions(+), 109 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-   int "Mailbox kfifo default buffer size (bytes)"
-   depends on OMAP2PLUS_MBOX
-   default 256
-   help
- Specify the default size of mailbox's kfifo buffers (bytes).
- This can also be changed at runtime (via the mbox_kfifo_size
- module parameter).
-
 config ROCKCHIP_MBOX
bool "Rockchip Soc Integrated Mailbox Support"
depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..4e7e0e2f537b0 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-   spinlock_t  lock;
-   struct kfifo    fifo;
-   struct work_struct  work;
-   struct omap_mbox    *mbox;
-   bool full;
-};
-
 struct omap_mbox_match_data {
u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
const char  *name;
int irq;
-   struct omap_mbox_queue  *rxq;
struct omap_mbox_device *parent;
struct omap_mbox_fifo   tx_fifo;
struct omap_mbox_fifo   rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
bool send_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, 
omap_mbox_irq_t irq)
mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-   struct omap_mbox_queue *mq =
-   container_of(work, struct omap_mbox_queue, work);
-   u32 msg;
-   int len;
-
-   while (kfifo_len(&mq->fifo) >= sizeof(msg)) {
-   len = kfifo_out(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-   WARN_ON(len != sizeof(msg));
-
-   mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-   spin_lock_irq(&mq->lock);
-   if (mq->full) {
-   mq->full = false;
-   omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-   }
-   spin_unlock_irq(&mq->lock);
-   }
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-   struct omap_mbox_queue *mq = mbox->rxq;
u32 msg;
-   int len;
 
while (!mbox_fifo_empty(mbox)) {
-   if (unlikely(kfifo_avail(&mq->fifo) < sizeof(msg))) {
-   omap_mbox_disable_irq(mbox, IRQ_RX);
-   mq->full = true;
-   goto nomem;
-   }
-
msg = mbox_fifo_read(mbox);
-
-   len = kfifo_in(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-   WARN_ON(len != sizeof(msg));
+   mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
}
 
-   /* no more messages in the fifo. clear IRQ source. */
+   /* clear IRQ source. */
ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-   schedule_work(&mbox->rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-   void (*work)(struct work_struct *))
-{
-   struct omap_mbox_queue *mq;
-   unsigned int size;
-
-  

[PATCH -next] fs: Fix kernel-doc comments to functions

2024-03-22 Thread Yang Li
This commit fixes the kernel-doc comments by adding complete parameter
descriptions for lookup_file(), lookup_dir_entry() and
lookup_file_dentry().

Signed-off-by: Yang Li 
---
 fs/tracefs/event_inode.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index dc067eeb6387..894c6ca1e500 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -336,6 +336,7 @@ static void update_inode_attr(struct dentry *dentry, struct 
inode *inode,
 
 /**
  * lookup_file - look up a file in the tracefs filesystem
+ * @parent_ei: Pointer to the eventfs_inode that represents parent of the file
  * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
@@ -389,6 +390,7 @@ static struct dentry *lookup_file(struct eventfs_inode 
*parent_ei,
 /**
  * lookup_dir_entry - look up a dir in the tracefs filesystem
  * @dentry: the directory to look up
+ * @pei: Pointer to the parent eventfs_inode if available
  * @ei: the eventfs_inode that represents the directory to create
  *
  * This function will look up a dentry for a directory represented by
@@ -478,16 +480,20 @@ void eventfs_d_release(struct dentry *dentry)
 
 /**
  * lookup_file_dentry - create a dentry for a file of an eventfs_inode
+ * @dentry: The parent dentry under which the new file's dentry will be created
  * @ei: the eventfs_inode that the file will be created under
  * @idx: the index into the entry_attrs[] of the @ei
- * @parent: The parent dentry of the created file.
- * @name: The name of the file to create
  * @mode: The mode of the file.
  * @data: The data to use to set the inode of the file with on open()
  * @fops: The fops of the file to be created.
  *
- * Create a dentry for a file of an eventfs_inode @ei and place it into the
- * address located at @e_dentry.
+ * This function creates a dentry for a file associated with an
+ * eventfs_inode @ei. It uses the entry attributes specified by @idx,
+ * if available. The file will have the specified @mode and its inode will be
+ * set up with @data upon open. The file operations will be set to @fops.
+ *
+ * Return: Returns a pointer to the newly created file's dentry or an error
+ * pointer.
  */
 static struct dentry *
 lookup_file_dentry(struct dentry *dentry,
-- 
2.20.1.7.g153144c




[PATCH v7 6/7] LoongArch: Add pv ipi support on guest kernel side

2024-03-15 Thread Bibo Mao
Add the PARAVIRT option and PV IPI support on the guest kernel side. The
function pv_ipi_init() installs the IPI sending and IPI receiving hooks.
It first checks whether the system runs in VM mode. If so, it calls
kvm_para_available() to detect the current hypervisor type. Only KVM
detection is supported for now, so the paravirt functions work only if
the current hypervisor is KVM, since KVM is the only hypervisor
supported on LoongArch.

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
virtual IPI sender, the IPI message is stored in DDR memory rather than
in emulated HW. IPI multicast is supported, and 128 vcpus can receive
IPIs at the same time, like the x86 KVM method. A hypercall is used for
IPI sending.

With the virtual IPI receiver, HW SW0 is used rather than the real IPI
HW. Since each VCPU has a separate HW SW0, like the HW timer, there is no
trap on IPI interrupt acknowledge. And since the IPI message is stored in
DDR, there is no trap when getting the IPI message.

Signed-off-by: Bibo Mao 
---
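A minimal userspace sketch of the memory-backed IPI message described
above, using C11 atomics. NR_FAKE_CPUS, ipi_message[], pv_send_ipi() and
pv_receive_ipi() are hypothetical names for illustration only. Treating
"no bits were pending" as the point where the sender would notify the
target (by hypercall in the scheme above) is an assumption, not something
shown in this excerpt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_FAKE_CPUS 4 /* hypothetical CPU count for the sketch */

/* One pending-action word per CPU, held in ordinary memory. */
static _Atomic uint32_t ipi_message[NR_FAKE_CPUS];

/*
 * Sender side: record the action bit for the target CPU.
 * Returns 1 if no action was pending before, which is where a real
 * sender would have to notify the target, and 0 if the target already
 * had work pending and no extra notification is needed (assumption).
 */
static int pv_send_ipi(unsigned int cpu, unsigned int action)
{
	uint32_t old;

	assert(cpu < NR_FAKE_CPUS && action < 32);
	old = atomic_fetch_or(&ipi_message[cpu], UINT32_C(1) << action);
	return old == 0;
}

/*
 * Receiver side: fetch and clear all pending action bits in one step,
 * so no trap into the hypervisor is needed to read the message.
 */
static uint32_t pv_receive_ipi(unsigned int cpu)
{
	assert(cpu < NR_FAKE_CPUS);
	return atomic_exchange(&ipi_message[cpu], 0);
}
```

Two sends to the same CPU before it runs its handler are coalesced into
one word, and the receiver sees both action bits in a single read.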
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index b274784c2e26..a1fccaf117aa 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -578,6 +578,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

[PATCH v6 6/7] LoongArch: Add pv ipi support on guest kernel side

2024-03-02 Thread Bibo Mao
Add the PARAVIRT option and PV IPI support on the guest kernel side. The
function pv_ipi_init() installs the IPI sending and IPI receiving hooks.
It first checks whether the system runs in VM mode. If so, it calls
kvm_para_available() to detect the current hypervisor type. Only KVM
detection is supported for now, so the paravirt functions work only if
the current hypervisor is KVM, since KVM is the only hypervisor
supported on LoongArch.

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
virtual IPI sender, the IPI message is stored in DDR memory rather than
in emulated HW. IPI multicast is supported, and 128 vcpus can receive
IPIs at the same time, like the x86 KVM method. A hypercall is used for
IPI sending.

With the virtual IPI receiver, HW SW0 is used rather than the real IPI
HW. Since each VCPU has a separate HW SW0, like the HW timer, there is no
trap on IPI interrupt acknowledge. And since the IPI message is stored in
DDR, there is no trap when getting the IPI message.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-25 Thread maobibo




On 2024/2/24 5:15 PM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao  wrote:


Paravirt interface pv_ipi_init() is added here for the guest kernel. It
first checks whether the system runs in VM mode. If so, it calls
kvm_para_available() to detect the current VMM type. Only the KVM VMM
type is detected for now, so the paravirt functions work only if the
current VMM is the KVM hypervisor, since KVM is the only hypervisor
supported on LoongArch.

pv_ipi_init() has no effect yet; it is a dummy function.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  9 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  1 +
  7 files changed, 87 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..af5d677a9052 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT 8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
  /*
   * LoongArch hypercall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..5cf794e8490f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(&config, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+   if (!cp

Re: [PATCH] ftrace: fix most kernel-doc warnings

2024-02-25 Thread Google
On Thu, 22 Feb 2024 21:48:33 -0800
Randy Dunlap  wrote:

> Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
> fix 42 kernel-doc warnings by (a) using the Returns: format for
> function return values or (b) using "@var:" instead of "@var -"
> for function parameter descriptions.
> 
> Fix one return values list so that it is formatted correctly when
> rendered for output.
> 
> Spell "non-zero" with a hyphen in several places.

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> 
> Signed-off-by: Randy Dunlap 
> Reported-by: kernel test robot 
> Link: 
> https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/
> Cc: Steven Rostedt 
> Cc: Masami Hiramatsu 
> Cc: Mathieu Desnoyers 
> Cc: Mark Rutland 
> Cc: linux-trace-ker...@vger.kernel.org
> ---
> This patch addresses most of the reported kernel-doc warnings but does
> not fix all of them, so I did not use "Closes:" for the Link: tag.
> 
>  kernel/trace/ftrace.c |   90 ++++----
>  1 file changed, 46 insertions(+), 44 deletions(-)
> 
> diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h
>   * Search a given @hash to see if a given instruction pointer (@ip)
>   * exists in it.
>   *
> - * Returns the entry that holds the @ip if found. NULL otherwise.
> + * Returns: the entry that holds the @ip if found. NULL otherwise.
>   */
>  struct ftrace_func_entry *
>  ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
> @@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct
>  
>  /**
>   * ftrace_free_filter - remove all filters for an ftrace_ops
> - * @ops - the ops to remove the filters from
> + * @ops: the ops to remove the filters from
>   */
>  void ftrace_free_filter(struct ftrace_ops *ops)
>  {
> @@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns
>   * @end: end of range to search (inclusive). @end points to the last byte
>   *   to check.
>   *
> - * Returns rec->ip if the related ftrace location is a least partly within
> + * Returns: rec->ip if the related ftrace location is a least partly within
>   * the given address range. That is, the first address of the instruction
>   * that is either a NOP or call to the function tracer. It checks the ftrace
>   * internal tables to determine if the address belongs or not.
> @@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi
>   * ftrace_location - return the ftrace location
>   * @ip: the instruction pointer to check
>   *
> - * If @ip matches the ftrace location, return @ip.
> - * If @ip matches sym+0, return sym's ftrace location.
> - * Otherwise, return 0.
> + * Returns:
> + * * If @ip matches the ftrace location, return @ip.
> + * * If @ip matches sym+0, return sym's ftrace location.
> + * * Otherwise, return 0.
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> @@ -1639,7 +1640,7 @@ out:
>   * @start: start of range to search
>   * @end: end of range to search (inclusive). @end points to the last byte to 
> check.
>   *
> - * Returns 1 if @start and @end contains a ftrace location.
> + * Returns: 1 if @start and @end contains a ftrace location.
>   * That is, the instruction that is either a NOP or call to
>   * the function tracer. It checks the ftrace internal tables to
>   * determine if the address belongs or not.
> @@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l
>   * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS
>   * is not set, then it wants to convert to the normal callback.
>   *
> - * Returns the address of the trampoline to set to
> + * Returns: the address of the trampoline to set to
>   */
>  unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
>  {
> @@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct
>   * a function that saves all the regs. Basically the '_EN' version
>   * represents the current state of the function.
>   *
> - * Returns the address of the trampoline that is currently being called
> + * Returns: the address of the trampoline that is currently being called
>   */
>  unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
>  {
> @@ -2719,7 +2720,7 @@ struct ftrace_rec_iter {
>  /**
>   * ftrace_rec_iter_start - start up iterating over traced functions
>   *
> - * Returns an iterator handle that is used to iterate over all
> + * Returns: an iterator handle that is used to iterate over all
>   * the records that represent address locations where functions
>   * are traced.
>   *
> @@ -2751,7 +2

Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-24 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao  wrote:
>
> Paravirt interface pv_ipi_init() is added here for the guest kernel. It
> first checks whether the system runs in VM mode. If so, it calls
> kvm_para_available() to detect the current VMM type. Only the KVM VMM
> type is detected for now, so the paravirt functions work only if the
> current VMM is the KVM hypervisor, since KVM is the only hypervisor
> supported on LoongArch.
>
> pv_ipi_init() has no effect yet; it is a dummy function.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|  9 
>  arch/loongarch/include/asm/kvm_para.h |  7 
>  arch/loongarch/include/asm/paravirt.h | 27 
>  .../include/asm/paravirt_api_clock.h  |  1 +
>  arch/loongarch/kernel/Makefile|  1 +
>  arch/loongarch/kernel/paravirt.c      | 41 +++
>  arch/loongarch/kernel/setup.c |  1 +
>  7 files changed, 87 insertions(+)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 929f68926b34..fdaae9a0435c 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index d48f993ae206..af5d677a9052 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
>  #ifndef _ASM_LOONGARCH_KVM_PARA_H
>  #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypercall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT 8
> +#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
> +
>  /*
>   * LoongArch hypercall return code
>   */
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c 
> b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index ..5cf794e8490f
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct static_key paravirt_steal_enabled;

[PATCH] ftrace: fix most kernel-doc warnings

2024-02-22 Thread Randy Dunlap
Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
fix 42 kernel-doc warnings by (a) using the Returns: format for
function return values or (b) using "@var:" instead of "@var -"
for function parameter descriptions.

Fix one return values list so that it is formatted correctly when
rendered for output.

Spell "non-zero" with a hyphen in several places.

Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Link: https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Cc: Mark Rutland 
Cc: linux-trace-ker...@vger.kernel.org
---
This patch addresses most of the reported kernel-doc warnings but does
not fix all of them, so I did not use "Closes:" for the Link: tag.
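For illustration, a small standalone function documented in the two
forms this patch converts to: "@var:" for parameters and a "Returns:"
section with a bulleted list of values. find_value() is a hypothetical
example and is not part of kernel/trace/ftrace.c.

```c
#include <stddef.h>

/**
 * find_value - look up a value in an array
 * @arr: the array to search
 * @len: number of elements in @arr
 * @needle: the value to look for
 *
 * Returns:
 * * the index of the first element equal to @needle, if found
 * * -1 otherwise
 */
static long find_value(const int *arr, size_t len, int needle)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (arr[i] == needle)
			return (long)i;
	}
	return -1;
}
```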

 kernel/trace/ftrace.c |   90 
 1 file changed, 46 insertions(+), 44 deletions(-)

diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h
  * Search a given @hash to see if a given instruction pointer (@ip)
  * exists in it.
  *
- * Returns the entry that holds the @ip if found. NULL otherwise.
+ * Returns: the entry that holds the @ip if found. NULL otherwise.
  */
 struct ftrace_func_entry *
 ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
@@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct
 
 /**
  * ftrace_free_filter - remove all filters for an ftrace_ops
- * @ops - the ops to remove the filters from
+ * @ops: the ops to remove the filters from
  */
 void ftrace_free_filter(struct ftrace_ops *ops)
 {
@@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns
  * @end: end of range to search (inclusive). @end points to the last byte
  * to check.
  *
- * Returns rec->ip if the related ftrace location is a least partly within
+ * Returns: rec->ip if the related ftrace location is a least partly within
  * the given address range. That is, the first address of the instruction
  * that is either a NOP or call to the function tracer. It checks the ftrace
  * internal tables to determine if the address belongs or not.
@@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi
  * ftrace_location - return the ftrace location
  * @ip: the instruction pointer to check
  *
- * If @ip matches the ftrace location, return @ip.
- * If @ip matches sym+0, return sym's ftrace location.
- * Otherwise, return 0.
+ * Returns:
+ * * If @ip matches the ftrace location, return @ip.
+ * * If @ip matches sym+0, return sym's ftrace location.
+ * * Otherwise, return 0.
  */
 unsigned long ftrace_location(unsigned long ip)
 {
@@ -1639,7 +1640,7 @@ out:
  * @start: start of range to search
  * @end: end of range to search (inclusive). @end points to the last byte to 
check.
  *
- * Returns 1 if @start and @end contains a ftrace location.
+ * Returns: 1 if @start and @end contains a ftrace location.
  * That is, the instruction that is either a NOP or call to
  * the function tracer. It checks the ftrace internal tables to
  * determine if the address belongs or not.
@@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l
  * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS
  * is not set, then it wants to convert to the normal callback.
  *
- * Returns the address of the trampoline to set to
+ * Returns: the address of the trampoline to set to
  */
 unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
 {
@@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct
  * a function that saves all the regs. Basically the '_EN' version
  * represents the current state of the function.
  *
- * Returns the address of the trampoline that is currently being called
+ * Returns: the address of the trampoline that is currently being called
  */
 unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
 {
@@ -2719,7 +2720,7 @@ struct ftrace_rec_iter {
 /**
  * ftrace_rec_iter_start - start up iterating over traced functions
  *
- * Returns an iterator handle that is used to iterate over all
+ * Returns: an iterator handle that is used to iterate over all
  * the records that represent address locations where functions
  * are traced.
  *
@@ -2751,7 +2752,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_
  * ftrace_rec_iter_next - get the next record to process.
  * @iter: The handle to the iterator.
  *
- * Returns the next iterator after the given iterator @iter.
+ * Returns: the next iterator after the given iterator @iter.
  */
 struct ftrace_rec_iter *ftrace_rec_iter_next(struct ftrace_rec_iter *iter)
 {
@@ -2776,7 +2777,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_
  * ftrace_rec_iter_record - get the record at the iterator location
  * @iter: The current iterator location
  *
- * Returns the record that the current @iter is at.
+ * Returns: the record that the current @iter is at.
  */
 st
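
For reference, the "Returns:" layout that the hunks above switch to can be sketched on a standalone function. This is only an illustration of the comment format: demo_location_range() and its table are hypothetical and are not ftrace code.

```c
#include <stddef.h>

/* Hypothetical table of patched call sites, sorted in ascending order. */
static const unsigned long demo_locations[] = { 0x1000, 0x1040, 0x10c0 };

/**
 * demo_location_range - return the first recorded location within a range
 * @start: start of range to search
 * @end: end of range to search (inclusive)
 *
 * Returns:
 * * The first recorded location in [@start, @end], if there is one.
 * * Otherwise, 0.
 */
unsigned long demo_location_range(unsigned long start, unsigned long end)
{
	size_t i;

	/* Linear scan is enough for a sketch; the table is tiny. */
	for (i = 0; i < sizeof(demo_locations) / sizeof(demo_locations[0]); i++) {
		if (demo_locations[i] >= start && demo_locations[i] <= end)
			return demo_locations[i];
	}
	return 0;
}
```

The bulleted form mirrors the ftrace_location() hunk, and the single-line "Returns: ..." form used in the other hunks follows the same section keyword.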

[PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-21 Thread Bibo Mao
The paravirt interface pv_ipi_init() is added here for the guest kernel. It
first checks whether the system runs in VM mode. If so, it calls
kvm_para_available() to detect the current VMM type. Only the KVM VMM type
is detected for now, so the paravirt functions work only if the current VMM
is the KVM hypervisor, which is the only hypervisor supported on LoongArch
at present.

pv_ipi_init() has no effect yet; it is a dummy function.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  1 +
 7 files changed, 87 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+	bool "Enable paravirtualization code"
+	depends on AS_HAS_LVZ_EXTENSION
+	help
+	  This changes the kernel so it can modify itself when it is run
+	  under a hypervisor, potentially improving performance significantly
+	  over full virtualization.  However, when run without a hypervisor
+	  the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..af5d677a9052 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT 8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
 /*
  * LoongArch hypercall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..5cf794e8490f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+           config = read_cpucfg(CPUCFG_KVM_SIG);
+           if (!memcmp(&config, KVM_SIGNATURE, 4))
+                   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+   if (!cpu_has_hypervisor)
+           return 0;
+   if (!kvm_para_available())
+           return 0;
+
+   return 1;
+}
diff --git 
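
The detection flow in paravirt.c above can be tried outside the kernel with a small userspace sketch. It is not the kernel implementation: the demo_* names are hypothetical, a function pointer stands in for read_cpucfg(CPUCFG_KVM_SIG), the signature bytes "KVM\0" are an assumption (the patch only shows the name KVM_SIGNATURE), and the caching in the static hypervisor_type variable is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the 4-byte signature is "KVM\0"; the patch does not show it. */
#define DEMO_KVM_SIGNATURE "KVM\0"

/* Stand-in for read_cpucfg(CPUCFG_KVM_SIG): returns the raw signature word. */
typedef uint32_t (*demo_cpucfg_reader)(void);

/* Simulates a guest running under KVM. */
static uint32_t demo_read_sig_kvm(void)
{
	uint32_t word;

	memcpy(&word, DEMO_KVM_SIGNATURE, sizeof(word));
	return word;
}

/* Simulates a hypervisor that does not expose the KVM signature. */
static uint32_t demo_read_sig_none(void)
{
	return 0;
}

/* Compares the signature word byte by byte, as the patch does with memcmp(). */
static bool demo_kvm_para_available(demo_cpucfg_reader read_sig)
{
	uint32_t config = read_sig();

	return memcmp(&config, DEMO_KVM_SIGNATURE, 4) == 0;
}

/* Mirrors pv_ipi_init(): 1 if paravirt features may be used, 0 otherwise. */
static int demo_pv_init(bool has_hypervisor, demo_cpucfg_reader read_sig)
{
	if (!has_hypervisor)
		return 0;
	if (!demo_kvm_para_available(read_sig))
		return 0;

	return 1;
}
```

Calling demo_pv_init(true, demo_read_sig_kvm) takes the success path; either a bare-metal boot (has_hypervisor false) or an unknown signature falls back to 0, which matches the two early returns in the patch.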

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 5:38 PM, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:




On 2024/2/19 4:48 PM, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 10:42 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system runs in VM mode. If so,
it calls kvm_para_available() to detect whether the current VMM is the
KVM hypervisor. The paravirt functions work only if the current VMM is
the KVM hypervisor, since KVM is the only hypervisor supported on
LoongArch at present.

This patch only adds the paravirt interface for the guest kernel; no
effective pv functions are added here.

Signed-off-by: Bibo Mao 
---
arch/loongarch/Kconfig|  9 
arch/loongarch/include/asm/kvm_para.h |  7 
arch/loongarch/include/asm/paravirt.h | 27 
.../include/asm/paravirt_api_clock.h  |  1 +
arch/loongarch/kernel/Makefile|  1 +
arch/loongarch/kernel/paravirt.c  | 41 +++
arch/loongarch/kernel/setup.c |  2 +
7 files changed, 88 insertions(+)
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
   bool
   default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
config ARCH_SUPPORTS_KEXEC
   def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
/*
 * LoongArch hypcall return code
 */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I wanted to remove this piece of code, but the kernel then fails
to compile when CONFIG_PARAVIRT is selected. As the reference code below
shows, paravirt_steal_clock() must be defined whenever CONFIG_PARAVIRT is
selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
   if (static_key_false(&paravirt_steal_enabled)) {
   u64 steal;

   steal = paravirt_steal_clock(smp_processor_id());
   steal -= this_rq()->prev_steal_time;
   steal = min(steal, maxtime);
   account_steal_time(steal);
   this_rq()->prev_steal_time += steal;

   return steal;
   }
#endif
   return 0;
}

OK, then keep it.
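
To make the dependency discussed above concrete, here is a userspace sketch of the accounting logic in the quoted steal_account_process_time(). It is an illustration only: the demo_* names are hypothetical, a plain function pointer stands in for the pv_steal_clock static call, an int stands in for the static key, and a global stands in for this_rq()->prev_steal_time.

```c
#include <stdint.h>

/* Default clock, like native_steal_clock() in the patch: nothing stolen. */
static uint64_t demo_native_steal_clock(int cpu)
{
	(void)cpu;
	return 0;
}

/* Hypothetical hypervisor-backed clock: reports 500 units stolen so far. */
static uint64_t demo_kvm_steal_clock(int cpu)
{
	(void)cpu;
	return 500;
}

/* Function pointer standing in for the pv_steal_clock static call. */
static uint64_t (*demo_pv_steal_clock)(int cpu) = demo_native_steal_clock;

static int demo_steal_enabled;        /* stands in for the static key */
static uint64_t demo_prev_steal_time; /* stands in for rq->prev_steal_time */
static uint64_t demo_accounted;       /* total passed to account_steal_time() */

/* Same steps as the quoted scheduler code: delta, clamp, account, remember. */
static uint64_t demo_steal_account(uint64_t maxtime)
{
	if (demo_steal_enabled) {
		uint64_t steal = demo_pv_steal_clock(0);

		steal -= demo_prev_steal_time;
		if (steal > maxtime)
			steal = maxtime;
		demo_accounted += steal;
		demo_prev_steal_time += steal;

		return steal;
	}
	return 0;
}
```

Because the generic code calls the clock unconditionally inside the CONFIG_PARAVIRT block, the architecture has to provide paravirt_steal_clock() even while the key is never enabled, which is the compile failure described in the reply.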




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE)

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:
>
>
>
> On 2024/2/19 4:48 PM, Huacai Chen wrote:
> > On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/2/19 10:42 AM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> >>>>
> >>>> This patch adds a paravirt interface for the guest kernel. The function
> >>>> pv_guest_init() first checks whether the system runs in VM mode. If so,
> >>>> it calls kvm_para_available() to detect whether the current VMM is the
> >>>> KVM hypervisor. The paravirt functions work only if the current VMM is
> >>>> the KVM hypervisor, since KVM is the only hypervisor supported on
> >>>> LoongArch at present.
> >>>>
> >>>> This patch only adds the paravirt interface for the guest kernel; no
> >>>> effective pv functions are added here.
> >>>>
> >>>> Signed-off-by: Bibo Mao 
> >>>> ---
> >>>>arch/loongarch/Kconfig|  9 ++++
> >>>>arch/loongarch/include/asm/kvm_para.h |  7 ++++
> >>>>arch/loongarch/include/asm/paravirt.h | 27 
> >>>>.../include/asm/paravirt_api_clock.h  |  1 +
> >>>>arch/loongarch/kernel/Makefile|  1 +
> >>>>    arch/loongarch/kernel/paravirt.c  | 41 +++
> >>>>arch/loongarch/kernel/setup.c |  2 +
> >>>>7 files changed, 88 insertions(+)
> >>>>create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>>>create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>>>create mode 100644 arch/loongarch/kernel/paravirt.c
> >>>>
> >>>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >>>> index 10959e6c3583..817a56dff80f 100644
> >>>> --- a/arch/loongarch/Kconfig
> >>>> +++ b/arch/loongarch/Kconfig
> >>>> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> >>>>   bool
> >>>>   default y
> >>>>
> >>>> +config PARAVIRT
> >>>> +   bool "Enable paravirtualization code"
> >>>> +   depends on AS_HAS_LVZ_EXTENSION
> >>>> +   help
> >>>> +  This changes the kernel so it can modify itself when it is run
> >>>> + under a hypervisor, potentially improving performance 
> >>>> significantly
> >>>> + over full virtualization.  However, when run without a 
> >>>> hypervisor
> >>>> + the kernel is theoretically slower and slightly larger.
> >>>> +
> >>>>config ARCH_SUPPORTS_KEXEC
> >>>>   def_bool y
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >>>> b/arch/loongarch/include/asm/kvm_para.h
> >>>> index 9425d3b7e486..41200e922a82 100644
> >>>> --- a/arch/loongarch/include/asm/kvm_para.h
> >>>> +++ b/arch/loongarch/include/asm/kvm_para.h
> >>>> @@ -2,6 +2,13 @@
> >>>>#ifndef _ASM_LOONGARCH_KVM_PARA_H
> >>>>#define _ASM_LOONGARCH_KVM_PARA_H
> >>>>
> >>>> +/*
> >>>> + * Hypcall code field
> >>>> + */
> >>>> +#define HYPERVISOR_KVM 1
> >>>> +#define HYPERVISOR_VENDOR_SHIFT8
> >>>> +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >>>> HYPERVISOR_VENDOR_SHIFT) + code)
> >>>> +
> >>>>/*
> >>>> * LoongArch hypcall return code
> >>>> */
> >>>> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >>>> b/arch/loongarch/include/asm/paravirt.h
> >>>> new file mode 100644
> >>>> index ..b64813592ba0
> >>>> --- /dev/null
> >>>> +++ b/arch/loongarch/include/asm/paravirt.h
> >>>> @@ -0,0 +1,27 @@
> >>>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>>> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >>>> +#define _ASM_LOONGARCH_PARAVIRT_H
> >>>> +
> >>>> +#ifdef CONFIG_PARAVIRT
> >>>> +#include 
> >

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 4:48 PM, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 10:42 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system runs in VM mode. If so,
it calls kvm_para_available() to detect whether the current VMM is the
KVM hypervisor. The paravirt functions work only if the current VMM is
the KVM hypervisor, since KVM is the only hypervisor supported on
LoongArch at present.

This patch only adds the paravirt interface for the guest kernel; no
effective pv functions are added here.

Signed-off-by: Bibo Mao 
---
   arch/loongarch/Kconfig|  9 
   arch/loongarch/include/asm/kvm_para.h |  7 
   arch/loongarch/include/asm/paravirt.h | 27 
   .../include/asm/paravirt_api_clock.h  |  1 +
   arch/loongarch/kernel/Makefile|  1 +
   arch/loongarch/kernel/paravirt.c  | 41 +++
   arch/loongarch/kernel/setup.c |  2 +
   7 files changed, 88 insertions(+)
   create mode 100644 arch/loongarch/include/asm/paravirt.h
   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
   create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
  bool
  default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
   config ARCH_SUPPORTS_KEXEC
  def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
   #ifndef _ASM_LOONGARCH_KVM_PARA_H
   #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
   /*
* LoongArch hypcall return code
*/
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I wanted to remove this piece of code, but the kernel then fails
to compile when CONFIG_PARAVIRT is selected. As the reference code below
shows, paravirt_steal_clock() must be defined whenever CONFIG_PARAVIRT is
selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
  if (static_key_false(&paravirt_steal_enabled)) {
  u64 steal;

  steal = paravirt_steal_clock(smp_processor_id());
  steal -= this_rq()->prev_steal_time;
  steal = min(steal, maxtime);
  account_steal_time(steal);
  this_rq()->prev_steal_time += steal;

  return steal;
  }
#endif
  return 0;
}

OK, then keep it.




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
   obj-$(CONFIG_STACKTRACE)   += stacktrace.o

   obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
>
>
>
> On 2024/2/19 10:42 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> >>
> >> This patch adds a paravirt interface for the guest kernel. The function
> >> pv_guest_init() first checks whether the system runs in VM mode. If so,
> >> it calls kvm_para_available() to detect whether the current VMM is the
> >> KVM hypervisor. The paravirt functions work only if the current VMM is
> >> the KVM hypervisor, since KVM is the only hypervisor supported on
> >> LoongArch at present.
> >>
> >> This patch only adds the paravirt interface for the guest kernel; no
> >> effective pv functions are added here.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig|  9 
> >>   arch/loongarch/include/asm/kvm_para.h     |  7 
> >>   arch/loongarch/include/asm/paravirt.h | 27 
> >>   .../include/asm/paravirt_api_clock.h  |  1 +
> >>   arch/loongarch/kernel/Makefile|  1 +
> >>   arch/loongarch/kernel/paravirt.c  | 41 +++
> >>   arch/loongarch/kernel/setup.c |  2 +
> >>   7 files changed, 88 insertions(+)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 10959e6c3583..817a56dff80f 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance 
> >> significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >> b/arch/loongarch/include/asm/kvm_para.h
> >> index 9425d3b7e486..41200e922a82 100644
> >> --- a/arch/loongarch/include/asm/kvm_para.h
> >> +++ b/arch/loongarch/include/asm/kvm_para.h
> >> @@ -2,6 +2,13 @@
> >>   #ifndef _ASM_LOONGARCH_KVM_PARA_H
> >>   #define _ASM_LOONGARCH_KVM_PARA_H
> >>
> >> +/*
> >> + * Hypcall code field
> >> + */
> >> +#define HYPERVISOR_KVM 1
> >> +#define HYPERVISOR_VENDOR_SHIFT8
> >> +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >> HYPERVISOR_VENDOR_SHIFT) + code)
> >> +
> >>   /*
> >>* LoongArch hypcall return code
> >>*/
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..b64813592ba0
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock(int cpu)
> >> +{
> >> +   return static_call(pv_steal_clock)(cpu);
> >> +}
> > The steal time code can be removed in this patch, I think.
> >
> Originally I wanted to remove this piece of code, but the kernel then fails
> to compile when CONFIG_PARAVIRT is selected. As the reference code below
> shows, paravirt_steal_clock() must be defined whenever CONFIG_PARAVIRT is
> selected.
>
> static __always_inline u64 steal_account_process_time(u64 maxtime)
> {
> #ifdef CON

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread maobibo




On 2024/2/19 10:42 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system runs in VM mode. If so,
it calls kvm_para_available() to detect whether the current VMM is the
KVM hypervisor. The paravirt functions work only if the current VMM is
the KVM hypervisor, since KVM is the only hypervisor supported on
LoongArch at present.

This patch only adds the paravirt interface for the guest kernel; no
effective pv functions are added here.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  9 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  2 +
  7 files changed, 88 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
  /*
   * LoongArch hypcall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.

Originally I wanted to remove this piece of code, but the kernel then fails
to compile when CONFIG_PARAVIRT is selected. As the reference code below
shows, paravirt_steal_clock() must be defined whenever CONFIG_PARAVIRT is
selected.


static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
u64 steal;

steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
steal = min(steal, maxtime);
account_steal_time(steal);
this_rq()->prev_steal_time += steal;

return steal;
}
#endif
return 0;
}


+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d0
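
As a worked example of the HYPERCALL_CODE() macro quoted in the patch above: the vendor number is shifted left by 8 bits and the service code is added in the low byte. The sketch below uses hypothetical DEMO_* names, parenthesizes the macro arguments (the quoted macro does not), and picks the service number 1 purely for illustration.

```c
#define DEMO_HYPERVISOR_KVM		1
#define DEMO_HYPERVISOR_VENDOR_SHIFT	8

/* Same arithmetic as the patch, with the arguments parenthesized. */
#define DEMO_HYPERCALL_CODE(vendor, code) \
	(((vendor) << DEMO_HYPERVISOR_VENDOR_SHIFT) + (code))

/* Hypothetical service number, chosen only for illustration. */
#define DEMO_SERVICE	1

/* Recover the vendor field from a packed hypercall code. */
static unsigned int demo_vendor_of(unsigned int hcall)
{
	return hcall >> DEMO_HYPERVISOR_VENDOR_SHIFT;
}

/* Recover the service field (low 8 bits) from a packed hypercall code. */
static unsigned int demo_service_of(unsigned int hcall)
{
	return hcall & ((1u << DEMO_HYPERVISOR_VENDOR_SHIFT) - 1);
}
```

With these values, DEMO_HYPERCALL_CODE(DEMO_HYPERVISOR_KVM, DEMO_SERVICE) is (1 << 8) + 1 = 0x101, and the two helpers split it back into vendor 1 and service 1. The round trip only holds while the service code fits in the low 8 bits.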

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
>
> This patch adds a paravirt interface for the guest kernel. The function
> pv_guest_init() first checks whether the system runs in VM mode. If so,
> it calls kvm_para_available() to detect whether the current VMM is the
> KVM hypervisor. The paravirt functions work only if the current VMM is
> the KVM hypervisor, since KVM is the only hypervisor supported on
> LoongArch at present.
>
> This patch only adds the paravirt interface for the guest kernel; no
> effective pv functions are added here.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|  9 
>  arch/loongarch/include/asm/kvm_para.h |  7 
>  arch/loongarch/include/asm/paravirt.h | 27 
>  .../include/asm/paravirt_api_clock.h  |  1 +
>  arch/loongarch/kernel/Makefile|  1 +
>  arch/loongarch/kernel/paravirt.c  | 41 +++
>  arch/loongarch/kernel/setup.c |  2 +
>  7 files changed, 88 insertions(+)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 10959e6c3583..817a56dff80f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index 9425d3b7e486..41200e922a82 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
>  #ifndef _ASM_LOONGARCH_KVM_PARA_H
>  #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypcall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT8
> +#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
> + code)
> +
>  /*
>   * LoongArch hypcall return code
>   */
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..b64813592ba0
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
The steal time code can be removed in this patch, I think.

> +
> +int pv_guest_init(void);
> +#else
> +static inline int pv_guest_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c 
> b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index ..21d01d05791a
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include

Re: [PATCH v2] x86/sgx: fix kernel-doc comment misuse

2024-02-12 Thread Jarkko Sakkinen
On Sun Feb 11, 2024 at 8:24 AM EET, Randy Dunlap wrote:
> Don't use "/**" for a non-kernel-doc comment. This prevents a warning
> from scripts/kernel-doc:
>
> main.c:740: warning: expecting prototype for A section metric is concatenated 
> in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
> instead
>
> Cc: Jarkko Sakkinen 
> Cc: Dave Hansen 
> Cc: linux-...@vger.kernel.org
> Cc: x...@kernel.org
> Reviewed-by: Kai Huang 
> Signed-off-by: Randy Dunlap 
> ---
> v2: add Rev-by: Kai Huang
>
>  arch/x86/kernel/cpu/sgx/main.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -731,7 +731,7 @@ out:
>   return 0;
>  }
>  
> -/**
> +/*
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
>   * metric.


Reviewed-by: Jarkko Sakkinen 

BR, Jarkko



Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-02-12 Thread syzbot
syzbot suspects this issue was fixed by commit:

commit ad579864637af46447208254719943179b69d41a
Author: Steven Rostedt (Google) 
Date:   Tue Jan 2 20:12:49 2024 +

tracefs: Check for dentry->d_inode exists in set_gid()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17659d2418
start commit:   453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: tracefs: Check for dentry->d_inode exists in set_gid()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



[PATCH v2] x86/sgx: fix kernel-doc comment misuse

2024-02-10 Thread Randy Dunlap
Don't use "/**" for a non-kernel-doc comment. This prevents a warning
from scripts/kernel-doc:

main.c:740: warning: expecting prototype for A section metric is concatenated 
in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
instead

Cc: Jarkko Sakkinen 
Cc: Dave Hansen 
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
Reviewed-by: Kai Huang 
Signed-off-by: Randy Dunlap 
---
v2: add Rev-by: Kai Huang

 arch/x86/kernel/cpu/sgx/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -731,7 +731,7 @@ out:
return 0;
 }
 
-/**
+/*
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
  * metric.
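
For context on why the opener matters: scripts/kernel-doc parses only comments that begin with a slash and two asterisks, and it expects them to name the function and describe its parameters. The following is a userspace sketch, not the SGX source. The function name and the masks are assumptions derived from the comment text in the hunk above; it shows a plain comment next to a comment written in kernel-doc form.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain comment: it opens with a single asterisk, so kernel-doc skips it.
 * The masks are derived from the wording of the comment in the patch and
 * are an illustration only, not a copy of the SGX driver.
 */
#define LOW_MASK   0xFFFFF000ULL   /* bits 12-31 of @low */
#define HIGH_MASK  0x000FFFFFULL   /* bits 0-19 of @high */

/**
 * calc_section_metric_sketch() - combine two register halves into a metric
 * @low:  value whose bits 12-31 become bits 12-31 of the metric
 * @high: value whose bits 0-19 become bits 32-51 of the metric
 *
 * Hypothetical helper. A comment in this layout is what the
 * double-asterisk opener promises to kernel-doc.
 *
 * Return: the combined 64-bit metric.
 */
uint64_t calc_section_metric_sketch(uint64_t low, uint64_t high)
{
	return (low & LOW_MASK) | ((high & HIGH_MASK) << 32);
}
```

With the opener reduced to a single asterisk, as the patch does, the tool no longer tries to match the free-form text against a function prototype.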



Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-02 Thread Luis Chamberlain
On Thu, Feb 01, 2024 at 03:27:54PM +0100, Marco Pagani wrote:
> 
> On 2024-01-30 21:47, Luis Chamberlain wrote:
> > 
> > It very much sounds like there is a desire to have this but without a
> > user, there is no justification.
> 
> I was working on a set of patches to fix an issue in the fpga subsystem
> when I came across your commit 557aafac1153 ("kernel/module: add
> documentation for try_module_get()") that made me realize we also had a
> safety problem. 
> 
> To solve this problem for the fpga manager, we had to add a mutex to
> ensure the low-level module still exists before calling
> try_module_get(). However, having a safer version of try_module_get()
> would have simplified the code and made it more robust against changes.
> 
> https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/
> 
> I suspect there may be other cases where try_module_get() is
> inadvertently called without ensuring that the module still exists
> that may benefit from a safer implementation.

Maybe so; however, I'm not yet sure this is safe from deadlocks.
Please work on a series of simple selftest modules which demonstrate
its use, and a simple bash selftest loader script which verifies this
won't break. Consider that third-party modules may also race with this,
as may other users which do not use this new API.

> >> +bool try_module_get_safe(struct module *module)
> >> +{
> >> +  struct module *mod;
> >> +  bool ret = true;
> >> +
> >> +  if (!module)
> >> +  goto out;
> >> +
> >> +  mutex_lock(_mutex);
> > 
> > If a user comes around then this should be mutex_lock_interruptible(),
> > and add might_sleep()
> 
> Would it be okay to return false if it gets interrupted, or should I
> change the return type to int to propagate -EINTR? My concern with
> changing the signature is that it would be less straightforward to
> use the function in place of try_module_get().

Since we want a safe mechanism, we might as well not make it a simple
drop-in replacement but a more robust one, so that users handle the
return value properly.
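
A minimal userspace sketch of what an int-returning variant could look like to a caller. Everything here is hypothetical: the names, the error codes chosen (-EINTR, -ENOENT) and the single-slot "registry" are assumptions for illustration, and the lock helper only simulates mutex_lock_interruptible() with a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module; not a kernel type. */
struct mod_sketch {
	bool live;
	int refcnt;
};

static struct mod_sketch *registered;	/* the one "loaded module", or NULL */
static bool signal_pending_sketch;	/* simulates a signal during the wait */

/* Models mutex_lock_interruptible(): 0 on success, -EINTR if interrupted. */
static int lock_interruptible_sketch(void)
{
	return signal_pending_sketch ? -EINTR : 0;
}

static void unlock_sketch(void)
{
}

/*
 * Returns 0 on success (reference taken, or m is NULL as in the RFC),
 * -EINTR if the lock wait was interrupted, and -ENOENT if the pointer
 * does not name a live registered module.
 */
int try_get_sketch(struct mod_sketch *m)
{
	int ret;

	if (!m)
		return 0;

	ret = lock_interruptible_sketch();
	if (ret)
		return ret;

	/* m is dereferenced only after it matched the registered pointer. */
	if (m == registered && m->live)
		m->refcnt++;
	else
		ret = -ENOENT;

	unlock_sketch();
	return ret;
}
```

With three distinct outcomes, a caller can no longer treat the result as a plain boolean, which is the point made above about not offering a drop-in replacement.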

  Luis



Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-02-01 Thread Luis Chamberlain
On Fri, Nov 03, 2023 at 09:20:44PM -0700, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org

Applied and pushed, thanks!

  Luis



Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-01 Thread Marco Pagani



On 2024-01-30 21:47, Luis Chamberlain wrote:
> On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
>> The current implementation of try_module_get() requires the module to
>> exist and be live as a precondition. While this may seem intuitive at
>> first glance, enforcing the precondition can be tricky, considering that
>> modules can be unloaded at any time if not previously taken. For
>> instance, the caller could be preempted just before calling
>> try_module_get(), and while preempted, the module could be unloaded and
>> freed. More subtly, the module could also be unloaded at any point while
>> executing try_module_get() before incrementing the refcount with
>> atomic_inc_not_zero().
>>
>> Neglecting the precondition that the module must exist and be live can
>> cause unexpected race conditions that can lead to crashes. However,
>> ensuring that the precondition is met may require additional locking
>> that increases the complexity of the code and can make it more
>> error-prone.
>>
>> This patch adds a slower yet safer implementation of try_module_get()
>> that checks if the module is valid by looking into the mod_tree before
>> taking the module's refcount. This new function can be safely called on
>> stale and invalid module pointers, relieving developers from the burden
>> of ensuring that the module exists and is live before attempting to take
>> it.
>>
>> The tree lookup and refcount increment are executed after taking the
>> module_mutex to prevent the module from being unloaded after looking up
>> the tree.
>>
>> Signed-off-by: Marco Pagani 
> 
> It very much sounds like there is a desire to have this but without a
> user, there is no justification.

I was working on a set of patches to fix an issue in the fpga subsystem
when I came across your commit 557aafac1153 ("kernel/module: add
documentation for try_module_get()") that made me realize we also had a
safety problem. 

To solve this problem for the fpga manager, we had to add a mutex to
ensure the low-level module still exists before calling
try_module_get(). However, having a safer version of try_module_get()
would have simplified the code and made it more robust against changes.

https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/

I suspect there may be other cases where try_module_get() is
inadvertently called without ensuring that the module still exists
that may benefit from a safer implementation.

>> +bool try_module_get_safe(struct module *module)
>> +{
>> +struct module *mod;
>> +bool ret = true;
>> +
>> +if (!module)
>> +goto out;
>> +
>> +mutex_lock(_mutex);
> 
> If a user comes around then this should be mutex_lock_interruptible(),
> and add might_sleep()

Would it be okay to return false if it gets interrupted, or should I
change the return type to int to propagate -EINTR? My concern with
changing the signature is that it would be less straightforward to
use the function in place of try_module_get().

>> +
>> +/*
>> + * Check if the address points to a valid live module and take
>> + * the refcount only if it points to the module struct.
>> + */
>> +mod = __module_address((unsigned long)module);
>> +if (mod && mod == module && module_is_live(mod))
>> +__module_get(mod);
>> +else
>> +ret = false;
>> +
>> +mutex_unlock(_mutex);
>> +
>> +out:
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(try_module_get_safe);
> 
> And EXPORT_SYMBOL_GPL() would need to be used.

Okay, I initially used EXPORT_SYMBOL() to be compatible with
try_module_get().

> 
> I'd also expect selftests to be expanded for this case, but again,
> without a user, this is just trying to resolve a problem which does not
> exist.

I can add selftests in the next version.
Thanks,
Marco




Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-01-31 Thread Randy Dunlap
Hi,

Any comments on this patch?
Thanks.


On 11/3/23 21:20, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org
> ---
>  lib/test_kmod.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff -- a/lib/test_kmod.c b/lib/test_kmod.c
> --- a/lib/test_kmod.c
> +++ b/lib/test_kmod.c
> @@ -58,11 +58,14 @@ static int num_test_devs;
>   * @need_mod_put for your tests case.
>   */
>  enum kmod_test_case {
> + /* private: */
>   __TEST_KMOD_INVALID = 0,
> + /* public: */
>  
>   TEST_KMOD_DRIVER,
>   TEST_KMOD_FS_TYPE,
>  
> + /* private: */
>   __TEST_KMOD_MAX,
>  };
>  
> @@ -82,6 +85,7 @@ struct kmod_test_device;
>   * @ret_sync: return value if request_module() is used, sync request for
>   *   @TEST_KMOD_DRIVER
>   * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
> + * @task_sync: kthread's task_struct or %NULL if not running
>   * @thread_idx: thread ID
>   * @test_dev: test device test is being performed under
>   * @need_mod_put: Some tests (get_fs_type() is one) requires putting the 
> module
> @@ -108,7 +112,7 @@ struct kmod_test_device_info {
>   * @dev: pointer to misc_dev's own struct device
>   * @config_mutex: protects configuration of test
>   * @trigger_mutex: the test trigger can only be fired once at a time
> - * @thread_lock: protects @done count, and the @info per each thread
> + * @thread_mutex: protects @done count, and the @info per each thread
>   * @done: number of threads which have completed or failed
>   * @test_is_oom: when we run out of memory, use this to halt moving forward
>   * @kthreads_done: completion used to signal when all work is done

-- 
#Randy



[PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-31 Thread Bibo Mao
This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system is running in VM mode. If
it is, it calls kvm_para_available() to detect whether the current VMM is
the KVM hypervisor. The paravirt functions work only if the current VMM is
KVM, since KVM is the only hypervisor supported on LoongArch for now.

This patch only adds the paravirt interface for the guest kernel; no
effective pv functions are added here.
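
The detection flow described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch code: the two fake_* variables stand in for cpu_has_hypervisor and read_cpucfg(CPUCFG_KVM_SIG), the signature bytes are assumed to be 'K', 'V', 'M', '\0', and the caching of the result in a static variable is left out so that the inputs can be varied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the patch. */
#define HYPERVISOR_KVM            1
#define HYPERVISOR_VENDOR_SHIFT   8
/* Arguments are parenthesized here; the macro in the patch leaves them bare. */
#define HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Assumption: the signature is the four bytes 'K', 'V', 'M', '\0'. */
#define KVM_SIGNATURE_SKETCH "KVM\0"

/* Hypothetical stand-ins for cpu_has_hypervisor and the CPUCFG read. */
static bool fake_has_hypervisor;
static uint32_t fake_cpucfg_sig;

static bool kvm_para_available_sketch(void)
{
	uint32_t config = fake_cpucfg_sig;

	/* Compare the raw register bytes with the 4-byte signature. */
	return memcmp(&config, KVM_SIGNATURE_SKETCH, 4) == 0;
}

/* Returns 1 only when running as a guest and the VMM identifies as KVM. */
int pv_guest_init_sketch(void)
{
	if (!fake_has_hypervisor)
		return 0;
	if (!kvm_para_available_sketch())
		return 0;
	return 1;
}
```

HYPERCALL_CODE() places the vendor number above the low eight bits, so vendor 1 with function code 3 encodes as 0x103.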

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+     This changes the kernel so it can modify itself when it is run
+     under a hypervisor, potentially improving performance significantly
+     over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM                 1
+#define HYPERVISOR_VENDOR_SHIFT        8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   

Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Luis Chamberlain
On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
> The current implementation of try_module_get() requires the module to
> exist and be live as a precondition. While this may seem intuitive at
> first glance, enforcing the precondition can be tricky, considering that
> modules can be unloaded at any time if not previously taken. For
> instance, the caller could be preempted just before calling
> try_module_get(), and while preempted, the module could be unloaded and
> freed. More subtly, the module could also be unloaded at any point while
> executing try_module_get() before incrementing the refcount with
> atomic_inc_not_zero().
> 
> Neglecting the precondition that the module must exist and be live can
> cause unexpected race conditions that can lead to crashes. However,
> ensuring that the precondition is met may require additional locking
> that increases the complexity of the code and can make it more
> error-prone.
> 
> This patch adds a slower yet safer implementation of try_module_get()
> that checks if the module is valid by looking into the mod_tree before
> taking the module's refcount. This new function can be safely called on
> stale and invalid module pointers, relieving developers from the burden
> of ensuring that the module exists and is live before attempting to take
> it.
> 
> The tree lookup and refcount increment are executed after taking the
> module_mutex to prevent the module from being unloaded after looking up
> the tree.
> 
> Signed-off-by: Marco Pagani 

It very much sounds like there is a desire to have this but without a
user, there is no justification.

> +bool try_module_get_safe(struct module *module)
> +{
> + struct module *mod;
> + bool ret = true;
> +
> + if (!module)
> + goto out;
> +
> + mutex_lock(_mutex);

If a user comes around then this should be mutex_lock_interruptible(),
and add might_sleep()

> +
> + /*
> +  * Check if the address points to a valid live module and take
> +  * the refcount only if it points to the module struct.
> +  */
> + mod = __module_address((unsigned long)module);
> + if (mod && mod == module && module_is_live(mod))
> + __module_get(mod);
> + else
> + ret = false;
> +
> + mutex_unlock(_mutex);
> +
> +out:
> + return ret;
> +}
> +EXPORT_SYMBOL(try_module_get_safe);

And EXPORT_SYMBOL_GPL() would need to be used.

I'd also expect selftests to be expanded for this case, but again,
without a user, this is just trying to resolve a problem which does not
exist.

  Luis



[RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Marco Pagani
The current implementation of try_module_get() requires the module to
exist and be live as a precondition. While this may seem intuitive at
first glance, enforcing the precondition can be tricky, considering that
modules can be unloaded at any time if not previously taken. For
instance, the caller could be preempted just before calling
try_module_get(), and while preempted, the module could be unloaded and
freed. More subtly, the module could also be unloaded at any point while
executing try_module_get() before incrementing the refcount with
atomic_inc_not_zero().

Neglecting the precondition that the module must exist and be live can
cause unexpected race conditions that can lead to crashes. However,
ensuring that the precondition is met may require additional locking
that increases the complexity of the code and can make it more
error-prone.

This patch adds a slower yet safer implementation of try_module_get()
that checks if the module is valid by looking into the mod_tree before
taking the module's refcount. This new function can be safely called on
stale and invalid module pointers, relieving developers from the burden
of ensuring that the module exists and is live before attempting to take
it.

The tree lookup and refcount increment are executed after taking the
module_mutex to prevent the module from being unloaded after looking up
the tree.
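
The idea of validating a possibly stale pointer by lookup before touching it can be modelled in userspace as below. This is a sketch under assumptions, not the kernel implementation: a small array stands in for mod_tree, a pthread mutex stands in for module_mutex, all names are hypothetical, and the "is live" state check is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODS 4

/* Hypothetical userspace stand-in for struct module. */
struct mod_sketch {
	int refcnt;
};

/* Stand-ins for mod_tree and module_mutex. */
static struct mod_sketch *mod_table[MAX_MODS];
static pthread_mutex_t mod_lock = PTHREAD_MUTEX_INITIALIZER;

void register_mod(struct mod_sketch *m)
{
	pthread_mutex_lock(&mod_lock);
	for (size_t i = 0; i < MAX_MODS; i++) {
		if (!mod_table[i]) {
			mod_table[i] = m;
			break;
		}
	}
	pthread_mutex_unlock(&mod_lock);
}

void unregister_mod(struct mod_sketch *m)
{
	pthread_mutex_lock(&mod_lock);
	for (size_t i = 0; i < MAX_MODS; i++) {
		if (mod_table[i] == m)
			mod_table[i] = NULL;
	}
	pthread_mutex_unlock(&mod_lock);
}

/*
 * The pointer is only compared, never dereferenced, until the lookup
 * has confirmed that it is still registered.
 */
bool try_get_checked(struct mod_sketch *m)
{
	bool found = false;

	if (!m)
		return true;	/* mirrors the RFC: NULL counts as success */

	pthread_mutex_lock(&mod_lock);
	for (size_t i = 0; i < MAX_MODS; i++) {
		if (mod_table[i] == m) {
			m->refcnt++;	/* entry cannot be removed while locked */
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&mod_lock);
	return found;
}
```

Because unregistering takes the same lock, an entry found by the lookup cannot disappear between the lookup and the increment, which is the property the paragraph above relies on.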

Signed-off-by: Marco Pagani 
---
 include/linux/module.h | 15 +++
 kernel/module/main.c   | 27 +++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 08364d5cbc07..86b6ea43d204 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -695,6 +695,19 @@ extern void __module_get(struct module *module);
  */
 extern bool try_module_get(struct module *module);
 
+/**
+ * try_module_get_safe() - safely take the refcount of a module.
+ * @module: address of the module to be taken.
+ *
+ * Safer version of try_module_get(). Check first if the module exists and is
+ * alive, and then take its reference count.
+ *
+ * Return:
+ * * %true - module exists and its refcount has been incremented or module is NULL.
+ * * %false - module does not exist.
+ */
+extern bool try_module_get_safe(struct module *module);
+
 /**
  * module_put() - release a reference count to a module
  * @module: the module we should release a reference count for
@@ -815,6 +828,8 @@ static inline bool try_module_get(struct module *module)
return true;
 }
 
+#define try_module_get_safe(module) try_module_get(module)
+
 static inline void module_put(struct module *module)
 {
 }
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..22376b69778c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -842,6 +842,33 @@ bool try_module_get(struct module *module)
 }
 EXPORT_SYMBOL(try_module_get);
 
+bool try_module_get_safe(struct module *module)
+{
+   struct module *mod;
+   bool ret = true;
+
+   if (!module)
+   goto out;
+
+   mutex_lock(_mutex);
+
+   /*
+* Check if the address points to a valid live module and take
+* the refcount only if it points to the module struct.
+*/
+   mod = __module_address((unsigned long)module);
+   if (mod && mod == module && module_is_live(mod))
+   __module_get(mod);
+   else
+   ret = false;
+
+   mutex_unlock(_mutex);
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL(try_module_get_safe);
+
 void module_put(struct module *module)
 {
int ret;

base-commit: 4515d08a742c76612b65d2f47a87d12860519842
-- 
2.43.0




[PATCH v3 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-22 Thread Bibo Mao
This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system is running in VM mode. If
it is, it calls kvm_para_available() to detect whether the current VMM is
the KVM hypervisor. The paravirt functions work only if the current VMM is
KVM, since KVM is the only hypervisor supported on LoongArch for now.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+     This changes the kernel so it can modify itself when it is run
+     under a hypervisor, potentially improving performance significantly
+     over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM                 1
+#define HYPERVISOR_VENDOR_SHIFT        8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
in

BUG: unable to handle kernel paging request in __skb_flow_dissect

2024-01-16 Thread Ubisectech Sirius
Hello.
We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
Recently, our team discovered an issue in Linux kernel 6.7.0-g052d534373b7.
A PoC file for the issue is attached to this email.
Stack dump:
[ 185.664167][ T8332] BUG: unable to handle page fault for address: 
ed1029c40001
[ 185.665134][ T8332] #PF: supervisor read access in kernel mode
[ 185.665877][ T8332] #PF: error_code(0x) - not-present page
[ 185.666481][ T8332] PGD 7ffd0067 P4D 7ffd0067 PUD 3fff5067 PMD 0
[ 185.667129][ T8332] Oops:  [#1] PREEMPT SMP KASAN
[ 185.667719][ T8332] CPU: 1 PID: 8332 Comm: poc Not tainted 
6.7.0-g052d534373b7 #19
[ 185.668641][ T8332] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.15.0-1 04/01/2014
[ 185.669639][ T8332] RIP: 0010:__skb_flow_dissect 
(net/core/flow_dissector.c:1170 (discriminator 1))
[ 185.682210][ T8332] Call Trace:
[ 185.682595][ T8332] 
[ 185.717256][ T8332] __skb_get_hash (net/core/flow_dissector.c:1737 
net/core/flow_dissector.c:1770 net/core/flow_dissector.c:1794 
net/core/flow_dissector.c:1856)
[ 185.721978][ T8332] ip_tunnel_xmit (./include/linux/skbuff.h:1566 
net/ipv4/ip_tunnel.c:748)
[ 185.727788][ T8332] ipip_tunnel_xmit (net/ipv4/ipip.c:308)
[ 185.728396][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 
net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.729082][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 
net/core/dev.c:4352)
[ 185.736814][ T8332] neigh_connected_output (./include/linux/netdevice.h:3171 
net/core/neighbour.c:1592)
[ 185.737536][ T8332] ip_finish_output2 (./include/net/neighbour.h:542 
net/ipv4/ip_output.c:235)
[ 185.742239][ T8332] __ip_finish_output (net/ipv4/ip_output.c:313 
net/ipv4/ip_output.c:295)
[ 185.742943][ T8332] ip_finish_output (net/ipv4/ip_output.c:323)
[ 185.743556][ T8332] ip_mc_output (./include/linux/netfilter.h:303 
net/ipv4/ip_output.c:420)
[ 185.744137][ T8332] ip_local_out (./include/net/dst.h:451 
net/ipv4/ip_output.c:129)
[ 185.744746][ T8332] iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84 
(discriminator 4))
[ 185.745390][ T8332] ip_tunnel_xmit (net/ipv4/ip_tunnel.c:833)
[ 185.750430][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 
net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.751114][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 
net/core/dev.c:4352)
[ 185.759138][ T8332] __bpf_redirect (./include/linux/netdevice.h:3367 
net/core/filter.c:2136 net/core/filter.c:2165 net/core/filter.c:2188)
[ 185.759757][ T8332] bpf_clone_redirect (net/core/filter.c:2459 
net/core/filter.c:2431)
[ 185.761088][ T8332] ___bpf_prog_run (kernel/bpf/core.c:1986)
[ 185.762499][ T8332] __bpf_prog_run512 (kernel/bpf/core.c:2227)
[ 185.778478][ T8332] bpf_test_run (./include/linux/bpf.h:1231 
./include/linux/filter.h:651 ./include/linux/filter.h:658 
net/bpf/test_run.c:423)
[ 185.783715][ T8332] bpf_prog_test_run_skb (net/bpf/test_run.c:1057)
[ 185.786538][ T8332] __sys_bpf (kernel/bpf/syscall.c:4107 
kernel/bpf/syscall.c:5475)
[ 185.793454][ T8332] __x64_sys_bpf (kernel/bpf/syscall.c:5559)
[ 185.794810][ T8332] do_syscall_64 (arch/x86/entry/common.c:52 
arch/x86/entry/common.c:83)
[ 185.795399][ T8332] entry_SYSCALL_64_after_hwframe 
(arch/x86/entry/entry_64.S:129)
[ 185.796182][ T8332] RIP: 0033:0x7f4f8955df29
Analysis of the issue:
The faulting code is in the __skb_flow_dissect function
(net/core/flow_dissector.c:1170). The code is:
	iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
	if (!iph || iph->ihl < 5) {
		fdret = FLOW_DISSECT_RET_OUT_BAD;
		break;
	}
It looks like the function __skb_header_pointer can return an invalid address,
and the iph->ihl access then reads from that address. So I think the issue is
a missing check on whether iph is valid.
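
To make the distinction between a NULL pointer and an invalid one concrete, here is a simplified userspace model of a bounds-checked header accessor. It is an assumption-laden sketch, not the kernel's __skb_header_pointer(): it handles only a linear buffer, the function names are hypothetical, and it does not diagnose the reported crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of a bounds-checked header accessor.
 * @data and @hlen describe the buffer the caller believes it has.
 * Returns NULL when the requested range does not fit in that buffer.
 */
const void *header_pointer_sketch(const uint8_t *data, int hlen,
				  int offset, int len)
{
	if (offset < 0 || len < 0 || hlen - offset < len)
		return NULL;
	return data + offset;
}

/* The low nibble of the first IPv4 header byte is the length in words. */
int ipv4_ihl_sketch(const uint8_t *hdr)
{
	return hdr[0] & 0x0F;
}
```

In this model a NULL check protects the caller only as far as data and hlen are accurate: if hlen overstates the real buffer, the accessor returns a non-NULL pointer whose range extends past the buffer, and a later read through it can fault.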
Thank you for taking the time to read this email and we look forward to working 
with you further.
 Ubisectech Sirius Team
 Web: www.ubisectech.com
 Email: bugrep...@ubisectech.com




poc.c
Description: Binary data


[PATCH v2 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-07 Thread Bibo Mao
This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system is running in VM mode. If
it is, it calls kvm_para_available() to detect whether the current VMM is
the KVM hypervisor. The paravirt functions work only if the current VMM is
KVM, since KVM is the only hypervisor supported on LoongArch for now.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..d8ccaf46a50d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(&config, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index d183a745fb
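
As a reading aid for the hypercall code field defined in kvm_para.h above, the following userspace sketch shows how a vendor id and a code combine into one value. HYPERVISOR_KVM and HYPERVISOR_VENDOR_SHIFT are copied from the patch; everything prefixed EX_ or ex_ is hypothetical and not part of the posted series. The decode helpers assume the code field fits in the bits below the vendor shift, and the macro variant parenthesizes its arguments, which the posted macro does not.

```c
#include <assert.h>

/* Values copied from the patch. */
#define HYPERVISOR_KVM			1
#define HYPERVISOR_VENDOR_SHIFT		8

/*
 * Variant of the patch's macro with each argument parenthesized.
 * Assumption: this hardening is not in the posted patch.
 */
#define EX_HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Hypothetical helper: extract the vendor field from an encoded value. */
unsigned int ex_hypercall_vendor(unsigned int hc)
{
	return hc >> HYPERVISOR_VENDOR_SHIFT;
}

/* Hypothetical helper: extract the code field (bits below the shift). */
unsigned int ex_hypercall_code(unsigned int hc)
{
	return hc & ((1u << HYPERVISOR_VENDOR_SHIFT) - 1);
}

/* Self-check: encode, decode, and pass an expression as an argument. */
void ex_hypercall_self_check(void)
{
	unsigned int hc = EX_HYPERCALL_CODE(HYPERVISOR_KVM, 3);

	assert(hc == 0x103);
	assert(ex_hypercall_vendor(hc) == HYPERVISOR_KVM);
	assert(ex_hypercall_code(hc) == 3);

	/* With parenthesized arguments, (1 | 2) is shifted as a whole. */
	assert(EX_HYPERCALL_CODE(1 | 2, 3) == 0x303);
}
```

Without the extra parentheses, an argument such as 1 | 2 would expand to 1 | 2 << 8, because the shift binds tighter than the bitwise OR. All callers in the patch pass plain constants, so this only matters if the macro is later used with expressions.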

Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-01-03 Thread Steven Rostedt
On Wed, 03 Jan 2024 13:41:31 -0800
syzbot  wrote:

> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
> git tree:   upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
> dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz
> 
> The issue was bisected to:
> 
> commit 7e8358edf503e87236c8d07f69ef0ed846dd5112
> Author: Steven Rostedt (Google) 
> Date:   Fri Dec 22 00:07:57 2023 +
> 
> eventfs: Fix file and directory uid and gid ownership
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8
> final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8
> console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com
> Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")
> 
> BUG: unable to handle page fault for address: fff0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> PGD d734067 P4D d734067 PUD d736067 PMD 0 
> Oops:  [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted 
> 6.7.0-rc7-syzkaller-00049-g453f5db0619e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 11/17/2023
> RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
> RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
> Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
> 00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 
> 83 e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
> RSP: 0018:c900040ffca8 EFLAGS: 00010246
> RAX: 1ffe RBX: fff0 RCX: 888014bf5940
> RDX:  RSI: 0004 RDI: c900040ffc20
> RBP: dc00 R08: 0003 R09: f5200081ff84
> R10: dc00 R11: f5200081ff84 R12: 88801d743888
> R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
> FS:  557dd480() GS:8880b980() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: fff0 CR3: 1ec48000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353
>  reconfigure_super+0x440/0x870 fs/super.c:1143
>  do_remount fs/namespace.c:2884 [inline]

This is the same bug that was fixed by:

   
https://lore.kernel.org/linux-trace-kernel/20240102151249.05da2...@gandalf.local.home/

And just waiting to be applied:

   https://lore.kernel.org/all/20240102210731.1f1c5...@gandalf.local.home/

Thanks,

-- Steve

>  path_mount+0xc24/0xfa0 fs/namespace.c:3656
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> RIP: 0033:0x7fec326e8d99
> Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99
> RDX:  RSI: 20c0 RDI: 
> RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80
> R10: 02200022 R11: 0246 R12: 
> R13: 7ffc8103e068 R14: 0001 R15: 0001
>  
> Modules linked in:
> CR2: fff0
> ---[ end trace  ]---
> RIP: 0010:set_gid fs/tracefs/inode.c:224

[syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-01-03 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
git tree:   upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8
kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz

The issue was bisected to:

commit 7e8358edf503e87236c8d07f69ef0ed846dd5112
Author: Steven Rostedt (Google) 
Date:   Fri Dec 22 00:07:57 2023 +

eventfs: Fix file and directory uid and gid ownership

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8
final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8
console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com
Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")

BUG: unable to handle page fault for address: fff0
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD d734067 P4D d734067 PUD d736067 PMD 0 
Oops:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted 
6.7.0-rc7-syzkaller-00049-g453f5db0619e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
11/17/2023
RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 
e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
RSP: 0018:c900040ffca8 EFLAGS: 00010246
RAX: 1ffe RBX: fff0 RCX: 888014bf5940
RDX:  RSI: 0004 RDI: c900040ffc20
RBP: dc00 R08: 0003 R09: f5200081ff84
R10: dc00 R11: f5200081ff84 R12: 88801d743888
R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
FS:  557dd480() GS:8880b980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff0 CR3: 1ec48000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353
 reconfigure_super+0x440/0x870 fs/super.c:1143
 do_remount fs/namespace.c:2884 [inline]
 path_mount+0xc24/0xfa0 fs/namespace.c:3656
 do_mount fs/namespace.c:3677 [inline]
 __do_sys_mount fs/namespace.c:3886 [inline]
 __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fec326e8d99
Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99
RDX:  RSI: 20c0 RDI: 
RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80
R10: 02200022 R11: 0246 R12: 
R13: 7ffc8103e068 R14: 0001 R15: 0001
 
Modules linked in:
CR2: fff0
---[ end trace  ]---
RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 
e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
RSP: 0018:c900040ffca8 EFLAGS: 00010246
RAX: 1ffe RBX: fff0 RCX: 888014bf5940
RDX:  RSI: 0004 RDI: c900040ffc20
RBP: dc00 R08: 0003 R09: f5200081ff84
R10: dc00 R11: f5200081ff84 R12: 88801d743888
R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
FS:  557dd480() GS:8880b980() knlGS:
CS:  0010 DS:  ES: 00
