Re: [PATCH v11 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-14 Thread Haitao Huang
On Sun, 14 Apr 2024 10:01:03 -0500, Jarkko Sakkinen   
wrote:



On Wed Apr 10, 2024 at 9:25 PM EEST, Haitao Huang wrote:

To run selftests for EPC cgroup:

sudo ./run_epc_cg_selftests.sh

To watch misc cgroup 'current' changes during testing, run this in a
separate terminal:

./watch_misc_for_tests.sh current

With different cgroups, the script starts one or multiple concurrent SGX
selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
test case, which loads an enclave of EPC size equal to the EPC capacity
available on the platform. The script checks results against the
expectation set for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with following
expectations:

1) SMALL - intentionally small enough to fail the test loading an
enclave of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
more than 4 concurrent tests are run. The script starts 4 expecting at
least one test to pass, and then starts 5 expecting at least one test
to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots  
of

concurrent tests. The script starts 8 of them and expects all pass.
Then it reruns the same test with one process randomly killed and
usage checked to be zero after all processes exit.

The script also includes a test with low mem_cg limit and LARGE sgx_epc
limit to verify that the RAM used for per-cgroup reclamation is charged
to a proper mem_cg. For this test, it turns off swapping before start,
and turns swapping back on afterwards.

Signed-off-by: Haitao Huang 
---
V11:
- Remove cgroups-tools dependency and make scripts ash compatible.  
(Jarkko)

- Drop support for cgroup v1 and simplify. (Michal, Jarkko)
- Add documentation for functions. (Jarkko)
- Turn off swapping before memcontrol tests and back on after
- Format and style fixes, name for hard coded values

V7:
- Added memcontrol test.

V5:
- Added script with automatic results checking, remove the interactive
script.
- The script can run independent from the series below.
---
 tools/testing/selftests/sgx/ash_cgexec.sh |  16 +
 .../selftests/sgx/run_epc_cg_selftests.sh | 275 ++
 .../selftests/sgx/watch_misc_for_tests.sh |  11 +
 3 files changed, 302 insertions(+)
 create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh  
b/tools/testing/selftests/sgx/ash_cgexec.sh

new file mode 100755
index ..cfa5d2b0e795
--- /dev/null
+++ b/tools/testing/selftests/sgx/ash_cgexec.sh
@@ -0,0 +1,16 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2024 Intel Corporation.
+
+# Start a program in a given cgroup.
+# Supports V2 cgroup paths, relative to /sys/fs/cgroup
+if [ "$#" -lt 2 ]; then
+echo "Usage: $0   [args...]"
+exit 1
+fi
+# Move this shell to the cgroup.
+echo 0 >/sys/fs/cgroup/$1/cgroup.procs
+shift
+# Execute the command within the cgroup
+exec "$@"
+
diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh  
b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh

new file mode 100755
index ..dd56273056fc
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,275 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+TEST_ROOT_CG=selftest
+TEST_CG_SUB1=$TEST_ROOT_CG/test1
+TEST_CG_SUB2=$TEST_ROOT_CG/test2
+# We will only set limit in test1 and run tests in test3
+TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
+TEST_CG_SUB4=$TEST_ROOT_CG/test4
+
+# Cgroup v2 only
+CG_ROOT=/sys/fs/cgroup
+mkdir -p $CG_ROOT/$TEST_CG_SUB1
+mkdir -p $CG_ROOT/$TEST_CG_SUB2
+mkdir -p $CG_ROOT/$TEST_CG_SUB3
+mkdir -p $CG_ROOT/$TEST_CG_SUB4
+
+# Turn on misc and memory controller in non-leaf nodes
+echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
+if [ $? -ne 0 ]; then
+echo "# Failed setting up cgroups, make sure misc and memory  
cgroups are enabled."

+exit 1
+fi
+
+CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
+# This is below number of VA pages needed for enclave of capacity  
size. So

+# should fail oversubscribed cases
+SMALL=$(( CAPACITY / 512 ))
+
+# At least load one enclave of capacity size successfully, maybe up to  
4.
+# But some may fail if we run more than 4 concurrent enclaves of  
capacity size.

+LARGE=$(( SMALL * 4 ))
+
+# Load lots of enclaves
+LARGER=$CAPACITY
+echo "# Setting up limits."
+echo "sgx_epc $SMALL" > $CG_ROOT/$TEST_CG_SUB1/misc.max && \
+echo "sgx_epc $LARGE" >  

general protection fault in integrity_inode_get

2024-04-14 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team has discovered a issue in Linux kernel 6.8. Attached to the 
email were a PoC file of the issue.

Stack dump:
loop0: detected capacity change from 0 to 64
hfs: unable to locate alternate MDB
hfs: continuing without an alternate MDB
general protection fault, probably for non-canonical address 
0xdc001cdb:  [#1] PREEMPT SMP KASAN NOPTI
KASAN: probably user-memory-access in range 
[0xe6d8-0xe6df]
CPU: 1 PID: 8089 Comm: syz-executor264 Not tainted 6.8.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:integrity_inode_get+0x35e/0x5f0 security/integrity/iint.c:147
Code: 42 80 3c 30 00 0f 85 b6 01 00 00 49 8b 45 00 48 85 c0 74 46 48 89 c3 e8 
c0 5c 70 fd 48 8d bb a8 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 0f 85 99 
01 00 00 4c 8b ab a8 00 00 00 49 39 ed 77
RSP: 0018:c90001c07720 EFLAGS: 00010206
RAX: 1cdb RBX: e633 RCX: 816ad98e
RDX: 8880174e24c0 RSI: 841a4ff0 RDI: e6db
RBP: 888042ad48d8 R08: 0001 R09: f52000380ed5
R10: 0003 R11:  R12: 88804a3ca128
R13: 88804a391010 R14: dc00 R15: 888042ad48d8
FS:  556a53c0() GS:88807ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8d72207c00 CR3: 20238000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 process_measurement+0x607/0x1ee0 security/integrity/ima/ima_main.c:251
 ima_file_check+0xba/0x100 security/integrity/ima/ima_main.c:557
 do_open fs/namei.c:3647 [inline]
 path_openat+0x16fa/0x2670 fs/namei.c:3802
 do_filp_open+0x1c9/0x420 fs/namei.c:3829
 do_sys_openat2+0x164/0x1d0 fs/open.c:1404
 do_sys_open fs/open.c:1419 [inline]
 __do_sys_openat fs/open.c:1435 [inline]
 __se_sys_openat fs/open.c:1430 [inline]
 __x64_sys_openat+0x140/0x1f0 fs/open.c:1430
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd5/0x270 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f8d7a6a668d
Code: c3 e8 97 22 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7fffb08e2138 EFLAGS: 0246 ORIG_RAX: 0101
RAX: ffda RBX: 00018fbc RCX: 7f8d7a6a668d
RDX: 00141842 RSI: 2380 RDI: ff9c
RBP:  R08: 0260 R09: 556a6080
R10:  R11: 0246 R12: 7f8d7a6f80b9
R13: 7f8d7a6f80c3 R14: 0008 R15: 7fffb08e21a0
 
Modules linked in:
---[ end trace  ]---
RIP: 0010:integrity_inode_get+0x35e/0x5f0 security/integrity/iint.c:147
Code: 42 80 3c 30 00 0f 85 b6 01 00 00 49 8b 45 00 48 85 c0 74 46 48 89 c3 e8 
c0 5c 70 fd 48 8d bb a8 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 0f 85 99 
01 00 00 4c 8b ab a8 00 00 00 49 39 ed 77
RSP: 0018:c90001c07720 EFLAGS: 00010206
RAX: 1cdb RBX: e633 RCX: 816ad98e
RDX: 8880174e24c0 RSI: 841a4ff0 RDI: e6db
RBP: 888042ad48d8 R08: 0001 R09: f52000380ed5
R10: 0003 R11:  R12: 88804a3ca128
R13: 88804a391010 R14: dc00 R15: 888042ad48d8
FS:  556a53c0() GS:88807ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8d72207c00 CR3: 20238000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554

Code disassembly (best guess):
   0:   42 80 3c 30 00  cmpb   $0x0,(%rax,%r14,1)
   5:   0f 85 b6 01 00 00   jne0x1c1
   b:   49 8b 45 00 mov0x0(%r13),%rax
   f:   48 85 c0test   %rax,%rax
  12:   74 46   je 0x5a
  14:   48 89 c3mov%rax,%rbx
  17:   e8 c0 5c 70 fd  call   0xfd705cdc
  1c:   48 8d bb a8 00 00 00lea0xa8(%rbx),%rdi
  23:   48 89 f8mov%rdi,%rax
  26:   48 c1 e8 03 shr$0x3,%rax
* 2a:   42 80 3c 30 00  cmpb   $0x0,(%rax,%r14,1) <-- trapping 
instruction
  2f:   0f 85 99 01 00 00   jne0x1ce
  35:   4c 8b ab a8 00 00 00mov0xa8(%rbx),%r13
  3c:   49 39 edcmp%rbp,%r13
  3f:   77  .byte 0x77

Thank you for taking the time to read this email and we look forward to working 
with you further.














poc.c
Description: Binary data


Re: [PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-14 Thread Matthew Wilcox
On Sun, Apr 14, 2024 at 10:05:05PM +0300, Alexey Dobriyan wrote:
> Any other subsystem should use nice helper function aptly named
> 
>   string_is_vfs_ready()
> 
> and apply additional restrictions if necessary.
> 
> /proc/modules hints that newlines should be banned too,
> and \x1f, and whitespace, and similar looking characters 
> from different languages and emojis (except obviously).

I don't see the purpose of allowing any character in 0x01-0x1f.
How annoying to have BEL in there.  And, really, what's the value in
allowing characters after 0x7e?




Re: [PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-14 Thread Luis Chamberlain
On Sun, Apr 14, 2024 at 10:05:05PM +0300, Alexey Dobriyan wrote:
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3616,4 +3616,12 @@ extern int vfs_fadvise(struct file *file, loff_t 
> offset, loff_t len,
>  extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
>  int advice);
>  
> +/*
> + * Use this if data from userspace end up as directory/filename on
> + * some virtual filesystem.
> + */
> +static inline bool string_is_vfs_ready(const char *s)
> +{
> + return strcmp(s, ".") != 0 && strcmp(s, "..") != 0 && !strchr(s, '/');
> +}
>  #endif /* _LINUX_FS_H */
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2893,6 +2893,11 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>  
>   audit_log_kern_module(mod->name);
>  
> + if (!string_is_vfs_ready(mod->name)) {
> + err = -EINVAL;
> + goto free_module;
> + }
> +

Sensible change however to put string_is_vfs_ready() in include/linux/fs.h 
is a stretch if there really are no other users.

  Luis



[PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-14 Thread Alexey Dobriyan
As the title says, ban

.
..

and any name containing '/' as they show in sysfs as directory names:

/sys/module/${mod.name}

sysfs tries to mangle the name and make '/' into '!' which kind of work
but not really.

Corrupting simple module to have name '/est' and loading it works:

# insmod xxx.ko

$ cat /proc/modules
/est 12288 0 - Live 0x (P)

/proc has no problems with it as it ends in data not pathname.

sysfs mangles it to '/sys/module/!test'.

lsmod is confused:

$ lsmod
Module  Size  Used by
libkmod: ERROR ../libkmod/libkmod-module.c:1998 
kmod_module_get_holders: could not open '/sys/module//est/holders': No such 
file or directory
/est  -2  -2

Size and refcount are bogus entirely.

Apparently lsmod doesn't know about sysfs mangling scheme.

Worse, rmmod doesn't work too:

$ sudo rmmod '/est'
rmmod: ERROR: Module /est is not currently loaded

I don't even want to know what it is doing.

Practically there is no nice way for the admin to get rid of the module,
so we should just ban such names. Writing small program to just delete
module by name could possibly work maybe.

Any other subsystem should use nice helper function aptly named

string_is_vfs_ready()

and apply additional restrictions if necessary.

/proc/modules hints that newlines should be banned too,
and \x1f, and whitespace, and similar looking characters 
from different languages and emojis (except obviously).

Signed-off-by: Alexey Dobriyan 
---

 include/linux/fs.h   |8 
 kernel/module/main.c |5 +
 2 files changed, 13 insertions(+)

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3616,4 +3616,12 @@ extern int vfs_fadvise(struct file *file, loff_t offset, 
loff_t len,
 extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
   int advice);
 
+/*
+ * Use this if data from userspace end up as directory/filename on
+ * some virtual filesystem.
+ */
+static inline bool string_is_vfs_ready(const char *s)
+{
+   return strcmp(s, ".") != 0 && strcmp(s, "..") != 0 && !strchr(s, '/');
+}
 #endif /* _LINUX_FS_H */
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2893,6 +2893,11 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
 
audit_log_kern_module(mod->name);
 
+   if (!string_is_vfs_ready(mod->name)) {
+   err = -EINVAL;
+   goto free_module;
+   }
+
/* Reserve our place in the list. */
err = add_unformed_module(mod);
if (err)



Re: [PATCH 1/4] dt-bindings: iio: light: stk33xx: add regulator for vdd supply

2024-04-14 Thread Krzysztof Kozlowski
On 14/04/2024 19:57, Aren Moynihan wrote:
> Signed-off-by: Aren Moynihan 
> ---

Why? Please provide proper commit msg which will explain why you are
doing it (e.g. say about hardware).

Anyway empty commit msgs cannot be accepted.



Best regards,
Krzysztof




[PATCH 4/4] arm64: dts: allwinner: pinephone: Add power supply to stk3311

2024-04-14 Thread Aren Moynihan
From: Ondrej Jirman 

This makes the driver disable the supply during sleep.

Signed-off-by: Ondrej Jirman 
Signed-off-by: Aren Moynihan 
---
 arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi 
b/arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi
index b5a232209f2b..e87bc21db316 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi
@@ -250,6 +250,7 @@ light-sensor@48 {
reg = <0x48>;
interrupt-parent = <>;
interrupts = <1 0 IRQ_TYPE_EDGE_FALLING>; /* PB0 */
+   vdd-supply = <_ldo_io0>;
};
 
/* Accelerometer/gyroscope */
-- 
2.44.0




[PATCH 3/4] iio: light: stk3310: log error if reading the chip id fails

2024-04-14 Thread Aren Moynihan
If the chip isn't powered, this call is likely to return an error.
Without a log here the driver will silently fail to probe. Common errors
are ENXIO (when the chip isn't powered) and ETIMEDOUT (when the i2c bus
isn't powered).

Signed-off-by: Aren Moynihan 
---
 drivers/iio/light/stk3310.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/light/stk3310.c b/drivers/iio/light/stk3310.c
index bfa090538df7..c0954a63a143 100644
--- a/drivers/iio/light/stk3310.c
+++ b/drivers/iio/light/stk3310.c
@@ -472,8 +472,10 @@ static int stk3310_init(struct iio_dev *indio_dev)
struct i2c_client *client = data->client;
 
ret = regmap_read(data->regmap, STK3310_REG_ID, );
-   if (ret < 0)
+   if (ret < 0) {
+   dev_err(>dev, "failed to read chip id: %d", ret);
return ret;
+   }
 
if (chipid != STK3310_CHIP_ID_VAL &&
chipid != STK3311_CHIP_ID_VAL &&
-- 
2.44.0




[PATCH 2/4] iio: light: stk3310: Implement vdd supply and power it off during suspend

2024-04-14 Thread Aren Moynihan
From: Ondrej Jirman 

VDD power input can be used to completely power off the chip during
system suspend. Do so if available.

Signed-off-by: Ondrej Jirman 
Signed-off-by: Aren Moynihan 
---
 drivers/iio/light/stk3310.c | 56 +++--
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/iio/light/stk3310.c b/drivers/iio/light/stk3310.c
index 7b71ad71d78d..bfa090538df7 100644
--- a/drivers/iio/light/stk3310.c
+++ b/drivers/iio/light/stk3310.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define STK3310_REG_STATE  0x00
 #define STK3310_REG_PSCTRL 0x01
@@ -117,6 +118,7 @@ struct stk3310_data {
struct regmap_field *reg_int_ps;
struct regmap_field *reg_flag_psint;
struct regmap_field *reg_flag_nf;
+   struct regulator *vdd_reg;
 };
 
 static const struct iio_event_spec stk3310_events[] = {
@@ -607,6 +609,16 @@ static int stk3310_probe(struct i2c_client *client)
 
mutex_init(>lock);
 
+   data->vdd_reg = devm_regulator_get_optional(>dev, "vdd");
+   if (IS_ERR(data->vdd_reg)) {
+   ret = PTR_ERR(data->vdd_reg);
+   if (ret == -ENODEV)
+   data->vdd_reg = NULL;
+   else
+   return dev_err_probe(>dev, ret,
+"get regulator vdd failed\n");
+   }
+
ret = stk3310_regmap_init(data);
if (ret < 0)
return ret;
@@ -617,9 +629,18 @@ static int stk3310_probe(struct i2c_client *client)
indio_dev->channels = stk3310_channels;
indio_dev->num_channels = ARRAY_SIZE(stk3310_channels);
 
+   if (data->vdd_reg) {
+   ret = regulator_enable(data->vdd_reg);
+   if (ret)
+   return dev_err_probe(>dev, ret,
+"regulator vdd enable failed\n");
+
+   usleep_range(1000, 2000);
+   }
+
ret = stk3310_init(indio_dev);
if (ret < 0)
-   return ret;
+   goto err_vdd_disable;
 
if (client->irq > 0) {
ret = devm_request_threaded_irq(>dev, client->irq,
@@ -645,32 +666,61 @@ static int stk3310_probe(struct i2c_client *client)
 
 err_standby:
stk3310_set_state(data, STK3310_STATE_STANDBY);
+err_vdd_disable:
+   if (data->vdd_reg)
+   regulator_disable(data->vdd_reg);
return ret;
 }
 
 static void stk3310_remove(struct i2c_client *client)
 {
struct iio_dev *indio_dev = i2c_get_clientdata(client);
+   struct stk3310_data *data = iio_priv(indio_dev);
 
iio_device_unregister(indio_dev);
stk3310_set_state(iio_priv(indio_dev), STK3310_STATE_STANDBY);
+   if (data->vdd_reg)
+   regulator_disable(data->vdd_reg);
 }
 
 static int stk3310_suspend(struct device *dev)
 {
struct stk3310_data *data;
+   int ret;
 
data = iio_priv(i2c_get_clientdata(to_i2c_client(dev)));
 
-   return stk3310_set_state(data, STK3310_STATE_STANDBY);
+   ret = stk3310_set_state(data, STK3310_STATE_STANDBY);
+   if (ret)
+   return ret;
+
+   if (data->vdd_reg) {
+   regcache_mark_dirty(data->regmap);
+   regulator_disable(data->vdd_reg);
+   }
+
+   return 0;
 }
 
 static int stk3310_resume(struct device *dev)
 {
-   u8 state = 0;
struct stk3310_data *data;
+   u8 state = 0;
+   int ret;
 
data = iio_priv(i2c_get_clientdata(to_i2c_client(dev)));
+
+   if (data->vdd_reg) {
+   ret = regulator_enable(data->vdd_reg);
+   if (ret) {
+   dev_err(dev, "Failed to re-enable regulator vdd\n");
+   return ret;
+   }
+
+   usleep_range(1000, 2000);
+   regcache_sync(data->regmap);
+   }
+
if (data->ps_enabled)
state |= STK3310_STATE_EN_PS;
if (data->als_enabled)
-- 
2.44.0




[PATCH 1/4] dt-bindings: iio: light: stk33xx: add regulator for vdd supply

2024-04-14 Thread Aren Moynihan
Signed-off-by: Aren Moynihan 
---
 Documentation/devicetree/bindings/iio/light/stk33xx.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iio/light/stk33xx.yaml 
b/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
index f6e22dc9814a..db35e239d4a8 100644
--- a/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
+++ b/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
@@ -29,6 +29,7 @@ properties:
   interrupts:
 maxItems: 1
 
+  vdd-supply: true
   proximity-near-level: true
 
 required:
-- 
2.44.0




[PATCH 0/4] iio: light: stk3310: support powering off during suspend

2024-04-14 Thread Aren Moynihan
In the Pine64 PinePhone, the stk3310 chip is powered by a regulator that is
disabled at system boot and can be shut off during suspend. To ensure that
the chip properly initializes, both after boot and suspend, we need to
manage this regulator.

Additionally if the chip is shut off in suspend, we need to make sure that
it gets reinitialized with the same parameters after resume.

Aren Moynihan (2):
  dt-bindings: iio: light: stk33xx: add regulator for vdd supply
  iio: light: stk3310: log error if reading the chip id fails

Ondrej Jirman (2):
  iio: light: stk3310: Implement vdd supply and power it off during
suspend
  arm64: dts: allwinner: pinephone: Add power supply to stk3311

 .../bindings/iio/light/stk33xx.yaml   |  1 +
 .../dts/allwinner/sun50i-a64-pinephone.dtsi   |  1 +
 drivers/iio/light/stk3310.c   | 60 +--
 3 files changed, 58 insertions(+), 4 deletions(-)

-- 
2.44.0




Re: [PATCH] openrisc: Use do_kernel_power_off()

2024-04-14 Thread Sebastian Reichel
Hi,

On Sun, Mar 31, 2024 at 08:02:28AM +0100, Stafford Horne wrote:
> After commit 14c5678720bd ("power: reset: syscon-poweroff: Use
> devm_register_sys_off_handler(POWER_OFF)") setting up of pm_power_off
> was removed from the driver, this causes OpenRISC platforms using
> syscon-poweroff to no longer shutdown.
> 
> The kernel now supports chained power-off handlers. Use
> do_kernel_power_off() that invokes chained power-off handlers.  All
> architectures have moved away from using pm_power_off except OpenRISC.
> 
> This patch migrates openrisc to use do_kernel_power_off() instead of the
> legacy pm_power_off().
> 
> Fixes: 14c5678720bd ("power: reset: syscon-poweroff: Use 
> devm_register_sys_off_handler(POWER_OFF)")
> Signed-off-by: Stafford Horne 
> ---

Reviewed-by: Sebastian Reichel 

-- Sebastian

>  arch/openrisc/kernel/process.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
> index 86e02929f3ac..3c27d1c72718 100644
> --- a/arch/openrisc/kernel/process.c
> +++ b/arch/openrisc/kernel/process.c
> @@ -65,7 +65,7 @@ void machine_restart(char *cmd)
>  }
>  
>  /*
> - * This is used if pm_power_off has not been set by a power management
> + * This is used if a sys-off handler was not set by a power management
>   * driver, in this case we can assume we are on a simulator.  On
>   * OpenRISC simulators l.nop 1 will trigger the simulator exit.
>   */
> @@ -89,10 +89,8 @@ void machine_halt(void)
>  void machine_power_off(void)
>  {
>   printk(KERN_INFO "*** MACHINE POWER OFF ***\n");
> - if (pm_power_off != NULL)
> - pm_power_off();
> - else
> - default_power_off();
> + do_kernel_power_off();
> + default_power_off();
>  }
>  
>  /*
> -- 
> 2.44.0
> 


signature.asc
Description: PGP signature


Re: [GIT PULL] virtio: bugfixes

2024-04-14 Thread pr-tracker-bot
The pull request you sent on Sun, 14 Apr 2024 04:20:47 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/399f4dae683a719eeeca8f30d3871577b53ffcca

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCH v11 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-14 Thread Jarkko Sakkinen
On Wed Apr 10, 2024 at 9:25 PM EEST, Haitao Huang wrote:
> To run selftests for EPC cgroup:
>
> sudo ./run_epc_cg_selftests.sh
>
> To watch misc cgroup 'current' changes during testing, run this in a
> separate terminal:
>
> ./watch_misc_for_tests.sh current
>
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with following
> expectations:
>
> 1) SMALL - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) LARGE - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) LARGER - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all pass.
> Then it reruns the same test with one process randomly killed and
> usage checked to be zero after all processes exit.
>
> The script also includes a test with low mem_cg limit and LARGE sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to a proper mem_cg. For this test, it turns off swapping before start,
> and turns swapping back on afterwards.
>
> Signed-off-by: Haitao Huang 
> ---
> V11:
> - Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
> - Drop support for cgroup v1 and simplify. (Michal, Jarkko)
> - Add documentation for functions. (Jarkko)
> - Turn off swapping before memcontrol tests and back on after
> - Format and style fixes, name for hard coded values
>
> V7:
> - Added memcontrol test.
>
> V5:
> - Added script with automatic results checking, remove the interactive
> script.
> - The script can run independent from the series below.
> ---
>  tools/testing/selftests/sgx/ash_cgexec.sh |  16 +
>  .../selftests/sgx/run_epc_cg_selftests.sh | 275 ++
>  .../selftests/sgx/watch_misc_for_tests.sh |  11 +
>  3 files changed, 302 insertions(+)
>  create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
>  create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
>  create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh
>
> diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh 
> b/tools/testing/selftests/sgx/ash_cgexec.sh
> new file mode 100755
> index ..cfa5d2b0e795
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/ash_cgexec.sh
> @@ -0,0 +1,16 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2024 Intel Corporation.
> +
> +# Start a program in a given cgroup.
> +# Supports V2 cgroup paths, relative to /sys/fs/cgroup
> +if [ "$#" -lt 2 ]; then
> +echo "Usage: $0   [args...]"
> +exit 1
> +fi
> +# Move this shell to the cgroup.
> +echo 0 >/sys/fs/cgroup/$1/cgroup.procs
> +shift
> +# Execute the command within the cgroup
> +exec "$@"
> +
> diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh 
> b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> new file mode 100755
> index ..dd56273056fc
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> @@ -0,0 +1,275 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2023, 2024 Intel Corporation.
> +
> +TEST_ROOT_CG=selftest
> +TEST_CG_SUB1=$TEST_ROOT_CG/test1
> +TEST_CG_SUB2=$TEST_ROOT_CG/test2
> +# We will only set limit in test1 and run tests in test3
> +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
> +TEST_CG_SUB4=$TEST_ROOT_CG/test4
> +
> +# Cgroup v2 only
> +CG_ROOT=/sys/fs/cgroup
> +mkdir -p $CG_ROOT/$TEST_CG_SUB1
> +mkdir -p $CG_ROOT/$TEST_CG_SUB2
> +mkdir -p $CG_ROOT/$TEST_CG_SUB3
> +mkdir -p $CG_ROOT/$TEST_CG_SUB4
> +
> +# Turn on misc and memory controller in non-leaf nodes
> +echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
> +if [ $? -ne 0 ]; then
> +echo "# Failed setting up cgroups, make sure misc and memory cgroups are 
> enabled."
> +exit 1
> +fi
> +
> +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
> +# This is below number of VA pages needed for enclave of capacity size. So
> +# should fail oversubscribed cases
> +SMALL=$(( CAPACITY / 512 ))
> +
> +# At least load one enclave of capacity size successfully, maybe up to 4.
> +# But some may fail if we run more than 4 concurrent enclaves of capacity 
> size.
> +LARGE=$(( SMALL * 4 ))
> +
> 

Re: [PATCH v3] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-14 Thread Google
On Sun, 14 Apr 2024 19:49:45 +0800
qiang4.zh...@linux.intel.com wrote:

> From: Qiang Zhang 
> 
> On the time to free xbc memory in xbc_exit(), memblock may has handed
> over memory to buddy allocator. So it doesn't make sense to free memory
> back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
> on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86.
> Following KASAN logs shows this case.
> 
> This patch fixes the xbc memory free problem by calling memblock_free()
> in early xbc init error rewind path and calling memblock_free_late() in
> xbc exit path to free memory to buddy allocator.
> 
> [9.410890] 
> ==
> [9.418962] BUG: KASAN: use-after-free in 
> memblock_isolate_range+0x12d/0x260
> [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
> 
> [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
> 6.9.0-rc3-00208-g586b5dfb51b9 #5
> [9.446403] Hardware name: Intel Corporation RPLP LP5 
> (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- 
> Dec 28 2023
> [9.460789] Call Trace:
> [9.463518]  
> [9.465859]  dump_stack_lvl+0x53/0x70
> [9.469949]  print_report+0xce/0x610
> [9.473944]  ? __virt_addr_valid+0xf5/0x1b0
> [9.478619]  ? memblock_isolate_range+0x12d/0x260
> [9.483877]  kasan_report+0xc6/0x100
> [9.487870]  ? memblock_isolate_range+0x12d/0x260
> [9.493125]  memblock_isolate_range+0x12d/0x260
> [9.498187]  memblock_phys_free+0xb4/0x160
> [9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
> [9.508021]  ? mutex_unlock+0x7e/0xd0
> [9.512111]  ? __pfx_mutex_unlock+0x10/0x10
> [9.516786]  ? kernel_init_freeable+0x2d4/0x430
> [9.521850]  ? __pfx_kernel_init+0x10/0x10
> [9.526426]  xbc_exit+0x17/0x70
> [9.529935]  kernel_init+0x38/0x1e0
> [9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
> [9.538601]  ret_from_fork+0x2c/0x50
> [9.542596]  ? __pfx_kernel_init+0x10/0x10
> [9.547170]  ret_from_fork_asm+0x1a/0x30
> [9.551552]  
> 
> [9.555649] The buggy address belongs to the physical page:
> [9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
> pfn:0x45dd30
> [9.570821] flags: 0x200(node=0|zone=2)
> [9.576271] page_type: 0x()
> [9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
> 
> [9.588823] raw: 0001   
> 
> [9.597476] page dumped because: kasan: bad access detected
> 
> [9.605362] Memory state around the buggy address:
> [9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.634930]^
> [9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.654675] 
> ==
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Qiang Zhang 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> ---
> v3:
> - add NULL pointer check in memblock_free_late() path.
> 
> v2:
> - add an early flag in xbc_free_mem() to free memory back to memblock in
>   xbc_init error path or put memory to buddy allocator in normal xbc_exit.
> ---
>  include/linux/bootconfig.h |  7 ++-
>  lib/bootconfig.c   | 19 +++
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
> index e5ee2c694401..3f4b4ac527ca 100644
> --- a/include/linux/bootconfig.h
> +++ b/include/linux/bootconfig.h
> @@ -288,7 +288,12 @@ int __init xbc_init(const char *buf, size_t size, const 
> char **emsg, int *epos);
>  int __init xbc_get_info(int *node_size, size_t *data_size);
>  
>  /* XBC cleanup data structures */
> -void __init xbc_exit(void);
> +void __init _xbc_exit(bool early);
> +
> +static inline void xbc_exit(void)
> +{
> + _xbc_exit(false);
> +}
>  
>  /* XBC embedded bootconfig data in kernel */
>  #ifdef CONFIG_BOOT_CONFIG_EMBED
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index c59d26068a64..8841554432d5 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -61,9 +61,12 @@ static inline void * __init xbc_alloc_mem(size_t size)
>   return memblock_alloc(size, SMP_CACHE_BYTES);
>  }
>  
> -static inline void __init xbc_free_mem(void *addr, size_t size)
> +static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
>  {
> - memblock_free(addr, size);
> + if (early)
> + memblock_free(addr, size);
> + else if (addr)
> + memblock_free_late(__pa(addr), 

[PATCH v3] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-14 Thread qiang4 . zhang
From: Qiang Zhang 

On the time to free xbc memory in xbc_exit(), memblock may has handed
over memory to buddy allocator. So it doesn't make sense to free memory
back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86.
Following KASAN logs shows this case.

This patch fixes the xbc memory free problem by calling memblock_free()
in early xbc init error rewind path and calling memblock_free_late() in
xbc exit path to free memory to buddy allocator.

[9.410890] 
==
[9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
[9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1

[9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
6.9.0-rc3-00208-g586b5dfb51b9 #5
[9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP 
LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- Dec 28 2023
[9.460789] Call Trace:
[9.463518]  
[9.465859]  dump_stack_lvl+0x53/0x70
[9.469949]  print_report+0xce/0x610
[9.473944]  ? __virt_addr_valid+0xf5/0x1b0
[9.478619]  ? memblock_isolate_range+0x12d/0x260
[9.483877]  kasan_report+0xc6/0x100
[9.487870]  ? memblock_isolate_range+0x12d/0x260
[9.493125]  memblock_isolate_range+0x12d/0x260
[9.498187]  memblock_phys_free+0xb4/0x160
[9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
[9.508021]  ? mutex_unlock+0x7e/0xd0
[9.512111]  ? __pfx_mutex_unlock+0x10/0x10
[9.516786]  ? kernel_init_freeable+0x2d4/0x430
[9.521850]  ? __pfx_kernel_init+0x10/0x10
[9.526426]  xbc_exit+0x17/0x70
[9.529935]  kernel_init+0x38/0x1e0
[9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
[9.538601]  ret_from_fork+0x2c/0x50
[9.542596]  ? __pfx_kernel_init+0x10/0x10
[9.547170]  ret_from_fork_asm+0x1a/0x30
[9.551552]  

[9.555649] The buggy address belongs to the physical page:
[9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
pfn:0x45dd30
[9.570821] flags: 0x200(node=0|zone=2)
[9.576271] page_type: 0x()
[9.580167] raw: 0200 ea0011774c48 ea0012ba1848 

[9.588823] raw: 0001   

[9.597476] page dumped because: kasan: bad access detected

[9.605362] Memory state around the buggy address:
[9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.634930]^
[9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.654675] 
==

Cc: sta...@vger.kernel.org
Signed-off-by: Qiang Zhang 
---
v3:
- add NULL pointer check in memblock_free_late() path.

v2:
- add an early flag in xbc_free_mem() to free memory back to memblock in
  xbc_init error path or put memory to buddy allocator in normal xbc_exit.
---
 include/linux/bootconfig.h |  7 ++-
 lib/bootconfig.c   | 19 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index e5ee2c694401..3f4b4ac527ca 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -288,7 +288,12 @@ int __init xbc_init(const char *buf, size_t size, const 
char **emsg, int *epos);
 int __init xbc_get_info(int *node_size, size_t *data_size);
 
 /* XBC cleanup data structures */
-void __init xbc_exit(void);
+void __init _xbc_exit(bool early);
+
+static inline void xbc_exit(void)
+{
+   _xbc_exit(false);
+}
 
 /* XBC embedded bootconfig data in kernel */
 #ifdef CONFIG_BOOT_CONFIG_EMBED
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index c59d26068a64..8841554432d5 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -61,9 +61,12 @@ static inline void * __init xbc_alloc_mem(size_t size)
return memblock_alloc(size, SMP_CACHE_BYTES);
 }
 
-static inline void __init xbc_free_mem(void *addr, size_t size)
+static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
 {
-   memblock_free(addr, size);
+   if (early)
+   memblock_free(addr, size);
+   else if (addr)
+   memblock_free_late(__pa(addr), size);
 }
 
 #else /* !__KERNEL__ */
@@ -73,7 +76,7 @@ static inline void *xbc_alloc_mem(size_t size)
return malloc(size);
 }
 
-static inline void xbc_free_mem(void *addr, size_t size)
+static inline void xbc_free_mem(void *addr, size_t size, bool early)
 {
free(addr);
 }
@@ -904,13 +907,13 @@ static int __init xbc_parse_tree(void)
  * If you need to 

Re: [PATCH v2] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-14 Thread Qiang Zhang
On Sat, Apr 13, 2024 at 09:21:38PM +0900, Masami Hiramatsu wrote:
>Hi Qiang,
>
>I found xbc_free_mem() missed to check !addr. When I booted kernel without
>bootconfig data but with "bootconfig" cmdline, I got a kernel crash below;
>
>
>[2.394904] [ cut here ]
>[2.396490] kernel BUG at arch/x86/mm/physaddr.c:28!
>[2.398176] invalid opcode:  [#1] PREEMPT SMP PTI
>[2.399388] CPU: 7 PID: 1 Comm: swapper/0 Tainted: G N 
>6.9.0-rc3-4-g121fbb463836 #10
>[2.401579] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>1.15.0-1 04/01/2014
>[2.403247] RIP: 0010:__phys_addr+0x40/0x60
>[2.404196] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
>b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 
>48 03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
>[2.407250] RSP: :c9013f18 EFLAGS: 00010287
>[2.407991] RAX: 7780 RBX: 81c17940 RCX: 
>008a
>[2.408891] RDX: 008b RSI: 88800775f320 RDI: 
>8000
>[2.409727] RBP:  R08:  R09: 
>
>[2.410555] R10: 888005028a60 R11: 008a R12: 
>
>[2.411423] R13:  R14:  R15: 
>
>[2.412155] FS:  () GS:88807d9c() 
>knlGS:
>[2.412970] CS:  0010 DS:  ES:  CR0: 80050033
>[2.413550] CR2:  CR3: 02a48000 CR4: 
>06b0
>[2.414264] Call Trace:
>[2.414520]  
>[2.414755]  ? die+0x37/0x90
>[2.415062]  ? do_trap+0xe3/0x110
>[2.415451]  ? __phys_addr+0x40/0x60
>[2.415822]  ? do_error_trap+0x9c/0x120
>[2.416215]  ? __phys_addr+0x40/0x60
>[2.416573]  ? __phys_addr+0x40/0x60
>[2.416968]  ? exc_invalid_op+0x53/0x70
>[2.417358]  ? __phys_addr+0x40/0x60
>[2.417709]  ? asm_exc_invalid_op+0x1a/0x20
>[2.418122]  ? __pfx_kernel_init+0x10/0x10
>[2.418569]  ? __phys_addr+0x40/0x60
>[2.418960]  _xbc_exit+0x74/0xc0
>[2.419374]  kernel_init+0x3a/0x1c0
>[2.419764]  ret_from_fork+0x34/0x50
>[2.420132]  ? __pfx_kernel_init+0x10/0x10
>[2.420578]  ret_from_fork_asm+0x1a/0x30
>[2.420973]  
>[2.421200] Modules linked in:
>[2.421598] ---[ end trace  ]---
>[2.422053] RIP: 0010:__phys_addr+0x40/0x60
>[2.422484] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
>b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 
>48 03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
>[2.424294] RSP: :c9013f18 EFLAGS: 00010287
>[2.424769] RAX: 7780 RBX: 81c17940 RCX: 
>008a
>[2.425378] RDX: 008b RSI: 88800775f320 RDI: 
>8000
>[2.425993] RBP:  R08:  R09: 
>
>[2.426589] R10: 888005028a60 R11: 008a R12: 
>
>[2.427156] R13:  R14:  R15: 
>
>[2.427746] FS:  () GS:88807d9c() 
>knlGS:
>[2.428368] CS:  0010 DS:  ES:  CR0: 80050033
>[2.428820] CR2:  CR3: 02a48000 CR4: 
>06b0
>[2.429373] Kernel panic - not syncing: Fatal exception
>[2.429982] Kernel Offset: disabled
>[2.430261] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
>Adding below patch fixed it.
>
>diff --git a/lib/bootconfig.c b/lib/bootconfig.c
>index f9a45adc6307..8841554432d5 100644
>--- a/lib/bootconfig.c
>+++ b/lib/bootconfig.c
>@@ -65,7 +65,7 @@ static inline void __init xbc_free_mem(void *addr, size_t 
>size, bool early)
> {
>   if (early)
>   memblock_free(addr, size);
>-  else
>+  else if (addr)
>   memblock_free_late(__pa(addr), size);
> }
> 
>Can you update with this fix?

Sure.

>
>Thank you,
>
>
>On Fri, 12 Apr 2024 22:18:20 +0900
>Masami Hiramatsu (Google)  wrote:
>
>> On Fri, 12 Apr 2024 18:49:41 +0800
>> qiang4.zh...@linux.intel.com wrote:
>> 
>> > From: Qiang Zhang 
>> > 
>> > On the time to free xbc memory in xbc_exit(), memblock may has handed
>> > over memory to buddy allocator. So it doesn't make sense to free memory
>> > back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
>> > on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86.
>> > Following KASAN logs shows this case.
>> > 
>> > This patch fixes the xbc memory free problem by calling memblock_free()
>> > in early xbc init error rewind path and calling memblock_free_late() in
>> > xbc exit path to free memory to buddy allocator.
>> > 
>> > [9.410890] 
>> > ==
>> > [9.418962] BUG: KASAN: use-after-free in 
>> > 

[syzbot] [trace?] [bpf?] possible deadlock in pwq_dec_nr_in_flight

2024-04-14 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15d2559918
kernel config:  https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=92438ab91cb6348b16fa
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/f6c04726a2ae/disk-fe46a7dd.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/09c26ce901ea/vmlinux-fe46a7dd.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/134acf7f5322/bzImage-fe46a7dd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+92438ab91cb6348b1...@syzkaller.appspotmail.com

[ cut here ]
==
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
--
kworker/u8:5/987 is trying to acquire lock:
8e126300 (console_owner){}-{0:0}, at: console_trylock_spinning 
kernel/printk/printk.c:1997 [inline]
8e126300 (console_owner){}-{0:0}, at: vprintk_emit+0x3d6/0x770 
kernel/printk/printk.c:2341

but task is already holding lock:
8881483629a0 (>lock){..-.}-{2:2}, at: node_activate_pending_pwq 
kernel/workqueue.c:1882 [inline]
8881483629a0 (>lock){..-.}-{2:2}, at: pwq_dec_nr_active 
kernel/workqueue.c:1993 [inline]
8881483629a0 (>lock){..-.}-{2:2}, at: pwq_dec_nr_in_flight+0x32a/0xd60 
kernel/workqueue.c:2017

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (>lock){..-.}-{2:2}:
   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   pwq_tryinc_nr_active+0x2ef/0x720 kernel/workqueue.c:1774
   __queue_work+0xa9d/0xec0 kernel/workqueue.c:2395
   queue_work_on+0x14f/0x250 kernel/workqueue.c:2435
   queue_work include/linux/workqueue.h:605 [inline]
   call_usermodehelper_exec+0x286/0x4a0 kernel/umh.c:434
   kobject_uevent_env+0x6b5/0x8f0 lib/kobject_uevent.c:618
   driver_register+0x2d6/0x320 drivers/base/driver.c:254
   pcie_init_services+0xa/0x20 drivers/pci/pcie/portdrv.c:828
   pcie_portdrv_init+0x38/0x60 drivers/pci/pcie/portdrv.c:839
   do_one_initcall+0x238/0x830 init/main.c:1241
   do_initcall_level+0x157/0x210 init/main.c:1303
   do_initcalls+0x3f/0x80 init/main.c:1319
   kernel_init_freeable+0x435/0x5d0 init/main.c:1550
   kernel_init+0x1d/0x2a0 init/main.c:1439
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

-> #3 (>lock){-.-.}-{2:2}:
   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   __queue_work+0x6ec/0xec0
   queue_work_on+0x14f/0x250 kernel/workqueue.c:2435
   queue_work include/linux/workqueue.h:605 [inline]
   rpm_suspend+0xe99/0x1780 drivers/base/power/runtime.c:662
   __pm_runtime_idle+0x131/0x1a0 drivers/base/power/runtime.c:1104
   pm_runtime_put include/linux/pm_runtime.h:448 [inline]
   __device_attach+0x3e5/0x520 drivers/base/dd.c:1048
   bus_probe_device+0x189/0x260 drivers/base/bus.c:532
   device_add+0x8ff/0xca0 drivers/base/core.c:3639
   serial_base_port_add+0x2b6/0x3f0 drivers/tty/serial/serial_base_bus.c:178
   serial_core_port_device_add drivers/tty/serial/serial_core.c:3353 
[inline]
   serial_core_register_port+0x393/0x1e30 
drivers/tty/serial/serial_core.c:3394
   serial8250_register_8250_port+0x1433/0x1cd0 
drivers/tty/serial/8250/8250_core.c:1138
   serial_pnp_probe+0x7d5/0xa20 drivers/tty/serial/8250/8250_pnp.c:478
   pnp_device_probe+0x2ba/0x460 drivers/pnp/driver.c:111
   really_probe+0x29e/0xc50 drivers/base/dd.c:658
   __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
   driver_probe_device+0x50/0x430 drivers/base/dd.c:830
   __driver_attach+0x45f/0x710 drivers/base/dd.c:1216
   bus_for_each_dev+0x239/0x2b0 drivers/base/bus.c:368
   bus_add_driver+0x347/0x620 drivers/base/bus.c:673
   driver_register+0x23a/0x320 drivers/base/driver.c:246
   serial8250_init+0x9e/0x170 drivers/tty/serial/8250/8250_core.c:1239
   do_one_initcall+0x238/0x830 init/main.c:1241
   do_initcall_level+0x157/0x210 init/main.c:1303
   do_initcalls+0x3f/0x80 init/main.c:1319
   kernel_init_freeable+0x435/0x5d0 init/main.c:1550
   

[PATCH v2] vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

2024-04-14 Thread Christophe JAILLET
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_max() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET 
Reviewed-by: Simon Horman 
---
Changes in V2:
   - fix a typo in the commit message
   - Add a R-b tag

V1: 
https://lore.kernel.org/all/bd27d4066f7749997a75cf4111fbf51e11d5898d.1705350942.git.christophe.jail...@wanadoo.fr/
---
 drivers/vhost/vdpa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ba52d128aeb7..63a53680a85c 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1548,7 +1548,7 @@ static void vhost_vdpa_release_dev(struct device *device)
struct vhost_vdpa *v =
   container_of(device, struct vhost_vdpa, dev);
 
-   ida_simple_remove(_vdpa_ida, v->minor);
+   ida_free(_vdpa_ida, v->minor);
kfree(v->vqs);
kfree(v);
 }
@@ -1571,8 +1571,8 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
if (!v)
return -ENOMEM;
 
-   minor = ida_simple_get(_vdpa_ida, 0,
-  VHOST_VDPA_DEV_MAX, GFP_KERNEL);
+   minor = ida_alloc_max(_vdpa_ida, VHOST_VDPA_DEV_MAX - 1,
+ GFP_KERNEL);
if (minor < 0) {
kfree(v);
return minor;
-- 
2.44.0




Re: [PATCH] vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

2024-04-14 Thread Michael S. Tsirkin
On Sun, Apr 14, 2024 at 10:59:06AM +0200, Christophe JAILLET wrote:
> Le 14/04/2024 à 10:35, Michael S. Tsirkin a écrit :
> > On Mon, Jan 15, 2024 at 09:35:50PM +0100, Christophe JAILLET wrote:
> > > ida_alloc() and ida_free() should be preferred to the deprecated
> > > ida_simple_get() and ida_simple_remove().
> > > 
> > > Note that the upper limit of ida_simple_get() is exclusive, buInputt the 
> > > one of
> > 
> > What's buInputt? But?
> 
> Yes, sorry. It is "but".
> 
> Let me know if I should send a v2, or if it can be fixed when it is applied.
> 
> CJ

Yes it's easier if you do. Thanks!

> > 
> > > ida_alloc_max() is inclusive. So a -1 has been added when needed.
> > > 
> > > Signed-off-by: Christophe JAILLET 
> > 
> > 
> > Jason, wanna ack?
> > 
> > > ---
> > >   drivers/vhost/vdpa.c | 6 +++---
> > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index bc4a51e4638b..849b9d2dd51f 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -1534,7 +1534,7 @@ static void vhost_vdpa_release_dev(struct device 
> > > *device)
> > >   struct vhost_vdpa *v =
> > >  container_of(device, struct vhost_vdpa, dev);
> > > - ida_simple_remove(_vdpa_ida, v->minor);
> > > + ida_free(_vdpa_ida, v->minor);
> > >   kfree(v->vqs);
> > >   kfree(v);
> > >   }
> > > @@ -1557,8 +1557,8 @@ static int vhost_vdpa_probe(struct vdpa_device 
> > > *vdpa)
> > >   if (!v)
> > >   return -ENOMEM;
> > > - minor = ida_simple_get(_vdpa_ida, 0,
> > > -VHOST_VDPA_DEV_MAX, GFP_KERNEL);
> > > + minor = ida_alloc_max(_vdpa_ida, VHOST_VDPA_DEV_MAX - 1,
> > > +   GFP_KERNEL);
> > >   if (minor < 0) {
> > >   kfree(v);
> > >   return minor;
> > > -- 
> > > 2.43.0
> > 
> > 
> > 




Re: [PATCH] vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

2024-04-14 Thread Christophe JAILLET

Le 14/04/2024 à 10:35, Michael S. Tsirkin a écrit :

On Mon, Jan 15, 2024 at 09:35:50PM +0100, Christophe JAILLET wrote:

ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, buInputt the one of


What's buInputt? But?


Yes, sorry. It is "but".

Let me know if I should send a v2, or if it can be fixed when it is applied.

CJ




ida_alloc_max() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET 



Jason, wanna ack?


---
  drivers/vhost/vdpa.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index bc4a51e4638b..849b9d2dd51f 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1534,7 +1534,7 @@ static void vhost_vdpa_release_dev(struct device *device)
struct vhost_vdpa *v =
   container_of(device, struct vhost_vdpa, dev);
  
-	ida_simple_remove(_vdpa_ida, v->minor);

+   ida_free(_vdpa_ida, v->minor);
kfree(v->vqs);
kfree(v);
  }
@@ -1557,8 +1557,8 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
if (!v)
return -ENOMEM;
  
-	minor = ida_simple_get(_vdpa_ida, 0,

-  VHOST_VDPA_DEV_MAX, GFP_KERNEL);
+   minor = ida_alloc_max(_vdpa_ida, VHOST_VDPA_DEV_MAX - 1,
+ GFP_KERNEL);
if (minor < 0) {
kfree(v);
return minor;
--
2.43.0









Re: [PATCH] vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

2024-04-14 Thread Michael S. Tsirkin
On Mon, Jan 15, 2024 at 09:35:50PM +0100, Christophe JAILLET wrote:
> ida_alloc() and ida_free() should be preferred to the deprecated
> ida_simple_get() and ida_simple_remove().
> 
> Note that the upper limit of ida_simple_get() is exclusive, buInputt the one 
> of

What's buInputt? But?

> ida_alloc_max() is inclusive. So a -1 has been added when needed.
> 
> Signed-off-by: Christophe JAILLET 


Jason, wanna ack?

> ---
>  drivers/vhost/vdpa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index bc4a51e4638b..849b9d2dd51f 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1534,7 +1534,7 @@ static void vhost_vdpa_release_dev(struct device 
> *device)
>   struct vhost_vdpa *v =
>  container_of(device, struct vhost_vdpa, dev);
>  
> - ida_simple_remove(_vdpa_ida, v->minor);
> + ida_free(_vdpa_ida, v->minor);
>   kfree(v->vqs);
>   kfree(v);
>  }
> @@ -1557,8 +1557,8 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
>   if (!v)
>   return -ENOMEM;
>  
> - minor = ida_simple_get(_vdpa_ida, 0,
> -VHOST_VDPA_DEV_MAX, GFP_KERNEL);
> + minor = ida_alloc_max(_vdpa_ida, VHOST_VDPA_DEV_MAX - 1,
> +   GFP_KERNEL);
>   if (minor < 0) {
>   kfree(v);
>   return minor;
> -- 
> 2.43.0




Re: [PATCH] vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

2024-04-14 Thread Christophe JAILLET

Le 16/01/2024 à 15:57, Simon Horman a écrit :

On Mon, Jan 15, 2024 at 09:35:50PM +0100, Christophe JAILLET wrote:

ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, buInputt the one of
ida_alloc_max() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET 


Reviewed-by: Simon Horman 





Hi,

polite reminder ;-)

CJ



[GIT PULL] virtio: bugfixes

2024-04-14 Thread Michael S. Tsirkin
The following changes since commit fec50db7033ea478773b159e0e2efb135270e3b7:

  Linux 6.9-rc3 (2024-04-07 13:22:46 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 76f408535aab39c33e0a1dcada9fba5631c65595:

  vhost: correct misleading printing information (2024-04-08 04:11:04 -0400)


virtio: bugfixes

Some small, obvious (in hindsight) bugfixes:

- new ioctl in vhost-vdpa has a wrong # - not too late to fix

- vhost has apparently been lacking an smp_rmb() -
  due to code duplication :( The duplication will be fixed in
  the next merge cycle, this is a minimal fix.

- an error message in vhost talks about guest moving used index -
  which of course never happens, guest only ever moves the
  available index.

- i2c-virtio didn't set the driver owner so it did not get
  refcounted correctly.

Signed-off-by: Michael S. Tsirkin 


Gavin Shan (2):
  vhost: Add smp_rmb() in vhost_vq_avail_empty()
  vhost: Add smp_rmb() in vhost_enable_notify()

Krzysztof Kozlowski (1):
  virtio: store owner from modules with register_virtio_driver()

Michael S. Tsirkin (1):
  vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE

Xianting Tian (1):
  vhost: correct misleading printing information

 .../driver-api/virtio/writing_virtio_drivers.rst   |  1 -
 drivers/vhost/vhost.c  | 30 ++
 drivers/virtio/virtio.c|  6 +++--
 include/linux/virtio.h |  7 +++--
 include/uapi/linux/vhost.h | 15 ++-
 5 files changed, 42 insertions(+), 17 deletions(-)




Re: [RFC PATCH 5/7] x86/module: perpare module loading for ROX allocations of text

2024-04-14 Thread Mike Rapoport
On Fri, Apr 12, 2024 at 11:08:00AM +0200, Ingo Molnar wrote:
> 
> * Mike Rapoport  wrote:
> 
> > for (s = start; s < end; s++) {
> > void *addr = (void *)s + *s;
> > +   void *wr_addr = addr + module_writable_offset(mod, addr);
> 
> So instead of repeating this pattern in a dozen of places, why not use a 
> simpler method:
> 
>   void *wr_addr = module_writable_address(mod, addr);
> 
> or so, since we have to pass 'addr' to the module code anyway.

Agree.
 
> The text patching code is pretty complex already.
> 
> Thanks,
> 
>   Ingo

-- 
Sincerely yours,
Mike.



Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations

2024-04-14 Thread Mike Rapoport
On Fri, Apr 12, 2024 at 06:07:19AM +, Christophe Leroy wrote:
> 
> 
> Le 11/04/2024 à 18:05, Mike Rapoport a écrit :
> > From: "Mike Rapoport (IBM)" 
> > 
> > vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explictly
> > specify node ID will use huge pages only if size_per_node is larger than
> > PMD_SIZE.
> > Still the actual allocated memory is not distributed between nodes and
> > there is no advantage in such approach.
> > On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> > new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
> > 
> > Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
> > NUMA_NO_NODE and use huge pages whenever the requested allocation size
> > is larger than PMD_SIZE.
> 
> Patch looks ok but message is confusing. We also use huge pages at PTE 
> size, for instance 512k pages or 16k pages on powerpc 8xx, while 
> PMD_SIZE is 4M.

Ok, I'll rephrase.
 
> Christophe
> 
> > 
> > Signed-off-by: Mike Rapoport (IBM) 
> > ---
> >   mm/vmalloc.c | 9 ++---
> >   1 file changed, 2 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 22aa63f4ef63..5fc8b514e457 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, 
> > unsigned long align,
> > }
> >   
> > if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> > -   unsigned long size_per_node;
> > -
> > /*
> >  * Try huge pages. Only try for PAGE_KERNEL allocations,
> >  * others like modules don't yet expect huge pages in
> > @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, 
> > unsigned long align,
> >  * supporting them.
> >  */
> >   
> > -   size_per_node = size;
> > -   if (node == NUMA_NO_NODE)
> > -   size_per_node /= num_online_nodes();
> > -   if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> > +   if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> > shift = PMD_SHIFT;
> > else
> > -   shift = arch_vmap_pte_supported_shift(size_per_node);
> > +   shift = arch_vmap_pte_supported_shift(size);
> >   
> > align = max(real_align, 1UL << shift);
> > size = ALIGN(real_size, 1UL << shift);

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-14 Thread Mike Rapoport
On Thu, Apr 11, 2024 at 10:53:46PM +0200, Sam Ravnborg wrote:
> Hi Mike.
> 
> On Thu, Apr 11, 2024 at 07:00:42PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" 
> > 
> > Several architectures override module_alloc() only to define address
> > range for code allocations different than VMALLOC address space.
> > 
> > Provide a generic implementation in execmem that uses the parameters for
> > address space ranges, required alignment and page protections provided
> > by architectures.
> > 
> > The architectures must fill execmem_info structure and implement
> > execmem_arch_setup() that returns a pointer to that structure. This way the
> > execmem initialization won't be called from every architecture, but rather
> > from a central place, namely a core_initcall() in execmem.
> > 
> > The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> > with the parameters defined by the architectures.  If an architecture does
> > not implement execmem_arch_setup(), execmem_alloc() will fall back to
> > module_alloc().
> > 
> > Signed-off-by: Mike Rapoport (IBM) 
> > ---
> 
> This code snippet could be more readable ...
> > diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> > index 66c45a2764bc..b70047f944cc 100644
> > --- a/arch/sparc/kernel/module.c
> > +++ b/arch/sparc/kernel/module.c
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -21,34 +22,26 @@
> >  
> >  #include "entry.h"
> >  
> > +static struct execmem_info execmem_info __ro_after_init = {
> > +   .ranges = {
> > +   [EXECMEM_DEFAULT] = {
> >  #ifdef CONFIG_SPARC64
> > -
> > -#include 
> > -
> > -static void *module_map(unsigned long size)
> > -{
> > -   if (PAGE_ALIGN(size) > MODULES_LEN)
> > -   return NULL;
> > -   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> > -   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> > -   __builtin_return_address(0));
> > -}
> > +   .start = MODULES_VADDR,
> > +   .end = MODULES_END,
> >  #else
> > -static void *module_map(unsigned long size)
> > +   .start = VMALLOC_START,
> > +   .end = VMALLOC_END,
> > +#endif
> > +   .alignment = 1,
> > +   },
> > +   },
> > +};
> > +
> > +struct execmem_info __init *execmem_arch_setup(void)
> >  {
> > -   return vmalloc(size);
> > -}
> > -#endif /* CONFIG_SPARC64 */
> > -
> > -void *module_alloc(unsigned long size)
> > -{
> > -   void *ret;
> > -
> > -   ret = module_map(size);
> > -   if (ret)
> > -   memset(ret, 0, size);
> > +   execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
> >  
> > -   return ret;
> > +   return _info;
> >  }
> >  
> >  /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
> 
> ... if the following was added:
> 
> diff --git a/arch/sparc/include/asm/pgtable_32.h 
> b/arch/sparc/include/asm/pgtable_32.h
> index 9e85d57ac3f2..62bcafe38b1f 100644
> --- a/arch/sparc/include/asm/pgtable_32.h
> +++ b/arch/sparc/include/asm/pgtable_32.h
> @@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct 
> vm_area_struct *vma,
> 
>  #define VMALLOC_START   _AC(0xfe60,UL)
>  #define VMALLOC_END _AC(0xffc0,UL)
> +#define MODULES_VADDR   VMALLOC_START
> +#define MODULES_END VMALLOC_END
> 
> 
> Then the #ifdef CONFIG_SPARC64 could be dropped and the code would be
> the same for 32 and 64 bits.
 
Yeah, the #ifdef there can be dropped even regardless of execmem.
I'll add a patch for that.

> Just a drive-by comment.
> 
>   Sam
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-14 Thread Mike Rapoport
On Fri, Apr 12, 2024 at 11:16:10AM +0200, Ingo Molnar wrote:
> 
> * Mike Rapoport  wrote:
> 
> > +/**
> > + * enum execmem_type - types of executable memory ranges
> > + *
> > + * There are several subsystems that allocate executable memory.
> > + * Architectures define different restrictions on placement,
> > + * permissions, alignment and other parameters for memory that can be used
> > + * by these subsystems.
> > + * Types in this enum identify subsystems that allocate executable memory
> > + * and let architectures define parameters for ranges suitable for
> > + * allocations by each subsystem.
> > + *
> > + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> > + * are not explcitly defined.
> > + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> > + * @EXECMEM_KPROBES: parameters for kprobes
> > + * @EXECMEM_FTRACE: parameters for ftrace
> > + * @EXECMEM_BPF: parameters for BPF
> > + * @EXECMEM_TYPE_MAX:
> > + */
> > +enum execmem_type {
> > +   EXECMEM_DEFAULT,
> > +   EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> > +   EXECMEM_KPROBES,
> > +   EXECMEM_FTRACE,
> > +   EXECMEM_BPF,
> > +   EXECMEM_TYPE_MAX,
> > +};
> 
> s/explcitly
>  /explicitly
 
Sure, thanks

> Thanks,
> 
>   Ingo

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-14 Thread Mike Rapoport
On Thu, Apr 11, 2024 at 12:42:05PM -0700, Luis Chamberlain wrote:
> On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" 
> > 
> > module_alloc() is used everywhere as a mean to allocate memory for code.
> > 
> > Beside being semantically wrong, this unnecessarily ties all subsystems
> > that need to allocate code, such as ftrace, kprobes and BPF to modules and
> > puts the burden of code allocation to the modules code.
> > 
> > Several architectures override module_alloc() because of various
> > constraints where the executable memory can be located and this causes
> > additional obstacles for improvements of code allocation.
> > 
> > Start splitting code allocation from modules by introducing execmem_alloc()
> > and execmem_free() APIs.
> > 
> > Initially, execmem_alloc() is a wrapper for module_alloc() and
> > execmem_free() is a replacement of module_memfree() to allow updating all
> > call sites to use the new APIs.
> > 
> > Since architectures define different restrictions on placement,
> > permissions, alignment and other parameters for memory that can be used by
> > different subsystems that allocate executable memory, execmem_alloc() takes
> > a type argument, that will be used to identify the calling subsystem and to
> > allow architectures define parameters for ranges suitable for that
> > subsystem.
> 
> It would be good to describe this is a non-fuctional change.

Ok.
 
> > Signed-off-by: Mike Rapoport (IBM) 
> > ---
> 
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > new file mode 100644
> > index ..ed2ea41a2543
> > --- /dev/null
> > +++ b/mm/execmem.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> And this just needs to copy over the copyright notices from the main.c file.

Will do.
 
>   Luis

-- 
Sincerely yours,
Mike.