Re: [PATCH 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-01-17 Thread kernel test robot
Hi Arnaud,

kernel test robot noticed the following build errors:

[auto build test ERROR on remoteproc/rproc-next]
[also build test ERROR on linus/master v6.7 next-20240117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Arnaud-Pouliquen/remoteproc-Add-TEE-support/20240115-215613
base:   git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git rproc-next
patch link:
https://lore.kernel.org/r/20240115135249.296822-5-arnaud.pouliquen%40foss.st.com
patch subject: [PATCH 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware
config: arm64-randconfig-r081-20240118 (https://download.01.org/0day-ci/archive/20240118/202401181522.bvg4emu4-...@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240118/202401181522.bvg4emu4-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202401181522.bvg4emu4-...@intel.com/

All errors (new ones prefixed by >>):

   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_load_fw':
>> tee_remoteproc.c:(.text+0x138): undefined reference to `tee_shm_register_kernel_buf'
>> tee_remoteproc.c:(.text+0x138): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_shm_register_kernel_buf'
>> aarch64-linux-ld: tee_remoteproc.c:(.text+0x18c): undefined reference to `tee_client_invoke_func'
>> tee_remoteproc.c:(.text+0x18c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_invoke_func'
>> aarch64-linux-ld: tee_remoteproc.c:(.text+0x1ac): undefined reference to `tee_shm_free'
>> tee_remoteproc.c:(.text+0x1ac): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_shm_free'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_start':
>> tee_remoteproc.c:(.text+0x240): undefined reference to `tee_client_invoke_func'
   tee_remoteproc.c:(.text+0x240): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_invoke_func'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `rproc_tee_get_rsc_table':
   tee_remoteproc.c:(.text+0x2dc): undefined reference to `tee_client_invoke_func'
   tee_remoteproc.c:(.text+0x2dc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_invoke_func'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_stop':
   tee_remoteproc.c:(.text+0x408): undefined reference to `tee_client_invoke_func'
   tee_remoteproc.c:(.text+0x408): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_invoke_func'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_register':
>> tee_remoteproc.c:(.text+0x51c): undefined reference to `tee_client_open_session'
>> tee_remoteproc.c:(.text+0x51c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_open_session'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_unregister':
>> tee_remoteproc.c:(.text+0x624): undefined reference to `tee_client_close_session'
>> tee_remoteproc.c:(.text+0x624): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_close_session'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_probe':
>> tee_remoteproc.c:(.text+0x6b0): undefined reference to `tee_client_open_context'
>> tee_remoteproc.c:(.text+0x6b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_open_context'
>> aarch64-linux-ld: tee_remoteproc.c:(.text+0x6e0): undefined reference to `tee_client_close_context'
>> tee_remoteproc.c:(.text+0x6e0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `tee_client_close_context'
   aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o: in function `tee_rproc_remove':
   tee_remoteproc.c:(.text+0x78c): undefined reference to `tee_client_close_session'
   tee_remoteproc.c:(.text+0x78c): additional relocation overflows omitted from the output
   aarch64-linux-ld: tee_remoteproc.c:(.text+0x7dc): undefined reference to `tee_client_close_context'
>> aarch64-linux-ld: drivers/remoteproc/tee_remoteproc.o:(.data+0x10): undefined reference to `tee_bus_type'
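Every unresolved symbol above (`tee_shm_register_kernel_buf`, `tee_client_invoke_func`, `tee_bus_type`, ...) is exported by the kernel's TEE subsystem, so randconfig builds like this one typically fail when the new driver is selectable while CONFIG_TEE/CONFIG_OPTEE is disabled (or modular while the driver is built in). A sketch of the kind of Kconfig guard that usually resolves such link errors follows; the option name and help text are assumptions for illustration, not the wording of the actual series:

```kconfig
config TEE_REMOTEPROC
	tristate "Remoteproc support for TEE-managed firmware"
	depends on OPTEE
	help
	  Support for loading and authenticating remote processor firmware
	  through a trusted application running in OP-TEE.  The OPTEE
	  dependency guarantees the tee_client_* and tee_shm_* symbols are
	  available whenever this driver is built.
```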

Kconfig warnings

Re: [PATCH v2] kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer

2024-01-17 Thread Steven Rostedt
On Thu, 18 Jan 2024 02:18:42 +
Chen Zhongjin  wrote:

> There is a deadlock scenario in kprobe_optimizer():
> 
> pid A                           pid B                   pid C
> kprobe_optimizer()              do_exit()               perf_kprobe_init()
> mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()  mutex_lock(&kprobe_mutex)
> synchronize_rcu_tasks()         zap_pid_ns_processes()  // waiting kprobe_mutex
> // waiting tasks_rcu_exit_srcu  kernel_wait4()
>                                 // waiting pid C exit
> 
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in kprobe_optimizer()
> rather than synchronize_rcu_tasks(). synchronize_rcu_tasks_rude() can also promise
> that all preempted tasks have scheduled, but it will not wait for tasks_rcu_exit_srcu.
> 

Did lockdep detect this? If not, we should fix that.

I'm also wondering whether we should find another solution, as this seems more
of a workaround than a fix.

> Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
> Signed-off-by: Chen Zhongjin 
> ---
> v1 -> v2: Add Fixes tag
> ---
>  arch/Kconfig | 2 +-
>  kernel/kprobes.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index f4b210ab0612..dc6a18854017 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
>  config OPTPROBES
>   def_bool y
>   depends on KPROBES && HAVE_OPTPROBES
> - select TASKS_RCU if PREEMPTION
> + select TASKS_RUDE_RCU

Is this still a bug if PREEMPTION is not enabled?

-- Steve

>  
>  config KPROBES_ON_FTRACE
>   def_bool y
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d5a0ee40bf66..09056ae50c58 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
>* Note that on non-preemptive kernel, this is transparently converted
>* to synchronoze_sched() to wait for all interrupts to have completed.
>*/
> - synchronize_rcu_tasks();
> + synchronize_rcu_tasks_rude();
>  
>   /* Step 3: Optimize kprobes after quiesence period */
>   do_optimize_kprobes();




Re: [PATCH net] ipvs: Simplify the allocation of ip_vs_conn slab caches

2024-01-17 Thread Kunwu Chan

Hi Simon,

Thanks for your reply.

On 2024/1/17 17:29, Simon Horman wrote:

On Wed, Jan 17, 2024 at 03:20:45PM +0800, Kunwu Chan wrote:

Use the new KMEM_CACHE() macro instead of calling kmem_cache_create()
directly, to simplify the creation of SLAB caches.

Signed-off-by: Kunwu Chan 


Hi Kunwu Chan,

I think this is more of a cleanup than a fix,
so it should probably be targeted at 'nf-next' rather than 'net'.

Thanks, I'm confused about when to use "nf-next" vs "nf" vs "net" vs "net-next".
Does "nf-next" mean fixing errors for linux-next.git and linux-stable.git,
while "nf" or "net" just means linux-next.git?




If it is a fix, then I would suggest targeting it at 'nf'
and providing a Fixes tag.

I'll keep it in mind in the future.


The above notwithstanding, this looks good to me.

Acked-by: Simon Horman 


---
  net/netfilter/ipvs/ip_vs_conn.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a743db073887..98d7dbe3d787 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1511,9 +1511,7 @@ int __init ip_vs_conn_init(void)
return -ENOMEM;
  
 	/* Allocate ip_vs_conn slab cache */
-	ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
-					      sizeof(struct ip_vs_conn), 0,
-					      SLAB_HWCACHE_ALIGN, NULL);
+	ip_vs_conn_cachep = KMEM_CACHE(ip_vs_conn, SLAB_HWCACHE_ALIGN);
 	if (!ip_vs_conn_cachep) {
 		kvfree(ip_vs_conn_tab);
 		return -ENOMEM;

--
Thanks,
  Kunwu




[PATCH v2] kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer

2024-01-17 Thread Chen Zhongjin
There is a deadlock scenario in kprobe_optimizer():

pid A                           pid B                   pid C
kprobe_optimizer()              do_exit()               perf_kprobe_init()
mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()  mutex_lock(&kprobe_mutex)
synchronize_rcu_tasks()         zap_pid_ns_processes()  // waiting kprobe_mutex
// waiting tasks_rcu_exit_srcu  kernel_wait4()
                                // waiting pid C exit

To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in kprobe_optimizer()
rather than synchronize_rcu_tasks(). synchronize_rcu_tasks_rude() can also promise
that all preempted tasks have scheduled, but it will not wait for tasks_rcu_exit_srcu.

Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
Signed-off-by: Chen Zhongjin 
---
v1 -> v2: Add Fixes tag
---
 arch/Kconfig | 2 +-
 kernel/kprobes.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index f4b210ab0612..dc6a18854017 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
 config OPTPROBES
def_bool y
depends on KPROBES && HAVE_OPTPROBES
-   select TASKS_RCU if PREEMPTION
+   select TASKS_RUDE_RCU
 
 config KPROBES_ON_FTRACE
def_bool y
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d5a0ee40bf66..09056ae50c58 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
 * Note that on non-preemptive kernel, this is transparently converted
 * to synchronoze_sched() to wait for all interrupts to have completed.
 */
-   synchronize_rcu_tasks();
+   synchronize_rcu_tasks_rude();
 
/* Step 3: Optimize kprobes after quiesence period */
do_optimize_kprobes();
-- 
2.25.1




Re: [GIT PULL] remoteproc updates for v6.8

2024-01-17 Thread pr-tracker-bot
The pull request you sent on Sun, 14 Jan 2024 19:35:34 -0800:

> https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git tags/rproc-v6.8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e88481f74fee690316c1e0fd3777f919067d2d58

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [GIT PULL] rpmsg updates for v6.8

2024-01-17 Thread pr-tracker-bot
The pull request you sent on Sun, 14 Jan 2024 19:24:43 -0800:

> https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git tags/rpmsg-v6.8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2a43434675b2114e8f909a5039cc421d35d35ce9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [GIT PULL] hwspinlock updates to v6.8

2024-01-17 Thread pr-tracker-bot
The pull request you sent on Sun, 14 Jan 2024 19:40:55 -0800:

> https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git tags/hwlock-v6.8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a2ec2071cad819fe952a3f1ef66f2d2112c27b6e

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCH 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-01-17 Thread kernel test robot
Hi Arnaud,

kernel test robot noticed the following build warnings:

[auto build test WARNING on remoteproc/rproc-next]
[also build test WARNING on robh/for-next linus/master v6.7 next-20240117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Arnaud-Pouliquen/remoteproc-Add-TEE-support/20240115-215613
base:   git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git rproc-next
patch link:
https://lore.kernel.org/r/20240115135249.296822-5-arnaud.pouliquen%40foss.st.com
patch subject: [PATCH 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware
config: mips-randconfig-r112-20240117 (https://download.01.org/0day-ci/archive/20240118/202401180740.1ud9pjyn-...@intel.com/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 9bde5becb44ea071f5e1fa1f5d4071dc8788b18c)
reproduce: (https://download.01.org/0day-ci/archive/20240118/202401180740.1ud9pjyn-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202401180740.1ud9pjyn-...@intel.com/

sparse warnings: (new ones prefixed by >>)

   drivers/remoteproc/tee_remoteproc.c:82:26: sparse: sparse: symbol 'tee_rproc_ctx' was not declared. Should it be static?
   drivers/remoteproc/tee_remoteproc.c:166:24: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void *rsc_va @@ got void [noderef] __iomem * @@
   drivers/remoteproc/tee_remoteproc.c:166:24: sparse:     expected void *rsc_va
   drivers/remoteproc/tee_remoteproc.c:166:24: sparse:     got void [noderef] __iomem *
>> drivers/remoteproc/tee_remoteproc.c:233:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected void const volatile [noderef] __iomem *addr @@ got void *rsc_va @@
   drivers/remoteproc/tee_remoteproc.c:233:31: sparse:     expected void const volatile [noderef] __iomem *addr
   drivers/remoteproc/tee_remoteproc.c:233:31: sparse:     got void *rsc_va
   drivers/remoteproc/tee_remoteproc.c: note: in included file (through include/linux/preempt.h, include/linux/spinlock.h, include/linux/mmzone.h, ...):
   include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true

vim +233 drivers/remoteproc/tee_remoteproc.c

6805d1065198e1 Arnaud Pouliquen 2024-01-15  215  
6805d1065198e1 Arnaud Pouliquen 2024-01-15  216  int tee_rproc_stop(struct tee_rproc *trproc)
6805d1065198e1 Arnaud Pouliquen 2024-01-15  217  {
6805d1065198e1 Arnaud Pouliquen 2024-01-15  218  	struct tee_ioctl_invoke_arg arg;
6805d1065198e1 Arnaud Pouliquen 2024-01-15  219  	struct tee_param param[MAX_TEE_PARAM_ARRY_MEMBER];
6805d1065198e1 Arnaud Pouliquen 2024-01-15  220  	int ret;
6805d1065198e1 Arnaud Pouliquen 2024-01-15  221  
6805d1065198e1 Arnaud Pouliquen 2024-01-15  222  	prepare_args(trproc, TA_RPROC_FW_CMD_STOP_FW, &arg, param, 0);
6805d1065198e1 Arnaud Pouliquen 2024-01-15  223  
6805d1065198e1 Arnaud Pouliquen 2024-01-15  224  	ret = tee_client_invoke_func(tee_rproc_ctx->tee_ctx, &arg, param);
6805d1065198e1 Arnaud Pouliquen 2024-01-15  225  	if (ret < 0 || arg.ret != 0) {
6805d1065198e1 Arnaud Pouliquen 2024-01-15  226  		dev_err(tee_rproc_ctx->dev,
6805d1065198e1 Arnaud Pouliquen 2024-01-15  227  			"TA_RPROC_FW_CMD_STOP_FW invoke failed TEE err: %x, ret:%x\n",
6805d1065198e1 Arnaud Pouliquen 2024-01-15  228  			arg.ret, ret);
6805d1065198e1 Arnaud Pouliquen 2024-01-15  229  		if (!ret)
6805d1065198e1 Arnaud Pouliquen 2024-01-15  230  			ret = -EIO;
6805d1065198e1 Arnaud Pouliquen 2024-01-15  231  	}
6805d1065198e1 Arnaud Pouliquen 2024-01-15  232  	if (trproc->rsc_va)
6805d1065198e1 Arnaud Pouliquen 2024-01-15 @233  		iounmap(trproc->rsc_va);
6805d1065198e1 Arnaud Pouliquen 2024-01-15  234  	trproc->rsc_va = NULL;
6805d1065198e1 Arnaud Pouliquen 2024-01-15  235  
6805d1065198e1 Arnaud Pouliquen 2024-01-15  236  	return ret;
6805d1065198e1 Arnaud Pouliquen 2024-01-15  237  }
6805d1065198e1 Arnaud Pouliquen 2024-01-15  238  EXPORT_SYMBOL_GPL(tee_rproc_stop);
6805d1065198e1 Arnaud Pouliquen 2024-01-15  239  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

2024-01-17 Thread SeongJae Park
On Wed, 17 Jan 2024 13:11:03 -0800 SeongJae Park  wrote:

[...]
> Hi Honggyu,
> 
> On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim  wrote:
> 
> > Hi SeongJae,
> > 
> > Thanks very much for your comments in details.
> > 
> > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park  wrote:
> > 
[...]
> > > To this end, I feel the problem might be able to be simpler, because this
> > > patchset is trying to provide two sophisticated operations, while I think a
> > > simpler approach might be possible.  My humble simpler idea is adding a DAMOS
> > > operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> > > instead of the promote/demote.  Because the general pages migration can handle
> > > multiple cases including the promote/demote in my humble assumption.
[...]
> > > In more detail, users could decide which is the appropriate node for promotion
> > > or demotion and use the new DAMOS action to do promotion and demotion.  Users
> > > would be requested to decide which node is the proper promotion/demotion target
> > > node, but that decision wouldn't be that hard in my opinion.
> > > 
> > > For this, 'struct damos' would need to be updated for such argument-dependent
> > > actions, like 'struct damos_filter' is having a union.
> > 
> > That might be a better solution.  I will think about it.
> 
> More specifically, I think receiving an address range as the argument might be
> more flexible than just a NUMA node.  Maybe we can imagine proactively migrating
> cold movable pages from normal zones to movable zones, to avoid normal zone
> memory pressure.

Yet another crazy idea: finding hot regions in the middle of a cold region and
moving them next to other hot pages.  As a result, memory is sorted by access
temperature even within the same node, and the system gains more spatial locality,
which benefits general locality-based algorithms including DAMON's adaptive
regions adjustment.


Thanks,
SJ

[...]



Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

2024-01-17 Thread SeongJae Park
Hi Honggyu,

On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim  wrote:

> Hi SeongJae,
> 
> Thanks very much for your comments in details.
> 
> On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park  wrote:
> 
> > Thank you so much for these great patches and the above nice test results.  I
> > believe the test setup and results make sense, and merging a revised version
> > of this patchset would provide real benefits to the users.
> 
> Glad to hear that!
> 
> > In a high level, I think it might be better to separate DAMON internal changes
> > from DAMON external changes.
> 
> I agree.  I can't guarantee that I can move all the external changes
> inside mm/damon, but will try that as much as possible.
> 
> > For DAMON part changes, I have no big concern other than trivial coding
> > style level comments.
> 
> Sure.  I will fix those.
> 
> > For DAMON-external changes that implement demote_pages() and
> > promote_pages(), I'm unsure if the implementation is reusing appropriate
> > functions, and if those are placed in the right source file.  Especially, I'm
> > unsure if vmscan.c is the right place for the promotion code.  Also I don't
> > know if there is a good agreement on the promotion/demotion target node
> > decision.  That should be because I'm not that familiar with the areas and
> > the files, but I feel this might be because our discussions on the promotion
> > and the demotion operations still have room to mature.  Because I'm not very
> > familiar with this part, I'd like to hear others' comments, too.
> 
> I would also like to hear others' comments, but this might not be needed
> if most of external code can be moved to mm/damon.
> 
> > To this end, I feel the problem might be able to be simpler, because this
> > patchset is trying to provide two sophisticated operations, while I think a
> > simpler approach might be possible.  My humble simpler idea is adding a DAMOS
> > operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> > instead of the promote/demote.  Because the general pages migration can handle
> > multiple cases including the promote/demote in my humble assumption.
> 
> My initial implementation was similar but I found that it's not accurate
> enough due to the nature of inaccuracy of DAMON regions.  I saw that
> many pages were demoted and promoted back and forth because migration
> target regions include both hot and cold pages together.
> 
> So I have implemented the demotion and promotion logics based on the
> shrink_folio_list, which contains many corner case handling logics for
> reclaim.
> 
> Having the current demotion and promotion logics makes the hot/cold
> migration pretty accurate as expected.  We made a simple program called
> "hot_cold" and it receives 2 arguments for hot size and cold size in MB.
> For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
> of cold memory.  It basically allocates 2 large blocks of memory with
> mmap, then repeats memset for the initial 200MB to make it accessed in an
> infinite loop.
> 
> Let's say there are 3 nodes in the system and the first node0 and node1
> are the first tier, and node2 is the second tier.
> 
>   $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
>   0-1
> 
>   $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
>   2
> 
> Here is the result of partitioning hot/cold memory; I put the execution
> command at the right side of the numastat result.  I initially ran each
> hot_cold program with a preferred setting so that they initially allocate
> memory on one of node0 or node2, but pages gradually migrated based on
> their access frequencies.
> 
>   $ numastat -c -p hot_cold
>   Per-node process memory usage (in MBs)
>   PID               Node 0  Node 1  Node 2  Total
>   ----------------  ------  ------  ------  -----
>   754 (hot_cold)      1800       0    2000   3800  <- hot_cold 1800 2000
>   1184 (hot_cold)      300       0     500    800  <- hot_cold 300 500
>   1818 (hot_cold)      801       0    3199   4000  <- hot_cold 800 3200
>   30289 (hot_cold)       4       0       5     10  <- hot_cold 3 5
>   30325 (hot_cold)      31       0      51     81  <- hot_cold 30 50
>   ----------------  ------  ------  ------  -----
>   Total               2938       0    5756   8695
> 
> The final node placement result shows that DAMON accurately migrated
> pages by their hotness for multiple processes.

What was the result when the corner cases handling logics were not applied?

And, what are the corner cases handling logics that seemed essential?  I assume
the page granularity active/reference check could indeed provide many
improvements, but that's only my humble assumption.

If the corner cases are indeed better handled in page granularity, I
agree we need some more effort since DAMON monitoring results are not page
granularity aware by design.  Users could increase min_nr_regions to make
it more accurate, and we have a plan to support page granularity 

Re: [RFC V1 01/13] vhost-vdpa: count pinned memory

2024-01-17 Thread Steven Sistare
On 1/10/2024 5:24 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 10, 2024 at 12:40:03PM -0800, Steve Sistare wrote:
>> Remember the count of pinned memory for the device.
>>
>> Signed-off-by: Steve Sistare 
> 
> Can we have iommufd support in vdpa so we do not keep extending these hacks?

I assume this is rhetorical and not aimed specifically at me, but live update
interfaces for iommufd are on my todo list.

- Steve

>> ---
>>  drivers/vhost/vdpa.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index da7ec77cdaff..10fb95bcca1a 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -59,6 +59,7 @@ struct vhost_vdpa {
>>  int in_batch;
>>  struct vdpa_iova_range range;
>>  u32 batch_asid;
>> +long pinned_vm;
>>  };
>>  
>>  static DEFINE_IDA(vhost_vdpa_ida);
>> @@ -893,6 +894,7 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
>>  			unpin_user_page(page);
>>  		}
>>  		atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm);
>> +		v->pinned_vm -= PFN_DOWN(map->size);
>>  		vhost_vdpa_general_unmap(v, map, asid);
>>  		vhost_iotlb_map_free(iotlb, map);
>>  	}
>> @@ -975,9 +977,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct 
>> vhost_iotlb *iotlb,
>>  return r;
>>  }
>>  
>> -	if (!vdpa->use_va)
>> +	if (!vdpa->use_va) {
>>  		atomic64_add(PFN_DOWN(size), &dev->mm->pinned_vm);
>> -
>> +		v->pinned_vm += PFN_DOWN(size);
>> +	}
>>  return 0;
>>  }
>>  
>> -- 
>> 2.39.3
> 



Re: [RFC V1 00/13] vdpa live update

2024-01-17 Thread Steven Sistare
On 1/10/2024 9:55 PM, Jason Wang wrote:
> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  
> wrote:
>>
>> Live update is a technique wherein an application saves its state, exec's
>> to an updated version of itself, and restores its state.  Clients of the
>> application experience a brief suspension of service, on the order of
>> 100's of milliseconds, but are otherwise unaffected.
>>
>> Define and implement interfaces that allow vdpa devices to be preserved
>> across fork or exec, to support live update for applications such as qemu.
>> The device must be suspended during the update, but its dma mappings are
>> preserved, so the suspension is brief.
>>
>> The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
>> accounting from one process to another.
>>
>> The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
>> VHOST_NEW_OWNER is supported.
>>
>> The VHOST_IOTLB_REMAP message type updates a dma mapping with its userland
>> address in the new process.
>>
>> The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
>> VHOST_IOTLB_REMAP is supported and required.  Some devices do not
>> require it, because the userland address of each dma mapping is discarded
>> after being translated to a physical address.
>>
>> Here is a pseudo-code sequence for performing live update, based on
>> suspend + reset because resume is not yet available.  The vdpa device
>> descriptor, fd, remains open across the exec.
>>
>>   ioctl(fd, VHOST_VDPA_SUSPEND)
>>   ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
>>   exec
> 
> Is there a userspace implementation as a reference?

I have working patches for qemu that use these ioctls, but they depend on other
qemu cpr patches that are a work in progress and not posted yet.  I'm working
on that.

>>   ioctl(fd, VHOST_NEW_OWNER)
>>
>>   issue ioctls to re-create vrings
>>
>>   if VHOST_BACKEND_F_IOTLB_REMAP
>>   foreach dma mapping
>>   write(fd, {VHOST_IOTLB_REMAP, new_addr})
> 
> I think I need to understand the advantages of this approach. For
> example, why it is better than
> 
> ioctl(VHOST_RESET_OWNER)
> exec
> 
> ioctl(VHOST_SET_OWNER)
> 
> for each dma mapping
>  ioctl(VHOST_IOTLB_UPDATE)

That is slower.  VHOST_RESET_OWNER unbinds physical pages, and VHOST_IOTLB_UPDATE
rebinds them.  It costs multiple seconds for large memories, and is incurred during
the virtual machine's pause time during live update.  For comparison, the total
pause time for live update with vfio interfaces is ~100 millis.

However, the interaction with userland is so similar that the same code paths can
be used.  In my qemu prototype, after cpr exec's new qemu:
  - vhost_vdpa_set_owner() calls VHOST_NEW_OWNER instead of VHOST_SET_OWNER
  - vhost_vdpa_dma_map() sets type VHOST_IOTLB_REMAP instead of VHOST_IOTLB_UPDATE

- Steve




Re: [RFC V1 05/13] vhost-vdpa: VHOST_IOTLB_REMAP

2024-01-17 Thread Steven Sistare
On 1/10/2024 10:08 PM, Jason Wang wrote:
> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  
> wrote:
>>
>> When device ownership is passed to a new process via VHOST_NEW_OWNER,
>> some devices need to know the new userland addresses of the dma mappings.
>> Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr
>> of a mapping.  The new uaddr must address the same memory object as
>> originally mapped.
>>
>> The user must suspend the device before the old address is invalidated,
>> and cannot resume it until after VHOST_IOTLB_REMAP is called, but this
>> requirement is not enforced by the API.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  drivers/vhost/vdpa.c | 34 
>>  include/uapi/linux/vhost_types.h | 11 ++-
>>  2 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index faed6471934a..ec5ca20bd47d 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>>
>>  }
>>
>> +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v,
>> + struct vhost_iotlb *iotlb,
>> + struct vhost_iotlb_msg *msg)
>> +{
>> +   struct vdpa_device *vdpa = v->vdpa;
>> +   const struct vdpa_config_ops *ops = vdpa->config;
>> +   u32 asid = iotlb_to_asid(iotlb);
>> +   u64 start = msg->iova;
>> +   u64 last = start + msg->size - 1;
>> +   struct vhost_iotlb_map *map;
>> +   int r = 0;
>> +
>> +   if (msg->perm || !msg->size)
>> +   return -EINVAL;
>> +
>> +   map = vhost_iotlb_itree_first(iotlb, start, last);
>> +   if (!map)
>> +   return -ENOENT;
>> +
>> +   if (map->start != start || map->last != last)
>> +   return -EINVAL;
>> +
>> +   /* batch will finish with remap.  non-batch must do it now. */
>> +   if (!v->in_batch)
>> +   r = ops->set_map(vdpa, asid, iotlb);
>> +   if (!r)
>> +   map->addr = msg->uaddr;
> 
> I may miss something, for example for PA mapping,
> 
> 1) need to convert uaddr into phys addr
> 2) need to check whether the uaddr is backed by the same page or not?

This code does not verify that the new size@uaddr points to the same physical
pages as the old size@uaddr.  If the app screws up and they differ, then the app
may corrupt its own memory, but no-one else's.

It would be expensive for large memories to verify page by page, O(npages), and
such verification lies on the critical path for virtual machine downtime during
live update.  I could compare the properties of the vma(s) for the old size@uaddr
vs the vma for the new, but that is more complicated and would be a maintenance
headache.  When I submitted such code to Alex W when writing the equivalent
patches for vfio, he said don't check; correctness is the user's responsibility.

- Steve

>> +
>> +   return r;
>> +}
>> +
>>  static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>struct vhost_iotlb *iotlb,
>>struct vhost_iotlb_msg *msg)
>> @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
>> ops->set_map(vdpa, asid, iotlb);
>> v->in_batch = false;
>> break;
>> +   case VHOST_IOTLB_REMAP:
>> +   r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg);
>> +   break;
>> default:
>> r = -EINVAL;
>> break;
>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>> index 9177843951e9..35908315ff55 100644
>> --- a/include/uapi/linux/vhost_types.h
>> +++ b/include/uapi/linux/vhost_types.h
>> @@ -79,7 +79,7 @@ struct vhost_iotlb_msg {
>>  /*
>>   * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying
>>   * multiple mappings in one go: beginning with
>> - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of
>> + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or
>>   * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END.
>>   * When one of these two values is used as the message type, the rest
>>   * of the fields in the message are ignored. There's no guarantee that
>> @@ -87,6 +87,15 @@ struct vhost_iotlb_msg {
>>   */
>>  #define VHOST_IOTLB_BATCH_BEGIN5
>>  #define VHOST_IOTLB_BATCH_END  6
>> +
>> +/*
>> + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova.
>> + * The new uaddr must address the same memory object as originally mapped.
>> + * Failure to do so will result in user memory corruption and/or device
>> + * misbehavior.  iova and size must match the arguments used to create the
>> + * existing mapping.  Protection is not changed, and perm must be 0.
>> + 

Re: [PATCH] kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer

2024-01-17 Thread Andrew Morton
On Wed, 17 Jan 2024 06:16:36 + Chen Zhongjin  
wrote:

> There is a deadlock scenario in kprobe_optimizer():
> 
> pid A                           pid B                   pid C
> kprobe_optimizer()              do_exit()               perf_kprobe_init()
> mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()  mutex_lock(&kprobe_mutex)
> synchronize_rcu_tasks()         zap_pid_ns_processes()  // waiting kprobe_mutex
> // waiting tasks_rcu_exit_srcu  kernel_wait4()
>                                 // waiting pid C exit
> 
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> kprobe_optimizer() rather than synchronize_rcu_tasks().
> synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> have scheduled, but it will not wait for tasks_rcu_exit_srcu.
> 
> Signed-off-by: Chen Zhongjin 

Thanks.  Should we backport this fix into earlier kernels?  If so, are
we able to identify a suitable Fixes: target?



Re: [RFC V1 08/13] vduse: flush workers on suspend

2024-01-17 Thread Steven Sistare
On 1/10/2024 10:09 PM, Jason Wang wrote:
> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  
> wrote:
>>
>> To pass ownership of a live vdpa device to a new process, the user
>> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
>> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
>> mm.  Flush workers in suspend to guarantee that no worker sees the new
>> mm and old VA in between.
>>
>> Signed-off-by: Steve Sistare 
> 
> It seems we need a better title, probably "suspend support for vduse"?
> And it looks better to be a separate patch.

Will do - steve




Re: [RFC V1 10/13] vdpa_sim: flush workers on suspend

2024-01-17 Thread Steven Sistare
On 1/16/2024 1:57 PM, Eugenio Perez Martin wrote:
> On Wed, Jan 10, 2024 at 9:40 PM Steve Sistare  
> wrote:
>>
>> To pass ownership of a live vdpa device to a new process, the user
>> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
>> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
>> mm.  Flush workers in suspend to guarantee that no worker sees the new
>> mm and old VA in between.
> 
> The worker should already be stopped by the end of the suspend ioctl,
> so maybe we can consider this a fix?

Do you mean: the current behavior is a bug, independently of my new use case,
so I should submit this patch as a separate bug fix?  If yes, then will do.

>> Signed-off-by: Steve Sistare 
>> ---
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
>> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6304cb0b4770..8734834983cb 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
>> kthread_flush_work(work);
>>  }
>>
>> +static void flush_work_fn(struct kthread_work *work) {}
>> +
>> +static void vdpasim_flush_work(struct vdpasim *vdpasim)
>> +{
>> +   struct kthread_work work;
>> +
>> +   kthread_init_work(&work, flush_work_fn);
>> +   kthread_queue_work(vdpasim->worker, &work);
>> +   kthread_flush_work(&work);
> 
> Wouldn't it be better to cancel the work with kthread_cancel_work_sync here?

I believe that does not guarantee that currently executing work completes:

  static bool __kthread_cancel_work_sync()
if (worker->current_work != work)
goto out_fast;

- Steve

>> +}
>> +
>>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>>  {
>> return container_of(vdpa, struct vdpasim, vdpa);
>> @@ -512,6 +523,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
>> vdpasim->running = false;
>> mutex_unlock(&vdpasim->mutex);
>>
>> +   vdpasim_flush_work(vdpasim);
>> +
>> return 0;
>>  }
>>
>> --
>> 2.39.3
>>
> 



Re: [RFC V1 07/13] vhost-vdpa: flush workers on suspend

2024-01-17 Thread Steven Sistare
On 1/11/2024 9:28 PM, Jason Wang wrote:
> On Fri, Jan 12, 2024 at 12:18 AM Mike Christie
>  wrote:
>>
>> On 1/10/24 9:09 PM, Jason Wang wrote:
>>> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  
>>> wrote:

 To pass ownership of a live vdpa device to a new process, the user
 suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
 VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
 mm.  Flush workers in suspend to guarantee that no worker sees the new
 mm and old VA in between.

 Signed-off-by: Steve Sistare 
 ---
  drivers/vhost/vdpa.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
 index 8fe1562d24af..9673e8e20d11 100644
 --- a/drivers/vhost/vdpa.c
 +++ b/drivers/vhost/vdpa.c
 @@ -591,10 +591,14 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
  {
 struct vdpa_device *vdpa = v->vdpa;
 const struct vdpa_config_ops *ops = vdpa->config;
 +   struct vhost_dev *vdev = >vdev;

 if (!ops->suspend)
 return -EOPNOTSUPP;

 +   if (vdev->use_worker)
 +   vhost_dev_flush(vdev);
>>>
>>> It looks to me like it's better to check use_worker in vhost_dev_flush.
>>
>> You can now just call vhost_dev_flush and it will do the right thing.
>> The xa_for_each loop will only flush workers if they have been setup,
>> so for vdpa it will not find/flush anything.

Very good, I will drop this patch.

- Steve



[PATCH V1] vdpa_sim: reset must not run

2024-01-17 Thread Steve Sistare
vdpasim_do_reset sets running to true, which is wrong, as it allows
vdpasim_kick_vq to post work requests before the device has been
configured.  To fix, do not set running until VIRTIO_CONFIG_S_FEATURES_OK
is set.

Fixes: 0c89e2a3a9d0 ("vdpa_sim: Implement suspend vdpa op")
Signed-off-by: Steve Sistare 
Reviewed-by: Eugenio Pérez 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index be2925d0d283..6304cb0b4770 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -160,7 +160,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim, u32 flags)
}
}
 
-   vdpasim->running = true;
+   vdpasim->running = false;
spin_unlock(&vdpasim->iommu_lock);
 
vdpasim->features = 0;
@@ -483,6 +483,7 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
 
mutex_lock(&vdpasim->mutex);
vdpasim->status = status;
+   vdpasim->running = (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
mutex_unlock(&vdpasim->mutex);
 }
 
-- 
2.39.3




Re: [PATCH 00/20] filelock: split struct file_lock into file_lock and file_lease structs

2024-01-17 Thread NeilBrown
On Thu, 18 Jan 2024, Chuck Lever wrote:
> On Tue, Jan 16, 2024 at 02:45:56PM -0500, Jeff Layton wrote:
> > Long ago, file locks used to hang off of a singly-linked list in struct
> > inode. Because of this, when leases were added, they were added to the
> > same list and so they had to be tracked using the same sort of
> > structure.
> > 
> > Several years ago, we added struct file_lock_context, which allowed us
> > to use separate lists to track different types of file locks. Given
> > that, leases no longer need to be tracked using struct file_lock.
> > 
> > That said, a lot of the underlying infrastructure _is_ the same between
> > file leases and locks, so we can't completely separate everything.
> 
> Naive question: locks and leases are similar. Why do they need to be
> split apart? The cover letter doesn't address that, and I'm new
> enough at this that I don't have that context.

For me, the big win was in the last patch where we got separate
lock_manager_operations and lease_manager_operations.  There is zero
overlap between the two.  This highlights that on one level these are
different things with different behaviours.

NeilBrown



Re: [PATCH v9 3/3] remoteproc: zynqmp: parse TCM from device tree

2024-01-17 Thread Mathieu Poirier
Alright, I spent several hours looking at this patchset and the driver as a
whole.  I certainly salute your efforts to heed my advice and make the code less
brittle but I'm afraid we are not there.

See below for a different way to proceed.

On Wed, Jan 10, 2024 at 01:35:05PM -0800, Tanmay Shah wrote:
> ZynqMP TCM information was fixed in driver. Now ZynqMP TCM information
> is available in device-tree. Parse TCM information in driver
> as per new bindings.
> 
> Signed-off-by: Tanmay Shah 
> ---
> 
> Changes in v9:
>   - Introduce new API to request and release core1 TCM power-domains in
> lockstep mode. This will be used during prepare -> add_tcm_banks
> callback to enable TCM in lockstep mode.
>   - Parse TCM from device-tree in lockstep mode and split mode in
> uniform way.
>   - Fix TCM representation in device-tree in lockstep mode.
> 
> Changes in v8:
>   - Remove pm_domains framework
>   - Remove checking of pm_domain_id validation to power on/off tcm
>   - Remove spurious change
>   - parse power-domains property from device-tree and use EEMI calls
> to power on/off TCM instead of using pm domains framework
> 
> Changes in v7:
>   - move checking of pm_domain_id from previous patch
>   - fix mem_bank_data memory allocation
> 
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 245 +++-
>  1 file changed, 239 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> b/drivers/remoteproc/xlnx_r5_remoteproc.c
> index 4395edea9a64..0f87b984850b 100644
> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> @@ -74,8 +74,8 @@ struct mbox_info {
>  };
>  
>  /*
> - * Hardcoded TCM bank values. This will be removed once TCM bindings are
> - * accepted for system-dt specifications and upstreamed in linux kernel
> + * Hardcoded TCM bank values. This will stay in driver to maintain backward
> + * compatibility with device-tree that does not have TCM information.
>   */
>  static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
>   {0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
> @@ -102,6 +102,7 @@ static const struct mem_bank_data 
> zynqmp_tcm_banks_lockstep[] = {
>   * @rproc: rproc handle
>   * @pm_domain_id: RPU CPU power domain id
>   * @ipi: pointer to mailbox information
> + * @lockstep_core1_np: second core's device_node to use in lockstep mode
>   */
>  struct zynqmp_r5_core {
>   struct device *dev;
> @@ -111,6 +112,7 @@ struct zynqmp_r5_core {
>   struct rproc *rproc;
>   u32 pm_domain_id;
>   struct mbox_info *ipi;
> + struct device_node *lockstep_core1_np;
>  };
>  
>  /**
> @@ -539,6 +541,110 @@ static int tcm_mem_map(struct rproc *rproc,
>   return 0;
>  }
>  
> +int request_core1_tcm_lockstep(struct rproc *rproc)
> +{
> + struct zynqmp_r5_core *r5_core = rproc->priv;
> + struct of_phandle_args out_args = {0};
> + int ret, i, num_pd, pd_id, ret_err;
> + struct device_node *np;
> +
> + np = r5_core->lockstep_core1_np;
> +
> + /* Get number of power-domains */
> + num_pd = of_count_phandle_with_args(np, "power-domains",
> + "#power-domain-cells");
> + if (num_pd <= 0)
> + return -EINVAL;
> +
> + /* Get individual power-domain id and enable TCM */
> + for (i = 1; i < num_pd; i++) {
> + ret = of_parse_phandle_with_args(np, "power-domains",
> +  "#power-domain-cells",
> +  i, &out_args);
> + if (ret) {
> + dev_warn(r5_core->dev,
> +  "failed to get tcm %d in power-domains list, ret %d\n",
> +  i, ret);
> + goto fail_request_core1_tcm;
> + }
> +
> + pd_id = out_args.args[0];
> + of_node_put(out_args.np);
> +
> + ret = zynqmp_pm_request_node(pd_id, ZYNQMP_PM_CAPABILITY_ACCESS, 0,
> +  ZYNQMP_PM_REQUEST_ACK_BLOCKING);
> + if (ret) {
> + dev_err(r5_core->dev, "failed to request TCM node 0x%x\n",
> + pd_id);
> + goto fail_request_core1_tcm;
> + }
> + }
> +
> + return 0;
> +
> +fail_request_core1_tcm:
> +
> + /* Cache actual error to return later */
> + ret_err = ret;
> +
> + /* Release previously requested TCM in case of failure */
> + while (--i > 0) {
> + ret = of_parse_phandle_with_args(np, "power-domains",
> +  "#power-domain-cells",
> +  i, &out_args);
> + if (ret)
> + return ret;
> + pd_id = out_args.args[0];
> + of_node_put(out_args.np);
> + 

Re: [PATCH 00/20] filelock: split struct file_lock into file_lock and file_lease structs

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 10:12 -0500, Chuck Lever wrote:
> On Tue, Jan 16, 2024 at 02:45:56PM -0500, Jeff Layton wrote:
> > Long ago, file locks used to hang off of a singly-linked list in struct
> > inode. Because of this, when leases were added, they were added to the
> > same list and so they had to be tracked using the same sort of
> > structure.
> > 
> > Several years ago, we added struct file_lock_context, which allowed us
> > to use separate lists to track different types of file locks. Given
> > that, leases no longer need to be tracked using struct file_lock.
> > 
> > That said, a lot of the underlying infrastructure _is_ the same between
> > file leases and locks, so we can't completely separate everything.
> 
> Naive question: locks and leases are similar. Why do they need to be
> split apart? The cover letter doesn't address that, and I'm new
> enough at this that I don't have that context.
> 

Leases and locks do have some similarities, but it's mostly the
internals (stuff like the blocker/waiter handling) where they are
similar. Superficially they are very different objects, and handling
them with the same struct is unintuitive.

So, for now this is just about cleaning up the lock and lease handling
APIs for better type safety and clarity. It's also nice to separate out
things like the kasync handling, which only applies to leases, as well
as splitting up the lock_manager_operations, which don't share any
operations between locks and leases.

Longer term, we're also considering adding things like directory
delegations, which may need to either expand struct file_lease, or add
a new variant (dir_deleg ?). I'd rather not add that complexity to
struct file_lock. 

> 
> > This patchset first splits a group of fields used by both file locks and
> > leases into a new struct file_lock_core, that is then embedded in struct
> > file_lock. Coccinelle was then used to convert a lot of the callers to
> > deal with the move, with the remaining 25% or so converted by hand.
> > 
> > It then converts several internal functions in fs/locks.c to work
> > with struct file_lock_core. Lastly, struct file_lock is split into
> > struct file_lock and file_lease, and the lease-related APIs converted to
> > take struct file_lease.
> > 
> > After the first few patches (which I left split up for easier review),
> > the set should be bisectable. I'll plan to squash the first few
> > together to make sure the resulting set is bisectable before merge.
> > 
> > Finally, I left the coccinelle scripts I used in tree. I had heard it
> > was preferable to merge those along with the patches that they
> > generate, but I wasn't sure where they go. I can either move those to a
> > more appropriate location or we can just drop that commit if it's not
> > needed.
> > 
> > I'd like to have this considered for inclusion in v6.9. Christian, would
> > you be amenable to shepherding this into mainline (assuming there are no
> > major objections, of course)?
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> > Jeff Layton (20):
> >   filelock: split common fields into struct file_lock_core
> >   filelock: add coccinelle scripts to move fields to struct 
> > file_lock_core
> >   filelock: the results of the coccinelle conversion
> >   filelock: fixups after the coccinelle changes
> >   filelock: convert some internal functions to use file_lock_core 
> > instead
> >   filelock: convert more internal functions to use file_lock_core
> >   filelock: make posix_same_owner take file_lock_core pointers
> >   filelock: convert posix_owner_key to take file_lock_core arg
> >   filelock: make locks_{insert,delete}_global_locks take file_lock_core 
> > arg
> >   filelock: convert locks_{insert,delete}_global_blocked
> >   filelock: convert the IS_* macros to take file_lock_core
> >   filelock: make __locks_delete_block and __locks_wake_up_blocks take 
> > file_lock_core
> >   filelock: convert __locks_insert_block, conflict and deadlock checks 
> > to use file_lock_core
> >   filelock: convert fl_blocker to file_lock_core
> >   filelock: clean up locks_delete_block internals
> >   filelock: reorganize locks_delete_block and __locks_insert_block
> >   filelock: make assign_type helper take a file_lock_core pointer
> >   filelock: convert locks_wake_up_blocks to take a file_lock_core 
> > pointer
> >   filelock: convert locks_insert_lock_ctx and locks_delete_lock_ctx
> >   filelock: split leases out of struct file_lock
> > 
> >  cocci/filelock.cocci|  81 +
> >  cocci/filelock2.cocci   |   6 +
> >  cocci/nlm.cocci |  81 +
> >  fs/9p/vfs_file.c|  38 +-
> >  fs/afs/flock.c  |  55 +--
> >  fs/ceph/locks.c |  74 ++--
> >  fs/dlm/plock.c  |  44 +--
> >  fs/fuse/file.c  |  14 +-
> >  fs/gfs2/file.c  |  16 +-
> >  fs/libfs.c  |   2 

Re: [PATCH 04/20] filelock: fixups after the coccinelle changes

2024-01-17 Thread Chuck Lever
On Tue, Jan 16, 2024 at 02:46:00PM -0500, Jeff Layton wrote:
> The coccinelle script doesn't catch quite everything (particularly with
> embedded structs). These are some by-hand fixups after the split of
> common fields into struct file_lock_core.
> 
> Signed-off-by: Jeff Layton 

For the changes in fs/lockd/ and fs/nfsd/:

Acked-by: Chuck Lever 


> ---
>  fs/ceph/locks.c |  8 ++---
>  fs/lockd/clnt4xdr.c |  8 ++---
>  fs/lockd/clntproc.c |  6 ++--
>  fs/lockd/clntxdr.c  |  8 ++---
>  fs/lockd/svc4proc.c | 10 +++---
>  fs/lockd/svclock.c  | 54 +
>  fs/lockd/svcproc.c  | 10 +++---
>  fs/lockd/svcsubs.c  |  4 +--
>  fs/lockd/xdr.c  |  8 ++---
>  fs/lockd/xdr4.c |  8 ++---
>  fs/locks.c  | 67 
> +
>  fs/nfs/delegation.c |  2 +-
>  fs/nfs/nfs4state.c  |  2 +-
>  fs/nfs/nfs4trace.h  |  4 +--
>  fs/nfs/write.c  |  4 +--
>  fs/nfsd/nfs4callback.c  |  2 +-
>  fs/nfsd/nfs4state.c |  4 +--
>  fs/smb/client/file.c|  2 +-
>  fs/smb/server/vfs.c |  2 +-
>  include/trace/events/afs.h  |  4 +--
>  include/trace/events/filelock.h | 32 ++--
>  21 files changed, 126 insertions(+), 123 deletions(-)
> 
> diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> index ee12f9864980..55be5d231e38 100644
> --- a/fs/ceph/locks.c
> +++ b/fs/ceph/locks.c
> @@ -386,9 +386,9 @@ void ceph_count_locks(struct inode *inode, int 
> *fcntl_count, int *flock_count)
>   ctx = locks_inode_context(inode);
>   if (ctx) {
>   spin_lock(&ctx->flc_lock);
> - list_for_each_entry(lock, &ctx->flc_posix, fl_list)
> + list_for_each_entry(lock, &ctx->flc_posix, fl_core.fl_list)
>   ++(*fcntl_count);
> - list_for_each_entry(lock, &ctx->flc_flock, fl_list)
> + list_for_each_entry(lock, &ctx->flc_flock, fl_core.fl_list)
>   ++(*flock_count);
>   spin_unlock(&ctx->flc_lock);
>   }
> @@ -455,7 +455,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
>   return 0;
>  
>   spin_lock(&ctx->flc_lock);
> - list_for_each_entry(lock, &ctx->flc_posix, fl_list) {
> + list_for_each_entry(lock, &ctx->flc_posix, fl_core.fl_list) {
>   ++seen_fcntl;
>   if (seen_fcntl > num_fcntl_locks) {
>   err = -ENOSPC;
> @@ -466,7 +466,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
>   goto fail;
>   ++l;
>   }
> - list_for_each_entry(lock, &ctx->flc_flock, fl_list) {
> + list_for_each_entry(lock, &ctx->flc_flock, fl_core.fl_list) {
>   ++seen_flock;
>   if (seen_flock > num_flock_locks) {
>   err = -ENOSPC;
> diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c
> index ed00bd2869a7..083a3b1bf288 100644
> --- a/fs/lockd/clnt4xdr.c
> +++ b/fs/lockd/clnt4xdr.c
> @@ -243,7 +243,7 @@ static void encode_nlm4_holder(struct xdr_stream *xdr,
>   u64 l_offset, l_len;
>   __be32 *p;
>  
> - encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
> + encode_bool(xdr, lock->fl.fl_core.fl_type == F_RDLCK);
>   encode_int32(xdr, lock->svid);
>   encode_netobj(xdr, lock->oh.data, lock->oh.len);
>  
> @@ -357,7 +357,7 @@ static void nlm4_xdr_enc_testargs(struct rpc_rqst *req,
>   const struct nlm_lock *lock = &args->lock;
>  
>   encode_cookie(xdr, &args->cookie);
> - encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
> + encode_bool(xdr, lock->fl.fl_core.fl_type == F_WRLCK);
>   encode_nlm4_lock(xdr, lock);
>  }
>  
> @@ -380,7 +380,7 @@ static void nlm4_xdr_enc_lockargs(struct rpc_rqst *req,
>  
>   encode_cookie(xdr, &args->cookie);
>   encode_bool(xdr, args->block);
> - encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
> + encode_bool(xdr, lock->fl.fl_core.fl_type == F_WRLCK);
>   encode_nlm4_lock(xdr, lock);
>   encode_bool(xdr, args->reclaim);
>   encode_int32(xdr, args->state);
> @@ -403,7 +403,7 @@ static void nlm4_xdr_enc_cancargs(struct rpc_rqst *req,
>  
>   encode_cookie(xdr, &args->cookie);
>   encode_bool(xdr, args->block);
> - encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
> + encode_bool(xdr, lock->fl.fl_core.fl_type == F_WRLCK);
>   encode_nlm4_lock(xdr, lock);
>  }
>  
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index ac1d07034346..15461e8952b4 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -143,7 +143,7 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, 
> struct file_lock *fl)
>   lock->svid = fl->fl_u.nfs_fl.owner->pid;
>   lock->fl.fl_start = fl->fl_start;
>   lock->fl.fl_end = fl->fl_end;
> - lock->fl.fl_type = fl->fl_core.fl_type;
> + lock->fl.fl_core.fl_type = fl->fl_core.fl_type;
>  }
>  
>  

Re: [PATCH 00/20] filelock: split struct file_lock into file_lock and file_lease structs

2024-01-17 Thread Chuck Lever
On Tue, Jan 16, 2024 at 02:45:56PM -0500, Jeff Layton wrote:
> Long ago, file locks used to hang off of a singly-linked list in struct
> inode. Because of this, when leases were added, they were added to the
> same list and so they had to be tracked using the same sort of
> structure.
> 
> Several years ago, we added struct file_lock_context, which allowed us
> to use separate lists to track different types of file locks. Given
> that, leases no longer need to be tracked using struct file_lock.
> 
> That said, a lot of the underlying infrastructure _is_ the same between
> file leases and locks, so we can't completely separate everything.

Naive question: locks and leases are similar. Why do they need to be
split apart? The cover letter doesn't address that, and I'm new
enough at this that I don't have that context.


> This patchset first splits a group of fields used by both file locks and
> leases into a new struct file_lock_core, that is then embedded in struct
> file_lock. Coccinelle was then used to convert a lot of the callers to
> deal with the move, with the remaining 25% or so converted by hand.
> 
> It then converts several internal functions in fs/locks.c to work
> with struct file_lock_core. Lastly, struct file_lock is split into
> struct file_lock and file_lease, and the lease-related APIs converted to
> take struct file_lease.
> 
> After the first few patches (which I left split up for easier review),
> the set should be bisectable. I'll plan to squash the first few
> together to make sure the resulting set is bisectable before merge.
> 
> Finally, I left the coccinelle scripts I used in tree. I had heard it
> was preferable to merge those along with the patches that they
> generate, but I wasn't sure where they go. I can either move those to a
> more appropriate location or we can just drop that commit if it's not
> needed.
> 
> I'd like to have this considered for inclusion in v6.9. Christian, would
> you be amenable to shepherding this into mainline (assuming there are no
> major objections, of course)?
> 
> Signed-off-by: Jeff Layton 
> ---
> Jeff Layton (20):
>   filelock: split common fields into struct file_lock_core
>   filelock: add coccinelle scripts to move fields to struct file_lock_core
>   filelock: the results of the coccinelle conversion
>   filelock: fixups after the coccinelle changes
>   filelock: convert some internal functions to use file_lock_core instead
>   filelock: convert more internal functions to use file_lock_core
>   filelock: make posix_same_owner take file_lock_core pointers
>   filelock: convert posix_owner_key to take file_lock_core arg
>   filelock: make locks_{insert,delete}_global_locks take file_lock_core 
> arg
>   filelock: convert locks_{insert,delete}_global_blocked
>   filelock: convert the IS_* macros to take file_lock_core
>   filelock: make __locks_delete_block and __locks_wake_up_blocks take 
> file_lock_core
>   filelock: convert __locks_insert_block, conflict and deadlock checks to 
> use file_lock_core
>   filelock: convert fl_blocker to file_lock_core
>   filelock: clean up locks_delete_block internals
>   filelock: reorganize locks_delete_block and __locks_insert_block
>   filelock: make assign_type helper take a file_lock_core pointer
>   filelock: convert locks_wake_up_blocks to take a file_lock_core pointer
>   filelock: convert locks_insert_lock_ctx and locks_delete_lock_ctx
>   filelock: split leases out of struct file_lock
> 
>  cocci/filelock.cocci|  81 +
>  cocci/filelock2.cocci   |   6 +
>  cocci/nlm.cocci |  81 +
>  fs/9p/vfs_file.c|  38 +-
>  fs/afs/flock.c  |  55 +--
>  fs/ceph/locks.c |  74 ++--
>  fs/dlm/plock.c  |  44 +--
>  fs/fuse/file.c  |  14 +-
>  fs/gfs2/file.c  |  16 +-
>  fs/libfs.c  |   2 +-
>  fs/lockd/clnt4xdr.c |  14 +-
>  fs/lockd/clntlock.c |   2 +-
>  fs/lockd/clntproc.c |  60 +--
>  fs/lockd/clntxdr.c  |  14 +-
>  fs/lockd/svc4proc.c |  10 +-
>  fs/lockd/svclock.c  |  64 ++--
>  fs/lockd/svcproc.c  |  10 +-
>  fs/lockd/svcsubs.c  |  24 +-
>  fs/lockd/xdr.c  |  14 +-
>  fs/lockd/xdr4.c |  14 +-
>  fs/locks.c  | 785 
> ++--
>  fs/nfs/delegation.c |   4 +-
>  fs/nfs/file.c   |  22 +-
>  fs/nfs/nfs3proc.c   |   2 +-
>  fs/nfs/nfs4_fs.h|   2 +-
>  fs/nfs/nfs4file.c   |   2 +-
>  fs/nfs/nfs4proc.c   |  39 +-
>  fs/nfs/nfs4state.c  |   6 +-
>  fs/nfs/nfs4trace.h  |   4 +-
>  fs/nfs/nfs4xdr.c|   8 +-
>  fs/nfs/write.c  |   8 +-
>  fs/nfsd/filecache.c |   4 

Re: [PATCH 03/20] filelock: the results of the coccinelle conversion

2024-01-17 Thread Chuck Lever
On Tue, Jan 16, 2024 at 02:45:59PM -0500, Jeff Layton wrote:
> This is the direct result of the changes generated by coccinelle. There
> is still about 1/4 of the callsites that need to be touched by hand
> here.
> 
> Signed-off-by: Jeff Layton 

For the changes in include/linux/lockd/lockd.h, fs/lockd/, and
fs/nfsd/:

Acked-by: Chuck Lever 


> ---
>  fs/9p/vfs_file.c|  38 ++---
>  fs/afs/flock.c  |  55 +++---
>  fs/ceph/locks.c |  66 +++
>  fs/dlm/plock.c  |  44 ++---
>  fs/fuse/file.c  |  14 +-
>  fs/gfs2/file.c  |  16 +-
>  fs/lockd/clnt4xdr.c |   6 +-
>  fs/lockd/clntlock.c |   2 +-
>  fs/lockd/clntproc.c |  60 ---
>  fs/lockd/clntxdr.c  |   6 +-
>  fs/lockd/svclock.c  |  10 +-
>  fs/lockd/svcsubs.c  |  20 +--
>  fs/lockd/xdr.c  |   6 +-
>  fs/lockd/xdr4.c |   6 +-
>  fs/locks.c  | 406 
> +++-
>  fs/nfs/delegation.c |   2 +-
>  fs/nfs/file.c   |  22 +--
>  fs/nfs/nfs3proc.c   |   2 +-
>  fs/nfs/nfs4proc.c   |  35 ++--
>  fs/nfs/nfs4state.c  |   4 +-
>  fs/nfs/nfs4xdr.c|   8 +-
>  fs/nfs/write.c  |   4 +-
>  fs/nfsd/filecache.c |   4 +-
>  fs/nfsd/nfs4layouts.c   |  15 +-
>  fs/nfsd/nfs4state.c |  73 
>  fs/ocfs2/locks.c|  12 +-
>  fs/ocfs2/stack_user.c   |   2 +-
>  fs/smb/client/cifssmb.c |   8 +-
>  fs/smb/client/file.c|  72 
>  fs/smb/client/smb2file.c|   2 +-
>  fs/smb/server/smb2pdu.c |  44 ++---
>  fs/smb/server/vfs.c |  12 +-
>  include/linux/lockd/lockd.h |   8 +-
>  33 files changed, 554 insertions(+), 530 deletions(-)
> 
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 11cd8d23f6f2..f35ac7cb782e 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -107,7 +107,7 @@ static int v9fs_file_lock(struct file *filp, int cmd, 
> struct file_lock *fl)
>  
>   p9_debug(P9_DEBUG_VFS, "filp: %p lock: %p\n", filp, fl);
>  
> - if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
> + if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_core.fl_type != 
> F_UNLCK) {
>   filemap_write_and_wait(inode->i_mapping);
>   invalidate_mapping_pages(&inode->i_data, 0, -1);
>   }
> @@ -127,7 +127,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
> struct file_lock *fl)
>   fid = filp->private_data;
>   BUG_ON(fid == NULL);
>  
> - BUG_ON((fl->fl_flags & FL_POSIX) != FL_POSIX);
> + BUG_ON((fl->fl_core.fl_flags & FL_POSIX) != FL_POSIX);
>  
>   res = locks_lock_file_wait(filp, fl);
>   if (res < 0)
> @@ -136,7 +136,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
> struct file_lock *fl)
>   /* convert posix lock to p9 tlock args */
>   memset(&flock, 0, sizeof(flock));
>   /* map the lock type */
> - switch (fl->fl_type) {
> + switch (fl->fl_core.fl_type) {
>   case F_RDLCK:
>   flock.type = P9_LOCK_TYPE_RDLCK;
>   break;
> @@ -152,7 +152,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
> struct file_lock *fl)
>   flock.length = 0;
>   else
>   flock.length = fl->fl_end - fl->fl_start + 1;
> - flock.proc_id = fl->fl_pid;
> + flock.proc_id = fl->fl_core.fl_pid;
>   flock.client_id = fid->clnt->name;
>   if (IS_SETLKW(cmd))
>   flock.flags = P9_LOCK_FLAGS_BLOCK;
> @@ -207,12 +207,12 @@ static int v9fs_file_do_lock(struct file *filp, int 
> cmd, struct file_lock *fl)
>* incase server returned error for lock request, revert
>* it locally
>*/
> - if (res < 0 && fl->fl_type != F_UNLCK) {
> - fl_type = fl->fl_type;
> - fl->fl_type = F_UNLCK;
> + if (res < 0 && fl->fl_core.fl_type != F_UNLCK) {
> + fl_type = fl->fl_core.fl_type;
> + fl->fl_core.fl_type = F_UNLCK;
>   /* Even if this fails we want to return the remote error */
>   locks_lock_file_wait(filp, fl);
> - fl->fl_type = fl_type;
> + fl->fl_core.fl_type = fl_type;
>   }
>   if (flock.client_id != fid->clnt->name)
>   kfree(flock.client_id);
> @@ -234,7 +234,7 @@ static int v9fs_file_getlock(struct file *filp, struct 
> file_lock *fl)
>* if we have a conflicting lock locally, no need to validate
>* with server
>*/
> - if (fl->fl_type != F_UNLCK)
> + if (fl->fl_core.fl_type != F_UNLCK)
>   return res;
>  
>   /* convert posix lock to p9 tgetlock args */
> @@ -245,7 +245,7 @@ static int v9fs_file_getlock(struct file *filp, struct 
> file_lock *fl)
>   glock.length = 0;
>   else
>   glock.length = fl->fl_end - fl->fl_start + 1;
> - glock.proc_id = fl->fl_pid;
> + glock.proc_id 

[mhiramat:topic/fprobe-on-fgraph] [function_graph] f2ed0fd5cc: WARNING:at_kernel/trace/trace.c:#run_tracer_selftest

2024-01-17 Thread kernel test robot



Hello,

kernel test robot noticed "WARNING:at_kernel/trace/trace.c:#run_tracer_selftest" on:

commit: f2ed0fd5cc77ca2ad957569c35e76ef08cf93f54 ("function_graph: Add a new entry handler with parent_ip and ftrace_regs")
https://git.kernel.org/cgit/linux/kernel/git/mhiramat/linux.git topic/fprobe-on-fgraph

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+------------------------------------------------------+------------+------------+
|                                                      | 2d3653c71e | f2ed0fd5cc |
+------------------------------------------------------+------------+------------+
| WARNING:at_kernel/trace/trace.c:#run_tracer_selftest | 0          | 20         |
| EIP:run_tracer_selftest                              | 0          | 20         |
+------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-lkp/202401172200.c8731564-oliver.s...@intel.com



[5.535374][T1] Testing tracer function_graph: .. no entries found 
..FAILED!
[6.645598][T1] [ cut here ]
[6.646006][T1] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:2045 
run_tracer_selftest+0x256/0x26c
[6.646759][T1] Modules linked in:
[6.647062][T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G
T  6.7.0-rc8-00022-gf2ed0fd5cc77 #1
[6.647669][T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[6.648455][T1] EIP: run_tracer_selftest+0x256/0x26c
[6.648871][T1] Code: 00 00 66 90 8b 15 ec 97 d3 d4 b9 ff ff ff ff a1 28 
98 d3 d4 e8 37 68 ff ff e9 26 fe ff ff c7 04 24 90 d6 8c d4 e8 92 68 f6 ff <0f> 
0b b8 ff ff ff ff e9 a7 fe ff ff b8 f4 ff ff ff e9 9d fe ff ff
[6.650343][T1] EAX: 0007 EBX: d4e33700 ECX:  EDX: 
[6.650884][T1] ESI: d4e4a8c0 EDI: c101f100 EBP: c138de8c ESP: c138de78
[6.651422][T1] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 
00010246
[6.651661][T1] CR0: 80050033 CR2: ffcb2000 CR3: 14f88000 CR4: 00040690
[6.652189][T1] DR0:  DR1:  DR2:  DR3: 
[6.652715][T1] DR6: fffe0ff0 DR7: 0400
[6.653063][T1] Call Trace:
[6.653309][T1]  ? show_regs+0x82/0x90
[6.653630][T1]  ? run_tracer_selftest+0x256/0x26c
[6.654025][T1]  ? __warn+0x89/0x1bc
[6.654333][T1]  ? run_tracer_selftest+0x256/0x26c
[6.654733][T1]  ? run_tracer_selftest+0x256/0x26c
[6.655128][T1]  ? report_bug+0x1c5/0x23c
[6.655467][T1]  ? exc_overflow+0x58/0x58
[6.655661][T1]  ? handle_bug+0x28/0x50
[6.655984][T1]  ? exc_invalid_op+0x28/0x7c
[6.656332][T1]  ? preempt_count_sub+0xc9/0x120
[6.656706][T1]  ? preempt_count_add+0x69/0xc8
[6.657074][T1]  ? vprintk_emit+0x7a/0x194
[6.657418][T1]  ? handle_exception+0x14d/0x14d
[6.657794][T1]  ? exc_overflow+0x58/0x58
[6.658130][T1]  ? run_tracer_selftest+0x256/0x26c
[6.658530][T1]  ? exc_overflow+0x58/0x58
[6.658865][T1]  ? run_tracer_selftest+0x256/0x26c
[6.659262][T1]  register_tracer+0xe4/0x28c
[6.659612][T1]  ? register_trace_event+0xf6/0x180
[6.659661][T1]  ? init_graph_tracefs+0x3c/0x3c
[6.660036][T1]  init_graph_trace+0x56/0x7c
[6.660386][T1]  do_one_initcall+0x70/0x3b8
[6.660732][T1]  ? parameq+0x13/0x68
[6.661039][T1]  ? parse_args+0x202/0x37c
[6.661383][T1]  do_initcalls+0x100/0x228
[6.661721][T1]  ? rdinit_setup+0x40/0x40
[6.662061][T1]  ? rest_init+0x194/0x194
[6.662393][T1]  kernel_init_freeable+0xd3/0x1c4
[6.662772][T1]  kernel_init+0x1a/0x1dc
[6.663096][T1]  ? schedule_tail+0x60/0x78
[6.663439][T1]  ret_from_fork+0x3d/0x48
[6.663661][T1]  ? rest_init+0x194/0x194
[6.663992][T1]  ret_from_fork_asm+0x12/0x18
[6.664348][T1]  entry_INT80_32+0x10d/0x10d
[6.664702][T1] irq event stamp: 2172679
[6.665030][T1] hardirqs last  enabled at (2172687): [] 
console_unlock+0x11a/0x178
[6.665694][T1] hardirqs last disabled at (2172694): [] 
console_unlock+0x101/0x178
[6.666360][T1] softirqs last  enabled at (2172316): [] 
__do_softirq+0x2f3/0x4b4
[6.667009][T1] softirqs last disabled at (2172309): [] 
do_softirq_own_stack+0x37/0x4c
[6.667661][T1] ---[ end trace  ]---



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240117/202401172200.c8731564-oliver.s...@intel.com



-- 
0-DAY CI Kernel Test Service
https://

[mhiramat:topic/fprobe-on-fgraph] [function_graph] 367c724761: BUG:KASAN:slab-out-of-bounds_in_ftrace_push_return_trace

2024-01-17 Thread kernel test robot
[   43.990167][  T644] page dumped because: kasan: bad access detected
[   43.996456][  T644]
[   43.998663][  T644] Memory state around the buggy address:
[   44.004172][  T644]  888116ea1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   44.012114][  T644]  888116ea1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   44.020056][  T644] >888116ea1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   44.027999][  T644]                ^
[   44.035856][  T644]  888116ea2000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   44.043803][  T644]  888116ea2080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   44.051747][  T644] ==================================================================
[   44.059762][  T644] Disabling lock debugging due to kernel taint
[   44.423837][  T312] 2024-01-15 00:33:41 echo 1 > tracing_on
[   44.425933][  T312]
[   45.159545][  T312] 2024-01-15 00:33:41 echo 0 > tracing_on
[   45.159845][  T312]
[   45.289241][  T312] 2024-01-15 00:33:41 echo wakeup_dl > current_tracer
[   45.289643][  T312]
[   45.735915][  T312] 2024-01-15 00:33:42 echo 1 > tracing_on
[   45.735956][  T312]
[   45.780233][  T312] 2024-01-15 00:33:42 echo 0 > tracing_on
[   45.780288][  T312]
[   45.802820][  T312] 2024-01-15 00:33:42 echo wakeup_rt > current_tracer
[   45.802877][  T312]
[   46.191536][  T312] 2024-01-15 00:33:42 echo 1 > tracing_on
[   46.191576][  T312]
[   46.227375][  T312] 2024-01-15 00:33:42 echo 0 > tracing_on
[   46.227420][  T312]
[   46.248058][  T312] 2024-01-15 00:33:42 echo wakeup > current_tracer
[   46.248099][  T312]
[   46.637161][  T312] 2024-01-15 00:33:43 echo 1 > tracing_on


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240117/202401172217.36e37075-oliver.s...@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




[PATCH 2/2] remoteproc: stm32: Fix incorrect type assignment returned by stm32_rproc_get_loaded_rsc_tablef

2024-01-17 Thread Arnaud Pouliquen
The sparse tool complains about the removal of the __iomem attribute:

stm32_rproc.c:660:17: warning: cast removes address space '__iomem' of expression

Add '__force' to explicitly specify that the cast is intentional.
This conversion is necessary to cast the address to a plain pointer,
which is then managed by the remoteproc core as a pointer to a
resource_table structure.

Fixes: 8a471396d21c ("remoteproc: stm32: Move resource table setup to 
rproc_ops")
Signed-off-by: Arnaud Pouliquen 
---
 drivers/remoteproc/stm32_rproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
index 2c28635219eb..10b442c6f632 100644
--- a/drivers/remoteproc/stm32_rproc.c
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -657,7 +657,7 @@ stm32_rproc_get_loaded_rsc_table(struct rproc *rproc, 
size_t *table_sz)
 * entire area by overwriting it with the initial values stored in 
rproc->clean_table.
 */
*table_sz = RSC_TBL_SIZE;
-   return (struct resource_table *)ddata->rsc_va;
+   return (__force struct resource_table *)ddata->rsc_va;
 }
 
 static const struct rproc_ops st_rproc_ops = {
-- 
2.25.1




[PATCH 1/2] remoteproc: stm32: Fix incorrect type in assignment for va

2024-01-17 Thread Arnaud Pouliquen
The sparse tool complains about the attribute conversion between
an __iomem void * and a void *:

stm32_rproc.c:122:12: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void *va @@ got void [noderef] __iomem * @@
stm32_rproc.c:122:12: sparse: expected void *va
stm32_rproc.c:122:12: sparse: got void [noderef] __iomem *

Add '__force' to explicitly specify that the cast is intentional.
This conversion is necessary to cast to the virtual address pointer
used by the remoteproc core.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202312150052.hcinklqb-...@intel.com/
Fixes: 13140de09cc2 ("remoteproc: stm32: add an ST stm32_rproc driver")
Signed-off-by: Arnaud Pouliquen 
---
 drivers/remoteproc/stm32_rproc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
index 4f469f0bcf8b..2c28635219eb 100644
--- a/drivers/remoteproc/stm32_rproc.c
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -120,7 +120,7 @@ static int stm32_rproc_mem_alloc(struct rproc *rproc,
void *va;
 
dev_dbg(dev, "map memory: %pad+%zx\n", &mem->dma, mem->len);
-   va = ioremap_wc(mem->dma, mem->len);
+   va = (__force void *)ioremap_wc(mem->dma, mem->len);
if (IS_ERR_OR_NULL(va)) {
dev_err(dev, "Unable to map memory region: %pad+0x%zx\n",
&mem->dma, mem->len);
@@ -137,7 +137,7 @@ static int stm32_rproc_mem_release(struct rproc *rproc,
   struct rproc_mem_entry *mem)
 {
dev_dbg(rproc->dev.parent, "unmap memory: %pa\n", &mem->dma);
-   iounmap(mem->va);
+   iounmap((__force __iomem void *)mem->va);
 
return 0;
 }
-- 
2.25.1




[PATCH 0/2] remoteproc: stm32: Fix sparse warnings

2024-01-17 Thread Arnaud Pouliquen
Fix the warnings reported by sparse when building with the make option "C=1".

Arnaud Pouliquen (2):
  remoteproc: stm32: Fix incorrect type in assignment for va
  remoteproc: stm32: Fix incorrect type assignment returned by
stm32_rproc_get_loaded_rsc_tablef

 drivers/remoteproc/stm32_rproc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
-- 
2.25.1




Re: [PATCH 02/20] filelock: add coccinelle scripts to move fields to struct file_lock_core

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 13:25 +, David Howells wrote:
> Do we need to keep these coccinelle scripts for posterity?  Or can they just
> be included in the patch description of the patch that generates them?
> 

I have the same question. I included them here mostly so they can be
reviewed as well, but I'm not sure whether and how we should retain them
for posterity.
-- 
Jeff Layton 



Re: [PATCH 02/20] filelock: add coccinelle scripts to move fields to struct file_lock_core

2024-01-17 Thread David Howells
Do we need to keep these coccinelle scripts for posterity?  Or can they just
be included in the patch description of the patch that generates them?

David




Re: [PATCH 0/2] remoteproc: stm32: Fix sparse warnings

2024-01-17 Thread Arnaud POULIQUEN
Hi,

On 1/17/24 14:18, Arnaud Pouliquen wrote:
> Fix warnings reported by sparse using make option "C=1"
> 
> Arnaud Pouliquen (2):
>   remoteproc: stm32: Fix incorrect type in assignment for va
>   remoteproc: stm32: Fix incorrect type assignment returned by
> stm32_rproc_get_loaded_rsc_tablef
> 
>  drivers/remoteproc/stm32_rproc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> 
> base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a

I faced an issue with the ST server message that blocked a part
of the mails associated with this series. Please ignore this series,
as I will resend it.

Regards
Arnaud




Re: [PATCH 00/20] filelock: split struct file_lock into file_lock and file_lease structs

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 13:48 +0100, Christian Brauner wrote:
> > I'd like to have this considered for inclusion in v6.9. Christian, would
> > you be amenable to shepherding this into mainline (assuming there are no
> > major objections, of course)?
> 
> Yes, of course I will be happy to.

Great! I probably have at least another version or two to send before
it's ready for linux-next, but hopefully we can get it there soon after
the merge window closes.

Thanks,
-- 
Jeff Layton 



Re: [PATCH 00/20] filelock: split struct file_lock into file_lock and file_lease structs

2024-01-17 Thread Christian Brauner
> I'd like to have this considered for inclusion in v6.9. Christian, would
> you be amenable to shepherding this into mainline (assuming there are no
> major objections, of course)?

Yes, of course I will be happy to.



Re: [PATCH 01/20] filelock: split common fields into struct file_lock_core

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 09:07 +1100, NeilBrown wrote:
> On Wed, 17 Jan 2024, Jeff Layton wrote:
> > In a future patch, we're going to split file leases into their own
> > structure. Since a lot of the underlying machinery uses the same fields
> > move those into a new file_lock_core, and embed that inside struct
> > file_lock.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  include/linux/filelock.h | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/filelock.h b/include/linux/filelock.h
> > index 95e868e09e29..7825511c1c11 100644
> > --- a/include/linux/filelock.h
> > +++ b/include/linux/filelock.h
> > @@ -85,8 +85,9 @@ bool opens_in_grace(struct net *);
> >   *
> >   * Obviously, the last two criteria only matter for POSIX locks.
> >   */
> > -struct file_lock {
> > -   struct file_lock *fl_blocker;   /* The lock, that is blocking us */
> > +
> > +struct file_lock_core {
> > +   struct file_lock *fl_blocker;   /* The lock that is blocking us */
> > struct list_head fl_list;   /* link into file_lock_context */
> > struct hlist_node fl_link;  /* node in global lists */
> > struct list_head fl_blocked_requests;   /* list of requests with
> > @@ -102,6 +103,10 @@ struct file_lock {
> > int fl_link_cpu;/* what cpu's list is this on? */
> > wait_queue_head_t fl_wait;
> > struct file *fl_file;
> > +};
> > +
> > +struct file_lock {
> > +   struct file_lock_core fl_core;
> > loff_t fl_start;
> > loff_t fl_end;
> >  
> 
> If I were doing this, I would rename all the fields in file_lock_core to
> have an "flc_" prefix, and add some #defines like
> 
>  #define fl_list fl_core.flc_list
> 
> so there would be no need to squash this with later patches to achieve
> bisectability.
> 
> The #defines would be removed after the coccinelle scripts etc are
> applied.
> 
> I would also do the "convert some internal functions" patches *before*
> the bulk conversion of fl_foo to fl_core.flc_foo so that those functions
> don't get patched twice.
> 
> But this is all personal preference.  If you prefer your approach,
> please leave it that way.  The only clear benefit of my approach is that
> you don't need to squash patches together, and that is probably not a
> big deal.
> 

I considered going back and doing that. It would allow me to break this
up into smaller patches, but I think that basically means doing all of
this work over again. I'll probably stick with this approach, unless I
end up needing to respin for other reasons.

-- 
Jeff Layton 



Re: [PATCH 20/20] filelock: split leases out of struct file_lock

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 09:44 +1100, NeilBrown wrote:
> On Wed, 17 Jan 2024, Jeff Layton wrote:
> > Add a new struct file_lease and move the lease-specific fields from
> > struct file_lock to it. Convert the appropriate API calls to take
> > struct file_lease instead, and convert the callers to use them.
> 
> I think that splitting of struct lease_manager_operations out from
> lock_manager_operations should be mentioned here too.
> 

Will do.

> 
> >  
> > +struct file_lease {
> > +   struct file_lock_core fl_core;
> > +   struct fasync_struct *  fl_fasync; /* for lease break notifications */
> > +   /* for lease breaks: */
> > +   unsigned long fl_break_time;
> > +   unsigned long fl_downgrade_time;
> > +   const struct lease_manager_operations *fl_lmops;/* Callbacks 
> > for lockmanagers */
> 
> comment should be "Callbacks for leasemanagers".  Or maybe 
> "lease managers". 
> 
> It is unfortunate that "lock" and "lease" both start with 'l' as we now
> have two quite different fields in different structures with the same
> name - fl_lmops.
> 

Hah, I had sort of considered that an advantage since I didn't need to
change as many call sites! Still, I get your point that having distinct
names is preferable.

I can change this to be distinct. I'll just need to come up with a
reasonable variable name (never my strong suit).

-- 
Jeff Layton 



Re: [PATCH net] net: ipvs: avoid stat macros calls from preemptible context

2024-01-17 Thread Pablo Neira Ayuso
On Mon, Jan 15, 2024 at 05:39:22PM +0300, Fedor Pchelkin wrote:
> Inside decrement_ttl() upon discovering that the packet ttl has exceeded,
> __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible
> context having the following backtrace:
> 
> check_preemption_disabled: 48 callbacks suppressed
> BUG: using __this_cpu_add() in preemptible [] code: curl/1177
> caller is decrement_ttl+0x217/0x830

Applied to nf.git, thanks



Re: [PATCH 13/20] filelock: convert __locks_insert_block, conflict and deadlock checks to use file_lock_core

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 09:32 +1100, NeilBrown wrote:
> On Wed, 17 Jan 2024, Jeff Layton wrote:
> > Have both __locks_insert_block and the deadlock and conflict checking
> > functions take a struct file_lock_core pointer instead of a struct
> > file_lock one. Also, change posix_locks_deadlock to return bool.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/locks.c | 132 
> > -
> >  1 file changed, 70 insertions(+), 62 deletions(-)
> > 
> 
> 
> >  
> >  /* Must be called with the blocked_lock_lock held! */
> > -static int posix_locks_deadlock(struct file_lock *caller_fl,
> > -   struct file_lock *block_fl)
> > +static bool posix_locks_deadlock(struct file_lock *caller_fl,
> > +struct file_lock *block_fl)
> >  {
> > +   struct file_lock_core *caller = &caller_fl->fl_core;
> > +   struct file_lock_core *blocker = &block_fl->fl_core;
> > int i = 0;
> > -   struct file_lock_core *flc = &caller_fl->fl_core;
> >  
> > lockdep_assert_held(&blocked_lock_lock);
> >  
> > @@ -1034,16 +1040,16 @@ static int posix_locks_deadlock(struct file_lock 
> > *caller_fl,
> >  * This deadlock detector can't reasonably detect deadlocks with
> >  * FL_OFDLCK locks, since they aren't owned by a process, per-se.
> >  */
> > -   if (IS_OFDLCK(flc))
> > +   if (IS_OFDLCK(caller))
> > return 0;
> 
>   return false;
> 

Good catch. Fixed in my local branch.

-- 
Jeff Layton 



Re: [PATCH 12/20] filelock: make __locks_delete_block and __locks_wake_up_blocks take file_lock_core

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 09:23 +1100, NeilBrown wrote:
> On Wed, 17 Jan 2024, Jeff Layton wrote:
> > Convert __locks_delete_block and __locks_wake_up_blocks to take a struct
> > file_lock_core pointer. Note that to accommodate this, we need to add a
> > new file_lock() wrapper to go from file_lock_core to file_lock.
> 
> Actually we don't need it see below.
> 
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/locks.c | 43 ++-
> >  1 file changed, 26 insertions(+), 17 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index eddf4d767d5d..6b8e8820dec9 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -92,6 +92,11 @@ static inline bool IS_LEASE(struct file_lock_core *flc)
> >  
> >  #define IS_REMOTELCK(fl)   (fl->fl_core.fl_pid <= 0)
> >  
> > +struct file_lock *file_lock(struct file_lock_core *flc)
> > +{
> > +   return container_of(flc, struct file_lock, fl_core);
> > +}
> > +
> >  static bool lease_breaking(struct file_lock *fl)
> >  {
> > return fl->fl_core.fl_flags & (FL_UNLOCK_PENDING | 
> > FL_DOWNGRADE_PENDING);
> > @@ -677,31 +682,35 @@ static void locks_delete_global_blocked(struct 
> > file_lock_core *waiter)
> >   *
> >   * Must be called with blocked_lock_lock held.
> >   */
> > -static void __locks_delete_block(struct file_lock *waiter)
> > +static void __locks_delete_block(struct file_lock_core *waiter)
> >  {
> > -   locks_delete_global_blocked(&waiter->fl_core);
> > -   list_del_init(&waiter->fl_core.fl_blocked_member);
> > +   locks_delete_global_blocked(waiter);
> > +   list_del_init(&waiter->fl_blocked_member);
> >  }
> >  
> > -static void __locks_wake_up_blocks(struct file_lock *blocker)
> > +static void __locks_wake_up_blocks(struct file_lock_core *blocker)
> >  {
> > -   while (!list_empty(&blocker->fl_core.fl_blocked_requests)) {
> > -   struct file_lock *waiter;
> > +   while (!list_empty(&blocker->fl_blocked_requests)) {
> > +   struct file_lock_core *waiter;
> > +   struct file_lock *fl;
> > +
> > +   waiter = list_first_entry(&blocker->fl_blocked_requests,
> > + struct file_lock_core, 
> > fl_blocked_member);
> >  
> > -   waiter = list_first_entry(&blocker->fl_core.fl_blocked_requests,
> > - struct file_lock, 
> > fl_core.fl_blocked_member);
> 
> > +   fl = file_lock(waiter);
> 
>   fl = list_first_entry(&blocker->fl_core.fl_blocked_requests,
> struct file_lock, 
> fl_core.fl_blocked_member);
> 
> waiter = &fl->fl_core;
> 
> achieves the same result without needing file_lock().
> 
> If you really want to add file_lock() then do so, but you need a better
> justification :-)
> 
> 

Except that we actually do need to pass the file_lock pointer to
lm_notify in the block below. There are also some other places that come
up later that will need file_lock() (and file_lease(), for that matter).
I probably could get away without the container_of wrappers, but then
we'd need separate functions for leases for some of the fs/locks.c code.

To be clear, this is not the "cleanest" split ever. If we were
reimplementing leases from scratch today, I'd probably not piggyback on
so much of the file locking code. My main goal here is to get the
initial struct split done. Once that's finished, we may be able to do a
more clean separation later.

> 
> 
> > __locks_delete_block(waiter);
> > -   if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > -   waiter->fl_lmops->lm_notify(waiter);
> > +   if ((IS_POSIX(waiter) || IS_FLOCK(waiter)) &&
> > +   fl->fl_lmops && fl->fl_lmops->lm_notify)
> > +   fl->fl_lmops->lm_notify(fl);


> > else
> > -   wake_up(&waiter->fl_core.fl_wait);
> > +   wake_up(&waiter->fl_wait);
> >  
> > /*
> >  * The setting of fl_blocker to NULL marks the "done"
> >  * point in deleting a block. Paired with acquire at the top
> >  * of locks_delete_block().
> >  */
> > -   smp_store_release(&waiter->fl_core.fl_blocker, NULL);
> > +   smp_store_release(&waiter->fl_blocker, NULL);
> > }
> >  }
> >  
> > @@ -743,8 +752,8 @@ int locks_delete_block(struct file_lock *waiter)
> > spin_lock(&blocked_lock_lock);
> > if (waiter->fl_core.fl_blocker)
> > status = 0;
> > -   __locks_wake_up_blocks(waiter);
> > -   __locks_delete_block(waiter);
> > +   __locks_wake_up_blocks(&waiter->fl_core);
> > +   __locks_delete_block(&waiter->fl_core);
> >  
> > /*
> >  * The setting of fl_blocker to NULL marks the "done" point in deleting
> > @@ -799,7 +808,7 @@ static void __locks_insert_block(struct file_lock 
> > *blocker,
> >  * waiter, but might not conflict with blocker, or the requests
> >  * and lock which block it.  So they all need to be woken.
> >  */
> > -   __locks_wake_up_blocks(waiter);
> > +   __locks_wake_up_blocks(&waiter->fl_core);
> >  }
> > 

Re: [PATCH 11/20] filelock: convert the IS_* macros to take file_lock_core

2024-01-17 Thread Jeff Layton
On Wed, 2024-01-17 at 09:16 +1100, NeilBrown wrote:
> On Wed, 17 Jan 2024, Jeff Layton wrote:
> > I couldn't get them to work properly as macros, so convert them
> > to static inlines instead (which is probably better for the type safety
> > anyway).
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/locks.c | 46 +-
> >  1 file changed, 33 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 770aaa5809ba..eddf4d767d5d 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -70,10 +70,26 @@
> >  
> >  #include 
> >  
> > -#define IS_POSIX(fl)   (fl->fl_core.fl_flags & FL_POSIX)
> Used 3 times... once as
>   if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> Can an IS_POSIX lock also be IS_OFDLCK ??
> 

Yes. They conflict with one another so they're both considered POSIX
locks.

> 
> > -#define IS_FLOCK(fl)   (fl->fl_core.fl_flags & FL_FLOCK)
> Used once.
> 
> > -#define IS_LEASE(fl)   (fl->fl_core.fl_flags & 
> > (FL_LEASE|FL_DELEG|FL_LAYOUT))
> Used twice.  Either "IS_LEASE" approves things that aren't leases, or
> FL_LEASE is not set on all leases.  Names could be improved.
> 

Good point.

> > -#define IS_OFDLCK(fl)  (fl->fl_core.fl_flags & FL_OFDLCK)
> 
> used 4 times - a clear winner :-)
> 
> If it were me, I would simply discard these macros and open-code the
> tests.  I don't think IS_FLOCK() is easier to read for someone who knows
> the code, and I think IS_LEASE() is actually harder to read for someone
> who doesn't know the code, as what it does is not really obvious.
> 
> But this is just a suggestion, I won't push it.
> 
> Thanks,
> NeilBrown
> 
> 

It's a good suggestion, and I considered doing this when I converted
over the macros to inlines. I may go ahead and make this change for v2.

> > +static inline bool IS_POSIX(struct file_lock_core *flc)
> > +{
> > +   return flc->fl_flags & FL_POSIX;
> > +}
> > +
> > +static inline bool IS_FLOCK(struct file_lock_core *flc)
> > +{
> > +   return flc->fl_flags & FL_FLOCK;
> > +}
> > +
> > +static inline bool IS_OFDLCK(struct file_lock_core *flc)
> > +{
> > +   return flc->fl_flags & FL_OFDLCK;
> > +}
> > +
> > +static inline bool IS_LEASE(struct file_lock_core *flc)
> > +{
> > +   return flc->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT);
> > +}
> > +
> >  #define IS_REMOTELCK(fl)   (fl->fl_core.fl_pid <= 0)
> >  
> >  static bool lease_breaking(struct file_lock *fl)
> > @@ -761,6 +777,7 @@ static void __locks_insert_block(struct file_lock 
> > *blocker,
> >struct file_lock *))
> >  {
> > struct file_lock *fl;
> > +   struct file_lock_core *bflc;
> > BUG_ON(!list_empty(&waiter->fl_core.fl_blocked_member));
> >  
> >  new_blocker:
> > @@ -773,7 +790,9 @@ static void __locks_insert_block(struct file_lock 
> > *blocker,
> > waiter->fl_core.fl_blocker = blocker;
> > list_add_tail(&waiter->fl_core.fl_blocked_member,
> >   &blocker->fl_core.fl_blocked_requests);
> > -   if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> > +
> > +   bflc = &blocker->fl_core;
> > +   if (IS_POSIX(bflc) && !IS_OFDLCK(bflc))
> > locks_insert_global_blocked(&waiter->fl_core);
> >  
> > /* The requests in waiter->fl_blocked are known to conflict with
> > @@ -998,6 +1017,7 @@ static int posix_locks_deadlock(struct file_lock 
> > *caller_fl,
> > struct file_lock *block_fl)
> >  {
> > int i = 0;
> > +   struct file_lock_core *flc = &caller_fl->fl_core;
> >  
> > lockdep_assert_held(&blocked_lock_lock);
> >  
> > @@ -1005,7 +1025,7 @@ static int posix_locks_deadlock(struct file_lock 
> > *caller_fl,
> >  * This deadlock detector can't reasonably detect deadlocks with
> >  * FL_OFDLCK locks, since they aren't owned by a process, per-se.
> >  */
> > -   if (IS_OFDLCK(caller_fl))
> > +   if (IS_OFDLCK(flc))
> > return 0;
> >  
> > while ((block_fl = what_owner_is_waiting_for(block_fl))) {
> > @@ -2157,7 +2177,7 @@ static pid_t locks_translate_pid(struct file_lock 
> > *fl, struct pid_namespace *ns)
> > pid_t vnr;
> > struct pid *pid;
> >  
> > -   if (IS_OFDLCK(fl))
> > +   if (IS_OFDLCK(&fl->fl_core))
> > return -1;
> > if (IS_REMOTELCK(fl))
> > return fl->fl_core.fl_pid;
> > @@ -2721,19 +2741,19 @@ static void lock_get_status(struct seq_file *f, 
> > struct file_lock *fl,
> > if (repeat)
> > seq_printf(f, "%*s", repeat - 1 + (int)strlen(pfx), pfx);
> >  
> > -   if (IS_POSIX(fl)) {
> > +   if (IS_POSIX(&fl->fl_core)) {
> > if (fl->fl_core.fl_flags & FL_ACCESS)
> > seq_puts(f, "ACCESS");
> > -   else if (IS_OFDLCK(fl))
> > +   else if (IS_OFDLCK(&fl->fl_core))
> > seq_puts(f, "OFDLCK");
> > else
> > seq_puts(f, "POSIX ");
> >  
> > seq_printf(f, " %s ",
> >  (inode == NULL) ? "*NOINODE*" : "ADVISORY ");
> > -   } else if 

Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

2024-01-17 Thread Honggyu Kim
Hi SeongJae,

Thanks very much for your detailed comments.

On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park  wrote:

> Thank you so much for these great patches and the above nice test results.  I
> believe the test setup and results make sense, and merging a revised version
> of this patchset would provide real benefits to the users.

Glad to hear that!

> In a high level, I think it might better to separate DAMON internal changes
> from DAMON external changes.

I agree.  I can't guarantee that I can move all the external changes
inside mm/damon, but I will try that as much as possible.

> For DAMON part changes, I have no big concern other than trivial coding style
> level comments.

Sure.  I will fix those.

> For DAMON-external changes that implement demote_pages() and
> promote_pages(), I'm unsure if the implementation is reusing appropriate
> functions, and if those are placed in the right source file.  Especially, I'm
> unsure if vmscan.c is the right place for the promotion code.  Also I don't
> know if there is a good agreement on the promotion/demotion target node
> decision.  That may be because I'm not that familiar with the areas and the
> files, but I feel this might be because our discussions on the promotion and
> the demotion operations still have room to mature.  Because I'm not very
> familiar with this part, I'd like to hear others' comments, too.

I would also like to hear others' comments, but this might not be needed
if most of external code can be moved to mm/damon.

> To this end, I feel the problem might be able to be simpler, because this
> patchset is trying to provide two sophisticated operations, while I think a
> simpler approach might be possible.  My humble simpler idea is adding a DAMOS
> operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> instead of the promote/demote.  Because the general pages migration can handle
> multiple cases including the promote/demote in my humble assumption.

My initial implementation was similar, but I found that it's not accurate
enough due to the inherent inaccuracy of DAMON regions.  I saw that
many pages were demoted and promoted back and forth because the migration
target regions include both hot and cold pages together.

So I have implemented the demotion and promotion logics based on the
shrink_folio_list, which contains many corner case handling logics for
reclaim.

Having the current demotion and promotion logics makes the hot/cold
migration pretty accurate, as expected.  We made a simple program called
"hot_cold", which receives 2 arguments for hot size and cold size in MB.
For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
of cold memory.  It basically allocates 2 large blocks of memory with
mmap, then repeatedly memsets the initial 200MB in an infinite loop so
that it keeps being accessed.

Let's say there are 3 nodes in the system and the first node0 and node1
are the first tier, and node2 is the second tier.

  $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
  0-1

  $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
  2

Here is the result of partitioning hot/cold memory; I put the execution
command at the right side of the numastat result.  I initially ran each
hot_cold program with a preferred memory policy so that they initially
allocate memory on one of node0 or node2, but the pages gradually
migrated based on their access frequencies.

  $ numastat -c -p hot_cold
  Per-node process memory usage (in MBs)
  PID               Node 0  Node 1  Node 2  Total
  ----------------  ------  ------  ------  -----
  754 (hot_cold)      1800       0    2000   3800   <- hot_cold 1800 2000
  1184 (hot_cold)      300       0     500    800   <- hot_cold 300 500
  1818 (hot_cold)      801       0    3199   4000   <- hot_cold 800 3200
  30289 (hot_cold)       4       0       5     10   <- hot_cold 3 5
  30325 (hot_cold)      31       0      51     81   <- hot_cold 30 50
  ----------------  ------  ------  ------  -----
  Total               2938       0    5756   8695

The final node placement result shows that DAMON accurately migrated
pages according to their hotness for multiple processes.

> In more detail, users could decide which is the appropriate node for promotion
> or demotion and use the new DAMOS action to do promotion and demotion.  Users
> would be requested to decide which node is the proper promotion/demotion target
> node, but that decision wouldn't be that hard in my opinion.
> 
> For this, 'struct damos' would need to be updated for such argument-dependent
> actions, like 'struct damos_filter' is having a union.

That might be a better solution.  I will think about it.

> In the future, we could extend the operation to the promotion and the demotion
> after the discussion around promotion and demotion has matured, if required.
> And assuming DAMON is extended for originating-CPU-aware access monitoring, the
> new DAMOS action would also cover more use cases such as general NUMA node
> balancing.

Re: [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw

2024-01-17 Thread Hou Tao
Hi,

On 1/11/2024 6:34 AM, Bernd Schubert wrote:
>
>
> On 1/10/24 02:16, Hou Tao wrote:
>> Hi,
>>
>> On 1/9/2024 9:11 PM, Bernd Schubert wrote:
>>>
>>>
>>> On 1/3/24 11:59, Hou Tao wrote:
 From: Hou Tao 

 When trying to insert a 10MB kernel module kept in a virtiofs with cache
 disabled, the following warning was reported:

     ------------[ cut here ]------------
     WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ..
     Modules linked in:
     CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ..
     RIP: 0010:__alloc_pages+0x2c4/0x360
     ..
     Call Trace:
  
  ? __warn+0x8f/0x150
  ? __alloc_pages+0x2c4/0x360
  __kmalloc_large_node+0x86/0x160
  __kmalloc+0xcd/0x140
  virtio_fs_enqueue_req+0x240/0x6d0
  virtio_fs_wake_pending_and_unlock+0x7f/0x190
  queue_request_and_unlock+0x58/0x70
  fuse_simple_request+0x18b/0x2e0
  fuse_direct_io+0x58a/0x850
  fuse_file_read_iter+0xdb/0x130
  __kernel_read+0xf3/0x260
  kernel_read+0x45/0x60
  kernel_read_file+0x1ad/0x2b0
  init_module_from_file+0x6a/0xe0
  idempotent_init_module+0x179/0x230
  __x64_sys_finit_module+0x5d/0xb0
  do_syscall_64+0x36/0xb0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
  ..
  
     ---[ end trace  ]---

 The warning happened as follows.  In copy_args_to_argbuf(), virtiofs
 uses kmalloc-ed memory as a bounce buffer for fuse args, but
 fuse_get_user_pages() only limits the length of a fuse arg by max_read
 or max_write for ITER_KVEC io (e.g., kernel_read_file() from
 finit_module()).  For virtiofs, max_read is UINT_MAX, so a big read
 request which is about
>>>
>>>
>>> I find this part of the explanation a bit confusing. I guess you
>>> wanted to write something like
>>>
>>> fuse_direct_io() -> fuse_get_user_pages() is limited by
>>> fc->max_write/fc->max_read and fc->max_pages. For virtiofs max_pages
>>> does not apply as ITER_KVEC is used. As virtiofs sets fc->max_read to
>>> UINT_MAX basically no limit is applied at all.
>>
>> Yes, what you said is just as expected, but it is not the root cause of
>> the warning. The culprit of the warning is the kmalloc() in
>> copy_args_to_argbuf(), just as said in the commit message. vmalloc() is
>> also not acceptable, because the physical memory needs to be contiguous.
>> For this problem, because no pages are involved, there will be extra sg
>> entries available; maybe we can use these sgs to break the big
>> read/write request into pages.
>
> Hmm ok, I was hoping that contiguous memory is not needed.
> I see that ENOMEM is handled, but how does that perform (or even
> complete) on a really badly fragmented system? I guess splitting into
> smaller pages or at least adding some reserve kmem_cache (or even
> mempool) would make sense?

I don't think using a kmem_cache will help, because direct IO initiated
from the kernel (ITER_KVEC io) needs a big and contiguous memory chunk.
I have written a draft patch in which it breaks the ITER_KVEC chunk into
pages, uses these pages to initialize extra sgs and passes them to
virtiofsd. It works, but it is a bit complicated and I am not sure
whether it is worth the complexity. Anyway, I will beautify it and post
it as v2.

>
>>>
>>> I also wonder if it wouldn't it make sense to set a sensible limit in
>>> virtio_fs_ctx_set_defaults() instead of introducing a new variable?
>>
>> As said in the commit message:
>>
>> A feasible solution is to limit the value of max_read for virtiofs, so
>> the length passed to kmalloc() will be limited. However it will affect
>> the max read size for ITER_IOVEC io, and the value of max_write also
>> needs limitation.
>>
>> It is a bit hard to set a reasonable value for both max_read and
>> max_write to handle both normal ITER_IOVEC io and ITER_KVEC io. And
>> considering ITER_KVEC io + dio case is uncommon, I think using a new
>> limitation is more reasonable.
>
> For ITER_IOVEC max_pages applies - which is limited to
> FUSE_MAX_MAX_PAGES - why can't this be used in
> virtio_fs_ctx_set_defaults?

It won't help much. Under x86-64, max_read would be 256 * 4KB = 1MB, so
it would still try to do kmalloc(1MB, GFP_ATOMIC), and I think that
still creates too much memory pressure for the system.
>
> @Miklos, is there a reason why there is no upper fc->max_{read,write}
> limit in process_init_reply()? Shouldn't both be limited to
> (FUSE_MAX_MAX_PAGES * PAGE_SIZE). Or any other reasonable limit?

It seems that for all other read/write requests besides ITER_IOVEC
direct io, the max_pages limit is honored implicitly.
>
>
> Thanks,
> Bernd
>
>
>
>>>
>>> Also, I guess the issue is the kmalloc_array() in virtio_fs_enqueue_req?
>>> Wouldn't it make sense to use kvmalloc_array()/kvfree() in that function?
>>>
>>>
>>> Thanks,
>>> 

Re: [PATCH net] ipvs: Simplify the allocation of ip_vs_conn slab caches

2024-01-17 Thread Simon Horman
On Wed, Jan 17, 2024 at 03:20:45PM +0800, Kunwu Chan wrote:
> Use the new KMEM_CACHE() macro instead of calling kmem_cache_create()
> directly to simplify the creation of SLAB caches.
> 
> Signed-off-by: Kunwu Chan 

Hi Kunwu Chan,

I think this is more of a cleanup than a fix,
so it should probably be targeted at 'nf-next' rather than 'net'.

If it is a fix, then I would suggest targeting it at 'nf'
and providing a Fixes tag.

The above notwithstanding, this looks good to me.

Acked-by: Simon Horman 

> ---
>  net/netfilter/ipvs/ip_vs_conn.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index a743db073887..98d7dbe3d787 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -1511,9 +1511,7 @@ int __init ip_vs_conn_init(void)
>   return -ENOMEM;
>  
>   /* Allocate ip_vs_conn slab cache */
> - ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
> -   sizeof(struct ip_vs_conn), 0,
> -   SLAB_HWCACHE_ALIGN, NULL);
> + ip_vs_conn_cachep = KMEM_CACHE(ip_vs_conn, SLAB_HWCACHE_ALIGN);
>   if (!ip_vs_conn_cachep) {
>   kvfree(ip_vs_conn_tab);
>   return -ENOMEM;



Re: [PATCH net] ipvs: Simplify the allocation of ip_vs_conn slab caches

2024-01-17 Thread Denis Kirjanov



On 1/17/24 10:20, Kunwu Chan wrote:
> Use the new KMEM_CACHE() macro instead of calling kmem_cache_create()
> directly to simplify the creation of SLAB caches.
> 
> Signed-off-by: Kunwu Chan 

The patch is actually for net-next.

> ---
>  net/netfilter/ipvs/ip_vs_conn.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index a743db073887..98d7dbe3d787 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -1511,9 +1511,7 @@ int __init ip_vs_conn_init(void)
>   return -ENOMEM;
>  
>   /* Allocate ip_vs_conn slab cache */
> - ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
> -   sizeof(struct ip_vs_conn), 0,
> -   SLAB_HWCACHE_ALIGN, NULL);
> + ip_vs_conn_cachep = KMEM_CACHE(ip_vs_conn, SLAB_HWCACHE_ALIGN);
>   if (!ip_vs_conn_cachep) {
>   kvfree(ip_vs_conn_tab);
>   return -ENOMEM;