Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-20 Thread Google
On Sat, 20 Apr 2024 07:22:50 +0300
Mike Rapoport  wrote:

> On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote:
> > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport  wrote:
> > >
> > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport  wrote:
> > > > [...]
> > > > > > >
> > > > > > > [1] 
> > > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> > > > > >
> > > > > > For the ROX to work, we need different users (module text, kprobe, 
> > > > > > etc.) to have
> > > > > > the same execmem_range. From [1]:
> > > > > >
> > > > > > static void *execmem_cache_alloc(struct execmem_range *range, 
> > > > > > size_t size)
> > > > > > {
> > > > > > ...
> > > > > >p = __execmem_cache_alloc(size);
> > > > > >if (p)
> > > > > >return p;
> > > > > >   err = execmem_cache_populate(range, size);
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > We are calling __execmem_cache_alloc() without range. For this to 
> > > > > > work,
> > > > > > we can only call execmem_cache_alloc() with one execmem_range.
> > > > >
> > > > > Actually, on x86 this will "just work" because everything shares the 
> > > > > same
> > > > > address space :)
> > > > >
> > > > > The 2M pages in the cache will be in the modules space, so
> > > > > __execmem_cache_alloc() will always return memory from that address 
> > > > > space.
> > > > >
> > > > > For other architectures this indeed needs to be fixed with passing the
> > > > > range to __execmem_cache_alloc() and limiting search in the cache for 
> > > > > that
> > > > > range.
> > > >
> > > > I think we at least need the "map to" concept (initially proposed by 
> > > > Thomas)
> > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share
> > > > the same range.
> > >
> > > Why?
> > 
> > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
> > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
> > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
> > the "range" object, we will have to compare different range parameters to 
> > check
> > we can share cached pages between module text and kprobe, which is not
> > efficient. Did I miss something?

Song, thanks for trying to eplain. I think I need to explain why I used
module_alloc() originally.

This depends on how kprobe features are implemented on the architecture, and
how much features are supported on kprobes.

Because kprobe jump optimization and kprobe jump-back optimization need to
use a jump instruction to jump into the trampoline and jump back from the
trampoline directly, if the architecuture jmp instruction supports +-2GB range
like x86, it needs to allocate the trampoline buffer inside such address space.
This requirement is similar to the modules (because module function needs to
call other functions in the kernel etc.), at least kprobes on x86 used
module_alloc().

However, if an architecture only supports breakpoint/trap based kprobe,
it does not need to consider whether the execmem is allocated.

> 
> We can always share large ROX pages as long as they are within the correct
> address space. The permissions for them are ROX and the alignment
> differences are due to KASAN and this is handled during allocation of the
> large page to refill the cache. __execmem_cache_alloc() only needs to limit
> the search for the address space of the range.

So I don't think EXECMEM_KPROBE always same as EXECMEM_MODULE_TEXT, it
should be configured for each arch. Especially, if it is only used for
searching parameter, it looks OK to me.

Thank you,

> 
> And regardless, they way we deal with sharing of the cache can be sorted
> out later.
> 
> > Thanks,
> > Song
> 
> -- 
> Sincerely yours,
> Mike.
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-20 Thread Mike Rapoport
On Fri, Apr 19, 2024 at 03:59:40PM +, Christophe Leroy wrote:
> 
> 
> Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > Hi Masami,
> > 
> > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> >> Hi Mike,
> >>
> >> On Thu, 11 Apr 2024 19:00:50 +0300
> >> Mike Rapoport  wrote:
> >>
> >>> From: "Mike Rapoport (IBM)" 
> >>>
> >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> >>> code.
> >>>
> >>> Since code allocations are now implemented with execmem, kprobes can be
> >>> enabled in non-modular kernels.
> >>>
> >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> >>
> >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> >> function body? We have enough dummy functions for that, so it should
> >> not make a problem.
> > 
> > The code in check_kprobe_address_safe() that gets the module and checks for
> > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > I can pull it out to a helper or leave #ifdef in the function body,
> > whichever you prefer.
> 
> As far as I can see, the only problem is MODULE_STATE_COMING.
> Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?

There's dereference of 'struct module' there:
 
(*probed_mod)->state != MODULE_STATE_COMING) {
...
}

so moving out 'enum module_state' won't be enough.
 
> >   
> >> -- 
> >> Masami Hiramatsu
> > 

-- 
Sincerely yours,
Mike.



Re: [PATCH virt] virt: fix uninit-value in vhost_vsock_dev_open

2024-04-20 Thread Michael S. Tsirkin
On Sat, Apr 20, 2024 at 05:57:50PM +0900, Jeongjun Park wrote:
> Change vhost_vsock_dev_open() to use kvzalloc() instead of kvmalloc()
> to avoid uninit state.
> 
> Reported-by: syzbot+6c21aeb59d0e82eb2...@syzkaller.appspotmail.com
> Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by 
> __GFP_RETRY_MAYFAIL with more useful semantic")
> Signed-off-by: Jeongjun Park 

What value exactly is used uninitialized?

> ---
>  drivers/vhost/vsock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ec20ecff85c7..652ef97a444b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -656,7 +656,7 @@ static int vhost_vsock_dev_open(struct inode *inode, 
> struct file *file)
>   /* This struct is large and allocation could fail, fall back to vmalloc
>* if there is no other way.
>*/
> - vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> + vsock = kvzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>   if (!vsock)
>   return -ENOMEM;
>  
> -- 
> 2.34.1




Re: [PATCH v9 07/36] function_graph: Allow multiple users to attach to function graph

2024-04-20 Thread Google
On Fri, 19 Apr 2024 23:52:58 -0400
Steven Rostedt  wrote:

> On Mon, 15 Apr 2024 21:50:20 +0900
> "Masami Hiramatsu (Google)"  wrote:
> 
> > @@ -27,23 +28,157 @@
> >  
> >  #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
> >  #define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
> > +
> > +/*
> > + * On entry to a function (via function_graph_enter()), a new 
> > ftrace_ret_stack
> > + * is allocated on the task's ret_stack with indexes entry, then each
> > + * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that 
> > returns
> > + * non-zero, the index into the fgraph_array[] for that fgraph_ops is 
> > recorded
> > + * on the indexes entry as a bit flag.
> > + * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
> > + * be found, the index to it is also added to the ret_stack along with the
> > + * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
> > + * called.
> > + *
> > + * The top of the ret_stack (when not empty) will always have a reference
> > + * to the last ftrace_ret_stack saved. All references to the
> > + * ftrace_ret_stack has the format of:
> > + *
> > + * bits:  0 -  9   offset in words from the previous ftrace_ret_stack
> > + * (bitmap type should have FGRAPH_RET_INDEX always)
> > + * bits: 10 - 11   Type of storage
> > + *   0 - reserved
> > + *   1 - bitmap of fgraph_array index
> > + *
> > + * For bitmap of fgraph_array index
> > + *  bits: 12 - 27  The bitmap of fgraph_ops fgraph_array index
> 
> I really hate the terminology I came up with here, and would love to
> get better terminology for describing what is going on. I looked it
> over but I'm constantly getting confused. And I wrote this code!
> 
> Perhaps we should use:
> 
>  @frame : The data that represents a single function call. When a
>   function is traced, all the data used for all the callbacks
>   attached to it, is in a single frame. This would replace the
>   FGRAPH_RET_SIZE as FGRAPH_FRAME_SIZE.

Agreed.

> 
>  @offset : This is the word size position on the stack. It would
>replace INDEX, as I think "index" is being used for more
>than one thing. Perhaps it should be "offset" when dealing
>with where it is on the shadow stack, and "pos" when dealing
>with which callback ops is being referenced.

Indeed. @index is usually used from the index in an array. So we can use
@index for fgraph_array[]. But inside a @frame, @offset would be better.

> 
> 
> > + *
> > + * That is, at the end of function_graph_enter, if the first and forth
> > + * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc 
> > called
> > + * on the return of the function being traced, this is what will be on the
> > + * task's shadow ret_stack: (the stack grows upward)
> > + *
> > + * || <- task->curr_ret_stack
> > + * ++
> > + * | bitmap_type(bitmap:(BIT(3)|BIT(0)),|
> > + * | offset:FGRAPH_RET_INDEX)   | <- the offset is from 
> > here
> > + * ++
> > + * | struct ftrace_ret_stack|
> > + * |   (stores the saved ret pointer)   | <- the offset points here
> > + * ++
> > + * | (X) | (N)  | ( N words away from
> > + * ||   previous ret_stack)
> > + *
> > + * If a backtrace is required, and the real return pointer needs to be
> > + * fetched, then it looks at the task's curr_ret_stack index, if it
> > + * is greater than zero (reserved, or right before poped), it would mask
> > + * the value by FGRAPH_RET_INDEX_MASK to get the offset index of the
> > + * ftrace_ret_stack structure stored on the shadow stack.
> > + */
> > +
> > +#define FGRAPH_RET_INDEX_SIZE  10
> 
> Replace SIZE with BITS.

Agreed.

> 
> > +#define FGRAPH_RET_INDEX_MASK  GENMASK(FGRAPH_RET_INDEX_SIZE - 1, 0)
> 
>   #define FGRAPH_FRAME_SIZE_BITS  10
>   #define FGRAPH_FRAME_SIZE_MASK  GENMASK(FGRAPH_FRAME_SIZE_BITS - 1, 0)
> 
> 
> > +
> > +#define FGRAPH_TYPE_SIZE   2
> > +#define FGRAPH_TYPE_MASK   GENMASK(FGRAPH_TYPE_SIZE - 1, 0)
> 
>   #define FGRAPH_TYPE_BITS2
>   #define FGRAPH_TYPE_MASKGENMASK(FGRAPH_TYPE_BITS - 1, 0)
> 
> 
> > +#define FGRAPH_TYPE_SHIFT  FGRAPH_RET_INDEX_SIZE
> > +
> > +enum {
> > +   FGRAPH_TYPE_RESERVED= 0,
> > +   FGRAPH_TYPE_BITMAP  = 1,
> > +};
> > +
> > +#define FGRAPH_INDEX_SIZE  16
> 
> replace "INDEX" with "OPS" as it will be the indexes of ops in the
> array.
> 
>   #define FGRAPH_OPS_BITS 16
>   #define FGRAPH_OPS_MASK GENMASK(FGRAPH_OPS_BITS - 1, 0)

OK, this looks good.

> 
> > +#define FGRAPH_INDEX_MASK  GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
> > +#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + 

[PATCH virt] virt: fix uninit-value in vhost_vsock_dev_open

2024-04-20 Thread Jeongjun Park
Change vhost_vsock_dev_open() to use kvzalloc() instead of kvmalloc()
to avoid uninit state.

Reported-by: syzbot+6c21aeb59d0e82eb2...@syzkaller.appspotmail.com
Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by 
__GFP_RETRY_MAYFAIL with more useful semantic")
Signed-off-by: Jeongjun Park 
---
 drivers/vhost/vsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ec20ecff85c7..652ef97a444b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -656,7 +656,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct 
file *file)
/* This struct is large and allocation could fail, fall back to vmalloc
 * if there is no other way.
 */
-   vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+   vsock = kvzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!vsock)
return -ENOMEM;
 
-- 
2.34.1



Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-20 Thread Google
On Sat, 20 Apr 2024 10:33:38 +0300
Mike Rapoport  wrote:

> On Fri, Apr 19, 2024 at 03:59:40PM +, Christophe Leroy wrote:
> > 
> > 
> > Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > > Hi Masami,
> > > 
> > > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> > >> Hi Mike,
> > >>
> > >> On Thu, 11 Apr 2024 19:00:50 +0300
> > >> Mike Rapoport  wrote:
> > >>
> > >>> From: "Mike Rapoport (IBM)" 
> > >>>
> > >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > >>> code.
> > >>>
> > >>> Since code allocations are now implemented with execmem, kprobes can be
> > >>> enabled in non-modular kernels.
> > >>>
> > >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> > >>
> > >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> > >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> > >> function body? We have enough dummy functions for that, so it should
> > >> not make a problem.
> > > 
> > > The code in check_kprobe_address_safe() that gets the module and checks 
> > > for
> > > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > > I can pull it out to a helper or leave #ifdef in the function body,
> > > whichever you prefer.
> > 
> > As far as I can see, the only problem is MODULE_STATE_COMING.
> > Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?
> 
> There's dereference of 'struct module' there:
>  
>   (*probed_mod)->state != MODULE_STATE_COMING) {
>   ...
>   }
> 
> so moving out 'enum module_state' won't be enough.

Hmm, this part should be inline functions like;

#ifdef CONFIG_MODULES
static inline bool module_is_coming(struct module *mod)
{
return mod->state == MODULE_STATE_COMING;
}
#else
#define module_is_coming(mod) (false)
#endif

Then we don't need the enum.
Thank you,

>  
> > >   
> > >> -- 
> > >> Masami Hiramatsu
> > > 
> 
> -- 
> Sincerely yours,
> Mike.
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH virt] virt: fix uninit-value in vhost_vsock_dev_open

2024-04-20 Thread Jeongjun Park
static bool vhost_transport_seqpacket_allow(u32 remote_cid)
{

vsock = vhost_vsock_get(remote_cid);

if (vsock)
seqpacket_allow = vsock->seqpacket_allow;

}

I think this is due to reading a previously created uninitialized 
vsock->seqpacket_allow inside vhost_transport_seqpacket_allow(), 
which is executed by the function pointer present in the if statement.

Thanks



Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-20 Thread Mike Rapoport
On Sat, Apr 20, 2024 at 06:15:00PM +0900, Masami Hiramatsu wrote:
> On Sat, 20 Apr 2024 10:33:38 +0300
> Mike Rapoport  wrote:
> 
> > On Fri, Apr 19, 2024 at 03:59:40PM +, Christophe Leroy wrote:
> > > 
> > > 
> > > Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > > > Hi Masami,
> > > > 
> > > > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> > > >> Hi Mike,
> > > >>
> > > >> On Thu, 11 Apr 2024 19:00:50 +0300
> > > >> Mike Rapoport  wrote:
> > > >>
> > > >>> From: "Mike Rapoport (IBM)" 
> > > >>>
> > > >>> kprobes depended on CONFIG_MODULES because it has to allocate memory 
> > > >>> for
> > > >>> code.
> > > >>>
> > > >>> Since code allocations are now implemented with execmem, kprobes can 
> > > >>> be
> > > >>> enabled in non-modular kernels.
> > > >>>
> > > >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes 
> > > >>> inside
> > > >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > > >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> > > >>
> > > >> Thanks for this work, but this conflicts with the latest fix in 
> > > >> v6.9-rc4.
> > > >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> > > >> function body? We have enough dummy functions for that, so it should
> > > >> not make a problem.
> > > > 
> > > > The code in check_kprobe_address_safe() that gets the module and checks 
> > > > for
> > > > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > > > I can pull it out to a helper or leave #ifdef in the function body,
> > > > whichever you prefer.
> > > 
> > > As far as I can see, the only problem is MODULE_STATE_COMING.
> > > Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  
> > > ?
> > 
> > There's dereference of 'struct module' there:
> >  
> > (*probed_mod)->state != MODULE_STATE_COMING) {
> > ...
> > }
> > 
> > so moving out 'enum module_state' won't be enough.
> 
> Hmm, this part should be inline functions like;
> 
> #ifdef CONFIG_MODULES
> static inline bool module_is_coming(struct module *mod)
> {
>   return mod->state == MODULE_STATE_COMING;
> }
> #else
> #define module_is_coming(mod) (false)

I'd prefer

static inline module_is_coming(struct module *mod)
{
return false;
}

> #endif
>
> Then we don't need the enum.
> Thank you,
> 
> -- 
> Masami Hiramatsu (Google) 

-- 
Sincerely yours,
Mike.



Re: [PATCH 2/4] iio: light: stk3310: Implement vdd supply and power it off during suspend

2024-04-20 Thread Jonathan Cameron
On Sun, 14 Apr 2024 13:57:14 -0400
Aren Moynihan  wrote:

> From: Ondrej Jirman 
> 
> VDD power input can be used to completely power off the chip during
> system suspend. Do so if available.
I'd make this non optional (relying on regulator framework providing
us a stub for old DT etc) and pay the minor cost of potentially restoring
registers when it was never powered down.

Simpler code and likely anyone who is doing suspend / resume will have
power control anyway.

Jonathan

> 
> Signed-off-by: Ondrej Jirman 
> Signed-off-by: Aren Moynihan 
> ---
>  drivers/iio/light/stk3310.c | 56 +++--
>  1 file changed, 53 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iio/light/stk3310.c b/drivers/iio/light/stk3310.c
> index 7b71ad71d78d..bfa090538df7 100644
> --- a/drivers/iio/light/stk3310.c
> +++ b/drivers/iio/light/stk3310.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define STK3310_REG_STATE0x00
>  #define STK3310_REG_PSCTRL   0x01
> @@ -117,6 +118,7 @@ struct stk3310_data {
>   struct regmap_field *reg_int_ps;
>   struct regmap_field *reg_flag_psint;
>   struct regmap_field *reg_flag_nf;
> + struct regulator *vdd_reg;
>  };
>  
>  static const struct iio_event_spec stk3310_events[] = {
> @@ -607,6 +609,16 @@ static int stk3310_probe(struct i2c_client *client)
>  
>   mutex_init(>lock);
>  
> + data->vdd_reg = devm_regulator_get_optional(>dev, "vdd");

This needs a comment on why it is optional.
Generally power supply regulators are not, but I think the point here
is to avoid restoring the registers if the chip wasn't powered down?

This feels like an interesting gap in the regulator framework.
For most cases we can rely on  stub / fake regulator being created
for always on supplies, but that doesn't let us elide the register writes.

My gut feeling is do them unconditionally. Suspend / resume isn't
that common that it will matter much.

That would allow you to have this as devm_regulator_get() and
drop handling of it not being provided.

> + if (IS_ERR(data->vdd_reg)) {
> + ret = PTR_ERR(data->vdd_reg);
> + if (ret == -ENODEV)
> + data->vdd_reg = NULL;
> + else
> + return dev_err_probe(>dev, ret,
> +  "get regulator vdd failed\n");
> + }
> +
>   ret = stk3310_regmap_init(data);
>   if (ret < 0)
>   return ret;
> @@ -617,9 +629,18 @@ static int stk3310_probe(struct i2c_client *client)
>   indio_dev->channels = stk3310_channels;
>   indio_dev->num_channels = ARRAY_SIZE(stk3310_channels);
>  
> + if (data->vdd_reg) {
> + ret = regulator_enable(data->vdd_reg);
> + if (ret)
> + return dev_err_probe(>dev, ret,
> +  "regulator vdd enable failed\n");
> +
> + usleep_range(1000, 2000);
> + }
> +
>   ret = stk3310_init(indio_dev);
>   if (ret < 0)
> - return ret;
> + goto err_vdd_disable;
>  
>   if (client->irq > 0) {
>   ret = devm_request_threaded_irq(>dev, client->irq,
> @@ -645,32 +666,61 @@ static int stk3310_probe(struct i2c_client *client)
>  
>  err_standby:
>   stk3310_set_state(data, STK3310_STATE_STANDBY);
> +err_vdd_disable:
> + if (data->vdd_reg)
> + regulator_disable(data->vdd_reg);
>   return ret;
>  }
>  
>  static void stk3310_remove(struct i2c_client *client)
>  {
>   struct iio_dev *indio_dev = i2c_get_clientdata(client);
> + struct stk3310_data *data = iio_priv(indio_dev);
>  
>   iio_device_unregister(indio_dev);
>   stk3310_set_state(iio_priv(indio_dev), STK3310_STATE_STANDBY);
> + if (data->vdd_reg)
> + regulator_disable(data->vdd_reg);
>  }
>  
>  static int stk3310_suspend(struct device *dev)
>  {
>   struct stk3310_data *data;
> + int ret;
>  
>   data = iio_priv(i2c_get_clientdata(to_i2c_client(dev)));
>  
> - return stk3310_set_state(data, STK3310_STATE_STANDBY);
> + ret = stk3310_set_state(data, STK3310_STATE_STANDBY);
> + if (ret)
> + return ret;
> +
> + if (data->vdd_reg) {

As above, I don't think we care enough about overhead on
boards where there isn't a vdd regulator.   Just do this
unconditionally.

> + regcache_mark_dirty(data->regmap);
> + regulator_disable(data->vdd_reg);
> + }
> +
> + return 0;
>  }
>  
>  static int stk3310_resume(struct device *dev)
>  {
> - u8 state = 0;
>   struct stk3310_data *data;
> + u8 state = 0;
> + int ret;
>  
>   data = iio_priv(i2c_get_clientdata(to_i2c_client(dev)));
> +
> + if (data->vdd_reg) {
> + ret = regulator_enable(data->vdd_reg);
> + if (ret) {
> + dev_err(dev, "Failed to re-enable regulator vdd\n");
> +  

Re: [PATCH 1/2] tracing/user_events: Fix non-spaced field matching

2024-04-20 Thread Google
On Fri, 19 Apr 2024 14:13:34 -0700
Beau Belgrave  wrote:

> On Fri, Apr 19, 2024 at 11:33:05AM +0900, Masami Hiramatsu wrote:
> > On Tue, 16 Apr 2024 22:41:01 +
> > Beau Belgrave  wrote:
> > 
> > > When the ABI was updated to prevent same name w/different args, it
> > > missed an important corner case when fields don't end with a space.
> > > Typically, space is used for fields to help separate them, like
> > > "u8 field1; u8 field2". If no spaces are used, like
> > > "u8 field1;u8 field2", then the parsing works for the first time.
> > > However, the match check fails on a subsequent register, leading to
> > > confusion.
> > > 
> > > This is because the match check uses argv_split() and assumes that all
> > > fields will be split upon the space. When spaces are used, we get back
> > > { "u8", "field1;" }, without spaces we get back { "u8", "field1;u8" }.
> > > This causes a mismatch, and the user program gets back -EADDRINUSE.
> > > 
> > > Add a method to detect this case before calling argv_split(). If found
> > > force a space after the field separator character ';'. This ensures all
> > > cases work properly for matching.
> > > 
> > > With this fix, the following are all treated as matching:
> > > u8 field1;u8 field2
> > > u8 field1; u8 field2
> > > u8 field1;\tu8 field2
> > > u8 field1;\nu8 field2
> > 
> > Sounds good to me. I just have some nits.
> > 
> > > 
> > > Fixes: ba470eebc2f6 ("tracing/user_events: Prevent same name but 
> > > different args event")
> > > Signed-off-by: Beau Belgrave 
> > > ---
> > >  kernel/trace/trace_events_user.c | 88 +++-
> > >  1 file changed, 87 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/trace/trace_events_user.c 
> > > b/kernel/trace/trace_events_user.c
> > > index 70d428c394b6..9184d3962b2a 100644
> > > --- a/kernel/trace/trace_events_user.c
> > > +++ b/kernel/trace/trace_events_user.c
> > > @@ -1989,6 +1989,92 @@ static int user_event_set_tp_name(struct 
> > > user_event *user)
> > >   return 0;
> > >  }
> > >  
> > > +/*
> > > + * Counts how many ';' without a trailing space are in the args.
> > > + */
> > > +static int count_semis_no_space(char *args)
> > > +{
> > > + int count = 0;
> > > +
> > > + while ((args = strchr(args, ';'))) {
> > > + args++;
> > > +
> > > + if (!isspace(*args))
> > > + count++;
> > > + }
> > > +
> > > + return count;
> > > +}
> > > +
> > > +/*
> > > + * Copies the arguments while ensuring all ';' have a trailing space.
> > > + */
> > > +static char *fix_semis_no_space(char *args, int count)
> > 
> > nit: This name does not represent what it does. 'insert_space_after_semis()'
> > is more self-described.
> > 
> 
> Sure, will fix in a v2.
> 
> > > +{
> > > + char *fixed, *pos;
> > > + char c, last;
> > > + int len;
> > > +
> > > + len = strlen(args) + count;
> > > + fixed = kmalloc(len + 1, GFP_KERNEL);
> > > +
> > > + if (!fixed)
> > > + return NULL;
> > > +
> > > + pos = fixed;
> > > + last = '\0';
> > > +
> > > + while (len > 0) {
> > > + c = *args++;
> > > +
> > > + if (last == ';' && !isspace(c)) {
> > > + *pos++ = ' ';
> > > + len--;
> > > + }
> > > +
> > > + if (len > 0) {
> > > + *pos++ = c;
> > > + len--;
> > > + }
> > > +
> > > + last = c;
> > > + }
> > 
> > nit: This loop can be simpler, because we are sure fixed has enough length;
> > 
> > /* insert a space after ';' if there is no space. */
> > while(*args) {
> > *pos = *args++;
> > if (*pos++ == ';' && !isspace(*args))
> > *pos++ = ' ';
> > }
> > 
> 
> I was worried that if count_semis_no_space() ever had different logic
> (maybe after this commit) that it could cause an overflow if the count
> was wrong, etc.
> 
> I don't have an issue making it shorter, but I was trying to be more on
> the safe side, since this isn't a fast path (event register).

OK, anyway current code looks correct. But note that I don't think
"pos++; len--;" is safer, since it is not atomic. This pattern
easily loose "len--;" in my experience. So please carefully use it ;)

> 
> > > +
> > > + /*
> > > +  * len is the length of the copy excluding the null.
> > > +  * This ensures we always have room for a null.
> > > +  */
> > > + *pos = '\0';
> > > +
> > > + return fixed;
> > > +}
> > > +
> > > +static char **user_event_argv_split(char *args, int *argc)
> > > +{
> > > + /* Count how many ';' without a trailing space */
> > > + int count = count_semis_no_space(args);
> > > +
> > > + if (count) {
> > 
> > nit: it is better to exit fast, so 
> > 
> > if (!count)
> > return argv_split(GFP_KERNEL, args, argc);
> > 
> > ...
> 
> Sure, will fix in a v2.
> 
> > 
> > Thank you,
> > 
> > OT: BTW, can this also simplify synthetic events?
> > 
> 
> I'm not sure, I'll check when I have some time. I want to get this fix
> in sooner rather than later.

Ah, nevermind. Synthetic 

Re: [PATCH 1/4] dt-bindings: iio: light: stk33xx: add regulator for vdd supply

2024-04-20 Thread Jonathan Cameron
On Sun, 14 Apr 2024 13:57:13 -0400
Aren Moynihan  wrote:

> Signed-off-by: Aren Moynihan 
> ---
>  Documentation/devicetree/bindings/iio/light/stk33xx.yaml | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/devicetree/bindings/iio/light/stk33xx.yaml 
> b/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
> index f6e22dc9814a..db35e239d4a8 100644
> --- a/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
> +++ b/Documentation/devicetree/bindings/iio/light/stk33xx.yaml
> @@ -29,6 +29,7 @@ properties:
>interrupts:
>  maxItems: 1
>  
> +  vdd-supply: true

As per review of patch 2, make it required and add one to the example.
That doesn't mean it will be in every dts file, just that we expect
it to exist given chips tend to work poorly without power.

>proximity-near-level: true
>  
>  required:




Re: [PATCH 3/4] iio: light: stk3310: log error if reading the chip id fails

2024-04-20 Thread Jonathan Cameron
On Mon, 15 Apr 2024 18:05:54 +0300
Andy Shevchenko  wrote:

> On Sun, Apr 14, 2024 at 8:57 PM Aren Moynihan  wrote:
> >
> > If the chip isn't powered, this call is likely to return an error.
> > Without a log here the driver will silently fail to probe. Common errors
> > are ENXIO (when the chip isn't powered) and ETIMEDOUT (when the i2c bus
> > isn't powered).  
> 
> > ret = regmap_read(data->regmap, STK3310_REG_ID, );
> > -   if (ret < 0)
> > +   if (ret < 0) {
> > +   dev_err(>dev, "failed to read chip id: %d", ret);
> > return ret;
> > +   }  
> 
> Briefly looking at the code it seems that this one is strictly part of
> the probe phase, which means we may use
> 
>   return dev_err_probe(...);
> 
> pattern. Yet, you may add another patch to clean up all of them:
> _probe(), _init(), _regmap_init() to use the same pattern everywhere.
> 

Yes, a precursor patch to use dev_err_probe() throughout the probe only
functions in this driver would be excellent.

Jonathan



Re: [PATCH v4 2/2] rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()

2024-04-20 Thread Google
On Fri, 19 Apr 2024 10:59:09 -0700
Andrii Nakryiko  wrote:

> On Thu, Apr 18, 2024 at 6:00 PM Masami Hiramatsu  wrote:
> >
> > On Thu, 18 Apr 2024 12:09:09 -0700
> > Andrii Nakryiko  wrote:
> >
> > > Take into account CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING when validating
> > > that RCU is watching when trying to setup rethooko on a function entry.
> > >
> > > One notable exception when we force rcu_is_watching() check is
> > > CONFIG_KPROBE_EVENTS_ON_NOTRACE=y case, in which case kretprobes will use
> > > old-style int3-based workflow instead of relying on ftrace, making RCU
> > > watching check important to validate.
> > >
> > > This further (in addition to improvements in the previous patch)
> > > improves BPF multi-kretprobe (which rely on rethook) runtime throughput
> > > by 2.3%, according to BPF benchmarks ([0]).
> > >
> > >   [0] 
> > > https://lore.kernel.org/bpf/caef4bzauq2wkmjzdc9s0rbwa01bybgwhn6andxqshyia47p...@mail.gmail.com/
> > >
> > > Signed-off-by: Andrii Nakryiko 
> >
> >
> > Thanks for update! This looks good to me.
> 
> Thanks, Masami! Will you take it through your tree, or you'd like to
> route it through bpf-next?

OK, let me take it through linux-trace tree.

Thank you!

> 
> >
> > Acked-by: Masami Hiramatsu (Google) 
> >
> > Thanks,
> >
> > > ---
> > >  kernel/trace/rethook.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> > > index fa03094e9e69..a974605ad7a5 100644
> > > --- a/kernel/trace/rethook.c
> > > +++ b/kernel/trace/rethook.c
> > > @@ -166,6 +166,7 @@ struct rethook_node *rethook_try_get(struct rethook 
> > > *rh)
> > >   if (unlikely(!handler))
> > >   return NULL;
> > >
> > > +#if defined(CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING) || 
> > > defined(CONFIG_KPROBE_EVENTS_ON_NOTRACE)
> > >   /*
> > >* This expects the caller will set up a rethook on a function 
> > > entry.
> > >* When the function returns, the rethook will eventually be 
> > > reclaimed
> > > @@ -174,6 +175,7 @@ struct rethook_node *rethook_try_get(struct rethook 
> > > *rh)
> > >*/
> > >   if (unlikely(!rcu_is_watching()))
> > >   return NULL;
> > > +#endif
> > >
> > >   return (struct rethook_node *)objpool_pop(>pool);
> > >  }
> > > --
> > > 2.43.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google) 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter

2024-04-20 Thread kernel test robot
Hi zhenwei,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.9-rc4 next-20240419]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/zhenwei-pi/virtio_balloon-introduce-oom-kill-invocations/20240418-142934
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git 
mm-everything
patch link:
https://lore.kernel.org/r/20240418062602.1291391-3-pizhenwei%40bytedance.com
patch subject: [PATCH 2/3] virtio_balloon: introduce memory allocation stall 
counter
config: i386-randconfig-141-20240421 
(https://download.01.org/0day-ci/archive/20240421/202404211106.b9pwufqk-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240421/202404211106.b9pwufqk-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404211106.b9pwufqk-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_balloon.c:324:18: warning: unused variable 'stall' 
>> [-Wunused-variable]
 324 | long available, stall = 0;
 | ^
   1 warning generated.


vim +/stall +324 drivers/virtio/virtio_balloon.c

   318  
   319  static unsigned int update_balloon_stats(struct virtio_balloon *vb)
   320  {
   321  unsigned long events[NR_VM_EVENT_ITEMS];
   322  struct sysinfo i;
   323  unsigned int idx = 0;
 > 324  long available, stall = 0;
   325  unsigned long caches;
   326  
   327  all_vm_events(events);
   328  si_meminfo();
   329  
   330  available = si_mem_available();
   331  caches = global_node_page_state(NR_FILE_PAGES);
   332  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH v9 5/9] clk: mmp: Add Marvell PXA1908 clock driver

2024-04-20 Thread Duje Mihanović

On 4/20/24 00:24, Stephen Boyd wrote:

Quoting Duje Mihanović (2024-04-19 07:31:14)

On Friday, April 12, 2024 4:57:09 AM GMT+2 Stephen Boyd wrote:

Quoting Duje Mihanović (2024-04-11 03:15:34)


On 4/11/2024 10:00 AM, Stephen Boyd wrote:

Is there a reason this file can't be a platform driver?


Not that I know of, I did it like this only because the other in-tree
MMP clk drivers do so. I guess the initialization should look like any
of the qcom GCC drivers then?


Yes.


With the entire clock driver code in one file this is quite messy as I also
needed to add module_init and module_exit functions to (un)register each
platform driver, presumably because the module_platform_driver macro doesn't
work with multiple platform drivers in one module. If I split up the driver
code for each clock controller block into its own file (such as clk-of-
pxa1908-apbc.c) as I believe is the best option, should the commits be split
up accordingly as well?


Sure. Why is 'of' in the name? Maybe that is unnecessary?


That seems to be a historical leftover from when Marvell was just adding 
DT support to the ARM32 MMP SoCs which Rob followed along with in the 
PXA1928 clk driver and so have I. Should I drop it then as Marvell has 
in the PXA1908 vendor kernel?


Regards,
--
Duje




Re: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging

2024-04-20 Thread Yu Zhao
On Fri, Apr 19, 2024 at 3:48 PM James Houghton  wrote:
>
> On Fri, Apr 19, 2024 at 2:07 PM David Matlack  wrote:
> >
> > On 2024-04-19 01:47 PM, James Houghton wrote:
> > > On Thu, Apr 11, 2024 at 10:28 AM David Matlack  
> > > wrote:
> > > > On 2024-04-11 10:08 AM, David Matlack wrote:
> > > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > > > {
> > > > bool young = false;
> > > >
> > > > if (!range->arg.metadata->bitmap && 
> > > > kvm_memslots_have_rmaps(kvm))
> > > > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> > > >
> > > > if (tdp_mmu_enabled)
> > > > young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> > > >
> > > > return young;
> > > > }
> > > >
> > > > bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > > > {
> > > > bool young = false;
> > > >
> > > > if (!range->arg.metadata->bitmap && 
> > > > kvm_memslots_have_rmaps(kvm))
> > > > young = kvm_handle_gfn_range(kvm, range, 
> > > > kvm_test_age_rmap);
> > > >
> > > > if (tdp_mmu_enabled)
> > > > young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
> > > >
> > > > return young;
> > >
> > >
> > > Yeah I think this is the right thing to do. Given your other
> > > suggestions (on patch 3), I think this will look something like this
> > > -- let me know if I've misunderstood something:
> > >
> > > bool check_rmap = !bitmap && kvm_memslot_have_rmaps(kvm);
> > >
> > > if (check_rmap)
> > >   KVM_MMU_LOCK(kvm);
> > >
> > > rcu_read_lock(); // perhaps only do this when we don't take the MMU lock?
> > >
> > > if (check_rmap)
> > >   kvm_handle_gfn_range(/* ... */ kvm_test_age_rmap)
> > >
> > > if (tdp_mmu_enabled)
> > >   kvm_tdp_mmu_test_age_gfn() // modified to be RCU-safe
> > >
> > > rcu_read_unlock();
> > > if (check_rmap)
> > >   KVM_MMU_UNLOCK(kvm);
> >
> > I was thinking a little different. If you follow my suggestion to first
> > make the TDP MMU aging lockless, you'll end up with something like this
> > prior to adding bitmap support (note: the comments are just for
> > demonstrative purposes):
> >
> > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > {
> > bool young = false;
> >
> > /* Shadow MMU aging holds write-lock. */
> > if (kvm_memslots_have_rmaps(kvm)) {
> > write_lock(>mmu_lock);
> > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> > write_unlock(>mmu_lock);
> > }
> >
> > /* TDM MMU aging is lockless. */
> > if (tdp_mmu_enabled)
> > young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> >
> > return young;
> > }
> >
> > Then when you add bitmap support it would look something like this:
> >
> > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > {
> > unsigned long *bitmap = range->arg.metadata->bitmap;
> > bool young = false;
> >
> > /* SHadow MMU aging holds write-lock and does not support bitmap. */
> > if (kvm_memslots_have_rmaps(kvm) && !bitmap) {
> > write_lock(>mmu_lock);
> > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> > write_unlock(>mmu_lock);
> > }
> >
> > /* TDM MMU aging is lockless and supports bitmap. */
> > if (tdp_mmu_enabled)
> > young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> >
> > return young;
> > }
> >
> > rcu_read_lock/unlock() would be called in kvm_tdp_mmu_age_gfn_range().
>
> Oh yes this is a lot better. I hope I would have seen this when it
> came time to actually update this patch. Thanks.
>
> >
> > That brings up a question I've been wondering about. If KVM only
> > advertises support for the bitmap lookaround when shadow roots are not
> > allocated, does that mean MGLRU will be blind to accesses made by L2
> > when nested virtualization is enabled? And does that mean the Linux MM
> > will think all L2 memory is cold (i.e. good candidate for swapping)
> > because it isn't seeing accesses made by L2?
>
> Yes, I think so (for both questions). That's better than KVM not
> participating in MGLRU aging at all, which is the case today (IIUC --
> also ignoring the case where KVM accesses guest memory directly). We
> could have MGLRU always invoke the mmu notifiers, but frequently
> taking the MMU lock for writing might be worse than evicting when we
> shouldn't. Maybe Yu tried this at some point, but I can't find any
> results for this.

No, in this case only the fast path (page table scanning) is disabled.
MGLRU still sees the A-bit from L2 using the rmap, i.e., the slow path
calling folio_check_references().