date:20240110

[GIT PULL] NVDIMM/NFIT changes for 6.8

2024-01-10 Thread Ira Weiny

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git tags/libnvdimm-for-6.8

... to get updates to the nvdimm tree.  They are a mix of bug fixes and updates
to interfaces used by nvdimm.

Updates to interfaces include:
- Use the new scope-based resource management
- Remove deprecated ida interfaces
- Update to sysfs_emit()

Also fix up kdoc comments.
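
For readers unfamiliar with scope-based resource management: the kernel's cleanup.h helpers build on the compiler's cleanup attribute. The following is a minimal userspace sketch of that idea, not code from this pull request; demo_free(), demo_auto_free and demo_copy_len() are hypothetical names, and malloc()/free() stand in for the kernel allocators.

```c
#include <stdlib.h>
#include <string.h>

/* Counts how often the cleanup handler ran; for demonstration only. */
static int demo_free_calls;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void demo_free(char **p)
{
	free(*p);
	demo_free_calls++;
}

/* Hypothetical userspace stand-in for a kernel-style auto-free annotation. */
#define demo_auto_free __attribute__((cleanup(demo_free)))

/* Returns the length of a temporary copy; the copy is freed on every exit path. */
static int demo_copy_len(const char *src)
{
	char *buf demo_auto_free = malloc(strlen(src) + 1);

	if (!buf)
		return -1;	/* handler still runs; free(NULL) is a no-op */
	strcpy(buf, src);
	return (int)strlen(buf);	/* no explicit free() needed */
}
```

Calling demo_copy_len("nvdimm") returns 6 and runs the handler exactly once, which is the property that lets error paths drop their goto-based unwinding.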

They have all been in -next more than 6 days with no reported issues.

---

The following changes since commit 610a9b8f49fbcf1100716370d3b5f6f884a2835a:

  Linux 6.7-rc8 (2023-12-31 12:51:25 -0800)

are available in the Git repository at:

  g...@gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm.git tags/libnvdimm-for-6.8

for you to fetch changes up to a085a5eb6594a3ebe5c275e9c2c2d341f686c23c:

  acpi/nfit: Use sysfs_emit() for all attributes (2024-01-03 12:21:37 -0800)


libnvdimm updates for v6.8

- updates to deprecated and changed interfaces
- use new cleanup.h features
- use new ida interface
- kdoc fixes


Christophe JAILLET (1):
  nvdimm: Remove usage of the deprecated ida_simple_xx() API

Dan Williams (1):
  acpi/nfit: Use sysfs_emit() for all attributes

Dinghao Liu (1):
  nvdimm-btt: simplify code with the scope based resource management

Michal Wilczynski (1):
  ACPI: NFIT: Use cleanup.h helpers instead of devm_*()

Randy Dunlap (3):
  nvdimm/btt: fix btt_blk_cleanup() kernel-doc
  nvdimm/dimm_devs: fix kernel-doc for function params
  nvdimm/namespace: fix kernel-doc for function params

 drivers/acpi/nfit/core.c        | 65 +++--
 drivers/nvdimm/btt.c            | 15 --
 drivers/nvdimm/btt_devs.c       |  6 ++--
 drivers/nvdimm/bus.c            |  4 +--
 drivers/nvdimm/dax_devs.c       |  4 +--
 drivers/nvdimm/dimm_devs.c      | 17 ---
 drivers/nvdimm/namespace_devs.c | 19 
 drivers/nvdimm/pfn_devs.c       |  4 +--
 8 files changed, 71 insertions(+), 63 deletions(-)

Re: [PATCH] driver/virtio: Add Memory Balloon Support for SEV/SEV-ES

2024-01-10 Thread Zheyun Shen

On Wed, Jan 10, 2024 at 4:01 PM Michael S. Tsirkin  wrote:
> Sorry I don't get what you are saying at all.
> Please format the commit log along the following lines:

> Currently .
> This is bad because ...
> To fix ...
> As a result ...

> No way I am going to spread CONFIG_AMD_MEM_ENCRYPT all over the place
> like this.

I will try to find out a solution with fewer macros and send patch V2
with a more perspicuous commit log.

On Thu, Jan 11, 2024 at 11:20 AM Jason Wang  wrote:

> > For now, SEV pins guest's memory to avoid swapping or
> > moving ciphertext, but leading to the inhibition of
> > Memory Ballooning.
> >
> > In Memory Ballooning, only guest's free pages will be relocated
> > in balloon inflation and deflation, so the difference of plaintext
> > doesn't matter to guest.

> This seems only true if the page is zeroed, is this true here?

Sorry, I cannot figure out why the pages should be zeroed. I think
both the host kernel and the guest kernel assume that pages are not
zeroed, and will use kzalloc() or zero them manually in real applications,
the same as in non-SEV environments.

I have tested this on SEV-ES: reclaiming memory by balloon inflation and
reusing it after balloon deflation both work well with the patch. The
hypervisor can give the reclaimed memory from one CVM to another as usual,
or give it back to the original CVM.

Re: [PATCH] driver/virtio: Add Memory Balloon Support for SEV/SEV-ES

2024-01-10 Thread Jason Wang

On Wed, Jan 10, 2024 at 2:23 PM Zheyun Shen  wrote:
>
> For now, SEV pins guest's memory to avoid swapping or
> moving ciphertext, but leading to the inhibition of
> Memory Ballooning.
>
> In Memory Ballooning, only guest's free pages will be relocated
> in balloon inflation and deflation, so the difference of plaintext
> doesn't matter to guest.

This seems only true if the page is zeroed, is this true here?

Thanks

Re: [PATCH v7 2/3] vduse: Temporarily fail if control queue feature requested

2024-01-10 Thread Jason Wang

On Tue, Jan 9, 2024 at 7:10 PM Maxime Coquelin
 wrote:
>
> Virtio-net driver control queue implementation is not safe
> when used with VDUSE. If the VDUSE application does not
> reply to control queue messages, it currently ends up
> hanging the kernel thread sending this command.
>
> Some work is on-going to make the control queue
> implementation robust with VDUSE. Until it is completed,
> let's fail features check if control-queue feature is
> requested.
>
> Signed-off-by: Maxime Coquelin 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index a5af6d4077b8..00f3f562ab5d 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -8,6 +8,7 @@
>   *
>   */
>
> +#include "linux/virtio_net.h"
>  #include 
>  #include 
>  #include 
> @@ -28,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include "iova_domain.h"
> @@ -1680,6 +1682,9 @@ static bool features_is_valid(struct vduse_dev_config *config)
> if ((config->device_id == VIRTIO_ID_BLOCK) &&
> (config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
> return false;
> +   else if ((config->device_id == VIRTIO_ID_NET) &&
> +   (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> +   return false;
>
> return true;
>  }
> --
> 2.43.0
>
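
The feature check in the hunk above can be tried out in isolation. This is a minimal userspace sketch under stated assumptions: struct demo_dev_config is a reduced, hypothetical stand-in for struct vduse_dev_config, only the two checks visible in the quoted hunk are reproduced, and the constants are copied from the virtio UAPI headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the virtio UAPI headers. */
#define VIRTIO_ID_NET            1
#define VIRTIO_ID_BLOCK          2
#define VIRTIO_BLK_F_CONFIG_WCE  11
#define VIRTIO_NET_F_CTRL_VQ     17

#define BIT_ULL(n) (1ULL << (n))

/* Reduced, hypothetical version of struct vduse_dev_config. */
struct demo_dev_config {
	uint32_t device_id;
	uint64_t features;
};

/* Reject feature bits that are not yet safe to offer for a device type. */
static bool demo_features_is_valid(const struct demo_dev_config *config)
{
	if (config->device_id == VIRTIO_ID_BLOCK &&
	    (config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
		return false;
	if (config->device_id == VIRTIO_ID_NET &&
	    (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
		return false;
	return true;
}
```

Note that the check is keyed on the device type: bit 17 only causes a rejection when the device is a net device, since feature bit numbers are reused across device types.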

Re: [RFC V1 08/13] vduse: flush workers on suspend

2024-01-10 Thread Jason Wang

On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  wrote:
>
> To pass ownership of a live vdpa device to a new process, the user
> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
> mm.  Flush workers in suspend to guarantee that no worker sees the new
> mm and old VA in between.
>
> Signed-off-by: Steve Sistare 

It seems we need a better title, probably "suspend support for vduse"?
And it looks better to be a separate patch.

Thanks

Re: [RFC V1 07/13] vhost-vdpa: flush workers on suspend

2024-01-10 Thread Jason Wang

On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  wrote:
>
> To pass ownership of a live vdpa device to a new process, the user
> suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
> VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
> mm.  Flush workers in suspend to guarantee that no worker sees the new
> mm and old VA in between.
>
> Signed-off-by: Steve Sistare 
> ---
>  drivers/vhost/vdpa.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 8fe1562d24af..9673e8e20d11 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -591,10 +591,14 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
>  {
> struct vdpa_device *vdpa = v->vdpa;
> const struct vdpa_config_ops *ops = vdpa->config;
> +   struct vhost_dev *vdev = &v->vdev;
>
> if (!ops->suspend)
> return -EOPNOTSUPP;
>
> +   if (vdev->use_worker)
> +   vhost_dev_flush(vdev);

It looks to me like it's better to check use_worker in vhost_dev_flush.

Thanks


> +
> return ops->suspend(vdpa);
>  }
>
> --
> 2.39.3
>
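
The review suggestion above (test use_worker inside the flush helper rather than at each call site) can be sketched as follows. This is an illustration only: struct demo_dev and both functions are hypothetical, and whether the check fits the real vhost_dev_flush() is for the patch author to judge.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct vhost_dev. */
struct demo_dev {
	bool use_worker;
	int flush_count;	/* counts real flushes, for demonstration */
};

/* The callee decides whether there is anything to flush. */
static void demo_dev_flush(struct demo_dev *dev)
{
	if (!dev->use_worker)
		return;
	dev->flush_count++;	/* stands in for waiting on queued work */
}

/* The suspend path no longer needs its own use_worker test. */
static int demo_suspend(struct demo_dev *dev)
{
	demo_dev_flush(dev);
	return 0;
}
```

With the test in the callee, every existing and future caller gets the same behaviour without repeating the condition.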

Re: [RFC V1 05/13] vhost-vdpa: VHOST_IOTLB_REMAP

2024-01-10 Thread Jason Wang

On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  wrote:
>
> When device ownership is passed to a new process via VHOST_NEW_OWNER,
> some devices need to know the new userland addresses of the dma mappings.
> Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr
> of a mapping.  The new uaddr must address the same memory object as
> originally mapped.
>
> The user must suspend the device before the old address is invalidated,
> and cannot resume it until after VHOST_IOTLB_REMAP is called, but this
> requirement is not enforced by the API.
>
> Signed-off-by: Steve Sistare 
> ---
>  drivers/vhost/vdpa.c             | 34 
>  include/uapi/linux/vhost_types.h | 11 ++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index faed6471934a..ec5ca20bd47d 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>
>  }
>
> +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v,
> + struct vhost_iotlb *iotlb,
> + struct vhost_iotlb_msg *msg)
> +{
> +   struct vdpa_device *vdpa = v->vdpa;
> +   const struct vdpa_config_ops *ops = vdpa->config;
> +   u32 asid = iotlb_to_asid(iotlb);
> +   u64 start = msg->iova;
> +   u64 last = start + msg->size - 1;
> +   struct vhost_iotlb_map *map;
> +   int r = 0;
> +
> +   if (msg->perm || !msg->size)
> +   return -EINVAL;
> +
> +   map = vhost_iotlb_itree_first(iotlb, start, last);
> +   if (!map)
> +   return -ENOENT;
> +
> +   if (map->start != start || map->last != last)
> +   return -EINVAL;
> +
> +   /* batch will finish with remap.  non-batch must do it now. */
> +   if (!v->in_batch)
> +   r = ops->set_map(vdpa, asid, iotlb);
> +   if (!r)
> +   map->addr = msg->uaddr;

I may be missing something, but for example, for PA mapping:

1) need to convert uaddr into phys addr
2) need to check whether the uaddr is backed by the same page or not?

Thanks

> +
> +   return r;
> +}
> +
>  static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>struct vhost_iotlb *iotlb,
>struct vhost_iotlb_msg *msg)
> @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
> ops->set_map(vdpa, asid, iotlb);
> v->in_batch = false;
> break;
> +   case VHOST_IOTLB_REMAP:
> +   r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg);
> +   break;
> default:
> r = -EINVAL;
> break;
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index 9177843951e9..35908315ff55 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -79,7 +79,7 @@ struct vhost_iotlb_msg {
>  /*
>   * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying
>   * multiple mappings in one go: beginning with
> - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of
> + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or
>   * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END.
>   * When one of these two values is used as the message type, the rest
>   * of the fields in the message are ignored. There's no guarantee that
> @@ -87,6 +87,15 @@ struct vhost_iotlb_msg {
>   */
>  #define VHOST_IOTLB_BATCH_BEGIN 5
>  #define VHOST_IOTLB_BATCH_END   6
> +
> +/*
> + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova.
> + * The new uaddr must address the same memory object as originally mapped.
> + * Failure to do so will result in user memory corruption and/or device
> + * misbehavior.  iova and size must match the arguments used to create the
> + * existing mapping.  Protection is not changed, and perm must be 0.
> + */
> +#define VHOST_IOTLB_REMAP  7
> __u8 type;
>  };
>
> --
> 2.39.3
>
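
The validation rules of the proposed remap handler (perm must be 0, size must be non-zero, and the range must match one existing mapping exactly) can be exercised with a small userspace model. This is a sketch under assumptions: struct demo_map and the demo_* functions are hypothetical, a sorted array with a linear scan replaces the kernel's interval tree, and the set_map() call to the device is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for struct vhost_iotlb_map. */
struct demo_map {
	uint64_t start;	/* first iova covered */
	uint64_t last;	/* last iova covered (inclusive) */
	uint64_t addr;	/* userland address backing the range */
};

/* First mapping overlapping [start, last]; maps[] is assumed sorted by start. */
static struct demo_map *demo_first(struct demo_map *maps, size_t n,
				   uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].start <= last && maps[i].last >= start)
			return &maps[i];
	return NULL;
}

/* Update the uaddr of an existing mapping; mirrors the checks in the patch. */
static int demo_remap(struct demo_map *maps, size_t n, uint64_t iova,
		      uint64_t size, uint8_t perm, uint64_t uaddr)
{
	uint64_t last = iova + size - 1;
	struct demo_map *map;

	if (perm || !size)
		return -EINVAL;
	map = demo_first(maps, n, iova, last);
	if (!map)
		return -ENOENT;
	if (map->start != iova || map->last != last)
		return -EINVAL;	/* partial or spanning remaps are rejected */
	map->addr = uaddr;
	return 0;
}
```

The model shows what the handler does check (range identity) and, by omission, what the review comment asks about: nothing here verifies that the new uaddr is backed by the same pages as the old one.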

Re: [RFC V1 00/13] vdpa live update

2024-01-10 Thread Jason Wang

On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare  wrote:
>
> Live update is a technique wherein an application saves its state, exec's
> to an updated version of itself, and restores its state.  Clients of the
> application experience a brief suspension of service, on the order of
> 100's of milliseconds, but are otherwise unaffected.
>
> Define and implement interfaces that allow vdpa devices to be preserved
> across fork or exec, to support live update for applications such as qemu.
> The device must be suspended during the update, but its dma mappings are
> preserved, so the suspension is brief.
>
> The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
> accounting from one process to another.
>
> The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
> VHOST_NEW_OWNER is supported.
>
> The VHOST_IOTLB_REMAP message type updates a dma mapping with its userland
> address in the new process.
>
> The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
> VHOST_IOTLB_REMAP is supported and required.  Some devices do not
> require it, because the userland address of each dma mapping is discarded
> after being translated to a physical address.
>
> Here is a pseudo-code sequence for performing live update, based on
> suspend + reset because resume is not yet available.  The vdpa device
> descriptor, fd, remains open across the exec.
>
>   ioctl(fd, VHOST_VDPA_SUSPEND)
>   ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
>   exec

Is there a userspace implementation as a reference?

>
>   ioctl(fd, VHOST_NEW_OWNER)
>
>   issue ioctls to re-create vrings
>
>   if VHOST_BACKEND_F_IOTLB_REMAP
>   foreach dma mapping
>   write(fd, {VHOST_IOTLB_REMAP, new_addr})

I think I need to understand the advantages of this approach. For
example, why it is better than

ioctl(VHOST_RESET_OWNER)
exec

ioctl(VHOST_SET_OWNER)

for each dma mapping
 ioctl(VHOST_IOTLB_UPDATE)

Thanks

>
>   ioctl(fd, VHOST_VDPA_SET_STATUS,
> ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)
>
>
> Steve Sistare (13):
>   vhost-vdpa: count pinned memory
>   vhost-vdpa: pass mm to bind
>   vhost-vdpa: VHOST_NEW_OWNER
>   vhost-vdpa: VHOST_BACKEND_F_NEW_OWNER
>   vhost-vdpa: VHOST_IOTLB_REMAP
>   vhost-vdpa: VHOST_BACKEND_F_IOTLB_REMAP
>   vhost-vdpa: flush workers on suspend
>   vduse: flush workers on suspend
>   vdpa_sim: reset must not run
>   vdpa_sim: flush workers on suspend
>   vdpa/mlx5: new owner capability
>   vdpa_sim: new owner capability
>   vduse: new owner capability
>
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  |   3 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c   |  24 ++-
>  drivers/vdpa/vdpa_user/vduse_dev.c |  32 +
>  drivers/vhost/vdpa.c               | 101 +++--
>  drivers/vhost/vhost.c              |  15 +
>  drivers/vhost/vhost.h              |   1 +
>  include/uapi/linux/vhost.h         |  10 +++
>  include/uapi/linux/vhost_types.h   |  15 -
>  8 files changed, 191 insertions(+), 10 deletions(-)
>
> --
> 2.39.3
>
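
The cover letter relies on the vdpa descriptor staying open across exec. One standard way for a process to arrange that is to clear the close-on-exec flag before calling exec. The sketch below shows only that step using the POSIX fcntl() interface; demo_keep_across_exec() is a hypothetical helper, and how qemu or any other user of this series actually handles the descriptor is not stated in the thread.

```c
#include <fcntl.h>

/* Clear FD_CLOEXEC so that fd stays open in the image started by exec. */
static int demo_keep_across_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

The new program image then has to learn the descriptor number by some out-of-band means, for example a command-line argument or an environment variable.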

Re: [GIT PULL] Modules changes for v6.8-rc1

2024-01-10 Thread pr-tracker-bot

The pull request you sent on Tue, 9 Jan 2024 06:46:41 -0800:

> git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ tags/modules-6.8-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4cd083d53108b32f4c8ed92a3f85d7b36133c0c9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

Re: [PATCH v5 11/34] function_graph: Have the instances use their own ftrace_ops for filtering

2024-01-10 Thread Google

Hi Mark,

Thanks for the investigation.

On Mon, 8 Jan 2024 12:25:55 +
Mark Rutland  wrote:

> Hi,
> 
> There's a bit more of an info-dump below; I'll go try to dump the fgraph shadow
> stack so that we can analyse this in more detail.
> 
> On Mon, Jan 08, 2024 at 10:14:36AM +0900, Masami Hiramatsu wrote:
> > On Fri, 5 Jan 2024 17:09:10 +
> > Mark Rutland  wrote:
> > 
> > > On Mon, Dec 18, 2023 at 10:13:46PM +0900, Masami Hiramatsu (Google) wrote:
> > > > From: Steven Rostedt (VMware) 
> > > > 
> > > > Allow for instances to have their own ftrace_ops part of the fgraph_ops
> > > > that makes the function_graph tracer filter on the set_ftrace_filter file
> > > > of the instance and not the top instance.
> > > > 
> > > > This also changes how the function_graph handles multiple instances on the
> > > > shadow stack. Previously we use ARRAY type entries to record which one
> > > > is enabled, and this makes it a bitmap of the fgraph_array's indexes.
> > > > Previous function_graph_enter() expects calling back from
> > > > prepare_ftrace_return() function which is called back only once if it is
> > > > enabled. But this introduces different ftrace_ops for each fgraph
> > > > instance and those are called from ftrace_graph_func() one by one. Thus
> > > > we can not loop on the fgraph_array(), and need to reuse the ret_stack
> > > > pushed by the previous instance. Finding the ret_stack is easy because
> > > > we can check the ret_stack->func. But that is not enough for the self-
> > > > recursive tail-call case. Thus fgraph uses the bitmap entry to find it
> > > > is already set (this means that entry is for previous tail call).
> > > > 
> > > > Signed-off-by: Steven Rostedt (VMware) 
> > > > Signed-off-by: Masami Hiramatsu (Google) 
> > > 
> > > As a heads-up, while testing the topic/fprobe-on-fgraph branch on arm64, I get
> > > a warning which bisects down to this commit:
> > 
> > Hmm, so does this happen when enabling function graph tracer?
> 
> Yes; I see it during the function_graph boot-time self-test if I also enable
> CONFIG_IRQSOFF_TRACER=y. I can also trigger it regardless of
> CONFIG_IRQSOFF_TRACER if I cat /proc/self/stack with the function_graph tracer
> enabled (note that I hacked the unwinder to continue after failing to recover a
> return address):
> 
> | # mount -t tracefs none /sys/kernel/tracing/
> | # echo function_graph > /sys/kernel/tracing/current_tracer
> | # cat /proc/self/stack
> | [   37.469980] [ cut here ]
> | [   37.471503] WARNING: CPU: 2 PID: 174 at arch/arm64/kernel/stacktrace.c:84 arch_stack_walk+0x2d8/0x338
> | [   37.474381] Modules linked in:
> | [   37.475501] CPU: 2 PID: 174 Comm: cat Not tainted 6.7.0-rc2-00026-gea1e68a341c2-dirty #15
> | [   37.478133] Hardware name: linux,dummy-virt (DT)
> | [   37.479670] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> | [   37.481923] pc : arch_stack_walk+0x2d8/0x338
> | [   37.483373] lr : arch_stack_walk+0x1bc/0x338
> | [   37.484818] sp : 8000835f3a90
> | [   37.485974] x29: 8000835f3a90 x28: 8000835f3b80 x27: 8000835f3b38
> | [   37.488405] x26: 04341e00 x25: 8000835f4000 x24: 80008002df18
> | [   37.490842] x23: 80008002df18 x22: 8000835f3b60 x21: 80008015d240
> | [   37.493269] x20: 8000835f3b50 x19: 8000835f3b40 x18: 
> | [   37.495704] x17:  x16:  x15: 
> | [   37.498144] x14:  x13:  x12: 
> | [   37.500579] x11: 800082b4d920 x10: 8000835f3a70 x9 : 8000800e55a0
> | [   37.503021] x8 : 80008002df18 x7 : 04341e00 x6 : 
> | [   37.505452] x5 :  x4 : 8000835f3e48 x3 : 8000835f3b80
> | [   37.507888] x2 : 80008002df18 x1 : 07f7b000 x0 : 80008002df18
> | [   37.510319] Call trace:
> | [   37.511202]  arch_stack_walk+0x2d8/0x338
> | [   37.512541]  stack_trace_save_tsk+0x90/0x110
> | [   37.514012]  return_to_handler+0x0/0x48
> | [   37.515336]  return_to_handler+0x0/0x48
> | [   37.516657]  return_to_handler+0x0/0x48
> | [   37.517985]  return_to_handler+0x0/0x48
> | [   37.519305]  return_to_handler+0x0/0x48
> | [   37.520623]  return_to_handler+0x0/0x48
> | [   37.521957]  return_to_handler+0x0/0x48
> | [   37.523272]  return_to_handler+0x0/0x48
> | [   37.524595]  return_to_handler+0x0/0x48
> | [   37.525931]  return_to_handler+0x0/0x48
> | [   37.527254]  return_to_handler+0x0/0x48
> | [   37.528564]  el0t_64_sync_handler+0x120/0x130
> | [   37.530046]  el0t_64_sync+0x190/0x198
> | [   37.531310] ---[ end trace  ]---
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
> | [<0>] ftrace_stub_graph+0x8/0x8
>

Re: [PATCH net-next v3 2/3] net: introduce abstraction for network memory

2024-01-10 Thread Jakub Kicinski

On Wed, 10 Jan 2024 09:50:08 -0800 Shakeel Butt wrote:
> On Thu, Jan 4, 2024 at 1:44 PM Jakub Kicinski  wrote:
> > You seem to be trying hard to make struct netmem a thing.
> > Perhaps you have a reason I'm not getting?  
> 
> Mina already went with your suggestion and that is fine. To me, struct
> netmem is more aesthetically aligned with the existing struct
> encoded_page approach, but I don't have a strong opinion one way or
> the other. However it seems like you have a stronger preference for
> __bitwise approach. Is there a technical reason or just aesthetic?

Yes, right above the text you quoted:

  The __bitwise annotation will make catching people trying
  to cast to struct page * trivial.

https://lore.kernel.org/all/20240104134424.399fe...@kernel.org/
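
To illustrate the point about __bitwise: the annotation gives a plain integer type a distinct identity under sparse, so conversions have to go through explicit helpers. The following is a minimal sketch, not the code from the series; demo_netmem_t, struct demo_page and the DEMO_* macros are hypothetical names, and the attributes only take effect when the file is run through sparse (which defines __CHECKER__).

```c
#include <stdint.h>

/* Under sparse the attributes are active; ordinary compilers see plain integers. */
#ifdef __CHECKER__
#define DEMO_BITWISE __attribute__((bitwise))
#define DEMO_FORCE   __attribute__((force))
#else
#define DEMO_BITWISE
#define DEMO_FORCE
#endif

/* Hypothetical stand-in for struct page. */
struct demo_page {
	int refcount;
};

/* Hypothetical opaque handle type; the name is illustrative only. */
typedef uintptr_t DEMO_BITWISE demo_netmem_t;

/* The only sanctioned conversions go through these two helpers. */
static inline demo_netmem_t demo_page_to_netmem(struct demo_page *page)
{
	return (DEMO_FORCE demo_netmem_t)(uintptr_t)page;
}

static inline struct demo_page *demo_netmem_to_page(demo_netmem_t netmem)
{
	return (struct demo_page *)(DEMO_FORCE uintptr_t)netmem;
}
```

A cast from the handle type that does not use the force annotation is what sparse is expected to flag, which is what makes stray conversions to a page pointer easy to find.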

[PATCH] mm: Update mark_victim tracepoints fields

2024-01-10 Thread Carlos Galo

The current implementation of the mark_victim tracepoint provides only
the process ID (pid) of the victim process. This limitation poses
challenges for userspace tools that need additional information
about the OOM victim. The association between pid and the additional
data may be lost after the kill, making it difficult for userspace to
correlate the OOM event with the specific process.

In order to mitigate this limitation, add the following fields:

- UID
   In Android each installed application has a unique UID. Including
   the `uid` assists in correlating OOM events with specific apps.

- Process Name (comm)
   Enables identification of the affected process.

- OOM Score
   Gives userspace additional insight into the relative kill
   priority of the OOM victim.

Cc: Steven Rostedt 
Cc: Andrew Morton 
Cc: Suren Baghdasaryan 
Signed-off-by: Carlos Galo 
---
 include/trace/events/oom.h | 19 +++
 mm/oom_kill.c              |  6 +-
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h
index 26a11e4a2c36..fb8a5d1b8a0a 100644
--- a/include/trace/events/oom.h
+++ b/include/trace/events/oom.h
@@ -72,19 +72,30 @@ TRACE_EVENT(reclaim_retry_zone,
 );
 
 TRACE_EVENT(mark_victim,
-   TP_PROTO(int pid),
+   TP_PROTO(struct task_struct *task, uid_t uid),
 
-   TP_ARGS(pid),
+   TP_ARGS(task, uid),
 
TP_STRUCT__entry(
__field(int, pid)
+   __field(uid_t, uid)
+   __string(comm, task->comm)
+   __field(short, oom_score_adj)
),
 
TP_fast_assign(
-   __entry->pid = pid;
+   __entry->pid = task->pid;
+   __entry->uid = uid;
+   __assign_str(comm, task->comm);
+   __entry->oom_score_adj = task->signal->oom_score_adj;
),
 
-   TP_printk("pid=%d", __entry->pid)
+   TP_printk("pid=%d uid=%u comm=%s oom_score_adj=%hd",
+   __entry->pid,
+   __entry->uid,
+   __get_str(comm),
+   __entry->oom_score_adj
+   )
 );
 
 TRACE_EVENT(wake_reaper,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e6071fde34a..0698c00c5da6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "internal.h"
@@ -753,6 +754,7 @@ static inline void queue_oom_reaper(struct task_struct *tsk)
  */
 static void mark_oom_victim(struct task_struct *tsk)
 {
+   const struct cred *cred;
struct mm_struct *mm = tsk->mm;
 
WARN_ON(oom_killer_disabled);
@@ -772,7 +774,9 @@ static void mark_oom_victim(struct task_struct *tsk)
 */
__thaw_task(tsk);
atomic_inc(&oom_victims);
-   trace_mark_victim(tsk->pid);
+   cred = get_task_cred(tsk);
+   trace_mark_victim(tsk, cred->uid.val);
+   put_cred(cred);
 }
 
 /**

base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
-- 
2.43.0.275.g3460e3d667-goog
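
As an illustration of how a userspace tool might consume the extended event, here is a minimal parser for the payload produced by the TP_printk() format in the patch. It is a sketch under assumptions: struct demo_victim and demo_parse_mark_victim() are hypothetical, the input is assumed to be only the part of the trace line after the event name, and a comm containing whitespace would need a more careful parser than sscanf()'s %s.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one mark_victim event. */
struct demo_victim {
	int pid;
	unsigned int uid;
	char comm[16];	/* TASK_COMM_LEN is 16 in the kernel */
	short oom_score_adj;
};

/* Returns 0 when all four fields were parsed, -1 otherwise. */
static int demo_parse_mark_victim(const char *line, struct demo_victim *v)
{
	int n = sscanf(line, "pid=%d uid=%u comm=%15s oom_score_adj=%hd",
		       &v->pid, &v->uid, v->comm, &v->oom_score_adj);

	return n == 4 ? 0 : -1;
}
```

Because uid and comm are captured at kill time, the tool no longer has to look them up through /proc after the victim may already be gone.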

Re: [PATCH] vfio: fix virtio-pci dependency

2024-01-10 Thread Alex Williamson

On Tue,  9 Jan 2024 08:57:19 +0100
Arnd Bergmann  wrote:

> From: Arnd Bergmann 
> 
> The new vfio-virtio driver already has a dependency on VIRTIO_PCI_ADMIN_LEGACY,
> but that is a bool symbol and allows vfio-virtio to be built-in even if
> virtio-pci itself is a loadable module. This leads to a link failure:
> 
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_probe':
> main.c:(.text+0xec): undefined reference to `virtio_pci_admin_has_legacy_io'
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_init_device':
> main.c:(.text+0x260): undefined reference to `virtio_pci_admin_legacy_io_notify_info'
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_bar0_rw':
> main.c:(.text+0x6ec): undefined reference to `virtio_pci_admin_legacy_common_io_read'
> aarch64-linux-ld: main.c:(.text+0x6f4): undefined reference to `virtio_pci_admin_legacy_device_io_read'
> aarch64-linux-ld: main.c:(.text+0x7f0): undefined reference to `virtio_pci_admin_legacy_common_io_write'
> aarch64-linux-ld: main.c:(.text+0x7f8): undefined reference to `virtio_pci_admin_legacy_device_io_write'
> 
> Add another explicit dependency on the tristate symbol.
> 
> Fixes: eb61eca0e8c3 ("vfio/virtio: Introduce a vfio driver over virtio devices")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/vfio/pci/virtio/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> index fc3a0be9d8d4..bd80eca4a196 100644
> --- a/drivers/vfio/pci/virtio/Kconfig
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config VIRTIO_VFIO_PCI
>  tristate "VFIO support for VIRTIO NET PCI devices"
> -depends on VIRTIO_PCI_ADMIN_LEGACY
> +depends on VIRTIO_PCI && VIRTIO_PCI_ADMIN_LEGACY
>  select VFIO_PCI_CORE
>  help
>This provides support for exposing VIRTIO NET VF devices which support

Applied to vfio next branch for v6.8.  Thanks!

Alex
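
Why the extra dependency fixes the link failure can be shown with a small model of Kconfig's tristate logic, in which "A && B" evaluates to min(A, B) and a dependency sets the upper limit of a tristate symbol. This is an illustrative sketch only; the enum and both functions are hypothetical and model just the one rule discussed here.

```c
#include <stdbool.h>

/* Kconfig tristate values, ordered n < m < y. */
enum demo_tristate { DEMO_N = 0, DEMO_M = 1, DEMO_Y = 2 };

/* Kconfig evaluates "A && B" as min(A, B). */
static enum demo_tristate demo_and(enum demo_tristate a, enum demo_tristate b)
{
	return a < b ? a : b;
}

/* A user of exported symbols must not be "more built-in" than its provider. */
static bool demo_links(enum demo_tristate user, enum demo_tristate provider)
{
	return user <= provider;
}
```

With VIRTIO_PCI=m and the bool VIRTIO_PCI_ADMIN_LEGACY=y, the old dependency allowed VIRTIO_VFIO_PCI=y, for which demo_links(DEMO_Y, DEMO_M) is false. The new expression caps it at demo_and(DEMO_M, DEMO_Y), that is m, which links.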

Re: [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw

2024-01-10 Thread Bernd Schubert





On 1/10/24 02:16, Hou Tao wrote:

Hi,

On 1/9/2024 9:11 PM, Bernd Schubert wrote:



On 1/3/24 11:59, Hou Tao wrote:

From: Hou Tao 

When trying to insert a 10MB kernel module kept in a virtiofs with cache
disabled, the following warning was reported:

    [ cut here ]
    WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ..
    Modules linked in:
    CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ..
    RIP: 0010:__alloc_pages+0x2c4/0x360
    ..
    Call Trace:
     
     ? __warn+0x8f/0x150
     ? __alloc_pages+0x2c4/0x360
     __kmalloc_large_node+0x86/0x160
     __kmalloc+0xcd/0x140
     virtio_fs_enqueue_req+0x240/0x6d0
     virtio_fs_wake_pending_and_unlock+0x7f/0x190
     queue_request_and_unlock+0x58/0x70
     fuse_simple_request+0x18b/0x2e0
     fuse_direct_io+0x58a/0x850
     fuse_file_read_iter+0xdb/0x130
     __kernel_read+0xf3/0x260
     kernel_read+0x45/0x60
     kernel_read_file+0x1ad/0x2b0
     init_module_from_file+0x6a/0xe0
     idempotent_init_module+0x179/0x230
     __x64_sys_finit_module+0x5d/0xb0
     do_syscall_64+0x36/0xb0
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
     ..
     
    ---[ end trace  ]---

The warning happened as follows. In copy_args_to_argbuf(), virtiofs uses
kmalloc-ed memory as bound buffer for fuse args, but
fuse_get_user_pages() only limits the length of fuse arg by max_read or
max_write for ITER_KVEC io (e.g., kernel_read_file from finit_module()).
For virtiofs, max_read is UINT_MAX, so a big read request which is about



I find this part of the explanation a bit confusing. I guess you
wanted to write something like

fuse_direct_io() -> fuse_get_user_pages() is limited by
fc->max_write/fc->max_read and fc->max_pages. For virtiofs max_pages
does not apply as ITER_KVEC is used. As virtiofs sets fc->max_read to
UINT_MAX basically no limit is applied at all.


Yes, what you said is just as expected but it is not the root cause of
the warning. The culprit of the warning is kmalloc() in
copy_args_to_argbuf() just as said in commit message. vmalloc() is also
not acceptable, because the physical memory needs to be contiguous. For
the problem, because there is no page involved, so there will be extra
sg available, maybe we can use these sg to break the big read/write
request into page.


Hmm ok, I was hoping that contiguous memory is not needed.
I see that ENOMEM is handled, but how does that perform (or even 
complete) on a really badly fragmented system? I guess splitting into 
smaller pages or at least adding some reserve kmem_cache (or even 
mempool) would make sense?




I also wonder whether it would make sense to set a sensible limit in
virtio_fs_ctx_set_defaults() instead of introducing a new variable?


As said in the commit message:

A feasible solution is to limit the value of max_read for virtiofs, so
the length passed to kmalloc() will be limited. However it will affect
the max read size for ITER_IOVEC io and the value of max_write also needs
limitation.

It is a bit hard to set a reasonable value for both max_read and
max_write to handle both normal ITER_IOVEC io and ITER_KVEC io. And
considering ITER_KVEC io + dio case is uncommon, I think using a new
limitation is more reasonable.


For ITER_IOVEC max_pages applies - which is limited to 
FUSE_MAX_MAX_PAGES - why can't this be used in virtio_fs_ctx_set_defaults?


@Miklos, is there a reason why there is no upper fc->max_{read,write} 
limit in process_init_reply()? Shouldn't both be limited to

(FUSE_MAX_MAX_PAGES * PAGE_SIZE). Or any other reasonable limit?


Thanks,
Bernd





Also, I guess the issue is kmalloc_array() in virtio_fs_enqueue_req?
Wouldn't it make sense to use kvmalloc_array()/kvfree() in that function?


Thanks,
Bernd



10MB is passed to copy_args_to_argbuf(), kmalloc() is called in turn
with len=10MB, and triggers the warning in __alloc_pages():
WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp)).

A feasible solution is to limit the value of max_read for virtiofs, so
the length passed to kmalloc() will be limited. However it will affect
the max read size for ITER_IOVEC io and the value of max_write also needs
limitation. So instead of limiting the values of max_read and max_write,
introduce max_nopage_rw to cap both the values of max_read and
max_write when the fuse dio read/write request is initiated from kernel.

Considering that fuse read/write request from kernel is uncommon and to
decrease the demand for large contiguous pages, set max_nopage_rw as
256KB instead of KMALLOC_MAX_SIZE - 4096 or similar.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao 
---
   fs/fuse/file.c  | 12 +++-
   fs/fuse/fuse_i.h    |  3 +++
   fs/fuse/inode.c |  1 +
   fs/fuse/virtio_fs.c |  6 ++
   4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a660f1f21540..f1beb7c0b782 100644
---
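
The arithmetic behind the warning can be checked with a short sketch: a 10MB request needs a contiguous allocation of an order above the page allocator's limit, while the proposed 256KB cap stays well below it. The helper names are hypothetical, and both constants are assumptions about the reporter's configuration (4 KiB pages and MAX_ORDER of 10); other configurations differ.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define DEMO_MAX_ORDER  10	/* assumption: limit in the reported config */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int demo_get_order(size_t size)
{
	size_t pages = (size + ((size_t)1 << DEMO_PAGE_SHIFT) - 1) >> DEMO_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Mirrors the "order > MAX_ORDER" condition quoted in the commit message. */
static int demo_would_warn(size_t size)
{
	return demo_get_order(size) > DEMO_MAX_ORDER;
}
```

Under these assumptions 10MB is 2560 pages and rounds up to order 12, which trips the check, whereas 256KB is 64 pages, or order 6.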

Re: [RFC v1 0/8] vhost-vdpa: add support for iommufd

2024-01-10 Thread Michael S. Tsirkin

On Sat, Nov 04, 2023 at 01:16:33AM +0800, Cindy Lu wrote:
> 
> Hi All
> This code provides iommufd support for vdpa devices.
> It fixes the bugs from the last version and also adds asid support.
> Rebased on kernel v6.6-rc3.
> Tests passed on the physical device (vp_vdpa), but there are still some
> problems with the emulated device (vdpa_sim_net);
> I will continue working on it.
> 
> The kernel code is
> https://gitlab.com/lulu6/vhost/-/tree/iommufdRFC_v1
> 
> Signed-off-by: Cindy Lu 

Was this abandoned?

> 
> Cindy Lu (8):
>   vhost/iommufd: Add the functions support iommufd
>   Kconfig: Add the new file vhost/iommufd
>   vhost: Add 3 new uapi to support iommufd
>   vdpa: Add new vdpa_config_ops to support iommufd
>   vdpa_sim :Add support for iommufd
>   vdpa: change the map/unmap process to support iommufd
>   vp_vdpa::Add support for iommufd
>   iommu: expose the function iommu_device_use_default_domain
> 
>  drivers/iommu/iommu.c             |   2 +
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   8 ++
>  drivers/vdpa/virtio_pci/vp_vdpa.c |   4 +
>  drivers/vhost/Kconfig             |   1 +
>  drivers/vhost/Makefile            |   1 +
>  drivers/vhost/iommufd.c           | 178 +
>  drivers/vhost/vdpa.c              | 210 +-
>  drivers/vhost/vhost.h             |  21 +++
>  include/linux/vdpa.h              |  38 +-
>  include/uapi/linux/vhost.h        |  66 ++
>  10 files changed, 525 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/vhost/iommufd.c
> 
> -- 
> 2.34.3

Re: [RFC V1 01/13] vhost-vdpa: count pinned memory

2024-01-10 Thread Michael S. Tsirkin

On Wed, Jan 10, 2024 at 12:40:03PM -0800, Steve Sistare wrote:
> Remember the count of pinned memory for the device.
> 
> Signed-off-by: Steve Sistare 

Can we have iommufd support in vdpa so we do not keep extending these hacks?


> ---
>  drivers/vhost/vdpa.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index da7ec77cdaff..10fb95bcca1a 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -59,6 +59,7 @@ struct vhost_vdpa {
>   int in_batch;
>   struct vdpa_iova_range range;
>   u32 batch_asid;
> + long pinned_vm;
>  };
>  
>  static DEFINE_IDA(vhost_vdpa_ida);
> @@ -893,6 +894,7 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
>   unpin_user_page(page);
>   }
>   atomic64_sub(PFN_DOWN(map->size), >mm->pinned_vm);
> + v->pinned_vm -= PFN_DOWN(map->size);
>   vhost_vdpa_general_unmap(v, map, asid);
>   vhost_iotlb_map_free(iotlb, map);
>   }
> @@ -975,9 +977,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
>   return r;
>   }
>  
> - if (!vdpa->use_va)
> + if (!vdpa->use_va) {
>   atomic64_add(PFN_DOWN(size), >mm->pinned_vm);
> -
> + v->pinned_vm += PFN_DOWN(size);
> + }
>   return 0;
>  }
>  
> -- 
> 2.39.3

[PATCH v9 3/3] remoteproc: zynqmp: parse TCM from device tree

2024-01-10 Thread Tanmay Shah

ZynqMP TCM information was previously hardcoded in the driver. It is now
available in the device tree, so parse it in the driver as per the new
bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v9:
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.

Changes in v8:
  - Remove pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation
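
The request path described in the v9 notes above follows an "acquire all or roll back" pattern. Below is a minimal userspace sketch of that pattern, not the driver code: `mock_request_node()`, `mock_release_node()`, `node_held` and `fail_on_id` are hypothetical stand-ins for the firmware calls, and skipping entry 0 assumes (as the loop in the patch suggests) that the first power-domain belongs to the core rather than to a TCM bank.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical stand-in for firmware state: 1 means the node is held. */
static int node_held[MAX_NODES];
/* Hypothetical fault injection: a request for this id fails (-1 = none). */
static int fail_on_id = -1;

static int mock_request_node(int id)
{
	if (id < 0 || id >= MAX_NODES || id == fail_on_id)
		return -1;
	node_held[id] = 1;
	return 0;
}

static void mock_release_node(int id)
{
	if (id >= 0 && id < MAX_NODES)
		node_held[id] = 0;
}

/*
 * Request ids[1..n-1]; entry 0 is skipped on purpose (assumed to be the
 * core's own power-domain). On failure, release everything requested so
 * far and return the original error.
 */
static int request_all_or_nothing(const int *ids, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 1; i < n; i++) {
		ret = mock_request_node(ids[i]);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Unwind in reverse order, stopping before index 0. */
	while (--i > 0)
		mock_release_node(ids[i]);

	return ret;
}
```

Caching the first error and returning it after the unwind keeps the caller informed about what actually failed, rather than about the cleanup.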

 drivers/remoteproc/xlnx_r5_remoteproc.c | 245 +++-
 1 file changed, 239 insertions(+), 6 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..0f87b984850b 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
{0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each 
*/
@@ -102,6 +102,7 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
  * @rproc: rproc handle
  * @pm_domain_id: RPU CPU power domain id
  * @ipi: pointer to mailbox information
+ * @lockstep_core1_np: second core's device_node to use in lockstep mode
  */
 struct zynqmp_r5_core {
struct device *dev;
@@ -111,6 +112,7 @@ struct zynqmp_r5_core {
struct rproc *rproc;
u32 pm_domain_id;
struct mbox_info *ipi;
+   struct device_node *lockstep_core1_np;
 };
 
 /**
@@ -539,6 +541,110 @@ static int tcm_mem_map(struct rproc *rproc,
return 0;
 }
 
+int request_core1_tcm_lockstep(struct rproc *rproc)
+{
+   struct zynqmp_r5_core *r5_core = rproc->priv;
+   struct of_phandle_args out_args = {0};
+   int ret, i, num_pd, pd_id, ret_err;
+   struct device_node *np;
+
+   np = r5_core->lockstep_core1_np;
+
+   /* Get number of power-domains */
+   num_pd = of_count_phandle_with_args(np, "power-domains",
+   "#power-domain-cells");
+   if (num_pd <= 0)
+   return -EINVAL;
+
+   /* Get individual power-domain id and enable TCM */
+   for (i = 1; i < num_pd; i++) {
+   ret = of_parse_phandle_with_args(np, "power-domains",
+"#power-domain-cells",
+i, _args);
+   if (ret) {
+   dev_warn(r5_core->dev,
+"failed to get tcm %d in power-domains list, 
ret %d\n",
+i, ret);
+   goto fail_request_core1_tcm;
+   }
+
+   pd_id = out_args.args[0];
+   of_node_put(out_args.np);
+
+   ret = zynqmp_pm_request_node(pd_id, 
ZYNQMP_PM_CAPABILITY_ACCESS, 0,
+ZYNQMP_PM_REQUEST_ACK_BLOCKING);
+   if (ret) {
+   dev_err(r5_core->dev, "failed to request TCM node 
0x%x\n",
+   pd_id);
+   goto fail_request_core1_tcm;
+   }
+   }
+
+   return 0;
+
+fail_request_core1_tcm:
+
+   /* Cache actual error to return later */
+   ret_err = ret;
+
+   /* Release previously requested TCM in case of failure */
+   while (--i > 0) {
+   ret = of_parse_phandle_with_args(np, "power-domains",
+"#power-domain-cells",
+i, _args);
+   if (ret)
+   return ret;
+   pd_id = out_args.args[0];
+   of_node_put(out_args.np);
+   zynqmp_pm_release_node(pd_id);
+   }
+
+   return ret_err;
+}
+
+void release_core1_tcm_lockstep(struct rproc *rproc)
+{
+   struct zynqmp_r5_core *r5_core = rproc->priv;
+   struct of_phandle_args out_args = {0};
+   struct zynqmp_r5_cluster *cluster;
+   int ret, i, num_pd, pd_id;
+   struct device_node *np;
+
+   /* Get R5 core1 node */
+   cluster = dev_get_drvdata(r5_core->dev->parent);
+
+   if (cluster->mode != LOCKSTEP_MODE)
+

[PATCH v9 0/3] add zynqmp TCM bindings

2024-01-10 Thread Tanmay Shah

Tightly-Coupled Memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor has two exclusive 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
In lockstep mode, the 128 KB of memory of both cores is accessible to
the cluster.

As per the ZynqMP UltraScale+ Technical Reference Manual (UG1085), the
TCM address space is as follows. The bindings in this patch series
introduce properties to accommodate this address space with
address translation between the Linux and Cortex-R5 views.

| Mode | R5 view (start addr) | Linux view (start addr) | Notes |
| --- | --- | --- | --- |
| *Split Mode* | | | |
| R5_0 ATCM (64 KB) | 0x_ | 0xFFE0_ | |
| R5_0 BTCM (64 KB) | 0x0002_ | 0xFFE2_ | |
| R5_1 ATCM (64 KB) | 0x_ | 0xFFE9_ | alias of 0xFFE1_ |
| R5_1 BTCM (64 KB) | 0x0002_ | 0xFFEB_ | alias of 0xFFE3_ |
| *Lockstep Mode* | | | |
| R5_0 ATCM (128 KB) | 0x_ | 0xFFE0_ | |
| R5_0 BTCM (128 KB) | 0x0002_ | 0xFFE2_ | |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map
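
The address translation between the R5 view and the Linux view can be sketched as a lookup over `ranges`-style windows. This is only an illustration: `struct tcm_range`, `tcm_local_to_bus()` and the window values in `example_ranges` are assumptions made for the example (the addresses in this archived copy are truncated), and the real bindings use two-cell child addresses that also carry the core index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One entry of a devicetree-style "ranges" property: a window in the
 * child (R5 local) address space mapped onto the parent (Linux) bus.
 */
struct tcm_range {
	uint64_t child_base;   /* address as seen by the R5 core */
	uint64_t parent_base;  /* address as seen by Linux */
	uint64_t size;         /* window length in bytes */
};

/* Illustrative split-mode windows for one core; values are assumptions. */
static const struct tcm_range example_ranges[] = {
	{ 0x00000000ULL, 0xffe00000ULL, 0x10000ULL }, /* ATCM, 64 KB */
	{ 0x00020000ULL, 0xffe20000ULL, 0x10000ULL }, /* BTCM, 64 KB */
};

/*
 * Translate an R5-local address to a Linux bus address.
 * Returns 0 on success, -1 if no window covers the address.
 */
static int tcm_local_to_bus(const struct tcm_range *r, size_t n,
			    uint64_t local, uint64_t *bus)
{
	for (size_t i = 0; i < n; i++) {
		if (local >= r[i].child_base &&
		    local - r[i].child_base < r[i].size) {
			*bus = r[i].parent_base + (local - r[i].child_base);
			return 0;
		}
	}
	return -1;
}
```

An address that falls in the hole between the two windows is rejected, which is the behavior one would expect from a `ranges` walk.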

Changes in v9:
  - Fix rproc lockstep dts
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.
  - Fix comments as suggested

Changes in v8:
  - Remove use of pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - %s/pm_dev1/pm_dev_core0/r
  - %s/pm_dev_link1/pm_dev_core0_link/r
  - %s/pm_dev2/pm_dev_core1/r
  - %s/pm_dev_link2/pm_dev_core1_link/r
  - remove pm_domain_id check to move next patch
  - add comment about how 1st entry in pm domain list is used
  - fix loop when jump to fail_add_pm_domains loop
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation

Changes in v6:
  - Introduce new node entry for r5f cluster split mode dts and
keep it disabled by default.
  - Keep remoteproc lockstep mode enabled by default to maintain
backward compatibility.
  - Enable split mode only for zcu102 board to demo split mode use
  - Remove spurious change
  - Handle errors in add_pm_domains function
  - Remove redundant code to handle errors from remove_pm_domains
  - Missing . at the end of the commit message
  - remove redundant initialization of variables
  - remove fail_tcm label and relevant code to free memory
acquired using devm_* API, as it is freed when the device is released
  - add extra check to see if "reg" property is supported or not

Changes in v5:
  - maintain Rob's Ack on bindings patch as no changes in bindings
  - split previous patch into multiple patches
  - Use pm domain framework to turn on/off TCM
  - Add support of parsing TCM information from device-tree
  - maintain backward compatibility with previous bindings without
TCM information available in device-tree

This patch series continues previous effort to upstream ZynqMP
TCM bindings:
Previous v4 version link:
https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/

Previous v3 version link:
https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/

Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

Tanmay Shah (2):
  dts: zynqmp: add properties for TCM in remoteproc
  remoteproc: zynqmp: parse TCM from device tree

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 131 --
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |   8 +
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi|  58 -
 drivers/remoteproc/xlnx_r5_remoteproc.c   | 245 +-
 4 files changed, 413 insertions(+), 29 deletions(-)


base-commit: ff9af5732fe761fa8e7aa66cb482f93a37e284ee
-- 
2.25.1

[PATCH v9 2/3] dts: zynqmp: add properties for TCM in remoteproc

2024-01-10 Thread Tanmay Shah

Add properties as per the new bindings in the zynqmp remoteproc node
to represent TCM address and size.

This patch also adds an alternative remoteproc node to represent
the remoteproc cluster in split mode. By default lockstep mode is
enabled, and users should disable it before using the split mode
dts. The two device-tree nodes can't be used simultaneously; one
of them must be disabled. For the zcu102-1.0 and zcu102-1.1 boards,
the remoteproc split mode dts node is enabled and the lockstep mode
dts node is disabled.

Signed-off-by: Tanmay Shah 
---

Changes in v9:
  - fix rproc lockstep dts


 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 58 +--
 2 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
+_split {
+   status = "okay";
+};
+
+_lockstep {
+   status = "disabled";
+};
+
  {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index b61fc99cd911..cfdd1f68501f 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -247,19 +247,67 @@ fpga_full: fpga-full {
ranges;
};
 
-   remoteproc {
+   rproc_lockstep: remoteproc@ffe0 {
compatible = "xlnx,zynqmp-r5fss";
xlnx,cluster-mode = <1>;
 
-   r5f-0 {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x2>,
+<0x0 0x2 0x0 0xffe2 0x0 0x2>;
+
+   r5f@0 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x0 0x0 0x0 0x2>, <0x0 0x2 0x0 0x2>;
+   reg-names = "atcm", "btcm";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>;
+   memory-region = <_0_fw_image>;
+   };
+
+   r5f@1 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm", "btcm";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
+   memory-region = <_1_fw_image>;
+   };
+   };
+
+   rproc_split: remoteproc-split@ffe0 {
+   status = "disabled";
+   compatible = "xlnx,zynqmp-r5fss";
+   xlnx,cluster-mode = <0>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+<0x0 0x2 0x0 0xffe2 0x0 0x1>,
+<0x1 0x0 0x0 0xffe9 0x0 0x1>,
+<0x1 0x2 0x0 0xffeb 0x0 0x1>;
+
+   r5f@0 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_0>;
+   reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>;
+   reg-names = "atcm", "btcm";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>;
memory-region = <_0_fw_image>;
};
 
-   r5f-1 {
+   r5f@1 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_1>;
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm", "btcm";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
memory-region = <_1_fw_image>;
};
};
-- 
2.25.1

[PATCH v9 1/3] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-01-10 Thread Tanmay Shah

From: Radhey Shyam Pandey 

Introduce bindings for the TCM memory address space on the AMD-Xilinx Zynq
UltraScale+ platform. It will help in defining TCM in device-tree
and make its access platform agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domain) are documented for
each TCM in the R5 node. The reg and reg-names properties are made
required because we don't want to hardcode TCM addresses for future
platforms. For zu+, the legacy implementation will ensure that an
old dts without reg/reg-names still works and the stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Tanmay Shah 
Acked-by: Rob Herring 
---
 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 131 +++---
 1 file changed, 113 insertions(+), 18 deletions(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..9ecd63ea1b38 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -20,6 +20,17 @@ properties:
   compatible:
 const: xlnx,zynqmp-r5fss
 
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
+
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
@@ -37,7 +48,7 @@ properties:
   2: single cpu mode
 
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r5f@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -54,8 +65,19 @@ patternProperties:
   compatible:
 const: xlnx,zynqmp-r5f
 
+  reg:
+items:
+  - description: ATCM internal memory region
+  - description: BTCM internal memory region
+
+  reg-names:
+items:
+  - const: atcm
+  - const: btcm
+
   power-domains:
-maxItems: 1
+minItems: 1
+maxItems: 3
 
   mboxes:
 minItems: 1
@@ -102,34 +124,107 @@ patternProperties:
 required:
   - compatible
   - power-domains
+  - reg
+  - reg-names
 
 unevaluatedProperties: false
 
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
 
 additionalProperties: false
 
 examples:
   - |
-remoteproc {
-compatible = "xlnx,zynqmp-r5fss";
-xlnx,cluster-mode = <1>;
-
-r5f-0 {
-compatible = "xlnx,zynqmp-r5f";
-power-domains = <_firmware 0x7>;
-memory-region = <_0_fw_image>, <>, 
<>, <>;
-mboxes = <_mailbox_rpu0 0>, <_mailbox_rpu0 1>;
-mbox-names = "tx", "rx";
+#include 
+
+//Split mode configuration
+soc {
+#address-cells = <2>;
+#size-cells = <2>;
+
+remoteproc@ffe0 {
+compatible = "xlnx,zynqmp-r5fss";
+xlnx,cluster-mode = <0>;
+
+#address-cells = <2>;
+#size-cells = <2>;
+ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+ <0x0 0x2 0x0 0xffe2 0x0 0x1>,
+ <0x1 0x0 0x0 0xffe9 0x0 0x1>,
+ <0x1 0x2 0x0 0xffeb 0x0 0x1>;
+
+r5f@0 {
+compatible = "xlnx,zynqmp-r5f";
+reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>;
+reg-names = "atcm", "btcm";
+power-domains = <_firmware PD_RPU_0>,
+<_firmware PD_R5_0_ATCM>,
+<_firmware PD_R5_0_BTCM>;
+memory-region = <_0_fw_image>, <>,
+<>, <>;
+mboxes = <_mailbox_rpu0 0>, <_mailbox_rpu0 1>;
+mbox-names = "tx", "rx";
+};
+
+r5f@1 {
+compatible = "xlnx,zynqmp-r5f";
+reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+reg-names = "atcm", "btcm";
+power-domains = <_firmware PD_RPU_1>,
+<_firmware PD_R5_1_ATCM>,
+<_firmware PD_R5_1_BTCM>;
+memory-region = <_1_fw_image>, <>,
+<>, <>;
+mboxes = <_mailbox_rpu1 0>, <_mailbox_rpu1 1>;
+mbox-names = "tx", "rx";
+};
 };
+};
 
-r5f-1 {
-compatible = "xlnx,zynqmp-r5f";
-

[RFC V1 12/13] vdpa_sim: new owner capability

2024-01-10 Thread Steve Sistare

The vdpa_sim device supports ownership transfer to a new process, so
advertise VHOST_BACKEND_F_NEW_OWNER.  User virtual addresses are used
by the software iommu, so VHOST_IOTLB_REMAP is required after
VHOST_NEW_OWNER, so advertise VHOST_BACKEND_F_IOTLB_REMAP.

Signed-off-by: Steve Sistare 
---
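
A minimal sketch of how the advertised capability set described above is composed. The bit numbers are the ones proposed in this series' vhost_types.h hunks (not merged uapi), and `sim_backend_features()` is a hypothetical helper, not the driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Bit numbers as proposed in this series; not part of released uapi. */
#define VHOST_BACKEND_F_NEW_OWNER    0x9
#define VHOST_BACKEND_F_IOTLB_REMAP  0xa

/* Hypothetical helper: build the capability mask for a simulated backend. */
static uint64_t sim_backend_features(bool use_va)
{
	uint64_t features = BIT_ULL(VHOST_BACKEND_F_NEW_OWNER);

	/* Only a backend that keeps user VAs needs them remapped later. */
	if (use_va)
		features |= BIT_ULL(VHOST_BACKEND_F_IOTLB_REMAP);

	return features;
}
```

Using `|=` rather than `+=` keeps the operation idempotent if a bit is ever set twice.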
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 8734834983cb..d037869d8a89 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -430,7 +430,13 @@ static u64 vdpasim_get_device_features(struct vdpa_device *vdpa)
 
 static u64 vdpasim_get_backend_features(const struct vdpa_device *vdpa)
 {
-   return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+   u64 features = BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK) |
+  BIT_ULL(VHOST_BACKEND_F_NEW_OWNER);
+
+   if (use_va)
+   features |= BIT_ULL(VHOST_BACKEND_F_IOTLB_REMAP);
+
+   return features;
 }
 
 static int vdpasim_set_driver_features(struct vdpa_device *vdpa, u64 features)
-- 
2.39.3

[RFC V1 13/13] vduse: new owner capability

2024-01-10 Thread Steve Sistare

The vduse device supports ownership transfer to a new process, so
advertise VHOST_BACKEND_F_NEW_OWNER.  User virtual addresses are used
by the software iommu, so VHOST_IOTLB_REMAP is required after
VHOST_NEW_OWNER, so advertise VHOST_BACKEND_F_IOTLB_REMAP.

Signed-off-by: Steve Sistare 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 6b25457a037d..67815f6391db 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -608,6 +609,12 @@ static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa)
return dev->vq_align;
 }
 
+static u64 vduse_vdpa_get_backend_features(const struct vdpa_device *vdpa)
+{
+   return BIT_ULL(VHOST_BACKEND_F_IOTLB_REMAP) |
+  BIT_ULL(VHOST_BACKEND_F_NEW_OWNER);
+}
+
 static u64 vduse_vdpa_get_device_features(struct vdpa_device *vdpa)
 {
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
@@ -801,6 +808,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.set_vq_state   = vduse_vdpa_set_vq_state,
.get_vq_state   = vduse_vdpa_get_vq_state,
.get_vq_align   = vduse_vdpa_get_vq_align,
+   .get_backend_features   = vduse_vdpa_get_backend_features,
.get_device_features= vduse_vdpa_get_device_features,
.set_driver_features= vduse_vdpa_set_driver_features,
.get_driver_features= vduse_vdpa_get_driver_features,
-- 
2.39.3

[RFC V1 07/13] vhost-vdpa: flush workers on suspend

2024-01-10 Thread Steve Sistare

To pass ownership of a live vdpa device to a new process, the user
suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
mm.  Flush workers in suspend to guarantee that no worker sees the new
mm and old VA in between.

Signed-off-by: Steve Sistare 
---
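
The ordering argument in the commit message above can be pictured with a toy state machine. Everything here is hypothetical (`struct toy_dev` and the `toy_*` functions are not kernel API), and unlike the proposed interface, which does not enforce the ordering, this model rejects out-of-order steps so the hazard is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device during ownership hand-off; not kernel API. */
struct toy_dev {
	bool suspended;
	int  mm_id;        /* address space currently attached */
	int  va_mm_id;     /* address space the stored VAs belong to */
	int  pending_work; /* queued worker items not yet run */
};

/* A worker is only safe to run when the mm and the VAs agree. */
static bool toy_view_consistent(const struct toy_dev *d)
{
	return d->mm_id == d->va_mm_id;
}

static void toy_suspend(struct toy_dev *d)
{
	d->suspended = true;
	d->pending_work = 0;   /* models flushing queued work */
}

static int toy_new_owner(struct toy_dev *d, int new_mm)
{
	/* Unflushed work could observe the new mm with old VAs. */
	if (!d->suspended || d->pending_work)
		return -1;
	d->mm_id = new_mm;
	return 0;
}

static int toy_remap_all(struct toy_dev *d)
{
	if (!d->suspended)
		return -1;
	d->va_mm_id = d->mm_id;
	return 0;
}

static int toy_resume(struct toy_dev *d)
{
	if (!toy_view_consistent(d))
		return -1;
	d->suspended = false;
	return 0;
}
```

The only sequence the model accepts is suspend, new owner, remap, resume, which is the sequence the commit message describes.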
 drivers/vhost/vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 8fe1562d24af..9673e8e20d11 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -591,10 +591,14 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
 {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
+   struct vhost_dev *vdev = >vdev;
 
if (!ops->suspend)
return -EOPNOTSUPP;
 
+   if (vdev->use_worker)
+   vhost_dev_flush(vdev);
+
return ops->suspend(vdpa);
 }
 
-- 
2.39.3

[RFC V1 04/13] vhost-vdpa: VHOST_BACKEND_F_NEW_OWNER

2024-01-10 Thread Steve Sistare

Add the VHOST_BACKEND_F_NEW_OWNER backend capability, which indicates that
VHOST_NEW_OWNER is supported.

Signed-off-by: Steve Sistare 
---
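
A small sketch of the two checks this patch touches: rejecting feature bits outside the supported mask, and gating the new operation on the capability bit. The function names are hypothetical, and the bit number is the one proposed in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Bit number as proposed in this series; not part of released uapi. */
#define VHOST_BACKEND_F_NEW_OWNER 0x9

/* Hypothetical: refuse any requested bit that is not in the supported mask. */
static int toy_set_backend_features(uint64_t requested, uint64_t supported)
{
	if (requested & ~supported)
		return -EOPNOTSUPP;
	return 0;
}

/* Hypothetical: the operation is only available if the backend advertises it. */
static int toy_new_owner_allowed(uint64_t backend_features)
{
	if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_NEW_OWNER)))
		return -EOPNOTSUPP;
	return 0;
}
```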
 drivers/vhost/vdpa.c | 7 ++-
 include/uapi/linux/vhost_types.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index eb3a95e703b0..faed6471934a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -621,6 +621,10 @@ static long vhost_vdpa_new_owner(struct vhost_vdpa *v)
struct mm_struct *mm_new = current->mm;
long pinned_vm = v->pinned_vm;
unsigned long lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
+   u64 features = vhost_vdpa_get_backend_features(v);
+
+   if (!(features & BIT_ULL(VHOST_BACKEND_F_NEW_OWNER)))
+   return -EOPNOTSUPP;
 
if (!mm_old)
return -EINVAL;
@@ -784,7 +788,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 BIT_ULL(VHOST_BACKEND_F_RESUME) |
-
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
+
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK) |
+BIT_ULL(VHOST_BACKEND_F_NEW_OWNER)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
 !vhost_vdpa_can_suspend(v))
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..9177843951e9 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -192,5 +192,7 @@ struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_DESC_ASID0x7
 /* IOTLB don't flush memory mapping across device reset */
 #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
+/* Supports VHOST_NEW_OWNER */
+#define VHOST_BACKEND_F_NEW_OWNER  0x9
 
 #endif
-- 
2.39.3

[RFC V1 11/13] vdpa/mlx5: new owner capability

2024-01-10 Thread Steve Sistare

The mlx5 vdpa device supports ownership transfer to a new process, so
advertise VHOST_BACKEND_F_NEW_OWNER.  User virtual addresses are not
used after they are initially translated to physical, so VHOST_IOTLB_REMAP
is not required, hence VHOST_BACKEND_F_IOTLB_REMAP is not advertised.

Signed-off-by: Steve Sistare 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 26ba7da6b410..26f24fb0e160 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2562,7 +2562,8 @@ static void unregister_link_notifier(struct mlx5_vdpa_net *ndev)
 
 static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
 {
-   return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+   return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK) |
+  BIT_ULL(VHOST_BACKEND_F_NEW_OWNER);
 }
 
 static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
-- 
2.39.3

[RFC V1 10/13] vdpa_sim: flush workers on suspend

2024-01-10 Thread Steve Sistare

To pass ownership of a live vdpa device to a new process, the user
suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
mm.  Flush workers in suspend to guarantee that no worker sees the new
mm and old VA in between.

Signed-off-by: Steve Sistare 
---
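
The flush in this patch relies on a FIFO property: once a no-op item queued at the tail has run, everything queued before it has run too. A single-threaded sketch of that idea follows; `struct toy_worker` and the `toy_*` helpers are hypothetical, and the marker sets a flag to stand in for the completion the kernel helper waits on.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 16

typedef void (*work_fn)(void *arg);

struct toy_work {
	work_fn fn;
	void *arg;
};

/* Hypothetical FIFO worker; head and tail only ever grow. */
struct toy_worker {
	struct toy_work q[QCAP];
	size_t head, tail;
};

static int toy_queue(struct toy_worker *w, work_fn fn, void *arg)
{
	if (w->tail - w->head == QCAP)
		return -1;   /* queue full */
	w->q[w->tail % QCAP] = (struct toy_work){ fn, arg };
	w->tail++;
	return 0;
}

/* Run the oldest queued item. Caller guarantees the queue is not empty. */
static void toy_run_one(struct toy_worker *w)
{
	struct toy_work item = w->q[w->head % QCAP];

	w->head++;
	item.fn(item.arg);
}

/* Marker work: does nothing except signal that it was reached. */
static void toy_marker(void *arg)
{
	*(int *)arg = 1;
}

/* Example payload: counts how many times it ran. */
static void toy_count(void *arg)
{
	(*(int *)arg)++;
}

/* Queue a marker at the tail and run items until the marker has executed. */
static int toy_flush(struct toy_worker *w)
{
	int done = 0;

	if (toy_queue(w, toy_marker, &done))
		return -1;
	while (!done)
		toy_run_one(w);
	return 0;
}
```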
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6304cb0b4770..8734834983cb 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -74,6 +74,17 @@ static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
kthread_flush_work(work);
 }
 
+static void flush_work_fn(struct kthread_work *work) {}
+
+static void vdpasim_flush_work(struct vdpasim *vdpasim)
+{
+   struct kthread_work work;
+
+   kthread_init_work(, flush_work_fn);
+   kthread_queue_work(vdpasim->worker, );
+   kthread_flush_work();
+}
+
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
return container_of(vdpa, struct vdpasim, vdpa);
@@ -512,6 +523,8 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
vdpasim->running = false;
mutex_unlock(>mutex);
 
+   vdpasim_flush_work(vdpasim);
+
return 0;
 }
 
-- 
2.39.3

[RFC V1 09/13] vdpa_sim: reset must not run

2024-01-10 Thread Steve Sistare

vdpasim_do_reset sets running to true, which is wrong, as it allows
vdpasim_kick_vq to post work requests before the device has been
configured.  To fix, do not set running until VIRTIO_CONFIG_S_FEATURES_OK
is set.

Signed-off-by: Steve Sistare 
---
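
A toy model of the fix described above: `running` is derived from the status byte instead of being forced on by reset. The status bit values are the standard virtio ones; `struct toy_sim` and the `toy_*` functions are hypothetical, and the kick path is simplified compared with the simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard virtio device status bits. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE  1
#define VIRTIO_CONFIG_S_DRIVER       2
#define VIRTIO_CONFIG_S_DRIVER_OK    4
#define VIRTIO_CONFIG_S_FEATURES_OK  8

/* Hypothetical, simplified device state. */
struct toy_sim {
	uint8_t status;
	bool running;
	int queued;   /* number of work requests posted */
};

static void toy_reset(struct toy_sim *s)
{
	s->status = 0;
	s->running = false;   /* a reset device must not accept kicks */
}

static void toy_set_status(struct toy_sim *s, uint8_t status)
{
	s->status = status;
	s->running = (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
}

/* Returns true if the kick posted work, false if it was ignored. */
static bool toy_kick(struct toy_sim *s)
{
	if (!s->running)
		return false;
	s->queued++;
	return true;
}
```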
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index be2925d0d283..6304cb0b4770 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -160,7 +160,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim, u32 flags)
}
}
 
-   vdpasim->running = true;
+   vdpasim->running = false;
spin_unlock(>iommu_lock);
 
vdpasim->features = 0;
@@ -483,6 +483,7 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
 
mutex_lock(>mutex);
vdpasim->status = status;
+   vdpasim->running = (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
mutex_unlock(>mutex);
 }
 
-- 
2.39.3

[RFC V1 05/13] vhost-vdpa: VHOST_IOTLB_REMAP

2024-01-10 Thread Steve Sistare

When device ownership is passed to a new process via VHOST_NEW_OWNER,
some devices need to know the new userland addresses of the dma mappings.
Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr
of a mapping.  The new uaddr must address the same memory object as
originally mapped.

The user must suspend the device before the old address is invalidated,
and cannot resume it until after VHOST_IOTLB_REMAP is called, but this
requirement is not enforced by the API.

Signed-off-by: Steve Sistare 
---
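
The validation rules for the remap message (perm must be 0, size must be non-zero, and iova/size must match an existing mapping exactly) can be sketched in userspace as below. `struct toy_map`, `struct toy_msg` and `toy_iotlb_remap()` are hypothetical, and a linear scan stands in for the interval-tree lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping record: inclusive iova range plus backing address. */
struct toy_map {
	uint64_t start;
	uint64_t last;
	uint64_t addr;
};

/* Hypothetical message carrying the fields the remap path inspects. */
struct toy_msg {
	uint64_t iova;
	uint64_t size;
	uint64_t uaddr;
	uint8_t  perm;
};

static int toy_iotlb_remap(struct toy_map *maps, size_t n,
			   const struct toy_msg *msg)
{
	uint64_t start = msg->iova;
	uint64_t last;

	if (msg->perm || !msg->size)
		return -EINVAL;
	last = start + msg->size - 1;

	for (size_t i = 0; i < n; i++) {
		/* First mapping that overlaps [start, last]. */
		if (maps[i].start <= last && maps[i].last >= start) {
			/* Partial or oversized remaps are rejected. */
			if (maps[i].start != start || maps[i].last != last)
				return -EINVAL;
			maps[i].addr = msg->uaddr;
			return 0;
		}
	}
	return -ENOENT;
}
```

Requiring an exact match means a caller cannot split or merge mappings through this path; it can only update the address of a mapping it created earlier.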
 drivers/vhost/vdpa.c | 34 
 include/uapi/linux/vhost_types.h | 11 ++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index faed6471934a..ec5ca20bd47d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 
 }
 
+static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v,
+ struct vhost_iotlb *iotlb,
+ struct vhost_iotlb_msg *msg)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+   u32 asid = iotlb_to_asid(iotlb);
+   u64 start = msg->iova;
+   u64 last = start + msg->size - 1;
+   struct vhost_iotlb_map *map;
+   int r = 0;
+
+   if (msg->perm || !msg->size)
+   return -EINVAL;
+
+   map = vhost_iotlb_itree_first(iotlb, start, last);
+   if (!map)
+   return -ENOENT;
+
+   if (map->start != start || map->last != last)
+   return -EINVAL;
+
+   /* batch will finish with remap.  non-batch must do it now. */
+   if (!v->in_batch)
+   r = ops->set_map(vdpa, asid, iotlb);
+   if (!r)
+   map->addr = msg->uaddr;
+
+   return r;
+}
+
 static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
   struct vhost_iotlb *iotlb,
   struct vhost_iotlb_msg *msg)
@@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
ops->set_map(vdpa, asid, iotlb);
v->in_batch = false;
break;
+   case VHOST_IOTLB_REMAP:
+   r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg);
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 9177843951e9..35908315ff55 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -79,7 +79,7 @@ struct vhost_iotlb_msg {
 /*
  * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying
  * multiple mappings in one go: beginning with
- * VHOST_IOTLB_BATCH_BEGIN, followed by any number of
+ * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or
  * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END.
  * When one of these two values is used as the message type, the rest
  * of the fields in the message are ignored. There's no guarantee that
@@ -87,6 +87,15 @@ struct vhost_iotlb_msg {
  */
 #define VHOST_IOTLB_BATCH_BEGIN5
 #define VHOST_IOTLB_BATCH_END  6
+
+/*
+ * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova.
+ * The new uaddr must address the same memory object as originally mapped.
+ * Failure to do so will result in user memory corruption and/or device
+ * misbehavior.  iova and size must match the arguments used to create the
+ * existing mapping.  Protection is not changed, and perm must be 0.
+ */
+#define VHOST_IOTLB_REMAP  7
__u8 type;
 };
 
-- 
2.39.3

[RFC V1 06/13] vhost-vdpa: VHOST_BACKEND_F_IOTLB_REMAP

2024-01-10 Thread Steve Sistare

Add the VHOST_BACKEND_F_IOTLB_REMAP backend capability, which indicates
that VHOST_IOTLB_REMAP is supported.

If VHOST_BACKEND_F_IOTLB_REMAP is advertised, then the user must call
VHOST_IOTLB_REMAP after ownership of a device is transferred to a new
process via VHOST_NEW_OWNER.  Disabling the feature during negotiation
does not negate this requirement.

Signed-off-by: Steve Sistare 
---
 drivers/vhost/vdpa.c | 8 +++-
 include/uapi/linux/vhost_types.h | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ec5ca20bd47d..8fe1562d24af 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -789,7 +789,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 BIT_ULL(VHOST_BACKEND_F_RESUME) |
 
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK) |
-BIT_ULL(VHOST_BACKEND_F_NEW_OWNER)))
+BIT_ULL(VHOST_BACKEND_F_NEW_OWNER) |
+BIT_ULL(VHOST_BACKEND_F_IOTLB_REMAP)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
 !vhost_vdpa_can_suspend(v))
@@ -1229,11 +1230,16 @@ static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v,
u64 start = msg->iova;
u64 last = start + msg->size - 1;
struct vhost_iotlb_map *map;
+   u64 features;
int r = 0;
 
if (msg->perm || !msg->size)
return -EINVAL;
 
+   features = ops->get_backend_features(vdpa);
+   if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_REMAP)))
+   return -EOPNOTSUPP;
+
map = vhost_iotlb_itree_first(iotlb, start, last);
if (!map)
return -ENOENT;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 35908315ff55..7e79e9bd0f7b 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -203,5 +203,7 @@ struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
 /* Supports VHOST_NEW_OWNER */
 #define VHOST_BACKEND_F_NEW_OWNER  0x9
+/* Supports VHOST_IOTLB_REMAP */
+#define VHOST_BACKEND_F_IOTLB_REMAP  0xa
 
 #endif
-- 
2.39.3

[RFC V1 08/13] vduse: flush workers on suspend

2024-01-10 Thread Steve Sistare

To pass ownership of a live vdpa device to a new process, the user
suspends the device, calls VHOST_NEW_OWNER to change the mm, and calls
VHOST_IOTLB_REMAP to change the user virtual addresses to match the new
mm.  Flush workers in suspend to guarantee that no worker sees the new
mm and old VA in between.

Signed-off-by: Steve Sistare 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0ddd4b8abecb..6b25457a037d 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -472,6 +472,18 @@ static void vduse_dev_reset(struct vduse_dev *dev)
up_write(&dev->rwsem);
 }
 
+static void vduse_flush_work(struct vduse_dev *dev)
+{
+   flush_work(&dev->inject);
+
+   for (int i = 0; i < dev->vq_num; i++) {
+   struct vduse_virtqueue *vq = dev->vqs[i];
+
+   flush_work(&vq->inject);
+   flush_work(&vq->kick);
+   }
+}
+
 static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
u64 desc_area, u64 driver_area,
u64 device_area)
@@ -713,6 +725,17 @@ static int vduse_vdpa_reset(struct vdpa_device *vdpa)
return ret;
 }
 
+static int vduse_vdpa_suspend(struct vdpa_device *vdpa)
+{
+   struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+   down_write(&dev->rwsem);
+   vduse_flush_work(dev);
+   up_write(&dev->rwsem);
+
+   return 0;
+}
+
 static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
 {
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
@@ -794,6 +817,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.set_vq_affinity= vduse_vdpa_set_vq_affinity,
.get_vq_affinity= vduse_vdpa_get_vq_affinity,
.reset  = vduse_vdpa_reset,
+   .suspend= vduse_vdpa_suspend,
.set_map= vduse_vdpa_set_map,
.free   = vduse_vdpa_free,
 };
-- 
2.39.3

[RFC V1 03/13] vhost-vdpa: VHOST_NEW_OWNER

2024-01-10 Thread Steve Sistare

Add an ioctl to transfer file descriptor ownership and pinned memory
accounting from one process to another.

Signed-off-by: Steve Sistare 
---
 drivers/vhost/vdpa.c   | 37 +
 drivers/vhost/vhost.c  | 15 +++
 drivers/vhost/vhost.h  |  1 +
 include/uapi/linux/vhost.h | 10 ++
 4 files changed, 63 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 2269988d6d33..eb3a95e703b0 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -613,6 +613,40 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
return ops->resume(vdpa);
 }
 
+static long vhost_vdpa_new_owner(struct vhost_vdpa *v)
+{
+   int r;
+   struct vhost_dev *vdev = &v->vdev;
+   struct mm_struct *mm_old = vdev->mm;
+   struct mm_struct *mm_new = current->mm;
+   long pinned_vm = v->pinned_vm;
+   unsigned long lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
+
+   if (!mm_old)
+   return -EINVAL;
+
+   if (!v->vdpa->use_va &&
+   pinned_vm + atomic64_read(&mm_new->pinned_vm) > lock_limit)
+   return -ENOMEM;
+
+   r = vhost_vdpa_bind_mm(v, mm_new);
+   if (r)
+   return r;
+
+   r = vhost_dev_new_owner(vdev);
+   if (r) {
+   vhost_vdpa_bind_mm(v, mm_old);
+   return r;
+   }
+
+   if (!v->vdpa->use_va) {
+   atomic64_sub(pinned_vm, &mm_old->pinned_vm);
+   atomic64_add(pinned_vm, &mm_new->pinned_vm);
+   }
+
+   return r;
+}
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
 {
@@ -843,6 +877,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_RESUME:
r = vhost_vdpa_resume(v);
break;
+   case VHOST_NEW_OWNER:
+   r = vhost_vdpa_new_owner(v);
+   break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e0c181ad17e3..0ce7ee9834f4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -907,6 +907,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
 
+/* Caller should have device mutex */
+long vhost_dev_new_owner(struct vhost_dev *dev)
+{
+   if (dev->mm == current->mm)
+   return -EBUSY;
+
+   if (!vhost_dev_has_owner(dev))
+   return -EINVAL;
+
+   vhost_detach_mm(dev);
+   vhost_attach_mm(dev);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_dev_new_owner);
+
 static struct vhost_iotlb *iotlb_alloc(void)
 {
return vhost_iotlb_alloc(max_iotlb_entries,
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f60d5f7bef94..cd0dab21d99e 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -185,6 +185,7 @@ void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
int (*msg_handler)(struct vhost_dev *dev, u32 asid,
   struct vhost_iotlb_msg *msg));
 long vhost_dev_set_owner(struct vhost_dev *dev);
+long vhost_dev_new_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
 struct vhost_iotlb *vhost_dev_reset_owner_prepare(void);
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 649560c685f1..5e3cdce4c0cf 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -123,6 +123,16 @@
 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
 #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
 
+/* Set current process as the new owner of this file descriptor.  The fd must
+ * already be owned, via a prior call to VHOST_SET_OWNER.  The pinned memory
+ * count is transferred from the previous to the new owner.
+ * Errors:
+ *   EINVAL: not owned
+ *   EBUSY:  caller is already the owner
+ *   ENOMEM: RLIMIT_MEMLOCK exceeded
+ */
+#define VHOST_NEW_OWNER _IO(VHOST_VIRTIO, 0x27)
+
 /* VHOST_NET specific defines */
 
 /* Attach virtio net ring to a raw socket, or tap device.
-- 
2.39.3
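
The pinned-memory accounting performed by vhost_vdpa_new_owner() can be
modelled in userspace as below. This is only a sketch of the arithmetic for
a device that pins memory (use_va false): struct mm_model and
transfer_pinned() are hypothetical names, a 4 KiB page size is assumed, and
the mm rebinding and the EBUSY check from vhost_dev_new_owner() are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                     /* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)   /* bytes to whole pages */

/* Hypothetical stand-in for the pinned_vm counter of an mm_struct. */
struct mm_model {
	int64_t pinned_vm;  /* in pages */
};

/*
 * Model of the accounting step: dev_pinned is the per-device page count
 * kept in vhost_vdpa.pinned_vm, memlock_bytes plays the role of
 * rlimit(RLIMIT_MEMLOCK).  On success the count moves from the old
 * owner to the new one; on failure neither counter changes.
 */
static int transfer_pinned(struct mm_model *mm_old, struct mm_model *mm_new,
			   int64_t dev_pinned, uint64_t memlock_bytes)
{
	uint64_t lock_limit = PFN_DOWN(memlock_bytes);

	if (!mm_old)
		return -EINVAL;   /* device was never owned */

	if ((uint64_t)(dev_pinned + mm_new->pinned_vm) > lock_limit)
		return -ENOMEM;   /* new owner would exceed its limit */

	mm_old->pinned_vm -= dev_pinned;
	mm_new->pinned_vm += dev_pinned;
	return 0;
}
```

With a 1 MiB limit (256 pages), moving 100 pinned pages to a process that
already has 10 pinned succeeds, while moving 300 pages fails with -ENOMEM
and leaves both counters untouched.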

[RFC V1 00/13] vdpa live update

2024-01-10 Thread Steve Sistare

Live update is a technique wherein an application saves its state, exec's
to an updated version of itself, and restores its state.  Clients of the
application experience a brief suspension of service, on the order of
hundreds of milliseconds, but are otherwise unaffected.

Define and implement interfaces that allow vdpa devices to be preserved
across fork or exec, to support live update for applications such as qemu.
The device must be suspended during the update, but its dma mappings are
preserved, so the suspension is brief.

The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
accounting from one process to another.

The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
VHOST_NEW_OWNER is supported.

The VHOST_IOTLB_REMAP message type updates a dma mapping with its userland
address in the new process.

The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
VHOST_IOTLB_REMAP is supported and required.  Some devices do not
require it, because the userland address of each dma mapping is discarded
after being translated to a physical address.

Here is a pseudo-code sequence for performing live update, based on
suspend + reset because resume is not yet available.  The vdpa device
descriptor, fd, remains open across the exec.

  ioctl(fd, VHOST_VDPA_SUSPEND)
  ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
  exec 

  ioctl(fd, VHOST_NEW_OWNER)

  issue ioctls to re-create vrings

  if VHOST_BACKEND_F_IOTLB_REMAP
      foreach dma mapping
          write(fd, {VHOST_IOTLB_REMAP, new_addr})

  ioctl(fd, VHOST_VDPA_SET_STATUS,
ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)
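
The ordering of the sequence above, and the status value written in its
last step, can be sketched in C as follows. This is a model only and issues
no ioctls: enum lu_step, live_update_plan() and live_update_final_status()
are hypothetical names, and the status bit values are the device status
bits from the virtio specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device status bits from the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/* Hypothetical labels for the steps of the pseudo-code sequence. */
enum lu_step {
	LU_SUSPEND,          /* ioctl(fd, VHOST_VDPA_SUSPEND) */
	LU_RESET,            /* ioctl(fd, VHOST_VDPA_SET_STATUS, 0) */
	LU_EXEC,             /* exec the updated binary, fd stays open */
	LU_NEW_OWNER,        /* ioctl(fd, VHOST_NEW_OWNER) */
	LU_RECREATE_VRINGS,  /* re-issue the vring setup ioctls */
	LU_REMAP,            /* one VHOST_IOTLB_REMAP message per mapping */
	LU_SET_STATUS,       /* final VHOST_VDPA_SET_STATUS */
};

#define LU_MAX_STEPS 7

/*
 * Fill out[] with the steps in order and return how many were written.
 * The remap step is included only when the backend advertises
 * VHOST_BACKEND_F_IOTLB_REMAP (passed in as needs_remap).
 */
static size_t live_update_plan(bool needs_remap, enum lu_step out[LU_MAX_STEPS])
{
	size_t n = 0;

	out[n++] = LU_SUSPEND;
	out[n++] = LU_RESET;
	out[n++] = LU_EXEC;
	out[n++] = LU_NEW_OWNER;
	out[n++] = LU_RECREATE_VRINGS;
	if (needs_remap)
		out[n++] = LU_REMAP;
	out[n++] = LU_SET_STATUS;
	return n;
}

/* Status value written by the last step of the sequence. */
static uint8_t live_update_final_status(void)
{
	return VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
	       VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
}
```

The plan has six steps for a backend that discards userland addresses after
translation and seven for one that requires remapping.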


Steve Sistare (13):
  vhost-vdpa: count pinned memory
  vhost-vdpa: pass mm to bind
  vhost-vdpa: VHOST_NEW_OWNER
  vhost-vdpa: VHOST_BACKEND_F_NEW_OWNER
  vhost-vdpa: VHOST_IOTLB_REMAP
  vhost-vdpa: VHOST_BACKEND_F_IOTLB_REMAP
  vhost-vdpa: flush workers on suspend
  vduse: flush workers on suspend
  vdpa_sim: reset must not run
  vdpa_sim: flush workers on suspend
  vdpa/mlx5: new owner capability
  vdpa_sim: new owner capability
  vduse: new owner capability

 drivers/vdpa/mlx5/net/mlx5_vnet.c  |   3 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c   |  24 ++-
 drivers/vdpa/vdpa_user/vduse_dev.c |  32 +
 drivers/vhost/vdpa.c   | 101 +++--
 drivers/vhost/vhost.c  |  15 +
 drivers/vhost/vhost.h  |   1 +
 include/uapi/linux/vhost.h |  10 +++
 include/uapi/linux/vhost_types.h   |  15 -
 8 files changed, 191 insertions(+), 10 deletions(-)

-- 
2.39.3

[RFC V1 02/13] vhost-vdpa: pass mm to bind

2024-01-10 Thread Steve Sistare

Pass the target mm to vhost_vdpa_bind_mm.  No functional change.

Signed-off-by: Steve Sistare 
---
 drivers/vhost/vdpa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 10fb95bcca1a..2269988d6d33 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -248,7 +248,7 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
return _compat_vdpa_reset(v);
 }
 
-static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v, struct mm_struct *mm)
 {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
@@ -256,7 +256,7 @@ static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
if (!vdpa->use_va || !ops->bind_mm)
return 0;
 
-   return ops->bind_mm(vdpa, v->vdev.mm);
+   return ops->bind_mm(vdpa, mm);
 }
 
 static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
@@ -855,7 +855,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 
switch (cmd) {
case VHOST_SET_OWNER:
-   r = vhost_vdpa_bind_mm(v);
+   r = vhost_vdpa_bind_mm(v, v->vdev.mm);
if (r)
vhost_dev_reset_owner(d, NULL);
break;
-- 
2.39.3

[RFC V1 01/13] vhost-vdpa: count pinned memory

2024-01-10 Thread Steve Sistare

Remember the count of pinned memory for the device.

Signed-off-by: Steve Sistare 
---
 drivers/vhost/vdpa.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index da7ec77cdaff..10fb95bcca1a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -59,6 +59,7 @@ struct vhost_vdpa {
int in_batch;
struct vdpa_iova_range range;
u32 batch_asid;
+   long pinned_vm;
 };
 
 static DEFINE_IDA(vhost_vdpa_ida);
@@ -893,6 +894,7 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
unpin_user_page(page);
}
atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm);
+   v->pinned_vm -= PFN_DOWN(map->size);
vhost_vdpa_general_unmap(v, map, asid);
vhost_iotlb_map_free(iotlb, map);
}
@@ -975,9 +977,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
return r;
}
 
-   if (!vdpa->use_va)
+   if (!vdpa->use_va) {
atomic64_add(PFN_DOWN(size), &dev->mm->pinned_vm);
-
+   v->pinned_vm += PFN_DOWN(size);
+   }
return 0;
 }
 
-- 
2.39.3

Re: [GIT PULL] hardening updates for v6.8-rc1

2024-01-10 Thread pr-tracker-bot

The pull request you sent on Mon, 8 Jan 2024 10:20:13 -0800:

> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
> tags/hardening-v6.8-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/120a201bd2ad0bffebdd2cf62c389dbba79bbfae

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

Re: [PATCH v6 01/12] cgroup/misc: Add per resource callbacks for CSS events

2024-01-10 Thread Jarkko Sakkinen

On Tue Jan 9, 2024 at 5:37 AM EET, Haitao Huang wrote:
> On Wed, 15 Nov 2023 14:25:59 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Mon Oct 30, 2023 at 8:20 PM EET, Haitao Huang wrote:
> >> From: Kristen Carlson Accardi 
> >>
> >> The misc cgroup controller (subsystem) currently does not perform
> >> resource type specific action for Cgroups Subsystem State (CSS) events:
> >> the 'css_alloc' event when a cgroup is created and the 'css_free' event
> >> when a cgroup is destroyed.
> >>
> >> Define callbacks for those events and allow resource providers to
> >> register the callbacks per resource type as needed. This will be
> >> utilized later by the EPC misc cgroup support implemented in the SGX
> >> driver.
> >>
> >> Also add per resource type private data for those callbacks to store and
> >> access resource specific data.
> >>
> >> Signed-off-by: Kristen Carlson Accardi 
> >> Co-developed-by: Haitao Huang 
> >> Signed-off-by: Haitao Huang 
> >> ---
> >> V6:
> >> - Create ops struct for per resource callbacks (Jarkko)
> >> - Drop max_write callback (Dave, Michal)
> >> - Style fixes (Kai)
> >> ---
> >>  include/linux/misc_cgroup.h | 14 ++
> >>  kernel/cgroup/misc.c| 27 ---
> >>  2 files changed, 38 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
> >> index e799b1f8d05b..5dc509c27c3d 100644
> >> --- a/include/linux/misc_cgroup.h
> >> +++ b/include/linux/misc_cgroup.h
> >> @@ -27,16 +27,30 @@ struct misc_cg;
> >>
> >>  #include 
> >>
> >> +/**
> >> + * struct misc_operations_struct: per resource callback ops.
> >> + * @alloc: invoked for resource specific initialization when cgroup is  
> >> allocated.
> >> + * @free: invoked for resource specific cleanup when cgroup is  
> >> deallocated.
> >> + */
> >> +struct misc_operations_struct {
> >> +  int (*alloc)(struct misc_cg *cg);
> >> +  void (*free)(struct misc_cg *cg);
> >> +};
> >
> > Maybe just misc_operations, or even misc_ops?
> >
>
> With Michal's suggestion to make ops per-resource-type, I'll rename this  
> misc_res_ops  (I was following vm_operations_struct as example)
>
> >> +
> >>  /**
> >>   * struct misc_res: Per cgroup per misc type resource
> >>   * @max: Maximum limit on the resource.
> >>   * @usage: Current usage of the resource.
> >>   * @events: Number of times, the resource limit exceeded.
> >> + * @priv: resource specific data.
> >> + * @misc_ops: resource specific operations.
> >>   */
> >>  struct misc_res {
> >>u64 max;
> >>atomic64_t usage;
> >>atomic64_t events;
> >> +  void *priv;
> >
> > priv is in the wrong patch; it just confuses the overall picture here.
> > Please move it to 04/12. Let's deal with the callbacks here.
> >
>
> Ok
>
> >> +  const struct misc_operations_struct *misc_ops;
> >>  };
> >
> > misc_ops would be at least consistent with this, as misc_res also has an
> > acronym.
> >
> >>
> >>  /**
> >> diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
> >> index 79a3717a5803..d971ede44ebf 100644
> >> --- a/kernel/cgroup/misc.c
> >> +++ b/kernel/cgroup/misc.c
> >> @@ -383,23 +383,37 @@ static struct cftype misc_cg_files[] = {
> >>  static struct cgroup_subsys_state *
> >>  misc_cg_alloc(struct cgroup_subsys_state *parent_css)
> >>  {
> >> +  struct misc_cg *parent_cg, *cg;
> >>enum misc_res_type i;
> >> -  struct misc_cg *cg;
> >> +  int ret;
> >>
> >>if (!parent_css) {
> >> -  cg = &root_cg;
> >> +  parent_cg = cg = &root_cg;
> >>} else {
> >>cg = kzalloc(sizeof(*cg), GFP_KERNEL);
> >>if (!cg)
> >>return ERR_PTR(-ENOMEM);
> >> +  parent_cg = css_misc(parent_css);
> >>}
> >>
> >>for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> >>WRITE_ONCE(cg->res[i].max, MAX_NUM);
> >>atomic64_set(&cg->res[i].usage, 0);
> >> +  if (parent_cg->res[i].misc_ops && 
> >> parent_cg->res[i].misc_ops->alloc)  
> >> {
> >> +  ret = parent_cg->res[i].misc_ops->alloc(cg);
> >> +  if (ret)
> >> +  goto alloc_err;
> >
> > The patch set only has a use case for both operations defined - any
> > partial combinations should never be allowed.
> >
> > To enforce this invariant you could create a set of operations (written
> > off the top of my head):
> >
> > static int misc_res_init(struct misc_res *res, struct misc_ops *misc_ops)
> > {
> > if (!misc_ops->alloc) {
> > pr_err("%s: alloc missing\n", __func__);
> > return -EINVAL;
> > }
> >
> > if (!misc_ops->free) {
> > pr_err("%s: free missing\n", __func__);
> > return -EINVAL;
> > }
> >
> > res->misc_ops = misc_ops;
> > return 0;
> > }
> >
> > static inline int misc_res_alloc(struct misc_cg *cg, struct misc_res  
> > *res)
> > {
> > int ret;
> >
> > if (!res->misc_ops)
> > return 0;
> > 
> > return res->misc_ops->alloc(cg);
> > }
> >
> >
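
The invariant proposed above (either both callbacks are present or no ops
are registered at all) can be tried out in a small userspace model. This is
a sketch under assumptions, not the kernel code: struct misc_res_ops uses
the name floated in the reply, struct misc_res is reduced to the one field
that matters here, and demo_alloc()/demo_free() are hypothetical callbacks
that exist only so the model can be exercised.

```c
#include <errno.h>
#include <stddef.h>

struct misc_cg;  /* opaque in this sketch */

/* Per resource callback table, modelled on the struct under review. */
struct misc_res_ops {
	int (*alloc)(struct misc_cg *cg);
	void (*free)(struct misc_cg *cg);
};

/* Reduced model of struct misc_res: only the ops pointer is kept. */
struct misc_res {
	const struct misc_res_ops *misc_ops;
};

/* Reject partial tables so that alloc and free are always paired. */
static int misc_res_init(struct misc_res *res, const struct misc_res_ops *ops)
{
	if (!ops || !ops->alloc || !ops->free)
		return -EINVAL;

	res->misc_ops = ops;
	return 0;
}

/* No ops registered means there is nothing to do for this resource. */
static int misc_res_alloc(struct misc_cg *cg, const struct misc_res *res)
{
	if (!res->misc_ops)
		return 0;

	return res->misc_ops->alloc(cg);
}

/* Hypothetical callbacks used only to exercise the model. */
static int demo_alloc(struct misc_cg *cg)
{
	(void)cg;
	return 0;
}

static void demo_free(struct misc_cg *cg)
{
	(void)cg;
}
```

Because misc_res_init() is the only place that sets misc_ops, callers such
as misc_res_alloc() need to test a single pointer rather than each callback.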

Re: [PATCH v1 5/5] documentation: Update on livepatch elf format

2024-01-10 Thread Marcos Paulo de Souza

On Mon, 2023-11-06 at 17:25 +0100, Lukas Hruska wrote:
> Add a section to Documentation/livepatch/module-elf-format.rst
> describing how klp-convert works for fixing relocations.
> 
> Signed-off-by: Lukas Hruska 

Reviewed-by: Marcos Paulo de Souza 

> ---
>  Documentation/livepatch/module-elf-format.rst | 67
> +++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/Documentation/livepatch/module-elf-format.rst
> b/Documentation/livepatch/module-elf-format.rst
> index a03ed02ec57e..2aa9b11cd806 100644
> --- a/Documentation/livepatch/module-elf-format.rst
> +++ b/Documentation/livepatch/module-elf-format.rst
> @@ -300,3 +300,70 @@ symbol table, and relocation section indices,
> ELF information is preserved for
>  livepatch modules and is made accessible by the module loader
> through
>  module->klp_info, which is a :c:type:`klp_modinfo` struct. When a
> livepatch module
>  loads, this struct is filled in by the module loader.
> +
> +6. klp-convert tool
> +===
> +The livepatch relocation sections might be created using
> +scripts/livepatch/klp-convert. It is called automatically during
> +the build as part of a module post processing.
> +
> +The tool is not able to find the symbols and all the metadata
> +automatically. Instead, all needed information must already be
> +part of rela entry for the given symbol. Such a rela can
> +be created easily by using KLP_RELOC_SYMBOL() macro after
> +the symbol declaration.
> +
> +KLP_RELOC_SYMBOL causes that the relocation entries for
> +the given symbol will be created in the following format::
> +
> +  .klp.sym.rela.lp_object.sym_object.sym_name,sympos
> +  ^           ^ ^       ^ ^        ^ ^      ^ ^
> +  |___________| |_______| |________| |______| |
> +       [A]         [B]       [C]       [D]   [E]
> +
> +[A]
> +  The symbol name is prefixed with the string ".klp.sym.rela."
> +
> +[B]
> +  The name of the object (i.e. "vmlinux" or name of module) which
> +  is livepatched.
> +
> +[C]
> +  The name of the object (i.e. "vmlinux" or name of module) to
> +  which the symbol belongs follows immediately after the prefix.
> +
> +[D]
> +  The actual name of the symbol.
> +
> +[E]
> +  The position of the symbol in the object (as according to
> kallsyms)
> +  This is used to differentiate duplicate symbols within the same
> +  object. The symbol position is expressed numerically (0, 1, 2...).
> +  The symbol position of a unique symbol is 0.
> +
> +Example:
> +
> +**Livepatch source code:**
> +
> +::
> +
> +  extern char *saved_command_line \
> + KLP_RELOC_SYMBOL(vmlinux, vmlinux,
> saved_command_line, 0);
> +
> +**`readelf -r -W` output of compiled module:**
> +
> +::
> +
> +  Relocation section '.rela.text' at offset 0x32e60 contains 10
> entries:
> +  Offset Info Type  
> Symbol's Value  Symbol's Name + Addend
> +  ...
> +  0068  003c0002 R_X86_64_PC32 
>  .klp.sym.rela.vmlinux.vmlinux.saved_command_line,0 -
> 4
> +  ...
> +
> +**`readelf -r -W` output of transformed module by klp-convert:**
> +
> +::
> +
> +  Relocation section '.klp.rela.vmlinux.text' at offset 0x5cb60
> contains 1 entry:
> +  Offset Info Type  
> Symbol's Value  Symbol's Name + Addend
> +  0068  003c0002 R_X86_64_PC32 
>  .klp.sym.vmlinux.saved_command_line,0 - 4
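
The symbol naming scheme documented above can be illustrated with a short
formatter. This is a sketch only: klp_rela_sym_name() and
KLP_SYM_RELA_PREFIX are hypothetical names, not part of klp-convert, and the
field order simply follows the [A] to [E] description in the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Prefix [A] from the format description. */
#define KLP_SYM_RELA_PREFIX ".klp.sym.rela."

/*
 * Hypothetical helper: build the name
 *   .klp.sym.rela.<lp_object>.<sym_object>.<sym_name>,<sympos>
 * into buf.  Returns the snprintf() result, i.e. the length the full
 * name needs, so the caller can detect truncation.
 */
static int klp_rela_sym_name(char *buf, size_t len, const char *lp_object,
			     const char *sym_object, const char *sym_name,
			     unsigned long sympos)
{
	return snprintf(buf, len, KLP_SYM_RELA_PREFIX "%s.%s.%s,%lu",
			lp_object, sym_object, sym_name, sympos);
}
```

Called with "vmlinux", "vmlinux", "saved_command_line" and position 0, it
yields the same symbol name that appears in the first readelf listing.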

Re: [PATCH v1 3/5] kbuild/modpost: integrate klp-convert

2024-01-10 Thread Marcos Paulo de Souza

On Mon, 2023-11-06 at 17:25 +0100, Lukas Hruska wrote:
> From: Josh Poimboeuf 
> 
> Update the modpost program so that it does not warn about unresolved
> symbols matching the expected format which will be then resolved by
> klp-convert.
> 
> Signed-off-by: Josh Poimboeuf 
> Signed-off-by: Lukas Hruska 

Reviewed-by: Marcos Paulo de Souza 

(The patch currently conflicts with Linus' tree on Makefile and
modpost.c, but that is nothing to worry about, AFAICS)

> ---
>  .gitignore    |  1 +
>  Makefile  | 10 ++
>  scripts/Makefile.modfinal | 15 +++
>  scripts/Makefile.modpost  |  5 +
>  scripts/mod/modpost.c | 36 ++--
>  scripts/mod/modpost.h |  3 +++
>  6 files changed, 64 insertions(+), 6 deletions(-)
> 
> diff --git a/.gitignore b/.gitignore
> index 9fd4c9533b3d..628caf76b617 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -69,6 +69,7 @@ modules.order
>  /Module.markers
>  /modules.builtin
>  /modules.builtin.modinfo
> +/modules.livepatch
>  /modules.nsdeps
>  
>  #
> diff --git a/Makefile b/Makefile
> index 2fdd8b40b7e0..459b9c9fe0a8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1185,6 +1185,7 @@ PHONY += prepare0
>  export extmod_prefix = $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/)
>  export MODORDER := $(extmod_prefix)modules.order
>  export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
> +export MODULES_LIVEPATCH := $(extmod-prefix)modules.livepatch
>  
>  ifeq ($(KBUILD_EXTMOD),)
>  
> @@ -1535,8 +1536,8 @@ endif
>  #
>  
>  # *.ko are usually independent of vmlinux, but
> CONFIG_DEBUG_INFO_BTF_MODULES
> -# is an exception.
> -ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> +# and CONFIG_LIVEPATCH are exceptions.
> +ifneq ($(or $(CONFIG_DEBUG_INFO_BTF_MODULES),$(CONFIG_LIVEPATCH)),)
>  KBUILD_BUILTIN := 1
>  modules: vmlinux
>  endif
> @@ -1595,8 +1596,9 @@ endif
>  # Directories & files removed with 'make clean'
>  CLEAN_FILES += include/ksym vmlinux.symvers modules-only.symvers \
>      modules.builtin modules.builtin.modinfo
> modules.nsdeps \
> -    compile_commands.json .thinlto-cache rust/test
> rust/doc \
> -    rust-project.json .vmlinux.objs .vmlinux.export.c
> +    modules.livepatch compile_commands.json .thinlto-
> cache \
> +    rust/test rust/doc rust-project.json .vmlinux.objs \
> +    .vmlinux.export.c
>  
>  # Directories & files removed with 'make mrproper'
>  MRPROPER_FILES += include/config include/generated  \
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index fc19f67039bd..155d07476a2c 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -14,6 +14,7 @@ include $(srctree)/scripts/Makefile.lib
>  
>  # find all modules listed in modules.order
>  modules := $(call read-file, $(MODORDER))
> +modules-klp := $(call read-file, $(MODULES_LIVEPATCH))
>  
>  __modfinal: $(modules:%.o=%.ko)
>   @:
> @@ -65,6 +66,20 @@ endif
>  
>  targets += $(modules:%.o=%.ko) $(modules:%.o=%.mod.o)
>  
> +# Livepatch
> +# --
> -
> +
> +%.tmp.ko: %.o %.mod.o FORCE
> + +$(call if_changed,ld_ko_o)
> +
> +quiet_cmd_klp_convert = KLP $@
> +  cmd_klp_convert = scripts/livepatch/klp-convert $< $@
> +
> +$(modules-klp:%.o=%.ko): %.ko: %.tmp.ko FORCE
> + $(call if_changed,klp_convert)
> +
> +targets += $(modules-klp:.ko=.tmp.ko)
> +
>  # Add FORCE to the prequisites of a target to force it to be always
> rebuilt.
>  # --
> -
>  
> diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
> index 39472e834b63..c757f5eddc3e 100644
> --- a/scripts/Makefile.modpost
> +++ b/scripts/Makefile.modpost
> @@ -47,6 +47,7 @@ modpost-args
> = 
> \
>   $(if $(KBUILD_MODPOST_WARN),-
> w)\
>   $(if $(KBUILD_NSDEPS),-d
> $(MODULES_NSDEPS))\
>   $(if
> $(CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS)$(KBUILD_NSDEPS),-
> N)\
> + $(if $(CONFIG_LIVEPATCH),-l
> $(MODULES_LIVEPATCH)) \
>   $(if $(findstring 1, $(KBUILD_EXTRA_WARN)),-
> W)\
>   -o $@
>  
> @@ -144,6 +145,10 @@ $(output-symdump): $(modpost-deps) FORCE
>   $(call if_changed,modpost)
>  
>  __modpost: $(output-symdump)
> +ifndef CONFIG_LIVEPATCH
> + $(Q)rm -f $(MODULES_LIVEPATCH)
> + $(Q)touch $(MODULES_LIVEPATCH)
> +endif
>  PHONY += FORCE
>  FORCE:
>  
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index b29b29707f10..f6afa2e10601 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1733,6 +1733,10 @@ static void read_symbols(const char *modname)
>   }
>   }
>  
> + /* Livepatch modules have unresolved symbols

Re: [PATCH 2/2] arm64: dts: qcom: sm7225-fairphone-fp4: Add PM6150L thermals

2024-01-10 Thread Konrad Dybcio

On 1/9/24 12:24, Luca Weiss wrote:

On Tue Jan 9, 2024 at 11:09 AM CET, Konrad Dybcio wrote:



On 1/5/24 15:54, Luca Weiss wrote:

Configure the thermals for the PA_THERM1, MSM_THERM, PA_THERM0,
RFC_CAM_THERM, CAM_FLASH_THERM and QUIET_THERM thermistors connected to
PM6150L.

Due to hardware constraints we can only register 4 zones with
pm6150l_adc_tm, the other 2 we can register via generic-adc-thermal.


Ugh.. so the ADC can support more inputs than the ADC_TM that was
designed to ship alongside it can?

And that's why the "generic-adc-thermal"-provided zones need to
be polled?


This part of the code from qcom-spmi-adc-tm5.c was triggering if I
define more than 4 channels, and looking at downstream I can also see
that only 4 zones are registered properly with adc_tm, the rest is
registered with "qcom,adc-tm5-iio" which skips from what I could tell
basically all the HW bits and only registering the thermal zone.


ret = adc_tm5_read(chip, ADC_TM5_NUM_BTM,
   &channels_available, sizeof(channels_available));
if (ret) {
dev_err(chip->dev, "read failed for BTM channels\n");
return ret;
}

for (i = 0; i < chip->nchannels; i++) {
if (chip->channels[i].channel >= channels_available) {
dev_err(chip->dev, "Invalid channel %d\n", 
chip->channels[i].channel);
return -EINVAL;
}
}

The trip points can really only be considered as placeholders, more
configuration with cooling etc. can be added later.

Signed-off-by: Luca Weiss 
---

[...]

I've read the sentence above, but..

+   sdm-skin-thermal {
+   polling-delay-passive = <1000>;
+   polling-delay = <5000>;
+   thermal-sensors = <_therm_sensor>;
+
+   trips {
+   active-config0 {
+   temperature = <125000>;
+   hysteresis = <1000>;
+   type = "passive";


I don't fancy burnt fingers for dinner!


With passive trip point it wouldn't even do anything now, but at what
temp do you think it should do what? I'd definitely need more time to
understand more of how the thermal setup works in downstream Android,
and then replicate a sane configuration for mainline with proper
temperatures, cooling, etc.

If "skin therm" means "the temperature of some part of the phone's
body that can be felt with a human hand", then definitely some
throttling should happen at 40ish with heavy throttling at 50
and crit at 55 or so..

We should probably make this a broader topic and keep a single
policy for all supported phones.

+ CC AGdR, may be interested in where this leads

Konrad

[PATCH 9/9] MAINTAINERS: add myself as Marvell PXA1908 maintainer

2024-01-10 Thread Duje Mihanović

Add myself as the maintainer for Marvell PXA1908 SoC support.

Signed-off-by: Duje Mihanović 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bcacd665f259..374df772aeff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2344,6 +2344,15 @@ F:   drivers/irqchip/irq-mvebu-*
 F: drivers/pinctrl/mvebu/
 F: drivers/rtc/rtc-armada38x.c
 
+ARM/Marvell PXA1908 SOC support
+M: Duje Mihanović 
+L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+T: git https://gitlab.com/LegoLivesMatter/linux
+F: arch/arm64/boot/dts/marvell/pxa1908*
+F: drivers/clk/mmp/clk-of-pxa1908.c
+F: include/dt-bindings/clock/marvell,pxa1908.h
+
 ARM/Mediatek RTC DRIVER
 M: Eddie Huang 
 M: Sean Wang 
-- 
2.43.0

[PATCH v8 8/9] arm64: dts: Add DTS for Marvell PXA1908 and samsung,coreprimevelte

2024-01-10 Thread Duje Mihanović

Add DTS for Marvell PXA1908 SoC and Samsung Galaxy Core Prime Value
Edition LTE, a smartphone based on said SoC.

Signed-off-by: Duje Mihanović 
---
 arch/arm64/boot/dts/marvell/Makefile   |   3 +
 .../dts/marvell/pxa1908-samsung-coreprimevelte.dts | 336 +
 arch/arm64/boot/dts/marvell/pxa1908.dtsi   | 304 +++
 3 files changed, 643 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/Makefile b/arch/arm64/boot/dts/marvell/Makefile
index 99b8cb3c49e1..687c256d95fe 100644
--- a/arch/arm64/boot/dts/marvell/Makefile
+++ b/arch/arm64/boot/dts/marvell/Makefile
@@ -28,3 +28,6 @@ dtb-$(CONFIG_ARCH_MVEBU) += cn9130-crb-A.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += cn9130-crb-B.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += ac5x-rd-carrier-cn9131.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += ac5-98dx35xx-rd.dtb
+
+# MMP SoC Family
+dtb-$(CONFIG_ARCH_MMP) += pxa1908-samsung-coreprimevelte.dtb
diff --git a/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts b/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts
new file mode 100644
index ..4aac4c120087
--- /dev/null
+++ b/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "pxa1908.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Samsung Galaxy Core Prime VE LTE";
+   compatible = "samsung,coreprimevelte", "marvell,pxa1908";
+
+   aliases {
+   mmc0 =  /* eMMC */
+   mmc1 =  /* SD card */
+   serial0 = 
+   };
+
+   chosen {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   stdout-path = "serial0:115200n8";
+
+   /* S-Boot places the initramfs here */
+   linux,initrd-start = <0x4d7>;
+   linux,initrd-end = <0x500>;
+
+   fb0: framebuffer@17177000 {
+   compatible = "simple-framebuffer";
+   reg = <0 0x17177000 0 (480 * 800 * 4)>;
+   width = <480>;
+   height = <800>;
+   stride = <(480 * 4)>;
+   format = "a8r8g8b8";
+   };
+   };
+
+   /* Bootloader fills this in */
+   memory {
+   device_type = "memory";
+   reg = <0 0 0 0>;
+   };
+
+   reserved-memory {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   framebuffer@1700 {
+   reg = <0 0x1700 0 0x180>;
+   no-map;
+   };
+
+   gpu@900 {
+   reg = <0 0x900 0 0x100>;
+   };
+
+   /* Communications processor, aka modem */
+   cp@500 {
+   reg = <0 0x500 0 0x300>;
+   };
+
+   cm3@a00 {
+   reg = <0 0xa00 0 0x8>;
+   };
+
+   seclog@800 {
+   reg = <0 0x800 0 0x10>;
+   };
+
+   ramoops@810 {
+   compatible = "ramoops";
+   reg = <0 0x810 0 0x4>;
+   record-size = <0x8000>;
+   console-size = <0x2>;
+   max-reason = <5>;
+   };
+   };
+
+
+   i2c-muic {
+   compatible = "i2c-gpio";
+   sda-gpios = < 30 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+   scl-gpios = < 29 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+   i2c-gpio,delay-us = <3>;
+   i2c-gpio,timeout-ms = <100>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_muic_pins>;
+
+   muic: extcon@14 {
+   compatible = "siliconmitus,sm5504-muic";
+   reg = <0x14>;
+   interrupt-parent = <>;
+   interrupts = <0 IRQ_TYPE_EDGE_FALLING>;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+   pinctrl-names = "default";
+   pinctrl-0 = <_keys_pins>;
+   autorepeat;
+
+   key-home {
+   label = "Home";
+   linux,code = ;
+   gpios = < 50 GPIO_ACTIVE_LOW>;
+   };
+
+   key-volup {
+   label = "Volume Up";
+   linux,code = ;
+   gpios = < 16 GPIO_ACTIVE_LOW>;
+   };
+
+   key-voldown {
+   label = "Volume Down";
+   linux,code = ;
+   gpios = < 17 GPIO_ACTIVE_LOW>;
+   };
+   };
+};
+
+ {
+   status = "okay";
+};
+
+ {
+

[PATCH v8 7/9] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform

2024-01-10 Thread Duje Mihanović

Add ARCH_MMP configuration option for Marvell PXA1908 SoC.

Signed-off-by: Duje Mihanović 
---
 arch/arm64/Kconfig.platforms | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 24335565bad5..d71b0b6e75aa 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -168,6 +168,14 @@ config ARCH_MESON
  This enables support for the arm64 based Amlogic SoCs
  such as the s905, S905X/D, S912, A113X/D or S905X/D2
 
+config ARCH_MMP
+   bool "Marvell MMP SoC Family"
+   select PINCTRL
+   select PINCTRL_SINGLE
+   help
+ This enables support for Marvell MMP SoC family, currently
+ supporting PXA1908 aka IAP140.
+
 config ARCH_MVEBU
bool "Marvell EBU SoC Family"
select ARMADA_AP806_SYSCON

-- 
2.43.0

[PATCH v8 5/9] clk: mmp: Add Marvell PXA1908 clock driver

2024-01-10 Thread Duje Mihanović

Add driver for Marvell PXA1908 clock controller blocks. The SoC has
numerous clock controller blocks, currently supporting APBC, APBCP, MPMU
and APMU.

Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/Makefile |   2 +-
 drivers/clk/mmp/clk-of-pxa1908.c | 328 +++
 2 files changed, 329 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index 441bf83080a1..69f9c3afde83 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -11,4 +11,4 @@ obj-$(CONFIG_MACH_MMP_DT) += clk-of-pxa168.o clk-of-pxa910.o
 obj-$(CONFIG_COMMON_CLK_MMP2) += clk-of-mmp2.o clk-pll.o pwr-island.o
 obj-$(CONFIG_COMMON_CLK_MMP2_AUDIO) += clk-audio.o
 
-obj-y += clk-of-pxa1928.o
+obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-of-pxa1908.o
diff --git a/drivers/clk/mmp/clk-of-pxa1908.c b/drivers/clk/mmp/clk-of-pxa1908.c
new file mode 100644
index ..6f1f6e25a718
--- /dev/null
+++ b/drivers/clk/mmp/clk-of-pxa1908.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+
+#define APMU_CLK_GATE_CTRL 0x40
+#define MPMU_UART_PLL  0x14
+
+#define APBC_UART0 0x0
+#define APBC_UART1 0x4
+#define APBC_GPIO  0x8
+#define APBC_PWM0  0xc
+#define APBC_PWM1  0x10
+#define APBC_PWM2  0x14
+#define APBC_PWM3  0x18
+#define APBC_SSP0  0x1c
+#define APBC_SSP1  0x20
+#define APBC_IPC_RST   0x24
+#define APBC_RTC   0x28
+#define APBC_TWSI0 0x2c
+#define APBC_KPC   0x30
+#define APBC_SWJTAG 0x40
+#define APBC_SSP2  0x4c
+#define APBC_TWSI1 0x60
+#define APBC_THERMAL   0x6c
+#define APBC_TWSI3 0x70
+
+#define APBCP_UART2 0x1c
+#define APBCP_TWSI2 0x28
+#define APBCP_AICER 0x38
+
+#define APMU_CCIC1 0x24
+#define APMU_ISP   0x38
+#define APMU_DSI1  0x44
+#define APMU_DISP1 0x4c
+#define APMU_CCIC0 0x50
+#define APMU_SDH0  0x54
+#define APMU_SDH1  0x58
+#define APMU_USB   0x5c
+#define APMU_NF 0x60
+#define APMU_VPU   0xa4
+#define APMU_GC 0xcc
+#define APMU_SDH2  0xe0
+#define APMU_GC2D  0xf4
+#define APMU_TRACE 0x108
+#define APMU_DVC_DFC_DEBUG 0x140
+
+#define MPMU_NR_CLKS   39
+#define APBC_NR_CLKS   19
+#define APBCP_NR_CLKS  4
+#define APMU_NR_CLKS   17
+
+struct pxa1908_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *mpmu_base;
+   void __iomem *apmu_base;
+   void __iomem *apbc_base;
+   void __iomem *apbcp_base;
+   void __iomem *apbs_base;
+   void __iomem *ciu_base;
+};
+
+static struct mmp_param_fixed_rate_clk fixed_rate_clks[] = {
+   {PXA1908_CLK_CLK32, "clk32", NULL, 0, 32768},
+   {PXA1908_CLK_VCTCXO, "vctcxo", NULL, 0, 26 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_624, "pll1_624", NULL, 0, 624 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_416, "pll1_416", NULL, 0, 416 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_499, "pll1_499", NULL, 0, 499 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_832, "pll1_832", NULL, 0, 832 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_1248, "pll1_1248", NULL, 0, 1248 * HZ_PER_MHZ},
+};
+
+static struct mmp_param_fixed_factor_clk fixed_factor_clks[] = {
+   {PXA1908_CLK_PLL1_D2, "pll1_d2", "pll1_624", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D4, "pll1_d4", "pll1_d2", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D6, "pll1_d6", "pll1_d2", 1, 3, 0},
+   {PXA1908_CLK_PLL1_D8, "pll1_d8", "pll1_d4", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D12, "pll1_d12", "pll1_d6", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D13, "pll1_d13", "pll1_624", 1, 13, 0},
+   {PXA1908_CLK_PLL1_D16, "pll1_d16", "pll1_d8", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D24, "pll1_d24", "pll1_d12", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D48, "pll1_d48", "pll1_d24", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D96, "pll1_d96", "pll1_d48", 1, 2, 0},
+   {PXA1908_CLK_PLL1_32, "pll1_32", "pll1_d13", 2, 3, 0},
+   {PXA1908_CLK_PLL1_208, "pll1_208", "pll1_d2", 2, 3, 0},
+   {PXA1908_CLK_PLL1_117, "pll1_117", "pll1_624", 3, 16, 0},
+};
+
+static struct mmp_clk_factor_masks uart_factor_masks = {
+   .factor = 2,
+   .num_mask = GENMASK(12, 0),
+   .den_mask = GENMASK(12, 0),
+   .num_shift = 16,
+   .den_shift = 0,
+};
+
+static struct u32_fract uart_factor_tbl[] = {
+   {.numerator = 8125, .denominator = 1536},   /* 14.745MHz */
+};
+
+static DEFINE_SPINLOCK(pll1_lock);
+static struct mmp_param_general_gate_clk pll1_gate_clks[] = {
+   {PXA1908_CLK_PLL1_D2_GATE, "pll1_d2_gate", "pll1_d2", 0, APMU_CLK_GATE_CTRL, 29, 0, &pll1_lock},
+

[PATCH v8 4/9] dt-bindings: clock: Add Marvell PXA1908 clock bindings

2024-01-10 Thread Duje Mihanović

Add dt bindings and documentation for the Marvell PXA1908 clock
controller.

Reviewed-by: Conor Dooley 
Signed-off-by: Duje Mihanović 
---
 .../devicetree/bindings/clock/marvell,pxa1908.yaml | 48 
 include/dt-bindings/clock/marvell,pxa1908.h| 88 ++
 2 files changed, 136 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml b/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
new file mode 100644
index ..4e78933232b6
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/marvell,pxa1908.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Marvell PXA1908 Clock Controllers
+
+maintainers:
+  - Duje Mihanović 
+
+description: |
+  The PXA1908 clock subsystem generates and supplies clock to various
+  controllers within the PXA1908 SoC. The PXA1908 contains numerous clock
+  controller blocks, with the ones currently supported being APBC, APBCP, MPMU
+  and APMU roughly corresponding to internal buses.
+
+  All these clock identifiers could be found in 
.
+
+properties:
+  compatible:
+enum:
+  - marvell,pxa1908-apbc
+  - marvell,pxa1908-apbcp
+  - marvell,pxa1908-mpmu
+  - marvell,pxa1908-apmu
+
+  reg:
+maxItems: 1
+
+  '#clock-cells':
+const: 1
+
+required:
+  - compatible
+  - reg
+  - '#clock-cells'
+
+additionalProperties: false
+
+examples:
+  # APMU block:
+  - |
+clock-controller@d4282800 {
+  compatible = "marvell,pxa1908-apmu";
+  reg = <0xd4282800 0x400>;
+  #clock-cells = <1>;
+};
diff --git a/include/dt-bindings/clock/marvell,pxa1908.h b/include/dt-bindings/clock/marvell,pxa1908.h
new file mode 100644
index ..fb15b0d0cd4c
--- /dev/null
+++ b/include/dt-bindings/clock/marvell,pxa1908.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
+#ifndef __DTS_MARVELL_PXA1908_CLOCK_H
+#define __DTS_MARVELL_PXA1908_CLOCK_H
+
+/* plls */
+#define PXA1908_CLK_CLK32  1
+#define PXA1908_CLK_VCTCXO 2
+#define PXA1908_CLK_PLL1_624   3
+#define PXA1908_CLK_PLL1_416   4
+#define PXA1908_CLK_PLL1_499   5
+#define PXA1908_CLK_PLL1_832   6
+#define PXA1908_CLK_PLL1_1248  7
+#define PXA1908_CLK_PLL1_D2 8
+#define PXA1908_CLK_PLL1_D4 9
+#define PXA1908_CLK_PLL1_D8 10
+#define PXA1908_CLK_PLL1_D16 11
+#define PXA1908_CLK_PLL1_D6 12
+#define PXA1908_CLK_PLL1_D12 13
+#define PXA1908_CLK_PLL1_D24 14
+#define PXA1908_CLK_PLL1_D48 15
+#define PXA1908_CLK_PLL1_D96 16
+#define PXA1908_CLK_PLL1_D13 17
+#define PXA1908_CLK_PLL1_32 18
+#define PXA1908_CLK_PLL1_208 19
+#define PXA1908_CLK_PLL1_117 20
+#define PXA1908_CLK_PLL1_416_GATE 21
+#define PXA1908_CLK_PLL1_624_GATE 22
+#define PXA1908_CLK_PLL1_832_GATE 23
+#define PXA1908_CLK_PLL1_1248_GATE 24
+#define PXA1908_CLK_PLL1_D2_GATE 25
+#define PXA1908_CLK_PLL1_499_EN 26
+#define PXA1908_CLK_PLL2VCO 27
+#define PXA1908_CLK_PLL2 28
+#define PXA1908_CLK_PLL2P 29
+#define PXA1908_CLK_PLL2VCODIV3 30
+#define PXA1908_CLK_PLL3VCO 31
+#define PXA1908_CLK_PLL3 32
+#define PXA1908_CLK_PLL3P 33
+#define PXA1908_CLK_PLL3VCODIV3 34
+#define PXA1908_CLK_PLL4VCO 35
+#define PXA1908_CLK_PLL4 36
+#define PXA1908_CLK_PLL4P 37
+#define PXA1908_CLK_PLL4VCODIV3 38
+
+/* apb (apbc) peripherals */
+#define PXA1908_CLK_UART0  1
+#define PXA1908_CLK_UART1  2
+#define PXA1908_CLK_GPIO   3
+#define PXA1908_CLK_PWM0   4
+#define PXA1908_CLK_PWM1   5
+#define PXA1908_CLK_PWM2   6
+#define PXA1908_CLK_PWM3   7
+#define PXA1908_CLK_SSP0   8
+#define PXA1908_CLK_SSP1   9
+#define PXA1908_CLK_IPC_RST 10
+#define PXA1908_CLK_RTC 11
+#define PXA1908_CLK_TWSI0 12
+#define PXA1908_CLK_KPC 13
+#define PXA1908_CLK_SWJTAG 14
+#define PXA1908_CLK_SSP2 15
+#define PXA1908_CLK_TWSI1 16
+#define PXA1908_CLK_THERMAL 17
+#define PXA1908_CLK_TWSI3  18
+
+/* apb (apbcp) peripherals */
+#define PXA1908_CLK_UART2  1
+#define PXA1908_CLK_TWSI2  2
+#define PXA1908_CLK_AICER  3
+
+/* axi (apmu) peripherals */
+#define PXA1908_CLK_CCIC1  1
+#define PXA1908_CLK_ISP 2
+#define PXA1908_CLK_DSI1   3
+#define PXA1908_CLK_DISP1  4
+#define PXA1908_CLK_CCIC0

[PATCH v8 3/9] pinctrl: single: add marvell,pxa1908-padconf compatible

2024-01-10 Thread Duje Mihanović

Add the "marvell,pxa1908-padconf" compatible to allow migrating to a
separate pinctrl driver later.

Signed-off-by: Duje Mihanović 
---
 drivers/pinctrl/pinctrl-single.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 19cc0db771a5..c15bf3cbabd7 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -1967,6 +1967,7 @@ static const struct pcs_soc_data pinconf_single = {
 };
 
 static const struct of_device_id pcs_of_match[] = {
+   { .compatible = "marvell,pxa1908-padconf", .data = &pinconf_single },
{ .compatible = "ti,am437-padconf", .data = _single_am437x },
{ .compatible = "ti,am654-padconf", .data = _single_am654 },
{ .compatible = "ti,dra7-padconf", .data = _single_dra7 },

-- 
2.43.0

[PATCH v8 2/9] dt-bindings: pinctrl: pinctrl-single: add marvell,pxa1908-padconf compatible

2024-01-10 Thread Duje Mihanović

Add the "marvell,pxa1908-padconf" compatible to allow migrating to a
separate pinctrl driver later.

Reviewed-by: Rob Herring 
Signed-off-by: Duje Mihanović 
---
 Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
index c11495524dd2..1ce24ad8bc73 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
@@ -33,6 +33,10 @@ properties:
   - ti,omap5-padconf
   - ti,j7200-padconf
   - const: pinctrl-single
+  - items:
+  - enum:
+  - marvell,pxa1908-padconf
+  - const: pinconf-single
 
   reg:
 maxItems: 1

-- 
2.43.0

[PATCH v8 0/9] Initial Marvell PXA1908 support

2024-01-10 Thread Duje Mihanović

Hello,

This series adds initial support for the Marvell PXA1908 SoC and
"samsung,coreprimevelte", a smartphone using the SoC.

Unlike the previous revisions which are based on -rc tags, this revision
is based on next-20240110 as it requires commits 67508b874844 ("ASoC:
pxa: sspa: Don't select SND_ARM") and 6db359b5eef5 ("soc: pxa: ssp: fix
casts") from the linux-next tree to compile successfully with
allyesconfig/allmodconfig.

USB works and the phone can boot a rootfs from an SD card, but there are
some warnings in the dmesg:

During SMP initialization:
[0.006519] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU1: 0x00
[0.006542] CPU features: Unsupported CPU feature variation detected.
[0.006589] CPU1: Booted secondary processor 0x01 [0x410fd032]
[0.010710] Detected VIPT I-cache on CPU2
[0.010716] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU2: 0x00
[0.010758] CPU2: Booted secondary processor 0x02 [0x410fd032]
[0.014849] Detected VIPT I-cache on CPU3
[0.014855] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU3: 0x00
[0.014895] CPU3: Booted secondary processor 0x03 [0x410fd032]

SMMU probing fails:
[0.101798] arm-smmu c001.iommu: probing hardware configuration...
[0.101809] arm-smmu c001.iommu: SMMUv1 with:
[0.101816] arm-smmu c001.iommu: no translation support!

A 3.14 based Marvell tree is available on GitHub
acorn-marvell/brillo_pxa_kernel, and a Samsung one on GitHub
CoderCharmander/g361f-kernel.

Andreas Färber attempted to upstream support for this SoC in 2017:
https://lore.kernel.org/lkml/20170222022929.10540-1-afaer...@suse.de/

Signed-off-by: Duje Mihanović 

Changes in v8:
- Drop SSPA patch
- Drop broken-cd from eMMC node
- Specify S-Boot hardcoded initramfs location in device tree
- Add ARM PMU node
- Correct inverted modem memory base and size
- Update trailers
- Rebase on next-20240110
- Link to v7: 
https://lore.kernel.org/20231102-pxa1908-lkml-v7-0-cabb1a0cb...@skole.hr
  and https://lore.kernel.org/20231102152033.5511-1-duje.mihano...@skole.hr

Changes in v7:
- Suppress SND_MMP_SOC_SSPA on ARM64
- Update trailers
- Rebase on v6.6-rc7
- Link to v6: 
https://lore.kernel.org/r/20231010-pxa1908-lkml-v6-0-b2fe09240...@skole.hr

Changes in v6:
- Address maintainer comments:
  - Add "marvell,pxa1908-padconf" binding to pinctrl-single driver
- Drop GPIO patch as it's been pulled
- Update trailers
- Rebase on v6.6-rc5
- Link to v5: 
https://lore.kernel.org/r/20230812-pxa1908-lkml-v5-0-a5d51937e...@skole.hr

Changes in v5:
- Address maintainer comments:
  - Move *_NR_CLKS to clock driver from dt binding file
- Allocate correct number of clocks for each block instead of blindly
  allocating 50 for each
- Link to v4: 
https://lore.kernel.org/r/20230807-pxa1908-lkml-v4-0-cb387d73b...@skole.hr

Changes in v4:
- Address maintainer comments:
  - Relicense clock binding file to BSD-2
- Add pinctrl-names to SD card node
- Add vgic registers to GIC node
- Rebase on v6.5-rc5
- Link to v3: 
https://lore.kernel.org/r/20230804-pxa1908-lkml-v3-0-8e48fca37...@skole.hr

Changes in v3:
- Address maintainer comments:
  - Drop GPIO dynamic allocation patch
  - Move clock register offsets into driver (instead of bindings file)
  - Add missing Tested-by trailer to u32_fract patch
  - Move SoC binding to arm/mrvl/mrvl.yaml
- Add serial0 alias and stdout-path to board dts to enable UART
  debugging
- Rebase on v6.5-rc4
- Link to v2: 
https://lore.kernel.org/r/20230727162909.6031-1-duje.mihano...@skole.hr

Changes in v2:
- Remove earlycon patch as it's been merged into tty-next
- Address maintainer comments:
  - Clarify GPIO regressions on older PXA platforms
  - Add Fixes tag to commit disabling GPIO pinctrl calls for this SoC
  - Add missing includes to clock driver
  - Clock driver uses HZ_PER_MHZ, u32_fract and GENMASK
  - Dual license clock bindings
  - Change clock IDs to decimal
  - Fix underscores in dt node names
  - Move chosen node to top of board dts
  - Clean up documentation
  - Reorder commits
  - Drop pxa,rev-id
- Rename muic-i2c to i2c-muic
- Reword some commits
- Move framebuffer node to chosen
- Add aliases for mmc nodes
- Rebase on v6.5-rc3
- Link to v1: 
https://lore.kernel.org/r/20230721210042.21535-1-duje.mihano...@skole.hr

---
Andy Shevchenko (1):
  clk: mmp: Switch to use struct u32_fract instead of custom one

Duje Mihanović (8):
  dt-bindings: pinctrl: pinctrl-single: add marvell,pxa1908-padconf 
compatible
  pinctrl: single: add marvell,pxa1908-padconf compatible
  dt-bindings: clock: Add Marvell PXA1908 clock bindings
  clk: mmp: Add Marvell PXA1908 clock driver
  dt-bindings: marvell: Document PXA1908 SoC
  arm64: Kconfig.platforms:

[PATCH v8 1/9] clk: mmp: Switch to use struct u32_fract instead of custom one

2024-01-10 Thread Duje Mihanović

From: Andy Shevchenko 

The struct mmp_clk_factor_tbl repeats the generic struct u32_fract.
Kill the custom one and use the generic one instead.

Signed-off-by: Andy Shevchenko 
Tested-by: Duje Mihanović 
Reviewed-by: Linus Walleij 
Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/clk-frac.c   | 57 
 drivers/clk/mmp/clk-of-mmp2.c| 26 +-
 drivers/clk/mmp/clk-of-pxa168.c  |  4 +--
 drivers/clk/mmp/clk-of-pxa1928.c |  6 ++---
 drivers/clk/mmp/clk-of-pxa910.c  |  4 +--
 drivers/clk/mmp/clk.h| 10 +++
 6 files changed, 51 insertions(+), 56 deletions(-)
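
For readers following the arithmetic in clk-frac.c, here is a small userspace
sketch of the rate computation after the conversion. The struct mirrors the
generic u32_fract layout (numerator, denominator); mmp_factor_rate() and the
624 MHz parent rate used in the note below are assumptions made for
illustration, and the extra guard against a zero divisor is mine (the driver
code in this patch only checks the denominator).

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's generic struct u32_fract. */
struct u32_fract {
	uint32_t numerator;
	uint32_t denominator;
};

/*
 * Hypothetical helper mirroring the formula used in the patch:
 *   rate = parent_rate * denominator / (numerator * factor)
 * Returns 0 when the fraction cannot be evaluated.
 */
uint64_t mmp_factor_rate(uint64_t parent_rate, struct u32_fract d,
			 unsigned int factor)
{
	uint64_t div = (uint64_t)d.numerator * factor;

	if (!d.denominator || !div)
		return 0;

	return parent_rate * d.denominator / div;
}
```

With a table entry of 8125/1536, a factor of 2 and an assumed 624 MHz parent,
this evaluates to 58982400 Hz. The real parent of the UART PLL is not stated
in this patch, so treat that input as a placeholder.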

diff --git a/drivers/clk/mmp/clk-frac.c b/drivers/clk/mmp/clk-frac.c
index 1b90867b60c4..6556f6ada2e8 100644
--- a/drivers/clk/mmp/clk-frac.c
+++ b/drivers/clk/mmp/clk-frac.c
@@ -26,14 +26,15 @@ static long clk_factor_round_rate(struct clk_hw *hw, unsigned long drate,
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
u64 rate = 0, prev_rate;
+   struct u32_fract *d;
int i;
 
for (i = 0; i < factor->ftbl_cnt; i++) {
-   prev_rate = rate;
-   rate = *prate;
-   rate *= factor->ftbl[i].den;
-   do_div(rate, factor->ftbl[i].num * factor->masks->factor);
+   d = &factor->ftbl[i];
 
+   prev_rate = rate;
+   rate = (u64)(*prate) * d->denominator;
+   do_div(rate, d->numerator * factor->masks->factor);
if (rate > drate)
break;
}
@@ -52,23 +53,22 @@ static unsigned long clk_factor_recalc_rate(struct clk_hw *hw,
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
struct mmp_clk_factor_masks *masks = factor->masks;
-   unsigned int val, num, den;
+   struct u32_fract d;
+   unsigned int val;
u64 rate;
 
val = readl_relaxed(factor->base);
 
/* calculate numerator */
-   num = (val >> masks->num_shift) & masks->num_mask;
+   d.numerator = (val >> masks->num_shift) & masks->num_mask;
 
/* calculate denominator */
-   den = (val >> masks->den_shift) & masks->den_mask;
-
-   if (!den)
+   d.denominator = (val >> masks->den_shift) & masks->den_mask;
+   if (!d.denominator)
return 0;
 
-   rate = parent_rate;
-   rate *= den;
-   do_div(rate, num * factor->masks->factor);
+   rate = (u64)parent_rate * d.denominator;
+   do_div(rate, d.numerator * factor->masks->factor);
 
return rate;
 }
@@ -82,18 +82,18 @@ static int clk_factor_set_rate(struct clk_hw *hw, unsigned long drate,
int i;
unsigned long val;
unsigned long flags = 0;
+   struct u32_fract *d;
u64 rate = 0;
 
for (i = 0; i < factor->ftbl_cnt; i++) {
-   rate = prate;
-   rate *= factor->ftbl[i].den;
-   do_div(rate, factor->ftbl[i].num * factor->masks->factor);
+   d = &factor->ftbl[i];
 
+   rate = (u64)prate * d->denominator;
+   do_div(rate, d->numerator * factor->masks->factor);
if (rate > drate)
break;
}
-   if (i > 0)
-   i--;
+   d = i ? &factor->ftbl[i - 1] : &factor->ftbl[0];
 
if (factor->lock)
spin_lock_irqsave(factor->lock, flags);
@@ -101,10 +101,10 @@ static int clk_factor_set_rate(struct clk_hw *hw, unsigned long drate,
val = readl_relaxed(factor->base);
 
val &= ~(masks->num_mask << masks->num_shift);
-   val |= (factor->ftbl[i].num & masks->num_mask) << masks->num_shift;
+   val |= (d->numerator & masks->num_mask) << masks->num_shift;
 
val &= ~(masks->den_mask << masks->den_shift);
-   val |= (factor->ftbl[i].den & masks->den_mask) << masks->den_shift;
+   val |= (d->denominator & masks->den_mask) << masks->den_shift;
 
writel_relaxed(val, factor->base);
 
@@ -118,7 +118,8 @@ static int clk_factor_init(struct clk_hw *hw)
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
struct mmp_clk_factor_masks *masks = factor->masks;
-   u32 val, num, den;
+   struct u32_fract d;
+   u32 val;
int i;
unsigned long flags = 0;
 
@@ -128,23 +129,22 @@ static int clk_factor_init(struct clk_hw *hw)
val = readl(factor->base);
 
/* calculate numerator */
-   num = (val >> masks->num_shift) & masks->num_mask;
+   d.numerator = (val >> masks->num_shift) & masks->num_mask;
 
/* calculate denominator */
-   den = (val >> masks->den_shift) & masks->den_mask;
+   d.denominator = (val >> masks->den_shift) & masks->den_mask;
 
for (i = 0; i < factor->ftbl_cnt; i++)
-   if (den == factor->ftbl[i].den && num == factor->ftbl[i].num)
+   if (d.denominator == factor->ftbl[i].denominator &&
+   d.numerator == factor->ftbl[i].numerator)
break;
 
if (i >=

[PATCH v8 6/9] dt-bindings: marvell: Document PXA1908 SoC

2024-01-10 Thread Duje Mihanović

Add dt binding for the Marvell PXA1908 SoC.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Duje Mihanović 
---
 Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml b/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
index 4c43eaf3632e..f73bb8ec3a1a 100644
--- a/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
+++ b/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
@@ -35,6 +35,11 @@ properties:
   - enum:
   - dell,wyse-ariel
   - const: marvell,mmp3
+  - description: PXA1908 based boards
+items:
+  - enum:
+  - samsung,coreprimevelte
+  - const: marvell,pxa1908
 
 additionalProperties: true
 

-- 
2.43.0

Re: [PATCH v1 2/5] livepatch: Add klp-convert tool

2024-01-10 Thread Marcos Paulo de Souza

On Fri, 2024-01-05 at 14:29 +0100, Petr Mladek wrote:
> On Mon 2023-11-06 17:25:10, Lukas Hruska wrote:
> > Livepatches need to access external symbols which can't be handled
> > by the normal relocation mechanism. It is needed for two types
> > of symbols:
> > 
> >   + Symbols which can be local for the original livepatched
> > function.
> >     The alternative implementation in the livepatch sees them
> >     as external symbols.
> > 
> >   + Symbols in modules which are exported via EXPORT_SYMBOL*().
> > They
> >     must be handled special way otherwise the livepatch module
> > would
> >     depend on the livepatched one. Loading such livepatch would
> > cause
> >     loading the other module as well.
> > 
> > The address of these symbols can be found via kallsyms. Or they can
> 
> Please, remove the extra space at the end of the line.
> 
> > be relocated using livepatch specific relocation sections as
> > specified
> > in Documentation/livepatch/module-elf-format.txt.
> > 
> > Currently, there is no trivial way to embed the required
> > information as
> > requested in the final livepatch elf object. klp-convert solves
> > this
> > problem by using annotations in the elf object to convert the
> > relocation
> > accordingly to the specification, enabling it to be handled by the
> > livepatch loader.
> > 
> > Given the above, create scripts/livepatch to hold tools developed
> > for
> > livepatches and add source files for klp-convert there.
> > 
> > Allow to annotate such external symbols in the livepatch by a macro
> > KLP_RELOC_SYMBOL(). It will create symbol with all needed
> > metadata. For example:
> > 
> >   extern char *saved_command_line \
> >  KLP_RELOC_SYMBOL(vmlinux, vmlinux,
> > saved_command_line, 0);
> > 
> > would create symbol
> > 
> > $>readelf -r -W :
> > Relocation section '.rela.text' at offset 0x32e60 contains 10
> > entries:
> >     Offset Info Type   Symbol's
> > Value  Symbol's Name + Addend
> > [...]
> > 0068  003c0002 R_X86_64_PC32 
> >  .klp.sym.rela.vmlinux.vmlinux.saved_command_line,0
> > - 4
> > [...]
> > 
> > 
> > Also add scripts/livepatch/klp-convert. The tool transforms symbols
> > created by KLP_RELOC_SYMBOL() to object specific rela sections
> > and rela entries which would later be proceed when the livepatch
> > or the livepatched object is loaded.
> > 
> > For example, klp-convert would replace the above symbols with:
> 
> s/above symbols/above symbol/
> 
> > $> readelf -r -W 
> > Relocation section '.klp.rela.vmlinux.text' at offset 0x5cb60
> > contains 1 entry:
> >     Offset Info Type   Symbol's
> > Value  Symbol's Name + Addend
> > 0068  003c0002 R_X86_64_PC32 
> >  .klp.sym.vmlinux.saved_command_line,0 - 4
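
To make the renaming quoted above concrete, here is a minimal sketch of the
name transformation. It assumes the annotated name has the layout
".klp.sym.rela.<lp_obj>.<sym_obj>.<sym_name>,<pos>", which I am inferring only
from the readelf examples in this thread; klp_convert_name() is a hypothetical
helper and not the convert_symbol() function from the patch.

```c
#include <stdio.h>
#include <string.h>

#define KLP_RELA_PREFIX ".klp.sym.rela."

/*
 * Hypothetical helper: turn
 *   ".klp.sym.rela.<lp_obj>.<sym_obj>.<sym_name>,<pos>"
 * into
 *   ".klp.sym.<sym_obj>.<sym_name>,<pos>".
 * Returns 0 on success, -1 if the input does not match the assumed
 * layout or the output buffer is too small.
 */
int klp_convert_name(const char *in, char *out, size_t outlen)
{
	const char *p, *dot;
	int n;

	if (strncmp(in, KLP_RELA_PREFIX, strlen(KLP_RELA_PREFIX)) != 0)
		return -1;

	/* p -> "<lp_obj>.<sym_obj>.<sym_name>,<pos>" */
	p = in + strlen(KLP_RELA_PREFIX);
	dot = strchr(p, '.');
	if (!dot || dot == p)
		return -1;

	/* Skip <lp_obj>; p -> "<sym_obj>.<sym_name>,<pos>" */
	p = dot + 1;
	dot = strchr(p, '.');
	if (!dot || dot == p || !strrchr(dot, ','))
		return -1;

	n = snprintf(out, outlen, ".klp.sym.%s", p);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Only the first two dots after the prefix are treated as separators, so a
symbol name that itself contains dots would be carried over unchanged.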
> > 
> > klp-convert relies on libelf and on a list implementation. Add
> > files
> > scripts/livepatch/elf.c and scripts/livepatch/elf.h, which are a
> > libelf
> > interfacing layer and scripts/livepatch/list.h, which is a list
> > implementation.
> > 
> > Update Makefiles to correctly support the compilation of the new
> > tool,
> > update MAINTAINERS file and add a .gitignore file.
> > 
> > ---
> >  MAINTAINERS |   1 +
> >  include/linux/livepatch.h   |  19 +
> >  scripts/Makefile    |   1 +
> >  scripts/livepatch/.gitignore    |   1 +
> >  scripts/livepatch/Makefile  |   5 +
> >  scripts/livepatch/elf.c | 817
> > 
> >  scripts/livepatch/elf.h |  73 +++
> 
> I see a similar code in
> 
>     tools/objtool/elf.c
>     tools/objtool/include/objtool/elf.h
> 
> Both variants have been written by Josh. I wonder if we could share
> one implementation. Josh?
> 
> >  scripts/livepatch/klp-convert.c | 283 +++
> >  scripts/livepatch/klp-convert.h |  42 ++
> >  scripts/livepatch/list.h    | 391 +++
> 
> And probably also the list.h

I understand that code that lives in tools/ is usually self-contained,
so I'm not sure how this code can be shared. Is it advisable to add
list.h and elf.h to tools/include/tools? I'm not sure about elf.c,
though.

> 
> >  10 files changed, 1633 insertions(+)
> >  create mode 100644 scripts/livepatch/.gitignore
> >  create mode 100644 scripts/livepatch/Makefile
> >  create mode 100644 scripts/livepatch/elf.c
> >  create mode 100644 scripts/livepatch/elf.h
> >  create mode 100644 scripts/livepatch/klp-convert.c
> >  create mode 100644 scripts/livepatch/klp-convert.h
> >  create mode 100644 scripts/livepatch/list.h
> > 
> > --- /dev/null
> > +++ b/scripts/livepatch/klp-convert.c
> > @@ -0,0 +1,283 @@
> [...]
> > +/* Converts rela symbol names */
> > +static bool convert_symbol(struct symbol *s)
> > +{
> > +   char lp_obj_name[MODULE_NAME_LEN];
> > +   char sym_obj_name[MODULE_NAME_LEN];
> > +   char sym_name[KSYM_NAME_LEN];
> > +   char *klp_sym_name;
> > +

Re: [PATCH V3] remoteproc: virtio: Fix wdg cannot recovery remote processor

2024-01-10 Thread Mathieu Poirier

Good day Joakim,

On Sun, Dec 17, 2023 at 01:36:59PM +0800, joakim.zh...@cixtech.com wrote:
> From: Joakim Zhang 
> 
> Recovery of the remote processor fails when a wdg irq is received:
> [0.842574] remoteproc remoteproc0: crash detected in cix-dsp-rproc: type watchdog
> [0.842750] remoteproc remoteproc0: handling crash #1 in cix-dsp-rproc
> [0.842824] remoteproc remoteproc0: recovering cix-dsp-rproc
> [0.843342] remoteproc remoteproc0: stopped remote processor cix-dsp-rproc
> [0.847901] rproc-virtio rproc-virtio.0.auto: Failed to associate buffer
> [0.847979] remoteproc remoteproc0: failed to probe subdevices for cix-dsp-rproc: -16
> 
> The reason is that the DMA coherent memory is not released when
> recovering the remote processor, because rproc_virtio_remove(),
> where the memory is released, is not called. Allocating and
> associating the buffer again then fails.
> 
> Release the reserved memory from rproc_virtio_dev_release() instead of
> rproc_virtio_remove().
> 
> Fixes: 1d7b61c06dc3 ("remoteproc: virtio: Create platform device for the remoteproc_virtio")
> Signed-off-by: Joakim Zhang 

I am in agreement with your patch.  I will apply it when 6.8-rc1 comes out.

Thanks,
Mathieu

> ---
> ChangeLogs:
> V1->V2:
>   * the same for of_reserved_mem_device_release()
> V2->V3:
>   * release reserved memory in rproc_virtio_dev_release()
> ---
>  drivers/remoteproc/remoteproc_virtio.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
> index 83d76915a6ad..25b66b113b69 100644
> --- a/drivers/remoteproc/remoteproc_virtio.c
> +++ b/drivers/remoteproc/remoteproc_virtio.c
> @@ -351,6 +351,9 @@ static void rproc_virtio_dev_release(struct device *dev)
>  
>   kfree(vdev);
>  
> + of_reserved_mem_device_release(>pdev->dev);
> + dma_release_coherent_memory(>pdev->dev);
> +
>   put_device(>pdev->dev);
>  }
>  
> @@ -584,9 +587,6 @@ static void rproc_virtio_remove(struct platform_device *pdev)
>   rproc_remove_subdev(rproc, >subdev);
>   rproc_remove_rvdev(rvdev);
>  
> - of_reserved_mem_device_release(>dev);
> - dma_release_coherent_memory(>dev);
> -
>   put_device(>dev);
>  }
>  
> -- 
> 2.25.1
>

Re: [PATCH 1/2] arm64: dts: qcom: sm7225-fairphone-fp4: Add PMK8003 thermals

2024-01-10 Thread Konrad Dybcio

On 1/5/24 15:54, Luca Weiss wrote:

Configure the thermals for the XO_THERM thermistor connected to the
PMK8003 (which is called PMK8350 in software).

The ADC configuration for PMK8350_ADC7_AMUX_THM1_100K_PU has already
been added in the past.

The trip points can really only be considered as placeholders, more
configuration with cooling etc. can be added later.

Signed-off-by: Luca Weiss 
---
  arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts | 25 +++
  1 file changed, 25 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts 
b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
index ade619805519..b7ccfe4011bb 100644
--- a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
+++ b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
@@ -112,6 +112,20 @@ active-config0 {
};
};
};
+
+   xo-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+   thermal-sensors = <_adc_tm 0>;
+
+   trips {
+   active-config0 {
+   temperature = <125000>;
+   hysteresis = <1000>;
+   type = "passive";
+   };
+   };
+   };
};
  };
  
@@ -490,6 +504,17 @@ conn-therm@1 {

};
  };
  
+_adc_tm {

+   status = "okay";
+
+   xo-therm@0 {
+   reg = <0>;
+   io-channels = <_vadc PMK8350_ADC7_AMUX_THM1_100K_PU>;
+   qcom,ratiometric;
+   qcom,hw-settle-time-us = <200>;


My ocd would rather see the boolean property at the end

anyway

Reviewed-by: Konrad Dybcio 

Konrad

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-10 Thread Steven Rostedt

On Wed, 10 Jan 2024 10:52:51 -0500
Steven Rostedt  wrote:

> On Wed, 10 Jan 2024 08:07:46 -0500
> Steven Rostedt  wrote:
> 
> > Or are you saying that I don't need the ".permission" callback, because
> > eventfs does it when it creates the inodes? But for eventfs to know what
> > the permissions changes are, it uses .getattr and .setattr.  
> 
> OK, if your main argument is that we do not need .permission, I agree with
> you. But that's a trivial change and doesn't affect the complexity that
> eventfs is doing. In fact, removing the "permission" check is simply this
> patch:
> 
> --
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index fdff53d5a1f8..f2af07a857e2 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -192,18 +192,10 @@ static int eventfs_get_attr(struct mnt_idmap *idmap,
>   return 0;
>  }
>  
> -static int eventfs_permission(struct mnt_idmap *idmap,
> -   struct inode *inode, int mask)
> -{
> - set_top_events_ownership(inode);
> - return generic_permission(idmap, inode, mask);
> -}
> -
>  static const struct inode_operations eventfs_root_dir_inode_operations = {
>   .lookup = eventfs_root_lookup,
>   .setattr= eventfs_set_attr,
>   .getattr= eventfs_get_attr,
> - .permission = eventfs_permission,
>  };
>  
>  static const struct inode_operations eventfs_file_inode_operations = {
> --
> 
> I only did that because Linus mentioned it, and I thought it was needed.
> I'll apply this patch too, as it appears to work with this code.

Oh, eventfs files and directories don't need the .permission callback because
their inodes and dentries are not created until accessed. But the "events"
directory itself has its dentry and inode created at boot up, but still
uses the eventfs_root_dir_inode_operations. So the .permission callback is
still needed!

If you look at the "set_top_events_ownership()" function, it has:

/* The top events directory doesn't get automatically updated */
if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL))
return;

That is, it does nothing if the entry is not the "events" directory. It
falls back to the default "->permission()" function for everything but the
top level "events" directory.

But this and .getattr are still needed for the events directory, because it
suffers the same issue as the other tracefs entries. That is, its inodes
and dentries are created at boot up before it is mounted. So if the mount
has gid=1000, it will be ignored.

The .getattr is called by "stat" which ls does. So after boot up if you
just do:

 # chmod 0750 /sys/kernel/events
 # chmod 0770 /sys/kernel/tracing
 # mount -o remount,gid=1000 /sys/kernel/tracing
 # su - rostedt
 $ id
uid=1000(rostedt) gid=1000(rostedt) groups=1000(rostedt)
 $ ls /sys/kernel/tracing/events/
9p            ext4      iomap       module  raw_syscalls  thermal
alarmtimer    fib       iommu       msr     rcu           thp
avc           fib6      io_uring    napi    regmap        timer
block         filelock  ipi         neigh   regulator     tlb
bpf_test_run  filemap   irq         net     resctrl       udp
bpf_trace     ftrace    irq_matrix  netfs   rpm           virtio_gpu
[...]

The above works because "ls" does a stat() on the directory first, which
does a .getattr() call that updates the permissions of the existing "events"
directory inode.

  BUT!

If I had used my own getdents program that has:

fd = openat(AT_FDCWD, argv[1], O_RDONLY);
if (fd < 0)
perror("openat");

n = getdents64(fd, buf, BUF_SIZE);
if (n < 0)
perror("getdents64");

Where it calls the openat() without doing a stat first, and after boot, had done:

 # chmod 0750 /sys/kernel/events
 # chmod 0770 /sys/kernel/tracing
 # mount -o remount,gid=1000 /sys/kernel/tracing
 # su - rostedt
 $ id
uid=1000(rostedt) gid=1000(rostedt) groups=1000(rostedt)
 $ ./getdents /sys/kernel/tracing/events
openat: Permission denied
getdents64: Bad file descriptor

It errors because the "events" inode permission hasn't been updated yet.
Now after getting the above error, if I do the "ls" and then run it again:

 $ ls /sys/kernel/tracing/events > /dev/null
 $ ./getdents /sys/kernel/tracing/events
enable
header_page
header_event
initcall
vsyscall
syscalls

it works!

So no, I can't remove that .permissions callback from eventfs.

-- Steve

Re: [PATCH net-next v3 2/3] net: introduce abstraction for network memory

2024-01-10 Thread Shakeel Butt

On Thu, Jan 4, 2024 at 1:44 PM Jakub Kicinski  wrote:
>
[...]
>
> You seem to be trying hard to make struct netmem a thing.
> Perhaps you have a reason I'm not getting?

Mina already went with your suggestion and that is fine. To me, struct
netmem is more aesthetically aligned with the existing struct
encoded_page approach, but I don't have a strong opinion one way or
the other. However, it seems like you have a stronger preference for the
__bitwise approach. Is there a technical reason or is it just aesthetic?

Re: [PATCH] arm64: dts: qcom: sm7225-fairphone-fp4: Switch firmware ext to .mbn

2024-01-10 Thread Konrad Dybcio





On 1/10/24 16:21, Luca Weiss wrote:

Specify the file name for the squashed/non-split firmware with the .mbn
extension instead of the split .mdt. The kernel can load both but the
squashed version is preferred in dts nowadays.

Signed-off-by: Luca Weiss 
---


Thanks!

Reviewed-by: Konrad Dybcio 

Konrad

Re: REGRESSION: lockdep warning triggered by 15b9ce7ecd: virtio_balloon: stay awake while adjusting balloon

2024-01-10 Thread Theodore Ts'o

On Wed, Jan 10, 2024 at 03:11:01AM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 08, 2024 at 04:50:15PM -0500, Theodore Ts'o wrote:
> > Hi, while doing final testing before sending a pull request, I merged
> > in linux-next, and commit 5b9ce7ecd7: virtio_balloon: stay awake while
> > adjusting balloon seems to be causing a lockdep warning (see attached)
> > when running gce-xfstests on a Google Compute Engine e2 VM.  I was not
> > able to trigger it using kvm-xfstests, but the following command:
> > "gce-xfstests -C 10 ext4/4k generic/476) was sufficient to triger the
> > problem.   For more information please see [1] and [2].
> > 
> > [1] 
> > https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> > [2] https://thunk.org/gce-xfstests
> > 
> > I found it by looking at the git logs, and this commit aroused my
> > suspicions, and I further testing showed that the lockdep warning was
> > reproducible with this commit, but not when testing with the
> > immediately preceeding commit (15b9ce7ecd^).
> > 
> > Cheers,
> 
> 
> Thanks a lot for the report!
> I pushed a fixed patch out (tree rebased).
> Would be great if you can confirm it's allright now.

I manually fixed up the white-space issues with the patch last night,
and verified that it fixed it for me with an overnight test run.  (My
patch was versus next-20240109, and then I tested with ext4/dev merged
in.  Previously I had noted the problem with next-20240107 with
ext4/dev merged in.)

Thanks,

- Ted


From 98097bbd4fe2e15db8fa357aa6e29435cb62e450 Mon Sep 17 00:00:00 2001
From: David Stevens 
Date: Tue, 9 Jan 2024 14:41:21 +0900
Subject: [PATCH] virtio_balloon: Fix interrupt context deadlock

Use _irq spinlock functions with the adjustment_lock, since
start_update_balloon_size needs to acquire it in an interrupt context.

Fixes: 5b9ce7ecd715 ("virtio_balloon: stay awake while adjusting balloon")
Reported-by: Theodore Ts'o 
Tested-by: Theodore Ts'o 
Signed-off-by: David Stevens 
Signed-off-by: Theodore Ts'o 
---
 drivers/virtio/virtio_balloon.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index aa6a1a649ad6..1f5b3dd31fcf 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -459,12 +459,12 @@ static void start_update_balloon_size(struct virtio_balloon *vb)
 
 static void end_update_balloon_size(struct virtio_balloon *vb)
 {
-	spin_lock(&vb->adjustment_lock);
+	spin_lock_irq(&vb->adjustment_lock);
 	if (!vb->adjustment_signal_pending && vb->adjustment_in_progress) {
 		vb->adjustment_in_progress = false;
 		pm_relax(vb->vdev->dev.parent);
 	}
-	spin_unlock(&vb->adjustment_lock);
+	spin_unlock_irq(&vb->adjustment_lock);
 }
 
 static void virtballoon_changed(struct virtio_device *vdev)
@@ -506,9 +506,9 @@ static void update_balloon_size_func(struct work_struct *work)
 	vb = container_of(work, struct virtio_balloon,
 			  update_balloon_size_work);
 
-	spin_lock(&vb->adjustment_lock);
+	spin_lock_irq(&vb->adjustment_lock);
 	vb->adjustment_signal_pending = false;
-	spin_unlock(&vb->adjustment_lock);
+	spin_unlock_irq(&vb->adjustment_lock);
 
 	diff = towards_target(vb);
 
-- 
2.43.0

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-10 Thread Steven Rostedt

On Wed, 10 Jan 2024 10:52:51 -0500
Steven Rostedt  wrote:

> I'll apply this patch too, as it appears to work with this code.

I meant "appears to work without this code".

-- Steve

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-10 Thread Steven Rostedt

On Wed, 10 Jan 2024 08:07:46 -0500
Steven Rostedt  wrote:

> Or are you saying that I don't need the ".permission" callback, because
> eventfs does it when it creates the inodes? But for eventfs to know what
> the permissions changes are, it uses .getattr and .setattr.

OK, if your main argument is that we do not need .permission, I agree with
you. But that's a trivial change and doesn't affect the complexity that
eventfs is doing. In fact, removing the "permission" check is simply this
patch:

--
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index fdff53d5a1f8..f2af07a857e2 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -192,18 +192,10 @@ static int eventfs_get_attr(struct mnt_idmap *idmap,
return 0;
 }
 
-static int eventfs_permission(struct mnt_idmap *idmap,
- struct inode *inode, int mask)
-{
-   set_top_events_ownership(inode);
-   return generic_permission(idmap, inode, mask);
-}
-
 static const struct inode_operations eventfs_root_dir_inode_operations = {
.lookup = eventfs_root_lookup,
.setattr= eventfs_set_attr,
.getattr= eventfs_get_attr,
-   .permission = eventfs_permission,
 };
 
 static const struct inode_operations eventfs_file_inode_operations = {
--

I only did that because Linus mentioned it, and I thought it was needed.
I'll apply this patch too, as it appears to work with this code.

Thanks!

-- Steve

Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

2024-01-10 Thread Miklos Szeredi

On Wed, 10 Jan 2024 at 16:19, Ahelenia Ziemiańska
 wrote:

> > We need to find an alternative to refusing splice, since this is not
> > going to fly, IMO.
> The alternative is to not hold the lock. See the references in the
> cover letter for why this wasn't done. IMO a potential slight perf
> hit flies more than a total exclusion on the pipe.

IDGI.  This will make splice(2) return EINVAL for unprivileged fuse
files, right?

That would be a regression, not a perf hit, if the application is not
falling back to plain read; a reasonable scenario, considering splice
from files (including fuse) has worked on linux for a *long* time.

Thanks,
Miklos

[PATCH] arm64: dts: qcom: sm7225-fairphone-fp4: Switch firmware ext to .mbn

2024-01-10 Thread Luca Weiss

Specify the file name for the squashed/non-split firmware with the .mbn
extension instead of the split .mdt. The kernel can load both but the
squashed version is preferred in dts nowadays.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts 
b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
index ade619805519..9ed349ec076a 100644
--- a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
+++ b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
@@ -116,7 +116,7 @@ active-config0 {
 };
 
  {
-   firmware-name = "qcom/sm7225/fairphone4/adsp.mdt";
+   firmware-name = "qcom/sm7225/fairphone4/adsp.mbn";
status = "okay";
 };
 
@@ -361,7 +361,7 @@ _i2c0 {
 };
 
  {
-   firmware-name = "qcom/sm7225/fairphone4/cdsp.mdt";
+   firmware-name = "qcom/sm7225/fairphone4/cdsp.mbn";
status = "okay";
 };
 
@@ -400,12 +400,12 @@  {
  {
qcom,gsi-loader = "self";
memory-region = <_ipa_fw_mem>;
-   firmware-name = "qcom/sm7225/fairphone4/ipa_fws.mdt";
+   firmware-name = "qcom/sm7225/fairphone4/ipa_fws.mbn";
status = "okay";
 };
 
  {
-   firmware-name = "qcom/sm7225/fairphone4/modem.mdt";
+   firmware-name = "qcom/sm7225/fairphone4/modem.mbn";
status = "okay";
 };
 

---
base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
change-id: 20240110-fp4-mbn-74b1a7547342

Best regards,
-- 
Luca Weiss

Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

2024-01-10 Thread Ahelenia Ziemiańska

On Wed, Jan 10, 2024 at 02:43:04PM +0100, Miklos Szeredi wrote:
> On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
>  wrote:
> > Potentially-blocking splice_reads are allowed for normal filesystems
> > like NFS because they're blessed by root.
> >
> > FUSE is commonly used suid-root, and allows anyone to trivially create
> > a file that, when spliced from, will just sleep forever with the pipe
> > lock held.
> >
> > The only way IPC to the fusing process could be avoided is if
> > !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> > and we weren't past the end. Just refuse it.
> How is this not going to cause regressions out there?
In "[PATCH v2 14/11] fuse: allow splicing to trusted mounts only"
splicing is re-enabled for mounts made by the real root.

> We need to find an alternative to refusing splice, since this is not
> going to fly, IMO.
The alternative is to not hold the lock. See the references in the
cover letter for why this wasn't done. IMO a potential slight perf
hit flies more than a total exclusion on the pipe.


signature.asc
Description: PGP signature

[bug report] eventfs: Read ei->entries before ei->children in eventfs_iterate()

2024-01-10 Thread Dan Carpenter

Hello Steven Rostedt (Google),

The patch 704f960dbee2: "eventfs: Read ei->entries before
ei->children in eventfs_iterate()" from Jan 4, 2024 (linux-next),
leads to the following Smatch static checker warning:

fs/tracefs/event_inode.c:775 eventfs_iterate()
warn: missing error code here? 'create_file_dentry()' failed. 'ret' = '0'

fs/tracefs/event_inode.c
749 /*
750  * Need to create the dentries and inodes to have a consistent
751  * inode number.
752  */
753 ret = 0;

Should this assignment be inside the loop?

754 
755 /* Start at 'c' to jump over already read entries */
756 for (i = c; i < ei->nr_entries; i++, ctx->pos++) {
757 void *cdata = ei->data;
758 
759 entry = &ei->entries[i];
760 name = entry->name;
761 
762 mutex_lock(&eventfs_mutex);
763 /* If ei->is_freed then just bail here, nothing more to do */
764 if (ei->is_freed) {
765 mutex_unlock(&eventfs_mutex);
766 goto out;

On the second iteration through the loop then ret is an error code

767 }
768 r = entry->callback(name, &mode, &cdata, &fops);
769 mutex_unlock(&eventfs_mutex);
770 if (r <= 0)
771 continue;

that comes from r = entry->callback().  Except, hm, none of the callbacks
currently return anything except zero or one so it's not an issue yet.

772 
773 dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
774 if (!dentry)
--> 775 goto out;
776 ino = dentry->d_inode->i_ino;
777 dput(dentry);
778 
779 if (!dir_emit(ctx, name, strlen(name), ino, DT_REG))
780 goto out;
781 }
782 
783 /* Subtract the skipped entries above */
784 c -= min((unsigned int)c, (unsigned int)ei->nr_entries);
785 
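
The pattern the checker is warning about can be reproduced in isolation. The following is a minimal userspace sketch, not the eventfs code: make_entry(), iterate_silent() and iterate_explicit() are hypothetical names, and the choice of -ENOMEM is an assumption. Whether an error code is actually wanted at line 775 is exactly the question the warning raises.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for create_file_dentry(): fails for index fail_at. */
static const char *make_entry(int i, int fail_at)
{
	return i == fail_at ? NULL : "entry";
}

/* Flagged shape: ret is still 0 when creation fails, so the failure is silent. */
int iterate_silent(int nr, int fail_at)
{
	int ret = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (!make_entry(i, fail_at))
			goto out;
	}
out:
	return ret;
}

/* Alternative shape: set an error code before leaving the loop. */
int iterate_explicit(int nr, int fail_at)
{
	int ret = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (!make_entry(i, fail_at)) {
			ret = -ENOMEM;
			goto out;
		}
	}
out:
	return ret;
}
```

With the first variant a caller cannot tell a failed creation from a completed walk, since both return 0.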

regards,
dan carpenter

Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

2024-01-10 Thread Miklos Szeredi

On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
 wrote:
>
> Potentially-blocking splice_reads are allowed for normal filesystems
> like NFS because they're blessed by root.
>
> FUSE is commonly used suid-root, and allows anyone to trivially create
> a file that, when spliced from, will just sleep forever with the pipe
> lock held.
>
> The only way IPC to the fusing process could be avoided is if
> !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> and we weren't past the end. Just refuse it.

How is this not going to cause regressions out there?

We need to find an alternative to refusing splice, since this is not
going to fly, IMO.

Thanks,
Miklos

Re: [PATCH v1 1/5] livepatch: Create and include UAPI headers

2024-01-10 Thread Marcos Paulo de Souza

On Mon, 2023-11-06 at 17:25 +0100, Lukas Hruska wrote:
> From: Josh Poimboeuf 
> 
> Define klp prefixes in include/uapi/linux/livepatch.h, and use them
> for
> replacing hard-coded values in kernel/livepatch/core.c.
> 
> Signed-off-by: Josh Poimboeuf 
> Signed-off-by: Lukas Hruska 

Reviewed-by: Marcos Paulo de Souza 

> ---
>  MAINTAINERS    |  1 +
>  include/uapi/linux/livepatch.h | 15 +++
>  kernel/livepatch/core.c    |  5 +++--
>  3 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 include/uapi/linux/livepatch.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4cc6bf79fdd8..11a2d84c1277 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12130,6 +12130,7 @@ F:Documentation/ABI/testing/sysfs-
> kernel-livepatch
>  F:   Documentation/livepatch/
>  F:   arch/powerpc/include/asm/livepatch.h
>  F:   include/linux/livepatch.h
> +F:   include/uapi/linux/livepatch.h
>  F:   kernel/livepatch/
>  F:   kernel/module/livepatch.c
>  F:   lib/livepatch/
> diff --git a/include/uapi/linux/livepatch.h
> b/include/uapi/linux/livepatch.h
> new file mode 100644
> index ..e19430918a07
> --- /dev/null
> +++ b/include/uapi/linux/livepatch.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +/*
> + * livepatch.h - Kernel Live Patching Core
> + *
> + * Copyright (C) 2016 Josh Poimboeuf 
> + */
> +
> +#ifndef _UAPI_LIVEPATCH_H
> +#define _UAPI_LIVEPATCH_H
> +
> +#define KLP_RELA_PREFIX  ".klp.rela."
> +#define KLP_SYM_PREFIX   ".klp.sym."
> +
> +#endif /* _UAPI_LIVEPATCH_H */
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 61328328c474..622f1916a5c8 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "core.h"
>  #include "patch.h"
> @@ -226,7 +227,7 @@ static int klp_resolve_symbols(Elf_Shdr *sechdrs,
> const char *strtab,
>  
>   /* Format: .klp.sym.sym_objname.sym_name,sympos */
>   cnt = sscanf(strtab + sym->st_name,
> -  ".klp.sym.%55[^.].%511[^,],%lu",
> +  KLP_SYM_PREFIX "%55[^.].%511[^,],%lu",
>    sym_objname, sym_name, );
>   if (cnt != 3) {
>   pr_err("symbol %s has an incorrectly
> formatted name\n",
> @@ -305,7 +306,7 @@ static int klp_write_section_relocs(struct module
> *pmod, Elf_Shdr *sechdrs,
>    * See comment in klp_resolve_symbols() for an explanation
>    * of the selected field width value.
>    */
> - cnt = sscanf(shstrtab + sec->sh_name, ".klp.rela.%55[^.]",
> + cnt = sscanf(shstrtab + sec->sh_name, KLP_RELA_PREFIX
> "%55[^.]",
>    sec_objname);
>   if (cnt != 1) {
>   pr_err("section %s has an incorrectly formatted
> name\n",
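
The change relies on C's adjacent string literal concatenation, so the macro and the rest of the format string become a single sscanf() format at compile time. A minimal userspace sketch of the same parse is shown below. The wrapper name parse_klp_sym() is hypothetical and the buffer sizes are simply the field widths from the format string plus one byte for the terminating NUL; only the KLP_SYM_PREFIX definition and the format string come from the patch.

```c
#include <stdio.h>

#define KLP_SYM_PREFIX ".klp.sym."

/*
 * Parse ".klp.sym.<objname>.<symname>,<sympos>" (hypothetical helper).
 * Returns 0 if all three fields were parsed, -1 otherwise.
 */
int parse_klp_sym(const char *s, char objname[56], char symname[512],
		  unsigned long *sympos)
{
	/* The prefix macro and the rest of the format are joined by the compiler. */
	int cnt = sscanf(s, KLP_SYM_PREFIX "%55[^.].%511[^,],%lu",
			 objname, symname, sympos);

	return cnt == 3 ? 0 : -1;
}
```

For an input such as ".klp.sym.vmlinux.printk,0" this yields the object name "vmlinux", the symbol name "printk" and a position of 0, while a plain "printk" is rejected.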

Re: [PATCH] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

2024-01-10 Thread Jiri Olsa

On Wed, Jan 10, 2024 at 09:13:06AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) 
> 
> The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
> and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
> are multiple ftrace_ops at the same function, but since the x86 only
> support to jump to direct_call from ftrace_regs_caller, when we set
> the function tracer on the same target function on x86, ftrace-direct
> does not work as below (this actually works on arm64.)
> 
> At first, insmod ftrace-direct.ko to put a direct_call on
> 'wake_up_process()'.
> 
>  # insmod kernel/samples/ftrace/ftrace-direct.ko
>  # less trace
> ...
>   -0   [006] ..s1.   564.686958: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [007] ..s1.   564.687836: my_direct_func: waking up 
> kcompactd0-63
>   -0   [006] ..s1.   564.690926: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [006] ..s1.   564.696872: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [007] ..s1.   565.191982: my_direct_func: waking up 
> kcompactd0-63
> 
> Setup a function filter to the 'wake_up_process' too, and enable it.
> 
>  # cd /sys/kernel/tracing/
>  # echo wake_up_process > set_ftrace_filter
>  # echo function > current_tracer
>  # less trace
> ...
>   -0   [006] ..s3.   686.180972: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s3.   686.186919: wake_up_process 
> <-call_timer_fn
>   -0   [002] ..s3.   686.264049: wake_up_process 
> <-call_timer_fn
>   -0   [002] d.h6.   686.515216: wake_up_process <-kick_pool
>   -0   [002] d.h6.   686.691386: wake_up_process <-kick_pool
> 
> Then, only function tracer is shown on x86.
> But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
> on the same function, it is shown again.
> 
>  # echo 'p wake_up_process' >> dynamic_events
>  # echo 1 > events/kprobes/p_wake_up_process_0/enable
>  # echo > trace
>  # less trace
> ...
>   -0   [006] ..s2.  2710.345919: p_wake_up_process_0: 
> (wake_up_process+0x4/0x20)
>   -0   [006] ..s3.  2710.345923: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s1.  2710.345928: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [006] ..s2.  2710.349931: p_wake_up_process_0: 
> (wake_up_process+0x4/0x20)
>   -0   [006] ..s3.  2710.349934: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s1.  2710.349937: my_direct_func: waking up 
> rcu_preempt-17
> 
> To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
> direct_call by default.

nice catch

Acked-by: Jiri Olsa 

jirka

> 
> Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and 
> !WITH_REGS")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) 
> ---
>  kernel/trace/ftrace.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b01ae7d36021..c060d5b47910 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
>  
>  static int register_ftrace_function_nolock(struct ftrace_ops *ops);
>  
> +/*
> + * If there are multiple ftrace_ops, use SAVE_REGS by default, so that direct
> + * call will be jumped from ftrace_regs_caller. Only if the architecture does
> + * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it
> + * jumps from ftrace_caller for multiple ftrace_ops.
> + */
> +#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS
>  #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
> +#else
> +#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
> +#endif
>  
>  static int check_direct_multi(struct ftrace_ops *ops)
>  {
>

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-10 Thread Steven Rostedt

On Wed, 10 Jan 2024 12:45:36 +0100
Christian Brauner  wrote:

> So say you do:
> 
> mkdir /sys/kernel/tracing/instances/foo
> 
> After this has returned we know everything we need to know about the new
> tracefs instance including the ownership and the mode of all inodes in
> /sys/kernel/tracing/instances/foo/events/* and below precisely because
> ownership is always inherited from the parent dentry and recorded in the
> metadata struct eventfs_inode.
> 
> So say someone does:
> 
> open("/sys/kernel/tracing/instances/foo/events/xfs");
> 
> and say this is the first time that someone accesses that events/
> directory.
> 
> When the open pathwalk is done, the vfs will determine via
> 
> [1] may_lookup(inode_of(events))
> 
> whether you are able to list entries such as "xfs" in that directory.
> The vfs checks inode_permission(MAY_EXEC) on "events" and if that holds
> it ends up calling i_op->eventfs_root_lookup(events).
> 
> At this point tracefs/eventfs adds the inodes for all entries in that
> "events" directory including "xfs" based on the metadata it recorded
> during the mkdir. Since now someone is actually interested in them. And
> it initializes the inodes with ownership and everything and adds the
> dentries that belong into that directory.
> 
> Nothing here depends on the permissions of the caller. The only
> permission that mattered was done in the VFS in [1]. If the caller has
> permissions to enter a directory they can lookup and list its contents.
> And its contents where determined/fixed etc when mkdir was called.
> 
> So we just need to add the required objects into the caches (inode,
> dentry) whose addition we intentionally defered until someone actually
> needed them.
> 
> So, eventfs_root_lookup() now initializes the inodes with the ownership
> from the stored metadata or from the parent dentry and splices in inodes
> and dentries. No permission checking is needed for this because it is
> always a recheck of what the vfs did in [1].
> 
> We now return to the vfs and path walk continues to the final component
> that you actually want to open which is that "xfs" directory in this
> example. We check the permissions on that inode via may_open("xfs") and
> we open that directory returning an fd to userspace ultimately.
> 
> (I'm going by memory since I need to step out the door.)

So, let's say we do:

 chgrp -R rostedt /sys/kernel/tracing/

But I don't want rostedt to have access to xfs

 chgrp -R root /sys/kernel/tracing/events/xfs

Both actions will create the inodes and dentries of all files and
directories (because of "-R"). But once that is done, the ref counts go to
zero. They stay around until reclaim. But then I open Chrome ;-) and it
reclaims all the dentries and inodes, so we are back to where we were on
boot.

Now as rostedt I do:

 ls /sys/kernel/tracing/events/xfs

The VFS layer doesn't know if I have permission to that or not, because all
the inodes and dentries have been freed. It has to call back to eventfs to
find out, and eventfs_root_lookup() and eventfs_iterate_shared() will then
recreate the inodes with the proper permission.

Or are you saying that I don't need the ".permission" callback, because
eventfs does it when it creates the inodes? But for eventfs to know what
the permissions changes are, it uses .getattr and .setattr.
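
The reclaim scenario above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions and not tracefs code: every name in it (toy_entry, toy_lookup(), toy_setattr_gid(), toy_reclaim(), toy_group_may_list()) is hypothetical. It shows why an attribute change has to be recorded in persistent metadata if it is to survive the cached inode being freed and recreated.

```c
#include <stdlib.h>

/* Persistent metadata, recorded when the directory tree is set up. */
struct entry_attr {
	unsigned int mode;
	unsigned int uid;
	unsigned int gid;
};

/* Cached object that memory reclaim may free at any time. */
struct toy_inode {
	unsigned int mode;
	unsigned int uid;
	unsigned int gid;
};

struct toy_entry {
	struct entry_attr attr;
	struct toy_inode *inode;	/* NULL until the first lookup */
};

/* Like a .setattr hook: record the change so that it survives reclaim. */
void toy_setattr_gid(struct toy_entry *e, unsigned int gid)
{
	e->attr.gid = gid;
	if (e->inode)
		e->inode->gid = gid;
}

/* Like a lookup: instantiate the inode from the recorded metadata. */
struct toy_inode *toy_lookup(struct toy_entry *e)
{
	if (!e->inode) {
		e->inode = malloc(sizeof(*e->inode));
		if (!e->inode)
			return NULL;
		e->inode->mode = e->attr.mode;
		e->inode->uid = e->attr.uid;
		e->inode->gid = e->attr.gid;
	}
	return e->inode;
}

/* Like reclaim: drop the cached inode, keep the metadata. */
void toy_reclaim(struct toy_entry *e)
{
	free(e->inode);
	e->inode = NULL;
}

/* Group-only check: may a caller in group 'gid' list this directory? */
int toy_group_may_list(const struct toy_inode *inode, unsigned int gid)
{
	return inode->gid == gid && (inode->mode & 0050) == 0050;
}
```

In this model, changing the group of an "xfs" entry to root, reclaiming it and looking it up again yields an inode that group 1000 cannot list, because the change was stored in the entry and not only in the cached inode.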

-- Steve

Re: [PATCH] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

2024-01-10 Thread Masami Hiramatsu (Google)

On Wed, 10 Jan 2024 12:20:21 +
Mark Rutland  wrote:

> On Wed, Jan 10, 2024 at 09:13:06AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) 
> > 
> > The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
> > and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
> > are multiple ftrace_ops at the same function, but since the x86 only
> > support to jump to direct_call from ftrace_regs_caller, when we set
> > the function tracer on the same target function on x86, ftrace-direct
> > does not work as below (this actually works on arm64.)
> > 
> > At first, insmod ftrace-direct.ko to put a direct_call on
> > 'wake_up_process()'.
> > 
> >  # insmod kernel/samples/ftrace/ftrace-direct.ko
> >  # less trace
> > ...
> >   -0   [006] ..s1.   564.686958: my_direct_func: waking 
> > up rcu_preempt-17
> >   -0   [007] ..s1.   564.687836: my_direct_func: waking 
> > up kcompactd0-63
> >   -0   [006] ..s1.   564.690926: my_direct_func: waking 
> > up rcu_preempt-17
> >   -0   [006] ..s1.   564.696872: my_direct_func: waking 
> > up rcu_preempt-17
> >   -0   [007] ..s1.   565.191982: my_direct_func: waking 
> > up kcompactd0-63
> > 
> > Setup a function filter to the 'wake_up_process' too, and enable it.
> > 
> >  # cd /sys/kernel/tracing/
> >  # echo wake_up_process > set_ftrace_filter
> >  # echo function > current_tracer
> >  # less trace
> > ...
> >   -0   [006] ..s3.   686.180972: wake_up_process 
> > <-call_timer_fn
> >   -0   [006] ..s3.   686.186919: wake_up_process 
> > <-call_timer_fn
> >   -0   [002] ..s3.   686.264049: wake_up_process 
> > <-call_timer_fn
> >   -0   [002] d.h6.   686.515216: wake_up_process 
> > <-kick_pool
> >   -0   [002] d.h6.   686.691386: wake_up_process 
> > <-kick_pool
> > 
> > Then, only function tracer is shown on x86.
> > But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
> > on the same function, it is shown again.
> > 
> >  # echo 'p wake_up_process' >> dynamic_events
> >  # echo 1 > events/kprobes/p_wake_up_process_0/enable
> >  # echo > trace
> >  # less trace
> > ...
> >   -0   [006] ..s2.  2710.345919: p_wake_up_process_0: 
> > (wake_up_process+0x4/0x20)
> >   -0   [006] ..s3.  2710.345923: wake_up_process 
> > <-call_timer_fn
> >   -0   [006] ..s1.  2710.345928: my_direct_func: waking 
> > up rcu_preempt-17
> >   -0   [006] ..s2.  2710.349931: p_wake_up_process_0: 
> > (wake_up_process+0x4/0x20)
> >   -0   [006] ..s3.  2710.349934: wake_up_process 
> > <-call_timer_fn
> >   -0   [006] ..s1.  2710.349937: my_direct_func: waking 
> > up rcu_preempt-17
> > 
> > To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
> > direct_call by default.
> > 
> > Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and 
> > !WITH_REGS")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Masami Hiramatsu (Google) 
> 
> Sorry about this; I hadn't realised that x86 only supported direct calls when
> SAVE_REGS was requested.

Yeah, it was hard to find without my fprobe-on-fgraph series, because
all probes (kprobe/fprobe) and the function-graph tracer use SAVE_REGS.
Only the function tracer hits this issue, but ftrace-direct users usually
do not combine it with the function tracer. So we were lucky to find it :)

> 
> The patch looks good to me. I applied it atop v6.7 and double-checked that 
> this
> still works on arm64 as per your examples above, and everything looks good:
> 
> # mount -t tracefs none /sys/kernel/tracing/
> # insmod ftrace-direct.ko 
> # echo wake_up_process > /sys/kernel/tracing/set_ftrace_filter 
> # echo function > /sys/kernel/tracing/current_tracer 
> # less /sys/kernel/tracing/trace
> ..
>   -0   [007] ..s3.   172.932840: wake_up_process 
> <-process_timeout
>   -0   [007] ..s1.   172.932842: my_direct_func: waking up 
> kcompactd0-62
>   -0   [007] ..s3.   173.444836: wake_up_process 
> <-process_timeout
>   -0   [007] ..s1.   173.444838: my_direct_func: waking up 
> kcompactd0-62
>   -0   [001] d.h5.   173.471116: wake_up_process <-kick_pool
>   -0   [001] d.h3.   173.471118: my_direct_func: waking up 
> kworker/1:1-58
> 
> Reviewed-by: Mark Rutland 
> Tested-by: Mark Rutland  [arm64]

Thank you!

> 
> Thanks,
> Mark.
> 
> > ---
> >  kernel/trace/ftrace.c |   10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index b01ae7d36021..c060d5b47910 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
> >  
> >  static int register_ftrace_function_nolock(struct ftrace_ops *ops);
> >  
> > +/*
> > + * If there are multiple ftrace_ops, use SAVE_REGS by default, so that 
> > direct

Re: [PATCH] ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

2024-01-10 Thread Mark Rutland

On Wed, Jan 10, 2024 at 09:13:06AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) 
> 
> The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
> and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
> are multiple ftrace_ops at the same function, but since the x86 only
> support to jump to direct_call from ftrace_regs_caller, when we set
> the function tracer on the same target function on x86, ftrace-direct
> does not work as below (this actually works on arm64.)
> 
> At first, insmod ftrace-direct.ko to put a direct_call on
> 'wake_up_process()'.
> 
>  # insmod kernel/samples/ftrace/ftrace-direct.ko
>  # less trace
> ...
>   -0   [006] ..s1.   564.686958: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [007] ..s1.   564.687836: my_direct_func: waking up 
> kcompactd0-63
>   -0   [006] ..s1.   564.690926: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [006] ..s1.   564.696872: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [007] ..s1.   565.191982: my_direct_func: waking up 
> kcompactd0-63
> 
> Setup a function filter to the 'wake_up_process' too, and enable it.
> 
>  # cd /sys/kernel/tracing/
>  # echo wake_up_process > set_ftrace_filter
>  # echo function > current_tracer
>  # less trace
> ...
>   -0   [006] ..s3.   686.180972: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s3.   686.186919: wake_up_process 
> <-call_timer_fn
>   -0   [002] ..s3.   686.264049: wake_up_process 
> <-call_timer_fn
>   -0   [002] d.h6.   686.515216: wake_up_process <-kick_pool
>   -0   [002] d.h6.   686.691386: wake_up_process <-kick_pool
> 
> Then, only function tracer is shown on x86.
> But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
> on the same function, it is shown again.
> 
>  # echo 'p wake_up_process' >> dynamic_events
>  # echo 1 > events/kprobes/p_wake_up_process_0/enable
>  # echo > trace
>  # less trace
> ...
>   -0   [006] ..s2.  2710.345919: p_wake_up_process_0: 
> (wake_up_process+0x4/0x20)
>   -0   [006] ..s3.  2710.345923: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s1.  2710.345928: my_direct_func: waking up 
> rcu_preempt-17
>   -0   [006] ..s2.  2710.349931: p_wake_up_process_0: 
> (wake_up_process+0x4/0x20)
>   -0   [006] ..s3.  2710.349934: wake_up_process 
> <-call_timer_fn
>   -0   [006] ..s1.  2710.349937: my_direct_func: waking up 
> rcu_preempt-17
> 
> To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
> direct_call by default.
> 
> Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and 
> !WITH_REGS")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) 

Sorry about this; I hadn't realised that x86 only supported direct calls when
SAVE_REGS was requested.

The patch looks good to me. I applied it atop v6.7 and double-checked that this
still works on arm64 as per your examples above, and everything looks good:

# mount -t tracefs none /sys/kernel/tracing/
# insmod ftrace-direct.ko 
# echo wake_up_process > /sys/kernel/tracing/set_ftrace_filter 
# echo function > /sys/kernel/tracing/current_tracer 
# less /sys/kernel/tracing/trace
... 
  <idle>-0   [007] ..s3.   172.932840: wake_up_process <-process_timeout
  <idle>-0   [007] ..s1.   172.932842: my_direct_func: waking up kcompactd0-62
  <idle>-0   [007] ..s3.   173.444836: wake_up_process <-process_timeout
  <idle>-0   [007] ..s1.   173.444838: my_direct_func: waking up kcompactd0-62
  <idle>-0   [001] d.h5.   173.471116: wake_up_process <-kick_pool
  <idle>-0   [001] d.h3.   173.471118: my_direct_func: waking up kworker/1:1-58

Reviewed-by: Mark Rutland 
Tested-by: Mark Rutland  [arm64]

Thanks,
Mark.

> ---
>  kernel/trace/ftrace.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b01ae7d36021..c060d5b47910 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
>  
>  static int register_ftrace_function_nolock(struct ftrace_ops *ops);
>  
> +/*
> + * If there are multiple ftrace_ops, use SAVE_REGS by default, so that direct
> + * call will be jumped from ftrace_regs_caller. Only if the architecture does
> + * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it
> + * jumps from ftrace_caller for multiple ftrace_ops.
> + */
> +#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS
>  #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
> +#else
> +#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
> +#endif
>  
>  static int check_direct_multi(struct ftrace_ops *ops)
>  {
>
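
The #ifndef in the patch above amounts to a compile-time choice between two
flag sets. A minimal user-space sketch of that choice follows, with the
build-time symbol turned into a runtime parameter so that both branches can be
exercised. The toy_multi_flags() function, the TOY_* names and the bit
positions are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
enum {
    TOY_FL_SAVE_REGS = 1u << 0,
    TOY_FL_SAVE_ARGS = 1u << 1,
    TOY_FL_DIRECT    = 1u << 2,
};

/*
 * Model of the MULTI_FLAGS selection: prefer SAVE_REGS when the
 * architecture provides a regs caller, otherwise fall back to SAVE_ARGS.
 */
unsigned int toy_multi_flags(bool have_regs_caller)
{
    unsigned int flags = TOY_FL_DIRECT;

    flags |= have_regs_caller ? TOY_FL_SAVE_REGS : TOY_FL_SAVE_ARGS;

    /* Exactly one of the two save modes is ever selected. */
    assert(!(flags & TOY_FL_SAVE_REGS) != !(flags & TOY_FL_SAVE_ARGS));
    return flags;
}
```

Calling toy_multi_flags(true) models a build where
HAVE_DYNAMIC_FTRACE_WITH_REGS is defined; toy_multi_flags(false) models a
build where it is not.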

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-10 Thread Christian Brauner

On Mon, Jan 08, 2024 at 10:23:31AM -0500, Steven Rostedt wrote:
> On Mon, 8 Jan 2024 12:04:54 +0100
> Christian Brauner  wrote:
> 
> > > > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > > > redundant and just wrong.  
> > > 
> > > I don't think so.  
> > 
> > I'm very well aware that the dentries and inode aren't created during
> > > mkdir but the complete directory layout is determined. You're just
> > splicing in dentries and inodes during lookup and readdir.
> > 
> > If mkdir /sys/kernel/tracing/instances/foo has succeeded and you later
> > do a lookup/readdir on
> > 
> > ls -al /sys/kernel/tracing/instances/foo/events
> > 
> > Why should the creation of the dentries and inodes ever fail due to a
> > permission failure?
> 
> They shouldn't.
> 
> > The vfs did already verify that you had the required
> > permissions to list entries in that directory. Why should filling up
> > > /sys/kernel/tracing/instances/foo/events ever fail then? It shouldn't.
> > That tracefs instance would be half-functional. And again, right now
> > that inode_permission() check cannot even fail.
> 
> And it shouldn't. But without dentries and inodes, how does VFS know what
> is allowed to open the files?

So say you do:

mkdir /sys/kernel/tracing/instances/foo

After this has returned we know everything we need to know about the new
tracefs instance including the ownership and the mode of all inodes in
/sys/kernel/tracing/instances/foo/events/* and below precisely because
ownership is always inherited from the parent dentry and recorded in the
metadata struct eventfs_inode.

So say someone does:

open("/sys/kernel/tracing/instances/foo/events/xfs");

and say this is the first time that someone accesses that events/
directory.

When the open pathwalk is done, the vfs will determine via

[1] may_lookup(inode_of(events))

whether you are able to list entries such as "xfs" in that directory.
The vfs checks inode_permission(MAY_EXEC) on "events" and if that holds
it ends up calling i_op->lookup(events), which for eventfs is
eventfs_root_lookup().

At this point tracefs/eventfs adds the inodes for all entries in that
"events" directory, including "xfs", based on the metadata it recorded
during the mkdir, since now someone is actually interested in them. It
initializes the inodes with ownership and everything else and adds the
dentries that belong in that directory.

Nothing here depends on the permissions of the caller. The only
permission check that mattered was done by the VFS in [1]. If the caller
has permission to enter a directory they can look up and list its contents.
And its contents were determined and fixed when mkdir was called.

So we just need to add the required objects into the caches (inode,
dentry) whose addition we intentionally deferred until someone actually
needed them.

So, eventfs_root_lookup() now initializes the inodes with the ownership
from the stored metadata or from the parent dentry and splices in inodes
and dentries. No permission checking is needed for this because it is
always a recheck of what the vfs did in [1].

We now return to the vfs and path walk continues to the final component
that you actually want to open which is that "xfs" directory in this
example. We check the permissions on that inode via may_open("xfs") and
we open that directory returning an fd to userspace ultimately.

(I'm going by memory since I need to step out the door.)
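
The walk described above can be modelled in user space. This is a toy sketch
only: the toy_* names, the owner-only permission rule and the flat child array
are assumptions made for illustration and do not reflect the real VFS or
eventfs data structures. It shows the one property being argued for: the
permission check happens once on the parent, and the lazy instantiation of
children depends only on metadata recorded earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAY_EXEC 0100u /* owner search bit of a Unix-style mode */

/* Metadata fixed at "mkdir" time (loosely stands in for eventfs_inode). */
struct toy_meta {
    const char *name;
    unsigned int uid;   /* ownership inherited from the parent at mkdir */
    unsigned int mode;
    bool instantiated;  /* has an inode/dentry been spliced in yet? */
};

struct toy_dir {
    unsigned int uid;
    unsigned int mode;
    struct toy_meta *children;
    size_t nr_children;
};

/* Step [1]: the only permission check, done on the parent directory. */
static bool toy_may_lookup(const struct toy_dir *dir, unsigned int caller_uid)
{
    return caller_uid == 0 ||
           (caller_uid == dir->uid && (dir->mode & TOY_MAY_EXEC) != 0);
}

/* The lookup step: instantiate every entry from metadata, no caller checks. */
static struct toy_meta *toy_lookup(struct toy_dir *dir, const char *name)
{
    struct toy_meta *found = NULL;

    for (size_t i = 0; i < dir->nr_children; i++) {
        struct toy_meta *m = &dir->children[i];

        m->instantiated = true;
        if (strcmp(m->name, name) == 0)
            found = m;
    }
    return found;
}

/* One path-walk component: check the parent once, then fill it in lazily. */
struct toy_meta *toy_walk(struct toy_dir *dir, unsigned int caller_uid,
                          const char *name)
{
    assert(dir != NULL && name != NULL);
    if (!toy_may_lookup(dir, caller_uid))
        return NULL;
    return toy_lookup(dir, name);
}
```

A caller that fails the parent check never reaches toy_lookup(), so nothing is
instantiated on its behalf; a caller that passes always gets the same result
regardless of who it is.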

Re: REGRESSION: lockdep warning triggered by 15b9ce7ecd: virtio_balloon: stay awake while adjusting balloon

2024-01-10 Thread Michael S. Tsirkin

On Mon, Jan 08, 2024 at 04:50:15PM -0500, Theodore Ts'o wrote:
> Hi, while doing final testing before sending a pull request, I merged
> in linux-next, and commit 5b9ce7ecd7: virtio_balloon: stay awake while
> adjusting balloon seems to be causing a lockdep warning (see attached)
> when running gce-xfstests on a Google Compute Engine e2 VM.  I was not
> able to trigger it using kvm-xfstests, but the following command:
> "gce-xfstests -C 10 ext4/4k generic/476" was sufficient to trigger the
> problem.   For more information please see [1] and [2].
> 
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> [2] https://thunk.org/gce-xfstests
> 
> I found it by looking at the git logs, and this commit aroused my
> suspicions, and further testing showed that the lockdep warning was
> reproducible with this commit, but not when testing with the
> immediately preceding commit (5b9ce7ecd7^).
> 
> Cheers,


Thanks a lot for the report!
I pushed a fixed patch out (tree rebased).
Would be great if you can confirm it's all right now.

>   - Ted
> 
> 
> root: ext4/4k run xfstest generic/476
> systemd[1]: Started fstests-generic-476.scope - /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/476.
> kernel: [  399.361181] EXT4-fs (dm-1): mounted filesystem 840e25bd-f650-4819-8562-7eded85ef370 r/w with ordered data mode. Quota mode: none.
> systemd[1]: fstests-generic-476.scope: Deactivated successfully.
> systemd[1]: fstests-generic-476.scope: Consumed 3min 1.966s CPU time.
> systemd[1]: xt\x2dvdb.mount: Deactivated successfully.
> kernel: [  537.085404] EXT4-fs (dm-0): unmounting filesystem d3d7a675-f7b6-4384-abec-2e60d885b6da.
> systemd[1]: xt\x2dvdc.mount: Deactivated successfully.
> kernel: [  540.565870] 
> kernel: [  540.567523] 
> kernel: [  540.572007] WARNING: inconsistent lock state
> kernel: [  540.576407] 6.7.0-rc3-xfstests-lockdep-00012-g5b9ce7ecd715 #318 Not tainted
> kernel: [  540.583532] 
> kernel: [  540.587928] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> kernel: [  540.594326] kworker/0:3/329 [HC0[0]:SC0[0]:HE1:SE1] takes:
> kernel: [  540.599955] 90b280a548c0 (&vb->adjustment_lock){?...}-{2:2}, at: update_balloon_size_func+0x33/0x190
> kernel: [  540.609926] {IN-HARDIRQ-W} state was registered at:
> kernel: [  540.614935]   __lock_acquire+0x3f2/0xb30
> kernel: [  540.618992]   lock_acquire+0xbf/0x2b0
> kernel: [  540.622786]   _raw_spin_lock_irqsave+0x43/0x90
> kernel: [  540.627366]   virtballoon_changed+0x51/0xd0
> kernel: [  540.631947]   virtio_config_changed+0x5a/0x70
> kernel: [  540.636437]   vp_config_changed+0x11/0x20
> kernel: [  540.640576]   __handle_irq_event_percpu+0x88/0x230
> kernel: [  540.645500]   handle_irq_event+0x38/0x80
> kernel: [  540.649558]   handle_edge_irq+0x8f/0x1f0
> kernel: [  540.653791]   __common_interrupt+0x47/0xf0
> kernel: [  540.658106]   common_interrupt+0x79/0xa0
> kernel: [  540.661672] EXT4-fs (dm-1): unmounting filesystem 840e25bd-f650-4819-8562-7eded85ef370.
> kernel: [  540.663183]   asm_common_interrupt+0x26/0x40
> kernel: [  540.663190]   acpi_safe_halt+0x1b/0x30
> kernel: [  540.663196]   acpi_idle_enter+0x7b/0xd0
> kernel: [  540.663199]   cpuidle_enter_state+0x90/0x4f0
> kernel: [  540.688723]   cpuidle_enter+0x2d/0x40
> kernel: [  540.692516]   cpuidle_idle_call+0xe4/0x120
> kernel: [  540.697036]   do_idle+0x84/0xd0
> kernel: [  540.700393]   cpu_startup_entry+0x2a/0x30
> kernel: [  540.704588]   rest_init+0xe9/0x180
> kernel: [  540.708118]   arch_call_rest_init+0xe/0x30
> kernel: [  540.712426]   start_kernel+0x41c/0x4b0
> kernel: [  540.716310]   x86_64_start_reservations+0x18/0x30
> kernel: [  540.721164]   x86_64_start_kernel+0x8c/0x90
> kernel: [  540.725737]   secondary_startup_64_no_verify+0x178/0x17b
> kernel: [  540.731432] irq event stamp: 22681
> kernel: [  540.734956] hardirqs last  enabled at (22681): [] _raw_spin_unlock_irq+0x28/0x50
> kernel: [  540.744564] hardirqs last disabled at (22680): [] _raw_spin_lock_irq+0x5d/0x90
> kernel: [  540.753475] softirqs last  enabled at (22076): [] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [  540.762904] softirqs last disabled at (22072): [] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [  540.773298] 
> kernel: [  540.773298] other info that might help us debug this:
> kernel: [  540.780207]  Possible unsafe locking scenario:
> kernel: [  540.780207] 
> kernel: [  540.786438]        CPU0
> kernel: [  540.789007]        ----
> kernel: [  540.791766]   lock(&vb->adjustment_lock);
> kernel: [  540.796014]   <Interrupt>
> kernel: [  540.798778]     lock(&vb->adjustment_lock);
> kernel: [  540.803605] 
> kernel: [  540.803605]  *** DEADLOCK ***
> kernel: [  540.803605] 
> kernel: [  540.809840] 2 locks held by kworker/0:3/329:
> kernel: [  540.814259]  #0: 90b280079148

Re: [PATCH] driver/virtio: Add Memory Balloon Support for SEV/SEV-ES

2024-01-10 Thread Michael S. Tsirkin

On Wed, Jan 10, 2024 at 02:22:42PM +0800, Zheyun Shen wrote:
> For now, SEV pins guest's memory to avoid swapping or
> moving ciphertext, but leading to the inhibition of
> Memory Ballooning.
> 
> In Memory Ballooning, only guest's free pages will be relocated
> in balloon inflation and deflation, so the difference of plaintext
> doesn't matter to guest.
> 
> Memory Ballooning is a nice memory overcommitment technology can
> be used in CVM based on SEV and SEV-ES, so userspace tools can
> provide an option to allow SEV not to pin memory and enable 
> Memory Ballooning. Guest kernel may not inhibit Balloon and 
> should set shared memory for Balloon decrypted.
> 
> Signed-off-by: Zheyun Shen 

Sorry I don't get what you are saying at all.
Please format the commit log along the following lines:

Currently .
This is bad because ...
To fix ...
As a result ...


> ---
>  drivers/virtio/virtio_balloon.c | 18 ++
>  drivers/virtio/virtio_ring.c|  7 +++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 1fe93e93f..aca4c8a58 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -18,6 +18,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +#include 
> +#endif
>  
>  /*
>   * Balloon device works in 4K page units.  So each page is pointed to by
> @@ -870,6 +873,9 @@ static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  struct virtio_balloon *vb;
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +size_t vb_size = PAGE_ALIGN(sizeof(*vb));
> +#endif
>  int err;
>  
>  if (!vdev->config->get) {
> @@ -878,11 +884,19 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  return -EINVAL;
>  }
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +vdev->priv = vb = kzalloc(vb_size, GFP_KERNEL);
> +#else
>  vdev->priv = vb = kzalloc(sizeof(*vb), GFP_KERNEL);
> +#endif
>  if (!vb) {
>  err = -ENOMEM;
>  goto out;
>  }
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +set_memory_decrypted((unsigned long)vb, vb_size / PAGE_SIZE);
> +memset(vb, 0, vb_size);
> +#endif
>  
>  INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func);
>  INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
> @@ -1101,7 +1115,11 @@ static int virtballoon_validate(struct virtio_device *vdev)
>  else if (!virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON))
>  __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_REPORTING);
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +__virtio_set_bit(vdev, VIRTIO_F_ACCESS_PLATFORM);
> +#else
>  __virtio_clear_bit(vdev, VIRTIO_F_ACCESS_PLATFORM);
> +#endif
>  return 0;
>  }
>  
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 49299b1f9..875612a2e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -14,6 +14,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +#include 
> +#endif
>  
>  #ifdef DEBUG
>  /* For development, we want to crash whenever the ring is screwed. */
> @@ -321,6 +324,10 @@ static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
>  if (queue) {
>  phys_addr_t phys_addr = virt_to_phys(queue);
>  *dma_handle = (dma_addr_t)phys_addr;
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +set_memory_decrypted((unsigned long)queue, 
> PAGE_ALIGN(size) / PAGE_SIZE);
> +memset(queue, 0, PAGE_ALIGN(size));
> +#endif
>  
>  /*
>   * Sanity check: make sure we dind't truncate

No way I am going to spread CONFIG_AMD_MEM_ENCRYPT all over the place
like this.


> --
> 2.34.1
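
As background to the objection above: the kernel coding style generally
prefers confining conditional compilation to a single helper with a no-op
stub, so that call sites stay free of #ifdef blocks. The following user-space
sketch illustrates only that idiom. Every name in it (TOY_CONFIG_MEM_ENCRYPT,
toy_share_with_host(), toy_alloc_queue()) is hypothetical, and it is neither a
proposed fix nor something the reviewer asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE ((size_t)4096)

/* The only place where the (hypothetical) config symbol is tested. */
#ifdef TOY_CONFIG_MEM_ENCRYPT
static inline int toy_share_with_host(void *addr, size_t len)
{
    /* Stand-in for marking the range shared, then clearing it. */
    memset(addr, 0, len);
    return 0;
}
#else
static inline int toy_share_with_host(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return 0; /* nothing to do when the feature is compiled out */
}
#endif

/* Round a length up to a whole number of toy pages. */
size_t toy_page_align(size_t len)
{
    return (len + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* The call site stays free of #ifdef: it calls the helper unconditionally. */
void *toy_alloc_queue(size_t size)
{
    size_t len = toy_page_align(size);
    void *queue = calloc(1, len);

    assert(size > 0);
    if (queue != NULL && toy_share_with_host(queue, len) != 0) {
        free(queue);
        return NULL;
    }
    return queue;
}
```

With this shape, enabling or disabling the feature changes one helper
definition rather than every allocation path.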
