Re: [PATCH v2] eventfs: Fix file and directory uid and gid ownership

2023-12-21 Thread Google
On Thu, 21 Dec 2023 19:07:57 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> It was reported that when mounting the tracefs file system with a gid
> other than root, the ownership did not carry down to the eventfs directory
> due to the dynamic nature of it.
> 
> A fix was done to solve this, but it had two issues.
> 
> (a) if the attr passed into update_inode_attr() was NULL, it didn't do
> anything. This is true for files that have not had a chown or chgrp
> done to itself or any of its sibling files, as the attr is allocated
> for all children when any one needs it.
> 
>  # umount /sys/kernel/tracing
>  # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt
> 
>  # ls -ld /mnt/events/sched
> drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/
> 
>  # ls -ld /mnt/events/sched/sched_switch
> drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/
> 
> But when checking the files:
> 
>  # ls -l /mnt/events/sched/sched_switch
> total 0
> -rw-r- 1 root root 0 Dec 21 13:12 enable
> -rw-r- 1 root root 0 Dec 21 13:12 filter
> -r--r- 1 root root 0 Dec 21 13:12 format
> -r--r- 1 root root 0 Dec 21 13:12 hist
> -r--r- 1 root root 0 Dec 21 13:12 id
> -rw-r- 1 root root 0 Dec 21 13:12 trigger
> 
> (b) When the attr does not denote the UID or GID, it defaulted to using
> the parent uid or gid. This is incorrect as changing the parent
> uid or gid will automatically change all its children.
> 
>  # chgrp tracing /mnt/events/timer
> 
>  # ls -ld /mnt/events/timer
> drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer
> 
>  # ls -l /mnt/events/timer
> total 0
> -rw-r- 1 root root0 Dec 21 14:35 enable
> -rw-r- 1 root root0 Dec 21 14:35 filter
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init
> drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start
> 
> At first it was thought that this could be easily fixed by just making the
> default ownership of the superblock when it was mounted. But this does not
> handle the case of:
> 
>  # chgrp tracing instances
>  # mkdir instances/foo
> 
> If the superblock was used, then the group ownership would be that of what
> it was when it was mounted, when it should instead be "tracing".
> 
> Instead, set a flag for the top level eventfs directory ("events") to flag
> which eventfs_inode belongs to it.
> 
> Since the "events" directory's dentry and inode are never freed, it does
> not need to use its attr field to restore its mode and ownership. Use the
> this eventfs_inode's attr as the default ownership for all the files and
> directories underneath it.
> 
> When the events eventfs_inode is created, it sets its ownership to its
> parent uid and gid. As the events directory is created at boot up before
> it gets mounted, this will always be uid=0 and gid=0. If it's created via
> an instance, then it will take the ownership of the instance directory.
> 
> When the file system is mounted, it will update all the gids if one is
> specified. This will have a callback to update the events evenfs_inode's
> default entries.
> 
> When a file or directory is created under the events directory, it will
> walk the ei->dentry parents until it finds the evenfs_inode that belongs
> to the events directory to retrieve the default uid and gid values.
> 
> Link: 
> https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 
Tested-by: Masami Hiramatsu (Google) 

Thank you,

> Cc: sta...@vger.kernel.org
> Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to 
> parent uid and gid")
> Reported-by: Linus Torvalds 
> Signed-off-by: Steven Rostedt (Google) 
> ---
> Changes since v1: 
> https://lore.kernel.org/linux-trace-kernel/20231221173812.51fe6...@gandalf.local.home
> 
> - While writing kselftests to test ownership, I found a bug. The remount
>   

[PATCH v2] eventfs: Fix file and directory uid and gid ownership

2023-12-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

It was reported that when mounting the tracefs file system with a gid
other than root, the ownership did not carry down to the eventfs directory
due to the dynamic nature of it.

A fix was done to solve this, but it had two issues.

(a) if the attr passed into update_inode_attr() was NULL, it didn't do
anything. This is true for files that have not had a chown or chgrp
done to itself or any of its sibling files, as the attr is allocated
for all children when any one needs it.

 # umount /sys/kernel/tracing
 # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt

 # ls -ld /mnt/events/sched
drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/

 # ls -ld /mnt/events/sched/sched_switch
drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/

But when checking the files:

 # ls -l /mnt/events/sched/sched_switch
total 0
-rw-r- 1 root root 0 Dec 21 13:12 enable
-rw-r- 1 root root 0 Dec 21 13:12 filter
-r--r- 1 root root 0 Dec 21 13:12 format
-r--r- 1 root root 0 Dec 21 13:12 hist
-r--r- 1 root root 0 Dec 21 13:12 id
-rw-r- 1 root root 0 Dec 21 13:12 trigger

(b) When the attr does not denote the UID or GID, it defaulted to using
    the parent uid or gid. This is incorrect as changing the parent
uid or gid will automatically change all its children.

 # chgrp tracing /mnt/events/timer

 # ls -ld /mnt/events/timer
drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer

 # ls -l /mnt/events/timer
total 0
-rw-r- 1 root root0 Dec 21 14:35 enable
-rw-r- 1 root root0 Dec 21 14:35 filter
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start

At first it was thought that this could be easily fixed by just making the
default ownership of the superblock when it was mounted. But this does not
handle the case of:

 # chgrp tracing instances
 # mkdir instances/foo

If the superblock was used, then the group ownership would be that of what
it was when it was mounted, when it should instead be "tracing".

Instead, set a flag for the top level eventfs directory ("events") to flag
which eventfs_inode belongs to it.

Since the "events" directory's dentry and inode are never freed, it does
not need to use its attr field to restore its mode and ownership. Use the
this eventfs_inode's attr as the default ownership for all the files and
directories underneath it.

When the events eventfs_inode is created, it sets its ownership to its
parent uid and gid. As the events directory is created at boot up before
it gets mounted, this will always be uid=0 and gid=0. If it's created via
an instance, then it will take the ownership of the instance directory.

When the file system is mounted, it will update all the gids if one is
specified. This will have a callback to update the events evenfs_inode's
default entries.

When a file or directory is created under the events directory, it will
walk the ei->dentry parents until it finds the evenfs_inode that belongs
to the events directory to retrieve the default uid and gid values.

Link: 
https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/

Cc: sta...@vger.kernel.org
Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to 
parent uid and gid")
Reported-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20231221173812.51fe6...@gandalf.local.home

- While writing kselftests to test ownership, I found a bug. The remount
  option allows for an update of the gid. If it is specified, then all
  dentries are traversed in tracefs and eventfs and the gid is updated.

  The bug is that a eventfs that does not have a dentry, but had its gid
  updated, the old gid is still stored, and when the dentry is created, it
  will use the stored gid.

  Instead, check all eventfs_inodes that do not have a dentry, and traverse
  its children to make sure their gids are updated as well.

 fs/tracefs/event_inode.c | 105 +++
 fs/tracefs/inode.c   |   6 +++
 fs/tracefs/internal.h|   2 +
 3 files changed, 103 insertions(+), 10 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/

[PATCH] eventfs: Fix file and directory uid and gid ownership

2023-12-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

It was reported that when mounting the tracefs file system with a gid
other than root, the ownership did not carry down to the eventfs directory
due to the dynamic nature of it.

A fix was done to solve this, but it had two issues.

(a) if the attr passed into update_inode_attr() was NULL, it didn't do
anything. This is true for files that have not had a chown or chgrp
done to itself or any of its sibling files, as the attr is allocated
for all children when any one needs it.

 # umount /sys/kernel/tracing
 # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt

 # ls -ld /mnt/events/sched
drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/

 # ls -ld /mnt/events/sched/sched_switch
drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/

But when checking the files:

 # ls -l /mnt/events/sched/sched_switch
total 0
-rw-r- 1 root root 0 Dec 21 13:12 enable
-rw-r- 1 root root 0 Dec 21 13:12 filter
-r--r- 1 root root 0 Dec 21 13:12 format
-r--r- 1 root root 0 Dec 21 13:12 hist
-r--r- 1 root root 0 Dec 21 13:12 id
-rw-r- 1 root root 0 Dec 21 13:12 trigger

(b) When the attr does not denote the UID or GID, it defaulted to using
    the parent uid or gid. This is incorrect as changing the parent
uid or gid will automatically change all its children.

 # chgrp tracing /mnt/events/timer

 # ls -ld /mnt/events/timer
drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer

 # ls -l /mnt/events/timer
total 0
-rw-r- 1 root root0 Dec 21 14:35 enable
-rw-r- 1 root root0 Dec 21 14:35 filter
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start

At first it was thought that this could be easily fixed by just making the
default ownership of the superblock when it was mounted. But this does not
handle the case of:

 # chgrp tracing instances
 # mkdir instances/foo

If the superblock was used, then the group ownership would be that of what
it was when it was mounted, when it should instead be "tracing".

Instead, set a flag for the top level eventfs directory ("events") to flag
which eventfs_inode belongs to it.

Since the "events" directory's dentry and inode are never freed, it does
not need to use its attr field to restore its mode and ownership. Use the
this eventfs_inode's attr as the default ownership for all the files and
directories underneath it.

When the events eventfs_inode is created, it sets its ownership to its
parent uid and gid. As the events directory is created at boot up before
it gets mounted, this will always be uid=0 and gid=0. If it's created via
an instance, then it will take the ownership of the instance directory.

When the file system is mounted, it will update all the gids if one is
specified. This will have a callback to update the events evenfs_inode's
default entries.

When a file or directory is created under the events directory, it will
walk the ei->dentry parents until it finds the evenfs_inode that belongs
to the events directory to retrieve the default uid and gid values.

Link: 
https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/

Cc: sta...@vger.kernel.org
Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to 
parent uid and gid")
Reported-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 74 ++--
 fs/tracefs/inode.c   |  6 
 fs/tracefs/internal.h|  2 ++
 3 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 2ccc849a5bda..2d6b11195420 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -113,7 +113,14 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, 
struct dentry *dentry,
 * determined by the parent directory.
 */
if (dentry->d_inode->i_mode & S_IFDIR) {
-   update_attr(&ei->attr, iattr);
+   /*
+* The events directory dentry is never freed, unless its
+* part of an instance that is deleted. It's attr is the
+* default for its child fi

Re: [PATCH] eventfs: Have event files and directories default to parent uid and gid

2023-12-21 Thread Google
On Wed, 20 Dec 2023 10:50:17 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> Dongliang reported:
> 
>   I found that in the latest version, the nodes of tracefs have been
>   changed to dynamically created.
> 
>   This has caused me to encounter a problem where the gid I specified in
>   the mounting parameters cannot apply to all files, as in the following
>   situation:
> 
>   /data/tmp/events # mount | grep tracefs
>   tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)
> 
>   gid 3012 = readtracefs
> 
>   /data/tmp # ls -lh
>   total 0
>   -r--r-   1 root readtracefs 0 1970-01-01 08:00 README
>   -r--r-   1 root readtracefs 0 1970-01-01 08:00 available_events
> 
>   ums9621_1h10:/data/tmp/events # ls -lh
>   total 0
>   drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
>   drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc
> 
>   It will prevent certain applications from accessing tracefs properly, I
>   try to avoid this issue by making the following modifications.
> 
> To fix this, have the files created default to taking the ownership of
> the parent dentry unless the ownership was previously set by the user.
> 
> Link: 
> https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/
> 

This looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you!

> Reported-by: Dongliang Cui 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  fs/tracefs/event_inode.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 43e237864a42..2ccc849a5bda 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -148,7 +148,8 @@ static const struct file_operations 
> eventfs_file_operations = {
>   .release= eventfs_release,
>  };
>  
> -static void update_inode_attr(struct inode *inode, struct eventfs_attr 
> *attr, umode_t mode)
> +static void update_inode_attr(struct dentry *dentry, struct inode *inode,
> +   struct eventfs_attr *attr, umode_t mode)
>  {
>   if (!attr) {
>   inode->i_mode = mode;
> @@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, 
> struct eventfs_attr *attr, um
>  
>   if (attr->mode & EVENTFS_SAVE_UID)
>   inode->i_uid = attr->uid;
> + else
> + inode->i_uid = d_inode(dentry->d_parent)->i_uid;
>  
>   if (attr->mode & EVENTFS_SAVE_GID)
>   inode->i_gid = attr->gid;
> + else
> + inode->i_gid = d_inode(dentry->d_parent)->i_gid;
>  }
>  
>  /**
> @@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, 
> umode_t mode,
>   return eventfs_failed_creating(dentry);
>  
>   /* If the user updated the directory's attributes, use them */
> - update_inode_attr(inode, attr, mode);
> + update_inode_attr(dentry, inode, attr, mode);
>  
>   inode->i_op = &eventfs_file_inode_operations;
>   inode->i_fop = fop;
> @@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode 
> *ei, struct dentry *parent
>   return eventfs_failed_creating(dentry);
>  
>   /* If the user updated the directory's attributes, use them */
> - update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | 
> S_IXUGO);
> + update_inode_attr(dentry, inode, &ei->attr,
> +   S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
>  
>   inode->i_op = &eventfs_root_dir_inode_operations;
>   inode->i_fop = &eventfs_file_operations;
> -- 
> 2.42.0
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH] eventfs: Have event files and directories default to parent uid and gid

2023-12-20 Thread Steven Rostedt
On Wed, 20 Dec 2023 10:50:17 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> Dongliang reported:
> 
>   I found that in the latest version, the nodes of tracefs have been
>   changed to dynamically created.
> 
>   This has caused me to encounter a problem where the gid I specified in
>   the mounting parameters cannot apply to all files, as in the following
>   situation:
> 
>   /data/tmp/events # mount | grep tracefs
>   tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)
> 
>   gid 3012 = readtracefs
> 
>   /data/tmp # ls -lh
>   total 0
>   -r--r-   1 root readtracefs 0 1970-01-01 08:00 README
>   -r--r-   1 root readtracefs 0 1970-01-01 08:00 available_events
> 
>   ums9621_1h10:/data/tmp/events # ls -lh
>   total 0
>   drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
>   drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc
> 
>   It will prevent certain applications from accessing tracefs properly, I
>   try to avoid this issue by making the following modifications.
> 
> To fix this, have the files created default to taking the ownership of
> the parent dentry unless the ownership was previously set by the user.
> 
> Link: 
> https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/
> 

I forgot to add:

  Cc: sta...@vger.kernel.org
  Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode")


-- Steve

> Reported-by: Dongliang Cui 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  fs/tracefs/event_inode.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 43e237864a42..2ccc849a5bda 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -148,7 +148,8 @@ static const struct file_operations 
> eventfs_file_operations = {
>   .release= eventfs_release,
>  };
>  
> -static void update_inode_attr(struct inode *inode, struct eventfs_attr 
> *attr, umode_t mode)
> +static void update_inode_attr(struct dentry *dentry, struct inode *inode,
> +   struct eventfs_attr *attr, umode_t mode)
>  {
>   if (!attr) {
>   inode->i_mode = mode;
> @@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, 
> struct eventfs_attr *attr, um
>  
>   if (attr->mode & EVENTFS_SAVE_UID)
>   inode->i_uid = attr->uid;
> + else
> + inode->i_uid = d_inode(dentry->d_parent)->i_uid;
>  
>   if (attr->mode & EVENTFS_SAVE_GID)
>   inode->i_gid = attr->gid;
> + else
> + inode->i_gid = d_inode(dentry->d_parent)->i_gid;
>  }
>  
>  /**
> @@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, 
> umode_t mode,
>   return eventfs_failed_creating(dentry);
>  
>   /* If the user updated the directory's attributes, use them */
> - update_inode_attr(inode, attr, mode);
> + update_inode_attr(dentry, inode, attr, mode);
>  
>   inode->i_op = &eventfs_file_inode_operations;
>   inode->i_fop = fop;
> @@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode 
> *ei, struct dentry *parent
>   return eventfs_failed_creating(dentry);
>  
>   /* If the user updated the directory's attributes, use them */
> - update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | 
> S_IXUGO);
> + update_inode_attr(dentry, inode, &ei->attr,
> +   S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
>  
>   inode->i_op = &eventfs_root_dir_inode_operations;
>   inode->i_fop = &eventfs_file_operations;




[PATCH] eventfs: Have event files and directories default to parent uid and gid

2023-12-20 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Dongliang reported:

  I found that in the latest version, the nodes of tracefs have been
  changed to dynamically created.

  This has caused me to encounter a problem where the gid I specified in
  the mounting parameters cannot apply to all files, as in the following
  situation:

  /data/tmp/events # mount | grep tracefs
  tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)

  gid 3012 = readtracefs

  /data/tmp # ls -lh
  total 0
  -r--r-   1 root readtracefs 0 1970-01-01 08:00 README
  -r--r-   1 root readtracefs 0 1970-01-01 08:00 available_events

  ums9621_1h10:/data/tmp/events # ls -lh
  total 0
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc

  It will prevent certain applications from accessing tracefs properly, I
  try to avoid this issue by making the following modifications.

To fix this, have the files created default to taking the ownership of
the parent dentry unless the ownership was previously set by the user.

Link: 
https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/

Reported-by: Dongliang Cui 
Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 43e237864a42..2ccc849a5bda 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -148,7 +148,8 @@ static const struct file_operations eventfs_file_operations 
= {
.release= eventfs_release,
 };
 
-static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, 
umode_t mode)
+static void update_inode_attr(struct dentry *dentry, struct inode *inode,
+ struct eventfs_attr *attr, umode_t mode)
 {
if (!attr) {
inode->i_mode = mode;
@@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, struct 
eventfs_attr *attr, um
 
if (attr->mode & EVENTFS_SAVE_UID)
inode->i_uid = attr->uid;
+   else
+   inode->i_uid = d_inode(dentry->d_parent)->i_uid;
 
if (attr->mode & EVENTFS_SAVE_GID)
inode->i_gid = attr->gid;
+   else
+   inode->i_gid = d_inode(dentry->d_parent)->i_gid;
 }
 
 /**
@@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, umode_t 
mode,
return eventfs_failed_creating(dentry);
 
/* If the user updated the directory's attributes, use them */
-   update_inode_attr(inode, attr, mode);
+   update_inode_attr(dentry, inode, attr, mode);
 
inode->i_op = &eventfs_file_inode_operations;
inode->i_fop = fop;
@@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode *ei, 
struct dentry *parent
return eventfs_failed_creating(dentry);
 
/* If the user updated the directory's attributes, use them */
-   update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | 
S_IXUGO);
+   update_inode_attr(dentry, inode, &ei->attr,
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
 
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
-- 
2.42.0




Re: [PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0

2021-04-20 Thread Linus Torvalds
On Tue, Apr 20, 2021 at 9:47 AM Christian Brauner
 wrote:
>
> If there's no objections then Linus can probably just pick up the single
> patch here directly:

I'm ok with that assuming I don't hear any last-minute concerns..

I'll plan on appling it later today, so anybody with concerns please
holler quickly.

I don't want to leave it to much later in the week, and I may or may
not be functional tomorrow (getting my second vaccine shot, some
people react more strongly to it, so I'm leaving the possibility open
of not getting out of bed ;)

 Linus


Re: [PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0

2021-04-20 Thread Christian Brauner
On Tue, Apr 20, 2021 at 08:43:34AM -0500, Serge Hallyn wrote:
> cap_setfcap is required to create file capabilities.
> 
> Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> process running as uid 0 but without cap_setfcap is able to work around
> this as follows: unshare a new user namespace which maps parent uid 0
> into the child namespace.  While this task will not have new
> capabilities against the parent namespace, there is a loophole due to
> the way namespaced file capabilities are represented as xattrs.  File
> capabilities valid in userns 1 are distinguished from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> restricted root process can unshare a new self-mapping namespace, add a
> namespaced file capability onto a file, then use that file capability in
> the parent namespace.
> 
> To prevent that, do not allow mapping parent uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
> 
> As a further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
> 
> With this patch:
> 
> 1. Unprivileged user can still unshare -Ur
> 
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
> 
> 2. Root user can still unshare -Ur
> 
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
> 
> 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> 
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
> 
> Note: an alternative solution would be to allow uid 0 mappings by
> processes without CAP_SETFCAP, but to prevent such a namespace from
> writing any file capabilities.  This approach can be seen here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> 
> History:
> 
> Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file
> capabilities") tried to fix the issue by preventing v3 fscaps to be
> written to disk when the root uid would map to the same uid in nested
> user namespaces. This led to regressions for various workloads. For
> example, see [1]. Ultimately this is a valid use-case we have to support
> meaning we had to revert this change in 3b0c2d3eaa83 ("Revert
> 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file
> capabilities")").
> 
> [1]: https://github.com/containers/buildah/issues/3071
> 
> Signed-off-by: Serge Hallyn 
> Reviewed-by: Andrew G. Morgan 
> Tested-by: Christian Brauner 
> Reviewed-by: Christian Brauner 
> Tested-by: Giuseppe Scrivano 
> Cc: "Eric W. Biederman" 

If there's no objections then Linus can probably just pick up the single
patch here directly:
https://lore.kernel.org/lkml/20210420134334.ga11...@mail.hallyn.com

I'm not sure it's worth waiting and releasing another kernel with this
bug. This tigthens the semantics nicely and makes for a simple check at
userns creation time instead of repeatedly checking at setxattr(). With
all the testing done we can be quite confident the risk of regressions
is way lower than the old patch and even if we see one I think this
version of the fix is actually worth the risk.

Christian


[PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0

2021-04-20 Thread Serge E. Hallyn
cap_setfcap is required to create file capabilities.

Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
process running as uid 0 but without cap_setfcap is able to work around
this as follows: unshare a new user namespace which maps parent uid 0
into the child namespace.  While this task will not have new
capabilities against the parent namespace, there is a loophole due to
the way namespaced file capabilities are represented as xattrs.  File
capabilities valid in userns 1 are distinguished from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore the
restricted root process can unshare a new self-mapping namespace, add a
namespaced file capability onto a file, then use that file capability in
the parent namespace.

To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the capability
for setting file capabilities.

As a further wrinkle:  a task can unshare its user namespace, then
open its uid_map file itself, and map (only) its own uid.  In this
case we do not have the credential from before unshare,  which was
potentially more restricted.  So, when creating a user namespace, we
record whether the creator had CAP_SETFCAP.  Then we can use that
during map_write().

With this patch:

1. Unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. Root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. Root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities.  This approach can be seen here:

https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4

History:

Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file
capabilities") tried to fix the issue by preventing v3 fscaps to be
written to disk when the root uid would map to the same uid in nested
user namespaces. This led to regressions for various workloads. For
example, see [1]. Ultimately this is a valid use-case we have to support
meaning we had to revert this change in 3b0c2d3eaa83 ("Revert
95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file
capabilities")").

[1]: https://github.com/containers/buildah/issues/3071

Signed-off-by: Serge Hallyn 
Reviewed-by: Andrew G. Morgan 
Tested-by: Christian Brauner 
Reviewed-by: Christian Brauner 
Tested-by: Giuseppe Scrivano 
Cc: "Eric W. Biederman" 

Changelog:
   * fix logic in the case of writing to another task's uid_map
   * rename 'ns' to 'map_ns', and make a file_ns local variable
   * use /* comments */
   * update the CAP_SETFCAP comment in capability.h
   * rename parent_unpriv to parent_can_setfcap (and reverse the
 logic)
   * remove printks
   * clarify (i hope) the code comments
   * update capability.h comment
   * renamed parent_can_setfcap to parent_could_setfcap
   * made the check its own disallowed_0_mapping() fn
   * moved the check into new_idmap_permitted
   * rename disallowed_0_mapping to verify_root_mapping
   * change verify_root_mapping to Christian's suggested flow
   * correct+clarify comments: parent uid 0 mapping to any
 child uid is a problem.
   * remove unused lower_first variable.
---
 include/linux/user_namespace.h  |  3 ++
 include/uapi/linux/capability.h |  3 +-
 kernel/user_namespace.c | 65 +++--
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f6c5f784be5a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,9 @@ struct user_namespace {
kgid_t  group;
struct ns_commonns;
unsigned long   flags;
+   /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
+* in its effective capability set at the child ns creation time. */
+   boolparent_could_setfcap;
 
 #ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index c6ca33034147..2ddb4226cd23 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_CONTROL30
 
-/* Set or remove ca

Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-20 Thread Serge E. Hallyn
On Mon, Apr 19, 2021 at 05:52:39PM +0200, Giuseppe Scrivano wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > Guiseppe can you take a look at this?
> >
> > This is a second attempt at tightening up the semantics of writing to
> > file capabilities from a user namespace.
> >
> > The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c
> > ("capabilities: Don't allow writing ambiguous v3 file capabilities")"),
> > which corrected the issue reported in:
> > https://github.com/containers/buildah/issues/3071
> >
> > There is a report the podman testsuite passes.  While different this
> > looks in many ways much more strict than the code that was reverted.  So
> > while I can imagine this change doesn't cause problems as is, I will be
> > surprised.
> 
> thanks for pulling me in the discussion.
> 
> I've tested the patch with several cases similar to the issue we had in
> the past and the patch seems to work well.  
> 
> Podman creates all the user namespaces within the same parent user
> namespace.  In the parent user namespace all the capabilities are kept
> and AFAIK Docker does the same.  I'd expect a change in behavior only
> for nested user namespaces in containers where CAP_SETFCAP is not
> granted, but that is not a common configuration given that CAP_SETFCAP
> is added by default.
> 
> 
> > "Serge E. Hallyn"  writes:
> >
> >> +/**
> >> + * verify_root_map() - check the uid 0 mapping
> >> + * @file: idmapping file
> >> + * @map_ns: user namespace of the target process
> >> + * @new_map: requested idmap
> >> + *
> >> + * If a process requested a mapping for uid 0 onto uid 0, verify that the
> >> + * process writing the map had the CAP_SETFCAP capability as the target 
> >> process
> >> + * will be able to write fscaps that are valid in ancestor user 
> >> namespaces.
> >> + *
> >> + * Return: true if the mapping is allowed, false if not.
> >> + */
> >> +static bool verify_root_map(const struct file *file,
> >> +  struct user_namespace *map_ns,
> >> +  struct uid_gid_map *new_map)
> >> +{
> >> +  int idx;
> >> +  const struct user_namespace *file_ns = file->f_cred->user_ns;
> >> +  struct uid_gid_extent *extent0 = NULL;
> >> +
> >> +  for (idx = 0; idx < new_map->nr_extents; idx++) {
> >> +  u32 lower_first;
> 
> nit: lower_first seems unused?

Drat - I noticed that Sunday or Monday and forgot to remove it, thanks.

> >> +
> >> +  if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> >> +  extent0 = &new_map->extent[idx];
> >> +  else
> >> +  extent0 = &new_map->forward[idx];
> >> +  if (extent0->lower_first == 0)
> >> +  break;
> >> +
> >> +  extent0 = NULL;
> >> +  }
> 
> Tested-by: Giuseppe Scrivano 

Awesome - thanks for testing.


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-20 Thread Christian Brauner
On Mon, Apr 19, 2021 at 10:42:08PM -0500, Serge Hallyn wrote:
> On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote:
> > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> > > cap_setfcap is required to create file capabilities.
> > > 
> > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> > > process running as uid 0 but without cap_setfcap is able to work around
> > > this as follows: unshare a new user namespace which maps parent uid 0
> > > into the child namespace.  While this task will not have new
> > > capabilities against the parent namespace, there is a loophole due to
> > > the way namespaced file capabilities are represented as xattrs.  File
> > > capabilities valid in userns 1 are distinguished from file capabilities
> > > valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> > > restricted root process can unshare a new self-mapping namespace, add a
> > > namespaced file capability onto a file, then use that file capability in
> > > the parent namespace.
> > > 
> > > To prevent that, do not allow mapping parent uid 0 if the process which
> > > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > > for setting file capabilities.
> > > 
> > > As a further wrinkle:  a task can unshare its user namespace, then
> > > open its uid_map file itself, and map (only) its own uid.  In this
> > > case we do not have the credential from before unshare,  which was
> > > potentially more restricted.  So, when creating a user namespace, we
> > > record whether the creator had CAP_SETFCAP.  Then we can use that
> > > during map_write().
> > > 
> > > With this patch:
> > > 
> > > 1. Unprivileged user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ unshare -Ur
> > > root@caps:~# logout
> > > 
> > > 2. Root user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ sudo bash
> > > root@caps:/home/ubuntu# unshare -Ur
> > > root@caps:/home/ubuntu# logout
> > > 
> > > 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> > > 
> > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > > root@caps:/home/ubuntu# unshare -Ur
> > > unshare: write failed /proc/self/uid_map: Operation not permitted
> > > 
> > > Note: an alternative solution would be to allow uid 0 mappings by
> > > processes without CAP_SETFCAP, but to prevent such a namespace from
> > > writing any file capabilities.  This approach can be seen here:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > > 
> > 
> > Ah, can you link to the previous fix and its revert, please? I think
> > that was mentioned in the formerly private thread as well but we forgot:
> > 
> > commit 95ebabde382c371572297915b104e55403674e73
> > Author: Eric W. Biederman 
> > Date:   Thu Dec 17 09:42:00 2020 -0600
> > 
> > capabilities: Don't allow writing ambiguous v3 file capabilities
> > 
> > commit 3b0c2d3eaa83da259d7726192cf55a137769012f
> > Author: Eric W. Biederman 
> > Date:   Fri Mar 12 15:07:09 2021 -0600
> > 
> > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 
> > file capabilities")
> 
> Sure.
> 
> Is there a tag for that kind of thing or do I just mention it at the end
> of the description?

In this case it might make sense to just have a little paragraph that
explains the regression. How do you feel about adding?:

  Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file
  capabilities") tried to fix the issue by preventing v3 fscaps to be
  written to disk when the root uid would map to the same uid in nested
  user namespaces. This led to regressions for various workloads. For
  example, see [1]. Ultimately this is a valid use-case we have to support
  meaning we had to revert this change in 3b0c2d3eaa83 ("Revert
  95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file
  capabilities")").
  
  [1]: https://github.com/containers/buildah/issues/3071


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Serge E. Hallyn
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote:
> On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> > cap_setfcap is required to create file capabilities.
> > 
> > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> > process running as uid 0 but without cap_setfcap is able to work around
> > this as follows: unshare a new user namespace which maps parent uid 0
> > into the child namespace.  While this task will not have new
> > capabilities against the parent namespace, there is a loophole due to
> > the way namespaced file capabilities are represented as xattrs.  File
> > capabilities valid in userns 1 are distinguished from file capabilities
> > valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> > restricted root process can unshare a new self-mapping namespace, add a
> > namespaced file capability onto a file, then use that file capability in
> > the parent namespace.
> > 
> > To prevent that, do not allow mapping parent uid 0 if the process which
> > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > for setting file capabilities.
> > 
> > As a further wrinkle:  a task can unshare its user namespace, then
> > open its uid_map file itself, and map (only) its own uid.  In this
> > case we do not have the credential from before unshare,  which was
> > potentially more restricted.  So, when creating a user namespace, we
> > record whether the creator had CAP_SETFCAP.  Then we can use that
> > during map_write().
> > 
> > With this patch:
> > 
> > 1. Unprivileged user can still unshare -Ur
> > 
> > ubuntu@caps:~$ unshare -Ur
> > root@caps:~# logout
> > 
> > 2. Root user can still unshare -Ur
> > 
> > ubuntu@caps:~$ sudo bash
> > root@caps:/home/ubuntu# unshare -Ur
> > root@caps:/home/ubuntu# logout
> > 
> > 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> > 
> > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > root@caps:/home/ubuntu# unshare -Ur
> > unshare: write failed /proc/self/uid_map: Operation not permitted
> > 
> > Note: an alternative solution would be to allow uid 0 mappings by
> > processes without CAP_SETFCAP, but to prevent such a namespace from
> > writing any file capabilities.  This approach can be seen here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > 
> 
> Ah, can you link to the previous fix and its revert, please? I think
> that was mentioned in the formerly private thread as well but we forgot:
> 
> commit 95ebabde382c371572297915b104e55403674e73
> Author: Eric W. Biederman 
> Date:   Thu Dec 17 09:42:00 2020 -0600
> 
> capabilities: Don't allow writing ambiguous v3 file capabilities
> 
> commit 3b0c2d3eaa83da259d7726192cf55a137769012f
> Author: Eric W. Biederman 
> Date:   Fri Mar 12 15:07:09 2021 -0600
> 
> Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file 
> capabilities")

Sure.

Is there a tag for that kind of thing or do I just mention it at the end
of the description?


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Christian Brauner
On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> cap_setfcap is required to create file capabilities.
> 
> Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> process running as uid 0 but without cap_setfcap is able to work around
> this as follows: unshare a new user namespace which maps parent uid 0
> into the child namespace.  While this task will not have new
> capabilities against the parent namespace, there is a loophole due to
> the way namespaced file capabilities are represented as xattrs.  File
> capabilities valid in userns 1 are distinguished from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> restricted root process can unshare a new self-mapping namespace, add a
> namespaced file capability onto a file, then use that file capability in
> the parent namespace.
> 
> To prevent that, do not allow mapping parent uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
> 
> As a further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
> 
> With this patch:
> 
> 1. Unprivileged user can still unshare -Ur
> 
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
> 
> 2. Root user can still unshare -Ur
> 
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
> 
> 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> 
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
> 
> Note: an alternative solution would be to allow uid 0 mappings by
> processes without CAP_SETFCAP, but to prevent such a namespace from
> writing any file capabilities.  This approach can be seen here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> 

Ah, can you link to the previous fix and its revert, please? I think
that was mentioned in the formerly private thread as well but we forgot:

commit 95ebabde382c371572297915b104e55403674e73
Author: Eric W. Biederman 
Date:   Thu Dec 17 09:42:00 2020 -0600

capabilities: Don't allow writing ambiguous v3 file capabilities

commit 3b0c2d3eaa83da259d7726192cf55a137769012f
Author: Eric W. Biederman 
Date:   Fri Mar 12 15:07:09 2021 -0600

Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file 
capabilities")

> Signed-off-by: Serge Hallyn 
> Reviewed-by: Andrew G. Morgan 
> Tested-by: Christian Brauner 
> Reviewed-by: Christian Brauner 
> Cc: "Eric W. Biederman" 


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-19 Thread Christian Brauner
On Mon, Apr 19, 2021 at 05:52:39PM +0200, Giuseppe Scrivano wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > Guiseppe can you take a look at this?
> >
> > This is a second attempt at tightening up the semantics of writing to
> > file capabilities from a user namespace.
> >
> > The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c
> > ("capabilities: Don't allow writing ambiguous v3 file capabilities")"),
> > which corrected the issue reported in:
> > https://github.com/containers/buildah/issues/3071
> >
> > There is a report the podman testsuite passes.  While different this
> > looks in many ways much more strict than the code that was reverted.  So
> > while I can imagine this change doesn't cause problems as is, I will be
> > surprised.
> 
> thanks for pulling me in the discussion.
> 
> I've tested the patch with several cases similar to the issue we had in
> the past and the patch seems to work well.  
> 
> Podman creates all the user namespaces within the same parent user
> namespace.  In the parent user namespace all the capabilities are kept
> and AFAIK Docker does the same.  I'd expect a change in behavior only
> for nested user namespaces in containers where CAP_SETFCAP is not
> granted, but that is not a common configuration given that CAP_SETFCAP
> is added by default.

Same for us and we do have extensive nested container workloads with
other runtimes running containers too.

> 
> 
> > "Serge E. Hallyn"  writes:
> >
> >> +/**
> >> + * verify_root_map() - check the uid 0 mapping
> >> + * @file: idmapping file
> >> + * @map_ns: user namespace of the target process
> >> + * @new_map: requested idmap
> >> + *
> >> + * If a process requested a mapping for uid 0 onto uid 0, verify that the
> >> + * process writing the map had the CAP_SETFCAP capability as the target 
> >> process
> >> + * will be able to write fscaps that are valid in ancestor user 
> >> namespaces.
> >> + *
> >> + * Return: true if the mapping is allowed, false if not.
> >> + */
> >> +static bool verify_root_map(const struct file *file,
> >> +  struct user_namespace *map_ns,
> >> +  struct uid_gid_map *new_map)
> >> +{
> >> +  int idx;
> >> +  const struct user_namespace *file_ns = file->f_cred->user_ns;
> >> +  struct uid_gid_extent *extent0 = NULL;
> >> +
> >> +  for (idx = 0; idx < new_map->nr_extents; idx++) {
> >> +  u32 lower_first;
> 
> nit: lower_first seems unused?
> 
> >> +
> >> +  if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> >> +  extent0 = &new_map->extent[idx];
> >> +  else
> >> +  extent0 = &new_map->forward[idx];
> >> +  if (extent0->lower_first == 0)
> >> +  break;
> >> +
> >> +  extent0 = NULL;
> >> +  }
> 
> Tested-by: Giuseppe Scrivano 

Thanks for running the tests and confirming my results!
Christian


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-19 Thread Giuseppe Scrivano
ebied...@xmission.com (Eric W. Biederman) writes:

> Guiseppe can you take a look at this?
>
> This is a second attempt at tightening up the semantics of writing to
> file capabilities from a user namespace.
>
> The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c
> ("capabilities: Don't allow writing ambiguous v3 file capabilities")"),
> which corrected the issue reported in:
> https://github.com/containers/buildah/issues/3071
>
> There is a report the podman testsuite passes.  While different this
> looks in many ways much more strict than the code that was reverted.  So
> while I can imagine this change doesn't cause problems as is, I will be
> surprised.

thanks for pulling me in the discussion.

I've tested the patch with several cases similar to the issue we had in
the past and the patch seems to work well.  

Podman creates all the user namespaces within the same parent user
namespace.  In the parent user namespace all the capabilities are kept
and AFAIK Docker does the same.  I'd expect a change in behavior only
for nested user namespaces in containers where CAP_SETFCAP is not
granted, but that is not a common configuration given that CAP_SETFCAP
is added by default.


> "Serge E. Hallyn"  writes:
>
>> +/**
>> + * verify_root_map() - check the uid 0 mapping
>> + * @file: idmapping file
>> + * @map_ns: user namespace of the target process
>> + * @new_map: requested idmap
>> + *
>> + * If a process requested a mapping for uid 0 onto uid 0, verify that the
>> + * process writing the map had the CAP_SETFCAP capability as the target 
>> process
>> + * will be able to write fscaps that are valid in ancestor user namespaces.
>> + *
>> + * Return: true if the mapping is allowed, false if not.
>> + */
>> +static bool verify_root_map(const struct file *file,
>> +struct user_namespace *map_ns,
>> +struct uid_gid_map *new_map)
>> +{
>> +int idx;
>> +const struct user_namespace *file_ns = file->f_cred->user_ns;
>> +struct uid_gid_extent *extent0 = NULL;
>> +
>> +for (idx = 0; idx < new_map->nr_extents; idx++) {
>> +u32 lower_first;

nit: lower_first seems unused?

>> +
>> +if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>> +extent0 = &new_map->extent[idx];
>> +else
>> +extent0 = &new_map->forward[idx];
>> +if (extent0->lower_first == 0)
>> +break;
>> +
>> +extent0 = NULL;
>> +}

Tested-by: Giuseppe Scrivano 

Regards,
Giuseppe



[PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Serge E. Hallyn
cap_setfcap is required to create file capabilities.

Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
process running as uid 0 but without cap_setfcap is able to work around
this as follows: unshare a new user namespace which maps parent uid 0
into the child namespace.  While this task will not have new
capabilities against the parent namespace, there is a loophole due to
the way namespaced file capabilities are represented as xattrs.  File
capabilities valid in userns 1 are distinguished from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore the
restricted root process can unshare a new self-mapping namespace, add a
namespaced file capability onto a file, then use that file capability in
the parent namespace.

To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the capability
for setting file capabilities.

As a further wrinkle:  a task can unshare its user namespace, then
open its uid_map file itself, and map (only) its own uid.  In this
case we do not have the credential from before unshare,  which was
potentially more restricted.  So, when creating a user namespace, we
record whether the creator had CAP_SETFCAP.  Then we can use that
during map_write().

With this patch:

1. Unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. Root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. Root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities.  This approach can be seen here:

https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4

Signed-off-by: Serge Hallyn 
Reviewed-by: Andrew G. Morgan 
Tested-by: Christian Brauner 
Reviewed-by: Christian Brauner 
Cc: "Eric W. Biederman" 

Changelog:
   * fix logic in the case of writing to another task's uid_map
   * rename 'ns' to 'map_ns', and make a file_ns local variable
   * use /* comments */
   * update the CAP_SETFCAP comment in capability.h
   * rename parent_unpriv to parent_can_setfcap (and reverse the
 logic)
   * remove printks
   * clarify (i hope) the code comments
   * update capability.h comment
   * renamed parent_can_setfcap to parent_could_setfcap
   * made the check its own disallowed_0_mapping() fn
   * moved the check into new_idmap_permitted
   * rename disallowed_0_mapping to verify_root_mapping
   * change verify_root_mapping to Christian's suggested flow
   * correct+clarify comments: parent uid 0 mapping to any
 child uid is a problem.
---
 include/linux/user_namespace.h  |  3 ++
 include/uapi/linux/capability.h |  3 +-
 kernel/user_namespace.c | 67 +++--
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f6c5f784be5a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,9 @@ struct user_namespace {
kgid_t  group;
struct ns_commonns;
unsigned long   flags;
+   /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
+* in its effective capability set at the child ns creation time. */
+   boolparent_could_setfcap;
 
 #ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index c6ca33034147..2ddb4226cd23 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_CONTROL30
 
-/* Set or remove capabilities on files */
+/* Set or remove capabilities on files.
+   Map uid=0 into a child user namespace. */
 
 #define CAP_SETFCAP 31
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index af612945a4d0..609a729a9879 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -106,6 +106,7 @@ int create_user_ns(struct cred *new)
if (!ns)
goto fail_dec;
 
+   ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP);
ret = ns_alloc_inum(&ns->ns);
if (ret)
goto fail_free;
@@ -841,6 +842,62 @@ static int sort_idmaps(struct uid_gid_map *map)
    retu

Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-18 Thread Eric W. Biederman


Guiseppe can you take a look at this?

This is a second attempt at tightening up the semantics of writing to
file capabilities from a user namespace.

The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c
("capabilities: Don't allow writing ambiguous v3 file capabilities")"),
which corrected the issue reported in:
https://github.com/containers/buildah/issues/3071

There is a report the podman testsuite passes.  While different this
looks in many ways much more strict than the code that was reverted.  So
while I can imagine this change doesn't cause problems as is, I will be
surprised.

Eric



"Serge E. Hallyn"  writes:

> A process running as uid 0 but without cap_setfcap currently can simply
> unshare a new user namespace with uid 0 mapped to 0.  While this task
> will not have new capabilities against the parent namespace, there is
> a loophole due to the way namespaced file capabilities work.  File
> capabilities valid in userns 1 are distinguised from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore
> the restricted root process can unshare a new self-mapping namespace,
> add a namespaced file capability onto a file, then use that file
> capability in the parent namespace.
>
> To prevent that, do not allow mapping uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
>
> A further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
>
> With this patch:
>
> 1. unprivileged user can still unshare -Ur
>
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
>
> 2. root user can still unshare -Ur
>
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
>
> 3. root user without CAP_SETFCAP cannot unshare -Ur:
>
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
>
> Signed-off-by: Serge Hallyn 
>
> Changelog:
>* fix logic in the case of writing to another task's uid_map
>* rename 'ns' to 'map_ns', and make a file_ns local variable
>* use /* comments */
>* update the CAP_SETFCAP comment in capability.h
>* rename parent_unpriv to parent_can_setfcap (and reverse the
>  logic)
>* remove printks
>* clarify (i hope) the code comments
>* update capability.h comment
>* renamed parent_can_setfcap to parent_could_setfcap
>* made the check its own disallowed_0_mapping() fn
>* moved the check into new_idmap_permitted
>* rename disallowed_0_mapping to verify_root_mapping
>* change verify_root_mapping to Christian's suggested flow
> ---
>  include/linux/user_namespace.h  |  3 ++
>  include/uapi/linux/capability.h |  3 +-
>  kernel/user_namespace.c | 66 +++--
>  3 files changed, 68 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 64cf8ebdc4ec..f6c5f784be5a 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -63,6 +63,9 @@ struct user_namespace {
>   kgid_t  group;
>   struct ns_commonns;
>   unsigned long   flags;
> + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
> +  * in its effective capability set at the child ns creation time. */
> + boolparent_could_setfcap;
>  
>  #ifdef CONFIG_KEYS
>   /* List of joinable keyrings in this namespace.  Modification access of
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index c6ca33034147..2ddb4226cd23 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
>  
>  #define CAP_AUDIT_CONTROL30
>  
> -/* Set or remove capabilities on files */
> +/* Set or remove capabilities on files.
> +   Map uid=0 into a child user namespace. */
>  
>  #define CAP_SETFCAP   31
>  
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index af612945a4d0..2ead291177b0 100644
> --- a/kernel/us

Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-18 Thread Christian Brauner
On Sat, Apr 17, 2021 at 03:04:34PM -0500, Serge Hallyn wrote:
> A process running as uid 0 but without cap_setfcap currently can simply
> unshare a new user namespace with uid 0 mapped to 0.  While this task
> will not have new capabilities against the parent namespace, there is
> a loophole due to the way namespaced file capabilities work.  File
> capabilities valid in userns 1 are distinguised from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore
> the restricted root process can unshare a new self-mapping namespace,
> add a namespaced file capability onto a file, then use that file
> capability in the parent namespace.
> 
> To prevent that, do not allow mapping uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
> 
> A further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
> 
> With this patch:
> 
> 1. unprivileged user can still unshare -Ur
> 
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
> 
> 2. root user can still unshare -Ur
> 
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
> 
> 3. root user without CAP_SETFCAP cannot unshare -Ur:
> 
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
> 
> Signed-off-by: Serge Hallyn 
> 
> Changelog:
>* fix logic in the case of writing to another task's uid_map
>* rename 'ns' to 'map_ns', and make a file_ns local variable
>* use /* comments */
>* update the CAP_SETFCAP comment in capability.h
>* rename parent_unpriv to parent_can_setfcap (and reverse the
>  logic)
>* remove printks
>* clarify (i hope) the code comments
>* update capability.h comment
>* renamed parent_can_setfcap to parent_could_setfcap
>* made the check its own disallowed_0_mapping() fn
>* moved the check into new_idmap_permitted
>* rename disallowed_0_mapping to verify_root_mapping
>* change verify_root_mapping to Christian's suggested flow
> ---

Thank you. This looks good. I tested this with:

- fstests
- LXD testsuite
- Podman testsuite
- libcap testsuite

Tested-by: Christian Brauner 
Reviewed-by: Christian Brauner 


RE: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-18 Thread K, Narendra
> -Original Message-
> From: Viktor Mihajlovski 
> Sent: Saturday, April 17, 2021 4:18 PM
> To: K, Narendra; Niklas Schnelle; Bjorn Helgaas
> Cc: Stefan Raspl; Peter Oberparleiter; linux-net...@vger.kernel.org; linux-
> p...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> s...@vger.kernel.org
> Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
> 
> 
> [EXTERNAL EMAIL]
> 
> 
> 
> On 4/16/21 7:58 PM, K, Narendra wrote:
> >> -Original Message-
> >> From: Niklas Schnelle 
> >> Sent: Thursday, April 15, 2021 12:55 PM
> >> To: Bjorn Helgaas
> >> Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter
> >> Oberparleiter; linux- net...@vger.kernel.org;
> >> linux-...@vger.kernel.org; linux- ker...@vger.kernel.org;
> >> linux-s...@vger.kernel.org
> >> Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its
> >> index
> >>
> >>
> >> [EXTERNAL EMAIL]
> >>
> >> On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote:
> >>> On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote:
> >>>> On s390 each PCI device has a user-defined ID (UID) exposed under
> >>>> /sys/bus/pci/devices//uid. This ID was designed to serve as
> >>>> the PCI device's primary index and to match the device within Linux
> >>>> to the device configured in the hypervisor. To serve as a primary
> >>>> identifier the UID must be unique within the Linux instance, this
> >>>> is guaranteed by the platform if and only if the UID Uniqueness
> >>>> Checking flag is set within the CLP List PCI Functions response.
> >>>>
> >>>> In this sense the UID serves an analogous function as the SMBIOS
> >>>> instance number or ACPI index exposed as the "index" respectively
> >>>> "acpi_index" device attributes and used by e.g. systemd to set
> >>>> interface names. As s390 does not use and will likely never use
> >>>> ACPI nor SMBIOS there is no conflict and we can just expose the UID
> >>>> under the
> >> "index"
> >>>> attribute whenever UID Uniqueness Checking is active and get
> >>>> systemd's interface naming support for free.
> >>>>
> >>>> Signed-off-by: Niklas Schnelle 
> >>>> Acked-by: Viktor Mihajlovski 
> >>>
> >>> This seems like a nice solution to me.
> >>>
> >>> Acked-by: Bjorn Helgaas 
> >>
> >> Thanks! Yes I agree it's a simple solution that also makes sense from
> >> a design point. I'll wait for Narendra's opinion of course.
> >
> > Hi Niklas,
> >
> > It seems like the UID and the index are not equivalent in their meaning.
> >
> Hi Narendra,
> 
> the reasoning behind the wish to reuse index is that we think it's similar in 
> the
> sense that it provides a stable, firmware-reported interface identifier.
> Some background: s390 is a highly virtualized platform. There's no traditional
> bare metal mode of operation, neither for the computer system nor the I/O
> subsystem.
> The equivalent to a standalone system is a logical partition (LPAR) which in
> essence is a kind of virtual machine. LPARs access I/O devices in a 
> virtualized
> fashion as well. E.g., for PCI devices the I/O subsystem is responsible for 
> the
> management of PCI hardware and will provide the PCI functions to the LPARs.
> This is where UID and function_id come into play. Each PCI function will 
> carry a
> function_id attribute which defines it uniquely within the entire s390 
> system. If
> a UID is configured for the function, it must be unique within an LPAR, but
> doesn't need to be unique system-wide.
> For instance, the admistrator may want to ensure that for every LPAR the
> primary ethernet interface has the same identifier, to allow miigration of 
> Linux
> instances accross LPARs.
> This can't be achieved with a slot-based name.
> > The index SMBIOS type 41 device type instance indicates that
> >
> > 1. The device is an onboard device.
> > 2. A specific order of the onboard devices.
> >
> > The UID described seems to indicate that the PCI device is unique within the
> Linux instance (sufficient for naming).
> > But it does not indicate that PCI device is onboard and any order.
> >
> > As all devices with UID will be named as eno (Ethernet onboard),
> > names are not in sync with how each PCI function is

[PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-17 Thread Serge E. Hallyn
A process running as uid 0 but without cap_setfcap currently can simply
unshare a new user namespace with uid 0 mapped to 0.  While this task
will not have new capabilities against the parent namespace, there is
a loophole due to the way namespaced file capabilities work.  File
capabilities valid in userns 1 are distinguised from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore
the restricted root process can unshare a new self-mapping namespace,
add a namespaced file capability onto a file, then use that file
capability in the parent namespace.

To prevent that, do not allow mapping uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the capability
for setting file capabilities.

A further wrinkle:  a task can unshare its user namespace, then
open its uid_map file itself, and map (only) its own uid.  In this
case we do not have the credential from before unshare,  which was
potentially more restricted.  So, when creating a user namespace, we
record whether the creator had CAP_SETFCAP.  Then we can use that
during map_write().

With this patch:

1. unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Signed-off-by: Serge Hallyn 

Changelog:
   * fix logic in the case of writing to another task's uid_map
   * rename 'ns' to 'map_ns', and make a file_ns local variable
   * use /* comments */
   * update the CAP_SETFCAP comment in capability.h
   * rename parent_unpriv to parent_can_setfcap (and reverse the
 logic)
   * remove printks
   * clarify (i hope) the code comments
   * update capability.h comment
   * renamed parent_can_setfcap to parent_could_setfcap
   * made the check its own disallowed_0_mapping() fn
   * moved the check into new_idmap_permitted
   * rename disallowed_0_mapping to verify_root_mapping
   * change verify_root_mapping to Christian's suggested flow
---
 include/linux/user_namespace.h  |  3 ++
 include/uapi/linux/capability.h |  3 +-
 kernel/user_namespace.c | 66 +++--
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f6c5f784be5a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,9 @@ struct user_namespace {
kgid_t  group;
struct ns_commonns;
unsigned long   flags;
+   /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
+* in its effective capability set at the child ns creation time. */
+   boolparent_could_setfcap;
 
 #ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index c6ca33034147..2ddb4226cd23 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_CONTROL30
 
-/* Set or remove capabilities on files */
+/* Set or remove capabilities on files.
+   Map uid=0 into a child user namespace. */
 
 #define CAP_SETFCAP 31
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index af612945a4d0..2ead291177b0 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -106,6 +106,7 @@ int create_user_ns(struct cred *new)
if (!ns)
goto fail_dec;
 
+   ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP);
ret = ns_alloc_inum(&ns->ns);
if (ret)
goto fail_free;
@@ -841,6 +842,61 @@ static int sort_idmaps(struct uid_gid_map *map)
return 0;
 }
 
+/**
+ * verify_root_map() - check the uid 0 mapping
+ * @file: idmapping file
+ * @map_ns: user namespace of the target process
+ * @new_map: requested idmap
+ *
+ * If a process requested a mapping for uid 0 onto uid 0, verify that the
+ * process writing the map had the CAP_SETFCAP capability as the target process
+ * will be able to write fscaps that are valid in ancestor user namespaces.
+ *
+ * Return: true if the mapping is allowed, false if not.
+ */
+static bool verify_root_map(const struct file *file,
+   struct user_namespace *map_ns,
+   struct uid_gid_map *new_map)
+{
+   int idx;
+   const struct user_namespace *file_ns = file->f_cred-&g

Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-17 Thread Viktor Mihajlovski




On 4/16/21 7:58 PM, K, Narendra wrote:

-Original Message-
From: Niklas Schnelle 
Sent: Thursday, April 15, 2021 12:55 PM
To: Bjorn Helgaas
Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter Oberparleiter; linux-
net...@vger.kernel.org; linux-...@vger.kernel.org; linux-
ker...@vger.kernel.org; linux-s...@vger.kernel.org
Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index


[EXTERNAL EMAIL]

On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote:

On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote:

On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices//uid. This ID was designed to serve as the
PCI device's primary index and to match the device within Linux to
the device configured in the hypervisor. To serve as a primary
identifier the UID must be unique within the Linux instance, this is
guaranteed by the platform if and only if the UID Uniqueness
Checking flag is set within the CLP List PCI Functions response.

In this sense the UID serves an analogous function as the SMBIOS
instance number or ACPI index exposed as the "index" respectively
"acpi_index" device attributes and used by e.g. systemd to set
interface names. As s390 does not use and will likely never use ACPI
nor SMBIOS there is no conflict and we can just expose the UID under the

"index"

attribute whenever UID Uniqueness Checking is active and get
systemd's interface naming support for free.

Signed-off-by: Niklas Schnelle 
Acked-by: Viktor Mihajlovski 


This seems like a nice solution to me.

Acked-by: Bjorn Helgaas 


Thanks! Yes I agree it's a simple solution that also makes sense from a design
point. I'll wait for Narendra's opinion of course.


Hi Niklas,

It seems like the UID and the index are not equivalent in their meaning.


Hi Narendra,

the reasoning behind the wish to reuse index is that we think it's
similar in the sense that it provides a stable, firmware-reported
interface identifier.
Some background: s390 is a highly virtualized platform. There's
no traditional bare metal mode of operation, neither for the computer
system nor the I/O subsystem.
The equivalent to a standalone system is a logical partition (LPAR)
which in essence is a kind of virtual machine. LPARs access I/O devices
in a virtualized fashion as well. E.g., for PCI devices the I/O
subsystem is responsible for the management of PCI hardware and
will provide the PCI functions to the LPARs.
This is where UID and function_id come into play. Each PCI function will
carry a function_id attribute which defines it uniquely within the
entire s390 system. If a UID is configured for the function, it must
be unique within an LPAR, but doesn't need to be unique system-wide.
For instance, the admistrator may want to ensure that for every
LPAR the primary ethernet interface has the same identifier, to
allow miigration of Linux instances accross LPARs.
This can't be achieved with a slot-based name.

The index SMBIOS type 41 device type instance indicates that

1. The device is an onboard device.
2. A specific order of the onboard devices.

The UID described seems to indicate that the PCI device is unique within the 
Linux instance (sufficient for naming).
But it does not indicate that PCI device is onboard and any order.

As all devices with UID will be named as eno (Ethernet onboard), names are 
not in sync with how each PCI function is exposed on a PCI slot (appears
closer to SMBIOS type 9 record) as described below.

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc

In summary, it seems like the eno names on s390 will be unique names, but 
may not match the meaning of the index.

Also,

a) Will UID remain unique/same as initially exposed across reboots ?

Yes, the UID is the primary identifier for a PCI function and part of
the LPAR configuration. Even hotplug will not change the UID.


b) Is the reuse of index related to the 'slots' based naming scheme described 
below ?

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc


No, this is independent. The commit above fixes the slot name parsing.

c) If the slot based naming is fixed with the naming scheme from changes below, 
any thoughts on why is reuse of index needed ?

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc

As I wrote above, the slot describes the PCI function at the system
level, whereas the uid/index does it in the context off the LPAR.

 > With regards,
Narendra K



--
Kind Regards,
   Viktor


Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)

2021-04-16 Thread Serge E. Hallyn
On Fri, Apr 16, 2021 at 04:34:53PM -0500, Serge E. Hallyn wrote:
> On Fri, Apr 16, 2021 at 05:05:01PM +0200, Christian Brauner wrote:
> > On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote:
> > > (Eric - this patch (v3) is a cleaned up version of the previous approach.
> > > v4 is at 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > > and is the approach you suggested.  I can send it also as a separate patch
> > > if you like)
> > > 
> > > A process running as uid 0 but without cap_setfcap currently can simply
> > > unshare a new user namespace with uid 0 mapped to 0.  While this task
> > > will not have new capabilities against the parent namespace, there is
> > > a loophole due to the way namespaced file capabilities work.  File
> > > capabilities valid in userns 1 are distinguised from file capabilities
> > > valid in userns 2 by the kuid which underlies uid 0.  Therefore
> > > the restricted root process can unshare a new self-mapping namespace,
> > > add a namespaced file capability onto a file, then use that file
> > > capability in the parent namespace.
> > > 
> > > To prevent that, do not allow mapping uid 0 if the process which
> > > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > > for setting file capabilities.
> > > 
> > > A further wrinkle:  a task can unshare its user namespace, then
> > > open its uid_map file itself, and map (only) its own uid.  In this
> > > case we do not have the credential from before unshare,  which was
> > > potentially more restricted.  So, when creating a user namespace, we
> > > record whether the creator had CAP_SETFCAP.  Then we can use that
> > > during map_write().
> > > 
> > > With this patch:
> > > 
> > > 1. unprivileged user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ unshare -Ur
> > > root@caps:~# logout
> > > 
> > > 2. root user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ sudo bash
> > > root@caps:/home/ubuntu# unshare -Ur
> > > root@caps:/home/ubuntu# logout
> > > 
> > > 3. root user without CAP_SETFCAP cannot unshare -Ur:
> > > 
> > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > > root@caps:/home/ubuntu# unshare -Ur
> > > unshare: write failed /proc/self/uid_map: Operation not permitted
> > > 
> > > Signed-off-by: Serge Hallyn 
> > > 
> > > Changelog:
> > >* fix logic in the case of writing to another task's uid_map
> > >* rename 'ns' to 'map_ns', and make a file_ns local variable
> > >* use /* comments */
> > >* update the CAP_SETFCAP comment in capability.h
> > >* rename parent_unpriv to parent_can_setfcap (and reverse the
> > >  logic)
> > >* remove printks
> > >* clarify (i hope) the code comments
> > >* update capability.h comment
> > >* renamed parent_can_setfcap to parent_could_setfcap
> > >* made the check its own disallowed_0_mapping() fn
> > >* moved the check into new_idmap_permitted
> > > ---
> > 
> > Thank you for working on this fix!
> > 
> > I do prefer your approach of doing the check at user namespace creation
> > time instead of moving it into the setxattr() codepath.
> > 
> > Let me reiterate that the ability to write through fscaps is a valid
> > usecase and this should continue to work but that for locked down user
> > namespace as Andrew wants to use them your patch provides a clean
> > solution.
> > We've are using identity mappings in quite a few scenarios partially
> > when performing tests but also to write through fscaps.
> > We also had reports of users that use identity mappings. They create
> > their rootfs by running image extraction in an identity mapped userns
> > where fscaps are written through.
> > Podman has use-cases for this feature as well and has been affected by
> > the regression of the first fix.
> 
> Thanks for reviewing.
> 
> I'm not sure what your point above is, so just to make sure - the
> alternative implementation also does allow fscaps for cases where
> root uid is remapped, only disallowing it if it would violate the
> ancestor's lack o

Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)

2021-04-16 Thread Serge E. Hallyn
On Fri, Apr 16, 2021 at 05:05:01PM +0200, Christian Brauner wrote:
> On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote:
> > (Eric - this patch (v3) is a cleaned up version of the previous approach.
> > v4 is at 
> > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > and is the approach you suggested.  I can send it also as a separate patch
> > if you like)
> > 
> > A process running as uid 0 but without cap_setfcap currently can simply
> > unshare a new user namespace with uid 0 mapped to 0.  While this task
> > will not have new capabilities against the parent namespace, there is
> > a loophole due to the way namespaced file capabilities work.  File
> > capabilities valid in userns 1 are distinguised from file capabilities
> > valid in userns 2 by the kuid which underlies uid 0.  Therefore
> > the restricted root process can unshare a new self-mapping namespace,
> > add a namespaced file capability onto a file, then use that file
> > capability in the parent namespace.
> > 
> > To prevent that, do not allow mapping uid 0 if the process which
> > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > for setting file capabilities.
> > 
> > A further wrinkle:  a task can unshare its user namespace, then
> > open its uid_map file itself, and map (only) its own uid.  In this
> > case we do not have the credential from before unshare,  which was
> > potentially more restricted.  So, when creating a user namespace, we
> > record whether the creator had CAP_SETFCAP.  Then we can use that
> > during map_write().
> > 
> > With this patch:
> > 
> > 1. unprivileged user can still unshare -Ur
> > 
> > ubuntu@caps:~$ unshare -Ur
> > root@caps:~# logout
> > 
> > 2. root user can still unshare -Ur
> > 
> > ubuntu@caps:~$ sudo bash
> > root@caps:/home/ubuntu# unshare -Ur
> > root@caps:/home/ubuntu# logout
> > 
> > 3. root user without CAP_SETFCAP cannot unshare -Ur:
> > 
> > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > root@caps:/home/ubuntu# unshare -Ur
> > unshare: write failed /proc/self/uid_map: Operation not permitted
> > 
> > Signed-off-by: Serge Hallyn 
> > 
> > Changelog:
> >* fix logic in the case of writing to another task's uid_map
> >* rename 'ns' to 'map_ns', and make a file_ns local variable
> >* use /* comments */
> >* update the CAP_SETFCAP comment in capability.h
> >* rename parent_unpriv to parent_can_setfcap (and reverse the
> >  logic)
> >* remove printks
> >* clarify (i hope) the code comments
> >* update capability.h comment
> >* renamed parent_can_setfcap to parent_could_setfcap
> >* made the check its own disallowed_0_mapping() fn
> >* moved the check into new_idmap_permitted
> > ---
> 
> Thank you for working on this fix!
> 
> I do prefer your approach of doing the check at user namespace creation
> time instead of moving it into the setxattr() codepath.
> 
> Let me reiterate that the ability to write through fscaps is a valid
> usecase and this should continue to work but that for locked down user
> namespace as Andrew wants to use them your patch provides a clean
> solution.
> We've are using identity mappings in quite a few scenarios partially
> when performing tests but also to write through fscaps.
> We also had reports of users that use identity mappings. They create
> their rootfs by running image extraction in an identity mapped userns
> where fscaps are written through.
> Podman has use-cases for this feature as well and has been affected by
> the regression of the first fix.

Thanks for reviewing.

I'm not sure what your point above is, so just to make sure - the
alternative implementation also does allow fscaps for cases where
root uid is remapped, only disallowing it if it would violate the
ancestor's lack of cap_setfcap.


> >  include/linux/user_namespace.h  |  3 ++
> >  include/uapi/linux/capability.h |  3 +-
> >  kernel/user_namespace.c | 61 +++--
> >  3 files changed, 63 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> > index 64cf8ebdc4ec..f6c5f784be5a 100644
> > --- a/include/linux/user_namespace.h
> > +++ b/include/linux/user_namespace.h
> > @@ -63,

RE: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-16 Thread K, Narendra
> -Original Message-
> From: Niklas Schnelle 
> Sent: Thursday, April 15, 2021 12:55 PM
> To: Bjorn Helgaas
> Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter Oberparleiter; linux-
> net...@vger.kernel.org; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-s...@vger.kernel.org
> Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
> 
> 
> [EXTERNAL EMAIL]
> 
> On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote:
> > > On s390 each PCI device has a user-defined ID (UID) exposed under
> > > /sys/bus/pci/devices//uid. This ID was designed to serve as the
> > > PCI device's primary index and to match the device within Linux to
> > > the device configured in the hypervisor. To serve as a primary
> > > identifier the UID must be unique within the Linux instance, this is
> > > guaranteed by the platform if and only if the UID Uniqueness
> > > Checking flag is set within the CLP List PCI Functions response.
> > >
> > > In this sense the UID serves an analogous function as the SMBIOS
> > > instance number or ACPI index exposed as the "index" respectively
> > > "acpi_index" device attributes and used by e.g. systemd to set
> > > interface names. As s390 does not use and will likely never use ACPI
> > > nor SMBIOS there is no conflict and we can just expose the UID under the
> "index"
> > > attribute whenever UID Uniqueness Checking is active and get
> > > systemd's interface naming support for free.
> > >
> > > Signed-off-by: Niklas Schnelle 
> > > Acked-by: Viktor Mihajlovski 
> >
> > This seems like a nice solution to me.
> >
> > Acked-by: Bjorn Helgaas 
> 
> Thanks! Yes I agree it's a simple solution that also makes sense from a design
> point. I'll wait for Narendra's opinion of course.

Hi Niklas,

It seems like the UID and the index are not equivalent in their meaning.

The index SMBIOS type 41 device type instance indicates that 

1. The device is an onboard device.
2. A specific order of the onboard devices.

The UID described seems to indicate that the PCI device is unique within the 
Linux instance (sufficient for naming). 
But it does not indicate that PCI device is onboard and any order. 

As all devices with UID will be named as eno (Ethernet onboard), names are 
not in sync with how each PCI function is exposed on a PCI slot (appears
closer to SMBIOS type 9 record) as described below.

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc

In summary, it seems like the eno names on s390 will be unique names, but 
may not match the meaning of the index.

Also,

a) Will UID remain unique/same as initially exposed across reboots ? 

b) Is the reuse of index related to the 'slots' based naming scheme described 
below ?   

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc

c) If the slot based naming is fixed with the naming scheme from changes below, 
any thoughts on why is reuse of index needed ? 

https://github.com/systemd/systemd/pull/19017
https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc
  

With regards,
Narendra K


Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)

2021-04-16 Thread Christian Brauner
On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote:
> (Eric - this patch (v3) is a cleaned up version of the previous approach.
> v4 is at 
> https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> and is the approach you suggested.  I can send it also as a separate patch
> if you like)
> 
> A process running as uid 0 but without cap_setfcap currently can simply
> unshare a new user namespace with uid 0 mapped to 0.  While this task
> will not have new capabilities against the parent namespace, there is
> a loophole due to the way namespaced file capabilities work.  File
> capabilities valid in userns 1 are distinguised from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore
> the restricted root process can unshare a new self-mapping namespace,
> add a namespaced file capability onto a file, then use that file
> capability in the parent namespace.
> 
> To prevent that, do not allow mapping uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
> 
> A further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
> 
> With this patch:
> 
> 1. unprivileged user can still unshare -Ur
> 
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
> 
> 2. root user can still unshare -Ur
> 
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
> 
> 3. root user without CAP_SETFCAP cannot unshare -Ur:
> 
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
> 
> Signed-off-by: Serge Hallyn 
> 
> Changelog:
>* fix logic in the case of writing to another task's uid_map
>* rename 'ns' to 'map_ns', and make a file_ns local variable
>* use /* comments */
>* update the CAP_SETFCAP comment in capability.h
>* rename parent_unpriv to parent_can_setfcap (and reverse the
>  logic)
>* remove printks
>* clarify (i hope) the code comments
>* update capability.h comment
>* renamed parent_can_setfcap to parent_could_setfcap
>* made the check its own disallowed_0_mapping() fn
>* moved the check into new_idmap_permitted
> ---

Thank you for working on this fix!

I do prefer your approach of doing the check at user namespace creation
time instead of moving it into the setxattr() codepath.

Let me reiterate that the ability to write through fscaps is a valid
usecase and this should continue to work but that for locked down user
namespace as Andrew wants to use them your patch provides a clean
solution.
We've are using identity mappings in quite a few scenarios partially
when performing tests but also to write through fscaps.
We also had reports of users that use identity mappings. They create
their rootfs by running image extraction in an identity mapped userns
where fscaps are written through.
Podman has use-cases for this feature as well and has been affected by
the regression of the first fix.

>  include/linux/user_namespace.h  |  3 ++
>  include/uapi/linux/capability.h |  3 +-
>  kernel/user_namespace.c | 61 +++--
>  3 files changed, 63 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 64cf8ebdc4ec..f6c5f784be5a 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -63,6 +63,9 @@ struct user_namespace {
>   kgid_t  group;
>   struct ns_commonns;
>   unsigned long   flags;
> + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
> +  * in its effective capability set at the child ns creation time. */
> + boolparent_could_setfcap;
>  
>  #ifdef CONFIG_KEYS
>   /* List of joinable keyrings in this namespace.  Modification access of
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index c6ca33034147..2ddb4226cd23 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
>  
>  #define CAP_AUDIT_

[RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)

2021-04-15 Thread Serge E. Hallyn
(Eric - this patch (v3) is a cleaned up version of the previous approach.
v4 is at 
https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
and is the approach you suggested.  I can send it also as a separate patch
if you like)

A process running as uid 0 but without cap_setfcap currently can simply
unshare a new user namespace with uid 0 mapped to 0.  While this task
will not have new capabilities against the parent namespace, there is
a loophole due to the way namespaced file capabilities work.  File
capabilities valid in userns 1 are distinguised from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore
the restricted root process can unshare a new self-mapping namespace,
add a namespaced file capability onto a file, then use that file
capability in the parent namespace.

To prevent that, do not allow mapping uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the capability
for setting file capabilities.

A further wrinkle:  a task can unshare its user namespace, then
open its uid_map file itself, and map (only) its own uid.  In this
case we do not have the credential from before unshare,  which was
potentially more restricted.  So, when creating a user namespace, we
record whether the creator had CAP_SETFCAP.  Then we can use that
during map_write().

With this patch:

1. unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Signed-off-by: Serge Hallyn 

Changelog:
   * fix logic in the case of writing to another task's uid_map
   * rename 'ns' to 'map_ns', and make a file_ns local variable
   * use /* comments */
   * update the CAP_SETFCAP comment in capability.h
   * rename parent_unpriv to parent_can_setfcap (and reverse the
 logic)
   * remove printks
   * clarify (i hope) the code comments
   * update capability.h comment
   * renamed parent_can_setfcap to parent_could_setfcap
   * made the check its own disallowed_0_mapping() fn
   * moved the check into new_idmap_permitted
---
 include/linux/user_namespace.h  |  3 ++
 include/uapi/linux/capability.h |  3 +-
 kernel/user_namespace.c | 61 +++--
 3 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f6c5f784be5a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,9 @@ struct user_namespace {
kgid_t  group;
struct ns_commonns;
unsigned long   flags;
+   /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
+* in its effective capability set at the child ns creation time. */
+   boolparent_could_setfcap;
 
 #ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index c6ca33034147..2ddb4226cd23 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_CONTROL30
 
-/* Set or remove capabilities on files */
+/* Set or remove capabilities on files.
+   Map uid=0 into a child user namespace. */
 
 #define CAP_SETFCAP 31
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index af612945a4d0..8c75028a9aae 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -106,6 +106,7 @@ int create_user_ns(struct cred *new)
if (!ns)
goto fail_dec;
 
+   ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP);
ret = ns_alloc_inum(&ns->ns);
if (ret)
goto fail_free;
@@ -841,6 +842,56 @@ static int sort_idmaps(struct uid_gid_map *map)
    return 0;
 }
 
+/*
+ * If mapping uid 0, then file capabilities created by the new namespace will
+ * be effective in the parent namespace.  Adding file capabilities requires
+ * CAP_SETFCAP, which the child namespace will have, so creating such a
+ * mapping requires CAP_SETFCAP in the parent namespace.
+ */
+static bool disallowed_0_mapping(const struct file *file,
+struct user_namespace *map_ns,
+struct uid_gid_map *new_map)
+{
+   int idx;
+   bool zeromapping = false;
+   const struct user_name

Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-15 Thread Niklas Schnelle
On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote:
> On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote:
> > On s390 each PCI device has a user-defined ID (UID) exposed under
> > /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI
> > device's primary index and to match the device within Linux to the
> > device configured in the hypervisor. To serve as a primary identifier
> > the UID must be unique within the Linux instance, this is guaranteed by
> > the platform if and only if the UID Uniqueness Checking flag is set
> > within the CLP List PCI Functions response.
> > 
> > In this sense the UID serves an analogous function as the SMBIOS
> > instance number or ACPI index exposed as the "index" respectively
> > "acpi_index" device attributes and used by e.g. systemd to set interface
> > names. As s390 does not use and will likely never use ACPI nor SMBIOS
> > there is no conflict and we can just expose the UID under the "index"
> > attribute whenever UID Uniqueness Checking is active and get systemd's
> > interface naming support for free.
> > 
> > Signed-off-by: Niklas Schnelle 
> > Acked-by: Viktor Mihajlovski 
> 
> This seems like a nice solution to me.
> 
> Acked-by: Bjorn Helgaas 

Thanks! Yes I agree it's a simple solution that also makes sense from a
design point. I'll wait for Narendra's opinion of course.

> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-pci | 11 +---
> >  arch/s390/pci/pci_sysfs.c   | 35 +
> >  2 files changed, 42 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> > b/Documentation/ABI/testing/sysfs-bus-pci
> > index 25c9c39770c6..1241b6d11a52 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index
> >  Date:  July 2010
> >  Contact:   Narendra K , linux-b...@dell.com
> >  Description:
> > -   Reading this attribute will provide the firmware
> > -   given instance (SMBIOS type 41 device type instance) of the
> > -   PCI device. The attribute will be created only if the firmware
> > -   has given an instance number to the PCI device.
> > +   Reading this attribute will provide the firmware given instance
> > +   number of the PCI device.  Depending on the platform this can
> > +   be for example the SMBIOS type 41 device type instance or the
> > +   user-defined ID (UID) on s390. The attribute will be created
> > +   only if the firmware has given an instance number to the PCI
> > +   device and that number is guaranteed to uniquely identify the
> > +   device in the system.
> >  Users:
> > Userspace applications interested in knowing the
> > firmware assigned device type instance of the PCI
> > diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
> > index e14d346dafd6..20dbb2058d51 100644
> > --- a/arch/s390/pci/pci_sysfs.c
> > +++ b/arch/s390/pci/pci_sysfs.c
> > @@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(uid_is_unique);
> >  
> > +#ifndef CONFIG_DMI
> > +/* analogous to smbios index */
> 
> I think this is smbios_attr_instance, right?  Maybe mention that
> specifically to make it easier to match these up.
> 
> Looks like smbios_attr_instance and the similar ACPI stuff could use
> some updating to use the current attribute group infrastructure.
> 
> > +static ssize_t index_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > +   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
> > +   u32 index = ~0;
> > +
> > +   if (zpci_unique_uid)
> > +   index = zdev->uid;
> > +
> > +   return sysfs_emit(buf, "%u\n", index);
> > +}
> > +static DEVICE_ATTR_RO(index);
> > +
> > +static umode_t zpci_unique_uids(struct kobject *kobj,
> > +   struct attribute *attr, int n)
> > +{
> > +   return zpci_unique_uid ? attr->mode : 0;
> > +}
> > +
> > +static struct attribute *zpci_ident_attrs[] = {
> > +   &dev_attr_index.attr,
> > +   NULL,
> > +};
> > +
> > +static struct attribute_group zpci_ident_attr_group = {
> > +   .attrs = zpci_ident_attrs,
> > +   .is_visib

Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-14 Thread Bjorn Helgaas
On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote:
> On s390 each PCI device has a user-defined ID (UID) exposed under
> /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI
> device's primary index and to match the device within Linux to the
> device configured in the hypervisor. To serve as a primary identifier
> the UID must be unique within the Linux instance, this is guaranteed by
> the platform if and only if the UID Uniqueness Checking flag is set
> within the CLP List PCI Functions response.
> 
> In this sense the UID serves an analogous function as the SMBIOS
> instance number or ACPI index exposed as the "index" respectively
> "acpi_index" device attributes and used by e.g. systemd to set interface
> names. As s390 does not use and will likely never use ACPI nor SMBIOS
> there is no conflict and we can just expose the UID under the "index"
> attribute whenever UID Uniqueness Checking is active and get systemd's
> interface naming support for free.
> 
> Signed-off-by: Niklas Schnelle 
> Acked-by: Viktor Mihajlovski 

This seems like a nice solution to me.

Acked-by: Bjorn Helgaas 

> ---
>  Documentation/ABI/testing/sysfs-bus-pci | 11 +---
>  arch/s390/pci/pci_sysfs.c   | 35 +
>  2 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c39770c6..1241b6d11a52 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -195,10 +195,13 @@ What:   /sys/bus/pci/devices/.../index
>  Date:July 2010
>  Contact: Narendra K , linux-b...@dell.com
>  Description:
> - Reading this attribute will provide the firmware
> - given instance (SMBIOS type 41 device type instance) of the
> - PCI device. The attribute will be created only if the firmware
> - has given an instance number to the PCI device.
> + Reading this attribute will provide the firmware given instance
> + number of the PCI device.  Depending on the platform this can
> + be for example the SMBIOS type 41 device type instance or the
> + user-defined ID (UID) on s390. The attribute will be created
> + only if the firmware has given an instance number to the PCI
> + device and that number is guaranteed to uniquely identify the
> + device in the system.
>  Users:
>   Userspace applications interested in knowing the
>   firmware assigned device type instance of the PCI
> diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
> index e14d346dafd6..20dbb2058d51 100644
> --- a/arch/s390/pci/pci_sysfs.c
> +++ b/arch/s390/pci/pci_sysfs.c
> @@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(uid_is_unique);
>  
> +#ifndef CONFIG_DMI
> +/* analogous to smbios index */

I think this is smbios_attr_instance, right?  Maybe mention that
specifically to make it easier to match these up.

Looks like smbios_attr_instance and the similar ACPI stuff could use
some updating to use the current attribute group infrastructure.

> +static ssize_t index_show(struct device *dev,
> +       struct device_attribute *attr, char *buf)
> +{
> + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
> + u32 index = ~0;
> +
> + if (zpci_unique_uid)
> + index = zdev->uid;
> +
> + return sysfs_emit(buf, "%u\n", index);
> +}
> +static DEVICE_ATTR_RO(index);
> +
> +static umode_t zpci_unique_uids(struct kobject *kobj,
> + struct attribute *attr, int n)
> +{
> + return zpci_unique_uid ? attr->mode : 0;
> +}
> +
> +static struct attribute *zpci_ident_attrs[] = {
> + &dev_attr_index.attr,
> + NULL,
> +};
> +
> +static struct attribute_group zpci_ident_attr_group = {
> + .attrs = zpci_ident_attrs,
> + .is_visible = zpci_unique_uids,

It's conventional to name these functions *_is_visible() (another
convention that smbios_attr_instance and acpi_attr_index probably
predate).

> +};
> +#endif
> +
>  static struct bin_attribute *zpci_bin_attrs[] = {
>   &bin_attr_util_string,
>   &bin_attr_report_error,
> @@ -179,5 +211,8 @@ static struct attribute_group pfip_attr_group = {
>  const struct attribute_group *zpci_attr_groups[] = {
>   &zpci_attr_group,
>   &pfip_attr_group,
> +#ifndef CONFIG_DMI
> + &zpci_ident_attr_group,
> +#endif
>   NULL,
>  };
> -- 
> 2.25.1
> 


[PATCH 1/1] s390/pci: expose a PCI device's UID as its index

2021-04-12 Thread Niklas Schnelle
On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices//uid. This ID was designed to serve as the PCI
device's primary index and to match the device within Linux to the
device configured in the hypervisor. To serve as a primary identifier
the UID must be unique within the Linux instance, this is guaranteed by
the platform if and only if the UID Uniqueness Checking flag is set
within the CLP List PCI Functions response.

In this sense the UID serves an analogous function as the SMBIOS
instance number or ACPI index exposed as the "index" respectively
"acpi_index" device attributes and used by e.g. systemd to set interface
names. As s390 does not use and will likely never use ACPI nor SMBIOS
there is no conflict and we can just expose the UID under the "index"
attribute whenever UID Uniqueness Checking is active and get systemd's
interface naming support for free.

Signed-off-by: Niklas Schnelle 
Acked-by: Viktor Mihajlovski 
---
 Documentation/ABI/testing/sysfs-bus-pci | 11 +---
 arch/s390/pci/pci_sysfs.c   | 35 +
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c39770c6..1241b6d11a52 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index
 Date:  July 2010
 Contact:   Narendra K , linux-b...@dell.com
 Description:
-   Reading this attribute will provide the firmware
-   given instance (SMBIOS type 41 device type instance) of the
-   PCI device. The attribute will be created only if the firmware
-   has given an instance number to the PCI device.
+   Reading this attribute will provide the firmware given instance
+   number of the PCI device.  Depending on the platform this can
+   be for example the SMBIOS type 41 device type instance or the
+   user-defined ID (UID) on s390. The attribute will be created
+   only if the firmware has given an instance number to the PCI
+   device and that number is guaranteed to uniquely identify the
+   device in the system.
 Users:
Userspace applications interested in knowing the
firmware assigned device type instance of the PCI
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index e14d346dafd6..20dbb2058d51 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(uid_is_unique);
 
+#ifndef CONFIG_DMI
+/* analogous to smbios index */
+static ssize_t index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
+   u32 index = ~0;
+
+   if (zpci_unique_uid)
+   index = zdev->uid;
+
+   return sysfs_emit(buf, "%u\n", index);
+}
+static DEVICE_ATTR_RO(index);
+
+static umode_t zpci_unique_uids(struct kobject *kobj,
+   struct attribute *attr, int n)
+{
+   return zpci_unique_uid ? attr->mode : 0;
+}
+
+static struct attribute *zpci_ident_attrs[] = {
+   &dev_attr_index.attr,
+   NULL,
+};
+
+static struct attribute_group zpci_ident_attr_group = {
+   .attrs = zpci_ident_attrs,
+   .is_visible = zpci_unique_uids,
+};
+#endif
+
 static struct bin_attribute *zpci_bin_attrs[] = {
&bin_attr_util_string,
&bin_attr_report_error,
@@ -179,5 +211,8 @@ static struct attribute_group pfip_attr_group = {
 const struct attribute_group *zpci_attr_groups[] = {
&zpci_attr_group,
&pfip_attr_group,
+#ifndef CONFIG_DMI
+   &zpci_ident_attr_group,
+#endif
NULL,
 };
-- 
2.25.1



[PATCH v19 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-03-30 Thread Marc Zyngier
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index ead21b98b620..78d32c34d49c 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,22 +54,31 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
case ARM_SMCCC_TRNG_VERSION:
case ARM_SMCCC_TRNG_FEATURES:
@@ -81,6 +90,6 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.29.2



[PATCH] s390/pci: expose a PCI device's UID as its index

2021-03-15 Thread Niklas Schnelle
On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices//uid. This ID was designed to serve as the PCI
device's primary index and to match the device within Linux to the
device configured in the hypervisor. To serve as a primary identifier
the UID must be unique within the Linux instance, this is guaranteed by
the platform if and only if the UID Uniqueness Checking flag is set
within the CLP List PCI Functions response.

In this the UID serves an analogous function as the SMBIOS instance
number or ACPI index exposed as the "index" respectively "acpi_index"
device attributes and used by e.g. systemd to set interface names. As
s390 does not use and will likely never use ACPI nor SMBIOS there is no
conflict and we can just expose the UID under the "index" attribute
whenever UID Uniqueness Checking is active and get systemd's interface
naming support for free.

Signed-off-by: Niklas Schnelle 
Acked-by: Viktor Mihajlovski 
---
Changes from RFC patch:
- Use sysfs_emit() instead of sprintf() (Thanks Krzysztof Wilczyński!)

 Documentation/ABI/testing/sysfs-bus-pci | 11 +---
 arch/s390/pci/pci_sysfs.c   | 36 +
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c39770c6..1241b6d11a52 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index
 Date:  July 2010
 Contact:   Narendra K , linux-b...@dell.com
 Description:
-   Reading this attribute will provide the firmware
-   given instance (SMBIOS type 41 device type instance) of the
-   PCI device. The attribute will be created only if the firmware
-   has given an instance number to the PCI device.
+   Reading this attribute will provide the firmware given instance
+   number of the PCI device.  Depending on the platform this can
+   be for example the SMBIOS type 41 device type instance or the
+   user-defined ID (UID) on s390. The attribute will be created
+   only if the firmware has given an instance number to the PCI
+   device and that number is guaranteed to uniquely identify the
+   device in the system.
 Users:
Userspace applications interested in knowing the
firmware assigned device type instance of the PCI
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 5c028bee91b9..c7107bd9dd93 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -131,6 +131,38 @@ static ssize_t report_error_write(struct file *filp, 
struct kobject *kobj,
 }
 static BIN_ATTR(report_error, S_IWUSR, NULL, report_error_write, PAGE_SIZE);
 
+#ifndef CONFIG_DMI
+/* analogous to smbios index */
+static ssize_t index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
+   u32 index = ~0;
+
+   if (zpci_unique_uid)
+   index = zdev->uid;
+
+   return sysfs_emit(buf, "%u\n", index);
+}
+static DEVICE_ATTR_RO(index);
+
+static umode_t zpci_unique_uids(struct kobject *kobj,
+   struct attribute *attr, int n)
+{
+   return zpci_unique_uid ? attr->mode : 0;
+}
+
+static struct attribute *zpci_ident_attrs[] = {
+   &dev_attr_index.attr,
+   NULL,
+};
+
+static struct attribute_group zpci_ident_attr_group = {
+   .attrs = zpci_ident_attrs,
+   .is_visible = zpci_unique_uids,
+};
+#endif
+
 static struct bin_attribute *zpci_bin_attrs[] = {
&bin_attr_util_string,
&bin_attr_report_error,
@@ -150,6 +182,7 @@ static struct attribute *zpci_dev_attrs[] = {
&dev_attr_mio_enabled.attr,
NULL,
 };
+
 static struct attribute_group zpci_attr_group = {
.attrs = zpci_dev_attrs,
.bin_attrs = zpci_bin_attrs,
@@ -170,5 +203,8 @@ static struct attribute_group pfip_attr_group = {
 const struct attribute_group *zpci_attr_groups[] = {
&zpci_attr_group,
&pfip_attr_group,
+#ifndef CONFIG_DMI
+   &zpci_ident_attr_group,
+#endif
NULL,
 };
-- 
2.25.1



Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index

2021-03-08 Thread Viktor Mihajlovski




On 3/8/21 8:02 AM, Niklas Schnelle wrote:



On 3/7/21 9:46 PM, Krzysztof Wilczyński wrote:

Hi Niklas,

[...]

+static ssize_t index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
+   u32 index = ~0;
+
+   if (zpci_unique_uid)
+   index = zdev->uid;
+
+   return sprintf(buf, "%u\n", index);

[...]

Would it be possible to use the new sysfs_emit() rather than sprintf()
even though the zpci_attr macro and still use mio_enabled_show() still
would use sprintf().  What do you think?

See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for
the changes in the internal API.

Krzysztof



Of course that makes sense and thanks for pointing me to this API!
@Viktor, may I carry your R-b over?


Sure, please go ahead.

I'll also update the other attributes in a clean up patch.

Thanks,
Niklas



--
Kind Regards,
   Viktor


Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index

2021-03-07 Thread Niklas Schnelle



On 3/7/21 9:46 PM, Krzysztof Wilczyński wrote:
> Hi Niklas,
> 
> [...]
>> +static ssize_t index_show(struct device *dev,
>> +  struct device_attribute *attr, char *buf)
>> +{
>> +struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
>> +u32 index = ~0;
>> +
>> +if (zpci_unique_uid)
>> +index = zdev->uid;
>> +
>> +return sprintf(buf, "%u\n", index);
> [...]
> 
> Would it be possible to use the new sysfs_emit() rather than sprintf()
> even though the zpci_attr macro and still use mio_enabled_show() still
> would use sprintf().  What do you think?
> 
> See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for
> the changes in the internal API.
> 
> Krzysztof
> 

Of course that makes sense and thanks for pointing me to this API!
@Viktor, may I carry your R-b over?

I'll also update the other attributes in a clean up patch.

Thanks,
Niklas


Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index

2021-03-07 Thread Krzysztof Wilczyński
Hi Niklas,

[...]
> +static ssize_t index_show(struct device *dev,
> +   struct device_attribute *attr, char *buf)
> +{
> + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
> + u32 index = ~0;
> +
> + if (zpci_unique_uid)
> +         index = zdev->uid;
> +
> + return sprintf(buf, "%u\n", index);
[...]

Would it be possible to use the new sysfs_emit() rather than sprintf()
even though the zpci_attr macro and still use mio_enabled_show() still
would use sprintf().  What do you think?

See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for
the changes in the internal API.

Krzysztof


[RFC 0/1] s390/pci: expose a PCI device's UID as its index

2021-03-03 Thread Niklas Schnelle
Hi,

On s390 each PCI device has a user-defined ID (UID).  This ID was
designed to serve as the PCI device's primary index and to match the
device within Linux to the with the view in the hypervisor
configuration. To serve as a primary identifier the UID must be unique
within the Linux instance, this is guaranteed by the platform if and
only if the UID Uniqueness Checking flag is set within the CLP List PCI
Functions response which is also currently used to determine whether the
UID is also used as the Domain part of the geographical PCI address of
the device.

As primary identifier of a PCI device, the UID serves an analogous
function as the SMBIOS instance number or ACPI index exposed as the
"index" respectively "acpi_index" device attributes. These attributes
are used by e.g. systemd/udev to set network interface names. As s390
does not use  ACPI nor SMBIOS there is no conflict and we can expose the
UID under the "index" attribute whenever UID Uniqueness Checking is
active and systemd/udev will then create "eno.." interface names.

Note: This is an evolution of an earlier patch I sent for exposing the
UID Uniqueness Checking flag directly. Thank you Greg for making me
realize that we were looking too much at just exposing platform details
instead of looking how existing interfaces could suit our purpose.

Thanks,
Niklas Schnelle

Niklas Schnelle (1):
  s390/pci: expose a PCI device's UID as its index

 Documentation/ABI/testing/sysfs-bus-pci | 11 +---
 arch/s390/pci/pci_sysfs.c   | 36 +
 2 files changed, 43 insertions(+), 4 deletions(-)

-- 
2.25.1



[RFC 1/1] s390/pci: expose a PCI device's UID as its index

2021-03-03 Thread Niklas Schnelle
On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices//uid. This ID was designed to serve as the PCI
device's primary index and to match the device within Linux to the
device configured in the hypervisor. To serve as a primary identifier
the UID must be unique within the Linux instance, this is guaranteed by
the platform if and only if the UID Uniqueness Checking flag is set
within the CLP List PCI Functions response.

In this the UID serves an analogous function as the SMBIOS instance
number or ACPI index exposed as the "index" respectively "acpi_index"
device attributes and used by e.g. systemd to set interface names. As
s390 does not use and will likely never use ACPI nor SMBIOS there is no
conflict and we can just expose the UID under the "index" attribute
whenever UID Uniqueness Checking is active and get systemd's interface
naming support for free.

Signed-off-by: Niklas Schnelle 
Acked-by: Viktor Mihajlovski 
---
 Documentation/ABI/testing/sysfs-bus-pci | 11 +---
 arch/s390/pci/pci_sysfs.c   | 36 +
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c39770c6..1241b6d11a52 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index
 Date:  July 2010
 Contact:   Narendra K , linux-b...@dell.com
 Description:
-   Reading this attribute will provide the firmware
-   given instance (SMBIOS type 41 device type instance) of the
-   PCI device. The attribute will be created only if the firmware
-   has given an instance number to the PCI device.
+   Reading this attribute will provide the firmware given instance
+   number of the PCI device.  Depending on the platform this can
+   be for example the SMBIOS type 41 device type instance or the
+   user-defined ID (UID) on s390. The attribute will be created
+   only if the firmware has given an instance number to the PCI
+   device and that number is guaranteed to uniquely identify the
+   device in the system.
 Users:
Userspace applications interested in knowing the
firmware assigned device type instance of the PCI
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 5c028bee91b9..30f48e8a1156 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -131,6 +131,38 @@ static ssize_t report_error_write(struct file *filp, 
struct kobject *kobj,
 }
 static BIN_ATTR(report_error, S_IWUSR, NULL, report_error_write, PAGE_SIZE);
 
+#ifndef CONFIG_DMI
+/* analogous to smbios index */
+static ssize_t index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
+   u32 index = ~0;
+
+   if (zpci_unique_uid)
+   index = zdev->uid;
+
+   return sprintf(buf, "%u\n", index);
+}
+static DEVICE_ATTR_RO(index);
+
+static umode_t zpci_unique_uids(struct kobject *kobj,
+   struct attribute *attr, int n)
+{
+   return zpci_unique_uid ? attr->mode : 0;
+}
+
+static struct attribute *zpci_ident_attrs[] = {
+   &dev_attr_index.attr,
+   NULL,
+};
+
+static struct attribute_group zpci_ident_attr_group = {
+   .attrs = zpci_ident_attrs,
+   .is_visible = zpci_unique_uids,
+};
+#endif
+
 static struct bin_attribute *zpci_bin_attrs[] = {
&bin_attr_util_string,
&bin_attr_report_error,
@@ -150,6 +182,7 @@ static struct attribute *zpci_dev_attrs[] = {
&dev_attr_mio_enabled.attr,
NULL,
 };
+
 static struct attribute_group zpci_attr_group = {
.attrs = zpci_dev_attrs,
.bin_attrs = zpci_bin_attrs,
@@ -170,5 +203,8 @@ static struct attribute_group pfip_attr_group = {
 const struct attribute_group *zpci_attr_groups[] = {
&zpci_attr_group,
&pfip_attr_group,
+#ifndef CONFIG_DMI
+   &zpci_ident_attr_group,
+#endif
NULL,
 };
-- 
2.25.1



[PATCH v2 06/17] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-02-08 Thread Gavin Shan
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Signed-off-by: Will Deacon 
Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index a54c4805f2a6..e02e29a12bbf 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   long val[4] = { SMCCC_RET_NOT_SUPPORTED };
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,22 +54,31 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
case SDEI_1_0_FN_SDEI_VERSION:
case SDEI_1_0_FN_SDEI_EVENT_REGISTER:
@@ -93,6 +102,6 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.23.0



[PATCH v18 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-02-08 Thread Marc Zyngier
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 25ea4ecb6449..b9d8607083eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.29.2



Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute

2021-02-04 Thread Niklas Schnelle



On 2/4/21 2:38 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 04, 2021 at 01:02:51PM +0100, Niklas Schnelle wrote:
>>
>>
>> On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote:
>>> On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote:
>>>> The global UID uniqueness attribute exposes whether the platform
>>>> guarantees that the user-defined per-device UID attribute values
>>>> (/sys/bus/pci/device//uid) are unique and can thus be used as
>>>> a global identifier for the associated PCI device. With this commit
>>>> it is exposed at /sys/bus/pci/zpci/unique_uids
>>>>
>>>> Signed-off-by: Niklas Schnelle 
>>>> ---
>>>>  Documentation/ABI/testing/sysfs-bus-pci |  9 +
>>>>  drivers/pci/pci-sysfs.c | 21 +
>>>>  2 files changed, 30 insertions(+)
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
>>>> b/Documentation/ABI/testing/sysfs-bus-pci
>>>> index 25c9c39770c6..812dd9d3f80d 100644
>>>> --- a/Documentation/ABI/testing/sysfs-bus-pci
>>>> +++ b/Documentation/ABI/testing/sysfs-bus-pci
>>>> @@ -375,3 +375,12 @@ Description:
>>>>The value comes from the PCI kernel device state and can be one
>>>>of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>>>>The file is read only.
>>>> +What: /sys/bus/pci/zpci/unique_uids
>>>
>>> No blank line before this new line?
>>
>> Good catch, thanks!
>>
>>>
>>> And why "zpci"?
>>
>> There doesn't seem to be a precedent for arch specific attributes under
>> /sys/bus/pci so I went with a separate group for the RFC.
> 
> Why?  There's nothing arch-specific here, right?  Either the file is
> present or not.

Good point, will change this to /sys/bus/pci/unique_uids

> 
>> "zpci" since that's what we've been calling the s390 specific PCI.
>> I'm not attached to that name or having a separate group, from
>> my perspective /sys/bus/pci/unique_uids would actually be ideal
>> if Bjorn is okay with that, we don't currently foresee any additional
>> global attributes and no one else seems to have them but
>> one never knows and a separate group would then of course,
>> well group them.
> 
> Why not just a simple file, no need for arch-specific names if this
> really can show up under other arches?

I guess other arches would be free to implement the same UID concept
as above I'm happy to change this to a simple file.

> 
>>>> +Date: February 2021
>>>> +Contact:  Niklas Schnelle 
>>>> +Description:
>>>> +  This attribute exposes the global state of UID Uniqueness on an
>>>> +  s390 Linux system. If this file contains '1' the per-device UID
>>>> +  attribute is guaranteed to provide a unique user defined
>>>> +  identifier for that PCI device. If this file contains '0' UIDs
>>>> +  may collide and do not provide a unique identifier.
>>>
>>> What are they "colliding" with?  And where does the UID come from, the
>>> device itself or somewhere else?
>>
>> If this attribute is 0 multiple PCI devices seen by Linux may have the same 
>> UID i.e.
>> they may collide with each other on the same Linux instance.
> 
> So can't userspace figure this out on its own?

Userspace can figure out that they do not currently collide but
without this attribute it doesn't know that the platform also guarantees
that no collision will be introduced by e.g. hotplug. We are however
pushing to have this always turned on so yes over time this attribute
will hopefully be always 1 but this will take years.

> 
>> The
>> UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute 
>> is 1
>> multiple Linux instances will often see the same UID. The main use case
>> we currently envision is naming PCI based network interfaces "eno"
>> which of course only works if the UIDs are unique for that Linux.
>> On the same mainframe multiple Linux instances may then e.g. use the same
>> UID for VFs from the same physical PCI network card or different cards
>> but the same physical network all defined by an administrator/management
>> system.
>>
>> The UIDs come from the platform/firmware/hypervisor and are provided
>> for each device by t

Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute

2021-02-04 Thread Greg Kroah-Hartman
On Thu, Feb 04, 2021 at 01:02:51PM +0100, Niklas Schnelle wrote:
> 
> 
> On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote:
> > On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote:
> >> The global UID uniqueness attribute exposes whether the platform
> >> guarantees that the user-defined per-device UID attribute values
> >> (/sys/bus/pci/device//uid) are unique and can thus be used as
> >> a global identifier for the associated PCI device. With this commit
> >> it is exposed at /sys/bus/pci/zpci/unique_uids
> >>
> >> Signed-off-by: Niklas Schnelle 
> >> ---
> >>  Documentation/ABI/testing/sysfs-bus-pci |  9 +
> >>  drivers/pci/pci-sysfs.c | 21 +
> >>  2 files changed, 30 insertions(+)
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> >> b/Documentation/ABI/testing/sysfs-bus-pci
> >> index 25c9c39770c6..812dd9d3f80d 100644
> >> --- a/Documentation/ABI/testing/sysfs-bus-pci
> >> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> >> @@ -375,3 +375,12 @@ Description:
> >>The value comes from the PCI kernel device state and can be one
> >>of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
> >>The file is read only.
> >> +What: /sys/bus/pci/zpci/unique_uids
> > 
> > No blank line before this new line?
> 
> Good catch, thanks!
> 
> > 
> > And why "zpci"?
> 
> There doesn't seem to be a precedent for arch specific attributes under
> /sys/bus/pci so I went with a separate group for the RFC.

Why?  There's nothing arch-specific here, right?  Either the file is
present or not.

> "zpci" since that's what we've been calling the s390 specific PCI.
> I'm not attached to that name or having a separate group, from
> my perspective /sys/bus/pci/unique_uids would actually be ideal
> if Bjorn is okay with that, we don't currently foresee any additional
> global attributes and no one else seems to have them but
> one never knows and a separate group would then of course,
> well group them.

Why not just a simple file, no need for arch-specific names if this
really can show up under other arches?

> >> +Date: February 2021
> >> +Contact:  Niklas Schnelle 
> >> +Description:
> >> +  This attribute exposes the global state of UID Uniqueness on an
> >> +  s390 Linux system. If this file contains '1' the per-device UID
> >> +  attribute is guaranteed to provide a unique user defined
> >> +  identifier for that PCI device. If this file contains '0' UIDs
> >> +  may collide and do not provide a unique identifier.
> > 
> > What are they "colliding" with?  And where does the UID come from, the
> > device itself or somewhere else?
> 
> If this attribute is 0 multiple PCI devices seen by Linux may have the same 
> UID i.e.
> they may collide with each other on the same Linux instance.

So can't userspace figure this out on its own?

> The
> UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute 
> is 1
> multiple Linux instances will often see the same UID. The main use case
> we currently envision is naming PCI based network interfaces "eno"
> which of course only works if the UIDs are unique for that Linux.
> On the same mainframe multiple Linux instances may then e.g. use the same
> UID for VFs from the same physical PCI network card or different cards
> but the same physical network all defined by an administrator/management
> system.
> 
> The UIDs come from the platform/firmware/hypervisor and are provided
> for each device by the CLP List PCI Functions "instruction" that is used
> on s390x where an OS can't probe the physical PCI bus but instead
> that is done by firmware. On QEMU/KVM they can be set on the QEMU cli,
> on our machine hypervisor they are defined by the machine 
> administrator/management
> software as part of the definition of VMs/Partitions on that mainframe which 
> includes
> everything from the number of CPUs, memory, I/O devices etc. With the exposed 
> UID uniqueness
> attribute the platform then certifies that it will ensure that a UID is set to
> a unique non-zero value. I can of course add more of this explanation
> in the documentation too.

Please explain it more, but why would userspace care about this?  If
userspace sees a UID "collision" then it just adds something else to the
end of the name to make it unique.

What is it supposed to do differently based on the value/presense of
this file?

> Both the uniqueness guarantee (this attribute) as well as the UIDs themselves
> are part of the Z (s390x) architecture so fall under the mainframe typical
> backwards compatibility considerations.

So what can userspace do with this?  What tool is going to rely on this?

thanks,

greg k-h


Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute

2021-02-04 Thread Niklas Schnelle



On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote:
> On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote:
>> The global UID uniqueness attribute exposes whether the platform
>> guarantees that the user-defined per-device UID attribute values
>> (/sys/bus/pci/device//uid) are unique and can thus be used as
>> a global identifier for the associated PCI device. With this commit
>> it is exposed at /sys/bus/pci/zpci/unique_uids
>>
>> Signed-off-by: Niklas Schnelle 
>> ---
>>  Documentation/ABI/testing/sysfs-bus-pci |  9 +
>>  drivers/pci/pci-sysfs.c | 21 +
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
>> b/Documentation/ABI/testing/sysfs-bus-pci
>> index 25c9c39770c6..812dd9d3f80d 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-pci
>> +++ b/Documentation/ABI/testing/sysfs-bus-pci
>> @@ -375,3 +375,12 @@ Description:
>>  The value comes from the PCI kernel device state and can be one
>>  of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>>  The file is read only.
>> +What:   /sys/bus/pci/zpci/unique_uids
> 
> No blank line before this new line?

Good catch, thanks!

> 
> And why "zpci"?

There doesn't seem to be a precedent for arch specific attributes under
/sys/bus/pci so I went with a separate group for the RFC.
"zpci" since that's what we've been calling the s390 specific PCI.
I'm not attached to that name or having a separate group, from
my perspective /sys/bus/pci/unique_uids would actually be ideal
if Bjorn is okay with that, we don't currently foresee any additional
global attributes and no one else seems to have them but
one never knows and a separate group would then of course,
well group them.

> 
>> +Date:   February 2021
>> +Contact:Niklas Schnelle 
>> +Description:
>> +This attribute exposes the global state of UID Uniqueness on an
>> +s390 Linux system. If this file contains '1' the per-device UID
>> +attribute is guaranteed to provide a unique user defined
>> +identifier for that PCI device. If this file contains '0' UIDs
>> +may collide and do not provide a unique identifier.
> 
> What are they "colliding" with?  And where does the UID come from, the
> device itself or somewhere else?

If this attribute is 0 multiple PCI devices seen by Linux may have the same UID 
i.e.
they may collide with each other on the same Linux instance. The
UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute is 
1
multiple Linux instances will often see the same UID. The main use case
we currently envision is naming PCI based network interfaces "eno"
which of course only works if the UIDs are unique for that Linux.
On the same mainframe multiple Linux instances may then e.g. use the same
UID for VFs from the same physical PCI network card or different cards
but the same physical network all defined by an administrator/management
system.

The UIDs come from the platform/firmware/hypervisor and are provided
for each device by the CLP List PCI Functions "instruction" that is used
on s390x where an OS can't probe the physical PCI bus but instead
that is done by firmware. On QEMU/KVM they can be set on the QEMU cli,
on our machine hypervisor they are defined by the machine 
administrator/management
software as part of the definition of VMs/Partitions on that mainframe which 
includes
everything from the number of CPUs, memory, I/O devices etc. With the exposed 
UID uniqueness
attribute the platform then certifies that it will ensure that a UID is set to
a unique non-zero value. I can of course add more of this explanation
in the documentation too.

Both the uniqueness guarantee (this attribute) as well as the UIDs themselves
are part of the Z (s390x) architecture so fall under the mainframe typical
backwards compatibility considerations.

Thanks,
Niklas

> 
> thanks,
> 
> greg k-h
> 


Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute

2021-02-04 Thread Greg Kroah-Hartman
On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote:
> The global UID uniqueness attribute exposes whether the platform
> guarantees that the user-defined per-device UID attribute values
> (/sys/bus/pci/device//uid) are unique and can thus be used as
> a global identifier for the associated PCI device. With this commit
> it is exposed at /sys/bus/pci/zpci/unique_uids
> 
> Signed-off-by: Niklas Schnelle 
> ---
>  Documentation/ABI/testing/sysfs-bus-pci |  9 +
>  drivers/pci/pci-sysfs.c | 21 +
>  2 files changed, 30 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c39770c6..812dd9d3f80d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -375,3 +375,12 @@ Description:
>   The value comes from the PCI kernel device state and can be one
>   of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>   The file is read only.
> +What:/sys/bus/pci/zpci/unique_uids

No blank line before this new line?

And why "zpci"?

> +Date:February 2021
> +Contact: Niklas Schnelle 
> +Description:
> +     This attribute exposes the global state of UID Uniqueness on an
> + s390 Linux system. If this file contains '1' the per-device UID
> + attribute is guaranteed to provide a unique user defined
> + identifier for that PCI device. If this file contains '0' UIDs
> + may collide and do not provide a unique identifier.

What are they "colliding" with?  And where does the UID come from, the
device itself or somewhere else?

thanks,

greg k-h


[RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute

2021-02-04 Thread Niklas Schnelle
The global UID uniqueness attribute exposes whether the platform
guarantees that the user-defined per-device UID attribute values
(/sys/bus/pci/device//uid) are unique and can thus be used as
a global identifier for the associated PCI device. With this commit
it is exposed at /sys/bus/pci/zpci/unique_uids

Signed-off-by: Niklas Schnelle 
---
 Documentation/ABI/testing/sysfs-bus-pci |  9 +
 drivers/pci/pci-sysfs.c | 21 +
 2 files changed, 30 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c39770c6..812dd9d3f80d 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -375,3 +375,12 @@ Description:
The value comes from the PCI kernel device state and can be one
of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
The file is read only.
+What:  /sys/bus/pci/zpci/unique_uids
+Date:  February 2021
+Contact:   Niklas Schnelle 
+Description:
+   This attribute exposes the global state of UID Uniqueness on an
+       s390 Linux system. If this file contains '1' the per-device UID
+   attribute is guaranteed to provide a unique user defined
+   identifier for that PCI device. If this file contains '0' UIDs
+   may collide and do not provide a unique identifier.
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fb072f4b3176..61de66ab59cf 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -425,6 +425,24 @@ static ssize_t rescan_store(struct bus_type *bus, const 
char *buf, size_t count)
 }
 static BUS_ATTR_WO(rescan);
 
+#if defined(CONFIG_S390)
+static ssize_t unique_uids_show(struct bus_type *bus, char *buf)
+{
+   return sprintf(buf, "%i\n", zpci_unique_uid);
+}
+static BUS_ATTR_RO(unique_uids);
+
+static struct attribute *zpci_bus_attrs[] = {
+   &bus_attr_unique_uids.attr,
+   NULL,
+};
+
+static struct attribute_group zpci_bus_group = {
+   .name = "zpci",
+   .attrs = zpci_bus_attrs,
+};
+#endif
+
 static struct attribute *pci_bus_attrs[] = {
&bus_attr_rescan.attr,
NULL,
@@ -436,6 +454,9 @@ static const struct attribute_group pci_bus_group = {
 
 const struct attribute_group *pci_bus_groups[] = {
&pci_bus_group,
+#if defined(CONFIG_S390)
+   &zpci_bus_group,
+#endif
NULL,
 };
 
-- 
2.17.1



[RFC v2 0/1] s390/pci: expose UID checking state in sysfs

2021-02-04 Thread Niklas Schnelle
Hi Bjorn,

this is a follow-up to my previous RFC of exposing our s390 specific UID
Checking attribute at /sys/bus/pci/zpci/unique_uids. As suggested by Greg
(thanks!) this version changes things to use named attributes directly without
resorting to any raw kobject handling, as a result the code moves to
drivers/pci/pci-sysfs.c with a CONFIG_S390 ifdef. Also I've changed the wording
to be more restrictive about what this attribute means. Instead of directly
calling out its current use to determine if UIDs are used as PCI domains it now
only explicitly claims that /sys/bus/pci/devices//uid is guaranteed to be
a unique user-defined (in the zVM/KVM/LPAR configuration not Linux user-space)
identifier for the PCI function.

We've had some more internal discussion on this and are also considering to
instead put this attribute at /sys/firmware/zpci/unique_uids but as far as
I can see this strictly requires the use of raw kobject handling and loses us
the direct relation with PCI so I wanted to give this just one more shot and
get your opinion on it.

Thanks,
Niklas


Niklas Schnelle (1):
  PCI: Add s390 specific UID uniqueness attribute

 Documentation/ABI/testing/sysfs-bus-pci |  9 +
 drivers/pci/pci-sysfs.c | 21 +
 2 files changed, 30 insertions(+)

-- 
2.17.1



[PATCH v17 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-02-02 Thread Marc Zyngier
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 25ea4ecb6449..b9d8607083eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.29.2



Re: [PATCH] configfs: make directories inherit uid/gid from creator

2021-01-27 Thread Maciej Żenczykowski
> > Currently a non-root process can create directories, but cannot
> > create stuff in the directories it creates.
>
> Isn't that on purpose?

Is it?  What's the use case of being able to create empty directories
you can't use?
Why allow their creation in the first place then?

> > + (void)configfs_setattr(dentry, &ia);
>
> No need for (void), here, right?

Just being clear we're ignoring any potential error return.

> And this feels like a big user-visible change, what is going to break
> with this?

That I don't know, but it's unlikely to break anything, since it's virtually
impossible to use as it is now.  If non-root creates the directory, it's
currently root-owned, so can only be used by root (or something with
appropriate caps).
Something with capabilities will be able to use it even if it is no
longer owned by root.


Re: [PATCH] configfs: make directories inherit uid/gid from creator

2021-01-27 Thread Greg Kroah-Hartman
On Sat, Jan 23, 2021 at 12:55:16PM -0800, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> Currently a non-root process can create directories, but cannot
> create stuff in the directories it creates.

Isn't that on purpose?

> 
> Example (before this patch):
>   phone:/ $ id
>   uid=1000(system) gid=1000(system) groups=1000(system),... context=u:r:su:s0
> 
>   phone:/ $ cd /config/usb_gadget/g1/configs/
> 
>   phone:/config/usb_gadget/g1/configs $ ls -lZ
>   drwxr-xr-x 3 system system u:object_r:configfs:s0  0 2020-12-28 06:03 b.1
> 
>   phone:/config/usb_gadget/g1/configs $ mkdir b.2
> 
>   phone:/config/usb_gadget/g1/configs $ ls -lZ
>   drwxr-xr-x 3 system system u:object_r:configfs:s0  0 2020-12-28 06:03 b.1
>   drwxr-xr-x 3 root   root   u:object_r:configfs:s0  0 2020-12-28 06:51 b.2
> 
>   phone:/config/usb_gadget/g1/configs $ chown system:system b.2
>   chown: 'b.2' to 'system:system': Operation not permitted
> 
>   phone:/config/usb_gadget/g1/configs $ ls -lZ b.1
>   -rw-r--r-- 1 system system u:object_r:configfs:s0  4096 2020-12-28 05:23 
> MaxPower
>   -rw-r--r-- 1 system system u:object_r:configfs:s0  4096 2020-12-28 05:23 
> bmAttributes
>   lrwxrwxrwx 1 root   root   u:object_r:configfs:s0 0 2020-12-28 05:23 
> function0 -> ../../../../usb_gadget/g1/functions/ffs.adb
>   drwxr-xr-x 2 system system u:object_r:configfs:s0 0 2020-12-28 05:23 
> strings
> 
>   phone:/config/usb_gadget/g1/configs $ ln -s 
> ../../../../usb_gadget/g1/functions/ffs.adb b.2/function0
>   ln: cannot create symbolic link from 
> '../../../../usb_gadget/g1/functions/ffs.adb' to 'b.2/function0': Permission 
> denied
> 
> Cc: Greg Kroah-Hartman 
> Cc: Joel Becker 
> Cc: Christoph Hellwig 
> Signed-off-by: Maciej Żenczykowski 
> Change-Id: Ia907b2def940197b44aa87b337d37c5dde9c5b91

No need for "Change-Id:" :)

> ---
>  fs/configfs/dir.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
> index b839dd1b459f..04f18402ef7c 100644
> --- a/fs/configfs/dir.c
> +++ b/fs/configfs/dir.c
> @@ -1410,6 +1410,21 @@ static int configfs_mkdir(struct inode *dir, struct 
> dentry *dentry, umode_t mode
>   else
>   ret = configfs_attach_item(parent_item, item, dentry, frag);
>  
> + /* inherit uid/gid from process creating the directory */
> + if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID) ||
> + !gid_eq(current_fsgid(), GLOBAL_ROOT_GID)) {
> + struct inode * inode = d_inode(dentry);
> + struct iattr ia = {
> + .ia_uid = current_fsuid(),
> + .ia_gid = current_fsgid(),
> + .ia_valid = ATTR_UID | ATTR_GID,
> + };
> + inode->i_uid = ia.ia_uid;
> + inode->i_gid = ia.ia_gid;
> + /* (note: the above manual assignments skip the permission 
> checks) */
> + (void)configfs_setattr(dentry, &ia);

No need for (void), here, right?

And this feels like a big user-visible change, what is going to break
with this?

thanks,

greg k-h


Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-26 Thread Tyler Hicks
On 2021-01-25 14:25:38, Miklos Szeredi wrote:
> On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks  wrote:
> >
> > On 2021-01-19 17:22:03, Miklos Szeredi wrote:
> > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> > > vfs_setxattr()") the translation of nscap->rootid did not take stacked
> > > filesystems (overlayfs and ecryptfs) into account.
> > >
> > > That patch fixed the overlay case, but made the ecryptfs case worse.
> >
> > Thanks for sending a fix!
> >
> > I know that you don't have an eCryptfs setup to test with but I'm at a
> > loss about how to test this from the userns/fscaps side of things. Do
> > you have a sequence of unshare/setcap/getcap commands that I can run on
> > a file inside of an eCryptfs mount to verify that the bug exists after
> > 7c03e2cda4a5 and then again to verify that this patch fixes the bug?
> 
> You need two terminals:
> $ = 
> # = root
> 
> $ unshare -Um
> $ echo $$
> 
> # echo "0 1000 1" > uid_map
> # cp uid_map gid_map
> # echo 1000 2000 1 >> uid_map
> # echo 2000 3000 1 >> uid_map
> # cat uid_map > /proc//uid_map
> # cat gid_map > /proc//gid_map
> $ mkdir ~/tmp ~/mnt
> $ mount -t tmpfs tmpfs ~/tmp
> $ pwd
> /home/
> # nsenter -t  -m
> # [setup ecryptfs on /home//mnt using /home//tmp]
> $ cd ~/mnt
> $ touch test
> $ /sbin/setcap -n 1000 cap_dac_override+eip test
> $ /sbin/getcap -n test
> test = cap_dac_override+eip [rootid=1000]
> 
> Without the patch, I'm thinking that it will do a double translate and
> end up with rootid=2000 in the user namespace, but I might well have
> messed it up...
> 
> Let me know how this goes.

Spot-on instructions. Thank you for taking the time to provide the
steps.

I was able to repro the bug and verify the fix. The change visually
looks good to me and it passed through the eCryptfs regression tests.

I've pushed it to the eCryptfs next branch and I plan to submit it to
Linus on Thursday. Thanks again!

Tyler

> 
> Thanks,
> Miklos
> 


Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-25 Thread Miklos Szeredi
On Mon, Jan 25, 2021 at 2:25 PM Miklos Szeredi  wrote:
>
> On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks  wrote:
> >
> > On 2021-01-19 17:22:03, Miklos Szeredi wrote:
> > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> > > vfs_setxattr()") the translation of nscap->rootid did not take stacked
> > > filesystems (overlayfs and ecryptfs) into account.
> > >
> > > That patch fixed the overlay case, but made the ecryptfs case worse.
> >
> > Thanks for sending a fix!
> >
> > I know that you don't have an eCryptfs setup to test with but I'm at a
> > loss about how to test this from the userns/fscaps side of things. Do
> > you have a sequence of unshare/setcap/getcap commands that I can run on
> > a file inside of an eCryptfs mount to verify that the bug exists after
> > 7c03e2cda4a5 and then again to verify that this patch fixes the bug?
>
> You need two terminals:
> $ = 
> # = root
>
> $ unshare -Um
> $ echo $$
> 
> # echo "0 1000 1" > uid_map

NOTE:   is assumed to have uid=1000, so this and following
"1000" values need to be fixed up if it's not the case.

Thanks,
Miklos



Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-25 Thread Miklos Szeredi
On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks  wrote:
>
> On 2021-01-19 17:22:03, Miklos Szeredi wrote:
> > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> > vfs_setxattr()") the translation of nscap->rootid did not take stacked
> > filesystems (overlayfs and ecryptfs) into account.
> >
> > That patch fixed the overlay case, but made the ecryptfs case worse.
>
> Thanks for sending a fix!
>
> I know that you don't have an eCryptfs setup to test with but I'm at a
> loss about how to test this from the userns/fscaps side of things. Do
> you have a sequence of unshare/setcap/getcap commands that I can run on
> a file inside of an eCryptfs mount to verify that the bug exists after
> 7c03e2cda4a5 and then again to verify that this patch fixes the bug?

You need two terminals:
$ = 
# = root

$ unshare -Um
$ echo $$

# echo "0 1000 1" > uid_map
# cp uid_map gid_map
# echo 1000 2000 1 >> uid_map
# echo 2000 3000 1 >> uid_map
# cat uid_map > /proc//uid_map
# cat gid_map > /proc//gid_map
$ mkdir ~/tmp ~/mnt
$ mount -t tmpfs tmpfs ~/tmp
$ pwd
/home/
# nsenter -t  -m
# [setup ecryptfs on /home//mnt using /home//tmp]
$ cd ~/mnt
$ touch test
$ /sbin/setcap -n 1000 cap_dac_override+eip test
$ /sbin/getcap -n test
test = cap_dac_override+eip [rootid=1000]

Without the patch, I'm thinking that it will do a double translate and
end up with rootid=2000 in the user namespace, but I might well have
messed it up...

Let me know how this goes.

Thanks,
Miklos



[PATCH] configfs: make directories inherit uid/gid from creator

2021-01-23 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Currently a non-root process can create directories, but cannot
create stuff in the directories it creates.

Example (before this patch):
  phone:/ $ id
  uid=1000(system) gid=1000(system) groups=1000(system),... context=u:r:su:s0

  phone:/ $ cd /config/usb_gadget/g1/configs/

  phone:/config/usb_gadget/g1/configs $ ls -lZ
  drwxr-xr-x 3 system system u:object_r:configfs:s0  0 2020-12-28 06:03 b.1

  phone:/config/usb_gadget/g1/configs $ mkdir b.2

  phone:/config/usb_gadget/g1/configs $ ls -lZ
  drwxr-xr-x 3 system system u:object_r:configfs:s0  0 2020-12-28 06:03 b.1
  drwxr-xr-x 3 root   root   u:object_r:configfs:s0  0 2020-12-28 06:51 b.2

  phone:/config/usb_gadget/g1/configs $ chown system:system b.2
  chown: 'b.2' to 'system:system': Operation not permitted

  phone:/config/usb_gadget/g1/configs $ ls -lZ b.1
  -rw-r--r-- 1 system system u:object_r:configfs:s0  4096 2020-12-28 05:23 
MaxPower
  -rw-r--r-- 1 system system u:object_r:configfs:s0  4096 2020-12-28 05:23 
bmAttributes
  lrwxrwxrwx 1 root   root   u:object_r:configfs:s0 0 2020-12-28 05:23 
function0 -> ../../../../usb_gadget/g1/functions/ffs.adb
  drwxr-xr-x 2 system system u:object_r:configfs:s0 0 2020-12-28 05:23 
strings

  phone:/config/usb_gadget/g1/configs $ ln -s 
../../../../usb_gadget/g1/functions/ffs.adb b.2/function0
  ln: cannot create symbolic link from 
'../../../../usb_gadget/g1/functions/ffs.adb' to 'b.2/function0': Permission 
denied

Cc: Greg Kroah-Hartman 
Cc: Joel Becker 
Cc: Christoph Hellwig 
Signed-off-by: Maciej Żenczykowski 
Change-Id: Ia907b2def940197b44aa87b337d37c5dde9c5b91
---
 fs/configfs/dir.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index b839dd1b459f..04f18402ef7c 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -1410,6 +1410,21 @@ static int configfs_mkdir(struct inode *dir, struct 
dentry *dentry, umode_t mode
else
ret = configfs_attach_item(parent_item, item, dentry, frag);
 
+   /* inherit uid/gid from process creating the directory */
+   if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID) ||
+   !gid_eq(current_fsgid(), GLOBAL_ROOT_GID)) {
+   struct inode * inode = d_inode(dentry);
+   struct iattr ia = {
+   .ia_uid = current_fsuid(),
+   .ia_gid = current_fsgid(),
+   .ia_valid = ATTR_UID | ATTR_GID,
+   };
+   inode->i_uid = ia.ia_uid;
+   inode->i_gid = ia.ia_gid;
+   /* (note: the above manual assignments skip the permission 
checks) */
+   (void)configfs_setattr(dentry, &ia);
+   }
+
spin_lock(&configfs_dirent_lock);
sd->s_type &= ~CONFIGFS_USET_IN_MKDIR;
if (!ret)
-- 
2.30.0.280.ga3ce27912f-goog



Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-22 Thread Tyler Hicks
On 2021-01-19 17:22:03, Miklos Szeredi wrote:
> Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> vfs_setxattr()") the translation of nscap->rootid did not take stacked
> filesystems (overlayfs and ecryptfs) into account.
> 
> That patch fixed the overlay case, but made the ecryptfs case worse.

Thanks for sending a fix!

I know that you don't have an eCryptfs setup to test with but I'm at a
loss about how to test this from the userns/fscaps side of things. Do
you have a sequence of unshare/setcap/getcap commands that I can run on
a file inside of an eCryptfs mount to verify that the bug exists after
7c03e2cda4a5 and then again to verify that this patch fixes the bug?

Tyler

> 
> Restore old the behavior for ecryptfs that existed before the overlayfs
> fix.  This does not fix ecryptfs's handling of complex user namespace
> setups, but it does make sure existing setups don't regress.
> 
> Reported-by: Eric W. Biederman 
> Cc: Tyler Hicks 
> Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/ecryptfs/inode.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index e23752d9a79f..58d0f7187997 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode 
> *inode,
>  {
>   int rc;
>   struct dentry *lower_dentry;
> + struct inode *lower_inode;
>  
>   lower_dentry = ecryptfs_dentry_to_lower(dentry);
> - if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) {
> + lower_inode = d_inode(lower_dentry);
> + if (!(lower_inode->i_opflags & IOP_XATTR)) {
>   rc = -EOPNOTSUPP;
>   goto out;
>   }
> - rc = vfs_setxattr(lower_dentry, name, value, size, flags);
> + inode_lock(lower_inode);
> + rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, 
> NULL);
> + inode_unlock(lower_inode);
>   if (!rc && inode)
> - fsstack_copy_attr_all(inode, d_inode(lower_dentry));
> + fsstack_copy_attr_all(inode, lower_inode);
>  out:
>   return rc;
>  }
> -- 
> 2.26.2
> 


Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-22 Thread Tyler Hicks
On 2021-01-20 08:52:27, Miklos Szeredi wrote:
> On Tue, Jan 19, 2021 at 10:11 PM Eric W. Biederman
>  wrote:
> >
> > Miklos Szeredi  writes:
> >
> > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> > > vfs_setxattr()") the translation of nscap->rootid did not take stacked
> > > filesystems (overlayfs and ecryptfs) into account.
> > >
> > > That patch fixed the overlay case, but made the ecryptfs case worse.
> > >
> > > Restore old the behavior for ecryptfs that existed before the overlayfs
> > > fix.  This does not fix ecryptfs's handling of complex user namespace
> > > setups, but it does make sure existing setups don't regress.
> >
> > Today vfs_setxattr handles handles a delegated_inode and breaking
> > leases.  Code that is enabled with CONFIG_FILE_LOCKING.  So unless
> > I am missing something this introduces a different regression into
> > ecryptfs.
> 
> This is in line with all the other cases of ecryptfs passing NULL as
> delegated inode.
> 
> I'll defer this to the maintainer of ecryptfs.

eCryptfs cannot be exported so I do not think this proposed fix to
ecryptfs_setxattr() creates a new regression wrt inode delegation.

Tyler

> 
> Thanks,
> Miklos


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Niklas Schnelle



On 1/21/21 6:28 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 21, 2021 at 06:04:52PM +0100, Niklas Schnelle wrote:
>>
>>
>> On 1/21/21 4:54 PM, Bjorn Helgaas wrote:
>>> [Greg may be able to help compare/contrast this s390 UID with udev
>>> persistent names]
>>>
>>> On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote:
>>>> On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
>>>>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
>>>>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
>>>>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
>>>>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
>>>>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
>>>>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
>>>>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>>>>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>>>>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>>>>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>>>>>> ... snip ...
>>>>>>
>>>>>>>
>>>>>>>>if (!zpci_global_kset)
>>>>>>>>return -ENOMEM;
>>>>>>>>
>>>>>>>>return sysfs_create_group(&zpci_global_kset->kobj, 
>>>>>>>> &zpci_attr_group_global);
>>>>>>>
>>>>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
>>>>>>> that's usually a huge clue that you are doing something wrong.
>>>>>>>
>>>>>>> Try the above again, with a simple attribute group, and name for it, and
>>>>>>> it should "just work".
>>>>>>
>>>>>> I'm probably missing something but I don't get how this could work
>>>>>> in this case. If I'm seeing this right the default attribute group
>>>>>> here is pci_bus_type.bus_groups and that is already set in
>>>>>> drivers/pci/pci-driver.c so I don't think I should set that.
>>>>>>
>>>>>> I did however find bus_create_file() which does work when using the
>>>>>> path /sys/bus/pci/uid_checking instead. This would work for us if
>>>>>> Bjorn is okay with that path and the code is really clean and simple
>>>>>> too.
>>>>>>
>>>>>> That said, I think we could also add something like
>>>>>> bus_create_group().  Then we could use that to also clean up
>>>>>> drivers/pci/slot.c:pci_slot_init() and get the original path
>>>>>> /sys/bus/pci/zpci/uid_checking.
>>>>>
>>>>> I don't think "uid_checking" is quite the right name.  It says
>>>>> something about the *implementation*, but it doesn't convey what that
>>>>> *means* to userspace.  IIUC this file tells userspace something about
>>>>> whether a given PCI device always has the same PCI domain/bus/dev/fn
>>>>> address (or maybe just the same domain?)
>>>>>
>>>>> It sounds like this feature could be useful beyond just s390, and
>>>>> other arches might implement it differently, without the UID concept.
>>>>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
>>>>> the name is not s390-specific (and "uid" sounds s390-specific).
>>>>>
>>>>> I assume it would also help with the udev/systemd end if you could
>>>>> make this less s390 dependent.
>>>>
>>>> I've thought about this more and even implemented a proof of concept
>>>> patch for a global attribute using a pcibios_has_reproducible_addressing()
>>>> hook. 
>>>>
>>>> However after implementing it I think as a more general and
>>>> future proof concept it makes more sense to do this as a per device
>>>> attribute, maybe as another flag in "stuct pci_dev" named something
>>>> like "reliable_address". My reasoning behind this can 

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2021 at 06:04:52PM +0100, Niklas Schnelle wrote:
> 
> 
> On 1/21/21 4:54 PM, Bjorn Helgaas wrote:
> > [Greg may be able to help compare/contrast this s390 UID with udev
> > persistent names]
> > 
> > On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote:
> >> On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
> >>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
> >>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> >>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> >>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> >>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> >>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> >>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> >>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> >>>> ... snip ...
> >>>>
> >>>>>
> >>>>>>if (!zpci_global_kset)
> >>>>>>return -ENOMEM;
> >>>>>>
> >>>>>>return sysfs_create_group(&zpci_global_kset->kobj, 
> >>>>>> &zpci_attr_group_global);
> >>>>>
> >>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> >>>>> that's usually a huge clue that you are doing something wrong.
> >>>>>
> >>>>> Try the above again, with a simple attribute group, and name for it, and
> >>>>> it should "just work".
> >>>>
> >>>> I'm probably missing something but I don't get how this could work
> >>>> in this case. If I'm seeing this right the default attribute group
> >>>> here is pci_bus_type.bus_groups and that is already set in
> >>>> drivers/pci/pci-driver.c so I don't think I should set that.
> >>>>
> >>>> I did however find bus_create_file() which does work when using the
> >>>> path /sys/bus/pci/uid_checking instead. This would work for us if
> >>>> Bjorn is okay with that path and the code is really clean and simple
> >>>> too.
> >>>>
> >>>> That said, I think we could also add something like
> >>>> bus_create_group().  Then we could use that to also clean up
> >>>> drivers/pci/slot.c:pci_slot_init() and get the original path
> >>>> /sys/bus/pci/zpci/uid_checking.
> >>>
> >>> I don't think "uid_checking" is quite the right name.  It says
> >>> something about the *implementation*, but it doesn't convey what that
> >>> *means* to userspace.  IIUC this file tells userspace something about
> >>> whether a given PCI device always has the same PCI domain/bus/dev/fn
> >>> address (or maybe just the same domain?)
> >>>
> >>> It sounds like this feature could be useful beyond just s390, and
> >>> other arches might implement it differently, without the UID concept.
> >>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
> >>> the name is not s390-specific (and "uid" sounds s390-specific).
> >>>
> >>> I assume it would also help with the udev/systemd end if you could
> >>> make this less s390 dependent.
> >>
> >> I've thought about this more and even implemented a proof of concept
> >> patch for a global attribute using a pcibios_has_reproducible_addressing()
> >> hook. 
> >>
> >> However after implementing it I think as a more general and
> >> future proof concept it makes more sense to do this as a per device
> >> attribute, maybe as another flag in "stuct pci_dev" named something
> >> like "reliable_address". My reasoning behind this can be best be seen
> >> with a QEMU example. While I expect that QEMU can easily guarantee
> >> that one can always use ":01:00.0" for a virtio-pci NIC and
> >> thus enp1s0 interface name, the same might be harder to guarantee
&g

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Niklas Schnelle



On 1/21/21 4:54 PM, Bjorn Helgaas wrote:
> [Greg may be able to help compare/contrast this s390 UID with udev
> persistent names]
> 
> On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote:
>> On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
>>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
>>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
>>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
>>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
>>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
>>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
>>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>>>> ... snip ...
>>>>
>>>>>
>>>>>>  if (!zpci_global_kset)
>>>>>>  return -ENOMEM;
>>>>>>
>>>>>>  return sysfs_create_group(&zpci_global_kset->kobj, 
>>>>>> &zpci_attr_group_global);
>>>>>
>>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
>>>>> that's usually a huge clue that you are doing something wrong.
>>>>>
>>>>> Try the above again, with a simple attribute group, and name for it, and
>>>>> it should "just work".
>>>>
>>>> I'm probably missing something but I don't get how this could work
>>>> in this case. If I'm seeing this right the default attribute group
>>>> here is pci_bus_type.bus_groups and that is already set in
>>>> drivers/pci/pci-driver.c so I don't think I should set that.
>>>>
>>>> I did however find bus_create_file() which does work when using the
>>>> path /sys/bus/pci/uid_checking instead. This would work for us if
>>>> Bjorn is okay with that path and the code is really clean and simple
>>>> too.
>>>>
>>>> That said, I think we could also add something like
>>>> bus_create_group().  Then we could use that to also clean up
>>>> drivers/pci/slot.c:pci_slot_init() and get the original path
>>>> /sys/bus/pci/zpci/uid_checking.
>>>
>>> I don't think "uid_checking" is quite the right name.  It says
>>> something about the *implementation*, but it doesn't convey what that
>>> *means* to userspace.  IIUC this file tells userspace something about
>>> whether a given PCI device always has the same PCI domain/bus/dev/fn
>>> address (or maybe just the same domain?)
>>>
>>> It sounds like this feature could be useful beyond just s390, and
>>> other arches might implement it differently, without the UID concept.
>>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
>>> the name is not s390-specific (and "uid" sounds s390-specific).
>>>
>>> I assume it would also help with the udev/systemd end if you could
>>> make this less s390 dependent.
>>
>> I've thought about this more and even implemented a proof of concept
>> patch for a global attribute using a pcibios_has_reproducible_addressing()
>> hook. 
>>
>> However after implementing it I think as a more general and
>> future proof concept it makes more sense to do this as a per device
>> attribute, maybe as another flag in "stuct pci_dev" named something
>> like "reliable_address". My reasoning behind this can be best be seen
>> with a QEMU example. While I expect that QEMU can easily guarantee
>> that one can always use ":01:00.0" for a virtio-pci NIC and
>> thus enp1s0 interface name, the same might be harder to guarantee
>> for a SR-IOV VF passed through with vfio-pci in that same VM and
>> even less so if a thunderbolt controller is passed through and
>> enumeration may depend on daisy chaining. The QEMU example
>> also applies to s390 and maybe others will in the future.
> 
> I'm a little wary of using the PCI geographical address
> (":01:00.0") as a stable name.  Even if you can make a way to use
> that to identi

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Bjorn Helgaas
[Greg may be able to help compare/contrast this s390 UID with udev
persistent names]

On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote:
> On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
> > On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
> >> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> >>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> >>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> >>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> >>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> >>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> >>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> >> ... snip ...
> >>
> >>>
> >>>>  if (!zpci_global_kset)
> >>>>  return -ENOMEM;
> >>>>
> >>>>  return sysfs_create_group(&zpci_global_kset->kobj, 
> >>>> &zpci_attr_group_global);
> >>>
> >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> >>> that's usually a huge clue that you are doing something wrong.
> >>>
> >>> Try the above again, with a simple attribute group, and name for it, and
> >>> it should "just work".
> >>
> >> I'm probably missing something but I don't get how this could work
> >> in this case. If I'm seeing this right the default attribute group
> >> here is pci_bus_type.bus_groups and that is already set in
> >> drivers/pci/pci-driver.c so I don't think I should set that.
> >>
> >> I did however find bus_create_file() which does work when using the
> >> path /sys/bus/pci/uid_checking instead. This would work for us if
> >> Bjorn is okay with that path and the code is really clean and simple
> >> too.
> >>
> >> That said, I think we could also add something like
> >> bus_create_group().  Then we could use that to also clean up
> >> drivers/pci/slot.c:pci_slot_init() and get the original path
> >> /sys/bus/pci/zpci/uid_checking.
> > 
> > I don't think "uid_checking" is quite the right name.  It says
> > something about the *implementation*, but it doesn't convey what that
> > *means* to userspace.  IIUC this file tells userspace something about
> > whether a given PCI device always has the same PCI domain/bus/dev/fn
> > address (or maybe just the same domain?)
> > 
> > It sounds like this feature could be useful beyond just s390, and
> > other arches might implement it differently, without the UID concept.
> > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
> > the name is not s390-specific (and "uid" sounds s390-specific).
> > 
> > I assume it would also help with the udev/systemd end if you could
> > make this less s390 dependent.
> 
> I've thought about this more and even implemented a proof of concept
> patch for a global attribute using a pcibios_has_reproducible_addressing()
> hook. 
> 
> However after implementing it I think as a more general and
> future proof concept it makes more sense to do this as a per device
> attribute, maybe as another flag in "stuct pci_dev" named something
> like "reliable_address". My reasoning behind this can be best be seen
> with a QEMU example. While I expect that QEMU can easily guarantee
> that one can always use ":01:00.0" for a virtio-pci NIC and
> thus enp1s0 interface name, the same might be harder to guarantee
> for a SR-IOV VF passed through with vfio-pci in that same VM and
> even less so if a thunderbolt controller is passed through and
> enumeration may depend on daisy chaining. The QEMU example
> also applies to s390 and maybe others will in the future.

I'm a little wary of using the PCI geographical address
(":01:00.0") as a stable name.  Even if you can make a way to use
that to identify a specific device instance, regardless of how it is
plugged in or passed through, it sounds like we could end up with
"physical PCI addresses" and "virtual PCI addresses" that look the
same and would cause confusion.

This concept sounds similar to the udev concept of a "persistent
device name".  What advantages does this s390 UID have over the udev
approach?

There are optional PCI device serial numbers that we currently don't
really make use of.  Would that be a generic way to help with this?


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2021 at 09:54:45AM -0600, Bjorn Helgaas wrote:
> [Greg may be able to help compare/contrast this s390 UID with udev
> persistent names]
> 
> On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote:
> > On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
> > > On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
> > >> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> > >>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> > >>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> > >>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> > >>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> > >>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> > >>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> > >>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> > >>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> > >>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> > >> ... snip ...
> > >>
> > >>>
> > >>>>if (!zpci_global_kset)
> > >>>>return -ENOMEM;
> > >>>>
> > >>>>return sysfs_create_group(&zpci_global_kset->kobj, 
> > >>>> &zpci_attr_group_global);
> > >>>
> > >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> > >>> that's usually a huge clue that you are doing something wrong.
> > >>>
> > >>> Try the above again, with a simple attribute group, and name for it, and
> > >>> it should "just work".
> > >>
> > >> I'm probably missing something but I don't get how this could work
> > >> in this case. If I'm seeing this right the default attribute group
> > >> here is pci_bus_type.bus_groups and that is already set in
> > >> drivers/pci/pci-driver.c so I don't think I should set that.
> > >>
> > >> I did however find bus_create_file() which does work when using the
> > >> path /sys/bus/pci/uid_checking instead. This would work for us if
> > >> Bjorn is okay with that path and the code is really clean and simple
> > >> too.
> > >>
> > >> That said, I think we could also add something like
> > >> bus_create_group().  Then we could use that to also clean up
> > >> drivers/pci/slot.c:pci_slot_init() and get the original path
> > >> /sys/bus/pci/zpci/uid_checking.
> > > 
> > > I don't think "uid_checking" is quite the right name.  It says
> > > something about the *implementation*, but it doesn't convey what that
> > > *means* to userspace.  IIUC this file tells userspace something about
> > > whether a given PCI device always has the same PCI domain/bus/dev/fn
> > > address (or maybe just the same domain?)
> > > 
> > > It sounds like this feature could be useful beyond just s390, and
> > > other arches might implement it differently, without the UID concept.
> > > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
> > > the name is not s390-specific (and "uid" sounds s390-specific).
> > > 
> > > I assume it would also help with the udev/systemd end if you could
> > > make this less s390 dependent.
> > 
> > I've thought about this more and even implemented a proof of concept
> > patch for a global attribute using a pcibios_has_reproducible_addressing()
> > hook. 
> > 
> > However after implementing it I think as a more general and
> > future proof concept it makes more sense to do this as a per device
> > attribute, maybe as another flag in "stuct pci_dev" named something
> > like "reliable_address". My reasoning behind this can be best be seen
> > with a QEMU example. While I expect that QEMU can easily guarantee
> > that one can always use ":01:00.0" for a virtio-pci NIC and
> > thus enp1s0 interface name, the same might be harder to guarantee
> > for a SR-IOV VF passed through with vfio-pci in that same VM and
> > even less so if a thunderbolt controller is passed through and
> > enumeration may depend on daisy chaining. The QEMU example
> > also applies to s390 and maybe others will in the future.
> 
> I

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-21 Thread Niklas Schnelle



On 1/15/21 4:29 PM, Bjorn Helgaas wrote:
> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>> ... snip ...
>>
>>>
>>>>if (!zpci_global_kset)
>>>>return -ENOMEM;
>>>>
>>>>return sysfs_create_group(&zpci_global_kset->kobj, 
>>>> &zpci_attr_group_global);
>>>
>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
>>> that's usually a huge clue that you are doing something wrong.
>>>
>>> Try the above again, with a simple attribute group, and name for it, and
>>> it should "just work".
>>
>> I'm probably missing something but I don't get how this could work
>> in this case. If I'm seeing this right the default attribute group
>> here is pci_bus_type.bus_groups and that is already set in
>> drivers/pci/pci-driver.c so I don't think I should set that.
>>
>> I did however find bus_create_file() which does work when using the
>> path /sys/bus/pci/uid_checking instead. This would work for us if
>> Bjorn is okay with that path and the code is really clean and simple
>> too.
>>
>> That said, I think we could also add something like
>> bus_create_group().  Then we could use that to also clean up
>> drivers/pci/slot.c:pci_slot_init() and get the original path
>> /sys/bus/pci/zpci/uid_checking.
> 
> I don't think "uid_checking" is quite the right name.  It says
> something about the *implementation*, but it doesn't convey what that
> *means* to userspace.  IIUC this file tells userspace something about
> whether a given PCI device always has the same PCI domain/bus/dev/fn
> address (or maybe just the same domain?)
> 
> It sounds like this feature could be useful beyond just s390, and
> other arches might implement it differently, without the UID concept.
> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
> the name is not s390-specific (and "uid" sounds s390-specific).
> 
> I assume it would also help with the udev/systemd end if you could
> make this less s390 dependent.
> 
> Bjorn
> 

Hi Bjorn,

I've thought about this more and even implemented a proof of concept
patch for a global attribute using a pcibios_has_reproducible_addressing()
hook. 

However after implementing it I think as a more general and
future proof concept it makes more sense to do this as a per device
attribute, maybe as another flag in "stuct pci_dev" named something
like "reliable_address". My reasoning behind this can be best be seen
with a QEMU example. While I expect that QEMU can easily guarantee
that one can always use ":01:00.0" for a virtio-pci NIC and
thus enp1s0 interface name, the same might be harder to guarantee
for a SR-IOV VF passed through with vfio-pci in that same VM and
even less so if a thunderbolt controller is passed through and
enumeration may depend on daisy chaining. The QEMU example
also applies to s390 and maybe others will in the future.
I'll send an RFC for a per device attribute but didn't want to
let this discussion stand as if we had abandoned the idea and if
you would prefer a global attribute I can also send an RFC for that.

Besst regards,
Niklas


Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-19 Thread Miklos Szeredi
On Tue, Jan 19, 2021 at 10:11 PM Eric W. Biederman
 wrote:
>
> Miklos Szeredi  writes:
>
> > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> > vfs_setxattr()") the translation of nscap->rootid did not take stacked
> > filesystems (overlayfs and ecryptfs) into account.
> >
> > That patch fixed the overlay case, but made the ecryptfs case worse.
> >
> > Restore old the behavior for ecryptfs that existed before the overlayfs
> > fix.  This does not fix ecryptfs's handling of complex user namespace
> > setups, but it does make sure existing setups don't regress.
>
> Today vfs_setxattr handles handles a delegated_inode and breaking
> leases.  Code that is enabled with CONFIG_FILE_LOCKING.  So unless
> I am missing something this introduces a different regression into
> ecryptfs.

This is in line with all the other cases of ecryptfs passing NULL as
delegated inode.

I'll defer this to the maintainer of ecryptfs.

Thanks,
Miklos


Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-19 Thread Eric W. Biederman
Miklos Szeredi  writes:

> Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
> vfs_setxattr()") the translation of nscap->rootid did not take stacked
> filesystems (overlayfs and ecryptfs) into account.
>
> That patch fixed the overlay case, but made the ecryptfs case worse.
>
> Restore old the behavior for ecryptfs that existed before the overlayfs
> fix.  This does not fix ecryptfs's handling of complex user namespace
> setups, but it does make sure existing setups don't regress.

Today vfs_setxattr handles handles a delegated_inode and breaking
leases.  Code that is enabled with CONFIG_FILE_LOCKING.  So unless
I am missing something this introduces a different regression into
ecryptfs.

>
> Reported-by: Eric W. Biederman 
> Cc: Tyler Hicks 
> Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/ecryptfs/inode.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index e23752d9a79f..58d0f7187997 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode 
> *inode,
>  {
>   int rc;
>   struct dentry *lower_dentry;
> + struct inode *lower_inode;
>  
>   lower_dentry = ecryptfs_dentry_to_lower(dentry);
> - if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) {
> + lower_inode = d_inode(lower_dentry);
> + if (!(lower_inode->i_opflags & IOP_XATTR)) {
>   rc = -EOPNOTSUPP;
>   goto out;
>   }
> - rc = vfs_setxattr(lower_dentry, name, value, size, flags);
> + inode_lock(lower_inode);
> + rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, 
> NULL);
> + inode_unlock(lower_inode);
>   if (!rc && inode)
> - fsstack_copy_attr_all(inode, d_inode(lower_dentry));
> + fsstack_copy_attr_all(inode, lower_inode);
>  out:
>   return rc;
>  }

Eric


[PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability

2021-01-19 Thread Miklos Szeredi
Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
vfs_setxattr()") the translation of nscap->rootid did not take stacked
filesystems (overlayfs and ecryptfs) into account.

That patch fixed the overlay case, but made the ecryptfs case worse.

Restore old the behavior for ecryptfs that existed before the overlayfs
fix.  This does not fix ecryptfs's handling of complex user namespace
setups, but it does make sure existing setups don't regress.

Reported-by: Eric W. Biederman 
Cc: Tyler Hicks 
Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
Signed-off-by: Miklos Szeredi 
---
 fs/ecryptfs/inode.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index e23752d9a79f..58d0f7187997 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode 
*inode,
 {
int rc;
struct dentry *lower_dentry;
+   struct inode *lower_inode;
 
lower_dentry = ecryptfs_dentry_to_lower(dentry);
-   if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) {
+   lower_inode = d_inode(lower_dentry);
+   if (!(lower_inode->i_opflags & IOP_XATTR)) {
rc = -EOPNOTSUPP;
goto out;
}
-   rc = vfs_setxattr(lower_dentry, name, value, size, flags);
+   inode_lock(lower_inode);
+   rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, 
NULL);
+   inode_unlock(lower_inode);
if (!rc && inode)
-   fsstack_copy_attr_all(inode, d_inode(lower_dentry));
+   fsstack_copy_attr_all(inode, lower_inode);
 out:
return rc;
 }
-- 
2.26.2



Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-15 Thread Niklas Schnelle
set)
>>>>return -ENOMEM;
>>>>
>>>>return sysfs_create_group(&zpci_global_kset->kobj, 
>>>> &zpci_attr_group_global);
>>>
>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
>>> that's usually a huge clue that you are doing something wrong.
>>>
>>> Try the above again, with a simple attribute group, and name for it, and
>>> it should "just work".
>>
>> I'm probably missing something but I don't get how this could work
>> in this case. If I'm seeing this right the default attribute group
>> here is pci_bus_type.bus_groups and that is already set in
>> drivers/pci/pci-driver.c so I don't think I should set that.
>>
>> I did however find bus_create_file() which does work when using the
>> path /sys/bus/pci/uid_checking instead. This would work for us if
>> Bjorn is okay with that path and the code is really clean and simple
>> too.
>>
>> That said, I think we could also add something like
>> bus_create_group().  Then we could use that to also clean up
>> drivers/pci/slot.c:pci_slot_init() and get the original path
>> /sys/bus/pci/zpci/uid_checking.
> 
> I don't think "uid_checking" is quite the right name.  It says
> something about the *implementation*, but it doesn't convey what that
> *means* to userspace.  IIUC this file tells userspace something about
> whether a given PCI device always has the same PCI domain/bus/dev/fn
> address (or maybe just the same domain?)

Yes you're right, in fact we internally also started to think about
something more meaning oriented like "unique_uids". This indeed results
in PCI addresses which can be relied upon as stable identifiers. For us
it's enough that the domain matches as the bus is always 0 and
dev/fn are determined by the device and SR-IOV stride etc.
so will remain the same for equivalent configurations.

> 
> It sounds like this feature could be useful beyond just s390, and
> other arches might implement it differently, without the UID concept.
> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as
> the name is not s390-specific (and "uid" sounds s390-specific).
> 
> I assume it would also help with the udev/systemd end if you could
> make this less s390 dependent.
> 
> Bjorn

That's a very good point! I'm absolutely open to making this a
common concept. I think the gist could be that the addressing/ids on
the bus are reproducible, there might be some user configuration
required but then they can be relied upon for stable names like
network interfaces. So maybe a name like "reproducible_addressing"?
I guess one non-s390 exclusive case of this could be virtualized
PCI under QEMU/KVM etc. versus real hardware or even UEFI/Firmware
guarantees, right?


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-15 Thread Bjorn Helgaas
On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> >> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> >>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> >>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> >>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> >>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> ... snip ...
> >>>>>>
> >>>>>> Hey Niklas et al. :)
> >>>>>>
> >>>>>> I think this will need input from Greg. He should be best versed in
> >>>>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
> >>>>>> supposed to be kernel internal. Now, that might just be a matter of
> >>>>>> renaming the macro but let's see whether Greg has any better idea or
> >>>>>> more questions. :)
> >>>>>
> >>>>> The big question is, why are you needing this?
> >>>>>
> >>>>> No driver or driver subsystem should EVER be messing with a "raw"
> >>>>> kobject like this.  Just use the existing DEVICE_* macros instead
> >>>>> please.
> >>>>>
> >>>>> If you are using a raw kobject, please ask me how to do this properly,
> >>>>> as that is something that should NEVER show up in the /sys/devices/*
> >>>>> tree.  Otherwise userspace tools will break.
> >>>>>
> >>>>> thanks,
> >>>>>
> >>>>> greg k-h
> >>>>
> >>>> Hi Greg,
> >>>>
> >>>> this is for an architecture specific but global i.e. not device bound PCI
> >>>> attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work
> >>>> but only if we place the attribute directly under /sys/bus/pci/new_attr.
> >>>
> >>> Then you are doing something wrong :)
> >>
> >> That is very possible.
> >>
> >>>
> >>> Where _exactly_ are you wanting to put this attribute?
> >>
> >> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using
> >> the below code and the attribute even shows up but reading
> >> it gives me two 0 bytes only.
> >> The relevant code is only a slight alteration of the original patch
> >> as follows:
> >>
> >> static ssize_t uid_checking_show(struct bus_type *bus, char *buf)
> >> {
> >>return sprintf(buf, "%i\n", zpci_unique_uid);
> >> }
> >> static BUS_ATTR_RO(uid_checking);
> >>
> >> static struct kset *zpci_global_kset;
> >>
> >> static struct attribute *zpci_attrs_global[] = {
> >>&bus_attr_uid_checking.attr,
> >>NULL,
> >> };
> >>
> >> static struct attribute_group zpci_attr_group_global = {
> >>.attrs = zpci_attrs_global,
> >> };
> > 
> > Name your attribute group, and then you do not have to mess with a
> > "raw" kobject like you are below:
> 
> Thanks for this tip and sorry for bothering you again.
> 
> > 
> >>
> >> int __init zpci_sysfs_init(void)
> >> {
> >>struct kset *pci_bus_kset;
> >>
> >>pci_bus_kset = bus_get_kset(&pci_bus_type);
> >>
> >>zpci_global_kset = kset_create_and_add("zpci", NULL, 
> >> &pci_bus_kset->kobj);
> > 
> > No, do not mess with at kset, just set the default attribute group for
> > the bus to the above, and you should be fine.
> 
> Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in
> drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too.
> 
> > 
> >>if (!zpci_global_kset)
> >>return -ENOMEM;
> >>
> >>return sysfs_create_group(&zpci_global_kset->kobj, 
> >> &zpci_attr_group_global);
> > 
> > Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> > that's usually a huge clue that you are doing

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-15 Thread Greg Kroah-Hartman
On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote:
> 
> 
> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> >>
> >>
> >> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> >>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> 
> 
>  On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>
> >>>
> >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>  On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> > On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> ... snip ...
> >>
> >> Hey Niklas et al. :)
> >>
> >> I think this will need input from Greg. He should be best versed in
> >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
> >> supposed to be kernel internal. Now, that might just be a matter of
> >> renaming the macro but let's see whether Greg has any better idea or
> >> more questions. :)
> >
> > The big question is, why are you needing this?
> >
> > No driver or driver subsystem should EVER be messing with a "raw"
> > kobject like this.  Just use the existing DEVICE_* macros instead
> > please.
> >
> > If you are using a raw kobject, please ask me how to do this properly,
> > as that is something that should NEVER show up in the /sys/devices/*
> > tree.  Otherwise userspace tools will break.
> >
> > thanks,
> >
> > greg k-h
> 
>  Hi Greg,
> 
>  this is for an architecture specific but global i.e. not device bound PCI
>  attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work
>  but only if we place the attribute directly under /sys/bus/pci/new_attr.
> >>>
> >>> Then you are doing something wrong :)
> >>
> >> That is very possible.
> >>
> >>>
> >>> Where _exactly_ are you wanting to put this attribute?
> >>
> >> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using
> >> the below code and the attribute even shows up but reading
> >> it gives me two 0 bytes only.
> >> The relevant code is only a slight alteration of the original patch
> >> as follows:
> >>
> >> static ssize_t uid_checking_show(struct bus_type *bus, char *buf)
> >> {
> >>return sprintf(buf, "%i\n", zpci_unique_uid);
> >> }
> >> static BUS_ATTR_RO(uid_checking);
> >>
> >> static struct kset *zpci_global_kset;
> >>
> >> static struct attribute *zpci_attrs_global[] = {
> >>&bus_attr_uid_checking.attr,
> >>NULL,
> >> };
> >>
> >> static struct attribute_group zpci_attr_group_global = {
> >>.attrs = zpci_attrs_global,
> >> };
> > 
> > Name your attribute group, and then you do not have to mess with a
> > "raw" kobject like you are below:
> 
> Thanks for this tip and sorry for bothering you again.
> 
> > 
> >>
> >> int __init zpci_sysfs_init(void)
> >> {
> >>struct kset *pci_bus_kset;
> >>
> >>pci_bus_kset = bus_get_kset(&pci_bus_type);
> >>
> >>zpci_global_kset = kset_create_and_add("zpci", NULL, 
> >> &pci_bus_kset->kobj);
> > 
> > No, do not mess with at kset, just set the default attribute group for
> > the bus to the above, and you should be fine.
> 
> Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in
> drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too.

Slots are "interesting" and that code is really old, we know how to do
things better now :)

But I doubt we should change anything there, as it does work, and
userspace is used to how they come/go.

> >>if (!zpci_global_kset)
> >>return -ENOMEM;
> >>
> >>return sysfs_create_group(&zpci_global_kset->kobj, 
> >> &zpci_attr_group_global);
> > 
> > Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> > that's usually a huge clue that you are doing something wrong.
> > 
> > Try the above again, with a simple attribute group, and name for it, and
> > it should "just work".
> 
> I'm probably missing something but I don't get how this could work in
> this case. If I'm seeing this right the default attribute group here
> is pci_bus_type.bus_groups and that is already set in drivers/pci/pci-driver.c
> so I don't think I should set that.

Yes, add your group to that list of groups and all should be good.

> I did however find bus_create_file() which does work when using the path
> /sys/bus/pci/uid_checking instead. This would work for us if Bjorn is okay 
> with
> that path and the code is really clean and simple too.

No, use the above group you already have please.

> That said, I think we could also add something like bus_create_group().

No, use the group list as show above please.

> Then we could use that to also clean up drivers/pci/slot.c:pci_slot_init()
> and get the original path /sys/bus/pci/zpci/uid_checking.

What needs to be cleaned up there?

> I thin

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-15 Thread Niklas Schnelle



On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
>>
>>
>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:


 On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>
>>>
>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
 On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
... snip ...
>>
>> Hey Niklas et al. :)
>>
>> I think this will need input from Greg. He should be best versed in
>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
>> supposed to be kernel internal. Now, that might just be a matter of
>> renaming the macro but let's see whether Greg has any better idea or
>> more questions. :)
>
> The big question is, why are you needing this?
>
> No driver or driver subsystem should EVER be messing with a "raw"
> kobject like this.  Just use the existing DEVICE_* macros instead
> please.
>
> If you are using a raw kobject, please ask me how to do this properly,
> as that is something that should NEVER show up in the /sys/devices/*
> tree.  Otherwise userspace tools will break.
>
> thanks,
>
> greg k-h

 Hi Greg,

 this is for an architecture specific but global i.e. not device bound PCI
 attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work
 but only if we place the attribute directly under /sys/bus/pci/new_attr.
>>>
>>> Then you are doing something wrong :)
>>
>> That is very possible.
>>
>>>
>>> Where _exactly_ are you wanting to put this attribute?
>>
>> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using
>> the below code and the attribute even shows up but reading
>> it gives me two 0 bytes only.
>> The relevant code is only a slight alteration of the original patch
>> as follows:
>>
>> static ssize_t uid_checking_show(struct bus_type *bus, char *buf)
>> {
>>  return sprintf(buf, "%i\n", zpci_unique_uid);
>> }
>> static BUS_ATTR_RO(uid_checking);
>>
>> static struct kset *zpci_global_kset;
>>
>> static struct attribute *zpci_attrs_global[] = {
>>  &bus_attr_uid_checking.attr,
>>  NULL,
>> };
>>
>> static struct attribute_group zpci_attr_group_global = {
>>  .attrs = zpci_attrs_global,
>> };
> 
> Name your attribute group, and then you do not have to mess with a
> "raw" kobject like you are below:

Thanks for this tip and sorry for bothering you again.

> 
>>
>> int __init zpci_sysfs_init(void)
>> {
>>  struct kset *pci_bus_kset;
>>
>>  pci_bus_kset = bus_get_kset(&pci_bus_type);
>>
>>  zpci_global_kset = kset_create_and_add("zpci", NULL, 
>> &pci_bus_kset->kobj);
> 
> No, do not mess with at kset, just set the default attribute group for
> the bus to the above, and you should be fine.

Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in
drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too.

> 
>>  if (!zpci_global_kset)
>>  return -ENOMEM;
>>
>>  return sysfs_create_group(&zpci_global_kset->kobj, 
>> &zpci_attr_group_global);
> 
> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*,
> that's usually a huge clue that you are doing something wrong.
> 
> Try the above again, with a simple attribute group, and name for it, and
> it should "just work".

I'm probably missing something but I don't get how this could work in
this case. If I'm seeing this right the default attribute group here
is pci_bus_type.bus_groups and that is already set in drivers/pci/pci-driver.c
so I don't think I should set that.

I did however find bus_create_file() which does work when using the path
/sys/bus/pci/uid_checking instead. This would work for us if Bjorn is okay with
that path and the code is really clean and simple too.

That said, I think we could also add something like bus_create_group().
Then we could use that to also clean up drivers/pci/slot.c:pci_slot_init()
and get the original path /sys/bus/pci/zpci/uid_checking.

I think this would also allow us to get rid of pci_bus_get_kset() which is
only used in that function and seems to me like it encourages use of raw ksets.
Or I'm completely off the mark and just missing something important.

thanks,
Niklas


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Greg Kroah-Hartman
On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote:
> 
> 
> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> >>
> >>
> >> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> >>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>>>
> >>>>>
> >>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> >>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> >>>>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> >>>>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if
> >>>>>>>>> there are multiple functions in an adapter, as PCI domain if and 
> >>>>>>>>> only if
> >>>>>>>>> UID Checking is turned on.
> >>>>>>>>> Otherwise we automatically generate domains as devices appear.
> >>>>>>>>>
> >>>>>>>>> The state of UID Checking is thus essential to know if the PCI 
> >>>>>>>>> domain
> >>>>>>>>> will be stable, yet currently there is no way to access this 
> >>>>>>>>> information
> >>>>>>>>> from userspace.
> >>>>>>>>> So let's solve this by showing the state of UID checking as a sysfs
> >>>>>>>>> attribute in /sys/bus/pci/uid_checking
> >>>>>>
> >>>>>>>>> +/* Global zPCI attributes */
> >>>>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj,
> >>>>>>>>> +struct kobj_attribute *attr, char *buf)
> >>>>>>>>> +{
> >>>>>>>>> +   return sprintf(buf, "%i\n", zpci_unique_uid);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> >>>>>>>>> +   __ATTR(uid_checking, 0444, uid_checking_show, NULL);
> >>>>>>>>
> >>>>>>>> Use DEVICE_ATTR_RO instead of __ATTR.
> >>>>>>>
> >>>>>>> It's my understanding that DEVICE_ATTR_* is only for
> >>>>>>> per device attributes. This one is global for the entire
> >>>>>>> Z PCI. I just tried with BUS_ATTR_RO instead
> >>>>>>> and that works but only if I put the attribute at
> >>>>>>> /sys/bus/pci/uid_checking instead of with a zpci
> >>>>>>> subfolder. This path would work for us too, we
> >>>>>>> currently don't have any other global attributes
> >>>>>>> that we are planning to expose but those could of
> >>>>>>> course come up in the future.
> >>>>>>
> >>>>>> Ah, I missed the fact that this is a kobj_attribute, not a
> >>>>>> device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
> >>>>>> seems like it might fit?
> >>>>>>
> >>>>>> Bjorn
> >>>>>>
> >>>>>
> >>>>> KERNEL_ATTR_* is currently not exported in any header. After
> >>>>> adding it to include/linuc/sysfs.h it indeed works perfectly.
> >>>>> Adding Christian Brauner as suggested by get_maintainers for
> >>>>> their opinion. I'm of course willing to provide a patch
> >>>>
> >>>> Hey Niklas et al. :)
> >>>>
> >>>> I think this will need input from Greg. He should be best versed in
> >>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
> >>>> supposed to be kernel internal. Now, that might just be a matter of
> >>>> renaming the macro but let's see whether Greg has any better idea or
> >>>> more questions. :)
> >>>
> >>> The big question is, why are you needing this?
> >>>
> >>> No driver or driver subsystem should EVER be messing with a "raw&

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Niklas Schnelle



On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
>>
>>
>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>>>
>>>>>
>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>>>>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
>>>>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if
>>>>>>>>> there are multiple functions in an adapter, as PCI domain if and only 
>>>>>>>>> if
>>>>>>>>> UID Checking is turned on.
>>>>>>>>> Otherwise we automatically generate domains as devices appear.
>>>>>>>>>
>>>>>>>>> The state of UID Checking is thus essential to know if the PCI domain
>>>>>>>>> will be stable, yet currently there is no way to access this 
>>>>>>>>> information
>>>>>>>>> from userspace.
>>>>>>>>> So let's solve this by showing the state of UID checking as a sysfs
>>>>>>>>> attribute in /sys/bus/pci/uid_checking
>>>>>>
>>>>>>>>> +/* Global zPCI attributes */
>>>>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj,
>>>>>>>>> +  struct kobj_attribute *attr, char *buf)
>>>>>>>>> +{
>>>>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
>>>>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL);
>>>>>>>>
>>>>>>>> Use DEVICE_ATTR_RO instead of __ATTR.
>>>>>>>
>>>>>>> It's my understanding that DEVICE_ATTR_* is only for
>>>>>>> per device attributes. This one is global for the entire
>>>>>>> Z PCI. I just tried with BUS_ATTR_RO instead
>>>>>>> and that works but only if I put the attribute at
>>>>>>> /sys/bus/pci/uid_checking instead of with a zpci
>>>>>>> subfolder. This path would work for us too, we
>>>>>>> currently don't have any other global attributes
>>>>>>> that we are planning to expose but those could of
>>>>>>> course come up in the future.
>>>>>>
>>>>>> Ah, I missed the fact that this is a kobj_attribute, not a
>>>>>> device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
>>>>>> seems like it might fit?
>>>>>>
>>>>>> Bjorn
>>>>>>
>>>>>
>>>>> KERNEL_ATTR_* is currently not exported in any header. After
>>>>> adding it to include/linuc/sysfs.h it indeed works perfectly.
>>>>> Adding Christian Brauner as suggested by get_maintainers for
>>>>> their opinion. I'm of course willing to provide a patch
>>>>
>>>> Hey Niklas et al. :)
>>>>
>>>> I think this will need input from Greg. He should be best versed in
>>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
>>>> supposed to be kernel internal. Now, that might just be a matter of
>>>> renaming the macro but let's see whether Greg has any better idea or
>>>> more questions. :)
>>>
>>> The big question is, why are you needing this?
>>>
>>> No driver or driver subsystem should EVER be messing with a "raw"
>>> kobject like this.  Just use the existing DEVICE_* macros instead
>>> please.
>>>
>>> If you are using a raw kobject, please ask me how to do this properly,
>>> as that is something that should NEVER show up in the /sys/devices/*
>>> tree.  Otherwise userspace tools will break.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Hi Greg,
>>
>> this is for an archite

Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Greg Kroah-Hartman
On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote:
> 
> 
> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> >>>
> >>>
> >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> >>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> >>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> >>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if
> >>>>>>> there are multiple functions in an adapter, as PCI domain if and only 
> >>>>>>> if
> >>>>>>> UID Checking is turned on.
> >>>>>>> Otherwise we automatically generate domains as devices appear.
> >>>>>>>
> >>>>>>> The state of UID Checking is thus essential to know if the PCI domain
> >>>>>>> will be stable, yet currently there is no way to access this 
> >>>>>>> information
> >>>>>>> from userspace.
> >>>>>>> So let's solve this by showing the state of UID checking as a sysfs
> >>>>>>> attribute in /sys/bus/pci/uid_checking
> >>>>
> >>>>>>> +/* Global zPCI attributes */
> >>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj,
> >>>>>>> +  struct kobj_attribute *attr, char *buf)
> >>>>>>> +{
> >>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> >>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL);
> >>>>>>
> >>>>>> Use DEVICE_ATTR_RO instead of __ATTR.
> >>>>>
> >>>>> It's my understanding that DEVICE_ATTR_* is only for
> >>>>> per device attributes. This one is global for the entire
> >>>>> Z PCI. I just tried with BUS_ATTR_RO instead
> >>>>> and that works but only if I put the attribute at
> >>>>> /sys/bus/pci/uid_checking instead of with a zpci
> >>>>> subfolder. This path would work for us too, we
> >>>>> currently don't have any other global attributes
> >>>>> that we are planning to expose but those could of
> >>>>> course come up in the future.
> >>>>
> >>>> Ah, I missed the fact that this is a kobj_attribute, not a
> >>>> device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
> >>>> seems like it might fit?
> >>>>
> >>>> Bjorn
> >>>>
> >>>
> >>> KERNEL_ATTR_* is currently not exported in any header. After
> >>> adding it to include/linuc/sysfs.h it indeed works perfectly.
> >>> Adding Christian Brauner as suggested by get_maintainers for
> >>> their opinion. I'm of course willing to provide a patch
> >>
> >> Hey Niklas et al. :)
> >>
> >> I think this will need input from Greg. He should be best versed in
> >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
> >> supposed to be kernel internal. Now, that might just be a matter of
> >> renaming the macro but let's see whether Greg has any better idea or
> >> more questions. :)
> > 
> > The big question is, why are you needing this?
> > 
> > No driver or driver subsystem should EVER be messing with a "raw"
> > kobject like this.  Just use the existing DEVICE_* macros instead
> > please.
> > 
> > If you are using a raw kobject, please ask me how to do this properly,
> > as that is something that should NEVER show up in the /sys/devices/*
> > tree.  Otherwise userspace tools will break.
> > 
> > thanks,
> > 
> > greg k-h
> 
> Hi Greg,
> 
> this is for an architecture specific but global i.e. not device bound PCI
> attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work
> but only if we place the attribute directly under /sys/bus/pci/new_attr.

Then you are doing something wrong :)

Where _exactly_ are you wanting to put this attribute?

> I'm aware that this is quite unusual in fact I couldn't find anything
> similar. That's why this is an RFC, with a lengthy cover letter
> explaining our use case, that I sent to Bjorn to figure out where to
> even place the attribute.
> 
> So I guess this is indeed me asking you how to do this properly.
> That said it does not show up under /sys/devices/* only /sys/bus/pci/*.

Do NOT put random kobjects under a bus subsystem.  If you need that,
then use BUS_ATTR_* as that is what it is there for.

Again, if you are in a driver subsystem, do not use a raw kobject.
Either something is already there for you, or what you want to do is not
correct :)

thanks,

greg k-h


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Niklas Schnelle



On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
>>>
>>>
>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
>>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if
>>>>>>> there are multiple functions in an adapter, as PCI domain if and only if
>>>>>>> UID Checking is turned on.
>>>>>>> Otherwise we automatically generate domains as devices appear.
>>>>>>>
>>>>>>> The state of UID Checking is thus essential to know if the PCI domain
>>>>>>> will be stable, yet currently there is no way to access this information
>>>>>>> from userspace.
>>>>>>> So let's solve this by showing the state of UID checking as a sysfs
>>>>>>> attribute in /sys/bus/pci/uid_checking
>>>>
>>>>>>> +/* Global zPCI attributes */
>>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj,
>>>>>>> +struct kobj_attribute *attr, char *buf)
>>>>>>> +{
>>>>>>> +   return sprintf(buf, "%i\n", zpci_unique_uid);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
>>>>>>> +   __ATTR(uid_checking, 0444, uid_checking_show, NULL);
>>>>>>
>>>>>> Use DEVICE_ATTR_RO instead of __ATTR.
>>>>>
>>>>> It's my understanding that DEVICE_ATTR_* is only for
>>>>> per device attributes. This one is global for the entire
>>>>> Z PCI. I just tried with BUS_ATTR_RO instead
>>>>> and that works but only if I put the attribute at
>>>>> /sys/bus/pci/uid_checking instead of with a zpci
>>>>> subfolder. This path would work for us too, we
>>>>> currently don't have any other global attributes
>>>>> that we are planning to expose but those could of
>>>>> course come up in the future.
>>>>
>>>> Ah, I missed the fact that this is a kobj_attribute, not a
>>>> device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
>>>> seems like it might fit?
>>>>
>>>> Bjorn
>>>>
>>>
>>> KERNEL_ATTR_* is currently not exported in any header. After
>>> adding it to include/linuc/sysfs.h it indeed works perfectly.
>>> Adding Christian Brauner as suggested by get_maintainers for
>>> their opinion. I'm of course willing to provide a patch
>>
>> Hey Niklas et al. :)
>>
>> I think this will need input from Greg. He should be best versed in
>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
>> supposed to be kernel internal. Now, that might just be a matter of
>> renaming the macro but let's see whether Greg has any better idea or
>> more questions. :)
> 
> The big question is, why are you needing this?
> 
> No driver or driver subsystem should EVER be messing with a "raw"
> kobject like this.  Just use the existing DEVICE_* macros instead
> please.
> 
> If you are using a raw kobject, please ask me how to do this properly,
> as that is something that should NEVER show up in the /sys/devices/*
> tree.  Otherwise userspace tools will break.
> 
> thanks,
> 
> greg k-h

Hi Greg,

this is for an architecture specific but global i.e. not device bound PCI
attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work
but only if we place the attribute directly under /sys/bus/pci/new_attr.

I'm aware that this is quite unusual in fact I couldn't find anything
similar. That's why this is an RFC, with a lengthy cover letter
explaining our use case, that I sent to Bjorn to figure out where to
even place the attribute.

So I guess this is indeed me asking you how to do this properly.
That said it does not show up under /sys/devices/* only /sys/bus/pci/*.

Best regards,
Niklas


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Greg Kroah-Hartman
On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote:
> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> > 
> > 
> > On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> > > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> > >> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> > >>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> > >>>> We use the UID of a zPCI adapter, or the UID of the function zero if
> > >>>> there are multiple functions in an adapter, as PCI domain if and only 
> > >>>> if
> > >>>> UID Checking is turned on.
> > >>>> Otherwise we automatically generate domains as devices appear.
> > >>>>
> > >>>> The state of UID Checking is thus essential to know if the PCI domain
> > >>>> will be stable, yet currently there is no way to access this 
> > >>>> information
> > >>>> from userspace.
> > >>>> So let's solve this by showing the state of UID checking as a sysfs
> > >>>> attribute in /sys/bus/pci/uid_checking
> > > 
> > >>>> +/* Global zPCI attributes */
> > >>>> +static ssize_t uid_checking_show(struct kobject *kobj,
> > >>>> +   struct kobj_attribute *attr, char *buf)
> > >>>> +{
> > >>>> +  return sprintf(buf, "%i\n", zpci_unique_uid);
> > >>>> +}
> > >>>> +
> > >>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> > >>>> +  __ATTR(uid_checking, 0444, uid_checking_show, NULL);
> > >>>
> > >>> Use DEVICE_ATTR_RO instead of __ATTR.
> > >>
> > >> It's my understanding that DEVICE_ATTR_* is only for
> > >> per device attributes. This one is global for the entire
> > >> Z PCI. I just tried with BUS_ATTR_RO instead
> > >> and that works but only if I put the attribute at
> > >> /sys/bus/pci/uid_checking instead of with a zpci
> > >> subfolder. This path would work for us too, we
> > >> currently don't have any other global attributes
> > >> that we are planning to expose but those could of
> > >> course come up in the future.
> > > 
> > > Ah, I missed the fact that this is a kobj_attribute, not a
> > > device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
> > > seems like it might fit?
> > > 
> > > Bjorn
> > > 
> > 
> > KERNEL_ATTR_* is currently not exported in any header. After
> > adding it to include/linuc/sysfs.h it indeed works perfectly.
> > Adding Christian Brauner as suggested by get_maintainers for
> > their opinion. I'm of course willing to provide a patch
> 
> Hey Niklas et al. :)
> 
> I think this will need input from Greg. He should be best versed in
> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
> supposed to be kernel internal. Now, that might just be a matter of
> renaming the macro but let's see whether Greg has any better idea or
> more questions. :)

The big question is, why are you needing this?

No driver or driver subsystem should EVER be messing with a "raw"
kobject like this.  Just use the existing DEVICE_* macros instead
please.

If you are using a raw kobject, please ask me how to do this properly,
as that is something that should NEVER show up in the /sys/devices/*
tree.  Otherwise userspace tools will break.

thanks,

greg k-h


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Christian Brauner
On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote:
> 
> 
> On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> >> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> >>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> >>>> We use the UID of a zPCI adapter, or the UID of the function zero if
> >>>> there are multiple functions in an adapter, as PCI domain if and only if
> >>>> UID Checking is turned on.
> >>>> Otherwise we automatically generate domains as devices appear.
> >>>>
> >>>> The state of UID Checking is thus essential to know if the PCI domain
> >>>> will be stable, yet currently there is no way to access this information
> >>>> from userspace.
> >>>> So let's solve this by showing the state of UID checking as a sysfs
> >>>> attribute in /sys/bus/pci/uid_checking
> > 
> >>>> +/* Global zPCI attributes */
> >>>> +static ssize_t uid_checking_show(struct kobject *kobj,
> >>>> + struct kobj_attribute *attr, char *buf)
> >>>> +{
> >>>> +return sprintf(buf, "%i\n", zpci_unique_uid);
> >>>> +}
> >>>> +
> >>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> >>>> +__ATTR(uid_checking, 0444, uid_checking_show, NULL);
> >>>
> >>> Use DEVICE_ATTR_RO instead of __ATTR.
> >>
> >> It's my understanding that DEVICE_ATTR_* is only for
> >> per device attributes. This one is global for the entire
> >> Z PCI. I just tried with BUS_ATTR_RO instead
> >> and that works but only if I put the attribute at
> >> /sys/bus/pci/uid_checking instead of with a zpci
> >> subfolder. This path would work for us too, we
> >> currently don't have any other global attributes
> >> that we are planning to expose but those could of
> >> course come up in the future.
> > 
> > Ah, I missed the fact that this is a kobj_attribute, not a
> > device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
> > seems like it might fit?
> > 
> > Bjorn
> > 
> 
> KERNEL_ATTR_* is currently not exported in any header. After
> adding it to include/linuc/sysfs.h it indeed works perfectly.
> Adding Christian Brauner as suggested by get_maintainers for
> their opinion. I'm of course willing to provide a patch

Hey Niklas et al. :)

I think this will need input from Greg. He should be best versed in
sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's
supposed to be kernel internal. Now, that might just be a matter of
renaming the macro but let's see whether Greg has any better idea or
more questions. :)

Christian


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-14 Thread Niklas Schnelle



On 1/13/21 7:55 PM, Bjorn Helgaas wrote:
> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
>>>> We use the UID of a zPCI adapter, or the UID of the function zero if
>>>> there are multiple functions in an adapter, as PCI domain if and only if
>>>> UID Checking is turned on.
>>>> Otherwise we automatically generate domains as devices appear.
>>>>
>>>> The state of UID Checking is thus essential to know if the PCI domain
>>>> will be stable, yet currently there is no way to access this information
>>>> from userspace.
>>>> So let's solve this by showing the state of UID checking as a sysfs
>>>> attribute in /sys/bus/pci/uid_checking
> 
>>>> +/* Global zPCI attributes */
>>>> +static ssize_t uid_checking_show(struct kobject *kobj,
>>>> +   struct kobj_attribute *attr, char *buf)
>>>> +{
>>>> +  return sprintf(buf, "%i\n", zpci_unique_uid);
>>>> +}
>>>> +
>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
>>>> +  __ATTR(uid_checking, 0444, uid_checking_show, NULL);
>>>
>>> Use DEVICE_ATTR_RO instead of __ATTR.
>>
>> It's my understanding that DEVICE_ATTR_* is only for
>> per device attributes. This one is global for the entire
>> Z PCI. I just tried with BUS_ATTR_RO instead
>> and that works but only if I put the attribute at
>> /sys/bus/pci/uid_checking instead of with a zpci
>> subfolder. This path would work for us too, we
>> currently don't have any other global attributes
>> that we are planning to expose but those could of
>> course come up in the future.
> 
> Ah, I missed the fact that this is a kobj_attribute, not a
> device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
> seems like it might fit?
> 
> Bjorn
> 

KERNEL_ATTR_* is currently not exported in any header. After
adding it to include/linuc/sysfs.h it indeed works perfectly.
Adding Christian Brauner as suggested by get_maintainers for
their opinion. I'm of course willing to provide a patch
for that move should it be desired.

@Bjorn apart from the correct macro do you have a preference
for either suggested path /sys/bus/pci/zpci/uid_checking vs
/sys/bus/pci/uid_checking?

For completeness some further internal discussion lets
us tend to rather name it to "unique_uids" but I guess that
doesn't make a difference to non s390 people ;-)




Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-13 Thread Bjorn Helgaas
On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote:
> On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> > On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> >> We use the UID of a zPCI adapter, or the UID of the function zero if
> >> there are multiple functions in an adapter, as PCI domain if and only if
> >> UID Checking is turned on.
> >> Otherwise we automatically generate domains as devices appear.
> >>
> >> The state of UID Checking is thus essential to know if the PCI domain
> >> will be stable, yet currently there is no way to access this information
> >> from userspace.
> >> So let's solve this by showing the state of UID checking as a sysfs
> >> attribute in /sys/bus/pci/uid_checking

> >> +/* Global zPCI attributes */
> >> +static ssize_t uid_checking_show(struct kobject *kobj,
> >> +   struct kobj_attribute *attr, char *buf)
> >> +{
> >> +  return sprintf(buf, "%i\n", zpci_unique_uid);
> >> +}
> >> +
> >> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> >> +  __ATTR(uid_checking, 0444, uid_checking_show, NULL);
> > 
> > Use DEVICE_ATTR_RO instead of __ATTR.
> 
> It's my understanding that DEVICE_ATTR_* is only for
> per device attributes. This one is global for the entire
> Z PCI. I just tried with BUS_ATTR_RO instead
> and that works but only if I put the attribute at
> /sys/bus/pci/uid_checking instead of with a zpci
> subfolder. This path would work for us too, we
> currently don't have any other global attributes
> that we are planning to expose but those could of
> course come up in the future.

Ah, I missed the fact that this is a kobj_attribute, not a
device_attribute.  Maybe KERNEL_ATTR_RO()?  Very few uses so far, but
seems like it might fit?

Bjorn


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-12 Thread Niklas Schnelle



On 1/12/21 10:50 PM, Bjorn Helgaas wrote:
> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
>> We use the UID of a zPCI adapter, or the UID of the function zero if
>> there are multiple functions in an adapter, as PCI domain if and only if
>> UID Checking is turned on.
>> Otherwise we automatically generate domains as devices appear.
>>
>> The state of UID Checking is thus essential to know if the PCI domain
>> will be stable, yet currently there is no way to access this information
>> from userspace.
>> So let's solve this by showing the state of UID checking as a sysfs
>> attribute in /sys/bus/pci/uid_checking
> 
> Cosmetic: can't tell if the above is two paragraphs separated by blank
> lines or four separated by either blank lines or short last lines.
> Please separate or reflow to avoid the ambiguity.

Thanks, you're right I split it in 3 proper paragraphs now.
Also the commit message was out of sync with the documentation,
cover letter and code. This version actually uses
/sys/bus/pci/zpci/uid_checking sorry about that.

> 
> I don't have any input on the s390 issues, and I assume this will go
> via the s390 tree.
> 
>> Signed-off-by: Niklas Schnelle 
>> ---
>>  Documentation/ABI/testing/sysfs-bus-pci | 11 
>>  arch/s390/include/asm/pci.h |  3 +++
>>  arch/s390/pci/pci.c |  4 +++
>>  arch/s390/pci/pci_sysfs.c   | 34 +
>>  4 files changed, 52 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
>> b/Documentation/ABI/testing/sysfs-bus-pci
>> index 25c9c39770c6..a174aac0ebb0 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-pci
>> +++ b/Documentation/ABI/testing/sysfs-bus-pci
>> @@ -375,3 +375,14 @@ Description:
>>  The value comes from the PCI kernel device state and can be one
>>  of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>>  The file is read only.
>> +What:   /sys/bus/pci/zpci/uid_checking
>> +Date:   December 2020
>> +Contact:Niklas Schnelle 
>> +Description:
>> +This attribute exposes the global state of UID Checking on
>> +an s390 Linux system. If UID Checking is on this file
>> +contains '1' otherwise '0'. If UID Checking is on the UID of
>> +a zPCI device, or the UID of function zero for a multi-function
>> +device will be used as its PCI Domain number. If UID Checking
>> +is off PCI Domain numbers are generated automatically and
>> +are not stable across reboots.
>> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
>> index 212628932ddc..3cfa6cc701ba 100644
>> --- a/arch/s390/include/asm/pci.h
>> +++ b/arch/s390/include/asm/pci.h
>> @@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *);
>>  /* Error reporting */
>>  int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
... snip ...
>> +
>> +/* Global zPCI attributes */
>> +static ssize_t uid_checking_show(struct kobject *kobj,
>> + struct kobj_attribute *attr, char *buf)
>> +{
>> +return sprintf(buf, "%i\n", zpci_unique_uid);
>> +}
>> +
>> +static struct kobj_attribute sys_zpci_uid_checking_attr =
>> +__ATTR(uid_checking, 0444, uid_checking_show, NULL);
> 
> Use DEVICE_ATTR_RO instead of __ATTR.

It's my understanding that DEVICE_ATTR_* is only for
per device attributes. This one is global for the entire
Z PCI. I just tried with BUS_ATTR_RO instead
and that works but only if I put the attribute at
/sys/bus/pci/uid_checking instead of with a zpci
subfolder. This path would work for us too, we
currently don't have any other global attributes
that we are planning to expose but those could of
course come up in the future.

> 
... snip ...
>>


Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-12 Thread Bjorn Helgaas
On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote:
> We use the UID of a zPCI adapter, or the UID of the function zero if
> there are multiple functions in an adapter, as PCI domain if and only if
> UID Checking is turned on.
> Otherwise we automatically generate domains as devices appear.
> 
> The state of UID Checking is thus essential to know if the PCI domain
> will be stable, yet currently there is no way to access this information
> from userspace.
> So let's solve this by showing the state of UID checking as a sysfs
> attribute in /sys/bus/pci/uid_checking

Cosmetic: can't tell if the above is two paragraphs separated by blank
lines or four separated by either blank lines or short last lines.
Please separate or reflow to avoid the ambiguity.

I don't have any input on the s390 issues, and I assume this will go
via the s390 tree.

> Signed-off-by: Niklas Schnelle 
> ---
>  Documentation/ABI/testing/sysfs-bus-pci | 11 
>  arch/s390/include/asm/pci.h |  3 +++
>  arch/s390/pci/pci.c |  4 +++
>  arch/s390/pci/pci_sysfs.c   | 34 +
>  4 files changed, 52 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c39770c6..a174aac0ebb0 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -375,3 +375,14 @@ Description:
>   The value comes from the PCI kernel device state and can be one
>   of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>   The file is read only.
> +What:/sys/bus/pci/zpci/uid_checking
> +Date:December 2020
> +Contact: Niklas Schnelle 
> +Description:
> + This attribute exposes the global state of UID Checking on
> +         an s390 Linux system. If UID Checking is on this file
> + contains '1' otherwise '0'. If UID Checking is on the UID of
> + a zPCI device, or the UID of function zero for a multi-function
> + device will be used as its PCI Domain number. If UID Checking
> + is off PCI Domain numbers are generated automatically and
> + are not stable across reboots.
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 212628932ddc..3cfa6cc701ba 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *);
>  /* Error reporting */
>  int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
>  
> +/* Sysfs Entries */
> +int zpci_sysfs_init(void);
> +
>  #ifdef CONFIG_NUMA
>  
>  /* Returns the node based on PCI bus */
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 41df8fcfddde..c16c93e5f9af 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -881,6 +881,10 @@ static int __init pci_base_init(void)
>   if (rc)
>   goto out_find;
>  
> + rc = zpci_sysfs_init();
> + if (rc)
> + goto out_find;
> +
>   s390_pci_initialized = 1;
>   return 0;
>  
> diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
> index 5c028bee91b9..d00690f73539 100644
> --- a/arch/s390/pci/pci_sysfs.c
> +++ b/arch/s390/pci/pci_sysfs.c
> @@ -172,3 +172,37 @@ const struct attribute_group *zpci_attr_groups[] = {
>   &pfip_attr_group,
>   NULL,
>  };
> +
> +/* Global zPCI attributes */
> +static ssize_t uid_checking_show(struct kobject *kobj,
> +  struct kobj_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%i\n", zpci_unique_uid);
> +}
> +
> +static struct kobj_attribute sys_zpci_uid_checking_attr =
> + __ATTR(uid_checking, 0444, uid_checking_show, NULL);

Use DEVICE_ATTR_RO instead of __ATTR.

> +static struct kset *zpci_global_kset;
> +
> +static struct attribute *zpci_attrs_global[] = {
> + &sys_zpci_uid_checking_attr.attr,
> + NULL,
> +};
> +
> +static struct attribute_group zpci_attr_group_global = {
> + .attrs = zpci_attrs_global,
> +};
> +
> +int __init zpci_sysfs_init(void)
> +{
> + struct kset *pci_bus_kset;
> +
> + pci_bus_kset = bus_get_kset(&pci_bus_type);
> +
> + zpci_global_kset = kset_create_and_add("zpci", NULL, 
> &pci_bus_kset->kobj);
> + if (!zpci_global_kset)
> + return -ENOMEM;
> +
> + return sysfs_create_group(&zpci_global_kset->kobj, 
> &zpci_attr_group_global);
> +}
> -- 
> 2.17.1
> 


[RFC 0/1] PCI: s390 global attribute "UID Checking"

2021-01-11 Thread Niklas Schnelle
Hi Bjorn, Hi Kernel Hackers,

With the below patch I'm proposing to expose a global (i.e. not device
bound) runtime attribute of the s390 PCI implementation (zPCI) called
"UID Checking".  You can find some background information on what this
attribute means and why it is important at the end of this mail. The
reason I'm writing to you about it however is that this is the first
global PCI attribute we would like to expose to user space and I'm
searching for a good place to put it.
The proposed patch uses "/sys/bus/pci/zpci/uid_checking" which from
our perspective would be a great choice but I realize that there
currently are no platform specific attributes directly under
"/sys/bus/pci" so this clearly requires some discussion. What's your
thought on this and do you know of any other platform specific global
PCI attributes as I couldn't find any?

Best regards,
Niklas Schnelle

Background:
On s390 OSs always run under at least a machine level hypervisor
(LPAR).  Simpliefied by usually running from SAN this makes VM
migration possible at every level. For PCI this has created the need
to allow PCI IDs to be stable across machines and to be partly user
defined such that the setup of an existing VM on one machine can be
recreated on another machine.
In particular the Domain part of the PCI ID can be user defined by
setting a per device value called UID. Since this was a late addition
and isn't used by all OSs and since UIDs need to be unique if used as
Domains, there is an additional global platform supplied runtime value
called "UID Checking".  This value indicates if UIDs are guaranteed to
be unique and set which triggers Linux to use the UIDs as PCI Domains
otherwise PCI Domains are simply incremented as necessary.
This "UID Checking" setting is thus very important e.g. when deciding
how network interface names are generated as it indicates if the
domain part of the PCI ID will remain stable across reboots and
migrations.  Once exposed we will propose a patch to udev/systemd to
use the "UID Checking" attribute to prefer network interface names
which can be guaranteed to be stable and re-creatable for migration.

Niklas Schnelle (1):
  s390/pci: expose UID checking state in sysfs

 Documentation/ABI/testing/sysfs-bus-pci | 11 
 arch/s390/include/asm/pci.h |  3 +++
 arch/s390/pci/pci.c |  4 +++
 arch/s390/pci/pci_sysfs.c   | 34 +
 4 files changed, 52 insertions(+)

-- 
2.17.1



[RFC 1/1] s390/pci: expose UID checking state in sysfs

2021-01-11 Thread Niklas Schnelle
We use the UID of a zPCI adapter, or the UID of the function zero if
there are multiple functions in an adapter, as PCI domain if and only if
UID Checking is turned on.
Otherwise we automatically generate domains as devices appear.

The state of UID Checking is thus essential to know if the PCI domain
will be stable, yet currently there is no way to access this information
from userspace.
So let's solve this by showing the state of UID checking as a sysfs
attribute in /sys/bus/pci/uid_checking

Signed-off-by: Niklas Schnelle 
---
 Documentation/ABI/testing/sysfs-bus-pci | 11 
 arch/s390/include/asm/pci.h |  3 +++
 arch/s390/pci/pci.c |  4 +++
 arch/s390/pci/pci_sysfs.c   | 34 +
 4 files changed, 52 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c39770c6..a174aac0ebb0 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -375,3 +375,14 @@ Description:
The value comes from the PCI kernel device state and can be one
of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
The file is read only.
+What:  /sys/bus/pci/zpci/uid_checking
+Date:  December 2020
+Contact:   Niklas Schnelle 
+Description:
+   This attribute exposes the global state of UID Checking on
+   an s390 Linux system. If UID Checking is on this file
+   contains '1' otherwise '0'. If UID Checking is on the UID of
+   a zPCI device, or the UID of function zero for a multi-function
+   device will be used as its PCI Domain number. If UID Checking
+   is off PCI Domain numbers are generated automatically and
+   are not stable across reboots.
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 212628932ddc..3cfa6cc701ba 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *);
 /* Error reporting */
 int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
 
+/* Sysfs Entries */
+int zpci_sysfs_init(void);
+
 #ifdef CONFIG_NUMA
 
 /* Returns the node based on PCI bus */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 41df8fcfddde..c16c93e5f9af 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -881,6 +881,10 @@ static int __init pci_base_init(void)
if (rc)
goto out_find;
 
+   rc = zpci_sysfs_init();
+   if (rc)
+   goto out_find;
+
s390_pci_initialized = 1;
return 0;
 
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 5c028bee91b9..d00690f73539 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -172,3 +172,37 @@ const struct attribute_group *zpci_attr_groups[] = {
&pfip_attr_group,
NULL,
 };
+
+/* Global zPCI attributes */
+static ssize_t uid_checking_show(struct kobject *kobj,
+struct kobj_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%i\n", zpci_unique_uid);
+}
+
+static struct kobj_attribute sys_zpci_uid_checking_attr =
+   __ATTR(uid_checking, 0444, uid_checking_show, NULL);
+
+static struct kset *zpci_global_kset;
+
+static struct attribute *zpci_attrs_global[] = {
+   &sys_zpci_uid_checking_attr.attr,
+   NULL,
+};
+
+static struct attribute_group zpci_attr_group_global = {
+   .attrs = zpci_attrs_global,
+};
+
+int __init zpci_sysfs_init(void)
+{
+   struct kset *pci_bus_kset;
+
+   pci_bus_kset = bus_get_kset(&pci_bus_type);
+
+   zpci_global_kset = kset_create_and_add("zpci", NULL, 
&pci_bus_kset->kobj);
+   if (!zpci_global_kset)
+   return -ENOMEM;
+
+   return sysfs_create_group(&zpci_global_kset->kobj, 
&zpci_attr_group_global);
+}
-- 
2.17.1



[PATCH v16 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-12-08 Thread Jianyong Wu
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 25ea4ecb6449..b9d8607083eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.17.1



[PATCH v15 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-11-10 Thread Jianyong Wu
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 25ea4ecb6449..b9d8607083eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.17.1



[PATCH v14 02/10] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-09-04 Thread Jianyong Wu
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..901c60f119c2 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_BP_HARDEN_UNKNOWN:
break;
case KVM_BP_HARDEN_WA_NEEDED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_BP_HARDEN_NOT_REQUIRED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
@@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_SSBD_UNKNOWN:
break;
case KVM_SSBD_KERNEL:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_SSBD_FORCE_ENABLE:
case KVM_SSBD_MITIGATED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.17.1



Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-08-20 Thread Will Deacon
On Tue, Jul 28, 2020 at 01:07:14AM +, Jianyong Wu wrote:
> 
> 
> > -Original Message-
> > From: Will Deacon 
> > Sent: Monday, July 27, 2020 7:38 PM
> > To: Jianyong Wu 
> > Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> > t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> > m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> > ; Suzuki Poulose ;
> > Steven Price ; linux-kernel@vger.kernel.org; linux-
> > arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu;
> > k...@vger.kernel.org; Steve Capper ; Kaly Xin
> > ; Justin He ; Wei Chen
> > ; nd 
> > Subject: Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests
> > via SMCCC
> > 
> > On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote:
> > > > From: Will Deacon 
> > > >
> > > > We can advertise ourselves to guests as KVM and provide a basic
> > > > features bitmap for discoverability of future hypervisor services.
> > > >
> > > > Cc: Marc Zyngier 
> > > > Signed-off-by: Will Deacon 
> > > > Signed-off-by: Jianyong Wu 
> > > > ---
> > > >  arch/arm64/kvm/hypercalls.c | 29 +++--
> > > >  1 file changed, 19 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/hypercalls.c
> > > > b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23
> > > > 100644
> > > > --- a/arch/arm64/kvm/hypercalls.c
> > > > +++ b/arch/arm64/kvm/hypercalls.c
> > > > @@ -12,13 +12,13 @@
> > > >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> > > > u32 func_id = smccc_get_function(vcpu);
> > > > -   long val = SMCCC_RET_NOT_SUPPORTED;
> > > > +   u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> > >
> > > There is a risk as this u32 value will return here and a u64 value
> > > will be obtained in guest. For example, The val[0] is initialized as
> > > -1 of 0x and the guest get 0x then it will be compared
> > > with -1 of 0x Also this problem exists for the
> > > transfer of address in u64 type. So the following assignment to "val"
> > > should be split into two
> > > u32 value and assign to val[0] and val[1] respectively.
> > > WDYT?
> > 
> > Yes, I think you're right that this is a bug, but isn't the solution just 
> > to make
> > that an array of 'long'?
> > 
> > long val [4];
> > 
> > That will sign-extend the negative error codes as required, while leaving 
> > the
> > explicitly unsigned UID constants alone.
> 
> Ok, that's much better. I will fix it at next version.
> 
> By the way, I wonder when will you update this patch set. I see someone like 
> me
> adopt this patch set as code base and need rebase it every time, so expect 
> your update.

I'm not working on it, so please feel free to include it along with the
patches that add an upstream user.

Will


[RFC PATCH 21/30] user namespace: Add function that checks if the UID map is defined

2020-08-18 Thread krzysztof.struczynski
From: Krzysztof Struczynski 

Add function that checks if the UID map is defined. It will be used by
ima to check if ID remapping in subject-based rules is necessary.

Signed-off-by: Krzysztof Struczynski 
---
 include/linux/user_namespace.h |  6 ++
 kernel/user_namespace.c| 11 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index d9759c54fead..bcb21c41c910 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -138,6 +138,7 @@ extern bool in_userns(const struct user_namespace *ancestor,
   const struct user_namespace *child);
 extern bool current_in_userns(const struct user_namespace *target_ns);
 struct ns_common *ns_get_owner(struct ns_common *ns);
+extern bool userns_set_uidmap(const struct user_namespace *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -182,6 +183,11 @@ static inline struct ns_common *ns_get_owner(struct 
ns_common *ns)
 {
return ERR_PTR(-EPERM);
 }
+
+static inline bool userns_set_uidmap(const struct user_namespace *ns)
+{
+   return true;
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 87804e0371fe..e38f9f11a589 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1232,6 +1232,17 @@ bool current_in_userns(const struct user_namespace 
*target_ns)
 }
 EXPORT_SYMBOL(current_in_userns);
 
+bool userns_set_uidmap(const struct user_namespace *ns)
+{
+   bool mapping_defined;
+
+   mutex_lock(&userns_state_mutex);
+   mapping_defined = ns->uid_map.nr_extents != 0;
+   mutex_unlock(&userns_state_mutex);
+
+   return mapping_defined;
+}
+
 static inline struct user_namespace *to_user_ns(struct ns_common *ns)
 {
return container_of(ns, struct user_namespace, ns);
-- 
2.20.1



RE: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-07-27 Thread Jianyong Wu



> -Original Message-
> From: Will Deacon 
> Sent: Monday, July 27, 2020 7:38 PM
> To: Jianyong Wu 
> Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> ; Suzuki Poulose ;
> Steven Price ; linux-kernel@vger.kernel.org; linux-
> arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu;
> k...@vger.kernel.org; Steve Capper ; Kaly Xin
> ; Justin He ; Wei Chen
> ; nd 
> Subject: Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests
> via SMCCC
> 
> On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote:
> > > From: Will Deacon 
> > >
> > > We can advertise ourselves to guests as KVM and provide a basic
> > > features bitmap for discoverability of future hypervisor services.
> > >
> > > Cc: Marc Zyngier 
> > > Signed-off-by: Will Deacon 
> > > Signed-off-by: Jianyong Wu 
> > > ---
> > >  arch/arm64/kvm/hypercalls.c | 29 +++--
> > >  1 file changed, 19 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hypercalls.c
> > > b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23
> > > 100644
> > > --- a/arch/arm64/kvm/hypercalls.c
> > > +++ b/arch/arm64/kvm/hypercalls.c
> > > @@ -12,13 +12,13 @@
> > >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> > >   u32 func_id = smccc_get_function(vcpu);
> > > - long val = SMCCC_RET_NOT_SUPPORTED;
> > > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> >
> > There is a risk as this u32 value will return here and a u64 value
> > will be obtained in guest. For example, The val[0] is initialized as
> > -1 of 0x and the guest get 0x then it will be compared
> > with -1 of 0x Also this problem exists for the
> > transfer of address in u64 type. So the following assignment to "val"
> > should be split into two
> > u32 value and assign to val[0] and val[1] respectively.
> > WDYT?
> 
> Yes, I think you're right that this is a bug, but isn't the solution just to 
> make
> that an array of 'long'?
> 
>   long val [4];
> 
> That will sign-extend the negative error codes as required, while leaving the
> explicitly unsigned UID constants alone.

Ok, that's much better. I will fix it at next version.

By the way, I wonder when will you update this patch set. I see someone like me
adopt this patch set as code base and need rebase it every time, so expect your 
update.

Thanks
Jianyong 
> 
> Will


Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-07-27 Thread Will Deacon
On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote:
> > From: Will Deacon 
> > 
> > We can advertise ourselves to guests as KVM and provide a basic features
> > bitmap for discoverability of future hypervisor services.
> > 
> > Cc: Marc Zyngier 
> > Signed-off-by: Will Deacon 
> > Signed-off-by: Jianyong Wu 
> > ---
> >  arch/arm64/kvm/hypercalls.c | 29 +++--
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 550dfa3e53cd..db6dce3d0e23 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -12,13 +12,13 @@
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> > u32 func_id = smccc_get_function(vcpu);
> > -   long val = SMCCC_RET_NOT_SUPPORTED;
> > +   u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> 
> There is a risk as this u32 value will return here and a u64 value will be
> obtained in guest. For example, The val[0] is initialized as -1 of
> 0x and the guest get 0x then it will be compared with -1
> of 0x Also this problem exists for the transfer of address
> in u64 type. So the following assignment to "val" should be split into two
> u32 value and assign to val[0] and val[1] respectively.
> WDYT?

Yes, I think you're right that this is a bug, but isn't the solution just
to make that an array of 'long'?

long val [4];

That will sign-extend the negative error codes as required, while leaving
the explicitly unsigned UID constants alone.

Will


RE: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-07-26 Thread Jianyong Wu
Hi Will,

> -Original Message-
> From: Jianyong Wu 
> Sent: Friday, June 19, 2020 9:01 PM
> To: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> ; w...@kernel.org; Suzuki Poulose
> ; Steven Price 
> Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu; k...@vger.kernel.org; Steve Capper
> ; Kaly Xin ; Justin He
> ; Wei Chen ; Jianyong Wu
> ; nd 
> Subject: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via
> SMCCC
> 
> From: Will Deacon 
> 
> We can advertise ourselves to guests as KVM and provide a basic features
> bitmap for discoverability of future hypervisor services.
> 
> Cc: Marc Zyngier 
> Signed-off-by: Will Deacon 
> Signed-off-by: Jianyong Wu 
> ---
>  arch/arm64/kvm/hypercalls.c | 29 +++--
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 550dfa3e53cd..db6dce3d0e23 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -12,13 +12,13 @@
>  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
>   u32 func_id = smccc_get_function(vcpu);
> - long val = SMCCC_RET_NOT_SUPPORTED;
> + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};

There is a risk as this u32 value will return here and a u64 value will be 
obtained in guest. For example,
The val[0] is initialized as -1 of 0x and the guest get 0x then 
it will be compared with -1 of 0x
Also this problem exists for the transfer of address in u64 type. So the 
following assignment to "val" should be split into
two u32 value and assign to val[0] and val[1] respectively.
WDYT?

Thanks
Jianyong 

>   u32 feature;
>   gpa_t gpa;
> 
>   switch (func_id) {
>   case ARM_SMCCC_VERSION_FUNC_ID:
> - val = ARM_SMCCC_VERSION_1_1;
> + val[0] = ARM_SMCCC_VERSION_1_1;
>   break;
>   case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
>   feature = smccc_get_arg1(vcpu);
> @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   case KVM_BP_HARDEN_UNKNOWN:
>   break;
>   case KVM_BP_HARDEN_WA_NEEDED:
> - val = SMCCC_RET_SUCCESS;
> + val[0] = SMCCC_RET_SUCCESS;
>   break;
>   case KVM_BP_HARDEN_NOT_REQUIRED:
> - val = SMCCC_RET_NOT_REQUIRED;
> + val[0] = SMCCC_RET_NOT_REQUIRED;
>   break;
>   }
>   break;
> @@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   case KVM_SSBD_UNKNOWN:
>   break;
>   case KVM_SSBD_KERNEL:
> - val = SMCCC_RET_SUCCESS;
> + val[0] = SMCCC_RET_SUCCESS;
>   break;
>   case KVM_SSBD_FORCE_ENABLE:
>   case KVM_SSBD_MITIGATED:
> - val = SMCCC_RET_NOT_REQUIRED;
> + val[0] = SMCCC_RET_NOT_REQUIRED;
>   break;
>   }
>   break;
>   case ARM_SMCCC_HV_PV_TIME_FEATURES:
> - val = SMCCC_RET_SUCCESS;
> + val[0] = SMCCC_RET_SUCCESS;
>   break;
>   }
>   break;
>   case ARM_SMCCC_HV_PV_TIME_FEATURES:
> - val = kvm_hypercall_pv_features(vcpu);
> + val[0] = kvm_hypercall_pv_features(vcpu);
>   break;
>   case ARM_SMCCC_HV_PV_TIME_ST:
>   gpa = kvm_init_stolen_time(vcpu);
>   if (gpa != GPA_INVALID)
> - val = gpa;
> + val[0] = gpa;
> + break;
> + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
> + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
> + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
> + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
> + break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
>   break;
>   default:
>   return kvm_psci_call(vcpu);
>   }
> 
> - smccc_set_retval(vcpu, val, 0, 0, 0);
> + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
>   return 1;
>  }
> --
> 2.17.1



[RFC PATCH 2/5] keys: Replace uid/gid/perm permissions checking with an ACL

2020-07-16 Thread David Howells
Replace the uid/gid/perm permissions checking on a key with an ACL to allow
the SETATTR and SEARCH permissions to be split.  This will also allow a
greater range of subjects to represented.

Define a KEYCTL_CAPS_ACL capabilities bit to indicate if the kernel has
this change applied.


WHY DO THIS?


The problem is that SETATTR and SEARCH cover a slew of actions, not all of
which should be grouped together.

For SETATTR, this includes actions that are about controlling access to a
key:

 (1) Changing a key's ownership.

 (2) Changing a key's security information.

 (3) Setting a keyring's restriction.

And actions that are about managing a key's lifetime:

 (4) Setting an expiry time.

 (5) Revoking a key.

and (proposed) managing a key as part of a cache:

 (6) Invalidating a key.

Managing a key's lifetime doesn't really have anything to do with
controlling access to that key.

Expiry time is awkward since it's more about the lifetime of the content
and so, in some ways goes better with WRITE permission.  It can, however,
be set unconditionally by a process with an appropriate authorisation token
for instantiating a key, and can also be set by the key type driver when a
key is instantiated, so lumping it with the access-controlling actions is
probably okay.

As for SEARCH permission, that currently covers:

 (1) Finding keys in a keyring tree during a search.

 (2) Permitting keyrings to be joined.

 (3) Invalidation.

But these don't really belong together either, since these actions really
need to be controlled separately.

Finally, there are number of special cases to do with granting the
administrator special rights to invalidate or clear keys that I would like
to handle with the ACL rather than key flags and special checks.


===
WHAT IS CHANGED
===

The SETATTR permission is split to create two new permissions:

 (1) SETSEC - which allows the key's owner, group and ACL to be changed and
 a restriction to be placed on a keyring.

 (2) REVOKE - which allows a key to be revoked.

The SEARCH permission is split to create:

 (1) SEARCH - which allows a keyring to be search and a key to be found.

 (2) JOIN - which allows a keyring to be joined as a session keyring.

 (3) INVAL - which allows a key to be invalidated.

The WRITE permission is also split to create:

 (1) WRITE - which allows a key's content to be altered and links to be
 added, removed and replaced in a keyring.

 (2) CLEAR - which allows a keyring to be cleared completely.  This is
 split out to make it possible to give just this to an administrator.

 (3) REVOKE - see above.


Keys acquire ACLs which consist of a series of ACEs, and all that apply are
unioned together.  An ACE specifies a subject, such as:

 (*) Possessor - permitted to anyone who 'possesses' a key
 (*) Owner - permitted to the key owner
 (*) Group - permitted to the key group
 (*) Everyone - permitted to everyone

Note that 'Other' has been replaced with 'Everyone' on the assumption that
you wouldn't grant a permit to 'Other' that you wouldn't also grant to
everyone else.

Further subjects may be made available by later patches.

The ACE also specifies a permissions mask.  The set of permissions is now:

VIEWCan view the key metadata
READCan read the key content
WRITE   Can update/modify the key content
SEARCH  Can find the key by searching/requesting
LINKCan make a link to the key
SETSEC  Can change owner, ACL, expiry
INVAL   Can invalidate
REVOKE  Can revoke
JOINCan join this keyring
CLEAR   Can clear this keyring


The KEYCTL_SETPERM function is then deprecated.

The KEYCTL_SET_TIMEOUT function then is permitted if SETSEC is set, or if
the caller has a valid instantiation auth token.

The KEYCTL_INVALIDATE function then requires INVAL.

The KEYCTL_REVOKE function then requires REVOKE.

The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an
existing keyring.

The JOIN permission is enabled by default for session keyrings and manually
created keyrings only.


==
BACKWARD COMPATIBILITY
==

To maintain backward compatibility, KEYCTL_SETPERM will translate the
permissions mask it is given into a new ACL for a key - unless
KEYCTL_SET_ACL has been called on that key, in which case an error will be
returned.

It will convert possessor, owner, group and other permissions into separate
ACEs, if each portion of the mask is non-zero.

SETATTR permission turns on all of INVAL, REVOKE and SETSEC.  WRITE
permission turns on WRITE, REVOKE and, if a keyring, CLEAR.  JOIN is turned
on if a keyring is being altered.

The KEYCTL_DESCRIBE function translates the ACL back into a permissions
mask to retur

[PATCH 5.7 145/265] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-29 Thread Sasha Levin
From: Anson Huang 

[ Upstream commit c95c9693b112f312b59c5d100fd09a1349970fab ]

Correct i.MX8MP UID fuse offset according to fuse map:

UID_LOW: 0x420
UID_HIGH: 0x430

Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")
Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support")
Signed-off-by: Anson Huang 
Reviewed-by: Iuliana Prodan 
Signed-off-by: Shawn Guo 
Signed-off-by: Sasha Levin 
---
 drivers/soc/imx/soc-imx8m.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index 719e1f189ebf2..ada0d8804d84c 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -22,6 +22,8 @@
 #define OCOTP_UID_LOW  0x410
 #define OCOTP_UID_HIGH 0x420
 
+#define IMX8MP_OCOTP_UID_OFFSET0x10
+
 /* Same as ANADIG_DIGPROG_IMX7D */
 #define ANADIG_DIGPROG_IMX8MM  0x800
 
@@ -88,6 +90,8 @@ static void __init imx8mm_soc_uid(void)
 {
void __iomem *ocotp_base;
struct device_node *np;
+   u32 offset = of_machine_is_compatible("fsl,imx8mp") ?
+IMX8MP_OCOTP_UID_OFFSET : 0;
 
np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp");
if (!np)
@@ -96,9 +100,9 @@ static void __init imx8mm_soc_uid(void)
ocotp_base = of_iomap(np, 0);
WARN_ON(!ocotp_base);
 
-   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH);
+   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset);
soc_uid <<= 32;
-   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW);
+   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset);
 
iounmap(ocotp_base);
of_node_put(np);
-- 
2.25.1



Re: [PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-23 Thread Shawn Guo
On Wed, Jun 10, 2020 at 06:03:02PM +0800, Anson Huang wrote:
> Correct i.MX8MP UID fuse offset according to fuse map:
> 
> UID_LOW: 0x420
> UID_HIGH: 0x430
> 
> Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")
> Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support")
> Signed-off-by: Anson Huang 

Applied, thanks.


[PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-06-19 Thread Jianyong Wu
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..db6dce3d0e23 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_BP_HARDEN_UNKNOWN:
break;
case KVM_BP_HARDEN_WA_NEEDED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_BP_HARDEN_NOT_REQUIRED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
@@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_SSBD_UNKNOWN:
break;
case KVM_SSBD_KERNEL:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_SSBD_FORCE_ENABLE:
case KVM_SSBD_MITIGATED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.17.1



[RFC PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-06-19 Thread Jianyong Wu
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..db6dce3d0e23 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_BP_HARDEN_UNKNOWN:
break;
case KVM_BP_HARDEN_WA_NEEDED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_BP_HARDEN_NOT_REQUIRED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
@@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_SSBD_UNKNOWN:
break;
case KVM_SSBD_KERNEL:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_SSBD_FORCE_ENABLE:
case KVM_SSBD_MITIGATED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.17.1



Re: [PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Iuliana Prodan



On 6/10/2020 1:03 PM, Anson Huang wrote:

Correct i.MX8MP UID fuse offset according to fuse map:

UID_LOW: 0x420
UID_HIGH: 0x430

Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")
Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support")
Signed-off-by: Anson Huang 

Reviewed-by: Iuliana Prodan 


---
Changes since V2:
- add one more fix tag for original patch before file name is changed.
---
  drivers/soc/imx/soc-imx8m.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index 7b0759a..0bc8314 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -22,6 +22,8 @@
  #define OCOTP_UID_LOW 0x410
  #define OCOTP_UID_HIGH0x420
  
+#define IMX8MP_OCOTP_UID_OFFSET		0x10

+
  /* Same as ANADIG_DIGPROG_IMX7D */
  #define ANADIG_DIGPROG_IMX8MM 0x800
  
@@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void)

  {
void __iomem *ocotp_base;
struct device_node *np;
+   u32 offset = of_machine_is_compatible("fsl,imx8mp") ?
+IMX8MP_OCOTP_UID_OFFSET : 0;
  
  	np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp");

if (!np)
@@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void)
ocotp_base = of_iomap(np, 0);
WARN_ON(!ocotp_base);
  
-	soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH);

+   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset);
soc_uid <<= 32;
-   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW);
+   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset);
  
  	iounmap(ocotp_base);

of_node_put(np);



[PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Anson Huang
Correct i.MX8MP UID fuse offset according to fuse map:

UID_LOW: 0x420
UID_HIGH: 0x430

Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")
Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support")
Signed-off-by: Anson Huang 
---
Changes since V2:
- add one more fix tag for original patch before file name is changed.
---
 drivers/soc/imx/soc-imx8m.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index 7b0759a..0bc8314 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -22,6 +22,8 @@
 #define OCOTP_UID_LOW  0x410
 #define OCOTP_UID_HIGH 0x420
 
+#define IMX8MP_OCOTP_UID_OFFSET0x10
+
 /* Same as ANADIG_DIGPROG_IMX7D */
 #define ANADIG_DIGPROG_IMX8MM  0x800
 
@@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void)
 {
void __iomem *ocotp_base;
struct device_node *np;
+   u32 offset = of_machine_is_compatible("fsl,imx8mp") ?
+IMX8MP_OCOTP_UID_OFFSET : 0;
 
np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp");
if (!np)
@@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void)
ocotp_base = of_iomap(np, 0);
WARN_ON(!ocotp_base);
 
-   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH);
+   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset);
soc_uid <<= 32;
-   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW);
+   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset);
 
iounmap(ocotp_base);
of_node_put(np);
-- 
2.7.4



RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Anson Huang
Hi, Luliana

> Subject: Re: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
> 
> 
> 
> On 6/10/2020 10:57 AM, Anson Huang wrote:
> >
> >> Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
> >>
> >>> From: Anson Huang 
> >>> Sent: Wednesday, June 10, 2020 6:42 AM
> >>>
> >>> Correct i.MX8MP UID fuse offset according to fuse map:
> >>>
> >>> UID_LOW: 0x420
> >>> UID_HIGH: 0x430
> >>>
> >>> Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m
> >>> soc
> >>> driver")
> >>
> >> AFAIK "Fixes:" should point to the original patch which introduced the 
> >> issue.
> >> Not the one changing file name.
> >
> > But the patch can NOT be applied to the kernel version with original
> > file, how to fix it?
> >
> I believe you can add two "Fixes:" with the two commits: the one introducing
> the issue and the one changing the file name.

Thanks, will add once more fix tag, please help review V3.

Anson


Re: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Iuliana Prodan




On 6/10/2020 10:57 AM, Anson Huang wrote:



Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset


From: Anson Huang 
Sent: Wednesday, June 10, 2020 6:42 AM

Correct i.MX8MP UID fuse offset according to fuse map:

UID_LOW: 0x420
UID_HIGH: 0x430

Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc
driver")


AFAIK "Fixes:" should point to the original patch which introduced the issue.
Not the one changing file name.


But the patch can NOT be applied to the kernel version with original file, how 
to
fix it?

I believe you can add two "Fixes:" with the two commits: the one 
introducing the issue and the one changing the file name.


Iulia


RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Anson Huang

> Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
> 
> > From: Anson Huang 
> > Sent: Wednesday, June 10, 2020 6:42 AM
> >
> > Correct i.MX8MP UID fuse offset according to fuse map:
> >
> > UID_LOW: 0x420
> > UID_HIGH: 0x430
> >
> > Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc
> > driver")
> 
> AFAIK "Fixes:" should point to the original patch which introduced the issue.
> Not the one changing file name.

But the patch can NOT be applied to the kernel version with original file, how 
to
fix it?

Anson


RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-10 Thread Aisheng Dong
> From: Anson Huang 
> Sent: Wednesday, June 10, 2020 6:42 AM
> 
> Correct i.MX8MP UID fuse offset according to fuse map:
> 
> UID_LOW: 0x420
> UID_HIGH: 0x430
> 
> Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")

AFAIK "Fixes:" should point to the original patch which introduced the issue.
Not the one changing file name.

> Signed-off-by: Anson Huang 
> ---
> Changes since V1:
>   - add fix tag.
> ---
>  drivers/soc/imx/soc-imx8m.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index
> 7b0759a..0bc8314 100644
> --- a/drivers/soc/imx/soc-imx8m.c
> +++ b/drivers/soc/imx/soc-imx8m.c
> @@ -22,6 +22,8 @@
>  #define OCOTP_UID_LOW0x410
>  #define OCOTP_UID_HIGH   0x420
> 
> +#define IMX8MP_OCOTP_UID_OFFSET  0x10
> +
>  /* Same as ANADIG_DIGPROG_IMX7D */
>  #define ANADIG_DIGPROG_IMX8MM0x800
> 
> @@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void)  {
>   void __iomem *ocotp_base;
>   struct device_node *np;
> + u32 offset = of_machine_is_compatible("fsl,imx8mp") ?
> +  IMX8MP_OCOTP_UID_OFFSET : 0;
> 

If (of_machine_is_compatible("fsl,imx8mp"))
Octop_base += IMX8MP_OCOTP_UID_OFFSET;
Then you may not need the extra changes.

Regards
Aisheng

>   np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp");
>   if (!np)
> @@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void)
>   ocotp_base = of_iomap(np, 0);
>   WARN_ON(!ocotp_base);
> 
> - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH);
> + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset);
>   soc_uid <<= 32;
> - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW);
> + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset);
> 
>   iounmap(ocotp_base);
>   of_node_put(np);
> --
> 2.7.4



[PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-09 Thread Anson Huang
Correct i.MX8MP UID fuse offset according to fuse map:

UID_LOW: 0x420
UID_HIGH: 0x430

Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver")
Signed-off-by: Anson Huang 
---
Changes since V1:
- add fix tag.
---
 drivers/soc/imx/soc-imx8m.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index 7b0759a..0bc8314 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -22,6 +22,8 @@
 #define OCOTP_UID_LOW  0x410
 #define OCOTP_UID_HIGH 0x420
 
+#define IMX8MP_OCOTP_UID_OFFSET0x10
+
 /* Same as ANADIG_DIGPROG_IMX7D */
 #define ANADIG_DIGPROG_IMX8MM  0x800
 
@@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void)
 {
void __iomem *ocotp_base;
struct device_node *np;
+   u32 offset = of_machine_is_compatible("fsl,imx8mp") ?
+IMX8MP_OCOTP_UID_OFFSET : 0;
 
np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp");
if (!np)
@@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void)
ocotp_base = of_iomap(np, 0);
WARN_ON(!ocotp_base);
 
-   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH);
+   soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset);
soc_uid <<= 32;
-   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW);
+   soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset);
 
iounmap(ocotp_base);
of_node_put(np);
-- 
2.7.4



RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-09 Thread Anson Huang
Hi, Luliana

> Subject: RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset
> 
> Hi, Luliana
> 
> 
> > Subject: Re: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset
> >
> >
> >
> > On 6/9/2020 4:15 PM, Anson Huang wrote:
> > > Correct i.MX8MP UID fuse offset according to fuse map:
> > >
> > > UID_LOW: 0x420
> > > UID_HIGH: 0x430
> > >
> > > Signed-off-by: Anson Huang 
> >
> > If this patch corrects the imx8mp UID shouldn't have a Fixes tag?
> 
> I thought about this, but I was confused that this file name is changed by
> commit fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc
> driver"), so if to add fix tag, should I add the tag to point to first patch 
> of
> drivers/soc/imx/soc-imx8m.c, or the original commit of supporting 8MP UID in
> drivers/soc/imx/soc-imx8.c which is no long there.

I think it should be commit fc40200ebf82 ("soc: imx: increase build coverage 
for imx8m soc
driver"), will send a V2 patch.

Thanks,
Anson


RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset

2020-06-09 Thread Anson Huang
Hi, Luliana


> Subject: Re: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset
> 
> 
> 
> On 6/9/2020 4:15 PM, Anson Huang wrote:
> > Correct i.MX8MP UID fuse offset according to fuse map:
> >
> > UID_LOW: 0x420
> > UID_HIGH: 0x430
> >
> > Signed-off-by: Anson Huang 
> 
> If this patch corrects the imx8mp UID shouldn't have a Fixes tag?

I thought about this, but I was confused that this file name is changed
by commit fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc 
driver"),
so if to add fix tag, should I add the tag to point to first patch of
drivers/soc/imx/soc-imx8m.c, or the original commit of supporting 8MP UID
in drivers/soc/imx/soc-imx8.c which is no long there.

Thanks
Anson


  1   2   3   4   5   6   7   >