Re: [PATCH v2] eventfs: Fix file and directory uid and gid ownership
On Thu, 21 Dec 2023 19:07:57 -0500 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > It was reported that when mounting the tracefs file system with a gid > other than root, the ownership did not carry down to the eventfs directory > due to the dynamic nature of it. > > A fix was done to solve this, but it had two issues. > > (a) if the attr passed into update_inode_attr() was NULL, it didn't do > anything. This is true for files that have not had a chown or chgrp > done to itself or any of its sibling files, as the attr is allocated > for all children when any one needs it. > > # umount /sys/kernel/tracing > # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt > > # ls -ld /mnt/events/sched > drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/ > > # ls -ld /mnt/events/sched/sched_switch > drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/ > > But when checking the files: > > # ls -l /mnt/events/sched/sched_switch > total 0 > -rw-r- 1 root root 0 Dec 21 13:12 enable > -rw-r- 1 root root 0 Dec 21 13:12 filter > -r--r- 1 root root 0 Dec 21 13:12 format > -r--r- 1 root root 0 Dec 21 13:12 hist > -r--r- 1 root root 0 Dec 21 13:12 id > -rw-r- 1 root root 0 Dec 21 13:12 trigger > > (b) When the attr does not denote the UID or GID, it defaulted to using > the parent uid or gid. This is incorrect as changing the parent > uid or gid will automatically change all its children. > > # chgrp tracing /mnt/events/timer > > # ls -ld /mnt/events/timer > drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer > > # ls -l /mnt/events/timer > total 0 > -rw-r- 1 root root0 Dec 21 14:35 enable > -rw-r- 1 root root0 Dec 21 14:35 filter > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init > drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start > > At first it was thought that this could be easily fixed by just making the > default ownership of the superblock when it was mounted. But this does not > handle the case of: > > # chgrp tracing instances > # mkdir instances/foo > > If the superblock was used, then the group ownership would be that of what > it was when it was mounted, when it should instead be "tracing". > > Instead, set a flag for the top level eventfs directory ("events") to flag > which eventfs_inode belongs to it. > > Since the "events" directory's dentry and inode are never freed, it does > not need to use its attr field to restore its mode and ownership. Use the > this eventfs_inode's attr as the default ownership for all the files and > directories underneath it. > > When the events eventfs_inode is created, it sets its ownership to its > parent uid and gid. As the events directory is created at boot up before > it gets mounted, this will always be uid=0 and gid=0. If it's created via > an instance, then it will take the ownership of the instance directory. > > When the file system is mounted, it will update all the gids if one is > specified. This will have a callback to update the events evenfs_inode's > default entries. > > When a file or directory is created under the events directory, it will > walk the ei->dentry parents until it finds the evenfs_inode that belongs > to the events directory to retrieve the default uid and gid values. > > Link: > https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/ > Looks good to me. Reviewed-by: Masami Hiramatsu (Google) Tested-by: Masami Hiramatsu (Google) Thank you, > Cc: sta...@vger.kernel.org > Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to > parent uid and gid") > Reported-by: Linus Torvalds > Signed-off-by: Steven Rostedt (Google) > --- > Changes since v1: > https://lore.kernel.org/linux-trace-kernel/20231221173812.51fe6...@gandalf.local.home > > - While writing kselftests to test ownership, I found a bug. The remount >
[PATCH v2] eventfs: Fix file and directory uid and gid ownership
From: "Steven Rostedt (Google)" It was reported that when mounting the tracefs file system with a gid other than root, the ownership did not carry down to the eventfs directory due to the dynamic nature of it. A fix was done to solve this, but it had two issues. (a) if the attr passed into update_inode_attr() was NULL, it didn't do anything. This is true for files that have not had a chown or chgrp done to itself or any of its sibling files, as the attr is allocated for all children when any one needs it. # umount /sys/kernel/tracing # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt # ls -ld /mnt/events/sched drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/ # ls -ld /mnt/events/sched/sched_switch drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/ But when checking the files: # ls -l /mnt/events/sched/sched_switch total 0 -rw-r- 1 root root 0 Dec 21 13:12 enable -rw-r- 1 root root 0 Dec 21 13:12 filter -r--r- 1 root root 0 Dec 21 13:12 format -r--r- 1 root root 0 Dec 21 13:12 hist -r--r- 1 root root 0 Dec 21 13:12 id -rw-r- 1 root root 0 Dec 21 13:12 trigger (b) When the attr does not denote the UID or GID, it defaulted to using the parent uid or gid. This is incorrect as changing the parent uid or gid will automatically change all its children. # chgrp tracing /mnt/events/timer # ls -ld /mnt/events/timer drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer # ls -l /mnt/events/timer total 0 -rw-r- 1 root root0 Dec 21 14:35 enable -rw-r- 1 root root0 Dec 21 14:35 filter drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start At first it was thought that this could be easily fixed by just making the default ownership of the superblock when it was mounted. But this does not handle the case of: # chgrp tracing instances # mkdir instances/foo If the superblock was used, then the group ownership would be that of what it was when it was mounted, when it should instead be "tracing". Instead, set a flag for the top level eventfs directory ("events") to flag which eventfs_inode belongs to it. Since the "events" directory's dentry and inode are never freed, it does not need to use its attr field to restore its mode and ownership. Use the this eventfs_inode's attr as the default ownership for all the files and directories underneath it. When the events eventfs_inode is created, it sets its ownership to its parent uid and gid. As the events directory is created at boot up before it gets mounted, this will always be uid=0 and gid=0. If it's created via an instance, then it will take the ownership of the instance directory. When the file system is mounted, it will update all the gids if one is specified. This will have a callback to update the events evenfs_inode's default entries. When a file or directory is created under the events directory, it will walk the ei->dentry parents until it finds the evenfs_inode that belongs to the events directory to retrieve the default uid and gid values. Link: https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/ Cc: sta...@vger.kernel.org Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid") Reported-by: Linus Torvalds Signed-off-by: Steven Rostedt (Google) --- Changes since v1: https://lore.kernel.org/linux-trace-kernel/20231221173812.51fe6...@gandalf.local.home - While writing kselftests to test ownership, I found a bug. The remount option allows for an update of the gid. If it is specified, then all dentries are traversed in tracefs and eventfs and the gid is updated. The bug is that a eventfs that does not have a dentry, but had its gid updated, the old gid is still stored, and when the dentry is created, it will use the stored gid. Instead, check all eventfs_inodes that do not have a dentry, and traverse its children to make sure their gids are updated as well. fs/tracefs/event_inode.c | 105 +++ fs/tracefs/inode.c | 6 +++ fs/tracefs/internal.h| 2 + 3 files changed, 103 insertions(+), 10 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/
[PATCH] eventfs: Fix file and directory uid and gid ownership
From: "Steven Rostedt (Google)" It was reported that when mounting the tracefs file system with a gid other than root, the ownership did not carry down to the eventfs directory due to the dynamic nature of it. A fix was done to solve this, but it had two issues. (a) if the attr passed into update_inode_attr() was NULL, it didn't do anything. This is true for files that have not had a chown or chgrp done to itself or any of its sibling files, as the attr is allocated for all children when any one needs it. # umount /sys/kernel/tracing # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt # ls -ld /mnt/events/sched drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/ # ls -ld /mnt/events/sched/sched_switch drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/ But when checking the files: # ls -l /mnt/events/sched/sched_switch total 0 -rw-r- 1 root root 0 Dec 21 13:12 enable -rw-r- 1 root root 0 Dec 21 13:12 filter -r--r- 1 root root 0 Dec 21 13:12 format -r--r- 1 root root 0 Dec 21 13:12 hist -r--r- 1 root root 0 Dec 21 13:12 id -rw-r- 1 root root 0 Dec 21 13:12 trigger (b) When the attr does not denote the UID or GID, it defaulted to using the parent uid or gid. This is incorrect as changing the parent uid or gid will automatically change all its children. # chgrp tracing /mnt/events/timer # ls -ld /mnt/events/timer drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer # ls -l /mnt/events/timer total 0 -rw-r- 1 root root0 Dec 21 14:35 enable -rw-r- 1 root root0 Dec 21 14:35 filter drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start At first it was thought that this could be easily fixed by just making the default ownership of the superblock when it was mounted. But this does not handle the case of: # chgrp tracing instances # mkdir instances/foo If the superblock was used, then the group ownership would be that of what it was when it was mounted, when it should instead be "tracing". Instead, set a flag for the top level eventfs directory ("events") to flag which eventfs_inode belongs to it. Since the "events" directory's dentry and inode are never freed, it does not need to use its attr field to restore its mode and ownership. Use the this eventfs_inode's attr as the default ownership for all the files and directories underneath it. When the events eventfs_inode is created, it sets its ownership to its parent uid and gid. As the events directory is created at boot up before it gets mounted, this will always be uid=0 and gid=0. If it's created via an instance, then it will take the ownership of the instance directory. When the file system is mounted, it will update all the gids if one is specified. This will have a callback to update the events evenfs_inode's default entries. When a file or directory is created under the events directory, it will walk the ei->dentry parents until it finds the evenfs_inode that belongs to the events directory to retrieve the default uid and gid values. Link: https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=gtw+anwscylhpfswrq84pjho...@mail.gmail.com/ Cc: sta...@vger.kernel.org Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid") Reported-by: Linus Torvalds Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 74 ++-- fs/tracefs/inode.c | 6 fs/tracefs/internal.h| 2 ++ 3 files changed, 72 insertions(+), 10 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 2ccc849a5bda..2d6b11195420 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -113,7 +113,14 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, * determined by the parent directory. */ if (dentry->d_inode->i_mode & S_IFDIR) { - update_attr(&ei->attr, iattr); + /* +* The events directory dentry is never freed, unless its +* part of an instance that is deleted. It's attr is the +* default for its child fi
Re: [PATCH] eventfs: Have event files and directories default to parent uid and gid
On Wed, 20 Dec 2023 10:50:17 -0500 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Dongliang reported: > > I found that in the latest version, the nodes of tracefs have been > changed to dynamically created. > > This has caused me to encounter a problem where the gid I specified in > the mounting parameters cannot apply to all files, as in the following > situation: > > /data/tmp/events # mount | grep tracefs > tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012) > > gid 3012 = readtracefs > > /data/tmp # ls -lh > total 0 > -r--r- 1 root readtracefs 0 1970-01-01 08:00 README > -r--r- 1 root readtracefs 0 1970-01-01 08:00 available_events > > ums9621_1h10:/data/tmp/events # ls -lh > total 0 > drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer > drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc > > It will prevent certain applications from accessing tracefs properly, I > try to avoid this issue by making the following modifications. > > To fix this, have the files created default to taking the ownership of > the parent dentry unless the ownership was previously set by the user. > > Link: > https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/ > This looks good to me. Acked-by: Masami Hiramatsu (Google) Thank you! > Reported-by: Dongliang Cui > Signed-off-by: Steven Rostedt (Google) > --- > fs/tracefs/event_inode.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c > index 43e237864a42..2ccc849a5bda 100644 > --- a/fs/tracefs/event_inode.c > +++ b/fs/tracefs/event_inode.c > @@ -148,7 +148,8 @@ static const struct file_operations > eventfs_file_operations = { > .release= eventfs_release, > }; > > -static void update_inode_attr(struct inode *inode, struct eventfs_attr > *attr, umode_t mode) > +static void update_inode_attr(struct dentry *dentry, struct inode *inode, > + struct eventfs_attr *attr, umode_t mode) > { > if (!attr) { > inode->i_mode = mode; > @@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, > struct eventfs_attr *attr, um > > if (attr->mode & EVENTFS_SAVE_UID) > inode->i_uid = attr->uid; > + else > + inode->i_uid = d_inode(dentry->d_parent)->i_uid; > > if (attr->mode & EVENTFS_SAVE_GID) > inode->i_gid = attr->gid; > + else > + inode->i_gid = d_inode(dentry->d_parent)->i_gid; > } > > /** > @@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, > umode_t mode, > return eventfs_failed_creating(dentry); > > /* If the user updated the directory's attributes, use them */ > - update_inode_attr(inode, attr, mode); > + update_inode_attr(dentry, inode, attr, mode); > > inode->i_op = &eventfs_file_inode_operations; > inode->i_fop = fop; > @@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode > *ei, struct dentry *parent > return eventfs_failed_creating(dentry); > > /* If the user updated the directory's attributes, use them */ > - update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | > S_IXUGO); > + update_inode_attr(dentry, inode, &ei->attr, > + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); > > inode->i_op = &eventfs_root_dir_inode_operations; > inode->i_fop = &eventfs_file_operations; > -- > 2.42.0 > -- Masami Hiramatsu (Google)
Re: [PATCH] eventfs: Have event files and directories default to parent uid and gid
On Wed, 20 Dec 2023 10:50:17 -0500 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Dongliang reported: > > I found that in the latest version, the nodes of tracefs have been > changed to dynamically created. > > This has caused me to encounter a problem where the gid I specified in > the mounting parameters cannot apply to all files, as in the following > situation: > > /data/tmp/events # mount | grep tracefs > tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012) > > gid 3012 = readtracefs > > /data/tmp # ls -lh > total 0 > -r--r- 1 root readtracefs 0 1970-01-01 08:00 README > -r--r- 1 root readtracefs 0 1970-01-01 08:00 available_events > > ums9621_1h10:/data/tmp/events # ls -lh > total 0 > drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer > drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc > > It will prevent certain applications from accessing tracefs properly, I > try to avoid this issue by making the following modifications. > > To fix this, have the files created default to taking the ownership of > the parent dentry unless the ownership was previously set by the user. > > Link: > https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/ > I forgot to add: Cc: sta...@vger.kernel.org Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode") -- Steve > Reported-by: Dongliang Cui > Signed-off-by: Steven Rostedt (Google) > --- > fs/tracefs/event_inode.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c > index 43e237864a42..2ccc849a5bda 100644 > --- a/fs/tracefs/event_inode.c > +++ b/fs/tracefs/event_inode.c > @@ -148,7 +148,8 @@ static const struct file_operations > eventfs_file_operations = { > .release= eventfs_release, > }; > > -static void update_inode_attr(struct inode *inode, struct eventfs_attr > *attr, umode_t mode) > +static void update_inode_attr(struct dentry *dentry, struct inode *inode, > + struct eventfs_attr *attr, umode_t mode) > { > if (!attr) { > inode->i_mode = mode; > @@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, > struct eventfs_attr *attr, um > > if (attr->mode & EVENTFS_SAVE_UID) > inode->i_uid = attr->uid; > + else > + inode->i_uid = d_inode(dentry->d_parent)->i_uid; > > if (attr->mode & EVENTFS_SAVE_GID) > inode->i_gid = attr->gid; > + else > + inode->i_gid = d_inode(dentry->d_parent)->i_gid; > } > > /** > @@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, > umode_t mode, > return eventfs_failed_creating(dentry); > > /* If the user updated the directory's attributes, use them */ > - update_inode_attr(inode, attr, mode); > + update_inode_attr(dentry, inode, attr, mode); > > inode->i_op = &eventfs_file_inode_operations; > inode->i_fop = fop; > @@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode > *ei, struct dentry *parent > return eventfs_failed_creating(dentry); > > /* If the user updated the directory's attributes, use them */ > - update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | > S_IXUGO); > + update_inode_attr(dentry, inode, &ei->attr, > + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); > > inode->i_op = &eventfs_root_dir_inode_operations; > inode->i_fop = &eventfs_file_operations;
[PATCH] eventfs: Have event files and directories default to parent uid and gid
From: "Steven Rostedt (Google)" Dongliang reported: I found that in the latest version, the nodes of tracefs have been changed to dynamically created. This has caused me to encounter a problem where the gid I specified in the mounting parameters cannot apply to all files, as in the following situation: /data/tmp/events # mount | grep tracefs tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012) gid 3012 = readtracefs /data/tmp # ls -lh total 0 -r--r- 1 root readtracefs 0 1970-01-01 08:00 README -r--r- 1 root readtracefs 0 1970-01-01 08:00 available_events ums9621_1h10:/data/tmp/events # ls -lh total 0 drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc It will prevent certain applications from accessing tracefs properly, I try to avoid this issue by making the following modifications. To fix this, have the files created default to taking the ownership of the parent dentry unless the ownership was previously set by the user. Link: https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang@unisoc.com/ Reported-by: Dongliang Cui Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 43e237864a42..2ccc849a5bda 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -148,7 +148,8 @@ static const struct file_operations eventfs_file_operations = { .release= eventfs_release, }; -static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, umode_t mode) +static void update_inode_attr(struct dentry *dentry, struct inode *inode, + struct eventfs_attr *attr, umode_t mode) { if (!attr) { inode->i_mode = mode; @@ -162,9 +163,13 @@ static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, um if (attr->mode & EVENTFS_SAVE_UID) inode->i_uid = attr->uid; + else + inode->i_uid = d_inode(dentry->d_parent)->i_uid; if (attr->mode & EVENTFS_SAVE_GID) inode->i_gid = attr->gid; + else + inode->i_gid = d_inode(dentry->d_parent)->i_gid; } /** @@ -206,7 +211,7 @@ static struct dentry *create_file(const char *name, umode_t mode, return eventfs_failed_creating(dentry); /* If the user updated the directory's attributes, use them */ - update_inode_attr(inode, attr, mode); + update_inode_attr(dentry, inode, attr, mode); inode->i_op = &eventfs_file_inode_operations; inode->i_fop = fop; @@ -242,7 +247,8 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent return eventfs_failed_creating(dentry); /* If the user updated the directory's attributes, use them */ - update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); + update_inode_attr(dentry, inode, &ei->attr, + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations; -- 2.42.0
Re: [PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0
On Tue, Apr 20, 2021 at 9:47 AM Christian Brauner wrote: > > If there's no objections then Linus can probably just pick up the single > patch here directly: I'm ok with that assuming I don't hear any last-minute concerns.. I'll plan on appling it later today, so anybody with concerns please holler quickly. I don't want to leave it to much later in the week, and I may or may not be functional tomorrow (getting my second vaccine shot, some people react more strongly to it, so I'm leaving the possibility open of not getting out of bed ;) Linus
Re: [PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0
On Tue, Apr 20, 2021 at 08:43:34AM -0500, Serge Hallyn wrote: > cap_setfcap is required to create file capabilities. > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > process running as uid 0 but without cap_setfcap is able to work around > this as follows: unshare a new user namespace which maps parent uid 0 > into the child namespace. While this task will not have new > capabilities against the parent namespace, there is a loophole due to > the way namespaced file capabilities are represented as xattrs. File > capabilities valid in userns 1 are distinguished from file capabilities > valid in userns 2 by the kuid which underlies uid 0. Therefore the > restricted root process can unshare a new self-mapping namespace, add a > namespaced file capability onto a file, then use that file capability in > the parent namespace. > > To prevent that, do not allow mapping parent uid 0 if the process which > opened the uid_map file does not have CAP_SETFCAP, which is the capability > for setting file capabilities. > > As a further wrinkle: a task can unshare its user namespace, then > open its uid_map file itself, and map (only) its own uid. In this > case we do not have the credential from before unshare, which was > potentially more restricted. So, when creating a user namespace, we > record whether the creator had CAP_SETFCAP. Then we can use that > during map_write(). > > With this patch: > > 1. Unprivileged user can still unshare -Ur > > ubuntu@caps:~$ unshare -Ur > root@caps:~# logout > > 2. Root user can still unshare -Ur > > ubuntu@caps:~$ sudo bash > root@caps:/home/ubuntu# unshare -Ur > root@caps:/home/ubuntu# logout > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > unable to set CAP_SETFCAP effective capability: Operation not permitted > root@caps:/home/ubuntu# unshare -Ur > unshare: write failed /proc/self/uid_map: Operation not permitted > > Note: an alternative solution would be to allow uid 0 mappings by > processes without CAP_SETFCAP, but to prevent such a namespace from > writing any file capabilities. This approach can be seen here: > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > History: > > Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file > capabilities") tried to fix the issue by preventing v3 fscaps to be > written to disk when the root uid would map to the same uid in nested > user namespaces. This led to regressions for various workloads. For > example, see [1]. Ultimately this is a valid use-case we have to support > meaning we had to revert this change in 3b0c2d3eaa83 ("Revert > 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file > capabilities")"). > > [1]: https://github.com/containers/buildah/issues/3071 > > Signed-off-by: Serge Hallyn > Reviewed-by: Andrew G. Morgan > Tested-by: Christian Brauner > Reviewed-by: Christian Brauner > Tested-by: Giuseppe Scrivano > Cc: "Eric W. Biederman" If there's no objections then Linus can probably just pick up the single patch here directly: https://lore.kernel.org/lkml/20210420134334.ga11...@mail.hallyn.com I'm not sure it's worth waiting and releasing another kernel with this bug. This tigthens the semantics nicely and makes for a simple check at userns creation time instead of repeatedly checking at setxattr(). With all the testing done we can be quite confident the risk of regressions is way lower than the old patch and even if we see one I think this version of the fix is actually worth the risk. Christian
[PATCH v3.4] capabilities: require CAP_SETFCAP to map uid 0
cap_setfcap is required to create file capabilities. Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen here: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 History: Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file capabilities") tried to fix the issue by preventing v3 fscaps to be written to disk when the root uid would map to the same uid in nested user namespaces. This led to regressions for various workloads. For example, see [1]. Ultimately this is a valid use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"). [1]: https://github.com/containers/buildah/issues/3071 Signed-off-by: Serge Hallyn Reviewed-by: Andrew G. Morgan Tested-by: Christian Brauner Reviewed-by: Christian Brauner Tested-by: Giuseppe Scrivano Cc: "Eric W. Biederman" Changelog: * fix logic in the case of writing to another task's uid_map * rename 'ns' to 'map_ns', and make a file_ns local variable * use /* comments */ * update the CAP_SETFCAP comment in capability.h * rename parent_unpriv to parent_can_setfcap (and reverse the logic) * remove printks * clarify (i hope) the code comments * update capability.h comment * renamed parent_can_setfcap to parent_could_setfcap * made the check its own disallowed_0_mapping() fn * moved the check into new_idmap_permitted * rename disallowed_0_mapping to verify_root_mapping * change verify_root_mapping to Christian's suggested flow * correct+clarify comments: parent uid 0 mapping to any child uid is a problem. * remove unused lower_first variable. --- include/linux/user_namespace.h | 3 ++ include/uapi/linux/capability.h | 3 +- kernel/user_namespace.c | 65 +++-- 3 files changed, 67 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 64cf8ebdc4ec..f6c5f784be5a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,9 @@ struct user_namespace { kgid_t group; struct ns_commonns; unsigned long flags; + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP +* in its effective capability set at the child ns creation time. */ + boolparent_could_setfcap; #ifdef CONFIG_KEYS /* List of joinable keyrings in this namespace. Modification access of diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index c6ca33034147..2ddb4226cd23 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_CONTROL30 -/* Set or remove ca
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
On Mon, Apr 19, 2021 at 05:52:39PM +0200, Giuseppe Scrivano wrote: > ebied...@xmission.com (Eric W. Biederman) writes: > > > Guiseppe can you take a look at this? > > > > This is a second attempt at tightening up the semantics of writing to > > file capabilities from a user namespace. > > > > The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c > > ("capabilities: Don't allow writing ambiguous v3 file capabilities")"), > > which corrected the issue reported in: > > https://github.com/containers/buildah/issues/3071 > > > > There is a report the podman testsuite passes. While different this > > looks in many ways much more strict than the code that was reverted. So > > while I can imagine this change doesn't cause problems as is, I will be > > surprised. > > thanks for pulling me in the discussion. > > I've tested the patch with several cases similar to the issue we had in > the past and the patch seems to work well. > > Podman creates all the user namespaces within the same parent user > namespace. In the parent user namespace all the capabilities are kept > and AFAIK Docker does the same. I'd expect a change in behavior only > for nested user namespaces in containers where CAP_SETFCAP is not > granted, but that is not a common configuration given that CAP_SETFCAP > is added by default. > > > > "Serge E. Hallyn" writes: > > > >> +/** > >> + * verify_root_map() - check the uid 0 mapping > >> + * @file: idmapping file > >> + * @map_ns: user namespace of the target process > >> + * @new_map: requested idmap > >> + * > >> + * If a process requested a mapping for uid 0 onto uid 0, verify that the > >> + * process writing the map had the CAP_SETFCAP capability as the target > >> process > >> + * will be able to write fscaps that are valid in ancestor user > >> namespaces. > >> + * > >> + * Return: true if the mapping is allowed, false if not. > >> + */ > >> +static bool verify_root_map(const struct file *file, > >> + struct user_namespace *map_ns, > >> + struct uid_gid_map *new_map) > >> +{ > >> + int idx; > >> + const struct user_namespace *file_ns = file->f_cred->user_ns; > >> + struct uid_gid_extent *extent0 = NULL; > >> + > >> + for (idx = 0; idx < new_map->nr_extents; idx++) { > >> + u32 lower_first; > > nit: lower_first seems unused? Drat - I noticed that Sunday or Monday and forgot to remove it, thanks. > >> + > >> + if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) > >> + extent0 = &new_map->extent[idx]; > >> + else > >> + extent0 = &new_map->forward[idx]; > >> + if (extent0->lower_first == 0) > >> + break; > >> + > >> + extent0 = NULL; > >> + } > > Tested-by: Giuseppe Scrivano Awesome - thanks for testing.
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 10:42:08PM -0500, Serge Hallyn wrote: > On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote: > > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > > > cap_setfcap is required to create file capabilities. > > > > > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > > > process running as uid 0 but without cap_setfcap is able to work around > > > this as follows: unshare a new user namespace which maps parent uid 0 > > > into the child namespace. While this task will not have new > > > capabilities against the parent namespace, there is a loophole due to > > > the way namespaced file capabilities are represented as xattrs. File > > > capabilities valid in userns 1 are distinguished from file capabilities > > > valid in userns 2 by the kuid which underlies uid 0. Therefore the > > > restricted root process can unshare a new self-mapping namespace, add a > > > namespaced file capability onto a file, then use that file capability in > > > the parent namespace. > > > > > > To prevent that, do not allow mapping parent uid 0 if the process which > > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > > for setting file capabilities. > > > > > > As a further wrinkle: a task can unshare its user namespace, then > > > open its uid_map file itself, and map (only) its own uid. In this > > > case we do not have the credential from before unshare, which was > > > potentially more restricted. So, when creating a user namespace, we > > > record whether the creator had CAP_SETFCAP. Then we can use that > > > during map_write(). > > > > > > With this patch: > > > > > > 1. Unprivileged user can still unshare -Ur > > > > > > ubuntu@caps:~$ unshare -Ur > > > root@caps:~# logout > > > > > > 2. Root user can still unshare -Ur > > > > > > ubuntu@caps:~$ sudo bash > > > root@caps:/home/ubuntu# unshare -Ur > > > root@caps:/home/ubuntu# logout > > > > > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > > root@caps:/home/ubuntu# unshare -Ur > > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > > > Note: an alternative solution would be to allow uid 0 mappings by > > > processes without CAP_SETFCAP, but to prevent such a namespace from > > > writing any file capabilities. This approach can be seen here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > > > > > Ah, can you link to the previous fix and its revert, please? I think > > that was mentioned in the formerly private thread as well but we forgot: > > > > commit 95ebabde382c371572297915b104e55403674e73 > > Author: Eric W. Biederman > > Date: Thu Dec 17 09:42:00 2020 -0600 > > > > capabilities: Don't allow writing ambiguous v3 file capabilities > > > > commit 3b0c2d3eaa83da259d7726192cf55a137769012f > > Author: Eric W. Biederman > > Date: Fri Mar 12 15:07:09 2021 -0600 > > > > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 > > file capabilities") > > Sure. > > Is there a tag for that kind of thing or do I just mention it at the end > of the description? In this case it might make sense to just have a little paragraph that explains the regression. How do you feel about adding?: Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file capabilities") tried to fix the issue by preventing v3 fscaps to be written to disk when the root uid would map to the same uid in nested user namespaces. This led to regressions for various workloads. For example, see [1]. Ultimately this is a valid use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"). [1]: https://github.com/containers/buildah/issues/3071
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote: > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > > cap_setfcap is required to create file capabilities. > > > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > > process running as uid 0 but without cap_setfcap is able to work around > > this as follows: unshare a new user namespace which maps parent uid 0 > > into the child namespace. While this task will not have new > > capabilities against the parent namespace, there is a loophole due to > > the way namespaced file capabilities are represented as xattrs. File > > capabilities valid in userns 1 are distinguished from file capabilities > > valid in userns 2 by the kuid which underlies uid 0. Therefore the > > restricted root process can unshare a new self-mapping namespace, add a > > namespaced file capability onto a file, then use that file capability in > > the parent namespace. > > > > To prevent that, do not allow mapping parent uid 0 if the process which > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > for setting file capabilities. > > > > As a further wrinkle: a task can unshare its user namespace, then > > open its uid_map file itself, and map (only) its own uid. In this > > case we do not have the credential from before unshare, which was > > potentially more restricted. So, when creating a user namespace, we > > record whether the creator had CAP_SETFCAP. Then we can use that > > during map_write(). > > > > With this patch: > > > > 1. Unprivileged user can still unshare -Ur > > > > ubuntu@caps:~$ unshare -Ur > > root@caps:~# logout > > > > 2. Root user can still unshare -Ur > > > > ubuntu@caps:~$ sudo bash > > root@caps:/home/ubuntu# unshare -Ur > > root@caps:/home/ubuntu# logout > > > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > root@caps:/home/ubuntu# unshare -Ur > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > Note: an alternative solution would be to allow uid 0 mappings by > > processes without CAP_SETFCAP, but to prevent such a namespace from > > writing any file capabilities. This approach can be seen here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > > Ah, can you link to the previous fix and its revert, please? I think > that was mentioned in the formerly private thread as well but we forgot: > > commit 95ebabde382c371572297915b104e55403674e73 > Author: Eric W. Biederman > Date: Thu Dec 17 09:42:00 2020 -0600 > > capabilities: Don't allow writing ambiguous v3 file capabilities > > commit 3b0c2d3eaa83da259d7726192cf55a137769012f > Author: Eric W. Biederman > Date: Fri Mar 12 15:07:09 2021 -0600 > > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file > capabilities") Sure. Is there a tag for that kind of thing or do I just mention it at the end of the description?
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > cap_setfcap is required to create file capabilities. > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > process running as uid 0 but without cap_setfcap is able to work around > this as follows: unshare a new user namespace which maps parent uid 0 > into the child namespace. While this task will not have new > capabilities against the parent namespace, there is a loophole due to > the way namespaced file capabilities are represented as xattrs. File > capabilities valid in userns 1 are distinguished from file capabilities > valid in userns 2 by the kuid which underlies uid 0. Therefore the > restricted root process can unshare a new self-mapping namespace, add a > namespaced file capability onto a file, then use that file capability in > the parent namespace. > > To prevent that, do not allow mapping parent uid 0 if the process which > opened the uid_map file does not have CAP_SETFCAP, which is the capability > for setting file capabilities. > > As a further wrinkle: a task can unshare its user namespace, then > open its uid_map file itself, and map (only) its own uid. In this > case we do not have the credential from before unshare, which was > potentially more restricted. So, when creating a user namespace, we > record whether the creator had CAP_SETFCAP. Then we can use that > during map_write(). > > With this patch: > > 1. Unprivileged user can still unshare -Ur > > ubuntu@caps:~$ unshare -Ur > root@caps:~# logout > > 2. Root user can still unshare -Ur > > ubuntu@caps:~$ sudo bash > root@caps:/home/ubuntu# unshare -Ur > root@caps:/home/ubuntu# logout > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > unable to set CAP_SETFCAP effective capability: Operation not permitted > root@caps:/home/ubuntu# unshare -Ur > unshare: write failed /proc/self/uid_map: Operation not permitted > > Note: an alternative solution would be to allow uid 0 mappings by > processes without CAP_SETFCAP, but to prevent such a namespace from > writing any file capabilities. This approach can be seen here: > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > Ah, can you link to the previous fix and its revert, please? I think that was mentioned in the formerly private thread as well but we forgot: commit 95ebabde382c371572297915b104e55403674e73 Author: Eric W. Biederman Date: Thu Dec 17 09:42:00 2020 -0600 capabilities: Don't allow writing ambiguous v3 file capabilities commit 3b0c2d3eaa83da259d7726192cf55a137769012f Author: Eric W. Biederman Date: Fri Mar 12 15:07:09 2021 -0600 Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities") > Signed-off-by: Serge Hallyn > Reviewed-by: Andrew G. Morgan > Tested-by: Christian Brauner > Reviewed-by: Christian Brauner > Cc: "Eric W. Biederman"
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
On Mon, Apr 19, 2021 at 05:52:39PM +0200, Giuseppe Scrivano wrote: > ebied...@xmission.com (Eric W. Biederman) writes: > > > Guiseppe can you take a look at this? > > > > This is a second attempt at tightening up the semantics of writing to > > file capabilities from a user namespace. > > > > The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c > > ("capabilities: Don't allow writing ambiguous v3 file capabilities")"), > > which corrected the issue reported in: > > https://github.com/containers/buildah/issues/3071 > > > > There is a report the podman testsuite passes. While different this > > looks in many ways much more strict than the code that was reverted. So > > while I can imagine this change doesn't cause problems as is, I will be > > surprised. > > thanks for pulling me in the discussion. > > I've tested the patch with several cases similar to the issue we had in > the past and the patch seems to work well. > > Podman creates all the user namespaces within the same parent user > namespace. In the parent user namespace all the capabilities are kept > and AFAIK Docker does the same. I'd expect a change in behavior only > for nested user namespaces in containers where CAP_SETFCAP is not > granted, but that is not a common configuration given that CAP_SETFCAP > is added by default. Same for us and we do have extensive nested container workloads with other runtimes running containers too. > > > > "Serge E. Hallyn" writes: > > > >> +/** > >> + * verify_root_map() - check the uid 0 mapping > >> + * @file: idmapping file > >> + * @map_ns: user namespace of the target process > >> + * @new_map: requested idmap > >> + * > >> + * If a process requested a mapping for uid 0 onto uid 0, verify that the > >> + * process writing the map had the CAP_SETFCAP capability as the target > >> process > >> + * will be able to write fscaps that are valid in ancestor user > >> namespaces. > >> + * > >> + * Return: true if the mapping is allowed, false if not. > >> + */ > >> +static bool verify_root_map(const struct file *file, > >> + struct user_namespace *map_ns, > >> + struct uid_gid_map *new_map) > >> +{ > >> + int idx; > >> + const struct user_namespace *file_ns = file->f_cred->user_ns; > >> + struct uid_gid_extent *extent0 = NULL; > >> + > >> + for (idx = 0; idx < new_map->nr_extents; idx++) { > >> + u32 lower_first; > > nit: lower_first seems unused? > > >> + > >> + if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) > >> + extent0 = &new_map->extent[idx]; > >> + else > >> + extent0 = &new_map->forward[idx]; > >> + if (extent0->lower_first == 0) > >> + break; > >> + > >> + extent0 = NULL; > >> + } > > Tested-by: Giuseppe Scrivano Thanks for running the tests and confirming my results! Christian
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
ebied...@xmission.com (Eric W. Biederman) writes: > Guiseppe can you take a look at this? > > This is a second attempt at tightening up the semantics of writing to > file capabilities from a user namespace. > > The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c > ("capabilities: Don't allow writing ambiguous v3 file capabilities")"), > which corrected the issue reported in: > https://github.com/containers/buildah/issues/3071 > > There is a report the podman testsuite passes. While different this > looks in many ways much more strict than the code that was reverted. So > while I can imagine this change doesn't cause problems as is, I will be > surprised. thanks for pulling me in the discussion. I've tested the patch with several cases similar to the issue we had in the past and the patch seems to work well. Podman creates all the user namespaces within the same parent user namespace. In the parent user namespace all the capabilities are kept and AFAIK Docker does the same. I'd expect a change in behavior only for nested user namespaces in containers where CAP_SETFCAP is not granted, but that is not a common configuration given that CAP_SETFCAP is added by default. > "Serge E. Hallyn" writes: > >> +/** >> + * verify_root_map() - check the uid 0 mapping >> + * @file: idmapping file >> + * @map_ns: user namespace of the target process >> + * @new_map: requested idmap >> + * >> + * If a process requested a mapping for uid 0 onto uid 0, verify that the >> + * process writing the map had the CAP_SETFCAP capability as the target >> process >> + * will be able to write fscaps that are valid in ancestor user namespaces. >> + * >> + * Return: true if the mapping is allowed, false if not. >> + */ >> +static bool verify_root_map(const struct file *file, >> +struct user_namespace *map_ns, >> +struct uid_gid_map *new_map) >> +{ >> +int idx; >> +const struct user_namespace *file_ns = file->f_cred->user_ns; >> +struct uid_gid_extent *extent0 = NULL; >> + >> +for (idx = 0; idx < new_map->nr_extents; idx++) { >> +u32 lower_first; nit: lower_first seems unused? >> + >> +if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) >> +extent0 = &new_map->extent[idx]; >> +else >> +extent0 = &new_map->forward[idx]; >> +if (extent0->lower_first == 0) >> +break; >> + >> +extent0 = NULL; >> +} Tested-by: Giuseppe Scrivano Regards, Giuseppe
[PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
cap_setfcap is required to create file capabilities. Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen here: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 Signed-off-by: Serge Hallyn Reviewed-by: Andrew G. Morgan Tested-by: Christian Brauner Reviewed-by: Christian Brauner Cc: "Eric W. Biederman" Changelog: * fix logic in the case of writing to another task's uid_map * rename 'ns' to 'map_ns', and make a file_ns local variable * use /* comments */ * update the CAP_SETFCAP comment in capability.h * rename parent_unpriv to parent_can_setfcap (and reverse the logic) * remove printks * clarify (i hope) the code comments * update capability.h comment * renamed parent_can_setfcap to parent_could_setfcap * made the check its own disallowed_0_mapping() fn * moved the check into new_idmap_permitted * rename disallowed_0_mapping to verify_root_mapping * change verify_root_mapping to Christian's suggested flow * correct+clarify comments: parent uid 0 mapping to any child uid is a problem. --- include/linux/user_namespace.h | 3 ++ include/uapi/linux/capability.h | 3 +- kernel/user_namespace.c | 67 +++-- 3 files changed, 69 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 64cf8ebdc4ec..f6c5f784be5a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,9 @@ struct user_namespace { kgid_t group; struct ns_commonns; unsigned long flags; + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP +* in its effective capability set at the child ns creation time. */ + boolparent_could_setfcap; #ifdef CONFIG_KEYS /* List of joinable keyrings in this namespace. Modification access of diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index c6ca33034147..2ddb4226cd23 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_CONTROL30 -/* Set or remove capabilities on files */ +/* Set or remove capabilities on files. + Map uid=0 into a child user namespace. */ #define CAP_SETFCAP 31 diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index af612945a4d0..609a729a9879 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -106,6 +106,7 @@ int create_user_ns(struct cred *new) if (!ns) goto fail_dec; + ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP); ret = ns_alloc_inum(&ns->ns); if (ret) goto fail_free; @@ -841,6 +842,62 @@ static int sort_idmaps(struct uid_gid_map *map) retu
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
Guiseppe can you take a look at this? This is a second attempt at tightening up the semantics of writing to file capabilities from a user namespace. The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"), which corrected the issue reported in: https://github.com/containers/buildah/issues/3071 There is a report the podman testsuite passes. While different this looks in many ways much more strict than the code that was reverted. So while I can imagine this change doesn't cause problems as is, I will be surprised. Eric "Serge E. Hallyn" writes: > A process running as uid 0 but without cap_setfcap currently can simply > unshare a new user namespace with uid 0 mapped to 0. While this task > will not have new capabilities against the parent namespace, there is > a loophole due to the way namespaced file capabilities work. File > capabilities valid in userns 1 are distinguised from file capabilities > valid in userns 2 by the kuid which underlies uid 0. Therefore > the restricted root process can unshare a new self-mapping namespace, > add a namespaced file capability onto a file, then use that file > capability in the parent namespace. > > To prevent that, do not allow mapping uid 0 if the process which > opened the uid_map file does not have CAP_SETFCAP, which is the capability > for setting file capabilities. > > A further wrinkle: a task can unshare its user namespace, then > open its uid_map file itself, and map (only) its own uid. In this > case we do not have the credential from before unshare, which was > potentially more restricted. So, when creating a user namespace, we > record whether the creator had CAP_SETFCAP. Then we can use that > during map_write(). > > With this patch: > > 1. unprivileged user can still unshare -Ur > > ubuntu@caps:~$ unshare -Ur > root@caps:~# logout > > 2. root user can still unshare -Ur > > ubuntu@caps:~$ sudo bash > root@caps:/home/ubuntu# unshare -Ur > root@caps:/home/ubuntu# logout > > 3. root user without CAP_SETFCAP cannot unshare -Ur: > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > unable to set CAP_SETFCAP effective capability: Operation not permitted > root@caps:/home/ubuntu# unshare -Ur > unshare: write failed /proc/self/uid_map: Operation not permitted > > Signed-off-by: Serge Hallyn > > Changelog: >* fix logic in the case of writing to another task's uid_map >* rename 'ns' to 'map_ns', and make a file_ns local variable >* use /* comments */ >* update the CAP_SETFCAP comment in capability.h >* rename parent_unpriv to parent_can_setfcap (and reverse the > logic) >* remove printks >* clarify (i hope) the code comments >* update capability.h comment >* renamed parent_can_setfcap to parent_could_setfcap >* made the check its own disallowed_0_mapping() fn >* moved the check into new_idmap_permitted >* rename disallowed_0_mapping to verify_root_mapping >* change verify_root_mapping to Christian's suggested flow > --- > include/linux/user_namespace.h | 3 ++ > include/uapi/linux/capability.h | 3 +- > kernel/user_namespace.c | 66 +++-- > 3 files changed, 68 insertions(+), 4 deletions(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index 64cf8ebdc4ec..f6c5f784be5a 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -63,6 +63,9 @@ struct user_namespace { > kgid_t group; > struct ns_commonns; > unsigned long flags; > + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP > + * in its effective capability set at the child ns creation time. */ > + boolparent_could_setfcap; > > #ifdef CONFIG_KEYS > /* List of joinable keyrings in this namespace. Modification access of > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h > index c6ca33034147..2ddb4226cd23 100644 > --- a/include/uapi/linux/capability.h > +++ b/include/uapi/linux/capability.h > @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { > > #define CAP_AUDIT_CONTROL30 > > -/* Set or remove capabilities on files */ > +/* Set or remove capabilities on files. > + Map uid=0 into a child user namespace. */ > > #define CAP_SETFCAP 31 > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index af612945a4d0..2ead291177b0 100644 > --- a/kernel/us
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
On Sat, Apr 17, 2021 at 03:04:34PM -0500, Serge Hallyn wrote: > A process running as uid 0 but without cap_setfcap currently can simply > unshare a new user namespace with uid 0 mapped to 0. While this task > will not have new capabilities against the parent namespace, there is > a loophole due to the way namespaced file capabilities work. File > capabilities valid in userns 1 are distinguised from file capabilities > valid in userns 2 by the kuid which underlies uid 0. Therefore > the restricted root process can unshare a new self-mapping namespace, > add a namespaced file capability onto a file, then use that file > capability in the parent namespace. > > To prevent that, do not allow mapping uid 0 if the process which > opened the uid_map file does not have CAP_SETFCAP, which is the capability > for setting file capabilities. > > A further wrinkle: a task can unshare its user namespace, then > open its uid_map file itself, and map (only) its own uid. In this > case we do not have the credential from before unshare, which was > potentially more restricted. So, when creating a user namespace, we > record whether the creator had CAP_SETFCAP. Then we can use that > during map_write(). > > With this patch: > > 1. unprivileged user can still unshare -Ur > > ubuntu@caps:~$ unshare -Ur > root@caps:~# logout > > 2. root user can still unshare -Ur > > ubuntu@caps:~$ sudo bash > root@caps:/home/ubuntu# unshare -Ur > root@caps:/home/ubuntu# logout > > 3. root user without CAP_SETFCAP cannot unshare -Ur: > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > unable to set CAP_SETFCAP effective capability: Operation not permitted > root@caps:/home/ubuntu# unshare -Ur > unshare: write failed /proc/self/uid_map: Operation not permitted > > Signed-off-by: Serge Hallyn > > Changelog: >* fix logic in the case of writing to another task's uid_map >* rename 'ns' to 'map_ns', and make a file_ns local variable >* use /* comments */ >* update the CAP_SETFCAP comment in capability.h >* rename parent_unpriv to parent_can_setfcap (and reverse the > logic) >* remove printks >* clarify (i hope) the code comments >* update capability.h comment >* renamed parent_can_setfcap to parent_could_setfcap >* made the check its own disallowed_0_mapping() fn >* moved the check into new_idmap_permitted >* rename disallowed_0_mapping to verify_root_mapping >* change verify_root_mapping to Christian's suggested flow > --- Thank you. This looks good. I tested this with: - fstests - LXD testsuite - Podman testsuite - libcap testsuite Tested-by: Christian Brauner Reviewed-by: Christian Brauner
RE: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
> -Original Message- > From: Viktor Mihajlovski > Sent: Saturday, April 17, 2021 4:18 PM > To: K, Narendra; Niklas Schnelle; Bjorn Helgaas > Cc: Stefan Raspl; Peter Oberparleiter; linux-net...@vger.kernel.org; linux- > p...@vger.kernel.org; linux-kernel@vger.kernel.org; linux- > s...@vger.kernel.org > Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index > > > [EXTERNAL EMAIL] > > > > On 4/16/21 7:58 PM, K, Narendra wrote: > >> -Original Message- > >> From: Niklas Schnelle > >> Sent: Thursday, April 15, 2021 12:55 PM > >> To: Bjorn Helgaas > >> Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter > >> Oberparleiter; linux- net...@vger.kernel.org; > >> linux-...@vger.kernel.org; linux- ker...@vger.kernel.org; > >> linux-s...@vger.kernel.org > >> Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its > >> index > >> > >> > >> [EXTERNAL EMAIL] > >> > >> On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote: > >>> On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote: > >>>> On s390 each PCI device has a user-defined ID (UID) exposed under > >>>> /sys/bus/pci/devices//uid. This ID was designed to serve as > >>>> the PCI device's primary index and to match the device within Linux > >>>> to the device configured in the hypervisor. To serve as a primary > >>>> identifier the UID must be unique within the Linux instance, this > >>>> is guaranteed by the platform if and only if the UID Uniqueness > >>>> Checking flag is set within the CLP List PCI Functions response. > >>>> > >>>> In this sense the UID serves an analogous function as the SMBIOS > >>>> instance number or ACPI index exposed as the "index" respectively > >>>> "acpi_index" device attributes and used by e.g. systemd to set > >>>> interface names. As s390 does not use and will likely never use > >>>> ACPI nor SMBIOS there is no conflict and we can just expose the UID > >>>> under the > >> "index" > >>>> attribute whenever UID Uniqueness Checking is active and get > >>>> systemd's interface naming support for free. > >>>> > >>>> Signed-off-by: Niklas Schnelle > >>>> Acked-by: Viktor Mihajlovski > >>> > >>> This seems like a nice solution to me. > >>> > >>> Acked-by: Bjorn Helgaas > >> > >> Thanks! Yes I agree it's a simple solution that also makes sense from > >> a design point. I'll wait for Narendra's opinion of course. > > > > Hi Niklas, > > > > It seems like the UID and the index are not equivalent in their meaning. > > > Hi Narendra, > > the reasoning behind the wish to reuse index is that we think it's similar in > the > sense that it provides a stable, firmware-reported interface identifier. > Some background: s390 is a highly virtualized platform. There's no traditional > bare metal mode of operation, neither for the computer system nor the I/O > subsystem. > The equivalent to a standalone system is a logical partition (LPAR) which in > essence is a kind of virtual machine. LPARs access I/O devices in a > virtualized > fashion as well. E.g., for PCI devices the I/O subsystem is responsible for > the > management of PCI hardware and will provide the PCI functions to the LPARs. > This is where UID and function_id come into play. Each PCI function will > carry a > function_id attribute which defines it uniquely within the entire s390 > system. If > a UID is configured for the function, it must be unique within an LPAR, but > doesn't need to be unique system-wide. > For instance, the admistrator may want to ensure that for every LPAR the > primary ethernet interface has the same identifier, to allow miigration of > Linux > instances accross LPARs. > This can't be achieved with a slot-based name. > > The index SMBIOS type 41 device type instance indicates that > > > > 1. The device is an onboard device. > > 2. A specific order of the onboard devices. > > > > The UID described seems to indicate that the PCI device is unique within the > Linux instance (sufficient for naming). > > But it does not indicate that PCI device is onboard and any order. > > > > As all devices with UID will be named as eno (Ethernet onboard), > > names are not in sync with how each PCI function is
[PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)
A process running as uid 0 but without cap_setfcap currently can simply unshare a new user namespace with uid 0 mapped to 0. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities work. File capabilities valid in userns 1 are distinguised from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. A further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Signed-off-by: Serge Hallyn Changelog: * fix logic in the case of writing to another task's uid_map * rename 'ns' to 'map_ns', and make a file_ns local variable * use /* comments */ * update the CAP_SETFCAP comment in capability.h * rename parent_unpriv to parent_can_setfcap (and reverse the logic) * remove printks * clarify (i hope) the code comments * update capability.h comment * renamed parent_can_setfcap to parent_could_setfcap * made the check its own disallowed_0_mapping() fn * moved the check into new_idmap_permitted * rename disallowed_0_mapping to verify_root_mapping * change verify_root_mapping to Christian's suggested flow --- include/linux/user_namespace.h | 3 ++ include/uapi/linux/capability.h | 3 +- kernel/user_namespace.c | 66 +++-- 3 files changed, 68 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 64cf8ebdc4ec..f6c5f784be5a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,9 @@ struct user_namespace { kgid_t group; struct ns_commonns; unsigned long flags; + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP +* in its effective capability set at the child ns creation time. */ + boolparent_could_setfcap; #ifdef CONFIG_KEYS /* List of joinable keyrings in this namespace. Modification access of diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index c6ca33034147..2ddb4226cd23 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_CONTROL30 -/* Set or remove capabilities on files */ +/* Set or remove capabilities on files. + Map uid=0 into a child user namespace. */ #define CAP_SETFCAP 31 diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index af612945a4d0..2ead291177b0 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -106,6 +106,7 @@ int create_user_ns(struct cred *new) if (!ns) goto fail_dec; + ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP); ret = ns_alloc_inum(&ns->ns); if (ret) goto fail_free; @@ -841,6 +842,61 @@ static int sort_idmaps(struct uid_gid_map *map) return 0; } +/** + * verify_root_map() - check the uid 0 mapping + * @file: idmapping file + * @map_ns: user namespace of the target process + * @new_map: requested idmap + * + * If a process requested a mapping for uid 0 onto uid 0, verify that the + * process writing the map had the CAP_SETFCAP capability as the target process + * will be able to write fscaps that are valid in ancestor user namespaces. + * + * Return: true if the mapping is allowed, false if not. + */ +static bool verify_root_map(const struct file *file, + struct user_namespace *map_ns, + struct uid_gid_map *new_map) +{ + int idx; + const struct user_namespace *file_ns = file->f_cred-&g
Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
On 4/16/21 7:58 PM, K, Narendra wrote: -Original Message- From: Niklas Schnelle Sent: Thursday, April 15, 2021 12:55 PM To: Bjorn Helgaas Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter Oberparleiter; linux- net...@vger.kernel.org; linux-...@vger.kernel.org; linux- ker...@vger.kernel.org; linux-s...@vger.kernel.org Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index [EXTERNAL EMAIL] On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote: On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote: On s390 each PCI device has a user-defined ID (UID) exposed under /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI device's primary index and to match the device within Linux to the device configured in the hypervisor. To serve as a primary identifier the UID must be unique within the Linux instance, this is guaranteed by the platform if and only if the UID Uniqueness Checking flag is set within the CLP List PCI Functions response. In this sense the UID serves an analogous function as the SMBIOS instance number or ACPI index exposed as the "index" respectively "acpi_index" device attributes and used by e.g. systemd to set interface names. As s390 does not use and will likely never use ACPI nor SMBIOS there is no conflict and we can just expose the UID under the "index" attribute whenever UID Uniqueness Checking is active and get systemd's interface naming support for free. Signed-off-by: Niklas Schnelle Acked-by: Viktor Mihajlovski This seems like a nice solution to me. Acked-by: Bjorn Helgaas Thanks! Yes I agree it's a simple solution that also makes sense from a design point. I'll wait for Narendra's opinion of course. Hi Niklas, It seems like the UID and the index are not equivalent in their meaning. Hi Narendra, the reasoning behind the wish to reuse index is that we think it's similar in the sense that it provides a stable, firmware-reported interface identifier. Some background: s390 is a highly virtualized platform. There's no traditional bare metal mode of operation, neither for the computer system nor the I/O subsystem. The equivalent to a standalone system is a logical partition (LPAR) which in essence is a kind of virtual machine. LPARs access I/O devices in a virtualized fashion as well. E.g., for PCI devices the I/O subsystem is responsible for the management of PCI hardware and will provide the PCI functions to the LPARs. This is where UID and function_id come into play. Each PCI function will carry a function_id attribute which defines it uniquely within the entire s390 system. If a UID is configured for the function, it must be unique within an LPAR, but doesn't need to be unique system-wide. For instance, the admistrator may want to ensure that for every LPAR the primary ethernet interface has the same identifier, to allow miigration of Linux instances accross LPARs. This can't be achieved with a slot-based name. The index SMBIOS type 41 device type instance indicates that 1. The device is an onboard device. 2. A specific order of the onboard devices. The UID described seems to indicate that the PCI device is unique within the Linux instance (sufficient for naming). But it does not indicate that PCI device is onboard and any order. As all devices with UID will be named as eno (Ethernet onboard), names are not in sync with how each PCI function is exposed on a PCI slot (appears closer to SMBIOS type 9 record) as described below. https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc In summary, it seems like the eno names on s390 will be unique names, but may not match the meaning of the index. Also, a) Will UID remain unique/same as initially exposed across reboots ? Yes, the UID is the primary identifier for a PCI function and part of the LPAR configuration. Even hotplug will not change the UID. b) Is the reuse of index related to the 'slots' based naming scheme described below ? https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc No, this is independent. The commit above fixes the slot name parsing. c) If the slot based naming is fixed with the naming scheme from changes below, any thoughts on why is reuse of index needed ? https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc As I wrote above, the slot describes the PCI function at the system level, whereas the uid/index does it in the context off the LPAR. > With regards, Narendra K -- Kind Regards, Viktor
Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)
On Fri, Apr 16, 2021 at 04:34:53PM -0500, Serge E. Hallyn wrote: > On Fri, Apr 16, 2021 at 05:05:01PM +0200, Christian Brauner wrote: > > On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote: > > > (Eric - this patch (v3) is a cleaned up version of the previous approach. > > > v4 is at > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > and is the approach you suggested. I can send it also as a separate patch > > > if you like) > > > > > > A process running as uid 0 but without cap_setfcap currently can simply > > > unshare a new user namespace with uid 0 mapped to 0. While this task > > > will not have new capabilities against the parent namespace, there is > > > a loophole due to the way namespaced file capabilities work. File > > > capabilities valid in userns 1 are distinguised from file capabilities > > > valid in userns 2 by the kuid which underlies uid 0. Therefore > > > the restricted root process can unshare a new self-mapping namespace, > > > add a namespaced file capability onto a file, then use that file > > > capability in the parent namespace. > > > > > > To prevent that, do not allow mapping uid 0 if the process which > > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > > for setting file capabilities. > > > > > > A further wrinkle: a task can unshare its user namespace, then > > > open its uid_map file itself, and map (only) its own uid. In this > > > case we do not have the credential from before unshare, which was > > > potentially more restricted. So, when creating a user namespace, we > > > record whether the creator had CAP_SETFCAP. Then we can use that > > > during map_write(). > > > > > > With this patch: > > > > > > 1. unprivileged user can still unshare -Ur > > > > > > ubuntu@caps:~$ unshare -Ur > > > root@caps:~# logout > > > > > > 2. root user can still unshare -Ur > > > > > > ubuntu@caps:~$ sudo bash > > > root@caps:/home/ubuntu# unshare -Ur > > > root@caps:/home/ubuntu# logout > > > > > > 3. root user without CAP_SETFCAP cannot unshare -Ur: > > > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > > root@caps:/home/ubuntu# unshare -Ur > > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > > > Signed-off-by: Serge Hallyn > > > > > > Changelog: > > >* fix logic in the case of writing to another task's uid_map > > >* rename 'ns' to 'map_ns', and make a file_ns local variable > > >* use /* comments */ > > >* update the CAP_SETFCAP comment in capability.h > > >* rename parent_unpriv to parent_can_setfcap (and reverse the > > > logic) > > >* remove printks > > >* clarify (i hope) the code comments > > >* update capability.h comment > > >* renamed parent_can_setfcap to parent_could_setfcap > > >* made the check its own disallowed_0_mapping() fn > > >* moved the check into new_idmap_permitted > > > --- > > > > Thank you for working on this fix! > > > > I do prefer your approach of doing the check at user namespace creation > > time instead of moving it into the setxattr() codepath. > > > > Let me reiterate that the ability to write through fscaps is a valid > > usecase and this should continue to work but that for locked down user > > namespace as Andrew wants to use them your patch provides a clean > > solution. > > We've are using identity mappings in quite a few scenarios partially > > when performing tests but also to write through fscaps. > > We also had reports of users that use identity mappings. They create > > their rootfs by running image extraction in an identity mapped userns > > where fscaps are written through. > > Podman has use-cases for this feature as well and has been affected by > > the regression of the first fix. > > Thanks for reviewing. > > I'm not sure what your point above is, so just to make sure - the > alternative implementation also does allow fscaps for cases where > root uid is remapped, only disallowing it if it would violate the > ancestor's lack o
Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)
On Fri, Apr 16, 2021 at 05:05:01PM +0200, Christian Brauner wrote: > On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote: > > (Eric - this patch (v3) is a cleaned up version of the previous approach. > > v4 is at > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > and is the approach you suggested. I can send it also as a separate patch > > if you like) > > > > A process running as uid 0 but without cap_setfcap currently can simply > > unshare a new user namespace with uid 0 mapped to 0. While this task > > will not have new capabilities against the parent namespace, there is > > a loophole due to the way namespaced file capabilities work. File > > capabilities valid in userns 1 are distinguised from file capabilities > > valid in userns 2 by the kuid which underlies uid 0. Therefore > > the restricted root process can unshare a new self-mapping namespace, > > add a namespaced file capability onto a file, then use that file > > capability in the parent namespace. > > > > To prevent that, do not allow mapping uid 0 if the process which > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > for setting file capabilities. > > > > A further wrinkle: a task can unshare its user namespace, then > > open its uid_map file itself, and map (only) its own uid. In this > > case we do not have the credential from before unshare, which was > > potentially more restricted. So, when creating a user namespace, we > > record whether the creator had CAP_SETFCAP. Then we can use that > > during map_write(). > > > > With this patch: > > > > 1. unprivileged user can still unshare -Ur > > > > ubuntu@caps:~$ unshare -Ur > > root@caps:~# logout > > > > 2. root user can still unshare -Ur > > > > ubuntu@caps:~$ sudo bash > > root@caps:/home/ubuntu# unshare -Ur > > root@caps:/home/ubuntu# logout > > > > 3. root user without CAP_SETFCAP cannot unshare -Ur: > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > root@caps:/home/ubuntu# unshare -Ur > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > Signed-off-by: Serge Hallyn > > > > Changelog: > >* fix logic in the case of writing to another task's uid_map > >* rename 'ns' to 'map_ns', and make a file_ns local variable > >* use /* comments */ > >* update the CAP_SETFCAP comment in capability.h > >* rename parent_unpriv to parent_can_setfcap (and reverse the > > logic) > >* remove printks > >* clarify (i hope) the code comments > >* update capability.h comment > >* renamed parent_can_setfcap to parent_could_setfcap > >* made the check its own disallowed_0_mapping() fn > >* moved the check into new_idmap_permitted > > --- > > Thank you for working on this fix! > > I do prefer your approach of doing the check at user namespace creation > time instead of moving it into the setxattr() codepath. > > Let me reiterate that the ability to write through fscaps is a valid > usecase and this should continue to work but that for locked down user > namespace as Andrew wants to use them your patch provides a clean > solution. > We've are using identity mappings in quite a few scenarios partially > when performing tests but also to write through fscaps. > We also had reports of users that use identity mappings. They create > their rootfs by running image extraction in an identity mapped userns > where fscaps are written through. > Podman has use-cases for this feature as well and has been affected by > the regression of the first fix. Thanks for reviewing. I'm not sure what your point above is, so just to make sure - the alternative implementation also does allow fscaps for cases where root uid is remapped, only disallowing it if it would violate the ancestor's lack of cap_setfcap. > > include/linux/user_namespace.h | 3 ++ > > include/uapi/linux/capability.h | 3 +- > > kernel/user_namespace.c | 61 +++-- > > 3 files changed, 63 insertions(+), 4 deletions(-) > > > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > > index 64cf8ebdc4ec..f6c5f784be5a 100644 > > --- a/include/linux/user_namespace.h > > +++ b/include/linux/user_namespace.h > > @@ -63,
RE: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
> -Original Message- > From: Niklas Schnelle > Sent: Thursday, April 15, 2021 12:55 PM > To: Bjorn Helgaas > Cc: K, Narendra; Viktor Mihajlovski; Stefan Raspl; Peter Oberparleiter; linux- > net...@vger.kernel.org; linux-...@vger.kernel.org; linux- > ker...@vger.kernel.org; linux-s...@vger.kernel.org > Subject: Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index > > > [EXTERNAL EMAIL] > > On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote: > > On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote: > > > On s390 each PCI device has a user-defined ID (UID) exposed under > > > /sys/bus/pci/devices//uid. This ID was designed to serve as the > > > PCI device's primary index and to match the device within Linux to > > > the device configured in the hypervisor. To serve as a primary > > > identifier the UID must be unique within the Linux instance, this is > > > guaranteed by the platform if and only if the UID Uniqueness > > > Checking flag is set within the CLP List PCI Functions response. > > > > > > In this sense the UID serves an analogous function as the SMBIOS > > > instance number or ACPI index exposed as the "index" respectively > > > "acpi_index" device attributes and used by e.g. systemd to set > > > interface names. As s390 does not use and will likely never use ACPI > > > nor SMBIOS there is no conflict and we can just expose the UID under the > "index" > > > attribute whenever UID Uniqueness Checking is active and get > > > systemd's interface naming support for free. > > > > > > Signed-off-by: Niklas Schnelle > > > Acked-by: Viktor Mihajlovski > > > > This seems like a nice solution to me. > > > > Acked-by: Bjorn Helgaas > > Thanks! Yes I agree it's a simple solution that also makes sense from a design > point. I'll wait for Narendra's opinion of course. Hi Niklas, It seems like the UID and the index are not equivalent in their meaning. The index SMBIOS type 41 device type instance indicates that 1. The device is an onboard device. 2. A specific order of the onboard devices. The UID described seems to indicate that the PCI device is unique within the Linux instance (sufficient for naming). But it does not indicate that PCI device is onboard and any order. As all devices with UID will be named as eno (Ethernet onboard), names are not in sync with how each PCI function is exposed on a PCI slot (appears closer to SMBIOS type 9 record) as described below. https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc In summary, it seems like the eno names on s390 will be unique names, but may not match the meaning of the index. Also, a) Will UID remain unique/same as initially exposed across reboots ? b) Is the reuse of index related to the 'slots' based naming scheme described below ? https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc c) If the slot based naming is fixed with the naming scheme from changes below, any thoughts on why is reuse of index needed ? https://github.com/systemd/systemd/pull/19017 https://github.com/systemd/systemd/commit/a496a238e8ee66ce25ad13a3f46549b2e2e979fc With regards, Narendra K
Re: [RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)
On Thu, Apr 15, 2021 at 11:58:51PM -0500, Serge Hallyn wrote: > (Eric - this patch (v3) is a cleaned up version of the previous approach. > v4 is at > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > and is the approach you suggested. I can send it also as a separate patch > if you like) > > A process running as uid 0 but without cap_setfcap currently can simply > unshare a new user namespace with uid 0 mapped to 0. While this task > will not have new capabilities against the parent namespace, there is > a loophole due to the way namespaced file capabilities work. File > capabilities valid in userns 1 are distinguised from file capabilities > valid in userns 2 by the kuid which underlies uid 0. Therefore > the restricted root process can unshare a new self-mapping namespace, > add a namespaced file capability onto a file, then use that file > capability in the parent namespace. > > To prevent that, do not allow mapping uid 0 if the process which > opened the uid_map file does not have CAP_SETFCAP, which is the capability > for setting file capabilities. > > A further wrinkle: a task can unshare its user namespace, then > open its uid_map file itself, and map (only) its own uid. In this > case we do not have the credential from before unshare, which was > potentially more restricted. So, when creating a user namespace, we > record whether the creator had CAP_SETFCAP. Then we can use that > during map_write(). > > With this patch: > > 1. unprivileged user can still unshare -Ur > > ubuntu@caps:~$ unshare -Ur > root@caps:~# logout > > 2. root user can still unshare -Ur > > ubuntu@caps:~$ sudo bash > root@caps:/home/ubuntu# unshare -Ur > root@caps:/home/ubuntu# logout > > 3. root user without CAP_SETFCAP cannot unshare -Ur: > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > unable to set CAP_SETFCAP effective capability: Operation not permitted > root@caps:/home/ubuntu# unshare -Ur > unshare: write failed /proc/self/uid_map: Operation not permitted > > Signed-off-by: Serge Hallyn > > Changelog: >* fix logic in the case of writing to another task's uid_map >* rename 'ns' to 'map_ns', and make a file_ns local variable >* use /* comments */ >* update the CAP_SETFCAP comment in capability.h >* rename parent_unpriv to parent_can_setfcap (and reverse the > logic) >* remove printks >* clarify (i hope) the code comments >* update capability.h comment >* renamed parent_can_setfcap to parent_could_setfcap >* made the check its own disallowed_0_mapping() fn >* moved the check into new_idmap_permitted > --- Thank you for working on this fix! I do prefer your approach of doing the check at user namespace creation time instead of moving it into the setxattr() codepath. Let me reiterate that the ability to write through fscaps is a valid usecase and this should continue to work but that for locked down user namespace as Andrew wants to use them your patch provides a clean solution. We've are using identity mappings in quite a few scenarios partially when performing tests but also to write through fscaps. We also had reports of users that use identity mappings. They create their rootfs by running image extraction in an identity mapped userns where fscaps are written through. Podman has use-cases for this feature as well and has been affected by the regression of the first fix. > include/linux/user_namespace.h | 3 ++ > include/uapi/linux/capability.h | 3 +- > kernel/user_namespace.c | 61 +++-- > 3 files changed, 63 insertions(+), 4 deletions(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index 64cf8ebdc4ec..f6c5f784be5a 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -63,6 +63,9 @@ struct user_namespace { > kgid_t group; > struct ns_commonns; > unsigned long flags; > + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP > + * in its effective capability set at the child ns creation time. */ > + boolparent_could_setfcap; > > #ifdef CONFIG_KEYS > /* List of joinable keyrings in this namespace. Modification access of > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h > index c6ca33034147..2ddb4226cd23 100644 > --- a/include/uapi/linux/capability.h > +++ b/include/uapi/linux/capability.h > @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { > > #define CAP_AUDIT_
[RFC PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3)
(Eric - this patch (v3) is a cleaned up version of the previous approach. v4 is at https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 and is the approach you suggested. I can send it also as a separate patch if you like) A process running as uid 0 but without cap_setfcap currently can simply unshare a new user namespace with uid 0 mapped to 0. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities work. File capabilities valid in userns 1 are distinguised from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. A further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Signed-off-by: Serge Hallyn Changelog: * fix logic in the case of writing to another task's uid_map * rename 'ns' to 'map_ns', and make a file_ns local variable * use /* comments */ * update the CAP_SETFCAP comment in capability.h * rename parent_unpriv to parent_can_setfcap (and reverse the logic) * remove printks * clarify (i hope) the code comments * update capability.h comment * renamed parent_can_setfcap to parent_could_setfcap * made the check its own disallowed_0_mapping() fn * moved the check into new_idmap_permitted --- include/linux/user_namespace.h | 3 ++ include/uapi/linux/capability.h | 3 +- kernel/user_namespace.c | 61 +++-- 3 files changed, 63 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 64cf8ebdc4ec..f6c5f784be5a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,9 @@ struct user_namespace { kgid_t group; struct ns_commonns; unsigned long flags; + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP +* in its effective capability set at the child ns creation time. */ + boolparent_could_setfcap; #ifdef CONFIG_KEYS /* List of joinable keyrings in this namespace. Modification access of diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index c6ca33034147..2ddb4226cd23 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_CONTROL30 -/* Set or remove capabilities on files */ +/* Set or remove capabilities on files. + Map uid=0 into a child user namespace. */ #define CAP_SETFCAP 31 diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index af612945a4d0..8c75028a9aae 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -106,6 +106,7 @@ int create_user_ns(struct cred *new) if (!ns) goto fail_dec; + ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP); ret = ns_alloc_inum(&ns->ns); if (ret) goto fail_free; @@ -841,6 +842,56 @@ static int sort_idmaps(struct uid_gid_map *map) return 0; } +/* + * If mapping uid 0, then file capabilities created by the new namespace will + * be effective in the parent namespace. Adding file capabilities requires + * CAP_SETFCAP, which the child namespace will have, so creating such a + * mapping requires CAP_SETFCAP in the parent namespace. + */ +static bool disallowed_0_mapping(const struct file *file, +struct user_namespace *map_ns, +struct uid_gid_map *new_map) +{ + int idx; + bool zeromapping = false; + const struct user_name
Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
On Wed, 2021-04-14 at 15:17 -0500, Bjorn Helgaas wrote: > On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote: > > On s390 each PCI device has a user-defined ID (UID) exposed under > > /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI > > device's primary index and to match the device within Linux to the > > device configured in the hypervisor. To serve as a primary identifier > > the UID must be unique within the Linux instance, this is guaranteed by > > the platform if and only if the UID Uniqueness Checking flag is set > > within the CLP List PCI Functions response. > > > > In this sense the UID serves an analogous function as the SMBIOS > > instance number or ACPI index exposed as the "index" respectively > > "acpi_index" device attributes and used by e.g. systemd to set interface > > names. As s390 does not use and will likely never use ACPI nor SMBIOS > > there is no conflict and we can just expose the UID under the "index" > > attribute whenever UID Uniqueness Checking is active and get systemd's > > interface naming support for free. > > > > Signed-off-by: Niklas Schnelle > > Acked-by: Viktor Mihajlovski > > This seems like a nice solution to me. > > Acked-by: Bjorn Helgaas Thanks! Yes I agree it's a simple solution that also makes sense from a design point. I'll wait for Narendra's opinion of course. > > > --- > > Documentation/ABI/testing/sysfs-bus-pci | 11 +--- > > arch/s390/pci/pci_sysfs.c | 35 + > > 2 files changed, 42 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci > > b/Documentation/ABI/testing/sysfs-bus-pci > > index 25c9c39770c6..1241b6d11a52 100644 > > --- a/Documentation/ABI/testing/sysfs-bus-pci > > +++ b/Documentation/ABI/testing/sysfs-bus-pci > > @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index > > Date: July 2010 > > Contact: Narendra K , linux-b...@dell.com > > Description: > > - Reading this attribute will provide the firmware > > - given instance (SMBIOS type 41 device type instance) of the > > - PCI device. The attribute will be created only if the firmware > > - has given an instance number to the PCI device. > > + Reading this attribute will provide the firmware given instance > > + number of the PCI device. Depending on the platform this can > > + be for example the SMBIOS type 41 device type instance or the > > + user-defined ID (UID) on s390. The attribute will be created > > + only if the firmware has given an instance number to the PCI > > + device and that number is guaranteed to uniquely identify the > > + device in the system. > > Users: > > Userspace applications interested in knowing the > > firmware assigned device type instance of the PCI > > diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c > > index e14d346dafd6..20dbb2058d51 100644 > > --- a/arch/s390/pci/pci_sysfs.c > > +++ b/arch/s390/pci/pci_sysfs.c > > @@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev, > > } > > static DEVICE_ATTR_RO(uid_is_unique); > > > > +#ifndef CONFIG_DMI > > +/* analogous to smbios index */ > > I think this is smbios_attr_instance, right? Maybe mention that > specifically to make it easier to match these up. > > Looks like smbios_attr_instance and the similar ACPI stuff could use > some updating to use the current attribute group infrastructure. > > > +static ssize_t index_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); > > + u32 index = ~0; > > + > > + if (zpci_unique_uid) > > + index = zdev->uid; > > + > > + return sysfs_emit(buf, "%u\n", index); > > +} > > +static DEVICE_ATTR_RO(index); > > + > > +static umode_t zpci_unique_uids(struct kobject *kobj, > > + struct attribute *attr, int n) > > +{ > > + return zpci_unique_uid ? attr->mode : 0; > > +} > > + > > +static struct attribute *zpci_ident_attrs[] = { > > + &dev_attr_index.attr, > > + NULL, > > +}; > > + > > +static struct attribute_group zpci_ident_attr_group = { > > + .attrs = zpci_ident_attrs, > > + .is_visib
Re: [PATCH 1/1] s390/pci: expose a PCI device's UID as its index
On Mon, Apr 12, 2021 at 03:59:05PM +0200, Niklas Schnelle wrote: > On s390 each PCI device has a user-defined ID (UID) exposed under > /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI > device's primary index and to match the device within Linux to the > device configured in the hypervisor. To serve as a primary identifier > the UID must be unique within the Linux instance, this is guaranteed by > the platform if and only if the UID Uniqueness Checking flag is set > within the CLP List PCI Functions response. > > In this sense the UID serves an analogous function as the SMBIOS > instance number or ACPI index exposed as the "index" respectively > "acpi_index" device attributes and used by e.g. systemd to set interface > names. As s390 does not use and will likely never use ACPI nor SMBIOS > there is no conflict and we can just expose the UID under the "index" > attribute whenever UID Uniqueness Checking is active and get systemd's > interface naming support for free. > > Signed-off-by: Niklas Schnelle > Acked-by: Viktor Mihajlovski This seems like a nice solution to me. Acked-by: Bjorn Helgaas > --- > Documentation/ABI/testing/sysfs-bus-pci | 11 +--- > arch/s390/pci/pci_sysfs.c | 35 + > 2 files changed, 42 insertions(+), 4 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci > b/Documentation/ABI/testing/sysfs-bus-pci > index 25c9c39770c6..1241b6d11a52 100644 > --- a/Documentation/ABI/testing/sysfs-bus-pci > +++ b/Documentation/ABI/testing/sysfs-bus-pci > @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index > Date:July 2010 > Contact: Narendra K , linux-b...@dell.com > Description: > - Reading this attribute will provide the firmware > - given instance (SMBIOS type 41 device type instance) of the > - PCI device. The attribute will be created only if the firmware > - has given an instance number to the PCI device. > + Reading this attribute will provide the firmware given instance > + number of the PCI device. Depending on the platform this can > + be for example the SMBIOS type 41 device type instance or the > + user-defined ID (UID) on s390. The attribute will be created > + only if the firmware has given an instance number to the PCI > + device and that number is guaranteed to uniquely identify the > + device in the system. > Users: > Userspace applications interested in knowing the > firmware assigned device type instance of the PCI > diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c > index e14d346dafd6..20dbb2058d51 100644 > --- a/arch/s390/pci/pci_sysfs.c > +++ b/arch/s390/pci/pci_sysfs.c > @@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev, > } > static DEVICE_ATTR_RO(uid_is_unique); > > +#ifndef CONFIG_DMI > +/* analogous to smbios index */ I think this is smbios_attr_instance, right? Maybe mention that specifically to make it easier to match these up. Looks like smbios_attr_instance and the similar ACPI stuff could use some updating to use the current attribute group infrastructure. > +static ssize_t index_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); > + u32 index = ~0; > + > + if (zpci_unique_uid) > + index = zdev->uid; > + > + return sysfs_emit(buf, "%u\n", index); > +} > +static DEVICE_ATTR_RO(index); > + > +static umode_t zpci_unique_uids(struct kobject *kobj, > + struct attribute *attr, int n) > +{ > + return zpci_unique_uid ? attr->mode : 0; > +} > + > +static struct attribute *zpci_ident_attrs[] = { > + &dev_attr_index.attr, > + NULL, > +}; > + > +static struct attribute_group zpci_ident_attr_group = { > + .attrs = zpci_ident_attrs, > + .is_visible = zpci_unique_uids, It's conventional to name these functions *_is_visible() (another convention that smbios_attr_instance and acpi_attr_index probably predate). > +}; > +#endif > + > static struct bin_attribute *zpci_bin_attrs[] = { > &bin_attr_util_string, > &bin_attr_report_error, > @@ -179,5 +211,8 @@ static struct attribute_group pfip_attr_group = { > const struct attribute_group *zpci_attr_groups[] = { > &zpci_attr_group, > &pfip_attr_group, > +#ifndef CONFIG_DMI > + &zpci_ident_attr_group, > +#endif > NULL, > }; > -- > 2.25.1 >
[PATCH 1/1] s390/pci: expose a PCI device's UID as its index
On s390 each PCI device has a user-defined ID (UID) exposed under /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI device's primary index and to match the device within Linux to the device configured in the hypervisor. To serve as a primary identifier the UID must be unique within the Linux instance, this is guaranteed by the platform if and only if the UID Uniqueness Checking flag is set within the CLP List PCI Functions response. In this sense the UID serves an analogous function as the SMBIOS instance number or ACPI index exposed as the "index" respectively "acpi_index" device attributes and used by e.g. systemd to set interface names. As s390 does not use and will likely never use ACPI nor SMBIOS there is no conflict and we can just expose the UID under the "index" attribute whenever UID Uniqueness Checking is active and get systemd's interface naming support for free. Signed-off-by: Niklas Schnelle Acked-by: Viktor Mihajlovski --- Documentation/ABI/testing/sysfs-bus-pci | 11 +--- arch/s390/pci/pci_sysfs.c | 35 + 2 files changed, 42 insertions(+), 4 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25c9c39770c6..1241b6d11a52 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index Date: July 2010 Contact: Narendra K , linux-b...@dell.com Description: - Reading this attribute will provide the firmware - given instance (SMBIOS type 41 device type instance) of the - PCI device. The attribute will be created only if the firmware - has given an instance number to the PCI device. + Reading this attribute will provide the firmware given instance + number of the PCI device. Depending on the platform this can + be for example the SMBIOS type 41 device type instance or the + user-defined ID (UID) on s390. The attribute will be created + only if the firmware has given an instance number to the PCI + device and that number is guaranteed to uniquely identify the + device in the system. Users: Userspace applications interested in knowing the firmware assigned device type instance of the PCI diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index e14d346dafd6..20dbb2058d51 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -138,6 +138,38 @@ static ssize_t uid_is_unique_show(struct device *dev, } static DEVICE_ATTR_RO(uid_is_unique); +#ifndef CONFIG_DMI +/* analogous to smbios index */ +static ssize_t index_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); + u32 index = ~0; + + if (zpci_unique_uid) + index = zdev->uid; + + return sysfs_emit(buf, "%u\n", index); +} +static DEVICE_ATTR_RO(index); + +static umode_t zpci_unique_uids(struct kobject *kobj, + struct attribute *attr, int n) +{ + return zpci_unique_uid ? attr->mode : 0; +} + +static struct attribute *zpci_ident_attrs[] = { + &dev_attr_index.attr, + NULL, +}; + +static struct attribute_group zpci_ident_attr_group = { + .attrs = zpci_ident_attrs, + .is_visible = zpci_unique_uids, +}; +#endif + static struct bin_attribute *zpci_bin_attrs[] = { &bin_attr_util_string, &bin_attr_report_error, @@ -179,5 +211,8 @@ static struct attribute_group pfip_attr_group = { const struct attribute_group *zpci_attr_groups[] = { &zpci_attr_group, &pfip_attr_group, +#ifndef CONFIG_DMI + &zpci_ident_attr_group, +#endif NULL, }; -- 2.25.1
[PATCH v19 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index ead21b98b620..78d32c34d49c 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,22 +54,31 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; case ARM_SMCCC_TRNG_VERSION: case ARM_SMCCC_TRNG_FEATURES: @@ -81,6 +90,6 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.29.2
[PATCH] s390/pci: expose a PCI device's UID as its index
On s390 each PCI device has a user-defined ID (UID) exposed under /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI device's primary index and to match the device within Linux to the device configured in the hypervisor. To serve as a primary identifier the UID must be unique within the Linux instance, this is guaranteed by the platform if and only if the UID Uniqueness Checking flag is set within the CLP List PCI Functions response. In this the UID serves an analogous function as the SMBIOS instance number or ACPI index exposed as the "index" respectively "acpi_index" device attributes and used by e.g. systemd to set interface names. As s390 does not use and will likely never use ACPI nor SMBIOS there is no conflict and we can just expose the UID under the "index" attribute whenever UID Uniqueness Checking is active and get systemd's interface naming support for free. Signed-off-by: Niklas Schnelle Acked-by: Viktor Mihajlovski --- Changes from RFC patch: - Use sysfs_emit() instead of sprintf() (Thanks Krzysztof Wilczyński!) Documentation/ABI/testing/sysfs-bus-pci | 11 +--- arch/s390/pci/pci_sysfs.c | 36 + 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25c9c39770c6..1241b6d11a52 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index Date: July 2010 Contact: Narendra K , linux-b...@dell.com Description: - Reading this attribute will provide the firmware - given instance (SMBIOS type 41 device type instance) of the - PCI device. The attribute will be created only if the firmware - has given an instance number to the PCI device. + Reading this attribute will provide the firmware given instance + number of the PCI device. Depending on the platform this can + be for example the SMBIOS type 41 device type instance or the + user-defined ID (UID) on s390. The attribute will be created + only if the firmware has given an instance number to the PCI + device and that number is guaranteed to uniquely identify the + device in the system. Users: Userspace applications interested in knowing the firmware assigned device type instance of the PCI diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 5c028bee91b9..c7107bd9dd93 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -131,6 +131,38 @@ static ssize_t report_error_write(struct file *filp, struct kobject *kobj, } static BIN_ATTR(report_error, S_IWUSR, NULL, report_error_write, PAGE_SIZE); +#ifndef CONFIG_DMI +/* analogous to smbios index */ +static ssize_t index_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); + u32 index = ~0; + + if (zpci_unique_uid) + index = zdev->uid; + + return sysfs_emit(buf, "%u\n", index); +} +static DEVICE_ATTR_RO(index); + +static umode_t zpci_unique_uids(struct kobject *kobj, + struct attribute *attr, int n) +{ + return zpci_unique_uid ? attr->mode : 0; +} + +static struct attribute *zpci_ident_attrs[] = { + &dev_attr_index.attr, + NULL, +}; + +static struct attribute_group zpci_ident_attr_group = { + .attrs = zpci_ident_attrs, + .is_visible = zpci_unique_uids, +}; +#endif + static struct bin_attribute *zpci_bin_attrs[] = { &bin_attr_util_string, &bin_attr_report_error, @@ -150,6 +182,7 @@ static struct attribute *zpci_dev_attrs[] = { &dev_attr_mio_enabled.attr, NULL, }; + static struct attribute_group zpci_attr_group = { .attrs = zpci_dev_attrs, .bin_attrs = zpci_bin_attrs, @@ -170,5 +203,8 @@ static struct attribute_group pfip_attr_group = { const struct attribute_group *zpci_attr_groups[] = { &zpci_attr_group, &pfip_attr_group, +#ifndef CONFIG_DMI + &zpci_ident_attr_group, +#endif NULL, }; -- 2.25.1
Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index
On 3/8/21 8:02 AM, Niklas Schnelle wrote: On 3/7/21 9:46 PM, Krzysztof Wilczyński wrote: Hi Niklas, [...] +static ssize_t index_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); + u32 index = ~0; + + if (zpci_unique_uid) + index = zdev->uid; + + return sprintf(buf, "%u\n", index); [...] Would it be possible to use the new sysfs_emit() rather than sprintf() even though the zpci_attr macro and still use mio_enabled_show() still would use sprintf(). What do you think? See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for the changes in the internal API. Krzysztof Of course that makes sense and thanks for pointing me to this API! @Viktor, may I carry your R-b over? Sure, please go ahead. I'll also update the other attributes in a clean up patch. Thanks, Niklas -- Kind Regards, Viktor
Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index
On 3/7/21 9:46 PM, Krzysztof Wilczyński wrote: > Hi Niklas, > > [...] >> +static ssize_t index_show(struct device *dev, >> + struct device_attribute *attr, char *buf) >> +{ >> +struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); >> +u32 index = ~0; >> + >> +if (zpci_unique_uid) >> +index = zdev->uid; >> + >> +return sprintf(buf, "%u\n", index); > [...] > > Would it be possible to use the new sysfs_emit() rather than sprintf() > even though the zpci_attr macro and still use mio_enabled_show() still > would use sprintf(). What do you think? > > See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for > the changes in the internal API. > > Krzysztof > Of course that makes sense and thanks for pointing me to this API! @Viktor, may I carry your R-b over? I'll also update the other attributes in a clean up patch. Thanks, Niklas
Re: [RFC 1/1] s390/pci: expose a PCI device's UID as its index
Hi Niklas, [...] > +static ssize_t index_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); > + u32 index = ~0; > + > + if (zpci_unique_uid) > + index = zdev->uid; > + > + return sprintf(buf, "%u\n", index); [...] Would it be possible to use the new sysfs_emit() rather than sprintf() even though the zpci_attr macro and still use mio_enabled_show() still would use sprintf(). What do you think? See https://www.kernel.org/doc/html/latest/filesystems/sysfs.html for the changes in the internal API. Krzysztof
[RFC 0/1] s390/pci: expose a PCI device's UID as its index
Hi, On s390 each PCI device has a user-defined ID (UID). This ID was designed to serve as the PCI device's primary index and to match the device within Linux to the with the view in the hypervisor configuration. To serve as a primary identifier the UID must be unique within the Linux instance, this is guaranteed by the platform if and only if the UID Uniqueness Checking flag is set within the CLP List PCI Functions response which is also currently used to determine whether the UID is also used as the Domain part of the geographical PCI address of the device. As primary identifier of a PCI device, the UID serves an analogous function as the SMBIOS instance number or ACPI index exposed as the "index" respectively "acpi_index" device attributes. These attributes are used by e.g. systemd/udev to set network interface names. As s390 does not use ACPI nor SMBIOS there is no conflict and we can expose the UID under the "index" attribute whenever UID Uniqueness Checking is active and systemd/udev will then create "eno.." interface names. Note: This is an evolution of an earlier patch I sent for exposing the UID Uniqueness Checking flag directly. Thank you Greg for making me realize that we were looking too much at just exposing platform details instead of looking how existing interfaces could suit our purpose. Thanks, Niklas Schnelle Niklas Schnelle (1): s390/pci: expose a PCI device's UID as its index Documentation/ABI/testing/sysfs-bus-pci | 11 +--- arch/s390/pci/pci_sysfs.c | 36 + 2 files changed, 43 insertions(+), 4 deletions(-) -- 2.25.1
[RFC 1/1] s390/pci: expose a PCI device's UID as its index
On s390 each PCI device has a user-defined ID (UID) exposed under /sys/bus/pci/devices//uid. This ID was designed to serve as the PCI device's primary index and to match the device within Linux to the device configured in the hypervisor. To serve as a primary identifier the UID must be unique within the Linux instance, this is guaranteed by the platform if and only if the UID Uniqueness Checking flag is set within the CLP List PCI Functions response. In this the UID serves an analogous function as the SMBIOS instance number or ACPI index exposed as the "index" respectively "acpi_index" device attributes and used by e.g. systemd to set interface names. As s390 does not use and will likely never use ACPI nor SMBIOS there is no conflict and we can just expose the UID under the "index" attribute whenever UID Uniqueness Checking is active and get systemd's interface naming support for free. Signed-off-by: Niklas Schnelle Acked-by: Viktor Mihajlovski --- Documentation/ABI/testing/sysfs-bus-pci | 11 +--- arch/s390/pci/pci_sysfs.c | 36 + 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25c9c39770c6..1241b6d11a52 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -195,10 +195,13 @@ What: /sys/bus/pci/devices/.../index Date: July 2010 Contact: Narendra K , linux-b...@dell.com Description: - Reading this attribute will provide the firmware - given instance (SMBIOS type 41 device type instance) of the - PCI device. The attribute will be created only if the firmware - has given an instance number to the PCI device. + Reading this attribute will provide the firmware given instance + number of the PCI device. Depending on the platform this can + be for example the SMBIOS type 41 device type instance or the + user-defined ID (UID) on s390. The attribute will be created + only if the firmware has given an instance number to the PCI + device and that number is guaranteed to uniquely identify the + device in the system. Users: Userspace applications interested in knowing the firmware assigned device type instance of the PCI diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 5c028bee91b9..30f48e8a1156 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -131,6 +131,38 @@ static ssize_t report_error_write(struct file *filp, struct kobject *kobj, } static BIN_ATTR(report_error, S_IWUSR, NULL, report_error_write, PAGE_SIZE); +#ifndef CONFIG_DMI +/* analogous to smbios index */ +static ssize_t index_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); + u32 index = ~0; + + if (zpci_unique_uid) + index = zdev->uid; + + return sprintf(buf, "%u\n", index); +} +static DEVICE_ATTR_RO(index); + +static umode_t zpci_unique_uids(struct kobject *kobj, + struct attribute *attr, int n) +{ + return zpci_unique_uid ? attr->mode : 0; +} + +static struct attribute *zpci_ident_attrs[] = { + &dev_attr_index.attr, + NULL, +}; + +static struct attribute_group zpci_ident_attr_group = { + .attrs = zpci_ident_attrs, + .is_visible = zpci_unique_uids, +}; +#endif + static struct bin_attribute *zpci_bin_attrs[] = { &bin_attr_util_string, &bin_attr_report_error, @@ -150,6 +182,7 @@ static struct attribute *zpci_dev_attrs[] = { &dev_attr_mio_enabled.attr, NULL, }; + static struct attribute_group zpci_attr_group = { .attrs = zpci_dev_attrs, .bin_attrs = zpci_bin_attrs, @@ -170,5 +203,8 @@ static struct attribute_group pfip_attr_group = { const struct attribute_group *zpci_attr_groups[] = { &zpci_attr_group, &pfip_attr_group, +#ifndef CONFIG_DMI + &zpci_ident_attr_group, +#endif NULL, }; -- 2.25.1
[PATCH v2 06/17] KVM: arm64: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Signed-off-by: Will Deacon Signed-off-by: Gavin Shan --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index a54c4805f2a6..e02e29a12bbf 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + long val[4] = { SMCCC_RET_NOT_SUPPORTED }; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,22 +54,31 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; case SDEI_1_0_FN_SDEI_VERSION: case SDEI_1_0_FN_SDEI_EVENT_REGISTER: @@ -93,6 +102,6 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.23.0
[PATCH v18 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 25ea4ecb6449..b9d8607083eb 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.29.2
Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute
On 2/4/21 2:38 PM, Greg Kroah-Hartman wrote: > On Thu, Feb 04, 2021 at 01:02:51PM +0100, Niklas Schnelle wrote: >> >> >> On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote: >>> On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote: >>>> The global UID uniqueness attribute exposes whether the platform >>>> guarantees that the user-defined per-device UID attribute values >>>> (/sys/bus/pci/device//uid) are unique and can thus be used as >>>> a global identifier for the associated PCI device. With this commit >>>> it is exposed at /sys/bus/pci/zpci/unique_uids >>>> >>>> Signed-off-by: Niklas Schnelle >>>> --- >>>> Documentation/ABI/testing/sysfs-bus-pci | 9 + >>>> drivers/pci/pci-sysfs.c | 21 + >>>> 2 files changed, 30 insertions(+) >>>> >>>> diff --git a/Documentation/ABI/testing/sysfs-bus-pci >>>> b/Documentation/ABI/testing/sysfs-bus-pci >>>> index 25c9c39770c6..812dd9d3f80d 100644 >>>> --- a/Documentation/ABI/testing/sysfs-bus-pci >>>> +++ b/Documentation/ABI/testing/sysfs-bus-pci >>>> @@ -375,3 +375,12 @@ Description: >>>>The value comes from the PCI kernel device state and can be one >>>>of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". >>>>The file is read only. >>>> +What: /sys/bus/pci/zpci/unique_uids >>> >>> No blank line before this new line? >> >> Good catch, thanks! >> >>> >>> And why "zpci"? >> >> There doesn't seem to be a precedent for arch specific attributes under >> /sys/bus/pci so I went with a separate group for the RFC. > > Why? There's nothing arch-specific here, right? Either the file is > present or not. Good point, will change this to /sys/bus/pci/unique_uids > >> "zpci" since that's what we've been calling the s390 specific PCI. >> I'm not attached to that name or having a separate group, from >> my perspective /sys/bus/pci/unique_uids would actually be ideal >> if Bjorn is okay with that, we don't currently foresee any additional >> global attributes and no one else seems to have them but >> one never knows and a separate group would then of course, >> well group them. > > Why not just a simple file, no need for arch-specific names if this > really can show up under other arches? I guess other arches would be free to implement the same UID concept as above I'm happy to change this to a simple file. > >>>> +Date: February 2021 >>>> +Contact: Niklas Schnelle >>>> +Description: >>>> + This attribute exposes the global state of UID Uniqueness on an >>>> + s390 Linux system. If this file contains '1' the per-device UID >>>> + attribute is guaranteed to provide a unique user defined >>>> + identifier for that PCI device. If this file contains '0' UIDs >>>> + may collide and do not provide a unique identifier. >>> >>> What are they "colliding" with? And where does the UID come from, the >>> device itself or somewhere else? >> >> If this attribute is 0 multiple PCI devices seen by Linux may have the same >> UID i.e. >> they may collide with each other on the same Linux instance. > > So can't userspace figure this out on its own? Userspace can figure out that they do not currently collide but without this attribute it doesn't know that the platform also guarantees that no collision will be introduced by e.g. hotplug. We are however pushing to have this always turned on so yes over time this attribute will hopefully be always 1 but this will take years. > >> The >> UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute >> is 1 >> multiple Linux instances will often see the same UID. The main use case >> we currently envision is naming PCI based network interfaces "eno" >> which of course only works if the UIDs are unique for that Linux. >> On the same mainframe multiple Linux instances may then e.g. use the same >> UID for VFs from the same physical PCI network card or different cards >> but the same physical network all defined by an administrator/management >> system. >> >> The UIDs come from the platform/firmware/hypervisor and are provided >> for each device by t
Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute
On Thu, Feb 04, 2021 at 01:02:51PM +0100, Niklas Schnelle wrote: > > > On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote: > > On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote: > >> The global UID uniqueness attribute exposes whether the platform > >> guarantees that the user-defined per-device UID attribute values > >> (/sys/bus/pci/device//uid) are unique and can thus be used as > >> a global identifier for the associated PCI device. With this commit > >> it is exposed at /sys/bus/pci/zpci/unique_uids > >> > >> Signed-off-by: Niklas Schnelle > >> --- > >> Documentation/ABI/testing/sysfs-bus-pci | 9 + > >> drivers/pci/pci-sysfs.c | 21 + > >> 2 files changed, 30 insertions(+) > >> > >> diff --git a/Documentation/ABI/testing/sysfs-bus-pci > >> b/Documentation/ABI/testing/sysfs-bus-pci > >> index 25c9c39770c6..812dd9d3f80d 100644 > >> --- a/Documentation/ABI/testing/sysfs-bus-pci > >> +++ b/Documentation/ABI/testing/sysfs-bus-pci > >> @@ -375,3 +375,12 @@ Description: > >>The value comes from the PCI kernel device state and can be one > >>of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". > >>The file is read only. > >> +What: /sys/bus/pci/zpci/unique_uids > > > > No blank line before this new line? > > Good catch, thanks! > > > > > And why "zpci"? > > There doesn't seem to be a precedent for arch specific attributes under > /sys/bus/pci so I went with a separate group for the RFC. Why? There's nothing arch-specific here, right? Either the file is present or not. > "zpci" since that's what we've been calling the s390 specific PCI. > I'm not attached to that name or having a separate group, from > my perspective /sys/bus/pci/unique_uids would actually be ideal > if Bjorn is okay with that, we don't currently foresee any additional > global attributes and no one else seems to have them but > one never knows and a separate group would then of course, > well group them. Why not just a simple file, no need for arch-specific names if this really can show up under other arches? > >> +Date: February 2021 > >> +Contact: Niklas Schnelle > >> +Description: > >> + This attribute exposes the global state of UID Uniqueness on an > >> + s390 Linux system. If this file contains '1' the per-device UID > >> + attribute is guaranteed to provide a unique user defined > >> + identifier for that PCI device. If this file contains '0' UIDs > >> + may collide and do not provide a unique identifier. > > > > What are they "colliding" with? And where does the UID come from, the > > device itself or somewhere else? > > If this attribute is 0 multiple PCI devices seen by Linux may have the same > UID i.e. > they may collide with each other on the same Linux instance. So can't userspace figure this out on its own? > The > UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute > is 1 > multiple Linux instances will often see the same UID. The main use case > we currently envision is naming PCI based network interfaces "eno" > which of course only works if the UIDs are unique for that Linux. > On the same mainframe multiple Linux instances may then e.g. use the same > UID for VFs from the same physical PCI network card or different cards > but the same physical network all defined by an administrator/management > system. > > The UIDs come from the platform/firmware/hypervisor and are provided > for each device by the CLP List PCI Functions "instruction" that is used > on s390x where an OS can't probe the physical PCI bus but instead > that is done by firmware. On QEMU/KVM they can be set on the QEMU cli, > on our machine hypervisor they are defined by the machine > administrator/management > software as part of the definition of VMs/Partitions on that mainframe which > includes > everything from the number of CPUs, memory, I/O devices etc. With the exposed > UID uniqueness > attribute the platform then certifies that it will ensure that a UID is set to > a unique non-zero value. I can of course add more of this explanation > in the documentation too. Please explain it more, but why would userspace care about this? If userspace sees a UID "collision" then it just adds something else to the end of the name to make it unique. What is it supposed to do differently based on the value/presense of this file? > Both the uniqueness guarantee (this attribute) as well as the UIDs themselves > are part of the Z (s390x) architecture so fall under the mainframe typical > backwards compatibility considerations. So what can userspace do with this? What tool is going to rely on this? thanks, greg k-h
Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute
On 2/4/21 11:40 AM, Greg Kroah-Hartman wrote: > On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote: >> The global UID uniqueness attribute exposes whether the platform >> guarantees that the user-defined per-device UID attribute values >> (/sys/bus/pci/device//uid) are unique and can thus be used as >> a global identifier for the associated PCI device. With this commit >> it is exposed at /sys/bus/pci/zpci/unique_uids >> >> Signed-off-by: Niklas Schnelle >> --- >> Documentation/ABI/testing/sysfs-bus-pci | 9 + >> drivers/pci/pci-sysfs.c | 21 + >> 2 files changed, 30 insertions(+) >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-pci >> b/Documentation/ABI/testing/sysfs-bus-pci >> index 25c9c39770c6..812dd9d3f80d 100644 >> --- a/Documentation/ABI/testing/sysfs-bus-pci >> +++ b/Documentation/ABI/testing/sysfs-bus-pci >> @@ -375,3 +375,12 @@ Description: >> The value comes from the PCI kernel device state and can be one >> of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". >> The file is read only. >> +What: /sys/bus/pci/zpci/unique_uids > > No blank line before this new line? Good catch, thanks! > > And why "zpci"? There doesn't seem to be a precedent for arch specific attributes under /sys/bus/pci so I went with a separate group for the RFC. "zpci" since that's what we've been calling the s390 specific PCI. I'm not attached to that name or having a separate group, from my perspective /sys/bus/pci/unique_uids would actually be ideal if Bjorn is okay with that, we don't currently foresee any additional global attributes and no one else seems to have them but one never knows and a separate group would then of course, well group them. > >> +Date: February 2021 >> +Contact:Niklas Schnelle >> +Description: >> +This attribute exposes the global state of UID Uniqueness on an >> +s390 Linux system. If this file contains '1' the per-device UID >> +attribute is guaranteed to provide a unique user defined >> +identifier for that PCI device. If this file contains '0' UIDs >> +may collide and do not provide a unique identifier. > > What are they "colliding" with? And where does the UID come from, the > device itself or somewhere else? If this attribute is 0 multiple PCI devices seen by Linux may have the same UID i.e. they may collide with each other on the same Linux instance. The UIDs are exposed under /sys/bus/pci/devices//uid. Even if the attribute is 1 multiple Linux instances will often see the same UID. The main use case we currently envision is naming PCI based network interfaces "eno" which of course only works if the UIDs are unique for that Linux. On the same mainframe multiple Linux instances may then e.g. use the same UID for VFs from the same physical PCI network card or different cards but the same physical network all defined by an administrator/management system. The UIDs come from the platform/firmware/hypervisor and are provided for each device by the CLP List PCI Functions "instruction" that is used on s390x where an OS can't probe the physical PCI bus but instead that is done by firmware. On QEMU/KVM they can be set on the QEMU cli, on our machine hypervisor they are defined by the machine administrator/management software as part of the definition of VMs/Partitions on that mainframe which includes everything from the number of CPUs, memory, I/O devices etc. With the exposed UID uniqueness attribute the platform then certifies that it will ensure that a UID is set to a unique non-zero value. I can of course add more of this explanation in the documentation too. Both the uniqueness guarantee (this attribute) as well as the UIDs themselves are part of the Z (s390x) architecture so fall under the mainframe typical backwards compatibility considerations. Thanks, Niklas > > thanks, > > greg k-h >
Re: [RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute
On Thu, Feb 04, 2021 at 10:43:53AM +0100, Niklas Schnelle wrote: > The global UID uniqueness attribute exposes whether the platform > guarantees that the user-defined per-device UID attribute values > (/sys/bus/pci/device//uid) are unique and can thus be used as > a global identifier for the associated PCI device. With this commit > it is exposed at /sys/bus/pci/zpci/unique_uids > > Signed-off-by: Niklas Schnelle > --- > Documentation/ABI/testing/sysfs-bus-pci | 9 + > drivers/pci/pci-sysfs.c | 21 + > 2 files changed, 30 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci > b/Documentation/ABI/testing/sysfs-bus-pci > index 25c9c39770c6..812dd9d3f80d 100644 > --- a/Documentation/ABI/testing/sysfs-bus-pci > +++ b/Documentation/ABI/testing/sysfs-bus-pci > @@ -375,3 +375,12 @@ Description: > The value comes from the PCI kernel device state and can be one > of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". > The file is read only. > +What:/sys/bus/pci/zpci/unique_uids No blank line before this new line? And why "zpci"? > +Date:February 2021 > +Contact: Niklas Schnelle > +Description: > + This attribute exposes the global state of UID Uniqueness on an > + s390 Linux system. If this file contains '1' the per-device UID > + attribute is guaranteed to provide a unique user defined > + identifier for that PCI device. If this file contains '0' UIDs > + may collide and do not provide a unique identifier. What are they "colliding" with? And where does the UID come from, the device itself or somewhere else? thanks, greg k-h
[RFC v2 1/1] PCI: Add s390 specific UID uniqueness attribute
The global UID uniqueness attribute exposes whether the platform guarantees that the user-defined per-device UID attribute values (/sys/bus/pci/device//uid) are unique and can thus be used as a global identifier for the associated PCI device. With this commit it is exposed at /sys/bus/pci/zpci/unique_uids Signed-off-by: Niklas Schnelle --- Documentation/ABI/testing/sysfs-bus-pci | 9 + drivers/pci/pci-sysfs.c | 21 + 2 files changed, 30 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25c9c39770c6..812dd9d3f80d 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -375,3 +375,12 @@ Description: The value comes from the PCI kernel device state and can be one of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". The file is read only. +What: /sys/bus/pci/zpci/unique_uids +Date: February 2021 +Contact: Niklas Schnelle +Description: + This attribute exposes the global state of UID Uniqueness on an + s390 Linux system. If this file contains '1' the per-device UID + attribute is guaranteed to provide a unique user defined + identifier for that PCI device. If this file contains '0' UIDs + may collide and do not provide a unique identifier. diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index fb072f4b3176..61de66ab59cf 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -425,6 +425,24 @@ static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count) } static BUS_ATTR_WO(rescan); +#if defined(CONFIG_S390) +static ssize_t unique_uids_show(struct bus_type *bus, char *buf) +{ + return sprintf(buf, "%i\n", zpci_unique_uid); +} +static BUS_ATTR_RO(unique_uids); + +static struct attribute *zpci_bus_attrs[] = { + &bus_attr_unique_uids.attr, + NULL, +}; + +static struct attribute_group zpci_bus_group = { + .name = "zpci", + .attrs = zpci_bus_attrs, +}; +#endif + static struct attribute *pci_bus_attrs[] = { &bus_attr_rescan.attr, NULL, @@ -436,6 +454,9 @@ static const struct attribute_group pci_bus_group = { const struct attribute_group *pci_bus_groups[] = { &pci_bus_group, +#if defined(CONFIG_S390) + &zpci_bus_group, +#endif NULL, }; -- 2.17.1
[RFC v2 0/1] s390/pci: expose UID checking state in sysfs
Hi Bjorn, this is a follow-up to my previous RFC of exposing our s390 specific UID Checking attribute at /sys/bus/pci/zpci/unique_uids. As suggested by Greg (thanks!) this version changes things to use named attributes directly without resorting to any raw kobject handling, as a result the code moves to drivers/pci/pci-sysfs.c with a CONFIG_S390 ifdef. Also I've changed the wording to be more restrictive about what this attribute means. Instead of directly calling out its current use to determine if UIDs are used as PCI domains it now only explicitly claims that /sys/bus/pci/devices//uid is guaranteed to be a unique user-defined (in the zVM/KVM/LPAR configuration not Linux user-space) identifier for the PCI function. We've had some more internal discussion on this and are also considering to instead put this attribute at /sys/firmware/zpci/unique_uids but as far as I can see this strictly requires the use of raw kobject handling and loses us the direct relation with PCI so I wanted to give this just one more shot and get your opinion on it. Thanks, Niklas Niklas Schnelle (1): PCI: Add s390 specific UID uniqueness attribute Documentation/ABI/testing/sysfs-bus-pci | 9 + drivers/pci/pci-sysfs.c | 21 + 2 files changed, 30 insertions(+) -- 2.17.1
[PATCH v17 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 25ea4ecb6449..b9d8607083eb 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.29.2
Re: [PATCH] configfs: make directories inherit uid/gid from creator
> > Currently a non-root process can create directories, but cannot > > create stuff in the directories it creates. > > Isn't that on purpose? Is it? What's the use case of being able to create empty directories you can't use? Why allow their creation in the first place then? > > + (void)configfs_setattr(dentry, &ia); > > No need for (void), here, right? Just being clear we're ignoring any potential error return. > And this feels like a big user-visible change, what is going to break > with this? That I don't know, but it's unlikely to break anything, since it's virtually impossible to use as it is now. If non-root creates the directory, it's currently root-owned, so can only be used by root (or something with appropriate caps). Something with capabilities will be able to use it even if it is no longer owned by root.
Re: [PATCH] configfs: make directories inherit uid/gid from creator
On Sat, Jan 23, 2021 at 12:55:16PM -0800, Maciej Żenczykowski wrote: > From: Maciej Żenczykowski > > Currently a non-root process can create directories, but cannot > create stuff in the directories it creates. Isn't that on purpose? > > Example (before this patch): > phone:/ $ id > uid=1000(system) gid=1000(system) groups=1000(system),... context=u:r:su:s0 > > phone:/ $ cd /config/usb_gadget/g1/configs/ > > phone:/config/usb_gadget/g1/configs $ ls -lZ > drwxr-xr-x 3 system system u:object_r:configfs:s0 0 2020-12-28 06:03 b.1 > > phone:/config/usb_gadget/g1/configs $ mkdir b.2 > > phone:/config/usb_gadget/g1/configs $ ls -lZ > drwxr-xr-x 3 system system u:object_r:configfs:s0 0 2020-12-28 06:03 b.1 > drwxr-xr-x 3 root root u:object_r:configfs:s0 0 2020-12-28 06:51 b.2 > > phone:/config/usb_gadget/g1/configs $ chown system:system b.2 > chown: 'b.2' to 'system:system': Operation not permitted > > phone:/config/usb_gadget/g1/configs $ ls -lZ b.1 > -rw-r--r-- 1 system system u:object_r:configfs:s0 4096 2020-12-28 05:23 > MaxPower > -rw-r--r-- 1 system system u:object_r:configfs:s0 4096 2020-12-28 05:23 > bmAttributes > lrwxrwxrwx 1 root root u:object_r:configfs:s0 0 2020-12-28 05:23 > function0 -> ../../../../usb_gadget/g1/functions/ffs.adb > drwxr-xr-x 2 system system u:object_r:configfs:s0 0 2020-12-28 05:23 > strings > > phone:/config/usb_gadget/g1/configs $ ln -s > ../../../../usb_gadget/g1/functions/ffs.adb b.2/function0 > ln: cannot create symbolic link from > '../../../../usb_gadget/g1/functions/ffs.adb' to 'b.2/function0': Permission > denied > > Cc: Greg Kroah-Hartman > Cc: Joel Becker > Cc: Christoph Hellwig > Signed-off-by: Maciej Żenczykowski > Change-Id: Ia907b2def940197b44aa87b337d37c5dde9c5b91 No need for "Change-Id:" :) > --- > fs/configfs/dir.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index b839dd1b459f..04f18402ef7c 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -1410,6 +1410,21 @@ static int configfs_mkdir(struct inode *dir, struct > dentry *dentry, umode_t mode > else > ret = configfs_attach_item(parent_item, item, dentry, frag); > > + /* inherit uid/gid from process creating the directory */ > + if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID) || > + !gid_eq(current_fsgid(), GLOBAL_ROOT_GID)) { > + struct inode * inode = d_inode(dentry); > + struct iattr ia = { > + .ia_uid = current_fsuid(), > + .ia_gid = current_fsgid(), > + .ia_valid = ATTR_UID | ATTR_GID, > + }; > + inode->i_uid = ia.ia_uid; > + inode->i_gid = ia.ia_gid; > + /* (note: the above manual assignments skip the permission > checks) */ > + (void)configfs_setattr(dentry, &ia); No need for (void), here, right? And this feels like a big user-visible change, what is going to break with this? thanks, greg k-h
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On 2021-01-25 14:25:38, Miklos Szeredi wrote: > On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks wrote: > > > > On 2021-01-19 17:22:03, Miklos Szeredi wrote: > > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > > > vfs_setxattr()") the translation of nscap->rootid did not take stacked > > > filesystems (overlayfs and ecryptfs) into account. > > > > > > That patch fixed the overlay case, but made the ecryptfs case worse. > > > > Thanks for sending a fix! > > > > I know that you don't have an eCryptfs setup to test with but I'm at a > > loss about how to test this from the userns/fscaps side of things. Do > > you have a sequence of unshare/setcap/getcap commands that I can run on > > a file inside of an eCryptfs mount to verify that the bug exists after > > 7c03e2cda4a5 and then again to verify that this patch fixes the bug? > > You need two terminals: > $ = > # = root > > $ unshare -Um > $ echo $$ > > # echo "0 1000 1" > uid_map > # cp uid_map gid_map > # echo 1000 2000 1 >> uid_map > # echo 2000 3000 1 >> uid_map > # cat uid_map > /proc//uid_map > # cat gid_map > /proc//gid_map > $ mkdir ~/tmp ~/mnt > $ mount -t tmpfs tmpfs ~/tmp > $ pwd > /home/ > # nsenter -t -m > # [setup ecryptfs on /home//mnt using /home//tmp] > $ cd ~/mnt > $ touch test > $ /sbin/setcap -n 1000 cap_dac_override+eip test > $ /sbin/getcap -n test > test = cap_dac_override+eip [rootid=1000] > > Without the patch, I'm thinking that it will do a double translate and > end up with rootid=2000 in the user namespace, but I might well have > messed it up... > > Let me know how this goes. Spot-on instructions. Thank you for taking the time to provide the steps. I was able to repro the bug and verify the fix. The change visually looks good to me and it passed through the eCryptfs regression tests. I've pushed it to the eCryptfs next branch and I plan to submit it to Linus on Thursday. Thanks again! Tyler > > Thanks, > Miklos >
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On Mon, Jan 25, 2021 at 2:25 PM Miklos Szeredi wrote: > > On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks wrote: > > > > On 2021-01-19 17:22:03, Miklos Szeredi wrote: > > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > > > vfs_setxattr()") the translation of nscap->rootid did not take stacked > > > filesystems (overlayfs and ecryptfs) into account. > > > > > > That patch fixed the overlay case, but made the ecryptfs case worse. > > > > Thanks for sending a fix! > > > > I know that you don't have an eCryptfs setup to test with but I'm at a > > loss about how to test this from the userns/fscaps side of things. Do > > you have a sequence of unshare/setcap/getcap commands that I can run on > > a file inside of an eCryptfs mount to verify that the bug exists after > > 7c03e2cda4a5 and then again to verify that this patch fixes the bug? > > You need two terminals: > $ = > # = root > > $ unshare -Um > $ echo $$ > > # echo "0 1000 1" > uid_map NOTE: is assumed to have uid=1000, so this and following "1000" values need to be fixed up if it's not the case. Thanks, Miklos
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On Fri, Jan 22, 2021 at 7:31 PM Tyler Hicks wrote: > > On 2021-01-19 17:22:03, Miklos Szeredi wrote: > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > > vfs_setxattr()") the translation of nscap->rootid did not take stacked > > filesystems (overlayfs and ecryptfs) into account. > > > > That patch fixed the overlay case, but made the ecryptfs case worse. > > Thanks for sending a fix! > > I know that you don't have an eCryptfs setup to test with but I'm at a > loss about how to test this from the userns/fscaps side of things. Do > you have a sequence of unshare/setcap/getcap commands that I can run on > a file inside of an eCryptfs mount to verify that the bug exists after > 7c03e2cda4a5 and then again to verify that this patch fixes the bug? You need two terminals: $ = # = root $ unshare -Um $ echo $$ # echo "0 1000 1" > uid_map # cp uid_map gid_map # echo 1000 2000 1 >> uid_map # echo 2000 3000 1 >> uid_map # cat uid_map > /proc//uid_map # cat gid_map > /proc//gid_map $ mkdir ~/tmp ~/mnt $ mount -t tmpfs tmpfs ~/tmp $ pwd /home/ # nsenter -t -m # [setup ecryptfs on /home//mnt using /home//tmp] $ cd ~/mnt $ touch test $ /sbin/setcap -n 1000 cap_dac_override+eip test $ /sbin/getcap -n test test = cap_dac_override+eip [rootid=1000] Without the patch, I'm thinking that it will do a double translate and end up with rootid=2000 in the user namespace, but I might well have messed it up... Let me know how this goes. Thanks, Miklos
[PATCH] configfs: make directories inherit uid/gid from creator
From: Maciej Żenczykowski Currently a non-root process can create directories, but cannot create stuff in the directories it creates. Example (before this patch): phone:/ $ id uid=1000(system) gid=1000(system) groups=1000(system),... context=u:r:su:s0 phone:/ $ cd /config/usb_gadget/g1/configs/ phone:/config/usb_gadget/g1/configs $ ls -lZ drwxr-xr-x 3 system system u:object_r:configfs:s0 0 2020-12-28 06:03 b.1 phone:/config/usb_gadget/g1/configs $ mkdir b.2 phone:/config/usb_gadget/g1/configs $ ls -lZ drwxr-xr-x 3 system system u:object_r:configfs:s0 0 2020-12-28 06:03 b.1 drwxr-xr-x 3 root root u:object_r:configfs:s0 0 2020-12-28 06:51 b.2 phone:/config/usb_gadget/g1/configs $ chown system:system b.2 chown: 'b.2' to 'system:system': Operation not permitted phone:/config/usb_gadget/g1/configs $ ls -lZ b.1 -rw-r--r-- 1 system system u:object_r:configfs:s0 4096 2020-12-28 05:23 MaxPower -rw-r--r-- 1 system system u:object_r:configfs:s0 4096 2020-12-28 05:23 bmAttributes lrwxrwxrwx 1 root root u:object_r:configfs:s0 0 2020-12-28 05:23 function0 -> ../../../../usb_gadget/g1/functions/ffs.adb drwxr-xr-x 2 system system u:object_r:configfs:s0 0 2020-12-28 05:23 strings phone:/config/usb_gadget/g1/configs $ ln -s ../../../../usb_gadget/g1/functions/ffs.adb b.2/function0 ln: cannot create symbolic link from '../../../../usb_gadget/g1/functions/ffs.adb' to 'b.2/function0': Permission denied Cc: Greg Kroah-Hartman Cc: Joel Becker Cc: Christoph Hellwig Signed-off-by: Maciej Żenczykowski Change-Id: Ia907b2def940197b44aa87b337d37c5dde9c5b91 --- fs/configfs/dir.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index b839dd1b459f..04f18402ef7c 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -1410,6 +1410,21 @@ static int configfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode else ret = configfs_attach_item(parent_item, item, dentry, frag); + /* inherit uid/gid from process creating the directory */ + if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID) || + !gid_eq(current_fsgid(), GLOBAL_ROOT_GID)) { + struct inode * inode = d_inode(dentry); + struct iattr ia = { + .ia_uid = current_fsuid(), + .ia_gid = current_fsgid(), + .ia_valid = ATTR_UID | ATTR_GID, + }; + inode->i_uid = ia.ia_uid; + inode->i_gid = ia.ia_gid; + /* (note: the above manual assignments skip the permission checks) */ + (void)configfs_setattr(dentry, &ia); + } + spin_lock(&configfs_dirent_lock); sd->s_type &= ~CONFIGFS_USET_IN_MKDIR; if (!ret) -- 2.30.0.280.ga3ce27912f-goog
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On 2021-01-19 17:22:03, Miklos Szeredi wrote: > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > vfs_setxattr()") the translation of nscap->rootid did not take stacked > filesystems (overlayfs and ecryptfs) into account. > > That patch fixed the overlay case, but made the ecryptfs case worse. Thanks for sending a fix! I know that you don't have an eCryptfs setup to test with but I'm at a loss about how to test this from the userns/fscaps side of things. Do you have a sequence of unshare/setcap/getcap commands that I can run on a file inside of an eCryptfs mount to verify that the bug exists after 7c03e2cda4a5 and then again to verify that this patch fixes the bug? Tyler > > Restore old the behavior for ecryptfs that existed before the overlayfs > fix. This does not fix ecryptfs's handling of complex user namespace > setups, but it does make sure existing setups don't regress. > > Reported-by: Eric W. Biederman > Cc: Tyler Hicks > Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()") > Signed-off-by: Miklos Szeredi > --- > fs/ecryptfs/inode.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index e23752d9a79f..58d0f7187997 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode > *inode, > { > int rc; > struct dentry *lower_dentry; > + struct inode *lower_inode; > > lower_dentry = ecryptfs_dentry_to_lower(dentry); > - if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) { > + lower_inode = d_inode(lower_dentry); > + if (!(lower_inode->i_opflags & IOP_XATTR)) { > rc = -EOPNOTSUPP; > goto out; > } > - rc = vfs_setxattr(lower_dentry, name, value, size, flags); > + inode_lock(lower_inode); > + rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, > NULL); > + inode_unlock(lower_inode); > if (!rc && inode) > - fsstack_copy_attr_all(inode, d_inode(lower_dentry)); > + fsstack_copy_attr_all(inode, lower_inode); > out: > return rc; > } > -- > 2.26.2 >
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On 2021-01-20 08:52:27, Miklos Szeredi wrote: > On Tue, Jan 19, 2021 at 10:11 PM Eric W. Biederman > wrote: > > > > Miklos Szeredi writes: > > > > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > > > vfs_setxattr()") the translation of nscap->rootid did not take stacked > > > filesystems (overlayfs and ecryptfs) into account. > > > > > > That patch fixed the overlay case, but made the ecryptfs case worse. > > > > > > Restore old the behavior for ecryptfs that existed before the overlayfs > > > fix. This does not fix ecryptfs's handling of complex user namespace > > > setups, but it does make sure existing setups don't regress. > > > > Today vfs_setxattr handles handles a delegated_inode and breaking > > leases. Code that is enabled with CONFIG_FILE_LOCKING. So unless > > I am missing something this introduces a different regression into > > ecryptfs. > > This is in line with all the other cases of ecryptfs passing NULL as > delegated inode. > > I'll defer this to the maintainer of ecryptfs. eCryptfs cannot be exported so I do not think this proposed fix to ecryptfs_setxattr() creates a new regression wrt inode delegation. Tyler > > Thanks, > Miklos
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/21/21 6:28 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 21, 2021 at 06:04:52PM +0100, Niklas Schnelle wrote: >> >> >> On 1/21/21 4:54 PM, Bjorn Helgaas wrote: >>> [Greg may be able to help compare/contrast this s390 UID with udev >>> persistent names] >>> >>> On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote: >>>> On 1/15/21 4:29 PM, Bjorn Helgaas wrote: >>>>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: >>>>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: >>>>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: >>>>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: >>>>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: >>>>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: >>>>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >>>>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>>>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: >>>>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >>>>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >>>>>> ... snip ... >>>>>> >>>>>>> >>>>>>>>if (!zpci_global_kset) >>>>>>>>return -ENOMEM; >>>>>>>> >>>>>>>>return sysfs_create_group(&zpci_global_kset->kobj, >>>>>>>> &zpci_attr_group_global); >>>>>>> >>>>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, >>>>>>> that's usually a huge clue that you are doing something wrong. >>>>>>> >>>>>>> Try the above again, with a simple attribute group, and name for it, and >>>>>>> it should "just work". >>>>>> >>>>>> I'm probably missing something but I don't get how this could work >>>>>> in this case. If I'm seeing this right the default attribute group >>>>>> here is pci_bus_type.bus_groups and that is already set in >>>>>> drivers/pci/pci-driver.c so I don't think I should set that. >>>>>> >>>>>> I did however find bus_create_file() which does work when using the >>>>>> path /sys/bus/pci/uid_checking instead. This would work for us if >>>>>> Bjorn is okay with that path and the code is really clean and simple >>>>>> too. >>>>>> >>>>>> That said, I think we could also add something like >>>>>> bus_create_group(). Then we could use that to also clean up >>>>>> drivers/pci/slot.c:pci_slot_init() and get the original path >>>>>> /sys/bus/pci/zpci/uid_checking. >>>>> >>>>> I don't think "uid_checking" is quite the right name. It says >>>>> something about the *implementation*, but it doesn't convey what that >>>>> *means* to userspace. IIUC this file tells userspace something about >>>>> whether a given PCI device always has the same PCI domain/bus/dev/fn >>>>> address (or maybe just the same domain?) >>>>> >>>>> It sounds like this feature could be useful beyond just s390, and >>>>> other arches might implement it differently, without the UID concept. >>>>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as >>>>> the name is not s390-specific (and "uid" sounds s390-specific). >>>>> >>>>> I assume it would also help with the udev/systemd end if you could >>>>> make this less s390 dependent. >>>> >>>> I've thought about this more and even implemented a proof of concept >>>> patch for a global attribute using a pcibios_has_reproducible_addressing() >>>> hook. >>>> >>>> However after implementing it I think as a more general and >>>> future proof concept it makes more sense to do this as a per device >>>> attribute, maybe as another flag in "stuct pci_dev" named something >>>> like "reliable_address". My reasoning behind this can
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 21, 2021 at 06:04:52PM +0100, Niklas Schnelle wrote: > > > On 1/21/21 4:54 PM, Bjorn Helgaas wrote: > > [Greg may be able to help compare/contrast this s390 UID with udev > > persistent names] > > > > On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote: > >> On 1/15/21 4:29 PM, Bjorn Helgaas wrote: > >>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: > >>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > >>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > >>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > >>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > >>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > >>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > >>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > >>>> ... snip ... > >>>> > >>>>> > >>>>>>if (!zpci_global_kset) > >>>>>>return -ENOMEM; > >>>>>> > >>>>>>return sysfs_create_group(&zpci_global_kset->kobj, > >>>>>> &zpci_attr_group_global); > >>>>> > >>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > >>>>> that's usually a huge clue that you are doing something wrong. > >>>>> > >>>>> Try the above again, with a simple attribute group, and name for it, and > >>>>> it should "just work". > >>>> > >>>> I'm probably missing something but I don't get how this could work > >>>> in this case. If I'm seeing this right the default attribute group > >>>> here is pci_bus_type.bus_groups and that is already set in > >>>> drivers/pci/pci-driver.c so I don't think I should set that. > >>>> > >>>> I did however find bus_create_file() which does work when using the > >>>> path /sys/bus/pci/uid_checking instead. This would work for us if > >>>> Bjorn is okay with that path and the code is really clean and simple > >>>> too. > >>>> > >>>> That said, I think we could also add something like > >>>> bus_create_group(). Then we could use that to also clean up > >>>> drivers/pci/slot.c:pci_slot_init() and get the original path > >>>> /sys/bus/pci/zpci/uid_checking. > >>> > >>> I don't think "uid_checking" is quite the right name. It says > >>> something about the *implementation*, but it doesn't convey what that > >>> *means* to userspace. IIUC this file tells userspace something about > >>> whether a given PCI device always has the same PCI domain/bus/dev/fn > >>> address (or maybe just the same domain?) > >>> > >>> It sounds like this feature could be useful beyond just s390, and > >>> other arches might implement it differently, without the UID concept. > >>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as > >>> the name is not s390-specific (and "uid" sounds s390-specific). > >>> > >>> I assume it would also help with the udev/systemd end if you could > >>> make this less s390 dependent. > >> > >> I've thought about this more and even implemented a proof of concept > >> patch for a global attribute using a pcibios_has_reproducible_addressing() > >> hook. > >> > >> However after implementing it I think as a more general and > >> future proof concept it makes more sense to do this as a per device > >> attribute, maybe as another flag in "stuct pci_dev" named something > >> like "reliable_address". My reasoning behind this can be best be seen > >> with a QEMU example. While I expect that QEMU can easily guarantee > >> that one can always use ":01:00.0" for a virtio-pci NIC and > >> thus enp1s0 interface name, the same might be harder to guarantee &g
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/21/21 4:54 PM, Bjorn Helgaas wrote: > [Greg may be able to help compare/contrast this s390 UID with udev > persistent names] > > On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote: >> On 1/15/21 4:29 PM, Bjorn Helgaas wrote: >>> On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: >>>> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: >>>>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: >>>>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: >>>>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: >>>>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: >>>>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >>>>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>>>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: >>>>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >>>>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >>>> ... snip ... >>>> >>>>> >>>>>> if (!zpci_global_kset) >>>>>> return -ENOMEM; >>>>>> >>>>>> return sysfs_create_group(&zpci_global_kset->kobj, >>>>>> &zpci_attr_group_global); >>>>> >>>>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, >>>>> that's usually a huge clue that you are doing something wrong. >>>>> >>>>> Try the above again, with a simple attribute group, and name for it, and >>>>> it should "just work". >>>> >>>> I'm probably missing something but I don't get how this could work >>>> in this case. If I'm seeing this right the default attribute group >>>> here is pci_bus_type.bus_groups and that is already set in >>>> drivers/pci/pci-driver.c so I don't think I should set that. >>>> >>>> I did however find bus_create_file() which does work when using the >>>> path /sys/bus/pci/uid_checking instead. This would work for us if >>>> Bjorn is okay with that path and the code is really clean and simple >>>> too. >>>> >>>> That said, I think we could also add something like >>>> bus_create_group(). Then we could use that to also clean up >>>> drivers/pci/slot.c:pci_slot_init() and get the original path >>>> /sys/bus/pci/zpci/uid_checking. >>> >>> I don't think "uid_checking" is quite the right name. It says >>> something about the *implementation*, but it doesn't convey what that >>> *means* to userspace. IIUC this file tells userspace something about >>> whether a given PCI device always has the same PCI domain/bus/dev/fn >>> address (or maybe just the same domain?) >>> >>> It sounds like this feature could be useful beyond just s390, and >>> other arches might implement it differently, without the UID concept. >>> If so, I'm OK with something at the /sys/bus/pci/xxx level as long as >>> the name is not s390-specific (and "uid" sounds s390-specific). >>> >>> I assume it would also help with the udev/systemd end if you could >>> make this less s390 dependent. >> >> I've thought about this more and even implemented a proof of concept >> patch for a global attribute using a pcibios_has_reproducible_addressing() >> hook. >> >> However after implementing it I think as a more general and >> future proof concept it makes more sense to do this as a per device >> attribute, maybe as another flag in "stuct pci_dev" named something >> like "reliable_address". My reasoning behind this can be best be seen >> with a QEMU example. While I expect that QEMU can easily guarantee >> that one can always use ":01:00.0" for a virtio-pci NIC and >> thus enp1s0 interface name, the same might be harder to guarantee >> for a SR-IOV VF passed through with vfio-pci in that same VM and >> even less so if a thunderbolt controller is passed through and >> enumeration may depend on daisy chaining. The QEMU example >> also applies to s390 and maybe others will in the future. > > I'm a little wary of using the PCI geographical address > (":01:00.0") as a stable name. Even if you can make a way to use > that to identi
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
[Greg may be able to help compare/contrast this s390 UID with udev persistent names] On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote: > On 1/15/21 4:29 PM, Bjorn Helgaas wrote: > > On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: > >> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > >>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > >>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > >>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > >>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > >>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > >>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > >> ... snip ... > >> > >>> > >>>> if (!zpci_global_kset) > >>>> return -ENOMEM; > >>>> > >>>> return sysfs_create_group(&zpci_global_kset->kobj, > >>>> &zpci_attr_group_global); > >>> > >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > >>> that's usually a huge clue that you are doing something wrong. > >>> > >>> Try the above again, with a simple attribute group, and name for it, and > >>> it should "just work". > >> > >> I'm probably missing something but I don't get how this could work > >> in this case. If I'm seeing this right the default attribute group > >> here is pci_bus_type.bus_groups and that is already set in > >> drivers/pci/pci-driver.c so I don't think I should set that. > >> > >> I did however find bus_create_file() which does work when using the > >> path /sys/bus/pci/uid_checking instead. This would work for us if > >> Bjorn is okay with that path and the code is really clean and simple > >> too. > >> > >> That said, I think we could also add something like > >> bus_create_group(). Then we could use that to also clean up > >> drivers/pci/slot.c:pci_slot_init() and get the original path > >> /sys/bus/pci/zpci/uid_checking. > > > > I don't think "uid_checking" is quite the right name. It says > > something about the *implementation*, but it doesn't convey what that > > *means* to userspace. IIUC this file tells userspace something about > > whether a given PCI device always has the same PCI domain/bus/dev/fn > > address (or maybe just the same domain?) > > > > It sounds like this feature could be useful beyond just s390, and > > other arches might implement it differently, without the UID concept. > > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as > > the name is not s390-specific (and "uid" sounds s390-specific). > > > > I assume it would also help with the udev/systemd end if you could > > make this less s390 dependent. > > I've thought about this more and even implemented a proof of concept > patch for a global attribute using a pcibios_has_reproducible_addressing() > hook. > > However after implementing it I think as a more general and > future proof concept it makes more sense to do this as a per device > attribute, maybe as another flag in "stuct pci_dev" named something > like "reliable_address". My reasoning behind this can be best be seen > with a QEMU example. While I expect that QEMU can easily guarantee > that one can always use ":01:00.0" for a virtio-pci NIC and > thus enp1s0 interface name, the same might be harder to guarantee > for a SR-IOV VF passed through with vfio-pci in that same VM and > even less so if a thunderbolt controller is passed through and > enumeration may depend on daisy chaining. The QEMU example > also applies to s390 and maybe others will in the future. I'm a little wary of using the PCI geographical address (":01:00.0") as a stable name. Even if you can make a way to use that to identify a specific device instance, regardless of how it is plugged in or passed through, it sounds like we could end up with "physical PCI addresses" and "virtual PCI addresses" that look the same and would cause confusion. This concept sounds similar to the udev concept of a "persistent device name". What advantages does this s390 UID have over the udev approach? There are optional PCI device serial numbers that we currently don't really make use of. Would that be a generic way to help with this?
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 21, 2021 at 09:54:45AM -0600, Bjorn Helgaas wrote: > [Greg may be able to help compare/contrast this s390 UID with udev > persistent names] > > On Thu, Jan 21, 2021 at 04:31:55PM +0100, Niklas Schnelle wrote: > > On 1/15/21 4:29 PM, Bjorn Helgaas wrote: > > > On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: > > >> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > > >>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > > >>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > > >>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > > >>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > > >>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > > >>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > > >>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > > >>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > > >>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > > >> ... snip ... > > >> > > >>> > > >>>>if (!zpci_global_kset) > > >>>>return -ENOMEM; > > >>>> > > >>>>return sysfs_create_group(&zpci_global_kset->kobj, > > >>>> &zpci_attr_group_global); > > >>> > > >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > > >>> that's usually a huge clue that you are doing something wrong. > > >>> > > >>> Try the above again, with a simple attribute group, and name for it, and > > >>> it should "just work". > > >> > > >> I'm probably missing something but I don't get how this could work > > >> in this case. If I'm seeing this right the default attribute group > > >> here is pci_bus_type.bus_groups and that is already set in > > >> drivers/pci/pci-driver.c so I don't think I should set that. > > >> > > >> I did however find bus_create_file() which does work when using the > > >> path /sys/bus/pci/uid_checking instead. This would work for us if > > >> Bjorn is okay with that path and the code is really clean and simple > > >> too. > > >> > > >> That said, I think we could also add something like > > >> bus_create_group(). Then we could use that to also clean up > > >> drivers/pci/slot.c:pci_slot_init() and get the original path > > >> /sys/bus/pci/zpci/uid_checking. > > > > > > I don't think "uid_checking" is quite the right name. It says > > > something about the *implementation*, but it doesn't convey what that > > > *means* to userspace. IIUC this file tells userspace something about > > > whether a given PCI device always has the same PCI domain/bus/dev/fn > > > address (or maybe just the same domain?) > > > > > > It sounds like this feature could be useful beyond just s390, and > > > other arches might implement it differently, without the UID concept. > > > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as > > > the name is not s390-specific (and "uid" sounds s390-specific). > > > > > > I assume it would also help with the udev/systemd end if you could > > > make this less s390 dependent. > > > > I've thought about this more and even implemented a proof of concept > > patch for a global attribute using a pcibios_has_reproducible_addressing() > > hook. > > > > However after implementing it I think as a more general and > > future proof concept it makes more sense to do this as a per device > > attribute, maybe as another flag in "stuct pci_dev" named something > > like "reliable_address". My reasoning behind this can be best be seen > > with a QEMU example. While I expect that QEMU can easily guarantee > > that one can always use ":01:00.0" for a virtio-pci NIC and > > thus enp1s0 interface name, the same might be harder to guarantee > > for a SR-IOV VF passed through with vfio-pci in that same VM and > > even less so if a thunderbolt controller is passed through and > > enumeration may depend on daisy chaining. The QEMU example > > also applies to s390 and maybe others will in the future. > > I
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/15/21 4:29 PM, Bjorn Helgaas wrote: > On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: >> On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: >>> On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: >>>> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: >>>>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: >>>>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: >>>>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >>>>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>>>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: >>>>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >>>>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >> ... snip ... >> >>> >>>>if (!zpci_global_kset) >>>>return -ENOMEM; >>>> >>>>return sysfs_create_group(&zpci_global_kset->kobj, >>>> &zpci_attr_group_global); >>> >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, >>> that's usually a huge clue that you are doing something wrong. >>> >>> Try the above again, with a simple attribute group, and name for it, and >>> it should "just work". >> >> I'm probably missing something but I don't get how this could work >> in this case. If I'm seeing this right the default attribute group >> here is pci_bus_type.bus_groups and that is already set in >> drivers/pci/pci-driver.c so I don't think I should set that. >> >> I did however find bus_create_file() which does work when using the >> path /sys/bus/pci/uid_checking instead. This would work for us if >> Bjorn is okay with that path and the code is really clean and simple >> too. >> >> That said, I think we could also add something like >> bus_create_group(). Then we could use that to also clean up >> drivers/pci/slot.c:pci_slot_init() and get the original path >> /sys/bus/pci/zpci/uid_checking. > > I don't think "uid_checking" is quite the right name. It says > something about the *implementation*, but it doesn't convey what that > *means* to userspace. IIUC this file tells userspace something about > whether a given PCI device always has the same PCI domain/bus/dev/fn > address (or maybe just the same domain?) > > It sounds like this feature could be useful beyond just s390, and > other arches might implement it differently, without the UID concept. > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as > the name is not s390-specific (and "uid" sounds s390-specific). > > I assume it would also help with the udev/systemd end if you could > make this less s390 dependent. > > Bjorn > Hi Bjorn, I've thought about this more and even implemented a proof of concept patch for a global attribute using a pcibios_has_reproducible_addressing() hook. However after implementing it I think as a more general and future proof concept it makes more sense to do this as a per device attribute, maybe as another flag in "stuct pci_dev" named something like "reliable_address". My reasoning behind this can be best be seen with a QEMU example. While I expect that QEMU can easily guarantee that one can always use ":01:00.0" for a virtio-pci NIC and thus enp1s0 interface name, the same might be harder to guarantee for a SR-IOV VF passed through with vfio-pci in that same VM and even less so if a thunderbolt controller is passed through and enumeration may depend on daisy chaining. The QEMU example also applies to s390 and maybe others will in the future. I'll send an RFC for a per device attribute but didn't want to let this discussion stand as if we had abandoned the idea and if you would prefer a global attribute I can also send an RFC for that. Besst regards, Niklas
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
On Tue, Jan 19, 2021 at 10:11 PM Eric W. Biederman wrote: > > Miklos Szeredi writes: > > > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > > vfs_setxattr()") the translation of nscap->rootid did not take stacked > > filesystems (overlayfs and ecryptfs) into account. > > > > That patch fixed the overlay case, but made the ecryptfs case worse. > > > > Restore old the behavior for ecryptfs that existed before the overlayfs > > fix. This does not fix ecryptfs's handling of complex user namespace > > setups, but it does make sure existing setups don't regress. > > Today vfs_setxattr handles handles a delegated_inode and breaking > leases. Code that is enabled with CONFIG_FILE_LOCKING. So unless > I am missing something this introduces a different regression into > ecryptfs. This is in line with all the other cases of ecryptfs passing NULL as delegated inode. I'll defer this to the maintainer of ecryptfs. Thanks, Miklos
Re: [PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
Miklos Szeredi writes: > Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into > vfs_setxattr()") the translation of nscap->rootid did not take stacked > filesystems (overlayfs and ecryptfs) into account. > > That patch fixed the overlay case, but made the ecryptfs case worse. > > Restore old the behavior for ecryptfs that existed before the overlayfs > fix. This does not fix ecryptfs's handling of complex user namespace > setups, but it does make sure existing setups don't regress. Today vfs_setxattr handles handles a delegated_inode and breaking leases. Code that is enabled with CONFIG_FILE_LOCKING. So unless I am missing something this introduces a different regression into ecryptfs. > > Reported-by: Eric W. Biederman > Cc: Tyler Hicks > Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()") > Signed-off-by: Miklos Szeredi > --- > fs/ecryptfs/inode.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index e23752d9a79f..58d0f7187997 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode > *inode, > { > int rc; > struct dentry *lower_dentry; > + struct inode *lower_inode; > > lower_dentry = ecryptfs_dentry_to_lower(dentry); > - if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) { > + lower_inode = d_inode(lower_dentry); > + if (!(lower_inode->i_opflags & IOP_XATTR)) { > rc = -EOPNOTSUPP; > goto out; > } > - rc = vfs_setxattr(lower_dentry, name, value, size, flags); > + inode_lock(lower_inode); > + rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, > NULL); > + inode_unlock(lower_inode); > if (!rc && inode) > - fsstack_copy_attr_all(inode, d_inode(lower_dentry)); > + fsstack_copy_attr_all(inode, lower_inode); > out: > return rc; > } Eric
[PATCH 1/2] ecryptfs: fix uid translation for setxattr on security.capability
Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()") the translation of nscap->rootid did not take stacked filesystems (overlayfs and ecryptfs) into account. That patch fixed the overlay case, but made the ecryptfs case worse. Restore old the behavior for ecryptfs that existed before the overlayfs fix. This does not fix ecryptfs's handling of complex user namespace setups, but it does make sure existing setups don't regress. Reported-by: Eric W. Biederman Cc: Tyler Hicks Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()") Signed-off-by: Miklos Szeredi --- fs/ecryptfs/inode.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index e23752d9a79f..58d0f7187997 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -1016,15 +1016,19 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode *inode, { int rc; struct dentry *lower_dentry; + struct inode *lower_inode; lower_dentry = ecryptfs_dentry_to_lower(dentry); - if (!(d_inode(lower_dentry)->i_opflags & IOP_XATTR)) { + lower_inode = d_inode(lower_dentry); + if (!(lower_inode->i_opflags & IOP_XATTR)) { rc = -EOPNOTSUPP; goto out; } - rc = vfs_setxattr(lower_dentry, name, value, size, flags); + inode_lock(lower_inode); + rc = __vfs_setxattr_locked(lower_dentry, name, value, size, flags, NULL); + inode_unlock(lower_inode); if (!rc && inode) - fsstack_copy_attr_all(inode, d_inode(lower_dentry)); + fsstack_copy_attr_all(inode, lower_inode); out: return rc; } -- 2.26.2
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
set) >>>>return -ENOMEM; >>>> >>>>return sysfs_create_group(&zpci_global_kset->kobj, >>>> &zpci_attr_group_global); >>> >>> Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, >>> that's usually a huge clue that you are doing something wrong. >>> >>> Try the above again, with a simple attribute group, and name for it, and >>> it should "just work". >> >> I'm probably missing something but I don't get how this could work >> in this case. If I'm seeing this right the default attribute group >> here is pci_bus_type.bus_groups and that is already set in >> drivers/pci/pci-driver.c so I don't think I should set that. >> >> I did however find bus_create_file() which does work when using the >> path /sys/bus/pci/uid_checking instead. This would work for us if >> Bjorn is okay with that path and the code is really clean and simple >> too. >> >> That said, I think we could also add something like >> bus_create_group(). Then we could use that to also clean up >> drivers/pci/slot.c:pci_slot_init() and get the original path >> /sys/bus/pci/zpci/uid_checking. > > I don't think "uid_checking" is quite the right name. It says > something about the *implementation*, but it doesn't convey what that > *means* to userspace. IIUC this file tells userspace something about > whether a given PCI device always has the same PCI domain/bus/dev/fn > address (or maybe just the same domain?) Yes you're right, in fact we internally also started to think about something more meaning oriented like "unique_uids". This indeed results in PCI addresses which can be relied upon as stable identifiers. For us it's enough that the domain matches as the bus is always 0 and dev/fn are determined by the device and SR-IOV stride etc. so will remain the same for equivalent configurations. > > It sounds like this feature could be useful beyond just s390, and > other arches might implement it differently, without the UID concept. > If so, I'm OK with something at the /sys/bus/pci/xxx level as long as > the name is not s390-specific (and "uid" sounds s390-specific). > > I assume it would also help with the udev/systemd end if you could > make this less s390 dependent. > > Bjorn That's a very good point! I'm absolutely open to making this a common concept. I think the gist could be that the addressing/ids on the bus are reproducible, there might be some user configuration required but then they can be relied upon for stable names like network interfaces. So maybe a name like "reproducible_addressing"? I guess one non-s390 exclusive case of this could be virtualized PCI under QEMU/KVM etc. versus real hardware or even UEFI/Firmware guarantees, right?
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: > On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > > On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > >> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > >>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > >>>> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > >>>>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >>>>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>>>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > >>>>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >>>>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > ... snip ... > >>>>>> > >>>>>> Hey Niklas et al. :) > >>>>>> > >>>>>> I think this will need input from Greg. He should be best versed in > >>>>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's > >>>>>> supposed to be kernel internal. Now, that might just be a matter of > >>>>>> renaming the macro but let's see whether Greg has any better idea or > >>>>>> more questions. :) > >>>>> > >>>>> The big question is, why are you needing this? > >>>>> > >>>>> No driver or driver subsystem should EVER be messing with a "raw" > >>>>> kobject like this. Just use the existing DEVICE_* macros instead > >>>>> please. > >>>>> > >>>>> If you are using a raw kobject, please ask me how to do this properly, > >>>>> as that is something that should NEVER show up in the /sys/devices/* > >>>>> tree. Otherwise userspace tools will break. > >>>>> > >>>>> thanks, > >>>>> > >>>>> greg k-h > >>>> > >>>> Hi Greg, > >>>> > >>>> this is for an architecture specific but global i.e. not device bound PCI > >>>> attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work > >>>> but only if we place the attribute directly under /sys/bus/pci/new_attr. > >>> > >>> Then you are doing something wrong :) > >> > >> That is very possible. > >> > >>> > >>> Where _exactly_ are you wanting to put this attribute? > >> > >> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using > >> the below code and the attribute even shows up but reading > >> it gives me two 0 bytes only. > >> The relevant code is only a slight alteration of the original patch > >> as follows: > >> > >> static ssize_t uid_checking_show(struct bus_type *bus, char *buf) > >> { > >>return sprintf(buf, "%i\n", zpci_unique_uid); > >> } > >> static BUS_ATTR_RO(uid_checking); > >> > >> static struct kset *zpci_global_kset; > >> > >> static struct attribute *zpci_attrs_global[] = { > >>&bus_attr_uid_checking.attr, > >>NULL, > >> }; > >> > >> static struct attribute_group zpci_attr_group_global = { > >>.attrs = zpci_attrs_global, > >> }; > > > > Name your attribute group, and then you do not have to mess with a > > "raw" kobject like you are below: > > Thanks for this tip and sorry for bothering you again. > > > > >> > >> int __init zpci_sysfs_init(void) > >> { > >>struct kset *pci_bus_kset; > >> > >>pci_bus_kset = bus_get_kset(&pci_bus_type); > >> > >>zpci_global_kset = kset_create_and_add("zpci", NULL, > >> &pci_bus_kset->kobj); > > > > No, do not mess with at kset, just set the default attribute group for > > the bus to the above, and you should be fine. > > Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in > drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too. > > > > >>if (!zpci_global_kset) > >>return -ENOMEM; > >> > >>return sysfs_create_group(&zpci_global_kset->kobj, > >> &zpci_attr_group_global); > > > > Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > > that's usually a huge clue that you are doing
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Fri, Jan 15, 2021 at 12:20:59PM +0100, Niklas Schnelle wrote: > > > On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > > On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > >> > >> > >> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > >>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > > > On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>> > >>> > >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > > On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > ... snip ... > >> > >> Hey Niklas et al. :) > >> > >> I think this will need input from Greg. He should be best versed in > >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's > >> supposed to be kernel internal. Now, that might just be a matter of > >> renaming the macro but let's see whether Greg has any better idea or > >> more questions. :) > > > > The big question is, why are you needing this? > > > > No driver or driver subsystem should EVER be messing with a "raw" > > kobject like this. Just use the existing DEVICE_* macros instead > > please. > > > > If you are using a raw kobject, please ask me how to do this properly, > > as that is something that should NEVER show up in the /sys/devices/* > > tree. Otherwise userspace tools will break. > > > > thanks, > > > > greg k-h > > Hi Greg, > > this is for an architecture specific but global i.e. not device bound PCI > attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work > but only if we place the attribute directly under /sys/bus/pci/new_attr. > >>> > >>> Then you are doing something wrong :) > >> > >> That is very possible. > >> > >>> > >>> Where _exactly_ are you wanting to put this attribute? > >> > >> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using > >> the below code and the attribute even shows up but reading > >> it gives me two 0 bytes only. > >> The relevant code is only a slight alteration of the original patch > >> as follows: > >> > >> static ssize_t uid_checking_show(struct bus_type *bus, char *buf) > >> { > >>return sprintf(buf, "%i\n", zpci_unique_uid); > >> } > >> static BUS_ATTR_RO(uid_checking); > >> > >> static struct kset *zpci_global_kset; > >> > >> static struct attribute *zpci_attrs_global[] = { > >>&bus_attr_uid_checking.attr, > >>NULL, > >> }; > >> > >> static struct attribute_group zpci_attr_group_global = { > >>.attrs = zpci_attrs_global, > >> }; > > > > Name your attribute group, and then you do not have to mess with a > > "raw" kobject like you are below: > > Thanks for this tip and sorry for bothering you again. > > > > >> > >> int __init zpci_sysfs_init(void) > >> { > >>struct kset *pci_bus_kset; > >> > >>pci_bus_kset = bus_get_kset(&pci_bus_type); > >> > >>zpci_global_kset = kset_create_and_add("zpci", NULL, > >> &pci_bus_kset->kobj); > > > > No, do not mess with at kset, just set the default attribute group for > > the bus to the above, and you should be fine. > > Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in > drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too. Slots are "interesting" and that code is really old, we know how to do things better now :) But I doubt we should change anything there, as it does work, and userspace is used to how they come/go. > >>if (!zpci_global_kset) > >>return -ENOMEM; > >> > >>return sysfs_create_group(&zpci_global_kset->kobj, > >> &zpci_attr_group_global); > > > > Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > > that's usually a huge clue that you are doing something wrong. > > > > Try the above again, with a simple attribute group, and name for it, and > > it should "just work". > > I'm probably missing something but I don't get how this could work in > this case. If I'm seeing this right the default attribute group here > is pci_bus_type.bus_groups and that is already set in drivers/pci/pci-driver.c > so I don't think I should set that. Yes, add your group to that list of groups and all should be good. > I did however find bus_create_file() which does work when using the path > /sys/bus/pci/uid_checking instead. This would work for us if Bjorn is okay > with > that path and the code is really clean and simple too. No, use the above group you already have please. > That said, I think we could also add something like bus_create_group(). No, use the group list as show above please. > Then we could use that to also clean up drivers/pci/slot.c:pci_slot_init() > and get the original path /sys/bus/pci/zpci/uid_checking. What needs to be cleaned up there? > I thin
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/14/21 5:14 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: >> >> >> On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: >>> On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>> >>> >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > On 1/12/21 10:50 PM, Bjorn Helgaas wrote: ... snip ... >> >> Hey Niklas et al. :) >> >> I think this will need input from Greg. He should be best versed in >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's >> supposed to be kernel internal. Now, that might just be a matter of >> renaming the macro but let's see whether Greg has any better idea or >> more questions. :) > > The big question is, why are you needing this? > > No driver or driver subsystem should EVER be messing with a "raw" > kobject like this. Just use the existing DEVICE_* macros instead > please. > > If you are using a raw kobject, please ask me how to do this properly, > as that is something that should NEVER show up in the /sys/devices/* > tree. Otherwise userspace tools will break. > > thanks, > > greg k-h Hi Greg, this is for an architecture specific but global i.e. not device bound PCI attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work but only if we place the attribute directly under /sys/bus/pci/new_attr. >>> >>> Then you are doing something wrong :) >> >> That is very possible. >> >>> >>> Where _exactly_ are you wanting to put this attribute? >> >> I'm trying for /sys/bus/pci/zpci/uid_checking, I'm using >> the below code and the attribute even shows up but reading >> it gives me two 0 bytes only. >> The relevant code is only a slight alteration of the original patch >> as follows: >> >> static ssize_t uid_checking_show(struct bus_type *bus, char *buf) >> { >> return sprintf(buf, "%i\n", zpci_unique_uid); >> } >> static BUS_ATTR_RO(uid_checking); >> >> static struct kset *zpci_global_kset; >> >> static struct attribute *zpci_attrs_global[] = { >> &bus_attr_uid_checking.attr, >> NULL, >> }; >> >> static struct attribute_group zpci_attr_group_global = { >> .attrs = zpci_attrs_global, >> }; > > Name your attribute group, and then you do not have to mess with a > "raw" kobject like you are below: Thanks for this tip and sorry for bothering you again. > >> >> int __init zpci_sysfs_init(void) >> { >> struct kset *pci_bus_kset; >> >> pci_bus_kset = bus_get_kset(&pci_bus_type); >> >> zpci_global_kset = kset_create_and_add("zpci", NULL, >> &pci_bus_kset->kobj); > > No, do not mess with at kset, just set the default attribute group for > the bus to the above, and you should be fine. Oh ok, I got this idea from the code adding /sys/bus/pci/slots/ in drivers/pci/slot.c:pci_slot_init(). See below maybe we can clean that up too. > >> if (!zpci_global_kset) >> return -ENOMEM; >> >> return sysfs_create_group(&zpci_global_kset->kobj, >> &zpci_attr_group_global); > > Huge hint, if in a driver, or bus subsystem, and you call sysfs_*, > that's usually a huge clue that you are doing something wrong. > > Try the above again, with a simple attribute group, and name for it, and > it should "just work". I'm probably missing something but I don't get how this could work in this case. If I'm seeing this right the default attribute group here is pci_bus_type.bus_groups and that is already set in drivers/pci/pci-driver.c so I don't think I should set that. I did however find bus_create_file() which does work when using the path /sys/bus/pci/uid_checking instead. This would work for us if Bjorn is okay with that path and the code is really clean and simple too. That said, I think we could also add something like bus_create_group(). Then we could use that to also clean up drivers/pci/slot.c:pci_slot_init() and get the original path /sys/bus/pci/zpci/uid_checking. I think this would also allow us to get rid of pci_bus_get_kset() which is only used in that function and seems to me like it encourages use of raw ksets. Or I'm completely off the mark and just missing something important. thanks, Niklas
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 14, 2021 at 04:51:17PM +0100, Niklas Schnelle wrote: > > > On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > > On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > >> > >> > >> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > >>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>>>> > >>>>> > >>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > >>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > >>>>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > >>>>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if > >>>>>>>>> there are multiple functions in an adapter, as PCI domain if and > >>>>>>>>> only if > >>>>>>>>> UID Checking is turned on. > >>>>>>>>> Otherwise we automatically generate domains as devices appear. > >>>>>>>>> > >>>>>>>>> The state of UID Checking is thus essential to know if the PCI > >>>>>>>>> domain > >>>>>>>>> will be stable, yet currently there is no way to access this > >>>>>>>>> information > >>>>>>>>> from userspace. > >>>>>>>>> So let's solve this by showing the state of UID checking as a sysfs > >>>>>>>>> attribute in /sys/bus/pci/uid_checking > >>>>>> > >>>>>>>>> +/* Global zPCI attributes */ > >>>>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj, > >>>>>>>>> +struct kobj_attribute *attr, char *buf) > >>>>>>>>> +{ > >>>>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = > >>>>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); > >>>>>>>> > >>>>>>>> Use DEVICE_ATTR_RO instead of __ATTR. > >>>>>>> > >>>>>>> It's my understanding that DEVICE_ATTR_* is only for > >>>>>>> per device attributes. This one is global for the entire > >>>>>>> Z PCI. I just tried with BUS_ATTR_RO instead > >>>>>>> and that works but only if I put the attribute at > >>>>>>> /sys/bus/pci/uid_checking instead of with a zpci > >>>>>>> subfolder. This path would work for us too, we > >>>>>>> currently don't have any other global attributes > >>>>>>> that we are planning to expose but those could of > >>>>>>> course come up in the future. > >>>>>> > >>>>>> Ah, I missed the fact that this is a kobj_attribute, not a > >>>>>> device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but > >>>>>> seems like it might fit? > >>>>>> > >>>>>> Bjorn > >>>>>> > >>>>> > >>>>> KERNEL_ATTR_* is currently not exported in any header. After > >>>>> adding it to include/linuc/sysfs.h it indeed works perfectly. > >>>>> Adding Christian Brauner as suggested by get_maintainers for > >>>>> their opinion. I'm of course willing to provide a patch > >>>> > >>>> Hey Niklas et al. :) > >>>> > >>>> I think this will need input from Greg. He should be best versed in > >>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's > >>>> supposed to be kernel internal. Now, that might just be a matter of > >>>> renaming the macro but let's see whether Greg has any better idea or > >>>> more questions. :) > >>> > >>> The big question is, why are you needing this? > >>> > >>> No driver or driver subsystem should EVER be messing with a "raw&
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/14/21 4:17 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: >> >> >> On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: >>> On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >>>> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>>>> >>>>> >>>>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: >>>>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >>>>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >>>>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: >>>>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if >>>>>>>>> there are multiple functions in an adapter, as PCI domain if and only >>>>>>>>> if >>>>>>>>> UID Checking is turned on. >>>>>>>>> Otherwise we automatically generate domains as devices appear. >>>>>>>>> >>>>>>>>> The state of UID Checking is thus essential to know if the PCI domain >>>>>>>>> will be stable, yet currently there is no way to access this >>>>>>>>> information >>>>>>>>> from userspace. >>>>>>>>> So let's solve this by showing the state of UID checking as a sysfs >>>>>>>>> attribute in /sys/bus/pci/uid_checking >>>>>> >>>>>>>>> +/* Global zPCI attributes */ >>>>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj, >>>>>>>>> + struct kobj_attribute *attr, char *buf) >>>>>>>>> +{ >>>>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid); >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = >>>>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); >>>>>>>> >>>>>>>> Use DEVICE_ATTR_RO instead of __ATTR. >>>>>>> >>>>>>> It's my understanding that DEVICE_ATTR_* is only for >>>>>>> per device attributes. This one is global for the entire >>>>>>> Z PCI. I just tried with BUS_ATTR_RO instead >>>>>>> and that works but only if I put the attribute at >>>>>>> /sys/bus/pci/uid_checking instead of with a zpci >>>>>>> subfolder. This path would work for us too, we >>>>>>> currently don't have any other global attributes >>>>>>> that we are planning to expose but those could of >>>>>>> course come up in the future. >>>>>> >>>>>> Ah, I missed the fact that this is a kobj_attribute, not a >>>>>> device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but >>>>>> seems like it might fit? >>>>>> >>>>>> Bjorn >>>>>> >>>>> >>>>> KERNEL_ATTR_* is currently not exported in any header. After >>>>> adding it to include/linuc/sysfs.h it indeed works perfectly. >>>>> Adding Christian Brauner as suggested by get_maintainers for >>>>> their opinion. I'm of course willing to provide a patch >>>> >>>> Hey Niklas et al. :) >>>> >>>> I think this will need input from Greg. He should be best versed in >>>> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's >>>> supposed to be kernel internal. Now, that might just be a matter of >>>> renaming the macro but let's see whether Greg has any better idea or >>>> more questions. :) >>> >>> The big question is, why are you needing this? >>> >>> No driver or driver subsystem should EVER be messing with a "raw" >>> kobject like this. Just use the existing DEVICE_* macros instead >>> please. >>> >>> If you are using a raw kobject, please ask me how to do this properly, >>> as that is something that should NEVER show up in the /sys/devices/* >>> tree. Otherwise userspace tools will break. >>> >>> thanks, >>> >>> greg k-h >> >> Hi Greg, >> >> this is for an archite
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 14, 2021 at 04:06:11PM +0100, Niklas Schnelle wrote: > > > On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > >>> > >>> > >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > >>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > >>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > >>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if > >>>>>>> there are multiple functions in an adapter, as PCI domain if and only > >>>>>>> if > >>>>>>> UID Checking is turned on. > >>>>>>> Otherwise we automatically generate domains as devices appear. > >>>>>>> > >>>>>>> The state of UID Checking is thus essential to know if the PCI domain > >>>>>>> will be stable, yet currently there is no way to access this > >>>>>>> information > >>>>>>> from userspace. > >>>>>>> So let's solve this by showing the state of UID checking as a sysfs > >>>>>>> attribute in /sys/bus/pci/uid_checking > >>>> > >>>>>>> +/* Global zPCI attributes */ > >>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj, > >>>>>>> + struct kobj_attribute *attr, char *buf) > >>>>>>> +{ > >>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid); > >>>>>>> +} > >>>>>>> + > >>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = > >>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); > >>>>>> > >>>>>> Use DEVICE_ATTR_RO instead of __ATTR. > >>>>> > >>>>> It's my understanding that DEVICE_ATTR_* is only for > >>>>> per device attributes. This one is global for the entire > >>>>> Z PCI. I just tried with BUS_ATTR_RO instead > >>>>> and that works but only if I put the attribute at > >>>>> /sys/bus/pci/uid_checking instead of with a zpci > >>>>> subfolder. This path would work for us too, we > >>>>> currently don't have any other global attributes > >>>>> that we are planning to expose but those could of > >>>>> course come up in the future. > >>>> > >>>> Ah, I missed the fact that this is a kobj_attribute, not a > >>>> device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but > >>>> seems like it might fit? > >>>> > >>>> Bjorn > >>>> > >>> > >>> KERNEL_ATTR_* is currently not exported in any header. After > >>> adding it to include/linuc/sysfs.h it indeed works perfectly. > >>> Adding Christian Brauner as suggested by get_maintainers for > >>> their opinion. I'm of course willing to provide a patch > >> > >> Hey Niklas et al. :) > >> > >> I think this will need input from Greg. He should be best versed in > >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's > >> supposed to be kernel internal. Now, that might just be a matter of > >> renaming the macro but let's see whether Greg has any better idea or > >> more questions. :) > > > > The big question is, why are you needing this? > > > > No driver or driver subsystem should EVER be messing with a "raw" > > kobject like this. Just use the existing DEVICE_* macros instead > > please. > > > > If you are using a raw kobject, please ask me how to do this properly, > > as that is something that should NEVER show up in the /sys/devices/* > > tree. Otherwise userspace tools will break. > > > > thanks, > > > > greg k-h > > Hi Greg, > > this is for an architecture specific but global i.e. not device bound PCI > attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work > but only if we place the attribute directly under /sys/bus/pci/new_attr. Then you are doing something wrong :) Where _exactly_ are you wanting to put this attribute? > I'm aware that this is quite unusual in fact I couldn't find anything > similar. That's why this is an RFC, with a lengthy cover letter > explaining our use case, that I sent to Bjorn to figure out where to > even place the attribute. > > So I guess this is indeed me asking you how to do this properly. > That said it does not show up under /sys/devices/* only /sys/bus/pci/*. Do NOT put random kobjects under a bus subsystem. If you need that, then use BUS_ATTR_* as that is what it is there for. Again, if you are in a driver subsystem, do not use a raw kobject. Either something is already there for you, or what you want to do is not correct :) thanks, greg k-h
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/14/21 2:58 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: >> On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: >>> >>> >>> On 1/13/21 7:55 PM, Bjorn Helgaas wrote: >>>> On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >>>>> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >>>>>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: >>>>>>> We use the UID of a zPCI adapter, or the UID of the function zero if >>>>>>> there are multiple functions in an adapter, as PCI domain if and only if >>>>>>> UID Checking is turned on. >>>>>>> Otherwise we automatically generate domains as devices appear. >>>>>>> >>>>>>> The state of UID Checking is thus essential to know if the PCI domain >>>>>>> will be stable, yet currently there is no way to access this information >>>>>>> from userspace. >>>>>>> So let's solve this by showing the state of UID checking as a sysfs >>>>>>> attribute in /sys/bus/pci/uid_checking >>>> >>>>>>> +/* Global zPCI attributes */ >>>>>>> +static ssize_t uid_checking_show(struct kobject *kobj, >>>>>>> +struct kobj_attribute *attr, char *buf) >>>>>>> +{ >>>>>>> + return sprintf(buf, "%i\n", zpci_unique_uid); >>>>>>> +} >>>>>>> + >>>>>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = >>>>>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); >>>>>> >>>>>> Use DEVICE_ATTR_RO instead of __ATTR. >>>>> >>>>> It's my understanding that DEVICE_ATTR_* is only for >>>>> per device attributes. This one is global for the entire >>>>> Z PCI. I just tried with BUS_ATTR_RO instead >>>>> and that works but only if I put the attribute at >>>>> /sys/bus/pci/uid_checking instead of with a zpci >>>>> subfolder. This path would work for us too, we >>>>> currently don't have any other global attributes >>>>> that we are planning to expose but those could of >>>>> course come up in the future. >>>> >>>> Ah, I missed the fact that this is a kobj_attribute, not a >>>> device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but >>>> seems like it might fit? >>>> >>>> Bjorn >>>> >>> >>> KERNEL_ATTR_* is currently not exported in any header. After >>> adding it to include/linuc/sysfs.h it indeed works perfectly. >>> Adding Christian Brauner as suggested by get_maintainers for >>> their opinion. I'm of course willing to provide a patch >> >> Hey Niklas et al. :) >> >> I think this will need input from Greg. He should be best versed in >> sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's >> supposed to be kernel internal. Now, that might just be a matter of >> renaming the macro but let's see whether Greg has any better idea or >> more questions. :) > > The big question is, why are you needing this? > > No driver or driver subsystem should EVER be messing with a "raw" > kobject like this. Just use the existing DEVICE_* macros instead > please. > > If you are using a raw kobject, please ask me how to do this properly, > as that is something that should NEVER show up in the /sys/devices/* > tree. Otherwise userspace tools will break. > > thanks, > > greg k-h Hi Greg, this is for an architecture specific but global i.e. not device bound PCI attribute. That's why DEVICE_ATTR_* does not work. BUS_ATTR_* would work but only if we place the attribute directly under /sys/bus/pci/new_attr. I'm aware that this is quite unusual in fact I couldn't find anything similar. That's why this is an RFC, with a lengthy cover letter explaining our use case, that I sent to Bjorn to figure out where to even place the attribute. So I guess this is indeed me asking you how to do this properly. That said it does not show up under /sys/devices/* only /sys/bus/pci/*. Best regards, Niklas
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 14, 2021 at 02:44:53PM +0100, Christian Brauner wrote: > On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > > > > > > On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > > > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > > >> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > > >>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > > >>>> We use the UID of a zPCI adapter, or the UID of the function zero if > > >>>> there are multiple functions in an adapter, as PCI domain if and only > > >>>> if > > >>>> UID Checking is turned on. > > >>>> Otherwise we automatically generate domains as devices appear. > > >>>> > > >>>> The state of UID Checking is thus essential to know if the PCI domain > > >>>> will be stable, yet currently there is no way to access this > > >>>> information > > >>>> from userspace. > > >>>> So let's solve this by showing the state of UID checking as a sysfs > > >>>> attribute in /sys/bus/pci/uid_checking > > > > > >>>> +/* Global zPCI attributes */ > > >>>> +static ssize_t uid_checking_show(struct kobject *kobj, > > >>>> + struct kobj_attribute *attr, char *buf) > > >>>> +{ > > >>>> + return sprintf(buf, "%i\n", zpci_unique_uid); > > >>>> +} > > >>>> + > > >>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = > > >>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); > > >>> > > >>> Use DEVICE_ATTR_RO instead of __ATTR. > > >> > > >> It's my understanding that DEVICE_ATTR_* is only for > > >> per device attributes. This one is global for the entire > > >> Z PCI. I just tried with BUS_ATTR_RO instead > > >> and that works but only if I put the attribute at > > >> /sys/bus/pci/uid_checking instead of with a zpci > > >> subfolder. This path would work for us too, we > > >> currently don't have any other global attributes > > >> that we are planning to expose but those could of > > >> course come up in the future. > > > > > > Ah, I missed the fact that this is a kobj_attribute, not a > > > device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but > > > seems like it might fit? > > > > > > Bjorn > > > > > > > KERNEL_ATTR_* is currently not exported in any header. After > > adding it to include/linuc/sysfs.h it indeed works perfectly. > > Adding Christian Brauner as suggested by get_maintainers for > > their opinion. I'm of course willing to provide a patch > > Hey Niklas et al. :) > > I think this will need input from Greg. He should be best versed in > sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's > supposed to be kernel internal. Now, that might just be a matter of > renaming the macro but let's see whether Greg has any better idea or > more questions. :) The big question is, why are you needing this? No driver or driver subsystem should EVER be messing with a "raw" kobject like this. Just use the existing DEVICE_* macros instead please. If you are using a raw kobject, please ask me how to do this properly, as that is something that should NEVER show up in the /sys/devices/* tree. Otherwise userspace tools will break. thanks, greg k-h
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Thu, Jan 14, 2021 at 02:20:10PM +0100, Niklas Schnelle wrote: > > > On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > >> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > >>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > >>>> We use the UID of a zPCI adapter, or the UID of the function zero if > >>>> there are multiple functions in an adapter, as PCI domain if and only if > >>>> UID Checking is turned on. > >>>> Otherwise we automatically generate domains as devices appear. > >>>> > >>>> The state of UID Checking is thus essential to know if the PCI domain > >>>> will be stable, yet currently there is no way to access this information > >>>> from userspace. > >>>> So let's solve this by showing the state of UID checking as a sysfs > >>>> attribute in /sys/bus/pci/uid_checking > > > >>>> +/* Global zPCI attributes */ > >>>> +static ssize_t uid_checking_show(struct kobject *kobj, > >>>> + struct kobj_attribute *attr, char *buf) > >>>> +{ > >>>> +return sprintf(buf, "%i\n", zpci_unique_uid); > >>>> +} > >>>> + > >>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = > >>>> +__ATTR(uid_checking, 0444, uid_checking_show, NULL); > >>> > >>> Use DEVICE_ATTR_RO instead of __ATTR. > >> > >> It's my understanding that DEVICE_ATTR_* is only for > >> per device attributes. This one is global for the entire > >> Z PCI. I just tried with BUS_ATTR_RO instead > >> and that works but only if I put the attribute at > >> /sys/bus/pci/uid_checking instead of with a zpci > >> subfolder. This path would work for us too, we > >> currently don't have any other global attributes > >> that we are planning to expose but those could of > >> course come up in the future. > > > > Ah, I missed the fact that this is a kobj_attribute, not a > > device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but > > seems like it might fit? > > > > Bjorn > > > > KERNEL_ATTR_* is currently not exported in any header. After > adding it to include/linuc/sysfs.h it indeed works perfectly. > Adding Christian Brauner as suggested by get_maintainers for > their opinion. I'm of course willing to provide a patch Hey Niklas et al. :) I think this will need input from Greg. He should be best versed in sysfs attributes. The problem with KERNEL_ATTR_* to me seems that it's supposed to be kernel internal. Now, that might just be a matter of renaming the macro but let's see whether Greg has any better idea or more questions. :) Christian
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/13/21 7:55 PM, Bjorn Helgaas wrote: > On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: >> On 1/12/21 10:50 PM, Bjorn Helgaas wrote: >>> On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: >>>> We use the UID of a zPCI adapter, or the UID of the function zero if >>>> there are multiple functions in an adapter, as PCI domain if and only if >>>> UID Checking is turned on. >>>> Otherwise we automatically generate domains as devices appear. >>>> >>>> The state of UID Checking is thus essential to know if the PCI domain >>>> will be stable, yet currently there is no way to access this information >>>> from userspace. >>>> So let's solve this by showing the state of UID checking as a sysfs >>>> attribute in /sys/bus/pci/uid_checking > >>>> +/* Global zPCI attributes */ >>>> +static ssize_t uid_checking_show(struct kobject *kobj, >>>> + struct kobj_attribute *attr, char *buf) >>>> +{ >>>> + return sprintf(buf, "%i\n", zpci_unique_uid); >>>> +} >>>> + >>>> +static struct kobj_attribute sys_zpci_uid_checking_attr = >>>> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); >>> >>> Use DEVICE_ATTR_RO instead of __ATTR. >> >> It's my understanding that DEVICE_ATTR_* is only for >> per device attributes. This one is global for the entire >> Z PCI. I just tried with BUS_ATTR_RO instead >> and that works but only if I put the attribute at >> /sys/bus/pci/uid_checking instead of with a zpci >> subfolder. This path would work for us too, we >> currently don't have any other global attributes >> that we are planning to expose but those could of >> course come up in the future. > > Ah, I missed the fact that this is a kobj_attribute, not a > device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but > seems like it might fit? > > Bjorn > KERNEL_ATTR_* is currently not exported in any header. After adding it to include/linuc/sysfs.h it indeed works perfectly. Adding Christian Brauner as suggested by get_maintainers for their opinion. I'm of course willing to provide a patch for that move should it be desired. @Bjorn apart from the correct macro do you have a preference for either suggested path /sys/bus/pci/zpci/uid_checking vs /sys/bus/pci/uid_checking? For completeness some further internal discussion lets us tend to rather name it to "unique_uids" but I guess that doesn't make a difference to non s390 people ;-)
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Wed, Jan 13, 2021 at 08:47:58AM +0100, Niklas Schnelle wrote: > On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > > On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > >> We use the UID of a zPCI adapter, or the UID of the function zero if > >> there are multiple functions in an adapter, as PCI domain if and only if > >> UID Checking is turned on. > >> Otherwise we automatically generate domains as devices appear. > >> > >> The state of UID Checking is thus essential to know if the PCI domain > >> will be stable, yet currently there is no way to access this information > >> from userspace. > >> So let's solve this by showing the state of UID checking as a sysfs > >> attribute in /sys/bus/pci/uid_checking > >> +/* Global zPCI attributes */ > >> +static ssize_t uid_checking_show(struct kobject *kobj, > >> + struct kobj_attribute *attr, char *buf) > >> +{ > >> + return sprintf(buf, "%i\n", zpci_unique_uid); > >> +} > >> + > >> +static struct kobj_attribute sys_zpci_uid_checking_attr = > >> + __ATTR(uid_checking, 0444, uid_checking_show, NULL); > > > > Use DEVICE_ATTR_RO instead of __ATTR. > > It's my understanding that DEVICE_ATTR_* is only for > per device attributes. This one is global for the entire > Z PCI. I just tried with BUS_ATTR_RO instead > and that works but only if I put the attribute at > /sys/bus/pci/uid_checking instead of with a zpci > subfolder. This path would work for us too, we > currently don't have any other global attributes > that we are planning to expose but those could of > course come up in the future. Ah, I missed the fact that this is a kobj_attribute, not a device_attribute. Maybe KERNEL_ATTR_RO()? Very few uses so far, but seems like it might fit? Bjorn
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On 1/12/21 10:50 PM, Bjorn Helgaas wrote: > On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: >> We use the UID of a zPCI adapter, or the UID of the function zero if >> there are multiple functions in an adapter, as PCI domain if and only if >> UID Checking is turned on. >> Otherwise we automatically generate domains as devices appear. >> >> The state of UID Checking is thus essential to know if the PCI domain >> will be stable, yet currently there is no way to access this information >> from userspace. >> So let's solve this by showing the state of UID checking as a sysfs >> attribute in /sys/bus/pci/uid_checking > > Cosmetic: can't tell if the above is two paragraphs separated by blank > lines or four separated by either blank lines or short last lines. > Please separate or reflow to avoid the ambiguity. Thanks, you're right I split it in 3 proper paragraphs now. Also the commit message was out of sync with the documentation, cover letter and code. This version actually uses /sys/bus/pci/zpci/uid_checking sorry about that. > > I don't have any input on the s390 issues, and I assume this will go > via the s390 tree. > >> Signed-off-by: Niklas Schnelle >> --- >> Documentation/ABI/testing/sysfs-bus-pci | 11 >> arch/s390/include/asm/pci.h | 3 +++ >> arch/s390/pci/pci.c | 4 +++ >> arch/s390/pci/pci_sysfs.c | 34 + >> 4 files changed, 52 insertions(+) >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-pci >> b/Documentation/ABI/testing/sysfs-bus-pci >> index 25c9c39770c6..a174aac0ebb0 100644 >> --- a/Documentation/ABI/testing/sysfs-bus-pci >> +++ b/Documentation/ABI/testing/sysfs-bus-pci >> @@ -375,3 +375,14 @@ Description: >> The value comes from the PCI kernel device state and can be one >> of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". >> The file is read only. >> +What: /sys/bus/pci/zpci/uid_checking >> +Date: December 2020 >> +Contact:Niklas Schnelle >> +Description: >> +This attribute exposes the global state of UID Checking on >> +an s390 Linux system. If UID Checking is on this file >> +contains '1' otherwise '0'. If UID Checking is on the UID of >> +a zPCI device, or the UID of function zero for a multi-function >> +device will be used as its PCI Domain number. If UID Checking >> +is off PCI Domain numbers are generated automatically and >> +are not stable across reboots. >> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h >> index 212628932ddc..3cfa6cc701ba 100644 >> --- a/arch/s390/include/asm/pci.h >> +++ b/arch/s390/include/asm/pci.h >> @@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *); >> /* Error reporting */ >> int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *); ... snip ... >> + >> +/* Global zPCI attributes */ >> +static ssize_t uid_checking_show(struct kobject *kobj, >> + struct kobj_attribute *attr, char *buf) >> +{ >> +return sprintf(buf, "%i\n", zpci_unique_uid); >> +} >> + >> +static struct kobj_attribute sys_zpci_uid_checking_attr = >> +__ATTR(uid_checking, 0444, uid_checking_show, NULL); > > Use DEVICE_ATTR_RO instead of __ATTR. It's my understanding that DEVICE_ATTR_* is only for per device attributes. This one is global for the entire Z PCI. I just tried with BUS_ATTR_RO instead and that works but only if I put the attribute at /sys/bus/pci/uid_checking instead of with a zpci subfolder. This path would work for us too, we currently don't have any other global attributes that we are planning to expose but those could of course come up in the future. > ... snip ... >>
Re: [RFC 1/1] s390/pci: expose UID checking state in sysfs
On Mon, Jan 11, 2021 at 10:38:57AM +0100, Niklas Schnelle wrote: > We use the UID of a zPCI adapter, or the UID of the function zero if > there are multiple functions in an adapter, as PCI domain if and only if > UID Checking is turned on. > Otherwise we automatically generate domains as devices appear. > > The state of UID Checking is thus essential to know if the PCI domain > will be stable, yet currently there is no way to access this information > from userspace. > So let's solve this by showing the state of UID checking as a sysfs > attribute in /sys/bus/pci/uid_checking Cosmetic: can't tell if the above is two paragraphs separated by blank lines or four separated by either blank lines or short last lines. Please separate or reflow to avoid the ambiguity. I don't have any input on the s390 issues, and I assume this will go via the s390 tree. > Signed-off-by: Niklas Schnelle > --- > Documentation/ABI/testing/sysfs-bus-pci | 11 > arch/s390/include/asm/pci.h | 3 +++ > arch/s390/pci/pci.c | 4 +++ > arch/s390/pci/pci_sysfs.c | 34 + > 4 files changed, 52 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci > b/Documentation/ABI/testing/sysfs-bus-pci > index 25c9c39770c6..a174aac0ebb0 100644 > --- a/Documentation/ABI/testing/sysfs-bus-pci > +++ b/Documentation/ABI/testing/sysfs-bus-pci > @@ -375,3 +375,14 @@ Description: > The value comes from the PCI kernel device state and can be one > of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". > The file is read only. > +What:/sys/bus/pci/zpci/uid_checking > +Date:December 2020 > +Contact: Niklas Schnelle > +Description: > + This attribute exposes the global state of UID Checking on > + an s390 Linux system. If UID Checking is on this file > + contains '1' otherwise '0'. If UID Checking is on the UID of > + a zPCI device, or the UID of function zero for a multi-function > + device will be used as its PCI Domain number. If UID Checking > + is off PCI Domain numbers are generated automatically and > + are not stable across reboots. > diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h > index 212628932ddc..3cfa6cc701ba 100644 > --- a/arch/s390/include/asm/pci.h > +++ b/arch/s390/include/asm/pci.h > @@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *); > /* Error reporting */ > int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *); > > +/* Sysfs Entries */ > +int zpci_sysfs_init(void); > + > #ifdef CONFIG_NUMA > > /* Returns the node based on PCI bus */ > diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c > index 41df8fcfddde..c16c93e5f9af 100644 > --- a/arch/s390/pci/pci.c > +++ b/arch/s390/pci/pci.c > @@ -881,6 +881,10 @@ static int __init pci_base_init(void) > if (rc) > goto out_find; > > + rc = zpci_sysfs_init(); > + if (rc) > + goto out_find; > + > s390_pci_initialized = 1; > return 0; > > diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c > index 5c028bee91b9..d00690f73539 100644 > --- a/arch/s390/pci/pci_sysfs.c > +++ b/arch/s390/pci/pci_sysfs.c > @@ -172,3 +172,37 @@ const struct attribute_group *zpci_attr_groups[] = { > &pfip_attr_group, > NULL, > }; > + > +/* Global zPCI attributes */ > +static ssize_t uid_checking_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%i\n", zpci_unique_uid); > +} > + > +static struct kobj_attribute sys_zpci_uid_checking_attr = > + __ATTR(uid_checking, 0444, uid_checking_show, NULL); Use DEVICE_ATTR_RO instead of __ATTR. > +static struct kset *zpci_global_kset; > + > +static struct attribute *zpci_attrs_global[] = { > + &sys_zpci_uid_checking_attr.attr, > + NULL, > +}; > + > +static struct attribute_group zpci_attr_group_global = { > + .attrs = zpci_attrs_global, > +}; > + > +int __init zpci_sysfs_init(void) > +{ > + struct kset *pci_bus_kset; > + > + pci_bus_kset = bus_get_kset(&pci_bus_type); > + > + zpci_global_kset = kset_create_and_add("zpci", NULL, > &pci_bus_kset->kobj); > + if (!zpci_global_kset) > + return -ENOMEM; > + > + return sysfs_create_group(&zpci_global_kset->kobj, > &zpci_attr_group_global); > +} > -- > 2.17.1 >
[RFC 0/1] PCI: s390 global attribute "UID Checking"
Hi Bjorn, Hi Kernel Hackers, With the below patch I'm proposing to expose a global (i.e. not device bound) runtime attribute of the s390 PCI implementation (zPCI) called "UID Checking". You can find some background information on what this attribute means and why it is important at the end of this mail. The reason I'm writing to you about it however is that this is the first global PCI attribute we would like to expose to user space and I'm searching for a good place to put it. The proposed patch uses "/sys/bus/pci/zpci/uid_checking" which from our perspective would be a great choice but I realize that there currently are no platform specific attributes directly under "/sys/bus/pci" so this clearly requires some discussion. What's your thought on this and do you know of any other platform specific global PCI attributes as I couldn't find any? Best regards, Niklas Schnelle Background: On s390 OSs always run under at least a machine level hypervisor (LPAR). Simpliefied by usually running from SAN this makes VM migration possible at every level. For PCI this has created the need to allow PCI IDs to be stable across machines and to be partly user defined such that the setup of an existing VM on one machine can be recreated on another machine. In particular the Domain part of the PCI ID can be user defined by setting a per device value called UID. Since this was a late addition and isn't used by all OSs and since UIDs need to be unique if used as Domains, there is an additional global platform supplied runtime value called "UID Checking". This value indicates if UIDs are guaranteed to be unique and set which triggers Linux to use the UIDs as PCI Domains otherwise PCI Domains are simply incremented as necessary. This "UID Checking" setting is thus very important e.g. when deciding how network interface names are generated as it indicates if the domain part of the PCI ID will remain stable across reboots and migrations. Once exposed we will propose a patch to udev/systemd to use the "UID Checking" attribute to prefer network interface names which can be guaranteed to be stable and re-creatable for migration. Niklas Schnelle (1): s390/pci: expose UID checking state in sysfs Documentation/ABI/testing/sysfs-bus-pci | 11 arch/s390/include/asm/pci.h | 3 +++ arch/s390/pci/pci.c | 4 +++ arch/s390/pci/pci_sysfs.c | 34 + 4 files changed, 52 insertions(+) -- 2.17.1
[RFC 1/1] s390/pci: expose UID checking state in sysfs
We use the UID of a zPCI adapter, or the UID of the function zero if there are multiple functions in an adapter, as PCI domain if and only if UID Checking is turned on. Otherwise we automatically generate domains as devices appear. The state of UID Checking is thus essential to know if the PCI domain will be stable, yet currently there is no way to access this information from userspace. So let's solve this by showing the state of UID checking as a sysfs attribute in /sys/bus/pci/uid_checking Signed-off-by: Niklas Schnelle --- Documentation/ABI/testing/sysfs-bus-pci | 11 arch/s390/include/asm/pci.h | 3 +++ arch/s390/pci/pci.c | 4 +++ arch/s390/pci/pci_sysfs.c | 34 + 4 files changed, 52 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25c9c39770c6..a174aac0ebb0 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -375,3 +375,14 @@ Description: The value comes from the PCI kernel device state and can be one of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold". The file is read only. +What: /sys/bus/pci/zpci/uid_checking +Date: December 2020 +Contact: Niklas Schnelle +Description: + This attribute exposes the global state of UID Checking on + an s390 Linux system. If UID Checking is on this file + contains '1' otherwise '0'. If UID Checking is on the UID of + a zPCI device, or the UID of function zero for a multi-function + device will be used as its PCI Domain number. If UID Checking + is off PCI Domain numbers are generated automatically and + are not stable across reboots. diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 212628932ddc..3cfa6cc701ba 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -285,6 +285,9 @@ void zpci_debug_exit_device(struct zpci_dev *); /* Error reporting */ int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *); +/* Sysfs Entries */ +int zpci_sysfs_init(void); + #ifdef CONFIG_NUMA /* Returns the node based on PCI bus */ diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 41df8fcfddde..c16c93e5f9af 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -881,6 +881,10 @@ static int __init pci_base_init(void) if (rc) goto out_find; + rc = zpci_sysfs_init(); + if (rc) + goto out_find; + s390_pci_initialized = 1; return 0; diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 5c028bee91b9..d00690f73539 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -172,3 +172,37 @@ const struct attribute_group *zpci_attr_groups[] = { &pfip_attr_group, NULL, }; + +/* Global zPCI attributes */ +static ssize_t uid_checking_show(struct kobject *kobj, +struct kobj_attribute *attr, char *buf) +{ + return sprintf(buf, "%i\n", zpci_unique_uid); +} + +static struct kobj_attribute sys_zpci_uid_checking_attr = + __ATTR(uid_checking, 0444, uid_checking_show, NULL); + +static struct kset *zpci_global_kset; + +static struct attribute *zpci_attrs_global[] = { + &sys_zpci_uid_checking_attr.attr, + NULL, +}; + +static struct attribute_group zpci_attr_group_global = { + .attrs = zpci_attrs_global, +}; + +int __init zpci_sysfs_init(void) +{ + struct kset *pci_bus_kset; + + pci_bus_kset = bus_get_kset(&pci_bus_type); + + zpci_global_kset = kset_create_and_add("zpci", NULL, &pci_bus_kset->kobj); + if (!zpci_global_kset) + return -ENOMEM; + + return sysfs_create_group(&zpci_global_kset->kobj, &zpci_attr_group_global); +} -- 2.17.1
[PATCH v16 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 25ea4ecb6449..b9d8607083eb 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.17.1
[PATCH v15 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu --- arch/arm64/kvm/hypercalls.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 25ea4ecb6449..b9d8607083eb 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case SPECTRE_VULNERABLE: break; case SPECTRE_MITIGATED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; + val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; @@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; fallthrough; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.17.1
[PATCH v14 02/10] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu --- arch/arm64/kvm/hypercalls.c | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..901c60f119c2 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u64 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_BP_HARDEN_UNKNOWN: break; case KVM_BP_HARDEN_WA_NEEDED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_BP_HARDEN_NOT_REQUIRED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; @@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_SSBD_UNKNOWN: break; case KVM_SSBD_KERNEL: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_SSBD_FORCE_ENABLE: case KVM_SSBD_MITIGATED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.17.1
Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
On Tue, Jul 28, 2020 at 01:07:14AM +, Jianyong Wu wrote: > > > > -Original Message- > > From: Will Deacon > > Sent: Monday, July 27, 2020 7:38 PM > > To: Jianyong Wu > > Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org; > > t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com; > > m...@kernel.org; richardcoch...@gmail.com; Mark Rutland > > ; Suzuki Poulose ; > > Steven Price ; linux-kernel@vger.kernel.org; linux- > > arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; > > k...@vger.kernel.org; Steve Capper ; Kaly Xin > > ; Justin He ; Wei Chen > > ; nd > > Subject: Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests > > via SMCCC > > > > On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote: > > > > From: Will Deacon > > > > > > > > We can advertise ourselves to guests as KVM and provide a basic > > > > features bitmap for discoverability of future hypervisor services. > > > > > > > > Cc: Marc Zyngier > > > > Signed-off-by: Will Deacon > > > > Signed-off-by: Jianyong Wu > > > > --- > > > > arch/arm64/kvm/hypercalls.c | 29 +++-- > > > > 1 file changed, 19 insertions(+), 10 deletions(-) > > > > > > > > diff --git a/arch/arm64/kvm/hypercalls.c > > > > b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23 > > > > 100644 > > > > --- a/arch/arm64/kvm/hypercalls.c > > > > +++ b/arch/arm64/kvm/hypercalls.c > > > > @@ -12,13 +12,13 @@ > > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { > > > > u32 func_id = smccc_get_function(vcpu); > > > > - long val = SMCCC_RET_NOT_SUPPORTED; > > > > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; > > > > > > There is a risk as this u32 value will return here and a u64 value > > > will be obtained in guest. For example, The val[0] is initialized as > > > -1 of 0x and the guest get 0x then it will be compared > > > with -1 of 0x Also this problem exists for the > > > transfer of address in u64 type. So the following assignment to "val" > > > should be split into two > > > u32 value and assign to val[0] and val[1] respectively. > > > WDYT? > > > > Yes, I think you're right that this is a bug, but isn't the solution just > > to make > > that an array of 'long'? > > > > long val [4]; > > > > That will sign-extend the negative error codes as required, while leaving > > the > > explicitly unsigned UID constants alone. > > Ok, that's much better. I will fix it at next version. > > By the way, I wonder when will you update this patch set. I see someone like > me > adopt this patch set as code base and need rebase it every time, so expect > your update. I'm not working on it, so please feel free to include it along with the patches that add an upstream user. Will
[RFC PATCH 21/30] user namespace: Add function that checks if the UID map is defined
From: Krzysztof Struczynski Add function that checks if the UID map is defined. It will be used by ima to check if ID remapping in subject-based rules is necessary. Signed-off-by: Krzysztof Struczynski --- include/linux/user_namespace.h | 6 ++ kernel/user_namespace.c| 11 +++ 2 files changed, 17 insertions(+) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index d9759c54fead..bcb21c41c910 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -138,6 +138,7 @@ extern bool in_userns(const struct user_namespace *ancestor, const struct user_namespace *child); extern bool current_in_userns(const struct user_namespace *target_ns); struct ns_common *ns_get_owner(struct ns_common *ns); +extern bool userns_set_uidmap(const struct user_namespace *ns); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) @@ -182,6 +183,11 @@ static inline struct ns_common *ns_get_owner(struct ns_common *ns) { return ERR_PTR(-EPERM); } + +static inline bool userns_set_uidmap(const struct user_namespace *ns) +{ + return true; +} #endif #endif /* _LINUX_USER_H */ diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 87804e0371fe..e38f9f11a589 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -1232,6 +1232,17 @@ bool current_in_userns(const struct user_namespace *target_ns) } EXPORT_SYMBOL(current_in_userns); +bool userns_set_uidmap(const struct user_namespace *ns) +{ + bool mapping_defined; + + mutex_lock(&userns_state_mutex); + mapping_defined = ns->uid_map.nr_extents != 0; + mutex_unlock(&userns_state_mutex); + + return mapping_defined; +} + static inline struct user_namespace *to_user_ns(struct ns_common *ns) { return container_of(ns, struct user_namespace, ns); -- 2.20.1
RE: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
> -Original Message- > From: Will Deacon > Sent: Monday, July 27, 2020 7:38 PM > To: Jianyong Wu > Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org; > t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com; > m...@kernel.org; richardcoch...@gmail.com; Mark Rutland > ; Suzuki Poulose ; > Steven Price ; linux-kernel@vger.kernel.org; linux- > arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; > k...@vger.kernel.org; Steve Capper ; Kaly Xin > ; Justin He ; Wei Chen > ; nd > Subject: Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests > via SMCCC > > On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote: > > > From: Will Deacon > > > > > > We can advertise ourselves to guests as KVM and provide a basic > > > features bitmap for discoverability of future hypervisor services. > > > > > > Cc: Marc Zyngier > > > Signed-off-by: Will Deacon > > > Signed-off-by: Jianyong Wu > > > --- > > > arch/arm64/kvm/hypercalls.c | 29 +++-- > > > 1 file changed, 19 insertions(+), 10 deletions(-) > > > > > > diff --git a/arch/arm64/kvm/hypercalls.c > > > b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23 > > > 100644 > > > --- a/arch/arm64/kvm/hypercalls.c > > > +++ b/arch/arm64/kvm/hypercalls.c > > > @@ -12,13 +12,13 @@ > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { > > > u32 func_id = smccc_get_function(vcpu); > > > - long val = SMCCC_RET_NOT_SUPPORTED; > > > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; > > > > There is a risk as this u32 value will return here and a u64 value > > will be obtained in guest. For example, The val[0] is initialized as > > -1 of 0x and the guest get 0x then it will be compared > > with -1 of 0x Also this problem exists for the > > transfer of address in u64 type. So the following assignment to "val" > > should be split into two > > u32 value and assign to val[0] and val[1] respectively. > > WDYT? > > Yes, I think you're right that this is a bug, but isn't the solution just to > make > that an array of 'long'? > > long val [4]; > > That will sign-extend the negative error codes as required, while leaving the > explicitly unsigned UID constants alone. Ok, that's much better. I will fix it at next version. By the way, I wonder when will you update this patch set. I see someone like me adopt this patch set as code base and need rebase it every time, so expect your update. Thanks Jianyong > > Will
Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote: > > From: Will Deacon > > > > We can advertise ourselves to guests as KVM and provide a basic features > > bitmap for discoverability of future hypervisor services. > > > > Cc: Marc Zyngier > > Signed-off-by: Will Deacon > > Signed-off-by: Jianyong Wu > > --- > > arch/arm64/kvm/hypercalls.c | 29 +++-- > > 1 file changed, 19 insertions(+), 10 deletions(-) > > > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c > > index 550dfa3e53cd..db6dce3d0e23 100644 > > --- a/arch/arm64/kvm/hypercalls.c > > +++ b/arch/arm64/kvm/hypercalls.c > > @@ -12,13 +12,13 @@ > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { > > u32 func_id = smccc_get_function(vcpu); > > - long val = SMCCC_RET_NOT_SUPPORTED; > > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; > > There is a risk as this u32 value will return here and a u64 value will be > obtained in guest. For example, The val[0] is initialized as -1 of > 0x and the guest get 0x then it will be compared with -1 > of 0x Also this problem exists for the transfer of address > in u64 type. So the following assignment to "val" should be split into two > u32 value and assign to val[0] and val[1] respectively. > WDYT? Yes, I think you're right that this is a bug, but isn't the solution just to make that an array of 'long'? long val [4]; That will sign-extend the negative error codes as required, while leaving the explicitly unsigned UID constants alone. Will
RE: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
Hi Will, > -Original Message- > From: Jianyong Wu > Sent: Friday, June 19, 2020 9:01 PM > To: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org; > t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com; > m...@kernel.org; richardcoch...@gmail.com; Mark Rutland > ; w...@kernel.org; Suzuki Poulose > ; Steven Price > Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > kvm...@lists.cs.columbia.edu; k...@vger.kernel.org; Steve Capper > ; Kaly Xin ; Justin He > ; Wei Chen ; Jianyong Wu > ; nd > Subject: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via > SMCCC > > From: Will Deacon > > We can advertise ourselves to guests as KVM and provide a basic features > bitmap for discoverability of future hypervisor services. > > Cc: Marc Zyngier > Signed-off-by: Will Deacon > Signed-off-by: Jianyong Wu > --- > arch/arm64/kvm/hypercalls.c | 29 +++-- > 1 file changed, 19 insertions(+), 10 deletions(-) > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c > index 550dfa3e53cd..db6dce3d0e23 100644 > --- a/arch/arm64/kvm/hypercalls.c > +++ b/arch/arm64/kvm/hypercalls.c > @@ -12,13 +12,13 @@ > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { > u32 func_id = smccc_get_function(vcpu); > - long val = SMCCC_RET_NOT_SUPPORTED; > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; There is a risk as this u32 value will return here and a u64 value will be obtained in guest. For example, The val[0] is initialized as -1 of 0x and the guest get 0x then it will be compared with -1 of 0x Also this problem exists for the transfer of address in u64 type. So the following assignment to "val" should be split into two u32 value and assign to val[0] and val[1] respectively. WDYT? Thanks Jianyong > u32 feature; > gpa_t gpa; > > switch (func_id) { > case ARM_SMCCC_VERSION_FUNC_ID: > - val = ARM_SMCCC_VERSION_1_1; > + val[0] = ARM_SMCCC_VERSION_1_1; > break; > case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: > feature = smccc_get_arg1(vcpu); > @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) > case KVM_BP_HARDEN_UNKNOWN: > break; > case KVM_BP_HARDEN_WA_NEEDED: > - val = SMCCC_RET_SUCCESS; > + val[0] = SMCCC_RET_SUCCESS; > break; > case KVM_BP_HARDEN_NOT_REQUIRED: > - val = SMCCC_RET_NOT_REQUIRED; > + val[0] = SMCCC_RET_NOT_REQUIRED; > break; > } > break; > @@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) > case KVM_SSBD_UNKNOWN: > break; > case KVM_SSBD_KERNEL: > - val = SMCCC_RET_SUCCESS; > + val[0] = SMCCC_RET_SUCCESS; > break; > case KVM_SSBD_FORCE_ENABLE: > case KVM_SSBD_MITIGATED: > - val = SMCCC_RET_NOT_REQUIRED; > + val[0] = SMCCC_RET_NOT_REQUIRED; > break; > } > break; > case ARM_SMCCC_HV_PV_TIME_FEATURES: > - val = SMCCC_RET_SUCCESS; > + val[0] = SMCCC_RET_SUCCESS; > break; > } > break; > case ARM_SMCCC_HV_PV_TIME_FEATURES: > - val = kvm_hypercall_pv_features(vcpu); > + val[0] = kvm_hypercall_pv_features(vcpu); > break; > case ARM_SMCCC_HV_PV_TIME_ST: > gpa = kvm_init_stolen_time(vcpu); > if (gpa != GPA_INVALID) > - val = gpa; > + val[0] = gpa; > + break; > + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: > + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; > + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; > + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; > + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; > + break; > + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: > + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); > break; > default: > return kvm_psci_call(vcpu); > } > > - smccc_set_retval(vcpu, val, 0, 0, 0); > + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); > return 1; > } > -- > 2.17.1
[RFC PATCH 2/5] keys: Replace uid/gid/perm permissions checking with an ACL
Replace the uid/gid/perm permissions checking on a key with an ACL to allow the SETATTR and SEARCH permissions to be split. This will also allow a greater range of subjects to represented. Define a KEYCTL_CAPS_ACL capabilities bit to indicate if the kernel has this change applied. WHY DO THIS? The problem is that SETATTR and SEARCH cover a slew of actions, not all of which should be grouped together. For SETATTR, this includes actions that are about controlling access to a key: (1) Changing a key's ownership. (2) Changing a key's security information. (3) Setting a keyring's restriction. And actions that are about managing a key's lifetime: (4) Setting an expiry time. (5) Revoking a key. and (proposed) managing a key as part of a cache: (6) Invalidating a key. Managing a key's lifetime doesn't really have anything to do with controlling access to that key. Expiry time is awkward since it's more about the lifetime of the content and so, in some ways goes better with WRITE permission. It can, however, be set unconditionally by a process with an appropriate authorisation token for instantiating a key, and can also be set by the key type driver when a key is instantiated, so lumping it with the access-controlling actions is probably okay. As for SEARCH permission, that currently covers: (1) Finding keys in a keyring tree during a search. (2) Permitting keyrings to be joined. (3) Invalidation. But these don't really belong together either, since these actions really need to be controlled separately. Finally, there are number of special cases to do with granting the administrator special rights to invalidate or clear keys that I would like to handle with the ACL rather than key flags and special checks. === WHAT IS CHANGED === The SETATTR permission is split to create two new permissions: (1) SETSEC - which allows the key's owner, group and ACL to be changed and a restriction to be placed on a keyring. (2) REVOKE - which allows a key to be revoked. The SEARCH permission is split to create: (1) SEARCH - which allows a keyring to be search and a key to be found. (2) JOIN - which allows a keyring to be joined as a session keyring. (3) INVAL - which allows a key to be invalidated. The WRITE permission is also split to create: (1) WRITE - which allows a key's content to be altered and links to be added, removed and replaced in a keyring. (2) CLEAR - which allows a keyring to be cleared completely. This is split out to make it possible to give just this to an administrator. (3) REVOKE - see above. Keys acquire ACLs which consist of a series of ACEs, and all that apply are unioned together. An ACE specifies a subject, such as: (*) Possessor - permitted to anyone who 'possesses' a key (*) Owner - permitted to the key owner (*) Group - permitted to the key group (*) Everyone - permitted to everyone Note that 'Other' has been replaced with 'Everyone' on the assumption that you wouldn't grant a permit to 'Other' that you wouldn't also grant to everyone else. Further subjects may be made available by later patches. The ACE also specifies a permissions mask. The set of permissions is now: VIEWCan view the key metadata READCan read the key content WRITE Can update/modify the key content SEARCH Can find the key by searching/requesting LINKCan make a link to the key SETSEC Can change owner, ACL, expiry INVAL Can invalidate REVOKE Can revoke JOINCan join this keyring CLEAR Can clear this keyring The KEYCTL_SETPERM function is then deprecated. The KEYCTL_SET_TIMEOUT function then is permitted if SETSEC is set, or if the caller has a valid instantiation auth token. The KEYCTL_INVALIDATE function then requires INVAL. The KEYCTL_REVOKE function then requires REVOKE. The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an existing keyring. The JOIN permission is enabled by default for session keyrings and manually created keyrings only. == BACKWARD COMPATIBILITY == To maintain backward compatibility, KEYCTL_SETPERM will translate the permissions mask it is given into a new ACL for a key - unless KEYCTL_SET_ACL has been called on that key, in which case an error will be returned. It will convert possessor, owner, group and other permissions into separate ACEs, if each portion of the mask is non-zero. SETATTR permission turns on all of INVAL, REVOKE and SETSEC. WRITE permission turns on WRITE, REVOKE and, if a keyring, CLEAR. JOIN is turned on if a keyring is being altered. The KEYCTL_DESCRIBE function translates the ACL back into a permissions mask to retur
[PATCH 5.7 145/265] soc: imx8m: Correct i.MX8MP UID fuse offset
From: Anson Huang [ Upstream commit c95c9693b112f312b59c5d100fd09a1349970fab ] Correct i.MX8MP UID fuse offset according to fuse map: UID_LOW: 0x420 UID_HIGH: 0x430 Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support") Signed-off-by: Anson Huang Reviewed-by: Iuliana Prodan Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin --- drivers/soc/imx/soc-imx8m.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index 719e1f189ebf2..ada0d8804d84c 100644 --- a/drivers/soc/imx/soc-imx8m.c +++ b/drivers/soc/imx/soc-imx8m.c @@ -22,6 +22,8 @@ #define OCOTP_UID_LOW 0x410 #define OCOTP_UID_HIGH 0x420 +#define IMX8MP_OCOTP_UID_OFFSET0x10 + /* Same as ANADIG_DIGPROG_IMX7D */ #define ANADIG_DIGPROG_IMX8MM 0x800 @@ -88,6 +90,8 @@ static void __init imx8mm_soc_uid(void) { void __iomem *ocotp_base; struct device_node *np; + u32 offset = of_machine_is_compatible("fsl,imx8mp") ? +IMX8MP_OCOTP_UID_OFFSET : 0; np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp"); if (!np) @@ -96,9 +100,9 @@ static void __init imx8mm_soc_uid(void) ocotp_base = of_iomap(np, 0); WARN_ON(!ocotp_base); - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH); + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset); soc_uid <<= 32; - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW); + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset); iounmap(ocotp_base); of_node_put(np); -- 2.25.1
Re: [PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset
On Wed, Jun 10, 2020 at 06:03:02PM +0800, Anson Huang wrote: > Correct i.MX8MP UID fuse offset according to fuse map: > > UID_LOW: 0x420 > UID_HIGH: 0x430 > > Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") > Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support") > Signed-off-by: Anson Huang Applied, thanks.
[PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu --- arch/arm64/kvm/hypercalls.c | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_BP_HARDEN_UNKNOWN: break; case KVM_BP_HARDEN_WA_NEEDED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_BP_HARDEN_NOT_REQUIRED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; @@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_SSBD_UNKNOWN: break; case KVM_SSBD_KERNEL: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_SSBD_FORCE_ENABLE: case KVM_SSBD_MITIGATED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.17.1
[RFC PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Will Deacon We can advertise ourselves to guests as KVM and provide a basic features bitmap for discoverability of future hypervisor services. Cc: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Jianyong Wu --- arch/arm64/kvm/hypercalls.c | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -12,13 +12,13 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - long val = SMCCC_RET_NOT_SUPPORTED; + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED}; u32 feature; gpa_t gpa; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: - val = ARM_SMCCC_VERSION_1_1; + val[0] = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: feature = smccc_get_arg1(vcpu); @@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_BP_HARDEN_UNKNOWN: break; case KVM_BP_HARDEN_WA_NEEDED: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_BP_HARDEN_NOT_REQUIRED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; @@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case KVM_SSBD_UNKNOWN: break; case KVM_SSBD_KERNEL: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; case KVM_SSBD_FORCE_ENABLE: case KVM_SSBD_MITIGATED: - val = SMCCC_RET_NOT_REQUIRED; + val[0] = SMCCC_RET_NOT_REQUIRED; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = SMCCC_RET_SUCCESS; + val[0] = SMCCC_RET_SUCCESS; break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: - val = kvm_hypercall_pv_features(vcpu); + val[0] = kvm_hypercall_pv_features(vcpu); break; case ARM_SMCCC_HV_PV_TIME_ST: gpa = kvm_init_stolen_time(vcpu); if (gpa != GPA_INVALID) - val = gpa; + val[0] = gpa; + break; + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID: + val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0; + val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1; + val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2; + val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3; + break; + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); break; default: return kvm_psci_call(vcpu); } - smccc_set_retval(vcpu, val, 0, 0, 0); + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]); return 1; } -- 2.17.1
Re: [PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset
On 6/10/2020 1:03 PM, Anson Huang wrote: Correct i.MX8MP UID fuse offset according to fuse map: UID_LOW: 0x420 UID_HIGH: 0x430 Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support") Signed-off-by: Anson Huang Reviewed-by: Iuliana Prodan --- Changes since V2: - add one more fix tag for original patch before file name is changed. --- drivers/soc/imx/soc-imx8m.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index 7b0759a..0bc8314 100644 --- a/drivers/soc/imx/soc-imx8m.c +++ b/drivers/soc/imx/soc-imx8m.c @@ -22,6 +22,8 @@ #define OCOTP_UID_LOW 0x410 #define OCOTP_UID_HIGH0x420 +#define IMX8MP_OCOTP_UID_OFFSET 0x10 + /* Same as ANADIG_DIGPROG_IMX7D */ #define ANADIG_DIGPROG_IMX8MM 0x800 @@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void) { void __iomem *ocotp_base; struct device_node *np; + u32 offset = of_machine_is_compatible("fsl,imx8mp") ? +IMX8MP_OCOTP_UID_OFFSET : 0; np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp"); if (!np) @@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void) ocotp_base = of_iomap(np, 0); WARN_ON(!ocotp_base); - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH); + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset); soc_uid <<= 32; - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW); + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset); iounmap(ocotp_base); of_node_put(np);
[PATCH V3] soc: imx8m: Correct i.MX8MP UID fuse offset
Correct i.MX8MP UID fuse offset according to fuse map: UID_LOW: 0x420 UID_HIGH: 0x430 Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") Fixes: 18f662a73862 ("soc: imx: Add i.MX8MP SoC driver support") Signed-off-by: Anson Huang --- Changes since V2: - add one more fix tag for original patch before file name is changed. --- drivers/soc/imx/soc-imx8m.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index 7b0759a..0bc8314 100644 --- a/drivers/soc/imx/soc-imx8m.c +++ b/drivers/soc/imx/soc-imx8m.c @@ -22,6 +22,8 @@ #define OCOTP_UID_LOW 0x410 #define OCOTP_UID_HIGH 0x420 +#define IMX8MP_OCOTP_UID_OFFSET0x10 + /* Same as ANADIG_DIGPROG_IMX7D */ #define ANADIG_DIGPROG_IMX8MM 0x800 @@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void) { void __iomem *ocotp_base; struct device_node *np; + u32 offset = of_machine_is_compatible("fsl,imx8mp") ? +IMX8MP_OCOTP_UID_OFFSET : 0; np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp"); if (!np) @@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void) ocotp_base = of_iomap(np, 0); WARN_ON(!ocotp_base); - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH); + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset); soc_uid <<= 32; - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW); + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset); iounmap(ocotp_base); of_node_put(np); -- 2.7.4
RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
Hi, Luliana > Subject: Re: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset > > > > On 6/10/2020 10:57 AM, Anson Huang wrote: > > > >> Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset > >> > >>> From: Anson Huang > >>> Sent: Wednesday, June 10, 2020 6:42 AM > >>> > >>> Correct i.MX8MP UID fuse offset according to fuse map: > >>> > >>> UID_LOW: 0x420 > >>> UID_HIGH: 0x430 > >>> > >>> Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m > >>> soc > >>> driver") > >> > >> AFAIK "Fixes:" should point to the original patch which introduced the > >> issue. > >> Not the one changing file name. > > > > But the patch can NOT be applied to the kernel version with original > > file, how to fix it? > > > I believe you can add two "Fixes:" with the two commits: the one introducing > the issue and the one changing the file name. Thanks, will add once more fix tag, please help review V3. Anson
Re: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
On 6/10/2020 10:57 AM, Anson Huang wrote: Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset From: Anson Huang Sent: Wednesday, June 10, 2020 6:42 AM Correct i.MX8MP UID fuse offset according to fuse map: UID_LOW: 0x420 UID_HIGH: 0x430 Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") AFAIK "Fixes:" should point to the original patch which introduced the issue. Not the one changing file name. But the patch can NOT be applied to the kernel version with original file, how to fix it? I believe you can add two "Fixes:" with the two commits: the one introducing the issue and the one changing the file name. Iulia
RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
> Subject: RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset > > > From: Anson Huang > > Sent: Wednesday, June 10, 2020 6:42 AM > > > > Correct i.MX8MP UID fuse offset according to fuse map: > > > > UID_LOW: 0x420 > > UID_HIGH: 0x430 > > > > Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc > > driver") > > AFAIK "Fixes:" should point to the original patch which introduced the issue. > Not the one changing file name. But the patch can NOT be applied to the kernel version with original file, how to fix it? Anson
RE: [PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
> From: Anson Huang > Sent: Wednesday, June 10, 2020 6:42 AM > > Correct i.MX8MP UID fuse offset according to fuse map: > > UID_LOW: 0x420 > UID_HIGH: 0x430 > > Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") AFAIK "Fixes:" should point to the original patch which introduced the issue. Not the one changing file name. > Signed-off-by: Anson Huang > --- > Changes since V1: > - add fix tag. > --- > drivers/soc/imx/soc-imx8m.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index > 7b0759a..0bc8314 100644 > --- a/drivers/soc/imx/soc-imx8m.c > +++ b/drivers/soc/imx/soc-imx8m.c > @@ -22,6 +22,8 @@ > #define OCOTP_UID_LOW0x410 > #define OCOTP_UID_HIGH 0x420 > > +#define IMX8MP_OCOTP_UID_OFFSET 0x10 > + > /* Same as ANADIG_DIGPROG_IMX7D */ > #define ANADIG_DIGPROG_IMX8MM0x800 > > @@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void) { > void __iomem *ocotp_base; > struct device_node *np; > + u32 offset = of_machine_is_compatible("fsl,imx8mp") ? > + IMX8MP_OCOTP_UID_OFFSET : 0; > If (of_machine_is_compatible("fsl,imx8mp")) Octop_base += IMX8MP_OCOTP_UID_OFFSET; Then you may not need the extra changes. Regards Aisheng > np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp"); > if (!np) > @@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void) > ocotp_base = of_iomap(np, 0); > WARN_ON(!ocotp_base); > > - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH); > + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset); > soc_uid <<= 32; > - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW); > + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset); > > iounmap(ocotp_base); > of_node_put(np); > -- > 2.7.4
[PATCH V2] soc: imx8m: Correct i.MX8MP UID fuse offset
Correct i.MX8MP UID fuse offset according to fuse map: UID_LOW: 0x420 UID_HIGH: 0x430 Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") Signed-off-by: Anson Huang --- Changes since V1: - add fix tag. --- drivers/soc/imx/soc-imx8m.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c index 7b0759a..0bc8314 100644 --- a/drivers/soc/imx/soc-imx8m.c +++ b/drivers/soc/imx/soc-imx8m.c @@ -22,6 +22,8 @@ #define OCOTP_UID_LOW 0x410 #define OCOTP_UID_HIGH 0x420 +#define IMX8MP_OCOTP_UID_OFFSET0x10 + /* Same as ANADIG_DIGPROG_IMX7D */ #define ANADIG_DIGPROG_IMX8MM 0x800 @@ -87,6 +89,8 @@ static void __init imx8mm_soc_uid(void) { void __iomem *ocotp_base; struct device_node *np; + u32 offset = of_machine_is_compatible("fsl,imx8mp") ? +IMX8MP_OCOTP_UID_OFFSET : 0; np = of_find_compatible_node(NULL, NULL, "fsl,imx8mm-ocotp"); if (!np) @@ -95,9 +99,9 @@ static void __init imx8mm_soc_uid(void) ocotp_base = of_iomap(np, 0); WARN_ON(!ocotp_base); - soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH); + soc_uid = readl_relaxed(ocotp_base + OCOTP_UID_HIGH + offset); soc_uid <<= 32; - soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW); + soc_uid |= readl_relaxed(ocotp_base + OCOTP_UID_LOW + offset); iounmap(ocotp_base); of_node_put(np); -- 2.7.4
RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset
Hi, Luliana > Subject: RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset > > Hi, Luliana > > > > Subject: Re: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset > > > > > > > > On 6/9/2020 4:15 PM, Anson Huang wrote: > > > Correct i.MX8MP UID fuse offset according to fuse map: > > > > > > UID_LOW: 0x420 > > > UID_HIGH: 0x430 > > > > > > Signed-off-by: Anson Huang > > > > If this patch corrects the imx8mp UID shouldn't have a Fixes tag? > > I thought about this, but I was confused that this file name is changed by > commit fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc > driver"), so if to add fix tag, should I add the tag to point to first patch > of > drivers/soc/imx/soc-imx8m.c, or the original commit of supporting 8MP UID in > drivers/soc/imx/soc-imx8.c which is no long there. I think it should be commit fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver"), will send a V2 patch. Thanks, Anson
RE: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset
Hi, Luliana > Subject: Re: [PATCH] soc: imx8m: Correct i.MX8MP UID fuse offset > > > > On 6/9/2020 4:15 PM, Anson Huang wrote: > > Correct i.MX8MP UID fuse offset according to fuse map: > > > > UID_LOW: 0x420 > > UID_HIGH: 0x430 > > > > Signed-off-by: Anson Huang > > If this patch corrects the imx8mp UID shouldn't have a Fixes tag? I thought about this, but I was confused that this file name is changed by commit fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver"), so if to add fix tag, should I add the tag to point to first patch of drivers/soc/imx/soc-imx8m.c, or the original commit of supporting 8MP UID in drivers/soc/imx/soc-imx8.c which is no long there. Thanks Anson