date:20180226

[PATCH bpf-next v8 02/11] fs,security: Add a new file access type: MAY_CHROOT

2018-02-26 Thread Mickaël Salaün

For compatibility reasons, MAY_CHROOT is always set together with
MAY_CHDIR.  However, this new flag makes it possible to differentiate a
chdir from a chroot.

This is needed for the Landlock LSM to be able to evaluate a new root
directory.

Signed-off-by: Mickaël Salaün 
Cc: Alexander Viro 
Cc: Casey Schaufler 
Cc: James Morris 
Cc: John Johansen 
Cc: Kees Cook 
Cc: Paul Moore 
Cc: "Serge E. Hallyn" 
Cc: Stephen Smalley 
Cc: Tetsuo Handa 
Cc: linux-fsde...@vger.kernel.org
---
 fs/open.c  | 3 ++-
 include/linux/fs.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 7ea118471dce..084d147c0e96 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -489,7 +489,8 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
if (error)
goto out;
 
-   error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+   error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR |
+   MAY_CHROOT);
if (error)
goto dput_and_out;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..67c62374446c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -90,6 +90,7 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define MAY_CHDIR  0x0040
 /* called from RCU mode, don't block */
 #define MAY_NOT_BLOCK  0x0080
+#define MAY_CHROOT 0x0100
 
 /*
  * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
-- 
2.16.2

[PATCH bpf-next v8 06/11] bpf,landlock: Add a new map type: inode

2018-02-26 Thread Mickaël Salaün

This new map stores arbitrary 64-bit values referenced by inode keys.
The map can be updated from user space with file descriptors pointing to
inodes tied to a file system.  From the point of view of an eBPF
(Landlock) program, such a map is read-only and can only be used to
retrieve a 64-bit value tied to a given inode.  This is useful to
recognize an inode tagged by user space without requiring access rights
to this inode (i.e. no need to have write access to this inode).

This also adds new BPF map object types: landlock_tag_object and
landlock_chain.  The landlock_chain pointer is needed to handle multiple
tags per inode.  The landlock_tag_object is needed to update a reference
to a list of shared tags.  This is typically used by a struct file (the
reference) and a struct inode (the shared list of tags).  This way, the
number of tagged files can be accounted to the process/user, while the
tags can still be read from the pointed-to inode.

Add dedicated BPF functions to handle this type of map:
* bpf_inode_map_update_elem()
* bpf_inode_map_lookup_elem()
* bpf_inode_map_delete_elem()

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Jann Horn 
---

Changes since v7:
* new design with a dedicated map and a BPF function to tie a value to
  an inode
* add the ability to set or get a tag on an inode from a Landlock
  program

Changes since v6:
* remove WARN_ON() for missing dentry->d_inode
* refactor bpf_landlock_func_proto() (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes and rebase

Changes since v4:
* use a file abstraction (handle) to wrap inode, dentry, path and file
  structs
* remove bpf_landlock_cmp_fs_beneath()
* rename the BPF helper and move it to kernel/bpf/
* tighten helpers accessible by a Landlock rule

Changes since v3:
* remove bpf_landlock_cmp_fs_prop() (suggested by Alexei Starovoitov)
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
* add bpf_landlock_get_fs_mode() helper to check file type and mode
* merge WARN_ON() (suggested by Kees Cook)
* fix and update bpf_helpers.h
* use BPF_CALL_* for eBPF helpers (suggested by Alexei Starovoitov)
* make handle arraymap safe (RCU) and remove buggy synchronize_rcu()
* factor out the arraymap walk
* use size_t to index array (suggested by Jann Horn)

Changes since v2:
* add MNT_INTERNAL check to only add file handle from user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()
---
 include/linux/bpf.h|  18 ++
 include/linux/bpf_types.h  |   3 +
 include/linux/landlock.h   |  24 +++
 include/uapi/linux/bpf.h   |  22 ++-
 kernel/bpf/Makefile|   3 +
 kernel/bpf/core.c  |   1 +
 kernel/bpf/helpers.c   |  38 +
 kernel/bpf/inodemap.c  | 318 +++
 kernel/bpf/syscall.c   |  27 ++-
 kernel/bpf/verifier.c  |  25 +++
 security/landlock/Makefile |   1 +
 security/landlock/tag.c| 373 +
 security/landlock/tag.h|  36 
 security/landlock/tag_fs.c |  59 +++
 security/landlock/tag_fs.h |  26 +++
 tools/include/uapi/linux/bpf.h |  22 ++-
 16 files changed, 993 insertions(+), 3 deletions(-)
 create mode 100644 kernel/bpf/inodemap.c
 create mode 100644 security/landlock/tag.c
 create mode 100644 security/landlock/tag.h
 create mode 100644 security/landlock/tag_fs.c
 create mode 100644 security/landlock/tag_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 377b2f3519f3..c9b940a44c3e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -127,6 +127,10 @@ enum bpf_arg_type {
 
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING,   /* any (initialized) argument is ok */
+
+   ARG_PTR_TO_INODE,   /* pointer to a struct inode */
+   ARG_PTR_TO_LL_TAG_OBJ,  /* pointer to a struct landlock_tag_object */
+   ARG_PTR_TO_LL_CHAIN,/* pointer to a struct landlock_chain */
 };
 
 /* type of values returned from helper functions */
@@ -184,6 +188,9 @@ enum bpf_reg_type {
PTR_TO_PACKET_META,  /* skb->data - meta_len */
PTR_TO_PACKET,   /* reg points to skb->data */
PTR_TO_PACKET_END,   /* skb->data + headlen */
+   PTR_TO_INODE,/* reg points to struct inode */
+   PTR_TO_LL_TAG_OBJ,   /* reg points to struct landlock_tag_object */
+   PTR_TO_LL_CHAIN, /* reg points to struct landlock_chain */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -306,6 +313,10 @@ struct bpf_event_entry {
struct rcu_head rcu;
 };
 
+
+u64 bpf_tail_call(u64 ctx, u64 r2,

[RFC -mm] mm: Fix races between swapoff and flush dcache

2018-02-26 Thread Huang, Ying

From: Huang Ying 

Since commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB
trunks"), the address_space associated with a swap device is freed
after swapoff.  So page_mapping() users that may touch the
address_space need some mechanism to prevent the address_space from
being freed while it is being accessed.

The dcache flushing functions (flush_dcache_page(), etc.) in
architecture-specific code may access the address_space of the swap
device for anonymous pages in the swap cache via page_mapping().  But in
some cases nothing prevents the swap device from being swapped off
concurrently, for example:

CPU1                                    CPU2
__get_user_pages()                      swapoff()
  flush_dcache_page()
    mapping = page_mapping()
      ...                                 exit_swap_address_space()
      ...                                   kvfree(spaces)
      mapping_mapped(mapping)

The address space may be accessed after being freed.

However, according to cachetlb.txt and Russell King, flush_dcache_page()
only cares about file cache pages; for anonymous pages,
flush_anon_page() should be used.  The implementations of
flush_dcache_page() in all architectures follow this too.  They check
whether page_mapping() is NULL and whether mapping_mapped() is true to
determine whether to flush the dcache immediately, and they use the
interval tree (mapping->i_mmap) to find all user space mappings.
Neither mapping_mapped() nor mapping->i_mmap is used by anonymous pages
in the swap cache at all.

So, to fix the race between swapoff and flush dcache, __page_mapping()
is added to return the address_space for file cache pages and NULL
otherwise.  All page_mapping() calls in the flush dcache functions are
replaced with __page_mapping().

The patch is only build-tested, because I have no machine with an
architecture other than x86.

Signed-off-by: "Huang, Ying" 
Cc: Minchan Kim 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Dave Hansen 
Cc: Chen Liqin 
Cc: Russell King 
Cc: Yoshinori Sato 
Cc: "James E.J. Bottomley" 
Cc: Guan Xuetao 
Cc: "David S. Miller" 
Cc: Chris Zankel 
Cc: Vineet Gupta 
Cc: Ley Foon Tan 
Cc: Ralf Baechle 
Cc: Andi Kleen 
---
 arch/arc/mm/cache.c   |  2 +-
 arch/arm/mm/copypage-v4mc.c   |  2 +-
 arch/arm/mm/copypage-v6.c |  2 +-
 arch/arm/mm/copypage-xscale.c |  2 +-
 arch/arm/mm/fault-armv.c  |  2 +-
 arch/arm/mm/flush.c   |  6 +++---
 arch/mips/mm/cache.c  |  2 +-
 arch/nios2/mm/cacheflush.c|  4 ++--
 arch/parisc/kernel/cache.c|  4 ++--
 arch/score/mm/cache.c |  4 ++--
 arch/sh/mm/cache-sh4.c|  2 +-
 arch/sh/mm/cache-sh7705.c |  2 +-
 arch/sparc/kernel/smp_64.c|  8 
 arch/sparc/mm/init_64.c   |  6 +++---
 arch/sparc/mm/tlb.c   |  2 +-
 arch/unicore32/mm/flush.c |  2 +-
 arch/unicore32/mm/mmu.c   |  2 +-
 arch/xtensa/mm/cache.c|  2 +-
 include/linux/mm.h|  1 +
 mm/util.c | 20 
 20 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 2072f3451e9c..0f607d5a85da 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -833,7 +833,7 @@ void flush_dcache_page(struct page *page)
}
 
/* don't handle anon pages here */
-   mapping = page_mapping(page);
+   mapping = __page_mapping(page);
if (!mapping)
return;
 
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 1267e64133b9..6d9e632ca43b 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -70,7 +70,7 @@ void v4_mc_copy_user_highpage(struct page *to, struct page *from,
void *kto = kmap_atomic(to);
 
if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-   __flush_dcache_page(page_mapping(from), from);
+   __flush_dcache_page(__page_mapping(from), from);
 
raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index 70423345da26..2f13ffd847a6 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -76,7 +76,7 @@ static void v6_copy_user_highpage_aliasing(struct page *to,
unsigned long kfrom, kto;
 
if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-   __flush_dcache_page(page_mapping(from), from);
+   __flush_dcache_page(__page_mapping(from), from);
 
/* FIXME: not highmem safe */
discard_old_kernel_data(page_address(to));
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 0fb85025344d..221129649627 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -90,7 +90,7 @@ void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
void *kto = kmap_atomic(to);
 
if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-   __flush_dcache_page(page_mapping(from), from);
+

[PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata

2018-02-26 Thread Mickaël Salaün

The function current_nameidata_lookup(const struct inode *) can be used
to retrieve the lookup state (including a security blob pointer) tied to
the inode being walked through.  This makes it possible to follow a path
lookup and know where an inode access comes from.  This is needed for the
Landlock LSM to be able to restrict access to file paths.

The LSM hook nameidata_put_lookup(struct nameidata_lookup *, struct inode *)
is called before the associated nameidata is freed.

Signed-off-by: Mickaël Salaün 
Cc: Alexander Viro 
Cc: Casey Schaufler 
Cc: James Morris 
Cc: John Johansen 
Cc: Kees Cook 
Cc: Paul Moore 
Cc: "Serge E. Hallyn" 
Cc: Stephen Smalley 
Cc: Tetsuo Handa 
Cc: linux-fsde...@vger.kernel.org
---
 fs/namei.c| 39 +++
 include/linux/lsm_hooks.h |  7 +++
 include/linux/namei.h | 14 +-
 include/linux/security.h  |  7 +++
 security/security.c   |  7 +++
 5 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 921ae32dbc80..d592b3fb0d1e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,9 @@ struct nameidata {
struct inode*link_inode;
unsignedroot_seq;
int dfd;
+#ifdef CONFIG_SECURITY
+   struct nameidata_lookup lookup;
+#endif
 } __randomize_layout;
 
 static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
@@ -515,6 +518,9 @@ static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
p->name = name;
p->total_link_count = old ? old->total_link_count : 0;
p->saved = old;
+#ifdef CONFIG_SECURITY
+   p->lookup.security = NULL;
+#endif
current->nameidata = p;
 }
 
@@ -522,6 +528,7 @@ static void restore_nameidata(void)
 {
struct nameidata *now = current->nameidata, *old = now->saved;
 
+   security_nameidata_put_lookup(&now->lookup, now->inode);
current->nameidata = old;
if (old)
old->total_link_count = now->total_link_count;
@@ -549,6 +556,27 @@ static int __nd_alloc_stack(struct nameidata *nd)
return 0;
 }
 
+#ifdef CONFIG_SECURITY
+/**
+ * current_nameidata_lookup - get the state of the current path walk
+ *
+ * @inode: inode associated with the path walk
+ *
+ * Used by LSM modules for access restriction based on path walk. The LSM is in
+ * charge of the lookup->security blob allocation and management. The hook
+ * security_nameidata_put_lookup() will be called after the path walk ends.
+ *
+ * Return ERR_PTR(-ENOENT) if there is no match.
+ */
+struct nameidata_lookup *current_nameidata_lookup(const struct inode *inode)
+{
+   if (!current->nameidata || current->nameidata->inode != inode)
+   return ERR_PTR(-ENOENT);
+   return &current->nameidata->lookup;
+}
+EXPORT_SYMBOL(current_nameidata_lookup);
+#endif
+
 /**
  * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
  * @path: nameidate to verify
@@ -2009,6 +2037,13 @@ static inline u64 hash_name(const void *salt, const char *name)
 
 #endif
 
+static inline void refresh_lookup(struct nameidata *nd)
+{
+#ifdef CONFIG_SECURITY
+   nd->lookup.type = nd->last_type;
+#endif
+}
+
 /*
  * Name resolution.
  * This is the basic name resolution function, turning a pathname into
@@ -2025,6 +2060,8 @@ static int link_path_walk(const char *name, struct nameidata *nd)
name++;
if (!*name)
return 0;
+   /* be ready for may_lookup() */
+   refresh_lookup(nd);
 
/* At this point we know we have a real path component. */
for(;;) {
@@ -2064,6 +2101,8 @@ static int link_path_walk(const char *name, struct nameidata *nd)
nd->last.hash_len = hash_len;
nd->last.name = name;
nd->last_type = type;
+   /* be ready for the next security_inode_permission() */
+   refresh_lookup(nd);
 
name += hashlen_len(hash_len);
if (!*name)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7161d8e7ee79..d71cf183f0be 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -428,6 +428,10 @@
  * security module does not know about attribute or a negative error code
  * to abort the copy up. Note that the caller is responsible for reading
  * and writing the xattrs as this hook is merely a filter.
+ * @nameidata_put_lookup:
+ * Deallocate and clear the lookup.security field of current's nameidata.
+ * @lookup->security contains the security structure to be freed.
+ * @inode is the last inode associated with the path walk.
  *
  * Security hooks for file operations
  *
@@ -1514,6 +1518,8 @@ union security_list_options {
void (*inode_getsecid)(struct inode *inode, u32 *secid);
int (*inode_copy_up)(struct dentry *src, struct cred **new);
int (*inode_copy_up_xattr)(const char *name);
+   void (*nameidata_put_lookup)(struct nameidata_lookup *lookup,
+

[PATCH bpf-next v8 10/11] bpf,landlock: Add tests for Landlock

2018-02-26 Thread Mickaël Salaün

Test basic context access, ptrace protection, filesystem hooks, and
Landlock program chaining, with multiple cases.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Shuah Khan 
Cc: Will Drewry 
---

Changes since v7:
* update tests and add new ones for filesystem hierarchy and Landlock
  chains.

Changes since v6:
* use the new kselftest_harness.h
* use const variables
* replace ASSERT_STEP with ASSERT_*
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* force sample library rebuild
* fix install target

Changes since v5:
* add subtype test
* add ptrace tests
* split and rename files
* cleanup and rebase
---
 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/bpf/bpf_helpers.h  |   7 +
 tools/testing/selftests/bpf/test_verifier.c|  84 +
 tools/testing/selftests/landlock/.gitignore|   5 +
 tools/testing/selftests/landlock/Makefile  |  35 ++
 tools/testing/selftests/landlock/test.h|  31 ++
 tools/testing/selftests/landlock/test_base.c   |  27 ++
 tools/testing/selftests/landlock/test_chain.c  | 249 +
 tools/testing/selftests/landlock/test_fs.c | 492 +
 tools/testing/selftests/landlock/test_ptrace.c | 158 
 10 files changed, 1089 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_chain.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 7442dfb73b7f..5d00deb3cab6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += gpio
 TARGETS += intel_pstate
 TARGETS += ipc
 TARGETS += kcmp
+TARGETS += landlock
 TARGETS += lib
 TARGETS += membarrier
 TARGETS += memfd
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index dde2c11d7771..414e267491f7 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -86,6 +86,13 @@ static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
(void *) BPF_FUNC_perf_prog_read_value;
 static int (*bpf_override_return)(void *ctx, unsigned long rc) =
(void *) BPF_FUNC_override_return;
+static unsigned long long (*bpf_inode_map_lookup)(void *map, void *key) =
+   (void *) BPF_FUNC_inode_map_lookup;
+static unsigned long long (*bpf_inode_get_tag)(void *inode, void *chain) =
+   (void *) BPF_FUNC_inode_get_tag;
+static unsigned long long (*bpf_landlock_set_tag)(void *tag_obj, void *chain,
+ unsigned long long value) =
+   (void *) BPF_FUNC_landlock_set_tag;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3c24a5a7bafc..5f68b95187fe 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -11240,6 +11241,89 @@ static struct bpf_test tests[] = {
.result = REJECT,
.has_prog_subtype = true,
},
+   {
+   "missing subtype",
+   .insns = {
+   BPF_MOV32_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "",
+   .result = REJECT,
+   .prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+   },
+   {
+   "landlock/fs_pick: always accept",
+   .insns = {
+   BPF_MOV32_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK,
+   .has_prog_subtype = true,
+   .prog_subtype = {
+   .landlock_hook = {
+   .type = LANDLOCK_HOOK_FS_PICK,
+   .triggers = LANDLOCK_TRIGGER_FS_PICK_READ,
+   }
+   },
+   },
+   {
+   "landlock/fs_pick: read context",
+   .insns = {
+   BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+   offsetof(struct landlock_ctx_fs_pick, cookie)),
+   /* test operations on raw values */
+

[PATCH bpf-next v8 11/11] landlock: Add user and kernel documentation for Landlock

2018-02-26 Thread Mickaël Salaün

This documentation can be built with the Sphinx framework.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v7:
* update documentation according to the Landlock revamp

Changes since v6:
* add a check for ctx->event
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose and add a
  dedicated changelog section
* update tables
* relax no_new_privs recommendations
* remove ABILITY_WRITE related functions
* reword rule "appending" to "prepending" and explain it
* cosmetic fixes

Changes since v5:
* update the rule hierarchy inheritance explanation
* briefly explain ctx->arg2
* add ptrace restrictions
* explain EPERM
* update example (subtype)
* use ":manpage:"
---
 Documentation/security/index.rst   |   1 +
 Documentation/security/landlock/index.rst  |  19 +++
 Documentation/security/landlock/kernel.rst | 100 ++
 Documentation/security/landlock/user.rst   | 206 +
 4 files changed, 326 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 298a94a33f05..1db294025d0f 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -11,3 +11,4 @@ Security Documentation
LSM
self-protection
tpm/index
+   landlock/index
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index ..8afde6a5805c
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,19 @@
+=========================================
+Landlock LSM: programmatic access control
+=========================================
+
+Landlock is a stackable Linux Security Module (LSM) that makes it possible to
+create security sandboxes.  This kind of sandbox is expected to help mitigate
+the security impact of bugs or unexpected/malicious behaviors in user-space
+applications.  The current version allows only a process with the global
+CAP_SYS_ADMIN capability to create such sandboxes but the ultimate goal of
+Landlock is to empower any process, including unprivileged ones, to securely
+restrict themselves.  Landlock is inspired by seccomp-bpf but instead of
+filtering syscalls and their raw arguments, a Landlock rule can inspect the use
+of kernel objects like files and hence make a decision according to the kernel
+semantics.
+
+.. toctree::
+
+   user
+   kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index ..0a52915e346c
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,100 @@
+==============================
+Landlock: kernel documentation
+==============================
+
+eBPF properties
+===============
+
+To get an expressive language while still being safe and small, Landlock is
+based on eBPF. Landlock should be usable by untrusted processes and must
+therefore expose a minimal attack surface. The eBPF bytecode is minimal,
+powerful, widely used and designed to be used by untrusted applications. Thus,
+reusing the eBPF support in the kernel enables a generic approach while
+minimizing new code.
+
+An eBPF program has access to an eBPF context containing some fields used to
+inspect the current object. These arguments can be used directly (e.g. cookie)
+or passed to helper functions according to their types (e.g. inode pointer). It
+is then possible to do complex access checks without race conditions or
+inconsistent evaluation (i.e.  `incorrect mirroring of the OS code and state
+`_).
+
+A Landlock hook describes a particular access type.  For now, there are three
+hooks dedicated to filesystem related operations: LANDLOCK_HOOK_FS_PICK,
+LANDLOCK_HOOK_FS_WALK and LANDLOCK_HOOK_FS_GET.  A Landlock program is tied to
+one hook.  This makes it possible to statically check context accesses,
+potentially performed by such a program, and hence prevent kernel address leaks
+and ensure the right use of hook arguments with eBPF functions.  Any user can
+add multiple Landlock programs per Landlock hook.  They are stacked and
+evaluated one after the other, starting from the most recent program, as
+seccomp-bpf does with its filters.  Underneath, a hook is an abstraction over a
+set of LSM hooks.
+
+
+Guiding principles
+==================
+
+Unprivileged use
+----------------
+
+* Landlock helpers and context should be usable by any unprivileged and
+  untrusted program while following the

[PATCH bpf-next v8 07/11] landlock: Handle filesystem access control

2018-02-26 Thread Mickaël Salaün

This adds three Landlock hooks: FS_WALK, FS_PICK and FS_GET.

The FS_WALK hook is used to walk through a file path. A program tied to
this hook will be evaluated for each directory traversal except the last
one if it is the leaf of the path.
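
To make the FS_WALK/FS_PICK split concrete, here is a small user-space model in C (not kernel code). It assumes one FS_WALK evaluation per intermediate path component and leaves the leaf component to FS_PICK. The name model_count_fs_walk() is hypothetical, and '.', '..', symlinks and mount crossings, which the real path walk has to deal with, are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (not kernel code): split @path on '/' and return how many
 * components are directory traversals that an FS_WALK program would see.
 * The last component (the leaf) is left for FS_PICK and is returned through
 * @leaf/@leaf_len.  '.', '..', symlinks and mount points are not handled.
 */
static size_t model_count_fs_walk(const char *path, const char **leaf,
				  size_t *leaf_len)
{
	size_t components = 0;
	const char *p = path;

	assert(path && leaf && leaf_len);
	*leaf = NULL;
	*leaf_len = 0;
	for (;;) {
		const char *start;

		while (*p == '/')		/* skip one or more separators */
			p++;
		if (*p == '\0')
			break;
		start = p;
		while (*p != '\0' && *p != '/')
			p++;
		*leaf = start;			/* last component seen so far */
		*leaf_len = (size_t)(p - start);
		components++;
	}
	/* every component except the leaf is walked through */
	return components ? components - 1 : 0;
}
```

With this model, "/usr/lib/libc.so" yields two FS_WALK evaluations ("usr" and "lib") and leaves "libc.so" to FS_PICK.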

The FS_PICK hook is used to validate a set of actions requested on a
file. These actions are defined with triggers (e.g. read, write, open,
append...).
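
A rough user-space model of how triggers could select FS_PICK programs follows. It assumes that a program is only evaluated when at least one of its triggers is among the requested actions, and that any denial rejects the whole request. The MODEL_TRIGGER_* bits, struct model_pick_prog and model_fs_pick() are hypothetical. Only LANDLOCK_TRIGGER_FS_PICK_READ appears in this series, in the verifier tests of patch 10/11.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trigger bits; the real LANDLOCK_TRIGGER_FS_PICK_* values come
 * from the Landlock UAPI header and are not reproduced here. */
#define MODEL_TRIGGER_READ	(1ULL << 0)
#define MODEL_TRIGGER_WRITE	(1ULL << 1)
#define MODEL_TRIGGER_OPEN	(1ULL << 2)
#define MODEL_TRIGGER_APPEND	(1ULL << 3)

struct model_pick_prog {
	uint64_t triggers;			/* actions this program is tied to */
	int (*decide)(uint64_t requested);	/* 0 to allow, non-zero to deny */
};

/* Example decision functions for the sketch. */
static int model_allow(uint64_t requested) { (void)requested; return 0; }
static int model_deny(uint64_t requested) { (void)requested; return 1; }

/*
 * Evaluate an FS_PICK request: a program only runs if one of its triggers is
 * part of the requested actions, and any denial rejects the whole request.
 */
static int model_fs_pick(const struct model_pick_prog *progs, size_t n,
			 uint64_t requested)
{
	size_t i;

	assert(progs || n == 0);
	for (i = 0; i < n; i++) {
		if (!(progs[i].triggers & requested))
			continue;	/* not concerned by this access */
		if (progs[i].decide(requested))
			return -EPERM;
	}
	return 0;
}
```

Using a bitmask keeps the check to one AND per program, so a read-only program is simply skipped for a write request.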

The FS_GET hook is used to tag open files, which is necessary to be able
to evaluate relative paths.  A program tied to this hook can tag a file
with an inode map.
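
The tagging step can be sketched as follows, assuming a tag is simply the value that an inode map associates with the opened file's inode. The structures and functions below are hypothetical stand-ins. The lookup loosely mirrors the bpf_inode_map_lookup() and bpf_landlock_set_tag() helpers declared in bpf_helpers.h in patch 10/11, but their real arguments (such as the chain) are not modelled. A later relative lookup, for instance openat() on a tagged directory descriptor, could then start from that tag instead of from an unknown path to the root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode map: (inode number, value) pairs. */
struct model_inode_entry {
	uint64_t ino;
	uint64_t value;
};

/* Hypothetical stand-in for an open file and its Landlock tag. */
struct model_open_file {
	uint64_t ino;
	uint64_t tag;		/* 0 means "not tagged" in this sketch */
};

static uint64_t model_inode_map_lookup(const struct model_inode_entry *map,
				       size_t n, uint64_t ino)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].ino == ino)
			return map[i].value;
	return 0;		/* inode not in the map */
}

/* FS_GET model: when a file is opened, tag it with its inode's map value. */
static void model_fs_get(struct model_open_file *file,
			 const struct model_inode_entry *map, size_t n)
{
	assert(file);
	file->tag = model_inode_map_lookup(map, n, file->ino);
}
```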

A Landlock program can be chained to another if it is permitted by the
BPF verifier. A FS_WALK can be chained to a FS_PICK which can be chained
to a FS_GET.
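
Reading "chained to" as the order in which programs are linked (FS_WALK, then FS_PICK, then FS_GET), the rule can be modelled as a transition check. enum model_hook and both functions are hypothetical. Whether the real verifier accepts other transitions (for instance FS_PICK after FS_PICK) is not stated here, so this sketch rejects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of LANDLOCK_HOOK_FS_WALK/_PICK/_GET for this sketch. */
enum model_hook {
	MODEL_HOOK_FS_WALK,
	MODEL_HOOK_FS_PICK,
	MODEL_HOOK_FS_GET,
};

/* Only the two transitions named in the commit message are accepted. */
static bool model_can_chain(enum model_hook prev, enum model_hook next)
{
	switch (prev) {
	case MODEL_HOOK_FS_WALK:
		return next == MODEL_HOOK_FS_PICK;
	case MODEL_HOOK_FS_PICK:
		return next == MODEL_HOOK_FS_GET;
	default:
		return false;
	}
}

/* Check a whole chain, oldest program first. */
static bool model_chain_is_valid(const enum model_hook *chain, size_t len)
{
	size_t i;

	assert(chain || len == 0);
	for (i = 1; i < len; i++)
		if (!model_can_chain(chain[i - 1], chain[i]))
			return false;
	return true;
}
```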

The Landlock LSM hook registration is done after other LSMs to only run
actions from user-space, via eBPF programs, if the access was granted by
major (privileged) LSMs.
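
The ordering matters because, for int-returning hooks, security/security.c calls each registered LSM in turn and stops at the first non-zero return (the call_int_hook() macro). The user-space model below reproduces that loop. The two example hooks are hypothetical and only show that a hook registered last, such as Landlock's, never runs once an earlier LSM has denied the access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*model_hook_fn)(int request);

/*
 * Call each registered hook in order and stop at the first non-zero return,
 * like the call_int_hook() loop in security/security.c.  @ran reports how
 * many hooks were actually invoked.
 */
static int model_call_int_hook(const model_hook_fn *hooks, size_t n,
			       int request, size_t *ran)
{
	int rc = 0;
	size_t i;

	assert(ran);
	*ran = 0;
	for (i = 0; i < n; i++) {
		(*ran)++;
		rc = hooks[i](request);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Hypothetical hooks: a "major" LSM registered first, Landlock last. */
static int model_major_lsm(int request)
{
	return (request & 1) ? -EACCES : 0;	/* denies odd requests */
}

static int model_landlock(int request)
{
	return request > 10 ? -EPERM : 0;	/* denies large requests */
}
```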

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v7:
* major rewrite with clean Landlock hooks able to deal with file paths

Changes since v6:
* add 3 more sub-events: IOCTL, LOCK, FCNTL
  https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35...@digikod.net
* use the new security_add_hooks()
* explain the -Werror=unused-function
* constify pointers
* cleanup headers

Changes since v5:
* split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
* add more documentation
* cosmetic fixes
* rebase (SCALAR_VALUE)

Changes since v4:
* add LSM hook abstraction called Landlock event
  * use the compiler type checking to verify hooks used by an event
  * handle all filesystem related LSM hooks (e.g. file_permission,
mmap_file, sb_mount...)
* register BPF programs for Landlock just after LSM hooks registration
* move hooks registration after other LSMs
* add failsafes to check if a hook is not used by the kernel
* allow partial raw value access from the context (needed for programs
  generated by LLVM)

Changes since v3:
* split commit
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
---
 include/linux/lsm_hooks.h   |5 +
 security/landlock/Makefile  |5 +-
 security/landlock/common.h  |9 +
 security/landlock/enforce_seccomp.c |   10 +
 security/landlock/hooks.c   |  121 +
 security/landlock/hooks.h   |   35 ++
 security/landlock/hooks_cred.c  |   52 ++
 security/landlock/hooks_cred.h  |1 +
 security/landlock/hooks_fs.c| 1021 +++
 security/landlock/hooks_fs.h|   60 ++
 security/landlock/init.c|   56 ++
 security/landlock/task.c|   34 ++
 security/landlock/task.h|   29 +
 security/security.c |   12 +-
 14 files changed, 1447 insertions(+), 3 deletions(-)
 create mode 100644 security/landlock/hooks.c
 create mode 100644 security/landlock/hooks.h
 create mode 100644 security/landlock/hooks_cred.c
 create mode 100644 security/landlock/hooks_cred.h
 create mode 100644 security/landlock/hooks_fs.c
 create mode 100644 security/landlock/hooks_fs.h
 create mode 100644 security/landlock/task.c
 create mode 100644 security/landlock/task.h

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d71cf183f0be..c40163385b68 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -2032,5 +2032,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 0e1dd4612ecc..d0f532a93b4e 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := init.o chain.o \
+landlock-y := init.o chain.o task.o \
tag.o tag_fs.o \
-   enforce.o enforce_seccomp.o
+   enforce.o enforce_seccomp.o \
+   hooks.o hooks_cred.o hooks_fs.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
index 245e4ccafcf2..6d36b70068d5 100644
--- a/security/landlock/common.h
+++ b/security/landlock/common.h
@@ -82,4 +82,13 @@ static inline enum landlock_hook_type get_type(struct bpf_prog *prog)
return prog->aux->extra->subtype.landlock_hook.type;
 }
 
+__maybe_unused
+static bool current_has_prog_type(enum landlock_hook_type hook_type)
+{
+   struct landlock_prog_set *prog_set;
+
+   prog_set = current->seccomp.landlock_prog_set;
+   return (prog_set && prog_set->programs[get_index(hook_type)]);
+}
+
 #endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git

[PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions

2018-02-26 Thread Mickaël Salaün

A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process' rules.
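
Because a child task inherits its parent's program lists and can only prepend to them, "subset" can be checked by pointer identity: the tracer's newest program must appear somewhere in the tracee's list. The following user-space model shows that logic for a single hook type and follows the decision order of task_ptrace() in this patch. The model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for one hook's struct landlock_prog_list. */
struct model_prog_list {
	const struct model_prog_list *prev;	/* older program, or NULL */
};

/* True if @tracer's newest program is somewhere in @tracee's list. */
static bool model_is_subset(const struct model_prog_list *tracer,
			    const struct model_prog_list *tracee)
{
	const struct model_prog_list *walker;

	if (!tracer)
		return true;	/* an empty list is a subset of any list */
	for (walker = tracee; walker; walker = walker->prev)
		if (walker == tracer)
			return true;
	return false;
}

/* Same decision order as task_ptrace(), for a single hook type. */
static int model_task_ptrace(const struct model_prog_list *tracer,
			     const struct model_prog_list *tracee)
{
	if (!tracer)
		return 0;		/* tracer is not landlocked */
	if (!tracee)
		return -EPERM;		/* tracee is less restricted */
	return model_is_subset(tracer, tracee) ? 0 : -EPERM;
}
```

A sandboxed tracer can therefore only trace tasks that are at least as restricted as itself, never an unrelated sandbox or an unconfined task.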

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* factor out ptrace check
* constify pointers
* cleanup headers
* use the new security_add_hooks()
---
 security/landlock/Makefile   |   2 +-
 security/landlock/hooks_ptrace.c | 124 +++
 security/landlock/hooks_ptrace.h |  11 
 security/landlock/init.c |   2 +
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/hooks_ptrace.c
 create mode 100644 security/landlock/hooks_ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d0f532a93b4e..605504d852d3 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 landlock-y := init.o chain.o task.o \
tag.o tag_fs.o \
enforce.o enforce_seccomp.o \
-   hooks.o hooks_cred.o hooks_fs.o
+   hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
new file mode 100644
index ..f1b977b9c808
--- /dev/null
+++ b/security/landlock/hooks_ptrace.c
@@ -0,0 +1,124 @@
+/*
+ * Landlock LSM - ptrace hooks
+ *
+ * Copyright © 2017 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include  /* ARRAY_SIZE */
+#include 
+#include  /* struct task_struct */
+#include 
+
+#include "common.h" /* struct landlock_prog_set */
+#include "hooks.h" /* landlocked() */
+#include "hooks_ptrace.h"
+
+static bool progs_are_subset(const struct landlock_prog_set *parent,
+   const struct landlock_prog_set *child)
+{
+   size_t i;
+
+   if (!parent || !child)
+   return false;
+   if (parent == child)
+   return true;
+
+   for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
+   struct landlock_prog_list *walker;
+   bool found_parent = false;
+
+   if (!parent->programs[i])
+   continue;
+   for (walker = child->programs[i]; walker;
+   walker = walker->prev) {
+   if (walker == parent->programs[i]) {
+   found_parent = true;
+   break;
+   }
+   }
+   if (!found_parent)
+   return false;
+   }
+   return true;
+}
+
+static bool task_has_subset_progs(const struct task_struct *parent,
+   const struct task_struct *child)
+{
+#ifdef CONFIG_SECCOMP_FILTER
+   if (progs_are_subset(parent->seccomp.landlock_prog_set,
+   child->seccomp.landlock_prog_set))
+   /* must be ANDed with other providers (i.e. cgroup) */
+   return true;
+#endif /* CONFIG_SECCOMP_FILTER */
+   return false;
+}
+
+static int task_ptrace(const struct task_struct *parent,
+   const struct task_struct *child)
+{
+   if (!landlocked(parent))
+   return 0;
+
+   if (!landlocked(child))
+   return -EPERM;
+
+   if (task_has_subset_progs(parent, child))
+   return 0;
+
+   return -EPERM;
+}
+
+/**
+ * hook_ptrace_access_check - determine whether the current process may access
+ *   another
+ *
+ * @child: the process to be accessed
+ * @mode: the mode of attachment
+ *
+ * If the current task has Landlock programs, then the child must have at least
+ * the same programs.  Else denied.
+ *
+ * Determine whether a process may access another, returning 0 if permission
+ * granted, -errno if denied.
+ */
+static int hook_ptrace_access_check(struct task_struct *child,
+   unsigned int mode)
+{
+   return task_ptrace(current, child);
+}
+
+/**
+ * hook_ptrace_traceme - determine whether another process may trace the
+ *  current one
+ *
+ * @parent: the task proposed to be the tracer
+ *
+ * If the parent has Landlock programs, then the current task must have the
+ * same or more programs.
+ * Else denied.
+ *
+ * Determine whether the nominated task is permitted to trace the current
+ * process, returning 0 if

[PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions

2018-02-26 Thread Mickaël Salaün

A landlocked process has less privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process' rules.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* factor out ptrace check
* constify pointers
* cleanup headers
* use the new security_add_hooks()
---
 security/landlock/Makefile   |   2 +-
 security/landlock/hooks_ptrace.c | 124 +++
 security/landlock/hooks_ptrace.h |  11 
 security/landlock/init.c |   2 +
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/hooks_ptrace.c
 create mode 100644 security/landlock/hooks_ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d0f532a93b4e..605504d852d3 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -3,4 +3,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 landlock-y := init.o chain.o task.o \
tag.o tag_fs.o \
enforce.o enforce_seccomp.o \
-   hooks.o hooks_cred.o hooks_fs.o
+   hooks.o hooks_cred.o hooks_fs.o hooks_ptrace.o
diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
new file mode 100644
index ..f1b977b9c808
--- /dev/null
+++ b/security/landlock/hooks_ptrace.c
@@ -0,0 +1,124 @@
+/*
+ * Landlock LSM - ptrace hooks
+ *
+ * Copyright © 2017 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include  /* ARRAY_SIZE */
+#include 
+#include  /* struct task_struct */
+#include 
+
+#include "common.h" /* struct landlock_prog_set */
+#include "hooks.h" /* landlocked() */
+#include "hooks_ptrace.h"
+
+static bool progs_are_subset(const struct landlock_prog_set *parent,
+		const struct landlock_prog_set *child)
+{
+	size_t i;
+
+	if (!parent || !child)
+		return false;
+	if (parent == child)
+		return true;
+
+	for (i = 0; i < ARRAY_SIZE(child->programs); i++) {
+		struct landlock_prog_list *walker;
+		bool found_parent = false;
+
+		if (!parent->programs[i])
+			continue;
+		for (walker = child->programs[i]; walker;
+				walker = walker->prev) {
+			if (walker == parent->programs[i]) {
+				found_parent = true;
+				break;
+			}
+		}
+		if (!found_parent)
+			return false;
+	}
+	return true;
+}
+
+static bool task_has_subset_progs(const struct task_struct *parent,
+		const struct task_struct *child)
+{
+#ifdef CONFIG_SECCOMP_FILTER
+	if (progs_are_subset(parent->seccomp.landlock_prog_set,
+				child->seccomp.landlock_prog_set))
+		/* must be ANDed with other providers (i.e. cgroup) */
+		return true;
+#endif /* CONFIG_SECCOMP_FILTER */
+	return false;
+}
+
+static int task_ptrace(const struct task_struct *parent,
+		const struct task_struct *child)
+{
+	if (!landlocked(parent))
+		return 0;
+
+	if (!landlocked(child))
+		return -EPERM;
+
+	if (task_has_subset_progs(parent, child))
+		return 0;
+
+	return -EPERM;
+}
+
+/**
+ * hook_ptrace_access_check - determine whether the current process may access
+ *			      another
+ *
+ * @child: the process to be accessed
+ * @mode: the mode of attachment
+ *
+ * If the current task has Landlock programs, then the child must have at least
+ * the same programs.  Otherwise, access is denied.
+ *
+ * Determine whether a process may access another, returning 0 if permission
+ * is granted, -errno if denied.
+ */
+static int hook_ptrace_access_check(struct task_struct *child,
+		unsigned int mode)
+{
+	return task_ptrace(current, child);
+}
+
+/**
+ * hook_ptrace_traceme - determine whether another process may trace the
+ *			 current one
+ *
+ * @parent: the task proposed to be the tracer
+ *
+ * If the parent has Landlock programs, then the current task must have the
+ * same or more programs.  Otherwise, access is denied.
+ *
+ * Determine whether the nominated task is permitted to trace the current
+ * process, returning 0 if permission is granted, -errno if denied.
+ */
+static int hook_ptrace_traceme(struct task_struct *parent)
+{
+	return task_ptrace(parent, current);
+}
+
+static struct

[PATCH bpf-next v8 09/11] bpf: Add a Landlock sandbox example

2018-02-26 Thread Mickaël Salaün

Add a basic sandbox tool that launches a command which is only allowed to
access a whitelist of file hierarchies, each either read-only or
read-write.

Add to the bpf_load library the ability to handle BPF program subtypes.
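A hypothetical sketch of the prefix dispatch that `load_and_attach()` performs with `strncmp()` later in this patch: the program type is picked from the ELF section name. The enum values and the section names in the usage below are placeholders, not the real `enum bpf_prog_type` constants or the sample's actual section names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder program types; not the real enum bpf_prog_type values. */
enum sample_prog_type {
	SAMPLE_PROG_UNKNOWN,
	SAMPLE_PROG_SOCK_OPS,
	SAMPLE_PROG_SK_SKB,
	SAMPLE_PROG_LANDLOCK,
};

struct section_prefix {
	const char *prefix;
	enum sample_prog_type type;
};

/* Prefixes tested by load_and_attach() in this patch. */
static const struct section_prefix section_prefixes[] = {
	{ "sockops", SAMPLE_PROG_SOCK_OPS },
	{ "sk_skb", SAMPLE_PROG_SK_SKB },
	{ "landlock", SAMPLE_PROG_LANDLOCK },
};

/* Return the type of the first prefix that starts the section name. */
static enum sample_prog_type prog_type_from_section(const char *event)
{
	size_t n = sizeof(section_prefixes) / sizeof(section_prefixes[0]);

	assert(event != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *p = section_prefixes[i].prefix;

		/* compare exactly strlen(p) bytes, like strncmp(event, "landlock", 8) */
		if (strncmp(event, p, strlen(p)) == 0)
			return section_prefixes[i].type;
	}
	return SAMPLE_PROG_UNKNOWN;
}
```

Deriving the compared length from `strlen()` of the prefix, rather than hard-coding 7, 6 or 8 next to each literal, keeps the length and the string from drifting apart.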

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v7:
* rewrite the example using an inode map
* add to bpf_load the ability to handle subtypes per program type

Changes since v6:
* check return value of load_and_attach()
* allow writing to pipes
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose
* use const variable (suggested by Kees Cook)
* remove useless definitions (suggested by Kees Cook)
* add detailed explanations (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes
* rebase

Changes since v4:
* write the Landlock rule in C and compile it with LLVM
* remove cgroup handling
* remove path handling: only handle a read-only environment
* remove errno return codes

Changes since v3:
* remove seccomp and origin field: completely free from seccomp programs
* handle more FS-related hooks
* handle inode hooks and directory traversal
* add faked but consistent view thanks to ENOENT
* add /lib64 in the example
* fix spelling
* rename some types and definitions (e.g. SECCOMP_ADD_LANDLOCK_RULE)

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling
---
 samples/bpf/Makefile |   4 +
 samples/bpf/bpf_load.c   |  82 -
 samples/bpf/bpf_load.h   |   7 ++
 samples/bpf/landlock1.h  |  14 
 samples/bpf/landlock1_kern.c | 171 +++
 samples/bpf/landlock1_user.c | 164 +
 6 files changed, 439 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/landlock1.h
 create mode 100644 samples/bpf/landlock1_kern.c
 create mode 100644 samples/bpf/landlock1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ec3fc8d88e87..015b1375daa5 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -43,6 +43,7 @@ hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
+hostprogs-y += landlock1
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -93,6 +94,7 @@ xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+landlock1-objs := bpf_load.o $(LIBBPF) landlock1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -144,6 +146,7 @@ always += xdp_monitor_kern.o
 always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
+always += landlock1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -188,6 +191,7 @@ HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
+HOSTLOADLIBES_landlock1 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 5bb37db6054b..f7c91093b2f5 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,9 @@ int prog_array_fd = -1;
 struct bpf_map_data map_data[MAX_MAPS];
 int map_data_count = 0;
 
+struct bpf_subtype_data subtype_data[MAX_PROGS];
+int subtype_data_count = 0;
+
 static int populate_prog_array(const char *event, int prog_fd)
 {
int ind = atoi(event), err;
@@ -67,12 +71,14 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
bool is_sockops = strncmp(event, "sockops", 7) == 0;
bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+   bool is_landlock = strncmp(event, "landlock", 8) == 0;
size_t insns_cnt = size / sizeof(struct bpf_insn);
enum bpf_prog_type prog_type;
char buf[256];
int fd, efd, err, id;
struct perf_event_attr attr = {};
union bpf_prog_subtype *st = NULL;
+   struct bpf_subtype_data *sd = NULL;
 
attr.type = PERF_TYPE_TRACEPOINT;
attr.sample_type = PERF_SAMPLE_RAW;
@@ -97,6 +103,50 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_SOCK_OPS;
} else if (is_sk_skb) {
prog_type = BPF_PROG_TYPE_SK_SKB;
+   } else if (is_landlock)

Re: [RFC PATCH V2 13/22] x86/intel_rdt: Support schemata write - pseudo-locking core

2018-02-26 Thread Reinette Chatre

Hi Thomas,

On 2/20/2018 9:15 AM, Thomas Gleixner wrote:
> Let's look at the existing crtl/mon groups which are each represented by a
> directory already.
> 
>  - Adding a 'size' file to the ctrl groups would be a natural extension
>which makes sense for regular cache allocations as well.
> 
>  - Adding a 'exclusive' flag would be an interesting feature even for the
>normal use case. Marking a group as exclusive prevents other groups to
>request CBM bits which are held by a exclusive allocation.
> 
>I'd suggest to have a file 'mode' for controlling this. The valid values
>would be something like 'shareable' and 'exclusive'.
> 
>When trying to set a group to exclusive mode then the schemata has to be
>checked for overlaps with the other schematas and in case of conflict
>the write fails. Once enabled subsequent writes to the schemata file
>need to be checked for conflicts as well.
> 
>If the exclusive setting is enabled then the CBM bits of that group
>are excluded from being used in other control groups.
> 
> Aside of that a file in the info directory which shows the (un)used CBM
> bits of all groups is really helpful for controlling all of that (even w/o
> pseudo locking). You have this in the 'avail' file, but there is no reason
> why this should only be available for pseudo locking enabled systems.
> 
> Now for the pseudo locking part.
> 
> What you need on top of the above is a new 'mode': 'locked'. That mode
> utilizes the 'exclusive' mode rules vs. conflict checking and the
> protection against allocating the associated CBM bits in other control
> groups.
> 
> The setup would be like this:
> 
> mkdir group
> echo '$CONFIG' >group/schemata
> echo 'locked' >group/mode
> 
> Setting mode to locked locks down the schemata file along with the
> task/cpus/cpus_list files. The task/cpu files need to be empty when
> entering locked mode, otherwise the operation fails. I'd even would not
> bother handing back the CLOSID. For simplicity the CLOSID should just stay
> associated with the control group until it is destroyed as any other
> control group.

I started looking at how this implementation may look and would like to
confirm with you that your intentions behind the new "exclusive" and
"locked" modes can be maintained. I also have a few questions.

Focusing on CAT, a resource group represents a closid across all domains
(cache instances) of all resources (cache layers) on the system. A full
schemata reflecting the active bitmask associated with this closid for
each domain of each resource is maintained. The current implementation
supports partial writes to the schemata, with the assumption that only
the changed values need to be updated, the others remain as is. For the
current implementation this works well since what is shown by schemata
reflects current hardware settings and what is written to schemata will
change current hardware settings. This is done irrespective of any
overlap between bitmasks of different closids (the "shareable" mode).
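To make the partial-write semantics concrete, here is a user-space sketch that applies one schemata line such as `L2:1=0x3` (the `RESOURCE:domain=mask` form, with `;` separating domains) to a per-resource table. `struct resource_cbm`, `MAX_DOMAINS` and `apply_schemata_line()` are hypothetical and do not reflect the resctrl code; rejecting a zero mask follows the hardware constraint discussed in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DOMAINS 8	/* hypothetical limit for the sketch */

/* Current CBM per domain for one resource. */
struct resource_cbm {
	char name[8];			/* e.g. "L2" or "L3" */
	uint32_t cbm[MAX_DOMAINS];
};

/*
 * Apply one line such as "L2:1=0x3" or "L2:0=ff;1=f0" to res.
 * Only the listed domains change (partial write); the others keep their
 * value. Returns false on a malformed line, a different resource, an
 * out-of-range domain or a zero mask, leaving res untouched.
 */
static bool apply_schemata_line(struct resource_cbm *res, const char *line)
{
	size_t len = strlen(res->name);
	uint32_t next[MAX_DOMAINS];
	const char *p;

	if (strncmp(line, res->name, len) != 0 || line[len] != ':')
		return false;
	memcpy(next, res->cbm, sizeof(next));
	p = line + len + 1;
	for (;;) {
		char *end;
		unsigned long dom = strtoul(p, &end, 10);
		unsigned long mask;

		if (end == p || *end != '=' || dom >= MAX_DOMAINS)
			return false;
		p = end + 1;
		mask = strtoul(p, &end, 16);	/* accepts an optional 0x prefix */
		if (end == p || mask == 0 || mask > UINT32_MAX)
			return false;
		next[dom] = (uint32_t)mask;
		if (*end == '\0')
			break;
		if (*end != ';')
			return false;
		p = end + 1;
	}
	memcpy(res->cbm, next, sizeof(next));	/* commit only if every entry parsed */
	return true;
}
```

After applying `L2:1=0x3`, only domain 1 changes; domain 0 keeps its previous non-zero mask, so the schemata alone no longer tells which region the user meant.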

A first change could be to initialize the schemata of a newly created
resource group with all the shareable and unused bits set for all
domains.
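A minimal sketch of that initialization for one cache domain, assuming each existing group records its CBM and whether it is exclusive (`struct group_cbm` and `default_cbm()` are hypothetical): the new group gets every bit of the cache except those held by exclusive groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group state for one cache domain. */
struct group_cbm {
	uint32_t cbm;		/* capacity bitmask of this group */
	bool exclusive;		/* true if the group's mode is "exclusive" */
};

/* All shareable and unused bits: full_mask minus bits held exclusively. */
static uint32_t default_cbm(uint32_t full_mask,
			    const struct group_cbm *groups, size_t n)
{
	uint32_t reserved = 0;

	assert(groups != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (groups[i].exclusive)
			reserved |= groups[i].cbm;
	return full_mask & ~reserved;
}
```

If exclusive groups held every bit of a domain, the result would be zero, which the hardware does not accept, so creating the group would have to fail on that domain.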

Moving to "exclusive" mode it appears that, when enabled for a resource
group, all domains of all resources are forced to have an "exclusive"
region associated with this resource group (closid). This is because the
schemata reflects the hardware settings of all resources and their
domains and the hardware does not accept a "zero" bitmask. A user thus
cannot just specify a single region of a particular cache instance as
"exclusive". Does this match your intention wrt "exclusive"?

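The consequence described above can be written as a check run before a group switches to "exclusive". This hypothetical `can_be_exclusive()` (not resctrl code) rejects the switch if the group's mask on any domain is zero or overlaps another group's mask on that domain, so an exclusive group necessarily owns a non-empty region on every domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * mine[d] is this group's CBM on domain d; others[g][d] is group g's CBM.
 * Zero masks are refused because the hardware does not accept them.
 */
static bool can_be_exclusive(const uint32_t *mine, size_t ndomains,
			     const uint32_t *const others[], size_t nothers)
{
	assert(mine != NULL);
	for (size_t d = 0; d < ndomains; d++) {
		if (mine[d] == 0)
			return false;
		for (size_t g = 0; g < nothers; g++)
			if (mine[d] & others[g][d])
				return false;	/* overlap on domain d */
	}
	return true;
}
```
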
Moving on to the "locked" mode. We cannot support different
pseudo-locked regions across multiple resources (e.g. L2 and L3). In
fact, if we did support this at some point in the future, a pseudo-locked
region on one resource could implicitly span a second resource.
Additionally, we would like to allow a user to create a single
pseudo-locked region on a single cache instance.

From the above it follows that "locked" mode cannot simply build on
top of "exclusive" mode rules (as I expressed them above) since it
cannot enforce a locked region on each domain of each resource.

We would like to support something like (as you also have in your example):

mkdir group
echo "L2:1=0x3" > schemata
echo locked > mode

The above should only pseudo-lock the indicated region and not touch any
other domain. The problem is that the schemata always contains non-zero
bitmasks for all domains so at the time "locked" is written it is not
known which cache region needs to be locked. I am currently unable to
see a simple way to build on top of the current schemata design to
support the "locked" mode as you intended. It does seem as though the
user's intention to create a pseudo-locked region needs to be
communicated before the schemata is written, but from what I understand
this does not seem to be

Re: [PATCH v15 00/11] fw_cfg: add DMA operations & etc/vmcoreinfo support

2018-02-26 Thread Michael S. Tsirkin

On Thu, Feb 15, 2018 at 10:33:01PM +0100, Marc-André Lureau wrote:
> Hi,
> 
> This series adds DMA operations support to the qemu fw_cfg kernel
> module and populates "etc/vmcoreinfo" with vmcoreinfo location
> details (entry added since qemu 2.11 with -device vmcoreinfo).

Pls reorder with patches 3-7 first as these are fixes.

> v15:
> - fix fw_cfg.h uapi header #include
> - use BSD license for fw_cfg.h uapi header
> - move the uapi defines/structs for DMA & vmcoreinfo in the
>   corresponding patch
> - use cpu_relax() instead of usleep_range(50, 100);
> - replace do { } while(true) by for (;;)
> - fix the rmb() call location
> - add a preliminary patch to handle error from fw_cfg_write_blob()
> - rewrite fw_cfg_sel_endianness() to wrap iowrite() calls
> 
> v14:
> - add "fw_cfg: add a public uapi header"
> - fix sparse warnings & don't introduce new warnings
> - add memory barriers to force IO ordering
> - split fw_cfg_read_blob() in fw_cfg_read_blob_io() and
>   fw_cfg_read_blob_dma()
> - add error handling to fw_cfg_read_blob() callers
> - minor stylistic changes
> 
> v13:
> - reorder patch series, introduce DMA write before DMA read
> - do some measurements of DMA read speed-ups
> 
> v12:
> - fix virt_to_phys(NULL) panic with CONFIG_DEBUG_VIRTUAL=y
> - do not use DMA read, except for kmalloc() memory we allocated
>   ourself (only fw_cfg_register_dir_entries() so far)
> 
> v11:
> - add #include  in last patch,
>   fixing kbuild .config test
> 
> Marc-André Lureau (11):
>   crash: export paddr_vmcoreinfo_note()
>   fw_cfg: add a public uapi header
>   fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
>   fw_cfg: fix sparse warnings with fw_cfg_file
>   fw_cfg: fix sparse warning reading FW_CFG_ID
>   fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
>   fw_cfg: remove inline from fw_cfg_read_blob()
>   fw_cfg: handle fw_cfg_read_blob() error
>   fw_cfg: add DMA register
>   fw_cfg: write vmcoreinfo details
>   RFC: fw_cfg: do DMA read operation
> 
>  MAINTAINERS|   1 +
>  drivers/firmware/qemu_fw_cfg.c | 334 
> +
>  include/uapi/linux/fw_cfg.h|  97 
>  kernel/crash_core.c|   1 +
>  4 files changed, 369 insertions(+), 64 deletions(-)
>  create mode 100644 include/uapi/linux/fw_cfg.h
> 
> -- 
> 2.16.1.73.g5832b7e9f2

Re: [PATCH v15 10/11] fw_cfg: write vmcoreinfo details

2018-02-26 Thread Michael S. Tsirkin

On Thu, Feb 15, 2018 at 10:33:11PM +0100, Marc-André Lureau wrote:
> If the "etc/vmcoreinfo" fw_cfg file is present and we are not running
> the kdump kernel, write the addr/size of the vmcoreinfo ELF note.
> 
> The DMA operation is expected to run synchronously with today qemu,
> but the specification states that it may become async, so we run
> "control" field check in a loop for eventual changes.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  drivers/firmware/qemu_fw_cfg.c | 143 
> -
>  include/uapi/linux/fw_cfg.h|  31 +
>  2 files changed, 171 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index c28bec4b5663..3015e77aebca 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -34,11 +34,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  MODULE_AUTHOR("Gabriel L. Somlo ");
>  MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
>  MODULE_LICENSE("GPL");
>  
> +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> +static u32 fw_cfg_rev;
> +
>  /* fw_cfg device i/o register addresses */
>  static bool fw_cfg_is_mmio;
>  static phys_addr_t fw_cfg_p_base;
> @@ -60,6 +66,64 @@ static void fw_cfg_sel_endianness(u16 key)
>   iowrite16(key, fw_cfg_reg_ctrl);
>  }
>  
> +static inline bool fw_cfg_dma_enabled(void)
> +{
> +	return (fw_cfg_rev & FW_CFG_VERSION_DMA) && fw_cfg_reg_dma;
> +}
> +
> +/* qemu fw_cfg device is sync today, but spec says it may become async */
> +static void fw_cfg_wait_for_control(struct fw_cfg_dma_access *d)
> +{
> +	for (;;) {
> +		u32 ctrl = be32_to_cpu(READ_ONCE(d->control));
> +
> +		/* do not reorder the read to d->control */
> +		rmb();
> +		if ((ctrl & ~FW_CFG_DMA_CTL_ERROR) == 0)
> +			return;
> +
> +		cpu_relax();
> +	}
> +}
> +
> +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> +{
> +	phys_addr_t dma;
> +	struct fw_cfg_dma_access *d = NULL;
> +	ssize_t ret = length;
> +
> +	d = kmalloc(sizeof(*d), GFP_KERNEL);
> +	if (!d) {
> +		ret = -ENOMEM;
> +		goto end;
> +	}
> +
> +	/* fw_cfg device does not need IOMMU protection, so use physical addresses */
> +	*d = (struct fw_cfg_dma_access) {
> +		.address = cpu_to_be64(address ? virt_to_phys(address) : 0),
> +		.length = cpu_to_be32(length),
> +		.control = cpu_to_be32(control)
> +	};
> +
> +	dma = virt_to_phys(d);
> +
> +	iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
> +	/* force memory to sync before notifying device via MMIO */
> +	wmb();
> +	iowrite32be(dma, fw_cfg_reg_dma + 4);
> +
> +	fw_cfg_wait_for_control(d);
> +
> +	if (be32_to_cpu(READ_ONCE(d->control)) & FW_CFG_DMA_CTL_ERROR) {
> +		ret = -EIO;
> +	}
> +
> +end:
> +	kfree(d);
> +
> +	return ret;
> +}
> +
>  /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>  static ssize_t fw_cfg_read_blob(u16 key,
>   void *buf, loff_t pos, size_t count)
> @@ -89,6 +153,47 @@ static ssize_t fw_cfg_read_blob(u16 key,
>   return count;
>  }
>  
> +#ifdef CONFIG_CRASH_CORE
> +/* write chunk of given fw_cfg blob (caller responsible for sanity-check) */
> +static ssize_t fw_cfg_write_blob(u16 key,
> +				 void *buf, loff_t pos, size_t count)
> +{
> +	u32 glk = -1U;
> +	acpi_status status;
> +	ssize_t ret = count;
> +
> +	/* If we have ACPI, ensure mutual exclusion against any potential
> +	 * device access by the firmware, e.g. via AML methods:
> +	 */
> +	status = acpi_acquire_global_lock(ACPI_WAIT_FOREVER, &glk);
> +	if (ACPI_FAILURE(status) && status != AE_NOT_CONFIGURED) {
> +		/* Should never get here */
> +		WARN(1, "%s: Failed to lock ACPI!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&fw_cfg_dev_lock);
> +	if (pos == 0) {
> +		ret = fw_cfg_dma_transfer(buf, count, key << 16
> +					  | FW_CFG_DMA_CTL_SELECT
> +					  | FW_CFG_DMA_CTL_WRITE);
> +	} else {
> +		fw_cfg_sel_endianness(key);
> +		ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> +		if (ret < 0)
> +			goto end;
> +		ret = fw_cfg_dma_transfer(buf, count, FW_CFG_DMA_CTL_WRITE);
> +	}
> +
> +end:
> +	mutex_unlock(&fw_cfg_dev_lock);
> +
> +	acpi_release_global_lock(glk);
> +
> +	return ret;
> +}
> +#endif /* CONFIG_CRASH_CORE */
> +
>  /* clean up fw_cfg device i/o */
>  static void fw_cfg_io_cleanup(void)
>  {
> @@ -188,9 +293,6 @@ static int fw_cfg_do_platform_probe(struct 
>
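The descriptor that fw_cfg_dma_transfer() hands to the device can be modeled in user space as a sketch (not code from the series). The control bit values and the field order (big-endian control, length, address) follow the QEMU fw_cfg DMA interface description and should be checked against the uapi header added by this series; put_be(), encode_dma_access(), select_and_write() and dma_done() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control bits of the QEMU fw_cfg DMA interface. */
#define FW_CFG_DMA_CTL_ERROR	0x01
#define FW_CFG_DMA_CTL_READ	0x02
#define FW_CFG_DMA_CTL_SKIP	0x04
#define FW_CFG_DMA_CTL_SELECT	0x08
#define FW_CFG_DMA_CTL_WRITE	0x10

/* Store the low n bytes of v big-endian into buf, whatever the host order. */
static void put_be(uint8_t *buf, uint64_t v, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		buf[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Serialize the 16-byte descriptor: be32 control, be32 length, be64 address. */
static void encode_dma_access(uint8_t out[16], uint32_t control,
			      uint32_t length, uint64_t address)
{
	put_be(out, control, 4);
	put_be(out + 4, length, 4);
	put_be(out + 8, address, 8);
}

/* Control word used by fw_cfg_write_blob() when pos == 0. */
static uint32_t select_and_write(uint16_t key)
{
	return (uint32_t)key << 16 | FW_CFG_DMA_CTL_SELECT | FW_CFG_DMA_CTL_WRITE;
}

/* Exit condition of fw_cfg_wait_for_control(): all bits but ERROR clear. */
static bool dma_done(uint32_t control)
{
	return (control & ~(uint32_t)FW_CFG_DMA_CTL_ERROR) == 0;
}
```

A transfer counts as finished once the device has cleared every bit except possibly ERROR; the caller then checks that remaining bit and maps it to -EIO, as fw_cfg_dma_transfer() does.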

[PATCH 3/4 v2] fs: proc: use down_read_killable() in environ_read()

2018-02-26 Thread Yang Shi

As with /proc/*/cmdline, reading /proc/*/environ can block for a long time
while the target process is manipulating a large mapping. The reader waits
for mmap_sem to become available, and a long enough wait makes the reading
task show up as hung.

Convert down_read() and access_remote_vm() to their killable versions.

Signed-off-by: Yang Shi 
Suggested-by: Alexey Dobriyan 
---
 fs/proc/base.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9bdb84b..d87d9ab 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -933,7 +933,9 @@ static ssize_t environ_read(struct file *file, char __user *buf,
if (!mmget_not_zero(mm))
goto free;
 
-   down_read(&mm->mmap_sem);
+   ret = down_read_killable(&mm->mmap_sem);
+   if (ret)
+   goto out_mmput;
env_start = mm->env_start;
env_end = mm->env_end;
up_read(&mm->mmap_sem);
@@ -950,7 +952,8 @@ static ssize_t environ_read(struct file *file, char __user *buf,
max_len = min_t(size_t, PAGE_SIZE, count);
this_len = min(max_len, this_len);
 
-   retval = access_remote_vm(mm, (env_start + src), page, this_len, 0);
+   retval = access_remote_vm_killable(mm, (env_start + src),
+   page, this_len, 0);
 
if (retval <= 0) {
ret = retval;
@@ -968,6 +971,8 @@ static ssize_t environ_read(struct file *file, char __user *buf,
count -= retval;
}
*ppos = src;
+
+out_mmput:
mmput(mm);
 
 free:
-- 
1.8.3.1

[RFC PATCH 0/4 v2] Define killable version for access_remote_vm() and use it in fs/proc

2018-02-26 Thread Yang Shi


Background:
When running vm-scalability with large memory (> 300GB), the hung task
issue below occurs occasionally.

INFO: task ps:14018 blocked for more than 120 seconds.
   Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ps  D0 14018  1 0x0004
  885582f84000 885e8682f000 880972943000 885ebf499bc0
  8828ee12 c900349bfca8 817154d0 0040
  00ff812f872a 885ebf499bc0 024000d000948300 880972943000
 Call Trace:
  [] ? __schedule+0x250/0x730
  [] schedule+0x36/0x80
  [] rwsem_down_read_failed+0xf0/0x150
  [] call_rwsem_down_read_failed+0x18/0x30
  [] down_read+0x20/0x40
  [] proc_pid_cmdline_read+0xd9/0x4e0
  [] ? do_filp_open+0xa5/0x100
  [] __vfs_read+0x37/0x150
  [] ? security_file_permission+0x9b/0xc0
  [] vfs_read+0x96/0x130
  [] SyS_read+0x55/0xc0
  [] entry_SYSCALL_64_fastpath+0x1a/0xc5

When manipulating a large mapping, the process may hold mmap_sem for a
long time, so a reader of /proc//cmdline may stay blocked in
uninterruptible state for a long time.
Killable versions of the semaphore APIs already exist, so use
down_read_killable() here to improve responsiveness.


While reviewing the v1 patch (https://patchwork.kernel.org/patch/10230809/),
Alexey pointed out that access_remote_vm() needs to be killable too. Reading
/proc/*/environ may suffer from the same issue, so both its down_read() and
its access_remote_vm() call should be converted to the killable versions too.

Both access_remote_vm() and access_process_vm() call __access_remote_vm(),
which acquires mmap_sem with down_read(). access_remote_vm() is only used by
fs/proc/base.c, but access_process_vm() is also used by other subsystems,
e.g. ptrace and audit, so converting both of them to killable does not seem
safe. Instead, move the common (gup) part of __access_remote_vm() into a new
static function, raw_access_remote_vm(), and define __access_remote_vm() and
__access_remote_vm_killable(), which acquire mmap_sem with down_read() and
down_read_killable() respectively.

access_remote_vm() and access_remote_vm_killable() then call them
respectively, while access_process_vm() keeps calling __access_remote_vm().

So far fs/proc/base.c is the only user of access_remote_vm_killable(), but
there might be other users in the future.

There are 4 patches in this revision:
#1 define the access_remote_vm_killable() APIs
#2 convert /proc/*/cmdline reading to down_read_killable() and access_remote_vm_killable()
#3 convert /proc/*/environ reading to down_read_killable() and access_remote_vm_killable()
#4 replace access_process_vm() with access_remote_vm() in get_cmdline() to save one
   mm reference count increment (see the commit log for details). This
   change makes get_cmdline() the only user of access_remote_vm().


Yang Shi (4):
  mm: add access_remote_vm_killable APIs
  fs: proc: use down_read_killable in proc_pid_cmdline_read()
  fs: proc: use down_read_killable() in environ_read()
  mm: use access_remote_vm() in get_cmdline()

 fs/proc/base.c | 21 +++--
 include/linux/mm.h |  5 +
 mm/memory.c| 44 +---
 mm/nommu.c | 36 
 mm/util.c  |  4 ++--
 5 files changed, 91 insertions(+), 19 deletions(-)

[PATCH 2/4 v2] fs: proc: use down_read_killable in proc_pid_cmdline_read()

2018-02-26 Thread Yang Shi

When running vm-scalability with large memory (> 300GB), the hung task
issue below occurs occasionally.

INFO: task ps:14018 blocked for more than 120 seconds.
   Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ps  D0 14018  1 0x0004
  885582f84000 885e8682f000 880972943000 885ebf499bc0
  8828ee12 c900349bfca8 817154d0 0040
  00ff812f872a 885ebf499bc0 024000d000948300 880972943000
 Call Trace:
  [] ? __schedule+0x250/0x730
  [] schedule+0x36/0x80
  [] rwsem_down_read_failed+0xf0/0x150
  [] call_rwsem_down_read_failed+0x18/0x30
  [] down_read+0x20/0x40
  [] proc_pid_cmdline_read+0xd9/0x4e0
  [] ? do_filp_open+0xa5/0x100
  [] __vfs_read+0x37/0x150
  [] ? security_file_permission+0x9b/0xc0
  [] vfs_read+0x96/0x130
  [] SyS_read+0x55/0xc0
  [] entry_SYSCALL_64_fastpath+0x1a/0xc5

When manipulating a large mapping, the process may hold mmap_sem for a
long time, so a reader of /proc//cmdline may stay blocked in
uninterruptible state for a long time.

down_read_trylock() seems too aggressive, and killable versions of the
semaphore APIs already exist, so use down_read_killable() here to improve
responsiveness.

Also convert access_remote_vm() to its killable version.

Signed-off-by: Yang Shi 
Suggested-by: Alexey Dobriyan 
---
 fs/proc/base.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324..9bdb84b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
goto out_mmput;
}
 
-   down_read(&mm->mmap_sem);
+   rv = down_read_killable(&mm->mmap_sem);
+   if (rv)
+   goto out_free_page;
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
@@ -264,7 +266,7 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
 * Inherently racy -- command line shares address space
 * with code and data.
 */
-   rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
+   rv = access_remote_vm_killable(mm, arg_end - 1, &c, 1, 0);
if (rv <= 0)
goto out_free_page;
 
@@ -282,7 +284,8 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
int nr_read;
 
_count = min3(count, len, PAGE_SIZE);
-   nr_read = access_remote_vm(mm, p, page, _count, 0);
+   nr_read = access_remote_vm_killable(mm, p, page,
+   _count, 0);
if (nr_read < 0)
rv = nr_read;
if (nr_read <= 0)
@@ -328,7 +331,8 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
bool final;
 
_count = min3(count, len, PAGE_SIZE);
-   nr_read = access_remote_vm(mm, p, page, _count, 0);
+   nr_read = access_remote_vm_killable(mm, p,
+   page, _count, 0);
if (nr_read < 0)
rv = nr_read;
if (nr_read <= 0)
-- 
1.8.3.1

[PATCH 4/4 v2] mm: use access_remote_vm() in get_cmdline()

2018-02-26 Thread Yang Shi

get_cmdline() uses access_process_vm(), which takes its own reference on
the mm. However, get_cmdline() already holds an mm reference, taken before
calling access_process_vm() and kept for the whole function, so taking a
second one is unnecessary. Replace access_process_vm() with
access_remote_vm(), which requires the caller to hold a reference on the mm.

Signed-off-by: Yang Shi 
---
 mm/util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index c125050..9b40637 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -732,7 +732,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
if (len > buflen)
len = buflen;
 
-   res = access_process_vm(task, arg_start, buffer, len, FOLL_FORCE);
+   res = access_remote_vm(mm, arg_start, buffer, len, FOLL_FORCE);
 
/*
 * If the nul at the end of args has been overwritten, then
@@ -746,7 +746,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
len = env_end - env_start;
if (len > buflen - res)
len = buflen - res;
-   res += access_process_vm(task, env_start,
+   res += access_remote_vm(mm, env_start,
 buffer+res, len,
 FOLL_FORCE);
res = strnlen(buffer, res);
-- 
1.8.3.1

[PATCH 1/4 v2] mm: add access_remote_vm_killable APIs

2018-02-26 Thread Yang Shi

Extract the common part of __access_remote_vm() (everything except
acquiring mmap_sem) into raw_access_remote_vm(), then create
__access_remote_vm_killable() and access_remote_vm_killable(), which
acquire mmap_sem with down_read_killable(). The non-killable versions keep
using down_read().

For now, the killable versions are used only when reading /proc/*/cmdline
and /proc/*/environ.

Signed-off-by: Yang Shi 
Cc: Alexey Dobriyan 
---
 include/linux/mm.h |  5 +
 mm/memory.c| 44 +---
 mm/nommu.c | 36 
 3 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..4574b19 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1372,8 +1372,13 @@ extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
void *buf, int len, unsigned int gup_flags);
 extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
void *buf, int len, unsigned int gup_flags);
+extern int access_remote_vm_killable(struct mm_struct *mm, unsigned long addr,
+   void *buf, int len, unsigned int gup_flags);
 extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
unsigned long addr, void *buf, int len, unsigned int gup_flags);
+extern int __access_remote_vm_killable(struct task_struct *tsk,
+   struct mm_struct *mm, unsigned long addr, void *buf, int len,
+   unsigned int gup_flags);
 
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
diff --git a/mm/memory.c b/mm/memory.c
index dd8de96..8d7e223 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4415,18 +4415,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
 
-/*
- * Access another process' address space as given in mm.  If non-NULL, use the
- * given task for page fault accounting.
- */
-int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
+static int raw_access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
unsigned long addr, void *buf, int len, unsigned int gup_flags)
 {
struct vm_area_struct *vma;
void *old_buf = buf;
int write = gup_flags & FOLL_WRITE;
 
-   down_read(&mm->mmap_sem);
/* ignore errors, just check how much was successfully transferred */
while (len) {
int bytes, ret, offset;
@@ -4475,11 +4470,40 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
buf += bytes;
addr += bytes;
}
-   up_read(&mm->mmap_sem);
 
return buf - old_buf;
 }
 
+/*
+ * Access another process' address space as given in mm.  If non-NULL, use the
+ * given task for page fault accounting.
+ */
+int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
+   unsigned long addr, void *buf, int len, unsigned int gup_flags)
+{
+   int ret;
+
+   down_read(&mm->mmap_sem);
+   ret = raw_access_remote_vm(tsk, mm, addr, buf, len, gup_flags);
+   up_read(&mm->mmap_sem);
+   return ret;
+}
+
+int __access_remote_vm_killable(struct task_struct *tsk, struct mm_struct *mm,
+   unsigned long addr, void *buf, int len, unsigned int gup_flags)
+{
+   int ret;
+
+   ret = down_read_killable(&mm->mmap_sem);
+   if (ret)
+   goto out;
+
+   ret = raw_access_remote_vm(tsk, mm, addr, buf, len, gup_flags);
+   up_read(&mm->mmap_sem);
+out:
+   return ret;
+}
+
 /**
  * access_remote_vm - access another process' address space
  * @mm:the mm_struct of the target address space
@@ -4490,6 +4514,12 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
  *
  * The caller must hold a reference on @mm.
  */
+int access_remote_vm_killable(struct mm_struct *mm, unsigned long addr,
+   void *buf, int len, unsigned int gup_flags)
+{
+   return __access_remote_vm_killable(NULL, mm, addr, buf, len, gup_flags);
+}
+
 int access_remote_vm(struct mm_struct *mm, unsigned long addr,
void *buf, int len, unsigned int gup_flags)
 {
diff --git a/mm/nommu.c b/mm/nommu.c
index ebb6e61..ea043b3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1802,14 +1802,12 @@ void filemap_map_pages(struct vm_fault *vmf,
 }
 EXPORT_SYMBOL(filemap_map_pages);
 
-int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
+static int raw_access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
unsigned long addr, void *buf, int len, unsigned int gup_flags)
 {
struct vm_area_struct *vma;
int write = gup_flags & FOLL_WRITE;
 
-   down_read(&mm->mmap_sem);
-
/* the access must start within one of the target process's mappings */
vma = find_vma(mm, addr);

Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

2018-02-26 Thread Michael S. Tsirkin

On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-André Lureau wrote:
> fw_cfg_read_blob() may fail, but does not return an error. This may lead
> to undefined behaviour, such as a memcmp(sig, "QEMU") on uninitialized
> memory.

I don't think that's true: there's a memset there that
will initialize the memory. Probe is likely the only
case where it returns slightly incorrect data.

> Return an error if ACPI locking failed. Also, the following
> DMA read/write extension will add more error paths that should be
> handled appropriately.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  drivers/firmware/qemu_fw_cfg.c | 32 ++--
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index f6f90bef604c..5e6e5ac71dab 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
>  }
>  
>  /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> -static void fw_cfg_read_blob(u16 key,
> - void *buf, loff_t pos, size_t count)
> +static ssize_t fw_cfg_read_blob(u16 key,
> + void *buf, loff_t pos, size_t count)
>  {
>   u32 glk = -1U;
>   acpi_status status;
> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
>   /* Should never get here */
>   WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
>   memset(buf, 0, count);
> - return;
> + return -EINVAL;
>   }
>  
>   mutex_lock(&fw_cfg_dev_lock);

Wouldn't something like -EBUSY be more appropriate?

> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
>   mutex_unlock(&fw_cfg_dev_lock);
>  
>   acpi_release_global_lock(glk);
> + return count;
>  }
>  
>  /* clean up fw_cfg device i/o */
> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>   }
>  
>   /* verify fw_cfg device signature */
> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> + 0, FW_CFG_SIG_SIZE) < 0 ||
> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>   fw_cfg_io_cleanup();
>   return -ENODEV;
>   }
> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
>   if (count > entry->size - pos)
>   count = entry->size - pos;
>  
> - fw_cfg_read_blob(entry->select, buf, pos, count);
> - return count;
> + return fw_cfg_read_blob(entry->select, buf, pos, count);
>  }
>  
>  static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
>   struct fw_cfg_file *dir;
>   size_t dir_size;
>  
> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> + 0, sizeof(files_count));
> + if (ret < 0)
> + return ret;
> +
>   count = be32_to_cpu(files_count);
>   dir_size = count * sizeof(struct fw_cfg_file);
>  
> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
>   if (!dir)
>   return -ENOMEM;
>  
> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> + sizeof(files_count), dir_size);
> + if (ret < 0)
> + goto end;
>  
>   for (i = 0; i < count; i++) {
>   ret = fw_cfg_register_file(&dir[i]);
> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
>   break;
>   }
>  
> +end:
>   kfree(dir);
>   return ret;
>  }
> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
>   goto err_probe;
>  
>   /* get revision number, add matching top-level attribute */
> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> + if (err < 0)
> + goto err_probe;
> +
>   fw_cfg_rev = le32_to_cpu(rev);
>   err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
>   if (err)

I think that this is the only case where it's not doing the right thing right
now, in that it shows 0 as the revision to the users. Is it worth failing probe
here? We could just skip the attribute, could we not?

> -- 
> 2.16.1.73.g5832b7e9f2
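
A minimal userspace sketch of the calling convention this patch moves to, under stated assumptions: a hypothetical `read_blob()` stands in for `fw_cfg_read_blob()`, returning the byte count on success or a negative errno (`-EBUSY` here, as suggested in the review, purely for illustration), and callers check `< 0` before using the buffer. It also models the reviewer's alternative for the revision: a failed signature read fails probe, while a failed revision read only skips the attribute. `failing_key`, `probe()` and the fake device contents are illustrative, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical selector keys and device contents for this sketch. */
#define KEY_SIGNATURE 0x00
#define KEY_ID        0x01
#define SIG_SIZE      4

static int failing_key = -1;   /* key whose read fails, or -1 for none */
static const char device_sig[SIG_SIZE] = { 'Q', 'E', 'M', 'U' };
static const uint32_t device_rev = 3;

/*
 * Stand-in for fw_cfg_read_blob(): returns the number of bytes read, or a
 * negative errno. The buffer is zeroed first so it is never left undefined.
 */
static ssize_t read_blob(int key, void *buf, size_t count)
{
	assert(buf != NULL);
	memset(buf, 0, count);
	if (key == failing_key)
		return -EBUSY;          /* e.g. the firmware lock was not taken */

	if (key == KEY_SIGNATURE)
		memcpy(buf, device_sig, count < SIG_SIZE ? count : SIG_SIZE);
	else if (key == KEY_ID)
		memcpy(buf, &device_rev,
		       count < sizeof(device_rev) ? count : sizeof(device_rev));
	return (ssize_t)count;
}

/*
 * Probe sketch: a failed signature read is fatal (-ENODEV), while a failed
 * revision read only means the revision attribute is skipped.
 */
static int probe(bool *has_rev, uint32_t *rev)
{
	char sig[SIG_SIZE];

	if (read_blob(KEY_SIGNATURE, sig, sizeof(sig)) < 0 ||
	    memcmp(sig, "QEMU", SIG_SIZE) != 0)
		return -ENODEV;

	*has_rev = read_blob(KEY_ID, rev, sizeof(*rev)) >= 0;
	return 0;
}
```

With `failing_key = KEY_ID`, `probe()` still returns 0 but reports `has_rev == false`, which corresponds to skipping the attribute instead of failing probe.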

Re: [PATCH v5 0/4] ARM: OMAP2+: AM33XX/AM43XX: Add suspend-resume support

2018-02-26 Thread santosh.shilim...@oracle.com


On 2/26/18 1:26 PM, Tony Lindgren wrote:
> * Santosh Shilimkar  [180225 23:36]:
>>> Dave Gerlach (4):
>>>   ARM: OMAP2+: Introduce low-level suspend code for AM33XX
>>>   ARM: OMAP2+: Introduce low-level suspend code for AM43XX
>>>   ARM: OMAP2+: pm33xx-core: Add platform code needed for PM
>>>   soc: ti: Add pm33xx driver for basic suspend support
>>
>> Are you going to pickup this series ?
>
> Yes planning to apply it today or tomorrow.

Sounds good.

Acked-by: Santosh Shilimkar

Re: [PATCH v2 char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling

2018-02-26 Thread Stephen Hemminger

On Fri, 16 Feb 2018 23:05:33 +
Michael Kelley  wrote:

> Fix bugs in signaling the Hyper-V host when freeing space in the
> host->guest ring buffer:
> 
> 1. The interrupt_mask must not be used to determine whether to signal
>on the host->guest ring buffer
> 2. The ring buffer write_index must be read (via hv_get_bytes_to_write)
>*after* pending_send_sz is read in order to avoid a race condition
> 3. Comparisons with pending_send_sz must treat the "equals" case as
>not-enough-space
> 4. Don't signal if the pending_send_sz feature is not present. Older
>versions of Hyper-V that don't implement this feature will poll.
> 
> Fixes: 03bad714a161 ("vmbus: more host signalling avoidance")
> Signed-off-by: Michael Kelley 

Signed-off-by: Stephen Hemminger
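
The four numbered rules above can be condensed into one decision function. This is a hedged userspace sketch, not the vmbus implementation: `should_signal_host()` and its parameters are hypothetical. It assumes the caller read `pending_send_sz` before computing `curr_write_sz` (rule 2) and that `bytes_read` is the space this reader just freed. The final "signal only on the transition from not-enough to enough space" check is an assumption based on the signalling-avoidance commit named in the Fixes: tag, not something the four rules state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a reader that just freed `bytes_read` bytes must signal
 * the host. The interrupt mask is deliberately not an input (rule 1).
 */
static bool should_signal_host(bool feat_pending_send_sz,
			       uint32_t pending_send_sz,
			       uint32_t curr_write_sz,
			       uint32_t bytes_read)
{
	/* Rule 4: hosts without the feature poll, so never signal them. */
	if (!feat_pending_send_sz)
		return false;

	/* The host is not waiting for space at all. */
	if (pending_send_sz == 0)
		return false;

	assert(bytes_read <= curr_write_sz);

	/* Rule 3: "equal" counts as not enough space, so use > for "enough". */
	bool enough_before = curr_write_sz - bytes_read > pending_send_sz;
	bool enough_now = curr_write_sz > pending_send_sz;

	/* Assumed: signal only when this read crossed the threshold. */
	return !enough_before && enough_now;
}
```

For example, with `pending_send_sz = 100`, freeing 50 bytes to reach exactly 100 writable bytes does not signal, while reaching 101 does.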

Re: [PATCH 4.4 00/22] 4.4.119-stable review

2018-02-26 Thread Nathan Chancellor

On Mon, Feb 26, 2018 at 09:16:00PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.119 release.
> There are 22 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Feb 28 20:15:48 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.119-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
>

Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5.

No immediate issues noticed in dmesg or general usage.

Thanks!
Nathan

Re: [PATCH v15 02/11] fw_cfg: add a public uapi header

2018-02-26 Thread Michael S. Tsirkin

On Thu, Feb 15, 2018 at 10:33:03PM +0100, Marc-André Lureau wrote:
> Create a common header file for well-known values and structures to be
> shared by the Linux kernel with qemu or other projects.
> 
> It is based on qemu/docs/specs/fw_cfg.txt, which references
> qemu/include/hw/nvram/fw_cfg_keys.h "for the most up-to-date and
> authoritative list", and on vmcoreinfo.txt. Those files don't have an
> explicit license, but qemu/hw/nvram/fw_cfg.c is BSD-licensed, so
> Michael S. Tsirkin suggested using the same license.
> 
> The patch intentionally leaves out the DMA and vmcoreinfo structures and
> defines, which are added in the commits that use them.
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Marc-André Lureau 
> 
> ---
> 
> The related qemu patch making use of it, to be submitted:
> https://github.com/elmarco/qemu/commit/4884fc9e9c4c4467a371e5a40f3181239e1b70f5
> ---
>  MAINTAINERS|  1 +
>  drivers/firmware/qemu_fw_cfg.c | 22 ++
>  include/uapi/linux/fw_cfg.h| 66 ++
>  3 files changed, 69 insertions(+), 20 deletions(-)
>  create mode 100644 include/uapi/linux/fw_cfg.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3bdc260e36b7..a66b65f62811 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11352,6 +11352,7 @@ M:"Michael S. Tsirkin" 
>  L:   qemu-de...@nongnu.org
>  S:   Maintained
>  F:   drivers/firmware/qemu_fw_cfg.c
> +F:   include/uapi/linux/fw_cfg.h
>  
>  QIB DRIVER
>  M:   Dennis Dalessandro 

Why fw_cfg.h and not qemu_fw_cfg.h? fw_cfg.h seems too generic.

> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index a41b572eeeb1..42601a3eaed5 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -32,30 +32,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  MODULE_AUTHOR("Gabriel L. Somlo ");
>  MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
>  MODULE_LICENSE("GPL");
>  
> -/* selector key values for "well-known" fw_cfg entries */
> -#define FW_CFG_SIGNATURE  0x00
> -#define FW_CFG_ID 0x01
> -#define FW_CFG_FILE_DIR   0x19
> -
> -/* size in bytes of fw_cfg signature */
> -#define FW_CFG_SIG_SIZE 4
> -
> -/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> -#define FW_CFG_MAX_FILE_PATH 56
> -
> -/* fw_cfg file directory entry type */
> -struct fw_cfg_file {
> - u32 size;
> - u16 select;
> - u16 reserved;
> - char name[FW_CFG_MAX_FILE_PATH];
> -};
> -
>  /* fw_cfg device i/o register addresses */
>  static bool fw_cfg_is_mmio;
>  static phys_addr_t fw_cfg_p_base;
> @@ -597,7 +579,7 @@ MODULE_DEVICE_TABLE(of, fw_cfg_sysfs_mmio_match);
>  
>  #ifdef CONFIG_ACPI
>  static const struct acpi_device_id fw_cfg_sysfs_acpi_match[] = {
> - { "QEMU0002", },
> + { FW_CFG_ACPI_DEVICE_ID, },
>   {},
>  };
>  MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_acpi_match);
> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
> new file mode 100644
> index ..c698ac3812f6
> --- /dev/null
> +++ b/include/uapi/linux/fw_cfg.h
> @@ -0,0 +1,66 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +#ifndef _LINUX_FW_CFG_H
> +#define _LINUX_FW_CFG_H
> +
> +#include 
> +
> +#define FW_CFG_ACPI_DEVICE_ID "QEMU0002"
> +
> +/* selector key values for "well-known" fw_cfg entries */
> +#define FW_CFG_SIGNATURE      0x00
> +#define FW_CFG_ID             0x01
> +#define FW_CFG_UUID           0x02
> +#define FW_CFG_RAM_SIZE       0x03
> +#define FW_CFG_NOGRAPHIC      0x04
> +#define FW_CFG_NB_CPUS        0x05
> +#define FW_CFG_MACHINE_ID     0x06
> +#define FW_CFG_KERNEL_ADDR    0x07
> +#define FW_CFG_KERNEL_SIZE    0x08
> +#define FW_CFG_KERNEL_CMDLINE 0x09
> +#define FW_CFG_INITRD_ADDR    0x0a
> +#define FW_CFG_INITRD_SIZE    0x0b
> +#define FW_CFG_BOOT_DEVICE    0x0c
> +#define FW_CFG_NUMA           0x0d
> +#define FW_CFG_BOOT_MENU      0x0e
> +#define FW_CFG_MAX_CPUS       0x0f
> +#define FW_CFG_KERNEL_ENTRY   0x10
> +#define FW_CFG_KERNEL_DATA    0x11
> +#define FW_CFG_INITRD_DATA    0x12
> +#define FW_CFG_CMDLINE_ADDR   0x13
> +#define FW_CFG_CMDLINE_SIZE   0x14
> +#define FW_CFG_CMDLINE_DATA   0x15
> +#define FW_CFG_SETUP_ADDR     0x16
> +#define FW_CFG_SETUP_SIZE     0x17
> +#define FW_CFG_SETUP_DATA     0x18
> +#define FW_CFG_FILE_DIR       0x19
> +
> +#define FW_CFG_FILE_FIRST     0x20
> +#define FW_CFG_FILE_SLOTS_MIN 0x10
> +
> +#define FW_CFG_WRITE_CHANNEL  0x4000
> +#define FW_CFG_ARCH_LOCAL     0x8000
> +#define FW_CFG_ENTRY_MASK     (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
> +
> +#define FW_CFG_INVALID   0x
> +
> +/* width in bytes of fw_cfg control register */
> +#define FW_CFG_CTL_SIZE  0x02
> +
> +/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> +#define FW_CFG_MAX_FILE_PATH 56
> +
> +/* size in bytes of fw_cfg signature */
> +#define FW_CFG_SIG_SIZE 4
> +
> +/* FW_CFG_ID bits */
>
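
To illustrate how the header's constants fit together, here is a hedged userspace sketch that decodes a selector key using the write-channel and arch-local bits, and parses a FW_CFG_FILE_DIR blob. It assumes the directory count is big-endian (as the driver's `be32_to_cpu(files_count)` in fw_cfg_register_dir_entries() implies), that each entry's `size` and `select` are big-endian too, and that entries use the 64-byte `struct fw_cfg_file` layout removed from the driver above. The constant values are copied from the header; `struct file_entry`, `key_entry()`, `key_is_file()` and `parse_file_dir()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the proposed include/uapi/linux/fw_cfg.h. */
#define FW_CFG_FILE_DIR       0x19
#define FW_CFG_FILE_FIRST     0x20
#define FW_CFG_WRITE_CHANNEL  0x4000
#define FW_CFG_ARCH_LOCAL     0x8000
#define FW_CFG_ENTRY_MASK     (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
#define FW_CFG_MAX_FILE_PATH  56

/* u32 size + u16 select + u16 reserved + name = 64 bytes per entry. */
#define DIR_ENTRY_SIZE (4 + 2 + 2 + FW_CFG_MAX_FILE_PATH)

/* Hypothetical host-order view of one directory entry. */
struct file_entry {
	uint32_t size;
	uint16_t select;
	char name[FW_CFG_MAX_FILE_PATH];
};

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Strip the write-channel and arch-local bits to get the entry index. */
static uint16_t key_entry(uint16_t key)
{
	return (uint16_t)(key & FW_CFG_ENTRY_MASK);
}

/* Named "files" live at or above FW_CFG_FILE_FIRST. */
static bool key_is_file(uint16_t key)
{
	return key_entry(key) >= FW_CFG_FILE_FIRST;
}

/*
 * Parse a directory blob: a be32 count followed by `count` entries.
 * Returns the number of entries stored in `out` (at most `max`), or -1
 * if the blob is shorter than the count claims.
 */
static long parse_file_dir(const uint8_t *blob, size_t len,
			   struct file_entry *out, size_t max)
{
	assert(blob != NULL && (out != NULL || max == 0));
	if (len < 4)
		return -1;

	uint32_t count = get_be32(blob);
	if (count > (len - 4) / DIR_ENTRY_SIZE)
		return -1;

	size_t n = count < max ? count : max;
	for (size_t i = 0; i < n; i++) {
		const uint8_t *e = blob + 4 + i * DIR_ENTRY_SIZE;

		out[i].size = get_be32(e);
		out[i].select = get_be16(e + 4);
		/* e + 6 is the reserved field; the name follows it. */
		memcpy(out[i].name, e + 8, FW_CFG_MAX_FILE_PATH);
		out[i].name[FW_CFG_MAX_FILE_PATH - 1] = '\0';
	}
	return (long)n;
}
```

For example, `key_entry(0x8019)` yields `FW_CFG_FILE_DIR`, and `key_is_file(0x4020)` is true because the entry index 0x20 is at `FW_CFG_FILE_FIRST`.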

[PATCH 1/3] console: SisUSB2VGA: Drop dummy con_font_get()

2018-02-26 Thread Kees Cook

As done in commit:

  724ba8b30b04 ("console/dummy: leave .con_font_get set to NULL")

This drops the dummy .con_font_get(), as it could leave arguments
uninitialized.

Cc: Thomas Winischhofer 
Signed-off-by: Kees Cook 
---
 drivers/usb/misc/sisusbvga/sisusb_con.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/misc/sisusbvga/sisusb_con.c b/drivers/usb/misc/sisusbvga/sisusb_con.c
index 73f7bde78e11..998df891bdde 100644
--- a/drivers/usb/misc/sisusbvga/sisusb_con.c
+++ b/drivers/usb/misc/sisusbvga/sisusb_con.c
@@ -1358,7 +1358,6 @@ static const struct consw sisusb_dummy_con = {
.con_switch =   SISUSBCONDUMMY,
.con_blank =SISUSBCONDUMMY,
.con_font_set = SISUSBCONDUMMY,
-   .con_font_get = SISUSBCONDUMMY,
.con_font_default = SISUSBCONDUMMY,
.con_font_copy =SISUSBCONDUMMY,
 };
-- 
2.7.4
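
Why a NULL hook is safer than a do-nothing dummy can be sketched in a few lines, assuming (as a simplification) that the caller treats a NULL font-get hook as "unsupported" and returns an error before reading the font. A dummy that returns 0 without writing anything would instead let the caller consume uninitialized fields. The types, `get_font()` and the `-ENOSYS` choice below are hypothetical stand-ins for `struct consw`, `struct console_font` and the VT font code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the console types. */
struct font_desc {
	unsigned int width, height;
};

struct ops {
	int (*font_get)(struct font_desc *font);   /* optional: may be NULL */
};

/* A real implementation fills in every field it reports. */
static int real_font_get(struct font_desc *font)
{
	font->width = 8;
	font->height = 16;
	return 0;
}

/*
 * Caller side: a missing hook is reported as an error, so `font` is only
 * read after a hook actually claimed to have filled it.
 */
static int get_font(const struct ops *ops, struct font_desc *font)
{
	assert(ops != NULL && font != NULL);
	if (!ops->font_get)
		return -ENOSYS;
	return ops->font_get(font);
}
```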

[PATCH 2/3] console: Fill in struct consw argument names

2018-02-26 Thread Kees Cook

The function declarations for the console callbacks give no hints as to
what the arguments are. Instead of digging around in various
implementations, each of which may only have a subset of the callbacks,
name all the arguments in the declaration. There is no functional change.

Signed-off-by: Kees Cook 
---
 include/linux/console.h | 58 +++--
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..dfd6b0e97855 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -46,46 +46,52 @@ enum con_scroll {
 struct consw {
struct module *owner;
const char *(*con_startup)(void);
-   void(*con_init)(struct vc_data *, int);
-   void(*con_deinit)(struct vc_data *);
-   void(*con_clear)(struct vc_data *, int, int, int, int);
-   void(*con_putc)(struct vc_data *, int, int, int);
-   void(*con_putcs)(struct vc_data *, const unsigned short *, int, int, int);
-   void(*con_cursor)(struct vc_data *, int);
-   bool(*con_scroll)(struct vc_data *, unsigned int top,
+   void(*con_init)(struct vc_data *vc, int init);
+   void(*con_deinit)(struct vc_data *vc);
+   void(*con_clear)(struct vc_data *vc, int sy, int sx, int height,
+   int width);
+   void(*con_putc)(struct vc_data *vc, int c, int ypos, int xpos);
+   void(*con_putcs)(struct vc_data *vc, const unsigned short *s,
+   int count, int ypos, int xpos);
+   void(*con_cursor)(struct vc_data *vc, int mode);
+   bool(*con_scroll)(struct vc_data *vc, unsigned int top,
unsigned int bottom, enum con_scroll dir,
unsigned int lines);
-   int (*con_switch)(struct vc_data *);
-   int (*con_blank)(struct vc_data *, int, int);
-   int (*con_font_set)(struct vc_data *, struct console_font *, unsigned);
-   int (*con_font_get)(struct vc_data *, struct console_font *);
-   int (*con_font_default)(struct vc_data *, struct console_font *, char *);
-   int (*con_font_copy)(struct vc_data *, int);
-   int (*con_resize)(struct vc_data *, unsigned int, unsigned int,
-  unsigned int);
-   void(*con_set_palette)(struct vc_data *,
+   int (*con_switch)(struct vc_data *vc);
+   int (*con_blank)(struct vc_data *vc, int blank, int mode_switch);
+   int (*con_font_set)(struct vc_data *vc, struct console_font *font,
+   unsigned int flags);
+   int (*con_font_get)(struct vc_data *vc, struct console_font *font);
+   int (*con_font_default)(struct vc_data *vc,
+   struct console_font *font, char *name);
+   int (*con_font_copy)(struct vc_data *vc, int con);
+   int (*con_resize)(struct vc_data *vc, unsigned int width,
+   unsigned int height, unsigned int user);
+   void(*con_set_palette)(struct vc_data *vc,
const unsigned char *table);
-   void(*con_scrolldelta)(struct vc_data *, int lines);
-   int (*con_set_origin)(struct vc_data *);
-   void(*con_save_screen)(struct vc_data *);
-   u8  (*con_build_attr)(struct vc_data *, u8, u8, u8, u8, u8, u8);
-   void(*con_invert_region)(struct vc_data *, u16 *, int);
-   u16*(*con_screen_pos)(struct vc_data *, int);
-   unsigned long (*con_getxy)(struct vc_data *, unsigned long, int *, int *);
+   void(*con_scrolldelta)(struct vc_data *vc, int lines);
+   int (*con_set_origin)(struct vc_data *vc);
+   void(*con_save_screen)(struct vc_data *vc);
+   u8  (*con_build_attr)(struct vc_data *vc, u8 color, u8 intensity,
+   u8 blink, u8 underline, u8 reverse, u8 italic);
+   void(*con_invert_region)(struct vc_data *vc, u16 *p, int count);
+   u16*(*con_screen_pos)(struct vc_data *vc, int offset);
+   unsigned long (*con_getxy)(struct vc_data *vc, unsigned long position,
+   int *px, int *py);
/*
 * Flush the video console driver's scrollback buffer
 */
-   void(*con_flush_scrollback)(struct vc_data *);
+   void(*con_flush_scrollback)(struct vc_data *vc);
/*
 * Prepare the console for the debugger.  This includes, but is not
 * limited to, unblanking the console, loading an appropriate
 * palette, and allowing debugger generated output.
 */
-   int (*con_debug_enter)(struct vc_data *);
+   int (*con_debug_enter)(struct vc_data *vc);
/*
 * Restore the console to its pre-debug state as closely as possible.
 */
-   int (*con_debug_leave)(struct vc_data *);
+   int (*con_debug_leave)(struct

[PATCH 0/3] console: Expand dummy functions for CFI

2018-02-26 Thread Kees Cook

This is a small series that cleans up struct consw a bit and
prepares it for Control Flow Integrity checking (i.e. Clang's
-fsanitize=cfi).

Thanks!

-Kees

Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

2018-02-26 Thread Michael S. Tsirkin

On Thu, Feb 15, 2018 at 10:33:12PM +0100, Marc-André Lureau wrote:
> Modify fw_cfg_read_blob() to use DMA if the device supports it.
> Return errors, because the operation may fail.
> 
> So far, only one call in fw_cfg_register_dir_entries() is using
> kmalloc'ed buf and is thus clearly eligible for DMA read.
> 
> Initially, I didn't implement DMA read to speed up boot time, but as a
> first step before introducing DMA write (since read operations were
> already present). Even more, I didn't realize fw-cfg entries were
> being read by the kernel during boot by default. But actually fw-cfg
> entries are being populated during module probe. I knew DMA improved
> BIOS boot time a lot (the main reason the DMA interface was added
> afaik). Let's see the time it would take to read the whole ACPI
> tables (128kb allocated)
> 
>  # time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
>   - with DMA: sys 0m0.003s
>   - without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s
> 
> FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
> boot to populate sysfs qemu_fw_cfg directory, and it is quite
> small (1-2kb). Since it is not exposed itself, in order to measure
> the time it takes to read such a small file, I took a comparably sized
> file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
> modified read_raw enabling DMA)
> 
>  # perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
>   - with DMA:
>   0.636037  task-clock (msec) #0.141 CPUs utilized ( +-  1.19% )
>   - without DMA:
>   6.430128  task-clock (msec) #0.622 CPUs utilized ( +-  0.22% )
> 
> That's a few msec saved during boot by enabling DMA read (the gain
> would be more substantial if other & bigger fw-cfg entries are read by
> others from sysfs, unfortunately, it's not clear if we can always
> enable DMA there)
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  drivers/firmware/qemu_fw_cfg.c | 61 
> ++
>  1 file changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index 3015e77aebca..94df57e9be66 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
>   return ret;
>  }
>  
> +/* with acpi & dev locks taken */
> +static ssize_t fw_cfg_read_blob_dma(u16 key,
> + void *buf, loff_t pos, size_t count)
> +{
> + ssize_t ret;
> +
> + if (pos == 0) {
> + ret = fw_cfg_dma_transfer(buf, count, key << 16
> + | FW_CFG_DMA_CTL_SELECT
> + | FW_CFG_DMA_CTL_READ);
> + } else {
> + fw_cfg_sel_endianness(key);
> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> + if (ret < 0)
> + return ret;
> + ret = fw_cfg_dma_transfer(buf, count,
> + FW_CFG_DMA_CTL_READ);
> + }
> +
> + return ret;
> +}
> +
> +/* with acpi & dev locks taken */
> +static ssize_t fw_cfg_read_blob_io(u16 key,
> + void *buf, loff_t pos, size_t count)
> +{
> + fw_cfg_sel_endianness(key);
> + while (pos-- > 0)
> + ioread8(fw_cfg_reg_data);
> + ioread8_rep(fw_cfg_reg_data, buf, count);
> + return count;
> +}
> +
>  /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>  static ssize_t fw_cfg_read_blob(u16 key,
> - void *buf, loff_t pos, size_t count)
> + void *buf, loff_t pos, size_t count,
> + bool dma)
>  {
>   u32 glk = -1U;
>   acpi_status status;
> + ssize_t ret;
>  
>   /* If we have ACPI, ensure mutual exclusion against any potential
>* device access by the firmware, e.g. via AML methods:

so this adds a dma flag to fw_cfg_read_blob.
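For reference, a sketch of the control words the quoted fw_cfg_read_blob_dma() issues. The bit values are my reading of QEMU's fw_cfg DMA spec (bit 1 read, bit 2 skip, bit 3 select, selector key in the upper 16 bits) and should be checked against the tree; dma_read_controls() is a hypothetical helper, not part of the driver:

```c
#include <stdint.h>

/* fw_cfg DMA control bits, assumed from QEMU's fw_cfg spec. */
#define FW_CFG_DMA_CTL_READ   0x02u
#define FW_CFG_DMA_CTL_SKIP   0x04u
#define FW_CFG_DMA_CTL_SELECT 0x08u

/*
 * Fill ctl[] with the control words the quoted DMA path uses, in order,
 * and return how many DMA transfers that is.
 *  - pos == 0: one transfer that selects `key` and reads `count` bytes.
 *  - pos  > 0: the key is selected through the I/O port first
 *    (fw_cfg_sel_endianness), then a SKIP of `pos` bytes and a READ.
 * The key is cast before shifting so keys >= 0x8000 do not land in the
 * sign bit of the int a bare 16-bit value would be promoted to.
 */
static int dma_read_controls(uint16_t key, long long pos, uint32_t ctl[2])
{
	if (pos == 0) {
		ctl[0] = ((uint32_t)key << 16) | FW_CFG_DMA_CTL_SELECT |
			 FW_CFG_DMA_CTL_READ;
		return 1;
	}
	ctl[0] = FW_CFG_DMA_CTL_SKIP;	/* length field carries pos */
	ctl[1] = FW_CFG_DMA_CTL_READ;	/* length field carries count */
	return 2;
}
```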



> @@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
>   }
>  
>   mutex_lock(&fw_cfg_dev_lock);
> - fw_cfg_sel_endianness(key);
> - while (pos-- > 0)
> - ioread8(fw_cfg_reg_data);
> - ioread8_rep(fw_cfg_reg_data, buf, count);
> + if (dma && fw_cfg_dma_enabled()) {
> + ret = fw_cfg_read_blob_dma(key, buf, pos, count);
> + } else {
> + ret = fw_cfg_read_blob_io(key, buf, pos, count);
> + }
> +
>   mutex_unlock(&fw_cfg_dev_lock);
>  
>   acpi_release_global_lock(glk);
> - return count;
> +
> + return ret;
>  }
>  
>  #ifdef CONFIG_CRASH_CORE

If set to false it does I/O; if set to true it does DMA.

I would prefer passing an accessor function pointer
since that's clearer than true/false.
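A rough userspace sketch of the accessor-pointer variant. read_blob(), read_blob_io(), read_blob_dma() and fake_dma_enabled are hypothetical stand-ins for the driver functions, locking is left out, and the DMA accessor still falls back to port I/O when the device lacks DMA, as the patch does:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Backend signature; mirrors fw_cfg_read_blob_dma()/_io() from the
 * patch, with userspace types in place of u16/loff_t. */
typedef ssize_t (*fw_cfg_read_fn)(uint16_t key, void *buf, off_t pos,
				  size_t count);

/* Hypothetical stand-in for fw_cfg_dma_enabled(). */
static bool fake_dma_enabled;

/* Hypothetical backends: each fills the buffer with a marker byte so a
 * caller can see which path ran. */
static ssize_t read_blob_io(uint16_t key, void *buf, off_t pos, size_t count)
{
	(void)key;
	(void)pos;
	memset(buf, 'I', count);
	return (ssize_t)count;
}

static ssize_t read_blob_dma(uint16_t key, void *buf, off_t pos, size_t count)
{
	(void)key;
	(void)pos;
	memset(buf, 'D', count);
	return (ssize_t)count;
}

/* Call sites name the accessor instead of passing true/false. */
static ssize_t read_blob(uint16_t key, void *buf, off_t pos, size_t count,
			 fw_cfg_read_fn accessor)
{
	/* Keep the patch's fallback: no DMA interface means port I/O. */
	if (accessor == read_blob_dma && !fake_dma_enabled)
		accessor = read_blob_io;
	return accessor(key, buf, pos, count);
}
```

A call site then reads as read_blob(key, buf, 0, len, read_blob_dma), which says what it wants without a bare boolean.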


> @@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)

[PATCH 3/3] console: Expand dummy functions for CFI

2018-02-26 Thread Kees Cook

This expands the no-op dummy functions into full prototypes to avoid
indirect call mismatches when running under Control Flow Integrity
checking, like with Clang's -fsanitize=cfi.
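A minimal userspace sketch of the before/after pattern, assuming cut-down stand-ins for struct vc_data and struct consw (consw_min and the dummy_* names are hypothetical, and dir is a plain int instead of enum con_scroll). Every slot gets a no-op whose prototype matches the slot exactly, so the pointer type at the indirect call site and the callee's type agree:

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-ins for struct vc_data / struct consw. */
struct vc_data {
	int cols, rows;
};

struct consw_min {
	void (*con_deinit)(struct vc_data *vc);
	bool (*con_scroll)(struct vc_data *vc, unsigned int top,
			   unsigned int bottom, int dir, unsigned int lines);
	int  (*con_blank)(struct vc_data *vc, int blank, int mode_switch);
};

/*
 * Before: a single `int dummy(void)` was cast to (void *) and stored in
 * every slot. Calling it through, say, con_blank's pointer type is
 * undefined behaviour in C, and Clang CFI traps on it because the
 * callee's type does not match the pointer type used for the call.
 *
 * After: one no-op per slot, each with exactly that slot's prototype.
 */
static void dummy_deinit(struct vc_data *vc)
{
	(void)vc;
}

static bool dummy_scroll(struct vc_data *vc, unsigned int top,
			 unsigned int bottom, int dir, unsigned int lines)
{
	(void)vc; (void)top; (void)bottom; (void)dir; (void)lines;
	return false;	/* nothing scrolled */
}

static int dummy_blank(struct vc_data *vc, int blank, int mode_switch)
{
	(void)vc; (void)blank; (void)mode_switch;
	return 0;
}

static const struct consw_min dummy_con = {
	.con_deinit = dummy_deinit,
	.con_scroll = dummy_scroll,
	.con_blank  = dummy_blank,
};
```

The extra boilerplate buys type checking at initialisation time too: assigning a stub with the wrong prototype to a slot now draws a compiler diagnostic instead of being hidden by the cast.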

Co-Developed-by: Sami Tolvanen 
Signed-off-by: Sami Tolvanen 
Signed-off-by: Kees Cook 
---
 drivers/usb/misc/sisusbvga/sisusb_con.c | 67 +---
 drivers/video/console/dummycon.c| 69 +
 drivers/video/console/newport_con.c | 10 ++---
 drivers/video/console/vgacon.c  | 20 +-
 drivers/video/fbdev/core/fbcon.c|  3 +-
 5 files changed, 121 insertions(+), 48 deletions(-)

diff --git a/drivers/usb/misc/sisusbvga/sisusb_con.c b/drivers/usb/misc/sisusbvga/sisusb_con.c
index 998df891bdde..a0d6e0af957c 100644
--- a/drivers/usb/misc/sisusbvga/sisusb_con.c
+++ b/drivers/usb/misc/sisusbvga/sisusb_con.c
@@ -1217,7 +1217,7 @@ sisusbcon_do_font_op(struct sisusb_usb_data *sisusb, int set, int slot,
 /* Interface routine */
 static int
 sisusbcon_font_set(struct vc_data *c, struct console_font *font,
-   unsigned flags)
+  unsigned int flags)
 {
struct sisusb_usb_data *sisusb;
unsigned charcount = font->charcount;
@@ -1338,28 +1338,65 @@ static void sisusbdummycon_init(struct vc_data *vc, int init)
vc_resize(vc, 80, 25);
 }
 
-static int sisusbdummycon_dummy(void)
+static void sisusbdummycon_deinit(struct vc_data *vc) { }
+static void sisusbdummycon_clear(struct vc_data *vc, int sy, int sx,
+int height, int width) { }
+static void sisusbdummycon_putc(struct vc_data *vc, int c, int ypos,
+   int xpos) { }
+static void sisusbdummycon_putcs(struct vc_data *vc, const unsigned short *s,
+int count, int ypos, int xpos) { }
+static void sisusbdummycon_cursor(struct vc_data *vc, int mode) { }
+
+static bool sisusbdummycon_scroll(struct vc_data *vc, unsigned int top,
+ unsigned int bottom, enum con_scroll dir,
+ unsigned int lines)
 {
-return 0;
+   return false;
 }
 
-#define SISUSBCONDUMMY (void *)sisusbdummycon_dummy
+static int sisusbdummycon_switch(struct vc_data *vc)
+{
+   return 0;
+}
+
+static int sisusbdummycon_blank(struct vc_data *vc, int blank, int mode_switch)
+{
+   return 0;
+}
+
+static int sisusbdummycon_font_set(struct vc_data *vc,
+  struct console_font *font,
+  unsigned int flags)
+{
+   return 0;
+}
+
+static int sisusbdummycon_font_default(struct vc_data *vc,
+  struct console_font *font, char *name)
+{
+   return 0;
+}
+
+static int sisusbdummycon_font_copy(struct vc_data *vc, int con)
+{
+   return 0;
+}
 
 static const struct consw sisusb_dummy_con = {
.owner =THIS_MODULE,
.con_startup =  sisusbdummycon_startup,
.con_init = sisusbdummycon_init,
-   .con_deinit =   SISUSBCONDUMMY,
-   .con_clear =SISUSBCONDUMMY,
-   .con_putc = SISUSBCONDUMMY,
-   .con_putcs =SISUSBCONDUMMY,
-   .con_cursor =   SISUSBCONDUMMY,
-   .con_scroll =   SISUSBCONDUMMY,
-   .con_switch =   SISUSBCONDUMMY,
-   .con_blank =SISUSBCONDUMMY,
-   .con_font_set = SISUSBCONDUMMY,
-   .con_font_default = SISUSBCONDUMMY,
-   .con_font_copy =SISUSBCONDUMMY,
+   .con_deinit =   sisusbdummycon_deinit,
+   .con_clear =sisusbdummycon_clear,
+   .con_putc = sisusbdummycon_putc,
+   .con_putcs =sisusbdummycon_putcs,
+   .con_cursor =   sisusbdummycon_cursor,
+   .con_scroll =   sisusbdummycon_scroll,
+   .con_switch =   sisusbdummycon_switch,
+   .con_blank =sisusbdummycon_blank,
+   .con_font_set = sisusbdummycon_font_set,
+   .con_font_default = sisusbdummycon_font_default,
+   .con_font_copy =sisusbdummycon_font_copy,
 };
 
 int
diff --git a/drivers/video/console/dummycon.c b/drivers/video/console/dummycon.c
index b90ef96e43d6..f2eafe2ed980 100644
--- a/drivers/video/console/dummycon.c
+++ b/drivers/video/console/dummycon.c
@@ -41,12 +41,47 @@ static void dummycon_init(struct vc_data *vc, int init)
vc_resize(vc, DUMMY_COLUMNS, DUMMY_ROWS);
 }
 
-static int dummycon_dummy(void)
+static void dummycon_deinit(struct vc_data *vc) { }
+static void dummycon_clear(struct vc_data *vc, int sy, int sx, int height,
+  int width) { }
+static void dummycon_putc(struct vc_data *vc, int c, int ypos, int xpos) { }
+static void dummycon_putcs(struct vc_data *vc, const unsigned short *s,
+  int count, int ypos, int xpos) { }

Re: [PATCH] mm: Provide consistent declaration for num_poisoned_pages

2018-02-26 Thread David Rientjes

On Mon, 26 Feb 2018, Guenter Roeck wrote:

> clang reports the following compile warning.
> 
> In file included from mm/vmscan.c:56:
> ./include/linux/swapops.h:327:22: warning:
>   section attribute is specified on redeclared variable [-Wsection]
> extern atomic_long_t num_poisoned_pages __read_mostly;
>  ^
> ./include/linux/mm.h:2585:22: note: previous declaration is here
> extern atomic_long_t num_poisoned_pages;
>  ^
> 
> Let's use __read_mostly everywhere.
> 
> Signed-off-by: Guenter Roeck 
> Cc: Matthias Kaehlcke 
> ---
>  include/linux/mm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ad06d42adb1a..bd4bd59f02c1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2582,7 +2582,7 @@ extern int get_hwpoison_page(struct page *page);
>  extern int sysctl_memory_failure_early_kill;
>  extern int sysctl_memory_failure_recovery;
>  extern void shake_page(struct page *p, int access);
> -extern atomic_long_t num_poisoned_pages;
> +extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(struct page *page, int flags);
>  
>  

No objection to the patch, of course, but I'm wondering if it's (1) the 
only such clang compile warning for mm/, and (2) if the re-declaration in 
mm.h could be avoided by including swapops.h?
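A minimal userspace sketch of the consistent-declaration pattern, assuming a hypothetical my_read_mostly macro in place of the kernel's __read_mostly (which, on architectures that define it, places the variable in a dedicated data section). As the quoted warning shows, clang complains when a redeclaration adds a section attribute that an earlier declaration lacked; declaring the variable once, with the attribute, and having every user see that single declaration avoids it:

```c
/* Hypothetical stand-in for __read_mostly; the section name is made up
 * for illustration. Non-ELF targets get no attribute at all. */
#if defined(__ELF__)
#define my_read_mostly __attribute__((__section__(".data.my_read_mostly")))
#else
#define my_read_mostly
#endif

/* "Header": the one declaration every user includes, carrying the same
 * section attribute as the definition below. */
extern long num_things my_read_mostly;

/* "Source file": the definition, with the matching attribute. */
long num_things my_read_mostly = 0;

static void note_thing(void)
{
	num_things++;
}
```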

[PATCH v7 2/7] fuse: Fail all requests with invalid uids or gids

2018-02-26 Thread Eric W. Biederman

Upon a cursory examination the uid and gid of a fuse request are
necessary for correct operation.  Failing a fuse request where those
values are not reliable seems a straightforward and reliable means of
ensuring that fuse requests with bad data are not sent or processed.

In most cases the vfs will avoid actions it suspects will cause
an inode write back of an inode with an invalid uid or gid.  But that does
not map precisely to what fuse is doing, so test for this and solve
this at the fuse level as well.

Performing this work in fuse_req_init_context is cheap as the code is
already performing the translation here and only needs to check the
result of the translation to see if things are not representable in
a form the fuse server can handle.
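A minimal sketch of why the switch away from the _munged helpers matters. In the kernel, from_kuid() returns (uid_t)-1 when an id has no mapping in the target namespace, while from_kuid_munged() substitutes overflowuid (65534 by default), so only the former lets the caller notice the failure. The single-extent struct id_map and the map_* and init_context helpers below are hypothetical, and one map serves both uids and gids to keep it short:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-extent id map, shaped like one line of
 * /proc/<pid>/uid_map: namespace ids [first, first + count) correspond
 * to kernel ids [lower_first, lower_first + count). */
struct id_map {
	uint32_t first, lower_first, count;
};

#define INVALID_ID  ((uint32_t)-1)
#define OVERFLOW_ID ((uint32_t)65534) /* default overflowuid/overflowgid */

/* Modelled on from_kuid(): kernel id -> namespace id, -1 if unmapped. */
static uint32_t map_up(const struct id_map *m, uint32_t kid)
{
	if (kid >= m->lower_first && kid - m->lower_first < m->count)
		return m->first + (kid - m->lower_first);
	return INVALID_ID;
}

/* Modelled on from_kuid_munged(): unmapped ids become overflowuid, so
 * the caller can no longer tell that the translation failed. */
static uint32_t map_up_munged(const struct id_map *m, uint32_t kid)
{
	uint32_t id = map_up(m, kid);

	return id == INVALID_ID ? OVERFLOW_ID : id;
}

/* Same shape as the patched fuse_req_init_context(): translate both
 * ids, then report whether the server can represent them. */
static bool init_context(const struct id_map *m, uint32_t fsuid,
			 uint32_t fsgid, uint32_t *uid, uint32_t *gid)
{
	*uid = map_up(m, fsuid);
	*gid = map_up(m, fsgid);
	return *uid != INVALID_ID && *gid != INVALID_ID;
}
```

A caller that gets false back fails the request, as the patch does with -EOVERFLOW; with the munged variant the request would instead go out carrying uid 65534.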

Signed-off-by: Eric W. Biederman 
---
 fs/fuse/dev.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0fb58f364fa6..2886a56d5f61 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -112,11 +112,20 @@ static void __fuse_put_request(struct fuse_req *req)
refcount_dec(&req->count);
 }
 
-static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
+static bool fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-   req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-   req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+   req->in.h.uid = from_kuid(&init_user_ns, current_fsuid());
+   req->in.h.gid = from_kgid(&init_user_ns, current_fsgid());
req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
+
+   return (req->in.h.uid != ((uid_t)-1)) && (req->in.h.gid != ((gid_t)-1));
+}
+
+static void fuse_req_init_context_nofail(struct fuse_req *req)
+{
+   req->in.h.uid = 0;
+   req->in.h.gid = 0;
+   req->in.h.pid = 0;
 }
 
 void fuse_set_initialized(struct fuse_conn *fc)
@@ -162,12 +171,13 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
wake_up(&fc->blocked_waitq);
goto out;
}
-
-   fuse_req_init_context(fc, req);
__set_bit(FR_WAITING, &req->flags);
if (for_background)
__set_bit(FR_BACKGROUND, &req->flags);
-
+   if (unlikely(!fuse_req_init_context(fc, req))) {
+   fuse_put_request(fc, req);
+   return ERR_PTR(-EOVERFLOW);
+   }
return req;
 
  out:
@@ -256,7 +266,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
if (!req)
req = get_reserved_req(fc, file);
 
-   fuse_req_init_context(fc, req);
+   fuse_req_init_context_nofail(req);
__set_bit(FR_WAITING, &req->flags);
__clear_bit(FR_BACKGROUND, &req->flags);
return req;
-- 
2.14.1

[PATCH v7 6/7] fuse: Support fuse filesystems outside of init_user_ns

2018-02-26 Thread Eric W. Biederman

In order to support mounts from namespaces other than init_user_ns,
fuse must translate uids and gids to/from the userns of the process
servicing requests on /dev/fuse. This patch does that, with a couple
of restrictions on the namespace:

 - The userns for the fuse connection is fixed to the namespace
   from which /dev/fuse is opened.

 - The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the need to
pass around userns references and by allowing fuse to rely on the
checks in setattr_prepare for ownership changes.  Either restriction
could be relaxed in the future if needed.
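A minimal sketch of the two translation directions, assuming a single-extent map for the namespace that opened /dev/fuse (struct id_map, struct conn and both helpers are hypothetical). Request headers are filled the way from_kuid(fc->user_ns, ...) does, and uids in replies are mapped back the way make_kuid(fc->user_ns, attr->uid) does in fuse_fillattr(); ids outside the map come out as (uid_t)-1 in both directions:

```c
#include <sys/types.h>

/* Hypothetical one-extent map for the userns that opened /dev/fuse:
 * namespace ids [first, first + count) <-> kernel ids
 * [lower_first, lower_first + count). */
struct id_map {
	uid_t first, lower_first, count;
};

#define INVALID_UID ((uid_t)-1)

/* Stand-in for fuse_conn: the userns is fixed when /dev/fuse is opened. */
struct conn {
	const struct id_map *user_ns;
};

/* Request direction, like from_kuid(fc->user_ns, current_fsuid()):
 * the caller's kernel uid is mapped up into the server's view. */
static uid_t request_uid_from_kernel(const struct conn *fc, uid_t kuid)
{
	const struct id_map *m = fc->user_ns;

	if (kuid >= m->lower_first && kuid - m->lower_first < m->count)
		return m->first + (kuid - m->lower_first);
	return INVALID_UID;
}

/* Reply direction, like make_kuid(fc->user_ns, attr->uid): a uid the
 * server reports is mapped down into a kernel-wide uid. */
static uid_t reply_uid_to_kernel(const struct conn *fc, uid_t ns_uid)
{
	const struct id_map *m = fc->user_ns;

	if (ns_uid >= m->first && ns_uid - m->first < m->count)
		return m->lower_first + (ns_uid - m->first);
	return INVALID_UID;
}
```

Because both directions go through the same fc->user_ns, a server that echoes back the uid it was sent gets the original kernel uid back.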

For cuse, the userns used is that of the opener of /dev/cuse.  Semantically the
cuse support does not appear safe for unprivileged users.  Practically
the permissions on /dev/cuse only make it accessible to the global root
user.  If something slips through the cracks in a user namespace the only
users who will be able to use the cuse device are those users mapped into
the user namespace.

Translation in the posix acl is updated to use the user namespace of
the filesystem.  Avoiding cases which might bypass this translation is
handled in a following change.

This change is strongly based on a similar change from Seth Forshee
and Dongsu Park.

Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Miklos Szeredi 
Cc: 
Cc: Dongsu Park 
Signed-off-by: Eric W. Biederman 
---
 fs/fuse/acl.c|  4 ++--
 fs/fuse/cuse.c   |  7 ++-
 fs/fuse/dev.c|  4 ++--
 fs/fuse/dir.c| 14 +++---
 fs/fuse/fuse_i.h |  6 +-
 fs/fuse/inode.c  | 31 +++
 6 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/fs/fuse/acl.c b/fs/fuse/acl.c
index 8fb2153dbf50..5a67c80e21d6 100644
--- a/fs/fuse/acl.c
+++ b/fs/fuse/acl.c
@@ -34,7 +34,7 @@ struct posix_acl *fuse_get_acl(struct inode *inode, int type)
return ERR_PTR(-ENOMEM);
size = fuse_getxattr(inode, name, value, PAGE_SIZE);
if (size > 0)
-   acl = posix_acl_from_xattr(&init_user_ns, value, size);
+   acl = posix_acl_from_xattr(fc->user_ns, value, size);
else if ((size == 0) || (size == -ENODATA) ||
 (size == -EOPNOTSUPP && fc->no_getxattr))
acl = NULL;
@@ -81,7 +81,7 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type)
if (!value)
return -ENOMEM;
 
-   ret = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+   ret = posix_acl_to_xattr(fc->user_ns, acl, value, size);
if (ret < 0) {
kfree(value);
return ret;
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index e9e97803442a..036ee477669e 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fuse_i.h"
 
@@ -498,7 +499,11 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
if (!cc)
return -ENOMEM;
 
-   fuse_conn_init(&cc->fc);
+   /*
+* Limit the cuse channel to requests that can
+* be represented in file->f_cred->user_ns.
+*/
+   fuse_conn_init(&cc->fc, file->f_cred->user_ns);
 
fud = fuse_dev_alloc(&cc->fc);
if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2886a56d5f61..fce7915aea13 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -114,8 +114,8 @@ static void __fuse_put_request(struct fuse_req *req)
 
 static bool fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-   req->in.h.uid = from_kuid(&init_user_ns, current_fsuid());
-   req->in.h.gid = from_kgid(&init_user_ns, current_fsgid());
+   req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
+   req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 
return (req->in.h.uid != ((uid_t)-1)) && (req->in.h.gid != ((gid_t)-1));
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index a44ca509db4f..79cca1687457 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -858,8 +858,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
stat->ino = attr->ino;
stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
stat->nlink = attr->nlink;
-   stat->uid = make_kuid(&init_user_ns, attr->uid);
-   stat->gid = make_kgid(&init_user_ns, attr->gid);
+   stat->uid = make_kuid(fc->user_ns, attr->uid);
+   stat->gid = make_kgid(fc->user_ns, attr->gid);
stat->rdev = inode->i_rdev;
stat->atime.tv_sec = attr->atime;
stat->atime.tv_nsec = attr->atimensec;
@@ -1475,17 +1475,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-

[PATCH v7 6/7] fuse: Support fuse filesystems outside of init_user_ns

2018-02-26 Thread Eric W. Biederman

In order to support mounts from namespaces other than init_user_ns,
fuse must translate uids and gids to/from the userns of the process
servicing requests on /dev/fuse. This patch does that, with a couple
of restrictions on the namespace:

 - The userns for the fuse connection is fixed to the namespace
   from which /dev/fuse is opened.

 - The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the need to
pass around userns references and by allowing fuse to rely on the
checks in setattr_prepare for ownership changes.  Either restriction
could be relaxed in the future if needed.

For cuse the userns used is the opener of /dev/cuse.  Semantically the
cuse support does not appear safe for unprivileged users.  Practically
the permissions on /dev/cuse only make it accessible to the global root
user.  If something slips through the cracks in a user namespace the only
users who will be able to use the cuse device are those users mapped into
the user namespace.

Translation in the posix acl is updated to use the user namespace of
the filesystem.  Avoiding cases which might bypass this translation is
handled in a following change.

This change is strongly based on a similar change from Seth Forshee
and Dongsu Park.

Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Miklos Szeredi 
Cc: 
Cc: Dongsu Park 
Signed-off-by: Eric W. Biederman 
---
 fs/fuse/acl.c|  4 ++--
 fs/fuse/cuse.c   |  7 ++-
 fs/fuse/dev.c|  4 ++--
 fs/fuse/dir.c| 14 +++---
 fs/fuse/fuse_i.h |  6 +-
 fs/fuse/inode.c  | 31 +++
 6 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/fs/fuse/acl.c b/fs/fuse/acl.c
index 8fb2153dbf50..5a67c80e21d6 100644
--- a/fs/fuse/acl.c
+++ b/fs/fuse/acl.c
@@ -34,7 +34,7 @@ struct posix_acl *fuse_get_acl(struct inode *inode, int type)
return ERR_PTR(-ENOMEM);
size = fuse_getxattr(inode, name, value, PAGE_SIZE);
if (size > 0)
-   acl = posix_acl_from_xattr(&init_user_ns, value, size);
+   acl = posix_acl_from_xattr(fc->user_ns, value, size);
else if ((size == 0) || (size == -ENODATA) ||
 (size == -EOPNOTSUPP && fc->no_getxattr))
acl = NULL;
@@ -81,7 +81,7 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type)
if (!value)
return -ENOMEM;
 
-   ret = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+   ret = posix_acl_to_xattr(fc->user_ns, acl, value, size);
if (ret < 0) {
kfree(value);
return ret;
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index e9e97803442a..036ee477669e 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fuse_i.h"
 
@@ -498,7 +499,11 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
if (!cc)
return -ENOMEM;
 
-   fuse_conn_init(&cc->fc);
+   /*
+* Limit the cuse channel to requests that can
+* be represented in file->f_cred->user_ns.
+*/
+   fuse_conn_init(&cc->fc, file->f_cred->user_ns);
 
fud = fuse_dev_alloc(&cc->fc);
if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2886a56d5f61..fce7915aea13 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -114,8 +114,8 @@ static void __fuse_put_request(struct fuse_req *req)
 
 static bool fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-   req->in.h.uid = from_kuid(&init_user_ns, current_fsuid());
-   req->in.h.gid = from_kgid(&init_user_ns, current_fsgid());
+   req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
+   req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 
return (req->in.h.uid != ((uid_t)-1)) && (req->in.h.gid != ((gid_t)-1));
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index a44ca509db4f..79cca1687457 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -858,8 +858,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
stat->ino = attr->ino;
stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
stat->nlink = attr->nlink;
-   stat->uid = make_kuid(&init_user_ns, attr->uid);
-   stat->gid = make_kgid(&init_user_ns, attr->gid);
+   stat->uid = make_kuid(fc->user_ns, attr->uid);
+   stat->gid = make_kgid(fc->user_ns, attr->gid);
stat->rdev = inode->i_rdev;
stat->atime.tv_sec = attr->atime;
stat->atime.tv_nsec = attr->atimensec;
@@ -1475,17 +1475,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-  bool trust_local_cmtime)
+static void iattr_to_fattr(struct fuse_conn *fc,

Re: [PATCH] rcu: Remove the unnecessary separate function, rcu_preempt_do_callback()

2018-02-26 Thread Paul E. McKenney

On Tue, Feb 27, 2018 at 08:40:47AM +0900, Byungchul Park wrote:
> On 2/27/2018 8:35 AM, Byungchul Park wrote:
> >On 2/27/2018 3:22 AM, Paul E. McKenney wrote:
> >>On Mon, Feb 26, 2018 at 12:15:14PM -0500, Steven Rostedt wrote:
> >>>On Mon, 26 Feb 2018 14:11:36 +0900
> >>>Byungchul Park  wrote:
> >>>
> rcu_preempt_do_callbacks() was introduced in commit 09223371dea (rcu:
> Use softirq to address performance regression), where it was needed to
> distinguish between kernels with and without CONFIG_TREE_PREEMPT_RCU.
> 
> Now that the code has been cleaned up so that rcu_preempt_do_callbacks()
> is only called from rcu_kthread_do_work() in the same file, tree_plugin.h,
> we no longer need to keep the separate function. Remove it for better
> readability.
> >>>
> >>>Looks good to me (looks like commit f8b7fc6b51 "rcu: use softirq
> >>>instead of kthreads except when RCU_BOOST=y" cleaned up the ifdefs and
> >>>removed the requirement).
> >>>
> >>>Reviewed-by: Steven Rostedt (VMware) 
> >>
> >>Thank you both!  I have queued a slightly modified patch for testing
> >>and further review.  Please see below and let me know if I messed
> >>something up.
> >>
> >>    Thanx, Paul
> >>
> >>
> >>
> >>commit b8a3012ddba397d4a18d9fd4a00432f8c2626bd6
> >>Author: Byungchul Park 
> >>Date:   Mon Feb 26 14:11:36 2018 +0900
> >>
> >> rcu: Inline rcu_preempt_do_callback() into its sole caller
> >>
> >> The rcu_preempt_do_callbacks() function was introduced in commit
> >> 09223371dea ("rcu: Use softirq to address performance regression"),
> >> where it was necessary to handle kernel builds both containing and not
> >> containing RCU-preempt.  Since then, various changes (most notably
> >> f8b7fc6b51 ("rcu: use softirq instead of kthreads except when
> >> RCU_BOOST=y")) have resulted in this function being invoked only from
> >> rcu_kthread_do_work(), which is present only in kernels containing
> >> RCU-preempt, which in turn means that the rcu_preempt_do_callbacks()
> >> function is no longer needed.
> >>
> >> This commit therefore inlines rcu_preempt_do_callbacks() into its
> >> sole remaining caller and also removes the rcu_state_p and rcu_data_p
> >> indirection for added clarity.
> >>
> >> Signed-off-by: Byungchul Park 
> >> Reviewed-by: Steven Rostedt (VMware) 
> >> [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
> >>
> >>diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> >>index dc6f2319fc21..9dd0ea77faed 100644
> >>--- a/kernel/rcu/tree.h
> >>+++ b/kernel/rcu/tree.h
> >>@@ -449,7 +449,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
> >>  static void invoke_rcu_callbacks_kthread(void);
> >>  static bool rcu_is_callbacks_kthread(void);
> >>  #ifdef CONFIG_RCU_BOOST
> >>-static void rcu_preempt_do_callbacks(void);
> >>  static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
> >>   struct rcu_node *rnp);
> >>  #endif /* #ifdef CONFIG_RCU_BOOST */
> >>diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>index 26d7a31e81cb..b0d7f9ba6bf2 100644
> >>--- a/kernel/rcu/tree_plugin.h
> >>+++ b/kernel/rcu/tree_plugin.h
> >>@@ -686,15 +686,6 @@ static void rcu_preempt_check_callbacks(void)
> >>  t->rcu_read_unlock_special.b.need_qs = true;
> >>  }
> >>-#ifdef CONFIG_RCU_BOOST
> >>-
> >>-static void rcu_preempt_do_callbacks(void)
> >>-{
> >>-    rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
> >>-}
> >>-
> >>-#endif /* #ifdef CONFIG_RCU_BOOST */
> >>-
> >>  /**
> >>   * call_rcu() - Queue an RCU callback for invocation after a grace period.
> >>   * @head: structure to be used for queueing the RCU updates.
> >>@@ -1170,7 +1161,7 @@ static void rcu_kthread_do_work(void)
> >>  {
> >>  rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
> >>  rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
> >>-    rcu_preempt_do_callbacks();
> >>+    rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
> >
> >OMG. Sorry for the mistake and thank you very much for fixing it.
> >
> >I will be more careful.
> 
> Ah. Logically no difference between mine and your fixed one.

Yes, your patch was perfectly fine.  It just kept an additional pair of
memory fetches that are unnecessary.  So if you considered your patch
to have a problem, then the original code had that same problem.
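The point about indirection can be illustrated with a small user-space model. All names here are hypothetical stand-ins, not the kernel's. Both helpers end up running the same batch, so behaviour is identical. The indirect form, however, must first load `state_p` and `data_p` before it can make the call. Whether a compiler can fold those loads away depends on how the pointers are declared. Treat this as an illustration of the reasoning, not a statement about the generated kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct rcu_state / struct rcu_data. */
struct state_model { int batches_run; };
struct data_model { int pending; };

static struct state_model preempt_state;
static struct data_model preempt_data;

/* The indirection the removed helper used: two global pointers that
 * are read before the real work starts. */
static struct state_model *state_p = &preempt_state;
static struct data_model *data_p = &preempt_data;

/* Stand-in for rcu_do_batch(): "invoke" (here: count) pending callbacks. */
static int model_do_batch(struct state_model *rsp, struct data_model *rdp)
{
	int n;

	assert(rsp != 0 && rdp != 0);
	n = rdp->pending;
	rdp->pending = 0;
	rsp->batches_run++;
	return n;
}

/* Before: a separate helper that goes through state_p/data_p. */
static int model_do_callbacks_indirect(void)
{
	return model_do_batch(state_p, data_p);
}

/* After: the sole caller names the objects directly. */
static int model_do_callbacks_direct(void)
{
	return model_do_batch(&preempt_state, &preempt_data);
}
```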

> But anyway yours looks much better! Thank you~ :)

And you!

Thanx, Paul

[PATCH v7 5/7] fuse: Simplify the posix acl handling logic.

2018-02-26 Thread Eric W. Biederman

Rename the fuse connection flag posix_acl to cached_posix_acl, as that
is what it actually means: fuse will cache and operate on the cached
value of the posix acl.

When fc->cached_posix_acl is not set, set ACL_DONT_CACHE on the inode
so that get_acl and friends won't cache the acl values even if they
are called.

Replace forget_all_cached_acls with fuse_forget_cached_acls.  This
wrapper only takes effect when cached_posix_acl is true, to avoid
losing the nocache or noxattr status when posix acls are not
cached.
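The need for the wrapper can be shown with a minimal model. It assumes simplified stand-ins for the VFS sentinels: `MODEL_ACL_NOT_CACHED` and `MODEL_ACL_DONT_CACHE` are hypothetical names, compared only by identity, like the kernel's ACL_NOT_CACHED and ACL_DONT_CACHE. An unconditional forget overwrites the don't-cache marker with the not-cached sentinel, which would quietly re-enable caching. The wrapper leaves an uncached inode alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinels; only their identity matters. */
#define MODEL_ACL_NOT_CACHED ((void *)(intptr_t)-1)
#define MODEL_ACL_DONT_CACHE ((void *)(intptr_t)-3)

struct conn_model { bool cached_posix_acl; };
struct inode_model { void *i_acl; void *i_default_acl; };

/* Inode setup: opt out of ACL caching when the connection does not cache. */
static void model_init_inode(const struct conn_model *fc,
			     struct inode_model *inode)
{
	void *v = fc->cached_posix_acl ? MODEL_ACL_NOT_CACHED
				       : MODEL_ACL_DONT_CACHE;

	assert(inode != NULL);
	inode->i_acl = v;
	inode->i_default_acl = v;
}

/* Like forget_all_cached_acls(): reset both slots unconditionally. */
static void model_forget_all_cached_acls(struct inode_model *inode)
{
	inode->i_acl = MODEL_ACL_NOT_CACHED;
	inode->i_default_acl = MODEL_ACL_NOT_CACHED;
}

/* Like the fuse_forget_cached_acls() wrapper: only act if fuse caches. */
static void model_fuse_forget_cached_acls(const struct conn_model *fc,
					  struct inode_model *inode)
{
	if (fc->cached_posix_acl)
		model_forget_all_cached_acls(inode);
}
```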

Always use posix_acl_access_xattr_handler so the fuse code
benefits from the generic posix acl handlers as much as possible.
This will become important as the code works on translation
of uid and gid in the posix acls when fuse is not mounted in
the initial user namespace.

Signed-off-by: "Eric W. Biederman" 
---
 fs/fuse/acl.c|  6 +++---
 fs/fuse/dir.c| 11 +--
 fs/fuse/fuse_i.h |  5 +++--
 fs/fuse/inode.c  | 13 ++---
 fs/fuse/xattr.c  |  5 -
 5 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/acl.c b/fs/fuse/acl.c
index ec85765502f1..8fb2153dbf50 100644
--- a/fs/fuse/acl.c
+++ b/fs/fuse/acl.c
@@ -19,7 +19,7 @@ struct posix_acl *fuse_get_acl(struct inode *inode, int type)
void *value = NULL;
struct posix_acl *acl;
 
-   if (!fc->posix_acl || fc->no_getxattr)
+   if (fc->no_getxattr)
return NULL;
 
if (type == ACL_TYPE_ACCESS)
@@ -53,7 +53,7 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type)
const char *name;
int ret;
 
-   if (!fc->posix_acl || fc->no_setxattr)
+   if (fc->no_setxattr)
return -EOPNOTSUPP;
 
if (type == ACL_TYPE_ACCESS)
@@ -92,7 +92,7 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type)
} else {
ret = fuse_removexattr(inode, name);
}
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
fuse_invalidate_attr(inode);
 
return ret;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 24967382a7b1..a44ca509db4f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -237,7 +237,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
if (ret || (outarg.attr.mode ^ inode->i_mode) & S_IFMT)
goto invalid;
 
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
fuse_change_attributes(inode, &outarg.attr,
   entry_attr_timeout(&outarg),
   attr_version);
@@ -930,7 +930,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
int err = 0;
 
if (time_before64(fi->i_time, get_jiffies_64())) {
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
err = fuse_do_getattr(inode, stat, file);
} else if (stat) {
generic_fillattr(inode, stat);
@@ -1076,7 +1076,7 @@ static int fuse_perm_getattr(struct inode *inode, int mask)
if (mask & MAY_NOT_BLOCK)
return -ECHILD;
 
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
return fuse_do_getattr(inode, NULL, NULL);
 }
 
@@ -1246,7 +1246,7 @@ static int fuse_direntplus_link(struct file *file,
fi->nlookup++;
spin_unlock(&fc->lock);
 
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
fuse_change_attributes(inode, &o->attr,
   entry_attr_timeout(o),
   attr_version);
@@ -1764,8 +1764,7 @@ static int fuse_setattr(struct dentry *entry, struct iattr *attr)
 * If filesystem supports acls it may have updated acl xattrs in
 * the filesystem, so forget cached acls for the inode.
 */
-   if (fc->posix_acl)
-   forget_all_cached_acls(inode);
+   fuse_forget_cached_acls(inode);
 
/* Directory mode changed, may need to revalidate access */
if (d_is_dir(entry) && (attr->ia_valid & ATTR_MODE))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c4c093bbf456..3cf296d60bc0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -619,7 +619,7 @@ struct fuse_conn {
unsigned no_lseek:1;
 
/** Does the filesystem support posix acls? */
-   unsigned posix_acl:1;
+   unsigned cached_posix_acl:1;
 
/** Check permissions based on the file mode or not? */
unsigned default_permissions:1;
@@ -913,6 +913,8 @@ void fuse_release_nowrite(struct inode *inode);
 
 u64 fuse_get_attr_version(struct fuse_conn *fc);
 
+void fuse_forget_cached_acls(struct inode *inode);
+
 /**
  * File-system tells the kernel to invalidate cache for the given node

[PATCH v7 7/7] fuse: Restrict allow_other to the superblock's namespace or a descendant

2018-02-26 Thread Eric W. Biederman

From: Seth Forshee 

Unprivileged users are normally restricted from mounting with the
allow_other option by system policy, but this could be bypassed
for a mount done with user namespace root permissions. In such
cases, allow_other should not allow users outside the userns
to access the mount, as doing so would give the unprivileged user
the ability to manipulate processes it would otherwise be unable
to manipulate. Restrict allow_other to apply to users in the same
userns used at mount or a descendant of that namespace. Also
export current_in_userns() for use by fuse when built as a
module.
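The restriction described above can be modelled in user space. The sketch assumes only what the ancestry test needs: a parent pointer and a nesting level, with the initial namespace at level 0. `struct userns_model` and `model_in_userns()` are hypothetical names, written in the same shape as the kernel's in_userns(). The walk climbs from the caller's namespace until it reaches the superblock namespace's depth, then compares.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two fields the ancestry check needs. */
struct userns_model {
	const struct userns_model *parent; /* NULL only for the initial ns */
	int level;                         /* 0 for the initial ns */
};

/* Is @child equal to @ancestor or nested somewhere below it?
 * Climb from @child until we are at @ancestor's depth, then compare. */
static bool model_in_userns(const struct userns_model *ancestor,
			    const struct userns_model *child)
{
	const struct userns_model *ns = child;

	assert(ancestor != NULL && child != NULL);
	while (ns->level > ancestor->level)
		ns = ns->parent;
	return ns == ancestor;
}

/* Shape of the patched allow_other branch of fuse_allow_current_process():
 * only callers in the superblock's userns or a descendant are allowed. */
static bool model_allow_other(const struct userns_model *sb_ns,
			      const struct userns_model *caller_ns)
{
	return model_in_userns(sb_ns, caller_ns);
}
```

A sibling namespace or the parent of the mount's namespace fails the check. A namespace nested inside the mount's namespace passes.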

Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: "Eric W. Biederman" 
Cc: Serge Hallyn 
Cc: Miklos Szeredi 
Acked-by: Miklos Szeredi 
Reviewed-by: Serge Hallyn 
Reviewed-by: "Eric W. Biederman" 
Signed-off-by: Seth Forshee 
Signed-off-by: Dongsu Park 
Signed-off-by: Eric W. Biederman 
---
 fs/fuse/dir.c   | 2 +-
 kernel/user_namespace.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 79cca1687457..0cbd1ff3dd48 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1030,7 +1030,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
const struct cred *cred;
 
if (fc->allow_other)
-   return 1;
+   return current_in_userns(fc->user_ns);
 
cred = current_cred();
if (uid_eq(cred->euid, fc->user_id) &&
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 246d4d4ce5c7..492c255e6c5a 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1235,6 +1235,7 @@ bool current_in_userns(const struct user_namespace *target_ns)
 {
return in_userns(target_ns, current_user_ns());
 }
+EXPORT_SYMBOL(current_in_userns);
 
 static inline struct user_namespace *to_user_ns(struct ns_common *ns)
 {
-- 
2.14.1

[PATCH v7 3/7] fs/posix_acl: Document that get_acl respects ACL_DONT_CACHE

2018-02-26 Thread Eric W. Biederman

Fuse is about to join overlayfs in relying on get_acl respecting
ACL_DONT_CACHE, so update the documentation in get_acl to reflect that
fact.  The comment and this change description should give people a
clue that respecting ACL_DONT_CACHE in get_acl is important, and that
they should audit the filesystems before removing that support.

Additionally, update the comment above the call to get_acl itself and
remove the incorrect claim that an implementation of get_acl can
prevent caching by calling forget_cached_acl.  Replace that with the
correct information: to prevent caching, all that is necessary is to
set inode->i_acl = inode->i_default_acl = ACL_DONT_CACHE when the
inode is initialized.
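The cmpxchg logic the updated comment describes can be modelled with C11 atomics. This is a simplified, single-threaded sketch under several assumptions. The names `MODEL_ACL_NOT_CACHED`, `MODEL_ACL_DONT_CACHE`, `fs_get_acl()` and `model_get_acl()` are hypothetical, reference counting is omitted, and the sentinel is an ordinary address rather than the kernel's per-task value. Claiming the slot only succeeds while it still holds the not-cached value. A slot set to DONT_CACHE is therefore never claimed, the final publish fails, and every lookup goes back to the filesystem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinels; only their identity matters. */
#define MODEL_ACL_NOT_CACHED ((void *)(intptr_t)-1)
#define MODEL_ACL_DONT_CACHE ((void *)(intptr_t)-3)

static int backing_acl;  /* stands in for the ACL the filesystem returns */
static int fs_calls;     /* how many times the filesystem was asked */

/* Stand-in for inode->i_op->get_acl(). */
static void *fs_get_acl(void)
{
	fs_calls++;
	return &backing_acl;
}

static void *model_get_acl(_Atomic(void *) *slot)
{
	static int sentinel_obj;
	void *sentinel = &sentinel_obj;
	void *cur, *acl, *expected;

	assert(slot != NULL);
	cur = atomic_load(slot);
	if (cur != MODEL_ACL_NOT_CACHED && cur != MODEL_ACL_DONT_CACHE &&
	    cur != sentinel)
		return cur;  /* cache hit (NULL would mean "no ACL") */

	/* Claim the slot; this fails and leaves it untouched unless it
	 * still holds NOT_CACHED, so DONT_CACHE stays in place. */
	expected = MODEL_ACL_NOT_CACHED;
	atomic_compare_exchange_strong(slot, &expected, sentinel);

	acl = fs_get_acl();

	/* Publish only if our sentinel is still in the slot. */
	expected = sentinel;
	atomic_compare_exchange_strong(slot, &expected, acl);
	return acl;
}
```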

Signed-off-by: "Eric W. Biederman" 
---
 fs/posix_acl.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 2fd0fde16fe1..3c24fc263401 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -121,14 +121,17 @@ struct posix_acl *get_acl(struct inode *inode, int type)
 * could wait for that other task to complete its job, but it's easier
 * to just call ->get_acl to fetch the ACL ourself.  (This is going to
 * be an unlikely race.)
+*
+* ACL_DONT_CACHE is treated as another task updating the acl and
+* remains set.
 */
if (cmpxchg(p, ACL_NOT_CACHED, sentinel) != ACL_NOT_CACHED)
/* fall through */ ;
 
/*
 * Normally, the ACL returned by ->get_acl will be cached.
-* A filesystem can prevent that by calling
-* forget_cached_acl(inode, type) in ->get_acl.
+* A filesystem can prevent that by setting
+* inode->i_acl = inode->i_default_acl = ACL_DONT_CACHE.
 *
 * If the filesystem doesn't have a get_acl() function at all, we'll
 * just create the negative cache entry.
-- 
2.14.1

[PATCH v7 4/7] fuse: Cache a NULL acl when FUSE_GETXATTR returns -ENOSYS

2018-02-26 Thread Eric W. Biederman

When FUSE_GETXATTR will never return anything, call cache_no_acl to
cache that state in the vfs as well as in fuse with fc->no_getxattr.

The only code paths this affects are those that call fuse_get_acl,
and caching a NULL has exactly the same effect as returning it
immediately, so this should not affect anything.

This keeps the vfs from wasting its time calling down into fuse
when fuse isn't going to do anything, and it makes it clear
when a NULL should be cached for optimal performance.
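Here is a rough model of the effect. It assumes an inode that starts out marked ACL_DONT_CACHE and a server that answers GETXATTR with -ENOSYS. All names are hypothetical, and the VFS side is reduced to a single slot. Once the cache_no_acl() step has stored NULL, later lookups are answered from the inode without calling into fuse at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; only its identity matters. */
#define MODEL_ACL_DONT_CACHE ((void *)(intptr_t)-3)

struct conn_model {
	bool no_getxattr;
	int fuse_calls;       /* calls into the fuse xattr code */
	int server_requests;  /* requests actually sent to the server */
};

struct inode_model {
	struct conn_model *fc;
	void *i_acl;
};

/* Stand-in for a server that does not implement FUSE_GETXATTR. */
static int server_getxattr(struct conn_model *fc)
{
	fc->server_requests++;
	return -ENOSYS;
}

/* Like cache_no_acl(): remember "no ACL" in the inode. */
static void model_cache_no_acl(struct inode_model *inode)
{
	inode->i_acl = NULL;
}

/* Rough shape of fuse_getxattr()'s -ENOSYS handling after this patch. */
static int model_fuse_getxattr(struct inode_model *inode)
{
	struct conn_model *fc;
	int ret;

	assert(inode != NULL && inode->fc != NULL);
	fc = inode->fc;
	fc->fuse_calls++;
	if (fc->no_getxattr)
		return -EOPNOTSUPP;

	ret = server_getxattr(fc);
	if (ret == -ENOSYS) {
		fc->no_getxattr = true;
		model_cache_no_acl(inode);  /* the step this patch adds */
		ret = -EOPNOTSUPP;
	}
	return ret;
}

/* Simplified VFS lookup: any real cached value, including NULL,
 * answers the query without calling into fuse. */
static void *model_get_acl(struct inode_model *inode)
{
	if (inode->i_acl != MODEL_ACL_DONT_CACHE)
		return inode->i_acl;
	(void)model_fuse_getxattr(inode);
	return NULL;  /* in this model the only outcome is "no ACL" */
}
```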

Signed-off-by: "Eric W. Biederman" 
---
 fs/fuse/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 3caac46b08b0..0520a4f47226 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -82,6 +82,7 @@ ssize_t fuse_getxattr(struct inode *inode, const char *name, void *value,
ret = min_t(ssize_t, outarg.size, XATTR_SIZE_MAX);
if (ret == -ENOSYS) {
fc->no_getxattr = 1;
+   cache_no_acl(inode);
ret = -EOPNOTSUPP;
}
return ret;
-- 
2.14.1

[PATCH v7 1/7] fuse: Remove the buggy retranslation of pids in fuse_dev_do_read

2018-02-26 Thread Eric W. Biederman

At the point of fuse_dev_do_read, the user space process that initiated the
action on the fuse filesystem may no longer exist.  The process may have been
killed, or may have fired an asynchronous request and exited.

If the initial process has exited, the code "pid_vnr(find_pid_ns(in->h.pid,
fc->pid_ns))" will either return a pid of 0 or, in the unlikely event that
the pid has been reallocated, practically any pid.  Any pid is possible
because the pid allocator allocates pid numbers in different pid
namespaces independently.
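A toy model of this failure mode follows, with hypothetical names throughout. Each live `struct pid_model` carries one number in the mounter's namespace (fc->pid_ns) and an unrelated number in the server's namespace. `model_find_pid()` and `model_pid_vnr()` stand in for find_pid_ns() and pid_vnr(). Retranslating a recorded number at read time is only correct while the original task is still alive.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PIDS 8

/* Each live pid has independent numbers in the two namespaces. */
struct pid_model {
	int live;
	int nr_in_fc_ns;      /* number as seen in fc->pid_ns */
	int nr_in_server_ns;  /* number as seen by the fuse server */
};

static struct pid_model pids[MODEL_MAX_PIDS];

/* Like find_pid_ns(nr, fc->pid_ns): NULL if no live pid has that number. */
static struct pid_model *model_find_pid(int nr_in_fc_ns)
{
	for (size_t i = 0; i < MODEL_MAX_PIDS; i++)
		if (pids[i].live && pids[i].nr_in_fc_ns == nr_in_fc_ns)
			return &pids[i];
	return NULL;
}

/* Like pid_vnr() from the server's side: 0 for a NULL pid. */
static int model_pid_vnr(const struct pid_model *pid)
{
	return pid ? pid->nr_in_server_ns : 0;
}

/* The removed retranslation, performed when the server reads the request. */
static int model_retranslate(int recorded_nr)
{
	assert(recorded_nr > 0);
	return model_pid_vnr(model_find_pid(recorded_nr));
}
```

Suppose a request records number 42. While the task is alive, the server sees 7. After the task exits, the server sees 0. Once 42 is handed to an unrelated task whose number in the server's namespace is 913, the server sees 913.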

The only way to make translation in fuse_dev_do_read reliable is to call
get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
fuse_dev_do_read.  That reference counting in other contexts has been shown
to bounce cache lines between processors and in general be slow.  So that is
not desirable.

The only known user that runs the fuse server in a different pid namespace
from the filesystem does not care what the pids are in the fuse messages,
so removing this code should not matter.

Getting the translation to a server running outside of the pid namespace
of a container can still be achieved by playing setns games at mount time.
It is also possible to add an option to pass a pid namespace into the fuse
filesystem at mount time.

Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns")
Signed-off-by: "Eric W. Biederman" 
---
 fs/fuse/dev.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 5d06384c2cae..0fb58f364fa6 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1260,12 +1260,6 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	in = &req->in;
 	reqsize = in->h.len;
 
-	if (task_active_pid_ns(current) != fc->pid_ns) {
-		rcu_read_lock();
-		in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns));
-		rcu_read_unlock();
-	}
-
 	/* If request is too large, reply with an error and restart the read */
 	if (nbytes < reqsize) {
 		req->out.h.error = -EIO;
-- 
2.14.1
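
The commit message argues that looking a pid up by number after the requester may have exited can yield nothing (a pid of 0) or an unrelated process once the number is reused, while holding a reference (get_pid()/put_pid()) keeps the identity stable. The userspace model below is a hedged sketch of that argument only: the single-namespace table, the lowest-free-number allocator and every model_* name are hypothetical simplifications, not the kernel's pid allocator or API.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_PID 8

/* Hypothetical model of struct pid: a number, an owner and a refcount. */
struct model_pid {
	int nr;       /* pid number within the (single) namespace */
	int owner;    /* identifies the process this pid was created for */
	int refcount; /* object is freed when this drops to zero */
};

/* Number -> attached pid; an entry is cleared when its process exits. */
static struct model_pid *pid_table[MODEL_MAX_PID];

static struct model_pid *model_get_pid(struct model_pid *p)
{
	p->refcount++;
	return p;
}

static void model_put_pid(struct model_pid *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/* Start a process, reusing the lowest free number (simplified allocator). */
static struct model_pid *model_spawn(int owner)
{
	for (int nr = 1; nr < MODEL_MAX_PID; nr++) {
		if (!pid_table[nr]) {
			struct model_pid *p = malloc(sizeof(*p));
			if (!p)
				return NULL;
			p->nr = nr;
			p->owner = owner;
			p->refcount = 1; /* reference held by the live process */
			pid_table[nr] = p;
			return p;
		}
	}
	return NULL;
}

/* Process exit: the number becomes free and the process's ref is dropped. */
static void model_exit(struct model_pid *p)
{
	pid_table[p->nr] = NULL;
	model_put_pid(p);
}

/* Translate a bare number to an owner, like a lookup by number; 0 if none. */
static int model_owner_of(int nr)
{
	struct model_pid *p = (nr > 0 && nr < MODEL_MAX_PID) ? pid_table[nr] : NULL;
	return p ? p->owner : 0;
}
```

A request that stored only the number sees 0 after the exit and the wrong owner after reuse, while a held reference still names the original process. The model deliberately leaves out the cost the commit message cites: in the kernel that extra reference would be taken and dropped for every request, which is the cache-line traffic that made the approach undesirable.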

[PATCH v7 0/7] fuse: mounts from non-init user namespaces

2018-02-26 Thread Eric W. Biederman


This patchset builds on the work by Dongsu Park and Seth Forshee and is
reduced to the set of patches that just affect fuse.  The non-fuse
patches are far enough along that we can ignore them, except possibly for
the question of when FS_USERNS_MOUNT gets set in fuse_fs_type.

Fuse with a block device has been left as an exercise for a later time.

Since v5 I changed the core of this patchset around as the previous
patches were showing signs of bitrot.  Some important explanations were
missing, some important functionality was missing, and xattr handling
was completely absent.

Since v6 I have:
- Removed the failure case from fuse_get_req_nofail_nopages that I
  added.
- Updated fuse to always use posix_acl_access_xattr_handler and
  posix_acl_default_xattr_handler, by teaching fuse to set
  ACL_DONT_CACHE when FUSE_POSIX_ACL is not set.

Miklos, can you take a look and see what you think?

I think this much of the fuse changes are ready, and as such I would
like to get them in this development cycle if possible.

These changes are also available at:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-fuse-v7

Eric W. Biederman (6):
  fuse: Remove the buggy retranslation of pids in fuse_dev_do_read
  fuse: Fail all requests with invalid uids or gids
  fs/posix_acl: Document that get_acl respects ACL_DONT_CACHE
  fuse: Cache a NULL acl when FUSE_GETXATTR returns -ENOSYS
  fuse: Simplfiy the posix acl handling logic.
  fuse: Support fuse filesystems outside of init_user_ns

Seth Forshee (1):
  fuse: Restrict allow_other to the superblock's namespace or a descendant

 fs/fuse/acl.c           | 10 +-
 fs/fuse/cuse.c          |  7 ++-
 fs/fuse/dev.c           | 30 +-
 fs/fuse/dir.c           | 27 +--
 fs/fuse/fuse_i.h        | 11 ---
 fs/fuse/inode.c         | 44 +---
 fs/fuse/xattr.c         |  6 +-
 fs/posix_acl.c          |  7 +--
 kernel/user_namespace.c |  1 +
 9 files changed, 85 insertions(+), 58 deletions(-)

Eric
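
The cover letter lists a change restricting allow_other to "the superblock's namespace or a descendant". One common way to test that relationship is to walk a namespace's parent chain until it reaches the target or runs out. The sketch below assumes a namespace can be reduced to a parent pointer; model_userns and model_ns_is_same_or_descendant() are hypothetical names, and this is not the code of Seth Forshee's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: only the parent link matters. */
struct model_userns {
	const struct model_userns *parent; /* NULL for the initial namespace */
};

/* True if ns is target itself or is nested (at any depth) below target. */
static bool model_ns_is_same_or_descendant(const struct model_userns *ns,
					   const struct model_userns *target)
{
	assert(target != NULL);
	for (; ns != NULL; ns = ns->parent) {
		if (ns == target)
			return true;
	}
	return false;
}
```

In an allow_other-style check, ns would presumably be the accessing task's namespace and target the superblock's; a sibling namespace, or an ancestor such as the initial namespace, fails the test.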

[GIT PULL] TPM: Bug fixes

2018-02-26 Thread James Morris

Please pull these bugfixes for TPM, from Jeremy Boone, via Jarkko Sakkinen.


The following changes since commit 4c3579f6cadd5eb8250a36e789e6df66f660237a:

  Merge tag 'edac_fixes_for_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp (2018-02-26 10:19:15 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git fixes-v4.16-rc4

for you to fetch changes up to 3be23274755ee85771270a23af7691dc9b3a95db:

  tpm: fix potential buffer overruns caused by bit glitches on the bus (2018-02-26 15:43:46 -0800)


Jeremy Boone (5):
  tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
  tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
  tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
  tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
  tpm: fix potential buffer overruns caused by bit glitches on the bus

 drivers/char/tpm/st33zp24/st33zp24.c | 4 ++--
 drivers/char/tpm/tpm-interface.c     | 4
 drivers/char/tpm/tpm2-cmd.c          | 4
 drivers/char/tpm/tpm_i2c_infineon.c  | 5 +++--
 drivers/char/tpm/tpm_i2c_nuvoton.c   | 8 ++--
 drivers/char/tpm/tpm_tis_core.c      | 5 +++--
 6 files changed, 22 insertions(+), 8 deletions(-)
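
The shortlog above describes fixes for buffer overruns caused by bit glitches on the bus. The general defensive pattern is to treat the length field a device reports as untrusted and bound it on both sides before copying. A minimal sketch, assuming the standard 10-byte TPM response header (2-byte tag, 4-byte big-endian size, 4-byte return code); MODEL_TPM_HEADER_SIZE, model_tpm_resp_size() and model_tpm_recv() are hypothetical names, and this is not code from the drivers listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_TPM_HEADER_SIZE 10 /* tag (2) + size (4) + return code (4) */

/* Read the big-endian 32-bit size field at offset 2 of the header. */
static uint32_t model_tpm_resp_size(const uint8_t *hdr)
{
	return ((uint32_t)hdr[2] << 24) | ((uint32_t)hdr[3] << 16) |
	       ((uint32_t)hdr[4] << 8) | (uint32_t)hdr[5];
}

/*
 * Copy a response of rx_len bytes received from the device into buf
 * (capacity count), trusting the header's size field only after bounds
 * checks.  Returns the number of bytes copied, or -1 if rejected.
 */
static long model_tpm_recv(uint8_t *buf, size_t count,
			   const uint8_t *rx, size_t rx_len)
{
	uint32_t expected;

	if (count < MODEL_TPM_HEADER_SIZE || rx_len < MODEL_TPM_HEADER_SIZE)
		return -1;

	expected = model_tpm_resp_size(rx);

	/* A glitched size could be too small or too large: reject both. */
	if (expected < MODEL_TPM_HEADER_SIZE || expected > count ||
	    expected > rx_len)
		return -1;

	memcpy(buf, rx, expected);
	return (long)expected;
}
```

A single flipped bit in the size field (for example 0x0000000C becoming 0x0000100C) is then rejected instead of driving a copy past the end of buf.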

Re: [PATCH 8/9] drm/xen-front: Implement GEM operations

2018-02-26 Thread Boris Ostrovsky

On 02/23/2018 10:35 AM, Oleksandr Andrushchenko wrote:
> On 02/23/2018 05:26 PM, Boris Ostrovsky wrote:
>> On 02/21/2018 03:03 AM, Oleksandr Andrushchenko wrote:
>>> +static struct xen_gem_object *gem_create(struct drm_device *dev, size_t size)
>>> +{
>>> +	struct xen_drm_front_drm_info *drm_info = dev->dev_private;
>>> +	struct xen_gem_object *xen_obj;
>>> +	int ret;
>>> +
>>> +	size = round_up(size, PAGE_SIZE);
>>> +	xen_obj = gem_create_obj(dev, size);
>>> +	if (IS_ERR_OR_NULL(xen_obj))
>>> +		return xen_obj;
>>> +
>>> +	if (drm_info->cfg->be_alloc) {
>>> +		/*
>>> +		 * backend will allocate space for this buffer, so
>>> +		 * only allocate array of pointers to pages
>>> +		 */
>>> +		xen_obj->be_alloc = true;
>> If be_alloc is a flag (which I am not sure about) --- should it be set
>> to true *after* you've successfully allocated your things?
> this is a configuration option telling about the way
> the buffer gets allocated: either by the frontend or
> backend (be_alloc -> buffer allocated by the backend)


I can see how drm_info->cfg->be_alloc might be a configuration option
but xen_obj->be_alloc is set here and that's not how configuration
options typically behave.


>>
>>> +		ret = gem_alloc_pages_array(xen_obj, size);
>>> +		if (ret < 0) {
>>> +			gem_free_pages_array(xen_obj);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		ret = alloc_xenballooned_pages(xen_obj->num_pages,
>>> +				xen_obj->pages);
>> Why are you allocating balloon pages?
> in this use-case we map pages provided by the backend
> (yes, I know this can be a problem from both security
> POV and that DomU can die holding pages of Dom0 forever:
> but still it is a configuration option, so user decides
> if her use-case needs this and takes responsibility for
> such a decision).


Perhaps I am missing something here but when you say "I know this can be
a problem from both security POV ..." then there is something wrong with
your solution.

-boris

>
> Please see description of the buffering modes in xen_drm_front.h
> specifically for backend allocated buffers:
>  ***************************************************************************
>  * 2. Buffers allocated by the backend
>  ***************************************************************************
>  *
>  * This mode of operation is run-time configured via guest domain configuration
>  * through XenStore entries.
>  *
>  * For systems which do not provide IOMMU support, but having specific
>  * requirements for display buffers it is possible to allocate such buffers
>  * at backend side and share those with the frontend.
>  * For example, if host domain is 1:1 mapped and has DRM/GPU hardware expecting
>  * physically contiguous memory, this allows implementing zero-copying
>  * use-cases.
>
>>
>> -boris
>>
>>> +		if (ret < 0) {
>>> +			DRM_ERROR("Cannot allocate %zu ballooned pages: %d\n",
>>> +					xen_obj->num_pages, ret);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		return xen_obj;
>>> +	}
>>> +	/*
>>> +	 * need to allocate backing pages now, so we can share those
>>> +	 * with the backend
>>> +	 */
>>> +	xen_obj->num_pages = DIV_ROUND_UP(size, PAGE_SIZE);
>>> +	xen_obj->pages = drm_gem_get_pages(&xen_obj->base);
>>> +	if (IS_ERR_OR_NULL(xen_obj->pages)) {
>>> +		ret = PTR_ERR(xen_obj->pages);
>>> +		xen_obj->pages = NULL;
>>> +		goto fail;
>>> +	}
>>> +
>>> +	return xen_obj;
>>> +
>>> +fail:
>>> +	DRM_ERROR("Failed to allocate buffer with size %zu\n", size);
>>> +	return ERR_PTR(ret);
>>> +}
>>> +
>>>
>
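
Boris's review point is that xen_obj->be_alloc is set before the allocations it describes have succeeded. A minimal userspace sketch of the ordering he suggests: compute the page count the way round_up()/DIV_ROUND_UP() do, allocate, record the mode only after success, and unwind on failure. All names here (model_gem_obj, model_gem_create(), the fail_alloc knob) are hypothetical; calloc() stands in for both the page-array and the page allocations, and this is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL

struct model_gem_obj {
	size_t size;      /* size rounded up to whole pages */
	size_t num_pages;
	void **pages;
	bool be_alloc;    /* set only once the backing state is in place */
};

/* Same arithmetic as DIV_ROUND_UP(n, d) for unsigned values. */
static size_t model_div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/* fail_alloc lets a caller simulate an allocation failure. */
static struct model_gem_obj *model_gem_create(size_t size, bool be_alloc,
					      bool fail_alloc)
{
	struct model_gem_obj *obj = calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;

	obj->num_pages = model_div_round_up(size, MODEL_PAGE_SIZE);
	obj->size = obj->num_pages * MODEL_PAGE_SIZE; /* like round_up() */
	assert(obj->size >= size);

	obj->pages = fail_alloc ? NULL : calloc(obj->num_pages, sizeof(void *));
	if (!obj->pages)
		goto fail;

	/* Record the allocation mode only after everything succeeded. */
	obj->be_alloc = be_alloc;
	return obj;

fail:
	free(obj->pages); /* free(NULL) is a no-op */
	free(obj);
	return NULL;
}

static void model_gem_free(struct model_gem_obj *obj)
{
	if (!obj)
		return;
	free(obj->pages);
	free(obj);
}
```

With this ordering, a failed create never leaves behind an object that claims to be backend-allocated, so the cleanup path does not have to guess which resources a flag refers to.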

Re: [PATCH] rcu: Remove the unnecessary separate function, rcu_preempt_do_callback()

2018-02-26 Thread Byungchul Park


On 2/27/2018 8:35 AM, Byungchul Park wrote:
> On 2/27/2018 3:22 AM, Paul E. McKenney wrote:
>> On Mon, Feb 26, 2018 at 12:15:14PM -0500, Steven Rostedt wrote:
>>> On Mon, 26 Feb 2018 14:11:36 +0900
>>> Byungchul Park  wrote:
>>>
>>>> rcu_preemptp_do_callback() was introduced in commit 09223371dea(rcu:
>>>> Use softirq to address performance regression), where it had to be
>>>> distinguished between in the case CONFIG_TREE_PREEMPT_RCU is set and
>>>> it's not.
>>>>
>>>> Now that the code was cleaned up so that rcu_preemt_do_callback() is
>>>> only called in rcu_kthread_do_work() in the same file, tree_plugin.h,
>>>> we don't have to keep the separate function anymore. Remove it for a
>>>> better readability.
>>>
>>> Looks good to me (looks like commit f8b7fc6b51 "rcu: use softirq
>>> instead of kthreads except when RCU_BOOST=y" cleaned up the ifdefs and
>>> removed the requirement).
>>>
>>> Reviewed-by: Steven Rostedt (VMware) 
>>
>> Thank you both!  I have queued a slightly modified patch for testing
>> and further review.  Please see below and let me know if I messed
>> something up.
>>
>>     Thanx, Paul
>>
>> commit b8a3012ddba397d4a18d9fd4a00432f8c2626bd6
>> Author: Byungchul Park 
>> Date:   Mon Feb 26 14:11:36 2018 +0900
>>
>>     rcu: Inline rcu_preempt_do_callback() into its sole caller
>>
>>     The rcu_preempt_do_callbacks() function was introduced in commit
>>     09223371dea(rcu: Use softirq to address performance regression), where it
>>     was necessary to handle kernel builds both containing and not containing
>>     RCU-preempt.  Since then, various changes (most notably f8b7fc6b51
>>     ("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have
>>     resulted in this function being invoked only from rcu_kthread_do_work(),
>>     which is present only in kernels containing RCU-preempt, which in turn
>>     means that the rcu_preempt_do_callbacks() function is no longer needed.
>>
>>     This commit therefore inlines rcu_preempt_do_callbacks() into its
>>     sole remaining caller and also removes the rcu_state_p and rcu_data_p
>>     indirection for added clarity.
>>
>>     Signed-off-by: Byungchul Park 
>>     Reviewed-by: Steven Rostedt (VMware) 
>>     [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
>>
>> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
>> index dc6f2319fc21..9dd0ea77faed 100644
>> --- a/kernel/rcu/tree.h
>> +++ b/kernel/rcu/tree.h
>> @@ -449,7 +449,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
>>  static void invoke_rcu_callbacks_kthread(void);
>>  static bool rcu_is_callbacks_kthread(void);
>>  #ifdef CONFIG_RCU_BOOST
>> -static void rcu_preempt_do_callbacks(void);
>>  static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
>>  						 struct rcu_node *rnp);
>>  #endif /* #ifdef CONFIG_RCU_BOOST */
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 26d7a31e81cb..b0d7f9ba6bf2 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -686,15 +686,6 @@ static void rcu_preempt_check_callbacks(void)
>>  		t->rcu_read_unlock_special.b.need_qs = true;
>>  }
>>  
>> -#ifdef CONFIG_RCU_BOOST
>> -
>> -static void rcu_preempt_do_callbacks(void)
>> -{
>> -	rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
>> -}
>> -
>> -#endif /* #ifdef CONFIG_RCU_BOOST */
>> -
>>  /**
>>   * call_rcu() - Queue an RCU callback for invocation after a grace period.
>>   * @head: structure to be used for queueing the RCU updates.
>> @@ -1170,7 +1161,7 @@ static void rcu_kthread_do_work(void)
>>  {
>>  	rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
>>  	rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
>> -	rcu_preempt_do_callbacks();
>> +	rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
>
> OMG. Sorry for the mistake and thank you very much for fixing it.
>
> I will be more careful.

Ah. Logically no difference between mine and your fixed one.

But anyway yours looks much better! Thank you~ :)

--
Thanks,
Byungchul

Re: [PATCH] rcu: Remove the unnecessary separate function, rcu_preempt_do_callback()

2018-02-26 Thread Byungchul Park


On 2/27/2018 3:22 AM, Paul E. McKenney wrote:
> On Mon, Feb 26, 2018 at 12:15:14PM -0500, Steven Rostedt wrote:
>> On Mon, 26 Feb 2018 14:11:36 +0900
>> Byungchul Park  wrote:
>>
>>> rcu_preemptp_do_callback() was introduced in commit 09223371dea(rcu:
>>> Use softirq to address performance regression), where it had to be
>>> distinguished between in the case CONFIG_TREE_PREEMPT_RCU is set and
>>> it's not.
>>>
>>> Now that the code was cleaned up so that rcu_preemt_do_callback() is
>>> only called in rcu_kthread_do_work() in the same file, tree_plugin.h,
>>> we don't have to keep the separate function anymore. Remove it for a
>>> better readability.
>>
>> Looks good to me (looks like commit f8b7fc6b51 "rcu: use softirq
>> instead of kthreads except when RCU_BOOST=y" cleaned up the ifdefs and
>> removed the requirement).
>>
>> Reviewed-by: Steven Rostedt (VMware) 
>
> Thank you both!  I have queued a slightly modified patch for testing
> and further review.  Please see below and let me know if I messed
> something up.
>
> Thanx, Paul
>
> commit b8a3012ddba397d4a18d9fd4a00432f8c2626bd6
> Author: Byungchul Park 
> Date:   Mon Feb 26 14:11:36 2018 +0900
>
>     rcu: Inline rcu_preempt_do_callback() into its sole caller
>
>     The rcu_preempt_do_callbacks() function was introduced in commit
>     09223371dea(rcu: Use softirq to address performance regression), where it
>     was necessary to handle kernel builds both containing and not containing
>     RCU-preempt.  Since then, various changes (most notably f8b7fc6b51
>     ("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have
>     resulted in this function being invoked only from rcu_kthread_do_work(),
>     which is present only in kernels containing RCU-preempt, which in turn
>     means that the rcu_preempt_do_callbacks() function is no longer needed.
>
>     This commit therefore inlines rcu_preempt_do_callbacks() into its
>     sole remaining caller and also removes the rcu_state_p and rcu_data_p
>     indirection for added clarity.
>
>     Signed-off-by: Byungchul Park 
>     Reviewed-by: Steven Rostedt (VMware) 
>     [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index dc6f2319fc21..9dd0ea77faed 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -449,7 +449,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
>  static void invoke_rcu_callbacks_kthread(void);
>  static bool rcu_is_callbacks_kthread(void);
>  #ifdef CONFIG_RCU_BOOST
> -static void rcu_preempt_do_callbacks(void);
>  static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
>  						 struct rcu_node *rnp);
>  #endif /* #ifdef CONFIG_RCU_BOOST */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 26d7a31e81cb..b0d7f9ba6bf2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -686,15 +686,6 @@ static void rcu_preempt_check_callbacks(void)
>  		t->rcu_read_unlock_special.b.need_qs = true;
>  }
>  
> -#ifdef CONFIG_RCU_BOOST
> -
> -static void rcu_preempt_do_callbacks(void)
> -{
> -	rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
> -}
> -
> -#endif /* #ifdef CONFIG_RCU_BOOST */
> -
>  /**
>   * call_rcu() - Queue an RCU callback for invocation after a grace period.
>   * @head: structure to be used for queueing the RCU updates.
> @@ -1170,7 +1161,7 @@ static void rcu_kthread_do_work(void)
>  {
>  	rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
>  	rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
> -	rcu_preempt_do_callbacks();
> +	rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));

OMG. Sorry for the mistake and thank you very much for fixing it.

I will be more careful.

--
Thanks,
Byungchul

Re: [GIT PULL] tpmdd fixes for 4.16

2018-02-26 Thread James Morris

On Mon, 26 Feb 2018, James Bottomley wrote:

> On Tue, 2018-02-27 at 05:52 +1100, James Morris wrote:
> > On Mon, 26 Feb 2018, Jarkko Sakkinen wrote:
> > 
> > > 
> > > Hi
> > > 
> > > Here is a batch of critical fixes for 4.16.
> > > 
> > 
> > Do you have CVEs for these?  If so, please include them in the commit
> > messages.
> 
> Heh, well, I suppose from this everyone now knows they're fixes for
> vulnerabilities.  The CVE doesn't usually show up until after the
> embargo is released, so we don't have one yet.

Anyone with half a brain can read Linus' -rc commits and work it out.

-- 
James Morris

Re: [alsa-devel] regression v4.16 on Nokia N900: sound does not work

2018-02-26 Thread Pavel Machek

Hi!

> > >> JFYI: This issue is tracked in the regression reports for Linux 4.16
> > >> (http://bit.ly/lnxregrep416 ) with this id:
> > >>
> > >> Linux-Regression-ID: lr#4b650f
> > >
> > > Ok, so it seems the issue is bigger: the whole sound subsystem does not
> > > work. /proc/asound/cards is empty.
> > >
> > > 7e6127c1240ed569cdda2a67c8f03836f9f28c05 seems to be bad already.
> > >
> > > I tried reverting the sound/soc changes, and sound is still broken. Nasty.
> > 
> > 
> > dmesg log?
> 
> Partial dmesg is at:
> https://github.com/pavelmachek/missy/blob/master/db/phone/nokia/n900/pavel/2018.1291171648263/dmesg.out
> 
> I should be able to get a full one...
> 
> I did git bisect, and the winner seems to be:
> 
> pavel@duo:/data/l/linux-n900$ git bisect bad
> c85823390215e52d68d3826df92a447ed31e5c80 is the first bad commit
> commit c85823390215e52d68d3826df92a447ed31e5c80
> Author: Linus Walleij 
> Date:   Wed Dec 27 16:37:44 2017 +0100

I reverted it on top of v4.16-rc2, and sound now works. Ideas?

(Aha, and I see I made a small mistake reverting... but...)

Pavel

diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index 564bb7a..50cc590 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -157,36 +157,6 @@ int of_get_named_gpio_flags(struct device_node *np, const char *list_name,
 EXPORT_SYMBOL(of_get_named_gpio_flags);
 
 /*
- * The SPI GPIO bindings happened before we managed to establish that GPIO
- * properties should be named "foo-gpios" so we have this special kludge for
- * them.
- */
-static struct gpio_desc *of_find_spi_gpio(struct device *dev, const char *con_id,
- enum of_gpio_flags *of_flags)
-{
-   char prop_name[32]; /* 32 is max size of property name */
-   struct device_node *np = dev->of_node;
-   struct gpio_desc *desc;
-
-   /*
-* Hopefully the compiler stubs the rest of the function if this
-* is false.
-*/
-   if (!IS_ENABLED(CONFIG_SPI_MASTER))
-   return ERR_PTR(-ENOENT);
-
-   /* Allow this specifically for "spi-gpio" devices */
-   if (!of_device_is_compatible(np, "spi-gpio") || !con_id)
-   return ERR_PTR(-ENOENT);
-
-   /* Will be "gpio-sck", "gpio-mosi" or "gpio-miso" */
-   snprintf(prop_name, sizeof(prop_name), "%s-%s", "gpio", con_id);
-
-   desc = of_get_named_gpiod_flags(np, prop_name, 0, of_flags);
-   return desc;
-}
-
-/*
  * Some regulator bindings happened before we managed to establish that GPIO
  * properties should be named "foo-gpios" so we have this special kludge for
  * them.
@@ -230,7 +200,6 @@ struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
struct gpio_desc *desc;
unsigned int i;
 
-   /* Try GPIO property "foo-gpios" and "foo-gpio" */
for (i = 0; i < ARRAY_SIZE(gpio_suffixes); i++) {
if (con_id)
snprintf(prop_name, sizeof(prop_name), "%s-%s", con_id,
@@ -245,14 +214,6 @@ struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
break;
}
 
-   /* Special handling for SPI GPIOs if used */
-   if (IS_ERR(desc))
-   desc = of_find_spi_gpio(dev, con_id, &of_flags);
-
-   /* Special handling for regulator GPIOs if used */
-   if (IS_ERR(desc))
-   desc = of_find_regulator_gpio(dev, con_id, &of_flags);
-
if (IS_ERR(desc))
return desc;


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
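
For readers following the bisect: before the revert, of_find_gpio() tried GPIO property names in a fixed order ("foo-gpios", then "foo-gpio"), and the reverted commit added a fallback to the legacy "gpio-<con_id>" name for "spi-gpio" nodes. The sketch below models only that naming order in plain C. gpio_candidate_names() is a hypothetical helper: it does not look anything up in a device tree, and the separate regulator kludge is not modelled.

```c
#include <stddef.h>
#include <stdio.h>

#define PROP_NAME_MAX 32 /* same limit as the "32 is max size of property name" comment */

/* Suffixes tried in order, as in of_find_gpio(). */
static const char *const gpio_suffixes[] = { "gpios", "gpio" };

/*
 * Hypothetical helper: fill `out` with the property names that would be
 * tried for `con_id`, in order, and return how many were written.
 * When `is_spi_gpio` is nonzero (node compatible with "spi-gpio"), the
 * legacy "gpio-<con_id>" name from the reverted commit is appended last.
 */
static size_t gpio_candidate_names(const char *con_id, int is_spi_gpio,
				   char out[][PROP_NAME_MAX], size_t max)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < sizeof(gpio_suffixes) / sizeof(gpio_suffixes[0]); i++) {
		if (n == max)
			return n;
		if (con_id)
			snprintf(out[n++], PROP_NAME_MAX, "%s-%s", con_id, gpio_suffixes[i]);
		else
			snprintf(out[n++], PROP_NAME_MAX, "%s", gpio_suffixes[i]);
	}

	/* SPI kludge: only for spi-gpio nodes that pass a connection id. */
	if (is_spi_gpio && con_id && n < max)
		snprintf(out[n++], PROP_NAME_MAX, "gpio-%s", con_id);

	return n;
}
```

For example, con_id "sck" on a spi-gpio node yields "sck-gpios", "sck-gpio", then the legacy "gpio-sck". The first name that exists in the device tree wins.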



Re: linux-next: manual merge of the bpf-next tree with the bpf tree

2018-02-26 Thread Stephen Rothwell

Hi Dave,

On Mon, 26 Feb 2018 11:41:47 +1100 Stephen Rothwell wrote:
>
> Today's linux-next merge of the bpf-next tree got a conflict in:
> 
>   tools/testing/selftests/bpf/test_verifier.c
> 
> between commit:
> 
>   ca36960211eb ("bpf: allow xadd only on aligned memory")
> 
> from the bpf tree and commit:
> 
>   23d191a82c13 ("bpf: add various jit test cases")
> 
> from the bpf-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc tools/testing/selftests/bpf/test_verifier.c
> index 437c0b1c9d21,c987d3a2426f..
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@@ -11163,64 -11140,95 +11166,153 @@@ static struct bpf_test tests[] = 
>   .result = REJECT,
>   .prog_type = BPF_PROG_TYPE_TRACEPOINT,
>   },
>  +{
>  +"xadd/w check unaligned stack",
>  +.insns = {
>  +BPF_MOV64_IMM(BPF_REG_0, 1),
>  +BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
>  +BPF_STX_XADD(BPF_W, BPF_REG_10, BPF_REG_0, -7),
>  +BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
>  +BPF_EXIT_INSN(),
>  +},
>  +.result = REJECT,
>  +.errstr = "misaligned stack access off",
>  +.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  +},
>  +{
>  +"xadd/w check unaligned map",
>  +.insns = {
>  +BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>  +BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>  +BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>  +BPF_LD_MAP_FD(BPF_REG_1, 0),
>  +BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>  + BPF_FUNC_map_lookup_elem),
>  +BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
>  +BPF_EXIT_INSN(),
>  +BPF_MOV64_IMM(BPF_REG_1, 1),
>  +BPF_STX_XADD(BPF_W, BPF_REG_0, BPF_REG_1, 3),
>  +BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 3),
>  +BPF_EXIT_INSN(),
>  +},
>  +.fixup_map1 = { 3 },
>  +.result = REJECT,
>  +.errstr = "misaligned value access off",
>  +.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  +},
>  +{
>  +"xadd/w check unaligned pkt",
>  +.insns = {
>  +BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
>  +offsetof(struct xdp_md, data)),
>  +BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
>  +offsetof(struct xdp_md, data_end)),
>  +BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),
>  +BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
>  +BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 2),
>  +BPF_MOV64_IMM(BPF_REG_0, 99),
>  +BPF_JMP_IMM(BPF_JA, 0, 0, 6),
>  +BPF_MOV64_IMM(BPF_REG_0, 1),
>  +BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
>  +BPF_ST_MEM(BPF_W, BPF_REG_2, 3, 0),
>  +BPF_STX_XADD(BPF_W, BPF_REG_2, BPF_REG_0, 1),
>  +BPF_STX_XADD(BPF_W, BPF_REG_2, BPF_REG_0, 2),
>  +BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 1),
>  +BPF_EXIT_INSN(),
>  +},
>  +.result = REJECT,
>  +.errstr = "BPF_XADD stores into R2 packet",
>  +.prog_type = BPF_PROG_TYPE_XDP,
>  +},
> + {
> + "jit: lsh, rsh, arsh by 1",
> + .insns = {
> + BPF_MOV64_IMM(BPF_REG_0, 1),
> + BPF_MOV64_IMM(BPF_REG_1, 0xff),
> + BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 1),
> + BPF_ALU32_IMM(BPF_LSH, BPF_REG_1, 1),
> + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x3fc, 1),
> + BPF_EXIT_INSN(),
> + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 1),
> + BPF_ALU32_IMM(BPF_RSH, BPF_REG_1, 1),
> + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0xff, 1),
> + BPF_EXIT_INSN(),
> + BPF_ALU64_IMM(BPF_ARSH, BPF_REG_1, 1),
> + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x7f, 1),
> + BPF_EXIT_INSN(),
> + BPF_MOV64_IMM(BPF_REG_0, 2),
> + BPF_EXIT_INSN(),
> + },
> + .result = ACCEPT,
>
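
The constants in the "jit: lsh, rsh, arsh by 1" test follow from tracking R1 through each shift. The sketch below is a minimal plain-C model of that trace. It assumes eBPF's rule that 32-bit ALU ops zero the upper half of the destination register. jit_shift_by_one_model() is a hypothetical name, and the sketch is not part of the selftest harness.

```c
#include <stdint.h>

/*
 * Plain-C model of the "jit: lsh, rsh, arsh by 1" program.
 * Returns what the program leaves in R0: 2 if every intermediate value
 * matched, 1 if any BPF_JEQ check fell through to the early exit.
 */
static uint64_t jit_shift_by_one_model(void)
{
	uint64_t r0 = 1;
	uint64_t r1 = 0xff;

	r1 <<= 1;                              /* ALU64 LSH: 0x1fe */
	r1 = (uint32_t)((uint32_t)r1 << 1);    /* ALU32 LSH: 0x3fc, upper 32 bits zeroed */
	if (r1 != 0x3fc)
		return r0;

	r1 >>= 1;                              /* ALU64 RSH: 0x1fe */
	r1 = (uint32_t)((uint32_t)r1 >> 1);    /* ALU32 RSH: 0xff */
	if (r1 != 0xff)
		return r0;

	r1 = (uint64_t)((int64_t)r1 >> 1);     /* ALU64 ARSH: 0x7f, sign bit is clear */
	if (r1 != 0x7f)
		return r0;

	r0 = 2;
	return r0;
}
```

A JIT that emits these shifts correctly should therefore leave 2 in R0. A value of 1 means one of the comparisons failed.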

[PATCH] x86/mm/sme: Disable stack protection for mem_encrypt_identity.c

2018-02-26 Thread Tom Lendacky

Stack protection is not compatible with early boot code.  All of the early
SME boot code is now isolated in a separate file, mem_encrypt_identity.c,
so arch/x86/mm/Makefile can be updated to turn off stack protection for
the entire file.  This eliminates the need to worry about other functions
within the file being instrumented with stack protection (as was seen
when a newer version of GCC instrumented sme_encrypt_kernel() where an
older version hadn't).  It also allows removal of the __nostackprotector
attribute from individual functions.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/Makefile               | 1 +
 arch/x86/mm/mem_encrypt_identity.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 03c6c85..4b101dd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -19,6 +19,7 @@ obj-y :=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
 nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o  := $(nostackp)
 CFLAGS_setup_nx.o  := $(nostackp)
+CFLAGS_mem_encrypt_identity.o  := $(nostackp)
 
 CFLAGS_fault.o := -I$(src)/../include/asm/trace
 
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index b4139c5..1b2197d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -266,7 +266,7 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
return entries + tables;
 }
 
-void __init __nostackprotector sme_encrypt_kernel(struct boot_params *bp)
+void __init sme_encrypt_kernel(struct boot_params *bp)
 {
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
@@ -468,7 +468,7 @@ void __init __nostackprotector sme_encrypt_kernel(struct boot_params *bp)
native_write_cr3(__native_read_cr3());
 }
 
-void __init __nostackprotector sme_enable(struct boot_params *bp)
+void __init sme_enable(struct boot_params *bp)
 {
const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
unsigned int eax, ebx, ecx, edx;

Re: [PATCH 2/2] kbuild: simplify ld-option implementation

2018-02-26 Thread Nick Desaulniers

Nice! Now we don't need to invoke $CC to find out info about linker support.

Signed-off-by: Nick Desaulniers 
Tested-by: Nick Desaulniers 

On Thu, Feb 22, 2018 at 8:57 PM Masahiro Yamada <yamada.masah...@socionext.com> wrote:

> Currently, linker options are tested by the coordination of $(CC) and
> $(LD) because LD needs some object to link.

> As commit 86a9df597cdd ("kbuild: fix linker feature test macros when
> cross compiling with Clang") addressed, we need to make sure $(CC)
> and $(LD) agree on the underlying architecture of the passed object.

> This could be a bit complex when we combine tools from different groups.
> For example, we can use clang for $(CC), but we still need to rely on
> the GCC toolchain for $(LD).

> So, I was searching for a way to test linker options standalone.
> A trick I found is to use '-v'.  This prints the version string, but
> also tests if the given option is recognized.

> If a given option is supported,

>$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
>GNU ld (Linaro_Binutils-2017.11) 2.28.2.20170706
>$ echo $?
>0

> If unsupported,

>$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
>GNU ld (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC
> 2013.04) 2.23.1
>aarch64-linux-gnu-ld: unrecognized option '--fix-cortex-a53-843419'
>aarch64-linux-gnu-ld: use the --help option for usage information
>$ echo $?
>1

> Gold works likewise.

>$ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-843419
>GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
>masahiro@pug:~/ref/linux$ echo $?
>0
>$ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-99
>GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
>aarch64-linux-gnu-ld.gold: --fix-cortex-a53-99: unknown option
>aarch64-linux-gnu-ld.gold: use the --help option for usage information
>$ echo $?
>1

> LLD too.

>$ ld.lld -v --gc-sections
>LLD 7.0.0 (http://llvm.org/git/lld.git
> 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
>$ echo $?
>0
>$ ld.lld -v --fix-cortex-a53-843419
>LLD 7.0.0 (http://llvm.org/git/lld.git
> 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
>$ echo $?
>0
>$ ld.lld -v --fix-cortex-a53-99
>ld.lld: error: unknown argument: --fix-cortex-a53-99
>LLD 7.0.0 (http://llvm.org/git/lld.git
> 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
>$ echo $?
>1

> Signed-off-by: Masahiro Yamada 
> ---

>   scripts/Kbuild.include | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)

> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index 34cbd81..f9c2f07 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -237,9 +237,7 @@ cc-ldoption = $(call try-run-cached,\

>   # ld-option
>   # Usage: LDFLAGS += $(call ld-option, -X)
> -ld-option = $(call try-run-cached,\
> -   $(CC) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -x c /dev/null -c -o "$$TMPO"; \
> -   $(LD) $(LDFLAGS) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
> +ld-option = $(call try-run-cached, $(LD) $(LDFLAGS) $(1) -v,$(1),$(2))

>   # ar-option
>   # Usage: KBUILD_ARFLAGS := $(call ar-option,D)
> --
> 2.7.4



-- 
Thanks,
~Nick Desaulniers
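
The same probe can be reproduced outside Kbuild. Below is a minimal C sketch, assuming a POSIX shell is available. ld_option_supported() is a hypothetical helper that runs "<ld> <option> -v" with its output discarded. It treats exit status 0 as "supported", mirroring the new ld-option macro, but it does not reproduce try-run-cached's caching. The arguments are pasted into a shell command unquoted, so pass only trusted strings.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Returns 1 if "<ld> <option> -v" exits with status 0 (the linker
 * accepted the option), 0 otherwise. Output is discarded.
 */
static int ld_option_supported(const char *ld, const char *option)
{
	char cmd[512];
	int status;
	int n;

	/* Option first, then -v, as in: $(LD) $(LDFLAGS) $(1) -v */
	n = snprintf(cmd, sizeof(cmd), "%s %s -v >/dev/null 2>&1", ld, option);
	if (n < 0 || (size_t)n >= sizeof(cmd))
		return 0; /* command would be truncated: treat as unsupported */

	status = system(cmd);
	if (status == -1 || !WIFEXITED(status))
		return 0; /* shell could not run, or was killed by a signal */

	return WEXITSTATUS(status) == 0;
}
```

Going by the transcripts quoted above, ld_option_supported("aarch64-linux-gnu-ld", "--fix-cortex-a53-843419") would return 1 with binutils 2.28.2 and 0 with 2.23.1.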

Re: linux-next: Signed-off-by missing for commit in the rcu tree

2018-02-26 Thread Paul E. McKenney

On Tue, Feb 27, 2018 at 09:38:16AM +1100, Stephen Rothwell wrote:
> Hi Paul,
> 
> Commit
> 
>   2a84d1aef423 ("rcu: Inline rcu_preempt_do_callback() into its sole caller")
> 
> is missing a Signed-off-by from its committer.

That would be because idiot here left the "-s" off of "git am"...

Apologies for the hassle, rebased to fix.

Thanx, Paul

[PATCH] sparc64: Oracle DAX driver depends on SPARC64

2018-02-26 Thread Guenter Roeck

sparc:allmodconfig fails to build as follows.

ERROR: "mdesc_release" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_hvapi_register" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_get_property" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_node_by_name" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_grab" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_info" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_submit" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_kill" [drivers/sbus/char/oradax.ko] undefined!

The symbols are only available in SPARC64 builds, so the driver must
depend on SPARC64.

Fixes: dd0273284c74 ("sparc64: Oracle DAX driver")
Cc: Kees Cook 
Signed-off-by: Guenter Roeck 
---
 drivers/sbus/char/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/sbus/char/Kconfig b/drivers/sbus/char/Kconfig
index a785aa7660c3..bf3c5f735614 100644
--- a/drivers/sbus/char/Kconfig
+++ b/drivers/sbus/char/Kconfig
@@ -72,7 +72,8 @@ config DISPLAY7SEG
 
 config ORACLE_DAX
tristate "Oracle Data Analytics Accelerator"
-   default m if SPARC64
+   depends on SPARC64
+   default m
help
 Driver for Oracle Data Analytics Accelerator, which is
 a coprocessor that performs database operations in hardware.
-- 
2.7.4

[PATCH] PCI: Move declaration of of_irq_parse_and_map_pci under OF_IRQ

2018-02-26 Thread Guenter Roeck

Since commit 4670d610d5923 ("PCI: Move OF-related PCI functions into
PCI core"), sparc:allmodconfig fails to build with the following error.

pcie-cadence-host.c:(.text+0x4c4):
undefined reference to `of_irq_parse_and_map_pci'
pcie-cadence-host.c:(.text+0x4c8):
undefined reference to `of_irq_parse_and_map_pci'

of_irq_parse_and_map_pci is now only available if OF_IRQ is enabled.
Make its declaration and its dummy function dependent on OF_IRQ
to solve the problem.

Fixes: 4670d610d5923 ("PCI: Move OF-related PCI functions into PCI core")
Cc: Rob Herring 
Signed-off-by: Guenter Roeck 
---
 include/linux/of_pci.h | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 88865e0ebf4d..091033a6b836 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -13,7 +13,6 @@ struct device_node;
 struct device_node *of_pci_find_child_device(struct device_node *parent,
 unsigned int devfn);
 int of_pci_get_devfn(struct device_node *np);
-int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin);
 int of_pci_parse_bus_range(struct device_node *node, struct resource *res);
 int of_get_pci_domain_nr(struct device_node *node);
 int of_pci_get_max_link_speed(struct device_node *node);
@@ -34,12 +33,6 @@ static inline int of_pci_get_devfn(struct device_node *np)
 }
 
 static inline int
-of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin)
-{
-   return 0;
-}
-
-static inline int
 of_pci_parse_bus_range(struct device_node *node, struct resource *res)
 {
return -EINVAL;
@@ -67,6 +60,16 @@ of_pci_get_max_link_speed(struct device_node *node)
 static inline void of_pci_check_probe_only(void) { }
 #endif
 
+#if IS_ENABLED(CONFIG_OF_IRQ)
+int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin);
+#else
+static inline int
+of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+   return 0;
+}
+#endif
+
 #if defined(CONFIG_OF_ADDRESS)
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 unsigned char busno, unsigned char bus_max,
-- 
2.7.4
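The fix uses the common kernel-header pattern of a real prototype when a config option is enabled and a static inline stub otherwise. Below is a minimal userspace sketch of that pattern, assuming a plain `#if` on a `CONFIG_OF_IRQ` macro in place of the kernel's `IS_ENABLED()` (which additionally treats `=m` options as enabled). The `struct pci_dev` stand-in, the placeholder IRQ arithmetic and `driver_map_irq()` are hypothetical and exist only so the sketch builds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch only: CONFIG_OF_IRQ stands in for the Kconfig symbol. The kernel
 * header tests IS_ENABLED(CONFIG_OF_IRQ); a plain #if is used here so the
 * file builds in userspace. Pass -DCONFIG_OF_IRQ=1 for the "enabled" path.
 */
#ifndef CONFIG_OF_IRQ
#define CONFIG_OF_IRQ 0
#endif

typedef unsigned char u8;
struct pci_dev { unsigned int devfn; };   /* minimal hypothetical stand-in */

#if CONFIG_OF_IRQ
/* Enabled: real prototype, so a definition must be linked in. */
int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin);

/* Hypothetical placeholder definition so this sketch links. */
int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin)
{
	return 32 + (int)dev->devfn + slot + pin;
}
#else
/* Disabled: inline stub, so callers still compile and link. */
static inline int
of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin)
{
	(void)dev;
	(void)slot;
	(void)pin;
	return 0;	/* same value the original stub returns */
}
#endif

/* Hypothetical caller, standing in for a host driver's IRQ-mapping hook. */
int driver_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
{
	assert(dev != NULL);
	return of_irq_parse_and_map_pci(dev, slot, pin);
}
```

Built without -DCONFIG_OF_IRQ=1, `driver_map_irq()` links against the stub and returns 0. That is the path the patch makes available to configurations such as sparc:allmodconfig, where, judging from the link errors, OF_IRQ is not enabled.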

Re: [alsa-devel] regression v4.16 on Nokia N900: sound does not work

2018-02-26 Thread Pavel Machek

On Mon 2018-02-26 16:02:22, Daniel Baluta wrote:
> On Mon, Feb 26, 2018 at 3:13 PM, Pavel Machek  wrote:
> > Hi!
> >
> >> JFYI: This issue is tracked in the regression reports for Linux 4.16
> >> (http://bit.ly/lnxregrep416 ) with this id:
> >>
> >> Linux-Regression-ID: lr#4b650f
> >
> > OK, so it seems the issue is bigger: the whole sound subsystem does not
> > work. /proc/asound/cards is empty.
> >
> > 7e6127c1240ed569cdda2a67c8f03836f9f28c05 seems to be bad already.
> >
> > I tried reverting the sound/soc changes, and sound is still broken. Nasty.
> 
> 
> dmesg log?

Partial dmesg is at:
https://github.com/pavelmachek/missy/blob/master/db/phone/nokia/n900/pavel/2018.1291171648263/dmesg.out

I should be able to get a full one...

I did a git bisect, and the winner seems to be:

pavel@duo:/data/l/linux-n900$ git bisect bad
c85823390215e52d68d3826df92a447ed31e5c80 is the first bad commit
commit c85823390215e52d68d3826df92a447ed31e5c80
Author: Linus Walleij 
Date:   Wed Dec 27 16:37:44 2017 +0100

gpio: of: Support SPI nonstandard GPIO properties

Before it was clearly established that all GPIO properties in the
device tree shall be named "foo-gpios" (with the deprecated variant
"foo-gpio" for single lines), we unfortunately merged a few bindings
which named the lines "gpio-foo" instead.

This is most prominent in the GPIO SPI driver in Linux which names
the lines "gpio-sck", "gpio-mosi" and "gpio-miso".

As we want to switch the GPIO SPI driver to using descriptors, we
need devm_gpiod_get() to return something reasonable when looking
these up in the device tree.

Put in a special #ifdef:ed kludge that does this special lookup only
for the SPI case and is compiled out if SPI is not enabled. If we
have more oddly defined legacy GPIOs like this, they can be handled
in a similar manner.

Reviewed-by: Rob Herring 
Signed-off-by: Linus Walleij 

Unfortunately, it does not seem to revert cleanly on my v4.16 branch.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
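The lookup order described in the quoted commit message can be modeled in a few lines. This is a hypothetical sketch, not gpiolib code: `struct fake_node`, `has_prop()` and `find_gpio_prop()` are illustrative names, and a boolean flag stands in for however the real code recognizes an SPI GPIO node. It tries "<con_id>-gpios", then the deprecated "<con_id>-gpio", and only for such nodes falls back to the legacy "gpio-<con_id>" form (e.g. "gpio-sck").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a device-tree node: just its property names. */
struct fake_node {
	const char *const *props;	/* property names present on the node */
	size_t nprops;
	bool is_spi_gpio;		/* stands in for an SPI GPIO node check */
};

static bool has_prop(const struct fake_node *np, const char *name)
{
	for (size_t i = 0; i < np->nprops; i++)
		if (strcmp(np->props[i], name) == 0)
			return true;
	return false;
}

/*
 * Write the first matching GPIO property name for con_id into buf.
 * Returns 0 on success, -1 if no candidate name exists on the node.
 */
int find_gpio_prop(const struct fake_node *np, const char *con_id,
		   char *buf, size_t len)
{
	static const char *const suffixes[] = { "gpios", "gpio" };

	assert(np != NULL && con_id != NULL && buf != NULL);

	/* Standard "<con_id>-gpios", then deprecated "<con_id>-gpio". */
	for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
		snprintf(buf, len, "%s-%s", con_id, suffixes[i]);
		if (has_prop(np, buf))
			return 0;
	}

	/* Legacy "gpio-<con_id>" form, tried only for SPI GPIO nodes. */
	if (np->is_spi_gpio) {
		snprintf(buf, len, "gpio-%s", con_id);
		if (has_prop(np, buf))
			return 0;
	}
	return -1;
}
```

For a node carrying only "gpio-sck", `find_gpio_prop(&node, "sck", buf, sizeof(buf))` returns 0 and leaves "gpio-sck" in `buf`; with the SPI flag cleared, the same call returns -1. Restricting the fallback to one node type mirrors the commit's intent of keeping the kludge confined to the SPI case.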
