date:20210106

Re: [PATCH v2] x86: fix movdir64b() sparse warning

2021-01-06 Thread Borislav Petkov

On Wed, Jan 06, 2021 at 03:40:25PM -0700, Dave Jiang wrote:

> Subject: Re: [PATCH v2] x86: fix movdir64b() sparse warning

There are a lot of times I don't agree with checkpatch but this time I do:

WARNING: A patch subject line should describe the change not the tool that 
found it
#2: 
Subject: [PATCH v2] x86: fix movdir64b() sparse warning

Pls fix your other patch subject too.

> Add missing __iomem annotation to address sparse warning. Caller is expected
> to pass an __iomem annotated pointer to this function. The current usages
> send a 64bytes command descriptor to an MMIO location (portal) on a
> device for consumption. When future usages for MOVDIR64B instruction show
> up in kernel for memory to memory operation is needed, we can revisit.

Who's "we"?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 3/5] crypto: add RFC5869 HKDF

2021-01-06 Thread Stephan Mueller

On Wednesday, 06.01.2021 at 23:30 -0800, Eric Biggers wrote:
> On Mon, Jan 04, 2021 at 10:49:13PM +0100, Stephan Müller wrote:
> > RFC5869 specifies an extract and expand two-step key derivation
> > function. The HKDF implementation is provided as a service function that
> > operates on a caller-provided HMAC cipher handle.
> 
> HMAC isn't a "cipher".
> 
> > The extract function is invoked via the crypto_hkdf_setkey call.
> 
> Any reason not to call this crypto_hkdf_extract(), to match the
> specification?

I named it to match the other KDF implementation. But you are right, I will
name it accordingly.

> 
> > RFC5869
> > allows two optional parameters to be provided to the extract operation:
> > the salt and additional information. Both are to be provided with the
> > seed parameter where the salt is the first entry of the seed parameter
> > and all subsequent entries are handled as additional information. If
> > the caller intends to invoke the HKDF without salt, it has to provide a
> > NULL/0 entry as first entry in seed.
> 
> Where does "additional information" for extract come from?  RFC 5869 has:
> 
> HKDF-Extract(salt, IKM) -> PRK
> 
> Inputs:
>   salt optional salt value (a non-secret random value);
>    if not provided, it is set to a string of HashLen
> zeros.
>   IKM  input keying material
> 
> There's no "additional information".

I used the terminology from SP800-108. I will update the description
accordingly. 
> 
> > 
> > The expand function is invoked via the crypto_hkdf_generate and can be
> > invoked multiple times. This function allows the caller to provide a
> > context for the key derivation operation. As specified in RFC5869, it is
> > optional. In case such context is not provided, the caller must provide
> > NULL / 0 for the info / info_nvec parameters.
> 
> Any reason not to call this crypto_hkdf_expand() to match the specification?

I will update the function name.

Thanks
Stephan
> 
> - Eric

Re: Re: [PATCH] net: ethernet: Fix memleak in ethoc_probe

2021-01-06 Thread dinghao . liu

> On Wed, 6 Jan 2021 18:56:23 +0800 (GMT+08:00) dinghao@zju.edu.cn
> wrote:
> > > I used this one for a test:
> > > 
> > > https://patchwork.kernel.org/project/netdevbpf/patch/1609312994-121032-1-git-send-email-abaci-bug...@linux.alibaba.com/
> > > 
> > > I'm not getting the Fixes tag when I download the mbox.  
> > 
> > It seems that automatically generating Fixes tags is a hard work.
> > Both patches and bugs could be complex. Sometimes even human cannot
> > determine which commit introduced a target bug.
> > 
> > Is this an already implemented functionality?
> 
> I'm not sure I understand. Indeed finding the right commit to use in 
> a Fixes tag is not always easy, and definitely not easy to automate.
> Human validation is always required.
> 
> If we could easily automate finding the commit which introduced a
> problem we wouldn't need to add the explicit tag, backporters could
> just run such script locally.. That's why it's best if the author 
> does the digging and provides the right tag.
> 
> The conversation with Konstantin and Florian was about automatically
> picking up Fixes tags from the mailing list by the patchwork software,
> when such tags are posted in reply to the original posting, just like
> review tags. But the tags are still generated by humans.

It's clear to me, thanks.

Regards,
Dinghao
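
For illustration only, a rough sketch of the kind of shape check a tool
might run on a reply line before treating it as a Fixes tag. This is not
patchwork's actual logic; the function name and the 12-to-40 hex digit
bound are assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `line` has the shape
 *   Fixes: <12..40 hex digits> ("subject")
 * The line is expected without a trailing newline.
 */
bool looks_like_fixes_tag(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n = 0;
	size_t len;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;
	line += sizeof(prefix) - 1;

	/* Count the abbreviated commit id. */
	while (isxdigit((unsigned char)line[n]))
		n++;
	if (n < 12 || n > 40)
		return false;
	line += n;

	/* The subject must be wrapped as ("..."). */
	if (strncmp(line, " (\"", 3) != 0)
		return false;
	line += 3;

	/* At least one subject character, then the closing ") */
	len = strlen(line);
	return len >= 3 && strcmp(line + len - 2, "\")") == 0;
}
```

A check like this only validates the format. Whether the referenced commit
really introduced the bug still needs a human, as discussed above.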

[PATCH v2] kasan: remove redundant config option

2021-01-06 Thread Walter Wu

CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN stack
instrumentation, but only one config option should be needed. Remove
CONFIG_KASAN_STACK_ENABLE and make CONFIG_KASAN_STACK usable on its own.
See [1].

When KASAN stack instrumentation is available, gcc needs no prompt and
defaults to y, while clang gets a prompt and defaults to n.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210221

Signed-off-by: Walter Wu 
Suggested-by: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Andrey Konovalov 
Cc: Alexander Potapenko 
Cc: Andrew Morton 
---

v2: make the commit log more readable.
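
The move from "#if CONFIG_KASAN_STACK" to "#if defined(CONFIG_KASAN_STACK)"
in the hunks below follows from the option changing from int to bool. A
minimal sketch, assuming the usual Kconfig behaviour: an enabled bool option
is defined to 1, a disabled bool option is left undefined, and an int option
is always defined with its value. The CONFIG_DEMO_* names are hypothetical
stand-ins, not real options.

```c
/* Stand-in for an enabled bool option. */
#define CONFIG_DEMO_BOOL_ON 1
/* CONFIG_DEMO_BOOL_OFF is left undefined, like a disabled bool option. */
/* Stand-in for an int option whose value is 0. */
#define CONFIG_DEMO_INT 0

int demo_bool_on_defined(void)
{
#if defined(CONFIG_DEMO_BOOL_ON)
	return 1;
#else
	return 0;
#endif
}

int demo_bool_off_defined(void)
{
#if defined(CONFIG_DEMO_BOOL_OFF)
	return 1;
#else
	return 0;
#endif
}

/* For an int option the value test is the meaningful one ... */
int demo_int_value(void)
{
#if CONFIG_DEMO_INT
	return 1;
#else
	return 0;
#endif
}

/* ... because defined() is true even when the value is 0. */
int demo_int_defined(void)
{
#if defined(CONFIG_DEMO_INT)
	return 1;
#else
	return 0;
#endif
}
```

With the int form, defined() would have been true for both 0 and 1, which
is why the old code tested the value instead.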

---
 arch/arm64/kernel/sleep.S|  2 +-
 arch/x86/kernel/acpi/wakeup_64.S |  2 +-
 include/linux/kasan.h|  2 +-
 lib/Kconfig.kasan| 11 ---
 mm/kasan/common.c|  2 +-
 mm/kasan/kasan.h |  2 +-
 mm/kasan/report_generic.c|  2 +-
 scripts/Makefile.kasan   | 10 --
 8 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 6bdef7362c0e..7c44ede122a9 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -133,7 +133,7 @@ SYM_FUNC_START(_cpu_resume)
 */
bl  cpu_do_resume
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
mov x0, sp
bl  kasan_unpoison_task_stack_below
 #endif
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 5d3a0b8fd379..c7f412f4e07d 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -112,7 +112,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
movqpt_regs_r14(%rax), %r14
movqpt_regs_r15(%rax), %r15
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
/*
 * The suspend path may have poisoned some areas deeper in the stack,
 * which we now need to unpoison.
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 5e0655fb2a6f..35d1e9b2cbfa 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -302,7 +302,7 @@ static inline void kasan_kfree_large(void *ptr, unsigned 
long ip) {}
 
 #endif /* CONFIG_KASAN */
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
 void kasan_unpoison_task_stack(struct task_struct *task);
 #else
 static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index f5fa4ba126bf..59de74293454 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -138,9 +138,11 @@ config KASAN_INLINE
 
 endchoice
 
-config KASAN_STACK_ENABLE
-   bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
+config KASAN_STACK
+   bool "Enable stack instrumentation (unsafe)"
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   default y if CC_IS_GCC
+   default n if CC_IS_CLANG
help
  The LLVM stack address sanitizer has a know problem that
  causes excessive stack usage in a lot of functions, see
@@ -154,11 +156,6 @@ config KASAN_STACK_ENABLE
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
 
-config KASAN_STACK
-   int
-   default 1 if KASAN_STACK_ENABLE || CC_IS_GCC
-   default 0
-
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
depends on KASAN_SW_TAGS
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 38ba2aecd8f4..02ec7f81dc16 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -63,7 +63,7 @@ void __kasan_unpoison_range(const void *address, size_t size)
unpoison_range(address, size);
 }
 
-#if CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_STACK)
 /* Unpoison the entire stack for a task. */
 void kasan_unpoison_task_stack(struct task_struct *task)
 {
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index cc4d9e1d49b1..bdfdb1cff653 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -224,7 +224,7 @@ void *find_first_bad_addr(void *addr, size_t size);
 const char *get_bug_type(struct kasan_access_info *info);
 void metadata_fetch_row(char *buffer, void *row);
 
-#if defined(CONFIG_KASAN_GENERIC) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_GENERIC) && defined(CONFIG_KASAN_STACK)
 void print_address_stack_frame(const void *addr);
 #else
 static inline void print_address_stack_frame(const void *addr) { }
diff --git a/mm/kasan/report_generic.c b/mm/kasan/report_generic.c
index 8a9c889872da..137a1dba1978 100644
--- a/mm/kasan/report_generic.c
+++ b/mm/kasan/report_generic.c
@@ -128,7 +128,7 @@ void metadata_fetch_row(char *buffer, void *row)
memcpy(buffer, kasan_mem_to_shadow(row), META_BYTES_PER_ROW);
 }
 
-#if CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_STACK)
 static bool __must_check

[PATCH 4.19 1/7] clone: add CLONE_PIDFD

2021-01-06 Thread Wen Yang

From: Christian Brauner 

[ Upstream commit b3e5838252665ee4cfa76b82bdf1198dca81e5be ]

This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call.  Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call.  As
spotted by Linus, there is exactly one bit for clone() left.

CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api.  They serve as a simple opaque handle on pids.  Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending).  Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege.  This does not imply it
cannot ever be that way but for now this is not the case.

A pidfd comes with additional information in fdinfo if the kernel supports
procfs.  The fdinfo file contains the pid of the process in the caller's
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
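
As an illustration of the fdinfo format described above, here is a small
userspace sketch that pulls the pid out of fdinfo text. The function name
is hypothetical, and the caller is assumed to have already read
/proc/self/fdinfo/<fd> into a NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan fdinfo text line by line for "Pid:\t<n>".
 * Returns the pid, or -1 if no such line is present.
 */
int fdinfo_parse_pid(const char *text)
{
	const char *line = text;

	while (line && *line) {
		int pid;

		/* Only matches when "Pid:" starts the current line. */
		if (sscanf(line, "Pid:\t%d", &pid) == 1)
			return pid;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```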

As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone.  This has the advantage that we can
give back the associated pid and the pidfd at the same time.

To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/.  The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Suggested-by: Linus Torvalds 
Signed-off-by: Christian Brauner 
Co-developed-by: Jann Horn 
Signed-off-by: Jann Horn 
Reviewed-by: Oleg Nesterov 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: David Howells 
Cc: "Michael Kerrisk (man-pages)" 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc:  # 4.19.x
(clone: fix up cherry-pick conflicts for b3e583825266)
Signed-off-by: Wen Yang 
---
 include/linux/pid.h|   2 +
 include/uapi/linux/sched.h |   1 +
 kernel/fork.c  | 107 +++--
 3 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 14a9a39..29c0a99 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -66,6 +66,8 @@ struct pid
 
 extern struct pid init_struct_pid;
 
+extern const struct file_operations pidfd_fops;
+
 static inline struct pid *get_pid(struct pid *pid)
 {
if (pid)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f8..ed4ee17 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -10,6 +10,7 @@
 #define CLONE_FS   0x0200  /* set if fs info shared between 
processes */
 #define CLONE_FILES0x0400  /* set if open files shared between 
processes */
 #define CLONE_SIGHAND  0x0800  /* set if signal handlers and blocked 
signals shared */
+#define CLONE_PIDFD0x1000  /* set if a pidfd should be placed in 
parent */
 #define CLONE_PTRACE   0x2000  /* set if we want to let tracing 
continue on the child too */
 #define CLONE_VFORK0x4000  /* set if the parent wants the child to 
wake it up on mm_release */
 #define CLONE_PARENT   0x8000  /* set if we want to have the same 
parent as the cloner */
diff --git a/kernel/fork.c b/kernel/fork.c
index f2c92c1..e419891 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -11,6 +11,7 @@
  * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -21,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1666,6 +1668,58 @@ static void copy_oom_score_adj(u64 clone_flags, struct 
task_struct *tsk)
mutex_unlock(_adj_mutex);
 }
 
+static int pidfd_release(struct inode *inode, struct file *file)
+{
+   struct pid *pid = file->private_data;
+
+   file->private_data = NULL;
+   put_pid(pid);
+   return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
+   struct pid *pid = f->private_data;
+
+   seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+   seq_putc(m, '\n');
+}
+#endif
+
+const struct file_operations pidfd_fops = {
+   .release = pidfd_release,
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo = pidfd_show_fdinfo,
+#endif
+};
+
+/**
+ * pidfd_create() - Create a new pid file descriptor.
+ *
+ * @pid:  struct pid that the pidfd will

[PATCH 4.19 5/7] proc: Clear the pieces of proc_inode that proc_evict_inode cares about

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 71448011ea2a1cd36d8f5cbdab0ed716c454d565 ]

This just keeps everything tidier, and allows for using flags like
SLAB_TYPESAFE_BY_RCU where slabs are not always cleared before reuse.
I don't see reuse without reinitializing happening with the proc_inode
but I had a false alarm while reworking flushing of proc dentries and
inodes when a process dies that caused me to tidy this up.

The code is a little easier to follow and reason about this
way so I figured the changes might as well be kept.

Signed-off-by: "Eric W. Biederman" 
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index fffc7e4..45b4344 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -34,21 +34,27 @@ static void proc_evict_inode(struct inode *inode)
 {
struct proc_dir_entry *de;
struct ctl_table_header *head;
+   struct proc_inode *ei = PROC_I(inode);
 
truncate_inode_pages_final(>i_data);
clear_inode(inode);
 
/* Stop tracking associated processes */
-   put_pid(PROC_I(inode)->pid);
+   if (ei->pid) {
+   put_pid(ei->pid);
+   ei->pid = NULL;
+   }
 
/* Let go of any associated proc directory entry */
-   de = PDE(inode);
-   if (de)
+   de = ei->pde;
+   if (de) {
pde_put(de);
+   ei->pde = NULL;
+   }
 
-   head = PROC_I(inode)->sysctl;
+   head = ei->sysctl;
if (head) {
-   RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
+   RCU_INIT_POINTER(ei->sysctl, NULL);
proc_sys_evict_inode(inode, head);
}
 }
-- 
1.8.3.1

[PATCH 4.19 6/7] proc: Use d_invalidate in proc_prune_siblings_dcache

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit f90f3cafe8d56d593fc509a4185da1d5800efea4 ]

The function d_prune_aliases has the problem that it will only prune
aliases that are completely unused.  It will not remove aliases for
the dcache or even think of removing mounts from the dcache.  For that
behavior d_invalidate is needed.

To use d_invalidate replace d_prune_aliases with d_find_alias followed
by d_invalidate and dput.

For completeness the directory and the non-directory cases are
separated because in theory (although not currently in practice for
proc) directories can only ever have a single dentry while
non-directories can have hardlinks and thus multiple dentries.
As part of this separation use d_find_any_alias for directories
to spare d_find_alias the extra work of doing that.

Plus the differences between d_find_any_alias and d_find_alias make
it clear why the directory and non-directory paths cannot share code.

To make it clear these routines now invalidate dentries, rename
proc_prune_siblings_dcache to proc_invalidate_siblings_dcache, and rename
proc_sys_prune_dcache to proc_sys_invalidate_dcache.

V2: Split the directory and non-directory cases to make this
code robust to future changes in proc.

Signed-off-by: "Eric W. Biederman" 
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 16 ++--
 fs/proc/internal.h|  2 +-
 fs/proc/proc_sysctl.c |  8 
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 45b4344..fad579e 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -118,7 +118,7 @@ void __init proc_init_kmemcache(void)
BUILD_BUG_ON(sizeof(struct proc_dir_entry) >= SIZEOF_PDE);
 }
 
-void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock)
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t 
*lock)
 {
struct inode *inode;
struct proc_inode *ei;
@@ -147,7 +147,19 @@ void proc_prune_siblings_dcache(struct hlist_head *inodes, 
spinlock_t *lock)
continue;
}
 
-   d_prune_aliases(inode);
+   if (S_ISDIR(inode->i_mode)) {
+   struct dentry *dir = d_find_any_alias(inode);
+   if (dir) {
+   d_invalidate(dir);
+   dput(dir);
+   }
+   } else {
+   struct dentry *dentry;
+   while ((dentry = d_find_alias(inode))) {
+   d_invalidate(dentry);
+   dput(dentry);
+   }
+   }
iput(inode);
deactivate_super(sb);
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 6cae472..1db693b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -210,7 +210,7 @@ struct pde_opener {
 extern const struct inode_operations proc_pid_link_inode_operations;
 
 void proc_init_kmemcache(void);
-void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock);
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t 
*lock);
 void set_proc_pid_nlink(void);
 extern struct inode *proc_get_inode(struct super_block *, struct 
proc_dir_entry *);
 extern int proc_fill_super(struct super_block *, void *data, int flags);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 57b16bf..f8f1f8a 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -262,9 +262,9 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
 }
 
-static void proc_sys_prune_dcache(struct ctl_table_header *head)
+static void proc_sys_invalidate_dcache(struct ctl_table_header *head)
 {
-   proc_prune_siblings_dcache(>inodes, _lock);
+   proc_invalidate_siblings_dcache(>inodes, _lock);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -286,10 +286,10 @@ static void start_unregistering(struct ctl_table_header 
*p)
spin_unlock(_lock);
}
/*
-* Prune dentries for unregistered sysctls: namespaced sysctls
+* Invalidate dentries for unregistered sysctls: namespaced sysctls
 * can have duplicate names and contaminate dcache very badly.
 */
-   proc_sys_prune_dcache(p);
+   proc_sys_invalidate_dcache(p);
/*
 * do not remove from the list until nobody holds it; walking the
 * list in do_sysctl() relies on that.
-- 
1.8.3.1

Re: [PATCH] arm64: dts: ls1028a: fix the offset of the reset register

2021-01-06 Thread Michael Walle


Hi Shawn,

On 2021-01-07 07:40, Shawn Guo wrote:
> On Tue, Dec 15, 2020 at 10:26:22PM +0100, Michael Walle wrote:
> > The offset of the reset request register is 0, the absolute address is
> > 0x1e6. Boards without PSCI support will fail to perform a reset:
> >
> > [   26.734700] reboot: Restarting system
> > [   27.743259] Unable to restart system
> > [   27.746845] Reboot failed -- System halted
> >
> > Fixes: 8897f3255c9c ("arm64: dts: Add support for NXP LS1028A SoC")
> > Signed-off-by: Michael Walle 
>
> Out of curiosity, how did you get it fixed with your commit 3f0fb37b22b4
> ("arm64: dts: ls1028a: fix reboot node") in the first place?

I simply must have missed it. There is also a fallback reset method
via the watchdog in the chain, which kicks in if this wasn't successful.
So if you test it, it is easy to think it's working although it's not.

-michael

[PATCH 4.19 2/7] pidfd: add polling support

2021-01-06 Thread Wen Yang

From: "Joel Fernandes (Google)" 

[ Upstream commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24 ]

This patch adds polling support to pidfd.

Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and when it checks for existence of
the PID, the wrong process is possibly checked for existence.
Using the polling support, LMK will be able to get notified when a process
exits in a race-free and fast way, and this allows the LMK to do other
things (such as polling on other fds) while awaiting the process being
killed to die.
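
From the userspace side, the polling described above could be used roughly
as in the sketch below. The helper name is hypothetical and it works on any
file descriptor; with a pidfd, the fd becoming readable is what signals that
the process has exited. How the pidfd is obtained (for example via
CLONE_PIDFD) is left to the caller.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR,
 * which a real caller would probably want to retry).
 */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;

	/* poll() may also report POLLERR/POLLHUP; treat those as errors. */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Because this is an ordinary poll(), the same pollfd array can carry other
fds as well, which is the "do other things while waiting" case mentioned
above.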

For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.

We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
   it in task_struct would not be an option in this case because the task can
   be reaped and destroyed before the poll returns.

2. Including the waitqueue in struct pid means that during
   de_thread(), the new thread group leader automatically gets the new
   waitqueue/pid even though its task_struct is different.

Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.

Cc: Andy Lutomirski 
Cc: Steven Rostedt 
Cc: Daniel Colascione 
Cc: Jann Horn 
Cc: Tim Murray 
Cc: Jonathan Kowalski 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: Kees Cook 
Cc: David Howells 
Cc: Oleg Nesterov 
Cc: kernel-t...@android.com
Reviewed-by: Oleg Nesterov 
Co-developed-by: Daniel Colascione 
Signed-off-by: Daniel Colascione 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Christian Brauner 
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 include/linux/pid.h |  3 +++
 kernel/fork.c   | 26 ++
 kernel/pid.c|  2 ++
 kernel/signal.c | 11 +++
 4 files changed, 42 insertions(+)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 29c0a99..a82d2f7 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -3,6 +3,7 @@
 #define _LINUX_PID_H
 
 #include 
+#include 
 
 enum pid_type
 {
@@ -60,6 +61,8 @@ struct pid
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
+   /* wait queue for pidfd notifications */
+   wait_queue_head_t wait_pidfd;
struct rcu_head rcu;
struct upid numbers[1];
 };
diff --git a/kernel/fork.c b/kernel/fork.c
index e419891..33dc746 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1688,8 +1688,34 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct 
file *f)
 }
 #endif
 
+/*
+ * Poll support for process exit notification.
+ */
+static unsigned int pidfd_poll(struct file *file, struct poll_table_struct 
*pts)
+{
+   struct task_struct *task;
+   struct pid *pid = file->private_data;
+   int poll_flags = 0;
+
+   poll_wait(file, >wait_pidfd, pts);
+
+   rcu_read_lock();
+   task = pid_task(pid, PIDTYPE_PID);
+   /*
+* Inform pollers only when the whole thread group exits.
+* If the thread group leader exits before all other threads in the
+* group, then poll(2) should block, similar to the wait(2) family.
+*/
+   if (!task || (task->exit_state && thread_group_empty(task)))
+   poll_flags = POLLIN | POLLRDNORM;
+   rcu_read_unlock();
+
+   return poll_flags;
+}
+
 const struct file_operations pidfd_fops = {
.release = pidfd_release,
+   .poll = pidfd_poll,
 #ifdef CONFIG_PROC_FS
.show_fdinfo = pidfd_show_fdinfo,
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index b88fe5e..3ba6fcb 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
for (type = 0; type < PIDTYPE_MAX; ++type)
INIT_HLIST_HEAD(>tasks[type]);
 
+   init_waitqueue_head(>wait_pidfd);
+
upid = pid->numbers + ns->level;
spin_lock_irq(_lock);
if (!(ns->pid_allocated & PIDNS_ADDING))
diff --git a/kernel/signal.c b/kernel/signal.c
index a02a25a..22a04795 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1810,6 +1810,14 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, 
enum pid_type type)
return ret;
 }
 
+static void do_notify_pidfd(struct task_struct *task)
+{
+   struct pid *pid;
+
+   pid = task_pid(task);
+   wake_up_all(>wait_pidfd);
+}
+
 /*
  * Let a parent know about the death of a child.
  * For a stopped/continued status change, use do_notify_parent_cldstop instead.
@@ -1833,6 +1841,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)

[PATCH 4.19 3/7] proc: Rename in proc_inode rename sysctl_inodes sibling_inodes

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 0afa5ca82212247456f9de1468b595a111fee633 ]

I am about to need and use the same functionality for pid based
inodes and there is no point in adding a second field when
this field is already here and serving the same purpose.

Just give the field a generic name so it is clear that
it is no longer sysctl specific.

Also for good measure initialize sibling_inodes when
proc_inode is initialized.

Signed-off-by: Eric W. Biederman 
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 1 +
 fs/proc/internal.h| 2 +-
 fs/proc/proc_sysctl.c | 8 
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 31bf3bb..e5334ed 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -70,6 +70,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
ei->pde = NULL;
ei->sysctl = NULL;
ei->sysctl_entry = NULL;
+   INIT_HLIST_NODE(>sibling_inodes);
ei->ns_ops = NULL;
inode = >vfs_inode;
return inode;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 95b1419..d922c01 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -91,7 +91,7 @@ struct proc_inode {
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
-   struct hlist_node sysctl_inodes;
+   struct hlist_node sibling_inodes;
const struct proc_ns_operations *ns_ops;
struct inode vfs_inode;
 } __randomize_layout;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c95f32b..0f578f6 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -274,9 +274,9 @@ static void proc_sys_prune_dcache(struct ctl_table_header 
*head)
node = hlist_first_rcu(>inodes);
if (!node)
break;
-   ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
+   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
spin_lock(_lock);
-   hlist_del_init_rcu(>sysctl_inodes);
+   hlist_del_init_rcu(>sibling_inodes);
spin_unlock(_lock);
 
inode = >vfs_inode;
@@ -478,7 +478,7 @@ static struct inode *proc_sys_make_inode(struct super_block 
*sb,
}
ei->sysctl = head;
ei->sysctl_entry = table;
-   hlist_add_head_rcu(>sysctl_inodes, >inodes);
+   hlist_add_head_rcu(>sibling_inodes, >inodes);
head->count++;
spin_unlock(_lock);
 
@@ -509,7 +509,7 @@ static struct inode *proc_sys_make_inode(struct super_block 
*sb,
 void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
 {
spin_lock(_lock);
-   hlist_del_init_rcu(_I(inode)->sysctl_inodes);
+   hlist_del_init_rcu(_I(inode)->sibling_inodes);
if (!--head->count)
kfree_rcu(head, rcu);
spin_unlock(_lock);
-- 
1.8.3.1

[PATCH 4.19 7/7] proc: Use a list of inodes to flush from proc

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]

Rework the flushing of proc to use a list of directory inodes that
need to be flushed.

The list is kept on struct pid not on struct task_struct, as there is
a fixed connection between proc inodes and pids but at least for the
case of de_thread the pid of a task_struct changes.

This removes the dependency on proc_mnt which allows for different
mounts of proc having different mount options even in the same pid
namespace and this allows for the removal of proc_mnt which will
trivially allow the first mount of proc to honor its mount options.

This flushing remains an optimization.  The functions
pid_delete_dentry and pid_revalidate ensure that ordinary dcache
management will not attempt to use dentries past the point their
respective task has died.  When unused the shrinker will
eventually be able to remove these dentries.

There is a case in de_thread where proc_flush_pid can be
called early for a given pid.  Which winds up being
safe (if suboptimal) as this is just an optimization.

Only pid directories are put on the list as the other
per pid files are children of those directories and
d_invalidate on the directory will get them as well.

So that the pid can be used during flushing its reference count is
taken in release_task and dropped in proc_flush_pid.  Further the call
of proc_flush_pid is moved after the tasklist_lock is released in
release_task so that it is certain that the pid has already been
unhashed when flushing is taking place.  This removes a small race
where a dentry could be recreated.

As struct pid is supposed to be small and I need a per pid lock
I reuse the only lock that currently exists in struct pid, the
wait_pidfd.lock.

The net result is that this adds all of this functionality
with just a little extra list management overhead and
a single extra pointer in struct pid.

v2: Initialize pid->inodes.  I somehow failed to get that
initialization into the initial version of the patch.  A boot
failure was reported by "kernel test robot ", and
failure to initialize that pid->inodes matches all of the reported
symptoms.

Signed-off-by: Eric W. Biederman 
Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Fixes: 60347f6716aa ("pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees")
Cc:  # 4.19.x: b3e583825266: clone: add
CLONE_PIDFD
Cc:  # 4.19.x: b53b0b9d9a61: pidfd: add polling
support
Cc:  # 4.19.x: 0afa5ca82212: proc: Rename in
proc_inode rename sysctl_inodes sibling_inodes
Cc:  # 4.19.x: 26dbc60f385f: proc: Generalize
proc_sys_prune_dcache into proc_prune_siblings_dcache
Cc:  # 4.19.x: 71448011ea2a: proc: Clear the
pieces of
proc_inode that proc_evict_inode cares about
Cc:  # 4.19.x: f90f3cafe8d5: Use d_invalidate in
proc_prune_siblings_dcache
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 fs/proc/base.c  | 111 
 fs/proc/inode.c |   2 +-
 fs/proc/internal.h  |   1 +
 include/linux/pid.h |   1 +
 include/linux/proc_fs.h |   4 +-
 kernel/exit.c   |   4 +-
 kernel/pid.c|   1 +
 7 files changed, 45 insertions(+), 79 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5e705fa..ea74c7c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1741,11 +1741,25 @@ void task_dump_owner(struct task_struct *task, umode_t 
mode,
*rgid = gid;
 }
 
+void proc_pid_evict_inode(struct proc_inode *ei)
+{
+   struct pid *pid = ei->pid;
+
+   if (S_ISDIR(ei->vfs_inode.i_mode)) {
+   spin_lock(&pid->wait_pidfd.lock);
+   hlist_del_init_rcu(&ei->sibling_inodes);
+   spin_unlock(&pid->wait_pidfd.lock);
+   }
+
+   put_pid(pid);
+}
+
 struct inode *proc_pid_make_inode(struct super_block * sb,
  struct task_struct *task, umode_t mode)
 {
struct inode * inode;
struct proc_inode *ei;
+   struct pid *pid;
 
/* We need a new inode */
 
@@ -1763,10 +1777,18 @@ struct inode *proc_pid_make_inode(struct super_block * 
sb,
/*
 * grab the reference to task.
 */
-   ei->pid = get_task_pid(task, PIDTYPE_PID);
-   if (!ei->pid)
+   pid = get_task_pid(task, PIDTYPE_PID);
+   if (!pid)
goto out_unlock;
 
+   /* Let the pid remember us for quick removal */
+   ei->pid = pid;
+   if (S_ISDIR(mode)) {
+   spin_lock(&pid->wait_pidfd.lock);
+   hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+   spin_unlock(&pid->wait_pidfd.lock);
+   }
+
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
 
@@ -3067,90 +3089,29 @@ static struct dentry *proc_tgid_base_lookup(struct 
inode *dir, struct dentry *de
.permission = proc_pid_permission,
 };
 
-static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
-{
-

[PATCH 4.19 4/7] proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 26dbc60f385ff9cff475ea2a3bad02e80fd6fa43 ]

This prepares the way for allowing the pid part of proc to use this
dcache pruning code as well.

Signed-off-by: Eric W. Biederman 
Cc:  # 4.19.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 38 ++
 fs/proc/internal.h|  1 +
 fs/proc/proc_sysctl.c | 35 +--
 3 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index e5334ed..fffc7e4 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -112,6 +112,44 @@ void __init proc_init_kmemcache(void)
BUILD_BUG_ON(sizeof(struct proc_dir_entry) >= SIZEOF_PDE);
 }
 
+void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock)
+{
+   struct inode *inode;
+   struct proc_inode *ei;
+   struct hlist_node *node;
+   struct super_block *sb;
+
+   rcu_read_lock();
+   for (;;) {
+   node = hlist_first_rcu(inodes);
+   if (!node)
+   break;
+   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
+   spin_lock(lock);
+   hlist_del_init_rcu(&ei->sibling_inodes);
+   spin_unlock(lock);
+
+   inode = &ei->vfs_inode;
+   sb = inode->i_sb;
+   if (!atomic_inc_not_zero(&sb->s_active))
+   continue;
+   inode = igrab(inode);
+   rcu_read_unlock();
+   if (unlikely(!inode)) {
+   deactivate_super(sb);
+   rcu_read_lock();
+   continue;
+   }
+
+   d_prune_aliases(inode);
+   iput(inode);
+   deactivate_super(sb);
+
+   rcu_read_lock();
+   }
+   rcu_read_unlock();
+}
+
 static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
struct super_block *sb = root->d_sb;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d922c01..6cae472 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -210,6 +210,7 @@ struct pde_opener {
 extern const struct inode_operations proc_pid_link_inode_operations;
 
 void proc_init_kmemcache(void);
+void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock);
 void set_proc_pid_nlink(void);
 extern struct inode *proc_get_inode(struct super_block *, struct 
proc_dir_entry *);
 extern int proc_fill_super(struct super_block *, void *data, int flags);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 0f578f6..57b16bf 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -264,40 +264,7 @@ static void unuse_table(struct ctl_table_header *p)
 
 static void proc_sys_prune_dcache(struct ctl_table_header *head)
 {
-   struct inode *inode;
-   struct proc_inode *ei;
-   struct hlist_node *node;
-   struct super_block *sb;
-
-   rcu_read_lock();
-   for (;;) {
-   node = hlist_first_rcu(&head->inodes);
-   if (!node)
-   break;
-   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
-   spin_lock(&sysctl_lock);
-   hlist_del_init_rcu(&ei->sibling_inodes);
-   spin_unlock(&sysctl_lock);
-
-   inode = &ei->vfs_inode;
-   sb = inode->i_sb;
-   if (!atomic_inc_not_zero(&sb->s_active))
-   continue;
-   inode = igrab(inode);
-   rcu_read_unlock();
-   if (unlikely(!inode)) {
-   deactivate_super(sb);
-   rcu_read_lock();
-   continue;
-   }
-
-   d_prune_aliases(inode);
-   iput(inode);
-   deactivate_super(sb);
-
-   rcu_read_lock();
-   }
-   rcu_read_unlock();
+   proc_prune_siblings_dcache(&head->inodes, &sysctl_lock);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
-- 
1.8.3.1

[PATCH v2 4.9 04/10] proc: Better ownership of files for non-dumpable tasks in user namespaces

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 68eb94f16227336a5773b83ecfa8290f1d6b78ce ]

Instead of making the files owned by the GLOBAL_ROOT_USER, make
non-dumpable files whose mm has always lived in a user namespace owned
by the user namespace root.  This allows the container root to have
things work as expected in a container.
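
The ownership rule described above can be modelled in user space.  The
following is only a sketch under stated assumptions, not the kernel code:
struct model_task, MODEL_INVALID_ID and model_owner_uid() are hypothetical
names, and ns_root_uid stands in for the result of mapping uid 0 of the
mm's user namespace into the initial namespace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for "uid 0 of the namespace has no mapping". */
#define MODEL_INVALID_ID  UINT32_MAX
#define MODEL_GLOBAL_ROOT 0u

/* Simplified, hypothetical view of the task state the rule looks at. */
struct model_task {
	uint32_t euid;        /* effective uid of the task */
	bool has_mm;          /* false once the task has no mm */
	bool dumpable;        /* true when dumpable is SUID_DUMP_USER */
	uint32_t ns_root_uid; /* root of the mm's user namespace, or MODEL_INVALID_ID */
};

/*
 * Decide who owns a proc file for the task.  World readable and
 * executable directories always report the effective uid; everything
 * else falls back to "some root" when the task is not dumpable.
 */
uint32_t model_owner_uid(const struct model_task *t, bool is_world_rx_dir)
{
	uint32_t uid = t->euid;

	if (is_world_rx_dir)
		return uid;

	if (!t->has_mm)
		return MODEL_GLOBAL_ROOT;

	if (!t->dumpable) {
		/* Prefer the user namespace root, else the global root. */
		uid = t->ns_root_uid;
		if (uid == MODEL_INVALID_ID)
			uid = MODEL_GLOBAL_ROOT;
	}
	return uid;
}
```

With this model a non-dumpable task in a container whose root maps to
uid 100000 gets its files owned by 100000 rather than by the global root,
which is the behaviour change the commit message describes.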

Signed-off-by: "Eric W. Biederman" 
Cc:  # 4.9.x
Signed-off-by: Wen Yang 
---
 fs/proc/base.c | 102 ++---
 fs/proc/fd.c   |  12 +--
 fs/proc/internal.h |  16 ++---
 3 files changed, 61 insertions(+), 69 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ee2e0ec..5bfdb61 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1676,12 +1676,63 @@ static int proc_pid_readlink(struct dentry * dentry, 
char __user * buffer, int b
 
 /* building an inode */
 
+void task_dump_owner(struct task_struct *task, mode_t mode,
+kuid_t *ruid, kgid_t *rgid)
+{
+   /* Depending on the state of dumpable compute who should own a
+* proc file for a task.
+*/
+   const struct cred *cred;
+   kuid_t uid;
+   kgid_t gid;
+
+   /* Default to the tasks effective ownership */
+   rcu_read_lock();
+   cred = __task_cred(task);
+   uid = cred->euid;
+   gid = cred->egid;
+   rcu_read_unlock();
+
+   /*
+* Before the /proc/pid/status file was created the only way to read
+* the effective uid of a /process was to stat /proc/pid.  Reading
+* /proc/pid/status is slow enough that procps and other packages
+* kept stating /proc/pid.  To keep the rules in /proc simple I have
+* made this apply to all per process world readable and executable
+* directories.
+*/
+   if (mode != (S_IFDIR|S_IRUGO|S_IXUGO)) {
+   struct mm_struct *mm;
+   task_lock(task);
+   mm = task->mm;
+   /* Make non-dumpable tasks owned by some root */
+   if (mm) {
+   if (get_dumpable(mm) != SUID_DUMP_USER) {
+   struct user_namespace *user_ns = mm->user_ns;
+
+   uid = make_kuid(user_ns, 0);
+   if (!uid_valid(uid))
+   uid = GLOBAL_ROOT_UID;
+
+   gid = make_kgid(user_ns, 0);
+   if (!gid_valid(gid))
+   gid = GLOBAL_ROOT_GID;
+   }
+   } else {
+   uid = GLOBAL_ROOT_UID;
+   gid = GLOBAL_ROOT_GID;
+   }
+   task_unlock(task);
+   }
+   *ruid = uid;
+   *rgid = gid;
+}
+
 struct inode *proc_pid_make_inode(struct super_block * sb,
  struct task_struct *task, umode_t mode)
 {
struct inode * inode;
struct proc_inode *ei;
-   const struct cred *cred;
 
/* We need a new inode */
 
@@ -1703,13 +1754,7 @@ struct inode *proc_pid_make_inode(struct super_block * 
sb,
if (!ei->pid)
goto out_unlock;
 
-   if (task_dumpable(task)) {
-   rcu_read_lock();
-   cred = __task_cred(task);
-   inode->i_uid = cred->euid;
-   inode->i_gid = cred->egid;
-   rcu_read_unlock();
-   }
+   task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
 
 out:
@@ -1724,7 +1769,6 @@ int pid_getattr(struct vfsmount *mnt, struct dentry 
*dentry, struct kstat *stat)
 {
struct inode *inode = d_inode(dentry);
struct task_struct *task;
-   const struct cred *cred;
struct pid_namespace *pid = dentry->d_sb->s_fs_info;
 
generic_fillattr(inode, stat);
@@ -1742,12 +1786,7 @@ int pid_getattr(struct vfsmount *mnt, struct dentry 
*dentry, struct kstat *stat)
 */
return -ENOENT;
}
-   if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
-   task_dumpable(task)) {
-   cred = __task_cred(task);
-   stat->uid = cred->euid;
-   stat->gid = cred->egid;
-   }
+   task_dump_owner(task, inode->i_mode, &stat->uid, &stat->gid);
}
rcu_read_unlock();
return 0;
@@ -1763,18 +1802,11 @@ int pid_getattr(struct vfsmount *mnt, struct dentry 
*dentry, struct kstat *stat)
  * Rewrite the inode's ownerships here because the owning task may have
  * performed a setuid(), etc.
  *
- * Before the /proc/pid/status file was created the only way to read
- * the effective uid of a /process was to stat /proc/pid.  Reading
- * /proc/pid/status is slow enough that procps and other packages
- * kept stating /proc/pid.  To keep the rules in /proc simple I have
- * made this apply to all per process world readable

[PATCH 4.19 0/7] fix a race in release_task when flushing the dentry

2021-01-06 Thread Wen Yang

Dentries such as /proc//ns/ have the DCACHE_OP_DELETE flag set, so they
should be deleted when the process exits.

Suppose the following race occurs:

release_task                 dput
-> proc_flush_task
                             -> dentry->d_op->d_delete(dentry)
-> __exit_signal
                             -> dentry->d_lockref.count--  and return.

In proc_flush_task(), if another process is using this dentry, it will
not be deleted. At the same time, in dput(), d_op->d_delete() can be executed
before __exit_signal() (the pid has not yet been unhashed), so d_delete()
returns false and this dentry still cannot be deleted.
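
The interleaving can be replayed with a toy model.  This is a sketch
under stated assumptions, not kernel code: struct model_dentry and the
model_*() helpers are hypothetical, the flush is reduced to "drop the
dentry only if nobody holds it", and d_delete() is reduced to "say yes
only once the pid is unhashed".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two dentry fields that matter here. */
struct model_dentry {
	int count;   /* stands in for d_lockref.count */
	bool cached; /* still present in the dcache */
};

/* proc_flush_task(): cannot drop a dentry that is still in use. */
void model_flush(struct model_dentry *d)
{
	if (d->count == 0)
		d->cached = false;
}

/*
 * dput(): drop a reference.  On the last reference ask d_delete(),
 * which in this model agrees only when the pid is already unhashed.
 */
void model_dput(struct model_dentry *d, bool pid_unhashed)
{
	if (--d->count > 0)
		return;
	if (pid_unhashed)
		d->cached = false;
}

/* Returns true when the dentry is left behind in the cache. */
bool model_race(bool dput_after_unhash)
{
	struct model_dentry d = { .count = 1, .cached = true };
	bool pid_unhashed = false;

	model_flush(&d);                      /* release_task -> proc_flush_task */
	if (!dput_after_unhash)
		model_dput(&d, pid_unhashed); /* other process wins the race */
	pid_unhashed = true;                  /* release_task -> __exit_signal */
	if (dput_after_unhash)
		model_dput(&d, pid_unhashed);

	return d.cached;
}
```

In the model the dentry is left behind only when dput() runs between
the flush and the unhash, which is the window described above.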

This dentry will stay cached (although its count is 0 and the
DCACHE_OP_DELETE flag is set), and so will its parent dentry; these
dentries can only be deleted when drop_caches is manually triggered.

This wastes memory. What's more troublesome is that these dentries
hold a reference to the pid. According to commit f333c700c610 ("pidns: Add a
limit on the number of pid namespaces"), if the pid cannot be released, it
may become impossible to create a new pid_ns.

This issue was introduced by 60347f6716aa ("pid namespaces: prepare
proc_flust_task() to flush entries from multiple proc trees"), exposed by
f333c700c610 ("pidns: Add a limit on the number of pid namespaces"), and then
fixed by 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc").


Christian Brauner (1):
  clone: add CLONE_PIDFD

Eric W. Biederman (5):
  proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
  proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
  proc: Clear the pieces of proc_inode that proc_evict_inode cares about
  proc: Use d_invalidate in proc_prune_siblings_dcache
  proc: Use a list of inodes to flush from proc

Joel Fernandes (Google) (1):
  pidfd: add polling support

 fs/proc/base.c | 111 -
 fs/proc/inode.c|  67 +--
 fs/proc/internal.h |   4 +-
 fs/proc/proc_sysctl.c  |  45 ++-
 include/linux/pid.h|   6 ++
 include/linux/proc_fs.h|   4 +-
 include/uapi/linux/sched.h |   1 +
 kernel/exit.c  |   4 +-
 kernel/fork.c  | 133 +++--
 kernel/pid.c   |   3 +
 kernel/signal.c|  11 
 11 files changed, 262 insertions(+), 127 deletions(-)

-- 
1.8.3.1

[PATCH v2 4.9 05/10] proc: use %u for pid printing and slightly less stack

2021-01-06 Thread Wen Yang

From: Alexey Dobriyan 

[ Upstream commit e3912ac37e07a13c70675cd75020694de4841c74 ]

PROC_NUMBUF is 13 which is enough for "negative int + \n + \0".

However PIDs and TGIDs are never negative and newline is not a concern,
so use just 10 per integer.
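
The arithmetic can be checked in user space.  This is a small sketch,
assuming a 32-bit int as on the platforms the kernel supports: the
longest "%d" output is "-2147483648" (11 characters), so 13 covers it
plus a newline and the NUL, while the longest "%u" output is
"4294967295" (10 characters), so 10 + 1 is enough.  The helper names
and PID_DEC_BUF are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Bytes needed for a 32-bit unsigned decimal: 10 digits plus the NUL. */
#define PID_DEC_BUF (10 + 1)

/* Number of characters "%d" needs for v, not counting the NUL. */
int decimal_len_int(int v)
{
	return snprintf(NULL, 0, "%d", v);
}

/* Number of characters "%u" needs for v, not counting the NUL. */
int decimal_len_uint(unsigned int v)
{
	return snprintf(NULL, 0, "%u", v);
}

/* Format a pid into a 10 + 1 byte buffer; returns the length written. */
int format_pid(char buf[PID_DEC_BUF], unsigned int pid)
{
	return snprintf(buf, PID_DEC_BUF, "%u", pid);
}
```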

Link: http://lkml.kernel.org/r/20171120203005.GA27743@avx2
Signed-off-by: Alexey Dobriyan 
Cc: Alexander Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Cc:  # 4.9.x
Signed-off-by: Wen Yang 
---
 fs/proc/base.c| 16 
 fs/proc/fd.c  |  2 +-
 fs/proc/self.c|  6 +++---
 fs/proc/thread_self.c |  5 ++---
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5bfdb61..3502a40 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3018,11 +3018,11 @@ static struct dentry *proc_tgid_base_lookup(struct 
inode *dir, struct dentry *de
 static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
 {
struct dentry *dentry, *leader, *dir;
-   char buf[PROC_NUMBUF];
+   char buf[10 + 1];
struct qstr name;
 
name.name = buf;
-   name.len = snprintf(buf, sizeof(buf), "%d", pid);
+   name.len = snprintf(buf, sizeof(buf), "%u", pid);
/* no ->d_hash() rejects on procfs */
dentry = d_hash_and_lookup(mnt->mnt_root, &name);
if (dentry) {
@@ -3034,7 +3034,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, 
pid_t pid, pid_t tgid)
return;
 
name.name = buf;
-   name.len = snprintf(buf, sizeof(buf), "%d", tgid);
+   name.len = snprintf(buf, sizeof(buf), "%u", tgid);
leader = d_hash_and_lookup(mnt->mnt_root, &name);
if (!leader)
goto out;
@@ -3046,7 +3046,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, 
pid_t pid, pid_t tgid)
goto out_put_leader;
 
name.name = buf;
-   name.len = snprintf(buf, sizeof(buf), "%d", pid);
+   name.len = snprintf(buf, sizeof(buf), "%u", pid);
dentry = d_hash_and_lookup(dir, &name);
if (dentry) {
d_invalidate(dentry);
@@ -3226,14 +3226,14 @@ int proc_pid_readdir(struct file *file, struct 
dir_context *ctx)
for (iter = next_tgid(ns, iter);
 iter.task;
 iter.tgid += 1, iter = next_tgid(ns, iter)) {
-   char name[PROC_NUMBUF];
+   char name[10 + 1];
int len;
 
cond_resched();
if (!has_pid_permissions(ns, iter.task, 2))
continue;
 
-   len = snprintf(name, sizeof(name), "%d", iter.tgid);
+   len = snprintf(name, sizeof(name), "%u", iter.tgid);
ctx->pos = iter.tgid + TGID_OFFSET;
if (!proc_fill_cache(file, ctx, name, len,
 proc_pid_instantiate, iter.task, NULL)) {
@@ -3557,10 +3557,10 @@ static int proc_task_readdir(struct file *file, struct 
dir_context *ctx)
for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
 task;
 task = next_tid(task), ctx->pos++) {
-   char name[PROC_NUMBUF];
+   char name[10 + 1];
int len;
tid = task_pid_nr_ns(task, ns);
-   len = snprintf(name, sizeof(name), "%d", tid);
+   len = snprintf(name, sizeof(name), "%u", tid);
if (!proc_fill_cache(file, ctx, name, len,
proc_task_instantiate, task, NULL)) {
/* returning this tgid failed, save it as the first
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 00ce153..390c2fe 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -235,7 +235,7 @@ static int proc_readfd_common(struct file *file, struct 
dir_context *ctx,
for (fd = ctx->pos - 2;
 fd < files_fdtable(files)->max_fds;
 fd++, ctx->pos++) {
-   char name[PROC_NUMBUF];
+   char name[10 + 1];
int len;
 
if (!fcheck_files(files, fd))
diff --git a/fs/proc/self.c b/fs/proc/self.c
index f6e2e3f..dd06755 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -35,11 +35,11 @@ static const char *proc_self_get_link(struct dentry *dentry,
 
if (!tgid)
return ERR_PTR(-ENOENT);
-   /* 11 for max length of signed int in decimal + NULL term */
-   name = kmalloc(12, dentry ? GFP_KERNEL : GFP_ATOMIC);
+   /* max length of unsigned int in decimal + NULL term */
+   name = kmalloc(10 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
if (unlikely(!name))
return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
-   sprintf(name, "%d", tgid);
+   sprintf(name, "%u", tgid);
set_delayed_call(done, kfree_link, name);
return name;
 }
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 02d1db8..44e0921 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -30,11 +30,10 @@

[PATCH v2 4.9 08/10] proc: Clear the pieces of proc_inode that proc_evict_inode cares about

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 71448011ea2a1cd36d8f5cbdab0ed716c454d565 ]

This just keeps everything tidier, and allows for using flags like
SLAB_TYPESAFE_BY_RCU where slabs are not always cleared before reuse.
I don't see reuse without reinitializing happening with the proc_inode
but I had a false alarm while reworking flushing of proc dentries and
inodes when a process dies that caused me to tidy this up.

The code is a little easier to follow and reason about this
way so I figured the changes might as well be kept.

Signed-off-by: "Eric W. Biederman" 
Cc:  # 4.9.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 920c761..739fb9c 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -32,21 +32,27 @@ static void proc_evict_inode(struct inode *inode)
 {
struct proc_dir_entry *de;
struct ctl_table_header *head;
+   struct proc_inode *ei = PROC_I(inode);
 
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
 
/* Stop tracking associated processes */
-   put_pid(PROC_I(inode)->pid);
+   if (ei->pid) {
+   put_pid(ei->pid);
+   ei->pid = NULL;
+   }
 
/* Let go of any associated proc directory entry */
-   de = PDE(inode);
-   if (de)
+   de = ei->pde;
+   if (de) {
pde_put(de);
+   ei->pde = NULL;
+   }
 
-   head = PROC_I(inode)->sysctl;
+   head = ei->sysctl;
if (head) {
-   RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
+   RCU_INIT_POINTER(ei->sysctl, NULL);
proc_sys_evict_inode(inode, head);
}
 }
-- 
1.8.3.1

[PATCH v2 4.9 10/10] proc: Use a list of inodes to flush from proc

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]

Rework the flushing of proc to use a list of directory inodes that
need to be flushed.

The list is kept on struct pid not on struct task_struct, as there is
a fixed connection between proc inodes and pids but at least for the
case of de_thread the pid of a task_struct changes.

This removes the dependency on proc_mnt which allows for different
mounts of proc having different mount options even in the same pid
namespace and this allows for the removal of proc_mnt which will
trivially allow the first mount of proc to honor its mount options.

This flushing remains an optimization.  The functions
pid_delete_dentry and pid_revalidate ensure that ordinary dcache
management will not attempt to use dentries past the point their
respective task has died.  When unused the shrinker will
eventually be able to remove these dentries.

There is a case in de_thread where proc_flush_pid can be
called early for a given pid, which winds up being
safe (if suboptimal) as this is just an optimization.

Only pid directories are put on the list as the other
per pid files are children of those directories and
d_invalidate on the directory will get them as well.

So that the pid can be used during flushing, its reference count is
taken in release_task and dropped in proc_flush_pid.  Further the call
of proc_flush_pid is moved after the tasklist_lock is released in
release_task so that it is certain that the pid has already been
unhashed when flushing is taking place.  This removes a small race
where a dentry could be recreated.

As struct pid is supposed to be small and I need a per-pid lock,
I reuse the only lock that currently exists in struct pid:
wait_pidfd.lock.

The net result is that this adds all of this functionality
with just a little extra list management overhead and
a single extra pointer in struct pid.
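
The bookkeeping can be pictured with a user-space model.  This is a
sketch under stated assumptions, not the kernel implementation: the
model_*() names are hypothetical, a pthread mutex stands in for
wait_pidfd.lock, a plain singly linked list stands in for the RCU
hlist, and setting a flag stands in for d_invalidate().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a proc inode. */
struct model_inode {
	struct model_inode *next;
	bool is_dir;
	bool invalidated;
};

/* Hypothetical stand-in for struct pid. */
struct model_pid {
	pthread_mutex_t lock;       /* plays the role of wait_pidfd.lock */
	struct model_inode *inodes; /* the single extra pointer */
};

/* Only directory inodes are remembered on the per-pid list. */
void model_make_inode(struct model_pid *pid, struct model_inode *ino,
		      bool is_dir)
{
	ino->next = NULL;
	ino->is_dir = is_dir;
	ino->invalidated = false;
	if (!is_dir)
		return;

	pthread_mutex_lock(&pid->lock);
	ino->next = pid->inodes;
	pid->inodes = ino;
	pthread_mutex_unlock(&pid->lock);
}

/* Drain the list one entry at a time; returns how many were flushed. */
int model_flush_pid(struct model_pid *pid)
{
	int flushed = 0;

	for (;;) {
		struct model_inode *ino;

		/* Unlink under the lock, do the real work outside it. */
		pthread_mutex_lock(&pid->lock);
		ino = pid->inodes;
		if (ino) {
			pid->inodes = ino->next;
			ino->next = NULL;
		}
		pthread_mutex_unlock(&pid->lock);

		if (!ino)
			break;
		ino->invalidated = true; /* stands in for d_invalidate() */
		flushed++;
	}
	return flushed;
}
```

The design point the model tries to show is that the lock is held only
while unlinking one entry, so the invalidation itself never runs under
the per-pid lock.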

v2: Initialize pid->inodes.  I somehow failed to get that
initialization into the initial version of the patch.  A boot
failure was reported by "kernel test robot ", and
failure to initialize pid->inodes matches all of the reported
symptoms.

Signed-off-by: Eric W. Biederman 
Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Fixes: 60347f6716aa ("pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees")
Cc:  # 4.9.x: b3e583825266: clone: add CLONE_PIDFD
Cc:  # 4.9.x: b53b0b9d9a61: pidfd: add polling support
Cc:  # 4.9.x: db978da8fa1d: proc: Pass file mode to proc_pid_make_inode
Cc:  # 4.9.x: 68eb94f16227: proc: Better ownership of files for non-dumpable tasks in user namespaces
Cc:  # 4.9.x: e3912ac37e07: proc: use %u for pid printing and slightly less stack
Cc:  # 4.9.x: 0afa5ca82212: proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
Cc:  # 4.9.x: 26dbc60f385f: proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
Cc:  # 4.9.x: 71448011ea2a: proc: Clear the pieces of proc_inode that proc_evict_inode cares about
Cc:  # 4.9.x: f90f3cafe8d5: Use d_invalidate in proc_prune_siblings_dcache
Cc:  # 4.9.x
(proc: fix up cherry-pick conflicts for 7bc3e6e55acf)
Signed-off-by: Wen Yang 
---
 fs/proc/base.c  | 111 
 fs/proc/inode.c |   2 +-
 fs/proc/internal.h  |   1 +
 include/linux/pid.h |   1 +
 include/linux/proc_fs.h |   4 +-
 kernel/exit.c   |   5 ++-
 kernel/pid.c|   1 +
 7 files changed, 45 insertions(+), 80 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3502a40..11caf35 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1728,11 +1728,25 @@ void task_dump_owner(struct task_struct *task, mode_t 
mode,
*rgid = gid;
 }
 
+void proc_pid_evict_inode(struct proc_inode *ei)
+{
+   struct pid *pid = ei->pid;
+
+   if (S_ISDIR(ei->vfs_inode.i_mode)) {
+   spin_lock(&pid->wait_pidfd.lock);
+   hlist_del_init_rcu(&ei->sibling_inodes);
+   spin_unlock(&pid->wait_pidfd.lock);
+   }
+
+   put_pid(pid);
+}
+
 struct inode *proc_pid_make_inode(struct super_block * sb,
  struct task_struct *task, umode_t mode)
 {
struct inode * inode;
struct proc_inode *ei;
+   struct pid *pid;
 
/* We need a new inode */
 
@@ -1750,10 +1764,18 @@ struct inode *proc_pid_make_inode(struct super_block * 
sb,
/*
 * grab the reference to task.
 */
-   ei->pid = get_task_pid(task, PIDTYPE_PID);
-   if (!ei->pid)
+   pid = get_task_pid(task, PIDTYPE_PID);
+   if (!pid)
goto out_unlock;
 
+   /* Let the pid remember us for quick removal */
+   ei->pid = pid;
+   if (S_ISDIR(mode)) {
+   spin_lock(&pid->wait_pidfd.lock);
+   hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+   spin_unlock(&pid->wait_pidfd.lock);
+   }
+
task_dump_owner(task, 0, &inode->i_uid,

[PATCH v2 4.9 01/10] clone: add CLONE_PIDFD

2021-01-06 Thread Wen Yang

From: Christian Brauner 

[ Upstream commit b3e5838252665ee4cfa76b82bdf1198dca81e5be ]

This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call.  Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call.  As
spotted by Linus, there is exactly one bit for clone() left.

CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api.  They serve as a simple opaque handle on pids.  Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending).  Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege.  This does not imply it
cannot ever be that way but for now this is not the case.

A pidfd comes with additional information in fdinfo if the kernel supports
procfs.  The fdinfo file contains the pid of the process in the caller's
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
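
A reader of that fdinfo file could pick the value out as follows.  This
is a minimal sketch, assuming the file content has already been read
into a NUL-terminated string; parse_fdinfo_pid() is a hypothetical
helper, and only the "Pid:\t%d" line format is taken from the text above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo text line by line and return the number following a
 * line that starts with "Pid:\t", or -1 if no such line exists.
 */
long parse_fdinfo_pid(const char *text)
{
	const char *line = text;

	while (line && *line) {
		long pid;

		if (strncmp(line, "Pid:\t", 5) == 0 &&
		    sscanf(line + 5, "%ld", &pid) == 1)
			return pid;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Matching only at the start of a line keeps the helper from being
confused by other fields whose names merely end in "Pid:".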

As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone.  This has the advantage that we can
give back the associated pid and the pidfd at the same time.

To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/.  The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Suggested-by: Linus Torvalds 
Signed-off-by: Christian Brauner 
Co-developed-by: Jann Horn 
Signed-off-by: Jann Horn 
Reviewed-by: Oleg Nesterov 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: David Howells 
Cc: "Michael Kerrisk (man-pages)" 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc:  # 4.9.x
(clone: fix up cherry-pick conflicts for b3e583825266)
Signed-off-by: Wen Yang 
---
 include/linux/pid.h|   1 +
 include/uapi/linux/sched.h |   1 +
 kernel/fork.c  | 105 +++--
 3 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 97b745d..7599a78 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -73,6 +73,7 @@ struct pid_link
struct hlist_node node;
struct pid *pid;
 };
+extern const struct file_operations pidfd_fops;
 
 static inline struct pid *get_pid(struct pid *pid)
 {
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 5f0fe01..ed6e31d 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -9,6 +9,7 @@
 #define CLONE_FS       0x0200  /* set if fs info shared between processes */
 #define CLONE_FILES    0x0400  /* set if open files shared between processes */
 #define CLONE_SIGHAND  0x0800  /* set if signal handlers and blocked signals shared */
+#define CLONE_PIDFD    0x1000  /* set if a pidfd should be placed in parent */
 #define CLONE_PTRACE   0x2000  /* set if we want to let tracing continue on the child too */
 #define CLONE_VFORK    0x4000  /* set if the parent wants the child to wake it up on mm_release */
 #define CLONE_PARENT   0x8000  /* set if we want to have the same parent as the cloner */
diff --git a/kernel/fork.c b/kernel/fork.c
index b64efec..4249f60 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -11,6 +11,7 @@
  * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -1460,6 +1461,58 @@ static void posix_cpu_timers_init(struct task_struct 
*tsk)
 task->pids[type].pid = pid;
 }
 
+static int pidfd_release(struct inode *inode, struct file *file)
+{
+   struct pid *pid = file->private_data;
+
+   file->private_data = NULL;
+   put_pid(pid);
+   return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct pid_namespace *ns = file_inode(m->file)->i_sb->s_fs_info;
+   struct pid *pid = f->private_data;
+
+   seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+   seq_putc(m, '\n');
+}
+#endif
+
+const struct file_operations pidfd_fops = {
+   .release = pidfd_release,
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo = pidfd_show_fdinfo,
+#endif
+};
+
+/**
+ * pidfd_create() - Create a new pid file descriptor.
+ *
+ * @pid:  struct pid that the pidfd will reference
+ *
+ * This creates a new pid file descriptor with the O_CLOEXEC flag set.
+ *
+ *

[PATCH v2 4.9 07/10] proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 26dbc60f385ff9cff475ea2a3bad02e80fd6fa43 ]

This prepares the way for allowing the pid part of proc to use this
dcache pruning code as well.

Signed-off-by: Eric W. Biederman 
Cc:  # 4.9.x
(proc: fix up cherry-pick conflicts for 26dbc60f385f)
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 38 ++
 fs/proc/internal.h|  1 +
 fs/proc/proc_sysctl.c | 35 +--
 3 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 14d9c1d..920c761 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -101,6 +101,44 @@ void __init proc_init_inodecache(void)
 init_once);
 }
 
+void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock)
+{
+   struct inode *inode;
+   struct proc_inode *ei;
+   struct hlist_node *node;
+   struct super_block *sb;
+
+   rcu_read_lock();
+   for (;;) {
+   node = hlist_first_rcu(inodes);
+   if (!node)
+   break;
+   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
+   spin_lock(lock);
+   hlist_del_init_rcu(&ei->sibling_inodes);
+   spin_unlock(lock);
+
+   inode = &ei->vfs_inode;
+   sb = inode->i_sb;
+   if (!atomic_inc_not_zero(&sb->s_active))
+   continue;
+   inode = igrab(inode);
+   rcu_read_unlock();
+   if (unlikely(!inode)) {
+   deactivate_super(sb);
+   rcu_read_lock();
+   continue;
+   }
+
+   d_prune_aliases(inode);
+   iput(inode);
+   deactivate_super(sb);
+
+   rcu_read_lock();
+   }
+   rcu_read_unlock();
+}
+
 static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
struct super_block *sb = root->d_sb;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 409b5c5..9bc44a1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -200,6 +200,7 @@ struct pde_opener {
 extern const struct inode_operations proc_pid_link_inode_operations;
 
 extern void proc_init_inodecache(void);
+void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock);
 extern struct inode *proc_get_inode(struct super_block *, struct 
proc_dir_entry *);
 extern int proc_fill_super(struct super_block *, void *data, int flags);
 extern void proc_entry_rundown(struct proc_dir_entry *);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 671490e..f19063b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -262,40 +262,7 @@ static void unuse_table(struct ctl_table_header *p)
 
 static void proc_sys_prune_dcache(struct ctl_table_header *head)
 {
-   struct inode *inode;
-   struct proc_inode *ei;
-   struct hlist_node *node;
-   struct super_block *sb;
-
-   rcu_read_lock();
-   for (;;) {
-   node = hlist_first_rcu(&head->inodes);
-   if (!node)
-   break;
-   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
-   spin_lock(&sysctl_lock);
-   hlist_del_init_rcu(&ei->sibling_inodes);
-   spin_unlock(&sysctl_lock);
-
-   inode = &ei->vfs_inode;
-   sb = inode->i_sb;
-   if (!atomic_inc_not_zero(&sb->s_active))
-   continue;
-   inode = igrab(inode);
-   rcu_read_unlock();
-   if (unlikely(!inode)) {
-   deactivate_super(sb);
-   rcu_read_lock();
-   continue;
-   }
-
-   d_prune_aliases(inode);
-   iput(inode);
-   deactivate_super(sb);
-
-   rcu_read_lock();
-   }
-   rcu_read_unlock();
+   proc_prune_siblings_dcache(&head->inodes, &sysctl_lock);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
-- 
1.8.3.1

[PATCH v2 4.9 06/10] proc: Rename in proc_inode rename sysctl_inodes sibling_inodes

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit 0afa5ca82212247456f9de1468b595a111fee633 ]

I am about to need and use the same functionality for pid based
inodes and there is no point in adding a second field when
this field is already here and serving the same purpose.

Just give the field a generic name so it is clear that
it is no longer sysctl specific.

Also for good measure initialize sibling_inodes when
proc_inode is initialized.
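
Why the initialization matters can be seen in a user-space replica of
the hashed-list node.  This is a simplified sketch, not the kernel
header: the hnode_*() names are hypothetical, and the RCU and
memory-ordering parts of the real helpers are left out.  An initialized
node that was never added reports itself as unhashed, so deleting it is
a harmless no-op rather than a write through a stale pointer.

```c
#include <stddef.h>

/* Simplified replica of a hashed-list node and head. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev; /* address of the pointer that points at us */
};

struct hhead {
	struct hnode *first;
};

void hnode_init(struct hnode *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

/* A node with no back pointer is not on any list. */
int hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

void hnode_add_head(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink the node if it is on a list, then leave it initialized. */
void hnode_del_init(struct hnode *n)
{
	if (hnode_unhashed(n))
		return;

	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	hnode_init(n);
}
```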

Signed-off-by: Eric W. Biederman 
Cc:  # 4.9.x
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 1 +
 fs/proc/internal.h| 2 +-
 fs/proc/proc_sysctl.c | 8 
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a289349..14d9c1d 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -67,6 +67,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
ei->pde = NULL;
ei->sysctl = NULL;
ei->sysctl_entry = NULL;
+   INIT_HLIST_NODE(&ei->sibling_inodes);
ei->ns_ops = NULL;
inode = &ei->vfs_inode;
return inode;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 103435f..409b5c5 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -65,7 +65,7 @@ struct proc_inode {
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
-   struct hlist_node sysctl_inodes;
+   struct hlist_node sibling_inodes;
const struct proc_ns_operations *ns_ops;
struct inode vfs_inode;
 };
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 191573a..671490e 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -272,9 +272,9 @@ static void proc_sys_prune_dcache(struct ctl_table_header 
*head)
node = hlist_first_rcu(&head->inodes);
if (!node)
break;
-   ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
+   ei = hlist_entry(node, struct proc_inode, sibling_inodes);
spin_lock(&sysctl_lock);
-   hlist_del_init_rcu(&ei->sysctl_inodes);
+   hlist_del_init_rcu(&ei->sibling_inodes);
spin_unlock(&sysctl_lock);
 
inode = &ei->vfs_inode;
@@ -480,7 +480,7 @@ static struct inode *proc_sys_make_inode(struct super_block 
*sb,
}
ei->sysctl = head;
ei->sysctl_entry = table;
-   hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
+   hlist_add_head_rcu(&ei->sibling_inodes, &head->inodes);
head->count++;
spin_unlock(&sysctl_lock);
 
@@ -511,7 +511,7 @@ static struct inode *proc_sys_make_inode(struct super_block 
*sb,
 void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
 {
spin_lock(&sysctl_lock);
-   hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
+   hlist_del_init_rcu(&PROC_I(inode)->sibling_inodes);
if (!--head->count)
kfree_rcu(head, rcu);
spin_unlock(&sysctl_lock);
-- 
1.8.3.1

[PATCH v2 4.9 09/10] proc: Use d_invalidate in proc_prune_siblings_dcache

2021-01-06 Thread Wen Yang

From: "Eric W. Biederman" 

[ Upstream commit f90f3cafe8d56d593fc509a4185da1d5800efea4 ]

The function d_prune_aliases has the problem that it will only prune
aliases that are completely unused.  It will not remove aliases for
the dcache or even think of removing mounts from the dcache.  For that
behavior d_invalidate is needed.

To use d_invalidate replace d_prune_aliases with d_find_alias followed
by d_invalidate and dput.

For completeness the directory and the non-directory cases are
separated because in theory (although not currently in practice for
proc) directories can only ever have a single dentry while
non-directories can have hardlinks and thus multiple dentries.
As part of this separation use d_find_any_alias for directories
to spare d_find_alias the extra work of doing that.

Plus the differences between d_find_any_alias and d_find_alias make
it clear why the directory and non-directory code cannot share code.

To make it clear these routines now invalidate dentries, rename
proc_prune_siblings_dcache to proc_invalidate_siblings_dcache, and rename
proc_sys_prune_dcache to proc_sys_invalidate_dcache.

V2: Split the directory and non-directory cases to make this
code robust to future changes in proc.

Signed-off-by: "Eric W. Biederman" 
Cc:  # 4.9.x
(proc: fix up cherry-pick conflicts for f90f3cafe8d5)
Signed-off-by: Wen Yang 
---
 fs/proc/inode.c   | 16 ++--
 fs/proc/internal.h|  2 +-
 fs/proc/proc_sysctl.c |  8 
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 739fb9c..2af9f4f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -107,7 +107,7 @@ void __init proc_init_inodecache(void)
 init_once);
 }
 
-void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock)
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t 
*lock)
 {
struct inode *inode;
struct proc_inode *ei;
@@ -136,7 +136,19 @@ void proc_prune_siblings_dcache(struct hlist_head *inodes, 
spinlock_t *lock)
continue;
}
 
-   d_prune_aliases(inode);
+   if (S_ISDIR(inode->i_mode)) {
+   struct dentry *dir = d_find_any_alias(inode);
+   if (dir) {
+   d_invalidate(dir);
+   dput(dir);
+   }
+   } else {
+   struct dentry *dentry;
+   while ((dentry = d_find_alias(inode))) {
+   d_invalidate(dentry);
+   dput(dentry);
+   }
+   }
iput(inode);
deactivate_super(sb);
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 9bc44a1..6a1d679 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -200,7 +200,7 @@ struct pde_opener {
 extern const struct inode_operations proc_pid_link_inode_operations;
 
 extern void proc_init_inodecache(void);
-void proc_prune_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock);
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t 
*lock);
 extern struct inode *proc_get_inode(struct super_block *, struct 
proc_dir_entry *);
 extern int proc_fill_super(struct super_block *, void *data, int flags);
 extern void proc_entry_rundown(struct proc_dir_entry *);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f19063b..b6668a5 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -260,9 +260,9 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
 }
 
-static void proc_sys_prune_dcache(struct ctl_table_header *head)
+static void proc_sys_invalidate_dcache(struct ctl_table_header *head)
 {
-   proc_prune_siblings_dcache(&head->inodes, &sysctl_lock);
+   proc_invalidate_siblings_dcache(&head->inodes, &sysctl_lock);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -284,10 +284,10 @@ static void start_unregistering(struct ctl_table_header 
*p)
spin_unlock(&sysctl_lock);
}
/*
-* Prune dentries for unregistered sysctls: namespaced sysctls
+* Invalidate dentries for unregistered sysctls: namespaced sysctls
 * can have duplicate names and contaminate dcache very badly.
 */
-   proc_sys_prune_dcache(p);
+   proc_sys_invalidate_dcache(p);
/*
 * do not remove from the list until nobody holds it; walking the
 * list in do_sysctl() relies on that.
-- 
1.8.3.1

[PATCH v2 4.9 03/10] proc: Pass file mode to proc_pid_make_inode

2021-01-06 Thread Wen Yang

From: Andreas Gruenbacher 

[ Upstream commit db978da8fa1d0819b210c137d31a339149b88875 ]

Pass the file mode of the proc inode to be created to
proc_pid_make_inode.  In proc_pid_make_inode, initialize inode->i_mode
before calling security_task_to_inode.  This allows selinux to set
isec->sclass right away without introducing "half-initialized" inode
security structs.

Signed-off-by: Andreas Gruenbacher 
Signed-off-by: Paul Moore 
Cc:  # 4.9.x
Signed-off-by: Wen Yang 
---
 fs/proc/base.c   | 23 +--
 fs/proc/fd.c |  6 ++
 fs/proc/internal.h   |  2 +-
 fs/proc/namespaces.c |  3 +--
 security/selinux/hooks.c |  1 +
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b9e4183..ee2e0ec 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1676,7 +1676,8 @@ static int proc_pid_readlink(struct dentry * dentry, char 
__user * buffer, int b
 
 /* building an inode */
 
-struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct 
*task)
+struct inode *proc_pid_make_inode(struct super_block * sb,
+ struct task_struct *task, umode_t mode)
 {
struct inode * inode;
struct proc_inode *ei;
@@ -1690,6 +1691,7 @@ struct inode *proc_pid_make_inode(struct super_block * 
sb, struct task_struct *t
 
/* Common stuff */
ei = PROC_I(inode);
+   inode->i_mode = mode;
inode->i_ino = get_next_ino();
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
inode->i_op = &proc_def_inode_operations;
@@ -2041,7 +2043,9 @@ struct map_files_info {
struct proc_inode *ei;
struct inode *inode;
 
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, S_IFLNK |
+   ((mode & FMODE_READ ) ? S_IRUSR : 0) |
+   ((mode & FMODE_WRITE) ? S_IWUSR : 0));
if (!inode)
return -ENOENT;
 
@@ -2050,12 +2054,6 @@ struct map_files_info {
 
inode->i_op = &proc_map_files_link_inode_operations;
inode->i_size = 64;
-   inode->i_mode = S_IFLNK;
-
-   if (mode & FMODE_READ)
-   inode->i_mode |= S_IRUSR;
-   if (mode & FMODE_WRITE)
-   inode->i_mode |= S_IWUSR;
 
d_set_d_op(dentry, &tid_map_files_dentry_operations);
d_add(dentry, inode);
@@ -2409,12 +2407,11 @@ static int proc_pident_instantiate(struct inode *dir,
struct inode *inode;
struct proc_inode *ei;
 
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, p->mode);
if (!inode)
goto out;
 
ei = PROC_I(inode);
-   inode->i_mode = p->mode;
if (S_ISDIR(inode->i_mode))
set_nlink(inode, 2);/* Use getattr to fix if necessary */
if (p->iop)
@@ -3096,11 +3093,10 @@ static int proc_pid_instantiate(struct inode *dir,
 {
struct inode *inode;
 
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, S_IFDIR | S_IRUGO | 
S_IXUGO);
if (!inode)
goto out;
 
-   inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
inode->i_op = &proc_tgid_base_inode_operations;
inode->i_fop = &proc_tgid_base_operations;
inode->i_flags|=S_IMMUTABLE;
@@ -3391,11 +3387,10 @@ static int proc_task_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
struct inode *inode;
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, S_IFDIR | S_IRUGO | 
S_IXUGO);
 
if (!inode)
goto out;
-   inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
inode->i_op = &proc_tid_base_inode_operations;
inode->i_fop = &proc_tid_base_operations;
inode->i_flags|=S_IMMUTABLE;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index d21dafe..4274f83 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -183,14 +183,13 @@ static int proc_fd_link(struct dentry *dentry, struct 
path *path)
struct proc_inode *ei;
struct inode *inode;
 
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, S_IFLNK);
if (!inode)
goto out;
 
ei = PROC_I(inode);
ei->fd = fd;
 
-   inode->i_mode = S_IFLNK;
inode->i_op = &proc_pid_link_inode_operations;
inode->i_size = 64;
 
@@ -322,14 +321,13 @@ int proc_fd_permission(struct inode *inode, int mask)
struct proc_inode *ei;
struct inode *inode;
 
-   inode = proc_pid_make_inode(dir->i_sb, task);
+   inode = proc_pid_make_inode(dir->i_sb, task, S_IFREG | S_IRUSR);
if (!inode)
goto out;
 
ei = PROC_I(inode);
ei->fd = fd;
 
-   inode->i_mode = S_IFREG | S_IRUSR;
inode->i_fop =

[PATCH v2 4.9 02/10] pidfd: add polling support

2021-01-06 Thread Wen Yang

From: "Joel Fernandes (Google)" 

[ Upstream commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24 ]

This patch adds polling support to pidfd.

Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and when it checks for the existence
of the PID, the wrong process may be checked for existence.
Using the polling support, LMK will be able to get notified when a process
exits in a race-free and fast way, and it can do other things
(such as polling on other fds) while waiting for the process being killed
to die.
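
From userspace, the behaviour described above can be exercised roughly as
follows. This is a sketch under assumptions, not part of the series: it
obtains the pidfd with pidfd_open(2), which exists in mainline since 5.3,
whereas this 4.9 backport exposes pidfds through CLONE_PIDFD. The helper name
pidfd_wait_demo and the fallback syscall number 434 (x86-64 and the generic
syscall table) are choices made for the sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* assumed: x86-64 / generic syscall table */
#endif

/*
 * Returns 1 if the child's exit was observed through poll() on its pidfd,
 * 0 if pidfd_open() is unavailable here, and -1 on any other error.
 */
static int pidfd_wait_demo(void)
{
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0) {
		pause();	/* child: sleep until it is killed */
		_exit(0);
	}

	int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
	if (pidfd < 0) {
		int unsupported = (errno == ENOSYS || errno == EPERM);

		kill(child, SIGKILL);
		waitpid(child, NULL, 0);
		return unsupported ? 0 : -1;
	}

	kill(child, SIGKILL);

	/* The pidfd becomes readable once the whole thread group has exited. */
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int ready;

	do {
		ready = poll(&pfd, 1, 5000);
	} while (ready < 0 && errno == EINTR);

	waitpid(child, NULL, 0);	/* reap the zombie after the notification */
	close(pidfd);
	return (ready == 1 && (pfd.revents & POLLIN)) ? 1 : -1;
}
```

The pidfd refers to the process itself rather than to a number, so a recycled
PID cannot be mistaken for the original target, and the same poll() call can
watch other descriptors at the same time.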

For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.

We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
   it in task_struct would not be an option in this case because the task can
   be reaped and destroyed before the poll returns.

2. Including the waitqueue in struct pid means that during
   de_thread(), the new thread group leader automatically gets the new
   waitqueue/pid even though its task_struct is different.

Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.

Cc: Andy Lutomirski 
Cc: Steven Rostedt 
Cc: Daniel Colascione 
Cc: Jann Horn 
Cc: Tim Murray 
Cc: Jonathan Kowalski 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: Kees Cook 
Cc: David Howells 
Cc: Oleg Nesterov 
Cc: kernel-t...@android.com
Reviewed-by: Oleg Nesterov 
Co-developed-by: Daniel Colascione 
Signed-off-by: Daniel Colascione 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Christian Brauner 
Cc:  # 4.9.x
(pidfd: fix up cherry-pick conflicts for b53b0b9d9a61)
Signed-off-by: Wen Yang 
---
 include/linux/pid.h |  3 +++
 kernel/fork.c   | 26 ++
 kernel/pid.c|  2 ++
 kernel/signal.c | 11 +++
 4 files changed, 42 insertions(+)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 7599a78..f5552ba 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -2,6 +2,7 @@
 #define _LINUX_PID_H
 
 #include 
+#include 
 
 enum pid_type
 {
@@ -62,6 +63,8 @@ struct pid
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
+   /* wait queue for pidfd notifications */
+   wait_queue_head_t wait_pidfd;
struct rcu_head rcu;
struct upid numbers[1];
 };
diff --git a/kernel/fork.c b/kernel/fork.c
index 4249f60..e3a4a14 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1481,8 +1481,34 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct 
file *f)
 }
 #endif
 
+/*
+ * Poll support for process exit notification.
+ */
+static unsigned int pidfd_poll(struct file *file, struct poll_table_struct 
*pts)
+{
+   struct task_struct *task;
+   struct pid *pid = file->private_data;
+   int poll_flags = 0;
+
+   poll_wait(file, &pid->wait_pidfd, pts);
+
+   rcu_read_lock();
+   task = pid_task(pid, PIDTYPE_PID);
+   /*
+* Inform pollers only when the whole thread group exits.
+* If the thread group leader exits before all other threads in the
+* group, then poll(2) should block, similar to the wait(2) family.
+*/
+   if (!task || (task->exit_state && thread_group_empty(task)))
+   poll_flags = POLLIN | POLLRDNORM;
+   rcu_read_unlock();
+
+   return poll_flags;
+}
+
 const struct file_operations pidfd_fops = {
.release = pidfd_release,
+   .poll = pidfd_poll,
 #ifdef CONFIG_PROC_FS
.show_fdinfo = pidfd_show_fdinfo,
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index fa704f8..e605398 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -333,6 +333,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
for (type = 0; type < PIDTYPE_MAX; ++type)
INIT_HLIST_HEAD(&pid->tasks[type]);
 
+   init_waitqueue_head(&pid->wait_pidfd);
+
upid = pid->numbers + ns->level;
spin_lock_irq(&pidmap_lock);
if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
diff --git a/kernel/signal.c b/kernel/signal.c
index bedca16..053de87a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1632,6 +1632,14 @@ int send_sigqueue(struct sigqueue *q, struct task_struct 
*t, int group)
return ret;
 }
 
+static void do_notify_pidfd(struct task_struct *task)
+{
+   struct pid *pid;
+
+   pid = task_pid(task);
+   wake_up_all(&pid->wait_pidfd);
+}
+
 /*
  * Let a parent know about the death of a child.
  * For a stopped/continued status change, use do_notify_parent_cldstop instead.
@@ -1655,6 +1663,9 @@ bool

[PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry

2021-01-06 Thread Wen Yang

Dentries such as /proc//ns/ have the DCACHE_OP_DELETE flag set, so they
should be deleted when the process exits.

Suppose the following race appears:

release_task dput 
-> proc_flush_task 
 -> dentry->d_op->d_delete(dentry) 
-> __exit_signal 
 -> dentry->d_lockref.count--  and return. 

In proc_flush_task(), the dentry is not deleted if another process is using
it. At the same time, in dput(), d_op->d_delete() can be executed
before __exit_signal() (pid has not been hashed), so d_delete returns false
and this dentry still cannot be deleted.

This dentry then stays cached (even though its count is 0 and the
DCACHE_OP_DELETE flag is set), as does its parent dentry, and
these dentries can only be deleted when drop_caches is manually triggered.

This results in wasted memory. What is more troublesome is that these
dentries hold a reference to the pid. Given commit f333c700c610 ("pidns: Add a
limit on the number of pid namespaces"), if the pid cannot be released, it
may become impossible to create a new pid_ns.

This issue was introduced by 60347f6716aa ("pid namespaces: prepare
proc_flust_task() to flush entries from multiple proc trees"), exposed by
f333c700c610 ("pidns: Add a limit on the number of pid namespaces"), and then
fixed by 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc").


Alexey Dobriyan (1):
  proc: use %u for pid printing and slightly less stack

Andreas Gruenbacher (1):
  proc: Pass file mode to proc_pid_make_inode

Christian Brauner (1):
  clone: add CLONE_PIDFD

Eric W. Biederman (6):
  proc: Better ownership of files for non-dumpable tasks in user
namespaces
  proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
  proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
  proc: Clear the pieces of proc_inode that proc_evict_inode cares about
  proc: Use d_invalidate in proc_prune_siblings_dcache
  proc: Use a list of inodes to flush from proc

Joel Fernandes (Google) (1):
  pidfd: add polling support

 fs/proc/base.c | 242 -
 fs/proc/fd.c   |  20 +---
 fs/proc/inode.c|  67 -
 fs/proc/internal.h |  22 ++---
 fs/proc/namespaces.c   |   3 +-
 fs/proc/proc_sysctl.c  |  45 ++---
 fs/proc/self.c |   6 +-
 fs/proc/thread_self.c  |   5 +-
 include/linux/pid.h|   5 +
 include/linux/proc_fs.h|   4 +-
 include/uapi/linux/sched.h |   1 +
 kernel/exit.c  |   5 +-
 kernel/fork.c  | 131 +++-
 kernel/pid.c   |   3 +
 kernel/signal.c|  11 +++
 security/selinux/hooks.c   |   1 +
 16 files changed, 343 insertions(+), 228 deletions(-)

-- 
1.8.3.1

Re: [PATCH 5/5] fs: use HKDF implementation from kernel crypto API

2021-01-06 Thread Stephan Mueller

Am Mittwoch, dem 06.01.2021 um 23:19 -0800 schrieb Eric Biggers:
> On Mon, Jan 04, 2021 at 10:50:49PM +0100, Stephan Müller wrote:
> > As the kernel crypto API implements HKDF, replace the
> > file-system-specific HKDF implementation with the generic HKDF
> > implementation.
> > 
> > Signed-off-by: Stephan Mueller 
> > ---
> >  fs/crypto/Kconfig   |   2 +-
> >  fs/crypto/fscrypt_private.h |   4 +-
> >  fs/crypto/hkdf.c    | 108 +---
> >  3 files changed, 30 insertions(+), 84 deletions(-)
> > 
> > diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> > index a5f5c30368a2..9450e958f1d1 100644
> > --- a/fs/crypto/Kconfig
> > +++ b/fs/crypto/Kconfig
> > @@ -2,7 +2,7 @@
> >  config FS_ENCRYPTION
> > bool "FS Encryption (Per-file encryption)"
> > select CRYPTO
> > -   select CRYPTO_HASH
> > +   select CRYPTO_HKDF
> > select CRYPTO_SKCIPHER
> > select CRYPTO_LIB_SHA256
> > select KEYS
> > diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
> > index 3fa965eb3336..0d6871838099 100644
> > --- a/fs/crypto/fscrypt_private.h
> > +++ b/fs/crypto/fscrypt_private.h
> > @@ -304,7 +304,7 @@ struct fscrypt_hkdf {
> > struct crypto_shash *hmac_tfm;
> >  };
> >  
> > -int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> > +int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, u8 *master_key,
> >   unsigned int master_key_size);
> 
> It shouldn't be necessary to remove const here.

Unfortunately it is necessary when adding the pointer to struct kvec.
> 
> >  
> >  /*
> > @@ -323,7 +323,7 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const
> > u8 *master_key,
> >  #define HKDF_CONTEXT_INODE_HASH_KEY7 /* info=   */
> >  
> >  int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
> > -   const u8 *info, unsigned int infolen,
> > +   u8 *info, unsigned int infolen,
> > u8 *okm, unsigned int okmlen);
> 
> Likewise.  In fact some callers rely on 'info' not being modified.

Same here.
> 
> > -/*
> > + *
> >   * Compute HKDF-Extract using the given master key as the input keying
> > material,
> >   * and prepare an HMAC transform object keyed by the resulting
> > pseudorandom key.
> >   *
> >   * Afterwards, the keyed HMAC transform object can be used for HKDF-
> > Expand many
> >   * times without having to recompute HKDF-Extract each time.
> >   */
> > -int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> > +int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, u8 *master_key,
> >   unsigned int master_key_size)
> >  {
> > +   /* HKDF-Extract (RFC 5869 section 2.2), unsalted */
> > +   const struct kvec seed[] = { {
> > +   .iov_base = NULL,
> > +   .iov_len = 0
> > +   }, {
> > +   .iov_base = master_key,
> > +   .iov_len = master_key_size
> > +   } };
> > struct crypto_shash *hmac_tfm;
> > -   u8 prk[HKDF_HASHLEN];
> > int err;
> >  
> > hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, 0);
> > @@ -74,16 +65,12 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const
> > u8 *master_key,
> > return PTR_ERR(hmac_tfm);
> > }
> >  
> > -   if (WARN_ON(crypto_shash_digestsize(hmac_tfm) != sizeof(prk))) {
> > +   if (WARN_ON(crypto_shash_digestsize(hmac_tfm) != HKDF_HASHLEN)) {
> > err = -EINVAL;
> > goto err_free_tfm;
> > }
> >  
> > -   err = hkdf_extract(hmac_tfm, master_key, master_key_size, prk);
> > -   if (err)
> > -   goto err_free_tfm;
> > -
> > -   err = crypto_shash_setkey(hmac_tfm, prk, sizeof(prk));
> > +   err = crypto_hkdf_setkey(hmac_tfm, seed, ARRAY_SIZE(seed));
> > if (err)
> > goto err_free_tfm;
> 
> It's weird that the salt and key have to be passed in a kvec.
> Why not just have normal function parameters like:
> 
> int crypto_hkdf_setkey(struct crypto_shash *hmac_tfm,
>    const u8 *key, size_t keysize,
>    const u8 *salt, size_t saltsize);

I wanted to have an identical interface for all types of KDFs to allow turning
them into a template eventually. For example, SP800-108 KDFs only have one
parameter. Hence the use of a kvec.
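
The seed convention discussed here (salt in the first entry, everything after
it treated as keying material) can be modelled in userspace. This is only a
sketch of the convention, not the proposed kernel API: it uses struct iovec in
place of struct kvec, and split_seed is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper modelling the seed convention: seed[0] is the salt
 * (possibly empty) and seed[1..n-1] are concatenated into the input keying
 * material. Returns the IKM length, or -1 if there is no salt entry or the
 * output buffer is too small.
 */
static long split_seed(const struct iovec *seed, size_t n,
		       const void **salt, size_t *salt_len,
		       unsigned char *ikm, size_t ikm_cap)
{
	size_t used = 0;

	if (n == 0)
		return -1;

	*salt = seed[0].iov_base;
	*salt_len = seed[0].iov_len;

	for (size_t i = 1; i < n; i++) {
		if (seed[i].iov_len > ikm_cap - used)
			return -1;
		if (seed[i].iov_len)
			memcpy(ikm + used, seed[i].iov_base, seed[i].iov_len);
		used += seed[i].iov_len;
	}
	return (long)used;
}
```

The base pointer in such a vector is a plain void pointer, so a
const-qualified buffer cannot be stored in it without a cast. That is the
constraint behind the dropped const qualifiers discussed earlier in this
reply.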

> 
> >  int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
> > -   const u8 *info, unsigned int infolen,
> > +   u8 *info, unsigned int infolen,
> > u8 *okm, unsigned int okmlen)
> >  {
> > -   SHASH_DESC_ON_STACK(desc, hkdf->hmac_tfm);
> > -   u8 prefix[9];
> > -   unsigned int i;
> > -   int err;
> > -   const u8 *prev = NULL;
> > -   u8 counter = 1;
> > -   u8 tmp[HKDF_HASHLEN];
> > -
> > -   if (WARN_ON(okmlen > 255 *

[PATCH v4 3/3] iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

2021-01-06 Thread Liu Yi L

iommu_flush_dev_iotlb() is called to invalidate caches on a device. It only
loops over the devices which are fully attached to the domain. For sub-devices,
this is ineffective and leaves invalid caching entries on the
device. Fix it by adding a loop over subdevices as well. Also,
domain->has_iotlb_device needs to be updated when attaching to subdevices.
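
The effect of the fix can be modelled outside the kernel. The sketch below is
an illustration only: the structures are simplified stand-ins for
device_domain_info and dmar_domain, arrays replace the kernel lists, and all
names are invented for the example. It shows why both the cached flag and the
flush loop have to consider the subdevice list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for device_domain_info; only what the sketch needs. */
struct dev_info {
	bool ats_enabled;
	unsigned int flushes;	/* device-TLB flushes issued to this device */
};

/* Simplified stand-in for dmar_domain with both attachment lists. */
struct domain {
	struct dev_info **devices;	/* fully attached devices */
	size_t ndevices;
	struct dev_info **subdevices;	/* parents of aux-attached subdevices */
	size_t nsubdevices;
	bool has_iotlb_device;
};

static bool any_ats(struct dev_info **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] && list[i]->ats_enabled)
			return true;
	return false;
}

/* Recompute the cached flag from both lists, not only the full-attach one. */
static void update_iotlb_flag(struct domain *d)
{
	d->has_iotlb_device = any_ats(d->devices, d->ndevices) ||
			      any_ats(d->subdevices, d->nsubdevices);
}

static void flush_list(struct dev_info **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] && list[i]->ats_enabled)
			list[i]->flushes++;
}

/* Flush every ATS-capable device reachable from the domain. */
static void flush_dev_iotlb(struct domain *d)
{
	if (!d->has_iotlb_device)
		return;
	flush_list(d->devices, d->ndevices);
	flush_list(d->subdevices, d->nsubdevices);
}
```

With only a subdevice attached, a flag computed from the full-attach list
alone stays false and the flush returns early, which is the stale-entry case
the commit message describes.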

Fixes: 67b8e02b5e761 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Signed-off-by: Liu Yi L 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 53 +++--
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d7720a8..65cf06d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -719,6 +719,8 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
return nid;
 }
 
+static void domain_update_iotlb(struct dmar_domain *domain);
+
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
@@ -744,6 +746,8 @@ static void domain_update_iommu_cap(struct dmar_domain 
*domain)
domain->domain.geometry.aperture_end = 
__DOMAIN_MAX_ADDR(domain->gaw - 1);
else
domain->domain.geometry.aperture_end = 
__DOMAIN_MAX_ADDR(domain->gaw);
+
+   domain_update_iotlb(domain);
 }
 
 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
@@ -1464,17 +1468,22 @@ static void domain_update_iotlb(struct dmar_domain 
*domain)
 
assert_spin_locked(&device_domain_lock);
 
-   list_for_each_entry(info, >devices, link) {
-   struct pci_dev *pdev;
-
-   if (!info->dev || !dev_is_pci(info->dev))
-   continue;
-
-   pdev = to_pci_dev(info->dev);
-   if (pdev->ats_enabled) {
+   list_for_each_entry(info, >devices, link)
+   if (info->ats_enabled) {
has_iotlb_device = true;
break;
}
+
+   if (!has_iotlb_device) {
+   struct subdev_domain_info *sinfo;
+
+   list_for_each_entry(sinfo, >subdevices, link_domain) {
+   info = get_domain_info(sinfo->pdev);
+   if (info && info->ats_enabled) {
+   has_iotlb_device = true;
+   break;
+   }
+   }
}
 
domain->has_iotlb_device = has_iotlb_device;
@@ -1555,25 +1564,37 @@ static void iommu_disable_dev_iotlb(struct 
device_domain_info *info)
 #endif
 }
 
+static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
+   u64 addr, unsigned int mask)
+{
+   u16 sid, qdep;
+
+   if (!info || !info->ats_enabled)
+   return;
+
+   sid = info->bus << 8 | info->devfn;
+   qdep = info->ats_qdep;
+   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+  qdep, addr, mask);
+}
+
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
  u64 addr, unsigned mask)
 {
-   u16 sid, qdep;
unsigned long flags;
struct device_domain_info *info;
+   struct subdev_domain_info *sinfo;
 
if (!domain->has_iotlb_device)
return;
 
spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry(info, &domain->devices, link) {
-   if (!info->ats_enabled)
-   continue;
+   list_for_each_entry(info, &domain->devices, link)
+   __iommu_flush_dev_iotlb(info, addr, mask);
 
-   sid = info->bus << 8 | info->devfn;
-   qdep = info->ats_qdep;
-   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-   qdep, addr, mask);
+   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
+   info = get_domain_info(sinfo->pdev);
+   __iommu_flush_dev_iotlb(info, addr, mask);
}
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
-- 
2.7.4

[PATCH v4 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info

2021-01-06 Thread Liu Yi L

In the existing code, looping over all devices attached to a domain does not
include sub-devices attached via iommu_aux_attach_device().

This was found while working on the patch below. There is no
device in the domain->devices list, so it is not possible to get the cap and
ecap of the iommu unit. However, the domain actually has a subdevice attached
in the auxiliary manner, which the domain->devices list does not track. This
patch fixes that.

https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/

This fix goes beyond the patch above: such sub-device tracking is
necessary for other cases as well, for example flushing the device_iotlb for a
domain which has sub-devices attached in the auxiliary manner.
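
The tracking scheme in this patch keeps one record per (domain, physical
device) pair with a user count: the first attach returns 1 and goes through
the full setup, later attaches only bump the count, and the record is freed
when the count drops to zero. A userspace sketch of that counting follows. It
is a simplified model with invented names (link_subdev, unlink_subdev) and a
single global list. It also checks the allocation result before use, which is
a choice made for the sketch.

```c
#include <stdlib.h>

/* One record per (domain, device) pair; pointers are used as opaque keys. */
struct subdev_info {
	const void *domain;
	const void *dev;
	int users;
	struct subdev_info *next;
};

static struct subdev_info *subdev_list;	/* single global list for brevity */

static struct subdev_info *lookup_subdev(const void *domain, const void *dev)
{
	for (struct subdev_info *s = subdev_list; s; s = s->next)
		if (s->domain == domain && s->dev == dev)
			return s;
	return NULL;
}

/* Returns the new user count (1 on first attach), or -1 if allocation fails. */
static int link_subdev(const void *domain, const void *dev)
{
	struct subdev_info *s = lookup_subdev(domain, dev);

	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return -1;
		s->domain = domain;
		s->dev = dev;
		s->next = subdev_list;
		subdev_list = s;
	}
	return ++s->users;
}

/* Returns the remaining count (0 means the record was freed), -1 if unknown. */
static int unlink_subdev(const void *domain, const void *dev)
{
	struct subdev_info **pp = &subdev_list;

	while (*pp && !((*pp)->domain == domain && (*pp)->dev == dev))
		pp = &(*pp)->next;
	if (!*pp)
		return -1;

	struct subdev_info *s = *pp;
	int left = --s->users;

	if (left == 0) {
		*pp = s->next;
		free(s);
	}
	return left;
}
```

A caller can use the return value the way aux_domain_add_dev does in the diff:
a result greater than 1 means another subdevice of the same physical device is
already attached, so the remaining setup steps can be skipped.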

Fixes: 67b8e02b5e761 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Co-developed-by: Xin Zeng 
Signed-off-by: Xin Zeng 
Signed-off-by: Liu Yi L 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 95 +
 include/linux/intel-iommu.h | 16 +---
 2 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 788119c..d7720a8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1877,6 +1877,7 @@ static struct dmar_domain *alloc_domain(int flags)
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
+   INIT_LIST_HEAD(&domain->subdevices);
 
return domain;
 }
@@ -2547,7 +2548,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
-   INIT_LIST_HEAD(>auxiliary_domains);
+   INIT_LIST_HEAD(>subdevices);
 
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -4475,33 +4476,61 @@ is_aux_domain(struct device *dev, struct iommu_domain 
*domain)
domain->type == IOMMU_DOMAIN_UNMANAGED;
 }
 
-static void auxiliary_link_device(struct dmar_domain *domain,
- struct device *dev)
+static inline struct subdev_domain_info *
+lookup_subdev_info(struct dmar_domain *domain, struct device *dev)
+{
+   struct subdev_domain_info *sinfo;
+
+   if (!list_empty(>subdevices)) {
+   list_for_each_entry(sinfo, >subdevices, link_domain) {
+   if (sinfo->pdev == dev)
+   return sinfo;
+   }
+   }
+
+   return NULL;
+}
+
+static int auxiliary_link_device(struct dmar_domain *domain,
+struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
+   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
 
assert_spin_locked(&device_domain_lock);
if (WARN_ON(!info))
-   return;
+   return -EINVAL;
+
+   if (!sinfo) {
+   sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
+   sinfo->domain = domain;
+   sinfo->pdev = dev;
+   list_add(>link_phys, >subdevices);
+   list_add(>link_domain, >subdevices);
+   }
 
-   domain->auxd_refcnt++;
-   list_add(>auxd, >auxiliary_domains);
+   return ++sinfo->users;
 }
 
-static void auxiliary_unlink_device(struct dmar_domain *domain,
-   struct device *dev)
+static int auxiliary_unlink_device(struct dmar_domain *domain,
+  struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
+   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
+   int ret;
 
assert_spin_locked(&device_domain_lock);
-   if (WARN_ON(!info))
-   return;
+   if (WARN_ON(!info || !sinfo || sinfo->users <= 0))
+   return -EINVAL;
 
-   list_del(>auxd);
-   domain->auxd_refcnt--;
+   ret = --sinfo->users;
+   if (!ret) {
+   list_del(>link_phys);
+   list_del(>link_domain);
+   kfree(sinfo);
+   }
 
-   if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   ioasid_put(domain->default_pasid);
+   return ret;
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -4530,6 +4559,19 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
}
 
spin_lock_irqsave(&device_domain_lock, flags);
+   ret = auxiliary_link_device(domain, dev);
+   if (ret <= 0)
+   goto link_failed;
+
+   /*
+* Subdevices from the same physical device can be attached to the
+* same domain. For such cases, only the first subdevice attachment
+* needs to go through the full steps in this function. So if ret >
+* 1, just goto out.
+*/
+   if (ret > 1)
+   goto out;
+
/*
 * iommu->lock must be held to attach

[PATCH v4 1/3] iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev

2021-01-06 Thread Liu Yi L

The current struct intel_svm has a field to record the struct intel_iommu
pointer for a PASID bind, and struct intel_svm is shared by all
the devices bound to the same process. Those devices may be behind different
DMAR units. As the iommu driver code uses the intel_iommu pointer stored
in struct intel_svm to do cache invalidations, it may only flush the cache
on a single DMAR unit; for the others, the cache invalidation is missed.

As intel_svm struct already has a device list, this patch just moves the
intel_iommu pointer to be a field of intel_svm_dev struct.
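
A stripped-down model of the data layout after this change is sketched below.
The types and the function name svm_flush_all are invented for illustration.
The point is only that the invalidation target is read from each bound device
rather than from the shared per-process structure.

```c
#include <stddef.h>

/* Stand-in for one DMAR unit; counts invalidations submitted to it. */
struct iommu_unit {
	unsigned int invalidations;
};

/* Per-device binding: carries the unit this device sits behind. */
struct svm_dev {
	struct iommu_unit *iommu;
	struct svm_dev *next;
};

/* Per-process structure shared by all bound devices. */
struct svm {
	struct svm_dev *devs;
};

/* Invalidate on the unit of every bound device, not on one shared unit. */
static void svm_flush_all(struct svm *svm)
{
	for (struct svm_dev *sdev = svm->devs; sdev; sdev = sdev->next)
		sdev->iommu->invalidations++;
}
```

With two devices behind two different units, both units receive an
invalidation, whereas a single pointer stored in the shared structure could
only ever name one of them.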

Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")
Cc: Lu Baolu 
Cc: Jacob Pan 
Cc: Raj Ashok 
Cc: David Woodhouse 
Reported-by: Guo Kaijie 
Reported-by: Xin Zeng 
Signed-off-by: Guo Kaijie 
Signed-off-by: Xin Zeng 
Signed-off-by: Liu Yi L 
Tested-by: Guo Kaijie 
Cc: sta...@vger.kernel.org # v5.0+
Acked-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c   | 9 +
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4fa248b..6956669 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm 
*svm, struct intel_svm_d
}
desc.qw2 = 0;
desc.qw3 = 0;
-   qi_submit_sync(svm->iommu, , 1, 0);
+   qi_submit_sync(sdev->iommu, , 1, 0);
 
if (sdev->dev_iotlb) {
desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
@@ -166,7 +166,7 @@ static void intel_flush_svm_range_dev (struct intel_svm 
*svm, struct intel_svm_d
}
desc.qw2 = 0;
desc.qw3 = 0;
-   qi_submit_sync(svm->iommu, , 1, 0);
+   qi_submit_sync(sdev->iommu, , 1, 0);
}
 }
 
@@ -211,7 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, 
struct mm_struct *mm)
 */
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list)
-   intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
+   intel_pasid_tear_down_entry(sdev->iommu, sdev->dev,
svm->pasid, true);
rcu_read_unlock();
 
@@ -363,6 +363,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
}
sdev->dev = dev;
sdev->sid = PCI_DEVID(info->bus, info->devfn);
+   sdev->iommu = iommu;
 
/* Only count users if device has aux domains */
if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
@@ -546,6 +547,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
goto out;
}
sdev->dev = dev;
+   sdev->iommu = iommu;
 
ret = intel_iommu_enable_pasid(iommu, dev);
if (ret) {
@@ -575,7 +577,6 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
kfree(sdev);
goto out;
}
-   svm->iommu = iommu;
 
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d956987..9452268 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -758,6 +758,7 @@ struct intel_svm_dev {
struct list_head list;
struct rcu_head rcu;
struct device *dev;
+   struct intel_iommu *iommu;
struct svm_dev_ops *ops;
struct iommu_sva sva;
u32 pasid;
@@ -771,7 +772,6 @@ struct intel_svm {
struct mmu_notifier notifier;
struct mm_struct *mm;
 
-   struct intel_iommu *iommu;
unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
-- 
2.7.4
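
A minimal user-space sketch of the data-structure change in this patch, assuming (the thread excerpt here does not spell this out) that devices bound to the same SVM context may sit behind different IOMMU units. The types mock_iommu, mock_svm_dev, mock_svm and the helpers are hypothetical stand-ins, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct mock_iommu {
	int id;
	int flushes;			/* invalidations this unit received */
};

struct mock_svm_dev {
	struct mock_svm_dev *next;
	struct mock_iommu *iommu;	/* per-device, as in the patch */
};

struct mock_svm {
	struct mock_svm_dev *devs;	/* all devices bound to this context */
};

/* Bind a device and record the IOMMU unit it sits behind. */
void mock_bind(struct mock_svm *svm, struct mock_svm_dev *sdev,
	       struct mock_iommu *iommu)
{
	assert(svm && sdev && iommu);
	sdev->iommu = iommu;
	sdev->next = svm->devs;
	svm->devs = sdev;
}

/* Flush every bound device through its own IOMMU unit. */
void mock_flush_all(struct mock_svm *svm)
{
	struct mock_svm_dev *sdev;

	for (sdev = svm->devs; sdev; sdev = sdev->next)
		sdev->iommu->flushes++;
}
```

With a single pointer stored in the shared context, only one unit could be reached; keeping the pointer per device lets each flush go to the right unit.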

[PATCH v4 0/3] iommu/vt-d: Misc fixes on scalable mode

2021-01-06 Thread Liu Yi L

Hi Baolu, Joerg, Will,

This patchset fixes a bug in native SVM usage, and also two bugs around
subdevices (attached to a device in the auxiliary manner): subdevice
tracking and an ineffective device TLB flush.

v3 -> v4:
- Address comments from Baolu Lu and add acked-by
- Fix issue reported by "Dan Carpenter" and "kernel test robot"
- Add tested-by from Guo Kaijie on patch 1/3
- Rebase to 5.11-rc2
v3: 
https://lore.kernel.org/linux-iommu/20201229032513.486395-1-yi.l@intel.com/

v2 -> v3:
- Address comments from Baolu Lu against v2
- Rebased to 5.11-rc1
v2: 
https://lore.kernel.org/linux-iommu/20201223062720.29364-1-yi.l@intel.com/

v1 -> v2:
- Use a more recent Fix tag in "iommu/vt-d: Move intel_iommu info from struct 
intel_svm to struct intel_svm_dev"
- Refined the "iommu/vt-d: Track device aux-attach with subdevice_domain_info"
- Rename "iommu/vt-d: A fix to iommu_flush_dev_iotlb() for aux-domain" to be
  "iommu/vt-d: Fix ineffective devTLB invalidation for subdevices"
- Refined the commit messages
v1: 
https://lore.kernel.org/linux-iommu/2020122352.183523-1-yi.l@intel.com/

Regards,
Yi Liu

Liu Yi L (3):
  iommu/vt-d: Move intel_iommu info from struct intel_svm to struct
intel_svm_dev
  iommu/vt-d: Track device aux-attach with subdevice_domain_info
  iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

 drivers/iommu/intel/iommu.c | 148 
 drivers/iommu/intel/svm.c   |   9 +--
 include/linux/intel-iommu.h |  18 --
 3 files changed, 125 insertions(+), 50 deletions(-)

-- 
2.7.4

Re: [RFC PATCH kernel] block: initialize block_device::bd_bdi for bdev_cache

2021-01-06 Thread Christoph Hellwig

On Thu, Jan 07, 2021 at 10:58:39AM +1100, Alexey Kardashevskiy wrote:
>> And AFAICT the root inode on
>> bdev superblock can get only to bdev_evict_inode() and bdev_free_inode().
>> Looking at bdev_evict_inode() the only thing that's used there from struct
>> block_device is really bd_bdi. bdev_free_inode() will also access
>> bdev->bd_stats and bdev->bd_meta_info. So we need to at least initialize
>> these to NULL as well.
>
> These are all NULL.
>
>> IMO the most logical place for all these
>> initializations is in bdev_alloc_inode()...
>
>
> This works. We can also check for NULL where it crashes. But I do not know 
> the code to make an informed decision...

The root inode is the special case, so I think moving the initializers
for everything touched in ->evict_inode and ->free_inode to
bdev_alloc_inode makes most sense.

Alexey, do you want to respin or should I send a patch?
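
A minimal user-space sketch of the idea discussed above: initialize in the allocation path every field that the teardown paths read, so that an object which never went through full setup can still be torn down safely. The names mock_bdev, mock_bdev_alloc and mock_bdev_free are hypothetical, and the real patch may pick a non-NULL default for some fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct block_device. */
struct mock_bdev {
	void *bdi;
	void *stats;
	void *meta_info;
};

/* Counts teardown steps that ran; for illustration only. */
struct mock_teardown_log {
	int released;
};

/*
 * Allocation initializes every field that the evict/free paths read,
 * so the special-case object (the root inode in the thread) does not
 * expose uninitialized pointers to teardown.
 */
struct mock_bdev *mock_bdev_alloc(void)
{
	struct mock_bdev *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->bdi = NULL;
	b->stats = NULL;
	b->meta_info = NULL;
	return b;
}

/* Teardown only acts on what was actually set up. */
void mock_bdev_free(struct mock_bdev *b, struct mock_teardown_log *log)
{
	assert(b && log);
	if (b->bdi)
		log->released++;	/* a real free path would release it here */
	if (b->stats)
		log->released++;
	if (b->meta_info)
		log->released++;
	free(b);
}
```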

[PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in __ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
outstanding commands by waiting a bit for IO completion, which avoids errors
like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and 
free TMFs")
Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err handler")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 35 +++
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..340dd5e515dd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
 
+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS(3) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */
 
@@ -5826,6 +5829,7 @@ static void ufshcd_err_handler(struct work_struct *work)
int err = 0, pmc_err;
int tag;
bool needs_reset = false, needs_restore = false;
+   ktime_t start;
 
hba = container_of(work, struct ufs_hba, eh_work);
 
@@ -5911,6 +5915,22 @@ static void ufshcd_err_handler(struct work_struct *work)
}
 
hba->silence_err_logs = true;
+
+   /* Wait for IO completion for non-fatal errors to avoid aborting IOs */
+   start = ktime_get();
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
/* release lock as clear command might sleep */
spin_unlock_irqrestore(hba->host->host_lock, flags);
/* Clear pending transfer requests */
@@ -6302,9 +6322,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6372,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = 
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog
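
The bounded wait that the patch above adds to ufshcd_err_handler() can be sketched in user space as a poll loop with a deadline. This is only an illustration under assumptions: wait_for_drain(), drained_fn and countdown_drained() are hypothetical names, and the loop spins instead of dropping a lock and calling schedule() as the kernel code does.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical poll callback: returns true once all work has drained. */
typedef bool (*drained_fn)(void *ctx);

static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll until drained() reports completion or timeout_ms elapses.
 * Returns true if the work drained, false on timeout.
 */
bool wait_for_drain(drained_fn drained, void *ctx, int64_t timeout_ms)
{
	int64_t start = now_ms();

	assert(drained);
	while (!drained(ctx)) {
		if (now_ms() - start > timeout_ms)
			return false;
		/* The kernel patch unlocks and calls schedule() at this point. */
	}
	return true;
}

/* Example callback: each poll completes one outstanding request. */
bool countdown_drained(void *ctx)
{
	int *outstanding = ctx;

	if (*outstanding > 0)
		(*outstanding)--;
	return *outstanding == 0;
}
```

Measuring elapsed time against a monotonic clock keeps the bound valid even if the wall clock is adjusted while waiting.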

[PATCH v4 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit, the
error handler gets stuck in ufshcd_clear_ua_wluns():

 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
     ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

To avoid this, call ufshcd_clear_ua_wluns() from each recovery flow instead,
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(>eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba 
*hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
if (ret)
goto out;
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Initialize devfreq after UFS device is detected */
if (ufshcd_is_clkscaling_supported(hba)) {
memcpy(>clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t 
cookie)
pm_runtime_put_sync(hba->dev);
ufshcd_exit_clk_scaling(hba);
ufshcd_hba_exit(hba);
-   } else {
-   ufshcd_clear_ua_wluns(hba);
}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
cancel_delayed_work(>rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog
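
A toy model of the ordering problem this patch addresses, as I read the commit message: a request issued while requests are blocked cannot make progress, so the follow-up step has to run after the unblock. The names mock_host, mock_submit and mock_recover are hypothetical, and the model returns an error where the real code would hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a host whose request queue can be blocked. */
struct mock_host {
	bool blocked;		/* true between block/unblock in recovery */
	int completed;		/* requests that completed */
};

/*
 * Submit one request. In this model a request issued while the queue
 * is blocked can never complete, which stands in for the hang in the
 * commit message. Returns 0 on completion, -1 if it would be stuck.
 */
int mock_submit(struct mock_host *h)
{
	assert(h);
	if (h->blocked)
		return -1;
	h->completed++;
	return 0;
}

/*
 * Recovery flow with the follow-up request issued only after the
 * queue has been unblocked, mirroring the ordering in the patch.
 */
int mock_recover(struct mock_host *h)
{
	assert(h);
	h->blocked = true;	/* block requests */
	/* ... reset and restore would happen here ... */
	h->blocked = false;	/* unblock requests */
	return mock_submit(h);	/* now safe to issue the follow-up */
}
```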

[PATCH v4 0/2] UFS bug fixes

2021-01-06 Thread Jaegeuk Kim

Change log from v3:
 - move ufshcd_clear_ua_wluns() after ufshcd_scsi_add_wlus()
 - remove BLK_MQ_REQ_RESERVED for tm tag
 - move IO wait to cover all the non-fatal errors

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 14:57, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > > > When gate_work/ungate_work gets an error during hibern8_enter or exit,
> > > >  ufshcd_err_handler()
> > > >ufshcd_scsi_block_requests()
> > > >ufshcd_reset_and_restore()
> > > >  ufshcd_clear_ua_wluns() -> stuck
> > > >ufshcd_scsi_unblock_requests()
> > > >
> > > > In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery
> > > > flows
> > > > such as suspend/resume, link_recovery, and error_handler.
> > > >
> > > > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd
> > > > resets")
> > > > Signed-off-by: Jaegeuk Kim 
> > > > ---
> > > >  drivers/scsi/ufs/ufshcd.c | 15 ++-
> > > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index bedb822a40a3..1678cec08b51 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
> > > > if (ret)
> > > > dev_err(hba->dev, "%s: link recovery failed, err %d",
> > > > __func__, ret);
> > > > +   else
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > 
> > > Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?
> > 
> > May I ask the reason? We'll call it after ufshcd_add_lus() later tho.
> > 
> 
> I think the code will be more readable - we do all the LU related
> stuffs in one func, just nit-picking though. I found this because
> I am planning to move the devfreq init codes out of ufshcd_add_lus()
> due to it is inappropriate to init devfreq in there by its naming,
> but it might be a good place for ufshcd_clear_ua_wluns().

Ok, that looks good to me. Thanks.

> 
> Thanks,
> Can Guo.
> 
> > > 
> > > Thanks,
> > > Can Guo.
> > > 
> > > >
> > > > return ret;
> > > >  }
> > > > @@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > ufshcd_scsi_unblock_requests(hba);
> > > > ufshcd_err_handling_unprepare(hba);
> > > > up(>eh_sem);
> > > > +
> > > > +   if (!err && needs_reset)
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -6940,14 +6945,11 @@ static int
> > > > ufshcd_host_reset_and_restore(struct ufs_hba *hba)
> > > > ufshcd_set_clk_freq(hba, true);
> > > >
> > > > err = ufshcd_hba_enable(hba);
> > > > -   if (err)
> > > > -   goto out;
> > > >
> > > > /* Establish the link again and restore the device */
> > > > -   err = ufshcd_probe_hba(hba, false);
> > > > if (!err)
> > > > -   ufshcd_clear_ua_wluns(hba);
> > > > -out:
> > > > +   err = ufshcd_probe_hba(hba, false);
> > > > +
> > > > if (err)
> > > > dev_err(hba->dev, "%s: Host init failed %d\n", 
> > > > __func__, err);
> > > > ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
> > > > @@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
> > > > enum ufs_pm_op pm_op)
> > > > ufshcd_resume_clkscaling(hba);
> > > > hba->clk_gating.is_suspended = false;
> > > > hba->dev_info.b_rpm_dev_flush_capable = false;
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > > ufshcd_release(hba);
> > > >  out:
> > > > if (hba->dev_info.b_rpm_dev_flush_capable) {
> > > > @@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba,
> > > > enum ufs_pm_op pm_op)
> > > > cancel_delayed_work(>rpm_dev_flush_recheck_work);
> > > > }
> > > >
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > > +
> > > > /* Schedule clock gating in case of no access to UFS device yet 
> > > > */
> > > > ufshcd_release(hba);

[PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry

2021-01-06 Thread Wen Yang

The dentries such as /proc//ns/ have the DCACHE_OP_DELETE flag, so they
should be deleted when the process exits.

Suppose the following race appears:

release_task                dput
-> proc_flush_task
                            -> dentry->d_op->d_delete(dentry)
-> __exit_signal
                            -> dentry->d_lockref.count--  and return.

In proc_flush_task(), if another process is using this dentry, it will
not be deleted. At the same time, in dput(), d_op->d_delete() can be executed
before __exit_signal() (pid has not been hashed), in which case d_delete
returns false, so this dentry still cannot be deleted.

This dentry will then stay cached (although its count is 0 and the
DCACHE_OP_DELETE flag is set), and so will its parent dentry; these
dentries can only be deleted when drop_caches is manually triggered.

This wastes memory. What's more troublesome is that these dentries
reference the pid: according to commit f333c700c610 ("pidns: Add a limit on
the number of pid namespaces"), if the pid cannot be released, it may become
impossible to create a new pid_ns.

This issue was introduced by 60347f6716aa ("pid namespaces: prepare
proc_flust_task() to flush entries from multiple proc trees"), exposed by
f333c700c610 ("pidns: Add a limit on the number of pid namespaces"), and then
fixed by 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc").
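
The interleaving described above can be replayed with a small single-threaded model. This is a sketch under assumptions, not the kernel's dcache: mock_dentry, mock_task and the helpers are hypothetical, and the model assumes that d_delete asks for deletion only once the exit path has detached the pid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the kernel's dcache. */
struct mock_dentry {
	int count;		/* references held */
	bool cached;		/* still present in the cache */
};

struct mock_task {
	bool pid_detached;	/* set by the __exit_signal step */
};

/* Models d_delete(): asks for deletion only once the pid is gone. */
bool mock_d_delete(const struct mock_task *t)
{
	return t->pid_detached;
}

/* Models the flush on exit: skips a dentry that is still in use. */
void mock_flush(struct mock_dentry *d)
{
	if (d->count == 0)
		d->cached = false;
}

/* Models dput(): consult d_delete, then drop the reference. */
void mock_dput(struct mock_dentry *d, const struct mock_task *t)
{
	assert(d->count > 0);
	if (mock_d_delete(t))
		d->cached = false;
	d->count--;
}

/* Runs the steps in the given order; returns whether the dentry stays cached. */
bool mock_run(bool dput_before_exit_signal)
{
	struct mock_dentry d = { 1, true };	/* another process holds a ref */
	struct mock_task t = { false };

	mock_flush(&d);			/* proc_flush_task step: dentry busy */
	if (dput_before_exit_signal) {
		mock_dput(&d, &t);	/* d_delete says "keep" */
		t.pid_detached = true;	/* __exit_signal step */
	} else {
		t.pid_detached = true;
		mock_dput(&d, &t);	/* d_delete says "delete" */
	}
	return d.cached;
}
```

In the racy order the dentry ends with a count of 0 but is still cached, which matches the leak described in the cover letter.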


Alexey Dobriyan (1):
  proc: use %u for pid printing and slightly less stack

Andreas Gruenbacher (1):
  proc: Pass file mode to proc_pid_make_inode

Christian Brauner (1):
  clone: add CLONE_PIDFD

Eric W. Biederman (6):
  proc: Better ownership of files for non-dumpable tasks in user
namespaces
  proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
  proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
  proc: Clear the pieces of proc_inode that proc_evict_inode cares about
  proc: Use d_invalidate in proc_prune_siblings_dcache
  proc: Use a list of inodes to flush from proc

Joel Fernandes (Google) (1):
  pidfd: add polling support

 fs/proc/base.c | 242 -
 fs/proc/fd.c   |  20 +---
 fs/proc/inode.c|  67 -
 fs/proc/internal.h |  22 ++---
 fs/proc/namespaces.c   |   3 +-
 fs/proc/proc_sysctl.c  |  45 ++---
 fs/proc/self.c |   6 +-
 fs/proc/thread_self.c  |   5 +-
 include/linux/pid.h|   5 +
 include/linux/proc_fs.h|   4 +-
 include/uapi/linux/sched.h |   1 +
 kernel/exit.c  |   5 +-
 kernel/fork.c  | 131 +++-
 kernel/pid.c   |   3 +
 kernel/signal.c|  11 +++
 security/selinux/hooks.c   |   1 +
 16 files changed, 343 insertions(+), 228 deletions(-)

-- 
1.8.3.1

Re: [PATCH V4] scsi: ufs-debugfs: Add error counters

2021-01-06 Thread Can Guo


On 2021-01-07 15:25, Adrian Hunter wrote:

People testing have a need to know how many errors might be occurring
over time. Add error counters and expose them via debugfs.

A module initcall is used to create a debugfs root directory for
ufshcd-related items. In the case that modules are built-in, then
initialization is done in link order, so move ufshcd-core to the top of
the Makefile.
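
The link-order point can be illustrated with a table of init functions run strictly in order, the way built-in initcalls at the same level run in link order. This is a user-space sketch with hypothetical names (mock_core_init, mock_vendor_init, mock_run_initcalls); in the real driver the consequence of a wrong order may differ from the hard failure modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory created by the core's init call. */
static const char *mock_root;

/* Number of vendor drivers that found the root ready at init time. */
static int mock_vendor_ok;

int mock_core_init(void)
{
	mock_root = "ufshcd";
	return 0;
}

int mock_vendor_init(void)
{
	if (!mock_root)
		return -1;	/* core has not initialized yet */
	mock_vendor_ok++;
	return 0;
}

typedef int (*mock_initcall_t)(void);

/*
 * Run init calls strictly in array order and return the number of
 * failures. State is reset first so each run is independent.
 */
int mock_run_initcalls(const mock_initcall_t *calls, size_t n)
{
	size_t i;
	int failures = 0;

	assert(calls);
	mock_root = NULL;
	mock_vendor_ok = 0;
	for (i = 0; i < n; i++)
		if (calls[i]() != 0)
			failures++;
	return failures;
}
```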

Signed-off-by: Adrian Hunter 
Reviewed-by: Avri Altman 
Reviewed-by: Bean Huo 


Reviewed-by: Can Guo 


---


Changes in V4:
Added Reviewed-by: Bean Huo 

Changes in V3:
Fixed link order to ensure correct initcall ordering when
modules are built-in.
Amended commit message accordingly.

Changes in V2:
Add missing '#include "ufs-debugfs.h"' in ufs-debugfs.c
Reported-by: kernel test robot 


 drivers/scsi/ufs/Makefile  | 13 +---
 drivers/scsi/ufs/ufs-debugfs.c | 56 ++
 drivers/scsi/ufs/ufs-debugfs.h | 22 +
 drivers/scsi/ufs/ufshcd.c  | 19 
 drivers/scsi/ufs/ufshcd.h  |  5 +++
 5 files changed, 111 insertions(+), 4 deletions(-)
 create mode 100644 drivers/scsi/ufs/ufs-debugfs.c
 create mode 100644 drivers/scsi/ufs/ufs-debugfs.h

diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
index 4679af1b564e..06f3a3fe4a44 100644
--- a/drivers/scsi/ufs/Makefile
+++ b/drivers/scsi/ufs/Makefile
@@ -1,5 +1,14 @@
 # SPDX-License-Identifier: GPL-2.0
 # UFSHCD makefile
+
+# The link order is important here. ufshcd-core must initialize
+# before vendor drivers.
+obj-$(CONFIG_SCSI_UFSHCD)  += ufshcd-core.o
+ufshcd-core-y  += ufshcd.o ufs-sysfs.o
+ufshcd-core-$(CONFIG_DEBUG_FS) += ufs-debugfs.o
+ufshcd-core-$(CONFIG_SCSI_UFS_BSG) += ufs_bsg.o
+ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO)  += ufshcd-crypto.o
+
 obj-$(CONFIG_SCSI_UFS_DWC_TC_PCI) += tc-dwc-g210-pci.o ufshcd-dwc.o
tc-dwc-g210.o
 obj-$(CONFIG_SCSI_UFS_DWC_TC_PLATFORM) += tc-dwc-g210-pltfrm.o
ufshcd-dwc.o tc-dwc-g210.o
 obj-$(CONFIG_SCSI_UFS_CDNS_PLATFORM) += cdns-pltfrm.o
@@ -7,10 +16,6 @@ obj-$(CONFIG_SCSI_UFS_QCOM) += ufs_qcom.o
 ufs_qcom-y += ufs-qcom.o
 ufs_qcom-$(CONFIG_SCSI_UFS_CRYPTO) += ufs-qcom-ice.o
 obj-$(CONFIG_SCSI_UFS_EXYNOS) += ufs-exynos.o
-obj-$(CONFIG_SCSI_UFSHCD) += ufshcd-core.o
-ufshcd-core-y  += ufshcd.o ufs-sysfs.o
-ufshcd-core-$(CONFIG_SCSI_UFS_BSG) += ufs_bsg.o
-ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO) += ufshcd-crypto.o
 obj-$(CONFIG_SCSI_UFSHCD_PCI) += ufshcd-pci.o
 obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o
 obj-$(CONFIG_SCSI_UFS_HISI) += ufs-hisi.o
diff --git a/drivers/scsi/ufs/ufs-debugfs.c b/drivers/scsi/ufs/ufs-debugfs.c
new file mode 100644
index ..dee98dc72d29
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-debugfs.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 Intel Corporation
+
+#include 
+
+#include "ufs-debugfs.h"
+#include "ufshcd.h"
+
+static struct dentry *ufs_debugfs_root;
+
+void __init ufs_debugfs_init(void)
+{
+   ufs_debugfs_root = debugfs_create_dir("ufshcd", NULL);
+}
+
+void __exit ufs_debugfs_exit(void)
+{
+   debugfs_remove_recursive(ufs_debugfs_root);
+}
+
+static int ufs_debugfs_stats_show(struct seq_file *s, void *data)
+{
+   struct ufs_hba *hba = s->private;
+   struct ufs_event_hist *e = hba->ufs_stats.event;
+
+#define PRT(fmt, typ) \
+   seq_printf(s, fmt, e[UFS_EVT_ ## typ].cnt)
+
+   PRT("PHY Adapter Layer errors (except LINERESET): %llu\n", PA_ERR);
+   PRT("Data Link Layer errors: %llu\n", DL_ERR);
+   PRT("Network Layer errors: %llu\n", NL_ERR);
+   PRT("Transport Layer errors: %llu\n", TL_ERR);
+   PRT("Generic DME errors: %llu\n", DME_ERR);
+   PRT("Auto-hibernate errors: %llu\n", AUTO_HIBERN8_ERR);
+   PRT("IS Fatal errors (CEFES, SBFES, HCFES, DFES): %llu\n", FATAL_ERR);
+   PRT("DME Link Startup errors: %llu\n", LINK_STARTUP_FAIL);
+   PRT("PM Resume errors: %llu\n", RESUME_ERR);
+   PRT("PM Suspend errors : %llu\n", SUSPEND_ERR);
+   PRT("Logical Unit Resets: %llu\n", DEV_RESET);
+   PRT("Host Resets: %llu\n", HOST_RESET);
+   PRT("SCSI command aborts: %llu\n", ABORT);
+#undef PRT
+   return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(ufs_debugfs_stats);
+
+void ufs_debugfs_hba_init(struct ufs_hba *hba)
+{
+   hba->debugfs_root = debugfs_create_dir(dev_name(hba->dev), ufs_debugfs_root);
+   debugfs_create_file("stats", 0400, hba->debugfs_root, hba,
_debugfs_stats_fops);
+}
+
+void ufs_debugfs_hba_exit(struct ufs_hba *hba)
+{
+   debugfs_remove_recursive(hba->debugfs_root);
+}
diff --git a/drivers/scsi/ufs/ufs-debugfs.h b/drivers/scsi/ufs/ufs-debugfs.h
new file mode 100644
index ..f35b39c4b4f5
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-debugfs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation
+ */
+
+#ifndef __UFS_DEBUGFS_H__
+#define

Re: [PATCH 3/5] crypto: add RFC5869 HKDF

2021-01-06 Thread Eric Biggers

On Mon, Jan 04, 2021 at 10:49:13PM +0100, Stephan Müller wrote:
> RFC5869 specifies an extract and expand two-step key derivation
> function. The HKDF implementation is provided as a service function that
> operates on a caller-provided HMAC cipher handle.

HMAC isn't a "cipher".

> The extract function is invoked via the crypto_hkdf_setkey call.

Any reason not to call this crypto_hkdf_extract(), to match the specification?

> RFC5869
> allows two optional parameters to be provided to the extract operation:
> the salt and additional information. Both are to be provided with the
> seed parameter where the salt is the first entry of the seed parameter
> and all subsequent entries are handled as additional information. If
> the caller intends to invoke the HKDF without salt, it has to provide a
> NULL/0 entry as first entry in seed.

Where does "additional information" for extract come from?  RFC 5869 has:

HKDF-Extract(salt, IKM) -> PRK

Inputs:
  salt optional salt value (a non-secret random value);
   if not provided, it is set to a string of HashLen zeros.
  IKM  input keying material

There's no "additional information".

> 
> The expand function is invoked via the crypto_hkdf_generate and can be
> invoked multiple times. This function allows the caller to provide a
> context for the key derivation operation. As specified in RFC5869, it is
> optional. In case such context is not provided, the caller must provide
> NULL / 0 for the info / info_nvec parameters.

Any reason not to call this crypto_hkdf_expand() to match the specification?

- Eric
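
For reference, the two steps as RFC 5869 defines them, written out as equations. This restates the RFC, not the patch under review; HMAC takes the key as its first argument, L is the requested output length in octets (at most 255 times HashLen), and the counter i is encoded as a single octet.

```latex
\begin{align*}
\mathrm{PRK} &= \mathrm{HMAC}(\mathit{salt},\ \mathrm{IKM}) \\
N &= \lceil L / \mathrm{HashLen} \rceil \\
T(0) &= \text{empty string} \\
T(i) &= \mathrm{HMAC}(\mathrm{PRK},\ T(i-1) \,\|\, \mathit{info} \,\|\, i),
        \quad 1 \le i \le N \\
\mathrm{OKM} &= \text{first } L \text{ octets of } T(1) \,\|\, \cdots \,\|\, T(N)
\end{align*}
```

The extract step takes only the salt and the input keying material; the optional info value enters only in the expand step.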

Re: kerneldoc warnings since commit 538fc2ee870a3 ("rcu: Introduce kfree_rcu() single-argument macro")

2021-01-06 Thread Lukas Bulwahn

On Tue, Jan 5, 2021 at 5:29 PM Uladzislau Rezki  wrote:
>
> On Tue, Jan 05, 2021 at 06:56:59AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 05, 2021 at 02:14:41PM +0100, Uladzislau Rezki wrote:
> > > Dear, Lukas.
> > >
> > > > Dear Uladzislau,
> > > >
> > > > in commit 538fc2ee870a3 ("rcu: Introduce kfree_rcu() single-argument
> > > > macro"), you have refactored the kfree_rcu macro.
> > > >
> > > > Since then, make htmldocs warns:
> > > >
> > > > ./include/linux/rcupdate.h:882: warning: Excess function parameter
> > > > 'ptr' description in 'kfree_rcu'
> > > > ./include/linux/rcupdate.h:882: warning: Excess function parameter
> > > > 'rhf' description in 'kfree_rcu'
> > > >
> > > > As you deleted the two arguments in the macro definition, kerneldoc
> > > > cannot resolve the argument names in the macro's kerneldoc
> > > > documentation anymore and warns about that.
> > > >
> > > > Probably, it is best to just turn the formal kerneldoc references to
> > > > the two arguments, which are not used in the macro definition anymore,
> > > > simply into two informal references in the documentation.
> > > >
> > > Thanks for your suggestion. I am not sure if htmldocs supports something
> > > like "__maybe_unused", but tend to say that it does not. See below the
> > > patch:
> > >
> > > 
> > > >From 65ecc7c58810c963c02e0596ce2e5758c54ef55d Mon Sep 17 00:00:00 2001
> > > From: "Uladzislau Rezki (Sony)" 
> > > Date: Tue, 5 Jan 2021 13:23:30 +0100
> > > Subject: [PATCH] rcu: fix kerneldoc warnings
> > >
> > > After refactoring of the kfree_rcu(), it becomes possible to use
> > > the macro with one or two arguments. On the other hand, the
> > > kerneldoc description still expects two arguments in the macro definition.
> > > That is why the "htmldocs" emits a warning about it:
> > >
> > > 
> > > ./include/linux/rcupdate.h:882: warning: Excess function parameter
> > > 'ptr' description in 'kfree_rcu'
> > > ./include/linux/rcupdate.h:882: warning: Excess function parameter
> > > 'rhf' description in 'kfree_rcu'
> > > 
> > >
> > > Fix it by converting two parameters into informal references in the
> > > macro description.
> > >
> > > Fixes: 3d3d9ff077a9 ("rcu: Introduce kfree_rcu() single-argument macro")
> > > Signed-off-by: Uladzislau Rezki (Sony) 
> > > ---
> > >  include/linux/rcupdate.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index ebd8dcca4997..e678ce7f5ca2 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -854,8 +854,8 @@ static inline notrace void 
> > > rcu_read_unlock_sched_notrace(void)
> > >
> > >  /**
> > >   * kfree_rcu() - kfree an object after a grace period.
> > > - * @ptr: pointer to kfree for both single- and double-argument 
> > > invocations.
> > > - * @rhf: the name of the struct rcu_head within the type of @ptr,
> > > + * ptr: pointer to kfree for both single- and double-argument 
> > > invocations.
> > > + * rhf: the name of the struct rcu_head within the type of ptr,
> > >   *   but only for double-argument invocations.
> > >   *
> > >   * Many rcu callbacks functions just call kfree() on the base structure.
> > > --
> > > 2.20.1
> > > 
> > >
> > > Paul, does it work for you?
> >
> > If it works for the documentation generation, then it works for me.  ;-)
> >
> OK. Then we need the patch to be reviewed by the documentation generation :)
>
> Dear, linux-doc folk!
>
> Could you please review the patch that is in question?
>

I think you can shorten the feedback loop.
IMHO, the documentation is as comprehensible as before and it makes a
warning go away (getting us back to the zero-documentation-warnings
state).

Just send out your patch with linux-doc as CC and if there is no
complaint within a few days, Paul will pick it up and it is all good.

Lukas
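
To make the cause of the warning concrete, here is one common way to write a macro that accepts either one or two arguments. It is a sketch with hypothetical names (mock_free_rcu, MOCK_PICK, mock_free_1, mock_free_2); whether the kernel uses this exact construction is not shown in the thread. The point is that the front-end macro has no named parameters, so a documentation tool has nothing to match "ptr" or "rhf" against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters so the two paths can be told apart. */
static int one_arg_calls;
static int two_arg_calls;

void mock_free_1(void *ptr)
{
	assert(ptr != NULL);
	one_arg_calls++;
}

void mock_free_2(void *ptr, size_t rcu_head_offset)
{
	assert(ptr != NULL);
	(void)rcu_head_offset;
	two_arg_calls++;
}

/* Selects the third argument; the trailing 0 keeps "..." non-empty. */
#define MOCK_PICK(_1, _2, NAME, ...) NAME

/*
 * Variadic front end: the parameter list is just "...", so no
 * parameter named "ptr" or "rhf" exists in the definition.
 */
#define mock_free_rcu(...) \
	MOCK_PICK(__VA_ARGS__, mock_free_2, mock_free_1, 0)(__VA_ARGS__)

int mock_one_arg_calls(void) { return one_arg_calls; }
int mock_two_arg_calls(void) { return two_arg_calls; }
```

Calling mock_free_rcu(p) expands to mock_free_1(p), and mock_free_rcu(p, off) expands to mock_free_2(p, off).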

Re: [RFC 0/2] kbuild: Add support to build overlays (%.dtbo)

2021-01-06 Thread Viresh Kumar

On 07-01-21, 14:28, Masahiro Yamada wrote:
> On Wed, Jan 6, 2021 at 12:21 AM Rob Herring  wrote:
> >
> > On Tue, Jan 5, 2021 at 4:24 AM Viresh Kumar  wrote:
> > >
> > > Hello,
> > >
> > > Here is an attempt to make some changes in the kernel to allow building
> > > of device tree overlays.
> > >
> > > While at it, I would also like to discuss about how we should mention
> > > the base DT blobs in the Makefiles for the overlays, so they can be
> > > build tested to make sure the overlays apply properly.
> > >
> > > A simple way is to mention that with -base extension, like this:
> > >
> > > $(overlay-file)-base := platform-base.dtb
> > >
> > > Any other preference ?
> 
> Viresh's patch is not enough.
> 
> We will need to change .gitignore
> and scripts/Makefile.dtbinst as well.

Thanks.
 
> In my understanding, the build rule is completely the same
> between .dtb and .dtbo

Right.

> As Rob mentioned, I am not sure if we really need/want
> a separate extension.
> 
> 
> A counter approach is to use an extension like '.ovl.dtb'
> It clarifies it is an overlay fragment without changing
> anything in our build system or the upstream DTC project.
> 
> We use chained extension in some places, for example,
> .dt.yaml for schema yaml files.
> 
> 
> 
> dtb-$(CONFIG_ARCH_FOO) += \
> foo-board.dtb \
> foo-overlay1.ovl.dtb \
> foo-overlay2.ovl.dtb
> 
> 
> Overlay DT source file names must end with '.ovl.dts'

I am fine with any approach that you guys feel is better, .dts or .ovl.dts. I
wanted to start a discussion where we can resolve this and be done with it.

Thanks.

-- 
viresh

[PATCH V4] scsi: ufs-debugfs: Add error counters

2021-01-06 Thread Adrian Hunter

People testing have a need to know how many errors might be occurring
over time. Add error counters and expose them via debugfs.

A module initcall is used to create a debugfs root directory for
ufshcd-related items. In the case that modules are built-in, then
initialization is done in link order, so move ufshcd-core to the top of
the Makefile.
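
A user-space sketch of how per-event counters can be printed with a token-pasting helper, the same trick as the PRT() macro in the diff below. The enum values, struct and function names here are hypothetical stand-ins, and the sketch writes into a caller-supplied buffer (assumed large enough, with no truncation handling) instead of a seq_file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event ids; the real driver has its own enum. */
enum mock_evt {
	MOCK_EVT_PA_ERR,
	MOCK_EVT_DL_ERR,
	MOCK_EVT_ABORT,
	MOCK_EVT_CNT
};

struct mock_event_hist {
	unsigned long long cnt;
};

/*
 * Format a few counters into buf. The helper macro pastes the short
 * name onto the enum prefix to index the counter array.
 * Returns the number of characters written (excluding the NUL).
 */
int mock_stats_show(const struct mock_event_hist *e, char *buf, size_t len)
{
	int n = 0;

	assert(e && buf && len > 0);
#define MOCK_PRT(fmt, typ) \
	n += snprintf(buf + n, len - (size_t)n, fmt, e[MOCK_EVT_ ## typ].cnt)

	MOCK_PRT("PA errors: %llu\n", PA_ERR);
	MOCK_PRT("DL errors: %llu\n", DL_ERR);
	MOCK_PRT("Aborts: %llu\n", ABORT);
#undef MOCK_PRT
	return n;
}
```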

Signed-off-by: Adrian Hunter 
Reviewed-by: Avri Altman 
Reviewed-by: Bean Huo 
---


Changes in V4:
Added Reviewed-by: Bean Huo 

Changes in V3:
Fixed link order to ensure correct initcall ordering when
modules are built-in.
Amended commit message accordingly.

Changes in V2:
Add missing '#include "ufs-debugfs.h"' in ufs-debugfs.c
Reported-by: kernel test robot 


 drivers/scsi/ufs/Makefile  | 13 +---
 drivers/scsi/ufs/ufs-debugfs.c | 56 ++
 drivers/scsi/ufs/ufs-debugfs.h | 22 +
 drivers/scsi/ufs/ufshcd.c  | 19 
 drivers/scsi/ufs/ufshcd.h  |  5 +++
 5 files changed, 111 insertions(+), 4 deletions(-)
 create mode 100644 drivers/scsi/ufs/ufs-debugfs.c
 create mode 100644 drivers/scsi/ufs/ufs-debugfs.h

diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
index 4679af1b564e..06f3a3fe4a44 100644
--- a/drivers/scsi/ufs/Makefile
+++ b/drivers/scsi/ufs/Makefile
@@ -1,5 +1,14 @@
 # SPDX-License-Identifier: GPL-2.0
 # UFSHCD makefile
+
+# The link order is important here. ufshcd-core must initialize
+# before vendor drivers.
+obj-$(CONFIG_SCSI_UFSHCD)  += ufshcd-core.o
+ufshcd-core-y  += ufshcd.o ufs-sysfs.o
+ufshcd-core-$(CONFIG_DEBUG_FS) += ufs-debugfs.o
+ufshcd-core-$(CONFIG_SCSI_UFS_BSG) += ufs_bsg.o
+ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO)  += ufshcd-crypto.o
+
 obj-$(CONFIG_SCSI_UFS_DWC_TC_PCI) += tc-dwc-g210-pci.o ufshcd-dwc.o 
tc-dwc-g210.o
 obj-$(CONFIG_SCSI_UFS_DWC_TC_PLATFORM) += tc-dwc-g210-pltfrm.o ufshcd-dwc.o 
tc-dwc-g210.o
 obj-$(CONFIG_SCSI_UFS_CDNS_PLATFORM) += cdns-pltfrm.o
@@ -7,10 +16,6 @@ obj-$(CONFIG_SCSI_UFS_QCOM) += ufs_qcom.o
 ufs_qcom-y += ufs-qcom.o
 ufs_qcom-$(CONFIG_SCSI_UFS_CRYPTO) += ufs-qcom-ice.o
 obj-$(CONFIG_SCSI_UFS_EXYNOS) += ufs-exynos.o
-obj-$(CONFIG_SCSI_UFSHCD) += ufshcd-core.o
-ufshcd-core-y  += ufshcd.o ufs-sysfs.o
-ufshcd-core-$(CONFIG_SCSI_UFS_BSG) += ufs_bsg.o
-ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO) += ufshcd-crypto.o
 obj-$(CONFIG_SCSI_UFSHCD_PCI) += ufshcd-pci.o
 obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o
 obj-$(CONFIG_SCSI_UFS_HISI) += ufs-hisi.o
diff --git a/drivers/scsi/ufs/ufs-debugfs.c b/drivers/scsi/ufs/ufs-debugfs.c
new file mode 100644
index ..dee98dc72d29
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-debugfs.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 Intel Corporation
+
+#include 
+
+#include "ufs-debugfs.h"
+#include "ufshcd.h"
+
+static struct dentry *ufs_debugfs_root;
+
+void __init ufs_debugfs_init(void)
+{
+   ufs_debugfs_root = debugfs_create_dir("ufshcd", NULL);
+}
+
+void __exit ufs_debugfs_exit(void)
+{
+   debugfs_remove_recursive(ufs_debugfs_root);
+}
+
+static int ufs_debugfs_stats_show(struct seq_file *s, void *data)
+{
+   struct ufs_hba *hba = s->private;
+   struct ufs_event_hist *e = hba->ufs_stats.event;
+
+#define PRT(fmt, typ) \
+   seq_printf(s, fmt, e[UFS_EVT_ ## typ].cnt)
+
+   PRT("PHY Adapter Layer errors (except LINERESET): %llu\n", PA_ERR);
+   PRT("Data Link Layer errors: %llu\n", DL_ERR);
+   PRT("Network Layer errors: %llu\n", NL_ERR);
+   PRT("Transport Layer errors: %llu\n", TL_ERR);
+   PRT("Generic DME errors: %llu\n", DME_ERR);
+   PRT("Auto-hibernate errors: %llu\n", AUTO_HIBERN8_ERR);
+   PRT("IS Fatal errors (CEFES, SBFES, HCFES, DFES): %llu\n", FATAL_ERR);
+   PRT("DME Link Startup errors: %llu\n", LINK_STARTUP_FAIL);
+   PRT("PM Resume errors: %llu\n", RESUME_ERR);
+   PRT("PM Suspend errors : %llu\n", SUSPEND_ERR);
+   PRT("Logical Unit Resets: %llu\n", DEV_RESET);
+   PRT("Host Resets: %llu\n", HOST_RESET);
+   PRT("SCSI command aborts: %llu\n", ABORT);
+#undef PRT
+   return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(ufs_debugfs_stats);
+
+void ufs_debugfs_hba_init(struct ufs_hba *hba)
+{
+   hba->debugfs_root = debugfs_create_dir(dev_name(hba->dev), ufs_debugfs_root);
+   debugfs_create_file("stats", 0400, hba->debugfs_root, hba, &ufs_debugfs_stats_fops);
+}
+
+void ufs_debugfs_hba_exit(struct ufs_hba *hba)
+{
+   debugfs_remove_recursive(hba->debugfs_root);
+}
diff --git a/drivers/scsi/ufs/ufs-debugfs.h b/drivers/scsi/ufs/ufs-debugfs.h
new file mode 100644
index 000000000000..f35b39c4b4f5
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-debugfs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation
+ */
+
+#ifndef __UFS_DEBUGFS_H__
+#define __UFS_DEBUGFS_H__
+
+struct ufs_hba;
+
+#ifdef CONFIG_DEBUG_FS
+void

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 14:51, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > > > From: Jaegeuk Kim 
> > > >
> > > > This fixes a warning caused by wrong reserve tag usage in
> > > > __ufshcd_issue_tm_cmd.
> > > >
> > > > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > > > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > > > blk_mq_get_tag+0x438/0x46c
> > > >
> > > > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > > > aborting
> > > > outstanding commands by waiting a bit for IO completion like this.
> > > >
> > > > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > > >
> > > 
> > > > Would you mind adding a Fixes tag?
> > 
> > Ok.
> > 
> > > 
> > > > Signed-off-by: Jaegeuk Kim 
> > > > ---
> > > >  drivers/scsi/ufs/ufshcd.c | 36 
> > > >  1 file changed, 32 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index 1678cec08b51..47fc8da3cbf9 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -44,6 +44,9 @@
> > > >  /* Query request timeout */
> > > >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > > >
> > > > +/* LINERESET TIME OUT */
> > > > > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > > > +
> > > >  /* Task management command timeout */
> > > >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > > >
> > > > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > >  * check if power mode restore is needed.
> > > >  */
> > > > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > > > +   ktime_t start = ktime_get();
> > > > +
> > > > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > > > if (!hba->saved_uic_err)
> > > > hba->saved_err &= ~UIC_ERROR;
> > > > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > > > needs_restore = true;
> > > > spin_lock_irqsave(hba->host->host_lock, flags);
> > > > +   /* Wait for IO completion to avoid aborting IOs */
> > > > +   while (hba->outstanding_reqs) {
> > > > +   ufshcd_complete_requests(hba);
> > > > +   spin_unlock_irqrestore(hba->host->host_lock, 
> > > > flags);
> > > > +   schedule();
> > > > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > > > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > > > +   
> > > > LINERESET_IO_TIMEOUT_MS) {
> > > > +   dev_err(hba->dev, "%s: timeout, 
> > > > outstanding=0x%lx\n",
> > > > +   __func__, 
> > > > hba->outstanding_reqs);
> > > > +   break;
> > > > +   }
> > > > +   }
> > > > +
> > > > if (!hba->saved_err && !needs_restore)
> > > > goto skip_err_handling;
> > > > }
> > > > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > > > *__hba)
> > > > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > > > }
> > > >
> > > > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > > > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > > > -   __func__, intr_status);
> > > > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > > > +   !ufshcd_eh_in_progress(hba)) {
> > > > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x 
> > > > (0x%08x,
> > > > 0x%08x)\n",
> > > > +   __func__,
> > > > +   intr_status,
> > > > +   hba->ufs_stats.last_intr_status,
> > > > +   enabled_intr_status);
> > > > > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > > > }
> > > >
> > > > @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > > > *hba,
> > > >  * Even though we use wait_event() which sleeps indefinitely,
> > > >  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> > > >  */
> > > > -   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> > > > +   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> > > > +   BLK_MQ_REQ_NOWAIT);
> > > 
> > > Sorry that I didn't pay much attention to this part of code before.
> >

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Can Guo


On 2021-01-07 15:03, Can Guo wrote:

On 2021-01-07 14:51, Jaegeuk Kim wrote:

On 01/07, Can Guo wrote:

On 2021-01-07 05:41, Jaegeuk Kim wrote:
> From: Jaegeuk Kim 
>
> This fixes a warning caused by wrong reserve tag usage in
> __ufshcd_issue_tm_cmd.
>
> WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> blk_mq_get_tag+0x438/0x46c
>
> And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> aborting
> outstanding commands by waiting a bit for IO completion like this.
>
> __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
>

Would you mind adding a Fixes tag?


Ok.



> Signed-off-by: Jaegeuk Kim 
> ---
>  drivers/scsi/ufs/ufshcd.c | 36 
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 1678cec08b51..47fc8da3cbf9 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -44,6 +44,9 @@
>  /* Query request timeout */
>  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
>
> +/* LINERESET TIME OUT */
> +#define LINERESET_IO_TIMEOUT_MS   (30000) /* 30 sec */
> +
>  /* Task management command timeout */
>  #define TM_CMD_TIMEOUT100 /* msecs */
>
> @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> *work)
> * check if power mode restore is needed.
> */
>if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> +  ktime_t start = ktime_get();
> +
>hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
>if (!hba->saved_uic_err)
>hba->saved_err &= ~UIC_ERROR;
> @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> *work)
>if (ufshcd_is_pwr_mode_restore_needed(hba))
>needs_restore = true;
>spin_lock_irqsave(hba->host->host_lock, flags);
> +  /* Wait for IO completion to avoid aborting IOs */
> +  while (hba->outstanding_reqs) {
> +  ufshcd_complete_requests(hba);
> +  spin_unlock_irqrestore(hba->host->host_lock, flags);
> +  schedule();
> +  spin_lock_irqsave(hba->host->host_lock, flags);
> +  if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> +  LINERESET_IO_TIMEOUT_MS) {
> +  dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
> +  __func__, hba->outstanding_reqs);
> +  break;
> +  }
> +  }
> +
>if (!hba->saved_err && !needs_restore)
>goto skip_err_handling;
>}
> @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> *__hba)
>intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
>}
>
> -  if (enabled_intr_status && retval == IRQ_NONE) {
> -  dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> -  __func__, intr_status);
> +  if (enabled_intr_status && retval == IRQ_NONE &&
> +  !ufshcd_eh_in_progress(hba)) {
> +  dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> 0x%08x)\n",
> +  __func__,
> +  intr_status,
> +  hba->ufs_stats.last_intr_status,
> +  enabled_intr_status);
>ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
>}
>
> @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> *hba,
> * Even though we use wait_event() which sleeps indefinitely,
> * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> */
> -  req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> +  req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> +  BLK_MQ_REQ_NOWAIT);

Sorry that I didn't pay much attention to this part of code before.
May I know why must we use the BLK_MQ_REQ_RESERVED flag?


What I understood is the reserved tag is used when aborting outstanding
IOs when all the 32 tags were used.



No, the tm requests and I/O requests are on two different tag sets:
tm requests come from hba->tmf_tag_set, while I/O requests come from
hba->shost->tag_set. Meaning they don't share tags with each other.


And they are issued on two different HW queues - one for tm reqs,
one for I/O reqs, which is why two different tag sets are created.





Thanks,
Can Guo.

> +  if (IS_ERR(req))
> +  return PTR_ERR(req);
> +
>req->end_io_data = &wait;
>free_slot = req->tag;
>WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
> @@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void
> __iomem *mmio_base, unsigned int irq)
>
>hba->tmf_tag_set = (struct blk_mq_tag_set) {
>.nr_hw_queues

Re: [PATCH] net: dsa: fix led_classdev build errors

2021-01-06 Thread Kurt Kanzenbach

On Tue Jan 05 2021, Randy Dunlap wrote:
> Fix build errors when LEDS_CLASS=m and NET_DSA_HIRSCHMANN_HELLCREEK=y.
> This limits the latter to =m when LEDS_CLASS=m.
>
> microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function 
> `hellcreek_ptp_setup':
> (.text+0xf80): undefined reference to `led_classdev_register_ext'
> microblaze-linux-ld: (.text+0xf94): undefined reference to 
> `led_classdev_register_ext'
> microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function 
> `hellcreek_ptp_free':
> (.text+0x1018): undefined reference to `led_classdev_unregister'
> microblaze-linux-ld: (.text+0x1024): undefined reference to 
> `led_classdev_unregister'
>
> Signed-off-by: Randy Dunlap 
> Reported-by: kernel test robot 
> Link: lore.kernel.org/r/202101060655.iuvmjqs2-...@intel.com
> Cc: Kurt Kanzenbach 
> Cc: net...@vger.kernel.org
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 

Fixes: 7d9ee2e8ff15 ("net: dsa: hellcreek: Add PTP status LEDs")
Reviewed-by: Kurt Kanzenbach 

Thanks,
Kurt



Re: [PATCH 5/5] fs: use HKDF implementation from kernel crypto API

2021-01-06 Thread Eric Biggers

On Mon, Jan 04, 2021 at 10:50:49PM +0100, Stephan Müller wrote:
> As the kernel crypto API implements HKDF, replace the
> file-system-specific HKDF implementation with the generic HKDF
> implementation.
> 
> Signed-off-by: Stephan Mueller 
> ---
>  fs/crypto/Kconfig   |   2 +-
>  fs/crypto/fscrypt_private.h |   4 +-
>  fs/crypto/hkdf.c| 108 +---
>  3 files changed, 30 insertions(+), 84 deletions(-)
> 
> diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> index a5f5c30368a2..9450e958f1d1 100644
> --- a/fs/crypto/Kconfig
> +++ b/fs/crypto/Kconfig
> @@ -2,7 +2,7 @@
>  config FS_ENCRYPTION
>   bool "FS Encryption (Per-file encryption)"
>   select CRYPTO
> - select CRYPTO_HASH
> + select CRYPTO_HKDF
>   select CRYPTO_SKCIPHER
>   select CRYPTO_LIB_SHA256
>   select KEYS
> diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
> index 3fa965eb3336..0d6871838099 100644
> --- a/fs/crypto/fscrypt_private.h
> +++ b/fs/crypto/fscrypt_private.h
> @@ -304,7 +304,7 @@ struct fscrypt_hkdf {
>   struct crypto_shash *hmac_tfm;
>  };
>  
> -int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> +int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, u8 *master_key,
> unsigned int master_key_size);

It shouldn't be necessary to remove const here.

>  
>  /*
> @@ -323,7 +323,7 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 
> *master_key,
>  #define HKDF_CONTEXT_INODE_HASH_KEY  7 /* info=   */
>  
>  int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
> - const u8 *info, unsigned int infolen,
> + u8 *info, unsigned int infolen,
>   u8 *okm, unsigned int okmlen);

Likewise.  In fact some callers rely on 'info' not being modified.

> -/*
> + *
>   * Compute HKDF-Extract using the given master key as the input keying 
> material,
>   * and prepare an HMAC transform object keyed by the resulting pseudorandom 
> key.
>   *
>   * Afterwards, the keyed HMAC transform object can be used for HKDF-Expand 
> many
>   * times without having to recompute HKDF-Extract each time.
>   */
> -int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> +int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, u8 *master_key,
> unsigned int master_key_size)
>  {
> + /* HKDF-Extract (RFC 5869 section 2.2), unsalted */
> + const struct kvec seed[] = { {
> + .iov_base = NULL,
> + .iov_len = 0
> + }, {
> + .iov_base = master_key,
> + .iov_len = master_key_size
> + } };
>   struct crypto_shash *hmac_tfm;
> - u8 prk[HKDF_HASHLEN];
>   int err;
>  
>   hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, 0);
> @@ -74,16 +65,12 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 
> *master_key,
>   return PTR_ERR(hmac_tfm);
>   }
>  
> - if (WARN_ON(crypto_shash_digestsize(hmac_tfm) != sizeof(prk))) {
> + if (WARN_ON(crypto_shash_digestsize(hmac_tfm) != HKDF_HASHLEN)) {
>   err = -EINVAL;
>   goto err_free_tfm;
>   }
>  
> - err = hkdf_extract(hmac_tfm, master_key, master_key_size, prk);
> - if (err)
> - goto err_free_tfm;
> -
> - err = crypto_shash_setkey(hmac_tfm, prk, sizeof(prk));
> + err = crypto_hkdf_setkey(hmac_tfm, seed, ARRAY_SIZE(seed));
>   if (err)
>   goto err_free_tfm;

It's weird that the salt and key have to be passed in a kvec.
Why not just have normal function parameters like:

int crypto_hkdf_setkey(struct crypto_shash *hmac_tfm,
   const u8 *key, size_t keysize,
   const u8 *salt, size_t saltsize);

>  int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
> - const u8 *info, unsigned int infolen,
> + u8 *info, unsigned int infolen,
>   u8 *okm, unsigned int okmlen)
>  {
> - SHASH_DESC_ON_STACK(desc, hkdf->hmac_tfm);
> - u8 prefix[9];
> - unsigned int i;
> - int err;
> - const u8 *prev = NULL;
> - u8 counter = 1;
> - u8 tmp[HKDF_HASHLEN];
> -
> - if (WARN_ON(okmlen > 255 * HKDF_HASHLEN))
> - return -EINVAL;
> -
> - desc->tfm = hkdf->hmac_tfm;
> -
> - memcpy(prefix, "fscrypt\0", 8);
> - prefix[8] = context;
> -
> - for (i = 0; i < okmlen; i += HKDF_HASHLEN) {
> + const struct kvec info_iov[] = { {
> + .iov_base = "fscrypt\0",
> + .iov_len = 8,
> + }, {
> + .iov_base = &context,
> + .iov_len = 1,
> + }, {
> + .iov_base = info,
> + .iov_len = infolen,
> + } };
> + int err = crypto_hkdf_generate(hkdf->hmac_tfm,
> +info_iov, ARRAY_SIZE(info_iov),
> +okm,

[PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-06 Thread Eli Cohen

map_direct_mr() assumed that the number of scatter/gather entries
returned by dma_map_sg_attrs() was equal to the number of segments in
the sgl list. This led to wrong population of the mkey object. Fix this
by properly referring to the returned value.

The hardware expects each MTT entry to contain the DMA address of a
contiguous block of memory of size (1 << mr->log_size) bytes.
dma_map_sg_attrs() can coalesce several sg entries into a single
scatter/gather entry of contiguous DMA range so we need to scan the list
and refer to the size of each s/g entry.

In addition, get rid of fill_sg(), whose effect is overwritten by
populate_mtts().

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen 
---
V0->V1:
1. Fix typos
2. Improve changelog 


 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c| 28 
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 5c92a576edae..08f742fd2409 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
struct sg_table sg_head;
int log_size;
int nsg;
+   int nent;
struct list_head list;
u64 offset;
 };
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 4b6195666c58..d300f799efcd 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
return (npages + 1) / 2;
 }
 
-static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
-{
-   struct scatterlist *sg;
-   __be64 *pas;
-   int i;
-
-   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   (*pas) = cpu_to_be64(sg_dma_address(sg));
-}
-
 static void mlx5_set_access_mode(void *mkc, int mode)
 {
MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
@@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
 static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
 {
struct scatterlist *sg;
+   int nsg = mr->nsg;
+   u64 dma_addr;
+   u64 dma_len;
+   int j = 0;
int i;
 
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   mtt[i] = cpu_to_be64(sg_dma_address(sg));
+   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
+   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
+nsg && dma_len;
+nsg--, dma_addr += BIT(mr->log_size), dma_len -= BIT(mr->log_size))
+   mtt[j++] = cpu_to_be64(dma_addr);
+   }
 }
 
 static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
@@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
return -ENOMEM;
 
MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
-   fill_sg(mr, in);
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
@@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
 done:
mr->log_size = log_entity_size;
mr->nsg = nsg;
-   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
-   if (!err)
+   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+   if (!mr->nent)
+   if (!mr->nent)
goto err_map;
 
err = create_direct_mr(mvdev, mr);
-- 
2.28.0

Re: [RFC PATCH v2 1/1] platform-msi: Add platform check for subdevice irq domain

2021-01-06 Thread Leon Romanovsky

On Thu, Jan 07, 2021 at 06:55:16AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky 
> > Sent: Thursday, January 7, 2021 2:09 PM
> >
> > On Thu, Jan 07, 2021 at 02:04:29AM +0000, Tian, Kevin wrote:
> > > > From: Leon Romanovsky 
> > > > Sent: Thursday, January 7, 2021 12:02 AM
> > > >
> > > > On Wed, Jan 06, 2021 at 11:23:39AM -0400, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 06, 2021 at 12:40:17PM +0200, Leon Romanovsky wrote:
> > > > >
> > > > > > I asked what will you do when QEMU will gain needed functionality?
> > > > > > Will you remove QEMU from this list? If yes, how such "new" kernel
> > > > > > will work on old QEMU versions?
> > > > >
> > > > > The needed functionality is some VMM hypercall, so presumably new
> > > > > kernels that support calling this hypercall will be able to discover
> > > > > if the VMM hypercall exists and if so superceed this entire check.
> > > >
> > > > Let's not speculate, do we have well-known path?
> > > > Will such patch be taken to stable@/distros?
> > > >
> > >
> > > There are two functions introduced in this patch. One is to detect whether
> > > running on bare metal or in a virtual machine. The other is for deciding
> > > whether the platform supports ims. Currently the two are identical because
> > > ims is supported only on bare metal at current stage. In the future it
> > > will look like below when ims can be enabled in a VM:
> > >
> > > bool arch_support_pci_device_ims(struct pci_dev *pdev)
> > > {
> > >   return on_bare_metal() || hypercall_irq_domain_supported();
> > > }
> > >
> > > The VMM vendor list is for on_bare_metal, and suppose a vendor will
> > > never be removed once being added to the list since the fact of running
> > > in a VM never changes, regardless of whether this hypervisor supports
> > > extra VMM hypercalls.
> >
> > This is what I imagined, this list will be forever, and this worries me.
> >
> > I don't know if it is true or not, but guess that at least Oracle and
> > Microsoft bare metal devices and VMs will have same DMI_SYS_VENDOR.
>
> It's true. David Woodhouse also said it's the case for Amazon EC2 instances.
>
> >
> > It means that this on_bare_metal() function won't work reliably in many
> > cases. Also being part of include/linux/msi.h, at some point of time,
> > this function will be picked by the users outside for the non-IMS cases.
> >
> > I didn't even mention custom forks of QEMU which are prohibited to change
> > DMI_SYS_VENDOR and private clouds with custom solutions.
>
> In this case the private QEMU forks are encouraged to set CPUID (X86_
> FEATURE_HYPERVISOR) if they do plan to adopt a different vendor name.

Does QEMU set this bit when it runs in host-passthrough CPU model?

>
> >
> > The current array makes DMI_SYS_VENDOR interface as some sort of ABI. If
> > in the future, the QEMU will decide to use more hipster name, for example
> > "qEmU", this function won't work.
> >
> > I'm aware that DMI_SYS_VENDOR is used heavily in the kernel code and
> > various names for the same company are good example how not reliable it.
> >
> > The most hilarious example is "Dell/Dell Inc./Dell Inc/Dell Computer
> > Corporation/Dell Computer", but other companies are not far from them.
> >
> > Luckily enough, this identification is used for hardware product that
> > was released to the market and their name will be stable for that
> > specific model. It is not the case here where we need to ensure future
> > compatibility too (old kernel on new VM emulator).
> >
> > I'm not in position to say yes or no to this patch and don't have plans to do it.
> > Just expressing my feeling that this solution is too hacky for my taste.
> >
>
> I agree with your worries and solely relying on DMI_SYS_VENDOR is
> definitely too hacky. In previous discussions with Thomas there is no
> elegant way to handle this situation. It has to be a heuristic approach.
> First we hope the CPUID bit is set properly in most cases thus is checked
> first. Then other heuristics can be made for the remaining cases. DMI_
> SYS_VENDOR is the first hint and more can be added later. For example,
> when IOMMU is present there is vendor specific way to detect whether
> it's real or virtual. Dave also mentioned some BIOS flag to indicate a
> virtual machine. Now probably the real question here is whether people
> are OK with CPUID+DMI_SYS_VENDOR combo check for now (and grow
> it later) or prefer to having all identified heuristics so far in-place together...

IMHO, it should be as close as possible to the end result.

Thanks

>
> Thanks
> Kevin

Re: [PATCH 0/5] Add KDF implementations to crypto API

2021-01-06 Thread Eric Biggers

On Wed, Jan 06, 2021 at 10:59:24PM -0800, Eric Biggers wrote:
> On Thu, Jan 07, 2021 at 07:37:05AM +0100, Stephan Mueller wrote:
> > Am Montag, dem 04.01.2021 um 14:20 -0800 schrieb Eric Biggers:
> > > On Mon, Jan 04, 2021 at 10:45:57PM +0100, Stephan Müller wrote:
> > > > The HKDF addition is used to replace the implementation in the 
> > > > filesystem
> > > > crypto extension. This code was tested by using an EXT4 encrypted file
> > > > system that was created and contains files written to by the current
> > > > implementation. Using the new implementation a successful read of the
> > > > existing files was possible and new files / directories were created
> > > > and read successfully. These newly added file system objects could be
> > > > successfully read using the current code. Yet if there is a test suite
> > > > to validate whether the invokcation of the HKDF calculates the same
> > > > result as the existing implementation, I would be happy to validate
> > > > the implementation accordingly.
> > > 
> > > See https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html#tests
> > > for how to run the fscrypt tests.  'kvm-xfstests -c ext4 generic/582' 
> > > should
> > > be
> > > enough for this, though you could run all the tests if you want.
> > 
> > I ran the $(kvm-xfstests -c encrypt -g auto) on 5.11-rc2 with and without my
> > HKDF changes. I.e. the testing shows the same results for both kernels which
> > seems to imply that my HKDF changes do not change the behavior.
> > 
> > I get the following errors in both occasions - let me know if I should dig a
> > bit more.
> 
> The command you ran runs almost all xfstests with the test_dummy_encryption
> mount option enabled, which is different from running the encryption tests --
> and in fact it skips the real encryption tests, so it doesn't test the
> correctness of HKDF at all.  It looks like you saw some unrelated test failures.
> Sorry if I wasn't clear -- by "all tests" I meant all encryption tests, i.e.
> 'kvm-xfstests -c ext4 -g encrypt'.  Also, even the single test generic/582
> should be sufficient to test HKDF, as I mentioned.
> 

I just did it myself and the tests pass.

- Eric

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Can Guo


On 2021-01-07 14:57, Jaegeuk Kim wrote:

On 01/07, Can Guo wrote:

On 2021-01-07 05:41, Jaegeuk Kim wrote:
> When gate_work/ungate_work gets an error during hibern8_enter or exit,
>  ufshcd_err_handler()
>ufshcd_scsi_block_requests()
>ufshcd_reset_and_restore()
>  ufshcd_clear_ua_wluns() -> stuck
>ufshcd_scsi_unblock_requests()
>
> In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery
> flows
> such as suspend/resume, link_recovery, and error_handler.
>
> Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd
> resets")
> Signed-off-by: Jaegeuk Kim 
> ---
>  drivers/scsi/ufs/ufshcd.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index bedb822a40a3..1678cec08b51 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
>if (ret)
>dev_err(hba->dev, "%s: link recovery failed, err %d",
>__func__, ret);
> +  else
> +  ufshcd_clear_ua_wluns(hba);

Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?


May I ask the reason? We'll call it after ufshcd_add_lus() later tho.



I think the code will be more readable, since we would do all the LU-related
work in one function. Just nit-picking, though. I noticed this because I am
planning to move the devfreq init code out of ufshcd_add_lus(): going by its
name, that function is not an appropriate place to init devfreq, but it might
be a good place for ufshcd_clear_ua_wluns().

Thanks,
Can Guo.



Thanks,
Can Guo.

>
>return ret;
>  }
> @@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct
> *work)
>ufshcd_scsi_unblock_requests(hba);
>ufshcd_err_handling_unprepare(hba);
>up(&hba->eh_sem);
> +
> +  if (!err && needs_reset)
> +  ufshcd_clear_ua_wluns(hba);
>  }
>
>  /**
> @@ -6940,14 +6945,11 @@ static int
> ufshcd_host_reset_and_restore(struct ufs_hba *hba)
>ufshcd_set_clk_freq(hba, true);
>
>err = ufshcd_hba_enable(hba);
> -  if (err)
> -  goto out;
>
>/* Establish the link again and restore the device */
> -  err = ufshcd_probe_hba(hba, false);
>if (!err)
> -  ufshcd_clear_ua_wluns(hba);
> -out:
> +  err = ufshcd_probe_hba(hba, false);
> +
>if (err)
>dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
>ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
> @@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
> enum ufs_pm_op pm_op)
>ufshcd_resume_clkscaling(hba);
>hba->clk_gating.is_suspended = false;
>hba->dev_info.b_rpm_dev_flush_capable = false;
> +  ufshcd_clear_ua_wluns(hba);
>ufshcd_release(hba);
>  out:
>if (hba->dev_info.b_rpm_dev_flush_capable) {
> @@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba,
> enum ufs_pm_op pm_op)
>cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
>}
>
> +  ufshcd_clear_ua_wluns(hba);
> +
>/* Schedule clock gating in case of no access to UFS device yet */
>ufshcd_release(hba);

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Borislav Petkov

On Wed, Jan 06, 2021 at 11:13:53AM -0800, Paul E. McKenney wrote:
> Not yet, it isn't!  Well, except in -rcu.  ;-)

Of course it is - saying "This commit" in this commit's commit message
is very much a tautology. :-)

> You are suggesting dropping mce_missing_cpus and just doing this?
> 
> if (!cpumask_andnot(&mce_present_cpus, cpu_online_mask, &mce_present_cpus))

Yes.

And pls don't call it "holdout CPUs" and change the order so that it is
more user-friendly (yap, you don't need __func__ either):

[   78.946153] mce: Not all CPUs (24-47,120-143) entered the broadcast exception handler.
[   78.946153] Kernel panic - not syncing: Timeout: MCA synchronization.

or so.

And that's fine if it appears twice, as long as it is the same info - the
MCA code is one complex mess so you can probably guess why I'd like to
have new stuff added to it be as simplistic as possible.

> I was worried (perhaps unnecessarily) about the possibility of CPUs
> checking in during the printout operation, which would set rather than
> clear the bit.  But perhaps the possible false positives that Tony points
> out make this race not worth worrying about.
> 
> Thoughts?

Yah, apparently, it is not going to be a precise report as you wanted it
to be but at least it'll tell you which *sockets* you can rule out, if
not cores.

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Can Guo


On 2021-01-07 14:51, Jaegeuk Kim wrote:

On 01/07, Can Guo wrote:

On 2021-01-07 05:41, Jaegeuk Kim wrote:
> From: Jaegeuk Kim 
>
> This fixes a warning caused by wrong reserve tag usage in
> __ufshcd_issue_tm_cmd.
>
> WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> blk_mq_get_tag+0x438/0x46c
>
> And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> aborting
> outstanding commands by waiting a bit for IO completion like this.
>
> __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
>

Would you mind adding a Fixes tag?


Ok.



> Signed-off-by: Jaegeuk Kim 
> ---
>  drivers/scsi/ufs/ufshcd.c | 36 
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 1678cec08b51..47fc8da3cbf9 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -44,6 +44,9 @@
>  /* Query request timeout */
>  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
>
> +/* LINERESET TIME OUT */
> +#define LINERESET_IO_TIMEOUT_MS   (30000) /* 30 sec */
> +
>  /* Task management command timeout */
>  #define TM_CMD_TIMEOUT100 /* msecs */
>
> @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> *work)
> * check if power mode restore is needed.
> */
>if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> +  ktime_t start = ktime_get();
> +
>hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
>if (!hba->saved_uic_err)
>hba->saved_err &= ~UIC_ERROR;
> @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> *work)
>if (ufshcd_is_pwr_mode_restore_needed(hba))
>needs_restore = true;
>spin_lock_irqsave(hba->host->host_lock, flags);
> +  /* Wait for IO completion to avoid aborting IOs */
> +  while (hba->outstanding_reqs) {
> +  ufshcd_complete_requests(hba);
> +  spin_unlock_irqrestore(hba->host->host_lock, flags);
> +  schedule();
> +  spin_lock_irqsave(hba->host->host_lock, flags);
> +  if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> +  LINERESET_IO_TIMEOUT_MS) {
> +  dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
> +  __func__, hba->outstanding_reqs);
> +  break;
> +  }
> +  }
> +
>if (!hba->saved_err && !needs_restore)
>goto skip_err_handling;
>}
> @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> *__hba)
>intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
>}
>
> -  if (enabled_intr_status && retval == IRQ_NONE) {
> -  dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> -  __func__, intr_status);
> +  if (enabled_intr_status && retval == IRQ_NONE &&
> +  !ufshcd_eh_in_progress(hba)) {
> +  dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> 0x%08x)\n",
> +  __func__,
> +  intr_status,
> +  hba->ufs_stats.last_intr_status,
> +  enabled_intr_status);
>ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
>}
>
> @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> *hba,
> * Even though we use wait_event() which sleeps indefinitely,
> * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> */
> -  req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> +  req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> +  BLK_MQ_REQ_NOWAIT);

Sorry that I didn't pay much attention to this part of the code before.
May I know why we must use the BLK_MQ_REQ_RESERVED flag?


My understanding is that the reserved tag is used for aborting outstanding
IOs when all 32 tags are in use.



No, the tm requests and I/O requests are on two different tag sets:
tm requests come from hba->tmf_tag_set, while I/O requests come from
hba->shost->tag_set, so they don't share tags with each other.
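
To illustrate the point about separate tag sets, here is a minimal user-space sketch. It does not use or model the real blk-mq API: `struct tag_set`, `tag_alloc()` and `tm_tag_after_io_exhaustion()` are hypothetical names, and the depth of 8 for the TM set is an assumption (the driver uses hba->nutmrs). It only shows that exhausting one bitmap leaves the other untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one tag set: a bitmap of at most 32 tags. */
struct tag_set {
	uint32_t used;      /* bit n set => tag n is allocated */
	unsigned int depth; /* number of valid tags, 1..32 */
};

/* Return the lowest free tag, or -1 if the set is exhausted. */
int tag_alloc(struct tag_set *set)
{
	assert(set->depth >= 1 && set->depth <= 32);
	for (unsigned int tag = 0; tag < set->depth; tag++) {
		if (!(set->used & (UINT32_C(1) << tag))) {
			set->used |= UINT32_C(1) << tag;
			return (int)tag;
		}
	}
	return -1;
}

/* Exhaust the I/O set, then ask the separate TM set for a tag. */
int tm_tag_after_io_exhaustion(void)
{
	struct tag_set io = { .used = 0, .depth = 32 };
	struct tag_set tmf = { .used = 0, .depth = 8 }; /* assumed depth */

	while (tag_alloc(&io) >= 0)
		; /* consume all 32 I/O tags */
	return tag_alloc(&tmf);
}
```

In this model a TM tag is still available after all I/O tags are taken, because the two sets never draw from the same bitmap.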



Thanks,
Can Guo.

> +  if (IS_ERR(req))
> +  return PTR_ERR(req);
> +
>req->end_io_data = 
>free_slot = req->tag;
>WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
> @@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void
> __iomem *mmio_base, unsigned int irq)
>
>hba->tmf_tag_set = (struct blk_mq_tag_set) {
>.nr_hw_queues   = 1,
> +  .reserved_tags  = 1,

If we give reserved_tags as 1 and always ask for a tm request with
BLK_MQ_REQ_RESERVED flag set, then the tag shall only be allocated

RE: [RFC PATCH 1/1] platform-msi: Add platform check for subdevice irq domain

2021-01-06 Thread Tian, Kevin

> From: David Woodhouse 
> Sent: Thursday, December 10, 2020 4:23 PM
> 
> On Thu, 2020-12-10 at 08:46 +0800, Lu Baolu wrote:
> > +/*
> > + * We want to figure out which context we are running in. But the hardware
> > + * does not introduce a reliable way (instruction, CPUID leaf, MSR, whatever)
> > + * which can be manipulated by the VMM to let the OS figure out where it runs.
> > + * So we go with the below probably_on_bare_metal() function as a replacement
> > + * for definitely_on_bare_metal() to go forward only for the very simple reason
> > + * that this is the only option we have.
> > + */
> > +static const char * const possible_vmm_vendor_name[] = {
> > +   "QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
> > +   "innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE",
> > +   "Microsoft Corporation"
> > +};
> 
> People do use SeaBIOS ("Bochs") on bare metal.
> 
> You'll also see "Amazon EC2" on virt instances as well as bare metal
> instances. Although in that case I believe the virt instances do have
> the 'virtual machine' flag set in bit 4 of the BIOS Characteristics
> Extension Byte 2, and the bare metal obviously don't.
> 

Do those virtual instances have the CPUID hypervisor bit set? If yes,
they can be differentiated from bare metal instances w/o checking
the vendor list.

btw do you know whether this 'virtual machine' flag is widely used
in virtualization environments? If yes, we probably should add a check
on this flag even before checking DMI_SYS_VENDOR. It sounds more
general...
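
For reference, the two bits discussed here can be sketched as plain bit tests. This is only an illustration: the helper names are hypothetical, the callers are assumed to have already read CPUID leaf 1 ECX and the SMBIOS BIOS Characteristics Extension Byte 2, and nothing below is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test one bit of a 32-bit value; bit must be in 0..31. */
bool bit_is_set(uint32_t value, unsigned int bit)
{
	assert(bit < 32);
	return (value >> bit) & 1u;
}

/* CPUID leaf 1, ECX bit 31: the "hypervisor present" bit. */
bool cpuid_hypervisor_bit(uint32_t leaf1_ecx)
{
	return bit_is_set(leaf1_ecx, 31);
}

/* SMBIOS BIOS Characteristics Extension Byte 2, bit 4: 'virtual machine'. */
bool smbios_vm_flag(uint8_t ext_byte2)
{
	return bit_is_set(ext_byte2, 4);
}
```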

Thanks
Kevin

Re: INFO: task hung in do_truncate (2)

2021-01-06 Thread syzbot

syzbot suspects this issue was fixed by commit:

commit dfefd226b0bf7c435a58d75a0ce2f9273b9825f6
Author: Alexey Dobriyan 
Date:   Tue Dec 15 03:15:03 2020 +

mm: cleanup kstrto*() usage

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=111aa0cf50
start commit:   7ae77150 Merge tag 'powerpc-5.8-1' of git://git.kernel.org..
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=d195fe572fb15312
dashboard link: https://syzkaller.appspot.com/bug?extid=18b2ab4c697021ee8369
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15cec29610
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=153a741e10

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: mm: cleanup kstrto*() usage

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Re: [PATCH 0/5] Add KDF implementations to crypto API

2021-01-06 Thread Eric Biggers

On Thu, Jan 07, 2021 at 07:37:05AM +0100, Stephan Mueller wrote:
> On Monday, 04.01.2021 at 14:20 -0800, Eric Biggers wrote:
> > On Mon, Jan 04, 2021 at 10:45:57PM +0100, Stephan Müller wrote:
> > > The HKDF addition is used to replace the implementation in the filesystem
> > > crypto extension. This code was tested by using an EXT4 encrypted file
> > > system that was created and contains files written to by the current
> > > implementation. Using the new implementation a successful read of the
> > > existing files was possible and new files / directories were created
> > > and read successfully. These newly added file system objects could be
> > > successfully read using the current code. Yet if there is a test suite
> > > to validate whether the invocation of the HKDF calculates the same
> > > result as the existing implementation, I would be happy to validate
> > > the implementation accordingly.
> > 
> > See https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html#tests
> > for how to run the fscrypt tests.  'kvm-xfstests -c ext4 generic/582' should
> > be enough for this, though you could run all the tests if you want.
> 
> I ran the $(kvm-xfstests -c encrypt -g auto) on 5.11-rc2 with and without my
> HKDF changes. I.e. the testing shows the same results for both kernels which
> seems to imply that my HKDF changes do not change the behavior.
> 
> I get the following errors on both occasions - let me know if I should dig a
> bit more.

The command you ran runs almost all xfstests with the test_dummy_encryption
mount option enabled, which is different from running the encryption tests --
and in fact it skips the real encryption tests, so it doesn't test the
correctness of HKDF at all.  It looks like you saw some unrelated test failures.
Sorry if I wasn't clear -- by "all tests" I meant all encryption tests, i.e.
'kvm-xfstests -c ext4 -g encrypt'.  Also, even the single test generic/582
should be sufficient to test HKDF, as I mentioned.

- Eric

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > When gate_work/ungate_work gets an error during hibern8_enter or exit,
> >  ufshcd_err_handler()
> >ufshcd_scsi_block_requests()
> >ufshcd_reset_and_restore()
> >  ufshcd_clear_ua_wluns() -> stuck
> >ufshcd_scsi_unblock_requests()
> > 
> > In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery
> > flows such as suspend/resume, link_recovery, and error_handler.
> > 
> > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  drivers/scsi/ufs/ufshcd.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index bedb822a40a3..1678cec08b51 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
> > if (ret)
> > dev_err(hba->dev, "%s: link recovery failed, err %d",
> > __func__, ret);
> > +   else
> > +   ufshcd_clear_ua_wluns(hba);
> 
> Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?

May I ask the reason? We'll call it after ufshcd_add_lus() later, though.

> 
> Thanks,
> Can Guo.
> 
> > 
> > return ret;
> >  }
> > @@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > ufshcd_scsi_unblock_requests(hba);
> > ufshcd_err_handling_unprepare(hba);
> > up(>eh_sem);
> > +
> > +   if (!err && needs_reset)
> > +   ufshcd_clear_ua_wluns(hba);
> >  }
> > 
> >  /**
> > @@ -6940,14 +6945,11 @@ static int
> > ufshcd_host_reset_and_restore(struct ufs_hba *hba)
> > ufshcd_set_clk_freq(hba, true);
> > 
> > err = ufshcd_hba_enable(hba);
> > -   if (err)
> > -   goto out;
> > 
> > /* Establish the link again and restore the device */
> > -   err = ufshcd_probe_hba(hba, false);
> > if (!err)
> > -   ufshcd_clear_ua_wluns(hba);
> > -out:
> > +   err = ufshcd_probe_hba(hba, false);
> > +
> > if (err)
> > dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
> > ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
> > @@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
> > enum ufs_pm_op pm_op)
> > ufshcd_resume_clkscaling(hba);
> > hba->clk_gating.is_suspended = false;
> > hba->dev_info.b_rpm_dev_flush_capable = false;
> > +   ufshcd_clear_ua_wluns(hba);
> > ufshcd_release(hba);
> >  out:
> > if (hba->dev_info.b_rpm_dev_flush_capable) {
> > @@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba,
> > enum ufs_pm_op pm_op)
> > cancel_delayed_work(>rpm_dev_flush_recheck_work);
> > }
> > 
> > +   ufshcd_clear_ua_wluns(hba);
> > +
> > /* Schedule clock gating in case of no access to UFS device yet */
> > ufshcd_release(hba);

RE: [RFC PATCH v2 1/1] platform-msi: Add platform check for subdevice irq domain

2021-01-06 Thread Tian, Kevin

> From: Leon Romanovsky 
> Sent: Thursday, January 7, 2021 2:09 PM
> 
> On Thu, Jan 07, 2021 at 02:04:29AM +, Tian, Kevin wrote:
> > > From: Leon Romanovsky 
> > > Sent: Thursday, January 7, 2021 12:02 AM
> > >
> > > On Wed, Jan 06, 2021 at 11:23:39AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Jan 06, 2021 at 12:40:17PM +0200, Leon Romanovsky wrote:
> > > >
> > > > > I asked what you will do when QEMU gains the needed functionality.
> > > > > Will you remove QEMU from this list? If yes, how will such a "new"
> > > > > kernel work on old QEMU versions?
> > > >
> > > > The needed functionality is some VMM hypercall, so presumably new
> > > > kernels that support calling this hypercall will be able to discover
> > > > if the VMM hypercall exists and if so supersede this entire check.
> > >
> > > Let's not speculate, do we have well-known path?
> > > Will such patch be taken to stable@/distros?
> > >
> >
> > There are two functions introduced in this patch. One is to detect whether
> > running on bare metal or in a virtual machine. The other is for deciding
> > whether the platform supports ims. Currently the two are identical because
> > ims is supported only on bare metal at the current stage. In the future it
> > will look like below when ims can be enabled in a VM:
> >
> > bool arch_support_pci_device_ims(struct pci_dev *pdev)
> > {
> > return on_bare_metal() || hypercall_irq_domain_supported();
> > }
> >
> > The VMM vendor list is for on_bare_metal, and suppose a vendor will
> > never be removed once being added to the list since the fact of running
> > in a VM never changes, regardless of whether this hypervisor supports
> > extra VMM hypercalls.
> 
> This is what I imagined, this list will be forever, and this worries me.
> 
> I don't know if it is true or not, but guess that at least Oracle and
> Microsoft bare metal devices and VMs will have same DMI_SYS_VENDOR.

It's true. David Woodhouse also said it's the case for Amazon EC2 instances.

> 
> It means that this on_bare_metal() function won't work reliably in many
> cases. Also being part of include/linux/msi.h, at some point of time,
> this function will be picked by the users outside for the non-IMS cases.
> 
> I didn't even mention custom forks of QEMU which are prohibited to change
> DMI_SYS_VENDOR and private clouds with custom solutions.

In this case the private QEMU forks are encouraged to set CPUID
(X86_FEATURE_HYPERVISOR) if they do plan to adopt a different vendor name.

> 
> The current array turns the DMI_SYS_VENDOR interface into some sort of ABI.
> If in the future QEMU decides to use a more hipster name, for example
> "qEmU", this function won't work.
> 
> I'm aware that DMI_SYS_VENDOR is used heavily in the kernel code, and the
> various names for the same company are a good example of how unreliable it is.
> 
> The most hilarious example is "Dell/Dell Inc./Dell Inc/Dell Computer
> Corporation/Dell Computer",
> but other companies are not far from them.
> 
> Luckily enough, this identification is used for hardware product that
> was released to the market and their name will be stable for that
> specific model. It is not the case here where we need to ensure future
> compatibility too (old kernel on new VM emulator).
> 
> I'm not in a position to say yes or no to this patch and don't plan to do so.
> Just expressing my feeling that this solution is too hacky for my taste.
> 

I agree with your worries, and relying solely on DMI_SYS_VENDOR is
definitely too hacky. In previous discussions with Thomas we found no
elegant way to handle this situation; it has to be a heuristic approach.
First, we hope the CPUID bit is set properly in most cases, so it is
checked first. Then other heuristics can be applied to the remaining
cases. DMI_SYS_VENDOR is the first hint and more can be added later. For
example, when an IOMMU is present there is a vendor-specific way to detect
whether it's real or virtual. Dave also mentioned a BIOS flag that
indicates a virtual machine. Now probably the real question here is
whether people are OK with the CPUID+DMI_SYS_VENDOR combo check for now
(growing it later) or prefer to have all heuristics identified so far in
place together...
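
A rough user-space sketch of the CPUID-then-DMI ordering described above. This is not the code from the patch: the two-argument `probably_on_bare_metal()` signature is hypothetical, the inputs are passed in rather than read from hardware, and exact string comparison is an assumption. The vendor list is copied from the patch quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Vendor strings copied from the list quoted earlier in the thread. */
static const char * const possible_vmm_vendor_name[] = {
	"QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
	"innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE",
	"Microsoft Corporation"
};

/*
 * Hypothetical helper: hypervisor_bit models X86_FEATURE_HYPERVISOR and
 * sys_vendor models the DMI_SYS_VENDOR string (NULL if firmware has none).
 */
bool probably_on_bare_metal(bool hypervisor_bit, const char *sys_vendor)
{
	size_t n = sizeof(possible_vmm_vendor_name) /
		   sizeof(possible_vmm_vendor_name[0]);

	if (hypervisor_bit)
		return false;	/* first hint: CPUID says we are a guest */
	if (!sys_vendor)
		return true;	/* no second hint available */
	for (size_t i = 0; i < n; i++)
		if (strcmp(sys_vendor, possible_vmm_vendor_name[i]) == 0)
			return false;	/* second hint: known VMM vendor */
	return true;
}
```

With exact matching, a renamed vendor string such as "qEmU" falls through to the bare-metal answer unless the CPUID bit is set, which is the fragility Leon raised in the quoted text.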

Thanks
Kevin

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim 
> > 
> > This fixes a warning caused by wrong reserve tag usage in
> > __ufshcd_issue_tm_cmd.
> > 
> > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > blk_mq_get_tag+0x438/0x46c
> > 
> > And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
> > outstanding commands by waiting a bit for IO completion, like this.
> > 
> > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > 
> 
> Would you mind adding a Fixes tag?

Ok.

> 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  drivers/scsi/ufs/ufshcd.c | 36 
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 1678cec08b51..47fc8da3cbf9 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -44,6 +44,9 @@
> >  /* Query request timeout */
> >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > 
> > +/* LINERESET TIME OUT */
> > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > +
> >  /* Task management command timeout */
> >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > 
> > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> >  * check if power mode restore is needed.
> >  */
> > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > +   ktime_t start = ktime_get();
> > +
> > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > if (!hba->saved_uic_err)
> > hba->saved_err &= ~UIC_ERROR;
> > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > needs_restore = true;
> > spin_lock_irqsave(hba->host->host_lock, flags);
> > +   /* Wait for IO completion to avoid aborting IOs */
> > +   while (hba->outstanding_reqs) {
> > +   ufshcd_complete_requests(hba);
> > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +   schedule();
> > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > +   LINERESET_IO_TIMEOUT_MS) {
> > +   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
> > +   __func__, hba->outstanding_reqs);
> > +   break;
> > +   }
> > +   }
> > +
> > if (!hba->saved_err && !needs_restore)
> > goto skip_err_handling;
> > }
> > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > *__hba)
> > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > }
> > 
> > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > -   __func__, intr_status);
> > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > +   !ufshcd_eh_in_progress(hba)) {
> > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> > 0x%08x)\n",
> > +   __func__,
> > +   intr_status,
> > +   hba->ufs_stats.last_intr_status,
> > +   enabled_intr_status);
> > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > }
> > 
> > @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > *hba,
> >  * Even though we use wait_event() which sleeps indefinitely,
> >  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> >  */
> > -   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> > +   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> > +   BLK_MQ_REQ_NOWAIT);
> 
> Sorry that I didn't pay much attention to this part of the code before.
> May I know why we must use the BLK_MQ_REQ_RESERVED flag?

My understanding is that the reserved tag is used for aborting outstanding
IOs when all 32 tags are in use.

> 
> Thanks,
> Can Guo.
> 
> > +   if (IS_ERR(req))
> > +   return PTR_ERR(req);
> > +
> > req->end_io_data = 
> > free_slot = req->tag;
> > WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
> > @@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void
> > __iomem *mmio_base, unsigned int irq)
> > 
> > hba->tmf_tag_set = (struct blk_mq_tag_set) {
> > .nr_hw_queues   = 1,
> > +   .reserved_tags  = 1,
> 
> If we give reserved_tags as 1 and always ask for a tm request with
>

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> Hi Jaegeuk,
> 
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim 
> > 
> > This fixes a warning caused by wrong reserve tag usage in
> > __ufshcd_issue_tm_cmd.
> > 
> > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > blk_mq_get_tag+0x438/0x46c
> > 
> > And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
> > outstanding commands by waiting a bit for IO completion, like this.
> > 
> > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  drivers/scsi/ufs/ufshcd.c | 36 
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 1678cec08b51..47fc8da3cbf9 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -44,6 +44,9 @@
> >  /* Query request timeout */
> >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > 
> > +/* LINERESET TIME OUT */
> > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > +
> >  /* Task management command timeout */
> >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > 
> > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> >  * check if power mode restore is needed.
> >  */
> > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > +   ktime_t start = ktime_get();
> 
> I don't see the connection btw line-reset and following tmf cmd.
> My point is that line-reset is not the only non-fatal error which
> leads us to the following tmf cmd. So the wait should be outside
> of this check - just put it right before clearing outstanding reqs.

Ok. Let me move it in v4.
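
For discussion, the shape of such a bounded wait can be sketched in user space as below. This is not the v4 code: the callback types, `struct fake_hba` and the fake helpers are hypothetical and exist only so the loop can run outside the kernel. The real loop also drops host_lock and calls schedule() between polls, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks so the loop can run outside the kernel. */
typedef uint32_t (*outstanding_fn)(void *ctx); /* bitmap of pending reqs */
typedef int64_t (*now_ms_fn)(void *ctx);       /* monotonic time in ms */

/*
 * Poll until no request is outstanding or timeout_ms has elapsed.
 * Returns true if everything completed, false on timeout.
 */
bool wait_for_io_completion(outstanding_fn outstanding, now_ms_fn now_ms,
			    void *ctx, int64_t timeout_ms)
{
	assert(outstanding && now_ms && timeout_ms >= 0);

	int64_t start = now_ms(ctx);

	while (outstanding(ctx)) {
		if (now_ms(ctx) - start > timeout_ms)
			return false;
	}
	return true;
}

/* Toy device: the clock advances 1 ms per read, I/O ends at done_at_ms. */
struct fake_hba {
	int64_t clock_ms;
	int64_t done_at_ms;
};

uint32_t fake_outstanding(void *ctx)
{
	struct fake_hba *h = ctx;

	return h->clock_ms >= h->done_at_ms ? 0 : 1;
}

int64_t fake_now_ms(void *ctx)
{
	struct fake_hba *h = ctx;

	return h->clock_ms++;
}
```

Since the helper takes no error-type argument, it can be called from any path that is about to clear outstanding requests, not only the line-reset branch.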

> 
> Thanks,
> Can Guo.
> 
> > +
> > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > if (!hba->saved_uic_err)
> > hba->saved_err &= ~UIC_ERROR;
> > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > needs_restore = true;
> > spin_lock_irqsave(hba->host->host_lock, flags);
> > +   /* Wait for IO completion to avoid aborting IOs */
> > +   while (hba->outstanding_reqs) {
> > +   ufshcd_complete_requests(hba);
> > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +   schedule();
> > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > +   LINERESET_IO_TIMEOUT_MS) {
> > +   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
> > +   __func__, hba->outstanding_reqs);
> > +   break;
> > +   }
> > +   }
> > +
> > if (!hba->saved_err && !needs_restore)
> > goto skip_err_handling;
> > }
> > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > *__hba)
> > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > }
> > 
> > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > -   __func__, intr_status);
> > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > +   !ufshcd_eh_in_progress(hba)) {
> > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> > 0x%08x)\n",
> > +   __func__,
> > +   intr_status,
> > +   hba->ufs_stats.last_intr_status,
> > +   enabled_intr_status);
> > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > }
> > 
> > @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > *hba,
> >  * Even though we use wait_event() which sleeps indefinitely,
> >  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> >  */
> > -   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> > +   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> > +   BLK_MQ_REQ_NOWAIT);
> > +   if (IS_ERR(req))
> > +   return PTR_ERR(req);
> > +
> > req->end_io_data = 
> > free_slot = req->tag;
> > WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
> > @@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void
> > __iomem *mmio_base, unsigned int irq)
> > 
> > hba->tmf_tag_set = (struct blk_mq_tag_set) {
> > .nr_hw_queues   = 1,
> > +   .reserved_tags  = 1,
> > .queue_depth= hba->nutmrs,
> >

Re: [PATCH] drm/panel: feiyang-fy07024di26a30d: cleanup if panel attaching failed

2021-01-06 Thread Jagan Teki

On Thu, Jan 7, 2021 at 10:16 AM Icenowy Zheng  wrote:
>
>
>
> On January 6, 2021 at 5:47:20 PM GMT+08:00, Jagan Teki wrote:
> >On Sat, Nov 28, 2020 at 6:23 PM Icenowy Zheng  wrote:
> >>
> >> Attaching the panel can fail, so cleanup work is necessary, otherwise
> >> a pointer to freed struct drm_panel* will remain in drm_panel code.
> >>
> >> Do the cleanup if panel attaching failed.
> >>
> >> Fixes: 69dc678abc2b ("drm/panel: Add Feiyang FY07024DI26A30-D MIPI-DSI LCD panel")
> >
> >The fact that this has failed to probe due to recent changes in
> >sun6i_mipi_dsi.c I don't know how to put that into the commit message.
>
> It's not related, we shouldn't assume this panel driver will always
> be used with sunxi SoCs.

Well, I'm aware of it. What I'm trying to say is that this panel driver
was modeled on one of the existing panel drivers in the tree, which does
return the result of mipi_dsi_attach(), and it was verified with the DSI
host at that time.

>
> It's a panel driver bug that cannot deal with -EPROBE_DEFER well.

Yes, that is the reason I added the Reviewed-by tag above.

Jagan.

Re: [PATCH] arm64: dts: ls1028a: fix the offset of the reset register

2021-01-06 Thread Shawn Guo

On Thu, Jan 7, 2021 at 2:40 PM Shawn Guo  wrote:
>
> On Tue, Dec 15, 2020 at 10:26:22PM +0100, Michael Walle wrote:
> > The offset of the reset request register is 0, the absolute address is
> > 0x1e6. Boards without PSCI support will fail to perform a reset:
> >
> > [   26.734700] reboot: Restarting system
> > [   27.743259] Unable to restart system
> > [   27.746845] Reboot failed -- System halted
> >
> > Fixes: 8897f3255c9c ("arm64: dts: Add support for NXP LS1028A SoC")
> > Signed-off-by: Michael Walle 
>
> Out of curiosity, how did you get it fixed with your commit 3f0fb37b22b4

How did you *not*, I meant.

Shawn

> ("arm64: dts: ls1028a: fix reboot node") in the first place?

Re: [PATCH] arm64: dts: ls1028a: fix the offset of the reset register

2021-01-06 Thread Shawn Guo

On Tue, Dec 15, 2020 at 10:26:22PM +0100, Michael Walle wrote:
> The offset of the reset request register is 0, the absolute address is
> 0x1e6. Boards without PSCI support will fail to perform a reset:
> 
> [   26.734700] reboot: Restarting system
> [   27.743259] Unable to restart system
> [   27.746845] Reboot failed -- System halted
> 
> Fixes: 8897f3255c9c ("arm64: dts: Add support for NXP LS1028A SoC")
> Signed-off-by: Michael Walle 

Out of curiosity, how did you get it fixed with your commit 3f0fb37b22b4
("arm64: dts: ls1028a: fix reboot node") in the first place?

Shawn

> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index 045739dbcb17..0a5923e96d7f 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -102,7 +102,7 @@
>   reboot {
>   compatible ="syscon-reboot";
>   regmap = <>;
> - offset = <0xb0>;
> + offset = <0>;
>   mask = <0x02>;
>   };
>  
> -- 
> 2.20.1
>

Re: [PATCH 0/5] Add KDF implementations to crypto API

2021-01-06 Thread Stephan Mueller

On Monday, 04.01.2021 at 14:20 -0800, Eric Biggers wrote:
> On Mon, Jan 04, 2021 at 10:45:57PM +0100, Stephan Müller wrote:
> > The HKDF addition is used to replace the implementation in the filesystem
> > crypto extension. This code was tested by using an EXT4 encrypted file
> > system that was created and contains files written to by the current
> > implementation. Using the new implementation a successful read of the
> > existing files was possible and new files / directories were created
> > and read successfully. These newly added file system objects could be
> > successfully read using the current code. Yet if there is a test suite
> > to validate whether the invocation of the HKDF calculates the same
> > result as the existing implementation, I would be happy to validate
> > the implementation accordingly.
> 
> See https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html#tests
> for how to run the fscrypt tests.  'kvm-xfstests -c ext4 generic/582' should
> be enough for this, though you could run all the tests if you want.

I ran the $(kvm-xfstests -c encrypt -g auto) on 5.11-rc2 with and without my
HKDF changes. I.e. the testing shows the same results for both kernels which
seems to imply that my HKDF changes do not change the behavior.

I get the following errors on both occasions - let me know if I should dig a
bit more.


[failed, exit status 1] [06:19:21]- output mismatch (see
/results/ext4/results-encrypt/ext4/023.out.bad)
--- tests/ext4/023.out  2020-03-20 02:31:32.0 +
+++ /results/ext4/results-encrypt/ext4/023.out.bad  2021-01-07
06:19:21.292339438 +
@@ -1,3 +1,2 @@
 QA output created by 023
 Format and populate
-Mount
...
(Run 'diff -u /root/xfstests/tests/ext4/023.out /results/ext4/results-
encrypt/ext4/023.out.bad'  to see the entire )

[failed, exit status 1] [06:19:28]- output mismatch (see
/results/ext4/results-encrypt/ext4/028.out.bad)
--- tests/ext4/028.out  2020-03-20 02:31:32.0 +
+++ /results/ext4/results-encrypt/ext4/028.out.bad  2021-01-07
06:19:28.762339424 +
@@ -1,3 +1,2 @@
 QA output created by 028
 Format and mount
-Compare fsmap
...
(Run 'diff -u /root/xfstests/tests/ext4/028.out /results/ext4/results-
encrypt/ext4/028.out.bad'  to see the entire )

[failed, exit status 1] [06:21:02]- output mismatch (see
/results/ext4/results-encrypt/ext4/044.out.bad)
--- tests/ext4/044.out  2020-03-20 02:31:32.0 +
+++ /results/ext4/results-encrypt/ext4/044.out.bad  2021-01-07
06:21:02.215672727 +
@@ -1,2 +1,5 @@
 QA output created by 044
 Silence is golden
+mount: /vdc: wrong fs type, bad option, bad superblock on /dev/vdc,
missing codepage or helper program, or other e.
+ext3 mount failed
+(see /results/ext4/results-encrypt/ext4/044.full for details)
...
(Run 'diff -u /root/xfstests/tests/ext4/044.out /results/ext4/results-
encrypt/ext4/044.out.bad'  to see the entire )


generic/085 [06:32:40][  849.654788] run fstests generic/085 at
2021-01-07 06:32:40
[  849.903286] EXT4-fs (vdd): Test dummy encryption mode enabled
[  849.915355] EXT4-fs (vdd): mounted filesystem with ordered data mode. Opts:
acl,user_xattr,block_validity,test_dummy.
[  850.267282] dm-0: detected capacity change from 524288 to 0
[  850.369101] EXT4-fs (dm-0): mounted filesystem with ordered data mode.
Opts: (null). Quota mode: none.
[  850.370106] ext4 filesystem being mounted at /vdc supports timestamps until
2038 (0x7fff)
[  850.479981] EXT4-fs (dm-0): mounted filesystem with ordered data mode.
Opts: (null). Quota mode: none.
[  850.480782] ext4 filesystem being mounted at /vdc supports timestamps until
2038 (0x7fff)
[  850.530734] BUG: kernel NULL pointer dereference, address: 0058
[  850.531241] #PF: supervisor read access in kernel mode
[  850.531613] #PF: error_code(0x) - not-present page
[  850.532020] PGD 2a496067 P4D 2a496067 PUD 0 
[  850.532336] Oops:  [#1] SMP NOPTI
[  850.532604] CPU: 1 PID: 19542 Comm: dmsetup Not tainted 5.11.0-rc2-xfstests
#8
[  850.533156] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.14.0-1.fc33 04/01/2014
[  850.533780] RIP: 0010:thaw_bdev+0x47/0x90
[  850.534106] Code: 8b 83 d8 04 00 00 85 c0 74 57 83 e8 01 45 31 e4 85 c0 89
83 d8 04 00 00 7f 2d 48 8b bb 80 05 00 007
[  850.535447] RSP: 0018:b97586c2bcd8 EFLAGS: 00010286
[  850.535822] RAX:  RBX: 9df4a2e74240 RCX:
b97586c2bbdc
[  850.536361] RDX: 9df4fdc17e80 RSI: 9df4a2e74790 RDI:
9df48b0bf000
[  850.536864] RBP: 9df4a2e74720 R08:  R09:
00040216
[  850.537410] R10:  R11:  R12:

[  850.537950] R13:  R14: 0006 R15:
0001
[  850.538455] FS:  () GS:9df4fdc0(0063)
knlGS:f7a487c0
[  850.539063] CS:  0010 DS: 002b ES:

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Can Guo


On 2021-01-07 05:41, Jaegeuk Kim wrote:

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in
__ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
outstanding commands by waiting a bit for IO completion, like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out



Would you mind adding a Fixes tag?


Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1678cec08b51..47fc8da3cbf9 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */

+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */

@@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct 
*work)

 * check if power mode restore is needed.
 */
if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
+   ktime_t start = ktime_get();
+
hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
if (!hba->saved_uic_err)
hba->saved_err &= ~UIC_ERROR;
@@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct 
work_struct *work)

if (ufshcd_is_pwr_mode_restore_needed(hba))
needs_restore = true;
spin_lock_irqsave(hba->host->host_lock, flags);
+   /* Wait for IO completion to avoid aborting IOs */
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
if (!hba->saved_err && !needs_restore)
goto skip_err_handling;
}
@@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}

-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}

@@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
+   BLK_MQ_REQ_NOWAIT);


Sorry that I didn't pay much attention to this part of code before.
May I know why must we use the BLK_MQ_REQ_RESERVED flag?

Thanks,
Can Guo.


+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
@@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)

hba->tmf_tag_set = (struct blk_mq_tag_set) {
.nr_hw_queues   = 1,
+   .reserved_tags  = 1,


If we give reserved_tags as 1 and always ask for a tm request with
BLK_MQ_REQ_RESERVED flag set, then the tag shall only be allocated
from the reserved sbitmap_queue, whose depth is set to 1 here.
UFS supports a tm queue depth of 8, but this allows only one tm
req at a time. Why? Please correct me if my understanding is wrong.

Thanks,
Can Guo.


.queue_depth= hba->nutmrs,
.ops= &ufshcd_tmf_ops,
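
The bounded wait that the patch adds to ufshcd_err_handler() (poll for
outstanding requests, give up once LINERESET_IO_TIMEOUT_MS has elapsed) can be
sketched in user space as below. This is only an illustration under stated
assumptions: wait_until_drained(), the callback types and the fake_* helpers
are hypothetical names that do not exist in the driver, and the clock is
injected so the behaviour can be checked without sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks: "is work still pending?" and "current time in ms". */
typedef bool (*pending_fn)(void *ctx);
typedef int64_t (*now_ms_fn)(void *ctx);

/*
 * Poll until pending() reports no outstanding work or until more than
 * timeout_ms has elapsed since entry. Returns true when the work drained,
 * false on timeout. The real handler also drops and retakes a spinlock and
 * calls schedule() on each iteration; that part is omitted here.
 */
bool wait_until_drained(pending_fn pending, now_ms_fn now_ms, void *ctx,
			int64_t timeout_ms)
{
	int64_t start;

	assert(pending != NULL && now_ms != NULL);
	start = now_ms(ctx);
	while (pending(ctx)) {
		if (now_ms(ctx) - start > timeout_ms)
			return false;
	}
	return true;
}

/* Fake environment for experiments: time advances 1000 ms per clock read. */
struct fake_state {
	int64_t t_ms;
	long remaining;	/* number of polls that still report pending work */
};

int64_t fake_now(void *ctx)
{
	struct fake_state *s = ctx;

	s->t_ms += 1000;
	return s->t_ms;
}

bool fake_pending(void *ctx)
{
	struct fake_state *s = ctx;

	if (s->remaining > 0) {
		s->remaining--;
		return true;
	}
	return false;
}
```

With the fake clock, a state that drains after three polls returns true, while
one that never drains returns false once the simulated 30000 ms budget is
exceeded.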

Re: [PATCH] rtw88: 8821c: Add RFE 2 support

2021-01-06 Thread Kai-Heng Feng

On Wed, Aug 5, 2020 at 7:24 PM Kai-Heng Feng
 wrote:
>
> Hi Tony,
>
> > On Aug 5, 2020, at 19:18, Tony Chuang  wrote:
> >
> >> 8821CE with RFE 2 isn't supported:
> >> [   12.404834] rtw_8821ce :02:00.0: rfe 2 isn't supported
> >> [   12.404937] rtw_8821ce :02:00.0: failed to setup chip efuse info
> >> [   12.404939] rtw_8821ce :02:00.0: failed to setup chip information
> >>
> >
> > NACK
> >
> > The RFE type 2 should be working with some additional fixes.
> > Did you tested connecting to AP with BT paired?
>
> No, I only tested WiFi.
>
> > The antenna configuration is different with RFE type 0.
> > I will ask someone else to fix them.
> > Then the RFE type 2 modules can be supported.
>
> Good to know that, I'll be patient and wait for a real fix.

It's been quite some time, is support for RFE type 2 ready now?

Kai-Heng

>
> Kai-Heng
>
> >
> > Yen-Hsuan
>

Re: [PATCH v1 1/3] x86/cpufeatures: Add low performance CRC32C instruction CPU feature

2021-01-06 Thread Borislav Petkov

On Thu, Jan 07, 2021 at 02:19:06PM +0800, Tony W Wang-oc wrote:
> SSE4.2 on Zhaoxin CPUs are compatible with Intel. The presence of
> CRC32C instruction is enumerated by CPUID.01H:ECX.SSE4_2[bit 20] = 1.
> Some Zhaoxin CPUs declare support SSE4.2 instruction sets but their
> CRC32C instruction are working with low performance.
> 
> Add a synthetic CPU flag to indicates that the CRC32C instruction is
> not working as intended. This low performance CRC32C instruction flag
> is depend on X86_FEATURE_XMM4_2.
> 
> Signed-off-by: Tony W Wang-oc 
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 84b8878..9e8151b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -292,6 +292,7 @@
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL  (11*32+ 5) /* "" LFENCE in 
> kernel entry SWAPGS path */
>  #define X86_FEATURE_SPLIT_LOCK_DETECT(11*32+ 6) /* #AC for split 
> lock */
>  #define X86_FEATURE_PER_THREAD_MBA   (11*32+ 7) /* "" Per-thread Memory 
> Bandwidth Allocation */
> +#define X86_FEATURE_CRC32C   (11*32+ 8) /* "" Low performance CRC32C 
> instruction */

Didn't hpa say to create a BUG flag for it - X86_BUG...? Low performance
insn sounds like a bug and not a feature to me.

And call it X86_BUG_CRC32C_SLOW or ..._UNUSABLE to denote what it means.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[PATCH 2/2] watchdog: BD70528: conditionally allow BD70528 module

2021-01-06 Thread Matti Vaittinen

The BD70528 watchdog module provides a start/stop interface for the RTC
driver because the BD70528 watchdog must be stopped when the RTC time
is set. (The WDG uses the RTC counter, and setting the RTC may
accidentally trigger the WDG if it is enabled.) The BD71828 uses the
same RTC driver as the BD70528 but does not share the same WDG logic.
When the BD70528 watchdog is not configured, a stub for the "stop WDG"
call is provided, and this stub should be the one called when the
BD71828 is used. Prevent configuring in the BD70528 watchdog when the
BD71828 is configured, to avoid access to real WDG functions when the
WDG does not exist in HW.

Signed-off-by: Matti Vaittinen 
---
 drivers/watchdog/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index fd7968635e6d..40e1b4c69537 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -163,6 +163,7 @@ config SOFT_WATCHDOG_PRETIMEOUT
 config BD70528_WATCHDOG
tristate "ROHM BD70528 PMIC Watchdog"
depends on MFD_ROHM_BD70528
+   depends on MFD_ROHM_BD71828 = n
select WATCHDOG_CORE
help
  Support for the watchdog in the ROHM BD70528 PMIC. Watchdog trigger
-- 
2.25.4


-- 
Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~
Simon says - in Latin please.
~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~
Thanks to Simon Glass for the translation =]

Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-06 Thread Sai Prakash Ranjan


Hi Will,

On 2021-01-06 17:26, Will Deacon wrote:

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system 
cache.

Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via DMA api because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
}


drivers/iommu/io-pgtable.c currently documents this quirk as applying only
to the page-table walker. Given that we only have one user at the moment,
I think it's ok to change that, but please update the comment.



Sure, how about this change in comment:

 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
-*  attributes set in the TCR for a non-coherent page-table walker.
+*  attributes set in the TCR for a non-coherent page-table walker
+*  and also to set the correct cacheability attributes to use an
+*  outer level of cache for non-coherent masters.

We also need to decide on whether we want to allow the quirk to be passed
if the coherency of the page-table walker differs from the DMA device, since
we have these combinations:

    Coherent walker?    IOMMU_CACHE    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
0:  N                   0              0
1:  N                   0              1
2:  N                   1              0
3:  N                   1              1
4:  Y                   0              0
5:  Y                   0              1
6:  Y                   1              0
7:  Y                   1              1

Some of them are obviously bogus, such as (7), but I don't know what to
do about cases such as (3) and (5).
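
To make the eight combinations concrete, the sketch below models only the two
branches visible in the quoted hunk: IOMMU_CACHE is tested first, and the
quirk is consulted only when IOMMU_CACHE is clear. The enum, pick_attr() and
attr_for_combo() are hypothetical names for illustration, not kernel
identifiers, and ATTR_DEFAULT simply stands for "neither branch taken".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the attribute index the hunk would select. */
enum mem_attr {
	ATTR_DEFAULT,		/* neither branch taken */
	ATTR_CACHE,		/* the IOMMU_CACHE branch */
	ATTR_INC_OCACHE		/* the new IO_PGTABLE_QUIRK_ARM_OUTER_WBWA branch */
};

/* Mirrors the else-if ordering in the hunk: IOMMU_CACHE wins over the quirk. */
enum mem_attr pick_attr(bool iommu_cache, bool outer_wbwa_quirk)
{
	if (iommu_cache)
		return ATTR_CACHE;
	if (outer_wbwa_quirk)
		return ATTR_INC_OCACHE;
	return ATTR_DEFAULT;
}

/*
 * Row number from the table above: bit 2 = coherent walker,
 * bit 1 = IOMMU_CACHE, bit 0 = quirk. Walker coherency is not an input to
 * the hunk, so bit 2 is deliberately ignored here.
 */
enum mem_attr attr_for_combo(unsigned int combo)
{
	assert(combo < 8);
	return pick_attr((combo >> 1) & 1u, combo & 1u);
}
```

Under this model rows (3) and (7) resolve to the IOMMU_CACHE attribute because
of the branch order, and rows (1) and (5) are the only ones where the quirk
changes the result. Whether rows such as (5) should be accepted at all is the
policy question raised above and is not answered by the sketch.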



I thought this was already decided when IOMMU_SYS_CACHE_ONLY prot flag was
added in this same location [1]. dma-coherent masters can use the normal
cached memory type to use the system cache and non dma-coherent masters
willing to use system cache should use normal sys-cached memory type with
this quirk.

[1] 
https://lore.kernel.org/linux-arm-msm/20190516093020.18028-1-vivek.gau...@codeaurora.org/


Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

[PATCH 1/2] watchdog: bd70528: don't crash if WDG is configured with BD71828

2021-01-06 Thread Matti Vaittinen

If config for BD70528 watchdog is enabled when BD71828 or BD71815
are used the RTC module will issue call to BD70528 watchdog with
NULL data. Ignore this call and don't crash.

Signed-off-by: Matti Vaittinen 
---
 drivers/watchdog/bd70528_wdt.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/watchdog/bd70528_wdt.c b/drivers/watchdog/bd70528_wdt.c
index 0170b37e6674..fde242b8a4a6 100644
--- a/drivers/watchdog/bd70528_wdt.c
+++ b/drivers/watchdog/bd70528_wdt.c
@@ -49,6 +49,21 @@ int bd70528_wdt_set(struct rohm_regmap_dev *data, int enable, int *old_state)
u8 wd_ctrl_arr[3] = { WD_CTRL_MAGIC1, WD_CTRL_MAGIC2, 0 };
u8 *wd_ctrl = &wd_ctrl_arr[2];
 
+   /*
+* BD71828 and BD71815 use same RTC driver as BD70528.
+* BD71815 and BD71828 do not need MFD data as they do not share
+* RTC counter with watchdog. The BD70528 watchdog should not be
+* compiled in with BD71815 or BD71828 and the stub implementation
+* for the bd70528_wdt_set should be provided instead.
+*
+* If one compiles this watchdog with BD71828 or BD71815 - the call
+* from RTC may get here and the data pointer is NULL. In that case,
+* warn and go out.
+*/
+   if (!data) {
+   pr_warn("BD70528_WATCHDOG misconfigured\n");
+   return 0;
+   }
ret = regmap_read(bd70528->chip.regmap, BD70528_REG_WDT_CTRL, );
if (ret)
return ret;

base-commit: 2c85ebc57b3e1817b6ce1a6b703928e113a90442
-- 
2.25.4



Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc

On 03/01/2021 05:12, Herbert Xu wrote:
> On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, when
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic.
>>
>> When doing lmbench3 Create and Delete file test on partitions with
>> ext4 enabling metadata checksum, found using crc32c-generic driver
>> could get about 20% performance gain than using the driver crc32c-intel
>> on some Zhaoxin CPUs.
>>
>> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
>> to get performance gain, so remove these Zhaoxin CPUs support from
>> crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
>> ---
>>  arch/x86/crypto/crc32c-intel_glue.c | 21 +++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> This does not seem to address the latest comment from hpa.
> 

Yes, please ignore this patch. I have sent a new patch set per hpa's suggestion.

Sincerely
Tonyw

> Thanks,
>

[PATCH V3 2/2] scripts: dtc: Build fdtoverlay and fdtdump tools

2021-01-06 Thread Viresh Kumar

We will start building overlays for platforms soon in the kernel and
would need these tools going forward. Let's start building them.

The fdtoverlay program applies (or merges) one or more overlay dtb
blobs to a base dtb blob. The kernel build system would later use
fdtoverlay to generate the overlaid blobs based on platform specific
configurations.

The fdtdump program prints a readable version of a flat device-tree
file. This is a very useful tool to analyze the details of the overlay's
dtb and the final dtb produced by fdtoverlay after applying the
overlay's dtb to a base dtb.

Signed-off-by: Viresh Kumar 
---
V3:
- Updated log
- Remove libfdt_dir

 scripts/dtc/Makefile | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/scripts/dtc/Makefile b/scripts/dtc/Makefile
index 4852bf44e913..472ab8cd590c 100644
--- a/scripts/dtc/Makefile
+++ b/scripts/dtc/Makefile
@@ -1,12 +1,17 @@
 # SPDX-License-Identifier: GPL-2.0
 # scripts/dtc makefile
 
-hostprogs-always-$(CONFIG_DTC) += dtc
+hostprogs-always-$(CONFIG_DTC) += dtc fdtdump fdtoverlay
 hostprogs-always-$(CHECK_DT_BINDING)   += dtc
 
 dtc-objs   := dtc.o flattree.o fstree.o data.o livetree.o treesource.o \
   srcpos.o checks.o util.o
 dtc-objs   += dtc-lexer.lex.o dtc-parser.tab.o
+fdtdump-objs   := fdtdump.o util.o
+
+libfdt-objs:= fdt.o fdt_ro.o fdt_wip.o fdt_sw.o fdt_rw.o fdt_strerror.o fdt_empty_tree.o fdt_addresses.o fdt_overlay.o
+libfdt = $(addprefix libfdt/,$(libfdt-objs))
+fdtoverlay-objs:= $(libfdt) fdtoverlay.o util.o
 
 # Source files need to get at the userspace version of libfdt_env.h to compile
 HOST_EXTRACFLAGS += -I $(srctree)/$(src)/libfdt
-- 
2.25.0.rc1.19.g042ed3e048af

Re: [PATCH] gpio: bd7xxxx: use helper variable for pdev->dev

2021-01-06 Thread Vaittinen, Matti

Thanks for making this better :)

On Wed, 2021-01-06 at 11:11 +0100, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Using a helper local variable to store the address of &pdev->dev adds
> to readability and allows us to avoid unnecessary line breaks.
> 
> Signed-off-by: Bartosz Golaszewski 

Reviewed-by: Matti Vaittinen

RE: [PATCH] exfat: improve performance of exfat_free_cluster when using dirsync mount option

2021-01-06 Thread Sungjong Seo

> There are stressful update of cluster allocation bitmap when using dirsync
> mount option which is doing sync buffer on every cluster bit clearing.
> This could result in performance degradation when deleting big size file.
> Fix to update only when the bitmap buffer index is changed would make less
> disk access, improving performance especially for truncate operation.
> 
> Testing with Samsung 256GB sdcard, mounted with dirsync option (mount -t
> exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync)
> 
> Remove 4GB file, blktrace result.
> [Before] : 39 secs.
> Total (blktrace):
>  Reads Queued:          0,        0KiB   Writes Queued:      32775,    16387KiB
>  Read Dispatches:       0,        0KiB   Write Dispatches:   32775,    16387KiB
>  Reads Requeued:        0                Writes Requeued:        0
>  Reads Completed:       0,        0KiB   Writes Completed:   32775,    16387KiB
>  Read Merges:           0,        0KiB   Write Merges:           0,        0KiB
>  IO unplugs:            2                Timer unplugs:          0
> 
> [After] : 1 sec.
> Total (blktrace):
>  Reads Queued:          0,        0KiB   Writes Queued:         13,        6KiB
>  Read Dispatches:       0,        0KiB   Write Dispatches:      13,        6KiB
>  Reads Requeued:        0                Writes Requeued:        0
>  Reads Completed:       0,        0KiB   Writes Completed:      13,        6KiB
>  Read Merges:           0,        0KiB   Write Merges:           0,        0KiB
>  IO unplugs:            1                Timer unplugs:          0
> 
> Signed-off-by: Hyeongseok Kim 

Looks good.
Thanks for your work!

Acked-by: Sungjong Seo
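
The effect described in the quoted commit message (sync once per bitmap buffer
instead of once per cleared bit) can be illustrated with a small counting
model. This is a sketch under assumptions, not the exfat code: the function
names are hypothetical, the freed clusters are taken to be one contiguous run,
and bits_per_block stands for however many cluster bits one bitmap buffer
holds.

```c
#include <assert.h>
#include <stddef.h>

/* "Before": every cleared cluster bit triggers its own sync. */
unsigned long syncs_per_bit(unsigned long count)
{
	return count;
}

/*
 * "After": sync only when the run moves on to a different bitmap block,
 * plus one final sync for the last block touched.
 */
unsigned long syncs_per_block(unsigned long first, unsigned long count,
			      unsigned long bits_per_block)
{
	unsigned long syncs = 0;
	unsigned long prev, cur, i;

	assert(bits_per_block > 0);
	if (count == 0)
		return 0;

	prev = first / bits_per_block;
	for (i = 1; i < count; i++) {
		cur = (first + i) / bits_per_block;
		if (cur != prev) {
			syncs++;	/* previous block is complete, flush it */
			prev = cur;
		}
	}
	return syncs + 1;		/* flush the last block touched */
}
```

For example, clearing 32768 consecutive bits that all live in one 32768-bit
block costs 32768 syncs in the first model and 1 in the second; if the run
starts at bit 100 it straddles two blocks and costs 2. The numbers are
illustrative and are not meant to reproduce the blktrace figures quoted above.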

Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc



On 22/12/2020 12:54, h...@zytor.com wrote:
> On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote:
>> On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>>> On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
 On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
  wrote:
> On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>
>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
 On 12/12/2020 01:43, Eric Biggers wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>> wrote:
>> The driver crc32c-intel match CPUs supporting
> X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature,
 When
>> crc32c-intel and crc32c-generic are both registered, system
 will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver
> for
>> some Zhaoxin CPUs to get performance gain, So remove these
> Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
>
> Does this mean that the performance of the crc32c instruction
>>> on
> those CPUs is
> actually slower than a regular C implementation?  That's very
> weird.
>

 From the lmbench3 Create and Delete file test on those chips, I
> think yes.

>>>
>>> Did you try measuring the performance of the hashing itself, and
> not some
>>> higher-level filesystem operations?
>>>
>>
>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>> the
> same
>> input value the generic C implementation takes fewer time than the
>> crc32c instruction implementation.
>>
>
> And that is really "working as intended"?

 These CPU's crc32c instruction is not working as intended.

  Why do these CPUs even
> declare that
> they support the crc32c instruction, when it is so slow?
>

 The presence of crc32c and some other instructions supports are
 enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
 except the crc32c instruction.

> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>> AVX,
> etc.) that
> these CPUs similarly declare support for but they are uselessly
>> slow?

 No.

 Sincerely
 Tonyw

>
> - Eric
>>>
>>> Then the right thing to do is to disable the CPUID bit in the
>>> vendor-specific startup code.
>>
>> This way makes these CPUs do not support all instruction sets
>> enumerated
>> by CPUID.01:ECX[SSE4.2].
>> While only crc32c instruction is slow, just expect the crc32c-intel
>> driver do not
>> match these CPUs.
>>
>> Sincerely
>> Tonyw
> 
> Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We 
> *do not* bury this information in drivers; it becomes a recipe for the same 
> problems over and over.
> 

Thanks for your suggestion. I have sent a new patch set.

Sincerely
Tonyw
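
For readers who want to time "the hashing itself", as asked earlier in the
thread, a pure-software CRC32C is easy to write. The sketch below is a plain
bit-at-a-time version using the reflected Castagnoli polynomial 0x82F63B78. It
is not the kernel's crc32c-generic code and not the instruction-based path;
crc32c_sw() is a hypothetical name, and a serious benchmark would use a
table-driven variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli), reflected form.
 * Pass crc = 0 for a fresh computation; the pre- and post-inversion are
 * done inside, so results can be chained by passing the previous return
 * value back in as crc.
 */
uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;
	int bit;

	assert(p != NULL || len == 0);

	crc = ~crc;
	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++) {
			/* Apply the polynomial only when the low bit is set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		}
	}
	return ~crc;
}
```

The standard check value for CRC32C is 0xE3069283 for the nine ASCII bytes
"123456789", which is a convenient sanity test before running any timing loop.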

[PATCH] kasan: remove redundant config option

2021-01-06 Thread Walter Wu

CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN
stack instrumentation, but only one config option should be needed,
so remove CONFIG_KASAN_STACK_ENABLE. See [1].

For gcc we could use no prompt and a default value of y, and for clang
a prompt and a default value of n.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210221

Signed-off-by: Walter Wu 
Suggested-by: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Andrey Konovalov 
Cc: Alexander Potapenko 
Cc: Andrew Morton 
---
 arch/arm64/kernel/sleep.S|  2 +-
 arch/x86/kernel/acpi/wakeup_64.S |  2 +-
 include/linux/kasan.h|  2 +-
 lib/Kconfig.kasan| 11 ---
 mm/kasan/common.c|  2 +-
 mm/kasan/kasan.h |  2 +-
 mm/kasan/report_generic.c|  2 +-
 scripts/Makefile.kasan   | 10 --
 8 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 6bdef7362c0e..7c44ede122a9 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -133,7 +133,7 @@ SYM_FUNC_START(_cpu_resume)
 */
bl  cpu_do_resume
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
mov x0, sp
bl  kasan_unpoison_task_stack_below
 #endif
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 5d3a0b8fd379..c7f412f4e07d 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -112,7 +112,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
movqpt_regs_r14(%rax), %r14
movqpt_regs_r15(%rax), %r15
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
/*
 * The suspend path may have poisoned some areas deeper in the stack,
 * which we now need to unpoison.
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 5e0655fb2a6f..35d1e9b2cbfa 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -302,7 +302,7 @@ static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
 
 #endif /* CONFIG_KASAN */
 
-#if defined(CONFIG_KASAN) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
 void kasan_unpoison_task_stack(struct task_struct *task);
 #else
 static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index f5fa4ba126bf..59de74293454 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -138,9 +138,11 @@ config KASAN_INLINE
 
 endchoice
 
-config KASAN_STACK_ENABLE
-   bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST
+config KASAN_STACK
+   bool "Enable stack instrumentation (unsafe)"
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   default y if CC_IS_GCC
+   default n if CC_IS_CLANG
help
  The LLVM stack address sanitizer has a know problem that
  causes excessive stack usage in a lot of functions, see
@@ -154,11 +156,6 @@ config KASAN_STACK_ENABLE
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
 
-config KASAN_STACK
-   int
-   default 1 if KASAN_STACK_ENABLE || CC_IS_GCC
-   default 0
-
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
depends on KASAN_SW_TAGS
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 38ba2aecd8f4..02ec7f81dc16 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -63,7 +63,7 @@ void __kasan_unpoison_range(const void *address, size_t size)
unpoison_range(address, size);
 }
 
-#if CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_STACK)
 /* Unpoison the entire stack for a task. */
 void kasan_unpoison_task_stack(struct task_struct *task)
 {
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index cc4d9e1d49b1..bdfdb1cff653 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -224,7 +224,7 @@ void *find_first_bad_addr(void *addr, size_t size);
 const char *get_bug_type(struct kasan_access_info *info);
 void metadata_fetch_row(char *buffer, void *row);
 
-#if defined(CONFIG_KASAN_GENERIC) && CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_GENERIC) && defined(CONFIG_KASAN_STACK)
 void print_address_stack_frame(const void *addr);
 #else
 static inline void print_address_stack_frame(const void *addr) { }
diff --git a/mm/kasan/report_generic.c b/mm/kasan/report_generic.c
index 8a9c889872da..137a1dba1978 100644
--- a/mm/kasan/report_generic.c
+++ b/mm/kasan/report_generic.c
@@ -128,7 +128,7 @@ void metadata_fetch_row(char *buffer, void *row)
memcpy(buffer, kasan_mem_to_shadow(row), META_BYTES_PER_ROW);
 }
 
-#if CONFIG_KASAN_STACK
+#if defined(CONFIG_KASAN_STACK)
 static bool __must_check tokenize_frame_descr(const char **frame_descr,
  char *token, size_t
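
The repeated change from `#if CONFIG_KASAN_STACK` to
`#if defined(CONFIG_KASAN_STACK)` in this patch follows from turning the
option from an int into a bool: an int option is always defined (possibly as
0), while a disabled bool option is simply not defined. The user-space sketch
below shows the difference with two hypothetical macros standing in for
generated config symbols; none of these names exist in the kernel.

```c
#include <assert.h>

/* Stand-in for an int option whose value is 0: defined, but "off". */
#define CONFIG_DEMO_INT_OPTION 0

/* Stand-in for a disabled bool option: intentionally left undefined. */
/* #define CONFIG_DEMO_BOOL_OPTION 1 */

/* Value test: correct for an int option. */
int int_option_value_test(void)
{
#if CONFIG_DEMO_INT_OPTION
	return 1;
#else
	return 0;
#endif
}

/* defined() test on an int option: true even though the value is 0. */
int int_option_defined_test(void)
{
#if defined(CONFIG_DEMO_INT_OPTION)
	return 1;
#else
	return 0;
#endif
}

/* defined() test: correct for a bool option. */
int bool_option_defined_test(void)
{
#if defined(CONFIG_DEMO_BOOL_OPTION)
	return 1;
#else
	return 0;
#endif
}

/* Small self-check tying the three cases together. */
void demo_selfcheck(void)
{
	assert(int_option_value_test() == 0);
	assert(int_option_defined_test() == 1);
	assert(bool_option_defined_test() == 0);
}
```

This is why the value test and the defined() test cannot be mixed freely: once
the option is a bool, every user has to move to the defined() form, as the
hunks above do.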

[PATCH v1 0/3] crypto: x86/crc32c-intel - Exclude some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc

The driver crc32c-intel matches CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, the system will
use crc32c-intel because its .cra_priority is greater than that of
crc32c-generic.

When running the lmbench3 Create and Delete file test on partitions with
ext4 metadata checksums enabled, using the crc32c-generic driver was
found to give about a 20% performance gain over the crc32c-intel driver
on some Zhaoxin CPUs. Lower-level testing shows that, with the same
input value, the generic C implementation takes less time than the crc32c
instruction implementation on these CPUs. These CPUs should therefore
use the crc32c-generic driver to get the performance gain.

The presence of crc32c is enumerated by CPUID.01:ECX[SSE4.2] = 1, and
the other SSE4.2 instructions on these CPUs are OK.

Add a synthetic flag to indicate a low performance CRC32C instruction
implementation, set this flag in the Zhaoxin CPU-specific init phase,
and exclude CPUs which set this flag from the crc32c-intel driver.

https://lkml.org/lkml/2020/12/21/789

Tony W Wang-oc (3):
  x86/cpufeatures: Add low performance CRC32C instruction CPU feature
  x86/cpu: Set low performance CRC32C flag on some Zhaoxin CPUs
  crypto: x86/crc32c-intel Exclude low performance CRC32C instruction
CPUs

 arch/x86/crypto/crc32c-intel_glue.c | 5 +
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/kernel/cpu/centaur.c   | 7 +++
 arch/x86/kernel/cpu/cpuid-deps.c| 1 +
 arch/x86/kernel/cpu/zhaoxin.c   | 6 ++
 5 files changed, 20 insertions(+)

-- 
2.7.4

[PATCH v1 3/3] crypto: x86/crc32c-intel Exclude low performance CRC32C instruction CPUs

2021-01-06 Thread Tony W Wang-oc

CPUs with a low performance CRC32C instruction should use the
crc32c-generic driver, so remove support for these CPUs from crc32c-intel.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/crypto/crc32c-intel_glue.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..1b6d289 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -224,6 +224,11 @@ static int __init crc32c_intel_mod_init(void)
 {
if (!x86_match_cpu(crc32c_cpu_id))
return -ENODEV;
+
+   /* Don't merit use low performance CRC32C instruction */
+   if (boot_cpu_has(X86_FEATURE_CRC32C))
+   return -ENODEV;
+
 #ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
alg.update = crc32c_pcl_intel_update;
-- 
2.7.4

[PATCH v1 2/3] x86/cpu: Set low performance CRC32C flag on some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc

Some Zhaoxin CPUs declare support for the SSE4.2 instruction set but
have a CRC32C instruction implementation that is not working as
intended. Set the low performance CRC32C flag on these CPUs for later
use.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/kernel/cpu/centaur.c | 7 +++
 arch/x86/kernel/cpu/zhaoxin.c | 6 ++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 345f7d9..13e6fbe 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -109,6 +109,13 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
}
+
+   /*
+* These CPUs declare support SSE4.2 instruction sets but
+* having low performance CRC32C instruction implementation.
+*/
+   if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
+   set_cpu_cap(c, X86_FEATURE_CRC32C);
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c
index 05fa4ef..837ec65 100644
--- a/arch/x86/kernel/cpu/zhaoxin.c
+++ b/arch/x86/kernel/cpu/zhaoxin.c
@@ -79,6 +79,12 @@ static void early_init_zhaoxin(struct cpuinfo_x86 *c)
c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
}
 
+   /*
+* These CPUs declare support SSE4.2 instruction sets but
+* having low performance CRC32C instruction implementation.
+*/
+   if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
+   set_cpu_cap(c, X86_FEATURE_CRC32C);
 }
 
 static void init_zhaoxin(struct cpuinfo_x86 *c)
-- 
2.7.4
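
Taken together, patches 2/3 and 3/3 form a two-step pattern: vendor init code
marks the affected family/model range, and the driver's init refuses to
register when the mark is set. A minimal user-space sketch of that pattern
follows. struct demo_cpu, its crc32c_slow field and both functions are
hypothetical stand-ins (the series uses a cpufeature bit, and the review asks
for an X86_BUG-style name); only the family/model condition is copied from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU info structure. */
struct demo_cpu {
	unsigned int family;
	unsigned int model;
	bool crc32c_slow;	/* stands in for the synthetic flag */
};

/* Step 1: vendor init marks CPUs in the affected family/model range. */
void demo_early_init(struct demo_cpu *c)
{
	assert(c != NULL);
	c->crc32c_slow = (c->family == 0x6) ||
			 (c->family == 0x7 && c->model <= 0x3b);
}

/* Step 2: the accelerated driver declines to load on marked CPUs. */
int demo_crc32c_driver_init(const struct demo_cpu *c)
{
	assert(c != NULL);
	if (c->crc32c_slow)
		return -ENODEV;	/* leave the job to the generic driver */
	return 0;
}
```

Keeping the family/model knowledge in one place and having the driver test a
single flag is the design point made earlier in the thread: the quirk is not
buried in the driver itself.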

[PATCH v1 1/3] x86/cpufeatures: Add low performance CRC32C instruction CPU feature

2021-01-06 Thread Tony W Wang-oc

SSE4.2 on Zhaoxin CPUs is compatible with Intel. The presence of the
CRC32C instruction is enumerated by CPUID.01H:ECX.SSE4_2[bit 20] = 1.
Some Zhaoxin CPUs declare support for the SSE4.2 instruction set, but
their CRC32C instruction has low performance.

Add a synthetic CPU flag to indicate that the CRC32C instruction is
not working as intended. This low performance CRC32C instruction flag
depends on X86_FEATURE_XMM4_2.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b8878..9e8151b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -292,6 +292,7 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_CRC32C		(11*32+ 8) /* "" Low performance CRC32C instruction */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 42af31b6..7d7fca7 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AVX512_FP16,  X86_FEATURE_AVX512BW  },
{ X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{ X86_FEATURE_PER_THREAD_MBA,   X86_FEATURE_MBA   },
+   { X86_FEATURE_CRC32C,   X86_FEATURE_XMM4_2},
{}
 };
 
-- 
2.7.4

Re: [PATCH] proc_sysclt: fix oops caused by incorrect command parameters.

2021-01-06 Thread Xiaoming Ni


On 2021/1/7 7:46, Kees Cook wrote:

subject typo: "sysclt" -> "sysctl"

On Thu, Dec 24, 2020 at 03:42:56PM +0800, Xiaoming Ni wrote:

The process_sysctl_arg() does not check whether val is empty before
  invoking strlen(val). If the command line parameter () is incorrectly
  configured and val is empty, oops is triggered.

For example, "hung_task_panic=1" is incorrectly written as "hung_task_panic".

log:
Kernel command line:  hung_task_panic

[000n] user address but active_mm is swapper
Internal error: Oops: 9605 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.1 #1
Hardware name: linux,dummy-virt (DT)
pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
pc : __pi_strlen+0x10/0x98
lr : process_sysctl_arg+0x1e4/0x2ac
sp : ffc01104bd40
x29: ffc01104bd40 x28: 
x27: ff80c0a4691e x26: ffc0102a7c8c
x25:  x24: ffc01104be80
x23: ff80c22f0b00 x22: ff80c02e28c0
x21: ffc0109f9000 x20: 
x19: ffc0107c08de x18: 0003
x17: ffc01105d000 x16: 0054
x15:  x14: 3030253078413830
x13:  x12: 
x11: 0101010101010101 x10: 0005
x9 : 0003 x8 : ff80c0980c08
x7 :  x6 : 0002
x5 : ff80c0235000 x4 : ff810f7c7ee0
x3 : 043a x2 : 00bdcc4ebacf1a54
x1 :  x0 : 
Call trace:
 __pi_strlen+0x10/0x98
 parse_args+0x278/0x344
 do_sysctl_args+0x8c/0xfc
 kernel_init+0x5c/0xf4
 ret_from_fork+0x10/0x30
Code: b200c3eb 927cec01 f2400c07 54000301 (a8c10c22)

Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters
  from kernel command line")
Signed-off-by: Xiaoming Ni 
---
  fs/proc/proc_sysctl.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 317899222d7f..4516411a2b44 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1757,6 +1757,9 @@ static int process_sysctl_arg(char *param, char *val,
loff_t pos = 0;
ssize_t wret;
  
+	if (!val)
+		return 0;
+
if (strncmp(param, "sysctl", sizeof("sysctl") - 1) == 0) {
param += sizeof("sysctl") - 1;


Otherwise, yeah, this is a good test to add. I would make it more
verbose, though:

if (!val) {
pr_err("Missing param value! Expected '%s=...value...'\n", 
param);
return 0;
}


Yes, it's better to add log output.
Thank you for your review.
Do I need to send a v2 patch based on the review comments?

Thanks
Xiaoming Ni
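
The check discussed above can be tried out in userspace. This is only a minimal sketch, assuming a "param[=value]" string; split_param() and handle_param() are hypothetical names, not kernel functions, and the handler returns -1 on a missing value (the patch itself returns 0) so the two cases are easy to tell apart.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a "param[=value]" handler.
 * split_param() and handle_param() are illustrative names, not kernel APIs.
 */

/* Split "key=value" in place; returns NULL when there is no '='. */
static char *split_param(char *arg)
{
	char *eq = strchr(arg, '=');

	if (!eq)
		return NULL;
	*eq = '\0';
	return eq + 1;
}

/* Returns the value length, or -1 when the value is missing. */
static int handle_param(const char *param, const char *val)
{
	/* Without this check, strlen(NULL) below would crash. */
	if (!val) {
		fprintf(stderr, "Missing param value! Expected '%s=...value...'\n",
			param);
		return -1;
	}
	return (int)strlen(val);
}
```

With "hung_task_panic=1" the handler sees the value "1"; with a bare "hung_task_panic" it sees NULL and reports the problem instead of dereferencing it.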

[RESEND PATCH 2/2] misc: add support for retimers interfaces on Intel MAX 10 BMC

2021-01-06 Thread Xu Yilun

This driver supports the ethernet retimers (C827) for the Intel PAC
(Programmable Acceleration Card) N3000, which is an FPGA-based Smart NIC.

C827 is an Intel(R) Ethernet serdes transceiver chip that supports
up to 100G transfer. On the Intel PAC N3000 there are 2 C827 chips
managed by the Intel MAX 10 BMC firmware. They are configured in 4-port
10G/25G retimer mode. The host can query their link states and firmware
version information via retimer interfaces (shared registers) on the Intel
MAX 10 BMC. The driver creates sysfs interfaces for users to query this
information.

Signed-off-by: Xu Yilun 
---
 .../ABI/testing/sysfs-driver-intel-m10-bmc-retimer |  32 +
 drivers/misc/Kconfig   |  10 ++
 drivers/misc/Makefile  |   1 +
 drivers/misc/intel-m10-bmc-retimer.c   | 158 +
 4 files changed, 201 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-retimer
 create mode 100644 drivers/misc/intel-m10-bmc-retimer.c

diff --git a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-retimer b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-retimer
new file mode 100644
index 000..528712a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-retimer
@@ -0,0 +1,32 @@
+What:  /sys/bus/platform/devices/n3000bmc-retimer.*.auto/tag
+Date:  Jan 2021
+KernelVersion: 5.12
+Contact:   Xu Yilun 
+Description:   Read only. Returns the tag of the retimer chip. Now there are 2
+   retimer chips on Intel PAC N3000, they are tagged as
+   'retimer_A' and 'retimer_B'.
+   Format: "retimer_%c".
+
+What:  /sys/bus/platform/devices/n3000bmc-retimer.*.auto/sbus_version
+Date:  Jan 2021
+KernelVersion: 5.12
+Contact:   Xu Yilun 
+Description:   Read only. Returns the Transceiver bus firmware version of
+   the retimer chip.
+   Format: "0x%04x".
+
+What:  /sys/bus/platform/devices/n3000bmc-retimer.*.auto/serdes_version
+Date:  Jan 2021
+KernelVersion: 5.12
+Contact:   Xu Yilun 
+Description:   Read only. Returns the SERDES firmware version of the retimer
+   chip.
+   Format: "0x%04x".
+
+What:  /sys/bus/platform/devices/n3000bmc-retimer.*.auto/link_statusX
+Date:  Jan 2021
+KernelVersion: 5.12
+Contact:   Xu Yilun 
+Description:   Read only. Returns the status of each line side link. "1" for
+   link up, "0" for link down.
+   Format: "%u".
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index fafa8b0..7cb9433 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -466,6 +466,16 @@ config HISI_HIKEY_USB
  switching between the dual-role USB-C port and the USB-A host ports
  using only one USB controller.
 
+config INTEL_M10_BMC_RETIMER
+   tristate "Intel(R) MAX 10 BMC ethernet retimer interface support"
+   depends on MFD_INTEL_M10_BMC
+   help
+ This driver supports the ethernet retimer (C827) on Intel(R) MAX 10
+ BMC, which is used by Intel PAC N3000 FPGA based Smart NIC.
+
+ To compile this driver as a module, choose M here: the module will
+ be called intel-m10-bmc-retimer.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index d23231e..67883cf 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_HABANA_AI)   += habanalabs/
 obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)   += hisi_hikey_usb.o
+obj-$(CONFIG_INTEL_M10_BMC_RETIMER)+= intel-m10-bmc-retimer.o
diff --git a/drivers/misc/intel-m10-bmc-retimer.c b/drivers/misc/intel-m10-bmc-retimer.c
new file mode 100644
index 000..d845342b
--- /dev/null
+++ b/drivers/misc/intel-m10-bmc-retimer.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel Max10 BMC Retimer Interface Driver
+ *
+ * Copyright (C) 2021 Intel Corporation, Inc.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define N3000BMC_RETIMER_DEV_NAME "n3000bmc-retimer"
+
+struct m10bmc_retimer {
+   struct device *dev;
+   struct intel_m10bmc *m10bmc;
+   u32 ver_reg;
+   u32 id;
+};
+
+static ssize_t tag_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct m10bmc_retimer *retimer = dev_get_drvdata(dev);
+
+   return sysfs_emit(buf, "retimer_%c\n", 'A' + retimer->id);
+}
+static DEVICE_ATTR_RO(tag);
+
+static ssize_t sbus_version_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct m10bmc_retimer *retimer = dev_get_drvdata(dev);
+   unsigned int val;
+   int ret;
+
+   ret =

[RESEND PATCH 0/2] Add retimer interfaces support for Intel MAX 10 BMC

2021-01-06 Thread Xu Yilun

I resend this patchset to loop in networking developers for comments. This
is the previous thread. I'll fix other comments when I have a v2.

https://lore.kernel.org/lkml/x%2fv9hvxyluot9...@kroah.com/


The patchset is for the retimers connected to Intel MAX 10 BMC on Intel
PAC (Programmable Acceleration Card) N3000 Card. The network part of the
N3000 card is like the following:

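                       +----------------------------------------+
                       |                  FPGA                  |
  +----+   +-------+   +-----------+  +----------+  +-----------+   +----------+
  |QSFP|---|retimer|---|Line Side  |--|User logic|--|Host Side  |---|XL710     |
  +----+   +-------+   |Ether Group|  |          |  |Ether Group|   |Ethernet  |
                       |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|   |Controller|
                       +-----------+  |offloading|  +-----------+   +----------+
                       |              +----------+              |
                       |                                        |
                       +----------------------------------------+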

I had sent some RFC patches to expose the Line Side Ether Group + retimer +
QSFP as a netdev, and got some comments from netdev Maintainers.

https://lore.kernel.org/netdev/1603442745-13085-2-git-send-email-yilun...@intel.com/

The blocking issue is that the QSFP and retimer are physically managed
by the BMC, and the host can only get the retimer link states. This is
not enough to support some necessary netdev ops. E.g. the host cannot
determine the type/speed of the SFP via "ethtool -m", so users cannot
configure the various layers accordingly.

This means the existing net tools cannot work with it, so this patch just
exposes the link states as custom sysfs attrs.


This patchset supports the ethernet retimers (C827) for the Intel PAC
(Programmable Acceleration Card) N3000, which is an FPGA-based Smart NIC.

The 2 retimer chips connect to the Intel MAX 10 BMC on the card. They are
managed by the BMC firmware. The host can query their link states and
firmware version information via retimer interfaces (shared registers) on
the BMC. The driver creates sysfs interfaces for users to query this
information.

The Intel OPAE (Open Programmable Acceleration Engine) lib provides tools
to read these attributes.

This is the source of the OPAE lib.

https://github.com/OPAE/opae-sdk/

Generally, it facilitates development on all DFL (Device Feature
List) based FPGA cards, including the management of static region and
dynamic region reprogramming, access to accelerators, and the
board-specific peripherals.


Xu Yilun (2):
  mfd: intel-m10-bmc: specify the retimer sub devices
  misc: add support for retimers interfaces on Intel MAX 10 BMC

 .../ABI/testing/sysfs-driver-intel-m10-bmc-retimer |  32 +
 drivers/mfd/intel-m10-bmc.c|  19 ++-
 drivers/misc/Kconfig   |  10 ++
 drivers/misc/Makefile  |   1 +
 drivers/misc/intel-m10-bmc-retimer.c   | 158 +
 include/linux/mfd/intel-m10-bmc.h  |   7 +
 6 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-retimer
 create mode 100644 drivers/misc/intel-m10-bmc-retimer.c

-- 
2.7.4

[RESEND PATCH 1/2] mfd: intel-m10-bmc: specify the retimer sub devices

2021-01-06 Thread Xu Yilun

The patch specifies the 2 retimer sub devices and their resources in the
parent driver's mfd_cell. It also adds the register definition of the
retimer sub devices.

There are 2 ethernet retimer chips (C827) connected to the Intel MAX 10
BMC. They are managed by the BMC firmware, and host could query them via
retimer interfaces (shared registers) on the BMC. The 2 retimers have
identical register interfaces in different register addresses or fields,
so it is better we define 2 retimer devices and handle them with the same
driver.

Signed-off-by: Xu Yilun 
---
 drivers/mfd/intel-m10-bmc.c   | 19 ++-
 include/linux/mfd/intel-m10-bmc.h |  7 +++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/mfd/intel-m10-bmc.c b/drivers/mfd/intel-m10-bmc.c
index b84579b..e0a99a0 100644
--- a/drivers/mfd/intel-m10-bmc.c
+++ b/drivers/mfd/intel-m10-bmc.c
@@ -17,9 +17,26 @@ enum m10bmc_type {
M10_N3000,
 };
 
+static struct resource retimer0_resources[] = {
+   {M10BMC_PKVL_A_VER, M10BMC_PKVL_A_VER, "version", IORESOURCE_REG, },
+};
+
+static struct resource retimer1_resources[] = {
+   {M10BMC_PKVL_B_VER, M10BMC_PKVL_B_VER, "version", IORESOURCE_REG, },
+};
+
 static struct mfd_cell m10bmc_pacn3000_subdevs[] = {
{ .name = "n3000bmc-hwmon" },
-   { .name = "n3000bmc-retimer" },
+   {
+   .name = "n3000bmc-retimer",
+   .num_resources = ARRAY_SIZE(retimer0_resources),
+   .resources = retimer0_resources,
+   },
+   {
+   .name = "n3000bmc-retimer",
+   .num_resources = ARRAY_SIZE(retimer1_resources),
+   .resources = retimer1_resources,
+   },
{ .name = "n3000bmc-secure" },
 };
 
diff --git a/include/linux/mfd/intel-m10-bmc.h b/include/linux/mfd/intel-m10-bmc.h
index c8ef2f1..d6216f9 100644
--- a/include/linux/mfd/intel-m10-bmc.h
+++ b/include/linux/mfd/intel-m10-bmc.h
@@ -21,6 +21,13 @@
 #define M10BMC_VER_PCB_INFO_MSKGENMASK(31, 24)
 #define M10BMC_VER_LEGACY_INVALID  0x
 
+/* Retimer related registers, in system register region */
+#define M10BMC_PKVL_LSTATUS0x164
+#define M10BMC_PKVL_A_VER  0x254
+#define M10BMC_PKVL_B_VER  0x258
+#define M10BMC_PKVL_SERDES_VER GENMASK(15, 0)
+#define M10BMC_PKVL_SBUS_VER   GENMASK(31, 16)
+
 /**
  * struct intel_m10bmc - Intel MAX 10 BMC parent driver data structure
  * @dev: this device
-- 
2.7.4
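
The two version masks added above split one 32-bit register into a SERDES field (bits 15:0) and an SBUS field (bits 31:16). A minimal userspace sketch of that split follows; GENMASK32(), field_get32() and format_version() are hypothetical helpers written for this illustration, the register value is made up, and the "0x%04x" format is the one given in the ABI document.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace re-creation of a 32-bit GENMASK(h, l); the kernel has its own. */
#define GENMASK32(h, l) \
	((uint32_t)(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h)))))

#define PKVL_SERDES_VER	GENMASK32(15, 0)
#define PKVL_SBUS_VER	GENMASK32(31, 16)

/*
 * Extract a field: mask, then shift down by the mask's lowest set bit.
 * The mask must be non-zero.
 */
static uint32_t field_get32(uint32_t mask, uint32_t reg)
{
	uint32_t shift = 0;

	while (!((mask >> shift) & 1))
		shift++;
	return (reg & mask) >> shift;
}

/* Format one field the way the ABI document describes: "0x%04x". */
static int format_version(char *buf, size_t len, uint32_t mask, uint32_t reg)
{
	return snprintf(buf, len, "0x%04x\n", field_get32(mask, reg));
}
```

Because both retimers expose the same layout at different register addresses, the same two masks can serve both devices; only the register offset differs.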

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Can Guo


On 2021-01-07 05:41, Jaegeuk Kim wrote:

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called in each
recovery flow, such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..1678cec08b51 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);


Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?

Thanks,
Can Guo.



return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }

 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);

err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;

/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}

+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);

Re: [RFC PATCH v2 1/1] platform-msi: Add platform check for subdevice irq domain

2021-01-06 Thread Leon Romanovsky

On Thu, Jan 07, 2021 at 02:04:29AM +, Tian, Kevin wrote:
> > From: Leon Romanovsky 
> > Sent: Thursday, January 7, 2021 12:02 AM
> >
> > On Wed, Jan 06, 2021 at 11:23:39AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 06, 2021 at 12:40:17PM +0200, Leon Romanovsky wrote:
> > >
> > > > I asked what will you do when QEMU will gain needed functionality?
> > > > Will you remove QEMU from this list? If yes, how such "new" kernel will
> > > > work on old QEMU versions?
> > >
> > > The needed functionality is some VMM hypercall, so presumably new
> > > kernels that support calling this hypercall will be able to discover
> > > if the VMM hypercall exists and if so supersede this entire check.
> >
> > Let's not speculate, do we have well-known path?
> > Will such patch be taken to stable@/distros?
> >
>
> There are two functions introduced in this patch. One is to detect whether
> running on bare metal or in a virtual machine. The other is for deciding
> whether the platform supports ims. Currently the two are identical because
> ims is supported only on bare metal at current stage. In the future it will
> look like below when ims can be enabled in a VM:
>
> bool arch_support_pci_device_ims(struct pci_dev *pdev)
> {
>   return on_bare_metal() || hypercall_irq_domain_supported();
> }
>
> The VMM vendor list is for on_bare_metal, and suppose a vendor will
> never be removed once being added to the list since the fact of running
> in a VM never changes, regardless of whether this hypervisor supports
> extra VMM hypercalls.

This is what I imagined, this list will be forever, and this worries me.

I don't know if it is true or not, but guess that at least Oracle and
Microsoft bare metal devices and VMs will have same DMI_SYS_VENDOR.

It means that this on_bare_metal() function won't work reliably in many
cases. Also being part of include/linux/msi.h, at some point of time,
this function will be picked by the users outside for the non-IMS cases.

I didn't even mention custom forks of QEMU which are prohibited to change
DMI_SYS_VENDOR and private clouds with custom solutions.

The current array turns the DMI_SYS_VENDOR interface into some sort of ABI.
If QEMU decides in the future to use a more hipster name, for example "qEmU",
this function won't work.

I'm aware that DMI_SYS_VENDOR is used heavily in the kernel code, and the
various names for the same company are a good example of how unreliable it is.

The most hilarious example is "Dell/Dell Inc./Dell Inc/Dell Computer
Corporation/Dell Computer", but other companies are not far behind.

Luckily enough, this identification is used for hardware product that
was released to the market and their name will be stable for that
specific model. It is not the case here where we need to ensure future
compatibility too (old kernel on new VM emulator).

I'm not in a position to say yes or no to this patch and don't plan to do so.
I'm just expressing my feeling that this solution is too hacky for my taste.

Thanks
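
To make the concern concrete, here is a minimal userspace sketch of an exact-match vendor check. The list entries and the function body are assumptions made for illustration only; they are not taken from the patch under discussion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vendor list; the entries are illustrative, not the patch's. */
static const char *const vmm_vendors[] = { "QEMU", "VMware, Inc." };

/* Exact-match check: "bare metal" means "vendor string not in the list". */
static bool on_bare_metal(const char *sys_vendor)
{
	size_t i;

	for (i = 0; i < sizeof(vmm_vendors) / sizeof(vmm_vendors[0]); i++)
		if (strcmp(sys_vendor, vmm_vendors[i]) == 0)
			return false;
	return true;
}
```

In this sketch "QEMU" is detected as a VM, while a renamed "qEmU" is silently treated as bare metal, and so is any vendor string that was never added to the list.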

Re: [PATCH v3] scsi: ufs: Replace sprintf and snprintf with sysfs_emit

2021-01-06 Thread Can Guo


On 2021-01-07 05:15, Bean Huo wrote:

From: Bean Huo 

sprintf and snprintf can produce defective output in sysfs content; it is
better to use the newly added sysfs_emit function, which knows the size of
the temporary buffer.
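
The difference can be sketched in userspace. emit_bounded() below is a hypothetical helper, not sysfs_emit() itself: it assumes the caller passes the buffer capacity explicitly, whereas the kernel helper relies on the sysfs buffer being one page. The point it illustrates is that the return value never exceeds what was actually stored.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace analogue of a size-aware emit helper.
 * emit_bounded() is an illustrative name; it is not sysfs_emit() itself.
 */
static int emit_bounded(char *buf, size_t cap, const char *fmt, ...)
{
	va_list args;
	int len;

	if (!buf || cap == 0)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf, cap, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp to what was stored. */
	if ((size_t)len >= cap)
		len = (int)(cap - 1);
	return len;
}
```

A plain sprintf() has no capacity to check against, and snprintf() returns the length the output would have had, so a show() callback that returns that value can report more bytes than the buffer holds.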



Reviewed-by: Can Guo 


Reviewed-by: Avri Altman 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Bean Huo 
---
Nothing changed in this patch, just take it out from patchset:
https://patchwork.kernel.org/project/linux-scsi/cover/20201224172010.10701-1-huob...@gmail.com/

---
 drivers/scsi/ufs/ufs-sysfs.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 08e72b7eef6a..0e1438485133 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -67,7 +67,7 @@ static ssize_t rpm_lvl_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%d\n", hba->rpm_lvl);
+   return sysfs_emit(buf, "%d\n", hba->rpm_lvl);
 }

 static ssize_t rpm_lvl_store(struct device *dev,
@@ -81,7 +81,7 @@ static ssize_t rpm_target_dev_state_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%s\n", ufschd_ufs_dev_pwr_mode_to_string(
+   return sysfs_emit(buf, "%s\n", ufschd_ufs_dev_pwr_mode_to_string(
ufs_pm_lvl_states[hba->rpm_lvl].dev_state));
 }

@@ -90,7 +90,7 @@ static ssize_t rpm_target_link_state_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%s\n", ufschd_uic_link_state_to_string(
+   return sysfs_emit(buf, "%s\n", ufschd_uic_link_state_to_string(
ufs_pm_lvl_states[hba->rpm_lvl].link_state));
 }

@@ -99,7 +99,7 @@ static ssize_t spm_lvl_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%d\n", hba->spm_lvl);
+   return sysfs_emit(buf, "%d\n", hba->spm_lvl);
 }

 static ssize_t spm_lvl_store(struct device *dev,
@@ -113,7 +113,7 @@ static ssize_t spm_target_dev_state_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%s\n", ufschd_ufs_dev_pwr_mode_to_string(
+   return sysfs_emit(buf, "%s\n", ufschd_ufs_dev_pwr_mode_to_string(
ufs_pm_lvl_states[hba->spm_lvl].dev_state));
 }

@@ -122,7 +122,7 @@ static ssize_t spm_target_link_state_show(struct device *dev,
 {
struct ufs_hba *hba = dev_get_drvdata(dev);

-   return sprintf(buf, "%s\n", ufschd_uic_link_state_to_string(
+   return sysfs_emit(buf, "%s\n", ufschd_uic_link_state_to_string(
ufs_pm_lvl_states[hba->spm_lvl].link_state));
 }

@@ -165,7 +165,7 @@ static ssize_t auto_hibern8_show(struct device *dev,
ufshcd_release(hba);
pm_runtime_put_sync(hba->dev);

-   return scnprintf(buf, PAGE_SIZE, "%d\n", ufshcd_ahit_to_us(ahit));
+   return sysfs_emit(buf, "%d\n", ufshcd_ahit_to_us(ahit));
 }

 static ssize_t auto_hibern8_store(struct device *dev,
@@ -233,18 +233,18 @@ static ssize_t ufs_sysfs_read_desc_param(struct ufs_hba *hba,
return -EINVAL;
switch (param_size) {
case 1:
-   ret = sprintf(sysfs_buf, "0x%02X\n", *desc_buf);
+   ret = sysfs_emit(sysfs_buf, "0x%02X\n", *desc_buf);
break;
case 2:
-   ret = sprintf(sysfs_buf, "0x%04X\n",
+   ret = sysfs_emit(sysfs_buf, "0x%04X\n",
get_unaligned_be16(desc_buf));
break;
case 4:
-   ret = sprintf(sysfs_buf, "0x%08X\n",
+   ret = sysfs_emit(sysfs_buf, "0x%08X\n",
get_unaligned_be32(desc_buf));
break;
case 8:
-   ret = sprintf(sysfs_buf, "0x%016llX\n",
+   ret = sysfs_emit(sysfs_buf, "0x%016llX\n",
get_unaligned_be64(desc_buf));
break;
}
@@ -609,7 +609,7 @@ static ssize_t _name##_show(struct device *dev,\
  SD_ASCII_STD);\
if (ret < 0) \
goto out;   \
-   ret = snprintf(buf, PAGE_SIZE, "%s\n", desc_buf); \
+   ret = sysfs_emit(buf, "%s\n", desc_buf);  \
 out:   \
pm_runtime_put_sync(hba->dev);   \
kfree(desc_buf);\
@@ -659,7 +659,7 @@ static ssize_t _name##_show(struct device *dev,\
pm_runtime_put_sync(hba->dev);   \
if (ret)\
return -EINVAL;

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Can Guo


Hi Jaegeuk,

On 2021-01-07 05:41, Jaegeuk Kim wrote:

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in
__ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
outstanding commands by waiting a bit for IO completion like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1678cec08b51..47fc8da3cbf9 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */

+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */

@@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct *work)
 * check if power mode restore is needed.
 */
if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
+   ktime_t start = ktime_get();


I don't see the connection between line-reset and the following tmf cmd.
My point is that line-reset is not the only non-fatal error which
leads us to the following tmf cmd. So the wait should be outside
of this check - just put it right before clearing outstanding reqs.

Thanks,
Can Guo.


+
hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
if (!hba->saved_uic_err)
hba->saved_err &= ~UIC_ERROR;
@@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct *work)
if (ufshcd_is_pwr_mode_restore_needed(hba))
needs_restore = true;
spin_lock_irqsave(hba->host->host_lock, flags);
+   /* Wait for IO completion to avoid aborting IOs */
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
if (!hba->saved_err && !needs_restore)
goto skip_err_handling;
}
@@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}

-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}

@@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
+   BLK_MQ_REQ_NOWAIT);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
@@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
hba->tmf_tag_set = (struct blk_mq_tag_set) {
.nr_hw_queues   = 1,
+   .reserved_tags  = 1,
.queue_depth= hba->nutmrs,
.ops= &ufshcd_tmf_ops,
.flags  = BLK_MQ_F_NO_SCHED,

Re: [PATCH] vdpa/mlx5: Fix memory key MTT population

2021-01-06 Thread Eli Cohen

On Thu, Jan 07, 2021 at 12:15:53PM +0800, Jason Wang wrote:
> 
> On 2021/1/6 下午5:05, Eli Cohen wrote:
> > map_direct_mr() assumed that the number of scatter/gather entries
> > returned by dma_map_sg_attrs() was equal to the number of segments in
> > the sgl list. This led to wrong population of the mkey object. Fix this
> > by properly referring to the returned value.
> > 
> > In addition, get rid of fill_sg() whjich effect is overwritten bu
> > populate_mtts().
> 
> 
> Typo.
> 
Will fix, thanks.
> 
> > 
> > Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> > Signed-off-by: Eli Cohen 
> > ---
> >   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> >   drivers/vdpa/mlx5/core/mr.c| 28 
> >   2 files changed, 13 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > index 5c92a576edae..08f742fd2409 100644
> > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > @@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
> > struct sg_table sg_head;
> > int log_size;
> > int nsg;
> > +   int nent;
> > struct list_head list;
> > u64 offset;
> >   };
> > diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> > index 4b6195666c58..d300f799efcd 100644
> > --- a/drivers/vdpa/mlx5/core/mr.c
> > +++ b/drivers/vdpa/mlx5/core/mr.c
> > @@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
> > return (npages + 1) / 2;
> >   }
> > -static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
> > -{
> > -   struct scatterlist *sg;
> > -   __be64 *pas;
> > -   int i;
> > -
> > -   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   (*pas) = cpu_to_be64(sg_dma_address(sg));
> > -}
> > -
> >   static void mlx5_set_access_mode(void *mkc, int mode)
> >   {
> > MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
> > @@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
> >   static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> >   {
> > struct scatterlist *sg;
> > +   int nsg = mr->nsg;
> > +   u64 dma_addr;
> > +   u64 dma_len;
> > +   int j = 0;
> > int i;
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   mtt[i] = cpu_to_be64(sg_dma_address(sg));
> > +   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
> > +   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
> > +nsg && dma_len;
> > +nsg--, dma_addr += BIT(mr->log_size), dma_len -= BIT(mr->log_size))
> > +   mtt[j++] = cpu_to_be64(dma_addr);
> 
> 
> It looks to me the mtt entry is also limited by log_size. It's better to
> explain this a little bit in the commit log.

Actually, each MTT entry covers (1 << mr->log_size) bytes of contiguous
memory. I will add an explanation.

> 
> Thanks
> 
> 
> > +   }
> >   }
> >   static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> > @@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
> > return -ENOMEM;
> > MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> > -   fill_sg(mr, in);
> > mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
> > MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
> > MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
> > @@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
> >   done:
> > mr->log_size = log_entity_size;
> > mr->nsg = nsg;
> > -   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> > -   if (!err)
> > +   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> > +   if (!mr->nent)
> > goto err_map;
> > err = create_direct_mr(mvdev, mr);
>
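
The population loop discussed in this thread walks the DMA-mapped entries and emits one table entry per (1 << log_size) bytes of each entry. The following is a minimal userspace sketch of that idea only; struct dma_seg and fill_entries() are hypothetical stand-ins, and unlike the kernel loop it tolerates a segment length that is not a multiple of the entry size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatter/gather entry. */
struct dma_seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * Emit one table entry per (1 << log_size) bytes of every mapped segment.
 * Returns the number of entries written, never more than max_entries.
 */
static size_t fill_entries(const struct dma_seg *segs, size_t nsegs,
			   unsigned int log_size, uint64_t *out,
			   size_t max_entries)
{
	const uint64_t step = UINT64_C(1) << log_size;
	size_t i, n = 0;

	for (i = 0; i < nsegs; i++) {
		uint64_t addr = segs[i].addr;
		uint64_t left = segs[i].len;

		while (left && n < max_entries) {
			out[n++] = addr;
			addr += step;
			/* Do not underflow on a short final chunk. */
			left -= left < step ? left : step;
		}
	}
	return n;
}
```

The sketch shows why the number of mapped entries and the number of table entries are separate quantities: two mapped segments of 0x3000 and 0x1000 bytes with log_size 12 yield four table entries, not two.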

[PATCH] dt-bindings: mmc: sdhci-am654: Add compatible string for AM64 SoC

2021-01-06 Thread Aswath Govindraju

Add compatible string for AM64 SoC in device tree binding of AM654 SDHCI
module as the same IP is used.

Signed-off-by: Aswath Govindraju 
---
 Documentation/devicetree/bindings/mmc/sdhci-am654.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml b/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml
index 1ae945434c53..34e53db29428 100644
--- a/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml
+++ b/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml
@@ -21,6 +21,8 @@ properties:
   - ti,j721e-sdhci-4bit
   - ti,j7200-sdhci-8bit
   - ti,j721e-sdhci-4bit
+  - ti,am64-sdhci-8bit
+  - ti,am64-sdhci-4bit
 
   reg:
 maxItems: 2
-- 
2.17.1

Re: [PATCH V2 2/2] scripts: dtc: Build fdtoverlay and fdtdump tools

2021-01-06 Thread Masahiro Yamada

On Thu, Jan 7, 2021 at 2:16 PM Viresh Kumar  wrote:
>
> We will start building overlays for platforms soon in the kernel and
> would need these tools going forward. Lets start building them.


The commit log should explain how fdtdump and fdtoverlay are used
while building the kernel tree.







> Signed-off-by: Viresh Kumar 
> ---
>  scripts/dtc/Makefile | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/dtc/Makefile b/scripts/dtc/Makefile
> index 4852bf44e913..c607980a5c17 100644
> --- a/scripts/dtc/Makefile
> +++ b/scripts/dtc/Makefile
> @@ -1,12 +1,18 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # scripts/dtc makefile
>
> -hostprogs-always-$(CONFIG_DTC) += dtc
> +hostprogs-always-$(CONFIG_DTC) += dtc fdtdump fdtoverlay
>  hostprogs-always-$(CHECK_DT_BINDING)   += dtc
>
>  dtc-objs   := dtc.o flattree.o fstree.o data.o livetree.o treesource.o \
>srcpos.o checks.o util.o
>  dtc-objs   += dtc-lexer.lex.o dtc-parser.tab.o
> +fdtdump-objs   := fdtdump.o util.o
> +
> +libfdt_dir = libfdt


Adding 'libfdt_dir' is not helpful; it only
increases the amount of code.

Please hard-code 'libfdt'


> +libfdt-objs:= fdt.o fdt_ro.o fdt_wip.o fdt_sw.o fdt_rw.o fdt_strerror.o fdt_empty_tree.o fdt_addresses.o fdt_overlay.o
> +libfdt = $(addprefix $(libfdt_dir)/,$(libfdt-objs))
> +fdtoverlay-objs:= $(libfdt) fdtoverlay.o util.o
>  # Source files need to get at the userspace version of libfdt_env.h to compile
>  HOST_EXTRACFLAGS += -I $(srctree)/$(src)/libfdt
> --
> 2.25.0.rc1.19.g042ed3e048af
>


--
Best Regards

Masahiro Yamada

[PATCH] alarmtimer: Do not mess with an enqueued hrtimer

2021-01-06 Thread Li RongQing

When an hrtimer is already enqueued, its expiry must not be changed;
otherwise the ordering of the timerqueue RB tree is corrupted. If
another hrtimer is enqueued before this hrtimer is restarted, the
whole RB tree is completely hosed.

Fixes: 6cffe00f7d4e ("alarmtimer: Add functions for timerfd support")
Signed-off-by: Li RongQing 
---
 kernel/time/alarmtimer.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index f4ace1bf8382..3b34995ab8d2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -388,8 +388,7 @@ void alarm_restart(struct alarm *alarm)
unsigned long flags;
 
spin_lock_irqsave(&base->lock, flags);
-   hrtimer_set_expires(&alarm->timer, alarm->node.expires);
-   hrtimer_restart(&alarm->timer);
+   hrtimer_start(&alarm->timer, alarm->node.expires, HRTIMER_MODE_ABS);
alarmtimer_enqueue(base, alarm);
spin_unlock_irqrestore(&base->lock, flags);
 }
-- 
2.17.3
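
The hazard described in the commit message applies to any structure ordered by a key. The sketch below is only an analogy, assuming a singly linked list sorted by expiry instead of an RB tree; struct toy_timer and the toy_* helpers are hypothetical names and have nothing to do with the hrtimer implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer node in a list kept sorted by expiry (not an RB tree). */
struct toy_timer {
	uint64_t expires;
	struct toy_timer *next;
};

/* Insert so that the list stays sorted by ascending expiry. */
static void toy_enqueue(struct toy_timer **head, struct toy_timer *t)
{
	while (*head && (*head)->expires <= t->expires)
		head = &(*head)->next;
	t->next = *head;
	*head = t;
}

/* Unlink the node if it is queued; a no-op otherwise. */
static void toy_dequeue(struct toy_timer **head, struct toy_timer *t)
{
	while (*head && *head != t)
		head = &(*head)->next;
	if (*head)
		*head = t->next;
	t->next = NULL;
}

/* Safe re-arm: remove first, change the key, then insert again. */
static void toy_restart(struct toy_timer **head, struct toy_timer *t,
			uint64_t expires)
{
	toy_dequeue(head, t);
	t->expires = expires;
	toy_enqueue(head, t);
}

/* Returns 1 when the list is sorted by expiry, 0 otherwise. */
static int toy_is_sorted(const struct toy_timer *head)
{
	for (; head && head->next; head = head->next)
		if (head->expires > head->next->expires)
			return 0;
	return 1;
}
```

Writing a new expiry into a node that is still queued leaves it in its old position and breaks the ordering; toy_restart() avoids that by taking the node out before changing the key.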

Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-06 Thread Muchun Song

On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko  wrote:
>
> On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > There is a race condition between __free_huge_page()
> > and dissolve_free_huge_page().
> >
> > CPU0:                                   CPU1:
> >
> > // page_count(page) == 1
> > put_page(page)
> >   __free_huge_page(page)
> >                                         dissolve_free_huge_page(page)
> >                                           spin_lock(&hugetlb_lock)
> >                                           // PageHuge(page) && !page_count(page)
> >                                           update_and_free_page(page)
> >                                           // page is freed to the buddy
> >                                           spin_unlock(&hugetlb_lock)
> >     spin_lock(&hugetlb_lock)
> >     clear_page_huge_active(page)
> >     enqueue_huge_page(page)
> >     // It is wrong, the page is already freed
> >     spin_unlock(&hugetlb_lock)
> >
> > The race window is between put_page() and spin_lock(), which
> > is in __free_huge_page().
>
> The race window really is between put_page and dissolve_free_huge_page.
> And the result is that the put_page path would clobber an unrelated page
> (either free or already reused page) which is quite serious.
> Fortunately pages are dissolved very rarely. I believe that a user would
> need to be privileged to hit this intentionally.
>
> > We should make sure that the page is already on the free list
> > when it is dissolved.
>
> Another option would be to check for PageHuge in __free_huge_page. Have
> you considered that rather than add yet another state? The scope of the
> spinlock would have to be extended. If that sounds more tricky then can
> we check the page->lru in the dissolve path? If the page is still
> PageHuge and reference count 0 then there shouldn't be many options
> where it can be queued, right?

Did you mean that we iterate over the free list to check whether
the page is on the free list? If so, I do not think that is a better
solution than introducing another state: if there are a lot of pages
on the free list, the walk may take some time while holding
hugetlb_lock. Right? Actually, we have some tail page structs
to store the state. At least they are not in short supply right now.

Thanks.
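
The trade-off argued above can be sketched in userspace. This is a hedged toy model, not the hugetlb code: the toy_* names are hypothetical, and a plain bool stands in for the proposed HugeFreed state stored in a tail page struct. Both checks answer "is this page on the free list?", but the walk is linear in the list length while the flag test is constant time, which matters when the check runs under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a huge page head; not struct page. */
struct toy_page {
	struct toy_page *next;
	bool freed;	/* models the proposed per-page "freed" state */
};

struct toy_pool {
	struct toy_page *free_list;
};

/* Put a page on the free list and record that fact in the page itself. */
static void toy_enqueue_free(struct toy_pool *pool, struct toy_page *page)
{
	assert(!page->freed);
	page->next = pool->free_list;
	pool->free_list = page;
	page->freed = true;
}

/* O(n): walk the whole free list looking for the page. */
static bool toy_on_free_list_walk(const struct toy_pool *pool,
				  const struct toy_page *page)
{
	for (const struct toy_page *p = pool->free_list; p; p = p->next)
		if (p == page)
			return true;
	return false;
}

/* O(1): consult the state recorded at enqueue time. */
static bool toy_on_free_list_flag(const struct toy_page *page)
{
	return page->freed;
}
```

In this model a page whose last reference was just dropped, but which has not yet been enqueued, fails both checks, so a dissolve path that tests the flag would skip it instead of freeing it early.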

>
> > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle 
> > hugepage")
> > Signed-off-by: Muchun Song 
> > ---
> >  mm/hugetlb.c | 38 ++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4741d60f8955..8ff138c17129 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >  static int num_fault_mutexes;
> >  struct mutex *hugetlb_fault_mutex_table cacheline_aligned_in_smp;
> >
> > +static inline bool PageHugeFreed(struct page *head)
> > +{
> > + return (unsigned long)head[3].mapping == -1U;
> > +}
> > +
> > +static inline void SetPageHugeFreed(struct page *head)
> > +{
> > + head[3].mapping = (void *)-1U;
> > +}
> > +
> > +static inline void ClearPageHugeFreed(struct page *head)
> > +{
> > + head[3].mapping = NULL;
> > +}
> > +
> >  /* Forward declaration */
> >  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, 
> > struct page *page)
> >   list_move(>lru, >hugepage_freelists[nid]);
> >   h->free_huge_pages++;
> >   h->free_huge_pages_node[nid]++;
> > + SetPageHugeFreed(page);
> >  }
> >
> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -1044,6 +1060,7 @@ static struct page 
> > *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >
> >   list_move(>lru, >hugepage_activelist);
> >   set_page_refcounted(page);
> > + ClearPageHugeFreed(page);
> >   h->free_huge_pages--;
> >   h->free_huge_pages_node[nid]--;
> >   return page;
> > @@ -1291,6 +1308,17 @@ static inline void 
> > destroy_compound_gigantic_page(struct page *page,
> >   unsigned int order) { }
> >  #endif
> >
> > +/*
> > + * Because we reuse the mapping field of some tail page structs, we should
> > + * reset those mapping to initial value before @head is freed to the buddy
> > + * allocator. The invalid value will be checked in the 
> > free_tail_pages_check().
> > + */
> > +static inline void reset_tail_page_mapping(struct hstate *h, struct page 
> > *head)
> > +{
> > + if (!hstate_is_gigantic(h))
> > + head[3].mapping = TAIL_MAPPING;
> > +}
> > +
> >  static void update_and_free_page(struct hstate *h, struct page *page)
> >  {
> >   int i;
> > @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, 
> > struct page *page)
> >   if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >   return;
> >
> > + reset_tail_page_mapping(h, page);
> >   h->nr_huge_pages--;
> >

Re: [RFC 0/2] kbuild: Add support to build overlays (%.dtbo)

2021-01-06 Thread Masahiro Yamada

On Wed, Jan 6, 2021 at 12:21 AM Rob Herring  wrote:
>
> On Tue, Jan 5, 2021 at 4:24 AM Viresh Kumar  wrote:
> >
> > Hello,
> >
> > Here is an attempt to make some changes in the kernel to allow building
> > of device tree overlays.
> >
> > While at it, I would also like to discuss how we should mention
> > the base DT blobs in the Makefiles for the overlays, so they can be
> > build tested to make sure the overlays apply properly.
> >
> > A simple way is to mention that with -base extension, like this:
> >
> > $(overlay-file)-base := platform-base.dtb
> >
> > Any other preference ?

Viresh's patch is not enough.

We will need to change .gitignore
and scripts/Makefile.dtbinst as well.

In my understanding, the build rule is exactly the same
for .dtb and .dtbo.
As Rob mentioned, I am not sure if we really need/want
a separate extension.

A counter approach is to use an extension like '.ovl.dtb'.
It makes clear that the file is an overlay fragment without changing
anything in our build system or the upstream DTC project.

We use chained extensions in some places, for example,
.dt.yaml for schema yaml files.

dtb-$(CONFIG_ARCH_FOO) += \
foo-board.dtb \
foo-overlay1.ovl.dtb \
foo-overlay2.ovl.dtb

Overlay DT source file names must end with '.ovl.dts'

>
> I think we'll want something similar to how '-objs' works for modules:
>
> foo-board-1-dtbs := foo-board.dtb foo-overlay1.dtbo
> foo-board-2-dtbs := foo-board.dtb foo-overlay2.dtbo
> foo-board-1-2-dtbs := foo-board.dtb foo-overlay1.dtbo foo-overlay2.dtbo
> dtbs-y += foo-board-1.dtb foo-board-2.dtb foo-board-1-2.dtb
>
> (One difference here is we will want all the intermediate targets
> unlike .o files.)
>
> You wouldn't necessarily have all the above combinations, but you have
> to allow for them. I'm not sure how we'd handle applying any common
> overlays where the base and overlay are in different directories.

I guess the motivation for supporting -dtbs is to
add per-board -@ option only when it contains *.dtbo pattern.

But, as you notice, if the overlay files are located
under drivers/, it is difficult to add -@ per board.

Another scenario is, some people may want to compile
downstream overlay files (i.e. similar concept as external modules),
then we have no idea which base board should be given with the -@ flag.

I'd rather be tempted to add it globally

ifdef CONFIG_OF_OVERLAY
DTC_FLAGS += -@
endif

>
> Another thing here is adding all the above is not really going to
> scale on arm32 where we have a single dts directory. We need to move
> things to per vendor/soc family directories. I have the script to do
> this. We just need to agree on the vendor names and get Arnd/Olof to
> run it. I also want that so we can enable schema checks by default
> once a vendor is warning free (the whole tree is going to take
> forever).

If this is big churn anyway, perhaps we could go all the way
and decouple DT from Linux-arch.

arch/*/boot/dts/*.dts
 ->  dts//*.dts

Documentation/devicetree/bindings
 -> dts/Bindings/

include/dt-bindings/
 -> dts/include/dt-bindings/

Then other projects can take dts/
and reuse it.

> > Also fdtoverlay is an external entity right now, and is not part of the
> > kernel. Do we need to make it part of the kernel ? Or keep using the
> > external entity ?
>
> Part of the kernel. We just need to add it to the dtc sync script and
> makefile I think.
>
> Rob

--
Best Regards
Masahiro Yamada

Re: [PATCH v2 1/2] arm64: dts: mt8183: config dsi node

2021-01-06 Thread Nicolas Boichat

On Thu, Jan 7, 2021 at 1:22 PM Hsin-Yi Wang  wrote:
>
> Config dsi node for mt8183 kukui. Set panel and ports.
>
> Several kukui boards share the same panel property and only compatible
> is different. So compatible will be set in board dts for comparison
> convenience.

I like this, but maybe others have different opinions ,-)

Reviewed-by: Nicolas Boichat 

> Signed-off-by: Hsin-Yi Wang 
> ---
> Change:
> v2: move compatible to board dts
> ---
>  .../mediatek/mt8183-kukui-krane-sku176.dts|  5 +++
>  .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi | 37 +++
>  2 files changed, 42 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts 
> b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
> index 47113e275cb52..721d16f9c3b4f 100644
> --- a/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
> +++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
> @@ -16,3 +16,8 @@ / {
> model = "MediaTek krane sku176 board";
> compatible = "google,krane-sku176", "google,krane", "mediatek,mt8183";
>  };
> +
> + {
> +status = "okay";
> +compatible = "boe,tv101wum-nl6";
> +};
> diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi 
> b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
> index bf2ad1294dd30..d3d20e4773cf1 100644
> --- a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
> @@ -249,6 +249,35 @@  {
> proc-supply = <_vproc11_reg>;
>  };
>
> + {
> +   status = "okay";
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   panel: panel@0 {
> +   // compatible will be set in board dts
> +   reg = <0>;
> +   enable-gpios = < 45 0>;
> +   pinctrl-names = "default";
> +   pinctrl-0 = <_pins_default>;
> +   avdd-supply = <_lcd>;
> +   avee-supply = <_lcd>;
> +   pp1800-supply = <_lcd>;
> +   port {
> +   panel_in: endpoint {
> +   remote-endpoint = <_out>;
> +   };
> +   };
> +   };
> +
> +   ports {
> +   port {
> +   dsi_out: endpoint {
> +   remote-endpoint = <_in>;
> +   };
> +   };
> +   };
> +};
> +
>   {
> pinctrl-names = "default";
> pinctrl-0 = <_pins>;
> @@ -547,6 +576,14 @@ pins_clk {
> };
> };
>
> +   panel_pins_default: panel_pins_default {
> +   panel_reset {
> +   pinmux = ;
> +   output-low;
> +   bias-pull-up;
> +   };
> +   };
> +
> pwm0_pin_default: pwm0_pin_default {
> pins1 {
> pinmux = ;
> --
> 2.29.2.729.g45daf8777d-goog
>

[PATCH v2 1/2] arm64: dts: mt8183: config dsi node

2021-01-06 Thread Hsin-Yi Wang

Config dsi node for mt8183 kukui. Set panel and ports.

Several kukui boards share the same panel property and only compatible
is different. So compatible will be set in board dts for comparison
convenience.

Signed-off-by: Hsin-Yi Wang 
---
Change:
v2: move compatible to board dts
---
 .../mediatek/mt8183-kukui-krane-sku176.dts|  5 +++
 .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi | 37 +++
 2 files changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts 
b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
index 47113e275cb52..721d16f9c3b4f 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dts
@@ -16,3 +16,8 @@ / {
model = "MediaTek krane sku176 board";
compatible = "google,krane-sku176", "google,krane", "mediatek,mt8183";
 };
+
+ {
+status = "okay";
+compatible = "boe,tv101wum-nl6";
+};
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
index bf2ad1294dd30..d3d20e4773cf1 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
@@ -249,6 +249,35 @@  {
proc-supply = <_vproc11_reg>;
 };
 
+ {
+   status = "okay";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   panel: panel@0 {
+   // compatible will be set in board dts
+   reg = <0>;
+   enable-gpios = < 45 0>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_pins_default>;
+   avdd-supply = <_lcd>;
+   avee-supply = <_lcd>;
+   pp1800-supply = <_lcd>;
+   port {
+   panel_in: endpoint {
+   remote-endpoint = <_out>;
+   };
+   };
+   };
+
+   ports {
+   port {
+   dsi_out: endpoint {
+   remote-endpoint = <_in>;
+   };
+   };
+   };
+};
+
  {
pinctrl-names = "default";
pinctrl-0 = <_pins>;
@@ -547,6 +576,14 @@ pins_clk {
};
};
 
+   panel_pins_default: panel_pins_default {
+   panel_reset {
+   pinmux = ;
+   output-low;
+   bias-pull-up;
+   };
+   };
+
pwm0_pin_default: pwm0_pin_default {
pins1 {
pinmux = ;
-- 
2.29.2.729.g45daf8777d-goog

RE: [PATCH v3 3/3] iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

2021-01-06 Thread Liu, Yi L

Hi Will,

> From: Will Deacon 
> Sent: Wednesday, January 6, 2021 1:24 AM
> 
> On Tue, Jan 05, 2021 at 05:50:22AM +, Liu, Yi L wrote:
> > > > +static void __iommu_flush_dev_iotlb(struct device_domain_info
> *info,
> > > > +   u64 addr, unsigned int mask)
> > > > +{
> > > > +   u16 sid, qdep;
> > > > +
> > > > +   if (!info || !info->ats_enabled)
> > > > +   return;
> > > > +
> > > > +   sid = info->bus << 8 | info->devfn;
> > > > +   qdep = info->ats_qdep;
> > > > +   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> > > > +  qdep, addr, mask);
> > > > +}
> > > > +
> > > >   static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> > > >   u64 addr, unsigned mask)
> > > >   {
> > > > -   u16 sid, qdep;
> > > > unsigned long flags;
> > > > struct device_domain_info *info;
> > > > +   struct subdev_domain_info *sinfo;
> > > >
> > > > if (!domain->has_iotlb_device)
> > > > return;
> > > >
> > > > spin_lock_irqsave(_domain_lock, flags);
> > > > -   list_for_each_entry(info, >devices, link) {
> > > > -   if (!info->ats_enabled)
> > > > -   continue;
> > > > +   list_for_each_entry(info, >devices, link)
> > > > +   __iommu_flush_dev_iotlb(info, addr, mask);
> > > >
> > > > -   sid = info->bus << 8 | info->devfn;
> > > > -   qdep = info->ats_qdep;
> > > > -   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> > > > -   qdep, addr, mask);
> > > > +   list_for_each_entry(sinfo, >subdevices, link_domain) {
> > > > +   __iommu_flush_dev_iotlb(get_domain_info(sinfo->pdev),
> > > > +   addr, mask);
> > > > }
> > >
> > > Nit:
> > >   list_for_each_entry(sinfo, >subdevices, link_domain) {
> > >   info = get_domain_info(sinfo->pdev);
> > >   __iommu_flush_dev_iotlb(info, addr, mask);
> > >   }
> >
> > you are right. this should be better.
> 
> Please can you post a v4, with Lu's acks and the issue reported by Dan fixed
> too?

sure, will send out later.

Regards,
Yi Liu

> Thanks,
> 
> Will
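
For readers unfamiliar with the `sid = info->bus << 8 | info->devfn` expression in the quoted helper, here is a small userspace sketch of how a 16-bit PCI source ID is composed. The toy_* function names are hypothetical; the bit layout (bus in the high byte, 5-bit slot and 3-bit function in the low byte) is the conventional PCI bus/device/function encoding. In C, `<<` binds tighter than `|`, so the expression needs no extra parentheses.

```c
#include <stdint.h>

/* Pack a 5-bit slot and a 3-bit function number into a devfn byte. */
static uint8_t toy_pci_devfn(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Source ID: bus number in bits 15..8, devfn in bits 7..0. */
static uint16_t toy_pci_sid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((uint16_t)bus << 8 | devfn);
}
```

For example, slot 2 function 0 on bus 0x3a gives devfn 0x10 and source ID 0x3a10.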

[PATCH v2 2/2] arm64: dts: mt8183: Add krane-sku0 board.

2021-01-06 Thread Hsin-Yi Wang

Similar to krane-sku176 but using a different panel source.

Signed-off-by: Hsin-Yi Wang 
---
Change:
v2: move compatible to board dts
---
 .../devicetree/bindings/arm/mediatek.yaml |  1 +
 arch/arm64/boot/dts/mediatek/Makefile |  1 +
 .../dts/mediatek/mt8183-kukui-krane-sku0.dts  | 23 +++
 3 files changed, 25 insertions(+)
 create mode 100644 arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dts

diff --git a/Documentation/devicetree/bindings/arm/mediatek.yaml 
b/Documentation/devicetree/bindings/arm/mediatek.yaml
index 53f0d4e3ea982..3276f7a2ce672 100644
--- a/Documentation/devicetree/bindings/arm/mediatek.yaml
+++ b/Documentation/devicetree/bindings/arm/mediatek.yaml
@@ -120,6 +120,7 @@ properties:
   - const: mediatek,mt8183
   - description: Google Krane (Lenovo IdeaPad Duet, 10e,...)
 items:
+  - const: google,krane-sku0
   - const: google,krane-sku176
   - const: google,krane
   - const: mediatek,mt8183
diff --git a/arch/arm64/boot/dts/mediatek/Makefile 
b/arch/arm64/boot/dts/mediatek/Makefile
index 18f7b46c4095b..deba27ab76574 100644
--- a/arch/arm64/boot/dts/mediatek/Makefile
+++ b/arch/arm64/boot/dts/mediatek/Makefile
@@ -13,6 +13,7 @@ dtb-$(CONFIG_ARCH_MEDIATEK) += mt8173-elm-hana.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8173-elm-hana-rev7.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8173-evb.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8183-evb.dtb
+dtb-$(CONFIG_ARCH_MEDIATEK) += mt8183-kukui-krane-sku0.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8183-kukui-krane-sku176.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8192-evb.dtb
 dtb-$(CONFIG_ARCH_MEDIATEK) += mt8516-pumpkin.dtb
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dts 
b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dts
new file mode 100644
index 0..fb5ee91b6fe0e
--- /dev/null
+++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dts
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright 2019 Google LLC
+ *
+ * Device-tree for Krane sku0.
+ *
+ * SKU is a 8-bit value (0x00 == 0):
+ *  - Bits 7..4: Panel ID: 0x0 (AUO)
+ *  - Bits 3..0: SKU ID:   0x0 (default)
+ */
+
+/dts-v1/;
+#include "mt8183-kukui-krane.dtsi"
+
+/ {
+   model = "MediaTek krane sku0 board";
+   compatible = "google,krane-sku0", "google,krane", "mediatek,mt8183";
+};
+
+ {
+   status = "okay";
+   compatible = "auo,kd101n80-45na";
+};
-- 
2.29.2.729.g45daf8777d-goog
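
The SKU encoding described in the header comment of the new .dts file (bits 7..4 panel ID, bits 3..0 SKU ID) can be decoded as in the sketch below. This is a userspace illustration only; struct krane_sku and decode_sku() are hypothetical names, and applying the same layout to sku176 is an assumption that the patch does not state.

```c
#include <stdint.h>

/* Hypothetical decoded view of the 8-bit SKU value. */
struct krane_sku {
	uint8_t panel_id;	/* bits 7..4 */
	uint8_t sku_id;		/* bits 3..0 */
};

/* Split the SKU byte into its two 4-bit fields. */
static struct krane_sku decode_sku(uint8_t sku)
{
	struct krane_sku out = {
		.panel_id = (uint8_t)(sku >> 4),
		.sku_id = (uint8_t)(sku & 0x0f),
	};

	return out;
}
```

Under that assumption, sku0 decodes to panel ID 0x0 and SKU ID 0x0, and decimal 176 (0xb0) decodes to panel ID 0xb and SKU ID 0x0.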

Re: [PATCH v3 2/2] Input: cros-ec-keyb - Expose function row physical map to userspace

2021-01-06 Thread Dmitry Torokhov

Hi Philip,

On Mon, Jan 04, 2021 at 06:22:34PM -0800, Philip Chen wrote:
> The top-row keys in a keyboard usually have dual functionalities.
> E.g. A function key "F1" is also an action key "Browser back".
> 
> Therefore, when an application receives an action key code from
> a top-row key press, the application needs to know how to correlate
> the action key code with the function key code and do the conversion
> whenever necessary.
> 
> Userspace already knows the key scanlines (row/column)
> associated with a received key code, so it only
> needs a mapping between the key row/column and the matching physical
> location in the top row.
> 
> This patch enhances the cros-ec-keyb driver to create such a mapping
> and expose it to userspace in the form of a function-row-physmap
> attribute. The attribute would be a space separated ordered list of
> row/column codes, for the keys in the function row, in a left-to-right
> order.
> 
> The attribute will only be present when the device has a custom design
> for the top-row keys.
> 
> Signed-off-by: Philip Chen 
> ---
> 
> Changes in v3:
> - parse `function-row-physmap` from DT earlier, when we probe
>   cros_ec_keyb, and then store the extracted info in struct cros_ec_keyb.

Thank you for making the changes, much appreciated. Let's wait a bit to
see if Rob has any issues with this.

...

>  static int cros_ec_keyb_probe(struct platform_device *pdev)
>  {
>   struct cros_ec_device *ec = dev_get_drvdata(pdev->dev.parent);
> @@ -617,6 +690,12 @@ static int cros_ec_keyb_probe(struct platform_device 
> *pdev)
>   return err;
>   }
>  
> + err = sysfs_create_group(>kobj, _ec_keyb_attr_group);
> + if (err) {
> + dev_err(dev, "failed to create attributes. err=%d\n", err);
> + return err;
> + }

Let's use devm_device_add_group() so that we do not need to remove it
manually in cros_ec_keyb_remove().

Thanks.

-- 
Dmitry

[PATCH V2 2/2] scripts: dtc: Build fdtoverlay and fdtdump tools

2021-01-06 Thread Viresh Kumar

We will start building overlays for platforms soon in the kernel and
will need these tools going forward. Let's start building them.

Signed-off-by: Viresh Kumar 
---
 scripts/dtc/Makefile | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/scripts/dtc/Makefile b/scripts/dtc/Makefile
index 4852bf44e913..c607980a5c17 100644
--- a/scripts/dtc/Makefile
+++ b/scripts/dtc/Makefile
@@ -1,12 +1,18 @@
 # SPDX-License-Identifier: GPL-2.0
 # scripts/dtc makefile
 
-hostprogs-always-$(CONFIG_DTC) += dtc
+hostprogs-always-$(CONFIG_DTC) += dtc fdtdump fdtoverlay
 hostprogs-always-$(CHECK_DT_BINDING)   += dtc
 
 dtc-objs   := dtc.o flattree.o fstree.o data.o livetree.o treesource.o \
   srcpos.o checks.o util.o
 dtc-objs   += dtc-lexer.lex.o dtc-parser.tab.o
+fdtdump-objs   := fdtdump.o util.o
+
+libfdt_dir = libfdt
+libfdt-objs:= fdt.o fdt_ro.o fdt_wip.o fdt_sw.o fdt_rw.o fdt_strerror.o 
fdt_empty_tree.o fdt_addresses.o fdt_overlay.o
+libfdt = $(addprefix $(libfdt_dir)/,$(libfdt-objs))
+fdtoverlay-objs:= $(libfdt) fdtoverlay.o util.o
 
 # Source files need to get at the userspace version of libfdt_env.h to compile
 HOST_EXTRACFLAGS += -I $(srctree)/$(src)/libfdt
-- 
2.25.0.rc1.19.g042ed3e048af

[PATCH V2 1/2] scripts: dtc: Add fdtoverlay.c and fdtdump.c to DTC_SOURCE

2021-01-06 Thread Viresh Kumar

We will start building overlays for platforms soon in the kernel and
will need these tools going forward. Let's start fetching them.

Note that fdtdump.c was already copied over back in 2012,
but was never updated or built for some reason.

Signed-off-by: Viresh Kumar 
---
V2: Separate out this change from Makefile one.

This needs to be followed by invocation of the ./update-dtc-source.sh
script so the relevant files can be copied before the Makefile is
updated in the next patch.

 scripts/dtc/update-dtc-source.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/dtc/update-dtc-source.sh b/scripts/dtc/update-dtc-source.sh
index bc704e2a6a4a..9bc4afb71415 100755
--- a/scripts/dtc/update-dtc-source.sh
+++ b/scripts/dtc/update-dtc-source.sh
@@ -31,9 +31,9 @@ set -ev
 DTC_UPSTREAM_PATH=`pwd`/../dtc
 DTC_LINUX_PATH=`pwd`/scripts/dtc
 
-DTC_SOURCE="checks.c data.c dtc.c dtc.h flattree.c fstree.c livetree.c 
srcpos.c \
-   srcpos.h treesource.c util.c util.h version_gen.h yamltree.c \
-   dtc-lexer.l dtc-parser.y"
+DTC_SOURCE="checks.c data.c dtc.c dtc.h fdtdump.c fdtoverlay.c flattree.c \
+   fstree.c livetree.c srcpos.c srcpos.h treesource.c util.c \
+   util.h version_gen.h yamltree.c dtc-lexer.l dtc-parser.y"
 LIBFDT_SOURCE="fdt.c fdt.h fdt_addresses.c fdt_empty_tree.c \
fdt_overlay.c fdt_ro.c fdt_rw.c fdt_strerror.c fdt_sw.c \
fdt_wip.c libfdt.h libfdt_env.h libfdt_internal.h"
-- 
2.25.0.rc1.19.g042ed3e048af

Re: [PATCH v3 1/6] arm64: dts: imx8mq: Add NOC node

2021-01-06 Thread Shawn Guo

On Thu, Dec 10, 2020 at 11:09:01AM +0100, Martin Kepplinger wrote:
> From: Leonard Crestez 
> 
> Add initial support for dynamic frequency scaling of the main NOC
> on imx8mq.
> 
> Make DDRC the parent of the NOC (using passive governor) so that the
> main NOC is automatically scaled together with DDRC by default.
> 
> Support for proactive scaling via interconnect will come on top.
> 
> Signed-off-by: Leonard Crestez 
> Signed-off-by: Martin Kepplinger 
> ---
>  arch/arm64/boot/dts/freescale/imx8mq.dtsi | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi 
> b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> index a841a023e8e0..9c9d68a14e69 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> @@ -1158,6 +1158,28 @@
>   };
>   };
>  
> + noc: interconnect@3270 {
> + compatible = "fsl,imx8mq-noc", "fsl,imx8m-noc";
> + reg = <0x3270 0x10>;
> + clocks = < IMX8MQ_CLK_NOC>;
> + fsl,ddrc = <>;
> + operating-points-v2 = <_opp_table>;
> +
> + noc_opp_table: opp-table {
> + compatible = "operating-points-v2";
> +
> + opp-133M {
> + opp-hz = /bits/ 64 <1>;
> + };

Please have a newline between nodes.

Shawn

> + opp-400M {
> + opp-hz = /bits/ 64 <4>;
> + };
> + opp-800M {
> + opp-hz = /bits/ 64 <8>;
> + };
> + };
> + };
> +
>   bus@32c0 { /* AIPS4 */
>   compatible = "fsl,aips-bus", "simple-bus";
>   reg = <0x32c0 0x40>;
> -- 
> 2.20.1
>

RE: [PATCH 4/6] acpi/drivers/thermal: Remove TRIPS_NONE cooling device binding

2021-01-06 Thread Zhang, Rui

The ACPI thermal driver binds the devices listed in the _TZD method with
THERMAL_TRIPS_NONE.
Now, given that:
1. THERMAL_TRIPS_NONE is removed from the thermal framework
2. _TZP is rarely supported. I searched ~500 acpidumps from different platforms
reported by end users in the kernel Bugzilla; only one platform has _TZP
implemented, and that was almost 10 years ago.

I think it is safe to remove this piece of code.

> -Original Message-
> From: Daniel Lezcano 
> Sent: Tuesday, January 05, 2021 11:44 PM
> To: Zhang, Rui 
> Cc: mj...@codon.org.uk; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org; am...@kernel.org; thara.gopin...@linaro.org;
> Rafael J. Wysocki ; Len Brown ; open
> list:ACPI THERMAL DRIVER 
> Subject: Re: [PATCH 4/6] acpi/drivers/thermal: Remove TRIPS_NONE cooling
> device binding
> Importance: High
> 
> Hi Rui,
> 
> 
> On 15/12/2020 00:38, Daniel Lezcano wrote:
> > The loop is here to create default cooling device binding on the
> > THERMAL_TRIPS_NONE number, which used to serve the 'forced_passive'
> > feature. However, we removed all code dealing with that in the thermal
> > core, thus this binding no longer makes sense.
> >
> > Remove it.
> >
> > Signed-off-by: Daniel Lezcano 

Acked-by: Zhang Rui 

Thanks,
rui
> 
> Are you fine with this change?
> 
> Thanks
> 
>   -- Daniel
> 
> > ---
> >  drivers/acpi/thermal.c | 19 ---
> >  1 file changed, 19 deletions(-)
> >
> > diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> > index b5e4bc9e3282..26a89ff80a0e 100644
> > --- a/drivers/acpi/thermal.c
> > +++ b/drivers/acpi/thermal.c
> > @@ -764,25 +764,6 @@ static int acpi_thermal_cooling_device_cb(struct
> thermal_zone_device *thermal,
> > }
> > }
> >
> > -   for (i = 0; i < tz->devices.count; i++) {
> > -   handle = tz->devices.handles[i];
> > -   status = acpi_bus_get_device(handle, );
> > -   if (ACPI_SUCCESS(status) && (dev == device)) {
> > -   if (bind)
> > -   result = thermal_zone_bind_cooling_device
> > -   (thermal,
> THERMAL_TRIPS_NONE,
> > -cdev, THERMAL_NO_LIMIT,
> > -THERMAL_NO_LIMIT,
> > -
> THERMAL_WEIGHT_DEFAULT);
> > -   else
> > -   result =
> thermal_zone_unbind_cooling_device
> > -   (thermal,
> THERMAL_TRIPS_NONE,
> > -cdev);
> > -   if (result)
> > -   goto failed;
> > -   }
> > -   }
> > -
> >  failed:
> > return result;
> >  }
> >
> 
> 
> --
>  Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro: Facebook | Twitter | Blog
