date:20180723

Re: INFO: task hung in fuse_reverse_inval_entry

2018-07-23 Thread Miklos Szeredi

On Mon, Jul 23, 2018 at 3:37 PM, Dmitry Vyukov  wrote:
> On Mon, Jul 23, 2018 at 3:05 PM, Miklos Szeredi  wrote:

>> Biggest conceptual problem: your definition of fuse-server is weak.
>> Take the following example: process A is holding the fuse device fd
>> and is forwarding requests and replies to/from process B via a pipe.
>> So basically A is just a proxy that does nothing interesting, the
>> "real" server is B.  But according to your definition B is not a
>> server, only A is.
>
> I proposed to abort fuse conn when all fuse device fd's are "killed"
> (all processes having the fd opened are killed). So if _only_ process
> B is killed, then, yes, it will still hang. However if A is killed or
> both A and B (say, process group, everything inside of pid namespace,
> etc) then the deadlock will be autoresolved without human
> intervention.

Okay, so you're saying:

1) when a process gets SIGKILL and is in uninterruptible sleep, mark the process as doomed
2) for a particular fuse instance find set of fuse device fd
references that are in non-doomed tasks; if there are none then abort
fuse instance

Right?

The above is not an implementation proposal, just to get us on the
same page regarding the concept.
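The two-step concept can be modeled in a few lines of userspace C. This is only a sketch of the idea, not of any kernel code: `struct task_model`, `task_doomed()` and `should_abort_fuse_conn()` are hypothetical names, and as Miklos says, this is not an implementation proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task, reduced to the three facts the concept needs. */
struct task_model {
	bool got_sigkill;       /* SIGKILL has been delivered */
	bool uninterruptible;   /* task is stuck in uninterruptible sleep */
	bool holds_fuse_dev_fd; /* task holds a /dev/fuse fd for this instance */
};

/* Step 1: SIGKILL while in uninterruptible sleep marks the task as doomed. */
static bool task_doomed(const struct task_model *t)
{
	return t->got_sigkill && t->uninterruptible;
}

/* Step 2: abort the instance if no non-doomed task holds a device fd. */
static bool should_abort_fuse_conn(const struct task_model *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].holds_fuse_dev_fd && !task_doomed(&tasks[i]))
			return false; /* someone can still write replies */
	}
	return true;
}
```

In the proxy example, killing only B leaves A (the fd holder) non-doomed, so the connection is not aborted; once A is doomed too, it is.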

>> And this is just a simple example, parts of the server might be on
>> different machines, etc...  It's impossible to automatically detect if
>> a process is acting as a fuse server or not.
>
> It does not seem we need the precise definition. If no one ever can
> write anything into the fd, we can safely abort the connection (?).

Seems to me so.

> If
> we don't, we can either get that the process exits normally and the
> connection is doomed anyway, so no difference in behavior, or we can
> get a deadlock.
>
>> We could let the fuse server itself notify the kernel that it's a fuse
>> server.  That might help in the cases where the deadlock is
>> accidental, but obviously not in the case when done by a malicious
>> agent.  I'm not sure it's worth the effort.   Also I have no idea how
>> the respective maintainers would take the idea of "kill hooks"...   It
>> would probably be a lot of work for little gain.
>
> What looks wrong to me here is that fuse is the only (?) subsystem in
> kernel that stops SIGKILL from working and requires complex custom
> dance performed by a human operator (which is not necessary there at
> all). Say, if a process has opened a socket, whatever, I don't need to
> locate and abort something in socketctl fs, just SIGKILL. If a
> process has opened a file, I don't need to locate the fd in /proc
> and abort it, just SIGKILL. If a process has created an ipc object, I
> don't need to do any special dance, just SIGKILL. fuse is somehow very
> special, if we have more such cases, it definitely won't scale.
> I understand that there can be implementation difficulties, but
> fundamentally that's how things should work -- choose target
> processes, kill, done, right?

Yes, it would be nice.

But I'm not sure it will fly due to implementation difficulties.  It's
definitely not a high prio feature currently for me, but I'll happily
accept patches.

Thanks,
Miklos

Re: [PATCH 1/1] getxattr: use correct xattr length

2018-07-23 Thread Serge E. Hallyn

Quoting Christian Brauner (christian.brau...@canonical.com):
> On Wed, Jun 13, 2018 at 10:45:37AM -0500, Serge Hallyn wrote:
> > On Thu, Jun 07, 2018 at 01:43:48PM +0200, Christian Brauner wrote:
> > > When running in a container with a user namespace, if you call getxattr
> > > with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
> > > silently skips the user namespace fixup that it normally does resulting in
> > > un-fixed-up data being returned.
> > > This is caused by posix_acl_fix_xattr_to_user() being passed the total
> > > buffer size and not the actual size of the xattr as returned by
> > > vfs_getxattr().
> > > This commit passes the actual length of the xattr as returned by
> > > vfs_getxattr() down.
> > > 
> > > A reproducer for the issue is:
> > > 
> > >   touch acl_posix
> > > 
> > >   setfacl -m user:0:rwx acl_posix
> > > 
> > > and then compile:
> > > 
> > >   #define _GNU_SOURCE
> > >   #include <errno.h>
> > >   #include <stdio.h>
> > >   #include <stdlib.h>
> > >   #include <string.h>
> > >   #include <sys/types.h>
> > >   #include <sys/xattr.h>
> > >   #include <unistd.h>
> > >
> > >   /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
> > >   int main(int argc, char **argv)
> > >   {
> > >       ssize_t ret1, ret2;
> > >       /* Two buffer sizes: 128 % 8 != 4, but 132 % 8 == 4. */
> > >       char buf1[128], buf2[132];
> > >       int fret = EXIT_SUCCESS;
> > >       char *file;
> > >
> > >       if (argc < 2) {
> > >           fprintf(stderr,
> > >                   "Please specify a file with "
> > >                   "\"system.posix_acl_access\" permissions set\n");
> > >           _exit(EXIT_FAILURE);
> > >       }
> > >       file = argv[1];
> > >
> > >       ret1 = getxattr(file, "system.posix_acl_access",
> > >                       buf1, sizeof(buf1));
> > >       if (ret1 < 0) {
> > >           fprintf(stderr, "%s - Failed to retrieve "
> > >                   "\"system.posix_acl_access\" "
> > >                   "from \"%s\"\n", strerror(errno), file);
> > >           _exit(EXIT_FAILURE);
> > >       }
> > >
> > >       ret2 = getxattr(file, "system.posix_acl_access",
> > >                       buf2, sizeof(buf2));
> > >       if (ret2 < 0) {
> > >           fprintf(stderr, "%s - Failed to retrieve "
> > >                   "\"system.posix_acl_access\" "
> > >                   "from \"%s\"\n", strerror(errno), file);
> > >           _exit(EXIT_FAILURE);
> > >       }
> > >
> > >       if (ret1 != ret2) {
> > >           fprintf(stderr, "The value of \"system.posix_acl_"
> > >                   "access\" for file \"%s\" changed "
> > >                   "between two successive calls\n", file);
> > >           _exit(EXIT_FAILURE);
> > >       }
> > >
> > >       /* Both reads should return identical, fixed-up bytes. */
> > >       for (ssize_t i = 0; i < ret2; i++) {
> > >           if (buf1[i] == buf2[i])
> > >               continue;
> > >
> > >           fprintf(stderr,
> > >                   "Unexpected different in byte %zd: "
> > >                   "%02x != %02x\n", i, buf1[i], buf2[i]);
> > >           fret = EXIT_FAILURE;
> > >       }
> > >
> > >       if (fret == EXIT_SUCCESS)
> > >           fprintf(stderr, "Test passed\n");
> > >       else
> > >           fprintf(stderr, "Test failed\n");
> > >
> > >       _exit(fret);
> > >   }
> > > and run:
> > > 
> > >   ./tester acl_posix
> > > 
> > > On a non-fixed up kernel this should return something like:
> > > 
> > >   root@c1:/# ./t
> > >   Unexpected different in byte 16: ffa0 != 00
> > >   Unexpected different in byte 17: ff86 != 00
> > >   Unexpected different in byte 18: 01 != 00
> > > 
> > > and on a fixed kernel:
> > > 
> > >   root@c1:~# ./t
> > >   Test passed
> > > 
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945
> > > Reported-by: Colin Watson 
> > > Signed-off-by: Christian Brauner 
> > 
> > D'oh, sorry, I thought I had replied to this!
> > 
> > Acked-by: Serge Hallyn 
> 
> Should this still land as a bugfix in 4.18?

Eric, do you mind picking this up?

> Christian
> 
> > 
> > thanks,
> > Serge
> > 
> > > ---
> > >  fs/xattr.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xattr.c b/fs/xattr.c
> > > index f9cb1db187b7..1bee74682513 100644
> > > --- a/fs/xattr.c
> > > +++ b/fs/xattr.c
> > > @@ -539,7 +539,7 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
> > >   if (error > 0) {
> > >   if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
> > >   (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
> > > - posix_acl_fix_xattr_to_user(kvalue, size);
> > > + posix_acl_fix_xattr_to_user(kvalue, error);
> > >   if (size && copy_to_user(value, kvalue, error))
> > >
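Why a 128-byte buffer skips the user namespace fixup while a 132-byte one does not comes down to how the ACL xattr length is validated. Below is a userspace model, assuming a value layout of a 4-byte version header followed by 8-byte entries, and a validation rule modeled on the kernel's posix_acl_xattr_count() helper; the `_model` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 4-byte header, then 8-byte (tag, perm, id) entries. */
struct acl_xattr_header_model { uint32_t a_version; };
struct acl_xattr_entry_model { uint16_t e_tag; uint16_t e_perm; uint32_t e_id; };

/* Number of ACL entries in a value of `size` bytes, or -1 if `size`
 * is not exactly a header plus a whole number of entries. */
static int acl_xattr_count_model(size_t size)
{
	if (size < sizeof(struct acl_xattr_header_model))
		return -1;
	size -= sizeof(struct acl_xattr_header_model);
	if (size % sizeof(struct acl_xattr_entry_model))
		return -1; /* the fixup bails out silently in this case */
	return (int)(size / sizeof(struct acl_xattr_entry_model));
}
```

With the old call, the caller's buffer size is what gets validated: 128 yields -1 (no fixup) and 132 yields 16 (fixup runs). Passing `error`, the real value length, makes the check depend on the data instead of on the buffer.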

Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-23 Thread Tejun Heo

Hello,

On Mon, Jul 16, 2018 at 09:29:02AM +0100, Patrick Bellasi wrote:
> The cgroup's CPU controller allows assigning a specified (maximum)
> bandwidth to the tasks of a group. However, this bandwidth is defined and
> enforced only on a temporal basis, without considering the actual
> frequency a CPU is running at. Thus, the amount of computation completed
> by a task within an allocated bandwidth can be very different depending
> on the actual frequency at which the CPU is running that task.
> The amount of computation can also be affected by the specific CPU a
> task is running on, especially on asymmetric capacity
> systems like Arm's big.LITTLE.

One basic problem I have with this patchset is that what's being
described is way more generic than what actually got implemented.
What's described is computation bandwidth control but what's
implemented is just frequency clamping.  So, there are fundamental
discrepancies between description+interface vs. what it actually does.

I really don't think that's something we can fix up later.

> These attributes:
> 
> a) are available only for non-root nodes, both on default and legacy
>hierarchies
> b) do not enforce any constraints and/or dependency between the parent
>and its child nodes, thus relying on the delegation model and
>permission settings defined by the system management software

cgroup does host attributes which only concern the cgroup itself and
thus don't need any hierarchical behaviors on their own, but what's
being implemented does control resource allocation, and what you're
describing inherently breaks the delegation model.

> c) allow to (eventually) further restrict task-specific clamps defined
>    via sched_setattr(2)

Thanks.

-- 
tejun
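The patch description's claim, that a fixed time bandwidth yields different amounts of work at different frequencies, is simple arithmetic. A sketch with hypothetical numbers (a 50 ms quota, CPUs clocked at 2000 MHz and 500 MHz), assuming work scales linearly with clock frequency and ignoring per-core IPC and capacity differences:

```c
#include <stdint.h>

/* Cycles available to a task within its bandwidth quota.
 * 1 MHz is one cycle per microsecond, so cycles = quota_us * freq_mhz. */
static uint64_t cycles_in_quota(uint64_t quota_us, uint64_t freq_mhz)
{
	return quota_us * freq_mhz;
}
```

The same 50 ms quota gives 100,000,000 cycles at 2000 MHz but only 25,000,000 at 500 MHz, a 4x difference that time-based bandwidth control alone does not see.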

Re: [PATCH 1/4] x86/perf/intel: Introduce PMU flag for Extended PEBS

2018-07-23 Thread Peter Zijlstra

On Mon, Jul 23, 2018 at 11:43:56AM -0400, Liang, Kan wrote:
> > So I like PEBS_ALL.. what I don't like is that it seems to be mutually
> > exclusive with PEBS Load Latency.
> 
> Right, MSR_PEBS_ENABLE:32-35 is model specific.

Doesn't mean they couldn't have avoided conflicting bits.

> For Atom,
>   Goldmont and earlier platforms: they are reserved.
>   Goldmont Plus: 32-34 are for the fixed counters, 35 is reserved.
> For Core,
>   from Nehalem to the latest 8th generation: 32-35 are for Load Latency.

Seems rather unfortunate to me. Because PEBS_ALL is good, but since they
took conflicting bits, we'll have yet another variant when/if (I hope
they do) they bring it to Core :/
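The conflict is that the same MSR_PEBS_ENABLE bits mean different things per model. A table-style sketch that encodes only Kan's summary above (not the SDM); the enum and function names are hypothetical:

```c
/* Hypothetical platform classes, following Kan's summary. */
enum pebs_platform {
	ATOM_GOLDMONT_OR_EARLIER,
	ATOM_GOLDMONT_PLUS,
	CORE_NEHALEM_OR_LATER,
};

enum pebs_bit_use {
	PEBS_BIT_OUT_OF_RANGE,
	PEBS_BIT_RESERVED,
	PEBS_BIT_FIXED_COUNTER,
	PEBS_BIT_LOAD_LATENCY,
};

/* What MSR_PEBS_ENABLE bit `bit` is used for; only bits 32..35 are modeled. */
static enum pebs_bit_use pebs_enable_high_bit(enum pebs_platform p, unsigned int bit)
{
	if (bit < 32 || bit > 35)
		return PEBS_BIT_OUT_OF_RANGE;

	switch (p) {
	case ATOM_GOLDMONT_OR_EARLIER:
		return PEBS_BIT_RESERVED;
	case ATOM_GOLDMONT_PLUS:
		return bit <= 34 ? PEBS_BIT_FIXED_COUNTER : PEBS_BIT_RESERVED;
	case CORE_NEHALEM_OR_LATER:
		return PEBS_BIT_LOAD_LATENCY;
	}
	return PEBS_BIT_OUT_OF_RANGE;
}
```

Any bit in 32-34 decodes differently on Goldmont Plus and on Core, which is why a Core part with PEBS_ALL would need yet another variant.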

Re: [PATCH 02/10] mm: workingset: tell cache transitions from workingset thrashing

2018-07-23 Thread Johannes Weiner

On Mon, Jul 23, 2018 at 05:35:35PM +0200, Arnd Bergmann wrote:
> On Mon, Jul 23, 2018 at 5:23 PM, Johannes Weiner  wrote:
> > From 1d24635a6c7cd395bad5c29a3b9e5d2e98d9ab84 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner 
> > Date: Mon, 23 Jul 2018 10:18:23 -0400
> > Subject: [PATCH] arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap
> >  setups
> >
> > Arnd reports the following arm64 randconfig build error with the PSI
> > patches that add another page flag:
> >
> 
> You could add further text here that I had just added to my
> patch description (not sent):
> 
> Further experiments show that the build error already existed before,
> but was only triggered with larger values of CONFIG_NR_CPU and/or
> CONFIG_NODES_SHIFT that might be used in actual configurations but
> not in randconfig builds.
> 
> With longer CPU and node masks, I could recreate the problem with
> kernels as old as linux-4.7 when arm64 NUMA support got added.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
> Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region")

Sure thing.

> >  arch/arm64/mm/init.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 1b18b4722420..72c9b6778b0a 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -611,11 +611,13 @@ void __init mem_init(void)
> > BUILD_BUG_ON(TASK_SIZE_32   > TASK_SIZE_64);
> >  #endif
> >
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > /*
> 
> I tested it on two broken configurations, and found that you have
> a typo here, it should be 'ifdef', not 'ifndef'. With that change, it
> seems to build fine.
> 
> Tested-by: Arnd Bergmann 

Thanks for testing it, I don't have a cross-compile toolchain set up.

---

From 34c4c4549f09f971d2d391a8d652d56cb9b05475 Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Mon, 23 Jul 2018 10:18:23 -0400
Subject: [PATCH] arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap
 setups

Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:

  /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
  /git/arm-soc/include/linux/compiler.h:357:38: error: call to
  '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
  failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)

The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:

  #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
  #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
  #else
  #define LAST_CPUPID_WIDTH 0
  #endif

  #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
  #define LAST_CPUPID_NOT_IN_PAGE_FLAGS
  #endif

which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an estimate used to size the
VMEMMAP page array according to address space and struct page size.

However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.

Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.

Says Arnd:

Further experiments show that the build error already existed before,
but was only triggered with larger values of CONFIG_NR_CPU and/or
CONFIG_NODES_SHIFT that might be used in actual configurations but
not in randconfig builds.

With longer CPU and node masks, I could recreate the problem with
kernels as old as linux-4.7 when arm64 NUMA support got added.

Reported-by: Arnd Bergmann 
Tested-by: Arnd Bergmann 
Cc: sta...@vger.kernel.org
Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner 
---
 arch/arm64/mm/init.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1b18b4722420..86d9f9d303b0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -611,11 +611,13 @@ void __init mem_init(void)
BUILD_BUG_ON(TASK_SIZE_32   > TASK_SIZE_64);
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
/*
 * Make sure we chose the upper bound of sizeof(struct page)
-* correctly.
+* correctly when sizing the VMEMMAP array.
 */
BUILD_BUG_ON(sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT));
+#endif
 
if (PAGE_SIZE >= 16384 && get_num_physpages() <= 128) {
extern int sysctl_overcommit_memory;
-- 
2.18.0
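The page->flags budget described in the commit message can be checked with the same inequality as the quoted #if. In this sketch only the 22 section bits come from the commit message; the other widths (2 zone bits, 2 node bits, a 16-bit last_cpupid field, 22 vs 23 page flags on a 64-bit long) are made-up illustrative values, not taken from any real config.

```c
#include <stdbool.h>

/* Mirrors the #if above: true if last_cpupid still fits in page->flags,
 * false if it gets bumped into its own struct page member. */
static bool cpupid_fits_in_page_flags(int sections_width, int zones_width,
				      int nodes_shift, int last_cpupid_shift,
				      int bits_per_long, int nr_pageflags)
{
	return sections_width + zones_width + nodes_shift + last_cpupid_shift
	       <= bits_per_long - nr_pageflags;
}
```

With these numbers, a !VMEMMAP config uses 22 + 2 + 2 + 16 = 42 bits, which fits beside 22 page flags (64 - 22 = 42) but not beside 23 (64 - 23 = 41). A VMEMMAP config drops the section bits and fits either way, which is why the BUILD_BUG_ON only matters there.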

possible deadlock in mnt_want_write

2018-07-23 Thread syzbot


Hello,

syzbot found the following crash on:

HEAD commit:45ae4df92207 Merge tag 'armsoc-fixes' of git://git.kernel...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e7eee040
kernel config:  https://syzkaller.appspot.com/x/.config?x=c0bdc4175608181c
dashboard link: https://syzkaller.appspot.com/bug?extid=ae82084b07d0297e566b
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ae82084b07d0297e5...@syzkaller.appspotmail.com

device bridge_slave_0 left promiscuous mode
bridge0: port 1(bridge_slave_0) entered disabled state
IPVS: set_ctl: invalid protocol: 255 0.0.0.0:20004

==
WARNING: possible circular locking dependency detected
4.18.0-rc5+ #159 Not tainted
--
syz-executor7/24660 is trying to acquire lock:
7bd46ec8 (sb_writers#15){.+.+}, at: sb_start_write include/linux/fs.h:1554 [inline]
7bd46ec8 (sb_writers#15){.+.+}, at: mnt_want_write+0x3f/0xc0 fs/namespace.c:386


but task is already holding lock:
a4a13f7a (>mutex){+.+.}, at: fuse_lock_inode+0xaf/0xe0 fs/fuse/inode.c:363


which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (>mutex){+.+.}:
   __mutex_lock_common kernel/locking/mutex.c:757 [inline]
   __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894
   mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
   fuse_lock_inode+0xaf/0xe0 fs/fuse/inode.c:363
   fuse_lookup+0x8f/0x4c0 fs/fuse/dir.c:359
   __lookup_hash+0x12e/0x190 fs/namei.c:1505
   filename_create+0x1e5/0x5b0 fs/namei.c:3646
   user_path_create fs/namei.c:3703 [inline]
   do_mkdirat+0xda/0x310 fs/namei.c:3842
   __do_sys_mkdirat fs/namei.c:3861 [inline]
   __se_sys_mkdirat fs/namei.c:3859 [inline]
   __x64_sys_mkdirat+0x76/0xb0 fs/namei.c:3859
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (>i_mutex_dir_key#5/1){+.+.}:
   down_write_nested+0x93/0x130 kernel/locking/rwsem.c:192
   inode_lock_nested include/linux/fs.h:750 [inline]
   filename_create+0x1b2/0x5b0 fs/namei.c:3645
   user_path_create fs/namei.c:3703 [inline]
   do_mkdirat+0xda/0x310 fs/namei.c:3842
   __do_sys_mkdirat fs/namei.c:3861 [inline]
   __se_sys_mkdirat fs/namei.c:3859 [inline]
   __x64_sys_mkdirat+0x76/0xb0 fs/namei.c:3859
nla_parse: 14 callbacks suppressed
netlink: 3 bytes leftover after parsing attributes in process  
`syz-executor1'.

   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (sb_writers#15){.+.+}:
   lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
   percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline]
   percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
   __sb_start_write+0x1e9/0x300 fs/super.c:1403
   sb_start_write include/linux/fs.h:1554 [inline]
   mnt_want_write+0x3f/0xc0 fs/namespace.c:386
   path_removexattr+0xf0/0x210 fs/xattr.c:703
   __do_sys_removexattr fs/xattr.c:719 [inline]
   __se_sys_removexattr fs/xattr.c:716 [inline]
   __x64_sys_removexattr+0x59/0x80 fs/xattr.c:716
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  sb_writers#15 --> >i_mutex_dir_key#5/1 --> >mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(>mutex);
                               lock(>i_mutex_dir_key#5/1);
                               lock(>mutex);
  lock(sb_writers#15);

 *** DEADLOCK ***

1 lock held by syz-executor7/24660:
 #0: a4a13f7a (>mutex){+.+.}, at: fuse_lock_inode+0xaf/0xe0 fs/fuse/inode.c:363


stack backtrace:
CPU: 1 PID: 24660 Comm: syz-executor7 Not tainted 4.18.0-rc5+ #159
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_circular_bug.isra.36.cold.57+0x1bd/0x27d kernel/locking/lockdep.c:1227
 check_prev_add kernel/locking/lockdep.c:1867 [inline]
 check_prevs_add kernel/locking/lockdep.c:1980 [inline]
 validate_chain kernel/locking/lockdep.c:2421 [inline]
 __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3435
 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x1e9/0x300 fs/super.c:1403
 sb_start_write include/linux/fs.h:1554 [inline]
 mnt_want_write+0x3f/0xc0
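The "Chain exists of" section is a path in a lock-dependency graph, and the new acquisition would close it into a cycle. The toy model below illustrates that check with a depth-first search over three hypothetical lock-class names standing in for the classes in the report; it is not how lockdep is implemented.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical short names for the three lock classes in the report. */
enum { SB_WRITERS, DIR_I_MUTEX, FUSE_INODE_MUTEX, NR_LOCK_CLASSES };

/* after[a][b] records "b was taken while a was held"; these two edges
 * mirror the existing chain sb_writers --> dir i_mutex --> fuse mutex. */
static bool after[NR_LOCK_CLASSES][NR_LOCK_CLASSES] = {
	[SB_WRITERS][DIR_I_MUTEX] = true,       /* filename_create() */
	[DIR_I_MUTEX][FUSE_INODE_MUTEX] = true, /* fuse_lookup() */
};

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reachable(int from, int to, bool seen[NR_LOCK_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++)
		if (after[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Taking `wanted` while holding `held` adds the edge held -> wanted,
 * which closes a cycle if `held` is already reachable from `wanted`. */
static bool closes_cycle(int held, int wanted)
{
	bool seen[NR_LOCK_CLASSES];

	memset(seen, 0, sizeof(seen));
	return reachable(wanted, held, seen);
}
```

Holding the fuse inode mutex while taking sb_writers closes the cycle reported above, while the opposite order only extends the existing chain.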

Re: [PATCH] phy: qcom-qmp: Fix dts bindings to reflect reality

2018-07-23 Thread Doug Anderson

Hi,

On Mon, Jul 23, 2018 at 7:04 AM, Rob Herring  wrote:
> On Fri, Jul 20, 2018 at 11:54 AM Doug Anderson  wrote:
>>
>> Hi,
>>
>> On Fri, Jul 20, 2018 at 10:26 AM, Rob Herring  wrote:
>> > On Fri, Jul 20, 2018 at 9:13 AM Doug Anderson  
>> > wrote:
>> >>
>> >> Rob,
>> >>
>> >> On Fri, Jul 20, 2018 at 7:10 AM, Rob Herring  wrote:
>> >> > On Fri, Jul 06, 2018 at 04:31:42PM -0700, Douglas Anderson wrote:
>> >> >> A few patches have landed for the qcom-qmp PHY that affect how you
>> >> >> would write a device tree node.  ...yet the bindings weren't updated.
>> >> >> Let's remedy the situation and make the bindings reflect reality.
>> >> >
>> >> > "dt-bindings: phy: ..." for the subject.
>> >>
>> >> Sorry.  Every subsystem has different conventions for this so I
>> >> usually just do a "git log" on the file and make my best guess.  I'll
>> >> try to remember this for next time though.
>> >
>> > NP. I'd like to add this info to MAINTAINERS or maybe a git commit
>> > hook could figure this out automagically.
>> >
>> >> In this case, though, it looks like this already landed.  I see this
>> >> patch in Kishon's next branch.
>> >>
>> >>
>> >> >> Fixes: efb05a50c956 ("phy: qcom-qmp: Add support for QMP V3 USB3 PHY")
>> >> >> Fixes: ac0d239936bd ("phy: qcom-qmp: Add support for runtime PM")
>> >> >> Signed-off-by: Douglas Anderson 
>> >> >> ---
>> >> >>
>> >> >>  .../devicetree/bindings/phy/qcom-qmp-phy.txt   | 14 --
>> >> >>  1 file changed, 12 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> >> >> index 266a1bb8bb6e..0c7629e88bf3 100644
>> >> >> --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> >> >> +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> >> >> @@ -12,7 +12,14 @@ Required properties:
>> >> >>  "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845,
>> >> >>  "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on sdm845.
>> >> >>
>> >> >> - - reg: offset and length of register set for PHY's common serdes block.
>> >> >> + - reg:
>> >> >> +   - For "qcom,sdm845-qmp-usb3-phy":
>> >> >> + - index 0: address and length of register set for PHY's common serdes
>> >> >> +   block.
>> >> >> + - named register "dp_com" (using reg-names): address and length of the
>> >> >> +   DP_COM control block.
>> >> >
>> >> > You need to list reg-names and what are the names for the other 2
>> >> > regions?
>> >>
>> >> Here's how the code works.  You can tell me how you want this expressed in
>> >> the bindings:
>> >>
>> >> * In all cases the driver maps its main memory range (for the common
>> >> serdes block) without specifying any name.  This is equivalent to
>> >> asking for index 0.
>> >>
>> >> * For "qcom,sdm845-qmp-usb3-phy" the driver requests a second memory
>> >> range by name using the name "dp_com".
>> >>
>> >> ...basically the driver is inconsistent between using names and
>> >> indices and I was trying to document that fact.
>> >
>> > That's fine as long as the indices are fixed.
>> >
>> >>
>> >> I guess options:
>> >>
>> >> 1. I could reword this so it's clearer (open to suggestions)
>> >>
>> >> 2. I could add something to the bindings saying that the first reg
>> >> name should be "reg-base" or something.  Then the question is whether
>> >> we should go to the code and start enforcing that.  If we do choose to
>> >> enforce it then it's technically breaking compatibility (though I
>> >> doubt there is any real dts in the wild).  If we don't choose to
>> >> enforce it then why did we bother saying what it should be named?
>> >
>> > I think you need to state index 1 is dp_com (and only for
>> > "qcom,sdm845-qmp-usb3-phy") and a name for index 0. 'reg-base' doesn't
>> > seem great, but I don't have another suggestion.
>>
>> ...but why do we bother giving "dp_com" a name if we're saying it has
>> to be index 1?  It feels like the author of the driver was trying to
>> transition from specifying registers by index to
>> specifying them by name, but left the first register specified by
>> index for compatibility (or code simplicity?).  It seems like the
>> whole point of referring to things by name is to _not_ force the index
>> number.
>
> No. Specifying the order and indexes is how bindings are done.
> "-names" is extra information, not a license to change the rules.

OK.
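Rob's rule can be illustrated with a small model of name-based lookup: a name in reg-names only resolves to a position, and that position selects the matching reg entry, so the binding still has to fix the order. The helper below is hypothetical (in the kernel, such lookups go through OF helpers like of_property_match_string()), and "reg-base" is only the placeholder name floated in the thread.

```c
#include <string.h>

/* Hypothetical model: return the position of `name` in reg-names,
 * which is also the index of its entry in reg, or -1 if absent. */
static int reg_index_by_name(const char *const *reg_names, int n,
			     const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(reg_names[i], name) == 0)
			return i;
	return -1;
}
```

A driver that maps index 0 directly and looks up "dp_com" by name works only if "dp_com" really sits at index 1, which is why the binding states both the order and the names.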

Just for context: I'm not trying to be argumentative or anything--I
just seem to be lacking a fundamental understanding of why reg-names
exists and when it should be used.  I'm trying to ask questions so I
can have a better intuition here and submit better patches / do better
reviews.  I'm sorry if I'm coming off as a jerk.  :(  I'm not trying
to be.

Do you happen to know if there's anything written up that explains all
the conventions around reg-names and I can just read

[PATCH i2c-next] i2c: aspeed: Handle master/slave combined irq events properly

2018-07-23 Thread Jae Hyun Yoo

In most cases, interrupt bits are set one by one, but there are also
a lot of other cases in which the Aspeed I2C IP sends multiple
interrupt bits, combining master and slave events, in a single
interrupt call. This happens much more often in a multi-master
environment than in a single-master one. For example, when the master
is waiting for a NORMAL_STOP interrupt in its MASTER_STOP state,
SLAVE_MATCH and RX_DONE interrupts could come along with the
NORMAL_STOP if another master immediately sends data just after
acquiring the bus. In this case, the NORMAL_STOP interrupt should be
handled by master_irq and the SLAVE_MATCH and RX_DONE interrupts
should be handled by slave_irq. This commit modifies the irq handling
logic to handle the master/slave combined events properly.
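The splitting idea can be sketched as a pure function on one status word: the slave side claims the bits it acknowledges and whatever is left goes to the master side. This is a simplified, hypothetical model of the example scenario, not the driver's logic; the SLAVE_MATCH and NORMAL_STOP bit positions are assumptions (only RX_DONE's position appears in the patch).

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_SLAVE_MATCH (1u << 7) /* assumed bit position */
#define INTR_NORMAL_STOP (1u << 4) /* assumed bit position */
#define INTR_RX_DONE     (1u << 2) /* matches BIT(2) in the patch */

/* Simplified slave side: SLAVE_MATCH (re)starts the slave, and an
 * active slave claims SLAVE_MATCH and RX_DONE. */
static uint32_t slave_claim(uint32_t irq_status, bool slave_was_active)
{
	bool active = slave_was_active || (irq_status & INTR_SLAVE_MATCH) != 0;

	if (!active)
		return 0;
	return irq_status & (INTR_SLAVE_MATCH | INTR_RX_DONE);
}

/* Everything the slave did not claim is left for the master handler. */
static uint32_t master_share(uint32_t irq_status, uint32_t slave_ack)
{
	return irq_status & ~slave_ack;
}
```

For the combined NORMAL_STOP | SLAVE_MATCH | RX_DONE event, the slave takes SLAVE_MATCH and RX_DONE and the master still sees the NORMAL_STOP it is waiting for.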

Signed-off-by: Jae Hyun Yoo 
---
 drivers/i2c/busses/i2c-aspeed.c | 137 ++--
 1 file changed, 76 insertions(+), 61 deletions(-)

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index efb89422d496..24d43f143a55 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -82,6 +82,11 @@
 #define ASPEED_I2CD_INTR_RX_DONE   BIT(2)
 #define ASPEED_I2CD_INTR_TX_NAKBIT(1)
 #define ASPEED_I2CD_INTR_TX_ACKBIT(0)
+#define ASPEED_I2CD_INTR_ERRORS                                    \
+   (ASPEED_I2CD_INTR_SDA_DL_TIMEOUT | \
+ASPEED_I2CD_INTR_SCL_TIMEOUT |\
+ASPEED_I2CD_INTR_ABNORMAL |   \
+ASPEED_I2CD_INTR_ARBIT_LOSS)
 #define ASPEED_I2CD_INTR_ALL  \
(ASPEED_I2CD_INTR_SDA_DL_TIMEOUT | \
 ASPEED_I2CD_INTR_BUS_RECOVER_DONE |   \
@@ -150,6 +155,7 @@ struct aspeed_i2c_bus {
int cmd_err;
/* Protected only by i2c_lock_bus */
int master_xfer_result;
+   u32 irq_status;
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
struct i2c_client   *slave;
enum aspeed_i2c_slave_state slave_state;
@@ -229,36 +235,30 @@ static int aspeed_i2c_recover_bus(struct aspeed_i2c_bus *bus)
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
 static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
 {
-   u32 command, irq_status, status_ack = 0;
+   u32 command, status_ack = 0;
struct i2c_client *slave = bus->slave;
-   bool irq_handled = true;
u8 value;
 
-   if (!slave) {
-   irq_handled = false;
-   goto out;
-   }
+   if (!slave)
+   return false;
 
command = readl(bus->base + ASPEED_I2C_CMD_REG);
-   irq_status = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
 
/* Slave was requested, restart state machine. */
-   if (irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
+   if (bus->irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
status_ack |= ASPEED_I2CD_INTR_SLAVE_MATCH;
bus->slave_state = ASPEED_I2C_SLAVE_START;
}
 
/* Slave is not currently active, irq was for someone else. */
-   if (bus->slave_state == ASPEED_I2C_SLAVE_STOP) {
-   irq_handled = false;
-   goto out;
-   }
+   if (bus->slave_state == ASPEED_I2C_SLAVE_STOP)
+   return false;
 
dev_dbg(bus->dev, "slave irq status 0x%08x, cmd 0x%08x\n",
-   irq_status, command);
+   bus->irq_status, command);
 
/* Slave was sent something. */
-   if (irq_status & ASPEED_I2CD_INTR_RX_DONE) {
+   if (bus->irq_status & ASPEED_I2CD_INTR_RX_DONE) {
value = readl(bus->base + ASPEED_I2C_BYTE_BUF_REG) >> 8;
/* Handle address frame. */
if (bus->slave_state == ASPEED_I2C_SLAVE_START) {
@@ -273,28 +273,29 @@ static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
}
 
/* Slave was asked to stop. */
-   if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
+   if (bus->irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
status_ack |= ASPEED_I2CD_INTR_NORMAL_STOP;
bus->slave_state = ASPEED_I2C_SLAVE_STOP;
}
-   if (irq_status & ASPEED_I2CD_INTR_TX_NAK) {
+   if (bus->irq_status & ASPEED_I2CD_INTR_TX_NAK) {
status_ack |= ASPEED_I2CD_INTR_TX_NAK;
bus->slave_state = ASPEED_I2C_SLAVE_STOP;
}
+   if (bus->irq_status & ASPEED_I2CD_INTR_TX_ACK) {
+   status_ack |= ASPEED_I2CD_INTR_TX_ACK;
+   }
 
switch (bus->slave_state) {
case ASPEED_I2C_SLAVE_READ_REQUESTED:
-   if (irq_status & ASPEED_I2CD_INTR_TX_ACK)
+   if (bus->irq_status &

Re: [PATCH v10 4/7] Bluetooth: hci_qca: Add wrapper functions for setting UART speed

2018-07-23 Thread Matthias Kaehlcke

On Fri, Jul 20, 2018 at 07:02:40PM +0530, Balakrishna Godavarthi wrote:
> In function qca_setup, we set the initial and operating speeds for Qualcomm
> Bluetooth SoCs. This block of code is common across different
> Qualcomm Bluetooth SoCs. Instead of duplicating the code, create
> a wrapper function to set the speeds, so that future SoCs
> can use these wrapper functions to set their speeds.
> 
> Signed-off-by: Balakrishna Godavarthi 
> Reviewed-by: Matthias Kaehlcke 
> ---
>  drivers/bluetooth/hci_qca.c | 93 -
>  1 file changed, 70 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 5f1c0a8fd5cd..5f8a74d65bec 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -119,6 +119,11 @@ struct qca_data {
>   u64 votes_off;
>  };
>  
> +enum qca_speed_type {
> + QCA_INIT_SPEED = 1,
> + QCA_OPER_SPEED
> +};
> +
>  struct qca_serdev {
>   struct hci_uart  serdev_hu;
>   struct gpio_desc *bt_en;
> @@ -923,6 +928,61 @@ static inline void host_set_baudrate(struct hci_uart *hu, unsigned int speed)
>   hci_uart_set_baudrate(hu, speed);
>  }
>  
> +static unsigned int qca_get_speed(struct hci_uart *hu,
> +   enum qca_speed_type speed_type)
> +{
> + unsigned int speed = 0;
> +
> + if (speed_type == QCA_INIT_SPEED) {
> + if (hu->init_speed)
> + speed = hu->init_speed;
> + else if (hu->proto->init_speed)
> + speed = hu->proto->init_speed;
> + } else {
> + if (hu->oper_speed)
> + speed = hu->oper_speed;
> + else if (hu->proto->oper_speed)
> + speed = hu->proto->oper_speed;
> + }
> +
> + return speed;
> +}
> +
> +static int qca_check_speeds(struct hci_uart *hu)
> +{
> + if (!qca_get_speed(hu, QCA_INIT_SPEED) ||
> + !qca_get_speed(hu, QCA_OPER_SPEED))
> + return -EINVAL;

You changed this from:

/* One or the other speeds should be non zero. */
if (!qca_get_speed(hu, QCA_INIT_SPEED) &&
!qca_get_speed(hu, QCA_OPER_SPEED))
return -EINVAL;

There is no entry in the change log. What is the reason for this
change?
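The behavioral difference being asked about shows up with a small model of the fallback in qca_get_speed() (device value first, then the protocol default). The struct and helper names are hypothetical stand-ins, not the real hci_uart fields.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for hu->{init,oper}_speed and hu->proto->{init,oper}_speed. */
struct speeds_model {
	unsigned int init_speed, oper_speed;
	unsigned int proto_init_speed, proto_oper_speed;
};

/* Same fallback as qca_get_speed(): device value if set, else protocol default. */
static unsigned int pick_speed(unsigned int dev, unsigned int proto)
{
	return dev ? dev : proto;
}

/* Earlier rule: "One or the other speeds should be non zero." */
static bool speeds_ok_either(const struct speeds_model *s)
{
	return pick_speed(s->init_speed, s->proto_init_speed) != 0 ||
	       pick_speed(s->oper_speed, s->proto_oper_speed) != 0;
}

/* Rule in this version: both speeds must be non-zero. */
static bool speeds_ok_both(const struct speeds_model *s)
{
	return pick_speed(s->init_speed, s->proto_init_speed) != 0 &&
	       pick_speed(s->oper_speed, s->proto_oper_speed) != 0;
}
```

A setup with only an initial speed (for example 115200) passed the earlier check but fails the new one, so the change rejects configurations that used to be accepted.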

HID: intel_ish-hid: tx_buf memory leak on probe/remove

2018-07-23 Thread Anton Vasilyev


ish_dev_init() allocates 512*176 bytes of memory for tx_buf and stores it in
the >wr_free_list_head.link list during ish_probe().
But this memory is never freed, neither in ish_remove() nor in the
ish_probe() error path.
So the current intel-ish-ipc driver leaks 88 KB of memory on each probe/release.

I have two ideas: 1) replace the kzalloc allocation with devm_kzalloc,
or 2) release the memory stored in the >wr_free_list_head.link list (and
possibly in >wr_processing_list_head.link) on all driver exit paths.

But I do not know which way is preferable for this case.
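Option 1 would tie the buffers' lifetime to the device so they are released automatically when the driver detaches. Option 2 can be sketched as a userspace model: every exit path, including a failed allocation midway through probe, walks the list and frees each node. All names and the list layout here are hypothetical, not the driver's structures.

```c
#include <stdlib.h>

#define TX_BUF_COUNT 512 /* from the report: 512 buffers */
#define TX_BUF_SIZE  176 /* of 176 bytes each */

struct tx_buf_model {
	struct tx_buf_model *next;
	unsigned char data[TX_BUF_SIZE];
};

/* Remove path (and error path): free every node, return how many were freed. */
static size_t tx_list_free(struct tx_buf_model *head)
{
	size_t n = 0;

	while (head) {
		struct tx_buf_model *next = head->next;

		free(head);
		head = next;
		n++;
	}
	return n;
}

/* Probe path: build the list; on failure, release the partial list too. */
static struct tx_buf_model *tx_list_alloc(size_t count)
{
	struct tx_buf_model *head = NULL;

	for (size_t i = 0; i < count; i++) {
		struct tx_buf_model *b = calloc(1, sizeof(*b));

		if (!b) {
			tx_list_free(head);
			return NULL;
		}
		b->next = head;
		head = b;
	}
	return head;
}
```

The payload alone is 512 * 176 = 90112 bytes, the 88 KB per probe/release cycle mentioned above.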

Found by Linux Driver Verification project (linuxtesting.org).

--
Anton Vasilyev
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
e-mail: vasil...@ispras.ru

Re: [PATCH i2c-next] i2c: aspeed: Handle master/slave combined irq events properly

2018-07-23 Thread James Feist


On 07/23/2018 10:48 AM, Jae Hyun Yoo wrote:

In most cases, interrupt bits are set one by one, but there are
also a lot of other cases in which the Aspeed I2C IP sends multiple
interrupt bits, combining master and slave events, in a
single interrupt call. It happens much in multi-master environment


much more


than single-master. For example, when the master is waiting for a
NORMAL_STOP interrupt in its MASTER_STOP state, SLAVE_MATCH and
RX_DONE interrupts could come along with the NORMAL_STOP in case
another master immediately sends data just after acquiring the
bus. In this case, the NORMAL_STOP interrupt should be handled
by master_irq and the SLAVE_MATCH and RX_DONE interrupts should be
handled by slave_irq. This commit modifies the irq handling logic to
handle the master/slave combined events properly.

Signed-off-by: Jae Hyun Yoo 
---
  drivers/i2c/busses/i2c-aspeed.c | 137 ++--
  1 file changed, 76 insertions(+), 61 deletions(-)

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index efb89422d496..24d43f143a55 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -82,6 +82,11 @@
  #define ASPEED_I2CD_INTR_RX_DONE  BIT(2)
  #define ASPEED_I2CD_INTR_TX_NAK   BIT(1)
  #define ASPEED_I2CD_INTR_TX_ACK   BIT(0)
+#define ASPEED_I2CD_INTR_ERRORS \
+		(ASPEED_I2CD_INTR_SDA_DL_TIMEOUT | \
+		 ASPEED_I2CD_INTR_SCL_TIMEOUT | \
+		 ASPEED_I2CD_INTR_ABNORMAL | \
+		 ASPEED_I2CD_INTR_ARBIT_LOSS)
  #define ASPEED_I2CD_INTR_ALL \
(ASPEED_I2CD_INTR_SDA_DL_TIMEOUT | \
 ASPEED_I2CD_INTR_BUS_RECOVER_DONE |   \
@@ -150,6 +155,7 @@ struct aspeed_i2c_bus {
int cmd_err;
/* Protected only by i2c_lock_bus */
int master_xfer_result;
+   u32 irq_status;
  #if IS_ENABLED(CONFIG_I2C_SLAVE)
struct i2c_client   *slave;
enum aspeed_i2c_slave_state slave_state;
@@ -229,36 +235,30 @@ static int aspeed_i2c_recover_bus(struct aspeed_i2c_bus *bus)
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
 static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
 {
-	u32 command, irq_status, status_ack = 0;
+	u32 command, status_ack = 0;
 	struct i2c_client *slave = bus->slave;
-	bool irq_handled = true;
 	u8 value;
 
-	if (!slave) {
-		irq_handled = false;
-		goto out;
-	}
+	if (!slave)
+		return false;
 
 	command = readl(bus->base + ASPEED_I2C_CMD_REG);
-	irq_status = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
 
 	/* Slave was requested, restart state machine. */
-	if (irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
+	if (bus->irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
 		status_ack |= ASPEED_I2CD_INTR_SLAVE_MATCH;
 		bus->slave_state = ASPEED_I2C_SLAVE_START;
 	}
 
 	/* Slave is not currently active, irq was for someone else. */
-	if (bus->slave_state == ASPEED_I2C_SLAVE_STOP) {
-		irq_handled = false;
-		goto out;
-	}
+	if (bus->slave_state == ASPEED_I2C_SLAVE_STOP)
+		return false;
 
 	dev_dbg(bus->dev, "slave irq status 0x%08x, cmd 0x%08x\n",
-		irq_status, command);
+		bus->irq_status, command);
 
 	/* Slave was sent something. */
-	if (irq_status & ASPEED_I2CD_INTR_RX_DONE) {
+	if (bus->irq_status & ASPEED_I2CD_INTR_RX_DONE) {
 		value = readl(bus->base + ASPEED_I2C_BYTE_BUF_REG) >> 8;
 		/* Handle address frame. */
 		if (bus->slave_state == ASPEED_I2C_SLAVE_START) {
@@ -273,28 +273,29 @@ static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
 	}
 
 	/* Slave was asked to stop. */
-	if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
+	if (bus->irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
 		status_ack |= ASPEED_I2CD_INTR_NORMAL_STOP;
 		bus->slave_state = ASPEED_I2C_SLAVE_STOP;
 	}
-	if (irq_status & ASPEED_I2CD_INTR_TX_NAK) {
+	if (bus->irq_status & ASPEED_I2CD_INTR_TX_NAK) {
 		status_ack |= ASPEED_I2CD_INTR_TX_NAK;
 		bus->slave_state = ASPEED_I2C_SLAVE_STOP;
 	}
+	if (bus->irq_status & ASPEED_I2CD_INTR_TX_ACK) {
+		status_ack |= ASPEED_I2CD_INTR_TX_ACK;
+	}
 
 	switch (bus->slave_state) {
 	case ASPEED_I2C_SLAVE_READ_REQUESTED:
-		if (irq_status & ASPEED_I2CD_INTR_TX_ACK)
+
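Below is a minimal userspace model of the idea in the commit message. The status is latched once into `bus->irq_status`. The master handler consumes the bits it owns, and whatever remains goes to the slave handler. Which handler runs first and which bits each one claims are assumptions. The RX_DONE, TX_NAK and TX_ACK values come from the diff. The NORMAL_STOP and SLAVE_MATCH values are placeholders and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))
/* Values taken from the diff above. */
#define INTR_RX_DONE BIT(2)
#define INTR_TX_NAK  BIT(1)
#define INTR_TX_ACK  BIT(0)
/* Placeholder bit positions, assumed for illustration only. */
#define INTR_NORMAL_STOP BIT(4)
#define INTR_SLAVE_MATCH BIT(7)

struct fake_bus {
	uint32_t irq_status;   /* status latched once per interrupt */
	int master_active;     /* hypothetical: master waiting for NORMAL_STOP */
};

/* Hypothetical master handler: takes NORMAL_STOP while the master is stopping. */
static uint32_t master_irq(struct fake_bus *bus)
{
	uint32_t ack = 0;

	if (bus->master_active && (bus->irq_status & INTR_NORMAL_STOP)) {
		ack |= INTR_NORMAL_STOP;
		bus->master_active = 0;
	}
	return ack;
}

/* Hypothetical slave handler: takes SLAVE_MATCH and RX_DONE. */
static uint32_t slave_irq(struct fake_bus *bus)
{
	return bus->irq_status & (INTR_SLAVE_MATCH | INTR_RX_DONE);
}

/* Latch the status once, let each handler clear the bits it handled,
 * and return whatever nobody claimed. */
static uint32_t bus_irq(struct fake_bus *bus, uint32_t hw_status)
{
	uint32_t handled;

	bus->irq_status = hw_status;
	handled = master_irq(bus);
	bus->irq_status &= ~handled;
	handled = slave_irq(bus);
	bus->irq_status &= ~handled;
	return bus->irq_status;
}
```

In the scenario from the commit message, NORMAL_STOP arrives together with SLAVE_MATCH and RX_DONE. In this model every bit is consumed by exactly one handler.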

Applied "regulator: pfuze100: add support to en-/disable switch regulators" to the regulator tree

2018-07-23 Thread Mark Brown

The patch

   regulator: pfuze100: add support to en-/disable switch regulators

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 9d2fd4f0ddfbc4aa1135000df34caebc02793a26 Mon Sep 17 00:00:00 2001
From: Marco Felsch 
Date: Mon, 23 Jul 2018 09:47:47 +0200
Subject: [PATCH] regulator: pfuze100: add support to en-/disable switch
 regulators

Add enable/disable support for switch regulators on pfuze100.

Based on commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for
switch") which is reverted due to boot regressions by commit 464a5686e6c9
("regulator: Revert "regulator: pfuze100: add enable/disable for switch"").
Disabling the switch regulators will only be done if the user specifies
"fsl,pfuze-support-disable-sw" in its device tree to keep backward
compatibility with current dtb's [1].

[1] https://patchwork.kernel.org/patch/10490381/

Signed-off-by: Marco Felsch 
Signed-off-by: Mark Brown 
---
 drivers/regulator/pfuze100-regulator.c | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/drivers/regulator/pfuze100-regulator.c b/drivers/regulator/pfuze100-regulator.c
index cde6eda1d283..31c3a236120a 100644
--- a/drivers/regulator/pfuze100-regulator.c
+++ b/drivers/regulator/pfuze100-regulator.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 
+#define PFUZE_FLAG_DISABLE_SW  BIT(1)
+
 #define PFUZE_NUMREGS		128
 #define PFUZE100_VOL_OFFSET		0
 #define PFUZE100_STANDBY_OFFSET	1
@@ -50,10 +52,12 @@ struct pfuze_regulator {
struct regulator_desc desc;
unsigned char stby_reg;
unsigned char stby_mask;
+   bool sw_reg;
 };
 
 struct pfuze_chip {
int chip_id;
+   int flags;
struct regmap *regmap;
struct device *dev;
struct pfuze_regulator regulator_descs[PFUZE100_MAX_REGULATOR];
@@ -170,6 +174,17 @@ static const struct regulator_ops pfuze100_sw_regulator_ops = {
.set_ramp_delay = pfuze100_set_ramp_delay,
 };
 
+static const struct regulator_ops pfuze100_sw_disable_regulator_ops = {
+   .enable = regulator_enable_regmap,
+   .disable = regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
+   .list_voltage = regulator_list_voltage_linear,
+   .set_voltage_sel = regulator_set_voltage_sel_regmap,
+   .get_voltage_sel = regulator_get_voltage_sel_regmap,
+   .set_voltage_time_sel = regulator_set_voltage_time_sel,
+   .set_ramp_delay = pfuze100_set_ramp_delay,
+};
+
 static const struct regulator_ops pfuze100_swb_regulator_ops = {
.enable = regulator_enable_regmap,
.disable = regulator_disable_regmap,
@@ -209,9 +224,12 @@ static const struct regulator_ops pfuze100_swb_regulator_ops = {
.uV_step = (step),  \
.vsel_reg = (base) + PFUZE100_VOL_OFFSET,   \
.vsel_mask = 0x3f,  \
+   .enable_reg = (base) + PFUZE100_MODE_OFFSET,\
+   .enable_mask = 0xf, \
},  \
.stby_reg = (base) + PFUZE100_STANDBY_OFFSET,   \
.stby_mask = 0x3f,  \
+   .sw_reg = true, \
}
 
 #define PFUZE100_SWB_REG(_chip, _name, base, mask, voltages)   \
@@ -471,6 +489,9 @@ static int pfuze_parse_regulators_dt(struct pfuze_chip *chip)
if (!np)
return -EINVAL;
 
+   if (of_property_read_bool(np, "fsl,pfuze-support-disable-sw"))
+   chip->flags |= PFUZE_FLAG_DISABLE_SW;
+
parent = of_get_child_by_name(np, "regulators");
if (!parent) {
dev_err(dev, "regulators node not found\n");
@@ -703,6 +724,21 @@ static int pfuze100_regulator_probe(struct i2c_client *client,
}
}
 
+	/*
+	 * Allow SW regulators to turn off. Checking it through a flag is
+	 * a workaround to keep backward compatibility with existing
+	 * old dtb's which may rely on the fact that we didn't disable
+	 * the switched regulators so far.
+	 */
+   if
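Below is a userspace sketch of the opt-in logic described in the commit message. The `fsl,pfuze-support-disable-sw` property sets `PFUZE_FLAG_DISABLE_SW`, and only then do switch regulators get the ops table with `.enable`/`.disable`. The probe code after the `if` above is truncated in this mail. `struct fake_ops` and `select_ops()` are therefore hypothetical, and they model only the selection, not the real regulator core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1u << (n))
#define PFUZE_FLAG_DISABLE_SW BIT(1)   /* value from the patch */

/* Hypothetical stand-in for struct regulator_ops: records only whether
 * the table can switch the regulator off. */
struct fake_ops {
	const char *name;
	bool has_disable;
};

static const struct fake_ops sw_ops = { "pfuze100_sw_regulator_ops", false };
static const struct fake_ops sw_disable_ops = { "pfuze100_sw_disable_regulator_ops", true };

/* Returns the ops a regulator should use, or NULL to leave it unchanged
 * (non-switch regulators are not affected by the flag). */
static const struct fake_ops *select_ops(unsigned int chip_flags, bool sw_reg)
{
	if (!sw_reg)
		return NULL;
	if (chip_flags & PFUZE_FLAG_DISABLE_SW)
		return &sw_disable_ops;
	return &sw_ops;
}
```

Device trees that lack the property keep the existing ops. That behavior is the backward-compatibility guarantee the commit message describes.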

Re: Does /dev/urandom now block until initialised ?

2018-07-23 Thread Theodore Y. Ts'o

On Mon, Jul 23, 2018 at 12:11:12PM -0400, Jeffrey Walton wrote:
> 
> I believe Stephan Mueller wrote up the weakness a couple of years ago.
> He's the one who explained the interactions to me. Mueller was even
> cited at https://github.com/systemd/systemd/issues/4167.

Stephan had a lot of complaints about the existing random driver.
That's because he has a replacement driver that he has been pushing,
and instead of giving explicit complaints with specific patches to fix
those specific issues, he has a generalized blast of complaints, plus
a "big bang rewrite".

I've reviewed his lrng doc, and this specific issue was not among his
complaints.  Quite a while ago, I had gone through his document, and
had specifically addressed each of his complaints.  As far as I have
been able to determine, all of the specific technical complaints (as
opposed to personal preference issues) have been addressed.

His complaint is a textbook complaint about how *not* to file a bug
report.  That being said, we try to take bug reports from as many
sources as possible even if they aren't well formed or submitted in
the ideal place.

(I'm reminded of Linux's networking scalability limitations which
Microsoft filed in the Wall Street Journal 15+ years ago --- and which
only applied if you had 4 CPU's and four 10 megabit networking cards;
if you had four CPU's and a 100 megabit networking card, Linux would
grind Microsoft into the dust; still it was a bug, and we appreciated
the report and we fixed it, even if it wasn't filed in the ideal
forum.  :-)

> It is too bad Mueller did not receive credit for it in the CVE database.

As near as I can tell, he doesn't deserve it for this particular
issue.  It's all Jann Horn and Google's Project Zero.  (And his
writeup is a textbook example of how to report this sort of issue with
great specificity and analysis.)

- Ted

Re: kernel BUG at mm/shmem.c:LINE!

2018-07-23 Thread Hugh Dickins

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> > 
> > Please try something like
> > mount -o remount,huge=always /dev/shm
> > cp /dev/zero /dev/shm
> > 
> > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > but getting the page for offset 0x3c1 instead.
> 
> Hmm.  I don't see a crash while running that command,

Thanks for looking.

It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
on this occasion? Or you don't think of an oops as a kernel crash,
and didn't notice it in dmesg? I see now that I've arranged for oops
to crash, since I don't like to miss them myself; but it is a very
clean oops, no locks held, so can just kill the process and continue.

I recommend CONFIG_DEBUG_VM=y (for developers, not for distros), but
if you'd prefer to avoid it for now, just edit that VM_BUG_ON_PAGE()
in find_lock_entry() to a BUG_ON().

Or is there something more mysterious stopping it from showing up for
you? It's repeatable for me. When not crashing, that "cp" should fill
up about half of RAM before it hits the implicit tmpfs volume limit;
but I am assuming a not entirely fragmented machine - it does need
to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().

If you still can't see the crash, look to see how long /dev/shm/zero
is after the "cp": mine crashes a page or two over 2MB (I'm being
vague because I'm typing from the laptop I'd prefer not to reproduce
it on at the moment: I think it would be 1 page over, i_size not yet
updated for the page of index 0x201). But the xarray should by that
stage have been populated for two 2MB pages (by your "goto next" loop
in shmem_add_to_page_cache()).
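A little arithmetic confirms how the offsets in this report relate. The sketch below assumes x86-64 defaults of 4 KiB base pages and 2 MiB huge pages, so each huge page spans 512 page-cache indices. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed x86-64 defaults: 4 KiB base pages, 2 MiB huge pages (order 9). */
#define PAGE_SIZE_BYTES 4096UL
#define HPAGE_NR_PAGES  512UL

/* Index of the head page of the huge page that contains idx. */
static unsigned long huge_head_index(unsigned long idx)
{
	return idx & ~(HPAGE_NR_PAGES - 1);
}

/* Which subpage of its huge page idx refers to. */
static unsigned long subpage(unsigned long idx)
{
	return idx & (HPAGE_NR_PAGES - 1);
}
```

Both 0x201 and 0x3c1 fall inside the same 2MB-aligned range (head index 0x200), so the page returned is 0x1c0 subpages away from the one requested. An i_size of 0x201 pages is exactly one page over 2MB, which matches the "1 page over" estimate above.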

> but I do see an RCU
> stall in find_get_entries() called from shmem_undo_range() when running
> 'cp' the second time -- ie while truncating the /dev/shm/zero file.

When I stopped oops crashing, I did indeed hang on that second attempt:
no "RCU stall" seen, but I've probably missed the relevant config option.

I wouldn't like to predict what happens if find_get_entry() returns the
wrong page when that VM_BUG_ON_PAGE() is compiled out, very confusing.
If it's compiled in, but just killed the process and dmesg was missed,
then there's an unlocked page lock which will indeed hang a subsequent
truncate (if the xarray yields the same wrong page again), though I
don't know if that would amount to an RCU stall.

> Maybe I'm seeing the same bug as you, and maybe I'm seeing a different
> one.  Do we have a shmem test suite somewhere?

Not as such. xfstests works on tmpfs, huge or not, but I'd have to write
up a few instructions, note one or two "-g auto" tests to patch out since
they take forever on tmpfs, and the few failures expected; and update my
snapshot of the tree to check that over first (I pulled it last mid-May).

I'd rather not get into that at present: a working "cp" will be a great
step forward, then I can easily run xfstests on the fixed kernel.

> 
> > I've spent a while on it, but better turn over to you, Matthew:
> > my guess is that xas_create_range() does not create the layout
> > you expect from it.
> 
> I've dumped the XArray tree on my machine and it actually looks fine
> *except* that the pages pointed to are free!  That indicates to me I
> screwed up somebody's reference count somewhere.

I don't actually know what a good xarray for two 2MB pages should look
like, since the best I can find seems to be a bad one!

Are you sure that those pages are free, rather than most of them tails
of one of the two compound pages involved? I think it's the same in your
rewrite of struct page: the compound_head field (lru.next), with its low
bit set, is how to recognize a tail page.
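The tail-page test can be shown with a simplified userspace model. The kernel's real helpers are `PageTail()` and `compound_head()`. Here, `struct page_model` is a hypothetical stand-in that keeps only the `compound_head` word, in which a tail page stores its head page's address with bit 0 set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the compound_head word. */
struct page_model {
	uintptr_t compound_head;
};

/* A tail page has bit 0 of compound_head set. */
static bool page_is_tail(const struct page_model *p)
{
	return p->compound_head & 1;
}

/* Strip the tag bit to recover the head page; a non-tail page is its own head. */
static struct page_model *head_of(struct page_model *p)
{
	if (page_is_tail(p))
		return (struct page_model *)(p->compound_head - 1);
	return p;
}

/* Build a model compound page: the head keeps an even value (here 0),
 * and every tail points back at the head with bit 0 set. */
static void make_compound(struct page_model *pages, int nr)
{
	pages[0].compound_head = 0;
	for (int i = 1; i < nr; i++)
		pages[i].compound_head = (uintptr_t)&pages[0] | 1;
}
```

When checking whether the dumped xarray points at free pages, a page whose word has bit 0 set is a tail, and its head is that word minus one.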

Hugh

Re: [PATCH] sched/numa: do not balance tasks onto isolated cpus

2018-07-23 Thread kbuild test robot

Hi Chen,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.18-rc6 next-20180723]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Chen-Lin/sched-numa-do-not-balance-tasks-onto-isolated-cpus/20180724-031803
config: i386-randconfig-x008-201829 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/core.c: In function 'migrate_swap':
>> kernel/sched/core.c:1283:46: error: 'cpu_isolated_map' undeclared (first use in this function); did you mean 'cpu_core_map'?
|| cpumask_test_cpu(arg.dst_cpu, cpu_isolated_map))
 ^~~~
 cpu_core_map
   kernel/sched/core.c:1283:46: note: each undeclared identifier is reported only once for each function it appears in

vim +1283 kernel/sched/core.c

  1256  
  1257  /*
  1258   * Cross migrate two tasks
  1259   */
  1260  int migrate_swap(struct task_struct *cur, struct task_struct *p)
  1261  {
  1262  struct migration_swap_arg arg;
  1263  int ret = -EINVAL;
  1264  
  1265  arg = (struct migration_swap_arg){
  1266  .src_task = cur,
  1267  .src_cpu = task_cpu(cur),
  1268  .dst_task = p,
  1269  .dst_cpu = task_cpu(p),
  1270  };
  1271  
  1272  if (arg.src_cpu == arg.dst_cpu)
  1273  goto out;
  1274  
  1275  /*
  1276   * These three tests are all lockless; this is OK since all of them
  1277   * will be re-checked with proper locks held further down the line.
  1278   */
  1279  if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
  1280  goto out;
  1281  
  1282  if ((!cpumask_test_cpu(arg.dst_cpu, _task->cpus_allowed))
> 1283  || cpumask_test_cpu(arg.dst_cpu, cpu_isolated_map))
  1284  goto out;
  1285  
  1286  if ((!cpumask_test_cpu(arg.src_cpu, _task->cpus_allowed))
  1287  || cpumask_test_cpu(arg.src_cpu, cpu_isolated_map))
  1288  goto out;
  1289  
  1290  trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
  1291  ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, );
  1292  
  1293  out:
  1294  return ret;
  1295  }
  1296  
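Below is a userspace model of the check this patch is trying to add, with a 64-bit integer standing in for `struct cpumask`. The build error shows that `cpu_isolated_map` is not declared in this tree, so the sketch takes the isolated set as a plain parameter. How the real patch should obtain that set is left open, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per CPU (CPUs 0..63); a stand-in for struct cpumask. */
typedef uint64_t cpumask_model;

static bool mask_test_cpu(int cpu, cpumask_model m)
{
	return (m >> cpu) & 1;
}

/* The task on src_cpu moves to dst_cpu and vice versa, so each
 * destination must be allowed for the moving task and not isolated. */
static bool swap_allowed(int src_cpu, int dst_cpu,
			 cpumask_model src_task_allowed,
			 cpumask_model dst_task_allowed,
			 cpumask_model isolated)
{
	if (src_cpu == dst_cpu)
		return false;
	if (!mask_test_cpu(dst_cpu, src_task_allowed) ||
	    mask_test_cpu(dst_cpu, isolated))
		return false;
	if (!mask_test_cpu(src_cpu, dst_task_allowed) ||
	    mask_test_cpu(src_cpu, isolated))
		return false;
	return true;
}
```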

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes

On Sat, 21 Jul 2018, Matthew Wilcox wrote:

> > The huge zero page can be reclaimed under memory pressure and, if it is, 
> > it is attempted to be allocated again with gfp flags that attempt memory 
> > compaction that can become expensive.  If we are constantly under memory 
> > pressure, it gets freed and reallocated millions of times always trying to 
> > compact memory both directly and by kicking kcompactd in the background.
> > 
> > It likely should also be per node.
> 
> Have you benchmarked making the non-huge zero page per-node?
> 

Not since we disable it :)  I will, though.  The more concerning issue for 
us, modulo CVE-2017-1000405, is the cpu cost of constantly directly 
compacting memory for allocating the hzp in real time after it has been 
reclaimed.  We've observed this happening tens or hundreds of thousands 
of times on some systems.  It will be 2MB per node on x86 if the data 
suggests we should make it NUMA aware; I don't think the cost is too high 
to leave it persistently available even under memory pressure if 
use_zero_page is enabled.

Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread Tadeusz Struk

On 07/23/2018 01:19 PM, Jarkko Sakkinen wrote:
> In this case I do not have any major evidence of any major benefit *and*
> the change breaks the ABI.

As I said before - this does not break the ABI.
As for the benefits: it helps user space implement the receive
path. An application does not need to provide a 4K buffer for every read,
even if the response is, for instance, 8 bytes long.
Thanks,
-- 
Tadeusz

Re: [PATCH v5] PCI: Check for PCIe downtraining conditions

2018-07-23 Thread Jakub Kicinski

On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.
> 
> Signed-off-by: Alexandru Gagniuc 
> ---
> 
> For the sake of review, I've created a __pcie_print_link_status() which
> takes a 'verbose' argument. If we agree to go this route and update
> the users of pcie_print_link_status(), I can split this up into two patches.
> I prefer just printing this information in the core functions and letting
> drivers not worry about this. Though there seems to be strong support for
> not going that route, so here it goes:

FWIW the networking drivers print PCIe BW because sometimes the network
bandwidth is simply over-provisioned on multi-port cards, e.g. an 80Gbps
card on an x8 link.

Sorry to bikeshed, but currently the networking cards print the info
during probe.  Would it make sense to move your message closer to probe
time, rather than when the device is added?  If the driver structure is
available, we could also consider adding a boolean to struct pci_driver
to indicate whether the driver wants the verbose message.  This way we
avoid duplicated prints.

I have no objection to current patch, it LGTM.  Just a thought.

>  drivers/pci/pci.c   | 22 ++
>  drivers/pci/probe.c | 21 +
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>   enum pcie_link_width width, width_cap;
>   enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>   bw_cap = pcie_bandwidth_capable(dev, _cap, _cap);
>   bw_avail = pcie_bandwidth_available(dev, _dev, , );
>  
> - if (bw_avail >= bw_cap)
> + if (bw_avail >= bw_cap && verbose)
>   pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>bw_cap / 1000, bw_cap % 1000,
>PCIE_SPEED2STR(speed_cap), width_cap);
> - else
> + else if (bw_avail < bw_cap)
>   pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>bw_avail / 1000, bw_avail % 1000,
>PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>bw_cap / 1000, bw_cap % 1000,
>PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> + __pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac876e32de4b..1f7336377c3b 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>   return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> + if (!pci_is_pcie(dev))
> + return;
> +
> + /* Look from the device up to avoid downstream ports with no devices. */
> + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> + (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +
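Below is a userspace sketch of the message decision in `__pcie_print_link_status()` and of how a bandwidth in Mb/s becomes the `%u.%03u Gb/s` text. The per-lane figure (transfer rate times encoding efficiency, 8b/10b below 8 GT/s and 128b/130b at 8 GT/s, with integer division) is an assumption about how the kernel computes it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Link bandwidth in Mb/s: width * (MT/s * encoding numerator / denominator). */
static unsigned int link_mbps(unsigned int mts, unsigned int enc_num,
			      unsigned int enc_den, unsigned int width)
{
	return width * (mts * enc_num / enc_den);
}

/* Mirrors the patch's branches: print the capable bandwidth only when
 * verbose, always print when the link is limiting. Returns the number of
 * characters written, or 0 when nothing would be printed. */
static int format_link_status(char *buf, size_t len, unsigned int bw_avail,
			      unsigned int bw_cap, bool verbose)
{
	if (bw_avail >= bw_cap && verbose)
		return snprintf(buf, len, "%u.%03u Gb/s available PCIe bandwidth",
				bw_cap / 1000, bw_cap % 1000);
	if (bw_avail < bw_cap)
		return snprintf(buf, len, "%u.%03u Gb/s available PCIe bandwidth, limited",
				bw_avail / 1000, bw_avail % 1000);
	return 0;
}
```

Under these assumptions, an 8 GT/s x8 link gives 63.008 Gb/s. That is below the 80Gbps of the multi-port card in the example above, which is why such drivers want the message even when the link is not downtrained.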

[RFC PATCH] checkpatch: Discourage use with --f/--file outside of drivers/staging ?

2018-07-23 Thread Joe Perches

Perhaps some patch like this could help reduce the
number of ill-considered checkpatch submissions
for files outside of drivers/staging/

Concept and message wordsmithing appreciated...
---
 scripts/checkpatch.pl | 4 
 1 file changed, 4 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 34e4683de7a3..1a93421d5b1d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2483,6 +2483,10 @@ sub process {
WARN("OBSOLETE",
 "$realfile is marked as 'obsolete' in the MAINTAINERS hierarchy.  No unnecessary modifications please.\n");
}
+   if ($file && $filename !~ m@^drivers/staging/@) {
+   WARN("CHECKPATCH_FILE",
+"Using -f/--file with '$realfile' may not be appropriate.\n");
+   }
if ($realfile =~ m@^(?:drivers/net/|net/|drivers/staging/)@) {
$check = 1;
} else {

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes

On Mon, 23 Jul 2018, David Rientjes wrote:

> > > The huge zero page can be reclaimed under memory pressure and, if it is, 
> > > it is attempted to be allocated again with gfp flags that attempt memory 
> > > compaction that can become expensive.  If we are constantly under memory 
> > > pressure, it gets freed and reallocated millions of times always trying to
> > > compact memory both directly and by kicking kcompactd in the background.
> > > 
> > > It likely should also be per node.
> > 
> > Have you benchmarked making the non-huge zero page per-node?
> > 
> 
> Not since we disable it :)  I will, though.  The more concerning issue for 
> us, modulo CVE-2017-1000405, is the cpu cost of constantly directly 
> compacting memory for allocating the hzp in real time after it has been 
> reclaimed.  We've observed this happening tens or hundreds of thousands 
> of times on some systems.  It will be 2MB per node on x86 if the data 
> suggests we should make it NUMA aware; I don't think the cost is too high 
> to leave it persistently available even under memory pressure if 
> use_zero_page is enabled.
> 

Measuring access latency to 4GB of memory on Naples I observe ~6.7% 
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes 
where val != !!val, so perhaps it would be best to overload it with 
additional options?  I can imagine 0x2 defining persistent allocation so 
that the hzp is not freed when the refcount goes to 0 and 0x4 defining if 
the hzp should be per node.  Implementing persistent allocation fixes our 
concern with it, so I'd like to start there.  Comments?
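Below is a sketch of how the overloaded knob might validate writes, following the bit meanings proposed above (0x2 persistent, 0x4 per node). The macro and function names are hypothetical. Whether 0x2 or 0x4 should be accepted without 0x1 is left open as a design question.

```c
#include <assert.h>
#include <errno.h>

/* Bit meanings follow the proposal; the names are hypothetical. */
#define ZERO_PAGE_ENABLED    0x1
#define ZERO_PAGE_PERSISTENT 0x2
#define ZERO_PAGE_PER_NODE   0x4
#define ZERO_PAGE_ALL (ZERO_PAGE_ENABLED | ZERO_PAGE_PERSISTENT | ZERO_PAGE_PER_NODE)

/* Current behaviour as described: only 0 or 1 is accepted. */
static int store_bool_flag(unsigned long val)
{
	if (val != !!val)
		return -EINVAL;
	return 0;
}

/* Proposed: accept any combination of the defined bits, reject the rest. */
static int store_zero_page_flags(unsigned long val)
{
	if (val & ~(unsigned long)ZERO_PAGE_ALL)
		return -EINVAL;
	return 0;
}
```

Existing users who write 0 or 1 see no change. Only the new bits widen what is accepted.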

[PATCH] android: binder: Include asm/cacheflush.h after linux/ include files

2018-07-23 Thread Guenter Roeck

If asm/cacheflush.h is included first, the following build warnings are
seen with sparc32 builds.

In file included from arch/sparc/include/asm/cacheflush.h:11:0,
from drivers/android/binder.c:54:
arch/sparc/include/asm/cacheflush_32.h:40:37: warning:
'struct page' declared inside parameter list will not be visible
outside of this definition or declaration

Moving the asm/ include after linux/ includes solves the problem.

Suggested-by: Linus Torvalds 
Signed-off-by: Guenter Roeck 
---
 drivers/android/binder.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 95283f3bb51c..1cc2fa16af8b 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -51,7 +51,6 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -73,6 +72,9 @@
 #include 
 
 #include 
+
+#include 
+
 #include "binder_alloc.h"
 #include "binder_trace.h"
 
-- 
2.7.4
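A minimal C illustration of the warning follows, with hypothetical names. When a `struct page` first appears inside a prototype's parameter list, it has prototype scope only, so it is a different type from any `struct page` declared later. The patch fixes this by including the linux/ headers first. The forward declaration below has the same effect of putting `struct page` in scope before the prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Without this line, the prototype below would declare a new struct page
 * visible only inside its own parameter list, which is the sparc32 warning. */
struct page;

/* Hypothetical asm/cacheflush.h-style prototype. */
static int flush_page(struct page *page);

/* Full definition, as a linux/ header would provide (fields made up). */
struct page {
	unsigned long flags;
};

static int flush_page(struct page *page)
{
	return page != NULL;
}
```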

linux-next-20180723: battery status funny after bootup

2018-07-23 Thread Pavel Machek


pavel@amd:~$ cat /proc/acpi/battery/BAT0/state
present: yes
capacity state:  ok
charging state:  charged
present rate:0 mW
remaining capacity:  0 mWh
present voltage: 0 mV
pavel@amd:~$ uname -a
Linux amd 4.18.0-rc6-next-20180723+ #141 SMP Mon Jul 23 22:11:47 CEST 2018 i686 GNU/Linux

It will correct itself if I unplug/replug the AC adapter, I
believe. Gnome2 battery monitor also looks confused.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread Jason Gunthorpe

On Thu, Jul 19, 2018 at 08:52:32AM -0700, Tadeusz Struk wrote:
> Currently, to read a response from the TPM device, an application needs to
> provide a "big enough" buffer for the whole response and read it in one go.
> The application doesn't know how big the response is beforehand, so it
> always needs to maintain a 4K buffer and read the max (4K).
> If the user of the TSS library doesn't provide a big enough buffer,
> the TCTI spec says that the library should set the required size and return
> the TSS2_TCTI_RC_INSUFFICIENT_BUFFER error code so that the application can
> allocate a bigger buffer and call receive again.
> To make this possible, the TSS library needs to be able to do
> partial reads from the driver.
> The library would read the header first to get the actual size of the
> response, and then read the rest of the response.
> This patch adds support for partial reads.

You should solve this in user space. The kernel API requires a full-sized
buffer here; the TSS library should always provide such an internal
buffer and then implement whatever scheme TSS wants by memcpying from
that buffer.

Jason
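Below is a userspace sketch of the approach suggested above. It reads the whole response into one full-size internal buffer, then serves the caller from that buffer and reports the required size when the caller's buffer is too small. The TPM 2.0 response header is a 2-byte tag, a 4-byte big-endian responseSize and a 4-byte response code. `struct tcti_buf`, `tcti_receive()` and the -1/-2 return codes are hypothetical. The -1 code stands in for `TSS2_TCTI_RC_INSUFFICIENT_BUFFER`, and -2 stands in for a malformed-response error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TPM_BUF_SIZE   4096  /* full-size internal buffer */
#define TPM_HEADER_LEN 10    /* tag(2) + responseSize(4) + responseCode(4) */

/* responseSize is a big-endian 32-bit field at offset 2 of the header. */
static uint32_t tpm_response_size(const uint8_t *hdr)
{
	return ((uint32_t)hdr[2] << 24) | ((uint32_t)hdr[3] << 16) |
	       ((uint32_t)hdr[4] << 8) | (uint32_t)hdr[5];
}

/* Hypothetical library state: one complete response already obtained
 * from the device with a single full-size read(). */
struct tcti_buf {
	uint8_t buf[TPM_BUF_SIZE];
	size_t len;
};

/* Copy the cached response out. If *size is too small, store the
 * required size in *size and return -1; on success return 0. */
static int tcti_receive(const struct tcti_buf *t, uint8_t *out, size_t *size)
{
	size_t need;

	if (t->len < TPM_HEADER_LEN)
		return -2;
	need = tpm_response_size(t->buf);
	if (need < TPM_HEADER_LEN || need > t->len)
		return -2;
	if (*size < need) {
		*size = need;
		return -1;
	}
	memcpy(out, t->buf, need);
	*size = need;
	return 0;
}
```

The retry-with-a-bigger-buffer flow from the TCTI spec then happens entirely in the library, and the kernel never sees a partial read.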

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread Yang Shi





On 7/23/18 2:33 PM, David Rientjes wrote:

On Mon, 23 Jul 2018, David Rientjes wrote:


The huge zero page can be reclaimed under memory pressure and, if it is,
it is attempted to be allocated again with gfp flags that attempt memory
compaction that can become expensive.  If we are constantly under memory
pressure, it gets freed and reallocated millions of times always trying to
compact memory both directly and by kicking kcompactd in the background.

It likely should also be per node.

Have you benchmarked making the non-huge zero page per-node?


Not since we disable it :)  I will, though.  The more concerning issue for
us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
compacting memory for allocating the hzp in real time after it has been
reclaimed.  We've observed this happening tens or hundreds of thousands
of times on some systems.  It will be 2MB per node on x86 if the data
suggests we should make it NUMA aware; I don't think the cost is too high
to leave it persistently available even under memory pressure if
use_zero_page is enabled.


Measuring access latency to 4GB of memory on Naples I observe ~6.7%
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes
where val != !!val, so perhaps it would be best to overload it with
additional options?  I can imagine 0x2 defining persistent allocation so
that the hzp is not freed when the refcount goes to 0 and 0x4 defining if
the hzp should be per node.  Implementing persistent allocation fixes our
concern with it, so I'd like to start there.  Comments?


Sounds worth trying to me :-)  It might be worth making it persistent by
default. Keeping 2MB of memory unreclaimable does not sound harmful for
use cases that prefer THP.

[PATCH v7 07/12] dt-bindings: mfd: Add a document for PECI client MFD

2018-07-23 Thread Jae Hyun Yoo

This commit adds a dt-bindings document for PECI client MFD.

Signed-off-by: Jae Hyun Yoo 
Cc: Lee Jones 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: Andrew Jeffery 
Cc: James Feist 
Cc: Jason M Biils 
Cc: Joel Stanley 
Cc: Vernon Mauery 
---
 .../bindings/mfd/intel-peci-client.txt| 34 +++
 1 file changed, 34 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/intel-peci-client.txt

diff --git a/Documentation/devicetree/bindings/mfd/intel-peci-client.txt 
b/Documentation/devicetree/bindings/mfd/intel-peci-client.txt
new file mode 100644
index ..cb341e363add
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/intel-peci-client.txt
@@ -0,0 +1,34 @@
+* Intel PECI client bindings
+
+PECI (Platform Environment Control Interface) is a one-wire bus interface that
+provides a communication channel from PECI clients in Intel processors and
+chipset components to external monitoring or control devices. PECI is designed
+to support the following sideband functions:
+
+- Processor and DRAM thermal management
+- Platform Manageability
+- Processor Interface Tuning and Diagnostics
+- Failure Analysis
+
+Required properties:
+- compatible : Should be "intel,peci-client".
+- reg: Should contain address of a client CPU. Address range of CPU
+  clients starts from 0x30 based on PECI specification.
+
+Example:
+    peci-bus@0 {
+        compatible = "vendor,soc-peci";
+        reg = <0x0 0x1000>;
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        peci-client@30 {
+            compatible = "intel,peci-client";
+            reg = <0x30>;
+        };
+
+        peci-client@31 {
+            compatible = "intel,peci-client";
+            reg = <0x31>;
+        };
+    };
-- 
2.18.0
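In the example above, `peci-client@30` and `peci-client@31` follow the binding's rule that CPU client addresses start at 0x30. A driver might map `reg` to a CPU index as sketched below. The helper name and the limit of 8 clients are assumptions for illustration, not values taken from this binding.

```c
#include <assert.h>
#include <errno.h>

#define PECI_CLIENT_BASE_ADDR 0x30u /* from the binding text */
#define PECI_MAX_CLIENTS      8u    /* assumption, for illustration only */

/* Hypothetical helper: CPU index for a client "reg", or -EINVAL. */
static int peci_client_cpu_index(unsigned int reg)
{
	if (reg < PECI_CLIENT_BASE_ADDR ||
	    reg >= PECI_CLIENT_BASE_ADDR + PECI_MAX_CLIENTS)
		return -EINVAL;
	return (int)(reg - PECI_CLIENT_BASE_ADDR);
}
```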

Re: [PATCH] drivers/pci/probe: Move variable bridge inside ifdef

2018-07-23 Thread Bjorn Helgaas

On Sat, Jul 21, 2018 at 11:45:56PM +0200, Anders Roxell wrote:
> When CONFIG_PCI_QUIRKS isn't enabled we get the warning below:
> drivers/pci/probe.c: In function ‘pci_bus_read_dev_vendor_id’:
> drivers/pci/probe.c:2221:18: warning: unused variable ‘bridge’ 
> [-Wunused-variable]
>   struct pci_dev *bridge = bus->self;
>   ^~
> 
> Move the declaration of variable bridge to inside the ifdef
> CONFIG_PCI_QUIRKS.
> 
> Fixes: ac5ea104a279 ("PCI: Workaround IDT switch ACS Source Validation 
> erratum")
> Signed-off-by: Anders Roxell 

I folded this into the original commit on pci/enumeration, thanks!

> ---
>  drivers/pci/probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1c581346c5b9..7a5323798312 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2218,9 +2218,9 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus 
> *bus, int devfn, u32 *l,
>  bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
>   int timeout)
>  {
> +#ifdef CONFIG_PCI_QUIRKS
>   struct pci_dev *bridge = bus->self;
>  
> -#ifdef CONFIG_PCI_QUIRKS
>   /*
>* Certain IDT switches have an issue where they improperly trigger
>* ACS Source Validation errors on completions for config reads.
> -- 
> 2.18.0
>
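The fix relies on a common pattern: a local that is only used under a config option is declared inside the same `#ifdef`, so it does not exist when the option is off and `-Wunused-variable` has nothing to report. The standalone sketch below illustrates that pattern. `CONFIG_DEMO_QUIRKS` and the `demo_*` types are hypothetical stand-ins, not the PCI core's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_PCI_QUIRKS; delete to model it off. */
#define CONFIG_DEMO_QUIRKS 1

struct demo_dev {
	bool needs_quirk;        /* hypothetical: bridge with the erratum */
};

struct demo_bus {
	struct demo_dev *self;   /* upstream bridge, may be NULL */
	unsigned int vendor_id;  /* value a config read would return */
};

static bool demo_read_vendor_id(struct demo_bus *bus, unsigned int *id)
{
#ifdef CONFIG_DEMO_QUIRKS
	/*
	 * Declared inside the #ifdef, as in the patch: with the option
	 * disabled the variable does not exist, so no unused warning.
	 */
	struct demo_dev *bridge = bus->self;

	if (bridge && bridge->needs_quirk) {
		*id = 0xffffffffu;  /* hypothetical quirk path */
		return false;
	}
#endif
	*id = bus->vendor_id;
	return true;
}
```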

Re: [PATCH V4 2/7] mmc: sdhci: Change SDMA address register for v4 mode

2018-07-23 Thread kbuild test robot

Hi Chunyan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf.hansson-mmc/next]
[also build test WARNING on v4.18-rc6 next-20180723]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Chunyan-Zhang/mmc-add-support-for-sdhci-4-0/20180724-045328
base:   git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next
config: arm-exynos_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:14:0,
from include/linux/delay.h:22,
from drivers/mmc/host/sdhci.c:16:
   drivers/mmc/host/sdhci.c: In function 'sdhci_data_irq':
>> drivers/mmc/host/sdhci.c:43:11: warning: format '%p' expects argument of 
>> type 'void *', but argument 4 has type 'dma_addr_t {aka unsigned int}' 
>> [-Wformat=]
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
  ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
#define pr_fmt(fmt) fmt
^~~
   include/linux/printk.h:336:2: note: in expansion of macro 'dynamic_pr_debug'
 dynamic_pr_debug(fmt, ##__VA_ARGS__)
 ^~~~
   drivers/mmc/host/sdhci.c:43:2: note: in expansion of macro 'pr_debug'
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
 ^~~~
   drivers/mmc/host/sdhci.c:2849:4: note: in expansion of macro 'DBG'
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
   ^~~
   drivers/mmc/host/sdhci.c:2849:19: note: format string is defined here
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
 ~^
 %d
   In file included from include/linux/kernel.h:14:0,
from include/linux/delay.h:22,
from drivers/mmc/host/sdhci.c:16:
   drivers/mmc/host/sdhci.c:43:11: warning: format '%p' expects argument of 
type 'void *', but argument 6 has type 'dma_addr_t {aka unsigned int}' 
[-Wformat=]
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
  ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
#define pr_fmt(fmt) fmt
^~~
   include/linux/printk.h:336:2: note: in expansion of macro 'dynamic_pr_debug'
 dynamic_pr_debug(fmt, ##__VA_ARGS__)
 ^~~~
   drivers/mmc/host/sdhci.c:43:2: note: in expansion of macro 'pr_debug'
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
 ^~~~
   drivers/mmc/host/sdhci.c:2849:4: note: in expansion of macro 'DBG'
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
   ^~~
   drivers/mmc/host/sdhci.c:2849:56: note: format string is defined here
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
  ~^
  %d
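The kernel's `%pad` specifier takes a *pointer* to a `dma_addr_t`. The warning shows that the values themselves were passed, so the likely fix is to pass their addresses (`&...`) to `DBG()`. The userspace sketch below only models what `%pad` prints: "0x" followed by hex zero-padded to the width of the type. The `demo_format_pad()` name is hypothetical, and the 32-bit `dma_addr_t` matches the arm build in this report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit dma_addr_t, as in the arm-exynos_defconfig build above. */
typedef uint32_t dma_addr_t;

/*
 * Userspace model of %pad: the argument is a pointer to a dma_addr_t,
 * printed as 0x plus zero-padded hex sized to the type.
 */
static int demo_format_pad(char *buf, size_t len, const dma_addr_t *addr)
{
	return snprintf(buf, len, "0x%0*llx",
			(int)(2 * sizeof(*addr)),
			(unsigned long long)*addr);
}
```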

vim +43 drivers/mmc/host/sdhci.c

d129bceb1 drivers/mmc/sdhci.c  Pierre Ossman 2006-03-24 @16  
#include 
5a436cc0a drivers/mmc/host/sdhci.c Adrian Hunter 2017-03-20  17  
#include 
d129bceb1 drivers/mmc/sdhci.c  Pierre Ossman 2006-03-24  18  
#include 
b8c86fc5d drivers/mmc/host/sdhci.c Pierre Ossman 2008-03-18  19  
#include 
88b476797 drivers/mmc/host/sdhci.c Paul Gortmaker2011-07-03  20  
#include 
d129bceb1 drivers/mmc/sdhci.c  Pierre Ossman 2006-03-24  21  
#include 
5a0e3ad6a drivers/mmc/host/sdhci.c Tejun Heo 2010-03-24  22  
#include 
117636092 drivers/mmc/host/sdhci.c Ralf Baechle  2007-10-23  23  
#include 
bd9b90279 drivers/mmc/host/sdhci.c Linus Walleij 2018-01-29  24  
#include 
250dcd114 drivers/mmc/host/sdhci.c Ulf Hansson   2017-11-27  25  
#include 
9bea3c850 drivers/mmc/host/sdhci.c Marek Szyprowski  2010-08-10  26  
#include 
66fd8ad51 drivers/mmc/host/sdhci.c Adrian Hunter 2011-10-03  27  
#include 
92e0c44b9 drivers/mmc/host/sdhci.c Zach Brown2016-11-02  28  
#include 
d129bceb1 drivers/mmc/sdhci.c  Pierre Ossman 2006-03-24  29  
2f730fec8 drivers/mmc/host/sdhci.c Pierre Ossman 2008-03-17  30  
#include 
2f730fec8 drivers/mmc/host/sdhci.c Pierre Ossman 2008-03-17  31  
22113efd0 drivers/mmc/host/sdhci.c Aries Lee 2010-12-15  32  
#include 
d129bceb1 drivers/mmc/sdhci.c  Pierre Ossman

Re: [PATCH v2 09/10] coresight: perf: Remove set_buffer call back

2018-07-23 Thread Suzuki K Poulose


Mathieu,

On 07/23/2018 07:22 PM, Mathieu Poirier wrote:

On Fri, 20 Jul 2018 at 03:04, Suzuki K Poulose  wrote:


Mathieu,

On 19/07/18 21:36, Mathieu Poirier wrote:

On Tue, Jul 17, 2018 at 06:11:40PM +0100, Suzuki K Poulose wrote:

In coresight perf mode, we need to prepare the sink before
starting a session, which is done via set_buffer call back.
We then proceed to enable the tracing. If we fail to start
the session successfully, we leave the sink configuration
unchanged. This was fine for the existing backends as they
don't have any state associated with the buffers. But with
ETR, we need to keep track of the buffer details and need
to be cleaned up if we fail. In order to make the operation
atomic and to avoid yet another call back, we get rid of
the "set_buffer" call back and pass the buffer details
via enable() call back to the sink.


Suzuki,

I'm not sure I understand the problem you're trying to fix there.  From the
implementation of tmc_enable_etr_sink_perf() in the next patch, wouldn't the
same result have been achievable using a callback?


We can definitely achieve the results using "set_buffer". But for ETR,
we track the "perf_buf" in drvdata->perf_data when we do "set_buffer".
But if we fail in enable_path(), we leave drvdata->perf_data behind
and don't clean it up. When another session is about to call set_buffer,
we check that perf_data is empty and WARN otherwise, because we can't
be sure whether it belongs to an abandoned session or to another active
session, which would mean we have messed up somewhere in the driver.
So we would need a clear_buffer call back for when the enable fails, which
is not really worth it. Anyway, there is no point in separating set_buffer
and enabling the sink, as the error handling becomes cumbersome, as explained
above.



I'm fine with this patch and supportive of getting rid of callbacks if we can, I
just need to understand the exact problem you're after.  From looking at your
code (and the current implementation), if we succeed in setting the memory for
the sink but fail in any of the subsequent steps, i.e. enabling the rest of the
components on the path or the source, the sink is left unchanged.


Yes, that's right. And we should WARN (which I missed in this version) if
there is a perf_data already for a disabled ETR. Please see my response to the
next patch.


The changelog for this patch states the following: "But with ETR, we
need to keep track of the buffer details and need to be cleaned up if
we fail."

I did a deep dive in the code and in the current implementation if the
source fails to be enabled in etm_event_start() the path and the sink
remain unchanged.  With your patchset this gets fixed because a goto
was added to disable the path when such a condition occurs.  As such each
component in the path will see its ->disable() callback invoked.  In
tmc_disable_etr_sink(), drvdata->perf_data is set to NULL in
tmc_etr_disable_hw(), so the cleanup on error condition is done
properly.  As such we wouldn't need a clean_buffer() callback.


All of this is right. But we still have a case. e.g, if the ETR is
enabled in sysfs mode, coresight_enable_path() will fail after we
have set the buffer. And since we don't try to disable the path
when we fail at SINK (which is the right thing to do, as we could
be potentially disabling the ETR operated in sysfs mode), we leave
the perf_data around. And the next session finds a non-empty data.
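The failure mode described above can be modelled in a few lines. In the old flow the buffer is recorded before the path is known to be usable, so a sink busy in sysfs mode leaves stale `perf_data` behind. When the buffer travels with `enable()`, it is recorded only on success. This is a hypothetical userspace model; the `demo_*` names are not the CoreSight API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_mode { DEMO_DISABLED, DEMO_SYSFS, DEMO_PERF };

struct demo_sink {
	enum demo_mode mode;
	void *perf_data;  /* models drvdata->perf_data */
};

/* Old flow, step 1: record the buffer unconditionally. */
static int old_set_buffer(struct demo_sink *s, void *buf)
{
	s->perf_data = buf;
	return 0;
}

/* Old flow, step 2: may fail, and nothing clears perf_data. */
static int old_enable(struct demo_sink *s)
{
	if (s->mode == DEMO_SYSFS)
		return -EBUSY;
	s->mode = DEMO_PERF;
	return 0;
}

/* New flow: the buffer is passed with enable and kept only on success. */
static int new_enable(struct demo_sink *s, void *buf)
{
	if (s->mode == DEMO_SYSFS)
		return -EBUSY;
	s->perf_data = buf;
	s->mode = DEMO_PERF;
	return 0;
}
```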



As I said I'm in favour of removing the set_buffer() callback but I
wouldn't associate it with ETR state cleanup.  If the code can be
rearranged in a way that code can be removed then that alone is enough
to justify the change.




diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c 
b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 3cc4a0b..12a247d 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -269,16 +269,11 @@ static void etm_event_start(struct perf_event *event, int 
flags)
  path = etm_event_cpu_path(event_data, cpu);
  /* We need a sink, no need to continue without one */
  sink = coresight_get_sink(path);
-if (WARN_ON_ONCE(!sink || !sink_ops(sink)->set_buffer))
-goto fail_end_stop;
-
-/* Configure the sink */
-if (sink_ops(sink)->set_buffer(sink, handle,
-   event_data->snk_config))
+if (WARN_ON_ONCE(!sink))
  goto fail_end_stop;

  /* Nothing will happen without a path */
-if (coresight_enable_path(path, CS_MODE_PERF))
+if (coresight_enable_path(path, CS_MODE_PERF, handle))


Here we already have a handle on "event_data".  As such I think this is what we
should feed to coresight_enable_path() rather than the handle.  That way we
don't need to call etm_perf_sink_config(), we just use the data.


The advantage of passing on the handle is that we can get all the way up to the
"perf_event" for the given session. Passing the event_data would lose that
information.

i.e, perf_event->

Re: [PATCH v10 1/7] dt-bindings: net: bluetooth: Add device tree bindings for QTI chip wcn3990

2018-07-23 Thread Matthias Kaehlcke

On Fri, Jul 20, 2018 at 07:02:37PM +0530, Balakrishna Godavarthi wrote:
> This patch enables regulators for the Qualcomm Bluetooth wcn3990
> controller.
> 
> Signed-off-by: Balakrishna Godavarthi 
> Reviewed-by: Rob Herring 
> ---
> Changes in v10:
> * added entry for regulator currents
> 
> Changes in v9:
> * updated with latest reg handle and names.
> * updated max-speed definition. 
> 
> Changes in v8:
> * Separated the optional entries between two chips
> 
> Changes in v7:
> * no change.
> 
> Changes in v6:
> 
> * Changed the oper-speed to max-speed.
> 
> Changes in v5:
> 
> * Added entry for oper-speed and init-speed.
> 
> ---
>  .../bindings/net/qualcomm-bluetooth.txt   | 42 ++-
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt 
> b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> index 0ea18a53cc29..ca04c4981048 100644
> --- a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> @@ -10,12 +10,34 @@ device the slave device is attached to.
>  Required properties:
>   - compatible: should contain one of the following:
> * "qcom,qca6174-bt"
> +   * "qcom,wcn3990-bt"
> +
> +Optional properties for compatible string qcom,qca6174-bt:
>  
> -Optional properties:
>   - enable-gpios: gpio specifier used to enable chip
>   - clocks: clock provided to the controller (SUSCLK_32KHZ)
>  
> -Example:
> +Optional properties for compatible string qcom,wcn3990-bt:
> +
> + - vddio-supply: Bluetooth wcn3990 VDD_IO supply regulator handle.
> + - vddxo-supply: Bluetooth wcn3990 VDD_XO supply regulator handle.
> + - vddrf-supply: Bluetooth wcn3990 VDD_RF supply regulator handle.
> + - vddch0-supply: Bluetooth wcn3990 VDD_CH0 supply regulator handle.
> +
> + - If WCN3990 is connected to a platform where the RPMH PMIC processor is
> +   used, then the load values will be 1uA. If it is connected to a platform
> +   where the RPM PMIC processor is used, then the load value will be 1 uA.
> +   If it is connected to a different platform, where current values are fixed
> +   as in the data sheet, then the properties below are not required.

Please provide details why these magic values are needed for RPMh and
RPM PMICs.

For RPMh it looks like a value of 1uA would cause the regulator to
enter idle mode:

static int rpmh_regulator_vrm_set_load(struct regulator_dev *rdev, int load_uA)
{
struct rpmh_vreg *vreg = rdev_get_drvdata(rdev);
unsigned int mode;

if (load_uA >= vreg->hw_data->hpm_min_load_uA)
mode = REGULATOR_MODE_NORMAL;
else
mode = REGULATOR_MODE_IDLE;

return rpmh_regulator_vrm_set_mode(rdev, mode);
}

https://patchwork.kernel.org/patch/10524299/

Is that really intended and if so why? It might make sense to save
power when really in idle mode, but I assume you somehow have to tell
the regulator to switch to normal mode when BT is used.

I also commented about this on patch "[7/7] Bluetooth: hci_qca: Add
support for Qualcomm Bluetooth chip wcn3990". I suggest to center the
discussion here and reply with a link to the other patch.
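To make the concern concrete: the quoted `rpmh_regulator_vrm_set_load()` selects NORMAL mode only when the requested load reaches `hpm_min_load_uA`. A fixed 1 uA request therefore keeps the regulator in IDLE unless the consumer raises the load (for example with `regulator_set_load()`) when Bluetooth becomes active. The model below uses a hypothetical 10 mA threshold. The real threshold comes from the RPMh driver's per-regulator data.

```c
#include <assert.h>

enum demo_mode { DEMO_MODE_IDLE, DEMO_MODE_NORMAL };

struct demo_vreg {
	int hpm_min_load_uA;  /* hypothetical per-regulator threshold */
	enum demo_mode mode;
};

/* Same comparison as rpmh_regulator_vrm_set_load() quoted above. */
static void demo_set_load(struct demo_vreg *v, int load_uA)
{
	v->mode = load_uA >= v->hpm_min_load_uA ? DEMO_MODE_NORMAL
						 : DEMO_MODE_IDLE;
}
```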

Re: [PATCH v2] hexagon: modify ffs() and fls() to return int

2018-07-23 Thread Richard Kuo

On Sun, Jul 22, 2018 at 04:03:58PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
> printk format build warning.  This is due to hexagon's ffs() being
> coded as returning long instead of int.
> 
> Fix the printk format warning by changing all of hexagon's ffs() and
> fls() functions to return int instead of long.  The variables that
> they return are already int instead of long.  This return type
> matches the return type in .
> 
> ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
> ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects 
> argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
> 
> There are no ffs() or fls() allmodconfig build errors after making this
> change.
> 
> Signed-off-by: Randy Dunlap 
> Cc: Richard Kuo 
> Cc: linux-hexa...@vger.kernel.org
> Cc: Geert Uytterhoeven 
> ---
> v2:
> add hexagon contacts, drop erroneous sh contacts; [thanks, Geert]
> only change return type for ffs() and fls() [thanks, Geert]
>   [drop the changes for ffz(), __ffs(), and __fls()]
> 
>  arch/hexagon/include/asm/bitops.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 


Acked-by: Richard Kuo 

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project
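For reference, the generic prototypes are `int ffs(int x)` and `int fls(unsigned int x)`. A portable, unoptimized sketch with those return types is below. The `demo_` names are illustrative only; hexagon's real versions use inline assembly.

```c
#include <assert.h>

/* 1-based index of the lowest set bit, 0 if none (returns int). */
static inline int demo_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int r = 1;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		r++;
	}
	return r;
}

/* 1-based index of the highest set bit, 0 if none (returns int). */
static inline int demo_fls(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}
```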

Re: [PATCHv3 2/2] mtd: m25p80: restore the status of SPI flash when exiting

2018-07-23 Thread Boris Brezillon

+Neil

On Mon, 23 Jul 2018 15:06:43 -0700
Brian Norris  wrote:

> Hi Boris,
> 
> On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
>  wrote:
> > On Mon, 23 Jul 2018 11:13:50 -0700
> > Brian Norris  wrote:  
> >> I noticed this got merged, but I wanted to put my 2 cents in here:  
> >
> > I wish you had replied to this thread when it was posted (more than
> > 6 months ago). Reverting the patch now implies making some people
> > unhappy because they'll have to resort to their old out-of-tree
> > hacks :-(.  
> 
> I'd say I'm sorry for not following things closely these days, but I'm
> not really that sorry. There are plenty of other capable hands. And if
> y'all shoot yourselves in the foot, so be it. This patch isn't going
> to blow things up, but now that I did finally notice it (because it
> happened to show up in a list of backports I was looking at), I
> thought better late than never to remind you.
> 
> For way of notification: Marek already noticed that we've started down
> a slippery slope months ago:
> 
> https://lkml.org/lkml/2018/4/8/141
> Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
> 3-byte addressing.
> 
> I'm not quite sure why that wasn't taken to its logical conclusion --
> that the hack should be reverted.
> 
> This problem has been noted many times already, and we've always
> stayed on the side of *avoiding* this hack. A few references from a
> search of my email:
> 
> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
> 
> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
> [RFC] MTD m25p80 3-byte addressing and boot problem
> 
> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used

In my defense, I was not actively following SPI NOR topics at this
time, but, in the light of those discussions, I still keep thinking we
should try our best to improve support for broken HW.
So, ideally, we should find a way to support getting back to 3-byte
addressing mode when resetting, while generating enough noise to make
people well aware that their board design is broken (I think the
proposal of the new DT prop + a WARN_ON() is a good idea).
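A narrowly scoped version of the hack, as suggested here, might look like the model below. Only boards that opt in through a DT property get the 3-byte restore on shutdown, and each use logs a complaint in the spirit of a WARN_ON(). The property and all `demo_*` names are hypothetical. As Brian notes, crashes and watchdog resets bypass `.shutdown()`, so this cannot cover every reset path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct demo_nor {
	int addr_width;                  /* 3 or 4 bytes */
	bool restore_3byte_on_shutdown;  /* hypothetical opt-in DT property */
	int warnings;                    /* counts WARN_ON-style complaints */
};

/* Stand-in for sending the "exit 4-byte address mode" command. */
static void demo_exit_4byte(struct demo_nor *nor)
{
	nor->addr_width = 3;
}

static void demo_nor_shutdown(struct demo_nor *nor)
{
	if (nor->addr_width != 4 || !nor->restore_3byte_on_shutdown)
		return;  /* properly designed boards reset the flash in HW */

	/* Opted-in board: restore the mode, but complain loudly. */
	fprintf(stderr, "spi-nor: board relies on SW restore of 3-byte mode\n");
	nor->warnings++;
	demo_exit_4byte(nor);
}
```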

> 
> >> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:  
> >> > From: Hou Zhiqiang 
> >> >
> >> > Restore the status to be compatible with legacy devices.
> >> > Take Freescale eSPI boot for example, it copies (in 3 Byte
> >> > addressing mode) the RCW and bootloader images from SPI flash
> >> > without firing a reset signal previously, so the reboot command
> >> > will fail without resetting the addressing mode of SPI flash.
> >> > This patch implements a .shutdown function to restore the status
> >> > in reboot process, and add the same operation to the .remove
> >> > function.  
> >>
> >> We have previously rejected this patch multiple times, because the above
> >> comment demonstrates a broken product.  
> >
> > If we were to only support working HW parts, I fear Linux would not
> > support a lot of HW (that's even more true when it comes to flashes :P).  
> 
> You stopped allowing UBI to attach to MLC NAND recently, no? That
> sounds like almost the same boat -- you've probably killed quite a few
> shitty products, if they were to use mainline directly.

Well, that's a bit different in that it's purely a SW issue, and we
also try to encourage people to take over the work we started with
Richard to fix that problem. But I get your point, we broke setups that
were working in an unreliable way, pretty much what you're suggesting to
do here.

> 
> Anyway, that's derailing the issue. Supporting broken hardware isn't
> something you try to do by applying the same hack to all systems. You
> normally try to apply your hack as narrowly as possible.

This, I agree with.

> You seem to
> imply that below. So maybe that's a solution to move forward with. But
> I'd personally be just as happy to see the patch reverted.
> 
> >> You cannot guarantee that all
> >> reboots will invoke the .shutdown() method -- what about crashes? What
> >> about watchdog resets? IIUC, those will hit the same broken behavior,
> >> and have unexpected behavior in your bootloader.
> >
> > Yes, there are corner cases that are not addressed with this approach,  
> 
> Is a system crash really a corner case? :D

Well, nothing forces you to reset the platform using a HW reset when
that happens :P.

> 
> > but it still seems to improve things. Of course, that means the
> > user should try to re-route all HW reset sources to SW ones (RESET input
> > pin muxed to the GPIO controller, watchdog generating an interrupt
> > instead of directly asserting the RESET output pin), which is not always
> > possible, but even when it's not, isn't it better to have a setup that
> > works fine 99% of the time instead of 50% of the time?  
> 
> Perhaps, but not at the expense of

Re: Linux 4.18-rc6

2018-07-23 Thread Linus Torvalds

On Mon, Jul 23, 2018 at 2:23 PM Guenter Roeck  wrote:
>
> >
> > Martin - can we just remove the
> >
> >  select HAVE_GCC_PLUGINS
> >
> > from the s390 Kconfig file (or perhaps add "if BROKEN" or something to
> > disable it).
> >
> > Because if it's not getting fixed, it shouldn't be exposed.
> >
> The problem only affects 4.18 - the code has been rearranged in -next.
> Only, in my builders, I can't disable a flag for individual releases,
> so I just disabled it completely for s390.

Well, I'm not going to release a 4.18 with a known problem, so in 4.18
this *will* be disabled if it's not fixed.

The fact that it might be fixed in linux-next is entirely immaterial
to the release of 4.18.

   Linus

Re: [PATCH v2] perf/core: fix a possible deadlock scenario

2018-07-23 Thread Cong Wang

On Fri, Jul 20, 2018 at 4:52 AM Peter Zijlstra  wrote:
>
> On Thu, Jul 19, 2018 at 12:12:53PM -0700, Cong Wang wrote:
> > hrtimer_cancel() busy-waits for the hrtimer callback to stop,
> > pretty much like del_timer_sync(). This creates a possible deadlock
> > scenario where we hold a spinlock before calling hrtimer_cancel()
> > while trying to acquire the same spinlock in the callback.
>
> Has this actually been observed?

Without lockdep annotation, it is not easy to observe.

>
> > cpu_clock_event_init():
> >   perf_swevent_init_hrtimer():
> > hwc->hrtimer.function = perf_swevent_hrtimer;
> >
> > perf_swevent_hrtimer():
> >   __perf_event_overflow():
> > __perf_event_account_interrupt():
> >   perf_adjust_period():
> > pmu->stop():
> > cpu_clock_event_stop():
> >   perf_swevent_cancel():
> > hrtimer_cancel()
>
> Please explain how a hrtimer event ever gets to perf_adjust_period().
> Last I checked perf_swevent_init_hrtimer() results in attr.freq=0.

Good point.

I thought attr.freq is specified by user-space, but seems
perf_swevent_init_hrtimer() clears it purposely and will not change
after initialization, interesting...

>
> > Getting stuck in an hrtimer is a disaster:
>
> You'll get NMI watchdog splats. Getting stuck in NMI context is far more
> 'interesting :-)

Yes, I did see a stack trace in perf_swevent_hrtimer() which led
me here. But I have to admit that among those hundreds of soft lockups,
I only saw one showing a swevent hrtimer backtrace.

Previously I thought this was because of an NMI handler race, but Jiri
pointed out that the race doesn't exist.

>
> > +#define PERF_EF_NO_WAIT  0x08/* do not wait when stopping, 
> > for
> > +  * example, waiting for a timer
> > +  */
>
> That's a broken comment style.

It was picked by checkpatch.pl, not me; I chose a different one and got
a complaint. :)

Thanks!
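The intent of the proposed flag can be modelled without threads. `hrtimer_cancel()` keeps retrying until the callback has finished, so calling it from inside that callback can never complete. `hrtimer_try_to_cancel()` returns -1 instead of waiting. The sketch below is a hypothetical model of a `->stop()` that honours a no-wait flag. It says nothing about whether the path is reachable, which Peter questions above given that hrtimer events have `attr.freq = 0`.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_EF_NO_WAIT 0x08  /* same value as PERF_EF_NO_WAIT in the patch */

struct demo_timer {
	bool queued;            /* armed, waiting to fire */
	bool callback_running;  /* its callback is executing right now */
};

/* Models hrtimer_try_to_cancel(): 0 not active, 1 cancelled, -1 running. */
static int demo_try_to_cancel(struct demo_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/*
 * Models the proposed stop path. Without the flag, real code would call
 * hrtimer_cancel(), which would spin forever if the caller is the running
 * callback; this single-threaded sketch asserts instead of spinning.
 */
static int demo_stop(struct demo_timer *t, int flags)
{
	if (flags & DEMO_EF_NO_WAIT)
		return demo_try_to_cancel(t);

	assert(!t->callback_running);  /* waiting here would never finish */
	return demo_try_to_cancel(t);
}
```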

Re: Linux 4.18-rc6

2018-07-23 Thread Linus Torvalds

On Mon, Jul 23, 2018 at 2:23 PM Guenter Roeck  wrote:
>
> My patch is also at
>
> https://patchwork.ozlabs.org/patch/937283/

Ah, ok, so that just adds the forward-declaration of 'struct page' in
the right global namespace.

Anyway, I'll just re-order the includes as I suggested, which I think
is the right fix and makes the forward-declaration unnecessary
(although not _wrong_)

 Linus
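For readers unfamiliar with the issue: if a prototype mentions `struct page *` before any declaration of `struct page` is in scope, the struct is scoped to that parameter list only. A later real definition then conflicts with the prototype. A file-scope forward declaration fixes this, as does ordering the includes so that the definition comes first. A minimal userspace illustration, with a dummy `struct page` definition:

```c
#include <assert.h>
#include <stddef.h>

/* File-scope forward declaration: later uses refer to this same type. */
struct page;

static int demo_page_is_null(const struct page *p);

/* The full definition may come later (in the kernel, from another header). */
struct page {
	unsigned long flags;  /* dummy member for this illustration */
};

static int demo_page_is_null(const struct page *p)
{
	return p == NULL;
}
```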

Re: [PATCH v2] hexagon: modify ffs() and fls() to return int

2018-07-23 Thread Richard Kuo

On Mon, Jul 23, 2018 at 04:27:47PM -0700, Randy Dunlap wrote:
> On 07/23/2018 03:50 PM, Richard Kuo wrote:
> > On Sun, Jul 22, 2018 at 04:03:58PM -0700, Randy Dunlap wrote:
> >> From: Randy Dunlap 
> >>
> >> Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
> >> printk format build warning.  This is due to hexagon's ffs() being
> >> coded as returning long instead of int.
> >>
> >> Fix the printk format warning by changing all of hexagon's ffs() and
> >> fls() functions to return int instead of long.  The variables that
> >> they return are already int instead of long.  This return type
> >> matches the return type in .
> >>
> >> ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
> >> ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects 
> >> argument of type 'unsigned int', but argument 2 has type 'long int' 
> >> [-Wformat]
> >>
> >> There are no ffs() or fls() allmodconfig build errors after making this
> >> change.
> >>
> >> Signed-off-by: Randy Dunlap 
> >> Cc: Richard Kuo 
> >> Cc: linux-hexa...@vger.kernel.org
> >> Cc: Geert Uytterhoeven 
> >> ---
> >> v2:
> >> add hexagon contacts, drop erroneous sh contacts; [thanks, Geert]
> >> only change return type for ffs() and fls() [thanks, Geert]
> >>   [drop the changes for ffz(), __ffs(), and __fls()]
> >>
> >>  arch/hexagon/include/asm/bitops.h |4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> > 
> > 
> > Acked-by: Richard Kuo 
> > 
> 
> Hi Richard,
> 
> You are listed as the arch/hexagon/ maintainer.  Can you please merge these
> patches?
> 
> thanks,
> -- 
> ~Randy

Yes, I can queue it up and take it through my tree.


Thanks,
Richard Kuo



Applied "regulator: pfuze100: add optional disable switch-regulators binding" to the regulator tree

2018-07-23 Thread Mark Brown

The patch

   regulator: pfuze100: add optional disable switch-regulators binding

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 78170811a2048de8e77a27a053be8b3eb3d4e556 Mon Sep 17 00:00:00 2001
From: Marco Felsch 
Date: Mon, 23 Jul 2018 09:47:46 +0200
Subject: [PATCH] regulator: pfuze100: add optional disable switch-regulators
 binding

This binding is used to keep the backward compatibility with the current
dtb's [1]. The binding informs the driver that the unused switch regulators
can be disabled.
If it is not specified, the driver doesn't disable the switch regulators.

[1] https://patchwork.kernel.org/patch/10490381/

Signed-off-by: Marco Felsch 
Signed-off-by: Mark Brown 
---
 Documentation/devicetree/bindings/regulator/pfuze100.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt 
b/Documentation/devicetree/bindings/regulator/pfuze100.txt
index 672c939045ff..c7610718adff 100644
--- a/Documentation/devicetree/bindings/regulator/pfuze100.txt
+++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt
@@ -4,6 +4,15 @@ Required properties:
 - compatible: "fsl,pfuze100", "fsl,pfuze200", "fsl,pfuze3000", "fsl,pfuze3001"
 - reg: I2C slave address
 
+Optional properties:
+- fsl,pfuze-support-disable-sw: Boolean, if present disable all unused switch
+  regulators to reduce power consumption. Attention: ensure that all important
+  regulators (e.g. DDR ref, DDR supply) have the "regulator-always-on"
+  property set. If not present, the switched regulators are always on and can't
+  be disabled. This binding is a workaround to keep backward compatibility with
+  old dtb's which rely on the fact that the switched regulators are always on
+  and don't explicitly mark them as "regulator-always-on".
+
 Required child node:
 - regulators: This is the list of child nodes that specify the regulator
   initialization data for defined regulators. Please refer to below doc
-- 
2.18.0
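The binding's semantics can be modelled as follows. Without the property, nothing is turned off, which matches what old DTBs expect. With it, every switch regulator that is neither marked always-on nor in use gets disabled. That is why DDR supplies must carry "regulator-always-on". This is a hypothetical model of the decision only; the `demo_*` names and the regulator labels are illustrative and not the pfuze100 driver's code.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_sw_reg {
	const char *name;
	bool always_on;  /* "regulator-always-on" in the DT */
	bool in_use;     /* has an active consumer */
	bool enabled;
};

/* disable_sw models "fsl,pfuze-support-disable-sw"; returns count turned off. */
static int demo_disable_unused(struct demo_sw_reg *regs, int n, bool disable_sw)
{
	int turned_off = 0;

	if (!disable_sw)
		return 0;  /* legacy behaviour: switches stay on */
	for (int i = 0; i < n; i++) {
		if (regs[i].enabled && !regs[i].always_on && !regs[i].in_use) {
			regs[i].enabled = false;
			turned_off++;
		}
	}
	return turned_off;
}
```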

[PATCH 0/2 v2] Add support for cpcap regulators on Tegra devices.

2018-07-23 Thread Peter Geis

Good Afternoon,

I am re-sending the whole patch set again.
I have sent this to myself, and confirmed it still patches cleanly.

I apologize once again.

The CPCAP regulator driver can support various devices, but currently only 
supports OMAP4 devices.
This series adds the sw2 and sw4 voltage tables, which power the Tegra core,
and a DT match for the Tegra device.
Tested on the Motorola Xoom MZ602.

v2:
Stopped reinventing the wheel, using git send-email now.
Rebased against regulator for-next branch.

v1:
Fixed conversion of tabs to spaces.

Peter Geis (2):
  Add sw2_sw4 voltage table to cpcap regulator.
  Add support for CPCAP regulators on Tegra devices.

 .../bindings/regulator/cpcap-regulator.txt|   1 +
 drivers/regulator/cpcap-regulator.c   | 103 ++
 2 files changed, 104 insertions(+)

-- 
2.17.1

[PATCH 1/2 v2] Add sw2_sw4 voltage table to cpcap regulator.

2018-07-23 Thread Peter Geis

SW2 and SW4 use a shared table to provide voltage to the CPU core and
devices on Tegra hardware.
Added this table to the cpcap regulator driver as the first step to
supporting this device on Tegra.

Signed-off-by: Peter Geis 
---
 drivers/regulator/cpcap-regulator.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/regulator/cpcap-regulator.c 
b/drivers/regulator/cpcap-regulator.c
index bd910fe123d9..c0b1e04bd90f 100644
--- a/drivers/regulator/cpcap-regulator.c
+++ b/drivers/regulator/cpcap-regulator.c
@@ -271,6 +271,29 @@ static struct regulator_ops cpcap_regulator_ops = {
 };
 
 static const unsigned int unknown_val_tbl[] = { 0, };
+static const unsigned int sw2_sw4_val_tbl[] = { 612500, 625000, 637500,
+   650000, 662500, 675000,
+   687500, 700000, 712500,
+   725000, 737500, 750000,
+   762500, 775000, 787500,
+   800000, 812500, 825000,
+   837500, 850000, 862500,
+   875000, 887500, 900000,
+   912500, 925000, 937500,
+   950000, 962500, 975000,
+   987500, 1000000, 1012500,
+   1025000, 1037500, 1050000,
+   1062500, 1075000, 1087500,
+   1100000, 1112500, 1125000,
+   1137500, 1150000, 1162500,
+   1175000, 1187500, 1200000,
+   1212500, 1225000, 1237500,
+   1250000, 1262500, 1275000,
+   1287500, 1300000, 1312500,
+   1325000, 1337500, 1350000,
+   1362500, 1375000, 1387500,
+   1400000, 1412500, 1425000,
+   1437500, 1450000, 1462500, };
 static const unsigned int sw5_val_tbl[] = { 0, 5050000, };
 static const unsigned int vcam_val_tbl[] = { 2600000, 2700000, 2800000,
 2900000, };
-- 
2.17.1
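The sw2_sw4 table is strictly linear: 69 entries from 612500 uV to 1462500 uV in 12500 uV steps. The kernel regulator core can describe such regulators with `min_uV`, `uV_step` and `n_voltages` in `struct regulator_desc` together with `regulator_list_voltage_linear()`, although this driver uses tables throughout via its `CPCAP_REG()` macro. The sketch below shows the selector/voltage arithmetic the table encodes. The helper names are hypothetical.

```c
#include <assert.h>

#define SW2_SW4_MIN_UV  612500u
#define SW2_SW4_STEP_UV  12500u
#define SW2_SW4_N_VOLT      69u

/* Voltage in uV for a selector, or 0 if the selector is out of range. */
static unsigned int sw2_sw4_uv(unsigned int sel)
{
	if (sel >= SW2_SW4_N_VOLT)
		return 0;
	return SW2_SW4_MIN_UV + sel * SW2_SW4_STEP_UV;
}

/* Lowest selector whose voltage is >= min_uv, or -1 if none exists. */
static int sw2_sw4_sel(unsigned int min_uv)
{
	unsigned int sel;

	if (min_uv <= SW2_SW4_MIN_UV)
		return 0;
	sel = (min_uv - SW2_SW4_MIN_UV + SW2_SW4_STEP_UV - 1) / SW2_SW4_STEP_UV;
	return sel < SW2_SW4_N_VOLT ? (int)sel : -1;
}
```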

[PATCH 2/2 v2] Add support for CPCAP regulators on Tegra devices.

2018-07-23 Thread Peter Geis

Added support for the CPCAP power management regulator functions on
Tegra devices.
Added sw2_sw4 value tables, which provide power to the Tegra core and
aux devices.
Added the Tegra init tables and device tree compatibility match.

Signed-off-by: Peter Geis 
---
 .../bindings/regulator/cpcap-regulator.txt|  1 +
 drivers/regulator/cpcap-regulator.c   | 80 +++
 2 files changed, 81 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt b/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
index 675f4437ce92..3e2d33ab1731 100644
--- a/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
@@ -4,6 +4,7 @@ Motorola CPCAP PMIC voltage regulators
 Requires node properties:
 - "compatible" value one of:
 "motorola,cpcap-regulator"
+"motorola,tegra-cpcap-regulator"
 "motorola,mapphone-cpcap-regulator"
 
 Required regulator properties:
diff --git a/drivers/regulator/cpcap-regulator.c b/drivers/regulator/cpcap-regulator.c
index c0b1e04bd90f..cb3774be445d 100644
--- a/drivers/regulator/cpcap-regulator.c
+++ b/drivers/regulator/cpcap-regulator.c
@@ -412,6 +412,82 @@ static struct cpcap_regulator omap4_regulators[] = {
{ /* sentinel */ },
 };
 
+static struct cpcap_regulator tegra_regulators[] = {
+   CPCAP_REG(SW1, CPCAP_REG_S1C1, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW1_SEL, unknown_val_tbl,
+ 0, 0, 0, 0, 0, 0),
+   CPCAP_REG(SW2, CPCAP_REG_S2C1, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW2_SEL, sw2_sw4_val_tbl,
+ 0xf00, 0x7f, 0, 0x800, 0, 120),
+   CPCAP_REG(SW3, CPCAP_REG_S3C, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW3_SEL, unknown_val_tbl,
+ 0, 0, 0, 0, 0, 0),
+   CPCAP_REG(SW4, CPCAP_REG_S4C1, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW4_SEL, sw2_sw4_val_tbl,
+ 0xf00, 0x7f, 0, 0x900, 0, 100),
+   CPCAP_REG(SW5, CPCAP_REG_S5C, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW5_SEL, sw5_val_tbl,
+ 0x2a, 0, 0, 0x22, 0, 0),
+   CPCAP_REG(SW6, CPCAP_REG_S6C, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_SW6_SEL, unknown_val_tbl,
+ 0, 0, 0, 0, 0, 0),
+   CPCAP_REG(VCAM, CPCAP_REG_VCAMC, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_VCAM_SEL, vcam_val_tbl,
+ 0x87, 0x30, 4, 0x7, 0, 420),
+   CPCAP_REG(VCSI, CPCAP_REG_VCSIC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VCSI_SEL, vcsi_val_tbl,
+ 0x47, 0x10, 4, 0x7, 0, 350),
+   CPCAP_REG(VDAC, CPCAP_REG_VDACC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VDAC_SEL, vdac_val_tbl,
+ 0x87, 0x30, 4, 0x3, 0, 420),
+   CPCAP_REG(VDIG, CPCAP_REG_VDIGC, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_VDIG_SEL, vdig_val_tbl,
+ 0x87, 0x30, 4, 0x5, 0, 420),
+   CPCAP_REG(VFUSE, CPCAP_REG_VFUSEC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VFUSE_SEL, vfuse_val_tbl,
+ 0x80, 0xf, 0, 0x80, 0, 420),
+   CPCAP_REG(VHVIO, CPCAP_REG_VHVIOC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VHVIO_SEL, vhvio_val_tbl,
+ 0x17, 0, 0, 0x2, 0, 0),
+   CPCAP_REG(VSDIO, CPCAP_REG_VSDIOC, CPCAP_REG_ASSIGN2,
+ CPCAP_BIT_VSDIO_SEL, vsdio_val_tbl,
+ 0x87, 0x38, 3, 0x2, 0, 420),
+   CPCAP_REG(VPLL, CPCAP_REG_VPLLC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VPLL_SEL, vpll_val_tbl,
+ 0x43, 0x18, 3, 0x1, 0, 420),
+   CPCAP_REG(VRF1, CPCAP_REG_VRF1C, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VRF1_SEL, vrf1_val_tbl,
+ 0xac, 0x2, 1, 0xc, 0, 10),
+   CPCAP_REG(VRF2, CPCAP_REG_VRF2C, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VRF2_SEL, vrf2_val_tbl,
+ 0x23, 0x8, 3, 0x3, 0, 10),
+   CPCAP_REG(VRFREF, CPCAP_REG_VRFREFC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VRFREF_SEL, vrfref_val_tbl,
+ 0x23, 0x8, 3, 0x3, 0, 420),
+   CPCAP_REG(VWLAN1, CPCAP_REG_VWLAN1C, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VWLAN1_SEL, vwlan1_val_tbl,
+ 0x47, 0x10, 4, 0x5, 0, 420),
+   CPCAP_REG(VWLAN2, CPCAP_REG_VWLAN2C, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VWLAN2_SEL, vwlan2_val_tbl,
+ 0x20c, 0xc0, 6, 0x8, 0, 420),
+   CPCAP_REG(VSIM, CPCAP_REG_VSIMC, CPCAP_REG_ASSIGN3,
+ 0xffff, vsim_val_tbl,
+ 0x23, 0x8, 3, 0x3, 0, 420),
+   CPCAP_REG(VSIMCARD, CPCAP_REG_VSIMC, CPCAP_REG_ASSIGN3,
+ 0xffff, vsimcard_val_tbl,
+ 0x1e80, 0x8, 3, 0x1e00, 0, 420),
+   CPCAP_REG(VVIB, CPCAP_REG_VVIBC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VVIB_SEL, vvib_val_tbl,
+ 0x1, 0xc, 2, 0, 0x1, 500),
+   CPCAP_REG(VUSB, CPCAP_REG_VUSBC, CPCAP_REG_ASSIGN3,
+ CPCAP_BIT_VUSB_SEL, vusb_val_tbl,
+ 0x11c, 0x40,

Re: [PATCH 05/11] touchscreen: elants: Use octal permissions

2018-07-23 Thread Harshit Jain

I ran a treewide script that changed them all to octal, and built a kernel
that I am currently running on my machine. I used DEVICE_ATTR_RO() for the
0444 cases where possible. If I don't find any regressions, I will post a
patch for review; I expect to test it for at least a week.

On Tue, Jul 24, 2018 at 12:00 AM, Joe Perches  wrote:

> On Mon, 2018-07-23 at 11:24 -0700, Guenter Roeck wrote:
> > There are much more urgent issues to fix there (such as, for example,
> > converting the "offending" drivers to the latest API, which would
> > magically cause most of the offenders to disappear).
>
> Perhaps posting a list of desired hwmon changes could help.
>
> Documentation/hwmon/submitting-patches does not seem to specify
> what the "latest API" is nor describe what changes would be
> required in older drivers.
>

Re: [patch v3 -mm 3/6] mm, memcg: add hierarchical usage oom policy

2018-07-23 Thread David Rientjes

On Mon, 16 Jul 2018, David Rientjes wrote:

> > And "tree" is different. It actually changes how the selection algorithm works,
> > and sub-tree settings do matter in this case.
> > 
> 
> "Tree" is considering the entity as a single indivisible memory consumer;
> it is compared with siblings based on its hierarchical usage.  It has
> cgroup oom policy.
> 
> It would be possible to separate this out, if you'd prefer, to account
> an intermediate cgroup as the largest descendant or the sum of all
> descendants.  I hadn't found a use case for that, however, but it doesn't
> mean there isn't one.  If you'd like, I can introduce another tunable.
> 

Roman, I'm trying to make progress so that the cgroup-aware oom killer is
in a state where it can be merged.  Would you prefer a second tunable here
to specify that a cgroup's points include memory from its subtree?

It would be helpful if you would also review the rest of the patchset.
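To make the two accounting options concrete, here is a small userspace model. It is not the memcg code: the struct and function names are hypothetical, and "largest descendant" is taken to mean the largest usage charged directly to any single cgroup in the subtree. Under "tree", an intermediate cgroup is compared with its siblings by hierarchical usage (its own usage plus that of all descendants). The alternative scores it by its largest descendant.

```c
#include <stddef.h>

/* Hypothetical, simplified cgroup node: own usage plus children. */
struct cg {
	const char *name;
	unsigned long usage;      /* pages charged directly to this cgroup */
	struct cg **children;
	size_t nr_children;
};

/* "tree"-style score: own usage plus that of all descendants. */
static unsigned long cg_hier_usage(const struct cg *cg)
{
	unsigned long sum = cg->usage;
	size_t i;

	for (i = 0; i < cg->nr_children; i++)
		sum += cg_hier_usage(cg->children[i]);
	return sum;
}

/* Alternative score: largest usage charged to any single cgroup in the subtree. */
static unsigned long cg_largest_desc(const struct cg *cg)
{
	unsigned long best = cg->usage;
	size_t i;

	for (i = 0; i < cg->nr_children; i++) {
		unsigned long s = cg_largest_desc(cg->children[i]);

		if (s > best)
			best = s;
	}
	return best;
}

/* Pick the sibling with the highest score; ties go to the first one. */
static const struct cg *pick_victim(struct cg *const *sibs, size_t n,
				    unsigned long (*score)(const struct cg *))
{
	const struct cg *victim = NULL;
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = score(sibs[i]);

		if (!victim || s > best) {
			victim = sibs[i];
			best = s;
		}
	}
	return victim;
}
```

Consider a parent A with three 100-page children, next to a leaf B with 250 pages. The hierarchical score selects A (300 > 250). The largest-descendant score selects B (250 > 100). That divergence is exactly what a second tunable would have to choose between.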

Re: bisected: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-23 Thread Meelis Roos

> >> Now this seems more relevant:
> >>
> >> mroos@rx100s2:~/linux$ nice git bisect good
> >> 24dea04767e6e5175f4750770281b0c17ac6a2fb is the first bad commit
> >> commit 24dea04767e6e5175f4750770281b0c17ac6a2fb
> >> Author: Daniel Borkmann 
> >> Date:   Fri May 4 01:08:23 2018 +0200
> >>
> >> bpf, x32: remove ld_abs/ld_ind
> >>
> >> Since LD_ABS/LD_IND instructions are now removed from the core and
> >> reimplemented through a combination of inlined BPF instructions and
> >> a slow-path helper, we can get rid of the complexity from x32 JIT.
> > 
> > This does seem much more likely than the previous bisection, given
> > that you ended up in an x86-32 specific commit (the subject says x32,
> > but that is a mistake). I also checked that systemd indeed does
> > call into bpf in a number of places, possibly for the journald socket.
> > 
> > OTOH, it's still hard to tell how that commit can have ended up
> > corrupting the clock read function in systemd. To cross-check,
> > could you try reverting that commit on the latest kernel and see
> > if it still works?
> 
> I would also be curious whether the revert makes it
> work. What's the value of sysctl net.core.bpf_jit_enable? Does it
> change anything if you set it to 0 (interpreter only) or 1 (JIT
> enabled)? It seems a bit strange to me that the bisect ended at this commit
> given the issue you have. The JIT itself was also new in this window,
> fwiw. In any case, some more debug info would be great to have.

net.core.bpf_jit_enable is 1.

Since it breaks bootup, I cannot easily change the value at runtime (it
would be after the fact). Do you mean changing the
CONFIG_BPF_JIT_ALWAYS_ON=y option?

Anyway, I started a compile of v4.18-rc5, the latest version I tested,
with the commit in question reverted. I will see if I can test it tomorrow
morning. But I am leaving tomorrow for a week and can only test further
things if they happen to boot fine (no manual reboot will be possible for a
week).

-- 
Meelis Roos (mr...@linux.ee)
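For anyone reproducing this, the sysctl asked about above is exposed as /proc/sys/net/core/bpf_jit_enable. Per the kernel's BPF and networking sysctl documentation, the values mean:

- 0: JIT disabled, interpreter only.
- 1: JIT enabled.
- 2: JIT enabled, with the generated image also dumped to the kernel log.

My understanding is that with CONFIG_BPF_JIT_ALWAYS_ON=y the interpreter is not built and the sysctl only accepts 1. If so, trying the interpreter requires a rebuild with that option off. The small userspace helper below (names hypothetical) reads and describes the value.

```c
#include <stdio.h>

/* Map a bpf_jit_enable value to its documented meaning. */
static const char *bpf_jit_mode(int v)
{
	switch (v) {
	case 0:
		return "JIT disabled (interpreter only)";
	case 1:
		return "JIT enabled";
	case 2:
		return "JIT enabled, image dumped to kernel log";
	default:
		return "unknown";
	}
}

/* Read the sysctl from a procfs-style file; returns -1 on any failure. */
static int read_bpf_jit_enable(const char *path)
{
	FILE *f = fopen(path, "r");
	int v;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

On a live system, call read_bpf_jit_enable("/proc/sys/net/core/bpf_jit_enable") and pass the result to bpf_jit_mode().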

Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread James Bottomley

On Mon, 2018-07-23 at 13:53 -0700, Tadeusz Struk wrote:
> On 07/23/2018 01:19 PM, Jarkko Sakkinen wrote:
> > In this case I do not have any major evidence of any major benefit
> > *and* the change breaks the ABI.
> 
> As I said before - this does not break the ABI.

The current patch does; you even provided a use case in your last email
(issue a command to get the size, followed by the same command with a
correctly sized buffer).

However, if you tie it to O_NONBLOCK, it won't, because no one currently
opens the TPM device non-blocking, so O_NONBLOCK is an ABI-conformant
discriminator between the two uses.  Tying it to O_NONBLOCK should be
simple because the flag is in file->f_flags.

James
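The O_NONBLOCK discriminator suggested above can be modeled in userspace. In the driver, the check would be against file->f_flags; userspace sees the same bit through fcntl(F_GETFL). The response buffer below is hypothetical and is not the tpm-dev code. It assumes that the legacy behaviour discards any unread remainder after a read, and that partial reads are enabled only for descriptors opened with O_NONBLOCK.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model of a pending TPM response. */
struct resp_buf {
	unsigned char data[64];
	size_t len;   /* bytes of response pending */
	size_t off;   /* bytes already returned to the reader */
};

/* Partial reads are only enabled for descriptors opened O_NONBLOCK. */
static int partial_reads_enabled(int f_flags)
{
	return (f_flags & O_NONBLOCK) != 0;
}

static ssize_t resp_read(struct resp_buf *r, unsigned char *out,
			 size_t count, int f_flags)
{
	size_t avail = r->len - r->off;
	size_t n = count < avail ? count : avail;

	memcpy(out, r->data + r->off, n);
	if (partial_reads_enabled(f_flags) && r->off + n < r->len) {
		r->off += n;   /* keep the remainder for the next read() */
	} else {
		r->len = 0;    /* legacy: the response is consumed in one read */
		r->off = 0;
	}
	return (ssize_t)n;
}
```

A blocking reader with a short buffer still sees the old one-shot semantics. A non-blocking reader can drain a 10-byte response as 4 bytes followed by 6.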

Re: [PATCH v2] IB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port

2018-07-23 Thread Daniel Jurgens




On 7/23/2018 4:15 PM, Qing Huang wrote:
> When a CX5 device is configured in dual-port RoCE mode, after creating
> many VFs against port 1, creating the same number of VFs against port 2
> will flood kernel/syslog with something like
> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> affiliated."
>
> So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
> repeatedly attempts to bind the new mpi structure to every device
> on the list until it finds an unbound device.
>
> Change the log level from warn to dbg to avoid log flooding as the warning
> should be harmless.
>
> Signed-off-by: Qing Huang 
> ---
>   v1 -> v2: change the log level instead
>
>  drivers/infiniband/hw/mlx5/main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index b3ba9a2..f57b8f7 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -5127,8 +5127,8 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
>  
>   spin_lock(&ibdev->port[port_num].mp.mpi_lock);
>   if (ibdev->port[port_num].mp.mpi) {
> - mlx5_ib_warn(ibdev, "port %d already affiliated.\n",
> -  port_num + 1);
> + mlx5_ib_dbg(ibdev, "port %d already affiliated.\n",
> + port_num + 1);
>   spin_unlock(&ibdev->port[port_num].mp.mpi_lock);
>   return false;
>   }
Reviewed-by: Daniel Jurgens
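A rough model of why this floods the log. Assume each new slave port is offered to the devices on mlx5_ib_dev_list in order, and that the earlier devices are already bound. Then every bind attempt logs once per already-bound device before it reaches a free one. The model below makes that ordering assumption explicit; it is not the driver's list handling, and its names are hypothetical. Under it, creating n VFs against port 2 emits n(n-1)/2 messages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on mlx5_ib_dev_list. */
struct model_dev {
	bool port2_bound;
};

/*
 * Bind one new slave port to the first unbound device in list order.
 * Returns the number of "already affiliated" messages emitted on the way,
 * or -1 if no unbound device was found.
 */
static int bind_slave_port(struct model_dev *devs, size_t n)
{
	int msgs = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].port2_bound) {
			msgs++;   /* the message this patch demotes to dbg */
			continue;
		}
		devs[i].port2_bound = true;
		return msgs;
	}
	return -1;
}
```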

Re: [patch v3 -mm 3/6] mm, memcg: add hierarchical usage oom policy

2018-07-23 Thread Roman Gushchin

On Mon, Jul 23, 2018 at 01:33:19PM -0700, David Rientjes wrote:
> On Mon, 16 Jul 2018, David Rientjes wrote:
> 
> > > And "tree" is different. It actually changes how the selection algorithm works,
> > > and sub-tree settings do matter in this case.
> > > 
> > 
> > "Tree" is considering the entity as a single indivisible memory consumer;
> > it is compared with siblings based on its hierarchical usage.  It has
> > cgroup oom policy.
> > 
> > It would be possible to separate this out, if you'd prefer, to account
> > an intermediate cgroup as the largest descendant or the sum of all
> > descendants.  I hadn't found a use case for that, however, but it doesn't
> > mean there isn't one.  If you'd like, I can introduce another tunable.
> > 
> 
> Roman, I'm trying to make progress so that the cgroup-aware oom killer is
> in a state where it can be merged.  Would you prefer a second tunable here
> to specify that a cgroup's points include memory from its subtree?

Hi, David!

It's hard to tell, because I don't have a clear picture of what you're
suggesting now. My biggest concern about your last version was that it's hard
to tell what oom_policy really defines. Each value has its own application
rules, which is a bit messy (some values are meaningful only for the OOMing
cgroup, others are read during hierarchy traversal).
If you know how to make it clear and non-contradictory,
please describe the proposed interface.

> 
> It would be helpful if you would also review the rest of the patchset.

I think we should focus on the interface semantics right now.
If we can't agree on how things should work, it makes no sense
to discuss the implementation.

Thanks!

Re: [PATCH v3] PCI: Check for PCIe downtraining conditions

2018-07-23 Thread Tal Gilboa


On 7/23/2018 8:01 PM, Alex G. wrote:

On 07/23/2018 12:21 AM, Tal Gilboa wrote:

On 7/19/2018 6:49 PM, Alex G. wrote:



On 07/18/2018 08:38 AM, Tal Gilboa wrote:

On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:

[+cc maintainers of drivers that already use pcie_print_link_status()
and GPU folks]

[snip]



+    /* Multi-function PCIe devices share the same link/status. */
+    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
+    return;
+
+    pcie_print_link_status(dev);
+}


Is this function called by default for every PCIe device? What about 
VFs? We make an exception for them on our driver since a VF doesn't 
have access to the needed information in order to provide a 
meaningful message.


I'm assuming VF means virtual function. pcie_print_link_status() 
doesn't care if it's passed a virtual function. It will try to do its 
job. That's why I bail out three lines above, with 'dev->is_virtfn' 
check.


Alex


That's the point - we don't want to call pcie_print_link_status() for 
virtual functions. We make the distinction in our driver. If you want 
to change the code to call this function by default it shouldn't 
affect the current usage.


I don't quite understand what you're asking. I understand you
want to avoid printing this message on virtual functions, and that's
already taken care of. I'm also not changing the current behavior. Let's
get v2 out and start the discussion again based on that.


Alex


Oh, OK, I see. In that case, please remove the explicit calls in the mlx4/5
drivers so the message won't be printed twice.
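The early-return condition in the quoted hunk can be exercised in isolation. PCI_DEVFN() and PCI_FUNC() below follow the kernel's definitions in include/uapi/linux/pci.h. The struct is a hypothetical cut-down pci_dev holding only the two fields the check reads.

```c
#include <stdbool.h>

#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical subset of struct pci_dev. */
struct model_pci_dev {
	unsigned int devfn;
	bool is_virtfn;
};

/*
 * Mirrors the quoted check: only function 0 of a physical device reports.
 * All functions share one link, and (per the discussion) a VF does not
 * have access to the link information.
 */
static bool should_print_link_status(const struct model_pci_dev *dev)
{
	return PCI_FUNC(dev->devfn) == 0 && !dev->is_virtfn;
}
```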

Re: [PATCH] arch/h8300: fix kernel/dma.c build warning

2018-07-23 Thread Richard Kuo

On Sun, Jul 22, 2018 at 10:24:58AM -0700, Randy Dunlap wrote:
> On 07/22/2018 02:25 AM, Geert Uytterhoeven wrote:
> > CC hexagon
> > 
> > hexagon != H8/300 != SuperH
> 
> argh.  Thanks.
> 
> > On Sat, Jul 21, 2018 at 5:17 AM Randy Dunlap  wrote:
> >>
> >> From: Randy Dunlap 
> >>
> >> Fix build warning in arch/hexagon/kernel/dma.c by casting a void *
> >> to unsigned long to match the function parameter type.
> >>
> >> ../arch/hexagon/kernel/dma.c: In function 'arch_dma_alloc':
> >> ../arch/hexagon/kernel/dma.c:51:5: warning: passing argument 2 of 'gen_pool_add' makes integer from pointer without a cast [enabled by default]
> >> ../include/linux/genalloc.h:112:19: note: expected 'long unsigned int' but argument is of type 'void *'
> >>
> >> Signed-off-by: Randy Dunlap 
> >> Cc: Yoshinori Sato 
> >> Cc: Rich Felker 
> >> Cc: linux...@vger.kernel.org
> >> ---
> >>  arch/hexagon/kernel/dma.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>

Thanks all the same!

For Hexagon:


Acked-by: Richard Kuo 


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project
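The fix in Randy's patch amounts to an explicit pointer-to-integer cast at the call site, because gen_pool_add() takes its address as unsigned long. In the userspace illustration below, pool_add() is a hypothetical stand-in for that parameter. On Linux targets unsigned long is pointer-sized, so the round trip is lossless.

```c
#include <stddef.h>

/* Hypothetical stand-in: like gen_pool_add(), it takes the address as unsigned long. */
static unsigned long last_added;

static int pool_add(unsigned long addr, size_t size)
{
	(void)size;
	last_added = addr;
	return 0;
}

static int add_region(void *virt, size_t size)
{
	/* Passing virt uncast triggers "makes integer from pointer without a cast". */
	return pool_add((unsigned long)virt, size);
}
```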

Re: [RFC v5 0/2] mm: zap pages with read mmap_sem in munmap for large mapping

2018-07-23 Thread Yang Shi


Hi folks,


Any comments on this version?


Thanks,

Yang


On 7/18/18 4:21 PM, Yang Shi wrote:

Background:
Recently, when we ran some vm scalability tests on machines with large memory,
we ran into a couple of mmap_sem scalability issues when unmapping large memory
space; please refer to https://lkml.org/lkml/2017/12/14/733 and
https://lkml.org/lkml/2018/2/20/576.


History:
Then akpm suggested unmapping large mappings section by section, dropping
mmap_sem between sections, to mitigate it (see https://lkml.org/lkml/2018/3/6/784).

The v1 patch series was submitted to the mailing list per Andrew's suggestion
(see https://lkml.org/lkml/2018/3/20/786), and I received a lot of great feedback
and suggestions.

The topic was then discussed at the LSFMM summit 2018, where Michal Hocko
suggested (as in the v1 patch review) trying a "two phase" approach: zap pages
with mmap_sem held for read, then do the vma cleanup with mmap_sem held for write
(for discussion details, see https://lwn.net/Articles/753269/).


Approach:
Zapping pages is the most time-consuming part. Following the suggestion from
Michal Hocko [1], pages can be zapped while holding mmap_sem for read, as
MADV_DONTNEED does, and mmap_sem is then re-acquired for write to clean up the vmas.

However, we can't simply call MADV_DONTNEED, because it has two major drawbacks:
   * A page fault that wins the race in the middle of munmap sees an unexpected
     state: it may get the zero page instead of the content or SIGSEGV.
   * It can't handle VM_LOCKED | VM_HUGETLB | VM_PFNMAP and uprobe mappings,
     which akpm considers a showstopper.

Some parts still need mmap_sem held for write, for example vma splitting. So
the design is as follows:
 acquire write mmap_sem
 lookup vmas (find and split vmas)
 detach vmas
 deal with special mappings
 downgrade_write

 zap pages
 free page tables
 release mmap_sem

VM events that take mmap_sem for read (page faults, gup, etc.) may come in
during page zapping, but since the vmas have already been detached, they will
not find a valid vma and just return SIGSEGV or -EFAULT, as expected.
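The ordering above can be checked with a small state model. This is a hypothetical userspace sketch, not mm code. The lock is reduced to a state variable, so there is no real concurrency. A "vma" is just a range with a detached flag and a page count, and vma splitting is left out. The key property it shows: the vmas are detached while the lock is held for write, so a lookup made after downgrade_write() fails instead of finding a half-unmapped range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_state { UNLOCKED, WRITE_LOCKED, READ_LOCKED };

/* Hypothetical, minimal vma: an address range that may be detached. */
struct model_vma {
	unsigned long start, end;
	bool detached;
	unsigned long pages;   /* pages still mapped */
};

struct model_mm {
	enum lock_state lock;
	struct model_vma *vmas;
	size_t nr;
};

/* Lookup as a page fault or gup would do it: detached vmas are invisible. */
static struct model_vma *find_vma_model(struct model_mm *mm, unsigned long addr)
{
	size_t i;

	for (i = 0; i < mm->nr; i++)
		if (!mm->vmas[i].detached &&
		    addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Phase 1, under the write lock: detach every vma overlapping [start, end). */
static void munmap_phase1(struct model_mm *mm, unsigned long start,
			  unsigned long end)
{
	size_t i;

	assert(mm->lock == UNLOCKED);
	mm->lock = WRITE_LOCKED;
	for (i = 0; i < mm->nr; i++)
		if (mm->vmas[i].start < end && mm->vmas[i].end > start)
			mm->vmas[i].detached = true;
	mm->lock = READ_LOCKED;   /* stands in for downgrade_write() */
}

/* Phase 2, under the read lock: zap pages of detached vmas, then release. */
static void munmap_phase2(struct model_mm *mm)
{
	size_t i;

	assert(mm->lock == READ_LOCKED);
	for (i = 0; i < mm->nr; i++)
		if (mm->vmas[i].detached)
			mm->vmas[i].pages = 0;
	mm->lock = UNLOCKED;
}
```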

Vmas with VM_LOCKED | VM_HUGETLB | VM_PFNMAP or uprobes are considered
special mappings. They are dealt with before zapping pages, while mmap_sem
is still held for write; basically, this just updates vm_flags.

Since these mappings are also manipulated by unmap_single_vma(), which in this
case is called by unmap_vmas() with mmap_sem held for read, a new parameter
called "skip_flags" is added to unmap_region(), unmap_vmas() and
unmap_single_vma() to avoid updating vm_flags inside the read-side critical
section. If it is true, those special mappings are simply skipped. Currently,
the munmap path in this series is the only caller that passes true.
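The skip_flags idea can be sketched in the same way. The flag bits and the function below are hypothetical placeholders; the real VM_* values live in include/linux/mm.h. M_SPECIAL stands in for the VM_LOCKED | VM_HUGETLB | VM_PFNMAP / uprobe cases. When the caller holds mmap_sem only for read and has already handled those vmas under the write lock, it passes skip_flags = true, so this path never writes vm_flags.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real VM_* values differ. */
#define M_LOCKED  0x1u
#define M_HUGETLB 0x2u
#define M_PFNMAP  0x4u
#define M_SPECIAL (M_LOCKED | M_HUGETLB | M_PFNMAP)

struct sketch_vma {
	unsigned int flags;
	unsigned long pages;
};

/*
 * With skip_flags == true (read-locked munmap path) special vmas are left
 * alone, because they were handled while the write lock was held.
 * Otherwise they are unmapped and their flags cleared as usual.
 */
static void unmap_single_vma_model(struct sketch_vma *vma, bool skip_flags)
{
	if (vma->flags & M_SPECIAL) {
		if (skip_flags)
			return;
		vma->flags &= ~M_SPECIAL;   /* e.g. munlock clearing a lock flag */
	}
	vma->pages = 0;
}
```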

With this approach we don't have to re-acquire mmap_sem to clean up the vmas,
which avoids a race window in which the address space could change.

Since the lock acquire/release cost is kept to a minimum and is almost the same
as before, the optimization can be extended to mappings of any size without
incurring a significant penalty for small mappings.

For the time being, this is done only in the munmap syscall path. Other
vm_munmap() or do_munmap() call sites (e.g. mmap, mremap) remain unchanged
for stability reasons.

Changelog:
v4 -> v5:
* Detach vmas before zapping pages, so that we don't have to use VM_DEAD to mark
   a vma that is being unmapped: it has already been detached from the rbtree
   when pages are zapped. Per Kirill
* Eliminate VM_DEAD stuff
* With this change we don't have to re-acquire write mmap_sem to do cleanup,
   so a potential race window is eliminated
* Eliminate PUD_SIZE check, and extend this optimization to all sizes

v3 -> v4:
* Extend check_stable_address_space to check VM_DEAD as Michal suggested
* Deal with vm_flags update of VM_LOCKED | VM_HUGETLB | VM_PFNMAP and uprobe
   mappings with exclusive lock held. The actual unmapping is still done with read
   mmap_sem to solve akpm's concern
* Clean up vmas by calling do_munmap, rather than carrying vmas over, to
   prevent a race condition, as Kirill suggested
* Extracted more common code
* Addressed some code cleanup comments from akpm
* Dropped uprobe and arch specific code; now all the changes are mm only
* Still keep the PUD_SIZE threshold; if everyone thinks it is better to extend it
   to all sizes or to smaller sizes, I will remove it
* Make this optimization explicitly 64-bit only, per akpm's suggestion

v2 -> v3:
* Refactor do_munmap code to extract the common part, per Peter's suggestion
* Introduced VM_DEAD flag per Michal's suggestion. Just handled VM_DEAD in
   x86's page fault handler for the time being. Other architectures will be
   covered once the patch series is reviewed
* Now lookup vma (find and split) and set VM_DEAD flag with write mmap_sem, then
   zap mapping with read mmap_sem, then clean up pgtables and vmas with write
   mmap_sem per Peter's suggestion

v1 -> v2:
* Re-implemented the code per the discussion at the LSFMM summit


Regression and

Re: [PATCH 0/3] PTI for x86-32 Fixes and Updates

2018-07-23 Thread Josh Poimboeuf

On Mon, Jul 23, 2018 at 11:38:30PM +0200, Pavel Machek wrote:
> But for now I'd like at least "global" option of turning pti on/off
> during runtime for benchmarking. Let me see...
> 
> Something like this, or is it going to be way more complex? Does
> anyone have a patch by chance?

RHEL/CentOS has a global PTI enable/disable, which uses stop_machine().

-- 
Josh

[PATCH] Staging: octeon: Apply Licence and resolve warnings according to TODO list. There are also a few "checks" that probably should be revised but I think most of them could be resolved by breaking d

2018-07-23 Thread Georgios Tsotsos

Signed-off-by: Georgios Tsotsos 
---
 drivers/staging/octeon-usb/octeon-hcd.c | 55 ++---
 drivers/staging/octeon-usb/octeon-hcd.h |  1 +
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/octeon-usb/octeon-hcd.c b/drivers/staging/octeon-usb/octeon-hcd.c
index cded30f145aa..472ad5917ad2 100644
--- a/drivers/staging/octeon-usb/octeon-hcd.c
+++ b/drivers/staging/octeon-usb/octeon-hcd.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * This file is subject to the terms and conditions of the GNU General Public
  * License.  See the file "COPYING" in the main directory of this archive
@@ -377,27 +378,29 @@ struct octeon_hcd {
 };
 
 /* This macro spins on a register waiting for it to reach a condition. */
-#define CVMX_WAIT_FOR_FIELD32(address, _union, cond, timeout_usec) \
-   ({int result;   \
-   do {\
-   u64 done = cvmx_get_cycle() + (u64)timeout_usec *   \
-  octeon_get_clock_rate() / 1000000;   \
-   union _union c; \
-   \
-   while (1) { \
-   c.u32 = cvmx_usb_read_csr32(usb, address);  \
-   \
-   if (cond) { \
-   result = 0; \
-   break;  \
-   } else if (cvmx_get_cycle() > done) {   \
-   result = -1;\
-   break;  \
-   } else  \
-   __delay(100);   \
-   }   \
-   } while (0);\
-   result; })
+#define CVMX_WAIT_FOR_FIELD32(address, _union, cond, timeout_usec) \
+({ \
+   int result; \
+   do {\
+   u64 done = cvmx_get_cycle() + (u64)(timeout_usec) * \
+  octeon_get_clock_rate() / 1000000;   \
+   union _union c; \
+   \
+   while (1) { \
+   c.u32 = cvmx_usb_read_csr32(usb, address);  \
+   \
+   if (cond) { \
+   result = 0; \
+   break;  \
+   } else if (cvmx_get_cycle() > done) {   \
+   result = -1;\
+   break;  \
+   } else  \
+   __delay(100);   \
+   }   \
+   } while (0);\
+   result; \
+})
 
 /*
  * This macro logically sets a single field in a CSR. It does the sequence
@@ -2636,12 +2639,14 @@ static int cvmx_usb_poll_channel(struct octeon_hcd *usb, int channel)
hcintmsk.u32 = 0;
hcintmsk.s.chhltdmsk = 1;
cvmx_usb_write_csr32(usb,
-   CVMX_USBCX_HCINTMSKX(channel, usb->index),
-hcintmsk.u32);
+   CVMX_USBCX_HCINTMSKX(channel,
+   usb->index),
+   hcintmsk.u32);
usbc_hcchar.s.chdis = 1;
cvmx_usb_write_csr32(usb,
-   CVMX_USBCX_HCCHARX(channel, usb->index),
-usbc_hcchar.u32);
+
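CVMX_WAIT_FOR_FIELD32 converts its microsecond timeout into a cycle deadline, timeout_usec * clock rate / 1000000, and then polls with a short delay. Below is a userspace model of the same loop. It uses a simulated cycle counter and register, all names are hypothetical, and __delay() is replaced by advancing the fake counter.

```c
#include <stdint.h>

/* Simulated hardware: a cycle counter and a register that becomes ready. */
static uint64_t fake_cycles;
static uint64_t ready_at_cycle;

static uint64_t get_cycle(void) { return fake_cycles; }
static uint32_t read_reg(void) { return fake_cycles >= ready_at_cycle ? 1u : 0u; }
static void delay_cycles(uint64_t n) { fake_cycles += n; }

/*
 * Poll until bit 0 of the register is set or timeout_usec elapses.
 * Returns 0 on success and -1 on timeout, like CVMX_WAIT_FOR_FIELD32.
 */
static int wait_for_bit0(uint64_t timeout_usec, uint64_t clock_hz)
{
	uint64_t done = get_cycle() + timeout_usec * clock_hz / 1000000;

	for (;;) {
		if (read_reg() & 1u)
			return 0;
		if (get_cycle() > done)
			return -1;
		delay_cycles(100);
	}
}
```

At 600 MHz a 10 us timeout is 6000 cycles: a register that becomes ready at cycle 5000 succeeds, one ready at 7000 times out.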

Re: [PATCH v7 08/12] mfd: intel-peci-client: Add PECI client MFD driver

2018-07-23 Thread Randy Dunlap

On 07/23/2018 02:47 PM, Jae Hyun Yoo wrote:
> This commit adds PECI client MFD driver.
> 
> Signed-off-by: Jae Hyun Yoo 
> Cc: Lee Jones 
> Cc: Rob Herring 
> Cc: Andrew Jeffery 
> Cc: James Feist 
> Cc: Jason M Biils 
> Cc: Joel Stanley 
> Cc: Vernon Mauery 
> ---
>  drivers/mfd/Kconfig   |  14 ++
>  drivers/mfd/Makefile  |   1 +
>  drivers/mfd/intel-peci-client.c   | 182 ++
>  include/linux/mfd/intel-peci-client.h |  81 
>  4 files changed, 278 insertions(+)
>  create mode 100644 drivers/mfd/intel-peci-client.c
>  create mode 100644 include/linux/mfd/intel-peci-client.h
> 
> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> index f3fa516011ec..e38b591479d4 100644
> --- a/drivers/mfd/Kconfig
> +++ b/drivers/mfd/Kconfig
> @@ -595,6 +595,20 @@ config MFD_INTEL_MSIC
> Passage) chip. This chip embeds audio, battery, GPIO, etc.
> devices used in Intel Medfield platforms.
>  
> +config MFD_INTEL_PECI_CLIENT
> + bool "Intel PECI client"
> + depends on (PECI || COMPILE_TEST)
> + select MFD_CORE
> + help
> +   If you say yes to this option, support will be included for the
> +   multi-funtional Intel PECI (Platform Environment Control Interface)

  multi-functional

> +   client. PECI is a one-wire bus interface that provides a communication
> +   channel from PECI clients in Intel processors and chipset components
> +   to external monitoring or control devices.
> +
> +   Additional drivers must be enabled in order to use the functionality
> +   of the device.
> +
>  config MFD_IPAQ_MICRO
>   bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support"
>   depends on SA1100_H3100 || SA1100_H3600


-- 
~Randy

Re: kernel BUG at mm/shmem.c:LINE!

2018-07-23 Thread Matthew Wilcox

On Mon, Jul 23, 2018 at 03:42:22PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > I figured out a fix and pushed it to the 'ida' branch in
> > git://git.infradead.org/users/willy/linux-dax.git
> 
> Great, thanks a lot for sorting that out so quickly. But I've cloned
> the tree and don't see today's patch, so assume you've folded the fix
> into an existing commit? If possible, please append the diff of today's
> fix to this thread so that we can try it out. Or if that's difficult,
> please at least tell which files were modified, then I can probably
> work it out from the diff of those files against mmotm.

Sure!  It's just this:

diff --git a/lib/xarray.c b/lib/xarray.c
index 32a9c2a6a9e9..383c410997eb 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
unsigned char sibs = xas->xa_sibs;
 
xas->xa_index |= ((sibs + 1) << shift) - 1;
+   if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
+   xas->xa_offset |= sibs;
xas->xa_shift = 0;
xas->xa_sibs = 0;
 

The only other things changed are the test suite, and removing an
unnecessary change, so they can be ignored:

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 8a67d4bb1788..ec06c3ca19e9 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -695,19 +695,20 @@ static noinline void check_move(struct xarray *xa)
check_move_small(xa, (1UL << i) - 1);
 }
 
-static noinline void check_create_range_1(struct xarray *xa,
+static noinline void xa_store_many_order(struct xarray *xa,
unsigned long index, unsigned order)
 {
XA_STATE_ORDER(xas, xa, index, order);
-   unsigned int i;
+   unsigned int i = 0;
 
do {
xas_lock(&xas);
+   XA_BUG_ON(xa, xas_find_conflict(&xas));
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
for (i = 0; i < (1U << order); i++) {
-   xas_store(&xas, xa + i);
+   XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(index + i)));
xas_next(&xas);
}
 unlock:
@@ -715,7 +716,29 @@ static noinline void check_create_range_1(struct xarray *xa,
} while (xas_nomem(&xas, GFP_KERNEL));
 
XA_BUG_ON(xa, xas_error(&xas));
-   xa_destroy(xa);
+}
+
+static noinline void check_create_range_1(struct xarray *xa,
+   unsigned long index, unsigned order)
+{
+   unsigned long i;
+
+   xa_store_many_order(xa, index, order);
+   for (i = index; i < index + (1UL << order); i++)
+   xa_erase_value(xa, i);
+   XA_BUG_ON(xa, !xa_empty(xa));
+}
+
+static noinline void check_create_range_2(struct xarray *xa, unsigned order)
+{
+   unsigned long i;
+   unsigned long nr = 1UL << order;
+
+   for (i = 0; i < nr * nr; i += nr)
+   xa_store_many_order(xa, i, order);
+   for (i = 0; i < nr * nr; i++)
+   xa_erase_value(xa, i);
+   XA_BUG_ON(xa, !xa_empty(xa));
 }
 
 static noinline void check_create_range(struct xarray *xa)
@@ -729,6 +752,8 @@ static noinline void check_create_range(struct xarray *xa)
check_create_range_1(xa, 2U << order, order);
check_create_range_1(xa, 3U << order, order);
check_create_range_1(xa, 1U << 24, order);
+   if (order < 10)
+   check_create_range_2(xa, order);
}
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index af2d7fa05af7..3ac507803787 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -589,8 +589,8 @@ static int shmem_add_to_page_cache(struct page *page,
VM_BUG_ON(expected && PageTransHuge(page));
 
page_ref_add(page, nr);
-   page->index = index;
page->mapping = mapping;
+   page->index = index;
 
do {
void *entry;
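The arithmetic behind the xas_create_range() fix can be checked by hand. The model below assumes the default XA_CHUNK_SHIFT of 6 and the way XA_STATE_ORDER() splits an order into a node shift (the order rounded down to a multiple of 6) and a sibling count (2^(order % 6) - 1). The function names are hypothetical. It shows that once xa_index is moved to the last index covered by the entry, xa_offset must also move to the last sibling slot. That is what the added `xas->xa_offset |= sibs;` does when the node's shift equals xa_shift.

```c
#define CHUNK_SHIFT 6u                          /* assumed XA_CHUNK_SHIFT */
#define CHUNK_MASK  ((1u << CHUNK_SHIFT) - 1)

/* Hypothetical mirror of the xa_state fields the fix touches. */
struct xas_model {
	unsigned long index;
	unsigned int shift;    /* xa_shift */
	unsigned int sibs;     /* xa_sibs */
	unsigned int offset;   /* slot within a node of this shift */
};

static struct xas_model xas_model_init(unsigned long index, unsigned int order)
{
	struct xas_model x;

	x.shift = order - order % CHUNK_SHIFT;
	x.sibs = (1u << (order % CHUNK_SHIFT)) - 1;
	x.index = (index >> order) << order;
	x.offset = (x.index >> x.shift) & CHUNK_MASK;
	return x;
}

/* The create_range step: jump to the last index covered by the entry. */
static void xas_model_to_last(struct xas_model *x, int with_fix)
{
	x->index |= ((unsigned long)(x->sibs + 1) << x->shift) - 1;
	if (with_fix)
		x->offset |= x->sibs;   /* the line added by the fix */
}
```

Take an order-9 entry at index 512 (shift 6, sibs 7, offset 8). The index becomes 1023, whose slot is 15. Without the fix the offset stays at 8; with it, 8 | 7 = 15.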

Re: [PATCH 2/2] PCI: NVMe device specific reset quirk

2018-07-23 Thread Alex Williamson

On Mon, 23 Jul 2018 16:45:08 -0600
Keith Busch  wrote:

> On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> > Take advantage of NVMe devices using a standard interface to quiesce
> > the controller prior to reset, including device specific delays before
> > and after that reset.  This resolves several NVMe device assignment
> > scenarios with two different vendors.  The Intel DC P3700 controller
> > has been shown to only work as a VM boot device on the initial VM
> > startup, failing after reset or reboot, and also fails to initialize
> > after hot-plug into a VM.  Adding a delay after FLR resolves these
> > cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> > from FLR with the PCI config space reading back as -1.  A reproducible
> > instance of this behavior is resolved by clearing the enable bit in
> > the configuration register and waiting for the ready status to clear
> > (disabling the NVMe controller) prior to FLR.
> > 
> > As all NVMe devices make use of this standard interface and the NVMe
> > specification also requires PCIe FLR support, we can apply this quirk
> > to all devices with matching class code.  
> 
> Shouldn't this go in the nvme driver's reset_prepare/reset_done callbacks?

The scenario I'm trying to fix is device assignment, the nvme driver
isn't in play there.  The device is bound to the vfio-pci driver at the
time of these resets.  Thanks,

Alex
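The quiesce step in the quoted commit message is: clear the enable bit, then wait for the ready status to clear before FLR. It can be modeled against a fake controller. The register layout follows the NVMe specification:

- CC is at offset 0x14, with EN in bit 0.
- CSTS is at offset 0x1c, with RDY in bit 0.
- CAP.TO is in bits 31:24 of CAP, in 500 ms units.

The simulated controller and the function names are hypothetical, and this is not the quirk's actual code. The polls_per_500ms parameter stands in for real sleep-based polling.

```c
#include <stdint.h>

#define NVME_CC_EN       0x1u
#define NVME_CSTS_RDY    0x1u
#define NVME_CAP_TO(cap) (((cap) >> 24) & 0xff)   /* units of 500 ms */

/* Simulated controller: RDY clears a fixed number of polls after EN drops. */
struct fake_nvme {
	uint64_t cap;
	uint32_t cc, csts;
	unsigned int polls_to_settle, polls;
};

static uint32_t fake_read_csts(struct fake_nvme *d)
{
	if (!(d->cc & NVME_CC_EN) && ++d->polls >= d->polls_to_settle)
		d->csts &= ~NVME_CSTS_RDY;
	return d->csts;
}

/* Clear CC.EN, then poll CSTS.RDY within the CAP.TO budget. */
static int nvme_disable_model(struct fake_nvme *d, unsigned int polls_per_500ms)
{
	unsigned long budget = (unsigned long)NVME_CAP_TO(d->cap) * polls_per_500ms;

	d->cc &= ~NVME_CC_EN;
	while (fake_read_csts(d) & NVME_CSTS_RDY) {
		if (budget-- == 0)
			return -1;   /* RDY did not clear within CAP.TO */
	}
	return 0;
}
```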

Re: [PATCH 1/5] mfd: rk808: Add RK817 and RK809 support

2018-07-23 Thread kbuild test robot

Hi Tony,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on ljones-mfd/for-mfd-next]
[also build test ERROR on v4.18-rc6 next-20180723]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Tony-Xie/mfd-rk808-Add-RK817-and-RK809-support/20180724-040547
base:   https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next
config: i386-randconfig-a0-201829 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

>> ERROR: "pm_power_off_prepare" [drivers/mfd/rk808.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH] phy: qcom-qmp: Fix dts bindings to reflect reality

2018-07-23 Thread Rob Herring

On Mon, Jul 23, 2018 at 11:37 AM Doug Anderson  wrote:
>
> Hi,
>
> On Mon, Jul 23, 2018 at 7:04 AM, Rob Herring  wrote:
> > On Fri, Jul 20, 2018 at 11:54 AM Doug Anderson  
> > wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Jul 20, 2018 at 10:26 AM, Rob Herring  wrote:
> >> > On Fri, Jul 20, 2018 at 9:13 AM Doug Anderson  
> >> > wrote:
> >> >>
> >> >> Rob,
> >> >>
> >> >> On Fri, Jul 20, 2018 at 7:10 AM, Rob Herring  wrote:
> >> >> > On Fri, Jul 06, 2018 at 04:31:42PM -0700, Douglas Anderson wrote:
> >> >> >> A few patches have landed for the qcom-qmp PHY that affect how you
> >> >> >> would write a device tree node.  ...yet the bindings weren't updated.
> >> >> >> Let's remedy the situation and make the bindings reflect reality.
> >> >> >
> >> >> > "dt-bindings: phy: ..." for the subject.
> >> >>
> >> >> Sorry.  Every subsystem has different conventions for this so I
> >> >> usually just do a "git log" on the file and make my best guess.  I'll
> >> >> try to remember this for next time though.
> >> >
> >> > NP. I'd like to add this info to MAINTAINERS or maybe a git commit
> >> > hook could figure this out automagically.
> >> >
> >> >> In this case, though, it looks like this already landed.  I see this
> >> >> patch in Kishon's next branch.
> >> >>
> >> >>
> >> >> >> Fixes: efb05a50c956 ("phy: qcom-qmp: Add support for QMP V3 USB3 PHY")
> >> >> >> Fixes: ac0d239936bd ("phy: qcom-qmp: Add support for runtime PM")
> >> >> >> Signed-off-by: Douglas Anderson 
> >> >> >> ---
> >> >> >>
> >> >> >>  .../devicetree/bindings/phy/qcom-qmp-phy.txt   | 14 ++++++++++++--
> >> >> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >> >> >>
> >> >> >> diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
> >> >> >> index 266a1bb8bb6e..0c7629e88bf3 100644
> >> >> >> --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
> >> >> >> +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
> >> >> >> @@ -12,7 +12,14 @@ Required properties:
> >> >> >>  "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845,
> >> >> >>  "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on sdm845.
> >> >> >>
> >> >> >> - - reg: offset and length of register set for PHY's common serdes block.
> >> >> >> + - reg:
> >> >> >> +   - For "qcom,sdm845-qmp-usb3-phy":
> >> >> >> + - index 0: address and length of register set for PHY's common serdes
> >> >> >> +   block.
> >> >> >> + - named register "dp_com" (using reg-names): address and length of the
> >> >> >> +   DP_COM control block.
> >> >> >
> >> >> > You need to list reg-names and what are the names for the other 2
> >> >> > regions?
> >> >>
> >> >> Here's how the code works.  You can tell me how you want this expressed in
> >> >> the bindings:
> >> >>
> >> >> * In all cases the driver maps its main memory range (for the common
> >> >> serdes block) without specifying any name.  This is equivalent to
> >> >> asking for index 0.
> >> >>
> >> >> * For "qcom,sdm845-qmp-usb3-phy" the driver requests a second memory
> >> >> range by name using the name "dp_com".
> >> >>
> >> >> ...basically the driver is inconsistent between using names and
> >> >> indices and I was trying to document that fact.
> >> >
> >> > That's fine as long as the indices are fixed.
> >> >
> >> >>
> >> >> I guess options:
> >> >>
> >> >> 1. I could reword this so it's clearer (open to suggestions)
> >> >>
> >> >> 2. I could add something to the bindings saying that the first reg
> >> >> name should be "reg-base" or something.  Then the question is whether
> >> >> we should go to the code and start enforcing that.  If we do choose to
> >> >> enforce it then it's technically breaking compatibility (though I
> >> >> doubt there is any real dts in the wild).  If we don't choose to
> >> >> enforce it then why did we bother saying what it should be named?
> >> >
> >> > I think you need to state index 1 is dp_com (and only for
> >> > "qcom,sdm845-qmp-usb3-phy") and a name for index 0. 'reg-base' doesn't
> >> > seem great, but I don't have another suggestion.
> >>
> >> ...but why do we bother giving "dp_com" a name if we're saying it has
> >> to be index 1?  It feels like the author of the driver was trying to
> >> transition from specifying registers by index to
> >> specifying them by name, but left the first register specified by
> >> index for compatibility (or code simplicity?).  It seems like the
> >> whole point of referring to things by name is to _not_ force the index
> >> number.
> >
> > No. Specifying the order and indexes is how bindings are done.
> > "-names" is extra information, not a license to change the rules.
>
> OK.
>
> Just for context: I'm not trying to be argumentative or anything--I
> just seem to be lacking a fundamental understanding of why reg-names
> exists and when it

Re: [PATCH v2 3/6] dt-bindings: sound: rockchip-i2s: add description for px30

2018-07-23 Thread Mark Brown

On Mon, Jul 23, 2018 at 05:25:21PM +0800, c...@rock-chips.com wrote:
> From: Liang Chen 
> 
> Add "rockchip,px30-i2s", "rockchip,rk3066-i2s" for i2s on px30 platform.

Please submit patches using subject lines reflecting the style for the
subsystem.  This makes it easier for people to identify relevant
patches.  Look at what existing commits in the area you're changing are
doing and make sure your subject lines visually resemble what they're
doing.

Re: [PATCH 1/2] Add sw2_sw4 voltage table to cpcap regulator.

2018-07-23 Thread Dmitry Osipenko

On Monday, 23 July 2018 21:37:50 MSK Peter Geis wrote:
> On 07/23/2018 02:13 PM, Mark Brown wrote:
> > On Mon, Jul 23, 2018 at 01:58:26PM -0400, Peter Geis wrote:
> >> SW2 and SW4 use a shared table to provide voltage to the cpu core and
> >> devices on Tegra hardware.
> >> Added this table to the cpcap regulator driver as the first step to
> >> supporting this device on Tegra.
> > 
> > This also doesn't apply against current code (though it does now parse
> > OK), please check and resend - make sure you don't have other out of
> > tree changes and are using an up to date kernel (ideally my regulator
> > for-next branch) as a base.
> 
> Good Afternoon,
> 
> I thought it was my error in the patches being stripped, unfortunately
> it seems to be a known Gmail behavior.
> Any ideas on how to get around it?

Use the "git send-email" instead of email client.

You need to create and send out patches using git, that will be something like this:

1) "git format-patch -v1 -2 ..." to make patches
2) "git send-email --smtp-server=smtp.gmail.com --smtp-user=pgwipe...@gmail.com --smtp-encryption=tls --smtp-server-port=587 --suppress-cc=all --confirm=always --to 'Mark Brown ' --cc 'linux-te...@vger.kernel.org' --cc 'linux-kernel@vger.kernel.org' ... 00*.patch" to send out the patches

Re: [PATCH 1/2] Add sw2_sw4 voltage table to cpcap regulator.

2018-07-23 Thread Peter Geis


On 07/23/2018 03:20 PM, Dmitry Osipenko wrote:
> On Monday, 23 July 2018 21:37:50 MSK Peter Geis wrote:
> > On 07/23/2018 02:13 PM, Mark Brown wrote:
> > > On Mon, Jul 23, 2018 at 01:58:26PM -0400, Peter Geis wrote:
> > > > SW2 and SW4 use a shared table to provide voltage to the cpu core and
> > > > devices on Tegra hardware.
> > > > Added this table to the cpcap regulator driver as the first step to
> > > > supporting this device on Tegra.
> > > 
> > > This also doesn't apply against current code (though it does now parse
> > > OK), please check and resend - make sure you don't have other out of
> > > tree changes and are using an up to date kernel (ideally my regulator
> > > for-next branch) as a base.
> > 
> > Good Afternoon,
> > 
> > I thought it was my error in the patches being stripped, unfortunately
> > it seems to be a known Gmail behavior.
> > Any ideas on how to get around it?
> 
> Use the "git send-email" instead of email client.
> 
> You need to create and send out patches using git, that will be something like this:
> 
> 1) "git format-patch -v1 -2 ..." to make patches
> 2) "git send-email --smtp-server=smtp.gmail.com --smtp-user=pgwipe...@gmail.com --smtp-encryption=tls --smtp-server-port=587 --suppress-cc=all --confirm=always --to 'Mark Brown ' --cc 'linux-te...@vger.kernel.org' --cc 'linux-kernel@vger.kernel.org' ... 00*.patch" to send out the patches

As always, thanks Dmitry!
Resent through git this time.

[PATCH v5] PCI: Check for PCIe downtraining conditions

2018-07-23 Thread Alexandru Gagniuc

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive nor
straightforward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc 
---

For the sake of review, I've created a __pcie_print_link_status() which
takes a 'verbose' argument. If we agree we want to go this route, and update
the users of pcie_print_link_status(), I can split this up into two patches.
I prefer just printing this information in the core functions, and letting
drivers not have to worry about this. However, there seems to be strong support
for not going that route, so here it goes:

Changes since v4:
 - Use 'verbose' argument to print bandwidth under normal conditions
 - Without verbose, only downtraining conditions are reported

Changes since v3:
 - Remove extra newline and parentheses.

Changes since v2:
 - Check dev->is_virtfn flag

Changes since v1:
 - Use pcie_print_link_status() instead of reimplementing logic

 drivers/pci/pci.c   | 22 ++
 drivers/pci/probe.c | 21 +
 include/linux/pci.h |  1 +
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..414ad7b3abdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Be verbose -- print info even when enough bandwidth is available.
  *
  * Report the available bandwidth at the device.  If this is less than the
  * device is capable of, report the device's maximum possible bandwidth and
  * the upstream link that limits its performance to less than that.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
enum pcie_link_width width, width_cap;
enum pci_bus_speed speed, speed_cap;
@@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-   if (bw_avail >= bw_cap)
+   if (bw_avail >= bw_cap && verbose)
pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 bw_cap / 1000, bw_cap % 1000,
 PCIE_SPEED2STR(speed_cap), width_cap);
-   else
+   else if (bw_avail < bw_cap)
pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 bw_avail / 1000, bw_avail % 1000,
 PCIE_SPEED2STR(speed), width,
@@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
 bw_cap / 1000, bw_cap % 1000,
 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+   __pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..1f7336377c3b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+   if (!pci_is_pcie(dev))
+   return;
+
+   /* Look from the device up to avoid downstream ports with no devices. */
+   if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+   (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+   (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+   return;
+
+   /* Multi-function PCIe share the same link/status. */
+   if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+   return;
+
+   __pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
/* Enhanced Allocation */
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
/* Advanced Error Reporting */
pci_aer_init(dev);
 
+   /* Check link and

Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-23 Thread Paul E. McKenney

On Mon, Jul 23, 2018 at 04:10:41PM -0400, Steven Rostedt wrote:
> 
> Sorry for the late reply, just came back from the Caribbean :-) :-) :-)

Welcome back, and I hope that the Caribbean trip was a good one!

> On Fri, 13 Jul 2018 11:47:18 +0800
> Lai Jiangshan  wrote:
> 
> > On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
> >  wrote:
> > > Hello!
> > >
> > > I now have a semi-reasonable prototype of changes consolidating the
> > > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> > > There are likely still bugs to be fixed and probably other issues as well,
> > > but a prototype does exist.
> 
> What's the rationale for all this churn? Linus's complaining that there
> are too many RCU variants?

A CVE stemming from someone getting confused between the different flavors
of RCU.  The churn is large, as you say, but it does have the benefit of
making RCU a bit smaller.

Not necessarily simpler, but smaller.

> > > Assuming continued good rcutorture results and no objections, I am
> > > thinking in terms of this timeline:
> > >
> > > o   Preparatory work and cleanups are slated for the v4.19 merge window.
> > >
> > > o   The actual consolidation and post-consolidation cleanup is slated
> > > for the merge window after v4.19 (v5.0?).  These cleanups include
> > > the replacements called out below within the RCU implementation
> > > itself (but excluding kernel/rcu/sync.c, see question below).
> > >
> > > o   Replacement of now-obsolete update APIs is slated for the second
> > > merge window after v4.19 (v5.1?).  The replacements are currently
> > > expected to be as follows:
> > >
> > > synchronize_rcu_bh() -> synchronize_rcu()
> > > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> > > call_rcu_bh() -> call_rcu()
> > > rcu_barrier_bh() -> rcu_barrier()
> > > synchronize_sched() -> synchronize_rcu()
> > > synchronize_sched_expedited() -> synchronize_rcu_expedited()
> > > call_rcu_sched() -> call_rcu()
> > > rcu_barrier_sched() -> rcu_barrier()
> > > get_state_synchronize_sched() -> get_state_synchronize_rcu()
> > > cond_synchronize_sched() -> cond_synchronize_rcu()
> > > synchronize_rcu_mult() -> synchronize_rcu()
> > >
> > > I have done light testing of these replacements with good results.
> > >
> > > Any objections to this timeline?
> > >
> > > I also have some questions on the ultimate end point.  I have default
> > > choices, which I will likely take if there is no discussion.
> > >
> > > o   Currently, I am thinking in terms of keeping the per-flavor
> > > read-side functions.  For example, rcu_read_lock_bh() would
> > > continue to disable softirq, and would also continue to tell
> > > lockdep about the RCU-bh read-side critical section.  However,
> > > synchronize_rcu() will wait for all flavors of read-side critical
> > > sections, including those introduced by (say) preempt_disable(),
> > > so there will no longer be any possibility of mismatching (say)
> > > RCU-bh readers with RCU-sched updaters.
> > >
> > > I could imagine other ways of handling this, including:
> > >
> > > a.  Eliminate rcu_read_lock_bh() in favor of
> > > local_bh_disable() and so on.  Rely on lockdep
> > > instrumentation of these other functions to identify RCU
> > > readers, introducing such instrumentation as needed.  I am
> > > not a fan of this approach because of the large number of
> > > places in the Linux kernel where interrupts, preemption,
> > > and softirqs are enabled or disabled "behind the scenes".
> > >
> > > b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> > > and require callers to also disable softirqs, preemption,
> > > or whatever as needed.  I am not a fan of this approach
> > > because it seems a lot less convenient to users of RCU-bh
> > > and RCU-sched.
> > >
> > > At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> > > read-side APIs.  But are there better approaches?  
> > 
> > Hello, Paul
> > 
> > Since local_bh_disable() will be guaranteed to be protected by RCU
> > and is more general, I'm afraid it will be preferred over
> > rcu_read_lock_bh(), which will gradually be phased out.
> > 
> > In other words, keeping the RCU-bh read-side APIs will be a slower
> > version of option A, and the same goes for RCU-sched.
> > But it'll still be better than hurrying into option A, IMHO.
> 
> Now when all this gets done, is synchronize_rcu() going to just wait
> for everything to pass? (scheduling, RCU readers, softirqs, etc) Is
> there any worry about lengthening the time of synchronize_rcu?

Yes, when

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes

On Fri, 20 Jul 2018, Yang Shi wrote:

> I agree to keep it for a while to let that security bug cool down, however, if
> there is no user anymore, it sounds pointless to still keep a dead knob.
> 

It's not a dead knob.  We use it, and for reasons other than 
CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to 
allocate it after it has been freed due to memory pressure, we can either 
continue to disable it, allow it to be persistently available, or use a 
new value for use_zero_page to specify it should be persistently 
available.

Re: Linux 4.18-rc6

2018-07-23 Thread Guenter Roeck

On Sun, Jul 22, 2018 at 02:23:39PM -0700, Linus Torvalds wrote:
> So this was the week when the other shoe dropped ...  The reason the
> two previous rc releases were so nice and small was that David hadn't
> sent me much networking fixes, and they came in this week.
> 

Build results:
total: 134 pass: 133 fail: 1
Failed builds: 
sparc32:allmodconfig 
Qemu test results:
total: 172 pass: 172 fail: 0

The s390 gcc plugins related build error reported previously has not really
been fixed; after feedback from the s390 maintainers, suggesting that it
won't get fixed in 4.18, I disabled GCC_PLUGINS for s390 builds. This is
not my preferred solution, but it beats not testing s390:allmodconfig
builds at all.

The sparc32 build error is still:

In file included from
...
from drivers/staging/media/omap4iss/iss_video.c:15:
include/linux/highmem.h: In function 'clear_user_highpage':
include/linux/highmem.h:137:31: error:
passing argument 1 of 'sparc_flush_page_to_ram' from incompatible pointer type

due to a missing declaration of 'struct page', as previously reported.

Guenter

Re: [PATCH v6] pidns: introduce syscall translate_pid

2018-07-23 Thread Michael Tirado

Hey, I'm not seeing much activity on this so here's my $0.02

> Unix socket automatically translates pid attached to SCM_CREDENTIALS.
> This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
> into pid namespace; this exposes the process and could be insecure.


Perhaps it would be a good idea to add a sysctl switch that prevents
credential spoofing over AF_UNIX *by default* if that is the main
concern, or is there another concern and I have read this wrong?  I'm
having trouble thinking of a legitimate use of SCM_CREDENTIALS
spoofing that isn't in a debugging or troubleshooting context and
would be more comfortable if it were not possible at all... Anyone
know of a program that relies on this spoofing functionality?

If you look at socket(7) under SO_PEERCRED there is a way to get
credentials at time of connect() for an AF_UNIX SOCK_STREAM, or at
time of socketpair() for a SOCK_DGRAM. I would like to think these
credentials are reliable, but will probably require some extra daemon
to proxy a dgram syslog socket.

[PATCH RFC] debugobjects: Make stack check warning more informative

2018-07-23 Thread Joel Fernandes

From: "Joel Fernandes (Google)" 

Recently we debugged an issue where debugobject tracking was reporting
an annotation problem. It turned out the object in question was on a
different stack, which was itself caused by another issue.

When we discussed this, tglx suggested printing the pointers and the
location of the stack for the currently running task. This helped show
that the object was on the wrong stack. I turned the resulting patch into
something upstreamable, so that the error message is more informative
and can help in debugging similar issues in the future.

Signed-off-by: Joel Fernandes (Google) 
---
 lib/debugobjects.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 994be4805cec..24c1df0d7466 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -360,9 +360,12 @@ static void debug_object_is_on_stack(void *addr, int onstack)
 
limit++;
if (is_on_stack)
-   pr_warn("object is on stack, but not annotated\n");
+   pr_warn("object %p is on stack %p, but NOT annotated.\n", addr,
+task_stack_page(current));
else
-   pr_warn("object is not on stack, but annotated\n");
+   pr_warn("object %p is NOT on stack %p, but annotated.\n", addr,
+task_stack_page(current));
+
WARN_ON(1);
 }
 
-- 
2.18.0.233.g985f88cf7e-goog

[PATCH v2 2/2] clk: qcom: Add qspi (Quad SPI) clocks for sdm845

2018-07-23 Thread Douglas Anderson

Add both the interface and core clock.

Signed-off-by: Douglas Anderson 
---

Changes in v2:
- Only 19.2, 100, 150, and 300 MHz now.
- All clocks come from MAIN rather than EVEN.
- Use parent map 0 instead of new parent map 9.

 drivers/clk/qcom/gcc-sdm845.c | 63 +++
 1 file changed, 63 insertions(+)

diff --git a/drivers/clk/qcom/gcc-sdm845.c b/drivers/clk/qcom/gcc-sdm845.c
index 0f694ed4238a..5bca634e277a 100644
--- a/drivers/clk/qcom/gcc-sdm845.c
+++ b/drivers/clk/qcom/gcc-sdm845.c
@@ -162,6 +162,13 @@ static const char * const gcc_parent_names_10[] = {
"core_bi_pll_test_se",
 };
 
+static const char * const gcc_parent_names_9[] = {
+   "bi_tcxo",
+   "gpll0",
+   "gpll0_out_even",
+   "core_pi_sleep_clk",
+};
+
 static struct clk_alpha_pll gpll0 = {
.offset = 0x0,
.regs = clk_alpha_pll_regs[CLK_ALPHA_PLL_TYPE_FABIA],
@@ -358,6 +365,28 @@ static struct clk_rcg2 gcc_pcie_phy_refgen_clk_src = {
},
 };
 
+static const struct freq_tbl ftbl_gcc_qspi_core_clk_src[] = {
+   F(19200000, P_BI_TCXO, 1, 0, 0),
+   F(100000000, P_GPLL0_OUT_MAIN, 6, 0, 0),
+   F(150000000, P_GPLL0_OUT_MAIN, 4, 0, 0),
+   F(300000000, P_GPLL0_OUT_MAIN, 2, 0, 0),
+   { }
+};
+
+static struct clk_rcg2 gcc_qspi_core_clk_src = {
+   .cmd_rcgr = 0x4b008,
+   .mnd_width = 0,
+   .hid_width = 5,
+   .parent_map = gcc_parent_map_0,
+   .freq_tbl = ftbl_gcc_qspi_core_clk_src,
+   .clkr.hw.init = &(struct clk_init_data){
+   .name = "gcc_qspi_core_clk_src",
+   .parent_names = gcc_parent_names_9,
+   .num_parents = 4,
+   .ops = &clk_rcg2_floor_ops,
+   },
+};
+
 static const struct freq_tbl ftbl_gcc_pdm2_clk_src[] = {
F(9600000, P_BI_TCXO, 2, 0, 0),
F(19200000, P_BI_TCXO, 1, 0, 0),
@@ -1935,6 +1964,37 @@ static struct clk_branch gcc_qmip_video_ahb_clk = {
},
 };
 
+static struct clk_branch gcc_qspi_cnoc_periph_ahb_clk = {
+   .halt_reg = 0x4b000,
+   .halt_check = BRANCH_HALT,
+   .clkr = {
+   .enable_reg = 0x4b000,
+   .enable_mask = BIT(0),
+   .hw.init = &(struct clk_init_data){
+   .name = "gcc_qspi_cnoc_periph_ahb_clk",
+   .ops = &clk_branch2_ops,
+   },
+   },
+};
+
+static struct clk_branch gcc_qspi_core_clk = {
+   .halt_reg = 0x4b004,
+   .halt_check = BRANCH_HALT,
+   .clkr = {
+   .enable_reg = 0x4b004,
+   .enable_mask = BIT(0),
+   .hw.init = &(struct clk_init_data){
+   .name = "gcc_qspi_core_clk",
+   .parent_names = (const char *[]){
+   "gcc_qspi_core_clk_src",
+   },
+   .num_parents = 1,
+   .flags = CLK_SET_RATE_PARENT,
+   .ops = &clk_branch2_ops,
+   },
+   },
+};
+
 static struct clk_branch gcc_qupv3_wrap0_s0_clk = {
.halt_reg = 0x17030,
.halt_check = BRANCH_HALT_VOTED,
@@ -3383,6 +3443,9 @@ static struct clk_regmap *gcc_sdm845_clocks[] = {
[GPLL4] = &gpll4.clkr,
[GCC_CPUSS_DVM_BUS_CLK] = &gcc_cpuss_dvm_bus_clk.clkr,
[GCC_CPUSS_GNOC_CLK] = &gcc_cpuss_gnoc_clk.clkr,
+   [GCC_QSPI_CORE_CLK_SRC] = &gcc_qspi_core_clk_src.clkr,
+   [GCC_QSPI_CORE_CLK] = &gcc_qspi_core_clk.clkr,
+   [GCC_QSPI_CNOC_PERIPH_AHB_CLK] = &gcc_qspi_cnoc_periph_ahb_clk.clkr,
 };
 
 static const struct qcom_reset_map gcc_sdm845_resets[] = {
-- 
2.18.0.233.g985f88cf7e-goog

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread Yang Shi

On 7/23/18 1:31 PM, David Rientjes wrote:
> On Fri, 20 Jul 2018, Yang Shi wrote:
> 
> > I agree to keep it for a while to let that security bug cool down, however, if
> > there is no user anymore, it sounds pointless to still keep a dead knob.
> 
> It's not a dead knob.  We use it, and for reasons other than
> CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to
> allocate it after it has been freed due to memory pressure, we can either
> continue to disable it, allow it to be persistently available, or use a
> new value for use_zero_page to specify it should be persistently
> available.

My understanding is the cost of memory compaction is *not* unique to the
huge zero page, right? It is expected under memory pressure, even if the
huge zero page is disabled.

Re: [PATCHv3 2/2] mtd: m25p80: restore the status of SPI flash when exiting

2018-07-23 Thread Brian Norris

Hi Boris,

On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon  wrote:
> On Mon, 23 Jul 2018 11:13:50 -0700
> Brian Norris  wrote:
>> I noticed this got merged, but I wanted to put my 2 cents in here:
>
> I wish you had replied to this thread when it was posted (more than
> 6 months ago). Reverting the patch now implies making some people
> unhappy because they'll have to resort to their old out-of-tree
> hacks :-(.

I'd say I'm sorry for not following things closely these days, but I'm
not really that sorry. There are plenty of other capable hands. And if
y'all shoot yourselves in the foot, so be it. This patch isn't going
to blow things up, but now that I did finally notice it (because it
happened to show up in a list of backports I was looking at), I
thought better late than never to remind you.

By way of notification: Marek already noticed that we've started down
a slippery slope months ago:

https://lkml.org/lkml/2018/4/8/141
Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to 3-byte addressing.

I'm not quite sure why that wasn't taken to its logical conclusion --
that the hack should be reverted.

This problem has been noted many times already, and we've always
stayed on the side of *avoiding* this hack. A few references from a
search of my email:

http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
[PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands

http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
[RFC] MTD m25p80 3-byte addressing and boot problem

http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
[PATCH 2/2] m25p80: if supported put chip to deep power down if not used

>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
>> > From: Hou Zhiqiang 
>> >
>> > Restore the status to be compatible with legacy devices.
>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
>> > addressing mode) the RCW and bootloader images from SPI flash
>> > without firing a reset signal previously, so the reboot command
> >> > will fail without resetting the addressing mode of SPI flash.
> >> > This patch implements a .shutdown function to restore the status
>> > in reboot process, and add the same operation to the .remove
>> > function.
>>
>> We have previously rejected this patch multiple times, because the above
>> comment demonstrates a broken product.
>
> If we were to only support working HW parts, I fear Linux would not
> support a lot of HW (that's even more true when it comes to flashes :P).

You stopped allowing UBI to attach to MLC NAND recently, no? That
sounds like almost the same boat -- you've probably killed quite a few
shitty products, if they were to use mainline directly.

Anyway, that's derailing the issue. Supporting broken hardware isn't
something you try to do by applying the same hack to all systems. You
normally try to apply your hack as narrowly as possible. You seem to
imply that below. So maybe that's a solution to move forward with. But
I'd personally be just as happy to see the patch reverted.

>> You cannot guarantee that all
>> reboots will invoke the .shutdown() method -- what about crashes? What
>> about watchdog resets? IIUC, those will hit the same broken behavior,
>> and have unexpected behavior in your bootloader.
>
> Yes, there are corner cases that are not addressed with this approach,

Is a system crash really a corner case? :D

> but it still seems to improve things. Of course, that means the
> user should try to re-route all HW reset sources to SW ones (RESET input
> pin muxed to the GPIO controller, watchdog generating an interrupt
> instead of directly asserting the RESET output pin), which is not always
> possible, but even when it's not, isn't it better to have a setup that
> works fine 99% of the time instead of 50% of the time?

Perhaps, but not at the expense of future development. And
realistically, no one is doing that if they have this hack. Most
people won't even know that this hack is protecting them at all (so
again, they won't try to mitigate the problem any further).

>> I suppose one could argue for doing this in remove(), but AIUI you're
>> just papering over system bugs by introducing the shutdown() function
>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
>> other users of this driver.
>
> I understand your point. But if the problem is about making sure people
> designing new boards get that right, why not complain at probe time
> when things are wrong?
>
> I mean, spi_nor_restore() seems to only do something on very specific
> NORs (those on which a SW RESET does not reset the addressing
> mode).

The point isn't that SW RESET doesn't reset the addressing mode -- it
does on any flash I've seen. The point is that most systems are built
around a stateless assumption in these flash. IIRC, there wasn't even
a SW RESET command at all until these "huge" flash came around and
stateful addressing modes came about. So boot

Re: [PATCH V4 2/7] mmc: sdhci: Change SDMA address register for v4 mode

2018-07-23 Thread kbuild test robot

Hi Chunyan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf.hansson-mmc/next]
[also build test WARNING on v4.18-rc6 next-20180723]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chunyan-Zhang/mmc-add-support-for-sdhci-4-0/20180724-045328
base:   git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:14:0,
from include/linux/delay.h:22,
from drivers/mmc//host/sdhci.c:16:
   drivers/mmc//host/sdhci.c: In function 'sdhci_data_irq':
>> include/linux/kern_levels.h:5:18: warning: format '%p' expects argument of type 'void *', but argument 3 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
  printk(fmt, ##__VA_ARGS__);  \
 ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
   ^~~~
   include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
 no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
   ^~
   drivers/mmc//host/sdhci.c:43:2: note: in expansion of macro 'pr_debug'
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
 ^~~~
   drivers/mmc//host/sdhci.c:2849:4: note: in expansion of macro 'DBG'
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
   ^~~
   drivers/mmc//host/sdhci.c:2849:19: note: format string is defined here
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
 ~^
 %d
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/delay.h:22,
                    from drivers/mmc//host/sdhci.c:16:
   include/linux/kern_levels.h:5:18: warning: format '%p' expects argument of type 'void *', but argument 5 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
  printk(fmt, ##__VA_ARGS__);  \
 ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
   ^~~~
   include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
 no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
   ^~
   drivers/mmc//host/sdhci.c:43:2: note: in expansion of macro 'pr_debug'
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
 ^~~~
   drivers/mmc//host/sdhci.c:2849:4: note: in expansion of macro 'DBG'
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
   ^~~
   drivers/mmc//host/sdhci.c:2849:56: note: format string is defined here
   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
  ~^
  %d
--
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/delay.h:22,
                    from drivers/mmc/host/sdhci.c:16:
   drivers/mmc/host/sdhci.c: In function 'sdhci_data_irq':
>> include/linux/kern_levels.h:5:18: warning: format '%p' expects argument of type 'void *', but argument 3 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
  printk(fmt, ##__VA_ARGS__);  \
 ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
   ^~~~
   include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
 no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
   ^~
   drivers/mmc/host/sdhci.c:43:2: note: in expansion of macro 'pr_debug'
 pr_debug("%s: " DRIVER_NAME ": " f, mmc_hostname(host->mmc), ## x)
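The warning fires because printk's `%pad` specifier takes a *pointer* to a `dma_addr_t`, while the `DBG()` call at sdhci.c:2849 passes the DMA addresses by value (arguments 3 and 5 have type `dma_addr_t`). The usual fix is to pass the address of each `dma_addr_t` variable instead. Since `%pad` exists only in the kernel's vsprintf, the sketch below is a hypothetical userspace stand-in (`format_pad()` is not a kernel function), assuming a 32-bit `dma_addr_t` as in the "aka unsigned int" note above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 32-bit dma_addr_t, as in this arm multi_v7_defconfig build
 * ("dma_addr_t {aka unsigned int}"). */
typedef uint32_t dma_addr_t;

/*
 * Hypothetical userspace stand-in for printk's %pad. Like the kernel
 * specifier, it takes the address *by reference* and prints "0x"
 * followed by zero-padded hex digits sized to dma_addr_t.
 */
static int format_pad(char *buf, size_t len, const dma_addr_t *addr)
{
	return snprintf(buf, len, "0x%0*llx", (int)(2 * sizeof(*addr)),
			(unsigned long long)*addr);
}
```

Passing by reference is what lets one specifier handle both 32-bit and 64-bit `dma_addr_t` configurations without a cast at every call site.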

Re: [PATCH 2/2] PCI: NVMe device specific reset quirk

2018-07-23 Thread Bjorn Helgaas

On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> Take advantage of NVMe devices using a standard interface to quiesce
> the controller prior to reset, including device specific delays before
> and after that reset.  This resolves several NVMe device assignment
> scenarios with two different vendors.  The Intel DC P3700 controller
> has been shown to only work as a VM boot device on the initial VM
> startup, failing after reset or reboot, and also fails to initialize
> after hot-plug into a VM.  Adding a delay after FLR resolves these
> cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> from FLR with the PCI config space reading back as -1.  A reproducible
> instance of this behavior is resolved by clearing the enable bit in
> the configuration register and waiting for the ready status to clear
> (disabling the NVMe controller) prior to FLR.
> 
> As all NVMe devices make use of this standard interface and the NVMe
> specification also requires PCIe FLR support, we can apply this quirk
> to all devices with matching class code.

Do you have any pointers to problem reports or bugzilla entries that
we could include here?

> Signed-off-by: Alex Williamson 
> ---
>  drivers/pci/quirks.c |  112 
> ++
>  1 file changed, 112 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index e72c8742aafa..83853562f220 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include  /* isa_dma_bridge_buggy */
>  #include "pci.h"
>  
> @@ -3669,6 +3670,116 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
>  #define PCI_DEVICE_ID_INTEL_IVB_M_VGA  0x0156
>  #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA 0x0166
>  
> +/* NVMe controller needs delay before testing ready status */
> +#define NVME_QUIRK_CHK_RDY_DELAY (1 << 0)
> +/* NVMe controller needs post-FLR delay */
> +#define NVME_QUIRK_POST_FLR_DELAY (1 << 1)
> +
> +static const struct pci_device_id nvme_reset_tbl[] = {
> + { PCI_DEVICE(0x1bb1, 0x0100),   /* Seagate Nytro Flash Storage */
> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(0x1c58, 0x0003),   /* HGST adapter */
> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(0x1c58, 0x0023),   /* WDC SN200 adapter */
> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(0x1c5f, 0x0540),   /* Memblaze Pblaze4 adapter */
> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */

We do have PCI_VENDOR_ID_SAMSUNG if you want to use it here.  I
don't see Seagate, HGST, etc.

> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
> + .driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> + { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0953),   /* Intel DC P3700 */
> + .driver_data = NVME_QUIRK_POST_FLR_DELAY, },
> + { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xff) },
> + { 0 }
> +};
> +
> +/*
> + * The NVMe specification requires that controllers support PCIe FLR, but
> + * some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> + * config space) unless the device is quiesced prior to FLR.  Do this for
> + * all NVMe devices by disabling the controller before reset.  Some Intel
> + * controllers also require an additional post-FLR delay or else attempts
> + * to re-enable will timeout, do that here as well with heuristically
> + * determined delay value.  Also maintain the delay between disabling and
> + * checking ready status as used by the native NVMe driver.
> + */
> +static int reset_nvme(struct pci_dev *dev, int probe)
> +{
> + const struct pci_device_id *id;
> + void __iomem *bar;
> + u16 cmd;
> + u32 cfg;
> +
> + id = pci_match_id(nvme_reset_tbl, dev);
> + if (!id || !pcie_has_flr(dev) || !pci_resource_start(dev, 0))
> + return -ENOTTY;
> +
> + if (probe)
> + return 0;
> +
> + bar = pci_iomap(dev, 0, NVME_REG_CC + sizeof(cfg));
> + if (!bar)
> + return -ENOTTY;
> +
> + pci_read_config_word(dev, PCI_COMMAND, &cmd);
> + pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
> +
> + cfg = readl(bar + NVME_REG_CC);

Apparently this is part of some NVMe spec and all controllers support
this?  Is there a public reference you could cite for the details?

> +
> + /* Disable controller if enabled */
> + if (cfg & NVME_CC_ENABLE) {
> + u64 cap = readq(bar + NVME_REG_CAP);
> + unsigned long timeout;
> +
> + /*
> +  * Per nvme_disable_ctrl() skip shutdown notification as it
> +  * could complete commands to the admin queue.  We only intend
> +  * to quiesce the device before reset.
>
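The quiesce step the quirk describes (clear CC.EN, then wait for CSTS.RDY to drop before issuing FLR) can be sketched in userspace against a mock register file. This is an illustration under stated assumptions, not the patch's code: `struct mock_ctrl`, `mock_read_csts()` and `disable_and_wait()` are hypothetical; the CC.EN/CSTS.RDY bit positions and the CAP.TO field (bits 31:24, in 500 ms units) follow the NVMe specification; and the `(TO + 1) * 500 ms` budget mirrors what the native driver's `nvme_wait_ready()` appears to use. The optional delay before checking ready status (`NVME_QUIRK_CHK_RDY_DELAY`) is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* CC.EN and CSTS.RDY are bit 0 of their registers; CAP.TO (bits 31:24)
 * is the worst-case ready-transition time in 500 ms units. */
#define NVME_CC_ENABLE		(1u << 0)
#define NVME_CSTS_RDY		(1u << 0)
#define NVME_CAP_TIMEOUT(cap)	(((cap) >> 24) & 0xff)

/* Hypothetical mock controller standing in for readl()/writel() on BAR0:
 * CSTS.RDY drops `lag` polls after CC.EN has been cleared. */
struct mock_ctrl {
	uint64_t cap;
	uint32_t cc;
	uint32_t csts;
	unsigned int lag;
};

static uint32_t mock_read_csts(struct mock_ctrl *c)
{
	if (!(c->cc & NVME_CC_ENABLE) && (c->csts & NVME_CSTS_RDY)) {
		if (c->lag == 0)
			c->csts &= ~NVME_CSTS_RDY;
		else
			c->lag--;
	}
	return c->csts;
}

/* Clear CC.EN, then poll CSTS.RDY every poll_ms until it clears or the
 * (CAP.TO + 1) * 500 ms budget is used up. Returns false on timeout. */
static bool disable_and_wait(struct mock_ctrl *c, unsigned int poll_ms)
{
	unsigned long budget_ms = (NVME_CAP_TIMEOUT(c->cap) + 1) * 500UL;
	unsigned long waited_ms = 0;

	if (!(c->cc & NVME_CC_ENABLE))
		return true;			/* already disabled */

	c->cc &= ~NVME_CC_ENABLE;
	while (mock_read_csts(c) & NVME_CSTS_RDY) {
		if (waited_ms >= budget_ms)
			return false;		/* controller never went idle */
		waited_ms += poll_ms;		/* a driver would msleep() here */
	}
	return true;
}
```

Bounding the wait by CAP.TO rather than a fixed constant lets slow controllers take the time they advertise, while a hung controller still fails the reset instead of stalling it forever.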

Re: [PATCH 2/2] PCI: NVMe device specific reset quirk

2018-07-23 Thread Keith Busch

On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> Take advantage of NVMe devices using a standard interface to quiesce
> the controller prior to reset, including device specific delays before
> and after that reset.  This resolves several NVMe device assignment
> scenarios with two different vendors.  The Intel DC P3700 controller
> has been shown to only work as a VM boot device on the initial VM
> startup, failing after reset or reboot, and also fails to initialize
> after hot-plug into a VM.  Adding a delay after FLR resolves these
> cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> from FLR with the PCI config space reading back as -1.  A reproducible
> instance of this behavior is resolved by clearing the enable bit in
> the configuration register and waiting for the ready status to clear
> (disabling the NVMe controller) prior to FLR.
> 
> As all NVMe devices make use of this standard interface and the NVMe
> specification also requires PCIe FLR support, we can apply this quirk
> to all devices with matching class code.

Shouldn't this go in the nvme driver's reset_prepare/reset_done callbacks?

linux-next: build warning after merge of the arm64 tree

2018-07-23 Thread Stephen Rothwell

Hi all,

After merging the arm64 tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

drivers/acpi/Kconfig:6:error: recursive dependency detected!
drivers/acpi/Kconfig:6: symbol ACPI depends on EFI
arch/x86/Kconfig:1920:  symbol EFI depends on ACPI
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"

Introduced by commit

  5bcd44083a08 ("drivers: acpi: add dependency of EFI for arm64")

-- 
Cheers,
Stephen Rothwell



[PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER

2018-07-23 Thread Alexandru Gagniuc

When we don't own AER, we shouldn't touch the AER error bits. Clearing
error bits willy-nilly might cause firmware to miss some errors. In
theory, these bits get cleared by FFS, or via ACPI _HPX method. These
mechanisms are not subject to the problem.

This race is mostly of theoretical significance, since I can't
reasonably demonstrate this race in the lab.

On a side-note, pcie_aer_is_kernel_first() is created to alleviate the
need for two checks: aer_cap and get_firmware_first().

Signed-off-by: Alexandru Gagniuc 
---
 drivers/pci/pcie/aer.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e88386af28..85c3e173c025 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -307,6 +307,12 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
aer_set_firmware_first(dev);
return dev->__aer_firmware_first;
 }
+
+static bool pcie_aer_is_kernel_first(struct pci_dev *dev)
+{
+   return !!dev->aer_cap && !pcie_aer_get_firmware_first(dev);
+}
+
 #define PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
 
@@ -337,10 +343,7 @@ bool aer_acpi_firmware_first(void)
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
-   if (pcie_aer_get_firmware_first(dev))
-   return -EIO;
-
-   if (!dev->aer_cap)
+   if (!pcie_aer_is_kernel_first(dev))
return -EIO;
 
return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
@@ -349,7 +352,7 @@ EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
-   if (pcie_aer_get_firmware_first(dev))
+   if (!pcie_aer_is_kernel_first(dev))
return -EIO;
 
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
@@ -383,10 +386,10 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
if (!pci_is_pcie(dev))
return -ENODEV;
 
-   pos = dev->aer_cap;
-   if (!pos)
+   if (!pcie_aer_is_kernel_first(dev))
return -EIO;
 
+   pos = dev->aer_cap;
port_type = pci_pcie_type(dev);
if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, );
-- 
2.17.1
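As the commit message describes, the helper folds the two ownership checks into one predicate: the OS should touch AER registers only if the device has an AER capability and firmware-first handling is not in effect, so callers bail out with `if (!pcie_aer_is_kernel_first(dev)) return -EIO;`, as in the enable/disable hunks. A minimal userspace model of that guard; `struct mock_pci_dev` and `cleanup_status_regs()` are hypothetical stand-ins for `struct pci_dev` and `pci_cleanup_aer_error_status_regs()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct pci_dev properties used. */
struct mock_pci_dev {
	int aer_cap;		/* config offset of the AER capability, 0 if absent */
	bool firmware_first;	/* what pcie_aer_get_firmware_first() would report */
};

/* Same predicate as pcie_aer_is_kernel_first() in the patch. */
static bool is_kernel_first(const struct mock_pci_dev *dev)
{
	return dev->aer_cap != 0 && !dev->firmware_first;
}

/* Status registers are only touched when the OS owns AER. */
static int cleanup_status_regs(const struct mock_pci_dev *dev)
{
	if (!is_kernel_first(dev))
		return -EIO;
	/* ... read and write back the PCI_ERR_*_STATUS registers ... */
	return 0;
}
```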

Re: Linux 4.18-rc6

2018-07-23 Thread Linus Torvalds

Adding davem for the sparc issue, Martin for the s390 one.

On Mon, Jul 23, 2018 at 1:46 PM Guenter Roeck  wrote:
>
> The s390 gcc plugins related build error reported previously has not really
> been fixed; after feedback from the s390 maintainers, suggesting that it
> won't get fixed in 4.18, I disabled GCC_PLUGINS for s390 builds. This is
> not my preferred solution, but it beats not testing s390:allmodconfig
> builds at all.

Martin - can we just remove the

 select HAVE_GCC_PLUGINS

from the s390 Kconfig file (or perhaps add "if BROKEN" or something to
disable it).

Because if it's not getting fixed, it shouldn't be exposed.

> The sparc32 build error is still:
>
> In file included from
> ...
> from drivers/staging/media/omap4iss/iss_video.c:15:
> include/linux/highmem.h: In function 'clear_user_highpage':
> include/linux/highmem.h:137:31: error:
> passing argument 1 of 'sparc_flush_page_to_ram' from incompatible 
> pointer type
>
> due to a missing declaration of 'struct page', as previously reported.

Hmm.  I assume it's

arch/sparc/include/asm/cacheflush_32.h

that wants a forward-declaration of 'struct page', and doesn't include
any header files.

The fix is presumably to move the

   #include 

in drivers/staging/media/omap4iss/iss_video.c down to below the
 includes?
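For context on this class of error: when a header declares a prototype taking `struct page *` before any file-scope declaration of `struct page`, C gives that tag prototype scope only, so a caller's real `struct page *` is an incompatible pointer type. Either a forward declaration in the header or an include order that declares `struct page` first avoids it. A minimal sketch of the forward-declaration form; `flush_page_to_ram()` here is a hypothetical stand-in, not the sparc function.

```c
#include <stddef.h>

/* File-scope forward declaration: every later 'struct page' refers to
 * this one type. Without it, the tag in the prototype below would be
 * visible only inside that parameter list. */
struct page;

/* Hypothetical stand-in for a cacheflush_32.h-style prototype. */
int flush_page_to_ram(struct page *page);

/* The full definition can arrive later, e.g. from another header. */
struct page {
	unsigned long flags;
};

int flush_page_to_ram(struct page *page)
{
	return page != NULL && page->flags == 0;
}
```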

The old patchwork link you had for a fix no longer works, I think
because the patchwork database got re-generated during the upgrade
(and the patchwork numbering isn't stable).

Davem?

  Linus

Re: Linux 4.18-rc6

2018-07-23 Thread David Miller

From: Linus Torvalds 
Date: Mon, 23 Jul 2018 13:56:15 -0700

> Hmm.  I assume it's
> 
> arch/sparc/include/asm/cacheflush_32.h
> 
> that wants a forward-declaration of 'struct page', and doesn't include
> any header files.
> 
> The fix is presumably to move the
> 
>#include 
> 
> in drivers/staging/media/omap4iss/iss_video.c down to below the
>  includes?
> 
> The old patchwork link you had for a fix no longer works, I think
> because the patchwork database got re-generated during the upgrade
> (and the patchwork numbering isn't stable).

I think modifying the include order in that driver is the best fix.

BTW, here is a proper patchwork link in the SPARC project on ozlabs:

http://patchwork.ozlabs.org/patch/947434/

Re: [PATCH] hexagon: switch to NO_BOOTMEM

2018-07-23 Thread Richard Kuo

On Mon, Jul 16, 2018 at 10:43:18AM +0300, Mike Rapoport wrote:
> This patch adds registration of the system memory with memblock, eliminates
> bootmem initialization and converts early memory reservations from bootmem
> to memblock.
> 
> Signed-off-by: Mike Rapoport 

Sorry for the delay, and thanks for this patch.

I think the first memblock_reserve should use ARCH_PFN_OFFSET instead of
PHYS_OFFSET.

If you can amend that I'd be happy to take it through my tree or it can go
through any other.

Thanks,
Richard Kuo

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

Re: Linux 4.18-rc6

2018-07-23 Thread Guenter Roeck

On Mon, Jul 23, 2018 at 01:56:15PM -0700, Linus Torvalds wrote:
> Adding davem for the sparc issue, Martin for the s390 one.
> 
> On Mon, Jul 23, 2018 at 1:46 PM Guenter Roeck  wrote:
> >
> > The s390 gcc plugins related build error reported previously has not really
> > been fixed; after feedback from the s390 maintainers, suggesting that it
> > won't get fixed in 4.18, I disabled GCC_PLUGINS for s390 builds. This is
> > not my preferred solution, but it beats not testing s390:allmodconfig
> > builds at all.
> 
> Martin - can we just remove the
> 
>  select HAVE_GCC_PLUGINS
> 
> from the s390 Kconfig file (or perhaps add "if BROKEN" or something to
> disable it).
> 
> Because if it's not getting fixed, it shouldn't be exposed.
> 
The problem only affects 4.18 - the code has been rearranged in -next.
Only, in my builders, I can't disable a flag for individual releases,
so I just disabled it completely for s390.

> > The sparc32 build error is still:
> >
> > In file included from
> > ...
> > from drivers/staging/media/omap4iss/iss_video.c:15:
> > include/linux/highmem.h: In function 'clear_user_highpage':
> > include/linux/highmem.h:137:31: error:
> > passing argument 1 of 'sparc_flush_page_to_ram' from incompatible 
> > pointer type
> >
> > due to a missing declaration of 'struct page', as previously reported.
> 
> Hmm.  I assume it's
> 
> arch/sparc/include/asm/cacheflush_32.h
> 
> that wants a forward-declaration of 'struct page', and doesn't include
> any header files.
> 
> The fix is presumably to move the
> 
>#include 
> 
> in drivers/staging/media/omap4iss/iss_video.c down to below the
>  includes?
> 
Good idea.

> The old patchwork link you had for a fix no longer works, I think
> because the patchwork database got re-generated during the upgrade
> (and the patchwork numbering isn't stable).
> 

Looks like they dropped lkml completely. Odd.

My patch is also at

https://patchwork.ozlabs.org/patch/937283/

Also, there is now another patch from Randy Dunlap, pretty much
doing the same.

https://patchwork.ozlabs.org/patch/947434/

I'll submit separate patches to address the include file ordering;
it does make sense to do that. I'll do the same for android/binder.c;
it has the same problem, except that there it only generates a warning.

Thanks,
Guenter

Re: [PATCH 0/3] PTI for x86-32 Fixes and Updates

2018-07-23 Thread Pavel Machek

On Mon 2018-07-23 12:00:08, Linus Torvalds wrote:
> On Mon, Jul 23, 2018 at 7:09 AM Pavel Machek  wrote:
> >
> > Meanwhile... it looks like gcc is not slowed down significantly, but
> > other stuff sees 30% .. 40% slowdowns... which is rather
> > significant.
> 
> That is more or less expected.
> 
> Gcc spends about 90+% of its time in user space, and the system calls
> it *does* do tend to be "real work" (open/read/etc). And modern gcc's
> no longer have the pipe between cpp and cc1, so they don't have that
> issue either (which would have shown the PTI slowdown a lot more)
> 
> Some other loads will do a lot more time traversing the user/kernel
> boundary, and in 32-bit mode you won't be able to take advantage of
> the address space ID's, so you really get the full effect.

Understood. Just -- bzip2 should include quite a lot of time in
userspace, too. 

> > Would it be possible to have per-process control of kpti? I have
> > some processes where trading of speed for security would make sense.
> 
> That was pretty extensively discussed, and no sane model for it was
> ever agreed upon.  Some people wanted it per-thread, others per-mm,
> and it wasn't clear how to set it either and how it should inherit
> across fork/exec, and what the namespace rules etc should be.
> 
> You absolutely need to inherit it (so that you can say "I trust this
> session" or whatever), but at the same time you *don't* want to
> inherit if you have a server you trust that then spawns user processes
> (think "I want systemd to not have the overhead, but the user
> processes it spawns obviously do need protection").
> 
> It was just a morass. Nothing came out of it.  I guess people can
> discuss it again, but it's not simple.

I agree it is not easy. OTOH -- 30% of user-visible performance is a
_lot_. That is worth spending man-years on...  Ok, problem is not as
severe on modern CPUs with address space ID's, but...

What I want is "if A can ptrace B, and B has pti disabled, A can have
pti disabled as well". Now.. I see someone may want to have it
per-thread, because for stuff like javascript JIT, thread may have
rights to call ptrace, but is unable to call ptrace because JIT
removed that ability... hmm...

But for now I'd like at least "global" option of turning pti on/off
during runtime for benchmarking. Let me see...

Something like this, or is it going to be way more complex? Does
anyone have patch by chance?

Pavel

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index dfb975b..719e39a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -162,6 +162,9 @@
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
 
+   cmpl $1, PER_CPU_VAR(pti_enabled)
+   jne .Lend_\@
+
movl %cr3, \scratch_reg
orl $PTI_SWITCH_MASK, \scratch_reg
movl \scratch_reg, %cr3
@@ -176,6 +179,8 @@
testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
jz  .Lend_\@
.endif
+   cmpl $1, PER_CPU_VAR(pti_enabled)
+   jne .Lend_\@
/* On user-cr3? */
movl %cr3, %eax
testl   $PTI_SWITCH_MASK, %eax
@@ -192,6 +197,10 @@
  */
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+   cmpl $1, PER_CPU_VAR(pti_enabled)
+   jne .Lend_\@
+
movl %cr3, \scratch_reg
/* Test if we are already on kernel CR3 */
testl   $PTI_SWITCH_MASK, \scratch_reg
@@ -302,6 +311,9 @@
 */
ALTERNATIVE "jmp .Lswitched_\@", "", X86_FEATURE_PTI
 
+   cmpl $1, PER_CPU_VAR(pti_enabled)
+   jne .Lswitched_\@
+
testl   $PTI_SWITCH_MASK, \cr3_reg
jz  .Lswitched_\@
 
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 4a7884b..8c92ae2 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -59,6 +59,7 @@ struct cpu_entry_area {
 #define CPU_ENTRY_AREA_TOT_SIZE (CPU_ENTRY_AREA_SIZE * NR_CPUS)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+DECLARE_PER_CPU(int, pti_enabled);
 
 extern void setup_cpu_entry_areas(void);
 extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f73fa6f..da34a21 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,6 +507,9 @@ void load_percpu_segment(int cpu)
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 #endif
 
+DEFINE_PER_CPU(int, pti_enabled);
+
+
 #ifdef CONFIG_X86_64
 /*
  * Special IST stacks which the CPU switches to when it calls

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread Tadeusz Struk

On 07/23/2018 02:13 PM, James Bottomley wrote:
> The current patch does, you even provided a use case in your last email
>  (it's do command to get sizing followed by do command with correctly
> sized buffer). 

The example I provided was: #1 send a command, #2 read the response header
(10 bytes), get the actual response size from the header and then #3 read
the full response (response size - size of the header bytes).
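Step #2 relies on the TPM response header layout: a 2-byte tag, a 4-byte big-endian total response size (header included), and a 4-byte response code, 10 bytes in all. A minimal sketch of computing how much remains for step #3; the helper names are hypothetical and short-read handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* TPM response header: tag (2 bytes), total response size (4 bytes,
 * big-endian, header included), response code (4 bytes). */
#define TPM_RESP_HEADER_SIZE 10

/* Decode the total response size from bytes 2..5 of the header. */
static uint32_t tpm_resp_total(const uint8_t hdr[TPM_RESP_HEADER_SIZE])
{
	return ((uint32_t)hdr[2] << 24) | ((uint32_t)hdr[3] << 16) |
	       ((uint32_t)hdr[4] << 8) | (uint32_t)hdr[5];
}

/* Bytes still to read after the header (step #3); 0 if the header
 * claims a total smaller than the header itself. */
static size_t tpm_resp_remaining(const uint8_t hdr[TPM_RESP_HEADER_SIZE])
{
	uint32_t total = tpm_resp_total(hdr);

	return total < TPM_RESP_HEADER_SIZE ? 0 : total - TPM_RESP_HEADER_SIZE;
}
```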

> 
> However, if you tie it to O_NONBLOCK, it won't because no-one currently
> opens the TPM device non blocking so it's an ABI conformant
> discriminator of the uses.  Tying to O_NONBLOCK should be simple
> because it's in file->f_flags.

I think that it might be an option, especially since I have this on top of
the async patch. Let's discuss this when Jarkko is back.

Thanks,
-- 
Tadeusz

Re: [PATCH] IPoIB: use kvzalloc to allocate an array of bucket pointers

2018-07-23 Thread Jason Gunthorpe

On Mon, Jul 09, 2018 at 04:51:03PM +0300, Jan Dakinevich wrote:
> By default this table takes 32KiB, which is an order-3 allocation. Meanwhile,
> this memory is not used for DMA and could safely be allocated
> by vmalloc.
> 
> Signed-off-by: Jan Dakinevich 
> Reviewed-by: Håkon Bugge 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied to for-next, thanks

Jason
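The memory-order arithmetic in the commit message works out as follows, assuming 4 KiB pages (page size is architecture-dependent): 32 KiB is 8 contiguous pages, i.e. 2^3, so a kmalloc-backed table needs an order-3 physically contiguous block, whereas `kvzalloc()` can fall back to vmalloc when such a block is not available. A sketch of the order computation, analogous to the kernel's `get_order()` but written as a hypothetical userspace helper.

```c
/* Assumption: 4 KiB pages. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int alloc_order(unsigned long size)
{
	unsigned int order = 0;

	while ((ASSUMED_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```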

Re: [PATCH 0/3] PTI for x86-32 Fixes and Updates

2018-07-23 Thread Andy Lutomirski




> On Jul 23, 2018, at 2:38 PM, Pavel Machek  wrote:
> 
>> On Mon 2018-07-23 12:00:08, Linus Torvalds wrote:
>>> On Mon, Jul 23, 2018 at 7:09 AM Pavel Machek  wrote:
>>> 
>>> Meanwhile... it looks like gcc is not slowed down significantly, but
>>> other stuff sees 30% .. 40% slowdowns... which is rather
>>> significant.
>> 
>> That is more or less expected.
>> 
>> Gcc spends about 90+% of its time in user space, and the system calls
>> it *does* do tend to be "real work" (open/read/etc). And modern gcc's
>> no longer have the pipe between cpp and cc1, so they don't have that
>> issue either (which would have shown the PTI slowdown a lot more)
>> 
>> Some other loads will do a lot more time traversing the user/kernel
>> boundary, and in 32-bit mode you won't be able to take advantage of
>> the address space ID's, so you really get the full effect.
> 
> Understood. Just -- bzip2 should include quite a lot of time in
> userspace, too. 
> 
>>> Would it be possible to have per-process control of kpti? I have
>>> some processes where trading of speed for security would make sense.
>> 
>> That was pretty extensively discussed, and no sane model for it was
>> ever agreed upon.  Some people wanted it per-thread, others per-mm,
>> and it wasn't clear how to set it either and how it should inherit
>> across fork/exec, and what the namespace rules etc should be.
>> 
>> You absolutely need to inherit it (so that you can say "I trust this
>> session" or whatever), but at the same time you *don't* want to
>> inherit if you have a server you trust that then spawns user processes
>> (think "I want systemd to not have the overhead, but the user
>> processes it spawns obviously do need protection").
>> 
>> It was just a morass. Nothing came out of it.  I guess people can
>> discuss it again, but it's not simple.
> 
> I agree it is not easy. OTOH -- 30% of user-visible performance is a
> _lot_. That is worth spending man-years on...  Ok, problem is not as
> severe on modern CPUs with address space ID's, but...
> 
> What I want is "if A can ptrace B, and B has pti disabled, A can have
> pti disabled as well". Now.. I see someone may want to have it
> per-thread, because for stuff like javascript JIT, thread may have
> rights to call ptrace, but is unable to call ptrace because JIT
> removed that ability... hmm...

No, you don’t want that. The problem is that Meltdown isn’t a problem that 
exists in isolation. It’s very plausible that JavaScript code could trigger a 
speculation attack that, with PTI off, could read kernel memory.

Re: [PATCH 0/3] PTI for x86-32 Fixes and Updates

2018-07-23 Thread Dave Hansen

On 07/23/2018 02:59 PM, Josh Poimboeuf wrote:
> On Mon, Jul 23, 2018 at 11:38:30PM +0200, Pavel Machek wrote:
>> But for now I'd like at least "global" option of turning pti on/off
>> during runtime for benchmarking. Let me see...
>>
>> Something like this, or is it going to be way more complex? Does
>> anyone have patch by chance?
> RHEL/CentOS has a global PTI enable/disable, which uses stop_machine().

Let's not forget PTI's NX-for-userspace in the kernel page tables.  That
provides Spectre V2 mitigation as well as Meltdown.

Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread Jason Gunthorpe

On Mon, Jul 23, 2018 at 03:00:20PM -0700, Tadeusz Struk wrote:
> On 07/23/2018 02:56 PM, Jason Gunthorpe wrote:
> > The proposed patch doesn't clear the data_pending if the entire buffer
> > is not consumed, so of course it is ABI breaking, that really isn't OK.
> 
> The data_pending will be cleared by the timeout handler if the user doesn't
> read the response fully before the timeout expires. This is the same situation
> as if the user did not read the response at all.

That causes write() to fail with EBUSY

NAK from me on breaking the ABI like this

Jason
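The ABI concern can be modelled as a small state machine. As this thread implies, a single `read()` currently consumes the pending response (any unread tail is dropped), whereas under the proposed partial-read semantics the unread tail stays pending, so a following `write()` fails with `EBUSY` until the tail is read or the timeout fires. The model below is a hypothetical sketch (`struct tpm_file_model` and its functions are not kernel code) and ignores the timeout.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one open TPM device file. */
struct tpm_file_model {
	size_t data_pending;	/* response bytes not yet handed to read() */
};

/* write(): refused while a previous response is still pending. */
static int model_write(struct tpm_file_model *f, size_t resp_len)
{
	if (f->data_pending)
		return -EBUSY;
	f->data_pending = resp_len;	/* command completes immediately here */
	return 0;
}

/* Existing behaviour: one read() returns up to len bytes and the rest
 * of the response is discarded. */
static size_t model_read_whole(struct tpm_file_model *f, size_t len)
{
	size_t n = len < f->data_pending ? len : f->data_pending;

	f->data_pending = 0;
	return n;
}

/* Proposed partial reads: the unread tail stays pending. */
static size_t model_read_partial(struct tpm_file_model *f, size_t len)
{
	size_t n = len < f->data_pending ? len : f->data_pending;

	f->data_pending -= n;
	return n;
}
```

A client written against the existing behaviour, reading with a short buffer and then issuing its next command, is exactly the case that now gets `EBUSY`.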

Re: [PATCH v1 09/10] Input: atmel_mxt_ts - tool type is ignored when slot is closed

2018-07-23 Thread Dmitry Torokhov

On Fri, Jul 20, 2018 at 10:51:21PM +0100, Nick Dyer wrote:
> From: Nick Dyer 
> 
> input_mt_report_slot_state() ignores the tool when the slot is closed.
> Remove the tool type from these function calls, which has caused a bit of
> confusion.

Hmm, maybe we could introduce MT_TOOL_NONE or MT_TOOL_INACTIVE and get
rid of the 3rd parameter? It will require a bit of macro trickery for a
release or 2...

> 
> Signed-off-by: Nick Dyer 
> ---
>  drivers/input/touchscreen/atmel_mxt_ts.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
> index d7023d261458..c31af790ef84 100644
> --- a/drivers/input/touchscreen/atmel_mxt_ts.c
> +++ b/drivers/input/touchscreen/atmel_mxt_ts.c
> @@ -838,8 +838,7 @@ static void mxt_proc_t9_message(struct mxt_data *data, u8 *message)
>* have happened.
>*/
>   if (status & MXT_T9_RELEASE) {
> - input_mt_report_slot_state(input_dev,
> -MT_TOOL_FINGER, 0);
> + input_mt_report_slot_state(input_dev, 0, 0);
>   mxt_input_sync(data);
>   }
>  
> @@ -855,7 +854,7 @@ static void mxt_proc_t9_message(struct mxt_data *data, u8 *message)
>   input_report_abs(input_dev, ABS_MT_TOUCH_MAJOR, area);
>   } else {
>   /* Touch no longer active, close out slot */
> - input_mt_report_slot_state(input_dev, MT_TOOL_FINGER, 0);
> + input_mt_report_slot_state(input_dev, 0, 0);
>   }
>  
>   data->update_input = true;
> -- 
> 2.17.1
> 

-- 
Dmitry

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes

On Mon, 23 Jul 2018, Yang Shi wrote:

> > > I agree to keep it for a while to let that security bug cool down,
> > > however, if
> > > there is no user anymore, it sounds pointless to still keep a dead knob.
> > > 
> > It's not a dead knob.  We use it, and for reasons other than
> > CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to
> > allocate it after it has been freed due to memory pressure, we can either
> > continue to disable it, allow it to be persistently available, or use a
> > new value for use_zero_page to specify it should be persistently
> > available.
> 
> My understanding is the cost of memory compaction is *not* unique for huge
> zero page, right? It is expected when memory pressure is met, even though huge
> zero page is disabled.
> 

It's caused by fragmentation, not necessarily memory pressure.  We've 
disabled it because compacting for tens of thousands of huge zero pages in 
the background has a noticeable impact on cpu.  Additionally, if the hzp 
cannot be allocated at runtime it increases the rss of applications that 
map it, making it unpredictable.  Making it persistent, as I've been 
suggesting, fixes these issues.

Re: [PATCH v2] hexagon: modify ffs() and fls() to return int

2018-07-23 Thread Randy Dunlap

On 07/23/2018 03:50 PM, Richard Kuo wrote:
> On Sun, Jul 22, 2018 at 04:03:58PM -0700, Randy Dunlap wrote:
>> From: Randy Dunlap 
>>
>> Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
>> printk format build warning.  This is due to hexagon's ffs() being
>> coded as returning long instead of int.
>>
>> Fix the printk format warning by changing all of hexagon's ffs() and
>> fls() functions to return int instead of long.  The variables that
>> they return are already int instead of long.  This return type
>> matches the return type in .
>>
>> ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
>> ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
>>
>> There are no ffs() or fls() allmodconfig build errors after making this
>> change.
>>
>> Signed-off-by: Randy Dunlap 
>> Cc: Richard Kuo 
>> Cc: linux-hexa...@vger.kernel.org
>> Cc: Geert Uytterhoeven 
>> ---
>> v2:
>> add hexagon contacts, drop erroneous sh contacts; [thanks, Geert]
>> only change return type for ffs() and fls() [thanks, Geert]
>>   [drop the changes for ffz(), __ffs(), and __fls()]
>>
>>  arch/hexagon/include/asm/bitops.h |4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
> 
> 
> Acked-by: Richard Kuo 
> 

Hi Richard,

You are listed as the arch/hexagon/ maintainer.  Can you please merge these
patches?

thanks,
-- 
~Randy
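The warning comes from passing hexagon's `long`-returning `ffs()` to a `%u` conversion; giving `ffs()` an `int` return type, as the generic prototype does, removes the `long int` argument. A portable sketch of the generic semantics (1-based index of the least significant set bit, 0 when no bit is set); `my_ffs()` is hypothetical and is not hexagon's implementation in `arch/hexagon/include/asm/bitops.h`.

```c
/* 1-based index of the least significant set bit of x, or 0 if x == 0,
 * returning int like the generic ffs() prototype. */
static int my_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (v == 0)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}
```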

Re: [PATCHv3 2/2] mtd: m25p80: restore the status of SPI flash when exiting

2018-07-23 Thread Boris Brezillon

Hi Brian,

On Mon, 23 Jul 2018 11:13:50 -0700
Brian Norris  wrote:

> Hello,
> 
> I noticed this got merged, but I wanted to put my 2 cents in here:

I wish you had replied to this thread when it was posted (more than
6 months ago). Reverting the patch now implies making some people
unhappy because they'll have to resort to their old out-of-tree
hacks :-(.

> 
> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang 
> > 
> > Restore the status to be compatible with legacy devices.
> > Take Freescale eSPI boot for example, it copies (in 3 Byte
> > addressing mode) the RCW and bootloader images from SPI flash
> > without firing a reset signal previously, so the reboot command
> > will fail without reseting the addressing mode of SPI flash.
> > This patch implement .shutdown function to restore the status
> > in reboot process, and add the same operation to the .remove
> > function.  
> 
> We have previously rejected this patch multiple times, because the above
> comment demonstrates a broken product.

If we were to only support working HW parts, I fear Linux would not
support a lot of HW (that's even more true when it comes to flashes :P).

> You cannot guarantee that all
> reboots will invoke the .shutdown() method -- what about crashes? What
> about watchdog resets? IIUC, those will hit the same broken behavior,
> and have unexpected behavior in your bootloader.

Yes, there are corner cases that are not addressed with this approach,
but it still seems to improve things. Of course, that means the
user should try to re-route all HW reset sources to SW ones (RESET input
pin muxed to the GPIO controller, watchdog generating an interrupt
instead of directly asserting the RESET output pin), which is not always
possible, but even when it's not, isn't it better to have a setup that
works fine 99% of the time instead of 50% of the time?

> 
> I suppose one could argue for doing this in remove(), but AIUI you're
> just papering over system bugs by introducing the shutdown() function
> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
> other users of this driver.

I understand your point. But if the problem is about making sure people
designing new boards get that right, why not complain at probe time
when things are wrong?

I mean, spi_nor_restore() seems to only do something on very specific
NORs (those on which a SW RESET does not reset the addressing
mode). So, how about adding a flag that says "my board has the NOR HW
RESET pin wired" (there would be a DT property to set that flag). Then you
add a WARN_ON() when this flag is not set and a NOR chip impacted by
this bug is detected. This way you make sure people are informed that
they're doing something wrong, and for those who can't change their HW
(because it's already widely deployed), you have a fix that improves
things.

Regards,

Boris
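
Boris's suggestion amounts to a probe-time check plus the existing restore step. The code below is a hypothetical userspace model. The structure and field names are invented, hw_reset_wired stands in for the proposed DT property, and fprintf() stands in for WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct nor_chip {
	const char *name;
	bool in_4byte_mode;		/* driver switched to 4-byte addressing */
	bool sw_reset_clears_addr;	/* false for the affected NORs */
};

struct board_cfg {
	bool hw_reset_wired;		/* stands in for the proposed DT property */
};

/* Probe-time check: complain when an affected chip sits on a board that
 * does not declare a wired HW RESET pin (the kernel would use WARN_ON()). */
static bool nor_probe_check(const struct board_cfg *b, const struct nor_chip *c)
{
	bool affected = !c->sw_reset_clears_addr;

	if (affected && !b->hw_reset_wired) {
		fprintf(stderr, "%s: addressing mode survives SW reset; "
			"HW RESET pin not declared\n", c->name);
		return true;
	}
	return false;
}

/* The .shutdown()/.remove() step from the patch: leave the chip in
 * 3-byte mode so a boot ROM that assumes that mode can read it again. */
static void nor_restore(struct nor_chip *c)
{
	if (c->in_4byte_mode && !c->sw_reset_clears_addr)
		c->in_4byte_mode = false;	/* models leaving 4-byte mode */
}
```

The warning targets new board designs. The restore step still helps already deployed boards in the common reboot path, but not after crashes or watchdog resets.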

Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-23 Thread Steven Rostedt



Sorry for the late reply, just came back from the Caribbean :-) :-) :-)

On Fri, 13 Jul 2018 11:47:18 +0800
Lai Jiangshan  wrote:

> On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
>  wrote:
> > Hello!
> >
> > I now have a semi-reasonable prototype of changes consolidating the
> > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> > There are likely still bugs to be fixed and probably other issues as well,
> > but a prototype does exist.

What's the rationale for all this churn? Is Linus complaining that there
are too many RCU variants?


> >
> > Assuming continued good rcutorture results and no objections, I am
> > thinking in terms of this timeline:
> >
> > o   Preparatory work and cleanups are slated for the v4.19 merge window.
> >
> > o   The actual consolidation and post-consolidation cleanup is slated
> > for the merge window after v4.19 (v5.0?).  These cleanups include
> > the replacements called out below within the RCU implementation
> > itself (but excluding kernel/rcu/sync.c, see question below).
> >
> > o   Replacement of now-obsolete update APIs is slated for the second
> > merge window after v4.19 (v5.1?).  The replacements are currently
> > expected to be as follows:
> >
> > synchronize_rcu_bh() -> synchronize_rcu()
> > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> > call_rcu_bh() -> call_rcu()
> > rcu_barrier_bh() -> rcu_barrier()
> > synchronize_sched() -> synchronize_rcu()
> > synchronize_sched_expedited() -> synchronize_rcu_expedited()
> > call_rcu_sched() -> call_rcu()
> > rcu_barrier_sched() -> rcu_barrier()
> > get_state_synchronize_sched() -> get_state_synchronize_rcu()
> > cond_synchronize_sched() -> cond_synchronize_rcu()
> > synchronize_rcu_mult() -> synchronize_rcu()
> >
> > I have done light testing of these replacements with good results.
> >
> > Any objections to this timeline?
> >
> > I also have some questions on the ultimate end point.  I have default
> > choices, which I will likely take if there is no discussion.
> >
> > o   Currently, I am thinking in terms of keeping the per-flavor
> > read-side functions.  For example, rcu_read_lock_bh() would
> > continue to disable softirq, and would also continue to tell
> > lockdep about the RCU-bh read-side critical section.  However,
> > synchronize_rcu() will wait for all flavors of read-side critical
> > sections, including those introduced by (say) preempt_disable(),
> > so there will no longer be any possibility of mismatching (say)
> > RCU-bh readers with RCU-sched updaters.
> >
> > I could imagine other ways of handling this, including:
> >
> > a.  Eliminate rcu_read_lock_bh() in favor of
> > local_bh_disable() and so on.  Rely on lockdep
> > instrumentation of these other functions to identify RCU
> > readers, introducing such instrumentation as needed.  I am
> > not a fan of this approach because of the large number of
> > places in the Linux kernel where interrupts, preemption,
> > and softirqs are enabled or disabled "behind the scenes".
> >
> > b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> > and require callers to also disable softirqs, preemption,
> > or whatever as needed.  I am not a fan of this approach
> > because it seems a lot less convenient to users of RCU-bh
> > and RCU-sched.
> >
> > At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> > read-side APIs.  But are there better approaches?  
> 
> Hello, Paul
> 
> Since local_bh_disable() will be guaranteed to provide RCU protection
> and is more general, I'm afraid it will be preferred over
> rcu_read_lock_bh(), which will gradually be phased out.
> 
> In other words, keeping the RCU-bh read-side APIs will be a slower
> version of option A. The same applies to RCU-sched.
> But it'll still be better than hurrying option A, IMHO.

Now when all this gets done, is synchronize_rcu() going to just wait
for everything to pass? (scheduling, RCU readers, softirqs, etc) Is
there any worry about lengthening the time of synchronize_rcu?

-- Steve


> >
> > o   How should kernel/rcu/sync.c be handled?  Here are some
> > possibilities:
> >
> > a.  Leave the full gp_ops[] array and simply translate
> > the obsolete update-side functions to their RCU
> > equivalents.
> >
> > b.  Leave the current gp_ops[] array, but only have
> > the RCU_SYNC entry.  The __INIT_HELD field would
> > be set to a function that was OK with being in an
> > RCU read-side
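
The proposed replacement list is a one-to-one rename, so it can be written as a lookup table, for instance to drive a mechanical conversion of call sites. The helper below is hypothetical; only the name pairs come from Paul's proposal. Read-side APIs such as rcu_read_lock_bh() are deliberately absent because the proposal keeps them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old-name -> new-name pairs from the proposed replacement list. */
static const struct {
	const char *old_api;
	const char *new_api;
} rcu_renames[] = {
	{ "synchronize_rcu_bh",           "synchronize_rcu" },
	{ "synchronize_rcu_bh_expedited", "synchronize_rcu_expedited" },
	{ "call_rcu_bh",                  "call_rcu" },
	{ "rcu_barrier_bh",               "rcu_barrier" },
	{ "synchronize_sched",            "synchronize_rcu" },
	{ "synchronize_sched_expedited",  "synchronize_rcu_expedited" },
	{ "call_rcu_sched",               "call_rcu" },
	{ "rcu_barrier_sched",            "rcu_barrier" },
	{ "get_state_synchronize_sched",  "get_state_synchronize_rcu" },
	{ "cond_synchronize_sched",       "cond_synchronize_rcu" },
	{ "synchronize_rcu_mult",         "synchronize_rcu" },
};

/* Returns the replacement for an obsolete update-side API, or NULL if
 * the name is not being retired. */
static const char *rcu_replacement(const char *name)
{
	for (size_t i = 0; i < sizeof(rcu_renames) / sizeof(rcu_renames[0]); i++)
		if (strcmp(rcu_renames[i].old_api, name) == 0)
			return rcu_renames[i].new_api;
	return NULL;
}
```

One caveat: synchronize_rcu_mult() takes the call_rcu-style functions to wait on as arguments, so in practice that entry is not a pure rename.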

Re: [PATCH v9 1/2] regulator: dt-bindings: add QCOM RPMh regulator bindings

2018-07-23 Thread Doug Anderson

Hi Mark,

On Fri, Jul 13, 2018 at 6:50 PM, David Collins  wrote:
> Introduce bindings for RPMh regulator devices found on some
> Qualcomm Technologies, Inc. SoCs.  These devices allow a given
> processor within the SoC to make PMIC regulator requests which
> are aggregated within the RPMh hardware block along with requests
> from other processors in the SoC to determine the final PMIC
> regulator hardware state.
>
> Signed-off-by: David Collins 
> Reviewed-by: Rob Herring 
> Reviewed-by: Douglas Anderson 
> ---
>  .../bindings/regulator/qcom,rpmh-regulator.txt | 160 +
>  .../dt-bindings/regulator/qcom,rpmh-regulator.h|  36 +
>  2 files changed, 196 insertions(+)

I know you are still looking for time to review the RPMh-regulator
driver and that's fine.  One idea I had though: if the bindings look
OK to you and are less controversial, is there any chance they could
land in the meantime?

Specifically it would be very handy to be able to post up device tree
files that refer to regulators and even get those landed, but they
can't land without the bindings.

If that's not possible then no worries, but I figured I'd check.

-Doug

Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level

2018-07-23 Thread rishabhb


On 2018-07-23 04:17, Rafael J. Wysocki wrote:

On Thu, Jul 19, 2018 at 11:24 PM, Rishabh Bhatnagar
 wrote:

Drivers that are registered at an initcall level may have to
wait until late_init before the probe deferral mechanism can
retry their probe functions. It is possible that their
dependencies were resolved much earlier, in some cases even
before the next initcall level. Invoke one probe retry cycle
at every _sync initcall level, allowing these drivers to be
probed earlier.


Can you please say something about the actual use case this is
expected to address?

We have a display driver that depends on 3 other devices being
probed before it can bring up the display. Because its dependencies
are not yet met, the deferral mechanism postpones the probe until
late_initcall, even though the dependencies might be met much earlier.
With this change the display can be brought up much earlier.



Signed-off-by: Vikram Mulukutla 
Signed-off-by: Rishabh Bhatnagar 
---
 drivers/base/dd.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1435d72..e6a6821 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -224,23 +224,44 @@ void device_unblock_probing(void)
    driver_deferred_probe_trigger();
 }

+static void enable_trigger_defer_cycle(void)
+{
+   driver_deferred_probe_enable = true;
+   driver_deferred_probe_trigger();
+   /*
+    * Sort as many dependencies as possible before the next initcall
+    * level
+    */
+   flush_work(_probe_work);
+}
+
 /**
  * deferred_probe_initcall() - Enable probing of deferred devices
  *
  * We don't want to get in the way when the bulk of drivers are getting probed.
  * Instead, this initcall makes sure that deferred probing is delayed until
- * late_initcall time.
+ * all the registered initcall functions at a particular level are completed.
+ * This function is invoked at every *_initcall_sync level.
  */
 static int deferred_probe_initcall(void)
 {
-   driver_deferred_probe_enable = true;
-   driver_deferred_probe_trigger();
-   /* Sort as many dependencies as possible before exiting initcalls */
-   flush_work(_probe_work);
+   enable_trigger_defer_cycle();
+   driver_deferred_probe_enable = false;
+   return 0;
+}
+arch_initcall_sync(deferred_probe_initcall);
+subsys_initcall_sync(deferred_probe_initcall);
+fs_initcall_sync(deferred_probe_initcall);
+device_initcall_sync(deferred_probe_initcall);
+
+static int deferred_probe_enable_fn(void)
+{
+   /* Enable deferred probing for all time */
+   enable_trigger_defer_cycle();
    initcalls_done = true;
    return 0;
 }
-late_initcall(deferred_probe_initcall);
+late_initcall(deferred_probe_enable_fn);

 /**
  * device_is_bound() - Check if device is bound to a driver
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
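
A toy model helps show the effect of the patch. A device whose probe fails because a dependency is not bound yet stays deferred until the next retry cycle. Before the patch, that cycle first runs at late_initcall. With the patch, one cycle also runs after each *_initcall_sync level. The structures, level numbering and helpers below are hypothetical. The model also ignores driver-core details such as the retries triggered after successful binds once deferred probing is enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPS 3

/* Hypothetical device: probes only once all its dependencies are bound. */
struct toy_dev {
	int level;		/* initcall level at which its driver registers */
	int deps[MAX_DEPS];	/* indices of required devices, -1 = unused */
	bool bound;
	int bound_at_level;	/* level at which the probe succeeded */
};

/* One probe attempt; returning false models -EPROBE_DEFER. */
static bool try_probe(struct toy_dev *devs, int i, int level)
{
	for (int d = 0; d < MAX_DEPS; d++) {
		int dep = devs[i].deps[d];

		if (dep >= 0 && !devs[dep].bound)
			return false;
	}
	devs[i].bound = true;
	devs[i].bound_at_level = level;
	return true;
}

/* One retry cycle: re-probe registered, deferred devices until no
 * further progress is made (like flushing the deferred probe work). */
static void retry_deferred(struct toy_dev *devs, int n, int level)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (int i = 0; i < n; i++)
			if (devs[i].level <= level && !devs[i].bound &&
			    try_probe(devs, i, level))
				progress = true;
	}
}

/* Runs levels 0..nlevels-1, then a final cycle at "level" nlevels that
 * stands for late_initcall. retry_each_level models the patch. */
static void run_initcalls(struct toy_dev *devs, int n, int nlevels,
			  bool retry_each_level)
{
	for (int level = 0; level < nlevels; level++) {
		for (int i = 0; i < n; i++)
			if (devs[i].level == level)
				try_probe(devs, i, level);
		if (retry_each_level)
			retry_deferred(devs, n, level);
	}
	retry_deferred(devs, n, nlevels);
}
```

With a display-like device registered at level 1 that needs devices registered at level 2, the per-level retry binds it at level 2. Without that retry, it waits for the final late_initcall cycle.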

Re: kernel BUG at mm/shmem.c:LINE!

2018-07-23 Thread Matthew Wilcox

On Mon, Jul 23, 2018 at 12:14:41PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> > > 
> > > Please try something like
> > > mount -o remount,huge=always /dev/shm
> > > cp /dev/zero /dev/shm
> > > 
> > > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > > but getting the page for offset 0x3c1 instead.
> > 
> > Hmm.  I don't see a crash while running that command,
> 
> Thanks for looking.
> 
> It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
> in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
> on this occasion? Or you don't think of an oops as a kernel crash,
> and didn't notice it in dmesg? I see now that I've arranged for oops
> to crash, since I don't like to miss them myself; but it is a very
> clean oops, no locks held, so can just kill the process and continue.

Usually I run with that turned on, but somehow in my recent messing
with my test system, that got turned off.  Once I turned it back on,
it spots the bug instantly.

> Or is there something more mysterious stopping it from showing up for
> you? It's repeatable for me. When not crashing, that "cp" should fill
> up about half of RAM before it hits the implicit tmpfs volume limit;
> but I am assuming a not entirely fragmented machine - it does need
> to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().

I tried that too, before noticing that DEBUG_VM was off; raised my test
VM's memory from 2GB to 8GB.

> Are you sure that those pages are free, rather than most of them tails
> of one of the two compound pages involved? I think it's the same in your
> rewrite of struct page, the compound_head field (lru.next), with its low
> bit set, is how to recognize a tail page.

Yes, PageTail was set, and so was TAIL_MAPPING (0xdead00400).
What was going on was the first 2MB page was being stored at indices
0-511, then the second 2MB page was being stored at indices 64-575
instead of 512-1023.
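
This misplacement also accounts for the exact offsets in Hugh's report. Assume 4 KiB base pages, so one 2MB page spans 512 page-cache indices. A lookup of 0x201 (513) should hit the second huge page in its aligned range 512-1023. With that page stored at 64-575 instead, the lookup lands on subpage 513 - 64 = 449. That subpage's real offset is 512 + 449 = 961 = 0x3c1, which is what the page_to_pgoff() check tripped on. A sketch of the arithmetic (the helper names are hypothetical):

```c
#include <assert.h>

#define HPAGE_NR 512UL	/* assumption: 4 KiB subpages per 2 MiB huge page */

/* Aligned first index of the huge page that covers @index. */
static unsigned long huge_base(unsigned long index)
{
	return index & ~(HPAGE_NR - 1);
}

/* File offset of the subpage a lookup of @index returns when the huge
 * page really covering @true_base was stored starting at @stored_base. */
static unsigned long offset_returned(unsigned long index,
				     unsigned long stored_base,
				     unsigned long true_base)
{
	assert(index >= stored_base && index < stored_base + HPAGE_NR);
	return true_base + (index - stored_base);
}
```

With correct storage (stored_base == true_base == 512) the lookup returns 0x201 itself. With stored_base == 64 it returns 0x3c1.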

I figured out a fix and pushed it to the 'ida' branch in
git://git.infradead.org/users/willy/linux-dax.git

It won't be in linux-next tomorrow because the nvdimm people have
just dumped a pile of patches into their tree that conflict with
the XArray-DAX rewrite, so Stephen has pulled the XArray tree out
of linux-next temporarily.  I didn't have time to sort out the merge
conflict today because I judged your bug report more important.

Re: [PATCH v3 1/2] dt-bindings: pfuze100: add optional disable switch-regulators binding

2018-07-23 Thread Fabio Estevam

On Mon, Jul 23, 2018 at 4:47 AM, Marco Felsch  wrote:
> This binding is used to keep the backward compatibility with the current
> dtb's [1]. The binding informs the driver that the unused switch regulators
> can be disabled.
> If it is not specified, the driver doesn't disable the switch regulators.
>
> [1] https://patchwork.kernel.org/patch/10490381/
>
> Signed-off-by: Marco Felsch 

Reviewed-by: Fabio Estevam

Re: [PATCH v3 2/2] regulator: pfuze100: add support to en-/disable switch regulators

2018-07-23 Thread Fabio Estevam

On Mon, Jul 23, 2018 at 4:47 AM, Marco Felsch  wrote:
> Add enable/disable support for switch regulators on pfuze100.
>
> Based on commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for
> switch") which is reverted due to boot regressions by commit 464a5686e6c9
> ("regulator: Revert "regulator: pfuze100: add enable/disable for switch"").
> Disabling the switch regulators will only be done if the user specifies
> "fsl,pfuze-support-disable-sw" in its device tree to keep backward
> compatibility with current dtb's [1].
>
> [1] https://patchwork.kernel.org/patch/10490381/
>
> Signed-off-by: Marco Felsch 

Reviewed-by: Fabio Estevam
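
The backward-compatibility rule in this patch reduces to an opt-in check. A hypothetical model follows: the struct and function are invented, and only the property name fsl,pfuze-support-disable-sw comes from the patch. Without the property, a disable request is ignored, so existing dtbs that rely on the switch regulators staying on keep booting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch-regulator state. */
struct sw_reg {
	bool enabled;
};

/* @support_disable_sw models "fsl,pfuze-support-disable-sw" being present.
 * Returns true if the regulator was actually switched off. */
static bool sw_reg_disable(struct sw_reg *r, bool support_disable_sw)
{
	if (!support_disable_sw)
		return false;	/* legacy dtb: keep the old never-disable behaviour */
	r->enabled = false;
	return true;
}
```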

Re: [PATCH 0/4] ia64: switch to NO_BOOTMEM

2018-07-23 Thread Luck, Tony

On Mon, Jul 23, 2018 at 08:56:54AM +0300, Mike Rapoport wrote:
> Hi,
> 
> These patches convert ia64 to use NO_BOOTMEM.
> 
> The first two patches are cleanups, the third patch reduces usage of
> 'struct bootmem_data' for easier transition and the fourth patch actually
> replaces bootmem with memblock + nobootmem.
> 
> I've tested the sim_defconfig with the ski simulator and build tested other
> defconfigs.

Boots OK on my real ia64 system.

Unless somebody else sees an issue I'll push to Linus in the next merge window.

-Tony

Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

2018-07-23 Thread Balbir Singh

On Fri, Jul 13, 2018 at 3:27 AM Johannes Weiner  wrote:
>
> PSI aggregates and reports the overall wallclock time in which the
> tasks in a system (or cgroup) wait for contended hardware resources.
>
> This helps users understand the resource pressure their workloads are
> under, which allows them to rootcause and fix throughput and latency
> problems caused by overcommitting, underprovisioning, suboptimal job
> placement in a grid, as well as anticipate major disruptions like OOM.
>
> This version 2 of the series incorporates a ton of feedback from
> PeterZ and SurenB; more details at the end of this email.
>
> Real-world applications
>
> We're using the data collected by psi (and its previous incarnation,
> memdelay) quite extensively at Facebook, with several success stories.
>
> One usecase is avoiding OOM hangs/livelocks. The reason these happen
> is because the OOM killer is triggered by reclaim not being able to
> free pages, but with fast flash devices there is *always* some clean
> and uptodate cache to reclaim; the OOM killer never kicks in, even as
> tasks spend 90% of the time thrashing the cache pages of their own
> executables. There is no situation where this ever makes sense in
> practice. We wrote a <100 line POC python script to monitor memory
> pressure and kill stuff way before such pathological thrashing leads
> to full system losses that require forcible hard resets.
>
> We've since extended and deployed this code into other places to
> guarantee latency and throughput SLAs, since they're usually violated
> way before the kernel OOM killer would ever kick in.
>
> The idea is to eventually incorporate this back into the kernel, so
> that Linux can avoid OOM livelocks (which TECHNICALLY aren't memory
> deadlocks, but are indistinguishable to the user) out of the box.
>
> We also use psi memory pressure for loadshedding. Our batch job
> infrastructure used to use heuristics based on various VM stats to
> anticipate OOM situations, with lackluster success. We switched it to
> psi and managed to anticipate and avoid OOM kills and hangs fairly
> reliably. The reduction of OOM outages in the worker pool raised the
> pool's aggregate productivity, and we were able to switch that service
> to smaller machines.
>
> Lastly, we use cgroups to isolate a machine's main workload from
> maintenance crap like package upgrades, logging, configuration, as
> well as to prevent multiple workloads on a machine from stepping on
> each others' toes. We were not able to configure this properly without
> the pressure metrics; we would see latency or bandwidth drops, but it
> would often be hard to impossible to rootcause it post-mortem.
>
> We now log and graph pressure for the containers in our fleet and can
> trivially link latency spikes and throughput drops to shortages of
> specific resources after the fact, and fix the job config/scheduling.
>
> I've also received feedback and feature requests from Android for the
> purpose of low-latency OOM killing. The on-demand stats aggregation in
> the last patch of this series is for this purpose, to allow Android to
> react to pressure before the system starts visibly hanging.
>
> How do you use this feature?
>
> A kernel with CONFIG_PSI=y will create a /proc/pressure directory with
> 3 files: cpu, memory, and io. If using cgroup2, cgroups will also have
> cpu.pressure, memory.pressure and io.pressure files, which simply
> aggregate task stalls at the cgroup level instead of system-wide.
>
> The cpu file contains one line:
>
> some avg10=2.04 avg60=0.75 avg300=0.40 total=157656722
>
> The averages give the percentage of walltime in which one or more
> tasks are delayed on the runqueue while another task has the
> CPU. They're recent averages over 10s, 1m, 5m windows, so you can tell
> short term trends from long term ones, similarly to the load average.
>

Does the mechanism scale? I am a little concerned about how frequently
this infrastructure is monitored/read/acted upon. Why aren't existing
mechanisms sufficient -- why is the avg delay calculation in the
kernel?

> The total= value gives the absolute stall time in microseconds. This
> allows detecting latency spikes that might be too short to sway the
> running averages. It also allows custom time averaging in case the
> 10s/1m/5m windows aren't adequate for the usecase (or are too coarse
> with future hardware).
>
> What to make of this "some" metric? If CPU utilization is at 100% and
> CPU pressure is 0, it means the system is perfectly utilized, with one
> runnable thread per CPU and nobody waiting. At two or more runnable
> tasks per CPU, the system is 100% overcommitted and the pressure
> average will indicate as much. From a utilization perspective this is
> a great state of course: no CPU cycles are being wasted, even when 50%
> of the threads were to go idle (as most workloads do vary). From the
> perspective of the individual job it's not great, however, and they
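
The "some" line format and the total= counter described in the cover letter can be consumed with a few lines of C. The sketch below assumes exactly the v2 line shown earlier in the cover letter and leaves out opening /proc/pressure/cpu; the struct and function names are hypothetical. The custom-window helper computes a plain rate from two total= samples. It is not necessarily the same calculation as the running averages the kernel reports.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "some" line. */
struct psi_some {
	double avg10, avg60, avg300;	/* percent of wall time */
	unsigned long long total;	/* cumulative stall time, microseconds */
};

/* Parse one already-read line; returns 0 on success, -1 on mismatch. */
static int parse_psi_some(const char *line, struct psi_some *out)
{
	int n = sscanf(line, "some avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       &out->avg10, &out->avg60, &out->avg300, &out->total);

	return n == 4 ? 0 : -1;
}

/* Stall percentage over a custom window: growth of total= between two
 * samples taken interval_us microseconds apart. */
static double psi_window_pct(unsigned long long total_before,
			     unsigned long long total_after,
			     unsigned long long interval_us)
{
	assert(interval_us > 0 && total_after >= total_before);
	return 100.0 * (double)(total_after - total_before) /
	       (double)interval_us;
}
```

For example, if total= grows by 100000 us over a 1 s sampling interval, tasks were stalled 10% of that window.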

[PATCH v2 0/2] clk: qcom: Quad SPI (qspi) clock support for sdm845

2018-07-23 Thread Douglas Anderson



This two-patch series adds the needed clock bits to use the Quad SPI
(qspi) part on sdm845.  It's expected that the bindings part of this
patch could land in the clock tree with an immutable git hash and then
be pulled into the Qualcomm tree so it could be used by dts files.

From the reply to my v1, the clock plan for this clock is:
- MinSVS@19.2
- LowSVS@75
- SVS@150
- Nominal@300
...and intermediate frequencies can be used at frequencies less than
300.  I didn't see a need for 75 MHz and it was unclear from previous
replies if this should come from MAIN or EVEN so I left it out.  I
have added 100 MHz here since it is useful (100 MHz / 4 = 25 MHz is a
useful clock for SPI flash).

OTHER NOTES:
- From probing lines, it appears that the Quad SPI block has a divide
  by 4 somewhere inside it (probably so it can oversample the lines,
  or possibly so it can generate phase-offset clocks).  Thus we need
  the core to go 4 times faster than we'd expect to run the SPI bus.
- SPI devices usually specify the MAX frequency they should be clocked
  at, so it's important that we use the clk_rcg2_floor_ops here rather
  than the clk_rcg2_ops.
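
The two notes combine into a simple rule: request four times the device's maximum SPI rate, then round down to a supported core rate. The sketch below assumes the v2 rate table and the observed divide-by-4. It models only the rounding direction that clk_rcg2_floor_ops provides, not that code itself, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Core clock rates from the v2 plan, in Hz. */
static const unsigned long qspi_core_rates[] = {
	19200000UL, 100000000UL, 150000000UL, 300000000UL,
};

#define QSPI_INTERNAL_DIV 4UL	/* observed divide-by-4 inside the block */

/* Floor rounding: highest table rate not above @req; 0 if none fits. */
static unsigned long round_rate_floor(unsigned long req)
{
	unsigned long best = 0;

	for (size_t i = 0; i < sizeof(qspi_core_rates) / sizeof(qspi_core_rates[0]); i++)
		if (qspi_core_rates[i] <= req && qspi_core_rates[i] > best)
			best = qspi_core_rates[i];
	return best;
}

/* Core rate to use for a SPI device whose maximum bus clock is @spi_max_hz. */
static unsigned long qspi_core_rate_for(unsigned long spi_max_hz)
{
	return round_rate_floor(spi_max_hz * QSPI_INTERNAL_DIV);
}
```

Consider a 30 MHz part: the request of 120 MHz floors to 100 MHz, so the bus runs at 25 MHz. Rounding to the nearest or next-higher rate would pick 150 MHz and clock the part at 37.5 MHz, above its limit.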

Changes in v2:
- Only 19.2, 100, 150, and 300 MHz now.
- All clocks come from MAIN rather than EVEN.
- Use parent map 0 instead of new parent map 9.

Douglas Anderson (2):
  clk: qcom: Add qspi (Quad SPI) clock defines for sdm845 to header
  clk: qcom: Add qspi (Quad SPI) clocks for sdm845

 drivers/clk/qcom/gcc-sdm845.c   | 63 +
 include/dt-bindings/clock/qcom,gcc-sdm845.h |  3 +
 2 files changed, 66 insertions(+)

-- 
2.18.0.233.g985f88cf7e-goog

Re: [PATCH] tpm: add support for partial reads

2018-07-23 Thread Jason Gunthorpe

On Mon, Jul 23, 2018 at 02:38:08PM -0700, Tadeusz Struk wrote:
> On 07/23/2018 02:13 PM, James Bottomley wrote:
> > The current patch does, you even provided a use case in your last email
> >  (it's do command to get sizing followed by do command with correctly
> > sized buffer). 
> 
> The example I provided was: #1 send a command, #2 read the response header
> (10 bytes), get the actual response size from the header and then #3 read
> the full response (response size - size of the header bytes).

The proposed patch doesn't clear the data_pending if the entire buffer
is not consumed, so of course it is ABI breaking, that really isn't OK.

> > However, if you tie it to O_NONBLOCK, it won't because no-one currently
> > opens the TPM device non blocking so it's an ABI conformant
> > discriminator of the uses.  Tying to O_NONBLOCK should be simple
> > because it's in file->f_flags.
> 
> I think that it might be an option, especially since I have this on top
> of the async patch. Let's discuss this when Jarkko is back.

Maybe you could do this by requiring the userspace to call pread()
with a non-zero offset to get the trailing segment of the last
executed command and leave normal read/pread(off=0) with the semantics
as they have today.

Jason
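
Step #2 of Tadeusz's example relies on the standard TPM response header. That header is 10 bytes: a 2-byte tag, a 4-byte big-endian size that includes the header, and a 4-byte return code. The sketch below shows how userspace would compute how much is left to read for step #3. It does not model the kernel's buffering or the pread() semantics Jason suggests, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPM_HEADER_SIZE 10	/* tag (2) + size (4) + return code (4) */

/* Total response size from the header's big-endian size field at offset
 * 2; returns 0 if the value is too small to be valid. */
static uint32_t tpm_response_size(const uint8_t hdr[TPM_HEADER_SIZE])
{
	uint32_t size = ((uint32_t)hdr[2] << 24) | ((uint32_t)hdr[3] << 16) |
			((uint32_t)hdr[4] << 8) | (uint32_t)hdr[5];

	return size < TPM_HEADER_SIZE ? 0 : size;
}

/* Bytes still to read after the 10-byte header (step #3). */
static size_t tpm_remaining(const uint8_t hdr[TPM_HEADER_SIZE])
{
	uint32_t size = tpm_response_size(hdr);

	return size ? (size_t)(size - TPM_HEADER_SIZE) : 0;
}
```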

Re: HID: intel_ish-hid: tx_buf memory leak on probe/remove

2018-07-23 Thread Srinivas Pandruvada

On Mon, 2018-07-23 at 20:56 +0300, Anton Vasilyev wrote:
> ish_dev_init() allocates 512*176 bytes of memory for tx_buf and stores
> it at >wr_free_list_head.link list on ish_probe().
> But there is no deallocation of this memory in ish_remove() or in the
> ish_probe() error path.
> So the current intel-ish-ipc driver leaks 88 KB of memory on each
> probe/release.
> 
> I have two ideas: 1) replace the kzalloc allocation with devm_kzalloc,
Thanks for finding this. We can replace both allocations in this function
with devm_ calls. Once you have a patch I can test.

Thanks,
Srinivas 

> or 2) release the memory stored at >wr_free_list_head.link list (and
> maybe at >wr_processing_list_head.link) on all driver exit paths.
> 
> But I do not know which way is preferable for this case.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> --
> Anton Vasilyev
> Linux Verification Center, ISPRAS
> web: http://linuxtesting.org
> e-mail: vasil...@ispras.ru
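
Anton's option 2 means walking the buffer list on every exit path, including a partially failed allocation. The sketch below is a userspace model of that option. The 512 and 176 figures come from the report; the structure and function names are hypothetical. Option 1, which Srinivas prefers, avoids this bookkeeping because devm_kzalloc() memory is released automatically when the device is unbound.

```c
#include <assert.h>
#include <stdlib.h>

#define TX_BUF_COUNT 512
#define TX_BUF_SIZE  176

/* Hypothetical tx buffer kept on a singly linked free list. */
struct tx_buf {
	struct tx_buf *next;
	unsigned char data[TX_BUF_SIZE];
};

/* Release every buffer on the list; returns how many were freed. */
static int tx_bufs_free(struct tx_buf **head)
{
	int n = 0;

	while (*head) {
		struct tx_buf *b = *head;

		*head = b->next;
		free(b);
		n++;
	}
	return n;
}

/* Allocate @count buffers onto *head. On failure, free what was already
 * allocated (the missing error path) and return -1. */
static int tx_bufs_alloc(struct tx_buf **head, int count)
{
	*head = NULL;
	for (int i = 0; i < count; i++) {
		struct tx_buf *b = calloc(1, sizeof(*b));

		if (!b) {
			tx_bufs_free(head);
			return -1;
		}
		b->next = *head;
		*head = b;
	}
	return 0;
}
```

In the driver, remove() would call the equivalent of tx_bufs_free() for both the free and the processing lists.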

Re: [PATCH RFC] debugobjects: Make stack check warning more informative

2018-07-23 Thread Yang Shi





On 7/23/18 2:25 PM, Joel Fernandes wrote:

From: "Joel Fernandes (Google)" 

Recently we debugged an issue where debugobject tracking was telling
us of an annotation issue. It turned out the issue was due to the object
in question being on a different stack, which was caused by another issue.

In a discussion, tglx suggested printing the pointers and the location
of the stack for the currently running task. This helped us find that
the object was on the wrong stack. I turned the resulting patch into
something upstreamable, so that the error message is more informative
and can help in debugging similar issues in the future.


Acked-by: Yang Shi 



Signed-off-by: Joel Fernandes (Google) 
---
  lib/debugobjects.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 994be4805cec..24c1df0d7466 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -360,9 +360,12 @@ static void debug_object_is_on_stack(void *addr, int onstack)
 
     limit++;
 
     if (is_on_stack)
-        pr_warn("object is on stack, but not annotated\n");
+        pr_warn("object %p is on stack %p, but NOT annotated.\n", addr,
+                task_stack_page(current));
     else
-        pr_warn("object is not on stack, but annotated\n");
+        pr_warn("object %p is NOT on stack %p, but annotated.\n", addr,
+                task_stack_page(current));
+
     WARN_ON(1);
 }
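
The new message reports the two pointers that the kernel's on-stack test compares: the object address and the base of the current task's stack. The object counts as on the stack if it lies in [stack, stack + THREAD_SIZE). The sketch below is a userspace model of that check and the patched messages. It passes the stack base in instead of calling task_stack_page(current), the 16 KiB THREAD_SIZE is an assumption (it varies by architecture and config), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define THREAD_SIZE (16 * 1024)	/* assumption: 16 KiB task stacks */

/* True if @obj lies inside the stack that starts at @stack. */
static bool obj_on_stack(const void *obj, const void *stack)
{
	uintptr_t o = (uintptr_t)obj, s = (uintptr_t)stack;

	return o >= s && o < s + THREAD_SIZE;
}

/* Emits the same information as the patched pr_warn() calls; returns
 * true when a mismatch between location and annotation was reported. */
static bool debug_check_on_stack(const void *addr, const void *stack,
				 bool annotated)
{
	bool on = obj_on_stack(addr, stack);

	if (on && !annotated) {
		printf("object %p is on stack %p, but NOT annotated.\n",
		       (void *)addr, (void *)stack);
		return true;
	}
	if (!on && annotated) {
		printf("object %p is NOT on stack %p, but annotated.\n",
		       (void *)addr, (void *)stack);
		return true;
	}
	return false;
}
```

Seeing both addresses makes it obvious when an object belongs to a different task's stack, which was the case that prompted the patch.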

Re: [PATCH 0/3] PTI for x86-32 Fixes and Updates

2018-07-23 Thread Pavel Machek

Hi!

> > What I want is "if A can ptrace B, and B has pti disabled, A can have
> > pti disabled as well". Now.. I see someone may want to have it
> > per-thread, because for stuff like javascript JIT, thread may have
> > rights to call ptrace, but is unable to call ptrace because JIT
> > removed that ability... hmm...
> 
> No, you don’t want that. The problem is that Meltdown isn’t a problem that 
> exists in isolation. It’s very plausible that JavaScript code could trigger a 
> speculation attack that, with PTI off, could read kernel memory.

Yeah, the web browser threads that run javascript code should have PTI
on. But maybe I want the rest of web browser with PTI off.

So... yes, I see why someone may want it per-thread (and not
per-process).

I guess per-process would be good enough for me. Actually, maybe even
per-uid. I don't have any fancy security here, so anything running uid
0 and 1000 is close enough to trusted.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



[PATCH v2 1/2] clk: qcom: Add qspi (Quad SPI) clock defines for sdm845 to header

2018-07-23 Thread Douglas Anderson

These clocks will need to be defined in the clock driver and
referenced in device tree files.

Signed-off-by: Douglas Anderson 
---

Changes in v2: None

 include/dt-bindings/clock/qcom,gcc-sdm845.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/dt-bindings/clock/qcom,gcc-sdm845.h b/include/dt-bindings/clock/qcom,gcc-sdm845.h
index f96fc2dbf60e..b8eae5a76503 100644
--- a/include/dt-bindings/clock/qcom,gcc-sdm845.h
+++ b/include/dt-bindings/clock/qcom,gcc-sdm845.h
@@ -194,6 +194,9 @@
 #define GPLL4  184
 #define GCC_CPUSS_DVM_BUS_CLK  185
 #define GCC_CPUSS_GNOC_CLK 186
+#define GCC_QSPI_CORE_CLK_SRC  187
+#define GCC_QSPI_CORE_CLK  188
+#define GCC_QSPI_CNOC_PERIPH_AHB_CLK   189
 
 /* GCC Resets */
 #define GCC_MMSS_BCR   0
-- 
2.18.0.233.g985f88cf7e-goog
