Re: [PATCH v5 3/4] arm64: Implement page table free interfaces

2018-03-28 Thread Chintan Pandya



On 3/27/2018 11:30 PM, Will Deacon wrote:

Hi Chintan,

Hi Will,



On Tue, Mar 27, 2018 at 06:54:59PM +0530, Chintan Pandya wrote:

Implement pud_free_pmd_page() and pmd_free_pte_page().

Implementation requires,
  1) Freeing of the un-used next level page tables
  2) Clearing off the current pud/pmd entry
  3) Invalidating the TLB, which could hold a previously
 valid but now stale entry

Signed-off-by: Chintan Pandya 
---
V4->V5:
  - Using __flush_tlb_kernel_pgtable instead of
flush_tlb_kernel_range


  arch/arm64/mm/mmu.c | 33 +++--
  1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index da98828..3552c7a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,6 +45,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #define NO_BLOCK_MAPPINGS	BIT(0)

  #define NO_CONT_MAPPINGS  BIT(1)
@@ -973,12 +974,40 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
  }
  
+static int __pmd_free_pte_page(pmd_t *pmd, unsigned long addr, bool tlb_inv)

+{
+   pmd_t *table;
+
+   if (pmd_val(*pmd)) {


Please can you follow what I did in 20a004e7b017 ("arm64: mm: Use
READ_ONCE/WRITE_ONCE when accessing page tables") and:

   1. Use consistent naming, so pmd_t * pmdp.
   2. Use READ_ONCE to dereference the entry once into a local.

Similarly for the pud code below.


Sure. I'll fix this in v6.




+   table = __va(pmd_val(*pmd));
+   pmd_clear(pmd);
+   if (tlb_inv)
+   __flush_tlb_kernel_pgtable(addr);
+
+   free_page((unsigned long) table);


Hmm. Surely it's only safe to call free_page if !tlb_inv in situations when
the page table is already disconnected at a higher level? That doesn't
appear to be the case with the function below, which still has the pud
installed. What am I missing?



Good point! Without the invalidation, freeing a page is not safe. I had
better call __flush_tlb_kernel_pgtable() every time. This might not be as
costly as flush_tlb_kernel_range().


+   }
+   return 1;
+}
+
  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  {
-   return pud_none(*pud);
+   pmd_t *table;
+   int i;
+
+   if (pud_val(*pud)) {
+   table = __va(pud_val(*pud));
+   for (i = 0; i < PTRS_PER_PMD; i++)
+   __pmd_free_pte_page(&table[i], addr + (i * PMD_SIZE),
+   false);
+
+   pud_clear(pud);
+   flush_tlb_kernel_range(addr, addr + PUD_SIZE);


Why aren't you using __flush_tlb_kernel_pgtable here?



Now that I will call __flush_tlb_kernel_pgtable() for every PMD, I can
use __flush_tlb_kernel_pgtable() here as well.

Previously, my thinking was that invalidating the PUD by VA would not
always be enough, because the PUD may still have a valid next-level
mapping present in the table (a valid PMD below it, but an invalid PTE
below that). In this case, doing just __flush_tlb_kernel_pgtable() for the
PUD might not be enough. We need to invalidate the subsequent tables as
well, which I was skipping as an optimization. So I used
flush_tlb_kernel_range().


I will upload v6.


Will



Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project
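
[Annotation] The ordering agreed on in the review above (read the entry
once into a local, clear it, invalidate, and only then free the table) can
be sketched in plain userspace C. This is only a model under stated
assumptions: demo_pmd_t, DEMO_READ_ONCE, demo_flush_tlb_pgtable() and
demo_pmd_free_pte_page() are hypothetical stand-ins, not the arm64
definitions, and the "TLB flush" is just a counter. It is not the v6 patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins; none of these are the real arm64 definitions. */
typedef struct { uintptr_t val; } demo_pmd_t;

/* Hypothetical single-read helper in the spirit of READ_ONCE(). */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static int demo_tlb_flushes;	/* counts simulated invalidations */

static void demo_flush_tlb_pgtable(unsigned long addr)
{
	(void)addr;		/* a real flush would target this address */
	demo_tlb_flushes++;
}

/*
 * Free the table that a pmd entry points to. The entry is read exactly
 * once, then cleared, then the (simulated) TLB is invalidated, and only
 * after that is the table memory released.
 */
static int demo_pmd_free_pte_page(demo_pmd_t *pmdp, unsigned long addr)
{
	uintptr_t val;

	assert(pmdp != NULL);

	val = DEMO_READ_ONCE(pmdp->val);	/* single read into a local */
	if (!val)
		return 1;			/* nothing installed */

	pmdp->val = 0;				/* disconnect the table */
	demo_flush_tlb_pgtable(addr);		/* always invalidate */
	free((void *)val);			/* now safe to free */
	return 1;
}
```

Calling demo_pmd_free_pte_page() on an already-empty entry returns without
flushing, which mirrors the "if (pmd_val(*pmd))" guard in the patch.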


general protection fault in account_system_index_time

2018-03-28 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
Linux 4.16-rc7
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=bab86ea70f6f74bd9199


So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4644092092874752
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4728333984071680
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6080200710291456
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+bab86ea70f6f74bd9...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

IPVS: ftp: loaded support on port[0] = 21
kasan: CONFIG_KASAN_INLINE enabled
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 164097 Comm: � Not tainted 4.16.0-rc7+ #368
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:get_running_cputimer include/linux/sched/cputime.h:85 [inline]
RIP: 0010:account_group_system_time include/linux/sched/cputime.h:149  
[inline]

RIP: 0010:account_system_index_time+0xdd/0x5e0 kernel/sched/cputime.c:172
RSP: 0018:8801db207930 EFLAGS: 00010006
RAX: 0002810100028101 RBX: 8801b22d0300 RCX: 502020005047
RDX: dc00 RSI: 000f4240 RDI: 0002810100028239
RBP: 8801db207a10 R08: 00028101 R09: fbfff0f6a2bb
R10: 8801db2079e8 R11: fbfff0f6a2ba R12: 11003b640f29
R13: 0002c5c0 R14: 000f4240 R15: 0002
FS:  019fd880() GS:8801db20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00421d60 CR3: 0001b1963003 CR4: 001606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 account_system_time+0x7f/0xb0 kernel/sched/cputime.c:203
 account_process_tick+0xd4/0x3e0 kernel/sched/cputime.c:502
 update_process_times+0x23/0x60 kernel/time/timer.c:1634
 tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
 tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
 __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
 __hrtimer_run_queues+0x39c/0xec0 kernel/time/hrtimer.c:1411
 hrtimer_interrupt+0x2a5/0x6f0 kernel/time/hrtimer.c:1469
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 
Code: ea 03 80 3c 02 00 0f 85 8b 04 00 00 48 8b 83 f8 06 00 00 48 ba 00 00  
00 00 00 fc ff df 48 8d b8 38 01 00 00 48 89 f9 48 c1 e9 03 <0f> b6 14 11  
48 89 f9 83 e1 07 38 ca 7f 08 84 d2 0f 85 90 03 00
RIP: __read_once_size include/linux/compiler.h:188 [inline] RSP:  
8801db207930
RIP: get_running_cputimer include/linux/sched/cputime.h:85 [inline] RSP:  
8801db207930
RIP: account_group_system_time include/linux/sched/cputime.h:149 [inline]  
RSP: 8801db207930
RIP: account_system_index_time+0xdd/0x5e0 kernel/sched/cputime.c:172 RSP:  
8801db207930


==
WARNING: possible circular locking dependency detected
4.16.0-rc7+ #368 Not tainted
--
rcu_sched/8 is trying to acquire lock:
 ((console_sem).lock){..-.}, at: [<11a525fb>]  
down_trylock+0x13/0x70 kernel/locking/semaphore.c:136


but task is already holding lock:
 (&rq->lock){-.-.}, at: [<4637b5ed>] rq_lock_irqsave  
kernel/sched/sched.h:1744 [inline]
 (&rq->lock){-.-.}, at: [<4637b5ed>] load_balance+0xb10/0x34c0  
kernel/sched/fair.c:8548


which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rq->lock){-.-.}:
   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
   rq_lock kernel/sched/sched.h:1760 [inline]
   task_fork_fair+0x7a/0x690 kernel/sched/fair.c:9471
   sched_fork+0x450/0xc10 kernel/sched/core.c:2405
   copy_process.part.38+0x17c9/0x4bd0 kernel/fork.c:1763
   copy_process kernel/fork.c:1606 [inline]
   _do_fork+0x1f7/0xf70 kernel/fork.c:2087
   kernel_thread+0x34/0x40 kernel/fork.c:2146
   rest_init+0x22/0xf0 init/main.c:403
   start_kernel+0x7f1/0x819 init/main.c:717
   x86_64_st

Re: [PATCH v29 1/4] mm: support reporting free page blocks

2018-03-28 Thread Michal Hocko
On Tue 27-03-18 19:07:22, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 08:33:22AM +0200, Michal Hocko wrote:
> > > > > + * The function itself might sleep so it cannot be called from atomic
> > > > > + * contexts.
> > > > I don't see how walk_free_mem_block() can sleep.
> > > 
> > > OK, it would be better to remove this sentence for the current version. 
> > > But
> > > I think we could probably keep it if we decide to add cond_resched() 
> > > below.
> > 
> > The point of this sentence was to make any user aware that the function
> > might sleep from the very beginning rather than chase existing callers
> > when we need to add cond_resched or sleep for any other reason. So I
> > would rather keep it.
> 
> Let's say what it is then - "will be changed to sleep in the future".

Do we really want to describe the precise implementation in the
documentation? I thought the main purpose of the documentation is to
describe the _contract_. If I am curious about the implementation I can
look at the code. As I've said earlier in this patchset's lifetime, this
interface is rather dangerous because we are exposing the guts of our
internal data structures. So we had better set expectations of what can and
cannot be done right from the beginning. I definitely do not want
somebody to simply look at the code and see that the interface is
sleepable and abuse that fact.
-- 
Michal Hocko
SUSE Labs
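
[Annotation] The "document the contract, not the implementation" point can
be illustrated with a small userspace model. Everything here is
hypothetical: demo_in_atomic, demo_might_sleep() and
demo_walk_free_blocks() are invented names, loosely modelled on the
kernel's might_sleep() annotation. The walker never sleeps today, but the
check at its entry flags atomic callers anyway, so the documented contract
is enforced before the implementation ever changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: true while the caller is in a context that must not sleep. */
static bool demo_in_atomic;
static int demo_contract_violations;

/* Hypothetical analogue of a "might sleep" annotation: records misuse. */
static void demo_might_sleep(void)
{
	if (demo_in_atomic)
		demo_contract_violations++;
}

/*
 * demo_walk_free_blocks - sum the sizes of all free blocks (illustrative)
 *
 * Contract: this function might sleep, so it must not be called from
 * atomic context. The current body does not sleep, but callers must not
 * rely on that.
 */
static long demo_walk_free_blocks(const long *sizes, size_t n)
{
	long total = 0;

	assert(sizes != NULL || n == 0);
	demo_might_sleep();		/* enforce the contract up front */
	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

A caller that sets demo_in_atomic before calling the walker still gets the
right sum, but the violation counter shows that the contract was broken.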


INFO: rcu detected stall in __process_echoes

2018-03-28 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
Linux 4.16-rc7
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=108696293d7a21ab688f


So far this crash happened 10 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4941573204738048
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5314352743710720
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5561943247028224
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+108696293d7a21ab6...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
	0-: (124999 ticks this GP) idle=6aa/1/4611686018427387906  
softirq=9528/9528 fqs=31245

 (t=125000 jiffies g=4714 c=4713 q=8)
NMI backtrace for cpu 0
CPU: 0 PID: 51 Comm: kworker/u4:2 Not tainted 4.16.0-rc7+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: events_unbound flush_to_ldisc
Call Trace:
 
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x186/0x1de kernel/rcu/tree.c:1375
 print_cpu_stall kernel/rcu/tree.c:1524 [inline]
 check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1592
 __rcu_pending kernel/rcu/tree.c:3361 [inline]
 rcu_pending kernel/rcu/tree.c:3423 [inline]
 rcu_check_callbacks+0x238/0xd20 kernel/rcu/tree.c:2763
 update_process_times+0x30/0x60 kernel/time/timer.c:1636
 tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
 tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
 __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
 __hrtimer_run_queues+0x39c/0xec0 kernel/time/hrtimer.c:1411
 hrtimer_interrupt+0x2a5/0x6f0 kernel/time/hrtimer.c:1469
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 
RIP: 0010:__write_once_size include/linux/compiler.h:215 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x4b/0x50 kernel/kcov.c:109
RSP: 0018:8801d96170a8 EFLAGS: 0293 ORIG_RAX: ff12
RAX: 8801d960a5c0 RBX: c90001fb8000 RCX: 82f38349
RDX:  RSI: 11003b2c15e4 RDI: c90001fb9963
RBP: 8801d96170a8 R08: 11003b2c2dbd R09: 
R10:  R11:  R12: 00057efa5704
R13: dc00 R14: 00057efa5704 R15: 3321
 echo_buf drivers/tty/n_tty.c:144 [inline]
 __process_echoes+0x5c9/0x770 drivers/tty/n_tty.c:732
 flush_echoes drivers/tty/n_tty.c:799 [inline]
 __receive_buf drivers/tty/n_tty.c:1615 [inline]
 n_tty_receive_buf_common+0x1380/0x2520 drivers/tty/n_tty.c:1709
 n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
 tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
 tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
 receive_buf drivers/tty/tty_buffer.c:475 [inline]
 flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
 process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x33c/0x400 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


[PATCH v2 0/4] Support perf -vv

2018-03-28 Thread Jin Yao
We keep getting bug reports from users who build perf on their own
without installing some of the needed libraries, such as libelf or
libbfd/libiberty.

perf still builds, but it is missing important functionality, and
users may then complain that perf has an issue or a bug.

This patch set supports 'perf -vv' and 'perf --version --build-options',
which print the compiled-in status of libraries.

If users think perf is missing some functionality, it should be very
easy for them to check the status of the libraries.

For example:

$ ./perf -vv or ./perf --version --build-options
perf version 4.13.rc5.gcb1183
                 dwarf: [ on  ]
    dwarf_getlocations: [ on  ]
                 glibc: [ on  ]
                  gtk2: [ on  ]
              libaudit: [ OFF ]
                libbfd: [ on  ]
                libelf: [ on  ]
               libnuma: [ on  ]
numa_num_possible_cpus: [ on  ]
               libperl: [ on  ]
             libpython: [ on  ]
              libslang: [ on  ]
             libcrypto: [ on  ]
             libunwind: [ on  ]
    libdw-dwarf-unwind: [ on  ]
                  zlib: [ on  ]
                  lzma: [ on  ]
             get_cpuid: [ on  ]
                   bpf: [ on  ]
[ on  ]: library is compiled-in
[ OFF ]: library is disabled in make configuration
         OR library is not installed in build environment

Jin Yao (3):
  perf config: Add some new -DHAVE_XXX to CFLAGS
  perf version: Print the compiled-in status of libraries
  perf: Support perf -vv

Jiri Olsa (1):
  tools include: Add config.h header file

 tools/include/tools/config.h | 34 ++
 tools/perf/Makefile.config   | 16 +++
 tools/perf/builtin-version.c | 68 
 tools/perf/perf.c| 22 +++---
 tools/perf/perf.h|  1 +
 5 files changed, 137 insertions(+), 4 deletions(-)
 create mode 100644 tools/include/tools/config.h

-- 
2.7.4



[PATCH v2 3/4] perf version: Print the compiled-in status of libraries

2018-03-28 Thread Jin Yao
This patch checks the values passed via CFLAGS (-DHAVE_XXX) and then
prints the status of the libraries.

For example, if HAVE_DWARF_SUPPORT is defined, the library "dwarf" is
compiled in. The patch prints the status "on" for this library;
otherwise it prints the status "OFF".

v2:
---
1. Use IS_BUILTIN macro to replace #ifdef/#endif block.

2. Print color for on/OFF.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-version.c | 68 
 1 file changed, 68 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..6e2e486 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -1,11 +1,79 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "builtin.h"
 #include "perf.h"
+#include "color.h"
 #include 
+#include 
 #include 
+#include 
+
+int version_verbose;
+
+static void on_off_print(const char *status)
+{
+   printf("[ ");
+
+   if (!strcmp(status, "OFF"))
+   color_fprintf(stdout, PERF_COLOR_RED, "%-3s", status);
+   else
+   color_fprintf(stdout, PERF_COLOR_GREEN, "%-3s", status);
+
+   printf(" ]");
+}
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: ", name);
+   on_off_print(status);
+   printf("\n");
+}
+
+#define STATUS(__d, __m)   \
+do {   \
+   if (IS_BUILTIN(__d))\
+   status_print(#__m, "on");   \
+   else\
+   status_print(#__m, "OFF");  \
+} while (0)
+
+static void library_status(void)
+{
+   STATUS(HAVE_DWARF_SUPPORT, dwarf);
+   STATUS(HAVE_DWARF_GETLOCATIONS, dwarf_getlocations);
+   STATUS(HAVE_GLIBC_SUPPORT, glibc);
+   STATUS(HAVE_GTK2_SUPPORT, gtk2);
+   STATUS(HAVE_LIBAUDIT_SUPPORT, libaudit);
+   STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
+   STATUS(HAVE_LIBELF_SUPPORT, libelf);
+   STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
+   STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus);
+   STATUS(HAVE_LIBPERL_SUPPORT, libperl);
+   STATUS(HAVE_LIBPYTHON_SUPPORT, libpython);
+   STATUS(HAVE_SLANG_SUPPORT, libslang);
+   STATUS(HAVE_LIBCRYPTO_SUPPORT, libcrypto);
+   STATUS(HAVE_LIBUNWIND_SUPPORT, libunwind);
+   STATUS(HAVE_DWARF_SUPPORT, libdw-dwarf-unwind);
+   STATUS(HAVE_ZLIB_SUPPORT, zlib);
+   STATUS(HAVE_LZMA_SUPPORT, lzma);
+   STATUS(HAVE_AUXTRACE_SUPPORT, get_cpuid);
+   STATUS(HAVE_LIBBPF_SUPPORT, bpf);
+
+   on_off_print("on");
+   printf(": library is compiled-in\n");
+
+   on_off_print("OFF");
+   printf(": library is disabled in make configuration\n");
+   printf(" OR library is not installed in build environment\n");
+}
 
 int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused)
 {
printf("perf version %s\n", perf_version_string);
+
+   if ((argc > 1 && !strcmp(argv[1], "--build-options")) ||
+   (version_verbose == 1)) {
+   library_status();
+   }
+
return 0;
 }
-- 
2.7.4
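
[Annotation] The row layout produced by status_print() and on_off_print()
comes down to two printf width specifiers: "%22s" right-aligns the library
name in 22 columns and "%-3s" left-aligns the status in 3. A minimal
sketch, assuming a hypothetical helper demo_format_status() that writes
into a buffer instead of printing with colors:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: formats one status row
 * into buf using the same width specifiers as the patch. Returns the
 * snprintf() result, i.e. the length of the formatted row.
 */
static int demo_format_status(char *buf, size_t len,
			      const char *name, int enabled)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "%22s: [ %-3s ]",
			name, enabled ? "on" : "OFF");
}
```

Every row is 31 characters wide as long as the name fits in 22 columns,
which is why the longest name in the list, numa_num_possible_cpus, starts
at column 0.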



[PATCH v2 4/4] perf: Support perf -vv

2018-03-28 Thread Jin Yao
We keep getting bug reports from users who build perf on their own
without installing some of the needed libraries, such as libelf or
libbfd/libiberty.

perf still builds, but it is missing important functionality.

This patch provides a new option '-vv' which prints the
compiled-in status of libraries.

'perf -vv' is equivalent to 'perf --version --build-options'.

For example:

$ ./perf -vv or ./perf --version --build-options
perf version 4.13.rc5.gcb1183
                 dwarf: [ on  ]
    dwarf_getlocations: [ on  ]
                 glibc: [ on  ]
                  gtk2: [ on  ]
              libaudit: [ OFF ]
                libbfd: [ on  ]
                libelf: [ on  ]
               libnuma: [ on  ]
numa_num_possible_cpus: [ on  ]
               libperl: [ on  ]
             libpython: [ on  ]
              libslang: [ on  ]
             libcrypto: [ on  ]
             libunwind: [ on  ]
    libdw-dwarf-unwind: [ on  ]
                  zlib: [ on  ]
                  lzma: [ on  ]
             get_cpuid: [ on  ]
                   bpf: [ on  ]
[ on  ]: library is compiled-in
[ OFF ]: library is disabled in make configuration
         OR library is not installed in build environment

v2:
---
Use a global variable version_verbose to count the number of 'v'.

Signed-off-by: Jin Yao 
---
 tools/perf/perf.c | 22 ++
 tools/perf/perf.h |  1 +
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 1b3fc8e..355219e 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -32,7 +32,7 @@
 #include 
 
 const char perf_usage_string[] =
-   "perf [--version] [--help] [OPTIONS] COMMAND [ARGS]";
+   "perf [--version [--build-options]] [--help] [OPTIONS] COMMAND [ARGS]";
 
 const char perf_more_info_string[] =
"See 'perf help COMMAND' for more information on a specific command.";
@@ -163,6 +163,8 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
 {
int handled = 0;
 
+   version_verbose = 0;
+
while (*argc > 0) {
const char *cmd = (*argv)[0];
if (cmd[0] != '-')
@@ -185,9 +187,21 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
break;
}
 
-   if (!strcmp(cmd, "-v")) {
-   (*argv)[0] = "--version";
-   break;
+   if (strstarts(cmd, "-v")) {
+   int i;
+
+   for (i = 2; cmd[i]; i++) {
+   if (cmd[i] == 'v')
+   version_verbose++;
+   }
+
+   /*
+* Only support -v and -vv now
+*/
+   if (version_verbose < 2) {
+   (*argv)[0] = "--version";
+   break;
+   }
}
 
/*
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 8fec1ab..a1a9795 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -84,6 +84,7 @@ struct record_opts {
 struct option;
 extern const char * const *record_usage;
 extern struct option *record_options;
+extern int version_verbose;
 
 int record__parse_freq(const struct option *opt, const char *str, int unset);
 #endif
-- 
2.7.4
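
[Annotation] The counting loop added to handle_options() starts at index
2, so it counts only the 'v' characters after the leading "-v". A
stand-alone sketch of that behaviour, assuming a hypothetical function
demo_count_verbose() and using strncmp() in place of perf's strstarts()
helper:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the counting loop: returns the
 * number of extra 'v' characters after a leading "-v", or -1 if the
 * argument does not start with "-v".
 */
static int demo_count_verbose(const char *cmd)
{
	int count = 0;

	assert(cmd != NULL);
	if (strncmp(cmd, "-v", 2) != 0)
		return -1;

	/* Start after "-v", as the patch does with i = 2. */
	for (int i = 2; cmd[i]; i++) {
		if (cmd[i] == 'v')
			count++;
	}
	return count;
}
```

With this counting, "-v" yields 0 and "-vv" yields 1, which matches the
"version_verbose == 1" test in cmd_version(). An argument such as "-vvv"
yields 2 and, reading the hunk as posted, would skip the rewrite to
"--version".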



[PATCH v2 1/4] tools include: Add config.h header file

2018-03-28 Thread Jin Yao
From: Jiri Olsa 

Add the IS_BUILTIN macro and its dependencies to the
tools world.

It's taken from the kernel's include/linux/kconfig.h,
which can't be taken completely due to its kconfig
dependencies.

Signed-off-by: Jiri Olsa 
---
 tools/include/tools/config.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 tools/include/tools/config.h

diff --git a/tools/include/tools/config.h b/tools/include/tools/config.h
new file mode 100644
index 000..08ade7d
--- /dev/null
+++ b/tools/include/tools/config.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TOOLS_CONFIG_H
+#define _TOOLS_CONFIG_H
+
+/* Subset of include/linux/kconfig.h */
+
+#define __ARG_PLACEHOLDER_1 0,
+#define __take_second_arg(__ignored, val, ...) val
+
+/*
+ * Helper macros to use CONFIG_ options in C/CPP expressions. Note that
+ * these only work with boolean and tristate options.
+ */
+
+/*
+ * Getting something that works in C and CPP for an arg that may or may
+ * not be defined is tricky.  Here, if we have "#define CONFIG_BOOGER 1"
+ * we match on the placeholder define, insert the "0," for arg1 and generate
+ * the triplet (0, 1, 0).  Then the last step cherry picks the 2nd arg (a one).
+ * When CONFIG_BOOGER is not defined, we generate a (... 1, 0) pair, and when
+ * the last step cherry picks the 2nd arg, we get a zero.
+ */
+#define __is_defined(x) ___is_defined(x)
+#define ___is_defined(val) is_defined(__ARG_PLACEHOLDER_##val)
+#define is_defined(arg1_or_junk)   __take_second_arg(arg1_or_junk 1, 0)
+
+/*
+ * IS_BUILTIN(CONFIG_FOO) evaluates to 1 if CONFIG_FOO is set to 'y', 0
+ * otherwise. For boolean options, this is equivalent to
+ * IS_ENABLED(CONFIG_FOO).
+ */
+#define IS_BUILTIN(option) __is_defined(option)
+
+#endif /* _TOOLS_CONFIG_H */
-- 
2.7.4
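
[Annotation] The placeholder trick described in the header comment can be
tried outside the tree. The sketch below renames the macros with a DEMO_
prefix (to avoid reserved identifiers) and passes one extra dummy argument
so the variadic part is never empty; both changes are assumptions made for
this example and are not in the patch. HAVE_DEMO_ON and HAVE_DEMO_OFF are
made-up option names.

```c
#include <assert.h>
#include <string.h>

/* Matches only when the option expands to exactly 1. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val

/*
 * Defined to 1:  placeholder expands to "0,"  -> (0, 1, 0, 0) -> 1
 * Not defined:   placeholder stays one token  -> (junk 1, 0, 0) -> 0
 * The trailing 0 is a dummy that keeps the variadic part non-empty.
 */
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) \
	DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define HAVE_DEMO_ON 1		/* what -DHAVE_DEMO_ON would give */
/* HAVE_DEMO_OFF is deliberately left undefined. */

/* Maps the 0/1 result to the strings used in the status table. */
static const char *demo_status(int builtin)
{
	return builtin ? "on" : "OFF";
}
```

Because the result is an ordinary integer constant, it can be used in a
plain "if" as well as in "#if", which is what lets patch 3/4 replace its
#ifdef blocks with a single STATUS() macro.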



Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.

2018-03-28 Thread NeilBrown
On Wed, Mar 28 2018, Herbert Xu wrote:

> On Wed, Mar 28, 2018 at 08:34:19AM +1100, NeilBrown wrote:
>>
>> It is easy to get an -EBUSY insertion failure when .disable_count is
>> enabled, and I did get that.  Blindly propagating that up caused lustre
>> to get terribly confused - not too surprising really.
>
> Right, so this failure mode is specific to your patch 6.

I disagree.  My patch 6 only makes it common instead of exceedingly
rare.  If any table in the list other than the first has a chain with 16
elements, then trying to insert an element with a hash which matches
that chain will fail with -EBUSY.  This is theoretically possible
already, though astronomically unlikely.  So that case will never be
tested for.

>
> I think I see the problem.  As it currently stands, we never
> need growing when we hit the bucket length limit.  We only do
> rehashes.
>
> With your patch, you must change it so that we actually try to
> grow the table if necessary when we hit the bucket length limit.

It is hard to know if it is necessary.  And making the new table larger
will make the error less likely, but still won't make it impossible.  So
callers will have to handle it - just like they currently have to handle
-ENOMEM even though it is highly unlikely (and not strictly necessary).

Are these errors ever actually useful?  I thought I had convinced myself
before that they were (to throttle attacks on the hash function), but
they happen even less often than I thought.

>
> Otherwise it will result in the EBUSY that you're seeing.
>
> I also think that we shouldn't make this a toggle.  If we're going
> to do disable_count then it should be unconditionally done for
> everyone.
>
> However, I personally would prefer a percpu elem count instead of
> disabling it completely.  Would that be acceptable to your use-case?

Maybe. Reading a percpu counter isn't cheap.  Reading it whenever a hash
chain reaches 16 is reasonable, but I think we would want to read it a
lot more often than that.  So probably store the last-sampled time (with
no locking) and only sample the counter if last-sampled is older than
 jiffies - 10*HZ (???)

In the case in lustre we also shard the LRU list so that adding to the
LRU causes minimal contention. Keeping a shared counter together with
the lru is trivial and summing them periodically is little burden.
Maybe it makes sense to include that functionality in rhashtables so
that it is there for everyone.

A percpu counter uses a lot more memory than atomic_t.  Given that some
callers set nelem_hint to 2 or 3, it seems likely that those callers
don't want to waste memory.  Should we force them to?

Thanks,
NeilBrown
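
[Annotation] The "sample the sharded counter only every 10*HZ" idea from
the message above can be sketched as a single-threaded userspace model. All
names (demo_counter, DEMO_HZ, DEMO_NR_SHARDS) are hypothetical, the tick
value is passed in by the caller instead of reading jiffies, and there are
no atomics or per-CPU accessors here; it only shows the caching logic, not
an rhashtable implementation.

```c
#include <assert.h>

#define DEMO_NR_SHARDS 4
#define DEMO_HZ 100UL
#define DEMO_SAMPLE_INTERVAL (10 * DEMO_HZ)

struct demo_counter {
	long shard[DEMO_NR_SHARDS];	/* one slot per CPU/shard */
	long cached_sum;		/* total seen at the last sample */
	unsigned long last_sampled;	/* tick of the last full sum */
	int full_sums;			/* number of expensive sums done */
};

/* Cheap path: touch only the local shard. */
static void demo_counter_add(struct demo_counter *c, int shard, long delta)
{
	assert(shard >= 0 && shard < DEMO_NR_SHARDS);
	c->shard[shard] += delta;
}

/*
 * Return an approximate total. The shards are summed only if the last
 * sample is at least DEMO_SAMPLE_INTERVAL ticks old; otherwise the
 * cached value is returned.
 */
static long demo_counter_read(struct demo_counter *c, unsigned long now)
{
	if (now - c->last_sampled >= DEMO_SAMPLE_INTERVAL) {
		long sum = 0;

		for (int i = 0; i < DEMO_NR_SHARDS; i++)
			sum += c->shard[i];
		c->cached_sum = sum;
		c->last_sampled = now;
		c->full_sums++;
	}
	return c->cached_sum;
}
```

The trade-off is visible in the model: reads between samples are cheap but
can be stale by up to one interval, which is the price of not summing the
shards on every lookup.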




Re: [PATCH v2 0/5] allow override of bus format in bridges

2018-03-28 Thread Daniel Vetter
On Mon, Mar 26, 2018 at 11:24:42PM +0200, Peter Rosin wrote:
> Hi!
> 
> [I got to v2 sooner than expected]
> 
> I have an Atmel sama5d31 hooked up to an lvds encoder and then
> on to an lvds panel. Which seems like something that has been
> done one or two times before...
> 
> The problem is that the bus_format of the SoC and the panel do
> not agree. The SoC driver (atmel-hlcdc) can handle the
> rgb444, rgb565, rgb666 and rgb888 bus formats. The hardware is
> wired for the rgb565 case. The lvds encoder supports rgb888 on
> its input side with the LSB wires for each color simply pulled
> down internally in the encoder in my case which means that the
> rgb565 bus_format is the format that works best. And the panel
> is expecting lvds (vesa-24), which is what the encoder outputs.
> 
> The reason I "blame" the bus_format of the drm_connector is that
> with the below DT snippet, things do not work *exactly* due to
> that. At least, it starts to work if I hack the panel-lvds driver
> to report the rgb565 bus_format instead of vesa-24.
> 
>   panel: panel {
>   compatible = "panel-lvds";
> 
>   width-mm = <304>;
>   height-mm = <228>;
> 
>   data-mapping = "vesa-24";
> 
>   panel-timing {
>   // 1024x768 @ 60Hz (typical)
>   clock-frequency = <5214 6500 7110>;
>   hactive = <1024>;
>   vactive = <768>;
>   hfront-porch = <48 88 88>;
>   hback-porch = <96 168 168>;
>   hsync-len = <32 64 64>;
>   vfront-porch = <8 13 14>;
>   vback-porch = <8 13 14>;
>   vsync-len = <8 12 14>;
>   };
> 
>   port {
>   panel_input: endpoint {
>   remote-endpoint = <&lvds_encoder_output>;
>   };
>   };
>   };
> 
>   lvds-encoder {
>   compatible = "ti,ds90c185", "lvds-encoder";
> 
>   ports {
>   #address-cells = <1>;
>   #size-cells = <0>;
> 
>   port@0 {
>   reg = <0>;
> 
>   lvds_encoder_input: endpoint {
>   remote-endpoint = <&hlcdc_output>;
>   };
>   };
> 
>   port@1 {
>   reg = <1>;
> 
>   lvds_encoder_output: endpoint {
>   remote-endpoint = <&panel_input>;
>   };
>   };
>   };
>   };
> 
> But, instead of perverting the panel-lvds driver with support
> for a totally fake non-lvds bus_format, I introduce an API that allows
> display controller drivers to query the required bus_format of any
> intermediate bridges, and match up with that instead of the formats
> given by the drm_connector. I trigger this with this addition to the
> lvds-encoder DT node:
> 
>   interface-pix-fmt = "rgb565";
> 
> Naming is hard though, so I'm not sure if that's good?
> 
> I threw in the first patch, since that is the actual lvds encoder
> I have in this case.
> 
> Suggestions welcome.

Took a quick look, feels rather un-atomic. And there's been discussion
about other bridge-related state that we might want to track (like the full
adjusted_mode that might need to be adjusted at each stage in the chain).
So here are my suggestions:

- Add an optional per-bridge internal state struct using the support in

https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#handling-driver-private-state

  Yes it says "driver private", but since bridge is just helper stuff
  that's all included. "driver private" == "not exposed as uapi" here.
  Include all the usual convenience wrappers to get at the state for a
  bridge.

- Then stuff your bus_format into that new drm_bridge_state struct.

- Add a new bridge callback atomic_check, which gets that bridge state as
  parameter (similar to all the other atomic_check functions).

This way we can even handle the bus_format dynamically: through the atomic
framework your bridge's atomic_check callback can look at the entire
atomic state (both up and down the chain if needed), it all fits neatly
into atomic overall, and it's much easier to extend.

Please also cc Laurent Pinchart on this.

Cheers, Daniel
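
[Annotation] The shape of the suggestion above (a per-bridge state struct
that carries bus_format, filled in by a per-bridge atomic_check callback)
can be sketched as a userspace model. None of this is the DRM API: every
type and function below is a hypothetical demo_* stand-in, and the real
design would hang off the atomic state helpers linked in the message.

```c
#include <assert.h>
#include <stddef.h>

enum demo_bus_format { DEMO_FMT_UNSET = 0, DEMO_FMT_RGB565, DEMO_FMT_RGB888 };

/* Hypothetical per-bridge state: what the bridge needs on its input. */
struct demo_bridge_state {
	enum demo_bus_format bus_format;
};

struct demo_bridge {
	/* Returns 0 on success, negative on rejection; may fill *state. */
	int (*atomic_check)(const struct demo_bridge *bridge,
			    struct demo_bridge_state *state);
	enum demo_bus_format wired_format;	/* e.g. taken from DT */
};

/* Example callback: an encoder that reports how its input is wired. */
static int demo_encoder_check(const struct demo_bridge *bridge,
			      struct demo_bridge_state *state)
{
	if (bridge->wired_format == DEMO_FMT_UNSET)
		return -1;
	state->bus_format = bridge->wired_format;
	return 0;
}

/* Run every bridge's check in chain order; stop at the first failure. */
static int demo_check_chain(const struct demo_bridge *bridges,
			    struct demo_bridge_state *states, size_t n)
{
	assert(bridges != NULL && states != NULL);
	for (size_t i = 0; i < n; i++) {
		if (bridges[i].atomic_check) {
			int ret = bridges[i].atomic_check(&bridges[i],
							  &states[i]);
			if (ret)
				return ret;
		}
	}
	return 0;
}
```

After a successful check, a display controller in this model would read
states[0].bus_format (the bridge closest to it) instead of the format
reported by the connector.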


> 
> Cheers,
> Peter
> 
> Changes since v1 https://lkml.org/lkml/2018/3/17/221
> - Add a proper bridge API to query the bus_format instead of abusing
>   the ->get_modes part of the code. This is cleaner but requires
>   changes to all display controller drivers wishing to participate.
> - Add patch to adjust the atmel-hlcdc driver according to the above.
> - Hook the new info into the bridge local to the lvds-encoder instead
>   of messi

INFO: rcu detected stall in commit_echoes

2018-03-28 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
Linux 4.16-rc7
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=a0366f667e2cac7c0bbf


So far this crash happened 3 times on upstream.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6443860423081984
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a0366f667e2cac7c0...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
	0-: (124999 ticks this GP) idle=692/1/4611686018427387906  
softirq=21889/21889 fqs=31226

 (t=125000 jiffies g=11261 c=11260 q=580)
NMI backtrace for cpu 0
CPU: 0 PID: 215 Comm: kworker/u4:4 Not tainted 4.16.0-rc7+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: events_unbound flush_to_ldisc
Call Trace:
 
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x186/0x1de kernel/rcu/tree.c:1375
 print_cpu_stall kernel/rcu/tree.c:1524 [inline]
 check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1592
 __rcu_pending kernel/rcu/tree.c:3361 [inline]
 rcu_pending kernel/rcu/tree.c:3423 [inline]
 rcu_check_callbacks+0x238/0xd20 kernel/rcu/tree.c:2763
 update_process_times+0x30/0x60 kernel/time/timer.c:1636
 tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
 tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
 __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
 __hrtimer_run_queues+0x39c/0xec0 kernel/time/hrtimer.c:1411
 hrtimer_interrupt+0x2a5/0x6f0 kernel/time/hrtimer.c:1469
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50
RSP: 0018:8801d93e7070 EFLAGS: 0217 ORIG_RAX: ff12
RAX:  RBX: c90007f8a000 RCX: 82f38392
RDX: 0007 RSI: 11003b278d6c RDI: c90007f8bb1f
RBP: 8801d93e70e0 R08: 11003b27cdb5 R09: 
R10:  R11:  R12: 000577d218be
R13: dc00 R14: 08bf R15: 1ed0
 commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
 n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
 n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
 __receive_buf drivers/tty/n_tty.c:1611 [inline]
 n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
 n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
 tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
 tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
 receive_buf drivers/tty/tty_buffer.c:475 [inline]
 flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
 process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x33c/0x400 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


Re: INFO: rcu detected stall in commit_echoes

2018-03-28 Thread Dmitry Vyukov
#syz dup: INFO: rcu detected stall in __process_echoes

On Wed, Mar 28, 2018 at 9:11 AM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
> Linux 4.16-rc7
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=a0366f667e2cac7c0bbf
>
> So far this crash happened 3 times on upstream.
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6443860423081984
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a0366f667e2cac7c0...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: rcu_sched self-detected stall on CPU
> 0-: (124999 ticks this GP) idle=692/1/4611686018427387906
> softirq=21889/21889 fqs=31226
>  (t=125000 jiffies g=11261 c=11260 q=580)
> NMI backtrace for cpu 0
> CPU: 0 PID: 215 Comm: kworker/u4:4 Not tainted 4.16.0-rc7+ #3
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x186/0x1de kernel/rcu/tree.c:1375
>  print_cpu_stall kernel/rcu/tree.c:1524 [inline]
>  check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1592
>  __rcu_pending kernel/rcu/tree.c:3361 [inline]
>  rcu_pending kernel/rcu/tree.c:3423 [inline]
>  rcu_check_callbacks+0x238/0xd20 kernel/rcu/tree.c:2763
>  update_process_times+0x30/0x60 kernel/time/timer.c:1636
>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>  __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
>  __hrtimer_run_queues+0x39c/0xec0 kernel/time/hrtimer.c:1411
>  hrtimer_interrupt+0x2a5/0x6f0 kernel/time/hrtimer.c:1469
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
>  
> RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50
> RSP: 0018:8801d93e7070 EFLAGS: 0217 ORIG_RAX: ff12
> RAX:  RBX: c90007f8a000 RCX: 82f38392
> RDX: 0007 RSI: 11003b278d6c RDI: c90007f8bb1f
> RBP: 8801d93e70e0 R08: 11003b27cdb5 R09: 
> R10:  R11:  R12: 000577d218be
> R13: dc00 R14: 08bf R15: 1ed0
>  commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
>  n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
>  n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
>  __receive_buf drivers/tty/n_tty.c:1611 [inline]
>  n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
>  n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>  tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
>  tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
>  receive_buf drivers/tty/tty_buffer.c:475 [inline]
>  flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
>  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/

WARNING in kvm_arch_vcpu_ioctl_run (3)

2018-03-28 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
99fec39e7725d091c94d1bb0242e40c8092994f6 (Fri Mar 23 22:34:18 2018 +)
Merge tag 'trace-v4.16-rc4' of  
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=760a73552f47a8cd0fd9


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6275011434250240
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5034017172441945317

compiler: gcc (GCC) 7.1.1 20170620
user-space arch: i386

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+760a73552f47a8cd0...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

WARNING: CPU: 1 PID: 9515 at arch/x86/kvm/x86.c:7544  
kvm_arch_vcpu_ioctl_run+0x1c7/0x5c80 arch/x86/kvm/x86.c:7544

Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 9515 Comm: syz-executor4 Not tainted 4.16.0-rc6+ #274
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x1f4/0x2b0 lib/bug.c:186
 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x1c7/0x5c80 arch/x86/kvm/x86.c:7544
RSP: 0018:8801a2d17580 EFLAGS: 00010212
RAX: 0001 RBX: 8801cdfd8000 RCX: 810dfea7
RDX: 0062 RSI: c90003c1b000 RDI: 8801ac1a8498
RBP: 8801a2d17910 R08: 110035835b2d R09: 0001
R10: 8801a2d17560 R11: 0005 R12: 
R13: 8801ab083100 R14: 8801ac1a8280 R15: 8801ac1a8280
 kvm_vcpu_ioctl+0x6f1/0xff0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2560
 kvm_vcpu_compat_ioctl+0x364/0x450  
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2755

 C_SYSC_ioctl fs/compat_ioctl.c:1461 [inline]
 compat_SyS_ioctl+0x151/0x2a30 fs/compat_ioctl.c:1407
 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
 do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f41c99
RSP: 002b:f773d09c EFLAGS: 0286 ORIG_RAX: 0036
RAX: ffda RBX: 0019 RCX: ae80
RDX:  RSI:  RDI: 
RBP:  R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


Re: [PATCH] mtd: Replace typedef with struct

2018-03-28 Thread Boris Brezillon
On Tue, 27 Mar 2018 21:32:19 +0200
Richard Weinberger  wrote:

> Am Sonntag, 18. März 2018, 18:51:23 CEST schrieb Arushi Singhal:
> > Using typedef for a structure type is not suggested in Linux kernel
> > coding style guidelines. Hence, occurrence of typedefs has been
> > removed.
> > 
> > Signed-off-by: Arushi Singhal 
> > ---
> >  drivers/mtd/ssfdc.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/mtd/ssfdc.c b/drivers/mtd/ssfdc.c
> > index 95f0bf9..8bae672 100644
> > --- a/drivers/mtd/ssfdc.c
> > +++ b/drivers/mtd/ssfdc.c
> > @@ -54,15 +54,15 @@ SumSector   2,000   4,000   8,000   16,000  32,000  
> > 64,000  128,000 256,000
> >  SectorSize 512 512 512 512 512 512 512 512
> >  **/
> >  
> > -typedef struct {
> > +struct chs_entry {
> > unsigned long size;
> > unsigned short cyl;
> > unsigned char head;
> > unsigned char sec;
> > -} chs_entry_t;
> > +};
> >  
> >  /* Must be ordered by size */
> > -static const chs_entry_t chs_table[] = {
> > +static const struct chs_entry chs_table[] = {
> > { MiB(  1), 125,  4,  4 },
> > { MiB(  2), 125,  4,  8 },
> > { MiB(  4), 250,  4,  8 },
> >   
> 
> Didn't we already talk about coding style fixes on existing code? ;-)

I'll add one thing to Richard's complaint: please stop sending new
coding style or cosmetic changes until the previous ones have been
accepted.

There's a reason I don't apply those patches right away even though
they are simple. Those patches have a low priority in my review list
and improvements or fixes usually get reviewed before them. The reason
I do that is:

1/ I want to reward contributors who submit things that actually matter
   to the subsystem
2/ It tends to discourage drive-by contributors whose only interest is
   to get a lot of trivial patches in the kernel

Note that I'm not saying never, but you have to accept a longer wait
for this kind of patch, and more importantly, if you keep sending
only coding style patches, we might decide to ignore your contributions
at some point.

Regards,

Boris

-- 
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH 4/6] rhashtable: allow a walk of the hash table without missing objects.

2018-03-28 Thread NeilBrown
On Wed, Mar 28 2018, Herbert Xu wrote:

> On Wed, Mar 28, 2018 at 08:54:41AM +1100, NeilBrown wrote:
>>
>> Possibly.
>> I particularly want the interface to require that you pass the
>> previously returned object to _continue. That makes it easy to see that
>> the object is still being used.  If someone changes the code to delete
>> the object before the _continue, there should be a strong hint that it
>> won't work.
>> 
>> Maybe it would be better to make it a WARN_ON()
>> 
>>   if (!obj || WARN_ON(iter->p != obj))
>>  iter->p = NULL;
>
> This doesn't really protect against the case where obj is removed.
> All it proves is that the user saved a copy of obj which we already
> did anyway.

True.  We ultimately need to trust the user not to do something silly.
We can do little things to help catch obvious errors though.  That was
my intent with the WARN_ON.

>
> To detect an actual removal you'd need to traverse the list.

Yes.  You could reset ->skip to zero and search for the given object.
That feels a bit like excess hand-holding to me, but probably wouldn't
be too expensive.  It only happens when you need to drop out of RCU, and
in that case you are probably already doing something expensive.

>
> I have another idea: we could insert the walkers into the
> hash table chain at the end, essentially as a hidden list.  We
> can mark it with a flag like rht_marker so that normal traversal
> doesn't see it.
>
> That way the removal code can simply traverse that list and inform
> them that the iterator is invalid.

Sounds like over-kill to me.
It might be reasonable to have a CONFIG_DEBUG_RHASHTABLE which enables
extra code to catch misuse, but I don't see the justification for
always performing these checks.
The DEBUG code could just scan the chain (usually quite short) to see if
the given element is present.  Of course it might have already been
rehashed to the next table, so you would need to allow for that possibility -
probably check tbl->rehash.

Thanks,
NeilBrown

>
> Cheers,
> -- 
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
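
The debug-only check NeilBrown describes above (scan the usually short chain to see whether the object handed back to the walker is still present) might look roughly like the following. This is a standalone userspace sketch, not rhashtable code: `struct node`, `struct walker`, `chain_contains()` and `walk_continue_check()` are hypothetical names, the chain is a plain singly linked list, and RCU and the "already rehashed to the next table" case are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry in a hash chain. */
struct node {
	struct node *next;
	int key;
};

/* Hypothetical walker state: remembers the last object handed out. */
struct walker {
	struct node *p;
};

/* Scan the chain and report whether obj is still linked into it. */
static bool chain_contains(const struct node *head, const struct node *obj)
{
	for (const struct node *n = head; n; n = n->next)
		if (n == obj)
			return true;
	return false;
}

/*
 * The caller must pass back the object it was given. Returns true if the
 * walk can resume after obj, or false (clearing w->p) if it has to restart
 * because obj is missing, mismatched, or no longer on the chain.
 */
static bool walk_continue_check(struct walker *w, const struct node *head,
				const struct node *obj)
{
	assert(w != NULL);

	if (!obj || w->p != obj || !chain_contains(head, obj)) {
		w->p = NULL;
		return false;
	}
	return true;
}
```

The pointer comparison alone only proves the caller kept a copy of the object, which is Herbert's objection; the chain scan is what actually detects a removal, at the cost of one extra pass over the bucket.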




Re: [PATCH] ANDROID: binder: change down_write to down_read

2018-03-28 Thread Martijn Coenen
On Wed, Mar 28, 2018 at 4:42 AM, Minchan Kim  wrote:
> binder_update_page_range needs down_write of mmap_sem because
> vm_insert_page needs to change vma->vm_flags to VM_MIXEDMAP unless
> it is already set. However, when I profiled binder at work, it seems
> every binder buffer is mapped in advance by binder_mmap.

Yeah this is correct - before doing any binder transactions
binder_mmap() must be called, and we do fail transactions if that
hasn't been done yet. LGTM once you've removed the WARN_ON.

Martijn

> It means we could set VM_MIXEDMAP at binder_mmap time, which
> already holds mmap_sem as down_write, so binder_update_page_range
> doesn't need to hold mmap_sem as down_write.
>
> Android suffers from mmap_sem contention so let's reduce mmap_sem
> down_write.
>
> Cc: Martijn Coenen 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Minchan Kim 
> ---
>  drivers/android/binder.c   | 2 +-
>  drivers/android/binder_alloc.c | 8 +---
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 764b63a5aade..9a14c6dd60c4 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -4722,7 +4722,7 @@ static int binder_mmap(struct file *filp, struct 
> vm_area_struct *vma)
> failure_string = "bad vm_flags";
> goto err_bad_arg;
> }
> -   vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;
> +   vma->vm_flags |= (VM_DONTCOPY | VM_MIXEDMAP) & ~VM_MAYWRITE;
> vma->vm_ops = &binder_vm_ops;
> vma->vm_private_data = proc;
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 5a426c877dfb..a184bf12eb15 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -219,7 +219,7 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> mm = alloc->vma_vm_mm;
>
> if (mm) {
> -   down_write(&mm->mmap_sem);
> +   down_read(&mm->mmap_sem);
> vma = alloc->vma;
> }
>
> @@ -229,6 +229,8 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> goto err_no_vma;
> }
>
> +   WARN_ON_ONCE(vma && !(vma->vm_flags & VM_MIXEDMAP));
> +
> for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
> int ret;
> bool on_lru;
> @@ -288,7 +290,7 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> /* vm_insert_page does not seem to increment the refcount */
> }
> if (mm) {
> -   up_write(&mm->mmap_sem);
> +   up_read(&mm->mmap_sem);
> mmput(mm);
> }
> return 0;
> @@ -321,7 +323,7 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> }
>  err_no_vma:
> if (mm) {
> -   up_write(&mm->mmap_sem);
> +   up_read(&mm->mmap_sem);
> mmput(mm);
> }
> return vma ? -ENOMEM : -ESRCH;
> --
> 2.17.0.rc1.321.gba9d0f2565-goog
>
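
One detail in the quoted binder_mmap() hunk may be worth double-checking: `flags |= (A | B) & ~C` only ORs bits in, so it appears that the new line can no longer clear VM_MAYWRITE the way the removed line did. The standalone sketch below illustrates the difference. The flag values are assumptions copied in for illustration only, and `old_form()`, `new_form()`, `split_form()` and `vm_flags_demo()` are hypothetical helper names, not anything from the binder driver.

```c
#include <assert.h>

/* Flag values assumed for illustration only. */
#define VM_MAYWRITE 0x00000020UL
#define VM_DONTCOPY 0x00020000UL
#define VM_MIXEDMAP 0x10000000UL

/* The line the patch removes: sets VM_DONTCOPY, clears VM_MAYWRITE. */
static unsigned long old_form(unsigned long flags)
{
	return (flags | VM_DONTCOPY) & ~VM_MAYWRITE;
}

/* The line the patch adds: the mask applies to the constant only. */
static unsigned long new_form(unsigned long flags)
{
	flags |= (VM_DONTCOPY | VM_MIXEDMAP) & ~VM_MAYWRITE;
	return flags;
}

/* One possible way to set both bits and still clear VM_MAYWRITE. */
static unsigned long split_form(unsigned long flags)
{
	flags |= VM_DONTCOPY | VM_MIXEDMAP;
	flags &= ~VM_MAYWRITE;
	return flags;
}

/* Start from a mapping that has VM_MAYWRITE set and compare the forms. */
static void vm_flags_demo(void)
{
	unsigned long start = VM_MAYWRITE;

	assert(!(old_form(start) & VM_MAYWRITE));   /* cleared */
	assert(new_form(start) & VM_MAYWRITE);      /* still set */
	assert(!(split_form(start) & VM_MAYWRITE)); /* cleared */
	assert(split_form(start) & VM_MIXEDMAP);    /* and MIXEDMAP set */
}
```

Splitting the update into an OR followed by an AND-NOT keeps the two intentions (add bits, drop a bit) visibly separate.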


Re: [Intel-gfx] [PATCH v4 1/2] drm: Use srcu to protect drm_device.unplugged

2018-03-28 Thread Daniel Vetter
On Wed, Mar 28, 2018 at 09:47:40AM +0300, Oleksandr Andrushchenko wrote:
> From: Noralf Trønnes 
> 
> Use srcu to protect drm_device.unplugged in a race free manner.
> Drivers can use drm_dev_enter()/drm_dev_exit() to protect and mark
> sections preventing access to device resources that are not available
> after the device is gone.
> 
> Suggested-by: Daniel Vetter 
> Signed-off-by: Noralf Trønnes 
> Reviewed-by: Oleksandr Andrushchenko 
> Tested-by: Oleksandr Andrushchenko 
> Cc: intel-...@lists.freedesktop.org

When you apply/forward a patch we also need your s-o-b line, even if you
changed nothing. sob needs to reflect the full record of everyone who
handled a patch from author to when it finally lands in git.
-Daniel
> ---
>  drivers/gpu/drm/drm_drv.c | 54 
> ++-
>  include/drm/drm_device.h  |  9 +++-
>  include/drm/drm_drv.h | 15 +
>  3 files changed, 68 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index a1b9338736e3..32a83b41ab61 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -75,6 +76,8 @@ static bool drm_core_init_complete = false;
>  
>  static struct dentry *drm_debugfs_root;
>  
> +DEFINE_STATIC_SRCU(drm_unplug_srcu);
> +
>  /*
>   * DRM Minors
>   * A DRM device can provide several char-dev interfaces on the DRM-Major. 
> Each
> @@ -318,18 +321,51 @@ void drm_put_dev(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_put_dev);
>  
> -static void drm_device_set_unplugged(struct drm_device *dev)
> +/**
> + * drm_dev_enter - Enter device critical section
> + * @dev: DRM device
> + * @idx: Pointer to index that will be passed to the matching drm_dev_exit()
> + *
> + * This function marks and protects the beginning of a section that should 
> not
> + * be entered after the device has been unplugged. The section end is marked
> + * with drm_dev_exit(). Calls to this function can be nested.
> + *
> + * Returns:
> + * True if it is OK to enter the section, false otherwise.
> + */
> +bool drm_dev_enter(struct drm_device *dev, int *idx)
> +{
> + *idx = srcu_read_lock(&drm_unplug_srcu);
> +
> + if (dev->unplugged) {
> + srcu_read_unlock(&drm_unplug_srcu, *idx);
> + return false;
> + }
> +
> + return true;
> +}
> +EXPORT_SYMBOL(drm_dev_enter);
> +
> +/**
> + * drm_dev_exit - Exit device critical section
> + * @idx: index returned from drm_dev_enter()
> + *
> + * This function marks the end of a section that should not be entered after
> + * the device has been unplugged.
> + */
> +void drm_dev_exit(int idx)
>  {
> - smp_wmb();
> - atomic_set(&dev->unplugged, 1);
> + srcu_read_unlock(&drm_unplug_srcu, idx);
>  }
> +EXPORT_SYMBOL(drm_dev_exit);
>  
>  /**
>   * drm_dev_unplug - unplug a DRM device
>   * @dev: DRM device
>   *
>   * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> - * userspace operations. Entry-points can use drm_dev_is_unplugged(). This
> + * userspace operations. Entry-points can use drm_dev_enter() and
> + * drm_dev_exit() to protect device resources in a race free manner. This
>   * essentially unregisters the device like drm_dev_unregister(), but can be
>   * called while there are still open users of @dev.
>   */
> @@ -338,10 +374,18 @@ void drm_dev_unplug(struct drm_device *dev)
>   drm_dev_unregister(dev);
>  
>   mutex_lock(&drm_global_mutex);
> - drm_device_set_unplugged(dev);
>   if (dev->open_count == 0)
>   drm_dev_put(dev);
>   mutex_unlock(&drm_global_mutex);
> +
> + /*
> +  * After synchronizing any critical read section is guaranteed to see
> +  * the new value of ->unplugged, and any critical section which might
> +  * still have seen the old value of ->unplugged is guaranteed to have
> +  * finished.
> +  */
> + dev->unplugged = true;
> + synchronize_srcu(&drm_unplug_srcu);
>  }
>  EXPORT_SYMBOL(drm_dev_unplug);
>  
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index 7c4fa32f3fc6..3a0eac2885b7 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -46,7 +46,14 @@ struct drm_device {
>   /* currently active master for this device. Protected by master_mutex */
>   struct drm_master *master;
>  
> - atomic_t unplugged; /**< Flag whether dev is dead */
> + /**
> +  * @unplugged:
> +  *
> +  * Flag to tell if the device has been unplugged.
> +  * See drm_dev_enter() and drm_dev_is_unplugged().
> +  */
> + bool unplugged;
> +
>   struct inode *anon_inode;   /**< inode for private 
> address-space */
>   char *unique;   /**< unique name of the device 
> */
>   /*@} */
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index d2
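
Going by the kernel-doc in the quoted patch, the calling pattern for the new helpers is enter, check the return value, touch the device, exit. The sketch below is a userspace mock of that control flow, not kernel code: `struct mock_dev`, `mock_dev_enter()`, `mock_dev_exit()` and `mock_write_reg()` are hypothetical stand-ins, a plain counter replaces SRCU, and the mock exit takes the device pointer because there is no global SRCU domain here. It shows only the shape of a caller, not the synchronization.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device: one flag, one fake register, one reader count. */
struct mock_dev {
	bool unplugged;
	int readers;	/* stands in for the SRCU read-side state */
	int reg;
};

/* Mirrors the shape of drm_dev_enter(): false means "do not enter". */
static bool mock_dev_enter(struct mock_dev *dev, int *idx)
{
	*idx = dev->readers++;

	if (dev->unplugged) {
		dev->readers--;
		return false;
	}
	return true;
}

/* Mirrors drm_dev_exit(): must be paired with a successful enter. */
static void mock_dev_exit(struct mock_dev *dev, int idx)
{
	assert(dev->readers == idx + 1);
	dev->readers--;
}

/* A caller: only touches the fake register inside the section. */
static int mock_write_reg(struct mock_dev *dev, int val)
{
	int idx;

	if (!mock_dev_enter(dev, &idx))
		return -ENODEV;	/* device is gone, skip the access */

	dev->reg = val;
	mock_dev_exit(dev, idx);
	return 0;
}
```

Note that the failed-enter path must not call the exit helper, since the enter function has already dropped its read-side reference before returning false.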

linux-next: manual merge of the userns tree with the syscalls tree

2018-03-28 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the userns tree got a conflict in:

  ipc/msg.c

between commit:

  370c8f44ce16 ("ipc: add msgget syscall wrapper")

from the syscalls tree and commit:

  50ab44b1c5d1 ("ipc: Directly call the security hook in ipc_ops.associate")

from the userns tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc ipc/msg.c
index 9de48065c1ac,d667dd8e97ab..
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@@ -253,17 -272,7 +272,7 @@@ static void freeque(struct ipc_namespac
ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
  }
  
- /*
-  * Called with msg_ids.rwsem and ipcp locked.
-  */
- static inline int msg_security(struct kern_ipc_perm *ipcp, int msgflg)
- {
-   struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
- 
-   return security_msg_queue_associate(msq, msgflg);
- }
- 
 -SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
 +long ksys_msgget(key_t key, int msgflg)
  {
struct ipc_namespace *ns;
static const struct ipc_ops msg_ops = {




Re: [PATCH v3 1/6] phy: qcom-qmp: Enable pipe_clk before checking USB3 PHY_STATUS

2018-03-28 Thread Manu Gautam
Hi,


On 3/28/2018 1:44 AM, Doug Anderson wrote:
> Hi,
>
> On Tue, Mar 27, 2018 at 12:50 AM, Manu Gautam  wrote:
>> Hi,
>>
>>
>> On 3/27/2018 12:26 PM, Vivek Gautam wrote:
>>>
>>> On 3/27/2018 10:37 AM, Manu Gautam wrote:
>>>> Hi Doug,
>>>>
>>>>
>>>> On 3/27/2018 9:56 AM, Doug Anderson wrote:
>>>>> Manu
>>>>>
>>>>> On Thu, Mar 22, 2018 at 11:11 PM, Manu Gautam wrote:
>>>>>> QMP PHY for USB mode requires pipe_clk for calibration and PLL lock
>>>>>> to take place. This clock is output from PHY to GCC clock_ctl and then
>>>>>> fed back to QMP PHY and is available from PHY only after PHY is reset
>>>>>> and initialized, hence it can't be enabled too early in initialization
>>>>>> sequence.
>>>>>>
>>>>>> Signed-off-by: Manu Gautam 
>>>>>> ---
>>>>>>   drivers/phy/qualcomm/phy-qcom-qmp.c | 33 
>>>>>> -
>>>>>>   1 file changed, 32 insertions(+), 1 deletion(-)
>>>>> So it's not new with this patch, but it's more obvious with this
>>>>> patch.  It seems like "UFS/PCIE" is kinda broken w/ respect to how it
>>>>> controls its clock.  Specifically:
>>>>>
>>>>> * If you init the PHY but don't power it on, then you "exit" the PHY:
>>>>> you'll disable/unprepare "pipe_clk" even though you never
>>>>> prepare/enabled it.
>>>>>
>>>>> * If you init the PHY, power it on, power it off, power it on, and
>>>>> exit the PHY: you'll leave the clock prepared one extra time.
>>>>>
>>>>> Specifically I'd expect: for UFS/PCIE the disable/unprepare should be
>>>>> symmetric with the enable/prepare and should be in "power off", not in
>>>>> exit.
>>>>>
>>>>> ...or did I miss something?
>>>>>
>>>>>
>>>>> Interestingly, your patch fixes this problem for USB3 (where init/exit
>>>>> are now symmetric), but leaves the problem there for UFS/PCIE.
>>>>>
>>>> Thanks for review.
>>>> One of the reasons why pipe_clk is disabled as part of phy_exit is that
>>>> halt_check from clk_disable reports error if called after PHY has been
>>>> powered down or phy_exit.
>>>> I believe that warning should be ignored in qcom gcc-clock driver
>>>> (for applicable platforms) by using BRANCH_HALT_DELAY as halt_check
>>>> for pipe_clk and performing clk_disable from power_off for UFS/PCIE.
>>> UFS doesn't use PIPE clock.
> Just to confirm: we no longer need to do this "BRANCH_HALT_DELAY" now
> that we've figured everything out, right?

That is still needed, as the PHY might take some time to generate pipe_clk
after its PLL is locked (or while the initialization sequence is carried out).
Performing clk_enable before then will throw a warning. Hence, it is better
to use a halt_check that allows pipe_clk to be grouped with the other clocks
and enabled at the beginning of phy_init.

>
>
>> Yes, UFS PHY doesn't use one. But similar to pipe_clk there are rx/tx 
>> symbol_clk
>> output from PHY that is used by UFS controller. I will update code comments
>> to not refer UFS for pipe_clk.
>>
>>> But considering for PCIe, if we disable pipe clock when phy is still 
>>> running, then
>>> it shouldn't be a problem. We should also not see the halt warning as the 
>>> gcc
>>> driver should be able to just turn the gate off.
>>> The reason why it will throw that error is when the parent clock to that 
>>> gate
>>> is gated, i.e. the pipe clock is not flowing on that branch.
>> I got the confirmation that pipe_clk is needed for PCIE as well for its
>> initialization to happen successfully. So we do need clock driver change
>> to fix this in PHY driver.
> So basically if I'm understanding this correctly:
>
> * Both USB and PCIE need the clk_enable() in qcom_qmp_phy_init()
>
> * UFS doesn't even use a pipe clock (pipe_clk is NULL and thus these
> calls are no-ops).
>
> So that means the next version of this code will simply get rid of
> qcom_qmp_phy_poweron() and we can now use the same phy_ops for both
> everything again?  That also makes everything symmetric and gets rid
> of the possible imbalance of clock enable/disable, so I'm happy.
Yes.
>
>
> Actually: I'll also throw out a drastic idea here.  Maybe instead of
> having a NULL power_on/power_off, we should have a NULL init/exit.
> Does anything break if all the stuff that happens today in
> qcom_qmp_phy_com_init() happens at power_on() time instead of init()
> time?  I suggest this because:
>
> * It sounds like init() is supposed to be for initialization that can
> happen _before_ power on of the PHY.
>
> * Any initialization that happens after the PHY has been powered on
> seems expected to just be in the power_on() function after the
> regulator was enabled.
>
>
> Presumably moving this stuff to power_on could save you some power in
> some cases (since the client of the PHY presumably turns power off to
> the PHY with the idea of saving power).

This could be ok for DWC3 USB core driver which uses both phy_init and
power_on together on init/suspend.
But it looks like ufs-qcom and pcie-qcom (mainly ufs) handle power_on
and phy_init differently. They also reset core while running init/power_on.
Changing power_
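
The clock imbalance Doug describes earlier in this thread can be made concrete with a small counting model. This is a userspace sketch with hypothetical names (`struct mock_phy` and the `asym_*` / `sym_*` helpers); it models only the prepare/enable count of pipe_clk and nothing else about the PHY. In the asymmetric variant the clock is enabled in power_on but disabled in exit; in the symmetric variant both the enable and the disable live in init and exit.

```c
#include <assert.h>

/* Hypothetical PHY model: tracks only the pipe_clk enable count. */
struct mock_phy {
	int clk_count;
};

/* Asymmetric variant: enable in power_on, disable in exit. */
static void asym_init(struct mock_phy *p)      { (void)p; }
static void asym_power_on(struct mock_phy *p)  { p->clk_count++; }
static void asym_power_off(struct mock_phy *p) { (void)p; }
static void asym_exit(struct mock_phy *p)      { p->clk_count--; }

/* Symmetric variant: enable in init, disable in exit. */
static void sym_init(struct mock_phy *p)
{
	p->clk_count++;
}

static void sym_exit(struct mock_phy *p)
{
	/* A balanced sequence never disables a clock it did not enable. */
	assert(p->clk_count > 0);
	p->clk_count--;
}
```

Running init then exit on the asymmetric variant leaves the count at -1 (a disable with no matching enable), and init, power_on, power_off, power_on, exit leaves it at +1 (one enable never undone), while the symmetric variant returns to zero after init and exit.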

Re: [PATCH v3 3/6] dt-bindings: phy-qcom-qmp: Update bindings for sdm845

2018-03-28 Thread Manu Gautam
Hi,


On 3/28/2018 3:07 AM, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 22, 2018 at 11:11 PM, Manu Gautam  wrote:
>> Update compatible strings for USB3 PHYs on SDM845.
>> One is QMPv3 DisplayPort-USB combo PHY and other one
>> is USB UNI PHY which is single lane USB3 PHY without
>> DP capability.
>>
>> Reviewed-by: Rob Herring 
>> Signed-off-by: Manu Gautam 
>> ---
>>  Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt 
>> b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> index dcf1b8f..cef8765 100644
>> --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
>> @@ -9,7 +9,9 @@ Required properties:
>>"qcom,ipq8074-qmp-pcie-phy" for PCIe phy on IPQ8074
>>"qcom,msm8996-qmp-pcie-phy" for 14nm PCIe phy on msm8996,
>>"qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996,
>> -  "qcom,qmp-v3-usb3-phy" for USB3 QMP V3 phy.
>> +  "qcom,qmp-v3-usb3-phy" for USB3 QMP V3 phy,
>> +  "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845,
>> +  "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on 
>> sdm845.
> I'm confused.  What value does "qcom,qmp-v3-usb3-phy" have as a
> separate entry from "qcom,sdm845-qmp-usb3-phy"?  Is
> "qcom,qmp-v3-usb3-phy" expected to work on some non-SDM845 based
> device?
>
> Personally I think you should remove "qcom,qmp-v3-usb3-phy" from the
> bindings as part of this patch (replacing it with the new string
> "qcom,sdm845-qmp-usb3-phy").  Yeah, yeah bindings are forever.  ...but
> that particular string was added about a month ago and (I believe) it
> was intended for SDM845 anyway.  As per
>  match to the exact same
> PHY data which lends extra credence to my belief.
>
> If later on you find that some future chip can use the exact same
> driver / settings as the SDM845 you can always list the
> "qcom,sdm845-qmp-usb3-phy" string as a secondary compatible anyway.
>

I agree. Will just remove "qcom,sdm845-qmp-usb3-phy"

> -Doug

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 1/6] aio: don't print the page size at boot time

2018-03-28 Thread Christoph Hellwig
The page size is in no way related to the aio code, and printing it in
the (debug) dmesg at every boot serves no purpose.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index a062d75109cb..03d59593912d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -264,9 +264,6 @@ static int __init aio_setup(void)
 
kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
-
-   pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
-
return 0;
 }
 __initcall(aio_setup);
-- 
2.14.2



io_pgetevents & aio fsync V2

2018-03-28 Thread Christoph Hellwig
Hi all,

this patch adds workqueue based fsync offload.  Versions of this
patch have been floating around for a couple of years, but we now
have a user with seastar used by ScyllaDB (who sponsored this
work) that really wants this in addition to the aio poll support.
More details are in the patch itself.

Because the iocb types have been defined since day one (and probably
were supported by RHEL3) libaio already supports these calls as-is.

This also pulls in the aio cleanups and io_pgetevents support previously
submitted and reviewed as part of the aio poll series.  The aio poll
series will be resubmitted on top of this series.

A git tree is available here:

git://git.infradead.org/users/hch/vfs.git aio-fsync.3

Gitweb:

http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-fsync.3

Changes since V1:
 - remove a BUG_ON_ONE(is_sync_kiocb(kiocb));
 - moved cancellation patches to the poll series
 - improve a list_empty check


[PATCH 6/6] aio: implement io_pgetevents

2018-03-28 Thread Christoph Hellwig
This is the io_getevents equivalent of ppoll/pselect and allows to
properly mix signals and aio completions (especially with IOCB_CMD_POLL)
and atomically executes the following sequence:

sigset_t origmask;

pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ret = io_getevents(ctx, min_nr, nr, events, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);

Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure.  It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.
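
For comparison, a minimal user-space sketch of the non-atomic sequence
shown above.  It assumes a single-threaded caller (so sigprocmask stands
in for pthread_sigmask) and replaces io_getevents with a caller-supplied
callback; the toy_* names are hypothetical and exist only for this
illustration.  A signal can still be delivered between the individual
steps here, which is the window the new syscall is meant to close.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the blocking call (io_getevents above). */
typedef int (*toy_wait_fn)(void *arg);

/*
 * Non-atomic emulation: install the temporary mask, wait, restore.
 * Returns the callback's result, or -1 if the mask cannot be set.
 */
int toy_wait_with_sigmask(const sigset_t *sigmask, toy_wait_fn wait_cb,
			  void *arg)
{
	sigset_t origmask;
	int ret;

	assert(sigmask != NULL && wait_cb != NULL);

	if (sigprocmask(SIG_SETMASK, sigmask, &origmask) != 0)
		return -1;
	/* Race window: a signal may arrive here, before the wait starts. */
	ret = wait_cb(arg);
	/* Restore the caller's mask no matter what the wait returned. */
	sigprocmask(SIG_SETMASK, &origmask, NULL);
	return ret;
}

/* Example callback: reports whether SIGUSR1 is blocked right now. */
int toy_probe_sigusr1_blocked(void *arg)
{
	sigset_t cur;

	(void)arg;
	if (sigprocmask(SIG_SETMASK, NULL, &cur) != 0)
		return -1;
	return sigismember(&cur, SIGUSR1);
}
```

The probe callback only makes the temporary mask observable; a real
caller would pass a wrapper around its blocking wait instead.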

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/aio.c   | 114 ++---
 include/linux/compat.h |   7 ++
 include/linux/syscalls.h   |   6 ++
 include/uapi/asm-generic/unistd.h  |   4 +-
 include/uapi/linux/aio_abi.h   |   6 ++
 kernel/sys_ni.c|   2 +
 8 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 2a5e99cff859..c1018580ddaa 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382i386pkey_free   sys_pkey_free
 383i386statx   sys_statx
 384i386arch_prctl  sys_arch_prctl  
compat_sys_arch_prctl
+385i386io_pgetevents   sys_io_pgetevents   
compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e995cd2b4e65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330common  pkey_alloc  sys_pkey_alloc
 331common  pkey_free   sys_pkey_free
 332common  statx   sys_statx
+333common  io_pgetevents   sys_io_pgetevents
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index fd6c72918a8e..0df07d399a05 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1296,10 +1296,6 @@ static long read_events(struct kioctx *ctx, long min_nr, 
long nr,
wait_event_interruptible_hrtimeout(ctx->wait,
aio_read_events(ctx, min_nr, nr, event, &ret),
until);
-
-   if (!ret && signal_pending(current))
-   ret = -EINTR;
-
return ret;
 }
 
@@ -1914,13 +1910,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
struct timespec __user *, timeout)
 {
struct timespec64   ts;
+   int ret;
+
+   if (timeout && unlikely(get_timespec64(&ts, timeout)))
+   return -EFAULT;
+
+   ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+   if (!ret && signal_pending(current))
+   ret = -EINTR;
+   return ret;
+}
 
-   if (timeout) {
-   if (unlikely(get_timespec64(&ts, timeout)))
+SYSCALL_DEFINE6(io_pgetevents,
+   aio_context_t, ctx_id,
+   long, min_nr,
+   long, nr,
+   struct io_event __user *, events,
+   struct timespec __user *, timeout,
+   const struct __aio_sigset __user *, usig)
+{
+   struct __aio_sigset ksig = { NULL, };
+   sigset_tksigmask, sigsaved;
+   struct timespec64   ts;
+   int ret;
+
+   if (timeout && unlikely(get_timespec64(&ts, timeout)))
+   return -EFAULT;
+
+   if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+   return -EFAULT;
+
+   if (ksig.sigmask) {
+   if (ksig.sigsetsize != sizeof(sigset_t))
+   return -EINVAL;
+   if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
return -EFAULT;
+   sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+   sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+   }
+
+   ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+   if (signal_pending(current)) {
+   if (ksig.sigmask) {
+   current->saved_sigmask = sigsaved;
+   set_restore_sigmask();
+   }
+
+   if (!ret)
+   ret = -ERESTARTNOHAND;
+   } else {
+   if (ksig.sigmask)
+   sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
 
-   return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : 
NUL

Re: [PATCH 14/19] powerpc/altivec: Add missing prototypes for altivec

2018-03-28 Thread Mathieu Malaterre
On Tue, Mar 27, 2018 at 7:33 PM, LEROY Christophe
 wrote:
> LEROY Christophe  wrote:
>
>
>> Mathieu Malaterre  wrote:
>>
>>> Christophe,
>>>
>>> On Sat, Mar 24, 2018 at 9:10 PM, LEROY Christophe
>>>  wrote:

 Mathieu Malaterre  wrote:


> On Fri, Mar 23, 2018 at 1:19 PM, christophe leroy
>  wrote:
>>
>>
>>
>>
>> On 22/03/2018 at 21:20, Mathieu Malaterre wrote:
>>>
>>>
>>>
>>> Some functions prototypes were missing for the non-altivec code. Add
>>> the
>>> missing prototypes directly in xor_vmx, fix warnings treated as
>>> errors
>>> with
>>> W=1:
>>>
>>>   arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype
>>> for
>>> ‘xor_altivec_2’ [-Werror=missing-prototypes]
>>>   arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype
>>> for
>>> ‘xor_altivec_3’ [-Werror=missing-prototypes]
>>>   arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype
>>> for
>>> ‘xor_altivec_4’ [-Werror=missing-prototypes]
>>>   arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype
>>> for
>>> ‘xor_altivec_5’ [-Werror=missing-prototypes]
>>>
>>> Signed-off-by: Mathieu Malaterre 
>>> ---
>>>  arch/powerpc/lib/xor_vmx.h | 14 ++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/arch/powerpc/lib/xor_vmx.h b/arch/powerpc/lib/xor_vmx.h
>>> index 5c2b0839b179..2173e3c84151 100644
>>> --- a/arch/powerpc/lib/xor_vmx.h
>>> +++ b/arch/powerpc/lib/xor_vmx.h
>>> @@ -19,3 +19,17 @@ void __xor_altivec_4(unsigned long bytes, unsigned
>>> long
>>> *v1_in,
>>>  void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
>>> unsigned long *v2_in, unsigned long
>>> *v3_in,
>>> unsigned long *v4_in, unsigned long
>>> *v5_in);
>>> +
>>> +void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
>>> +unsigned long *v2_in);
>>> +
>>
>>
>>
>>
>> Only used in one place, should be static instead of adding it in a .h
>>
>> Same for the other ones.
>
>
>
> $ git grep xor_altivec_2
> [...]
> arch/powerpc/lib/xor_vmx_glue.c:EXPORT_SYMBOL(xor_altivec_2);
>
> Are you sure I can change this function to static ?



 Yes you are right.  But in fact those functions are already defined in
 asm/xor.h
 So you just need to add the missing #include
>>>
>>>
>>> I originally tried it, but this leads to:
>>>
>>>  CC  arch/powerpc/lib/xor_vmx_glue.o
>>> In file included from arch/powerpc/lib/xor_vmx_glue.c:16:0:
>>> ./arch/powerpc/include/asm/xor.h:39:15: error: variable
>>> ‘xor_block_altivec’ has initializer but incomplete type
>>> static struct xor_block_template xor_block_altivec = {
>>>   ^~
>>> ./arch/powerpc/include/asm/xor.h:40:2: error: unknown field ‘name’
>>> specified in initializer
>>>  .name = "altivec",
>>>  ^
>>> [...]
>>>
>>> The file  (powerpc) is pretty much expected to be included
>>> after .
>>>
>>> I did not want to tweak  to test for #ifdef _XOR_H just before
>>>
>>> #ifdef _XOR_H
>>> static struct xor_block_template xor_block_altivec = {
>>> [...]
>>>
>>> since this seems like a hack to me.
>>>
>>> Is this ok to test for #ifdef _XOR_H in 
>>> ?
>>
>>
>> What about including linux/raid/xor.h in asm/xor.h ?

This leads to:

  CALL../arch/powerpc/kernel/systbl_chk.sh
In file included from ../arch/powerpc/include/asm/xor.h:57:0,
 from ../arch/powerpc/lib/xor_vmx_glue.c:17:
../include/asm-generic/xor.h:688:34: error: ‘xor_block_32regs’ defined
but not used [-Werror=unused-variable]
 static struct xor_block_template xor_block_32regs = {
  ^~~~
../include/asm-generic/xor.h:680:34: error: ‘xor_block_8regs’ defined
but not used [-Werror=unused-variable]
 static struct xor_block_template xor_block_8regs = {
  ^~~
In file included from ../arch/powerpc/lib/xor_vmx_glue.c:17:0:
../arch/powerpc/include/asm/xor.h:39:34: error: ‘xor_block_altivec’
defined but not used [-Werror=unused-variable]
 static struct xor_block_template xor_block_altivec = {
  ^
  CALL../arch/powerpc/kernel/prom_init_check.sh


>
> Or better: including linux/raid/xor.h then asm/xor.h in xor_vmx_glue.c ?
>
> Christophe
>
>>
>> Christophe
>>>
>>>
 Christophe


>
>> Christophe
>>
>>
>>> +void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
>>> +unsigned long *v2_in, unsigned long
>>> *v3_in);
>>> +
>>> +void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
>>> +unsigned long *v2_in, unsigned long
>>> *v3_in,
>>

Re: [PATCH 3/4] gpio: Remove VLA from xra1403 driver

2018-03-28 Thread Geert Uytterhoeven
Hi Laura,

On Sat, Mar 10, 2018 at 1:10 AM, Laura Abbott  wrote:
> The new challenge is to remove VLAs from the kernel
> (see https://lkml.org/lkml/2018/3/7/621)
>
> This patch replaces a VLA with an appropriate call to kmalloc_array.
>
> Signed-off-by: Laura Abbott 

Thanks for your patch!

> --- a/drivers/gpio/gpio-xra1403.c
> +++ b/drivers/gpio/gpio-xra1403.c
> @@ -126,11 +126,16 @@ static void xra1403_dbg_show(struct seq_file *s, struct 
> gpio_chip *chip)
>  {
> int reg;
> struct xra1403 *xra = gpiochip_get_data(chip);
> -   int value[xra1403_regmap_cfg.max_register];

Apparently xra1403_regmap_cfg.max_register is always 0x15?

What about adding

#define XRA_LAST 0x15

at the top, and replacing both "XRA_IFR | 0x01" and
xra1403_regmap_cfg.max_register by XRA_LAST instead?
That would avoid doing yet another memory allocation over and over.
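
A rough sketch of that idea, outside the driver: size the buffer with a
compile-time constant so that neither a VLA nor a per-call allocation is
needed.  XRA_LAST takes the 0x15 value mentioned above; the toy_* helpers
and the "+ 1" sizing (so that index XRA_LAST itself fits) are assumptions
made for this illustration, not code from gpio-xra1403.c.

```c
#include <assert.h>
#include <stddef.h>

#define XRA_LAST 0x15	/* highest register index, per the discussion above */

/* Hypothetical stand-in for a register read; returns a dummy value. */
static int toy_read_reg(unsigned int reg)
{
	return (int)(reg * 2);
}

/*
 * Read every register into a fixed-size stack array, then copy as much
 * as fits into 'out'.  Returns the number of values copied.
 */
int toy_dump_registers(int *out, size_t out_len)
{
	int value[XRA_LAST + 1];	/* constant size: no VLA, no kmalloc */
	const size_t n = sizeof(value) / sizeof(value[0]);
	unsigned int reg;
	size_t i;

	for (reg = 0; reg <= XRA_LAST; reg++)
		value[reg] = toy_read_reg(reg);

	for (i = 0; i < n && i < out_len; i++)
		out[i] = value[i];

	return (int)i;
}
```

Because the bound is a constant, the compiler knows the stack usage up
front, which is the point of removing the VLA.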

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.

2018-03-28 Thread Herbert Xu
On Wed, Mar 28, 2018 at 06:04:40PM +1100, NeilBrown wrote:
>
> I disagree.  My patch 6 only makes it common instead of exceedingly
> rare.  If any table in the list other than the first has a chain with 16
> elements, then trying to insert an element with a hash which matches
> that chain will fail with -EBUSY.  This is theoretically possible
> already, though astronomically unlikely.  So that case will never be
> tested for.

No that's not true.  If the table is correctly sized then the
probability of having a chain with 16 elements is extremely low.

Even if it does happen we won't fail because we will perform
an immediate rehash.  We only fail if it happens right away
after the rehash (that is, at least another 16 elements have
been inserted and you're trying to insert a 17th element, all
while the new hash table has not been completely populated),
which means that somebody has figured out our hash secret and
failing in that case makes sense.
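
As a toy model only (this is not the rhashtable code), the policy
described above could be sketched as below.  The threshold of 16 comes
from the discussion; the function name, the return codes and the
"rehash_pending" flag are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_CHAIN 16	/* chain length treated as suspicious */

enum toy_action {
	TOY_INSERT = 0,			/* chain is short: plain insert */
	TOY_REHASH_THEN_INSERT = 1,	/* long chain: rehash right away */
};

/*
 * Decide what an insertion should do, given the length of the target
 * chain and whether a rehash is still in flight (the new table has not
 * been completely populated yet).
 */
int toy_insert_policy(unsigned int chain_len, int rehash_pending)
{
	if (chain_len < TOY_MAX_CHAIN)
		return TOY_INSERT;
	if (!rehash_pending)
		return TOY_REHASH_THEN_INSERT;
	/* Long chain again while still rehashing: refuse the insert. */
	return -EBUSY;
}
```

In this model -EBUSY is only reachable when a long chain shows up while
a rehash is already under way, which mirrors the argument made above.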

> It is hard to know if it is necessary.  And making the new table larger
> will make the error less likely, but still won't make it impossible.  So
> callers will have to handle it - just like they currently have to handle
> -ENOMEM even though it is highly unlikely (and not strictly necessary).

Callers should not handle an ENOMEM error by retrying.  Nor should
they retry an EBUSY return value.

> Are these errors ever actually useful?  I thought I had convinced myself
> before that they were (to throttle attacks on the hash function), but
> they happen even less often than I thought.

The EBUSY error indicates that the hash table has essentially
degenerated into a linked list because somebody has worked out
our hash secret.

> Maybe. Reading a percpu counter isn't cheap.  Reading it whenever a hash
> chain reaches 16 is reasonable, but I think we would want to read it a
> lot more often than that.  So probably store the last-sampled time (with
> no locking) and only sample the counter if last-sampled is more than
>  jiffies - 10*HZ (???)

We could also take the spinlock table approach and have a counter
per bucket spinlock.  This should be sufficient as you'll contend
on the bucket spinlock table anyway.

This also allows us to estimate the total table size and not have
to always do a last-ditch growth when it's too late.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 5/6] dt-bindings: phy-qcom-usb2: Update bindings for sdm845

2018-03-28 Thread Manu Gautam
Hi,


On 3/28/2018 3:27 AM, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 22, 2018 at 11:11 PM, Manu Gautam  wrote:
>> Update compatible strings for USB2 PHYs on sdm845.
>> There are two QUSB2 PHYs present on sdm845. Programming of a few PHY
>> registers related to electrical parameters differs between these
>> PHYs; otherwise both are the same.
>>
>> Signed-off-by: Manu Gautam 
>> ---
>>  Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt 
>> b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
>> index 42c9742..b99a57f 100644
>> --- a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
>> +++ b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
>> @@ -6,7 +6,9 @@ QUSB2 controller supports LS/FS/HS usb connectivity on 
>> Qualcomm chipsets.
>>  Required properties:
>>   - compatible: compatible list, contains
>>"qcom,msm8996-qusb2-phy" for 14nm PHY on msm8996,
>> -  "qcom,qusb2-v2-phy" for QUSB2 V2 PHY.
>> +  "qcom,qusb2-v2-phy" for QUSB2 V2 PHY,
>> +  "qcom,sdm845-qusb2-phy-1" for primary PHY on sdm845,
>> +  "qcom,sdm845-qusb2-phy-2" for secondary PHY on sdm845.
> Similar question to the one I posed on
>  for the QMP PHY.  What
> is "qcom,qusb2-v2-phy"?  Is it some ideal abstract version of the PHY?
>  Do we expect that anyone would actually use that compatible string?
>
> In this case in  it
> looks as if you're using the same settings as
> "qcom,sdm845-qusb2-phy-2", so presumably "qcom,qusb2-v2-phy" should
> just be deleted.
>

I will remove "qcom,qusb2-v2-phy".


> -Doug

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: WARNING in kvm_arch_vcpu_ioctl_run (3)

2018-03-28 Thread Wanpeng Li
2018-03-28 15:13 GMT+08:00 syzbot
:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 99fec39e7725d091c94d1bb0242e40c8092994f6 (Fri Mar 23 22:34:18 2018 +)
> Merge tag 'trace-v4.16-rc4' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=760a73552f47a8cd0fd9
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6275011434250240
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5034017172441945317
> compiler: gcc (GCC) 7.1.1 20170620
> user-space arch: i386
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+760a73552f47a8cd0...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> WARNING: CPU: 1 PID: 9515 at arch/x86/kvm/x86.c:7544

Maybe the same as this one. https://lkml.org/lkml/2018/3/21/174 Paolo,
any thoughts on my analysis?

Regards,
Wanpeng Li

> kvm_arch_vcpu_ioctl_run+0x1c7/0x5c80 arch/x86/kvm/x86.c:7544
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 9515 Comm: syz-executor4 Not tainted 4.16.0-rc6+ #274
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x1f4/0x2b0 lib/bug.c:186
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:kvm_arch_vcpu_ioctl_run+0x1c7/0x5c80 arch/x86/kvm/x86.c:7544
> RSP: 0018:8801a2d17580 EFLAGS: 00010212
> RAX: 0001 RBX: 8801cdfd8000 RCX: 810dfea7
> RDX: 0062 RSI: c90003c1b000 RDI: 8801ac1a8498
> RBP: 8801a2d17910 R08: 110035835b2d R09: 0001
> R10: 8801a2d17560 R11: 0005 R12: 
> R13: 8801ab083100 R14: 8801ac1a8280 R15: 8801ac1a8280
>  kvm_vcpu_ioctl+0x6f1/0xff0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2560
>  kvm_vcpu_compat_ioctl+0x364/0x450
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2755
>  C_SYSC_ioctl fs/compat_ioctl.c:1461 [inline]
>  compat_SyS_ioctl+0x151/0x2a30 fs/compat_ioctl.c:1407
>  do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
>  do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
>  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> RIP: 0023:0xf7f41c99
> RSP: 002b:f773d09c EFLAGS: 0286 ORIG_RAX: 0036
> RAX: ffda RBX: 0019 RCX: ae80
> RDX:  RSI:  RDI: 
> RBP:  R08:  R09: 
> R10:  R11:  R12: 
> R13:  R14:  R15: 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.


Re: [PATCH] proc: register filesystem last

2018-03-28 Thread Alexey Dobriyan
On Wed, Mar 28, 2018 at 05:48:23AM +0100, Al Viro wrote:
> On Sat, Mar 10, 2018 at 03:06:34AM +0300, Alexey Dobriyan wrote:
> > On Fri, Mar 09, 2018 at 02:49:38PM -0800, Andrew Morton wrote:
> > > On Sat, 10 Mar 2018 01:27:09 +0300 Alexey Dobriyan  
> > > wrote:
> > > 
> > > > As soon as register_filesystem() exits, filesystem can be mounted.
> > > > It is better to present fully operational /proc.
> > > > 
> > > > Of course it doesn't matter because /proc is not modular
> > > > but do it anyway.
> > > > 
> > > > Drop error check, it should be handled by panicking.
> > > 
> > > So... shouldn't we add a call to panic()?
> > 
> > via FS_PANIC flag, yes. I have a patch somewhere.
> > There are 104 filesystems ATM, some internal, some not.
> > Some modular, some not.
> 
> You do realize that the only case when register_filesystem() fails is
> "another driver has already registered filesystem type with the same
> name"?  Is there *ever* a case when
>   * you could expect that to happen and
>   * panic would be a sane response?

It is for standardizing all those error checks in the init sequence by
removing them. Modules won't have FS_PANIC.


aio poll and a new in-kernel poll API V7

2018-03-28 Thread Christoph Hellwig
Hi all,

this series adds support for the IOCB_CMD_POLL operation to poll for the
readiness of file descriptors using the aio subsystem.  The API is based
on patches that existed in RHAS2.1 and RHEL3, which means it already is
supported by libaio.  To implement the poll support efficiently new
methods to poll are introduced in struct file_operations:  get_poll_head
and poll_mask.  The first one returns a wait_queue_head to wait on
(lifetime is bound by the file), and the second does a non-blocking
check for the POLL* events.  This allows aio poll to work without
any additional context switches, unlike epoll.

This series sits on top of the aio-fsync series that also includes
support for io_pgetevents.

The changes were sponsored by ScyllaDB, and improve performance
of the seastar framework by up to 10%, while also removing the need
for a privileged SCHED_FIFO epoll listener thread.

git://git.infradead.org/users/hch/vfs.git aio-poll.7

Gitweb:

http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.7

Libaio changes:

https://pagure.io/libaio.git io-poll

Seastar changes (not updated for the new io_pgetevents ABI yet):

https://github.com/avikivity/seastar/commits/aio

Changes since V7:
 - reworked cancellation

Changes since V6:
 - small changelog updates
 - rebased on top of the aio-fsync changes

Changes since V4:
 - rebased on top of Linux 4.16-rc4

Changes since V3:
 - remove the pre-sleep ->poll_mask call in vfs_poll,
   allow ->get_poll_head to return POLL* values.

Changes since V2:
 - removed a double initialization
 - new vfs_get_poll_head helper
 - document that ->get_poll_head can return NULL
 - call ->poll_mask before sleeping
 - various ACKs
 - add conversion of random to ->poll_mask
 - add conversion of af_alg to ->poll_mask
 - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
 - reshuffled the series so that prep patches and everything not
   requiring the new in-kernel poll API is in the beginning

Changes since V1:
 - handle the NULL ->poll case in vfs_poll
 - dropped the file argument to the ->poll_mask socket operation
 - replace the ->pre_poll socket operation with ->get_poll_head as
   in the file operations


[PATCH] ANDROID: binder: prevent transactions into own process.

2018-03-28 Thread Martijn Coenen
This can't happen with normal nodes (because you can't get a ref
to a node you own), but it could happen with the context manager;
to make the behavior consistent with regular nodes, reject
transactions into the context manager by the process owning it.

Reported-by: syzbot+09e05aba06723a94d...@syzkaller.appspotmail.com
Signed-off-by: Martijn Coenen 
---
 drivers/android/binder.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e7e4560e4c6e..57d4ba926ed0 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3001,6 +3001,14 @@ static void binder_transaction(struct binder_proc *proc,
else
return_error = BR_DEAD_REPLY;
mutex_unlock(&context->context_mgr_node_lock);
+   if (target_node && target_node->proc == proc) {
+   binder_user_error("%d:%d got transaction to 
context manager from process owning it\n",
+ proc->pid, thread->pid);
+   return_error = BR_FAILED_REPLY;
+   return_error_param = -EINVAL;
+   return_error_line = __LINE__;
+   goto err_invalid_target_handle;
+   }
}
if (!target_node) {
/*
-- 
2.17.0.rc0.231.g781580f067-goog



[PATCH 03/30] fs: update documentation to mention __poll_t and match the code

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Greg Kroah-Hartman 
---
 Documentation/filesystems/Locking | 2 +-
 Documentation/filesystems/vfs.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 75d2d57e2c44..220bba28f72b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -439,7 +439,7 @@ prototypes:
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
-   unsigned int (*poll) (struct file *, struct poll_table_struct *);
+   __poll_t (*poll) (struct file *, struct poll_table_struct *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index 5fd325df59e2..f608180ad59d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -856,7 +856,7 @@ struct file_operations {
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
-   unsigned int (*poll) (struct file *, struct poll_table_struct *);
+   __poll_t (*poll) (struct file *, struct poll_table_struct *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
-- 
2.14.2



[PATCH 01/30] fs: unexport poll_schedule_timeout

2018-03-28 Thread Christoph Hellwig
No users outside of select.c.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/select.c  | 3 +--
 include/linux/poll.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index b6c36254028a..686de7b3a1db 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t 
*wait_address,
add_wait_queue(wait_address, &entry->wait);
 }
 
-int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
  ktime_t *expires, unsigned long slack)
 {
int rc = -EINTR;
@@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int 
state,
 
return rc;
 }
-EXPORT_SYMBOL(poll_schedule_timeout);
 
 /**
  * poll_select_set_timeout - helper function to setup the timeout value
diff --git a/include/linux/poll.h b/include/linux/poll.h
index f45ebd017eaa..a3576da63377 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -96,8 +96,6 @@ struct poll_wqueues {
 
 extern void poll_initwait(struct poll_wqueues *pwq);
 extern void poll_freewait(struct poll_wqueues *pwq);
-extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
-ktime_t *expires, unsigned long slack);
 extern u64 select_estimate_accuracy(struct timespec64 *tv);
 
 #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
-- 
2.14.2



[PATCH 04/30] fs: add new vfs_poll and file_can_poll helpers

2018-03-28 Thread Christoph Hellwig
These abstract out calls to the poll method in preparation for changes
in how we poll.
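
A simplified user-space model of what the two helpers might look like.
The fallback to a default mask when no poll method exists is inferred
from the do_select() hunk in this patch; all toy_* types, names and
constants are hypothetical and do not match the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POLLIN		0x1u
#define TOY_POLLOUT		0x4u
#define TOY_DEFAULT_POLLMASK	(TOY_POLLIN | TOY_POLLOUT)

struct toy_file;

struct toy_file_operations {
	unsigned int (*poll)(struct toy_file *file, void *pt);
};

struct toy_file {
	const struct toy_file_operations *f_op;
};

/* True if the file provides a poll method at all. */
static inline int toy_file_can_poll(const struct toy_file *file)
{
	return file->f_op->poll != NULL;
}

/* One place that hides how the poll method is actually invoked. */
static inline unsigned int toy_vfs_poll(struct toy_file *file, void *pt)
{
	if (!toy_file_can_poll(file))
		return TOY_DEFAULT_POLLMASK;
	return file->f_op->poll(file, pt);
}

/* Example method: the file is always readable. */
static unsigned int toy_readable_poll(struct toy_file *file, void *pt)
{
	(void)file;
	(void)pt;
	return TOY_POLLIN;
}

static const struct toy_file_operations toy_readable_ops = {
	.poll = toy_readable_poll,
};

static const struct toy_file_operations toy_no_poll_ops = {
	.poll = NULL,
};
```

Callers that go through the helper no longer dereference f_op->poll
themselves, so the way a file is polled can change in one place.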

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 drivers/staging/comedi/drivers/serial2002.c |  4 ++--
 drivers/vfio/virqfd.c   |  2 +-
 drivers/vhost/vhost.c   |  2 +-
 fs/eventpoll.c  |  5 ++---
 fs/select.c | 23 ---
 include/linux/poll.h| 12 
 mm/memcontrol.c |  2 +-
 net/9p/trans_fd.c   | 18 --
 virt/kvm/eventfd.c  |  2 +-
 9 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/comedi/drivers/serial2002.c 
b/drivers/staging/comedi/drivers/serial2002.c
index b3f3b4a201af..5471b2212a62 100644
--- a/drivers/staging/comedi/drivers/serial2002.c
+++ b/drivers/staging/comedi/drivers/serial2002.c
@@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, 
int timeout)
long elapsed;
__poll_t mask;
 
-   mask = f->f_op->poll(f, &table.pt);
+   mask = vfs_poll(f, &table.pt);
if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
EPOLLHUP | EPOLLERR)) {
break;
@@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
 
result = -1;
if (!IS_ERR(f)) {
-   if (f->f_op->poll) {
+   if (file_can_poll(f)) {
serial2002_tty_read_poll_wait(f, timeout);
 
if (kernel_read(f, &ch, 1, &pos) == 1)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 085700f1be10..2a1be859ee71 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-   events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
+   events = vfs_poll(irqfd.file, &virqfd->pt);
 
/*
 * Check if there was an event already pending on the eventfd
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d5c8b..4d27e288bb1d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file 
*file)
if (poll->wqh)
return 0;
 
-   mask = file->f_op->poll(file, &poll->table);
+   mask = vfs_poll(file, &poll->table);
if (mask)
vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
if (mask & EPOLLERR) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f3494ed3ed0..2bebae5a38cf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
 
pt->_key = epi->event.events;
if (!is_file_epoll(epi->ffd.file))
-   return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
-  epi->event.events;
+   return vfs_poll(epi->ffd.file, pt) & epi->event.events;
 
ep = epi->ffd.file->private_data;
poll_wait(epi->ffd.file, &ep->poll_wait, pt);
@@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 
/* The target file descriptor must support poll */
error = -EPERM;
-   if (!tf.file->f_op->poll)
+   if (!file_can_poll(tf.file))
goto error_tgt_fput;
 
/* Check if EPOLLWAKEUP is allowed */
diff --git a/fs/select.c b/fs/select.c
index c6c504a814f9..ba91103707ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
continue;
f = fdget(i);
if (f.file) {
-   const struct file_operations *f_op;
-   f_op = f.file->f_op;
-   mask = DEFAULT_POLLMASK;
-   if (f_op->poll) {
-   wait_key_set(wait, in, out,
-bit, busy_flag);
-   mask = (*f_op->poll)(f.file, wait);
-   }
+   wait_key_set(wait, in, out, bit,
+busy_flag);
+   mask = vfs_poll(f.file, wait);
+
fdput(f);
if ((mask & POLLIN_SET) && (in & bit)) {

[PATCH 10/30] net: add support for ->poll_mask in proto_ops

2018-03-28 Thread Christoph Hellwig
The socket file operations still implement ->poll until all protocols are
switched over.
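
As a rough illustration of that transition (a userspace model, not the
kernel code), the sketch below shows a generic entry point that keeps
calling a legacy poll hook while a protocol still provides one, and
otherwise queues the waiter itself before asking the new poll_mask hook
for the current state.  Every name in it (model_ops, model_sock,
model_sock_poll, EV_IN, EV_OUT) is a hypothetical stand-in.

```c
#include <stddef.h>

/* Hypothetical event bits standing in for EPOLLIN / EPOLLOUT. */
#define EV_IN  0x1u
#define EV_OUT 0x4u

struct model_sock;

/* Stand-in for struct proto_ops: a protocol supplies one of the two hooks. */
struct model_ops {
	unsigned (*poll)(struct model_sock *sk);                       /* legacy */
	unsigned (*poll_mask)(struct model_sock *sk, unsigned events); /* new */
};

struct model_sock {
	const struct model_ops *ops;
	unsigned ready;      /* events currently pending on the socket */
	int waiters_queued;  /* how often the generic code queued a waiter */
};

/* Generic entry point: use ->poll while it exists, else ->poll_mask. */
static unsigned model_sock_poll(struct model_sock *sk, unsigned events)
{
	unsigned mask = 0;

	if (sk->ops->poll) {
		/* Legacy protocols queue the waiter inside their own hook. */
		mask = sk->ops->poll(sk);
	} else if (sk->ops->poll_mask) {
		/* Converted protocols only report state; queueing happens here. */
		sk->waiters_queued++;
		mask = sk->ops->poll_mask(sk, events);
	}
	return mask;
}

/* Example legacy hook: reports whatever is pending. */
static unsigned legacy_poll(struct model_sock *sk)
{
	return sk->ready;
}

/* Example converted hook: same report, no wait-queue handling at all. */
static unsigned new_poll_mask(struct model_sock *sk, unsigned events)
{
	(void)events; /* a real protocol may use this to skip expensive checks */
	return sk->ready;
}
```

A protocol that provides neither hook simply reports no events in this
model, which mirrors the mask = 0 default in sock_poll() below.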

Signed-off-by: Christoph Hellwig 
---
 include/linux/net.h |  3 +++
 net/socket.c| 51 ++-
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 91216b16feb7..ce3d4dacb51e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -147,6 +147,9 @@ struct proto_ops {
int (*getname)   (struct socket *sock,
  struct sockaddr *addr,
  int *sockaddr_len, int peer);
+   struct wait_queue_head *(*get_poll_head)(struct socket *sock,
+ __poll_t events);
+   __poll_t(*poll_mask) (struct socket *sock, __poll_t events);
__poll_t(*poll)  (struct file *file, struct socket *sock,
  struct poll_table_struct *wait);
int (*ioctl) (struct socket *sock, unsigned int cmd,
diff --git a/net/socket.c b/net/socket.c
index 3f859a07641a..ceb69ddcd7bd 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -118,8 +118,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static __poll_t sock_poll(struct file *file,
- struct poll_table_struct *wait);
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+   __poll_t events);
+static __poll_t sock_poll_mask(struct file *file, __poll_t);
+static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
 static long compat_sock_ioctl(struct file *file,
@@ -142,6 +144,8 @@ static const struct file_operations socket_file_ops = {
.llseek =   no_llseek,
.read_iter =sock_read_iter,
.write_iter =   sock_write_iter,
+   .get_poll_head = sock_get_poll_head,
+   .poll_mask =sock_poll_mask,
.poll = sock_poll,
.unlocked_ioctl = sock_ioctl,
 #ifdef CONFIG_COMPAT
@@ -1114,14 +1118,51 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+   __poll_t events)
+{
+   struct socket *sock = file->private_data;
+
+   if (!sock->ops->poll_mask)
+   return NULL;
+   if (sock->ops->get_poll_head)
+   return sock->ops->get_poll_head(sock, events);
+
+   sock_poll_busy_loop(sock, events);
+   return sk_sleep(sock->sk);
+}
+
+static __poll_t sock_poll_mask(struct file *file, __poll_t events)
+{
+   struct socket *sock = file->private_data;
+
+   /*
+* We need to be sure we are in sync with the socket flags modification.
+*
+* This memory barrier is paired in the wq_has_sleeper.
+*/
+   smp_mb();
+
+   /* this socket can poll_ll so tell the system call */
+   return sock->ops->poll_mask(sock, events) |
+   (sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0);
+}
+
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
struct socket *sock = file->private_data;
-   __poll_t events = poll_requested_events(wait);
+   __poll_t events = poll_requested_events(wait), mask = 0;
 
-   sock_poll_busy_loop(sock, events);
-   return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
+   if (sock->ops->poll) {
+   sock_poll_busy_loop(sock, events);
+   mask = sock->ops->poll(file, sock, wait);
+   } else if (sock->ops->poll_mask) {
+   sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+   mask = sock->ops->poll_mask(sock, events);
+   }
+
+   return mask | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2



[PATCH 08/30] aio: implement IOCB_CMD_POLL

2018-03-28 Thread Christoph Hellwig
Simple one-shot poll through the io_submit() interface.  To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL.  It will poll the fd for the events specified in the
first 32 bits of the aio_buf field of the iocb.

Unlike poll or epoll without EPOLLONESHOT, this interface always works
in one-shot mode: once the iocb is completed, it has to be
resubmitted.
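
For illustration only, a userspace caller might prepare such an iocb
roughly as follows.  The helper name example_prep_poll and the local
constant EXAMPLE_IOCB_CMD_POLL are hypothetical; the opcode value 5 is
assumed to match what the uapi header defines once this series is
applied, and it is spelled out locally so the sketch also builds against
older headers.

```c
#include <linux/aio_abi.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed opcode value for IOCB_CMD_POLL; defined under a different name
 * so it cannot clash with the enum in <linux/aio_abi.h>.
 */
enum { EXAMPLE_IOCB_CMD_POLL = 5 };

/* Fill in an iocb that asks for a one-shot poll of fd for the given events. */
static void example_prep_poll(struct iocb *cb, int fd, short events)
{
	/* aio_offset, aio_nbytes and aio_rw_flags must stay zero for poll. */
	memset(cb, 0, sizeof(*cb));
	cb->aio_lio_opcode = EXAMPLE_IOCB_CMD_POLL;
	cb->aio_fildes = (uint32_t)fd;
	/* poll(2)-style mask; the patch rejects values wider than 16 bits. */
	cb->aio_buf = (uint16_t)events;
}
```

The prepared iocb would then be handed to io_submit() like any other
request, and the mask of ready events comes back as the completion
result.  To keep watching the same fd, the caller prepares and submits a
fresh iocb after each completion.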

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 98 +++-
 include/uapi/linux/aio_abi.h |  6 +--
 2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 232dd84fc897..f4ff749d9889 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -5,6 +5,7 @@
  * Implements an efficient asynchronous io interface.
  *
  * Copyright 2000, 2001, 2002 Red Hat, Inc.  All Rights Reserved.
+ * Copyright 2018 Christoph Hellwig.
  *
  * See ../COPYING for licensing terms.
  */
@@ -162,10 +163,18 @@ struct fsync_iocb {
booldatasync;
 };
 
+struct poll_iocb {
+   struct file *file;
+   __poll_tevents;
+   struct wait_queue_head  *head;
+   struct wait_queue_entry wait;
+};
+
 struct aio_kiocb {
union {
struct kiocbrw;
struct fsync_iocb   fsync;
+   struct poll_iocbpoll;
};
 
struct kioctx   *ki_ctx;
@@ -1589,7 +1598,6 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
return -EINVAL;
if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
return -EINVAL;
-
req->file = fget(iocb->aio_fildes);
if (unlikely(!req->file))
return -EBADF;
@@ -1608,6 +1616,92 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
return ret;
 }
 
+static void aio_complete_poll(struct poll_iocb *req, __poll_t mask)
+{
+   struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
+   struct file *file = req->file;
+
+   if (aio_complete(iocb, mangle_poll(mask), 0, 0))
+   fput(file);
+}
+
+static int aio_poll_cancel(struct kiocb *rw)
+{
+   struct aio_kiocb *iocb = container_of(rw, struct aio_kiocb, rw);
+   struct file *file = iocb->poll.file;
+
+   remove_wait_queue(iocb->poll.head, &iocb->poll.wait);
+   if (aio_complete(iocb, 0, 0, AIO_COMPLETE_CANCEL))
+   fput(file);
+   return 0;
+}
+
+static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
+   void *key)
+{
+   struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
+   struct file *file = req->file;
+   __poll_t mask = key_to_poll(key);
+
+   assert_spin_locked(&req->head->lock);
+
+   /* for instances that support it check for an event match first: */
+   if (mask && !(mask & req->events))
+   return 0;
+
+   mask = vfs_poll_mask(file, req->events);
+   if (!mask)
+   return 0;
+
+   __remove_wait_queue(req->head, &req->wait);
+   aio_complete_poll(req, mask);
+   return 1;
+}
+
+static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
+{
+   struct poll_iocb *req = &aiocb->poll;
+   unsigned long flags;
+   __poll_t mask;
+
+   /* reject any unknown events outside the normal event mask. */
+   if ((u16)iocb->aio_buf != iocb->aio_buf)
+   return -EINVAL;
+   /* reject fields that are not defined for poll */
+   if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+   return -EINVAL;
+
+   req->events = demangle_poll(iocb->aio_buf) | POLLERR | POLLHUP;
+   req->file = fget(iocb->aio_fildes);
+   if (unlikely(!req->file))
+   return -EBADF;
+
+   req->head = vfs_get_poll_head(req->file, req->events);
+   if (!req->head) {
+   fput(req->file);
+   return -EINVAL; /* same as no support for IOCB_CMD_POLL */
+   }
+   if (IS_ERR(req->head)) {
+   mask = PTR_TO_POLL(req->head);
+   goto done;
+   }
+
+   init_waitqueue_func_entry(&req->wait, aio_poll_wake);
+
+   spin_lock_irqsave(&req->head->lock, flags);
+   mask = vfs_poll_mask(req->file, req->events);
+   if (!mask) {
+   __kiocb_set_cancel_fn(aiocb, aio_poll_cancel,
+   AIO_IOCB_DELAYED_CANCEL);
+   __add_wait_queue(req->head, &req->wait);
+   }
+   spin_unlock_irqrestore(&req->head->lock, flags);
+done:
+   if (mask)
+   aio_complete_poll(req, mask);
+   return -EIOCBQUEUED;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 struct iocb *iocb, bool compat)
 {

[PATCH 13/30] net/unix: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/unix/af_unix.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2d465bdeccbc..619c6921dd46 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -638,9 +638,8 @@ static int unix_stream_connect(struct socket *, struct sockaddr *,
 static int unix_socketpair(struct socket *, struct socket *);
 static int unix_accept(struct socket *, struct socket *, int, bool);
 static int unix_getname(struct socket *, struct sockaddr *, int *, int);
-static __poll_t unix_poll(struct file *, struct socket *, poll_table *);
-static __poll_t unix_dgram_poll(struct file *, struct socket *,
-   poll_table *);
+static __poll_t unix_poll_mask(struct socket *, __poll_t);
+static __poll_t unix_dgram_poll_mask(struct socket *, __poll_t);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -681,7 +680,7 @@ static const struct proto_ops unix_stream_ops = {
.socketpair =   unix_socketpair,
.accept =   unix_accept,
.getname =  unix_getname,
-   .poll = unix_poll,
+   .poll_mask =unix_poll_mask,
.ioctl =unix_ioctl,
.listen =   unix_listen,
.shutdown = unix_shutdown,
@@ -704,7 +703,7 @@ static const struct proto_ops unix_dgram_ops = {
.socketpair =   unix_socketpair,
.accept =   sock_no_accept,
.getname =  unix_getname,
-   .poll = unix_dgram_poll,
+   .poll_mask =unix_dgram_poll_mask,
.ioctl =unix_ioctl,
.listen =   sock_no_listen,
.shutdown = unix_shutdown,
@@ -726,7 +725,7 @@ static const struct proto_ops unix_seqpacket_ops = {
.socketpair =   unix_socketpair,
.accept =   unix_accept,
.getname =  unix_getname,
-   .poll = unix_dgram_poll,
+   .poll_mask =unix_dgram_poll_mask,
.ioctl =unix_ioctl,
.listen =   unix_listen,
.shutdown = unix_shutdown,
@@ -2640,13 +2639,10 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
return err;
 }
 
-static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
+static __poll_t unix_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err)
@@ -2675,15 +2671,11 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
return mask;
 }
 
-static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t unix_dgram_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk, *other;
-   unsigned int writable;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   int writable;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -2709,7 +2701,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
}
 
/* No write status requested, avoid expensive OUT tests. */
-   if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
+   if (!(events & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
return mask;
 
writable = unix_writable(sk);
-- 
2.14.2



[PATCH 25/30] net/rxrpc: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/rxrpc/af_rxrpc.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0c9c18aa7c77..d2440d5c3ce8 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -729,15 +729,11 @@ static int rxrpc_getsockopt(struct socket *sock, int level, int optname,
 /*
  * permit an RxRPC socket to be polled
  */
-static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t rxrpc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct rxrpc_sock *rx = rxrpc_sk(sk);
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* the socket is readable if there are any messages waiting on the Rx
 * queue */
@@ -940,7 +936,7 @@ static const struct proto_ops rxrpc_rpc_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= sock_no_getname,
-   .poll   = rxrpc_poll,
+   .poll_mask  = rxrpc_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = rxrpc_listen,
.shutdown   = rxrpc_shutdown,
-- 
2.14.2



Re: [PATCH 4/6] rhashtable: allow a walk of the hash table without missing objects.

2018-03-28 Thread Herbert Xu
On Wed, Mar 28, 2018 at 06:17:57PM +1100, NeilBrown wrote:
>
> Sounds like over-kill to me.
> It might be reasonable to have a CONFIG_DEBUG_RHASHTABLE which enables
> extra code to catch misuse, but I don't see the justification for
> always performing these checks.
> The DEBUG code could just scan the chain (usually quite short) to see if
> the given element is present.  Of course it might have already been
> rehashed to the next table, so you would need to allow for that possibility -
> probably check tbl->rehash.

No, this is not meant to debug users incorrectly using the cursor.
This is a replacement of your continue interface by automatically
validating the cursor.

In fact we can make it even more reliable.  We can insert the walker
right into the bucket chain, that way the walking will always be
consistent.

The only problem is that we need to be able to differentiate between
a walker, a normal object, and the end of the list.  I think it
should be doable.
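
To make the idea concrete, here is a minimal userspace sketch, assuming
a plain singly linked chain with a per-node type tag.  It is not how
rhashtable is implemented (there is no RCU and no nulls marker here),
and chain_node, node_kind, chain_insert_after and next_object are all
hypothetical names.

```c
#include <stddef.h>

/* Tag that lets a traversal tell real objects from walker markers. */
enum node_kind { NODE_OBJECT, NODE_WALKER };

struct chain_node {
	enum node_kind kind;
	struct chain_node *next; /* NULL marks the end of the list here */
	int value;               /* payload, only meaningful for objects */
};

/* Link node n into the chain directly after pos. */
static void chain_insert_after(struct chain_node *pos, struct chain_node *n)
{
	n->next = pos->next;
	pos->next = n;
}

/* Return the next real object after pos, skipping any walker markers. */
static struct chain_node *next_object(struct chain_node *pos)
{
	struct chain_node *n = pos ? pos->next : NULL;

	while (n && n->kind == NODE_WALKER)
		n = n->next;
	return n;
}
```

With the walker linked into the chain, its position survives removal of
neighbouring objects, and lookups that go through next_object() never
mistake it for an element.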

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 20/30] net/bluetooth: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/net/bluetooth/bluetooth.h | 2 +-
 net/bluetooth/af_bluetooth.c  | 7 ++-
 net/bluetooth/l2cap_sock.c| 2 +-
 net/bluetooth/rfcomm/sock.c   | 2 +-
 net/bluetooth/sco.c   | 2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index ec9d6bc65855..53ce8176c313 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -271,7 +271,7 @@ int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 int flags);
 int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags);
-__poll_t bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
 int  bt_sock_wait_ready(struct sock *sk, unsigned long flags);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 84d92a077834..80033a7e1de2 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -437,16 +437,13 @@ static inline __poll_t bt_accept_poll(struct sock *parent)
return 0;
 }
 
-__poll_t bt_sock_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
BT_DBG("sock %p, sk %p", sock, sk);
 
-   poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == BT_LISTEN)
return bt_accept_poll(sk);
 
@@ -478,7 +475,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 
return mask;
 }
-EXPORT_SYMBOL(bt_sock_poll);
+EXPORT_SYMBOL(bt_sock_poll_mask);
 
 int bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 67a8642f57ea..d20b33daa80f 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1654,7 +1654,7 @@ static const struct proto_ops l2cap_sock_ops = {
.getname= l2cap_sock_getname,
.sendmsg= l2cap_sock_sendmsg,
.recvmsg= l2cap_sock_recvmsg,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.ioctl  = bt_sock_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 1aaccf637479..b4dc96481d92 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -1049,7 +1049,7 @@ static const struct proto_ops rfcomm_sock_ops = {
.setsockopt = rfcomm_sock_setsockopt,
.getsockopt = rfcomm_sock_getsockopt,
.ioctl  = rfcomm_sock_ioctl,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.socketpair = sock_no_socketpair,
.mmap   = sock_no_mmap
 };
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 08df57665e1f..b2bf5c767b3e 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -1198,7 +1198,7 @@ static const struct proto_ops sco_sock_ops = {
.getname= sco_sock_getname,
.sendmsg= sco_sock_sendmsg,
.recvmsg= sco_sock_recvmsg,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.ioctl  = bt_sock_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
-- 
2.14.2



[PATCH 18/30] net/tipc: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/tipc/socket.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7dfa9fc99ec3..e9c6f185db74 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -695,10 +695,9 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
 }
 
 /**
- * tipc_poll - read and possibly block on pollmask
+ * tipc_poll - read pollmask
  * @file: file structure associated with the socket
  * @sock: socket for which to calculate the poll bits
- * @wait: ???
  *
  * Returns pollmask value
  *
@@ -712,15 +711,12 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
  * imply that the operation will succeed, merely that it should be performed
  * and will not block.
  */
-static __poll_t tipc_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+static __poll_t tipc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct tipc_sock *tsk = tipc_sk(sk);
__poll_t revents = 0;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_shutdown & RCV_SHUTDOWN)
revents |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
if (sk->sk_shutdown == SHUTDOWN_MASK)
@@ -3020,7 +3016,7 @@ static const struct proto_ops msg_ops = {
.socketpair = tipc_socketpair,
.accept = sock_no_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = sock_no_listen,
.shutdown   = tipc_shutdown,
@@ -3041,7 +3037,7 @@ static const struct proto_ops packet_ops = {
.socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = tipc_listen,
.shutdown   = tipc_shutdown,
@@ -3062,7 +3058,7 @@ static const struct proto_ops stream_ops = {
.socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = tipc_listen,
.shutdown   = tipc_shutdown,
-- 
2.14.2



[PATCH 30/30] random: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups.  Because wait_event_*
doesn't know about keys, this will lead to occasional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and we are not in a hot path there.
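
As a rough userspace model of why the spurious wakeups are harmless (not
kernel code; model_waiter, model_wake_keyed, model_should_keep_sleeping
and the EV_* bits are hypothetical names), consider one shared queue
where poll-style waiters carry an event filter and wait_event-style
waiters carry none:

```c
#include <stddef.h>

#define EV_IN  0x1u /* hypothetical stand-in for POLLIN */
#define EV_OUT 0x4u /* hypothetical stand-in for POLLOUT */

struct model_waiter {
	unsigned filter; /* events of interest; 0 means "not key-aware" */
	int wakeups;     /* how many times this waiter was woken */
};

/* Wake every waiter on the shared queue that could care about key. */
static void model_wake_keyed(struct model_waiter *q, size_t n, unsigned key)
{
	for (size_t i = 0; i < n; i++) {
		/* Key-aware waiters skip foreign events; the rest always wake. */
		if (q[i].filter == 0 || (q[i].filter & key))
			q[i].wakeups++;
	}
}

/* A wait_event-style waiter re-tests its condition after every wakeup. */
static int model_should_keep_sleeping(int condition_true)
{
	return !condition_true;
}
```

In this model a reader that is not key-aware is also woken by the
"writable" event, finds its condition still false, and goes back to
sleep, which costs one extra check and nothing else.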

Signed-off-by: Christoph Hellwig 
Acked-by: Theodore Ts'o 
Reviewed-by: Greg Kroah-Hartman 
---
 drivers/char/random.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e5b3d3ba4660..840d80b64431 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -401,8 +401,7 @@ static struct poolinfo {
 /*
  * Static global variables
  */
-static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
-static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
+static DECLARE_WAIT_QUEUE_HEAD(random_wait);
 static struct fasync_struct *fasync;
 
 static DEFINE_SPINLOCK(random_ready_list_lock);
@@ -710,7 +709,7 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 
/* should we wake readers? */
if (entropy_bits >= random_read_wakeup_bits) {
-   wake_up_interruptible(&random_read_wait);
+   wake_up_interruptible_poll(&random_wait, POLLIN);
kill_fasync(&fasync, SIGIO, POLL_IN);
}
/* If the input pool is getting full, send some
@@ -1293,7 +1292,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
trace_debit_entropy(r->name, 8 * ibytes);
if (ibytes &&
(r->entropy_count >> ENTROPY_SHIFT) < random_write_wakeup_bits) {
-   wake_up_interruptible(&random_write_wait);
+   wake_up_interruptible_poll(&random_wait, POLLOUT);
kill_fasync(&fasync, SIGIO, POLL_OUT);
}
 
@@ -1748,7 +1747,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
if (nonblock)
return -EAGAIN;
 
-   wait_event_interruptible(random_read_wait,
+   wait_event_interruptible(random_wait,
ENTROPY_BITS(&input_pool) >=
random_read_wakeup_bits);
if (signal_pending(current))
@@ -1784,14 +1783,17 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
return ret;
 }
 
+static struct wait_queue_head *
+random_get_poll_head(struct file *file, __poll_t events)
+{
+   return &random_wait;
+}
+
 static __poll_t
-random_poll(struct file *file, poll_table * wait)
+random_poll_mask(struct file *file, __poll_t events)
 {
-   __poll_t mask;
+   __poll_t mask = 0;
 
-   poll_wait(file, &random_read_wait, wait);
-   poll_wait(file, &random_write_wait, wait);
-   mask = 0;
if (ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits)
mask |= EPOLLIN | EPOLLRDNORM;
if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
@@ -1890,7 +1892,8 @@ static int random_fasync(int fd, struct file *filp, int on)
 const struct file_operations random_fops = {
.read  = random_read,
.write = random_write,
-   .poll  = random_poll,
+   .get_poll_head  = random_get_poll_head,
+   .poll_mask  = random_poll_mask,
.unlocked_ioctl = random_ioctl,
.fasync = random_fasync,
.llseek = noop_llseek,
@@ -2223,7 +2226,7 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
 * We'll be woken up again once below random_write_wakeup_thresh,
 * or when the calling thread is about to terminate.
 */
-   wait_event_interruptible(random_write_wait, kthread_should_stop() ||
+   wait_event_interruptible(random_wait, kthread_should_stop() ||
ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
mix_pool_bytes(poolp, buffer, count);
credit_entropy_bits(poolp, entropy);
-- 
2.14.2



[PATCH 29/30] timerfd: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/timerfd.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49da3ff7..d84a2bee4f82 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -226,21 +226,20 @@ static int timerfd_release(struct inode *inode, struct file *file)
kfree_rcu(ctx, rcu);
return 0;
 }
-
-static __poll_t timerfd_poll(struct file *file, poll_table *wait)
+   
+static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
+   __poll_t eventmask)
 {
struct timerfd_ctx *ctx = file->private_data;
-   __poll_t events = 0;
-   unsigned long flags;
 
-   poll_wait(file, &ctx->wqh, wait);
+   return &ctx->wqh;
+}
 
-   spin_lock_irqsave(&ctx->wqh.lock, flags);
-   if (ctx->ticks)
-   events |= EPOLLIN;
-   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
+{
+   struct timerfd_ctx *ctx = file->private_data;
 
-   return events;
+   return ctx->ticks ? EPOLLIN : 0;
 }
 
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
@@ -364,7 +363,8 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
.release= timerfd_release,
-   .poll   = timerfd_poll,
+   .get_poll_head  = timerfd_get_poll_head,
+   .poll_mask  = timerfd_poll_mask,
.read   = timerfd_read,
.llseek = noop_llseek,
.show_fdinfo= timerfd_show,
-- 
2.14.2



[PATCH 27/30] pipe: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/pipe.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7b1954caf388..81937590ea0a 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,19 +509,22 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
 }
 
-/* No kernel lock held - fine */
-static __poll_t
-pipe_poll(struct file *filp, poll_table *wait)
+static struct wait_queue_head *
+pipe_get_poll_head(struct file *filp, __poll_t events)
 {
-   __poll_t mask;
struct pipe_inode_info *pipe = filp->private_data;
-   int nrbufs;
 
-   poll_wait(filp, &pipe->wait, wait);
+   return &pipe->wait;
+}
+
+/* No kernel lock held - fine */
+static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
+{
+   struct pipe_inode_info *pipe = filp->private_data;
+   int nrbufs = pipe->nrbufs;
+   __poll_t mask = 0;
 
/* Reading only -- no need for acquiring the semaphore.  */
-   nrbufs = pipe->nrbufs;
-   mask = 0;
if (filp->f_mode & FMODE_READ) {
mask = (nrbufs > 0) ? EPOLLIN | EPOLLRDNORM : 0;
if (!pipe->writers && filp->f_version != pipe->w_counter)
@@ -1015,7 +1018,8 @@ const struct file_operations pipefifo_fops = {
.llseek = no_llseek,
.read_iter  = pipe_read,
.write_iter = pipe_write,
-   .poll   = pipe_poll,
+   .get_poll_head  = pipe_get_poll_head,
+   .poll_mask  = pipe_poll_mask,
.unlocked_ioctl = pipe_ioctl,
.release= pipe_release,
.fasync = pipe_fasync,
-- 
2.14.2



[PATCH 28/30] eventfd: switch to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/eventfd.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 012f5bd46dfa..d70b4907f978 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,20 @@ static int eventfd_release(struct inode *inode, struct file *file)
return 0;
 }
 
-static __poll_t eventfd_poll(struct file *file, poll_table *wait)
+static struct wait_queue_head *
+eventfd_get_poll_head(struct file *file, __poll_t events)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+
+   return &ctx->wqh;
+}
+
+static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
struct eventfd_ctx *ctx = file->private_data;
__poll_t events = 0;
u64 count;
 
-   poll_wait(file, &ctx->wqh, wait);
-
/*
 * All writes to ctx->count occur within ctx->wqh.lock.  This read
 * can be done outside ctx->wqh.lock because we know that poll_wait
@@ -305,7 +311,8 @@ static const struct file_operations eventfd_fops = {
.show_fdinfo= eventfd_show_fdinfo,
 #endif
.release= eventfd_release,
-   .poll   = eventfd_poll,
+   .get_poll_head  = eventfd_get_poll_head,
+   .poll_mask  = eventfd_poll_mask,
.read   = eventfd_read,
.write  = eventfd_write,
.llseek = noop_llseek,
-- 
2.14.2



[PATCH 26/30] crypto: af_alg: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 crypto/af_alg.c | 13 +++--
 crypto/algif_aead.c |  4 ++--
 crypto/algif_skcipher.c |  4 ++--
 include/crypto/if_alg.h |  3 +--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 50d75de539f5..330aef1cd08b 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1060,19 +1060,12 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
 }
 EXPORT_SYMBOL_GPL(af_alg_async_cb);
 
-/**
- * af_alg_poll - poll system call handler
- */
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-poll_table *wait)
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct af_alg_ctx *ctx = ask->private;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
if (!ctx->more || ctx->used)
mask |= EPOLLIN | EPOLLRDNORM;
@@ -1082,7 +1075,7 @@ __poll_t af_alg_poll(struct file *file, struct socket *sock,
 
return mask;
 }
-EXPORT_SYMBOL_GPL(af_alg_poll);
+EXPORT_SYMBOL_GPL(af_alg_poll_mask);
 
 /**
  * af_alg_alloc_areq - allocate struct af_alg_async_req
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 4b07edd5a9ff..330cf9f2b767 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -375,7 +375,7 @@ static struct proto_ops algif_aead_ops = {
.sendmsg=   aead_sendmsg,
.sendpage   =   af_alg_sendpage,
.recvmsg=   aead_recvmsg,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static int aead_check_key(struct socket *sock)
@@ -471,7 +471,7 @@ static struct proto_ops algif_aead_ops_nokey = {
.sendmsg=   aead_sendmsg_nokey,
.sendpage   =   aead_sendpage_nokey,
.recvmsg=   aead_recvmsg_nokey,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static void *aead_bind(const char *name, u32 type, u32 mask)
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index c4e885df4564..15cf3c5222e0 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -205,7 +205,7 @@ static struct proto_ops algif_skcipher_ops = {
.sendmsg=   skcipher_sendmsg,
.sendpage   =   af_alg_sendpage,
.recvmsg=   skcipher_recvmsg,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static int skcipher_check_key(struct socket *sock)
@@ -301,7 +301,7 @@ static struct proto_ops algif_skcipher_ops_nokey = {
.sendmsg=   skcipher_sendmsg_nokey,
.sendpage   =   skcipher_sendpage_nokey,
.recvmsg=   skcipher_recvmsg_nokey,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static void *skcipher_bind(const char *name, u32 type, u32 mask)
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 482461d8931d..cc414db9da0a 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -245,8 +245,7 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags);
 void af_alg_free_resources(struct af_alg_async_req *areq);
 void af_alg_async_cb(struct crypto_async_request *_req, int err);
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-poll_table *wait);
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events);
 struct af_alg_async_req *af_alg_alloc_areq(struct sock *sk,
   unsigned int areqlen);
 int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
-- 
2.14.2



[PATCH 24/30] net/iucv: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/net/iucv/af_iucv.h | 2 --
 net/iucv/af_iucv.c | 7 ++-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/iucv/af_iucv.h b/include/net/iucv/af_iucv.h
index f4c21b5a1242..b0eaeb02d46d 100644
--- a/include/net/iucv/af_iucv.h
+++ b/include/net/iucv/af_iucv.h
@@ -153,8 +153,6 @@ struct iucv_sock_list {
atomic_t  autobind_name;
 };
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-   poll_table *wait);
 void iucv_sock_link(struct iucv_sock_list *l, struct sock *s);
 void iucv_sock_unlink(struct iucv_sock_list *l, struct sock *s);
 void iucv_accept_enqueue(struct sock *parent, struct sock *sk);
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 1e8cc7bcbca3..539a312dc481 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1489,14 +1489,11 @@ static inline __poll_t iucv_accept_poll(struct sock 
*parent)
return 0;
 }
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t iucv_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == IUCV_LISTEN)
return iucv_accept_poll(sk);
 
@@ -2389,7 +2386,7 @@ static const struct proto_ops iucv_sock_ops = {
.getname= iucv_sock_getname,
.sendmsg= iucv_sock_sendmsg,
.recvmsg= iucv_sock_recvmsg,
-   .poll   = iucv_sock_poll,
+   .poll_mask  = iucv_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
-- 
2.14.2



[PATCH 22/30] net/nfc: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/nfc/llcp_sock.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 376040092142..b6010750e634 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -549,16 +549,13 @@ static inline __poll_t llcp_accept_poll(struct sock 
*parent)
return 0;
 }
 
-static __poll_t llcp_sock_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t llcp_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
pr_debug("%p\n", sk);
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == LLCP_LISTEN)
return llcp_accept_poll(sk);
 
@@ -900,7 +897,7 @@ static const struct proto_ops llcp_sock_ops = {
.socketpair = sock_no_socketpair,
.accept = llcp_sock_accept,
.getname= llcp_sock_getname,
-   .poll   = llcp_sock_poll,
+   .poll_mask  = llcp_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = llcp_sock_listen,
.shutdown   = sock_no_shutdown,
@@ -920,7 +917,7 @@ static const struct proto_ops llcp_rawsock_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= llcp_sock_getname,
-   .poll   = llcp_sock_poll,
+   .poll_mask  = llcp_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
-- 
2.14.2



[PATCH 23/30] net/phonet: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/phonet/socket.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 28d981512f5f..70ac4539d5b7 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -341,15 +341,12 @@ static int pn_socket_getname(struct socket *sock, struct 
sockaddr *addr,
return 0;
 }
 
-static __poll_t pn_socket_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t pn_socket_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct pep_sock *pn = pep_sk(sk);
__poll_t mask = 0;
 
-   poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == TCP_CLOSE)
return EPOLLERR;
if (!skb_queue_empty(&sk->sk_receive_queue))
@@ -474,7 +471,7 @@ const struct proto_ops phonet_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = pn_socket_accept,
.getname= pn_socket_getname,
-   .poll   = pn_socket_poll,
+   .poll_mask  = pn_socket_poll_mask,
.ioctl  = pn_socket_ioctl,
.listen = pn_socket_listen,
.shutdown   = sock_no_shutdown,
-- 
2.14.2



[PATCH 21/30] net/caif: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/caif/caif_socket.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index a6fb1b3bcad9..c7991867d622 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -934,15 +934,11 @@ static int caif_release(struct socket *sock)
 }
 
 /* Copied from af_unix.c:unix_poll(), added CAIF tx_flow handling */
-static __poll_t caif_poll(struct file *file,
- struct socket *sock, poll_table *wait)
+static __poll_t caif_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   __poll_t mask;
struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err)
@@ -976,7 +972,7 @@ static const struct proto_ops caif_seqpacket_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
-   .poll = caif_poll,
+   .poll_mask = caif_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -997,7 +993,7 @@ static const struct proto_ops caif_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
-   .poll = caif_poll,
+   .poll_mask = caif_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
-- 
2.14.2



Re: [PATCH v3 6/6] phy: qcom-qusb2: Add QUSB2 PHYs support for sdm845

2018-03-28 Thread Manu Gautam
Hi,


On 3/28/2018 4:22 AM, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 22, 2018 at 11:11 PM, Manu Gautam  wrote:
>> There are two QUSB2 PHYs present on sdm845. Update PHY
>> registers programming for both the PHYs related to
>> electrical parameters to improve eye diagram.
> This tuning difference is truly associated with the SoC itself?  Are
> you sure?  Are the two different PHYs in the SoC somehow using
> different silicon processes?  ...or is one close to another IP block
> that is noisy?  ...or something else to account for this difference?
>
> It seems more likely that this tuning difference is associated with
> the board.  If you're _certain_ this is really due to internal SoC
> differences you'll have to come up with some darn good evidence to
> convince me...

This difference must be due to the board only.

>
> If the tuning is truly associated with the board then:
>
> 1. You should have a single device tree compatible string.  IMHO it
> should contain the name of the SoC in it, so "qcom,sdm845-qusb2-phy".
> It's generally OK to name something in Linux using the name of the
> first thing that happened to support it in Linux (even if later
> processors use the exact same component).  Leaving it as just
> "qcom,qusb2-v2-phy" is OK with me too if that's what everyone wants.
I will remove "qcom,qusb2-v2-phy" as I don't expect any users of that.
>
>
> 2. You should figure out how to describe the needed board-to-board
> tuning in device tree.
>
> The only two differences you have right now are:
>
> QUSB2PHY_IMP_CTRL1: 0 => 0x8
> QUSB2PHY_PORT_TUNE1: 0x30 => 0x48
>
> I'm not sure I found all the correct documentation for the PHY (the
> docs I have say that "TUNE1" bit 3 is "reserved") so I can't come up
> with all of these for you.  But I think I found the difference
> accounting for the upper nybble of TUNE1 changing from 0x3 to 0x4.
> For this, I think you'd want a device tree property like:
>
> qcom,hstx_trim_mv
>
> ...and the values of that property would be the values from 800 to 950
> in 8 steps, or [800, 821, 842, 864, 885, 907, 928, 950].
>
> You'd want to do similar things for the other differences.
>
> You don't need to encode every possible difference right now.  When
> you come up with something that needs to be different you add a new
> optional device tree property (defaulting to whatever the driver used
> to do) to describe your new property.

Sure. I will come up with separate device tree properties to specify
board-to-board differences in PHY tuning.
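
For illustration only, a minimal sketch of how a driver might turn the millivolt value suggested above into a register field. The helper names are hypothetical, and two things are assumptions rather than facts from this thread or from the PHY documentation: that the table index is the raw field value, and that the field sits in the upper nybble of PORT_TUNE1.

```c
#include <stddef.h>

/* Trim steps quoted in the review: 800 mV to 950 mV in 8 steps. */
static const unsigned int hstx_trim_mv[] = {
	800, 821, 842, 864, 885, 907, 928, 950,
};

#define HSTX_TRIM_STEPS	(sizeof(hstx_trim_mv) / sizeof(hstx_trim_mv[0]))

/*
 * Hypothetical helper: return the step index for a millivolt value
 * read from the device tree, or -1 if the value is not a listed step.
 */
static int hstx_trim_mv_to_step(unsigned int mv)
{
	size_t i;

	for (i = 0; i < HSTX_TRIM_STEPS; i++)
		if (hstx_trim_mv[i] == mv)
			return (int)i;

	return -1;
}

/*
 * Hypothetical helper: place the step in the upper nybble of a
 * PORT_TUNE1 value and keep the lower nybble unchanged.
 * ASSUMPTION: the field position is guessed from the review comment.
 */
static int port_tune1_set_hstx_trim(unsigned char tune1, unsigned int mv)
{
	int step = hstx_trim_mv_to_step(mv);

	if (step < 0)
		return -1;

	return (tune1 & 0x0f) | (step << 4);
}
```

Rejecting values that are not in the table, instead of rounding to the nearest step, keeps a typo in a board's device tree from silently selecting a different trim.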


>
> -Doug

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] drm/atmel-hlcdc: add command line option to specify preferred depth

2018-03-28 Thread Boris Brezillon
Hi Peter,

On Mon, 26 Mar 2018 09:35:02 +0200
Peter Rosin  wrote:

> I have an sama5d31-based system with 64MB of memory and a 1920x1080
> LVDS display wired for 16-bpp. When I enable legacy fbdev support,
> the contiguous memory allocator invariably fails with the order-11
> allocation for a 1920x1080@24-bpp buffer (~6MB). But this HW can never
> make any good use of RGB888, so that is a wasted attempt anyway that
> would also waste precious memory should it succeed.
> 
> Sure, I could rewrite user-space to go directly to KMS etc, and that
> makes the (attempted) order-11 allocation go away, replacing it with
> one order-10 allocation per application restart for a 1920x1080@16-bpp
> buffer (<4MB). But after a few restarts, order-10 allocations start to
> fail as well, which is only to be expected AFAIU.
> 
> So, I'd rather not change user-space (which was originally written
> to target a smaller display) so that I at the same time get the
> benefit of an early pre-allocated fbdev frame-buffer that can be
> reused over and over. But to do that I need to tell the driver that
> 16-bpp is the preferred depth. Add a module parameter to do just that.
> 
> Signed-off-by: Peter Rosin 
> ---
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> I found some inspiration regarding naming and implementation here:
> https://patchwork.kernel.org/patch/9848631/
> 
> I have found no feedback on that patch though, which makes me wonder if
> I'm perhaps barking up the wrong tree?

Hm, isn't that something you can already overload with the video=
parameter?

video=:[-]

AFAIR,  encodes the color depth, so what is the benefit of adding
this new property to overload the default depth?

Maybe I'm wrong and the default depth param is actually useful, but in
this case we should probably make it generic since other drivers seem
to need it too, and we might want to attach it to a specific display
engine instance.
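
For reference, the buffer sizes and allocation orders quoted in the patch description can be checked with a few lines of arithmetic. This is only a sketch: the helper names are made up, and it assumes 4 KiB pages and a power-of-two (order-based) allocation, as the description implies.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES	4096UL	/* ASSUMPTION: 4 KiB pages */

/* Bytes needed for one frame; bpp is bits per pixel (8, 16, 24, ...). */
static unsigned long fb_bytes(unsigned long width, unsigned long height,
			      unsigned long bpp)
{
	return width * height * (bpp / 8);
}

/* Smallest order such that (PAGE_SIZE_BYTES << order) covers size. */
static unsigned int alloc_order(unsigned long size)
{
	unsigned long pages = (size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;

	return order;
}
```

With these helpers, 1920x1080 at 24 bpp is 6220800 bytes (1519 pages, order 11), and at 16 bpp it is 4147200 bytes (1013 pages, order 10), which matches the "~6MB" and "<4MB" figures above.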

Thanks,

Boris

> 
> Cheers,
> Peter
> 
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c 
> b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> index c1ea5c36b006..f0148627c221 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> @@ -29,6 +29,11 @@
>  
>  #define ATMEL_HLCDC_LAYER_IRQS_OFFSET8
>  
> +static int atmel_hlcdc_preferred_depth __read_mostly;
> +
> +MODULE_PARM_DESC(preferreddepth, "Set preferred bpp");
> +module_param_named(preferreddepth, atmel_hlcdc_preferred_depth, int, 0400);
> +
>  static const struct atmel_hlcdc_layer_desc atmel_hlcdc_at91sam9n12_layers[] 
> = {
>   {
>   .name = "base",
> @@ -590,6 +595,7 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device 
> *dev)
>   dev->mode_config.min_height = dc->desc->min_height;
>   dev->mode_config.max_width = dc->desc->max_width;
>   dev->mode_config.max_height = dc->desc->max_height;
> + dev->mode_config.preferred_depth = 24;
>   dev->mode_config.funcs = &mode_config_funcs;
>  
>   return 0;
> @@ -658,7 +664,7 @@ static int atmel_hlcdc_dc_load(struct drm_device *dev)
>  
>   platform_set_drvdata(pdev, dev);
>  
> - drm_fb_cma_fbdev_init(dev, 24, 0);
> + drm_fb_cma_fbdev_init(dev, atmel_hlcdc_preferred_depth, 0);
>  
>   drm_kms_helper_poll_init(dev);
>  
> @@ -756,6 +762,16 @@ static int atmel_hlcdc_dc_drm_probe(struct 
> platform_device *pdev)
>   struct drm_device *ddev;
>   int ret;
>  
> + switch (atmel_hlcdc_preferred_depth) {
> + case 0: /* driver default */
> + case 8:
> + case 16:
> + case 24:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
>   ddev = drm_dev_alloc(&atmel_hlcdc_dc_driver, &pdev->dev);
>   if (IS_ERR(ddev))
>   return PTR_ERR(ddev);



-- 
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


[PATCH 19/30] net/sctp: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/net/sctp/sctp.h | 3 +--
 net/sctp/ipv6.c | 2 +-
 net/sctp/protocol.c | 2 +-
 net/sctp/socket.c   | 4 +---
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f7ae6b0a21d0..37abd5ba4a3f 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -107,8 +107,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb);
 int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 void sctp_data_ready(struct sock *sk);
-__poll_t sctp_poll(struct file *file, struct socket *sock,
-   poll_table *wait);
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events);
 void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
struct sctp_association *asoc);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e35d4f73d2df..6b0b8fc5b75a 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -976,7 +976,7 @@ static const struct proto_ops inet6_seqpacket_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = sctp_getname,
-   .poll  = sctp_poll,
+   .poll_mask = sctp_poll_mask,
.ioctl = inet6_ioctl,
.listen= sctp_inet_listen,
.shutdown  = inet_shutdown,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 91813e686c67..20c544890e80 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1024,7 +1024,7 @@ static const struct proto_ops inet_seqpacket_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet_getname,  /* Semantics are different.  */
-   .poll  = sctp_poll,
+   .poll_mask = sctp_poll_mask,
.ioctl = inet_ioctl,
.listen= sctp_inet_listen,
.shutdown  = inet_shutdown, /* Looks harmless.  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index bf271f8c2dc9..097454740929 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7587,14 +7587,12 @@ int sctp_inet_listen(struct socket *sock, int backlog)
  * here, again, by modeling the current TCP/UDP code.  We don't have
  * a good way to test with it yet.
  */
-__poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct sctp_sock *sp = sctp_sk(sk);
__poll_t mask;
 
-   poll_wait(file, sk_sleep(sk), wait);
-
sock_rps_record_flow(sk);
 
/* A TCP-style listening socket becomes readable when the accept queue
-- 
2.14.2



Re: [PATCH] perf trace: remove redundant ')'

2018-03-28 Thread Du, Changbin
Hi Arnaldo,
Just a kind reminder. Hope you didn't forget this.

On Fri, Mar 16, 2018 at 09:50:45AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 16, 2018 at 03:51:09PM +0800, Du, Changbin escreveu:
> > Hi Arnaldo, How about this simple one? Thanks.
> > 
> > On Tue, Mar 13, 2018 at 06:40:01PM +0800, changbin...@intel.com wrote:
> > > From: Changbin Du 
> > > 
> > > There is a redundant ')' at the tail of each event. So remove it.
> > > $ sudo perf trace --no-syscalls -e 'kmem:*' -a
> > >899.342 kmem:kfree:(vfs_writev+0xb9) call_site=9c453979 
> > > ptr=(nil))
> > >899.344 kmem:kfree:(___sys_recvmsg+0x188) call_site=9c9b8b88 
> > > ptr=(nil))
> > > 
> > > Signed-off-by: Changbin Du 
> > > ---
> > >  tools/perf/builtin-trace.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > > index e7f1b18..7273f5f 100644
> > > --- a/tools/perf/builtin-trace.c
> > > +++ b/tools/perf/builtin-trace.c
> > > @@ -1959,7 +1959,7 @@ static int trace__event_handler(struct trace 
> > > *trace, struct perf_evsel *evsel,
> > > trace->output);
> > >   }
> > >  
> > > - fprintf(trace->output, ")\n");
> > > + fprintf(trace->output, "\n");
> 
> It looks simple on the surface, but I couldn't quickly recall why this
> ')' was put there in the first place... So I left for later to do a 'git
> blame' on this file, etc.
> 
> - Arnaldo
> 
> > >   if (callchain_ret > 0)
> > >   trace__fprintf_callchain(trace, sample);
> > > -- 
> > > 2.7.4
> > > 
> > 
> > -- 
> > Thanks,
> > Changbin Du

-- 
Thanks,
Changbin Du


[PATCH 17/30] net/vmw_vsock: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/vmw_vsock/af_vsock.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e0fc84daed94..b9210329bda8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -850,18 +850,11 @@ static int vsock_shutdown(struct socket *sock, int mode)
return err;
 }
 
-static __poll_t vsock_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t vsock_poll_mask(struct socket *sock, __poll_t events)
 {
-   struct sock *sk;
-   __poll_t mask;
-   struct vsock_sock *vsk;
-
-   sk = sock->sk;
-   vsk = vsock_sk(sk);
-
-   poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   struct sock *sk = sock->sk;
+   struct vsock_sock *vsk = vsock_sk(sk);
+   __poll_t mask = 0;
 
if (sk->sk_err)
/* Signify that there has been an error on this socket. */
@@ -1091,7 +1084,7 @@ static const struct proto_ops vsock_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = vsock_getname,
-   .poll = vsock_poll,
+   .poll_mask = vsock_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = vsock_shutdown,
@@ -1849,7 +1842,7 @@ static const struct proto_ops vsock_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = vsock_accept,
.getname = vsock_getname,
-   .poll = vsock_poll,
+   .poll_mask = vsock_poll_mask,
.ioctl = sock_no_ioctl,
.listen = vsock_listen,
.shutdown = vsock_shutdown,
-- 
2.14.2



[PATCH 14/30] net: convert datagram_poll users to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
---
 drivers/isdn/mISDN/socket.c|  2 +-
 drivers/net/ppp/pppoe.c|  2 +-
 drivers/staging/ipx/af_ipx.c   |  2 +-
 drivers/staging/irda/net/af_irda.c |  6 +++---
 include/linux/skbuff.h |  3 +--
 include/net/udp.h  |  2 +-
 net/appletalk/ddp.c|  2 +-
 net/ax25/af_ax25.c |  2 +-
 net/bluetooth/hci_sock.c   |  2 +-
 net/can/bcm.c  |  2 +-
 net/can/raw.c  |  2 +-
 net/core/datagram.c| 13 -
 net/decnet/af_decnet.c |  6 +++---
 net/ieee802154/socket.c|  4 ++--
 net/ipv4/af_inet.c |  6 +++---
 net/ipv4/udp.c | 10 +-
 net/ipv6/af_inet6.c|  2 +-
 net/ipv6/raw.c |  4 ++--
 net/kcm/kcmsock.c  |  4 ++--
 net/key/af_key.c   |  2 +-
 net/l2tp/l2tp_ip.c |  2 +-
 net/l2tp/l2tp_ip6.c|  2 +-
 net/l2tp/l2tp_ppp.c|  2 +-
 net/llc/af_llc.c   |  2 +-
 net/netlink/af_netlink.c   |  2 +-
 net/netrom/af_netrom.c |  2 +-
 net/nfc/rawsock.c  |  4 ++--
 net/packet/af_packet.c |  9 -
 net/phonet/socket.c|  2 +-
 net/qrtr/qrtr.c|  2 +-
 net/rose/af_rose.c |  2 +-
 net/x25/af_x25.c   |  2 +-
 32 files changed, 52 insertions(+), 59 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c84270e16bdd..61d6e4c9e7d1 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -589,7 +589,7 @@ static const struct proto_ops data_sock_ops = {
.getname= data_sock_getname,
.sendmsg= mISDN_sock_sendmsg,
.recvmsg= mISDN_sock_recvmsg,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 5aa59f41bf8c..8c311e626884 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1120,7 +1120,7 @@ static const struct proto_ops pppoe_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= pppoe_getname,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index d21a9d128d3e..3373f7f67d35 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1967,7 +1967,7 @@ static const struct proto_ops ipx_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= ipx_getname,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.ioctl  = ipx_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl   = ipx_compat_ioctl,
diff --git a/drivers/staging/irda/net/af_irda.c 
b/drivers/staging/irda/net/af_irda.c
index 2f1e9ab3d6d0..77659b1c40ba 100644
--- a/drivers/staging/irda/net/af_irda.c
+++ b/drivers/staging/irda/net/af_irda.c
@@ -2600,7 +2600,7 @@ static const struct proto_ops irda_seqpacket_ops = {
.socketpair =   sock_no_socketpair,
.accept =   irda_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
@@ -2624,7 +2624,7 @@ static const struct proto_ops irda_dgram_ops = {
.socketpair =   sock_no_socketpair,
.accept =   irda_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
@@ -2649,7 +2649,7 @@ static const struct proto_ops irda_ultra_ops = {
.socketpair =   sock_no_socketpair,
.accept =   sock_no_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ddf77cf4ff2d..1ac027bd33ec 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3246,8 +3246,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, 
unsigned flags,
int *peeked, int *off, int *err);
 struct s

[PATCH 16/30] net/atm: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/atm/common.c | 11 +++
 net/atm/common.h |  2 +-
 net/atm/pvc.c|  2 +-
 net/atm/svc.c|  2 +-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/atm/common.c b/net/atm/common.c
index fc78a0508ae1..1f2af59935db 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -648,16 +648,11 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, 
size_t size)
return error;
 }
 
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   struct atm_vcc *vcc;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
-
-   vcc = ATM_SD(sock);
+   struct atm_vcc *vcc = ATM_SD(sock);
+   __poll_t mask = 0;
 
/* exceptional events */
if (sk->sk_err)
diff --git a/net/atm/common.h b/net/atm/common.h
index 5850649068bb..526796ad230f 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -17,7 +17,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int 
vci);
 int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags);
 int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len);
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events);
 int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_setsockopt(struct socket *sock, int level, int optname,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index e1140b3bdcaa..930651c5e77c 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -114,7 +114,7 @@ static const struct proto_ops pvc_proto_ops = {
.socketpair =   sock_no_socketpair,
.accept =   sock_no_accept,
.getname =  pvc_getname,
-   .poll = vcc_poll,
+   .poll_mask =vcc_poll_mask,
.ioctl =vcc_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = vcc_compat_ioctl,
diff --git a/net/atm/svc.c b/net/atm/svc.c
index c458adcbc177..ad0e6ffb9cfe 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -637,7 +637,7 @@ static const struct proto_ops svc_proto_ops = {
.socketpair =   sock_no_socketpair,
.accept =   svc_accept,
.getname =  svc_getname,
-   .poll = vcc_poll,
+   .poll_mask =vcc_poll_mask,
.ioctl =svc_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = svc_compat_ioctl,
-- 
2.14.2



[PATCH 15/30] net/dccp: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 net/dccp/dccp.h  |  3 +--
 net/dccp/ipv4.c  |  2 +-
 net/dccp/ipv6.c  |  2 +-
 net/dccp/proto.c | 13 ++---
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index f91e3816806b..0ea2ee56ac1b 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -316,8 +316,7 @@ int dccp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
 int flags, int *addr_len);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-  poll_table *wait);
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events);
 int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 void dccp_req_err(struct sock *sk, u64 seq);
 
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index e65fcb45c3f6..e8476f319efd 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -983,7 +983,7 @@ static const struct proto_ops inet_dccp_ops = {
.accept= inet_accept,
.getname   = inet_getname,
/* FIXME: work on tcp_poll to rename it to inet_csk_poll */
-   .poll  = dccp_poll,
+   .poll_mask = dccp_poll_mask,
.ioctl = inet_ioctl,
/* FIXME: work on inet_listen to rename it to sock_common_listen */
.listen= inet_dccp_listen,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 5df7857fc0f3..f0aac8e4b888 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1069,7 +1069,7 @@ static const struct proto_ops inet6_dccp_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet6_getname,
-   .poll  = dccp_poll,
+   .poll_mask = dccp_poll_mask,
.ioctl = inet6_ioctl,
.listen= inet_dccp_listen,
.shutdown  = inet_shutdown,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 15bdc002d90c..26816032a7c2 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -314,20 +314,11 @@ int dccp_disconnect(struct sock *sk, int flags)
 
 EXPORT_SYMBOL_GPL(dccp_disconnect);
 
-/*
- * Wait for a DCCP event.
- *
- * Note that we don't need to lock the socket, as the upper poll layers
- * take care of normal races (between the test and the event) and we don't
- * go look at any of the socket buffers directly.
- */
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events)
 {
__poll_t mask;
struct sock *sk = sock->sk;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
if (sk->sk_state == DCCP_LISTEN)
return inet_csk_listen_poll(sk);
 
@@ -369,7 +360,7 @@ __poll_t dccp_poll(struct file *file, struct socket *sock,
return mask;
 }
 
-EXPORT_SYMBOL_GPL(dccp_poll);
+EXPORT_SYMBOL_GPL(dccp_poll_mask);
 
 int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-- 
2.14.2



Re: [PATCH] Input: ALPS - add support for 73 03 28 devices (Thinkpad L570)

2018-03-28 Thread Pali Rohár
Ah, this is the same issue again... It was already discussed here:

https://patchwork.kernel.org/patch/10081557/

-- 
Pali Rohár
pali.ro...@gmail.com


[PATCH 12/30] net/tcp: convert to ->poll_mask

2018-03-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/net/tcp.h   |  4 ++--
 net/ipv4/af_inet.c  |  3 ++-
 net/ipv4/tcp.c  | 31 ++-
 net/ipv6/af_inet6.c |  3 ++-
 4 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e3fc667f9ac2..fb52f93d556c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -387,8 +387,8 @@ bool tcp_peer_is_proven(struct request_sock *req, struct 
dst_entry *dst);
 void tcp_close(struct sock *sk, long timeout);
 void tcp_init_sock(struct sock *sk);
 void tcp_init_transfer(struct sock *sk, int bpf_op);
-__poll_t tcp_poll(struct file *file, struct socket *sock,
- struct poll_table_struct *wait);
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t 
events);
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int __user *optlen);
 int tcp_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e4329e161943..ec32cc263b18 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -952,7 +952,8 @@ const struct proto_ops inet_stream_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet_getname,
-   .poll  = tcp_poll,
+   .get_poll_head = tcp_get_poll_head,
+   .poll_mask = tcp_poll_mask,
.ioctl = inet_ioctl,
.listen= inet_listen,
.shutdown  = inet_shutdown,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 48636aee23c3..ad8e281066a0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -484,33 +484,30 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
}
 }
 
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events)
+{
+   sock_poll_busy_loop(sock, events);
+   sock_rps_record_flow(sock->sk);
+   return sk_sleep(sock->sk);
+}
+EXPORT_SYMBOL(tcp_get_poll_head);
+
 /*
- * Wait for a TCP event.
- *
- * Note that we don't need to lock the socket, as the upper poll layers
- * take care of normal races (between the test and the event) and we don't
- * go look at any of the socket buffers directly.
+ * Socket is not locked. We are protected from async events by poll logic and
+ * correct handling of state changes made by other threads is impossible in
+ * any case.
  */
-__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events)
 {
-   __poll_t mask;
struct sock *sk = sock->sk;
const struct tcp_sock *tp = tcp_sk(sk);
+   __poll_t mask = 0;
int state;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
return inet_csk_listen_poll(sk);
 
-   /* Socket is not locked. We are protected from async events
-* by poll logic and correct handling of state changes
-* made by other threads is impossible in any case.
-*/
-
-   mask = 0;
-
/*
 * EPOLLHUP is certainly not done right. But poll() doesn't
 * have a notion of HUP in just one direction, and for a
@@ -591,7 +588,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, 
poll_table *wait)
 
return mask;
 }
-EXPORT_SYMBOL(tcp_poll);
+EXPORT_SYMBOL(tcp_poll_mask);
 
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 416917719a6f..c470549d6ef9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -547,7 +547,8 @@ const struct proto_ops inet6_stream_ops = {
.socketpair= sock_no_socketpair,/* a do nothing */
.accept= inet_accept,   /* ok   */
.getname   = inet6_getname,
-   .poll  = tcp_poll,  /* ok   */
+   .get_poll_head = tcp_get_poll_head,
+   .poll_mask = tcp_poll_mask, /* ok   */
.ioctl = inet6_ioctl,   /* must change  */
.listen= inet_listen,   /* ok   */
.shutdown  = inet_shutdown, /* ok   */
-- 
2.14.2
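
As a reading aid, here is a small userspace mock of the split this patch makes: one hook returns the wait queue, the other only computes the event mask. Every mock_* name is hypothetical, and the way mock_poll() combines the two hooks is an assumption about the caller; the core patches of the series are not part of this message, so this is a sketch of the pattern rather than the kernel's implementation.

```c
#include <stddef.h>

typedef unsigned int mock_poll_t;

#define MOCK_POLLIN	0x1u
#define MOCK_POLLOUT	0x4u

/* Stand-in for a wait queue head; only counts registrations. */
struct mock_wait_queue_head {
	int nr_waiters;
};

struct mock_sock {
	struct mock_wait_queue_head wq;
	int rx_ready;	/* data queued for reading */
	int tx_space;	/* room available for writing */
};

/* The two hooks that replace the single ->poll method. */
struct mock_proto_ops {
	struct mock_wait_queue_head *(*get_poll_head)(struct mock_sock *sk,
						      mock_poll_t events);
	mock_poll_t (*poll_mask)(struct mock_sock *sk, mock_poll_t events);
};

/* Returns where to sleep; does not look at socket state. */
static struct mock_wait_queue_head *
mock_get_poll_head(struct mock_sock *sk, mock_poll_t events)
{
	(void)events;
	return &sk->wq;
}

/* Computes the ready mask; never touches the wait queue. */
static mock_poll_t mock_poll_mask(struct mock_sock *sk, mock_poll_t events)
{
	mock_poll_t mask = 0;

	(void)events;	/* treated as a hint only in this sketch */
	if (sk->rx_ready)
		mask |= MOCK_POLLIN;
	if (sk->tx_space)
		mask |= MOCK_POLLOUT;

	return mask;
}

/* ASSUMPTION: a caller can rebuild the old ->poll behaviour like this. */
static mock_poll_t mock_poll(const struct mock_proto_ops *ops,
			     struct mock_sock *sk, mock_poll_t events)
{
	struct mock_wait_queue_head *head = ops->get_poll_head(sk, events);

	if (head)
		head->nr_waiters++;	/* stands in for poll_wait() */

	return ops->poll_mask(sk, events);
}
```

The point of the split is that the mask computation no longer needs a struct file or a poll_table, which is why the converted functions in this series simply drop their sock_poll_wait()/poll_wait() calls.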



Re: [PATCH] Input: ALPS - fix DualPoint flag for 74 03 28 devices

2018-03-28 Thread Pali Rohár
Ideally this information should be put as a comment into the code. It is
really not obvious.

On Tuesday 27 March 2018 04:37:19 Masaki Ota wrote:
> Hi, 
> 
> We can get OTP page 0 value by EA EA E9 commands, but we cannot get it by EA 
> EA EA E9.
> As far as I remember, Device initialization finish at EA command, then sends 
> EA EA E9 commands.
> In this case we cannot get correct OTP page 0 value.
> So I changed this order. (We can get OTP page 1 value by both of EA F0 F0 E9 
> and F0 F0 E9.)
> 
> Best Regards,
> Masaki Ota
> -Original Message-
> From: Pali Rohár [mailto:pali.ro...@gmail.com] 
> Sent: Monday, March 26, 2018 6:26 AM
> To: 太田 真喜 Masaki Ota 
> Cc: Dmitry Torokhov ; linux-in...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Aaron Ma 
> Subject: Re: [PATCH] Input: ALPS - fix DualPoint flag for 74 03 28 devices
> 
> On Tuesday 20 March 2018 11:47:26 Dmitry Torokhov wrote:
> > On Mon, Jan 29, 2018 at 2:51 PM, dmitry.torok...@gmail.com 
> >  wrote:
> > > Hi,
> > >
> > > On Thu, Nov 16, 2017 at 07:27:02AM +, Masaki Ota wrote:
> > >> Hi, Pali, Aaron,
> > >>
> > >> Current code is correct device setting, previous code is wrong.
> > >> If the trackstick does not work(DUALPOINT flag disable), Device Firmware 
> > >> setting is wrong.
> > >>
> > >> But recently I received the same report from Thinkpad L570 user, and I 
> > >> checked this device and found this device Firmware setting is wrong. 
> > >> Sorry for our mistake.
> > >> Is your laptop L570 ?
> > >>
> > >> I will add code that supports the trackstick for this device.
> > >
> > > Sorry for resurrecting this old thread, I am just trying to 
> > > understand what went wrong here. Is the sequence of "f0 f0 e9" and 
> > > "ea ea e9" is important in getting the correct OTP data and we 
> > > originally got this order wrong? It is not clear from the original 
> > > patch and discussion that this change was intentional.
> > 
> > Could I please get an answer to my question?
> > 
> > Thanks!
> 
> Masaki, this question is for you ↑↑↑
> 
> > >
> > > Thanks.
> > >
> > >>
> > >> Best Regards,
> > >> Masaki Ota
> > >> -Original Message-
> > >> From: Pali Rohár [mailto:pali.ro...@gmail.com]
> > >> Sent: Wednesday, November 15, 2017 5:35 PM
> > >> To: 太田 真喜 Masaki Ota 
> > >> Cc: linux-in...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> > >> dmitry.torok...@gmail.com; Aaron Ma 
> > >> Subject: Re: [PATCH] Input: ALPS - fix DualPoint flag for 74 03 28 
> > >> devices
> > >>
> > >> On Wednesday 15 November 2017 14:34:04 Aaron Ma wrote:
> > >> > There is a regression of commit 4a646580f793 ("Input: ALPS - fix 
> > >> > two-finger scroll breakage"), ALPS device fails with log:
> > >> >
> > >> > psmouse serio1: alps: Rejected trackstick packet from non 
> > >> > DualPoint device
> > >> >
> > >> > ALPS device with id "74 03 28" report OTP[0] data 0xCE after 
> > >> > commit 4a646580f793, after restore the OTP reading order, it 
> > >> > becomes to 0x10 as before and reports the right flag.
> > >> >
> > >> > Fixes: 4a646580f793 ("Input: ALPS - fix two-finger scroll 
> > >> > breakage")
> > >> > Cc: 
> > >> > Signed-off-by: Aaron Ma 
> > >> > ---
> > >> >  drivers/input/mouse/alps.c | 4 ++--
> > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/drivers/input/mouse/alps.c 
> > >> > b/drivers/input/mouse/alps.c index 579b899add26..c59b8f7ca2fc 
> > >> > 100644
> > >> > --- a/drivers/input/mouse/alps.c
> > >> > +++ b/drivers/input/mouse/alps.c
> > >> > @@ -2562,8 +2562,8 @@ static int alps_set_defaults_ss4_v2(struct 
> > >> > psmouse *psmouse,
> > >> >
> > >> > memset(otp, 0, sizeof(otp));
> > >> >
> > >> > -   if (alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]) ||
> > >> > -   alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]))
> > >> > +   if (alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]) ||
> > >> > +   alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]))
> > >> > return -1;
> > >> >
> > >> > alps_update_device_area_ss4_v2(otp, priv);
> > >>
> > >> Masaki Ota, please look at this patch as it partially revert your 
> > >> commit
> > >> 4a646580f793 ("Input: ALPS - fix two-finger scroll breakage"). Something 
> > >> smells here.
> > >>
> > >> --
> > >> Pali Rohár
> > >> pali.ro...@gmail.com
> > >
> > > --
> > > Dmitry
> > 
> > 
> > 
> 
> --
> Pali Rohár
> pali.ro...@gmail.com

-- 
Pali Rohár
pali.ro...@gmail.com


[PATCH] drm: Use srcu to protect drm_device.unplugged

2018-03-28 Thread Oleksandr Andrushchenko
From: Noralf Trønnes 

Use srcu to protect drm_device.unplugged in a race free manner.
Drivers can use drm_dev_enter()/drm_dev_exit() to protect and mark
sections preventing access to device resources that are not available
after the device is gone.
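
A minimal userspace sketch of the calling pattern described above, not the
kernel implementation. All names (struct model_dev, model_dev_enter(),
model_dev_exit(), model_dev_unplug(), model_write_reg()) are hypothetical,
and a plain counter stands in for the SRCU read side. Unlike the real
drm_dev_exit(), the model's exit helper also takes the device, because there
is no global SRCU domain here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_device; only the flag matters. */
struct model_dev {
	bool unplugged;
	int active_readers;	/* models the SRCU read-side count */
	int hw_writes;		/* accesses that reached the "hardware" */
};

/* Models drm_dev_enter(): take the read side first, then test the flag. */
bool model_dev_enter(struct model_dev *dev, int *idx)
{
	*idx = dev->active_readers++;
	if (dev->unplugged) {
		dev->active_readers--;
		return false;
	}
	return true;
}

/* Models drm_dev_exit(): drop the read side taken by model_dev_enter(). */
void model_dev_exit(struct model_dev *dev, int idx)
{
	(void)idx;	/* the model does not need the token */
	assert(dev->active_readers > 0);
	dev->active_readers--;
}

/* Models drm_dev_unplug(): publish the flag. */
void model_dev_unplug(struct model_dev *dev)
{
	dev->unplugged = true;
	/* synchronize_srcu() would wait here until active_readers is 0 */
}

/* A driver entry point: touch the device only inside the section. */
int model_write_reg(struct model_dev *dev)
{
	int idx;

	if (!model_dev_enter(dev, &idx))
		return -1;	/* device is gone; a driver might return -ENODEV */

	dev->hw_writes++;
	model_dev_exit(dev, idx);
	return 0;
}
```

The point of the pattern is that the flag is tested inside the read-side
section, so once the unplug path has finished waiting, no section that saw
the old value can still be running.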

Suggested-by: Daniel Vetter 
Signed-off-by: Noralf Trønnes 
Signed-off-by: Oleksandr Andrushchenko 
Reviewed-by: Oleksandr Andrushchenko 
Tested-by: Oleksandr Andrushchenko 
Cc: intel-...@lists.freedesktop.org
---
 drivers/gpu/drm/drm_drv.c | 54 ++-
 include/drm/drm_device.h  |  9 +++-
 include/drm/drm_drv.h | 15 +
 3 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index a1b9338736e3..32a83b41ab61 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -75,6 +76,8 @@ static bool drm_core_init_complete = false;
 
 static struct dentry *drm_debugfs_root;
 
+DEFINE_STATIC_SRCU(drm_unplug_srcu);
+
 /*
  * DRM Minors
  * A DRM device can provide several char-dev interfaces on the DRM-Major. Each
@@ -318,18 +321,51 @@ void drm_put_dev(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_put_dev);
 
-static void drm_device_set_unplugged(struct drm_device *dev)
+/**
+ * drm_dev_enter - Enter device critical section
+ * @dev: DRM device
+ * @idx: Pointer to index that will be passed to the matching drm_dev_exit()
+ *
+ * This function marks and protects the beginning of a section that should not
+ * be entered after the device has been unplugged. The section end is marked
+ * with drm_dev_exit(). Calls to this function can be nested.
+ *
+ * Returns:
+ * True if it is OK to enter the section, false otherwise.
+ */
+bool drm_dev_enter(struct drm_device *dev, int *idx)
+{
+   *idx = srcu_read_lock(&drm_unplug_srcu);
+
+   if (dev->unplugged) {
+   srcu_read_unlock(&drm_unplug_srcu, *idx);
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(drm_dev_enter);
+
+/**
+ * drm_dev_exit - Exit device critical section
+ * @idx: index returned from drm_dev_enter()
+ *
+ * This function marks the end of a section that should not be entered after
+ * the device has been unplugged.
+ */
+void drm_dev_exit(int idx)
 {
-   smp_wmb();
-   atomic_set(&dev->unplugged, 1);
+   srcu_read_unlock(&drm_unplug_srcu, idx);
 }
+EXPORT_SYMBOL(drm_dev_exit);
 
 /**
  * drm_dev_unplug - unplug a DRM device
  * @dev: DRM device
  *
  * This unplugs a hotpluggable DRM device, which makes it inaccessible to
- * userspace operations. Entry-points can use drm_dev_is_unplugged(). This
+ * userspace operations. Entry-points can use drm_dev_enter() and
+ * drm_dev_exit() to protect device resources in a race free manner. This
  * essentially unregisters the device like drm_dev_unregister(), but can be
  * called while there are still open users of @dev.
  */
@@ -338,10 +374,18 @@ void drm_dev_unplug(struct drm_device *dev)
drm_dev_unregister(dev);
 
mutex_lock(&drm_global_mutex);
-   drm_device_set_unplugged(dev);
if (dev->open_count == 0)
drm_dev_put(dev);
mutex_unlock(&drm_global_mutex);
+
+   /*
+* After synchronizing any critical read section is guaranteed to see
+* the new value of ->unplugged, and any critical section which might
+* still have seen the old value of ->unplugged is guaranteed to have
+* finished.
+*/
+   dev->unplugged = true;
+   synchronize_srcu(&drm_unplug_srcu);
 }
 EXPORT_SYMBOL(drm_dev_unplug);
 
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 7c4fa32f3fc6..3a0eac2885b7 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -46,7 +46,14 @@ struct drm_device {
/* currently active master for this device. Protected by master_mutex */
struct drm_master *master;
 
-   atomic_t unplugged; /**< Flag whether dev is dead */
+   /**
+* @unplugged:
+*
+* Flag to tell if the device has been unplugged.
+* See drm_dev_enter() and drm_dev_is_unplugged().
+*/
+   bool unplugged;
+
struct inode *anon_inode;   /**< inode for private 
address-space */
char *unique;   /**< unique name of the device 
*/
/*@} */
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index d23dcdd1bd95..7e545f5f94d3 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -624,6 +624,8 @@ void drm_dev_get(struct drm_device *dev);
 void drm_dev_put(struct drm_device *dev);
 void drm_dev_unref(struct drm_device *dev);
 void drm_put_dev(struct drm_device *dev);
+bool drm_dev_enter(struct drm_device *dev, int *idx);
+void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
 
 /**
@@ -635,11 +637,16 @@ void drm_dev_unplug(s

[PATCH 09/30] net: refactor socket_poll

2018-03-28 Thread Christoph Hellwig
Factor out two busy poll related helpers for later reuse, and remove
a comment that isn't very helpful, especially with the __poll_t
annotations in place.

Signed-off-by: Christoph Hellwig 
---
 include/net/busy_poll.h | 15 +++
 net/socket.c| 21 -
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a939bf8..c5187438af38 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -121,6 +121,21 @@ static inline void sk_busy_loop(struct sock *sk, int 
nonblock)
 #endif
 }
 
+static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
+{
+   if (sk_can_busy_loop(sock->sk) &&
+   events && (events & POLL_BUSY_LOOP)) {
+   /* once, only if requested by syscall */
+   sk_busy_loop(sock->sk, 1);
+   }
+}
+
+/* if this socket can poll_ll, tell the system call */
+static inline __poll_t sock_poll_busy_flag(struct socket *sock)
+{
+   return sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0;
+}
+
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index a93c99b518ca..3f859a07641a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1117,24 +1117,11 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
-   __poll_t busy_flag = 0;
-   struct socket *sock;
-
-   /*
-*  We can't return errors to poll, so it's either yes or no.
-*/
-   sock = file->private_data;
-
-   if (sk_can_busy_loop(sock->sk)) {
-   /* this socket can poll_ll so tell the system call */
-   busy_flag = POLL_BUSY_LOOP;
-
-   /* once, only if requested by syscall */
-   if (wait && (wait->_key & POLL_BUSY_LOOP))
-   sk_busy_loop(sock->sk, 1);
-   }
+   struct socket *sock = file->private_data;
+   __poll_t events = poll_requested_events(wait);
 
-   return busy_flag | sock->ops->poll(file, sock, wait);
+   sock_poll_busy_loop(sock, events);
+   return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2



[PATCH 11/30] net: remove sock_no_poll

2018-03-28 Thread Christoph Hellwig
Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.

Signed-off-by: Christoph Hellwig 
---
 crypto/af_alg.c | 1 -
 crypto/algif_hash.c | 2 --
 crypto/algif_rng.c  | 1 -
 drivers/isdn/mISDN/socket.c | 1 -
 drivers/net/ppp/pptp.c  | 1 -
 include/net/sock.h  | 2 --
 net/bluetooth/bnep/sock.c   | 1 -
 net/bluetooth/cmtp/sock.c   | 1 -
 net/bluetooth/hidp/sock.c   | 1 -
 net/core/sock.c | 6 --
 10 files changed, 17 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index c49766b03165..50d75de539f5 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -347,7 +347,6 @@ static const struct proto_ops alg_proto_ops = {
.sendpage   =   sock_no_sendpage,
.sendmsg=   sock_no_sendmsg,
.recvmsg=   sock_no_recvmsg,
-   .poll   =   sock_no_poll,
 
.bind   =   alg_bind,
.release=   af_alg_release,
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 6c9b1927a520..bfcf595fd8f9 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -288,7 +288,6 @@ static struct proto_ops algif_hash_ops = {
.mmap   =   sock_no_mmap,
.bind   =   sock_no_bind,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
 
.release=   af_alg_release,
.sendmsg=   hash_sendmsg,
@@ -396,7 +395,6 @@ static struct proto_ops algif_hash_ops_nokey = {
.mmap   =   sock_no_mmap,
.bind   =   sock_no_bind,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
 
.release=   af_alg_release,
.sendmsg=   hash_sendmsg_nokey,
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 150c2b6480ed..22df3799a17b 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -106,7 +106,6 @@ static struct proto_ops algif_rng_ops = {
.bind   =   sock_no_bind,
.accept =   sock_no_accept,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
.sendmsg=   sock_no_sendmsg,
.sendpage   =   sock_no_sendpage,
 
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c5603d1a07d6..c84270e16bdd 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -746,7 +746,6 @@ static const struct proto_ops base_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 6dde9a0cfe76..87f892f1d0fe 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -627,7 +627,6 @@ static const struct proto_ops pptp_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= pptp_getname,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/include/net/sock.h b/include/net/sock.h
index 169c92afcafa..d9249fe65859 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1585,8 +1585,6 @@ int sock_no_connect(struct socket *, struct sockaddr *, 
int, int);
 int sock_no_socketpair(struct socket *, struct socket *);
 int sock_no_accept(struct socket *, struct socket *, int, bool);
 int sock_no_getname(struct socket *, struct sockaddr *, int *, int);
-__poll_t sock_no_poll(struct file *, struct socket *,
- struct poll_table_struct *);
 int sock_no_ioctl(struct socket *, unsigned int, unsigned long);
 int sock_no_listen(struct socket *, int);
 int sock_no_shutdown(struct socket *, int);
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index b5116fa9835e..00deacdcb51c 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -175,7 +175,6 @@ static const struct proto_ops bnep_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index ce86a7bae844..e08f28fadd65 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -178,7 +178,6 @@ static const struct proto_ops cmtp_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,

[PATCH 06/30] aio: simplify cancellation

2018-03-28 Thread Christoph Hellwig
With the current aio code there is no need for the magic KIOCB_CANCELLED
value, as a cancellation just kicks the driver to queue the completion
ASAP, with all actual completion handling done in another thread. Given
that both the completion path and cancellation take the context lock there
is no need for magic cmpxchg loops either.  If we remove iocbs from the
active list before calling ->ki_cancel we can also rely on the invariant
that anything found on the list has a ->ki_cancel callback and can be
cancelled, further simplifying the code.

Signed-off-by: Christoph Hellwig 
---
 fs/aio.c | 34 ++
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0df07d399a05..c36eec8b0879 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -162,19 +162,6 @@ struct fsync_iocb {
booldatasync;
 };
 
-/*
- * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
- * cancelled or completed (this makes a certain amount of sense because
- * successful cancellation - io_cancel() - does deliver the completion to
- * userspace).
- *
- * And since most things don't implement kiocb cancellation and we'd really 
like
- * kiocb completion to be lockless when possible, we use ki_cancel to
- * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
- * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
- */
-#define KIOCB_CANCELLED((void *) (~0ULL))
-
 struct aio_kiocb {
union {
struct kiocbrw;
@@ -574,23 +561,8 @@ EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
 static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
-   kiocb_cancel_fn *old, *cancel;
-
-   /*
-* Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
-* actually has a cancel function, hence the cmpxchg()
-*/
-
-   cancel = READ_ONCE(kiocb->ki_cancel);
-   do {
-   if (!cancel || cancel == KIOCB_CANCELLED)
-   return -EINVAL;
-
-   old = cancel;
-   cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
-   } while (cancel != old);
-
-   return cancel(&kiocb->rw);
+   list_del_init(&kiocb->ki_list);
+   return kiocb->ki_cancel(&kiocb->rw);
 }
 
 static void free_ioctx(struct work_struct *work)
@@ -633,8 +605,6 @@ static void free_ioctx_users(struct percpu_ref *ref)
while (!list_empty(&ctx->active_reqs)) {
req = list_first_entry(&ctx->active_reqs,
   struct aio_kiocb, ki_list);
-
-   list_del_init(&req->ki_list);
kiocb_cancel(req);
}
 
-- 
2.14.2



[PATCH 07/30] aio: add delayed cancel support

2018-03-28 Thread Christoph Hellwig
The upcoming aio poll support would like to be able to complete the
iocb inline from the cancellation context, but that would cause
a lock order reversal.  Add support for optionally moving the cancellation
outside the context lock to avoid this reversal.

To make this safe, aio_complete needs to check whether this call should
complete the iocb.  If it did not, the callers must not release any other
resources.
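
A minimal userspace sketch of the two-phase scheme described above, under
stated assumptions: struct req, struct ctx, cancel_all(), req_cancel() and
req_complete() are hypothetical names, a boolean stands in for the context
lock, and a singly linked list replaces the kernel list helpers (so the
deferred requests are cancelled in reverse order here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_DELAYED_CANCEL	(1u << 0)	/* cancel must run unlocked */
#define REQ_CANCELLED		(1u << 1)	/* marked while the lock was held */

struct req {
	unsigned int flags;
	bool cancel_ran;
	bool cancel_ran_locked;	/* lock state observed by the cancel call */
	struct req *next;
};

struct ctx {
	bool locked;		/* stands in for the context lock */
	struct req *active;	/* list of active requests */
};

/* Stand-in for the per-request cancel callback. */
void req_cancel(struct ctx *ctx, struct req *r)
{
	r->cancel_ran = true;
	r->cancel_ran_locked = ctx->locked;
}

void cancel_all(struct ctx *ctx)
{
	struct req *deferred = NULL, *r;

	/* Phase 1: under the lock, cancel directly or mark and set aside. */
	ctx->locked = true;
	while ((r = ctx->active) != NULL) {
		ctx->active = r->next;
		if (r->flags & REQ_DELAYED_CANCEL) {
			r->flags |= REQ_CANCELLED;
			r->next = deferred;
			deferred = r;
		} else {
			r->next = NULL;
			req_cancel(ctx, r);
		}
	}
	ctx->locked = false;

	/* Phase 2: the lock is dropped, run the deferred cancellations. */
	while ((r = deferred) != NULL) {
		deferred = r->next;
		r->next = NULL;
		req_cancel(ctx, r);
	}
}

/* Only the cancel path may complete a request already marked cancelled. */
bool req_complete(const struct req *r, bool from_cancel)
{
	if (!from_cancel && (r->flags & REQ_CANCELLED))
		return false;	/* caller must not release resources */
	return true;
}
```

The return value of req_complete() is what lets a normal completion path
notice that it lost the race and must leave the request to the canceller.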

Signed-off-by: Christoph Hellwig 
---
 fs/aio.c | 81 ++--
 1 file changed, 59 insertions(+), 22 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c36eec8b0879..232dd84fc897 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -177,6 +177,11 @@ struct aio_kiocb {
struct list_headki_list;/* the aio core uses this
 * for cancellation */
 
+   unsigned intflags;  /* protected by ctx->ctx_lock */
+#define AIO_IOCB_CAN_CANCEL(1 << 0)
+#define AIO_IOCB_DELAYED_CANCEL(1 << 1)
+#define AIO_IOCB_CANCELLED (1 << 2)
+
/*
 * If the aio_resfd field of the userspace iocb is not zero,
 * this is the underlying eventfd context to deliver events to.
@@ -543,9 +548,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int 
nr_events)
 #define AIO_EVENTS_FIRST_PAGE  ((PAGE_SIZE - sizeof(struct aio_ring)) / 
sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET  (AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
 
-void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+static void __kiocb_set_cancel_fn(struct aio_kiocb *req,
+   kiocb_cancel_fn *cancel, unsigned int iocb_flags)
 {
-   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
@@ -555,8 +560,15 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, 
kiocb_cancel_fn *cancel)
spin_lock_irqsave(&ctx->ctx_lock, flags);
list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
+   req->flags |= (AIO_IOCB_CAN_CANCEL | iocb_flags);
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
+
+void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+{
+   return __kiocb_set_cancel_fn(container_of(iocb, struct aio_kiocb, rw),
+   cancel, 0);
+}
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
 static int kiocb_cancel(struct aio_kiocb *kiocb)
@@ -599,17 +611,26 @@ static void free_ioctx_users(struct percpu_ref *ref)
 {
struct kioctx *ctx = container_of(ref, struct kioctx, users);
struct aio_kiocb *req;
+   LIST_HEAD(list);
 
spin_lock_irq(&ctx->ctx_lock);
-
while (!list_empty(&ctx->active_reqs)) {
req = list_first_entry(&ctx->active_reqs,
   struct aio_kiocb, ki_list);
-   kiocb_cancel(req);
+   if (req->flags & AIO_IOCB_DELAYED_CANCEL) {
+   req->flags |= AIO_IOCB_CANCELLED;
+   list_move_tail(&req->ki_list, &list);
+   } else {
+   kiocb_cancel(req);
+   }
}
-
spin_unlock_irq(&ctx->ctx_lock);
 
+   while (!list_empty(&list)) {
+   req = list_first_entry(&list, struct aio_kiocb, ki_list);
+   kiocb_cancel(req);
+   }
+
percpu_ref_kill(&ctx->reqs);
percpu_ref_put(&ctx->reqs);
 }
@@ -1045,22 +1066,30 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
return ret;
 }
 
+#define AIO_COMPLETE_CANCEL(1 << 0)
+
 /* aio_complete
  * Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
+static bool aio_complete(struct aio_kiocb *iocb, long res, long res2,
+   unsigned complete_flags)
 {
struct kioctx   *ctx = iocb->ki_ctx;
struct aio_ring *ring;
struct io_event *ev_page, *event;
unsigned tail, pos, head;
-   unsigned long   flags;
-
-   if (!list_empty_careful(&iocb->ki_list)) {
-   unsigned long flags;
+   unsigned long flags;
 
+   if (iocb->flags & AIO_IOCB_CAN_CANCEL) {
spin_lock_irqsave(&ctx->ctx_lock, flags);
-   list_del(&iocb->ki_list);
+   if (!(complete_flags & AIO_COMPLETE_CANCEL) &&
+   (iocb->flags & AIO_IOCB_CANCELLED)) {
+   spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+   return false;
+   }
+
+   if (!list_empty(&iocb->ki_list))
+   list_del(&iocb->ki_list);
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
}
 
@@ -1136,6 +1165,7 @@ static void aio_complete(struct aio_kiocb *iocb, long 
res, long res2)
wake_up(&ctx->wait);
 
percpu_ref_put(&ctx->reqs);
+   return true;
 }
 
 /* aio_read_eve

Re: [PATCH 15/19] csky: Build infrastructure

2018-03-28 Thread Arnd Bergmann
On Wed, Mar 28, 2018 at 5:49 AM, Guo Ren  wrote:
> Hi Arnd,
>
> On Tue, Mar 27, 2018 at 09:38:56AM +0200, Arnd Bergmann wrote:
>> Usually the way gcc handles this, either each CPU is a strict superset
>> of another
>> one, so you just need to specify the one with the smallest instruction set,
>> or you have an option like -mcpu=generic that produces the common subset.
>>
> ck807 ck810 ck860 are different architectures, so they can not be strict
> superset and there is no option like '-mcpu=generic' in our gcc and
> binutils.
>
> I know you want one vmlinux which could run on all ck807 ck810 ck860 with
> different dts-setting. But now, may I keep current design for abiv1&abiv2?
>
> In abiv3, we will take your advice seriously.

Ok, thanks for the clarification. Obviously if they are mutually incompatible,
there is no point in using a common kernel, so your current version is
absolutely fine, and this is similar to how we cannot have a common kernel
between ARMv5, ARMv7-A and ARMv7-M, which are all incompatible
at the kernel level.

One more question for my understanding: Are the three types of ck8xx
CPUs mutually incompatible in user space as well, or are the differences
only for the kernel? For the ARM example, ARMv5 and ARMv7
fundamentally require separate kernels, but both can run user space
programs built for ARMv5.

  Arnd


[PATCH 05/30] fs: introduce new ->get_poll_head and ->poll_mask methods

2018-03-28 Thread Christoph Hellwig
->get_poll_head returns the waitqueue that the poll operation is going
to sleep on.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reasons left to keep the
multiple waitqueues, and if there are any, ->poll is still around; the
driver just won't support aio poll.
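
A minimal userspace sketch of how the two methods split the work, assuming a
simplified model rather than the kernel types: struct file_model,
struct file_ops_model, struct waitq, the EV_* bits and generic_poll() are
all hypothetical names. The driver side only names its single waitqueue and
reports its current state; the generic side decides whether to register a
waiter.

```c
#include <assert.h>
#include <stddef.h>

#define EV_IN		0x1u
#define EV_OUT		0x4u
#define EV_DEFAULT	(EV_IN | EV_OUT)	/* used when polling is unsupported */

struct waitq {
	int nr_waiters;
};

struct file_model;

struct file_ops_model {
	/* which single waitqueue a poller should sleep on */
	struct waitq *(*get_poll_head)(struct file_model *f, unsigned int events);
	/* current readiness; must not sleep */
	unsigned int (*poll_mask)(struct file_model *f, unsigned int events);
};

struct file_model {
	const struct file_ops_model *ops;
	struct waitq wq;
	int readable_bytes;
	int free_space;
};

/* Driver side: one waitqueue for all events. */
struct waitq *pipe_get_poll_head(struct file_model *f, unsigned int events)
{
	(void)events;
	return &f->wq;
}

/* Driver side: compute the state without touching any waitqueue. */
unsigned int pipe_poll_mask(struct file_model *f, unsigned int events)
{
	unsigned int mask = 0;

	(void)events;
	if (f->readable_bytes > 0)
		mask |= EV_IN;
	if (f->free_space > 0)
		mask |= EV_OUT;
	return mask;
}

const struct file_ops_model pipe_ops = {
	.get_poll_head	= pipe_get_poll_head,
	.poll_mask	= pipe_poll_mask,
};

/* Generic side: register on the head only if the caller will sleep. */
unsigned int generic_poll(struct file_model *f, unsigned int events,
			  int will_sleep)
{
	if (will_sleep) {
		struct waitq *head = f->ops->get_poll_head(f, events);

		if (!head)
			return EV_DEFAULT;
		head->nr_waiters++;
	}
	return f->ops->poll_mask(f, events) & events;
}
```

Because the state check is separate from the waitqueue registration, a
caller other than poll or select can reuse both halves without a poll table.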

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 Documentation/filesystems/Locking |  7 ++-
 Documentation/filesystems/vfs.txt | 13 +
 fs/select.c   | 28 
 include/linux/fs.h|  2 ++
 include/linux/poll.h  | 27 +++
 5 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 220bba28f72b..6d227f9d7bd9 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -440,6 +440,8 @@ prototypes:
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
@@ -470,7 +472,7 @@ prototypes:
 };
 
 locking rules:
-   All may block.
+   All except for ->poll_mask may block.
 
 ->llseek() locking has moved from llseek to the individual llseek
 implementations.  If your fs is not using generic_file_llseek, you
@@ -498,6 +500,9 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
+->poll_mask can be called with or without the waitqueue lock for the waitqueue
+returned from ->get_poll_head.
+
 --- dquot_operations ---
 prototypes:
int (*write_dquot) (struct dquot *);
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index f608180ad59d..50ee13563271 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,8 @@ struct file_operations {
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
@@ -901,6 +903,17 @@ otherwise noted.
activity on this file and (optionally) go to sleep until there
is activity. Called by the select(2) and poll(2) system calls
 
+  get_poll_head: Returns the struct wait_queue_head that poll, select,
+  epoll or aio poll should wait on in case this instance only has single
+  waitqueue.  Can return NULL to indicate polling is not supported,
+  or a POLL* value using the POLL_TO_PTR helper in case a grave error
+  occurred and ->poll_mask shall not be called.
+
+  poll_mask: return the mask of POLL* values describing the file descriptor
+  state.  Called either before going to sleep on the waitqueue returned by
+  get_poll_head, or after it has been woken.  If ->get_poll_head and
+  ->poll_mask are implemented ->poll does not need to be implemented.
+
   unlocked_ioctl: called by the ioctl(2) system call.
 
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
diff --git a/fs/select.c b/fs/select.c
index ba91103707ea..cc270d7f6192 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -34,6 +34,34 @@
 
 #include 
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+   unsigned int events = poll_requested_events(pt);
+   struct wait_queue_head *head;
+
+   if (unlikely(!file_can_poll(file)))
+   return DEFAULT_POLLMASK;
+
+   if (file->f_op->poll)
+   return file->f_op->poll(file, pt);
+
+   /*
+* Only get the poll head and do the first mask check if we are actually
+* going to sleep on this file:
+*/
+   if (pt && pt->_qproc) {
+   head = vfs_get_poll_head(file, events);
+   if (!head)
+   return DEFAULT_POLLMASK;
+   if (IS_ERR(head))
+   return PTR_TO_POLL(head);
+
+   pt->_qproc(file, head, pt);
+   }
+
+   

linux-next: build failure after merge of the userns tree

2018-03-28 Thread Stephen Rothwell
Hi Eric,

After merging the userns tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

In file included from include/linux/sched.h:16:0,
 from arch/powerpc/lib/xor_vmx_glue.c:14:
include/linux/shm.h:17:35: error: 'struct file' declared inside parameter list 
will not be visible outside of this definition or declaration [-Werror]
 bool is_file_shm_hugepages(struct file *file);
   ^~~~

and many, many more (most warnings, but some errors - arch/powerpc is
mostly built with -Werror)
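
For reference, this diagnostic appears when a prototype is the first place a
struct tag is mentioned: the tag then has prototype scope only. A
forward declaration at file scope avoids it. The sketch below is a
standalone illustration with hypothetical names (is_file_hugepages_model()
and the one-field struct file), not the kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Forward declaration: gives the tag file scope without defining it. */
struct file;

/* The prototype can now refer to the same struct file as everyone else. */
bool is_file_hugepages_model(struct file *file);

/* The full definition may come later, or from another header. */
struct file {
	int hugepages;
};

bool is_file_hugepages_model(struct file *file)
{
	return file != NULL && file->hugepages != 0;
}
```

A pointer to an incomplete type is enough for a prototype, so the header
does not need to include the full definition.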

Maybe caused by commit

  1a5c1349d105 ("sem: Move struct sem and struct sem_array into ipc/sem.c")

I have applied the following fix patch for today:

From: Stephen Rothwell 
Date: Wed, 28 Mar 2018 18:36:27 +1100
Subject: [PATCH] fix up for struct file no longer being available in shm.h

Signed-off-by: Stephen Rothwell 
---
 include/linux/shm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/shm.h b/include/linux/shm.h
index 3a8eae3ca33c..d8e69aed3d32 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -7,6 +7,8 @@
 #include 
 #include 
 
+struct file;
+
 #ifdef CONFIG_SYSVIPC
 struct sysv_shm {
struct list_head shm_clist;
-- 
2.16.1

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v4 2/2] drm/xen-front: Add support for Xen PV display frontend

2018-03-28 Thread Daniel Vetter
On Wed, Mar 28, 2018 at 09:47:41AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko 
> 
> Add support for Xen para-virtualized frontend display driver.
> Accompanying backend [1] is implemented as a user-space application
> and its helper library [2], capable of running as a Weston client
> or DRM master.
> Configuration of both backend and frontend is done via
> Xen guest domain configuration options [3].
> 
> Driver limitations:
>  1. Only primary plane without additional properties is supported.
>  2. Only one video mode supported which resolution is configured via XenStore.
>  3. All CRTCs operate at fixed frequency of 60Hz.
> 
> 1. Implement Xen bus state machine for the frontend driver according to
> the state diagram and recovery flow from display para-virtualized
> protocol: xen/interface/io/displif.h.
> 
> 2. Read configuration values from Xen store according
> to xen/interface/io/displif.h protocol:
>   - read connector(s) configuration
>   - read buffer allocation mode (backend/frontend)
> 
> 3. Handle Xen event channels:
>   - create for all configured connectors and publish
> corresponding ring references and event channels in Xen store,
> so backend can connect
>   - implement event channels interrupt handlers
>   - create and destroy event channels with respect to Xen bus state
> 
> 4. Implement shared buffer handling according to the
> para-virtualized display device protocol at xen/interface/io/displif.h:
>   - handle page directories according to displif protocol:
> - allocate and share page directories
> - grant references to the required set of pages for the
>   page directory
>   - allocate xen ballooned pages via Xen balloon driver
> with alloc_xenballooned_pages/free_xenballooned_pages
>   - grant references to the required set of pages for the
> shared buffer itself
>   - implement pages map/unmap for the buffers allocated by the
> backend (gnttab_map_refs/gnttab_unmap_refs)
> 
> 5. Implement kernel modesetting/connector handling using
> DRM simple KMS helper pipeline:
> 
> - implement KMS part of the driver with the help of DRM
>   simple pipeline helper which is possible due to the fact
>   that the para-virtualized driver only supports a single
>   (primary) plane:
>   - initialize connectors according to XenStore configuration
>   - handle frame done events from the backend
>   - create and destroy frame buffers and propagate those
> to the backend
>   - propagate set/reset mode configuration to the backend on display
> enable/disable callbacks
>   - send page flip request to the backend and implement logic for
> reporting backend IO errors on prepare fb callback
> 
> - implement virtual connector handling:
>   - support only pixel formats suitable for single plane modes
>   - make sure the connector is always connected
>   - support a single video mode as per para-virtualized driver
> configuration
> 
> 6. Implement GEM handling depending on driver mode of operation:
> depending on the requirements for the para-virtualized environment, namely
> requirements dictated by the accompanying DRM/(v)GPU drivers running in both
> host and guest environments, a number of operating modes of the
> para-virtualized display driver are supported:
>  - display buffers can be allocated by either frontend driver or backend
>  - display buffers can be allocated to be contiguous in memory or not
> 
> Note! Frontend driver itself has no dependency on contiguous memory for
> its operation.
> 
> 6.1. Buffers allocated by the frontend driver.
> 
> The below modes of operation are configured at compile-time via
> frontend driver's kernel configuration.
> 
> 6.1.1. Front driver configured to use GEM CMA helpers
>  This use-case is useful when used with accompanying DRM/vGPU driver in
>  guest domain which was designed to only work with contiguous buffers,
>  e.g. DRM driver based on GEM CMA helpers: such drivers can only import
>  contiguous PRIME buffers, thus requiring frontend driver to provide
>  such. In order to implement this mode of operation para-virtualized
>  frontend driver can be configured to use GEM CMA helpers.
> 
> 6.1.2. Front driver doesn't use GEM CMA
>  If accompanying drivers can cope with non-contiguous memory then, to
>  lower pressure on CMA subsystem of the kernel, driver can allocate
>  buffers from system memory.
> 
> Note! If used with accompanying DRM/(v)GPU drivers this mode of operation
> may require IOMMU support on the platform, so accompanying DRM/vGPU
> hardware can still reach display buffer memory while importing PRIME
> buffers from the frontend driver.
> 
> 6.2. Buffers allocated by the backend
> 
> This mode of operation is run-time configured via guest domain configuration
> through XenStore entries.
> 
> For systems which do not provide IOMMU support, but having specific
> requirements for display buffers it is possible to allocate such buffers
> at backend sid

[PATCH 02/30] fs: cleanup do_pollfd

2018-03-28 Thread Christoph Hellwig
Use straightline code with failure handling gotos instead of a lot
of nested conditionals.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/select.c | 48 +++-
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 686de7b3a1db..c6c504a814f9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -806,34 +806,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 bool *can_busy_poll,
 __poll_t busy_flag)
 {
-   __poll_t mask;
-   int fd;
-
-   mask = 0;
-   fd = pollfd->fd;
-   if (fd >= 0) {
-   struct fd f = fdget(fd);
-   mask = EPOLLNVAL;
-   if (f.file) {
-   /* userland u16 ->events contains POLL... bitmap */
-   __poll_t filter = demangle_poll(pollfd->events) |
-   EPOLLERR | EPOLLHUP;
-   mask = DEFAULT_POLLMASK;
-   if (f.file->f_op->poll) {
-   pwait->_key = filter;
-   pwait->_key |= busy_flag;
-   mask = f.file->f_op->poll(f.file, pwait);
-   if (mask & busy_flag)
-   *can_busy_poll = true;
-   }
-   /* Mask out unneeded events. */
-   mask &= filter;
-   fdput(f);
-   }
+   int fd = pollfd->fd;
+   __poll_t mask = 0, filter;
+   struct fd f;
+
+   if (fd < 0)
+   goto out;
+   mask = EPOLLNVAL;
+   f = fdget(fd);
+   if (!f.file)
+   goto out;
+
+   /* userland u16 ->events contains POLL... bitmap */
+   filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
+   mask = DEFAULT_POLLMASK;
+   if (f.file->f_op->poll) {
+   pwait->_key = filter | busy_flag;
+   mask = f.file->f_op->poll(f.file, pwait);
+   if (mask & busy_flag)
+   *can_busy_poll = true;
}
+   mask &= filter; /* Mask out unneeded events. */
+   fdput(f);
+
+out:
/* ... and so does ->revents */
pollfd->revents = mangle_poll(mask);
-
return mask;
 }
 
-- 
2.14.2
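
For readers less familiar with the idiom, the restructuring in this patch can be sketched in user space roughly as follows. This is only a minimal illustration, not the kernel code: `poll_one`, `lookup`, `struct handle` and the `EV_*` values are hypothetical stand-ins for `do_pollfd`, `fdget`, `struct fd` and the `EPOLL*` masks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define EV_INVALID 0x20u   /* plays the role of EPOLLNVAL */
#define EV_DEFAULT 0x05u   /* plays the role of DEFAULT_POLLMASK */

struct handle {
	unsigned int (*poll)(void);   /* may be NULL, like f_op->poll */
};

unsigned int ready_in(void)
{
	return 0x41u;
}

/* Hypothetical two-entry table standing in for the fd lookup. */
struct handle table[2] = { { NULL }, { ready_in } };

struct handle *lookup(int fd)
{
	return (fd >= 0 && fd < 2) ? &table[fd] : NULL;
}

/* Straight-line version: every failure jumps to the common exit. */
unsigned int poll_one(int fd, unsigned int wanted)
{
	unsigned int mask = 0;
	struct handle *h;

	assert(wanted != 0);       /* sketch-only precondition */

	if (fd < 0)
		goto out;          /* negative fds are ignored: mask stays 0 */
	mask = EV_INVALID;
	h = lookup(fd);
	if (!h)
		goto out;          /* unknown fd: report EV_INVALID unfiltered */

	mask = EV_DEFAULT;
	if (h->poll)
		mask = h->poll();
	mask &= wanted;            /* mask out unneeded events */
out:
	return mask;
}
```

Each early exit leaves `mask` holding the value that the nested version would have produced at the same point, which is why the assignments are interleaved with the checks.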



[PATCH 5/6] aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC

2018-03-28 Thread Christoph Hellwig
Simple workqueue offload for now, but prepared for adding a real aio_fsync
method if the need arises.  Based on an earlier patch from Dave Chinner.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index f35801d73e0b..fd6c72918a8e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -156,6 +156,12 @@ struct kioctx {
unsigned id;
 };
 
+struct fsync_iocb {
+   struct work_struct  work;
+   struct file *file;
+   bool                datasync;
+};
+
 /*
  * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
  * cancelled or completed (this makes a certain amount of sense because
@@ -172,6 +178,7 @@ struct kioctx {
 struct aio_kiocb {
union {
struct kiocb        rw;
+   struct fsync_iocb   fsync;
};
 
struct kioctx   *ki_ctx;
@@ -1565,6 +1572,43 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
return ret;
 }
 
+static void aio_fsync_work(struct work_struct *work)
+{
+   struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
+   int ret;
+
+   ret = vfs_fsync(req->file, req->datasync);
+   fput(req->file);
+   aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+}
+
+static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
+{
+   int ret;
+
+   if (iocb->aio_buf)
+   return -EINVAL;
+   if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+   return -EINVAL;
+
+   req->file = fget(iocb->aio_fildes);
+   if (unlikely(!req->file))
+   return -EBADF;
+
+   ret = -EINVAL;
+   if (!req->file->f_op->fsync)
+   goto out_fput;
+
+   req->datasync = datasync;
+   INIT_WORK(&req->work, aio_fsync_work);
+   schedule_work(&req->work);
+   return -EIOCBQUEUED;
+out_fput:
+   if (unlikely(ret && ret != -EIOCBQUEUED))
+   fput(req->file);
+   return ret;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 struct iocb *iocb, bool compat)
 {
@@ -1628,6 +1672,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
case IOCB_CMD_PWRITEV:
ret = aio_write(&req->rw, iocb, true, compat);
break;
+   case IOCB_CMD_FSYNC:
+   ret = aio_fsync(&req->fsync, iocb, false);
+   break;
+   case IOCB_CMD_FDSYNC:
+   ret = aio_fsync(&req->fsync, iocb, true);
+   break;
default:
pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
ret = -EINVAL;
-- 
2.14.2
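
The work handler in this patch recovers the enclosing request from a pointer to one member of its union. A minimal user-space sketch of that pattern is given below, under the assumption that a plain `offsetof`-based macro is an acceptable stand-in for the kernel's `container_of`. `struct request`, `struct rw_part`, `struct fsync_part` and `fsync_handler` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel macro, written with plain offsetof(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature types; only the layout idea matches the patch. */
struct rw_part    { long pos; };
struct fsync_part { int datasync; };

struct request {
	union {                       /* per-opcode state shares the space */
		struct rw_part    rw;
		struct fsync_part fsync;
	};
	int id;                       /* common state lives outside the union */
};

/* A handler that only receives the opcode-specific part... */
int fsync_handler(struct fsync_part *part)
{
	struct request *req;

	assert(part != NULL);
	/* ...can still get back to the request that embeds it. */
	req = container_of(part, struct request, fsync);
	return req->id;
}
```

Because every union member starts at the same offset, the same recovery works whichever member the opcode uses.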



[PATCH 4/6] aio: sanitize ki_list handling

2018-03-28 Thread Christoph Hellwig
Instead of handcoded non-null checks always initialize ki_list to an
empty list and use list_empty / list_empty_careful on it.  While we're
at it also error out on a double call to kiocb_set_cancel_fn instead
of ignoring it.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 50c4a0554cc6..f35801d73e0b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -555,13 +555,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
-   spin_lock_irqsave(&ctx->ctx_lock, flags);
-
-   if (!req->ki_list.next)
-   list_add(&req->ki_list, &ctx->active_reqs);
+   if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
+   return;
 
+   spin_lock_irqsave(&ctx->ctx_lock, flags);
+   list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
-
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
@@ -1034,7 +1033,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
goto out_put;
 
percpu_ref_get(&ctx->reqs);
-
+   INIT_LIST_HEAD(&req->ki_list);
req->ki_ctx = ctx;
return req;
 out_put:
@@ -1080,7 +1079,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
unsigned tail, pos, head;
unsigned long   flags;
 
-   if (iocb->ki_list.next) {
+   if (!list_empty_careful(&iocb->ki_list)) {
unsigned long flags;
 
spin_lock_irqsave(&ctx->ctx_lock, flags);
-- 
2.14.2
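
The point of this patch is that an always-initialized list node can be tested with an "is empty" check instead of a hand-coded NULL test. A minimal user-space sketch of that idea follows. It re-creates a tiny circular list rather than using the kernel's list.h, and all names (`list_node`, `list_init`, `list_is_empty`, `list_append`, `list_remove`, `request_register`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space re-creation of a circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialized, unlinked node points at itself. */
void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Insert 'n' just before 'head', i.e. at the tail of the list. */
void list_append(struct list_node *n, struct list_node *head)
{
	assert(n != NULL && head != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink 'n' and re-initialize it so list_is_empty(n) holds again. */
void list_remove(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical request type: 'link' plays the role of ki_list. */
struct request {
	struct list_node link;
};

/* Reject a double registration instead of silently ignoring it. */
bool request_register(struct request *req, struct list_node *active)
{
	if (!list_is_empty(&req->link))
		return false;
	list_append(&req->link, active);
	return true;
}
```

The sketch ignores locking entirely; in the patch the list manipulation happens under ctx_lock.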



[PATCH 3/6] aio: refactor read/write iocb setup

2018-03-28 Thread Christoph Hellwig
Don't reference the kiocb structure from the common aio code, and move
any use of it into helper specific to the read/write path.  This is in
preparation for aio_poll support that wants to use the space for different
fields.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 167 ---
 1 file changed, 95 insertions(+), 72 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f536b0f249d4..50c4a0554cc6 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,7 +170,9 @@ struct kioctx {
 #define KIOCB_CANCELLED ((void *) (~0ULL))
 
 struct aio_kiocb {
-   struct kiocb        common;
+   union {
+   struct kiocb        rw;
+   };
 
struct kioctx   *ki_ctx;
kiocb_cancel_fn *ki_cancel;
@@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 {
-   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
+   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
@@ -582,7 +584,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
} while (cancel != old);
 
-   return cancel(&kiocb->common);
+   return cancel(&kiocb->rw);
 }
 
 static void free_ioctx(struct work_struct *work)
@@ -1040,15 +1042,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
return NULL;
 }
 
-static void kiocb_free(struct aio_kiocb *req)
-{
-   if (req->common.ki_filp)
-   fput(req->common.ki_filp);
-   if (req->ki_eventfd != NULL)
-   eventfd_ctx_put(req->ki_eventfd);
-   kmem_cache_free(kiocb_cachep, req);
-}
-
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 {
struct aio_ring __user *ring  = (void __user *)ctx_id;
@@ -1079,27 +1072,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 /* aio_complete
  * Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct kiocb *kiocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 {
-   struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
struct kioctx   *ctx = iocb->ki_ctx;
struct aio_ring *ring;
struct io_event *ev_page, *event;
unsigned tail, pos, head;
unsigned long   flags;
 
-   if (kiocb->ki_flags & IOCB_WRITE) {
-   struct file *file = kiocb->ki_filp;
-
-   /*
-* Tell lockdep we inherited freeze protection from submission
-* thread.
-*/
-   if (S_ISREG(file_inode(file)->i_mode))
-   __sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
-   file_end_write(file);
-   }
-
if (iocb->ki_list.next) {
unsigned long flags;
 
@@ -1161,11 +1141,12 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 * eventfd. The eventfd_signal() function is safe to be called
 * from IRQ context.
 */
-   if (iocb->ki_eventfd != NULL)
+   if (iocb->ki_eventfd) {
eventfd_signal(iocb->ki_eventfd, 1);
+   eventfd_ctx_put(iocb->ki_eventfd);
+   }
 
-   /* everything turned out well, dispose of the aiocb. */
-   kiocb_free(iocb);
+   kmem_cache_free(kiocb_cachep, iocb);
 
/*
 * We have to order our ring_info tail store above and test
@@ -1428,6 +1409,45 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
return -EINVAL;
 }
 
+static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
+{
+   struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
+
+   if (kiocb->ki_flags & IOCB_WRITE) {
+   struct inode *inode = file_inode(kiocb->ki_filp);
+
+   /*
+* Tell lockdep we inherited freeze protection from submission
+* thread.
+*/
+   if (S_ISREG(inode->i_mode))
+   __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+   file_end_write(kiocb->ki_filp);
+   }
+
+   fput(kiocb->ki_filp);
+   aio_complete(iocb, res, res2);
+}
+
+static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
+{
+   int ret;
+
+   req->ki_filp = fget(iocb->aio_fildes);
+   if (unlikely(!req->ki_filp))
+   return -EBADF;
+   req->ki_complete = aio_complete_rw;
+   req->ki_pos = iocb->aio_offset;
+   req->ki_flags = iocb_flags(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_RESFD)
+   req->ki_flags |= IOCB_EVENTFD;
+   req->ki_h

RE: [PATCH v2 3/8] DT: arm: renesas,r9a06g032: add the RZ/N1 bindings

2018-03-28 Thread Michel Pollet
Hi Geert,

Thanks for your review!

On  22 March 2018 12:37, Geert said:
> Hi Michel,
>
> On Thu, Mar 22, 2018 at 12:44 PM, Michel Pollet
>  wrote:
> > This documents the RZ/N1 bindings for both the RZ/N1 and the
> > RZN1D400-DB board.
> >
> > Signed-off-by: Michel Pollet 
>
> Thanks for your patch!
>
> > --- a/Documentation/devicetree/bindings/arm/shmobile.txt
> > +++ b/Documentation/devicetree/bindings/arm/shmobile.txt
> > @@ -47,7 +47,8 @@ SoCs:
> >  compatible = "renesas,r8a77980"
> >- R-Car D3 (R8A77995)
> >  compatible = "renesas,r8a77995"
> > -
> > +  - RZ/N1D (R9A06G032)
> > +compatible = "renesas,r9a06g032"
> >
> >  Boards:
> >
> > @@ -104,6 +105,8 @@ Boards:
> >  compatible = "renesas,porter", "renesas,r8a7791"
> >- RSKRZA1 (YR0K77210C000BE)
> >  compatible = "renesas,rskrza1", "renesas,r7s72100"
> > +  - RZN1D-DB (RZ/N1D Demo Board for the RZ/N1D 400 pins package)
>
> The official board part number (between parentheses) should be
> YCONNECT-IT-RZN1D?

I've checked this, and the YCONNECT is the /base board/ for this 'module'; the
module is called RZN1D-DB.

>
> > +compatible = "renesas,rzn1d400-db", "renesas,r9a06g032"
> >- Salvator-X (RTP0RC7795SIPB0010S)
> >  compatible = "renesas,salvator-x", "renesas,r8a7795"
> >- Salvator-X (RTP0RC7796SIPB0011S)
>
> Gr{oetje,eeting}s,
>
> Geert

Michel




Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, 
Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered 
No. 04586709.


[PATCH 2/6] aio: remove an outdated comment in aio_complete

2018-03-28 Thread Christoph Hellwig
These days we don't treat sync iocbs specially in the aio completion code as
they never use it.  Remove the old comment and BUG_ON given that the
current definition of is_sync_kiocb makes it impossible to hit.

Signed-off-by: Christoph Hellwig 
---
 fs/aio.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 03d59593912d..f536b0f249d4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1100,15 +1100,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
file_end_write(file);
}
 
-   /*
-* Special case handling for sync iocbs:
-*  - events go directly into the iocb for fast handling
-*  - the sync task with the iocb in its stack holds the single iocb
-*ref, no other paths have a way to get another ref
-*  - the sync task helpfully left a reference to itself in the iocb
-*/
-   BUG_ON(is_sync_kiocb(kiocb));
-
if (iocb->ki_list.next) {
unsigned long flags;
 
-- 
2.14.2



[PATCH] sched: support dynamiQ cluster

2018-03-28 Thread Vincent Guittot
An Arm DynamiQ system can integrate cores with different microarchitectures
or max OPPs under the same DSU, so we can have cores with different compute
capacity at the LLC (which was not the case with the legacy big/LITTLE
architecture). Such a configuration is similar in some ways to ITMT on Intel
platforms, which allows some cores to be boosted to a higher turbo frequency
than others and which uses the SD_ASYM_PACKING feature to ensure that the CPUs
with the highest capacity are always used in priority, in order to provide
maximum throughput.

Add arch_asym_cpu_priority() for arm64 as this function is used to
differentiate CPUs in the scheduler. The CPU's capacity is used to order
CPUs in the same DSU.

Create a sched domain topology level for arm64 so we can set SD_ASYM_PACKING
at the MC level.

Some tests have been done on a hikey960 platform (quad cortex-A53,
quad cortex-A73). For test purposes, the CPU topology of the hikey960
has been modified so the 8 heterogeneous cores are described as being part
of the same cluster and sharing resources (MC level), as with a DynamiQ DSU.

Results below show the time in seconds to run sysbench --test=cpu with an
increasing number of threads. The sysbench test was run 32 times.

            without patch     with patch      diff
1 threads   11.04 (+/- 30%)   8.86 (+/- 0%)   -19%
2 threads    5.59 (+/- 14%)   4.43 (+/- 0%)   -20%
3 threads    3.80 (+/- 13%)   2.95 (+/- 0%)   -22%
4 threads    3.10 (+/- 12%)   2.22 (+/- 0%)   -28%
5 threads    2.47 (+/-  5%)   1.95 (+/- 0%)   -21%
6 threads    2.09 (+/-  0%)   1.73 (+/- 0%)   -17%
7 threads    1.64 (+/-  0%)   1.56 (+/- 0%)   - 7%
8 threads    1.42 (+/-  0%)   1.42 (+/- 0%)     0%

The results are better and more stable across iterations with the patch than
with mainline, because big cores are always used in priority, whereas with
mainline the scheduler randomly chooses a big or a little core when there are
more cores than threads.
With 1 thread, the test duration varies in the range [8.85 .. 15.86] for
mainline, whereas it stays in the range [8.85 .. 8.87] with the patch.

Signed-off-by: Vincent Guittot 

---

The SD_ASYM_PACKING flag is disabled by default and I'm preparing another patch
to enable this dynamically at boot time by detecting the system topology.

 arch/arm64/kernel/topology.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 2186853..cb6705e5 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -296,6 +296,33 @@ static void __init reset_cpu_topology(void)
}
 }
 
+#ifdef CONFIG_SCHED_MC
+unsigned int __read_mostly arm64_sched_asym_enabled;
+
+int arch_asym_cpu_priority(int cpu)
+{
+   return topology_get_cpu_scale(NULL, cpu);
+}
+
+static inline int arm64_sched_dynamiq(void)
+{
+   return arm64_sched_asym_enabled ? SD_ASYM_PACKING : 0;
+}
+
+static int arm64_core_flags(void)
+{
+   return cpu_core_flags() | arm64_sched_dynamiq();
+}
+#endif
+
+static struct sched_domain_topology_level arm64_topology[] = {
+#ifdef CONFIG_SCHED_MC
+   { cpu_coregroup_mask, arm64_core_flags, SD_INIT_NAME(MC) },
+#endif
+   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
+   { NULL, },
+};
+
 void __init init_cpu_topology(void)
 {
reset_cpu_topology();
@@ -306,4 +333,7 @@ void __init init_cpu_topology(void)
 */
if (of_have_populated_dt() && parse_dt_topology())
reset_cpu_topology();
+
+   /* Set scheduler topology descriptor */
+   set_sched_topology(arm64_topology);
 }
-- 
2.7.4
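
As a rough illustration of the intent described above (prefer the highest-capacity CPU when several are available), here is a minimal user-space sketch. It is not the scheduler's algorithm: `pick_idle_cpu`, `asym_cpu_priority` and the capacity values in `cpu_capacity` are hypothetical, chosen only to model four little and four big cores.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical capacities: CPUs 0-3 are little cores, CPUs 4-7 are big. */
static const unsigned int cpu_capacity[NR_CPUS] = {
	462, 462, 462, 462, 1024, 1024, 1024, 1024
};

/* Mirrors the idea in the patch: a CPU's priority is simply its capacity. */
int asym_cpu_priority(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (int)cpu_capacity[cpu];
}

/* Return the idle CPU with the highest priority, or -1 if none is idle. */
int pick_idle_cpu(const bool idle[NR_CPUS])
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!idle[cpu])
			continue;
		/* Strict '>' keeps the lowest-numbered CPU among equals. */
		if (best < 0 || asym_cpu_priority(cpu) > asym_cpu_priority(best))
			best = cpu;
	}
	return best;
}
```

With fewer runnable threads than cores, a policy of this shape always lands work on a big core first, which is consistent with the reduced run-to-run variance reported in the table.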



Re: [PATCH v9 01/24] mm: Introduce CONFIG_SPECULATIVE_PAGE_FAULT

2018-03-28 Thread Laurent Dufour
Hi David,

Thanks a lot for your deep review on this series.

On 25/03/2018 23:50, David Rientjes wrote:
> On Tue, 13 Mar 2018, Laurent Dufour wrote:
> 
>> This configuration variable will be used to build the code needed to
>> handle speculative page fault.
>>
>> By default it is turned off, and activated depending on architecture
>> support.
>>
>> Suggested-by: Thomas Gleixner 
>> Signed-off-by: Laurent Dufour 
>> ---
>>  mm/Kconfig | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index abefa573bcd8..07c566c88faf 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -759,3 +759,6 @@ config GUP_BENCHMARK
>>performance of get_user_pages_fast().
>>  
>>See tools/testing/selftests/vm/gup_benchmark.c
>> +
>> +config SPECULATIVE_PAGE_FAULT
>> +   bool
> 
> Should this be configurable even if the arch supports it?

Actually, this is not configurable except by manually editing the .config file.

I made it this way at Thomas's request:
https://lkml.org/lkml/2018/1/15/969

That seems to be the smarter way to achieve that, doesn't it?

Laurent.



Re: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

2018-03-28 Thread h...@lst.de
> > The logical question is why?
> 
> 1. See that's another platform with ARC core so maybe in case of ARM
>DMA allocator already zeroes pages regardless provided flags -
>personally I didn't check that.

Yes, most architectures always clear memory returned by dma_alloc*.
Looks like a few don't and my commit got them in trouble.  As usual
I'd prefer to match x86 semantics for now to avoid problems.

I'll send patches for arc and s390 which seem to be actually used
holdouts, and will look if anyone else is also affected.


Re: [PATCH v4 2/2] drm/xen-front: Add support for Xen PV display frontend

2018-03-28 Thread Oleksandr Andrushchenko

On 03/28/2018 10:42 AM, Daniel Vetter wrote:

kms side looks good now too.

Reviewed-by: Daniel Vetter

Thank you


Re: [PATCH 2/4] vfio: ccw: refactor and improve pfn_array_alloc_pin()

2018-03-28 Thread Cornelia Huck
On Wed, 28 Mar 2018 10:36:38 +0800
Dong Jia Shi  wrote:

> * Cornelia Huck  [2018-03-27 12:01:27 +0200]:
> 
> [...]
> 
> > > > 
> > > > So, basically everything is filled by pfn_array_alloc_pin()?
> > > Yes.
> > >   
> > > > Should we expect a clean struct pfn_array handed in by the caller,
> > > > then (not just pa_nr == 0)?
> > > The current idea is:
> > > - It is a clean struct that pfn_array_alloc_pin() expects from its
> > >   caller.
> > > - pfn_array_alloc_pin() and pfn_array_unpin_free() should be used as a
> > >   pair. They are the only functions that change the values of the
> > >   elements of a pfn_array struct.
> > > - The caller of pfn_array_alloc_pin() should hand in either a newly
> > >   allocated pfn_array (zeroed out) or a freed-after-use one.
> > > - So using pa_nr == 0, is enough to identify all the good cases.
> > >   [We set pa_nr to 0 in pfn_array_unpin_free().]
> > > 
> > > Validating all of the elements only helps when there is case that a
> > > caller breaks the usage rule of these interfaces - the caller itself
> > > assigns values for pfn_pa elements directly... I don't think we allow
> > > this to happen.
> > > 
> > > So I think the current logic is fine.  
> > 
> > Yes, I think it is fine -- I was mainly wondering whether we wanted
> > more sanity checks.
> >   
> Ok.
> Check on (pa->pa_iova_pfn != NULL) could be added. It's easy to do so.
> Check on pa->pa_iova doesn't make sense, since its value will be
> re-assigned anyway.
> Check on pa->pa_pfn doesn't make sense, since we treat it as a pointer
> that points to part of the memory area that was pointed by
> pa->pa_iova_pfn. And we will re-assign it with new pa->pa_iova_pfn
> value.

Yeah, so additional checks are probably not very useful.

> 
> > >   
> > > > 
> > > > Would it make sense to describe the contents of the struct pfn_array
> > > > fields at the struct's definition instead? You could then shorten the
> > > > description here to "we expect pa_nr == 0, any field in this structure
> > > > will be filled in by this function".
> > > Sounds good!
> > > Do you want a separated patch for this, or I do this change on this
> > > patch? Either will be ok with me.  
> > 
> > Perhaps as an additional patch in front of this one?
> >   
> It's doable. I will do that.
> 

Cool, thx!
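
The usage rule discussed in this thread (a clean structure goes in, the allocator fills every field, and the free function resets `pa_nr` to 0 so the structure can be reused) can be sketched in user space as below. This is a simplified illustration only: the field set is reduced, no pages are pinned, a 4 KiB page size is assumed, and `pfn_array_alloc` and `pfn_array_free` are hypothetical stand-ins for `pfn_array_alloc_pin()` and `pfn_array_unpin_free()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced version of the structure under discussion. */
struct pfn_array {
	unsigned long pa_iova;        /* start address of the range */
	unsigned long *pa_iova_pfn;   /* owned allocation, NULL when free */
	int pa_nr;                    /* number of entries, 0 when free */
};

/* Expects a clean array (pa_nr == 0) and fills in every field. */
int pfn_array_alloc(struct pfn_array *pa, unsigned long iova, int nr)
{
	assert(pa != NULL);
	if (pa->pa_nr != 0 || nr <= 0)
		return -1;
	pa->pa_iova_pfn = calloc((size_t)nr, sizeof(*pa->pa_iova_pfn));
	if (!pa->pa_iova_pfn)
		return -1;
	/* Assumes 4 KiB pages: frame number = address >> 12. */
	for (int i = 0; i < nr; i++)
		pa->pa_iova_pfn[i] = (iova >> 12) + (unsigned long)i;
	pa->pa_iova = iova;
	pa->pa_nr = nr;
	return 0;
}

/* Counterpart of pfn_array_alloc(); leaves the array reusable. */
void pfn_array_free(struct pfn_array *pa)
{
	assert(pa != NULL);
	free(pa->pa_iova_pfn);
	pa->pa_iova_pfn = NULL;
	pa->pa_nr = 0;
}
```

Checking only `pa_nr` on entry is sufficient here precisely because these two functions are the only writers of the structure, which is the argument made in the quoted discussion.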


RE: [PATCH v2 0/8] arm: Base support for Renesas RZN1D-DB Board

2018-03-28 Thread Michel Pollet
Replying to myself here, more or less to reflect further discussion on IRC
related to where the bindings go. Also, to publicly acknowledge and thank
Geert for tons of advice and comments outside the email chain...

So, after further discussion on IRC, that's what I've been trying to do
[hope outlook doesn't mangle this]:
+---------------------------+     +------------------------------+     +--------------------------+
|  r9a06g0xx.dtsi           |     |  r9a06g032.dtsi              |     |  r9a06g032-rzn1d-db.dts  |
|                           |     |                              |     |                          |
|  compatible =             |     |  compatible =                |     |  compatible =            |
|    "renesas,rzn1";        |     |    "renesas,r9a06g032",      |     |    "renesas,rzn1d-db",   |
|                           +---->|    "renesas,rzn1";           +---->|    "renesas,r9a06g032",  |
|  ...                      |     |  ...                         |     |    "renesas,rzn1";       |
|  compatible =             |     |  compatible =                |     |                          |
|    "renesas,rzn1-reset";  |     |    "renesas,r9a06g032-reset",|     |                          |
|                           |     |    "renesas,rzn1-reset";     |     |                          |
|                           |     |                              |     |                          |
+---------------------------+     +------------------------------+     +--------------------------+
  Family File, "rzn1" only          Future, potential SoC                Board File
                                    Specific override file

The idea is that the 1D and 1S share /everything/ apart from one extra QSPI,
no DDR, one less CPU and a few other bits and bobs. So the r9a06g0xx.dtsi will
contain 98% of both SoC bindings. *Perhaps* later I could add a
r9a06g03[23].dtsi for SoC-specific bindings, but currently that is not
necessary, so we won't need that file.

Is everyone happy with this? I've got a v3 simmering on the fire, but I'd
really like everyone to be happy with the proposed solution...

Cheers,
Michel

On 22 March 2018 11:45, I wrote:
> This series adds the plain basic support for booting a bare kernel on the
> RZ/N1D-DB Board. It's been trimmed to the strict minimum as a 'base';
> further patches will add the rest of the support, pinctrl, clock
> architecture and quite a few others.
>
> Thanks for the comments on the previous version!
>
> v2:
>  + Fixes for suggestions by Simon Horman
>  + Fixes for suggestions by Rob Herring
>  + Fixes for suggestions by Geert Uytterhoeven
>  + Removed the mach file
>  + Added a MFD base for the sysctrl block
>  + Added a regmap based sub driver for the reboot handler
>  + Renamed the files to match shmobile conventions
>  + Adapted the compatible= strings to reflect 'family' vs 'part'
>    distinction.
>  + Removed the sysctrl.h file entirely.
>  + Fixed every warning from the DTC compiler on W=12 mode.
>  + Split the device-tree patches from the code.
>
> Michel Pollet (8):
>   DT: mfd: renesas,rzn1-sysctrl: document RZ/N1 sysctrl node
>   DT: reset: renesas,rzn1-reboot: document RZ/N1 reboot driver
>   DT: arm: renesas,r9a06g032: add the RZ/N1 bindings
>   reset: Renesas RZ/N1 reboot driver
>   arm: rzn1: Add the RZ/N1 arch to the shmobile Kconfig
>   DT: arm: Add Renesas RZ/N1 SoC base device tree file
>   DT: arm: Add Renesas RZN1D-DB Board base file
>   DT: arm: Add the RZN1D-DB Board to Renesas Makefile target
>
>  Documentation/devicetree/bindings/arm/shmobile.txt |   5 +-
>  .../bindings/mfd/renesas,rzn1-sysctrl.txt  |  22 +
>  .../bindings/power/renesas,rzn1-reboot.txt |  22 +
>  arch/arm/boot/dts/Makefile |   1 +
>  arch/arm/boot/dts/r9a06g032-rzn1d400-db.dts|  26 +
>  arch/arm/boot/dts/r9a06g0xx.dtsi   |  96 ++
>  arch/arm/mach-shmobile/Kconfig |   5 +
>  drivers/power/reset/Kconfig|   7 ++
>  drivers/power/reset/Makefile   |   1 +
>  drivers/power/reset/rzn1-reboot.c  | 109 +
>  10 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/devicetree/bindings/mfd/renesas,rzn1-sysctrl.txt
>  create mode 100644 Documentation/devicetree/bindings/power/renesas,rzn1-reboot.txt
>  create mode 100644 arch/arm/boot/dts/r9a06g032-rzn1d400-db.dts
>  create mode 100644 arch/arm/boot/dts/r9a06g0xx.dtsi
>  create mode 100644 drivers/power/reset/rzn1-reboot.c
> --
> 2.7.4





Re: [PATCH v3 2/2] net: macb: Try to retrieve MAC addess from nvmem provider

2018-03-28 Thread Nicolas Ferre

On 27/03/2018 at 11:52, Mike Looijmans wrote:

Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
cell, if one is provided in the device tree. This allows the address to
be stored in an I2C EEPROM device for example.

Signed-off-by: Mike Looijmans 


For this part:
Acked-by: Nicolas Ferre 


---
  drivers/net/ethernet/cadence/macb_main.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e84afcf..eabe14f 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3950,10 +3950,16 @@ static int macb_probe(struct platform_device *pdev)
dev->max_mtu = ETH_DATA_LEN;
  
  	mac = of_get_mac_address(np);

-   if (mac)
+   if (mac) {
ether_addr_copy(bp->dev->dev_addr, mac);
-   else
-   macb_get_hwaddr(bp);
+   } else {
+   err = of_get_nvmem_mac_address(np, bp->dev->dev_addr);
+   if (err) {
+   if (err == -EPROBE_DEFER)
+   goto err_out_free_netdev;
+   macb_get_hwaddr(bp);
+   }
+   }
  
  	err = of_get_phy_mode(np);

if (err < 0) {




--
Nicolas Ferre
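
The lookup order introduced by this patch (device tree address first, then the nvmem cell, then the address already programmed in hardware, with a deferred probe propagated to the caller) can be sketched in user space as follows. This is only a model of the control flow: `struct mac_sources`, `resolve_mac` and the `ERR_*` values are hypothetical and do not correspond to any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN  6
#define ERR_DEFER (-1)   /* hypothetical: provider not ready, retry later */
#define ERR_OTHER (-2)   /* hypothetical: any other lookup failure */

/* Hypothetical description of what each address source would yield. */
struct mac_sources {
	const unsigned char *dt_mac;          /* NULL if the DT has no address */
	int nvmem_err;                        /* 0 on success, ERR_* otherwise */
	unsigned char nvmem_mac[ETH_ALEN];
	unsigned char hw_mac[ETH_ALEN];
};

/*
 * Returns 0 with 'out' filled in, or ERR_DEFER (leaving 'out' untouched)
 * when the nvmem provider is not available yet.
 */
int resolve_mac(const struct mac_sources *src, unsigned char out[ETH_ALEN])
{
	assert(src != NULL && out != NULL);

	if (src->dt_mac) {
		memcpy(out, src->dt_mac, ETH_ALEN);
		return 0;
	}
	if (src->nvmem_err == 0) {
		memcpy(out, src->nvmem_mac, ETH_ALEN);
		return 0;
	}
	if (src->nvmem_err == ERR_DEFER)
		return ERR_DEFER;

	/* Last resort: whatever address the hardware already holds. */
	memcpy(out, src->hw_mac, ETH_ALEN);
	return 0;
}
```

Only the deferral is treated as fatal for the current attempt; any other nvmem failure falls through to the hardware address, matching the structure of the hunk above.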


Re: [PATCH] mm/list_lru: replace spinlock with RCU in __list_lru_count_one

2018-03-28 Thread kbuild test robot
Hi Li,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.16-rc7]
[cannot apply to next-20180327]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Li-RongQing/mm-list_lru-replace-spinlock-with-RCU-in-__list_lru_count_one/20180328-042620
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> mm/list_lru.c:59:15: sparse: incompatible types in comparison expression (different address spaces)
   mm/list_lru.c:61:24: sparse: incompatible types in comparison expression (different address spaces)

vim +59 mm/list_lru.c

51  
52  static inline struct list_lru_one *
53  list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
54  {
55  struct list_lru_memcg *tmp;
56  
57  WARN_ON_ONCE(!rcu_read_lock_held());
58  
  > 59  tmp = rcu_dereference(nlru->memcg_lrus);
60  if (tmp && idx >= 0)
61  return rcu_dereference(tmp->lru[idx]);
62  
63  return &nlru->lru;
64  }
65  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH] nvme: don't send keep-alives to the discovery controller

2018-03-28 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 15/19] csky: Build infrastructure

2018-03-28 Thread Guo Ren
Hi Arnd,

On Wed, Mar 28, 2018 at 09:40:49AM +0200, Arnd Bergmann wrote:
> Ok, thanks for the clarification. Obviously if they are mutually incompatible,
> there is no point in using a common kernel, so your current version is
> absolutely fine, and this is similar to how we cannot have a common kernel
> between ARMv5, ARMv7-A and ARMv7-M, which are all incompatible
> at the kernel level.
Yes.

> One more question for my understanding: Are the three types of ck8xx
> CPUs mutually incompatible in user space as well, or are the differences
> only for the kernel? For the ARM example, ARMv5 and ARMv7
> fundamentally require separate kernels, but both can run user space
> programs built for ARMv5.

 An app built with -mcpu=ck807 can run on ck807, ck810 and ck860.
 An app built with -mcpu=ck810 can run on ck807, ck810 and ck860.
 An app built with -mcpu=ck860 can only run on ck860.

They are all incompatible at the kernel level.

Best Regards
 Guo Ren


Re: [PATCH] nvme-multipath: implement active-active round-robin path selector

2018-03-28 Thread Christoph Hellwig
For PCIe devices the right policy is not round robin but to use
the PCIe device closest to the node.  I did a prototype for that
long ago and the concept can work.  Can you look into that and
also make that policy the default for PCIe devices?


Re: [PATCH] nvme: unexport nvme_start_keep_alive

2018-03-28 Thread Christoph Hellwig
On Mon, Mar 26, 2018 at 10:40:15AM +0200, Johannes Thumshirn wrote:
> nvme_start_keep_alive() isn't used outside core.c so unexport it and
> make it static.
> 
> Signed-off-by: Johannes Thumshirn 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: NFS mounts failing when keytab present on client

2018-03-28 Thread M A Young
On Tue, 27 Mar 2018, Eric Biggers wrote:

> Hi Michael,
> 
> On Tue, Mar 27, 2018 at 11:06:14PM +0100, Michael Young wrote:
> > NFS mounts stopped working on one of my computers after a kernel update from
> > 4.15.3 to 4.15.4. I traced the problem to the commit
> > [46e8d06e423c4f35eac7a8b677b713b3ec9b0684] crypto: hash - prevent using
> > keyed hashes without setting key
> > and a later kernel with this patch reverted works normally.
> > 
> > The problem seems to be related to kerberos as the mount fails when the
> > keytab is present, but works if I rename the keytab file. This is true even
> > though the mount is with sec=sys . The mount should also work with sec=krb5
> > but that also fails in the same way. When the mount fails there are errors
> > in dmesg like
> > [ 1232.522816] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.522819] RPC: couldn't encode RPC header, exit EIO
> > [ 1232.522856] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.522857] RPC: couldn't encode RPC header, exit EIO
> > [ 1232.522863] NFS: nfs4_discover_server_trunking unhandled error -5.
> > Exiting with error EIO
> > [ 1232.525039] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.525042] RPC: couldn't encode RPC header, exit EIO
> > 
> > Michael Young
> 
> Thanks for the bug report.  I think the error is coming from
> net/sunrpc/auth_gss/gss_krb5_crypto.c.  There are two potential problems I see.
> The first one, which is definitely a bug, is that make_checksum_hmac_md5()
> allocates an HMAC transform and request, then does these crypto API calls:
> 
>   crypto_ahash_init()
>   crypto_ahash_setkey()
>   crypto_ahash_digest()
> 
> This is wrong because it makes no sense to init() the HMAC request before the
> key has been set, and doubly so when it's calling digest() which is shorthand
> for init() + update() + final().  So I think it just needs to be removed.  You
> can test the following patch:
> 
> diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> index 12649c9fedab..8654494b4d0a 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> @@ -237,9 +237,6 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
>  
>  	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
>  
> -	err = crypto_ahash_init(req);
> -	if (err)
> -		goto out;
>  	err = crypto_ahash_setkey(hmac_md5, cksumkey, kctx->gk5e->keylength);
>  	if (err)
>  		goto out;
> 
> If that's not it, it's also possible that the error is coming from the
> crypto_ahash_init() in make_checksum().  That can only happen if 'cksumkey' is
> NULL and the hash algorithm is keyed, which implies a logical error as it
> doesn't make sense to use a keyed hash algorithm without the key.  The callers
> do check kctx->gk5e->keyed_cksum which I'd hope would prevent this, though
> perhaps kctx->cksum can be NULL.
> 
> Eric

The patch fixes the problem.

Michael Young
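
For readers following the thread, the call-order problem Eric describes can
be sketched in plain userspace C. This is only a toy model: the toy_* names,
the need_key flag and the TOY_* error codes are hypothetical and are not the
kernel crypto API. It assumes, as the commit title suggests, that starting a
keyed hash fails until a key has been set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch only. */
#define TOY_ENOKEY 1
#define TOY_EINVAL 2

/* Toy stand-in for a keyed hash transform (not the kernel's struct). */
struct toy_tfm {
	bool need_key;	/* true until a key has been installed */
};

/* Toy stand-in for a hash request bound to a transform. */
struct toy_req {
	struct toy_tfm *tfm;
};

/* Install a key; after this the transform may be used. */
static int toy_setkey(struct toy_tfm *tfm, const unsigned char *key, size_t len)
{
	if (key == NULL || len == 0)
		return -TOY_EINVAL;
	tfm->need_key = false;
	return 0;
}

/* Start a hash operation; refuses to run on a keyed hash without a key. */
static int toy_init(struct toy_req *req)
{
	return req->tfm->need_key ? -TOY_ENOKEY : 0;
}

/* One-shot digest: shorthand for init() + update() + final(). */
static int toy_digest(struct toy_req *req)
{
	int err = toy_init(req);

	if (err)
		return err;
	/* update() and final() would hash the data here. */
	return 0;
}

/* Call order before the fix: init(), setkey(), digest(). */
static int checksum_old_order(const unsigned char *key, size_t len)
{
	struct toy_tfm tfm = { .need_key = true };
	struct toy_req req = { .tfm = &tfm };
	int err;

	err = toy_init(&req);		/* fails: no key has been set yet */
	if (err)
		return err;
	err = toy_setkey(&tfm, key, len);
	if (err)
		return err;
	return toy_digest(&req);
}

/* Call order after the fix: setkey(), digest(). */
static int checksum_fixed_order(const unsigned char *key, size_t len)
{
	struct toy_tfm tfm = { .need_key = true };
	struct toy_req req = { .tfm = &tfm };
	int err;

	err = toy_setkey(&tfm, key, len);
	if (err)
		return err;
	return toy_digest(&req);	/* digest() performs its own init() */
}
```

In this model the old order bails out at the first init(), which mirrors the
failure path removed by the patch above, while the fixed order succeeds
because digest() only starts hashing once the key is in place.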


arm/arm64: smp_spin_table.c for arm32?

2018-03-28 Thread Michel Pollet
Hi guys,

I'm currently adapting a port from a machine-file based approach to a
driver-based one, and I need arch/arm64/kernel/smp_spin_table.c -- it's
*exactly* my use case, but for arm32.
So what would be my options here?

1) Make myself a custom driver and ignore this one...
2) Fully duplicate smp_spin_table.c in arch/arm/kernel...
3) Something else involving a shared bit of code?

I would obviously prefer 3); however, I don't see any obvious way of
'sharing' anything between these two architectures...
Any suggestions greatly appreciated!

Regards,
Michel




Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, 
Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered 
No. 04586709.


Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-28 Thread Rafael J. Wysocki
On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki  wrote:
> On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote:
>> On 2018-03-20 16:45, Rafael J. Wysocki wrote:
>> > From: Rafael J. Wysocki 
>> >
>> > In order to address the issue with short idle duration predictions
>> > by the idle governor after the tick has been stopped, reorder the
>> > code in cpuidle_idle_call() so that the governor idle state selection
>> > runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
>> > by cpuidle_select() to decide whether or not to stop the tick.
>> >
>> > This isn't straightforward, because menu_select() invokes
>> > tick_nohz_get_sleep_length() to get the time to the next timer
>> > event and the number returned by the latter comes from
>> > __tick_nohz_idle_enter().  Fortunately, however, it is possible
>> > to compute that number without actually stopping the tick and with
>> > the help of the existing code.
>>
>> I think something is wrong with the new tick_nohz_get_sleep_length.
>> It seems to return a value that is too large, ignoring imminent
>> non-sched timers.
>
> That's a very useful hint, let me have a look.
>
>> I tested idle-loop-v7.3. It looks very similar to my previous results
>> on the first idle-loop-git-version [1]. Idle and traditional synthetic
>> powernightmares are mostly good.
>
> OK
>
>> But it selects too deep C-states for short idle periods, which is bad
>> for power consumption [2].
>
> That still needs to be improved, then.
>
>> I tracked this down with additional tests using
>> __attribute__((optimize("O0"))) menu_select
>> and perf probe. With this the behavior seems slightly different, but it
>> shows that data->next_timer_us is:
>> v4.16-rc6: the expected ~500 us [3]
>> idle-loop-v7.3: many milliseconds to minutes [4].
>> This leads the governor to wrongly select C6.
>>
>> Checking with 372be9e and 6ea0577, I can confirm that the change is
>> introduced by this patch.
>
> Yes, that's where the most intrusive reordering happens.

Overall, this is an interesting conundrum, because the case in
question is when the tick should never be stopped at all during the
workload and the code's behavior in that case should not change, so
the change was not intentional.

Now, from walking through the code, as long as can_stop_idle_tick()
returns 'true' all should be fine or at least I don't see why there is
any difference in behavior in that case.

However, if can_stop_idle_tick() returns 'false' (for example, because
need_resched() returns 'true' when it is evaluated), the behavior *is*
different in a couple of ways.  I sort of know how that can be
addressed, but I'd like to reproduce your results here.

Are you still using the same workload as before to trigger this behavior?


Re: [PATCH v9 05/24] mm: Introduce pte_spinlock for FAULT_FLAG_SPECULATIVE

2018-03-28 Thread Laurent Dufour
On 25/03/2018 23:50, David Rientjes wrote:
> On Tue, 13 Mar 2018, Laurent Dufour wrote:
> 
>> When handling a page fault without holding the mmap_sem, the fetch of the
>> pte lock pointer and the locking will have to be done while ensuring
>> that the VMA is not touched behind our back.
>>
>> So move the fetch and locking operations in a dedicated function.
>>
>> Signed-off-by: Laurent Dufour 
>> ---
>>  mm/memory.c | 15 +++
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 8ac241b9f370..21b1212a0892 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2288,6 +2288,13 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>>  }
>>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>>  
>> +static bool pte_spinlock(struct vm_fault *vmf)
> 
> inline?

You're right.
Indeed, this was done in patch 18, "mm: Provide speculative fault
infrastructure", but it has to be done here too. I'll fix that.

> 
>> +{
>> +vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +spin_lock(vmf->ptl);
>> +return true;
>> +}
>> +
>>  static bool pte_map_lock(struct vm_fault *vmf)
>>  {
>>  vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> 
> Shouldn't pte_unmap_same() take struct vm_fault * and use the new pte_spinlock()?

Done in the next patch, but you already acked it.



Re: [PATCH] ANDROID: binder: prevent transactions into own process.

2018-03-28 Thread Greg KH
On Wed, Mar 28, 2018 at 09:29:03AM +0200, Martijn Coenen wrote:
> This can't happen with normal nodes (because you can't get a ref
> to a node you own), but it could happen with the context manager;
> to make the behavior consistent with regular nodes, reject
> transactions into the context manager by the process owning it.
> 
> Reported-by: syzbot+09e05aba06723a94d...@syzkaller.appspotmail.com
> Signed-off-by: Martijn Coenen 

Does this need to go to older kernels as well?

I have a script that picks up everything the syzbot finds and tries to
backport them, after they are applied in Linus's tree.  Might as well
catch things before we have to rely on my script :)

thanks,

greg k-h


Re: [PATCH v3 05/11] dt-bindings: i3c: Document core bindings

2018-03-28 Thread Boris Brezillon
Hi Rob,

On Mon, 26 Mar 2018 17:24:58 -0500
Rob Herring  wrote:

> > +
> > +I3C devices
> > +===
> > +
> > +All I3C devices are supposed to support DAA (Dynamic Address Assignment), and
> > +are thus discoverable. So, by default, I3C devices do not have to be described
> > +in the device tree.
> > +This being said, one might want to attach extra resources to these devices,
> > +and those resources may have to be described in the device tree, which in turn
> > +means we have to describe I3C devices.
> > +
> > +Another use case for describing an I3C device in the device tree is when this
> > +I3C device has a static address and we want to assign it a specific dynamic
> > +address before the DAA takes place (so that other devices on the bus can't
> 
> static is I2C address and dynamic is an I3C address. That could be 
> clearer throughout.

I'll clarify that.

> 
> > +take this dynamic address).
> > +
> > +The I3C device should be named <device-type>@<static-address>,<i3c-pid>,
> 
> s/static-address/static-i2c-address/

Okay.

> 
> > +where device-type is describing the type of device connected on the bus
> > +(gpio-controller, sensor, ...).
> > +
> > +Required properties
> > +---
> > +- reg: contains 3 cells
> > +  + first cell : encodes the I2C address. Should be 0 if the device does not
> > +have one (0 is not a valid I3C address).
> 
> Change here to "encodes the static I2C address". 
> 
> 0 is not a valid I2C address?

According to [1] it is reserved, and it's reserved in the I3C spec
anyway (see "Table 9 I3C Slave Address Restrictions" in the I3C spec).

> 
> > +
> > +  + second and third cells: should encode the ProvisionalID. The second cell
> > +   contains the manufacturer ID left-shifted by 1.
> > +   The third cell contains ORing of the part ID
> > +   left-shifted by 16, the instance ID left-shifted
> > +   by 12 and the extra information. This encoding is
> > +   following the PID definition provided by the I3C
> > +   specification.

One extra question for you: should I refer to the I3C_DEV(),
I3C_DEV_WITH_STATIC_ADDR() and I2C_DEV() macros in the bindings doc?
And if I do, should I use them in my example?

Thanks,

Boris

> > +
> > +Optional properties
> > +---
> > +- assigned-address: dynamic address to be assigned to this device. This
> > +   property is only valid if the I3C device has a static
> > +   address (first cell of the reg property != 0).
> > +
> > +
> > +Example:
> > +
> > +   i3c-master@d04 {
> > +   compatible = "cdns,i3c-master";
> > +   clocks = <&coreclock>, <&i3csysclock>;
> > +   clock-names = "pclk", "sysclk";
> > +   interrupts = <3 0>;
> > +   reg = <0x0d04 0x1000>;
> > +   #address-cells = <3>;
> > +   #size-cells = <0>;
> > +
> > +   status = "okay";
> > +   i2c-scl-frequency = <10>;
> > +
> > +   /* I2C device. */
> > +   nunchuk: nunchuk@52 {
> > +   compatible = "nintendo,nunchuk";
> > +   reg = <0x52 0x8010 0x0>;
> > +   };
> > +
> > +   /* I3C device with a static address. */
> > +   thermal_sensor: sensor@68,39200144004 {
> > +   reg = <0x68 0x392 0x144004>;
> > +   assigned-address = <0xa>;
> > +   };
> > +
> > +   /*
> > +* I3C device without a static address but requiring resources
> > +* described in the DT.
> > +*/
> > +   sensor@0,39200154004 {
> > +   reg = <0x0 0x392 0x154004>;
> > +   clocks = <&clock_provider 0>;
> > +   };
> > +   };
> > +
> > -- 
> > 2.14.1
> >   

[1]http://www.i2c-bus.org/addressing

-- 
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com
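
As an aside for readers, the reg encoding described in the quoted binding
can be sketched as a small C helper. The function and struct names are
hypothetical, and the field masks (15-bit manufacturer ID, 4-bit instance
ID, 12-bit extra information) are assumptions based on the shifts in the
binding text and on the usual PID layout; the binding itself only states
the shifts.

```c
#include <stdint.h>

/* The three cells of the "reg" property, in order. */
struct i3c_reg_cells {
	uint32_t cell[3];
};

/*
 * Build the three reg cells as described in the quoted binding text:
 *   cell 0: static I2C address (0 if the device has none)
 *   cell 1: manufacturer ID left-shifted by 1
 *   cell 2: part ID << 16 | instance ID << 12 | extra information
 * The masks are assumptions, see the lead-in above.
 */
static struct i3c_reg_cells i3c_encode_reg(uint8_t static_i2c_addr,
					   uint16_t manuf_id,
					   uint16_t part_id,
					   uint8_t instance_id,
					   uint16_t extra_info)
{
	struct i3c_reg_cells r;

	r.cell[0] = static_i2c_addr;
	r.cell[1] = (uint32_t)(manuf_id & 0x7fff) << 1;
	r.cell[2] = ((uint32_t)part_id << 16) |
		    ((uint32_t)(instance_id & 0xf) << 12) |
		    (uint32_t)(extra_info & 0xfff);
	return r;
}
```

With these assumptions, a manufacturer ID of 0x1c9, part ID 0x14, instance
ID 4 and extra information 0x004 at static address 0x68 give the cells
<0x68 0x392 0x144004>, which matches the thermal_sensor node in the quoted
example.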


Re: [PATCH 4.9 00/67] 4.9.91-stable review

2018-03-28 Thread Greg Kroah-Hartman
On Tue, Mar 27, 2018 at 08:35:01PM -0500, Dan Rue wrote:
> qemu_x86_64
> * boot - pass: 21
> * kselftest - skip: 28, pass: 52

Do you have a list of what you are skipping anywhere?  There were some
x86 changes that I had to backport that I was worried about getting
right here; are you running the x86 kselftests?

thanks,

greg k-h


Re: [PATCH v2] rslib: Remove VLAs by setting upper bound on nroots

2018-03-28 Thread Thomas Gleixner
On Tue, 27 Mar 2018, Kees Cook wrote:
> On Tue, Mar 27, 2018 at 4:45 PM, Andrew Morton
>  wrote:
> > On Mon, 26 Mar 2018 16:17:57 -0700 Kees Cook  wrote:
> >
> >> On Fri, Mar 16, 2018 at 11:25 PM, Kees Cook  wrote:
> >> > On Fri, Mar 16, 2018 at 3:59 PM, Andrew Morton
> >> >  wrote:
> >> >> On Thu, 15 Mar 2018 15:59:19 -0700 Kees Cook  wrote:
> >> >>
> >> >>> Avoid stack VLAs[1] by always allocating the upper bound of stack space
> >> >>> needed. The existing users of rslib appear to max out at 24 roots[2],
> >> >>> so use that as the upper bound until we have a reason to change it.
> >> >>>
> >> >>> Alternative considered: make init_rs() a true caller-instance and
> >> >>> pre-allocate the workspaces. This would possibly need locking and
> >> >>> a refactoring of the returned structure.
> >> >>>
> >> >>> Using kmalloc in this path doesn't look great, especially since at
> >> >>> least one caller (pstore) is sensitive to allocations during rslib
> >> >>> usage (it expects to run it during an Oops, for example).
> >> >>
> >> >> Oh.
> >> >>
> >> >> Could we allocate the storage during init_rs(), attach it to `struct
> >> >> rs_control'?
> >> >
> >> > No, because they're modified during decode, and struct rs_control is
> >> > shared between users. :(
> >> >
> >> > Doing those changes is possible, but it requires a rather extensive
> >> > analysis of callers, etc.
> >> >
> >> > Hence, the 24 ultimately.
> >>
> >> Can this land in -mm, or does this need further discussion?
> >
> > Grumble.  That share-the-rs_control-if-there's-already-a-matching-one
> > thing looks like premature optimization to me :(

That was done back in the days when the first NAND chips required
Reed-Solomon error correction and we were still running on rather small devices.

> > I guess if we put this storage into the rs_control (rather than on the
> > stack) then we'd have to worry about concurrent uses of it.  It looks
> > like all the other fields are immutable once it's set up so there might
> > be such users.  In fact, I suspect there are...
> 
> Exactly. :( This is the same conclusion tglx and I came to.

I think we can lift that and just let all users set up a new rs_control
from scratch. Takes some time to init the tables, but 

Thanks,

tglx
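
For readers, the approach discussed in this thread (replace a stack VLA with
a buffer sized for the largest supported nroots, and reject anything larger)
can be sketched in userspace C. This is a toy illustration only: the names
are hypothetical, the XOR fold is a placeholder for real decoder work, and
none of it is the actual rslib code. The bound of 24 is the one quoted in
the thread.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Upper bound on the number of roots, as quoted in the thread. */
#define TOY_MAX_ROOTS 24

/*
 * Fold the first nroots input symbols into one value using a scratch
 * buffer on the stack. The buffer has a fixed size, where a VLA version
 * would have declared "uint16_t scratch[nroots]".
 * Returns 0 on success or -EINVAL if nroots is out of range.
 */
static int toy_fold_symbols(const uint16_t *in, int nroots, uint16_t *out)
{
	uint16_t scratch[TOY_MAX_ROOTS];
	uint16_t acc = 0;
	int i;

	/* The fixed buffer is only safe if the bound is enforced. */
	if (nroots < 0 || nroots > TOY_MAX_ROOTS)
		return -EINVAL;

	memcpy(scratch, in, (size_t)nroots * sizeof(scratch[0]));
	for (i = 0; i < nroots; i++)
		acc ^= scratch[i];

	*out = acc;
	return 0;
}
```

The trade-off is the one raised in the thread: the stack frame always pays
for the maximum, but the path stays free of allocations, which matters for
callers that may run during an Oops.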


Re: [PATCH 4/4] nvme: lightnvm: add late setup of block size and metadata

2018-03-28 Thread Christoph Hellwig
I really don't want more lightnvm cruft in the core.  We'll need
a proper abstraction.

On Fri, Mar 23, 2018 at 12:00:08PM +0100, Matias Bjørling wrote:
> On 02/05/2018 01:15 PM, Matias Bjørling wrote:
> > The nvme driver sets up the size of the nvme namespace in two steps.
> > First it initializes the device with standard logical block and
> > metadata sizes, and then sets the correct logical block and metadata
> > size. Because the OCSSD 2.0 specification relies on the namespace to
> > expose these sizes for correct initialization, let them be updated
> > appropriately on the LightNVM side as well.
> > 
> > Signed-off-by: Matias Bjørling 
> > ---
> >   drivers/nvme/host/core.c | 2 ++
> >   drivers/nvme/host/lightnvm.c | 8 
> >   drivers/nvme/host/nvme.h | 2 ++
> >   3 files changed, 12 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index f837d666cbd4..740ceb28067c 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1379,6 +1379,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
> > if (ns->noiob)
> > nvme_set_chunk_size(ns);
> > nvme_update_disk_info(disk, ns, id);
> > +   if (ns->ndev)
> > +   nvme_nvm_update_nvm_info(ns);
> >   #ifdef CONFIG_NVME_MULTIPATH
> > if (ns->head->disk)
> > nvme_update_disk_info(ns->head->disk, ns, id);
> > diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> > index a9c010655ccc..8d4301854811 100644
> > --- a/drivers/nvme/host/lightnvm.c
> > +++ b/drivers/nvme/host/lightnvm.c
> > @@ -814,6 +814,14 @@ int nvme_nvm_ioctl(struct nvme_ns *ns, unsigned int cmd, unsigned long arg)
> > }
> >   }
> > +void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
> > +{
> > +   struct nvm_dev *ndev = ns->ndev;
> > +
> > +   ndev->identity.csecs = ndev->geo.sec_size = 1 << ns->lba_shift;
> > +   ndev->identity.sos = ndev->geo.oob_size = ns->ms;
> > +}
> > +
> >   int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >   {
> > struct request_queue *q = ns->queue;
> > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > index ea1aa5283e8e..1ca08f4993ba 100644
> > --- a/drivers/nvme/host/nvme.h
> > +++ b/drivers/nvme/host/nvme.h
> > @@ -451,12 +451,14 @@ static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
> >   #endif /* CONFIG_NVME_MULTIPATH */
> >   #ifdef CONFIG_NVM
> > +void nvme_nvm_update_nvm_info(struct nvme_ns *ns);
> >   int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);
> >   void nvme_nvm_unregister(struct nvme_ns *ns);
> >   int nvme_nvm_register_sysfs(struct nvme_ns *ns);
> >   void nvme_nvm_unregister_sysfs(struct nvme_ns *ns);
> >   int nvme_nvm_ioctl(struct nvme_ns *ns, unsigned int cmd, unsigned long arg);
> >   #else
> > +static inline void nvme_nvm_update_nvm_info(struct nvme_ns *ns) {};
> >   static inline int nvme_nvm_register(struct nvme_ns *ns, char *disk_name,
> > int node)
> >   {
> > 
> 
> Hi Keith,
> 
> When going through the patches for 4.17, I forgot to run this patch by you.
> It is part of adding OCSSD 2.0 support to the kernel, and slides in between a
> large refactoring and the 2.0 part. May I add your Reviewed-by and let Jens
> pick it up after the nvme patches for 4.17 have gone up?
> 
> Thanks!
> 
> -Matias
> 


Re: [PATCH v9 06/24] mm: make pte_unmap_same compatible with SPF

2018-03-28 Thread Laurent Dufour


On 27/03/2018 23:18, David Rientjes wrote:
> On Tue, 13 Mar 2018, Laurent Dufour wrote:
> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 2f3e98edc94a..b6432a261e63 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1199,6 +1199,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
>>  #define VM_FAULT_NEEDDSYNC  0x2000  /* ->fault did not modify page tables
>>   * and needs fsync() to complete (for
>>   * synchronous page faults in DAX) */
>> +#define VM_FAULT_PTNOTSAME 0x4000   /* Page table entries have changed */
>>  
>>  #define VM_FAULT_ERROR  (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
>>   VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 21b1212a0892..4bc7b0bdcb40 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2309,21 +2309,29 @@ static bool pte_map_lock(struct vm_fault *vmf)
>>   * parts, do_swap_page must check under lock before unmapping the pte and
>>   * proceeding (but do_wp_page is only called after already making such a check;
>>   * and do_anonymous_page can safely check later on).
>> + *
>> + * pte_unmap_same() returns:
>> + *  0   if the PTE are the same
>> + *  VM_FAULT_PTNOTSAME  if the PTE are different
>> + *  VM_FAULT_RETRY  if the VMA has changed in our back during
>> + *  a speculative page fault handling.
>>   */
>> -static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>> -pte_t *page_table, pte_t orig_pte)
>> +static inline int pte_unmap_same(struct vm_fault *vmf)
>>  {
>> -int same = 1;
>> +int ret = 0;
>> +
>>  #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
>>  if (sizeof(pte_t) > sizeof(unsigned long)) {
>> -spinlock_t *ptl = pte_lockptr(mm, pmd);
>> -spin_lock(ptl);
>> -same = pte_same(*page_table, orig_pte);
>> -spin_unlock(ptl);
>> +if (pte_spinlock(vmf)) {
>> +if (!pte_same(*vmf->pte, vmf->orig_pte))
>> +ret = VM_FAULT_PTNOTSAME;
>> +spin_unlock(vmf->ptl);
>> +} else
>> +ret = VM_FAULT_RETRY;
>>  }
>>  #endif
>> -pte_unmap(page_table);
>> -return same;
>> +pte_unmap(vmf->pte);
>> +return ret;
>>  }
>>  
>>  static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
>> @@ -2913,7 +2921,8 @@ int do_swap_page(struct vm_fault *vmf)
>>  int exclusive = 0;
>>  int ret = 0;
> 
> Initialization is now unneeded.

I'm sorry, which "initialization" are you talking about here?

> 
> Otherwise:
> 
> Acked-by: David Rientjes 

Thanks,
Laurent.


