date:20180408

WARNING in ip_rt_bug

2018-04-08 Thread syzbot


Hello,

syzbot hit the following crash on net-next commit
8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +0000)
Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.

If you forward the report, please keep this part and the footer.

netlink: 'syz-executor6': attribute type 3 has an invalid length.
WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212

Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x1f4/0x2b0 lib/bug.c:186
 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
RSP: 0018:8801db007290 EFLAGS: 00010282
RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca
RDX: 0100 RSI: 8858c300 RDI: 0282
RBP: 8801db007298 R08: 11003b600de1 R09: 
R10:  R11:  R12: 8801d8dda3c0
R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
 dst_link_failure include/net/dst.h:427 [inline]
 arp_error_report+0xae/0x180 net/ipv4/arp.c:297
 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:541 [inline]
 smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline]
RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923
RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12
RAX: dc00 RBX: 8801d225e400 RCX: 
RDX: 110a24e5 RSI: b98b8227 RDI: 0282
RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004
R10: 880197b3f960 R11: 0003 R12: 110032f67f36
R13:  R14:  R15: 0001
 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84
 __bprm_mm_init fs/exec.c:297 [inline]
 bprm_mm_init fs/exec.c:414 [inline]
 do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771
 do_execve+0x31/0x40 fs/exec.c:1847
 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.

Note: all commands must start from beginning of the line in the email body.

BUG: soft lockup in snd_virmidi_output_trigger

2018-04-08 Thread syzbot


Hello,

syzbot hit the following crash on upstream commit
3fd14cdcc05a682b03743683ce3a726898b20555 (Fri Apr 6 19:15:41 2018 +0000)
Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=619d9f40141d826b097e


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4594231414882304
Kernel config: https://syzkaller.appspot.com/x/.config?id=-5813481738265533882

compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+619d9f40141d826b0...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.

If you forward the report, please keep this part and the footer.

8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready
8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready
8021q: adding VLAN 0 to HW filter on device team0
watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor2:10431]
Modules linked in:
irq event stamp: 35856
hardirqs last  enabled at (35855): [] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
hardirqs last  enabled at (35855): [] _raw_spin_unlock_irqrestore+0x74/0xc0 kernel/locking/spinlock.c:184
hardirqs last disabled at (35856): [] interrupt_entry+0xb1/0xf0 arch/x86/entry/entry_64.S:624
softirqs last  enabled at (162): [] __do_softirq+0x778/0xaf5 kernel/softirq.c:311
softirqs last disabled at (95): [] invoke_softirq kernel/softirq.c:365 [inline]
softirqs last disabled at (95): [] irq_exit+0x1d1/0x200 kernel/softirq.c:405

CPU: 1 PID: 10431 Comm: syz-executor2 Not tainted 4.16.0+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0 kernel/locking/spinlock.c:184
RSP: 0018:880184db7780 EFLAGS: 0282 ORIG_RAX: ff13
RAX: dc00 RBX: 0282 RCX: 
RDX: 11162e55 RSI: 0001 RDI: 0282
RBP: 880184db7790 R08: ed0035d21962 R09: 
R10:  R11:  R12: 8801ae90cb08
R13: 880184db7810 R14: 0001 R15: 8801cb9a5880
FS:  7fc943ad8700() GS:8801db10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fc943ad7db8 CR3: 0001b070d000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 spin_unlock_irqrestore include/linux/spinlock.h:365 [inline]
 snd_virmidi_output_trigger+0x522/0x6c0 sound/core/seq/seq_virmidi.c:205
 snd_rawmidi_output_trigger sound/core/rawmidi.c:150 [inline]
 snd_rawmidi_kernel_write1+0x519/0x700 sound/core/rawmidi.c:1288
 snd_rawmidi_write+0x2e2/0xdc0 sound/core/rawmidi.c:1338
 __vfs_write+0x10b/0x880 fs/read_write.c:485
 vfs_write+0x1f8/0x560 fs/read_write.c:549
 ksys_write+0xf9/0x250 fs/read_write.c:598
 SYSC_write fs/read_write.c:610 [inline]
 SyS_write+0x24/0x30 fs/read_write.c:607
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455259
RSP: 002b:7fc943ad7c68 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 7fc943ad86d4 RCX: 00455259
RDX: e78e624c RSI: 2040 RDI: 0014
RBP: 0072c010 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 06ca R14: 006fd390 R15: 0002
Code: c7 a8 72 b1 88 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80  
3c 02 00 75 21 48 83 3d 4e 5b 67 01 00 74 0e 48 89 df 57 9d <0f> 1f 44 00  
00 eb bb 0f 0b 0f 0b e8 6f 29 68 fa eb 97 e8 68 29



---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.

Note: all commands must start from beginning of the line in the email body.

Re: KASAN: slab-out-of-bounds Read in pfkey_add

2018-04-08 Thread Kevin Easton

On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote:
...
> 
> Looks like this is going to be fixed by
> https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
> provided sadb_key"), but it's not applied yet to the ipsec tree yet.  Kevin, for
> future reference, for syzbot bugs it would be helpful to reply to the original
> bug report and say that a patch was sent out, or even better send the patch as a
> reply to the bug report email, e.g.
> 
>   git format-patch --in-reply-to="<001a114292fadd3e250560706...@google.com>"
> 
> for this one (and the Message ID can be found in the syzkaller-bugs archive even
> if the email isn't in your inbox).

Sure, I can do that.

- Kevin

[PATCH v3 1/4] zram: correct flag name of ZRAM_ACCESS

2018-04-08 Thread Minchan Kim

ZRAM_ACCESS is used for locking a slot of zram, so correct its name
to ZRAM_LOCK. It is also not a regular flag indicating the status of
the block, so move its declaration to the top of the flag enum.
Lastly, move the slot lock/unlock functions to the top of the source
file so they can be used without a forward declaration.

Signed-off-by: Minchan Kim 
---
 drivers/block/zram/zram_drv.c | 20 ++--
 drivers/block/zram/zram_drv.h |  6 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd71230..18dadeab775b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,16 @@ static size_t huge_class_size;
 
 static void zram_free_page(struct zram *zram, size_t index);
 
+static void zram_slot_lock(struct zram *zram, u32 index)
+{
+   bit_spin_lock(ZRAM_LOCK, &zram->table[index].value);
+}
+
+static void zram_slot_unlock(struct zram *zram, u32 index)
+{
+   bit_spin_unlock(ZRAM_LOCK, &zram->table[index].value);
+}
+
 static inline bool init_done(struct zram *zram)
 {
    return zram->disksize;
@@ -753,16 +763,6 @@ static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
 static DEVICE_ATTR_RO(debug_stat);
 
-static void zram_slot_lock(struct zram *zram, u32 index)
-{
-   bit_spin_lock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
-static void zram_slot_unlock(struct zram *zram, u32 index)
-{
-   bit_spin_unlock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
    size_t num_pages = disksize >> PAGE_SHIFT;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 008861220723..8d8959ceabd1 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -43,9 +43,9 @@
 
 /* Flags for zram pages (table[page_no].value) */
 enum zram_pageflags {
-   /* Page consists the same element */
-   ZRAM_SAME = ZRAM_FLAG_SHIFT,
-   ZRAM_ACCESS,    /* page is now accessed */
+   /* zram slot is locked */
+   ZRAM_LOCK = ZRAM_FLAG_SHIFT,
+   ZRAM_SAME,  /* Page consists the same element */
    ZRAM_WB,    /* page is stored on backing_device */
 
    __NR_ZRAM_PAGEFLAGS,
-- 
2.17.0.484.g0c8726318c-goog

[PATCH v3 2/4] zram: mark incompressible page as ZRAM_HUGE

2018-04-08 Thread Minchan Kim

Mark incompressible pages so that, with the upcoming zram memory
tracking feature, we can find out who owns them once they are
swapped out.

With that information, the owner could keep such pages from being
swapped out by using mlock, or remove them altogether.

This patch also exposes a new stat for huge pages via mm_stat.
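With this patch, mm_stat gains an eighth column. A userspace tool might read it as in the following sketch, which assumes the column order of the format string in the mm_stat_show() hunk of this patch; struct mm_stat_sketch and parse_mm_stat() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Field order follows the mm_stat_show() format string in this patch. */
struct mm_stat_sketch {
	unsigned long long orig_data_size;
	unsigned long long compr_data_size;
	unsigned long long mem_used_total;
	unsigned long long mem_limit;
	unsigned long long mem_used_max;
	unsigned long long same_pages;
	unsigned long long pages_compacted;
	unsigned long long huge_pages;   /* new eighth column */
};

/* Returns the number of fields parsed: 8 with this patch applied,
 * 7 on a kernel that does not print the huge_pages column. */
static int parse_mm_stat(const char *line, struct mm_stat_sketch *st)
{
	assert(line && st);
	return sscanf(line, "%llu %llu %llu %llu %llu %llu %llu %llu",
		      &st->orig_data_size, &st->compr_data_size,
		      &st->mem_used_total, &st->mem_limit, &st->mem_used_max,
		      &st->same_pages, &st->pages_compacted, &st->huge_pages);
}
```

On a running system the input line would be read from /sys/block/zram0/mm_stat; a return value of 7 suggests a kernel without this patch.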

Signed-off-by: Minchan Kim 
---
 Documentation/blockdev/zram.txt |  1 +
 drivers/block/zram/zram_drv.c   | 17 ++---
 drivers/block/zram/zram_drv.h   |  2 ++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 257e65714c6a..78db38d02bc9 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -218,6 +218,7 @@ The stat file represents device's mm statistics. It consists of a single
  same_pages   the number of same element filled pages written to this disk.
   No memory is allocated for such pages.
  pages_compacted  the number of pages freed during compaction
+ huge_pages  the number of incompressible pages
 
 9) Deactivate:
swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 18dadeab775b..777fb3339f59 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -729,14 +729,15 @@ static ssize_t mm_stat_show(struct device *dev,
max_used = atomic_long_read(&zram->stats.max_used_pages);
 
ret = scnprintf(buf, PAGE_SIZE,
-   "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
+   "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n",
orig_size << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.compr_data_size),
mem_used << PAGE_SHIFT,
zram->limit_pages << PAGE_SHIFT,
max_used << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.same_pages),
-   pool_stats.pages_compacted);
+   pool_stats.pages_compacted,
+   (u64)atomic64_read(&zram->stats.huge_pages));
up_read(>init_lock);
 
return ret;
@@ -805,6 +806,11 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
unsigned long handle;
 
+   if (zram_test_flag(zram, index, ZRAM_HUGE)) {
+   zram_clear_flag(zram, index, ZRAM_HUGE);
+   atomic64_dec(&zram->stats.huge_pages);
+   }
+
if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) {
zram_wb_clear(zram, index);
atomic64_dec(&zram->stats.pages_stored);
@@ -973,6 +979,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
}
 
if (unlikely(comp_len >= huge_class_size)) {
+   comp_len = PAGE_SIZE;
if (zram_wb_enabled(zram) && allow_wb) {
zcomp_stream_put(zram->comp);
ret = write_to_bdev(zram, bvec, index, bio, );
@@ -984,7 +991,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
allow_wb = false;
goto compress_again;
}
-   comp_len = PAGE_SIZE;
}
 
/*
@@ -1046,6 +1052,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
zram_slot_lock(zram, index);
zram_free_page(zram, index);
 
+   if (comp_len == PAGE_SIZE) {
+   zram_set_flag(zram, index, ZRAM_HUGE);
+   atomic64_inc(&zram->stats.huge_pages);
+   }
+
if (flags) {
zram_set_flag(zram, index, flags);
zram_set_element(zram, index, element);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 8d8959ceabd1..ff0547bdb586 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -47,6 +47,7 @@ enum zram_pageflags {
ZRAM_LOCK = ZRAM_FLAG_SHIFT,
ZRAM_SAME,  /* Page consists the same element */
ZRAM_WB,/* page is stored on backing_device */
+   ZRAM_HUGE,  /* Incompressible page */
 
__NR_ZRAM_PAGEFLAGS,
 };
@@ -71,6 +72,7 @@ struct zram_stats {
atomic64_t invalid_io;  /* non-page-aligned I/O requests */
atomic64_t notify_free; /* no. of swap slot free notifications */
atomic64_t same_pages;  /* no. of same element filled pages */
+   atomic64_t huge_pages;  /* no. of huge pages */
atomic64_t pages_stored;/* no. of pages currently stored */
atomic_long_t max_used_pages;   /* no. of maximum pages stored */
atomic64_t writestall;  /* no. of write slow paths */
-- 
2.17.0.484.g0c8726318c-goog
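The accounting in this patch keeps one invariant: a slot is counted in stats.huge_pages exactly while ZRAM_HUGE is set on it, because zram_free_page() clears the flag and decrements the counter before a new write can set them again. Moving the comp_len = PAGE_SIZE assignment ahead of the writeback branch appears intended so that pages written to the backing device are also marked huge, which matches the ".wh" example in the cover letter. A simplified sketch of the invariant, with hypothetical sketch_* types, an assumed 4096-byte page, and the writeback path left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for illustration */

/* Hypothetical, simplified stand-ins for a zram slot and device. */
struct sketch_slot {
	size_t stored_len;  /* bytes kept for this slot, 0 = empty */
	bool huge;          /* plays the role of ZRAM_HUGE */
};

struct sketch_dev {
	size_t huge_class_size;  /* threshold, like huge_class_size */
	long long huge_pages;    /* plays the role of stats.huge_pages */
};

/* Like zram_free_page(): undo the huge accounting first. */
static void sketch_free_slot(struct sketch_dev *d, struct sketch_slot *s)
{
	if (s->huge) {
		assert(d->huge_pages > 0);
		s->huge = false;
		d->huge_pages--;
	}
	s->stored_len = 0;
}

/* Like the tail of __zram_bvec_write(), without the writeback path. */
static void sketch_write_slot(struct sketch_dev *d, struct sketch_slot *s,
			      size_t comp_len)
{
	/* Poorly compressed data is stored as a full, uncompressed page. */
	if (comp_len >= d->huge_class_size)
		comp_len = SKETCH_PAGE_SIZE;

	sketch_free_slot(d, s);  /* drop the old contents' accounting */

	if (comp_len == SKETCH_PAGE_SIZE) {
		s->huge = true;
		d->huge_pages++;
	}
	s->stored_len = comp_len;
}
```

Overwriting a huge slot with another huge page leaves the counter unchanged, and overwriting it with compressible data brings the counter back down.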

[PATCH v3 0/4] zram memory tracking

2018-04-08 Thread Minchan Kim

zRAM as swap is useful for small-memory devices. However, because of the
VM's LRU algorithm, the pages that end up in zram are mostly cold pages.
In particular, once an application's init data has been touched during
launch, it tends not to be accessed again and is eventually swapped out.
zRAM can store such cold pages in compressed form, but there is little
point in keeping them in memory at all. Likewise, there is little point
in storing incompressible pages in zram, so a better approach is for app
developers to manage such memory directly, e.g. with free or mlock,
rather than leaving it on the heap.

This patch provides a debugfs file, /sys/kernel/debug/zram/zram0/block_state,
that shows each block's state, so an admin can use pagemap to find out
which memory is cold, incompressible, or same-filled once the pages are
swapped out.

The output is as follows:
  300    75.033841 .wh
  301    63.806904 s..
  302    63.806919 ..h

The first column is zram's block index, the second is the time the block
was last accessed (in seconds, with microsecond resolution), and the third
shows the block's state (s: same page, w: page written to backing store,
h: huge page). So the example above means that block 300 was last accessed
at 75.033841 seconds and is huge, so it was written to the backing store.
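For scripting, each block_state line can be split into its three columns. A minimal parser sketch, assuming exactly the three-column, three-flag format of the example above; struct block_state_sketch and parse_block_state() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one block_state line. */
struct block_state_sketch {
	unsigned long index;   /* zram block index */
	unsigned long sec;     /* last access, whole seconds */
	unsigned long usec;    /* last access, microseconds part */
	bool same;             /* 's' */
	bool written_back;     /* 'w' */
	bool huge;             /* 'h' */
};

/* Parse a line such as "300    75.033841 .wh"; false if malformed. */
static bool parse_block_state(const char *line, struct block_state_sketch *bs)
{
	char flags[4] = "";

	assert(line && bs);
	/* The microseconds part is zero-padded to six digits, so reading it
	 * as a plain decimal integer yields the right value. */
	if (sscanf(line, "%lu %lu.%lu %3s", &bs->index, &bs->sec,
		   &bs->usec, flags) != 4 || strlen(flags) != 3)
		return false;
	/* Each flag has a fixed position; '.' means "not set". */
	bs->same = flags[0] == 's';
	bs->written_back = flags[1] == 'w';
	bs->huge = flags[2] == 'h';
	return true;
}
```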

* From v2:
  * debugfs and Kconfig cleanup - Greg KH
  * Remove unnecesarry buffer - Sergey
  * Change timestamp from sec to usec

* From v1:
  * Do not propagate error number for debugfs fail - Greg KH
  * Add writeback and hugepage information - Sergey

Minchan Kim (4):
  zram: correct flag name of ZRAM_ACCESS
  zram: mark incompressible page as ZRAM_HUGE
  zram: record accessed second
  zram: introduce zram memory tracking

 Documentation/blockdev/zram.txt |  25 +
 drivers/block/zram/Kconfig  |   9 ++
 drivers/block/zram/zram_drv.c   | 172 +---
 drivers/block/zram/zram_drv.h   |  14 ++-
 4 files changed, 203 insertions(+), 17 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog

[PATCH v3 4/4] zram: introduce zram memory tracking

2018-04-08 Thread Minchan Kim

zRAM as swap is useful for small-memory devices. However, because of the
VM's LRU algorithm, the pages that end up in zram are mostly cold pages.
In particular, once an application's init data has been touched during
launch, it tends not to be accessed again and is eventually swapped out.
zRAM can store such cold pages in compressed form, but there is little
point in keeping them in memory at all. A better approach is for app
developers to free them directly rather than leaving them on the heap.

This patch exposes the last access time and state of each zram block
via "cat /sys/kernel/debug/zram/zram0/block_state".

The output is as follows:
  300    75.033841 .wh
  301    63.806904 s..
  302    63.806919 ..h

The first column is zram's block index, the second is the time the block
was last accessed (in seconds, with microsecond resolution), and the third
shows the block's state (s: same page, w: page written to backing store,
h: huge page). So the example above means that block 300 was last accessed
at 75.033841 seconds and is huge, so it was written to the backing store.
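The access time is recorded with sched_clock(), which returns nanoseconds, while block_state prints seconds with six decimal places. The conversion could look like the following sketch; format_ac_time() is a hypothetical helper, since the patch's own formatting code is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a nanosecond timestamp as "<sec>.<usec>" with six decimals.
 * Returns what snprintf returns: the length of the formatted string. */
static int format_ac_time(uint64_t ns, char *buf, size_t len)
{
	uint64_t sec = ns / 1000000000ULL;
	uint64_t usec = (ns % 1000000000ULL) / 1000ULL;  /* drop sub-us part */

	assert(buf && len > 0);
	return snprintf(buf, len, "%llu.%06llu",
			(unsigned long long)sec, (unsigned long long)usec);
}
```

For example, 75033841000 ns formats as "75.033841", the timestamp of block 300 in the output above.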

An admin can combine this information with *pagemap* to catch a process's
cold or incompressible pages once part of its heap has been swapped out.

Cc: Greg KH 
Signed-off-by: Minchan Kim 
---
 Documentation/blockdev/zram.txt |  24 ++
 drivers/block/zram/Kconfig  |   9 +++
 drivers/block/zram/zram_drv.c   | 139 +---
 drivers/block/zram/zram_drv.h   |   5 ++
 4 files changed, 166 insertions(+), 11 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 78db38d02bc9..45509c7d5716 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
 User should set up backing device via /sys/block/zramX/backing_dev
 before disksize setting.
 
+= memory tracking
+
+With CONFIG_ZRAM_MEMORY_TRACKING, users can see information about each
+zram block. It can be useful for catching cold or incompressible
+pages of a process together with *pagemap*.
+If the feature is enabled, the block state can be read via
+"/sys/kernel/debug/zram/zram0/block_state". The output is as follows:
+
+      300    75.033841 .wh
+      301    63.806904 s..
+      302    63.806919 ..h
+
+First column is zram's block index.
+Second column is access time.
+Third column is state of the block.
+(s: same page
+w: written page to backing store
+h: huge page)
+
+The first line of the example above says that block 300 was accessed at
+75.033841 sec and that the block is huge, so it was written back to the
+backing storage. This is a debugging feature, so no one should rely on it
+to work properly.
+
 Nitin Gupta
 ngu...@vflare.org
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index ac3a31d433b2..efe60c82d8ec 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -26,3 +26,12 @@ config ZRAM_WRITEBACK
 /sys/block/zramX/backing_dev.
 
 See zram.txt for more infomration.
+
+config ZRAM_MEMORY_TRACKING
+   bool "Tracking zram block status"
+   depends on ZRAM
+   select DEBUG_FS
+   help
+ With this feature, an admin can track the state of allocated
+ blocks of zRAM. The information is available via
+ /sys/kernel/debug/zram/zramX/block_state.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7fc10e2ad734..80e461dc70bc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "zram_drv.h"
@@ -67,6 +68,13 @@ static inline bool init_done(struct zram *zram)
return zram->disksize;
 }
 
+static inline bool zram_allocated(struct zram *zram, u32 index)
+{
+
+   return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
+   zram->table[index].handle;
+}
+
 static inline struct zram *dev_to_zram(struct device *dev)
 {
return (struct zram *)dev_to_disk(dev)->private_data;
@@ -83,7 +91,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 }
 
 /* flag operations require table entry bit_spin_lock() being held */
-static int zram_test_flag(struct zram *zram, u32 index,
+static bool zram_test_flag(struct zram *zram, u32 index,
enum zram_pageflags flag)
 {
return zram->table[index].value & BIT(flag);
@@ -107,16 +115,6 @@ static inline void zram_set_element(struct zram *zram, u32 index,
zram->table[index].element = element;
 }
 
-static void zram_accessed(struct zram *zram, u32 index)
-{
-   zram->table[index].ac_time = sched_clock();
-}
-
-static void zram_reset_access(struct zram *zram, u32 index)
-{
-   zram->table[index].ac_time = 0;
-}
-
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
return zram->table[index].element;
@@ -620,6 +618,121 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
 static void zram_wb_clear(struct

[PATCH v3 3/4] zram: record accessed second

2018-04-08 Thread Minchan Kim

zRAM as swap is useful for small-memory devices. However, because of the
VM's LRU algorithm, the pages that end up in zram are mostly cold pages.
In particular, once an application's init data has been touched during
launch, it tends not to be accessed again and is eventually swapped out.
zRAM can store such cold pages in compressed form, but there is little
point in keeping them in memory at all. A better approach is for app
developers to free them directly rather than leaving them on the heap.

This patch records the last access time of each zram block so that,
with the upcoming zram memory tracking feature, userspace developers
can use it to reduce their memory footprint.
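In this patch the timestamp is refreshed under the slot lock after every read or write in zram_bvec_rw() and cleared in zram_free_page(), so a zero ac_time indicates a slot with no recorded access. The sketch below models only that bookkeeping, plus a hypothetical "cold block" count of the kind a memory tracker could derive from it; the sketch_* names and count_cold() are assumptions, locking is omitted, and the caller supplies the current time instead of calling sched_clock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slot entry: only the access time is modelled. */
struct ac_entry_sketch {
	uint64_t ac_time;   /* ns timestamp of last access, 0 = none */
};

/* Like zram_accessed(), but with an explicit clock value. */
static void sketch_accessed(struct ac_entry_sketch *e, uint64_t now_ns)
{
	e->ac_time = now_ns;
}

/* Like zram_reset_access(): called when the slot is freed. */
static void sketch_reset_access(struct ac_entry_sketch *e)
{
	e->ac_time = 0;
}

/* Count slots with a recorded access that is at least idle_ns old. */
static size_t count_cold(const struct ac_entry_sketch *tbl, size_t n,
			 uint64_t now_ns, uint64_t idle_ns)
{
	size_t cold = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].ac_time == 0)
			continue;   /* freed or never accessed */
		assert(tbl[i].ac_time <= now_ns);
		if (now_ns - tbl[i].ac_time >= idle_ns)
			cold++;
	}
	return cold;
}
```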

Signed-off-by: Minchan Kim 
---
 drivers/block/zram/zram_drv.c | 16 
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 777fb3339f59..7fc10e2ad734 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -107,6 +107,16 @@ static inline void zram_set_element(struct zram *zram, u32 index,
zram->table[index].element = element;
 }
 
+static void zram_accessed(struct zram *zram, u32 index)
+{
+   zram->table[index].ac_time = sched_clock();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+   zram->table[index].ac_time = 0;
+}
+
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
return zram->table[index].element;
@@ -806,6 +816,8 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
unsigned long handle;
 
+   zram_reset_access(zram, index);
+
if (zram_test_flag(zram, index, ZRAM_HUGE)) {
zram_clear_flag(zram, index, ZRAM_HUGE);
atomic64_dec(&zram->stats.huge_pages);
@@ -1177,6 +1189,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
generic_end_io_acct(q, rw_acct, &zram->disk->part0, start_time);
 
+   zram_slot_lock(zram, index);
+   zram_accessed(zram, index);
+   zram_slot_unlock(zram, index);
+
if (unlikely(ret < 0)) {
if (!is_write)
atomic64_inc(&zram->stats.failed_reads);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index ff0547bdb586..1075218e88b2 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -61,6 +61,7 @@ struct zram_table_entry {
unsigned long element;
};
unsigned long value;
+   u64 ac_time;
 };
 
 struct zram_stats {
-- 
2.17.0.484.g0c8726318c-goog

Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread Michael S. Tsirkin

On Mon, Apr 09, 2018 at 04:09:20AM +, haibinzhang(张海斌) wrote:
> 
> > On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> > > polling udp packets with small length(e.g. 1byte udp payload), because setting
> > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
> > > 
> > > Ping-Latencies shown below were tested between two Virtual Machines using
> > > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > > 
> > > Packet-Weight  Ping-Latencies(millisecond)
> > >                 min      avg      max
> > > Origin        3.319   18.489   57.303
> > > 64            1.643    2.021    2.552
> > > 128           1.825    2.600    3.224
> > > 256           1.997    2.710    4.295
> > > 512           1.860    3.171    4.631
> > > 1024          2.002    4.173    9.056
> > > 2048          2.257    5.650    9.688
> > > 4096          2.093    8.508   15.943
> >
> > And this is with Q size 256 right?
> 
> Yes. Ping-latencies with 512 VQ size show below.
> 
> Packet-Weight  Ping-Latencies(millisecond)
>                 min      avg      max
> Origin        6.357   29.177   66.245
> 64            2.798    3.614    4.403
> 128           2.861    3.820    4.775
> 256           3.008    4.018    4.807
> 512           3.254    4.523    5.824
> 1024          3.079    5.335    7.747
> 2048          3.944    8.201   12.762
> 4096          4.158   11.057   19.985
> 
> We will submit again. Is there anything else?

Seems pretty consistent, a small dip at 2 VQ sizes.


Acked-by: Michael S. Tsirkin 

> >
> > > Ring size is a hint from device about a burst size it can tolerate. Based on
> > > benchmarks, set the weight to 2 * vq size.
> > > 
> > > To evaluate this change, another tests were done using netperf(RR, TX) between
> > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> > > tweaked through qemu. Results shown below does not show obvious changes.
> >
> > What I asked for is ping-latency with different VQ sizes,
> > streaming below does not show anything.
> >
> > > vq size=256 TCP_RR                vq size=512 TCP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -7%/-2%                  1/   1/   0%/-2%
> > >    1/   4/  +1%/ 0%                  1/   4/  +1%/ 0%
> > >    1/   8/  +1%/-2%                  1/   8/   0%/+1%
> > >   64/   1/  -6%/ 0%                 64/   1/  +7%/+3%
> > >   64/   4/   0%/+2%                 64/   4/  -1%/+1%
> > >   64/   8/   0%/ 0%                 64/   8/  -1%/-2%
> > >  256/   1/  -3%/-4%                256/   1/  -4%/-2%
> > >  256/   4/  +3%/+4%                256/   4/  +1%/+2%
> > >  256/   8/  +2%/ 0%                256/   8/  +1%/-1%
> > > 
> > > vq size=256 UDP_RR                vq size=512 UDP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -5%/+1%                  1/   1/  -3%/-2%
> > >    1/   4/  +4%/+1%                  1/   4/  -2%/+2%
> > >    1/   8/  -1%/-1%                  1/   8/  -1%/ 0%
> > >   64/   1/  -2%/-3%                 64/   1/  +1%/+1%
> > >   64/   4/  -5%/-1%                 64/   4/  +2%/ 0%
> > >   64/   8/   0%/-1%                 64/   8/  -2%/+1%
> > >  256/   1/  +7%/+1%                256/   1/  -7%/ 0%
> > >  256/   4/  +1%/+1%                256/   4/  -3%/-4%
> > >  256/   8/  +2%/+2%                256/   8/  +1%/+1%
> > > 
> > > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >   64/   1/   0%/-3%                 64/   1/   0%/ 0%
> > >   64/   4/  +3%/-1%                 64/   4/  -2%/+4%
> > >   64/   8/  +9%/-4%                 64/   8/  -1%/+2%
> > >  256/   1/  +1%/-4%                256/   1/  +1%/+1%
> > >  256/   4/  -1%/-1%                256/   4/  -3%/ 0%
> > >  256/   8/  +7%/+5%                256/   8/  -3%/ 0%
> > >  512/   1/  +1%/ 0%                512/   1/  -1%/-1%
> > >  512/   4/  +1%/-1%                512/   4/   0%/ 0%
> > >  512/   8/  +7%/-5%                512/   8/  +6%/-1%
> > > 1024/   1/   0%/-1%               1024/   1/   0%/+1%
> > > 1024/   4/  +3%/ 0%               1024/   4/  +1%/ 0%
> > > 1024/   8/  +8%/+5%               1024/   8/  -1%/ 0%
> > > 2048/   1/  +2%/+2%               2048/   1/  -1%/ 0%
> > > 2048/   4/  +1%/ 0%               2048/   4/
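The proposed change bounds each handle_tx() run by a packet count as well as by bytes, so a stream of tiny packets can no longer keep the worker busy for long stretches while rx waits. The following simulation sketches that loop-exit logic; sketch_tx_round() and sketch_pkt_weight() are hypothetical, and the 0x80000 byte budget is assumed to match VHOST_NET_WEIGHT rather than taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte budget per tx round, standing in for VHOST_NET_WEIGHT. */
#define SKETCH_BYTE_WEIGHT 0x80000UL

/* Hypothetical packet weight as proposed: twice the virtqueue size. */
static size_t sketch_pkt_weight(size_t vq_num)
{
	return 2 * vq_num;
}

/* Simulate one tx polling round over an endless stream of pkt_len-byte
 * packets and return how many are sent before the loop yields.
 * With use_pkt_limit == 0 only the byte budget applies (old behaviour). */
static size_t sketch_tx_round(size_t pkt_len, size_t vq_num, int use_pkt_limit)
{
	size_t total_len = 0, sent_pkts = 0;

	assert(pkt_len > 0);
	for (;;) {
		total_len += pkt_len;   /* "send" one packet */
		sent_pkts++;
		if (total_len >= SKETCH_BYTE_WEIGHT)
			break;
		if (use_pkt_limit && sent_pkts >= sketch_pkt_weight(vq_num))
			break;
	}
	return sent_pkts;
}
```

With 1-byte packets and a 256-entry ring, the byte budget alone allows 524288 packets per round, whereas the packet weight stops the round after 512; for 1500-byte packets the byte budget is still the limit that triggers first.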

Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread Michael S. Tsirkin

On Mon, Apr 09, 2018 at 04:09:20AM +, haibinzhang(张海斌) wrote:
> 
> > On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx is
> > > busy polling small UDP packets (e.g. a 1-byte UDP payload), because
> > > VHOST_NET_WEIGHT takes into account only sent bytes, not the number of
> > > packets.
> > > 
> > > Ping-Latencies shown below were tested between two Virtual Machines using
> > > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > > 
> > > Packet-Weight  Ping-Latencies(millisecond)
> > >                   min     avg     max
> > > Origin          3.319  18.489  57.303
> > > 64              1.643   2.021   2.552
> > > 128             1.825   2.600   3.224
> > > 256             1.997   2.710   4.295
> > > 512             1.860   3.171   4.631
> > > 1024            2.002   4.173   9.056
> > > 2048            2.257   5.650   9.688
> > > 4096            2.093   8.508  15.943
> >
> > And this is with Q size 256 right?
> 
> Yes. Ping latencies with a VQ size of 512 are shown below.
> 
> Packet-Weight  Ping-Latencies(millisecond)
>                   min     avg     max
> Origin          6.357  29.177  66.245
> 64              2.798   3.614   4.403
> 128             2.861   3.820   4.775
> 256             3.008   4.018   4.807
> 512             3.254   4.523   5.824
> 1024            3.079   5.335   7.747
> 2048            3.944   8.201  12.762
> 4096            4.158  11.057  19.985
> 
> We will submit again. Is there anything else?

Seems pretty consistent, a small dip at 2 VQ sizes.


Acked-by: Michael S. Tsirkin 

> >
> > > Ring size is a hint from the device about the burst size it can tolerate.
> > > Based on benchmarks, set the weight to 2 * vq size.
> > > 
> > > To evaluate this change, further tests were done using netperf (RR, TX)
> > > between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and
> > > the vq size was tweaked through qemu. The results shown below do not show
> > > obvious changes.
> >
> > What I asked for is ping-latency with different VQ sizes,
> > streaming below does not show anything.
> >
> > > vq size=256 TCP_RR                vq size=512 TCP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -7%/-2%                  1/   1/   0%/-2%
> > >    1/   4/  +1%/ 0%                  1/   4/  +1%/ 0%
> > >    1/   8/  +1%/-2%                  1/   8/   0%/+1%
> > >   64/   1/  -6%/ 0%                 64/   1/  +7%/+3%
> > >   64/   4/   0%/+2%                 64/   4/  -1%/+1%
> > >   64/   8/   0%/ 0%                 64/   8/  -1%/-2%
> > >  256/   1/  -3%/-4%                256/   1/  -4%/-2%
> > >  256/   4/  +3%/+4%                256/   4/  +1%/+2%
> > >  256/   8/  +2%/ 0%                256/   8/  +1%/-1%
> > > 
> > > vq size=256 UDP_RR                vq size=512 UDP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -5%/+1%                  1/   1/  -3%/-2%
> > >    1/   4/  +4%/+1%                  1/   4/  -2%/+2%
> > >    1/   8/  -1%/-1%                  1/   8/  -1%/ 0%
> > >   64/   1/  -2%/-3%                 64/   1/  +1%/+1%
> > >   64/   4/  -5%/-1%                 64/   4/  +2%/ 0%
> > >   64/   8/   0%/-1%                 64/   8/  -2%/+1%
> > >  256/   1/  +7%/+1%                256/   1/  -7%/ 0%
> > >  256/   4/  +1%/+1%                256/   4/  -3%/-4%
> > >  256/   8/  +2%/+2%                256/   8/  +1%/+1%
> > > 
> > > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >   64/   1/   0%/-3%                 64/   1/   0%/ 0%
> > >   64/   4/  +3%/-1%                 64/   4/  -2%/+4%
> > >   64/   8/  +9%/-4%                 64/   8/  -1%/+2%
> > >  256/   1/  +1%/-4%                256/   1/  +1%/+1%
> > >  256/   4/  -1%/-1%                256/   4/  -3%/ 0%
> > >  256/   8/  +7%/+5%                256/   8/  -3%/ 0%
> > >  512/   1/  +1%/ 0%                512/   1/  -1%/-1%
> > >  512/   4/  +1%/-1%                512/   4/   0%/ 0%
> > >  512/   8/  +7%/-5%                512/   8/  +6%/-1%
> > > 1024/   1/   0%/-1%               1024/   1/   0%/+1%
> > > 1024/   4/  +3%/ 0%               1024/   4/  +1%/ 0%
> > > 1024/   8/  +8%/+5%               1024/   8/  -1%/ 0%
> > > 2048/   1/  +2%/+2%               2048/   1/  -1%/ 0%
> > > 2048/   4/  +1%/ 0%               2048/   4/   0%/-1%
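The budget logic discussed in this thread can be modeled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not the vhost-net code: TX_BYTE_BUDGET is assumed to equal the 0x80000 value of VHOST_NET_WEIGHT, and tx_pkt_budget(), tx_should_yield() and simulate_tx_run() are hypothetical helpers that end a tx run once either the byte budget or the proposed 2 * vq size packet budget is reached.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical byte budget per tx run; assumed to match the 0x80000
 * (512 KiB) value of VHOST_NET_WEIGHT. Illustrative only.
 */
#define TX_BYTE_BUDGET 0x80000UL

/* Packet budget proposed in the patch: twice the virtqueue size. */
static size_t tx_pkt_budget(size_t vq_num)
{
	return 2 * vq_num;
}

/* True once either budget is exhausted and the loop should yield to rx. */
static bool tx_should_yield(size_t total_bytes, size_t sent_pkts, size_t vq_num)
{
	return total_bytes >= TX_BYTE_BUDGET || sent_pkts >= tx_pkt_budget(vq_num);
}

/*
 * Simulate one tx polling run over `queued` packets of `pkt_len` bytes
 * each and return how many packets are sent before the loop yields.
 */
static size_t simulate_tx_run(size_t pkt_len, size_t queued, size_t vq_num)
{
	size_t total_bytes = 0, sent_pkts = 0;

	while (sent_pkts < queued) {
		total_bytes += pkt_len;
		sent_pkts++;
		if (tx_should_yield(total_bytes, sent_pkts, vq_num))
			break;
	}
	return sent_pkts;
}
```

In this model, simulate_tx_run(1, 1000000, 256) stops after 512 one-byte packets, while 1500-byte packets still hit the byte budget first, after 350 packets. With the byte budget alone, the same model would not yield until 0x80000 one-byte packets had been sent, which is the kind of rx starvation the patch description reports.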

Re: [PATCH] crypto: DRBG - guard uninstantion by lock

2018-04-08 Thread Stephan Mueller

On Monday, 9 April 2018, 00:46:03 CEST, Theodore Y. Ts'o wrote:

Hi Theodore,
> 
> So the syzbot will run while the patch goes through the normal e-mail
> review process, which is kind of neat.  :-)

Thank you very much for the hint. That is a neat feature indeed.

As I came late to the party and missed the original mails, I am wondering 
which Git repo and which branch were used. With that information, I would be 
happy to resubmit with the test line.

Ciao
Stephan

Re: [PATCH 4/4] x86: usercopy: reimplement arch_within_stack_frames with unwinder

2018-04-08 Thread Keun-O Park

Hi Kees,

On Thu, Apr 5, 2018 at 3:11 AM, Kees Cook  wrote:
> [resending with the CCs I forgot...]
>
> On Thu, Mar 1, 2018 at 2:19 AM,   wrote:
>> From: Sahara 
>>
>> The old arch_within_stack_frames which used the frame pointer is
>> now reimplemented to use frame pointer unwinder apis. So the main
>> functionality is same as before.
>>
>> Signed-off-by: Sahara 
>
> This will result in slightly more expensive stack checking for
> hardened usercopy, but I think that'd be okay if this could also be
> made to be unwinder-agnostic. Then it would work for ORC too, and
> wouldn't have to depend on just FRAME_POINTER. Without that, I'm not
> sure what the benefit is in changing this?

Exactly. Not depending on FRAME_POINTER alone is the only reason for the change.
And it would be even better if it worked for ORC too.

>
> Further notes below...
>
>> ---
>>  arch/x86/include/asm/unwind.h  |  5 +++
>>  arch/x86/kernel/stacktrace.c   | 77 +-
>>  arch/x86/kernel/unwind_frame.c |  4 +--
>>  3 files changed, 60 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
>> index 1f86e1b..6f04906f 100644
>> --- a/arch/x86/include/asm/unwind.h
>> +++ b/arch/x86/include/asm/unwind.h
>> @@ -87,6 +87,11 @@ void unwind_init(void);
>>  void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
>> void *orc, size_t orc_size);
>>  #else
>> +#ifdef CONFIG_UNWINDER_FRAME_POINTER
>> +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
>> +size_t regs_size(struct pt_regs *regs);
>> +#endif
>> +
>>  static inline void unwind_init(void) {}
>>  static inline
>>  void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
>> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
>> index f433a33..c26eb55 100644
>> --- a/arch/x86/kernel/stacktrace.c
>> +++ b/arch/x86/kernel/stacktrace.c
>> @@ -12,6 +12,37 @@
>>  #include 
>>
>>
>> +static inline void *get_cur_frame(struct unwind_state *state)
>> +{
>> +	void *frame = NULL;
>> +
>> +#if defined(CONFIG_UNWINDER_ORC)
>> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
>> +	if (state->regs)
>> +		frame = (void *)state->regs;
>> +	else
>> +		frame = (void *)state->bp;
>> +#else
>> +#endif
>> +	return frame;
>> +}
>
> What's going on here with the #if statement? Shouldn't this just be:
>
> +static inline void *get_cur_frame(struct unwind_state *state)
> +{
> +	void *frame = NULL;
> +
> +#ifdef CONFIG_UNWINDER_FRAME_POINTER
> +	if (state->regs)
> +		frame = (void *)state->regs;
> +	else
> +		frame = (void *)state->bp;
> +#endif
> +	return frame;
> +}
>
> ?

Removed the unused #ifdef.



>
>> +
>> +static inline void *get_frame_end(struct unwind_state *state)
>> +{
>> +	void *frame_end = NULL;
>> +
>> +#if defined(CONFIG_UNWINDER_ORC)
>> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
>> +	if (state->regs) {
>> +		frame_end = (void *)state->regs + regs_size(state->regs);
>> +	} else {
>> +		frame_end = (void *)state->bp + FRAME_HEADER_SIZE;
>> +	}
>> +#else
>> +#endif
>> +	return frame_end;
>> +}
>
> Same thing above?

Removed the unused #ifdef.

>
>> +
>>  /*
>>   * Walks up the stack frames to make sure that the specified object is
>>   * entirely contained by a single stack frame.
>> @@ -25,31 +56,31 @@ int arch_within_stack_frames(const void * const stack,
>>  const void * const stackend,
>>  const void *obj, unsigned long len)
>>  {
>> -#if defined(CONFIG_FRAME_POINTER)
>> -	const void *frame = NULL;
>> -	const void *oldframe;
>> -
>> -	oldframe = __builtin_frame_address(2);
>> -	if (oldframe)
>> -		frame = __builtin_frame_address(3);
>> +#if defined(CONFIG_UNWINDER_FRAME_POINTER)
>> +	struct unwind_state state;
>> +	void *prev_frame_end = NULL;
>> /*
>> -* low --> high
>> -* [saved bp][saved ip][args][local vars][saved bp][saved ip]
>> -* ^^
>> -*   allow copies only within here
>
> I think it's worth keeping this diagram: it explains what region is
> being checked...

Kept the comment in the v2 patch.


>
>> +* Skip 3 non-inlined frames: arch_within_stack_frames(),
>> +* check_stack_object() and __check_object_size().
>> +*
>>  */
>> -   while (stack <= frame && frame < stackend) {
>> -   /*
>> -* If obj + len extends past the last frame, this
>> -* check won't pass and the next frame will be 0,
>> -* causing us to bail out and correctly report
>> -* the copy as invalid.
>> -*/
>
> Also seems like we should keep the comment
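The region check that the diagram describes can be modeled in user-space C. This is a minimal sketch, assuming the saved frame-pointer addresses have already been collected into an array (the patch obtains them with the frame-pointer unwinder instead). within_one_frame(), bps[] and the enum are hypothetical, modeled on the kernel's GOOD_FRAME and BAD_STACK results; the sketch ignores the stack/stackend bounds and the pt_regs frames that get_frame_end() handles via regs_size().

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition as in the patch: saved bp + saved ip. */
#define FRAME_HEADER_SIZE (sizeof(long) * 2)

/* Hypothetical results, modeled on the kernel's GOOD_FRAME/BAD_STACK. */
enum frame_check { BAD_STACK = -1, GOOD_FRAME = 1 };

/*
 * bps[] holds saved frame-pointer addresses, innermost first, so they
 * increase (the x86 stack grows down). Between two consecutive frames
 * an object may only occupy
 *   [bps[i - 1] + FRAME_HEADER_SIZE, bps[i])
 * i.e. past the inner frame's saved bp/ip and before the next header.
 */
static enum frame_check within_one_frame(const uintptr_t *bps, size_t n,
					 uintptr_t obj, size_t len)
{
	for (size_t i = 1; i < n; i++) {
		uintptr_t prev_frame_end = bps[i - 1] + FRAME_HEADER_SIZE;
		uintptr_t frame = bps[i];

		/* The first frame header above the object's end decides. */
		if (obj + len <= frame)
			return obj >= prev_frame_end ? GOOD_FRAME : BAD_STACK;
	}
	/* Ran out of frames: the object extends past the last one. */
	return BAD_STACK;
}
```

For frames at 0x1000, 0x1100 and 0x1200, an object that starts at 0x1000 + FRAME_HEADER_SIZE and ends at or before 0x1100 is accepted, while one that starts at 0x1000 (overlapping the saved bp/ip) or straddles 0x1100 is rejected.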

Re: [PATCH V2 3/9] dt-bindings: Tegra186 tachometer device tree bindings

2018-04-08 Thread Mikko Perttunen


Rob,

this binding is for a specific IP block (for measuring/aggregating input 
pulses) on the Tegra186 SoC, so I don't think it fits into any generic 
binding.


Thanks,
Mikko

On 03/27/2018 05:52 PM, Rob Herring wrote:

On Wed, Mar 21, 2018 at 10:10:38AM +0530, Rajkumar Rampelli wrote:

Supply Device tree binding documentation for the NVIDIA
Tegra186 SoC's Tachometer Controller

Signed-off-by: Rajkumar Rampelli 
---

V2: Renamed compatible string to "nvidia,tegra186-pwm-tachometer"
 Renamed dt property values of clock-names and reset-names to "tachometer"
 from "tach"


Read my prior comments on v1.

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted

2018-04-08 Thread Dou Liyang


Hi RongQing,

Is there a local x2apic whose ID is 0x in your machine?

At 04/08/2018 07:38 PM, Li RongQing wrote:

local_apic_id of acpi_madt_local_x2apic is a u32, but it is converted to
int when checked by default_apic_id_valid(), which therefore returns true if
it is larger than 0x7fff. This is wrong.
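The sign problem described above can be shown with a user-space sketch modeled on the `apicid < 255` check in default_apic_id_valid(). The helpers apic_id_valid_int(), apic_id_valid_u32(), parse_old() and parse_new() are hypothetical. As Dou notes, x2apic systems actually go through x2apic_apic_id_valid(), so this only illustrates the int conversion itself, assuming a compiler such as GCC or Clang that wraps out-of-range conversions to int.

```c
#include <stdint.h>

/* Check as written before the patch: the ID arrives as a signed int. */
static int apic_id_valid_int(int apicid)
{
	return apicid < 255;
}

/* Check as written after the patch: the ID stays unsigned (u32). */
static int apic_id_valid_u32(uint32_t apicid)
{
	return apicid < 255;
}

/*
 * Mimic acpi_parse_x2apic() before the patch: the 32-bit local_apic_id
 * is stored in an int. For values above 0x7fffffff this conversion is
 * implementation-defined in C; GCC and Clang wrap to a negative number.
 */
static int parse_old(uint32_t local_apic_id)
{
	int apic_id = (int)local_apic_id;

	return apic_id_valid_int(apic_id);
}

/* After the patch: no conversion, so huge IDs fail the check. */
static int parse_new(uint32_t local_apic_id)
{
	return apic_id_valid_u32(local_apic_id);
}
```

With these definitions, parse_old(0xffffffff) returns 1 because the ID becomes -1, while parse_new(0xffffffff) returns 0.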



For x2apic enabled systems,

  - the byte length of the X2APIC ID is 4, so in theory it can be larger than 0x7fff

  - ->apic_id_valid points to x2apic_apic_id_valid(), which always returns true, not to default_apic_id_valid().

Thanks,
dou


and if local_apic_id is invalid, we should prevent it from being
accounted.

This fixes a bug where the Purley platform displays too many possible CPUs.

Signed-off-by: Li RongQing 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Dou Liyang 
---
  arch/x86/include/asm/apic.h          |  4 ++--
  arch/x86/kernel/acpi/boot.c          | 10 ++
  arch/x86/kernel/apic/apic_common.c   |  2 +-
  arch/x86/kernel/apic/apic_numachip.c |  2 +-
  arch/x86/kernel/apic/x2apic.h        |  2 +-
  arch/x86/kernel/apic/x2apic_phys.c   |  2 +-
  arch/x86/kernel/apic/x2apic_uv_x.c   |  2 +-
  arch/x86/xen/apic.c                  |  2 +-
  8 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 40a3d3642f3a..08acd954f00e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -313,7 +313,7 @@ struct apic {
/* Probe, setup and smpboot functions */
int (*probe)(void);
int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
-   int (*apic_id_valid)(int apicid);
+   int (*apic_id_valid)(u32 apicid);
int (*apic_id_registered)(void);
  
  	bool	(*check_apicid_used)(physid_mask_t *map, int apicid);
@@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void)
return apic->get_apic_id(reg);
  }
  
-extern int default_apic_id_valid(int apicid);
+extern int default_apic_id_valid(u32 apicid);
  extern int default_acpi_madt_oem_check(char *, char *);
  extern void default_setup_apic_routing(void);
  
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7a37d9357bc4..7412564dc2a7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
  {
struct acpi_madt_local_x2apic *processor = NULL;
  #ifdef CONFIG_X86_X2APIC
-   int apic_id;
+   u32 apic_id;
u8 enabled;
  #endif
  
@@ -222,10 +222,12 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 * to not preallocating memory for all NR_CPUS
 * when we use CPU hotplug.
 */
-   if (!apic->apic_id_valid(apic_id) && enabled)
+   if (!apic->apic_id_valid(apic_id)) {
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
-   else
-   acpi_register_lapic(apic_id, processor->uid, enabled);
+   return 0;
+   }
+
+   acpi_register_lapic(apic_id, processor->uid, enabled);
  #else
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
  #endif
diff --git a/arch/x86/kernel/apic/apic_common.c b/arch/x86/kernel/apic/apic_common.c
index a360801779ae..02b4839478b1 100644
--- a/arch/x86/kernel/apic/apic_common.c
+++ b/arch/x86/kernel/apic/apic_common.c
@@ -40,7 +40,7 @@ int default_check_phys_apicid_present(int phys_apicid)
return physid_isset(phys_apicid, phys_cpu_present_map);
  }
  
-int default_apic_id_valid(int apicid)
+int default_apic_id_valid(u32 apicid)
  {
return (apicid < 255);
  }
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 134e04506ab4..78778b54f904 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigned int id)
return id << 24;
  }
  
-static int numachip_apic_id_valid(int apicid)
+static int numachip_apic_id_valid(u32 apicid)
  {
/* Trust what bootloader passes in MADT */
return 1;
diff --git a/arch/x86/kernel/apic/x2apic.h b/arch/x86/kernel/apic/x2apic.h
index b107de381cb5..a49b3604027f 100644
--- a/arch/x86/kernel/apic/x2apic.h
+++ b/arch/x86/kernel/apic/x2apic.h
@@ -1,6 +1,6 @@
  /* Common bits for X2APIC cluster/physical modes. */
  
-int x2apic_apic_id_valid(int apicid);
+int x2apic_apic_id_valid(u32 apicid);
  int x2apic_apic_id_registered(void);
  void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
  unsigned int x2apic_get_apic_id(unsigned long id);
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e2829bf40e4a..b5cf9e7b3830 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -101,7 +101,7 @@ static int x2apic_phys_probe(void)
  }
  
  /* Common x2apic functions, also used

[lkp-robot] [init, tracing] 2580d6b795: BUG:kernel_reboot-without-warning_in_boot_stage

2018-04-08 Thread kernel test robot


FYI, we noticed the following commit (built with gcc-7):

commit: 2580d6b795e25879c825a0891cf67390f665b11f ("init, tracing: Have printk come through the trace events for initcall_debug")
url: https://github.com/0day-ci/linux/commits/Steven-Rostedt/init-tracing/20180407-130743


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu Nehalem -smp 2 -m 512M

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


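+------------------------------------------------------------------+------------+------------+
|                                                                  | ecf6709d07 | 2580d6b795 |
+------------------------------------------------------------------+------------+------------+
| boot_successes                                                   | 0          | 0          |
| boot_failures                                                    | 8          | 8          |
| invoked_oom-killer:gfp_mask=0x                                   | 8          |            |
| Mem-Info                                                         | 8          |            |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 8          |            |
| BUG:kernel_reboot-without-warning_in_boot_stage                  | 0          | 8          |
+------------------------------------------------------------------+------------+------------+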



[0.00] RAMDISK: [mem 0x1b7e2000-0x1ffc]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F6860 14 (v00 BOCHS )
[0.00] ACPI: RSDT 0x1FFE1628 30 (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 0x1FFE147C 74 (v01 BOCHS  BXPCFACP 0001 BXPC 0001)
BUG: kernel reboot-without-warning in boot stage

Elapsed time: 10

#!/bin/bash



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script  # job-script is attached in this email



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.16.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_KASAN_SHADOW_OFFSET=0xdc00
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
CONFIG_KERNEL_LZO=y
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not

Re: [GIT PULL] SELinux patches for v4.17

2018-04-08 Thread Xin Long

On Mon, Apr 9, 2018 at 6:44 AM, Richard Haines wrote:
> On Sun, 2018-04-08 at 19:59 +0100, Richard Haines via Selinux wrote:
>> On Mon, 2018-04-09 at 01:43 +0800, Xin Long wrote:
>> > On Sun, Apr 8, 2018 at 10:09 PM, Richard Haines wrote:
>> > > On Sun, 2018-04-08 at 08:50 -0400, Paul Moore wrote:
>> > > > On April 7, 2018 1:03:57 PM Linus Torvalds wrote:
>> > > > On Sat, Apr 7, 2018 at 9:54 AM, Richard Haines wrote:
>> > > >
>> > > > So please check my resolution, but also somebody should tell me
>> > > > "Linus, you're a cretin, sctp_connect() doesn't want that
>> > > > security_sctp_bind_connect() at all because it was already done
>> > > > by XYZ"
>> > > >
>> > > > sctp_connect() or __sctp_connect() do not need to call
>> > > > security_sctp_bind_connect(). This is because the connect(2) call
>> > > > will handle the checks required via security_socket_connect():
>> > > >
>> > > > Ok, thanks, that's exactly what I wanted to get.
>> > > >
>> > > > Anyway, somebody should still verify that it all looks good in my
>> > > > tree, but I don't actually expect the merge to have had any issues
>> > > > even if the refactoring made it a bit more complex than most merges
>> > > > are.
>> > > >
>> > > > Thanks for the quick response Richard.
>> > > >
>> > > > Xin Long looked it over and gave it the thumbs up, I'll take a look
>> > > > too, but to be honest I trust his SCTP understanding much more than
>> > > > mine.  I also do weekly tests of each rcX release at a minimum so if
>> > > > something odd pops up I'll make sure you get a fix.
>> > > >
>> > > > Thanks again everyone.
>> > >
>> > > I built the kernel this morning and sorry to spoil the party, but I've
>> > > run into a problem with lksctp-tools when running the func_tests:
>> > >
>> > > make v6test
>> > > ..
>> > > ..
>> > > ./test_timetolive_v6
>> > > test_timetolive.c  0 INFO : Creating fillmsg of size 3087
>> > > test_timetolive.c  1 PASS : Send a message with timeout
>> > > test_timetolive.c  2 PASS : Send a message with no timeout
>> > > test_timetolive.c  3 PASS : Send a fragmented message with timeout
>> > > test_timetolive.c  0 INFO :  **  SLEEPING for 3 seconds **
>> > > test_timetolive.c  4 BROK : Got a datamsg of unexpected length:23, expected length:27
>> > > DUMP_CORE sctputil.c: 247
>> > > /bin/sh: line 1: 30981 Segmentation fault  (core dumped) ./$a
>> > > test_timetolive_v6 fails
>> > >
>> > > make v4 test fails the same way. I'm using lksctp-tools from [1]. I
>> > > have not investigated the cause yet as I just found this and thought I
>> > > should flag it first just in case someone has the answer !!!
>> >
>> > test_timetolive(_v6) works for me. In lksctp-tools/src/func_tests, I
>> > had another case fail, ./test_1_to_1_events; it's caused by:
>> > commit 30f6ebf65bc46161c5aaff1db2e6e7c76aa4a06b
>> > Author: Xin Long 
>> > Date:   Wed Mar 14 19:05:34 2018 +0800
>> >
>> > sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT
>> >
>> > It's not a kernel issue; after that commit, ./test_1_to_1_events
>> > should have been updated. Alternatively, avoid it with 'sysctl -w
>> > net.sctp.auth_enable=1'.
>> >
>> > I'm not sure why test_timetolive(_v6) is not working in your env.
>>
>> It appears to depend on the run sequence of the tests. I rebooted the
>> system and ran test_timetolive_v6; it worked okay.
>> I ran "sctp-tests run" in one terminal, then ran test_timetolive_v6 at
>> various intervals in another terminal. Once sctp-tests started the
>> "=== ndatasched ===" sequence, test_timetolive_v6 failed.
>
> 1) When SCTP is initialised /proc/sys/net/sctp/prsctp_enable = 1
> 2) When sctp-tests/testcase/regression/extoverflow/test.sh is executed,
> it sets prsctp_enable = 0 on exit. This seems to be causing the issue
> I'm seeing. I can now reproduce the problem:
>
> Running from fresh boot:
> checksctp
> cat /proc/sys/net/sctp/prsctp_enable
> 1
> ./test_timetolive_v6
> passes
> echo 0 > /proc/sys/net/sctp/prsctp_enable
> ./test_timetolive_v6
> fails
> echo 1 > /proc/sys/net/sctp/prsctp_enable
> ./test_timetolive_v6
> passes
I see ...

commit 8ae808eb853e3789b81b8a502cdf22bb01b76880
Author: Xin Long 
Date:   Sat Oct 8 11:40:16 2016 +0800

sctp: remove the old ttl expires policy

Since this commit, TTL expiry is treated as one of the PR-SCTP policies,
so prsctp_enable is required. I will look into updating this test case
in lksctp-tools.

Thanks for the reproducer.

Re: [PATCH v2 4/4] clk: qcom: Add Global Clock controller (GCC) driver for SDM845

2018-04-08 Thread Amit Nischal

On 2018-04-06 04:27, Stephen Boyd wrote:

Quoting Amit Nischal (2018-04-03 05:24:41)

On 2018-03-20 06:12, Stephen Boyd wrote:
> Quoting Amit Nischal (2018-03-07 23:18:15)
>> +};
>> +
>> +static struct clk_rcg2 gcc_sdcc4_apps_clk_src = {
>> +   .cmd_rcgr = 0x1600c,
>> +   .mnd_width = 8,
>> +   .hid_width = 5,
>> +   .parent_map = gcc_parent_map_0,
>> +   .freq_tbl = ftbl_gcc_sdcc4_apps_clk_src,
>> +   .safe_src_freq_tbl = _safe_src_f,
>
> Why does sdcc have safe src stuff? Is something turning on the sdcc clk
> outside of our control?

I will get more details on this and will get back.

Any news?

I am removing the safe src for SDCC, but I am still trying to get details
from the teams on why it was added. If it turns out to be required, I will
add the safe src index back and resubmit the patch.

>
>> +   .clkr.hw.init = &(struct clk_init_data){
>> +   .name = "gcc_sdcc4_apps_clk_src",
>> +   .parent_names = gcc_parent_names_0,
>> +   .num_parents = 4,
>> +   .flags = CLK_SET_RATE_PARENT,
>> +   .ops = _rcg2_shared_ops,
>> +   },
>> +};
>> +
> [...]
>> +
>> +static struct clk_branch gcc_video_xo_clk = {
>> +   .halt_reg = 0xb028,
>> +   .halt_check = BRANCH_HALT,
>> +   .clkr = {
>> +   .enable_reg = 0xb028,
>> +   .enable_mask = BIT(0),
>> +   .hw.init = &(struct clk_init_data){
>> +   .name = "gcc_video_xo_clk",
>> +   .flags = CLK_IS_CRITICAL,
>> +   .ops = _branch2_ops,
>> +
>
> These things have no parents and we mark them critical. Why are we
> even exposing them to the kernel? Are they not on by default? Are we
> going to change these to non-critical at some point in the future?

These clocks are not enabled by default and feed the video and other
multimedia cores, so we mark them as critical and need to expose them
to the kernel. As of now, there is no plan to change them to
non-critical.

Ok. Can we open code enabling these branches in the driver probe then?
Still seems wasteful if nobody uses these.

Put another way, either a driver (or other clk controller) should be
toggling these gates at runtime or we should enable them once and leave
them out of the framework. If the driver approach is taken, then the
drivers should be able to turn the clks on and off to save some power.

As of now, no client driver takes care of toggling these gates at
runtime. We want these clocks to be always on, which is why we marked
them as CRITICAL: if any user tries to unprepare/disable them, the
request is ignored and the framework generates a warning.
Once the client drivers take care of this, we will submit
a cleanup patch.

--
To unsubscribe from this list: send the line "unsubscribe linux-clk" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time

2018-04-08 Thread Tetsuo Handa

Sargun Dhillon wrote:
> >   Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and
> >   the exception in randomize_layout_plugin.c because preventing module
> >   unloading won't work as expected.
> >
> 
> Rather than completely removing the unloading code, might it make
> sense to add a BUG_ON or WARN_ON, in security_delete_hooks if
> allow_unload_module is false, and owner is not NULL?

Do we need to check ->owner != NULL? Although it is true that
SELinux's ->owner == NULL and an LKM-based LSM's ->owner != NULL,
I think we unregister SELinux before setting allow_unload_module to false.
Thus, rejecting delete_security_hooks() if allow_unload_module == false will
be sufficient. SELinux might want to call panic() if delete_security_hooks()
did not unregister due to allow_unload_module == false. Also,
allow_unload_module would be renamed to allow_unregister_module.

By the way, please don't use BUG_ON() or WARN_ON(): syzbot would hit them
and call panic() because syzbot runs tests with panic_on_warn == true.

Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample

2018-04-08 Thread Alexey Budankov

On 07.04.2018 9:18, Alexey Budankov wrote:
> On 06.04.2018 22:53, Andi Kleen wrote:
>> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote:
>>> On 06.04.2018 18:31, Andi Kleen wrote:
> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
> index e47b2dbbdef3..9284048cf5b0 100644
> --- a/arch/x86/kernel/perf_regs.c
> +++ b/arch/x86/kernel/perf_regs.c
> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>*/
>   regs_user_copy->bx = -1;
>   regs_user_copy->bp = -1;
> + if (user_64bit_mode(user_regs)) {

 Why is it 64bit only? Should work on 32bit too.
>>>
>>> bp register is a part of i386 syscall ABI 
>>> (http://man7.org/linux/man-pages/man2/syscall.2.html) 
>>> so not sure if it will make any sense for 32bit processes. 
>>
>> Both 32bit and 64bit use the same frame pointer, if they
>> use frame pointer.
> 
> Well let me check the same scenario for 32bit binary.

Here is what I get when profiling a 32-bit process on the patched 64-bit
kernel without 32-bit frame-pointer exposure:

vmlinux ! try_to_wake_up - [unknown source file]
vmlinux ! wake_up_q + 0x3e - [unknown source file]
vmlinux ! futex_wake + 0x141 - [unknown source file]
vmlinux ! do_futex + 0x49b - [unknown source file]
vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
==> [vdso] ! __kernel_vsyscall + 0x8 - [unknown source file]
==> libc-2.26.so ! syscall + 0x26 - [unknown source file]
==> futex32-fp ! main + 0xba - [unknown source file]
==> libc-2.26.so ! __libc_start_main + 0xf2 - [unknown source file]

so the stack is unwound all the way to the top. However, if I enable 32-bit
exposure, the stack looks like this:

vmlinux ! try_to_wake_up - [unknown source file]
vmlinux ! wake_up_q + 0x3e - [unknown source file]
vmlinux ! futex_wake + 0x141 - [unknown source file]
vmlinux ! do_futex + 0x49b - [unknown source file]
vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
==> [vdso] ! [vdso] + 0x1058 - [unknown source file]
==> vmlinux ! [Skipped stack frame(s)] + 0x1 - [unknown source file]

and x86_64 perf report --stdio shows this:

...
unwind: target platform=x86 is not supported
...
# Samples: 140K of event 'cycles'
# Event count (approx.): 93688193797
#
# Children      Self  Command     Shared Object     Symbol
# ........  ........  ..........  ................  ..............................
#
    86.00%    14.40%  futex32-fp  [kernel.vmlinux]  [k] entry_SYSENTER_compat
            |
            ---entry_SYSENTER_compat
               |
                --71.60%--do_fast_syscall_32
                          |
                          |--54.62%--compat_sys_futex
                          |          |
                          |           --53.67%--do_futex

I am not sure it is worth exposing the frame pointer for 32-bit processes too.

-Alexey

> If the issue exists there too and is fixed by exposing bp,
> then the improvement is obviously worth it.
> 
> -Alexey
> 
>>
>> -Andi
>>
> 
>

Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo

2018-04-08 Thread David Wang

> -----Original Message-----
> From: David Wang [mailto:davidw...@zhaoxin.com]
> Sent: April 8, 2018 17:36
> To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
> mi...@kernel.org; gre...@linuxfoundation.org; x...@kernel.org;
> linux-kernel@vger.kernel.org
> Cc: brucech...@via-alliance.com; cooper...@zhaoxin.com;
> qiyuanw...@zhaoxin.com; benjamin...@viatech.com; luke...@viacpu.com;
> tim...@zhaoxin.com; David Wang
> Subject: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo
> 
> We add this patch to show the correct HW features (arch_perfmon, tpr_shadow,
> vnmi, flexpriority, ept and vpid) when the user executes "cat /proc/cpuinfo".
> 
> Signed-off-by: David Wang 
> ---
>  arch/x86/kernel/cpu/centaur.c | 49 +++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index e5ec0f1..969fb8f 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -112,6 +112,44 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>   }
>  }
> 
> +static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c)
> +{
> +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x0020
> +#define X86_VMX_FEATURE_PROC_CTLS_VNMI   0x0040
> +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS   0x8000
> +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x0001
> +#define X86_VMX_FEATURE_PROC_CTLS2_EPT   0x0002
> +#define X86_VMX_FEATURE_PROC_CTLS2_VPID  0x0020
> +
> + u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> +
> + clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> + clear_cpu_cap(c, X86_FEATURE_VNMI);
> + clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> + clear_cpu_cap(c, X86_FEATURE_EPT);
> + clear_cpu_cap(c, X86_FEATURE_VPID);
> +
> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> + msr_ctl = vmx_msr_high | vmx_msr_low;
> +
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> + set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> + set_cpu_cap(c, X86_FEATURE_VNMI);
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> +   vmx_msr_low, vmx_msr_high);
> + msr_ctl2 = vmx_msr_high | vmx_msr_low;
> + if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> + (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> + set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
> + set_cpu_cap(c, X86_FEATURE_EPT);
> + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> + set_cpu_cap(c, X86_FEATURE_VPID);
> + }
> +}
> +
>  static void init_centaur(struct cpuinfo_x86 *c)
>  {
>  #ifdef CONFIG_X86_32
> @@ -128,6 +166,14 @@ static void init_centaur(struct cpuinfo_x86 *c)
>   clear_cpu_cap(c, 0*32+31);
>  #endif
>   early_init_centaur(c);
> +
> + if (c->cpuid_level > 9) {
> + unsigned eax = cpuid_eax(10);
> + /* Check for version and the number of counters */
> + if ((eax & 0xff) && (((eax >> 8) & 0xff) > 1))
> + set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
> + }
> +
>   switch (c->x86) {
>  #ifdef CONFIG_X86_32
>   case 5:
> @@ -199,6 +245,9 @@ static void init_centaur(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_64
>   set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
>  #endif
> +
> + if (cpu_has(c, X86_FEATURE_VMX))
> + centaur_detect_vmx_virtcap(c);
>  }
> 
>  #ifdef CONFIG_X86_32
> --
> 1.9.1

Sorry, I sent this to the wrong email address.
---
David

Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo

2018-04-08 Thread David Wang

> -邮件原件-
> 发件人: David Wang [mailto:davidw...@zhaoxin.com]
> 发送时间: 2018年4月8日 17:36
> 收件人: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
> mi...@kernel.org; gre...@linuxfoundation.org; x...@kernel.org;
> linux-kernel@vger.kernel.org
> 抄送: brucech...@via-alliance.com; cooper...@zhaoxin.com;
> qiyuanw...@zhaoxin.com; benjamin...@viatech.com; luke...@viacpu.com;
> tim...@zhaoxin.com; David Wang 
> 主题: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo
> 
> We add this patch to show correct HW features(arch_perfmon, tpr_shadow,
> vnmi, flexpriority, ept and vpid) when user execute "cat /proc/cpuinfo".
> 
> Signed-off-by: David Wang 
> ---
>  arch/x86/kernel/cpu/centaur.c | 49
> +++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index e5ec0f1..969fb8f 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -112,6 +112,44 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>   }
>  }
> 
> +static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c) {
> +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x0020
> +#define X86_VMX_FEATURE_PROC_CTLS_VNMI   0x0040
> +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS   0x8000
> +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x0001
> +#define X86_VMX_FEATURE_PROC_CTLS2_EPT   0x0002
> +#define X86_VMX_FEATURE_PROC_CTLS2_VPID  0x0020
> +
> + u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> +
> + clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> + clear_cpu_cap(c, X86_FEATURE_VNMI);
> + clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> + clear_cpu_cap(c, X86_FEATURE_EPT);
> + clear_cpu_cap(c, X86_FEATURE_VPID);
> +
> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> + msr_ctl = vmx_msr_high | vmx_msr_low;
> +
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> + set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> + set_cpu_cap(c, X86_FEATURE_VNMI);
> + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> +   vmx_msr_low, vmx_msr_high);
> + msr_ctl2 = vmx_msr_high | vmx_msr_low;
> + if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> + (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> + set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
> + set_cpu_cap(c, X86_FEATURE_EPT);
> + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> + set_cpu_cap(c, X86_FEATURE_VPID);
> + }
> +}
> +
>  static void init_centaur(struct cpuinfo_x86 *c)  {  #ifdef CONFIG_X86_32
> @@ -128,6 +166,14 @@ static void init_centaur(struct cpuinfo_x86 *c)
>   clear_cpu_cap(c, 0*32+31);
>  #endif
>   early_init_centaur(c);
> +
> + if (c->cpuid_level > 9) {
> + unsigned eax = cpuid_eax(10);
> + /* Check for version and the number of counters */
> + if ((eax & 0xff) && (((eax >> 8) & 0xff) > 1))
> + set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
> + }
> +
>   switch (c->x86) {
>  #ifdef CONFIG_X86_32
>   case 5:
> @@ -199,6 +245,9 @@ static void init_centaur(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_64
>   set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
>  #endif
> +
> + if (cpu_has(c, X86_FEATURE_VMX))
> + centaur_detect_vmx_virtcap(c);
>  }
> 
>  #ifdef CONFIG_X86_32
> --
> 1.9.1

Sorry for sending this to the wrong email address.
---
David
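The two probes added by this patch (VMX capability bits and architectural perfmon) can be exercised outside the kernel. The sketch below is a user-space approximation under stated assumptions: the MSR halves and the CPUID leaf 0xA `EAX` value are passed in as plain integers instead of coming from `rdmsr()`/`cpuid_eax()`, and the hypothetical `struct vmx_caps` replaces the `set_cpu_cap()` calls. Bit values are the full 32-bit `X86_VMX_FEATURE_*` values that Intel's `detect_vmx_virtcap()` uses; as in the patch, a control counts as available if its bit is set in either half of the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMX control bits; full 32-bit values as used by Intel's detect_vmx_virtcap(). */
#define PROC_CTLS_TPR_SHADOW  0x00200000u
#define PROC_CTLS_VNMI        0x00400000u
#define PROC_CTLS_2ND_CTLS    0x80000000u
#define PROC_CTLS2_VIRT_APIC  0x00000001u
#define PROC_CTLS2_EPT        0x00000002u
#define PROC_CTLS2_VPID       0x00000020u

/* Hypothetical result type replacing the set_cpu_cap()/clear_cpu_cap() calls. */
struct vmx_caps {
	bool tpr_shadow, vnmi, flexpriority, ept, vpid;
};

/*
 * lo/hi are the two halves of IA32_VMX_PROCBASED_CTLS, lo2/hi2 those of
 * IA32_VMX_PROCBASED_CTLS2.  Like the patch, OR the halves together and only
 * look at the secondary controls if the 2ND_CTLS bit is present.
 */
static struct vmx_caps decode_vmx_caps(uint32_t lo, uint32_t hi,
				       uint32_t lo2, uint32_t hi2)
{
	struct vmx_caps caps = { 0 };	/* everything starts cleared */
	uint32_t ctl = hi | lo;

	caps.tpr_shadow = ctl & PROC_CTLS_TPR_SHADOW;
	caps.vnmi = ctl & PROC_CTLS_VNMI;
	if (ctl & PROC_CTLS_2ND_CTLS) {
		uint32_t ctl2 = hi2 | lo2;

		/* FlexPriority needs both APIC-access virtualization and TPR shadow. */
		caps.flexpriority = (ctl2 & PROC_CTLS2_VIRT_APIC) && caps.tpr_shadow;
		caps.ept = ctl2 & PROC_CTLS2_EPT;
		caps.vpid = ctl2 & PROC_CTLS2_VPID;
	}
	return caps;
}

/* CPUID leaf 0xA EAX: bits 7:0 = perfmon version, bits 15:8 = GP counter count. */
static bool has_arch_perfmon(uint32_t max_basic_leaf, uint32_t leaf_a_eax)
{
	if (max_basic_leaf <= 9)	/* leaf 0xA not implemented */
		return false;
	return (leaf_a_eax & 0xff) && ((leaf_a_eax >> 8) & 0xff) > 1;
}

static void vmx_caps_examples(void)
{
	/* Every control allowed: all five features reported. */
	struct vmx_caps all = decode_vmx_caps(0, 0xffffffffu, 0, 0xffffffffu);
	assert(all.tpr_shadow && all.vnmi && all.flexpriority && all.ept && all.vpid);

	/* No 2ND_CTLS bit: the secondary MSR is ignored entirely. */
	struct vmx_caps basic = decode_vmx_caps(0, PROC_CTLS_TPR_SHADOW, 0, 0xffffffffu);
	assert(basic.tpr_shadow && !basic.vnmi && !basic.flexpriority && !basic.ept);

	assert(has_arch_perfmon(0xd, 0x0403));	/* version 3, 4 counters */
	assert(!has_arch_perfmon(0x9, 0x0403));	/* leaf 0xA absent */
	assert(!has_arch_perfmon(0xd, 0x0103));	/* only one counter */
}
```

Note that FlexPriority is only reported when TPR shadow is also available, which mirrors the combined condition in the patch.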

Re: [PATCH] thermal: devfreq_cooling: add const to struct thermal_cooling_device_ops

2018-04-08 Thread kbuild test robot

Hi srp,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on thermal/next]
[also build test ERROR on v4.16 next-20180406]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/srplinux2008/thermal-devfreq_cooling-add-const-to-struct-thermal_cooling_device_ops/20180409-105457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next
config: x86_64-randconfig-x010-201814 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers//thermal/devfreq_cooling.c: In function 'of_devfreq_cooling_register_power':
>> drivers//thermal/devfreq_cooling.c:522:43: error: assignment of member 'get_requested_power' in read-only object
  devfreq_cooling_ops.get_requested_power =
  ^
>> drivers//thermal/devfreq_cooling.c:524:35: error: assignment of member 'state2power' in read-only object
  devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
  ^
>> drivers//thermal/devfreq_cooling.c:525:35: error: assignment of member 'power2state' in read-only object
  devfreq_cooling_ops.power2state = devfreq_cooling_power2state;
  ^
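All three errors share one cause: the patch makes `devfreq_cooling_ops` `const`, but `of_devfreq_cooling_register_power()` still fills in the power callbacks at run time. The standalone sketch below shows one common way around that, keeping two `const` tables and choosing between them instead of writing into one. The struct, the callbacks and the power numbers are hypothetical simplifications, not the real `struct thermal_cooling_device_ops` and not necessarily the fix that was applied to devfreq_cooling.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct thermal_cooling_device_ops. */
struct cooling_ops {
	int (*get_max_state)(void);
	int (*state2power)(unsigned long state);	/* optional power extension */
};

static int demo_get_max_state(void)
{
	return 3;
}

static int demo_state2power(unsigned long state)
{
	return 1000 - 250 * (int)state;	/* made-up power model, in mW */
}

/* Base table: no power callbacks. */
static const struct cooling_ops demo_ops = {
	.get_max_state = demo_get_max_state,
};

/* Second table with the power callbacks filled in at build time. */
static const struct cooling_ops demo_power_ops = {
	.get_max_state = demo_get_max_state,
	.state2power   = demo_state2power,
};

/*
 * Pick a table rather than patching one at run time.  Writing
 *     demo_ops.state2power = demo_state2power;
 * would fail with the same "assignment of member ... in read-only object"
 * error that the robot reports.
 */
static const struct cooling_ops *pick_ops(int has_power)
{
	return has_power ? &demo_power_ops : &demo_ops;
}

static void cooling_ops_example(void)
{
	const struct cooling_ops *plain = pick_ops(0);
	const struct cooling_ops *power = pick_ops(1);

	assert(plain->state2power == NULL);
	assert(power->state2power(2) == 500);
	assert(plain->get_max_state() == power->get_max_state());
}
```

Both tables can live in read-only memory, which is the point of constifying them; the cost is duplicating the shared callbacks in the second initializer.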

vim +/get_requested_power +522 drivers//thermal/devfreq_cooling.c

a76caf55 Ørjan Eide 2015-09-10  488  
a76caf55 Ørjan Eide 2015-09-10  489  /**
a76caf55 Ørjan Eide 2015-09-10  490   * of_devfreq_cooling_register_power() - Register devfreq cooling device,
a76caf55 Ørjan Eide 2015-09-10  491   *  with OF and power information.
a76caf55 Ørjan Eide 2015-09-10  492   * @np:	Pointer to OF device_node.
a76caf55 Ørjan Eide 2015-09-10  493   * @df:	Pointer to devfreq device.
a76caf55 Ørjan Eide 2015-09-10  494   * @dfc_power: Pointer to devfreq_cooling_power.
a76caf55 Ørjan Eide 2015-09-10  495   *
a76caf55 Ørjan Eide 2015-09-10  496   * Register a devfreq cooling device.  The available OPPs must be
a76caf55 Ørjan Eide 2015-09-10  497   * registered on the device.
a76caf55 Ørjan Eide 2015-09-10  498   *
a76caf55 Ørjan Eide 2015-09-10  499   * If @dfc_power is provided, the cooling device is registered with the
a76caf55 Ørjan Eide 2015-09-10  500   * power extensions.  For the power extensions to work correctly,
a76caf55 Ørjan Eide 2015-09-10  501   * devfreq should use the simple_ondemand governor, other governors
a76caf55 Ørjan Eide 2015-09-10  502   * are not currently supported.
a76caf55 Ørjan Eide 2015-09-10  503   */
3c99c2ce Javi Merino 2015-11-02  504  struct thermal_cooling_device *
a76caf55 Ørjan Eide 2015-09-10  505  of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
a76caf55 Ørjan Eide 2015-09-10  506  				  struct devfreq_cooling_power *dfc_power)
a76caf55 Ørjan Eide 2015-09-10  507  {
a76caf55 Ørjan Eide 2015-09-10  508  	struct thermal_cooling_device *cdev;
a76caf55 Ørjan Eide 2015-09-10  509  	struct devfreq_cooling_device *dfc;
a76caf55 Ørjan Eide 2015-09-10  510  	char dev_name[THERMAL_NAME_LENGTH];
a76caf55 Ørjan Eide 2015-09-10  511  	int err;
a76caf55 Ørjan Eide 2015-09-10  512  
a76caf55 Ørjan Eide 2015-09-10  513  	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL);
a76caf55 Ørjan Eide 2015-09-10  514  	if (!dfc)
a76caf55 Ørjan Eide 2015-09-10  515  		return ERR_PTR(-ENOMEM);
a76caf55 Ørjan Eide 2015-09-10  516  
a76caf55 Ørjan Eide 2015-09-10  517  	dfc->devfreq = df;
a76caf55 Ørjan Eide 2015-09-10  518  
a76caf55 Ørjan Eide 2015-09-10  519  	if (dfc_power) {
a76caf55 Ørjan Eide 2015-09-10  520  		dfc->power_ops = dfc_power;
a76caf55 Ørjan Eide 2015-09-10  521  
a76caf55 Ørjan Eide 2015-09-10 @522  		devfreq_cooling_ops.get_requested_power =
a76caf55 Ørjan Eide 2015-09-10  523  			devfreq_cooling_get_requested_power;
a76caf55 Ørjan Eide 2015-09-10 @524  		devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
a76caf55 Ørjan Eide 2015-09-10 @525  		devfreq_cooling_ops.power2state = devfreq_cooling_power2state;
a76caf55 Ørjan Eide 2015-09-10  526  	}
a76caf55 Ørjan Eide 2015-09-10  527  
a76caf55 Ørjan Eide 2015-09-10  528  	err = devfreq_cooling_gen_tables(dfc);
a76caf55 Ørjan Eide 2015-09-10  529  	if (err)
a76caf55 Ørjan Eide 2015-09-10  530  		goto free_dfc;
a76caf55 Ørjan Eide 2015-09-10  531  
2f96c035 Matthew Wilcox 2016-12-21  532  	err = ida_simple_get(_ida, 0, 0, GFP_KERNEL);
2f96c035 Matthew

linux-next: Tree for Apr 9

2018-04-08 Thread Stephen Rothwell

Hi all,

Please do not add any v4.18 destined stuff to your linux-next included
trees until after v4.17-rc1 has been released.

Changes since 20180406:

The vfs tree lost its build failure.

The parisc-hd tree still had its build failure for which I applied a patch.

The nvdimm tree gained a build failure so I used the version from
next-20180406.

Non-merge commits (relative to Linus' tree): 1826
 1817 files changed, 67325 insertions(+), 33557 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (f8cf2f16a7c9 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add correct dependency to Makefile)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (1b8837b61714 ARM: 8750/1: deflate_xip_data.sh: minor fixes)
Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery)
Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" comment)
Merging powerpc-fixes/fixes (52396500f97c powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs)
Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (4c7c12e0c9b8 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth)
Merging bpf/master (33491588c1fb kernel/bpf/syscall: fix warning defined but not used)
Merging ipsec/master (9a3fb9fb84cc xfrm: Fix transport mode skb control buffer usage.)
Merging netfilter/master (b9fc828debc8 qede: Fix barrier usage after tx doorbell write.)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (4608f064532c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next)
Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging rdma-fixes/for-rc (84652aefb347 RDMA/ucma: Introduce safer rdma_addr_size() variants)
Merging sound-current/for-linus (e15dc99dbb9c ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation)
Merging pci-current/for-linus (fc110ebdd014 PCI: dwc: Fix enumeration end when reaching root subordinate)
Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add

[PATCH] ipc/shm: fix use-after-free of shm file via remap_file_pages()

2018-04-08 Thread Eric Biggers

From: Eric Biggers 

syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().
Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.
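The fix depends on pairing the identity check with the extra reference: while the outer file pins the real shm file, that file cannot be freed, so a new segment that reuses the same ID is guaranteed to have a different file pointer. The user-space model below is a sketch of that idea, not kernel code. A one-slot table stands in for the IPC ID, the hypothetical `struct obj` for the shm `struct file`, and `struct handle` for `shm_file_data`; the return codes are simplified placeholders rather than the kernel's errno values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the real shm 'struct file'. */
struct obj {
	int refs;
};

static struct obj *obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		abort();
	o->refs = 1;		/* the ID table's reference */
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	o->refs++;
	return o;
}

/* Drop a reference; returns true if that freed the object. */
static bool obj_put(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0) {
		free(o);
		return true;
	}
	return false;
}

/* One-slot "ID table": removing and re-creating a segment reuses the ID. */
struct id_table {
	struct obj *slot;	/* NULL while the ID is unused */
};

/* Outer handle, like shm_file_data: pins the object it was opened on. */
struct handle {
	struct obj *file;
};

static struct handle handle_open(struct id_table *t)
{
	struct handle h = { obj_get(t->slot) };
	return h;
}

/*
 * Revalidate before use, like the new check in __shm_open(): the ID must
 * still exist and must still map to the object this handle pinned.  The
 * pointer comparison is only meaningful because h->file holds a reference,
 * so a new object can never be allocated at the same address.
 */
static int handle_revalidate(const struct id_table *t, const struct handle *h)
{
	if (!t->slot)
		return -1;	/* ID removed */
	if (t->slot != h->file)
		return -2;	/* ID reused by a different segment */
	return 0;
}

static void id_reuse_example(void)
{
	struct id_table t = { obj_new() };
	struct handle h = handle_open(&t);
	bool freed;

	assert(handle_revalidate(&t, &h) == 0);

	/* IPC_RMID: the table drops its reference, the handle keeps the object. */
	freed = obj_put(t.slot);
	assert(!freed);
	t.slot = NULL;
	assert(handle_revalidate(&t, &h) == -1);

	/* shmget() hands out the same ID for a brand-new segment. */
	t.slot = obj_new();
	assert(handle_revalidate(&t, &h) == -2);

	/* Release the handle (the fput() in shm_release()) and the new segment. */
	freed = obj_put(h.file);
	assert(freed);
	freed = obj_put(t.slot);
	assert(freed);
}
```

Without the reference held by the handle, the old object could be freed and its address recycled for the new one, and the pointer comparison would then wrongly report a match.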

Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>

int main()
{
	int is_parent = (fork() != 0);
	srand(getpid());
	for (;;) {
		int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
		if (is_parent) {
			void *addr = shmat(id, NULL, 0);
			usleep(rand() % 50);
			while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
		} else {
			usleep(rand() % 50);
			shmctl(id, IPC_RMID, NULL);
		}
	}
}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

BUG: unable to handle kernel NULL pointer dereference at 0058
PGD 0 P4D 0
Oops:  [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
 file_accessed include/linux/fs.h:2063 [inline]
 shmem_mmap+0x25/0x40 mm/shmem.c:2149
 call_mmap include/linux/fs.h:1789 [inline]
 shm_mmap+0x34/0x80 ipc/shm.c:465
 call_mmap include/linux/fs.h:1789 [inline]
 mmap_region+0x309/0x5b0 mm/mmap.c:1712
 do_mmap+0x294/0x4a0 mm/mmap.c:1483
 do_mmap_pgoff include/linux/mm.h:2235 [inline]
 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46...@syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Cc: sta...@vger.kernel.org
Signed-off-by: Eric Biggers 
---
 ipc/shm.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index acefe44fefefa..c80c5691a9970 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_struct *vma)
 	if (IS_ERR(shp))
 		return PTR_ERR(shp);
 
+	if (shp->shm_file != sfd->file) {
+		/* ID was reused */
+		shm_unlock(shp);
+		return -EINVAL;
+	}
+
 	shp->shm_atim = ktime_get_real_seconds();
 	ipc_update_pid(>shm_lprid, task_tgid(current));
 	shp->shm_nattch++;
@@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
 	int ret;
 
 	/*
-	 * In case of remap_file_pages() emulation, the file can represent
-	 * removed IPC ID: propogate shm_lock() error to caller.
+	 * In case of remap_file_pages() emulation, the file can represent an
+	 * IPC ID that was removed, and possibly even reused by another shm
+	 * segment already.  Propagate this case as an error to caller.
 	 */
 	ret = __shm_open(vma);
 	if (ret)
@@ -480,6 +487,7 @@ static int shm_release(struct inode *ino, struct file *file)
 	struct shm_file_data *sfd = shm_file_data(file);
 
 	put_ipc_ns(sfd->ns);
+	fput(sfd->file);
 	shm_file_data(file) = NULL;
 	kfree(sfd);
 	return 0;
@@ -1432,7 +1440,7 @@ long

Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree

2018-04-08 Thread James Morris

On Mon, 9 Apr 2018, Stephen Rothwell wrote:

> Hi James,
> 
> On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morris  
> wrote:
> >
> > That's odd, my next-general branch is merged to Linus.
> 
> The security tree in linux-next is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing
> 
> which has the efi-lock-down tree merged into it.

Ahh, I see.  I'll rebase next-testing.

-- 
James Morris

Re: [PATCH AUTOSEL for 3.18 059/101] x86/um: thin archives build fix

2018-04-08 Thread Nicholas Piggin

On Mon, 9 Apr 2018 00:41:22 +
Sasha Levin  wrote:

> From: Nicholas Piggin 
> 
> [ Upstream commit 827880ec260ba048f95fe646b96a205c394fa0f0 ]
> 
> The linker does not like vdso-syms.lds in input archive files.
> Make it an extra-y instead.

I wouldn't say these should be needed on kernels without thin
archives build.

It shouldn't hurt, but no point risking stable breakage.

Thanks,
Nick

Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree

2018-04-08 Thread Stephen Rothwell

Hi James,

On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morris  wrote:
>
> That's odd, my next-general branch is merged to Linus.

The security tree in linux-next is

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing

which has the efi-lock-down tree merged into it.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH v11 0/4] iommu/arm-smmu: Add runtime pm/sleep support

2018-04-08 Thread Tomasz Figa

Hi Will, Robin,

On Thu, Mar 22, 2018 at 7:22 PM Vivek Gautam wrote:

> This series provides the support for turning on the arm-smmu's
> clocks/power domains using runtime pm. This is done using the
> recently introduced device links patches, which lets the smmu's
> runtime to follow the master's runtime pm, so the smmu remains
> powered only when the masters use it.
> As not all implementations support clock/power gating, we are checking
> for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime
> power management for such smmu implementations that can support it.

> This series also adds support for Qcom's arm-smmu-v2 variant that
> has different clocks and power requirements.

> Took some reference from the exynos runtime patches [1].

> With conditional runtime pm now, we avoid touching dev->power.lock
> in fastpaths for smmu implementations that don't need to do anything
> useful with pm_runtime.
> This lets us use the much-argued pm_runtime_get_sync/put_sync()
> calls in map/unmap callbacks so that the clients do not have to
> worry about handling any of the arm-smmu's power.

> Previous version of this patch series is @ [5].

> [v11]
> * Some more cleanups for device link. We don't need an explicit
>   delete for device link from the driver, but just set the flag
>   DL_FLAG_AUTOREMOVE.
>   device_link_add() API description says -
>   "If the DL_FLAG_AUTOREMOVE is set, the link will be removed
>   automatically when the consumer device driver unbinds."
> * Addressed the comments for 'smmu' in arm_smmu_map/unmap().
> * Dropped the patch [10] that introduced device_link_del_dev() API.

As far as I can see, this version addresses all the earlier comments. Do
you think this is something that you could apply?

Best regards,
Tomasz

Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time

2018-04-08 Thread Sargun Dhillon

On Sun, Apr 8, 2018 at 8:38 PM, Tetsuo Handa wrote:
> Suggested changes on top of your patch:
>
>   Replace "struct hlist_head *head" in "struct security_hook_list" with
>   "const unsigned int offset" because there is no need to initialize with
>   address of the immutable/mutable chains.
>
>   Remove LSM_HOOK_INIT_MUTABLE() by embedding just offset (in bytes) from
>   head of "struct security_hook_heads" into
>   "struct security_hook_list"->offset.
>
>   Make "struct security_hook_heads security_hook_heads" and
>   "struct security_hook_heads security_hook_heads_mutable" local variables.
>
>   Rename "struct security_hook_heads security_hook_heads" to
>   "struct security_hook_heads security_mutable_hook_heads" and mark it as
>   __ro_after_init.
>
>   Add the fourth argument to security_add_hooks() which specifies to which
>   chain (security_{mutable|immutable}_hook_heads) to connect.
>
>   Make all built-in LSM modules (except SELinux if
>   CONFIG_SECURITY_SELINUX_DISABLE=y) be connected to
>   security_immutable_hook_heads.
>
>   Rename __lsm_ro_after_init to __selinux_ro_after_init which is local to
>   SELinux.
>
>   Mark "struct security_hook_list"->hook const because it won't change.
>
>   Mark "struct security_hook_list"->lsm const because none of
>   security_add_hooks() callers are ready to modify the third argument.
>
>   Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and
>   the exception in randomize_layout_plugin.c because preventing module
>   unloading won't work as expected.
>

Rather than completely removing the unloading code, might it make
sense to add a BUG_ON or WARN_ON in security_delete_hooks() if
allow_unload_module is false and owner is not NULL?

[PATCH AUTOSEL for 4.15 001/189] firewire-ohci: work around oversized DMA reads on JMicron controllers

2018-04-08 Thread Sasha Levin

From: Hector Martin 

[ Upstream commit 188775181bc05f29372b305ef96485840e351fde ]

At least some JMicron controllers issue buggy oversized DMA reads when
fetching context descriptors, always fetching 0x20 bytes at once for
descriptors which are only 0x10 bytes long. This is often harmless, but
can cause page faults on modern systems with IOMMUs:

DMAR: [DMA Read] Request device [05:00.0] fault addr fff56000 [fault reason 06] PTE Read access is not set
firewire_ohci :05:00.0: DMA context IT0 has stopped, error code: evt_descriptor_read

This works around the problem by always leaving 0x10 padding bytes at
the end of descriptor buffer pages, which should be harmless to do
unconditionally for controllers in case others have the same behavior.

Signed-off-by: Hector Martin 
Reviewed-by: Clemens Ladisch 
Signed-off-by: Stefan Richter 
Signed-off-by: Sasha Levin 
---
 drivers/firewire/ohci.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index ccf52368a073..45c048751f3b 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -1128,7 +1128,13 @@ static int context_add_buffer(struct context *ctx)
return -ENOMEM;
 
offset = (void *)&desc->buffer - (void *)desc;
-   desc->buffer_size = PAGE_SIZE - offset;
+   /*
+* Some controllers, like JMicron ones, always issue 0x20-byte DMA reads
+* for descriptors, even 0x10-byte ones. This can cause page faults when
+* an IOMMU is in use and the oversized read crosses a page boundary.
+* Work around this by always leaving at least 0x10 bytes of padding.
+*/
+   desc->buffer_size = PAGE_SIZE - offset - 0x10;
desc->buffer_bus = bus_addr + offset;
desc->used = 0;
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 005/189] hwmon: (ina2xx) Fix access to uninitialized mutex

2018-04-08 Thread Sasha Levin

From: Marek Szyprowski 

[ Upstream commit 0c4c5860e9983eb3da7a3d73ca987643c3ed034b ]

Initialize data->config_lock mutex before it is used by the driver code.

This fixes the following warning on Odroid XU3 boards:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180115-1-gb75575dee3f2 #107
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x90/0xc8)
[] (dump_stack) from [] (register_lock_class+0x1c0/0x59c)
[] (register_lock_class) from [] (__lock_acquire+0x78/0x1850)
[] (__lock_acquire) from [] (lock_acquire+0xc8/0x2b8)
[] (lock_acquire) from [] (__mutex_lock+0x60/0xa0c)
[] (__mutex_lock) from [] (mutex_lock_nested+0x1c/0x24)
[] (mutex_lock_nested) from [] (ina2xx_set_shunt+0x70/0xb0)
[] (ina2xx_set_shunt) from [] (ina2xx_probe+0x88/0x1b0)
[] (ina2xx_probe) from [] (i2c_device_probe+0x1e0/0x2d0)
[] (i2c_device_probe) from [] (driver_probe_device+0x2b8/0x4a0)
[] (driver_probe_device) from [] (__driver_attach+0xfc/0x120)
[] (__driver_attach) from [] (bus_for_each_dev+0x58/0x7c)
[] (bus_for_each_dev) from [] (bus_add_driver+0x174/0x250)
[] (bus_add_driver) from [] (driver_register+0x78/0xf4)
[] (driver_register) from [] (i2c_register_driver+0x38/0xa8)
[] (i2c_register_driver) from [] (do_one_initcall+0x48/0x18c)
[] (do_one_initcall) from [] (kernel_init_freeable+0x110/0x1d4)
[] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
[] (kernel_init) from [] (ret_from_fork+0x14/0x20)

Fixes: 5d389b125186 ("hwmon: (ina2xx) Make calibration register value fixed")
Signed-off-by: Marek Szyprowski 
Signed-off-by: Guenter Roeck 
Signed-off-by: Sasha Levin 
---
 drivers/hwmon/ina2xx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/hwmon/ina2xx.c b/drivers/hwmon/ina2xx.c
index 62e38fa8cda2..a55823b52b2f 100644
--- a/drivers/hwmon/ina2xx.c
+++ b/drivers/hwmon/ina2xx.c
@@ -438,6 +438,7 @@ static int ina2xx_probe(struct i2c_client *client,
 
/* set the device type */
data->config = _config[chip];
+   mutex_init(&data->config_lock);
 
if (of_property_read_u32(dev->of_node, "shunt-resistor", ) < 0) {
struct ina2xx_platform_data *pdata = dev_get_platdata(dev);
@@ -467,8 +468,6 @@ static int ina2xx_probe(struct i2c_client *client,
return -ENODEV;
}
 
-   mutex_init(&data->config_lock);
-
data->groups[group++] = _group;
if (id->driver_data == ina226)
data->groups[group++] = _group;
-- 
2.15.1

[PATCH AUTOSEL for 4.15 004/189] nvme: host delete_work and reset_work on separate workqueues

2018-04-08 Thread Sasha Levin

From: Roy Shterman 

[ Upstream commit b227c59b9b5b8ae52639c8980af853d2f654f90a ]

We need to ensure that delete_work will be hosted on a different
workqueue than all the works we flush or cancel from it.
Otherwise we may hit a circular dependency warning [1].

Also, given that delete_work flushes reset_work, host reset_work
on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
fix the flushing in the individual drivers to flush nvme_delete_wq
when draining queued deletes.

[1]:
[  178.491942] =
[  178.492718] [ INFO: possible recursive locking detected ]
[  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G   OE
[  178.494382] -
[  178.495160] kworker/5:1/135 is trying to acquire lock:
[  178.495894]  (
[  178.496120] "nvme-wq"
[  178.496471] ){.+}
[  178.496599] , at:
[  178.496921] [] flush_work+0x1a6/0x2d0
[  178.497670]
   but task is already holding lock:
[  178.498499]  (
[  178.498724] "nvme-wq"
[  178.499074] ){.+}
[  178.499202] , at:
[  178.499520] [] process_one_work+0x162/0x6a0
[  178.500343]
   other info that might help us debug this:
[  178.501269]  Possible unsafe locking scenario:

[  178.502113]CPU0
[  178.502472]
[  178.502829]   lock(
[  178.503115] "nvme-wq"
[  178.503467] );
[  178.503716]   lock(
[  178.504001] "nvme-wq"
[  178.504353] );
[  178.504601]
*** DEADLOCK ***

[  178.505441]  May be due to missing lock nesting notation

[  178.506453] 2 locks held by kworker/5:1/135:
[  178.507068]  #0:
[  178.507330]  (
[  178.507598] "nvme-wq"
[  178.507726] ){.+}
[  178.508079] , at:
[  178.508173] [] process_one_work+0x162/0x6a0
[  178.509004]  #1:
[  178.509265]  (
[  178.509532] (>delete_work)
[  178.509795] ){+.+.+.}
[  178.510145] , at:
[  178.510239] [] process_one_work+0x162/0x6a0
[  178.511070]
   stack backtrace:
:
[  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G   OE   
4.9.0-rc4-c844263313a8-lb #3
[  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.1-1ubuntu1 04/01/2014
[  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
[  178.515071]  c2668175bae0 a7450823 a88abd80 
a88abd80
[  178.516195]  c2668175bb98 a70eb012 a8d8d90d 
9c472e9ea700
[  178.517318]  9c472e9ea700 9c47 9c477200 
ab83be61bec0d50e
[  178.518443] Call Trace:
[  178.518807]  [] dump_stack+0x85/0xc2
[  178.519542]  [] __lock_acquire+0x17d2/0x18f0
[  178.520377]  [] ? serial8250_console_putchar+0x27/0x30
[  178.521330]  [] ? wait_for_xmitr+0xa0/0xa0
[  178.522174]  [] ? flush_work+0x18b/0x2d0
[  178.522975]  [] lock_acquire+0x11b/0x220
[  178.523753]  [] ? flush_work+0x1a6/0x2d0
[  178.524535]  [] flush_work+0x1c9/0x2d0
[  178.525291]  [] ? flush_work+0x1a6/0x2d0
[  178.526077]  [] ? flush_workqueue_prep_pwqs+0x220/0x220
[  178.527040]  [] __cancel_work_timer+0x10f/0x1d0
[  178.527907]  [] ? vprintk_default+0x29/0x40
[  178.528726]  [] ? printk+0x48/0x50
[  178.529434]  [] cancel_delayed_work_sync+0x13/0x20
[  178.530381]  [] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
[  178.531314]  [] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
[  178.532271]  [] process_one_work+0x1e1/0x6a0
[  178.533101]  [] ? process_one_work+0x162/0x6a0
[  178.533954]  [] worker_thread+0x4e/0x490
[  178.534735]  [] ? process_one_work+0x6a0/0x6a0
[  178.535588]  [] ? process_one_work+0x6a0/0x6a0
[  178.536441]  [] kthread+0xff/0x120
[  178.537149]  [] ? kthread_park+0x60/0x60
[  178.538094]  [] ? kthread_park+0x60/0x60
[  178.538900]  [] ret_from_fork+0x2a/0x40

Signed-off-by: Roy Shterman 
Signed-off-by: Sagi Grimberg 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 drivers/nvme/host/core.c   | 44 +++-
 drivers/nvme/host/nvme.h   |  2 ++
 drivers/nvme/host/rdma.c   |  2 +-
 drivers/nvme/target/loop.c |  2 +-
 4 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 935593032123..93a4fa053e7f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -65,9 +65,26 @@ static bool streams;
 module_param(streams, bool, 0644);
 MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
 
+/*
+ * nvme_wq - hosts nvme related works that are not reset or delete
+ * nvme_reset_wq - hosts nvme reset works
+ * nvme_delete_wq - hosts nvme delete works
+ *
+ * nvme_wq will host works such as scan, aen handling, fw activation,
+ * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq
+ * runs reset works which also flush works hosted on nvme_wq for
+ * serialization purposes. nvme_delete_wq hosts controller deletion
+ * works which flush reset works for serialization.
+ */
 struct workqueue_struct *nvme_wq;
 EXPORT_SYMBOL_GPL(nvme_wq);
 
+struct workqueue_struct *nvme_reset_wq;

[PATCH AUTOSEL for 4.15 002/189] x86/tsc: Allow TSC calibration without PIT

2018-04-08 Thread Sasha Levin

From: Peter Zijlstra 

[ Upstream commit 30c7e5b123673d5e570e238dbada2fb68a87212c ]

Zhang Rui reported that a Surface Pro 4 will fail to boot with
lapic=notscdeadline. Part of the problem is that that machine doesn't have
a PIT.

If, for some reason, the TSC init has to fall back to TSC calibration, it
relies on the PIT to be present.

Allow TSC calibration to reliably fall back to HPET.

The below results in an accurate TSC measurement when forced on an IVB:

  tsc: Unable to calibrate against PIT
  tsc: No reference (HPET/PMTIMER) available
  tsc: Unable to calibrate against PIT
  tsc: using HPET reference calibration
  tsc: Detected 2792.451 MHz processor

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: len.br...@intel.com
Cc: rui.zh...@intel.com
Link: https://lkml.kernel.org/r/20171222092243.333145...@infradead.org
Signed-off-by: Sasha Levin 
---
 arch/x86/include/asm/i8259.h |  5 +
 arch/x86/kernel/tsc.c| 18 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index c8376b40e882..5cdcdbd4d892 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -69,6 +69,11 @@ struct legacy_pic {
 extern struct legacy_pic *legacy_pic;
 extern struct legacy_pic null_legacy_pic;
 
+static inline bool has_legacy_pic(void)
+{
+   return legacy_pic != &null_legacy_pic;
+}
+
 static inline int nr_legacy_irqs(void)
 {
return legacy_pic->nr_legacy_irqs;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index e169e85db434..a2c9dd8bfc6f 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned int __read_mostly cpu_khz;/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -363,6 +364,20 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin)
unsigned long tscmin, tscmax;
int pitcnt;
 
+   if (!has_legacy_pic()) {
+   /*
+* Relies on tsc_early_delay_calibrate() to have given us semi
+* usable udelay(), wait for the same 50ms we would have with
+* the PIT loop below.
+*/
+   udelay(10 * USEC_PER_MSEC);
+   udelay(10 * USEC_PER_MSEC);
+   udelay(10 * USEC_PER_MSEC);
+   udelay(10 * USEC_PER_MSEC);
+   udelay(10 * USEC_PER_MSEC);
+   return ULONG_MAX;
+   }
+
/* Set the Gate high, disable speaker */
outb((inb(0x61) & ~0x02) | 0x01, 0x61);
 
@@ -487,6 +502,9 @@ static unsigned long quick_pit_calibrate(void)
u64 tsc, delta;
unsigned long d1, d2;
 
+   if (!has_legacy_pic())
+   return 0;
+
/* Set the Gate high, disable speaker */
outb((inb(0x61) & ~0x02) | 0x01, 0x61);
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 006/189] ACPI / LPSS: Do not instantiate platform_dev for devs without MMIO resources

2018-04-08 Thread Sasha Levin

From: Hans de Goede 

[ Upstream commit e1681599345b8466786b6e54a2db2a00a068a3f3 ]

acpi_lpss_create_device() skips handling LPSS devices which do not have
MMIO resources in their resource list (typically these devices are
disabled by the firmware). But since the LPSS code does not bind to the
device, acpi_bus_attach() still ends up creating a platform device for
it, and the regular platform_driver for the ACPI HID still tries to bind
to it.

This happens, e.g., on some boards which do not use the PWM controller
and have an empty or invalid resource table for it. Currently this causes
the following error messages to be logged:

[3.281966] pwm-lpss 80862288:00: invalid resource
[3.287098] pwm-lpss: probe of 80862288:00 failed with error -22

This commit stops the undesirable creation of a platform_device for
disabled LPSS devices by setting pnp.type.platform_id to 0. Note that
acpi_scan_attach_handler() also sets pnp.type.platform_id to 0 when there
is a matching handler for the device and that handler has no attach
callback, so we simply behave as a handler without an attach function
in this case.

Signed-off-by: Hans de Goede 
Acked-by: Mika Westerberg 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Sasha Levin 
---
 drivers/acpi/acpi_lpss.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index 7f2b02cc8ea1..c71f5a2a592e 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -465,6 +465,8 @@ static int acpi_lpss_create_device(struct acpi_device *adev,
acpi_dev_free_resource_list(_list);
 
if (!pdata->mmio_base) {
+   /* Avoid acpi_bus_attach() instantiating a pdev for this dev. */
+   adev->pnp.type.platform_id = 0;
/* Skip the device, but continue the namespace scan. */
ret = 0;
goto err_out;
-- 
2.15.1
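
The mechanism described in this patch can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: fake_acpi_device, fake_lpss_create_device() and fallback_pdev_created() are hypothetical stand-ins for struct acpi_device, acpi_lpss_create_device() and the platform-device fallback in acpi_bus_attach(), and the 'patched' flag toggles the one-line fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of an ACPI device node. */
struct fake_acpi_device {
	bool platform_id; /* stands in for adev->pnp.type.platform_id */
	bool has_mmio;    /* did the resource list contain an MMIO range? */
};

/* Models acpi_lpss_create_device(): returns 1 when LPSS takes the
 * device, 0 when it skips it. With 'patched' set, a skipped device
 * also loses its platform_id, mirroring the fix. */
static int fake_lpss_create_device(struct fake_acpi_device *adev, bool patched)
{
	if (!adev->has_mmio) {
		if (patched)
			adev->platform_id = false;
		return 0; /* skip the device, keep scanning the namespace */
	}
	return 1;
}

/* Models the fallback in acpi_bus_attach(): a generic platform device
 * is created only for a device that no handler took and that still
 * advertises a platform_id. */
static bool fallback_pdev_created(struct fake_acpi_device *adev, bool patched)
{
	return fake_lpss_create_device(adev, patched) == 0 && adev->platform_id;
}
```

Without the fix, a firmware-disabled device (no MMIO) still gets a fallback platform device, which is what pwm-lpss then fails to probe; with it, no device is created.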

[PATCH AUTOSEL for 4.15 003/189] NFSv4: always set NFS_LOCK_LOST when a lock is lost.

2018-04-08 Thread Sasha Levin

From: NeilBrown 

[ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ]

There are two comments in the NFSv4 code which suggest that
SIGLOST should possibly be sent to a process.  In these
cases a lock has been lost.
The current practice is to set NFS_LOCK_LOST so that
read/write returns EIO when a lock is lost.
So change these comments to code that sets NFS_LOCK_LOST.

One case is when lock recovery after an apparent server restart
fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or
NFS4ERR_RECLAIM_CONFLICT.  The other case is when a lock
attempt as part of lease recovery fails with NFS4ERR_DENIED.

In an ideal world, these should not happen.  However, I have
a packet trace showing an NFSv4.1 session getting
NFS4ERR_BADSESSION after an extended network partition.  The
NFSv4.1 client treats this like a server reboot until/unless
it gets NFS4ERR_NO_GRACE, in which case it switches over to
"nograce" recovery mode.  In this network trace, the client
attempts to recover a lock and the server (incorrectly)
reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE.  This
leads to the code path that held only the ineffective comment,
so the client then continues to write using the OPEN stateid.

Signed-off-by: NeilBrown 
Signed-off-by: Trond Myklebust 
Signed-off-by: Sasha Levin 
---
 fs/nfs/nfs4proc.c  | 12 ++++++++----
 fs/nfs/nfs4state.c |  5 ++++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 56fa5a16e097..083802f7a1e9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2019,7 +2019,7 @@ static int nfs4_open_reclaim(struct nfs4_state_owner *sp, struct nfs4_state *sta
return ret;
 }
 
-static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, int err)
+static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, struct file_lock *fl, int err)
 {
switch (err) {
default:
@@ -2066,7 +2066,11 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
return -EAGAIN;
case -ENOMEM:
case -NFS4ERR_DENIED:
-   /* kill_proc(fl->fl_pid, SIGLOST, 1); */
+   if (fl) {
+   struct nfs4_lock_state *lsp = fl->fl_u.nfs4_fl.owner;
+   if (lsp)
+   set_bit(NFS_LOCK_LOST, &lsp->ls_flags);
+   }
return 0;
}
return err;
@@ -2102,7 +2106,7 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx,
err = nfs4_open_recover_helper(opendata, FMODE_READ);
}
nfs4_opendata_put(opendata);
-   return nfs4_handle_delegation_recall_error(server, state, stateid, err);
+   return nfs4_handle_delegation_recall_error(server, state, stateid, NULL, err);
 }
 
 static void nfs4_open_confirm_prepare(struct rpc_task *task, void *calldata)
@@ -6739,7 +6743,7 @@ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state,
if (err != 0)
return err;
err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW);
-   return nfs4_handle_delegation_recall_error(server, state, stateid, err);
+   return nfs4_handle_delegation_recall_error(server, state, stateid, fl, err);
 }
 
 struct nfs_release_lockowner_data {
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e4f4a09ed9f4..91a4d4eeb235 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1482,6 +1482,7 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
struct inode *inode = state->inode;
struct nfs_inode *nfsi = NFS_I(inode);
struct file_lock *fl;
+   struct nfs4_lock_state *lsp;
int status = 0;
struct file_lock_context *flctx = inode->i_flctx;
struct list_head *list;
@@ -1522,7 +1523,9 @@ restart:
case -NFS4ERR_DENIED:
case -NFS4ERR_RECLAIM_BAD:
case -NFS4ERR_RECLAIM_CONFLICT:
-   /* kill_proc(fl->fl_pid, SIGLOST, 1); */
+   lsp = fl->fl_u.nfs4_fl.owner;
+   if (lsp)
+   set_bit(NFS_LOCK_LOST, &lsp->ls_flags);
status = 0;
}
spin_lock(&flctx->flc_lock);
-- 
2.15.1
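
A minimal userspace sketch of the behaviour this patch relies on: a lost lock is recorded as a bit in the lock state, and the I/O path turns that bit into EIO. The struct, the bit number and the error constants are hypothetical placeholders standing in for struct nfs4_lock_state, NFS_LOCK_LOST and the NFS4ERR_* codes; in the real client the EIO check lives in the read/write paths, not in a helper like fake_write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values, not the real NFS4ERR_* codes or NFS_LOCK_LOST bit. */
enum { FAKE_ERR_DENIED = 10001, FAKE_ERR_RECLAIM_BAD = 10002,
       FAKE_ERR_RECLAIM_CONFLICT = 10003 };
enum { FAKE_LOCK_LOST = 0 }; /* bit number */

struct fake_lock_state {
	unsigned long ls_flags;
};

/* Models the reclaim error handling after the patch: instead of a
 * commented-out SIGLOST, mark the lock as lost and treat the error as
 * handled so that recovery can continue. */
static int fake_reclaim_error(struct fake_lock_state *lsp, int err)
{
	switch (err) {
	case -FAKE_ERR_DENIED:
	case -FAKE_ERR_RECLAIM_BAD:
	case -FAKE_ERR_RECLAIM_CONFLICT:
		if (lsp)
			lsp->ls_flags |= 1UL << FAKE_LOCK_LOST;
		return 0;
	}
	return err;
}

/* Models the I/O path: once the lock is lost, writes fail with EIO
 * instead of silently continuing under the OPEN stateid. */
static int fake_write(const struct fake_lock_state *lsp)
{
	if (lsp && (lsp->ls_flags & (1UL << FAKE_LOCK_LOST)))
		return -EIO;
	return 0;
}
```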

[PATCH AUTOSEL for 4.15 011/189] RDMA/core: Clarify rdma_ah_find_type

2018-04-08 Thread Sasha Levin

From: Parav Pandit 

[ Upstream commit a6532e7139660c103dda181aa5b2c734aa26ed6c ]

iWARP does not use rdma_ah_attr_type, and for this reason we do not have an
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type() should not even be called on iWARP
ports, and for clarity it shouldn't have a special test for iWARP.

This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when it is wrongly called on an iWARP port.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Sasha Levin 
---
 include/rdma/ib_verbs.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0d6a110dae7c..20ebf9061962 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3793,8 +3793,7 @@ static inline void rdma_ah_set_grh(struct rdma_ah_attr *attr,
 static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
   u32 port_num)
 {
-   if ((rdma_protocol_roce(dev, port_num)) ||
-   (rdma_protocol_iwarp(dev, port_num)))
+   if (rdma_protocol_roce(dev, port_num))
return RDMA_AH_ATTR_TYPE_ROCE;
else if ((rdma_protocol_ib(dev, port_num)) &&
 (rdma_cap_opa_ah(dev, port_num)))
-- 
2.15.1
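
The decision order in rdma_ah_find_type() after this change can be sketched as plain C. Port capabilities are modelled as hypothetical booleans rather than the real rdma_protocol_*() and rdma_cap_opa_ah() helpers, and the tail of the function (OPA when the port is IB with OPA AH support, IB otherwise) is assumed, since the hunk above cuts off at that point.

```c
#include <assert.h>
#include <stdbool.h>

enum ah_type { AH_TYPE_IB, AH_TYPE_ROCE, AH_TYPE_OPA };

/* Hypothetical per-port capability flags. */
struct port_caps {
	bool roce;
	bool ib;
	bool iwarp;  /* no longer consulted, which is the point of the patch */
	bool opa_ah;
};

static enum ah_type find_type(const struct port_caps *p)
{
	if (p->roce)
		return AH_TYPE_ROCE;
	else if (p->ib && p->opa_ah)
		return AH_TYPE_OPA;
	else
		return AH_TYPE_IB; /* where a wrongly-called iWARP port now lands */
}
```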

[PATCH AUTOSEL for 4.15 007/189] tipc: fix a potential access after delete in tipc_sk_join()

2018-04-08 Thread Sasha Levin

From: Jon Maloy 

[ Upstream commit febafc8455fdbb0ba53d596075068a683b75f355 ]

In commit d12d2e12cec2 ("tipc: send out join messages as soon as new
member is discovered") we added a call to the function tipc_group_join()
without considering the case that the preceding tipc_sk_publish() might
have failed, in which case the group item has already been deleted.

We fix this by returning from tipc_sk_join() directly after the
failed tipc_sk_publish().

Reported-by: syzbot+e3eeae78ea88b8d6d...@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 net/tipc/socket.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3b4084480377..8efd2e42de30 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2759,6 +2759,7 @@ static int tipc_sk_join(struct tipc_sock *tsk, struct tipc_group_req *mreq)
if (rc) {
tipc_group_delete(net, grp);
tsk->group = NULL;
+   return rc;
}
 
/* Eliminate any risk that a broadcast overtakes the sent JOIN */
-- 
2.15.1
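
The bug pattern here is an error path that frees an object and then falls through to code that still uses it. A simplified userspace sketch of the fixed control flow, where fake_sk_join(), fake_publish() and both structs are hypothetical stand-ins for tipc_sk_join(), tipc_sk_publish() and the TIPC group objects, and free() plays the role of tipc_group_delete():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_group { int joined; };
struct fake_sock { struct fake_group *group; };

/* Hypothetical stand-in for tipc_sk_publish(): fails on request. */
static int fake_publish(int fail)
{
	return fail ? -EINVAL : 0;
}

static int fake_sk_join(struct fake_sock *tsk, int publish_fails)
{
	struct fake_group *grp = calloc(1, sizeof(*grp));
	int rc;

	if (!grp)
		return -ENOMEM;
	tsk->group = grp;
	rc = fake_publish(publish_fails);
	if (rc) {
		free(grp);          /* plays the role of tipc_group_delete() */
		tsk->group = NULL;
		return rc;          /* the fix: never fall through and touch grp */
	}
	grp->joined = 1;            /* plays the role of tipc_group_join() */
	return 0;
}
```

Without the early return, the failure branch would continue to the join step and write through a pointer that was just freed, which is the access-after-delete syzbot reported.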

[PATCH AUTOSEL for 4.15 010/189] kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl

2018-04-08 Thread Sasha Levin

From: Paolo Bonzini 

[ Upstream commit 51776043afa415435c7e4636204fbe4f7edc4501 ]

This ioctl is obsolete (it was used by Xenner as far as I know), but
still, let's not break it gratuitously.  Its handler copies
directly into struct kvm.  Go through a bounce buffer instead, with
the added benefit that we can actually do something useful with the
flags argument: the previous code exited with -EINVAL but still
performed the copy.

This is technically a userspace ABI breakage, but since no one should be
using the ioctl, it's a good occasion to see if anyone actually
complains.

Cc: kernel-harden...@lists.openwall.com
Cc: Kees Cook 
Cc: Radim Krčmář 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Kees Cook 
Signed-off-by: Sasha Levin 
---
 arch/x86/kvm/x86.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a10da5052072..41a8ac44d5cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4244,13 +4244,14 @@ set_identity_unlock:
mutex_unlock(&kvm->lock);
break;
case KVM_XEN_HVM_CONFIG: {
+   struct kvm_xen_hvm_config xhc;
r = -EFAULT;
-   if (copy_from_user(&kvm->arch.xen_hvm_config, argp,
-  sizeof(struct kvm_xen_hvm_config)))
+   if (copy_from_user(&xhc, argp, sizeof(xhc)))
goto out;
r = -EINVAL;
-   if (kvm->arch.xen_hvm_config.flags)
+   if (xhc.flags)
goto out;
+   memcpy(&kvm->arch.xen_hvm_config, &xhc, sizeof(xhc));
r = 0;
break;
}
-- 
2.15.1
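
The bounce-buffer pattern from this patch, sketched in userspace C: copy into a local, validate it, and only then commit to long-lived state. struct xen_cfg is a trimmed, hypothetical layout (the real struct kvm_xen_hvm_config has more fields), and memcpy() stands in for copy_from_user(), which in the kernel can additionally fail with -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed configuration layout. */
struct xen_cfg {
	uint32_t flags;
	uint32_t msr;
};

struct fake_vm {
	struct xen_cfg xen_hvm_config; /* long-lived per-VM state */
};

static int set_xen_config(struct fake_vm *vm, const void *user_buf)
{
	struct xen_cfg xhc;

	memcpy(&xhc, user_buf, sizeof(xhc)); /* copy_from_user() stand-in */
	if (xhc.flags)
		return -EINVAL;              /* reject before touching vm state */
	memcpy(&vm->xen_hvm_config, &xhc, sizeof(xhc));
	return 0;
}
```

The old code validated after copying straight into struct kvm, so a rejected request still overwrote the VM's configuration; validating the local copy first avoids that.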

[PATCH AUTOSEL for 4.15 014/189] tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account

2018-04-08 Thread Sasha Levin

From: Anna-Maria Gleixner 

[ Upstream commit 91633eed73a3ac37aaece5c8c1f93a18bae616a9 ]

So far, the hrtimer_init tracepoint only took CLOCK_MONOTONIC and
CLOCK_REALTIME into account, as well as HRTIMER_MODE_ABS/REL. The check
for detecting the ABS or REL timer modes is no longer valid; it was broken
by the introduction of HRTIMER_MODE_PINNED.

HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the
sake of completeness, print all given modes.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 include/trace/events/timer.h | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 16e305e69f34..c6f728037c53 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -136,6 +136,20 @@ DEFINE_EVENT(timer_class, timer_cancel,
TP_ARGS(timer)
 );
 
+#define decode_clockid(type)   \
+   __print_symbolic(type,  \
+   { CLOCK_REALTIME,   "CLOCK_REALTIME"},  \
+   { CLOCK_MONOTONIC,  "CLOCK_MONOTONIC"   },  \
+   { CLOCK_BOOTTIME,   "CLOCK_BOOTTIME"},  \
+   { CLOCK_TAI,"CLOCK_TAI" })
+
+#define decode_hrtimer_mode(mode)  \
+   __print_symbolic(mode,  \
+   { HRTIMER_MODE_ABS, "ABS"   },  \
+   { HRTIMER_MODE_REL, "REL"   },  \
+   { HRTIMER_MODE_ABS_PINNED,  "ABS|PINNED"},  \
+   { HRTIMER_MODE_REL_PINNED,  "REL|PINNED"})
+
 /**
  * hrtimer_init - called when the hrtimer is initialized
  * @hrtimer:   pointer to struct hrtimer
@@ -162,10 +176,8 @@ TRACE_EVENT(hrtimer_init,
),
 
TP_printk("hrtimer=%p clockid=%s mode=%s", __entry->hrtimer,
- __entry->clockid == CLOCK_REALTIME ?
-   "CLOCK_REALTIME" : "CLOCK_MONOTONIC",
- __entry->mode == HRTIMER_MODE_ABS ?
-   "HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL")
+ decode_clockid(__entry->clockid),
+ decode_hrtimer_mode(__entry->mode))
 );
 
 /**
-- 
2.15.1
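
A userspace sketch of why the old ternary mislabelled pinned timers, and how a value-to-name table in the spirit of __print_symbolic() avoids that. The mode values are assumptions (REL in bit 0, PINNED in bit 1, the pinned modes as ORed combinations) and are not copied from the kernel headers; unknown values simply map to "?" here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit layout, for illustration only. */
enum {
	MODE_ABS = 0x0,
	MODE_REL = 0x1,
	MODE_PINNED = 0x2,
	MODE_ABS_PINNED = MODE_ABS | MODE_PINNED,
	MODE_REL_PINNED = MODE_REL | MODE_PINNED,
};

struct sym { unsigned long val; const char *name; };

static const struct sym mode_syms[] = {
	{ MODE_ABS, "ABS" },
	{ MODE_REL, "REL" },
	{ MODE_ABS_PINNED, "ABS|PINNED" },
	{ MODE_REL_PINNED, "REL|PINNED" },
};

/* Table lookup: every known mode gets its own label. */
static const char *decode_mode(unsigned long mode)
{
	for (size_t i = 0; i < sizeof(mode_syms) / sizeof(mode_syms[0]); i++)
		if (mode_syms[i].val == mode)
			return mode_syms[i].name;
	return "?";
}

/* The pre-patch logic: anything that is not exactly ABS prints as REL. */
static const char *old_decode_mode(unsigned long mode)
{
	return mode == MODE_ABS ? "HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL";
}
```

With the pinned bit set, an absolute timer no longer compares equal to ABS, so the old ternary reports it as REL; the table keeps the two modes distinct.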

[PATCH AUTOSEL for 4.15 008/189] ALSA: hda - Use IS_REACHABLE() for dependency on input

2018-04-08 Thread Sasha Levin

From: Takashi Iwai 

[ Upstream commit c469652bb5e8fb715db7d152f46d33b3740c9b87 ]

Commit ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek
HD-audio codec") introduced the reverse selection of CONFIG_INPUT for
the Realtek codec in order to avoid the dependency mess between
built-in and modular code.  Later on, we obtained the IS_REACHABLE() macro
exactly for this kind of problem, and now we can remove the INPUT
selection in Kconfig and put IS_REACHABLE(INPUT) in the appropriate
places in the code, so that the driver doesn't need to forcibly select
another subsystem.

Fixes: ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec")
Reported-by: Randy Dunlap 
Acked-by: Randy Dunlap  # and build-tested
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 sound/pci/hda/Kconfig | 1 -
 sound/pci/hda/patch_realtek.c | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig
index 7f3b5ed81995..f7a492c382d9 100644
--- a/sound/pci/hda/Kconfig
+++ b/sound/pci/hda/Kconfig
@@ -88,7 +88,6 @@ config SND_HDA_PATCH_LOADER
 config SND_HDA_CODEC_REALTEK
tristate "Build Realtek HD-audio codec support"
select SND_HDA_GENERIC
-   select INPUT
help
  Say Y or M here to include Realtek HD-audio codec support in
  snd-hda-intel driver, such as ALC880.
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 206774703a33..ac7ef3957159 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -3744,6 +3744,7 @@ static void alc280_fixup_hp_gpio4(struct hda_codec *codec,
}
 }
 
+#if IS_REACHABLE(INPUT)
 static void gpio2_mic_hotkey_event(struct hda_codec *codec,
   struct hda_jack_callback *event)
 {
@@ -3876,6 +3877,10 @@ static void alc233_fixup_lenovo_line2_mic_hotkey(struct hda_codec *codec,
spec->kb_dev = NULL;
}
 }
+#else /* INPUT */
+#define alc280_fixup_hp_gpio2_mic_hotkey   NULL
+#define alc233_fixup_lenovo_line2_mic_hotkey   NULL
+#endif /* INPUT */
 
 static void alc269_fixup_hp_line1_mic1_led(struct hda_codec *codec,
const struct hda_fixup *fix, int action)
-- 
2.15.1
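
The #if/#else pattern used in this patch can be sketched outside the kernel. INPUT_REACHABLE is a hypothetical stand-in for IS_REACHABLE(INPUT), and the fixup table and apply_fixups() are simplified, hypothetical versions of the HDA fixup machinery; like the patch, the sketch assumes that a NULL callback is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_REACHABLE(INPUT); build with
 * -DINPUT_REACHABLE=0 to model a config where input is not callable. */
#ifndef INPUT_REACHABLE
#define INPUT_REACHABLE 1
#endif

struct fake_codec { int hotkey_registered; };

typedef void (*fixup_fn)(struct fake_codec *codec);

#if INPUT_REACHABLE
static void fixup_mic_hotkey(struct fake_codec *codec)
{
	codec->hotkey_registered = 1; /* would register an input device */
}
#else /* INPUT */
#define fixup_mic_hotkey NULL
#endif /* INPUT */

/* The table compiles either way; only the entry's value changes. */
static const fixup_fn fixups[] = { fixup_mic_hotkey };

static void apply_fixups(struct fake_codec *codec)
{
	for (size_t i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++)
		if (fixups[i])
			fixups[i](codec);
}
```

The design choice is that the driver degrades gracefully (no mic-mute hotkey) instead of forcing the input subsystem into the build.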

[PATCH AUTOSEL for 4.15 013/189] netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460

2018-04-08 Thread Sasha Levin

From: Subash Abhinov Kasiviswanathan 

[ Upstream commit 83f1999caeb14e15df205e80d210699951733287 ]

ipv6_defrag pulls the network headers that precede the fragment header.
In case of an error, the netfilter layer currently drops these packets.
This makes some IPv6 standards tests fail that passed on older kernels,
where the netfilter framework worked on a clone of the packet.

The test case run here is a check for ICMPv6 error message replies
when some invalid IPv6 fragments are sent. This specific test case is
listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf
in the Extension Header Processing Order section.

A packet with an unrecognized option whose Option Type has the high-order
bits 11 is sent, and the test expects an ICMP error in line with RFC 2460,
section 4.2:

11 - discard the packet and, only if the packet's Destination
 Address was not a multicast address, send an ICMP Parameter
 Problem, Code 2, message to the packet's Source Address,
 pointing to the unrecognized Option Type.
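
For reference, the two highest-order bits of an IPv6 Option Type tell a node what to do with an option it does not recognize; the rule quoted above is the 11 case. A small sketch of that decoding per RFC 2460, section 4.2 (the enum and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Action for an unrecognized option, from the top two bits. */
enum opt_action {
	OPT_SKIP = 0,                 /* 00: skip over this option */
	OPT_DISCARD = 1,              /* 01: discard the packet silently */
	OPT_DISCARD_ICMP = 2,         /* 10: discard, always send Parameter Problem code 2 */
	OPT_DISCARD_ICMP_UNICAST = 3, /* 11: discard, send ICMP only if dst is not multicast */
};

static enum opt_action unknown_option_action(uint8_t opt_type)
{
	return (enum opt_action)(opt_type >> 6);
}

/* Should the receiver answer with an ICMPv6 Parameter Problem? */
static int should_send_icmp(uint8_t opt_type, int dst_is_multicast)
{
	switch (unknown_option_action(opt_type)) {
	case OPT_DISCARD_ICMP:
		return 1;
	case OPT_DISCARD_ICMP_UNICAST:
		return !dst_is_multicast;
	default:
		return 0;
	}
}
```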

Since the netfilter layer now drops all invalid IPv6 fragment packets,
we no longer see the ICMP error message, and the test case fails.

To fix this, save the transport header. If defrag is unable to process
the packet because it violates RFC 2460 (a non-final fragment whose end
is not rounded to 8 bytes, now reported as -EPROTO), restore the
transport header and allow the packet to be processed by the stack.
There is no change for other packet processing paths.

Tested by confirming that the stack sends an ICMP error when it receives
these packets. Also tested that fragmented ICMP pings succeed.

v1->v2: Instead of cloning always, save the transport_header and
restore it in case of this specific error. Update the title and
commit message accordingly.

Signed-off-by: Subash Abhinov Kasiviswanathan 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Sasha Levin 
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 977d8900cfd1..ce53dcfda88a 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -231,7 +231,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 
if ((unsigned int)end > IPV6_MAXPLEN) {
pr_debug("offset is too large.\n");
-   return -1;
+   return -EINVAL;
}
 
ecn = ip6_frag_ecn(ipv6_hdr(skb));
@@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 * this case. -DaveM
 */
pr_debug("end of fragment not rounded to 8 bytes.\n");
-   return -1;
+   return -EPROTO;
}
if (end > fq->q.len) {
/* Some bits beyond end -> corruption. */
@@ -358,7 +358,7 @@ found:
 discard_fq:
inet_frag_kill(>q, _frags);
 err:
-   return -1;
+   return -EINVAL;
 }
 
 /*
@@ -567,6 +567,7 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff)
 
 int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
 {
+   u16 savethdr = skb->transport_header;
struct net_device *dev = skb->dev;
int fhoff, nhoff, ret;
struct frag_hdr *fhdr;
@@ -600,8 +601,12 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
 
spin_lock_bh(&fq->q.lock);
 
-   if (nf_ct_frag6_queue(fq, skb, fhdr, nhoff) < 0) {
-   ret = -EINVAL;
+   ret = nf_ct_frag6_queue(fq, skb, fhdr, nhoff);
+   if (ret < 0) {
+   if (ret == -EPROTO) {
+   skb->transport_header = savethdr;
+   ret = 0;
+   }
goto out_unlock;
}
 
-- 
2.15.1
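
The error-code split in this patch, reduced to a userspace sketch. fake_skb, fake_frag6_queue() and fake_frag6_gather() are hypothetical simplifications of the sk_buff, nf_ct_frag6_queue() and nf_ct_frag6_gather(); only the two checks visible in the diff are modelled, and the sketch uses 0 to mean "hand the packet on to the stack" without modelling the real queued/reassembled return paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct fake_skb {
	uint16_t transport_header;
	unsigned int frag_end;   /* end offset of this fragment's payload */
	int more_fragments;      /* M flag from the fragment header */
};

/* Distinct error codes let the caller tell a protocol violation that
 * the IPv6 stack should see apart from a packet that is simply bad. */
static int fake_frag6_queue(struct fake_skb *skb)
{
	skb->transport_header += 8;   /* parsing moves the header */
	if (skb->frag_end > 65535)
		return -EINVAL;       /* "offset is too large" */
	if (skb->more_fragments && (skb->frag_end & 0x7))
		return -EPROTO;       /* "not rounded to 8 bytes" */
	return 0;
}

/* On -EPROTO, undo the header change and let the packet through, so
 * the stack can answer with an ICMPv6 Parameter Problem. */
static int fake_frag6_gather(struct fake_skb *skb)
{
	uint16_t savethdr = skb->transport_header;
	int ret = fake_frag6_queue(skb);

	if (ret == -EPROTO) {
		skb->transport_header = savethdr;
		ret = 0;
	}
	return ret;
}
```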

[PATCH AUTOSEL for 4.15 016/189] platform/x86: dell-laptop: Filter out spurious keyboard backlight change events

2018-04-08 Thread Sasha Levin

From: Hans de Goede 

[ Upstream commit 4d6bde512a86c32df3a1f289d2b4cd04b17758d1 ]

On some Dell XPS models WMI events of type 0x reporting a keycode of
0xe00c get reported when the brightness of the LCD panel changes.

This leads to us reporting false-positive kbd_led change events to
userspace which in turn leads to the kbd backlight OSD showing when it
should not.

We already read the current keyboard backlight brightness value when
reporting events because the led_classdev_notify_brightness_hw_changed
API requires this. Compare this value to the last known value and filter
out duplicate events, fixing this.
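
The filtering described above amounts to "compare with the cached value under a lock, update it, and notify outside the lock only if it changed". Below is a minimal userspace sketch of that pattern. `level_lock`, `cached_level`, `notifications`, `on_backlight_event()` and `set_level()` are hypothetical names standing in for `kbd_led_mutex`, `kbd_led_level`, the `led_classdev_notify_brightness_hw_changed()` call and `kbd_led_level_set()`, and a pthread mutex replaces the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cached state mirroring kbd_led_level / kbd_led_mutex. */
static pthread_mutex_t level_lock = PTHREAD_MUTEX_INITIALIZER;
static int cached_level;
static int notifications; /* events forwarded to "userspace" (demo counter) */

/* Called for every firmware event with the level read back from hardware. */
static void on_backlight_event(int hw_level)
{
	bool changed = false;

	pthread_mutex_lock(&level_lock);
	if (cached_level != hw_level) {
		cached_level = hw_level;
		changed = true;
	}
	pthread_mutex_unlock(&level_lock);

	/* Notify outside the lock, and only for a real change. */
	if (changed)
		notifications++;
}

/* Like kbd_led_level_set(): cache the new value only if the write succeeded. */
static int set_level(int value, int hw_result)
{
	pthread_mutex_lock(&level_lock);
	if (hw_result == 0)
		cached_level = value;
	pthread_mutex_unlock(&level_lock);
	return hw_result;
}
```

Updating the cache on a successful userspace write matters: otherwise the event that follows a user-initiated change would be reported as a hardware change.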

Note the fixed issue is especially a problem on XPS models with an ambient
light sensor and automatic brightness adjustments turned on; there it causes
the kbd backlight OSD to show all the time.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514969
Fixes: 9c656b0799 ("platform/x86: dell-*: Call new led hw_changed API ...")
Acked-by: Pali Rohár 
Signed-off-by: Hans de Goede 
Signed-off-by: Andy Shevchenko 
Signed-off-by: Sasha Levin 
---
 drivers/platform/x86/dell-laptop.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c
index c864430b9fcf..8ddb309cfc85 100644
--- a/drivers/platform/x86/dell-laptop.c
+++ b/drivers/platform/x86/dell-laptop.c
@@ -1149,6 +1149,7 @@ static u8 kbd_previous_mode_bit;
 
 static bool kbd_led_present;
 static DEFINE_MUTEX(kbd_led_mutex);
+static enum led_brightness kbd_led_level;
 
 /*
  * NOTE: there are three ways to set the keyboard backlight level.
@@ -1971,6 +1972,7 @@ static enum led_brightness kbd_led_level_get(struct led_classdev *led_cdev)
 static int kbd_led_level_set(struct led_classdev *led_cdev,
 enum led_brightness value)
 {
+   enum led_brightness new_value = value;
struct kbd_state state;
struct kbd_state new_state;
u16 num;
@@ -2000,6 +2002,9 @@ static int kbd_led_level_set(struct led_classdev *led_cdev,
}
 
 out:
+   if (ret == 0)
+   kbd_led_level = new_value;
+
mutex_unlock(_led_mutex);
return ret;
 }
@@ -2027,6 +2032,9 @@ static int __init kbd_led_init(struct device *dev)
if (kbd_led.max_brightness)
kbd_led.max_brightness--;
}
+
+   kbd_led_level = kbd_led_level_get(NULL);
+
ret = led_classdev_register(dev, _led);
if (ret)
kbd_led_present = false;
@@ -2051,13 +2059,25 @@ static void kbd_led_exit(void)
 static int dell_laptop_notifier_call(struct notifier_block *nb,
 unsigned long action, void *data)
 {
+   bool changed = false;
+   enum led_brightness new_kbd_led_level;
+
switch (action) {
case DELL_LAPTOP_KBD_BACKLIGHT_BRIGHTNESS_CHANGED:
if (!kbd_led_present)
break;
 
-   led_classdev_notify_brightness_hw_changed(_led,
-   kbd_led_level_get(_led));
+   mutex_lock(_led_mutex);
+   new_kbd_led_level = kbd_led_level_get(_led);
+   if (kbd_led_level != new_kbd_led_level) {
+   kbd_led_level = new_kbd_led_level;
+   changed = true;
+   }
+   mutex_unlock(_led_mutex);
+
+   if (changed)
+   led_classdev_notify_brightness_hw_changed(_led,
+   kbd_led_level);
break;
}
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 015/189] KVM: s390: use created_vcpus in more places

2018-04-08 Thread Sasha Levin

From: Christian Borntraeger 

[ Upstream commit 241e3ec0faf5ab1a0d9b1f6c43eefa919fb9c112 ]

commit a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus") introduced
kvm->created_vcpus to avoid races with the existing kvm->online_vcpus
scheme. One place was "forgotten" and one new place was "added".
Let's fix those.
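
A userspace sketch of why checking `online_vcpus` can race while `created_vcpus` does not. It assumes, as the commit message implies, that `created_vcpus` is bumped under the VM lock when VCPU creation starts and `online_vcpus` only once the VCPU is fully set up. `struct vm`, `vcpu_create_begin()`, `vcpu_create_finish()` and `enable_vm_cap()` are hypothetical; the `use_created` flag only exists to show the old and new checks side by side.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical VM state with the two counters discussed above. */
struct vm {
	pthread_mutex_t lock;
	int created_vcpus; /* bumped under lock when VCPU creation starts */
	int online_vcpus;  /* bumped only once a VCPU is fully set up */
};

/* Start creating a VCPU: reserve it in created_vcpus under the lock. */
static void vcpu_create_begin(struct vm *vm)
{
	pthread_mutex_lock(&vm->lock);
	vm->created_vcpus++;
	pthread_mutex_unlock(&vm->lock);
}

/* Finish creating a VCPU: only now does it count as online. */
static void vcpu_create_finish(struct vm *vm)
{
	pthread_mutex_lock(&vm->lock);
	vm->online_vcpus++;
	pthread_mutex_unlock(&vm->lock);
}

/* VM-wide setting that must not change once any VCPU exists. */
static int enable_vm_cap(struct vm *vm, int use_created)
{
	int r = 0;

	pthread_mutex_lock(&vm->lock);
	if (use_created ? vm->created_vcpus : vm->online_vcpus)
		r = -EBUSY;
	pthread_mutex_unlock(&vm->lock);
	return r;
}
```

Between `vcpu_create_begin()` and `vcpu_create_finish()`, the `online_vcpus` check still allows the change, so a VCPU being created could miss the new VM-wide setting; the `created_vcpus` check rejects it.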

Reported-by: Halil Pasic 
Signed-off-by: Christian Borntraeger 
Reviewed-by: Halil Pasic 
Reviewed-by: Cornelia Huck 
Reviewed-by: David Hildenbrand 
Fixes: 4e0b1ab72b8a ("KVM: s390: gs support for kvm guests")
Fixes: a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus")
Signed-off-by: Sasha Levin 
---
 arch/s390/kvm/kvm-s390.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 004684eaa827..50193cbc819a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -602,7 +602,7 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
case KVM_CAP_S390_GS:
r = -EINVAL;
mutex_lock(>lock);
-   if (atomic_read(>online_vcpus)) {
+   if (kvm->created_vcpus) {
r = -EBUSY;
} else if (test_facility(133)) {
set_kvm_facility(kvm->arch.model.fac_mask, 133);
@@ -1122,7 +1122,7 @@ static int kvm_s390_set_processor_feat(struct kvm *kvm,
return -EINVAL;
 
mutex_lock(>lock);
-   if (!atomic_read(>online_vcpus)) {
+   if (!kvm->created_vcpus) {
bitmap_copy(kvm->arch.cpu_feat, (unsigned long *) data.feat,
KVM_S390_VM_CPU_FEAT_NR_BITS);
ret = 0;
-- 
2.15.1

[PATCH AUTOSEL for 4.15 018/189] xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_request

2018-04-08 Thread Sasha Levin

From: Chuck Lever 

[ Upstream commit 42b9f5c58aa8c59c91ead0254f0c193e3438b020 ]

The rpcrdma_req is not shared yet, and its associated Send hasn't
been posted, thus RMW should be safe. There's no need for the
expense of a lock cycle here.
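
The difference between `set_bit()` and `__set_bit()` is atomic versus plain read-modify-write. The C11 analogy below is only a sketch: the kernel helpers are not implemented this way, and `REQ_F_PENDING`, `atomic_set_flag()` and `plain_set_flag()` are hypothetical names. The plain version is cheaper but is correct only while no other thread can touch the same word, which is the condition the commit message relies on (the request is not shared and its Send has not been posted).

```c
#include <stdatomic.h>

#define REQ_F_PENDING 0 /* hypothetical stand-in for RPCRDMA_REQ_F_PENDING */

/* Atomic read-modify-write, analogous to the kernel's set_bit(). */
static void atomic_set_flag(atomic_ulong *flags, int nr)
{
	atomic_fetch_or(flags, 1UL << nr);
}

/*
 * Plain read-modify-write, analogous to __set_bit(): no locked or
 * exclusive memory operation, so concurrent updates to *flags would race.
 */
static void plain_set_flag(unsigned long *flags, int nr)
{
	*flags |= 1UL << nr;
}
```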

Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion")
Signed-off-by: Chuck Lever 
Signed-off-by: Anna Schumaker 
Signed-off-by: Sasha Levin 
---
 net/sunrpc/xprtrdma/transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 6ee1ad8978f3..76c03aa6cb57 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -744,7 +744,7 @@ xprt_rdma_send_request(struct rpc_task *task)
goto drop_connection;
req->rl_connect_cookie = xprt->connect_cookie;
 
-   set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags);
+   __set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags);
if (rpcrdma_ep_post(_xprt->rx_ia, _xprt->rx_ep, req))
goto drop_connection;
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 019/189] printk: Add console owner and waiter logic to load balance console writes

2018-04-08 Thread Sasha Levin

From: "Steven Rostedt (VMware)" 

[ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]

This patch implements what I discussed at Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
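
The owner/waiter handoff described above can be followed as a single-threaded state model. This is a sketch, not the printk code: `struct console_model`, `printk_enter()`, `owner_begin_message()` and `owner_end_message()` are hypothetical, the calls are made in sequence instead of from concurrent CPUs, and the spinning and `console_owner_lock` are left out. It shows the key property: the owner hands off without releasing console_sem, so the former waiter becomes the new holder.

```c
#include <stdbool.h>

#define NO_TASK (-1)

/* Hypothetical model of console_sem, console_owner and console_waiter. */
struct console_model {
	bool sem_held;   /* console_sem taken? */
	int sem_holder;  /* task currently responsible for printing */
	int owner;       /* console_owner: task inside the console drivers */
	bool waiter;     /* console_waiter flag */
	int waiter_task; /* task that set console_waiter */
};

/* printk() path: trylock; otherwise try to become the single waiter. */
static bool printk_enter(struct console_model *c, int task)
{
	if (!c->sem_held) {
		c->sem_held = true;
		c->sem_holder = task;
		return true; /* this task prints via console_unlock() */
	}
	if (c->owner != NO_TASK && c->owner != task && !c->waiter) {
		c->waiter = true; /* the real code spins until this is cleared */
		c->waiter_task = task;
	}
	return false;
}

/* The holder starts writing one message to the consoles. */
static void owner_begin_message(struct console_model *c)
{
	c->owner = c->sem_holder;
}

/* One message done; returns true if console_sem was handed off. */
static bool owner_end_message(struct console_model *c)
{
	c->owner = NO_TASK;
	if (!c->waiter)
		return false;
	/* Clearing the flag releases the spinner; console_sem stays held. */
	c->waiter = false;
	c->sem_holder = c->waiter_task;
	c->waiter_task = NO_TASK;
	return true;
}
```

Because only one task can be the waiter at a time, each handoff moves the printing load to exactly one new task, which then continues in console_unlock().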

By Petr Mladek about possible new deadlocks:

The thing is that we move console_sem only to a printk() call
that would normally call console_unlock() as well. It means that
the transferred owner should not bring new types of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

    CPU0                          CPU1

    console_unlock()

      console_owner = current;

                                  spin_lockA()
                                    printk()
                                      spin = true;
                                      while (...)

        call_console_drivers()
          spin_lockA()

This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.

But if the above is true then the following scenario was
already possible before:

    CPU0

    spin_lockA()
      printk()
        console_unlock()
          call_console_drivers()
            spin_lockA()

In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.

By Steven Rostedt:

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

 #include 
 #include 
 #include 
 #include 
 #include 
 #include 

 static bool stop_testing;
 static unsigned int loops = 1;

 static void preempt_printk_workfn(struct work_struct *work)
 {
 	int i;

 	while (!READ_ONCE(stop_testing)) {
 		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
 			preempt_disable();
 			pr_emerg("%5d%-75s\n", smp_processor_id(),
 				 " XXX NOPREEMPT");
 			preempt_enable();
 		}
 		msleep(1);
 	}
 }

 static struct work_struct __percpu *works;

 static void finish(void)
 {
 	int cpu;

 	WRITE_ONCE(stop_testing, true);
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
 	free_percpu(works);
 }

 static int __init test_init(void)
 {
 	int cpu;

 	works = alloc_percpu(struct work_struct);
 	if (!works)
 		return -ENOMEM;

 	/*
 	 * This is just a test module. This will break if you
 	 * do any CPU hot plugging between loading and
 	 * unloading the module.
 	 */

 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);

 		INIT_WORK(work, _printk_workfn);
 		schedule_work_on(cpu, work);
 	}

 	return 0;
 }

 static void __exit test_exit(void)
 {
 	finish();
 }

 module_param(loops, uint,

[PATCH AUTOSEL for 4.15 017/189] xprtrdma: Fix backchannel allocation of extra rpcrdma_reps

2018-04-08 Thread Sasha Levin

From: Chuck Lever 

[ Upstream commit d698c4a02ee02053bbebe051322ff427a2dad56a ]

The backchannel code uses rpcrdma_recv_buffer_put to add new reps
to the free rep list. This also decrements rb_recv_count, which
spoofs the receive overrun logic in rpcrdma_buffer_get_rep.

Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of
spin_lock_irqsave(rb_lock)") replaced the original open-coded
list_add with a call to rpcrdma_recv_buffer_put(), but then a year
later, commit 05c974669ece ("xprtrdma: Fix receive buffer
accounting") added rep accounting to rpcrdma_recv_buffer_put.
It was an oversight to let the backchannel continue to use this
function.

To fix this, let's combine the "add to free list" logic with
rpcrdma_create_rep.

Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in
rpcrdma_buffer_create and then allocate additional rpcrdma_reps in
rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel
set-up is sufficient.
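
The reworked allocation path can be sketched as "allocate, then add to the free list under the lock, returning 0 or a negative errno", so no caller has to go through a put helper that also adjusts receive accounting. This is a heavily simplified userspace sketch: `struct rep`, `struct rep_pool`, `create_rep()`, `create_reps()` and `destroy_reps()` are hypothetical, a pthread mutex stands in for `rb_lock`, and `nr_free` is only a list length, not the kernel's `rb_recv_count`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified reply buffer and pool. */
struct rep {
	struct rep *next;
};

struct rep_pool {
	pthread_mutex_t lock;  /* plays the role of rb_lock */
	struct rep *free_list; /* plays the role of rb_recv_bufs */
	int nr_free;           /* list length only, no receive accounting */
};

/* Allocate one rep and put it straight on the free list. */
static int create_rep(struct rep_pool *pool)
{
	struct rep *rep = calloc(1, sizeof(*rep));

	if (!rep)
		return -ENOMEM;
	pthread_mutex_lock(&pool->lock);
	rep->next = pool->free_list;
	pool->free_list = rep;
	pool->nr_free++;
	pthread_mutex_unlock(&pool->lock);
	return 0;
}

/* Shared loop, usable by both the forward and backchannel set-up paths. */
static int create_reps(struct rep_pool *pool, unsigned int count)
{
	int rc = 0;

	while (count--) {
		rc = create_rep(pool);
		if (rc)
			break;
	}
	return rc;
}

/* Free everything on the list. */
static void destroy_reps(struct rep_pool *pool)
{
	while (pool->free_list) {
		struct rep *rep = pool->free_list;

		pool->free_list = rep->next;
		free(rep);
	}
	pool->nr_free = 0;
}
```

With this shape, buffer creation allocates only the forward-channel reps and the backchannel set-up adds its extra reps through the same function, which is the split the commit message describes.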

Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting")
Signed-off-by: Chuck Lever 
Signed-off-by: Anna Schumaker 
Signed-off-by: Sasha Levin 
---
 net/sunrpc/xprtrdma/backchannel.c | 12 ++--
 net/sunrpc/xprtrdma/verbs.c   | 32 +++-
 net/sunrpc/xprtrdma/xprt_rdma.h   |  2 +-
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 8b818bb3518a..256c67b433c1 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -74,21 +74,13 @@ out_fail:
 static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
 unsigned int count)
 {
-   struct rpcrdma_rep *rep;
int rc = 0;
 
while (count--) {
-   rep = rpcrdma_create_rep(r_xprt);
-   if (IS_ERR(rep)) {
-   pr_err("RPC:   %s: reply buffer alloc failed\n",
-  __func__);
-   rc = PTR_ERR(rep);
+   rc = rpcrdma_create_rep(r_xprt);
+   if (rc)
break;
-   }
-
-   rpcrdma_recv_buffer_put(rep);
}
-
return rc;
 }
 
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8cd7ee4fa0cd..371fbd9b55bb 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1093,10 +1093,17 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
return req;
 }
 
-struct rpcrdma_rep *
+/**
+ * rpcrdma_create_rep - Allocate an rpcrdma_rep object
+ * @r_xprt: controlling transport
+ *
+ * Returns 0 on success or a negative errno on failure.
+ */
+int
 rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
 {
struct rpcrdma_create_data_internal *cdata = _xprt->rx_data;
+   struct rpcrdma_buffer *buf = _xprt->rx_buf;
struct rpcrdma_rep *rep;
int rc;
 
@@ -1121,12 +1128,18 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
rep->rr_recv_wr.wr_cqe = >rr_cqe;
rep->rr_recv_wr.sg_list = >rr_rdmabuf->rg_iov;
rep->rr_recv_wr.num_sge = 1;
-   return rep;
+
+   spin_lock(>rb_lock);
+   list_add(>rr_list, >rb_recv_bufs);
+   spin_unlock(>rb_lock);
+   return 0;
 
 out_free:
kfree(rep);
 out:
-   return ERR_PTR(rc);
+   dprintk("RPC:   %s: reply buffer %d alloc failed\n",
+   __func__, rc);
+   return rc;
 }
 
 int
@@ -1167,17 +1180,10 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
}
 
INIT_LIST_HEAD(>rb_recv_bufs);
-   for (i = 0; i < buf->rb_max_requests + RPCRDMA_MAX_BC_REQUESTS; i++) {
-   struct rpcrdma_rep *rep;
-
-   rep = rpcrdma_create_rep(r_xprt);
-   if (IS_ERR(rep)) {
-   dprintk("RPC:   %s: reply buffer %d alloc failed\n",
-   __func__, i);
-   rc = PTR_ERR(rep);
+   for (i = 0; i <= buf->rb_max_requests; i++) {
+   rc = rpcrdma_create_rep(r_xprt);
+   if (rc)
goto out;
-   }
-   list_add(>rr_list, >rb_recv_bufs);
}
 
rc = rpcrdma_sendctxs_create(r_xprt);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 1342f743f1c4..3b63e61feae2 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -564,8 +564,8 @@ int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *);
  * Buffer calls - xprtrdma/verbs.c
  */
 struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
-struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *);
 void rpcrdma_destroy_req(struct rpcrdma_req *);
+int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt);
 int rpcrdma_buffer_create(struct rpcrdma_xprt *);
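 void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
 struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf);
-- 
2.15.1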

[PATCH AUTOSEL for 4.15 009/189] ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()

2018-04-08 Thread Sasha Levin

From: Dan Carpenter 

[ Upstream commit 123af9043e93cb6f235207d260d50f832cdb5439 ]

The loop timeout doesn't work because it's a post op and ends with "tmo"
set to -1.  I changed it from a post-op to a pre-op and I changed the
starting value from 5 to 6 so we still iterate 5 times.  I
left the other as it was because it's a large number.
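
The post-decrement bug is easy to reproduce outside the driver. In the sketch below, `poll_status()` is a hypothetical stand-in for reading `STAT_CP` from `AC97_STATUS` (it reports "busy" for a given number of reads) and incrementing `*iters` stands in for `udelay(21)`. With `tmo--` the loop leaves `tmo` at -1 when it times out, so `if (!tmo)` never fires; with `--tmo` starting from 6 it still waits at most 5 times and ends at 0 on timeout.

```c
/* Hypothetical status read: "busy" for the first *busy_polls reads. */
static int poll_status(int *busy_polls)
{
	if (*busy_polls > 0) {
		(*busy_polls)--;
		return 1; /* STAT_CP still set */
	}
	return 0;
}

/* Old loop (post-decrement); returns the final tmo, *iters = delays done. */
static int wait_post(int busy_polls, int *iters)
{
	int tmo = 5;

	*iters = 0;
	while (poll_status(&busy_polls) && tmo--)
		(*iters)++;
	return tmo;
}

/* Patched loop (pre-decrement from 6); returns the final tmo. */
static int wait_pre(int busy_polls, int *iters)
{
	int tmo = 6;

	*iters = 0;
	while (poll_status(&busy_polls) && --tmo)
		(*iters)++;
	return tmo;
}
```

If the device never becomes ready, `wait_post()` returns -1 after 5 delays and `wait_pre()` returns 0 after 5 delays, so only the patched form lets the `!tmo` check detect the timeout.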

Fixes: b3c70c9ea62a ("ASoC: Alchemy AC97C/I2SC audio support")
Signed-off-by: Dan Carpenter 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/au1x/ac97c.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sound/soc/au1x/ac97c.c b/sound/soc/au1x/ac97c.c
index 29a97d52e8ad..66d6c52e7761 100644
--- a/sound/soc/au1x/ac97c.c
+++ b/sound/soc/au1x/ac97c.c
@@ -91,8 +91,8 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97,
do {
mutex_lock(>lock);
 
-   tmo = 5;
-   while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--)
+   tmo = 6;
+   while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo)
udelay(21); /* wait an ac97 frame time */
if (!tmo) {
pr_debug("ac97rd timeout #1\n");
@@ -105,7 +105,7 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97,
 * poll, Forrest, poll...
 */
tmo = 0x1;
-   while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--)
+   while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo)
asm volatile ("nop");
data = RD(ctx, AC97_CMDRESP);
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 017/189] xprtrdma: Fix backchannel allocation of extra rpcrdma_reps

2018-04-08 Thread Sasha Levin

From: Chuck Lever 

[ Upstream commit d698c4a02ee02053bbebe051322ff427a2dad56a ]

The backchannel code uses rpcrdma_recv_buffer_put to add new reps
to the free rep list. This also decrements rb_recv_count, which
spoofs the receive overrun logic in rpcrdma_buffer_get_rep.

Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of
spin_lock_irqsave(rb_lock)") replaced the original open-coded
list_add with a call to rpcrdma_recv_buffer_put(), but then a year
later, commit 05c974669ece ("xprtrdma: Fix receive buffer
accounting") added rep accounting to rpcrdma_recv_buffer_put.
It was an oversight to let the backchannel continue to use this
function.

The fix this, let's combine the "add to free list" logic with
rpcrdma_create_rep.

Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in
rpcrdma_buffer_create and then allocate additional rpcrdma_reps in
rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel
set-up is sufficient.

Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting")
Signed-off-by: Chuck Lever 
Signed-off-by: Anna Schumaker 
Signed-off-by: Sasha Levin 
---
 net/sunrpc/xprtrdma/backchannel.c | 12 ++--
 net/sunrpc/xprtrdma/verbs.c   | 32 +++-
 net/sunrpc/xprtrdma/xprt_rdma.h   |  2 +-
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c 
b/net/sunrpc/xprtrdma/backchannel.c
index 8b818bb3518a..256c67b433c1 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -74,21 +74,13 @@ out_fail:
 static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
 unsigned int count)
 {
-   struct rpcrdma_rep *rep;
int rc = 0;
 
while (count--) {
-   rep = rpcrdma_create_rep(r_xprt);
-   if (IS_ERR(rep)) {
-   pr_err("RPC:   %s: reply buffer alloc failed\n",
-  __func__);
-   rc = PTR_ERR(rep);
+   rc = rpcrdma_create_rep(r_xprt);
+   if (rc)
break;
-   }
-
-   rpcrdma_recv_buffer_put(rep);
}
-
return rc;
 }
 
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8cd7ee4fa0cd..371fbd9b55bb 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1093,10 +1093,17 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
return req;
 }
 
-struct rpcrdma_rep *
+/**
+ * rpcrdma_create_rep - Allocate an rpcrdma_rep object
+ * @r_xprt: controlling transport
+ *
+ * Returns 0 on success or a negative errno on failure.
+ */
+int
 rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
 {
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
+   struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
struct rpcrdma_rep *rep;
int rc;
 
@@ -1121,12 +1128,18 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
rep->rr_recv_wr.wr_cqe = &rep->rr_cqe;
rep->rr_recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
rep->rr_recv_wr.num_sge = 1;
-   return rep;
+
+   spin_lock(&buf->rb_lock);
+   list_add(&rep->rr_list, &buf->rb_recv_bufs);
+   spin_unlock(&buf->rb_lock);
+   return 0;
 
 out_free:
kfree(rep);
 out:
-   return ERR_PTR(rc);
+   dprintk("RPC:   %s: reply buffer %d alloc failed\n",
+   __func__, rc);
+   return rc;
 }
 
 int
@@ -1167,17 +1180,10 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
}
 
INIT_LIST_HEAD(&buf->rb_recv_bufs);
-   for (i = 0; i < buf->rb_max_requests + RPCRDMA_MAX_BC_REQUESTS; i++) {
-   struct rpcrdma_rep *rep;
-
-   rep = rpcrdma_create_rep(r_xprt);
-   if (IS_ERR(rep)) {
-   dprintk("RPC:   %s: reply buffer %d alloc failed\n",
-   __func__, i);
-   rc = PTR_ERR(rep);
+   for (i = 0; i <= buf->rb_max_requests; i++) {
+   rc = rpcrdma_create_rep(r_xprt);
+   if (rc)
goto out;
-   }
-   list_add(&rep->rr_list, &buf->rb_recv_bufs);
}
 
rc = rpcrdma_sendctxs_create(r_xprt);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 1342f743f1c4..3b63e61feae2 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -564,8 +564,8 @@ int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *);
  * Buffer calls - xprtrdma/verbs.c
  */
 struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
-struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *);
 void rpcrdma_destroy_req(struct rpcrdma_req *);
+int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt);
 int rpcrdma_buffer_create(struct rpcrdma_xprt *);
 void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
 struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf);
-- 
2.15.1

[PATCH AUTOSEL for 4.15 009/189] ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()

2018-04-08 Thread Sasha Levin

From: Dan Carpenter 

[ Upstream commit 123af9043e93cb6f235207d260d50f832cdb5439 ]

The loop timeout doesn't work because it's a post-op and ends with "tmo"
set to -1.  I changed it from a post-op to a pre-op and I changed the
starting value from 5 to 6 so we still iterate 5 times.  I left the other
starting value as it was because it's a large number.

Fixes: b3c70c9ea62a ("ASoC: Alchemy AC97C/I2SC audio support")
Signed-off-by: Dan Carpenter 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/au1x/ac97c.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sound/soc/au1x/ac97c.c b/sound/soc/au1x/ac97c.c
index 29a97d52e8ad..66d6c52e7761 100644
--- a/sound/soc/au1x/ac97c.c
+++ b/sound/soc/au1x/ac97c.c
@@ -91,8 +91,8 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97,
do {
mutex_lock(&ctx->lock);
 
-   tmo = 5;
-   while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--)
+   tmo = 6;
+   while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo)
udelay(21); /* wait an ac97 frame time */
if (!tmo) {
pr_debug("ac97rd timeout #1\n");
@@ -105,7 +105,7 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97,
 * poll, Forrest, poll...
 */
tmo = 0x1;
-   while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--)
+   while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo)
asm volatile ("nop");
data = RD(ctx, AC97_CMDRESP);
 
-- 
2.15.1

[PATCH AUTOSEL for 4.15 024/189] Input: synaptics - reset the ABS_X/Y fuzz after initializing MT axes

2018-04-08 Thread Sasha Levin

From: Peter Hutterer 

[ Upstream commit 19eb4ed1141bd1096b9bc84ba9c4d03d5830c143 ]

input_mt_init_slots() resets the ABS_X/Y fuzz to 0 and expects the driver
to call input_mt_report_pointer_emulation(). That is based on the MT
position bits which are already defuzzed - hence a fuzz of 0.

In the case of synaptics semi-mt devices, we report the ABS_X/Y axes
manually.  This results in the MT position being defuzzed but the
single-touch emulation missing that defuzzing.

Work around this by re-initializing the ABS_X/Y axes after the MT axes to
get the same fuzz value back.

https://bugs.freedesktop.org/show_bug.cgi?id=104533

Signed-off-by: Peter Hutterer 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Sasha Levin 
---
 drivers/input/mouse/synaptics.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
index ee5466a374bf..a246fc686bb7 100644
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -1280,6 +1280,16 @@ static void set_input_params(struct psmouse *psmouse,
INPUT_MT_POINTER |
(cr48_profile_sensor ?
INPUT_MT_TRACK : INPUT_MT_SEMI_MT));
+
+   /*
+* For semi-mt devices we send ABS_X/Y ourselves instead of
+* input_mt_report_pointer_emulation. But
+* input_mt_init_slots() resets the fuzz to 0, leading to a
+* filtered ABS_MT_POSITION_X but an unfiltered ABS_X
+* position. Let's re-initialize ABS_X/Y here.
+*/
+   if (!cr48_profile_sensor)
+   set_abs_position_params(dev, &priv->info, ABS_X, ABS_Y);
}
 
if (SYN_CAP_PALMDETECT(info->capabilities))
-- 
2.15.1

[PATCH AUTOSEL for 4.15 022/189] Input: psmouse - fix Synaptics detection when protocol is disabled

2018-04-08 Thread Sasha Levin

From: Dmitry Torokhov 

[ Upstream commit 2bc4298f59d2f15175bb568e2d356b5912d0cdd9 ]

When the Synaptics protocol is disabled, we still need to try to detect the
hardware, so that we can switch to the SMBus device if SMBus is detected, or,
if we know it is a Synaptics device, reset it properly for the bare PS/2
protocol.

Fixes: c378b5119eb0 ("Input: psmouse - factor out common protocol probing code")
Reported-by: Matteo Croce 
Tested-by: Matteo Croce 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Sasha Levin 
---
 drivers/input/mouse/psmouse-base.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
index 6a5649e52eed..8ac9e03c05b4 100644
--- a/drivers/input/mouse/psmouse-base.c
+++ b/drivers/input/mouse/psmouse-base.c
@@ -975,6 +975,21 @@ static void psmouse_apply_defaults(struct psmouse *psmouse)
psmouse->pt_deactivate = NULL;
 }
 
+static bool psmouse_do_detect(int (*detect)(struct psmouse *, bool),
+ struct psmouse *psmouse, bool allow_passthrough,
+ bool set_properties)
+{
+   if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU &&
+   !allow_passthrough) {
+   return false;
+   }
+
+   if (set_properties)
+   psmouse_apply_defaults(psmouse);
+
+   return detect(psmouse, set_properties) == 0;
+}
+
 static bool psmouse_try_protocol(struct psmouse *psmouse,
 enum psmouse_type type,
 unsigned int *max_proto,
@@ -986,15 +1001,8 @@ static bool psmouse_try_protocol(struct psmouse *psmouse,
if (!proto)
return false;
 
-   if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU &&
-   !proto->try_passthru) {
-   return false;
-   }
-
-   if (set_properties)
-   psmouse_apply_defaults(psmouse);
-
-   if (proto->detect(psmouse, set_properties) != 0)
+   if (!psmouse_do_detect(proto->detect, psmouse, proto->try_passthru,
+  set_properties))
return false;
 
if (set_properties && proto->init && init_allowed) {
@@ -1027,8 +1035,8 @@ static int psmouse_extensions(struct psmouse *psmouse,
 * Always check for focaltech, this is safe as it uses pnp-id
 * matching.
 */
-   if (psmouse_try_protocol(psmouse, PSMOUSE_FOCALTECH,
-&max_proto, set_properties, false)) {
+   if (psmouse_do_detect(focaltech_detect,
+ psmouse, false, set_properties)) {
if (max_proto > PSMOUSE_IMEX &&
IS_ENABLED(CONFIG_MOUSE_PS2_FOCALTECH) &&
(!set_properties || focaltech_init(psmouse) == 0)) {
@@ -1074,8 +1082,8 @@ static int psmouse_extensions(struct psmouse *psmouse,
 * probing for IntelliMouse.
 */
if (max_proto > PSMOUSE_PS2 &&
-   psmouse_try_protocol(psmouse, PSMOUSE_SYNAPTICS, &max_proto,
-set_properties, false)) {
+   psmouse_do_detect(synaptics_detect,
+ psmouse, false, set_properties)) {
synaptics_hardware = true;
 
if (max_proto > PSMOUSE_IMEX) {
-- 
2.15.1

[PATCH AUTOSEL for 4.15 025/189] i40iw: Free IEQ resources

2018-04-08 Thread Sasha Levin

From: Mustafa Ismail 

[ Upstream commit f20d429511affab6a2a9129f46042f43e6ffe396 ]

The iWARP Exception Queue (IEQ) resources are not freed when a QP is
destroyed. Fix this by freeing IEQ resources when freeing QP resources.

Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail 
Signed-off-by: Shiraz Saleem 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Sasha Levin 
---
 drivers/infiniband/hw/i40iw/i40iw_puda.c  | 3 +--
 drivers/infiniband/hw/i40iw/i40iw_puda.h  | 1 +
 drivers/infiniband/hw/i40iw/i40iw_verbs.c | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.c b/drivers/infiniband/hw/i40iw/i40iw_puda.c
index 796a815b53fd..266c5952ba92 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_puda.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_puda.c
@@ -48,7 +48,6 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid);
 static void i40iw_ilq_putback_rcvbuf(struct i40iw_sc_qp *qp, u32 wqe_idx);
 static enum i40iw_status_code i40iw_puda_replenish_rq(struct i40iw_puda_rsrc
  *rsrc, bool initial);
-static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp);
 /**
  * i40iw_puda_get_listbuf - get buffer from puda list
  * @list: list to use for buffers (ILQ or IEQ)
@@ -1483,7 +1482,7 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid)
  * @ieq: ieq resource
  * @qp: all pending fpdu buffers
  */
-static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp)
+void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp)
 {
struct i40iw_puda_buf *buf;
struct i40iw_pfpdu *pfpdu = &qp->pfpdu;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.h b/drivers/infiniband/hw/i40iw/i40iw_puda.h
index 660aa3edae56..53a7d58c84b5 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_puda.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_puda.h
@@ -184,4 +184,5 @@ enum i40iw_status_code i40iw_cqp_qp_create_cmd(struct i40iw_sc_dev *dev, struct
 enum i40iw_status_code i40iw_cqp_cq_create_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq);
 void i40iw_cqp_qp_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_qp *qp);
 void i40iw_cqp_cq_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq);
+void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp);
 #endif
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 3c6f3ce88f89..6aa613835405 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -412,6 +412,7 @@ void i40iw_free_qp_resources(struct i40iw_device *iwdev,
 {
struct i40iw_pbl *iwpbl = &iwqp->iwpbl;
 
+   i40iw_ieq_cleanup_qp(iwdev->vsi.ieq, &iwqp->sc_qp);
i40iw_dealloc_push_page(iwdev, &iwqp->sc_qp);
if (qp_num)
i40iw_free_resource(iwdev, iwdev->allocated_qps, qp_num);
-- 
2.15.1

Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread 张海斌


> On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> > handle_tx will delay rx for tens or even hundreds of milliseconds when tx
> > is busy polling UDP packets with a small length (e.g. a 1-byte UDP
> > payload), because VHOST_NET_WEIGHT takes into account only the bytes
> > sent, not the number of packets.
> > 
> > The ping latencies below were measured between two virtual machines
> > running netperf (UDP_STREAM, len=1), with another machine pinging the
> > client:
> > 
> > Packet-Weight   Ping latencies (milliseconds)
> >                   min       avg       max
> > Origin          3.319    18.489    57.303
> > 64              1.643     2.021     2.552
> > 128             1.825     2.600     3.224
> > 256             1.997     2.710     4.295
> > 512             1.860     3.171     4.631
> > 1024            2.002     4.173     9.056
> > 2048            2.257     5.650     9.688
> > 4096            2.093     8.508    15.943
>
> And this is with Q size 256 right?

Yes. Ping latencies with a VQ size of 512 are shown below.

Packet-Weight   Ping latencies (milliseconds)
                  min       avg       max
Origin          6.357    29.177    66.245
64              2.798     3.614     4.403
128             2.861     3.820     4.775
256             3.008     4.018     4.807
512             3.254     4.523     5.824
1024            3.079     5.335     7.747
2048            3.944     8.201    12.762
4096            4.158    11.057    19.985

We will submit again. Is there anything else?

>
> > Ring size is a hint from the device about the burst size it can tolerate.
> > Based on benchmarks, set the weight to 2 * vq size.
> > 
> > To evaluate this change, further tests were run using netperf (RR, TX)
> > between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, with
> > the vq size tweaked through QEMU. The results below show no obvious changes.
>
> What I asked for is ping-latency with different VQ sizes,
> streaming below does not show anything.
>
> > vq size=256 TCP_RR                  vq size=512 TCP_RR
> > size/sessions/+thu%/+normalize%     size/sessions/+thu%/+normalize%
> >    1/   1/  -7%/-2%                    1/   1/   0%/-2%
> >    1/   4/  +1%/ 0%                    1/   4/  +1%/ 0%
> >    1/   8/  +1%/-2%                    1/   8/   0%/+1%
> >   64/   1/  -6%/ 0%                   64/   1/  +7%/+3%
> >   64/   4/   0%/+2%                   64/   4/  -1%/+1%
> >   64/   8/   0%/ 0%                   64/   8/  -1%/-2%
> >  256/   1/  -3%/-4%                  256/   1/  -4%/-2%
> >  256/   4/  +3%/+4%                  256/   4/  +1%/+2%
> >  256/   8/  +2%/ 0%                  256/   8/  +1%/-1%
> > 
> > vq size=256 UDP_RR                  vq size=512 UDP_RR
> > size/sessions/+thu%/+normalize%     size/sessions/+thu%/+normalize%
> >    1/   1/  -5%/+1%                    1/   1/  -3%/-2%
> >    1/   4/  +4%/+1%                    1/   4/  -2%/+2%
> >    1/   8/  -1%/-1%                    1/   8/  -1%/ 0%
> >   64/   1/  -2%/-3%                   64/   1/  +1%/+1%
> >   64/   4/  -5%/-1%                   64/   4/  +2%/ 0%
> >   64/   8/   0%/-1%                   64/   8/  -2%/+1%
> >  256/   1/  +7%/+1%                  256/   1/  -7%/ 0%
> >  256/   4/  +1%/+1%                  256/   4/  -3%/-4%
> >  256/   8/  +2%/+2%                  256/   8/  +1%/+1%
> > 
> > vq size=256 TCP_STREAM              vq size=512 TCP_STREAM
> > size/sessions/+thu%/+normalize%     size/sessions/+thu%/+normalize%
> >   64/   1/   0%/-3%                   64/   1/   0%/ 0%
> >   64/   4/  +3%/-1%                   64/   4/  -2%/+4%
> >   64/   8/  +9%/-4%                   64/   8/  -1%/+2%
> >  256/   1/  +1%/-4%                  256/   1/  +1%/+1%
> >  256/   4/  -1%/-1%                  256/   4/  -3%/ 0%
> >  256/   8/  +7%/+5%                  256/   8/  -3%/ 0%
> >  512/   1/  +1%/ 0%                  512/   1/  -1%/-1%
> >  512/   4/  +1%/-1%                  512/   4/   0%/ 0%
> >  512/   8/  +7%/-5%                  512/   8/  +6%/-1%
> > 1024/   1/   0%/-1%                 1024/   1/   0%/+1%
> > 1024/   4/  +3%/ 0%                 1024/   4/  +1%/ 0%
> > 1024/   8/  +8%/+5%                 1024/   8/  -1%/ 0%
> > 2048/   1/  +2%/+2%                 2048/   1/  -1%/ 0%
> > 2048/   4/  +1%/ 0%                 2048/   4/   0%/-1%
> > 2048/   8/  -2%/ 0%                 2048/   8/   5%/-1%
> > 4096/   1/  -2%/ 0%                 4096/   1/  -2%/ 0%
> > 4096/   4/  +2%/ 0%                 4096/   4/   0%/ 0%
> > 4096/   8/  +9%/-2%                 4096/   8/  -5%/-1%
> > 
> > Signed-off-by: Haibin Zhang 
> > Signed-off-by: Yunfang
