WARNING in ip_rt_bug
Hello, syzbot hit the following crash on net-next commit 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +) Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056 Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528 compiler: gcc (GCC) 7.1.1 20170620 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. netlink: 'syz-executor6': attribute type 3 has an invalid length. WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x1f4/0x2b0 lib/bug.c:186 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 RSP: 0018:8801db007290 EFLAGS: 00010282 RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca RDX: 0100 RSI: 8858c300 RDI: 0282 RBP: 8801db007298 R08: 11003b600de1 R09: R10: R11: R12: 8801d8dda3c0 R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418 dst_output include/net/dst.h:444 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xae/0x180 net/ipv4/arp.c:297 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline] RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923 RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12 RAX: dc00 RBX: 8801d225e400 RCX: RDX: 110a24e5 RSI: b98b8227 RDI: 0282 RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004 R10: 880197b3f960 R11: 0003 R12: 110032f67f36 R13: R14: R15: 0001 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84 __bprm_mm_init fs/exec.c:297 [inline] bprm_mm_init fs/exec.c:414 [inline] do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771 do_execve+0x31/0x40 fs/exec.c:1847 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
WARNING in ip_rt_bug
Hello, syzbot hit the following crash on net-next commit 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +) Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056 Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528 compiler: gcc (GCC) 7.1.1 20170620 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. netlink: 'syz-executor6': attribute type 3 has an invalid length. WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x1f4/0x2b0 lib/bug.c:186 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 RSP: 0018:8801db007290 EFLAGS: 00010282 RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca RDX: 0100 RSI: 8858c300 RDI: 0282 RBP: 8801db007298 R08: 11003b600de1 R09: R10: R11: R12: 8801d8dda3c0 R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418 dst_output include/net/dst.h:444 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xae/0x180 net/ipv4/arp.c:297 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline] RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923 RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12 RAX: dc00 RBX: 8801d225e400 RCX: RDX: 110a24e5 RSI: b98b8227 RDI: 0282 RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004 R10: 880197b3f960 R11: 0003 R12: 110032f67f36 R13: R14: R15: 0001 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84 __bprm_mm_init fs/exec.c:297 [inline] bprm_mm_init fs/exec.c:414 [inline] do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771 do_execve+0x31/0x40 fs/exec.c:1847 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
WARNING in ip_rt_bug
Hello, syzbot hit the following crash on net-next commit 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +) Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056 Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528 compiler: gcc (GCC) 7.1.1 20170620 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. netlink: 'syz-executor6': attribute type 3 has an invalid length. WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x1f4/0x2b0 lib/bug.c:186 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 RSP: 0018:8801db007290 EFLAGS: 00010282 RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca RDX: 0100 RSI: 8858c300 RDI: 0282 RBP: 8801db007298 R08: 11003b600de1 R09: R10: R11: R12: 8801d8dda3c0 R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418 dst_output include/net/dst.h:444 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xae/0x180 net/ipv4/arp.c:297 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline] RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923 RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12 RAX: dc00 RBX: 8801d225e400 RCX: RDX: 110a24e5 RSI: b98b8227 RDI: 0282 RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004 R10: 880197b3f960 R11: 0003 R12: 110032f67f36 R13: R14: R15: 0001 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84 __bprm_mm_init fs/exec.c:297 [inline] bprm_mm_init fs/exec.c:414 [inline] do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771 do_execve+0x31/0x40 fs/exec.c:1847 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
WARNING in ip_rt_bug
Hello, syzbot hit the following crash on net-next commit 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +) Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056 Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528 compiler: gcc (GCC) 7.1.1 20170620 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. netlink: 'syz-executor6': attribute type 3 has an invalid length. WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x1f4/0x2b0 lib/bug.c:186 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212 RSP: 0018:8801db007290 EFLAGS: 00010282 RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca RDX: 0100 RSI: 8858c300 RDI: 0282 RBP: 8801db007298 R08: 11003b600de1 R09: R10: R11: R12: 8801d8dda3c0 R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418 dst_output include/net/dst.h:444 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xae/0x180 net/ipv4/arp.c:297 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline] RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923 RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12 RAX: dc00 RBX: 8801d225e400 RCX: RDX: 110a24e5 RSI: b98b8227 RDI: 0282 RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004 R10: 880197b3f960 R11: 0003 R12: 110032f67f36 R13: R14: R15: 0001 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84 __bprm_mm_init fs/exec.c:297 [inline] bprm_mm_init fs/exec.c:414 [inline] do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771 do_execve+0x31/0x40 fs/exec.c:1847 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
BUG: soft lockup in snd_virmidi_output_trigger
Hello, syzbot hit the following crash on upstream commit 3fd14cdcc05a682b03743683ce3a726898b20555 (Fri Apr 6 19:15:41 2018 +) Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=619d9f40141d826b097e Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4594231414882304 Kernel config: https://syzkaller.appspot.com/x/.config?id=-5813481738265533882 compiler: gcc (GCC) 8.0.1 20180301 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+619d9f40141d826b0...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready 8021q: adding VLAN 0 to HW filter on device team0 watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor2:10431] Modules linked in: irq event stamp: 35856 hardirqs last enabled at (35855): [] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] hardirqs last enabled at (35855): [] _raw_spin_unlock_irqrestore+0x74/0xc0 kernel/locking/spinlock.c:184 hardirqs last disabled at (35856): [] interrupt_entry+0xb1/0xf0 arch/x86/entry/entry_64.S:624 softirqs last enabled at (162): [] __do_softirq+0x778/0xaf5 kernel/softirq.c:311 softirqs last disabled at (95): [] invoke_softirq kernel/softirq.c:365 [inline] softirqs last disabled at (95): [] irq_exit+0x1d1/0x200 kernel/softirq.c:405 CPU: 1 PID: 10431 Comm: syz-executor2 Not tainted 4.16.0+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0 kernel/locking/spinlock.c:184 RSP: 0018:880184db7780 EFLAGS: 0282 ORIG_RAX: ff13 RAX: dc00 RBX: 0282 RCX: RDX: 11162e55 RSI: 0001 RDI: 0282 RBP: 880184db7790 R08: ed0035d21962 R09: R10: R11: R12: 8801ae90cb08 R13: 880184db7810 R14: 0001 R15: 8801cb9a5880 FS: 7fc943ad8700() GS:8801db10() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fc943ad7db8 CR3: 0001b070d000 CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: spin_unlock_irqrestore include/linux/spinlock.h:365 [inline] snd_virmidi_output_trigger+0x522/0x6c0 sound/core/seq/seq_virmidi.c:205 snd_rawmidi_output_trigger sound/core/rawmidi.c:150 [inline] snd_rawmidi_kernel_write1+0x519/0x700 sound/core/rawmidi.c:1288 snd_rawmidi_write+0x2e2/0xdc0 sound/core/rawmidi.c:1338 __vfs_write+0x10b/0x880 fs/read_write.c:485 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0xf9/0x250 fs/read_write.c:598 SYSC_write fs/read_write.c:610 [inline] SyS_write+0x24/0x30 fs/read_write.c:607 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x455259 RSP: 002b:7fc943ad7c68 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 7fc943ad86d4 RCX: 00455259 RDX: e78e624c RSI: 2040 RDI: 0014 RBP: 0072c010 R08: R09: R10: R11: 0246 R12: R13: 06ca R14: 006fd390 R15: 0002 Code: c7 a8 72 b1 88 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 21 48 83 3d 4e 5b 67 01 00 74 0e 48 89 df 57 9d <0f> 1f 44 00 00 eb bb 0f 0b 0f 0b e8 6f 29 68 fa eb 97 e8 68 29 --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
BUG: soft lockup in snd_virmidi_output_trigger
Hello, syzbot hit the following crash on upstream commit 3fd14cdcc05a682b03743683ce3a726898b20555 (Fri Apr 6 19:15:41 2018 +) Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=619d9f40141d826b097e Unfortunately, I don't have any reproducer for this crash yet. Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4594231414882304 Kernel config: https://syzkaller.appspot.com/x/.config?id=-5813481738265533882 compiler: gcc (GCC) 8.0.1 20180301 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+619d9f40141d826b0...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready 8021q: adding VLAN 0 to HW filter on device team0 watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor2:10431] Modules linked in: irq event stamp: 35856 hardirqs last enabled at (35855): [] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] hardirqs last enabled at (35855): [] _raw_spin_unlock_irqrestore+0x74/0xc0 kernel/locking/spinlock.c:184 hardirqs last disabled at (35856): [] interrupt_entry+0xb1/0xf0 arch/x86/entry/entry_64.S:624 softirqs last enabled at (162): [] __do_softirq+0x778/0xaf5 kernel/softirq.c:311 softirqs last disabled at (95): [] invoke_softirq kernel/softirq.c:365 [inline] softirqs last disabled at (95): [] irq_exit+0x1d1/0x200 kernel/softirq.c:405 CPU: 1 PID: 10431 Comm: syz-executor2 Not tainted 4.16.0+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0 kernel/locking/spinlock.c:184 RSP: 0018:880184db7780 EFLAGS: 0282 ORIG_RAX: ff13 RAX: dc00 RBX: 0282 RCX: RDX: 11162e55 RSI: 0001 RDI: 0282 RBP: 880184db7790 R08: ed0035d21962 R09: R10: R11: R12: 8801ae90cb08 R13: 880184db7810 R14: 0001 R15: 8801cb9a5880 FS: 7fc943ad8700() GS:8801db10() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fc943ad7db8 CR3: 0001b070d000 CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: spin_unlock_irqrestore include/linux/spinlock.h:365 [inline] snd_virmidi_output_trigger+0x522/0x6c0 sound/core/seq/seq_virmidi.c:205 snd_rawmidi_output_trigger sound/core/rawmidi.c:150 [inline] snd_rawmidi_kernel_write1+0x519/0x700 sound/core/rawmidi.c:1288 snd_rawmidi_write+0x2e2/0xdc0 sound/core/rawmidi.c:1338 __vfs_write+0x10b/0x880 fs/read_write.c:485 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0xf9/0x250 fs/read_write.c:598 SYSC_write fs/read_write.c:610 [inline] SyS_write+0x24/0x30 fs/read_write.c:607 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x455259 RSP: 002b:7fc943ad7c68 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 7fc943ad86d4 RCX: 00455259 RDX: e78e624c RSI: 2040 RDI: 0014 RBP: 0072c010 R08: R09: R10: R11: 0246 R12: R13: 06ca R14: 006fd390 R15: 0002 Code: c7 a8 72 b1 88 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 21 48 83 3d 4e 5b 67 01 00 74 0e 48 89 df 57 9d <0f> 1f 44 00 00 eb bb 0f 0b 0f 0b e8 6f 29 68 fa eb 97 e8 68 29 --- This bug is generated by a dumb bot. It may contain errors. See https://goo.gl/tpsmEJ for details. Direct all questions to syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
Re: KASAN: slab-out-of-bounds Read in pfkey_add
On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote: ... > > Looks like this is going to be fixed by > https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of > provided sadb_key"), but it's not applied yet to the ipsec tree yet. Kevin, > for > future reference, for syzbot bugs it would be helpful to reply to the original > bug report and say that a patch was sent out, or even better send the patch > as a > reply to the bug report email, e.g. > > git format-patch > --in-reply-to="<001a114292fadd3e250560706...@google.com>" > > for this one (and the Message ID can be found in the syzkaller-bugs archive > even > if the email isn't in your inbox). Sure, I can do that. - Kevin
Re: KASAN: slab-out-of-bounds Read in pfkey_add
On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote: ... > > Looks like this is going to be fixed by > https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of > provided sadb_key"), but it's not applied yet to the ipsec tree yet. Kevin, > for > future reference, for syzbot bugs it would be helpful to reply to the original > bug report and say that a patch was sent out, or even better send the patch > as a > reply to the bug report email, e.g. > > git format-patch > --in-reply-to="<001a114292fadd3e250560706...@google.com>" > > for this one (and the Message ID can be found in the syzkaller-bugs archive > even > if the email isn't in your inbox). Sure, I can do that. - Kevin
[PATCH v3 1/4] zram: correct flag name of ZRAM_ACCESS
ZRAM_ACCESS is used for locking a slot of zram so correct the name. It is also not a common flag to indicate status of the block so move the declare position on top of the flag. Lastly, let's move the function to the top of source code to be able to use it easily without forward declaration. Signed-off-by: Minchan Kim--- drivers/block/zram/zram_drv.c | 20 ++-- drivers/block/zram/zram_drv.h | 6 +++--- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 0f3fadd71230..18dadeab775b 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -52,6 +52,16 @@ static size_t huge_class_size; static void zram_free_page(struct zram *zram, size_t index); +static void zram_slot_lock(struct zram *zram, u32 index) +{ + bit_spin_lock(ZRAM_LOCK, >table[index].value); +} + +static void zram_slot_unlock(struct zram *zram, u32 index) +{ + bit_spin_unlock(ZRAM_LOCK, >table[index].value); +} + static inline bool init_done(struct zram *zram) { return zram->disksize; @@ -753,16 +763,6 @@ static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); static DEVICE_ATTR_RO(debug_stat); -static void zram_slot_lock(struct zram *zram, u32 index) -{ - bit_spin_lock(ZRAM_ACCESS, >table[index].value); -} - -static void zram_slot_unlock(struct zram *zram, u32 index) -{ - bit_spin_unlock(ZRAM_ACCESS, >table[index].value); -} - static void zram_meta_free(struct zram *zram, u64 disksize) { size_t num_pages = disksize >> PAGE_SHIFT; diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 008861220723..8d8959ceabd1 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -43,9 +43,9 @@ /* Flags for zram pages (table[page_no].value) */ enum zram_pageflags { - /* Page consists the same element */ - ZRAM_SAME = ZRAM_FLAG_SHIFT, - ZRAM_ACCESS,/* page is now accessed */ + /* zram slot is locked */ + ZRAM_LOCK = ZRAM_FLAG_SHIFT, + ZRAM_SAME, /* Page consists the same element */ ZRAM_WB,/* page is stored on backing_device */ __NR_ZRAM_PAGEFLAGS, -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 1/4] zram: correct flag name of ZRAM_ACCESS
ZRAM_ACCESS is used for locking a slot of zram so correct the name. It is also not a common flag to indicate status of the block so move the declare position on top of the flag. Lastly, let's move the function to the top of source code to be able to use it easily without forward declaration. Signed-off-by: Minchan Kim --- drivers/block/zram/zram_drv.c | 20 ++-- drivers/block/zram/zram_drv.h | 6 +++--- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 0f3fadd71230..18dadeab775b 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -52,6 +52,16 @@ static size_t huge_class_size; static void zram_free_page(struct zram *zram, size_t index); +static void zram_slot_lock(struct zram *zram, u32 index) +{ + bit_spin_lock(ZRAM_LOCK, >table[index].value); +} + +static void zram_slot_unlock(struct zram *zram, u32 index) +{ + bit_spin_unlock(ZRAM_LOCK, >table[index].value); +} + static inline bool init_done(struct zram *zram) { return zram->disksize; @@ -753,16 +763,6 @@ static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); static DEVICE_ATTR_RO(debug_stat); -static void zram_slot_lock(struct zram *zram, u32 index) -{ - bit_spin_lock(ZRAM_ACCESS, >table[index].value); -} - -static void zram_slot_unlock(struct zram *zram, u32 index) -{ - bit_spin_unlock(ZRAM_ACCESS, >table[index].value); -} - static void zram_meta_free(struct zram *zram, u64 disksize) { size_t num_pages = disksize >> PAGE_SHIFT; diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 008861220723..8d8959ceabd1 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -43,9 +43,9 @@ /* Flags for zram pages (table[page_no].value) */ enum zram_pageflags { - /* Page consists the same element */ - ZRAM_SAME = ZRAM_FLAG_SHIFT, - ZRAM_ACCESS,/* page is now accessed */ + /* zram slot is locked */ + ZRAM_LOCK = ZRAM_FLAG_SHIFT, + ZRAM_SAME, /* Page consists the same element */ ZRAM_WB,/* page is stored on backing_device */ __NR_ZRAM_PAGEFLAGS, -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 2/4] zram: mark incompressible page as ZRAM_HUGE
Mark incompressible pages so that we could investigate who is the owner of the incompressible pages once the page is swapped out via using upcoming zram memory tracker feature. With it, we could prevent such pages to be swapped out by using mlock. Otherwise we might remove them. This patch exposes new stat for huge pages via mm_stat. Signed-off-by: Minchan Kim--- Documentation/blockdev/zram.txt | 1 + drivers/block/zram/zram_drv.c | 17 ++--- drivers/block/zram/zram_drv.h | 2 ++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 257e65714c6a..78db38d02bc9 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -218,6 +218,7 @@ The stat file represents device's mm statistics. It consists of a single same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction + huge_pages the number of incompressible pages 9) Deactivate: swapoff /dev/zram0 diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 18dadeab775b..777fb3339f59 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -729,14 +729,15 @@ static ssize_t mm_stat_show(struct device *dev, max_used = atomic_long_read(>stats.max_used_pages); ret = scnprintf(buf, PAGE_SIZE, - "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n", + "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n", orig_size << PAGE_SHIFT, (u64)atomic64_read(>stats.compr_data_size), mem_used << PAGE_SHIFT, zram->limit_pages << PAGE_SHIFT, max_used << PAGE_SHIFT, (u64)atomic64_read(>stats.same_pages), - pool_stats.pages_compacted); + pool_stats.pages_compacted, + (u64)atomic64_read(>stats.huge_pages)); up_read(>init_lock); return ret; @@ -805,6 +806,11 @@ static void zram_free_page(struct zram *zram, size_t index) { unsigned long handle; + if (zram_test_flag(zram, index, ZRAM_HUGE)) { + zram_clear_flag(zram, index, ZRAM_HUGE); + atomic64_dec(>stats.huge_pages); + } + if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) { zram_wb_clear(zram, index); atomic64_dec(>stats.pages_stored); @@ -973,6 +979,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, } if (unlikely(comp_len >= huge_class_size)) { + comp_len = PAGE_SIZE; if (zram_wb_enabled(zram) && allow_wb) { zcomp_stream_put(zram->comp); ret = write_to_bdev(zram, bvec, index, bio, ); @@ -984,7 +991,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, allow_wb = false; goto compress_again; } - comp_len = PAGE_SIZE; } /* @@ -1046,6 +1052,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, zram_slot_lock(zram, index); zram_free_page(zram, index); + if (comp_len == PAGE_SIZE) { + zram_set_flag(zram, index, ZRAM_HUGE); + atomic64_inc(>stats.huge_pages); + } + if (flags) { zram_set_flag(zram, index, flags); zram_set_element(zram, index, element); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 8d8959ceabd1..ff0547bdb586 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -47,6 +47,7 @@ enum zram_pageflags { ZRAM_LOCK = ZRAM_FLAG_SHIFT, ZRAM_SAME, /* Page consists the same element */ ZRAM_WB,/* page is stored on backing_device */ + ZRAM_HUGE, /* Incompressible page */ __NR_ZRAM_PAGEFLAGS, }; @@ -71,6 +72,7 @@ struct zram_stats { atomic64_t invalid_io; /* non-page-aligned I/O requests */ atomic64_t notify_free; /* no. of swap slot free notifications */ atomic64_t same_pages; /* no. of same element filled pages */ + atomic64_t huge_pages; /* no. of huge pages */ atomic64_t pages_stored;/* no. of pages currently stored */ atomic_long_t max_used_pages; /* no. of maximum pages stored */ atomic64_t writestall; /* no. of write slow paths */ -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 2/4] zram: mark incompressible page as ZRAM_HUGE
Mark incompressible pages so that we could investigate who is the owner of the incompressible pages once the page is swapped out via using upcoming zram memory tracker feature. With it, we could prevent such pages to be swapped out by using mlock. Otherwise we might remove them. This patch exposes new stat for huge pages via mm_stat. Signed-off-by: Minchan Kim --- Documentation/blockdev/zram.txt | 1 + drivers/block/zram/zram_drv.c | 17 ++--- drivers/block/zram/zram_drv.h | 2 ++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 257e65714c6a..78db38d02bc9 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -218,6 +218,7 @@ The stat file represents device's mm statistics. It consists of a single same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction + huge_pages the number of incompressible pages 9) Deactivate: swapoff /dev/zram0 diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 18dadeab775b..777fb3339f59 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -729,14 +729,15 @@ static ssize_t mm_stat_show(struct device *dev, max_used = atomic_long_read(>stats.max_used_pages); ret = scnprintf(buf, PAGE_SIZE, - "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n", + "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n", orig_size << PAGE_SHIFT, (u64)atomic64_read(>stats.compr_data_size), mem_used << PAGE_SHIFT, zram->limit_pages << PAGE_SHIFT, max_used << PAGE_SHIFT, (u64)atomic64_read(>stats.same_pages), - pool_stats.pages_compacted); + pool_stats.pages_compacted, + (u64)atomic64_read(>stats.huge_pages)); up_read(>init_lock); return ret; @@ -805,6 +806,11 @@ static void zram_free_page(struct zram *zram, size_t index) { unsigned long handle; + if (zram_test_flag(zram, index, ZRAM_HUGE)) { + zram_clear_flag(zram, index, ZRAM_HUGE); + atomic64_dec(>stats.huge_pages); + } + if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) { zram_wb_clear(zram, index); atomic64_dec(>stats.pages_stored); @@ -973,6 +979,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, } if (unlikely(comp_len >= huge_class_size)) { + comp_len = PAGE_SIZE; if (zram_wb_enabled(zram) && allow_wb) { zcomp_stream_put(zram->comp); ret = write_to_bdev(zram, bvec, index, bio, ); @@ -984,7 +991,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, allow_wb = false; goto compress_again; } - comp_len = PAGE_SIZE; } /* @@ -1046,6 +1052,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, zram_slot_lock(zram, index); zram_free_page(zram, index); + if (comp_len == PAGE_SIZE) { + zram_set_flag(zram, index, ZRAM_HUGE); + atomic64_inc(>stats.huge_pages); + } + if (flags) { zram_set_flag(zram, index, flags); zram_set_element(zram, index, element); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 8d8959ceabd1..ff0547bdb586 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -47,6 +47,7 @@ enum zram_pageflags { ZRAM_LOCK = ZRAM_FLAG_SHIFT, ZRAM_SAME, /* Page consists the same element */ ZRAM_WB,/* page is stored on backing_device */ + ZRAM_HUGE, /* Incompressible page */ __NR_ZRAM_PAGEFLAGS, }; @@ -71,6 +72,7 @@ struct zram_stats { atomic64_t invalid_io; /* non-page-aligned I/O requests */ atomic64_t notify_free; /* no. of swap slot free notifications */ atomic64_t same_pages; /* no. of same element filled pages */ + atomic64_t huge_pages; /* no. of huge pages */ atomic64_t pages_stored;/* no. of pages currently stored */ atomic_long_t max_used_pages; /* no. of maximum pages stored */ atomic64_t writestall; /* no. of write slow paths */ -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 0/4] zram memory tracking
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. As well, it's pointless to store incompressible pages to zram so better idea is app developers manages them directly like free or mlock rather than remaining them on heap. This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state to represent each block's state so admin can investigate what memory is cold|incompressible|same page with using pagemap once the pages are swapped out. The output is as follows, 30075.033841 .wh 30163.806904 s.. 30263.806919 ..h First column is zram's block index and 3rh one represents symbol (s: same page w: written page to backing store h: huge page) of the block state. Second column represents usec time unit of the block was last accessed. So above example means the 300th block is accessed at 75.033851 second and it was huge so it was written to the backing store. * From v2: * debugfs and Kconfig cleanup - Greg KH * Remove unnecesarry buffer - Sergey * Change timestamp from sec to usec * From v1: * Do not propagate error number for debugfs fail - Greg KH * Add writeback and hugepage information - Sergey Minchan Kim (4): zram: correct flag name of ZRAM_ACCESS zram: mark incompressible page as ZRAM_HUGE zram: record accessed second zram: introduce zram memory tracking Documentation/blockdev/zram.txt | 25 + drivers/block/zram/Kconfig | 9 ++ drivers/block/zram/zram_drv.c | 172 +--- drivers/block/zram/zram_drv.h | 14 ++- 4 files changed, 203 insertions(+), 17 deletions(-) -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 4/4] zram: introduce zram memory tracking
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. Better idea is app developers free them directly rather than remaining them on heap. This patch tell us last access time of each block of zram via "cat /sys/kernel/debug/zram/zram0/block_state". The output is as follows, 30075.033841 .wh 30163.806904 s.. 30263.806919 ..h First column is zram's block index and 3rh one represents symbol (s: same page w: written page to backing store h: huge page) of the block state. Second column represents usec time unit of the block was last accessed. So above example means the 300th block is accessed at 75.033851 second and it was huge so it was written to the backing store. Admin can leverage this information to catch cold|incompressible pages of process with *pagemap* once part of heaps are swapped out. Cc: Greg KHSigned-off-by: Minchan Kim --- Documentation/blockdev/zram.txt | 24 ++ drivers/block/zram/Kconfig | 9 +++ drivers/block/zram/zram_drv.c | 139 +--- drivers/block/zram/zram_drv.h | 5 ++ 4 files changed, 166 insertions(+), 11 deletions(-) diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 78db38d02bc9..45509c7d5716 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory. User should set up backing device via /sys/block/zramX/backing_dev before disksize setting. += memory tracking + +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the +zram block. It could be useful to catch cold or incompressible +pages of the proess with*pagemap. +If you enable the feature, you could see block state via +/sys/kernel/debug/zram/zram0/block_state". The output is as follows, + + 30075.033841 .wh + 30163.806904 s.. + 30263.806919 ..h + +First column is zram's block index. +Second column is access time. +Third column is state of the block. +(s: same page +w: written page to backing store +h: huge page) + +First line of above example says 300th block is accessed at 75.033841sec +and the block's state is huge so it is written back to the backing +storage. It's a debugging feature so anyone shouldn't rely on it to work +properly. + Nitin Gupta ngu...@vflare.org diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index ac3a31d433b2..efe60c82d8ec 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -26,3 +26,12 @@ config ZRAM_WRITEBACK /sys/block/zramX/backing_dev. See zram.txt for more infomration. + +config ZRAM_MEMORY_TRACKING + bool "Tracking zram block status" + depends on ZRAM + select DEBUG_FS + help + With this feature, admin can track the state of allocated block + of zRAM. Admin could see the information via + /sys/kernel/debug/zram/zramX/block_state. diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 7fc10e2ad734..80e461dc70bc 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include "zram_drv.h" @@ -67,6 +68,13 @@ static inline bool init_done(struct zram *zram) return zram->disksize; } +static inline bool zram_allocated(struct zram *zram, u32 index) +{ + + return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) || + zram->table[index].handle; +} + static inline struct zram *dev_to_zram(struct device *dev) { return (struct zram *)dev_to_disk(dev)->private_data; @@ -83,7 +91,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle) } /* flag operations require table entry bit_spin_lock() being held */ -static int zram_test_flag(struct zram *zram, u32 index, +static bool zram_test_flag(struct zram *zram, u32 index, enum zram_pageflags flag) { return zram->table[index].value & BIT(flag); @@ -107,16 +115,6 @@ static inline void zram_set_element(struct zram *zram, u32 index, zram->table[index].element = element; } -static void zram_accessed(struct zram *zram, u32 index) -{ - zram->table[index].ac_time = sched_clock(); -} - -static void zram_reset_access(struct zram *zram, u32 index) -{ - zram->table[index].ac_time = 0; -} - static unsigned long zram_get_element(struct zram *zram, u32 index) { return zram->table[index].element; @@ -620,6 +618,121 @@ static int read_from_bdev(struct zram *zram, struct
[PATCH v3 3/4] zram: record accessed second
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. Better idea is app developers free them directly rather than remaining them on heap. This patch records last access time of each block of zram so that With upcoming zram memory tracking, it could help userspace developers to reduce memory footprint. Signed-off-by: Minchan Kim--- drivers/block/zram/zram_drv.c | 16 drivers/block/zram/zram_drv.h | 1 + 2 files changed, 17 insertions(+) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 777fb3339f59..7fc10e2ad734 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -107,6 +107,16 @@ static inline void zram_set_element(struct zram *zram, u32 index, zram->table[index].element = element; } +static void zram_accessed(struct zram *zram, u32 index) +{ + zram->table[index].ac_time = sched_clock(); +} + +static void zram_reset_access(struct zram *zram, u32 index) +{ + zram->table[index].ac_time = 0; +} + static unsigned long zram_get_element(struct zram *zram, u32 index) { return zram->table[index].element; @@ -806,6 +816,8 @@ static void zram_free_page(struct zram *zram, size_t index) { unsigned long handle; + zram_reset_access(zram, index); + if (zram_test_flag(zram, index, ZRAM_HUGE)) { zram_clear_flag(zram, index, ZRAM_HUGE); atomic64_dec(>stats.huge_pages); @@ -1177,6 +1189,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index, generic_end_io_acct(q, rw_acct, >disk->part0, start_time); + zram_slot_lock(zram, index); + zram_accessed(zram, index); + zram_slot_unlock(zram, index); + if (unlikely(ret < 0)) { if (!is_write) atomic64_inc(>stats.failed_reads); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index ff0547bdb586..1075218e88b2 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -61,6 +61,7 @@ struct zram_table_entry { unsigned long element; }; unsigned long value; + u64 ac_time; }; struct zram_stats { -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 0/4] zram memory tracking
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. As well, it's pointless to store incompressible pages to zram so better idea is app developers manages them directly like free or mlock rather than remaining them on heap. This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state to represent each block's state so admin can investigate what memory is cold|incompressible|same page with using pagemap once the pages are swapped out. The output is as follows, 30075.033841 .wh 30163.806904 s.. 30263.806919 ..h First column is zram's block index and 3rh one represents symbol (s: same page w: written page to backing store h: huge page) of the block state. Second column represents usec time unit of the block was last accessed. So above example means the 300th block is accessed at 75.033851 second and it was huge so it was written to the backing store. * From v2: * debugfs and Kconfig cleanup - Greg KH * Remove unnecesarry buffer - Sergey * Change timestamp from sec to usec * From v1: * Do not propagate error number for debugfs fail - Greg KH * Add writeback and hugepage information - Sergey Minchan Kim (4): zram: correct flag name of ZRAM_ACCESS zram: mark incompressible page as ZRAM_HUGE zram: record accessed second zram: introduce zram memory tracking Documentation/blockdev/zram.txt | 25 + drivers/block/zram/Kconfig | 9 ++ drivers/block/zram/zram_drv.c | 172 +--- drivers/block/zram/zram_drv.h | 14 ++- 4 files changed, 203 insertions(+), 17 deletions(-) -- 2.17.0.484.g0c8726318c-goog
[PATCH v3 4/4] zram: introduce zram memory tracking
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. Better idea is app developers free them directly rather than remaining them on heap. This patch tell us last access time of each block of zram via "cat /sys/kernel/debug/zram/zram0/block_state". The output is as follows, 30075.033841 .wh 30163.806904 s.. 30263.806919 ..h First column is zram's block index and 3rh one represents symbol (s: same page w: written page to backing store h: huge page) of the block state. Second column represents usec time unit of the block was last accessed. So above example means the 300th block is accessed at 75.033851 second and it was huge so it was written to the backing store. Admin can leverage this information to catch cold|incompressible pages of process with *pagemap* once part of heaps are swapped out. Cc: Greg KH Signed-off-by: Minchan Kim --- Documentation/blockdev/zram.txt | 24 ++ drivers/block/zram/Kconfig | 9 +++ drivers/block/zram/zram_drv.c | 139 +--- drivers/block/zram/zram_drv.h | 5 ++ 4 files changed, 166 insertions(+), 11 deletions(-) diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 78db38d02bc9..45509c7d5716 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory. User should set up backing device via /sys/block/zramX/backing_dev before disksize setting. += memory tracking + +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the +zram block. It could be useful to catch cold or incompressible +pages of the proess with*pagemap. +If you enable the feature, you could see block state via +/sys/kernel/debug/zram/zram0/block_state". The output is as follows, + + 30075.033841 .wh + 30163.806904 s.. + 30263.806919 ..h + +First column is zram's block index. +Second column is access time. +Third column is state of the block. +(s: same page +w: written page to backing store +h: huge page) + +First line of above example says 300th block is accessed at 75.033841sec +and the block's state is huge so it is written back to the backing +storage. It's a debugging feature so anyone shouldn't rely on it to work +properly. + Nitin Gupta ngu...@vflare.org diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index ac3a31d433b2..efe60c82d8ec 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -26,3 +26,12 @@ config ZRAM_WRITEBACK /sys/block/zramX/backing_dev. See zram.txt for more infomration. + +config ZRAM_MEMORY_TRACKING + bool "Tracking zram block status" + depends on ZRAM + select DEBUG_FS + help + With this feature, admin can track the state of allocated block + of zRAM. Admin could see the information via + /sys/kernel/debug/zram/zramX/block_state. diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 7fc10e2ad734..80e461dc70bc 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include "zram_drv.h" @@ -67,6 +68,13 @@ static inline bool init_done(struct zram *zram) return zram->disksize; } +static inline bool zram_allocated(struct zram *zram, u32 index) +{ + + return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) || + zram->table[index].handle; +} + static inline struct zram *dev_to_zram(struct device *dev) { return (struct zram *)dev_to_disk(dev)->private_data; @@ -83,7 +91,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle) } /* flag operations require table entry bit_spin_lock() being held */ -static int zram_test_flag(struct zram *zram, u32 index, +static bool zram_test_flag(struct zram *zram, u32 index, enum zram_pageflags flag) { return zram->table[index].value & BIT(flag); @@ -107,16 +115,6 @@ static inline void zram_set_element(struct zram *zram, u32 index, zram->table[index].element = element; } -static void zram_accessed(struct zram *zram, u32 index) -{ - zram->table[index].ac_time = sched_clock(); -} - -static void zram_reset_access(struct zram *zram, u32 index) -{ - zram->table[index].ac_time = 0; -} - static unsigned long zram_get_element(struct zram *zram, u32 index) { return zram->table[index].element; @@ -620,6 +618,121 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec, static void zram_wb_clear(struct
[PATCH v3 3/4] zram: record accessed second
zRam as swap is useful for small memory device. However, swap means those pages on zram are mostly cold pages due to VM's LRU algorithm. Especially, once init data for application are touched for launching, they tend to be not accessed any more and finally swapped out. zRAM can store such cold pages as compressed form but it's pointless to keep in memory. Better idea is app developers free them directly rather than remaining them on heap. This patch records last access time of each block of zram so that With upcoming zram memory tracking, it could help userspace developers to reduce memory footprint. Signed-off-by: Minchan Kim --- drivers/block/zram/zram_drv.c | 16 drivers/block/zram/zram_drv.h | 1 + 2 files changed, 17 insertions(+) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 777fb3339f59..7fc10e2ad734 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -107,6 +107,16 @@ static inline void zram_set_element(struct zram *zram, u32 index, zram->table[index].element = element; } +static void zram_accessed(struct zram *zram, u32 index) +{ + zram->table[index].ac_time = sched_clock(); +} + +static void zram_reset_access(struct zram *zram, u32 index) +{ + zram->table[index].ac_time = 0; +} + static unsigned long zram_get_element(struct zram *zram, u32 index) { return zram->table[index].element; @@ -806,6 +816,8 @@ static void zram_free_page(struct zram *zram, size_t index) { unsigned long handle; + zram_reset_access(zram, index); + if (zram_test_flag(zram, index, ZRAM_HUGE)) { zram_clear_flag(zram, index, ZRAM_HUGE); atomic64_dec(>stats.huge_pages); @@ -1177,6 +1189,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index, generic_end_io_acct(q, rw_acct, >disk->part0, start_time); + zram_slot_lock(zram, index); + zram_accessed(zram, index); + zram_slot_unlock(zram, index); + if (unlikely(ret < 0)) { if (!is_write) atomic64_inc(>stats.failed_reads); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index ff0547bdb586..1075218e88b2 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -61,6 +61,7 @@ struct zram_table_entry { unsigned long element; }; unsigned long value; + u64 ac_time; }; struct zram_stats { -- 2.17.0.484.g0c8726318c-goog
Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
On Mon, Apr 09, 2018 at 04:09:20AM +, haibinzhang(张海斌) wrote: > > > On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote: > > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx > > > busy > > > polling udp packets with small length(e.g. 1byte udp payload), because > > > setting > > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet > > > length. > > > > > > Ping-Latencies shown below were tested between two Virtual Machines using > > > netperf (UDP_STREAM, len=1), and then another machine pinged the client: > > > > > > Packet-Weight Ping-Latencies(millisecond) > > >min avg max > > > Origin 3.319 18.48957.303 > > > 64 1.6432.021 2.552 > > > 128 1.8252.600 3.224 > > > 256 1.9972.710 4.295 > > > 512 1.8603.171 4.631 > > > 1024 2.0024.173 9.056 > > > 2048 2.2575.650 9.688 > > > 4096 2.0938.50815.943 > > > > And this is with Q size 256 right? > > Yes. Ping-latencies with 512 VQ size show below. > > Packet-Weight Ping-Latencies(millisecond) > min avg max > Origin 6.357 29.17766.245 > 64 2.7983.614 4.403 > 128 2.8613.820 4.775 > 256 3.0084.018 4.807 > 512 3.2544.523 5.824 > 1024 3.0795.335 7.747 > 2048 3.9448.201 12.762 > 4096 4.158 11.05719.985 > > We will submit again. Is there anything else? Seems pretty consistent, a small dip at 2 VQ sizes. Acked-by: Michael S. Tsirkin> > > > > Ring size is a hint from device about a burst size it can tolerate. Based > > > on > > > benchmarks, set the weight to 2 * vq size. > > > > > > To evaluate this change, another tests were done using netperf(RR, TX) > > > between > > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size > > > was > > > tweaked through qemu. Results shown below does not show obvious changes. > > > > What I asked for is ping-latency with different VQ sizes, > > streaming below does not show anything. > > > > > vq size=256 TCP_RRvq size=512 TCP_RR > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > >1/ 1/ -7%/-2% 1/ 1/ 0%/-2% > > >1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% > > >1/ 8/ +1%/-2% 1/ 8/ 0%/+1% > > > 64/ 1/ -6%/ 0% 64/ 1/ +7%/+3% > > > 64/ 4/ 0%/+2% 64/ 4/ -1%/+1% > > > 64/ 8/ 0%/ 0% 64/ 8/ -1%/-2% > > > 256/ 1/ -3%/-4%256/ 1/ -4%/-2% > > > 256/ 4/ +3%/+4%256/ 4/ +1%/+2% > > > 256/ 8/ +2%/ 0%256/ 8/ +1%/-1% > > > > > > vq size=256 UDP_RRvq size=512 UDP_RR > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > >1/ 1/ -5%/+1% 1/ 1/ -3%/-2% > > >1/ 4/ +4%/+1% 1/ 4/ -2%/+2% > > >1/ 8/ -1%/-1% 1/ 8/ -1%/ 0% > > > 64/ 1/ -2%/-3% 64/ 1/ +1%/+1% > > > 64/ 4/ -5%/-1% 64/ 4/ +2%/ 0% > > > 64/ 8/ 0%/-1% 64/ 8/ -2%/+1% > > > 256/ 1/ +7%/+1%256/ 1/ -7%/ 0% > > > 256/ 4/ +1%/+1%256/ 4/ -3%/-4% > > > 256/ 8/ +2%/+2%256/ 8/ +1%/+1% > > > > > > vq size=256 TCP_STREAMvq size=512 TCP_STREAM > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > > 64/ 1/ 0%/-3% 64/ 1/ 0%/ 0% > > > 64/ 4/ +3%/-1% 64/ 4/ -2%/+4% > > > 64/ 8/ +9%/-4% 64/ 8/ -1%/+2% > > > 256/ 1/ +1%/-4%256/ 1/ +1%/+1% > > > 256/ 4/ -1%/-1%256/ 4/ -3%/ 0% > > > 256/ 8/ +7%/+5%256/ 8/ -3%/ 0% > > > 512/ 1/ +1%/ 0%512/ 1/ -1%/-1% > > > 512/ 4/ +1%/-1%512/ 4/ 0%/ 0% > > > 512/ 8/ +7%/-5%512/ 8/ +6%/-1% > > > 1024/ 1/ 0%/-1% 1024/ 1/ 0%/+1% > > > 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% > > > 1024/ 8/ +8%/+5% 1024/ 8/ -1%/ 0% > > > 2048/ 1/ +2%/+2% 2048/ 1/ -1%/ 0% > > > 2048/ 4/ +1%/ 0% 2048/ 4/
Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
On Mon, Apr 09, 2018 at 04:09:20AM +, haibinzhang(张海斌) wrote: > > > On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote: > > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx > > > busy > > > polling udp packets with small length(e.g. 1byte udp payload), because > > > setting > > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet > > > length. > > > > > > Ping-Latencies shown below were tested between two Virtual Machines using > > > netperf (UDP_STREAM, len=1), and then another machine pinged the client: > > > > > > Packet-Weight Ping-Latencies(millisecond) > > >min avg max > > > Origin 3.319 18.48957.303 > > > 64 1.6432.021 2.552 > > > 128 1.8252.600 3.224 > > > 256 1.9972.710 4.295 > > > 512 1.8603.171 4.631 > > > 1024 2.0024.173 9.056 > > > 2048 2.2575.650 9.688 > > > 4096 2.0938.50815.943 > > > > And this is with Q size 256 right? > > Yes. Ping-latencies with 512 VQ size show below. > > Packet-Weight Ping-Latencies(millisecond) > min avg max > Origin 6.357 29.17766.245 > 64 2.7983.614 4.403 > 128 2.8613.820 4.775 > 256 3.0084.018 4.807 > 512 3.2544.523 5.824 > 1024 3.0795.335 7.747 > 2048 3.9448.201 12.762 > 4096 4.158 11.05719.985 > > We will submit again. Is there anything else? Seems pretty consistent, a small dip at 2 VQ sizes. Acked-by: Michael S. Tsirkin > > > > > Ring size is a hint from device about a burst size it can tolerate. Based > > > on > > > benchmarks, set the weight to 2 * vq size. > > > > > > To evaluate this change, another tests were done using netperf(RR, TX) > > > between > > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size > > > was > > > tweaked through qemu. Results shown below does not show obvious changes. > > > > What I asked for is ping-latency with different VQ sizes, > > streaming below does not show anything. > > > > > vq size=256 TCP_RRvq size=512 TCP_RR > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > >1/ 1/ -7%/-2% 1/ 1/ 0%/-2% > > >1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% > > >1/ 8/ +1%/-2% 1/ 8/ 0%/+1% > > > 64/ 1/ -6%/ 0% 64/ 1/ +7%/+3% > > > 64/ 4/ 0%/+2% 64/ 4/ -1%/+1% > > > 64/ 8/ 0%/ 0% 64/ 8/ -1%/-2% > > > 256/ 1/ -3%/-4%256/ 1/ -4%/-2% > > > 256/ 4/ +3%/+4%256/ 4/ +1%/+2% > > > 256/ 8/ +2%/ 0%256/ 8/ +1%/-1% > > > > > > vq size=256 UDP_RRvq size=512 UDP_RR > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > >1/ 1/ -5%/+1% 1/ 1/ -3%/-2% > > >1/ 4/ +4%/+1% 1/ 4/ -2%/+2% > > >1/ 8/ -1%/-1% 1/ 8/ -1%/ 0% > > > 64/ 1/ -2%/-3% 64/ 1/ +1%/+1% > > > 64/ 4/ -5%/-1% 64/ 4/ +2%/ 0% > > > 64/ 8/ 0%/-1% 64/ 8/ -2%/+1% > > > 256/ 1/ +7%/+1%256/ 1/ -7%/ 0% > > > 256/ 4/ +1%/+1%256/ 4/ -3%/-4% > > > 256/ 8/ +2%/+2%256/ 8/ +1%/+1% > > > > > > vq size=256 TCP_STREAMvq size=512 TCP_STREAM > > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > > 64/ 1/ 0%/-3% 64/ 1/ 0%/ 0% > > > 64/ 4/ +3%/-1% 64/ 4/ -2%/+4% > > > 64/ 8/ +9%/-4% 64/ 8/ -1%/+2% > > > 256/ 1/ +1%/-4%256/ 1/ +1%/+1% > > > 256/ 4/ -1%/-1%256/ 4/ -3%/ 0% > > > 256/ 8/ +7%/+5%256/ 8/ -3%/ 0% > > > 512/ 1/ +1%/ 0%512/ 1/ -1%/-1% > > > 512/ 4/ +1%/-1%512/ 4/ 0%/ 0% > > > 512/ 8/ +7%/-5%512/ 8/ +6%/-1% > > > 1024/ 1/ 0%/-1% 1024/ 1/ 0%/+1% > > > 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% > > > 1024/ 8/ +8%/+5% 1024/ 8/ -1%/ 0% > > > 2048/ 1/ +2%/+2% 2048/ 1/ -1%/ 0% > > > 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/-1%
Re: [PATCH] crypto: DRBG - guard uninstantion by lock
Am Montag, 9. April 2018, 00:46:03 CEST schrieb Theodore Y. Ts'o: Hi Theodore, > > So the syzbot will run while the patch goes through the normal e-mail > review process, which is kind of neat. :-) Thank you very much for the hint. That is a neat feature indeed. As I came late to the party and I missed the original mails, I am wondering about which GIT repo was used and which branch of it. With that, I would be happy to resubmit with the test line. Ciao Stephan
Re: [PATCH] crypto: DRBG - guard uninstantion by lock
Am Montag, 9. April 2018, 00:46:03 CEST schrieb Theodore Y. Ts'o: Hi Theodore, > > So the syzbot will run while the patch goes through the normal e-mail > review process, which is kind of neat. :-) Thank you very much for the hint. That is a neat feature indeed. As I came late to the party and I missed the original mails, I am wondering about which GIT repo was used and which branch of it. With that, I would be happy to resubmit with the test line. Ciao Stephan
Re: [PATCH 4/4] x86: usercopy: reimplement arch_within_stack_frames with unwinder
Hi Kees, On Thu, Apr 5, 2018 at 3:11 AM, Kees Cookwrote: > [resending with the CCs I forgot...] > > On Thu, Mar 1, 2018 at 2:19 AM, wrote: >> From: Sahara >> >> The old arch_within_stack_frames which used the frame pointer is >> now reimplemented to use frame pointer unwinder apis. So the main >> functionality is same as before. >> >> Signed-off-by: Sahara > > This will result in slightly more expensive stack checking for > hardened usercopy, but I think that'd be okay if this could also be > made to be unwinder-agnostic. Then it would work for ORC too, and > wouldn't have to depend on just FRAME_POINTER. Without that, I'm not > sure what the benefit is in changing this? Exactly. It's the only reason not to depend on the FRAME_POINTER only. And, it will be better if it would work for ORC. > > Further notes below... > >> --- >> arch/x86/include/asm/unwind.h | 5 +++ >> arch/x86/kernel/stacktrace.c | 77 >> +- >> arch/x86/kernel/unwind_frame.c | 4 +-- >> 3 files changed, 60 insertions(+), 26 deletions(-) >> >> diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h >> index 1f86e1b..6f04906f 100644 >> --- a/arch/x86/include/asm/unwind.h >> +++ b/arch/x86/include/asm/unwind.h >> @@ -87,6 +87,11 @@ void unwind_init(void); >> void unwind_module_init(struct module *mod, void *orc_ip, size_t >> orc_ip_size, >> void *orc, size_t orc_size); >> #else >> +#ifdef CONFIG_UNWINDER_FRAME_POINTER >> +#define FRAME_HEADER_SIZE (sizeof(long) * 2) >> +size_t regs_size(struct pt_regs *regs); >> +#endif >> + >> static inline void unwind_init(void) {} >> static inline >> void unwind_module_init(struct module *mod, void *orc_ip, size_t >> orc_ip_size, >> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c >> index f433a33..c26eb55 100644 >> --- a/arch/x86/kernel/stacktrace.c >> +++ b/arch/x86/kernel/stacktrace.c >> @@ -12,6 +12,37 @@ >> #include >> >> >> +static inline void *get_cur_frame(struct unwind_state *state) >> +{ >> + void *frame = NULL; >> + >> +#if defined(CONFIG_UNWINDER_ORC) >> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER) >> + if (state->regs) >> + frame = (void *)state->regs; >> + else >> + frame = (void *)state->bp; >> +#else >> +#endif >> + return frame; >> +} > > What's going on here with the #if statement? Shouldn't this just be: > > +static inline void *get_cur_frame(struct unwind_state *state) > +{ > + void *frame = NULL; > + > +#ifdef CONFIG_UNWINDER_FRAME_POINTER > + if (state->regs) > + frame = (void *)state->regs; > + else > + frame = (void *)state->bp; > +#endif > + return frame; > +} > > ? Removed the unused #ifdef. > >> + >> +static inline void *get_frame_end(struct unwind_state *state) >> +{ >> + void *frame_end = NULL; >> + >> +#if defined(CONFIG_UNWINDER_ORC) >> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER) >> + if (state->regs) { >> + frame_end = (void *)state->regs + regs_size(state->regs); >> + } else { >> + frame_end = (void *)state->bp + FRAME_HEADER_SIZE; >> + } >> +#else >> +#endif >> + return frame_end; >> +} > > Same thing above? Removed the unused #ifdef. > >> + >> /* >> * Walks up the stack frames to make sure that the specified object is >> * entirely contained by a single stack frame. >> @@ -25,31 +56,31 @@ int arch_within_stack_frames(const void * const stack, >> const void * const stackend, >> const void *obj, unsigned long len) >> { >> -#if defined(CONFIG_FRAME_POINTER) >> - const void *frame = NULL; >> - const void *oldframe; >> - >> - oldframe = __builtin_frame_address(2); >> - if (oldframe) >> - frame = __builtin_frame_address(3); >> +#if defined(CONFIG_UNWINDER_FRAME_POINTER) >> + struct unwind_state state; >> + void *prev_frame_end = NULL; >> /* >> -* low --> high >> -* [saved bp][saved ip][args][local vars][saved bp][saved ip] >> -* ^^ >> -* allow copies only within here > > I think it's worth keeping this diagram: it explains what region is > being checked... Kept the comment in v2 patch. > >> +* Skip 3 non-inlined frames: arch_within_stack_frames(), >> +* check_stack_object() and __check_object_size(). >> +* >> */ >> - while (stack <= frame && frame < stackend) { >> - /* >> -* If obj + len extends past the last frame, this >> -* check won't pass and the next frame will be 0, >> -* causing us to bail out and correctly report >> -
Re: [PATCH 4/4] x86: usercopy: reimplement arch_within_stack_frames with unwinder
Hi Kees, On Thu, Apr 5, 2018 at 3:11 AM, Kees Cook wrote: > [resending with the CCs I forgot...] > > On Thu, Mar 1, 2018 at 2:19 AM, wrote: >> From: Sahara >> >> The old arch_within_stack_frames which used the frame pointer is >> now reimplemented to use frame pointer unwinder apis. So the main >> functionality is same as before. >> >> Signed-off-by: Sahara > > This will result in slightly more expensive stack checking for > hardened usercopy, but I think that'd be okay if this could also be > made to be unwinder-agnostic. Then it would work for ORC too, and > wouldn't have to depend on just FRAME_POINTER. Without that, I'm not > sure what the benefit is in changing this? Exactly. It's the only reason not to depend on the FRAME_POINTER only. And, it will be better if it would work for ORC. > > Further notes below... > >> --- >> arch/x86/include/asm/unwind.h | 5 +++ >> arch/x86/kernel/stacktrace.c | 77 >> +- >> arch/x86/kernel/unwind_frame.c | 4 +-- >> 3 files changed, 60 insertions(+), 26 deletions(-) >> >> diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h >> index 1f86e1b..6f04906f 100644 >> --- a/arch/x86/include/asm/unwind.h >> +++ b/arch/x86/include/asm/unwind.h >> @@ -87,6 +87,11 @@ void unwind_init(void); >> void unwind_module_init(struct module *mod, void *orc_ip, size_t >> orc_ip_size, >> void *orc, size_t orc_size); >> #else >> +#ifdef CONFIG_UNWINDER_FRAME_POINTER >> +#define FRAME_HEADER_SIZE (sizeof(long) * 2) >> +size_t regs_size(struct pt_regs *regs); >> +#endif >> + >> static inline void unwind_init(void) {} >> static inline >> void unwind_module_init(struct module *mod, void *orc_ip, size_t >> orc_ip_size, >> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c >> index f433a33..c26eb55 100644 >> --- a/arch/x86/kernel/stacktrace.c >> +++ b/arch/x86/kernel/stacktrace.c >> @@ -12,6 +12,37 @@ >> #include >> >> >> +static inline void *get_cur_frame(struct unwind_state *state) >> +{ >> + void *frame = NULL; >> + >> +#if defined(CONFIG_UNWINDER_ORC) >> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER) >> + if (state->regs) >> + frame = (void *)state->regs; >> + else >> + frame = (void *)state->bp; >> +#else >> +#endif >> + return frame; >> +} > > What's going on here with the #if statement? Shouldn't this just be: > > +static inline void *get_cur_frame(struct unwind_state *state) > +{ > + void *frame = NULL; > + > +#ifdef CONFIG_UNWINDER_FRAME_POINTER > + if (state->regs) > + frame = (void *)state->regs; > + else > + frame = (void *)state->bp; > +#endif > + return frame; > +} > > ? Removed the unused #ifdef. > >> + >> +static inline void *get_frame_end(struct unwind_state *state) >> +{ >> + void *frame_end = NULL; >> + >> +#if defined(CONFIG_UNWINDER_ORC) >> +#elif defined(CONFIG_UNWINDER_FRAME_POINTER) >> + if (state->regs) { >> + frame_end = (void *)state->regs + regs_size(state->regs); >> + } else { >> + frame_end = (void *)state->bp + FRAME_HEADER_SIZE; >> + } >> +#else >> +#endif >> + return frame_end; >> +} > > Same thing above? Removed the unused #ifdef. > >> + >> /* >> * Walks up the stack frames to make sure that the specified object is >> * entirely contained by a single stack frame. >> @@ -25,31 +56,31 @@ int arch_within_stack_frames(const void * const stack, >> const void * const stackend, >> const void *obj, unsigned long len) >> { >> -#if defined(CONFIG_FRAME_POINTER) >> - const void *frame = NULL; >> - const void *oldframe; >> - >> - oldframe = __builtin_frame_address(2); >> - if (oldframe) >> - frame = __builtin_frame_address(3); >> +#if defined(CONFIG_UNWINDER_FRAME_POINTER) >> + struct unwind_state state; >> + void *prev_frame_end = NULL; >> /* >> -* low --> high >> -* [saved bp][saved ip][args][local vars][saved bp][saved ip] >> -* ^^ >> -* allow copies only within here > > I think it's worth keeping this diagram: it explains what region is > being checked... Kept the comment in v2 patch. > >> +* Skip 3 non-inlined frames: arch_within_stack_frames(), >> +* check_stack_object() and __check_object_size(). >> +* >> */ >> - while (stack <= frame && frame < stackend) { >> - /* >> -* If obj + len extends past the last frame, this >> -* check won't pass and the next frame will be 0, >> -* causing us to bail out and correctly report >> -* the copy as invalid. >> -*/ > > Also seems like we should keep the comment
Re: [PATCH V2 3/9] dt-bindings: Tegra186 tachometer device tree bindings
Rob, this binding is for a specific IP block (for measuring/aggregating input pulses) on the Tegra186 SoC, so I don't think it fits into any generic binding. Thanks, Mikko On 03/27/2018 05:52 PM, Rob Herring wrote: On Wed, Mar 21, 2018 at 10:10:38AM +0530, Rajkumar Rampelli wrote: Supply Device tree binding documentation for the NVIDIA Tegra186 SoC's Tachometer Controller Signed-off-by: Rajkumar Rampelli--- V2: Renamed compatible string to "nvidia,tegra186-pwm-tachometer" Renamed dt property values of clock-names and reset-names to "tachometer" from "tach" Read my prior comments on v1. Rob -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 3/9] dt-bindings: Tegra186 tachometer device tree bindings
Rob, this binding is for a specific IP block (for measuring/aggregating input pulses) on the Tegra186 SoC, so I don't think it fits into any generic binding. Thanks, Mikko On 03/27/2018 05:52 PM, Rob Herring wrote: On Wed, Mar 21, 2018 at 10:10:38AM +0530, Rajkumar Rampelli wrote: Supply Device tree binding documentation for the NVIDIA Tegra186 SoC's Tachometer Controller Signed-off-by: Rajkumar Rampelli --- V2: Renamed compatible string to "nvidia,tegra186-pwm-tachometer" Renamed dt property values of clock-names and reset-names to "tachometer" from "tach" Read my prior comments on v1. Rob -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted
Hi RongQing, Is there an local x2apic whose ID is 0x in your machine? At 04/08/2018 07:38 PM, Li RongQing wrote: local_apic_id of acpi_madt_local_x2apic is u32, it is converted to int when checked by default_apic_id_valid() and return true if it is larger than 0x7fff, this is wrong For x2apic enabled systems, - the byte length of X2APIC ID is 4, and it can be larger than 0x7fff in theory - the ->apic_id_valid points to x2apic_apic_id_valid(), which always return _ture_ , not default_apic_id_valid(). Thanks, dou and if local_apic_id is invalid, we should prevent it from being accounted > This fixes a bug that Purley platform displays too many possible cpu Signed-off-by: Li RongQingCc: Peter Zijlstra Cc: Thomas Gleixner Cc: Dou Liyang --- arch/x86/include/asm/apic.h | 4 ++-- arch/x86/kernel/acpi/boot.c | 10 ++ arch/x86/kernel/apic/apic_common.c | 2 +- arch/x86/kernel/apic/apic_numachip.c | 2 +- arch/x86/kernel/apic/x2apic.h| 2 +- arch/x86/kernel/apic/x2apic_phys.c | 2 +- arch/x86/kernel/apic/x2apic_uv_x.c | 2 +- arch/x86/xen/apic.c | 2 +- 8 files changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 40a3d3642f3a..08acd954f00e 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -313,7 +313,7 @@ struct apic { /* Probe, setup and smpboot functions */ int (*probe)(void); int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id); - int (*apic_id_valid)(int apicid); + int (*apic_id_valid)(u32 apicid); int (*apic_id_registered)(void); bool (*check_apicid_used)(physid_mask_t *map, int apicid); @@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void) return apic->get_apic_id(reg); } -extern int default_apic_id_valid(int apicid); +extern int default_apic_id_valid(u32 apicid); extern int default_acpi_madt_oem_check(char *, char *); extern void default_setup_apic_routing(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 7a37d9357bc4..7412564dc2a7 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) { struct acpi_madt_local_x2apic *processor = NULL; #ifdef CONFIG_X86_X2APIC - int apic_id; + u32 apic_id; u8 enabled; #endif @@ -222,10 +222,12 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) * to not preallocating memory for all NR_CPUS * when we use CPU hotplug. */ - if (!apic->apic_id_valid(apic_id) && enabled) + if (!apic->apic_id_valid(apic_id)) { printk(KERN_WARNING PREFIX "x2apic entry ignored\n"); - else - acpi_register_lapic(apic_id, processor->uid, enabled); + return 0; + } + + acpi_register_lapic(apic_id, processor->uid, enabled); #else printk(KERN_WARNING PREFIX "x2apic entry ignored\n"); #endif diff --git a/arch/x86/kernel/apic/apic_common.c b/arch/x86/kernel/apic/apic_common.c index a360801779ae..02b4839478b1 100644 --- a/arch/x86/kernel/apic/apic_common.c +++ b/arch/x86/kernel/apic/apic_common.c @@ -40,7 +40,7 @@ int default_check_phys_apicid_present(int phys_apicid) return physid_isset(phys_apicid, phys_cpu_present_map); } -int default_apic_id_valid(int apicid) +int default_apic_id_valid(u32 apicid) { return (apicid < 255); } diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c index 134e04506ab4..78778b54f904 100644 --- a/arch/x86/kernel/apic/apic_numachip.c +++ b/arch/x86/kernel/apic/apic_numachip.c @@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigned int id) return id << 24; } -static int numachip_apic_id_valid(int apicid) +static int numachip_apic_id_valid(u32 apicid) { /* Trust what bootloader passes in MADT */ return 1; diff --git a/arch/x86/kernel/apic/x2apic.h b/arch/x86/kernel/apic/x2apic.h index b107de381cb5..a49b3604027f 100644 --- a/arch/x86/kernel/apic/x2apic.h +++ b/arch/x86/kernel/apic/x2apic.h @@ -1,6 +1,6 @@ /* Common bits for X2APIC cluster/physical modes. */ -int x2apic_apic_id_valid(int apicid); +int x2apic_apic_id_valid(u32 apicid); int x2apic_apic_id_registered(void); void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest); unsigned int x2apic_get_apic_id(unsigned long id); diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index e2829bf40e4a..b5cf9e7b3830 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -101,7
Re: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted
Hi RongQing, Is there an local x2apic whose ID is 0x in your machine? At 04/08/2018 07:38 PM, Li RongQing wrote: local_apic_id of acpi_madt_local_x2apic is u32, it is converted to int when checked by default_apic_id_valid() and return true if it is larger than 0x7fff, this is wrong For x2apic enabled systems, - the byte length of X2APIC ID is 4, and it can be larger than 0x7fff in theory - the ->apic_id_valid points to x2apic_apic_id_valid(), which always return _ture_ , not default_apic_id_valid(). Thanks, dou and if local_apic_id is invalid, we should prevent it from being accounted > This fixes a bug that Purley platform displays too many possible cpu Signed-off-by: Li RongQing Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Dou Liyang --- arch/x86/include/asm/apic.h | 4 ++-- arch/x86/kernel/acpi/boot.c | 10 ++ arch/x86/kernel/apic/apic_common.c | 2 +- arch/x86/kernel/apic/apic_numachip.c | 2 +- arch/x86/kernel/apic/x2apic.h| 2 +- arch/x86/kernel/apic/x2apic_phys.c | 2 +- arch/x86/kernel/apic/x2apic_uv_x.c | 2 +- arch/x86/xen/apic.c | 2 +- 8 files changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 40a3d3642f3a..08acd954f00e 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -313,7 +313,7 @@ struct apic { /* Probe, setup and smpboot functions */ int (*probe)(void); int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id); - int (*apic_id_valid)(int apicid); + int (*apic_id_valid)(u32 apicid); int (*apic_id_registered)(void); bool (*check_apicid_used)(physid_mask_t *map, int apicid); @@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void) return apic->get_apic_id(reg); } -extern int default_apic_id_valid(int apicid); +extern int default_apic_id_valid(u32 apicid); extern int default_acpi_madt_oem_check(char *, char *); extern void default_setup_apic_routing(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 7a37d9357bc4..7412564dc2a7 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) { struct acpi_madt_local_x2apic *processor = NULL; #ifdef CONFIG_X86_X2APIC - int apic_id; + u32 apic_id; u8 enabled; #endif @@ -222,10 +222,12 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) * to not preallocating memory for all NR_CPUS * when we use CPU hotplug. */ - if (!apic->apic_id_valid(apic_id) && enabled) + if (!apic->apic_id_valid(apic_id)) { printk(KERN_WARNING PREFIX "x2apic entry ignored\n"); - else - acpi_register_lapic(apic_id, processor->uid, enabled); + return 0; + } + + acpi_register_lapic(apic_id, processor->uid, enabled); #else printk(KERN_WARNING PREFIX "x2apic entry ignored\n"); #endif diff --git a/arch/x86/kernel/apic/apic_common.c b/arch/x86/kernel/apic/apic_common.c index a360801779ae..02b4839478b1 100644 --- a/arch/x86/kernel/apic/apic_common.c +++ b/arch/x86/kernel/apic/apic_common.c @@ -40,7 +40,7 @@ int default_check_phys_apicid_present(int phys_apicid) return physid_isset(phys_apicid, phys_cpu_present_map); } -int default_apic_id_valid(int apicid) +int default_apic_id_valid(u32 apicid) { return (apicid < 255); } diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c index 134e04506ab4..78778b54f904 100644 --- a/arch/x86/kernel/apic/apic_numachip.c +++ b/arch/x86/kernel/apic/apic_numachip.c @@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigned int id) return id << 24; } -static int numachip_apic_id_valid(int apicid) +static int numachip_apic_id_valid(u32 apicid) { /* Trust what bootloader passes in MADT */ return 1; diff --git a/arch/x86/kernel/apic/x2apic.h b/arch/x86/kernel/apic/x2apic.h index b107de381cb5..a49b3604027f 100644 --- a/arch/x86/kernel/apic/x2apic.h +++ b/arch/x86/kernel/apic/x2apic.h @@ -1,6 +1,6 @@ /* Common bits for X2APIC cluster/physical modes. */ -int x2apic_apic_id_valid(int apicid); +int x2apic_apic_id_valid(u32 apicid); int x2apic_apic_id_registered(void); void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest); unsigned int x2apic_get_apic_id(unsigned long id); diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index e2829bf40e4a..b5cf9e7b3830 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -101,7 +101,7 @@ static int x2apic_phys_probe(void) } /* Common x2apic functions, also used
[lkp-robot] [init, tracing] 2580d6b795: BUG:kernel_reboot-without-warning_in_boot_stage
FYI, we noticed the following commit (built with gcc-7): commit: 2580d6b795e25879c825a0891cf67390f665b11f ("init, tracing: Have printk come through the trace events for initcall_debug") url: https://github.com/0day-ci/linux/commits/Steven-Rostedt/init-tracing/20180407-130743 in testcase: boot on test machine: qemu-system-x86_64 -enable-kvm -cpu Nehalem -smp 2 -m 512M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | ecf6709d07 | 2580d6b795 | +--+++ | boot_successes | 0 | 0 | | boot_failures| 8 | 8 | | invoked_oom-killer:gfp_mask=0x | 8 || | Mem-Info | 8 || | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 8 || | BUG:kernel_reboot-without-warning_in_boot_stage | 0 | 8 | +--+++ [0.00] RAMDISK: [mem 0x1b7e2000-0x1ffc] [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x000F6860 14 (v00 BOCHS ) [0.00] ACPI: RSDT 0x1FFE1628 30 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.00] ACPI: FACP 0x1FFE147C 74 (v01 BOCHS BXPCFACP 0001 BXPC 0001) BUG: kernel reboot-without-warning in boot stage Elapsed time: 10 #!/bin/bash To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k job-script # job-script is attached in this email Thanks, Xiaolong # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 4.16.0 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_KASAN_SHADOW_OFFSET=0xdc00 CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set CONFIG_KERNEL_LZO=y # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" # CONFIG_SWAP is not set # CONFIG_SYSVIPC is not set CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_CHIP=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_GENERIC_IRQ_DEBUGFS=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not
[lkp-robot] [init, tracing] 2580d6b795: BUG:kernel_reboot-without-warning_in_boot_stage
FYI, we noticed the following commit (built with gcc-7): commit: 2580d6b795e25879c825a0891cf67390f665b11f ("init, tracing: Have printk come through the trace events for initcall_debug") url: https://github.com/0day-ci/linux/commits/Steven-Rostedt/init-tracing/20180407-130743 in testcase: boot on test machine: qemu-system-x86_64 -enable-kvm -cpu Nehalem -smp 2 -m 512M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | ecf6709d07 | 2580d6b795 | +--+++ | boot_successes | 0 | 0 | | boot_failures| 8 | 8 | | invoked_oom-killer:gfp_mask=0x | 8 || | Mem-Info | 8 || | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 8 || | BUG:kernel_reboot-without-warning_in_boot_stage | 0 | 8 | +--+++ [0.00] RAMDISK: [mem 0x1b7e2000-0x1ffc] [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x000F6860 14 (v00 BOCHS ) [0.00] ACPI: RSDT 0x1FFE1628 30 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.00] ACPI: FACP 0x1FFE147C 74 (v01 BOCHS BXPCFACP 0001 BXPC 0001) BUG: kernel reboot-without-warning in boot stage Elapsed time: 10 #!/bin/bash To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k job-script # job-script is attached in this email Thanks, Xiaolong # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 4.16.0 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_KASAN_SHADOW_OFFSET=0xdc00 CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set CONFIG_KERNEL_LZO=y # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" # CONFIG_SWAP is not set # CONFIG_SYSVIPC is not set CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_CHIP=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_GENERIC_IRQ_DEBUGFS=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not
Re: [GIT PULL] SELinux patches for v4.17
On Mon, Apr 9, 2018 at 6:44 AM, Richard Haineswrote: > On Sun, 2018-04-08 at 19:59 +0100, Richard Haines via Selinux wrote: >> On Mon, 2018-04-09 at 01:43 +0800, Xin Long wrote: >> > On Sun, Apr 8, 2018 at 10:09 PM, Richard Haines >> > wrote: >> > > On Sun, 2018-04-08 at 08:50 -0400, Paul Moore wrote: >> > > > On April 7, 2018 1:03:57 PM Linus Torvalds > > > > da >> > > > tion >> > > > .org> wrote: >> > > > On Sat, Apr 7, 2018 at 9:54 AM, Richard Haines >> > > > wrote: >> > > > >> > > > So please check my resolution, but also somebody should tell me >> > > > "Linus, you're a cretin, sctp_connect() doesn't want that >> > > > security_sctp_bind_connect() at all because it was already done >> > > > by >> > > > XYZ" >> > > > >> > > > sctp_connect() or __sctp_connect() do not need to call >> > > > security_sctp_bind_connect(). This is because the connect(2) >> > > > call >> > > > will >> > > > handle the checks required via security_socket_connect(): >> > > > >> > > > Ok, thanks, that's exactly what I wanted to get. >> > > > >> > > > Anyway, somebody should still verify that it all looks good in >> > > > my >> > > > tree, but I don't actually expect the merge to have had any >> > > > issues >> > > > even if the refactoring made it a bit more complex than most >> > > > merges >> > > > are. >> > > > >> > > > Thanks for the quick response Richard. >> > > > >> > > > Xin Long looked it over and gave it the thumbs up, I'll take a >> > > > look >> > > > too, but to be honest I trust his SCTP understanding much more >> > > > than >> > > > mine. I also do weekly tests of each rcX release at a minimum >> > > > so >> > > > if >> > > > something odd pops up I'll make sure you get a fix. >> > > > >> > > > Thanks again everyone. >> > > >> > > I built the kernel this morning and sorry to spoil the party, but >> > > I've >> > > run into a problem with lksctp-tools when running the func_tests: >> > > >> > > make v6test >> > > .. >> > > .. >> > > ./test_timetolive_v6 >> > > test_timetolive.c 0 INFO : Creating fillmsg of size 3087 >> > > test_timetolive.c 1 PASS : Send a message with timeout >> > > test_timetolive.c 2 PASS : Send a message with no timeout >> > > test_timetolive.c 3 PASS : Send a fragmented message with >> > > timeout >> > > test_timetolive.c 0 INFO : ** SLEEPING for 3 seconds ** >> > > test_timetolive.c 4 BROK : Got a datamsg of unexpected >> > > length:23, >> > > expected length:27 >> > > DUMP_CORE sctputil.c: 247 >> > > /bin/sh: line 1: 30981 Segmentation fault (core dumped) ./$a >> > > test_timetolive_v6 fails >> > > >> > > make v4 test fails the same way. I'm using lksctp-tools from [1]. >> > > I >> > > have not investigated the cause yet as just found this and >> > > thought >> > > I >> > > should flag first just in case someone has the answer !!! >> > >> > test_timetolive(_v6) works for me, In lksctp-tools/src/func_tests, >> > I >> > had >> > another case failed,./test_1_to_1_events, it's caused by: >> > commit 30f6ebf65bc46161c5aaff1db2e6e7c76aa4a06b >> > Author: Xin Long >> > Date: Wed Mar 14 19:05:34 2018 +0800 >> > >> > sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT >> > >> > It's not kernel's issue, after that commit, ./test_1_to_1_events >> > should >> > have been improved. or avoid it by 'sysctl -w >> > net.sctp.auth_enable=1' >> > >> > I'm not sure why test_timetolive(_v6) is not working in your env. >> >> It appears to depend on the run sequence of the tests. I rebooted the >> system, ran test_timetolive_v6, it worked okay. >> Ran "sctp-tests run" on a terminal, then ran test_timetolive_v6 at >> various intervals on another terminal. Once sctp-tests started the >> "=== >> ndatasched ===" sequence, test_timetolive_v6 failed. > > 1) When SCTP is initialised /proc/sys/net/sctp/prsctp_enable = 1 > 2) When sctp-tests/testcase/regression/extoverflow/test.sh is executed, > on exit it sets prsctp_enable = 0. This seems to be causing the issue > I'm seeing. I can now simulate the problem: > > Running from fresh boot: > checksctp > cat /proc/sys/net/sctp/prsctp_enable > 1 > ./test_timetolive_v6 > passes > echo 0 > /proc/sys/net/sctp/prsctp_enable > ./test_timetolive_v6 > fails > echo 1 > /proc/sys/net/sctp/prsctp_enable > ./test_timetolive_v6 > passes I see ... commit 8ae808eb853e3789b81b8a502cdf22bb01b76880 Author: Xin Long Date: Sat Oct 8 11:40:16 2016 +0800 sctp: remove the old ttl expires policy ttl expire is considered as one of the prsctp policies after this commit, so prsctp_enable is required. I will think to update this test case in lksctp-tools. Thanks for the reproducer.
Re: [GIT PULL] SELinux patches for v4.17
On Mon, Apr 9, 2018 at 6:44 AM, Richard Haines wrote: > On Sun, 2018-04-08 at 19:59 +0100, Richard Haines via Selinux wrote: >> On Mon, 2018-04-09 at 01:43 +0800, Xin Long wrote: >> > On Sun, Apr 8, 2018 at 10:09 PM, Richard Haines >> > wrote: >> > > On Sun, 2018-04-08 at 08:50 -0400, Paul Moore wrote: >> > > > On April 7, 2018 1:03:57 PM Linus Torvalds > > > > da >> > > > tion >> > > > .org> wrote: >> > > > On Sat, Apr 7, 2018 at 9:54 AM, Richard Haines >> > > > wrote: >> > > > >> > > > So please check my resolution, but also somebody should tell me >> > > > "Linus, you're a cretin, sctp_connect() doesn't want that >> > > > security_sctp_bind_connect() at all because it was already done >> > > > by >> > > > XYZ" >> > > > >> > > > sctp_connect() or __sctp_connect() do not need to call >> > > > security_sctp_bind_connect(). This is because the connect(2) >> > > > call >> > > > will >> > > > handle the checks required via security_socket_connect(): >> > > > >> > > > Ok, thanks, that's exactly what I wanted to get. >> > > > >> > > > Anyway, somebody should still verify that it all looks good in >> > > > my >> > > > tree, but I don't actually expect the merge to have had any >> > > > issues >> > > > even if the refactoring made it a bit more complex than most >> > > > merges >> > > > are. >> > > > >> > > > Thanks for the quick response Richard. >> > > > >> > > > Xin Long looked it over and gave it the thumbs up, I'll take a >> > > > look >> > > > too, but to be honest I trust his SCTP understanding much more >> > > > than >> > > > mine. I also do weekly tests of each rcX release at a minimum >> > > > so >> > > > if >> > > > something odd pops up I'll make sure you get a fix. >> > > > >> > > > Thanks again everyone. >> > > >> > > I built the kernel this morning and sorry to spoil the party, but >> > > I've >> > > run into a problem with lksctp-tools when running the func_tests: >> > > >> > > make v6test >> > > .. >> > > .. >> > > ./test_timetolive_v6 >> > > test_timetolive.c 0 INFO : Creating fillmsg of size 3087 >> > > test_timetolive.c 1 PASS : Send a message with timeout >> > > test_timetolive.c 2 PASS : Send a message with no timeout >> > > test_timetolive.c 3 PASS : Send a fragmented message with >> > > timeout >> > > test_timetolive.c 0 INFO : ** SLEEPING for 3 seconds ** >> > > test_timetolive.c 4 BROK : Got a datamsg of unexpected >> > > length:23, >> > > expected length:27 >> > > DUMP_CORE sctputil.c: 247 >> > > /bin/sh: line 1: 30981 Segmentation fault (core dumped) ./$a >> > > test_timetolive_v6 fails >> > > >> > > make v4 test fails the same way. I'm using lksctp-tools from [1]. >> > > I >> > > have not investigated the cause yet as just found this and >> > > thought >> > > I >> > > should flag first just in case someone has the answer !!! >> > >> > test_timetolive(_v6) works for me, In lksctp-tools/src/func_tests, >> > I >> > had >> > another case failed,./test_1_to_1_events, it's caused by: >> > commit 30f6ebf65bc46161c5aaff1db2e6e7c76aa4a06b >> > Author: Xin Long >> > Date: Wed Mar 14 19:05:34 2018 +0800 >> > >> > sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT >> > >> > It's not kernel's issue, after that commit, ./test_1_to_1_events >> > should >> > have been improved. or avoid it by 'sysctl -w >> > net.sctp.auth_enable=1' >> > >> > I'm not sure why test_timetolive(_v6) is not working in your env. >> >> It appears to depend on the run sequence of the tests. I rebooted the >> system, ran test_timetolive_v6, it worked okay. >> Ran "sctp-tests run" on a terminal, then ran test_timetolive_v6 at >> various intervals on another terminal. Once sctp-tests started the >> "=== >> ndatasched ===" sequence, test_timetolive_v6 failed. > > 1) When SCTP is initialised /proc/sys/net/sctp/prsctp_enable = 1 > 2) When sctp-tests/testcase/regression/extoverflow/test.sh is executed, > on exit it sets prsctp_enable = 0. This seems to be causing the issue > I'm seeing. I can now simulate the problem: > > Running from fresh boot: > checksctp > cat /proc/sys/net/sctp/prsctp_enable > 1 > ./test_timetolive_v6 > passes > echo 0 > /proc/sys/net/sctp/prsctp_enable > ./test_timetolive_v6 > fails > echo 1 > /proc/sys/net/sctp/prsctp_enable > ./test_timetolive_v6 > passes I see ... commit 8ae808eb853e3789b81b8a502cdf22bb01b76880 Author: Xin Long Date: Sat Oct 8 11:40:16 2016 +0800 sctp: remove the old ttl expires policy ttl expire is considered as one of the prsctp policies after this commit, so prsctp_enable is required. I will think to update this test case in lksctp-tools. Thanks for the reproducer.
Re: [PATCH v2 4/4] clk: qcom: Add Global Clock controller (GCC) driver for SDM845
On 2018-04-06 04:27, Stephen Boyd wrote: Quoting Amit Nischal (2018-04-03 05:24:41) On 2018-03-20 06:12, Stephen Boyd wrote: > Quoting Amit Nischal (2018-03-07 23:18:15) >> +}; >> + >> +static struct clk_rcg2 gcc_sdcc4_apps_clk_src = { >> + .cmd_rcgr = 0x1600c, >> + .mnd_width = 8, >> + .hid_width = 5, >> + .parent_map = gcc_parent_map_0, >> + .freq_tbl = ftbl_gcc_sdcc4_apps_clk_src, >> + .safe_src_freq_tbl = _safe_src_f, > > Why does sdcc have safe src stuff? Is something turning on the sdcc clk > outside of our control? I will get more details on this and will get back. Any news? I am removing the safe src for SDCC, but I am trying to get details from teams as to why this was added, if it would be required I will add back the safe src index again and submit the patch. > >> + .clkr.hw.init = &(struct clk_init_data){ >> + .name = "gcc_sdcc4_apps_clk_src", >> + .parent_names = gcc_parent_names_0, >> + .num_parents = 4, >> + .flags = CLK_SET_RATE_PARENT, >> + .ops = _rcg2_shared_ops, >> + }, >> +}; >> + > [...] >> + >> +static struct clk_branch gcc_video_xo_clk = { >> + .halt_reg = 0xb028, >> + .halt_check = BRANCH_HALT, >> + .clkr = { >> + .enable_reg = 0xb028, >> + .enable_mask = BIT(0), >> + .hw.init = &(struct clk_init_data){ >> + .name = "gcc_video_xo_clk", >> + .flags = CLK_IS_CRITICAL, >> + .ops = _branch2_ops, >> + > > These things have no parents and we mark them critical. Why are we > even exposing them to the kernel? Are they not on by default? Are we > going to change these to non-critical at some point in the future? These clocks are not enabled by default and going to video or other multimedia cores so we are marking them as critical and need to expose to the kernel. As of now, there is no plan to change these to non-critical. Ok. Can we open code enabling these branches in the driver probe then? Still seems wasteful if nobody uses these. Put another way, either a driver (or other clk controller) should be toggling these gates at runtime or we should enable them once and leave them out of the framework. If the driver approach is taken, then the drivers should be able to turn the clks on and off to save some power. As of now, no client driver is taking care of toggling these gates at runtime. We want these clocks to be always on and that's why marked them as CRITICAL so that if any user tries to unprepare/disable then it won't happen and framework generates the warning. Once the client drivers will take care of above, then we will submit a cleanup patch. -- To unsubscribe from this list: send the line "unsubscribe linux-clk" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 4/4] clk: qcom: Add Global Clock controller (GCC) driver for SDM845
On 2018-04-06 04:27, Stephen Boyd wrote: Quoting Amit Nischal (2018-04-03 05:24:41) On 2018-03-20 06:12, Stephen Boyd wrote: > Quoting Amit Nischal (2018-03-07 23:18:15) >> +}; >> + >> +static struct clk_rcg2 gcc_sdcc4_apps_clk_src = { >> + .cmd_rcgr = 0x1600c, >> + .mnd_width = 8, >> + .hid_width = 5, >> + .parent_map = gcc_parent_map_0, >> + .freq_tbl = ftbl_gcc_sdcc4_apps_clk_src, >> + .safe_src_freq_tbl = _safe_src_f, > > Why does sdcc have safe src stuff? Is something turning on the sdcc clk > outside of our control? I will get more details on this and will get back. Any news? I am removing the safe src for SDCC, but I am trying to get details from teams as to why this was added, if it would be required I will add back the safe src index again and submit the patch. > >> + .clkr.hw.init = &(struct clk_init_data){ >> + .name = "gcc_sdcc4_apps_clk_src", >> + .parent_names = gcc_parent_names_0, >> + .num_parents = 4, >> + .flags = CLK_SET_RATE_PARENT, >> + .ops = _rcg2_shared_ops, >> + }, >> +}; >> + > [...] >> + >> +static struct clk_branch gcc_video_xo_clk = { >> + .halt_reg = 0xb028, >> + .halt_check = BRANCH_HALT, >> + .clkr = { >> + .enable_reg = 0xb028, >> + .enable_mask = BIT(0), >> + .hw.init = &(struct clk_init_data){ >> + .name = "gcc_video_xo_clk", >> + .flags = CLK_IS_CRITICAL, >> + .ops = _branch2_ops, >> + > > These things have no parents and we mark them critical. Why are we > even exposing them to the kernel? Are they not on by default? Are we > going to change these to non-critical at some point in the future? These clocks are not enabled by default and going to video or other multimedia cores so we are marking them as critical and need to expose to the kernel. As of now, there is no plan to change these to non-critical. Ok. Can we open code enabling these branches in the driver probe then? Still seems wasteful if nobody uses these. Put another way, either a driver (or other clk controller) should be toggling these gates at runtime or we should enable them once and leave them out of the framework. If the driver approach is taken, then the drivers should be able to turn the clks on and off to save some power. As of now, no client driver is taking care of toggling these gates at runtime. We want these clocks to be always on and that's why marked them as CRITICAL so that if any user tries to unprepare/disable then it won't happen and framework generates the warning. Once the client drivers will take care of above, then we will submit a cleanup patch. -- To unsubscribe from this list: send the line "unsubscribe linux-clk" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time
Sargun Dhillon wrote: > > Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and > > the exception in randomize_layout_plugin.c because preventing module > > unloading won't work as expected. > > > > Rather than completely removing the unloading code, might it make > sense to add a BUG_ON or WARN_ON, in security_delete_hooks if > allow_unload_module is false, and owner is not NULL? Do we need to check ->owner != NULL? Although it will be true that SELinux's ->owner == NULL and LKM-based LSM module's ->owner != NULL, I think we unregister SELinux before setting allow_unload_module to false. Thus, rejecting delete_security_hooks() if allow_unload_module == false will be sufficient. SELinux might want to call panic() if delete_security_hooks() did not unregister due to allow_unload_module == false. Also, allow_unload_module would be renamed to allow_unregister_module. By the way, please don't use BUG_ON() or WARN_ON() because syzbot would hit and call panic() because syzbot runs tests with panic_on_warn == true.
Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time
Sargun Dhillon wrote: > > Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and > > the exception in randomize_layout_plugin.c because preventing module > > unloading won't work as expected. > > > > Rather than completely removing the unloading code, might it make > sense to add a BUG_ON or WARN_ON, in security_delete_hooks if > allow_unload_module is false, and owner is not NULL? Do we need to check ->owner != NULL? Although it will be true that SELinux's ->owner == NULL and LKM-based LSM module's ->owner != NULL, I think we unregister SELinux before setting allow_unload_module to false. Thus, rejecting delete_security_hooks() if allow_unload_module == false will be sufficient. SELinux might want to call panic() if delete_security_hooks() did not unregister due to allow_unload_module == false. Also, allow_unload_module would be renamed to allow_unregister_module. By the way, please don't use BUG_ON() or WARN_ON() because syzbot would hit and call panic() because syzbot runs tests with panic_on_warn == true.
Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
On 07.04.2018 9:18, Alexey Budankov wrote: > On 06.04.2018 22:53, Andi Kleen wrote: >> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote: >>> On 06.04.2018 18:31, Andi Kleen wrote: > diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c > index e47b2dbbdef3..9284048cf5b0 100644 > --- a/arch/x86/kernel/perf_regs.c > +++ b/arch/x86/kernel/perf_regs.c > @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user, >*/ > regs_user_copy->bx = -1; > regs_user_copy->bp = -1; > + if (user_64bit_mode(user_regs)) { Why is it 64bit only? Should work on 32bit too. >>> >>> bp register is a part of i386 syscall ABI >>> (http://man7.org/linux/man-pages/man2/syscall.2.html) >>> so not sure if it will make any sense for 32bit processes. >> >> Both 32bit and 64bit use the same frame pointer, if they >> use frame pointer. > > Well let me check the same scenario for 32bit binary. Here is what I have when profiling 32bit process on the patched 64bit kernel w/o 32bit frame-pointer exposure: vmlinux ! try_to_wake_up - [unknown source file] vmlinux ! wake_up_q + 0x3e - [unknown source file] vmlinux ! futex_wake + 0x141 - [unknown source file] vmlinux ! do_futex + 0x49b - [unknown source file] vmlinux ! compat_SyS_futex + 0x123 - [unknown source file] vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file] vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file] ==> [vdso] ! __kernel_vsyscall + 0x8 - [unknown source file] ==> libc-2.26.so ! syscall + 0x26 - [unknown source file] ==> futex32-fp ! main + 0xba - [unknown source file] ==> libc-2.26.so ! __libc_start_main + 0xf2 - [unknown source file] so stack is unwound till the top. However if I enable 32bit exposure then the stack looks like this: vmlinux ! try_to_wake_up - [unknown source file] vmlinux ! wake_up_q + 0x3e - [unknown source file] vmlinux ! futex_wake + 0x141 - [unknown source file] vmlinux ! do_futex + 0x49b - [unknown source file] vmlinux ! compat_SyS_futex + 0x123 - [unknown source file] vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file] vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file] ==> [vdso] ! [vdso] + 0x1058 - [unknown source file] ==> vmlinux ! [Skipped stack frame(s)] + 0x1 - [unknown source file] and x86_64 perf report --stdio shows this: ... unwind: target platform=x86 is not supported ... # Samples: 140K of event 'cycles' # Event count (approx.): 93688193797 # # Children Self Command Shared Object Symbol # .. . # 86.00%14.40% futex32-fp [kernel.vmlinux] [k] entry_SYSENTER_compat | ---entry_SYSENTER_compat | --71.60%--do_fast_syscall_32 | |--54.62%--compat_sys_futex | | | --53.67%--do_futex I am not sure it is worth exposing frame pointer for 32bit too. -Alexey > If the issue exists for it too and is fixed by the exposing bp > then it is obviously worth this improvement. > > -Alexey > >> >> -Andi >> > >
Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
On 07.04.2018 9:18, Alexey Budankov wrote: > On 06.04.2018 22:53, Andi Kleen wrote: >> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote: >>> On 06.04.2018 18:31, Andi Kleen wrote: > diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c > index e47b2dbbdef3..9284048cf5b0 100644 > --- a/arch/x86/kernel/perf_regs.c > +++ b/arch/x86/kernel/perf_regs.c > @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user, >*/ > regs_user_copy->bx = -1; > regs_user_copy->bp = -1; > + if (user_64bit_mode(user_regs)) { Why is it 64bit only? Should work on 32bit too. >>> >>> bp register is a part of i386 syscall ABI >>> (http://man7.org/linux/man-pages/man2/syscall.2.html) >>> so not sure if it will make any sense for 32bit processes. >> >> Both 32bit and 64bit use the same frame pointer, if they >> use frame pointer. > > Well let me check the same scenario for 32bit binary. Here is what I have when profiling 32bit process on the patched 64bit kernel w/o 32bit frame-pointer exposure: vmlinux ! try_to_wake_up - [unknown source file] vmlinux ! wake_up_q + 0x3e - [unknown source file] vmlinux ! futex_wake + 0x141 - [unknown source file] vmlinux ! do_futex + 0x49b - [unknown source file] vmlinux ! compat_SyS_futex + 0x123 - [unknown source file] vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file] vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file] ==> [vdso] ! __kernel_vsyscall + 0x8 - [unknown source file] ==> libc-2.26.so ! syscall + 0x26 - [unknown source file] ==> futex32-fp ! main + 0xba - [unknown source file] ==> libc-2.26.so ! __libc_start_main + 0xf2 - [unknown source file] so stack is unwound till the top. However if I enable 32bit exposure then the stack looks like this: vmlinux ! try_to_wake_up - [unknown source file] vmlinux ! wake_up_q + 0x3e - [unknown source file] vmlinux ! futex_wake + 0x141 - [unknown source file] vmlinux ! do_futex + 0x49b - [unknown source file] vmlinux ! compat_SyS_futex + 0x123 - [unknown source file] vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file] vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file] ==> [vdso] ! [vdso] + 0x1058 - [unknown source file] ==> vmlinux ! [Skipped stack frame(s)] + 0x1 - [unknown source file] and x86_64 perf report --stdio shows this: ... unwind: target platform=x86 is not supported ... # Samples: 140K of event 'cycles' # Event count (approx.): 93688193797 # # Children Self Command Shared Object Symbol # .. . # 86.00%14.40% futex32-fp [kernel.vmlinux] [k] entry_SYSENTER_compat | ---entry_SYSENTER_compat | --71.60%--do_fast_syscall_32 | |--54.62%--compat_sys_futex | | | --53.67%--do_futex I am not sure it is worth exposing frame pointer for 32bit too. -Alexey > If the issue exists for it too and is fixed by the exposing bp > then it is obviously worth this improvement. > > -Alexey > >> >> -Andi >> > >
Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo
> -邮件原件- > 发件人: David Wang [mailto:davidw...@zhaoxin.com] > 发送时间: 2018年4月8日 17:36 > 收件人: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; > mi...@kernel.org; gre...@linuxfoundation.org; x...@kernel.org; > linux-kernel@vger.kernel.org > 抄送: brucech...@via-alliance.com; cooper...@zhaoxin.com; > qiyuanw...@zhaoxin.com; benjamin...@viatech.com; luke...@viacpu.com; > tim...@zhaoxin.com; David Wang> 主题: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo > > We add this patch to show correct HW features(arch_perfmon, tpr_shadow, > vnmi, flexpriority, ept and vpid) when user execute "cat /proc/cpuinfo". > > Signed-off-by: David Wang > --- > arch/x86/kernel/cpu/centaur.c | 49 > +++ > 1 file changed, 49 insertions(+) > > diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c > index e5ec0f1..969fb8f 100644 > --- a/arch/x86/kernel/cpu/centaur.c > +++ b/arch/x86/kernel/cpu/centaur.c > @@ -112,6 +112,44 @@ static void early_init_centaur(struct cpuinfo_x86 *c) > } > } > > +static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c) { > +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x0020 > +#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x0040 > +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x8000 > +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x0001 > +#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x0002 > +#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x0020 > + > + u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; > + > + clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW); > + clear_cpu_cap(c, X86_FEATURE_VNMI); > + clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); > + clear_cpu_cap(c, X86_FEATURE_EPT); > + clear_cpu_cap(c, X86_FEATURE_VPID); > + > + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); > + msr_ctl = vmx_msr_high | vmx_msr_low; > + > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) > + set_cpu_cap(c, X86_FEATURE_TPR_SHADOW); > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) > + set_cpu_cap(c, X86_FEATURE_VNMI); > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { > + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, > + vmx_msr_low, vmx_msr_high); > + msr_ctl2 = vmx_msr_high | vmx_msr_low; > + if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && > + (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) > + set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); > + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) > + set_cpu_cap(c, X86_FEATURE_EPT); > + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) > + set_cpu_cap(c, X86_FEATURE_VPID); > + } > +} > + > static void init_centaur(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_32 > @@ -128,6 +166,14 @@ static void init_centaur(struct cpuinfo_x86 *c) > clear_cpu_cap(c, 0*32+31); > #endif > early_init_centaur(c); > + > + if (c->cpuid_level > 9) { > + unsigned eax = cpuid_eax(10); > + /* Check for version and the number of counters */ > + if ((eax & 0xff) && (((eax >> 8) & 0xff) > 1)) > + set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON); > + } > + > switch (c->x86) { > #ifdef CONFIG_X86_32 > case 5: > @@ -199,6 +245,9 @@ static void init_centaur(struct cpuinfo_x86 *c) #ifdef > CONFIG_X86_64 > set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC); #endif > + > + if (cpu_has(c, X86_FEATURE_VMX)) > + centaur_detect_vmx_virtcap(c); > } > > #ifdef CONFIG_X86_32 > -- > 1.9.1 Sorry to send to wrong email address. --- David
Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo
> -邮件原件- > 发件人: David Wang [mailto:davidw...@zhaoxin.com] > 发送时间: 2018年4月8日 17:36 > 收件人: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; > mi...@kernel.org; gre...@linuxfoundation.org; x...@kernel.org; > linux-kernel@vger.kernel.org > 抄送: brucech...@via-alliance.com; cooper...@zhaoxin.com; > qiyuanw...@zhaoxin.com; benjamin...@viatech.com; luke...@viacpu.com; > tim...@zhaoxin.com; David Wang > 主题: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo > > We add this patch to show correct HW features(arch_perfmon, tpr_shadow, > vnmi, flexpriority, ept and vpid) when user execute "cat /proc/cpuinfo". > > Signed-off-by: David Wang > --- > arch/x86/kernel/cpu/centaur.c | 49 > +++ > 1 file changed, 49 insertions(+) > > diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c > index e5ec0f1..969fb8f 100644 > --- a/arch/x86/kernel/cpu/centaur.c > +++ b/arch/x86/kernel/cpu/centaur.c > @@ -112,6 +112,44 @@ static void early_init_centaur(struct cpuinfo_x86 *c) > } > } > > +static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c) { > +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x0020 > +#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x0040 > +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x8000 > +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x0001 > +#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x0002 > +#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x0020 > + > + u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; > + > + clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW); > + clear_cpu_cap(c, X86_FEATURE_VNMI); > + clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); > + clear_cpu_cap(c, X86_FEATURE_EPT); > + clear_cpu_cap(c, X86_FEATURE_VPID); > + > + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); > + msr_ctl = vmx_msr_high | vmx_msr_low; > + > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) > + set_cpu_cap(c, X86_FEATURE_TPR_SHADOW); > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) > + set_cpu_cap(c, X86_FEATURE_VNMI); > + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { > + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, > + vmx_msr_low, vmx_msr_high); > + msr_ctl2 = vmx_msr_high | vmx_msr_low; > + if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && > + (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) > + set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); > + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) > + set_cpu_cap(c, X86_FEATURE_EPT); > + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) > + set_cpu_cap(c, X86_FEATURE_VPID); > + } > +} > + > static void init_centaur(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_32 > @@ -128,6 +166,14 @@ static void init_centaur(struct cpuinfo_x86 *c) > clear_cpu_cap(c, 0*32+31); > #endif > early_init_centaur(c); > + > + if (c->cpuid_level > 9) { > + unsigned eax = cpuid_eax(10); > + /* Check for version and the number of counters */ > + if ((eax & 0xff) && (((eax >> 8) & 0xff) > 1)) > + set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON); > + } > + > switch (c->x86) { > #ifdef CONFIG_X86_32 > case 5: > @@ -199,6 +245,9 @@ static void init_centaur(struct cpuinfo_x86 *c) #ifdef > CONFIG_X86_64 > set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC); #endif > + > + if (cpu_has(c, X86_FEATURE_VMX)) > + centaur_detect_vmx_virtcap(c); > } > > #ifdef CONFIG_X86_32 > -- > 1.9.1 Sorry to send to wrong email address. --- David
Re: [PATCH] thermal: devfreq_cooling: add const to struct thermal_cooling_device_ops
Hi srp, Thank you for the patch! Yet something to improve: [auto build test ERROR on thermal/next] [also build test ERROR on v4.16 next-20180406] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/srplinux2008/thermal-devfreq_cooling-add-const-to-struct-thermal_cooling_device_ops/20180409-105457 base: https://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next config: x86_64-randconfig-x010-201814 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers//thermal/devfreq_cooling.c: In function 'of_devfreq_cooling_register_power': >> drivers//thermal/devfreq_cooling.c:522:43: error: assignment of member >> 'get_requested_power' in read-only object devfreq_cooling_ops.get_requested_power = ^ >> drivers//thermal/devfreq_cooling.c:524:35: error: assignment of member >> 'state2power' in read-only object devfreq_cooling_ops.state2power = devfreq_cooling_state2power; ^ >> drivers//thermal/devfreq_cooling.c:525:35: error: assignment of member >> 'power2state' in read-only object devfreq_cooling_ops.power2state = devfreq_cooling_power2state; ^ vim +/get_requested_power +522 drivers//thermal/devfreq_cooling.c a76caf55 Ørjan Eide 2015-09-10 488 a76caf55 Ørjan Eide 2015-09-10 489 /** a76caf55 Ørjan Eide 2015-09-10 490 * of_devfreq_cooling_register_power() - Register devfreq cooling device, a76caf55 Ørjan Eide 2015-09-10 491 * with OF and power information. a76caf55 Ørjan Eide 2015-09-10 492 * @np:Pointer to OF device_node. a76caf55 Ørjan Eide 2015-09-10 493 * @df:Pointer to devfreq device. a76caf55 Ørjan Eide 2015-09-10 494 * @dfc_power: Pointer to devfreq_cooling_power. a76caf55 Ørjan Eide 2015-09-10 495 * a76caf55 Ørjan Eide 2015-09-10 496 * Register a devfreq cooling device. The available OPPs must be a76caf55 Ørjan Eide 2015-09-10 497 * registered on the device. a76caf55 Ørjan Eide 2015-09-10 498 * a76caf55 Ørjan Eide 2015-09-10 499 * If @dfc_power is provided, the cooling device is registered with the a76caf55 Ørjan Eide 2015-09-10 500 * power extensions. For the power extensions to work correctly, a76caf55 Ørjan Eide 2015-09-10 501 * devfreq should use the simple_ondemand governor, other governors a76caf55 Ørjan Eide 2015-09-10 502 * are not currently supported. a76caf55 Ørjan Eide 2015-09-10 503 */ 3c99c2ce Javi Merino2015-11-02 504 struct thermal_cooling_device * a76caf55 Ørjan Eide 2015-09-10 505 of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, a76caf55 Ørjan Eide 2015-09-10 506 struct devfreq_cooling_power *dfc_power) a76caf55 Ørjan Eide 2015-09-10 507 { a76caf55 Ørjan Eide 2015-09-10 508 struct thermal_cooling_device *cdev; a76caf55 Ørjan Eide 2015-09-10 509 struct devfreq_cooling_device *dfc; a76caf55 Ørjan Eide 2015-09-10 510 char dev_name[THERMAL_NAME_LENGTH]; a76caf55 Ørjan Eide 2015-09-10 511 int err; a76caf55 Ørjan Eide 2015-09-10 512 a76caf55 Ørjan Eide 2015-09-10 513 dfc = kzalloc(sizeof(*dfc), GFP_KERNEL); a76caf55 Ørjan Eide 2015-09-10 514 if (!dfc) a76caf55 Ørjan Eide 2015-09-10 515 return ERR_PTR(-ENOMEM); a76caf55 Ørjan Eide 2015-09-10 516 a76caf55 Ørjan Eide 2015-09-10 517 dfc->devfreq = df; a76caf55 Ørjan Eide 2015-09-10 518 a76caf55 Ørjan Eide 2015-09-10 519 if (dfc_power) { a76caf55 Ørjan Eide 2015-09-10 520 dfc->power_ops = dfc_power; a76caf55 Ørjan Eide 2015-09-10 521 a76caf55 Ørjan Eide 2015-09-10 @522 devfreq_cooling_ops.get_requested_power = a76caf55 Ørjan Eide 2015-09-10 523 devfreq_cooling_get_requested_power; a76caf55 Ørjan Eide 2015-09-10 @524 devfreq_cooling_ops.state2power = devfreq_cooling_state2power; a76caf55 Ørjan Eide 2015-09-10 @525 devfreq_cooling_ops.power2state = devfreq_cooling_power2state; a76caf55 Ørjan Eide 2015-09-10 526 } a76caf55 Ørjan Eide 2015-09-10 527 a76caf55 Ørjan Eide 2015-09-10 528 err = devfreq_cooling_gen_tables(dfc); a76caf55 Ørjan Eide 2015-09-10 529 if (err) a76caf55 Ørjan Eide 2015-09-10 530 goto free_dfc; a76caf55 Ørjan Eide 2015-09-10 531 2f96c035 Matthew Wilcox 2016-12-21 532 err = ida_simple_get(_ida, 0, 0, GFP_KERNEL); 2f96c035 Matthew
Re: [PATCH] thermal: devfreq_cooling: add const to struct thermal_cooling_device_ops
Hi srp, Thank you for the patch! Yet something to improve: [auto build test ERROR on thermal/next] [also build test ERROR on v4.16 next-20180406] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/srplinux2008/thermal-devfreq_cooling-add-const-to-struct-thermal_cooling_device_ops/20180409-105457 base: https://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next config: x86_64-randconfig-x010-201814 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers//thermal/devfreq_cooling.c: In function 'of_devfreq_cooling_register_power': >> drivers//thermal/devfreq_cooling.c:522:43: error: assignment of member >> 'get_requested_power' in read-only object devfreq_cooling_ops.get_requested_power = ^ >> drivers//thermal/devfreq_cooling.c:524:35: error: assignment of member >> 'state2power' in read-only object devfreq_cooling_ops.state2power = devfreq_cooling_state2power; ^ >> drivers//thermal/devfreq_cooling.c:525:35: error: assignment of member >> 'power2state' in read-only object devfreq_cooling_ops.power2state = devfreq_cooling_power2state; ^ vim +/get_requested_power +522 drivers//thermal/devfreq_cooling.c a76caf55 Ørjan Eide 2015-09-10 488 a76caf55 Ørjan Eide 2015-09-10 489 /** a76caf55 Ørjan Eide 2015-09-10 490 * of_devfreq_cooling_register_power() - Register devfreq cooling device, a76caf55 Ørjan Eide 2015-09-10 491 * with OF and power information. a76caf55 Ørjan Eide 2015-09-10 492 * @np:Pointer to OF device_node. a76caf55 Ørjan Eide 2015-09-10 493 * @df:Pointer to devfreq device. a76caf55 Ørjan Eide 2015-09-10 494 * @dfc_power: Pointer to devfreq_cooling_power. a76caf55 Ørjan Eide 2015-09-10 495 * a76caf55 Ørjan Eide 2015-09-10 496 * Register a devfreq cooling device. The available OPPs must be a76caf55 Ørjan Eide 2015-09-10 497 * registered on the device. a76caf55 Ørjan Eide 2015-09-10 498 * a76caf55 Ørjan Eide 2015-09-10 499 * If @dfc_power is provided, the cooling device is registered with the a76caf55 Ørjan Eide 2015-09-10 500 * power extensions. For the power extensions to work correctly, a76caf55 Ørjan Eide 2015-09-10 501 * devfreq should use the simple_ondemand governor, other governors a76caf55 Ørjan Eide 2015-09-10 502 * are not currently supported. a76caf55 Ørjan Eide 2015-09-10 503 */ 3c99c2ce Javi Merino2015-11-02 504 struct thermal_cooling_device * a76caf55 Ørjan Eide 2015-09-10 505 of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, a76caf55 Ørjan Eide 2015-09-10 506 struct devfreq_cooling_power *dfc_power) a76caf55 Ørjan Eide 2015-09-10 507 { a76caf55 Ørjan Eide 2015-09-10 508 struct thermal_cooling_device *cdev; a76caf55 Ørjan Eide 2015-09-10 509 struct devfreq_cooling_device *dfc; a76caf55 Ørjan Eide 2015-09-10 510 char dev_name[THERMAL_NAME_LENGTH]; a76caf55 Ørjan Eide 2015-09-10 511 int err; a76caf55 Ørjan Eide 2015-09-10 512 a76caf55 Ørjan Eide 2015-09-10 513 dfc = kzalloc(sizeof(*dfc), GFP_KERNEL); a76caf55 Ørjan Eide 2015-09-10 514 if (!dfc) a76caf55 Ørjan Eide 2015-09-10 515 return ERR_PTR(-ENOMEM); a76caf55 Ørjan Eide 2015-09-10 516 a76caf55 Ørjan Eide 2015-09-10 517 dfc->devfreq = df; a76caf55 Ørjan Eide 2015-09-10 518 a76caf55 Ørjan Eide 2015-09-10 519 if (dfc_power) { a76caf55 Ørjan Eide 2015-09-10 520 dfc->power_ops = dfc_power; a76caf55 Ørjan Eide 2015-09-10 521 a76caf55 Ørjan Eide 2015-09-10 @522 devfreq_cooling_ops.get_requested_power = a76caf55 Ørjan Eide 2015-09-10 523 devfreq_cooling_get_requested_power; a76caf55 Ørjan Eide 2015-09-10 @524 devfreq_cooling_ops.state2power = devfreq_cooling_state2power; a76caf55 Ørjan Eide 2015-09-10 @525 devfreq_cooling_ops.power2state = devfreq_cooling_power2state; a76caf55 Ørjan Eide 2015-09-10 526 } a76caf55 Ørjan Eide 2015-09-10 527 a76caf55 Ørjan Eide 2015-09-10 528 err = devfreq_cooling_gen_tables(dfc); a76caf55 Ørjan Eide 2015-09-10 529 if (err) a76caf55 Ørjan Eide 2015-09-10 530 goto free_dfc; a76caf55 Ørjan Eide 2015-09-10 531 2f96c035 Matthew Wilcox 2016-12-21 532 err = ida_simple_get(_ida, 0, 0, GFP_KERNEL); 2f96c035 Matthew
linux-next: Tree for Apr 9
Hi all, Please do not add any v4.18 destined stuff to your linux-next included trees until after v4.17-rc1 has been released. Changes since 20180406: The vfs tree lost its build failure. The parisc-hd tree still had its build failure for which I applied a patch. The nvdimm tree gained a build failure so I used the version from next-20180406. Non-merge commits (relative to Linus' tree): 1826 1817 files changed, 67325 insertions(+), 33557 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 258 trees (counting Linus' and 44 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (f8cf2f16a7c9 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security) Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add correct dependency to Makefile) Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4) Merging arm-current/fixes (1b8837b61714 ARM: 8750/1: deflate_xip_data.sh: minor fixes) Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery) Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" comment) Merging powerpc-fixes/fixes (52396500f97c powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs) Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (4c7c12e0c9b8 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth) Merging bpf/master (33491588c1fb kernel/bpf/syscall: fix warning defined but not used) Merging ipsec/master (9a3fb9fb84cc xfrm: Fix transport mode skb control buffer usage.) Merging netfilter/master (b9fc828debc8 qede: Fix barrier usage after tx doorbell write.) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (4608f064532c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next) Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging rdma-fixes/for-rc (84652aefb347 RDMA/ucma: Introduce safer rdma_addr_size() variants) Merging sound-current/for-linus (e15dc99dbb9c ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation) Merging pci-current/for-linus (fc110ebdd014 PCI: dwc: Fix enumeration end when reaching root subordinate) Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: add binging for r8a77965) Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add
linux-next: Tree for Apr 9
Hi all, Please do not add any v4.18 destined stuff to your linux-next included trees until after v4.17-rc1 has been released. Changes since 20180406: The vfs tree lost its build failure. The parisc-hd tree still had its build failure for which I applied a patch. The nvdimm tree gained a build failure so I used the version from next-20180406. Non-merge commits (relative to Linus' tree): 1826 1817 files changed, 67325 insertions(+), 33557 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 258 trees (counting Linus' and 44 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (f8cf2f16a7c9 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security) Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add correct dependency to Makefile) Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4) Merging arm-current/fixes (1b8837b61714 ARM: 8750/1: deflate_xip_data.sh: minor fixes) Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery) Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" comment) Merging powerpc-fixes/fixes (52396500f97c powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs) Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (4c7c12e0c9b8 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth) Merging bpf/master (33491588c1fb kernel/bpf/syscall: fix warning defined but not used) Merging ipsec/master (9a3fb9fb84cc xfrm: Fix transport mode skb control buffer usage.) Merging netfilter/master (b9fc828debc8 qede: Fix barrier usage after tx doorbell write.) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (4608f064532c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next) Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging rdma-fixes/for-rc (84652aefb347 RDMA/ucma: Introduce safer rdma_addr_size() variants) Merging sound-current/for-linus (e15dc99dbb9c ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation) Merging pci-current/for-linus (fc110ebdd014 PCI: dwc: Fix enumeration end when reaching root subordinate) Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: add binging for r8a77965) Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add
[PATCH] ipc/shm: fix use-after-free of shm file via remap_file_pages()
From: Eric Biggerssyzbot reported a use-after-free of shm_file_data(file)->file->f_op in shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately it couldn't generate a reproducer, but I found a bug which I think caused it. When remap_file_pages() is passed a full System V shared memory segment, the memory is first unmapped, then a new map is created using the ->vm_file. Between these steps, the shm ID can be removed and reused for a new shm segment. But, shm_mmap() only checks whether the ID is currently valid before calling the underlying file's ->mmap(); it doesn't check whether it was reused. Thus it can use the wrong underlying file, one that was already freed. Fix this by making the "outer" shm file (the one that gets put in ->vm_file) hold a reference to the real shm file, and by making __shm_open() require that the file associated with the shm ID matches the one associated with the "outer" file. Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in shm_mmap()") almost fixed this bug, but it didn't go far enough because it didn't consider the case where the shm ID is reused. The following program usually reproduces this bug: #include #include #include #include int main() { int is_parent = (fork() != 0); srand(getpid()); for (;;) { int id = shmget(0xF00F, 4096, IPC_CREAT|0700); if (is_parent) { void *addr = shmat(id, NULL, 0); usleep(rand() % 50); while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0)); } else { usleep(rand() % 50); shmctl(id, IPC_RMID, NULL); } } } It causes the following NULL pointer dereference due to a 'struct file' being used while it's being freed. (I couldn't actually get a KASAN use-after-free splat like in the syzbot report. But I think it's possible with this bug; it would just take a more extraordinary race...) BUG: unable to handle kernel NULL pointer dereference at 0058 PGD 0 P4D 0 Oops: [#1] SMP NOPTI CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 RIP: 0010:d_inode include/linux/dcache.h:519 [inline] RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724 [...] Call Trace: file_accessed include/linux/fs.h:2063 [inline] shmem_mmap+0x25/0x40 mm/shmem.c:2149 call_mmap include/linux/fs.h:1789 [inline] shm_mmap+0x34/0x80 ipc/shm.c:465 call_mmap include/linux/fs.h:1789 [inline] mmap_region+0x309/0x5b0 mm/mmap.c:1712 do_mmap+0x294/0x4a0 mm/mmap.c:1483 do_mmap_pgoff include/linux/mm.h:2235 [inline] SYSC_remap_file_pages mm/mmap.c:2853 [inline] SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46...@syzkaller.appspotmail.com Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation") Cc: sta...@vger.kernel.org Signed-off-by: Eric Biggers --- ipc/shm.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/ipc/shm.c b/ipc/shm.c index acefe44fefefa..c80c5691a9970 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_struct *vma) if (IS_ERR(shp)) return PTR_ERR(shp); + if (shp->shm_file != sfd->file) { + /* ID was reused */ + shm_unlock(shp); + return -EINVAL; + } + shp->shm_atim = ktime_get_real_seconds(); ipc_update_pid(>shm_lprid, task_tgid(current)); shp->shm_nattch++; @@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma) int ret; /* -* In case of remap_file_pages() emulation, the file can represent -* removed IPC ID: propogate shm_lock() error to caller. +* In case of remap_file_pages() emulation, the file can represent an +* IPC ID that was removed, and possibly even reused by another shm +* segment already. Propagate this case as an error to caller. */ ret = __shm_open(vma); if (ret) @@ -480,6 +487,7 @@ static int shm_release(struct inode *ino, struct file *file) struct shm_file_data *sfd = shm_file_data(file); put_ipc_ns(sfd->ns); + fput(sfd->file); shm_file_data(file) = NULL; kfree(sfd);
[PATCH] ipc/shm: fix use-after-free of shm file via remap_file_pages()
From: Eric Biggers syzbot reported a use-after-free of shm_file_data(file)->file->f_op in shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately it couldn't generate a reproducer, but I found a bug which I think caused it. When remap_file_pages() is passed a full System V shared memory segment, the memory is first unmapped, then a new map is created using the ->vm_file. Between these steps, the shm ID can be removed and reused for a new shm segment. But, shm_mmap() only checks whether the ID is currently valid before calling the underlying file's ->mmap(); it doesn't check whether it was reused. Thus it can use the wrong underlying file, one that was already freed. Fix this by making the "outer" shm file (the one that gets put in ->vm_file) hold a reference to the real shm file, and by making __shm_open() require that the file associated with the shm ID matches the one associated with the "outer" file. Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in shm_mmap()") almost fixed this bug, but it didn't go far enough because it didn't consider the case where the shm ID is reused. The following program usually reproduces this bug: #include #include #include #include int main() { int is_parent = (fork() != 0); srand(getpid()); for (;;) { int id = shmget(0xF00F, 4096, IPC_CREAT|0700); if (is_parent) { void *addr = shmat(id, NULL, 0); usleep(rand() % 50); while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0)); } else { usleep(rand() % 50); shmctl(id, IPC_RMID, NULL); } } } It causes the following NULL pointer dereference due to a 'struct file' being used while it's being freed. (I couldn't actually get a KASAN use-after-free splat like in the syzbot report. But I think it's possible with this bug; it would just take a more extraordinary race...) BUG: unable to handle kernel NULL pointer dereference at 0058 PGD 0 P4D 0 Oops: [#1] SMP NOPTI CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 RIP: 0010:d_inode include/linux/dcache.h:519 [inline] RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724 [...] Call Trace: file_accessed include/linux/fs.h:2063 [inline] shmem_mmap+0x25/0x40 mm/shmem.c:2149 call_mmap include/linux/fs.h:1789 [inline] shm_mmap+0x34/0x80 ipc/shm.c:465 call_mmap include/linux/fs.h:1789 [inline] mmap_region+0x309/0x5b0 mm/mmap.c:1712 do_mmap+0x294/0x4a0 mm/mmap.c:1483 do_mmap_pgoff include/linux/mm.h:2235 [inline] SYSC_remap_file_pages mm/mmap.c:2853 [inline] SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46...@syzkaller.appspotmail.com Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation") Cc: sta...@vger.kernel.org Signed-off-by: Eric Biggers --- ipc/shm.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/ipc/shm.c b/ipc/shm.c index acefe44fefefa..c80c5691a9970 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_struct *vma) if (IS_ERR(shp)) return PTR_ERR(shp); + if (shp->shm_file != sfd->file) { + /* ID was reused */ + shm_unlock(shp); + return -EINVAL; + } + shp->shm_atim = ktime_get_real_seconds(); ipc_update_pid(>shm_lprid, task_tgid(current)); shp->shm_nattch++; @@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma) int ret; /* -* In case of remap_file_pages() emulation, the file can represent -* removed IPC ID: propogate shm_lock() error to caller. +* In case of remap_file_pages() emulation, the file can represent an +* IPC ID that was removed, and possibly even reused by another shm +* segment already. Propagate this case as an error to caller. */ ret = __shm_open(vma); if (ret) @@ -480,6 +487,7 @@ static int shm_release(struct inode *ino, struct file *file) struct shm_file_data *sfd = shm_file_data(file); put_ipc_ns(sfd->ns); + fput(sfd->file); shm_file_data(file) = NULL; kfree(sfd); return 0; @@ -1432,7 +1440,7 @@ long
Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree
On Mon, 9 Apr 2018, Stephen Rothwell wrote: > Hi James, > > On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morris> wrote: > > > > That's odd, my next-general branch is merged to Linus. > > The security tree in linux-next is > > git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing > > which has the efi-lock-down tree merged into it. Ahh, I see. I'll rebase next-testing. -- James Morris
Re: [PATCH AUTOSEL for 3.18 059/101] x86/um: thin archives build fix
On Mon, 9 Apr 2018 00:41:22 + Sasha Levinwrote: > From: Nicholas Piggin > > [ Upstream commit 827880ec260ba048f95fe646b96a205c394fa0f0 ] > > The linker does not like vdso-syms.lds in input archive files. > Make it an extra-y instead. I wouldn't say these should be needed on kernels without thin archives build. It shouldn't hurt, but no point risking stable breakage. Thanks, Nick
Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree
On Mon, 9 Apr 2018, Stephen Rothwell wrote: > Hi James, > > On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morris > wrote: > > > > That's odd, my next-general branch is merged to Linus. > > The security tree in linux-next is > > git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing > > which has the efi-lock-down tree merged into it. Ahh, I see. I'll rebase next-testing. -- James Morris
Re: [PATCH AUTOSEL for 3.18 059/101] x86/um: thin archives build fix
On Mon, 9 Apr 2018 00:41:22 + Sasha Levin wrote: > From: Nicholas Piggin > > [ Upstream commit 827880ec260ba048f95fe646b96a205c394fa0f0 ] > > The linker does not like vdso-syms.lds in input archive files. > Make it an extra-y instead. I wouldn't say these should be needed on kernels without thin archives build. It shouldn't hurt, but no point risking stable breakage. Thanks, Nick
Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree
Hi James, On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morriswrote: > > That's odd, my next-general branch is merged to Linus. The security tree in linux-next is git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing which has the efi-lock-down tree merged into it. -- Cheers, Stephen Rothwell pgp6JOJRcC1_1.pgp Description: OpenPGP digital signature
Re: linux-next: manual merge of the scsi-mkp tree with the efi-lock-down tree
Hi James, On Mon, 9 Apr 2018 12:51:53 +1000 (AEST) James Morris wrote: > > That's odd, my next-general branch is merged to Linus. The security tree in linux-next is git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git#next-testing which has the efi-lock-down tree merged into it. -- Cheers, Stephen Rothwell pgp6JOJRcC1_1.pgp Description: OpenPGP digital signature
Re: [PATCH v11 0/4] iommu/arm-smmu: Add runtime pm/sleep support
Hi Will, Robin, On Thu, Mar 22, 2018 at 7:22 PM Vivek Gautamwrote: > This series provides the support for turning on the arm-smmu's > clocks/power domains using runtime pm. This is done using the > recently introduced device links patches, which lets the smmu's > runtime to follow the master's runtime pm, so the smmu remains > powered only when the masters use it. > As not all implementations support clock/power gating, we are checking > for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime > power management for such smmu implementations that can support it. > This series also adds support for Qcom's arm-smmu-v2 variant that > has different clocks and power requirements. > Took some reference from the exynos runtime patches [1]. > With conditional runtime pm now, we avoid touching dev->power.lock > in fastpaths for smmu implementations that don't need to do anything > useful with pm_runtime. > This lets us to use the much-argued pm_runtime_get_sync/put_sync() > calls in map/unmap callbacks so that the clients do not have to > worry about handling any of the arm-smmu's power. > Previous version of this patch series is @ [5]. > [v11] > * Some more cleanups for device link. We don't need an explicit > delete for device link from the driver, but just set the flag > DL_FLAG_AUTOREMOVE. > device_link_add() API description says - > "If the DL_FLAG_AUTOREMOVE is set, the link will be removed > automatically when the consumer device driver unbinds." > * Addressed the comments for 'smmu' in arm_smmu_map/unmap(). > * Dropped the patch [10] that introduced device_link_del_dev() API. As far as I can see, this version addresses all the earlier comments. Do you think this is something that you could apply? Best regards, Tomasz
Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time
On Sun, Apr 8, 2018 at 8:38 PM, Tetsuo Handawrote: > Suggested changes on top of your patch: > > Replace "struct hlist_head *head" in "struct security_hook_list" with > "const unsigned int offset" because there is no need to initialize with > address of the immutable/mutable chains. > > Remove LSM_HOOK_INIT_MUTABLE() by embedding just offset (in bytes) from > head of "struct security_hook_heads" into "struct > security_hook_list"->offset. > > Make "struct security_hook_heads security_hook_heads" and > "struct security_hook_heads security_hook_heads_mutable" local variables. > > Rename "struct security_hook_heads security_hook_heads" to > "struct security_hook_heads security_mutable_hook_heads" and mark it as > __ro_after_init. > > Add the fourth argument to security_add_hooks() which specifies to which > chain (security_{mutable|immutable}_hook_heads) to connect. > > Make all built-in LSM modules (except SELinux if > CONFIG_SECURITY_SELINUX_DISABLE=y) be connected to > security_immutable_hook_heads. > > Rename __lsm_ro_after_init to __selinux_ro_after_init which is local to > SELinux. > > Mark "struct security_hook_list"->hook const because it won't change. > > Mark "struct security_hook_list"->lsm const because none of > security_add_hooks() callers are ready to modify the third argument. > > Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and > the exception in randomize_layout_plugin.c because preventing module > unloading won't work as expected. > Rather than completely removing the unloading code, might it make sense to add a BUG_ON or WARN_ON, in security_delete_hooks if allow_unload_module is false, and owner is not NULL?
Re: [PATCH v11 0/4] iommu/arm-smmu: Add runtime pm/sleep support
Hi Will, Robin, On Thu, Mar 22, 2018 at 7:22 PM Vivek Gautam wrote: > This series provides the support for turning on the arm-smmu's > clocks/power domains using runtime pm. This is done using the > recently introduced device links patches, which lets the smmu's > runtime to follow the master's runtime pm, so the smmu remains > powered only when the masters use it. > As not all implementations support clock/power gating, we are checking > for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime > power management for such smmu implementations that can support it. > This series also adds support for Qcom's arm-smmu-v2 variant that > has different clocks and power requirements. > Took some reference from the exynos runtime patches [1]. > With conditional runtime pm now, we avoid touching dev->power.lock > in fastpaths for smmu implementations that don't need to do anything > useful with pm_runtime. > This lets us to use the much-argued pm_runtime_get_sync/put_sync() > calls in map/unmap callbacks so that the clients do not have to > worry about handling any of the arm-smmu's power. > Previous version of this patch series is @ [5]. > [v11] > * Some more cleanups for device link. We don't need an explicit > delete for device link from the driver, but just set the flag > DL_FLAG_AUTOREMOVE. > device_link_add() API description says - > "If the DL_FLAG_AUTOREMOVE is set, the link will be removed > automatically when the consumer device driver unbinds." > * Addressed the comments for 'smmu' in arm_smmu_map/unmap(). > * Dropped the patch [10] that introduced device_link_del_dev() API. As far as I can see, this version addresses all the earlier comments. Do you think this is something that you could apply? Best regards, Tomasz
Re: [PATCH v5 1/1] security: Add mechanism to safely (un)load LSMs after boot time
On Sun, Apr 8, 2018 at 8:38 PM, Tetsuo Handa wrote: > Suggested changes on top of your patch: > > Replace "struct hlist_head *head" in "struct security_hook_list" with > "const unsigned int offset" because there is no need to initialize with > address of the immutable/mutable chains. > > Remove LSM_HOOK_INIT_MUTABLE() by embedding just offset (in bytes) from > head of "struct security_hook_heads" into "struct > security_hook_list"->offset. > > Make "struct security_hook_heads security_hook_heads" and > "struct security_hook_heads security_hook_heads_mutable" local variables. > > Rename "struct security_hook_heads security_hook_heads" to > "struct security_hook_heads security_mutable_hook_heads" and mark it as > __ro_after_init. > > Add the fourth argument to security_add_hooks() which specifies to which > chain (security_{mutable|immutable}_hook_heads) to connect. > > Make all built-in LSM modules (except SELinux if > CONFIG_SECURITY_SELINUX_DISABLE=y) be connected to > security_immutable_hook_heads. > > Rename __lsm_ro_after_init to __selinux_ro_after_init which is local to > SELinux. > > Mark "struct security_hook_list"->hook const because it won't change. > > Mark "struct security_hook_list"->lsm const because none of > security_add_hooks() callers are ready to modify the third argument. > > Remove SECURITY_HOOK_COUNT and "struct security_hook_list"->owner and > the exception in randomize_layout_plugin.c because preventing module > unloading won't work as expected. > Rather than completely removing the unloading code, might it make sense to add a BUG_ON or WARN_ON, in security_delete_hooks if allow_unload_module is false, and owner is not NULL?
[PATCH AUTOSEL for 4.15 001/189] firewire-ohci: work around oversized DMA reads on JMicron controllers
From: Hector Martin[ Upstream commit 188775181bc05f29372b305ef96485840e351fde ] At least some JMicron controllers issue buggy oversized DMA reads when fetching context descriptors, always fetching 0x20 bytes at once for descriptors which are only 0x10 bytes long. This is often harmless, but can cause page faults on modern systems with IOMMUs: DMAR: [DMA Read] Request device [05:00.0] fault addr fff56000 [fault reason 06] PTE Read access is not set firewire_ohci :05:00.0: DMA context IT0 has stopped, error code: evt_descriptor_read This works around the problem by always leaving 0x10 padding bytes at the end of descriptor buffer pages, which should be harmless to do unconditionally for controllers in case others have the same behavior. Signed-off-by: Hector Martin Reviewed-by: Clemens Ladisch Signed-off-by: Stefan Richter Signed-off-by: Sasha Levin --- drivers/firewire/ohci.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c index ccf52368a073..45c048751f3b 100644 --- a/drivers/firewire/ohci.c +++ b/drivers/firewire/ohci.c @@ -1128,7 +1128,13 @@ static int context_add_buffer(struct context *ctx) return -ENOMEM; offset = (void *)>buffer - (void *)desc; - desc->buffer_size = PAGE_SIZE - offset; + /* +* Some controllers, like JMicron ones, always issue 0x20-byte DMA reads +* for descriptors, even 0x10-byte ones. This can cause page faults when +* an IOMMU is in use and the oversized read crosses a page boundary. +* Work around this by always leaving at least 0x10 bytes of padding. +*/ + desc->buffer_size = PAGE_SIZE - offset - 0x10; desc->buffer_bus = bus_addr + offset; desc->used = 0; -- 2.15.1
[PATCH AUTOSEL for 4.15 001/189] firewire-ohci: work around oversized DMA reads on JMicron controllers
From: Hector Martin [ Upstream commit 188775181bc05f29372b305ef96485840e351fde ] At least some JMicron controllers issue buggy oversized DMA reads when fetching context descriptors, always fetching 0x20 bytes at once for descriptors which are only 0x10 bytes long. This is often harmless, but can cause page faults on modern systems with IOMMUs: DMAR: [DMA Read] Request device [05:00.0] fault addr fff56000 [fault reason 06] PTE Read access is not set firewire_ohci :05:00.0: DMA context IT0 has stopped, error code: evt_descriptor_read This works around the problem by always leaving 0x10 padding bytes at the end of descriptor buffer pages, which should be harmless to do unconditionally for controllers in case others have the same behavior. Signed-off-by: Hector Martin Reviewed-by: Clemens Ladisch Signed-off-by: Stefan Richter Signed-off-by: Sasha Levin --- drivers/firewire/ohci.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c index ccf52368a073..45c048751f3b 100644 --- a/drivers/firewire/ohci.c +++ b/drivers/firewire/ohci.c @@ -1128,7 +1128,13 @@ static int context_add_buffer(struct context *ctx) return -ENOMEM; offset = (void *)>buffer - (void *)desc; - desc->buffer_size = PAGE_SIZE - offset; + /* +* Some controllers, like JMicron ones, always issue 0x20-byte DMA reads +* for descriptors, even 0x10-byte ones. This can cause page faults when +* an IOMMU is in use and the oversized read crosses a page boundary. +* Work around this by always leaving at least 0x10 bytes of padding. +*/ + desc->buffer_size = PAGE_SIZE - offset - 0x10; desc->buffer_bus = bus_addr + offset; desc->used = 0; -- 2.15.1
[PATCH AUTOSEL for 4.15 005/189] hwmon: (ina2xx) Fix access to uninitialized mutex
From: Marek Szyprowski[ Upstream commit 0c4c5860e9983eb3da7a3d73ca987643c3ed034b ] Initialize data->config_lock mutex before it is used by the driver code. This fixes following warning on Odroid XU3 boards: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180115-1-gb75575dee3f2 #107 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0x90/0xc8) [] (dump_stack) from [] (register_lock_class+0x1c0/0x59c) [] (register_lock_class) from [] (__lock_acquire+0x78/0x1850) [] (__lock_acquire) from [] (lock_acquire+0xc8/0x2b8) [] (lock_acquire) from [] (__mutex_lock+0x60/0xa0c) [] (__mutex_lock) from [] (mutex_lock_nested+0x1c/0x24) [] (mutex_lock_nested) from [] (ina2xx_set_shunt+0x70/0xb0) [] (ina2xx_set_shunt) from [] (ina2xx_probe+0x88/0x1b0) [] (ina2xx_probe) from [] (i2c_device_probe+0x1e0/0x2d0) [] (i2c_device_probe) from [] (driver_probe_device+0x2b8/0x4a0) [] (driver_probe_device) from [] (__driver_attach+0xfc/0x120) [] (__driver_attach) from [] (bus_for_each_dev+0x58/0x7c) [] (bus_for_each_dev) from [] (bus_add_driver+0x174/0x250) [] (bus_add_driver) from [] (driver_register+0x78/0xf4) [] (driver_register) from [] (i2c_register_driver+0x38/0xa8) [] (i2c_register_driver) from [] (do_one_initcall+0x48/0x18c) [] (do_one_initcall) from [] (kernel_init_freeable+0x110/0x1d4) [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114) [] (kernel_init) from [] (ret_from_fork+0x14/0x20) Fixes: 5d389b125186 ("hwmon: (ina2xx) Make calibration register value fixed") Signed-off-by: Marek Szyprowski Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin --- drivers/hwmon/ina2xx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/hwmon/ina2xx.c b/drivers/hwmon/ina2xx.c index 62e38fa8cda2..a55823b52b2f 100644 --- a/drivers/hwmon/ina2xx.c +++ b/drivers/hwmon/ina2xx.c @@ -438,6 +438,7 @@ static int ina2xx_probe(struct i2c_client *client, /* set the device type */ data->config = _config[chip]; + mutex_init(>config_lock); if (of_property_read_u32(dev->of_node, "shunt-resistor", ) < 0) { struct ina2xx_platform_data *pdata = dev_get_platdata(dev); @@ -467,8 +468,6 @@ static int ina2xx_probe(struct i2c_client *client, return -ENODEV; } - mutex_init(>config_lock); - data->groups[group++] = _group; if (id->driver_data == ina226) data->groups[group++] = _group; -- 2.15.1
[PATCH AUTOSEL for 4.15 004/189] nvme: host delete_work and reset_work on separate workqueues
From: Roy Shterman[ Upstream commit b227c59b9b5b8ae52639c8980af853d2f654f90a ] We need to ensure that delete_work will be hosted on a different workqueue than all the works we flush or cancel from it. Otherwise we may hit a circular dependency warning [1]. Also, given that delete_work flushes reset_work, host reset_work on nvme_reset_wq and delete_work on nvme_delete_wq. In addition, fix the flushing in the individual drivers to flush nvme_delete_wq when draining queued deletes. [1]: [ 178.491942] = [ 178.492718] [ INFO: possible recursive locking detected ] [ 178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G OE [ 178.494382] - [ 178.495160] kworker/5:1/135 is trying to acquire lock: [ 178.495894] ( [ 178.496120] "nvme-wq" [ 178.496471] ){.+} [ 178.496599] , at: [ 178.496921] [] flush_work+0x1a6/0x2d0 [ 178.497670] but task is already holding lock: [ 178.498499] ( [ 178.498724] "nvme-wq" [ 178.499074] ){.+} [ 178.499202] , at: [ 178.499520] [] process_one_work+0x162/0x6a0 [ 178.500343] other info that might help us debug this: [ 178.501269] Possible unsafe locking scenario: [ 178.502113]CPU0 [ 178.502472] [ 178.502829] lock( [ 178.503115] "nvme-wq" [ 178.503467] ); [ 178.503716] lock( [ 178.504001] "nvme-wq" [ 178.504353] ); [ 178.504601] *** DEADLOCK *** [ 178.505441] May be due to missing lock nesting notation [ 178.506453] 2 locks held by kworker/5:1/135: [ 178.507068] #0: [ 178.507330] ( [ 178.507598] "nvme-wq" [ 178.507726] ){.+} [ 178.508079] , at: [ 178.508173] [] process_one_work+0x162/0x6a0 [ 178.509004] #1: [ 178.509265] ( [ 178.509532] (>delete_work) [ 178.509795] ){+.+.+.} [ 178.510145] , at: [ 178.510239] [] process_one_work+0x162/0x6a0 [ 178.511070] stack backtrace: : [ 178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G OE 4.9.0-rc4-c844263313a8-lb #3 [ 178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014 [ 178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp] [ 178.515071] c2668175bae0 a7450823 a88abd80 a88abd80 [ 178.516195] c2668175bb98 a70eb012 a8d8d90d 9c472e9ea700 [ 178.517318] 9c472e9ea700 9c47 9c477200 ab83be61bec0d50e [ 178.518443] Call Trace: [ 178.518807] [] dump_stack+0x85/0xc2 [ 178.519542] [] __lock_acquire+0x17d2/0x18f0 [ 178.520377] [] ? serial8250_console_putchar+0x27/0x30 [ 178.521330] [] ? wait_for_xmitr+0xa0/0xa0 [ 178.522174] [] ? flush_work+0x18b/0x2d0 [ 178.522975] [] lock_acquire+0x11b/0x220 [ 178.523753] [] ? flush_work+0x1a6/0x2d0 [ 178.524535] [] flush_work+0x1c9/0x2d0 [ 178.525291] [] ? flush_work+0x1a6/0x2d0 [ 178.526077] [] ? flush_workqueue_prep_pwqs+0x220/0x220 [ 178.527040] [] __cancel_work_timer+0x10f/0x1d0 [ 178.527907] [] ? vprintk_default+0x29/0x40 [ 178.528726] [] ? printk+0x48/0x50 [ 178.529434] [] cancel_delayed_work_sync+0x13/0x20 [ 178.530381] [] nvme_stop_ctrl+0x5b/0x70 [nvme_core] [ 178.531314] [] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp] [ 178.532271] [] process_one_work+0x1e1/0x6a0 [ 178.533101] [] ? process_one_work+0x162/0x6a0 [ 178.533954] [] worker_thread+0x4e/0x490 [ 178.534735] [] ? process_one_work+0x6a0/0x6a0 [ 178.535588] [] ? process_one_work+0x6a0/0x6a0 [ 178.536441] [] kthread+0xff/0x120 [ 178.537149] [] ? kthread_park+0x60/0x60 [ 178.538094] [] ? kthread_park+0x60/0x60 [ 178.538900] [] ret_from_fork+0x2a/0x40 Signed-off-by: Roy Shterman Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/core.c | 44 +++- drivers/nvme/host/nvme.h | 2 ++ drivers/nvme/host/rdma.c | 2 +- drivers/nvme/target/loop.c | 2 +- 4 files changed, 43 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 935593032123..93a4fa053e7f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -65,9 +65,26 @@ static bool streams; module_param(streams, bool, 0644); MODULE_PARM_DESC(streams, "turn on support for Streams write directives"); +/* + * nvme_wq - hosts nvme related works that are not reset or delete + * nvme_reset_wq - hosts nvme reset works + * nvme_delete_wq - hosts nvme delete works + * + * nvme_wq will host works such are scan, aen handling, fw activation, + * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq + * runs reset works which also flush works hosted on nvme_wq for + * serialization purposes. nvme_delete_wq host controller deletion + * works which flush reset works for serialization. + */
[PATCH AUTOSEL for 4.15 005/189] hwmon: (ina2xx) Fix access to uninitialized mutex
From: Marek Szyprowski [ Upstream commit 0c4c5860e9983eb3da7a3d73ca987643c3ed034b ] Initialize data->config_lock mutex before it is used by the driver code. This fixes following warning on Odroid XU3 boards: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180115-1-gb75575dee3f2 #107 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0x90/0xc8) [] (dump_stack) from [] (register_lock_class+0x1c0/0x59c) [] (register_lock_class) from [] (__lock_acquire+0x78/0x1850) [] (__lock_acquire) from [] (lock_acquire+0xc8/0x2b8) [] (lock_acquire) from [] (__mutex_lock+0x60/0xa0c) [] (__mutex_lock) from [] (mutex_lock_nested+0x1c/0x24) [] (mutex_lock_nested) from [] (ina2xx_set_shunt+0x70/0xb0) [] (ina2xx_set_shunt) from [] (ina2xx_probe+0x88/0x1b0) [] (ina2xx_probe) from [] (i2c_device_probe+0x1e0/0x2d0) [] (i2c_device_probe) from [] (driver_probe_device+0x2b8/0x4a0) [] (driver_probe_device) from [] (__driver_attach+0xfc/0x120) [] (__driver_attach) from [] (bus_for_each_dev+0x58/0x7c) [] (bus_for_each_dev) from [] (bus_add_driver+0x174/0x250) [] (bus_add_driver) from [] (driver_register+0x78/0xf4) [] (driver_register) from [] (i2c_register_driver+0x38/0xa8) [] (i2c_register_driver) from [] (do_one_initcall+0x48/0x18c) [] (do_one_initcall) from [] (kernel_init_freeable+0x110/0x1d4) [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114) [] (kernel_init) from [] (ret_from_fork+0x14/0x20) Fixes: 5d389b125186 ("hwmon: (ina2xx) Make calibration register value fixed") Signed-off-by: Marek Szyprowski Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin --- drivers/hwmon/ina2xx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/hwmon/ina2xx.c b/drivers/hwmon/ina2xx.c index 62e38fa8cda2..a55823b52b2f 100644 --- a/drivers/hwmon/ina2xx.c +++ b/drivers/hwmon/ina2xx.c @@ -438,6 +438,7 @@ static int ina2xx_probe(struct i2c_client *client, /* set the device type */ data->config = _config[chip]; + mutex_init(>config_lock); if (of_property_read_u32(dev->of_node, "shunt-resistor", ) < 0) { struct ina2xx_platform_data *pdata = dev_get_platdata(dev); @@ -467,8 +468,6 @@ static int ina2xx_probe(struct i2c_client *client, return -ENODEV; } - mutex_init(>config_lock); - data->groups[group++] = _group; if (id->driver_data == ina226) data->groups[group++] = _group; -- 2.15.1
[PATCH AUTOSEL for 4.15 004/189] nvme: host delete_work and reset_work on separate workqueues
From: Roy Shterman [ Upstream commit b227c59b9b5b8ae52639c8980af853d2f654f90a ] We need to ensure that delete_work will be hosted on a different workqueue than all the works we flush or cancel from it. Otherwise we may hit a circular dependency warning [1]. Also, given that delete_work flushes reset_work, host reset_work on nvme_reset_wq and delete_work on nvme_delete_wq. In addition, fix the flushing in the individual drivers to flush nvme_delete_wq when draining queued deletes. [1]: [ 178.491942] = [ 178.492718] [ INFO: possible recursive locking detected ] [ 178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G OE [ 178.494382] - [ 178.495160] kworker/5:1/135 is trying to acquire lock: [ 178.495894] ( [ 178.496120] "nvme-wq" [ 178.496471] ){.+} [ 178.496599] , at: [ 178.496921] [] flush_work+0x1a6/0x2d0 [ 178.497670] but task is already holding lock: [ 178.498499] ( [ 178.498724] "nvme-wq" [ 178.499074] ){.+} [ 178.499202] , at: [ 178.499520] [] process_one_work+0x162/0x6a0 [ 178.500343] other info that might help us debug this: [ 178.501269] Possible unsafe locking scenario: [ 178.502113]CPU0 [ 178.502472] [ 178.502829] lock( [ 178.503115] "nvme-wq" [ 178.503467] ); [ 178.503716] lock( [ 178.504001] "nvme-wq" [ 178.504353] ); [ 178.504601] *** DEADLOCK *** [ 178.505441] May be due to missing lock nesting notation [ 178.506453] 2 locks held by kworker/5:1/135: [ 178.507068] #0: [ 178.507330] ( [ 178.507598] "nvme-wq" [ 178.507726] ){.+} [ 178.508079] , at: [ 178.508173] [] process_one_work+0x162/0x6a0 [ 178.509004] #1: [ 178.509265] ( [ 178.509532] (>delete_work) [ 178.509795] ){+.+.+.} [ 178.510145] , at: [ 178.510239] [] process_one_work+0x162/0x6a0 [ 178.511070] stack backtrace: : [ 178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G OE 4.9.0-rc4-c844263313a8-lb #3 [ 178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014 [ 178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp] [ 178.515071] c2668175bae0 a7450823 a88abd80 a88abd80 [ 178.516195] c2668175bb98 a70eb012 a8d8d90d 9c472e9ea700 [ 178.517318] 9c472e9ea700 9c47 9c477200 ab83be61bec0d50e [ 178.518443] Call Trace: [ 178.518807] [] dump_stack+0x85/0xc2 [ 178.519542] [] __lock_acquire+0x17d2/0x18f0 [ 178.520377] [] ? serial8250_console_putchar+0x27/0x30 [ 178.521330] [] ? wait_for_xmitr+0xa0/0xa0 [ 178.522174] [] ? flush_work+0x18b/0x2d0 [ 178.522975] [] lock_acquire+0x11b/0x220 [ 178.523753] [] ? flush_work+0x1a6/0x2d0 [ 178.524535] [] flush_work+0x1c9/0x2d0 [ 178.525291] [] ? flush_work+0x1a6/0x2d0 [ 178.526077] [] ? flush_workqueue_prep_pwqs+0x220/0x220 [ 178.527040] [] __cancel_work_timer+0x10f/0x1d0 [ 178.527907] [] ? vprintk_default+0x29/0x40 [ 178.528726] [] ? printk+0x48/0x50 [ 178.529434] [] cancel_delayed_work_sync+0x13/0x20 [ 178.530381] [] nvme_stop_ctrl+0x5b/0x70 [nvme_core] [ 178.531314] [] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp] [ 178.532271] [] process_one_work+0x1e1/0x6a0 [ 178.533101] [] ? process_one_work+0x162/0x6a0 [ 178.533954] [] worker_thread+0x4e/0x490 [ 178.534735] [] ? process_one_work+0x6a0/0x6a0 [ 178.535588] [] ? process_one_work+0x6a0/0x6a0 [ 178.536441] [] kthread+0xff/0x120 [ 178.537149] [] ? kthread_park+0x60/0x60 [ 178.538094] [] ? kthread_park+0x60/0x60 [ 178.538900] [] ret_from_fork+0x2a/0x40 Signed-off-by: Roy Shterman Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/core.c | 44 +++- drivers/nvme/host/nvme.h | 2 ++ drivers/nvme/host/rdma.c | 2 +- drivers/nvme/target/loop.c | 2 +- 4 files changed, 43 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 935593032123..93a4fa053e7f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -65,9 +65,26 @@ static bool streams; module_param(streams, bool, 0644); MODULE_PARM_DESC(streams, "turn on support for Streams write directives"); +/* + * nvme_wq - hosts nvme related works that are not reset or delete + * nvme_reset_wq - hosts nvme reset works + * nvme_delete_wq - hosts nvme delete works + * + * nvme_wq will host works such are scan, aen handling, fw activation, + * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq + * runs reset works which also flush works hosted on nvme_wq for + * serialization purposes. nvme_delete_wq host controller deletion + * works which flush reset works for serialization. + */ struct workqueue_struct *nvme_wq; EXPORT_SYMBOL_GPL(nvme_wq); +struct workqueue_struct *nvme_reset_wq;
[PATCH AUTOSEL for 4.15 002/189] x86/tsc: Allow TSC calibration without PIT
From: Peter Zijlstra[ Upstream commit 30c7e5b123673d5e570e238dbada2fb68a87212c ] Zhang Rui reported that a Surface Pro 4 will fail to boot with lapic=notscdeadline. Part of the problem is that that machine doesn't have a PIT. If, for some reason, the TSC init has to fall back to TSC calibration, it relies on the PIT to be present. Allow TSC calibration to reliably fall back to HPET. The below results in an accurate TSC measurement when forced on a IVB: tsc: Unable to calibrate against PIT tsc: No reference (HPET/PMTIMER) available tsc: Unable to calibrate against PIT tsc: using HPET reference calibration tsc: Detected 2792.451 MHz processor Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: len.br...@intel.com Cc: rui.zh...@intel.com Link: https://lkml.kernel.org/r/20171222092243.333145...@infradead.org Signed-off-by: Sasha Levin --- arch/x86/include/asm/i8259.h | 5 + arch/x86/kernel/tsc.c| 18 ++ 2 files changed, 23 insertions(+) diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h index c8376b40e882..5cdcdbd4d892 100644 --- a/arch/x86/include/asm/i8259.h +++ b/arch/x86/include/asm/i8259.h @@ -69,6 +69,11 @@ struct legacy_pic { extern struct legacy_pic *legacy_pic; extern struct legacy_pic null_legacy_pic; +static inline bool has_legacy_pic(void) +{ + return legacy_pic != _legacy_pic; +} + static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index e169e85db434..a2c9dd8bfc6f 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -25,6 +25,7 @@ #include #include #include +#include unsigned int __read_mostly cpu_khz;/* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); @@ -363,6 +364,20 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin) unsigned long tscmin, tscmax; int pitcnt; + if (!has_legacy_pic()) { + /* +* Relies on tsc_early_delay_calibrate() to have given us semi +* usable udelay(), wait for the same 50ms we would have with +* the PIT loop below. +*/ + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + return ULONG_MAX; + } + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); @@ -487,6 +502,9 @@ static unsigned long quick_pit_calibrate(void) u64 tsc, delta; unsigned long d1, d2; + if (!has_legacy_pic()) + return 0; + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); -- 2.15.1
[PATCH AUTOSEL for 4.15 002/189] x86/tsc: Allow TSC calibration without PIT
From: Peter Zijlstra [ Upstream commit 30c7e5b123673d5e570e238dbada2fb68a87212c ] Zhang Rui reported that a Surface Pro 4 will fail to boot with lapic=notscdeadline. Part of the problem is that that machine doesn't have a PIT. If, for some reason, the TSC init has to fall back to TSC calibration, it relies on the PIT to be present. Allow TSC calibration to reliably fall back to HPET. The below results in an accurate TSC measurement when forced on a IVB: tsc: Unable to calibrate against PIT tsc: No reference (HPET/PMTIMER) available tsc: Unable to calibrate against PIT tsc: using HPET reference calibration tsc: Detected 2792.451 MHz processor Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: len.br...@intel.com Cc: rui.zh...@intel.com Link: https://lkml.kernel.org/r/20171222092243.333145...@infradead.org Signed-off-by: Sasha Levin --- arch/x86/include/asm/i8259.h | 5 + arch/x86/kernel/tsc.c| 18 ++ 2 files changed, 23 insertions(+) diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h index c8376b40e882..5cdcdbd4d892 100644 --- a/arch/x86/include/asm/i8259.h +++ b/arch/x86/include/asm/i8259.h @@ -69,6 +69,11 @@ struct legacy_pic { extern struct legacy_pic *legacy_pic; extern struct legacy_pic null_legacy_pic; +static inline bool has_legacy_pic(void) +{ + return legacy_pic != _legacy_pic; +} + static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index e169e85db434..a2c9dd8bfc6f 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -25,6 +25,7 @@ #include #include #include +#include unsigned int __read_mostly cpu_khz;/* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); @@ -363,6 +364,20 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin) unsigned long tscmin, tscmax; int pitcnt; + if (!has_legacy_pic()) { + /* +* Relies on tsc_early_delay_calibrate() to have given us semi +* usable udelay(), wait for the same 50ms we would have with +* the PIT loop below. +*/ + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + return ULONG_MAX; + } + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); @@ -487,6 +502,9 @@ static unsigned long quick_pit_calibrate(void) u64 tsc, delta; unsigned long d1, d2; + if (!has_legacy_pic()) + return 0; + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); -- 2.15.1
[PATCH AUTOSEL for 4.15 006/189] ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
From: Hans de Goede[ Upstream commit e1681599345b8466786b6e54a2db2a00a068a3f3 ] acpi_lpss_create_device() skips handling LPSS devices which do not have a mmio resources in their resource list (typically these devices are disabled by the firmware). But since the LPSS code does not bind to the device, acpi_bus_attach() ends up still creating a platform device for it and the regular platform_driver for the ACPI HID still tries to bind to it. This happens e.g. on some boards which do not use the pwm-controller and have an empty or invalid resource-table for it. Currently this causes these error messages to get logged: [3.281966] pwm-lpss 80862288:00: invalid resource [3.287098] pwm-lpss: probe of 80862288:00 failed with error -22 This commit stops the undesirable creation of a platform_device for disabled LPSS devices by setting pnp.type.platform_id to 0. Note that acpi_scan_attach_handler() also sets pnp.type.platform_id to 0 when there is a matching handler for the device and that handler has no attach callback, so we simply behave as a handler without an attach function in this case. Signed-off-by: Hans de Goede Acked-by: Mika Westerberg Reviewed-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/acpi_lpss.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 7f2b02cc8ea1..c71f5a2a592e 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -465,6 +465,8 @@ static int acpi_lpss_create_device(struct acpi_device *adev, acpi_dev_free_resource_list(_list); if (!pdata->mmio_base) { + /* Avoid acpi_bus_attach() instantiating a pdev for this dev. */ + adev->pnp.type.platform_id = 0; /* Skip the device, but continue the namespace scan. */ ret = 0; goto err_out; -- 2.15.1
[PATCH AUTOSEL for 4.15 006/189] ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
From: Hans de Goede [ Upstream commit e1681599345b8466786b6e54a2db2a00a068a3f3 ] acpi_lpss_create_device() skips handling LPSS devices which do not have a mmio resources in their resource list (typically these devices are disabled by the firmware). But since the LPSS code does not bind to the device, acpi_bus_attach() ends up still creating a platform device for it and the regular platform_driver for the ACPI HID still tries to bind to it. This happens e.g. on some boards which do not use the pwm-controller and have an empty or invalid resource-table for it. Currently this causes these error messages to get logged: [3.281966] pwm-lpss 80862288:00: invalid resource [3.287098] pwm-lpss: probe of 80862288:00 failed with error -22 This commit stops the undesirable creation of a platform_device for disabled LPSS devices by setting pnp.type.platform_id to 0. Note that acpi_scan_attach_handler() also sets pnp.type.platform_id to 0 when there is a matching handler for the device and that handler has no attach callback, so we simply behave as a handler without an attach function in this case. Signed-off-by: Hans de Goede Acked-by: Mika Westerberg Reviewed-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/acpi_lpss.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 7f2b02cc8ea1..c71f5a2a592e 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -465,6 +465,8 @@ static int acpi_lpss_create_device(struct acpi_device *adev, acpi_dev_free_resource_list(_list); if (!pdata->mmio_base) { + /* Avoid acpi_bus_attach() instantiating a pdev for this dev. */ + adev->pnp.type.platform_id = 0; /* Skip the device, but continue the namespace scan. */ ret = 0; goto err_out; -- 2.15.1
[PATCH AUTOSEL for 4.15 003/189] NFSv4: always set NFS_LOCK_LOST when a lock is lost.
From: NeilBrown[ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ] There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- fs/nfs/nfs4proc.c | 12 fs/nfs/nfs4state.c | 5 - 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 56fa5a16e097..083802f7a1e9 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2019,7 +2019,7 @@ static int nfs4_open_reclaim(struct nfs4_state_owner *sp, struct nfs4_state *sta return ret; } -static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, int err) +static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, struct file_lock *fl, int err) { switch (err) { default: @@ -2066,7 +2066,11 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct return -EAGAIN; case -ENOMEM: case -NFS4ERR_DENIED: - /* kill_proc(fl->fl_pid, SIGLOST, 1); */ + if (fl) { + struct nfs4_lock_state *lsp = fl->fl_u.nfs4_fl.owner; + if (lsp) + set_bit(NFS_LOCK_LOST, >ls_flags); + } return 0; } return err; @@ -2102,7 +2106,7 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, err = nfs4_open_recover_helper(opendata, FMODE_READ); } nfs4_opendata_put(opendata); - return nfs4_handle_delegation_recall_error(server, state, stateid, err); + return nfs4_handle_delegation_recall_error(server, state, stateid, NULL, err); } static void nfs4_open_confirm_prepare(struct rpc_task *task, void *calldata) @@ -6739,7 +6743,7 @@ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state, if (err != 0) return err; err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW); - return nfs4_handle_delegation_recall_error(server, state, stateid, err); + return nfs4_handle_delegation_recall_error(server, state, stateid, fl, err); } struct nfs_release_lockowner_data { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index e4f4a09ed9f4..91a4d4eeb235 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1482,6 +1482,7 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_ struct inode *inode = state->inode; struct nfs_inode *nfsi = NFS_I(inode); struct file_lock *fl; + struct nfs4_lock_state *lsp; int status = 0; struct file_lock_context *flctx = inode->i_flctx; struct list_head *list; @@ -1522,7 +1523,9 @@ restart: case -NFS4ERR_DENIED: case -NFS4ERR_RECLAIM_BAD: case -NFS4ERR_RECLAIM_CONFLICT: - /* kill_proc(fl->fl_pid, SIGLOST, 1); */ + lsp = fl->fl_u.nfs4_fl.owner; + if (lsp) + set_bit(NFS_LOCK_LOST, >ls_flags); status = 0; } spin_lock(>flc_lock); -- 2.15.1
[PATCH AUTOSEL for 4.15 011/189] RDMA/core: Clarify rdma_ah_find_type
From: Parav Pandit[ Upstream commit a6532e7139660c103dda181aa5b2c734aa26ed6c ] iWARP does not use rdma_ah_attr_type, and for this reason we do not have a RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp ports and for clarity it shouldn't have a special test for iWarp. This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB when wrongly called on an iWarp port. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Parav Pandit Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin --- include/rdma/ib_verbs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 0d6a110dae7c..20ebf9061962 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3793,8 +3793,7 @@ static inline void rdma_ah_set_grh(struct rdma_ah_attr *attr, static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev, u32 port_num) { - if ((rdma_protocol_roce(dev, port_num)) || - (rdma_protocol_iwarp(dev, port_num))) + if (rdma_protocol_roce(dev, port_num)) return RDMA_AH_ATTR_TYPE_ROCE; else if ((rdma_protocol_ib(dev, port_num)) && (rdma_cap_opa_ah(dev, port_num))) -- 2.15.1
[PATCH AUTOSEL for 4.15 007/189] tipc: fix a potental access after delete in tipc_sk_join()
From: Jon Maloy[ Upstream commit febafc8455fdbb0ba53d596075068a683b75f355 ] In commit d12d2e12cec2 "tipc: send out join messages as soon as new member is discovered") we added a call to the function tipc_group_join() without considering the case that the preceding tipc_sk_publish() might have failed, and the group item already deleted. We fix this by returning from tipc_sk_join() directly after the failed tipc_sk_publish. Reported-by: syzbot+e3eeae78ea88b8d6d...@syzkaller.appspotmail.com Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/tipc/socket.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 3b4084480377..8efd2e42de30 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2759,6 +2759,7 @@ static int tipc_sk_join(struct tipc_sock *tsk, struct tipc_group_req *mreq) if (rc) { tipc_group_delete(net, grp); tsk->group = NULL; + return rc; } /* Eliminate any risk that a broadcast overtakes the sent JOIN */ -- 2.15.1
[PATCH AUTOSEL for 4.15 003/189] NFSv4: always set NFS_LOCK_LOST when a lock is lost.
From: NeilBrown [ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ] There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- fs/nfs/nfs4proc.c | 12 fs/nfs/nfs4state.c | 5 - 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 56fa5a16e097..083802f7a1e9 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2019,7 +2019,7 @@ static int nfs4_open_reclaim(struct nfs4_state_owner *sp, struct nfs4_state *sta return ret; } -static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, int err) +static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs4_state *state, const nfs4_stateid *stateid, struct file_lock *fl, int err) { switch (err) { default: @@ -2066,7 +2066,11 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct return -EAGAIN; case -ENOMEM: case -NFS4ERR_DENIED: - /* kill_proc(fl->fl_pid, SIGLOST, 1); */ + if (fl) { + struct nfs4_lock_state *lsp = fl->fl_u.nfs4_fl.owner; + if (lsp) + set_bit(NFS_LOCK_LOST, >ls_flags); + } return 0; } return err; @@ -2102,7 +2106,7 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, err = nfs4_open_recover_helper(opendata, FMODE_READ); } nfs4_opendata_put(opendata); - return nfs4_handle_delegation_recall_error(server, state, stateid, err); + return nfs4_handle_delegation_recall_error(server, state, stateid, NULL, err); } static void nfs4_open_confirm_prepare(struct rpc_task *task, void *calldata) @@ -6739,7 +6743,7 @@ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state, if (err != 0) return err; err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW); - return nfs4_handle_delegation_recall_error(server, state, stateid, err); + return nfs4_handle_delegation_recall_error(server, state, stateid, fl, err); } struct nfs_release_lockowner_data { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index e4f4a09ed9f4..91a4d4eeb235 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1482,6 +1482,7 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_ struct inode *inode = state->inode; struct nfs_inode *nfsi = NFS_I(inode); struct file_lock *fl; + struct nfs4_lock_state *lsp; int status = 0; struct file_lock_context *flctx = inode->i_flctx; struct list_head *list; @@ -1522,7 +1523,9 @@ restart: case -NFS4ERR_DENIED: case -NFS4ERR_RECLAIM_BAD: case -NFS4ERR_RECLAIM_CONFLICT: - /* kill_proc(fl->fl_pid, SIGLOST, 1); */ + lsp = fl->fl_u.nfs4_fl.owner; + if (lsp) + set_bit(NFS_LOCK_LOST, >ls_flags); status = 0; } spin_lock(>flc_lock); -- 2.15.1
[PATCH AUTOSEL for 4.15 011/189] RDMA/core: Clarify rdma_ah_find_type
From: Parav Pandit [ Upstream commit a6532e7139660c103dda181aa5b2c734aa26ed6c ] iWARP does not use rdma_ah_attr_type, and for this reason we do not have a RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp ports and for clarity it shouldn't have a special test for iWarp. This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB when wrongly called on an iWarp port. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Parav Pandit Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin --- include/rdma/ib_verbs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 0d6a110dae7c..20ebf9061962 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3793,8 +3793,7 @@ static inline void rdma_ah_set_grh(struct rdma_ah_attr *attr, static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev, u32 port_num) { - if ((rdma_protocol_roce(dev, port_num)) || - (rdma_protocol_iwarp(dev, port_num))) + if (rdma_protocol_roce(dev, port_num)) return RDMA_AH_ATTR_TYPE_ROCE; else if ((rdma_protocol_ib(dev, port_num)) && (rdma_cap_opa_ah(dev, port_num))) -- 2.15.1
[PATCH AUTOSEL for 4.15 007/189] tipc: fix a potental access after delete in tipc_sk_join()
From: Jon Maloy [ Upstream commit febafc8455fdbb0ba53d596075068a683b75f355 ] In commit d12d2e12cec2 "tipc: send out join messages as soon as new member is discovered") we added a call to the function tipc_group_join() without considering the case that the preceding tipc_sk_publish() might have failed, and the group item already deleted. We fix this by returning from tipc_sk_join() directly after the failed tipc_sk_publish. Reported-by: syzbot+e3eeae78ea88b8d6d...@syzkaller.appspotmail.com Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/tipc/socket.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 3b4084480377..8efd2e42de30 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2759,6 +2759,7 @@ static int tipc_sk_join(struct tipc_sock *tsk, struct tipc_group_req *mreq) if (rc) { tipc_group_delete(net, grp); tsk->group = NULL; + return rc; } /* Eliminate any risk that a broadcast overtakes the sent JOIN */ -- 2.15.1
[PATCH AUTOSEL for 4.15 010/189] kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
From: Paolo Bonzini[ Upstream commit 51776043afa415435c7e4636204fbe4f7edc4501 ] This ioctl is obsolete (it was used by Xenner as far as I know) but still let's not break it gratuitously... Its handler is copying directly into struct kvm. Go through a bounce buffer instead, with the added benefit that we can actually do something useful with the flags argument---the previous code was exiting with -EINVAL but still doing the copy. This technically is a userspace ABI breakage, but since no one should be using the ioctl, it's a good occasion to see if someone actually complains. Cc: kernel-harden...@lists.openwall.com Cc: Kees Cook Cc: Radim Krčmář Signed-off-by: Paolo Bonzini Signed-off-by: Kees Cook Signed-off-by: Sasha Levin --- arch/x86/kvm/x86.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a10da5052072..41a8ac44d5cc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4244,13 +4244,14 @@ set_identity_unlock: mutex_unlock(>lock); break; case KVM_XEN_HVM_CONFIG: { + struct kvm_xen_hvm_config xhc; r = -EFAULT; - if (copy_from_user(>arch.xen_hvm_config, argp, - sizeof(struct kvm_xen_hvm_config))) + if (copy_from_user(, argp, sizeof(xhc))) goto out; r = -EINVAL; - if (kvm->arch.xen_hvm_config.flags) + if (xhc.flags) goto out; + memcpy(>arch.xen_hvm_config, , sizeof(xhc)); r = 0; break; } -- 2.15.1
[PATCH AUTOSEL for 4.15 014/189] tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
From: Anna-Maria Gleixner[ Upstream commit 91633eed73a3ac37aaece5c8c1f93a18bae616a9 ] So far only CLOCK_MONOTONIC and CLOCK_REALTIME were taken into account as well as HRTIMER_MODE_ABS/REL in the hrtimer_init tracepoint. The query for detecting the ABS or REL timer modes is not valid anymore, it got broken by the introduction of HRTIMER_MODE_PINNED. HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the sake of completeness print all given modes. Signed-off-by: Anna-Maria Gleixner Cc: Christoph Hellwig Cc: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- include/trace/events/timer.h | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 16e305e69f34..c6f728037c53 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -136,6 +136,20 @@ DEFINE_EVENT(timer_class, timer_cancel, TP_ARGS(timer) ); +#define decode_clockid(type) \ + __print_symbolic(type, \ + { CLOCK_REALTIME, "CLOCK_REALTIME"}, \ + { CLOCK_MONOTONIC, "CLOCK_MONOTONIC" }, \ + { CLOCK_BOOTTIME, "CLOCK_BOOTTIME"}, \ + { CLOCK_TAI,"CLOCK_TAI" }) + +#define decode_hrtimer_mode(mode) \ + __print_symbolic(mode, \ + { HRTIMER_MODE_ABS, "ABS" }, \ + { HRTIMER_MODE_REL, "REL" }, \ + { HRTIMER_MODE_ABS_PINNED, "ABS|PINNED"}, \ + { HRTIMER_MODE_REL_PINNED, "REL|PINNED"}) + /** * hrtimer_init - called when the hrtimer is initialized * @hrtimer: pointer to struct hrtimer @@ -162,10 +176,8 @@ TRACE_EVENT(hrtimer_init, ), TP_printk("hrtimer=%p clockid=%s mode=%s", __entry->hrtimer, - __entry->clockid == CLOCK_REALTIME ? - "CLOCK_REALTIME" : "CLOCK_MONOTONIC", - __entry->mode == HRTIMER_MODE_ABS ? - "HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL") + decode_clockid(__entry->clockid), + decode_hrtimer_mode(__entry->mode)) ); /** -- 2.15.1
[PATCH AUTOSEL for 4.15 010/189] kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
From: Paolo Bonzini [ Upstream commit 51776043afa415435c7e4636204fbe4f7edc4501 ] This ioctl is obsolete (it was used by Xenner as far as I know) but still let's not break it gratuitously... Its handler is copying directly into struct kvm. Go through a bounce buffer instead, with the added benefit that we can actually do something useful with the flags argument---the previous code was exiting with -EINVAL but still doing the copy. This technically is a userspace ABI breakage, but since no one should be using the ioctl, it's a good occasion to see if someone actually complains. Cc: kernel-harden...@lists.openwall.com Cc: Kees Cook Cc: Radim Krčmář Signed-off-by: Paolo Bonzini Signed-off-by: Kees Cook Signed-off-by: Sasha Levin --- arch/x86/kvm/x86.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a10da5052072..41a8ac44d5cc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4244,13 +4244,14 @@ set_identity_unlock: mutex_unlock(>lock); break; case KVM_XEN_HVM_CONFIG: { + struct kvm_xen_hvm_config xhc; r = -EFAULT; - if (copy_from_user(>arch.xen_hvm_config, argp, - sizeof(struct kvm_xen_hvm_config))) + if (copy_from_user(, argp, sizeof(xhc))) goto out; r = -EINVAL; - if (kvm->arch.xen_hvm_config.flags) + if (xhc.flags) goto out; + memcpy(>arch.xen_hvm_config, , sizeof(xhc)); r = 0; break; } -- 2.15.1
[PATCH AUTOSEL for 4.15 014/189] tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
From: Anna-Maria Gleixner [ Upstream commit 91633eed73a3ac37aaece5c8c1f93a18bae616a9 ] So far only CLOCK_MONOTONIC and CLOCK_REALTIME were taken into account as well as HRTIMER_MODE_ABS/REL in the hrtimer_init tracepoint. The query for detecting the ABS or REL timer modes is not valid anymore, it got broken by the introduction of HRTIMER_MODE_PINNED. HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the sake of completeness print all given modes. Signed-off-by: Anna-Maria Gleixner Cc: Christoph Hellwig Cc: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- include/trace/events/timer.h | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 16e305e69f34..c6f728037c53 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -136,6 +136,20 @@ DEFINE_EVENT(timer_class, timer_cancel, TP_ARGS(timer) ); +#define decode_clockid(type) \ + __print_symbolic(type, \ + { CLOCK_REALTIME, "CLOCK_REALTIME"}, \ + { CLOCK_MONOTONIC, "CLOCK_MONOTONIC" }, \ + { CLOCK_BOOTTIME, "CLOCK_BOOTTIME"}, \ + { CLOCK_TAI,"CLOCK_TAI" }) + +#define decode_hrtimer_mode(mode) \ + __print_symbolic(mode, \ + { HRTIMER_MODE_ABS, "ABS" }, \ + { HRTIMER_MODE_REL, "REL" }, \ + { HRTIMER_MODE_ABS_PINNED, "ABS|PINNED"}, \ + { HRTIMER_MODE_REL_PINNED, "REL|PINNED"}) + /** * hrtimer_init - called when the hrtimer is initialized * @hrtimer: pointer to struct hrtimer @@ -162,10 +176,8 @@ TRACE_EVENT(hrtimer_init, ), TP_printk("hrtimer=%p clockid=%s mode=%s", __entry->hrtimer, - __entry->clockid == CLOCK_REALTIME ? - "CLOCK_REALTIME" : "CLOCK_MONOTONIC", - __entry->mode == HRTIMER_MODE_ABS ? - "HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL") + decode_clockid(__entry->clockid), + decode_hrtimer_mode(__entry->mode)) ); /** -- 2.15.1
[PATCH AUTOSEL for 4.15 008/189] ALSA: hda - Use IS_REACHABLE() for dependency on input
From: Takashi Iwai[ Upstream commit c469652bb5e8fb715db7d152f46d33b3740c9b87 ] The commit ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") introduced the reverse-selection of CONFIG_INPUT for Realtek codec in order to avoid the mess with dependency between built-in and modules. Later on, we obtained IS_REACHABLE() macro exactly for this kind of problems, and now we can remove th INPUT selection in Kconfig and put IS_REACHABLE(INPUT) to the appropriate places in the code, so that the driver doesn't need to select other subsystem forcibly. Fixes: ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") Reported-by: Randy Dunlap Acked-by: Randy Dunlap # and build-tested Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/Kconfig | 1 - sound/pci/hda/patch_realtek.c | 5 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig index 7f3b5ed81995..f7a492c382d9 100644 --- a/sound/pci/hda/Kconfig +++ b/sound/pci/hda/Kconfig @@ -88,7 +88,6 @@ config SND_HDA_PATCH_LOADER config SND_HDA_CODEC_REALTEK tristate "Build Realtek HD-audio codec support" select SND_HDA_GENERIC - select INPUT help Say Y or M here to include Realtek HD-audio codec support in snd-hda-intel driver, such as ALC880. diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 206774703a33..ac7ef3957159 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -3744,6 +3744,7 @@ static void alc280_fixup_hp_gpio4(struct hda_codec *codec, } } +#if IS_REACHABLE(INPUT) static void gpio2_mic_hotkey_event(struct hda_codec *codec, struct hda_jack_callback *event) { @@ -3876,6 +3877,10 @@ static void alc233_fixup_lenovo_line2_mic_hotkey(struct hda_codec *codec, spec->kb_dev = NULL; } } +#else /* INPUT */ +#define alc280_fixup_hp_gpio2_mic_hotkey NULL +#define alc233_fixup_lenovo_line2_mic_hotkey NULL +#endif /* INPUT */ static void alc269_fixup_hp_line1_mic1_led(struct hda_codec *codec, const struct hda_fixup *fix, int action) -- 2.15.1
[PATCH AUTOSEL for 4.15 008/189] ALSA: hda - Use IS_REACHABLE() for dependency on input
From: Takashi Iwai [ Upstream commit c469652bb5e8fb715db7d152f46d33b3740c9b87 ] The commit ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") introduced the reverse-selection of CONFIG_INPUT for Realtek codec in order to avoid the mess with dependency between built-in and modules. Later on, we obtained IS_REACHABLE() macro exactly for this kind of problems, and now we can remove th INPUT selection in Kconfig and put IS_REACHABLE(INPUT) to the appropriate places in the code, so that the driver doesn't need to select other subsystem forcibly. Fixes: ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") Reported-by: Randy Dunlap Acked-by: Randy Dunlap # and build-tested Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/Kconfig | 1 - sound/pci/hda/patch_realtek.c | 5 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig index 7f3b5ed81995..f7a492c382d9 100644 --- a/sound/pci/hda/Kconfig +++ b/sound/pci/hda/Kconfig @@ -88,7 +88,6 @@ config SND_HDA_PATCH_LOADER config SND_HDA_CODEC_REALTEK tristate "Build Realtek HD-audio codec support" select SND_HDA_GENERIC - select INPUT help Say Y or M here to include Realtek HD-audio codec support in snd-hda-intel driver, such as ALC880. diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 206774703a33..ac7ef3957159 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -3744,6 +3744,7 @@ static void alc280_fixup_hp_gpio4(struct hda_codec *codec, } } +#if IS_REACHABLE(INPUT) static void gpio2_mic_hotkey_event(struct hda_codec *codec, struct hda_jack_callback *event) { @@ -3876,6 +3877,10 @@ static void alc233_fixup_lenovo_line2_mic_hotkey(struct hda_codec *codec, spec->kb_dev = NULL; } } +#else /* INPUT */ +#define alc280_fixup_hp_gpio2_mic_hotkey NULL +#define alc233_fixup_lenovo_line2_mic_hotkey NULL +#endif /* INPUT */ static void alc269_fixup_hp_line1_mic1_led(struct hda_codec *codec, const struct hda_fixup *fix, int action) -- 2.15.1
[PATCH AUTOSEL for 4.15 013/189] netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
From: Subash Abhinov Kasiviswanathan[ Upstream commit 83f1999caeb14e15df205e80d210699951733287 ] ipv6_defrag pulls network headers before fragment header. In case of an error, the netfilter layer is currently dropping these packets. This results in failure of some IPv6 standards tests which passed on older kernels due to the netfilter framework using cloning. The test case run here is a check for ICMPv6 error message replies when some invalid IPv6 fragments are sent. This specific test case is listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf in the Extension Header Processing Order section. A packet with unrecognized option Type 11 is sent and the test expects an ICMP error in line with RFC2460 section 4.2 - 11 - discard the packet and, only if the packet's Destination Address was not a multicast address, send an ICMP Parameter Problem, Code 2, message to the packet's Source Address, pointing to the unrecognized Option Type. Since netfilter layer now drops all invalid IPv6 frag packets, we no longer see the ICMP error message and fail the test case. To fix this, save the transport header. If defrag is unable to process the packet due to RFC2460, restore the transport header and allow packet to be processed by stack. There is no change for other packet processing paths. Tested by confirming that stack sends an ICMP error when it receives these packets. Also tested that fragmented ICMP pings succeed. v1->v2: Instead of cloning always, save the transport_header and restore it in case of this specific error. Update the title and commit message accordingly. Signed-off-by: Subash Abhinov Kasiviswanathan Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/ipv6/netfilter/nf_conntrack_reasm.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 977d8900cfd1..ce53dcfda88a 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -231,7 +231,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, if ((unsigned int)end > IPV6_MAXPLEN) { pr_debug("offset is too large.\n"); - return -1; + return -EINVAL; } ecn = ip6_frag_ecn(ipv6_hdr(skb)); @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, * this case. -DaveM */ pr_debug("end of fragment not rounded to 8 bytes.\n"); - return -1; + return -EPROTO; } if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ @@ -358,7 +358,7 @@ found: discard_fq: inet_frag_kill(>q, _frags); err: - return -1; + return -EINVAL; } /* @@ -567,6 +567,7 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff) int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) { + u16 savethdr = skb->transport_header; struct net_device *dev = skb->dev; int fhoff, nhoff, ret; struct frag_hdr *fhdr; @@ -600,8 +601,12 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) spin_lock_bh(>q.lock); - if (nf_ct_frag6_queue(fq, skb, fhdr, nhoff) < 0) { - ret = -EINVAL; + ret = nf_ct_frag6_queue(fq, skb, fhdr, nhoff); + if (ret < 0) { + if (ret == -EPROTO) { + skb->transport_header = savethdr; + ret = 0; + } goto out_unlock; } -- 2.15.1
[PATCH AUTOSEL for 4.15 013/189] netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
From: Subash Abhinov Kasiviswanathan [ Upstream commit 83f1999caeb14e15df205e80d210699951733287 ] ipv6_defrag pulls network headers before fragment header. In case of an error, the netfilter layer is currently dropping these packets. This results in failure of some IPv6 standards tests which passed on older kernels due to the netfilter framework using cloning. The test case run here is a check for ICMPv6 error message replies when some invalid IPv6 fragments are sent. This specific test case is listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf in the Extension Header Processing Order section. A packet with unrecognized option Type 11 is sent and the test expects an ICMP error in line with RFC2460 section 4.2 - 11 - discard the packet and, only if the packet's Destination Address was not a multicast address, send an ICMP Parameter Problem, Code 2, message to the packet's Source Address, pointing to the unrecognized Option Type. Since netfilter layer now drops all invalid IPv6 frag packets, we no longer see the ICMP error message and fail the test case. To fix this, save the transport header. If defrag is unable to process the packet due to RFC2460, restore the transport header and allow packet to be processed by stack. There is no change for other packet processing paths. Tested by confirming that stack sends an ICMP error when it receives these packets. Also tested that fragmented ICMP pings succeed. v1->v2: Instead of cloning always, save the transport_header and restore it in case of this specific error. Update the title and commit message accordingly. Signed-off-by: Subash Abhinov Kasiviswanathan Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/ipv6/netfilter/nf_conntrack_reasm.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 977d8900cfd1..ce53dcfda88a 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -231,7 +231,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, if ((unsigned int)end > IPV6_MAXPLEN) { pr_debug("offset is too large.\n"); - return -1; + return -EINVAL; } ecn = ip6_frag_ecn(ipv6_hdr(skb)); @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, * this case. -DaveM */ pr_debug("end of fragment not rounded to 8 bytes.\n"); - return -1; + return -EPROTO; } if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ @@ -358,7 +358,7 @@ found: discard_fq: inet_frag_kill(>q, _frags); err: - return -1; + return -EINVAL; } /* @@ -567,6 +567,7 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff) int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) { + u16 savethdr = skb->transport_header; struct net_device *dev = skb->dev; int fhoff, nhoff, ret; struct frag_hdr *fhdr; @@ -600,8 +601,12 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) spin_lock_bh(>q.lock); - if (nf_ct_frag6_queue(fq, skb, fhdr, nhoff) < 0) { - ret = -EINVAL; + ret = nf_ct_frag6_queue(fq, skb, fhdr, nhoff); + if (ret < 0) { + if (ret == -EPROTO) { + skb->transport_header = savethdr; + ret = 0; + } goto out_unlock; } -- 2.15.1
[PATCH AUTOSEL for 4.15 016/189] platform/x86: dell-laptop: Filter out spurious keyboard backlight change events
From: Hans de Goede[ Upstream commit 4d6bde512a86c32df3a1f289d2b4cd04b17758d1 ] On some Dell XPS models WMI events of type 0x reporting a keycode of 0xe00c get reported when the brightness of the LCD panel changes. This leads to us reporting false-positive kbd_led change events to userspace which in turn leads to the kbd backlight OSD showing when it should not. We already read the current keyboard backlight brightness value when reporting events because the led_classdev_notify_brightness_hw_changed API requires this. Compare this value to the last known value and filter out duplicate events, fixing this. Note the fixed issue is esp. a problem on XPS models with an ambient light sensor and automatic brightness adjustments turned on, this causes the kbd backlight OSD to show all the time there. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514969 Fixes: 9c656b0799 ("platform/x86: dell-*: Call new led hw_changed API ...") Acked-by: Pali Rohár Signed-off-by: Hans de Goede Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin --- drivers/platform/x86/dell-laptop.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c index c864430b9fcf..8ddb309cfc85 100644 --- a/drivers/platform/x86/dell-laptop.c +++ b/drivers/platform/x86/dell-laptop.c @@ -1149,6 +1149,7 @@ static u8 kbd_previous_mode_bit; static bool kbd_led_present; static DEFINE_MUTEX(kbd_led_mutex); +static enum led_brightness kbd_led_level; /* * NOTE: there are three ways to set the keyboard backlight level. @@ -1971,6 +1972,7 @@ static enum led_brightness kbd_led_level_get(struct led_classdev *led_cdev) static int kbd_led_level_set(struct led_classdev *led_cdev, enum led_brightness value) { + enum led_brightness new_value = value; struct kbd_state state; struct kbd_state new_state; u16 num; @@ -2000,6 +2002,9 @@ static int kbd_led_level_set(struct led_classdev *led_cdev, } out: + if (ret == 0) + kbd_led_level = new_value; + mutex_unlock(_led_mutex); return ret; } @@ -2027,6 +2032,9 @@ static int __init kbd_led_init(struct device *dev) if (kbd_led.max_brightness) kbd_led.max_brightness--; } + + kbd_led_level = kbd_led_level_get(NULL); + ret = led_classdev_register(dev, _led); if (ret) kbd_led_present = false; @@ -2051,13 +2059,25 @@ static void kbd_led_exit(void) static int dell_laptop_notifier_call(struct notifier_block *nb, unsigned long action, void *data) { + bool changed = false; + enum led_brightness new_kbd_led_level; + switch (action) { case DELL_LAPTOP_KBD_BACKLIGHT_BRIGHTNESS_CHANGED: if (!kbd_led_present) break; - led_classdev_notify_brightness_hw_changed(_led, - kbd_led_level_get(_led)); + mutex_lock(_led_mutex); + new_kbd_led_level = kbd_led_level_get(_led); + if (kbd_led_level != new_kbd_led_level) { + kbd_led_level = new_kbd_led_level; + changed = true; + } + mutex_unlock(_led_mutex); + + if (changed) + led_classdev_notify_brightness_hw_changed(_led, + kbd_led_level); break; } -- 2.15.1
[PATCH AUTOSEL for 4.15 015/189] KVM: s390: use created_vcpus in more places
From: Christian Borntraeger[ Upstream commit 241e3ec0faf5ab1a0d9b1f6c43eefa919fb9c112 ] commit a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus") introduced kvm->created_vcpus to avoid races with the existing kvm->online_vcpus scheme. One place was "forgotten" and one new place was "added". Let's fix those. Reported-by: Halil Pasic Signed-off-by: Christian Borntraeger Reviewed-by: Halil Pasic Reviewed-by: Cornelia Huck Reviewed-by: David Hildenbrand Fixes: 4e0b1ab72b8a ("KVM: s390: gs support for kvm guests") Fixes: a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus") Signed-off-by: Sasha Levin --- arch/s390/kvm/kvm-s390.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 004684eaa827..50193cbc819a 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -602,7 +602,7 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) case KVM_CAP_S390_GS: r = -EINVAL; mutex_lock(>lock); - if (atomic_read(>online_vcpus)) { + if (kvm->created_vcpus) { r = -EBUSY; } else if (test_facility(133)) { set_kvm_facility(kvm->arch.model.fac_mask, 133); @@ -1122,7 +1122,7 @@ static int kvm_s390_set_processor_feat(struct kvm *kvm, return -EINVAL; mutex_lock(>lock); - if (!atomic_read(>online_vcpus)) { + if (!kvm->created_vcpus) { bitmap_copy(kvm->arch.cpu_feat, (unsigned long *) data.feat, KVM_S390_VM_CPU_FEAT_NR_BITS); ret = 0; -- 2.15.1
[PATCH AUTOSEL for 4.15 018/189] xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_request
From: Chuck Lever[ Upstream commit 42b9f5c58aa8c59c91ead0254f0c193e3438b020 ] The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin --- net/sunrpc/xprtrdma/transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 6ee1ad8978f3..76c03aa6cb57 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -744,7 +744,7 @@ xprt_rdma_send_request(struct rpc_task *task) goto drop_connection; req->rl_connect_cookie = xprt->connect_cookie; - set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags); + __set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags); if (rpcrdma_ep_post(_xprt->rx_ia, _xprt->rx_ep, req)) goto drop_connection; -- 2.15.1
[PATCH AUTOSEL for 4.15 016/189] platform/x86: dell-laptop: Filter out spurious keyboard backlight change events
From: Hans de Goede [ Upstream commit 4d6bde512a86c32df3a1f289d2b4cd04b17758d1 ] On some Dell XPS models WMI events of type 0x reporting a keycode of 0xe00c get reported when the brightness of the LCD panel changes. This leads to us reporting false-positive kbd_led change events to userspace which in turn leads to the kbd backlight OSD showing when it should not. We already read the current keyboard backlight brightness value when reporting events because the led_classdev_notify_brightness_hw_changed API requires this. Compare this value to the last known value and filter out duplicate events, fixing this. Note the fixed issue is esp. a problem on XPS models with an ambient light sensor and automatic brightness adjustments turned on, this causes the kbd backlight OSD to show all the time there. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514969 Fixes: 9c656b0799 ("platform/x86: dell-*: Call new led hw_changed API ...") Acked-by: Pali Rohár Signed-off-by: Hans de Goede Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin --- drivers/platform/x86/dell-laptop.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c index c864430b9fcf..8ddb309cfc85 100644 --- a/drivers/platform/x86/dell-laptop.c +++ b/drivers/platform/x86/dell-laptop.c @@ -1149,6 +1149,7 @@ static u8 kbd_previous_mode_bit; static bool kbd_led_present; static DEFINE_MUTEX(kbd_led_mutex); +static enum led_brightness kbd_led_level; /* * NOTE: there are three ways to set the keyboard backlight level. @@ -1971,6 +1972,7 @@ static enum led_brightness kbd_led_level_get(struct led_classdev *led_cdev) static int kbd_led_level_set(struct led_classdev *led_cdev, enum led_brightness value) { + enum led_brightness new_value = value; struct kbd_state state; struct kbd_state new_state; u16 num; @@ -2000,6 +2002,9 @@ static int kbd_led_level_set(struct led_classdev *led_cdev, } out: + if (ret == 0) + kbd_led_level = new_value; + mutex_unlock(_led_mutex); return ret; } @@ -2027,6 +2032,9 @@ static int __init kbd_led_init(struct device *dev) if (kbd_led.max_brightness) kbd_led.max_brightness--; } + + kbd_led_level = kbd_led_level_get(NULL); + ret = led_classdev_register(dev, _led); if (ret) kbd_led_present = false; @@ -2051,13 +2059,25 @@ static void kbd_led_exit(void) static int dell_laptop_notifier_call(struct notifier_block *nb, unsigned long action, void *data) { + bool changed = false; + enum led_brightness new_kbd_led_level; + switch (action) { case DELL_LAPTOP_KBD_BACKLIGHT_BRIGHTNESS_CHANGED: if (!kbd_led_present) break; - led_classdev_notify_brightness_hw_changed(_led, - kbd_led_level_get(_led)); + mutex_lock(_led_mutex); + new_kbd_led_level = kbd_led_level_get(_led); + if (kbd_led_level != new_kbd_led_level) { + kbd_led_level = new_kbd_led_level; + changed = true; + } + mutex_unlock(_led_mutex); + + if (changed) + led_classdev_notify_brightness_hw_changed(_led, + kbd_led_level); break; } -- 2.15.1
[PATCH AUTOSEL for 4.15 015/189] KVM: s390: use created_vcpus in more places
From: Christian Borntraeger [ Upstream commit 241e3ec0faf5ab1a0d9b1f6c43eefa919fb9c112 ] commit a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus") introduced kvm->created_vcpus to avoid races with the existing kvm->online_vcpus scheme. One place was "forgotten" and one new place was "added". Let's fix those. Reported-by: Halil Pasic Signed-off-by: Christian Borntraeger Reviewed-by: Halil Pasic Reviewed-by: Cornelia Huck Reviewed-by: David Hildenbrand Fixes: 4e0b1ab72b8a ("KVM: s390: gs support for kvm guests") Fixes: a03825bbd0c3 ("KVM: s390: use kvm->created_vcpus") Signed-off-by: Sasha Levin --- arch/s390/kvm/kvm-s390.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 004684eaa827..50193cbc819a 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -602,7 +602,7 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) case KVM_CAP_S390_GS: r = -EINVAL; mutex_lock(>lock); - if (atomic_read(>online_vcpus)) { + if (kvm->created_vcpus) { r = -EBUSY; } else if (test_facility(133)) { set_kvm_facility(kvm->arch.model.fac_mask, 133); @@ -1122,7 +1122,7 @@ static int kvm_s390_set_processor_feat(struct kvm *kvm, return -EINVAL; mutex_lock(>lock); - if (!atomic_read(>online_vcpus)) { + if (!kvm->created_vcpus) { bitmap_copy(kvm->arch.cpu_feat, (unsigned long *) data.feat, KVM_S390_VM_CPU_FEAT_NR_BITS); ret = 0; -- 2.15.1
[PATCH AUTOSEL for 4.15 018/189] xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_request
From: Chuck Lever [ Upstream commit 42b9f5c58aa8c59c91ead0254f0c193e3438b020 ] The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin --- net/sunrpc/xprtrdma/transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 6ee1ad8978f3..76c03aa6cb57 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -744,7 +744,7 @@ xprt_rdma_send_request(struct rpc_task *task) goto drop_connection; req->rl_connect_cookie = xprt->connect_cookie; - set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags); + __set_bit(RPCRDMA_REQ_F_PENDING, >rl_flags); if (rpcrdma_ep_post(_xprt->rx_ia, _xprt->rx_ep, req)) goto drop_connection; -- 2.15.1
[PATCH AUTOSEL for 4.15 019/189] printk: Add console owner and waiter logic to load balance console writes
From: "Steven Rostedt (VMware)"[ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ] This patch implements what I discussed in Kernel Summit. I added lockdep annotation (hopefully correctly), and it hasn't had any splats (since I fixed some bugs in the first iterations). It did catch problems when I had the owner covering too much. But now that the owner is only set when actively calling the consoles, lockdep has stayed quiet. Here's the design again: I added a "console_owner" which is set to a task that is actively writing to the consoles. It is *not* the same as the owner of the console_lock. It is only set when doing the calls to the console functions. It is protected by a console_owner_lock which is a raw spin lock. There is a console_waiter. This is set when there is an active console owner that is not current, and waiter is not set. This too is protected by console_owner_lock. In printk() when it tries to write to the consoles, we have: if (console_trylock()) console_unlock(); Now I added an else, which will check if there is an active owner, and no current waiter. If that is the case, then console_waiter is set, and the task goes into a spin until it is no longer set. When the active console owner finishes writing the current message to the consoles, it grabs the console_owner_lock and sees if there is a waiter, and clears console_owner. If there is a waiter, then it breaks out of the loop, clears the waiter flag (because that will release the waiter from its spin), and exits. Note, it does *not* release the console semaphore. Because it is a semaphore, there is no owner. Another task may release it. This means that the waiter is guaranteed to be the new console owner! Which it becomes. Then the waiter calls console_unlock() and continues to write to the consoles. If another task comes along and does a printk() it too can become the new waiter, and we wash rinse and repeat! By Petr Mladek about possible new deadlocks: The thing is that we move console_sem only to printk() call that normally calls console_unlock() as well. It means that the transferred owner should not bring new type of dependencies. As Steven said somewhere: "If there is a deadlock, it was there even before." We could look at it from this side. The possible deadlock would look like: CPU0CPU1 console_unlock() console_owner = current; spin_lockA() printk() spin = true; while (...) call_console_drivers() spin_lockA() This would be a deadlock. CPU0 would wait for the lock A. While CPU1 would own the lockA and would wait for CPU0 to finish calling the console drivers and pass the console_sem owner. But if the above is true than the following scenario was already possible before: CPU0 spin_lockA() printk() console_unlock() call_console_drivers() spin_lockA() By other words, this deadlock was there even before. Such deadlocks are prevented by using printk_deferred() in the sections guarded by the lock A. By Steven Rostedt: To demonstrate the issue, this module has been shown to lock up a system with 4 CPUs and a slow console (like a serial console). It is also able to lock up a 8 CPU system with only a fast (VGA) console, by passing in "loops=100". The changes in this commit prevent this module from locking up the system. #include #include #include #include #include #include static bool stop_testing; static unsigned int loops = 1; static void preempt_printk_workfn(struct work_struct *work) { int i; while (!READ_ONCE(stop_testing)) { for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) { preempt_disable(); pr_emerg("%5d%-75s\n", smp_processor_id(), " XXX NOPREEMPT"); preempt_enable(); } msleep(1); } } static struct work_struct __percpu *works; static void finish(void) { int cpu; WRITE_ONCE(stop_testing, true); for_each_online_cpu(cpu) flush_work(per_cpu_ptr(works, cpu)); free_percpu(works); } static int __init test_init(void) { int cpu; works = alloc_percpu(struct work_struct); if (!works) return -ENOMEM; /* * This is just a test module. This will break if you * do any CPU hot plugging between loading and * unloading the module. */ for_each_online_cpu(cpu) { struct work_struct *work = per_cpu_ptr(works, cpu); INIT_WORK(work, _printk_workfn); schedule_work_on(cpu, work); } return 0; } static void __exit test_exit(void) { finish(); }
[PATCH AUTOSEL for 4.15 019/189] printk: Add console owner and waiter logic to load balance console writes
From: "Steven Rostedt (VMware)" [ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ] This patch implements what I discussed in Kernel Summit. I added lockdep annotation (hopefully correctly), and it hasn't had any splats (since I fixed some bugs in the first iterations). It did catch problems when I had the owner covering too much. But now that the owner is only set when actively calling the consoles, lockdep has stayed quiet. Here's the design again: I added a "console_owner" which is set to a task that is actively writing to the consoles. It is *not* the same as the owner of the console_lock. It is only set when doing the calls to the console functions. It is protected by a console_owner_lock which is a raw spin lock. There is a console_waiter. This is set when there is an active console owner that is not current, and waiter is not set. This too is protected by console_owner_lock. In printk() when it tries to write to the consoles, we have: if (console_trylock()) console_unlock(); Now I added an else, which will check if there is an active owner, and no current waiter. If that is the case, then console_waiter is set, and the task goes into a spin until it is no longer set. When the active console owner finishes writing the current message to the consoles, it grabs the console_owner_lock and sees if there is a waiter, and clears console_owner. If there is a waiter, then it breaks out of the loop, clears the waiter flag (because that will release the waiter from its spin), and exits. Note, it does *not* release the console semaphore. Because it is a semaphore, there is no owner. Another task may release it. This means that the waiter is guaranteed to be the new console owner! Which it becomes. Then the waiter calls console_unlock() and continues to write to the consoles. If another task comes along and does a printk() it too can become the new waiter, and we wash rinse and repeat! By Petr Mladek about possible new deadlocks: The thing is that we move console_sem only to printk() call that normally calls console_unlock() as well. It means that the transferred owner should not bring new type of dependencies. As Steven said somewhere: "If there is a deadlock, it was there even before." We could look at it from this side. The possible deadlock would look like: CPU0CPU1 console_unlock() console_owner = current; spin_lockA() printk() spin = true; while (...) call_console_drivers() spin_lockA() This would be a deadlock. CPU0 would wait for the lock A. While CPU1 would own the lockA and would wait for CPU0 to finish calling the console drivers and pass the console_sem owner. But if the above is true than the following scenario was already possible before: CPU0 spin_lockA() printk() console_unlock() call_console_drivers() spin_lockA() By other words, this deadlock was there even before. Such deadlocks are prevented by using printk_deferred() in the sections guarded by the lock A. By Steven Rostedt: To demonstrate the issue, this module has been shown to lock up a system with 4 CPUs and a slow console (like a serial console). It is also able to lock up a 8 CPU system with only a fast (VGA) console, by passing in "loops=100". The changes in this commit prevent this module from locking up the system. #include #include #include #include #include #include static bool stop_testing; static unsigned int loops = 1; static void preempt_printk_workfn(struct work_struct *work) { int i; while (!READ_ONCE(stop_testing)) { for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) { preempt_disable(); pr_emerg("%5d%-75s\n", smp_processor_id(), " XXX NOPREEMPT"); preempt_enable(); } msleep(1); } } static struct work_struct __percpu *works; static void finish(void) { int cpu; WRITE_ONCE(stop_testing, true); for_each_online_cpu(cpu) flush_work(per_cpu_ptr(works, cpu)); free_percpu(works); } static int __init test_init(void) { int cpu; works = alloc_percpu(struct work_struct); if (!works) return -ENOMEM; /* * This is just a test module. This will break if you * do any CPU hot plugging between loading and * unloading the module. */ for_each_online_cpu(cpu) { struct work_struct *work = per_cpu_ptr(works, cpu); INIT_WORK(work, _printk_workfn); schedule_work_on(cpu, work); } return 0; } static void __exit test_exit(void) { finish(); } module_param(loops, uint,
[PATCH AUTOSEL for 4.15 017/189] xprtrdma: Fix backchannel allocation of extra rpcrdma_reps
From: Chuck Lever[ Upstream commit d698c4a02ee02053bbebe051322ff427a2dad56a ] The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin --- net/sunrpc/xprtrdma/backchannel.c | 12 ++-- net/sunrpc/xprtrdma/verbs.c | 32 +++- net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- 3 files changed, 22 insertions(+), 24 deletions(-) diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c index 8b818bb3518a..256c67b433c1 100644 --- a/net/sunrpc/xprtrdma/backchannel.c +++ b/net/sunrpc/xprtrdma/backchannel.c @@ -74,21 +74,13 @@ out_fail: static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt, unsigned int count) { - struct rpcrdma_rep *rep; int rc = 0; while (count--) { - rep = rpcrdma_create_rep(r_xprt); - if (IS_ERR(rep)) { - pr_err("RPC: %s: reply buffer alloc failed\n", - __func__); - rc = PTR_ERR(rep); + rc = rpcrdma_create_rep(r_xprt); + if (rc) break; - } - - rpcrdma_recv_buffer_put(rep); } - return rc; } diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 8cd7ee4fa0cd..371fbd9b55bb 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1093,10 +1093,17 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt) return req; } -struct rpcrdma_rep * +/** + * rpcrdma_create_rep - Allocate an rpcrdma_rep object + * @r_xprt: controlling transport + * + * Returns 0 on success or a negative errno on failure. + */ +int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt) { struct rpcrdma_create_data_internal *cdata = _xprt->rx_data; + struct rpcrdma_buffer *buf = _xprt->rx_buf; struct rpcrdma_rep *rep; int rc; @@ -1121,12 +1128,18 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt) rep->rr_recv_wr.wr_cqe = >rr_cqe; rep->rr_recv_wr.sg_list = >rr_rdmabuf->rg_iov; rep->rr_recv_wr.num_sge = 1; - return rep; + + spin_lock(>rb_lock); + list_add(>rr_list, >rb_recv_bufs); + spin_unlock(>rb_lock); + return 0; out_free: kfree(rep); out: - return ERR_PTR(rc); + dprintk("RPC: %s: reply buffer %d alloc failed\n", + __func__, rc); + return rc; } int @@ -1167,17 +1180,10 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt) } INIT_LIST_HEAD(>rb_recv_bufs); - for (i = 0; i < buf->rb_max_requests + RPCRDMA_MAX_BC_REQUESTS; i++) { - struct rpcrdma_rep *rep; - - rep = rpcrdma_create_rep(r_xprt); - if (IS_ERR(rep)) { - dprintk("RPC: %s: reply buffer %d alloc failed\n", - __func__, i); - rc = PTR_ERR(rep); + for (i = 0; i <= buf->rb_max_requests; i++) { + rc = rpcrdma_create_rep(r_xprt); + if (rc) goto out; - } - list_add(>rr_list, >rb_recv_bufs); } rc = rpcrdma_sendctxs_create(r_xprt); diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 1342f743f1c4..3b63e61feae2 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -564,8 +564,8 @@ int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *); * Buffer calls - xprtrdma/verbs.c */ struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *); -struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *); void rpcrdma_destroy_req(struct rpcrdma_req *); +int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt); int rpcrdma_buffer_create(struct rpcrdma_xprt *); void rpcrdma_buffer_destroy(struct
[PATCH AUTOSEL for 4.15 009/189] ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
From: Dan Carpenter[ Upstream commit 123af9043e93cb6f235207d260d50f832cdb5439 ] The loop timeout doesn't work because it's a post op and ends with "tmo" set to -1. I changed it from a post-op to a pre-op and I changed the initial the starting value from 5 to 6 so we still iterate 5 times. I left the other as it was because it's a large number. Fixes: b3c70c9ea62a ("ASoC: Alchemy AC97C/I2SC audio support") Signed-off-by: Dan Carpenter Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/au1x/ac97c.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sound/soc/au1x/ac97c.c b/sound/soc/au1x/ac97c.c index 29a97d52e8ad..66d6c52e7761 100644 --- a/sound/soc/au1x/ac97c.c +++ b/sound/soc/au1x/ac97c.c @@ -91,8 +91,8 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97, do { mutex_lock(>lock); - tmo = 5; - while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--) + tmo = 6; + while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo) udelay(21); /* wait an ac97 frame time */ if (!tmo) { pr_debug("ac97rd timeout #1\n"); @@ -105,7 +105,7 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97, * poll, Forrest, poll... */ tmo = 0x1; - while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--) + while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo) asm volatile ("nop"); data = RD(ctx, AC97_CMDRESP); -- 2.15.1
[PATCH AUTOSEL for 4.15 017/189] xprtrdma: Fix backchannel allocation of extra rpcrdma_reps
From: Chuck Lever [ Upstream commit d698c4a02ee02053bbebe051322ff427a2dad56a ] The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin --- net/sunrpc/xprtrdma/backchannel.c | 12 ++-- net/sunrpc/xprtrdma/verbs.c | 32 +++- net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- 3 files changed, 22 insertions(+), 24 deletions(-) diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c index 8b818bb3518a..256c67b433c1 100644 --- a/net/sunrpc/xprtrdma/backchannel.c +++ b/net/sunrpc/xprtrdma/backchannel.c @@ -74,21 +74,13 @@ out_fail: static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt, unsigned int count) { - struct rpcrdma_rep *rep; int rc = 0; while (count--) { - rep = rpcrdma_create_rep(r_xprt); - if (IS_ERR(rep)) { - pr_err("RPC: %s: reply buffer alloc failed\n", - __func__); - rc = PTR_ERR(rep); + rc = rpcrdma_create_rep(r_xprt); + if (rc) break; - } - - rpcrdma_recv_buffer_put(rep); } - return rc; } diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 8cd7ee4fa0cd..371fbd9b55bb 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1093,10 +1093,17 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt) return req; } -struct rpcrdma_rep * +/** + * rpcrdma_create_rep - Allocate an rpcrdma_rep object + * @r_xprt: controlling transport + * + * Returns 0 on success or a negative errno on failure. + */ +int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt) { struct rpcrdma_create_data_internal *cdata = _xprt->rx_data; + struct rpcrdma_buffer *buf = _xprt->rx_buf; struct rpcrdma_rep *rep; int rc; @@ -1121,12 +1128,18 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt) rep->rr_recv_wr.wr_cqe = >rr_cqe; rep->rr_recv_wr.sg_list = >rr_rdmabuf->rg_iov; rep->rr_recv_wr.num_sge = 1; - return rep; + + spin_lock(>rb_lock); + list_add(>rr_list, >rb_recv_bufs); + spin_unlock(>rb_lock); + return 0; out_free: kfree(rep); out: - return ERR_PTR(rc); + dprintk("RPC: %s: reply buffer %d alloc failed\n", + __func__, rc); + return rc; } int @@ -1167,17 +1180,10 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt) } INIT_LIST_HEAD(>rb_recv_bufs); - for (i = 0; i < buf->rb_max_requests + RPCRDMA_MAX_BC_REQUESTS; i++) { - struct rpcrdma_rep *rep; - - rep = rpcrdma_create_rep(r_xprt); - if (IS_ERR(rep)) { - dprintk("RPC: %s: reply buffer %d alloc failed\n", - __func__, i); - rc = PTR_ERR(rep); + for (i = 0; i <= buf->rb_max_requests; i++) { + rc = rpcrdma_create_rep(r_xprt); + if (rc) goto out; - } - list_add(>rr_list, >rb_recv_bufs); } rc = rpcrdma_sendctxs_create(r_xprt); diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 1342f743f1c4..3b63e61feae2 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -564,8 +564,8 @@ int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *); * Buffer calls - xprtrdma/verbs.c */ struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *); -struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *); void rpcrdma_destroy_req(struct rpcrdma_req *); +int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt); int rpcrdma_buffer_create(struct rpcrdma_xprt *); void rpcrdma_buffer_destroy(struct rpcrdma_buffer *); struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf); -- 2.15.1
[PATCH AUTOSEL for 4.15 009/189] ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
From: Dan Carpenter [ Upstream commit 123af9043e93cb6f235207d260d50f832cdb5439 ] The loop timeout doesn't work because it's a post op and ends with "tmo" set to -1. I changed it from a post-op to a pre-op and I changed the initial the starting value from 5 to 6 so we still iterate 5 times. I left the other as it was because it's a large number. Fixes: b3c70c9ea62a ("ASoC: Alchemy AC97C/I2SC audio support") Signed-off-by: Dan Carpenter Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/au1x/ac97c.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sound/soc/au1x/ac97c.c b/sound/soc/au1x/ac97c.c index 29a97d52e8ad..66d6c52e7761 100644 --- a/sound/soc/au1x/ac97c.c +++ b/sound/soc/au1x/ac97c.c @@ -91,8 +91,8 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97, do { mutex_lock(>lock); - tmo = 5; - while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--) + tmo = 6; + while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo) udelay(21); /* wait an ac97 frame time */ if (!tmo) { pr_debug("ac97rd timeout #1\n"); @@ -105,7 +105,7 @@ static unsigned short au1xac97c_ac97_read(struct snd_ac97 *ac97, * poll, Forrest, poll... */ tmo = 0x1; - while ((RD(ctx, AC97_STATUS) & STAT_CP) && tmo--) + while ((RD(ctx, AC97_STATUS) & STAT_CP) && --tmo) asm volatile ("nop"); data = RD(ctx, AC97_CMDRESP); -- 2.15.1
[PATCH AUTOSEL for 4.15 024/189] Input: synaptics - reset the ABS_X/Y fuzz after initializing MT axes
From: Peter Hutterer[ Upstream commit 19eb4ed1141bd1096b9bc84ba9c4d03d5830c143 ] input_mt_init_slots() resets the ABS_X/Y fuzz to 0 and expects the driver to call input_mt_report_pointer_emulation(). That is based on the MT position bits which are already defuzzed - hence a fuzz of 0. In the case of synaptics semi-mt devices, we report the ABS_X/Y axes manually. This results in the MT position being defuzzed but the single-touch emulation missing that defuzzing. Work around this by re-initializing the ABS_X/Y axes after the MT axis to get the same fuzz value back. https://bugs.freedesktop.org/show_bug.cgi?id=104533 Signed-off-by: Peter Hutterer Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/mouse/synaptics.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c index ee5466a374bf..a246fc686bb7 100644 --- a/drivers/input/mouse/synaptics.c +++ b/drivers/input/mouse/synaptics.c @@ -1280,6 +1280,16 @@ static void set_input_params(struct psmouse *psmouse, INPUT_MT_POINTER | (cr48_profile_sensor ? INPUT_MT_TRACK : INPUT_MT_SEMI_MT)); + + /* +* For semi-mt devices we send ABS_X/Y ourselves instead of +* input_mt_report_pointer_emulation. But +* input_mt_init_slots() resets the fuzz to 0, leading to a +* filtered ABS_MT_POSITION_X but an unfiltered ABS_X +* position. Let's re-initialize ABS_X/Y here. +*/ + if (!cr48_profile_sensor) + set_abs_position_params(dev, >info, ABS_X, ABS_Y); } if (SYN_CAP_PALMDETECT(info->capabilities)) -- 2.15.1
[PATCH AUTOSEL for 4.15 024/189] Input: synaptics - reset the ABS_X/Y fuzz after initializing MT axes
From: Peter Hutterer [ Upstream commit 19eb4ed1141bd1096b9bc84ba9c4d03d5830c143 ] input_mt_init_slots() resets the ABS_X/Y fuzz to 0 and expects the driver to call input_mt_report_pointer_emulation(). That is based on the MT position bits which are already defuzzed - hence a fuzz of 0. In the case of synaptics semi-mt devices, we report the ABS_X/Y axes manually. This results in the MT position being defuzzed but the single-touch emulation missing that defuzzing. Work around this by re-initializing the ABS_X/Y axes after the MT axis to get the same fuzz value back. https://bugs.freedesktop.org/show_bug.cgi?id=104533 Signed-off-by: Peter Hutterer Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/mouse/synaptics.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c index ee5466a374bf..a246fc686bb7 100644 --- a/drivers/input/mouse/synaptics.c +++ b/drivers/input/mouse/synaptics.c @@ -1280,6 +1280,16 @@ static void set_input_params(struct psmouse *psmouse, INPUT_MT_POINTER | (cr48_profile_sensor ? INPUT_MT_TRACK : INPUT_MT_SEMI_MT)); + + /* +* For semi-mt devices we send ABS_X/Y ourselves instead of +* input_mt_report_pointer_emulation. But +* input_mt_init_slots() resets the fuzz to 0, leading to a +* filtered ABS_MT_POSITION_X but an unfiltered ABS_X +* position. Let's re-initialize ABS_X/Y here. +*/ + if (!cr48_profile_sensor) + set_abs_position_params(dev, >info, ABS_X, ABS_Y); } if (SYN_CAP_PALMDETECT(info->capabilities)) -- 2.15.1
[PATCH AUTOSEL for 4.15 022/189] Input: psmouse - fix Synaptics detection when protocol is disabled
From: Dmitry Torokhov[ Upstream commit 2bc4298f59d2f15175bb568e2d356b5912d0cdd9 ] When Synaptics protocol is disabled, we still need to try and detect the hardware, so we can switch to SMBus device if SMbus is detected, or we know that it is Synaptics device and reset it properly for the bare PS/2 protocol. Fixes: c378b5119eb0 ("Input: psmouse - factor out common protocol probing code") Reported-by: Matteo Croce Tested-by: Matteo Croce Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/mouse/psmouse-base.c | 34 +- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c index 6a5649e52eed..8ac9e03c05b4 100644 --- a/drivers/input/mouse/psmouse-base.c +++ b/drivers/input/mouse/psmouse-base.c @@ -975,6 +975,21 @@ static void psmouse_apply_defaults(struct psmouse *psmouse) psmouse->pt_deactivate = NULL; } +static bool psmouse_do_detect(int (*detect)(struct psmouse *, bool), + struct psmouse *psmouse, bool allow_passthrough, + bool set_properties) +{ + if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU && + !allow_passthrough) { + return false; + } + + if (set_properties) + psmouse_apply_defaults(psmouse); + + return detect(psmouse, set_properties) == 0; +} + static bool psmouse_try_protocol(struct psmouse *psmouse, enum psmouse_type type, unsigned int *max_proto, @@ -986,15 +1001,8 @@ static bool psmouse_try_protocol(struct psmouse *psmouse, if (!proto) return false; - if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU && - !proto->try_passthru) { - return false; - } - - if (set_properties) - psmouse_apply_defaults(psmouse); - - if (proto->detect(psmouse, set_properties) != 0) + if (!psmouse_do_detect(proto->detect, psmouse, proto->try_passthru, + set_properties)) return false; if (set_properties && proto->init && init_allowed) { @@ -1027,8 +1035,8 @@ static int psmouse_extensions(struct psmouse *psmouse, * Always check for focaltech, this is safe as it uses pnp-id * matching. */ - if (psmouse_try_protocol(psmouse, PSMOUSE_FOCALTECH, -_proto, set_properties, false)) { + if (psmouse_do_detect(focaltech_detect, + psmouse, false, set_properties)) { if (max_proto > PSMOUSE_IMEX && IS_ENABLED(CONFIG_MOUSE_PS2_FOCALTECH) && (!set_properties || focaltech_init(psmouse) == 0)) { @@ -1074,8 +1082,8 @@ static int psmouse_extensions(struct psmouse *psmouse, * probing for IntelliMouse. */ if (max_proto > PSMOUSE_PS2 && - psmouse_try_protocol(psmouse, PSMOUSE_SYNAPTICS, _proto, -set_properties, false)) { + psmouse_do_detect(synaptics_detect, + psmouse, false, set_properties)) { synaptics_hardware = true; if (max_proto > PSMOUSE_IMEX) { -- 2.15.1
[PATCH AUTOSEL for 4.15 022/189] Input: psmouse - fix Synaptics detection when protocol is disabled
From: Dmitry Torokhov [ Upstream commit 2bc4298f59d2f15175bb568e2d356b5912d0cdd9 ] When Synaptics protocol is disabled, we still need to try and detect the hardware, so we can switch to SMBus device if SMbus is detected, or we know that it is Synaptics device and reset it properly for the bare PS/2 protocol. Fixes: c378b5119eb0 ("Input: psmouse - factor out common protocol probing code") Reported-by: Matteo Croce Tested-by: Matteo Croce Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/mouse/psmouse-base.c | 34 +- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c index 6a5649e52eed..8ac9e03c05b4 100644 --- a/drivers/input/mouse/psmouse-base.c +++ b/drivers/input/mouse/psmouse-base.c @@ -975,6 +975,21 @@ static void psmouse_apply_defaults(struct psmouse *psmouse) psmouse->pt_deactivate = NULL; } +static bool psmouse_do_detect(int (*detect)(struct psmouse *, bool), + struct psmouse *psmouse, bool allow_passthrough, + bool set_properties) +{ + if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU && + !allow_passthrough) { + return false; + } + + if (set_properties) + psmouse_apply_defaults(psmouse); + + return detect(psmouse, set_properties) == 0; +} + static bool psmouse_try_protocol(struct psmouse *psmouse, enum psmouse_type type, unsigned int *max_proto, @@ -986,15 +1001,8 @@ static bool psmouse_try_protocol(struct psmouse *psmouse, if (!proto) return false; - if (psmouse->ps2dev.serio->id.type == SERIO_PS_PSTHRU && - !proto->try_passthru) { - return false; - } - - if (set_properties) - psmouse_apply_defaults(psmouse); - - if (proto->detect(psmouse, set_properties) != 0) + if (!psmouse_do_detect(proto->detect, psmouse, proto->try_passthru, + set_properties)) return false; if (set_properties && proto->init && init_allowed) { @@ -1027,8 +1035,8 @@ static int psmouse_extensions(struct psmouse *psmouse, * Always check for focaltech, this is safe as it uses pnp-id * matching. */ - if (psmouse_try_protocol(psmouse, PSMOUSE_FOCALTECH, -_proto, set_properties, false)) { + if (psmouse_do_detect(focaltech_detect, + psmouse, false, set_properties)) { if (max_proto > PSMOUSE_IMEX && IS_ENABLED(CONFIG_MOUSE_PS2_FOCALTECH) && (!set_properties || focaltech_init(psmouse) == 0)) { @@ -1074,8 +1082,8 @@ static int psmouse_extensions(struct psmouse *psmouse, * probing for IntelliMouse. */ if (max_proto > PSMOUSE_PS2 && - psmouse_try_protocol(psmouse, PSMOUSE_SYNAPTICS, _proto, -set_properties, false)) { + psmouse_do_detect(synaptics_detect, + psmouse, false, set_properties)) { synaptics_hardware = true; if (max_proto > PSMOUSE_IMEX) { -- 2.15.1
[PATCH AUTOSEL for 4.15 025/189] i40iw: Free IEQ resources
From: Mustafa Ismail[ Upstream commit f20d429511affab6a2a9129f46042f43e6ffe396 ] The iWARP Exception Queue (IEQ) resources are not freed when a QP is destroyed. Fix this by freeing IEQ resources when freeing QP resources. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail Signed-off-by: Shiraz Saleem Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin --- drivers/infiniband/hw/i40iw/i40iw_puda.c | 3 +-- drivers/infiniband/hw/i40iw/i40iw_puda.h | 1 + drivers/infiniband/hw/i40iw/i40iw_verbs.c | 1 + 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.c b/drivers/infiniband/hw/i40iw/i40iw_puda.c index 796a815b53fd..266c5952ba92 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_puda.c +++ b/drivers/infiniband/hw/i40iw/i40iw_puda.c @@ -48,7 +48,6 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid); static void i40iw_ilq_putback_rcvbuf(struct i40iw_sc_qp *qp, u32 wqe_idx); static enum i40iw_status_code i40iw_puda_replenish_rq(struct i40iw_puda_rsrc *rsrc, bool initial); -static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp); /** * i40iw_puda_get_listbuf - get buffer from puda list * @list: list to use for buffers (ILQ or IEQ) @@ -1483,7 +1482,7 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid) * @ieq: ieq resource * @qp: all pending fpdu buffers */ -static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp) +void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp) { struct i40iw_puda_buf *buf; struct i40iw_pfpdu *pfpdu = >pfpdu; diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.h b/drivers/infiniband/hw/i40iw/i40iw_puda.h index 660aa3edae56..53a7d58c84b5 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_puda.h +++ b/drivers/infiniband/hw/i40iw/i40iw_puda.h @@ -184,4 +184,5 @@ enum i40iw_status_code i40iw_cqp_qp_create_cmd(struct i40iw_sc_dev *dev, struct enum i40iw_status_code i40iw_cqp_cq_create_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq); void i40iw_cqp_qp_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_qp *qp); void i40iw_cqp_cq_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq); +void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp); #endif diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 3c6f3ce88f89..6aa613835405 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -412,6 +412,7 @@ void i40iw_free_qp_resources(struct i40iw_device *iwdev, { struct i40iw_pbl *iwpbl = >iwpbl; + i40iw_ieq_cleanup_qp(iwdev->vsi.ieq, >sc_qp); i40iw_dealloc_push_page(iwdev, >sc_qp); if (qp_num) i40iw_free_resource(iwdev, iwdev->allocated_qps, qp_num); -- 2.15.1
Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
> On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote: > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx > > busy > > polling udp packets with small length(e.g. 1byte udp payload), because > > setting > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet > > length. > > > > Ping-Latencies shown below were tested between two Virtual Machines using > > netperf (UDP_STREAM, len=1), and then another machine pinged the client: > > > > Packet-Weight Ping-Latencies(millisecond) > >min avg max > > Origin 3.319 18.48957.303 > > 64 1.6432.021 2.552 > > 128 1.8252.600 3.224 > > 256 1.9972.710 4.295 > > 512 1.8603.171 4.631 > > 1024 2.0024.173 9.056 > > 2048 2.2575.650 9.688 > > 4096 2.0938.50815.943 > > And this is with Q size 256 right? Yes. Ping-latencies with 512 VQ size show below. Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.357 29.17766.245 64 2.7983.614 4.403 128 2.8613.820 4.775 256 3.0084.018 4.807 512 3.2544.523 5.824 1024 3.0795.335 7.747 2048 3.9448.201 12.762 4096 4.158 11.05719.985 We will submit again. Is there anything else? > > > Ring size is a hint from device about a burst size it can tolerate. Based on > > benchmarks, set the weight to 2 * vq size. > > > > To evaluate this change, another tests were done using netperf(RR, TX) > > between > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was > > tweaked through qemu. Results shown below does not show obvious changes. > > What I asked for is ping-latency with different VQ sizes, > streaming below does not show anything. > > > vq size=256 TCP_RRvq size=512 TCP_RR > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > >1/ 1/ -7%/-2% 1/ 1/ 0%/-2% > >1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% > >1/ 8/ +1%/-2% 1/ 8/ 0%/+1% > > 64/ 1/ -6%/ 0% 64/ 1/ +7%/+3% > > 64/ 4/ 0%/+2% 64/ 4/ -1%/+1% > > 64/ 8/ 0%/ 0% 64/ 8/ -1%/-2% > > 256/ 1/ -3%/-4%256/ 1/ -4%/-2% > > 256/ 4/ +3%/+4%256/ 4/ +1%/+2% > > 256/ 8/ +2%/ 0%256/ 8/ +1%/-1% > > > > vq size=256 UDP_RRvq size=512 UDP_RR > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > >1/ 1/ -5%/+1% 1/ 1/ -3%/-2% > >1/ 4/ +4%/+1% 1/ 4/ -2%/+2% > >1/ 8/ -1%/-1% 1/ 8/ -1%/ 0% > > 64/ 1/ -2%/-3% 64/ 1/ +1%/+1% > > 64/ 4/ -5%/-1% 64/ 4/ +2%/ 0% > > 64/ 8/ 0%/-1% 64/ 8/ -2%/+1% > > 256/ 1/ +7%/+1%256/ 1/ -7%/ 0% > > 256/ 4/ +1%/+1%256/ 4/ -3%/-4% > > 256/ 8/ +2%/+2%256/ 8/ +1%/+1% > > > > vq size=256 TCP_STREAMvq size=512 TCP_STREAM > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > 64/ 1/ 0%/-3% 64/ 1/ 0%/ 0% > > 64/ 4/ +3%/-1% 64/ 4/ -2%/+4% > > 64/ 8/ +9%/-4% 64/ 8/ -1%/+2% > > 256/ 1/ +1%/-4%256/ 1/ +1%/+1% > > 256/ 4/ -1%/-1%256/ 4/ -3%/ 0% > > 256/ 8/ +7%/+5%256/ 8/ -3%/ 0% > > 512/ 1/ +1%/ 0%512/ 1/ -1%/-1% > > 512/ 4/ +1%/-1%512/ 4/ 0%/ 0% > > 512/ 8/ +7%/-5%512/ 8/ +6%/-1% > > 1024/ 1/ 0%/-1% 1024/ 1/ 0%/+1% > > 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% > > 1024/ 8/ +8%/+5% 1024/ 8/ -1%/ 0% > > 2048/ 1/ +2%/+2% 2048/ 1/ -1%/ 0% > > 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/-1% > > 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/-1% > > 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% > > 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% > > 4096/ 8/ +9%/-2% 4096/ 8/ -5%/-1% > > > > Signed-off-by: Haibin Zhang> >
[PATCH AUTOSEL for 4.15 025/189] i40iw: Free IEQ resources
From: Mustafa Ismail [ Upstream commit f20d429511affab6a2a9129f46042f43e6ffe396 ] The iWARP Exception Queue (IEQ) resources are not freed when a QP is destroyed. Fix this by freeing IEQ resources when freeing QP resources. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail Signed-off-by: Shiraz Saleem Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin --- drivers/infiniband/hw/i40iw/i40iw_puda.c | 3 +-- drivers/infiniband/hw/i40iw/i40iw_puda.h | 1 + drivers/infiniband/hw/i40iw/i40iw_verbs.c | 1 + 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.c b/drivers/infiniband/hw/i40iw/i40iw_puda.c index 796a815b53fd..266c5952ba92 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_puda.c +++ b/drivers/infiniband/hw/i40iw/i40iw_puda.c @@ -48,7 +48,6 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid); static void i40iw_ilq_putback_rcvbuf(struct i40iw_sc_qp *qp, u32 wqe_idx); static enum i40iw_status_code i40iw_puda_replenish_rq(struct i40iw_puda_rsrc *rsrc, bool initial); -static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp); /** * i40iw_puda_get_listbuf - get buffer from puda list * @list: list to use for buffers (ILQ or IEQ) @@ -1483,7 +1482,7 @@ static void i40iw_ieq_tx_compl(struct i40iw_sc_vsi *vsi, void *sqwrid) * @ieq: ieq resource * @qp: all pending fpdu buffers */ -static void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp) +void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp) { struct i40iw_puda_buf *buf; struct i40iw_pfpdu *pfpdu = >pfpdu; diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.h b/drivers/infiniband/hw/i40iw/i40iw_puda.h index 660aa3edae56..53a7d58c84b5 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_puda.h +++ b/drivers/infiniband/hw/i40iw/i40iw_puda.h @@ -184,4 +184,5 @@ enum i40iw_status_code i40iw_cqp_qp_create_cmd(struct i40iw_sc_dev *dev, struct enum i40iw_status_code i40iw_cqp_cq_create_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq); void i40iw_cqp_qp_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_qp *qp); void i40iw_cqp_cq_destroy_cmd(struct i40iw_sc_dev *dev, struct i40iw_sc_cq *cq); +void i40iw_ieq_cleanup_qp(struct i40iw_puda_rsrc *ieq, struct i40iw_sc_qp *qp); #endif diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 3c6f3ce88f89..6aa613835405 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -412,6 +412,7 @@ void i40iw_free_qp_resources(struct i40iw_device *iwdev, { struct i40iw_pbl *iwpbl = >iwpbl; + i40iw_ieq_cleanup_qp(iwdev->vsi.ieq, >sc_qp); i40iw_dealloc_push_page(iwdev, >sc_qp); if (qp_num) i40iw_free_resource(iwdev, iwdev->allocated_qps, qp_num); -- 2.15.1
Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
> On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote: > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx > > busy > > polling udp packets with small length(e.g. 1byte udp payload), because > > setting > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet > > length. > > > > Ping-Latencies shown below were tested between two Virtual Machines using > > netperf (UDP_STREAM, len=1), and then another machine pinged the client: > > > > Packet-Weight Ping-Latencies(millisecond) > >min avg max > > Origin 3.319 18.48957.303 > > 64 1.6432.021 2.552 > > 128 1.8252.600 3.224 > > 256 1.9972.710 4.295 > > 512 1.8603.171 4.631 > > 1024 2.0024.173 9.056 > > 2048 2.2575.650 9.688 > > 4096 2.0938.50815.943 > > And this is with Q size 256 right? Yes. Ping-latencies with 512 VQ size show below. Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.357 29.17766.245 64 2.7983.614 4.403 128 2.8613.820 4.775 256 3.0084.018 4.807 512 3.2544.523 5.824 1024 3.0795.335 7.747 2048 3.9448.201 12.762 4096 4.158 11.05719.985 We will submit again. Is there anything else? > > > Ring size is a hint from device about a burst size it can tolerate. Based on > > benchmarks, set the weight to 2 * vq size. > > > > To evaluate this change, another tests were done using netperf(RR, TX) > > between > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was > > tweaked through qemu. Results shown below does not show obvious changes. > > What I asked for is ping-latency with different VQ sizes, > streaming below does not show anything. > > > vq size=256 TCP_RRvq size=512 TCP_RR > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > >1/ 1/ -7%/-2% 1/ 1/ 0%/-2% > >1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% > >1/ 8/ +1%/-2% 1/ 8/ 0%/+1% > > 64/ 1/ -6%/ 0% 64/ 1/ +7%/+3% > > 64/ 4/ 0%/+2% 64/ 4/ -1%/+1% > > 64/ 8/ 0%/ 0% 64/ 8/ -1%/-2% > > 256/ 1/ -3%/-4%256/ 1/ -4%/-2% > > 256/ 4/ +3%/+4%256/ 4/ +1%/+2% > > 256/ 8/ +2%/ 0%256/ 8/ +1%/-1% > > > > vq size=256 UDP_RRvq size=512 UDP_RR > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > >1/ 1/ -5%/+1% 1/ 1/ -3%/-2% > >1/ 4/ +4%/+1% 1/ 4/ -2%/+2% > >1/ 8/ -1%/-1% 1/ 8/ -1%/ 0% > > 64/ 1/ -2%/-3% 64/ 1/ +1%/+1% > > 64/ 4/ -5%/-1% 64/ 4/ +2%/ 0% > > 64/ 8/ 0%/-1% 64/ 8/ -2%/+1% > > 256/ 1/ +7%/+1%256/ 1/ -7%/ 0% > > 256/ 4/ +1%/+1%256/ 4/ -3%/-4% > > 256/ 8/ +2%/+2%256/ 8/ +1%/+1% > > > > vq size=256 TCP_STREAMvq size=512 TCP_STREAM > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% > > 64/ 1/ 0%/-3% 64/ 1/ 0%/ 0% > > 64/ 4/ +3%/-1% 64/ 4/ -2%/+4% > > 64/ 8/ +9%/-4% 64/ 8/ -1%/+2% > > 256/ 1/ +1%/-4%256/ 1/ +1%/+1% > > 256/ 4/ -1%/-1%256/ 4/ -3%/ 0% > > 256/ 8/ +7%/+5%256/ 8/ -3%/ 0% > > 512/ 1/ +1%/ 0%512/ 1/ -1%/-1% > > 512/ 4/ +1%/-1%512/ 4/ 0%/ 0% > > 512/ 8/ +7%/-5%512/ 8/ +6%/-1% > > 1024/ 1/ 0%/-1% 1024/ 1/ 0%/+1% > > 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% > > 1024/ 8/ +8%/+5% 1024/ 8/ -1%/ 0% > > 2048/ 1/ +2%/+2% 2048/ 1/ -1%/ 0% > > 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/-1% > > 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/-1% > > 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% > > 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% > > 4096/ 8/ +9%/-2% 4096/ 8/ -5%/-1% > > > > Signed-off-by: Haibin Zhang > > Signed-off-by: Yunfang