LTS: proc: fix lookup in /proc/net subdirectories after setns(2)
Hi Greg,

Can you cherry-pick these to 4.19.y & 5.4.y:

commit e06689bf57017ac022ccf0f2a5071f760821ce0f
Author: Alexey Dobriyan
Date:   Wed Dec 4 16:49:59 2019 -0800

    proc: change ->nlink under proc_subdir_lock

commit c6c75deda81344c3a95d1d1f606d5cee109e5d54
Author: Alexey Dobriyan
Date:   Tue Dec 15 20:42:39 2020 -0800

    proc: fix lookup in /proc/net subdirectories after setns(2)

-Tommi
/proc/net/sctp/snmp, setns, proc: revalidate misc dentries
Hello,

Bisected problems with setns() and /proc/net/sctp/snmp to this:

commit 1da4d377f943fe4194ffb9fb9c26cc58fad4dd24
Author: Alexey Dobriyan
Date:   Fri Apr 13 15:35:42 2018 -0700

    proc: revalidate misc dentries

Reproduces for example with Fedora 5.9.10-100.fc32.x86_64, so
1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") does not seem to
cover /proc/net/sctp/snmp.

Reproducer attached. It does open+read+close of /proc/net/sctp/snmp
before and after the setns() syscall. The second open+read+close of
/proc/net/sctp/snmp incorrectly produces results for the default
namespace, not the target namespace.

Example, create netns and do some sctp:

# ./iperf-netns
+ modprobe sctp
+ ip netns add test
+ ip netns exec test ip link set lo up
+ ip netns exec test iperf3 -s -1
---
Server listening on 5201
---
+ ip netns exec test iperf3 -c 127.0.0.1 --sctp --bitrate 50M --time 4
Connecting to host 127.0.0.1, port 5201
Accepted connection from 127.0.0.1, port 50696
[ 5] local 127.0.0.1 port 54735 connected to 127.0.0.1 port 5201
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 54735
[ ID] Interval Transfer Bitrate
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 5] 0.00-1.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 5] 1.00-2.00 sec 5.94 MBytes 49.8 Mbits/sec
[ 5] 1.00-2.00 sec 5.94 MBytes 49.8 Mbits/sec
[ 5] 2.00-3.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 5] 2.00-3.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 5] 3.00-4.00 sec 5.94 MBytes 49.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-4.00 sec 23.9 MBytes 50.1 Mbits/sec receiver
[ 5] 3.00-4.00 sec 5.94 MBytes 49.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-4.00 sec 23.9 MBytes 50.1 Mbits/sec
[ 5] 0.00-4.00 sec 23.9 MBytes 50.1
iperf Done.
+ cat /proc/net/sctp/snmp
SctpCurrEstab 0
SctpActiveEstabs 0
SctpPassiveEstabs 0
SctpAborteds 0
SctpShutdowns 0
SctpOutOfBlues 0
SctpChecksumErrors 0
[...]
+ ip netns exec test cat /proc/net/sctp/snmp
SctpCurrEstab 0
SctpActiveEstabs 2
SctpPassiveEstabs 2
SctpAborteds 0
SctpShutdowns 4
SctpOutOfBlues 0
SctpChecksumErrors 0
SctpOutCtrlChunks 1544
SctpOutOrderChunks 1530
[...]
+ wait

But now we see all zeroes in /proc/net/sctp/snmp with the reproducer:

$ gcc repro.c -o repro
# ./repro
/proc/net/sctp/snmp [pid: 175998]
SctpCurrEstab 0
SctpActiveEstabs 0
SctpPassiveEstabs 0
SctpAborteds 0
SctpShutdowns 0
[...]
setns(/run/netns/test) ...
/proc/net/sctp/snmp [pid: 175998]
SctpCurrEstab 0
SctpActiveEstabs 0
SctpPassiveEstabs 0
SctpAborteds 0
SctpShutdowns 0
SctpOutOfBlues 0
[...]

-Tommi

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open, read once, print and close the given file. */
void slurp(const char *fn)
{
	char buf[8192];
	ssize_t r;
	int fd;

	printf("%s [pid: %d]\n", fn, getpid());
	fflush(stdout);

	fd = open(fn, O_RDONLY);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	r = read(fd, buf, sizeof(buf)-1);
	if (r < 0) {
		perror("read");
		exit(1);
	}
	buf[r] = 0;

	puts(buf);
	fflush(stdout);

	if (close(fd) < 0) {
		perror("close");
		exit(1);
	}
}

/* Switch the calling process into the network namespace at path ns. */
void newnet(const char *ns)
{
	int fd;

	fd = open(ns, O_RDONLY);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	if (setns(fd, CLONE_NEWNET) < 0) {
		perror("setns");
		exit(1);
	}

	if (close(fd) < 0) {
		perror("close");
		exit(1);
	}
}

int main(int argc, char **argv)
{
	const char *ns = "/run/netns/test";
	const char *fn = "/proc/net/sctp/snmp";
	int d = 1;

	// Optional args: /run/netns/... /proc/net/... n
	if (argc >= 2)
		ns = argv[1];
	if (argc >= 3)
		fn = argv[2];
	if (argc >= 4 && argv[3][0] == 'n')
		d = 0;

	/* Read once before setns() unless the third argument starts with 'n'. */
	if (d)
		slurp(fn);

	printf("setns(%s) ...\n", ns);
	fflush(stdout);
	newnet(ns);

	slurp(fn);
}

[Attachment: iperf-netns]
Re: [PATCH] selftests: intel_pstate: ftime() is deprecated
On Tue, 2020-10-27 at 14:08 -0600, Shuah Khan wrote:
> >
> > @@ -73,8 +80,8 @@ int main(int argc, char **argv) {
> >  	aperf = new_aperf-old_aperf;
> >  	mperf = new_mperf-old_mperf;
> >
> > -	start = before.time*1000 + before.millitm;
> > -	finish = after.time*1000 + after.millitm;
> > +	start = before.tv_sec*1000 + before.tv_nsec/1000000L;
> > +	finish = after.tv_sec*1000 + after.tv_nsec/1000000L;
>
> Why not use the NSEC_PER_MSEC define from include/vdso/time64.h?

Hi,

If the define was available in the UAPI headers, then it would certainly
make sense to use it. But I would not mess with the kernel internal
headers here.

-Tommi
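
For reference, the millisecond conversion discussed above can be sketched in plain userspace C without touching kernel-internal headers. This is only an illustration: LOCAL_NSEC_PER_MSEC, timespec_to_ms() and elapsed_ms() are names made up for this sketch, and the use of clock_gettime(CLOCK_MONOTONIC) is an assumption, not necessarily what the selftest does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Local stand-in for NSEC_PER_MSEC; the name is made up for this sketch. */
#define LOCAL_NSEC_PER_MSEC 1000000L

/* Convert a timespec to whole milliseconds (sub-millisecond part dropped). */
long long timespec_to_ms(const struct timespec *ts)
{
	assert(ts != NULL);
	return (long long)ts->tv_sec * 1000 + ts->tv_nsec / LOCAL_NSEC_PER_MSEC;
}

/* Time an optional callback and return the elapsed milliseconds. */
long long elapsed_ms(void (*work)(void))
{
	struct timespec before, after;

	clock_gettime(CLOCK_MONOTONIC, &before);
	if (work)
		work();
	clock_gettime(CLOCK_MONOTONIC, &after);

	return timespec_to_ms(&after) - timespec_to_ms(&before);
}
```

Defining the constant locally keeps the test independent of header layout, at the cost of duplicating one number.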
LTS couple perf test and perf top fixes
Hi Greg, Sasha,

Can you pick this to 5.4:

commit dbd660e6b2884b864d2642d930a163d3bcebe4be
Author: Tommi Rantala
Date:   Thu Apr 23 14:53:40 2020 +0300

    perf test session topology: Fix data path

And this to 5.4 and older LTS trees too:

commit 29b4f5f188571c112713c35cc87eefb46efee612
Author: Tommi Rantala
Date:   Thu Mar 5 10:37:12 2020 +0200

    perf top: Fix stdio interface input handling with glibc 2.28+

Thanks!
-Tommi
Re: [PATCH 4.14 038/190] KVM: x86: only do L1TF workaround on affected processors
On Wed, 2020-06-24 at 10:15 -0400, Sasha Levin wrote:
> On Wed, Jun 24, 2020 at 12:00:59PM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> > On Fri, 2020-06-19 at 16:31 +0200, Greg Kroah-Hartman wrote:
> > > From: Paolo Bonzini
> > >
> > > [ Upstream commit d43e2675e96fc6ae1a633b6a69d296394448cc32 ]
> > >
> > > KVM stores the gfn in MMIO SPTEs as a caching optimization.
> >
> > Any ideas what's missing in 4.14 ?
>
> I think that this was because we're missing 6129ed877d40 ("KVM: x86/mmu:
> Set mmio_value to '0' if reserved #PF can't be generated"). I've queued
> it up (along with a few other related commits) and a new -rc cycle
> should be underway for those.

Sorry, I still see it with 4.14.186:

[2.355140] [ cut here ]
[2.355872] WARNING: CPU: 0 PID: 849 at arch/x86/kvm/mmu.c:284 kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.357723] Modules linked in: kvm_intel(+) kvm irqbypass bfq sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[2.359639] CPU: 0 PID: 849 Comm: systemd-udevd Not tainted 4.14.186 #2
[2.360309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[2.361177] task: 8a3d19429dc0 task.stack: b2558460c000
[2.361775] RIP: 0010:kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.362390] RSP: 0018:b2558460fc58 EFLAGS: 00010206
[2.362901] RAX: RBX: c0179000 RCX: ff45
[2.363617] RDX: 0028 RSI: 00080001 RDI: 00080001
[2.364329] RBP: c00c5951 R08: R09: 3fff
[2.365021] R10: b255841592b8 R11: fffe R12: 5bc0
[2.365717] R13: c017a780 R14: b2558460fea0 R15: 0001
[2.366437] FS: 7fc6fcab6c40() GS:8a3d1ea0() knlGS:
[2.367270] CS: 0010 DS: ES: CR0: 80050033
[2.367824] CR2: 564de775f840 CR3: 000818efc001 CR4: 001606f0
[2.368535] Call Trace:
[2.368809] kvm_mmu_module_init+0x15f/0x240 [kvm]
[2.369323] kvm_arch_init+0x5e/0x100 [kvm]
[2.369750] kvm_init+0x1c/0x2b0 [kvm]
[2.370155] ? free_pcppages_bulk+0x22d/0x4b0
[2.370591] ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.371113] vmx_init+0x21/0x6af [kvm_intel]
[2.371596] ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.372118] do_one_initcall+0x3e/0xf4
[2.372501] ? kmem_cache_alloc_trace+0xef/0x190
[2.372964] do_init_module+0x5c/0x1f0
[2.373383] load_module+0x1f31/0x2620
[2.373765] ? SYSC_finit_module+0x95/0xb0
[2.374205] SYSC_finit_module+0x95/0xb0
[2.374601] do_syscall_64+0x74/0x190
[2.374974] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[2.375500] RIP: 0033:0x7fc6fd3801bd
[2.375853] RSP: 002b:7ffd768187f8 EFLAGS: 0246 ORIG_RAX: 0139
[2.376593] RAX: ffda RBX: 564539d9ab50 RCX: 7fc6fd3801bd
[2.377305] RDX: RSI: 7fc6fcfc784d RDI: 000e
[2.377981] RBP: 0002 R08: R09: 0007
[2.378693] R10: 000e R11: 0246 R12: 7fc6fcfc784d
[2.379401] R13: R14: 564539d7a530 R15: 564539d9ab50
[2.380104] Code: 59 25 06 00 75 25 48 b8 00 00 00 00 00 00 00 40 48 09 c6 48 09 c7 48 89 35 68 25 06 00 48 89 3d 69 25 06 00 c3 0f 0b 0f 0b eb d2 <0f> 0b eb d7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44
[2.381905] ---[ end trace 5f757335c2eac657 ]---
Re: [PATCH 4.14 038/190] KVM: x86: only do L1TF workaround on affected processors
On Fri, 2020-06-19 at 16:31 +0200, Greg Kroah-Hartman wrote:
> From: Paolo Bonzini
>
> [ Upstream commit d43e2675e96fc6ae1a633b6a69d296394448cc32 ]
>
> KVM stores the gfn in MMIO SPTEs as a caching optimization. These are
> split in two parts, as in "[high 1 low]", to thwart any attempt to
> use these bits in an L1TF attack. This works as long as there are 5
> free bits between MAXPHYADDR and bit 50 (inclusive), leaving bit 51
> free so that the MMIO access triggers a reserved-bit-set page fault.

Hi,

I'm now seeing this warning in VM bootup with 4.14.y
Not seen with 4.19.129 and 5.4.47 that also included this commit.
Any ideas what's missing in 4.14 ?

[2.294049] [ cut here ]
[2.294621] WARNING: CPU: 43 PID: 856 at arch/x86/kvm/mmu.c:279 kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.295583] Modules linked in: kvm_intel(+) kvm irqbypass bfq sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[2.297269] CPU: 43 PID: 856 Comm: systemd-udevd Not tainted 4.14.185 #1
[2.297920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[2.298782] task: 9b2350b19dc0 task.stack: a86344604000
[2.299390] RIP: 0010:kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.299987] RSP: 0018:a86344607c78 EFLAGS: 00010206
[2.300522] RAX: RBX: c0457000 RCX:
[2.301239] RDX: 0001 RSI: 00080001 RDI: 00080001
[2.301935] RBP: c03bd951 R08: 9b235f4e33a0 R09: 9b2355f57258
[2.302646] R10: 0164 R11: R12:
[2.303356] R13: c0458780 R14: a86344607ea0 R15: 0001
[2.304069] FS: 7f3e95dedc40() GS:9b235f4c() knlGS:
[2.304852] CS: 0010 DS: ES: CR0: 80050033
[2.305425] CR2: 55bd35ff10d0 CR3: 00081026a004 CR4: 001606e0
[2.306137] Call Trace:
[2.306414] kvm_arch_init+0x90/0x130 [kvm]
[2.306852] kvm_init+0x1c/0x2b0 [kvm]
[2.307258] ? __slab_free+0x13a/0x2e0
[2.307649] ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.308178] vmx_init+0x21/0x6af [kvm_intel]
[2.308604] ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.309132] do_one_initcall+0x3e/0xf4
[2.309512] ? kmem_cache_alloc_trace+0xef/0x190
[2.309985] do_init_module+0x5c/0x1f0
[2.310386] load_module+0x1f31/0x2620
[2.310769] ? SYSC_finit_module+0x95/0xb0
[2.311202] SYSC_finit_module+0x95/0xb0
[2.311600] do_syscall_64+0x74/0x190
[2.311980] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[2.312496] RIP: 0033:0x7f3e966b71bd
[2.312860] RSP: 002b:7ffe0db584c8 EFLAGS: 0246 ORIG_RAX: 0139
[2.313606] RAX: ffda RBX: 55bd36027b10 RCX: 7f3e966b71bd
[2.314314] RDX: RSI: 7f3e962fe84d RDI: 000f
[2.315017] RBP: 0002 R08: R09: 0007
[2.315719] R10: 000f R11: 0246 R12: 7f3e962fe84d
[2.316420] R13: R14: 55bd3602f400 R15: 55bd36027b10
[2.317130] Code: 29 25 06 00 75 25 48 b8 00 00 00 00 00 00 00 40 48 09 c6 48 09 c7 48 89 35 38 25 06 00 48 89 3d 39 25 06 00 c3 0f 0b 0f 0b eb d2 <0f> 0b eb d7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44
[2.318933] ---[ end trace d933315308434918 ]---

$ head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

> The bit positions however were computed wrongly for AMD processors
> that have encryption support. In this case, x86_phys_bits is reduced
> (for example from 48 to 43, to account for the C bit at position 47
> and four bits used internally to store the SEV ASID and other stuff)
> while x86_cache_bits in would remain set to 48, and _all_ bits between
> the reduced MAXPHYADDR and bit 51 are set. Then low_phys_bits would
> also cover some of the bits that are set in the shadow_mmio_value,
> terribly confusing the gfn caching mechanism.
>
> To fix this, avoid splitting gfns as long as the processor does not
> have the L1TF bug (which includes all AMD processors). When there is
> no splitting, low_phys_bits can be set to the reduced MAXPHYADDR
> removing the overlap. This fixes "npt=0" operation on EPYC processors.
>
> Thanks to Maxim Levitsky for bisecting this bug.
>
> Cc: sta...@vger.kernel.org
> Fixes: 52918ed5fcf0 ("KVM: SVM: Override default MMIO mask if memory encryption is enabled")
> Signed-off-by: Paolo Bonzini
> Signed-off-by: Sasha Levin
> ---
>  arch/x86/kvm/mmu.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d8878266553c..7220ab210dcf 100644
>
rseq selftests param_test.c gettid build failure
Hi Mathieu,

I'm getting rseq selftest build failure with glibc 2.30, which added
gettid():

param_test.c:18:21: error: static declaration of 'gettid' follows non-static declaration
   18 | static inline pid_t gettid(void)
      | ^~
In file included from /usr/include/unistd.h:1170,
                 from param_test.c:11:
/usr/include/bits/unistd_ext.h:34:16: note: previous declaration of 'gettid' was here
   34 | extern __pid_t gettid (void) __THROW;
      | ^~

BR,
Tommi
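
One way to sidestep this kind of clash, sketched here only as an illustration, is to give the local wrapper a name that glibc does not declare and call the raw syscall. The name test_gettid() is made up for this sketch; it is not claimed to be the fix that went into the selftest.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: the distinct name avoids colliding with the
 * gettid() declaration that glibc 2.30+ exposes via <unistd.h>.
 */
static inline pid_t test_gettid(void)
{
	pid_t tid = (pid_t)syscall(SYS_gettid);

	assert(tid > 0);
	return tid;
}
```

Because the wrapper never uses the name gettid, it builds the same way with older and newer glibc versions. In the main thread of a process the returned thread ID equals getpid().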
nfs4 server stops responding
Hello,

I have two VMs, exporting some directories in one VM:

# cat /etc/exports
/mnt 192.168.1.0/24(ro,fsid=0,no_subtree_check,sync)
/mnt/export 192.168.1.0/24(rw,no_root_squash,sync,no_wdelay,no_subtree_check)
[...]

And NFS mounting in the second VM:

# grep nfs /proc/mounts
server:/export /mnt/export nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.11,local_lock=none,addr=192.168.1.10 0 0
[...]

If I keep some file descriptor open for several minutes in the second
VM, for example by running this:

# sleep 10m >/mnt/export/test

The result is that the NFS mount stops responding: the sleep process
never finishes but is "forever" stuck in (killable) D state, and any
I/O attempt from other processes in /mnt/export never finishes.

It's always reproducible with this sleep command.
To recover the mountpoint I need to reboot the second VM.

Kernel version is 5.3.0-rc4 in both VMs.
Also reproducible with 4.14.x and 4.19.x

# ps aux|grep sleep
root 2524 0.0 0.0 5900 688 pts/0 D 14:04 0:00 sleep 5m

# grep -C100 nfs /proc/*/stack
/proc/2524/stack:[<0>] nfs4_do_close+0x87d/0xb20 [nfsv4]
/proc/2524/stack:[<0>] __put_nfs_open_context+0x297/0x4f0 [nfs]
/proc/2524/stack:[<0>] nfs_file_release+0xbe/0xf0 [nfs]
/proc/2524/stack-[<0>] __fput+0x1df/0x690
/proc/2524/stack-[<0>] task_work_run+0x123/0x1b0
/proc/2524/stack-[<0>] exit_to_usermode_loop+0x121/0x140
/proc/2524/stack-[<0>] do_syscall_64+0x2d1/0x370
/proc/2524/stack-[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
--
/proc/561/stack-[<0>] __rpc_execute+0x692/0xb10 [sunrpc]
/proc/561/stack-[<0>] rpc_run_task+0x45f/0x5d0 [sunrpc]
/proc/561/stack:[<0>] nfs4_call_sync_sequence+0x12a/0x210 [nfsv4]
/proc/561/stack:[<0>] _nfs4_proc_getattr+0x19a/0x200 [nfsv4]
/proc/561/stack:[<0>] nfs4_proc_getattr+0xda/0x230 [nfsv4]
/proc/561/stack:[<0>] __nfs_revalidate_inode+0x2ed/0x7a0 [nfs]
/proc/561/stack:[<0>] nfs_do_access+0x605/0xd00 [nfs]
/proc/561/stack:[<0>] nfs_permission+0x500/0x5e0 [nfs]
/proc/561/stack-[<0>] inode_permission+0x2dd/0x3f0
/proc/561/stack-[<0>] link_path_walk.part.60+0x681/0xe40
/proc/561/stack-[<0>] path_lookupat.isra.63+0x1af/0x850
/proc/561/stack-[<0>] filename_lookup.part.79+0x165/0x360
/proc/561/stack-[<0>] vfs_statx+0xb9/0x140
/proc/561/stack-[<0>] __do_sys_newstat+0x77/0xd0
/proc/561/stack-[<0>] do_syscall_64+0x9a/0x370
/proc/561/stack-[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

In dmesg of the second VM, nfs complaints are sometimes seen:

[ 386.362897] nfs: server xyz not responding, still trying

Any ideas what's going wrong here...?

-Tommi
Re: [PATCH 4.14 43/43] tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
On Fri, 2019-08-02 at 09:28 +0200, gre...@linuxfoundation.org wrote:
> On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> > Hi,
> >
> > This tipc patch added in 4.14.132 is triggering a crash for me,
> > revert fixes it.
> >
> > Anyone have ideas if some other commits missing in 4.14.x to make
> > this work...?
>
> Do you also have a problem with 4.19.y? How about 5.2.y? If not, can
> you do 'git bisect' to find the patch that fixes the issue?
>
> thanks,
>
> greg k-h

Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the
crash in both:

commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52
Author: Xin Long
Date:   Mon Jun 17 21:34:13 2019 +0800

    ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL

For 5.2.y nothing is needed, these commits were in v5.2-rc6 already.

-Tommi
Re: [PATCH 4.14 43/43] tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
On Tue, 2019-07-02 at 10:02 +0200, Greg Kroah-Hartman wrote:
> From: Xin Long
>
> commit c3bcde026684c62d7a2b6f626dc7cf763833875c upstream.
>
> udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel
> device to count packets on dev->tstats, a percpu variable. However,
> TIPC is using udp tunnel with no tunnel device, and pass the lower
> dev, like veth device that only initializes dev->lstats(a percpu
> variable) when creating it.

Hi,

This tipc patch added in 4.14.132 is triggering a crash for me, revert
fixes it.

Anyone have ideas if some other commits missing in 4.14.x to make this
work...?

# modprobe tipc
# tipc node set addr 1.1.2
# tipc bearer enable media udp name UDP1 localip 192.168.1.15

[ 143.105529] Own node address <1.1.2>, network identity 4711
[ 172.087098] BUG: unable to handle kernel NULL pointer dereference at 04f0
[ 172.088375] IP: iptunnel_xmit+0x15e/0x1e0
[ 172.089072] PGD 800231306067 P4D 800231306067 PUD 2356e1067 PMD 0
[ 172.090094] Oops: [#1] SMP PTI
[ 172.090610] Modules linked in: tipc ip6_udp_tunnel udp_tunnel isofs kvm_intel kvm irqbypass sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[ 172.093293] CPU: 1 PID: 747 Comm: tipc Not tainted 4.14.134-1.x86_64 #1
[ 172.094448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 172.095703] task: 8b99f12c task.stack: 9ab481198000
[ 172.096731] RIP: 0010:iptunnel_xmit+0x15e/0x1e0
[ 172.097460] RSP: 0018:9ab48119ba00 EFLAGS: 00010202
[ 172.098214] RAX: RBX: bf4d8140 RCX: 008c
[ 172.099320] RDX: 0001 RSI: fe01 RDI: be944d62
[ 172.100392] RBP: 8b99f1e7ed00 R08: 8b99ffc64520 R09:
[ 172.101451] R10: 00023426d000 R11: 0002 R12:
[ 172.102607] R13: 0040 R14: R15: 8b99f426e0e8
[ 172.103728] FS: 7efc82b96800() GS:8b99ffc4() knlGS:
[ 172.104976] CS: 0010 DS: ES: CR0: 80050033
[ 172.105821] CR2: 04f0 CR3: 000234250001 CR4: 003606e0
[ 172.106981] DR0: DR1: DR2:
[ 172.108120] DR3: DR6: fffe0ff0 DR7: 0400
[ 172.109386] Call Trace:
[ 172.109808] tipc_udp_xmit.isra.18+0x1a7/0x1c0 [tipc]
[ 172.110687] ? __internal_add_timer+0x1a/0x50
[ 172.111369] ? __skb_clone+0x29/0x130
[ 172.111999] tipc_bearer_xmit_skb+0x4d/0x80 [tipc]
[ 172.112845] tipc_enable_bearer+0x2b9/0x3c0 [tipc]
[ 172.113637] ? __nla_put+0xc/0x20
[ 172.114213] tipc_nl_bearer_enable+0xca/0x100 [tipc]
[ 172.114952] genl_family_rcv_msg+0x190/0x390
[ 172.115748] genl_rcv_msg+0x47/0x90
[ 172.116287] ? __alloc_skb+0x72/0x1b0
[ 172.116898] ? genl_family_rcv_msg+0x390/0x390
[ 172.117669] netlink_rcv_skb+0x3d/0x100
[ 172.118361] genl_rcv+0x24/0x40
[ 172.119005] netlink_unicast+0x16d/0x230
[ 172.119777] netlink_sendmsg+0x1ae/0x3c0
[ 172.120525] SYSC_sendto+0xe6/0x140
[ 172.121248] ? SYSC_getsockname+0x81/0xa0
[ 172.121989] ? sock_alloc_file+0x97/0x120
[ 172.122645] ? sock_map_fd+0x3d/0x60
[ 172.123278] do_syscall_64+0x74/0x190
[ 172.123911] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 172.124716] RIP: 0033:0x7efc82d6ac6b
[ 172.125368] RSP: 002b:7fff40411ae8 EFLAGS: 0246 ORIG_RAX: 002c
[ 172.126486] RAX: ffda RBX: 01dfca20 RCX: 7efc82d6ac6b
[ 172.127632] RDX: 0054 RSI: 7fff40411b60 RDI: 0003
[ 172.128765] RBP: 7fff40411b50 R08: 7efc82e36000 R09: 000c
[ 172.129793] R10: R11: 0246 R12: 7fff40411b60
[ 172.130799] R13: 7fff40412d10 R14: 0040bb44 R15:
[ 172.131868] Code: 01 00 00 00 85 d2 0f 44 d0 e8 1f f3 fa ff 48 8b 74 24 08 4c 89 fa 48 89 df e8 9f 94 fb ff 83 e0 fd 75 35 8b 4c 24 1c 85 c9 7e 2b <49> 8b 84 24 f0 04 00 00 65 48 03 05 aa 29 68 41 48 83 40 10 01
[ 172.134773] RIP: iptunnel_xmit+0x15e/0x1e0 RSP: 9ab48119ba00
[ 172.135697] CR2: 04f0
[ 172.136305] ---[ end trace 27f7522ade26797f ]---

> Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() thinks the
> dev as a tunnel device, and uses dev->tstats instead of dev->lstats.
> tstats' each pointer points to a bigger struct than lstats, so when
> tstats->tx_bytes is increased, other percpu variable's members could
> be overwritten.
>
> syzbot has reported quite a few crashes due to fib_nh_common percpu
> member 'nhc_pcpu_rth_output' overwritten, call traces are like:
>
> BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190
> net/ipv4/route.c:1556
> rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556
> __mkroute_output net/ipv4/route.c:2332 [inline]
>
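
The size mismatch described in the quoted commit message can be illustrated with a small userspace sketch. The two structs below are simplified stand-ins invented for this example (the real kernel structures have additional members), so treat the layout as an assumption rather than the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the smaller per-cpu "lstats" counters. */
struct lstats_sketch {
	uint64_t packets;
	uint64_t bytes;
};

/* Simplified stand-in for the larger per-cpu "tstats" counters. */
struct tstats_sketch {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

static_assert(sizeof(struct tstats_sketch) > sizeof(struct lstats_sketch),
	      "the tunnel stats stand-in must be the bigger struct");

/*
 * Returns 1 when tx_bytes lies at or beyond the end of an lstats-sized
 * allocation, i.e. when updating it through the wrong type would write
 * outside the memory that was actually allocated.
 */
int tx_bytes_overruns_lstats(void)
{
	return offsetof(struct tstats_sketch, tx_bytes) >=
	       sizeof(struct lstats_sketch);
}
```

With these stand-ins the offset of tx_bytes is past the end of the smaller struct, which is the kind of out-of-bounds write the commit message attributes to treating an lstats allocation as tstats.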
[PATCH 4.14] perf machine: Guard against NULL in machine__exit()
From: Arnaldo Carvalho de Melo

commit 4a2233b194c77ae1ea8304cb7c00b551de4313f0 upstream.

A recent fix for 'perf trace' introduced a bug where
machine__exit(trace->host) could be called while trace->host was still
NULL, so make this more robust by guarding against NULL, just like
free() does.

The problem happens, for instance, when !root users try to run 'perf
trace':

  [acme@jouet linux]$ trace
  Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
  Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

  perf: Segmentation fault
  Obtained 7 stack frames.
  [0x4f1b2e]
  /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f]
  [0x4f3fec]
  [0x47468b]
  [0x42a2db]
  /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509]
  [0x42a6c9]
  Segmentation fault (core dumped)
  [acme@jouet linux]$

Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Andrei Vagin
Cc: David Ahern
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Vasily Averin
Cc: Wang Nan
Fixes: 33974a414ce2 ("perf trace: Call machine__exit() at exit")
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Tommi Rantala
---
 tools/perf/util/machine.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 968fd0454e6b..d246080cd85e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -156,6 +156,9 @@ void machine__delete_threads(struct machine *machine)
 
 void machine__exit(struct machine *machine)
 {
+	if (machine == NULL)
+		return;
+
 	machine__destroy_kernel_maps(machine);
 	map_groups__exit(&machine->kmaps);
 	dsos__exit(&machine->dsos);
-- 
2.20.1
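
The "guard against NULL, just like free() does" pattern from the patch above can be sketched outside of perf as follows. struct resource, resource__init() and resource__exit() are hypothetical names for this illustration only; they do not exist in the perf sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning one heap allocation. */
struct resource {
	int *items;
	size_t nr;
};

/* Allocate nr zeroed items; returns 0 on success, -1 on failure. */
int resource__init(struct resource *res, size_t nr)
{
	assert(res != NULL);
	res->items = calloc(nr, sizeof(*res->items));
	res->nr = res->items ? nr : 0;
	return res->items ? 0 : -1;
}

/*
 * Teardown that tolerates NULL, like free(): error paths can call it
 * unconditionally, even when the object was never set up.
 */
void resource__exit(struct resource *res)
{
	if (res == NULL)
		return;

	free(res->items);
	res->items = NULL;
	res->nr = 0;
}
```

Resetting the members after free() also makes a second resource__exit() call on the same object harmless, which keeps cleanup code on early-exit paths simple.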
perf top --stdio, glibc 2.28, stdio EOF sticky
Hello,

"perf top --stdio" (or perf kvm top --stdio) keyboard handling does not
work properly for me. Instead of accepting key presses, it just displays
the "Mapped keys:" help output always.

Seems to be related to this glibc 2.28 stdio change:
https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS

* All stdio functions now treat end-of-file as a sticky condition. If you
  read from a file until EOF, and then the file is enlarged by another
  process, you must call clearerr or another function with the same effect
  (e.g. fseek, rewind) before you can read the additional data. This
  corrects a longstanding C99 conformance bug. It is most likely to affect
  programs that use stdio to read interactive input from a terminal.
  (Bug #1190.)

Also "perf top
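
The sticky end-of-file behaviour quoted from the glibc NEWS can be demonstrated with a small standalone sketch. getc_retry() and demo_sticky_eof() are names made up for this illustration, and the approach (calling clearerr() before reading again) is simply the remedy the NEWS entry itself names, not necessarily how perf handles it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one character, clearing a sticky EOF first.
 * Note that clearerr() resets the error indicator as well.
 */
int getc_retry(FILE *fp)
{
	assert(fp != NULL);
	if (feof(fp))
		clearerr(fp);
	return fgetc(fp);
}

/*
 * Read a temp file to EOF, grow it through a second descriptor, then
 * read again. Returns 0 if the appended byte was seen, -1 otherwise.
 */
int demo_sticky_eof(void)
{
	char path[] = "/tmp/sticky-eof-XXXXXX";
	int fd = mkstemp(path);
	FILE *fp;
	int first, at_eof, after = EOF;

	if (fd < 0)
		return -1;
	if (write(fd, "a", 1) != 1 || (fp = fopen(path, "r")) == NULL) {
		close(fd);
		unlink(path);
		return -1;
	}

	first = fgetc(fp);	/* the byte written above */
	at_eof = fgetc(fp);	/* hits end-of-file */

	/* Enlarge the file behind the reader's back, then retry. */
	if (write(fd, "b", 1) == 1)
		after = getc_retry(fp);

	fclose(fp);
	close(fd);
	unlink(path);

	return (first == 'a' && at_eof == EOF && after == 'b') ? 0 : -1;
}
```

A program reading interactive input from a terminal through stdio is in the same position as the reader here: once EOF has been seen, later input is only picked up after the EOF indicator has been cleared.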
Re: [PATCH 4.19 144/187] selftests/bpf: skip verifier tests for unsupported program types
On Thu, 2019-04-04 at 10:48 +0200, Greg Kroah-Hartman wrote:
> 4.19-stable review patch. If anyone has any objections, please let
> me know.
>
> --
>
> [ Upstream commit 8184d44c9a577a2f1842ed6cc844bfd4a9981d8e ]
>
> Use recently introduced bpf_probe_prog_type() to skip tests in the
> test_verifier() if bpf_verify_program() fails. The skipped test is
> indicated in the output.

Hi, this patch added in 4.19.34 causes test_verifier build failure, as
bpf_probe_prog_type() is not available:

gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf -I../../../../include/generated -DHAVE_GENHDR -I../../../include test_verifier.c /root/linux-4.19.44/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /root/linux-4.19.44/tools/testing/selftests/bpf/test_verifier
test_verifier.c: In function ‘do_test_single’:
test_verifier.c:12775:22: warning: implicit declaration of function ‘bpf_probe_prog_type’; did you mean ‘bpf_program__set_type’? [-Wimplicit-function-declaration]
  if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
  ^~~
  bpf_program__set_type
/usr/bin/ld: /tmp/ccEtyLhk.o: in function `do_test_single':
test_verifier.c:(.text+0xa19): undefined reference to `bpf_probe_prog_type'
collect2: error: ld returned 1 exit status
make[1]: *** [../lib.mk:152: /root/linux-4.19.44/tools/testing/selftests/bpf/test_verifier] Error 1

-Tommi

> Example:
>
> ...
> 679/p bpf_get_stack return R0 within range SKIP (unsupported program type 5)
> 680/p ld_abs: invalid op 1 OK
> ...
> Summary: 863 PASSED, 165 SKIPPED, 3 FAILED
>
> Signed-off-by: Stanislav Fomichev
> Signed-off-by: Daniel Borkmann
> Signed-off-by: Sasha Levin
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 9db5a7378f40..294fc18aba2a 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -32,6 +32,7 @@
>  #include
>
>  #include
> +#include
>
>  #ifdef HAVE_GENHDR
>  # include "autoconf.h"
> @@ -56,6 +57,7 @@
>
>  #define UNPRIV_SYSCTL "kernel/unprivileged_bpf_disabled"
>  static bool unpriv_disabled = false;
> +static int skips;
>
>  struct bpf_test {
>  	const char *descr;
> @@ -12770,6 +12772,11 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>  	fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER,
>  				     prog, prog_len, test->flags & F_LOAD_WITH_STRICT_ALIGNMENT,
>  				     "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1);
> +	if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
> +		printf("SKIP (unsupported program type %d)\n", prog_type);
> +		skips++;
> +		goto close_fds;
> +	}
>
>  	expected_ret = unpriv && test->result_unpriv != UNDEF ?
>  		       test->result_unpriv : test->result;
> @@ -12905,7 +12912,7 @@ static void get_unpriv_disabled()
>
>  static int do_test(bool unpriv, unsigned int from, unsigned int to)
>  {
> -	int i, passes = 0, errors = 0, skips = 0;
> +	int i, passes = 0, errors = 0;
>
>  	for (i = from; i < to; i++) {
>  		struct bpf_test *test = [i];
Re: [PATCH 4.14 09/69] x86: vdso: Use $LD instead of $CC to link
On Fri, 2019-04-26 at 05:48 -0700, Nathan Chancellor wrote:
> On Fri, Apr 26, 2019 at 11:41:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> > On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
> > > commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
> > >
> >
> > Hi,
> >
> > With this patch in 4.14.112 build-id is now missing in vdso32.so:
> >
> > $ file arch/x86/entry/vdso/vdso*so*
> > arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped
> > arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, with debug_info, not stripped
> > arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped
> > arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with debug_info, not stripped
> >
> >
> > Based on quick check, "$(call ld-option, --build-id)" fails due to
> > some 32/64 bit mismatch, so the --build-id linker flag is not used
> > when linking vdso32.so
> >
> > Perhaps scripts/Kbuild.include is missing some change in 4.14.y to
> > make this work properly.
> >
>
> Hi Tommi,
>
> This appears to be fixed by commit 0294e6f4a000 ("kbuild: simplify
> ld-option implementation") upstream. Could you test the attached
> backport and make sure everything works on your end? Assuming that it
> does, I will test the other stable releases and see if this is needed
> and send those backports along.

Yes this patch fixes it. Many thanks!

-Tommi

> Thanks and sorry for the trouble!
> Nathan
>
> > -Tommi
> >
> > > The vdso{32,64}.so can fail to link with CC=clang when clang
> > > tries to find a suitable GCC toolchain to link these libraries
> > > with.
> > >
> > > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> > > access beyond end of merged section (782)
> > >
> > > This happens because the host environment leaked into the cross
> > > compiler environment due to the way clang searches for suitable
> > > GCC toolchains.
> > >
> > > Clang is a retargetable compiler, and each invocation of it must
> > > provide --target= --gcc-toolchain= to allow it to find the
> > > correct binutils for cross compilation. These flags had been
> > > added to KBUILD_CFLAGS, but the vdso code uses CC and not
> > > KBUILD_CFLAGS (for various reasons) which breaks clang's ability
> > > to find the correct linker when cross compiling.
> > >
> > > Most of the time this goes unnoticed because the host linker is
> > > new enough to work anyway, or is incompatible and skipped, but
> > > this cannot be reliably assumed.
> > >
> > > This change alters the vdso makefile to just use LD directly,
> > > which bypasses clang and thus the searching problem. The makefile
> > > will just use ${CROSS_COMPILE}ld instead, which is always what we
> > > want. This matches the method used to link vmlinux.
> > >
> > > This drops references to DISABLE_LTO; this option doesn't seem to
> > > be set anywhere, and not knowing what its possible values are,
> > > it's not clear how to convert it from CC to LD flag.
> > >
> > > Signed-off-by: Alistair Strachan
> > > Signed-off-by: Thomas Gleixner
> > > Acked-by: Andy Lutomirski
> > > Cc: "H. Peter Anvin"
> > > Cc: Greg Kroah-Hartman
> > > Cc: kernel-t...@android.com
> > > Cc: j...@joelfernandes.org
> > > Cc: Andi Kleen
> > > Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com
> > > Signed-off-by: Nathan Chancellor
> > > Signed-off-by: Sasha Levin
> > > ---
> > >  arch/x86/entry/vdso/Makefile | 22 +-
> > >  1 file changed, 9 insertions(+), 13 deletions(-)
> > >
> > > d
Re: [PATCH 4.14 09/69] x86: vdso: Use $LD instead of $CC to link
On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
> commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
>

Hi,

With this patch in 4.14.112 build-id is now missing in vdso32.so:

$ file arch/x86/entry/vdso/vdso*so*
arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped
arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, with debug_info, not stripped
arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped
arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with debug_info, not stripped

Based on quick check, "$(call ld-option, --build-id)" fails due to some
32/64 bit mismatch, so the --build-id linker flag is not used when
linking vdso32.so

Perhaps scripts/Kbuild.include is missing some change in 4.14.y to make
this work properly.

-Tommi

> The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> find a suitable GCC toolchain to link these libraries with.
>
> /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> access beyond end of merged section (782)
>
> This happens because the host environment leaked into the cross
> compiler environment due to the way clang searches for suitable GCC
> toolchains.
>
> Clang is a retargetable compiler, and each invocation of it must
> provide --target= --gcc-toolchain= to allow it to find the correct
> binutils for cross compilation. These flags had been added to
> KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for
> various reasons) which breaks clang's ability to find the correct
> linker when cross compiling.
>
> Most of the time this goes unnoticed because the host linker is new
> enough to work anyway, or is incompatible and skipped, but this
> cannot be reliably assumed.
>
> This change alters the vdso makefile to just use LD directly, which
> bypasses clang and thus the searching problem. The makefile will just
> use ${CROSS_COMPILE}ld instead, which is always what we want. This
> matches the method used to link vmlinux.
>
> This drops references to DISABLE_LTO; this option doesn't seem to be
> set anywhere, and not knowing what its possible values are, it's not
> clear how to convert it from CC to LD flag.
>
> Signed-off-by: Alistair Strachan
> Signed-off-by: Thomas Gleixner
> Acked-by: Andy Lutomirski
> Cc: "H. Peter Anvin"
> Cc: Greg Kroah-Hartman
> Cc: kernel-t...@android.com
> Cc: j...@joelfernandes.org
> Cc: Andi Kleen
> Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com
> Signed-off-by: Nathan Chancellor
> Signed-off-by: Sasha Levin
> ---
>  arch/x86/entry/vdso/Makefile | 22 +-
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 0a550dc5c525..0defcc939ab4 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)
>
>  export CPPFLAGS_vdso.lds += -P -C
>
> -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> -			-Wl,--no-undefined \
> -			-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
> -			$(DISABLE_LTO)
> +VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
> +			-z max-page-size=4096 -z common-page-size=4096
>
>  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>  	$(call if_changed,vdso)
> @@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg
>  #
>
>  CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
> -VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
> -			   -Wl,-soname=linux-vdso.so.1 \
> -			   -Wl,-z,max-page-size=4096 \
> -			   -Wl,-z,common-page-size=4096
> +VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
> +			   -z max-page-size=4096 -z common-page-size=4096
>
>  # 64-bit objects to re-brand as x32
>  vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
> @@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
>  	$(call if_changed,vdso)
>
>  CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
> -VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
> +VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
>
>  # This makes sure the $(obj) subdirectory exists even though vdso32/
>  # is not a kbuild sub-make subdirectory.
> @@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \
>  # The DSO images are built using a special linker script.
>  #
>  quiet_cmd_vdso = VDSO $@
> -      cmd_vdso = $(CC)
/proc/sys/kernel/sched_domain/, isolcpus, CONFIG_CPUMASK_OFFSTACK
Hello, /proc/sys/kernel/sched_domain/ seems to be somewhat broken when kernel is configured without CONFIG_CPUMASK_OFFSTACK and booting with isolcpus= option. Example with 8x CPU. With CONFIG_CPUMASK_OFFSTACK=y and "isolcpus=2": # uname -r 5.0.0-0.rc3.git0.1.fc30.x86_64 # ls /proc/sys/kernel/sched_domain/* /proc/sys/kernel/sched_domain/cpu0: domain0 /proc/sys/kernel/sched_domain/cpu1: domain0 /proc/sys/kernel/sched_domain/cpu2: /proc/sys/kernel/sched_domain/cpu3: domain0 /proc/sys/kernel/sched_domain/cpu4: domain0 /proc/sys/kernel/sched_domain/cpu5: domain0 /proc/sys/kernel/sched_domain/cpu6: domain0 /proc/sys/kernel/sched_domain/cpu7: domain0 Another kernel without CONFIG_CPUMASK_OFFSTACK and "isolcpus=2", so directories missing for CPUs 2-7: # ls /proc/sys/kernel/sched_domain/ cpu0 cpu1 # ls /proc/sys/kernel/sched_domain/* /proc/sys/kernel/sched_domain/cpu0: domain0 /proc/sys/kernel/sched_domain/cpu1: domain0 -Tommi
4.14 perf test patches
Hi, Can you pick these patches to 4.14.y? These fix some "perf test" errors seen when running in VM. commit 10836d9f9ac63d40ccfa756f871ce4ed51ae3b52 Author: Jiri Olsa Date: Mon Jul 3 16:50:30 2017 +0200 perf tests attr: Fix task term values commit f6a9820d572bd8384d982357cbad214b3a6c04bb Author: Jiri Olsa Date: Thu Sep 28 18:06:33 2017 +0200 perf tests attr: Fix group stat tests commit 692f5a22cd284bb8233a38e3ed86881d2d9c89d4 Author: Jiri Olsa Date: Mon Oct 9 15:07:12 2017 +0200 perf tests attr: Make hw events optional -Tommi
[PATCH 4.14 6/8] uio: fix wrong return value from uio_mmap()
From: Hailong Liu commit e7de2590f18a272e63732b9d519250d1b522b2c4 upstream. uio_mmap has multiple fail paths to set return value to nonzero then goto out. However, it always returns *0* from the *out* at end, and this will mislead callers who check the return value of this function. Fixes: 57c5f4df0a5a0ee ("uio: fix crash after the device is unregistered") CC: Xiubo Li Signed-off-by: Hailong Liu Cc: stable Signed-off-by: Jiang Biao Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 262610192755..fed2d8fa4d4d 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -816,7 +816,7 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma) out: mutex_unlock(&idev->info_lock); - return 0; + return ret; } static const struct file_operations uio_fops = { -- 2.20.1
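The effect of the one-line change can be sketched in userspace C. This is a simplified stand-in, not the driver code: `demo_mmap_buggy`, `demo_mmap_fixed`, and the `have_info` flag are hypothetical names, used only to show how a single-exit function that ends in `return 0;` hides every error code set before `goto out`.

```c
#include <errno.h>

/* Hypothetical stand-in for the pre-fix control flow: the failure
 * path sets ret and jumps to out, but out returns a constant. */
int demo_mmap_buggy(int have_info)
{
	int ret = 0;

	if (!have_info) {
		ret = -EINVAL;
		goto out;
	}
	/* ... the real mapping work would happen here ... */
out:
	(void)ret;	/* the error code is computed, then dropped */
	return 0;
}

/* Same flow with the fix applied: out hands ret back to the caller. */
int demo_mmap_fixed(int have_info)
{
	int ret = 0;

	if (!have_info) {
		ret = -EINVAL;
		goto out;
	}
	/* ... the real mapping work would happen here ... */
out:
	return ret;
}
```

With the buggy variant a caller sees success even when the failure path was taken; with the fixed variant the same call returns `-EINVAL`.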
[PATCH 4.14 8/8] Revert "uio: use request_threaded_irq instead"
From: Xiubo Li commit 3d27c4de8d4fb2d4099ff324671792aa2578c6f9 upstream. Since mutex lock in irq handler is useless currently, here will remove it together with it. This reverts commit 9421e45f5ff3d558cf8b75a8cc0824530caf3453. Reported-by: james.r.har...@intel.com CC: Ahsan Atta Signed-off-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 4e0cb7cdf739..fb5c9701b1fb 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -445,13 +445,10 @@ static irqreturn_t uio_interrupt(int irq, void *dev_id) struct uio_device *idev = (struct uio_device *)dev_id; irqreturn_t ret; - mutex_lock(&idev->info_lock); - ret = idev->info->handler(irq, idev->info); if (ret == IRQ_HANDLED) uio_event_notify(idev->info); - mutex_unlock(&idev->info_lock); return ret; } @@ -974,9 +971,8 @@ int __uio_register_device(struct module *owner, * FDs at the time of unregister and therefore may not be * freed until they are released. */ - ret = request_threaded_irq(info->irq, NULL, uio_interrupt, - info->irq_flags, info->name, idev); - + ret = request_irq(info->irq, uio_interrupt, + info->irq_flags, info->name, idev); if (ret) { info->uio_dev = NULL; goto err_request_irq; -- 2.20.1
[PATCH 4.14 7/8] uio: fix possible circular locking dependency
From: Xiubo Li commit b34e9a15b37b8ddbf06a4da142b0c39c74211eb4 upstream. The call trace: XXX/1910 is trying to acquire lock: (&mm->mmap_sem){++}, at: [] might_fault+0x57/0xb0 but task is already holding lock: (&idev->info_lock){+.+...}, at: [] uio_write+0x46/0x130 [uio] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&idev->info_lock){+.+...}: [] lock_acquire+0x99/0x1e0 [] mutex_lock_nested+0x93/0x410 [] uio_mmap+0x2d/0x170 [uio] [] mmap_region+0x428/0x650 [] do_mmap+0x3b8/0x4e0 [] vm_mmap_pgoff+0xd3/0x120 [] SyS_mmap_pgoff+0x1f1/0x270 [] SyS_mmap+0x22/0x30 [] system_call_fastpath+0x1c/0x21 -> #0 (&mm->mmap_sem){++}: [] __lock_acquire+0xdac/0x15f0 [] lock_acquire+0x99/0x1e0 [] might_fault+0x84/0xb0 [] uio_write+0xb4/0x130 [uio] [] vfs_write+0xc3/0x1f0 [] SyS_write+0x8a/0x100 [] system_call_fastpath+0x1c/0x21 other info that might help us debug this: Possible unsafe locking scenario: CPU0: lock(&idev->info_lock); CPU1: lock(&mm->mmap_sem); CPU1: lock(&idev->info_lock); CPU0: lock(&mm->mmap_sem); *** DEADLOCK *** 1 lock held by XXX/1910: #0: (&idev->info_lock){+.+...}, at: [] uio_write+0x46/0x130 [uio] stack backtrace: CPU: 0 PID: 1910 Comm: XXX Kdump: loaded Not tainted #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 Call Trace: [] dump_stack+0x19/0x1b [] print_circular_bug+0x1f9/0x207 [] check_prevs_add+0x957/0x960 [] __lock_acquire+0xdac/0x15f0 [] ? mark_held_locks+0xb9/0x140 [] lock_acquire+0x99/0x1e0 [] ? might_fault+0x57/0xb0 [] might_fault+0x84/0xb0 [] ? might_fault+0x57/0xb0 [] uio_write+0xb4/0x130 [uio] [] vfs_write+0xc3/0x1f0 [] ? 
fget_light+0xfc/0x510 [] SyS_write+0x8a/0x100 [] system_call_fastpath+0x1c/0x21 Signed-off-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index fed2d8fa4d4d..4e0cb7cdf739 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -627,6 +627,12 @@ static ssize_t uio_write(struct file *filep, const char __user *buf, ssize_t retval; s32 irq_on; + if (count != sizeof(s32)) + return -EINVAL; + + if (copy_from_user(&irq_on, buf, count)) + return -EFAULT; + mutex_lock(&idev->info_lock); if (!idev->info) { retval = -EINVAL; @@ -638,21 +644,11 @@ static ssize_t uio_write(struct file *filep, const char __user *buf, goto out; } - if (count != sizeof(s32)) { - retval = -EINVAL; - goto out; - } - if (!idev->info->irqcontrol) { retval = -ENOSYS; goto out; } - if (copy_from_user(&irq_on, buf, count)) { - retval = -EFAULT; - goto out; - } - retval = idev->info->irqcontrol(idev->info, irq_on); out: -- 2.20.1
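The reordering in this patch (validate the length and copy the user data first, take the lock only afterwards) can be sketched in userspace C. This is a rough analogue under stated assumptions, not the driver code: `struct demo_dev`, `demo_copy_in()` and `demo_write()` are hypothetical names, a pthread mutex stands in for `info_lock`, and `demo_copy_in()` stands in for `copy_from_user()`, the call that may fault and therefore may need `mmap_sem`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; 'registered' stands in for idev->info != NULL. */
struct demo_dev {
	pthread_mutex_t info_lock;
	int registered;
	int32_t last_irq_on;
};

/* Stand-in for copy_from_user(): returns nonzero when the copy "faults". */
static int demo_copy_in(int32_t *dst, const void *src, size_t n)
{
	if (!src)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

long demo_write(struct demo_dev *d, const void *buf, size_t count)
{
	int32_t irq_on;
	long retval = 0;

	/* Everything that can fail without touching the device, and in
	 * particular the copy that may fault, runs before the lock. */
	if (count != sizeof(irq_on))
		return -EINVAL;
	if (demo_copy_in(&irq_on, buf, count))
		return -EFAULT;

	pthread_mutex_lock(&d->info_lock);
	if (!d->registered) {
		retval = -EINVAL;
		goto out;
	}
	d->last_irq_on = irq_on;	/* stands in for the irqcontrol() call */
out:
	pthread_mutex_unlock(&d->info_lock);
	return retval ? retval : (long)sizeof(irq_on);
}
```

Because the copy no longer happens with `info_lock` held, the write path never waits for the second lock while holding the first, which is the dependency the lockdep report complains about.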
[PATCH 4.14 4/8] uio: change to use the mutex lock instead of the spin lock
From: Xiubo Li commit 543af5861f41af0a5d2432f6fb5976af50f9cee5 upstream. We are hitting a regression with the following commit: commit a93e7b331568227500186a465fee3c2cb5dffd1f Author: Hamish Martin Date: Mon May 14 13:32:23 2018 +1200 uio: Prevent device destruction while fds are open The problem is the addition of spin_lock_irqsave in uio_write. This leads to hitting uio_write -> copy_from_user -> _copy_from_user -> might_fault and the logs filling up with sleeping warnings. I also noticed some uio drivers allocate memory, sleep, grab mutexes from callouts like open() and release and uio is now doing spin_lock_irqsave while calling them. Reported-by: Mike Christie CC: Hamish Martin Reviewed-by: Hamish Martin Signed-off-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 32 +--- include/linux/uio_driver.h | 2 +- 2 files changed, 14 insertions(+), 20 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index c97945a3f572..4441235a56cc 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -435,7 +435,6 @@ static int uio_open(struct inode *inode, struct file *filep) struct uio_device *idev; struct uio_listener *listener; int ret = 0; - unsigned long flags; mutex_lock(&minor_lock); idev = idr_find(&uio_idr, iminor(inode)); @@ -462,10 +461,10 @@ static int uio_open(struct inode *inode, struct file *filep) listener->event_count = atomic_read(&idev->event); filep->private_data = listener; - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); if (idev->info && idev->info->open) ret = idev->info->open(idev->info, inode); - spin_unlock_irqrestore(&idev->info_lock, flags); + mutex_unlock(&idev->info_lock); if (ret) goto err_infoopen; @@ -497,12 +496,11 @@ static int uio_release(struct inode *inode, struct file *filep) int ret = 0; struct uio_listener *listener = filep->private_data; struct uio_device *idev = listener->dev; - unsigned long flags; - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); if (idev->info && 
idev->info->release) ret = idev->info->release(idev->info, inode); - spin_unlock_irqrestore(&idev->info_lock, flags); + mutex_unlock(&idev->info_lock); module_put(idev->owner); kfree(listener); @@ -515,12 +513,11 @@ static unsigned int uio_poll(struct file *filep, poll_table *wait) struct uio_listener *listener = filep->private_data; struct uio_device *idev = listener->dev; unsigned int ret = 0; - unsigned long flags; - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); if (!idev->info || !idev->info->irq) ret = -EIO; - spin_unlock_irqrestore(&idev->info_lock, flags); + mutex_unlock(&idev->info_lock); if (ret) return ret; @@ -539,12 +536,11 @@ static ssize_t uio_read(struct file *filep, char __user *buf, DECLARE_WAITQUEUE(wait, current); ssize_t retval = 0; s32 event_count; - unsigned long flags; - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); if (!idev->info || !idev->info->irq) retval = -EIO; - spin_unlock_irqrestore(&idev->info_lock, flags); + mutex_unlock(&idev->info_lock); if (retval) return retval; @@ -594,9 +590,8 @@ static ssize_t uio_write(struct file *filep, const char __user *buf, struct uio_device *idev = listener->dev; ssize_t retval; s32 irq_on; - unsigned long flags; - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); if (!idev->info || !idev->info->irq) { retval = -EIO; goto out; @@ -620,7 +615,7 @@ static ssize_t uio_write(struct file *filep, const char __user *buf, retval = idev->info->irqcontrol(idev->info, irq_on); out: - spin_unlock_irqrestore(&idev->info_lock, flags); + mutex_unlock(&idev->info_lock); return retval ? 
retval : sizeof(s32); } @@ -874,7 +869,7 @@ int __uio_register_device(struct module *owner, idev->owner = owner; idev->info = info; - spin_lock_init(&idev->info_lock); + mutex_init(&idev->info_lock); init_waitqueue_head(&idev->wait); atomic_set(&idev->event, 0); @@ -940,7 +935,6 @@ EXPORT_SYMBOL_GPL(__uio_register_device); void uio_unregister_device(struct uio_info *info) { struct uio_device *idev; - unsigned long flags; if (!info || !info->uio_dev) return; @@ -954,9 +948,9 @@ void uio_unregister_device(struct uio_info *info) if (info->irq && info->irq != UIO_IRQ_CUSTOM) free_irq(info->irq, idev); - spin_lock_irqsave(&idev->info_lock, flags); + mutex_lock(&idev->info_lock); idev->info =
[PATCH 4.14 5/8] uio: fix crash after the device is unregistered
From: Xiubo Li commit 57c5f4df0a5a0ee83df71251e2ee93a5e4e9 upstream. For the target_core_user use case, after the device is unregistered it maybe still opened in user space, then the kernel will crash, like: [ 251.163692] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 251.163820] IP: [] show_name+0x23/0x40 [uio] [ 251.163965] PGD 800062694067 PUD 62696067 PMD 0 [ 251.164097] Oops: [#1] SMP ... [ 251.165605] e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [ 251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted 3.10.0-916.el7.test.x86_64 #1 [ 251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: 971e9e384000 [ 251.167137] RIP: 0010:[] [] show_name+0x23/0x40 [uio] [ 251.167563] RSP: 0018:971e9e387dc8 EFLAGS: 00010282 [ 251.167978] RAX: RBX: 971e9e3f8000 RCX: 971eb8368d98 [ 251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: 971e9e3f8000 [ 251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: [ 251.169296] R10: 1000 R11: a09d444d R12: a1076e80 [ 251.169750] R13: 971e9e387f18 R14: 0001 R15: 971e9cfb1c80 [ 251.170213] FS: 7ff37d175880() GS:971ebb60() knlGS: [ 251.170693] CS: 0010 DS: ES: CR0: 80050033 [ 251.171248] CR2: 0008 CR3: 001f6000 CR4: 003607f0 [ 251.172071] DR0: DR1: DR2: [ 251.172640] DR3: DR6: fffe0ff0 DR7: 0400 [ 251.173236] Call Trace: [ 251.173789] [] dev_attr_show+0x23/0x60 [ 251.174356] [] ? 
mutex_lock+0x12/0x2f [ 251.174892] [] sysfs_kf_seq_show+0xcf/0x1f0 [ 251.175433] [] kernfs_seq_show+0x26/0x30 [ 251.175981] [] seq_read+0x110/0x3f0 [ 251.176609] [] kernfs_fop_read+0xf5/0x160 [ 251.177158] [] vfs_read+0x9f/0x170 [ 251.177707] [] SyS_read+0x7f/0xf0 [ 251.178268] [] system_call_fastpath+0x1c/0x21 [ 251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> 8b 50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e [ 251.180115] RIP [] show_name+0x23/0x40 [uio] [ 251.180820] RSP [ 251.181473] CR2: 0008 CC: Hamish Martin CC: Mike Christie Reviewed-by: Hamish Martin Signed-off-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 104 +++--- 1 file changed, 88 insertions(+), 16 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 4441235a56cc..262610192755 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev, struct device_attribute *attr, char *buf) { struct uio_device *idev = dev_get_drvdata(dev); - return sprintf(buf, "%s\n", idev->info->name); + int ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = -EINVAL; + dev_err(dev, "the device has been unregistered\n"); + goto out; + } + + ret = sprintf(buf, "%s\n", idev->info->name); + +out: + mutex_unlock(&idev->info_lock); + return ret; } static DEVICE_ATTR_RO(name); @@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev, struct device_attribute *attr, char *buf) { struct uio_device *idev = dev_get_drvdata(dev); - return sprintf(buf, "%s\n", idev->info->version); + int ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = -EINVAL; + dev_err(dev, "the device has been unregistered\n"); + goto out; + } + + ret = sprintf(buf, "%s\n", idev->info->version); + +out: + mutex_unlock(&idev->info_lock); + return ret; } static DEVICE_ATTR_RO(version); @@ -417,11 +443,15 @@ 
EXPORT_SYMBOL_GPL(uio_event_notify); static irqreturn_t uio_interrupt(int irq, void *dev_id) { struct uio_device *idev = (struct uio_device *)dev_id; - irqreturn_t ret = idev->info->handler(irq, idev->info); + irqreturn_t ret; + + mutex_lock(&idev->info_lock); + ret = idev->info->handler(irq, idev->info); if (ret == IRQ_HANDLED) uio_event_notify(idev->info); + mutex_unlock(&idev->info_lock); return ret; } @@ -462,6 +492,12 @@ static int uio_open(struct inode *inode, struct file *filep) filep->private_data = listener; mutex_lock(&idev->info_lock); + if (!idev->info) { + mutex_unlock(&idev->info_lock); + ret = -EINVAL; + goto
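The pattern added to name_show() and version_show() above (take the lock, check whether `info` is still set, only then dereference it) can be sketched in userspace C. This is a simplified analogue under stated assumptions: `struct demo_uio`, `demo_name_show()` and `demo_unregister()` are hypothetical names, a pthread mutex stands in for `info_lock`, and the unregister path is assumed to clear the pointer under the same lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

struct demo_info {
	const char *name;
};

/* Hypothetical device: 'info' becomes NULL once the driver unregisters. */
struct demo_uio {
	pthread_mutex_t info_lock;
	struct demo_info *info;
};

/* Reader side: never dereference info without holding the lock and
 * checking that it is still there. */
int demo_name_show(struct demo_uio *d, char *buf, size_t len)
{
	int ret;

	pthread_mutex_lock(&d->info_lock);
	if (!d->info) {
		ret = -EINVAL;	/* device already unregistered */
		goto out;
	}
	ret = snprintf(buf, len, "%s\n", d->info->name);
out:
	pthread_mutex_unlock(&d->info_lock);
	return ret;
}

/* Unregister side (assumed): clear the pointer under the same lock so
 * that readers see either a valid info or NULL, never a stale pointer. */
void demo_unregister(struct demo_uio *d)
{
	pthread_mutex_lock(&d->info_lock);
	d->info = NULL;
	pthread_mutex_unlock(&d->info_lock);
}
```

A reader that still has the sysfs file open after unregistration then gets `-EINVAL` instead of dereferencing a NULL pointer, which is the crash shown in the oops above.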
[PATCH 4.14 1/8] uio: Reduce return paths from uio_write()
From: Hamish Martin commit 81daa406c2cc97d85eef9409400404efc2a3f756 upstream. Drive all return paths for uio_write() through a single block at the end of the function. Signed-off-by: Hamish Martin Reviewed-by: Chris Packham Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 654579bc1e54..10f249628e79 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -570,20 +570,29 @@ static ssize_t uio_write(struct file *filep, const char __user *buf, ssize_t retval; s32 irq_on; - if (!idev->info->irq) - return -EIO; + if (!idev->info->irq) { + retval = -EIO; + goto out; + } - if (count != sizeof(s32)) - return -EINVAL; + if (count != sizeof(s32)) { + retval = -EINVAL; + goto out; + } - if (!idev->info->irqcontrol) - return -ENOSYS; + if (!idev->info->irqcontrol) { + retval = -ENOSYS; + goto out; + } - if (copy_from_user(&irq_on, buf, count)) - return -EFAULT; + if (copy_from_user(&irq_on, buf, count)) { + retval = -EFAULT; + goto out; + } retval = idev->info->irqcontrol(idev->info, irq_on); +out: return retval ? retval : sizeof(s32); } -- 2.20.1
[PATCH 4.14 3/8] uio: use request_threaded_irq instead
From: Xiubo Li commit 9421e45f5ff3d558cf8b75a8cc0824530caf3453 upstream. Preparing for changing to use mutex lock. Signed-off-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 288c4b977184..c97945a3f572 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -911,8 +911,9 @@ int __uio_register_device(struct module *owner, * FDs at the time of unregister and therefore may not be * freed until they are released. */ - ret = request_irq(info->irq, uio_interrupt, - info->irq_flags, info->name, idev); + ret = request_threaded_irq(info->irq, NULL, uio_interrupt, + info->irq_flags, info->name, idev); + if (ret) { info->uio_dev = NULL; goto err_request_irq; -- 2.20.1
[PATCH 4.14 2/8] uio: Prevent device destruction while fds are open
From: Hamish Martin commit a93e7b331568227500186a465fee3c2cb5dffd1f upstream. Prevent destruction of a uio_device while user space apps hold open file descriptors to that device. Further, access to the 'info' member of the struct uio_device is protected by spinlock. This is to ensure stale pointers to data not under control of the UIO subsystem are not dereferenced. Signed-off-by: Hamish Martin Reviewed-by: Chris Packham Signed-off-by: Greg Kroah-Hartman [4.14 change __poll_t to unsigned int] Signed-off-by: Tommi Rantala --- drivers/uio/uio.c | 98 -- include/linux/uio_driver.h | 4 +- 2 files changed, 75 insertions(+), 27 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 10f249628e79..288c4b977184 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -272,7 +272,7 @@ static int uio_dev_add_attributes(struct uio_device *idev) if (!map_found) { map_found = 1; idev->map_dir = kobject_create_and_add("maps", - &idev->dev->kobj); + &idev->dev.kobj); if (!idev->map_dir) { ret = -ENOMEM; goto err_map; @@ -301,7 +301,7 @@ static int uio_dev_add_attributes(struct uio_device *idev) if (!portio_found) { portio_found = 1; idev->portio_dir = kobject_create_and_add("portio", - &idev->dev->kobj); + &idev->dev.kobj); if (!idev->portio_dir) { ret = -ENOMEM; goto err_portio; @@ -344,7 +344,7 @@ static int uio_dev_add_attributes(struct uio_device *idev) kobject_put(&map->kobj); } kobject_put(idev->map_dir); - dev_err(idev->dev, "error creating sysfs files (%d)\n", ret); + dev_err(&idev->dev, "error creating sysfs files (%d)\n", ret); return ret; } @@ -381,7 +381,7 @@ static int uio_get_minor(struct uio_device *idev) idev->minor = retval; retval = 0; } else if (retval == -ENOSPC) { - dev_err(idev->dev, "too many uio devices\n"); + dev_err(&idev->dev, "too many uio devices\n"); retval = -EINVAL; } mutex_unlock(&minor_lock); @@ -435,6 +435,7 @@ static int uio_open(struct inode *inode, struct file *filep) struct uio_device *idev; struct uio_listener *listener; int ret = 0; + unsigned long flags; 
mutex_lock(&minor_lock); idev = idr_find(&uio_idr, iminor(inode)); @@ -444,9 +445,11 @@ static int uio_open(struct inode *inode, struct file *filep) goto out; } + get_device(&idev->dev); + if (!try_module_get(idev->owner)) { ret = -ENODEV; - goto out; + goto err_module_get; } listener = kmalloc(sizeof(*listener), GFP_KERNEL); @@ -459,11 +462,13 @@ static int uio_open(struct inode *inode, struct file *filep) listener->event_count = atomic_read(&idev->event); filep->private_data = listener; - if (idev->info->open) { + spin_lock_irqsave(&idev->info_lock, flags); + if (idev->info && idev->info->open) ret = idev->info->open(idev->info, inode); - if (ret) - goto err_infoopen; - } + spin_unlock_irqrestore(&idev->info_lock, flags); + if (ret) + goto err_infoopen; + return 0; err_infoopen: @@ -472,6 +477,9 @@ static int uio_open(struct inode *inode, struct file *filep) err_alloc_listener: module_put(idev->owner); +err_module_get: + put_device(&idev->dev); + out: return ret; } @@ -489,12 +497,16 @@ static int uio_release(struct inode *inode, struct file *filep) int ret = 0; struct uio_listener *listener = filep->private_data; struct uio_device *idev = listener->dev; + unsigned long flags; - if (idev->info->release) + spin_lock_irqsave(&idev->info_lock, flags); + if (idev->info && idev->info->release) ret = idev->info->release(idev->info, inode); + spin_unlock_irqrestore(&idev->info_lock, flags); module_put(idev->owner); kfree(listener); + put_device(&idev->dev); return ret; } @@ -502,9 +514,16 @@ static unsigned int uio_poll(struct file *filep, poll_table *wait) { struct uio_listener *listener = filep->private_data; struct uio_device *idev = listener->dev; + unsigned int ret = 0; + unsigned long flags; - if (!idev->info->irq) - return -EIO; + spin_lock_irqsave(&idev->info_lock, flags); + if (!idev->info || !idev->info->irq) +
[PATCH 4.14 0/8] uio backport fixes for 4.14
Backport uio fixes to 4.14, to fix use-after-free memory errors. Changed __poll_t to unsigned int as the former not found in 4.14, and resolved some patch context conflicts. Hailong Liu (1): uio: fix wrong return value from uio_mmap() Hamish Martin (2): uio: Reduce return paths from uio_write() uio: Prevent device destruction while fds are open Xiubo Li (5): uio: use request_threaded_irq instead uio: change to use the mutex lock instead of the spin lock uio: fix crash after the device is unregistered uio: fix possible circular locking dependency Revert "uio: use request_threaded_irq instead" drivers/uio/uio.c | 206 - include/linux/uio_driver.h | 4 +- 2 files changed, 163 insertions(+), 47 deletions(-) -- 2.20.1
Re: Suspected SPAM - Re: [PATCH 4.14 198/205] perf/core: Dont WARN() for impossible ring-buffer sizes
On Wed, 2019-02-13 at 13:03 +, Rantala, Tommi T. (Nokia - FI/Espoo) wrote: > On Mon, 2019-02-11 at 15:19 +0100, Greg Kroah-Hartman wrote: > > 4.14-stable review patch. If anyone has any objections, please let > > me know. > > > > -- > > > > From: Mark Rutland > > > > commit 9dff0aa95a324e262ffb03f425d00e4751f3294e upstream. > > > > The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to > > determine > > how > > large its ringbuffer mmap should be. This can be configured to > > arbitrary > > values, which can be larger than the maximum possible allocation > > from > > kmalloc. > > > > When this is configured to a suitably large value (e.g. thanks to > > the > > perf fuzzer), attempting to use perf record triggers a > > WARN_ON_ONCE() > > in > > __alloc_pages_nodemask(): > > > >WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 > > __alloc_pages_nodemask+0x3f8/0xbc8 > > > > Let's avoid this by checking that the requested allocation is > > possible > > before calling kzalloc. > > Hi, > > Perf tool is broken for me in 4.14.99 (running in x86_64 VM), > bisection > points to this patch. ... 
and I see there's a fix available: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=528871b456026e6127d95b1b2bd8e3a003dc1614 -Tommi > > > > Reported-by: Julien Thierry > > Signed-off-by: Mark Rutland > > Signed-off-by: Peter Zijlstra (Intel) > > Reviewed-by: Julien Thierry > > Cc: Alexander Shishkin > > Cc: Arnaldo Carvalho de Melo > > Cc: Jiri Olsa > > Cc: Linus Torvalds > > Cc: Namhyung Kim > > Cc: Peter Zijlstra > > Cc: Thomas Gleixner > > Cc: > > Link: > > https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutl...@arm.com > > Signed-off-by: Ingo Molnar > > Signed-off-by: Greg Kroah-Hartman > > > > --- > > kernel/events/ring_buffer.c |3 +++ > > 1 file changed, 3 insertions(+) > > > > --- a/kernel/events/ring_buffer.c > > +++ b/kernel/events/ring_buffer.c > > @@ -719,6 +719,9 @@ struct ring_buffer *rb_alloc(int nr_page > > size = sizeof(struct ring_buffer); > > size += nr_pages * sizeof(void *); > > > > + if (order_base_2(size) >= MAX_ORDER) > > + goto fail; > > + > > rb = kzalloc(size, GFP_KERNEL); > > if (!rb) > > goto fail; > > > >
Re: [PATCH 4.14 198/205] perf/core: Dont WARN() for impossible ring-buffer sizes
On Mon, 2019-02-11 at 15:19 +0100, Greg Kroah-Hartman wrote: > 4.14-stable review patch. If anyone has any objections, please let > me know. > > -- > > From: Mark Rutland > > commit 9dff0aa95a324e262ffb03f425d00e4751f3294e upstream. > > The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine > how > large its ringbuffer mmap should be. This can be configured to > arbitrary > values, which can be larger than the maximum possible allocation from > kmalloc. > > When this is configured to a suitably large value (e.g. thanks to the > perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() > in > __alloc_pages_nodemask(): > >WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 > __alloc_pages_nodemask+0x3f8/0xbc8 > > Let's avoid this by checking that the requested allocation is > possible > before calling kzalloc. Hi, Perf tool is broken for me in 4.14.99 (running in x86_64 VM), bisection points to this patch. # perf top Error: Failed to mmap with 12 (Cannot allocate memory) # perf trace Cannot allocate memory # strace -T -tt -f -y perf top [...] 14:22:09.829544 openat(AT_FDCWD, "/proc/sys/kernel/perf_event_mlock_kb", O_RDONLY) = 18 <0.15> 14:22:09.829612 read(18, "516\n", 64) = 4 <0.11> 14:22:09.829655 close(18) = 0 <0.08> 14:22:09.829702 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory) <0.15> 14:22:09.829763 write(2, "Error:\n", 7) = 7 <0.09> 14:22:09.829810 write(2, "Failed to mmap with 12 (Cannot a"..., 48) = 48 <0.08> Changing the patch like this fixes it... 
- if (order_base_2(size) >= MAX_ORDER) + if (order_base_2(size) > MAX_ORDER) -Tommi > Reported-by: Julien Thierry > Signed-off-by: Mark Rutland > Signed-off-by: Peter Zijlstra (Intel) > Reviewed-by: Julien Thierry > Cc: Alexander Shishkin > Cc: Arnaldo Carvalho de Melo > Cc: Jiri Olsa > Cc: Linus Torvalds > Cc: Namhyung Kim > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: > Link: > https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutl...@arm.com > Signed-off-by: Ingo Molnar > Signed-off-by: Greg Kroah-Hartman > > --- > kernel/events/ring_buffer.c |3 +++ > 1 file changed, 3 insertions(+) > > --- a/kernel/events/ring_buffer.c > +++ b/kernel/events/ring_buffer.c > @@ -719,6 +719,9 @@ struct ring_buffer *rb_alloc(int nr_page > size = sizeof(struct ring_buffer); > size += nr_pages * sizeof(void *); > > + if (order_base_2(size) >= MAX_ORDER) > + goto fail; > + > rb = kzalloc(size, GFP_KERNEL); > if (!rb) > goto fail; > >
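The arithmetic behind the failure can be sketched in userspace C. This is an illustration under stated assumptions, not kernel code: `demo_order_base_2()` re-implements the kernel's `order_base_2()` (ceil(log2(n))), and the constants `DEMO_MAX_ORDER` (11), `DEMO_PAGE_SHIFT` (12), `DEMO_PTR_SIZE` (8) and `DEMO_RB_STRUCT` (256, a made-up size for `struct ring_buffer`) are assumed values for a typical x86_64 build. The 528384-byte mmap in the strace output is 129 pages of 4096 bytes, taken here as one control page plus 128 data pages, so `rb_alloc()` would be asked for 128 page pointers.

```c
/* Userspace re-implementation of order_base_2(): ceil(log2(n)),
 * with 0 and 1 both mapping to 0. */
unsigned int demo_order_base_2(unsigned long n)
{
	unsigned int order = 0;

	while (order < 8 * sizeof(n) - 1 && (1UL << order) < n)
		order++;
	return order;
}

#define DEMO_MAX_ORDER  11	/* assumed x86_64 value */
#define DEMO_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define DEMO_PTR_SIZE   8	/* assumed sizeof(void *) on 64-bit */
#define DEMO_RB_STRUCT  256	/* hypothetical sizeof(struct ring_buffer) */

/* Bytes requested from kzalloc(): the struct plus one pointer per data page. */
unsigned long demo_rb_alloc_size(unsigned long nr_pages)
{
	return DEMO_RB_STRUCT + nr_pages * DEMO_PTR_SIZE;
}

/* The check as backported: compares a byte order against a page order. */
int demo_rejected_backported(unsigned long size)
{
	return demo_order_base_2(size) >= DEMO_MAX_ORDER;
}

/* One way to compare like with like (an illustration only): add
 * PAGE_SHIFT so that both sides are expressed as byte orders. */
int demo_rejected_page_aware(unsigned long size)
{
	return demo_order_base_2(size) >= DEMO_PAGE_SHIFT + DEMO_MAX_ORDER;
}
```

With 128 data pages the pointer array alone is 1024 bytes, so for any struct size between 1 and 1024 bytes the total falls in (1024, 2048] and `order_base_2()` yields 11. That equals the assumed MAX_ORDER, so the `>=` check rejects an allocation of barely more than a kilobyte, while changing `>=` to `>` happens to let this particular size through.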
4.4 "rcu: Force boolean subscript for expedited stall warnings"
Hi, Can you pick this tiny one-liner patch to 4.4.y? Fixes unexpected null byte in RCU "expedited stall" message. commit ec3833ed02ae6ef2a933ece9de7cbab0c64c699e Author: Paul E. McKenney Date: Mon Jan 11 16:29:29 2016 -0800 rcu: Force boolean subscript for expedited stall warnings -Tommi
4.14 "random: add a config option to trust the CPU's hwrng"
Hi stable maintainers, Can you consider including these "random" patches in 4.14.y? These are very useful in fixing esp. first-bootup delays of VMs due to entropy starvation. commit 39a8883a2b989d1d21bd8dd99f5557f0c5e89694 Author: Theodore Ts'o Date: Tue Jul 17 18:24:27 2018 -0400 random: add a config option to trust the CPU's hwrng commit 9b25436662d5fb4c66eb527ead53cab15f596ee0 Author: Kees Cook Date: Mon Aug 27 14:51:54 2018 -0700 random: make CPU trust a boot parameter -Tommi
4.14 "uio: Prevent device destruction while fds are open"
Hi, I hit use-after-free issues in UIO in 4.14.x, and discovered that it's already fixed in later kernel versions: commit a93e7b331568227500186a465fee3c2cb5dffd1f Author: Hamish Martin Date: Mon May 14 13:32:23 2018 +1200 uio: Prevent device destruction while fds are open Can we have this in 4.14.y? (good idea to older LTS kernels too) I picked and tested the following commits in 4.14.x: # Temporarily revert "uio: Fix an Oops on load", # to avoid merge conflict later with "uio: use # request_threaded_irq instead" git revert f6a6ae4e0f345aa481535bfe2046cd33f4dc37b8 # "uio: Reduce return paths from uio_write()" git cherry-pick 81daa406c2cc97d85eef9409400404efc2a3f756 # "uio: Prevent device destruction while fds are open" # Also amend this, change __poll_t to plain unsigned int, # the former not found in 4.14. git cherry-pick a93e7b331568227500186a465fee3c2cb5dffd1f sed -i "s/__poll_t/unsigned int/" drivers/uio/uio.c git commit --amend drivers/uio/uio.c # "uio: use request_threaded_irq instead" git cherry-pick 9421e45f5ff3d558cf8b75a8cc0824530caf3453 # "uio: change to use the mutex lock instead of the spin lock" # Resolve conflict due to __poll_t in patch context. git cherry-pick 543af5861f41af0a5d2432f6fb5976af50f9cee5 sed -i -e '/<<>>/d' \ -e 's/__poll_t/unsigned int/' drivers/uio/uio.c git add drivers/uio/uio.c git cherry-pick --continue # uio: fix crash after the device is unregistered git cherry-pick 57c5f4df0a5a0ee83df71251e2ee93a5e4e9 # uio: fix wrong return value from uio_mmap() git cherry-pick e7de2590f18a272e63732b9d519250d1b522b2c4 # uio: fix possible circular locking dependency git cherry-pick b34e9a15b37b8ddbf06a4da142b0c39c74211eb4 # Revert "uio: use request_threaded_irq instead" git cherry-pick 3d27c4de8d4fb2d4099ff324671792aa2578c6f9 # re-apply: uio: Fix an Oops on load git cherry-pick 432798195bbce1f8cd33d1c0284d0538835e25fb -Tommi
4.14 revert "seccomp: add a selftest for get_metadata"
Hi Greg, Can you please revert this commit in 4.14? commit e65cd9a20343ea90f576c24c38ee85ab6e7d5fec Author: Tycho Andersen Date: Tue Feb 20 19:47:47 2018 -0700 seccomp: add a selftest for get_metadata [ Upstream commit d057dc4e35e16050befa3dda943876dab39cbf80 ] Let's test that we get the flags correctly, and that we preserve the filter index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly. PTRACE_SECCOMP_GET_METADATA was only added in 4.16 (26500475ac1b499d8636ff281311d633909f5d20) And it's also breaking seccomp_bpf.c compilation for me: seccomp_bpf.c: In function ‘get_metadata’: seccomp_bpf.c:2878:26: error: storage size of ‘md’ isn’t known struct seccomp_metadata md; ^~ -Tommi
4.14 perf unwind fixes
Hi Greg, Can you please pick these two upstream patches to 4.14? They fix broken perf unwinding for me. commit 3d20c6246690219881786de10d2dda93f616d0ac Author: Martin Vuille < jpm...@aim.com> Date: Sun Feb 11 16:24:20 2018 -0500 perf unwind: Unwind with libdw doesn't take symfs into account commit 1fe627da30331024f453faef04d500079b901107 Author: Milian Wolff < milian.wo...@kdab.com> Date: Mon Oct 29 15:16:44 2018 +0100 perf unwind: Take pgoff into account when reporting elf to libdwfl -Tommi