Memory corruption (redzone overwritten) names_cache?
Hi,

We're getting some random memory corruption on an AWS EC2 instance with
4.19.x kernels. I've tried 4.19.19 and 4.19.52, but the results below are
from the most recent (4.19.72). For debugging I enabled KASAN+slub_debug,
but TBH, I can't make heads or tails of these. Without slub_debug, the host
reboots within a couple of minutes of uptime. With slub_debug it survives a
bit longer, but eventually all sorts of issues manifest (including: reboots;
ps not being able to read some processes' /proc/<pid>/cmdline while
/proc/<pid>/stack shows acct_collect()->down_read(); etc.).

Across multiple tests, the slab most often reported as the first one
corrupted was names_cache. What is really weird is that several times I saw
the redzone overwritten with the same content, which looks like part of
'sessions.py' from Python's 'requests' module. Any debugging hints would be
greatly appreciated. (A small hypothetical sketch of this kind of slab
overrun follows the log below.)

Command line:
BOOT_IMAGE=(hd0,msdos2)/vmlinuz ro root=/dev/xvda5 console=tty0 console=ttyS0,9600n8 crashkernel=512M-2G:64M,2G-:128M kmemleak=on slub_debug=FZPU slub_nomerge

(...)
[  262.957418] =============================================================================
[  262.957423] BUG vm_area_struct (Tainted: G    B      O   ): Redzone overwritten
[  262.957424] -----------------------------------------------------------------------------
[  262.957427] INFO: 0xb91cc681-0x98bd5238. First byte 0x6e instead of 0xcc
[  262.957433] INFO: Allocated in vm_area_dup+0x1e/0x180 age=6117 cpu=0 pid=8187
[  262.957438]  kmem_cache_alloc+0x1a4/0x1d0
[  262.957439]  vm_area_dup+0x1e/0x180
[  262.957441]  copy_process.part.4+0x2fa9/0x6cd0
[  262.957443]  _do_fork+0x151/0x7a0
[  262.957446]  do_syscall_64+0x9b/0x290
[  262.957452]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  262.957455] INFO: Freed in qlist_free_all+0x37/0xd0 age=7431 cpu=0 pid=8521
[  262.957457]  quarantine_reduce+0x1a2/0x210
[  262.957458]  kasan_kmalloc+0x95/0xc0
[  262.957460]  kmem_cache_alloc+0xc6/0x1d0
[  262.957463]  getname_flags+0xba/0x510
[  262.957465]  user_path_at_empty+0x1d/0x40
[  262.957468]  vfs_statx+0xb9/0x140
[  262.957470]  __se_sys_newstat+0x7c/0xd0
[  262.957472]  do_syscall_64+0x9b/0x290
[  262.957474]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  262.957476] INFO: Slab 0xca532806 objects=30 used=24 fp=0x6ce6da86 flags=0x2008101
[  262.957477] INFO: Object 0x5eb7e26b @offset=8 fp=0xac807fa7
[  262.957480] Redzone b91cc681: 6e 73 2e 70 79 5c 22 2c                          ns.py\",
[  262.957482] Object 5eb7e26b: 20 6c 69 6e 65 20 36 34 36 2c 20 69 6e 20 73 65   line 646, in se
[  262.957484] Object 7d5d4673: 6e 64 5c 6e 20 20 20 20 72 20 3d 20 61 64 61 70  nd\n    r = adap
[  262.957485] Object a3cf6db1: 74 65 72 2e 73 65 6e 64 28 72 65 71 75 65 73 74  ter.send(request
[  262.957487] Object d8b14cdd: 2c 20 2a 2a 6b 77 61 72 00 00 00 00 00 00 00 00  , **kwar........
[  262.957489] Object 5eca0928: 40 97 5a 73 83 88 ff ff 25 00 00 00 00 00 00 80  @.Zs....%.......
[  262.957491] Object 592ffbd7: 71 00 00 00 00 00 00 00 e0 c8 22 6d 83 88 ff ff  q........."m....
[  262.957492] Object 84c88ae5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  262.957494] Object ea6d1cb3: 83 00 00 00 00 00 00 00 80 c0 fd 5a 83 88 ff ff  ...........Z....
[  262.957495] Object a236617c: 80 c0 fd 5a 83 88 ff ff 00 00 00 00 00 00 00 00  ...Z............
[  262.957497] Object 91c7956c: 00 3a 94 b0 ff ff ff ff 75 00 00 00 00 00 00 00  .:......u.......
[  262.957499] Object 216cef35: c0 85 cc 6a 83 88 ff ff 00 00 00 00 00 00 00 00  ...j............
[  262.957500] Object e0fd506c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  262.957502] Redzone f5906e86: cc cc cc cc cc cc cc cc                          ........
[  262.957503] Padding 53d79574: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
[  262.957507] CPU: 3 PID: 11769 Comm: ps Kdump: loaded Tainted: G    B      O   4.19.72_3upstreamdbg #1
[  262.957508] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[  262.957509] Call Trace:
[  262.957516]  dump_stack+0x9a/0xf0
[  262.957519]  check_bytes_and_report.cold.24+0x3f/0x6b
[  262.957521]  check_object+0x17c/0x280
[  262.957524]  free_debug_processing+0x105/0x2a0
[  262.957526]  ? qlist_free_all+0x37/0xd0
[  262.957527]  ? qlist_free_all+0x37/0xd0
[  262.957529]  __slab_free+0x218/0x3b0
[  262.957533]  ? __free_pages_ok+0x62f/0x840
[  262.957536]  ? _raw_spin_unlock_irqrestore+0x2b/0x40
[  262.957537]  ? qlist_free_all+0x37/0xd0
[  262.957541]  ? trace_hardirqs_on+0x35/0x140
[  262.957543]  ? qlist_free_all+0x37/0xd0
[  262.957544]  qlist_free_all+0x4c/0xd0
[  262.957546]  quarantine_reduce+0x1a2/0x210
[  262.957549]  ? getname_flags+0xba/0x510
[  262.957550]  kasan_kmalloc+0x95/0xc0
[  262.957553]  ? getname
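As mentioned above, here is a minimal, purely hypothetical sketch of the
kind of bug this check reports: a kernel module (all names made up,
unrelated to any real workload) that writes a few bytes past the end of a
slab object. With slub_debug=FZ the corruption shows up as "Redzone
overwritten" when the object is freed, much like the splat above; with
KASAN the stray write itself would be flagged immediately.

/* redzone_demo.c - hypothetical example, not from the report above. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/string.h>

static struct kmem_cache *demo_cache;

static int __init redzone_demo_init(void)
{
	char *obj;

	demo_cache = kmem_cache_create("redzone_demo", 64, 0, 0, NULL);
	if (!demo_cache)
		return -ENOMEM;

	obj = kmem_cache_alloc(demo_cache, GFP_KERNEL);
	if (!obj) {
		kmem_cache_destroy(demo_cache);
		demo_cache = NULL;
		return -ENOMEM;
	}

	/*
	 * Deliberate out-of-bounds write: 8 bytes past the 64-byte object.
	 * With slub_debug these bytes land in the redzone, and the
	 * corruption is reported from the free path (check_object()),
	 * which is the same path as in the call trace above.
	 */
	memset(obj + 64, 0x6e, 8);

	kmem_cache_free(demo_cache, obj);
	return 0;
}

static void __exit redzone_demo_exit(void)
{
	if (demo_cache)
		kmem_cache_destroy(demo_cache);
}

module_init(redzone_demo_init);
module_exit(redzone_demo_exit);
MODULE_LICENSE("GPL");

In the real report the overrun obviously comes from somewhere else; the
sketch only shows what the redzone check is reacting to.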
Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled
On 2019-08-14, Reindl Harald wrote:

> that's still not in 5.2.8

It will make its way into the next 5.2.x release, as it is now in the
pending queue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.2

Regards,
Jakub.

> without the exception and "nf_conntrack_tcp_timeout_max_retrans = 60" a
> vnc-over-ssh session having the VNC view in the background freezes
> within 60 seconds
>
> --- IPV4 TABLE MANGLE (STATEFUL PRE-NAT/FILTER) ---
> Chain PREROUTING (policy ACCEPT 100 packets, 9437 bytes)
> num   pkts  bytes  target  prot opt in      out  source      destination
> 1     6526  3892K  ACCEPT  all  --  *       *    0.0.0.0/0   0.0.0.0/0   ctstate RELATED,ESTABLISHED
> 2     125   6264   ACCEPT  all  --  lo      *    0.0.0.0/0   0.0.0.0/0
> 3     64    4952   ACCEPT  all  --  vmnet8  *    0.0.0.0/0   0.0.0.0/0
> 4     140          DROP    all  --  *       *    0.0.0.0/0   0.0.0.0/0   ctstate INVALID
>
> -------- Forwarded Message --------
> Subject: [PATCH AUTOSEL 5.2 07/76] netfilter: conntrack: always store window size un-scaled
>
> On 08.08.19 at 11:02, Thomas Jarosch wrote:
>> Hello together,
>>
>> You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
>>> From: Florian Westphal
>>>
>>> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
>>>
>>> Jakub Jankowski reported following oddity:
>>>
>>> After 3-way handshake completes, timeout of new connection is set
>>> to max_retrans (300s) instead of established (5 days).
>>>
>>> shortened excerpt from pcap provided:
>>> 25.070622 IP (flags [DF], proto TCP (6), length 52)
>>>     10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
>>> 26.070462 IP (flags [DF], proto TCP (6), length 48)
>>>     10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
>>> 27.070449 IP (flags [DF], proto TCP (6), length 40)
>>>     10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
>>>
>>> Turns out the last_win is of u16 type, but we store the scaled value:
>>> 512 << 8 (== 0x20000) becomes 0 window.
>>>
>>> The Fixes tag is not correct, as the bug has existed forever, but
>>> without that change all this might cause is to mistake a window
>>> update (to-nonzero-from-zero) for a retransmit.
>>>
>>> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
>>> Reported-by: Jakub Jankowski
>>> Tested-by: Jakub Jankowski
>>> Signed-off-by: Florian Westphal
>>> Acked-by: Jozsef Kadlecsik
>>> Signed-off-by: Pablo Neira Ayuso
>>> Signed-off-by: Sasha Levin
>>
>> Also:
>> Tested-by: Thomas Jarosch ;)
>>
>> We've hit the issue with the wrong conntrack timeout at two different
>> sites: long-lived connections to a SAP server over IPSec VPN were
>> constantly dropping. For us this was a regression after updating from
>> kernel 3.14 to 4.19.
>>
>> Yesterday I applied the patch to kernel 4.19.57 and the problem is
>> fixed. The issue was extra hard to debug, as we could only boot the
>> new kernel for twenty minutes in the evening on these production
>> systems. The stable kernel patch from last Friday came right on time:
>> I was just about to replay the TCP connection with tcpreplay, so this
>> saved me from another week of debugging.
>>
>> Thanks everyone!

--
Jakub Jankowski|sha...@toxcorp.com|https://toxcorp.com/
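For anyone skimming the thread, the truncation Florian describes can be
seen in isolation with a trivial userspace sketch (hypothetical code, not
the actual nf_conntrack source; the variable names just mirror the
description in the commit message):

/* win_truncate.c - hypothetical illustration of the u16 truncation. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t last_win;         /* same width as conntrack's last_win */
	uint32_t win    = 512;     /* window from the final ACK          */
	uint8_t  wscale = 8;       /* scale factor announced in the SYN  */

	/* 512 << 8 == 0x20000, which does not fit in 16 bits. */
	last_win = win << wscale;

	printf("scaled window: 0x%x, stored in u16: %u\n",
	       (unsigned int)(win << wscale), last_win);
	return 0;
}

Because the stored value wraps to 0, conntrack believes the peer advertised
a zero window and keeps applying the short max_retrans timeout instead of
the established one, which matches the 300s-instead-of-5-days symptom.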
Re: [PATCH] proc: Fix uninitialized byte read in get_mm_cmdline()
On 2019-07-12, Alexey Izbyshev wrote:

> On 7/12/19 8:46 PM, Alexey Dobriyan wrote:
>> The proper fix to all /proc/*/cmdline problems is to revert
>>
>>   f5b65348fd77839b50e79bc0a5e536832ea52d8d
>>   proc: fix missing final NUL in get_mm_cmdline() rewrite
>>
>>   5ab8271899658042fabc5ae7e6a99066a210bc0e
>>   fs/proc: simplify and clarify get_mm_cmdline() function
>
> Should this be interpreted as an actual suggestion to revert the patches,
> fix the conflicts, test and submit them, or is this more like thinking
> out loud? In the former case, will it be OK for long term branches?
>
> get_mm_cmdline() does seem easier for me to read before 5ab8271899658042.
> But it also has different semantics in corner cases, for example:
>
> - If there is no NUL at arg_end-1, it reads only the first string in the
>   combined arg/env block, and doesn't terminate it with NUL.
>
> - If there is any problem with access_remote_vm() or copy_to_user(), it
>   returns -EFAULT even if some data were copied to userspace.
>
> On the other hand, 5ab8271899658042 was merged not too long ago (about a
> year), so it's possible that the current semantics aren't heavily relied
> upon.

I posted this (corner?) case ~3 months ago; unfortunately it wasn't picked
up by anyone:
https://lkml.org/lkml/2019/4/5/825

You can treat it as another datapoint in this discussion.

Regards,
Jakub

--
Jakub Jankowski|sha...@toxcorp.com|https://toxcorp.com/
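If anyone wants to compare before/after behaviour while evaluating such a
revert, a tiny, hypothetical userspace checker (not something from this
thread) makes the trailing-NUL difference easy to see:

/* cmdline_check.c - hypothetical helper, reports size and last byte. */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], buf[65536];
	FILE *f;
	size_t n;

	/* Default to our own cmdline if no PID is given. */
	snprintf(path, sizeof(path), "/proc/%s/cmdline",
		 argc > 1 ? argv[1] : "self");

	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);

	printf("%s: %zu bytes, last byte is %s\n", path, n,
	       n && buf[n - 1] == '\0' ? "NUL" : "not NUL");
	return 0;
}

Running it against the same process on kernels with and without
5ab8271899658042 should show the length and termination differences
described above.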
get_mm_cmdline and userspace (Perl) changing argv0
Starting with 4.18, we noticed that changing our own command line in Perl
by setting $0 leads to (a bit) surprising results.

As long as we're setting $0 to something longer than the initial argv[0],
everything works (almost) as expected: reading from /proc/$$/cmdline yields
what we set $0 to, followed by a single NULL byte. However, when setting $0
to something shorter, doing the same yields what we set $0 to, then a
single NULL byte, then thousands (depending on the size of the environment)
of ' ' (0x20), and then a single NULL.

$ for i in $(seq 1 10); do perl -e '$0 = "1234567890"x'"$i"'; print `cat /proc/$$/cmdline`;' | wc -c; done
3291
3291
3291
3291
3291
3291
71
81
91
101
$

This leads to weird output from commands like "ps axufww", where a
daemonized Perl script that changed its $0 shows up as a long, mostly empty
line in the process list.

We've seen this with both Perl 5.20 and 5.28, and Perl's code handling
changes to $0 hasn't really changed since:
https://perl5.git.perl.org/perl.git/blob/HEAD:/mg.c#l2673

Reverting commit 5ab8271899658042fabc5ae7e6a99066a210bc0e ("fs/proc:
simplify and clarify get_mm_cmdline() function") in the kernel, however,
seems to restore the previous behaviour (no NULL byte at the end):

$ for i in $(seq 1 10); do perl -e '$0 = "1234567890"x'"$i"'; print `cat /proc/$$/cmdline`;' | wc -c; done
10
20
30
40
50
60
70
80
90
100
$

Has anybody else seen / been bothered by this?

Regards,
Jakub.

--
Jakub Jankowski|sha...@toxcorp.com|https://toxcorp.com/
GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
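For anyone who wants to reproduce this without Perl, below is a rough,
hypothetical C approximation of the setproctitle-style trick Perl uses for
$0 (this is my reading of the behaviour described above, not code taken
from Perl's mg.c): the new value is written over the front of the
contiguous argv/environ block, NUL-terminated, and the rest of the block is
padded with spaces plus a final NUL. Because the byte at arg_end-1 is then
no longer a NUL, kernels containing 5ab8271899658042 dump the padding too.

/* argv_shrink.c - hypothetical reproducer, assumes the usual Linux layout
 * where the environment strings directly follow the argv strings. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

int main(int argc, char **argv)
{
	char *start = argv[0];
	char *end = argv[argc - 1] + strlen(argv[argc - 1]) + 1;
	const char *title = "short";
	size_t room;
	int i;

	/* Extend the writable region over the environment strings, but
	 * only while they really are contiguous with argv. */
	for (i = 0; environ[i]; i++) {
		if (environ[i] != end)
			break;
		end = environ[i] + strlen(environ[i]) + 1;
	}

	room = (size_t)(end - start);
	if (strlen(title) + 2 > room) {
		fprintf(stderr, "not enough room for the demo\n");
		return 1;
	}

	/* New title, NUL, space padding, final NUL: the layout reported
	 * above. Note this clobbers argv[] and environ[] contents. */
	memset(start, ' ', room);
	memcpy(start, title, strlen(title));
	start[strlen(title)] = '\0';
	start[room - 1] = '\0';

	printf("pid %d, inspect /proc/%d/cmdline within 60s\n",
	       (int)getpid(), (int)getpid());
	sleep(60);
	return 0;
}

While it sleeps, "wc -c /proc/<pid>/cmdline" should show the full padded
length on 4.18+ kernels, matching what the Perl one-liner above shows for
a shortened $0.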