Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
On Mon, Oct 14, 2019 at 08:57:41AM -0700, Linus Torvalds wrote:
> On Mon, Oct 14, 2019 at 12:03 AM Catalin Marinas wrote:
> >
> > Linus, could you please merge the patch above? I can send it again if
> > it's easier.
>
> I took it.

Thanks.

> Generally I prefer having patches (re-)sent to me explicitly rather
> than getting a link to it, so for next time...

Noted.

-- 
Catalin
Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
On Mon, Oct 14, 2019 at 12:03 AM Catalin Marinas wrote:
>
> Linus, could you please merge the patch above? I can send it again if
> it's easier.

I took it.

Generally I prefer having patches (re-)sent to me explicitly rather
than getting a link to it, so for next time...

Linus
Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
On Mon, Oct 14, 2019 at 01:51:15PM +0100, Catalin Marinas wrote:
> In your case, CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, so it disables itself
> irrespective of the pool size and trips over the bug. Even with default
> off, it still involves the clean-up since kmemleak needs to track early
> allocations in case it is turned on by the kmemleak=on cmdline option.
>
> So I think 16000 is sufficient in your case; the default-off triggered
> the bug (well, unless you find in the logs "kmemleak: Memory pool empty,
> consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE").

Ah, got it, thanks for the clarification!

- Ted
Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
On Mon, Oct 14, 2019 at 07:50:21AM -0400, Theodore Y. Ts'o wrote:
> On Mon, Oct 14, 2019 at 08:03:14AM +0100, Catalin Marinas wrote:
> > Thanks for the report. I have a fix already:
> >
> > http://lkml.kernel.org/r/20191004134624.46216-1-catalin.mari...@arm.com
> >
> > I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
> > to be in mainline yet.
>
> Thanks for the pointer to the fix! Does that mean that the workaround
> is to increase the kmemleak pool size? I had been using the default
> (16000) and it seems surprising that it wasn't enough to even get
> the kernel through a standard boot sequence. Should we perhaps
> increase the default mempool size?

In your case, CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, so it disables itself
irrespective of the pool size and trips over the bug. Even with default
off, it still involves the clean-up since kmemleak needs to track early
allocations in case it is turned on by the kmemleak=on cmdline option.

So I think 16000 is sufficient in your case; the default-off triggered
the bug (well, unless you find in the logs "kmemleak: Memory pool empty,
consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE").

-- 
Catalin
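The triage reasoning above (default-off config versus an exhausted pool) can be sketched as a small script that looks at the kernel .config and the boot log, such as the config.gz and log.*.gz attachments in the original report. This is a hypothetical helper, not part of the kernel tree: the function names are made up for illustration, and it assumes the pool-empty warning appears verbatim in the log as quoted above.

```python
import gzip
import re

# Warning text quoted in the thread; assumed to appear verbatim in the boot log.
POOL_EMPTY_MSG = ("kmemleak: Memory pool empty, consider increasing "
                  "CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE")


def read_text(path):
    """Read a plain or gzip-compressed text file (e.g. config.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return f.read()


def config_options(config_text):
    """Collect CONFIG_FOO=value lines; '# CONFIG_FOO is not set' lines are skipped."""
    opts = {}
    for line in config_text.splitlines():
        m = re.match(r"(CONFIG_\w+)=(.*)", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def diagnose(config_text, log_text):
    """Hypothetical triage following the explanation in this thread."""
    opts = config_options(config_text)
    if opts.get("CONFIG_DEBUG_KMEMLEAK") != "y":
        return "kmemleak not built in"
    # A pool-empty warning points at the pool size, so check it first.
    if POOL_EMPTY_MSG in log_text:
        size = opts.get("CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE", "unset")
        return f"pool exhausted: increase CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (now {size})"
    # Otherwise default-off alone is enough to reach the clean-up path.
    if opts.get("CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF") == "y":
        return "default-off: clean-up runs regardless of pool size"
    return "no clean-up trigger found"
```

Usage would be something like diagnose(read_text("config.gz"), read_text("log.201910132216.gz")). For the reported config (default pool of 16000, DEFAULT_OFF=y, no pool-empty warning) it would point at the default-off setting rather than the pool size.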
Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
On Mon, Oct 14, 2019 at 08:03:14AM +0100, Catalin Marinas wrote:
> Thanks for the report. I have a fix already:
>
> http://lkml.kernel.org/r/20191004134624.46216-1-catalin.mari...@arm.com
>
> I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
> to be in mainline yet.

Thanks for the pointer to the fix! Does that mean that the workaround
is to increase the kmemleak pool size? I had been using the default
(16000) and it seems surprising that it wasn't enough to even get
the kernel through a standard boot sequence. Should we perhaps
increase the default mempool size?

- Ted
Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
Hi Ted,

On Sun, Oct 13, 2019 at 10:26:33PM -0400, Theodore Y. Ts'o wrote:
> Commit c566586818 ("mm: kmemleak: use the memory pool for early
> allocations") causes my test kernels to fail to boot using both KVM
> and Google Compute Engine. A git bisect localized it to
> c566586818, and I confirmed by test building v5.4-rc3, which failed as
> above using KVM. When I reverted c566586818 the kernel booted
> successfully.

Thanks for the report. I have a fix already:

http://lkml.kernel.org/r/20191004134624.46216-1-catalin.mari...@arm.com

I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
to be in mainline yet.

Linus, could you please merge the patch above? I can send it again if
it's easier.

Thanks.

-- 
Catalin
[REGRESSION] kmemleak: commit c566586818 causes failure to boot
Commit c566586818 ("mm: kmemleak: use the memory pool for early
allocations") causes my test kernels to fail to boot using both KVM
and Google Compute Engine. A git bisect localized it to
c566586818, and I confirmed by test building v5.4-rc3, which failed as
above using KVM. When I reverted c566586818 the kernel booted
successfully.

The symptoms are that the boot hangs after:

[2.844808] hctosys: unable to open rtc device (rtc0)

and then about 25 seconds later, we get the following warning:

[ 28.237938] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:7]
[ 28.239345] irq event stamp: 198897938
[ 28.240017] hardirqs last enabled at (198897937): [] _raw_write_unlock_irqrestore+0x43/0x47
[ 28.241979] hardirqs last disabled at (198897938): [] trace_hardirqs_off_thunk+0x1a/0x20
[ 28.243930] softirqs last enabled at (198876302): [] __do_softirq+0x32a/0x42a
[ 28.247350] softirqs last disabled at (198876295): [] irq_exit+0xb3/0xc0
[ 28.250080] CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.4.0-rc3-xfstests-00403-g4f5cafb5cb84 #1225
[ 28.253081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 28.254885] Workqueue: events kmemleak_do_cleanup
[ 28.255570] RIP: 0010:_raw_write_unlock_irqrestore+0x45/0x47
[ 28.256401] Code: e8 b0 4d 60 ff 48 89 ef e8 d8 a6 60 ff f6 c7 02 75 11 53 9d e8 dc b1 68 ff 65 ff 0d cd 73 10 5f 5b 5d c3 e8 ed b0 68 ff 53 9d ed 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 28.260440] RSP: :98984006fdf8 EFLAGS: 0246 ORIG_RAX: ff13
[ 28.262258] RAX: 94d7fd23a1c0 RBX: 0246 RCX: 0006
[ 28.264238] RDX: 0007 RSI: 94d7fd23a9c0 RDI: 94d7fd23a1c0
[ 28.267333] RBP: a1c94bc0 R08: 0006931c1cf2 R09:
[ 28.269871] R10: R11: R12:
[ 28.272175] R13: R14: R15: a1c94aa8
[ 28.274649] FS: () GS:94d7fd80() knlGS:
[ 28.277758] CS: 0010 DS: ES: CR0: 80050033
[ 28.279638] CR2: CR3: 5fc12001 CR4: 00360ef0
[ 28.282367] Call Trace:
[ 28.283075]  find_and_remove_object+0x7f/0x90
[ 28.284335]  delete_object_full+0xc/0x20
[ 28.285488]  __kmemleak_do_cleanup+0x63/0x100
[ 28.286913]  process_one_work+0x246/0x570
[ 28.288801]  worker_thread+0x50/0x3b0
[ 28.290406]  ? process_one_work+0x570/0x570
[ 28.291497]  kthread+0x126/0x140
[ 28.292316]  ? kthread_delayed_work_timer_fn+0xa0/0xa0
[ 28.294262]  ret_from_fork+0x3a/0x50
[ 28.837921] rcu: INFO: rcu_sched self-detected stall on CPU
...

I've attached the log from the KVM session and the config.gz used to
build the kernels.

- Ted

config.gz  Description: application/gzip
log.201910132216.gz  Description: application/gzip