Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018/12/19 23:30, Sebastian Andrzej Siewior wrote:
> On 2018-12-18 15:07:45 [+], Catalin Marinas wrote:
> …
>> It may be worth running some performance/latency tests during kmemleak
>> scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look,
>> I don't think we'd see any difference with a raw_spinlock_t.
>>
>> With a bit more thinking (though I'll be off until the new year), we
>> could probably get rid of the kmemleak_lock entirely in scan_block() and
>> make lookup_object() and the related rbtree code in kmemleak RCU-safe.
> Okay. So let me apply that patch into my RT tree with your ack (from the
> other email). And then I hope that it either shows up upstream or gets
> replaced with RCU in the end :)

I'd like to do the upstreaming or replacing.

Thanks,
Zhe

> Thanks.
>
> Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-12-18 15:07:45 [+], Catalin Marinas wrote:
…
> It may be worth running some performance/latency tests during kmemleak
> scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look,
> I don't think we'd see any difference with a raw_spinlock_t.
>
> With a bit more thinking (though I'll be off until the new year), we
> could probably get rid of the kmemleak_lock entirely in scan_block() and
> make lookup_object() and the related rbtree code in kmemleak RCU-safe.

Okay. So let me apply that patch into my RT tree with your ack (from the
other email). And then I hope that it either shows up upstream or gets
replaced with RCU in the end :)

Thanks.

Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On Thu, Nov 22, 2018 at 05:04:19PM +0800, zhe...@windriver.com wrote:
> From: He Zhe
>
> kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and
> causes the following BUG.
>
> BUG: scheduling while atomic: migration/15/132/0x0002
> Modules linked in: iTCO_wdt iTCO_vendor_support intel_rapl pcc_cpufreq
> pnd2_edac intel_powerclamp coretemp crct10dif_pclmul crct10dif_common
> aesni_intel matroxfb_base aes_x86_64 matroxfb_g450 matroxfb_accel
> crypto_simd matroxfb_DAC1064 cryptd glue_helper g450_pll matroxfb_misc
> i2c_ismt i2c_i801 acpi_cpufreq
> Preemption disabled at:
> [] cpu_stopper_thread+0x71/0x100
> CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1
> Hardware name: Intel Corp. Harcuvar/Server, BIOS
> HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
> Call Trace:
>  dump_stack+0x4f/0x6a
>  ? cpu_stopper_thread+0x71/0x100
>  __schedule_bug.cold.16+0x38/0x55
>  __schedule+0x484/0x6c0
>  schedule+0x3d/0xe0
>  rt_spin_lock_slowlock_locked+0x118/0x2a0
>  rt_spin_lock_slowlock+0x57/0x90
>  __rt_spin_lock+0x26/0x30
>  __write_rt_lock+0x23/0x1a0
>  ? intel_pmu_cpu_dying+0x67/0x70
>  rt_write_lock+0x2a/0x30
>  find_and_remove_object+0x1e/0x80
>  delete_object_full+0x10/0x20
>  kmemleak_free+0x32/0x50
>  kfree+0x104/0x1f0
>  ? x86_pmu_starting_cpu+0x30/0x30
>  intel_pmu_cpu_dying+0x67/0x70
>  x86_pmu_dying_cpu+0x1a/0x30
>  cpuhp_invoke_callback+0x92/0x700
>  take_cpu_down+0x70/0xa0
>  multi_cpu_stop+0x62/0xc0
>  ? cpu_stop_queue_work+0x130/0x130
>  cpu_stopper_thread+0x79/0x100
>  smpboot_thread_fn+0x20f/0x2d0
>  kthread+0x121/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x35/0x40
>
> And on v4.18 stable tree the following call trace, caused by grabbing
> kmemleak_lock again, is also observed.
>
> kernel BUG at kernel/locking/rtmutex.c:1048!
> invalid opcode: [#1] PREEMPT SMP PTI
> CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1
> Hardware name: Intel Corp. Harcuvar/Server, BIOS
> HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
> RIP: 0010:rt_spin_lock_slowlock_locked+0x277/0x2a0
> Code: e8 5e 64 61 ff e9 bc fe ff ff e8 54 64 61 ff e9 b7 fe ff ff 0f 0b e8 98
> 57 53 ff e9 43 fe ff ff e8 8e 57 53 ff e9 74 ff ff ff <0f> 0b 0f 0b 0f 0b 48
> 8b 43 10 48 85 c0 74 06 48 3b 58 38 75 0b 49
> RSP: 0018:936846d4f3b0 EFLAGS: 00010046
> RAX: 8e3680361e00 RBX: 83a8b240 RCX: 0001
> RDX: RSI: 8e3680361e00 RDI: 83a8b258
> RBP: 936846d4f3e8 R08: 8e3680361e01 R09: 82adfdf0
> R10: 827ede18 R11: R12: 936846d4f3f8
> R13: 8e3680361e00 R14: 936846d4f3f8 R15: 0246
> FS: 7fc8b6bfd780() GS:8e369f34() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 55fb5659e000 CR3: 0007fdd14000 CR4: 003406e0
> Call Trace:
>  ? preempt_count_add+0x74/0xc0
>  rt_spin_lock_slowlock+0x57/0x90
>  ? __kernel_text_address+0x12/0x40
>  ? __save_stack_trace+0x75/0x100
>  __rt_spin_lock+0x26/0x30
>  __write_rt_lock+0x23/0x1a0
>  rt_write_lock+0x2a/0x30
>  create_object+0x17d/0x2b0
>  kmemleak_alloc+0x34/0x50
>  kmem_cache_alloc+0x146/0x220
>  ? mempool_alloc_slab+0x15/0x20
>  mempool_alloc_slab+0x15/0x20
>  mempool_alloc+0x65/0x170
>  sg_pool_alloc+0x21/0x60
>  __sg_alloc_table+0x101/0x160
>  ? sg_free_table_chained+0x30/0x30
>  sg_alloc_table_chained+0x8b/0xb0
>  scsi_init_sgtable+0x31/0x90
>  scsi_init_io+0x44/0x130
>  sd_setup_write_same16_cmnd+0xef/0x150
>  sd_init_command+0x6bf/0xaa0
>  ? cgroup_base_stat_cputime_account_end.isra.0+0x26/0x60
>  ? elv_rb_del+0x2a/0x40
>  scsi_setup_cmnd+0x8e/0x140
>  scsi_prep_fn+0x5d/0x140
>  blk_peek_request+0xda/0x2f0
>  scsi_request_fn+0x33/0x550
>  ? cfq_rb_erase+0x23/0x40
>  __blk_run_queue+0x43/0x60
>  cfq_insert_request+0x2f3/0x5d0
>  __elv_add_request+0x160/0x290
>  blk_flush_plug_list+0x204/0x230
>  schedule+0x87/0xe0
>  __write_rt_lock+0x18b/0x1a0
>  rt_write_lock+0x2a/0x30
>  create_object+0x17d/0x2b0
>  kmemleak_alloc+0x34/0x50
>  __kmalloc_node+0x1cd/0x340
>  alloc_request_size+0x30/0x70
>  mempool_alloc+0x65/0x170
>  ? ioc_lookup_icq+0x54/0x70
>  get_request+0x4e3/0x8d0
>  ? wait_woken+0x80/0x80
>  blk_queue_bio+0x153/0x470
>  generic_make_request+0x1dc/0x3f0
>  submit_bio+0x49/0x140
>  ? next_bio+0x38/0x40
>  submit_bio_wait+0x59/0x90
>  blkdev_issue_discard+0x7a/0xd0
>  ? _raw_spin_unlock_irqrestore+0x18/0x50
>  blk_ioctl_discard+0xc7/0x110
>  blkdev_ioctl+0x57e/0x960
>  ? __wake_up+0x13/0x20
>  block_ioctl+0x3d/0x50
>  do_vfs_ioctl+0xa8/0x610
>  ? vfs_write+0x166/0x1b0
>  ksys_ioctl+0x67/0x90
>  __x64_sys_ioctl+0x1a/0x20
>  do_syscall_64+0x4d/0xf0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> kmemleak is an error detecting feature. We would not expect as good
> performance as without it. As there is no raw rwlock defining helpers, we turn
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On Tue, Dec 18, 2018 at 06:51:41PM +0800, He Zhe wrote:
> On 2018/12/6 03:14, Sebastian Andrzej Siewior wrote:
>> With raw locks you wouldn't have multiple readers at the same time.
>> Maybe you wouldn't have recursion but since you can't have multiple
>> readers you would add lock contention where was none (because you could
>> have two readers at the same time).
>
> OK. I understand your concern finally. As the commit log said, I wanted
> to use a raw rwlock but didn't find the DEFINE helper for it. Thinking it
> would not be expected to have great performance, I turned to using a raw
> spinlock instead. And yes, this would introduce worse performance.

Looking through the kmemleak code, I can't really find significant reader
contention. The longest holder of this lock (read) is the scan thread
which is also protected by a scan_mutex, so can't run concurrently with
another scanner (triggered through debugfs).

The other read_lock(&kmemleak_lock) user is find_and_get_object() called
from a few places. However, all such places normally follow a
create_object() call (kmemleak_alloc() and friends) which already performs
a write_lock(&kmemleak_lock), so it needs to wait for the scan thread to
release the kmemleak_lock.

It may be worth running some performance/latency tests during kmemleak
scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look,
I don't think we'd see any difference with a raw_spinlock_t.

With a bit more thinking (though I'll be off until the new year), we
could probably get rid of the kmemleak_lock entirely in scan_block() and
make lookup_object() and the related rbtree code in kmemleak RCU-safe.

--
Catalin
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018/12/6 03:14, Sebastian Andrzej Siewior wrote:
> On 2018-12-05 21:53:37 [+0800], He Zhe wrote:
>> For call trace 1:
> …
>> Since kmemleak would most likely be used to debug in environments where
>> we would not expect as great performance as without it, and kfree() has
>> raw locks in its main path and other debug function paths, I suppose it
>> wouldn't hurt that we change to raw locks.
> okay.
>
>>>> From what I reached above, this is RT-only and happens on v4.18 and v4.19.
>>>>
>>>> The call trace above is caused by grabbing kmemleak_lock and then getting
>>>> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also
>>>> solve this problem.
>>> But this is a reader / writer lock. And if I understand the other part
>>> of the thread then it needs multiple readers.
>>
>> For call trace 2:
>>
>> I don't get what "it needs multiple readers" exactly means here.
>>
>> In this call trace, the kmemleak_lock is grabbed as write lock, and then
>> scheduled away, and then grabbed again as write lock from another path.
>> It's a write->write locking, compared to the discussion in the other part
>> of the thread.
>>
>> This is essentially because kmemleak hooks on the very low level memory
>> allocation and free operations. After scheduled away, it can easily
>> re-enter itself. We need raw locks to prevent this from happening.
> With raw locks you wouldn't have multiple readers at the same time.
> Maybe you wouldn't have recursion but since you can't have multiple
> readers you would add lock contention where was none (because you could
> have two readers at the same time).

Sorry for the slow reply.

OK. I understand your concern finally. As the commit log said, I wanted to
use a raw rwlock but didn't find the DEFINE helper for it. Thinking it
would not be expected to have great performance, I turned to using a raw
spinlock instead. And yes, this would introduce worse performance.

Maybe I miss the reason, but why don't we have rwlock_types_raw.h to
define a raw rwlock helper for RT? With that, we could cleanly replace
kmemleak_lock with a raw rwlock. Or should we just define a raw rwlock
using a basic type, like arch_rwlock_t, only in kmemleak?

>>> Couldn't we just get rid of that kfree() or move it somewhere else?
>>> I mean if the free() memory on CPU-down and allocate it again CPU-up
>>> then we could skip that, right? Just allocate it and don't free it
>>> because the CPU will likely get up again.
>>
>> For call trace 1:
>>
>> I went through the CPU hotplug code and found that the allocation of the
>> problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare.
>> And the free is done in intel_pmu_cpu_dying. They are handlers triggered
>> by two different perf events.
>>
>> It seems we can hardly form a convincing method that holds the data
>> while CPUs are off and then uses it again. raw locks would be easy and
>> good enough.
> Why not allocate the memory in intel_pmu_cpu_prepare() if it is not
> already there (otherwise skip the allocation) and in
> intel_pmu_cpu_dying() not free it. It looks easy.

Thanks for your suggestion. I've sent the change for call trace 1 to the
mainline mailing list. Hopefully it can be accepted.

Zhe

>> Thanks,
>> Zhe
> Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-12-05 21:53:37 [+0800], He Zhe wrote:
> For call trace 1:
…
> Since kmemleak would most likely be used to debug in environments where
> we would not expect as great performance as without it, and kfree() has
> raw locks in its main path and other debug function paths, I suppose it
> wouldn't hurt that we change to raw locks.
okay.

> >> >From what I reached above, this is RT-only and happens on v4.18 and v4.19.
> >>
> >> The call trace above is caused by grabbing kmemleak_lock and then getting
> >> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also
> >> solve this problem.
> > But this is a reader / writer lock. And if I understand the other part
> > of the thread then it needs multiple readers.
>
> For call trace 2:
>
> I don't get what "it needs multiple readers" exactly means here.
>
> In this call trace, the kmemleak_lock is grabbed as write lock, and then
> scheduled away, and then grabbed again as write lock from another path.
> It's a write->write locking, compared to the discussion in the other part
> of the thread.
>
> This is essentially because kmemleak hooks on the very low level memory
> allocation and free operations. After scheduled away, it can easily
> re-enter itself. We need raw locks to prevent this from happening.

With raw locks you wouldn't have multiple readers at the same time.
Maybe you wouldn't have recursion but since you can't have multiple
readers you would add lock contention where was none (because you could
have two readers at the same time).

> > Couldn't we just get rid of that kfree() or move it somewhere else?
> > I mean if the free() memory on CPU-down and allocate it again CPU-up
> > then we could skip that, right? Just allocate it and don't free it
> > because the CPU will likely get up again.
>
> For call trace 1:
>
> I went through the CPU hotplug code and found that the allocation of the
> problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare.
> And the free is done in intel_pmu_cpu_dying. They are handlers triggered
> by two different perf events.
>
> It seems we can hardly form a convincing method that holds the data while
> CPUs are off and then uses it again. raw locks would be easy and good
> enough.

Why not allocate the memory in intel_pmu_cpu_prepare() if it is not
already there (otherwise skip the allocation) and in
intel_pmu_cpu_dying() not free it. It looks easy.

> Thanks,
> Zhe

Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018/12/1 02:19, Sebastian Andrzej Siewior wrote:
> On 2018-11-24 22:26:46 [+0800], He Zhe wrote:
>> On latest v4.19.1-rt3, both of the call traces can be reproduced with
>> kmemleak enabled. And none can be reproduced with kmemleak disabled.
> okay. So it needs attention.
>
>> On latest mainline tree, none can be reproduced no matter kmemleak is
>> enabled or disabled.
>>
>> I don't get why kfree from a preempt-disabled section should cause a
>> warning without kmemleak, since kfree can't sleep.
> it might. It will acquire a sleeping lock if it has to go down to the
> memory allocator to actually give memory back.

Got it. Thanks.

>> If I understand correctly, the call trace above is caused by trying to
>> schedule after preemption is disabled, which cannot be reached in the
>> mainline kernel. So we might need to turn to using a raw lock to keep
>> preemption disabled.
> The buddy-allocator runs with spin locks so it is okay on !RT. So you
> can use kfree() with disabled preemption or disabled interrupts.
> I don't think that we want to use raw-locks in the buddy-allocator.

For call trace 1:

I went through the calling paths inside kfree() and found that there are
already things using raw locks in it, as follows.

1) in the path of kfree() itself
kfree -> slab_free -> do_slab_free -> __slab_free
-> raw_spin_lock_irqsave

2) in the path of CONFIG_DEBUG_OBJECTS_FREE
kfree -> slab_free -> slab_free_freelist_hook -> slab_free_hook
-> debug_check_no_obj_freed -> __debug_check_no_obj_freed
-> raw_spin_lock_irqsave

3) in the path of CONFIG_LOCKDEP
kfree -> __free_pages -> free_unref_page -> free_unref_page_prepare
-> free_pcp_prepare -> free_pages_prepare -> debug_check_no_locks_freed
-> raw_local_irq_save

Since kmemleak would most likely be used to debug in environments where
we would not expect as great performance as without it, and kfree() has
raw locks in its main path and other debug function paths, I suppose it
wouldn't hurt that we change to raw locks.

>> From what I reached above, this is RT-only and happens on v4.18 and v4.19.
>>
>> The call trace above is caused by grabbing kmemleak_lock and then getting
>> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also
>> solve this problem.
> But this is a reader / writer lock. And if I understand the other part
> of the thread then it needs multiple readers.

For call trace 2:

I don't get what "it needs multiple readers" exactly means here.

In this call trace, the kmemleak_lock is grabbed as write lock, and then
scheduled away, and then grabbed again as write lock from another path.
It's a write->write locking, compared to the discussion in the other part
of the thread.

This is essentially because kmemleak hooks on the very low level memory
allocation and free operations. After scheduled away, it can easily
re-enter itself. We need raw locks to prevent this from happening.

> Couldn't we just get rid of that kfree() or move it somewhere else?
> I mean if the free() memory on CPU-down and allocate it again CPU-up
> then we could skip that, right? Just allocate it and don't free it
> because the CPU will likely get up again.

For call trace 1:

I went through the CPU hotplug code and found that the allocation of the
problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare. And
the free is done in intel_pmu_cpu_dying. They are handlers triggered by
two different perf events.

It seems we can hardly form a convincing method that holds the data while
CPUs are off and then uses it again. raw locks would be easy and good
enough.

Thanks,
Zhe

>> Thanks,
>> Zhe
> Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-11-24 22:26:46 [+0800], He Zhe wrote:
> On latest v4.19.1-rt3, both of the call traces can be reproduced with
> kmemleak enabled. And none can be reproduced with kmemleak disabled.
okay. So it needs attention.

> On latest mainline tree, none can be reproduced no matter kmemleak is
> enabled or disabled.
>
> I don't get why kfree from a preempt-disabled section should cause a
> warning without kmemleak, since kfree can't sleep.
it might. It will acquire a sleeping lock if it has to go down to the
memory allocator to actually give memory back.

> If I understand correctly, the call trace above is caused by trying to
> schedule after preemption is disabled, which cannot be reached in the
> mainline kernel. So we might need to turn to using a raw lock to keep
> preemption disabled.
The buddy-allocator runs with spin locks so it is okay on !RT. So you
can use kfree() with disabled preemption or disabled interrupts.
I don't think that we want to use raw-locks in the buddy-allocator.

> >From what I reached above, this is RT-only and happens on v4.18 and v4.19.
>
> The call trace above is caused by grabbing kmemleak_lock and then getting
> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also
> solve this problem.
But this is a reader / writer lock. And if I understand the other part
of the thread then it needs multiple readers.

Couldn't we just get rid of that kfree() or move it somewhere else?
I mean if the free() memory on CPU-down and allocate it again CPU-up
then we could skip that, right? Just allocate it and don't free it
because the CPU will likely get up again.

> Thanks,
> Zhe

Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On Fri, Nov 23, 2018 at 12:06:11PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote:
> > > is this an RT-only problem? Because mainline should not allow read->read
> > > locking or read->write locking for reader-writer locks. If this only
> > > happens on v4.18 and not on v4.19 then something must have fixed it.
> >
> > Probably misunderstanding, but I'd say that read->read locking is "the
> > norm"...?
> >
> > If you don't use qrwlock, readers are also "recursive", in part.,
> >
> >   P0              P1
> >   read_lock(l)
> >                   write_lock(l)
> >   read_lock(l)
> >
> > won't block P0 on the second read_lock(). (qrwlock somehow complicates
> > the analysis; IIUC, they are recursive if and only if in_interrupt().)
>
> ehm, peterz, is that true? My memory on that is that all readers will
> block if there is a writer pending.

Since qrwlock is the more strict, all users should use its semantics.
Just like we cannot 'rely' on the unfairness of some lock implementations.
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018/11/23 17:53, Sebastian Andrzej Siewior wrote:
> On 2018-11-22 17:04:19 [+0800], zhe...@windriver.com wrote:
>> From: He Zhe
>>
>> kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context
>> and causes the following BUG.
>>
>> BUG: scheduling while atomic: migration/15/132/0x0002
> …
>> Preemption disabled at:
>> [] cpu_stopper_thread+0x71/0x100
>> CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1
>> Hardware name: Intel Corp. Harcuvar/Server, BIOS
>> HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
>> Call Trace:
>>  dump_stack+0x4f/0x6a
>>  ? cpu_stopper_thread+0x71/0x100
>>  __schedule_bug.cold.16+0x38/0x55
>>  __schedule+0x484/0x6c0
>>  schedule+0x3d/0xe0
>>  rt_spin_lock_slowlock_locked+0x118/0x2a0
>>  rt_spin_lock_slowlock+0x57/0x90
>>  __rt_spin_lock+0x26/0x30
>>  __write_rt_lock+0x23/0x1a0
>>  ? intel_pmu_cpu_dying+0x67/0x70
>>  rt_write_lock+0x2a/0x30
>>  find_and_remove_object+0x1e/0x80
>>  delete_object_full+0x10/0x20
>>  kmemleak_free+0x32/0x50
>>  kfree+0x104/0x1f0
>>  ? x86_pmu_starting_cpu+0x30/0x30
>>  intel_pmu_cpu_dying+0x67/0x70
>>  x86_pmu_dying_cpu+0x1a/0x30
>>  cpuhp_invoke_callback+0x92/0x700
>>  take_cpu_down+0x70/0xa0
>>  multi_cpu_stop+0x62/0xc0
>>  ? cpu_stop_queue_work+0x130/0x130
>>  cpu_stopper_thread+0x79/0x100
>>  smpboot_thread_fn+0x20f/0x2d0
>>  kthread+0x121/0x140
>>  ? sort_range+0x30/0x30
>>  ? kthread_park+0x90/0x90
>>  ret_from_fork+0x35/0x40
> If this is the only problem? kfree() from a preempt-disabled section
> should cause a warning even without kmemleak.

Thanks for your review.

I just did some tests against the latest code.

On latest v4.19.1-rt3, both of the call traces can be reproduced with
kmemleak enabled. And none can be reproduced with kmemleak disabled.

On latest mainline tree, none can be reproduced no matter kmemleak is
enabled or disabled.

I don't get why kfree from a preempt-disabled section should cause a
warning without kmemleak, since kfree can't sleep.

If I understand correctly, the call trace above is caused by trying to
schedule after preemption is disabled, which cannot be reached in the
mainline kernel. So we might need to turn to using a raw lock to keep
preemption disabled.

>> And on v4.18 stable tree the following call trace, caused by grabbing
>> kmemleak_lock again, is also observed.
>>
>> kernel BUG at kernel/locking/rtmutex.c:1048!
>> invalid opcode: [#1] PREEMPT SMP PTI
>> CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1
> …
>> Call Trace:
>>  ? preempt_count_add+0x74/0xc0
>>  rt_spin_lock_slowlock+0x57/0x90
>>  ? __kernel_text_address+0x12/0x40
>>  ? __save_stack_trace+0x75/0x100
>>  __rt_spin_lock+0x26/0x30
>>  __write_rt_lock+0x23/0x1a0
>>  rt_write_lock+0x2a/0x30
>>  create_object+0x17d/0x2b0
> …
>
> is this an RT-only problem? Because mainline should not allow read->read
> locking or read->write locking for reader-writer locks. If this only
> happens on v4.18 and not on v4.19 then something must have fixed it.

From what I reached above, this is RT-only and happens on v4.18 and v4.19.

The call trace above is caused by grabbing kmemleak_lock and then getting
scheduled and then re-grabbing kmemleak_lock. Using raw lock can also
solve this problem.

Thanks,
Zhe

> Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On Fri, 23 Nov 2018 11:31:31 + Catalin Marinas wrote:

> With qrwlocks, the readers will normally block if there is a pending
> writer (to avoid starving the writer), unless in_interrupt() when the
> readers are allowed to starve a pending writer.
>
> TLA+/PlusCal model here: ;)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qrwlock.tla

And the code appears to confirm it too:

void queued_read_lock_slowpath(struct qrwlock *lock)
{
	/*
	 * Readers come here when they cannot get the lock without waiting
	 */
	if (unlikely(in_interrupt())) {
		/*
		 * Readers in interrupt context will get the lock immediately
		 * if the writer is just waiting (not holding the lock yet),
		 * so spin with ACQUIRE semantics until the lock is available
		 * without waiting in the queue.
		 */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
		return;
	}
	atomic_sub(_QR_BIAS, &lock->cnts);

	/*
	 * Put the reader into the wait queue
	 */
	arch_spin_lock(&lock->wait_lock);
	atomic_add(_QR_BIAS, &lock->cnts);

-- Steve
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On Fri, Nov 23, 2018 at 12:06:11PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote:
> > > is this an RT-only problem? Because mainline should not allow read->read
> > > locking or read->write locking for reader-writer locks. If this only
> > > happens on v4.18 and not on v4.19 then something must have fixed it.
> >
> > Probably misunderstanding, but I'd say that read->read locking is "the
> > norm"...?
> >
> > If you don't use qrwlock, readers are also "recursive", in part.,
> >
> > P0			P1
> >
> > read_lock(l)
> > 			write_lock(l)
> > read_lock(l)
> >
> > won't block P0 on the second read_lock(). (qrwlock somehow complicates
> > the analysis; IIUC, they are recursive if and only if in_interrupt().)
>
> ehm, peterz, is that true? My memory on that is that all readers will
> block if there is a writer pending.

With qrwlocks, the readers will normally block if there is a pending
writer (to avoid starving the writer), unless in_interrupt() when the
readers are allowed to starve a pending writer.

TLA+/PlusCal model here: ;)

https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qrwlock.tla

-- Catalin
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote:
> > is this an RT-only problem? Because mainline should not allow read->read
> > locking or read->write locking for reader-writer locks. If this only
> > happens on v4.18 and not on v4.19 then something must have fixed it.
>
> Probably misunderstanding, but I'd say that read->read locking is "the
> norm"...?
>
> If you don't use qrwlock, readers are also "recursive", in part.,
>
> P0			P1
>
> read_lock(l)
> 			write_lock(l)
> read_lock(l)
>
> won't block P0 on the second read_lock(). (qrwlock somehow complicate
> the analysis; IIUC, they are recursive if and only if in_interrupt().)

ehm, peterz, is that true? My memory on that is that all readers will
block if there is a writer pending.

> Andrea

Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
> is this an RT-only problem? Because mainline should not allow read->read
> locking or read->write locking for reader-writer locks. If this only
> happens on v4.18 and not on v4.19 then something must have fixed it.

Probably misunderstanding, but I'd say that read->read locking is "the
norm"...?

If you don't use qrwlock, readers are also "recursive", in part.,

P0			P1

read_lock(l)
			write_lock(l)
read_lock(l)

won't block P0 on the second read_lock(). (qrwlock somehow complicate
the analysis; IIUC, they are recursive if and only if in_interrupt().)

Andrea

>
>
> Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-11-22 17:04:19 [+0800], zhe...@windriver.com wrote:
> From: He Zhe
>
> kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and
> causes the following BUG.
>
> BUG: scheduling while atomic: migration/15/132/0x0002
…
> Preemption disabled at:
> [] cpu_stopper_thread+0x71/0x100
> CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1
> Hardware name: Intel Corp. Harcuvar/Server, BIOS
> HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
> Call Trace:
>  dump_stack+0x4f/0x6a
>  ? cpu_stopper_thread+0x71/0x100
>  __schedule_bug.cold.16+0x38/0x55
>  __schedule+0x484/0x6c0
>  schedule+0x3d/0xe0
>  rt_spin_lock_slowlock_locked+0x118/0x2a0
>  rt_spin_lock_slowlock+0x57/0x90
>  __rt_spin_lock+0x26/0x30
>  __write_rt_lock+0x23/0x1a0
>  ? intel_pmu_cpu_dying+0x67/0x70
>  rt_write_lock+0x2a/0x30
>  find_and_remove_object+0x1e/0x80
>  delete_object_full+0x10/0x20
>  kmemleak_free+0x32/0x50
>  kfree+0x104/0x1f0
>  ? x86_pmu_starting_cpu+0x30/0x30
>  intel_pmu_cpu_dying+0x67/0x70
>  x86_pmu_dying_cpu+0x1a/0x30
>  cpuhp_invoke_callback+0x92/0x700
>  take_cpu_down+0x70/0xa0
>  multi_cpu_stop+0x62/0xc0
>  ? cpu_stop_queue_work+0x130/0x130
>  cpu_stopper_thread+0x79/0x100
>  smpboot_thread_fn+0x20f/0x2d0
>  kthread+0x121/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x35/0x40

If this is the only problem? kfree() from a preempt-disabled section
should cause a warning even without kmemleak.

> And on the v4.18 stable tree the following call trace, caused by grabbing
> kmemleak_lock again, is also observed.
>
> kernel BUG at kernel/locking/rtmutex.c:1048!
> invalid opcode: [#1] PREEMPT SMP PTI
> CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1
…
> Call Trace:
>  ? preempt_count_add+0x74/0xc0
>  rt_spin_lock_slowlock+0x57/0x90
>  ? __kernel_text_address+0x12/0x40
>  ? __save_stack_trace+0x75/0x100
>  __rt_spin_lock+0x26/0x30
>  __write_rt_lock+0x23/0x1a0
>  rt_write_lock+0x2a/0x30
>  create_object+0x17d/0x2b0
…

is this an RT-only problem? Because mainline should not allow read->read
locking or read->write locking for reader-writer locks. If this only
happens on v4.18 and not on v4.19 then something must have fixed it.

Sebastian
Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018-11-22 17:04:19 [+0800], zhe...@windriver.com wrote:
> From: He Zhe
>
> kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and
> causes the following BUG.

please use [PATCH RT … ] in the future when posting for RT. And this was
(and is) on my TODO list.

Sebastian
[PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
From: He Zhe

kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and
causes the following BUG.

BUG: scheduling while atomic: migration/15/132/0x0002
Modules linked in: iTCO_wdt iTCO_vendor_support intel_rapl pcc_cpufreq
 pnd2_edac intel_powerclamp coretemp crct10dif_pclmul crct10dif_common
 aesni_intel matroxfb_base aes_x86_64 matroxfb_g450 matroxfb_accel
 crypto_simd matroxfb_DAC1064 cryptd glue_helper g450_pll matroxfb_misc
 i2c_ismt i2c_i801 acpi_cpufreq
Preemption disabled at:
[] cpu_stopper_thread+0x71/0x100
CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1
Hardware name: Intel Corp. Harcuvar/Server, BIOS
HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
Call Trace:
 dump_stack+0x4f/0x6a
 ? cpu_stopper_thread+0x71/0x100
 __schedule_bug.cold.16+0x38/0x55
 __schedule+0x484/0x6c0
 schedule+0x3d/0xe0
 rt_spin_lock_slowlock_locked+0x118/0x2a0
 rt_spin_lock_slowlock+0x57/0x90
 __rt_spin_lock+0x26/0x30
 __write_rt_lock+0x23/0x1a0
 ? intel_pmu_cpu_dying+0x67/0x70
 rt_write_lock+0x2a/0x30
 find_and_remove_object+0x1e/0x80
 delete_object_full+0x10/0x20
 kmemleak_free+0x32/0x50
 kfree+0x104/0x1f0
 ? x86_pmu_starting_cpu+0x30/0x30
 intel_pmu_cpu_dying+0x67/0x70
 x86_pmu_dying_cpu+0x1a/0x30
 cpuhp_invoke_callback+0x92/0x700
 take_cpu_down+0x70/0xa0
 multi_cpu_stop+0x62/0xc0
 ? cpu_stop_queue_work+0x130/0x130
 cpu_stopper_thread+0x79/0x100
 smpboot_thread_fn+0x20f/0x2d0
 kthread+0x121/0x140
 ? sort_range+0x30/0x30
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

And on the v4.18 stable tree the following call trace, caused by grabbing
kmemleak_lock again, is also observed.

kernel BUG at kernel/locking/rtmutex.c:1048!
invalid opcode: [#1] PREEMPT SMP PTI
CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1
Hardware name: Intel Corp. Harcuvar/Server, BIOS
HAVLCRB1.X64.0015.D62.1708310404 08/31/2017
RIP: 0010:rt_spin_lock_slowlock_locked+0x277/0x2a0
Code: e8 5e 64 61 ff e9 bc fe ff ff e8 54 64 61 ff e9 b7 fe ff ff 0f 0b e8 98
57 53 ff e9 43 fe ff ff e8 8e 57 53 ff e9 74 ff ff ff <0f> 0b 0f 0b 0f 0b 48
8b 43 10 48 85 c0 74 06 48 3b 58 38 75 0b 49
RSP: 0018:936846d4f3b0 EFLAGS: 00010046
RAX: 8e3680361e00 RBX: 83a8b240 RCX: 0001
RDX: RSI: 8e3680361e00 RDI: 83a8b258
RBP: 936846d4f3e8 R08: 8e3680361e01 R09: 82adfdf0
R10: 827ede18 R11: R12: 936846d4f3f8
R13: 8e3680361e00 R14: 936846d4f3f8 R15: 0246
FS: 7fc8b6bfd780() GS:8e369f34() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 55fb5659e000 CR3: 0007fdd14000 CR4: 003406e0
Call Trace:
 ? preempt_count_add+0x74/0xc0
 rt_spin_lock_slowlock+0x57/0x90
 ? __kernel_text_address+0x12/0x40
 ? __save_stack_trace+0x75/0x100
 __rt_spin_lock+0x26/0x30
 __write_rt_lock+0x23/0x1a0
 rt_write_lock+0x2a/0x30
 create_object+0x17d/0x2b0
 kmemleak_alloc+0x34/0x50
 kmem_cache_alloc+0x146/0x220
 ? mempool_alloc_slab+0x15/0x20
 mempool_alloc_slab+0x15/0x20
 mempool_alloc+0x65/0x170
 sg_pool_alloc+0x21/0x60
 __sg_alloc_table+0x101/0x160
 ? sg_free_table_chained+0x30/0x30
 sg_alloc_table_chained+0x8b/0xb0
 scsi_init_sgtable+0x31/0x90
 scsi_init_io+0x44/0x130
 sd_setup_write_same16_cmnd+0xef/0x150
 sd_init_command+0x6bf/0xaa0
 ? cgroup_base_stat_cputime_account_end.isra.0+0x26/0x60
 ? elv_rb_del+0x2a/0x40
 scsi_setup_cmnd+0x8e/0x140
 scsi_prep_fn+0x5d/0x140
 blk_peek_request+0xda/0x2f0
 scsi_request_fn+0x33/0x550
 ? cfq_rb_erase+0x23/0x40
 __blk_run_queue+0x43/0x60
 cfq_insert_request+0x2f3/0x5d0
 __elv_add_request+0x160/0x290
 blk_flush_plug_list+0x204/0x230
 schedule+0x87/0xe0
 __write_rt_lock+0x18b/0x1a0
 rt_write_lock+0x2a/0x30
 create_object+0x17d/0x2b0
 kmemleak_alloc+0x34/0x50
 __kmalloc_node+0x1cd/0x340
 alloc_request_size+0x30/0x70
 mempool_alloc+0x65/0x170
 ? ioc_lookup_icq+0x54/0x70
 get_request+0x4e3/0x8d0
 ? wait_woken+0x80/0x80
 blk_queue_bio+0x153/0x470
 generic_make_request+0x1dc/0x3f0
 submit_bio+0x49/0x140
 ? next_bio+0x38/0x40
 submit_bio_wait+0x59/0x90
 blkdev_issue_discard+0x7a/0xd0
 ? _raw_spin_unlock_irqrestore+0x18/0x50
 blk_ioctl_discard+0xc7/0x110
 blkdev_ioctl+0x57e/0x960
 ? __wake_up+0x13/0x20
 block_ioctl+0x3d/0x50
 do_vfs_ioctl+0xa8/0x610
 ? vfs_write+0x166/0x1b0
 ksys_ioctl+0x67/0x90
 __x64_sys_ioctl+0x1a/0x20
 do_syscall_64+0x4d/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

kmemleak is an error-detecting feature, so we would not expect performance as
good as with it disabled. As there are no helpers for defining a raw rwlock,
we turn kmemleak_lock into a raw spinlock.

Signed-off-by: He Zhe
Cc: catalin.mari...@arm.com
Cc: bige...@linutronix.de
Cc: t...@linutronix.de
Cc: rost...@goodmis.org
---
v2: Remove stable tag as this is only for the preempt-rt patchset

 mm/kmemleak.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git