On 26/02/2020 05:40, Amol Grover wrote:
On Tue, Feb 25, 2020 at 05:47:54PM +0000, John Garry wrote:
Hi Corey,
I was just wondering are all the changes to fix this issue applied yet?
I could not help but notice this change:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/ipmi?h=next-20200225&id=ef0129a4d06a669a0a4d49654d2130ff4343a262
I tested this on top of 5.6-rc3, and I still a warn of suspicious RCU usage
from the IPMI driver.
Thanks,
John
I'm booting an allmodconfig kernel on my arm64 system from recent Linus'
master branch, and I see this:
[ 188.241001][ T1] IPMI message handler: version 39.2
[ 188.247331][ T1] ipmi device interface
[ 188.256271][ T1]
[ 188.258541][ T1] =============================
[ 188.263388][ T1] WARNING: suspicious RCU usage
[ 188.268236][ T1] 5.4.0+ #1416 Not tainted
[ 188.272645][ T1] -----------------------------
[ 188.277496][ T1] drivers/char/ipmi/ipmi_msghandler.c:744 RCU-list
traversed in non-reader section!!
Hi John,
Hi Amol,
Are you referring to this suspicious RCU usage warning that you're
getting on 5.6-rc3?
Yes, I see these warnings on 5.6-rc3; they look like the same issues I
saw on earlier kernel versions:
Is this the only warning that you're getting?
I see 2x warnings on 5.6-rc3:
[ 29.772408] =============================
[ 29.776863] WARNING: suspicious RCU usage
[ 29.780915] 5.6.0-rc3-00001-g907305ae6618-dirty #1755 Not tainted
[ 29.787046] -----------------------------
[ 29.791100] drivers/char/ipmi/ipmi_msghandler.c:744 RCU-list
traversed in non-reader section!!
[ 29.799753]
other info that might help us debug this:
[ 29.807801]
rcu_scheduler_active = 2, debug_locks = 1
[ 29.814372] 2 locks held by swapper/0/1:
[ 29.818331] #0: ffffa00013e1ae60 (smi_watchers_mutex){+.+.}, at:
ipmi_smi_watcher_register+0x64/0x220
[ 29.827696] #1: ffffa000190fd7a8 (&ipmi_interfaces_srcu){....}, at:
ipmi_smi_watcher_register+0x80/0x220
[ 29.837314]
stack backtrace:
[ 29.841712] CPU: 24 PID: 1 Comm: swapper/0 Not tainted
5.6.0-rc3-00001-g907305ae6618-dirty #1755
[ 29.850495] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[ 29.859016] Call trace:
[ 29.861463] dump_backtrace+0x0/0x298
[ 29.865124] show_stack+0x14/0x20
[ 29.868441] dump_stack+0x118/0x190
[ 29.871931] lockdep_rcu_suspicious+0xe0/0x120
[ 29.876373] ipmi_smi_watcher_register+0x1e4/0x220
[ 29.881166] init_ipmi_devintf+0xbc/0x108
[ 29.885174] do_one_initcall+0xf0/0x608
[ 29.889011] kernel_init_freeable+0x31c/0x384
[ 29.893367] kernel_init+0x10/0x118
[ 29.896854] ret_from_fork+0x10/0x18
[ 30.212615] =============================
[ 30.216710] WARNING: suspicious RCU usage
[ 30.220785] 5.6.0-rc3-00001-g907305ae6618-dirty #1755 Not tainted
[ 30.226923] -----------------------------
[ 30.230972] drivers/char/ipmi/ipmi_msghandler.c:3453 RCU-list
traversed in non-reader section!!
[ 30.239706]
other info that might help us debug this:
[ 30.247747]
rcu_scheduler_active = 2, debug_locks = 1
[ 30.254312] 2 locks held by swapper/0/1:
[ 30.258276] #0: ffffa00013e1e960 (smi_infos_lock){+.+.}, at:
init_ipmi_si+0x64/0x1dc
[ 30.266172] #1: ffffa00013e1adc0 (ipmi_interfaces_mutex){+.+.}, at:
ipmi_add_smi+0x5ac/0xa30
[ 30.274758]
stack backtrace:
[ 30.279155] CPU: 25 PID: 1 Comm: swapper/0 Not tainted
5.6.0-rc3-00001-g907305ae6618-dirty #1755
[ 30.287938] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[ 30.296459] Call trace:
[ 30.298907] dump_backtrace+0x0/0x298
[ 30.302568] show_stack+0x14/0x20
[ 30.305884] dump_stack+0x118/0x190
[ 30.309372] lockdep_rcu_suspicious+0xe0/0x120
[ 30.313815] ipmi_add_smi+0x85c/0xa30
[ 30.317477] try_smi_init+0x9b4/0xd78
[ 30.321138] init_ipmi_si+0x98/0x1dc
[ 30.324713] do_one_initcall+0xf0/0x608
[ 30.328549] kernel_init_freeable+0x31c/0x384
[ 30.332906] kernel_init+0x10/0x118
[ 30.336393] ret_from_fork+0x10/0x18
I have put a full kernel log here:
https://pastebin.com/7yBzBt1y
Patch "drivers: char: ipmi: ipmi_msghandler: Pass lockdep expression to
RCU lists" looks to have eliminated the 2nd warning
(drivers/char/ipmi/ipmi_msghandler.c:3453 RCU-list traversed in
non-reader section!!), above.
I can
see a few usages of list_for_each_entry_rcu() under SRCU read lock
which, if I'm not mistaken, might all throw these warnings as no
srcu_read_lock_held() is used. I'll be happy to take a look at them. Let
me know.
Thanks,
John
Thanks
Amol
[ 188.286971][ T1]
[ 188.286971][ T1] other info that might help us debug this:
[ 188.286971][ T1]
[ 188.297235][ T1]
[ 188.297235][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 188.305314][ T1] 2 locks held by swapper/0/1:
[ 188.310072][ T1] #0: ffffffd0149687f0 (smi_watchers_mutex){+.+.},
at: ipmi_smi_watcher_register+0x5c/0x248
[ 188.320295][ T1] #1: ffffffd016c9aff8
(&ipmi_interfaces_srcu){....}, at: ipmi_smi_watcher_register+0xa8/0x248
[ 188.330771][ T1]
[ 188.330771][ T1] stack backtrace:
[ 188.336671][ T1] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.0+
#1416
[ 188.343604][ T1] Hardware name: Huawei Taishan 2280 /D05, BIOS
Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 188.353506][ T1] Call trace:
[ 188.356688][ T1] dump_backtrace+0x0/0x3a0
[ 188.361089][ T1] show_stack+0x28/0x38
[ 188.365142][ T1] dump_stack+0x1ac/0x23c
[ 188.369371][ T1] lockdep_rcu_suspicious+0x100/0x118
[ 188.374646][ T1] ipmi_smi_watcher_register+0x134/0x248
[ 188.380185][ T1] init_ipmi_devintf+0xec/0x140
[ 188.384936][ T1] do_one_initcall+0x2f8/0x710
[ 188.389600][ T1] kernel_init_freeable+0x304/0x3dc
[ 188.394701][ T1] kernel_init+0x20/0x180
[ 188.398928][ T1] ret_from_fork+0x10/0x18
[ 188.403333][ T1] ipmi_si: IPMI System Interface driver
[ 188.411646][ T1] ipmi_si hisi-lpc-ipmi.5.auto: ipmi_platform:
probing via ACPI
[ 188.421958][ T1] ipmi_si hisi-lpc-ipmi.5.auto: IRQ index 0 not found
[ 188.428737][ T1] ipmi_si hisi-lpc-ipmi.5.auto: ipmi_platform: [io
0xffc0e3-0xffc0e7] regsize 1 spacing 1 irq 0
[ 188.439386][ T1] ipmi_si: Adding ACPI-specified bt state machine
[ 188.448341][ T1] kobject: 'wakeup' ((____ptrval____)):
kobject_release, parent (____ptrval____) (delayed 1000)
[ 188.458824][ T1] kobject: 'wakeup20' ((____ptrval____)):
kobject_release, parent (____ptrval____) (delayed 250)
[ 188.471778][ T1] ipmi_si hisi-lpc-ipmi.5.auto: ipmi_platform:
probing via ACPI
[ 188.481621][ T1] ipmi_si hisi-lpc-ipmi.5.auto: IRQ index 0 not found
[ 188.488390][ T1] ipmi_si hisi-lpc-ipmi.5.auto: ipmi_platform: [io
0xffc0e3-0xffc0e7] regsize 1 spacing 1 irq 0
[ 188.498936][ T1] ipmi_si: Adding ACPI-specified bt state machine
[ 188.507082][ T1] ipmi_si: Trying ACPI-specified bt state machine
at i/o address 0xffc0e3, slave address 0x0, irq 0
[ 188.524402][ T1] ipmi_si hisi-lpc-ipmi.5.auto: bt cap response too
short: 3
[ 188.531809][ T1] ipmi_si hisi-lpc-ipmi.5.auto: using default values
[ 188.538483][ T1] ipmi_si hisi-lpc-ipmi.5.auto: req2rsp=5 secs
retries=2
[ 188.572794][ T1]
[ 188.575080][ T1] =============================
[ 188.579930][ T1] WARNING: suspicious RCU usage
[ 188.584797][ T1] 5.4.0+ #1416 Not tainted
[ 188.589206][ T1] -----------------------------
[ 188.594055][ T1] drivers/char/ipmi/ipmi_msghandler.c:3453 RCU-list
traversed in non-reader section!!
[ 188.603617][ T1]
[ 188.603617][ T1] other info that might help us debug this:
[ 188.603617][ T1]
[ 188.613884][ T1]
[ 188.613884][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 188.621945][ T1] 2 locks held by swapper/0/1:
[ 188.626704][ T1] #0: ffffffd01496c8f0 (smi_infos_lock){+.+.}, at:
init_ipmi_si+0x78/0x24c
[ 188.635436][ T1] #1: ffffffd0149689b0
(ipmi_interfaces_mutex){+.+.}, at: ipmi_add_smi+0x5e4/0xae0
[ 188.644864][ T1]
[ 188.644864][ T1] stack backtrace:
[ 188.650765][ T1] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.0+
#1416
[ 188.657698][ T1] Hardware name: Huawei Taishan 2280 /D05, BIOS
Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 188.667598][ T1] Call trace:
[ 188.670777][ T1] dump_backtrace+0x0/0x3a0
[ 188.675178][ T1] show_stack+0x28/0x38
[ 188.679230][ T1] dump_stack+0x1ac/0x23c
[ 188.683458][ T1] lockdep_rcu_suspicious+0x100/0x118
[ 188.688726][ T1] ipmi_add_smi+0x630/0xae0
[ 188.688745][ T1] try_smi_init+0xdbc/0xf18
[ 188.688759][ T1] init_ipmi_si+0x14c/0x24c
[ 188.688773][ T1] do_one_initcall+0x2f8/0x710
[ 188.688787][ T1] kernel_init_freeable+0x304/0x3dc
[ 188.688802][ T1] kernel_init+0x20/0x180
[ 188.688815][ T1] ret_from_fork+0x10/0x18
[ 188.731333][ T1] ipmi_si hisi-lpc-ipmi.5.auto: IPMI message
handler: Found new BMC (man_id: 0x0007db, prod_id: 0x0001, dev_id: 0x01)
[ 188.935301][ T1] ipmi_si hisi-lpc-ipmi.5.auto: IPMI bt interface
initialized
[ 188.942797][ T1] ipmi_ssif: IPMI SSIF Interface driver
In the code we seem to be using srcu read lock instead of rcu read lock
for list iteration protection, but that's a quick analysis. And there is
also a mutex locked in ipmi_smi_watcher_register() [didn't check the
other case], so I wonder if rcu list iterator variant is required at
all.
Apologies if this has been reported previously. I see that Anders
(cc'ed) has been reporting some of these sorts of issues.
Thanks,
John
_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer