On 11/09/2017 12:02 PM, Andrew Banman wrote:
On 11/8/17 2:00 PM, Corey Minyard wrote:
On 11/08/2017 11:11 AM, Andrew Banman wrote:
On 11/8/17 11:06 AM, Andrew Banman wrote:
If there are uninitialized SMIs in the smi_infos list, i.e. with no
handlers set, then disable_si_irq() in cleanup_smi_one() will hit a null
pointer dereference when the former attempts to start the check enables
transaction. Thus, we panic during module exit.
I think this points to a broader problem of holding uninitialized smi_info
structs in smi_infos list. There are many places where handlers and other struct
members are assumed. Maybe a better design would be to remove SMIs from the list
if we have no intention of initializing them?

This begs the question: How did you produce this?  From what I can tell,
there is no way you can get to this code if you don't have a working and
initialized smi_info structure, and that's not the only place this would
have to be fixed if it wasn't.  So it's not what you assume, I don't think
it's an uninitialized smi_info structure on the list.
My kernel is missing 0944d889a2, wherein the dmi handling moved to a platform
device. Without this commit the driver discovers both ACPI and SMBIOS records
but only initializes the former; this is visible in the tracebacks below.

Is there a case for pushing a similar patch to stable releases predating 
0944d889a2
to prevent this crash?

It looks like you are using a Redhat 3.10 kernel.  Is that right?

-corey

Thank you for your time,

Andrew

As usual with these sorts of things, please send tracebacks and reproduction
procedures.
# dmesg | grep -i ipmi
[   14.883554] ipmi message handler version 39.2
[   14.905543] ipmi device interface
[   14.931149] ipmi_si IPI0001:00: ipmi_si: probing via ACPI
[   14.937254] ipmi_si IPI0001:00: [io  0x0ce4-0x0ce6] regsize 1 spacing 1 irq 6
[   14.946767] ipmi_si: Adding ACPI-specified bt state machine
[   14.954621] IPMI System Interface driver.
[   14.975257] ipmi_si: probing via SMBIOS
[   14.980446] ipmi_si: SMBIOS: mem 0xce4 regsize 1 spacing 1 irq 6
[   14.987163] ipmi_si: Adding SMBIOS-specified bt state machine
[   14.996706] ipmi_si: probing via SPMI
[   15.002351] ipmi_si: SPMI: io 0xce4 regsize 1 spacing 1 irq 6
[   15.010320] ipmi_si: Adding SPMI-specified bt state machine duplicate 
interface
[   15.020056] ipmi_si: Trying ACPI-specified bt state machine at i/o address 
0xce4, slave address 0x0, irq 6
[   15.066306] ipmi_si IPI0001:00: Using irq 6
[   15.077855] IPMI BT: req2rsp=2 secs retries=1
[   15.130045] ipmi_si IPI0001:00: Found new BMC (man_id: 0x000000, prod_id: 
0x0101, dev_id: 0x20)
[   15.140472] ipmi_si IPI0001:00: IPMI bt interface initialized
# rmmod ipmi_si
[   85.492578] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000010
[   85.501346] IP: [<ffffffffc027840e>] start_check_enables+0x3e/0x80 [ipmi_si]
[   85.509233] PGD 7fb3875067 PUD 7fbc222067 PMD 0
[   85.514421] Oops: 0000 [#1] SMP
[   85.518056] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge 
stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support 
intel_powerclamp coretemp vfat intel_rapl fat iosf_mbi kvm_intel kvm irqbypass 
crc32_pclmul ghash_clmulni_intel aesni_intel lrw i40e gf128mul glue_helper 
ablk_helper cryptd ptp pcspkr pps_core joydev sg i2c_i801 shpchp lpc_ich 
ipmi_si(-) wmi ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_cpufreq nfsd 
auth_rpcgss nfs_acl lockd numatools(OE) grace hwperf(OE) binfmt_misc sunrpc 
ip_tables sr_mod cdrom xfs libcrc32c sd_mod crc_t10dif crct10dif_generic 
mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul 
crct10dif_common sysimgblt crc32c_intel fb_sys_fops ahci ttm libahci uas drm 
libata usb_storage i2c_core
[   85.616799] CPU: 241 PID: 5743 Comm: rmmod Tainted: G           OE  
------------   3.10.0-693.el7.x86_64 #1
[   85.627674] Hardware name: HPE Superdome Flex/Superdome Flex, BIOS 
IP147.006.000.137.000.1711022001 11/02/2017
[   85.638843] task: ffff887fb8ac6eb0 ti: ffff887ffd30c000 task.ti: 
ffff887ffd30c000
[   85.647200] RIP: 0010:[<ffffffffc027840e>]  [<ffffffffc027840e>] 
start_check_enables+0x3e/0x80 [ipmi_si]
[   85.657794] RSP: 0018:ffff887ffd30fea8  EFLAGS: 00010246
[   85.663723] RAX: 0000000000000000 RBX: ffff88016930b000 RCX: 0000000000000006
[   85.671688] RDX: 0000000000000002 RSI: ffff887ffd30feae RDI: 0000000000000000
[   85.679655] RBP: ffff887ffd30fec0 R08: ffff88016930b1a0 R09: 00000001820001fb
[   85.687623] R10: 00000000beb72001 R11: ffffea013efadc80 R12: ffffffffc02822c0
[   85.695589] R13: 0000000000000800 R14: 0000000001f222b0 R15: 0000000001f22010
[   85.703555] FS:  00007fb9a5c10740(0000) GS:ffff887fbea40000(0000) 
knlGS:0000000000000000
[   85.712587] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   85.719002] CR2: 0000000000000010 CR3: 0000007ffd2a5000 CR4: 00000000003407e0
[   85.726968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   85.734936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   85.742904] Stack:
[   85.745150]  2f18887ffd30fec0 000000007706aaa0 ffff88016930b000 
ffff887ffd30fed8
[   85.753456]  ffffffffc027ae19 ffffffffc0281ff0 ffff887ffd30fef0 
ffffffffc027b9cb
[   85.761762]  fffffffffffffff5 ffff887ffd30ff78 ffffffff810fe55b 
0000000000000000
[   85.770063] Call Trace:
[   85.772797]  [<ffffffffc027ae19>] cleanup_one_si+0x149/0x180 [ipmi_si]
[   85.780090]  [<ffffffffc027b9cb>] cleanup_ipmi_si+0x6b/0xc0 [ipmi_si]
[   85.787293]  [<ffffffff810fe55b>] SyS_delete_module+0x19b/0x300
[   85.793913]  [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[   85.800621] Code: ee 18 c6 45 ef 2f 65 48 8b 04 25 28 00 00 00 48 89 45 f0 31 c0 
40 84 f6 75 33 48 8b 47 18 ba 02 00 00 00 48 8b 7f 10 48 8d 75 ee <ff> 50 10 48 
8b 45 f0 65 48 33 04 25 28 00 00 00 c7 43 38 05 00
[   85.822638] RIP  [<ffffffffc027840e>] start_check_enables+0x3e/0x80 [ipmi_si]
[   85.830611]  RSP <ffff887ffd30fea8>
[   85.834504] CR2: 0000000000000010


And with debug printks printing out some contents of the struct at the start of
module exit:

# rmmod ipmi_si; dmesg -c
[ 1083.527776]             smi_info ffff880fbb38f600
[ 1083.527779]             intf_num 0
[ 1083.527779]                 intf ffff880fbb800000
[ 1083.527780]               *si_sm ffff880fbb144400
[ 1083.527780]            *handlers ffffffffa051bac0
[ 1083.527780]              si_type 2

[ 1083.527782]             smi_info ffff880fbb38f200
[ 1083.527783]             intf_num 0
[ 1083.527783]                 intf           (null)
[ 1083.527784]               *si_sm           (null)
[ 1083.527785]            *handlers           (null)
[ 1083.527786]              si_type 2

@@ -3961,6 +3985,15 @@ static void cleanup_ipmi_si(void)
        if (!initialized)
                return;
+ list_for_each_entry_safe(e, tmp_e, &smi_infos, link) {
+               pr_crit("%20s %p\n", "smi_info",    e);
+               pr_crit("%20s %d\n", "intf_num",    e->intf_num);
+               pr_crit("%20s %p\n", "intf",                e->intf);
+               pr_crit("%20s %p\n", "*si_sm",              e->si_sm);
+               pr_crit("%20s %p\n", "*handlers",   e->handlers);
+               pr_crit("%20s %d\n\n", "si_type",   e->si_type);
+       }
+
  #ifdef CONFIG_PCI
        if (pci_registered)
                pci_unregister_driver(&ipmi_pci_driver);
If you are removing the module and this happens, there may be a race
conditions, but this is the wrong fix.  More likely that the structure gets
cleaned up and this function is called afterwards.

-corey

Andrew

Avoid panicking when there are uninitialized SMIs by checking for a handler
pointer before starting the check enables transaction.

Signed-off-by: Andrew Banman <aban...@hpe.com>
---
   drivers/char/ipmi/ipmi_si_intf.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index cb5719e..6c0b1b3 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -442,7 +442,7 @@ static void start_check_enables(struct smi_info
*smi_info, bool start_timer)

       if (start_timer)
           start_new_msg(smi_info, msg, 2);
-    else
+    else if (smi_info->handlers)
smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
       smi_info->si_state = SI_CHECKING_ENABLES;
   }




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

Reply via email to