On Wed, Sep 16, 2020 at 10:52:31PM +0800, Wu Bo wrote:
> On 2020/9/14 22:29, Corey Minyard wrote:
> > On Mon, Sep 14, 2020 at 09:33:07PM +0800, Wu Bo wrote:
> > > On 2020/9/14 20:13, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi, Wu
> > > >
> > > > > From: Wu Bo <wub...@huawei.com>
> > > > >
> > > > > In a virtual machine, use mce_inject to inject errors into the
> > > > > system.  After mce_inject injects an uncorrectable error, there is
> > > > > a probability that the virtual machine is not reset immediately,
> > > > > but hangs for more than 3000 seconds, and the write_data array is
> > > > > accessed out of bounds.
> > > > >
> > > > > The real reason is that smi_event_handler lacks lock protection in
> > > > > the multi-threaded scenario, which causes write_pos to exceed the
> > > > > size of the write_data array.
> > > >
> > > > Thank you for the fix, but I wonder why this multi-threaded scenario
> > > > happens.  If my understanding is correct, only one CPU can run panic
> > > > routines, and this means only one CPU calls flush_messages.  Did I
> > > > miss some call path?
> > > >
> > > > Best regards,
> > > >
> > > > Hidehiro Kawai
> > > > Hitachi, Ltd. Research & Development Group
> > >
> > > Hi,
> > >
> > > You're right, only one CPU can run panic routines.  Sorry, I missed
> > > another call path: the panic and an interrupt occurred at the same
> > > time.
> > >
> > > CPU0                                  CPU3
> > > -> nmi_handle                         -> handle_irq
> > >   -> mce_raise_notify                   -> handle_fasteoi_irq
> > >     -> panic_event                        -> handle_irq_event
> > >       -> ipmi_panic_request_and_wait        -> handle_irq_event_percpu
> > >         -> flush_messages                     -> ipmi_si_irq_handler
> > >           -> smi_event_handler                  -> smi_event_handler
> > >             -> kcs_event()                        -> kcs_event()
> > >
> > > So there is a simultaneous call to the smi_event_handler() function.
> >
> > With your patch, this will result in spinning waiting for the lock
> > forever.  So that's not what is happening here.
> >
> > The panic event should stop the other processors before flush_messages()
> > happens, right?  If so, it is possible that the IPMI state machine is
> > running, but this should result in the state machine being reset.
> >
> > -corey
>
> Hi, Corey
>
> Thanks for your reply.  The MCE appears on all CPUs, so all CPUs are
> currently in NMI interrupt context.  CPU0 is the first to seize the
> opportunity to run the panic routines, and the panic event should stop
> the other processors before flush_messages().  But CPU1, CPU2 and CPU3
> are already in NMI interrupt context, so the second NMI will not be
> processed by them again; at this point CPU1, CPU2 and CPU3 have not
> been stopped.
>
> CPU1, CPU2 and CPU3 are waiting for CPU0 to finish the panic process
> (they run wait_for_panic()), and interrupts are enabled inside the
> wait_for_panic() function.
>
> The IPMI IRQ then occurs on CPU3 while CPU0 is doing the panic, so the
> two CPUs have the opportunity to call the smi_event_handler() function
> concurrently.
Ah, I understand now.  This sounds more like a bug in the x86 exception
handling.  My understanding is that panic() runs with only one operational
CPU in the system.  I think there is other code (like printk) that also
relies on this assumption; see bust_spinlocks() and oops_in_progress.  I
can't find anything that says this explicitly, but it's obvious that both
panic() and mce_panic() make special efforts to ensure it.

The trouble is that if you add the spinlock in the IPMI panic handler and
an exception occurs while a CPU holds that spinlock, the system will hang
forever, not just for 3000 seconds.

And I don't know of a good solution besides making the panic handling
work the way I think it should.

Thanks,

-corey

> Best regards,
>
> Wu Bo
>
> > > In kcs_event():
> > >
> > >     case KCS_WAIT_WRITE:
> > >     ...
> > >     if (kcs->write_count == 1) {
> > >         write_cmd(kcs, KCS_WRITE_END);
> > >         kcs->state = KCS_WAIT_WRITE_END;
> > >     } else {
> > >         write_next_byte(kcs);
> > >     }
> > >     ...
> > >
> > > The interrupt call path is protected by the lock, but the panic call
> > > path is not.  Because of the race, the kcs->write_count == 1
> > > condition can be missed, so write_next_byte() is called repeatedly
> > > and the write_data array is accessed out of bounds.
> > >
> > > crash> bt -a
> > > PID: 0      TASK: ffffffff96812780  CPU: 0  COMMAND: "swapper/0"
> > >  #0 [fffffe0000007bc0] panic at ffffffff956b2d3e
> > >  #1 [fffffe0000007c48] wait_for_panic at ffffffff95637ca2
> > >  #2 [fffffe0000007c58] mce_timed_out at ffffffff95637f5d
> > >  #3 [fffffe0000007c70] do_machine_check at ffffffff95638db4
> > >  #4 [fffffe0000007d80] raise_exception at ffffffffc05b6117 [mce_inject]
> > >  #5 [fffffe0000007e48] mce_raise_notify at ffffffffc05b6a92 [mce_inject]
> > >  #6 [fffffe0000007e58] nmi_handle at ffffffff95621c73
> > >  #7 [fffffe0000007eb0] default_do_nmi at ffffffff9562213e
> > >  #8 [fffffe0000007ed0] do_nmi at ffffffff9562231c
> > >  #9 [fffffe0000007ef0] end_repeat_nmi at ffffffff960016b4
> > >     [exception RIP: native_safe_halt+14]
> > >     RIP: ffffffff95e7223e  RSP: ffffffff96803e90  RFLAGS: 00000246
> > >     RAX: ffffffff95e71f30  RBX: 0000000000000000  RCX: 0000000000000000
> > >     RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000000
> > >     RBP: 0000000000000000  R8:  00000018cf7c068a  R9:  0000000000000001
> > >     R10: ffffa222c0b17b88  R11: 0000000001e2f8fb  R12: 0000000000000000
> > >     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> > >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > --- <NMI exception stack> ---
> > > #10 [ffffffff96803e90] native_safe_halt at ffffffff95e7223e
> > > #11 [ffffffff96803e90] default_idle at ffffffff95e71f4a
> > > #12 [ffffffff96803eb0] do_idle at ffffffff956e959a
> > > #13 [ffffffff96803ef0] cpu_startup_entry at ffffffff956e981f
> > > #14 [ffffffff96803f10] start_kernel at ffffffff96d9b206
> > > #15 [ffffffff96803f50] secondary_startup_64 at ffffffff956000e7
> > >
> > > PID: 0      TASK: ffff8b06c77dc740  CPU: 3  COMMAND: "swapper/3"
> > >     [exception RIP: port_outb+17]
> > >     RIP: ffffffffc035f1a1  RSP: ffff8b06fad83e90  RFLAGS: 00000002
> > >     RAX: 0000000000000000  RBX: ffff8b06f08bec00  RCX: 0000000000000010
> > >     RDX: 0000000000000ca2  RSI: 0000000000000000  RDI: ffff8b06f0bd5e40
> > >     RBP: 0000000000000001  R8:  ffff8b06fad80080  R9:  ffff8b06fad84000
> > >     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
> > >     R13: ffff8b06fad83f54  R14: 0000000000000000  R15: 0000000000000000
> > >     CS: 0010  SS: 0018
> > >  #0 [ffff8b06fad83e90] kcs_event at ffffffffc035c2c7 [ipmi_si]
> > >  #1 [ffff8b06fad83eb0] smi_event_handler at ffffffffc035aa3f [ipmi_si]
> > >  #2 [ffff8b06fad83ee8] ipmi_si_irq_handler at ffffffffc035b0cc [ipmi_si]
> > >  #3 [ffff8b06fad83f08] __handle_irq_event_percpu at ffffffff9571dfc0
> > >  #4 [ffff8b06fad83f48] handle_irq_event_percpu at ffffffff9571e140
> > >  #5 [ffff8b06fad83f70] handle_irq_event at ffffffff9571e1b6
> > >  #6 [ffff8b06fad83f90] handle_fasteoi_irq at ffffffff95721b42
> > >  #7 [ffff8b06fad83fb0] handle_irq at ffffffff956209e8
> > >  #8 [ffff8b06fad83fc0] do_IRQ at ffffffff96001ee9
> > > --- <IRQ stack> ---
> > >  #9 [fffffe0000088b98] ret_from_intr at ffffffff96000a8f
> > >     [exception RIP: delay_tsc+52]
> > >     RIP: ffffffff95e5fb64  RSP: fffffe0000088c48  RFLAGS: 00000287
> > >     RAX: 000023fb5edf4b14  RBX: 00000000003e0451  RCX: 000023fb5edf4798
> > >     RDX: 000000000000037c  RSI: 0000000000000003  RDI: 000000000000095b
> > >     RBP: fffffe0000088cc0  R8:  0000000000000004  R9:  fffffe0000088c5c
> > >     R10: ffffffff96a05ae0  R11: 0000000000000000  R12: fffffe0000088cb0
> > >     R13: 0000000000000001  R14: fffffe0000088ef8  R15: ffffffff9666a2f0
> > >     ORIG_RAX: ffffffffffffffd9  CS: 0010  SS: 0018
> > > #10 [fffffe0000088c48] wait_for_panic at ffffffff95637c6c
> > > #11 [fffffe0000088c58] mce_timed_out at ffffffff95637f5d
> > > #12 [fffffe0000088c70] do_machine_check at ffffffff95638db4
> > > #13 [fffffe0000088d80] raise_exception at ffffffffc05b6117 [mce_inject]
> > > #14 [fffffe0000088e48] mce_raise_notify at ffffffffc05b6a92 [mce_inject]
> > > #15 [fffffe0000088e58] nmi_handle at ffffffff95621c73
> > > #16 [fffffe0000088eb0] default_do_nmi at ffffffff9562213e
> > > #17 [fffffe0000088ed0] do_nmi at ffffffff9562231c
> > > #18 [fffffe0000088ef0] end_repeat_nmi at ffffffff960016b4
> > >     [exception RIP: native_safe_halt+14]
> > >     RIP: ffffffff95e7223e  RSP: ffffa222c06a3eb0  RFLAGS: 00000246
> > >     RAX: ffffffff95e71f30  RBX: 0000000000000003  RCX: 0000000000000001
> > >     RDX: 0000000000000001  RSI: 0000000000000083  RDI: 0000000000000000
> > >     RBP: 0000000000000003  R8:  00000018cf7cd9a0  R9:  0000000000000001
> > >     R10: 0000000000000400  R11: 00000000000003fb  R12: 0000000000000000
> > >     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> > >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > --- <NMI exception stack> ---
> > > #19 [ffffa222c06a3eb0] native_safe_halt at ffffffff95e7223e
> > > #20 [ffffa222c06a3eb0] default_idle at ffffffff95e71f4a
> > > #21 [ffffa222c06a3ed0] do_idle at ffffffff956e959a
> > > #22 [ffffa222c06a3f10] cpu_startup_entry at ffffffff956e981f
> > > #23 [ffffa222c06a3f30] start_secondary at ffffffff9564e697
> > > #24 [ffffa222c06a3f50] secondary_startup_64 at ffffffff956000e7

_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer