On Wed, Sep 16, 2020 at 10:52:31PM +0800, Wu Bo wrote:
> On 2020/9/14 22:29, Corey Minyard wrote:
> > On Mon, Sep 14, 2020 at 09:33:07PM +0800, Wu Bo wrote:
> > > On 2020/9/14 20:13, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi, Wu
> > > > 
> > > > > From: Wu Bo <wub...@huawei.com>
> > > > > In a virtual machine, we used mce_inject to inject errors into
> > > > > the system.  After mce_inject injects an uncorrectable error,
> > > > > there is a probability that the virtual machine is not reset
> > > > > immediately, but instead hangs for more than 3000 seconds, and
> > > > > the write_data array is accessed out of bounds.
> > > > > 
> > > > > The root cause is that smi_event_handler() lacks lock protection
> > > > > in this multi-threaded scenario, which allows write_pos to exceed
> > > > > the size of the write_data array.
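> > > > > 
> > > > > The idea of the fix, roughly sketched below (flush_messages() and
> > > > > smi_info->si_lock are from drivers/char/ipmi/ipmi_si_intf.c; the
> > > > > actual patch may differ in detail), is to take si_lock around the
> > > > > state machine calls in the panic path:
> > > > > 
> > > > > static void flush_messages(void *send_info)
> > > > > {
> > > > > 	struct smi_info *smi_info = send_info;
> > > > > 	enum si_sm_result result;
> > > > > 	unsigned long flags;
> > > > > 
> > > > > 	/*
> > > > > 	 * Take si_lock so the panic path cannot run the state
> > > > > 	 * machine concurrently with ipmi_si_irq_handler().
> > > > > 	 */
> > > > > 	spin_lock_irqsave(&smi_info->si_lock, flags);
> > > > > 	result = smi_event_handler(smi_info, 0);
> > > > > 	while (result != SI_SM_IDLE) {
> > > > > 		udelay(SI_USEC_PER_JIFFY);
> > > > > 		result = smi_event_handler(smi_info, 0);
> > > > > 	}
> > > > > 	spin_unlock_irqrestore(&smi_info->si_lock, flags);
> > > > > }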
> > > > 
> > > > Thank you for the fix, but I wonder why this multi-threaded scenario 
> > > > happens.
> > > > If my understanding is correct, only one CPU can run panic routines, and
> > > > this means only one CPU calls flush_messages.  Did I miss some call 
> > > > path?
> > > > 
> > > > Best regards,
> > > > 
> > > > Hidehiro Kawai
> > > > Hitachi, Ltd. Research & Development Group
> > > 
> > > Hi,
> > > 
> > > You're right, only one CPU can run the panic routines.
> > > Sorry, I missed another call path: the panic and an interrupt can
> > > occur at the same time.
> > > 
> > > CPU0                                   CPU3
> > > -> nmi_handle                          -> handle_irq
> > >   -> mce_raise_notify                    -> handle_fasteoi_irq
> > >     -> panic_event                         -> handle_irq_event
> > >       -> ipmi_panic_request_and_wait         -> handle_irq_event_percpu
> > >         -> flush_messages                      -> ipmi_si_irq_handler
> > >           -> smi_event_handler                    -> smi_event_handler
> > >             -> kcs_event()                          -> kcs_event()
> > > 
> > > There is a simultaneous call to the smi_event_handler() function.
> > 
> > With your patch, this will result in spinning waiting for the lock
> > forever. So that's not what is happening here.
> > 
> > The panic event should stop the other processors before flush_messages()
> > happens, right?  If so, it is possible that the IPMI state machine is
> > running, but this should result in the state machine being reset.
> > 
> > -corey
> > 
> 
> Hi, Corey
> 
> Thanks for your reply. The MCE appears on all CPUs, so all of the CPUs
> are already in NMI context. cpu0 is the first to seize the opportunity
> to run the panic routines, and the panic event should stop the other
> processors before flush_messages(). But cpu1, cpu2 and cpu3 are already
> in NMI context, so the second NMI (the one meant to stop them) will not
> be processed by cpu1, cpu2 and cpu3. At this point, cpu1, cpu2 and cpu3
> have not stopped.
> 
> Instead, cpu1, cpu2 and cpu3 are waiting in wait_for_panic() for cpu0
> to finish the panic process, and interrupts are enabled inside
> wait_for_panic().
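> 
> For reference, wait_for_panic() in arch/x86/kernel/cpu/mce/core.c looks
> roughly like this (quoted from memory, details may differ); note the
> local_irq_enable():
> 
> static void wait_for_panic(void)
> {
> 	long timeout = PANIC_TIMEOUT * USEC_PER_SEC;
> 
> 	preempt_disable();
> 	local_irq_enable();	/* IRQs are open while we spin here */
> 	while (timeout-- > 0)
> 		udelay(1);
> 	if (panic_timeout == 0)
> 		panic_timeout = mca_cfg.panic_timeout;
> 	panic("Panicking machine check CPU died");
> }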
> 
> So an IPMI IRQ can arrive on cpu3 while cpu0 is doing the panic, and
> the two CPUs then call the smi_event_handler() function concurrently.

Ah, I understand now.  This sounds more like a bug in the x86 exception
handling.  My understanding is that panic() runs with only one
operational CPU in the system.  I think there is other code (like
printk) that also relies on this assumption.  See bust_spinlocks() and
oops_in_progress.

I can't find anything that says this, but it's obvious that both panic()
and mce_panic() make special efforts to ensure this.
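
For example, panic() in kernel/panic.c does roughly this (quoting from
memory, details vary by kernel version):

	this_cpu = raw_smp_processor_id();
	old_cpu  = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
		panic_smp_self_stop();	/* only one CPU wins the panic */
	...
	smp_send_stop();		/* stop all the other CPUs */

And mce_panic() similarly makes second and later callers spin in
wait_for_panic(), so only one CPU actually calls panic().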

The trouble is that if you add the spinlock in the IPMI panic handler
and an exception occurs while a CPU holds that spinlock, the system
will hang forever, not just for 3000 seconds.  And I don't know of a
good solution besides making the panic handling work the way I think
it should.
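
The closest thing to a mitigation I can think of would be to make the
panic path use a bounded trylock rather than a plain spin_lock(), so a
dead lock holder costs a timeout instead of a permanent hang.
Something like this (untested sketch; panic_trylock_timeout() is a
hypothetical helper, not an existing kernel API):

	/* Try to take the lock, but give up after "usecs" microseconds. */
	static bool panic_trylock_timeout(spinlock_t *lock, unsigned int usecs)
	{
		while (usecs--) {
			if (spin_trylock(lock))
				return true;
			udelay(1);
		}
		return false;	/* lock holder is probably dead; carry on */
	}

But that only limits the damage, it doesn't fix the underlying problem.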

Thanks,

-corey

> 
> Best regards,
> 
> Wu Bo
> 
> > > 
> > > In kcs_event():
> > > case KCS_WAIT_WRITE:
> > > ...
> > > if (kcs->write_count == 1) {
> > >   write_cmd(kcs, KCS_WRITE_END);
> > >   kcs->state = KCS_WAIT_WRITE_END;
> > > } else {
> > >   write_next_byte(kcs);
> > > }
> > > ...
> > > 
> > > The interrupt call path runs this with the lock held, but the panic
> > > call path is not protected by the lock.  When both run concurrently,
> > > the kcs->write_count == 1 check can be missed, so write_next_byte()
> > > is called repeatedly and the write_data array is accessed out of
> > > bounds.
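> > > 
> > > For reference, write_next_byte() in ipmi_kcs_sm.c is roughly:
> > > 
> > > static inline void write_next_byte(struct si_sm_data *kcs)
> > > {
> > > 	write_data(kcs, kcs->write_data[kcs->write_pos]);
> > > 	kcs->write_pos++;
> > > 	kcs->write_count--;
> > > }
> > > 
> > > With two CPUs in kcs_event() at the same time, write_pos and
> > > write_count are updated without any synchronization, so write_pos
> > > can run past the end of write_data.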
> > > 
> > > crash> bt -a
> > > PID: 0      TASK: ffffffff96812780  CPU: 0   COMMAND: "swapper/0"
> > >   #0 [fffffe0000007bc0] panic at ffffffff956b2d3e
> > >   #1 [fffffe0000007c48] wait_for_panic at ffffffff95637ca2
> > >   #2 [fffffe0000007c58] mce_timed_out at ffffffff95637f5d
> > >   #3 [fffffe0000007c70] do_machine_check at ffffffff95638db4
> > >   #4 [fffffe0000007d80] raise_exception at ffffffffc05b6117 [mce_inject]
> > >   #5 [fffffe0000007e48] mce_raise_notify at ffffffffc05b6a92 [mce_inject]
> > >   #6 [fffffe0000007e58] nmi_handle at ffffffff95621c73
> > >   #7 [fffffe0000007eb0] default_do_nmi at ffffffff9562213e
> > >   #8 [fffffe0000007ed0] do_nmi at ffffffff9562231c
> > >   #9 [fffffe0000007ef0] end_repeat_nmi at ffffffff960016b4
> > >      [exception RIP: native_safe_halt+14]
> > >      RIP: ffffffff95e7223e  RSP: ffffffff96803e90  RFLAGS: 00000246
> > >      RAX: ffffffff95e71f30  RBX: 0000000000000000  RCX: 0000000000000000
> > >      RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000000
> > >      RBP: 0000000000000000   R8: 00000018cf7c068a   R9: 0000000000000001
> > >      R10: ffffa222c0b17b88  R11: 0000000001e2f8fb  R12: 0000000000000000
> > >      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> > >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > --- <NMI exception stack> ---
> > > #10 [ffffffff96803e90] native_safe_halt at ffffffff95e7223e
> > > #11 [ffffffff96803e90] default_idle at ffffffff95e71f4a
> > > #12 [ffffffff96803eb0] do_idle at ffffffff956e959a
> > > #13 [ffffffff96803ef0] cpu_startup_entry at ffffffff956e981f
> > > #14 [ffffffff96803f10] start_kernel at ffffffff96d9b206
> > > #15 [ffffffff96803f50] secondary_startup_64 at ffffffff956000e7
> > > 
> > > 
> > > PID: 0      TASK: ffff8b06c77dc740  CPU: 3   COMMAND: "swapper/3"
> > >      [exception RIP: port_outb+17]
> > >      RIP: ffffffffc035f1a1  RSP: ffff8b06fad83e90  RFLAGS: 00000002
> > >      RAX: 0000000000000000  RBX: ffff8b06f08bec00  RCX: 0000000000000010
> > >      RDX: 0000000000000ca2  RSI: 0000000000000000  RDI: ffff8b06f0bd5e40
> > >      RBP: 0000000000000001   R8: ffff8b06fad80080   R9: ffff8b06fad84000
> > >      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
> > >      R13: ffff8b06fad83f54  R14: 0000000000000000  R15: 0000000000000000
> > >      CS: 0010  SS: 0018
> > >   #0 [ffff8b06fad83e90] kcs_event at ffffffffc035c2c7 [ipmi_si]
> > >   #1 [ffff8b06fad83eb0] smi_event_handler at ffffffffc035aa3f [ipmi_si]
> > >   #2 [ffff8b06fad83ee8] ipmi_si_irq_handler at ffffffffc035b0cc [ipmi_si]
> > >   #3 [ffff8b06fad83f08] __handle_irq_event_percpu at ffffffff9571dfc0
> > >   #4 [ffff8b06fad83f48] handle_irq_event_percpu at ffffffff9571e140
> > >   #5 [ffff8b06fad83f70] handle_irq_event at ffffffff9571e1b6
> > >   #6 [ffff8b06fad83f90] handle_fasteoi_irq at ffffffff95721b42
> > >   #7 [ffff8b06fad83fb0] handle_irq at ffffffff956209e8
> > >   #8 [ffff8b06fad83fc0] do_IRQ at ffffffff96001ee9
> > > --- <IRQ stack> ---
> > >   #9 [fffffe0000088b98] ret_from_intr at ffffffff96000a8f
> > >      [exception RIP: delay_tsc+52]
> > >      RIP: ffffffff95e5fb64  RSP: fffffe0000088c48  RFLAGS: 00000287
> > >      RAX: 000023fb5edf4b14  RBX: 00000000003e0451  RCX: 000023fb5edf4798
> > >      RDX: 000000000000037c  RSI: 0000000000000003  RDI: 000000000000095b
> > >      RBP: fffffe0000088cc0   R8: 0000000000000004   R9: fffffe0000088c5c
> > >      R10: ffffffff96a05ae0  R11: 0000000000000000  R12: fffffe0000088cb0
> > >      R13: 0000000000000001  R14: fffffe0000088ef8  R15: ffffffff9666a2f0
> > >      ORIG_RAX: ffffffffffffffd9  CS: 0010  SS: 0018
> > > #10 [fffffe0000088c48] wait_for_panic at ffffffff95637c6c
> > > #11 [fffffe0000088c58] mce_timed_out at ffffffff95637f5d
> > > #12 [fffffe0000088c70] do_machine_check at ffffffff95638db4
> > > #13 [fffffe0000088d80] raise_exception at ffffffffc05b6117 [mce_inject]
> > > #14 [fffffe0000088e48] mce_raise_notify at ffffffffc05b6a92 [mce_inject]
> > > #15 [fffffe0000088e58] nmi_handle at ffffffff95621c73
> > > #16 [fffffe0000088eb0] default_do_nmi at ffffffff9562213e
> > > #17 [fffffe0000088ed0] do_nmi at ffffffff9562231c
> > > #18 [fffffe0000088ef0] end_repeat_nmi at ffffffff960016b4
> > >      [exception RIP: native_safe_halt+14]
> > >      RIP: ffffffff95e7223e  RSP: ffffa222c06a3eb0  RFLAGS: 00000246
> > >      RAX: ffffffff95e71f30  RBX: 0000000000000003  RCX: 0000000000000001
> > >      RDX: 0000000000000001  RSI: 0000000000000083  RDI: 0000000000000000
> > >      RBP: 0000000000000003   R8: 00000018cf7cd9a0   R9: 0000000000000001
> > >      R10: 0000000000000400  R11: 00000000000003fb  R12: 0000000000000000
> > >      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> > >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > --- <NMI exception stack> ---
> > > #19 [ffffa222c06a3eb0] native_safe_halt at ffffffff95e7223e
> > > #20 [ffffa222c06a3eb0] default_idle at ffffffff95e71f4a
> > > #21 [ffffa222c06a3ed0] do_idle at ffffffff956e959a
> > > #22 [ffffa222c06a3f10] cpu_startup_entry at ffffffff956e981f
> > > #23 [ffffa222c06a3f30] start_secondary at ffffffff9564e697
> > > #24 [ffffa222c06a3f50] secondary_startup_64 at ffffffff956000e7