Hi,

we are seeing kernel hangs (hard reset required) under high interrupt
load (triggered by ix(4) cards). I've traced this back to an overflow
of the kernel stack. This is how the stack looks at the time of the
overflow (manual backtrace, as ddb won't show the final part):

EBP             (%EBP)          4(%EBP)
0xe13b825c:     0xe13b827c      0xd02d8b1d      panic
0xe13b827c:     0xe13b82ac      0xd02d1a42      timeout_add
0xe13b82ac:     0xe13b82dc      0xd02bfd6e      hardclock
0xe13b82dc:     0xe13b82fc      0xd04dddca      lapic_clockintr
0xe13b82fc:     0xe13b8304      0xd0202475      Xintrltimer
/* OLD EBP at ebp+0x20 (9 words) OLD EIP at ebp+0x3c (15 words) */
                EBP=0xe13b8384  EIP=0xd02d7c31  pool_do_put
0xe13b8384:     0xe13b83a4      0xd02d7b6f      pool_put
0xe13b83a4:     0xe13b83d4      0xd02e7f26      m_free
0xe13b83d4:     0xe13b83f4      0xd02e7fa9      m_freem
0xe13b83f4:     0xe13b8434      0xd030a492      ether_input
0xe13b8434:     0xe13b8474      0xd043f2cf      ixgbe_rxeof
0xe13b8474:     0xe13b84b4      0xd043c7ed      ixgbe_legacy_irq
0xe13b84b4:     0xe13b84bc      0xd02027ee      Xintr_ioapic1
                EBP=0xe13b9e10  EIP=0xd0202128  Xdoreti (before iret)
0xe13b9e10:     0xe13b9e40      0xd028bd27      pf_pull_hdr
0xe13b9e40:     0xe13b9f40      0xd028d570      pf_test
0xe13b9f40:     0xe13b9f80      0xd03478b4      ipv4_input
0xe13b9f80:     0xe13b9fa0      0xd0347765      ipintr
0xe13b9fa0:     0xe13b9fa0      0xd0202182      Xsoftnet

Note that the gap on the stack between the Xintr_ioapic1 and pf_pull_hdr
frames is huge (~6k). This area is filled with a repeating 12-byte pattern
that looks like a code address, the kernel code segment selector and a
pushed eflags register. The code address points into the interrupt return
path in this piece of code from i386/locore.s:

#define INTRFASTEXIT \
        popl    %fs             ; \
        popl    %gs             ; \
        popl    %es             ; \
        popl    %ds             ; \
        popl    %edi            ; \
        popl    %esi            ; \
        popl    %ebp            ; \
        popl    %ebx            ; \
        popl    %edx            ; \
        popl    %ecx            ; \
        popl    %eax            ; \
        sti                     ; \    <===== (1)
        addl    $8,%esp         ; \
        iret                           <===== (2)

The return address points to the iret marked with (2). That is, we get hit
by another interrupt immediately before the iret, and this happens
repeatedly until the kernel stack overflows. This is only possible because
of the sti instruction at the point marked (1). As iret restores the pushed
eflags value (including the interrupt flag) anyway, it should be safe to
remove the sti altogether. Discussed with bluhm@ and hshoexer@. Patch follows:

Index: locore.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.130
diff -u -r1.130 locore.s
--- locore.s    3 Jul 2010 04:54:32 -0000       1.130
+++ locore.s    18 Apr 2011 13:52:16 -0000
@@ -128,7 +128,6 @@
        popl    %edx            ; \
        popl    %ecx            ; \
        popl    %eax            ; \
-       sti                     ; \
        addl    $8,%esp         ; \
        iret
 

The sti was introduced in revision 1.97 of locore.s in March 2006 by
mickey@. Commit message:

| prevent the faults on iret to run w/ disabled intrs and cause
| deadlocks; niklas toby tom ok

Maybe mickey or one of the people who gave oks back then wants to comment?

     regards     Christian
