Re: Panic on kldload/kldunload in/near callout

2015-09-13 Thread hiren panchasara
On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
> 
> 
> 12.09.2015, 20:30, "hiren panchasara" :
> > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> >>  12.09.2015, 02:22, "hiren panchasara" :
> >>  > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> >
> > [skip]
> >>  > I'll try to get it. Meanwhile I am getting another panic on idle box:
> >>  > http://pastebin.com/9qJTFMik
> >>  The easiest explanation could be lack of lla_create() result check, fixed
> >> in r286945.
> >>  This panic is triggered by fast interface down-up (or just up), when ARP
> >> packet is received but there is no (matching) IPv4 prefix on the
> >> interface.
> >>  If this is not the case (e.g. it panicked w/o any interface changes and
> >> there were no other subnets in given L2 segment) I'd be happy to debug
> >> this further.
> >
> > Just hit another last night. (Box goes to db> ; let me know if you want
> > to debug anything when that happens.)
> Would you mind showing full backtrace for that core? (e.g. situation has to 
> be different for newer -current).

Sure:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x20
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0x80f214d6
stack pointer           = 0x28:0xfe3d0620
frame pointer           = 0x28:0xfe3d0630
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq264: igb0:que 0)
[ thread pid 12 tid 100035 ]
Stopped at      memcpy+0x16:    repe movsb      (%rsi),%es:(%rdi)
db> bt full
Symbol not found
KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+43/frame 0xfe3cff10
kdb_reenter() at kdb_reenter+51/frame 0xfe3cff20
db_term() at db_term+136/frame 0xfe3cff40
db_unary() at db_unary+116/frame 0xfe3cff60
db_mult_expr() at db_mult_expr+27/frame 0xfe3cffa0
db_add_expr() at db_add_expr+27/frame 0xfe3cffe0
db_expression() at db_expression+29/frame 0xfe3d0030
db_stack_trace() at db_stack_trace+48/frame 0xfe3d0060
db_command() at db_command+865/frame 0xfe3d0120
db_command_loop() at db_command_loop+100/frame 0xfe3d0130
db_trap() at db_trap+219/frame 0xfe3d01c0
kdb_trap() at kdb_trap+404/frame 0xfe3d0250
trap_fatal() at trap_fatal+789/frame 0xfe3d02b0
trap_pfault() at trap_pfault+806/frame 0xfe3d0350
trap() at trap+1124/frame 0xfe3d0560
calltrap() at calltrap+8/frame 0xfe3d0560
--- trap 12, rip = 18446744071577933014, rsp = 18446741874690295344, rbp = 18446741874690295344 ---
memcpy() at memcpy+22/frame 0xfe3d0630
arpintr() at arpintr+2951/frame 0xfe3d0750
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d07c0
ether_demux() at ether_demux+345/frame 0xfe3d07f0
ether_nh_input() at ether_nh_input+888/frame 0xfe3d0850
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d08c0
ether_input() at ether_input+38/frame 0xfe3d08e0
igb_rxeof() at igb_rxeof+1764/frame 0xfe3d0990
igb_msix_que() at igb_msix_que+352/frame 0xfe3d09e0
intr_event_execute_handlers() at intr_event_execute_handlers+474/frame 0xfe3d0a20
ithread_loop() at ithread_loop+166/frame 0xfe3d0a70
fork_exit() at fork_exit+156/frame 0xfe3d0ab0
fork_trampoline() at fork_trampoline+14/frame 0xfe3d0ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db> 
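
(Note on the trace above: ddb printed the trap-frame registers in decimal, which makes them hard to compare with the hex "instruction pointer" line. A minimal Python sketch for converting them follows; the helper name to_hex64 is made up for illustration, and the values are copied from the "--- trap 12 ..." line.)

```python
# Register values copied from the "--- trap 12 ..." line of the backtrace.
TRAP_REGISTERS = {
    "rip": 18446744071577933014,
    "rsp": 18446741874690295344,
}


def to_hex64(value: int) -> str:
    """Format an unsigned 64-bit register value as zero-padded hex."""
    return f"0x{value & 0xFFFFFFFFFFFFFFFF:016x}"


for name, value in TRAP_REGISTERS.items():
    print(f"{name} = {to_hex64(value)}")
```

The low 32 bits of rip come out as 0x80f214d6, the same value shown on the "instruction pointer" line, so both lines refer to the same faulting instruction in memcpy().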

Cheers,
Hiren




Re: Panic on kldload/kldunload in/near callout

2015-09-13 Thread hiren panchasara
On 09/13/15 at 01:56P, hiren panchasara wrote:
> On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
> > 
> > 
> > 12.09.2015, 20:30, "hiren panchasara" :
> > > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> > >>  12.09.2015, 02:22, "hiren panchasara" :
> > >>  > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> > >
> > > [skip]
> > >>  > I'll try to get it. Meanwhile I am getting another panic on idle box:
> > >>  > http://pastebin.com/9qJTFMik
> > >>  The easiest explanation could be lack of lla_create() result check,
> > >> fixed in r286945.
> > >>  This panic is triggered by fast interface down-up (or just up), when
> > >> ARP packet is received but there is no (matching) IPv4 prefix on the
> > >> interface.
> > >>  If this is not the case (e.g. it panicked w/o any interface changes and
> > >> there were no other subnets in given L2 segment) I'd be happy to debug
> > >> this further.
> > >
> > > Just hit another last night. (Box goes to db> ; let me know if you want
> > > to debug anything when that happens.)
> > Would you mind showing full backtrace for that core? (e.g. situation has to 
> > be different for newer -current).

Apparently I was using an older current than r286945. :-(
Apologies for the false alarm. I'll update again if I see any issues.

Hiren




Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread hiren panchasara
On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> 12.09.2015, 02:22, "hiren panchasara" :
> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
[skip]
> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> > http://pastebin.com/9qJTFMik
> The easiest explanation could be lack of lla_create() result check, fixed in 
> r286945.
> This panic is triggered by fast interface down-up (or just up), when ARP
> packet is received but there is no (matching) IPv4 prefix on the interface.
> If this is not the case (e.g. it panicked w/o any interface changes and there
> were no other subnets in given L2 segment) I'd be happy to debug this further.

Just hit another last night. (Box goes to db> ; let me know if you want
to debug anything when that happens.)
I am sure there were no interface changes on the box and it was sitting
idle. (Unsure of the other subnets part.) And I am on 3 days old -head
so I already have r286945. I disabled IPv6 on the box just to eliminate
that but panic still happens.

Cheers,
Hiren




Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Alexander V . Chernikov


12.09.2015, 20:30, "hiren panchasara" :
> On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
>>  12.09.2015, 02:22, "hiren panchasara" :
>>  > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>
> [skip]
>>  > I'll try to get it. Meanwhile I am getting another panic on idle box:
>>  > http://pastebin.com/9qJTFMik
>>  The easiest explanation could be lack of lla_create() result check, fixed 
>> in r286945.
>>  This panic is triggered by fast interface down-up (or just up), when ARP
>> packet is received but there is no (matching) IPv4 prefix on the interface.
>>  If this is not the case (e.g. it panicked w/o any interface changes and
>> there were no other subnets in given L2 segment) I'd be happy to debug this
>> further.
>
> Just hit another last night. (Box goes to db> ; let me know if you want
> to debug anything when that happens.)
Would you mind showing full backtrace for that core? (e.g. situation has to be 
different for newer -current).
> I am sure there were no interface changes on the box and it was sitting
> idle. (Unsure of the other subnets part.) And I am on 3 days old -head
> so I already have r286945. I disabled IPv6 on the box just to eliminate
> that but panic still happens.
>
> Cheers,
> Hiren
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Hans Petter Selasky

On 09/12/15 01:21, hiren panchasara wrote:

> On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> > On 09/10/15 21:23, hiren panchasara wrote:
> > > I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
> > > 08:15:43 MST 2015
> > >
> > > I get random (1 out of 10 tries) panics when I do:
> > > # kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet
> > >
> > > I used to get panics on a couple months old -head also.
> > >
> > > kernel trap 12 with interrupts disabled
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address   = 0x8225cf58
> > > fault code              = supervisor read data, page not present
> > > instruction pointer     = 0x20:0x80aad500
> > > stack pointer           = 0x28:0xfe1f9d588700
> > > frame pointer           = 0x28:0xfe1f9d588790
> > > code segment            = base 0x0, limit 0xf, type 0x1b
> > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > >
> > > Following https://www.freebsd.org/doc/faq/advanced.html, I did:
> > > # nm -n /boot/kernel/kernel | grep 80aad500
> > > # nm -n /boot/kernel/kernel | grep 80aad50
> > > # nm -n /boot/kernel/kernel | grep 80aad5
> > > # nm -n /boot/kernel/kernel | grep 80aad
> > > 80aad030 t itimers_event_hook_exec
> > > 80aad040 t realtimer_expire
> > > 80aad360 T callout_process
> > > 80aad6b0 t softclock_call_cc
> > > 80aadc10 T softclock
> > > 80aadd20 T timeout
> > > 80aade90 T callout_reset_sbt_on
> > >
> > > So I guess " 80aad360 T callout_process" is the closest match?
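
(The repeated grep with a shrinking prefix can be replaced by a search for the last symbol that starts at or below the faulting address. A minimal Python sketch follows, assuming the sorted `nm -n` lines have been pasted in or read from a file; the helper names parse_nm and resolve are made up for illustration.)

```python
import bisect

# Sorted `nm -n /boot/kernel/kernel` lines as quoted in this thread.
NM_OUTPUT = """\
80aad030 t itimers_event_hook_exec
80aad040 t realtimer_expire
80aad360 T callout_process
80aad6b0 t softclock_call_cc
80aadc10 T softclock
80aadd20 T timeout
80aade90 T callout_reset_sbt_on
"""


def parse_nm(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _kind, name = fields
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Return (name, offset) for the last symbol starting at or below pc."""
    addrs = [addr for addr, _ in symbols]
    idx = bisect.bisect_right(addrs, pc) - 1
    if idx < 0:
        return None  # pc lies below every known symbol
    addr, name = symbols[idx]
    return name, pc - addr


name, offset = resolve(parse_nm(NM_OUTPUT), 0x80AAD500)
print(f"{name}+{offset:#x}")  # prints "callout_process+0x1a0"
```

With the addresses quoted here, the instruction pointer falls between callout_process and softclock_call_cc, so callout_process is indeed the enclosing symbol. This only identifies the function that faulted; it says nothing about who owned the memory at the fault virtual address.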

> > > I'll try to get real dump to get more information but that may take a
> > > while.
> > >
> > > ccing jch and hans who've been playing in this area.
> >
> > Hi,
> >
> > Possibly it means some timer was not drained before the module was
> > unloaded. It is not enough to only stop timers before freeing its
> > memory. Or maybe a timer was restarted after drain.
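
(The difference between stopping and draining a timer can be shown with a userland analogy. The sketch below is Python, not kernel code: threading.Timer.cancel() stands in for a non-blocking stop in the style of callout_stop(9), and join() stands in for a drain in the style of callout_drain(9). The state dictionary and handler are invented for illustration.)

```python
import threading

# "loaded" models the module's memory still being valid.
state = {"loaded": True, "touched_after_unload": False}
started = threading.Event()
release = threading.Event()


def handler():
    started.set()    # the handler is now running
    release.wait()   # simulate a handler that is still mid-flight
    if not state["loaded"]:
        # This is the use-after-free analogue: running after teardown.
        state["touched_after_unload"] = True


timer = threading.Timer(0.0, handler)
timer.start()
started.wait()       # make sure the handler has begun executing

timer.cancel()       # "stop": only prevents a firing that has not begun
still_running = timer.is_alive()  # True: the handler is still in flight

# Tearing down here would race with the handler. Drain first instead.
release.set()
timer.join()         # "drain": block until the handler has returned
state["loaded"] = False  # only now is teardown safe

print(still_running, state["touched_after_unload"])
```

The point of the sketch is the ordering: cancel() returns while the handler is still executing, so freeing state right after it would be unsafe, whereas waiting for the handler to finish before teardown avoids that window. A handler that re-arms itself after the drain would reopen the same window.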

> > Can you get the full backtrace and put debugging symbols into the kernel?
>
> I'll try to get it. Meanwhile I am getting another panic on idle box:
> http://pastebin.com/9qJTFMik


That looks like a bug in the igb driver which is passing a NULL mbuf up!


#16 0x80b88156 in ether_input (ifp=, m=0x0) at /root/head/sys/net/if_ethersubr.c:676
#17 0x8053f004 in igb_rxeof (count=337545368) at /root/head/sys/dev/e1000/if_igb.c:4979


--HPS



Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Alexander V . Chernikov
12.09.2015, 02:22, "hiren panchasara" :
> On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>>  On 09/10/15 21:23, hiren panchasara wrote:
>>  > I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
>>  > 08:15:43 MST 2015
>>  >
>>  > I get random (1 out of 10 tries) panics when I do:
>>  > # kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet
>>  >
>>  > I used to get panics on a couple months old -head also.
>>  >
>>  > kernel trap 12 with interrupts disabled
>>  >
>>  > Fatal trap 12: page fault while in kernel mode
>>  > cpuid = 0; apic id = 00
>>  > fault virtual address = 0x8225cf58
>>  > fault code = supervisor read data, page not present
>>  > instruction pointer = 0x20:0x80aad500
>>  > stack pointer = 0x28:0xfe1f9d588700
>>  > frame pointer = 0x28:0xfe1f9d588790
>>  > code segment = base 0x0, limit 0xf, type 0x1b
>>  > = DPL 0, pres 1, long 1, def32 0, gran 1
>>  >
>>  > Following https://www.freebsd.org/doc/faq/advanced.html, I did:
>>  > # nm -n /boot/kernel/kernel | grep 80aad500
>>  > # nm -n /boot/kernel/kernel | grep 80aad50
>>  > # nm -n /boot/kernel/kernel | grep 80aad5
>>  > # nm -n /boot/kernel/kernel | grep 80aad
>>  > 80aad030 t itimers_event_hook_exec
>>  > 80aad040 t realtimer_expire
>>  > 80aad360 T callout_process
>>  > 80aad6b0 t softclock_call_cc
>>  > 80aadc10 T softclock
>>  > 80aadd20 T timeout
>>  > 80aade90 T callout_reset_sbt_on
>>  >
>>  > So I guess " 80aad360 T callout_process" is the closest match?
>>  >
>>  > I'll try to get real dump to get more information but that may take a
>>  > while.
>>  >
>>  > ccing jch and hans who've been playing in this area.
>>
>>  Hi,
>>
>>  Possibly it means some timer was not drained before the module was
>>  unloaded. It is not enough to only stop timers before freeing its
>>  memory. Or maybe a timer was restarted after drain.
>>
>>  Can you get the full backtrace and put debugging symbols into the kernel?
>
> I'll try to get it. Meanwhile I am getting another panic on idle box:
> http://pastebin.com/9qJTFMik
The easiest explanation could be lack of lla_create() result check, fixed in 
r286945.
This panic is triggered by fast interface down-up (or just up), when ARP packet
is received but there is no (matching) IPv4 prefix on the interface.
If this is not the case (e.g. it panicked w/o any interface changes and there
were no other subnets in given L2 segment) I'd be happy to debug this further.
>
> This "looks" similar to
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=156026 which got fixed
> via https://svnweb.freebsd.org/base?view=revision=r214675
> "Don't leak the LLE lock if the arptimer callout is pending or
> inactive."
>
> Is what I am seeing similar to this?
>
> I'll try and get more info.
>
> Cheers,
> Hiren

Re: Panic on kldload/kldunload in/near callout

2015-09-11 Thread Hans Petter Selasky

On 09/10/15 21:23, hiren panchasara wrote:

> I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
> 08:15:43 MST 2015
>
> I get random (1 out of 10 tries) panics when I do:
> # kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet
>
> I used to get panics on a couple months old -head also.
>
> kernel trap 12 with interrupts disabled
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x8225cf58
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80aad500
> stack pointer           = 0x28:0xfe1f9d588700
> frame pointer           = 0x28:0xfe1f9d588790
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>
> Following https://www.freebsd.org/doc/faq/advanced.html, I did:
> # nm -n /boot/kernel/kernel | grep 80aad500
> # nm -n /boot/kernel/kernel | grep 80aad50
> # nm -n /boot/kernel/kernel | grep 80aad5
> # nm -n /boot/kernel/kernel | grep 80aad
> 80aad030 t itimers_event_hook_exec
> 80aad040 t realtimer_expire
> 80aad360 T callout_process
> 80aad6b0 t softclock_call_cc
> 80aadc10 T softclock
> 80aadd20 T timeout
> 80aade90 T callout_reset_sbt_on
>
> So I guess " 80aad360 T callout_process" is the closest match?
>
> I'll try to get real dump to get more information but that may take a
> while.
>
> ccing jch and hans who've been playing in this area.


Hi,

Possibly it means some timer was not drained before the module was 
unloaded. It is not enough to only stop timers before freeing its 
memory. Or maybe a timer was restarted after drain.


Can you get the full backtrace and put debugging symbols into the kernel?

--HPS



> Cheers,
> Hiren





Re: Panic on kldload/kldunload in/near callout

2015-09-11 Thread hiren panchasara
On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> On 09/10/15 21:23, hiren panchasara wrote:
> > I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
> > 08:15:43 MST 2015
> >
> > I get random (1 out of 10 tries) panics when I do:
> > # kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet
> >
> > I used to get panics on a couple months old -head also.
> >
> > kernel trap 12 with interrupts disabled
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x8225cf58
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0x80aad500
> > stack pointer           = 0x28:0xfe1f9d588700
> > frame pointer           = 0x28:0xfe1f9d588790
> > code segment            = base 0x0, limit 0xf, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> >
> > Following https://www.freebsd.org/doc/faq/advanced.html, I did:
> > # nm -n /boot/kernel/kernel | grep 80aad500
> > # nm -n /boot/kernel/kernel | grep 80aad50
> > # nm -n /boot/kernel/kernel | grep 80aad5
> > # nm -n /boot/kernel/kernel | grep 80aad
> > 80aad030 t itimers_event_hook_exec
> > 80aad040 t realtimer_expire
> > 80aad360 T callout_process
> > 80aad6b0 t softclock_call_cc
> > 80aadc10 T softclock
> > 80aadd20 T timeout
> > 80aade90 T callout_reset_sbt_on
> >
> > So I guess " 80aad360 T callout_process" is the closest match?
> >
> > I'll try to get real dump to get more information but that may take a
> > while.
> >
> > ccing jch and hans who've been playing in this area.
> 
> Hi,
> 
> Possibly it means some timer was not drained before the module was 
> unloaded. It is not enough to only stop timers before freeing its 
> memory. Or maybe a timer was restarted after drain.
> 
> Can you get the full backtrace and put debugging symbols into the kernel?

I'll try to get it. Meanwhile I am getting another panic on idle box:
http://pastebin.com/9qJTFMik

This "looks" similar to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=156026 which got fixed
via https://svnweb.freebsd.org/base?view=revision=r214675
"Don't leak the LLE lock if the arptimer callout is pending or
inactive." 

Is what I am seeing similar to this?

I'll try and get more info.

Cheers,
Hiren

