Re: panics in network stack in 12-current

2017-04-29 Thread Tom Uffner

Hamza Sheikh wrote:
> I may have encountered something similar on an EdgeRouter Lite running
> r317256. It's serving as network gateway at home. After some time the
> WAN connection goes dead. It starts working with either (a)
> reconnecting the network cable or (b) pinging any IP on the internet
> from that box. On rare occasions I had to reboot to get it to work.

it doesn't sound much like my problem. i had no network issues until
the system would suddenly panic and reboot. removing FLOWTABLE from
my kernel might have fixed it, but it is too early to tell as I have
yet to discover a reproducible way to trigger the bug.

> I'm still new to FreeBSD and don't know how to collect relevant
> information or whether to even determine if my issue is related to
> Andrey's. Any help is really appreciated. My setup is documented in
> detail in a blog post[0] if it helps.


You probably don't want to hear this, but if you are new to FreeBSD,
maybe you shouldn't be running current. I probably shouldn't be running
current and I have 35 years of BSD experience. I do it as a way of
contributing to the project by alpha-testing new code when I have time.

Brendan Gregg has some very good material on his site that might help you
learn to collect useful info about what is going on inside your systems.

http://www.brendangregg.com/Perf/freebsd_observability_tools.png
http://www.brendangregg.com/
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panics in network stack in 12-current

2017-04-27 Thread Hamza Sheikh
On Thu, Apr 27, 2017 at 10:02 AM, Tom Uffner  wrote:
> Andrey V. Elsukov wrote:
>>
>> On 27.04.2017 08:42, Tom Uffner wrote:
>>>
>>> r315956 panicked about 22 min after boot. failed to dump a core.
>>
>>
>> Why not update to the latest revision?
>
>
> I did several times a while ago, but didn't get a panic free system.

I may have encountered something similar on an EdgeRouter Lite running
r317256. It's serving as network gateway at home. After some time the
WAN connection goes dead. It starts working with either (a)
reconnecting the network cable or (b) pinging any IP on the internet
from that box. On rare occasions I had to reboot to get it to work.

I'm still new to FreeBSD and don't know how to collect relevant
information or whether to even determine if my issue is related to
Andrey's. Any help is really appreciated. My setup is documented in
detail in a blog post[0] if it helps.


[0] http://www.codeghar.com/blog/freebsd-network-gateway-on-edgerouter-lite.html

---
Hamza Sheikh
Twitter: @aikchar


Re: panics in network stack in 12-current

2017-04-27 Thread Tom Uffner

Andrey V. Elsukov wrote:
> On 27.04.2017 08:42, Tom Uffner wrote:
>> Tom Uffner wrote:
>>> Andrey V. Elsukov wrote:
>>>> I think most of these panics should be fixed in r315956.
>>>
>>> thanks. I'll give it a try and report back as soon as I have a result.
>>
>> r315956 panicked about 22 min after boot. failed to dump a core.
>
> Why not update to the latest revision?
>
> Probably this is flowtable related, don't think it is usable.
> Anyway we need the trace to determine the cause. Also it seems you have
> VIMAGE enabled. This also has some known panics.

attached is a text dump from this version


core.txt.6.bz2
Description: Binary data

Re: panics in network stack in 12-current

2017-04-27 Thread Tom Uffner

Andrey V. Elsukov wrote:
> On 27.04.2017 08:42, Tom Uffner wrote:
>> r315956 panicked about 22 min after boot. failed to dump a core.
>
> Why not update to the latest revision?


I did several times a while ago, but didn't get a panic free system. I was
hoping to bisect to the point where the problem actually occurred
and maybe send a patch instead of just begging for help. trouble was, I got
down to a small number of revisions and none of them had any changes that
looked even remotely related to my problem. I'll give today's HEAD a try.
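
The bisection described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used: `bisect_revisions` and `is_bad` are hypothetical names, and in practice `is_bad` would mean building and booting that revision and waiting long enough to decide whether it panics, which is the hard part without a reproducible trigger. The endpoints are the revisions named elsewhere in this thread (r306792 possibly stable, r306807 definitely bad); the simulated first bad revision is invented for the demo.

```python
def bisect_revisions(good, bad, is_bad):
    """Return (first_bad, tested) for the revision range (good, bad].

    Assumes `good` is known good, `bad` is known bad, and there is a
    single good-to-bad transition somewhere in between.
    """
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid    # the regression is at or before mid
        else:
            good = mid   # the regression is after mid
    return bad, tested


if __name__ == "__main__":
    # Hypothetical stand-in for "build, boot, and watch for a panic".
    # 306800 is a made-up revision used only to drive the demo.
    simulated_first_bad = 306800
    first_bad, tested = bisect_revisions(
        306792, 306807, lambda rev: rev >= simulated_first_bad
    )
    print("first bad revision:", first_bad)
    print("revisions tested:", tested)
```

With 15 candidate revisions between those endpoints, this needs at most four test boots. The result is only as trustworthy as each "no panic yet" verdict, so a revision that merely takes longer to panic will mislead the search.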


> Probably this is flowtable related, don't think it is usable.
> Anyway we need the trace to determine the cause. Also it seems you have
> VIMAGE enabled. This also has some known panics.


OK, I will also try disabling flowtable. Not sure about VIMAGE. I don't
have it specifically enabled, but I don't have it specifically disabled
either if it defaults to on. I don't know much about it.

I have also tried using the GENERIC kernel instead of my custom one, but
it was even less stable on my hardware and bricked the system instead of
panicking and producing a core dump.


Re: panics in network stack in 12-current

2017-04-27 Thread Andrey V. Elsukov
On 27.04.2017 08:42, Tom Uffner wrote:
> Tom Uffner wrote:
>> Andrey V. Elsukov wrote:
>>> I think most of these panics should be fixed in r315956.
>>
>> thanks. I'll give it a try and report back as soon as I have a result.
> 
> r315956 panicked about 22 min after boot. failed to dump a core.

Why not update to the latest revision?

Probably this is flowtable related, don't think it is usable.
Anyway we need the trace to determine the cause. Also it seems you have
VIMAGE enabled. This also has some known panics.

-- 
WBR, Andrey V. Elsukov





Re: panics in network stack in 12-current

2017-04-26 Thread Tom Uffner

Tom Uffner wrote:
> Andrey V. Elsukov wrote:
>> I think most of these panics should be fixed in r315956.
>
> thanks. I'll give it a try and report back as soon as I have a result.


r315956 panicked about 22 min after boot. failed to dump a core.


Re: panics in network stack in 12-current

2017-04-25 Thread Tom Uffner

Andrey V. Elsukov wrote:
> On 26.04.2017 04:03, Tom Uffner wrote:
> I think most of these panics should be fixed in r315956.


thanks. I'll give it a try and report back as soon as I have a result.



Re: panics in network stack in 12-current

2017-04-25 Thread Andrey V. Elsukov
On 26.04.2017 04:03, Tom Uffner wrote:
> Since updating my -current box to 12 several months ago, I have been
> trying to pin down several elusive and probably related panics.
> 
> they always manifest as a trap out of rw_wlock_hard()
> 
> i am fairly certain that r302409 was stable, revs up through r306792 may be
> stable, or perhaps I just didn't wait long enough for my system to panic. I
> don't know of anything that I can reproducibly poke at to trigger this.
> r306807 is definitely bad, as is everything up through r309124. I
> haven't seen anything on the mailing lists or in the SVN logs that looks
> like it is related to my problem.
> 
> my hardware is an Asus M4A77TD MB, AMD Phenom 2 X6 1100T CPU (for some of
> this time I had an Athlon 2 X2, but upgraded recently), and RealTek 8168
> PCIe Gigabit NIC.
> 
> FreeBSD discordia.uffner.com 12.0-CURRENT FreeBSD 12.0-CURRENT #33
> r306807M: Tue Apr 18 17:09:55 EDT 2017
> t...@discordia.uffner.com:/usr/obj/usr/src/sys/DISCORDIA  amd64
> 
> in revs between 306807-307125, the panics have been in flowcleaner, in
> more recent ones, they happen in arbitrary userspace processes that make
> heavy use of the network.
> 
> I know I should try the latest rev to see if it went away. aside from
> that, any thoughts on how I should proceed?

I think most of these panics should be fixed in r315956.

-- 
WBR, Andrey V. Elsukov





panics in network stack in 12-current

2017-04-25 Thread Tom Uffner
Since updating my -current box to 12 several months ago, I have been trying to 
pin down several elusive and probably related panics.


they always manifest as a trap out of rw_wlock_hard()

i am fairly certain that r302409 was stable, revs up through r306792 may be
stable, or perhaps I just didn't wait long enough for my system to panic. I
don't know of anything that I can reproducibly poke at to trigger this.
r306807 is definitely bad, as is everything up through r309124. I haven't seen 
anything on the mailing lists or in the SVN logs that looks like it is related 
to my problem.


my hardware is an Asus M4A77TD MB, AMD Phenom 2 X6 1100T CPU (for some of
this time I had an Athlon 2 X2, but upgraded recently), and RealTek 8168
PCIe Gigabit NIC.

FreeBSD discordia.uffner.com 12.0-CURRENT FreeBSD 12.0-CURRENT #33 r306807M: 
Tue Apr 18 17:09:55 EDT 2017 
t...@discordia.uffner.com:/usr/obj/usr/src/sys/DISCORDIA  amd64


in revs between 306807-307125, the panics have been in flowcleaner, in more
recent ones, they happen in arbitrary userspace processes that make heavy use
of the network.

I know I should try the latest rev to see if it went away. aside from that, 
any thoughts on how I should proceed?


Mon Apr 17 02:52:10 EDT 2017

FreeBSD discordia.uffner.com 12.0-CURRENT FreeBSD 12.0-CURRENT #32 r306821M: 
Fri Apr  7 02:11:44 EDT 2017 
t...@discordia.uffner.com:/usr/obj/usr/src/sys/DISCORDIA  amd64


panic: page fault

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3b8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x8057820d
stack pointer           = 0x28:0xfe046a422650
frame pointer           = 0x28:0xfe046a422690
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 697 (ntpd)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe046a4222b0
vpanic() at vpanic+0x186/frame 0xfe046a422330
panic() at panic+0x43/frame 0xfe046a422390
trap_fatal() at trap_fatal+0x331/frame 0xfe046a4223f0
trap_pfault() at trap_pfault+0x14f/frame 0xfe046a422430
trap() at trap+0x21e/frame 0xfe046a422580
calltrap() at calltrap+0x8/frame 0xfe046a422580
--- trap 0xc, rip = 0x8057820d, rsp = 0xfe046a422650, rbp = 0xfe046a422690 ---

__rw_wlock_hard() at __rw_wlock_hard+0xad/frame 0xfe046a422690
ip_output() at ip_output+0x483/frame 0xfe046a4227c0
udp_send() at udp_send+0xb8f/frame 0xfe046a422890
sosend_dgram() at sosend_dgram+0x431/frame 0xfe046a422910
kern_sendit() at kern_sendit+0x178/frame 0xfe046a4229c0
sendit() at sendit+0x179/frame 0xfe046a422a10
sys_sendto() at sys_sendto+0x4d/frame 0xfe046a422a60
amd64_syscall() at amd64_syscall+0x391/frame 0xfe046a422bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe046a422bf0
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x8013c9cba, rsp = 0x7fffdfffc7e8, rbp = 0x7fffdfffc830 ---



Mon Apr 17 03:19:00 EDT 2017

FreeBSD discordia.uffner.com 12.0-CURRENT FreeBSD 12.0-CURRENT #32 r306821M: 
Fri Apr  7 02:11:44 EDT 2017 
t...@discordia.uffner.com:/usr/obj/usr/src/sys/DISCORDIA  amd64


panic: page fault

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3b8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x8057820d
stack pointer           = 0x28:0xfe0469a0eab0
frame pointer           = 0x28:0xfe0469a0eaf0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 21 (flowcleaner)
trap number             = 12
Timeout initializing vt_vga
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0469a0e710
vpanic() at vpanic+0x186/frame 0xfe0469a0e790
panic() at panic+0x43/frame 0xfe0469a0e7f0
trap_fatal() at trap_fatal+0x331/frame 0xfe0469a0e850
trap_pfault() at trap_pfault+0x14f/frame 0xfe0469a0e890
trap() at trap+0x21e/frame 0xfe0469a0e9e0
calltrap() at calltrap+0x8/frame 0xfe0469a0e9e0
--- trap 0xc, rip = 0x8057820d, rsp = 0xfe0469a0eab0, rbp = 0xfe0469a0eaf0 ---

__rw_wlock_hard() at __rw_wlock_hard+0xad/frame 0xfe0469a0eaf0
flowtable_clean_vnet() at flowtable_clean_vnet+0x496/frame 0xfe0469a0eb80
flowtable_cleaner() at flowtable_cleaner+0x90/frame 0xfe0469a0ebb0
fork_exit() at fork_exit+0x75/frame 0xfe0469a0ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0469a0ebf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
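
When comparing several panics like the two above, it can help to pull out only the frames that were executing when the fault hit, i.e. the ones listed after the `--- trap 0xc` marker. The following is a rough sketch that assumes the line format shown in these reports; `frames_after_trap` is a hypothetical helper written for this illustration, not a FreeBSD tool.

```python
import re

# Matches lines such as:
#   ip_output() at ip_output+0x483/frame 0xfe046a4227c0
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")


def frames_after_trap(trace):
    """Return [(function, offset), ...] for frames below the page-fault marker.

    Collection starts at the '--- trap 0xc' line (trap 12, page fault, in
    these reports) and stops at the next '---' marker line.
    """
    frames = []
    in_fault = False
    for raw in trace.splitlines():
        line = raw.strip()
        if line.startswith("--- trap 0xc"):
            in_fault = True
            continue
        if in_fault and line.startswith("---"):
            break
        match = FRAME_RE.match(line)
        if in_fault and match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


if __name__ == "__main__":
    # Excerpt copied from the first report in this message.
    sample = """\
calltrap() at calltrap+0x8/frame 0xfe046a422580
--- trap 0xc, rip = 0x8057820d, rsp = 0xfe046a422650, rbp =
0xfe046a422690 ---
__rw_wlock_hard() at __rw_wlock_hard+0xad/frame 0xfe046a422690
ip_output() at ip_output+0x483/frame 0xfe046a4227c0
udp_send() at udp_send+0xb8f/frame 0xfe046a422890
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x8013c9cba, rsp =
"""
    for name, offset in frames_after_trap(sample):
        print(f"{name}+{offset:#x}")
```

Run over the two reports above, the first frame is `__rw_wlock_hard` both times, called from `ip_output` in the ntpd panic and from `flowtable_clean_vnet` in the flowcleaner panic. That is consistent with the observation that the panics always manifest as a trap out of rw_wlock_hard().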



Mon Apr 17 02:25:20 EDT 2017

FreeBSD