Re: em broken on current amd64
> On 8 Sep 2015, at 19:02, Mark R V Murray wrote:
>
>> On 8 Sep 2015, at 17:22, Sean Bruno wrote:
>>
>>>>> I’m also seeing breakage with the em0 device; this isn’t a kernel
>>>>> hang, it is a failure to move data after about 10-15 minutes. The
>>>>> symptom is that my WAN ethernet no longer moves traffic, no pings,
>>>>> nothing. Booting looks normal:
>>>>>
>>>>> em0: port 0x30c0-0x30df mem 0x5030-0x5031,0x50324000-0x50324fff irq 20 at device 25.0 on pci0
>>>>> em0: Using an MSI interrupt
>>>>> em0: Ethernet address: 00:16:76:d3:e1:5b
>>>>> em0: netmap queues/slots: TX 1/1024, RX 1/1024
>>>>>
>>>>> Fixing it is as easy as …
>>>>>
>>>>> # ifconfig em0 down ; service ipfw restart ; ifconfig em0 up
>>>>>
>>>>> :-)
>>>>>
>>>>> I’m running CURRENT, r287538. This last worked for me a month or so
>>>>> ago at my previous build.
>>>>>
>>>>> M
>>>>
>>>> Just so I'm clear, the original problem reported was a failure to
>>>> attach (you were among several folks reporting breakage). Is that
>>>> fixed?
>>>
>>> I did not report the failure to attach, and I am not seeing it, as I don’t
>>> think I built a kernel that had that particular failure. I am having the
>>> “failure after 10-15 minutes” problem; this is on an em0 device.
>>>
>>> M
>>
>> Hrm, that's odd. That sounds like a hole where interrupts aren't being
>> reset for "reasons" that I cannot fathom.
>>
>> What hardware (pciconf -lv) does your system actually have? The em(4)
>> driver doesn't identify components, which is frustrating.
>
> pciconf -lv output below:
>
> hostb0@pci0:0:0:0: class=0x06 card=0x514d8086 chip=0x29a08086 rev=0x02 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82P965/G965 Memory Controller Hub'
>     class      = bridge
>     subclass   = HOST-PCI

I just caught this, on today’s build:

em0: Watchdog timeout Queue[0]-- resetting
Interface is RUNNING and ACTIVE
em0: TX Queue 0 --
em0: hw tdh = 127, hw tdt = 139
em0: Tx Queue Status = -2147483648
em0: TX descriptors avail = 1012
em0: Tx Descriptors avail failure = 0
em0: RX Queue 0 --
em0: hw rdh = 0, hw rdt = 1023
em0: RX discarded packets = 0
em0: RX Next to Check = 0
em0: RX Next to Refresh = 1023

[graveyard] /usr/ports 09:42 pm # uname -a
FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705: Sat Sep 12 15:07:54 BST 2015 r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE amd64

M
--
Mark R V Murray

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
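The TX counters in the watchdog dump can be cross-checked with simple ring arithmetic. This is a minimal Python sketch, not code from em(4), and it rests on assumptions: a 1024-slot TX ring (taken from the "netmap queues/slots: TX 1/1024" boot line), descriptors between the hardware head (tdh) and tail (tdt) still being owned by the NIC, and "Tx Queue Status" having been printed as a signed 32-bit integer. The helper names are hypothetical.

```python
# Minimal sketch, not em(4) code. Assumptions: a 1024-slot TX ring (from the
# "netmap queues/slots: TX 1/1024" boot line), descriptors in [tdh, tdt) are
# still owned by the NIC, and "Tx Queue Status" was printed as a signed
# 32-bit integer. Helper names are hypothetical.

TX_RING_SIZE = 1024  # assumed from the netmap boot message


def tx_in_flight(tdh: int, tdt: int, ring_size: int = TX_RING_SIZE) -> int:
    """Descriptors handed to the NIC but not yet processed (tail - head, wrapped)."""
    return (tdt - tdh) % ring_size


def tx_free(tdh: int, tdt: int, ring_size: int = TX_RING_SIZE) -> int:
    """Descriptors the driver could still fill."""
    return ring_size - tx_in_flight(tdh, tdt, ring_size)


def as_u32_hex(value: int) -> str:
    """Show the raw 32-bit pattern behind a value printed as a signed int."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    print(tx_in_flight(127, 139))   # 12
    print(tx_free(127, 139))        # 1012, same as "TX descriptors avail = 1012"
    print(as_u32_hex(-2147483648))  # 0x80000000: only bit 31 is set
```

If that reading is right, only 12 descriptors were outstanding when the watchdog fired, so the TX ring was nowhere near full, and the status word has only its top bit set. That is an inference from the printed numbers, not a diagnosis of the hang.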
Re: Panic on kldload/kldunload in/near callout
On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> 12.09.2015, 02:22, "hiren panchasara":
> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> [skip]
> > I'll try to get it. Meanwhile I am getting another panic on an idle box:
> > http://pastebin.com/9qJTFMik
> The easiest explanation could be the lack of an lla_create() result check,
> fixed in r286945.
> This panic is triggered by fast interface down-up (or just up), when an ARP
> packet is received but there is no (matching) IPv4 prefix on the interface.
> If this is not the case (e.g. it panicked w/o any interface changes and there
> were no other subnets in given L2 segment) I'd be happy to debug this further.

Just hit another last night. (The box drops to db>; let me know if you want
to debug anything when that happens.)

I am sure there were no interface changes on the box and it was sitting
idle. (Unsure about the other subnets part.) And I am on a 3-day-old -head,
so I already have r286945. I disabled IPv6 on the box just to eliminate
that, but the panic still happens.

Cheers,
Hiren
Re: Panic on kldload/kldunload in/near callout
12.09.2015, 20:30, "hiren panchasara":
> On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
>> 12.09.2015, 02:22, "hiren panchasara":
>> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>
> [skip]
>> > I'll try to get it. Meanwhile I am getting another panic on an idle box:
>> > http://pastebin.com/9qJTFMik
>> The easiest explanation could be the lack of an lla_create() result check,
>> fixed in r286945.
>> This panic is triggered by fast interface down-up (or just up), when an ARP
>> packet is received but there is no (matching) IPv4 prefix on the interface.
>> If this is not the case (e.g. it panicked w/o any interface changes and
>> there were no other subnets in given L2 segment) I'd be happy to debug this
>> further.
>
> Just hit another last night. (The box drops to db>; let me know if you want
> to debug anything when that happens.)

Would you mind showing the full backtrace for that core? (I.e. the situation
has to be different on newer -current.)

> I am sure there were no interface changes on the box and it was sitting
> idle. (Unsure about the other subnets part.) And I am on a 3-day-old -head,
> so I already have r286945. I disabled IPv6 on the box just to eliminate
> that, but the panic still happens.
>
> Cheers,
> Hiren
FreeBSD_HEAD_i386 - Build #1093 - Failure
FreeBSD_HEAD_i386 - Build #1093 - Failure:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/console

Change summaries:

287704 by trasz:
Point potential geom_fox(4) users to gmultipath(8).

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation

287703 by delphij:
MFV r287684: 6091 avl_add doesn't assert on non-debug builds

Use assfail() from libuutil instead of ASSERT() in userland AVL avl_add.

illumos/illumos-gate@faa2b6be2fc102adf9ed584fc1a667b4ddf50d78

Illumos issues:
6091 avl_add doesn't assert on non-debug builds
https://www.illumos.org/issues/6091

287702 by delphij:
MFV r287624: 5987 zfs prefetch code needs work

Rewrite the ZFS prefetch code to detect only forward, sequential streams.

The following kstats have been added:

kstat.zfs.misc.arcstats.sync_wait_for_async
How many sync reads have waited for an async read to complete.
(less is better)

kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
How many demand reads didn't have to wait for I/O because of predictive
prefetch. (more is better)

zfetch kstats have been simplified to hits, misses, and max_streams, with
max_streams representing times when we were not able to create a new stream
because we already have the maximum number of sequences for a file.

The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been
replaced by vfs.zfs.zfetch.max_distance, which controls the maximum number
of bytes to prefetch per stream.
illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af

Illumos ZFS issues:
5987 zfs prefetch code needs work
https://www.illumos.org/issues/5987

287701 by bapt:
Regression: fix pw usermod -d

Mark the user as having been edited if the -d option is passed to usermod,
so that the requested change of home directory actually happens.

PR:		203052
Reported by:	lenzi.ser...@gmail.com
MFC after:	2 days

The end of the build log:
[...truncated 86399 lines...]
^~
--- lib.all__D ---
--- s_copysignf.po ---
cc -DPROF -O2 -pipe -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/ld80 -I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/src -I/usr/src/lib/msun/../libc/include -I/usr/src/lib/msun/../libc/i386 -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wno-pointer-sign -Wno-unknown-pragmas -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments -c /usr/src/lib/msun/i387/s_copysignf.S -o s_copysignf.po
--- all_subdir_libc ---
--- chown.po ---
cc -DPROF -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa -I/usr/src/lib/libc/../../contrib/libc-vis -DINET6 -I/usr/obj/usr/src/lib/libc -I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libc/../libmd -I/usr/src/lib/libc/../../contrib/jemalloc/include -I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Qunused-arguments -I/usr/src/lib/libutil -I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/src -c chown.S -o chown.po
--- all_subdir_msun ---
--- s_floorf.po ---
cc -DPROF -O2 -pipe -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/ld80 -I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/src -I/usr/src/lib/msun/../libc/include -I/usr/src/lib/msun/../libc/i386 -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wno-pointer-sign -Wno-unknown-pragmas -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments -c /usr/src/lib/msun/i387/s_floorf.S -o s_floorf.po
--- all_subdir_libc ---
--- freebsd4_getfsstat.po ---
cc -DPROF -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa
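The 287702 change summary describes the rewritten prefetcher as recognising only forward, sequential streams, counting hits, misses and max_streams, and capping prefetch per stream at vfs.zfs.zfetch.max_distance bytes. The toy model below only illustrates that accounting. It is a hypothetical Python sketch, not the actual dmu_zfetch code: the class names, the "widen the window by each hit's size" growth policy, and counting a full stream table as both a miss and a max_streams event are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    next_offset: int  # offset a forward sequential reader would request next
    window: int = 0   # bytes we would prefetch ahead of next_offset


@dataclass
class ToyPrefetcher:
    """Hypothetical model of forward-sequential stream detection, not dmu_zfetch."""
    max_streams: int
    max_distance: int  # per-stream prefetch cap, cf. vfs.zfs.zfetch.max_distance
    streams: list = field(default_factory=list)
    hits: int = 0
    misses: int = 0
    max_streams_events: int = 0  # reads that could not start a new stream

    def read(self, offset: int, size: int) -> int:
        """Record a read; return how many bytes to prefetch ahead of it."""
        for s in self.streams:
            if offset == s.next_offset:  # continues an existing forward stream
                self.hits += 1
                s.next_offset = offset + size
                # Assumed growth policy: widen by the read size, capped.
                s.window = min(s.window + size, self.max_distance)
                return s.window
        self.misses += 1  # not sequential with any tracked stream
        if len(self.streams) < self.max_streams:
            self.streams.append(Stream(next_offset=offset + size))
        else:
            self.max_streams_events += 1  # stream table already full
        return 0


pf = ToyPrefetcher(max_streams=1, max_distance=8192)
print([pf.read(off, 4096) for off in (0, 4096, 8192, 12288, 0)])  # [0, 4096, 8192, 8192, 0]
print(pf.hits, pf.misses, pf.max_streams_events)                   # 3 2 1
```

A backward or random read never matches a stream's next offset, so under this model it can only start a new stream, which is the sense in which only forward sequential access gets prefetched.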
Re: Panic on kldload/kldunload in/near callout
On 09/12/15 01:21, hiren panchasara wrote:
> On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>> On 09/10/15 21:23, hiren panchasara wrote:
>>> I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
>>> 08:15:43 MST 2015
>>>
>>> I get random (1 out of 10 tries) panics when I do:
>>> # kldunload dummynet ; kldunload ipfw ; kldload ipfw ; kldload dummynet
>>>
>>> I used to get panics on a couple of months old -head also.
>>>
>>> kernel trap 12 with interrupts disabled
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 0; apic id = 00
>>> fault virtual address   = 0x8225cf58
>>> fault code              = supervisor read data, page not present
>>> instruction pointer     = 0x20:0x80aad500
>>> stack pointer           = 0x28:0xfe1f9d588700
>>> frame pointer           = 0x28:0xfe1f9d588790
>>> code segment            = base 0x0, limit 0xf, type 0x1b
>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>
>>> Following https://www.freebsd.org/doc/faq/advanced.html, I did:
>>> # nm -n /boot/kernel/kernel | grep 80aad500
>>> # nm -n /boot/kernel/kernel | grep 80aad50
>>> # nm -n /boot/kernel/kernel | grep 80aad5
>>> # nm -n /boot/kernel/kernel | grep 80aad
>>> 80aad030 t itimers_event_hook_exec
>>> 80aad040 t realtimer_expire
>>> 80aad360 T callout_process
>>> 80aad6b0 t softclock_call_cc
>>> 80aadc10 T softclock
>>> 80aadd20 T timeout
>>> 80aade90 T callout_reset_sbt_on
>>>
>>> So I guess " 80aad360 T callout_process" is the closest match?
>>>
>>> I'll try to get a real dump to get more information but that may take a
>>> while.
>>>
>>> ccing jch and hans who've been playing in this area.
>>
>> Hi,
>>
>> Possibly it means some timer was not drained before the module was
>> unloaded. It is not enough to only stop timers before freeing its
>> memory. Or maybe a timer was restarted after drain.
>>
>> Can you get the full backtrace and put debugging symbols into the kernel?
>
> I'll try to get it. Meanwhile I am getting another panic on an idle box:
> http://pastebin.com/9qJTFMik

That looks like a bug in the igb driver, which is passing a NULL mbuf up!

#16 0x80b88156 in ether_input (ifp=, m=0x0) at /root/head/sys/net/if_ethersubr.c:676
#17 0x8053f004 in igb_rxeof (count=337545368) at /root/head/sys/dev/e1000/if_igb.c:4979

--HPS
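HPS's point, that stopping a timer is not enough before freeing the memory its handler uses, can be illustrated in userland. This Python sketch uses threading.Timer as a stand-in for a kernel callout: Timer.cancel() only prevents a handler that has not started yet (roughly the guarantee callout_stop(9) gives), while following it with join() also waits for a handler that is already running (roughly what callout_drain(9) adds). The analogy, the Module class and the unload() helper are assumptions for illustration, not kernel code.

```python
import threading
import time


class Module:
    """Hypothetical loadable-module state that a timer handler touches."""

    def __init__(self) -> None:
        self.buffer = bytearray(16)  # stands in for memory freed on unload
        self.freed = False
        self.handler_saw_freed = False
        self.started = threading.Event()

    def handler(self) -> None:
        self.started.set()
        time.sleep(0.2)  # handler is still running for a while
        if self.freed:   # in the kernel this would be a use-after-free
            self.handler_saw_freed = True

    def free(self) -> None:
        self.buffer = None
        self.freed = True


def unload(drain: bool) -> Module:
    mod = Module()
    timer = threading.Timer(0.01, mod.handler)
    timer.start()
    mod.started.wait()  # make sure the handler is already executing
    timer.cancel()      # like callout_stop(): too late, the handler is running
    if drain:
        timer.join()    # like callout_drain(): wait for the handler to finish
    mod.free()
    timer.join()        # let the demo finish before inspecting the result
    return mod


print(unload(drain=False).handler_saw_freed)  # True: freed under a running handler
print(unload(drain=True).handler_saw_freed)   # False: freed only after it finished
```

HPS's other suggestion, a timer restarted after the drain, is not covered here: draining only helps if nothing can re-arm the timer afterwards, for example a handler that reschedules itself.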
FreeBSD_HEAD_i386 - Build #1094 - Fixed
FreeBSD_HEAD_i386 - Build #1094 - Fixed:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/console

Change summaries:

287707 by mav:
CTL documentation update, mostly for HA.

287706 by delphij:
MFV r287699: 6214 zpools going south

In r286570 (MFV of r277426) an unprotected write to b_flags to set the
compression mode was introduced. This would open a race window where data
is partially decompressed, modified, checksummed and written to the pool,
resulting in pool corruption due to the partial decompression. Prevent this
by reintroducing b_compress.

illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c

Illumos issues:
6214 zpools going south
https://www.illumos.org/issues/6214

287705 by delphij:
Fix build (r287703).

Lesson learned: no matter how innocent a change looks, always do a build
test first.

Pointy hat to:	delphij
Re: Panic on kldload/kldunload in/near callout
12.09.2015, 02:22, "hiren panchasara":
> On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>> On 09/10/15 21:23, hiren panchasara wrote:
>> > I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
>> > 08:15:43 MST 2015
>> >
>> > I get random (1 out of 10 tries) panics when I do:
>> > # kldunload dummynet ; kldunload ipfw ; kldload ipfw ; kldload dummynet
>> >
>> > I used to get panics on a couple of months old -head also.
>> >
>> > kernel trap 12 with interrupts disabled
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > fault virtual address   = 0x8225cf58
>> > fault code              = supervisor read data, page not present
>> > instruction pointer     = 0x20:0x80aad500
>> > stack pointer           = 0x28:0xfe1f9d588700
>> > frame pointer           = 0x28:0xfe1f9d588790
>> > code segment            = base 0x0, limit 0xf, type 0x1b
>> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> >
>> > Following https://www.freebsd.org/doc/faq/advanced.html, I did:
>> > # nm -n /boot/kernel/kernel | grep 80aad500
>> > # nm -n /boot/kernel/kernel | grep 80aad50
>> > # nm -n /boot/kernel/kernel | grep 80aad5
>> > # nm -n /boot/kernel/kernel | grep 80aad
>> > 80aad030 t itimers_event_hook_exec
>> > 80aad040 t realtimer_expire
>> > 80aad360 T callout_process
>> > 80aad6b0 t softclock_call_cc
>> > 80aadc10 T softclock
>> > 80aadd20 T timeout
>> > 80aade90 T callout_reset_sbt_on
>> >
>> > So I guess " 80aad360 T callout_process" is the closest match?
>> >
>> > I'll try to get a real dump to get more information but that may take a
>> > while.
>> >
>> > ccing jch and hans who've been playing in this area.
>>
>> Hi,
>>
>> Possibly it means some timer was not drained before the module was
>> unloaded. It is not enough to only stop timers before freeing its
>> memory. Or maybe a timer was restarted after drain.
>>
>> Can you get the full backtrace and put debugging symbols into the kernel?
>
> I'll try to get it. Meanwhile I am getting another panic on an idle box:
> http://pastebin.com/9qJTFMik

The easiest explanation could be the lack of an lla_create() result check,
fixed in r286945.
This panic is triggered by fast interface down-up (or just up), when an ARP
packet is received but there is no (matching) IPv4 prefix on the interface.
If this is not the case (e.g. it panicked w/o any interface changes and there
were no other subnets in given L2 segment) I'd be happy to debug this further.

>
> This "looks" similar to
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=156026 which got fixed
> via https://svnweb.freebsd.org/base?view=revision=r214675
> "Don't leak the LLE lock if the arptimer callout is pending or
> inactive."
>
> Is what I am seeing similar to this?
>
> I'll try and get more info.
>
> Cheers,
> Hiren
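The manual grep in hiren's report can be automated: with nm -n output sorted by address, the function containing a faulting instruction pointer is the last symbol whose address is less than or equal to that pointer. This Python sketch does that lookup with bisect over the symbols quoted in the thread. It assumes the 8-hex-digit addresses exactly as they appear in the (truncated) mail, and it ignores symbol sizes, so an address past the end of the last listed function would still be attributed to it. The function names are hypothetical.

```python
from __future__ import annotations

import bisect

# Symbols as quoted in the thread (output of: nm -n /boot/kernel/kernel | grep 80aad)
NM_OUTPUT = """\
80aad030 t itimers_event_hook_exec
80aad040 t realtimer_expire
80aad360 T callout_process
80aad6b0 t softclock_call_cc
80aadc10 T softclock
80aadd20 T timeout
80aade90 T callout_reset_sbt_on
"""


def parse_nm(text: str) -> list[tuple[int, str]]:
    """Parse 'ADDR TYPE NAME' lines from nm -n into sorted (addr, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip undefined ('U name') and blank lines
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def containing_symbol(symbols: list[tuple[int, str]], addr: int):
    """Return (name, offset) of the last symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


print(containing_symbol(parse_nm(NM_OUTPUT), 0x80AAD500))  # ('callout_process', 416)
```

Under those assumptions the fault address falls 0x1a0 bytes into callout_process, which agrees with hiren's guess; a kernel with debugging symbols, as HPS asked for, would let kgdb map the address to a source line.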