Re: em broken on current amd64

2015-09-12 Thread Mark R V Murray

> On 8 Sep 2015, at 19:02, Mark R V Murray  wrote:
> 
> 
>> On 8 Sep 2015, at 17:22, Sean Bruno  wrote:
>> 
>> 
> 
> I’m also seeing breakage with the em0 device; this isn’t a kernel
> hang, it is a failure to move data after about 10-15 minutes. The
> symptom is that my WAN ethernet no longer moves traffic, no pings,
> nothing. Booting looks normal:
> 
> em0: port 0x30c0-0x30df mem 0x5030-0x5031,0x50324000-0x50324fff irq 20 at device 25.0 on pci0
> em0: Using an MSI interrupt
> em0: Ethernet address: 00:16:76:d3:e1:5b
> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> 
> Fixing it is as easy as …
> 
> # ifconfig em0 down ; service ipfw restart ; ifconfig em0 up
> 
> :-)
> 
> I’m running CURRENT, r287538. This last worked for me a month or so
> ago with my previous build.
> 
> M
> 
 
 
>>>> Just so I'm clear, the original problem reported was a failure to
>>>> attach (you were among several folks reporting breakage).  Is that
>>>> fixed?
>>> 
>>> I did not report the failure to attach, and I am not seeing it as I don’t
>>> think I built a kernel that had that particular failure. I am having the
>>> “failure after 10-15 minutes” problem; this is on an em0 device.
>>> 
>>> M
>>> 
>> 
>> 
>> Hrm, that's odd.  That sounds like a hole where interrupts aren't being
>> reset for "reasons" that I cannot fathom.
>> 
>> What hardware (pciconf -lv) does your system actually have?  The em(4)
>> driver doesn't identify components which is frustrating.
> 
> pciconf -lv output below:
> 
> hostb0@pci0:0:0:0: class=0x06 card=0x514d8086 chip=0x29a08086 rev=0x02 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82P965/G965 Memory Controller Hub'
>     class      = bridge
>     subclass   = HOST-PCI

I just caught this, on today’s build:

em0: Watchdog timeout Queue[0]-- resetting
Interface is RUNNING and ACTIVE
em0: TX Queue 0 --
em0: hw tdh = 127, hw tdt = 139
em0: Tx Queue Status = -2147483648
em0: TX descriptors avail = 1012
em0: Tx Descriptors avail failure = 0
em0: RX Queue 0 --
em0: hw rdh = 0, hw rdt = 1023
em0: RX discarded packets = 0
em0: RX Next to Check = 0
em0: RX Next to Refresh = 1023
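
As a reading aid for the dump above, here is a minimal Python sketch (not driver code) that works through two of the numbers. It assumes the usual head/tail ring convention, in which descriptors between the hardware head (tdh) and tail (tdt) are still pending, and it takes the ring size of 1024 from the netmap line in the boot messages. It also shows that the Tx Queue Status value is simply the 32-bit pattern 0x80000000 printed as a signed integer; what that bit means to the driver is not stated in this thread. The function names are made up for illustration.

```python
RING_SIZE = 1024  # from "netmap queues/slots: TX 1/1024" in the boot messages


def pending_descriptors(head: int, tail: int, ring_size: int = RING_SIZE) -> int:
    """Descriptors queued but not yet completed, assuming head chases tail."""
    return (tail - head) % ring_size


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


pending = pending_descriptors(head=127, tail=139)
print("pending TX descriptors:", pending)
print("free TX descriptors:", RING_SIZE - pending)
print("Tx Queue Status:", hex(as_unsigned32(-2147483648)))
```

With the values from the dump, 12 descriptors are pending and 1012 are free, which matches the reported "TX descriptors avail = 1012".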

[graveyard] /usr/ports 09:42 pm # uname -a
FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705: Sat Sep 12 15:07:54 BST 2015 r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE  amd64

M
-- 
Mark R V Murray

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread hiren panchasara
On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> 12.09.2015, 02:22, "hiren panchasara" :
> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
[skip]
> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> > http://pastebin.com/9qJTFMik
> The easiest explanation could be the lack of an lla_create() result check,
> fixed in r286945.
> This panic is triggered by a fast interface down-up (or just up), when an ARP
> packet is received but there is no (matching) IPv4 prefix on the interface.
> If this is not the case (e.g. it panicked w/o any interface changes and there
> were no other subnets in the given L2 segment) I'd be happy to debug this further.

Just hit another last night. (Box goes to db> ; let me know if you want
to debug anything when that happens.)
I am sure there were no interface changes on the box and it was sitting
idle. (Unsure of the other subnets part.) And I am on 3 days old -head
so I already have r286945. I disabled IPv6 on the box just to eliminate
that but panic still happens.

Cheers,
Hiren




Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Alexander V . Chernikov


12.09.2015, 20:30, "hiren panchasara" :
> On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
>>  12.09.2015, 02:22, "hiren panchasara" :
>>  > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>
> [skip]
>>  > I'll try to get it. Meanwhile I am getting another panic on idle box:
>>  > http://pastebin.com/9qJTFMik
>>  The easiest explanation could be the lack of an lla_create() result check,
>>  fixed in r286945.
>>  This panic is triggered by a fast interface down-up (or just up), when an ARP
>>  packet is received but there is no (matching) IPv4 prefix on the interface.
>>  If this is not the case (e.g. it panicked w/o any interface changes and
>>  there were no other subnets in the given L2 segment) I'd be happy to debug
>>  this further.
>
> Just hit another last night. (Box goes to db> ; let me know if you want
> to debug anything when that happens.)
Would you mind showing the full backtrace for that core? (The situation has to be
different on newer -current.)
> I am sure there were no interface changes on the box and it was sitting
> idle. (Unsure of the other subnets part.) And I am on 3 days old -head
> so I already have r286945. I disabled IPv6 on the box just to eliminate
> that but panic still happens.
>
> Cheers,
> Hiren

FreeBSD_HEAD_i386 - Build #1093 - Failure

2015-09-12 Thread jenkins-admin
FreeBSD_HEAD_i386 - Build #1093 - Failure:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1093/console

Change summaries:

287704 by trasz:
Point potential geom_fox(4) users to gmultipath(8).

MFC after:  1 month
Sponsored by:   The FreeBSD Foundation

287703 by delphij:
MFV r287684: 6091 avl_add doesn't assert on non-debug builds

Use assfail() from libuutil instead of ASSERT() in userland
AVL avl_add.

illumos/illumos-gate@faa2b6be2fc102adf9ed584fc1a667b4ddf50d78

Illumos issues:

6091 avl_add doesn't assert on non-debug builds
https://www.illumos.org/issues/6091

287702 by delphij:
MFV r287624: 5987 zfs prefetch code needs work

Rewrite the ZFS prefetch code to detect only forward, sequential
streams.

The following kstats have been added:

kstat.zfs.misc.arcstats.sync_wait_for_async

How many sync reads have waited for an async read
to complete. (less is better)

kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch

How many demand reads didn't have to wait for I/O
because of predictive prefetch.  (more is better)

zfetch kstats have been simplified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
new stream because we already have the maximum number of sequences
for a file.

The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been
replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes
to prefetch per stream.

illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af

Illumos ZFS issues:

5987 zfs prefetch code needs work
https://www.illumos.org/issues/5987

287701 by bapt:
Regression: fix pw usermod -d

Mark the user as having been edited if the -d option is passed to usermod,
so that the requested change of home directory actually happens

PR: 203052
Reported by:    lenzi.ser...@gmail.com
MFC after:  2 days



The end of the build log:

[...truncated 86399 lines...]
   ^~
--- lib.all__D ---
--- s_copysignf.po ---
cc  -DPROF -O2 -pipe   -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/ld80 
-I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/src 
-I/usr/src/lib/msun/../libc/include  -I/usr/src/lib/msun/../libc/i386 
-std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wno-pointer-sign 
-Wno-unknown-pragmas -Wno-empty-body -Wno-string-plus-int 
-Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value 
-Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion 
-Wno-unused-local-typedef -Wno-switch -Wno-switch-enum 
-Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments  -c 
/usr/src/lib/msun/i387/s_copysignf.S -o s_copysignf.po
--- all_subdir_libc ---
--- chown.po ---
cc  -DPROF -O2 -pipe   -I/usr/src/lib/libc/include 
-I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS  
-D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa 
-I/usr/src/lib/libc/../../contrib/libc-vis -DINET6 -I/usr/obj/usr/src/lib/libc 
-I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE 
-I/usr/src/lib/libc/../libmd -I/usr/src/lib/libc/../../contrib/jemalloc/include 
-I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime  
-I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN 
-I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 
-fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k 
-Wno-uninitialized -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int 
-Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value 
-Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion 
-Wno-unused-local-typedef -Wno-switch -Wno-switch-enum 
-Wno-knr-promoted-parameter -Qunused-arguments -I/usr/src/lib/libutil
-I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/x86
-I/usr/src/lib/msun/src  -c chown.S -o chown.po
--- all_subdir_msun ---
--- s_floorf.po ---
cc  -DPROF -O2 -pipe   -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/ld80 
-I/usr/src/lib/msun/i387 -I/usr/src/lib/msun/src 
-I/usr/src/lib/msun/../libc/include  -I/usr/src/lib/msun/../libc/i386 
-std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wno-pointer-sign 
-Wno-unknown-pragmas -Wno-empty-body -Wno-string-plus-int 
-Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value 
-Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion 
-Wno-unused-local-typedef -Wno-switch -Wno-switch-enum 
-Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments  -c 
/usr/src/lib/msun/i387/s_floorf.S -o s_floorf.po
--- all_subdir_libc ---
--- freebsd4_getfsstat.po ---
cc  -DPROF -O2 -pipe   -I/usr/src/lib/libc/include 
-I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS  
-D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa 

Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Hans Petter Selasky

On 09/12/15 01:21, hiren panchasara wrote:

On 09/11/15 at 09:06P, Hans Petter Selasky wrote:

On 09/10/15 21:23, hiren panchasara wrote:

I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
08:15:43 MST 2015

I get random (1 out of 10 tries) panics when I do:
# kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet

I used to get panics on a couple months old -head also.

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x8225cf58
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80aad500
stack pointer           = 0x28:0xfe1f9d588700
frame pointer           = 0x28:0xfe1f9d588790
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1

Following https://www.freebsd.org/doc/faq/advanced.html, I did:
# nm -n /boot/kernel/kernel | grep 80aad500
# nm -n /boot/kernel/kernel | grep 80aad50
# nm -n /boot/kernel/kernel | grep 80aad5
# nm -n /boot/kernel/kernel | grep 80aad
80aad030 t itimers_event_hook_exec
80aad040 t realtimer_expire
80aad360 T callout_process
80aad6b0 t softclock_call_cc
80aadc10 T softclock
80aadd20 T timeout
80aade90 T callout_reset_sbt_on

So I guess " 80aad360 T callout_process" is the closest match?

I'll try to get real dump to get more information but that may take a
while.

ccing jch and hans who've been playing in this area.
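
The grep-by-shrinking-prefix search above amounts to finding the last symbol whose address is at or below the faulting instruction pointer. A minimal Python sketch of that lookup follows. It is not a FreeBSD tool, the function names are hypothetical, and the symbol table is just the nm output pasted above (with addresses as they appear in this message).

```python
import bisect

NM_OUTPUT = """\
80aad030 t itimers_event_hook_exec
80aad040 t realtimer_expire
80aad360 T callout_process
80aad6b0 t softclock_call_cc
80aadc10 T softclock
80aadd20 T timeout
80aade90 T callout_reset_sbt_on
"""


def parse_nm(text):
    """Parse `nm -n` style lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            symbols.append((int(fields[0], 16), fields[2]))
    return sorted(symbols)


def containing_symbol(symbols, address):
    """Return (name, offset) of the last symbol starting at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return name, address - start


name, offset = containing_symbol(parse_nm(NM_OUTPUT), 0x80AAD500)
print(f"{name}+{offset:#x}")
```

For the instruction pointer 0x80aad500 this prints callout_process+0x1a0, which agrees with the guess above.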


Hi,

Possibly it means some timer was not drained before the module was
unloaded. It is not enough to only stop timers before freeing their
memory. Or maybe a timer was restarted after drain.

Can you get the full backtrace and put debugging symbols into the kernel?


I'll try to get it. Meanwhile I am getting another panic on idle box:
http://pastebin.com/9qJTFMik


That looks like a bug in the igb driver which is passing a NULL mbuf up!


#16 0x80b88156 in ether_input (ifp=, m=0x0) at /root/head/sys/net/if_ethersubr.c:676
#17 0x8053f004 in igb_rxeof (count=337545368) at /root/head/sys/dev/e1000/if_igb.c:4979


--HPS



FreeBSD_HEAD_i386 - Build #1094 - Fixed

2015-09-12 Thread jenkins-admin
FreeBSD_HEAD_i386 - Build #1094 - Fixed:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/1094/console

Change summaries:

287707 by mav:
CTL documentation update, mostly for HA.

287706 by delphij:
MFV r287699: 6214 zpools going south

In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced.  This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.

Prevent this by reintroducing b_compress

illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c

Illumos issues:

6214 zpools going south
https://www.illumos.org/issues/6214

287705 by delphij:
Fix build (r287703).  Lesson learned: no matter how innocent a change looks,
always do a build test first.

Pointy hat to:  delphij



Re: Panic on kldload/kldunload in/near callout

2015-09-12 Thread Alexander V . Chernikov
12.09.2015, 02:22, "hiren panchasara" :
> On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
>>  On 09/10/15 21:23, hiren panchasara wrote:
>>  > I am on 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r286760M: Thu Sep 10
>>  > 08:15:43 MST 2015
>>  >
>>  > I get random (1 out of 10 tries) panics when I do:
>>  > # kldunload dummynet ; kldunload ipfw ;kldload ipfw ; kldload dummynet
>>  >
>>  > I used to get panics on a couple months old -head also.
>>  >
>>  > kernel trap 12 with interrupts disabled
>>  >
>>  > Fatal trap 12: page fault while in kernel mode
>>  > cpuid = 0; apic id = 00
>>  > fault virtual address = 0x8225cf58
>>  > fault code = supervisor read data, page not present
>>  > instruction pointer = 0x20:0x80aad500
>>  > stack pointer = 0x28:0xfe1f9d588700
>>  > frame pointer = 0x28:0xfe1f9d588790
>>  > code segment = base 0x0, limit 0xf, type 0x1b
>>  > = DPL 0, pres 1, long 1, def32 0, gran 1
>>  >
>>  > Following https://www.freebsd.org/doc/faq/advanced.html, I did:
>>  > # nm -n /boot/kernel/kernel | grep 80aad500
>>  > # nm -n /boot/kernel/kernel | grep 80aad50
>>  > # nm -n /boot/kernel/kernel | grep 80aad5
>>  > # nm -n /boot/kernel/kernel | grep 80aad
>>  > 80aad030 t itimers_event_hook_exec
>>  > 80aad040 t realtimer_expire
>>  > 80aad360 T callout_process
>>  > 80aad6b0 t softclock_call_cc
>>  > 80aadc10 T softclock
>>  > 80aadd20 T timeout
>>  > 80aade90 T callout_reset_sbt_on
>>  >
>>  > So I guess " 80aad360 T callout_process" is the closest match?
>>  >
>>  > I'll try to get real dump to get more information but that may take a
>>  > while.
>>  >
>>  > ccing jch and hans who've been playing in this area.
>>
>>  Hi,
>>
>>  Possibly it means some timer was not drained before the module was
>>  unloaded. It is not enough to only stop timers before freeing their
>>  memory. Or maybe a timer was restarted after drain.
>>
>>  Can you get the full backtrace and put debugging symbols into the kernel?
>
> I'll try to get it. Meanwhile I am getting another panic on idle box:
> http://pastebin.com/9qJTFMik
The easiest explanation could be the lack of an lla_create() result check, fixed in
r286945.
This panic is triggered by a fast interface down-up (or just up), when an ARP packet
is received but there is no (matching) IPv4 prefix on the interface.
If this is not the case (e.g. it panicked w/o any interface changes and there
were no other subnets in the given L2 segment) I'd be happy to debug this further.
>
> This "looks" similar to
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=156026 which got fixed
> via https://svnweb.freebsd.org/base?view=revision=r214675
> "Don't leak the LLE lock if the arptimer callout is pending or
> inactive."
>
> Is what I am seeing similar to this?
>
> I'll try and get more info.
>
> Cheers,
> Hiren