Re: em performs worse than igb (latency wise) in 12?

2019-04-09 Thread Nick Rogers
On Sat, Apr 6, 2019 at 10:24 PM Graham Menhennitt wrote:

> Not that it's at all relevant to the question here, but...
>
> It does mostly work without em in the 12 kernel - I'm not sure how, but
> it does.
>
> I upgraded to 12-stable via source but didn't add em to my custom
> kernel. Most things worked - basic network functionality. But I had
> problems with ipfw and igb. Adding em to the kernel fixed them.
>

FWIW the latest GENERIC kernel includes the iflib, em, etc devices as far
as I can tell. I found the new UPDATING entry about iflib "no longer
unconditionally compiled into the kernel" a bit confusing... So long as you
are including GENERIC it should be the same as 12-RELEASE.


> Graham
>
> On 6/4/19 6:12 am, Kris von Mach wrote:
> > On 4/6/2019 2:56 AM, Pete French wrote:
> >> Something odd going on there - I am using 12-STABLE and I have
> >> igb just fine, and it attaches to the same hardware that 11 did:
> >
> > It does work in 12, throughput is great, just that the latency is
> > higher than 11.
> >
> > igb0: flags=8843 metric 0 mtu 1500
> > options=e527bb
> >
> > ether 38:ea:a7:8d:c1:6c
> > inet 208.72.56.19 netmask 0xfc00 broadcast 208.72.59.255
> > inet6 fe80::3aea:a7ff:fe8d:c16c%igb0 prefixlen 64 scopeid 0x1
> > inet6 2602:ffb8::208:72:56:9 prefixlen 64
> > media: Ethernet autoselect (1000baseT )
> > status: active
> > nd6 options=21
> >
> >> Do you have a custom kernel, and if so did you see this note in
> >> UPDATING?
> >
> > Yes I do, but it includes all of GENERIC which includes em drivers,
> > otherwise it wouldn't even work with the network card.
> >
> > my custom kernel:
> >
> > include GENERIC
> > ident   CUSTOM
> > makeoptions WITH_EXTRA_TCP_STACKS=1
> > options TCPHPTS
> > options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK)
> > options IPSTEALTH
> > options   AHC_REG_PRETTY_PRINT  # Print register bitfields in debug
> > options   AHD_REG_PRETTY_PRINT  # Print register bitfields in debug
> > device cryptodev
> > device aesni
> >
> > I did try without RACK just in case that was the culprit.
> >
> >
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Mon, Mar 4, 2019 at 5:29 PM Andriy Gapon  wrote:

> On 04/03/2019 22:35, Nick Rogers wrote:
> > v_lock = {lock_object = {lo_name =
> > 0x8144af45 "zfs", lo_flags = 117112840, lo_data = 0, lo_witness =
> > 0x0}, lk_lock = 18446744073709551605, lk_exslpfail = 0, lk_timo = 51,
> > lk_pri = 96}
>
> Hmm, lk_lock looks bogus.
> 18446744073709551605 == 0xfffffffffffffff5 and it's LK_SHARE |
> LK_EXCLUSIVE_WAITERS with 0xfffffffffffffff shared owners.
> Perhaps, this is a result of LK_SHARERS_LOCK(-1).
>
> Is your kernel compiled with INVARIANTS and INVARIANT_SUPPORT?
> I suspect that the vnode was accessed (unlocked?) through a stale pointer
> after it was recycled.
>
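
The decoding Andriy describes can be checked by hand. The sketch below is
only an illustration in Python: the flag values and the shift are written
from memory of FreeBSD 12's sys/lockmgr.h and should be treated as
assumptions to verify against the source tree, and the helper names
(decode_lk_lock, sharers_lock) are made up for this example.

```python
# Flag layout as recalled from FreeBSD 12 sys/lockmgr.h (assumption: verify
# against your own source tree before relying on these values).
LK_SHARE = 0x01
LK_SHARED_WAITERS = 0x02
LK_EXCLUSIVE_WAITERS = 0x04
LK_EXCLUSIVE_SPINNERS = 0x08
LK_FLAGMASK = 0x0F
LK_SHARERS_SHIFT = 4
MASK64 = (1 << 64) - 1  # lk_lock is a 64-bit word on amd64

FLAG_NAMES = [
    (LK_SHARE, "LK_SHARE"),
    (LK_SHARED_WAITERS, "LK_SHARED_WAITERS"),
    (LK_EXCLUSIVE_WAITERS, "LK_EXCLUSIVE_WAITERS"),
    (LK_EXCLUSIVE_SPINNERS, "LK_EXCLUSIVE_SPINNERS"),
]


def decode_lk_lock(value):
    """Split an lk_lock word into its flag names and its sharer count."""
    flags = [name for bit, name in FLAG_NAMES if value & bit]
    sharers = (value & ~LK_FLAGMASK & MASK64) >> LK_SHARERS_SHIFT
    return flags, sharers


def sharers_lock(count):
    """Mimic LK_SHARERS_LOCK(count) with 64-bit wraparound."""
    return ((count << LK_SHARERS_SHIFT) | LK_SHARE) & MASK64


flags, sharers = decode_lk_lock(18446744073709551605)
print(hex(18446744073709551605), flags, hex(sharers))
```

Under those assumptions the value from the dump decodes to LK_SHARE plus
LK_EXCLUSIVE_WAITERS with an all-ones sharer count, which is also what
LK_SHARERS_LOCK(-1) with the exclusive-waiters bit set would produce.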

I don't believe so - it's basically amd64 GENERIC w/ a reduced set of
modules and static zfs option.


> --
> Andriy Gapon
>


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon  wrote:

> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl-   mi_switch+0xe1
> > sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c
> > VOP_LOCK1_APV+0x7e
> > _vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
> > zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
> > kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything but the current thread.  So, vn_lock() should always immediately
> succeed.  I cannot understand how the vnode lock could be held by another
> thread.
>

It happened again. I tried to get a backtrace from the offending thread and
one of the others waiting for it. At the moment I have access to this
particular system in its bad state and can leave it like this for as long
as possible, so let me know if there's something else useful I can get out
of the debugger.

courtland# procstat -kka | grep zfs
0 100140 kernel  zfsvfs  mi_switch+0xe1
sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83
fork_trampoline+0xe
0 100424 kernel  zfs_vn_rele_taskq   mi_switch+0xe1
sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83
fork_trampoline+0xe
   23 100119 zfskern arc_reclaim_thread  mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a arc_reclaim_thread+0x146
fork_exit+0x83 fork_trampoline+0xe
   23 100120 zfskern arc_dnlc_evicts_thr mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 arc_dnlc_evicts_thread+0x16f fork_exit+0x83
fork_trampoline+0xe
   23 100122 zfskern dbuf_evict_thread   mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a dbuf_evict_thread+0x1c8
fork_exit+0x83 fork_trampoline+0xe
   23 100139 zfskern l2arc_feed_thread   mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a l2arc_feed_thread+0x219
fork_exit+0x83 fork_trampoline+0xe
   23 100405 zfskern trim zroot  mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x11f
fork_exit+0x83 fork_trampoline+0xe
   23 100441 zfskern txg_thread_enter mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce+0x21b txg_quiesce_thread+0x11b
fork_exit+0x83 fork_trampoline+0xe
   23 100442 zfskern txg_thread_enter mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 txg_sync_thread+0x13b fork_exit+0x83
fork_trampoline+0xe
   23 100443 zfskern solthread 0xfff mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83
fork_trampoline+0xe
   23 100444 zfskern solthread 0xfff mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83
fork_trampoline+0xe
 7476 100751 postgres-   mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48
zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
 7480 100527 postgres-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
46101 100471 postgres-   mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48
zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
52625 100488 perl-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52675 100643 csh -   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52826 100562 ls  -   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52889 100641 bash-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
courtland# kgdb
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright

Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
Thanks for the insight. It does appear that in all instances of this
problem there is always one thread stuck in zfs_znode_alloc. Unfortunately
it's always a different application (e.g., perl, sh, postgres). I will post
more information in the bug.
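
For anyone comparing several hung systems, a small script can pick the
thread sitting in zfs_znode_alloc out of saved `procstat -kka` output. This
is only a sketch: it assumes each thread has been joined onto one line
(the dumps in this thread are wrapped by the mail client), and the function
names are made up for illustration.

```python
import re

# A kernel stack frame looks like "function+0xoffset".
FRAME = re.compile(r"^[A-Za-z_][\w.]*\+0x[0-9a-f]+$")


def parse_kstack_line(line):
    """Return (pid, tid, name, frames) for one unwrapped procstat -kk line."""
    tokens = line.split()
    # Everything before the first frame is the pid/tid/comm/tdname header.
    first = next((i for i, t in enumerate(tokens) if FRAME.match(t)), len(tokens))
    pid, tid = int(tokens[0]), int(tokens[1])
    name = " ".join(tokens[2:first])
    frames = [t.split("+")[0] for t in tokens[first:]]
    return pid, tid, name, frames


def threads_in(lines, function):
    """List (pid, tid, name) for threads whose stack contains `function`."""
    hits = []
    for line in lines:
        pid, tid, name, frames = parse_kstack_line(line)
        if function in frames:
            hits.append((pid, tid, name))
    return hits


# Two threads taken from the dump posted in this thread, joined onto one line each.
SAMPLE = [
    "7480 100527 postgres - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 "
    "lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e _vn_lock+0x40 "
    "zfs_znode_alloc+0x434 zfs_mknode+0xa9d zfs_freebsd_create+0x512",
    "52625 100488 perl - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 "
    "lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d "
    "lookup+0x933 namei+0x44b",
]

print(threads_in(SAMPLE, "zfs_znode_alloc"))
```

The same helper can be pointed at zfs_root to list the threads that are
merely waiting behind the stuck one.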

On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon  wrote:

> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl-   mi_switch+0xe1
> > sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c
> > VOP_LOCK1_APV+0x7e
> > _vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
> > zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
> > kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything but the current thread.  So, vn_lock() should always immediately
> succeed.  I cannot understand how the vnode lock could be held by another
> thread.
>
> --
> Andriy Gapon
>


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Sat, Mar 2, 2019 at 5:27 PM Peter Avalos via freebsd-stable <
freebsd-stable@freebsd.org> wrote:

>
> > On Mar 1, 2019, at 7:00 AM, Nick Rogers  wrote:
> >
> > I am hoping someone can help me figure out if this is a legitimate bug,
> > or something already fixed in 12-STABLE. I wish I could reproduce it
> > reliably to try against STABLE, but there doesn't appear to be any
> > related ZFS fixes not in RELEASE. Thanks.
> >
>
> I have also experienced this problem, but I haven’t been able to
> troubleshoot it at all.
>

I've opened a bug report, so if you have any more information about how it
is affecting you that may be helpful to share here.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236220


>
> Peter


12.0-RELEASE zfs/vnode deadlock issue

2019-03-01 Thread Nick Rogers
Recently a number of my production 12.0 systems have experienced what I can
only gather is a ZFS deadlock related to vnodes. It seems similar to the
relatively recent FreeBSD-EN-18:18.zfs (ZFS vnode reclaim deadlock)
problem. Previously the same systems were running 11.1-RELEASE without
problem.

Threads are always stuck with the stack around
vn_lock->zfs_root->lookup->namei. When the system is in this state, a
simple `ls /` or `ls /tmp` always hangs, but other datasets seem
unaffected. I have a fairly straightforward ZFS root setup on a single pool
with one SSD. The workload is a ruby/rails/nginx/postgresql backed web
application combined with some data warehousing and other periodic tasks.

Sometimes I can SSH in remotely; other times that fails because the user
shell fails to load, but I can still run commands via `ssh ... command`.
Sometimes the system is not accessible remotely at all, or it eventually
becomes inaccessible if left long enough in this state. I always have to
physically reboot the device because the shutdown procedure fails. The
network stack (e.g. ping) seems to work completely fine whilst this is
going on, until you try to interact with or spawn a process that hits the
deadlock.

Like previous similar ZFS deadlock issues, increasing kern.maxvnodes seems
to make the system last longer by up to a few weeks, but is still a
band-aid. However, I have yet to witness vnode usage actually getting close
to the maximum.
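
One way to keep an eye on this is to log the ratio of vnodes in use to the
limit. The sketch below assumes the tunable meant above is kern.maxvnodes
and that current usage is reported by vfs.numvnodes; it parses the text
printed by `sysctl vfs.numvnodes kern.maxvnodes`. The sample numbers are
invented for illustration and are not from the affected systems.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines from sysctl(8) into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = int(value.strip())
    return values


def vnode_usage(text):
    """Return the fraction of kern.maxvnodes currently in use."""
    values = parse_sysctl(text)
    return values["vfs.numvnodes"] / values["kern.maxvnodes"]


# Hypothetical output of: sysctl vfs.numvnodes kern.maxvnodes
SAMPLE = "vfs.numvnodes: 250000\nkern.maxvnodes: 1000000\n"

print(f"vnode usage: {vnode_usage(SAMPLE):.0%}")
```

Run periodically from cron, something like this would show whether usage
ever approaches the limit before a hang.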

I haven't had any luck reproducing this reliably, but eventually it happens
after a few days or a few weeks... I managed to connect to a system in this
state and grab a procstat and get (hopefully) something useful out of kgdb.
I will note that although I was able to install debug symbols, I couldn't
manage to get the source files onto it for kgdb purposes before I lost SSH
access.

I am hoping someone can help me figure out if this is a legitimate bug, or
something already fixed in 12-STABLE. I wish I could reproduce it reliably
to try against STABLE, but there doesn't appear to be any related ZFS fixes
not in RELEASE. Thanks.

Below is an abbreviated procstat and what I was able to get out of kgdb for
an affected thread. Note that the thread backtrace is from a simple `ls`
command. The procstat dump below is truncated because my last attempt to
send this was rejected by this list for being too long - so a number of
sh/cron processes and some zfs threads in a hung state were removed.

ld# kgdb
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from
/usr/lib/debug//boot/kernel/kernel.debug...done.
done.
sched_switch (td=0xf8002452a000, newtd=0xf80003625580,
flags=<optimized out>)
at /usr/src/sys/kern/sched_ule.c:2112
2112 /usr/src/sys/kern/sched_ule.c: No such file or directory.
(kgdb) tid 102023
(kgdb) bt
#0  sched_switch (td=0xf801a83dc580, newtd=0xf80003550580,
flags=<optimized out>)
at /usr/src/sys/kern/sched_ule.c:2112
#1  0x80d0e0a1 in mi_switch (flags=<optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:439
#2  0x80d5c80c in sleepq_wait (wchan=<optimized out>,
pri=<optimized out>)
at /usr/src/sys/kern/subr_sleepqueue.c:692
#3  0x80cd9105 in sleeplk (lk=0xf800247307e8, flags=<optimized out>,
ilk=<optimized out>, wmesg=<optimized out>, pri=<optimized out>, timo=51,
queue=1) at /usr/src/sys/kern/kern_lock.c:300
#4  0x80cd7f85 in lockmgr_slock_hard (lk=<optimized out>,
flags=<optimized out>, ilk=<optimized out>,
file=<optimized out>, line=0, lwa=<optimized out>) at
/usr/src/sys/kern/kern_lock.c:646
#5  0x813acc5e in VOP_LOCK1_APV (vop=<optimized out>,
a=0xfe00f89dd450) at vnode_if.c:2087
#6  0x80de2820 in VOP_LOCK1 (vp=0xf80024730780, flags=2105344,
file=0x814d4f74
"/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c",
line=2074)
at ./vnode_if.h:859
#7  _vn_lock (vp=0xf80024730780, flags=2105344,
file=0x814d4f74
"/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c",
line=2074)
at /usr/src/sys/kern/vfs_vnops.c:1533
#8  0x8049f68d in zfs_root (vfsp=<optimized out>, flags=2105344,
vpp=0xfe00f89dd558)
at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2074
#9  0x80dc5d43 in lookup (ndp=0xfe00f89dd780) at
/usr/src/sys/kern/vfs_lookup.c:961
#10 0x80dc4f9b in namei (ndp=0xfe00f89dd780) at
/usr/src/sys/kern/vfs_lookup.c:444
#11 0x80ddc637 in 

10.1-RELEASE-p33 update does not exist?

2016-05-05 Thread Nick Rogers
Hello,

I am not sure if this is the appropriate place to inquire about this, but I
am unable to update my 10.1-RELEASE machines to the latest releng branch
(10.1-RELEASE-p33) with the latest FreeBSD-SA-16:17.openssl advisory.

Here's what happens when I try freebsd-update.

# freebsd-version -ku
10.1-RELEASE-p31
10.1-RELEASE-p32
# uname -v
FreeBSD 10.1-RELEASE-p31 #0: Wed Mar 16 18:39:20 UTC 2016
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
# freebsdupdatec
# freebsd-update fetch
Looking up update.FreeBSD.org mirrors... 4 mirrors found.
Fetching metadata signature for 10.1-RELEASE from update6.freebsd.org...
done.
Fetching metadata index... done.
Inspecting system... done.
Preparing to download files... done.

The following files are affected by updates, but no changes have
been downloaded because the files have been modified locally:
/etc/mtree/BSD.usr.dist
/var/db/etcupdate/current/etc/mtree/BSD.usr.dist
/var/db/etcupdate/current/etc/ntp.conf
/var/db/mergemaster.mtree

No updates needed to update system to 10.1-RELEASE-p32.
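
The first two lines of the transcript above are the kernel and userland
versions, in that order (`freebsd-version -ku`), so the kernel is at p31
and userland at p32 while the advisory targets p33. A minimal Python sketch
for comparing such version strings, with made-up helper names:

```python
import re

# Matches strings such as "10.1-RELEASE-p31" or "12.0-RELEASE".
VERSION = re.compile(r"^(\d+)\.(\d+)-([A-Z]+\d*)(?:-p(\d+))?$")


def parse_version(text):
    """Return (major, minor, branch, patch_level); patch level defaults to 0."""
    match = VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, branch, patch = match.groups()
    return int(major), int(minor), branch, int(patch or 0)


def levels_behind(current, target):
    """How many patch levels `current` is behind `target` on the same release."""
    cur, tgt = parse_version(current), parse_version(target)
    if cur[:3] != tgt[:3]:
        raise ValueError("versions are not on the same release")
    return tgt[3] - cur[3]


kernel, userland = "10.1-RELEASE-p31", "10.1-RELEASE-p32"
target = "10.1-RELEASE-p33"
print("kernel behind by", levels_behind(kernel, target))
print("userland behind by", levels_behind(userland, target))
```

A kernel patch level lower than userland is not unusual by itself, since
patches that touch only userland do not rebuild the kernel; the point here
is that neither reaches p33.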

There seems to be some discussion that is perhaps related to this issue in
this bug:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209147

Thanks.


Re: freebsd-update and hang during reboot

2015-04-17 Thread Nick Rogers
On Thu, Apr 16, 2015 at 6:52 PM, Glen Barber g...@freebsd.org wrote:

 On Wed, Apr 15, 2015 at 02:44:44PM -0700, Nick Rogers wrote:
  On Mon, Mar 9, 2015 at 9:19 AM, Nick Rogers ncrog...@gmail.com wrote:
   Is anyone working on fixing this problem? It seems like this should
 have
   some kind of full court press as it is obviously affecting plenty of
   people, some of which have spoken up in the following PR
  
   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458
  
   I realize it's a tough problem to track down, and if I had the
 appropriate
   skills I would help. But so far all I've been able to do, like others,
 is
   replicate and complain about the problem.
  
   It's still affecting upgrading to 10.1-RELEASE-p6 from the official
   10.1-RELEASE distribution, and from 10.1-RELEASE-p5. I just had another
   production server hang during reboot after updating to p6, and I don't
 see
   this changing for the inevitable p7 unless this problem gets more
   attention. Can someone with the right skill-set please help figure this
   out? Thank you.
  
 
  In case anyone is still dealing with this problem, the fix was MFC'd to
  stable/10 a few days ago. I am assuming this will not end up getting
  backported to releng/10.1.

 An EN for 10.1-RELEASE is planned.


Oh, that's good to know. Thank you. I had asked that in the related bug a
week or so ago and did not receive a response, so I figured it would not.
Any idea of the ETA for the EN?


 Glen




Re: freebsd-update and hang during reboot

2015-04-15 Thread Nick Rogers
On Mon, Mar 9, 2015 at 9:19 AM, Nick Rogers ncrog...@gmail.com wrote:



 On Tue, Feb 10, 2015 at 1:37 PM, Nick Rogers ncrog...@gmail.com wrote:



 On Mon, Feb 9, 2015 at 9:08 AM, Ian Lepore i...@freebsd.org wrote:

 On Mon, 2015-02-09 at 11:41 -0500, Kurt Lidl wrote:
  Joel wrote:
   Hi,
  
   Just about every machine I have seems to hang after running
 freebsd-update and doing a reboot. The last message on the screen is “All
 buffers synced” and it just freezes.
  
   This happens when doing a freebsd-update and going from 10.0 to
 10.1, but also when doing a fresh 10.1 install and using freebsd-update to
 get the latest -pX security patches. As soon as I reboot the machine, it
 hangs.
  
   I’ve tried it on several different HP ProLiant models, on Intel NUCs
 and on VMware virtual machines. Same phenomenon everywhere. It’s really
 easy to trigger: just install 10.1, use default settings everywhere,
 freebsd-update fetch/install, shutdown -r now and BOOM. It hangs. I think
 I’ve seen it on 30 servers or so now.
  
   Everything works like it should after the initial hang though - no
 matter how many times I reboot it completes the reboot cycle just fine.
  
   I’ve seen several people (mostly on IRC) mention this problem, but
 no solution.
  
   Is anyone working on fixing this?
 
  I ran into this problem in spades when upgrading a set of servers from
  FreeBSD 9.0 to 9.1.  It happened consistently.  Normal reboots worked,
  but when going from 9.0 to 9.1, it *ALWAYS* hung, and it always hung
  at the same place, after printing the "All buffers synced" message.
 
  I ultimately determined that if I did the following, rather than
  just a reboot or shutdown -r now 'FreeBSD 9.1-RELEASE upgrade',
  it would consistently AVOID the hang:
 
  sync ; sync ; sync ; shutdown -o -n -r now FreeBSD 9.1 install
 
  Your mileage may vary, but you don't have a lot to lose by trying it.
 
  -Kurt
 

 That is just bad advice.  sync(1) does not g'tee that all data has been
 written, no matter how many times you type it.  shutdown -n tells the
 system to abandon unwritten data.  All in all, this is a recipe for
 silent filesystem corruption.  Using it after an update is just asking
 to have a mix of old and new files on the system after the reboot.

 A more robust workaround would be to mount -r on all filesystems
 before invoking the shutdown (even a shutdown -n should be safe after
 everything has been remounted readonly).  If the mount -r hangs on one
 of the filesystems, then you've probably got a clue as to where a normal
 shutdown is hanging.


 FWIW mount -r on the root filesystem hangs for me. If I disable
 softupdates-journaling on the root filesystem before the upgrade process,
 the system no longer hangs on the last reboot after userland upgrade.
 However, the root filesystem still comes up dirty with an incorrect free
 block count during fsck.


  Is anyone working on fixing this problem? It seems like this should have
 some kind of full court press as it is obviously affecting plenty of
 people, some of which have spoken up in the following PR

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

 I realize it's a tough problem to track down, and if I had the appropriate
 skills I would help. But so far all I've been able to do, like others, is
 replicate and complain about the problem.

 It's still affecting upgrading to 10.1-RELEASE-p6 from the official
 10.1-RELEASE distribution, and from 10.1-RELEASE-p5. I just had another
 production server hang during reboot after updating to p6, and I don't see
 this changing for the inevitable p7 unless this problem gets more
 attention. Can someone with the right skill-set please help figure this
 out? Thank you.


In case anyone is still dealing with this problem, the fix was MFC'd to
stable/10 a few days ago. I am assuming this will not end up getting
backported to releng/10.1. I've compiled a patch with the fix that works
against 10.1-RELEASE. Maybe it will be useful for any of you who, like me,
don't run 10-stable but are comfortable with custom kernels and are still
dealing with this issue when running freebsd-update every time a new patch
level is released. Diff is below.

# Fix bug causing a hang while unmounting the root filesystem during
# reboot after performing a freebsd-update.
#
#
# Original commit to HEAD:
# https://svnweb.freebsd.org/base?view=revision&revision=280760
# MFC to stable:
# https://svnweb.freebsd.org/base?view=revision&revision=281350
#
# The following commits were taken from stable/10/sys/ufs/ffs between
# the release of 10.1-RELEASE (r272459) and MFC of the fix (r281350)
# in order for the fix to cleanly apply to releng/10.1. The two
# unrelated commits seem like reasonable fixes to include as well.
#
# https://svnweb.freebsd.org/base?view=revision&revision=281350
# https://svnweb.freebsd.org/base?view=revision&revision=278667
# https://svnweb.freebsd.org/base?view=revision&revision=274305
#
Index: ufs/ffs/ffs_vfsops.c

Re: freebsd-update and hang during reboot

2015-03-09 Thread Nick Rogers
On Tue, Feb 10, 2015 at 1:37 PM, Nick Rogers ncrog...@gmail.com wrote:



 On Mon, Feb 9, 2015 at 9:08 AM, Ian Lepore i...@freebsd.org wrote:

 On Mon, 2015-02-09 at 11:41 -0500, Kurt Lidl wrote:
  Joel wrote:
   Hi,
  
   Just about every machine I have seems to hang after running
 freebsd-update and doing a reboot. The last message on the screen is “All
 buffers synced” and it just freezes.
  
   This happens when doing a freebsd-update and going from 10.0 to 10.1,
 but also when doing a fresh 10.1 install and using freebsd-update to get
 the latest -pX security patches. As soon as I reboot the machine, it hangs.
  
   I’ve tried it on several different HP ProLiant models, on Intel NUCs
 and on VMware virtual machines. Same phenomenon everywhere. It’s really
 easy to trigger: just install 10.1, use default settings everywhere,
 freebsd-update fetch/install, shutdown -r now and BOOM. It hangs. I think
 I’ve seen it on 30 servers or so now.
  
   Everything works like it should after the initial hang though - no
 matter how many times I reboot it completes the reboot cycle just fine.
  
   I’ve seen several people (mostly on IRC) mention this problem, but no
 solution.
  
   Is anyone working on fixing this?
 
  I ran into this problem in spades when upgrading a set of servers from
  FreeBSD 9.0 to 9.1.  It happened consistently.  Normal reboots worked,
  but when going from 9.0 to 9.1, it *ALWAYS* hung, and it always hung
  at the same place, after printing the "All buffers synced" message.
 
  I ultimately determined that if I did the following, rather than
  just a reboot or shutdown -r now 'FreeBSD 9.1-RELEASE upgrade',
  it would consistently AVOID the hang:
 
  sync ; sync ; sync ; shutdown -o -n -r now FreeBSD 9.1 install
 
  Your mileage may vary, but you don't have a lot to lose by trying it.
 
  -Kurt
 

 That is just bad advice.  sync(1) does not g'tee that all data has been
 written, no matter how many times you type it.  shutdown -n tells the
 system to abandon unwritten data.  All in all, this is a recipe for
 silent filesystem corruption.  Using it after an update is just asking
 to have a mix of old and new files on the system after the reboot.

 A more robust workaround would be to mount -r on all filesystems
 before invoking the shutdown (even a shutdown -n should be safe after
 everything has been remounted readonly).  If the mount -r hangs on one
 of the filesystems, then you've probably got a clue as to where a normal
 shutdown is hanging.


 FWIW mount -r on the root filesystem hangs for me. If I disable
 softupdates-journaling on the root filesystem before the upgrade process,
 the system no longer hangs on the last reboot after userland upgrade.
 However, the root filesystem still comes up dirty with an incorrect free
 block count during fsck.


Is anyone working on fixing this problem? It seems like this should have
some kind of full-court press, as it is obviously affecting plenty of
people, some of whom have spoken up in the following PR:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

I realize it's a tough problem to track down, and if I had the appropriate
skills I would help. But so far all I've been able to do, like others, is
replicate and complain about the problem.

It's still affecting upgrades to 10.1-RELEASE-p6 from the official
10.1-RELEASE distribution, and from 10.1-RELEASE-p5. I just had another
production server hang during reboot after updating to p6, and I don't see
this changing for the inevitable p7 unless this problem gets more
attention. Can someone with the right skill set please help figure this
out? Thank you.


 -- Ian



Re: netstat output changes in 8.0?

2010-06-09 Thread Nick Rogers
On Tue, Jan 26, 2010 at 12:49 PM, Nick Rogers ncrog...@gmail.com wrote:

 Thanks a lot. That's a bummer. What are the chances of getting something
 like that worked into arp(8) permanently?


I recently noticed that arp(8) was changed a few months back to show when an
entry expires. Thanks!

http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.sbin/arp/arp.c?rev=1.75


Re: arp -na performance w/ many permanent entries

2010-06-09 Thread Nick Rogers
On Sun, Jun 6, 2010 at 4:23 PM, Nick Rogers ncrog...@gmail.com wrote:



 On Sat, Jun 5, 2010 at 11:54 PM, Garrett Cooper yanef...@gmail.com wrote:


 I agree with Jeremy. I think that the problem that you've
 discovered is the fact that it's using stdio-based buffered output
 instead of buffering more of the contents in a string and punting it
 out in larger chunks.
 HTH,
 -Garrett


 I don't think so. The performance difference when taking out the interface
 lookup is huge even though the data output to STDOUT is mostly the same.
 I'll try the other lists, thanks.


FYI there is a bugfix/patch for this issue being discussed in
freebsd-hackers:

http://www.mail-archive.com/freebsd-hack...@freebsd.org/msg157097.html

Thanks again for suggesting I try another list.


Re: arp -na performance w/ many permanent entries

2010-06-06 Thread Nick Rogers
On Sat, Jun 5, 2010 at 11:54 PM, Garrett Cooper yanef...@gmail.com wrote:


 I agree with Jeremy. I think that the problem that you've
 discovered is the fact that it's using stdio-based buffered output
 instead of buffering more of the contents in a string and punting it
 out in larger chunks.
 HTH,
 -Garrett


I don't think so. The performance difference when taking out the interface
lookup is huge even though the data output to STDOUT is mostly the same.
I'll try the other lists, thanks.


Re: arp -na performance w/ many permanent entries

2010-06-05 Thread Nick Rogers
On Mon, May 31, 2010 at 10:54 PM, Nick Rogers ncrog...@gmail.com wrote:


 [root@ ~]# time arp -na > /dev/null

 real 0m12.761s
 user 0m2.959s
 sys 0m9.753s
 [root@ ~]#


 Notice that arp -na takes about 13s to execute even though there is no
 other load. This can get a lot worse by a few orders of magnitude on a
 loaded machine in a production environment, and seems to scale up linearly
 when more aliases are added to the interface (permanent ARP entries
 created).

 Is this a reasonable problem that can be fixed/improved, or am I stuck with
 the slow arp -na output? Any help or comments are greatly appreciated.


I tried the same scenario on 8.1-BETA1 and it still takes a very long time
for arp(8) to complete.

I was able to isolate the performance bottleneck to a small piece of the
arp(8) code. It seems that looking up the interface for an ARP entry is a
very heavy operation when that entry corresponds to an alias assigned to the
interface. Permanent ARP entries that do not correspond with an interface
alias do not seem to cause arp(8) to puke on the interface lookup.

The following commands and code diff illustrate how arp(8) can be modified
to run a lot faster in this scenario, but obviously the associated interface
is no longer printed for each entry.

[root@ /usr/src/usr.sbin/arp]# uname -a
FreeBSD .localdomain 8.1-BETA1 FreeBSD 8.1-BETA1 #0: Thu May 27 15:03:30 UTC
2010 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
[root@ /usr/src/usr.sbin/arp]# time /usr/sbin/arp -na | wc -l
4100

real 0m14.903s
user 0m3.133s
sys 0m11.519s
[root@ /usr/src/usr.sbin/arp]# pwd
/usr/src/usr.sbin/arp
[root@ /usr/src/usr.sbin/arp]# !diff
diff -ruN arp.c.orig arp.c
--- arp.c.orig 2010-06-05 18:25:24.000000000 +0000
+++ arp.c 2010-06-05 18:28:19.000000000 +0000
@@ -562,7 +562,7 @@
 	const char *host;
 	struct hostent *hp;
 	struct iso88025_sockaddr_dl_data *trld;
-	char ifname[IF_NAMESIZE];
+	//char ifname[IF_NAMESIZE];
 	int seg;
 
 	if (nflag == 0)
@@ -591,8 +591,8 @@
 		}
 	} else
 		printf("(incomplete)");
-	if (if_indextoname(sdl->sdl_index, ifname) != NULL)
-		printf(" on %s", ifname);
+	//if (if_indextoname(sdl->sdl_index, ifname) != NULL)
+		//printf(" on %s", ifname);
 	if (rtm->rtm_rmx.rmx_expire == 0)
 		printf(" permanent");
 	else {
[root@ /usr/src/usr.sbin/arp]# make clean && make
rm -f arp arp.o arp.4.gz arp.8.gz arp.4.cat.gz arp.8.cat.gz
Warning: Object directory not changed from original /usr/src/usr.sbin/arp
cc -O2 -pipe  -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall
-Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign -c
arp.c
cc -O2 -pipe  -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall
-Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign
 -o arp arp.o
gzip -cn arp.4 > arp.4.gz
gzip -cn arp.8 > arp.8.gz
[root@ /usr/src/usr.sbin/arp]# time ./arp -na | wc -l
4099

real 0m0.036s
user 0m0.015s
sys 0m0.021s
[root@ /usr/src/usr.sbin/arp]#

Notice that 0.036s without the interface lookup is a heck of a lot faster
than 14.903s when doing the interface lookup.

Is there something that can be done to speed up the call to if_indextoname(),
or would it be worthwhile for me to submit a patch that adds the ability to
skip the interface lookup as an arp(8) option?


arp -na performance w/ many permanent entries

2010-05-31 Thread Nick Rogers
I have an 8.0-RELEASE system with 4000 permanent ARP entries due to having
a network interface (em(4)) configured with 4000 aliases. The arp -na
command takes what I consider to be an extremely long time to finish (up to
30s on an otherwise unloaded system). I am able to replicate this in a test
environment by installing 8.0-RELEASE-amd64 on a VMware VM w/ 1GB of RAM and
a 2GHz CPU. The figure of 4000 aliases/entries is arbitrary, but nicely
illustrates the performance problem.

The performance is much worse on a real/loaded system. I realize that 4k
aliases on an interface is unusual, but I have been effectively using this
configuration in my network to try and keep my end users each on his/her
own broadcast domain. The box is a router and I allocate addresses to each
user and put each on his/her own subnet with a netmask of /30. If you would
like more info on this I can provide it, but it has worked effectively in
FreeBSD 6.0-7.2. The slow performance of arp -na is an issue for me
because I have a web/CGI tool that runs various reports, many of them
relying on acquiring the current ARP table, and the performance of arp(8)
makes the web interface extremely slow.
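
As a possible stopgap for a report tool like the one described above (not a fix for arp(8) itself), the ARP table could be fetched through a small caching wrapper so that several reports share one slow arp -na run. The sketch below is only an illustration: the function name cached_run and the cache path are made up for this example, and it assumes find(1) supports -mmin, which is not POSIX but is available on both FreeBSD and GNU find.

```shell
#!/bin/sh
# cached_run TTL_MINUTES CACHE_FILE COMMAND [ARGS...]
# Prints the cached output of COMMAND if CACHE_FILE is non-empty and was
# modified less than TTL_MINUTES ago; otherwise reruns COMMAND, replaces
# the cache via rename, and prints the fresh output.
# (cached_run is a hypothetical helper, not part of FreeBSD.)
cached_run() {
    ttl_min=$1
    cache=$2
    shift 2

    # find(1) prints the path only when the file is newer than the TTL.
    if [ -s "$cache" ] &&
       [ -n "$(find "$cache" -mmin "-$ttl_min" 2>/dev/null)" ]; then
        cat "$cache"
        return 0
    fi

    # Write to a temporary file first so readers never see partial output.
    tmp="$cache.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$cache" && cat "$cache"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (hypothetical cache path): rerun arp at most once a minute.
# cached_run 1 /var/tmp/arp-na.cache arp -na
```

The trade-off is staleness: reports may see an ARP table up to TTL_MINUTES old, which may or may not be acceptable depending on what the reports are used for.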

I believe the problem was introduced between 7.2 and 8.0, when, as far as I
understand, parts of the ARP subsystem were improved. In 7.2, the aliases
configured on an interface were not considered ARP entries (at least
according to arp(8)), but as of 8.0 they are marked as permanent ARP
entries and displayed by arp(8), which seems to contribute to the performance
problem.

I ran the following perl script to set up my test system. This script was run
after installing 8.0-RELEASE and adding the bash, perl, and p5-NetAddr-IP
packages via pkg_add -r.

#!/usr/bin/perl

use strict;
use diagnostics;

use NetAddr::IP;

my $interface = 'em1';
my $cidr      = '10.0.0.1/18';

# configure the interface with 4000 or so aliases
foreach my $na (@{NetAddr::IP->new($cidr)->splitref(30)}) {
    my $ip    = $na->addr();
    my $mask  = $na->mask();
    my $bcast = $na->broadcast()->addr();

    my $cmd = "ifconfig $interface inet alias $ip netmask $mask broadcast $bcast";
    print STDERR "$cmd\n";
    system($cmd);
}
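
For anyone who wants to reproduce the setup without installing p5-NetAddr-IP, the same alias list can be generated with plain sh arithmetic. This is only a sketch of the idea: gen_alias_cmds is a made-up name, the 10.0.0.0/18 range is hard-coded to match the script above, and the function only prints the ifconfig commands rather than running them.

```shell
#!/bin/sh
# gen_alias_cmds IFACE
# Prints one "ifconfig ... alias" command per /30 subnet of 10.0.0.0/18.
# A /18 holds 2^(32-18) = 16384 addresses, i.e. 16384 / 4 = 4096 /30 subnets.
# (gen_alias_cmds is a hypothetical helper written for this illustration.)
gen_alias_cmds() {
    iface=$1
    i=0
    while [ "$i" -lt 4096 ]; do
        base=$((i * 4))        # offset of the subnet's first address
        o3=$((base / 256))     # third octet: 0..63
        o4=$((base % 256))     # fourth octet: 0, 4, ..., 252
        echo "ifconfig $iface inet alias 10.0.$o3.$o4 netmask 255.255.255.252 broadcast 10.0.$o3.$((o4 + 3))"
        i=$((i + 1))
    done
}

# Review the output first, then pipe it to sh as root to apply it:
# gen_alias_cmds em1 | sh
```

The count of 4096 subnets is consistent with the roughly 4100 lines that arp -na reports in the results below.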


The results are as follows:

[root@ ~]# uname -a
FreeBSD .localdomain 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
[root@ ~]# ifconfig -a | wc -l
4113
[root@ ~]# ifconfig -a | head -15
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:0c:29:65:4d:3e
	inet 172.16.16.244 netmask 0xffffff00 broadcast 172.16.16.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:0c:29:65:4d:48
	inet 10.0.0.0 netmask 0xfffffffc broadcast 10.0.0.3
	inet 10.0.0.4 netmask 0xfffffffc broadcast 10.0.0.7
	inet 10.0.0.8 netmask 0xfffffffc broadcast 10.0.0.11
	inet 10.0.0.12 netmask 0xfffffffc broadcast 10.0.0.15
	inet 10.0.0.16 netmask 0xfffffffc broadcast 10.0.0.19
	inet 10.0.0.20 netmask 0xfffffffc broadcast 10.0.0.23
[root@ ~]# time ifconfig -a > /dev/null

real 0m0.032s
user 0m0.023s
sys 0m0.008s


[root@ ~]# arp -na | wc -l
4100
[root@ ~]# arp -na | tail -15
? (10.0.5.80) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.5.48) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.5.16) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.244) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.212) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.180) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.148) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.116) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.84) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.52) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (10.0.1.20) at 00:0c:29:65:4d:48 on em1 permanent [ethernet]
? (172.16.16.1) at 00:50:56:c0:00:08 on em0 [ethernet]
? (172.16.16.2) at 00:50:56:ea:ea:1a on em0 [ethernet]
? (172.16.16.254) at 00:50:56:f2:75:00 on em0 [ethernet]
? (172.16.16.244) at 00:0c:29:65:4d:3e on em0 permanent [ethernet]
[root@ ~]# uptime
 7:28PM  up 42 mins, 2 users, load averages: 0.00, 0.00, 0.00
[root@ ~]# time arp -na > /dev/null

real 0m12.761s
user 0m2.959s
sys 0m9.753s
[root@ ~]#


Notice that arp -na takes about 13s to execute even though there is no
other load. This can get a lot worse by a few orders of magnitude on a
loaded machine in a production environment, and seems to scale up linearly
when more aliases are added to the interface (permanent ARP entries
created).

Is this a reasonable problem that can be fixed/improved, or am I stuck with
the slow arp -na output? Any help or comments is greatly appreciated.
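
To see how much of the table consists of alias-derived permanent entries, the arp -na output can be tallied with a short awk filter. This is a rough sketch, not something from the thread: count_arp_entries is a made-up name, and it assumes the output format shown above, where permanent entries carry the word "permanent" before the trailing [ethernet] tag.

```shell
#!/bin/sh
# count_arp_entries: reads `arp -na` output on stdin and prints how many
# entries are marked permanent and how many are not.
# (count_arp_entries is a hypothetical helper written for this illustration.)
count_arp_entries() {
    awk '
        / permanent / { perm++; next }   # alias-derived or static entries
        NF            { dyn++ }          # any other non-empty line
        END { printf "permanent=%d dynamic=%d\n", perm, dyn }
    '
}

# Usage:
# arp -na | count_arp_entries
```

On the test system above this should show that nearly all of the 4100 entries are permanent, which matches the observation that the slowdown tracks the number of aliases.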

Re: em(4) interface hangs under 8.0-RELEASE

2010-03-06 Thread Nick Rogers
Yes, this was the first em(4) problem I ran into when upgrading from
7.2-RELEASE to 8.0-RELEASE. You and others on another thread eventually
recommended turning off TSO and so on. I never had a chance to thoroughly
test this solution on this particular hardware because we had already
switched to a different set of interfaces (on-motherboard bge(4)). We also
had that ALTQ problem pop up on em, which I'm sure you remember, and which
prevented me from going back to the em interfaces for a while.

After solving the ALTQ problem by going back to the 7.2-RELEASE driver, I
thought it would be OK to switch this particular hardware back to using the
em interfaces in production (we were also experiencing kernel panics due to
some bge(4) issue a few times a week). I had assumed, incorrectly, that
because we were using the em(4) driver from 7.2-RELEASE, there would be no
more hanging problem. I've also tried the latest CURRENT em(4) driver.

So I am pretty sure I have experienced this problem under 8.0-RELEASE,
8.0-RELEASE w/ em(4) driver from 7.2, and 8.0-RELEASE w/ em(4) driver from
CVS HEAD.

On Sat, Mar 6, 2010 at 10:39 AM, Jack Vogel jfvo...@gmail.com wrote:

 I need a bit more context Nick. Is this a card that has been
 non-problematic
 on older releases and just showed a problem with 8.0 REL?

 Regards,

 Jack




Re: em(4) interface hangs under 8.0-RELEASE

2010-03-06 Thread Nick Rogers
ALTQ + RELENG_8 + em(4) will not work at the moment. It does not matter what
your PF ruleset looks like or how much traffic you are pushing. The packets
that transit the em interface simply never make it to the ALTQ queues (not
even the interface's root queue). Thus any kind of bandwidth rate limiting
or whatever you are doing will not work.

This was fixed by the following commit. I think it's supposed to be MFCed soon?
http://svn.freebsd.org/viewvc/base?view=revision&revision=203834

On Sat, Mar 6, 2010 at 12:21 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:


 Why I care: upgrading our RELENG_7 machine which uses ALTQ directives is
 on my to-do list, and if this feature is somehow broken under RELENG_8,
 I need to know in advance so I can use ipfw + dummynet instead.

 --
 | Jeremy Chadwick   j...@parodius.com |
 | Parodius Networking   http://www.parodius.com/ |
 | UNIX Systems Administrator  Mountain View, CA, USA |
 | Making life hard for others since 1977.  PGP: 4BD6C0CB |




em(4) interface hangs under 8.0-RELEASE

2010-03-05 Thread Nick Rogers
I'm still having a problem where an em(4) interface mysteriously hangs and
mostly stops sending/receiving packets until I issue an ifconfig emX down
followed by an ifconfig emX up, which fixes the problem for some amount of
time. Traffic on the interface is fairly consistent at about 3mb/s.
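
Until the underlying cause is found, the down/up workaround described above could be automated from cron. The following is only a sketch that papers over the symptom: bounce_if_unreachable is a made-up name, the probe target is whatever neighbour you expect to be reachable through the interface, and the PING and IFCONFIG variables exist purely so the logic can be dry-run with stand-in commands.

```shell
#!/bin/sh
# bounce_if_unreachable IFACE TARGET
# Pings TARGET; if no reply comes back, takes IFACE down and up again.
# PING and IFCONFIG may be overridden, e.g. IFCONFIG=echo for a dry run.
# (bounce_if_unreachable is a hypothetical helper, not an existing tool.)
: "${PING:=ping}"
: "${IFCONFIG:=ifconfig}"

bounce_if_unreachable() {
    iface=$1
    target=$2

    # -c 3: send three probes; ping exits 0 if at least one is answered.
    if $PING -c 3 "$target" > /dev/null 2>&1; then
        return 0
    fi

    echo "$(date): $target unreachable, bouncing $iface" >&2
    $IFCONFIG "$iface" down && $IFCONFIG "$iface" up
}

# Example cron usage (the address is a placeholder for a known neighbour):
# bounce_if_unreachable em1 172.31.1.1
```

Logging each bounce to stderr (or syslog) is worthwhile because the timestamps give a record of how often the hang occurs, which is useful information for a PR.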

One interesting thing to note is that if I tcpdump the interface during the
hang, I sometimes see a portion of the expected packets, usually only
outbound.

I've tried compiling a custom 8.0-RELEASE kernel with the em(4) driver
(sys/dev/e1000) from 7.2-RELEASE as well as the same from cvs HEAD. Neither
seem to fix the problem.

I've also tried, as suggested in a previous thread, disabling TSO, TXCSUM,
RXCSUM via the following:

sysctl net.inet.tcp.tso=0
ifconfig em1 -tso -txcsum
ifconfig em1 down
ifconfig em1 up
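
To confirm that the offload features really are off after commands like these, the options= line of the ifconfig output can be checked. The filter below is a small sketch under the assumption that ifconfig prints capabilities as options=HEX<NAME,NAME,...>; enabled_offloads is a made-up name for this example.

```shell
#!/bin/sh
# enabled_offloads: reads `ifconfig IFACE` output on stdin and prints, one
# per line, any checksum/segmentation offloads still listed in options=<...>.
# Exit status is non-zero when none are enabled (grep found nothing).
# (enabled_offloads is a hypothetical helper written for this illustration.)
enabled_offloads() {
    sed -n 's/.*options=[0-9a-f]*<\(.*\)>.*/\1/p' |
        tr ',' '\n' |
        grep -E '^(RXCSUM|TXCSUM|TSO4|TSO6|LRO)$'
}

# Usage:
# ifconfig em1 | enabled_offloads || echo "no offloads enabled on em1"
```

An interface whose options line lists only VLAN_MTU, VLAN_HWTAGGING and VLAN_HWCSUM would print nothing, which is the state the commands above are aiming for.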

Relevant ifconfig and pciconf dump below. There are no attached VLAN
interfaces.

em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:04:23:ca:a7:b7
	inet 172.31.1.3 netmask 0xfffffff8 broadcast 172.31.1.7
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

e...@pci0:2:2:1: class=0x020000 card=0x11798086 chip=0x10798086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
    class      = network
    subclass   = ethernet
    cap 01[dc] = powerspec 2  supports D0 D3  current D0
    cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 05[f0] = MSI supports 1 message, 64 bit

Any suggestions are greatly appreciated. Thanks.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-03-02 Thread Nick Rogers
Second that. Daily panics using a Tyan board w/ BCM5704. Unfortunately I am
unable to provide a crash dump and I was forced to use a different NIC. But
for what it's worth, here is the relevant pciconf -lv output.

b...@pci0:2:9:0: class=0x020000 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
b...@pci0:2:9:1: class=0x020000 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet


On Sat, Feb 27, 2010 at 2:50 PM, Erik Klavon er...@berkeley.edu wrote:

 I have BCM5704 hardware (Tyan S2882 system board). I am seeing kernel
 panics very similar to those described in this thread on this
 hardware. pciconf -lcv output below. If you'd like access to this
 hardware I can arrange it; please contact me off list.

 Erik



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-15 Thread Nick Rogers
hw.bge.allow_asf: 0

On Mon, Feb 15, 2010 at 2:23 AM, Giacomo Olgeni g.olg...@colby.it wrote:


 Hello,

 Are you running with hw.bge.allow_asf enabled?





trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-14 Thread Nick Rogers
I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
shed light on the below error? I unfortunately cannot provide a proper crash
dump. The pointer addresses are always the same. The only other thing I've
noticed that may be related is a "watchdog timeout" error on bge0 before the
panic. Thanks.

Jan 27 15:25:01 wifi kernel:
Jan 27 15:25:01 wifi kernel:
Jan 27 15:25:01 wifi kernel: Fatal trap 12: page fault while in kernel mode
Jan 27 15:25:01 wifi kernel: cpuid = 4; apic id = 04
Jan 27 15:25:02 wifi kernel:
Jan 27 15:25:02 wifi kernel: fault virtual address  = 0x28
Jan 27 15:25:02 wifi kernel: fault code = supervisor write data, page not present
Jan 27 15:25:02 wifi kernel: instruction pointer= 0x20:0x803263b7
Jan 27 15:25:02 wifi kernel: stack pointer  = 0x28:0xff8073acdb40
Jan 27 15:25:02 wifi kernel: frame pointer  = 0x28:0xff8073acdba0
Jan 27 15:25:02 wifi kernel: code segment   = base 0x0, limit 0xf, type 0x1b
Jan 27 15:25:02 wifi kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 27 15:25:02 wifi kernel: processor eflags   =
Jan 27 15:25:02 wifi kernel: interrupt enabled,
Jan 27 15:25:02 wifi kernel: resume,
Jan 27 15:25:02 wifi kernel: IOPL = 0
Jan 27 15:25:02 wifi kernel:
Jan 27 15:25:02 wifi kernel: current process


Re: em(4) + ALTQ broken

2010-02-11 Thread Nick Rogers
Anyone else get a chance to review this?

On Fri, Feb 5, 2010 at 8:44 PM, Nick Rogers ncrog...@gmail.com wrote:

 I applied drbr_altq.diff to the e1000 driver (sys/dev/e1000) from HEAD on
 top of 8.0-RELEASE kernel sources. It appears to have fixed the immediate
 problem where queues simply don't work on em interfaces. Thanks a bunch.

 I suppose further review and testing by others would be greatly appreciated
 from my point of view. I am trying to decide on a relatively stable 8.0
 kernel with working em(4) + ALTQ to put into production on 100 or so
 installations. Are you guys more comfortable with the HEAD sys/dev/e1000 +
 this patch on top of 8.0-RELEASE, or e1000 from 7.2 on top of 8.0-RELEASE?
 So far I am having good luck with the latter. Thanks again for your
 contributions!


 On Thu, Feb 4, 2010 at 6:51 PM, Max Laier m...@love2party.net wrote:

 Okay ... attached is a patch to fix this for em(4) (and lay the groundwork
 to
 fix it for other drbr_* consumer as well).  I have tested it in
 VirtualBox,
 but don't have real hardware to check for non-ALTQ performance or other
 regressions.

 Test, comments and review appreciated.

 --
   Max





Re: PF Traffic Redirection issues

2010-02-05 Thread Nick Rogers
On Fri, Feb 5, 2010 at 9:41 AM, Spas Karabelov st...@sofiahouse.net wrote:

 Hello,

 I am trying to perform traffic redirection with PF on 7.2-RELEASE.
 The traffic is in the same subnet and I try doing that by using just one
 interface em0.


PF cannot redirect packets back out the interface they originated on.

From pf.conf(5)...

Redirections cannot reflect packets back through the interface they arrive
on, they can only be redirected to hosts connected to different interfaces
or to the firewall itself.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: em(4) + ALTQ broken

2010-02-05 Thread Nick Rogers
I applied drbr_altq.diff to the e1000 driver (sys/dev/e1000) from HEAD on
top of 8.0-RELEASE kernel sources. It appears to have fixed the immediate
problem where queues simply don't work on em interfaces. Thanks a bunch.

I suppose further review and testing by others would be greatly appreciated
from my point of view. I am trying to decide on a relatively stable 8.0
kernel with working em(4) + ALTQ to put into production on 100 or so
installations. Are you guys more comfortable with the HEAD sys/dev/e1000 +
this patch on top of 8.0-RELEASE, or e1000 from 7.2 on top of 8.0-RELEASE?
So far I am having good luck with the latter. Thanks again for your
contributions!

On Thu, Feb 4, 2010 at 6:51 PM, Max Laier m...@love2party.net wrote:

 Okay ... attached is a patch to fix this for em(4) (and lay the groundwork
 to
 fix it for other drbr_* consumer as well).  I have tested it in VirtualBox,
 but don't have real hardware to check for non-ALTQ performance or other
 regressions.

 Test, comments and review appreciated.

 --
   Max



Re: em(4) + ALTQ broken

2010-02-02 Thread Nick Rogers
 I guess the problem comes from multi-queue support. The drbr
 interface is implemented with inline function so em(4)/igb(4) may
 have to define ALTQ to the header. I have not tested the patch(no
 time at this moment) but would you give it try?

 I tried the patch and it did not work.


Re: em(4) + ALTQ broken

2010-02-02 Thread Nick Rogers
On Tue, Feb 2, 2010 at 9:37 AM, Pyun YongHyeon pyu...@gmail.com wrote:

 On Tue, Feb 02, 2010 at 09:30:52AM -0800, Nick Rogers wrote:
   I guess the problem comes from multi-queue support. The drbr
   interface is implemented with inline function so em(4)/igb(4) may
   have to define ALTQ to the header. I have not tested the patch(no
   time at this moment) but would you give it try?
  
   I tried the patch and it did not work.

 You rebuilt kernel, right? Rebuilding kernel module has no effect.

Yes I rebuilt the kernel itself and replaced the one on my test system.


em(4) + ALTQ broken

2010-01-31 Thread Nick Rogers
I'm having problems getting PF + ALTQ to work on em(4) interfaces under
8.0-RELEASE. Kernel was rebuilt with the additional options necessary for
ALTQ and what not. Same basic configuration works fine under 7.2-RELEASE.
Basically, the queues create successfully but running a pfctl -vsq shows a
zero packet/byte count for all queues, even the interface's root queues.
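
A quick way to spot this symptom in a script is to pull the packet counter for each queue out of the pfctl -vsq output. The sketch below assumes the usual layout, where each "queue NAME on IFACE ..." line is followed by a bracketed statistics line starting with "[ pkts:"; queue_pkts is a made-up name for this example.

```shell
#!/bin/sh
# queue_pkts: reads `pfctl -vsq` output on stdin and prints "<queue> <pkts>"
# for every queue, using the first "pkts:" counter on the statistics line.
# (queue_pkts is a hypothetical helper; the output layout is an assumption.)
queue_pkts() {
    awk '
        $1 == "queue" { name = $2 }        # remember the current queue name
        $2 == "pkts:" { print name, $3 }   # "[ pkts: N bytes: N ..." line
    '
}

# Usage: list queues that have seen no packets at all.
# pfctl -vsq | queue_pkts | awk '$2 == 0 { print $1 }'
```

If every queue, including the root queue of a busy interface, shows zero, the traffic is bypassing ALTQ entirely rather than being misclassified by the PF rules.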

This same problem is mentioned in PR kern/138392, and the following forum
thread...
http://forums.freebsd.org/showthread.php?t=6656

em(4)/e1000 driver from CURRENT does not fix the problem. Building an
8.0-RELEASE kernel with the em(4) driver from 7.2-RELEASE fixes the problem
(i.e., replacing sys/dev/e1000 with that from 7.2).

One of the machines I'm experiencing this on has the following Intel
chipset...

[u...@foo ~]$ sysctl dev.em.0
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x040d class=0x020000
dev.em.0.%parent: pci3
dev.em.0.debug: -1
dev.em.0.stats: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100

Is this issue receiving any attention? I ask because one of the em(4) driver
contributors mentioned he had not heard of this problem in a recent thread
regarding a different em(4) bug, and this is a pretty serious problem for me
as I have many devices in production that need to be upgraded to 8.0, all
running Intel interfaces and relying on ALTQ.

I appreciate any updates or information and would be happy to test any
patches and/or provide more information. Thanks.


Re: em interface slow down on 8.0R

2010-01-29 Thread Nick Rogers
Does any of this have anything to do with the fact that ALTQ seems to be
broken for em(4) under 8.0-RELEASE? I just ran into this similar problem
today where my PF/ALTQ hfsc rules no longer seem to do anything on em
interfaces.

http://forums.freebsd.org/showthread.php?t=6656

Any information regarding this would be appreciated. Thanks.

On Fri, Jan 29, 2010 at 9:45 AM, Jack Vogel jfvo...@gmail.com wrote:

 No need, I set it up and tried it, and I was right, it does not fail if
 that
 routine is not used. The interesting thing is that the igb driver, which
 has the same code, works fine.

 In any case, I'm hot on the track of this and hope I can figure it out
 today.

 Jack


 On Fri, Jan 29, 2010 at 5:38 AM, Marco van Tol ma...@tols.org wrote:

  On Thu, Jan 28, 2010 at 11:16:02AM -0800, Jack Vogel wrote:
   I am investigating it, and have a suspicion about what's going on, you
  can
   assist in verifying my suspicion.  In if_em.c search for
  em_setup_vlan_hw,
   you will find a compile time option that uses that only if
  FreeBSD_version
   is  700029, hack the code however you wish so that it uses the OLD way
   (ie that it never calls em_setup_vlan_hw_support()) and see if that
 makes
   the issue disappear.
 
  Oh good, I will try that and let you know about the result first chance I
  get.  Should be days rather then hours, but I'll make it asap.
 
   If you have any problems or questions email me directly.
 
  Will do, thanks!
 
  Marco
 
 
 
 
   On Thu, Jan 28, 2010 at 4:17 AM, Marco van Tol ma...@tols.org wrote:
  
On Tue, Jan 26, 2010 at 09:00:35AM -0800, Nick Rogers wrote:
 Is it advisable to patch 8.0-RELEASE kernel sources with the latest
 (CURRENT) em driver (i.e., src/sys/dev/e1000)? It looks like there
  are
some
 updates to the driver since 8.0-RELEASE that may fix some problems?
   
While on the em subject, forgive me if I mail this to the
 inappropriate
place, but is there any ETA on progress for bug kern/141646?
   
I'm currently suffering from it and would be willing to provide
 needed
assistance for fixing it.
   
Thank you very much in advance,
   
Marco van Tol
 
  --
  Better to remain silent and be thought a fool
  than to speak out and remove all doubt.
  - Abraham Lincoln



Re: em interface slow down on 8.0R

2010-01-29 Thread Nick Rogers
I just discovered it myself today. I'll try and post more info in another
thread.

On Fri, Jan 29, 2010 at 5:43 PM, Jack Vogel jfvo...@gmail.com wrote:

 You know, i know absolutely nothing about ALTQ :) This is the first I've
 heard
 about this problem, you should make sure the maintainer of the driver gets
 informed sooner :)

 Would be happy to look into it as I have time.

 Jack



 On Fri, Jan 29, 2010 at 5:28 PM, Nick Rogers ncrog...@gmail.com wrote:

 Does any of this have anything to do with the fact that ALTQ seems to be
 broken for em(4) under 8.0-RELEASE? I just ran into this similar problem
 today where my PF/ALTQ hfsc rules no longer seem to do anything on em
 interfaces.

 http://forums.freebsd.org/showthread.php?t=6656

 Any information regarding this would be appreciated. Thanks.


 On Fri, Jan 29, 2010 at 9:45 AM, Jack Vogel jfvo...@gmail.com wrote:

 No need, I set it up and tried it, and I was right, it does not fail if
 that
 routine is not used. The interesting thing is that the igb driver, which
 has the same code, works fine.

 In any case, I'm hot on the track of this and hope I can figure it out
 today.

 Jack


 On Fri, Jan 29, 2010 at 5:38 AM, Marco van Tol ma...@tols.org wrote:

  On Thu, Jan 28, 2010 at 11:16:02AM -0800, Jack Vogel wrote:
   I am investigating it, and have a suspicion about what's going on,
 you
  can
   assist in verifying my suspicion.  In if_em.c search for
  em_setup_vlan_hw,
   you will find a compile time option that uses that only if
  FreeBSD_version
   is  700029, hack the code however you wish so that it uses the OLD
 way
   (ie that it never calls em_setup_vlan_hw_support()) and see if that
 makes
   the issue disappear.
 
  Oh good, I will try that and let you know about the result first chance
 I
  get.  Should be days rather then hours, but I'll make it asap.
 
   If you have any problems or questions email me directly.
 
  Will do, thanks!
 
  Marco
 
 
 
 
   On Thu, Jan 28, 2010 at 4:17 AM, Marco van Tol ma...@tols.org
 wrote:
  
On Tue, Jan 26, 2010 at 09:00:35AM -0800, Nick Rogers wrote:
 Is it advisable to patch 8.0-RELEASE kernel sources with the
 latest
 (CURRENT) em driver (i.e., src/sys/dev/e1000)? It looks like
 there
  are
some
 updates to the driver since 8.0-RELEASE that may fix some
 problems?
   
While on the em subject, forgive me if I mail this to the
 inappropriate
place, but is there any ETA on progress for bug kern/141646?
   
I'm currently suffering from it and would be willing to provide
 needed
assistance for fixing it.
   
Thank you very much in advance,
   
Marco van Tol
 
  --
  Better to remain silent and be thought a fool
  than to speak out and remove all doubt.
  - Abraham Lincoln
 






Re: em interface slow down on 8.0R

2010-01-29 Thread Nick Rogers
On Fri, Jan 29, 2010 at 5:43 PM, Jack Vogel jfvo...@gmail.com wrote:

 You know, i know absolutely nothing about ALTQ :) This is the first I've
 heard
 about this problem, you should make sure the maintainer of the driver gets
 informed sooner :)

 Looks like there is an old PR for it. See kern/138392.


Re: netstat output changes in 8.0?

2010-01-26 Thread Nick Rogers
Thanks a lot. That's a bummer. What are the chances of getting something like
that worked into arp(8) permanently?

On Tue, Jan 26, 2010 at 4:41 AM, Ruslan Ermilov r...@freebsd.org wrote:

 On Mon, Jan 25, 2010 at 07:01:46PM -0800, Nick Rogers wrote:
  Before 8.0-RELEASE, if I ran netstat -rn, it listed a separate route for
  each host on the network, along with its MAC address. For example ...
 
  172.20.172.17  00:02:b3:2f:64:6a  UHLW1 105712   1500
   vlan172595
  172.20.172.20  00:1e:c9:bb:7c:a9  UHLW1   1002   1500
   vlan172598
  172.20.172.22  00:14:5e:16:bb:b6  UHLW1107   1500
   vlan172491
 
  This behavior seems to have changed in 8.0, where now only the
  locally-assigned IP addresses and related CIDRs are displayed.

 From src/UPDATING:

 : 20081214:
 : __FreeBSD_version 800059 incorporates the new arp-v2 rewrite.
 : RTF_CLONING, RTF_LLINFO and RTF_WASCLONED flags are eliminated.
 : The new code reduced struct rtentry{} by 16 bytes on 32-bit
 : architecture and 40 bytes on 64-bit architecture. The userland
 : applications arp and ndp have been updated accordingly.
 : The output from netstat -r shows only routing entries and
 : none of the L2 information.

  Is there any way to get this behavior back, perhaps with a new flag that
 I
  am not able to find? Or some sysctl? I have a script that was relying on
  each host's expire flag in the routing table to determine when the MAC
  address first appeared on the network according to ARP.

 If you need to know when a particular ARP entry expires, a variation
 of the following patch can be used, perhaps hiding this output by the
 -v (verbose) option.

 %%%
 Index: arp.c
 ===================================================================
 --- arp.c   (revision 203016)
 +++ arp.c   (working copy)
 @@ -101,7 +101,8 @@
  static int nflag;  /* no reverse dns lookups */
  static char *rifname;

 -static int expire_time, flags, doing_proxy, proxy_only;
 +static time_t  expire_time;
 +static int flags, doing_proxy, proxy_only;

  /* which function we're supposed to do */
  #define F_GET  1
 @@ -594,6 +595,15 @@
 		printf(" on %s", ifname);
 	if (rtm->rtm_rmx.rmx_expire == 0)
 		printf(" permanent");
 +	else {
 +		static struct timeval tv;
 +		if (tv.tv_sec == 0)
 +			gettimeofday(&tv, 0);
 +		if ((expire_time = rtm->rtm_rmx.rmx_expire - tv.tv_sec) > 0)
 +			printf(" expires %d", (int)expire_time);
 +		else
 +			printf(" expired");
 +	}
 	if (addr->sin_other & SIN_PROXY)
 		printf(" published (proxy only)");
 	if (rtm->rtm_flags & RTF_ANNOUNCE)
 %%%


 Cheers,
 --
 Ruslan Ermilov
 r...@freebsd.org
 FreeBSD committer



Re: em interface slow down on 8.0R

2010-01-26 Thread Nick Rogers
Is it advisable to patch 8.0-RELEASE kernel sources with the latest
(CURRENT) em driver (i.e., src/sys/dev/e1000)? It looks like there are some
updates to the driver since 8.0-RELEASE that may fix some problems?

On Mon, Jan 25, 2010 at 8:31 PM, Joshua Boyd boy...@jbip.net wrote:

 I've been having a similar problem with my network dropping completely on
 my
 8-STABLE gateway/firewall/fileserver. My setup is a little different, as I
 have re0 and ral0 bridged for LAN, and em0 for WAN. I've just turned off TX
 checksum offloading to see if that makes any difference.



Re: em interface slow down on 8.0R

2010-01-26 Thread Nick Rogers
Looks like the patch mentioned in kern/141843 has not been applied to the
tree?

On Tue, Jan 26, 2010 at 9:00 AM, Nick Rogers ncrog...@gmail.com wrote:

 Is it advisable to patch 8.0-RELEASE kernel sources with the latest
 (CURRENT) em driver (i.e., src/sys/dev/e1000)? It looks like there are some
 updates to the driver since 8.0-RELEASE that may fix some problems?




Re: em interface slow down on 8.0R

2010-01-26 Thread Nick Rogers
Can anyone clarify if I should be looking to disable TSO or TXCSUM, or both,
or does disabling either one somehow work around the problem? Thanks a lot.

On Mon, Jan 25, 2010 at 8:31 PM, Joshua Boyd boy...@jbip.net wrote:

> I've been having a similar problem with my network dropping completely on
> my 8-STABLE gateway/firewall/fileserver. My setup is a little different, as I
> have re0 and ral0 bridged for LAN, and em0 for WAN. I've just turned off TX
> checksum offloading to see if that makes any difference.
>
> On Mon, Jan 25, 2010 at 1:49 PM, Lars Eggert lars.egg...@nokia.com wrote:
>
> > Hi,
> >
> > On 2010-1-25, at 19:38, Nick Rogers wrote:
> > > On Mon, Jan 25, 2010 at 10:22 AM, Pyun YongHyeon pyu...@gmail.com wrote:
> > > > I'm not sure you're seeing a checksum offload bug of em(4) but the
> > > > bug is easily reproducible in VLAN environments. If the issue is
> > > > gone when you disable TX checksum offloading, see kern/141843 for
> > > > more detailed information as well as fix.
> > >
> > > Good to know, but I am having a similar problem on another em(4)
> > > interface that has no VLAN interfaces.
> >
> > FYI, I also have these issues without using VLANs, and turning off TSO
> > fixed them.
> >
> > Lars



Re: em interface slow down on 8.0R

2010-01-25 Thread Nick Rogers
I have not tried toying with any tcp sysctl. I'm not having performance
problems so much as the interface just stops working entirely, which I would
think has nothing to do with the TCP stack when layer 2 is not functioning?

I'll give it a shot if I can. For the moment I have had to switch to a
different (lower performance) network card to get things stable and I would
like to be aware of a more concrete driver fix in STABLE before switching
back my production machines.

On Mon, Jan 25, 2010 at 6:25 AM, Lars Eggert lars.egg...@nokia.com wrote:

> Hi,
>
> have you tried turning off TCP Segmentation Offloading (net.inet.tcp.tso
> sysctl)? That fixed performance issues with some em cards for me.
>
> Lars




Re: em interface slow down on 8.0R

2010-01-25 Thread Nick Rogers
On Mon, Jan 25, 2010 at 10:22 AM, Pyun YongHyeon pyu...@gmail.com wrote:

> On Mon, Jan 25, 2010 at 08:25:43AM -0800, Nick Rogers wrote:
> > I have not tried toying with any tcp sysctl. I'm not having performance
> > problems so much as the interface just stops working entirely, which I
> > would think has nothing to do with the TCP stack when layer 2 is not
> > functioning?
>
> I'm not sure you're seeing a checksum offload bug of em(4) but the
> bug is easily reproducible in VLAN environments. If the issue is
> gone when you disable TX checksum offloading, see kern/141843 for
> more detailed information as well as fix.


Good to know, but I am having a similar problem on another em(4) interface
that has no VLAN interfaces.


> > I'll give it a shot if I can. For the moment I have had to switch to a
> > different (lower performance) network card to get things stable and I
> > would like to be aware of a more concrete driver fix in STABLE before
> > switching back my production machines.
> >
> > On Mon, Jan 25, 2010 at 6:25 AM, Lars Eggert lars.egg...@nokia.com wrote:
> >
> > > Hi,
> > >
> > > have you tried turning off TCP Segmentation Offloading (net.inet.tcp.tso
> > > sysctl)? That fixed performance issues with some em cards for me.
> > >
> > > Lars



netstat output changes in 8.0?

2010-01-25 Thread Nick Rogers
Before 8.0-RELEASE, if I ran netstat -rn, it listed a separate route for
each host on the network, along with its MAC address. For example ...

172.20.172.17  00:02:b3:2f:64:6a  UHLW  1  105712  1500  vlan172  595
172.20.172.20  00:1e:c9:bb:7c:a9  UHLW  1    1002  1500  vlan172  598
172.20.172.22  00:14:5e:16:bb:b6  UHLW  1     107  1500  vlan172  491

This behavior seems to have changed in 8.0, where now only the
locally-assigned IP addresses and related CIDRs are displayed.

Is there any way to get this behavior back, perhaps with a new flag that I
am not able to find? Or some sysctl? I have a script that was relying on
each host's expire flag in the routing table to determine when the MAC
address first appeared on the network according to ARP.
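
A script along the lines described above could be sketched as follows. This is
only an illustration, not the script referred to in this post. It assumes rows
in the pre-8.0 netstat -rn layout shown above (destination, gateway, flags,
refs, use, mtu, netif, expire), and it assumes every ARP entry starts with a
lifetime of 1200 seconds, so the time since an entry was learned is roughly
that lifetime minus the Expire value. The function name arp_ages and the
MAX_AGE variable are hypothetical.

```shell
#!/bin/sh
# Sketch: estimate how long ago each host's ARP entry was learned, using
# the Expire column of pre-8.0 `netstat -rn` style rows.
# ASSUMPTION: entries start with a lifetime of MAX_AGE seconds (1200 is
# assumed here; check the value on your own system).
MAX_AGE=${MAX_AGE:-1200}

# arp_ages: read rows "dest gateway flags refs use mtu netif expire" on
# stdin and print "ip mac seconds_since_learned" for link-layer (L) routes
# whose gateway column looks like a MAC address.
arp_ages() {
    awk -v max="$MAX_AGE" '
        $3 ~ /L/ && $2 ~ /^[0-9a-f][0-9a-f]:/ && NF >= 8 {
            print $1, $2, max - $NF
        }'
}

# Example rows taken from the netstat output quoted in this message,
# with the wrapped columns rejoined.
arp_ages <<'EOF'
172.20.172.17  00:02:b3:2f:64:6a  UHLW  1  105712  1500  vlan172  595
172.20.172.20  00:1e:c9:bb:7c:a9  UHLW  1    1002  1500  vlan172  598
172.20.172.22  00:14:5e:16:bb:b6  UHLW  1     107  1500  vlan172  491
EOF
```

With the assumed 1200-second lifetime, the three sample rows give ages of 605,
602 and 709 seconds. An entry that has been refreshed since it first appeared
will look younger than it really is, so the result is an estimate at best.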


Re: em interface slow down on 8.0R

2010-01-24 Thread Nick Rogers
I am having similar em interface problems with some of my production
machines running older Intel 2-port cards, since upgrading from 7.2-RELEASE
to 8.0-RELEASE. Basically, everything works fine, but periodically the
interface hangs (tcpdump shows no frames). A reboot, or an ifconfig down
followed by an ifconfig up, fixes the problem for some time. Traffic peaks
at maybe 20 Mbit per day and it's all 802.1Q VLAN tagged traffic (about 10
vlan interfaces). When this happens, netstat reports only errors and no
packets on the affected interface. Media is set to autoselect. This is
happening about 5-10x per day.

Here's the relevant sysctl and ifconfig info.

dev.em.6.%desc: Intel(R) PRO/1000 Network Connection 6.9.14
dev.em.6.%driver: em
dev.em.6.%location: slot=3 function=0
dev.em.6.%pnpinfo: vendor=0x8086 device=0x1079 subvendor=0x8086 subdevice=0x1179 class=0x02
dev.em.6.%parent: pci3
dev.em.6.debug: -1
dev.em.6.stats: -1
dev.em.6.rx_int_delay: 0
dev.em.6.tx_int_delay: 66
dev.em.6.rx_abs_int_delay: 66
dev.em.6.tx_abs_int_delay: 66
dev.em.6.rx_processing_limit: 100

em6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:04:23:cd:47:82
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
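
For anyone following the suggestions in this thread to turn off TX checksum
offload or TSO, here is a rough sketch (not from the original posts) that
reads ifconfig output, lists the capabilities shown on the options= line, and
echoes the ifconfig commands that would disable the TX offloads it finds. The
function names are made up for illustration, and the sample text is the em6
output above with the angle brackets that ifconfig normally prints. Nothing is
changed on the system; the commands are only printed.

```shell
#!/bin/sh
# Sketch: report which TX offloads an interface has enabled and print the
# ifconfig commands that would turn them off. The commands are echoed,
# never executed.

# enabled_offloads: read `ifconfig IFACE` output on stdin and print the
# capability names found between <...> on the options= line, one per line.
enabled_offloads() {
    sed -n 's/^[[:space:]]*options=[0-9a-f]*<\(.*\)>.*$/\1/p' | tr ',' '\n'
}

# suggest_disable IFACE: read ifconfig output on stdin and echo one command
# for each TX offload (TXCSUM, TSO4) that is currently enabled.
suggest_disable() {
    iface=$1
    enabled_offloads | while read -r cap; do
        case $cap in
            TXCSUM) echo "ifconfig $iface -txcsum" ;;
            TSO4)   echo "ifconfig $iface -tso" ;;
        esac
    done
}

# Example input: the em6 lines from this message (hypothetical variable name).
sample='em6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>'

printf '%s\n' "$sample" | suggest_disable em6
```

On a live system the input would come from ifconfig em6 itself. TSO4 does not
appear in the em6 options, so only the -txcsum command is printed for this
interface.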

On Tue, Jan 5, 2010 at 6:35 PM, Jason Chambers jchamb...@ucla.edu wrote:

> Hiroki Sato wrote:
> >  Thank you!  I have investigated some more details.  First, I got
> >  something wrong with the affected FreeBSD versions; one I tried was
> >  8.0-STABLE, not 8.0-RELEASE.  So I started to try 8.0R.  A summary of
> >  chips and releases I tried so far is now the following:
> >
> >                                        7.2R  8.0R  8.0-STABLE
> >  82540EM (chip=0x100e8086, rev=0x02)   OK    OK    too slow[1]
> >  82541PI (chip=0x107c8086, rev=0x05)   OK    ?     OK
>
> Running 8.0R I've noticed the same problem with this card (0x107c8086).
> Duplex and speed are manually set at full/1000.
>
> e...@pci0:3:3:0: class=0x02 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Gigabit Ethernet Controller (Copper) rev 5 (82541PI)'
>     class      = network
>     subclass   = ethernet
>
> Regards,
>
> --Jason