Re: Strange TCP-related hang on startup w/ recent CVSUP

2005-06-09 Thread Terry Kennedy
  Just to close this out, this was caused by the uipc_socket.c change that
also broke NFS. All better now...

Terry Kennedy http://www.tmk.com
[EMAIL PROTECTED] New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Strange TCP-related hang on startup w/ recent CVSUP

2005-06-08 Thread Terry Kennedy
Kris Kennaway (kris at obsecurity.org) writes:

> Are you sure you didn't change your kernel config or forget to rebuild
> modules?

  No changes to the kernel config, and I did the usual sequence of make
buildworld ; make buildkernel; make installkernel; reboot to single-user
mode; make installworld; mergemaster.

  Any ideas?

Terry Kennedy http://www.tmk.com
[EMAIL PROTECTED] New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Strange TCP-related hang on startup w/ recent CVSUP

2005-06-07 Thread Kris Kennaway
On Wed, Jun 08, 2005 at 12:37:59AM -0500, Terry Kennedy wrote:
>   I have a number of boxes which have been running well w/ 5.4-stable. The 
> last kernel and userland builds on those boxes were on May 26th, from a 
> CVSUP done that same day.
> 
>   Last night I did another CVSUP and the usual buildworld/buildkernel/in-
> stallkernel/installworld/mergemaster, and the boxes wouldn't come back up,
> hanging in sbwait state at the point where they would normally do their
> first net access.
> 
>   In one box, this is a batch of NFS mounts; on the other it is the initial
> ntpdate query. At this point the boxes don't respond to pings and sit for-
> ever (at least an hour) not doing anything. Typing ^C on the console aborts
> the hanging process, and then the startup proceeds. With a ping going from
> another system, I see that the net doesn't come up on the problem boxes until
> several seconds into the execution of the next network-related command - it
> is almost as if the previous ifconfig (done from rc.conf) didn't have any
> effect (despite the "link up" message on the console). The other oddity is
> that I get a "rpc.lockd: 100024 RPC: Port mapper failure" right after lockd
> starts, even though other net commands have completed successfully.
> 
>   Booting from kernel.old (the May 26th one) boots normally, so it is a ker-
> nel issue, not something in userland.

Are you sure you didn't change your kernel config or forget to rebuild
modules?

Kris

pgpFox1cppNxp.pgp
Description: PGP signature


Strange TCP-related hang on startup w/ recent CVSUP

2005-06-07 Thread Terry Kennedy
  I have a number of boxes which have been running well w/ 5.4-stable. The 
last kernel and userland builds on those boxes were on May 26th, from a 
CVSUP done that same day.

  Last night I did another CVSUP and the usual buildworld/buildkernel/in-
stallkernel/installworld/mergemaster, and the boxes wouldn't come back up,
hanging in sbwait state at the point where they would normally do their
first net access.

  In one box, this is a batch of NFS mounts; on the other it is the initial
ntpdate query. At this point the boxes don't respond to pings and sit for-
ever (at least an hour) not doing anything. Typing ^C on the console aborts
the hanging process, and then the startup proceeds. With a ping going from
another system, I see that the net doesn't come up on the problem boxes until
several seconds into the execution of the next network-related command - it
is almost as if the previous ifconfig (done from rc.conf) didn't have any
effect (despite the "link up" message on the console). The other oddity is
that I get a "rpc.lockd: 100024 RPC: Port mapper failure" right after lockd
starts, even though other net commands have completed successfully.

  Booting from kernel.old (the May 26th one) boots normally, so it is a ker-
nel issue, not something in userland.

  Hardware is a Tyan S2721-533 (dual Xeon) board w/ onboard Intel "em" Gig-
abit Ethernet. To keep this message short, I'm omitting the dmesg output,
but it can be found at http://www.tmk.com/raidzilla.

  Here's the relevant parts of the console output from the two systems:

first system:

/dev/da1s3h: clean, 199137891 free (819 frags, 24892134 blocks, 0.0% 
fragmentation)
Setting hostname: rz1.tmk.com.
em0: flags=8843 mtu 1500
options=b
inet 204.141.35.40 netmask 0xff00 broadcast 204.141.35.255
inet6 fe80::2e0:81ff:fe28:94d6%em0 prefixlen 64 tentative scopeid 0x1 
ether 00:e0:81:28:94:d6
media: Ethernet 1000baseTX  (autoselect)
status: no carrier
lo0: flags=8049 mtu 16384
inet 127.0.0.1 netmask 0xff00 
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:em0: Link is up 1000 Mbps Full Duplex
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.07  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
^CScript /etc/rc.d/mountcritremote interrupted
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
host found : gate.tmk.com
Looking for host 204.141.35.61 and service ntp
host found : gate.tmk.com
Looking for host 199.224.0.146 and service ntp
host found : ns1.ispnetinc.net
Looking for host 199.224.0.154 and service ntp
host found : ns2.ispnetinc.net
Looking for host 204.141.40.135 and service ntp
host found : ns3.ispnetinc.net
 8 Jun 00:30:34 ntpdate[416]: step time server 199.224.0.146 offset 1.526419 sec
Starting rpcbind.
NFS access cache time=2
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting mountd.
Starting nfsd.
Starting statd.
Starting lockd.
Jun  8 00:30:35 rz1 rpc.lockd: 100024 RPC: Port mapper failure
Starting usbd.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting rwhod.
Configuring syscons: blanktime screensaver.

second system:

/dev/da1s3h: clean, 199127439 free (1095 frags, 24890793 blocks, 0.0% 
fragmentation)
Setting hostname: rz2.tmk.com.
em0: flags=8843 mtu 1500
options=b
inet 204.141.35.41 netmask 0xff00 broadcast 204.141.35.255
inet6 fe80::2e0:81ff:fe2e:67b0%em0 prefixlen 64 tentative scopeid 0x1 
ether 00:e0:81:2e:67:b0
media: Ethernet autoselect
status: no carrier
lo0: flags=8049 mtu 16384
inet 127.0.0.1 netmask 0xff00 
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:.
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
em0: Link is up 1000 Mbps Full Duplex
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0