help with flaky reboot on 3.1

1999-09-14 Thread Stevan Arychuk

Greetings,

We are running 3.1-RELEASE with a kernel pulled on May 1, 1999 from the
RELENG_3 branch (used this to take advantage of the KVA modifications
that were rolled in after the release).

The machines are dual PII 450s (N440BX) with 512MB RAM.  We are also
using the built-in Ethernet and SCSI controllers.

Our kernel configuration is fairly standard with the following
exceptions:
maxusers 512
options  NMBCLUSTERS=33280
options  SMP
options  APIC_IO
options "VM_KMEM_SIZE=(128*1024*1024)"
options "VM_KMEM_SIZE_MAX=(128*1024*1024)"

Here are the symptoms we are seeing:

One machine, running a caching Squid reverse proxy, would spontaneously
reboot with no error messages every week or so.  This machine had only
a single CPU.

We were seeing an excessive number of sockets in the CLOSING state, via
netstat.  The reboots seemed to be correlated with having many such
sockets.  Suspecting bad TCP stacks on the Internet, we did 'sysctl -w
net.inet.tcp.always_keepalive=1'.  This fixed the many CLOSING sockets
problem, but did not fix the reboots.
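
For anyone wanting to track the CLOSING count over time, a minimal
sketch (not part of the original report) is to tally sockets by TCP
state from netstat output.  The function name count_tcp_states is made
up for illustration, and it assumes the state is the last column of
each tcp line, as it is in 'netstat -an' output:

```shell
#!/bin/sh
# Tally TCP sockets per connection state.
# Reads `netstat -an` style output on stdin so it can also be run
# against saved samples; count_tcp_states is a hypothetical helper name.
count_tcp_states() {
    # Only consider lines whose first field starts with "tcp"
    # (covers "tcp", "tcp4", "tcp6"); the state is the last field.
    awk '$1 ~ /^tcp/ { states[$NF]++ }
         END { for (s in states) printf "%s %d\n", s, states[s] }' | sort
}

# Typical use on a live machine (left commented out here):
# netstat -an | count_tcp_states
```

Logging this output periodically would make it possible to check
whether a spike in CLOSING sockets really precedes a reboot.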

Other machines (dual CPU) running custom software also reboot
spontaneously with no error messages.  The reboots are happening with
increasing frequency, almost to the point of a couple of times a day.
Sometimes a machine will reboot a couple of times in a day, then be OK
for another week or so.

Our software exercises the disk, CPU, and network quite a bit, but not
excessively.  The only machines having problems are production
machines directly connected to the Internet.  We've had the same
machines running internally with longer uptimes and heavier volumes.

Any suggestions/ideas?

Sorry about the super-post; I thought detail was important.

- Stevan Arychuk


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: help with flaky reboot on 3.1

1999-09-14 Thread Chris D. Faulhaber


-
Chris D. Faulhaber [EMAIL PROTECTED]  |  All the true gurus I've met never
System/Network Administrator,|  claimed they were one, and always
Reality Check Information, Inc.  |  pointed to someone better.









Re: help with flaky reboot on 3.1

1999-09-14 Thread Chris D. Faulhaber

On Tue, 14 Sep 1999, Stevan Arychuk wrote:

> Greetings,
>
> We are running 3.1-RELEASE with a kernel pulled on May 1, 1999 from the
> RELENG_3 branch (used this to take advantage of the KVA modifications
> that were rolled in after the release).

Are the kernel and user-land out of sync (kernel sources newer than system
sources)?

Cheers,
Chris

p.s. sorry about the prev. reply without comments...Pine's send and cancel
keys are too close together :)

-
Chris D. Faulhaber [EMAIL PROTECTED]  |  All the true gurus I've met never
System/Network Administrator,|  claimed they were one, and always
Reality Check Information, Inc.  |  pointed to someone better.







Re: help with flaky reboot on 3.1

1999-09-14 Thread Stevan Arychuk
Yes,

The KVA patches were introduced after the initial release of 3.1, so I
guess you could say we're running 3.1 1/2 - RELEASE.

We held off on upgrading to 3.2 as there was still a problem with GDB.
I'm testing 3.3-19990909-RC and I've been able to do a backtrace on a
core dump from an SMP-enabled kernel 4 out of 6 times.

I haven't been following the other lists.  Does anyone know how close
this latest RC is?  Will it be 3.3-RELEASE by tomorrow?

- Stevan


