Re: 74 hours till next No Buffer Space Available reboot ...

2007-04-14 Thread Daniel Eischen

On Sat, 14 Apr 2007, Marc G. Fournier wrote:


> --On Sunday, April 08, 2007 23:04:42 -0400 Dave [EMAIL PROTECTED] wrote:
>
> > Hello,
> > This is what I get for catching this late. Can you describe your
> > situation? I've got a server, a router actually, running 6.1-p6 I believe, and
> > lately it's been doing this stop. I can't be any more specific than that,
> > because that's all I know. The box just goes unresponsive; I can get a login
> > prompt on the console, but it's unresponsive. I have to reboot it. This has
> > occurred twice now and I'm starting to get concerned. I've ruled out RAM; I
> > recently replaced its RAM for an unrelated reason, so I don't think that's
> > it. If your situation is similar, can you let me know what you tried?
>
> This is a different situation, I think ... first, I'm running 6.2-STABLE, as of
> about last week, so a much newer kernel than you are running ... and in my
> case, at least, I can still log in to the machine using ssh and force a reboot
> remotely ... it doesn't seem to be a 'solid hang' ... if I were to hazard a
> guess as to what it feels like ... it feels like the network interface
> buffer has filled up, but isn't being released properly ... almost like a
> memory leak, but on the network ... if I leave it long enough, it will
> eventually require a tech to power cycle it, but if I catch it early enough, I
> can still get in to do a reboot ...
>
> But ... that said ... when you say 'get a login prompt on the console, but
> it's unresponsive' ... do you mean that you can actually type in a userid, and
> possibly passwd, but after that it just hangs?


I will just add that I get this on an old 4-stable router
box (for years).  It is on an sf interface and I _thought_
it was due to a flaky hub.  I got the sendto: no buffer
space avail message on the incoming/outgoing interface
to the router that was doing NAT and ipfw to our internal
LANs.  I resorted to writing a cron job that would try
to ping the router at the other end of the sf interface
and do an 'ifconfig sf0 down; ifconfig sf0 up' whenever
the router at the other end could not be ping'd.  Something
like this:

  if ping -c 2 remote-router > /dev/null; then
    /usr/bin/true
  else
    /sbin/ifconfig sf0 down
    /bin/sleep 1
    /sbin/ifconfig sf0 up
  fi

This router is running 4.11.  Without the cronjob, the
network would fail every week or two.  I gave up trying
to figure out what the real problem was.
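For anyone wanting to try the same workaround, the cron job described above could be fleshed out roughly as follows. This is only a sketch: the script name linkwatch.sh, the REMOTE/IFACE/PING/IFCONFIG variables, the "run" argument and the five-minute schedule are assumptions made for illustration, not details from the message. Only the ping test and the 'ifconfig sf0 down; ifconfig sf0 up' bounce come from the post.

```shell
#!/bin/sh
# linkwatch.sh (hypothetical name): bounce an interface when the far end
# of the link stops answering pings. All defaults below are placeholders.
REMOTE="${REMOTE:-remote-router}"       # host at the other end of the link
IFACE="${IFACE:-sf0}"                   # interface to take down and up
PING="${PING:-/sbin/ping}"
IFCONFIG="${IFCONFIG:-/sbin/ifconfig}"

link_ok() {
    # ping exits 0 when at least one of the two probes is answered.
    "$PING" -c 2 "$REMOTE" > /dev/null 2>&1
}

bounce_iface() {
    "$IFCONFIG" "$IFACE" down
    sleep 1
    "$IFCONFIG" "$IFACE" up
}

check_link() {
    if link_ok; then
        return 0
    fi
    # cron mails anything written to stderr, which leaves a record of each bounce.
    echo "linkwatch: $REMOTE unreachable, bouncing $IFACE" >&2
    bounce_iface
}

# Only act when invoked as "linkwatch.sh run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    check_link
fi
```

A root crontab entry along the lines of

  */5 * * * * /usr/local/sbin/linkwatch.sh run

would run the check every five minutes (the path is hypothetical). As in the original workaround, this only masks the failure; it does not explain why the buffers fill up.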

--
DE
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 74 hours till next No Buffer Space Available reboot ...

2007-04-13 Thread Marc G. Fournier



--On Sunday, April 08, 2007 23:04:42 -0400 Dave [EMAIL PROTECTED] wrote:

 Hello,
 This is what I get for catching this late. Can you describe your
 situation? I've got a server, a router actually, running 6.1-p6 I believe, and
 lately it's been doing this stop. I can't be any more specific than that,
 because that's all I know. The box just goes unresponsive; I can get a login
 prompt on the console, but it's unresponsive. I have to reboot it. This has
 occurred twice now and I'm starting to get concerned. I've ruled out RAM; I
 recently replaced its RAM for an unrelated reason, so I don't think that's
 it. If your situation is similar, can you let me know what you tried?

This is a different situation, I think ... first, I'm running 6.2-STABLE, as of
about last week, so a much newer kernel than you are running ... and in my
case, at least, I can still log in to the machine using ssh and force a reboot
remotely ... it doesn't seem to be a 'solid hang' ... if I were to hazard a
guess as to what it feels like ... it feels like the network interface
buffer has filled up, but isn't being released properly ... almost like a
memory leak, but on the network ... if I leave it long enough, it will
eventually require a tech to power cycle it, but if I catch it early enough, I
can still get in to do a reboot ...

But ... that said ... when you say 'get a login prompt on the console, but
it's unresponsive' ... do you mean that you can actually type in a userid, and
possibly passwd, but after that it just hangs?


 Thanks.
 Dave.

 - Original Message - From: Marc G. Fournier [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Chris [EMAIL PROTECTED]; Thiago Esteves de Oliveira
 [EMAIL PROTECTED]
 Sent: Sunday, April 08, 2007 10:28 PM
 Subject: 74 hours till next No Buffer Space Available reboot ...




 In my case, I can almost set my watch to it (if I had a watch) ... every 3
 days, 2 hours, it seems that I have to reboot this machine, as that is when
 the 'No Buffer Space Available' error starts to be generated ...

 There are two others (CC'd in this) that have experienced the same ...

 Chris / Thiago ... in your cases, are you finding that it happens as
 regularly with your servers?  Thiago, I believe you ended up reverting to an
 older kernel to clear up the situation?

 I've included my 'netstat -m' report ... from it, it doesn't look to me like
 it's an mbuf issue, or am I missing something?  Is there something else that,
 in 74 hours, I can provide before I do the reboot?

 Chris, you mentioned reducing recvspace/sendspace to correct the issue?  Has
 that fixed it for you, or just prolonged until it happens again?  How did you
 set this?  I've checked both the man pages for ifconfig and fxp, and don't see
 anything ... ah, just found it doing a 'sysctl -a' ... can you post your
 settings from /etc/sysctl.conf?  or did you set it somewhere else?  I'd like to
 try that and see if maybe that changes my '74 hours uptime', either good or bad
 ...



 # netstat -m
 161/949/1110 mbufs in use (current/cache/total)
 133/639/772/25600 mbuf clusters in use (current/cache/total/max)
 133/396 mbuf+clusters out of packet secondary zone in use (current/cache)
 0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
 0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
 0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
 306K/1515K/1821K bytes allocated to network (current/cache/total)
 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
 0/45/6656 sfbufs in use (current/peak/max)
 0 requests for sfbufs denied
 0 requests for sfbufs delayed
 325 requests for I/O initiated by sendfile
 731 calls to protocol drain routines


 - 
 Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
 Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
 Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664




- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664

74 hours till next No Buffer Space Available reboot ...

2007-04-08 Thread Marc G. Fournier


In my case, I can almost set my watch to it (if I had a watch) ... every 3
days, 2 hours, it seems that I have to reboot this machine, as that is when the
'No Buffer Space Available' error starts to be generated ...

There are two others (CC'd in this) that have experienced the same ...

Chris / Thiago ... in your cases, are you finding that it happens as regularly 
with your servers?  Thiago, I believe you ended up reverting to an older kernel 
to clear up the situation?

I've included my 'netstat -m' report ... from it, it doesn't look to me like
it's an mbuf issue, or am I missing something?  Is there something else that, in
74 hours, I can provide before I do the reboot?

Chris, you mentioned reducing recvspace/sendspace to correct the issue?  Has 
that fixed it for you, or just prolonged until it happens again?  How did you 
set this?  I've checked both the man pages for ifconfig and fxp, and don't see 
anything ... ah, just found it doing a 'sysctl -a' ... can you post your 
settings from /etc/sysctl.conf?  or did you set it somewhere else?  I'd like to 
try that and see if maybe that changes my '74 hours uptime', either good or bad 
...
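For reference, settings of this kind normally live in /etc/sysctl.conf as plain name=value lines. The fragment below is only a sketch: the thread does not say which sendspace/recvspace knobs Chris changed or to what, so the TCP variants are an assumption and the numbers are placeholders, not his settings and not a recommendation.

```
# /etc/sysctl.conf fragment (illustrative values only, not taken from the thread)
# Default socket buffer sizes, in bytes, for new TCP connections.
net.inet.tcp.sendspace=16384
net.inet.tcp.recvspace=16384
```

The same names can be inspected with 'sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace' and changed at runtime with 'sysctl name=value'; /etc/sysctl.conf just reapplies them at boot.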



# netstat -m
161/949/1110 mbufs in use (current/cache/total)
133/639/772/25600 mbuf clusters in use (current/cache/total/max)
133/396 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
306K/1515K/1821K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/45/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
325 requests for I/O initiated by sendfile
731 calls to protocol drain routines
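The two lines of this report that bear most directly on the "is it an mbuf issue" question are the cluster usage line and the denied-requests line. A small filter such as the following could pull them out, for example to log them periodically over the 74 hours. It is a sketch that assumes the 6.x 'netstat -m' layout shown above; the function name mbuf_summary is made up for illustration.

```shell
#!/bin/sh
# mbuf_summary (hypothetical helper): reads `netstat -m` output on stdin and
# prints cluster usage against the configured maximum plus the denied counters.
mbuf_summary() {
    awk '
    /mbuf clusters in use/ {
        split($1, c, "/")        # current/cache/total/max
        printf "clusters total=%d max=%d (%.1f%%)\n", c[3], c[4], 100 * c[3] / c[4]
    }
    /requests for mbufs denied/ {
        split($1, d, "/")        # mbufs/clusters/mbuf+clusters
        printf "denied mbufs=%d clusters=%d both=%d\n", d[1], d[2], d[3]
    }'
}
```

Used as 'netstat -m | mbuf_summary', the report above comes out as:

  clusters total=772 max=25600 (3.0%)
  denied mbufs=0 clusters=0 both=0

That is, about 3% of the cluster limit in use and no denied requests, which is consistent with the reading that mbuf exhaustion is not the cause here.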


- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664