Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-15 Thread Marc G. Fournier


It didn't; the count kept climbing ...

--On Tuesday, May 15, 2007 21:39:35 +0200 Ulrich Spoerlein
<[EMAIL PROTECTED]> wrote:

> I'm slowly catching up on FreeBSD related mails and found this mail ...
>
> Marc G. Fournier wrote:
>> >  > kern.ipc.numopensockets: 7400
>> >  > kern.ipc.maxsockets: 12328
>> >  >
>> >  > ps looks like:
>> >  >
>>
>> 
>>
>> > 2368  p2  Is+  Sat01PM    0:00.03 /bin/tcsh
>> > root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM    0:00.04 /bin/tcsh
>> > root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM    0:00.12 -tcsh (tcsh)
>> >
>> > I don't think those processes should consume 7400 sockets.
>> > Indeed, this really looks like a leak in the kernel.
>>
>> Robert has sent me a suggestion to try that I'm in the process of putting
>> together right now, involving backing out some work on uipc_usrreq.c ...
>
> How did the backing out work for you?
>
> Ulrich Spoerlein
> --
> "The trouble with the dictionary is you have to know how the word is
> spelled before you can look it up to see how it is spelled."
> -- Will Cuppy
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"



- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-15 Thread Jeremy Chadwick
On Tue, May 15, 2007 at 09:39:35PM +0200, Ulrich Spoerlein wrote:
> How did the backing out work for you?

Taken from another mail from Marc, since there are now multiple threads
discussing this:

>> Did we determine whether backing out to before the unpcb socket
>> reference count change made any difference for you?
>
> The problem appeared to persist after backing it out ...

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-15 Thread Ulrich Spoerlein
I'm slowly catching up on FreeBSD related mails and found this mail ...

Marc G. Fournier wrote:
> >  > kern.ipc.numopensockets: 7400
> >  > kern.ipc.maxsockets: 12328
> >  >
> >  > ps looks like:
> >  >
> 
> 
> 
> > 2368  p2  Is+  Sat01PM    0:00.03 /bin/tcsh
> > root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM    0:00.04 /bin/tcsh
> > root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM    0:00.12 -tcsh (tcsh)
> >
> > I don't think those processes should consume 7400 sockets.
> > Indeed, this really looks like a leak in the kernel.
> 
> Robert has sent me a suggestion to try that I'm in the process of putting 
> together right now, involving backing out some work on uipc_usrreq.c ...

How did the backing out work for you?

Ulrich Spoerlein
-- 
"The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled."
-- Will Cuppy


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-08 Thread Marc G. Fournier



--On Tuesday, May 08, 2007 15:14:29 +0200 Oliver Fromme
<[EMAIL PROTECTED]> wrote:

> What kind of jails are those?  What applications are
> running inside them?  It's quite possible that the
> processes on one machine use 120 sockets per jail,
> while on a different machine they use only half that
> many per jail, on average.  Of course, I can't tell
> for sure without knowing what is running in those
> jails.

They all run pretty much the same thing, on all the machines ... by default, 
standard syslog, sshd, cron, cyrus imapd, postfix and apache ... some run 
aolserver over top of that, or jdk/tomcat, or zope ... but they aren't specific 
to the server itself, as they get moved around ...

>  > kern.ipc.numopensockets: 7400
>  > kern.ipc.maxsockets: 12328
>  >
>  > ps looks like:
>  >



> 2368  p2  Is+  Sat01PM    0:00.03 /bin/tcsh
> root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM    0:00.04 /bin/tcsh
> root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM    0:00.12 -tcsh (tcsh)
>
> I don't think those processes should consume 7400 sockets.
> Indeed, this really looks like a leak in the kernel.

Robert has sent me a suggestion to try that I'm in the process of putting 
together right now, involving backing out some work on uipc_usrreq.c ...


> Maybe "sockstat -u" and/or "fstat | grep -w local" (both
> of those commands should basically list the same kind of
> information).  My guess is that the output will be rather
> short, i.e. much shorter than 7355 lines.  If that's true,
> it is another indication that the problem is caused by
> a kernel leak.

at the time I rebooted, with no processes, but 7400 sockets:

> wc -l sockstat.out.txt
  12 sockstat.out.txt
> grep local fstat.out.txt | wc -l
   7

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-08 Thread Oliver Fromme
Marc G. Fournier wrote:
 > Oliver Fromme wrote:
 > > If I remember correctly, you wrote that 11k sockets are
 > > in use with 90 jails.  That's about 120 sockets per jail,
 > > which isn't out of the ordinary.  Of course it depends on
 > > what is running in those jails, but my guess is that you
 > > just need to increase the limit on the number of sockets
 > > (i.e. kern.ipc.maxsockets).
 > 
 > The problem is that if I compare it to another server, running 2/3 as
 > many jails, I'm finding it's using 1/4 as many sockets, after over 60
 > days of uptime:
 > 
 > kern.ipc.numopensockets: 3929
 > kern.ipc.maxsockets: 12328

What kind of jails are those?  What applications are
running inside them?  It's quite possible that the
processes on one machine use 120 sockets per jail,
while on a different machine they use only half that
many per jail, on average.  Of course, I can't tell
for sure without knowing what is running in those
jails.

 > But, let's try what I think it was Matt suggested ...

Yes, that was a good suggestion.

 > right now, I'm at just over 11k sockets on that machine, so I'm going
 > to shutdown everything except bare minimum server (all jails shut
 > off) and see where sockets drop to after that ...
 > 
 > I'm down to ~7400 sockets:
 > 
 > kern.ipc.numopensockets: 7400
 > kern.ipc.maxsockets: 12328
 > 
 > ps looks like:
 > 
 > mars# ps aux
 > USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED       TIME COMMAND
 > [kernel threads omitted]
 > root       1  0.0  0.0   768   232  ??  ILs  Sat12PM    3:22.01 /sbin/init --
 > root     480  0.0  0.0   528   244  ??  Is   Sat12PM    0:04.32 /sbin/devd
 > root     539  0.0  0.0  1388   848  ??  Ss   Sat12PM    0:07.21 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -s -s
 > daemon   708  0.0  0.0  1316   748  ??  Ss   Sat12PM    0:02.49 /usr/sbin/rwhod
 > root     749  0.0  0.0  3532  1824  ??  Is   Sat12PM    0:07.60 /usr/sbin/sshd
 > root     768  0.0  0.0  1412   920  ??  Is   Sat12PM    0:02.23 /usr/sbin/cron -s
 > root    2087  0.0  0.0  2132  1360  ??  Ss   Sat01PM    0:04.73 screen -R
 > root   88103  0.0  0.1  6276  2600  ??  Ss   11:41PM    0:00.62 sshd: [EMAIL PROTECTED] (sshd)
 > root   91218  0.0  0.1  6276  2664  ??  Ss   11:49PM    0:00.24 sshd: [EMAIL PROTECTED] (sshd)
 > root     813  0.0  0.0  1352   748  v0  Is+  Sat12PM    0:00.00 /usr/libexec/getty Pc ttyv0
 > root   88106  0.0  0.1  5160  2516  p0  Ss   11:41PM    0:00.20 -tcsh (tcsh)
 > root   97563  0.0  0.0  1468   804  p0  R+   12:17AM    0:00.00 ps aux
 > root    2088  0.0  0.1  5352  2368  p2  Is+  Sat01PM    0:00.03 /bin/tcsh
 > root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM    0:00.04 /bin/tcsh
 > root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM    0:00.12 -tcsh (tcsh)

I don't think those processes should consume 7400 sockets.
Indeed, this really looks like a leak in the kernel.

 > And netstat -n -funix shows 7355 lines similar to:
 > 
 > d05f1000 stream      0      0        0 d05f1090        0        0
 > d05f1090 stream      0      0        0 d05f1000        0        0
 > cf1be000 stream      0      0        0 cf1bdea0        0        0
 > cf1bdea0 stream      0      0        0 cf1be000        0        0
 > cec42bd0 stream      0      0        0 cf2ac480        0        0
 > cf2ac480 stream      0      0        0 cec42bd0        0        0
 > 
 > with the final few associated with running processes:

How do you determine that?  You _cannot_ tell from netstat
which sockets are associated with running processes.

 > I'm willing to shut everything down like this again the next time it happens
 > (in 2-3 days) if someone has some other command / output they'd like for me to
 > provide the output of?

Maybe "sockstat -u" and/or "fstat | grep -w local" (both
of those commands should basically list the same kind of
information).  My guess is that the output will be rather
short, i.e. much shorter than 7355 lines.  If that's true,
it is another indication that the problem is caused by
a kernel leak.
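
The comparison suggested above can be run against saved output. This is only a minimal sketch: it assumes the output of "netstat -n -funix" and "sockstat -u" was saved to two files, and the function name is made up for illustration. Header lines are counted too, so treat the result as a rough estimate rather than an exact figure.

```sh
# unreferenced_gap NETSTAT_FILE SOCKSTAT_FILE  (hypothetical helper)
# Prints: <kernel-side lines> <process-side lines> <difference>
unreferenced_gap() {
    # wc -l may pad its output with spaces on BSD, so strip them.
    kernel_lines=$(wc -l < "$1" | tr -d ' ')
    process_lines=$(wc -l < "$2" | tr -d ' ')
    echo "$kernel_lines $process_lines $((kernel_lines - process_lines))"
}
```

A large difference means most sockets known to the kernel are not visible through any process, which is what a kernel leak would look like.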

 > And, I have the following outputs as of the above, where everything is
 > shut down and it's running on minimal processes:
 > 
 > # ls -lt
 > total 532
 > -rw-r--r--  1 root  wheel   11142 May  8 00:20 fstat.out
 > -rw-r--r--  1 root  wheel     742 May  8 00:20 netstat_m.out
 > -rw-r--r--  1 root  wheel  486047 May  8 00:20 netstat_na.out
 > -rw-r--r--  1 root  wheel     735 May  8 00:20 sockstat.out
                                 ^^^
Aha.  :-)

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"C++ is the only current language making COBOL look good."
-- Bertrand Meyer

Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-08 Thread Robert Watson


On Tue, 8 May 2007, Marc G. Fournier wrote:


> So, over 7000 sockets with pretty much all processes shut down ...
>
> Shouldn't the garbage collector be cutting in somewhere here?
>
> I'm willing to shut everything down like this again the next time it happens
> (in 2-3 days) if someone has some other command / output they'd like for me
> to provide the output of?
>
> And, I have the following outputs as of the above, where everything is
> shut down and it's running on minimal processes:


I think there may be a bug in the MFC of the UNIX domain socket reference 
count changes in RELENG_6:


  revision 1.155.2.8
  date: 2007/01/12 16:24:23;  author: jhb;  state: Exp;  lines: +36 -7
  MFC: Close a race between enumerating UNIX domain socket pcb structures via
  sysctl and socket teardown.  Note that we engage in a bit of trickery to
  preserve the ABI of 'struct unpcb' in 6.x.  We change the UMA zone to hold
  a 'struct unpcb_wrapper' which holds a 6.x 'struct unpcb' followed by the
  new reference count needed for handling the race.  We then cast 'struct
  unpcb' pointers to 'struct unpcb_wrapper' pointers when we need to access
  the reference count.

  Submitted by:   ups (including the ABI trickery)

Could you try backing this out locally and see if the problem goes away?  I've 
forwarded the information you sent to me previously to Stephan so he can take 
a look.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-07 Thread Marc G. Fournier



--On Monday, May 07, 2007 19:01:02 +0200 Oliver Fromme
<[EMAIL PROTECTED]> wrote:

> If I remember correctly, you wrote that 11k sockets are
> in use with 90 jails.  That's about 120 sockets per jail,
> which isn't out of the ordinary.  Of course it depends on
> what is running in those jails, but my guess is that you
> just need to increase the limit on the number of sockets
> (i.e. kern.ipc.maxsockets).

The problem is that if I compare it to another server, running 2/3 as many 
jails, I'm finding it's using 1/4 as many sockets, after over 60 days of uptime:

kern.ipc.numopensockets: 3929
kern.ipc.maxsockets: 12328

But, let's try what I think it was Matt suggested ... right now, I'm at just 
over 11k sockets on that machine, so I'm going to shutdown everything except 
bare minimum server (all jails shut off) and see where sockets drop to after 
that ...

I'm down to ~7400 sockets:

kern.ipc.numopensockets: 7400
kern.ipc.maxsockets: 12328

ps looks like:

mars# ps aux
USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED       TIME COMMAND
root      10 99.0  0.0     0     8  ??  RL   Sat12PM 2527:55.02 [idle: cpu1]
root      11 99.0  0.0     0     8  ??  RL   Sat12PM 2816:58.21 [idle: cpu0]
root       0  0.0  0.0     0     0  ??  WLs  Sat12PM    0:00.00 [swapper]
root       1  0.0  0.0   768   232  ??  ILs  Sat12PM    3:22.01 /sbin/init --
root       2  0.0  0.0     0     8  ??  DL   Sat12PM    0:31.14 [g_event]
root       3  0.0  0.0     0     8  ??  DL   Sat12PM   12:02.57 [g_up]
root       4  0.0  0.0     0     8  ??  DL   Sat12PM   17:20.73 [g_down]
root       5  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.35 [thread taskq]
root       6  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [xpt_thrd]
root       7  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [kqueue taskq]
root       8  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [aic_recovery0]
root       9  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [aic_recovery0]
root      12  0.0  0.0     0     8  ??  WL   Sat12PM   12:11.84 [swi1: net]
root      13  0.0  0.0     0     8  ??  WL   Sat12PM   15:31.57 [swi4: clock]
root      14  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [swi3: vm]
root      15  0.0  0.0     0     8  ??  DL   Sat12PM    1:10.54 [yarrow]
root      16  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [swi6: task queue]
root      17  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [swi6: Giant taskq]
root      18  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [swi5: +]
root      19  0.0  0.0     0     8  ??  WL   Sat12PM   11:50.45 [swi2: cambio]
root      20  0.0  0.0     0     8  ??  WL   Sat12PM    8:28.94 [irq20: fxp0]
root      21  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [irq21: fxp1]
root      22  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [irq25: ahc0]
root      23  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [aic_recovery1]
root      24  0.0  0.0     0     8  ??  WL   Sat12PM    7:53.11 [irq26: ahc1]
root      25  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [aic_recovery1]
root      26  0.0  0.0     0     8  ??  WL   Sat12PM    0:00.00 [irq1: atkbd0]
root      27  0.0  0.0     0     8  ??  DL   Sat12PM    0:32.19 [pagedaemon]
root      28  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [vmdaemon]
root      29  0.0  0.0     0     8  ??  DL   Sat12PM   38:04.73 [pagezero]
root      30  0.0  0.0     0     8  ??  DL   Sat12PM    0:30.43 [bufdaemon]
root      31  0.0  0.0     0     8  ??  DL   Sat12PM   11:38.76 [syncer]
root      32  0.0  0.0     0     8  ??  DL   Sat12PM    0:57.76 [vnlru]
root      33  0.0  0.0     0     8  ??  DL   Sat12PM    1:21.24 [softdepflush]
root      34  0.0  0.0     0     8  ??  DL   Sat12PM    6:00.16 [schedcpu]
root      35  0.0  0.0     0     8  ??  DL   Sat12PM    6:26.10 [g_mirror md1]
root      36  0.0  0.0     0     8  ??  DL   Sat12PM    6:10.56 [g_mirror md2]
root      37  0.0  0.0     0     8  ??  DL   Sat12PM    0:00.00 [g_mirror vm]
root     480  0.0  0.0   528   244  ??  Is   Sat12PM    0:04.32 /sbin/devd
root     539  0.0  0.0  1388   848  ??  Ss   Sat12PM    0:07.21 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -s -s
daemon   708  0.0  0.0  1316   748  ??  Ss   Sat12PM    0:02.49 /usr/sbin/rwhod
root     749  0.0  0.0  3532  1824  ??  Is   Sat12PM    0:07.60 /usr/sbin/sshd
root     768  0.0  0.0  1412   920  ??  Is   Sat12PM    0:02.23 /usr/sbin/cron -s
root    2087  0.0  0.0  2132  1360  ??  Ss   Sat01PM    0:04.73 screen -R
root   88103  0.0  0.1  6276  2600  ??  Ss   11:41PM    0:00.62 sshd: [EMAIL PROTECTED] (sshd)
root   91218  0.0  0.1  6276  2664  ??  Ss   11:49PM    0:00.24 sshd: [EMAIL PROTECTED] (sshd)
root     813  0.0  0.0  1352   748  v0  Is+  Sat12PM    0:00.00 /usr/libexec/getty Pc ttyv0
root   88106  0.0  0.1  5160  2516  p0  Ss   11:41PM    0:00.20 -tcsh (tcsh)
root   97563  0.0  0.0  1468   804  p0  R+   12:17AM    0:00.00 ps aux
root    2088  0.0  0.1  5352  2368  p2  Is+  Sat01PM

Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-07 Thread Kris Kennaway
On Mon, May 07, 2007 at 07:01:02PM +0200, Oliver Fromme wrote:
> Marc G. Fournier wrote:
>  > Now, that makes sense to me, I can understand that ... but, how would
>  > that look as far as netstat -nA shows?  Or, would it?  For example, I
>  > have:
> 
> You should use "-na" to list all sockets, not "-nA".
> 
>  > mars# netstat -nA | grep c9655a20
>  > c9655a20 stream      0      0        0 c95d63f0        0        0
>  > c95d63f0 stream      0      0        0 c9655a20        0        0
>  > mars# netstat -nA | grep c95d63f0
>  > c9655a20 stream      0      0        0 c95d63f0        0        0
>  > c95d63f0 stream      0      0        0 c9655a20        0        0
>  > 
>  > They are attached to each other, but there appears to be no 'referencing 
>  > process'
> 
> netstat doesn't show processes at all (sockstat, fstat
> and lsof list sockets by processes).  The sockets above
> are probably from a socketpair(2) or a pipe (which is
> implemented with socketpair(2), AFAIK).  That's perfectly
> normal.
> 
> If I remember correctly, you wrote that 11k sockets are
> in use with 90 jails.  That's about 120 sockets per jail,
> which isn't out of the ordinary.  Of course it depends on
> what is running in those jails, but my guess is that you
> just need to increase the limit on the number of sockets
> (i.e. kern.ipc.maxsockets).

Yes, and if you have 11000 sockets in use under "normal" situations
then you're likely to be pressing right up against the default limit
anyway (e.g. on this machine with 8GB of RAM the default is 12328), so
a slight increase in load will run out of space.
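
How close a machine sits to the limit can be read straight from the two sysctl values quoted in this thread. A minimal sketch, assuming the "name: value" output format shown above; the function name is hypothetical.

```sh
# socket_headroom (hypothetical helper): reads sysctl "name: value" lines on
# stdin and prints "<open> <max> <percent used>".
socket_headroom() {
    awk -F': *' '
        $1 == "kern.ipc.numopensockets" { used = $2 }
        $1 == "kern.ipc.maxsockets"     { limit = $2 }
        END {
            # Print nothing if the limit was not found in the input.
            if (limit > 0)
                printf "%d %d %d%%\n", used, limit, int(used * 100 / limit)
        }'
}
```

It would be fed with something like "sysctl kern.ipc.numopensockets kern.ipc.maxsockets | socket_headroom". With the figures from this thread, 7400 open out of 12328 works out to 60%, and 11000 out of 12328 to 89%.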

Kris


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-07 Thread Oliver Fromme
Marc G. Fournier wrote:
 > Now, that makes sense to me, I can understand that ... but, how would
 > that look as far as netstat -nA shows?  Or, would it?  For example, I
 > have:

You should use "-na" to list all sockets, not "-nA".

 > mars# netstat -nA | grep c9655a20
 > c9655a20 stream      0      0        0 c95d63f0        0        0
 > c95d63f0 stream      0      0        0 c9655a20        0        0
 > mars# netstat -nA | grep c95d63f0
 > c9655a20 stream      0      0        0 c95d63f0        0        0
 > c95d63f0 stream      0      0        0 c9655a20        0        0
 > 
 > They are attached to each other, but there appears to be no 'referencing 
 > process'

netstat doesn't show processes at all (sockstat, fstat
and lsof list sockets by processes).  The sockets above
are probably from a socketpair(2) or a pipe (which is
implemented with socketpair(2), AFAIK).  That's perfectly
normal.
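
Connected pairs like the one above can be picked out of a saved listing mechanically. A rough sketch, assuming the usual column order for UNIX domain sockets in netstat output (Address, Type, Recv-Q, Send-Q, Inode, Conn, Refs, Nextref, Addr); the function name is made up for illustration.

```sh
# paired_sockets (hypothetical helper): reads netstat UNIX-domain lines on
# stdin and prints each pair of stream sockets whose Conn fields point at
# each other. Assumes column 1 is the socket address and column 6 is Conn.
paired_sockets() {
    awk '
        $2 == "stream" { conn[$1] = $6 }
        END {
            for (a in conn) {
                b = conn[a]
                # Print each pair once, from its lexically smaller address.
                if ((b in conn) && conn[b] == a && (a "") < (b ""))
                    print a " <-> " b
            }
        }' | sort
}
```

Fed the two lines quoted above, it prints "c95d63f0 <-> c9655a20". It says nothing about which process, if any, holds either end; that still needs sockstat, fstat or lsof.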

If I remember correctly, you wrote that 11k sockets are
in use with 90 jails.  That's about 120 sockets per jail,
which isn't out of the ordinary.  Of course it depends on
what is running in those jails, but my guess is that you
just need to increase the limit on the number of sockets
(i.e. kern.ipc.maxsockets).

 > Again, if I'm reading / understanding things right, without the 'referencing
 > process', it won't show up in sockstat -u, which is why my netstat -nA
 > numbers keep growing, but sockstat -u numbers don't ... which also means
 > that there is no way to figure out what process / program is leaving
 > 'dangling sockets'? :(

Be careful here, sockstat's output is process-based and
lists sockets multiple times.  For example, the server
sockets that httpd children inherit from their parent
are listed for every single child, while you see it only
once in the netstat output.  On the other hand, sockstat
doesn't show sockets that have been closed and are in
TIME_WAIT state or similar.

Are you sure that UNIX domain sockets are causing the
problem?  Can you rule out other sockets (e.g. tcp)?
In that case you should run "netstat -funix" to list
only UNIX domain sockets (basically the same as the
-u option to sockstat).

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

$ dd if=/dev/urandom of=test.pl count=1
$ file test.pl
test.pl: perl script text executable


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-04 Thread Marc G. Fournier



--On Friday, May 04, 2007 12:05:11 +0100 Robert Watson <[EMAIL PROTECTED]> wrote:

> I think we should be careful to avoid prematurely drawing conclusions about
> the source of the problem.  First question: have you confirmed that the
> resource limit on sockets is definitely what is causing the error you're
> seeing?  I.e., does the number of sockets hit the maximum sockets?

'k, so, based on your other email this morning, about sockstat | stream, I'm 
now keeping an eye on:

# uptime ; netstat -nA | grep -c stream ; sockstat -u | grep -c stream ; sysctl 
kern.ipc.numopensockets ; sysctl kern.ipc.maxsockets
 8:59AM  up 1 day,  9:57, 7 users, load averages: 1.63, 4.92, 5.12
6877
2323
kern.ipc.numopensockets: 8463
kern.ipc.maxsockets: 12328

I'm at least 24 hours out from the error(s) starting to happen ...

> Second point: there are two kinds of resource leaks that seem likely
> candidates for a socket resource exhaustion problem. First, kernel bugs, in
> which the kernel maintains objects despite there being no application
> references, and second, application reference leaks, in which applications
> keep references to kernel objects despite no longer needing them.  Our
> immediate goal is to determine which of these is the case: is it a kernel
> bug, or an application bug?  Using tools like netstat and sockstat, we can
> try and determine if all kernel sockets are properly referenced.  Experience
> suggests that it is an application bug, but we shouldn't rule out a kernel
> bug; the good news is that the tools to use in the debugging process are
> identical at this stage.

'k, in preparation for it starting, so that I can reboot as quickly as 
possible, but get max information ... do I just want to save the output of 
'sockstat -u' and 'netstat -nA', or is there something else that will be useful?

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-04 Thread Robert Watson

On Thu, 3 May 2007, Marc G. Fournier wrote:

> I'm trying to probe this as well as I can, but network stacks and sockets
> have never been my strong suit ...
>
> Robert had mentioned in one of his emails about a "Sockets can also exist
> without any referencing process (if the application closes, but there is
> still data draining on an open socket)."
>
> Now, that makes sense to me, I can understand that ... but, how would that
> look as far as netstat -nA shows?  Or, would it?  For example, I have:
>
> mars# netstat -nA | grep c9655a20
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
> mars# netstat -nA | grep c95d63f0
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
>
> They are attached to each other, but there appears to be no 'referencing
> process' ... it is now 10pm at night ... I saved a 'snapshot' of netstat -nA
> output at 6:45pm, over 3 hours ago, and it has the same entries as above:
>
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
>
> again, if I'm reading this right, there is no 'referencing process' ...
> first, of course, am I reading this right?
>
> second ... if I am reading this right, and, if I am understanding what
> Robert was saying about 'draining' (a lot of ifs, I know) ... isn't it odd
> for it to take >3 hours to drain?
>
> Again, if I'm reading / understanding things right, without the 'referencing
> process', it won't show up in sockstat -u, which is why my netstat -nA
> numbers keep growing, but sockstat -u numbers don't ... which also means
> that there is no way to figure out what process / program is leaving
> 'dangling sockets'? :(


I think we should be careful to avoid prematurely drawing conclusions about 
the source of the problem.  First question: have you confirmed that the 
resource limit on sockets is definitely what is causing the error you're 
seeing?  I.e., does the number of sockets hit the maximum sockets?


Second point: there are two kinds of resource leaks that seem likely 
candidates for a socket resource exhaustion problem. First, kernel bugs, in 
which the kernel maintains objects despite there being no application 
references, and second, application reference leaks, in which applications 
keep references to kernel objects despite no longer needing them.  Our 
immediate goal is to determine which of these is the case: is it a kernel bug, 
or an application bug?  Using tools like netstat and sockstat, we can try and 
determine if all kernel sockets are properly referenced.  Experience suggests 
that it is an application bug, but we shouldn't rule out a kernel bug; the 
good news is that the tools to use in the debugging process are identical at 
this stage.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: What triggers "No Buffer Space Available"?

2007-05-04 Thread Robert Watson


On Thu, 3 May 2007, Marc G. Fournier wrote:

> 'k, all I'm looking at right now is the Unix Domain Sockets, and the output
> of netstat -> sockstat is growing since I first started counting both ..
>
> This was shortly after reboot:
>
> mars# netstat -A | grep stream | wc -l ; sockstat -u | wc -l
>    2705
>    2981
>
> From your explanation above, I'm guessing that the higher sockstat #s is where
> you were talking about one socket being used by multiple processes?  But, right
> now:
>
> mars# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
>    5025
>    2905
>
> sockstat -u #s are *down*, but netstat -na is almost double ...
>
> Again, based on what you state above: "Sockets can also exist without any
> referencing process (if the application closes, but there is still data
> draining on an open socket)."
>
> Now, looking at another 6-STABLE server, but one that has been running for 2
> months now, I'm seeing numbers more consistent with what mars looks like
> shortly after all the jails start up:
>
> venus# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
>    2126
>    2209
>
> So, if those sockets on mars are 'still draining on an open socket', is
> there some way of finding out where?  If I'm understanding what you've said
> above, these 'draining sockets' don't have any processes associated with
> them anymore? So, it's not like I can just kill off a process, correct?


The draining state cannot occur for UNIX domain sockets.  The only cases I 
know of in which UNIX domain sockets can have PCBs without a process 
connection is if the UNIX domain socket is attached to a socket that is being 
passed over another socket where the original socket has released all other 
references to it, and in using FIFOs.  The former is a relatively rare 
occurrence with almost all applications, since very few use explicit file 
descriptor passing.  Is there any chance that any of your applications is 
using a large number of POSIX FIFOs?


BTW, when using sockstat as above, you need to sockstat -u | grep -c stream, 
for the same reason you do it with netstat.  Datagram UNIX domain sockets are 
quite frequently used -- for example, with syslog, so need to be omitted from 
the count if you are comparing only stream sockets.
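
To see how much of each count is datagram rather than stream traffic, the type column can be tallied instead of grepped. A small sketch, assuming the type sits in column 2 of netstat's UNIX domain listing and in column 5 (PROTO) of "sockstat -u"; the function name is hypothetical.

```sh
# type_breakdown COLUMN (hypothetical helper): counts lines on stdin by the
# socket type found in the given whitespace-separated column. Only "stream"
# and "dgram" are tallied, so header lines are ignored.
type_breakdown() {
    awk -v col="$1" '
        $col == "stream" || $col == "dgram" { n[$col]++ }
        END { for (t in n) print t, n[t] }' | sort
}
```

For example, "netstat -n -funix | type_breakdown 2" and "sockstat -u | type_breakdown 5" give per-type counts that can be compared like for like.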


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-03 Thread Ian Smith
On Thu, 3 May 2007, Marc G. Fournier wrote:

 > Robert had mentioned in one of his emails about a "Sockets can also exist
 > without any referencing process (if the application closes, but there is
 > still data draining on an open socket)."

[..]

 > Again, if I'm reading / understanding things right, without the 'referencing
 > process', it won't show up in sockstat -u, which is why my netstat -nA
 > numbers keep growing, but sockstat -u numbers don't ... which also means
 > that there is no way to figure out what process / program is leaving
 > 'dangling sockets'? :(

Marc, I don't know if it may provide any more clues in this instance,
but lsof -U also shows unix domain sockets with pid, command and fd. 

Cheers, Ian



Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-03 Thread Matthew Dillon

:*groan*  why couldn't this be happening on a server that I have better remote 
:access to? :(
:
:But, based on your explanation(s) above ... if I kill off all of the jail(s) on
:the machine, so that there are minimal processes running, shouldn't I see a 
:significant drop in the number of sockets in use as well?  or is there 
:something special about single user mode vs just killing off all 'extra 
:processes'?
:
:- 
:Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)

Yes, you can.  Nothing special about single user... just kill all
the processes that might be using sockets.  Killing the jails is a good
start.

If you are running a lot of jails then I would strongly suspect that
    there is an issue with file descriptor passing over unix domain sockets.
In particular, web servers, databases, and java or other applets could
be the culprit.

Other possibilities... you could just be running out of file descriptors
in the file descriptor table.

use vmstat -m and vmstat -z too... find out what allocates the socket
memory and see what it reports.  Check your mbuf allocation statistics
too (netstat -m).  Damn, I wish that information were collected
on a per-jail basis but I don't think it is.  Look at all the memory
statistics and check to see if anything is growing unbounded over a
long period of time (versus just growing into a cache balance).  Create
a cron job that dumps memory statistics once a minute to a file then
break each report with a clear-screen sequence and cat it in a really
big xterm window.
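
A minimal sketch of the once-a-minute dump described above. The helper name dump_stats, the report layout and the log path in the example call are assumptions made for illustration; a form feed stands in for the clear-screen sequence between reports.

```shell
#!/bin/sh
# dump_stats FILE CMD...
# Append one timestamped report to FILE containing the output of each CMD.
# Helper name and report layout are illustrative assumptions.
dump_stats() {
    out=$1
    shift
    {
        # A form feed separates reports so they can be paged one at a time.
        printf '\f=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        for cmd in "$@"; do
            printf '## %s\n' "$cmd"
            sh -c "$cmd" 2>&1
        done
    } >> "$out"
}

# Example call, e.g. from a script that cron runs every minute
# (the log path is hypothetical):
#   dump_stats /var/tmp/memstats.log 'vmstat -m' 'vmstat -z' 'netstat -m'
```

Keeping every report in one append-only file makes it easy to diff the first and last report after a day and see which allocation keeps growing.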

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


Re: Socket leak (Was: Re: What triggers "No Buffer Space) Available"?

2007-05-03 Thread Marc G. Fournier
--On Thursday, May 03, 2007 18:26:30 -0700 Matthew Dillon
<[EMAIL PROTECTED]> wrote:


> One thing you can do is drop into single user mode... kill all the
> processes on the system, and see if the sockets are recovered.  That
> will give you a good idea as to whether it is a real leak or whether
> some process is directly or indirectly (by not draining a unix domain
> socket on which other sockets are being transferred) holding onto the
> socket.

*groan*  why couldn't this be happening on a server that I have better remote 
access to? :(

But, based on your explanation(s) above ... if I kill off all of the jail(s) on 
the machine, so that there are minimal processes running, shouldn't I see a 
significant drop in the number of sockets in use as well?  or is there 
something special about single user mode vs just killing off all 'extra 
processes'?

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: Socket leak (Was: Re: What triggers "No Buffer Space) Available"?

2007-05-03 Thread Matthew Dillon
:I'm trying to probe this as well as I can, but network stacks and sockets have 
:never been my strong suit ...
:
:Robert had mentioned in one of his emails about a "Sockets can also exist 
:without any referencing process (if the application closes, but there is still 
:data draining on an open socket)."
:
:Now, that makes sense to me, I can understand that ... but, how would that
:look as far as netstat -nA shows?  Or, would it?  For example, I have:
:
:...

Netstat should show any sockets, whether they are attached to processes
or not.  Usually you can match up the address from netstat -nA with
the addresses from sockets shown by fstat to figure out what processes
the sockets are attached to.
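
A rough sketch of that matching step, working on saved output rather than live commands. The function name is hypothetical, and it assumes fstat prints socket addresses as bare whitespace-separated tokens in the same form as the first column of netstat -nA (no 0x prefix); check a few lines by hand before trusting the result.

```shell
#!/bin/sh
# unreferenced_sockets NETSTAT_FILE FSTAT_FILE
# Print the address of every stream socket in saved `netstat -nA` output
# that does not appear as a token anywhere in saved `fstat` output.
# Assumption: both tools print the address in the same textual form.
unreferenced_sockets() {
    awk '
        # First file (fstat output): remember every token seen.
        FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
        # Second file (netstat output): report stream sockets nobody holds.
        $2 == "stream" && !($1 in seen) { print $1 }
    ' "$2" "$1"
}

# Example use on the affected host:
#   netstat -nA > /tmp/netstat.out; fstat > /tmp/fstat.out
#   unreferenced_sockets /tmp/netstat.out /tmp/fstat.out
```

Addresses printed here are only candidates: a socket that is in transit or still draining is also unreferenced for a while, so it is the ones that stay on the list for hours that deserve attention.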

There are three situations that you have to watch out for:

(1) The socket was close()'d and is still draining.  The socket
will timeout and terminate within ~1-5 minutes.  It will not
be referenced to a descriptor or process.

(2) The socket descriptor itself has been sent over a unix domain socket
from one process to another and is currently in transit.  The 
file pointer representing the descriptor is what is actually in
transit, and will not be referenced by any processes while it is
in transit.

There is a garbage collector that figures out unreferencable loops.
I think it's called unp_gc or something like that.

(3) The socket is not closed, but is idle (like having a remote shell
open and never typing in it).  Service processes can get stuck
waiting for data on such sockets.  The socket WILL be referenced
by some process.

These are controlled by net.inet.tcp.keep* and
net.inet.tcp.always_keepalive.  I almost universally turn on
net.inet.tcp.always_keepalive to ensure that dead idle connections
get cleaned out.

Note that keepalive only applies to idle connections.  A socket
that has been closed and needs to drain (either data or the FIN
state) will time out and clean itself up whether keepalive is
turned on or off.

netstat -nA will give you the status of all your sockets.  You can
observe the state of any TCP sockets.
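
To see at a glance which TCP states dominate, the netstat output can be tallied per state. This is only a sketch: the function name is made up, and it assumes the state is the last column of each tcp line, with the protocol in column 1 (netstat -n) or column 2 (netstat -nA).

```shell
#!/bin/sh
# tcp_state_counts: read `netstat -n` or `netstat -nA` output on stdin and
# print "count state" for every TCP socket, most common state first.
tcp_state_counts() {
    awk '
        # With -A the protocol is in column 2, otherwise in column 1.
        $1 ~ /^tcp/ || $2 ~ /^tcp/ { n[$NF]++ }
        END { for (s in n) print n[s], s }
    ' | sort -k1,1nr -k2,2
}

# Example use:
#   netstat -nA | tcp_state_counts
```

A pile of sockets parked in one draining state for much longer than a few minutes would point at a failed timer rather than at an application.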

Unix domain sockets have no state and closure is governed simply by
them being dereferenced, just like a pipe.  In this case there are really
only two situations:  (1) One end of the unix domain socket is still
referenced by a process or (2) The socket has been sent over another
unix domain socket and is 'in transit'.  The socket will remain intact
until it is either no longer in transit (read out from the other unix
domain socket), or the garbage collector determines that the socket the
descriptor is transiting over is not externally referencable, and
will destroy it and any in-transit sockets contained within.

Any sockets that don't fall into these categories are in trouble...
either a timer has failed somewhere or (if unix domain) the garbage
collector has failed to detect that it is in an unreferencable loop.

-

One thing you can do is drop into single user mode... kill all the 
processes on the system, and see if the sockets are recovered.  That
will give you a good idea as to whether it is a real leak or whether
some process is directly or indirectly (by not draining a unix domain
socket on which other sockets are being transferred) holding onto the
socket.

-Matt



Socket leak (Was: Re: What triggers "No Buffer Space) Available"?

2007-05-03 Thread Marc G. Fournier


I'm trying to probe this as well as I can, but network stacks and sockets have 
never been my strong suit ...

Robert had mentioned in one of his emails about a "Sockets can also exist 
without any referencing process (if the application closes, but there is still 
data draining on an open socket)."

Now, that makes sense to me, I can understand that ... but, how would that look 
as far as netstat -nA shows?  Or, would it?  For example, I have:

mars# netstat -nA | grep c9655a20
c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0
mars# netstat -nA | grep c95d63f0
c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0

They are attached to each other, but there appears to be no 'referencing 
process' ... it is now 10pm at night ... I saved a 'snapshot' of netstat -nA 
output at 6:45pm, over 3 hours ago, and it has the same entries as above:

c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0

again, if I'm reading this right, there is no 'referencing process' ... first, 
of course, am I reading this right?

second ... if I am reading this right, and, if I am understanding what Robert 
was saying about 'draining' (a lot of ifs, I know) ... isn't it odd for it to
take >3 hours to drain?

Again, if I'm reading / understanding things right, without the 'referencing 
process', it won't show up in sockstat -u, which is why my netstat -nA numbers 
keep growing, but sockstat -u numbers don't ... which also means that there is 
no way to figure out what process / program is leaving 'dangling sockets'? :(
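
The snapshot comparison above can be automated. A small sketch, assuming two saved netstat -nA snapshots and a hypothetical helper name; it prints the stream-socket addresses that appear in both files.

```shell
#!/bin/sh
# persistent_sockets OLD_SNAPSHOT NEW_SNAPSHOT
# Print stream-socket addresses present in both saved `netstat -nA` files.
persistent_sockets() {
    awk '
        $2 != "stream" { next }
        # First file: remember each address.
        FILENAME == ARGV[1] { old[$1] = 1; next }
        # Second file: print addresses that were already there.
        $1 in old { print $1 }
    ' "$1" "$2"
}

# Example use, with snapshots taken a few hours apart:
#   netstat -nA > /tmp/snap.1845
#   netstat -nA > /tmp/snap.2200
#   persistent_sockets /tmp/snap.1845 /tmp/snap.2200 | wc -l
```

Kernel addresses can be reused after a socket is freed, so a match is a hint that the socket survived, not proof.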


- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Adrian Chadd

On 04/05/07, Marc G. Fournier <[EMAIL PROTECTED]> wrote:


> 'k, all I'm looking at right now is the Unix Domain Sockets, and the output of
> netstat -> sockstat is growing since I first started counting both ..


Hm! What about graphing them? It shouldn't be hard to write an mrtg
shell script data source to graph these things on your different
servers to compare/contrast. You could graph different chunks of the
netstat -m output along with the stuff below.

Ever played with munin, btw? The exercise would be to find other stuff
on the system to correlate against your apparent unbounded socket
growth and then investigate possible causation.
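
For the mrtg route, an external data source is a script that prints four lines: the first value, the second value, an uptime string and a target name. A minimal sketch follows; the function name is invented here, and which two counters to graph is an assumption based on the netstat and sockstat counts quoted in this thread.

```shell
#!/bin/sh
# mrtg_report FIRST_VALUE SECOND_VALUE
# Print the four lines MRTG reads from an external script:
# value 1, value 2, uptime string, target name.
mrtg_report() {
    printf '%s\n%s\n%s\n%s\n' \
        "$1" "$2" \
        "$(uptime 2>/dev/null || echo unknown)" \
        "$(uname -n)"
}

# Example wiring on the host being graphed:
#   mrtg_report "$(netstat -nA | grep -c stream)" \
#               "$(sockstat -u | wc -l | tr -d ' ')"
```

Graphing both counters on one chart makes the point where the netstat count pulls away from the sockstat count easy to spot.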



Adrian








--
Adrian Chadd - [EMAIL PROTECTED]


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Marc G. Fournier
--On Thursday, May 03, 2007 19:28:56 +0100 Robert Watson <[EMAIL PROTECTED]>
wrote:

> I generally recommend using a combination of netstat and sockstat.  Sockets
> represent, loosely, IPC endpoints.  There are actually two "layers"
> associated with each socket -- the IPC object (socket) and the protocol
> control block (PCB).  Both are resource limited to prevent run-away processes
> from swamping the system, so exhaustion of either can lead to ENOBUFS.
>
> The behaviors of netstat and sockstat are quite different, even though the
> output is similar: netstat walks the protocol-layer connection lists and
> prints information about them.  sockstat walks the process file descriptor
> table and prints information on reachable sockets.  As sockets can exist
> without PCBs, and PCBs can exist without sockets, you need to look at both to
> get a full picture.  This can occur if a process exits, closes the socket, and
> the connection remains in, for example, the TIME_WAIT state.
>
> There are some other differences -- the same socket can appear more than once
> in sockstat output, as more than one process can reference the same socket.
> Sockets can also exist without any referencing process (if the application
> closes, but there is still data draining on an open socket).
>
> I would suggest starting with sockstat, as that will allow you to link socket
> use to applications, and provide a fairly neat summary.  When using netstat,
> use "netstat -na", which will list all sockets and avoid name lookups.

'k, all I'm looking at right now is the Unix Domain Sockets, and the output of 
netstat -> sockstat is growing since I first started counting both ..

This was shortly after reboot:

mars# netstat -A | grep stream | wc -l ; sockstat -u | wc -l
2705
2981

From your explanation above, I'm guessing that the higher sockstat #s is
where you were talking about one socket being used by multiple processes?
But, right now:

mars# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
5025
2905

sockstat -u #s are *down*, but netstat -na is almost double ...

Again, based on what you state above: "Sockets can also exist without any 
referencing process (if the application closes, but there is still data 
draining on an open socket)."

Now, looking at another 6-STABLE server, but one that has been running for 2 
months now, I'm seeing numbers more consistent with what mars looks like 
shortly after all the jails start up:

venus# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
2126
2209

So, if those sockets on mars are 'still draining on an open socket', is there 
some way of finding out where?  If I'm understanding what you've said above, 
these 'draining sockets' don't have any processes associated with them anymore? 
So, it's not like I can just kill off a process, correct?


- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Robert Watson


On Wed, 2 May 2007, Marc G. Fournier wrote:


> # netstat | egrep "tcp4|udp4" | awk '{print $1}' | uniq -c
> 171 tcp4
> 103 udp4
>
> or is there a better command I should be using?


I generally recommend using a combination of netstat and sockstat.  Sockets 
represent, loosely, IPC endpoints.  There are actually two "layers" associated 
with each socket -- the IPC object (socket) and the protocol control block 
(PCB).  Both are resource limited to prevent run-away processes from swamping
the system, so exhaustion of either can lead to ENOBUFS.


The behaviors of netstat and sockstat are quite different, even though the 
output is similar: netstat walks the protocol-layer connection lists and 
prints information about them.  sockstat walks the process file descriptor 
table and prints information on reachable sockets.  As sockets can exist 
without PCBs, and PCBs can exist without sockets, you need to look at both to 
get a full picture.  This can occur if a process exits, closes the socket, and
the connection remains in, for example, the TIME_WAIT state.


There are some other differences -- the same socket can appear more than once 
in sockstat output, as more than one process can reference the same socket. 
Sockets can also exist without any referencing process (if the application 
closes, but there is still data draining on an open socket).


I would suggest starting with sockstat, as that will allow you to link socket 
use to applications, and provide a fairly neat summary.  When using netstat, 
use "netstat -na", which will list all sockets and avoid name lookups.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Robert Watson

On Tue, 1 May 2007, Marc G. Fournier wrote:

> I'm still being hit by this one ... more frequently right now as I had to
> move a bit more stuff *onto* that server ... I'm trying to figure out what I
> can monitor for a 'leak' somewhere, but the only thing I'm able to find is
> the whole nmbclusters stuff:
>
> mars# netstat -m | grep "mbuf clusters"
> 130/542/672/25600 mbuf clusters in use (current/cache/total/max)
>
> the above is after 26hrs uptime ...
>
> Is there something else that will trigger/generate the above error message?


ENOBUFS is a common error in the network stack reflecting a lack of free 
memory or exceeding a system, user, or process resource limit.  While the 
classic source of ENOBUFS is mbuf or mbuf cluster exhaustion, there are 
several other sources of the error.  For example, you will get ENOBUFS back if 
you run out of sockets, or a process tries to increase the size of socket 
buffers beyond the user resource limit.  I'd look at all the output of netstat 
-m, not just clusters.  I'd also look at kern.ipc.numopensockets and compare 
it to kern.ipc.maxsockets.
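
As a sketch of that last comparison, the two sysctl values can be turned into a usage figure. The function name is hypothetical; it only parses the "name: value" lines that sysctl prints, so it can also be run on saved output.

```shell
#!/bin/sh
# socket_usage: read `sysctl kern.ipc.numopensockets kern.ipc.maxsockets`
# output on stdin and print open sockets, the limit, and the integer
# percentage of the limit in use.
socket_usage() {
    awk -F': *' '
        $1 == "kern.ipc.numopensockets" { open = $2 + 0 }
        $1 == "kern.ipc.maxsockets"     { max = $2 + 0 }
        END {
            if (max > 0)
                printf "%d of %d sockets in use (%d%%)\n",
                       open, max, int(open * 100 / max)
        }
    '
}

# Example use:
#   sysctl kern.ipc.numopensockets kern.ipc.maxsockets | socket_usage
```

Run from cron alongside netstat -m, this gives an early warning before the limit is reached and ENOBUFS starts coming back.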


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Marc G. Fournier
--On Thursday, May 03, 2007 11:17:56 -0400 Chuck Swiger <[EMAIL PROTECTED]>
wrote:


> The ones you're showing are from Postfix.  It would be interesting to sort
> them by frequency and see what the majority of the use is from.
>
> If you sort the data by the conn field, do the ones without an address all
> hit the same thing?  If you grep for that in the first field, I found a lot
> that are talking to /var/run/logpriv (ie, a socketpair() to syslogd,
> presumably).

Okay, assuming that I'm doing this right, here's what I have:

Last night, before I went to bed:

mars# netstat -A | grep stream | wc -l ; sockstat -u | wc -l
2705
2981

Today, 5 minutes ago:

# netstat -A | grep stream | wc -l ; sockstat -u | wc -l
4397
2961

Looking at the Conn field from netstat -A:

mars# awk '{print $6}' /tmp/output | sort | uniq -c | sort -nr | head -5
2125 0
   1 d14dbe10
   1 d14dbbd0
   1 d14dbb40
   1 d14dba20

So, 2125 sockets not connected to anything?
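
The same tally can be reduced to two numbers. A small sketch with a made-up function name, assuming (as in the awk command above) that Conn is the sixth whitespace-separated column of the saved netstat -A output:

```shell
#!/bin/sh
# conn_summary: read saved `netstat -A` output on stdin and report how many
# stream sockets have a peer (Conn is non-zero) and how many do not.
conn_summary() {
    awk '
        $2 == "stream" { if ($6 == "0") idle++; else paired++ }
        END { printf "connected=%d unconnected=%d\n", paired, idle }
    '
}

# Example use:
#   conn_summary < /tmp/output
```

A Conn of 0 is not automatically a leak, since listening sockets have no peer either; what matters is whether the unconnected count keeps climbing between runs.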

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-03 Thread Chuck Swiger

Marc G. Fournier wrote:
[ ... ]
> okay, next question ... under 'Active UNIX domain sockets', I see a lot that
> have no Addr:
>
> Active UNIX domain sockets
> Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
> d06b7480 stream      0      0        0 c969b240        0        0 private/proxymap
> c969b240 stream      0      0        0 d06b7480        0        0
> ce6fc870 stream      0      0        0 cf744870        0        0 private/rewrite
> cf744870 stream      0      0        0 ce6fc870        0        0
> ce4b2630 stream      0      0        0 d0cee900        0        0 private/proxymap

The ones you're showing are from Postfix.  It would be interesting to sort 
them by frequency and see what the majority of the use is from.


If you sort the data by the conn field, do the ones without an address all hit 
the same thing?  If you grep for that in the first field, I found a lot that 
are talking to /var/run/logpriv (ie, a socketpair() to syslogd, presumably).


--
-Chuck


Re: What triggers "No Buffer Space Available"?

2007-05-02 Thread Marc G. Fournier


'k, I just rebooted the server (messages started again), and netstat -A is 
showing 3600 sockets open ... based on jupiter/pluto/venus numbers, this is 
what I'd expect to see (~1000 sockets per 30 jails) ... so, over the course of 
the next 2 days, I expect that that will grow to the 11k+ that I saw when I
rebooted, with most of those apparently not attached to an 'Addr' ...




- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-02 Thread Marc G. Fournier
--On Wednesday, May 02, 2007 11:17:02 -0700 John-Mark Gurney
<[EMAIL PROTECTED]> wrote:


> netstat -A will list the socket address, fstat will list the fd, and what
> socket is connected to that fd.

Oh wow ... according to this, I have:

mars# wc -l /tmp/output
   11238 /tmp/output

(minus some header lines) sockets running right now ...

okay, next question ... under 'Active UNIX domain sockets', I see a lot that
have no Addr:

Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
d06b7480 stream      0      0        0 c969b240        0        0 private/proxymap
c969b240 stream      0      0        0 d06b7480        0        0
ce6fc870 stream      0      0        0 cf744870        0        0 private/rewrite
cf744870 stream      0      0        0 ce6fc870        0        0
ce4b2630 stream      0      0        0 d0cee900        0        0 private/proxymap
d0cee900 stream      0      0        0 ce4b2630        0        0
d0437240 stream      0      0        0 cf716000        0        0 private/proxymap
cf716000 stream      0      0        0 d0437240        0        0
c94f4990 stream      0      0        0 cee6ed80        0        0 private/rewrite
cee6ed80 stream      0      0        0 c94f4990        0        0
d0cefcf0 stream      0      0        0 cb281a20        0        0 private/rewrite
cb281a20 stream      0      0        0 d0cefcf0        0        0
ce0d5240 stream      0      0        0 cb251480        0        0 private/anvil

Now, the 'Conn' field from the previous line matches the 'Address' line of the 
'blank Addr' ... so there are two sockets for each Addr?  in vs out?

To give reference point ... mars above has 91 jail'd environments running on 
it, it's been up 2days, 9hrs now, and has 11k sockets in use ...

Hrmmm ... just checked jupiter, and she has 32 jails with 1080 sockets ... venus
has 62 jails with 2819 sockets ... and pluto has 35 jails with 1818 sockets ...

mars is running on average 2x the number of sockets per jail than the other
servers ...

Is this normal?
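
For reference, the per-jail arithmetic behind that comparison can be checked with a few lines of awk. This is only a sketch: the function name is invented, and the mars figure uses the 11238-line count from /tmp/output, which includes a few header lines.

```shell
#!/bin/sh
# per_jail: read "host jails sockets" lines on stdin and print the whole
# number of sockets per jail for each host.
per_jail() {
    awk '$2 > 0 { printf "%s %d\n", $1, int($3 / $2) }'
}

# Example with the figures quoted in this message:
#   printf '%s\n' 'mars 91 11238' 'jupiter 32 1080' \
#                 'venus 62 2819' 'pluto 35 1818' | per_jail
```

With these inputs mars comes out at 123 sockets per jail against 33, 45 and 51 on the other three hosts, so the gap is somewhat larger than 2x.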

mars# grep d067f900 /tmp/output
d067f900 stream      0      0        0 cafd4c60        0        0
cafd4c60 stream      0      0        0 d067f900        0        0

There is no 'Addr' related to either of them?  I can scroll down pages and 
pages of those types of entries, that don't have any Addr field associated with 
them ...






- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-02 Thread John-Mark Gurney
Marc G. Fournier wrote this message on Wed, May 02, 2007 at 14:34 -0300:
> Is there any way of determining which apps are holding open which sockets?
> ie. lsof for open files?

netstat -A will list the socket address, fstat will list the fd, and what
socket is connected to that fd.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: What triggers "No Buffer Space Available"?

2007-05-02 Thread Marc G. Fournier
--On Wednesday, May 02, 2007 11:00:17 +0800 Adrian Chadd <[EMAIL PROTECTED]>
wrote:


> It doesn't panic when it happens, no?

Nope ... I can login via ssh (sometimes it takes a try or two, but I can always 
login) and then do a 'reboot', and all is well again for another 72 hours or so 
...

> I'd check the number of sockets you've currently got open at that
> point.

ie:

# netstat | egrep "tcp4|udp4" | awk '{print $1}' | uniq -c
 171 tcp4
 103 udp4

or is there a better command I should be using?

> Some applications might be holding open a whole load of sockets
> and their buffers stay allocated until they're closed. If they don't
> handle/don't get told about the error then they'll just hold open the
> mbufs.

Is there any way of determining which apps are holding open which sockets?  ie. 
lsof for open files?

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: What triggers "No Buffer Space Available"?

2007-05-01 Thread Adrian Chadd

On 01/05/07, Marc G. Fournier <[EMAIL PROTECTED]> wrote:

> I'm still being hit by this one ... more frequently right now as I had to move
> a bit more stuff *onto* that server ... I'm trying to figure out what I can
> monitor for a 'leak' somewhere, but the only thing I'm able to find is the
> whole nmbclusters stuff:
>
> mars# netstat -m | grep "mbuf clusters"
> 130/542/672/25600 mbuf clusters in use (current/cache/total/max)
>
> the above is after 26hrs uptime ...
>
> Is there something else that will trigger/generate the above error message?


It doesn't panic when it happens, no?

I'd check the number of sockets you've currently got open at that
point. Some applications might be holding open a whole load of sockets
and their buffers stay allocated until they're closed. If they don't
handle/don't get told about the error then they'll just hold open the
mbufs.

(I came across this when banging TCP connections through a simple TCP
socket proxy and wondered why networking would lock up. Turns out
FreeBSD-6 isn't logging the "please consider raising NMBCLUSTERS"
kernel message anymore and I needed to do exactly that. Killing the
proxy process actually restored network connectivity.)


Adrian

--
Adrian Chadd - [EMAIL PROTECTED]


What triggers "No Buffer Space Available"?

2007-05-01 Thread Marc G. Fournier


I'm still being hit by this one ... more frequently right now as I had to move 
a bit more stuff *onto* that server ... I'm trying to figure out what I can 
monitor for a 'leak' somewhere, but the only thing I'm able to find is the 
whole nmbclusters stuff:

mars# netstat -m | grep "mbuf clusters"
130/542/672/25600 mbuf clusters in use (current/cache/total/max)

the above is after 26hrs uptime ...

Is there something else that will trigger/generate the above error message?


- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664