Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-11 Thread Gleb Smirnoff
On Sun, Dec 09, 2012 at 09:57:30AM -0800, Richard Sharpe wrote:
R> > lsof and sockstat can be helpful.  lsof may be able to help determine if 
R> > there's a leak because it MAY find sockets not associated with a 
R> > process.
R> > 
R> > Hope this helps.
R> 
R> Thanks Alfred. After following through the call graph and confirming
R> (with the code) that it was correct, I am now pretty convinced that I
R> was wrong in assuming that it was a socket leak.

You can always check the number of socket allocations in the kernel via:

  vmstat -z | grep ^socket | awk '{print $4}'

If you can't establish a scenario in which the number grows without bound,
then there is no leak.
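
A minimal sketch of how that one-liner could be turned into a repeatable
trend check. The helper names socket_used and grew_every_time are
hypothetical, and the parsing assumes the vmstat -z layout implied above,
where the fourth whitespace-separated field of the "socket:" row is the
in-use count followed by a comma.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of FreeBSD.

# Read `vmstat -z` output on stdin and print the in-use count of the
# "socket" zone (4th field, with the trailing comma stripped).
socket_used() {
    awk '/^socket:/ { gsub(",", "", $4); print $4; exit }'
}

# Read one sample per line on stdin. Succeed only if there are at least
# two samples and each one is strictly larger than the one before it.
grew_every_time() {
    awk '
        NR > 1 && ($1 + 0) <= prev { bad = 1 }
        { prev = $1 + 0 }
        END { if (bad || NR < 2) exit 1 }
    '
}

# Example use on a FreeBSD host (not run here):
#   for i in 1 2 3 4 5; do vmstat -z | socket_used; sleep 60; done \
#       | grew_every_time && echo "socket count grew in every sample"
```

Steady growth across a handful of samples is only a hint. Per the
reasoning above, it is growth without bound under a repeatable scenario
that would point to a leak.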

-- 
Totus tuus, Glebius.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-09 Thread Richard Sharpe
On Sun, 2012-12-09 at 00:10 -0800, Alfred Perlstein wrote:
> On 12/8/12 5:05 PM, Richard Sharpe wrote:
> > On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> >>> Hi folks,
> >>>
> >>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> >>> lot of cases where accept returns ECONNABORTED because the system load
> >>> is high and Samba has a large listen backlog.
> >>>
> >>> Every now and then we get a crash in smbd or in winbindd and winbindd
> >>> complains of too many open files in the system.
> >>>
> >>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> >>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> >>> is the only error returned from tcp_usr_accept.
> >>>
> >>> It seems like the socket taken off so_comp is never freed in this case
> >>> and that there has been a call on soref on it as well, so that something
> >>> like the following is needed in the error path:
> >>>
> >>>  //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> >>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
> >>> @@ -433,6 +433,14 @@
> >>>   */
> >>>  if (name)
> >>>  *namelen = 0;
> >>> +   /*
> >>> +* We need to close the socket we unlinked
> >>> +* so we do not leak it.
> >>> +*/
> >>> +   ACCEPT_LOCK();
> >>> +   SOCK_LOCK(so);
> >>> +   soclose(so);
> >>>  goto noconnection;
> >>>  }
> >>>  if (sa == NULL) {
> >>>
> >>> I think an soclose is needed at this point because soisconnected has
> >>> been called on the socket.
> >>>
> >>> Do you think this analysis is reasonable?
> >>>
> >>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> >>> maybe I am wrong since I am not sure if the fdclose call would free the
> >>> socket, but a quick look suggested that it doesn't.
> >> The fdclose should properly tear down the file descriptor.  The call
> >> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> >> soclose() -> sorele() -> sofree() -> sodealloc().
> >>
> >> A socket leak would not count against "kern.maxfiles" unless the file
> >> descriptor leaks as well.  So it is unlikely that this is the problem.
> > OK, thanks for the feedback. I will keep looking.
> >
> >> Samba may open a large number of files (real files and sockets) and
> >> you may run into the maxfiles limit.  You can check the limit with
> >> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
> >> with "kern.maxfiles=10" for example.
> > Well, some of the smbds are dying, but it is possible that there is a
> > file leak in Samba or our VFS that we are tripping as well.
> 
> lsof and sockstat can be helpful.  lsof may be able to help determine if 
> there's a leak because it MAY find sockets not associated with a 
> process.
> 
> Hope this helps.

Thanks Alfred. After following through the call graph and confirming
(with the code) that it was correct, I am now pretty convinced that I
was wrong in assuming that it was a socket leak.

However, lsof will be useful in allowing me to see how many FDs each
smbd in this test is using. We have, I am told, kern.maxfiles set to
65536, which I think might be a little low for the test they are
running. 
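
A rough sketch of how the per-smbd descriptor counts and the system-wide
headroom could be tallied. The function names fds_per_pid and pct_of_limit
are hypothetical. The parsing assumes lsof's default column layout
(COMMAND PID USER FD ...), and the usage comment assumes the
kern.openfiles sysctl is available alongside kern.maxfiles.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Read default-format `lsof` output on stdin and print "PID COUNT" per
# process, highest count first. Only rows whose FD column starts with a
# digit are counted (real descriptors, not cwd/txt/mem entries).
fds_per_pid() {
    awk 'NR > 1 && $4 ~ /^[0-9]/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -k2,2nr
}

# Print how much of a limit is in use, as a whole percentage.
#   $1 = current count, $2 = limit
pct_of_limit() {
    echo $(( $1 * 100 / $2 ))
}

# Example use on a FreeBSD host (not run here):
#   lsof -c smbd | fds_per_pid
#   pct_of_limit "$(sysctl -n kern.openfiles)" "$(sysctl -n kern.maxfiles)"
```

If the percentage climbs toward 100 while the test runs, the limit rather
than a leak may be what the smbd processes are hitting.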




Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-09 Thread Alfred Perlstein

On 12/8/12 5:05 PM, Richard Sharpe wrote:
> On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
>>> Hi folks,
>>>
>>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
>>> lot of cases where accept returns ECONNABORTED because the system load
>>> is high and Samba has a large listen backlog.
>>>
>>> Every now and then we get a crash in smbd or in winbindd and winbindd
>>> complains of too many open files in the system.
>>>
>>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
>>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
>>> is the only error returned from tcp_usr_accept.
>>>
>>> It seems like the socket taken off so_comp is never freed in this case
>>> and that there has been a call on soref on it as well, so that something
>>> like the following is needed in the error path:
>>>
>>>  //some-path/freebsd/sys/kern/uipc_syscalls.c#1
>>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
>>> @@ -433,6 +433,14 @@
>>>   */
>>>  if (name)
>>>  *namelen = 0;
>>> +   /*
>>> +* We need to close the socket we unlinked
>>> +* so we do not leak it.
>>> +*/
>>> +   ACCEPT_LOCK();
>>> +   SOCK_LOCK(so);
>>> +   soclose(so);
>>>  goto noconnection;
>>>  }
>>>  if (sa == NULL) {
>>>
>>> I think an soclose is needed at this point because soisconnected has
>>> been called on the socket.
>>>
>>> Do you think this analysis is reasonable?
>>>
>>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
>>> maybe I am wrong since I am not sure if the fdclose call would free the
>>> socket, but a quick look suggested that it doesn't.
>> The fdclose should properly tear down the file descriptor.  The call
>> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
>> soclose() -> sorele() -> sofree() -> sodealloc().
>>
>> A socket leak would not count against "kern.maxfiles" unless the file
>> descriptor leaks as well.  So it is unlikely that this is the problem.
> OK, thanks for the feedback. I will keep looking.
>
>> Samba may open a large number of files (real files and sockets) and
>> you may run into the maxfiles limit.  You can check the limit with
>> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
>> with "kern.maxfiles=10" for example.
> Well, some of the smbds are dying, but it is possible that there is a
> file leak in Samba or our VFS that we are tripping as well.

lsof and sockstat can be helpful.  lsof may be able to help determine if
there's a leak because it MAY find sockets not associated with a
process.

Hope this helps.

-Alfred



Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Richard Sharpe
On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> > Hi folks,
> >
> > Our QA group (at xxx) using Samba and smbtorture has been seeing a
> > lot of cases where accept returns ECONNABORTED because the system load
> > is high and Samba has a large listen backlog.
> >
> > Every now and then we get a crash in smbd or in winbindd and winbindd
> > complains of too many open files in the system.
> >
> > In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> > when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> > is the only error returned from tcp_usr_accept.
> >
> > It seems like the socket taken off so_comp is never freed in this case
> > and that there has been a call on soref on it as well, so that something
> > like the following is needed in the error path:
> >
> >  //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> > - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
> > @@ -433,6 +433,14 @@
> >  */
> > if (name)
> > *namelen = 0;
> > +   /*
> > +* We need to close the socket we unlinked
> > +* so we do not leak it.
> > +*/
> > +   ACCEPT_LOCK();
> > +   SOCK_LOCK(so);
> > +   soclose(so);
> > goto noconnection;
> > }
> > if (sa == NULL) {
> >
> > I think an soclose is needed at this point because soisconnected has
> > been called on the socket.
> >
> > Do you think this analysis is reasonable?
> >
> > We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> > maybe I am wrong since I am not sure if the fdclose call would free the
> > socket, but a quick look suggested that it doesn't.
> 
> The fdclose should properly tear down the file descriptor.  The call
> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> soclose() -> sorele() -> sofree() -> sodealloc().
> 
> A socket leak would not count against "kern.maxfiles" unless the file
> descriptor leaks as well.  So it is unlikely that this is the problem.

OK, thanks for the feedback. I will keep looking.

> Samba may open a large number of files (real files and sockets) and
> you may run into the maxfiles limit.  You can check the limit with
> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
> with "kern.maxfiles=10" for example.

Well, some of the smbds are dying, but it is possible that there is a
file leak in Samba or our VFS that we are tripping as well.



Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Andre Oppermann

> Hi folks,
>
> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> lot of cases where accept returns ECONNABORTED because the system load
> is high and Samba has a large listen backlog.
>
> Every now and then we get a crash in smbd or in winbindd and winbindd
> complains of too many open files in the system.
>
> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> is the only error returned from tcp_usr_accept.
>
> It seems like the socket taken off so_comp is never freed in this case
> and that there has been a call on soref on it as well, so that something
> like the following is needed in the error path:
>
>  //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
> @@ -433,6 +433,14 @@
>  */
> if (name)
> *namelen = 0;
> +   /*
> +* We need to close the socket we unlinked
> +* so we do not leak it.
> +*/
> +   ACCEPT_LOCK();
> +   SOCK_LOCK(so);
> +   soclose(so);
> goto noconnection;
> }
> if (sa == NULL) {
>
> I think an soclose is needed at this point because soisconnected has
> been called on the socket.
>
> Do you think this analysis is reasonable?
>
> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> maybe I am wrong since I am not sure if the fdclose call would free the
> socket, but a quick look suggested that it doesn't.


The fdclose should properly tear down the file descriptor.  The call
graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
soclose() -> sorele() -> sofree() -> sodealloc().

A socket leak would not count against "kern.maxfiles" unless the file
descriptor leaks as well.  So it is unlikely that this is the problem.

Samba may open a large number of files (real files and sockets) and
you may run into the maxfiles limit.  You can check the limit with
"sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
with "kern.maxfiles=10" for example.
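
As a small sketch of comparing the limit configured at boot with the live
one: the helper name loader_conf_value is hypothetical, and it assumes
simple name="value" lines with optional trailing comments, which is the
usual shape of loader.conf entries.

```shell
#!/bin/sh
# Sketch only: helper name is hypothetical.

# Read loader.conf-style text on stdin and print the last value assigned
# to the tunable named in $1. Prints nothing if the tunable is not set.
loader_conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        { k = $1; gsub(/[ \t]/, "", k) }     # tunable name, trimmed
        k == key {
            v = $2
            sub(/[ \t]*#.*$/, "", v)         # drop a trailing comment
            gsub(/[ \t"]/, "", v)            # drop quotes and blanks
            val = v
        }
        END { if (val != "") print val }
    '
}

# Example use on a FreeBSD host (not run here):
#   loader_conf_value kern.maxfiles < /boot/loader.conf   # boot-time setting
#   sysctl -n kern.maxfiles                               # live limit
```

If the helper prints nothing, the tunable is not set in that file and the
live value comes from elsewhere.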

--
Andre



Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Adrian Chadd
Hi,

If this is a real leak, please file a PR so it doesn't get lost.
*cough* let me rephrase that - so the eager PR beavers can keep
chasing it up.

But, wow. Nice catch!



Adrian


On 8 December 2012 10:13, Richard Sharpe wrote:
> Hi folks,
>
> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> lot of cases where accept returns ECONNABORTED because the system load
> is high and Samba has a large listen backlog.
>
> Every now and then we get a crash in smbd or in winbindd and winbindd
> complains of too many open files in the system.
>
> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> is the only error returned from tcp_usr_accept.
>
> It seems like the socket taken off so_comp is never freed in this case
> and that there has been a call on soref on it as well, so that something
> like the following is needed in the error path:
>
>  //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
> @@ -433,6 +433,14 @@
>  */
> if (name)
> *namelen = 0;
> +   /*
> +* We need to close the socket we unlinked
> +* so we do not leak it.
> +*/
> +   ACCEPT_LOCK();
> +   SOCK_LOCK(so);
> +   soclose(so);
> goto noconnection;
> }
> if (sa == NULL) {
>
> I think an soclose is needed at this point because soisconnected has
> been called on the socket.
>
> Do you think this analysis is reasonable?
>
> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> maybe I am wrong since I am not sure if the fdclose call would free the
> socket, but a quick look suggested that it doesn't.
>
>  I would appreciate your feedback.
>


Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Richard Sharpe
Hi folks,

Our QA group (at xxx) using Samba and smbtorture has been seeing a
lot of cases where accept returns ECONNABORTED because the system load
is high and Samba has a large listen backlog.

Every now and then we get a crash in smbd or in winbindd and winbindd
complains of too many open files in the system.

In looking at kern_accept, it seems to me that FreeBSD can leak a socket
when kern_accept calls soaccept on it but gets ECONNABORTED. This error
is the only error returned from tcp_usr_accept.

It seems like the socket taken off so_comp is never freed in this case
and that there has been a call on soref on it as well, so that something
like the following is needed in the error path:

 //some-path/freebsd/sys/kern/uipc_syscalls.c#1
- /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
@@ -433,6 +433,14 @@
 		 */
 		if (name)
 			*namelen = 0;
+		/*
+		 * We need to close the socket we unlinked
+		 * so we do not leak it.
+		 */
+		ACCEPT_LOCK();
+		SOCK_LOCK(so);
+		soclose(so);
 		goto noconnection;
 	}
 	if (sa == NULL) {

I think an soclose is needed at this point because soisconnected has
been called on the socket.

Do you think this analysis is reasonable?

We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
maybe I am wrong since I am not sure if the fdclose call would free the
socket, but a quick look suggested that it doesn't.

I would appreciate your feedback.
