Re: Possible obscure socket leak when system under load and listener is slow to accept
On Sun, Dec 09, 2012 at 09:57:30AM -0800, Richard Sharpe wrote:
R> > lsof and sockstat can be helpful. lsof may be able to help determine if
R> > there's a leak because it MAY find sockets not associated with a
R> > process.
R> >
R> > Hope this helps.
R>
R> Thanks Alfred. After following through the call graph and confirming
R> (with the code) that it was correct, I am now pretty convinced that I
R> was wrong in assuming that it was a socket leak.

You can always check the number of socket allocations in the kernel via:

  vmstat -z | grep ^socket | awk '{print $4}'

If you can't establish a scenario in which the number grows without bound,
then there is no leak.

-- 
Totus tuus, Glebius.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
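The check Gleb describes can be automated by sampling the counter during a test run. What follows is only a sketch under stated assumptions: the function names are hypothetical, the parser assumes the comma-separated `vmstat -z` layout in which USED is the third value after the zone name (the same field the awk `$4` picks), and a finite run of samples can suggest a leak but cannot prove unbounded growth.

```python
import subprocess


def socket_zone_used(vmstat_z_output):
    """Return the USED value of the "socket" zone from `vmstat -z` text."""
    for line in vmstat_z_output.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == "socket":
            # Assumed column order after the zone name: SIZE, LIMIT, USED, ...
            fields = [field.strip() for field in rest.split(",")]
            return int(fields[2])
    raise ValueError("no 'socket' zone found in vmstat -z output")


def grows_monotonically(samples):
    """True if the count never drops between samples and ends above its start."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


def sample_live():
    """Take one live sample (only meaningful on a system that has vmstat -z)."""
    result = subprocess.run(["vmstat", "-z"], capture_output=True,
                            text=True, check=True)
    return socket_zone_used(result.stdout)
```

Calling `sample_live()` at a fixed interval while the load test runs, and again after it has stopped and connections have drained, gives a list to pass to `grows_monotonically()`. A count that falls back after the load stops points away from a leak.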
Re: Possible obscure socket leak when system under load and listener is slow to accept
On Sun, 2012-12-09 at 00:10 -0800, Alfred Perlstein wrote:
> On 12/8/12 5:05 PM, Richard Sharpe wrote:
> > On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> >>> Hi folks,
> >>>
> >>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> >>> lot of cases where accept returns ECONNABORTED because the system load
> >>> is high and Samba has a large listen backlog.
> >>>
> >>> Every now and then we get a crash in smbd or in winbindd and winbindd
> >>> complains of too many open files in the system.
> >>>
> >>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> >>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> >>> is the only error returned from tcp_usr_accept.
> >>>
> >>> It seems like the socket taken off so_comp is never freed in this case
> >>> and that there has been a call on soref on it as well, so that something
> >>> like the following is needed in the error path:
> >>>
> >>> //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> >>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
> >>> @@ -433,6 +433,14 @@
> >>>                   */
> >>>                  if (name)
> >>>                          *namelen = 0;
> >>> +                /*
> >>> +                 * We need to close the socket we unlinked
> >>> +                 * so we do not leak it.
> >>> +                 */
> >>> +                ACCEPT_LOCK();
> >>> +                SOCK_LOCK(so);
> >>> +                soclose(so);
> >>>                  goto noconnection;
> >>>          }
> >>>          if (sa == NULL) {
> >>>
> >>> I think an soclose is needed at this point because soisconnected has
> >>> been called on the socket.
> >>>
> >>> Do you think this analysis is reasonable?
> >>>
> >>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> >>> maybe I am wrong since I am not sure if the fdclose call would free the
> >>> socket, but a quick look suggested that it doesn't.
> >> The fdclose should properly tear down the file descriptor. The call
> >> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> >> soclose() -> sorele() -> sofree() -> sodealloc().
> >>
> >> A socket leak would not count against "kern.maxfiles" unless the file
> >> descriptor leaks as well. So it is unlikely that this is the problem.
> > OK, thanks for the feedback. I will keep looking.
> >
> >> Samba may open a large number of files (real files and sockets) and
> >> you may run into the maxfiles limit. You can check the limit with
> >> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
> >> with "kern.maxfiles=10" for example.
> > Well, some of the smbds are dying, but it is possible that there is a
> > file leak in Samba or our VFS that we are tripping as well.
>
> lsof and sockstat can be helpful. lsof may be able to help determine if
> there's a leak because it MAY find sockets not associated with a
> process.
>
> Hope this helps.

Thanks Alfred. After following through the call graph and confirming
(with the code) that it was correct, I am now pretty convinced that I
was wrong in assuming that it was a socket leak.

However, lsof will be useful in allowing me to see how many FDs each
smbd in this test is using. We have, I am told, kern.maxfiles set to
65536, which I think might be a little low for the test they are
running.
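Counting descriptors per smbd from lsof output could look roughly like the sketch below. It is an illustration rather than a tested tool: it assumes lsof's default column order (COMMAND, PID, USER, FD, ...), assumes command names contain no spaces, and treats only FD entries that start with a digit as real descriptors (so cwd, txt and similar rows are skipped). The function name is hypothetical.

```python
from collections import Counter


def fds_per_process(lsof_output, command="smbd"):
    """Count numeric file descriptors per PID for one command name."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # The header row is skipped naturally: its first field is "COMMAND".
        if len(fields) < 4 or fields[0] != command:
            continue
        pid, fd = fields[1], fields[3]
        if fd[0].isdigit():  # e.g. "3u" or "10r", but not "cwd" or "txt"
            counts[int(pid)] += 1
    return counts
```

Summing the per-process counts and comparing the total with the value reported by `sysctl kern.maxfiles` shows how close the test gets to the system-wide limit.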
Re: Possible obscure socket leak when system under load and listener is slow to accept
On 12/8/12 5:05 PM, Richard Sharpe wrote:
> On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
>>> Hi folks,
>>>
>>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
>>> lot of cases where accept returns ECONNABORTED because the system load
>>> is high and Samba has a large listen backlog.
>>>
>>> Every now and then we get a crash in smbd or in winbindd and winbindd
>>> complains of too many open files in the system.
>>>
>>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
>>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
>>> is the only error returned from tcp_usr_accept.
>>>
>>> It seems like the socket taken off so_comp is never freed in this case
>>> and that there has been a call on soref on it as well, so that something
>>> like the following is needed in the error path:
>>>
>>> //some-path/freebsd/sys/kern/uipc_syscalls.c#1
>>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
>>> @@ -433,6 +433,14 @@
>>>                   */
>>>                  if (name)
>>>                          *namelen = 0;
>>> +                /*
>>> +                 * We need to close the socket we unlinked
>>> +                 * so we do not leak it.
>>> +                 */
>>> +                ACCEPT_LOCK();
>>> +                SOCK_LOCK(so);
>>> +                soclose(so);
>>>                  goto noconnection;
>>>          }
>>>          if (sa == NULL) {
>>>
>>> I think an soclose is needed at this point because soisconnected has
>>> been called on the socket.
>>>
>>> Do you think this analysis is reasonable?
>>>
>>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
>>> maybe I am wrong since I am not sure if the fdclose call would free the
>>> socket, but a quick look suggested that it doesn't.
>> The fdclose should properly tear down the file descriptor. The call
>> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
>> soclose() -> sorele() -> sofree() -> sodealloc().
>>
>> A socket leak would not count against "kern.maxfiles" unless the file
>> descriptor leaks as well. So it is unlikely that this is the problem.
> OK, thanks for the feedback. I will keep looking.
>
>> Samba may open a large number of files (real files and sockets) and
>> you may run into the maxfiles limit. You can check the limit with
>> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
>> with "kern.maxfiles=10" for example.
> Well, some of the smbds are dying, but it is possible that there is a
> file leak in Samba or our VFS that we are tripping as well.

lsof and sockstat can be helpful. lsof may be able to help determine if
there's a leak because it MAY find sockets not associated with a
process.

Hope this helps.

-Alfred
Re: Possible obscure socket leak when system under load and listener is slow to accept
On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> > Hi folks,
> >
> > Our QA group (at xxx) using Samba and smbtorture has been seeing a
> > lot of cases where accept returns ECONNABORTED because the system load
> > is high and Samba has a large listen backlog.
> >
> > Every now and then we get a crash in smbd or in winbindd and winbindd
> > complains of too many open files in the system.
> >
> > In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> > when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> > is the only error returned from tcp_usr_accept.
> >
> > It seems like the socket taken off so_comp is never freed in this case
> > and that there has been a call on soref on it as well, so that something
> > like the following is needed in the error path:
> >
> > //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> > - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
> > @@ -433,6 +433,14 @@
> >                   */
> >                  if (name)
> >                          *namelen = 0;
> > +                /*
> > +                 * We need to close the socket we unlinked
> > +                 * so we do not leak it.
> > +                 */
> > +                ACCEPT_LOCK();
> > +                SOCK_LOCK(so);
> > +                soclose(so);
> >                  goto noconnection;
> >          }
> >          if (sa == NULL) {
> >
> > I think an soclose is needed at this point because soisconnected has
> > been called on the socket.
> >
> > Do you think this analysis is reasonable?
> >
> > We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> > maybe I am wrong since I am not sure if the fdclose call would free the
> > socket, but a quick look suggested that it doesn't.
>
> The fdclose should properly tear down the file descriptor. The call
> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> soclose() -> sorele() -> sofree() -> sodealloc().
>
> A socket leak would not count against "kern.maxfiles" unless the file
> descriptor leaks as well. So it is unlikely that this is the problem.

OK, thanks for the feedback. I will keep looking.

> Samba may open a large number of files (real files and sockets) and
> you may run into the maxfiles limit. You can check the limit with
> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
> with "kern.maxfiles=10" for example.

Well, some of the smbds are dying, but it is possible that there is a
file leak in Samba or our VFS that we are tripping as well.
Re: Possible obscure socket leak when system under load and listener is slow to accept
> Hi folks,
>
> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> lot of cases where accept returns ECONNABORTED because the system load
> is high and Samba has a large listen backlog.
>
> Every now and then we get a crash in smbd or in winbindd and winbindd
> complains of too many open files in the system.
>
> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> is the only error returned from tcp_usr_accept.
>
> It seems like the socket taken off so_comp is never freed in this case
> and that there has been a call on soref on it as well, so that something
> like the following is needed in the error path:
>
> //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
> @@ -433,6 +433,14 @@
>                   */
>                  if (name)
>                          *namelen = 0;
> +                /*
> +                 * We need to close the socket we unlinked
> +                 * so we do not leak it.
> +                 */
> +                ACCEPT_LOCK();
> +                SOCK_LOCK(so);
> +                soclose(so);
>                  goto noconnection;
>          }
>          if (sa == NULL) {
>
> I think an soclose is needed at this point because soisconnected has
> been called on the socket.
>
> Do you think this analysis is reasonable?
>
> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> maybe I am wrong since I am not sure if the fdclose call would free the
> socket, but a quick look suggested that it doesn't.

The fdclose should properly tear down the file descriptor. The call
graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
soclose() -> sorele() -> sofree() -> sodealloc().

A socket leak would not count against "kern.maxfiles" unless the file
descriptor leaks as well. So it is unlikely that this is the problem.

Samba may open a large number of files (real files and sockets) and
you may run into the maxfiles limit. You can check the limit with
"sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
with "kern.maxfiles=10" for example.

-- 
Andre
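The teardown argument can be made concrete with a toy reference-counting model. This is plain Python, not kernel code: the classes only borrow the kernel function names from the call graph above, and the detail that the syscall holds its own extra reference to the file while it runs is an assumption of the sketch, not something stated in the thread.

```python
class Counters:
    """The two tallies the thread distinguishes between."""

    def __init__(self):
        self.open_files = 0  # what kern.maxfiles limits
        self.sockets = 0     # what the vmstat -z "socket" zone counts


class Socket:
    def __init__(self, counters):
        self.counters = counters
        self.freed = False
        counters.sockets += 1

    def soclose(self):
        # Stands in for soclose() -> sorele() -> sofree() -> sodealloc().
        if not self.freed:
            self.freed = True
            self.counters.sockets -= 1


class File:
    def __init__(self, counters, sock):
        self.counters = counters
        self.sock = sock
        self.refs = 1  # reference owned by the descriptor table
        counters.open_files += 1

    def fhold(self):
        self.refs += 1

    def fdrop(self):
        # Stands in for fdrop() -> _fdrop() -> fo_close()/soo_close().
        self.refs -= 1
        if self.refs == 0:
            self.sock.soclose()
            self.counters.open_files -= 1


def failed_accept(counters):
    """Model an accept that fails after the file and socket are paired."""
    sock = Socket(counters)
    fp = File(counters, sock)
    fp.fhold()  # assumed: the syscall keeps its own reference while running
    # ... soaccept() fails with ECONNABORTED at this point ...
    fp.fdrop()  # fdclose(): the descriptor table gives up its reference
    fp.fdrop()  # the syscall drops its own reference on the way out
    return sock
```

In this model the last `fdrop()` frees the socket, so both counters return to zero after a failed accept. A `Socket` created without any `File` raises only the socket tally, which is the sense in which a pure socket leak would not count against kern.maxfiles.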
Re: Possible obscure socket leak when system under load and listener is slow to accept
Hi,

If this is a real leak, please file a PR so it doesn't get lost.

*cough* let me rephrase that - so the eager PR beavers can keep chasing it up.

But, wow. Nice catch!

Adrian

On 8 December 2012 10:13, Richard Sharpe wrote:
> Hi folks,
>
> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> lot of cases where accept returns ECONNABORTED because the system load
> is high and Samba has a large listen backlog.
>
> Every now and then we get a crash in smbd or in winbindd and winbindd
> complains of too many open files in the system.
>
> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> is the only error returned from tcp_usr_accept.
>
> It seems like the socket taken off so_comp is never freed in this case
> and that there has been a call on soref on it as well, so that something
> like the following is needed in the error path:
>
> //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
> @@ -433,6 +433,14 @@
>                   */
>                  if (name)
>                          *namelen = 0;
> +                /*
> +                 * We need to close the socket we unlinked
> +                 * so we do not leak it.
> +                 */
> +                ACCEPT_LOCK();
> +                SOCK_LOCK(so);
> +                soclose(so);
>                  goto noconnection;
>          }
>          if (sa == NULL) {
>
> I think an soclose is needed at this point because soisconnected has
> been called on the socket.
>
> Do you think this analysis is reasonable?
>
> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> maybe I am wrong since I am not sure if the fdclose call would free the
> socket, but a quick look suggested that it doesn't.
>
> I would appreciate your feedback.
Possible obscure socket leak when system under load and listener is slow to accept
Hi folks,

Our QA group (at xxx) using Samba and smbtorture has been seeing a
lot of cases where accept returns ECONNABORTED because the system load
is high and Samba has a large listen backlog.

Every now and then we get a crash in smbd or in winbindd and winbindd
complains of too many open files in the system.

In looking at kern_accept, it seems to me that FreeBSD can leak a socket
when kern_accept calls soaccept on it but gets ECONNABORTED. This error
is the only error returned from tcp_usr_accept.

It seems like the socket taken off so_comp is never freed in this case
and that there has been a call on soref on it as well, so that something
like the following is needed in the error path:

//some-path/freebsd/sys/kern/uipc_syscalls.c#1
- /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
@@ -433,6 +433,14 @@
                  */
                 if (name)
                         *namelen = 0;
+                /*
+                 * We need to close the socket we unlinked
+                 * so we do not leak it.
+                 */
+                ACCEPT_LOCK();
+                SOCK_LOCK(so);
+                soclose(so);
                 goto noconnection;
         }
         if (sa == NULL) {

I think an soclose is needed at this point because soisconnected has
been called on the socket.

Do you think this analysis is reasonable?

We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
maybe I am wrong since I am not sure if the fdclose call would free the
socket, but a quick look suggested that it doesn't.

I would appreciate your feedback.
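From the application side, an ECONNABORTED from accept means that a queued connection went away before it was accepted, and the usual response is to try again. The sketch below shows one way a user-space listener might do that. It is not how Samba handles the error: the function name and the `max_aborts` cap are hypothetical, and the only requirement on `listener` is an `accept()` method like the one on a Python socket.

```python
import errno


def accept_skipping_aborted(listener, max_aborts=100):
    """Accept one connection, retrying when a queued peer has already gone."""
    aborts = 0
    while True:
        try:
            return listener.accept()
        except OSError as exc:
            if exc.errno != errno.ECONNABORTED:
                raise  # any other error is left to the caller
            aborts += 1
            if aborts > max_aborts:
                raise  # give up rather than spin on a broken listener
```

The cap is a design choice for the sketch: under sustained load many aborted connections in a row are plausible, so a real server might count them for monitoring instead of giving up.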