Re: [Patch] httpd - don't leak fcgi file descriptors
On Mon, Jun 08, 2015 at 09:17:41PM +0200, Claudio Jeker wrote: On Mon, Jun 08, 2015 at 09:12:32PM +0200, Joerg Jung wrote: On Tue, Jun 02, 2015 at 05:47:47PM +0200, Claudio Jeker wrote: On Tue, Jun 02, 2015 at 01:50:35PM +0200, Joerg Jung wrote: Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. This feels to me more like a workaround. Since what happens is that a connection is either reused without cleanup or we call the connection establishment multiple time for the same client. relayd had a similar issue that I fixed lately. One of the issues is that event callbacks can be called when you don't really expect them to be called. Yes, workaround was my first impression as well. But after staring at the code for a while, I think fixing it in the right way seems not trivial and involves several changes. So, since the diff below is simple and goes along with the style of the existing code, AND fixes an actual leak, I would suggest to commit it for now, until someone comes up with something better? Sure. Fine with me. Wondering if the -1 check is needed. IIRC close(-1); is save. Anyway you want to add a space after the if. OK for me as well. Please go ahead. I was hoping to look into it and find the correct fix but I didn't get around to it. So this workaround is better than nothing I guess. -- I'm not entirely sure you are real.
Re: [Patch] httpd - don't leak fcgi file descriptors
On Tue, Jun 09, 2015 at 07:46:15AM +, Florian Obser wrote: On Mon, Jun 08, 2015 at 09:17:41PM +0200, Claudio Jeker wrote: On Mon, Jun 08, 2015 at 09:12:32PM +0200, Joerg Jung wrote: On Tue, Jun 02, 2015 at 05:47:47PM +0200, Claudio Jeker wrote: On Tue, Jun 02, 2015 at 01:50:35PM +0200, Joerg Jung wrote: Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. This feels to me more like a workaround. Since what happens is that a connection is either reused without cleanup or we call the connection establishment multiple time for the same client. relayd had a similar issue that I fixed lately. One of the issues is that event callbacks can be called when you don't really expect them to be called. Yes, workaround was my first impression as well. But after staring at the code for a while, I think fixing it in the right way seems not trivial and involves several changes. So, since the diff below is simple and goes along with the style of the existing code, AND fixes an actual leak, I would suggest to commit it for now, until someone comes up with something better? Sure. Fine with me. Wondering if the -1 check is needed. IIRC close(-1); is save. Anyway you want to add a space after the if. OK for me as well. Please go ahead. I was hoping to look into it and find the correct fix but I didn't get around to it. So this workaround is better than nothing I guess. Committed, thanks. Regards, Joerg
Re: [Patch] httpd - don't leak fcgi file descriptors
On Mon, Jun 08, 2015 at 09:12:32PM +0200, Joerg Jung wrote: On Tue, Jun 02, 2015 at 05:47:47PM +0200, Claudio Jeker wrote: On Tue, Jun 02, 2015 at 01:50:35PM +0200, Joerg Jung wrote: Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. This feels to me more like a workaround. Since what happens is that a connection is either reused without cleanup or we call the connection establishment multiple time for the same client. relayd had a similar issue that I fixed lately. One of the issues is that event callbacks can be called when you don't really expect them to be called. Yes, workaround was my first impression as well. But after staring at the code for a while, I think fixing it in the right way seems not trivial and involves several changes. So, since the diff below is simple and goes along with the style of the existing code, AND fixes an actual leak, I would suggest to commit it for now, until someone comes up with something better? Sure. Fine with me. Wondering if the -1 check is needed. IIRC close(-1); is save. Anyway you want to add a space after the if. Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.53 diff -u -p -u -p -r1.53 server_fcgi.c --- server_fcgi.c26 Mar 2015 09:01:51 -1.53 +++ server_fcgi.c31 May 2015 22:33:54 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + +if(clt-clt_fd != -1) +close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL) -- :wq Claudio -- :wq Claudio
Re: [Patch] httpd - don't leak fcgi file descriptors
On Tue, Jun 02, 2015 at 05:47:47PM +0200, Claudio Jeker wrote: On Tue, Jun 02, 2015 at 01:50:35PM +0200, Joerg Jung wrote: Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. This feels to me more like a workaround. Since what happens is that a connection is either reused without cleanup or we call the connection establishment multiple time for the same client. relayd had a similar issue that I fixed lately. One of the issues is that event callbacks can be called when you don't really expect them to be called. Yes, workaround was my first impression as well. But after staring at the code for a while, I think fixing it in the right way seems not trivial and involves several changes. So, since the diff below is simple and goes along with the style of the existing code, AND fixes an actual leak, I would suggest to commit it for now, until someone comes up with something better? Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.53 diff -u -p -u -p -r1.53 server_fcgi.c --- server_fcgi.c26 Mar 2015 09:01:51 -1.53 +++ server_fcgi.c31 May 2015 22:33:54 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + +if(clt-clt_fd != -1) +close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL) -- :wq Claudio
Re: [Patch] httpd - don't leak fcgi file descriptors
Sure. Fine with me. Wondering if the -1 check is needed. IIRC close(-1); is save. Anyway you want to add a space after the if. One side effect: it changes errno. But I don't see an impact immediately.
Re: [Patch] httpd - don't leak fcgi file descriptors
Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. Thanks, Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.53 diff -u -p -u -p -r1.53 server_fcgi.c --- server_fcgi.c26 Mar 2015 09:01:51 -1.53 +++ server_fcgi.c31 May 2015 22:33:54 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + +if(clt-clt_fd != -1) +close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL)
Re: [Patch] httpd - don't leak fcgi file descriptors
On Tue, Jun 02, 2015 at 01:50:35PM +0200, Joerg Jung wrote: Am 01.06.2015 um 01:25 schrieb Todd Mortimer t...@opennet.ca: I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? Yes, you are right. I think your proposed diff is correct. I would like to commit it, if anyone is willing to give OK. This feels to me more like a workaround. Since what happens is that a connection is either reused without cleanup or we call the connection establishment multiple time for the same client. relayd had a similar issue that I fixed lately. One of the issues is that event callbacks can be called when you don't really expect them to be called. Thanks, Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.53 diff -u -p -u -p -r1.53 server_fcgi.c --- server_fcgi.c26 Mar 2015 09:01:51 -1.53 +++ server_fcgi.c31 May 2015 22:33:54 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + +if(clt-clt_fd != -1) +close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL) -- :wq Claudio
Re: [Patch] httpd - don't leak fcgi file descriptors
On Sun, 31 May 2015 19:25:22 -0400 Todd Mortimer t...@opennet.ca wrote: Hi tech@, Hi Joerg, Thanks for getting back to me. I cloned the server and upgraded it to the 31 May snapshot, did the sysmerge and upgraded the packages to the snapshot versions. The behaviour is still there. It actually appears to be somewhat more pronounced, and php-fpm hits max_children more quickly than it does under 5.7-stable. The same patch prevents the php-fpm processes from going idle on netio, and I have reproduced it against -current below. It also seems that httpd on -current has more parallel connections open to php-fpm at once compared to the same setup on 5.7-stable. I have been following this thread since the initial report. I'm also running owncloud (+ roundcube for mail) on an OpenBSD -current amd64 server (snapshot from May 20). I have been hitting the exact same issue but initially I accounted it for just not tweaking the settings. My server is really low on traffic (only two users rarely concurrent). I can confirm that after getting to max_children limit the server starts returning error 500 on each request (both on roundcube owncloud) until php-fpm or httpd is restarted. I did increase pm.max_children setting from 5 - 10 I still hit the max_children limit on roughly the same interval. I will try to find the time to upgrade the server to a newer snapshot and test the patch provided by Todd. # grep max_children /var/log/php-fpm.log [29-Mar-2015 17:25:31] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [29-Mar-2015 18:57:16] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [29-Mar-2015 19:22:12] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [30-Mar-2015 03:22:25] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [31-Mar-2015 21:47:08] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [31-Mar-2015 22:47:09] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [02-Apr-2015 11:39:22] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [02-Apr-2015 13:39:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [03-Apr-2015 15:44:39] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [05-Apr-2015 17:07:15] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [05-Apr-2015 17:12:54] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [08-Apr-2015 14:00:57] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [21-May-2015 13:32:37] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [25-May-2015 17:15:54] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [25-May-2015 17:42:39] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it [30-May-2015 15:27:55] WARNING: [pool www] server reached pm.max_children setting (10), consider raising it # After increasing the limit I also hit: [30-May-2015 15:27:51] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 6 total children [30-May-2015 15:27:52] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 7 total children [30-May-2015 15:27:53] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 8 total children [30-May-2015 15:27:54] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 9 total children I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? I would be happy to look into a proper solution if that would be better. Thanks! Todd On May 31, 2015, at 2:23, Joerg Jung m...@umaxx.net wrote: Hi, Am 20.05.2015 um 02:06 schrieb Todd Mortimer t...@opennet.ca: The attached patch fixes a problem I’ve been having with httpd + php_fpm + owncloud on 5.7. The patch is against 5.7-release. Can you try with recent snapshot, and see if issue still occurs, please? Development happens in -current. After several days running
Re: [Patch] httpd - don't leak fcgi file descriptors
Hi Joerg, Thanks for getting back to me. I cloned the server and upgraded it to the 31 May snapshot, did the sysmerge and upgraded the packages to the snapshot versions. The behaviour is still there. It actually appears to be somewhat more pronounced, and php-fpm hits max_children more quickly than it does under 5.7-stable. The same patch prevents the php-fpm processes from going idle on netio, and I have reproduced it against -current below. It also seems that httpd on -current has more parallel connections open to php-fpm at once compared to the same setup on 5.7-stable. I agree that my patch is more of a workaround, and it would be better to track down how it is that the client is being passed to server_fcgi with an open socket. I was going this way when I started looking at the source, but then I saw that clt-clt_srvevb and clt-clt_srvbev get the same treatment (free if not null, then reassign) at the same spot in server_fcgi(), and I figured if it was good enough for clt_srvevb and clt_srvbev, why not for clt_fd? I would be happy to look into a proper solution if that would be better. Thanks! Todd On May 31, 2015, at 2:23, Joerg Jung m...@umaxx.net wrote: Hi, Am 20.05.2015 um 02:06 schrieb Todd Mortimer t...@opennet.ca: The attached patch fixes a problem I’ve been having with httpd + php_fpm + owncloud on 5.7. The patch is against 5.7-release. Can you try with recent snapshot, and see if issue still occurs, please? Development happens in -current. After several days running owncloud with httpd, php_fpm started complaining about hitting pm.max_children, and top would show a bunch of idle php_fpm processes waiting on netio. Eventually httpd would start returning error 500 and owncloud would stop working. Restarting php_fpm or httpd would temporarily fix the issue. The same server with nginx did not have the same problem. I’ve had this patch running for 5 days now, and php_fpm isnt leaving idle processes lying around anymore. I did run with some debugging output to verify that clt-clt_fd is sometimes not -1 when it is overwritten with the new socket fd. IMHO your proposed fix is just a workaround. Instead of 'blindly' close()'ing, better approach is to figure out where the fd was leaked earlier. Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.53 diff -u -p -u -p -r1.53 server_fcgi.c --- server_fcgi.c 26 Mar 2015 09:01:51 - 1.53 +++ server_fcgi.c 31 May 2015 22:33:54 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + + if(clt-clt_fd != -1) + close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL)
Re: [Patch] httpd - don't leak fcgi file descriptors
Hi, Am 20.05.2015 um 02:06 schrieb Todd Mortimer t...@opennet.ca: The attached patch fixes a problem I’ve been having with httpd + php_fpm + owncloud on 5.7. The patch is against 5.7-release. Can you try with recent snapshot, and see if issue still occurs, please? Development happens in -current. After several days running owncloud with httpd, php_fpm started complaining about hitting pm.max_children, and top would show a bunch of idle php_fpm processes waiting on netio. Eventually httpd would start returning error 500 and owncloud would stop working. Restarting php_fpm or httpd would temporarily fix the issue. The same server with nginx did not have the same problem. I’ve had this patch running for 5 days now, and php_fpm isnt leaving idle processes lying around anymore. I did run with some debugging output to verify that clt-clt_fd is sometimes not -1 when it is overwritten with the new socket fd. IMHO your proposed fix is just a workaround. Instead of 'blindly' close()'ing, better approach is to figure out where the fd was leaked earlier. Regards, Joerg Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.52 diff -u -p -u -p -r1.52 server_fcgi.c --- server_fcgi.c 23 Feb 2015 19:22:43 - 1.52 +++ server_fcgi.c 15 May 2015 22:12:30 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + + if (clt-clt_fd != -1) + close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL)
[Patch] httpd - don't leak fcgi file descriptors
Hi tech, The attached patch fixes a problem I’ve been having with httpd + php_fpm + owncloud on 5.7. The patch is against 5.7-release. After several days running owncloud with httpd, php_fpm started complaining about hitting pm.max_children, and top would show a bunch of idle php_fpm processes waiting on netio. Eventually httpd would start returning error 500 and owncloud would stop working. Restarting php_fpm or httpd would temporarily fix the issue. The same server with nginx did not have the same problem. I’ve had this patch running for 5 days now, and php_fpm isnt leaving idle processes lying around anymore. I did run with some debugging output to verify that clt-clt_fd is sometimes not -1 when it is overwritten with the new socket fd. I’m happy to test or revise if needed. Thank you! Todd Index: server_fcgi.c === RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v retrieving revision 1.52 diff -u -p -u -p -r1.52 server_fcgi.c --- server_fcgi.c 23 Feb 2015 19:22:43 - 1.52 +++ server_fcgi.c 15 May 2015 22:12:30 - @@ -31,6 +31,7 @@ #include stdio.h #include time.h #include ctype.h +#include unistd.h #include event.h #include httpd.h @@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl errstr = failed to allocate evbuffer; goto fail; } + + if (clt-clt_fd != -1) + close(clt-clt_fd); clt-clt_fd = fd; if (clt-clt_srvbev != NULL)