Re: [PATCH 2/2] daemon: plug memory leak

2015-10-26 Thread Jeff King
On Sat, Oct 24, 2015 at 02:23:20PM +0200, René Scharfe wrote:

> Call child_process_clear() when a child ends to release the memory
> allocated for its environment.  This is necessary because unlike all
> other users of start_command() we don't call finish_command(), which
> would have taken care of that for us.
> 
> This leak was introduced by f063d38b (daemon: use cld->env_array
> when re-spawning).

I agree this is a leak we need to address, but I find the solution a
little unsatisfactory (but tl;dr, it's probably the best we can do).

It would be nice if we could call finish_command() here to do the
wait(). But it does not know about WNOHANG, so we would have to
introduce a new helper anyway.

However, the whole check_dead_children function is accidentally
quadratic, and should probably be refactored in general. It loops over
the list of children, calling `waitpid(pid, WNOHANG)` on each. So if `n`
children connect at one time, we make `O(n^2)` waitpid calls.

I have patches to instead loop over waitpid(-1, WNOHANG) until there are
no dead children left (and use a hash to go from pid to each "struct
child").

But the reason I haven't posted them is that I'm somewhat of the opinion
that this child-list can go away altogether. The main use is in
enforcing the max_connections variable. Just enforcing that can be done
with a counter, but we do something much weirder: when we get a new
connection, we try to kill an _existing_ child by finding the first
client with multiple connections, killing that, sleeping for a second
(!), and then if that killed something, giving the slot to the new
connection.

This has a few problems:

  1. Under non-malicious heavy load, you probably want to reject the new
 request rather than killing an existing one. If you kill a fetch
 that is 99% completed, that client is likely to come back and fetch
 again, and you've just thrown away all of the effort (both CPU and
 bandwidth) put into the failed clone. You haven't spent anything on
 the incoming connection, so you'd much rather they go away and come
 back in a moment.

  2. I think the kill_some_child algorithm was meant to enforce
 fairness so that a single IP cannot monopolize all of your slots.
 I'm not sure that's realistic for a few reasons:

   a. Real bad guys will just hit you with a DDoS from many IPs.

   b. Real sites have load-balancing in front of git-daemon, and the
  client IPs may not be meaningful.

   c. The duplicate selection is pretty naive. A bad guy with tons
  of duplicate connections is no more likely to be killed than a
  normal client with exactly two connections (it actually
  depends solely on when the latest connection from either came
  in). So I suspect a determine attacker with a single IP could
  still present a problem.

  3. Calling sleep() in the server parent process is a great way to kill
 performance. :)

Of course my view is somewhat skewed by running a really big site with
load balancers and such (and we have long set --max-connections high
enough that it has never been hit). Maybe it's more realistic protection
for a "normal" site.

But my inclination would be to drop kill_some_child entirely in favor of
simply rejecting new connections that are over the limit, and getting
rid of the child-list entirely (and calling waitpid only to reap the
zombie PIDs).

I guess we'd still need something like child_process_clear(), though. We
could just run it immediately after spawning the child, rather than when
we cleaned it up.  So in that sense this series is the first incremental
step toward that future, anyway.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] daemon: plug memory leak

2015-10-24 Thread René Scharfe
Call child_process_clear() when a child ends to release the memory
allocated for its environment.  This is necessary because unlike all
other users of start_command() we don't call finish_command(), which
would have taken care of that for us.

This leak was introduced by f063d38b (daemon: use cld->env_array
when re-spawning).

Signed-off-by: Rene Scharfe 
---
 daemon.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/daemon.c b/daemon.c
index 56679a1..be70cd4 100644
--- a/daemon.c
+++ b/daemon.c
@@ -802,6 +802,7 @@ static void check_dead_children(void)
/* remove the child */
*cradle = blanket->next;
live_children--;
+   child_process_clear(&blanket->cld);
free(blanket);
} else
cradle = &blanket->next;
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html