On 16 November 2017 at 16:38, Peter Geoghegan <p...@bowt.ie> wrote:
> * To understand how this relates to admission control. The only
> obvious difference that I can think of is that admission control
> probably involves queuing when very memory constrained, and isn't
> limited to caring about memory. I'm not trying to make swapping/OOM
> impossible here; I'm trying to make it easier to be a Postgres DBA
> sizing work_mem, and make it so that DBAs don't have to be stingy with
> work_mem. The work_mem sizing formulas we sometimes promote (based on
> max_connections) are probably very limiting in the real world.
I had always imagined that this should be some sort of work_mem_pool. Each
plan would carry an estimate of how much work_mem it expects to consume,
which I'd thought would be N * work_mem, where N is the number of nodes in
the plan that require a work_mem. At the start of execution we'd atomically
add that estimate to a variable in shared memory that tracks the
work_mem_pool usage. If the resulting total is <= work_mem_pool we start
execution; if not, we add ourselves to some waiters queue and go to sleep,
to be signalled when another plan's execution completes and releases memory
back into the pool, at which point we re-check and go back to sleep if
there's still not enough space. Simple plans with no work_mem requirement
could probably skip all these checks, which may well keep concurrency up.
(There's a rough standalone sketch of this check at the end of this mail.)

I'm just not all that clear on how to handle the case where the plan's
memory estimate exceeds work_mem_pool entirely; such a plan would never get
to run. Perhaps everything that requires any memory must wait in that case
so this query can run alone, i.e. special-case it to require the
work_mem_pool usage to be 0 before we run, or maybe it should just be an
ERROR? Probably the whole feature could be disabled by setting
work_mem_pool to -1, which might be a better option for users who find
there's some kind of contention around the memory pool checks.

> I freely admit that my proposal is pretty hand-wavy at this point,
> but, as I said, I want to at least get the ball rolling.

Me too. I might have overlooked some giant roadblock.

I think it's important that the work_mem_pool tracker is charged at the
start of the query rather than when each work_mem-consuming node first
runs, as there'd likely be some deadlock-like waiting issue if plans
part-way through execution started waiting for other plans to complete.
That's not ideal, as it assumes a plan always consumes all of its work_mem
allocations at once, but it seems better than what we have today. Maybe we
can improve on it later.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
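
(Sketch mentioned above.) To make the waiting behaviour a bit more
concrete, here's a rough standalone C sketch of the acquire/release logic I
have in mind. It deliberately uses a plain pthread mutex and condition
variable in place of whatever shared-memory counter and wait-queue
mechanism the backend would really use, and all of the names here
(work_mem_pool, pool_acquire, pool_release, ...) are made up for
illustration. I've also written it as check-then-reserve under a lock
rather than increment-then-check, just to keep the accounting simple:

/*
 * Standalone illustration of the proposed work_mem_pool admission check.
 * A real implementation would live in shared memory and use the backend's
 * own locking and wait primitives; pthreads are used here only to keep
 * the example self-contained.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_released = PTHREAD_COND_INITIALIZER;

static long work_mem_pool = 4096;   /* total pool, say in MB; -1 disables */
static long pool_used = 0;          /* amount currently reserved */

/*
 * Reserve "estimate" (roughly N * work_mem for the N work_mem-consuming
 * nodes in the plan) before execution starts.  Sleeps until enough of the
 * pool is free.  A plan whose estimate exceeds the whole pool is special
 * cased: it waits until usage drops to zero and then runs alone.
 */
static void
pool_acquire(long estimate)
{
	if (work_mem_pool < 0 || estimate <= 0)
		return;					/* feature disabled, or trivial plan */

	pthread_mutex_lock(&pool_lock);

	if (estimate > work_mem_pool)
	{
		/* oversized plan: wait for the pool to drain, then run alone */
		while (pool_used > 0)
			pthread_cond_wait(&pool_released, &pool_lock);
	}
	else
	{
		/* normal case: wait until our estimate fits in the pool */
		while (pool_used + estimate > work_mem_pool)
			pthread_cond_wait(&pool_released, &pool_lock);
	}

	/* oversized reservations leave the pool over-committed, blocking others */
	pool_used += estimate;

	pthread_mutex_unlock(&pool_lock);
}

/* Give the reservation back when the query finishes and wake any waiters. */
static void
pool_release(long estimate)
{
	if (work_mem_pool < 0 || estimate <= 0)
		return;

	pthread_mutex_lock(&pool_lock);
	pool_used -= estimate;
	pthread_cond_broadcast(&pool_released);
	pthread_mutex_unlock(&pool_lock);
}

int
main(void)
{
	long		estimate = 3 * 64;	/* e.g. 3 work_mem nodes at 64MB each */

	pool_acquire(estimate);
	printf("running query with %ldMB reserved (%ld/%ld in use)\n",
		   estimate, pool_used, work_mem_pool);
	pool_release(estimate);
	return 0;
}

The broadcast-and-recheck loop is the "re-check and just go back to sleep"
part; a real version would presumably use a proper FIFO waiters queue so
large reservations can't be starved indefinitely by a stream of small ones.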