You may see decreased performance in your batch jobs with a lower work_mem 
setting.
Additionally, the fact that you haven't had swap storm issues so far means that 
although there is a certain risk of a problem, it's probably a lot lower than 
what has been suggested here so far.
Without a change in client behavior (new queries or a large change in data), a 
change in load alone is very unlikely to cause a problem.  So take your time and 
do it right.  I disagree with the knee-jerk "change it now!" response.  The 
very fact that you have gotten this far means it is not as risky as the bare 
settings indicate.
Definitely plan on testing work_mem variants soon, but a hasty change to a 
small value might cause your batch jobs to take much longer.  What is your risk 
if they take too long or don't complete in time?  That risk is yours to 
assess - if it's not much of a risk, then by all means lower work_mem soon.  
But if it is business critical for those batch jobs to complete within some 
time frame, be careful.

If you know what you are doing, and are careful, your work_mem is probably, 
but not necessarily, too high.
It all depends on how much you know about your clients.
For example, 2010 connections with 500MB work_mem is not always a problem.  If 
you know that 2000 of those are from an application that connects as a user 
that can only see small tables, and you know what queries those are, it may be 
perfectly safe.  For example, I've dealt with an application that had a couple 
thousand connections, 95% of which were idle at any time (a connection pool, 
much like those likely behind your JBoss apps).  The queries these ran were a 
limited set of about 20 statements that all accessed small-ish tables (<30MB) 
by unique key, with few joins.  There were tons of connections, but they:
1: hardly did anything; most were idle.  On 75% of the connections, a query set 
was run exactly once every 15 minutes, selecting * from small (sub-1MB) tables.
2: the more active connections (20%) did small select queries on single rows 
accessed by primary key.

So, the calculation max_connections * work_mem is utterly inappropriate for 
that sort of workload.  Yes, in theory, those connections could use work_mem * 
some factor of memory - if they changed their queries and accessed other 
tables.  In practice - nowhere close.

The remaining few connections (~5) were batch jobs that needed ~800MB of 
work_mem or else performance would stink.  And they didn't need 800MB of 
work_mem for real (the hashes used ~250MB); they needed a SETTING of 800MB 
because the planner is incapable of estimating row counts properly with 
partitioned table access.
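For a batch connection like that, the large value can be scoped to the session 
instead of being the server-wide default; a minimal sketch (the 800MB figure is 
from the case above):

```sql
-- Applies only to this session, and only until RESET or disconnect.
SET work_mem = '800MB';
-- ... run the batch queries here ...
RESET work_mem;
```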
Neither application was able to configure its own work_mem for quite some time 
(getting client applications to change is not always a quick process).
But the risk of having a large setting, even with 2000 connections, was low.  
The risk of setting it too low was very high (batch jobs taking 3 hours 
instead of 10 minutes).  Only five or so connections even accessed 
schemas/tables with lots of data.  The remaining couple thousand were 
constrained in many ways other than work_mem.

In the end I did have swap storms... however, they were not caused by 
work_mem.  They were caused by the query planner, which can use GBs of memory 
per connection when planning a query on partitioned tables.

So, my point is: if you don't know a lot about the database or its clients, be 
very wary of large work_mem settings.  If you do, and have a lot of control 
over or knowledge of your clients, the work_mem * max_connections calculation 
is inappropriate.

The detailed formula is along the lines of:
SUM_i [work_mem_i * active_connections_i]    (for each 'type' of connection i).
If you don't know enough about your connections, then the conservative estimate 
is work_mem * max_connections.
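Plugging in the numbers from the example above (illustrative arithmetic only: 
~5 active pooled connections at a few MB each, plus 5 batch connections at the 
800MB setting):

```sql
-- Per-type estimate vs. the naive worst case:
SELECT (5 * 4) + (5 * 800) AS per_type_estimate_mb,  -- ~4 GB
       2005 * 800          AS naive_worst_case_mb;   -- ~1.6 TB
```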

A single query can potentially use multiples of work_mem, depending on how 
many concurrent hashes/sorts are in the query, so the above is not quite right 
either.

Is there a way to give a particular database user a user-local work_mem 
setting without having the client change their code? You could then give each 
application its own user, with its own default setting.  The batch jobs with 
few connections could get much larger work_mem than the JBoss ones.  This 
would be especially powerful for applications that can't change, or that use 
higher-level tools for db access that make it impossible or very difficult to 
send non-standard commands like "SET".
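As far as I know, ALTER ROLE ... SET does exactly this: it stores a per-role 
default that is applied at login, with no client-side changes needed.  A 
sketch (role names are made up):

```sql
-- Per-role defaults, picked up automatically on connect:
ALTER ROLE batch_user SET work_mem = '800MB';
ALTER ROLE jboss_user SET work_mem = '4MB';
```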

On 3/4/09 4:46 AM, "Flavio Henrique Araque Gurgel" <fla...@4linux.com.br> wrote:

----- "Scott Marlowe" <scott.marl...@gmail.com> escreveu:
> Oh my lord, that is a foot gun waiting to go off.  Assuming 2k
> connections, and somehow a fair number of them went active with big
> sorts, you'd be able to exhaust all physical memory  with about 8 to
> 16 connections.  Lower work_mem now. To something like 1 to 4 Meg.  Do
> not pass go.  If some oddball query really needs a lot of work_mem,
> and benchmarks show a larger work_mem helps, consider raising
> the work_mem setting for that one query to something under 1G (way
> under 1G) that makes it noticeably faster.  Don't allocate more than a
> test shows you helps.

Thanks a lot Scott. That's what I thought in the beginning, but I was very 
doubtful since the documentation is a bit odd regarding this point and several 
bloggers talk about increasing this value up to 250MB. I really think that 
separating regular non-pooled distributed applications from pooled application 
servers makes a lot of difference on this point.

> Vacuum does not block transactions.  Unless you're dropping tables or 
> something.

I'll try to separate things and check if the DELETE queries have something 
related here.

(...)
> what you think they mean.  Post the first 20 or so lines from top to
> show us.

Unfortunately I can't do it. The data there is very sensitive (it's a public 
company here in Brazil) and the server is operated only by selected personnel. I 
just ask for information and give written recommendations. Anyway, I'm going to 
pay more attention to this topic.

This is a very interesting implementation of PostgreSQL (3 large databases, 
heavy load, things growing all the time) and I'll let you all know what 
happened when tuning it. I'll feedback you after lowering work_mem and changing 
related settings.

Thanks
Flavio
