Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-09-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 Also, I am a bit doubtful about the advice on sizing the
 connection pool as applied to small servers:
 surely it's not sane to recommend that a single-processor system
 with one disk should have max_connections = 3.  At least, *I*
 don't think that's sane.

 I'm not sure it's wrong when combined with this: "Remember that
 this sweet spot is for the number of connections that are
 actively doing work.  ...  You should always make max_connections
 a bit bigger than the number of connections you enable in your
 connection pool. That way there are always a few slots available
 for direct connections for system maintenance and monitoring."
 Where would you expect the knee to be for connections
 concurrently actively doing work on a single-core, single-drive
 system?
 
 I don't know.  But my experience with our customers is that people
 are often forced to set the size of the connection pool far larger
 than what that formula would suggest.  Many people are doing
 transaction-level pooling, and for those people, they've got to
 look at how many multi-statement transactions they've got and
 think about what the peak value for that quantity is.  It's still
 worth using pooling because it reduces the number of simultaneous
 connections, but it's not going to reduce it to the kind of values
 you're talking about.  Also, consider that transactions aren't all
 the same length.  Suppose 90% of your queries execute in 50 ms, and
 10% execute in 60 s.  Even though it's less efficient, you've got
 to set the connection pool large enough that at least some of the
 50 ms queries can continue to get processed even if the maximum
 number of 60 s queries that you ever expect to see in parallel are
 already running.  This may seem like a theoretical problem but we
 have customers who use connection pools to get the number of
 simultaneous connections down to, say, 800.  I guarantee you that
 these people do not have 200 CPUs and 400 disks, but they're smart
 people and they find that smaller pool sizes don't work.
 
It is something which has to be considered, and I don't think it's
theoretical at all.  Here's how we deal with it.  We don't use a
plain FIFO queue for our transaction requests, but a prioritized
FIFO with 10 levels of priority (0 to 9).  The highest priority (9)
is reserved for utility requests -- where a running transaction
needs to spin off a related transaction to do some work for it.  For
the lowest level (0) we normally allocate only a single connection,
and it is used for very long-running reports which we want to queue
to run one-at-a-time.  As examples of how we categorize queries,
filling a large list in an interactive application will run at
priority 3, while translating a key to a description that must
display on the screen runs at priority 8.  Normal single-row
updates and deletes from an interactive application run at priority
5.
 
Each connection in the pool has a worker thread, and is assigned a
minimum priority that it will handle.  When all threads are busy and
transaction requests are queued, any thread completing a database
transaction pulls from the front of the highest priority queue with
a waiting request to run a transaction, looking only at priorities
which are not beneath it.  If there are no waiting requests of
high enough priority, the thread waits for one to arrive.
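 
To make that concrete, here is a minimal sketch of the queue
discipline in Java -- the names and structure are invented for
illustration, not lifted from our actual pool code:
 
    import java.util.ArrayDeque;
    import java.util.Deque;

    class PrioritizedRequestQueue {
        private final Deque<Runnable>[] queues;  // index = priority, 0..9

        @SuppressWarnings("unchecked")
        PrioritizedRequestQueue(int levels) {    // e.g. new ...Queue(10)
            queues = new Deque[levels];
            for (int i = 0; i < levels; i++) queues[i] = new ArrayDeque<>();
        }

        synchronized void submit(int priority, Runnable txn) {
            queues[priority].addLast(txn);       // FIFO within each level
            notifyAll();                         // wake waiting workers
        }

        // Each worker thread calls this with its configured minimum
        // priority; it never looks at priorities beneath it.
        synchronized Runnable take(int minPriority)
                throws InterruptedException {
            while (true) {
                for (int p = queues.length - 1; p >= minPriority; p--) {
                    if (!queues[p].isEmpty()) return queues[p].pollFirst();
                }
                wait();   // no eligible request; block until one arrives
            }
        }
    }
 
A worker whose minimum is 0 can service anything, including the
single priority-0 report connection; a worker whose minimum is 9 is
effectively reserved for the utility requests.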
 
We have found that the formula I presented, when combined with
transactional request queuing like I describe here, gives us our best
performance.  I don't have the exact numbers in front of me at the
moment, but on a machine with 16 cores and a 40-drive array (but
heavily cached, so that the effective spindle count was lower than
that), servicing thousands of concurrent web users with hundreds of
tps, we improved performance significantly by dropping our
connection pool size from about 60 to about 30, in addition to the
separate pool of six which are handling logical replication from
about 80 sources.  That was a real-life production situation, but we
ran a series of benchmarks and found that in a pretty wide spectrum
of situations the formula I gave fits pretty neatly.
 
If someone is using 800 connections for, say, a 32 core machine with
a 200 drive array I would suspect that they would get a lot of
benefit from a smarter connection pool.
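 
Plugging that example into the formula from the Wiki page,
connections = ((core_count * 2) + effective_spindle_count), gives a
starting point far below 800:
 
    // The formula applied to the 32-core, 200-drive example above,
    // assuming every drive counts as an effective spindle.
    int coreCount = 32;
    int effectiveSpindleCount = 200;
    int poolSize = (coreCount * 2) + effectiveSpindleCount;   // = 264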
 
 Sure, we can say, "well, the fine print tells you that 2*CPUs+disks
 is not REALLY the formula you should use," but it's just so far off
 what I see in the field that I have a hard time thinking it's
 really helping people to give them that as a starting point.
 
The point is that it *is* generally really close to the numbers we
have seen here in both benchmarks and production, and I have gotten
comments both on and off the lists from people who have told me that
they tried that formula against their benchmark results and found
that it fit well.  Now, this may be dependent on OS or 

Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-09-04 Thread Robert Haas
On Tue, Sep 4, 2012 at 11:15 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 It is something which has to be considered, and I don't think it's
 theoretical at all.  Here's how we deal with it.  We don't use a
 plain FIFO queue for our transaction requests, but a prioritized
 FIFO with 10 levels of priority (0 to 9).  The highest priority (9)
 is reserved for utility requests -- where a running transaction
 needs to spin off a related transaction to do some work for it.  For
 the lowest level (0) we normally allocate only a single connection,
 and it is used for very long-running reports which we want to queue
 to run one-at-a-time.  As examples of how we categorize queries,
 filling a large list in an interactive application will run at
 priority 3, while translating a key to a description that must
 display on the screen runs at priority 8.  Normal single-row
 updates and deletes from an interactive application run at priority
 5.

 Each connection in the pool has a worker thread, and is assigned a
 minimum priority that it will handle.  When all threads are busy and
 transaction requests are queued, any thread completing a database
 transaction pulls from the front of the highest priority queue with
 a waiting request to run a transaction, looking only at priorities
 which are not beneath it.  If there are no waiting requests of
 high enough priority, the thread waits for one to arrive.

I well believe that with this sort of sophisticated system you can
make the connection pool much smaller and get a benefit out of it.
However, I think it's quite rare for people to have a system this
sophisticated.  I suspect that's why I typically see much larger pool
sizes.

Here's my other thought about this: we talk a lot about how a system
with 32 cores and 40 drives can't do more than 72 things at once, and
that's absolutely true.  But I think much of the reason why PostgreSQL
users get a benefit out of connection pooling is unrelated to that
effect.  What I think we're really working around, in many cases, is
internal lock contention.  That's why people are talking about
adjusting formulas for 9.2.  It's not that a system with 72 resources
can suddenly do more than 72 things; it's that in the old world lock
contention could easily make it a loser to have even half that many
tasks running at once, and now that's less true.  Hopefully we'll make
further improvements in the future and it'll become even less true
still.  So is the real issue the hardware limits of the server, or is
it the limits of our software?  The former is certainly in the mix,
but I personally believe the latter has a lot more to do with pool
size selection than we typically credit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-30 Thread Robert Haas
On Thu, Aug 9, 2012 at 1:11 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I didn't figure it was; my emphasis was because this has been raised
 before and nothing happened for want of a consensus on what
 particular wording should be used, so users were left with no
 guidance.  I don't want this to continue to be a victim of the
 "perfect is the enemy of the good" syndrome.

So, to get the ball rolling here, I spent some time on this today, and
added a paragraph to the Linux Memory Overcommit section of the
documentation.  I back-patched it as far back as 9.0.  There were
additional merge conflicts in 8.4 which I did not bother to resolve.
There
may be room to further improve what I did here; suggestions are
welcome.  I think we probably still need to add something to the
max_connections documentation; I have not done that.

 Also, I am a bit doubtful about the advice on sizing the
 connection pool as applied to small servers:
 surely it's not sane to recommend that a single-processor system
 with one disk should have max_connections = 3.  At least, *I*
 don't think that's sane.

 I'm not sure it's wrong when combined with this: "Remember that this
 sweet spot is for the number of connections that are actively
 doing work.  ...  You should always make max_connections a bit
 bigger than the number of connections you enable in your connection
 pool. That way there are always a few slots available for direct
 connections for system maintenance and monitoring."  Where would you
 expect the knee to be for connections concurrently actively doing
 work on a single-core, single-drive system?

I don't know.  But my experience with our customers is that people are
often forced to set the size of the connection pool far larger than
what that formula would suggest.  Many people are doing
transaction-level pooling, and for those people, they've got to look
at how many multi-statement transactions they've got and think about
what the peak value for that quantity is.  It's still worth using
pooling because it reduces the number of simultaneous connections, but
it's not going to reduce it to the kind of values you're talking
about.  Also, consider that transactions aren't all the same length.
Suppose 90% of your queries execute in 50 ms, and 10% execute in 60 s.
Even though it's less efficient, you've got to set the connection pool
large enough that at least some of the 50 ms queries can continue to
get processed even if the maximum number of 60 s queries that you ever
expect to see in parallel are already running.  This may seem like a
theoretical problem but we have customers who use connection pools to
get the number of simultaneous connections down to, say, 800.  I
guarantee you that these people do not have 200 CPUs and 400 disks,
but they're smart people and they find that smaller pool sizes don't
work.
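
To put rough numbers on that (an illustration with assumed figures,
not customer data): by Little's law, the average number of busy
connections is the arrival rate times the mean transaction time, and
the slow transactions dominate even at modest rates:

    // Illustrative only.  Little's law: busy connections =
    // arrival rate * mean transaction time.
    double tps = 10.0;                               // assumed load
    double meanSeconds = 0.9 * 0.050 + 0.1 * 60.0;   // = 6.045 s
    double busyConnections = tps * meanSeconds;      // ~60 connections
    // Even a modest 10 tps keeps ~60 connections busy on average,
    // before allowing any headroom for the 50 ms queries -- far more
    // than 2*CPUs+disks would suggest on a small box.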

Sure, we can say, "well, the fine print tells you that 2*CPUs+disks is
not REALLY the formula you should use," but it's just so far off what I
see in the field that I have a hard time thinking it's really helping
people to give them that as a starting point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-09 Thread Kevin Grittner
[Forwarding to the -hackers list.  Please respond there.]
 
Craig Ringer ring...@ringerc.id.au wrote:
 On 08/09/2012 04:24 AM, Kevin Grittner wrote:
 http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

 Can we please please PLEASE link to that as a comment above
 max_connections?

 Last time this came up nobody was happy with wording of a comment
 so nothing got done. It's a real usability wart - causing
 real-world performance and reliability problems - that people
 unwittingly raise max_connections to absurd levels because they
 get no warnings, hints or guidance of any sort.
 
I see that we currently have five links to wiki.postgresql.org in
release notes and four more in the rest of the docs.  Are people OK
with adding this link to the docs on max_connections?  (Feel free to
improve it before answering if you have qualms about the specifics
on that page.)
 
We do seem to get an awful lot of posts (between here and
StackOverflow) from people who assume they need one database
connection per active user, and then are surprised that performance
is horrible.
 
If we get consensus on linking to this I'll put together a patch to
make a brief comment in the docs with a link to the Wiki.
 
-Kevin


Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-09 Thread Robert Haas
On Thu, Aug 9, 2012 at 10:22 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I see that we currently have five links to wiki.postgresql.org in
 release notes and four more in the rest of the docs.  Are people OK
 with adding this link to the docs on max_connections?  (Feel free to
 improve it before answering if you have qualms about the specifics
 on that page.)

I'm generally not in favor of linking to the wiki unless there's some
reason that it wouldn't be appropriate to include the material in the
documentation in some form.  I don't see that that's the case here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-09 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 I see that we currently have five links to wiki.postgresql.org in
 release notes and four more in the rest of the docs.  Are people
 OK with adding this link to the docs on max_connections?  (Feel
 free to improve it before answering if you have qualms about the
 specifics on that page.)
 
 I'm generally not in favor of linking to the wiki unless there's
 some reason that it wouldn't be appropriate to include the
 material in the documentation in some form.  I don't see that
 that's the case here.
 
All right, but we *really* need to actually get *something* into the
docs on this, preferably back-patched.  In the -admin thread which
prompted Craig to plead for a link to the Wiki page, the OP thought
it was reasonable to worry about how to configure the oom killer to
deal with the situation in which his 600 connections used all 32GB of
RAM *plus* the 32GB of swap space he had configured.  Imagine what
performance must be like by the time it gets to *that* point!
 
The OP clearly has read the docs, because he was attempting to take
advice from the section on Linux Memory Overcommit.  But as far as I
have seen, there is nothing in the docs to suggest connection
pooling.  It is far better to avoid the oom killer by connection
pooling than to tweak the oom killer configuration, yet you wouldn't
have a clue about that from the docs.
 
I actually think that the issue merits space in the docs roughly
matching the Wiki page.  Perhaps we could find a place in the Server
Setup and Operation chapter to more-or-less include the current Wiki
page contents, and reference *that* from the max_connections docs?
 
-Kevin

Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-09 Thread Robert Haas
On Thu, Aug 9, 2012 at 11:27 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 I see that we currently have five links to wiki.postgresql.org in
 release notes and four more in the rest of the docs.  Are people
 OK with adding this link to the docs on max_connections?  (Feel
 free to improve it before answering if you have qualms about the
 specifics on that page.)

 I'm generally not in favor of linking to the wiki unless there's
 some reason that it wouldn't be appropriate to include the
 material in the documentation in some form.  I don't see that
 that's the case here.

 All right, but we *really* need to actually get *something* into the
 docs on this, preferably back-patched.  In the -admin thread which
 prompted Craig to plead for a link to the Wiki page, the OP thought
 it was reasonable to worry about how to configure the oom killer to
 deal with the situation in which his 600 connections used all 32GB of
 RAM *plus* the 32GB of swap space he had configured.  Imagine what
 performance must be like by the time it gets to *that* point!

Sure, I'm not opposed to documenting it.  Putting it in the actual
documentation is not a demotion relative to putting a link to it in
the documentation!  As for back-patching, we do that regularly with
documentation patches, as the situation seems to warrant.

 The OP clearly has read the docs, because he was attempting to take
 advice from the section on Linux Memory Overcommit.  But as far as I
 have seen, there is nothing in the docs to suggest connection
 pooling.  It is far better to avoid the oom killer by connection
 pooling than to tweak the oom killer configuration, yet you wouldn't
 have a clue about that from the docs.

Sounds easily fixable.  Let's add a note to that section right after
the first paragraph.

 I actually think that the issue merits space in the docs roughly
 matching the Wiki page.  Perhaps we could find a place in the Server
 Setup and Operation chapter to more-or-less include the current Wiki
 page contents, and reference *that* from the max_connections docs?

I think the page as it exists today needs some rewriting for tone, but
I think the contents are mostly OK.  Also, I am a bit doubtful about
the advice on sizing the connection pool as applied to small servers:
surely it's not sane to recommend that a single-processor system with
one disk should have max_connections = 3.  At least, *I* don't think
that's sane.

At least in HEAD/9.2, I like the idea of adding a section to Server
Setup and Operation to address this.  Perhaps something like "Managing
Server Load" or "Preventing Server Overload".  I think there's quite a
bit we could say in such a chapter apart from the stuff on maximum
connections, and I think it would be valuable to have that stuff in
our documentation rather than scattered around the Internet in wiki
articles and random Greg Smith web pages.  I'm not sure we want to
back-patch an entire new chapter but let's put something together
first and then we'll figure out what makes sense.

A systemic problem with our documentation is that it tells you how to
do X, rather than what you ought to do.  This is a good example of
that, but not the only one.  We need to be careful of putting advice
in the docs that is one person's opinion rather than an actual best
practice, but I think there is plenty of stuff upon which enough
consensus exists that it really ought to be in the docs.  Like, who
really uses REINDEX on a production system, rather than CREATE INDEX
CONCURRENTLY + DROP INDEX?  Yet, you won't find that suggested in our
documentation anywhere, AFAIK.
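
For the record, the rebuild sequence I mean is the usual one.  Here is
a sketch with invented table and index names, driven from JDBC; note
that CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
so autocommit has to stay on:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RebuildIndex {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "secret");
                 Statement st = conn.createStatement()) {
                // Build the replacement without taking a long
                // exclusive lock on the table.
                st.execute("CREATE INDEX CONCURRENTLY accounts_idx_new"
                         + " ON accounts (account_no)");
                st.execute("DROP INDEX accounts_idx");
                st.execute("ALTER INDEX accounts_idx_new"
                         + " RENAME TO accounts_idx");
            }
        }
    }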

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)

2012-08-09 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 we *really* need to actually get *something* into the docs on
 this,
 
 Sure, I'm not opposed to documenting it.  Putting it in the actual
 documentation is not a demotion relative to putting a link to it
 in the documentation!
 
I didn't figure it was; my emphasis was because this has been raised
before and nothing happened for want of a consensus on what
particular wording should be used, so users were left with no
guidance.  I don't want this to continue to be a victim of the
"perfect is the enemy of the good" syndrome.
 
  preferably back-patched.
 
 As for back-patching, we do that regularly with documentation
 patches, as the situation seems to warrant.
 
Yeah, I was arguing that this is one of those situations where it is
warranted.
 
 Let's add a note to that section right after the first
 paragraph.
 
OK.
 
 I actually think that the issue merits space in the docs roughly
 matching the Wiki page.  Perhaps we could find a place in the
 Server Setup and Operation chapter to more-or-less include the
 current Wiki page contents, and reference *that* from the
 max_connections docs?
 
 I think the page as it exists today needs some rewriting for tone,
 but I think the contents are mostly OK.
 
Would you like to edit the tone in Wiki form?  (If the tone is off,
I suspect it should be fixed in both places.)
 
 Also, I am a bit doubtful about the advice on sizing the
 connection pool as applied to small servers:
 surely it's not sane to recommend that a single-processor system
 with one disk should have max_connections = 3.  At least, *I*
 don't think that's sane.
 
I'm not sure it's wrong when combined with this: "Remember that this
sweet spot is for the number of connections that are actively
doing work.  ...  You should always make max_connections a bit
bigger than the number of connections you enable in your connection
pool. That way there are always a few slots available for direct
connections for system maintenance and monitoring."  Where would you
expect the knee to be for connections concurrently actively doing
work on a single-core, single-drive system?
 
 At least in HEAD/9.2, I like the idea of adding a section to
 Server Setup and Operation to address this.  Perhaps something
 like "Managing Server Load" or "Preventing Server Overload".  I
 think there's quite a bit we could say in such a chapter apart
 from the stuff on maximum connections, and I think it would be
 valuable to have that stuff in our documentation rather than
 scattered around the Internet in wiki articles and random Greg
 Smith web pages.  I'm not sure we want to back-patch an entire new
 chapter but let's put something together first and then we'll
 figure out what makes sense.
 
As long as we're not drifting (again) into "enemy of the good"
territory.  Perhaps we should address this issue first, since it is
particularly acute, and then add others as separate patches?
 
 A systemic problem with our documentation is that it tells you how
 to do X, rather than what you ought to do.  This is a good example
 of that, but not the only one.  We need to be careful of putting
 advice in the docs that is one person's opinion rather than an
 actual best practice, but I think there is plenty of stuff upon
 which enough consensus exists that it really ought to be in the
 docs.  Like, who really uses REINDEX on a production system,
 rather than CREATE INDEX CONCURRENTLY + DROP INDEX?  Yet, you
 won't find that suggested in our documentation anywhere, AFAIK.
 
Good points.
 
-Kevin
