Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One possible way to make an improvement in this area would be to
 move the responsibility for accepting connections out of the
 postmaster.  Instead, you'd have a group of children that would all
 call accept() on the socket, and the OS would arbitrarily pick one to
 receive each new incoming connection.  The postmaster would just be
 responsible for making sure that there were enough children hanging
 around.  You could in fact make this change without doing anything
 else, in which case it wouldn't save any work but would possibly
 reduce connection latency a bit since more of the work could be done
 before the connection actually arrived.

This seems like potentially a good idea independent of anything else,
just to reduce connection latency: fork() (not to mention exec() on
Windows) now happens before not after receipt of the connection request.
However, I see a couple of stumbling blocks:

1. Does accept() work that way everywhere (Windows, I'm looking at you)

2. What do you do when max_connections is exceeded, and you don't have
anybody at all listening on the socket?  Right now we are at least able
to send back an error message explaining the problem.

Another issue that would require some thought is what algorithm the
postmaster uses for deciding to spawn new children.  But that doesn't
sound like a potential showstopper.

regards, tom lane



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Josh Berkus

On 12/06/2010 09:38 AM, Tom Lane wrote:

Another issue that would require some thought is what algorithm the
postmaster uses for deciding to spawn new children.  But that doesn't
sound like a potential showstopper.


We'd probably want a couple of different ones, optimized for different 
connection patterns.  Realistically.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 One possible way to make an improvement in this area would be to
 move the responsibility for accepting connections out of the
 postmaster.  Instead, you'd have a group of children that would all
 call accept() on the socket, and the OS would arbitrarily pick one to
 receive each new incoming connection.  The postmaster would just be
 responsible for making sure that there were enough children hanging
 around.  You could in fact make this change without doing anything
 else, in which case it wouldn't save any work but would possibly
 reduce connection latency a bit since more of the work could be done
 before the connection actually arrived.

 This seems like potentially a good idea independent of anything else,
 just to reduce connection latency: fork() (not to mention exec() on
 Windows) now happens before not after receipt of the connection request.
 However, I see a couple of stumbling blocks:

 1. Does accept() work that way everywhere (Windows, I'm looking at you)

Not sure.  It might be useful to look at what Apache does, but I don't
have time to do that ATM.

 2. What do you do when max_connections is exceeded, and you don't have
 anybody at all listening on the socket?  Right now we are at least able
 to send back an error message explaining the problem.

Sending back an error message explaining the problem seems like a
non-negotiable requirement.  I'm not quite sure how to dance around
this.  Perhaps if max_connections is exhausted, the postmaster itself
joins the accept() queue and launches a dead-end backend for each new
connection.  Or perhaps we reserve one extra backend slot for a
probably-dead-end backend that will just sit there and mail rejection
notices; except that if it sees that a regular backend slot has opened
up it grabs it and turns itself into a regular backend.

 Another issue that would require some thought is what algorithm the
 postmaster uses for deciding to spawn new children.  But that doesn't
 sound like a potential showstopper.

The obvious algorithm would be to try to keep N spare workers around.
Any time the number of unconnected backends drops below N the
postmaster starts spawning new ones until it gets back up to N.  I
think the trick may not be the algorithm so much as finding a way to
make the signaling sufficiently robust and lightweight.  For example,
I bet having each child that gets a new connection signal() the
postmaster is a bad plan.
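
One lightweight possibility, sketched below purely for illustration (SPARE_TARGET, child_loop(), and the one-byte pipe protocol are all invented for the example, not anything PostgreSQL does): each child writes a single byte to a shared pipe when it takes a connection, and the postmaster forks one replacement per byte it reads, instead of fielding a signal per connection.

#include <signal.h>
#include <unistd.h>

#define SPARE_TARGET 8			/* invented knob: spare backends to maintain */

extern void child_loop(int listen_fd, int notify_fd);	/* hypothetical worker */

void
postmaster_loop(int listen_fd)
{
	int		pipefd[2];
	char	buf[64];

	signal(SIGCHLD, SIG_IGN);	/* auto-reap exiting children in this sketch */

	if (pipe(pipefd) < 0)		/* children write one byte per accepted connection */
		return;

	for (int i = 0; i < SPARE_TARGET; i++)
		if (fork() == 0)
		{
			child_loop(listen_fd, pipefd[1]);
			_exit(0);
		}

	for (;;)
	{
		/* cheaper than one signal per connection: a single read can drain
		 * many notifications at once */
		ssize_t n = read(pipefd[0], buf, sizeof buf);

		for (ssize_t i = 0; i < n; i++)		/* one replacement per byte */
			if (fork() == 0)
			{
				child_loop(listen_fd, pipefd[1]);
				_exit(0);
			}
	}
}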

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Josh Berkus



At some point Hackers should look at pg vs MySQL multi-tenantry but it
is way tangential today.


My understanding is that our schemas work like MySQL databases; and
our databases are an even higher level of isolation.  No?


That's correct.  Drizzle is looking at implementing a feature like our 
databases, called "catalogs" (per the SQL spec).


Let me stress that not everyone is happy with the MySQL multi-tenantry 
approach.  But it does make possible multi-tenancy on a scale which you 
seldom see with PG, even if it has problems.  It's worth seeing 
whether we can steal any of their optimization ideas without breaking PG.


I was specifically looking at the login model, which works around the 
issue that we have: namely that different login ROLEs can't share a 
connection pool.  In MySQL, they can share the built-in connection 
pool because role-switching effectively is a session variable. 
AFAICT, anyway.


For that matter, if anyone knows any other DB which does multi-tenant 
well/better, we should be looking at them too.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 12:57 PM, Josh Berkus j...@agliodbs.com wrote:
 At some point Hackers should look at pg vs MySQL multi-tenantry but it
 is way tangential today.

 My understanding is that our schemas work like MySQL databases; and
 our databases are an even higher level of isolation.  No?

 That's correct.  Drizzle is looking at implementing a feature like our
 databases, called "catalogs" (per the SQL spec).

 Let me stress that not everyone is happy with the MySQL multi-tenantry
 approach.  But it does make possible multi-tenancy on a scale which you
 seldom see with PG, even if it has problems.  It's worth seeing whether we
 can steal any of their optimization ideas without breaking PG.

Please make sure to articulate what you think is wrong with our existing model.

 I was specifically looking at the login model, which works around the issue
 that we have: namely that different login ROLEs can't share a connection
 pool.  In MySQL, they can share the built-in connection pool because
 role-switching effectively is a session variable. AFAICT, anyway.

Please explain more precisely what is wrong with SET SESSION
AUTHORIZATION / SET ROLE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Josh Berkus

 Please explain more precisely what is wrong with SET SESSION
 AUTHORIZATION / SET ROLE.

1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't
had any time to work on)

2) Users can always issue their own SET ROLE and then hack into other
users' data.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:

 Please explain more precisely what is wrong with SET SESSION
 AUTHORIZATION / SET ROLE.

 1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't
 had any time to work on)

 2) Users can always issue their own SET ROLE and then hack into other
 users' data.

Makes sense.  It would be nice to fix those issues, independent of
anything else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Mon Dec 06 23:09:56 -0300 2010:
 On Mon, Dec 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:
 
  Please explain more precisely what is wrong with SET SESSION
  AUTHORIZATION / SET ROLE.
 
  1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't
  had any time to work on)
 
  2) Users can always issue their own SET ROLE and then hack into other
  users' data.
 
 Makes sense.  It would be nice to fix those issues, independent of
 anything else.

It seems plausible to fix the first one, but how would you fix the
second one?  You either allow SET ROLE (which you need, to support the
pooler changing authorization), or you don't.  There doesn't seem to be
a usable middleground.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Josh Berkus

 It seems plausible to fix the first one, but how would you fix the
 second one?  You either allow SET ROLE (which you need, to support the
 pooler changing authorization), or you don't.  There doesn't seem to be
 a usable middleground.

Well, this is why such a pooler would *have* to be built into the
backend.  It would need to be able to SET ROLE even though SET ROLE
would not be accepted over the client connection.  We'd also need
bookkeeping to track the ROLE (and other GUCs) of each client connection
and reset them whenever that client connection switches back.

Mind you, I'm not entirely convinced that the end result of this would
be performant.  And it would certainly be complicated.  I think that
we should start by dealing with the simplest situation, ignoring SET
ROLE and GUC issues for now.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 9:37 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Mon Dec 06 23:09:56 -0300 2010:
 On Mon, Dec 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:
 
  Please explain more precisely what is wrong with SET SESSION
  AUTHORIZATION / SET ROLE.
 
  1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't
  had any time to work on)
 
  2) Users can always issue their own SET ROLE and then hack into other
  users' data.

 Makes sense.  It would be nice to fix those issues, independent of
 anything else.

 It seems plausible to fix the first one, but how would you fix the
 second one?  You either allow SET ROLE (which you need, to support the
 pooler changing authorization), or you don't.  There doesn't seem to be
 a usable middleground.

You could add a protocol message that does a permanent role switch
in a way that can't be undone except by another such protocol message.
 Then connection poolers could simply refuse to proxy that particular
message.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-06 Thread Craig Ringer
On 07/12/10 10:48, Josh Berkus wrote:
 
 It seems plausible to fix the first one, but how would you fix the
 second one?  You either allow SET ROLE (which you need, to support the
 pooler changing authorization), or you don't.  There doesn't seem to be
 a usable middleground.
 
 Well, this is why such a pooler would *have* to be built into the
 backend.  It would need to be able to SET ROLE even though SET ROLE
 would not be accepted over the client connection.

There's actually another way to do that, one that could be retrofitted
onto an existing external pooler. It's not lovely, but it could serve if
the approach above proved too hard...

SET ROLE could accept a cookie / one-time password that had to be passed
to RESET ROLE in order for RESET ROLE to accept the command.

SET ROLE fred WITH COOKIE 'goqu8Mi6choht8ie';
-- hand to the user
-- blah blah user work blah
-- returned by the user
RESET ROLE WITH COOKIE 'goqu8Mi6choht8ie';


The tricky bit might be that the user should still be permitted to SET
ROLE, but only to roles that the role the pooler switched them to
(fred) has rights to SET ROLE to, not to roles that the pooler user
itself has rights to switch to.

 We'd also need
 bookkeeping to track the ROLE (and other GUCs) of each client connection
 and reset them whenever that client connection switches back.

I'm really interested in this direction. Taken just a little further, it
could bring Pg to the point where query executors (backends) are
separated from connection state, so a given backend could pick up and
work on queries by several different connections in rapid succession.
The advantage there is that idle connections would become cheap,
low-overhead affairs.

As I (poorly) understand how Pg is designed, it'd only be possible for a
backend to work on queries that act on the same database; it couldn't
really switch databases. That'd still be a real bonus, especially for
newer users who don't realize they *need* a connection pool.

-- 
System & Network Administrator
POST Newspapers



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Stefan Kaltenbrunner

On 12/01/2010 05:32 AM, Jeff Janes wrote:

On 11/28/10, Robert Haas robertmh...@gmail.com wrote:


In a close race, I don't think we should get bogged down in
micro-optimization here, both because micro-optimizations may not gain
much and because what works well on one platform may not do much at
all on another.  The more general issue here is what to do about our
high backend startup costs.  Beyond trying to recycle backends for new
connections, as I've previously proposed and with all the problems it
entails,


Is there a particular discussion of that matter you could point me to?


the only thing that looks promising here is to try to somehow
cut down on the cost of populating the catcache and relcache, not that
I have a very clear idea how to do that.  This has to be a soluble
problem because other people have solved it.


Oracle's backend start up time seems to be way higher than PG's.
Their main solution is something that is fundamentally a built-in
connection pooler with some bells and whistles.  I'm not sure what
other people you had in mind--Oracle is generally the one that
pops to my mind.


To some degree we're a
victim of our own flexible and extensible architecture here, but I
find it pretty unsatisfying to just say, "OK, well, we're slow."



What about "well OK, we have PGbouncer"?  Are there fixable
shortcomings that it has which could make the issue less of an issue?


well I would very much like to see an integrated pooler in postgresql - 
pgbouncer is a very nice piece of software (and might even be a base for 
an integrated bouncer), but by not being closely tied to the backend you 
are losing a lot.
One of the more obvious examples is that now that we have no flatfile 
copy of pg_authid you have to use cruel hacks like:

http://www.depesz.com/index.php/2010/12/04/auto-refreshing-password-file-for-pgbouncer/

to get automatic management of roles. There are some other drawbacks 
as well:


* no coordination of restarts/configuration changes between the cluster 
and the pooler
* you have two pieces of config files to configure your pooling settings 
(having all that available say in a catalog in pg would be awesome)
* you lose all of the advanced authentication features of pg (because 
all connections need to go through the pooler) and also ip-based stuff

* no SSL support (in the case of pgbouncer)
* complexity in resetting backend state (we added some support for that 
through explicit SQL-level commands in recent releases but it still is a 
hazard)



Stefan



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Josh Berkus



* no coordination of restarts/configuration changes between the cluster
and the pooler
* you have two pieces of config files to configure your pooling settings
(having all that available say in a catalog in pg would be awesome)
* you lose all of the advanced authentication features of pg (because
all connections need to go through the pooler) and also ip-based stuff
* no SSL support (in the case of pgbouncer)
* complexity in resetting backend state (we added some support for that
through explicit SQL-level commands in recent releases but it still is a
hazard)


More:

* pooler logs to separate file, for which there are (currently) no 
analysis tools

* pooling is incompatible with the use of ROLES for data security

The last is a major issue, and not one I think we can easily resolve. 
MySQL has a pooling-friendly user system, because when you connect to 
MySQL you basically always connect as the superuser and on connection it 
switches you to your chosen login role.  This, per Rob Wultsch, is one of 
the things at the heart of allowing MySQL to support 100,000 low-frequency 
users per cheap hosting system.


As you might imagine, this behavior is also the source of a lot of 
MySQL's security bugs.  I don't see how we could imitate it without 
getting the bugs as well.



--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote:

 * no coordination of restarts/configuration changes between the cluster
 and the pooler
 * you have two pieces of config files to configure your pooling settings
 (having all that available say in a catalog in pg would be awesome)
 * you lose all of the advanced authentication features of pg (because
 all connections need to go through the pooler) and also ip-based stuff
 * no SSL support (in the case of pgbouncer)
 * complexity in resetting backend state (we added some support for that
 through explicit SQL-level commands in recent releases but it still is a
 hazard)

 More:

 * pooler logs to separate file, for which there are (currently) no analysis
 tools
 * pooling is incompatible with the use of ROLES for data security

 The last is a major issue, and not one I think we can easily resolve. MySQL
 has a pooling-friendly user system, because when you connect to MySQL you
 basically always connect as the superuser and on connection it switches you
 to your chosen login role.  This, per Rob Wultsch, is one of the things at
 the heart of allowing MySQL to support 100,000 low-frequency users per cheap
 hosting system.

 As you might imagine, this behavior is also the source of a lot of MySQL's
 security bugs.  I don't see how we could imitate it without getting the bugs
 as well.



I think you have read a bit more into what I have said than is
correct.  MySQL can deal with thousands of users and separate schemas
on commodity hardware. There are many design decisions (some
questionable) that have made MySQL much better in a shared hosting
environment than pg and I don't know where the grants system falls
into that.

MySQL does not have that many security problems because of how grants
are stored. Most MySQL security issues are DoS sort of stuff based on
an authenticated user being able to cause a crash. The decoupled
backend storage and a less-than-awesome parser share most of the
blame for these issues.

One thing I would suggest that the PG community keeps in mind while
talking about built in connection process caching, is that it is very
nice feature for memory leaks caused by a connection to not exist for
and continue growing forever.

NOTE: 100k is not a number that I would put much stock in. I don't
recall ever mentioning that number and it is not a number that would
be truthful for me to throw out.



-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wult...@gmail.com wrote:
 One thing I would suggest that the PG community keeps in mind while
 talking about built in connection process caching, is that it is very
 nice feature for memory leaks caused by a connection to not exist for
 and continue growing forever.

s/not exist for/not exist/

I have had issues with very slow leaks in MySQL building up over
months. It really sucks to have to go to management to ask for
downtime because of a slow memory leak.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Robert Haas
On Sun, Dec 5, 2010 at 3:17 PM, Rob Wultsch wult...@gmail.com wrote:
 On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wult...@gmail.com wrote:
 One thing I would suggest that the PG community keeps in mind while
 talking about built in connection process caching, is that it is very
 nice feature for memory leaks caused by a connection to not exist for
 and continue growing forever.

 s/not exist for/not exist/

 I have had issues with very slow leaks in MySQL building up over
 months. It really sucks to have to go to management to ask for
 downtime because of a slow memory leak.

Apache has a very simple and effective solution to this problem - they
have a configuration option controlling the number of connections a
child process handles before it dies and a new one is spawned.  I've
found that setting this to 1000 works excellently.  Process startup
overhead decreases by three orders of magnitude, and only egregiously
bad leaks add up to enough to matter.
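
For illustration, here is a hedged sketch of that arrangement (not Apache's actual code; MAX_CONNS_PER_CHILD and handle_connection() are invented names): the child simply counts connections and exits when it hits the cap, so anything it leaked goes back to the OS.

#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS_PER_CHILD 1000	/* cap akin to Apache's MaxRequestsPerChild */

extern void handle_connection(int conn);	/* hypothetical session work */

void
child_loop(int listen_fd)
{
	for (int served = 0; served < MAX_CONNS_PER_CHILD; served++)
	{
		int conn = accept(listen_fd, NULL, NULL);

		if (conn < 0)
			continue;
		handle_connection(conn);
		close(conn);
	}

	/* exiting returns any leaked memory to the OS; the parent notices the
	 * exit and forks a fresh child in our place */
	_exit(0);
}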

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Robert Haas
On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wult...@gmail.com wrote:
 I think you have read a bit more into what I have said than is
 correct.  MySQL can deal with thousands of users and separate schemas
 on commodity hardware. There are many design decisions (some
 questionable) that have made MySQL much better in a shared hosting
 environment than pg and I don't know where the grants system falls
 into that.

Objection: Vague.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Robert Haas
On Sat, Dec 4, 2010 at 8:04 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 But who would be doing the passing?  For the postmaster to be doing
 that would probably go against the minimalist design.  It would have
 to keep track of which backend is available, and which db and user it
 is primed for.  Perhaps a feature could be added to the backend to
 allow it to get passed a FD from pgbouncer or pgpool-II and then hand
 control back to the pooler upon close of the connection, as they
 already have the infrastructure to keep pools around while the
 postmaster does not.  Are pgbouncer and pgpool close enough to core
 to make such intimate collaboration with the backend OK?

I am not sure.  I'm afraid that might be adding complexity without
really solving anything, but maybe I'm a pessimist.

One possible way to make an improvement in this area would be to
move the responsibility for accepting connections out of the
postmaster.  Instead, you'd have a group of children that would all
call accept() on the socket, and the OS would arbitrarily pick one to
receive each new incoming connection.  The postmaster would just be
responsible for making sure that there were enough children hanging
around.  You could in fact make this change without doing anything
else, in which case it wouldn't save any work but would possibly
reduce connection latency a bit since more of the work could be done
before the connection actually arrived.

From there, you could go two ways.

One option would be to have backends that would otherwise terminate
normally instead do the equivalent of DISCARD ALL and then go back
around and try to accept() another incoming connection.  If they get a
guy who wants the database to which they previously connected, profit.
 If not, laboriously flush every cache in sight and rebind to the new
database.

Another option would be to have backends that would otherwise
terminate normally instead do the equivalent of DISCARD ALL and then
mark themselves as able to accept a new connection to the same
database to which they are already connected (but not any other
database).  Following authentication, a backend that accepted a new
incoming connection looks through the pool of such backends and, if it
finds one, hands off the connection using file-descriptor passing and
then loops back around to accept() again.  Otherwise it handles the
connection itself.  This wouldn't offer much of an advantage over the
first option for a cluster that basically has just one database, or
for a cluster that has 1000 actively used databases.  But it would be
much better for a system with three databases.
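
To make the shared-accept() part concrete, here is a minimal, self-contained sketch; the port, child count, and trivial handler are invented for the example, and the DISCARD ALL-style state reset is elided.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_CHILDREN 4			/* invented knob: pre-forked acceptors */

static void
child_loop(int listen_fd)
{
	for (;;)
	{
		/* all children block here; the kernel hands each new connection
		 * to exactly one of them */
		int conn = accept(listen_fd, NULL, NULL);

		if (conn < 0)
			continue;
		/* stand-in for authentication and session work */
		(void) write(conn, "hello\n", 6);
		close(conn);			/* then loop around and re-accept */
	}
}

int
main(void)
{
	struct sockaddr_in addr;
	int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(5555);		/* arbitrary example port */
	bind(listen_fd, (struct sockaddr *) &addr, sizeof addr);
	listen(listen_fd, 64);

	for (int i = 0; i < NUM_CHILDREN; i++)
		if (fork() == 0)
		{
			child_loop(listen_fd);	/* children inherit the listening socket */
			_exit(0);
		}

	/* the "postmaster" only reaps dead children and keeps the pool full */
	for (;;)
		if (wait(NULL) > 0 && fork() == 0)
		{
			child_loop(listen_fd);
			_exit(0);
		}
}

Whether every platform handles many processes blocked in accept() on one socket gracefully (as opposed to thundering-herd wakeups) is a genuine portability question, which is part of why this is only a sketch.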

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wult...@gmail.com wrote:
 I think you have read a bit more into what I have said than is
 correct.  MySQL can deal with thousands of users and separate schemas
 on commodity hardware. There are many design decisions (some
 questionable) that have made MySQL much better in a shared hosting
 environment than pg and I don't know where the grants system falls
 into that.

 Objection: Vague.


I retract the remark, your honor.

At some point Hackers should look at pg vs MySQL multi-tenantry but it
is way tangential today.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] profiling connection overhead

2010-12-05 Thread Robert Haas
On Sun, Dec 5, 2010 at 9:35 PM, Rob Wultsch wult...@gmail.com wrote:
 On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wult...@gmail.com wrote:
 I think you have read a bit more into what I have said than is
 correct.  MySQL can deal with thousands of users and separate schemas
 on commodity hardware. There are many design decisions (some
 questionable) that have made MySQL much better in a shared hosting
 environment than pg and I don't know where the grants system falls
 into that.

 Objection: Vague.

 I retract the remark, your honor.

Clarifying it would be fine, too...  :-)

 At some point Hackers should look at pg vs MySQL multi-tenantry but it
 is way tangential today.

My understanding is that our schemas work like MySQL databases; and
our databases are an even higher level of isolation.  No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-04 Thread Jeff Janes
On Wed, Dec 1, 2010 at 6:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Nov 30, 2010 at 11:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On 11/28/10, Robert Haas robertmh...@gmail.com wrote:

 In a close race, I don't think we should get bogged down in
 micro-optimization here, both because micro-optimizations may not gain
 much and because what works well on one platform may not do much at
 all on another.  The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails,

 Is there a particular discussion of that matter you could point me to?

 the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.  This has to be a soluble
 problem because other people have solved it.

 Oracle's backend start up time seems to be way higher than PG's.
 Their main solution is something that is fundamentally a built-in
 connection pooler with some bells and whistles.  I'm not sure what
 other people you had in mind--Oracle is generally the one that
 pops to my mind.

 Interesting.  How about MySQL and SQL Server?

I don't have experience with MS SQL Server, and don't know how it
performs on that front.  I haven't really considered MySQL to be a
real RDBMS, more of just an indexing system, although I guess it is
steadily becoming more featureful.  It is indisputably faster at
making connections than PG, but still much slower than a connection
pooler.


 To some degree we're a
 victim of our own flexible and extensible architecture here, but I
 find it pretty unsatisfying to just say, "OK, well, we're slow."

 What about "well OK, we have PGbouncer"?  Are there fixable
 shortcomings that it has which could make the issue less of an issue?

 We do have pgbouncer, and pgpool-II, and that's a good thing.  But it
 also requires proxying every interaction with the database through an
 intermediate piece of software, which is not free.

True, a simple in-memory benchmark with pgbench -S -c1 showed 10,000
tps connecting straight, and 7000 tps through pgbouncer.  But if
people want to make and break hundreds of connections per second, they
must not be doing very many queries per connection, so I don't know how
relevant that per-query slow-down is.

 An in-core
 solution ought to be able to arrange for each new connection to be
 directly attached to an existing backend, using file-descriptor
 passing.

But who would be doing the passing?  For the postmaster to be doing
that would probably go against the minimalist design.  It would have
to keep track of which backend is available, and which db and user it
is primed for.  Perhaps a feature could be added to the backend to
allow it to get passed a FD from pgbouncer or pgpool-II and then hand
control back to the pooler upon close of the connection, as they
already have the infrastructure to keep pools around while the
postmaster does not.  Are pgbouncer and pgpool close enough to core
to make such intimate collaboration with the backend OK?


Cheers,

Jeff



Re: [HACKERS] profiling connection overhead

2010-12-01 Thread Robert Haas
On Tue, Nov 30, 2010 at 11:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On 11/28/10, Robert Haas robertmh...@gmail.com wrote:

 In a close race, I don't think we should get bogged down in
 micro-optimization here, both because micro-optimizations may not gain
 much and because what works well on one platform may not do much at
 all on another.  The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails,

 Is there a particular discussion of that matter you could point me to?

 the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.  This has to be a soluble
 problem because other people have solved it.

 Oracle's backend start up time seems to be way higher than PG's.
 Their main solution is something that is fundamentally a built-in
 connection pooler with some bells and whistles.  I'm not sure what
 other people you had in mind--Oracle is generally the one that
 pops to my mind.

Interesting.  How about MySQL and SQL Server?

 To some degree we're a
 victim of our own flexible and extensible architecture here, but I
 find it pretty unsatisfying to just say, "OK, well, we're slow."

 What about "well OK, we have PGbouncer"?  Are there fixable
 shortcomings that it has which could make the issue less of an issue?

We do have pgbouncer, and pgpool-II, and that's a good thing.  But it
also requires proxying every interaction with the database through an
intermediate piece of software, which is not free.  An in-core
solution ought to be able to arrange for each new connection to be
directly attached to an existing backend, using file-descriptor
passing.  Tom has previously complained that this isn't portable, but
a little research suggests that it is supported on at least Linux, Mac
OS X, FreeBSD, OpenBSD, Solaris, and Windows, so in practice the
percentage of our user base who could benefit seems like it would
likely be very high.
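
The file-descriptor passing referred to here is the SCM_RIGHTS ancillary-data mechanism of sendmsg(2).  A minimal sketch of the sending side over a Unix-domain socket follows; send_fd() is an illustrative helper, not an existing PostgreSQL function.

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass fd_to_pass across the Unix-domain socket "channel".  The kernel
 * duplicates the descriptor into the receiving process. */
static int
send_fd(int channel, int fd_to_pass)
{
	struct msghdr msg;
	struct iovec iov;
	char payload = 'F';					/* must carry at least one byte of data */
	union
	{
		char		buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr	align;			/* forces correct alignment */
	}			u;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof msg);
	memset(&u, 0, sizeof u);
	iov.iov_base = &payload;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof u.buf;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;		/* "pass these descriptors" */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

	return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
}

The receiving side does the mirror-image recvmsg() and pulls the duplicated descriptor out of CMSG_DATA; the platform list above is about the availability of exactly this facility.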

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-12-01 Thread Andres Freund
On Wednesday 01 December 2010 15:20:32 Robert Haas wrote:
 On Tue, Nov 30, 2010 at 11:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  On 11/28/10, Robert Haas robertmh...@gmail.com wrote:
  To some degree we're a
  victim of our own flexible and extensible architecture here, but I
  find it pretty unsatisfying to just say, "OK, well, we're slow."
  
  What about "well OK, we have PGbouncer"?  Are there fixable
  shortcomings that it has which could make the issue less of an issue?
 
 We do have pgbouncer, and pgpool-II, and that's a good thing.  But it
 also requires proxying every interaction with the database through an
 intermediate piece of software, which is not free.  An in-core
 solution ought to be able to arrange for each new connection to be
 directly attached to an existing backend, using file-descriptor
 passing.  Tom has previously complained that this isn't portable, but
 a little research suggests that it is supported on at least Linux, Mac
 OS X, FreeBSD, OpenBSD, Solaris, and Windows, so in practice the
 percentage of our user base who could benefit seems like it would
 likely be very high.
HPUX and AIX allow fd transfer as well. I still don't see what even remotely 
relevant platform would be a problem.

Andres



Re: [HACKERS] profiling connection overhead

2010-12-01 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Jeff Janes jeff.ja...@gmail.com wrote:
 
 Oracle's backend start up time seems to be way higher than PG's.
 
 Interesting.  How about MySQL and SQL Server?
 
My recollection of Sybase ASE is that establishing a connection
doesn't start a backend or even a thread.  It establishes a network
connection and associates network queues and a connection context
structure with it.  Engine threads with CPU affinity (and a few
miscellaneous worker threads, too, if I remember right) do all the
work in a queue-based fashion.
 
Last I worked with MS SQL Server it was based on the Sybase code and
therefore worked the same way.  I know they've made a lot of changes
in the last five years (including switching to MVCC and adding
snapshot isolation in addition to the already-existing serializable
isolation), so I don't know whether connection startup cost has
changed along the way.
 
-Kevin



Re: [HACKERS] profiling connection overhead

2010-11-30 Thread Peter Eisentraut
On Mon, 2010-11-29 at 13:10 -0500, Tom Lane wrote:
 Rolling in calloc in place of
 malloc/memset made no particular difference either, which says that
 Fedora 13's glibc does not have any optimization for that case as I'd
 hoped.

glibc's calloc is either mmap of /dev/zero or malloc followed by memset.




Re: [HACKERS] profiling connection overhead

2010-11-30 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Mon, 2010-11-29 at 13:10 -0500, Tom Lane wrote:
 Rolling in calloc in place of
 malloc/memset made no particular difference either, which says that
 Fedora 13's glibc does not have any optimization for that case as I'd
 hoped.

 glibc's calloc is either mmap of /dev/zero or malloc followed by memset.

Hmm.  I would have expected to see a difference then.  Do you know what
conditions are needed to cause the mmap to be used?

regards, tom lane



Re: [HACKERS] profiling connection overhead

2010-11-30 Thread Jeff Janes
On 11/28/10, Robert Haas robertmh...@gmail.com wrote:

 In a close race, I don't think we should get bogged down in
 micro-optimization here, both because micro-optimizations may not gain
 much and because what works well on one platform may not do much at
 all on another.  The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails,

Is there a particular discussion of that matter you could point me to?

 the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.  This has to be a soluble
 problem because other people have solved it.

Oracle's backend start up time seems to be way higher than PG's.
Their main solution is something that is fundamentally a built-in
connection pooler with some bells and whistles.  I'm not sure what
other people you had in mind--Oracle is generally the one that
pops to my mind.

 To some degree we're a
 victim of our own flexible and extensible architecture here, but I
 find it pretty unsatisfying to just say, "OK, well, we're slow."


What about "well OK, we have PGbouncer"?  Are there fixable
shortcomings that it has which could make the issue less of an issue?

Cheers,

Jeff



Re: [HACKERS] profiling connection overhead

2010-11-30 Thread Peter Eisentraut
On Tue, 2010-11-30 at 15:49 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On Mon, 2010-11-29 at 13:10 -0500, Tom Lane wrote:
  Rolling in calloc in place of
  malloc/memset made no particular difference either, which says that
  Fedora 13's glibc does not have any optimization for that case as I'd
  hoped.
 
  glibc's calloc is either mmap of /dev/zero or malloc followed by memset.
 
 Hmm.  I would have expected to see a difference then.  Do you know what
 conditions are needed to cause the mmap to be used?

Check out the mallopt(3) man page.  It contains a few tunable malloc
options that may be useful for your investigation.
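
For example, a hedged sketch of the relevant tunable (the threshold value is arbitrary, and whether glibc's calloc then skips its memset is exactly the version-dependent question at issue here):

#include <malloc.h>
#include <stdlib.h>

int
main(void)
{
	/* route requests above 64 kB through mmap rather than sbrk; the value
	 * is arbitrary for the example */
	mallopt(M_MMAP_THRESHOLD, 64 * 1024);

	/* mmap-backed chunks come from the kernel already zeroed, so calloc
	 * could in principle skip its memset for them */
	char *buf = calloc(512 * 1024 * 1024, 1);

	if (buf == NULL)
		return 1;
	free(buf);
	return 0;
}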




Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Well, the lack of extensible XLOG support is definitely a big handicap
 to building a *production* index AM as an add-on.  But it's not such a
 handicap for development.

 Realistically, it's hard for me to imagine that anyone would go to the
 trouble of building it as a loadable module first and then converting
 it to part of core later on.  That'd hardly be less work.

Well, it depends a lot on external factors. Like, for example, wanting to
use the code before spending the necessary QA time that is needed for it
to land in core. Two particular examples come to mind, which are tsearch
and KNN GiST. The main problems with integrating into core, AFAIUI, are
related to code maintenance, not at all to the code stability and quality
of the addon itself.

It's just so much easier to develop an external module…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Robert Haas
On Sun, Nov 28, 2010 at 11:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah, very true.  What's a bit frustrating about the whole thing is
 that we spend a lot of time pulling data into the caches that's
 basically static and never likely to change anywhere, ever.

 True.  I wonder if we could do something like the relcache init file
 for the catcaches.

Maybe.  It's hard to know exactly what to pull in, though, nor is it
clear to me how much it would really save.  You've got to keep the
thing up to date somehow, too.

I finally got around to doing some testing of
page-faults-versus-actually-memory-initialization, using the attached
test program, compiled with warnings, but without optimization.
Typical results on MacOS X:

first run: 297299
second run: 99653

And on Fedora 12 (2.6.32.23-170.fc12.x86_64):

first run: 509309
second run: 114721

I guess the word "run" is misleading (I wrote the program in 5
minutes); it's just zeroing the same chunk twice and measuring the
times.  The difference is presumably the page fault overhead, which
implies that faulting is two-thirds of the overhead on MacOS X and
three-quarters of the overhead on Linux.  This makes me pretty
pessimistic about the chances of a meaningful speedup here.

 Maybe we could speed things up a bit if we got rid of the pg_attribute
 entries for the system attributes (except OID).

 I used to have high hopes for that idea, but the column privileges
 patch broke it permanently.

http://archives.postgresql.org/pgsql-hackers/2010-07/msg00151.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

char bss[512*1024*1024];

void
print_times(char *tag, struct timeval *before, struct timeval *after)
{
	int result = (after->tv_sec - before->tv_sec) * 1000000
		+ ((int) after->tv_usec) - ((int) before->tv_usec);
	printf("%s: %d\n", tag, result);
}

int
main(int argc, char **argv)
{
	struct timeval t1;
	struct timeval t2;
	struct timeval t3;

	if (gettimeofday(&t1, NULL))
		return 1;
	memset(bss, 0, sizeof bss);
	if (gettimeofday(&t2, NULL))
		return 1;
	memset(bss, 0, sizeof bss);
	if (gettimeofday(&t3, NULL))
		return 1;

	print_times("first run", &t1, &t2);
	print_times("second run", &t2, &t3);

	return 0;
}



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Andres Freund
On Monday 29 November 2010 17:57:51 Robert Haas wrote:
 On Sun, Nov 28, 2010 at 11:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Yeah, very true.  What's a bit frustrating about the whole thing is
  that we spend a lot of time pulling data into the caches that's
  basically static and never likely to change anywhere, ever.
  
  True.  I wonder if we could do something like the relcache init file
  for the catcaches.
 
 Maybe.  It's hard to know exactly what to pull in, though, nor is it
 clear to me how much it would really save.  You've got to keep the
 thing up to date somehow, too.
 
 I finally got around to doing some testing of
 page-faults-versus-actually-memory-initialization, using the attached
 test program, compiled with warnings, but without optimization.
 Typical results on MacOS X:
 
 first run: 297299
 second run: 99653
 
 And on Fedora 12 (2.6.32.23-170.fc12.x86_64):
 
 first run: 509309
 second run: 114721
Hm. A quick test shows that it's quite a bit faster if you allocate memory 
with:
size_t s = 512*1024*1024;
char *bss = mmap(0, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_POPULATE|
MAP_ANONYMOUS, -1, 0);

Andres



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Robert Haas
On Mon, Nov 29, 2010 at 12:24 PM, Andres Freund and...@anarazel.de wrote:
 Hm. A quick test shows that it's quite a bit faster if you allocate memory
 with:
 size_t s = 512*1024*1024;
 char *bss = mmap(0, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_POPULATE|
 MAP_ANONYMOUS, -1, 0);

Numbers?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Andres Freund
On Monday 29 November 2010 18:34:02 Robert Haas wrote:
 On Mon, Nov 29, 2010 at 12:24 PM, Andres Freund and...@anarazel.de wrote:
  Hm. A quick test shows that it's quite a bit faster if you allocate memory
  with:
  size_t s = 512*1024*1024;
  char *bss = mmap(0, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_POPULATE|
  MAP_ANONYMOUS, -1, 0);
 
 Numbers?
malloc alloc: 43
malloc memset1: 438763
malloc memset2: 98764
total: 537570

mmap alloc: 296065
mmap memset1: 99203
mmap memset2: 100608
total: 495876

But you don't actually need the memset1 in the mmap case as MAP_ANONYMOUS 
memory is already zeroed. We could actually use that knowledge even without 
MAP_POPULATE if we somehow keep track of whether an allocated memory region 
is still zeroed.

Taking that into account, it's:

malloc alloc: 47
malloc memset1: 437819
malloc memset2: 98317
total: 536183
mmap alloc: 292904
mmap memset1: 1
mmap memset2: 99284
total: 392189


I am somewhat reluctant to believe that's the way to go.

Andres

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <malloc.h>

char bss[512*1024*1024];

void
print_times(char *tag, struct timeval *before, struct timeval *after)
{
	int result = (after->tv_sec - before->tv_sec) * 1000000
		+ ((int) after->tv_usec) - ((int) before->tv_usec);
	printf("%s: %d\n", tag, result);
}

int
main(int argc, char **argv)
{
	size_t s = 512*1024*1024;

	struct timeval t1_1;
	struct timeval t1_2;
	struct timeval t1_3;
	struct timeval t1_4;
	struct timeval t2_1;
	struct timeval t2_2;
	struct timeval t2_3;
	struct timeval t2_4;

	if (gettimeofday(&t1_1, NULL))
		return 1;
	mallopt(M_MMAP_MAX, 0);		/* force malloc to use sbrk, not mmap */
	char *bss1 = malloc(s);

	if (gettimeofday(&t1_2, NULL))
		return 1;

	memset(bss1, 0, s);

	if (gettimeofday(&t1_3, NULL))
		return 1;

	memset(bss1, 0, s);

	if (gettimeofday(&t1_4, NULL))
		return 1;

	if (gettimeofday(&t2_1, NULL))
		return 1;

	char *bss2 = mmap(0, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_POPULATE|MAP_ANONYMOUS, -1, 0);

	if (gettimeofday(&t2_2, NULL))
		return 1;

	/* memset(bss2, 0, s);	unnecessary: MAP_ANONYMOUS memory is already zeroed */

	if (gettimeofday(&t2_3, NULL))
		return 1;

	memset(bss2, 0, s);	/* assumed target: the mmap'ed region (the posted code zeroed bss1 here) */

	if (gettimeofday(&t2_4, NULL))
		return 1;

	print_times("malloc alloc", &t1_1, &t1_2);
	print_times("malloc memset1", &t1_2, &t1_3);
	print_times("malloc memset2", &t1_3, &t1_4);
	print_times("total", &t1_1, &t1_4);

	print_times("mmap alloc", &t2_1, &t2_2);
	print_times("mmap memset1", &t2_2, &t2_3);
	print_times("mmap memset2", &t2_3, &t2_4);
	print_times("total", &t2_1, &t2_4);

	return 0;
}



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Jeff Janes
On Mon, Nov 29, 2010 at 9:24 AM, Andres Freund and...@anarazel.de wrote:
 On Monday 29 November 2010 17:57:51 Robert Haas wrote:
 On Sun, Nov 28, 2010 at 11:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Yeah, very true.  What's a bit frustrating about the whole thing is
  that we spend a lot of time pulling data into the caches that's
  basically static and never likely to change anywhere, ever.
 
  True.  I wonder if we could do something like the relcache init file
  for the catcaches.

 Maybe.  It's hard to know exactly what to pull in, though, nor is it
 clear to me how much it would really save.  You've got to keep the
 thing up to date somehow, too.

 I finally got around to doing some testing of
 page-faults-versus-actually-memory-initialization, using the attached
 test program, compiled with warnings, but without optimization.
 Typical results on MacOS X:

 first run: 297299
 second run: 99653

 And on Fedora 12 (2.6.32.23-170.fc12.x86_64):

 first run: 509309
 second run: 114721
 Hm. A quick test shows that it's quite a bit faster if you allocate memory
 with:
 size_t s = 512*1024*1024;
 char *bss = mmap(0, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_POPULATE|
 MAP_ANONYMOUS, -1, 0);

Could you post the program?

Are you sure you haven't just moved the page-fault time to a part of
the code where it still exists, but just isn't being captured and
reported?

Cheers,

Jeff



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I guess the word "run" is misleading (I wrote the program in 5
 minutes); it's just zeroing the same chunk twice and measuring the
 times.  The difference is presumably the page fault overhead, which
 implies that faulting is two-thirds of the overhead on MacOS X and
 three-quarters of the overhead on Linux.

Ah, cute solution to the measurement problem.  I replicated the
experiment just as a cross-check:

Fedora 13 on x86_64 (recent Nehalem):
first  run: 346767
second run: 103143

Darwin on x86_64 (not-so-recent Penryn):
first  run: 341289
second run: 64535

HPUX on HPPA:
first  run: 2191136
second run: 1199879

(On the last two machines I had to cut the array size to 256MB to avoid
swapping.)  All builds with gcc -O2.

 This makes me pretty
 pessimistic about the chances of a meaningful speedup here.

Yeah, this is confirmation that what you are seeing in the original test
is mostly about faulting pages in, not about the zeroing.  I think it
would still be interesting to revisit the micro-optimization of
MemSet(), but it doesn't look like massive restructuring to avoid it
altogether is going to be worthwhile.

regards, tom lane



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 Are you sure you haven't just moved the page-fault time to a part of
 the code where it still exists, but just isn't being captured and
 reported?

I'm a bit suspicious about that too.  Another thing to keep in mind
is that Robert's original program doesn't guarantee that the char
array is maxaligned; though reasonable implementations of memset
should be able to use the same inner loop anyway for most of the
array.

I did some experimentation here and couldn't find any real difference in
runtime between the original program and substituting a malloc() call
for the static array allocation.  Rolling in calloc in place of
malloc/memset made no particular difference either, which says that
Fedora 13's glibc does not have any optimization for that case as I'd
hoped.

regards, tom lane



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Robert Haas
On Mon, Nov 29, 2010 at 12:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (On the last two machines I had to cut the array size to 256MB to avoid
 swapping.)

You weren't kidding about that not-so-recent part.  :-)

 This makes me pretty
 pessimistic about the chances of a meaningful speedup here.

 Yeah, this is confirmation that what you are seeing in the original test
 is mostly about faulting pages in, not about the zeroing.  I think it
 would still be interesting to revisit the micro-optimization of
 MemSet(), but it doesn't look like massive restructuring to avoid it
 altogether is going to be worthwhile.

Yep.  I think that what we've established here is that starting new
processes all the time is just plain expensive, and we're going to
have to start fewer of them if we want to make a meaningful
improvement.

My impression is that the process startup overhead is even higher on
Windows, although I am not now nor have I ever been a Windows
programmer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Bruce Momjian
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote:
  Not sure that information moves us forward.  If the postmaster cleared
  the memory, we would have COW in the child and probably be even slower.
 
  Well, we can determine the answers to these questions empirically.
 
 Not really.  Per Bruce's description, a page would become COW the moment
 the postmaster touched (either write or read) any variable on it.  Since
 we have no control over how the loader lays out static variables, the
 actual behavior of a particular build would be pretty random and subject
 to unexpected changes caused by seemingly unrelated edits.

I believe all linkers will put initialized data (data segment) before
uninitialized data (bss segment):

http://en.wikipedia.org/wiki/Data_segment

The only question is whether the linker has data and bss sharing the
same VM page (4k), or whether a new VM page is used when starting the
bss segment.

 Also, the referenced URL only purports to describe the behavior of
 HPUX, which is not exactly a mainstream OS.  I think it requires a
 considerable leap of faith to assume that all or even most platforms
 work the way this suggests, and not in the dumber fashion Andres
 suggested.  Has anybody here actually looked at the relevant Linux
 or BSD kernel code?

I have years ago, but not recently.  You can see the sections on Linux
via objdump:

$ objdump --headers /bin/ls

/bin/ls: file format elf32-i386

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
...
 24 .data         0000012c  080611a0  080611a0  000191a0  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 25 .bss          00000c40  080612e0  080612e0  000192cc  2**5
                  ALLOC

Based on this output, a new 4k page is not started for the 'bss' segment:
.data ends at 0x080611a0 + 0x12c = 0x080612cc, and .bss begins at
0x080612e0, within the same 4k page.  It basically uses 32-byte alignment.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Bruce Momjian
Robert Haas wrote:
 In a close race, I don't think we should get bogged down in
 micro-optimization here, both because micro-optimizations may not gain
 much and because what works well on one platform may not do much at
 all on another.  The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails, the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.  This has to be a soluble
 problem because other people have solved it.  To some degree we're a
 victim of our own flexible and extensible architecture here, but I
 find it pretty unsatisfying to just say, "OK, well, we're slow."

Combining your last two sentences, I am not sure anyone with the
flexibility we have has solved the cache populating problem.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Bruce Momjian
Tom Lane wrote:
 BTW, this might be premature to mention pending some tests about mapping
 versus zeroing overhead, but it strikes me that there's more than one
 way to skin a cat.  I still think the idea of statically allocated space
 sucks.  But what if we rearranged things so that palloc0 doesn't consist
 of palloc-then-memset, but rather push the zeroing responsibility down
 into the allocator?  In particular, I'm imagining that palloc0 with a
 sufficiently large space request --- more than a couple pages --- could
 somehow arrange to get space that's guaranteed zero already.  And if the
 request isn't large, zeroing it isn't where our problem is anyhow.

 The most portable way to do that would be to use calloc instead of malloc,
 and hope that libc is smart enough to provide freshly-mapped space.
 It would be good to look and see whether glibc actually does so,
 of course.  If not we might end up having to mess with sbrk for
 ourselves, and I'm not sure how pleasantly that interacts with malloc.

Yes, I was going to suggest trying calloc(), either because we can get
already-zeroed sbrk() memory, or because libc uses assembly language for
zeroing memory, as some good libc's do.  I know most kernels also use
assembly for zeroing memory.

 Another question that would be worth asking here is whether the
 hand-baked MemSet macro still outruns memset on modern architectures.
 I think it's been quite a few years since that was last tested.

Yes, MemSet was found to be faster than calling a C function, but new
testing is certainly warranted.
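
For anyone rerunning that test, here is a simplified sketch of the
hand-baked approach (the real MemSet macro lives in src/include/c.h; the
alignment test and loop limit here are illustrative only):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MEMSET_LOOP_LIMIT 1024	/* illustrative cutoff */

static inline void
memset_sketch(void *start, int val, size_t len)
{
	/*
	 * For small, long-aligned, zero-fill requests, an inline word loop
	 * avoids the function-call overhead of memset().
	 */
	if (val == 0 &&
		((uintptr_t) start % sizeof(long)) == 0 &&
		len % sizeof(long) == 0 &&
		len <= MEMSET_LOOP_LIMIT)
	{
		long	   *p = (long *) start;
		long	   *stop = (long *) ((char *) start + len);

		while (p < stop)
			*p++ = 0;
	}
	else
		memset(start, val, len);
}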

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Bruce Momjian
Robert Haas wrote:
 On Sun, Nov 28, 2010 at 7:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  One possible way to get a real speedup here would be to look for ways
  to trim the number of catcaches.
 
  BTW, it's not going to help to remove catcaches that have a small
  initial size, as the pg_am cache certainly does.  If the bucket zeroing
  cost is really something to minimize, it's only the caches with the
  largest nbuckets counts that are worth considering --- and we certainly
  can't remove those without penalty.
 
 Yeah, very true.  What's a bit frustrating about the whole thing is
 that we spend a lot of time pulling data into the caches that's
 basically static and never likely to change anywhere, ever.  I bet the
 number of people for whom (int4, int4) has any non-standard
 properties is somewhere between slim and none; and it might well be
 the case that formrdesc() is faster than reading the relcache init
 file, if we didn't need to worry about deviation from canonical.  This
 is even more frustrating in the hypothetical situation where a backend
 can switch databases, because we have to blow away all of these cache
 entries that are 99.9% likely to be basically identical in the old and
 new databases.

It is very tempting to look at optimizations here, but I am worried we
might head down the flat-files solution that caused continual problems
in the past.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Bruce Momjian
Greg Stark wrote:
 On Mon, Nov 29, 2010 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  The most portable way to do that would be to use calloc instead of malloc,
  and hope that libc is smart enough to provide freshly-mapped space.
  It would be good to look and see whether glibc actually does so,
  of course.  If not we might end up having to mess with sbrk for
  ourselves, and I'm not sure how pleasantly that interacts with malloc.
 
 It's *supposed* to interact fine. The only thing I wonder is that I
 think malloc intentionally uses mmap for larger allocations but I'm
 not clear what the advantages are. Is it because it's a cheaper way to
 get zeroed bytes? Or just so that free has a hope of returning the
 allocations to the OS?

Using mmap() so you can return large allocations to the OS is a neat
trick, certainly.  I am not sure who implements that.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-29 Thread Andres Freund
On Monday 29 November 2010 19:10:07 Tom Lane wrote:
 Jeff Janes jeff.ja...@gmail.com writes:
  Are you sure you haven't just moved the page-fault time to a part of
  the code where it still exists, but just isn't being captured and
  reported?
 
 I'm a bit suspicious about that too.  Another thing to keep in mind
 is that Robert's original program doesn't guarantee that the char
 array is maxaligned; though reasonable implementations of memset
 should be able to use the same inner loop anyway for most of the
 array.
Yes, I measured the time including mmap itself. I don't find it surprising it's
taking measurably shorter, as it can just put up the mappings without
explicitly faulting in each and every page. The benefit is too small to worry
about, though, so ...

The answer to Robert includes the timings + test program.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote:
 Not sure that information moves us forward.  If the postmaster cleared
 the memory, we would have COW in the child and probably be even slower.

 Well, we can determine the answers to these questions empirically.

Not really.  Per Bruce's description, a page would become COW the moment
the postmaster touched (either write or read) any variable on it.  Since
we have no control over how the loader lays out static variables, the
actual behavior of a particular build would be pretty random and subject
to unexpected changes caused by seemingly unrelated edits.

Also, the referenced URL only purports to describe the behavior of
HPUX, which is not exactly a mainstream OS.  I think it requires a
considerable leap of faith to assume that all or even most platforms
work the way this suggests, and not in the dumber fashion Andres
suggested.  Has anybody here actually looked at the relevant Linux
or BSD kernel code?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Robert Haas
On Sun, Nov 28, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote:
 Not sure that information moves us forward.  If the postmaster cleared
 the memory, we would have COW in the child and probably be even slower.

 Well, we can determine the answers to these questions empirically.

 Not really.  Per Bruce's description, a page would become COW the moment
 the postmaster touched (either write or read) any variable on it.  Since
 we have no control over how the loader lays out static variables, the
 actual behavior of a particular build would be pretty random and subject
 to unexpected changes caused by seemingly unrelated edits.

Well, one big character array pretty much has to be laid out
contiguously, and it would be pretty surprising (but not entirely
impossible) to find that the linker randomly sprinkles symbols from
other files in between consecutive definitions in the same source
file.  I think the next question to answer is to try to allocate blame
for the memset/memcpy overhead between page faults and the zeroing
itself.  That seems like something we can easily member by writing a
test program that zeroes the same region twice and kicks out timing
numbers.  If, as you and Andres are arguing, the actual zeroing is
minor, then we can forget this whole line of discussion and move on to
other possible optimizations.  If that turns out not to be true then
we can worry about how best to avoid the zeroing.  I have to believe
that's a solvable problem; the question is whether there's a benefit.

In a close race, I don't think we should get bogged down in
micro-optimization here, both because micro-optimizations may not gain
much and because what works well on one platform may not do much at
all on another.  The more general issue here is what to do about our
high backend startup costs.  Beyond trying to recycle backends for new
connections, as I've previously proposed and with all the problems it
entails, the only thing that looks promising here is to try to somehow
cut down on the cost of populating the catcache and relcache, not that
I have a very clear idea how to do that.  This has to be a soluble
problem because other people have solved it.  To some degree we're a
victim of our own flexible and extensible architecture here, but I
find it pretty unsatisfying to just say, "OK, well, we're slow."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote:
  Not sure that information moves us forward.  If the postmaster cleared
  the memory, we would have COW in the child and probably be even slower.
 
 Well, we can determine the answers to these questions empirically.  I
 think some more scrutiny of the code with the points you and Andres
 and Tom have raised is probably in order, and probably some more
 benchmarking, too.  I haven't had a chance to do that yet, however.

Basically, my bet is if you allocated a large zero-data variable in the
postmaster but never accessed it from the postmaster, at most you would
copy-on-write (COW) fault in two pages, one at the beginning that is
shared by accessed variables, and one at the end.  The remaining pages
(4k default for x86) would be zero-filled and not COW shared.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails, the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.

One comment to make here is that it would be a serious error to focus on
the costs of just starting and stopping a backend; you have to think
about cases where the backend does at least some useful work in between,
and that means actually *populating* those caches (to some extent) not
just initializing them.  Maybe your wording above was chosen with that
in mind, but I think onlookers might easily overlook the point.

FWIW, today I've been looking into getting rid of the silliness in
build_index_pathkeys whereby it reconstructs pathkey opfamily OIDs
from sortops instead of just using the index opfamilies directly.
It turns out that once you fix that, there is no need at all for
relcache to cache per-index operator data (the rd_operator arrays)
because that's the only code that uses 'em.  I don't see any particular
change in the runtime of the regression tests from ripping out that
part of the cached data, but it ought to have at least some beneficial
effect on real startup time.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Robert Haas
On Sun, Nov 28, 2010 at 3:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The more general issue here is what to do about our
 high backend startup costs.  Beyond trying to recycle backends for new
 connections, as I've previously proposed and with all the problems it
 entails, the only thing that looks promising here is to try to somehow
 cut down on the cost of populating the catcache and relcache, not that
 I have a very clear idea how to do that.

 One comment to make here is that it would be a serious error to focus on
 the costs of just starting and stopping a backend; you have to think
 about cases where the backend does at least some useful work in between,
 and that means actually *populating* those caches (to some extent) not
 just initializing them.  Maybe your wording above was chosen with that
 in mind, but I think onlookers might easily overlook the point.

I did have that in mind, but I agree the point is worth mentioning.
So, for example, it wouldn't gain anything meaningful for us to
postpone catcache initialization until someone executes a query.  It
would improve the synthetic benchmark, but that's it.

 FWIW, today I've been looking into getting rid of the silliness in
 build_index_pathkeys whereby it reconstructs pathkey opfamily OIDs
 from sortops instead of just using the index opfamilies directly.
 It turns out that once you fix that, there is no need at all for
 relcache to cache per-index operator data (the rd_operator arrays)
 because that's the only code that uses 'em.  I don't see any particular
 change in the runtime of the regression tests from ripping out that
 part of the cached data, but it ought to have at least some beneficial
 effect on real startup time.

Wow, that's great.  The fact that it simplifies the code is probably
the main point, but obviously whatever cycles we can save during
startup (and ongoing operation) are all to the good.

One possible way to get a real speedup here would be to look for ways
to trim the number of catcaches.  But I'm not too convinced there's
much water to squeeze out of that rock.  After our recent conversation
about KNNGIST, it occurred to me to wonder whether there's really any
point in pretending that a user can usefully add an AM, both due to
hard-wired planner knowledge and due to lack of any sort of extensible
XLOG support.  If not, we could potentially turn pg_am into a
hardcoded lookup table rather than a modifiable catalog, which would
also likely be faster; and perhaps reference AMs elsewhere with
characters rather than OIDs.  But even if this were judged a sensible
thing to do I'm not very sure that even a purpose-built synthetic
benchmark would be able to measure the speedup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 After our recent conversation
 about KNNGIST, it occurred to me to wonder whether there's really any
 point in pretending that a user can usefully add an AM, both due to
 hard-wired planner knowledge and due to lack of any sort of extensible
 XLOG support.  If not, we could potentially turn pg_am into a
 hardcoded lookup table rather than a modifiable catalog, which would
 also likely be faster; and perhaps reference AMs elsewhere with
 characters rather than OIDs.  But even if this were judged a sensible
 thing to do I'm not very sure that even a purpose-built synthetic
 benchmark would be able to measure the speedup.

Well, the lack of extensible XLOG support is definitely a big handicap
to building a *production* index AM as an add-on.  But it's not such a
handicap for development.  And I don't believe that the planner is
hardwired in any way that doesn't allow new index types.  GIST and GIN
have both been added successfully without kluging the planner.  It does
know a lot more about btree than other index types, but that doesn't
mean you can't add a new index type that doesn't behave like btree;
that's more reflective of where development effort has been spent.

So I would consider the above idea a step backwards, and I doubt it
would save anything meaningful anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Robert Haas
On Sun, Nov 28, 2010 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 After our recent conversation
 about KNNGIST, it occurred to me to wonder whether there's really any
 point in pretending that a user can usefully add an AM, both due to
 hard-wired planner knowledge and due to lack of any sort of extensible
 XLOG support.  If not, we could potentially turn pg_am into a
 hardcoded lookup table rather than a modifiable catalog, which would
 also likely be faster; and perhaps reference AMs elsewhere with
 characters rather than OIDs.  But even if this were judged a sensible
 thing to do I'm not very sure that even a purpose-built synthetic
 benchmark would be able to measure the speedup.

 Well, the lack of extensible XLOG support is definitely a big handicap
 to building a *production* index AM as an add-on.  But it's not such a
 handicap for development.

Realistically, it's hard for me to imagine that anyone would go to the
trouble of building it as a loadable module first and then converting
it to part of core later on.  That'd hardly be less work.

 And I don't believe that the planner is
 hardwired in any way that doesn't allow new index types.  GIST and GIN
 have both been added successfully without kluging the planner.

We have 9 boolean flags to indicate the capabilities (or lack thereof)
of AMs, and we only have 4 AMs.  It seems altogether plausible to
assume that the next AM we add could require flags 10 and 11.  Heck, I
think KNNGIST is going to require another flag... which will likely
never be set for any AM other than GIST.

 It does
 know a lot more about btree than other index types, but that doesn't
 mean you can't add a new index type that doesn't behave like btree;
 that's more reflective of where development effort has been spent.

 So I would consider the above idea a step backwards, and I doubt it
 would save anything meaningful anyway.

That latter point, as far as I'm concerned, is the real nail in the coffin.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One possible way to get a real speedup here would be to look for ways
 to trim the number of catcaches.

BTW, it's not going to help to remove catcaches that have a small
initial size, as the pg_am cache certainly does.  If the bucket zeroing
cost is really something to minimize, it's only the caches with the
largest nbuckets counts that are worth considering --- and we certainly
can't remove those without penalty.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
BTW, this might be premature to mention pending some tests about mapping
versus zeroing overhead, but it strikes me that there's more than one
way to skin a cat.  I still think the idea of statically allocated space
sucks.  But what if we rearranged things so that palloc0 doesn't consist
of palloc-then-memset, but rather push the zeroing responsibility down
into the allocator?  In particular, I'm imagining that palloc0 with a
sufficiently large space request --- more than a couple pages --- could
somehow arrange to get space that's guaranteed zero already.  And if the
request isn't large, zeroing it isn't where our problem is anyhow.

The most portable way to do that would be to use calloc instead of malloc,
and hope that libc is smart enough to provide freshly-mapped space.
It would be good to look and see whether glibc actually does so,
of course.  If not we might end up having to mess with sbrk for
ourselves, and I'm not sure how pleasantly that interacts with malloc.

Another question that would be worth asking here is whether the
hand-baked MemSet macro still outruns memset on modern architectures.
I think it's been quite a few years since that was last tested.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Greg Stark
On Mon, Nov 29, 2010 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The most portable way to do that would be to use calloc instead of malloc,
 and hope that libc is smart enough to provide freshly-mapped space.
 It would be good to look and see whether glibc actually does so,
 of course.  If not we might end up having to mess with sbrk for
 ourselves, and I'm not sure how pleasantly that interacts with malloc.

It's *supposed* to interact fine. The only thing I wonder is that I
think malloc intentionally uses mmap for larger allocations but I'm
not clear what the advantages are. Is it because it's a cheaper way to
get zeroed bytes? Or just so that free has a hope of returning the
allocations to the OS?


 Another question that would be worth asking here is whether the
 hand-baked MemSet macro still outruns memset on modern architectures.
 I think it's been quite a few years since that was last tested.

I know glibc has some sexy memset macros for cases where the size is a
constant. I'm not sure there's been much of an advance in the general
case though. This would tend to imply we should consider going the
other direction of having the caller of palloc0 do the zeroing
instead. Or making palloc0 a macro which expands to include calling
memset with the parameter inlined.
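
Something like this, say (a sketch only, and it assumes GCC statement
expressions; the point is just that a compile-time-constant size lets the
compiler expand the memset inline):

#define palloc0_inline(sz) \
	({ \
		void	   *ptr_ = palloc(sz); \
\
		/* constant (sz) => compiler can inline/unroll the zeroing */ \
		memset(ptr_, 0, (sz)); \
		ptr_; \
	})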


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Mon, Nov 29, 2010 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another question that would be worth asking here is whether the
 hand-baked MemSet macro still outruns memset on modern architectures.
 I think it's been quite a few years since that was last tested.

 I know glibc has some sexy memset macros for cases where the size is a
 constant. I'm not sure there's been much of an advance in the general
 case though. This would tend to imply we should consider going the
 other direction of having the caller of palloc0 do the zeroing
 instead. Or making palloc0 a macro which expands to include calling
 memset with the parameter inlined.

Well, that was exactly the reason why we did it the way we do it.
However, I think it's probably only node allocations where the size
is likely to be constant and hence result in a win.  Perhaps we should
implement makeNode() differently from the general case.
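
For instance (a sketch; the real makeNode() lives in nodes/nodes.h, and
this variant is purely illustrative):

#define makeNodeZeroed(_type_) \
	((_type_ *) memset(palloc(sizeof(_type_)), 0, sizeof(_type_)))

Since sizeof(_type_) is a compile-time constant, the constant-size memset
machinery Greg mentions can kick in here, unlike in the general palloc0
path where the length is a runtime value.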

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Robert Haas
On Sun, Nov 28, 2010 at 7:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 One possible way to get a real speedup here would be to look for ways
 to trim the number of catcaches.

 BTW, it's not going to help to remove catcaches that have a small
 initial size, as the pg_am cache certainly does.  If the bucket zeroing
 cost is really something to minimize, it's only the caches with the
 largest nbuckets counts that are worth considering --- and we certainly
 can't remove those without penalty.

Yeah, very true.  What's a bit frustrating about the whole thing is
that we spend a lot of time pulling data into the caches that's
basically static and never likely to change anywhere, ever.  I bet the
number of people for whom (int4, int4) has any non-standard
properties is somewhere between slim and none; and it might well be
the case that formrdesc() is faster than reading the relcache init
file, if we didn't need to worry about deviation from canonical.  This
is even more frustrating in the hypothetical situation where a backend
can switch databases, because we have to blow away all of these cache
entries that are 99.9% likely to be basically identical in the old and
new databases.

The relation descriptors for pg_class and pg_attribute are examples of
things it would be nice to hardwire and never need to update.  We are
really pretty much screwed if there is any meaningful deviation from
what is expected, but relpages, reltuples, and relfrozenxid - and
maybe relacl or reloptions - can legitimately vary between databases.

Maybe we could speed things up a bit if we got rid of the pg_attribute
entries for the system attributes (except OID).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Yeah, very true.  What's a bit frustrating about the whole thing is
 that we spend a lot of time pulling data into the caches that's
 basically static and never likely to change anywhere, ever.

True.  I wonder if we could do something like the relcache init file
for the catcaches.

 Maybe we could speed things up a bit if we got rid of the pg_attribute
 entries for the system attributes (except OID).

I used to have high hopes for that idea, but the column privileges
patch broke it permanently.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-27 Thread Bruce Momjian
Robert Haas wrote:
  In fact, it wouldn't be that hard to relax the known at compile time
  constraint either.  We could just declare:
 
  char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];
 
  ...and then peel off chunks.
  Won't this just cause loads of additional pagefaults after fork() when those
  pages are used the first time and then a second time when first written to
  (to copy it)?
 
 Aren't we incurring those page faults anyway, for whatever memory
 palloc is handing out?  The heap is no different from bss; we just
 move the pointer with sbrk().

Here is perhaps more detail than you wanted, but ...

Basically in a forked process, the text/program is fixed, and the
initialized data and stack are copy on write (COW).  Allocating a big
block of zero memory in data is uninitialized data, and the behavior there
differs depending on whether the parent process faulted in those pages. 
If it did, then they are COW, but if it did not, odds are the OS just
gives them to you clean and not shared.  The pages have to be empty
because if it gave you anything else it could be giving you data from
another process.  For details, see
http://docs.hp.com/en/5965-4641/ch01s11.html, "Faulting In a Page of
Stack or Uninitialized Data".

As for sbrk(), those pages are zero-filled also, again for security
reasons.  You have to clear malloc()'ed memory (or call calloc()) not
because the OS gave you dirty pages but because you might be using
memory that you previously freed.  If you have never freed memory (and
the postmaster/parent has not either), I bet all malloc'ed memory would
be zero-filled.

Not sure that information moves us forward.  If the postmaster cleared
the memory, we would have COW in the child and probably be even slower.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-27 Thread Robert Haas
On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote:
 Not sure that information moves us forward.  If the postmaster cleared
 the memory, we would have COW in the child and probably be even slower.

Well, we can determine the answers to these questions empirically.  I
think some more scrutiny of the code with the points you and Andres
and Tom have raised is probably in order, and probably some more
benchmarking, too.  I haven't had a chance to do that yet, however.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 2:10 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Anything we can do about this?  That's a lot of overhead, and it'd be
 a lot worse on a big machine with 8GB of shared_buffers.

 Micro-optimizing that search for the non-zero value helps a little bit
 (attached). Reduces the percentage shown by oprofile from about 16% to 12%
 on my laptop.

 For bigger gains,

The first optimization that occurred to me was remove the loop
altogether.  I could maybe see needing to do something like this if
we're recovering from an error, but why do we need to do this (except
perhaps to fail an assertion) if we're exiting cleanly?  Even a
session-lifetime buffer-pin leak would be quite disastrous, one would
think.

 Now, the other question is if this really matters. Even if we eliminate that
 loop in AtProcExit_Buffers altogether, is connect/disconnect still so
 slow that you have to use a connection pooler if you do that a lot?

Oh, I'm sure this isn't going to be nearly enough to fix that problem,
but every little bit helps; and if we never do the first optimization,
we'll never get to #30 or wherever it is that it really starts to move
the needle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Nov 24, 2010 at 2:10 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Micro-optimizing that search for the non-zero value helps a little bit
 (attached). Reduces the percentage shown by oprofile from about 16% to 12%
 on my laptop.

That micro-optimization looks to me like your compiler leaves
something to be desired.

 The first optimization that occurred to me was remove the loop
 altogether.

Or make it execute only in assert-enabled mode, perhaps.

This check had some use back in the bad old days, but the ResourceOwner
mechanism has probably removed a lot of the argument for it.

The counter-argument might be that failing to remove a buffer pin would
be disastrous; but I can't see that it'd be worse than failing to remove
an LWLock, and we have no belt-and-suspenders-too loop for those.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The first optimization that occurred to me was remove the loop
 altogether.

 Or make it execute only in assert-enabled mode, perhaps.

 This check had some use back in the bad old days, but the ResourceOwner
 mechanism has probably removed a lot of the argument for it.

Yeah, that's what I was thinking - this code would have been a good
backstop when our cleanup mechanisms were not as robust as they seem
to be today.  But making the check execute only in assert-enabled mode
doesn't seem right, since the check actually acts to mask other coding
errors, rather than reveal them.  Maybe we replace the check with one
that only occurs in an Assert-enabled build and just loops through and
does Assert(PrivateRefCount[i] == 0).  I'm not sure exactly where this
gets called in the shutdown sequence, though - is it sensible to
Assert() here?
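
Something along these lines, I mean (a sketch of the refcount-scan part
only, not the attached patch; NBuffers and PrivateRefCount as in bufmgr.c):

static void
AtProcExit_Buffers(int code, Datum arg)
{
#ifdef USE_ASSERT_CHECKING
	int			i;

	/* Assert we released all buffer pins; skip the scan in non-assert builds */
	for (i = 0; i < NBuffers; i++)
		Assert(PrivateRefCount[i] == 0);
#endif
}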

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Nov 24, 2010 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or make it execute only in assert-enabled mode, perhaps.

 But making the check execute only in assert-enabled mode
 doesn't seem right, since the check actually acts to mask other coding
 errors, rather than reveal them.  Maybe we replace the check with one
 that only occurs in an Assert-enabled build and just loops through and
 does Assert(PrivateRefCount[i] == 0).

Yeah, that would be sensible.  There is precedent for this elsewhere
too; I think there's a similar setup for checking buffer refcounts
during transaction cleanup.

 I'm not sure exactly where this
 gets called in the shutdown sequence, though - is it sensible to
 Assert() here?

Assert is sensible anywhere.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 11:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Nov 24, 2010 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or make it execute only in assert-enabled mode, perhaps.

 But making the check execute only in assert-enabled mode
 doesn't seem right, since the check actually acts to mask other coding
 errors, rather than reveal them.  Maybe we replace the check with one
 that only occurs in an Assert-enabled build and just loops through and
 does Assert(PrivateRefCount[i] == 0).

 Yeah, that would be sensible.  There is precedent for this elsewhere
 too; I think there's a similar setup for checking buffer refcounts
 during transaction cleanup.

 I'm not sure exactly where this
 gets called in the shutdown sequence, though - is it sensible to
 Assert() here?

 Assert is sensible anywhere.

OK, patch attached.  Here's what oprofile output looks like with this applied:

3505 10.4396  libc-2.11.2.so   memset
2051  6.1089  libc-2.11.2.so   memcpy
1686  5.0217  postgres AllocSetAlloc
1642  4.8907  postgres hash_search_with_hash_value
1247  3.7142  libc-2.11.2.so   _int_malloc
1096  3.2644  libc-2.11.2.so   fread
855   2.5466  ld-2.11.2.so do_lookup_x
723   2.1535  ld-2.11.2.so _dl_fixup
645   1.9211  ld-2.11.2.so strcmp
620   1.8467  postgres MemoryContextAllocZero

Somehow I don't think I'm going to get much further with this without
figuring out how to get oprofile to cough up a call graph.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


AtProcExit_Buffers.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 OK, patch attached.

Two comments:

1. A comment would help, something like "Assert we released all buffer pins".

2. AtProcExit_LocalBuffers should be redone the same way, for
consistency (it likely won't make any performance difference).
Note the comment for AtProcExit_LocalBuffers, too; that probably
needs to be changed along the lines of "If we missed any, and
assertions aren't enabled, we'll fail later in DropRelFileNodeBuffers
while trying to drop the temp rels".

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 1:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 OK, patch attached.

 Two comments:

Revised patch attached.

I tried configuring oprofile with --callgraph=10 and then running
oprofile with -c, but it gives kooky looking output I can't interpret.
 For example:

  6      42.8571  postgres                 record_in
  8      57.1429  postgres                 pg_perm_setlocale
17035     5.7219  libc-2.11.2.so           memcpy
  17035  100.000  libc-2.11.2.so           memcpy [self]

Not that helpful.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


AtProcExit_Buffers-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Gerhard Heift
On Wed, Nov 24, 2010 at 01:20:36PM -0500, Robert Haas wrote:
 On Wed, Nov 24, 2010 at 1:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  OK, patch attached.
 
  Two comments:
 
 Revised patch attached.
 
 I tried configuring oprofile with --callgraph=10 and then running
 oprofile with -c, but it gives kooky looking output I can't interpret.
  For example:
 
   6      42.8571  postgres                 record_in
   8      57.1429  postgres                 pg_perm_setlocale
 17035     5.7219  libc-2.11.2.so           memcpy
   17035  100.000  libc-2.11.2.so           memcpy [self]
 
 Not that helpful.  :-(

Have a look at the wiki:
http://wiki.postgresql.org/wiki/Profiling_with_OProfile#Additional_analysis

 Robert Haas

Regards,
  Gerhard Heift




Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 19:01:32 Robert Haas wrote:
 Somehow I don't think I'm going to get much further with this without
 figuring out how to get oprofile to cough up a call graph.
I think to do that sensibly you need CFLAGS="-O2 -fno-omit-frame-pointer"...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Gerhard Heift ml-postgresql-20081012-3...@gheift.de writes:
 On Wed, Nov 24, 2010 at 01:20:36PM -0500, Robert Haas wrote:
 I tried configuring oprofile with --callgraph=10 and then running
 oprofile with -c, but it gives kooky looking output I can't interpret.

 Have a look at the wiki:
 http://wiki.postgresql.org/wiki/Profiling_with_OProfile#Additional_analysis

The critical piece of information is there now, but it wasn't a minute
ago.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Revised patch attached.

The asserts in AtProcExit_LocalBuffers are a bit pointless since
you forgot to remove the code that forcibly zeroes LocalRefCount[]...
otherwise +1.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Full results, and call graph, attached.  The first obvious fact is
 that most of the memset overhead appears to be coming from
 InitCatCache.

AFAICT that must be the palloc0 calls that are zeroing out (mostly)
the hash bucket headers.  I don't see any real way to make that cheaper
other than to cut the initial sizes of the hash tables (and add support
for expanding them later, which is lacking in catcache ATM).  Not
convinced that that creates any net savings --- it might just save
some cycles at startup in exchange for more cycles later, in typical
backend usage.

Making those hashtables expansible wouldn't be a bad thing in itself,
mind you.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 3:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Full results, and call graph, attached.  The first obvious fact is
 that most of the memset overhead appears to be coming from
 InitCatCache.

 AFAICT that must be the palloc0 calls that are zeroing out (mostly)
 the hash bucket headers.  I don't see any real way to make that cheaper
 other than to cut the initial sizes of the hash tables (and add support
 for expanding them later, which is lacking in catcache ATM).  Not
 convinced that that creates any net savings --- it might just save
 some cycles at startup in exchange for more cycles later, in typical
 backend usage.

 Making those hashtables expansible wouldn't be a bad thing in itself,
 mind you.

The idea I had was to go the other way and say, hey, if these hash
tables can't be expanded anyway, let's put them on the BSS instead of
heap-allocating them.  Any new pages we request from the OS will be
zeroed anyway, but with palloc we then have to re-zero the allocated
block because palloc can return memory that's been used,
freed, and reused.  However, for anything that only needs to be
allocated once and never freed, and whose size can be known at compile
time, that's not an issue.

In fact, it wouldn't be that hard to relax the known at compile time
constraint either.  We could just declare:

char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];

...and then peel off chunks.
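
Peeling off chunks could look something like this (a sketch; the fallback
behavior is an assumption, and MAXALIGN is the usual alignment macro):

static char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];	/* BSS, zeroed by kernel */
static size_t lotsa_zero_used = 0;

static void *
alloc_prezeroed(size_t size)
{
	void	   *result;

	size = MAXALIGN(size);		/* keep returned chunks maxaligned */
	if (lotsa_zero_used + size > sizeof(lotsa_zero_bytes))
		return NULL;			/* caller falls back to palloc0 */
	result = lotsa_zero_bytes + lotsa_zero_used;
	lotsa_zero_used += size;
	return result;
}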

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 21:47:32 Robert Haas wrote:
 On Wed, Nov 24, 2010 at 3:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Full results, and call graph, attached.  The first obvious fact is
  that most of the memset overhead appears to be coming from
  InitCatCache.
  
  AFAICT that must be the palloc0 calls that are zeroing out (mostly)
  the hash bucket headers.  I don't see any real way to make that cheaper
  other than to cut the initial sizes of the hash tables (and add support
  for expanding them later, which is lacking in catcache ATM).  Not
  convinced that that creates any net savings --- it might just save
  some cycles at startup in exchange for more cycles later, in typical
  backend usage.
  
  Making those hashtables expansible wouldn't be a bad thing in itself,
  mind you.
 
 The idea I had was to go the other way and say, hey, if these hash
 tables can't be expanded anyway, let's put them on the BSS instead of
 heap-allocating them.  Any new pages we request from the OS will be
 zeroed anyway, but with palloc we then have to re-zero the allocated
 block anyway because palloc can return a memory that's been used,
 freed, and reused.  However, for anything that only needs to be
 allocated once and never freed, and whose size can be known at compile
 time, that's not an issue.
 
 In fact, it wouldn't be that hard to relax the known at compile time
 constraint either.  We could just declare:
 
 char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];
 
 ...and then peel off chunks.
Won't this just cause loads of additional pagefaults after fork() when those 
pages are used the first time and then a second time when first written to (to 
copy it)?

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 3:53 PM, Andres Freund and...@anarazel.de wrote:
 On Wednesday 24 November 2010 21:47:32 Robert Haas wrote:
 On Wed, Nov 24, 2010 at 3:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Full results, and call graph, attached.  The first obvious fact is
  that most of the memset overhead appears to be coming from
  InitCatCache.
 
  AFAICT that must be the palloc0 calls that are zeroing out (mostly)
  the hash bucket headers.  I don't see any real way to make that cheaper
  other than to cut the initial sizes of the hash tables (and add support
  for expanding them later, which is lacking in catcache ATM).  Not
  convinced that that creates any net savings --- it might just save
  some cycles at startup in exchange for more cycles later, in typical
  backend usage.
 
  Making those hashtables expansible wouldn't be a bad thing in itself,
  mind you.

 The idea I had was to go the other way and say, hey, if these hash
 tables can't be expanded anyway, let's put them on the BSS instead of
 heap-allocating them.  Any new pages we request from the OS will be
 zeroed anyway, but with palloc we then have to re-zero the allocated
  block because palloc can return memory that's been used,
 freed, and reused.  However, for anything that only needs to be
 allocated once and never freed, and whose size can be known at compile
 time, that's not an issue.

 In fact, it wouldn't be that hard to relax the known at compile time
 constraint either.  We could just declare:

 char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];

 ...and then peel off chunks.
 Won't this just cause loads of additional pagefaults after fork() when those
 pages are used the first time and then a second time when first written to (to
 copy it)?

Aren't we incurring those page faults anyway, for whatever memory
palloc is handing out?  The heap is no different from bss; we just
move the pointer with sbrk().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Nov 24, 2010 at 3:53 PM, Andres Freund and...@anarazel.de wrote:
 The idea I had was to go the other way and say, hey, if these hash
 tables can't be expanded anyway, let's put them on the BSS instead of
 heap-allocating them.

 Won't this just cause loads of additional pagefaults after fork() when those
 pages are used the first time and then a second time when first written to 
 (to
 copy it)?

 Aren't we incurring those page faults anyway, for whatever memory
 palloc is handing out?  The heap is no different from bss; we just
 move the pointer with sbrk().

I think you're missing the real point, which that the cost you're
measuring here probably isn't so much memset() as faulting in large
chunks of address space.  Avoiding the explicit memset() likely will
save little in real runtime --- it'll just make sure the initial-touch
costs are more distributed and harder to measure.  But in any case I
think this idea is a nonstarter because it gets in the way of making
those hashtables expansible, which we *do* need to do eventually.

(You might be able to confirm or disprove this theory if you ask
oprofile to count memory access stalls instead of CPU clock cycles...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 21:54:53 Robert Haas wrote:
 On Wed, Nov 24, 2010 at 3:53 PM, Andres Freund and...@anarazel.de wrote:
  On Wednesday 24 November 2010 21:47:32 Robert Haas wrote:
  On Wed, Nov 24, 2010 at 3:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   Robert Haas robertmh...@gmail.com writes:
   Full results, and call graph, attached.  The first obvious fact is
   that most of the memset overhead appears to be coming from
   InitCatCache.
   
   AFAICT that must be the palloc0 calls that are zeroing out (mostly)
   the hash bucket headers.  I don't see any real way to make that
   cheaper other than to cut the initial sizes of the hash tables (and
   add support for expanding them later, which is lacking in catcache
   ATM).  Not convinced that that creates any net savings --- it might
   just save some cycles at startup in exchange for more cycles later,
   in typical backend usage.
   
   Making those hashtables expansible wouldn't be a bad thing in itself,
   mind you.
  
  The idea I had was to go the other way and say, hey, if these hash
  tables can't be expanded anyway, let's put them on the BSS instead of
  heap-allocating them.  Any new pages we request from the OS will be
  zeroed anyway, but with palloc we then have to re-zero the allocated
   block because palloc can return memory that's been used,
  freed, and reused.  However, for anything that only needs to be
  allocated once and never freed, and whose size can be known at compile
  time, that's not an issue.
  
  In fact, it wouldn't be that hard to relax the known at compile time
  constraint either.  We could just declare:
  
  char lotsa_zero_bytes[NUM_ZERO_BYTES_WE_NEED];
  
  ...and then peel off chunks.
  
  Won't this just cause loads of additional pagefaults after fork() when
  those pages are used the first time and then a second time when first
  written to (to copy it)?
 
 Aren't we incurring those page faults anyway, for whatever memory
 palloc is handing out?  The heap is no different from bss; we just
 move the pointer with sbrk().
Yes, but only once. Also scrubbing a page is faster than copying it... (and 
there were patches floating around to do that in advance, not sure if they got 
integrated into mainline Linux)

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Nov 24, 2010, at 4:05 PM, Andres Freund and...@anarazel.de wrote:
 
 Won't this just cause loads of additional pagefaults after fork() when
 those pages are used the first time and then a second time when first
 written to (to copy it)?
 
 Aren't we incurring those page faults anyway, for whatever memory
 palloc is handing out?  The heap is no different from bss; we just
 move the pointer with sbrk().
 Yes, but only once. Also scrubbing a page is faster than copying it... (and 
 there were patches floating around to do that in advance, not sure if they
 got integrated into mainline Linux)

I'm not following - can you elaborate?

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Nov 24, 2010, at 4:05 PM, Andres Freund and...@anarazel.de wrote:
 Yes, but only once. Also scrubbing a page is faster than copying it... (and 
 there were patches floating around to do that in advance, not sure if they 
 got integrated into mainline linux)

 I'm not following - can you elaborate?

I think Andres is saying that bss space isn't optimized during a fork
operation: it'll be propagated to the child as copy-on-write pages.
Dunno if that's true or not, but if it is, it'd be a good reason to
avoid the scheme you're suggesting.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 22:18:08 Robert Haas wrote:
 On Nov 24, 2010, at 4:05 PM, Andres Freund and...@anarazel.de wrote:
  Won't this just cause loads of additional pagefaults after fork() when
  those pages are used the first time and then a second time when first
  written to (to copy it)?
  
  Aren't we incurring those page faults anyway, for whatever memory
  palloc is handing out?  The heap is no different from bss; we just
  move the pointer with sbrk().
  
  Yes, but only once. Also scrubbing a page is faster than copying it...
  (and there were patches floating around to do that in advance, not sure
  if they got integrated into mainline linux)
 I'm not following - can you elaborate?
When forking, the memory mapping of the process is copied - the actual pages 
are not. When a page is first accessed, the page fault handler will set up a 
mapping to the old page and mark it as shared. When it is later written to, it 
will fault again and copy the page.

In contrast, if you access a page for the first time after an sbrk (or mmap, 
doesn't matter), a new page will get scrubbed and a mapping will get set up.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
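
A small standalone program (a sketch, not PostgreSQL code) can make the
two-fault sequence Andres describes visible:

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARENA	(64 * 1024 * 1024)
#define PAGE	4096

int
main(void)
{
	char	   *mem = malloc(ARENA);
	pid_t		pid;

	/* Dirty every page in the parent so they all exist before fork(). */
	memset(mem, 1, ARENA);

	pid = fork();
	if (pid == 0)
	{
		volatile char sink = 0;
		size_t		i;

		/* Fault #1 per page: map the parent's page into the child,
		 * read-only and marked copy-on-write. */
		for (i = 0; i < ARENA; i += PAGE)
			sink += mem[i];
		/* Fault #2 per page: the first write copies the page. */
		for (i = 0; i < ARENA; i += PAGE)
			mem[i] = 2;
		(void) sink;
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	free(mem);
	return 0;
}

Sampling getrusage()'s ru_minflt in the child before and after each loop
should show roughly one minor fault per page for the read pass and another
for the write pass.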


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 22:25:45 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Nov 24, 2010, at 4:05 PM, Andres Freund and...@anarazel.de wrote:
  Yes, but only once. Also scrubbing a page is faster than copying it...
  (and there were patches floating around to do that in advance, not sure
  if they got integrated into mainline linux)
  
  I'm not following - can you elaborate?
 
 I think Andres is saying that bss space isn't optimized during a fork
 operation: it'll be propagated to the child as copy-on-write pages.
 Dunno if that's true or not, but if it is, it'd be a good reason to
 avoid the scheme you're suggesting.
Afair nearly all pages are propagated with copy-on-write semantics.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (You might be able to confirm or disprove this theory if you ask
 oprofile to count memory access stalls instead of CPU clock cycles...)

I don't see an event for that.

# opcontrol --list-events | grep STALL
INSTRUCTION_FETCH_STALL: (counter: all)
DISPATCH_STALLS: (counter: all)
DISPATCH_STALL_FOR_BRANCH_ABORT: (counter: all)
DISPATCH_STALL_FOR_SERIALIZATION: (counter: all)
DISPATCH_STALL_FOR_SEGMENT_LOAD: (counter: all)
DISPATCH_STALL_FOR_REORDER_BUFFER_FULL: (counter: all)
DISPATCH_STALL_FOR_RESERVATION_STATION_FULL: (counter: all)
DISPATCH_STALL_FOR_FPU_FULL: (counter: all)
DISPATCH_STALL_FOR_LS_FULL: (counter: all)
DISPATCH_STALL_WAITING_FOR_ALL_QUIET: (counter: all)
DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RESYNC: (counter: all)

# opcontrol --list-events | grep MEMORY
MEMORY_REQUESTS: (counter: all)
MEMORY_CONTROLLER_PAGE_TABLE_OVERFLOWS: (counter: all)
MEMORY_CONTROLLER_SLOT_MISSED: (counter: all)
MEMORY_CONTROLLER_TURNAROUNDS: (counter: all)
MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION: (counter: all)
CPU_IO_REQUESTS_TO_MEMORY_IO: (counter: all)
MEMORY_CONTROLLER_REQUESTS: (counter: all)

Ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Nov 24, 2010 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (You might be able to confirm or disprove this theory if you ask
 oprofile to count memory access stalls instead of CPU clock cycles...)

 I don't see an event for that.

You probably want something involving cache misses.  The event names
vary depending on just which CPU you've got.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Andres Freund
On Wednesday 24 November 2010 23:03:48 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Nov 24, 2010 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  (You might be able to confirm or disprove this theory if you ask
  oprofile to count memory access stalls instead of CPU clock cycles...)
  
  I don't see an event for that.
 
 You probably want something involving cache misses.  The event names
 vary depending on just which CPU you've got.
Or some BUS OUTSTANDING event.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 5:15 PM, Andres Freund and...@anarazel.de wrote:
 On Wednesday 24 November 2010 23:03:48 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Nov 24, 2010 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  (You might be able to confirm or disprove this theory if you ask
  oprofile to count memory access stalls instead of CPU clock cycles...)
 
  I don't see an event for that.

 You probably want something involving cache misses.  The event names
 vary depending on just which CPU you've got.
 Or some BUS OUTSTANDING event.

I don't see anything for BUS OUTSTANDING.  For CACHE and MISS I have
some options:

# opcontrol --list-events | grep CACHE
DATA_CACHE_ACCESSES: (counter: all)
DATA_CACHE_MISSES: (counter: all)
DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE: (counter: all)
DATA_CACHE_REFILLS_FROM_NORTHBRIDGE: (counter: all)
DATA_CACHE_LINES_EVICTED: (counter: all)
LOCKED_INSTRUCTIONS_DCACHE_MISSES: (counter: all)
L2_CACHE_MISS: (counter: all)
L2_CACHE_FILL_WRITEBACK: (counter: all)
INSTRUCTION_CACHE_FETCHES: (counter: all)
INSTRUCTION_CACHE_MISSES: (counter: all)
INSTRUCTION_CACHE_REFILLS_FROM_L2: (counter: all)
INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM: (counter: all)
INSTRUCTION_CACHE_VICTIMS: (counter: all)
INSTRUCTION_CACHE_INVALIDATED: (counter: all)
CACHE_BLOCK_COMMANDS: (counter: all)
READ_REQUEST_L3_CACHE: (counter: all)
L3_CACHE_MISSES: (counter: all)
IBS_FETCH_ICACHE_MISSES: (ext: ibs_fetch)
IBS_FETCH_ICACHE_HITS: (ext: ibs_fetch)
IBS_OP_DATA_CACHE_MISS: (ext: ibs_op)
IBS_OP_NB_LOCAL_CACHE: (ext: ibs_op)
IBS_OP_NB_REMOTE_CACHE: (ext: ibs_op)
IBS_OP_NB_CACHE_MODIFIED: (ext: ibs_op)
IBS_OP_NB_CACHE_OWNED: (ext: ibs_op)
IBS_OP_NB_LOCAL_CACHE_LAT: (ext: ibs_op)
IBS_OP_NB_REMOTE_CACHE_LAT: (ext: ibs_op)

# opcontrol --list-events | grep MISS | grep -v CACHE
L1_DTLB_MISS_AND_L2_DTLB_HIT: (counter: all)
L1_DTLB_AND_L2_DTLB_MISS: (counter: all)
L1_ITLB_MISS_AND_L2_ITLB_HIT: (counter: all)
L1_ITLB_MISS_AND_L2_ITLB_MISS: (counter: all)
MEMORY_CONTROLLER_SLOT_MISSED: (counter: all)
IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_HITS: (ext: ibs_fetch)
IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_MISSES: (ext: ibs_fetch)
IBS_OP_L1_DTLB_MISS_L2_DTLB_HIT: (ext: ibs_op)
IBS_OP_L1_L2_DTLB_MISS: (ext: ibs_op)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I don't see anything for BUS OUTSTANDING.  For CACHE and MISS I have
 some options:

 DATA_CACHE_MISSES: (counter: all)
 L3_CACHE_MISSES: (counter: all)

Those two look promising, though I can't claim to be an expert.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-24 Thread Robert Haas
On Wed, Nov 24, 2010 at 5:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I don't see anything for BUS OUTSTANDING.  For CACHE and MISS I have
 some options:

 DATA_CACHE_MISSES: (counter: all)
 L3_CACHE_MISSES: (counter: all)

 Those two look promising, though I can't claim to be an expert.

OK.  Thanksgiving is about to interfere with my access to this
machine, but I'll pick this up next week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-23 Thread Robert Haas
On Wed, Nov 24, 2010 at 12:07 AM, Robert Haas robertmh...@gmail.com wrote:
 Per previous threats, I spent some time tonight running oprofile
 (using the directions Tom Lane was foolish enough to provide me back
 in May).  I took testlibpq.c and hacked it up to just connect to the
 server and then disconnect in a tight loop without doing anything
 useful, hoping to measure the overhead of starting up a new
 connection.

Oh, right: attachments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
CPU: AMD64 family10, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 10
samples  %        image name               symbol name
120899   18.0616  postgres AtProcExit_Buffers
56891 8.4992  libc-2.11.2.so   memset
30987 4.6293  libc-2.11.2.so   memcpy
26944 4.0253  postgres hash_search_with_hash_value
26554 3.9670  postgres AllocSetAlloc
20407 3.0487  libc-2.11.2.so   _int_malloc
17269 2.5799  libc-2.11.2.so   fread
13005 1.9429  ld-2.11.2.so do_lookup_x
11850 1.7703  ld-2.11.2.so _dl_fixup
10194 1.5229  libc-2.11.2.so   _IO_file_xsgetn
10087 1.5069  postgres MemoryContextAllocZero
9143  1.3659  ld-2.11.2.so strcmp
8957  1.3381  postgres load_relcache_init_file
7132  1.0655  postgres fmgr_info_cxt_security
5630  0.8411  libc-2.11.2.so   vfprintf
5029  0.7513  ld-2.11.2.so _dl_lookup_symbol_x
4997  0.7465  postgres _bt_getroot
3935  0.5879  libc-2.11.2.so   memcmp
3874  0.5788  postgres hash_seq_search
3718  0.5554  postgres LWLockAcquire
3666  0.5477  postgres guc_name_compare
3457  0.5165  libc-2.11.2.so   __strlen_sse2
3297  0.4926  postgres load_relmap_file
3175  0.4743  libc-2.11.2.so   malloc
3170  0.4736  postgres LockAcquireExtended
3139  0.4689  postgres hash_any
3113  0.4651  postgres MemoryContextAlloc
2946  0.4401  postgres _bt_compare
2936  0.4386  postgres index_getnext
2885  0.4310  ld-2.11.2.so _dl_sort_fini
2873  0.4292  libc-2.11.2.so   _int_free
2619  0.3913  postgres MemoryContextCreate
2579  0.3853  ld-2.11.2.so check_match.12146
2485  0.3712  postgres heap_page_prune_opt
2457  0.3671  postgres LWLockRelease
2438  0.3642  postgres CreateTemplateTupleDesc
2322  0.3469  ld-2.11.2.so _dl_fini
2301  0.3438  postgres set_config_option
2253  0.3366  postgres _bt_first
2225  0.3324  postgres PinBuffer
2140  0.3197  postgres BeginReportingGUCOptions
2091  0.3124  postgres _bt_preprocess_keys
2085  0.3115  libc-2.11.2.so   _IO_vfscanf
2051  0.3064  postgres element_alloc
1962  0.2931  postgres ServerLoop
1936  0.2892  postgres CreateTupleDescCopyConstr
1884  0.2815  libc-2.11.2.so   __strcpy_sse2
1846  0.2758  libkrb5.so.3.3   /lib64/libkrb5.so.3.3
1801  0.2691  postgres FunctionCall2
1797  0.2685  postgres hash_create
1782  0.2662  postgres PgstatCollectorMain
1761  0.2631  postgres _bt_checkpage
1728  0.2582  postgres AllocSetFree
1597  0.2386  libselinux.so.1  /lib64/libselinux.so.1
1579  0.2359  libc-2.11.2.so   _IO_default_xsputn
1543  0.2305  libc-2.11.2.so   free
1531  0.2287  postgres SearchCatCache
1528  0.2283  postgres BuildHardcodedDescriptor
1506  0.2250  libc-2.11.2.so   strchrnul
1475  0.2204  postgres _bt_checkkeys
1457  0.2177  postgres ResourceOwnerForgetRelationRef
1451  0.2168  ld-2.11.2.so _dl_runtime_resolve
1443  0.2156  postgres InitCatCache
1443  0.2156  postgres hash_search
1382  0.2065  ld-2.11.2.so _dl_name_match_p
1360  0.2032  postgres PostgresMain
1347  0.2012  postgres pgstat_report_stat
1342  0.2005  libssl.so.1.0.0b /usr/lib64/libssl.so.1.0.0b
1340  0.2002  postgres systable_beginscan
1311  0.1959  libgssapi_krb5.so.2.2/lib64/libgssapi_krb5.so.2.2
1254  0.1873  postgres errstart
1247  0.1863  libc-2.11.2.so   __strncmp_sse2
1245

Re: [HACKERS] profiling connection overhead

2010-11-23 Thread Heikki Linnakangas

On 24.11.2010 07:07, Robert Haas wrote:

Per previous threats, I spent some time tonight running oprofile
(using the directions Tom Lane was foolish enough to provide me back
in May).  I took testlibpq.c and hacked it up to just connect to the
server and then disconnect in a tight loop without doing anything
useful, hoping to measure the overhead of starting up a new
connection.  Ha, ha, funny about that:

120899   18.0616  postgres AtProcExit_Buffers
56891 8.4992  libc-2.11.2.so   memset
30987 4.6293  libc-2.11.2.so   memcpy
26944 4.0253  postgres hash_search_with_hash_value
26554 3.9670  postgres AllocSetAlloc
20407 3.0487  libc-2.11.2.so   _int_malloc
17269 2.5799  libc-2.11.2.so   fread
13005 1.9429  ld-2.11.2.so do_lookup_x
11850 1.7703  ld-2.11.2.so _dl_fixup
10194 1.5229  libc-2.11.2.so   _IO_file_xsgetn

In English: the #1 overhead here is actually something that happens
when processes EXIT, not when they start.  Essentially all the time is
in two lines:

  56920  6.6006 :for (i = 0; i < NBuffers; i++)
:{
  98745 11.4507 :if (PrivateRefCount[i] != 0)


Oh, that's quite surprising.


Anything we can do about this?  That's a lot of overhead, and it'd be
a lot worse on a big machine with 8GB of shared_buffers.


Micro-optimizing that search for the non-zero value helps a little bit 
(attached). Reduces the percentage shown by oprofile from about 16% to 
12% on my laptop.


For bigger gains, I think you need to somehow make the PrivateRefCount 
smaller. Perhaps only use one byte for each buffer instead of int32, and 
use some sort of an overflow list for the rare case that a buffer is 
pinned more than 255 times. Or make it a hash table instead of a simple 
lookup array. But whatever you do, you have to be very careful to not 
add overhead to PinBuffer/UnPinBuffer, those can already be quite high 
in oprofile reports of real applications. It might be worth 
experimenting a bit; at the moment PrivateRefCount takes up 5MB of 
memory per 1GB of shared_buffers. Machines with a high shared_buffers 
setting have no shortage of memory, but a large array like that might 
waste a lot of precious CPU cache.
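
As a rough illustration of the one-byte-plus-overflow-list shape, a sketch
follows; all names and bounds here are invented for illustration, and the
real PinBuffer/UnPinBuffer paths in bufmgr.c carry considerably more state:

#include <stdint.h>

#define NBUFFERS	16384		/* example only */
#define MAX_OVERFLOW	8		/* assumption: >255 pins is very rare */

static uint8_t PrivateRefCount8[NBUFFERS];	/* 1 byte per buffer, not 4 */

static struct
{
	int			buf;
	uint32_t	count;
}			overflow[MAX_OVERFLOW];

static void
pin_local(int buf)
{
	int			i;

	if (PrivateRefCount8[buf] < UINT8_MAX)
	{
		PrivateRefCount8[buf]++;
		return;
	}
	/* Rare path: pins beyond 255 spill to a tiny fixed list. */
	for (i = 0; i < MAX_OVERFLOW; i++)
	{
		if (overflow[i].count > 0 && overflow[i].buf == buf)
		{
			overflow[i].count++;
			return;
		}
	}
	for (i = 0; i < MAX_OVERFLOW; i++)
	{
		if (overflow[i].count == 0)
		{
			overflow[i].buf = buf;
			overflow[i].count = 1;
			return;
		}
	}
	/* A real patch would grow the list or error out here. */
}

static void
unpin_local(int buf)
{
	int			i;

	/* Drain any spilled pins before touching the byte counter. */
	for (i = 0; i < MAX_OVERFLOW; i++)
	{
		if (overflow[i].count > 0 && overflow[i].buf == buf)
		{
			overflow[i].count--;
			return;
		}
	}
	PrivateRefCount8[buf]--;
}

With one byte per buffer, the exit-time scan would touch a quarter as much
memory, and the zero test could still proceed a word at a time.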


Now, the other question is whether this really matters. Even if we eliminate 
that loop in AtProcExit_Buffers altogether, would connect/disconnect still 
be so slow that you have to use a connection pooler if you do it a lot?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..03593fd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1665,11 +1665,20 @@ static void
 AtProcExit_Buffers(int code, Datum arg)
 {
 	int			i;
+	int		   *ptr;
+	int		   *end;
 
 	AbortBufferIO();
 	UnlockBuffers();
 
-	for (i = 0; i < NBuffers; i++)
+	/* Fast search for the first non-zero entry in PrivateRefCount */
+	end = (int *) &PrivateRefCount[NBuffers - 1];
+	ptr = (int *) PrivateRefCount;
+	while (ptr < end && *ptr == 0)
+		ptr++;
+	i = ((int32 *) ptr) - PrivateRefCount;
+
+	for (; i < NBuffers; i++)
 	{
 		if (PrivateRefCount[i] != 0)
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers