Re: [HACKERS] why do we need two snapshots per query?

2011-12-19 Thread Greg Smith
This feature has now passed through review by Dimitri with him no longer 
having anything to say about it.  I've marked it ready for committer 
now.  Seems the main decision left here is whether another committer 
wants to take a look at this, or if Robert wants to take a spin on the 
buildfarm wheel by committing it himself.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-12-16 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I thought about adjusting it, but I didn't see what it made sense to
 adjust it to.  It still is the snapshot used for parameter I/O and
 parsing/planning, so the existing text isn't wrong.  It will possibly
 also get reused for execution, but the previous statement has a
 lengthy comment on that, so it didn't seem worth recapitulating here.

Ah yes, the previous comment is not far away, so it's easy to read it
that way.  Agreed.

 Actually, I did, but the change was in the second patch file attached
 to the same email, which maybe you missed?  Combined patch attached.

Oops, I missed it, yes.  Looks good to me.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] why do we need two snapshots per query?

2011-12-13 Thread Robert Haas
On Sat, Nov 26, 2011 at 2:50 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 +     /* Done with the snapshot used for parameter I/O and parsing/planning */
 +     if (snapshot_set)
 +             PopActiveSnapshot();

 This comment needs adjusting.

I thought about adjusting it, but I didn't see what it made sense to
adjust it to.  It still is the snapshot used for parameter I/O and
parsing/planning, so the existing text isn't wrong.  It will possibly
also get reused for execution, but the previous statement has a
lengthy comment on that, so it didn't seem worth recapitulating here.

 You need to be editing the comments for this function.  To be specific
 you didn't update this text:

  * The caller can optionally pass a snapshot to be used; pass InvalidSnapshot
  * for the normal behavior of setting a new snapshot.  This parameter is
  * presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
  * to be used for cursors).

Actually, I did, but the change was in the second patch file attached
to the same email, which maybe you missed?  Combined patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fewer-snapshots.patch
Description: Binary data



Re: [HACKERS] why do we need two snapshots per query?

2011-11-26 Thread Dimitri Fontaine
Hi,

Robert Haas robertmh...@gmail.com writes:
 On Sun, Nov 13, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote:
 In the -M extended case, we take a snapshot from exec_parse_message(),
 and the same two in the exec_bind_message() call that are taken in the
 -M prepared case.  So reducing the prepared case from two snapshots to
 one will reduce the extended case from three snapshots to two, thus
 saving one snapshot per query regardless of how it's executed.

I like the idea and I think it's better semantics to use the same
snapshot for planning and executing in the simple query case.
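
To make the bookkeeping in the quoted paragraph concrete, here is a small
sketch in C.  It only encodes the counts stated above for the prepared and
extended cases; the struct, the table, and the function name
snapshots_saved() are made up for illustration and are not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Snapshots taken per query, before and after the patch, as stated in the
 * thread.  Illustrative table only; nothing here comes from the backend. */
typedef struct
{
    const char *mode;       /* pgbench -M mode name */
    int         before;     /* snapshots per query, unpatched */
    int         after;      /* snapshots per query, patched */
} SnapshotCount;

static const SnapshotCount snapshot_counts[] = {
    {"prepared", 2, 1},
    {"extended", 3, 2},
};

/* Return the number of snapshots saved per query for the given mode,
 * or -1 if the mode is not in the table. */
int
snapshots_saved(const char *mode)
{
    size_t      i;

    assert(mode != NULL);
    for (i = 0; i < sizeof(snapshot_counts) / sizeof(snapshot_counts[0]); i++)
    {
        if (strcmp(snapshot_counts[i].mode, mode) == 0)
            return snapshot_counts[i].before - snapshot_counts[i].after;
    }
    return -1;
}
```

Under those assumptions, snapshots_saved("prepared") and
snapshots_saved("extended") both come out to 1, which matches the "one
snapshot per query" saving described in the quoted text.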

I didn't try to reproduce the performance benefits seen by Robert here,
nor did I try to double-check for compilation warnings etc.  I guess
that reviewing a committer's patch allows for being not as thorough :)

 + /* Done with the snapshot used for parameter I/O and parsing/planning */
 + if (snapshot_set)
 + PopActiveSnapshot();

This comment needs adjusting.

 diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
 index 466727b..c41272b 100644
 --- a/src/backend/tcop/pquery.c
 +++ b/src/backend/tcop/pquery.c
 @@ -455,7 +455,7 @@ FetchStatementTargetList(Node *stmt)
   * tupdesc (if any) is known.
   */
  void
 -PortalStart(Portal portal, ParamListInfo params, Snapshot snapshot)
 +PortalStart(Portal portal, ParamListInfo params, bool use_active_snapshot)

You need to be editing the comments for this function.  To be specific
you didn't update this text:

 * The caller can optionally pass a snapshot to be used; pass InvalidSnapshot
 * for the normal behavior of setting a new snapshot.  This parameter is
 * presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
 * to be used for cursors).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] why do we need two snapshots per query?

2011-11-14 Thread Robert Haas
On Sun, Nov 13, 2011 at 9:40 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Nov 13, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote:
 In the -M extended case, we take a snapshot from exec_parse_message(),
 and the same two in the exec_bind_message() call that are taken in the
 -M prepared case.  So reducing the prepared case from two snapshots to
 one will reduce the extended case from three snapshots to two, thus
 saving one snapshot per query regardless of how it's executed.

 And here are the revised patches.  Apply refactor-portal-start
 (unchanged) first and then just-one-snapshot-v2.

Some pgbench -S numbers (SELECT-only test) from Nate Boley's 32-core
box.   I benchmarked commit f1585362856d4da17113ba2e4ba46cf83cba0cf2,
patched and unpatched.  I set shared_buffers = 8GB,
maintenance_work_mem = 1GB, synchronous_commit = off,
checkpoint_segments = 300, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9, wal_writer_delay = 20ms.  All
numbers are median of five-minute runs.  Lines beginning with m are
unpatched master; lines beginning with s are patched; the number
immediately following is the client count.

== with -M simple ==
m01 tps = 4347.393421 (including connections establishing)
s01 tps = 4336.883587 (including connections establishing)
m08 tps = 33510.055702 (including connections establishing)
s08 tps = 33826.161862 (including connections establishing)
m32 tps = 203457.891154 (including connections establishing)
s32 tps = 218206.065239 (including connections establishing)
m80 tps = 200494.623552 (including connections establishing)
s80 tps = 219344.961016 (including connections establishing)

== with -M extended ==
m01 tps = 3567.409671 (including connections establishing)
s01 tps = 3678.526702 (including connections establishing)
m08 tps = 27754.682736 (including connections establishing)
s08 tps = 28474.566418 (including connections establishing)
m32 tps = 177439.118199 (including connections establishing)
s32 tps = 187307.500501 (including connections establishing)
m80 tps = 173765.388249 (including connections establishing)
s80 tps = 184047.873286 (including connections establishing)

== with -M prepared ==
m01 tps = 7310.682085 (including connections establishing)
s01 tps = 7229.791967 (including connections establishing)
m08 tps = 54397.250840 (including connections establishing)
s08 tps = 55045.651468 (including connections establishing)
m32 tps = 303142.385619 (including connections establishing)
s32 tps = 313493.928436 (including connections establishing)
m80 tps = 304652.195974 (including connections establishing)
s80 tps = 311330.343510 (including connections establishing)

Of course, the fact that this gives good benchmark numbers doesn't
make it correct.  But the fact that it gives good benchmark numbers
seems - to me anyway - like a good reason to think carefully before
rejecting this approach out of hand.
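
For anyone who wants to turn the tps pairs above into relative gains, a
minimal sketch in C follows.  The function name tps_gain_pct() is made up
for this example; it is plain arithmetic and not part of pgbench.

```c
#include <assert.h>

/* Percentage change in throughput going from unpatched master to the
 * patched build.  Positive means the patched build is faster. */
double
tps_gain_pct(double master_tps, double patched_tps)
{
    assert(master_tps > 0.0);
    return (patched_tps - master_tps) / master_tps * 100.0;
}
```

Fed the 32-client medians listed above, this gives roughly 7.2% for -M
simple, 5.6% for -M extended and 3.4% for -M prepared.  The single-client
-M prepared pair comes out slightly negative, at about -1.1%.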

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-14 Thread Greg Smith

On 11/14/2011 04:04 PM, Robert Haas wrote:

Some pgbench -S numbers (SELECT-only test) from Nate Boley's 32-core
box


It seems like Nate Boley's system should be credited in the 9.2 
release notes.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] why do we need two snapshots per query?

2011-11-14 Thread Robert Haas
On Nov 14, 2011, at 4:31 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 11/14/2011 04:04 PM, Robert Haas wrote:
 Some pgbench -S numbers (SELECT-only test) from Nate Boley's 32-core
 box
 
 It seems like Nate Boley's system should be credited in the 9.2 release 
 notes.

+1.  Having access to that box has been extremely helpful; it would be nice to 
have equally convenient access to a few more.

...Robert


Re: [HACKERS] why do we need two snapshots per query?

2011-11-14 Thread Tatsuo Ishii
 On the other hand, if our goal in life is to promote the extended
 query protocol over the simple query protocol at all costs, then I
 agree that we shouldn't optimize the simple query protocol in any way.
  Perhaps we should even post a big notice on it that says this
 facility is deprecated and will be removed in a future version of
 PostgreSQL.  But why should that be our goal?  Presumably our goal is
 to put forward the best technology, not to artificially pump up one
 alternative at the expense of some other one.  If the simple protocol
 is faster in certain use cases than the extended protocol, then let
 people use it.  I wouldn't have noticed this optimization opportunity
 in the first place but for the fact that psql seems to use the simple
 protocol - why does it do that, if the extended protocol is
 universally better?  I suspect that, as with many other things where
 we support multiple alternatives, the best alternative depends on the
 situation, and we should let users pick depending on their use case.

+1. I don't see any justification for not improving the simple protocol
case just because the extended protocol performs relatively poorly.

 At any rate, if you're concerned about the relative efficiency of the
 simple query protocol versus the extended protocol, it seems that the
 horse has already left the barn.  I just did a quick 32-client pgbench
 -S test on a 32-core box.  This is just a thirty-second run, but
 that's enough to make the point: if you're not using prepared queries,
 using the extended query protocol incurs a significant penalty - more
 than 15% on this test:
 
 [simple] tps = 246808.409932 (including connections establishing)
 [extended] tps = 205609.438247 (including connections establishing)
 [prepared] tps = 338150.881389 (including connections establishing)

Quite impressive result.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 If we could be certain that a query was being executed immediately

... that is, with the same snapshot ...

 then it would be possible to simplify expressions using stable
 functions as if they were constants. My earlier patch did exactly
 that.

Mph.  I had forgotten about that aspect of it.  I think that it's
very largely superseded by Marti Raudsepp's pending patch:
https://commitfest.postgresql.org/action/patch_view?id=649
which does more and doesn't require any assumption that plan and
execution snapshots are the same.

Now you're going to say that that doesn't help for failure to prove
partial index or constraint conditions involving stable functions,
and my answer is going to be that that isn't an interesting use-case.
Partial index conditions *must* be immutable, and constraint conditions
*should* be.  As far as partitioning goes, the correct solution there
is to move the partition selection to run-time, so we should not be
contorting query semantics to make incremental performance improvements
with the existing partitioning infrastructure.

I remain of the opinion that Robert's proposal is a bad idea.

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Simon Riggs
On Sun, Nov 13, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 As far as partitioning goes, the correct solution there
 is to move the partition selection to run-time, so we should not be
 contorting query semantics to make incremental performance improvements
 with the existing partitioning infrastructure.

Agreed, but I think we need both planning and execution time
awareness, just as we do with indexonly.

That's what I'd like to be able to do: link planning and execution.

It's all very well to refuse individual cases where linkage is
required, but ISTM clear that there are many possible uses of being
able to tell whether a plan is one-shot or not and nothing lost by
allowing that information (a boolean) to pass to the executor.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 It's all very well to refuse individual cases where linkage is
 required, but ISTM clear that there are many possible uses of being
 able to tell whether a plan is one-shot or not and nothing lost by
 allowing that information (a boolean) to pass to the executor.

It's an interconnection between major modules that IMO we don't need.
Having the executor behave differently depending on the planning path
the query took creates complexity, which creates bugs.  You haven't
produced any use-case at all that convinces me that it's worth the risk;
nor do I believe there are lots more use-cases right around the corner.

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 If we could be certain that a query was being executed immediately

 ... that is, with the same snapshot ...

 then it would be possible to simplify expressions using stable
 functions as if they were constants. My earlier patch did exactly
 that.

 Mph.  I had forgotten about that aspect of it.  I think that it's
 very largely superseded by Marti Raudsepp's pending patch:
 https://commitfest.postgresql.org/action/patch_view?id=649
 which does more and doesn't require any assumption that plan and
 execution snapshots are the same.

 Now you're going to say that that doesn't help for failure to prove
 partial index or constraint conditions involving stable functions,
 and my answer is going to be that that isn't an interesting use-case.
 Partial index conditions *must* be immutable, and constraint conditions
 *should* be.  As far as partitioning goes, the correct solution there
 is to move the partition selection to run-time, so we should not be
 contorting query semantics to make incremental performance improvements
 with the existing partitioning infrastructure.

 I remain of the opinion that Robert's proposal is a bad idea.

Wait a minute.  I can understand why you think it's a bad idea to
preserve a snapshot across multiple protocol messages
(parse/bind/execute), but why or how would it be a bad idea to keep
the same snapshot between planning and execution when the whole thing
is being done as a unit?  You haven't offered any real justification
for that position, and it seems to me that if anything the semantics
of such a thing are far *less* intuitive than it would be to do the
whole thing under a single snapshot.  The whole point of snapshot
isolation is that our view of the database doesn't change mid-query;
and yet you are now saying that's exactly the behavior we should have.
 That seems exactly backwards to me.

I also think you are dismissing Simon's stable-expression-folding
proposal far too lightly.  I am not sure that the behavior he wants is
safe given the current details of our implementation - or even with my
patch; I suspect a little more than that is needed - but I am pretty
certain it's the behavior that users want and expect, and we should be
moving toward it, not away from it.  I have seen a significant number
of cases over the years where the query optimizer generated a bad plan
because it did less constant-folding than the user expected.  Users do
not walk around thinking about the fact that the planner and executor
are separate modules and therefore probably should use separate
snapshots.  They expect their query to see a consistent view of the
database.  Period.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Nov 13, 2011 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I remain of the opinion that Robert's proposal is a bad idea.

 Wait a minute.  I can understand why you think it's a bad idea to
 preserve a snapshot across multiple protocol messages
 (parse/bind/execute), but why or how would it be a bad idea to keep
 the same snapshot between planning and execution when the whole thing
 is being done as a unit?  You haven't offered any real justification
 for that position,

It's not hard to come by: execution should proceed with the latest
available view of the database.

 and it seems to me that if anything the semantics
 of such a thing are far *less* intuitive than it would be to do the
 whole thing under a single snapshot.

In that case you must be of the opinion that extended query protocol
is a bad idea and we should get rid of it, and the same for prepared
plans of all types.  What you're basically proposing is that simple
query mode will act differently from other ways of submitting a query,
and I don't think that's a good idea.  It might be sane if planning
could be assumed to take zero time, but that's hardly true.

 I also think you are dismissing Simon's stable-expression-folding
 proposal far too lightly.  I am not sure that the behavior he wants is
 safe given the current details of our implementation - or even with my
 patch; I suspect a little more than that is needed - but I am pretty
 certain it's the behavior that users want and expect, and we should be
 moving toward it, not away from it.  I have seen a significant number
 of cases over the years where the query optimizer generated a bad plan
 because it did less constant-folding than the user expected.

This is just FUD, unless you can point to specific examples where
Marti's patch won't fix it.  If that patch crashes and burns for
some reason, then we should revisit this idea; but if it succeeds
it will cover more cases than plan-time constant folding could.

One of the reasons I don't want to go this direction is that it would
re-introduce causes of extended query protocol having poor performance
relative to simple protocol.  That's not something that users find
intuitive or desirable, either.

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Kevin Grittner
 Tom Lane  wrote:
 Robert Haas  writes:
 
 I can understand why you think it's a bad idea to preserve a
 snapshot across multiple protocol messages (parse/bind/execute),
 but why or how would it be a bad idea to keep the same snapshot
 between planning and execution when the whole thing is being done
 as a unit? You haven't offered any real justification for that
 position,

 It's not hard to come by: execution should proceed with the latest
 available view of the database.
 
I don't think that stands as an intuitively obvious assertion.  I
think we need to see the argument which leads to that conclusion.
 
 and it seems to me that if anything the semantics of such a thing
 are far *less* intuitive than it would be to do the whole thing
 under a single snapshot.

 In that case you must be of the opinion that extended query
 protocol is a bad idea and we should get rid of it, and the same
 for prepared plans of all types. What you're basically proposing is
 that simple query mode will act differently from other ways of
 submitting a query, and I don't think that's a good idea.
 
In what way would that difference be user-visible?
 
 One of the reasons I don't want to go this direction is that it
 would re-introduce causes of extended query protocol having poor
 performance relative to simple protocol. That's not something that
 users find intuitive or desirable, either.
 
If the simple protocol can perform better than the extended protocol,
it hardly seems like a good idea to intentionally cripple the fast
one to keep them at the same performance.  It seems like it would be
better to document the performance difference so that people can
weigh the trade-offs.
 
-Kevin



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Wait a minute.  I can understand why you think it's a bad idea to
 preserve a snapshot across multiple protocol messages
 (parse/bind/execute), but why or how would it be a bad idea to keep
 the same snapshot between planning and execution when the whole thing
 is being done as a unit?  You haven't offered any real justification
 for that position,

 It's not hard to come by: execution should proceed with the latest
 available view of the database.

The word "latest" doesn't seem very illuminating to me. If you take
that to its (illogical) conclusion, that would mean that we ought to
do everything under SnapshotNow - i.e. every time we fetch a tuple,
use the latest available view of the database.  It seems to me that
you can wrap some logic around this - we shouldn't use a snapshot
taken later than event1 because reason1, and we shouldn't use one
taken earlier than event2 because reason2.

It seems to me that the *latest* snapshot we could use would be one
taken the instant before we did any calculation whose result might
depend on our choice of snapshot.  For example, if the query involves
calculating pi out to 5000 decimal places (without looking at any
tables) and then scanning for the matching value in some table column,
we could do the whole calculation prior to taking a snapshot and then
take the snapshot only when we start groveling through the table.
That view would be later than the one we use now, but still
correct.

On the other hand, it seems to me that the *earliest* snapshot we can
use is one taken the instant after we receive the protocol message
that tells us to execute the query.  If we take it any sooner than
that, we might fail to see as committed some transaction which was
acknowledged before the user sent the message.

Between those two extremes, it seems to me that when exactly the
snapshot gets taken is an implementation detail.
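
The window argument can be written down as a tiny predicate.  This is only
a sketch of the reasoning above: the LogicalTime type, the parameter names,
and snapshot_time_acceptable() are hypothetical and do not correspond to
anything in the backend, and the inclusive bounds stand in for "the instant
after" and "the instant before".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical logical clock value; not a PostgreSQL type. */
typedef unsigned long LogicalTime;

/*
 * Return true if a snapshot taken at snapshot_time falls inside the window
 * argued for above:
 *   - not before the protocol message asking us to execute was received
 *     (otherwise we could miss an already-acknowledged commit), and
 *   - not after the first computation whose result depends on the snapshot.
 */
bool
snapshot_time_acceptable(LogicalTime message_received,
                         LogicalTime first_dependent_work,
                         LogicalTime snapshot_time)
{
    assert(message_received <= first_dependent_work);
    return snapshot_time >= message_received &&
           snapshot_time <= first_dependent_work;
}
```

With a message received at time 10 and the first snapshot-dependent work at
time 20, any snapshot time from 10 through 20 passes and anything outside
that range fails; where exactly it lands inside the window is the
"implementation detail".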

 and it seems to me that if anything the semantics
 of such a thing are far *less* intuitive than it would be to do the
 whole thing under a single snapshot.

 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.

I don't see why anything I said would indicate that we shouldn't have
prepared plans.  It is useful for users to have the option to parse
and plan before execution - especially if they want to execute the
same query repeatedly - and if they choose to make use of that
functionality, then we and they will have to deal with the fact that
things can change between plan time and execution time.  If that means
we miss some optimization opportunities, so be it.  But we needn't
deliver the semantics associated with the extended query protocol when
the user isn't using it; and the next time we bump the protocol
version we probably should give some thought to making sure that you
only need to use the extended query protocol when you explicitly want
to separate parse/plan from execution, and not just to get at some
other functionality that we've only chosen to provide using the
extended protocol.

 It might be sane if planning
 could be assumed to take zero time, but that's hardly true.

I still maintain that the length of planning is irrelevant; more, if
the planning and execution are happening in response to a single
protocol message, then the semantics of the query need not (and
perhaps even should not) depend on how much of that time is spent
planning and how much is spent executing.

 I also think you are dismissing Simon's stable-expression-folding
 proposal far too lightly.  I am not sure that the behavior he wants is
 safe given the current details of our implementation - or even with my
 patch; I suspect a little more than that is needed - but I am pretty
 certain it's the behavior that users want and expect, and we should be
 moving toward it, not away from it.  I have seen a significant number
 of cases over the years where the query optimizer generated a bad plan
 because it did less constant-folding than the user expected.

 This is just FUD, unless you can point to specific examples where
 Marti's patch won't fix it.  If that patch crashes and burns for
 some reason, then we should revisit this idea; but if it succeeds
 it will cover more cases than plan-time constant folding could.

I haven't reviewed the two patches in enough detail to have a clear
understanding of which use cases each one does and does not cover.
But, for example, you wrote this:

tgl As far as partitioning goes, the correct solution there
tgl is to move the partition selection to run-time, so we should not be
tgl contorting query semantics to make incremental performance improvements
tgl with the existing partitioning infrastructure.

...and 

Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Florian Pflug
On Nov14, 2011, at 00:13 , Robert Haas wrote:
 On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.
 
 I don't see why anything I said would indicate that we shouldn't have
 prepared plans.  It is useful for users to have the option to parse
 and plan before execution - especially if they want to execute the
 same query repeatedly - and if they choose to make use of that
 functionality, then we and they will have to deal with the fact that
 things can change between plan time and execution time.

The problem, or at least what I perceived to be the problem, is that
protocol-level support for prepared plans isn't the only reason to use
the extended query protocol. The other reasons are protocol-level control
over text vs. binary format, and out-of-line parameters.

In my experience, it's hard enough as it is to convince developers to
use statement parameters instead of interpolating them into the SQL
string. Once word gets out that the simple protocol now has less locking
overhead than the extended protocol, it's going to get even harder...

best regards,
Florian Pflug




Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 6:45 PM, Florian Pflug f...@phlo.org wrote:
 On Nov14, 2011, at 00:13 , Robert Haas wrote:
 On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.

 I don't see why anything I said would indicate that we shouldn't have
 prepared plans.  It is useful for users to have the option to parse
 and plan before execution - especially if they want to execute the
 same query repeatedly - and if they choose to make use of that
 functionality, then we and they will have to deal with the fact that
 things can change between plan time and execution time.

 The problem, or at least what I perceived to be the problem, is that
 protocol-level support for prepared plans isn't the only reason to use
 the extended query protocol. The other reasons are protocol-level control
 over text vs. binary format, and out-of-line parameters.

 In my experience, it's hard enough as it is to convince developers to
 use statement parameters instead of interpolating them into the SQL
 string. Once word gets out that the simple protocol now has less locking
 overhead than the extended protocol, it's going to get even harder...

Well, if our goal in life is to allow people to have protocol control
over text vs. binary format and support out-of-line parameters without
requiring multiple protocol messages, we can build that facility in to
the next version of the protocol.  I know Kevin's been thinking about
working on that project for a number of reasons, and this would be a
good thing to get on the list.

On the other hand, if our goal in life is to promote the extended
query protocol over the simple query protocol at all costs, then I
agree that we shouldn't optimize the simple query protocol in any way.
 Perhaps we should even post a big notice on it that says this
facility is deprecated and will be removed in a future version of
PostgreSQL.  But why should that be our goal?  Presumably our goal is
to put forward the best technology, not to artificially pump up one
alternative at the expense of some other one.  If the simple protocol
is faster in certain use cases than the extended protocol, then let
people use it.  I wouldn't have noticed this optimization opportunity
in the first place but for the fact that psql seems to use the simple
protocol - why does it do that, if the extended protocol is
universally better?  I suspect that, as with many other things where
we support multiple alternatives, the best alternative depends on the
situation, and we should let users pick depending on their use case.

At any rate, if you're concerned about the relative efficiency of the
simple query protocol versus the extended protocol, it seems that the
horse has already left the barn.  I just did a quick 32-client pgbench
-S test on a 32-core box.  This is just a thirty-second run, but
that's enough to make the point: if you're not using prepared queries,
using the extended query protocol incurs a significant penalty - more
than 15% on this test:

[simple] tps = 246808.409932 (including connections establishing)
[extended] tps = 205609.438247 (including connections establishing)
[prepared] tps = 338150.881389 (including connections establishing)
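
For anyone who wants to check the arithmetic, here is a minimal sketch that works out the relative differences from the three tps figures above. The numbers are copied from this run; the helper name relative_change is just an illustration, not anything from pgbench.

```python
# Throughput figures copied from the pgbench -S run above (transactions/second).
tps = {
    "simple": 246808.409932,
    "extended": 205609.438247,
    "prepared": 338150.881389,
}


def relative_change(baseline: float, other: float) -> float:
    """Fractional change of `other` relative to `baseline` (negative = slower)."""
    return (other - baseline) / baseline


extended_vs_simple = relative_change(tps["simple"], tps["extended"])
prepared_vs_simple = relative_change(tps["simple"], tps["prepared"])

print(f"extended vs simple: {extended_vs_simple:+.1%}")
print(f"prepared vs simple: {prepared_vs_simple:+.1%}")
```

On these figures the unprepared extended protocol comes out a bit under 17% slower than simple, while prepared comes out roughly 37% faster, which is consistent with the "more than 15%" claim.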

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 7:37 PM, Robert Haas robertmh...@gmail.com wrote:
 In my experience, it's hard enough as it is to convince developers to
 use statement parameters instead of interpolating them into the SQL
 string. Once word gets out that the simple protocol now has less locking
 overhead than the extended protocol, it's going to get even harder...

 [ discussion of convincing people to use

 At any rate, if you're concerned about the relative efficiency of the
 simple query protocol versus the extended protocol, it seems that the
 horse has already left the barn.

On further examination, it seems that the behavior of the current code
is as follows:

pgbench -n -S -t 2000 == ~4000 snapshots
pgbench -n -S -t 2000 -M extended == ~6000 snapshots
pgbench -n -S -t 2000 -M prepared == ~4000 snapshots

So it's already the case that simple protocol has less locking
overhead than the extended protocol, unless you're using prepared
queries.  The -M prepared case appears to be doing just about exactly
the same thing that happens in the simple case: we take a snapshot in
exec_bind_message() and then release it a nanosecond before calling
PortalStart(), which promptly takes a new one.  IOW, it looks like the
same optimization that applies to the simple case can be applied here
as well.

In the -M extended case, we take a snapshot from exec_parse_message(),
and the same two in the exec_bind_message() call that are taken in the
-M prepared case.  So reducing the prepared case from two snapshots to
one will reduce the extended case from three snapshots to two, thus
saving one snapshot per query regardless of how it's executed.
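
As a rough sanity check on those counts, the per-query numbers can be tabulated like this. It is only a sketch: the per-query figures are the ones described above, and the "after" column assumes the optimization removes exactly one snapshot per query in every mode.

```python
# Snapshots taken per query in each pgbench mode, as described above.
# "Before" is the observed behavior; "after" ASSUMES the proposed change
# removes exactly one snapshot per query regardless of mode.
SNAPSHOTS_BEFORE = {"simple": 2, "extended": 3, "prepared": 2}


def total_snapshots(per_query, transactions):
    """Total snapshots for a run of single-query transactions."""
    return {mode: count * transactions for mode, count in per_query.items()}


snapshots_after = {mode: count - 1 for mode, count in SNAPSHOTS_BEFORE.items()}

before = total_snapshots(SNAPSHOTS_BEFORE, 2000)  # pgbench -n -S -t 2000
after = total_snapshots(snapshots_after, 2000)

for mode in SNAPSHOTS_BEFORE:
    print(f"{mode:9s} before={before[mode]:5d} after={after[mode]:5d}")
```

The "before" totals match the ~4000 / ~6000 / ~4000 figures measured above.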

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote:
 In the -M extended case, we take a snapshot from exec_parse_message(),
 and the same two in the exec_bind_message() call that are taken in the
 -M prepared case.  So reducing the prepared case from two snapshots to
 one will reduce the extended case from three snapshots to two, thus
 saving one snapshot per query regardless of how it's executed.

And here are the revised patches.  Apply refactor-portal-start
(unchanged) first and then just-one-snapshot-v2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


just-one-snapshot-v2.patch
Description: Binary data


refactor-portal-start.patch
Description: Binary data



Re: [HACKERS] why do we need two snapshots per query?

2011-11-12 Thread Florian Pflug
On Nov11, 2011, at 19:17 , Tom Lane wrote:
 But frankly I do not like any of these proposals.  Making fundamental
 changes in long-established semantics in the name of squeezing out a few
 cycles is the wrong way to design software.

Hm, then maybe this is one of the things to put onto the next protocol
version todo list?

best regards,
Florian Pflug




Re: [HACKERS] why do we need two snapshots per query?

2011-11-12 Thread Robert Haas
On Sat, Nov 12, 2011 at 5:09 PM, Florian Pflug f...@phlo.org wrote:
 On Nov11, 2011, at 19:17 , Tom Lane wrote:
 But frankly I do not like any of these proposals.  Making fundamental
 changes in long-established semantics in the name of squeezing out a few
 cycles is the wrong way to design software.

 Hm, then maybe this is one of the things to put onto the next protocol
 version todo list?

+1.  I had the same thought.  It seems clear that we could design the
protocol so that the client tells the server explicitly whether to
execute immediately or only upon further instructions, but trying to
guess the user's intentions seems a little too rich.

Meanwhile, here's my attempt at fixing this for the simple query
protocol.  I'm attaching two patches:

- refactor-portal-start.patch, which attempts to change the API for
PortalStart() without any behavioral change whatsoever.  The idea here
is that instead of passing a snapshot to PortalStart() explicitly, we
just pass a flag saying whether or not it's OK to use the active
snapshot (versus taking a new one).  This seems to fit nicely with
existing calling patterns for this function.

- just-one-snapshot.patch, which applies atop
refactor-portal-start.patch and makes use of the new API to avoid the
need for PORTAL_ONE_SELECT queries to take two snapshots.  It does so
by keeping the parse/analyze snapshot around just long enough to pass
it to PortalStart().  If PortalStart() chooses to register it, then it
(or a copy of it) will be around for a while longer; otherwise, it
will be dropped immediately after PortalStart() finishes.
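
To illustrate the idea behind the two patches, here is a toy model of the control flow. It is a sketch only and not PostgreSQL code: SnapshotCounter, portal_start and run_simple_query are hypothetical stand-ins, and the real PortalStart() does a good deal more than this.

```python
class SnapshotCounter:
    """Toy stand-in for the backend's snapshot machinery (hypothetical)."""

    def __init__(self):
        self.taken = 0    # how many times a fresh snapshot was computed
        self.active = []  # stack of active snapshots

    def get_transaction_snapshot(self):
        self.taken += 1
        return f"snapshot-{self.taken}"

    def push_active(self, snapshot):
        self.active.append(snapshot)

    def pop_active(self):
        return self.active.pop()


def portal_start(snaps, use_active_snapshot):
    """Model of the refactored API: a flag rather than an explicit snapshot."""
    if use_active_snapshot and snaps.active:
        return snaps.active[-1]              # reuse the parse/plan snapshot
    return snaps.get_transaction_snapshot()  # otherwise take a new one


def run_simple_query(snaps, reuse):
    # Snapshot used for parse analysis and planning.
    snaps.push_active(snaps.get_transaction_snapshot())
    if reuse:
        # New behavior: keep the snapshot just long enough to hand it over.
        execution_snapshot = portal_start(snaps, use_active_snapshot=True)
        snaps.pop_active()
    else:
        # Old behavior: drop it, then immediately take another one.
        snaps.pop_active()
        execution_snapshot = portal_start(snaps, use_active_snapshot=False)
    return execution_snapshot


old, new = SnapshotCounter(), SnapshotCounter()
run_simple_query(old, reuse=False)
run_simple_query(new, reuse=True)
print(old.taken, new.taken)
```

In this model the old flow computes two snapshots per query and the new flow computes one, with the active-snapshot stack empty again once the query has started.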

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


refactor-portal-start.patch
Description: Binary data


just-one-snapshot.patch
Description: Binary data



Re: [HACKERS] why do we need two snapshots per query?

2011-11-12 Thread Simon Riggs
On Fri, Nov 11, 2011 at 10:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Tom, in that earlier thread you said you'd be doing something in this
 release about that. Can you say more about what that was, and will you
 be doing it still?

 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e6faf910d75027bdce7cd0f2033db4e912592bcc

 I think that largely supersedes what I understood your notion of a
 one-shot plan to be about, though perhaps I missed something?

I was looking at other use cases, specifically partitioning/partial indexes.

If we could be certain that a query was being executed immediately
then it would be possible to simplify expressions using stable
functions as if they were constants. My earlier patch did exactly
that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Can't we arrange to retain the snapshot used for parse
 analysis / planning and reuse it for the portal that we create just
 afterwards?

Possibly, but what if planning takes a long time?  Also, I think you're
ignoring the extended-query-protocol scenario, where it would be a whole
lot harder to justify keeping a snapshot from Parse through Bind and
Execute.

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Robert Haas
On Fri, Nov 11, 2011 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Can't we arrange to retain the snapshot used for parse
 analysis / planning and reuse it for the portal that we create just
 afterwards?

 Possibly, but what if planning takes a long time?  Also, I think you're
 ignoring the extended-query-protocol scenario, where it would be a whole
 lot harder to justify keeping a snapshot from Parse through Bind and
 Execute.

In the extended query protocol scenario, it seems to me that keeping the
snapshot would be both wrong and a bad idea.  It would be wrong
because the user will (I think) expect that the query can see all rows
that were marked as committed prior to the Execute message.  It would be a bad
idea because we'd have to keep that snapshot advertised for the entire
time between Parse and Execute, even if the client was sitting there
doing nothing for a long time, which would hold back RecentGlobalXmin.
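
A deliberately simplified sketch of why a long-held snapshot hurts: treat the global xmin horizon as the minimum xmin advertised by any backend. The function name and the xid values below are made up for illustration, and the real computation has more inputs than this.

```python
def global_xmin(next_xid, advertised_xmins):
    """Oldest xmin any backend advertises; next_xid if nobody holds a snapshot."""
    return min(advertised_xmins, default=next_xid)


# Hypothetical timeline: a client sends Parse when the oldest running xid is
# 1000, then sits idle while other sessions consume xids up to 5000.
idle_client_xmin = 1000
next_xid = 5000

held = global_xmin(next_xid, [idle_client_xmin, 4990, 4998])
released = global_xmin(next_xid, [4990, 4998])

print(f"horizon while the idle client keeps its snapshot: {held}")
print(f"horizon once that snapshot is released:           {released}")
```

While the idle client's snapshot stays advertised, the horizon is pinned at its xmin, so dead row versions newer than that cannot be cleaned up no matter how far the other sessions have advanced.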

But in the simple query scenario, I think it's fine.  Even if query
planning does take a long time, it's a single operation from a user
perspective.  If the user sends a query and gets an answer back ten
seconds later, they don't know (and shouldn't care) whether that
happened because the query took nine seconds to plan and one second to
execute, or one second to plan and nine seconds to execute, or 50ms to
plan and 9950ms to execute.  For the scenario you're talking about to
be a problem, someone would have to be expecting a query to see rows
from a transaction that committed *after* the query was sent - based,
presumably, on the knowledge that the execution snapshot wouldn't be
taken immediately, and that the concurrent transaction would commit
meanwhile.  But such a practice is flagrantly unsafe anyway, because
any optimization that makes query planning faster could break it.  And
I'm not prepared to guarantee that we're never going to speed up the
optimizer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Florian Pflug
On Nov11, 2011, at 16:18 , Robert Haas wrote:
 In the extended query protocol scenario, it seems to me that keeping the
 snapshot would be both wrong and a bad idea.  It would be wrong
 because the user will (I think) expect that the query can see all rows
 that were marked as committed prior to the Execute message.  It would be a bad
 idea because we'd have to keep that snapshot advertised for the entire
 time between Parse and Execute, even if the client was sitting there
 doing nothing for a long time, which would hold back RecentGlobalXmin.

Hm, but that'd penalize clients who use the extended query protocol, which
they have to if they want to transmit out-of-line parameters. You could
work around that by making the extended protocol scenario work like the
simple protocol scenario if the unnamed statement and/or portal is used.

Since clients presumably use pipelined Parse,Bind,Execute messages when
using the unnamed statement and portal, they're unlikely to observe the
difference between a snapshot taken during Parse, Bind or Execute.

best regards,
Florian Pflug




Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Nov11, 2011, at 16:18 , Robert Haas wrote:
 In the extended query protocol scenario, it seems to me that keeping the
 snapshot would be both wrong and a bad idea.

 Hm, but that'd penalize clients who use the extended query protocol, which
 they have to if they want to transmit out-of-line parameters. You could
 work around that by making the extended protocol scenario work like the
 simple protocol scenario if the unnamed statement and/or portal is used.

 Since clients presumably use pipelined Parse,Bind,Execute messages when
 using the unnamed statement and portal, they're unlikely to observe the
 difference between a snapshot taken during Parse, Bind or Execute.

I think it would be a seriously bad idea to allow the unnamed portal to
have semantic differences from other portals.  We've gotten enough flak
about the fact that it had planner behavioral differences (enough so that
those differences are gone as of HEAD).

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Florian Pflug
On Nov11, 2011, at 17:06 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On Nov11, 2011, at 16:18 , Robert Haas wrote:
 In the extended query protocol scenario, it seems to me that keeping the
 snapshot would be both wrong and a bad idea.
 
 Hm, but that'd penalize clients who use the extended query protocol, which
 they have to if they want to transmit out-of-line parameters. You could
 work around that by making the extended protocol scenario work like the
 simple protocol scenario if the unnamed statement and/or portal is used.
 
 Since clients presumably use pipelined Parse,Bind,Execute messages when
 using the unnamed statement and portal, they're unlikely to observe the
 difference between a snapshot taken during Parse, Bind or Execute.
 
 I think it would be a seriously bad idea to allow the unnamed portal to
 have semantic differences from other portals.  We've gotten enough flak
 about the fact that it had planner behavioral differences (enough so that
 those differences are gone as of HEAD).

Oh, I missed that and worked from the assumption that we're still special-
casing the unnamed case. Since we don't, re-introducing a difference in
behaviour is probably a bad idea.

Still, optimizing only the simple protocol seems weird.

best regards,
Florian Pflug




Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 Still, optimizing only the simple protocol seems weird.

Would it be sane to decree that the statement snapshot lasts until
Sync is received, in extended query mode?

But frankly I do not like any of these proposals.  Making fundamental
changes in long-established semantics in the name of squeezing out a few
cycles is the wrong way to design software.

regards, tom lane



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Considering that GetSnapshotData() is the number-one consumer of CPU
 time on many profiling runs I've done, this seems needlessly
 inefficient.  Can't we arrange to retain the snapshot used for parse
 analysis / planning and reuse it for the portal that we create just
 afterwards?  Off the top of my head, I'm not exactly sure how to do
 that cleanly, but it seems like it should work.

Please refer to this thread:

  http://postgresql.1045698.n5.nabble.com/One-Shot-Plans-td4488820.html

It seems one of the more prominent drawbacks of Simon's approach to
one-shot plans back then was the question of which snapshot they run
against, so your proposal to optimize one-shot plans by enforcing the
use of a single snapshot looks like a step forward here.

The other problem is how to recognize a query as being a candidate for
one-shot optimization, but I guess exec_simple_query (as opposed to the
v3 protocol) applies.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Robert Haas
On Fri, Nov 11, 2011 at 2:21 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Considering that GetSnapshotData() is the number-one consumer of CPU
 time on many profiling runs I've done, this seems needlessly
 inefficient.  Can't we arrange to retain the snapshot used for parse
 analysis / planning and reuse it for the portal that we create just
 afterwards?  Off the top of my head, I'm not exactly sure how to do
 that cleanly, but it seems like it should work.

 Please refer to this thread:

  http://postgresql.1045698.n5.nabble.com/One-Shot-Plans-td4488820.html

 It seems one of the more prominent drawbacks of Simon's approach to
 one-shot plans back then was the question of which snapshot they run
 against, so your proposal to optimize one-shot plans by enforcing the
 use of a single snapshot looks like a step forward here.

 The other problem is how to recognize a query as being a candidate for
 one-shot optimization, but I guess exec_simple_query (as opposed to the
 v3 protocol) applies.

It would be nice if we could kill two birds with one stone, but I'm
not sure we'll be that lucky.  The trouble is that PortalStart()
does different things depending on what opinion ChoosePortalStrategy()
offers about the statement to be processed, and the code that sets the
snapshot for parsing and planning uses a completely separate (and
generally simpler) heuristic.  Maybe there's an easy way to centralize
that decision-making; I'll have a look.  If not, I'll settle for
improving the case that looks improvable.

In terms of improving things for the extended protocol, I think there
may be other ways to do that, but this particular optimization won't
apply, so it's a separate project...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Simon Riggs
On Fri, Nov 11, 2011 at 7:21 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Considering that GetSnapshotData() is the number-one consumer of CPU
 time on many profiling runs I've done, this seems needlessly
 inefficient.  Can't we arrange to retain the snapshot used for parse
 analysis / planning and reuse it for the portal that we create just
 afterwards?  Off the top of my head, I'm not exactly sure how to do
 that cleanly, but it seems like it should work.

 Please refer to this thread:

  http://postgresql.1045698.n5.nabble.com/One-Shot-Plans-td4488820.html

 It seems one of the more prominent drawbacks of Simon's approach to
 one-shot plans back then was the question of which snapshot they run
 against, so your proposal to optimize one-shot plans by enforcing the
 use of a single snapshot looks like a step forward here.

Agreed, it's essentially the same thing.

If execution immediately follows planning we should recognise it and
do something about it.

Tom, in that earlier thread you said you'd be doing something in this
release about that. Can you say more about what that was, and will you
be doing it still?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] why do we need two snapshots per query?

2011-11-11 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Tom, in that earlier thread you said you'd be doing something in this
 release about that. Can you say more about what that was, and will you
 be doing it still?

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e6faf910d75027bdce7cd0f2033db4e912592bcc

I think that largely supersedes what I understood your notion of a
one-shot plan to be about, though perhaps I missed something?

I don't think this has a lot to do with what Robert is on about, since
in any situation where a plan is cached for later, we surely are not
going to use the same snapshot to execute it.

regards, tom lane
