Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-07 Thread Robert Haas
On Sun, Feb 5, 2012 at 12:44 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane  wrote:

 Yeah, I think we need to preserve that property. Unexpectedly
 executing query (which may have side-effects) is a very dangerous
 thing.  People are used to the idea that ANALYZE == execute, and
 adding random other flags that also cause execution is going to
 burn somebody.

 +1

 FWIW, another reason not to use Robert's suggested syntax is that you
 get rows=n entries with or without the actual run.  You just don't
 get the actual block to compare to the estimate.  So ROWS as an
 option would be very ambiguous.

OK, so based on that resoundingly unanimous input, I've committed
Tomas's last version.  I made some alterations to the sgml
documentation to avoid mentioning gettimeofday specifically, because
that might not be the call everywhere (e.g. Windows) and even if it
is, it doesn't seem too user-friendly.  The code is entirely as he had
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-04 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 I suspect we will be unwilling to make such a break with the past.  In
 that case, I think I prefer the originally proposed semantics, even
 though I agree they are somewhat less natural.  ANALYZE is a big flag
 that means This query will be executed, not just planned.  If we are
 not going to make a major break, but only nibble around the edges,
 then I don't think we should remove the property that the query will
 be executed if and only if ANALYZE is specified.

Yeah, I think we need to preserve that property.  Unexpectedly executing
a query (which may have side-effects) is a very dangerous thing.  People
are used to the idea that ANALYZE == execute, and adding random other
flags that also cause execution is going to burn somebody.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-04 Thread Kevin Grittner
Tom Lane  wrote:
 
 Yeah, I think we need to preserve that property. Unexpectedly
 executing query (which may have side-effects) is a very dangerous
 thing.  People are used to the idea that ANALYZE == execute, and
 adding random other flags that also cause execution is going to
 burn somebody.
 
+1
 
FWIW, another reason not to use Robert's suggested syntax is that you
get rows=n entries with or without the actual run.  You just don't
get the actual block to compare to the estimate.  So ROWS as an
option would be very ambiguous.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Robert Haas
On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. The default value of TIMING option did not work as intended, it
 was set to true even for plain EXPLAIN (without ANALYZE). In that case
 the EXPLAIN failed.


 I've applied this over the show Heap Fetches in EXPLAIN ANALYZE
 output for index-only scans patch.  It applied and built and passes
 installcheck.

 I have not done a full review of this, but I want to say that I want
 this very much.  Not only can I get the row counts, but also the Heap
 Fetches and the output of BUFFERS, without the overhead of timing.  I
 haven't looked at contrib aspects of it at all.

 Thank you for making this.

+1.

After looking at this a bit, I think we can make the interface a bit
more convenient.  I'd like to propose that we call the EXPLAIN option
ROWS rather than TIMING, and that we provide the row-count
information if *either* ROWS or ANALYZE is specified.

Then, if you want rows without timing, you can say this:

EXPLAIN (ROWS) query...

Rather than, as in the approach taken by the current patch:

EXPLAIN (ANALYZE, TIMING off) query...

...which is a little more long-winded.  This also has the nice
advantage of making the name of the EXPLAIN option match the name of
the INSTRUMENT flag.

Revised patch along those lines attached.  Barring objections, I'll
commit this version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


explain-rows.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3 Únor 2012, 17:12, Robert Haas wrote:
 On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. The default value of TIMING option did not work as intended, it
 was set to true even for plain EXPLAIN (without ANALYZE). In that case
 the EXPLAIN failed.


 I've applied this over the show Heap Fetches in EXPLAIN ANALYZE
 output for index-only scans patch.  It applied and built and passes
 installcheck.

 I have not done a full review of this, but I want to say that I want
 this very much.  Not only can I get the row counts, but also the Heap
 Fetches and the output of BUFFERS, without the overhead of timing.  I
 haven't looked at contrib aspects of it at all.

 Thank you for making this.

 +1.

 After looking at this a bit, I think we can make the interface a bit
 more convenient.  I'd like to propose that we call the EXPLAIN option
 ROWS rather than TIMING, and that we provide the row-count
 information if *either* ROWS or ANALYZE is specified.

 Then, if you want rows without timing, you can say this:

 EXPLAIN (ROWS) query...

 Rather than, as in the approach taken by the current patch:

 EXPLAIN (ANALYZE, TIMING off) query...

 ...which is a little more long-winded.  This also has the nice
 advantage of making the name of the EXPLAIN option match the name of
 the INSTRUMENT flag.

 Revised patch along those lines attached.  Barring objections, I'll
 commit this version.

I don't think changing the EXPLAIN syntax this way is a good idea. I think
that one option should not silently enable/disable others, so ROWS should
not enable ANALYZE automatically. It should throw an error just like the
TIMING option does when invoked without ANALYZE (IIRC).

Which leads to syntax like this

  EXPLAIN (ANALYZE, ROWS)

and that's a bit strange because it mentions ROWS but actually manipulates
TIMING behind the curtains. You can't actually do

  EXPLAIN (ANALYZE, ROWS off)

because row counts are always collected. So I think the syntax I proposed
makes more sense.

kind regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Robert Haas
On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I don't think changing the EXPLAIN syntax this way is a good idea. I think
 that one option should not silently enable/disable others, so ROWS should
 not enable ANALYZE automatically.

I didn't propose that.  The point is that the desired behavior
(however we name it) is a SUBSET of what ANALYZE does.

So we can either:

1. Have ANALYZE enable all the behavior, and have another option
(TIMING) that can be used to turn some of it back on again.

2. Have ANALYZE enable all the behavior, and have another option
(ROWS) that enables just the subset of it that we want.

I prefer #2 to #1, because I think it's a bit confusing to have an
option whose effect is to partially disable another option.  YMMV, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3.2.2012 18:28, Robert Haas wrote:
 On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I don't think changing the EXPLAIN syntax this way is a good idea. I think
 that one option should not silently enable/disable others, so ROWS should
 not enable ANALYZE automatically.
 
 I didn't propose that.  The point is that the desired behavior
 (however we name it) is a SUBSET of what ANALYZE does.
 
 So we can either:
 
 1. Have ANALYZE enable all the behavior, and have another option
 (TIMING) that can be used to turn some of it back on again.
 
 2. Have ANALYZE enable all the behavior, and have another option
 (ROWS) that enables just the subset of it that we want.
 
 I prefer #2 to #1, because I think it's a bit confusing to have an
 option whose effect is to partially disable another option.  YMMV, of
 course.

OK, thanks for the explanation. I don't like the idea of subsets as it
IMHO makes it less obvious what options are enabled. For example this

   EXPLAIN (ROWS) query...

does not immediately show it's actually going to do ANALYZE.

I prefer to keep the current 'ANALYZE' definition, i.e. collecting both
row counts and timing data (which is what 99% of people wants anyway),
and an option to disable the timing.

And the BUFFERS option currently works exactly like that, so defining
ROWS the way you proposed would be inconsistent with the current behavior.

Sure, we could redefine BUFFERS as a subset, so you could do

  EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off)
  EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on)

but what if someone wants both at the same time? Maybe he could do

  EXPLAIN (ROWS, BUFFERS)

and treat that as a union of those subsets. I don't think it's worth it.

I surely can live with both solutions (mine or the one you proposed).


kind regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Robert Haas
On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra t...@fuzzy.cz wrote:
 OK, thanks for the explanation. I don't like the idea of subsets as it
 IMHO makes it less obvious what options are enabled. For example this

   EXPLAIN (ROWS) query...

 does not immediately show it's actually going to do ANALYZE.

Well, it isn't, if ANALYZE means rows + timing...

 I prefer to keep the current 'ANALYZE' definition, i.e. collecting both
 row counts and timing data (which is what 99% of people wants anyway),
 and an option to disable the timing.

 And the BUFFERS option currently works exactly like that, so defining
 ROWS the way you proposed would be inconsistent with the current behavior.

 Sure, we could redefine BUFFERS as a subset, so you could do

  EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off)
  EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on)

 but what if someone wants both at the same time? Maybe he could do

  EXPLAIN (ROWS, BUFFERS)

 and treat that as a union of those subsets. I don't think it's worth it.

Yeah, I forgot that we'd have to allow that, though I don't think it
would be a big deal to fix that.

 I surely can live with both solutions (mine or the one you proposed).

Let's wait and see if anyone else has an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3.2.2012 22:51, Robert Haas wrote:
 On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra t...@fuzzy.cz wrote:
 OK, thanks for the explanation. I don't like the idea of subsets as it
 IMHO makes it less obvious what options are enabled. For example this

   EXPLAIN (ROWS) query...

 does not immediately show it's actually going to do ANALYZE.
 
 Well, it isn't, if ANALYZE means rows + timing...

Yeah, sure. It depends on how you define ANALYZE. I see that as 'stats
collected at runtime' (in contrast to a query plan, which is just a ...
well, plan).

 but what if someone wants both at the same time? Maybe he could do

  EXPLAIN (ROWS, BUFFERS)

 and treat that as a union of those subsets. I don't think it's worth it.
 
 Yeah, I forgot that we'd have to allow that, though I don't think it
 would be a big deal to fix that.
 
 I surely can live with both solutions (mine or the one you proposed).
 
 Let's wait and see if anyone else has an opinion.

Sure. And thanks for your feedback!

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Jeff Janes
On Fri, Feb 3, 2012 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. The default value of TIMING option did not work as intended, it
 was set to true even for plain EXPLAIN (without ANALYZE). In that case
 the EXPLAIN failed.


 I've applied this over the show Heap Fetches in EXPLAIN ANALYZE
 output for index-only scans patch.  It applied and built and passes
 installcheck.

 I have not done a full review of this, but I want to say that I want
 this very much.  Not only can I get the row counts, but also the Heap
 Fetches and the output of BUFFERS, without the overhead of timing.  I
 haven't looked at contrib aspects of it at all.

 Thank you for making this.

 +1.

 After looking at this a bit, I think we can make the interface a bit
 more convenient.  I'd like to propose that we call the EXPLAIN option
 ROWS rather than TIMING, and that we provide the row-count
 information if *either* ROWS or ANALYZE is specified.

If we are allowed to make big breaks with the past, I would get rid of
ANALYZE altogether.

analyze basically is a synonym of explain, or is contained within
it.  You can't explain anything until you have first analyzed it.

Without ANALYZE, you are explaining the analysis done by the planner.
With it, you are explaining an analysis done by the planner and the
executor.

So we could do away with ANALYZE and have TIMING, ROWS, and BUFFERS,
any of which causes the query to actually be executed, not just
planned.

I suspect we will be unwilling to make such a break with the past.  In
that case, I think I prefer the originally proposed semantics, even
though I agree they are somewhat less natural.  ANALYZE is a big flag
that means This query will be executed, not just planned.  If we are
not going to make a major break, but only nibble around the edges,
then I don't think we should remove the property that the query will
be executed if and only if ANALYZE is specified.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-01-21 Thread Jeff Janes
On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. The default value of TIMING option did not work as intended, it
 was set to true even for plain EXPLAIN (without ANALYZE). In that case
 the EXPLAIN failed.


I've applied this over the show Heap Fetches in EXPLAIN ANALYZE
output for index-only scans patch.  It applied and built and passes
installcheck.

I have not done a full review of this, but I want to say that I want
this very much.  Not only can I get the row counts, but also the Heap
Fetches and the output of BUFFERS, without the overhead of timing.  I
haven't looked at contrib aspects of it at all.

Thank you for making this.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-01-13 Thread Tomas Vondra
On 13.1.2012 18:07, Josh Berkus wrote:
 Eric's review follows:
 
 Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
 2012-01-12 git checkout.
 
 Patch applied fine.
 
 'make check' results in failures when this patch is put into place.
 
 
  6 of 128 tests failed.
 
 
 Here are the relevant failures.
 
 parallel group (2 tests):  create_view create_index
  create_index ... FAILED
 parallel group (9 tests):  create_cast create_aggregate drop_if_exists
 typed_table vacuum constraints create_table_like triggers inherit
  inherit  ... FAILED
 parallel group (20 tests):  select_distinct_on btree_index update random
 select_distinct select_having namespace delete case hash_index union
 select_implicit select_into transactions portals subselect arrays
 aggregates join prepared_xacts
  join ... FAILED
  aggregates   ... FAILED
 parallel group (15 tests):  combocid portals_p2 advisory_lock xmlmap
 tsdicts guc functional_deps dependency select_views cluster tsearch
 window foreign_data foreign_key bitmapops
  foreign_data ... FAILED
 parallel group (19 tests):  limit prepare copy2 conversion xml plancache
 returning temp sequence without_oid with rowtypes truncate polymorphism
 domain largeobject rangefuncs alter_table plpgsql
  alter_table  ... FAILED
 

Fixed. The default value of TIMING option did not work as intended, it
was set to true even for plain EXPLAIN (without ANALYZE). In that case
the EXPLAIN failed.

Tomas
diff --git a/contrib/auto_explain/auto_explain.c 
b/contrib/auto_explain/auto_explain.c
index 61da6a2..9fc0ae1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@ static intauto_explain_log_min_duration = -1; /* msec or 
-1 */
 static bool auto_explain_log_analyze = false;
 static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_timing  = false;
 static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
 
@@ -133,6 +134,17 @@ _PG_init(void)
 NULL,
 NULL);
 
+   DefineCustomBoolVariable(auto_explain.log_timing,
+Collect timing data, 
not just row counts.,
+NULL,
+
auto_explain_log_timing,
+true,
+PGC_SUSET,
+0,
+NULL,
+NULL,
+NULL);
+
EmitWarningsOnPlaceholders(auto_explain);
 
/* Install hooks. */
@@ -170,7 +182,12 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* Enable per-node instrumentation iff log_analyze is required. 
*/
if (auto_explain_log_analyze  (eflags  
EXEC_FLAG_EXPLAIN_ONLY) == 0)
{
-   queryDesc-instrument_options |= INSTRUMENT_TIMER;
+   if (auto_explain_log_timing)
+   queryDesc-instrument_options |= 
INSTRUMENT_TIMER;
+   else
+   queryDesc-instrument_options |= 
INSTRUMENT_ROWS;
+
+
if (auto_explain_log_buffers)
queryDesc-instrument_options |= 
INSTRUMENT_BUFFERS;
}
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 8a9c9de..4488956 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -40,6 +40,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] replaceable 
class=parameterstatement/replac
 VERBOSE [ replaceable class=parameterboolean/replaceable ]
 COSTS [ replaceable class=parameterboolean/replaceable ]
 BUFFERS [ replaceable class=parameterboolean/replaceable ]
+TIMING [ replaceable class=parameterboolean/replaceable ]
 FORMAT { TEXT | XML | JSON | YAML }
 /synopsis
  /refsynopsisdiv
@@ -172,6 +173,20 @@ ROLLBACK;
/varlistentry
 
varlistentry
+termliteralTIMING/literal/term
+listitem
+ para
+  Include information on timing for each node. Specifically, include the
+  actual startup time and time spent in the node. This may involve a lot
+  of gettimeofday calls, and on some systems this means significant
+  slowdown of the query. 
+  This parameter may only be used when literalANALYZE/literal is also
+  enabled.  It defaults to