Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads

2017-10-31 Thread Robert Haas
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro  wrote:
> Vik Fearing asked off-list why hash joins appear to read slightly more
> temporary data than they write.  The reason is that we notch up a
> phantom block read when we hit the end of each file.  Harmless but it
> looks a bit weird and it's easily fixed.
>
> Unpatched, a 16 batch hash join reports that we read 30 more blocks
> than we wrote (2 per batch after the first, as expected):
>
>Buffers: shared hit=434 read=16234, temp read=5532 written=5502
>
> With the attached patch:
>
>Buffers: shared hit=547 read=16121, temp read=5502 written=5502

Committed.  Arguably we ought to back-patch this, but it's minor so I didn't.
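As a rough illustration of the fix being described (a hedged sketch with
assumed names, not the committed buffile.c change), the idea is to bump
the temp-read counter only when bytes actually come back:

#include <stdio.h>

/* Count a temp-block read only when data was returned, so the
 * zero-byte read that detects end-of-file is not counted. */
static size_t
read_block_counted(FILE *f, char *buf, size_t blocksize, long *temp_reads)
{
    size_t nread = fread(buf, 1, blocksize, f);

    if (nread > 0)
        (*temp_reads)++;        /* no phantom read at EOF */
    return nread;
}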

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Fujii Masao
On Fri, Jul 8, 2016 at 12:55 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane  wrote:
>>> Presumably the instrumentation data needed for that is not getting
>>> returned from the worker to the leader.
>
>> Yes.
>
>> ...
>> I'm not sure about the rest of you, but I'd kind of like to finish
>> this release and start working on the next one.
>
> Agreed.  We should make sure that the possible omissions from EXPLAIN
> output are adequately documented, but actually fixing that seems like
> material for a future release cycle.

Fair enough.

Regards,

-- 
Fujii Masao




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane  wrote:
>> Presumably the instrumentation data needed for that is not getting
>> returned from the worker to the leader.

> Yes.

> ...
> I'm not sure about the rest of you, but I'd kind of like to finish
> this release and start working on the next one.

Agreed.  We should make sure that the possible omissions from EXPLAIN
output are adequately documented, but actually fixing that seems like
material for a future release cycle.

regards, tom lane




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Robert Haas
On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao  wrote:
>>> I found $SUBJECT while trying to test parallel queries. Is this a bug?
>
> Presumably the instrumentation data needed for that is not getting
> returned from the worker to the leader.

Yes.

> I would bet there's a lot
> of other plan-node-specific data that doesn't work either.

That's probably true, too.  Generally, what's going to happen here is
that if you have a true parallel query plan, any of this sort of
subsidiary information is going to reflect what the leader did, but
not what the workers did.  If the leader did nothing, as in the case
of force_parallel_mode, then EXPLAIN ANALYZE will show the same thing
that it would have shown if that node had never executed.

>> I think this can never happen with force_parallel_mode set to off, because
>> we don't generate a Gather on top of the Sort node.  The reason we are
>> able to push the Sort below the Gather is that it is marked as
>> parallel_safe (create_sort_path).  I think we should not mark it as
>> parallel_safe.
>
> That seems rather ridiculous.  An oversight in managing EXPLAIN data
> is not a sufficient reason to cripple parallel query.

+1.

Fixing this is actually somewhat difficult.  The parallel query stuff
does handle propagating the common instrumentation information from
the leader to the workers, but the EXPLAIN ANALYZE output can depend
in arbitrary ways on the final executor state tree, which is, of
course, unshared, and which is also not something we can propagate
between backends since executor state nodes don't have (and can't
really support) serialization and deserialization functions.  I think
we can eventually fix this by teaching individual nodes to store the
relevant information in dynamic shared memory rather than
backend-local memory when parallel query is in use: the
Estimate/InitializeDSM callbacks already give the nodes a chance to
obtain control in the right places, except that right now they're only
invoked for parallel-aware nodes.  I think, though, that it will take
more development than we want to undertake at this point in the
cycle.
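A hedged sketch of that approach (illustrative names, not actual executor
code): a node reserves a per-worker array in dynamic shared memory through
the Estimate/InitializeDSM hooks, each worker records its node-specific
details there, and the leader reads them back when producing EXPLAIN
ANALYZE output.

#include <stdio.h>

/* Illustrative only: a node-specific chunk of dynamic shared memory. */
typedef struct SharedSortInfo
{
    int num_workers;
    int sort_method[];          /* one slot per worker (C99 flexible array) */
} SharedSortInfo;

/* Worker side, after its sort completes. */
static void
report_sort_method(SharedSortInfo *shared, int worker_id, int method)
{
    shared->sort_method[worker_id] = method;
}

/* Leader side, while producing EXPLAIN ANALYZE output. */
static void
show_worker_sort_methods(const SharedSortInfo *shared)
{
    int i;

    for (i = 0; i < shared->num_workers; i++)
        printf("Worker %d: Sort Method code %d\n", i, shared->sort_method[i]);
}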

I'm not sure about the rest of you, but I'd kind of like to finish
this release and start working on the next one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Amit Kapila
On Thu, Jul 7, 2016 at 7:37 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao  wrote:
>>> I found $SUBJECT while trying to test parallel queries. Is this a bug?
>
> Presumably the instrumentation data needed for that is not getting
> returned from the worker to the leader.  I would bet there's a lot
> of other plan-node-specific data that doesn't work either.
>
>> I think this can never happen with force_parallel_mode set to off, because
>> we don't generate a Gather on top of the Sort node.  The reason we are
>> able to push the Sort below the Gather is that it is marked as
>> parallel_safe (create_sort_path).  I think we should not mark it as
>> parallel_safe.
>
> That seems rather ridiculous.  An oversight in managing EXPLAIN data
> is not a sufficient reason to cripple parallel query.
>

I am analyzing exactly that point, and you seem to be right that we have
missed propagating some information.  We have taken care of propagating
the instrumentation information back to the leader, but it seems there
are other things that need to be taken care of in that area.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao  wrote:
>> I found $SUBJECT while trying to test parallel queries. Is this a bug?

Presumably the instrumentation data needed for that is not getting
returned from the worker to the leader.  I would bet there's a lot
of other plan-node-specific data that doesn't work either.

> I think this can never happen with force_parallel_mode set to off, because
> we don't generate a Gather on top of the Sort node.  The reason we are
> able to push the Sort below the Gather is that it is marked as
> parallel_safe (create_sort_path).  I think we should not mark it as
> parallel_safe.

That seems rather ridiculous.  An oversight in managing EXPLAIN data
is not a sufficient reason to cripple parallel query.

regards, tom lane




Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.

2016-07-07 Thread Amit Kapila
On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao  wrote:
> Hi,
>
> I found $SUBJECT while trying to test parallel queries. Is this a bug?
>
>
> In non-parallel mode, EXPLAIN ANALYZE reports the information about
> Sort Method as follows.
>
> =# EXPLAIN ANALYZE SELECT * FROM pgbench_accounts ORDER BY bid;
>                                                      QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=180739.34..183239.34 rows=1000000 width=97) (actual time=1501.342..1836.057 rows=1000000 loops=1)
>    Sort Key: bid
>    Sort Method: external sort  Disk: 104600kB
>    ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1000000 width=97) (actual time=0.013..179.315 rows=1000000 loops=1)
>
> However, in parallel mode, it's not reported, as follows.
>
> =# SET force_parallel_mode TO on;
> =# EXPLAIN ANALYZE SELECT * FROM pgbench_accounts ORDER BY bid;
>                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=181739.34..284239.34 rows=1000000 width=97) (actual time=1507.138..2394.028 rows=1000000 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Sort  (cost=180739.34..183239.34 rows=1000000 width=97) (actual time=1503.112..1901.117 rows=1000000 loops=1)
>          Sort Key: bid
>          ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1000000 width=97) (actual time=0.021..181.079 rows=1000000 loops=1)
>

I think this can never happen with force_parallel_mode set to off, because
we don't generate a Gather on top of the Sort node.  The reason we are
able to push the Sort below the Gather is that it is marked as
parallel_safe (create_sort_path).  I think we should not mark it as
parallel_safe.
Will investigate some more and send a patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] explain analyze does not report actual rows correctly?

2016-05-18 Thread Tom Lane
chang chao  writes:
> The actual rows (rows=9950) part of the following line in the above
> query plan seems strange:
> "  ->  Sort  (cost=10.64..11.14 rows=200 width=520) (actual
> time=0.045..0.561 rows=9950 loops=1)"
> Shouldn't it be 200?

No, that's probably correct, seeing that this node is the righthand child
of a mergejoin.  The discrepancy is from extra fetches due to the same row
being re-fetched multiple times thanks to mark/restore rescanning.  What
explain is reporting is the number of rows pulled from the node, not the
number of unique rows it processed.
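A hedged sketch of that accounting (illustrative names, not the real
instrument.c API): the per-node counter is bumped once per tuple pulled
from the node, so a tuple re-fetched after a mark/restore rescan is
simply counted again.

/* Illustrative only -- not PostgreSQL's actual instrumentation code. */
typedef struct NodeInstr
{
    double ntuples;     /* what EXPLAIN ANALYZE reports as "actual rows" */
} NodeInstr;

static void
count_returned_tuple(NodeInstr *instr)
{
    /* Called once per tuple returned; re-fetches count again. */
    instr->ntuples += 1;
}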

regards, tom lane




Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-14 Thread Amit Kapila
On Tue, Mar 15, 2016 at 5:22 AM, Robert Haas  wrote:
>
> On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila  wrote:
>> Yeah, that makes the addition of a test for this functionality difficult.
>> Robert, do you have any idea what kind of test would have caught this
>> issue?
>
> Yep.  Committed with that test:
>

Nice.  Thanks!

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila  wrote:
> Yeah, that makes the addition of a test for this functionality difficult.
> Robert, do you have any idea what kind of test would have caught this
> issue?

Yep.  Committed with that test:

DO $$
BEGIN
   EXECUTE 'EXPLAIN ANALYZE SELECT * INTO TABLE easi FROM int8_tbl';
END$$;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Amit Kapila
On Sat, Mar 12, 2016 at 7:11 PM, Mithun Cy  wrote:
>
> On Sat, Mar 12, 2016 at 2:32 PM, Amit Kapila  wrote:
>> With force_parallel_mode=on, I could see many other failures as well.  I
>> think it is better to have a test which tests this functionality with
>> force_parallel_mode=regress
>
> As per the user manual:
> "Setting this value to regress has all of the same effects as setting it
> to on plus some additional effects that are intended to facilitate
> automated regression testing."
>

Yes, that is the only reason I mentioned that it is better to have a test
which can be checked in an automated way.  I understand that the way you
have written the test using Explain won't work in an automated way, so I
am not sure it is a good idea to add such a test.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
Sorry, there was some issue with my mail settings; the same mail got sent
more than once.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
On Sat, Mar 12, 2016 at 2:32 PM, Amit Kapila  wrote:
> With force_parallel_mode=on, I could see many other failures as well.  I
> think it is better to have a test which tests this functionality with
> force_parallel_mode=regress

As per the user manual:
"Setting this value to regress has all of the same effects as setting it
to on plus some additional effects that are intended to facilitate
automated regression testing.  Normally, messages from a parallel worker
are prefixed with a context line, but a setting of regress suppresses
this to guarantee reproducible results.  Also, the Gather nodes added to
plans by this setting are hidden from the EXPLAIN output so that the
output matches what would be obtained if this setting were turned off."

And my test is for EXPLAIN statements; per the statement above, under
regress mode it will never fail even if a parallel scan is used.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Amit Kapila
On Sat, Mar 12, 2016 at 2:02 PM, Mithun Cy  wrote:
>
> On Sat, Mar 12, 2016 at 12:28 PM, Amit Kapila  wrote:
>> I don't see how this test will fail with force_parallel_mode=regress and
>> max_parallel_degree > 0 even without the patch proposed to fix the issue
>> at hand.  In short, I don't think this test would have caught the issue,
>> so I don't see much advantage in adding such a test.  Even if we want to
>> add such a test case, I think as proposed this will substantially
>> increase the timing for the "Select Into" test, which might not be an
>> acceptable test case addition, especially for testing one corner case.
>
> Without the above patch, make installcheck fails for select_into.sql with
> the below diff, when:
>
> force_parallel_mode = on
> max_parallel_degree = 3
>

With force_parallel_mode=on, I could see many other failures as well.  I
think it is better to have a test which tests this functionality with
force_parallel_mode=regress.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
On Sat, Mar 12, 2016 at 12:28 PM, Amit Kapila  wrote:
> I don't see how this test will fail with force_parallel_mode=regress and
> max_parallel_degree > 0 even without the patch proposed to fix the issue
> at hand.  In short, I don't think this test would have caught the issue,
> so I don't see much advantage in adding such a test.  Even if we want to
> add such a test case, I think as proposed this will substantially
> increase the timing for the "Select Into" test, which might not be an
> acceptable test case addition, especially for testing one corner case.

Without the above patch, make installcheck fails for select_into.sql with
the below diff, when:

force_parallel_mode = on
max_parallel_degree = 3

diff results/select_into.out expected/select_into.out

104,110c104,107
<          QUERY PLAN
< ----------------------------
<  Gather
<    Number of Workers: 1
<    Single Copy: true
<    ->  Seq Scan on mt1
< (4 rows)
---
>    QUERY PLAN
> -----------------
>  Seq Scan on mt1
> (1 row)


Again, with non-default postgresql.conf settings:

force_parallel_mode = on
max_parallel_degree = 3
parallel_tuple_cost = 0

[mithun@localhost regress]$ diff results/select_into.out expected/select_into.out

104,109c104,107
<           QUERY PLAN
< -------------------------------
<  Gather
<    Number of Workers: 3
<    ->  Parallel Seq Scan on mt1
< (3 rows)
---
>    QUERY PLAN
> -----------------
>  Seq Scan on mt1
> (1 row)

To reduce the execution time I can set the generate_series parameter to
500, which is fast on my machine and still fails with the above diff, but
this time only one worker is assigned in the plan.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-11 Thread Amit Kapila
On Fri, Mar 11, 2016 at 3:34 PM, Mithun Cy  wrote:
>
> On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas  wrote:
>> I guess there must not be an occurrence of this pattern in the
>> regression tests, or previous force_parallel_mode testing would have
>> found this problem.  Perhaps this patch should add one?
>
> I have added the test to select_into.sql: an Explain select into
> statement.
>

I don't see how this test will fail with force_parallel_mode=regress and
max_parallel_degree > 0 even without the patch proposed to fix the issue
at hand.  In short, I don't think this test would have caught the issue,
so I don't see much advantage in adding such a test.  Even if we want to
add such a test case, I think as proposed this will substantially
increase the timing for the "Select Into" test, which might not be an
acceptable test case addition, especially for testing one corner case.

>
> Explain Analyze produces planning time and execution time even with
> TIMING OFF, so I am not adding the same to the regress tests.
>

Yeah, that makes the addition of a test for this functionality difficult.
Robert, do you have any idea what kind of test would have caught this issue?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-11 Thread Mithun Cy
On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas  wrote:
> I guess there must not be an occurrence of this pattern in the
> regression tests, or previous force_parallel_mode testing would have
> found this problem.  Perhaps this patch should add one?

I have added the test to select_into.sql: an Explain select into
statement.  Explain Analyze produces planning time and execution time
even with TIMING OFF, so I am not adding the same to the regress tests.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/test/regress/expected/select_into.out b/src/test/regress/expected/select_into.out
index 9d3f047..bb71260 100644
--- a/src/test/regress/expected/select_into.out
+++ b/src/test/regress/expected/select_into.out
@@ -94,3 +94,16 @@ INSERT INTO b SELECT 1 INTO f;
 ERROR:  SELECT ... INTO is not allowed here
 LINE 1: INSERT INTO b SELECT 1 INTO f;
 ^
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+   QUERY PLAN
+-
+ Seq Scan on mt1
+(1 row)
+
+DROP TABLE mt1;
diff --git a/src/test/regress/sql/select_into.sql b/src/test/regress/sql/select_into.sql
index 4d1cc86..6eb5e24 100644
--- a/src/test/regress/sql/select_into.sql
+++ b/src/test/regress/sql/select_into.sql
@@ -76,3 +76,13 @@ COPY (SELECT 1 INTO frak UNION SELECT 2) TO 'blob';
 SELECT * FROM (SELECT 1 INTO f) bar;
 CREATE VIEW foo AS SELECT 1 INTO b;
 INSERT INTO b SELECT 1 INTO f;
+
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+DROP TABLE mt1;



Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 4:43 AM, Amit Kapila  wrote:
> There should be whitespace around the colon in 0:CURSOR_OPT_PARALLEL_OK.
> Also I don't see that this comment is required, as other similar usages
> don't have any such comment.  Fixed these two points in the attached
> patch.
>
> In general, the patch looks good to me and solves the problem mentioned.
> I have run the regression tests with force_parallel_mode and don't see
> any problem.

I guess there must not be an occurrence of this pattern in the
regression tests, or previous force_parallel_mode testing would have
found this problem.  Perhaps this patch should add one?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-10 Thread Amit Kapila
On Wed, Mar 9, 2016 at 8:18 PM, Mithun Cy  wrote:
>
> Hi All,
>
> Explain [Analyze] Select Into table produces a plan which uses parallel
> scans.
>
> Possible Fix:
>
> I tried to make a patch to fix this.  Now in ExplainOneQuery, if the
> into clause is defined, then parallel plans are disabled, as is done
> for their execution.
>


- /* plan the query */
- plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+ /*
+  * plan the query.
+  * Note: If Explain is for CreateTableAs / SelectInto Avoid parallel
+  *   plans.
+  */
+ plan = pg_plan_query(query, into ? 0:CURSOR_OPT_PARALLEL_OK, params);


There should be whitespace around the colon in 0:CURSOR_OPT_PARALLEL_OK.
Also I don't see that this comment is required, as other similar usages
don't have any such comment.  Fixed these two points in the attached
patch.

In general, the patch looks good to me and solves the problem mentioned.
I have run the regression tests with force_parallel_mode and don't see
any problem.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Analyze_select_into_disable_parallel_scan_v2.patch
Description: Binary data



Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote:
>  Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>    ->  Sort  (cost= rows=568733 width=175) (actual time= rows=20 loops=1)
>          Sort Method: top-N heapsort

Going off on a tangent, when I was playing with a merge-sort
implementation I propagated limit information into the sort
node, for a significant win.  Eliding the Limit node gave
a further slight win.

I wasn't convinced the use-case was common enough to justify
the replacement of quicksort (despite having consistently
fewer compares, the merge sort was slightly slower.  I never
understood why) - but I never asked.  Is there any appetite
for supporting alternate sort algorithms?
-- 
Cheers,
  Jeremy





Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Tom Lane
Jeremy Harris j...@wizmail.org writes:
> On 14/11/14 00:46, Simon Riggs wrote:
>>  Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>>    ->  Sort  (cost= rows=568733 width=175) (actual time= rows=20 loops=1)
>>          Sort Method: top-N heapsort

> Going off on a tangent, when I was playing with a merge-sort
> implementation I propagated limit information into the sort
> node, for a significant win.

I'm not entirely following.  The top-N heapsort approach already
makes use of the limit info.

If the limit is so large that the sort spills to disk, then we
stop thinking about the limit.  But I'm finding it doubtful either
that that's a case worthy of extra code or that you could get very
much win if you did add code for it.
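For reference, a hedged toy sketch of the bounded top-N idea (plain ints,
nothing like the real tuplesort.c): keep the N smallest keys in a max-heap,
so any key not smaller than the heap root is discarded immediately and the
sort never stores more than N entries.

#include <stdio.h>

#define N 5

/* Restore the max-heap property downward from index i. */
static void
sift_down(int *heap, int size, int i)
{
    for (;;)
    {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;

        if (l < size && heap[l] > heap[largest]) largest = l;
        if (r < size && heap[r] > heap[largest]) largest = r;
        if (largest == i) break;
        int tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
        i = largest;
    }
}

int main(void)
{
    int heap[N], size = 0;
    int input[] = {42, 7, 99, 3, 15, 8, 77, 1, 60, 12};
    int k;

    for (k = 0; k < 10; k++)
    {
        int v = input[k];

        if (size < N)
        {
            /* heap not yet full: insert v and sift it up */
            int i = size++;
            heap[i] = v;
            while (i > 0 && heap[(i - 1) / 2] < heap[i])
            {
                int p = (i - 1) / 2;
                int tmp = heap[i]; heap[i] = heap[p]; heap[p] = tmp;
                i = p;
            }
        }
        else if (v < heap[0])
        {
            /* smaller than the current Nth-smallest: replace the root */
            heap[0] = v;
            sift_down(heap, N, 0);
        }
        /* otherwise v can never be in the top N; drop it */
    }

    for (k = 0; k < N; k++)
        printf("%d ", heap[k]);     /* the 5 smallest, in heap order */
    printf("\n");
    return 0;
}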

regards, tom lane




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote:
> Jeremy Harris j...@wizmail.org writes:
>> On 14/11/14 00:46, Simon Riggs wrote:
>>>  Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>>>    ->  Sort  (cost= rows=568733 width=175) (actual time= rows=20 loops=1)
>>>          Sort Method: top-N heapsort
>>
>> Going off on a tangent, when I was playing with a merge-sort
>> implementation I propagated limit information into the sort
>> node, for a significant win.
>
> I'm not entirely following.  The top-N heapsort approach already
> makes use of the limit info.

Having gone back to look, you're right.  It was Uniq nodes I merged
(the sort handles both bounded-output and dedup).
-- 
Cheers,
  Jeremy






Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
>  Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>    ->  Sort  (cost= rows=568733 width=175) (actual time= rows=20 loops=1)
>          Sort Method: top-N heapsort

> The Sort estimate shows 568733 rows, whereas the actual rows are 20.

[ shrug... ]  The estimated value is the planner's estimate of what would
happen *if you ran the node to completion*, which in practice doesn't
happen because of the LIMIT.  The actual value is, well, the actual value.
We certainly should not munge around the actual value.

We could imagine munging the reported estimates to account for the parent
LIMIT, but that would make it a lot harder to understand the planner's
thought processes, because the reported estimates would have that much
less to do with the numbers actually used in the internal calculations.

regards, tom lane




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-13 Thread David G Johnston
Tom Lane-2 wrote:
> Simon Riggs <simon@...> writes:
>>  Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>>    ->  Sort  (cost= rows=568733 width=175) (actual time= rows=20 loops=1)
>>          Sort Method: top-N heapsort
>>
>> The Sort estimate shows 568733 rows, whereas the actual rows are 20.
>
> [ shrug... ]  The estimated value is the planner's estimate of what would
> happen *if you ran the node to completion*, which in practice doesn't
> happen because of the LIMIT.  The actual value is, well, the actual value.
> We certainly should not munge around the actual value.
>
> We could imagine munging the reported estimates to account for the parent
> LIMIT, but that would make it a lot harder to understand the planner's
> thought processes, because the reported estimates would have that much
> less to do with the numbers actually used in the internal calculations.

Is it even possible for a sort node directly under a limit to output (as
nebulous as that term is in this context) more rows than desired by the
limit?

The interesting thing about a sort node is not its output but its input,
i.e., the number of rows being fed to it by the node nested under it.
Which prompts the question whether it would be good to show that value as
an attribute of the sort node during EXPLAIN ANALYZE instead of having to
scan down to the child node.  I guess you can argue that we are currently
showing it, since it is the same value as the estimated rows returned.  If
you were to change that to reflect the impact of the parent limit node,
you'd probably want to add something else to reflect the child input size
(in rows, not memory).

From a pure theory standpoint, having the estimated rows reflect the input
size instead of the output size seems wrong.  In the presence of a limit
it won't output more than N rows, whereas in all other cases the input and
the output will be identical.  That said, I am only pondering this concept
because of this thread; it would help to know what sparked all of this in
the first place.  From a practical perspective the current behavior
captures the most important aspect of the sort, the size of the input, and
a user who knows about the limit isn't likely to wonder whether we are
somehow being wasteful by returning the extra rows, which are not returned
so much as scanned over in place by the parent node.

David J.










Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-13 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
> Tom Lane-2 wrote:
>> [ shrug... ]  The estimated value is the planner's estimate of what would
>> happen *if you ran the node to completion*, which in practice doesn't
>> happen because of the LIMIT.

> I don't see how a sort node cannot run to completion...

The sort must have read all of its *input*, or it can't be sure it's
giving the correct first result row.  But "run to completion" means
that it delivered all of its *output*, which obviously does not happen
when under a LIMIT.

It's entirely possible BTW that the sort's internal processing is not
complete when it starts returning rows.  For example, when we do a
spill-to-disk merge sort, the final merge pass is typically done
on-the-fly while returning rows, and so some fraction of that processing
may never be completed if the query stops early.  It's still seen all the
input rows, but it hasn't completely determined their ordering.

regards, tom lane




Re: [HACKERS] EXPLAIN (ANALYZE) broken

2013-06-05 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
> Commit 2c92edad48796119c83d7dbe6c33425d1924626d has broken
> EXPLAIN's ANALYZE option in parentheses, which also makes some
> other options unusable.
>
> test=# EXPLAIN (ANALYZE) SELECT 1;
> ERROR:  syntax error at or near "ANALYZE"
> LINE 1: EXPLAIN (ANALYZE) SELECT 1;
>                  ^

[ scratches head... ]  Coulda sworn I checked that.  Will look into it.

regards, tom lane




Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Rudyar

On 22/11/11 02:58, Kevin Grittner wrote:
> Rudyar wrote:
>> I try to get the execution time of a query workload.  I try using
>> explain analyze, but this time is always higher than the execution
>> time of the query from a client like pgadmin3.
>>
>> What is the reason for that difference?
>
> It's the "observer effect" -- there is a cost to the timing,
> counting, measuring, and reporting which is done by EXPLAIN ANALYZE,
> which distorts what is being measured.  It's just about impossible to
> avoid entirely.
>
> -Kevin

Thanks Kevin,

What tool do you recommend to measure the real query execution time?

--
Rudyar Cortés.
Informatics Civil Engineering student
Universidad Técnica Federico Santa María.




Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Kevin Grittner
Rudyar  wrote:

> What tool do you recommend to measure the real query execution time?
 
The -hackers list is for discussion to coordinate development of the
PostgreSQL database product.  For user questions like this, please
pick a more appropriate list based on the descriptions here:
 
http://www.postgresql.org/community/lists/
 
While any further follow-up should be on another list, I'll briefly
answer here.  EXPLAIN ANALYZE is great for seeing how a query is
being executed, but for accurate timing of how long the query runs
without generating all that extra information, you can measure it on
the client side, or turn on logging of statements running longer than
some particular time.  In psql you can use \timing on; in Java you
can run System.currentTimeMillis() before and after running the
query, etc.
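To make the client-side option concrete, a minimal hedged sketch using
libpq (the conninfo string and query are placeholders, and error handling
is pared down):

#include <stdio.h>
#include <sys/time.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");  /* placeholder */
    struct timeval t0, t1;
    double ms;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    gettimeofday(&t0, NULL);
    PGresult *res = PQexec(conn, "SELECT count(*) FROM pgbench_accounts");
    gettimeofday(&t1, NULL);

    /* wall-clock time around PQexec, the same idea as psql's \timing */
    ms = (t1.tv_sec - t0.tv_sec) * 1000.0
       + (t1.tv_usec - t0.tv_usec) / 1000.0;
    printf("query took %.3f ms\n", ms);

    PQclear(res);
    PQfinish(conn);
    return 0;
}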
 
-Kevin





Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Rudyar

On 22/11/11 10:26, Kevin Grittner wrote:
> Rudyar wrote:
>> What tool do you recommend to measure the real query execution time?
>
> The -hackers list is for discussion to coordinate development of the
> PostgreSQL database product.  For user questions like this, please
> pick a more appropriate list based on the descriptions here:
>
> http://www.postgresql.org/community/lists/
>
> While any further follow-up should be on another list, I'll briefly
> answer here.  EXPLAIN ANALYZE is great for seeing how a query is
> being executed, but for accurate timing of how long the query runs
> without generating all that extra information, you can measure it on
> the client side, or turn on logging of statements running longer than
> some particular time.  In psql you can use \timing on; in Java you
> can run System.currentTimeMillis() before and after running the
> query, etc.
>
> -Kevin

Thanks Kevin ;)




Re: [HACKERS] explain analyze query execution time

2011-11-21 Thread Jeff Janes
On 11/21/11, Rudyar rudyar.cor...@gmail.com wrote:
> Hello,
>
> I try to get the execution time of a query workload.  I try using
> explain analyze, but this time is always higher than the execution
> time of the query from a client like pgadmin3.
>
> What is the reason for that difference?

Analyze has to do a lot of gettimeofday calls and other bookkeeping,
and that takes time.  If the data queried is all in memory, this
analyze overhead can be far more than the query itself.
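A hedged way to get a feel for the scale of that overhead (a standalone
microbenchmark, not PostgreSQL code) is to time gettimeofday() itself:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end, tmp;
    const long n = 10 * 1000 * 1000;
    long i;
    double secs;

    gettimeofday(&start, NULL);
    for (i = 0; i < n; i++)
        gettimeofday(&tmp, NULL);   /* EXPLAIN ANALYZE makes two such
                                     * calls per row per plan node */
    gettimeofday(&end, NULL);

    secs = (end.tv_sec - start.tv_sec)
         + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.1f ns per gettimeofday() call\n", secs / n * 1e9);
    return 0;
}

Multiply the per-call cost by two timing calls per row per node and the
distortion on a fully-cached scan becomes easy to see.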

Jeff



Re: [HACKERS] explain analyze query execution time

2011-11-21 Thread Kevin Grittner
Rudyar  wrote:
>
> I try to get the execution time of a query workload.  I try using
> explain analyze, but this time is always higher than the execution
> time of the query from a client like pgadmin3.
>
> What is the reason for that difference?

It's the "observer effect" -- there is a cost to the timing,
counting, measuring, and reporting which is done by EXPLAIN ANALYZE,
which distorts what is being measured.  It's just about impossible to
avoid entirely.
 
-Kevin



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-22 Thread Magnus Hagander
On Mon, Nov 15, 2010 at 03:33, Greg Stark st...@mit.edu wrote:
> This is an update to my earlier patch to add getrusage resource
> tracking to EXPLAIN ANALYZE.
>
> With this patch you get something like:
>
>                          QUERY PLAN
> --------------------------------------------------------------
>  Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual time=17.240..1123.751 rows=262144 loops=1)
>    Resources: sys=210.000ms user=430.000ms read=33.6MB
>    Buffers: shared read=4298
>  Total runtime: 1548.651 ms
> (4 rows)
>
> The main change is to make it work under Windows.  At least I think the
> changes should make it work under Windows; I haven't been able to test
> it.  Actually I'm not too happy with the way I did it; I would be more
> inclined to hack the getrusagestub.h definition of struct rusage to
> have an instr_time in it so that we can use the same macros directly.
> But that's more changes than I would be happy making without being
> able to compile them to test them.

I tried building this under Windows, and got a bunch of errors.

First and easiest: you need to rename IOCOUNTERS to IO_COUNTERS in
getrusage.c :P

But then I get a number of:

c:\pgsql\src\include\portability/instr_time.h(133) : error C2371:
'instr_time' : redefinition; different basic types

and

.\src\backend\utils\adt\pgstatfuncs.c(1345) : error C2039: 'QuadPart' : is
not a member of 'timeval'
C:\Program Files\Microsoft SDKs\Windows\v6.1\include\winsock2.h(176) :
see declaration of 'timeval'

which I believe are related to the same issue.  Haven't looked closely
enough to figure out what you actually intend for it to be :-)

Finally, a number of:

c:\pgsql\src\include\executor/instrument.h(19) : fatal error C1083: Cannot
open include file: 'sys/resource.h': No such file or directory

That include file simply doesn't exist on Windows.  Hiding it behind an
#ifdef complains about fields missing in struct rusage in some cases and
the lack of an existing rusage definition in others.  I think you need
includes of pg_rusage.h, which will make sure it brings in rusagestub.h
when necessary and sys/resource.h when it's there?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-22 Thread Greg Stark
On Mon, Nov 22, 2010 at 12:40 PM, Magnus Hagander mag...@hagander.net wrote:
 I tried building this under windows, and got a bunch of errors.

Thanks!

-- 
greg



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-16 Thread Greg Stark
On Tue, Nov 16, 2010 at 2:43 AM, Robert Haas robertmh...@gmail.com wrote:
> I don't really think these changes to the INSTR macros make much
> sense.  The macros don't really add much notational convenience;
> they're mostly wrappers to make the WIN32 and non-WIN32 cases work
> similarly for the instrumentation stuff, so hacking them up to use
> them for this doesn't seem like it adds anything.  Just do whatever it
> is you need to do, or define macros locally in explain.c.

Well, they're going to be just like the macros in instr_time, and I
didn't really want to duplicate them.  What's really goofy here is that
on Windows the getrusage portability wrapper is actually consing up
those times from FILETIMEs, which I believe are actually just like the
instr_time data type that we're already using in those macros.  So it
ends up doing a silly dance converting to struct timeval just so it
can use macros that aren't convenient at all on Windows.

I definitely agree this is awkward.  I would love to find a cleaner
solution here.  I'll try copying the macros and see if that works out
more cleanly.
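A hedged sketch of that direction (type and field names are assumptions,
purely illustrative): give the Windows rusage stub the same time
representation the instrumentation code already uses, so no round-trip
through struct timeval is needed.

#ifdef WIN32
typedef LARGE_INTEGER my_instr_time;    /* high-resolution tick count */
#else
#include <sys/time.h>
typedef struct timeval my_instr_time;   /* seconds + microseconds */
#endif

/* A stub rusage whose fields share the instr_time representation, so
 * one set of accumulate/subtract macros can serve both platforms. */
struct my_rusage
{
    my_instr_time ru_utime;     /* user CPU time */
    my_instr_time ru_stime;     /* system CPU time */
};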


> It doesn't make much sense to me to normalize the memory for this
> output to a variable unit when the other memory values we use in
> explain.c are still going to be printed as kB.  I think we should just
> print it in kB and call it good.  Alternatively, we could apply the
> same normalization algorithm across the board, but I don't think
> that's as good.

I think we should have a project policy of always printing memory and
disk usage in kB, MB, GB etc unless they're functions returning an
integer intended for machine use. Effectively this is the dual of
accepting units on all our memory gucs. I don't know about others but
I find it pretty hard to read things like 1234567kB and compare it to
125765kB in my head.

> I continue to feel strongly that the choice of EXPLAIN format should
> only affect the format, not the choice of information to be displayed.

I don't have a good feeling for what should appear in
XML/JSON/whatever.  I thought I was just implementing the consensus,
which was that there wasn't much point in not including the maximum
level of detail, because the consumer could decide what to display.  But
having it be the same as the human-readable output is also reasonable.
I'll make this consistent with whatever we're doing elsewhere.

> Using the verbose option to control how much data the resource option
> prints is, I think, not a good idea.  If you want to have two modes,
> one for partial rusage data and one for full rusage data, you can just
> as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]).  I believe that
> the existing grammar is adequate to support that; you'd just need to
> write the appropriate DefElem-parsing code.  But personally I'd just
> print the whole kit and kaboodle regardless.

This is a separate question.  The numbers I'm outputting without VERBOSE
now are the numbers that I understand and which relate to the other
numbers in explain.  My intent was that as we understand how the other
numbers relate to Postgres's functioning we'll learn which ones to
include.  But mostly they're hard to relate to Postgres and the query
execution in any way, and just noise that is hard to explain to users.
I'm not even sure which ones it makes sense to sum over time -- is the
integral virtual memory usage over time something that it makes sense
to sum over time?  What units is it in?

I don't see why VERBOSE isn't the right key to use. The other things
VERBOSE toggles are also extra detail that's usually useless but might
be useful if you're interested in the inner workings. Things like
which fields are being included at each level of the plan for example.

-- 
greg



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-16 Thread Robert Haas
On Tue, Nov 16, 2010 at 12:19 PM, Greg Stark st...@mit.edu wrote:
> On Tue, Nov 16, 2010 at 2:43 AM, Robert Haas robertmh...@gmail.com wrote:
>> I don't really think these changes to the INSTR macros make much
>> sense.  The macros don't really add much notational convenience;
>> they're mostly wrappers to make the WIN32 and non-WIN32 cases work
>> similarly for the instrumentation stuff, so hacking them up to use
>> them for this doesn't seem like it adds anything.  Just do whatever it
>> is you need to do, or define macros locally in explain.c.
>
> Well, they're going to be just like the macros in instr_time, and I
> didn't really want to duplicate them.  What's really goofy here is that
> on Windows the getrusage portability wrapper is actually consing up
> those times from FILETIMEs, which I believe are actually just like the
> instr_time data type that we're already using in those macros.  So it
> ends up doing a silly dance converting to struct timeval just so it
> can use macros that aren't convenient at all on Windows.
>
> I definitely agree this is awkward.  I would love to find a cleaner
> solution here.  I'll try copying the macros and see if that works out
> more cleanly.

OK.

>> It doesn't make much sense to me to normalize the memory for this
>> output to a variable unit when the other memory values we use in
>> explain.c are still going to be printed as kB.  I think we should just
>> print it in kB and call it good.  Alternatively, we could apply the
>> same normalization algorithm across the board, but I don't think
>> that's as good.

> I think we should have a project policy of always printing memory and
> disk usage in kB, MB, GB etc unless they're functions returning an
> integer intended for machine use.  Effectively this is the dual of
> accepting units on all our memory gucs.  I don't know about others but
> I find it pretty hard to read things like 1234567kB and compare it to
> 125765kB in my head.

We have no other place in the system that takes 12345678kB and
converts it inexactly to MB or GB.

rhaas=# set work_mem to '1048576kB';
SET
rhaas=# show work_mem;
 work_mem
--
 1GB
(1 row)

rhaas=# set work_mem to '1048577kB';
SET
rhaas=# show work_mem;
 work_mem
---
 1048577kB
(1 row)

We could decide that for text-format EXPLAIN output, an inexact
conversion is OK, but it's still not OK to do it for the new fields
you're adding and not for the existing fields (see the sort and hash
instrumentation).  Personally, I'm not really convinced that making
such a change has a lot of value, and I think it should be submitted
as a separate patch and discussed separately, rather than being rolled
in here.  But if we are going to change it then we at least need to be
consistent.
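For what it's worth, a hedged sketch of the exact-only rule the work_mem
example demonstrates (illustrative, not the actual GUC display code):
promote to a larger unit only when no precision is lost.

#include <stdio.h>

static void
show_memory_kb(long kb)
{
    if (kb % (1024L * 1024) == 0)
        printf("%ldGB\n", kb / (1024L * 1024));
    else if (kb % 1024 == 0)
        printf("%ldMB\n", kb / 1024);
    else
        printf("%ldkB\n", kb);      /* inexact cases stay in kB */
}

Under this rule 1048576kB prints as 1GB while 1048577kB stays in kB,
matching the behavior shown above.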

>> Using the verbose option to control how much data the resource option
>> prints is, I think, not a good idea.  If you want to have two modes,
>> one for partial rusage data and one for full rusage data, you can just
>> as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]).  I believe that
>> the existing grammar is adequate to support that; you'd just need to
>> write the appropriate DefElem-parsing code.  But personally I'd just
>> print the whole kit and kaboodle regardless.

> This is a separate question.  The numbers I'm outputting without VERBOSE
> now are the numbers that I understand and which relate to the other
> numbers in explain.  My intent was that as we understand how the other
> numbers relate to Postgres's functioning we'll learn which ones to
> include.  But mostly they're hard to relate to Postgres and the query
> execution in any way, and just noise that is hard to explain to users.
> I'm not even sure which ones it makes sense to sum over time -- is the
> integral virtual memory usage over time something that it makes sense
> to sum over time?  What units is it in?

I suspect that the ru_maxrss, ru_ixrss, ru_idrss, and ru_isrss values
are useless to us for EXPLAIN purposes because it sounds like they
don't increment over time.  The others presumably do, so it makes
sense to show 'em if the system is collecting them.
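A hedged illustration of the underlying before/after-diff idea
(standalone C, not the patch itself): sample the cumulative counters
around a piece of work and report the deltas.

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage r0, r1;
    volatile double sum = 0.0;
    long i;
    double user_ms, sys_ms;

    getrusage(RUSAGE_SELF, &r0);
    for (i = 0; i < 100 * 1000 * 1000L; i++)
        sum += (double) i;          /* stand-in for executing a plan node */
    getrusage(RUSAGE_SELF, &r1);

    user_ms = (r1.ru_utime.tv_sec - r0.ru_utime.tv_sec) * 1000.0
            + (r1.ru_utime.tv_usec - r0.ru_utime.tv_usec) / 1000.0;
    sys_ms = (r1.ru_stime.tv_sec - r0.ru_stime.tv_sec) * 1000.0
           + (r1.ru_stime.tv_usec - r0.ru_stime.tv_usec) / 1000.0;

    printf("user=%.1fms sys=%.1fms\n", user_ms, sys_ms);
    return 0;
}

Counters like ru_utime and ru_stime accumulate, so the delta is
meaningful; a high-water mark like ru_maxrss would not diff usefully.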

> I don't see why VERBOSE isn't the right key to use. The other things
> VERBOSE toggles are also extra detail that's usually useless but might
> be useful if you're interested in the inner workings. Things like
> which fields are being included at each level of the plan for example.

Yeah, VERBOSE is kind of a catch-all for things that we don't have
individual flags for.  But I think it's better for each piece of data
to depend on one setting, rather than a combination of two or more
settings.  Otherwise you end up being forced to use VERBOSE and then
you get this deluge of output.  I'd actually sort of like to remove
some things from VERBOSE and give them their own settings, rather than
adding more.  The fact that VERBOSE turns on Output: is particularly
annoying.

...Robert



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-16 Thread Greg Stark
On Tue, Nov 16, 2010 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote:
>> I think we should have a project policy of always printing memory and
>> disk usage in kB, MB, GB etc unless they're functions returning an
>> integer intended for machine use.
>
> rhaas=# set work_mem to '1048577kB';

Interesting.  Though in this case I'll mention the raw data is in
512-byte increments, so that would imply having to print .5 sometimes.

 We could decide that for text-format EXPLAIN output, an inexact
 conversion is OK, but it's still not OK to do it for the new fields
 you're adding and not for the existing fields (see the sort and hash
 instrumentation).  Personally, I'm not really convinced that making
 such a change has a lot of value, and I think it should be submitted
 as a separate patch and discussed separately, rather than being rolled
 in here.  But if we are going to change it then we at least need to be
 consistent.

Well I proposed that as a separate patch back pre-9.0 and it was
problematic. Having a policy of doing x doesn't mean we have to switch
everything to x or do nothing. We can have a policy that it's better
to do x and then go around fixing things as we find them.

But fine, I'll separate that into a separate patch.

 I suspect that the ru_maxrss, ru_ixrss, ru_idrss, and ru_isrss values
 are useless to us for EXPLAIN purposes because it sounds like they
 don't increment over time.  The others presumably do, so it makes
 sense to show 'em if the system is collecting them.

I'm not sure; the "integral" part might mean it is OK to sum.  I've
never seen it explained anywhere exactly what these represent.  But the
other question is whether they have anything to do with the plan
nodes.  If Postgres is taking lots of memory or receiving signals or
receiving sysv messages (we don't even use sysv messages, so that would
be very strange) it doesn't necessarily have anything to do with the
executor algorithm.  They're probably just not very interesting stats
to present to the user.

 Yeah, VERBOSE is kind of a catch-all for things that we don't have
 individual flags for.  But I think it's better for each piece of data
 to depend on one setting, rather than a combination of two or more
 settings.  Otherwise you end up being forced to use VERBOSE and then
 you get this deluge of output.  I'd actually sort of like to remove
 some things from VERBOSE and give them their own settings, rather than
 adding more.  The fact that VERBOSE turns on Output: is particularly
 annoying.

I tend to think it's "don't be clever about showing me just the useful
stuff; include whatever you've got even if you think it's not useful."
I would consider it a bug if there's anything that requires VERBOSE
and a user finds is relevant to fixing a user problem (as opposed to
debugging a postgres bug).

I'm concerned about the converse problem -- the "why do I have to
include two dozen flags to get postgres to actually include
everything?".  Along with the "try turning on this flag and resending
the plan.  Hm, oops, forgot a flag; resend it again with this too.  Oh,
that's still not including something we need; try it a third time with
this flag" etc.

-- 
greg



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-16 Thread Robert Haas
On Tue, Nov 16, 2010 at 2:53 PM, Greg Stark st...@mit.edu wrote:
>> Yeah, VERBOSE is kind of a catch-all for things that we don't have
>> individual flags for.  But I think it's better for each piece of data
>> to depend on one setting, rather than a combination of two or more
>> settings.  Otherwise you end up being forced to use VERBOSE and then
>> you get this deluge of output.  I'd actually sort of like to remove
>> some things from VERBOSE and give them their own settings, rather than
>> adding more.  The fact that VERBOSE turns on Output: is particularly
>> annoying.
>
> I tend to think it's "don't be clever about showing me just the useful
> stuff; include whatever you've got even if you think it's not useful."
> I would consider it a bug if there's anything that requires VERBOSE
> and a user finds is relevant to fixing a user problem (as opposed to
> debugging a postgres bug).
>
> I'm concerned about the converse problem -- the "why do I have to
> include two dozen flags to get postgres to actually include
> everything?".  Along with the "try turning on this flag and resending
> the plan.  Hm, oops, forgot a flag; resend it again with this too.  Oh,
> that's still not including something we need; try it a third time with
> this flag" etc.

Well, the reason's pretty obvious in this case: each of ANALYZE,
BUFFERS, and RESOURCE adds a huge amount of incremental overhead.  I
wouldn't object to having a flag that says just turn on absolutely
everything you've got.  Then you could just say:

EXPLAIN (THECRAPOUTOFIT) query...

In any case, this seems like an argument for making RESOURCE print
everything we think might ever be useful, rather than just some subset
of it.  A rule that says "print some of the RESOURCE information when
RESOURCE is specified, but the rest only if VERBOSE is also specified
or if a non-text output format is used" seems to have exactly the sort
of ickiness you're complaining about; I'd rather make it all or
nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 9:33 PM, Greg Stark st...@mit.edu wrote:
> This is an update to my earlier patch to add getrusage resource
> tracking to EXPLAIN ANALYZE.
>
> With this patch you get something like:
>
>                          QUERY PLAN
> --------------------------------------------------------------
>  Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual time=17.240..1123.751 rows=262144 loops=1)
>    Resources: sys=210.000ms user=430.000ms read=33.6MB
>    Buffers: shared read=4298
>  Total runtime: 1548.651 ms
> (4 rows)
>
> The main change is to make it work under Windows.  At least I think the
> changes should make it work under Windows; I haven't been able to test
> it.  Actually I'm not too happy with the way I did it; I would be more
> inclined to hack the getrusagestub.h definition of struct rusage to
> have an instr_time in it so that we can use the same macros directly.
> But that's more changes than I would be happy making without being
> able to compile them to test them.

I don't really think these changes to the INSTR macros make much
sense.  The macros don't really add much notational convenience;
they're mostly wrappers to make the WIN32 and non-WIN32 cases work
similarly for the instrumentation stuff, so hacking them up to use
them for this doesn't seem like it adds anything.  Just do whatever it
is you need to do, or define macros locally in explain.c.
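
To make the "define macros locally in explain.c" route concrete, here is a
minimal sketch of what the Unix side could look like, assuming plain POSIX
getrusage(); timeval_diff_ms() and measure_rusage() are made-up names for
illustration, not anything in the tree:

    #include <sys/resource.h>

    /* millisecond delta between two struct timevals */
    static double
    timeval_diff_ms(const struct timeval *end, const struct timeval *start)
    {
        return 1000.0 * (end->tv_sec - start->tv_sec) +
               (end->tv_usec - start->tv_usec) / 1000.0;
    }

    /* capture the user/system CPU consumed across fn() */
    static void
    measure_rusage(void (*fn)(void), double *user_ms, double *sys_ms)
    {
        struct rusage before, after;

        getrusage(RUSAGE_SELF, &before);
        fn();
        getrusage(RUSAGE_SELF, &after);
        *user_ms = timeval_diff_ms(&after.ru_utime, &before.ru_utime);
        *sys_ms = timeval_diff_ms(&after.ru_stime, &before.ru_stime);
    }

The Windows port could then supply its own equivalents instead of contorting
the shared INSTR macros.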

It doesn't make much sense to me to normalize the memory for this
output to a variable unit when the other memory values we use in
explain.c are still going to be printed as kB.  I think we should just
print it in kB and call it good.  Alternatively, we could apply the
same normalization algorithm across the board, but I don't think
that's as good.

I continue to feel strongly that the choice of EXPLAIN format should
only affect the format, not the choice of information to be displayed.
 Using the verbose option to control how much data the resource option
prints is, I think, not a good idea.  If you want to have two modes,
one for partial rusage data and one for full rusage data, you can just
as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]).  I believe that
the existing grammar is adequate to support that; you'd just need to
write the appropriate DefElem-parsing code.  But personally I'd just
print the whole kit and kaboodle regardless.
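
For reference, a sketch of the DefElem parsing that would take, modeled on
the existing option loop in ExplainQuery() and assuming its surrounding
declarations; the "resource" option name and the RESOURCE_* values are
hypothetical:

    foreach(lc, stmt->options)
    {
        DefElem    *opt = (DefElem *) lfirst(lc);

        if (strcmp(opt->defname, "resource") == 0)
        {
            /* a bare RESOURCE would default to FULL */
            char       *mode = opt->arg ? defGetString(opt) : "full";

            if (strcmp(mode, "full") == 0)
                es.resource = RESOURCE_FULL;
            else if (strcmp(mode, "partial") == 0)
                es.resource = RESOURCE_PARTIAL;
            else
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("unrecognized value for EXPLAIN option \"resource\": \"%s\"",
                                mode)));
        }
    }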

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Simon Riggs

On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote:

 What I'd find strange about 6.67 rows in your example is more that on
 the estimated rows side, it seems to imply an unrealistically precise estimate
 in the same way that 667 rows would seem unrealistically precise to me.
 Maybe rounding to 2 significant digits would reduce confusion?

You're right that the number of significant digits already exceeds the
true accuracy of the computation. I think what Robert wants to see is
the exact value used in the calc, so the estimates can be checked more
thoroughly than is currently possible.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Robert Haas

On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote:
 What I'd find strange about 6.67 rows in your example is more that on
 the estimated rows side, it seems to imply an unrealistically precise
 estimate in the same way that 667 rows would seem unrealistically
 precise to me. Maybe rounding to 2 significant digits would reduce
 confusion?

 You're right that the number of significant digits already exceeds the
 true accuracy of the computation. I think what Robert wants to see is
 the exact value used in the calc, so the estimates can be checked more
 thoroughly than is currently possible.

Bingo.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You're right that the number of significant digits already exceeds the
 true accuracy of the computation. I think what Robert wants to see is
 the exact value used in the calc, so the estimates can be checked more
 thoroughly than is currently possible.

 Bingo.

Uh, the planner's estimate *is* an integer.  What was under discussion
(I thought) was showing some fractional digits in the case where EXPLAIN
ANALYZE is outputting a measured row count that is an average over
multiple loops, and therefore isn't necessarily an integer.  In that
case the measured value can be considered arbitrarily precise --- though
I think in practice one or two fractional digits would be plenty.
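
(To make the arithmetic concrete: a node that returned 20 tuples in total
across 3 loops would be reported as rows=20/3, i.e. rows=6.67, rather than
the rounded rows=7.)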

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Robert Haas



On Jun 2, 2009, at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
 On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You're right that the number of significant digits already exceeds the
 true accuracy of the computation. I think what Robert wants to see is
 the exact value used in the calc, so the estimates can be checked more
 thoroughly than is currently possible.

 Bingo.

 Uh, the planner's estimate *is* an integer.  What was under discussion
 (I thought) was showing some fractional digits in the case where EXPLAIN
 ANALYZE is outputting a measured row count that is an average over
 multiple loops, and therefore isn't necessarily an integer.  In that
 case the measured value can be considered arbitrarily precise --- though
 I think in practice one or two fractional digits would be plenty.

We're in violent agreement here.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-06-01 Thread Ron Mayer
Euler Taveira de Oliveira wrote:
 Robert Haas escreveu:
 ...EXPLAIN ANALYZE reports the number of rows as an integer...  Any
 chance we could reconsider this decision?  I often find myself wanting
 to know the value that is here called ntuples, but rounding
 ntuples/nloops off to the nearest integer loses too much precision.

 Don't you think it is too strange having, for example, 6.67 rows? It would
 confuse users and programs that parse the EXPLAIN output. However, I wouldn't object

I don't think it's that confusing.   If it says 0.1 rows, I imagine most
people would infer that this means typically 0, but sometimes 1 or a few rows.

What I'd find strange about 6.67 rows in your example is more that on
the estimated rows side, it seems to imply an unrealistically precise estimate
in the same way that 667 rows would seem unrealistically precise to me.
Maybe rounding to 2 significant digits would reduce confusion?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-05-29 Thread Tom Lane
Joshua Tolley eggyk...@gmail.com writes:
 On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote:
 On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
 Don't you think it is too strange having, for example, 6.67 rows?
 
 No stranger than having it say 7 when it's really not.  Actually mine
 mostly come out 1 when the real value is somewhere between 0.5 and
 1.49.  :-(

 +1. It would help users realize more quickly that some of the values in the
 EXPLAIN output are, for instance, *average* number of rows *per iteration*
 of a nested loop, say, rather than total rows found in all loops.

I think it would only be sensible to show fractional digits if nloops is
greater than 1.  Otherwise the value must in fact be an integer, and
you're just going to confuse people more by suggesting that it might not
be.
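
A minimal sketch of that rule against explain.c's row-count output; the
variable names follow this discussion rather than the actual source:

    double      rows = ntuples / nloops;

    if (nloops > 1)
        appendStringInfo(str, " rows=%.2f loops=%.0f", rows, nloops);
    else
        appendStringInfo(str, " rows=%.0f loops=%.0f", rows, nloops);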

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-05-29 Thread Robert Haas
On Fri, May 29, 2009 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joshua Tolley eggyk...@gmail.com writes:
 On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote:
 On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
 Don't you think it is too strange having, for example, 6.67 rows?

 No stranger than having it say 7 when it's really not.  Actually mine
 mostly come out 1 when the real value is somewhere between 0.5 and
 1.49.  :-(

 +1. It would help users realize more quickly that some of the values in the
 EXPLAIN output are, for instance, *average* number of rows *per iteration*
 of a nested loop, say, rather than total rows found in all loops.

 I think it would only be sensible to show fractional digits if nloops is
 greater than 1.  Otherwise the value must in fact be an integer, and
 you're just going to confuse people more by suggesting that it might not
 be.

That might be over-engineering, but I'll take it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-05-28 Thread Euler Taveira de Oliveira
Robert Haas escreveu:
 I have always assumed that there is some very good reason why EXPLAIN
 ANALYZE reports the number of rows as an integer rather than a
 floating point value, but in reading explain.c it seems that the
 reason is just that we decided to round to zero decimal places.  Any
 chance we could reconsider this decision?  I often find myself wanting
 to know the value that is here called ntuples, but rounding
 ntuples/nloops off to the nearest integer loses too much precision.
 
Don't you think it is too strange having, for example, 6.67 rows? It would confuse
users and programs that parse the EXPLAIN output. However, I wouldn't object
to adding ntuples to an extended explain output (as discussed in the other thread).


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-05-28 Thread Robert Haas
On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 Robert Haas escreveu:
 I have always assumed that there is some very good reason why EXPLAIN
 ANALYZE reports the number of rows as an integer rather than a
 floating point value, but in reading explain.c it seems that the
 reason is just that we decided to round to zero decimal places.  Any
 chance we could reconsider this decision?  I often find myself wanting
 to know the value that is here called ntuples, but rounding
 ntuples/nloops off to the nearest integer loses too much precision.

 Don't you think it is too strange having, for example, 6.67 rows?

No stranger than having it say 7 when it's really not.  Actually mine
mostly come out 1 when the real value is somewhere between 0.5 and
1.49.  :-(

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] explain analyze rows=%.0f

2009-05-28 Thread Joshua Tolley
On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote:
 On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
  Don't you think it is too strange having, for example, 6.67 rows?
 
 No stranger than having it say 7 when it's really not.  Actually mine
 mostly come out 1 when the real value is somewhere between 0.5 and
 1.49.  :-(

+1. It would help users realize more quickly that some of the values in the
EXPLAIN output are, for instance, *average* number of rows *per iteration* of a
nested loop, say, rather than total rows found in all loops. That's an
important distinction that isn't immediately clear to the novice EXPLAIN
reader, but would become so very quickly as users tried to figure out how a
scan could come up with a fractional row.

- Josh / eggyknap




Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2008-01-03 Thread Robert Lor

Greg,

Gregory Stark wrote:

I don't think DTrace is overkill either. The programmatic interface is
undocumented (but I've gotten Sun people to admit it exists -- I just have to
reverse engineer it from the existing code samples) but should be more or less
exactly what we need.

  
You  probably know this already. There are existing commands that use 
the programmatic interface and would provide a good starting point. Here 
are a couple:


http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/lockstat/lockstat.c
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/dtrace/dtrace.c

One of my colleagues is in the process of putting a tutorial together 
for how to do this, so if you decide to pursue this approach and need 
assistance, please let me know.


Regards,
Robert

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-18 Thread Gregory Stark
Decibel! [EMAIL PROTECTED] writes:

 Also, has anyone looked into adding a class of system calls that  would
 actually tell us if the kernel issued physical IO? I find it  hard to believe
 that other RDBMSes wouldn't like to have that info...

Yeah, I think that's called DTrace

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-18 Thread Gregory Stark

Decibel! [EMAIL PROTECTED] writes:

 When a read() call returns, surely the kernel knows whether it actually
 issued a physical read request to satisfy that. I don't see any reason why
 you couldn't have a version of read() that returns that information. I also
 rather doubt that we're the only userland software that would make use of that.

I'm told this exists on Windows for the async interface. But AFAIK it doesn't
on Unix. The visibility into things like this is what makes DTrace so
remarkable.

I think there isn't much userland software interested in this. The only two
cases I can think of are databases -- which use direct I/O partly because of
this issue -- and real-time software like multimedia software -- which isn't
so much interested in measuring it as in forcing things to be preloaded
with stuff like posix_fadvise() or mlock().

I don't think DTrace is overkill either. The programmatic interface is
undocumented (but I've gotten Sun people to admit it exists -- I just have to
reverse engineer it from the existing code samples) but should be more or less
exactly what we need.

But the lowest-common-denominator of just timing read() and seeing if it took
long enough to involve either a context switch or sleeping on physical i/o
should be a pretty close approximation. The case where it would be least
accurate is when most or all of the data is actually in the cache. Then even
a low false-positive rate for detecting cache misses will still dominate the
true, near-zero rate of cache misses.

We could mitigate that somewhat by describing it in the plan as something like

... (... I/O fast=nnn slow=nnn)

instead of the more descriptive "physical" and "logical".
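
A minimal sketch of that heuristic, assuming an x86 TSC read via __rdtsc();
the cycle threshold is a pure guess and counted_read() is a hypothetical
wrapper, not anything in the tree:

    #include <stdint.h>
    #include <unistd.h>
    #include <x86intrin.h>          /* __rdtsc(); x86-specific */

    #define SLOW_READ_CYCLES 100000 /* guess: above this, assume we slept */

    static long fast_reads, slow_reads;

    static ssize_t
    counted_read(int fd, void *buf, size_t len)
    {
        uint64_t    start = __rdtsc();
        ssize_t     n = read(fd, buf, len);

        /*
         * Unsigned wraparound from TSC skew across CPUs yields a huge
         * delta and lands in the slow bucket, which is fine: as argued
         * above, a CPU migration implies a context switch anyway.
         */
        if (__rdtsc() - start > SLOW_READ_CYCLES)
            slow_reads++;
        else
            fast_reads++;
        return n;
    }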

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-17 Thread Decibel!

On Dec 14, 2007, at 11:10 PM, Neil Conway wrote:
 But it occurred to me just now that the hardware instruction counter
 available on just about every platform would be good enough for a
 heuristic guess at whether the read(2) was cached.

 I'm skeptical that this would be reliable enough to be very useful,
 especially in the face of concurrent, unpredictable system activity on a
 busy system. I agree that it would be useful information, though.
 Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
 the number of logical and physical I/Os from Postgres' perspective (i.e.
 physical I/O just means we need to go to the kernel).



*watches hands wave*

If we assume that what we *really* want to see the difference of is a  
cache IO vs one that truly hits a platter, I don't see how you could  
come up with enough variation to account for that. Unless you're  
super lucky and the sector you need is just about to hit the head  
when you issue the request, you're going to have a delay measured in  
milliseconds. 1ms on a modern CPU is over 100,000 cycles. That's a  
lot of instructions...


Someone want to throw together some code that actually measures this?  
Maybe something that keeps a histogram of how many instructions take  
place per I/O request? If it turns out that counters do vary too much  
between CPUs, there might be ways that we can account for that.
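
A sketch of that experiment, leaning on the x86 TSC for lack of a portable
counter; histogram_read() is hypothetical. A bimodal histogram, with one
peak for cached reads and one for platter reads, would support the idea:

    #include <stdint.h>
    #include <unistd.h>
    #include <x86intrin.h>

    static long cycle_histogram[64];    /* bucket = log2(cycles per read) */

    static ssize_t
    histogram_read(int fd, void *buf, size_t len)
    {
        uint64_t    start = __rdtsc();
        ssize_t     n = read(fd, buf, len);
        uint64_t    cycles = __rdtsc() - start;

        /* the "| 1" keeps __builtin_clzll away from a zero argument */
        cycle_histogram[63 - __builtin_clzll(cycles | 1)]++;
        return n;
    }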


Also, has anyone looked into adding a class of system calls that  
would actually tell us if the kernel issued physical IO? I find it  
hard to believe that other RDBMSes wouldn't like to have that info...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-17 Thread Greg Smith

On Mon, 17 Dec 2007, Decibel! wrote:

Someone want to throw together some code that actually measures this? Maybe 
something that keeps a histogram of how many instructions take place per I/O 
request? If it turns out that counters do vary too much between CPUs, there 
might be ways that we can account for that.


I'd expect the easiest way to do a proof of concept here would be to use 
the Linux oprofile tool.  That's already abstracted away the differences 
in hardware counters and provides a relatively simple interface to collect 
the data without getting bogged down with implementation there.  Starting 
from scratch, going right to the hardware counters and building from 
there, is a big project--they've been hacking on oprofile for almost six 
years now and still aren't suggesting it's release quality yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-17 Thread Trevor Talbot
On 12/17/07, Decibel! [EMAIL PROTECTED] wrote:

 Also, has anyone looked into adding a class of system calls that
 would actually tell us if the kernel issued physical IO? I find it
 hard to believe that other RDBMSes wouldn't like to have that info...

Non-blocking style interfaces can help here. On Windows, for instance,
a read returns data at the call site if it was satisfied by cache,
instead of invoking the asynchronous notification.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-16 Thread Gokulakannan Somasundaram
On Dec 16, 2007 1:03 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  I was going to say that I'm really only interested in physical I/O.
  Logical I/O which is satisfied by the kernel cache is only marginally
  interesting and buffer fetches from Postgres's shared buffer is entirely
  uninteresting from the point of view of trying to figure out what is
  slowing down a query.

  Ok, the physical I/Os are already visible if you enable
  log_statement_stats.

 I think you missed the point. What log_statement_stats shows are not
 physical I/Os, they're read() system calls. Unfortunately there's no
 direct way to tell if a read() is satisfied from OS cache or not. Greg's
 suggestion was about how to do that.


Oh OK. Thanks for clarifying..

-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-15 Thread Gregory Stark
Gokulakannan Somasundaram [EMAIL PROTECTED] writes:

 Hi,
 I already made a discussion about it. We can view the Logical I/Os. If
 we enable the log_statement_stats in the conf file and apply the following
 patch, it is possible. But putting it in Explain analyze makes more sense to
 me.

I was going to say that I'm really only interested in physical I/O. Logical
I/O which is satisfied by the kernel cache is only marginally interesting and
buffer fetches from Postgres's shared buffer is entirely uninteresting from
the point of view of trying to figure out what is slowing down a query.

However I suppose that's not true. There are other reasons why buffer fetches
could be interesting. In particular I imagine when users post explain analyzes
it would give us a good idea of whether their tables are bloated or their
tuples are extremely wide (in cases where the planner gets it wrong).

But I do think that showing logical I/Os without even an heuristic based
measurement of actual physical i/o is pretty useless. It will make people
think they want to grow their shared buffers to cover all of memory.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-15 Thread Gokulakannan Somasundaram
I was going to say that I'm really only interested in physical I/O. Logical
 I/O which is satisfied by the kernel cache is only marginally interesting
 and buffer fetches from Postgres's shared buffer is entirely uninteresting
 from the point of view of trying to figure out what is slowing down a query.


Ok, the physical I/Os are already visible if you enable log_statement_stats.
Again, I accept that it would be more helpful if they were displayed with
Explain Analyze.



 However I suppose that's not true. There are other reasons why buffer
 fetches
 could be interesting. In particular I imagine when users post explain
 analyzes
 it would give us a good idea of whether their tables are bloated or their
 tuples are extremely wide (in cases where the planner gets it wrong).


I have used it a lot for query tuning. If we re-write a query in such a way
that the logical reads come down, it implies fewer physical reads in
production. I think you would accept that there are some ways in which the
query can be re-written only by humans and not by the optimizer. When we do
that, instead of looking at the explain analyze time, it makes more sense
for me to look at the logical reads.




 But I do think that showing logical I/Os without even an heuristic based
 measurement of actual physical i/o is pretty useless. It will make people
 think they want to grow their shared buffers to cover all of memory.


I just want to clarify that we should display both logical reads and
physical reads together. But increasing the shared buffers by looking at the
performance of a query doesn't seem to be a good idea, and people should be
aware that logical reads are not meant for shared buffer management.




 --
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!




-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-15 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

I was going to say that I'm really only interested in physical I/O. Logical
I/O which is satisfied by the kernel cache is only marginally interesting
and buffer fetches from Postgres's shared buffer is entirely uninteresting
from the point of view of trying to figure out what is slowing down a query.


Ok the Physical I/Os are already visible, if you enable log_statement_stats.


I think you missed the point. What log_statement_stats shows are not 
physical I/Os, they're read() system calls. Unfortunately there's no 
direct way to tell if a read() is satisfied from OS cache or not. Greg's 
suggestion was about how to do that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-14 Thread Neil Conway
On Fri, 2007-12-14 at 15:47 +, Gregory Stark wrote:
 I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O usage.
 This would be especially useful if we could distinguish hardware versus
 logical I/O though. And I always thought that would be very hard.
 
 My thought in the past was that would could do it on Solaris by having
 Postgres use DTrace directly via its (undocumented but existing) programmatic
 interface.
 
 For other operating systems it was tempting to suggest just timing the read(2)
 call to see if it took too long to be a logical operation. The problem there
 is that gettimeofday would impose far too much overhead to make that practical
 (or even be precise enough to work properly).
 
 But it occurred to me just now that the hardware instruction counter available
 on just about every platform would be good enough for a heuristic guess at
 whether the read(2) was cached.

I'm skeptical that this would be reliable enough to be very useful,
especially in the face of concurrent, unpredictable system activity on a
busy system. I agree that it would be useful information, though.
Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
the number of logical and physical I/Os from Postgres' perspective (i.e.
physical I/O just means we need to go to the kernel).

  The problem generally with using the hardware
 instruction counter is that it's not necessarily in sync between processors
 and might therefore run backwards or skip time forwards. This is a problem for
 profiling but if all we care about is a boolean guess at whether the request
 was satisfied quickly from cache then any such skipping forward or backward
 would represent a context switch which we could just toss in the hardware i/o
 bucket. It doesn't matter exactly how long the hardware i/o took, only that
 there was one.
 
 To that end I would love to see something like:
 
                                   QUERY PLAN
  -----------------------------------------------------------------------------
   Bitmap Heap Scan on h  (cost=8.52..16.45 rows=2 width=512)
       (actual time=78.926..87.708 rows=2 loops=1 logical-I/O=2 physical-I/O=1)
     Recheck Cond: (i = ANY ('{100,1000}'::integer[]))
     ->  Bitmap Index Scan on hi  (cost=0.00..8.52 rows=2 width=0)
         (actual time=74.539..74.539 rows=2 loops=1 logical-I/O=2 physical-I/O=2)
       Index Cond: (i = ANY ('{100,1000}'::integer[]))
   Total runtime: 87.820 ms
 
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node

2007-12-14 Thread Gokulakannan Somasundaram
Hi,
I already made a discussion about it. We can view the Logical I/Os. If
we enable the log_statement_stats in the conf file and apply the following
patch, it is possible. But putting it in Explain analyze makes more sense to
me.



*** postgresql-8.3beta1/src/backend/storage/buffer/bufmgr.c  Tue Sep 25 18:11:48 2007
--- postgresql-8.3patch/src/backend/storage/buffer/bufmgr.c  Fri Oct 19 23:18:36 2007
***************
*** 1470,1477 ****
      localhitrate = (float) LocalBufferHitCount * 100.0 / ReadLocalBufferCount;

      appendStringInfo(str,
!         "!\tShared blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
!         ReadBufferCount - BufferHitCount,
          BufferFlushCount, hitrate);
      appendStringInfo(str,
          "!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
          ReadLocalBufferCount - LocalBufferHitCount,
          LocalBufferFlushCount, localhitrate);
--- 1470,1477 ----
      localhitrate = (float) LocalBufferHitCount * 100.0 / ReadLocalBufferCount;

      appendStringInfo(str,
!         "!\tShared blocks: %10ld Logical Reads, %10ld Physical Reads, %10ld written, buffer hit rate = %.2f%%\n",
!         ReadBufferCount, ReadBufferCount - BufferHitCount,
          BufferFlushCount, hitrate);
      appendStringInfo(str,
          "!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
          ReadLocalBufferCount - LocalBufferHitCount,
          LocalBufferFlushCount, localhitrate);





-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-13 Thread Jim C. Nasby
On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
 Simon Riggs wrote:
  Well, I'd like a way of making EXPLAIN ANALYZE return something
  useful within a reasonable amount of time. We can define that as the
  amount of time that the user considers is their goal for the query.
 
 What sort of useful results would you expect to be able to see from 
 such an aborted EXPLAIN ANALYZE?  I cannot quite imagine what 
 instructive value a partially executed plan output would have.  It's 
 not like we can somehow ensure executing an equal proportion of each 
 plan node or something.  Do you have a specific case in mind?

The query is most likely to get canceled while it is working on whatever
node in the plan is the bottleneck, and it's likely going to be easy to
spot since nodes above it wouldn't have gotten much done.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-13 Thread Joshua Reich

Thumbs up on this from a lurker.

I recall a previous post about some sort of progress bar hack that 
would show you where in a plan a currently executing query was at. Has 
any work been done on this?


Josh Reich


Jim C. Nasby wrote:

 On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
 Simon Riggs wrote:
 Well, I'd like a way of making EXPLAIN ANALYZE return something
 useful within a reasonable amount of time. We can define that as the
 amount of time that the user considers is their goal for the query.

 What sort of useful results would you expect to be able to see from
 such an aborted EXPLAIN ANALYZE?  I cannot quite imagine what
 instructive value a partially executed plan output would have.  It's
 not like we can somehow ensure executing an equal proportion of each
 plan node or something.  Do you have a specific case in mind?

 The query is most likely to get canceled while it is working on whatever
 node in the plan is the bottleneck, and it's likely going to be easy to
 spot since nodes above it wouldn't have gotten much done.



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-12 Thread Bruce Momjian
Richard Huxton wrote:
 Simon Riggs wrote:
  Intermediate results are always better than none at all. I do understand
  what a partial execution would look like - frequently it is the
  preparatory stages that slow a query down - costly sorts, underestimated
  hash joins etc. Other times it is loop underestimation, which can
  usually be seen fairly quickly.
 
 Surely all you're interested in is where the actual plan differs from 
 the expected plan? Could you not just have a mode that issues NOTICEs 
 when expected/actual number of rows differ by more than a set amount? 
 You'd probably want two NOTICEs - one when the threshold is exceeded, 
 one when the node completes.

Right, we already have a TODO:

* Have EXPLAIN ANALYZE highlight poor optimizer estimates

I was thinking we could issue NOTICE when the estimates differed from
the actual by a specified percentage, and that NOTICE could be issued
while the query is still processing, assuming the stage completes before
the query does.  This seems much easier than doing protocol changes. 
TODO updated:

* Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
  actual row counts differ by a specified percentage
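
A sketch of what the check could look like when a node shuts down;
estimate_notice_ratio stands in for the "specified percentage" GUC and is
hypothetical, while the instrument and plan_rows fields are real:

    double      actual = planstate->instrument->ntuples /
                         Max(planstate->instrument->nloops, 1);
    double      expected = planstate->plan->plan_rows;

    if (expected > 0 && actual > 0 &&
        (actual / expected > estimate_notice_ratio ||
         expected / actual > estimate_notice_ratio))
        ereport(NOTICE,
                (errmsg("estimated %.0f rows, actually got %.2f",
                        expected, actual)));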

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-12 Thread Neil Conway
On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote: 
   * Have EXPLAIN ANALYZE highlight poor optimizer estimates

 TODO updated:
 
   * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
 actual row counts differ by a specified percentage

I don't think this is an improvement. The old wording describes a broad
set of possible improvements. Your new text describes one way of
implementing a subset of the former TODO wording.

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-12 Thread Bruce Momjian
Neil Conway wrote:
 On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote: 
  * Have EXPLAIN ANALYZE highlight poor optimizer estimates
 
  TODO updated:
  
  * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
actual row counts differ by a specified percentage
 
 I don't think this is an improvement. The old wording describes a broad
 set of possible improvements. Your new text describes one way of
 implementing a subset of the former TODO wording.

Well, we can still do a broader implementation if we want it.  Do you
have any suggestions?  Basically, the more specific, the more likely we
will get someone to do it, and  we can always add more details.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Simon Riggs
On Sun, 2006-12-10 at 18:09 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  The EA case is pretty straightforward though;
 
 Well, no its not, as you'll recall if you re-read the prior discussions.
 The killer problem is that it's unclear whether the early termination of
 the query represents an error condition or not.  If it's not an error
 then you've got a serious problem for non-SELECT queries (which EA
 actually executes, remember) 

I was expecting to take the non-ERROR route, realising all of the
problems you mention make the issue so costly to solve that way. Most
big queries are SELECTs without any updates and it's those we care about
here. Anybody doing huge updates needs other assistance, IMHO.

Intermediate results are always better than none at all. I do understand
what a partial execution would look like - frequently it is the
preparatory stages that slow a query down - costly sorts, underestimated
hash joins etc. Other times it is loop underestimation, which can
usually be seen fairly quickly.

I foresaw that it would be possible to enforce EA as a read-only
transaction, or throw an error (with no output) when issued in
time-limited form. Maybe that isn't possible.

Yes, I am looking for a fix that can be accomplished without major work
and/or change. This topic is a pain, but not such a priority feature.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 We might be able to finesse the protocol problem by teaching EA to
 respond to query cancel by emitting the data-so-far as a NOTICE (like it
 used to do many moons ago), rather than a standard query result, then
 allowing the query to error out.  However this'd be fairly unfriendly
 for client-side tools that are expecting a query result.

What I suggested was introducing a new FE/BE message type for analyze query
plans. Then clients that recognize it can use it to display the query plan
without interfering with the query results. Clients that don't know what to do
with it would have to just ignore it.

Then we could introduce as many ways of triggering these messages as we like.
A GUC to trigger one every n seconds, a FE/BE message like QueryCancel, say,
QueryProbe which triggers one when the user presses a button in pgadmin or C-t
(SIGINFO) in psql, etc.

I was thinking that it should be more structured than the current block of
text that clients receive. I had in mind to make it equivalent to a PGResult
so the various bits of data would be in different named columns. This would
let GUI clients like pgadmin interpret the results more effectively and make
it easier for us to add data without worrying about information overload on
the user's side.

And the query would keep operating. Canceling the query and statement_timeout
would both be entirely orthogonal to requesting analyze results.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Richard Huxton

Simon Riggs wrote:

Intermediate results are always better than none at all. I do understand
what a partial execution would look like - frequently it is the
preparatory stages that slow a query down - costly sorts, underestimated
hash joins etc. Other times it is loop underestimation, which can
usually be seen fairly quickly.


Surely all you're interested in is where the actual plan differs from 
the expected plan? Could you not just have a mode that issues NOTICEs 
when expected/actual number of rows differ by more than a set amount? 
You'd probably want two NOTICEs - one when the threshold is exceeded, 
one when the node completes.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Simon Riggs
On Mon, 2006-12-11 at 11:00 +, Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  We might be able to finesse the protocol problem by teaching EA to
  respond to query cancel by emitting the data-so-far as a NOTICE (like it
  used to do many moons ago), rather than a standard query result, then
  allowing the query to error out.  However this'd be fairly unfriendly
  for client-side tools that are expecting a query result.
 
 What I suggested was introducing a new FE/BE message type for analyze query
 plans. Then clients that recognize it can use it to display the query plan
 without interfering with the query results. Clients that don't know what to do
 with it would have to just ignore it.
 
 Then we could introduce as many ways of triggering these messages as we like.
 A GUC to trigger one every n seconds, a FE/BE message like QueryCancel, say,
 QueryProbe which triggers one when the user presses a button in pgadmin or C-t
 (SIGINFO) in psql, etc.
 
 I was thinking that it should be more structured than the current block of
 text that clients receive. I had in mind to make it equivalent to a PGResult
 so the various bits of data would be in different named columns. This would
 let GUI clients like pgadmin interpret the results more effectively and make
 it easier for us to add data without worrying about information overload on
 the user's side.
 
 And the query would keep operating. Canceling the query and statement_timeout
 would both be entirely orthogonal to requesting analyze results.

I like the idea, but its more work than I really wanted to get into
right now.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-12-11 at 11:00 +, Gregory Stark wrote:
 What I suggested was introducing a new FE/BE message type for analyze query
 plans.

 I like the idea, but its more work than I really wanted to get into
 right now.

Yeah ... a protocol change is *painful*, especially if you really want
clients to behave in a significantly new way.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Neil Conway

Tom Lane wrote:

Yeah ... a protocol change is *painful*, especially if you really want
clients to behave in a significantly new way.


A backward-incompatible protocol change is painful, sure, but ISTM we 
could implement what Greg describes as a straightforward extension to 
the V3 protocol. Then the backend could just avoid sending the query 
progress information to pre-V4 protocol clients.


-Neil


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Neil Conway

Simon Riggs wrote:

I like the idea, but its more work than I really wanted to get into
right now.


Well, from another point of view: do we need this feature so urgently 
that there is not enough time to do it properly? IMHO, no.


-Neil

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] EXPLAIN ANALYZE

2006-12-11 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Yeah ... a protocol change is *painful*, especially if you really want
 clients to behave in a significantly new way.

 A backward-incompatible protocol change is painful, sure, but ISTM we 
 could implement what Greg describes as a straightforward extension to 
 the V3 protocol. Then the backend could just avoid sending the query 
 progress information to pre-V4 protocol clients.

You're dodging the point though.  If you want the new message type to do
anything useful in V4 clients, you still have to define an API for
libpq, update psql, try to figure out what the heck JDBC and ODBC are
going to do with it, etc etc.  All doable, but it's a lot more work than
just a quick hack in the backend.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Explain Analyze mode

2005-03-11 Thread Oleg Bartunov
See your postgresql.conf for
log_planner_stats = true #false
log_executor_stats = true #false
#log_statement_stats = false

Oleg
On Fri, 11 Mar 2005, Ioannis Theoharis wrote:

Hi, I found this form of explain analyze output while looking through some
old mails in the lists.
test4=# explain analyze select * from patients;
LOG:  query: explain analyze select * from patients;
LOG:  duration: 0.603887 sec
LOG:  QUERY STATISTICS
! system usage stats:
!   0.624269 elapsed 0.458985 user 0.123047 system sec
!   [0.468750 user 0.125000 sys total]
!   0/0 [0/0] filesystem blocks in/out
!   7/4 [310/158] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   0/0 [0/0] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks:   2742 read,  0 written, buffer hit rate = 3.59%
!   Local  blocks:      0 read,  0 written, buffer hit rate = 0.00%
!   Direct blocks:      0 read,  0 written
                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on patients  (cost=0.00..4048.60 rows=131960 width=172)
   (actual time=0.04..562.97 rows=133886 loops=1)
Total runtime: 602.42 msec
(2 rows)
How can I tune my configuration to achieve this output for explain
analyze (and not just the QUERY PLAN, as I get today)?
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 8: explain analyze is your friend