Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro wrote: > Vik Fearing asked off-list why hash joins appear to read slightly more > temporary data than they write. The reason is that we notch up a > phantom block read when we hit the end of each file. Harmless but it > looks a bit weird and it's easily fixed. > > Unpatched, a 16 batch hash join reports that we read 30 more blocks > than we wrote (2 per batch after the first, as expected): > > Buffers: shared hit=434 read=16234, temp read=5532 written=5502 > > With the attached patch: > > Buffers: shared hit=547 read=16121, temp read=5502 written=5502 Committed. Arguably we ought to back-patch this, but it's minor so I didn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
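[Editor's note: the arithmetic Thomas describes can be sketched as follows. This is illustrative Python, not PostgreSQL code; the function name and the two-files-per-batch assumption are taken from the description above, where each batch after the first spills an inner and an outer temp file.]

```python
def phantom_reads(nbatches, files_per_batch=2):
    """Extra block reads reported by the unpatched code.

    Each batch after the first spills an inner and an outer temp file,
    and reading each file to EOF notches up one phantom block read,
    so reads exceed writes by 2 * (nbatches - 1).
    """
    return files_per_batch * (nbatches - 1)

# A 16-batch hash join reports 30 more blocks read than written,
# matching temp read=5532 vs written=5502 in the unpatched output.
print(phantom_reads(16))  # -> 30
```

This matches the difference in the unpatched EXPLAIN output: 5532 - 5502 = 30.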
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
On Fri, Jul 8, 2016 at 12:55 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane wrote: >>> Presumably the instrumentation data needed for that is not getting >>> returned from the worker to the leader. > >> Yes. > >> ... >> I'm not sure about the rest of you, but I'd kind of like to finish >> this release and start working on the next one. > > Agreed. We should make sure that the possible omissions from EXPLAIN > output are adequately documented, but actually fixing that seems like > material for a future release cycle. Fair enough. Regards, -- Fujii Masao
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
Robert Haas writes: > On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane wrote: >> Presumably the instrumentation data needed for that is not getting >> returned from the worker to the leader. > Yes. > ... > I'm not sure about the rest of you, but I'd kind of like to finish > this release and start working on the next one. Agreed. We should make sure that the possible omissions from EXPLAIN output are adequately documented, but actually fixing that seems like material for a future release cycle. regards, tom lane
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
On Thu, Jul 7, 2016 at 10:07 AM, Tom Lane wrote: > Amit Kapila writes: >> On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao wrote: >>> I found $SUBJECT while trying to test parallel queries. Is this a bug? > > Presumably the instrumentation data needed for that is not getting > returned from the worker to the leader. Yes. > I would bet there's a lot > of other plan-node-specific data that doesn't work either. That's probably true, too. Generally, what's going to happen here is that if you have a true parallel query plan, any of this sort of subsidiary information is going to reflect what the leader did, but not what the workers did. If the leader did nothing, as in the case of force_parallel_mode, then EXPLAIN ANALYZE will show the same thing that it would have shown if that node had never executed. >> I think this can never happen for force_parallel_mode TO off, because >> we don't generate a Gather on top of the sort node. The reason why we are >> able to push Sort below Gather is that it is marked as parallel_safe >> (create_sort_path). I think we should not mark it as parallel_safe. > > That seems rather ridiculous. An oversight in managing EXPLAIN data > is not a sufficient reason to cripple parallel query. +1. Fixing this is actually somewhat difficult. The parallel query stuff does handle propagating the common instrumentation information from the leader to the workers, but the EXPLAIN ANALYZE output can depend in arbitrary ways on the final executor state tree, which is, of course, unshared, and which is also not something we can propagate between backends since executor state nodes don't have (and can't really support) serialization and deserialization functions. 
I think we can eventually fix this by teaching individual nodes to store the relevant information in dynamic shared memory rather than backend-local memory when parallel query is in use: the Estimate/InitializeDSM callbacks already give the nodes a chance to obtain control in the right places, except that right now they're only invoked for parallel-aware nodes. I think, though, that it will take more development than we want to undertake at this point in the cycle. I'm not sure about the rest of you, but I'd kind of like to finish this release and start working on the next one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
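[Editor's note: the leader-side accumulation Robert alludes to can be sketched in miniature. This is plain Python, not the actual DSM machinery; the node ids, counter names, and the merge function are all hypothetical stand-ins for how per-worker counters might be folded into the leader's view.]

```python
def merge_instrumentation(leader, workers):
    """Fold per-plan-node counters reported by workers into the
    leader's view, loosely analogous to accumulating shared
    instrumentation at end of query.

    `leader` and each element of `workers` map a (hypothetical)
    plan node id to a dict of counters.
    """
    merged = {nid: dict(counters) for nid, counters in leader.items()}
    for w in workers:
        for nid, counters in w.items():
            dst = merged.setdefault(nid, {})
            for name, value in counters.items():
                # Counters are additive across the leader and workers.
                dst[name] = dst.get(name, 0) + value
    return merged

leader = {1: {"ntuples": 0}}
workers = [{1: {"ntuples": 500}}, {1: {"ntuples": 500}}]
print(merge_instrumentation(leader, workers))  # -> {1: {'ntuples': 1000}}
```

The hard part described above is not this additive merge (which already happens for common counters) but node-specific state such as the sort method, which lives only in the worker's executor state tree.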
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
On Thu, Jul 7, 2016 at 7:37 PM, Tom Lane wrote: > Amit Kapila writes: >> On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao wrote: >>> I found $SUBJECT while trying to test parallel queries. Is this a bug? > > Presumably the instrumentation data needed for that is not getting > returned from the worker to the leader. I would bet there's a lot > of other plan-node-specific data that doesn't work either. > >> I think this can never happen for force_parallel_mode TO off, because >> we don't generate a Gather on top of the sort node. The reason why we are >> able to push Sort below Gather is that it is marked as parallel_safe >> (create_sort_path). I think we should not mark it as parallel_safe. > > That seems rather ridiculous. An oversight in managing EXPLAIN data > is not a sufficient reason to cripple parallel query. > I am analyzing that point only, and you seem to be right that we have missed propagating some information. We have taken care of propagating instrumentation information back to the leader, but it seems there are other things that need to be taken care of in that area. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
Amit Kapila writes: > On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao wrote: >> I found $SUBJECT while trying to test parallel queries. Is this a bug? Presumably the instrumentation data needed for that is not getting returned from the worker to the leader. I would bet there's a lot of other plan-node-specific data that doesn't work either. > I think this can never happen for force_parallel_mode TO off, because > we don't generate a Gather on top of the sort node. The reason why we are > able to push Sort below Gather is that it is marked as parallel_safe > (create_sort_path). I think we should not mark it as parallel_safe. That seems rather ridiculous. An oversight in managing EXPLAIN data is not a sufficient reason to cripple parallel query. regards, tom lane
Re: [HACKERS] EXPLAIN ANALYZE for parallel query doesn't report the SortMethod information.
On Thu, Jul 7, 2016 at 1:23 PM, Fujii Masao wrote:
> Hi,
>
> I found $SUBJECT while trying to test parallel queries. Is this a bug?
>
> In non-parallel mode, EXPLAIN ANALYZE reports the information about
> Sort Method as follows.
>
> =# EXPLAIN ANALYZE SELECT * FROM pgbench_accounts ORDER BY bid;
> QUERY PLAN
> ---
>  Sort  (cost=180739.34..183239.34 rows=100 width=97) (actual time=1501.342..1836.057 rows=100 loops=1)
>    Sort Key: bid
>    Sort Method: external sort  Disk: 104600kB
>    ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=100 width=97) (actual time=0.013..179.315 rows=100 loops=1)
>
> However, in parallel mode, it's not reported, as follows.
>
> =# SET force_parallel_mode TO on;
> =# EXPLAIN ANALYZE SELECT * FROM pgbench_accounts ORDER BY bid;
> QUERY PLAN
> -
>  Gather  (cost=181739.34..284239.34 rows=100 width=97) (actual time=1507.138..2394.028 rows=100 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Sort  (cost=180739.34..183239.34 rows=100 width=97) (actual time=1503.112..1901.117 rows=100 loops=1)
>          Sort Key: bid
>          ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=100 width=97) (actual time=0.021..181.079 rows=100 loops=1)

I think this can never happen for force_parallel_mode TO off, because we don't generate a Gather on top of the sort node. The reason why we are able to push Sort below Gather is that it is marked as parallel_safe (create_sort_path). I think we should not mark it as parallel_safe. Will investigate some more and send a patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] explain analyze does not report actual rows correctly?
chang chao writes: > The actual rows (rows=9950) part in the following line contained in the above > query plan seems strange. > " -> Sort (cost=10.64..11.14 rows=200 width=520) (actual time=0.045..0.561 > rows=9950 loops=1)" > Shouldn't it be 200? No, that's probably correct, seeing that this node is the righthand child of a mergejoin. The discrepancy arises because the same rows are fetched multiple times thanks to mark/restore rescanning. What explain is reporting is the number of rows pulled from the node, not the number of unique rows it processed. regards, tom lane
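[Editor's note: Tom's point about mark/restore can be simulated in a few lines. This is a deliberately simplified toy merge join over sorted integer keys in Python, not the real executor; in particular it restores the inner side after every outer row, which is enough to show rows pulled exceeding the inner row count.]

```python
def merge_join_pulls(outer, inner):
    """Toy merge join over two sorted key lists.

    Returns (pulls, matches): `pulls` counts rows pulled from the
    sorted inner side, which exceeds len(inner) whenever the outer
    side has duplicate keys, because mark/restore rescans the same
    group of inner rows for each equal outer row.
    """
    pulls = 0
    matches = 0
    i = 0
    for okey in outer:
        while i < len(inner) and inner[i] < okey:
            i += 1
            pulls += 1          # advancing past non-matching rows
        mark = i                # mark the start of the match group
        j = mark
        while j < len(inner) and inner[j] == okey:
            matches += 1
            j += 1
            pulls += 1          # each matching row is pulled
        i = mark                # restore: next outer row rescans the group

    return pulls, matches

# Duplicate outer keys force the inner group [5, 5] to be pulled twice:
print(merge_join_pulls([5, 5], [3, 5, 5, 7]))  # -> (5, 4)
```

Here the inner side holds only 4 rows, yet 5 rows are pulled from it, which is exactly the kind of inflation chang chao observed in the Sort node's actual rows.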
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Tue, Mar 15, 2016 at 5:22 AM, Robert Haas wrote: > > On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila wrote: > > Yeah, that makes the addition of a test for this functionality difficult. > > Robert, do you have any idea what kind of test would have caught this issue? > > Yep. Committed with that test: > Nice. Thanks! With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila wrote: > Yeah, that makes the addition of a test for this functionality difficult. > Robert, do you have any idea what kind of test would have caught this issue? Yep. Committed with that test: DO $$ BEGIN EXECUTE 'EXPLAIN ANALYZE SELECT * INTO TABLE easi FROM int8_tbl'; END$$; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Sat, Mar 12, 2016 at 7:11 PM, Mithun Cy wrote: > > On Sat, Mar 12, 2016 at 2:32 PM, Amit Kapila wrote: > >With force_parallel_mode=on, I could see many other failures as well. I think it is better to have a test which tests this functionality with force_parallel_mode=regress > > as per the user manual. > Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated > regression testing. > Yes, that is the only reason I mentioned that it is better to have a test which can be checked in an automated way, and I understand that the way you have written the test using Explain won't work in an automated way, so I am not sure if it is a good idea to add such a test. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
Sorry, there was some issue with my mail settings; the same mail got sent more than once. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Sat, Mar 12, 2016 at 2:32 PM, Amit Kapila wrote: >With force_parallel_mode=on, I could see many other failures as well. I think it is better to have a test which tests this functionality with force_parallel_mode=regress As per the user manual: Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker are prefixed with a context line, but a setting of regress suppresses this to guarantee reproducible results. *Also, the Gather nodes added to plans by this setting are hidden from the EXPLAIN output so that the output matches what would be obtained if this setting were turned off.* And my test is for EXPLAIN statements. I think under regress mode it will never fail even if a parallel scan is used, as per the above statement. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Sat, Mar 12, 2016 at 2:02 PM, Mithun Cy wrote: > > On Sat, Mar 12, 2016 at 12:28 PM, Amit Kapila wrote: > >I don't see how this test will fail with force_parallel_mode=regress and max_parallel_degree > 0 even without the patch proposed to fix the issue in hand. In short, I don't think this test would have caught the issue, so I don't see much advantage in adding such a test. Even if we want to add such a test case, I think as proposed this will substantially increase the timing for the "Select Into" test, which might not be an acceptable test case addition, especially for testing one corner case. > > Without the above patch, make installcheck fails for select_into.sql with the below diff > > when > force_parallel_mode = on > max_parallel_degree = 3 > With force_parallel_mode=on, I could see many other failures as well. I think it is better to have a test which tests this functionality with force_parallel_mode=regress With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Sat, Mar 12, 2016 at 12:28 PM, Amit Kapila wrote: >I don't see how this test will fail with force_parallel_mode=regress and max_parallel_degree > 0 even without the patch proposed to fix the issue in hand. In short, I don't think this test would have caught the issue, so I don't see much advantage in adding such a test. Even if we want to add such a test case, I think as proposed this will substantially increase the timing for the "Select Into" test, which might not be an acceptable test case addition, especially for testing one corner case. Without the above patch, make installcheck fails for select_into.sql with the diff below, when force_parallel_mode = on and max_parallel_degree = 3:

diff results/select_into.out expected/select_into.out
104,110c104,107
<  QUERY PLAN
< 
<  Gather
<    Number of Workers: 1
<    Single Copy: true
<    ->  Seq Scan on mt1
< (4 rows)
---
>  QUERY PLAN
> -
>  Seq Scan on mt1
> (1 row)

Again with non-default postgresql.conf settings (force_parallel_mode = on, max_parallel_degree = 3, parallel_tuple_cost = 0):

[mithun@localhost regress]$ diff results/select_into.out expected/select_into.out
104,109c104,107
<  QUERY PLAN
< 
<  Gather
<    Number of Workers: 3
<    ->  Parallel Seq Scan on mt1
< (3 rows)
---
>  QUERY PLAN
> -
>  Seq Scan on mt1
> (1 row)

To reduce the time of execution I can set the generate_series parameter to 500, which is fast on my machine and also fails with the above diff, but this time only one worker is assigned as per the plan. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Fri, Mar 11, 2016 at 3:34 PM, Mithun Cy wrote: > > On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas wrote: > >I guess there must not be an occurrence of this pattern in the > >regression tests, or previous force_parallel_mode testing would have > >found this problem. Perhaps this patch should add one? > > I have added the test to select_into.sql, as an Explain select into statement. > I don't see how this test will fail with force_parallel_mode=regress and max_parallel_degree > 0 even without the patch proposed to fix the issue in hand. In short, I don't think this test would have caught the issue, so I don't see much advantage in adding such a test. Even if we want to add such a test case, I think as proposed this will substantially increase the timing for the "Select Into" test, which might not be an acceptable test case addition, especially for testing one corner case. > > Explain Analyze produces planning time and execution time even with TIMING OFF, > so I am not adding the same to the regress tests. > Yeah, that makes the addition of a test for this functionality difficult. Robert, do you have any idea what kind of test would have caught this issue? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas wrote: >I guess there must not be an occurrence of this pattern in the regression tests, or previous force_parallel_mode testing would have found this problem. Perhaps this patch should add one? I have added the test to select_into.sql, as an Explain select into statement. Explain Analyze produces planning time and execution time even with TIMING OFF, so I am not adding the same to the regress tests. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com

diff --git a/src/test/regress/expected/select_into.out b/src/test/regress/expected/select_into.out
index 9d3f047..bb71260 100644
--- a/src/test/regress/expected/select_into.out
+++ b/src/test/regress/expected/select_into.out
@@ -94,3 +94,16 @@
 INSERT INTO b SELECT 1 INTO f;
 ERROR: SELECT ... INTO is not allowed here
 LINE 1: INSERT INTO b SELECT 1 INTO f;
 ^
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+ QUERY PLAN
+-
+ Seq Scan on mt1
+(1 row)
+
+DROP TABLE mt1;
diff --git a/src/test/regress/sql/select_into.sql b/src/test/regress/sql/select_into.sql
index 4d1cc86..6eb5e24 100644
--- a/src/test/regress/sql/select_into.sql
+++ b/src/test/regress/sql/select_into.sql
@@ -76,3 +76,13 @@
 COPY (SELECT 1 INTO frak UNION SELECT 2) TO 'blob';
 SELECT * FROM (SELECT 1 INTO f) bar;
 CREATE VIEW foo AS SELECT 1 INTO b;
 INSERT INTO b SELECT 1 INTO f;
+
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+DROP TABLE mt1;
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Thu, Mar 10, 2016 at 4:43 AM, Amit Kapila wrote: > There should be a white space between 0:CURSOR_OPT_PARALLEL_OK. Also I > don't see this comment is required as similar other usage doesn't have any > such comment. Fixed these two points in the attached patch. > > In general, the patch looks good to me and solves the problem mentioned. I > have run the regression tests with force_parallel_mode and don't see any > problem. I guess there must not be an occurrence of this pattern in the regression tests, or previous force_parallel_mode testing would have found this problem. Perhaps this patch should add one? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.
On Wed, Mar 9, 2016 at 8:18 PM, Mithun Cy wrote:
> Hi All,
> Explain [Analyze] Select Into table produces a plan which uses parallel scans.
> Possible Fix:
> I tried to make a patch to fix this. Now in ExplainOneQuery, if the into clause is
> defined then parallel plans are disabled, similar to their execution.

- /* plan the query */
- plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+ /*
+  * plan the query.
+  * Note: If Explain is for CreateTableAs / SelectInto Avoid parallel
+  * plans.
+  */
+ plan = pg_plan_query(query, into ? 0:CURSOR_OPT_PARALLEL_OK, params);

There should be a white space around the ":" in 0:CURSOR_OPT_PARALLEL_OK. Also, I don't see that this comment is required, as similar other usages don't have any such comment. Fixed these two points in the attached patch. In general, the patch looks good to me and solves the problem mentioned. I have run the regression tests with force_parallel_mode and didn't see any problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com Analyze_select_into_disable_parallel_scan_v2.patch Description: Binary data
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. Eliding the Limit node gave a further slight win. I wasn't convinced the use-case was common enough to justify the replacement of quicksort (despite having consistently fewer compares, the merge sort was slightly slower; I never understood why), but I never asked. Is there any appetite for supporting alternate sort algorithms? -- Cheers, Jeremy
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Jeremy Harris j...@wizmail.org writes: On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. I'm not entirely following. The top-N heapsort approach already makes use of the limit info. If the limit is so large that the sort spills to disk, then we stop thinking about the limit. But I'm finding it doubtful either that that's a case worthy of extra code or that you could get very much win if you did add code for it. regards, tom lane
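[Editor's note: for readers following along, the bounded sort Tom refers to can be sketched with a size-limited heap. This is illustrative Python, not PostgreSQL's tuplesort; it shows the core idea of a top-N heapsort, namely that memory stays proportional to the LIMIT rather than to the input.]

```python
import heapq

def top_n(rows, n):
    """Bounded 'top-N heapsort': keep a max-heap of at most n entries
    (values negated, since heapq is a min-heap), so memory stays O(n)
    regardless of how many input rows arrive."""
    heap = []
    for r in rows:
        if len(heap) < n:
            heapq.heappush(heap, -r)
        elif -heap[0] > r:
            # r beats the current worst of the best n; swap it in.
            heapq.heapreplace(heap, -r)
    return sorted(-x for x in heap)

print(top_n([5, 1, 4, 2, 3], 3))  # -> [1, 2, 3]
```

This is why the limit information is already useful to the Sort node: rows that cannot make the top N are discarded immediately instead of being sorted.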
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 14:54, Tom Lane wrote: Jeremy Harris j...@wizmail.org writes: On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. I'm not entirely following. The top-N heapsort approach already makes use of the limit info. Having gone back to look, you're right. It was Uniq nodes I merged (the sort handles both bounded-output and dedup). -- Cheers, Jeremy
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Simon Riggs si...@2ndquadrant.com writes: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort The Sort estimate shows 568733 rows, whereas the actual rows are 20. [ shrug... ] The estimated value is the planner's estimate of what would happen *if you ran the node to completion*, which in practice doesn't happen because of the LIMIT. The actual value is, well, the actual value. We certainly should not munge around the actual value. We could imagine munging the reported estimates to account for the parent LIMIT, but that would make it a lot harder to understand the planner's thought processes, because the reported estimates would have that much less to do with the numbers actually used in the internal calculations. regards, tom lane
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Tom Lane-2 wrote Simon Riggs <simon@...> writes: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort The Sort estimate shows 568733 rows, whereas the actual rows are 20. [ shrug... ] The estimated value is the planner's estimate of what would happen *if you ran the node to completion*, which in practice doesn't happen because of the LIMIT. The actual value is, well, the actual value. We certainly should not munge around the actual value. We could imagine munging the reported estimates to account for the parent LIMIT, but that would make it a lot harder to understand the planner's thought processes, because the reported estimates would have that much less to do with the numbers actually used in the internal calculations. Is it even possible for a sort node directly under a limit to output (as nebulous as that term is in this context) more rows than desired by the limit? The interesting thing about a sort node is not its output but its input, i.e., the number of rows being fed to it via the node nested under it. Which prompts the question whether it would be good to show that value as an attribute of the sort node during EXPLAIN ANALYZE instead of having to scan down to the child node. I guess you can argue that we are currently doing so, since that is the same value as the estimated rows returned. If you were to change that to reflect the impact of the parent limit node, you'd probably want to add something else to reflect the child input size (in rows, not memory). From a pure theory standpoint, having the estimated rows reflect the input size instead of the output size seems wrong. In the presence of a limit it won't output more than N rows, whereas in all other cases the input and the output will be identical. That said, I am only pondering this concept because of this thread; it would help to know what sparked all of this in the first place. 
From a practical perspective, the current behavior captures the most important aspect of the sort, the size of the input, and a user who knows of the limit isn't likely to wonder whether we are somehow being wasteful by returning the extra rows; which are not returned so much as scanned over in place by the parent node. David J.
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
David G Johnston david.g.johns...@gmail.com writes: Tom Lane-2 wrote [ shrug... ] The estimated value is the planner's estimate of what would happen *if you ran the node to completion*, which in practice doesn't happen because of the LIMIT. I don't see how a sort node cannot run to completion... The sort must have read all of its *input*, or it can't be sure it's giving the correct first result row. But run to completion means that it delivered all of its *output*, which obviously does not happen when under a LIMIT. It's entirely possible BTW that the sort's internal processing is not complete when it starts returning rows. For example, when we do a spill-to-disk merge sort, the final merge pass is typically done on-the-fly while returning rows, and so some fraction of that processing may never be completed if the query stops early. It's still seen all the input rows, but it hasn't completely determined their ordering. regards, tom lane
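[Editor's note: the on-the-fly final merge Tom describes can be mimicked with a lazy k-way merge. Illustrative Python only; `heapq.merge` yields rows on demand, so stopping after a LIMIT leaves the rest of the merge work undone, just as Tom says.]

```python
import heapq
from itertools import islice

# Three pre-sorted runs, as after the run-building phase of an
# external sort; the final merge is a generator, not a list.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged = heapq.merge(*runs)

# A LIMIT 3 above the sort consumes only the first three rows; the
# remaining merge work is simply never performed, even though every
# input row was read while building the runs.
first_three = list(islice(merged, 3))
print(first_three)  # -> [1, 2, 3]
```

This is the sense in which the sort has "seen all the input rows" but never completely determines their ordering.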
Re: [HACKERS] EXPLAIN (ANALYZE) broken
Kevin Grittner kgri...@ymail.com writes: Commit 2c92edad48796119c83d7dbe6c33425d1924626d has broken EXPLAIN's ANALYZE option in parentheses, which also makes some other options unusable. test=# EXPLAIN (ANALYZE) SELECT 1; ERROR: syntax error at or near ANALYZE LINE 1: EXPLAIN (ANALYZE) SELECT 1; ^ [ scratches head... ] Coulda sworn I checked that. Will look into it. regards, tom lane
Re: [HACKERS] explain analyze query execution time
On 22/11/11 02:58, Kevin Grittner wrote: Rudyar wrote: I'm trying to get the execution time of a query workload. I tried using EXPLAIN ANALYZE, but this time is always higher than the execution time of the query from a client like pgAdmin3. What is the reason for that difference? It's the observer effect -- there is a cost to the timing, counting, measuring, and reporting which is done by EXPLAIN ANALYZE, which distorts what is being measured. It's just about impossible to avoid entirely. -Kevin Thanks Kevin. What tool do you recommend for measuring the real query execution time? -- Rudyar Cortés. Estudiante de Ingeniería Civil Informática Universidad Técnica Federico Santa María. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze query execution time
Rudyar wrote: What tool do you recommend for measuring the real query execution time? The -hackers list is for discussion to coordinate development of the PostgreSQL database product. For user questions like this, please pick a more appropriate list based on the descriptions here: http://www.postgresql.org/community/lists/ While any further follow-up should be on another list, I'll briefly answer here. EXPLAIN ANALYZE is great for seeing how a query is being executed, but for accurate timing of how long the query runs without generating all that extra information, you can measure it on the client side, or turn on logging of statements running longer than some particular time. In psql you can use \timing on; in Java you can run System.currentTimeMillis() before and after running the query, etc. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze query execution time
On 22/11/11 10:26, Kevin Grittner wrote: [...] In psql you can use \timing on, in Java you can run System.currentTimeMillis() before and after running the query, etc. -Kevin Thanks Kevin ;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze query execution time
On 11/21/11, Rudyar rudyar.cor...@gmail.com wrote: Hello, I'm trying to get the execution time of a query workload. I tried using EXPLAIN ANALYZE, but this time is always higher than the execution time of the query from a client like pgAdmin3. What is the reason for that difference? ANALYZE has to do a lot of gettimeofday calls and other bookkeeping, and that takes time. If the data queried is all in memory, this ANALYZE overhead can be far more than the query itself. Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze query execution time
Rudyar wrote: I'm trying to get the execution time of a query workload. I tried using EXPLAIN ANALYZE, but this time is always higher than the execution time of the query from a client like pgAdmin3. What is the reason for that difference? It's the observer effect -- there is a cost to the timing, counting, measuring, and reporting which is done by EXPLAIN ANALYZE, which distorts what is being measured. It's just about impossible to avoid entirely. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Mon, Nov 15, 2010 at 03:33, Greg Stark st...@mit.edu wrote: This is an update to my earlier patch to add getrusage resource tracking to EXPLAIN ANALYZE. With this patch you get something like:

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual time=17.240..1123.751 rows=262144 loops=1)
   Resources: sys=210.000ms user=430.000ms read=33.6MB
   Buffers: shared read=4298
 Total runtime: 1548.651 ms
(4 rows)

The main change is to make it work under Windows. At least I think the changes should make it work under Windows; I haven't been able to test it. Actually I'm not too happy with the way I did it. I would be more inclined to hack the getrusagestub.h definition of struct rusage to have an instr_time in it so that we can use the same macros directly. But that's more changes than I would be happy making without being able to compile them to test them. I tried building this under Windows, and got a bunch of errors. First and easiest - you need to rename IOCOUNTERS to IO_COUNTERS in getrusage.c :P But then I get a number of:

c:\pgsql\src\include\portability/instr_time.h(133) : error C2371: 'instr_time' : redefinition; different basic types

and

.\src\backend\utils\adt\pgstatfuncs.c(1345) : error C2039: 'QuadPart' : is not a member of 'timeval'
C:\Program Files\Microsoft SDKs\Windows\v6.1\include\winsock2.h(176) : see declaration of 'timeval'

which I believe are related to the same issue. Haven't looked close enough to figure out what you actually intend for it to be :-) Finally, a number of:

c:\pgsql\src\include\executor/instrument.h(19) : fatal error C1083: Cannot open include file: 'sys/resource.h': No such file or directory

The include file simply doesn't exist on Windows. Hiding it behind an #ifdef complains about fields missing in struct rusage in some cases and lack of an existing rusage definition in others. I think you need includes of pg_rusage.h, which will make sure it brings in rusagestub.h when necessary and sys/resource.h when it's there?
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Mon, Nov 22, 2010 at 12:40 PM, Magnus Hagander mag...@hagander.net wrote: I tried building this under windows, and got a bunch of errors. Thanks! -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Tue, Nov 16, 2010 at 2:43 AM, Robert Haas robertmh...@gmail.com wrote: I don't really think these changes to the INSTR macros make much sense. The macros don't really add much notational convenience; they're mostly wrappers to make the WIN32 and non-WIN32 cases work similarly for the instrumentation stuff, so hacking them up to use them for this doesn't seem like it adds anything. Just do whatever it is you need to do, or define macros locally in explain.c. Well, they're going to be just like the macros in instr_time and I didn't really want to duplicate them. What's really goofy here is that on Windows the getrusage portability wrapper is actually consing up those times from FILETIMEs, which I believe are actually just like the instr_time data type that we're already using in those macros, so it's going off doing a silly dance converting to struct timeval just so it can use macros that aren't convenient at all on Windows. I definitely agree this is awkward. I would love to find a cleaner solution here. I'll try copying the macros and see if that works out more cleanly. It doesn't make much sense to me to normalize the memory for this output to a variable unit when the other memory values we use in explain.c are still going to be printed as kB. I think we should just print it in kB and call it good. Alternatively, we could apply the same normalization algorithm across the board, but I don't think that's as good. I think we should have a project policy of always printing memory and disk usage in kB, MB, GB etc unless they're functions returning an integer intended for machine use. Effectively this is the dual of accepting units on all our memory gucs. I don't know about others, but I find it pretty hard to read things like 1234567kB and compare it to 125765kB in my head. I continue to feel strongly that the choice of EXPLAIN format should only affect the format, not the choice of information to be displayed. 
I don't have a good feeling for what should appear in XML/JSON/whatever. I thought I was just implementing the consensus, which was that there wasn't much point in not including the maximum level of detail because the consumer could decide what to display. But having it be the same as the human-readable output is also reasonable. Whatever we're doing elsewhere, I'll make this consistent. Using the verbose option to control how much data the resource option prints is, I think, not a good idea. If you want to have two modes, one for partial rusage data and one for full rusage data, you can just as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]). I believe that the existing grammar is adequate to support that; you'd just need to write the appropriate DefElem-parsing code. But personally I'd just print the whole kit and kaboodle regardless. This is a separate question. The numbers I'm outputting without VERBOSE now are the numbers that I understand and which relate to the other numbers in explain. My intent was that as we understand how the other numbers relate to Postgres functioning we'll learn which ones to include. But mostly they're hard to relate to Postgres and the query execution in any way - just noise that is hard to explain to users. I'm not even sure which ones it makes sense to sum over time -- is the integral virtual memory usage over time something that it makes sense to sum over time? What units is it in? I don't see why VERBOSE isn't the right key to use. The other things VERBOSE toggles are also extra detail that's usually useless but might be useful if you're interested in the inner workings. Things like which fields are being included at each level of the plan, for example. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Tue, Nov 16, 2010 at 12:19 PM, Greg Stark st...@mit.edu wrote: On Tue, Nov 16, 2010 at 2:43 AM, Robert Haas robertmh...@gmail.com wrote: I don't really think these changes to the INSTR macros make much sense. The macros don't really add much notational convenience; they're mostly wrappers to make the WIN32 and non-WIN32 cases work similarly for the instrumentation stuff, so hacking them up to use them for this doesn't seem like it adds anything. Just do whatever it is you need to do, or define macros locally in explain.c. Well they're going to be just like the macros in instr_time and I didn't really want to duplicate them. What's really goofy here is that on Windows the getrusage portability wrapper is actually consing up those times from FILETIMEs which I believe are actually just like the instr_time data type that we're already using in those macros so it's going of doing a silly dance converting to struct timeval just so it can use macros that aren't convenient at all on Windows. I definitely agree this is awkward. I would love to find a cleaner solution here. I'll try copying the macros and see if that works out more cleanly. OK. It doesn't make much sense to me to normalize the memory for this output to a variable unit when the other memory values we use in explain.c are still going to be printed as kB. I think we should just print it in kB and call it good. Alternatively, we could apply the same normalization algorithm across the board, but I don't think that's as good. I think we should have a project policy of always printing memory and disk usage in kB, MB, GB etc unless they're functions returning an integer intended for machine use. Effectively this is the dual of accepting units on all our memory gucs. I don't know about others but I find it pretty hard to read things like 1234567kB and compare it to 125765kB in my head. We have no other place in the system that takes 12345678kB and converts it inexactly to MB or GB. 
rhaas=# set work_mem to '1048576kB';
SET
rhaas=# show work_mem;
 work_mem
----------
 1GB
(1 row)

rhaas=# set work_mem to '1048577kB';
SET
rhaas=# show work_mem;
 work_mem
-----------
 1048577kB
(1 row)

We could decide that for text-format EXPLAIN output, an inexact conversion is OK, but it's still not OK to do it for the new fields you're adding and not for the existing fields (see the sort and hash instrumentation). Personally, I'm not really convinced that making such a change has a lot of value, and I think it should be submitted as a separate patch and discussed separately, rather than being rolled in here. But if we are going to change it then we at least need to be consistent. Using the verbose option to control how much data the resource option prints is, I think, not a good idea. If you want to have two modes, one for partial rusage data and one for full rusage data, you can just as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]). I believe that the existing grammar is adequate to support that; you'd just need to write the appropriate DefElem-parsing code. But personally I'd just print the whole kit and kaboodle regardless. This is a separate question. The numbers I'm outputting without VERBOSE now are the numbers that I understand and which relate to the other numbers in explain. My intent was that as we understand how the other numbers relate to Postgres functioning we'll learn which ones to include. But mostly they're hard to relate to Postgres and the query execution in any way and just noise that are hard to explain to users. I'm not even sure which ones it makes sense to sum over time -- is the integral virtual memory usage over time something that it makes sense to sum over time? What units is it in? I suspect that the ru_maxrss, ru_ixrss, ru_idrss, and ru_isrss values are useless to us for EXPLAIN purposes because it sounds like they don't increment over time. The others presumably do, so it makes sense to show 'em if the system is collecting them. 
I don't see why VERBOSE isn't the right key to use. The other things VERBOSE toggles are also extra detail that's usually useless but might be useful if you're interested in the inner workings. Things like which fields are being included at each level of the plan for example. Yeah, VERBOSE is kind of a catch-all for things that we don't have individual flags for. But I think it's better for each piece of data to depend on one setting, rather than a combination of two or more settings. Otherwise you end up being forced to use VERBOSE and then you get this deluge of output. I'd actually sort of like to remove some things from VERBOSE and give them their own settings, rather than adding more. The fact that VERBOSE turns on Output: is particularly annoying. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Tue, Nov 16, 2010 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: I think we should have a project policy of always printing memory and disk usage in kB, MB, GB etc unless they're functions returning an integer intended for machine use. rhaas=# set work_mem to '1048577kB'; Interesting. Though in this case I'll mention the raw data is in 512 byte increments so that would imply having to print .5 sometimes. We could decide that for text-format EXPLAIN output, an inexact conversion is OK, but it's still not OK to do it for the new fields you're adding and not for the existing fields (see the sort and hash instrumentation). Personally, I'm not really convinced that making such a change has a lot of value, and I think it should be submitted as a separate patch and discussed separately, rather than being rolled in here. But if we are going to change it then we at least need to be consistent. Well I proposed that as a separate patch back pre-9.0 and it was problematic. Having a policy of doing x doesn't mean we have to switch everything to x or do nothing. We can have a policy that it's better to do x and then go around fixing things as we find them. But fine, I'll separate that into a separate patch. I suspect that the ru_maxrss, ru_ixrss, ru_idrss, and ru_isrss values are useless to us for EXPLAIN purposes because it sounds like they don't increment over time. The others presumably do, so it makes sense to show 'em if the system is collecting them. I'm not sure, the integral part might mean it is ok to sum. I've never seen it explained anywhere exactly what these represent. But the other question is whether they have anything to do with the plan nodes. If Postgres is taking lots of memory or receiving signals or receiving sysv messages (we don't even use sysv messages so that would be very strange) it doesn't necessarily have anything to do with the executor algorithm. They're probably just not very interesting stats to present to the user. 
Yeah, VERBOSE is kind of a catch-all for things that we don't have individual flags for. But I think it's better for each piece of data to depend on one setting, rather than a combination of two or more settings. Otherwise you end up being forced to use VERBOSE and then you get this deluge of output. I'd actually sort of like to remove some things from VERBOSE and give them their own settings, rather than adding more. The fact that VERBOSE turns on Output: is particularly annoying. I tend to think it's "don't be clever about showing me just the useful stuff; include whatever you've got even if you think it's not useful". I would consider it a bug if there's anything that requires VERBOSE that a user finds relevant to fixing a user problem (as opposed to debugging a postgres bug). I'm concerned about the converse problem: the "why do I have to include two dozen flags to get postgres to actually include everything?". Along with the "try turning on this flag and resend the plan. Hm, oops, forgot a flag, resend it again with this too. Oh, that's still not including something we need, try it a third time with this flag" etc. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Tue, Nov 16, 2010 at 2:53 PM, Greg Stark st...@mit.edu wrote: Yeah, VERBOSE is kind of a catch-all for things that we don't have individual flags for. But I think it's better for each piece of data to depend on one setting, rather than a combination of two or more settings. Otherwise you end up being forced to use VERBOSE and then you get this deluge of output. I'd actually sort of like to remove some things from VERBOSE and give them their own settings, rather than adding more. The fact that VERBOSE turns on Output: is particularly annoying. I tend to think it's "don't be clever about showing me just the useful stuff; include whatever you've got even if you think it's not useful". I would consider it a bug if there's anything that requires VERBOSE that a user finds relevant to fixing a user problem (as opposed to debugging a postgres bug). I'm concerned about the converse problem: the "why do I have to include two dozen flags to get postgres to actually include everything?". Along with the "try turning on this flag and resend the plan. Hm, oops, forgot a flag, resend it again with this too. Oh, that's still not including something we need, try it a third time with this flag" etc. Well, the reason's pretty obvious in this case: each of ANALYZE, BUFFERS, and RESOURCE adds a huge amount of incremental overhead. I wouldn't object to having a flag that says "just turn on absolutely everything you've got". Then you could just say: EXPLAIN (THECRAPOUTOFIT) query... In any case, this seems like an argument for making RESOURCE print everything we think might ever be useful, rather than just some subset of it. A rule that says "print some of the RESOURCE information when RESOURCE is specified, but the rest only if VERBOSE is also specified or if a non-text output format is used" seems to have exactly the sort of ickiness you're complaining about; I'd rather make it all or nothing. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Explain analyze getrusage tracking
On Sun, Nov 14, 2010 at 9:33 PM, Greg Stark st...@mit.edu wrote: This is an update to my earlier patch to add getrusage resource tracking to EXPLAIN ANALYZE. With this patch you get something like:

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual time=17.240..1123.751 rows=262144 loops=1)
   Resources: sys=210.000ms user=430.000ms read=33.6MB
   Buffers: shared read=4298
 Total runtime: 1548.651 ms
(4 rows)

The main change is to make it work under Windows. At least I think the changes should make it work under Windows; I haven't been able to test it. Actually I'm not too happy with the way I did it. I would be more inclined to hack the getrusagestub.h definition of struct rusage to have an instr_time in it so that we can use the same macros directly. But that's more changes than I would be happy making without being able to compile them to test them. I don't really think these changes to the INSTR macros make much sense. The macros don't really add much notational convenience; they're mostly wrappers to make the WIN32 and non-WIN32 cases work similarly for the instrumentation stuff, so hacking them up to use them for this doesn't seem like it adds anything. Just do whatever it is you need to do, or define macros locally in explain.c. It doesn't make much sense to me to normalize the memory for this output to a variable unit when the other memory values we use in explain.c are still going to be printed as kB. I think we should just print it in kB and call it good. Alternatively, we could apply the same normalization algorithm across the board, but I don't think that's as good. I continue to feel strongly that the choice of EXPLAIN format should only affect the format, not the choice of information to be displayed. Using the verbose option to control how much data the resource option prints is, I think, not a good idea. 
If you want to have two modes, one for partial rusage data and one for full rusage data, you can just as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]). I believe that the existing grammar is adequate to support that; you'd just need to write the appropriate DefElem-parsing code. But personally I'd just print the whole kit and kaboodle regardless. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote: What I'd find strange about 6.67 rows in your example is more that on the estimated rows side, it seems to imply an unrealistically precise estimate in the same way that 667 rows would seem unrealistically precise to me. Maybe rounding to 2 significant digits would reduce confusion? You're right that the number of significant digits already exceeds the true accuracy of the computation. I think what Robert wants to see is the exact value used in the calc, so the estimates can be checked more thoroughly than is currently possible. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote: What I'd find strange about 6.67 rows in your example is more that on the estimated rows side, it seems to imply an unrealistically precise estimate in the same way that 667 rows would seem unrealistically precise to me. Maybe rounding to 2 significant digits would reduce confusion? You're right that the number of significant digits already exceeds the true accuracy of the computation. I think what Robert wants to see is the exact value used in the calc, so the estimates can be checked more thoroughly than is currently possible. Bingo. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
Robert Haas robertmh...@gmail.com writes: On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote: You're right that the number of significant digits already exceeds the true accuracy of the computation. I think what Robert wants to see is the exact value used in the calc, so the estimates can be checked more thoroughly than is currently possible. Bingo. Uh, the planner's estimate *is* an integer. What was under discussion (I thought) was showing some fractional digits in the case where EXPLAIN ANALYZE is outputting a measured row count that is an average over multiple loops, and therefore isn't necessarily an integer. In that case the measured value can be considered arbitrarily precise --- though I think in practice one or two fractional digits would be plenty. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Jun 2, 2009, at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote: You're right that the number of significant digits already exceeds the true accuracy of the computation. I think what Robert wants to see is the exact value used in the calc, so the estimates can be checked more thoroughly than is currently possible. Bingo. Uh, the planner's estimate *is* an integer. What was under discussion (I thought) was showing some fractional digits in the case where EXPLAIN ANALYZE is outputting a measured row count that is an average over multiple loops, and therefore isn't necessarily an integer. In that case the measured value can be considered arbitrarily precise --- though I think in practice one or two fractional digits would be plenty. We're in violent agreement here. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
Euler Taveira de Oliveira wrote: Robert Haas escreveu: ...EXPLAIN ANALYZE reports the number of rows as an integer... Any chance we could reconsider this decision? I often find myself wanting to know the value that is here called ntuples, but rounding ntuples/nloops off to the nearest integer loses too much precision. Don't you think is too strange having, for example, 6.67 rows? I would confuse users and programs that parses the EXPLAIN output. However, I wouldn't object I don't think it's that confusing. If it says 0.1 rows, I imagine most people would infer that this means typically 0, but sometimes 1 or a few rows. What I'd find strange about 6.67 rows in your example is more that on the estimated rows side, it seems to imply an unrealistically precise estimate in the same way that 667 rows would seem unrealistically precise to me. Maybe rounding to 2 significant digits would reduce confusion? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
Joshua Tolley eggyk...@gmail.com writes: On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote: On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira Don't you think is too strange having, for example, 6.67 rows? No stranger than having it say 7 when it's really not. Actually mine mostly come out 1 when the real value is somewhere between 0.5 and 1.49. :-( +1. It would help users realize more quickly that some of the values in the EXPLAIN output are, for instance, *average* number of rows *per iteration* of a nested loop, say, rather than total rows found in all loops. I think it would only be sensible to show fractional digits if nloops is greater than 1. Otherwise the value must in fact be an integer, and you're just going to confuse people more by suggesting that it might not be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Fri, May 29, 2009 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Joshua Tolley eggyk...@gmail.com writes: On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote: On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira Don't you think is too strange having, for example, 6.67 rows? No stranger than having it say 7 when it's really not. Actually mine mostly come out 1 when the real value is somewhere between 0.5 and 1.49. :-( +1. It would help users realize more quickly that some of the values in the EXPLAIN output are, for instance, *average* number of rows *per iteration* of a nested loop, say, rather than total rows found in all loops. I think it would only be sensible to show fractional digits if nloops is greater than 1. Otherwise the value must in fact be an integer, and you're just going to confuse people more by suggesting that it might not be. That might be over-engineering, but I'll take it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
Robert Haas escreveu: I have always assumed that there is some very good reason why EXPLAIN ANALYZE reports the number of rows as an integer rather than a floating point value, but in reading explain.c it seems that the reason is just that we decided to round to zero decimal places. Any chance we could reconsider this decision? I often find myself wanting to know the value that is here called ntuples, but rounding ntuples/nloops off to the nearest integer loses too much precision. Don't you think it is too strange having, for example, 6.67 rows? It would confuse users and programs that parse the EXPLAIN output. However, I wouldn't object to adding ntuples to an extended explain output (as discussed in the other thread). -- Euler Taveira de Oliveira http://www.timbira.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: Robert Haas escreveu: I have always assumed that there is some very good reason why EXPLAIN ANALYZE reports the number of rows as an integer rather than a floating point value, but in reading explain.c it seems that the reason is just that we decided to round to zero decimal places. Any chance we could reconsider this decision? I often find myself wanting to know the value that is here called ntuples, but rounding ntuples/nloops off to the nearest integer loses too much precision. Don't you think is too strange having, for example, 6.67 rows? No stranger than having it say 7 when it's really not. Actually mine mostly come out 1 when the real value is somewhere between 0.5 and 1.49. :-( ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] explain analyze rows=%.0f
On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote:
> On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira wrote:
>> Don't you think it is too strange having, for example, 6.67 rows?
>
> No stranger than having it say 7 when it's really not.  Actually mine
> mostly come out 1 when the real value is somewhere between 0.5 and
> 1.49. :-(

+1. It would help users realize more quickly that some of the values in
the EXPLAIN output are, for instance, the *average* number of rows *per
iteration* of a nested loop, say, rather than the total rows found in
all loops. That's an important distinction that isn't immediately clear
to the novice EXPLAIN reader, but it would become so very quickly as
users tried to figure out how a scan could come up with a fractional
row.

- Josh / eggyknap
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Greg,

Gregory Stark wrote:
> I don't think DTrace is overkill either. The programmatic interface is
> undocumented (but I've gotten Sun people to admit it exists -- I just
> have to reverse engineer it from the existing code samples) but should
> be more or less exactly what we need.

You probably know this already. There are existing commands that use the
programmatic interface and would provide a good starting point. Here are
a couple:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/lockstat/lockstat.c
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/dtrace/dtrace.c

One of my colleagues is in the process of putting a tutorial together
for how to do this, so if you decide to pursue this approach and need
assistance, please let me know.

Regards,
Robert

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Decibel! <[EMAIL PROTECTED]> writes:
> Also, has anyone looked into adding a class of system calls that would
> actually tell us if the kernel issued physical IO? I find it hard to
> believe that other RDBMSes wouldn't like to have that info...

Yeah, I think that's called DTrace.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Decibel! <[EMAIL PROTECTED]> writes:
> When a read() call returns, surely the kernel knows whether it actually
> issued a physical read request to satisfy that. I don't see any reason
> why you couldn't have a version of read() that returns that
> information. I also rather doubt that we're the only userland software
> that would make use of that.

I'm told this exists on Windows for the async interface. But AFAIK it
doesn't on Unix. The visibility into things like this is what makes
DTrace so remarkable.

I think there isn't much userland software interested in this. The only
two cases I can think of are databases -- which use direct I/O partly
because of this issue -- and real-time software like multimedia software
-- which isn't so much interested in measuring it as in forcing things
to be preloaded with stuff like posix_fadvise() or mlock().

I don't think DTrace is overkill either. The programmatic interface is
undocumented (but I've gotten Sun people to admit it exists -- I just
have to reverse engineer it from the existing code samples) but should
be more or less exactly what we need.

But the lowest-common-denominator approach of just timing read() and
seeing if it took long enough to involve either a context switch or
sleeping on physical i/o should be a pretty close approximation. The
case where it would be least accurate is when most or all of the data is
actually in the cache: even with a low false-positive rate for detecting
cache misses, the false positives will still dominate the true near-zero
rate of cache misses. We could mitigate that somewhat by describing it
in the plan as something like (... I/O fast=nnn slow=nnn) instead of the
more descriptive "physical" and "logical".

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
On Dec 14, 2007, at 11:10 PM, Neil Conway wrote:
>> But it occurred to me just now that the hardware instruction counter
>> available on just about every platform would be good enough for a
>> heuristic guess at whether the read(2) was cached.
>
> I'm skeptical that this would be reliable enough to be very useful,
> especially in the face of concurrent, unpredictable system activity on
> a busy system. I agree that it would be useful information, though.
> Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
> the number of logical and physical I/Os from Postgres' perspective
> (i.e. physical I/O just means we need to go to the kernel).

*watches hands wave*

If we assume that what we *really* want to see is the difference between
a cache I/O and one that truly hits a platter, I don't see how you could
come up with enough variation to account for that. Unless you're super
lucky and the sector you need is just about to hit the head when you
issue the request, you're going to have a delay measured in
milliseconds. 1ms on a modern CPU is over 100,000 cycles. That's a lot
of instructions...

Someone want to throw together some code that actually measures this?
Maybe something that keeps a histogram of how many instructions take
place per I/O request? If it turns out that counters do vary too much
between CPUs, there might be ways that we can account for that.

Also, has anyone looked into adding a class of system calls that would
actually tell us if the kernel issued physical IO? I find it hard to
believe that other RDBMSes wouldn't like to have that info...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
On Mon, 17 Dec 2007, Decibel! wrote:
> Someone want to throw together some code that actually measures this?
> Maybe something that keeps a histogram of how many instructions take
> place per I/O request? If it turns out that counters do vary too much
> between CPUs, there might be ways that we can account for that.

I'd expect the easiest way to do a proof of concept here would be to use
the Linux oprofile tool. That's already abstracted away the differences
in hardware counters and provides a relatively simple interface to
collect the data without getting bogged down with implementation there.
Starting from scratch, going right to the hardware counters and building
from there, is a big project--they've been hacking on oprofile for
almost six years now and still aren't suggesting it's release quality
yet.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
On 12/17/07, Decibel! <[EMAIL PROTECTED]> wrote:
> Also, has anyone looked into adding a class of system calls that would
> actually tell us if the kernel issued physical IO? I find it hard to
> believe that other RDBMSes wouldn't like to have that info...

Non-blocking style interfaces can help here. On Windows, for instance, a
read returns data at the call site if it was satisfied by cache, instead
of invoking the asynchronous notification.
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
On Dec 16, 2007 1:03 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Gokulakannan Somasundaram wrote:
>>> I was going to say that I'm really only interested in physical I/O.
>>> Logical I/O which is satisfied by the kernel cache is only marginally
>>> interesting, and buffer fetches from Postgres's shared buffers are
>>> entirely uninteresting from the point of view of trying to figure out
>>> what is slowing down a query.
>>
>> Ok, the physical I/Os are already visible if you enable
>> log_statement_stats.
>
> I think you missed the point. What log_statement_stats shows are not
> physical I/Os, they're read() system calls. Unfortunately there's no
> direct way to tell if a read() is satisfied from OS cache or not.
> Greg's suggestion was about how to do that.

Oh OK. Thanks for clarifying.

--
Thanks,
Gokul.
CertoSQL Project, Allied Solution Group.
(www.alliedgroups.com)
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Gokulakannan Somasundaram <[EMAIL PROTECTED]> writes:
> Hi,
> I already made a discussion about it. We can view the logical I/Os if
> we enable log_statement_stats in the conf file and apply the following
> patch. But putting it in EXPLAIN ANALYZE makes more sense to me.

I was going to say that I'm really only interested in physical I/O.
Logical I/O which is satisfied by the kernel cache is only marginally
interesting, and buffer fetches from Postgres's shared buffers are
entirely uninteresting from the point of view of trying to figure out
what is slowing down a query.

However, I suppose that's not true. There are other reasons why buffer
fetches could be interesting. In particular, I imagine that when users
post EXPLAIN ANALYZE output it would give us a good idea of whether
their tables are bloated or their tuples are extremely wide (in cases
where the planner gets it wrong).

But I do think that showing logical I/Os without even a heuristic-based
measurement of actual physical I/O is pretty useless. It will make
people think they want to grow their shared buffers to cover all of
memory.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
> I was going to say that I'm really only interested in physical I/O.
> Logical I/O which is satisfied by the kernel cache is only marginally
> interesting, and buffer fetches from Postgres's shared buffers are
> entirely uninteresting from the point of view of trying to figure out
> what is slowing down a query.

Ok, the physical I/Os are already visible if you enable
log_statement_stats. Again, I accept that it would be more helpful if
they were displayed with EXPLAIN ANALYZE.

> However, I suppose that's not true. There are other reasons why buffer
> fetches could be interesting. In particular, I imagine that when users
> post EXPLAIN ANALYZE output it would give us a good idea of whether
> their tables are bloated or their tuples are extremely wide (in cases
> where the planner gets it wrong).

I have used it a lot for query tuning. If we re-write a query in such a
way that the logical reads come down, then it implies fewer physical
reads in production. I think you would accept that there are some ways
in which a query can be re-written only by humans and not by the
optimizer. When we do that, instead of looking at the EXPLAIN ANALYZE
time, it makes more sense to me to look at the logical reads.

> But I do think that showing logical I/Os without even a heuristic-based
> measurement of actual physical I/O is pretty useless. It will make
> people think they want to grow their shared buffers to cover all of
> memory.

I just want to clarify that we should display both logical reads and
physical reads together. Increasing the shared buffers by looking at the
performance of one query doesn't seem to be a good idea, and people
should be aware that logical reads are not for shared buffer management.

--
Thanks,
Gokul.
CertoSQL Project, Allied Solution Group.
(www.alliedgroups.com)
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Gokulakannan Somasundaram wrote:
>> I was going to say that I'm really only interested in physical I/O.
>> Logical I/O which is satisfied by the kernel cache is only marginally
>> interesting, and buffer fetches from Postgres's shared buffers are
>> entirely uninteresting from the point of view of trying to figure out
>> what is slowing down a query.
>
> Ok, the physical I/Os are already visible if you enable
> log_statement_stats.

I think you missed the point. What log_statement_stats shows are not
physical I/Os, they're read() system calls. Unfortunately there's no
direct way to tell if a read() is satisfied from OS cache or not. Greg's
suggestion was about how to do that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
On Fri, 2007-12-14 at 15:47 +0000, Gregory Stark wrote:
> I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O
> usage. This would be especially useful if we could distinguish hardware
> versus logical I/O though. And I always thought that would be very
> hard. My thought in the past was that we could do it on Solaris by
> having Postgres use DTrace directly via its (undocumented but existing)
> programmatic interface.
>
> For other operating systems it was tempting to suggest just timing the
> read(2) call to see if it took too long to be a logical operation. The
> problem there is that gettimeofday would impose far too much overhead
> to make that practical (or even be precise enough to work properly).
> But it occurred to me just now that the hardware instruction counter
> available on just about every platform would be good enough for a
> heuristic guess at whether the read(2) was cached.

I'm skeptical that this would be reliable enough to be very useful,
especially in the face of concurrent, unpredictable system activity on a
busy system. I agree that it would be useful information, though.
Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
the number of logical and physical I/Os from Postgres' perspective
(i.e. physical I/O just means we need to go to the kernel).

> The problem generally with using the hardware instruction counter is
> that it's not necessarily in sync between processors and might
> therefore run backwards or skip time forwards. This is a problem for
> profiling, but if all we care about is a boolean guess at whether the
> request was satisfied quickly from cache, then any such skipping
> forward or backward would represent a context switch, which we could
> just toss in the hardware I/O bucket. It doesn't matter exactly how
> long the hardware I/O took, only that there was one.
>
> To that end I would love to see something like:
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Bitmap Heap Scan on h  (cost=8.52..16.45 rows=2 width=512)
>    (actual time=78.926..87.708 rows=2 loops=1 logical-I/O=2 physical-I/O=1)
>    Recheck Cond: (i = ANY ('{100,1000}'::integer[]))
>    ->  Bitmap Index Scan on hi  (cost=0.00..8.52 rows=2 width=0)
>        (actual time=74.539..74.539 rows=2 loops=1 logical-I/O=2 physical-I/O=2)
>          Index Cond: (i = ANY ('{100,1000}'::integer[]))
>  Total runtime: 87.820 ms
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Hi,
I already made a discussion about it. We can view the logical I/Os if we
enable log_statement_stats in the conf file and apply the following
patch. But putting it in EXPLAIN ANALYZE makes more sense to me.

*** postgresql-8.3beta1/src/backend/storage/buffer/bufmgr.c	Tue Sep 25 18:11:48 2007
--- postgresql-8.3patch/src/backend/storage/buffer/bufmgr.c	Fri Oct 19 23:18:36 2007
***************
*** 1470,1477 ****
  		localhitrate = (float) LocalBufferHitCount *100.0 / ReadLocalBufferCount;

  	appendStringInfo(str,
! 		"!\tShared blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! 		ReadBufferCount - BufferHitCount, BufferFlushCount, hitrate);
  	appendStringInfo(str,
  		"!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
  		ReadLocalBufferCount - LocalBufferHitCount, LocalBufferFlushCount, localhitrate);
--- 1470,1477 ----
  		localhitrate = (float) LocalBufferHitCount *100.0 / ReadLocalBufferCount;

  	appendStringInfo(str,
! 		"!\tShared blocks: %10ld Logical Reads, %10ld Physical Reads, %10ld written, buffer hit rate = %.2f%%\n",
! 		ReadBufferCount, ReadBufferCount - BufferHitCount, BufferFlushCount, hitrate);
  	appendStringInfo(str,
  		"!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
  		ReadLocalBufferCount - LocalBufferHitCount, LocalBufferFlushCount, localhitrate);

--
Thanks,
Gokul.
CertoSQL Project, Allied Solution Group.
(www.alliedgroups.com)
Re: [HACKERS] EXPLAIN ANALYZE
On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
> Simon Riggs wrote:
>> Well, I'd like a way of making EXPLAIN ANALYZE return something useful
>> within a reasonable amount of time. We can define that as the amount
>> of time that the user considers is their goal for the query.
>
> What sort of useful results would you expect to be able to see from
> such an aborted EXPLAIN ANALYZE? I cannot quite imagine what
> instructive value a partially executed plan output would have. It's not
> like we can somehow ensure executing an equal proportion of each plan
> node or something. Do you have a specific case in mind?

The query is most likely to get canceled while it is working on whatever
node in the plan is the bottleneck, and it's likely going to be easy to
spot since nodes above it wouldn't have gotten much done.

--
Jim Nasby                                            [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Re: [HACKERS] EXPLAIN ANALYZE
Thumbs up on this from a lurker. I recall a previous post about some
sort of progress bar hack that would show you where in a plan a
currently executing query was at. Has any work been done on this?

Josh Reich

Jim C. Nasby wrote:
> On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
>> Simon Riggs wrote:
>>> Well, I'd like a way of making EXPLAIN ANALYZE return something
>>> useful within a reasonable amount of time. We can define that as the
>>> amount of time that the user considers is their goal for the query.
>>
>> What sort of useful results would you expect to be able to see from
>> such an aborted EXPLAIN ANALYZE? I cannot quite imagine what
>> instructive value a partially executed plan output would have. It's
>> not like we can somehow ensure executing an equal proportion of each
>> plan node or something. Do you have a specific case in mind?
>
> The query is most likely to get canceled while it is working on
> whatever node in the plan is the bottleneck, and it's likely going to
> be easy to spot since nodes above it wouldn't have gotten much done.
Re: [HACKERS] EXPLAIN ANALYZE
Richard Huxton wrote:
> Simon Riggs wrote:
>> Intermediate results are always better than none at all. I do
>> understand what a partial execution would look like - frequently it is
>> the preparatory stages that slow a query down - costly sorts,
>> underestimated hash joins etc. Other times it is loop underestimation,
>> which can usually be seen fairly quickly.
>
> Surely all you're interested in is where the actual plan differs from
> the expected plan? Could you not just have a mode that issues NOTICEs
> when expected/actual number of rows differ by more than a set amount?
> You'd probably want two NOTICEs - one when the threshold is exceeded,
> one when the node completes.

Right, we already have a TODO:

	* Have EXPLAIN ANALYZE highlight poor optimizer estimates

I was thinking we could issue a NOTICE when the estimates differed from
the actual values by a specified percentage, and that NOTICE could be
issued while the query is still processing, assuming the stage completes
before the query does. This seems much easier than doing protocol
changes.

TODO updated:

	* Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
	  and actual row counts differ by a specified percentage

--
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] EXPLAIN ANALYZE
On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote:
>> * Have EXPLAIN ANALYZE highlight poor optimizer estimates
>
> TODO updated:
>
> * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
>   actual row counts differ by a specified percentage

I don't think this is an improvement. The old wording describes a broad
set of possible improvements. Your new text describes one way of
implementing a subset of the former TODO wording.

-Neil
Re: [HACKERS] EXPLAIN ANALYZE
Neil Conway wrote:
> On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote:
>>> * Have EXPLAIN ANALYZE highlight poor optimizer estimates
>>
>> TODO updated:
>>
>> * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
>>   actual row counts differ by a specified percentage
>
> I don't think this is an improvement. The old wording describes a broad
> set of possible improvements. Your new text describes one way of
> implementing a subset of the former TODO wording.

Well, we can still do a broader implementation if we want it. Do you
have any suggestions? Basically, the more specific, the more likely we
will get someone to do it, and we can always add more details.

--
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] EXPLAIN ANALYZE
On Sun, 2006-12-10 at 18:09 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> The EA case is pretty straightforward though;
>
> Well, no it's not, as you'll recall if you re-read the prior
> discussions. The killer problem is that it's unclear whether the early
> termination of the query represents an error condition or not. If it's
> not an error then you've got a serious problem for non-SELECT queries
> (which EA actually executes, remember)

I was expecting to take the non-ERROR route, realising that all of the
problems you mention make the issue so costly to solve that way. Most
big queries are SELECTs without any updates, and it's those we care
about here. Anybody doing huge updates needs other assistance, IMHO.

Intermediate results are always better than none at all. I do understand
what a partial execution would look like - frequently it is the
preparatory stages that slow a query down - costly sorts, underestimated
hash joins etc. Other times it is loop underestimation, which can
usually be seen fairly quickly.

I foresaw that it would be possible to enforce EA as a read-only
transaction, or throw an error (with no output) when issued in
time-limited form. Maybe that isn't possible.

Yes, I am looking for a fix that can be accomplished without major work
and/or change. This topic is a pain, but not such a priority feature.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN ANALYZE
Tom Lane <[EMAIL PROTECTED]> writes:
> We might be able to finesse the protocol problem by teaching EA to
> respond to query cancel by emitting the data-so-far as a NOTICE (like
> it used to do many moons ago), rather than a standard query result,
> then allowing the query to error out. However this'd be fairly
> unfriendly for client-side tools that are expecting a query result.

What I suggested was introducing a new FE/BE message type for analyze
query plans. Then clients that recognize it can use it to display the
query plan without interfering with the query results. Clients that
don't know what to do with it would have to just ignore it.

Then we could introduce as many ways of triggering these messages as we
like: a GUC to trigger one every n seconds, a FE/BE message like
QueryCancel, say QueryProbe, which triggers one when the user presses a
button in pgadmin or C-t (SIGINFO) in psql, etc.

I was thinking that it should be more structured than the current block
of text that clients receive. I had in mind to make it equivalent to a
PGresult, so the various bits of data would be in different named
columns. This would let GUI clients like pgadmin interpret the results
more effectively and make it easier for us to add data without worrying
about information overload on the user's side.

And the query would keep operating. Canceling the query and
statement_timeout would both be entirely orthogonal to requesting
analyze results.

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN ANALYZE
Simon Riggs wrote:
> Intermediate results are always better than none at all. I do
> understand what a partial execution would look like - frequently it is
> the preparatory stages that slow a query down - costly sorts,
> underestimated hash joins etc. Other times it is loop underestimation,
> which can usually be seen fairly quickly.

Surely all you're interested in is where the actual plan differs from
the expected plan? Could you not just have a mode that issues NOTICEs
when expected/actual number of rows differ by more than a set amount?
You'd probably want two NOTICEs - one when the threshold is exceeded,
one when the node completes.

--
Richard Huxton
Archonet Ltd
Re: [HACKERS] EXPLAIN ANALYZE
On Mon, 2006-12-11 at 11:00 +0000, Gregory Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> We might be able to finesse the protocol problem by teaching EA to
>> respond to query cancel by emitting the data-so-far as a NOTICE (like
>> it used to do many moons ago), rather than a standard query result,
>> then allowing the query to error out. However this'd be fairly
>> unfriendly for client-side tools that are expecting a query result.
>
> What I suggested was introducing a new FE/BE message type for analyze
> query plans. Then clients that recognize it can use it to display the
> query plan without interfering with the query results. Clients that
> don't know what to do with it would have to just ignore it.
>
> Then we could introduce as many ways of triggering these messages as we
> like: a GUC to trigger one every n seconds, a FE/BE message like
> QueryCancel, say QueryProbe, which triggers one when the user presses a
> button in pgadmin or C-t (SIGINFO) in psql, etc.
>
> I was thinking that it should be more structured than the current block
> of text that clients receive. I had in mind to make it equivalent to a
> PGresult, so the various bits of data would be in different named
> columns. This would let GUI clients like pgadmin interpret the results
> more effectively and make it easier for us to add data without worrying
> about information overload on the user's side.
>
> And the query would keep operating. Canceling the query and
> statement_timeout would both be entirely orthogonal to requesting
> analyze results.

I like the idea, but it's more work than I really wanted to get into
right now.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN ANALYZE
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2006-12-11 at 11:00 +0000, Gregory Stark wrote:
>> What I suggested was introducing a new FE/BE message type for analyze
>> query plans.
>
> I like the idea, but it's more work than I really wanted to get into
> right now.

Yeah ... a protocol change is *painful*, especially if you really want
clients to behave in a significantly new way.

			regards, tom lane
Re: [HACKERS] EXPLAIN ANALYZE
Tom Lane wrote:
> Yeah ... a protocol change is *painful*, especially if you really want
> clients to behave in a significantly new way.

A backward-incompatible protocol change is painful, sure, but ISTM we
could implement what Greg describes as a straightforward extension to
the V3 protocol. Then the backend could just avoid sending the query
progress information to non-V4 protocol clients.

-Neil
Re: [HACKERS] EXPLAIN ANALYZE
Simon Riggs wrote:
> I like the idea, but it's more work than I really wanted to get into
> right now.

Well, from another point of view: do we need this feature so urgently
that there is not enough time to do it properly? IMHO, no.

-Neil
Re: [HACKERS] EXPLAIN ANALYZE
Neil Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Yeah ... a protocol change is *painful*, especially if you really want
>> clients to behave in a significantly new way.
>
> A backward-incompatible protocol change is painful, sure, but ISTM we
> could implement what Greg describes as a straightforward extension to
> the V3 protocol.

You're dodging the point though. If you want the new message type to do
anything useful in V4 clients, you still have to define an API for
libpq, update psql, try to figure out what the heck JDBC and ODBC are
going to do with it, etc etc. All doable, but it's a lot more work than
just a quick hack in the backend.

			regards, tom lane
Re: [HACKERS] Explain Analyze mode
See your postgresql.conf for

	log_planner_stats = true	#false
	log_executor_stats = true	#false
	#log_statement_stats = false

	Oleg

On Fri, 11 Mar 2005, Ioannis Theoharis wrote:

> Hi,
>
> i found this form of output of explain analyze, watching some old mails
> in lists.
>
> test4=# explain analyze select * from patients;
> LOG:  query: explain analyze select * from patients;
> LOG:  duration: 0.603887 sec
> LOG:  QUERY STATISTICS
> ! system usage stats:
> !	0.624269 elapsed 0.458985 user 0.123047 system sec
> !	[0.468750 user 0.125000 sys total]
> !	0/0 [0/0] filesystem blocks in/out
> !	7/4 [310/158] page faults/reclaims, 0 [0] swaps
> !	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
> !	0/0 [0/0] voluntary/involuntary context switches
> ! buffer usage stats:
> !	Shared blocks: 2742 read, 0 written, buffer hit rate = 3.59%
> !	Local  blocks: 0 read, 0 written, buffer hit rate = 0.00%
> !	Direct blocks: 0 read, 0 written
>                            QUERY PLAN
> -----------------------------------------------------------------
>  Seq Scan on patients  (cost=0.00..4048.60 rows=131960 width=172)
>    (actual time=0.04..562.97 rows=133886 loops=1)
>  Total runtime: 602.42 msec
> (2 rows)
>
> How can i turn my configuration to achieve this output for explain
> analyze (and only the QUERY PLAN, as like tomorrow)?

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83