Re: [PERFORM] subselect requires offset 0 for good performance.
On 08/02/2013 03:22 AM, Scott Marlowe wrote: On Thu, Aug 1, 2013 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: I am running 8.4.15 and can try 8.4.17 if some patch has been applied to it to address this issue. I just want to know should I A: upgrade to 8.4.17 or B: create a self contained test case. A quick look at the release notes shows no planner fixes in 8.4.16 or 8.4.17, so it would be rather surprising if (A) helps. OK. I was doing some initial testing and if I select out the 4 columns into a test table the query runs fast. If I select all the columns into a test table it runs slow, so it appears table width affects this. Will have more to report tomorrow on it. I don't know what your query is, but here's one I was working on yesterday that shows the problem. It may not be the smallest test case possible, but it works.

EXPLAIN ANALYZE
WITH RECURSIVE
x (start_time) AS (
    SELECT generate_series(1, 100)
),
t (time, timeround) AS (
    SELECT time, time - time % 90 AS timeround
    FROM (SELECT min(start_time) AS time FROM x) AS tmp
    UNION ALL
    SELECT time, time - time % 90
    FROM (SELECT (SELECT min(start_time) AS time
                  FROM x
                  WHERE start_time = t.timeround + 90)
          FROM t
          WHERE t.time IS NOT NULL
          OFFSET 0
         ) tmp
)
SELECT count(*) FROM t WHERE time IS NOT NULL;

If you remove the OFFSET 0, you'll see two more subplans (because time is referenced three times). The difference is much more noticeable if you make the x CTE its own table. Vik PS: This query is emulating a Loose Index Scan. http://wiki.postgresql.org/wiki/Loose_indexscan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
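For readers who haven't met the trick Vik relies on: OFFSET 0 acts as an optimization fence because the planner will not flatten a subquery that carries a LIMIT/OFFSET into the outer query. A minimal sketch, where table t and expensive_fn are hypothetical names, not anything from the thread:

```sql
-- Hypothetical example: without the OFFSET 0 fence, the planner may pull
-- the subquery up and inline expensive_fn(x) once per outer reference;
-- with the fence, it is evaluated once per row of t.
SELECT val, val * 2, val * 3
FROM (
    SELECT expensive_fn(x) AS val
    FROM t
    OFFSET 0  -- optimization fence: prevents subquery pull-up
) AS sub;
```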
[PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
Good day, I have a performance issue when JOINing a view within another view more than once. The query takes over three seconds to execute, which is too long in this case. It's not a problem if the tables are nearly empty, but that isn't the case on the production database. I suspect the planner thinks it's better to first put together the v_address view and JOIN it to the parcel table later on, but the function fx_get_user_tree_subordinates_by_id should be JOINed to the parcel table first, as it reduces the number of rows to less than 200 and any following JOINs would be much faster. I have also run VACUUM, REINDEX and ANALYZE on the whole database, but it seems to have had no effect. Is there any way to nudge the planner toward that way of execution? This is the query: https://app.box.com/s/jzxiuuxoyj28q4q8rzxr This is the query plan: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h (plain text) https://app.box.com/s/jzxiuuxoyj28q4q8rzxr (graphical output) These are the views: https://app.box.com/s/uibzidsazwv3eeauovuk (paginated view) https://app.box.com/s/v71vyexmdyl97m4f3m6u (used three times in the paginated view). Thank you. Peter Slapansky
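One workaround sometimes suggested on this list, sketched here only in outline since the real view definitions live behind the links above (the column names below are invented): fence off the selective function call with OFFSET 0, the same trick discussed in the other thread, so the planner evaluates it first and joins its small result outward instead of assembling v_address up front.

```sql
-- Hypothetical rewrite: fence the selective set first, then join outward.
SELECT p.*, a.*
FROM (
    SELECT s.user_id
    FROM fx_get_user_tree_subordinates_by_id(1) AS s(user_id)  -- < 200 rows
    OFFSET 0  -- optimization fence: keeps the planner from reordering this
) sub
JOIN parcel p    ON p.user_id = sub.user_id
JOIN v_address a ON a.id = p.address_id;
```

Whether this helps depends on the actual join keys, but it is a cheap experiment before reaching for cost-parameter changes.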
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
On Fri, Aug 2, 2013 at 2:58 AM, Sergey Burladyan eshkin...@gmail.com wrote: PS: I think my main problem is here: select min(user_id) from items; min - 1 (1 row) Time: 504.520 ms That is a long time, but still 100 fold less than the planner is taking. What about max(user_id)? also, i cannot reindex it concurrently now, because it run autovacuum: VACUUM ANALYZE public.items (to prevent wraparound) That is going to take a long time if you have the cost settings at their defaults. Cheers, Jeff
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Jeff Janes jeff.ja...@gmail.com writes: On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote: If I not mistaken, may be two code paths like this here: (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext Yeah, I think you are correct. mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range. It calls get_variable_range, which only looks at the pg_statistic entries. I think we need to see the actual stack traces, not incomplete versions. It's possible that the situation here involves bloat in pg_statistic, but we're just leaping to conclusions if we assume that that's where the index fetches are occurring. regards, tom lane
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote: I also find this trace for other query: explain select * from xview.user_items_v v where ( v.item_id = 132358330 ); If I not mistaken, may be two code paths like this here: (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext Yeah, I think you are correct. And may be get_actual_variable_range() function is too expensive for call with my bloated table items with bloated index items_user_id_idx on it? But why is it bloated in this way? It must be visiting many thousands of dead/invisible rows before finding the first visible one. But B-tree indexes have a mechanism to remove dead tuples from indexes, so it doesn't follow them over and over again (see kill_prior_tuple). So is that mechanism not working, or are the tuples not dead but just invisible (i.e. inserted by a still open transaction)? Cheers, Jeff
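One way to probe Jeff's "still open transaction" alternative from SQL (this assumes the 9.2-era pg_stat_activity layout, where procpid became pid and current_query became query): long-running transactions hold back the xmin horizon, so deleted tuples stay merely invisible instead of removable-dead, and kill_prior_tuple cannot help.

```sql
-- Look for long-open transactions that keep deleted tuples from being
-- treated as dead (run as superuser to see all sessions).
SELECT pid, state, xact_start, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```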
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Tom Lane t...@sss.pgh.pa.us writes: Jeff Janes jeff.ja...@gmail.com writes: On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote: If I not mistaken, may be two code paths like this here: (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext Yeah, I think you are correct. mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range. It calls get_variable_range, which only looks at the pg_statistic entries. Hmm, I am speaking about 9.2.2, but in current HEAD this call still exists, please see: http://doxygen.postgresql.org/selfuncs_8c_source.html#l02976 I think we need to see the actual stack traces, not incomplete versions. It's possible that the situation here involves bloat in pg_statistic, but we're just leaping to conclusions if we assume that that's where the index fetches are occurring. I found debug symbols and sent the stack trace to the mailing list, but it was blocked by size; trying again with zip. [Attachment: gdb.log.gz — GNU Zip compressed data]
Re: [PERFORM] subselect requires offset 0 for good performance.
On Thu, Aug 1, 2013 at 7:22 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Aug 1, 2013 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: I am running 8.4.15 and can try 8.4.17 if some patch has been applied to it to address this issue. I just want to know should I A: upgrade to 8.4.17 or B: create a self contained test case. A quick look at the release notes shows no planner fixes in 8.4.16 or 8.4.17, so it would be rather surprising if (A) helps. OK. I was doing some initial testing and if I select out the 4 columns into a test table the query runs fast. If I select all the columns into a test table it runs slow, so it appears table width affects this. Will have more to report tomorrow on it. Here's the query:

SELECT *
FROM dba.pp_test_wide p
LEFT JOIN (
    SELECT tree_sortkey
    FROM dba.pp_test_wide
    WHERE tree_sortkey BETWEEN '00010101010001010100'::VARBIT
                           AND public.tree_right('00010101010001010100'::VARBIT)
      AND product_name IS NOT NULL
      AND tree_sortkey <> '00010101010001010100'::VARBIT
) pp ON p.tree_sortkey BETWEEN pp.tree_sortkey AND public.tree_right(pp.tree_sortkey)
WHERE p.tree_sortkey BETWEEN '00010101010001010100'::VARBIT
                         AND public.tree_right('00010101010001010100'::VARBIT)
  AND p.tree_sortkey BETWEEN ''::VARBIT AND public.tree_right(''::VARBIT)
  AND p.deleted_at IS NULL
  AND pp.tree_sortkey IS NULL

I extracted all the data like so: select * into dba.pp_test_wide from original table; and get this query plan from explain analyze: http://explain.depesz.com/s/EPx which takes 20 minutes to run. If I extract it this way: select tree_sortkey, product_name, deleted_at into db.pp_test_3col from original table; I get this plan: http://explain.depesz.com/s/gru which gets a materialize in it, and suddenly takes 106 ms. The factor of performance increase is therefore ~11,342, which is pretty huge. I'll try to make a self contained test case now. Hopefully that at least points in the right direction to a bug of some kind.
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Jeff Janes jeff.ja...@gmail.com writes: On Fri, Aug 2, 2013 at 2:58 AM, Sergey Burladyan eshkin...@gmail.com wrote: PS: I think my main problem is here: select min(user_id) from items; min - 1 (1 row) Time: 504.520 ms That is a long time, but still 100 fold less than the planner is taking. What about max(user_id)? max is good, only rows with user_id = 0 were updated: select max(user_id) from items; Time: 59.646 ms also, i cannot reindex it concurrently now, because it run autovacuum: VACUUM ANALYZE public.items (to prevent wraparound) That is going to take a long time if you have the cost settings at their defaults. Yes, I have custom (slower) settings; it will take about a week. But why is it bloated in this way? Don't know. Many items were updated last week, ~10% of the table. It must be visiting many thousands of dead/invisible rows before finding the first visible one. But B-tree indexes have a mechanism to remove dead tuples from indexes, so it doesn't follow them over and over again (see kill_prior_tuple). So is that mechanism not working, or are the tuples not dead but just invisible (i.e. inserted by a still open transaction)? They are deleted, but VACUUM has still not completed.
BTW, this is a standby server, and its query plan (blocks read) is very different from the master's:

Hot standby:

explain (analyze,verbose,buffers) select min(user_id) from items;
Result (cost=0.12..0.13 rows=1 width=0) (actual time=56064.514..56064.514 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=3694164 read=6591224 written=121652
  InitPlan 1 (returns $0)
    -> Limit (cost=0.00..0.12 rows=1 width=8) (actual time=56064.502..56064.503 rows=1 loops=1)
       Output: public.items.user_id
       Buffers: shared hit=3694164 read=6591224 written=121652
       -> Index Only Scan using items_user_id_idx on public.items (cost=0.00..24165743.48 rows=200673143 width=8) (actual time=56064.499..56064.499 rows=1 loops=1)
          Output: public.items.user_id
          Index Cond: (public.items.user_id IS NOT NULL)
          Heap Fetches: 8256426
          Buffers: shared hit=3694164 read=6591224 written=121652
Total runtime: 56064.571 ms

Master:

Result (cost=0.12..0.13 rows=1 width=0) (actual time=202.759..202.759 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=153577 read=1
  InitPlan 1 (returns $0)
    -> Limit (cost=0.00..0.12 rows=1 width=8) (actual time=202.756..202.757 rows=1 loops=1)
       Output: public.items.user_id
       Buffers: shared hit=153577 read=1
       -> Index Only Scan using items_user_id_idx on public.items (cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 rows=1 loops=1)
          Output: public.items.user_id
          Index Cond: (public.items.user_id IS NOT NULL)
          Heap Fetches: 0
          Buffers: shared hit=153577 read=1
Total runtime: 202.786 ms

And from a backup, before the index/heap bloated :)

Result (cost=0.87..0.88 rows=1 width=0) (actual time=16.002..16.003 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=3 read=4
  InitPlan 1 (returns $0)
    -> Limit (cost=0.00..0.87 rows=1 width=8) (actual time=15.993..15.995 rows=1 loops=1)
       Output: public.items.user_id
       Buffers: shared hit=3 read=4
       -> Index Only Scan using items_user_id_idx on public.items (cost=0.00..169143085.72 rows=193309210 width=8) (actual time=15.987..15.987 rows=1 loops=1)
          Output: public.items.user_id
          Index Cond: (public.items.user_id IS NOT NULL)
          Heap Fetches: 1
          Buffers: shared hit=3 read=4
Total runtime: 16.057 ms
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Tom Lane escribió: Jeff Janes jeff.ja...@gmail.com writes: On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote: If I not mistaken, may be two code paths like this here: (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> get_actual_variable_range -> index_getnext Yeah, I think you are correct. mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range. It calls get_variable_range, which only looks at the pg_statistic entries. Uh? It's right there in line 2976 in HEAD. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] subselect requires offset 0 for good performance.
Scott Marlowe scott.marl...@gmail.com writes: I extracted all the data like so: select * into dba.pp_test_wide from original table; and get this query plan from explain analyze: http://explain.depesz.com/s/EPx which takes 20 minutes to run. If I extract it this way: select tree_sortkey, product_name, deleted_at into db.pp_test_3col from original table; I get this plan: http://explain.depesz.com/s/gru which gets a materialize in it, and suddenly takes 106 ms. There's no reason why suppressing some unrelated columns would change the rowcount estimates, but those two plans show different rowcount estimates. I suspect the *actual* reason for the plan change was that autovacuum had had a chance to update statistics for the one table, and not yet for the other. Please do a manual ANALYZE on both tables and see if there's still a plan difference. regards, tom lane
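For completeness, Tom's suggested check is just the following (table names taken from the messages above; the pg_class query shows whether the planner's row estimates were actually refreshed):

```sql
ANALYZE dba.pp_test_wide;
ANALYZE db.pp_test_3col;

-- reltuples/relpages are the table-level numbers ANALYZE feeds the planner:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('pp_test_wide', 'pp_test_3col');
```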
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane escribió: It calls get_variable_range, which only looks at the pg_statistic entries. Uh? It's right there in line 2976 in HEAD. Meh. You're right, I was thinking of this bit in get_variable_range()

/*
 * XXX It's very tempting to try to use the actual column min and max, if
 * we can get them relatively-cheaply with an index probe.  However, since
 * this function is called many times during join planning, that could
 * have unpleasant effects on planning speed.  Need more investigation
 * before enabling this.
 */
#ifdef NOT_USED
    if (get_actual_variable_range(root, vardata, sortop, min, max))
        return true;
#endif

I think when that was written, we didn't have the code in scalarineqsel that tries to go out and get the actual endpoints from an index. Now that we do, the planning cost impact that I was afraid of here can actually bite us, and it seems that at least for Sergey's case it's pretty bad. Another problem is that we'll end up comparing endpoints gotten from pg_statistic to endpoints gotten from the index, making the resulting numbers at least self-inconsistent and very possibly meaningless. The planner already caches the results of mergejoinscansel in hopes of alleviating its cost, but I wonder if we need another lower-level cache for the min/max values of each variable that participates in a mergejoinable clause. Having said that, it's still not clear why these probes are so expensive in Sergey's case. I favor your idea about lots of dead rows, but we don't have actual proof of it. Maybe pgstattuple could help here? regards, tom lane
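pgstattuple is a contrib module, so it has to be installed before use; a sketch of the checks Tom is alluding to (the CREATE EXTENSION syntax assumes 9.1 or later, and the column picks are just the ones most relevant to the dead-row theory):

```sql
CREATE EXTENSION pgstattuple;

-- Dead-row share on the heap:
SELECT dead_tuple_percent, free_percent
FROM pgstattuple('public.items');

-- Leaf density/fragmentation on the suspect index:
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.items_user_id_idx');
```

Note that both functions scan the whole relation, so on a table of this size they are not free either.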
Re: [PERFORM] subselect requires offset 0 for good performance.
On Fri, Aug 2, 2013 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: I extracted all the data like so: select * into dba.pp_test_wide from original table; and get this query plan from explain analyze: http://explain.depesz.com/s/EPx which takes 20 minutes to run. If I extract it this way: select tree_sortkey, product_name, deleted_at into db.pp_test_3col from original table; I get this plan: http://explain.depesz.com/s/gru which gets a materialize in it, and suddenly takes 106 ms. There's no reason why suppressing some unrelated columns would change the rowcount estimates, but those two plans show different rowcount estimates. I suspect the *actual* reason for the plan change was that autovacuum had had a chance to update statistics for the one table, and not yet for the other. Please do a manual ANALYZE on both tables and see if there's still a plan difference. Interesting. I ran analyze on both tables and sure enough the new test table runs fast. Ran analyze on the old table and it runs slow. The only thing the old table and its plan are missing is the materialize. So what is likely to change from the old table to the new one? Here's the explain analyze output from the old table and the same query against it: http://explain.depesz.com/s/CtZ and here's the plan with offset 0 in it: http://explain.depesz.com/s/Gug note that while the estimates are a bit off, the really huge difference here says to me some suboptimal method is getting deployed in the background somewhere. Do we need a stack trace?
Re: [PERFORM] subselect requires offset 0 for good performance.
On Fri, Aug 2, 2013 at 1:58 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Fri, Aug 2, 2013 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: I extracted all the data like so: select * into dba.pp_test_wide from original table; and get this query plan from explain analyze: http://explain.depesz.com/s/EPx which takes 20 minutes to run. If I extract it this way: select tree_sortkey, product_name, deleted_at into db.pp_test_3col from original table; I get this plan: http://explain.depesz.com/s/gru which gets a materialize in it, and suddenly takes 106 ms. There's no reason why suppressing some unrelated columns would change the rowcount estimates, but those two plans show different rowcount estimates. I suspect the *actual* reason for the plan change was that autovacuum had had a chance to update statistics for the one table, and not yet for the other. Please do a manual ANALYZE on both tables and see if there's still a plan difference. Interesting. I ran analyze on both tables and sure enough the new test table runs fast. Ran analyze on the old table and it runs slow. The only thing the old table and its plan are missing is the materialize. So what is likely to change from the old table to the new one? Here's the explain analyze output from the old table and the same query against it: http://explain.depesz.com/s/CtZ and here's the plan with offset 0 in it: http://explain.depesz.com/s/Gug note that while the estimates are a bit off, the really huge difference here says to me some suboptimal method is getting deployed in the background somewhere. Do we need a stack trace? So as a followup. I ran vacuum verbose analyze on the original table, thinking it might be bloated but it wasn't. Out of 320k or so rows there were 4k dead tuples recovered, and none that it couldn't recover. So now I'm trying to recreate the original table with a select into with an order by random() on the end. 
Nope it gets a materialize in it and runs fast. Well it's danged hard to make a test case when copying the table with random ordering results in a much faster query against the same data. I'm at a loss on how to reproduce this. Are the indexes on the master table leading it astray maybe? Yep. Added the indexes and performance went right into the dumper. New plan on new table with old data added in random order now looks like the old table, only worse because it's on a slower drive. Just to be complete here's the plan: http://explain.depesz.com/s/PYH Note that I created new table with order by random() and created indexes. Ran analyze on it, and the select plan looks similar now: http://explain.depesz.com/s/bsE So maybe I can make a test case now. But to summarize, when it can use indexes this query gets REAL slow because it lacks a materialize step. That seem about right?
Re: [PERFORM] subselect requires offset 0 for good performance.
Scott Marlowe scott.marl...@gmail.com writes: Yep. Added the indexes and performance went right into the dumper. New plan on new table with old data added in random order now looks like the old table, only worse because it's on a slower drive. Just to be complete here's the plan: http://explain.depesz.com/s/PYH Note that I created new table with order by random() and created indexes. Ran analyze on it, and the select plan looks similar now: http://explain.depesz.com/s/bsE So maybe I can make a test case now. But to summarize, when it can use indexes this query gets REAL slow because it lacks a materialize step. That seem about right? Well, the plans shown here could *not* use a materialize step because the inner scan makes use of a value from the current outer row. The materialized plan has to omit one of the index conditions from the inner scan and then apply it as a join condition. I suspect the real reason that the fast case is fast is that the inner relation, even without the p.tree_sortkey = pro_partners.tree_sortkey condition, is empty, and thus the join runs very quickly. But the planner doesn't know that. Its estimate of the row count isn't very large, but it's definitely not zero, plus it thinks that adding the additional index condition reduces the rowcount by a factor of 3 from there. So it comes to the wrong conclusion about the value of materializing a fixed inner relation as opposed to using a parameterized indexscan. Have you tried increasing the statistics targets for these join columns? It's also possible that what you need to do is adjust the planner's cost parameters ... regards, tom lane
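Tom's first suggestion as a concrete statement (the target value and column choice are illustrative; 100 is the default and 10000 the maximum, and the new target only takes effect at the next ANALYZE):

```sql
-- Per-column statistics target for one of the join columns:
ALTER TABLE dba.pp_test_wide ALTER COLUMN tree_sortkey SET STATISTICS 1000;
ANALYZE dba.pp_test_wide;
```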
Re: [PERFORM] to many locks held
Jeison Bedoya jeis...@audifarma.com.co wrote: memory ram: 128 GB cores: 32 max_connections: 900 temp_buffers = 512MB In addition to the other comments, be aware that temp_buffers is the limit of how much RAM *each connection* can acquire to avoid writing temporary table data to disk. Once allocated to a connection, it will be reserved for that use on that connection until the connection closes. So temp_buffers could lock down 450 GB of RAM even while all connections are idle. If the maximum connections become active, and they average one work_mem allocation apiece, that's an *additional* 900 GB of RAM which would be needed to avoid problems. Reducing connections through a pooler is strongly indicated, and you may still need to reduce work_mem or temp_buffers. http://wiki.postgresql.org/wiki/Number_Of_Database_Connections -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
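The 450 GB figure is just max_connections times temp_buffers, and can be reproduced directly in SQL:

```sql
-- Worst case: 900 connections x 512MB temp_buffers each.
SELECT pg_size_pretty(900::bigint * 512 * 1024 * 1024) AS temp_buffers_worst_case;
-- 450 GB
```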
Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds
Sergey Burladyan eshkin...@gmail.com writes: Hot standby: ... ' -> Index Only Scan using items_user_id_idx on public.items (cost=0.00..24165743.48 rows=200673143 width=8) (actual time=56064.499..56064.499 rows=1 loops=1)' 'Output: public.items.user_id' 'Index Cond: (public.items.user_id IS NOT NULL)' 'Heap Fetches: 8256426' 'Buffers: shared hit=3694164 read=6591224 written=121652' 'Total runtime: 56064.571 ms' Master: ... ' -> Index Only Scan using items_user_id_idx on public.items (cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 rows=1 loops=1)' 'Output: public.items.user_id' 'Index Cond: (public.items.user_id IS NOT NULL)' 'Heap Fetches: 0' 'Buffers: shared hit=153577 read=1' 'Total runtime: 202.786 ms' Looks like the visibility map is not replicated to the slave somehow? If it matters, the master was restarted yesterday; the standby was not.
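One quick sanity check of this theory (pg_relation_size's fork argument is available on 9.0+): compare the size of the visibility-map fork on both servers. An empty or much smaller 'vm' fork on the standby would explain the huge Heap Fetches count, since index-only scans must visit the heap for every page not marked all-visible.

```sql
-- Run on both master and standby and compare:
SELECT pg_size_pretty(pg_relation_size('public.items', 'vm'))   AS vm_fork,
       pg_size_pretty(pg_relation_size('public.items', 'main')) AS heap;
```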
Re: [PERFORM] subselect requires offset 0 for good performance.
On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: Yep. Added the indexes and performance went right into the dumper. New plan on new table with old data added in random order now looks like the old table, only worse because it's on a slower drive. Just to be complete here's the plan: http://explain.depesz.com/s/PYH Note that I created new table with order by random() and created indexes. Ran analyze on it, and the select plan looks similar now: http://explain.depesz.com/s/bsE So maybe I can make a test case now. But to summarize, when it can use indexes this query gets REAL slow because it lacks a materialize step. That seem about right? Well, the plans shown here could *not* use a materialize step because the inner scan makes use of a value from the current outer row. The materialized plan has to omit one of the index conditions from the inner scan and then apply it as a join condition. I suspect the real reason that the fast case is fast is that the inner relation, even without the p.tree_sortkey = pro_partners.tree_sortkey condition, is empty, and thus the join runs very quickly. But the planner doesn't know that. Its estimate of the row count isn't very large, but it's definitely not zero, plus it thinks that adding the additional index condition reduces the rowcount by a factor of 3 from there. So it comes to the wrong conclusion about the value of materializing a fixed inner relation as opposed to using a parameterized indexscan. Have you tried increasing the statistics targets for these join columns? It's also possible that what you need to do is adjust the planner's cost parameters ... I've tried changing random_page_cost, sequential_page_cost, the cpu* costs, and setting effective_cache_size all over the place and it stays just as slow. our default stats target is 100. Did a stats target = 1000 on the three cols we access. Same terrible performance. 
Plan here: http://explain.depesz.com/s/XVt stats target=1, same bad performance, plan: http://explain.depesz.com/s/kJ54 pretty much the same. Setting effective_cache_size='1000GB' makes no difference, still slow. Setting random_page_cost to 75 makes it work, i.e. a materialize shows up. Note that we run on FusionIO cards, and the whole db fits in memory, so a very large effective cache size and a random page cost of 1.0 are actually accurate for our hardware.
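Cost-parameter experiments like these don't need a server restart or a postgresql.conf edit; they can be scoped to a single session or transaction, which keeps the production settings untouched:

```sql
BEGIN;
SET LOCAL random_page_cost = 75;  -- the value that produced the materialize
EXPLAIN ANALYZE SELECT 1;         -- substitute the query under test
ROLLBACK;                         -- SET LOCAL reverts with the transaction
```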