Re: [PERFORM] CREATE STATISTICS and join selectivity
On 16 November 2017 at 09:19, Justin Pryzby <pry...@telsasoft.com> wrote:
> I just noticed that PG10 CREATE STATISTICS (dependencies) doesn't seem to work
> for joins on multiple columns; is that right?

Unfortunately, for now, they're not used for join selectivity estimates, only for the base rel selectivities. That's all there was time for with PG10. This is highly likely to be improved sometime in the future.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
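As a sketch of the distinction (hypothetical table and data; exact estimates will vary), dependency statistics correct the estimate for correlated quals on a single table, but as of PG10 they are not consulted when estimating a join on the same columns:

```sql
-- Hypothetical example: "warehouse" is functionally dependent on "region".
CREATE TABLE orders (region int, warehouse int);
INSERT INTO orders
  SELECT i % 10, i % 10 FROM generate_series(1, 100000) i;

CREATE STATISTICS orders_dep (dependencies) ON region, warehouse FROM orders;
ANALYZE orders;

-- Base-rel estimate: the dependency statistics stop the planner from
-- multiplying the two selectivities as if the columns were independent.
EXPLAIN SELECT * FROM orders WHERE region = 1 AND warehouse = 1;

-- Join estimate: only per-column statistics are used, so a multi-column
-- join like this one gets no benefit from the extended statistics.
EXPLAIN SELECT * FROM orders o1
  JOIN orders o2 ON o1.region = o2.region AND o1.warehouse = o2.warehouse;
```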
Re: [PERFORM] Unnecessary DISTINCT while primary key in SQL
On 5 November 2017 at 04:20, 刘瑞 <whx20...@gmail.com> wrote:
> CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
> INSERT into test_tbl select generate_series(1,1000), 'test';
>
> SQL with DISTINCT:
> test=# explain analyze select distinct col, k from test_tbl order by k limit 1000;
>                                  QUERY PLAN
> --------------------------------------------------------------------------------
>  Limit  (cost=1277683.22..1277690.72 rows=1000 width=36) (actual time=12697.994..12698.382 rows=1000 loops=1)
>    ->  Unique  (cost=1277683.22..1329170.61 rows=6864985 width=36) (actual time=12697.992..12698.311 rows=1000 loops=1)
>          ->  Sort  (cost=1277683.22..1294845.68 rows=6864985 width=36) (actual time=12697.991..12698.107 rows=1000 loops=1)
>                Sort Key: k, col
>                Sort Method: external sort  Disk: 215064kB
>                ->  Seq Scan on test_tbl  (cost=0.00..122704.85 rows=6864985 width=36) (actual time=0.809..7561.215 rows=1000 loops=1)
>  Planning time: 2.368 ms
>  Execution time: 12728.471 ms
> (8 rows)

The current planner does not make much of an effort to record which columns remain distinct at each level. I have ideas on how to improve this, and that would include improving your case here.

9.6 did improve a slight variation of your query, but this was for GROUP BY instead of DISTINCT. There's probably no reason why the same optimisation could not be applied to DISTINCT; I just didn't think of it when writing the patch.

The item from the release notes [1] reads:

"Ignore GROUP BY columns that are functionally dependent on other columns"

So, if you were to write the query as:

explain analyze select col, k from test_tbl group by col, k order by k limit 1000;

It should run much more quickly, although still not as optimal as it could be.

[1] https://www.postgresql.org/docs/9.6/static/release-9-6.html

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Cheaper subquery scan not considered unless offset 0
On 30 October 2017 at 00:24, Benjamin Coutu <ben.co...@zeyos.com> wrote:
> ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..1.05 rows=1
> width=332) (actual time=0.003..0.003 rows=1 loops=15521 total=46.563)

I've never seen EXPLAIN output like that before. Is this some modified version of PostgreSQL?

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] How does max_parallel_workers_per_gather change load averages?
On 4 October 2017 at 08:48, Ben Nachtrieb wrote:
> I have 2 cores and my max_parallel_workers_per_gather = 2 and
> max_worker_processes = 8, but my load averages are between 8 and 5 with
> scheduled at 1/189 to 5/195. Are these so high because I increased
> max_parallel_workers_per_gather? My understanding is that if my load
> averages are greater than my number of cores the system is overloaded.
> Should I think about it differently once I increase
> max_parallel_workers_per_gather? How should I think about it?

Parallel query is not 100% efficient. For example, adding twice the CPU will, in theory, never quite double the performance; there's always some overhead to this. It's really only useful on systems with spare CPU cycles to perform this extra work. You don't seem to have much to spare, so you may get along better if you disable parallel query.
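If disabling parallel query is the route taken, a minimal sketch might look like this (try it per-session first, then apply it instance-wide only if it helps):

```sql
-- Disable parallel query for the current session only:
SET max_parallel_workers_per_gather = 0;

-- If that helps, apply it instance-wide and reload the configuration:
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();

-- Verify the effective setting:
SHOW max_parallel_workers_per_gather;
```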
Re: [PERFORM] query of partitioned object doesnt use index in qa
On 21 September 2017 at 04:15, Mike Broers <mbro...@gmail.com> wrote:
> Ultimately I think this is just highlighting the need in my environment to
> set random_page_cost lower (we are on an SSD SAN anyway..), but I dont think
> I have a satisfactory reason why the row estimates are so bad in the QA
> planner and why it doesnt use that partition index there.

Without the index there are no stats to allow the planner to perform a good estimate on "e.body->>'SID' is not null", so it applies a default of 99.5%.

So, as a simple example, if you have a partition with 1 million rows and you apply 99.5% to that, you get 995000 rows. Now if you add the selectivity for "e.validation_status_code = 'P'", let's say that's 50%, the row estimate for the entire WHERE clause would be 497500 (1000000 * 0.995 * 0.5).

Since the 99.5% is applied in both cases, the only variable part is validation_status_code. Perhaps validation_status_code = 'P' is much more common in QA than in production. You can look at the stats as gathered by ANALYZE with:

\x on
select * from pg_stats where tablename = 'event__' and attname = 'validation_status_code';
\x off

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] query of partitioned object doesnt use index in qa
On 14 September 2017 at 08:28, Mike Broers <mbro...@gmail.com> wrote:
> I have a query of a partitioned table that uses the partition index in
> production but uses sequence scans in qa. The only major difference I can
> tell is the partitions are much smaller in qa. In production the partitions
> range in size from around 25 million rows to around 60 million rows, in QA
> the partitions are between 4 and 12 million rows. I would think this would
> be big enough to get the planner to prefer the index but this is the major
> difference between the two databases as far as I can tell.
>
> QA:
> ->  Seq Scan on event__ e_1  (cost=0.00..2527918.06 rows=11457484 width=782)
>
> Production:
> ->  Index Scan using ix_event__00011162_landing_id on event__00011162 e_1  (cost=0.56..15476.59 rows=23400 width=572)

If QA has between 4 and 12 million rows, then the planner's estimate of 11457484 matching rows means a Seq Scan is likely best here. If those estimates are off then it might be worth double-checking that your nightly analyze is working correctly on QA.

The planner may be able to be coaxed into using the index with a higher effective_cache_size and/or a lower random_page_cost setting, although you really should be looking at those row estimates first. Showing us the EXPLAIN ANALYZE would have been much more useful, so that we could have seen whether those estimates are accurate or not.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
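As a hedged sketch of how those settings can be tried per-session before changing anything globally (the values and the query below are illustrative, not recommendations):

```sql
-- Experiment in one session only; nothing here persists.
SET effective_cache_size = '48GB';  -- illustrative value
SET random_page_cost = 1.1;         -- closer to SSD behaviour than the 4.0 default

-- Re-run the problem query and compare the chosen plan; this query is a
-- hypothetical stand-in for the real one.
EXPLAIN ANALYZE
SELECT count(*) FROM event__ WHERE landing_id = 12345;

RESET effective_cache_size;
RESET random_page_cost;
```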
Re: [PERFORM] Query plan for views and WHERE clauses, Luke is not using the index
On 19 August 2017 at 04:46, kimaidou <kimai...@gmail.com> wrote:
> When we call the WHERE on the view:
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
> SELECT *
> FROM "qgep"."vw_qgep_reach"
> WHERE "progression_geometry" &&
> st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)
>
> The query plan is "wrong", as PostgreSQL seems to consider it should do a
> seq scan on the tables, and only afterwards filter with the WHERE:
> https://explain.depesz.com/s/wXV
>
> The query takes about 1 second instead of less than 100ms.
>
> Do you have any hint on this kind of issue ?

This is by design due to the DISTINCT ON() clause. Only quals which filter columns that are in the DISTINCT ON can be safely pushed down. Consider the following, where I've manually pushed the WHERE clause:

postgres=# create table tt (a int, b int);
CREATE TABLE
postgres=# create index on tt (a);
CREATE INDEX
postgres=# insert into tt values(1,1),(1,2),(2,1),(2,2);
INSERT 0 4
postgres=# select * from (select distinct on (a) a,b from tt order by a,b) tt where b = 2;
 a | b
---+---
(0 rows)

postgres=# select * from (select distinct on (a) a,b from tt where b = 2 order by a,b) tt;
 a | b
---+---
 1 | 2
 2 | 2
(2 rows)

Note the results are not the same. If I'd done WHERE a = 2, then the planner would have pushed the qual down into the subquery.

More reading in check_output_expressions() in allpaths.c:

    /* If subquery uses DISTINCT ON, check point 3 */
    if (subquery->hasDistinctOn &&
        !targetIsInSortList(tle, InvalidOid, subquery->distinctClause))
    {
        /* non-DISTINCT column, so mark it unsafe */
        safetyInfo->unsafeColumns[tle->resno] = true;
        continue;
    }

The comment for point 3 reads:

 * 3. If the subquery uses DISTINCT ON, we must not push down any quals that
 * refer to non-DISTINCT output columns, because that could change the set
 * of rows returned.  (This condition is vacuous for DISTINCT, because then
 * there are no non-DISTINCT output columns, so we needn't check.  Note that
 * subquery_is_pushdown_safe already reported that we can't use volatile
 * quals if there's DISTINCT or DISTINCT ON.)

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Inefficient max query when using group by
On 4 May 2017 at 22:52, <jesse.hieta...@vaisala.com> wrote:
> I have a performance problem with my query. As a simplified example, I have
> a table called Book, which has three columns: id, released (timestamp) and
> author_id. I have a need to search for the latest books released by multiple
> authors, at a specific point in the history. This could be the latest book
> between the beginning of time and now, or the latest book released last year,
> etc. In other words, only the latest book for each author, in a specific time
> window. I also have a combined index for the released and author_id columns.
>
> First, I tried a simple query that selects the maximum value of released and
> the author_id, grouped by the author_id (then later do a join by these
> author_id, released columns to get the whole rows). Performance of this
> query is pretty bad (execution time around 250-300ms for five authors). See
> query and query plan in the link below:
>
> https://gist.github.com/jehie/ca9fac16b6e3c19612d815446a0e1bc0
>
> The execution time seems to grow linearly when the number of author_ids
> increases (50ms per author_id). I don't completely understand why it takes so
> long for this query to execute and why it does not use the directional index
> scan?
>
> I also tried a second query using LIMIT (where I can only ask for one
> author_id at a time, so cannot use this directly when searching for books of
> multiple authors), which performs nicely (0.2ms):
>
> https://gist.github.com/jehie/284e7852089f6debe22e05c63e73027f
>
> So, any ideas how to make multiple-author lookups (like in the first query)
> perform better? Or any other ideas?

Yes, you could sidestep the whole issue by using a LATERAL join. Something like:

EXPLAIN ANALYZE
SELECT b.released, b.author_id
FROM (VALUES('1'),('2'),('3'),('4'),('5')) a (author_id)
CROSS JOIN LATERAL (SELECT released, author_id
                    FROM book
                    WHERE author_id = a.author_id
                      AND released <= to_timestamp(2e9)
                      AND released >= to_timestamp(0)
                    ORDER BY released DESC
                    LIMIT 1) b;

or you could write a function which just runs that query.

Although, with the above or the function method, if you give this enough authors, then it'll eventually become slower than the problem query. Perhaps if you know the number of authors will not be too great, then you'll be ok.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Slow query with 3 table joins
On 26 April 2017 at 15:19, Alessandro Ferrucci <alessandroferru...@gmail.com> wrote:
> After about 40 minutes the slow query finally finished and the result of the
> EXPLAIN plan can be found here:
>
> https://explain.depesz.com/s/BX22
>
> Index Scan using field_unit_id_idx on field  (cost=0.00..8746678.52
> rows=850149 width=8) (actual time=0.030..2414345.998 rows=10315 loops=1)

This estimate seems a long way off. Are the stats up-to-date on the table? Try again after running:

ANALYZE field;

It might also be a good idea to ANALYZE all the tables. Is auto-vacuum switched on?

The plan in question would work better if you created an index on field (field_name, unit_id); but I think if you update the stats the plan will switch. A Hash Join, hashing "unit" and index scanning on field_field_name_idx, would have been a much smarter plan choice for the planner to make.

Also, how many distinct field_names are there?

SELECT COUNT(DISTINCT field_name) FROM field;

You may want to increase the histogram buckets on that column if there are more than 100 field names and the number of rows with each field name is highly variable:

ALTER TABLE field ALTER COLUMN field_name SET STATISTICS <n>;

100 is the default, and 10000 is the maximum.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Insert Concurrency
On 18 April 2017 at 14:55, ROBERT PRICE <rprice...@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

Have you tested performance and noticed that it is insufficient for your needs? Or do you just assume PostgreSQL suffers from the same issue as Oracle in regards to INSERT contention on a single table?

You may like to look at pgbench [1] to test the performance if you've not done so already.

[1] https://www.postgresql.org/docs/9.6/static/pgbench.html

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
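A hedged sketch of measuring concurrent single-row insert throughput with pgbench and a custom script (the table, column, and database names are illustrative, not from the original post):

```shell
# Hypothetical single-row insert workload; adjust to match the real table.
cat > insert_test.sql <<'EOF'
INSERT INTO messages (payload) VALUES ('test message');
EOF

# -n skips vacuuming pgbench's own tables, -c is the client count,
# -j the worker thread count, -T the duration in seconds.
pgbench -n -c 50 -j 4 -T 60 -f insert_test.sql mydb
```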
Re: [PERFORM] Fast HashJoin only after a cluster/recreate table
>                    Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>                    Filter: (es09tipdoc ~~ '%'::text)
>                    Buffers: shared hit=6706
>              ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual time=0.592..2.206 rows=2919 loops=1)
>                    Sort Key: t2.es08tipdoc
>                    Sort Method: quicksort  Memory: 67kB
>                    Buffers: shared hit=7
>                    ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547 width=32) (actual time=0.003..0.126 rows=547 loops=1)
>                          Buffers: shared hit=7
>        ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23) (actual time=0.027..68577.800 rows=993087854 loops=1)
>              Buffers: shared hit=75342
>              ->  GroupAggregate  (cost=0.56..278693.28 rows=716126 width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
>                    Group Key: es09t1.es09codemp, es09t1.es09tipdoc, es09t1.es09numdoc
>                    Buffers: shared hit=75342
>                    ->  Index Only Scan using es09t1_pkey on es09t1  (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031 rows=7160921 loops=1)
>                          Index Cond: (es09codemp = 1)
>                          Heap Fetches: 51499
>                          Buffers: shared hit=75342
> Planning time: 50.129 ms
> Execution time: 380419.435 ms

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Merge joins on index scans
On 27 February 2016 at 11:07, James Parks <james.pa...@meraki.net> wrote:
>
> CREATE TABLE a (id bigint primary key, nonce bigint);
> CREATE TABLE b (id bigint primary key, a_id bigint not null);
> CREATE INDEX a_idx ON b (a_id);
>
> The query:
>
> SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY b.id ASC;
>
> (skip down to [1] and [2] to see the query performance)
>
> What I know:
>
> If you force the query planner to use a merge join on the above query, it
> takes 10+ minutes to complete using the data as per below. If you force the
> query planner to use a hash join on the same data, it takes ~200
> milliseconds.

I believe I know what is going on here, but can you please test:

SELECT b.* FROM b
WHERE EXISTS (SELECT 1 FROM a WHERE b.a_id = a.id AND a.nonce = ?)
ORDER BY b.id ASC;

using the merge join plan.

If this performs much better, then the problem is due to the merge join mark/restore causing the join to have to transition through many tuples which don't match the a.nonce = ? predicate. The mark and restore is not required for the rewritten query, as this uses a semi join rather than a regular inner join. With the semi join the executor knows that it's only meant to be matching a single tuple in "a", so once the first match is found it can move to the next row in the outer relation without having to restore the scan back to where it started matching that inner row again.

If I'm right, to get around the problem you could:

create index on a (nonce, id);

If such an index is out of the question then a patch has been submitted for review which should fix this problem in (hopefully) either 9.6 or 9.7:

https://commitfest.postgresql.org/9/129/

If you have a test environment handy, it would be nice if you could test the patch on the current git head to see if this fixes your problem. The findings would be quite interesting for me. Please note this patch is for test environments only at this stage, not for production use.
--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] PostgreSQL seems to create inefficient plans in simple conditional joins
On 31 January 2016 at 01:30, Hedayat Vatankhah <hedayat@gmail.com> wrote:
> Personally, I expect both queries below to perform exactly the same:
>
> SELECT t1.id, *
> FROM t1
> INNER JOIN t2 ON t1.id = t2.id
> where t1.id > -9223372036513411363;
>
> And:
>
> SELECT t1.id, *
> FROM t1
> INNER JOIN t2 ON t1.id = t2.id
> where t1.id > -9223372036513411363 and t2.id > -9223372036513411363;
>
> Unfortunately, they do not. PostgreSQL creates different plans for these
> queries, which results in very poor performance for the first one compared
> to the second (What I'm testing against is a DB with around 350 million
> rows in t1, and slightly less in t2).
>
> EXPLAIN output:
> First query: http://explain.depesz.com/s/uauk
> Second query: http://explain.depesz.com/s/uQd

Yes, unfortunately you've done about the only thing that you can do, and that's just include both conditions in the query. Is there some special reason why you can't just write the t2.id > ... condition in the query too? Or is the query generated dynamically by some software that you have no control over?

I'd personally quite like to see improvements in this area, and I even wrote a patch [1] which fixes this problem. The problem I had when proposing the fix was that I was unable to report details about how many people are hit by this planner limitation. The patch I proposed caused a very small impact on planning time for many queries, and was thought by many not to apply in enough cases for it to be worth slowing down queries which cannot possibly benefit. Of course I agree with this; I've no interest in slowing down planning on queries, but at the same time I understand the annoyance of the poor optimisation in this area.

Although please remember the patch I proposed was merely a first draft proposal. Not for production use.
[1] http://www.postgresql.org/message-id/flat/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com#cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] PostgreSQL seems to create inefficient plans in simple conditional joins
On 31 January 2016 at 06:14, Vitalii Tymchyshyn <v...@tym.im> wrote:
> It may be more for -hackers, but I often hear "this won't be used because of
> planning time increase". Now as I know we have statistics on real query time
> after few runs that is used to decide if plan should be switched.
> Can this statistics be used to apply advanced planning features for
> relatively long running queries? E.g. a parameter like
> sophisticated_planning_l1_threshold=500ms. If query runs over this
> threshold, replan it with more sophisticated features taking few more
> millis. Possibly different levels can be introduced. Also allow to set
> threshold to 0, saying "apply to all queries right away".
> Another good option is to threshold against cumulative query time. E.g. if
> there was 1 runs 0.5 millis each, it may be beneficial to spend few
> millis to get 0.2 millis each.

I agree with you. I recently was working with long running queries on a large 3TB database. I discovered a new optimisation was possible, and wrote a patch to implement it. On testing, the extra work which the optimiser performed took 7 microseconds, and this saved 6 hours of execution time. Now, I've never been much of an investor in my life, but a 3 billion times return on an investment seems quite favourable. Of course, that's quite an extreme case, but it's hard to ignore that the benefit is still significant in less extreme cases.

The idea you've mentioned here is very similar to what I brought up at the developer meeting a few days ago; see the AOB section in [1]. Unfortunately I didn't really get many of the correct people on my side with it, and some wanted examples of specific patches, which is completely not what I wanted to talk about. I was more aiming for some agreement on generic infrastructure to do exactly as you describe.
[1] https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Query order of magnitude slower with slightly different where clause
> Execution time: 5459.461 ms
>
> Please let me know if there is any more info I can provide to help figure
> out why it's choosing an undesirable plan with just a slight change in the
> clause.

Hi Adam,

This is fairly simple to explain. The reason you see better performance with the single claim_id is that IN() clauses with a single item are converted to a single equality expression. For example (just using system tables so you can try this too, without having to create any special tables):

# explain select * from pg_class where oid in(1);
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29 rows=1 width=219)
   Index Cond: (oid = '1'::oid)

We get an index scan with the index condition: oid = 1. If we have 2 items, then we don't get this:

# explain select * from pg_class where oid in(1,2);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on pg_class  (cost=8.56..14.03 rows=2 width=219)
   Recheck Cond: (oid = ANY ('{1,2}'::oid[]))
   ->  Bitmap Index Scan on pg_class_oid_index  (cost=0.00..8.56 rows=2 width=0)
         Index Cond: (oid = ANY ('{1,2}'::oid[]))
(4 rows)

Now I also need to explain that PostgreSQL will currently push ONLY equality expressions into other relations. For example, if we write:

# explain select * from pg_class pc inner join pg_attribute pa on pc.oid = pa.attrelid where pc.oid in(1);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.55..22.63 rows=4 width=422)
   ->  Index Scan using pg_class_oid_index on pg_class pc  (cost=0.27..8.29 rows=1 width=223)
         Index Cond: (oid = '1'::oid)
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute pa  (cost=0.28..14.30 rows=4 width=203)
         Index Cond: (attrelid = '1'::oid)
(5 rows)

You can see that I only put pg_class.oid = 1 in the query, but internally the query planner also added the pg_attribute.attrelid = 1. It was able to do this due to the join condition dictating that pc.oid = pa.attrelid, therefore this will always be equal, and since pc.oid = 1, then pa.attrelid must also be 1.
If we have 2 items in the IN() clause, then this no longer happens:

# explain select * from pg_class pc inner join pg_attribute pa on pc.oid = pa.attrelid where pc.oid in(1,2);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=8.84..54.84 rows=15 width=422)
   ->  Bitmap Heap Scan on pg_class pc  (cost=8.56..14.03 rows=2 width=223)
         Recheck Cond: (oid = ANY ('{1,2}'::oid[]))
         ->  Bitmap Index Scan on pg_class_oid_index  (cost=0.00..8.56 rows=2 width=0)
               Index Cond: (oid = ANY ('{1,2}'::oid[]))
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute pa  (cost=0.28..20.33 rows=8 width=203)
         Index Cond: (attrelid = pc.oid)
(7 rows)

In your case the claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid was pushed down into the subqueries, thus giving them less work to do, and also the flexibility of using indexes on claim_id in the tables contained within the subqueries.

PostgreSQL currently does not push any inequality predicates down at all. A few months ago I did a little bit of work to try and lift this restriction, although I only made it cover the >=, >, < and <= operators as a first measure. Details here:

http://www.postgresql.org/message-id/flat/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com#cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com

If you didn't have the VIEW, you could manually push these predicates into each subquery. However this is not really possible to do with the VIEW. Perhaps something could be done with a function and using dynamic SQL to craft a query manually, or you could just get rid of the view and have the application build the query. If that's not an option then maybe you could respond to the thread above to mention that you've been hit by this problem and would +1 some solution to fix it, and perhaps cross-link to this thread.
I did have a little bit of a hard time convincing people in the above thread that this is in fact a fairly common problem, so it would be nice to see people who have hit it respond there.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Why is now()::date so much faster than current_date
On 17 November 2015 at 21:49, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> Hello,
>
> I stumbled over this answer: http://stackoverflow.com/a/9717125/330315
> and this sounded quite strange to me.
>
> So I ran this on my Windows laptop with Postgres 9.4.5, 64bit and indeed
> now()::date is much faster than current_date:
>
> explain analyze
> select current_date
> from generate_series (1, 100);
>
> Function Scan on generate_series  (cost=0.00..6.00 rows=1000 width=0)
> (actual time=243.878..1451.839 rows=100 loops=1)
> Planning time: 0.047 ms
> Execution time: 1517.881 ms
>
> And:
>
> explain analyze
> select now()::date
> from generate_series (1, 100);
>
> Function Scan on generate_series  (cost=0.00..6.00 rows=1000 width=0)
> (actual time=244.491..785.819 rows=100 loops=1)
> Planning time: 0.037 ms
> Execution time: 826.612 ms

The key to this is in the EXPLAIN VERBOSE output:

postgres=# explain verbose select current_date;
                 QUERY PLAN
--------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
   Output: ('now'::cstring)::date
(2 rows)

You can see that the implementation of current_date requires using the date_in() function as well as the date_out() function. date_in() parses the 'now' string, then the resulting date is converted back into a date string with date_out(). Using now()::date does not have to parse any date strings; it just needs to call date_out() to give the final output.

The reason for this is likely best explained by the comment in gram.y:

 /*
  * Translate as "'now'::text::date".
  *
  * We cannot use "'now'::date" because coerce_type() will
  * immediately reduce that to a constant representing
  * today's date.  We need to delay the conversion until
  * runtime, else the wrong things will happen when
  * CURRENT_DATE is used in a column default value or rule.
  *
  * This could be simplified if we had a way to generate
  * an expression tree representing runtime application
  * of type-input conversion functions.  (As of PG 7.3
  * that is actually possible, but not clear that we want
  * to rely on it.)
  *
  * The token location is attached to the run-time
  * typecast, not to the Const, for the convenience of
  * pg_stat_statements (which doesn't want these constructs
  * to appear to be replaceable constants).
  */

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Having some problems with concurrent COPY commands
On 14 October 2015 at 08:33, Shaun Thomas <bonesmo...@gmail.com> wrote:
> On Tue, Oct 13, 2015 at 7:23 AM, Andres Freund <and...@anarazel.de> wrote:
>> and send the results.
>
> Whelp, I'm an idiot. I can't account for how I did it, but I can only
> assume I didn't export my ports in the tests properly. I ran
> everything again and there's a marked difference between 9.3 and 9.4.
> The parallel copy times still inflate, but only from 1.4s to 2.5s at 4
> procs. Though it gets a bit dicey after that.

Do the times still inflate in the same way if you perform the COPY before adding the indexes to the table?

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Slow HashAggregate/cache access
On 6 August 2015 at 07:55, Andreas Joseph Krogh <andr...@visena.com> wrote:
> On Wednesday 5 August 2015 at 20:25:25, Maxim Boguk <maxim.bo...@gmail.com> wrote:
>> [snip]
>> I think I know where the issue is. The PostgreSQL planner is unable to pass
>> join conditions into a subquery with aggregate functions (it's a well-known
>> limitation).
>> [snip]
>
> I'm curious; will 9.5 help here, as it has WHERE clause pushdown in
> subqueries with window functions?
> http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-highlight-where-pushdown-with-window-function/

I've not looked at the query in any detail, but that particular patch won't help, as it only allows pushdown of a predicate into subqueries with window functions where the predicate is part of all of the subquery's PARTITION BY clauses. The query in question has no window clauses, so qual pushdown is not disabled for that reason.

Regards

David Rowley

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Slow HashAggregate/cache access
On 6 August 2015 at 06:25, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> On Wed, Aug 5, 2015 at 11:41 AM, Alexandre de Arruda Paes <adald...@gmail.com> wrote:
>> Hi,
>>
>> First, sorry to compare Post with other database system, but I know
>> nothing about Oracle... This customer has an application made with a
>> framework that generates the SQL statements (so, we can't make any
>> query optimizations). We did the following tests:
>>
>> 1) Postgresql 9.3 and Oracle 10 in a desktop machine (8 GB RAM, 1 SATA disk, Core i5)
>> 2) Postgresql 9.3 in a server + FC storage (128 GB RAM, Xeon 32 cores, SAS disks)
>
> I think I know where the issue is. The PostgreSQL planner is unable to
> pass join conditions into a subquery with aggregate functions (it's a
> well-known limitation).

I think this statement is quite misleading. Let's look at an example:

create table t1 (a int not null, v int not null);
create table t2 (a int not null);
insert into t1 select s.i,10 from generate_series(1,1000) s(i), generate_series(1,1000);
insert into t2 select generate_series(1,1000);
create index on t1 (a);

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s
inner join t2 on t2.a = s.a where t2.a = 1;

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..59.76 rows=1 width=12)
   ->  GroupAggregate  (cost=0.42..42.24 rows=1 width=8)
         Group Key: t1.a
         ->  Index Scan using t1_a_idx on t1  (cost=0.42..37.38 rows=969 width=8)
               Index Cond: (a = 1)
   ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
         Filter: (a = 1)
(7 rows)

As you can see, the predicate is pushed down just fine into a subquery with aggregates. The likely reason that PostgreSQL is not behaving the same as SQL Server and Oracle is that its predicate pushdowns are limited to equality operators only, as internally these are all represented by a series of equivalence classes which, in this case, say that 1 = t2.a = t1.a, therefore it's possible to apply t1.a = 1 at the lowest level. These equivalence classes don't currently handle non-equality operators.
Here's an example:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s
inner join t2 on t2.a = s.a where t2.a <= 1;

                             QUERY PLAN
--------------------------------------------------------------------
 Hash Join  (cost=19442.51..19466.27 rows=1 width=12)
   Hash Cond: (t1.a = t2.a)
   ->  HashAggregate  (cost=19425.00..19435.00 rows=1000 width=8)
         Group Key: t1.a
         ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=17.50..17.50 rows=1 width=4)
         ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
               Filter: (a <= 1)
(8 rows)

Notice the seq scan on t1 instead of the index scan on t1_a_idx. A way around this is to manually push the predicate down into the subquery:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 where t1.a <= 1 group by a) s
inner join t2 on t2.a = s.a where t2.a <= 1;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..21.98 rows=1 width=12)
   Join Filter: (t1.a = t2.a)
   ->  GroupAggregate  (cost=0.42..4.46 rows=1 width=8)
         Group Key: t1.a
         ->  Index Scan using t1_a_idx on t1  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (a <= 1)
   ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
         Filter: (a <= 1)
(8 rows)

The query in question is likely performing badly because of this:

   ->  Seq Scan on fr13t1  (cost=0.00..25072.50 rows=311800 width=21) (actual time=0.007..115.766 rows=311800 loops=1)
         Filter: (fr01codemp = '1'::smallint)
         Buffers: shared hit=21175

Just how selective is fr01codemp = '1'::smallint? Is there an index on that column?

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] hyperthreadin low performance
On 21 July 2015 at 14:59, Jeison Bedoya Delgado <jeis...@audifarma.com.co> wrote:
> hi everyone,
> Recently I updated a database to a machine with RHEL7, but I see that the
> performance is better if the hyperthreading technology is deactivated and
> only 32 cores are used. Is it normal that the machine performance is better
> with 32 cores than with 64 cores?

You might be interested in http://www.postgresql.org/message-id/53f4f36e.6050...@agliodbs.com

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Query RE: Optimising UUID Lookups
On 21 March 2015 at 23:34, Roland Dunn <roland.d...@gmail.com> wrote:
> If we did add more RAM, would it be the effective_cache_size setting
> that we would alter? Is there a way to force PG to load a particular
> table into RAM? If so, is it actually a good idea?

Have you had a look at EXPLAIN (ANALYZE, BUFFERS) for the query?

Pay special attention to "Buffers: shared read=NNN" and "Buffers: shared hit=NNN". If you're not reading any buffers between runs, then the pages are in the PostgreSQL shared buffers; by the looks of your config you have 10GB of these. On the other hand, if you're getting buffer reads, then they're either coming from disk or from the OS cache; PostgreSQL won't really know the difference. If you're not getting any buffer reads and it's still slow, then the problem is not I/O.

Just for fun... what happens if you stick the 50 UUIDs in some table, analyze it, then perform a join between the 2 tables, using IN() or EXISTS()... Is that any faster?

Also, how well does it perform with: set enable_bitmapscan = off; ?

Regards

David Rowley
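The "stick the 50 UUIDs in some table" idea above could look like this. The table and column names here are invented for illustration; substitute your real ones:

```sql
-- Load the ~50 UUIDs you are searching for into a small analyzed table.
CREATE TEMP TABLE lookup_ids (id uuid PRIMARY KEY);
-- INSERT INTO lookup_ids VALUES (...);  -- the 50 UUIDs go here
ANALYZE lookup_ids;

-- Then join instead of using a long IN (...) literal list:
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.*
FROM my_table t               -- hypothetical target table
WHERE EXISTS (SELECT 1 FROM lookup_ids l WHERE l.id = t.uuid_col);
```

With statistics on the lookup table, the planner can choose between a nested-loop over the index and other join strategies, rather than planning one index probe per literal.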
Re: [PERFORM] Postgres does not use indexes with OR-conditions
On Fri, Nov 7, 2014 at 5:16 PM, arhipov <arhi...@dc.baikal.ru> wrote:
> Hello,
>
> I have just come across interesting Postgres behaviour with OR-conditions.
> Are there any chances that the optimizer will handle this situation in the future?
>
> select * from commons.financial_documents fd
> where fd.creation_time <= '2011-11-07 10:39:07.285022+08'
> order by fd.creation_time desc
> limit 200
>
> select * from commons.financial_documents fd
> where fd.creation_time = '2011-11-07 10:39:07.285022+08'
>    or fd.creation_time < '2011-11-07 10:39:07.285022+08'
> order by fd.creation_time desc
> limit 200

It would certainly be possible, providing the constants compare equally, but...

Question: would you really want to pay a, say, 1% increase in planning time for ALL queries, so that you could have this unique case of queries perform better at execution time?

Is there a valid reason why you don't just write the query with the <= operator?

Regards

David Rowley
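Assuming the intended comparison is "on or before" the timestamp, the `=` plus strict-inequality OR pair collapses to a single range operator, which the planner can match directly to a btree index on creation_time:

```sql
-- Equivalent single-operator form: (a = x OR a < x) is exactly (a <= x),
-- and a btree index on creation_time can satisfy both the filter and the
-- ORDER BY ... DESC LIMIT 200 without a sort.
SELECT *
FROM commons.financial_documents fd
WHERE fd.creation_time <= '2011-11-07 10:39:07.285022+08'
ORDER BY fd.creation_time DESC
LIMIT 200;
```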
Re: [PERFORM] unnecessary sort in the execution plan when doing group by
On Tue, Oct 28, 2014 at 7:26 PM, Huang, Suya <suya.hu...@au.experian.com> wrote:
> Hi, this is the Greenplum database 4.3.1.0.

Likely this is the wrong place to ask for help. The plan output that you've pasted below looks very different from PostgreSQL's EXPLAIN output.

 Gather Motion 24:1  (slice2; segments: 24)  (cost=31286842.08..31287447.81 rows=1683 width=536)
   Rows out: 15380160 rows at destination with 14860 ms to first row, 23856 ms to end, start offset by 104 ms.
   ->  HashAggregate  (cost=31286842.08..31287447.81 rows=1683 width=536)

 Gather Motion 24:1  (slice2; segments: 24)  (cost=152269717.33..157009763.41 rows=1196982 width=568)
   Rows out: 15380160 rows at destination with 35320 ms to first row, 70091 ms to end, start offset by 102 ms.
   ->  GroupAggregate  (cost=152269717.33..157009763.41 rows=1196982 width=568)

Most likely the reason you're getting the difference in plan is that the planner has decided there would be too many hash table entries for a hash table based on the 3 grouping columns. Look at the estimates: 1683 rows with 2 columns and 1196982 rows with the 3 columns. If those estimates turned out to be true, then the hash table for 3 columns would be massively bigger than it would be with 2 columns.

With PostgreSQL you might see the plan changing if you increased the work_mem setting. For Greenplum, I've no idea if that's the same.

Databases are often not very good at estimating the number of distinct values over more than 1 column. Certain databases have solved this with multi-column statistics, but PostgreSQL does not have these, although I just noticed last night that someone is working on them.

Regards

David Rowley
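On PostgreSQL, the work_mem experiment suggested above is a one-session change; the names and the value here are illustrative only:

```sql
SET work_mem = '512MB';   -- session-local; pick a value your machine can afford

-- Re-run EXPLAIN on the grouping query and see whether the plan flips
-- from GroupAggregate (sort-based) to HashAggregate:
EXPLAIN
SELECT col_a, col_b, col_c, count(*)
FROM some_table               -- hypothetical table standing in for yours
GROUP BY col_a, col_b, col_c;

RESET work_mem;               -- restore the server default for this session
```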
Re: [PERFORM] IS NOT NULL and LEFT JOIN
On Tue, Oct 21, 2014 at 2:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Laurent Martelli <laurent.marte...@enercoop.org> writes:
>> Do we agree that both queries are identical?
>
> No, they *aren't* identical. Go consult any SQL reference. Left join
> conditions don't work the way you seem to be thinking: after the join,
> the RHS column might be null, rather than equal to the LHS column.

For what it's worth, I'd say they are identical, at least if you discount deferred foreign key constraints, or executing the query from within a volatile function called by a query which has just updated the user_info table to break referential integrity.

The presence of the foreign key on contract_contract.user_info, which references user_user_info.id, means that any non-null contract_contract.user_info must reference a valid user_user_info record. The join is therefore not required to prove that a contract record with a non-null user_info matches a user_user_info record, so the join that merely checks the record exists is pretty much pointless in just about all cases you're likely to care about.

Although, saying that, I'm still a bit confused about the question. Are you asking if there's some way to get PostgreSQL to run the 1st query faster? Or are you asking if both queries are equivalent?

Regards

David Rowley
Re: [PERFORM] IS NOT NULL and LEFT JOIN
On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli <laurent.marte...@enercoop.org> wrote:
> Hello there,
>
> I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.
>
> I grant you that the query can be written without the JOIN on
> user_user_info, but it is generated like this by hibernate. Just changing
> the IS NOT NULL condition to the other side of the useless JOIN makes a big
> difference in the query plan:
>
> -- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned

But it looks like you're ignoring the fact that the OR condition would force the query to match not only the name and the email, but also any row that finds a match in the user_user_info table, which, going by the planner's estimates, is every row in the contract_contract table. This is why the planner chooses a seq scan on the contract_contract table instead of using the index on lower(name).

Is it really your intention to get all rows that match this 'martelli' contact with this email, and, along with that, every contract that has a non-null user_info record? I see that you have a foreign key on c.user_info referencing the user, so this should match everything with a non-null user_info record.
explain analyze select c.* from contact_contact c
  left outer join user_user_info u on c.user_info=u.id
  left outer join contact_address a on c.address=a.id
where lower(c.name)='martelli' and c.email='ds...@ezrfz.com'
   or u.id is not null;

 Hash Left Join  (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'ds...@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   ->  Seq Scan on contact_contact c  (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   ->  Hash  (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  Seq Scan on user_user_info u  (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
 Planning time: 0.790 ms
 Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of userinfo1_.id)

explain analyze select c.* from contact_contact c
  left outer join user_user_info u on c.user_info=u.id
  left outer join contact_address a on c.address=a.id
where lower(c.name)='martelli' and c.email='ds...@ezrfz.com'
   or c.user_info is not null;

 Bitmap Heap Scan on contact_contact c  (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'ds...@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'ds...@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_contact_email  (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
               Index Cond: ((email)::text = 'ds...@ezrfz.com'::text)
         ->  Bitmap Index Scan on contact_contact_user_info_idx  (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
 Planning time: 0.602 ms
 Execution time: 0.118 ms

If you look closely at the 2nd query plan, you'll see that no joins are performed, and only the contract_contract table is looked at. This is because PostgreSQL sees that none of the columns from the 2 tables being left joined to are used, and also that the columns being joined to on those tables are unique, so joining to them cannot duplicate any rows. Since these are left joins, a missing match wouldn't filter out rows from the contract_contract table, as it would with INNER JOINs. The planner sees that these left joins are pointless, so it just removes them from the plan.

Regards

David Rowley
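The left-join removal described above can be reproduced with a tiny sketch; the table names here are invented:

```sql
-- Minimal demonstration of left-join removal (available since PostgreSQL 9.0).
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (id int PRIMARY KEY, parent_id int);

-- No column of "parent" is used in the select list, and parent.id is unique,
-- so the join can neither duplicate nor eliminate child rows. EXPLAIN shows
-- only a scan of "child"; the join has been removed by the planner.
EXPLAIN
SELECT c.*
FROM child c
LEFT JOIN parent p ON c.parent_id = p.id;
```

Change `SELECT c.*` to `SELECT c.*, p.id` and the join reappears, since the parent column is then needed.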
Re: [PERFORM] View has different query plan than select statement
On Mon, May 19, 2014 at 4:47 PM, Geoff Hull <geoff.h...@mccarthy.co.nz> wrote:
> I am sending this on behalf of my colleague who tried to post to this list
> last year but without success, then also tried
> pgsql-performance-ow...@postgresql.org but without getting a reply. I have
> recently re-tested this in P/G version 9.3.4 with the same results:
>
> Hi,
>
> I have created a table 'test_table' and index 'idx_test_table' with a view
> 'v_test_table'. However the query plan used by the view does not use the
> index, but when running the select statement itself it does use the index.
> Given that query-specific hints are not available in Postgres 9.1, how can I
> persuade the view to use the same query plan as the select statement?
>
> Thanks, Tim
>
> --DROP table test_table CASCADE;
> -- create test table
> CREATE TABLE test_table (
>   history_id SERIAL,
>   id character varying(50) NOT NULL,
>   name character varying(50),
>   CONSTRAINT test_table_pkey PRIMARY KEY (history_id)
> );
>
> -- create index on test table
> CREATE INDEX idx_test_table ON test_table (id);
>
> -- populate test table
> INSERT INTO test_table (id, name)
> SELECT *, 'Danger Mouse' FROM (SELECT md5(random()::text) FROM generate_series(1,10000)) q;
>
> -- collect stats
> ANALYZE test_table;
>
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT * FROM test_table WHERE id = '02b304b1c54542570d9f7bd39361f5b4';
>
>  Index Scan using idx_test_table on test_table  (cost=0.00..8.27 rows=1 width=50) (actual time=0.021..0.022 rows=1 loops=1)
>    Index Cond: ((id)::text = '02b304b1c54542570d9f7bd39361f5b4'::text)
>    Buffers: shared hit=3
>  Total runtime: 0.051 ms
>
> -- select statement with good plan
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT id,
>        CASE WHEN COALESCE(LAG(name) OVER (PARTITION BY id ORDER BY history_id), name || 'x') <> name THEN name END AS name
> FROM test_table
> WHERE id = '02b304b1c54542570d9f7bd39361f5b4';
>
>  WindowAgg  (cost=8.28..8.31 rows=1 width=50) (actual time=0.050..0.051 rows=1 loops=1)
>    Buffers: shared hit=3
>    ->  Sort  (cost=8.28..8.29 rows=1 width=50) (actual time=0.039..0.039 rows=1 loops=1)
>          Sort Key: history_id
>          Sort Method: quicksort  Memory: 25kB
>          Buffers: shared hit=3
>          ->  Index Scan using idx_test_table on test_table  (cost=0.00..8.27 rows=1 width=50) (actual time=0.030..0.031 rows=1 loops=1)
>                Index Cond: ((id)::text = '02b304b1c54542570d9f7bd39361f5b4'::text)
>                Buffers: shared hit=3
>  Total runtime: 0.102 ms
>
> --DROP VIEW v_test_table;
> CREATE OR REPLACE VIEW v_test_table AS
> SELECT id,
>        CASE WHEN COALESCE(LAG(name) OVER (PARTITION BY id ORDER BY history_id), name || 'x') <> name THEN name END AS name
> FROM test_table;
>
> -- Query via view with bad plan
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT * FROM v_test_table WHERE id = '02b304b1c54542570d9f7bd39361f5b4';
>
>  Subquery Scan on v_test_table  (cost=868.39..1243.39 rows=50 width=65) (actual time=26.115..33.327 rows=1 loops=1)
>    Filter: ((v_test_table.id)::text = '02b304b1c54542570d9f7bd39361f5b4'::text)
>    Buffers: shared hit=104, temp read=77 written=77
>    ->  WindowAgg  (cost=868.39..1118.39 rows=10000 width=50) (actual time=26.022..32.519 rows=10000 loops=1)
>          Buffers: shared hit=104, temp read=77 written=77
>          ->  Sort  (cost=868.39..893.39 rows=10000 width=50) (actual time=26.013..27.796 rows=10000 loops=1)
>                Sort Key: test_table.id, test_table.history_id
>                Sort Method: external merge  Disk: 608kB
>                Buffers: shared hit=104, temp read=77 written=77
>                ->  Seq Scan on test_table  (cost=0.00..204.00 rows=10000 width=50) (actual time=0.010..1.804 rows=10000 loops=1)
>                      Buffers: shared hit=104
>  Total runtime: 33.491 ms
>
> How can I get the view to use the same query plan as the select statement?

Hi Geoff,

Unfortunately the view is not making use of the index due to the presence of the window function in the view. I think you would find that if that was removed then the view would more than likely use the index again. The reason for this is that currently the WHERE clause of the outer query is not pushed down into the view, due to some overly strict code which completely disallows pushdown of WHERE clauses into subqueries that contain window functions...
In your case, because you have the id in your PARTITION BY clause, it is technically possible to push the WHERE clause down into the subquery. I wrote a patch a while back which lifts this restriction. It unfortunately missed the boat for 9.4, but with any luck it will make it into 9.5. If you're up for compiling Postgres from source, then you can test the patch out:

http://www.postgresql.org/message-id/cahoyfk9ihosarntwc-nj5tphko4wcausd-1c_0wecogi9ue...@mail.gmail.com

It should apply to current HEAD without too much trouble.

Regards

David Rowley
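Until the pushdown restriction is lifted, a manual workaround is to repeat the predicate inside the subquery yourself. Because the predicate is on the PARTITION BY column, filtering early cannot change the window function's result; a sketch using the thread's own test_table:

```sql
-- Hand-pushed predicate: the inner WHERE lets the planner use idx_test_table,
-- and since "id" is the partition column, each partition is filtered whole,
-- so LAG(...) still sees exactly the rows it would have seen.
SELECT id, name
FROM (SELECT id, history_id, name,
             LAG(name) OVER (PARTITION BY id ORDER BY history_id) AS prev_name
      FROM test_table
      WHERE id = '02b304b1c54542570d9f7bd39361f5b4') s;
```

(The CASE/COALESCE wrapper from the original view is omitted here to keep the shape of the workaround visible.)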
Re: [PERFORM] DATE_TRUNC() and GROUP BY?
On Fri, Dec 20, 2013 at 1:35 PM, Dave Johansen <davejohan...@gmail.com> wrote:
> I just ran into an interesting issue on Postgres 8.4. I have a database with
> about 3 months of data and when I do the following query:
>
> SELECT DATE_TRUNC('day', time) AS time_t, COUNT(*) FROM mytable GROUP BY time_t;
>
> EXPLAIN shows that it's doing a sort and then a GroupAggregate. There will
> only be ~90 outputs, so is there a way I can hint/force the planner to just
> do a HashAggregate? Just to see if it would change the plan, I tried
> increasing the work_mem up to 1GB and it still did the same plan.
>
> Thanks,
> Dave

PostgreSQL doesn't really have any stats on the selectivity of date_trunc('day', time), so my guess is that it can only assume that it has the same selectivity as the time column by itself... which is very untrue in this case.

The group aggregate plan is chosen here because PostgreSQL thinks the hash table is going to end up pretty big and decides that the group aggregate will be the cheaper option.

I mocked up your data, and on 9.4 I can get the hash aggregate plan to run if I set the n_distinct value to 90 and then analyze the table again. Even if you could do this on 8.4, I'd not recommend it, as it will probably cause havoc with other plans around the time column. I did also get the hash aggregate plan to run if I created a functional index on date_trunc('day', time) and then ran analyze again. I don't have a copy of 8.4 around to see if the planner will make use of the index in the same way.

What would be really nice is if we could create our own statistics on what we want, something like:

CREATE STATISTICS name ON table (date_trunc('day', time));

That way Postgres could have a better idea of the selectivity in this situation.

I'd give creating the functional index a try, but keep in mind the overhead that it will cause with inserts, updates and deletes.

Regards

David Rowley
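The functional-index suggestion above looks like this. Note the expression must be immutable, which holds when the column is `timestamp without time zone` (for timestamptz, date_trunc is only stable and the index creation will fail):

```sql
-- Expression index; ANALYZE then gathers statistics on the expression itself,
-- giving the planner a real n_distinct for date_trunc('day', "time").
CREATE INDEX mytable_day_idx ON mytable (date_trunc('day', "time"));
ANALYZE mytable;
```

After the ANALYZE, the ~90 distinct days are visible to the planner, which is what makes it willing to pick the hash aggregate.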
Re: [PERFORM] PostgreSQL over internet
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of belal hamed
Sent: 27 January 2013 13:16
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PostgreSQL over internet

> By wireshark I monitored the TCP packets and found total data
> transmitted/received of 400B. It took about 2.5s to fetch the results.
> Why?

Are you sure there's not any QOS somewhere that is slowing down the packets for port 5432, or whichever port you're using for PostgreSQL? Perhaps temporarily changing PostgreSQL's listening port to something else might be a good test.

David

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] indexing for distinct search in timestamp based table
I once also had a similar performance problem when looking for all matching rows between two timestamps; in fact, that's why I'm here today. The problem was with MySQL. I had some tables of around 10 million rows and all my searching was timestamp based. MySQL didn't do what I wanted. I found that using a CLUSTERed table with PostgreSQL was lightning quick, though mostly the sets of matching rows I was working with were not much over the 100k mark.

I'm wondering if clustering the table on ad_log_start_time will help cut down on random reads. That's if you can afford to block the users while PostgreSQL clusters the table. If you're inserting in order of the start_time column (which I was), then the cluster should almost maintain itself (I think), providing you're not updating or deleting anyway; I'd assume that's the case, since it looks like a log table.

David.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Mager
Sent: 28 August 2008 09:06
To: pgsql-performance@postgresql.org
Subject: [PERFORM] indexing for distinct search in timestamp based table

I'm looking for some help in speeding up searches. My table is pretty simple (see below), but somewhat large, and continuously growing. Currently it has about 50 million rows.
The table is (I know I have excessive indexes; I'm trying to get the appropriate ones and drop the extras):

                                 Table "public.ad_log"
    Column    |            Type             |                      Modifiers
--------------+-----------------------------+-----------------------------------------------------------
 ad_log_id    | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)
 channel_name | text                        | not null
 player_name  | text                        | not null
 ad_name      | text                        | not null
 start_time   | timestamp without time zone | not null
 end_time     | timestamp without time zone | not null
Indexes:
    "ad_log_pkey" PRIMARY KEY, btree (ad_log_id)
    "ad_log_channel_name_key" UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time)
    "ad_log_ad_and_start" btree (ad_name, start_time)
    "ad_log_ad_name" btree (ad_name)
    "ad_log_all" btree (channel_name, player_name, start_time, ad_name)
    "ad_log_channel_name" btree (channel_name)
    "ad_log_end_time" btree (end_time)
    "ad_log_player_and_start" btree (player_name, start_time)
    "ad_log_player_name" btree (player_name)
    "ad_log_start_time" btree (start_time)

The query I'm trying to speed up is below. In it, the field "tag" can be one of channel_name, player_name, or ad_name. I'm actually trying to return the distinct values, and I found GROUP BY to be slightly faster than using DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses, in which case we use '%', but it seems Postgres optimizes that pretty well.
SELECT field FROM ad_log
WHERE channel_name LIKE :channel_name
  AND player_name LIKE :player_name
  AND ad_name LIKE :ad_name
  AND start_time BETWEEN :start_date AND (date(:end_date) + 1)
GROUP BY field
ORDER BY field

A typical query is:

explain analyze SELECT channel_name FROM ad_log
WHERE channel_name LIKE '%'
  AND ad_name LIKE '%'
  AND start_time BETWEEN '2008-07-01' AND (date('2008-07-28') + 1)
GROUP BY channel_name ORDER BY channel_name;

with the result being:

 Sort  (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1)
   Sort Key: channel_name
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1)
         ->  Bitmap Heap Scan on ad_log  (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1)
               Recheck Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
               Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text))
               ->  Bitmap Index Scan on ad_log_start_time  (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1)
                     Index Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
 Total runtime: 75460.361 ms

It seems to me there should be some way to create an index to speed this up, but the various ones I've tried so far haven't helped. Any suggestions would be greatly appreciated.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
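The clustering suggestion from the reply above can be sketched as follows (PostgreSQL 8.3+ syntax; CLUSTER takes an exclusive lock and rewrites the table, so it blocks readers and writers while it runs):

```sql
-- Rewrite ad_log in start_time order using the existing index,
-- so a BETWEEN range scan touches mostly sequential pages.
CLUSTER ad_log USING ad_log_start_time;

-- Refresh planner statistics after the rewrite.
ANALYZE ad_log;
```

If rows really are inserted in start_time order, the physical ordering decays slowly and the CLUSTER rarely needs repeating.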
Re: [PERFORM] Nested Loop join being improperly chosen
I had a similar problem here: http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php

Is the nested loop performing a LEFT join with yours? It's a little difficult to tell just from the query plan you showed.

A workaround for mine was to use a full outer join and eliminate the extra rows in the WHERE clause. A bit of a hack, but it changed a 2 min query into one that ran in under a second.

Of course this is not helping with your problem, but at least it may trigger some more feedback.

David.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brad Ediger
Sent: 22 August 2008 16:26
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Nested Loop join being improperly chosen

Hello,

I'm having trouble with a Nested Loop being selected for a rather complex query; it turns out this is a pretty bad plan as the nested loop's row estimates are quite off (1 estimated / 1207881 actual). If I disable enable_nestloop, the query executes much faster (42 seconds instead of 605). The tables in the query have all been ANALYZEd just before generating these plans.

Here are the plans with and without enable_nestloop: http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples. The items table has around 10,000 tuples, and the other tables in the query are tiny.

Any ideas or suggestions would be greatly appreciated.

Thanks!
--
Brad Ediger

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance