Hi all: I am new in pgsql, I am even new in using mailing list. I send this email just to give a suggestion on performance. I found that if primary key or a colume which has an unique index, is in a select sql,the distinct sometimes is Unnecessary. Actually, the SQL with DISTINCT will runs more slowly than the SQL without DISTINCT.
My english is poor, here is the SQL: CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text) INSERT into test_tbl select generate_series(1,10000000), 'test'; SQL with DISTINCT: test=# explain analyze select distinct col, k from test_tbl order by k limit 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=1277683.22..1277690.72 rows=1000 width=36) (actual time=12697.994..12698.382 rows=1000 loops=1) -> Unique (cost=1277683.22..1329170.61 rows=6864985 width=36) (actual time=12697.992..12698.311 rows=1000 loops=1) -> Sort (cost=1277683.22..1294845.68 rows=6864985 width=36) (actual time=12697.991..12698.107 rows=1000 loops=1) Sort Key: k, col Sort Method: external sort Disk: 215064kB -> Seq Scan on test_tbl (cost=0.00..122704.85 rows=6864985 width=36) (actual time=0.809..7561.215 rows=10000000 loops=1) Planning time: 2.368 ms Execution time: 12728.471 ms (8 rows) SQL without DISTINCT: test=# explain analyze select col, k from test_tbl order by k limit 1000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..31.81 rows=1000 width=9) (actual time=0.667..1.811 rows=1000 loops=1) -> Index Scan using test_tbl_pkey on test_tbl (cost=0.43..313745.06 rows=10000175 width=9) (actual time=0.666..1.744 rows=1000 loops=1) Planning time: 0.676 ms Execution time: 1.872 ms (4 rows) Also, after reading "Understanding+How+PostgreSQL+Executes+a+Query", I guess this happened: 1. the planner see distinct 2. the planner wants to use unique 3. the planner wants to use sort or hash 4. the planner see order by , and the "order by colume" is k, which is in the "select colume" -- col, k 5. the planner use sort In fact, the k is primary key, so not only k is distinct, but also any value combined with k. And even more, we have a "default primary key index", so we do not need to sort either. So my question is : 1. Is my guess correct ? 2. Can we make the planner more clever, so that it can just ignore the DISTINCT in the cases which just like mine, and use the index ? Other infomations: PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit Best regards. Rui Liu