[HACKERS] Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-10-23 Thread Sameer Kumar
Hi,

I was dealing with windowing function recently. I feel they are pretty
useful and quite handy in lot of operations.

I am not sure why but my PostgreSQL does not seem to be using indexes for
ORDER BY clause or PARTITION BY CLAUSE which I use with windowing function.
I have tried ORDER BY and GROUP BY clauses in a normal sql statement and
they seem to use indexes nicely.

Is this being already considered for development?

Best Regards,
*Sameer Kumar | Database Consultant*
*ASHNIK PTE. LTD.*


Re: [HACKERS] Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-10-23 Thread Tom Lane
Sameer Kumar sameer.ku...@ashnik.com writes:
 I am not sure why but my PostgreSQL does not seem to be using indexes for
 ORDER BY clause or PARTITION BY CLAUSE which I use with windowing function.

When the entire contents of the table have to be read, a seqscan-and-sort
will frequently be estimated as cheaper than an indexscan.  If you think
this is not true on your hardware, you might need to adjust
random_page_cost.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-10-23 Thread Sameer Kumar
On Wed, Oct 23, 2013 at 10:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Sameer Kumar sameer.ku...@ashnik.com writes:
  I am not sure why but my PostgreSQL does not seem to be using indexes for
  ORDER BY clause or PARTITION BY CLAUSE which I use with windowing
 function.

 When the entire contents of the table have to be read, a seqscan-and-sort
 will frequently be estimated as cheaper than an indexscan.  If you think
 this is not true on your hardware, you might need to adjust
 random_page_cost.

 regards, tom lane

My mistake. I had understood the issue wrongly.

Actually when I use functions like max to find the maximum value grouped by
another column I get a better performance when I try to do the same
operation using max() over().

Take a look at below plan:

edb=# \x
Expanded display is on.
edb=# \dS= student_score;
 Table enterprisedb.student_score
Column|  Type   | Modifiers
--+-+---
 id   | integer | not null
 student_name | character varying(1000) |
 score| integer |
 course   | character varying(100)  |
Indexes:
student_score_pkey PRIMARY KEY, btree (id)
idx_course btree (course)
idx_score btree (score)

edb=# select count(*) from student_score ;
-[ RECORD 1 ]-
count | 122880

edb=# explain analyze select max(score) from student_score group by course;
-[ RECORD 1
]-
QUERY PLAN | HashAggregate  (cost=3198.20..3198.26 rows=6 width=9) (actual
time=110.792..110.793 rows=6 loops=1)
-[ RECORD 2
]-
QUERY PLAN |   -  Seq Scan on student_score  (cost=0.00..2583.80
rows=122880 width=9) (actual time=0.011..23.055 rows=122880 loops=1)
-[ RECORD 3
]-
QUERY PLAN | Total runtime: 110.862 ms

edb=# explain analyze select max(score) over(partition by course) from
student_score ;
-[ RECORD 1
]
QUERY PLAN | WindowAgg  (cost=0.00..10324.65 rows=122880 width=9) (actual
time=36.145..224.504 rows=122880 loops=1)
-[ RECORD 2
]
QUERY PLAN |   -  Index Scan using idx_course on student_score
 (cost=0.00..8481.45 rows=122880 width=9) (actual time=0.037..85.283
rows=122880 loops=1)
-[ RECORD 3
]
QUERY PLAN | Total runtime: 242.949 ms

AS you can see there is a difference of twice. On similar lines, when I
have to find students who topped (had highest score) per course, I will
fire something like below:



edb=# explain analyze select student_name from student_score where
(course,score)in (select course,max(score) from student_score group by
course);
-[ RECORD 1
]---
QUERY PLAN | Hash Semi Join  (cost=3198.41..6516.76 rows=7300 width=43)
(actual time=113.727..181.045 rows=555 loops=1)
-[ RECORD 2
]---
QUERY PLAN |   Hash Cond: (((enterprisedb.student_score.course)::text =
(enterprisedb.student_score.course)::text) AND
(enterprisedb.student_score.score =
(max(enterprisedb.student_score.score
-[ RECORD 3
]---
QUERY PLAN |   -  Seq Scan on student_score  (cost=0.00..2583.80
rows=122880 width=52) (actual time=0.009..22.702 rows=122880 loops=1)
-[ RECORD 4
]---
QUERY PLAN |   -  Hash  (cost=3198.32..3198.32 rows=6 width=9) (actual
time=111.521..111.521 rows=6 loops=1)
-[ RECORD 5
]---
QUERY PLAN | Buckets: 1024  Batches: 1  Memory Usage: 1kB
-[ RECORD 6