[HACKERS] Index scan optimization

Rajeev rastogi Sun, 21 Sep 2014 21:48:18 -0700

I have observed a scope of considerable performance improvement in-case of 
index by a very minor code change.
Consider the below schema:


create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
                select count(*) from tbl2 where id2>'a' and id3>990000;

As per current design, it takes following steps to retrieve index tuples:

1.       Find the scan start position by searching first position in BTree as 
per the first key condition i.e. as per id2>'a'

2.       Then it fetches each tuples from position found in step-1.

3.       For each tuple, it matches all scan key condition, in our example it 
matches both scan key condition.

4.       If condition match, it returns the tuple otherwise scan stops.

Now problem is here that already first scan key condition is matched to find 
the scan start position (Step-1), so it is obvious that any further tuple also 
will match the first scan key condition (as records are sorted).
So comparison on first scan key condition again in step-3 seems to be redundant.

So my proposal is to skip the condition check on the first scan key condition 
for every tuple.

I have prototype code to see the result, Below are my some test data as per the 
query attached with this mail:
Time:
                Original Code took:        3621 ms
                Optimized Code took:   2576 ms
                So overall performance improvement is around 29%.

Instruction:
                Original Code took:        20057
                Optimized Code took:   14557
                So overall instruction improvement is around 27%.
Also I observed that only for function varstr_cmp, around 661M instruction was 
taken, which is completely saved for optimized code.

This optimization can be extended:

1.       For a case where once all scan key matches the condition, no need to 
check condition for further tuples for any scan key. Just keep returning the 
tuple.

2.       Above approach can be used for '<' operator also by finding the 
position of last matching tuples.

I would like to submit the patch for this improvement.
Please provide your feedback. Also let me know if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi

--Schema
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

--Procedure to insert 1M data:

create or replace function insert_data(count1 int, count2 int) returns void
AS
$$
Begin
        for i IN 1..count1 loop
                insert into tbl2 values(i, 'a', i);
        end loop;       
        
        for i IN count1+1..count2 loop
                insert into tbl2 values(i, 'b', i);
        end loop;       
End;                              
$$      language plpgsql;

select insert_data(990000, 1000000);

--Procedure to select data 1000 times (1000 times selected to make data more 
appropriate.)
create or replace function select_data(count1 int) returns void
AS
$$
declare
        x int;
Begin
        for i IN 1..count1 loop
                select count(*) into x from tbl2 where id2>'a' and id3>990000; 
        end loop;       
End;                              
$$      language plpgsql;

select select_data(1000);

--Result:
                                   Optimized            Original
Reading-1                       2579.27                 3621.82
Reading-2                       2573.82                 3618.29
Reading-3                       2575.08                 3625.16
Average                         2576.06                 3621.76

Overall Improvement                                     29%     

Instruction     Original Code took:     20057 M
Insrtuction     Optimized Code took:    14557 M
So overall instruction improvement is around 27%.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Index scan optimization

Reply via email to