On 16 November 2014 19:30, Simon Riggs wrote:
> I marked the patch as ready for committer.
>
> This looks very interesting.
>
> The value of the patch seems to come from skipping costly checks. Your
> performance test has a leading VARCHAR column, so in that specific case
> skipping the leading column is a big win. I don't see it would be in
> all cases.
Yes, you are right. The biggest improvement is for cases where the leading-column comparison is costly (e.g. VARCHAR).

> Can we see a few tests on similar things to check for other
> opportunities and regressions.
> * Single column bigint index

It gives around 5% improvement.

> * Multi-column bigint index

It gives around 5% improvement.

> * 5 column index with mixed text and integers

It gives around 15% improvement.

As we can see, in the case of bigint the improvement is not as high as for VARCHAR, because comparing bigint values is cheap. So this is close to a worst-case performance test, and I think even a 5% improvement in the worst case, with such small and safe code changes, should be acceptable, especially since the improvement in other scenarios (such as a VARCHAR index) is much higher (15-30%). Also, even for bigint (or any similar data type), the improvement can grow as the number of records selected increases.

The test cases used for testing are attached. Please share your opinion.

> The explanatory comments need some work to more clearly explain what
> this patch does.

Please help me understand this point: would you like me to add more explanation of the patch in this mail thread, or as comments in the code?

Thanks and Regards,
Kumar Rajeev Rastogi
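To illustrate the last point, a quick (hypothetical) way to see how result-set size affects the relative gain is to vary the cutoff in the attached test's queries; the tbl2 table below holds ids 1..1000000, so lowering the threshold increases how many index entries must be checked:

```sql
-- Larger result sets mean more per-tuple key comparisons,
-- so the relative benefit of skipping them should grow
select count(*) from tbl2 where id2 > 990000;  -- ~10k rows
select count(*) from tbl2 where id2 > 900000;  -- ~100k rows
select count(*) from tbl2 where id2 > 500000;  -- ~500k rows
```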
--------------------------------------
--As per Simon Riggs' suggestion:
--------------------------------------

--With single-column bigint index:

--Schema
create table tbl2(id1 int, id2 bigint, id3 int);
create index idx2 on tbl2(id2);

--Insert 1M rows:
insert into tbl2 values(1, generate_series(1, 1000000), 2);

--Procedure to run the select 1000 times (repeated to make the timing more stable):
create or replace function select_data(count1 int) returns void AS $$
declare
    x int;
begin
    for i in 1..count1 loop
        select count(*) into x from tbl2 where id2 > 990000;
    end loop;
end;
$$ language plpgsql;

select select_data(1000);

--------------------------------------
--With multi-column bigint index:

create table tbl3(id1 int, id2 bigint, id3 bigint, id4 bigint);
create index idx3 on tbl3(id2, id3, id4);
insert into tbl3 values(1, generate_series(1, 1000000),
                           generate_series(1, 1000000),
                           generate_series(1, 1000000));

--Procedure to run the select 1000 times (repeated to make the timing more stable):
create or replace function select_data_multi_bigint(count1 int) returns void AS $$
declare
    x int;
begin
    for i in 1..count1 loop
        select count(*) into x from tbl3
         where id2 > 990000 and id3 > 990000 and id4 > 990000;
    end loop;
end;
$$ language plpgsql;

select select_data_multi_bigint(1000);

--------------------------------------
--5-column index with mixed text and integers:

--Schema
create table tbl4(id1 int, id2 varchar(10), id3 int, id4 varchar(10), id5 int, id6 int);
create index idx4 on tbl4(id2, id3, id4, id5, id6);

--Procedure to insert 1M rows:
create or replace function insert_data_mix_text_int(count1 int, count2 int) returns void AS $$
begin
    for i in 1..count1 loop
        insert into tbl4 values(i, 'a', i, 'a', i, i);
    end loop;
    for i in count1+1..count2 loop
        insert into tbl4 values(i, 'b', i, 'b', i, i);
    end loop;
end;
$$ language plpgsql;

select insert_data_mix_text_int(990000, 1000000);

--Procedure to run the select 1000 times (repeated to make the timing more stable):
create or replace function select_data_mix_text_int(count1 int) returns void AS $$
declare
    x int;
begin
    for i in 1..count1 loop
        select count(*) into x from tbl4
         where id2 > 'a' and id3 > 990000 and id4 > 'a'
           and id5 > 990000 and id6 > 990000;
    end loop;
end;
$$ language plpgsql;

select select_data_mix_text_int(1000);
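For anyone wanting to reproduce the numbers, a minimal sketch of how the runs can be timed from psql (assuming the tables and functions above have already been created on both a patched and an unpatched build):

```sql
-- \timing reports client-side elapsed time for each statement
\timing on

-- Run each workload a few times and compare medians between
-- the patched and unpatched servers
select select_data(1000);               -- single-column bigint index
select select_data_multi_bigint(1000);  -- multi-column bigint index
select select_data_mix_text_int(1000);  -- 5-column mixed text/integer index
```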
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers