Should there be a Rule for Select to cause partitions to be excluded ?
On 8 February 2011 20:08, Sylvain Rabot <sylv...@abstraction.fr> wrote: > I also tried to do table partitioning using the same immutable function, > it works well except for constraint exclusion. > > CREATE TABLE mike.directory_part_0 () INHERITS (mike.directory) WITH > (fillfactor = 90); > CREATE RULE directory_part_0_insert AS ON INSERT TO mike.directory WHERE > (__mod_cons_hash(new.id_user::bigint, 2) = 0) > DO INSTEAD INSERT INTO mike.directory_part_0 VALUES (new.*); > > CREATE TABLE mike.directory_part_1 () INHERITS (mike.directory) WITH > (fillfactor = 90); > CREATE RULE directory_part_1_insert AS ON INSERT TO mike.directory WHERE > (__mod_cons_hash(new.id_user::bigint, 2) = 1) > DO INSTEAD INSERT INTO mike.directory_part_1 VALUES (new.*); > > mike_part=# explain analyze select * from directory where id_user = 3; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Result (cost=0.00..310.21 rows=5226 width=141) (actual time=0.080..7.583 > rows=2653 loops=1) > -> Append (cost=0.00..310.21 rows=5226 width=141) (actual > time=0.077..3.654 rows=2653 loops=1) > -> Index Scan using directory_id_user_btree_idx on directory > (cost=0.00..8.27 rows=1 width=141) (actual time=0.007..0.007 rows=0 > loops=1) > Index Cond: (id_user = 3) > -> Index Scan using directory_part_0_id_user_btree_idx on > directory_part_0 directory (cost=0.00..8.27 rows=1 width=150) (actual > time=0.035..0.035 rows=0 loops=1) > Index Cond: (id_user = 3) > -> Index Scan using directory_part_1_id_user_btree_idx on > directory_part_1 directory (cost=0.00..293.67 rows=5224 width=141) (actual > time=0.035..2.037 rows=2653 loops=1) > Index Cond: (id_user = 3) > Total runtime: 8.807 ms > (9 rows) > > > On Tue, 2011-02-08 at 01:14 +0100, Sylvain Rabot wrote: > > Hi, > > > > I am trying to understand how indexes works to get the most of them. > > > > First I would like to know if there is more advantage than overhead to > > split an index in several ones using conditions e.g. doing : > > > > CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree > (id_user) WHERE id_user < 250000; > > CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user >= 250000 AND id_user < 500000; > > CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user >= 500000 AND id_user < 750000; > > CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user >= 750000 AND id_user < 1000000; > > > > instead of having only one index for all the id_user. the forecasts for > > the table directory are +500 millions records and something like 1 > > million distinct id_user. > > > > If there is my idea was to do a repartition in the indexes using a > > consistent hash algorithm in order to fill the indexes in parallel > > instead of successively : > > > > CREATE OR REPLACE FUNCTION mike.__mod_cons_hash( > > IN in_dividend bigint, > > IN in_divisor integer, > > OUT remainder integer > > ) AS $__$ > > > > BEGIN > > SELECT in_dividend % in_divisor INTO remainder; > > END; > > > > $__$ LANGUAGE plpgsql IMMUTABLE COST 10; > > > > CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON > mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 0; > > CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON > mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 1; > > CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON > mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 2; > > CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON > mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 3; > > > > But the thing is the indexes are not used : > > > > mike=# SELECT version(); > > version > > > ------------------------------------------------------------------------------------------------------------------- > > PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real > (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit > > (1 row) > > > > mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx; > > LOG: duration: 14644.160 ms statement: REINDEX INDEX > > directory_id_user_mod_cons_hash_0_btree_idx; > > REINDEX > > mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4; > > QUERY PLAN > > > ---------------------------------------------------------------------------------------------------------------- > > Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual > time=0.077..348.211 rows=10303 loops=1) > > Filter: (id_user = 4) > > Total runtime: 351.114 ms > > (3 rows) > > > > So I also did this test : > > > > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user > 3 and id_user < 5; > > CREATE INDEX > > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; > > QUERY > PLAN > > > ------------------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using directory_id_user_4_btree_idx on directory > (cost=0.00..10.58 rows=67 width=148) (actual time=0.169..7.753 rows=10303 > loops=1) > > Index Cond: (id_user = 4) > > Total runtime: 10.973 ms > > (3 rows) > > > > mike=# DROP INDEX directory_id_user_4_btree_idx; > > DROP INDEX > > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user - 1 > 2 and id_user + 1 < 6; > > CREATE INDEX > > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; > > QUERY PLAN > > > ---------------------------------------------------------------------------------------------------------------- > > Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual > time=0.153..360.020 rows=10303 loops=1) > > Filter: (id_user = 4) > > Total runtime: 363.106 ms > > (3 rows) > > > > mike=# DROP INDEX directory_id_user_4_btree_idx; > > DROP INDEX > > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING > btree (id_user) WHERE id_user > 2 + 1 and id_user < 6 - 1; > > CREATE INDEX > > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; > > QUERY > PLAN > > > ------------------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using directory_id_user_4_btree_idx on directory > (cost=0.00..10.58 rows=67 width=148) (actual time=0.245..8.262 rows=10303 > loops=1) > > Index Cond: (id_user = 4) > > Total runtime: 11.110 ms > > (3 rows) > > > > As you see the index condition although, differently written, is the > > same but the second index is not used apparently because the immutable > > function is applied on the column. > > > > So do you know the reason why the planner is not able to use indexes > > which have immutable functions applied to the column in their > > condition ? > > > > Regards. > > > > -- > Sylvain Rabot <sylv...@abstraction.fr> > -- Nick Lello | Web Architect o +44 (0) 8433309374 | m +44 (0) 7917 138319 Email: nick.lello at rentrak.com RENTRAK | www.rentrak.com | NASDAQ: RENT