Greetings, I'm experiencing a strange query plan change on a "simple" request, based on the LIMIT parameter. I present here two tables, named _article and _comment. they are part of a much larger database.
It's a tree-like database (making use of the ltree to keep the coherence), and in our case, an article can have any number of comments (linked by _ article.id = _comment.parent_id), and a _comment can itself have ONE comment (at most). The _article table contains 12 millions t-uples, and the _comment table around 26 millions. The server runs a postgresql 8.3.5 in its 64bits version. Here are the tables definition : -- _article table > > CREATE TABLE "ob2"."_article" ( > "id" bigint NOT NULL DEFAULT nextval('element_id_sequence'::regclass), > "parent_id" bigint, > "path" ltree, > "data" text, > "date_creation" timestamp without time zone NOT NULL DEFAULT now(), > "date_publishing" timestamp without time zone NOT NULL DEFAULT now(), > "date_modification" timestamp without time zone NOT NULL DEFAULT now(), > "counters" hstore, > "reference" integer NOT NULL DEFAULT > nextval('_article_reference_seq'::regclass), > "title" character varying NOT NULL, > "text" text, > "blog_id" bigint, > "user_id" bigint, > "site_id" bigint, > "topic_id" bigint, > "community_id" bigint, > CONSTRAINT "_article_pkey" PRIMARY KEY (id) > ) WITHOUT OIDS; > ALTER TABLE ONLY "ob2"."_article" ALTER COLUMN "path" SET STORAGE PLAIN; > ALTER TABLE ONLY "ob2"."_article" ALTER COLUMN "title" SET STORAGE PLAIN; > > -- Indexes > > CREATE UNIQUE INDEX _article_pkey ON _article USING btree (id); > CREATE INDEX gist_idx_article_path ON _article USING gist (path); > CREATE INDEX idx_article_blog_id ON _article USING btree (blog_id); > CREATE INDEX idx_article_community_id ON _article USING btree > (community_id); > CREATE INDEX idx_article_date_creation ON _article USING btree > (date_creation); > CREATE INDEX idx_article_date_modification ON _article USING btree > (date_modification); > CREATE INDEX idx_article_date_publishing ON _article USING btree > (date_publishing); > CREATE INDEX idx_article_parent_id ON _article USING btree (parent_id); > CREATE UNIQUE INDEX idx_article_reference_unique ON _article USING btree > (reference); > CREATE INDEX idx_article_site_id ON _article USING btree (site_id); > CREATE INDEX idx_article_topic_id ON _article USING btree (topic_id); > CREATE INDEX idx_article_user_id ON _article USING btree (user_id); > > -- _comment table > > CREATE TABLE "ob2"."_comment" ( > "id" bigint NOT NULL DEFAULT nextval('element_id_sequence'::regclass), > "parent_id" bigint, > "path" ltree, > "data" text, > "date_creation" timestamp without time zone NOT NULL DEFAULT now(), > "date_publishing" timestamp without time zone NOT NULL DEFAULT now(), > "date_modification" timestamp without time zone NOT NULL DEFAULT now(), > "counters" hstore, > "reference" integer NOT NULL DEFAULT > nextval('_comment_reference_seq'::regclass), > "text" text, > "article_id" bigint, > "blog_id" bigint, > "user_id" bigint, > "site_id" bigint, > CONSTRAINT "_comment_pkey" PRIMARY KEY (id) > ) WITHOUT OIDS; > ALTER TABLE ONLY "ob2"."_comment" ALTER COLUMN "path" SET STORAGE PLAIN; > > -- Indexes > > CREATE UNIQUE INDEX _comment_pkey ON _comment USING btree (id); > CREATE INDEX gist_idx_comment_path ON _comment USING gist (path); > CREATE INDEX idx_comment_date_creation ON _comment USING btree > (date_creation); > CREATE INDEX idx_comment_date_publishing ON _comment USING btree > (date_publishing); > CREATE INDEX idx_comment_parent_id ON _comment USING btree (parent_id); > CREATE INDEX idx_comment_reference ON _comment USING btree (reference); > Now I created a function to get simply the comment reply to a given comment : CREATE OR REPLACE FUNCTION get_comment_response (BIGINT) RETURNS _comment AS > $$ > SELECT * FROM _comment WHERE parent_id = $1; > $$ > STABLE > COST 1 > LANGUAGE SQL; > Ok, now, all is set. I'd like to get with a simple query the comments of a given article, ordered by publishing date, as well as their replies if they exists. So I write this request : SELECT _comment.id, (get_comment_response(_comment.id)).id AS r_id FROM _comment INNER JOIN _article ON _article.id = _comment.parent_id WHERE _comment.parent_id = '17355952' ORDER BY _comment.date_publishing ASC OFFSET 0 LIMIT 10; Results are good, quite fast, BUT, when executing tests I discovered something very strange. The query was fast for 3+ comments, but very slow with a limit of 1 or 2 ! Just because the query plan change : EXPLAIN > SELECT _comment.id, > (get_comment_response(_comment.id)).id AS r_id > FROM _comment > INNER JOIN _article > ON _article.id = _comment.parent_id > WHERE _comment.parent_id = '17355952' > ORDER BY _comment.id ASC > OFFSET 0 > LIMIT 1000; > > QUERY > PLAN > > --------------------------------------------------------------------------------------------------------------- > Limit (cost=10261.19..10263.69 rows=1000 width=8) > -> Sort (cost=10261.19..10281.06 rows=7949 width=8) > Sort Key: _comment.id > -> Nested Loop (cost=0.00..9825.35 rows=7949 width=8) > -> Index Scan using _article_pkey on _article > (cost=0.00..9.55 rows=1 width=8) > Index Cond: (id = 17355952::bigint) > -> Index Scan using idx_comment_parent_id on _comment > (cost=0.00..9716.44 rows=7949 width=16) > Index Cond: (_comment.parent_id = 17355952::bigint) > (8 rows) > EXPLAIN > SELECT _comment.id, > (get_comment_response(_comment.id)).id AS r_id > FROM _comment > INNER JOIN _article > ON _article.id = _comment.parent_id > WHERE _comment.parent_id = '17355952' > ORDER BY _comment.id ASC > OFFSET 0 > LIMIT 1; > QUERY > PLAN > > ----------------------------------------------------------------------------------------------------- > Limit (cost=0.00..3588.42 rows=1 width=8) > -> Nested Loop (cost=0.00..28524312.40 rows=7949 width=8) > -> Index Scan using _comment_pkey on _comment > (cost=0.00..28448324.73 rows=7949 width=16) > Filter: (parent_id = 17355952::bigint) > -> Index Scan using _article_pkey on _article (cost=0.00..9.55 > rows=1 width=8) > Index Cond: (_article.id = 17355952::bigint) > (6 rows) > The second query scans the whole comment table which is very dangerous for production servers. So did I do something wrong ? Is there a way to handle this issue smoothly ? Thanks in advance Yannick