2009/6/2 Robert Haas <robertmh...@gmail.com>

> On Mon, Jun 1, 2009 at 4:53 PM, Anne Rosset <aros...@collab.net> wrote:
> >> On Mon, Jun 1, 2009 at 2:14 PM, Anne Rosset <aros...@collab.net> wrote:
> >>> SELECT SUM(1) FROM item WHERE is_deleted = 'f';
> >>>    sum
> >>> ---------
> >>>  1824592
> >>> (1 row)
> >>>
> >>> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641';
> >>>   sum
> >>> --------
> >>>  122412
> >>> (1 row)
> >>>
> >>> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
> >>>   AND is_deleted = 'f';
> >>>  sum
> >>> -----
> >>>   71
> >>> (1 row)
> >>>
> >>> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
> >>>   AND is_deleted = 't';
> >>>   sum
> >>> --------
> >>>  122341
> >>> (1 row)
> >
> > The item table has 2324829 rows
>
> So 1824592/2324829 = 78.4% of the rows have is_deleted = false, and
> 122412/2324829 = 5.265% of the rows have the relevant folder_id.
> Therefore the planner assumes that there will be
> 2324829 * 78.4% * 5.265% =~ 96,000 rows that satisfy both criteria
> (the original explain had 97,000; there's some variability due to the
> fact that the analyze only samples a random subset of pages), but the
> real number is 71, leading it to make a very bad decision. This is a
> classic "hidden correlation" problem, where two columns are correlated
> but the planner doesn't notice, and you get a terrible plan.
>
> Unfortunately, I'm not aware of any real good solution to this
> problem. The two obvious approaches are multi-column statistics and
> planner hints; PostgreSQL supports neither.
>
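For reference, the estimate in the quoted message can be reproduced with a few lines of arithmetic. This is only a rough sketch of the independence assumption, using the exact counts from the queries above; the real planner works from sampled statistics rather than exact counts, and the function name independent_estimate is made up for illustration.

```python
# Row counts copied from the quoted query results.
total_rows = 2_324_829
not_deleted = 1_824_592   # rows with is_deleted = 'f'
in_folder = 122_412       # rows with folder_id = 'tracker3641'
actual_both = 71          # rows matching both conditions, as measured


def independent_estimate(total, *match_counts):
    """Estimate rows matching all predicates, assuming they are independent.

    Hypothetical helper: multiplies the per-predicate selectivities together,
    which is the simplification the quoted message describes.
    """
    estimate = float(total)
    for count in match_counts:
        estimate *= count / total
    return estimate


estimate = independent_estimate(total_rows, not_deleted, in_folder)

print(f"selectivity(is_deleted = 'f') = {not_deleted / total_rows:.3%}")
print(f"selectivity(folder_id)        = {in_folder / total_rows:.3%}")
print(f"estimated rows = {estimate:,.0f}, actual rows = {actual_both}")
print(f"overestimate factor = {estimate / actual_both:,.0f}x")
```

The estimate comes out more than a thousand times larger than the 71 rows that actually match, which is the gap the quoted message blames for the bad plan.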
How about a partial index (create index idx on item(folder_id) where not is_deleted)? Wouldn't it have the required statistics (even if it is not used in the plan)?