Jeff Davis <[EMAIL PROTECTED]> writes:
> On Thu, 2007-05-03 at 14:33 -0700, jaba the mobzy wrote:
>> mycorr_100 took 11.4 s to run although it had to fetch 100000 row from
>> the base table.
>> mycorr_10 took 24.4 s to run although it had to fetch 10563 row from
>> the base table.
> This is because the physical distribution of data is different. The
> mycorr_10 table has tuples in which a and b are > 15.9M spread all
> throughout. mycorr_100 has them all collected together at the end of the
> physical file. Less disk seeking.

If the OP had generated the data randomly, as claimed, the rows shouldn't
be particularly more clumped in one table than the other.  But I sure
agree that it sounds like a nonrandom distribution in the mycorr_100
table.

FWIW I tried to duplicate the behavior, and could not, using tables made
up like this:

create table src as
  select int4(16*1024*1024*random()) as key,
         int4(16*1024*1024*random()) as a,
         int4(16*1024*1024*random()) as b
  from generate_series(1,16*1024*1024);

create table mycorr_10 as
  select key, a,
         case when random() < 0.1 then a else b end as b
  from src;

create table mycorr_100 as
  select key, a, a as b from src;

create index mycorr_10i on mycorr_10(a,b);
create index mycorr_100i on mycorr_100(a,b);

vacuum analyze mycorr_10;
vacuum analyze mycorr_100;

			regards, tom lane
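(A minimal sketch for repeating the comparison on the tables built above:
the OP's exact statement isn't quoted here, but given the "a and b are
> 15.9M" figure, something like the following against each table should
exercise the same index range. The 15900000 cutoff is an assumption, not
taken from the original report.)

-- 15900000 ~ 15.9M, assumed threshold based on the quoted discussion
explain analyze
select * from mycorr_10 where a > 15900000 and b > 15900000;

explain analyze
select * from mycorr_100 where a > 15900000 and b > 15900000;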