Greg Stark <st...@mit.edu> writes:
> So I've done some empirical tests for a table generated by:
> create table sizeskew as (select i,j,repeat('i',i) from
> generate_series(1,1000) as i, generate_series(1,1000) as j);
> I find that using the whole block doesn't cause any problem with the
> avg_width field for the "repeat" column. That does reinforce my belief
> that we might not need any particularly black magic here.

> It does however cause a systemic error in the histogram bounds. It
> seems the median is systematically overestimated by more and more the
> larger the number of rows per block used:

Hm. You can only take N rows from a block if there actually are at
least N rows in the block. So the sampling rule I suppose you are
using is "select up to N rows from each sampled block" --- and that is
going to favor the contents of blocks containing narrower-than-average
rows.

Now in this case, it looks like that ought to favor rows with *smaller*
i values, but you say the median goes up, not down. So I'm not sure
what's going on. I thought at first that TOAST compression might be
part of the explanation, but TOAST shouldn't kick in on rows with raw
representation narrower than 2KB.

Did you do a run with no upper limit on the number of rows per block?
Because I'm not sure that tests with a limit in place are a good guide
to what happens without it.

			regards, tom lane
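One way to see the per-block skew Tom describes directly on the sizeskew
table is to group it by block number via ctid. This is only a sketch: it
assumes the table has not been updated since creation, so heap order still
reflects generation order, and it uses the tid-to-point text cast merely as
a convenient way to pull out the block number:

    -- Rows per block and the average i stored there.  Blocks holding
    -- narrow rows (small i) pack in far more tuples, so an "up to N
    -- rows per block" sampling rule over-represents their contents.
    select (ctid::text::point)[0]::int as blkno,
           count(*)                    as rows_in_block,
           round(avg(i), 1)            as avg_i
    from sizeskew
    group by 1
    order by 1
    limit 10;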
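And for the no-upper-limit run, a cheap cross-check (again just a sketch,
assuming the default statistics target) is to compare the middle entry of
histogram_bounds, which approximates the median that ANALYZE saw in its
sample, against the exact median of the column:

    analyze sizeskew;

    -- Median as estimated from the sample: the middle histogram bound.
    select histogram_bounds
    from pg_stats
    where tablename = 'sizeskew' and attname = 'i';

    -- Exact median for comparison: the 500000th of 1000000 ordered values.
    select i as true_median
    from sizeskew
    order by i
    offset 499999
    limit 1;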