Re: [HACKERS] multivariate statistics v14

Jeff Janes Mon, 21 Mar 2016 22:54:06 -0700

On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra
<[email protected]> wrote:
>
>
> D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
> remaining whitespace issues.


Hi Tomas,

I'm trying out v16 against a common problem, where postgresql thinks
it is likely top stop early during a "order by (index express) limit
1" but it doesn't actually stop early due to cross-column
correlations.  But the multivariate statistics don't seem to help.  Am
I doing this wrong, or just expecting too much?


jjanes=# create table foo as select x, floor(x/(10000000/500))::int as
y  from generate_series(1,10000000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# vacuum analyze ;


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x limit 1;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..4.92 rows=1 width=4) (actual rows=1 loops=1)
   ->  Index Only Scan using foo_x_y_idx on foo  (cost=0.43..210156.55
rows=46812 width=4) (actual rows=1 loops=1)
         Index Cond: ((y >= 478) AND (y <= 480))
         Heap Fetches: 0
 Planning time: 0.311 ms
 Execution time: 478.917 ms

Here is walks up the index on x, until it meets the first row meeting
the qualification on y. It thinks it will get to stop early and be
very fast, but it doesn't.

If I add an dummy addition to the ORDER BY, to force it not to talk
the index, I get a plan which uses the other index and is actually
much faster, but is planned to be several hundred times slower:


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x+0 limit 1;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1803.77..1803.77 rows=1 width=8) (actual rows=1 loops=1)
   ->  Sort  (cost=1803.77..1920.80 rows=46812 width=8) (actual rows=1 loops=1)
         Sort Key: ((x + 0))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Only Scan using foo_y_x_idx on foo
(cost=0.43..1569.70 rows=46812 width=8) (actual rows=60000 loops=1)
               Index Cond: ((y >= 478) AND (y <= 480))
               Heap Fetches: 0
 Planning time: 0.175 ms
 Execution time: 20.264 ms

(I use the "timing off" option, because without it the second plan
spends most of its time calling "gettimeofday")

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics v14

Reply via email to