I reran the test using:

        test=> CREATE TABLE test (x INTEGER);
        test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
        test=> SET log_min_duration_statement = 0;

and got on an unpatched system:

        1751.320 ms  select * from (select * from test order by x limit 3) as x limit 1;
        1725.092 ms  select * from (select * from test order by x limit 3) as x limit 1;
        1709.463 ms  select * from (select * from test order by x limit 3) as x limit 1;
        1702.917 ms  select * from (select * from test order by x limit 10) as x limit 1;
        1705.793 ms  select * from (select * from test order by x limit 10) as x limit 1;
        1704.046 ms  select * from (select * from test order by x limit 10) as x limit 1;
        1699.730 ms  select * from (select * from test order by x limit 100) as x limit 1;
        1712.628 ms  select * from (select * from test order by x limit 100) as x limit 1;
        1699.454 ms  select * from (select * from test order by x limit 100) as x limit 1;
        1720.207 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        1725.519 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        1728.933 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        1699.609 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        1698.386 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        1698.985 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        1700.740 ms  select * from (select * from test order by x limit 100000) as x limit 1;
        1700.989 ms  select * from (select * from test order by x limit 100000) as x limit 1;
        1695.771 ms  select * from (select * from test order by x limit 100000) as x limit 1;

which is expected because the unpatched code performs the full sort
regardless of the LIMIT, so the sort work is constant.  With the patch I
see:
        
        433.892 ms  select * from (select * from test order by x limit 3) as x limit 1;
        496.016 ms  select * from (select * from test order by x limit 3) as x limit 1;
        434.604 ms  select * from (select * from test order by x limit 3) as x limit 1;
        433.265 ms  select * from (select * from test order by x limit 10) as x limit 1;
        432.058 ms  select * from (select * from test order by x limit 10) as x limit 1;
        431.329 ms  select * from (select * from test order by x limit 10) as x limit 1;
        429.722 ms  select * from (select * from test order by x limit 100) as x limit 1;
        434.754 ms  select * from (select * from test order by x limit 100) as x limit 1;
        429.758 ms  select * from (select * from test order by x limit 100) as x limit 1;
        432.060 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        432.523 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        433.917 ms  select * from (select * from test order by x limit 1000) as x limit 1;
        449.885 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        450.182 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        450.536 ms  select * from (select * from test order by x limit 10000) as x limit 1;
        1771.807 ms  select * from (select * from test order by x limit 100000) as x limit 1;
        1746.628 ms  select * from (select * from test order by x limit 100000) as x limit 1;
        1795.600 ms  select * from (select * from test order by x limit 100000) as x limit 1;

The patch is faster until the limit reaches 100k, i.e. 10% of the table,
at which point it is the same speed as the unpatched code.  What is
interesting is that a limit of 1M is also the same speed:

        1756.401 ms  select * from (select * from test order by x limit 1000000) as x limit 1;
        1744.104 ms  select * from (select * from test order by x limit 1000000) as x limit 1;
        1734.198 ms  select * from (select * from test order by x limit 1000000) as x limit 1;

This is with the default work_mem of '1MB'.  I used LIMIT 1 in the outer
query so the times were not affected by the size of the data transfer to
the client.
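
Presumably the crossover at 100k is because a bound that large no longer
fits in the 1MB work_mem, so the in-memory bounded sort can't be used and
the sort falls back to the ordinary external sort.  For anyone who wants
to see why the bounded path is so much cheaper for small limits, here is
a minimal standalone sketch of the "top-N" idea (my own illustration of
the technique, not code from the patch): keep the K smallest values seen
so far in a max-heap, so N input rows cost O(N log K) comparisons instead
of the O(N log N) of a full sort:

        #include <stdio.h>
        #include <stdlib.h>

        #define K 3                     /* the LIMIT bound */

        static void
        sift_down(int *heap, int n, int i)
        {
            for (;;)
            {
                int largest = i;
                int l = 2 * i + 1;
                int r = 2 * i + 2;
                int tmp;

                if (l < n && heap[l] > heap[largest])
                    largest = l;
                if (r < n && heap[r] > heap[largest])
                    largest = r;
                if (largest == i)
                    return;
                tmp = heap[i];
                heap[i] = heap[largest];
                heap[largest] = tmp;
                i = largest;
            }
        }

        int
        main(void)
        {
            int heap[K];                /* max-heap of the K smallest so far */
            int filled = 0;
            int i, j, v;

            srand(0);
            for (i = 0; i < 1000000; i++)
            {
                v = rand();

                if (filled < K)
                {
                    heap[filled++] = v;
                    if (filled == K)    /* heapify once the bound is reached */
                        for (j = K / 2 - 1; j >= 0; j--)
                            sift_down(heap, K, j);
                }
                else if (v < heap[0])
                {
                    heap[0] = v;        /* evict the current maximum */
                    sift_down(heap, K, 0);
                }
            }

            /* heap[] now holds the K smallest inputs, unsorted */
            for (i = 0; i < K; i++)
                printf("%d\n", heap[i]);
            return 0;
        }

With K = 3 almost every row is rejected after a single comparison against
the heap root, so the run is dominated by the scan itself, which is
consistent with the flat ~430 ms timings for limits from 3 to 10000 above.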


---------------------------------------------------------------------------

Bruce Momjian wrote:
> 
> I did some performance testing of the patch, and the results were good. 
> I did this:
> 
>       test=> CREATE TABLE test (x INTEGER);
>       test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
>       test=> SET log_min_duration_statement = 0;
>       test=> SELECT * FROM test ORDER BY x LIMIT 3;
> 
> and the results were, before the patch, for three runs:
> 
>   LOG:  duration: 1753.518 ms  statement: select * from test order by x limit 3;
>   LOG:  duration: 1766.019 ms  statement: select * from test order by x limit 3;
>   LOG:  duration: 1777.520 ms  statement: select * from test order by x limit 3;
> 
> and after the patch:
> 
>   LOG:  duration: 449.649 ms  statement: select * from test order by x limit 3;
>   LOG:  duration: 443.450 ms  statement: select * from test order by x limit 3;
>   LOG:  duration: 443.086 ms  statement: select * from test order by x limit 3;
> 
> ---------------------------------------------------------------------------
> 
> Gregory Stark wrote:
> > 
> > Updated patch attached:
> > 
> > 1) Removes #if 0 optimizations
> > 
> > 2) Changes #if 0 to #if NOT_USED for code that's there for completeness
> >    and to keep the code self-documenting, but that isn't needed by
> >    anything currently
> > 
> > 3) Fixed cost model to represent bounded sorts
> > 
> > 
> 
> [ Attachment, skipping... ]
> 
> > 
> > 
> > "Gregory Stark" <[EMAIL PROTECTED]> writes:
> > 
> > > "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > >
> > >> There are a few blocks of code surrounded with "#if 0 - #endif". Are
> > >> those just leftovers that should be removed, or are they things that
> > >> still need to be finished and enabled?
> > >
> > > Uhm, I don't remember, will go look, thanks.
> > 
> > Ok, they were left over code from an optimization that I decided wasn't
> > very important to pursue. The code that was ifdef'd out detected when a
> > disk sort could abort a merge because it had already generated enough
> > tuples to satisfy the limit.
> > 
> > But I never wrote the code to actually abort the run, and it looks a bit
> > tricky. In any case the disk sort use case is extremely narrow: you would
> > need something like "LIMIT 50000" or more to hit it, and an input table
> > huge enough to cause multiple rounds of merges.
> > 
> > 
> > I think I've figured out how to adjust the cost model. It turns out that
> > it usually doesn't matter whether the cost model is correct, since any
> > case where the optimization kicks in is a case where you're reading a
> > small portion of the input, and there an index would be *much* better if
> > one were available. So the optimization is only used when there's no
> > index available. Nonetheless it's nice to get the estimates right so that
> > higher levels in the plan get reasonable values.
> > 
> > At least the results of the new cost model are reasonable; I'm not sure
> > if I've done it the "right" way, though.
> > 
> > 
> > -- 
> >   Gregory Stark
> >   EnterpriseDB          http://www.enterprisedb.com
> > 
> 
> -- 
>   Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
>   EnterpriseDB                               http://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
