Hi,

On 09/05/2015 10:53 AM, Simon Riggs wrote:
On 4 September 2015 at 22:03, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

    A summary of 100 EXPLAIN timings looks like this:


    master       A          B          C          D          E          F
    -------------------------------------------------------------------------
    min        0.10       0.10       0.30       0.29       0.66       0.23
    max        1.07       1.00       2.13       1.98       4.52       1.59
    median     0.49       0.52       0.31       0.33       0.68       1.12
    average    0.43       0.35       0.62       0.49       1.01       0.89


What are these? Times? in ms?

Yes, those are planning times in milliseconds. I've been thinking about possible issues in the benchmark, and I ended up with two main suspects:

  (a) environment - a VM running on a laptop, thus quite noisy and
      subject to various sources of overhead, power management, etc.

  (b) time measured using \timing in psql (by running EXPLAIN), so
      probably influenced by formatting/transfer overhead (see the
      sketch below)
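
For illustration, that measurement boiled down to something like this (just a sketch - the query shown is the simple one used later in this mail, and each test repeats it many times):

    \timing on
    EXPLAIN select a from t where b >= 100 and b <= 200;

collecting the "Time:" values psql prints for each EXPLAIN.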

So I reran the benchmark on a different machine (bare metal, pretty much no noise in the results), and measured the planning time using EXPLAIN ANALYZE (Planning Time). And I got this (milliseconds):

           A         B         C         D         E         F
-----------------------------------------------------------------
 min      0.04      0.04      0.11      0.10      0.37      0.12
 max      0.10      0.10      0.92      0.92      1.62      1.23
 median   0.04      0.04      0.11      0.11      0.37      0.13
 average  0.04      0.04      0.11      0.11      0.38      0.14

           A         B         C         D         E         F
-----------------------------------------------------------------
 min      0.04      0.04      0.11      0.11      0.38      0.13
 max      0.10      0.10      0.92      0.94      1.64      1.21
 median   0.04      0.04      0.11      0.11      0.39      0.13
 average  0.04      0.04      0.11      0.12      0.40      0.14

So the numbers are much lower (better CPU, no virtualization, etc.), but otherwise the conclusion is exactly the same - no measurable overhead compared to master.
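
For completeness, each value in the tables above is the planning time reported by EXPLAIN ANALYZE, i.e. roughly:

    EXPLAIN ANALYZE select a from t where b >= 100 and b <= 200;

with the "Planning time" line read from the output (again just a sketch - the actual runs use the same queries and index definitions as before).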

I can think of three ways to make the checks more expensive:

   (a) using more indexes

       The current benchmark already uses 40 indexes (and I've tried
       with 100), and we've seen no impact at all. Adding more indexes
       will eventually show some overhead, but the number of indexes
       would have to be very high - I doubt anyone has a table with
       hundreds of partial indexes on it (see the sketch after this
       list for what that might look like).

   (b) using more complex index predicates

       I expect the predicate_implied_by() call to get more expensive
       for more complex predicates. I believe that's quite an uncommon
       case, however - the vast majority of index predicates I've seen
       use just a single equality clause (the sketch after this list
       includes an example of a more complex one).

   (c) using more complex queries (more WHERE conditions)

       Having more complex WHERE clauses seems quite plausible, though,
       so I've decided to try it. Instead of the simple query used
       before:

           select a from t where b >= 100 and b <= 200;

       I've used a query with a bunch of other conditions:

           select a from t where b >= 100 and b <= 200
                             and c >= 100 and c <= 200
                             and d >= 100 and d <= 200
                             and a >= 100 and a <= 100;

       And indeed, this made a (tiny) difference - on master the
       planning took 0.50 ms on average, while with the patch it took
       0.55 ms. But 0.05 ms is just barely above the noise, even on
       this hardware.

       Of course, this only impacts the case with partial indexes; all
       the other cases were exactly the same with and without the
       patch.
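
To give a rough idea of what (a) and (b) might look like, here is a hypothetical sketch (not the actual benchmark script - the index names are made up, the table and columns follow the example query above):

    -- (a) many partial indexes, each covering a different range of "b"
    CREATE INDEX t_b_idx_1 ON t (a) WHERE b >= 0 AND b < 1000;
    CREATE INDEX t_b_idx_2 ON t (a) WHERE b >= 1000 AND b < 2000;
    -- ... repeated for as many indexes as the test needs

    -- (b) a more complex predicate, giving predicate_implied_by() more work
    CREATE INDEX t_complex_idx ON t (a)
        WHERE b >= 100 AND b <= 200
          AND c >= 100 AND c <= 200
          AND (d < 100 OR d > 200);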

    However that was the idea of the benchmark, and I got no difference.


Please explain what this means and your conclusion, so it's clear. That
way we can either reject the patch or commit it. Thanks

That means I've been unable to measure any significant overhead from the patch. There certainly are extreme cases where this patch might make planning noticeably slower, but I believe those are rather artificial, and I certainly wouldn't expect them in databases where a tiny increase in planning time would be a problem.

This benchmark, however, only looked at the planning overhead; we should weigh that against the possible gains. And IOS (index-only scans) is a great optimization - it's not uncommon to see 2-3x improvements on databases that fit into RAM, and order-of-magnitude improvements on large databases (thanks to eliminating the random I/O when accessing the heap).

So my opinion is that we should commit this patch.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

