Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-30 Thread Tom Lane
Joshua Yanovski pythones...@gmail.com writes:
 Proof of concept initial patch for enabling index only scans for
 partial indices even when an attribute is not in the target list, as
 long as it is only used in restriction clauses that can be proved by
 the index predicate.  This also works for index quals, though they
 still can't be used in the target list.  However, this patch may be
 inefficient since it duplicates effort that is currently delayed until
 after the best plan is chosen.

I took a quick look at this.  I think it's logically incorrect to exclude
Vars used only in index quals from the set that the index has to return,
since you can't know at this stage whether the index is lossy (ie, might
report xs_recheck = TRUE at runtime).  While this is moot for btree
indexes, it's not moot for SPGiST indexes which also support index-only
scans today.

In principle we could extend the AM and opclass API and demand that AMs
tell us whether they might return xs_recheck = TRUE.  However, I'm pretty
hesitant to change the opclass APIs for this purpose; it'd likely break
third-party code.  Moreover, an advantage of confining the patch to
considering only partial-index quals is that you could skip all the added
work for non-partial indexes, which would probably largely solve the
added-planning-time problem.

 It also includes a minor
 fix in the same code in createplan.c to make sure we're explicitly
 comparing an empty list to NIL, but I can take that out if that's not
 considered in scope.

I don't think the existing code is poor style there.  There are certainly
hundreds of other cases where we treat != NULL or != NIL as implicit
(though of course also other places where we don't).

 ... as I see it performance could improve in any combination of five
 ways:
 * Improve the performance of determining which clauses can't be
 discarded (e.g. precompute some information about equivalence classes
 for index predicates, mess around with the order in which we check the
 clauses to make it fail faster, switch to real union-find data
 structures for equivalence classes).

This is certainly possible, though rather open-ended, and it's not clear
that it really fixes the objection (ie, if you speed these things up then
you still have a performance discrepancy from adding the tests earlier).

 * Take advantage of work we do here to speed things up elsewhere (e.g.
 if this does get chosen as the best plan we don't need to recompute
 the same information in create_indexscan_plan).

That would likely be worth doing if we do this, but it will only buy
back a small part of the cost, since the whole problem here is we'd
be doing this work for all indexes and not only the eventually selected
one.

 * Delay determining whether to use an index scan or index only scan
 until after cost analysis somehow.  I'm not sure exactly what this
 would entail.

That seems impossible to me, since the whole point of an index-only
scan is that it's a lot cheaper than a regular one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 I don't think it's fair to mark this as returned with feedback without a
 more detailed review (I think of returned with feedback as providing a
 concrete direction to follow). I've set it back to needs review.

 Does anyone else want to look at this patch?

I offered a few comments.  I think it might also be useful to try to
quantify the worst-case performance cost for this, which would arise
for a single-table query on a table with lots of indexes.  Don't
have time for that myself though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-29 Thread Emre Hasegeli
Joshua Yanovski pythones...@gmail.com:
 Proof of concept initial patch for enabling index only scans for
 partial indices even when an attribute is not in the target list, as
 long as it is only used in restriction clauses that can be proved by
 the index predicate.  This also works for index quals, though they
 still can't be used in the target list.  However, this patch may be
 inefficient since it duplicates effort that is currently delayed until
 after the best plan is chosen.

I find the improvement very useful.  I use functional and partial
indexes, a lot.  I think we even need a dummy index to make more use
of these features.

 The patch works by basically repeating the logic from
 create_indexscan_plan in createplan.c that determines which clauses
 can't be discarded, instead of the current approach, which just
 assumes that any attributes referenced anywhere in a restriction
 clause has to be a column in the relevant index.  It should build
 against master and passes make check for me.  It also includes a minor
 fix in the same code in createplan.c to make sure we're explicitly
 comparing an empty list to NIL, but I can take that out if that's not
 considered in scope.  If this were the final patch I'd probably
 coalesce the code used in both places into a single function, but
 since I'm not certain that the implementation in check_index_only
 won't change substantially I held off on that.
 
 Since the original comment suggested that this was not done due to
 planner performance concerns, I assume the performance of this
 approach is unacceptable (though I did a few benchmarks and wasn't
 able to detect a consistent difference--what would be a good test for
 this?).  As such, this is intended as more of a first pass that I can
 build on, but I wanted to get feedback at this stage on where we can
 improve (particularly if there were already ideas on how this might be
 done, as the comment hints).  Index only scans cost less than regular
 index scans so I don't think we can get away with waiting until we've
 chosen the best plan before we do the work described above.  That
 said, as I see it performance could improve in any combination of five
 ways:
 * Improve the performance of determining which clauses can't be
 discarded (e.g. precompute some information about equivalence classes
 for index predicates, mess around with the order in which we check the
 clauses to make it fail faster, switch to real union-find data
 structures for equivalence classes).
 * Find a cleverer way of figuring out whether a partial index can be
 used than just checking which clauses can't be discarded.
 * Use a simpler heuristic (that doesn't match what use to determine
 which clauses can be discarded, but still matches more than we do
 now).
 * Take advantage of work we do here to speed things up elsewhere (e.g.
 if this does get chosen as the best plan we don't need to recompute
 the same information in create_indexscan_plan).
 * Delay determining whether to use an index scan or index only scan
 until after cost analysis somehow.  I'm not sure exactly what this
 would entail.

I do not know much about the internals of the planner.  So, I am not
in a position to decide the performance is acceptable or not.  If it
is not, I think your first solution would minimize the penalty in
almost all scenarios.  Your other options seems harder to implement.

I think, it can also be useful to store predicates implied by the
index clause or the index predicate under the path tree.  We may
make use of them in future improvements.  Especially it would be
nice to avoid calling expensive functions if they are included in
an index.  This approach can also simplify the design.  The predicates
can be stored under IndexPath even index only scan is disabled.
They can be used used unconditionally on create_indexscan_plan().

I will update the patch as returned with feedback, but it would
be nice if someone more experienced give an opinion about these
ideas.  I would be happy to review further developments when they
are ready.  Please let me know if I can help any other way.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-29 Thread Abhijit Menon-Sen
At 2014-06-29 21:55:24 +0300, e...@hasegeli.com wrote:

 I will update the patch as returned with feedback

I don't think it's fair to mark this as returned with feedback without a
more detailed review (I think of returned with feedback as providing a
concrete direction to follow). I've set it back to needs review.

Does anyone else want to look at this patch?

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-03-18 Thread Robert Haas
On Mon, Mar 17, 2014 at 3:14 AM, Joshua Yanovski pythones...@gmail.com wrote:
 Here's a SQL script that (1) demonstrates the new index only scan
 functionality, and (2) at least on my machine, has a consistently
 higher planning time for the version with my change than without it.

I'm glad you're looking at this, but we're in the final throws of
nailing down 9.4 and I don't have anticipate I'll have time to look at
it in the near future.  You should add it here so we don't forget
about it:

https://commitfest.postgresql.org/action/commitfest_view/open

And you might also check this:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-03-18 Thread Joshua Yanovski
 I'm glad you're looking at this, but we're in the final throws of
 nailing down 9.4 and I don't have anticipate I'll have time to look at
 it in the near future.  You should add it here so we don't forget
 about it:

 https://commitfest.postgresql.org/action/commitfest_view/open
Yeah, no worries--you guys are busy enough as it is.  As far as adding
it to the commitfest goes, I did, actually.  Should I add the comment
with the testcase as well?  I'm investigating further and it's looking
to me like what I'm really up against is O(n^2) behavior in the
optimizer for OR clauses, but I'll keep looking--don't want to say
anything too prematurely.

 And you might also check this:
 https://wiki.postgresql.org/wiki/Submitting_a_Patch

Also read that--did I do something wrong?  I tried to make sure I
followed its guidelines.  Anyway, thanks for the response :)

-- 
Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-03-18 Thread Robert Haas
On Tue, Mar 18, 2014 at 4:18 PM, Joshua Yanovski pythones...@gmail.com wrote:
 I'm glad you're looking at this, but we're in the final throws of
 nailing down 9.4 and I don't have anticipate I'll have time to look at
 it in the near future.  You should add it here so we don't forget
 about it:

 https://commitfest.postgresql.org/action/commitfest_view/open
 Yeah, no worries--you guys are busy enough as it is.  As far as adding
 it to the commitfest goes, I did, actually.  Should I add the comment
 with the testcase as well?  I'm investigating further and it's looking
 to me like what I'm really up against is O(n^2) behavior in the
 optimizer for OR clauses, but I'll keep looking--don't want to say
 anything too prematurely.

 And you might also check this:
 https://wiki.postgresql.org/wiki/Submitting_a_Patch

 Also read that--did I do something wrong?  I tried to make sure I
 followed its guidelines.  Anyway, thanks for the response :)

No, I think you wrote a nice email, and I didn't think you did
anything wrong.  It was mostly just a form letter to say, hey, this
looks interesting.  Glad you were already aware of the resources.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-03-17 Thread Joshua Yanovski
Here's a SQL script that (1) demonstrates the new index only scan
functionality, and (2) at least on my machine, has a consistently
higher planning time for the version with my change than without it.

On Sun, Mar 16, 2014 at 5:08 AM, Joshua Yanovski pythones...@gmail.com wrote:
 Proof of concept initial patch for enabling index only scans for
 partial indices even when an attribute is not in the target list, as
 long as it is only used in restriction clauses that can be proved by
 the index predicate.  This also works for index quals, though they
 still can't be used in the target list.  However, this patch may be
 inefficient since it duplicates effort that is currently delayed until
 after the best plan is chosen.

 The patch works by basically repeating the logic from
 create_indexscan_plan in createplan.c that determines which clauses
 can't be discarded, instead of the current approach, which just
 assumes that any attributes referenced anywhere in a restriction
 clause has to be a column in the relevant index.  It should build
 against master and passes make check for me.  It also includes a minor
 fix in the same code in createplan.c to make sure we're explicitly
 comparing an empty list to NIL, but I can take that out if that's not
 considered in scope.  If this were the final patch I'd probably
 coalesce the code used in both places into a single function, but
 since I'm not certain that the implementation in check_index_only
 won't change substantially I held off on that.

 Since the original comment suggested that this was not done due to
 planner performance concerns, I assume the performance of this
 approach is unacceptable (though I did a few benchmarks and wasn't
 able to detect a consistent difference--what would be a good test for
 this?).  As such, this is intended as more of a first pass that I can
 build on, but I wanted to get feedback at this stage on where we can
 improve (particularly if there were already ideas on how this might be
 done, as the comment hints).  Index only scans cost less than regular
 index scans so I don't think we can get away with waiting until we've
 chosen the best plan before we do the work described above.  That
 said, as I see it performance could improve in any combination of five
 ways:
 * Improve the performance of determining which clauses can't be
 discarded (e.g. precompute some information about equivalence classes
 for index predicates, mess around with the order in which we check the
 clauses to make it fail faster, switch to real union-find data
 structures for equivalence classes).
 * Find a cleverer way of figuring out whether a partial index can be
 used than just checking which clauses can't be discarded.
 * Use a simpler heuristic (that doesn't match what use to determine
 which clauses can be discarded, but still matches more than we do
 now).
 * Take advantage of work we do here to speed things up elsewhere (e.g.
 if this does get chosen as the best plan we don't need to recompute
 the same information in create_indexscan_plan).
 * Delay determining whether to use an index scan or index only scan
 until after cost analysis somehow.  I'm not sure exactly what this
 would entail.

 Since this is my first real work with the codebase, I'd really
 appreciate it if people could help me figure out the best approach
 here (and, more importantly, if one is necessary based on benchmarks).
  And while this should go without saying, if this patch doesn't
 actually work then please let me know, since all the above is based on
 the assumption that what's there is enough :)

 Thanks,
 Joshua Yanovski



-- 
Josh


test_indexscan.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [WIP] Better partial index-only scans

2014-03-16 Thread Joshua Yanovski
Proof of concept initial patch for enabling index only scans for
partial indices even when an attribute is not in the target list, as
long as it is only used in restriction clauses that can be proved by
the index predicate.  This also works for index quals, though they
still can't be used in the target list.  However, this patch may be
inefficient since it duplicates effort that is currently delayed until
after the best plan is chosen.

The patch works by basically repeating the logic from
create_indexscan_plan in createplan.c that determines which clauses
can't be discarded, instead of the current approach, which just
assumes that any attributes referenced anywhere in a restriction
clause has to be a column in the relevant index.  It should build
against master and passes make check for me.  It also includes a minor
fix in the same code in createplan.c to make sure we're explicitly
comparing an empty list to NIL, but I can take that out if that's not
considered in scope.  If this were the final patch I'd probably
coalesce the code used in both places into a single function, but
since I'm not certain that the implementation in check_index_only
won't change substantially I held off on that.

Since the original comment suggested that this was not done due to
planner performance concerns, I assume the performance of this
approach is unacceptable (though I did a few benchmarks and wasn't
able to detect a consistent difference--what would be a good test for
this?).  As such, this is intended as more of a first pass that I can
build on, but I wanted to get feedback at this stage on where we can
improve (particularly if there were already ideas on how this might be
done, as the comment hints).  Index only scans cost less than regular
index scans so I don't think we can get away with waiting until we've
chosen the best plan before we do the work described above.  That
said, as I see it performance could improve in any combination of five
ways:
* Improve the performance of determining which clauses can't be
discarded (e.g. precompute some information about equivalence classes
for index predicates, mess around with the order in which we check the
clauses to make it fail faster, switch to real union-find data
structures for equivalence classes).
* Find a cleverer way of figuring out whether a partial index can be
used than just checking which clauses can't be discarded.
* Use a simpler heuristic (that doesn't match what use to determine
which clauses can be discarded, but still matches more than we do
now).
* Take advantage of work we do here to speed things up elsewhere (e.g.
if this does get chosen as the best plan we don't need to recompute
the same information in create_indexscan_plan).
* Delay determining whether to use an index scan or index only scan
until after cost analysis somehow.  I'm not sure exactly what this
would entail.

Since this is my first real work with the codebase, I'd really
appreciate it if people could help me figure out the best approach
here (and, more importantly, if one is necessary based on benchmarks).
 And while this should go without saying, if this patch doesn't
actually work then please let me know, since all the above is based on
the assumption that what's there is enough :)

Thanks,
Joshua Yanovski
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
new file mode 100644
index a912174..ed87261
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***
*** 32,37 
--- 32,38 
  #include optimizer/predtest.h
  #include optimizer/restrictinfo.h
  #include optimizer/var.h
+ #include parser/parsetree.h
  #include utils/builtins.h
  #include utils/bytea.h
  #include utils/lsyscache.h
*** static PathClauseUsage *classify_index_c
*** 135,141 
  static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
  static void find_indexpath_quals(Path *bitmapqual, List **quals, List 
**preds);
  static intfind_list_position(Node *node, List **nodelist);
! static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
  static double get_loop_count(PlannerInfo *root, Relids outer_relids);
  static void match_restriction_clauses_to_index(RelOptInfo *rel,
   IndexOptInfo 
*index,
--- 136,143 
  static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
  static void find_indexpath_quals(Path *bitmapqual, List **quals, List 
**preds);
  static intfind_list_position(Node *node, List **nodelist);
! static bool check_index_only(PlannerInfo *root, RelOptInfo *rel,
! IndexOptInfo *index, List *index_clauses);
  static double get_loop_count(PlannerInfo *root, Relids outer_relids);
  static void match_restriction_clauses_to_index(RelOptInfo *rel,
   IndexOptInfo 
*index,
*** build_index_paths(PlannerInfo