Re: [HACKERS] [WIP] Better partial index-only scans
Joshua Yanovski pythones...@gmail.com writes: Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. I took a quick look at this. I think it's logically incorrect to exclude Vars used only in index quals from the set that the index has to return, since you can't know at this stage whether the index is lossy (ie, might report xs_recheck = TRUE at runtime). While this is moot for btree indexes, it's not moot for SPGiST indexes which also support index-only scans today. In principle we could extend the AM and opclass API and demand that AMs tell us whether they might return xs_recheck = TRUE. However, I'm pretty hesitant to change the opclass APIs for this purpose; it'd likely break third-party code. Moreover, an advantage of confining the patch to considering only partial-index quals is that you could skip all the added work for non-partial indexes, which would probably largely solve the added-planning-time problem. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. I don't think the existing code is poor style there. There are certainly hundreds of other cases where we treat != NULL or != NIL as implicit (though of course also other places where we don't). ... as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). This is certainly possible, though rather open-ended, and it's not clear that it really fixes the objection (ie, if you speed these things up then you still have a performance discrepancy from adding the tests earlier). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). That would likely be worth doing if we do this, but it will only buy back a small part of the cost, since the whole problem here is we'd be doing this work for all indexes and not only the eventually selected one. * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. That seems impossible to me, since the whole point of an index-only scan is that it's a lot cheaper than a regular one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
Abhijit Menon-Sen a...@2ndquadrant.com writes: I don't think it's fair to mark this as returned with feedback without a more detailed review (I think of returned with feedback as providing a concrete direction to follow). I've set it back to needs review. Does anyone else want to look at this patch? I offered a few comments. I think it might also be useful to try to quantify the worst-case performance cost for this, which would arise for a single-table query on a table with lots of indexes. Don't have time for that myself though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
Joshua Yanovski pythones...@gmail.com: Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. I find the improvement very useful. I use functional and partial indexes, a lot. I think we even need a dummy index to make more use of these features. The patch works by basically repeating the logic from create_indexscan_plan in createplan.c that determines which clauses can't be discarded, instead of the current approach, which just assumes that any attributes referenced anywhere in a restriction clause has to be a column in the relevant index. It should build against master and passes make check for me. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. If this were the final patch I'd probably coalesce the code used in both places into a single function, but since I'm not certain that the implementation in check_index_only won't change substantially I held off on that. Since the original comment suggested that this was not done due to planner performance concerns, I assume the performance of this approach is unacceptable (though I did a few benchmarks and wasn't able to detect a consistent difference--what would be a good test for this?). As such, this is intended as more of a first pass that I can build on, but I wanted to get feedback at this stage on where we can improve (particularly if there were already ideas on how this might be done, as the comment hints). Index only scans cost less than regular index scans so I don't think we can get away with waiting until we've chosen the best plan before we do the work described above. That said, as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). * Find a cleverer way of figuring out whether a partial index can be used than just checking which clauses can't be discarded. * Use a simpler heuristic (that doesn't match what use to determine which clauses can be discarded, but still matches more than we do now). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. I do not know much about the internals of the planner. So, I am not in a position to decide the performance is acceptable or not. If it is not, I think your first solution would minimize the penalty in almost all scenarios. Your other options seems harder to implement. I think, it can also be useful to store predicates implied by the index clause or the index predicate under the path tree. We may make use of them in future improvements. Especially it would be nice to avoid calling expensive functions if they are included in an index. This approach can also simplify the design. The predicates can be stored under IndexPath even index only scan is disabled. They can be used used unconditionally on create_indexscan_plan(). I will update the patch as returned with feedback, but it would be nice if someone more experienced give an opinion about these ideas. I would be happy to review further developments when they are ready. Please let me know if I can help any other way. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
At 2014-06-29 21:55:24 +0300, e...@hasegeli.com wrote: I will update the patch as returned with feedback I don't think it's fair to mark this as returned with feedback without a more detailed review (I think of returned with feedback as providing a concrete direction to follow). I've set it back to needs review. Does anyone else want to look at this patch? -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
On Mon, Mar 17, 2014 at 3:14 AM, Joshua Yanovski pythones...@gmail.com wrote: Here's a SQL script that (1) demonstrates the new index only scan functionality, and (2) at least on my machine, has a consistently higher planning time for the version with my change than without it. I'm glad you're looking at this, but we're in the final throws of nailing down 9.4 and I don't have anticipate I'll have time to look at it in the near future. You should add it here so we don't forget about it: https://commitfest.postgresql.org/action/commitfest_view/open And you might also check this: https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
I'm glad you're looking at this, but we're in the final throws of nailing down 9.4 and I don't have anticipate I'll have time to look at it in the near future. You should add it here so we don't forget about it: https://commitfest.postgresql.org/action/commitfest_view/open Yeah, no worries--you guys are busy enough as it is. As far as adding it to the commitfest goes, I did, actually. Should I add the comment with the testcase as well? I'm investigating further and it's looking to me like what I'm really up against is O(n^2) behavior in the optimizer for OR clauses, but I'll keep looking--don't want to say anything too prematurely. And you might also check this: https://wiki.postgresql.org/wiki/Submitting_a_Patch Also read that--did I do something wrong? I tried to make sure I followed its guidelines. Anyway, thanks for the response :) -- Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
On Tue, Mar 18, 2014 at 4:18 PM, Joshua Yanovski pythones...@gmail.com wrote: I'm glad you're looking at this, but we're in the final throws of nailing down 9.4 and I don't have anticipate I'll have time to look at it in the near future. You should add it here so we don't forget about it: https://commitfest.postgresql.org/action/commitfest_view/open Yeah, no worries--you guys are busy enough as it is. As far as adding it to the commitfest goes, I did, actually. Should I add the comment with the testcase as well? I'm investigating further and it's looking to me like what I'm really up against is O(n^2) behavior in the optimizer for OR clauses, but I'll keep looking--don't want to say anything too prematurely. And you might also check this: https://wiki.postgresql.org/wiki/Submitting_a_Patch Also read that--did I do something wrong? I tried to make sure I followed its guidelines. Anyway, thanks for the response :) No, I think you wrote a nice email, and I didn't think you did anything wrong. It was mostly just a form letter to say, hey, this looks interesting. Glad you were already aware of the resources. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Better partial index-only scans
Here's a SQL script that (1) demonstrates the new index only scan functionality, and (2) at least on my machine, has a consistently higher planning time for the version with my change than without it. On Sun, Mar 16, 2014 at 5:08 AM, Joshua Yanovski pythones...@gmail.com wrote: Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. The patch works by basically repeating the logic from create_indexscan_plan in createplan.c that determines which clauses can't be discarded, instead of the current approach, which just assumes that any attributes referenced anywhere in a restriction clause has to be a column in the relevant index. It should build against master and passes make check for me. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. If this were the final patch I'd probably coalesce the code used in both places into a single function, but since I'm not certain that the implementation in check_index_only won't change substantially I held off on that. Since the original comment suggested that this was not done due to planner performance concerns, I assume the performance of this approach is unacceptable (though I did a few benchmarks and wasn't able to detect a consistent difference--what would be a good test for this?). As such, this is intended as more of a first pass that I can build on, but I wanted to get feedback at this stage on where we can improve (particularly if there were already ideas on how this might be done, as the comment hints). Index only scans cost less than regular index scans so I don't think we can get away with waiting until we've chosen the best plan before we do the work described above. That said, as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). * Find a cleverer way of figuring out whether a partial index can be used than just checking which clauses can't be discarded. * Use a simpler heuristic (that doesn't match what use to determine which clauses can be discarded, but still matches more than we do now). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. Since this is my first real work with the codebase, I'd really appreciate it if people could help me figure out the best approach here (and, more importantly, if one is necessary based on benchmarks). And while this should go without saying, if this patch doesn't actually work then please let me know, since all the above is based on the assumption that what's there is enough :) Thanks, Joshua Yanovski -- Josh test_indexscan.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [WIP] Better partial index-only scans
Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. The patch works by basically repeating the logic from create_indexscan_plan in createplan.c that determines which clauses can't be discarded, instead of the current approach, which just assumes that any attributes referenced anywhere in a restriction clause has to be a column in the relevant index. It should build against master and passes make check for me. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. If this were the final patch I'd probably coalesce the code used in both places into a single function, but since I'm not certain that the implementation in check_index_only won't change substantially I held off on that. Since the original comment suggested that this was not done due to planner performance concerns, I assume the performance of this approach is unacceptable (though I did a few benchmarks and wasn't able to detect a consistent difference--what would be a good test for this?). As such, this is intended as more of a first pass that I can build on, but I wanted to get feedback at this stage on where we can improve (particularly if there were already ideas on how this might be done, as the comment hints). Index only scans cost less than regular index scans so I don't think we can get away with waiting until we've chosen the best plan before we do the work described above. That said, as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). * Find a cleverer way of figuring out whether a partial index can be used than just checking which clauses can't be discarded. * Use a simpler heuristic (that doesn't match what use to determine which clauses can be discarded, but still matches more than we do now). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. Since this is my first real work with the codebase, I'd really appreciate it if people could help me figure out the best approach here (and, more importantly, if one is necessary based on benchmarks). And while this should go without saying, if this patch doesn't actually work then please let me know, since all the above is based on the assumption that what's there is enough :) Thanks, Joshua Yanovski diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c new file mode 100644 index a912174..ed87261 *** a/src/backend/optimizer/path/indxpath.c --- b/src/backend/optimizer/path/indxpath.c *** *** 32,37 --- 32,38 #include optimizer/predtest.h #include optimizer/restrictinfo.h #include optimizer/var.h + #include parser/parsetree.h #include utils/builtins.h #include utils/bytea.h #include utils/lsyscache.h *** static PathClauseUsage *classify_index_c *** 135,141 static Relids get_bitmap_tree_required_outer(Path *bitmapqual); static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds); static intfind_list_position(Node *node, List **nodelist); ! static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index); static double get_loop_count(PlannerInfo *root, Relids outer_relids); static void match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index, --- 136,143 static Relids get_bitmap_tree_required_outer(Path *bitmapqual); static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds); static intfind_list_position(Node *node, List **nodelist); ! static bool check_index_only(PlannerInfo *root, RelOptInfo *rel, ! IndexOptInfo *index, List *index_clauses); static double get_loop_count(PlannerInfo *root, Relids outer_relids); static void match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index, *** build_index_paths(PlannerInfo