2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a.korot...@postgrespro.ru>: > 1. What project ideas we have?
I have one more project of interest which I can mentor. Topic. GiST API advancement ===Idea=== GiST API was designed at the beginning of 90th to reduce boilerplate code around data access methods over balanced tree. Now, after 30 years, there are some ideas on improving this API. ===Project details=== Opclass developer must specify 4 core operations to make a type GiST-indexable: 1. Split: a function to split set of datatype instances into two parts. 2. Penalty calculation: a function to measure penalty for unification of two keys. 3. Collision check: a function which determines whether two keys may have overlap or are not intersecting. 4. Unification: a function to combine two keys into one so that combined key collides with both input keys. Functions 2 and 3 can be improved. For example, Revised R*-tree[1] algorithm of insertion cannot be expressed in terms of penalty-based algorithms. There was some attempts to bring parts of RR*-tree insertion, but they come down to ugly hacks [2]. Current GiST API, due to penalty-based insertion algorithm, does not allow to implement important feature of RR*-tree: overlap optimization. As Norbert Beckman, author of RR*-tree, put it in discussion: “Overlap optimization is one of the main elements, if not the main query performance tuning element of the RR*-tree. You would fall back to old R-Tree times if that would be left off.” Collision check currently returns binary result: 1. Query may be collides with subtree MBR 2. Query do not collides with subtree This result may be augmented with a third state: subtree is totally within query. In this case GiST scan can scan down subtree without key checks. Potential effect of these improvements must be benchmarked. Probably, implementation of these two will spawn more ideas on GiST performance improvements. Finally, GiST do not provide API for bulk loading. Alexander Korotkov during GSoC 2011 implemented buffered GiST build. This index construction is faster, but yields the index tree with virtually same querying performance. There are different algorithms aiming to provide better indexing tree due to some knowledge of data, e.g. [3] [1] Beckmann, Norbert, and Bernhard Seeger. "A revised r*-tree in comparison with related index structures." Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 2009. [2] https://www.postgresql.org/message-id/flat/CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg%40mail.gmail.com#cajeawvfmo-fxaj6lkj8wtb1br0mtby48egmvejboodroegy...@mail.gmail.com [3] Achakeev, Daniar, Bernhard Seeger, and Peter Widmayer. "Sort-based query-adaptive loading of r-trees." Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers