Here is new version of the patch, now it includes recommendations from Anastasia Lubennikova.
I've investigated anamalous index size decrease. Most probable version appeared to be true. Cube extension, as some others, use Guttman's polynomial time split algorithm. It is known to generate "needle-like" MBBs (MBRs) for sorted data due to imbalanced splits (splitting 100 tuples as 98 to 2). Due to previous throw-to-the-end behavior of GiST this imbalance was further amplificated (most of inserts were going to bigger part after split). So GiST inserts were extremely slow for sorted data. There is no need to do exact sorting to trigger this behavior. This behavior can be fused by implementation of small-m restriction in split (AFAIR this is described in original R-tree paper from 84), which prescribes to do a split in a parts no smaller than m, where m is around 20% of a page capacity (in tuples number). After application of this patch performance for sorted data is roughly the same as performance for randomized data. If data is randomized this effect of Guttman poly-time split is not in effect; test from the start of the thread will show no statistically confident improvement by the patch. To prove performance increase in randomized case I've composed modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1 This test with five runs shows following: Before patch Insert Time AVG 78 seconds STD 9.5 Afer patch Insert Time AVG 68 seconds STD 7.7 This is marginal but statistically viable performance improvement. For sorted data performance is improved by a factor of 3. Best regards, Andrey Borodin, Octonica & Ural Federal University.
Description: Binary data
-- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers