Thank you for your corrections. Here is the patch with suggestions taken into account, except 6th. >6) I'd rather use alignednewsize here. > + ItemIdSetNormal(tupid, offset + size_diff, newsize); This behavior is accroding to ubiquitous PageAddItem.
Best regards, Andrey Borodin, Octonica & Ural Federal University. 2016-08-16 20:23 GMT+05:00 Anastasia Lubennikova <a.lubennik...@postgrespro.ru>: > 09.08.2016 19:45, Andrew Borodin: >> >> Here is new version of the patch, now it includes recommendations from >> Anastasia Lubennikova. >> >> >> I've investigated anamalous index size decrease. Most probable version >> appeared to be true. >> Cube extension, as some others, use Guttman's polynomial time split >> algorithm. It is known to generate "needle-like" MBBs (MBRs) for >> sorted data due to imbalanced splits (splitting 100 tuples as 98 to >> 2). Due to previous throw-to-the-end behavior of GiST this imbalance >> was further amplificated (most of inserts were going to bigger part >> after split). So GiST inserts were extremely slow for sorted data. >> There is no need to do exact sorting to trigger this behavior. >> This behavior can be fused by implementation of small-m restriction in >> split (AFAIR this is described in original R-tree paper from 84), >> which prescribes to do a split in a parts no smaller than m, where m >> is around 20% of a page capacity (in tuples number). After application >> of this patch performance for sorted data is roughly the same as >> performance for randomized data. > > > Thank you for explanation. Now it's clear to me. I did some more testing and > found nothing special. The declared feature is implemented correctly. > It passes all regression tests and improves performance. > > I still have a few minor nitpicks about the patch style. > You can find a style guide on > https://www.postgresql.org/docs/9.6/static/source.html > > 1) remove extra whitespace in README > > 2) add whitespace: if(ntup == 1) > > 3) fix comments in accordance with coding conventions > > /* In case of single tuple update GiST calls overwrite > * In all other cases function gistplacetopage deletes > * old tuples and place updated at the end > * */ > > > + /* Normally here was following assertion > + * Assert(ItemIdHasStorage(ii)); > + * This assertion was introduced in PageIndexTupleDelete > + * But if this function will be used from BRIN index > + * this assertion will fail. Thus, here we do not check that > + * item has the storage. > + * */ > > 4) remove unrelated changes > - data += sizeof(OffsetNumber) * xldata->ntodelete; > + data += sizeof(OffsetNumber) *xldata->ntodelete; > > 5) If the comment is correct, maxalignment is not necessary. > + /* tuples on a page are always maxaligned */ > + oldsize = MAXALIGN(oldsize); > > 6) I'd rather use alignednewsize here. > + ItemIdSetNormal(tupid, offset + size_diff, newsize); > > > After the cleanup you can change status to "Ready for Committer" > without waiting for the response. > >> If data is randomized this effect of Guttman poly-time split is not in >> effect; test from the start of the thread will show no statistically >> confident improvement by the patch. >> To prove performance increase in randomized case I've composed >> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1 >> This test with five runs shows following: >> Before patch >> Insert Time AVG 78 seconds STD 9.5 >> Afer patch >> Insert Time AVG 68 seconds STD 7.7 >> This is marginal but statistically viable performance improvement. >> >> For sorted data performance is improved by a factor of 3. >> Best regards, Andrey Borodin, Octonica & Ural Federal University. >> > > -- > Anastasia Lubennikova > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Description: Binary data
-- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers