Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

Tomas Vondra Tue, 01 Sep 2015 08:43:20 -0700


On 09/01/2015 11:31 AM, Alexander Korotkov wrote:
...


Yes, In general GIN is a btree with effective duplicates handling +
support of splitting single datums into multiple keys.
This proposal is mostly porting duplicates handling from GIN to btree.

    Sure, there are differences - GIN indexes don't handle UNIQUE indexes,


The difference between btree_gin and btree is not only UNIQUE feature.
1) There is no gingettuple in GIN. GIN supports only bitmap scans. And
it's not feasible to add gingettuple to GIN. At least with same
semantics as it is in btree.
2) GIN doesn't support multicolumn indexes in the way btree does.
Multicolumn GIN is more like set of separate singlecolumn GINs: it
doesn't have composite keys.
3) btree_gin can't effectively handle range searches. "a < x < b" would
be hangle as "a < x" intersect "x < b". That is extremely inefficient.
It is possible to fix. However, there is no clear proposal how to fit
this case into GIN interface, yet.

    but the compression can only be effective when there are duplicate
    rows. So either the index is not UNIQUE (so the b-tree feature is
    not needed), or there are many updates.

 From my observations users can use btree_gin only in some cases. They
like compression, but can't use btree_gin mostly because of #1.

Thanks for the explanation! I'm not that familiar with GIN internals,but this mostly matches my understanding. I have only mentioned UNIQUEbecause the lack of gettuple() method seems obvious - and it works finewhen GIN indexes are used as "bitmap indexes".

But you're right - we can't do index only scans on GIN indexes, which isa huge benefit of btree indexes.


    Which brings me to the other benefit of btree indexes - they are
    designed for high concurrency. How much is this going to be affected
    by introducing the posting lists?


I'd notice that current duplicates handling in PostgreSQL is hack over
original btree. It is designed so in btree access method in PostgreSQL,
not btree in general.
Posting lists shouldn't change concurrency much. Currently, in btree you
have to lock one page exclusively when you're inserting new value.
When posting list is small and fits one page you have to do similar
thing: exclusive lock of one page to insert new value.
When you have posting tree, you have to do exclusive lock on one page of
posting tree.

OK.


One can say that concurrency would became worse because index would
become smaller and number of pages would became smaller too. Since
number of pages would be smaller, backends are more likely concur for
the same page. But this argument can be user against any compression and
for any bloat.

Which might be a problem for some use cases, but I assume we could addan option disabling this per-index. Probably having it "off" by default,and only enabling the compression explicitly.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

Reply via email to