We've learned the hard way that keys are never removed in GIN indexes,
leading to infinite index size growth in certain scenarios. The PostgreSQL
documentation doesn't mention it even in the "Limitations" section for GIN
indexes (https://www.postgresql.org/docs/15/gin-limit.html).

The only place I found info about the unusual behavior is README file in
the source code:
https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README#L391
.


We keep a tree-like structure in the database and use GIN indexes to find
descendants of a node quickly. The simplified table looks as follows:

node_id | ancestor_ids
--------|-------------
 1      | []
 2      | [1]
 15     | [1, 2]

node_id is a sequence, and nodes are often added and removed from the
table: today node_ids are in the range from 1 to 100K but next month it
will be 500K-600K. Because GIN index never removes entry keys, it will
contain keys from 1 to 600K. In our case the GIN index became 2.5 times
larger than the table - that's the hard-way part.

How about sparing others the hard-way part and explicitly mentioning the
unusual behavior in the "Limitations" section for GIN indexes?

Best regards,
Marcin BarczyƄski

Reply via email to