Paul A Jungwirth <[email protected]> writes:
> Our docs for GiST indexes say the compress function is only used for
> internal pages, not leaf pages, but actually it is used everywhere.
> Here are two patches to clean things up.
> You can see that we store compressed values with the pageinspect
> extension. For instance, multiranges are compressed to ranges. Here
> they are in leaf pages:
Actually I think it's more complicated than that. A GiST opclass
can choose whether to compress leaf-key entries, and if it does it
can use a different representation than it does on internal pages.
You can see that in action in compress/decompress functions that
pay attention to the GISTENTRY.leafkey flag, which many do.
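
To make that pattern concrete, here is a minimal standalone sketch. It is
not PostgreSQL code: MockKey, MockEntry and mock_compress are hypothetical
stand-ins invented for illustration, and only the leafkey field name is
borrowed from GISTENTRY. The assumption is an opclass that indexes single
integers but stores ranges in the index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical index key: a closed integer range [lo, hi]. */
typedef struct MockKey
{
    int lo;
    int hi;
} MockKey;

/*
 * Hypothetical stand-in for GISTENTRY. Only the leafkey field name
 * matches the real struct; everything else is simplified.
 */
typedef struct MockEntry
{
    MockKey key;      /* when leafkey is true, only key.lo is meaningful */
    bool    leafkey;  /* true: key holds an original value bound for a leaf */
} MockEntry;

/*
 * Mock compress method. An original value arriving with leafkey set is
 * converted to the index representation (a degenerate range). Anything
 * else is assumed to be in index form already and is passed through.
 */
MockEntry
mock_compress(MockEntry in)
{
    MockEntry out = in;

    if (in.leafkey)
    {
        out.key.lo = in.key.lo;
        out.key.hi = in.key.lo;   /* single value v becomes [v, v] */
    }

    assert(out.key.lo <= out.key.hi);
    return out;
}
```

Calling mock_compress on an entry with leafkey set and key.lo = 7 yields
the range [7, 7]; an entry with leafkey clear comes back unchanged. A real
opclass is free to pick an entirely different leaf representation at that
branch, which is the point being made here.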
So I'm inclined to propose text more like the attached. I merged
your two patches into one (didn't seem all that useful to separate).
Also, I dropped the adjacent sentence suggesting using the STORAGE
option. AFAIK that's pretty useless here: I don't think any GiST
code pays attention to it. At least part of the reason is that it's
inadequate to describe the possibility that leaf and internal datums
are different.
Thoughts?

regards, tom lane

diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 5c0a0c48bab..7a5e664db68 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -273,13 +273,15 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
index will depend on the <function>penalty</function> and <function>picksplit</function>
methods.
Two optional methods are <function>compress</function> and
- <function>decompress</function>, which allow an index to have internal tree data of
- a different type than the data it indexes. The leaves are to be of the
- indexed data type, while the other tree nodes can be of any C struct (but
+ <function>decompress</function>, which allow an index to store keys that
+ are of a different type than the data it indexes. The index entries can be
+ any valid Datums (but
you still have to follow <productname>PostgreSQL</productname> data type rules here,
- see about <literal>varlena</literal> for variable sized data). If the tree's
- internal data type exists at the SQL level, the <literal>STORAGE</literal> option
- of the <command>CREATE OPERATOR CLASS</command> command can be used.
+ see about <literal>varlena</literal> for variable sized data).
+ Furthermore, since <function>compress</function> and
+ <function>decompress</function> are told whether they are working on
+ Datums for leaf-level or internal pages, leaf keys can use a different
+ representation than keys on higher-level pages.
The optional eighth method is <function>distance</function>, which is needed
if the operator class wishes to support ordered scans (nearest-neighbor
searches). The optional ninth method <function>fetch</function> is needed if the
diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README
index 76e0e11f228..75445b07455 100644
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -10,9 +10,13 @@ GiST stands for Generalized Search Tree. It was introduced in the seminal paper
Jeffrey F. Naughton, Avi Pfeffer:
http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
+
+Concurrency support was described in "Concurrency and Recovery in Generalized
+Search Trees", 1997, Marcel Kornacker, C. Mohan, Joseph M. Hellerstein:
+
https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
-and implemented by J. Hellerstein and P. Aoki in an early version of
+GiST was implemented by J. Hellerstein and P. Aoki in an early version of
PostgreSQL (more details are available from The GiST Indexing Project
at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
project it had a limited number of features and was in rare use.
@@ -55,6 +59,9 @@ The original algorithms were modified in several ways:
it is now a single-pass algorithm.
* Since the papers were theoretical, some details were omitted and we
had to find out ourself how to solve some specific problems.
+* The 1997 paper above (but not the 1995 one) states that leaf pages should
+ store the original key. While that can be done in PostgreSQL, it is
+ also possible to use a compressed representation in leaf pages.
Because of the above reasons, we have revised the interaction of GiST
core and PostgreSQL WAL system. Moreover, we encountered (and solved)