Teodor,
Attached is a diff -c against your original gindocs patch. I did my
best not to change any of the semantics. My changes no doubt overlap &
conflict with those Jeff Davis sent you earlier, so consider both of our
diffs.
Thanks,
Dave Fuhry
Teodor Sigaev wrote:
Patch adds GIN documentation and slightly improves GiST docs.
Somebody of native English speakers, pls, check the text... Thank you.
------------------------------------------------------------------------
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
*** gindocs.orig 2006-09-17 00:21:38.000000000 -0400
--- gindocs 2006-09-17 00:57:12.000000000 -0400
***************
*** 22,28 ****
! </indexterm>
! <listitem>
! <para>
! ! Soft upper limit of the size of the returned set by GIN index. For more
! information see <xref linkend="gin-tips">.
! </para>
! </listitem>
--- 22,28 ----
! </indexterm>
! <listitem>
! <para>
! ! Soft upper limit of the size of the set returned by the GIN index. For more
! information see <xref linkend="gin-tips">.
! </para>
! </listitem>
***************
*** 88,95 ****
+ <para>
+ <acronym>GIN</acronym> stands for Generalized Inverted Index. It is
+ an index structure storing a set of (key, posting list) pairs, where
! + 'posting list' is a set of rows in which the key occurs. The
! + row may contains a lot of keys.
+ </para>
+
+ <para>
--- 88,95 ----
+ <para>
+ <acronym>GIN</acronym> stands for Generalized Inverted Index. It is
+ an index structure storing a set of (key, posting list) pairs, where
! + 'posting list' is a set of rows in which the key occurs. Each
! + row may contain many keys.
+ </para>
+
+ <para>
***************
*** 178,184 ****
+ <listitem>
+ <para>
+ Returns an array of keys of the query to be executed. n contains
! + strategy number of operation (see <xref linkend="xindex-strategies">).
+ Depending on n, query may be different type.
+ </para>
+ </listitem>
--- 178,184 ----
+ <listitem>
+ <para>
+ Returns an array of keys of the query to be executed. n contains
! + the strategy number of the operation (see <xref linkend="xindex-strategies">).
+ Depending on n, query may be different type.
+ </para>
+ </listitem>
***************
*** 188,196 ****
+ <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
+ <listitem>
+ <para>
! + Returns TRUE if indexed value satisfies query qualifier with strategy n
+ (or may satisfy in case of RECHECK mark in operator class).
! + Each element of the check array is TRUE if indexed value has a
+ corresponding key in the query: if (check[i] == TRUE ) the i-th key of
+ the query is present in the indexed value.
+ </para>
--- 188,196 ----
+ <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
+ <listitem>
+ <para>
! + Returns TRUE if the indexed value satisfies the query qualifier with strategy n
+ (or may satisfy in case of RECHECK mark in operator class).
! + Each element of the check array is TRUE if the indexed value has a
+ corresponding key in the query: if (check[i] == TRUE ) the i-th key of
+ the query is present in the indexed value.
+ </para>
***************
*** 209,218 ****
+ <term>Create vs insert</term>
+ <listitem>
+ <para>
! + In most cases, insertion into <acronym>GIN</acronym> index is slow enough
! + due to a lot keys should be inserted per one value. So, for bulk upload
! + data in table it will be useful to drop index and create it
! + after finishing upload.
+ </para>
+ </listitem>
+ </varlistentry>
--- 209,218 ----
+ <term>Create vs insert</term>
+ <listitem>
+ <para>
! + In most cases, insertion into a <acronym>GIN</acronym> index is slow
! + because many keys are likely to be inserted for each value. So, for bulk
! + insertions into a table it is advisable to drop the GIN index and
! + recreate it after the bulk insertion finishes.
+ </para>
+ </listitem>
+ </varlistentry>
***************
*** 221,227 ****
+ <term>gin_fuzzy_search_limit</term>
+ <listitem>
+ <para>
! + The primary goal of development <acronym>GIN</acronym> indices was
+ support for highly scalable, full-text search in
+ <productname>PostgreSQL</productname> and there are often situations when
+ a full-text search returns a very large set of results. Since reading
--- 221,227 ----
+ <term>gin_fuzzy_search_limit</term>
+ <listitem>
+ <para>
! + The primary goal of developing <acronym>GIN</acronym> indices was
+ support for highly scalable, full-text search in
+ <productname>PostgreSQL</productname> and there are often situations when
+ a full-text search returns a very large set of results. Since reading
***************
*** 232,238 ****
+ <para>
+ Such queries usually contain very frequent words, so the results are not
+ very helpful. To facilitate execution of such queries
! + <acronym>GIN</acronym> has a configurable soft upper limit of the size
+ of the returned set, determined by the
+ <varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
+ default (no limit).
--- 232,238 ----
+ <para>
+ Such queries usually contain very frequent words, so the results are not
+ very helpful. To facilitate execution of such queries
! + <acronym>GIN</acronym> has a configurable soft upper limit of the size
+ of the returned set, determined by the
+ <varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
+ default (no limit).
***************
*** 256,271 ****
+ <title>Limitations</title>
+
+ <para>
! + <acronym>GIN</acronym> doesn't support full scan of index due to it's
! + extremely inefficiency: because of a lot of keys per value,
+ each heap pointer will returned several times.
+ </para>
+
+ <para>
! + When extractQuery returns zero number of keys, <acronym>GIN</acronym> will
! + emit a error: for different opclass and strategy semantic meaning of void
! + query may be different (for example, any array contains void array,
! + but they aren't overlapped with void one), and <acronym>GIN</acronym> can't
+ suggest reasonable answer.
+ </para>
+
--- 256,271 ----
+ <title>Limitations</title>
+
+ <para>
! + <acronym>GIN</acronym> doesn't support full index scans due to their
! + extreme inefficiency: because there are often many keys per value,
+ each heap pointer will returned several times.
+ </para>
+
+ <para>
! + When extractQuery returns zero keys, <acronym>GIN</acronym> will
! + emit an error: for different opclasses and strategies the semantic meaning of a void
! + query may be different (for example, any array contains the void array,
! + but no array overlaps the void array), and <acronym>GIN</acronym> can't
+ suggest reasonable answer.
+ </para>
+
***************
*** 340,346 ****
+ <see>index</see>
+ </indexterm>
+ GIN is a inverted index and it's usable for values which have more
! + than one key, arrays for example. Like to GiST, GIN may support
+ many different user-defined indexing strategies and the particular
+ operators with which a GIN index can be used vary depending on the
+ indexing strategy.
--- 340,346 ----
+ <see>index</see>
+ </indexterm>
+ GIN is a inverted index and it's usable for values which have more
! + than one key, arrays for example. Like GiST, GIN may support
+ many different user-defined indexing strategies and the particular
+ operators with which a GIN index can be used vary depending on the
+ indexing strategy.
***************
*** 358,364 ****
+
+ (See <xref linkend="functions-array"> for the meaning of
+ these operators.)
! + Another GIN operator classes are available in the <literal>contrib</>
+ tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
</para>
</sect1>
--- 358,364 ----
+
+ (See <xref linkend="functions-array"> for the meaning of
+ these operators.)
! + Other GIN operator classes are available in the <literal>contrib</>
+ tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
</para>
</sect1>
***************
*** 381,389 ****
+ <para>
+ Short-term share/exclusive page-level locks are used for
+ read/write access. Locks are released immediately after each
! + index row is fetched or inserted. But note, that GIN index
! + usually requires produce several inserts per one row, so,
! + GIN makes more work per one value's insertion.
+ </para>
+ </listitem>
+ </varlistentry>
--- 381,390 ----
+ <para>
+ Short-term share/exclusive page-level locks are used for
+ read/write access. Locks are released immediately after each
! + index row is fetched or inserted. But note that a GIN-indexed
! + value insertion usually produces several index key insertions
! + per row, so GIN may do substantial work for a single value's
! + insertion.
+ </para>
+ </listitem>
+ </varlistentry>
***************
*** 436,443 ****
</table>
<para>
! + GIN indexes are similar to GiST in flexibility: it hasn't a fixed set
! + of strategies. Instead, the <quote>consistency</> support routine
+ interprets the strategy numbers accordingly with operator class
+ definition. As an example, strategies of operator class over arrays
+ is shown in <xref linkend="xindex-gin-array-strat-table">.
--- 437,444 ----
</table>
<para>
! + GIN indexes are similar to GiST indexes in flexibility: they don't have a fixed
! + set of strategies. Instead, the <quote>consistency</> support routine
+ interprets the strategy numbers accordingly with operator class
+ definition. As an example, strategies of operator class over arrays
+ is shown in <xref linkend="xindex-gin-array-strat-table">.
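
For anyone reviewing the wording above, here is a rough Python sketch of what
the patched text describes: an index of (key, posting list) pairs, the check
array handed to consistent(), and why a void query is ambiguous. It is only a
toy model, not how GIN is implemented. The function names and the two
strategies ("contains" and "overlap") are my own illustrative assumptions,
loosely following the array example in the Limitations paragraph.

```python
from collections import defaultdict


def build_index(rows):
    """Map each key to its posting list: the sorted row ids containing it."""
    index = defaultdict(set)
    for row_id, keys in rows.items():
        for key in keys:  # one index insertion per key, not one per row
            index[key].add(row_id)
    return {key: sorted(ids) for key, ids in index.items()}


def check_array(index, row_id, query_keys):
    """check[i] is True if the i-th query key is present in the indexed value."""
    return [row_id in index.get(key, []) for key in query_keys]


def consistent_contains(check):
    """Hypothetical 'contains' strategy: every query key must be present."""
    return all(check)


def consistent_overlap(check):
    """Hypothetical 'overlap' strategy: at least one query key must be present."""
    return any(check)


# Toy table: row id -> array value (each row contributes several keys).
rows = {1: [10, 20, 30], 2: [20, 40], 3: [50]}
index = build_index(rows)

query = [20, 30]
contains = [r for r in rows if consistent_contains(check_array(index, r, query))]
overlap = [r for r in rows if consistent_overlap(check_array(index, r, query))]

print(contains)  # [1]
print(overlap)   # [1, 2]
```

Note that with an empty check array the two toy strategies disagree:
consistent_contains([]) is True while consistent_overlap([]) is False. That is
the same ambiguity the Limitations text gives as the reason an error is raised
when extractQuery returns zero keys.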