Re: [HACKERS] GIN documentation

David Fuhry Sat, 16 Sep 2006 21:54:50 -0700

Teodor,

Attached is a diff -c against your original gindocs patch. I did my best not to change any of the semantics. My changes no doubt overlap & conflict with those Jeff Davis sent you earlier, so consider both of our diffs.


Thanks,

Dave Fuhry

Teodor Sigaev wrote:

Patch adds GIN documentation and slightly improves GiST docs.

Somebody of native English speakers, pls, check the text... Thank you.


------------------------------------------------------------------------



*** gindocs.orig	2006-09-17 00:21:38.000000000 -0400
--- gindocs	2006-09-17 00:57:12.000000000 -0400
***************
*** 22,28 ****
  ! 	  </indexterm>
  ! 	  <listitem>
  ! 	   <para>
! ! 		Soft upper limit of the size of the returned set by GIN index. For more
  ! 		information see <xref linkend="gin-tips">.
  ! 	   </para>
  ! 	  </listitem>
--- 22,28 ----
  ! 	  </indexterm>
  ! 	  <listitem>
  ! 	   <para>
! ! 		Soft upper limit of the size of the set returned by the GIN index. For more
  ! 		information see <xref linkend="gin-tips">.
  ! 	   </para>
  ! 	  </listitem>
***************
*** 88,95 ****
  +  <para>
  +    <acronym>GIN</acronym> stands for Generalized Inverted Index.  It is
  +    an index structure storing a set of (key, posting list) pairs, where
! +    'posting list' is a set of rows in which the key occurs. The
! +    row may contains a lot of keys.
  +  </para>
  + 
  +  <para>
--- 88,95 ----
  +  <para>
  +    <acronym>GIN</acronym> stands for Generalized Inverted Index.  It is
  +    an index structure storing a set of (key, posting list) pairs, where
! +    'posting list' is a set of rows in which the key occurs. Each
! +    row may contain many keys.
  +  </para>
  + 
  +  <para>
***************
*** 178,184 ****
  +      <listitem>
  +       <para>
  + 	   Returns an array of keys of the query to be executed. n contains
! + 	   strategy number of operation (see <xref linkend="xindex-strategies">).
  + 	   Depending on n, query may be different type.
  +       </para>
  +      </listitem>
--- 178,184 ----
  +      <listitem>
  +       <para>
  + 	   Returns an array of keys of the query to be executed. n contains
! + 	   the strategy number of the operation (see <xref linkend="xindex-strategies">).
  + 	   Depending on n, query may be different type.
  +       </para>
  +      </listitem>
***************
*** 188,196 ****
  +      <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  +      <listitem>
  +       <para>
! + 	   Returns TRUE if indexed value satisfies query qualifier with strategy n 
  + 	   (or may satisfy in case of RECHECK mark in operator class). 
! + 	   Each element of the check array is TRUE if indexed value has a 
  + 	   corresponding key in the query: if (check[i] == TRUE ) the i-th key of 
  + 	   the query is present in the indexed value.
  +       </para>
--- 188,196 ----
  +      <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  +      <listitem>
  +       <para>
! + 	   Returns TRUE if the indexed value satisfies the query qualifier with strategy n 
  + 	   (or may satisfy in case of RECHECK mark in operator class). 
! + 	   Each element of the check array is TRUE if the indexed value has a 
  + 	   corresponding key in the query: if (check[i] == TRUE ) the i-th key of 
  + 	   the query is present in the indexed value.
  +       </para>
***************
*** 209,218 ****
  +    <term>Create vs insert</term>
  +    <listitem>
  + 	<para>
! + 	 In most cases, insertion into <acronym>GIN</acronym> index is slow enough
! + 	 due to a lot keys should be inserted per one value. So, for bulk upload
! + 	 data in table it will be useful to drop index and create it
! + 	 after finishing upload.
  + 	</para>
  +    </listitem>
  +   </varlistentry>
--- 209,218 ----
  +    <term>Create vs insert</term>
  +    <listitem>
  + 	<para>
! + 	 In most cases, insertion into a <acronym>GIN</acronym> index is slow
! + 	 due to the likelihood of many keys being inserted for each value. So, for bulk insertions into a
! + 	 table it is advisable to drop the GIN index and recreate it
! + 	 after finishing bulk insertion.
  + 	</para>
  +    </listitem>
  +   </varlistentry>
***************
*** 221,227 ****
  +    <term>gin_fuzzy_search_limit</term>
  +    <listitem>
  + 	<para>
! + 	 The primary goal of development <acronym>GIN</acronym> indices was 
  + 	 support for highly scalable, full-text search in 
  + 	 <productname>PostgreSQL</productname> and there are often situations when 
  + 	 a full-text search returns a very large set of results.  Since reading 
--- 221,227 ----
  +    <term>gin_fuzzy_search_limit</term>
  +    <listitem>
  + 	<para>
! + 	 The primary goal of developing <acronym>GIN</acronym> indices was 
  + 	 support for highly scalable, full-text search in 
  + 	 <productname>PostgreSQL</productname> and there are often situations when 
  + 	 a full-text search returns a very large set of results.  Since reading 
***************
*** 232,238 ****
  + 	<para>
  + 	 Such queries usually contain very frequent words, so the results are not 
  + 	 very helpful. To facilitate execution of such queries 
! + 	 <acronym>GIN</acronym> has a configurable  soft upper limit of the size 
  + 	 of the returned set, determined by the 
  + 	 <varname>gin_fuzzy_search_limit</varname> GUC variable.  It is set to 0 by
  + 	 default (no limit).
--- 232,238 ----
  + 	<para>
  + 	 Such queries usually contain very frequent words, so the results are not 
  + 	 very helpful. To facilitate execution of such queries 
! + 	 <acronym>GIN</acronym> has a configurable soft upper limit of the size 
  + 	 of the returned set, determined by the 
  + 	 <varname>gin_fuzzy_search_limit</varname> GUC variable.  It is set to 0 by
  + 	 default (no limit).
***************
*** 256,271 ****
  +  <title>Limitations</title>
  + 
  +  <para>
! +   <acronym>GIN</acronym> doesn't support full scan of index due to it's 
! +   extremely inefficiency: because of a lot of keys per value, 
  +   each heap pointer will returned several times.
  +  </para>
  + 
  +  <para>
! +   When extractQuery returns zero number of keys, <acronym>GIN</acronym> will 
! +   emit a error: for different opclass and strategy semantic meaning of void 
! +   query may be different (for example, any array contains void array, 
! +   but they aren't overlapped with void one), and <acronym>GIN</acronym> can't 
  +   suggest reasonable answer.
  +  </para>
  + 
--- 256,271 ----
  +  <title>Limitations</title>
  + 
  +  <para>
! +   <acronym>GIN</acronym> doesn't support full index scans due to their 
! +   extreme inefficiency: because there are often many keys per value, 
  +   each heap pointer will returned several times.
  +  </para>
  + 
  +  <para>
! +   When extractQuery returns zero keys, <acronym>GIN</acronym> will 
! +   emit an error: for different opclasses and strategies the semantic meaning of a void 
! +   query may be different (for example, any array contains the void array, 
! +   but they don't overlap the void array), and <acronym>GIN</acronym> can't 
  +   suggest reasonable answer.
  +  </para>
  + 
***************
*** 340,346 ****
  +     <see>index</see>
  +    </indexterm>
  +    GIN is a inverted index and it's usable for values which have more
! +    than one key, arrays for example. Like to GiST, GIN may support
  +    many different user-defined indexing strategies and the particular 
  +    operators with which a GIN index can be used vary depending on the 
  +    indexing strategy.  
--- 340,346 ----
  +     <see>index</see>
  +    </indexterm>
  +    GIN is a inverted index and it's usable for values which have more
! +    than one key, arrays for example. Like GiST, GIN may support
  +    many different user-defined indexing strategies and the particular 
  +    operators with which a GIN index can be used vary depending on the 
  +    indexing strategy.  
***************
*** 358,364 ****
  + 
  +    (See <xref linkend="functions-array"> for the meaning of
  +    these operators.)
! +    Another GIN operator classes are available in the <literal>contrib</> 
  +    tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
      </para>
     </sect1>
--- 358,364 ----
  + 
  +    (See <xref linkend="functions-array"> for the meaning of
  +    these operators.)
! +    Other GIN operator classes are available in the <literal>contrib</> 
  +    tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
      </para>
     </sect1>
***************
*** 381,389 ****
  +        <para>
  + 		Short-term share/exclusive page-level locks are used for 
  + 		read/write access. Locks are released immediately after each
! + 		index row is fetched or inserted. But note, that GIN index
! + 		usually requires produce several inserts per one row, so,
! + 		GIN makes more work per one value's insertion.
  +        </para>
  +       </listitem>
  +      </varlistentry>
--- 381,390 ----
  +        <para>
  + 		Short-term share/exclusive page-level locks are used for 
  + 		read/write access. Locks are released immediately after each
! + 		index row is fetched or inserted. But note that a GIN-indexed
! + 		value insertion usually produces several index key insertions
! + 		per row, so GIN may do substantial work for a single value's
! + 		insertion.
  +        </para>
  +       </listitem>
  +      </varlistentry>
***************
*** 436,443 ****
       </table>
    
      <para>
! +    GIN indexes are similar to GiST in flexibility: it hasn't a fixed set
! +    of strategies. Instead, the <quote>consistency</> support routine
  +    interprets the strategy numbers accordingly with operator class
  +    definition. As an example, strategies of operator class over arrays
  +    is shown in <xref linkend="xindex-gin-array-strat-table">.
--- 437,444 ----
       </table>
    
      <para>
! +    GIN indexes are similar to GiST's in flexibility: they don't have a fixed 
! +    set of strategies. Instead, the <quote>consistency</> support routine
  +    interprets the strategy numbers accordingly with operator class
  +    definition. As an example, strategies of operator class over arrays
  +    is shown in <xref linkend="xindex-gin-array-strat-table">.
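
For readers following the patched text rather than the code, here is a minimal Python sketch of the structure it describes: a set of (key, posting list) pairs, a check array of the kind passed to consistent, and the reason a single row insertion costs several key insertions. It is a toy under stated assumptions, not PostgreSQL's GIN implementation; the class name, method names, and the "contains all query keys" strategy are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy (key, posting list) store. Hypothetical; not PostgreSQL's GIN."""

    def __init__(self):
        self.postings = defaultdict(set)  # key -> set of row ids (posting list)
        self.key_insertions = 0           # counts per-key work done by inserts

    def insert(self, row_id, keys):
        # One indexed value produces one posting-list update per distinct key.
        for key in set(keys):
            self.postings[key].add(row_id)
            self.key_insertions += 1

    def check_array(self, row_id, query_keys):
        # check[i] is True if the i-th query key is present in the indexed value.
        return [row_id in self.postings.get(k, ()) for k in query_keys]

    def search_contains(self, query_keys):
        # Assumed strategy: a row matches if it holds every query key.
        if not query_keys:
            # A query with zero keys has no single sensible meaning here.
            raise ValueError("query produced zero keys")
        candidates = set().union(*(self.postings.get(k, set()) for k in query_keys))
        return sorted(r for r in candidates if all(self.check_array(r, query_keys)))


idx = TinyInvertedIndex()
idx.insert(1, ["a", "b", "c"])
idx.insert(2, ["b", "c"])
idx.insert(3, ["c"])

matches = idx.search_contains(["b", "c"])
check_for_row_2 = idx.check_array(2, ["a", "b"])
```

Three rows cost six key insertions in this sketch, which is the effect the "Create vs insert" tip is about.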
