Improving docs for strict_word_similarity()

Bruce Momjian Sat, 26 May 2018 09:57:34 -0700

While creating the release notes, I was confused by the description for
strict_word_similarity(), particularly "extent boundaries".  The
attached patch clarifies, at least for me, how word_similarity() and
strict_word_similarity() differ.
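
For readers who want to see where the two scores come from, here is a
minimal sketch of the arithmetic for the 'word' / 'two words' example used
in the docs. It is not pg_trgm's implementation: the helper names
(trigrams, jaccard) are made up for illustration, and the extent for
word_similarity() is copied by hand from the existing documentation rather
than searched for. It assumes pg_trgm's padding of two spaces before and
one space after each word.

```python
def trigrams(word):
    """Trigrams of a single word, padded pg_trgm-style (assumption:
    two spaces in front, one space behind, lowercased)."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Shared trigrams divided by the total number of distinct trigrams."""
    return len(a & b) / len(a | b)


query = trigrams("word")  # {"  w", " wo", "wor", "ord", "rd "}

# word_similarity(): the extent need not end on a word boundary, so the
# trailing trigrams of 'words' are left out. This extent is hand-copied
# from the docs, not computed.
loose_extent = {"  w", " wo", "wor", "ord"}

# strict_word_similarity(): the extent must cover the whole word 'words'.
strict_extent = trigrams("words")  # adds "rds" and "ds "

loose_score = jaccard(query, loose_extent)    # 4 shared / 5 distinct
strict_score = jaccard(query, strict_extent)  # 4 shared / 7 distinct

print(round(loose_score, 6), round(strict_score, 6))
```

The two results correspond to the 0.8 and 0.571429 shown in the
documentation's examples; the strict score is lower because "rds" and
"ds " count against the match.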


-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
new file mode 100644
index be43cdf..afb589b
*** a/doc/src/sgml/pgtrgm.sgml
--- b/doc/src/sgml/pgtrgm.sgml
***************
*** 112,119 ****
        </entry>
        <entry><type>real</type></entry>
        <entry>
!        Same as <function>word_similarity(text, text)</function>, but forces
!        extent boundaries to match word boundaries.
        </entry>
       </row>
       <row>
--- 112,119 ----
        </entry>
        <entry><type>real</type></entry>
        <entry>
!        Same as <function>word_similarity(text, text)</function>, but
!        considers the set of trigrams to be of the same length.
        </entry>
       </row>
       <row>
***************
*** 164,179 ****
     This function returns a value that can be approximately understood as the
     greatest similarity between the first string and any substring of the second
     string.  However, this function does not add padding to the boundaries of
!    the extent.  Thus, a whole word match gets a higher score than a match with
!    a part of the word.
    </para>
  
    <para>
!    At the same time, <function>strict_word_similarity(text, text)</function>
!    has to select an extent that matches word boundaries.  In the example above,
!    <function>strict_word_similarity(text, text)</function> would select the
!    extent <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which
!    corresponds to the whole word <literal>'words'</literal>.
  
  <programlisting>
  # SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
--- 164,182 ----
     This function returns a value that can be approximately understood as the
     greatest similarity between the first string and any substring of the second
     string.  However, this function does not add padding to the boundaries of
!    the extent.  Thus, the number of additional characters present in the
!    second string is not considered, except for the mismatched word boundary.
    </para>
  
    <para>
!    The function <function>strict_word_similarity(text, text)</function>
!    does consider additional characters in the second string.  In the
!    example above, <function>strict_word_similarity(text, text)</function>
!    would use the full set of trigrams of the matched word in the second
!    string when computing similarity, not just the trigrams that match the
!    first string.  For example, it would use the extent
!    <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which
!    corresponds to the whole word <literal>'words'</literal>.
  
  <programlisting>
  # SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
***************
*** 186,194 ****
  
    <para>
     Thus, the <function>strict_word_similarity(text, text)</function> function
!    is useful for finding similar subsets of whole words, while
     <function>word_similarity(text, text)</function> is more suitable for
!    searching similar parts of words.
    </para>
  
    <table id="pgtrgm-op-table">
--- 189,197 ----
  
    <para>
     Thus, the <function>strict_word_similarity(text, text)</function> function
!    is useful for finding the similarity to whole words, while
     <function>word_similarity(text, text)</function> is more suitable for
!    finding the similarity for parts of words.
    </para>
  
    <table id="pgtrgm-op-table">
