Re: An inconsistency in the documentation about full text search

2023-11-29 Thread Euler Taveira
On Tue, Nov 7, 2023, at 7:12 AM, Erki Eessaar wrote:
> Section
> 
> 12.2.2. Creating Indexes
> https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
> 
> wrtites the following: "Another approach is to create a separate tsvector 
> column to hold the output of to_tsvector. To keep this column automatically 
> up to date with its source data, use a stored generated column. This example 
> is a concatenation of title and body, using coalesce to ensure that one field 
> will still be indexed when the other is NULL"
> 
> Section
> 
> 12.4.3. Triggers for Automatic Updates
> https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

This section has a note at the top that this trigger-based method is obsolete
and refers to section 12.2.2 (Creating Indexes).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


An inconsistency in the documentation about full text search

2023-11-07 Thread Erki Eessaar
Hello

I write about the documentation of full-text search.

Section

12.2.2. Creating Indexes
https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

wrtites the following: "Another approach is to create a separate tsvector 
column to hold the output of to_tsvector. To keep this column automatically up 
to date with its source data, use a stored generated column. This example is a 
concatenation of title and body, using coalesce to ensure that one field will 
still be indexed when the other is NULL"

Section

12.4.3. Triggers for Automatic Updates
https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

writes the following: "When using a separate column to store the tsvector 
representation of your documents, it is necessary to create a trigger to update 
the tsvector column when the document content columns change. Two built-in 
trigger functions are available for this, or you can write your own."

Thus, one part of the documentation states that one should use the generated 
column-based approach and another part states that one should use the 
trigger-based approach to keep the values in a tsvector column up-to-date. If 
these solutions are both appropriate, then the documentation should state this.

Perhaps something like that?

"Another approach is to create a separate tsvector column to hold the output of 
to_tsvector. It is possible to use a stored generated column to keep this 
column automatically up to date with its source data. This example is a 
concatenation of title and body, using coalesce to ensure that one field will 
still be indexed when the other is NULL"

writes the following: "When using a separate column to store the tsvector 
representation of your documents the column has to be kept up to date with its 
source data. It is possible to create a trigger to update the tsvector column 
when the document content columns change. Two built-in trigger functions are 
available for this, or you can write your own."

Best regards
Erki Eessaar


Re: Full Text Search

2023-10-30 Thread Julien Champalbert
Thank you David for your response, it makes sense now.

Julien 

> Le 29 oct. 2023 à 22:02, David G. Johnston  a 
> écrit :
> 
> On Sun, Oct 29, 2023, 13:58 PG Doc comments form  > wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/14/textsearch-limitations.html 
> 
> Description:
> 
> Hello,
> 
> In the FTS/Limitations part of the documentation, it says :
> 
> "Another example — the PostgreSQL mailing list archives contained 910,989
> unique words with 57,491,343 lexemes in 461,020 messages."
> 
> How could the number of lexemes be greater than unique words ?
> 
> 
> https://www.postgresql.org/docs/14/textsearch-parsers.html 
> 
> 
> Note the part with the hyphenated word example.
> 
> David J.



Re: Full Text Search

2023-10-29 Thread David G. Johnston
On Sun, Oct 29, 2023, 13:58 PG Doc comments form 
wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/14/textsearch-limitations.html
> Description:
>
> Hello,
>
> In the FTS/Limitations part of the documentation, it says :
>
> "Another example — the PostgreSQL mailing list archives contained 910,989
> unique words with 57,491,343 lexemes in 461,020 messages."
>
> How could the number of lexemes be greater than unique words ?
>


https://www.postgresql.org/docs/14/textsearch-parsers.html

Note the part with the hyphenated word example.

David J.

>


Full Text Search

2023-10-29 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/textsearch-limitations.html
Description:

Hello,

In the FTS/Limitations part of the documentation, it says :

"Another example — the PostgreSQL mailing list archives contained 910,989
unique words with 57,491,343 lexemes in 461,020 messages."

How could the number of lexemes be greater than unique words ?

Thank you
Julien


Pattern matching also includes Full-text search, Trigram and Fuzzysearch

2023-06-09 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/15/functions-matching.html
Description:

Hi, 

The tip states: "If you have pattern matching needs that go beyond this,
consider writing a user-defined function in Perl or Tcl." 

But there's a Full-text search, Trigram and Fuzzysearch extensions.
Shouldn't they be listed in the tip? 

Regards, 
B Banchev


Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Apr 12, 2022 at 1:28 PM Tom Lane  wrote:
>> Proposed patch attached.  The existing text already says "GIN indexes are
>> the preferred text search index type", so I'm not sure we need to go
>> further than that about guiding people which one to use.  In particular,
>> since GIN can't support included columns, we can't really deprecate GiST
>> altogether here.

> LGTM.

Done that way, then.

> I don't know enough about the topic to be able to claim that the
> robots.txt solution would also work out well, in about the same way.
> But I suspect that it might, and know that it's a reversible process.

Yeah, it's outside my expertise too.

regards, tom lane




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 1:28 PM Tom Lane  wrote:
> Proposed patch attached.  The existing text already says "GIN indexes are
> the preferred text search index type", so I'm not sure we need to go
> further than that about guiding people which one to use.  In particular,
> since GIN can't support included columns, we can't really deprecate GiST
> altogether here.

LGTM.

> > There is always the extreme option of excluding older versions in
> > robots.txt. I bet that would work.
>
> Yeah, I was wondering about that too.  It's sort of the nuclear option,
> but if we don't want to modify EOL'd versions then we may not have any
> other way to keep Google from glomming onto them.

I think that our recent decision to just live with the downsides that
go with making the most recent stable release docs canonical was a
wise one, on balance. The reality is that we have very few ways of
influencing search results from Google.

I don't know enough about the topic to be able to claim that the
robots.txt solution would also work out well, in about the same way.
But I suspect that it might, and know that it's a reversible process.

-- 
Peter Geoghegan




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Apr 12, 2022 at 12:49 PM Tom Lane  wrote:
>> I think we should take the index type names out of the section title
>> entirely, and name it something generic like "Preferred Index Types for
>> Full Text Search".

> Agreed.

Proposed patch attached.  The existing text already says "GIN indexes are
the preferred text search index type", so I'm not sure we need to go
further than that about guiding people which one to use.  In particular,
since GIN can't support included columns, we can't really deprecate GiST
altogether here.

> There is always the extreme option of excluding older versions in
> robots.txt. I bet that would work.

Yeah, I was wondering about that too.  It's sort of the nuclear option,
but if we don't want to modify EOL'd versions then we may not have any
other way to keep Google from glomming onto them.

regards, tom lane

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 20db7b7afe..6afaf9e62c 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3618,7 +3618,7 @@ SELECT plainto_tsquery('supernovae stars');
  
 
  
-  GIN and GiST Index Types
+  Preferred Index Types for Text Search
 
   
text search
@@ -3627,10 +3627,16 @@ SELECT plainto_tsquery('supernovae stars');
 
   
There are two kinds of indexes that can be used to speed up full text
-   searches.
+   searches:
+   GIN and
+   GiST.
Note that indexes are not mandatory for full text searching, but in
cases where a column is searched on a regular basis, an index is
usually desirable.
+  
+
+  
+   To create such an index, do one of:
 

 


Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 12:49 PM Tom Lane  wrote:
> I think we should take the index type names out of the section title
> entirely, and name it something generic like "Preferred Index Types for
> Full Text Search".

Agreed.

> After the recent changes discussed on the -www list, it's possible
> that Google will eventually stop indexing the 9.1 page altogether,
> but I'm not holding my breath.

There is always the extreme option of excluding older versions in
robots.txt. I bet that would work. Do you see any downside with that
solution, Jonathan?

--
Peter Geoghegan




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> The page in question is "12.9. GIN and GiST Index Types", but it's
> really supplementary information for "12.2.2. Creating Indexes". The
> fact that the former has greater prominence than the latter (a general
> discussion of FTS indexing) seems like a problem in itself.

> At one point GiST was competitive with GIN for full text search
> performance (or at least more competitive). These days use of GiST for
> FTS should be rare. So the title should suggest that GiST FTS indexing
> is the nonstandard choice.

I think we should take the index type names out of the section title
entirely, and name it something generic like "Preferred Index Types for
Full Text Search".  Unfortunately, with the EOL'd documentation versions
being pretty much frozen in time, it's not clear that we can prevent
Google from continuing to find that 9.1 page when the search terms
include GIN and GIST.  I suspect it's keying off those terms appearing
in the page title :-(

After the recent changes discussed on the -www list, it's possible
that Google will eventually stop indexing the 9.1 page altogether,
but I'm not holding my breath.

regards, tom lane




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 12:12 PM PG Doc comments form
 wrote:
> Even more effective would be to update the page title and/or headline to
> make clear that it is about using GIN and GiST indexes in context of full
> text search only.

I agree that the overall structure is unclear, and seems to be more of
an accident than a deliberate choice.

The page in question is "12.9. GIN and GiST Index Types", but it's
really supplementary information for "12.2.2. Creating Indexes". The
fact that the former has greater prominence than the latter (a general
discussion of FTS indexing) seems like a problem in itself.

At one point GiST was competitive with GIN for full text search
performance (or at least more competitive). These days use of GiST for
FTS should be rare. So the title should suggest that GiST FTS indexing
is the nonstandard choice.

-- 
Peter Geoghegan




"GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/textsearch-indexes.html
Description:

Hey,

when you google for "postgresql gist gin index" you will most probably see
this page (or an older version of it) as #1 and the only result from
postgresql.org:
https://www.postgresql.org/docs/current/textsearch-indexes.html This led me
an others in our team to initially misunderstand that GiST and GIN indexes
are purely a full text search thing in PostgreSQL. But they are of course so
much more, but from this page you would not be able to discover that. (It is
interesting that even searching for `GiST` on postgresql.org lists that page
first, and that for example https://www.postgresql.org/docs/14/sql.html only
lists that page if you Ctrl+F for `gin` or `gist`).

It would probably be a good idea to link to
https://www.postgresql.org/docs/14/gin.html and
https://www.postgresql.org/docs/14/gist.html (or whatever are the best pages
to explain GIN and GiST indexes) in the introduction of this article to lead
people in the right direction. (Bonus points if this can be added to older
versions of the docs as well, as those are ranking on Google and not
everyone clicks through to `current` I guess - including me sometimes.)

Even more effective would be to update the page title and/or headline to
make clear that it is about using GIN and GiST indexes in context of full
text search only.

For the page content itself, it might be beneficial to highlight that the
code example itself is a shorthand that skips the (implied via the type)
definition of an operator class (although it might be possible I do not
understand the full picture here right now - docs are pretty scarce or hard
to find after all).

Let me know if there is a public GH repo where I could send PRs to suggest
these changes of course.

Best
Jan Piotrowski