Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Apr 12, 2022 at 1:28 PM Tom Lane  wrote:
>> Proposed patch attached.  The existing text already says "GIN indexes are
>> the preferred text search index type", so I'm not sure we need to go
>> further than that about guiding people which one to use.  In particular,
>> since GIN can't support included columns, we can't really deprecate GiST
>> altogether here.

> LGTM.

Done that way, then.

> I don't know enough about the topic to be able to claim that the
> robots.txt solution would also work out well, in about the same way.
> But I suspect that it might, and know that it's a reversible process.

Yeah, it's outside my expertise too.

regards, tom lane




Re: incorrect information in documentation

2022-04-12 Thread David G. Johnston
On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston 
wrote:

> On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian  wrote:
>
>>
>> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/
>> > num_distinct1,
>> > 1/num_distinct2)
>> > = (1 - 0) * (1 - 0) / max(1, 1)
>> > = 0.0001
>>
>> Nice, can you provide a patch please?
>>
>>
> Change the line:
>
>
Concretely, as attached and inline.

David J.


commit 73fa486a855d75d74a1a695bb350bfbfe27c7751
Author: David G. Johnston 
Date:   Tue Apr 12 21:23:53 2022 +

doc: make unique non-null join selectivity example match the prose

The description of the computation for the unique, non-null,
join selectivity describes a division by the maximum of two values,
while the example shows a multiplication by their reciprocal.  While
equivalent the max phrasing is easier to understand; which seems
more important here than precisely adhering to the formula use
in the code (for which either variant is still an approximation).

While both num_distinct and num_rows are equal for a unique column
both the concept and formula use row count (10,000) and the
field num_distinct has already been set to mean the specific value
present in the pg_stats table (i.e, -1), so use num_rows here.

diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 78053d7c49..f72bc4b274 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ tablename  | null_frac | n_distinct | most_common_vals
 

In this case there is no MCV information for
-   unique2 because all the values appear to be
-   unique, so we use an algorithm that relies only on the number of
-   distinct values for both relations together with their null fractions:
+   unique2 and all the values appear to be
+   unique (n_distinct = -1), so we use an algorithm that relies on the row
+   count estimates for both relations (num_rows, not shown, but "tenk")
+   together with the column null fractions (zero for both):

 
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1,
num_rows2)
 = (1 - 0) * (1 - 0) / max(1, 1)
 = 0.0001
 

This is, subtract the null fraction from one for each of the relations,
-   and divide by the maximum of the numbers of distinct values.
+   and divide by the row count of the larger relation (this value does get
+   scaled in the non-unique case).
The number of rows
that the join is likely to emit is calculated as the cardinality of the
Cartesian product of the two inputs, multiplied by the


v0001-doc-make-row-estimation-example-match-prose.patch
Description: Binary data


Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 1:28 PM Tom Lane  wrote:
> Proposed patch attached.  The existing text already says "GIN indexes are
> the preferred text search index type", so I'm not sure we need to go
> further than that about guiding people which one to use.  In particular,
> since GIN can't support included columns, we can't really deprecate GiST
> altogether here.

LGTM.

> > There is always the extreme option of excluding older versions in
> > robots.txt. I bet that would work.
>
> Yeah, I was wondering about that too.  It's sort of the nuclear option,
> but if we don't want to modify EOL'd versions then we may not have any
> other way to keep Google from glomming onto them.

I think that our recent decision to just live with the downsides that
go with making the most recent stable release docs canonical was a
wise one, on balance. The reality is that we have very few ways of
influencing search results from Google.

I don't know enough about the topic to be able to claim that the
robots.txt solution would also work out well, in about the same way.
But I suspect that it might, and know that it's a reversible process.

-- 
Peter Geoghegan




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Apr 12, 2022 at 12:49 PM Tom Lane  wrote:
>> I think we should take the index type names out of the section title
>> entirely, and name it something generic like "Preferred Index Types for
>> Full Text Search".

> Agreed.

Proposed patch attached.  The existing text already says "GIN indexes are
the preferred text search index type", so I'm not sure we need to go
further than that about guiding people which one to use.  In particular,
since GIN can't support included columns, we can't really deprecate GiST
altogether here.

> There is always the extreme option of excluding older versions in
> robots.txt. I bet that would work.

Yeah, I was wondering about that too.  It's sort of the nuclear option,
but if we don't want to modify EOL'd versions then we may not have any
other way to keep Google from glomming onto them.

regards, tom lane

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 20db7b7afe..6afaf9e62c 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3618,7 +3618,7 @@ SELECT plainto_tsquery('supernovae stars');
  
 
  
-  GIN and GiST Index Types
+  Preferred Index Types for Text Search
 
   
text search
@@ -3627,10 +3627,16 @@ SELECT plainto_tsquery('supernovae stars');
 
   
There are two kinds of indexes that can be used to speed up full text
-   searches.
+   searches:
+   GIN and
+   GiST.
Note that indexes are not mandatory for full text searching, but in
cases where a column is searched on a regular basis, an index is
usually desirable.
+  
+
+  
+   To create such an index, do one of:
 

 


Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 12:49 PM Tom Lane  wrote:
> I think we should take the index type names out of the section title
> entirely, and name it something generic like "Preferred Index Types for
> Full Text Search".

Agreed.

> After the recent changes discussed on the -www list, it's possible
> that Google will eventually stop indexing the 9.1 page altogether,
> but I'm not holding my breath.

There is always the extreme option of excluding older versions in
robots.txt. I bet that would work. Do you see any downside with that
solution, Jonathan?

--
Peter Geoghegan




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> The page in question is "12.9. GIN and GiST Index Types", but it's
> really supplementary information for "12.2.2. Creating Indexes". The
> fact that the former has greater prominence than the latter (a general
> discussion of FTS indexing) seems like a problem in itself.

> At one point GiST was competitive with GIN for full text search
> performance (or at least more competitive). These days use of GiST for
> FTS should be rare. So the title should suggest that GiST FTS indexing
> is the nonstandard choice.

I think we should take the index type names out of the section title
entirely, and name it something generic like "Preferred Index Types for
Full Text Search".  Unfortunately, with the EOL'd documentation versions
being pretty much frozen in time, it's not clear that we can prevent
Google from continuing to find that 9.1 page when the search terms
include GIN and GIST.  I suspect it's keying off those terms appearing
in the page title :-(

After the recent changes discussed on the -www list, it's possible
that Google will eventually stop indexing the 9.1 page altogether,
but I'm not holding my breath.

regards, tom lane




Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 12:12 PM PG Doc comments form
 wrote:
> Even more effective would be to update the page title and/or headline to
> make clear that it is about using GIN and GiST indexes in context of full
> text search only.

I agree that the overall structure is unclear, and seems to be more of
an accident than a deliberate choice.

The page in question is "12.9. GIN and GiST Index Types", but it's
really supplementary information for "12.2.2. Creating Indexes". The
fact that the former has greater prominence than the latter (a general
discussion of FTS indexing) seems like a problem in itself.

At one point GiST was competitive with GIN for full text search
performance (or at least more competitive). These days use of GiST for
FTS should be rare. So the title should suggest that GiST FTS indexing
is the nonstandard choice.

-- 
Peter Geoghegan




Re: role attributes are missing from this page

2022-04-12 Thread Tom Lane
PG Doc comments form  writes:
> Issue #1:
> "old" is a keyword, but I cannot query the old column explicitely with
> surrounding quotes.
> SELECT "old" FROM pg_roles;

Look closer.  That column is "oid" not "old".

> Issue #2:
> You do not have ALL role attributes documented on this page.  Please add and
> document the missing role attributes.
> https://www.postgresql.org/docs/current/role-attributes.html

AFAICS this page is merely trying to document the role properties that
can reasonably be considered to be kinds of privileges.  I don't have
a problem with it not mentioning role name or password, for example.
In any case, if you want a full list of what is in the pg_roles view,
the place to look is under the system views documentation:

https://www.postgresql.org/docs/current/view-pg-roles.html

role-attributes.html is not intended to be a substitute for that.

regards, tom lane




"GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

2022-04-12 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/textsearch-indexes.html
Description:

Hey,

when you google for "postgresql gist gin index" you will most probably see
this page (or an older version of it) as #1 and the only result from
postgresql.org:
https://www.postgresql.org/docs/current/textsearch-indexes.html This led me
an others in our team to initially misunderstand that GiST and GIN indexes
are purely a full text search thing in PostgreSQL. But they are of course so
much more, but from this page you would not be able to discover that. (It is
interesting that even searching for `GiST` on postgresql.org lists that page
first, and that for example https://www.postgresql.org/docs/14/sql.html only
lists that page if you Ctrl+F for `gin` or `gist`).

It would probably be a good idea to link to
https://www.postgresql.org/docs/14/gin.html and
https://www.postgresql.org/docs/14/gist.html (or whatever are the best pages
to explain GIN and GiST indexes) in the introduction of this article to lead
people in the right direction. (Bonus points if this can be added to older
versions of the docs as well, as those are ranking on Google and not
everyone clicks through to `current` I guess - including me sometimes.)

Even more effective would be to update the page title and/or headline to
make clear that it is about using GIN and GiST indexes in context of full
text search only.

For the page content itself, it might be beneficial to highlight that the
code example itself is a shorthand that skips the (implied via the type)
definition of an operator class (although it might be possible I do not
understand the full picture here right now - docs are pretty scarce or hard
to find after all).

Let me know if there is a public GH repo where I could send PRs to suggest
these changes of course.

Best
Jan Piotrowski


role attributes are missing from this page

2022-04-12 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/role-attributes.html
Description:

Issue #1:
"old" is a keyword, but I cannot query the old column explicitely with
surrounding quotes.
SELECT "old" FROM pg_roles;

Issue #2:
You do not have ALL role attributes documented on this page.  Please add and
document the missing role attributes.
https://www.postgresql.org/docs/current/role-attributes.html

This is a list of all role attributes:
SELECT
rolname,rolsuper,rolinherit,rolcreaterole,rolcreatedb,rolcanlogin,rolreplication,rolconnlimit,rolpassword,rolvaliduntil,rolbypassrls,rolconfig,"old"

FROM pg_roles;

These role attributes are documented:
SELECT
rolcanlogin,rolsuper,rolcreatedb,rolcreaterole,rolreplication,rolpassword
FROM pg_roles;

These role attributes are NOT documented:
SELECT
rolname,rolsuper,rolinherit,rolconnlimit,,rolvaliduntil,rolbypassrls,rolconfig,"old"
FROM pg_roles;
SELECT "old" FROM pg_roles;


Re: Missing example for SAVEPOINT using the same savepoint name

2022-04-12 Thread Sebastien Flaesch
Hello!
The new sample code looks clearer I agree!
Seb

From: David G. Johnston 
Sent: Tuesday, April 12, 2022 4:41 PM
To: Sebastien Flaesch ; Pg Docs 

Subject: Re: Missing example for SAVEPOINT using the same savepoint name


EXTERNAL: Do not click links or open attachments if you do not recognize the 
sender.

On Mon, Apr 11, 2022 at 9:43 AM PG Doc comments form 
mailto:nore...@postgresql.org>> wrote:
The following documentation comment has been logged on the website:

Page: 
https://www.postgresql.org/docs/14/sql-savepoint.html
Description:

The "Compatibility" section deserves some code example to illustrate the
behavior when reusing the same savepoint name.


I agree with the premise but not the implementation.  I've attached (and 
inlined) a patch that implements the example change, updates savepoint to 
introduce the behavior prior to the example, and updates the rollback to 
savepoint to note the behavior as well.


commit e921441f22bad972393144628c7ee48845d5384c
Author: David G. Johnston 
mailto:david.g.johns...@gmail.com>>
Date:   Tue Apr 12 14:30:11 2022 +

doc: Improve docs regarding savepoint name reuse

Per documentation comment the savepoint command lacks an example
where the savepoint name is reused.  The suggested example didn't
conform to the others on the page, nor did the suggested location
in compatibility seem desirable, but the omission rang true. Add
another example to the examples section demonstrating this case.
Additionally, document under the description for savepoint_name
that we allow for the name to be repeated - and note what that
means in terms of release and rollback. It seems desirable to
place this comment in description rather than notes for savepoint.
For the other two commands the behavior in the presence of
duplicate savepoint names best fits as notes.  In fact release
already had one.  This commit copies the same verbiage over to
rollback.

diff --git a/doc/src/sgml/ref/rollback_to.sgml 
b/doc/src/sgml/ref/rollback_to.sgml
index 3d5a241e1a..7bd1b41feb 100644
--- a/doc/src/sgml/ref/rollback_to.sgml
+++ b/doc/src/sgml/ref/rollback_to.sgml
@@ -89,6 +89,12 @@ ROLLBACK [ WORK | TRANSACTION ] TO [ SAVEPOINT ] 
savepoint_nameROLLBACK TO SAVEPOINT, the cursor can no longer be used.
   
+
+  
+   If multiple savepoints have the same name, only the one that was most
+   recently defined is released.
+  
+
  

  
diff --git a/doc/src/sgml/ref/savepoint.sgml b/doc/src/sgml/ref/savepoint.sgml
index b17342a1ee..fd017935ea 100644
--- a/doc/src/sgml/ref/savepoint.sgml
+++ b/doc/src/sgml/ref/savepoint.sgml
@@ -53,7 +53,9 @@ SAVEPOINT savepoint_name
 savepoint_name
 
  
-  The name to give to the new savepoint.
+  The name to give to the new savepoint.  The name may already exist,
+  in which case a rollback or release to the same name will use the
+  one that was most recently defined.
  
 

@@ -106,6 +108,25 @@ COMMIT;
 
The above transaction will insert both 3 and 4.
   
+
+  
+  To use a single savepoint name:
+
+BEGIN;
+INSERT INTO table1 VALUES (1);
+SAVEPOINT my_savepoint;
+INSERT INTO table1 VALUES (2);
+SAVEPOINT my_savepoint;
+INSERT INTO table1 VALUES (3);
+ROLLBACK TO SAVEPOINT my_savepoint;
+SELECT * FROM table1; // 1, 2
+ROLLBACK TO SAVEPOINT my_savepoint;
+SELECT * FROM table1; // just 1
+COMMIT;
+
+  The above transaction shows row 3 being rolled back first then row 2.
+  
+
  

  



Re: Missing example for SAVEPOINT using the same savepoint name

2022-04-12 Thread David G. Johnston
On Mon, Apr 11, 2022 at 9:43 AM PG Doc comments form 
wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/14/sql-savepoint.html
> Description:
>
> The "Compatibility" section deserves some code example to illustrate the
> behavior when reusing the same savepoint name.
>
>
I agree with the premise but not the implementation.  I've attached (and
inlined) a patch that implements the example change, updates savepoint to
introduce the behavior prior to the example, and updates the rollback to
savepoint to note the behavior as well.


commit e921441f22bad972393144628c7ee48845d5384c
Author: David G. Johnston 
Date:   Tue Apr 12 14:30:11 2022 +

doc: Improve docs regarding savepoint name reuse

Per documentation comment the savepoint command lacks an example
where the savepoint name is reused.  The suggested example didn't
conform to the others on the page, nor did the suggested location
in compatibility seem desirable, but the omission rang true. Add
another example to the examples section demonstrating this case.
Additionally, document under the description for savepoint_name
that we allow for the name to be repeated - and note what that
means in terms of release and rollback. It seems desirable to
place this comment in description rather than notes for savepoint.
For the other two commands the behavior in the presence of
duplicate savepoint names best fits as notes.  In fact release
already had one.  This commit copies the same verbiage over to
rollback.

diff --git a/doc/src/sgml/ref/rollback_to.sgml
b/doc/src/sgml/ref/rollback_to.sgml
index 3d5a241e1a..7bd1b41feb 100644
--- a/doc/src/sgml/ref/rollback_to.sgml
+++ b/doc/src/sgml/ref/rollback_to.sgml
@@ -89,6 +89,12 @@ ROLLBACK [ WORK | TRANSACTION ] TO [ SAVEPOINT ]
savepoint_nameROLLBACK TO SAVEPOINT, the cursor can no longer be
used.
   
+
+  
+   If multiple savepoints have the same name, only the one that was most
+   recently defined is released.
+  
+
  

  
diff --git a/doc/src/sgml/ref/savepoint.sgml
b/doc/src/sgml/ref/savepoint.sgml
index b17342a1ee..fd017935ea 100644
--- a/doc/src/sgml/ref/savepoint.sgml
+++ b/doc/src/sgml/ref/savepoint.sgml
@@ -53,7 +53,9 @@ SAVEPOINT savepoint_name
 savepoint_name
 
  
-  The name to give to the new savepoint.
+  The name to give to the new savepoint.  The name may already exist,
+  in which case a rollback or release to the same name will use the
+  one that was most recently defined.
  
 

@@ -106,6 +108,25 @@ COMMIT;
 
The above transaction will insert both 3 and 4.
   
+
+  
+  To use a single savepoint name:
+
+BEGIN;
+INSERT INTO table1 VALUES (1);
+SAVEPOINT my_savepoint;
+INSERT INTO table1 VALUES (2);
+SAVEPOINT my_savepoint;
+INSERT INTO table1 VALUES (3);
+ROLLBACK TO SAVEPOINT my_savepoint;
+SELECT * FROM table1; // 1, 2
+ROLLBACK TO SAVEPOINT my_savepoint;
+SELECT * FROM table1; // just 1
+COMMIT;
+
+  The above transaction shows row 3 being rolled back first then row 2.
+  
+
  

  


v0001-doc-savepoint-name-reuse.patch
Description: Binary data