Re: Add A Glossary

2020-06-19 Thread Alvaro Herrera
Thanks for these fixes!  I included all of these.

On 2020-Jun-19, Erik Rijkers wrote:

> And one thing that I am not sure of (but strikes me as a bit odd):
> there are several cases of
> 'are enforced unique'. Should that not be
> 'are enforced to be unique'  ?

I included this change too; I am not too sure of it myself.  If some
English language neatnik wants to argue one way or the other, be my
guest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-06-19 Thread Erik Rijkers

On 2020-06-19 01:51, Alvaro Herrera wrote:

On 2020-Jun-16, Justin Pryzby wrote:

On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:


I noticed one typo:

'aggregates functions'  should be
'aggregate functions'


And one thing that I am not sure of (but strikes me as a bit odd):
there are several cases of
'are enforced unique'. Should that not be
'are enforced to be unique'  ?


Another small mistake (2x):

'The name of such objects of the same type are'  should be
'The names of such objects of the same type are'

(this phrase occurs 2x wrong, 1x correct)


thanks,

Erik Rijkers


Re: Add A Glossary

2020-06-18 Thread Alvaro Herrera
On 2020-Jun-16, Justin Pryzby wrote:
> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

Thanks for the review.  I merged all your suggestions.  This one:

> >    Most local objects belong to a specific
> > +  schema in their
> > +  containing database, such as
> > +  all types of relations,
> > +  all types of functions,
> 
> Maybe say: >Relations< (all types), and >Functions< (all types)

led me down not one but two rabbit holes; first I realized that
"functions" is an insufficient term since procedures should also be
included but weren't, so I had to add the more generic term "routine"
and then modify the definitions of all routine types to mix in well.  I
think overall the quality of these definitions is improved as a result.

I also felt the need to revise the definition of "relations", so I did
that too; this made me change the definition of resultset too.

On 2020-Jun-17, Jürgen Purtz wrote:

> +1, with two formal changes:
> 
> -  Rearrangement of term "Data page" to meet alphabetical order.

To forestall these ordering issues (look, another rabbit hole), I
grepped the file for all glossterms and sorted that under en_US rules,
then reordered the terms to match that.  Turns out there were several
other ordering mistakes.

git grep ''  | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig
LC_COLLATE=en_US.UTF-8 sort orig > sorted

(Eliminating the tags is important, otherwise the sort uses the tags
themselves to disambiguate)
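
The ordering check described above can also be sketched in Python. This is only an illustration under stated assumptions: the sample lines are hypothetical and assume the terms are wrapped in `<glossterm>` elements, and `collation_key` is a made-up helper that approximates en_US ordering by ignoring case, spaces and punctuation. It is not the locale-aware `sort` used above, so results may differ for other inputs.

```python
import re

# Same pattern as the sed expression: text between one pair of tags.
TAGGED = re.compile(r"<[^>]*>([^<]*)<[^>]*>")


def strip_tags(line):
    """Keep only the text between a pair of tags, so tags cannot affect sorting."""
    return TAGGED.sub(r"\1", line).strip()


def collation_key(term):
    """Rough stand-in for en_US ordering (an assumption): ignore case, spaces, punctuation."""
    return re.sub(r"[^0-9a-z]", "", term.lower())


def misplaced_terms(lines):
    """Return (found, expected) pairs where file order differs from sorted order."""
    orig = [strip_tags(line) for line in lines]
    ordered = sorted(orig, key=collation_key)
    return [(found, expected) for found, expected in zip(orig, ordered) if found != expected]


# Hypothetical sample, using term names that appear in this thread.
SAMPLE = [
    "<glossterm>Data directory</glossterm>",
    "<glossterm>Database</glossterm>",
    "<glossterm>Database cluster</glossterm>",
    "<glossterm>Data page</glossterm>",
    "<glossterm>Datum</glossterm>",
]

for found, expected in misplaced_terms(SAMPLE):
    print(f"found {found!r}, expected {expected!r}")
```

Under this approximation "Database" and "Database cluster" sort ahead of "Data directory", because the space is ignored when comparing.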

> One last question: The definition of "Data directory" reads "... A cluster's
> storage space comprises the data directory plus ..." and 'cluster' links to
> '"glossary-instance". Shouldn't it link to "glossary-db-cluster"?

Yes, an oversight, thanks.

I also added TPS, because I had already written it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 25b03f3b37..5274feabba 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -23,7 +23,7 @@
   
 
   
-   Aggregate function
+   Aggregate function (routine)

 
  A function that
@@ -39,6 +39,11 @@

   
 
+  
+   Analytic function
+   
+  
+
   
Analyze (operation)

@@ -57,11 +62,6 @@

   
 
-  
-   Analytic function
-   
-  
-
   
Atomic

@@ -389,40 +389,33 @@

   
 
-  
-   Data directory
+  
+   Database

 
- The base directory on the filesystem of a
- server that contains all
- data files and subdirectories associated with an
- instance (with the
- exception of tablespaces).
- The environment variable PGDATA is commonly used to
- refer to the
- data directory.
-
-
- An instance's storage
- space comprises the data directory plus any additional tablespaces.
+ A named collection of
+ local SQL objects.
 
 
  For more information, see
- .
+ .
 

   
 
-  
-   Database
+  
+   Database cluster

 
- A named collection of
- SQL objects.
+ A collection of databases and global SQL objects,
+ and their common static and dynamic metadata.
+ Sometimes referred to as a
+ cluster.
 
 
- For more information, see
- .
+ In PostgreSQL, the term
+ cluster is also sometimes used to refer to an instance.
+ (Don't confuse this term with the SQL command CLUSTER.)
 

   
@@ -432,6 +425,31 @@

   
 
+  
+   Data directory
+   
+
+ The base directory on the filesystem of a
+ server that contains all
+ data files and subdirectories associated with a
+ database cluster
+ (with the exception of
+ tablespaces,
+ and optionally WAL).
+ The environment variable PGDATA is commonly used to
+ refer to the data directory.
+
+
+ A cluster's storage
+ space comprises the data directory plus any additional tablespaces.
+
+
+ For more information, see
+ .
+
+   
+  
+
   
Data page

@@ -578,7 +596,7 @@
   
 
   
-   Foreign table
+   Foreign table (relation)

 
  A relation which appears to have
@@ -631,12 +649,20 @@
   
 
   
-   Function
+   Function (routine)

 
- Any defined transformation of data. Many functions are already defined
- within PostgreSQL itself, but user-defined
- ones can also be added.
+ A type of routine that receives zero or more arguments, returns zero or more
+ output values, and is constrained to run within one transaction.
+ Functions are invoked as part of a query, for example via
+ SELECT.
+ Certain functions can return
+ sets; those are
+ called set-returning functions.
+
+
+ Functions can also be used for
+ triggers to invoke.
 
 
  For more information, see
@@ -689,13 +715,12 @@
   
 
   
-   Index
+   Index (relation)

 
  A relation 

Re: Add A Glossary

2020-06-17 Thread Jürgen Purtz


On 17.06.20 02:09, Alvaro Herrera wrote:
> On 2020-Jun-09, Jürgen Purtz wrote:
>
> > Can you agree to the following definitions? If no, we can alternatively
> > formulate for each of them: "Under discussion - currently not defined". My
> > proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> > databases, and a collection of databases managed by a single PostgreSQL
> > server instance constitutes a database cluster."
>
> After sleeping on it a few more times, I don't oppose the idea of making
> "instance" be the running state and "database cluster" the on-disk stuff
> that supports the instance.  Here's a patch that does things pretty much
> along the lines you suggested.
>
> I made small adjustments to "SQL objects":
>
> * SQL objects in schemas were said to have their names unique in the
> schema, but we failed to say anything about names of objects not in
> schemas and global objects.  Added that.
>
> * Had example object types for global objects and objects not in
> schemas, but no examples for objects in schemas.  Added that.
>
>
> Some programs whose output we could tweak per this:
> pg_ctl
> > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
> >  -D, --pgdata=DATADIR   location of the database storage area
> to:
> > pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
> >  -D, --pgdata=DATADIR   location of the database directory
>
> pg_basebackup:
> > pg_basebackup takes a base backup of a running PostgreSQL server.
> to:
> > pg_basebackup takes a base backup of a PostgreSQL instance.


+1, with two formal changes:

-  Rearrangement of term "Data page" to meet alphabetical order.

-  Add  in one case to meet xml-well-formedness.


One last question: The definition of "Data directory" reads "... A 
cluster's storage space comprises the data directory plus ..." and 
'cluster' links to '"glossary-instance". Shouldn't it link to 
"glossary-db-cluster"?


--

Jürgen Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index e29b55e5ac..0499f9044f 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -413,6 +413,22 @@

   
 
+  
+   Data page
+   
+
+ The basic structure used to store relation data.
+ All pages are of the same size.
+ Data pages are typically stored on disk, each in a specific file,
+ and can be read to shared buffers
+ where they can be modified, becoming
+ dirty.  They become clean when written
+ to disk.  New pages, which initially exist in memory only, are also
+ dirty until written.
+
+   
+  
+
   
Database

@@ -441,6 +457,7 @@
  cluster is also sometimes used to refer to an instance.
  (Don't confuse this term with the SQL command CLUSTER.)
 
+   
   
 
   
@@ -448,22 +465,6 @@

   
 
-  
-   Data page
-   
-
- The basic structure used to store relation data.
- All pages are of the same size.
- Data pages are typically stored on disk, each in a specific file,
- and can be read to shared buffers
- where they can be modified, becoming
- dirty.  They become clean when written
- to disk.  New pages, which initially exist in memory only, are also
- dirty until written.
-
-   
-  
-
   
Datum



Re: Add A Glossary

2020-06-16 Thread Justin Pryzby
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:
> diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
> index 25b03f3b37..e29b55e5ac 100644
> --- a/doc/src/sgml/glossary.sgml
> +++ b/doc/src/sgml/glossary.sgml
> @@ -395,15 +395,15 @@
>  
>   The base directory on the filesystem of a
>   server that contains all
> - data files and subdirectories associated with an
> - instance (with the
> - exception of tablespaces).
> + data files and subdirectories associated with a
> + database cluster
> + (with the exception of
> + tablespaces).

and (optionally) WAL

> +  
> +   Database cluster
> +   
> +
> + A collection of databases and global SQL objects,
> + and their common static and dynamic meta-data.

metadata

> @@ -1245,12 +1255,17 @@
>   SQL objects,
>   which all reside in the same
>   database.
> - Each SQL object must reside in exactly one schema.
> + Each SQL object must reside in exactly one schema
> + (though certain types of SQL objects exist outside schemas).

(except for global objects which ..)

>  
>   The names of SQL objects of the same type in the same schema are enforced
>   to be unique.
>   There is no restriction on reusing a name in multiple schemas.
> + For local objects that exist outside schemas, their names are enforced
> + unique across the whole database.  For global objects, their names

I would say "unique within the database"

> + are enforced unique across the whole
> + database cluster.

and "unique within the whole db cluster"

>    Most local objects belong to a specific
> -  schema in their containing database.
> +  schema in their
> +  containing database, such as
> +  all types of relations,
> +  all types of functions,

Maybe say: >Relations< (all types), and >Functions< (all types)

>   used as the default one for all SQL objects, called pg_default.
>   
>  
"the default" (remove "one")

-- 
Justin




Re: Add A Glossary

2020-06-16 Thread Alvaro Herrera
On 2020-Jun-09, Jürgen Purtz wrote:

> Can you agree to the following definitions? If no, we can alternatively
> formulate for each of them: "Under discussion - currently not defined". My
> proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> databases, and a collection of databases managed by a single PostgreSQL
> server instance constitutes a database cluster."

After sleeping on it a few more times, I don't oppose the idea of making
"instance" be the running state and "database cluster" the on-disk stuff
that supports the instance.  Here's a patch that does things pretty much
along the lines you suggested.

I made small adjustments to "SQL objects":

* SQL objects in schemas were said to have their names unique in the
schema, but we failed to say anything about names of objects not in
schemas and global objects.  Added that.

* Had example object types for global objects and objects not in
schemas, but no examples for objects in schemas.  Added that.
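
One way to picture the naming rules summarised in these two points is as a lookup key whose scope depends on the kind of object. The sketch below is a toy model, not how PostgreSQL implements its catalogs; the class `NameRegistry`, its arguments, and the sample object names are all invented for illustration.

```python
class NameRegistry:
    """Toy model of the three naming scopes; not PostgreSQL's catalog logic."""

    def __init__(self):
        self._taken = set()

    def register(self, obj_type, name, database=None, schema=None):
        # database=None, schema=None -> global object, unique in the whole cluster
        # database set, schema=None  -> local object outside schemas, unique in its database
        # database and schema set    -> unique per object type within that schema
        key = (database, schema, obj_type, name)
        if key in self._taken:
            raise ValueError(f"{obj_type} {name!r} already exists in this scope")
        self._taken.add(key)
        return key


registry = NameRegistry()
registry.register("table", "orders", database="shop", schema="public")
# Reusing the name in another schema is allowed.
registry.register("table", "orders", database="shop", schema="archive")
# A role is a global object, so it carries no database or schema.
registry.register("role", "alice")
```

Registering the same type and name twice in the same scope raises `ValueError`, which corresponds to the "enforced to be unique" wording.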


Some programs whose output we could tweak per this:
pg_ctl
> pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
>  -D, --pgdata=DATADIR   location of the database storage area
to:
> pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
>  -D, --pgdata=DATADIR   location of the database directory

pg_basebackup:
> pg_basebackup takes a base backup of a running PostgreSQL server.
to:
> pg_basebackup takes a base backup of a PostgreSQL instance.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 25b03f3b37..e29b55e5ac 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -395,15 +395,15 @@
 
  The base directory on the filesystem of a
  server that contains all
- data files and subdirectories associated with an
- instance (with the
- exception of tablespaces).
+ data files and subdirectories associated with a
+ database cluster
+ (with the exception of
+ tablespaces).
  The environment variable PGDATA is commonly used to
- refer to the
- data directory.
+ refer to the data directory.
 
 
- An instance's storage
+ A cluster's storage
  space comprises the data directory plus any additional tablespaces.
 
 
@@ -418,7 +418,7 @@

 
  A named collection of
- SQL objects.
+ local SQL objects.
 
 
  For more information, see
@@ -427,6 +427,22 @@

   
 
+  
+   Database cluster
+   
+
+ A collection of databases and global SQL objects,
+ and their common static and dynamic meta-data.
+ Sometimes referred to as a
+ cluster.
+
+
+ In PostgreSQL, the term
+ cluster is also sometimes used to refer to an instance.
+ (Don't confuse this term with the SQL command CLUSTER.)
+
+  
+
   
Database server

@@ -634,7 +650,7 @@
Function

 
- Any defined transformation of data. Many functions are already defined
+ A defined transformation of data.  Many functions are already defined
  within PostgreSQL itself, but user-defined
  ones can also be added.
 
@@ -724,14 +740,12 @@
Instance

 
- A set of databases and accompanying global SQL objects that are stored in
- the same data directory
- in a single server.
- If running, one
+ A group of backend and auxiliary processes that communicate using
+ a common shared memory area.  One 
  postmaster process
- manages a group of backend and auxiliary processes that communicate
- using a common shared memory
- area.  Many instances can run on the same
+ manages the instance; one instance manages exactly one
+ database cluster
+ with all its databases.  Many instances can run on the same
  server
  as long as their TCP ports do not conflict.
 
@@ -739,14 +753,10 @@
  The instance handles all key features of a DBMS:
  read and write access to files and shared memory,
  assurance of the ACID properties,
- connections to client processes,
+ connections to
+ client processes,
  privilege verification, crash recovery, replication, etc.
 
-
- In PostgreSQL, the term
- cluster is also sometimes used to refer to an instance.
- (Don't confuse this term with the SQL command CLUSTER.)
-

   
 
@@ -1245,12 +1255,17 @@
  SQL objects,
  which all reside in the same
  database.
- Each SQL object must reside in exactly one schema.
+ Each SQL object must reside in exactly one schema
+ (though certain types of SQL objects exist outside schemas).
 
 
  The names of SQL objects of the same type in the same schema are enforced
  to be unique.
  There is no restriction on reusing a name in multiple schemas.
+ For local objects that e

Re: Add A Glossary

2020-06-09 Thread Jürgen Purtz

On 17.05.20 17:28, Alvaro Herrera wrote:

I think the terms under discussion are just

* cluster
* instance
* server



Despite the short period of its existence the glossary achieved some 
importance, see: 
https://www.postgresql.org/message-id/b8e12875ebec9e6d3107df5fa1129e1e%40postgrespro.ru 
. We have to be careful with publications. It's not acceptable that we 
change definitions from release to release. Therefore IMO we should mark 
or even ignore such terms for which we cannot reach consensus.


Can you agree to the following definitions? If no, we can alternatively 
formulate for each of them: "Under discussion - currently not defined". 
My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped 
into databases, and a collection of databases managed by a single 
PostgreSQL server instance constitutes a database cluster."



- "Database" (No change to existing definition): "A named collection of 
SQL objects."



- "Database Cluster", "Cluster" (New definition and rearrangements of 
some sentences): "A collection of related databases, and their common 
static and dynamic meta-data.


This term is sometimes used to refer to an instance.

(Don't confuse the term CLUSTER with the SQL command CLUSTER.)"


- "Data Directory" (Replaced 'instance' by 'cluster'): "The base 
directory on the filesystem of a server that contains all data files and 
subdirectories associated with a cluster (with the exception of 
tablespaces). The environment variable PGDATA is commonly used to refer 
to the data directory.


A cluster's storage space comprises the data directory plus any 
additional tablespaces.


For more information, see Section 68.1."


- "Database Server", "Instance" (Major changes): "A group of backend and 
auxiliary processes that communicate using a common shared memory area. 
One postmaster process manages the instance; one instance manages 
exactly one cluster with all its databases. Many instances can run on 
the same server as long as their TCP ports do not conflict.


The instance handles all key features of a DBMS: read and write access 
to files and shared memory, assurance of the ACID properties, 
connections to client processes, privilege verification, crash recovery, 
replication, etc."



- "Server" (No change to existing definition): "A computer on which 
PostgreSQL instances run. The term server denotes real hardware, a 
container, or a virtual machine.


This term is sometimes used to refer to an instance or to a host."


- "Host" (No change to existing definition): "A computer that 
communicates with other computers over a network. This is sometimes used 
as a synonym for server. It is also used to refer to a computer where 
client processes run."
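
The relationship in the "Instance" proposal above (one instance manages exactly one cluster, and several instances can share a server as long as their TCP ports differ) can be pictured as a small data model. This is a sketch under assumptions: the `Instance` class, its field names, the sample paths and the `port_conflicts` helper are hypothetical, and the check looks only at the port number, as the proposed definition does.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    data_directory: str  # locates the one database cluster this instance manages
    port: int            # TCP port the instance listens on


def port_conflicts(instances):
    """Return the sorted ports claimed by more than one instance on a server."""
    counts = Counter(inst.port for inst in instances)
    return sorted(port for port, n in counts.items() if n > 1)


# Two hypothetical instances on the same server, each with its own cluster.
server = [
    Instance("/srv/pg/main", 5432),
    Instance("/srv/pg/test", 5433),
]
print(port_conflicts(server))  # prints []
```

Adding a third instance on port 5432 would make `port_conflicts` return `[5432]`.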



--

Jürgen Purtz






Re: Add A Glossary

2020-05-26 Thread Peter Eisentraut

On 2020-04-29 21:55, Corey Huinker wrote:
> On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut wrote:
>
> > Why are all the glossary terms capitalized?  Seems kind of strange.
>
> They weren't intended to be, and they don't appear to be in the page I'm
> looking at. Are you referring to the anchor like in
> https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ?
> If so, that all-capping is part of the rendering, as the ids were all
> named in all-lower-case.

Sorry, I meant why is the first letter of each term capitalized.  That 
seems unusual.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-20 Thread Laurenz Albe
On Wed, 2020-05-20 at 13:17 +0200, Jürgen Purtz wrote:
> > FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> > perhaps distinguishing between a "started cluster" and a "stopped cluster".
> > After all, "cluster" refers to "a cluster of databases", which are there, regardless
> > if you start the server or not.
> > 
> > The term "cluster" is unfortunate, because to most people it suggests a group of
> > machines, so the term "instance" is better, but that ship has sailed long ago.
> > 
> > The static part of a cluster to me is the "data directory".
>   
> cluster/instance: The different nature (static/dynamic) of what I
>   call "cluster" and "instance" as well as the existence of the two
>   commands "initdb — create a new PostgreSQL database cluster" and 
>   "pg_ctl — initialize, start, stop, or control a PostgreSQL server"
>   confirms me in my opinion that we need two different terms for
>   them.

I think that the "pg_ctl" example does not apply:
It does not talk about starting the cluster, but about starting the server process,
that is "server" in the way I understand it.

> There are situations where we need a single term for both of
>   them. "Instance and its data directory" or "Instance and its
>   cluster" are too wordy. In many cases we use "database server" or
>   "server" in this sense. Imo "Server" is too short and ambiguous.
>   "database server", the plural form "databases server", or the new
>   term "cluster server", which is more accurate, would be ok for me.
>   (Similar to "server", the term "cluster" is also used in many
>   different contexts - but only outside of the PG world; within our
>   context "cluster" is not ambiguous.) 

That does not feel right to me.

"cluster server", ouch. "databases server", ouch as well.

I never felt the term "cluster" was unclear in these contexts.
Sometimes it means "data directory", sometimes it is used for "server process",
but I think few people would think one could connect to a data directory
or create a process in a directory (initdb).

I think clarity is a Good Thing, but it can be overdone.

> > > server/host: We need a term to describe the underlying hardware respectively
> > > the virtual machine or container, where PG is running. I suggest to use both
> > > *server* and *host*. In computer science, both have their eligibility and are
> > > widely used. Everybody understands *client/server architecture* or *host* in
> > > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > > use both depending on the context, but with the same meaning: "real hardware,
> > > a container, or a virtual machine".
> > 
> > On this I have a strong opinion because of my Unix mindset.
> > "machine" and "host" are synonyms, and it doesn't matter to the database if they
> > are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
> > 
> > A "server" is a piece of software that responds to client requests, never a machine.
> > In my book, this is purely Windows jargon.  The term "client-server architecture"
> > that you quote emphasized that.
> > 
> > Perhaps "machine" would be the preferable term, because "host" is more prone to
> > misunderstandings (except in a networking context).
> 
> server/host: I agree that we are not interested in the question
>   whether there is real hardware or any virtualization container. We
>   are even not interested in the operating system. Our primary
>   concern is the existence of a port of the Internet Protocol. But
>   is the term "server" appropriate to name an IP-port? Additionally,
>   "server" is used for other meanings: a) the previously mentioned
>   "database server" b) a (virtual) machine: "server-side", "... the
>   file ... loaded by the server ..." c) binaries "... the server
>   must be built with SSL support ..." d) whenever it seems to be
>   appropriate: "standby server", "... the server parses query ...",
>   "server configuration", "server process".

You are most thorough :^)
   
> Because of its ambiguous usage, the definition of "server" must
>   clarify the allowed meanings. What's about:
> 
> server: Depending on the context, the term *server* denotes:
>   
> An IP-port which is offered by any OS.   ?

A port is a server?  No way.
  
> A - possibly virtualized - machine

It might be good to disambiguate that, but I don't think that the PostgreSQL
documentation should use the word "server" to mean "machine".

> An abbreviation for the slightly longer term
> "database(s)/cluster server"  ??? this will support the
> readability, but not the clarity ???

"Server" is short for "database server" and is a set of processes that listen
for and handle incoming database client requests.

I think that covers all the meaning

Re: Add A Glossary

2020-05-20 Thread Jürgen Purtz

On 19.05.20 08:17, Laurenz Albe wrote:
> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that commonly
> > act on shared buffers. The processes are very closely related to each other
> > and with the buffers. They exist altogether or not at all. They use a common
> > initialization file and are incarnated by one command. Everything exists
> > solely in RAM and therefore has a fluctuating nature. In summary: they build
> > a unit and this unit needs to have a name of itself. In some pages we used
> > to use the term *instance* - sometimes in extended forms: *database instance*,
> > *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> > or *remote instance*.  For me, the term *instance* makes sense, the extensions
> > *standby instance* and *remote instance* in their context too.
>
> FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> perhaps distinguishing between a "started cluster" and a "stopped cluster".
> After all, "cluster" refers to "a cluster of databases", which are there, regardless
> if you start the server or not.
>
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.
>
> The static part of a cluster to me is the "data directory".


cluster/instance: The different nature (static/dynamic) of what I call 
"cluster" and "instance" as well as the existence of the two commands 
"initdb — create a new PostgreSQL database cluster" and "pg_ctl — 
initialize, start, stop, or control a PostgreSQL server" confirms me in 
my opinion that we need two different terms for them. Those two terms 
shall not be synonym to each other, they label distinct things. If 
people prefer "data directory" instead of "cluster", this is ok for me.


There are situations where we need a single term for both of them. 
"Instance and its data directory" or "Instance and its cluster" are too 
wordy. In many cases we use "database server" or "server" in this sense. 
Imo "Server" is too short and ambiguous. "database server", the plural 
form "databases server", or the new term "cluster server", which is more 
accurate, would be ok for me. (Similar to "server", the term "cluster" 
is also used in many different contexts - but only outside of the PG 
world; within our context "cluster" is not ambiguous.)



> > server/host: We need a term to describe the underlying hardware respectively
> > the virtual machine or container, where PG is running. I suggest to use both
> > *server* and *host*. In computer science, both have their eligibility and are
> > widely used. Everybody understands *client/server architecture* or *host* in
> > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > use both depending on the context, but with the same meaning: "real hardware,
> > a container, or a virtual machine".
>
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database if they
> are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
>
> A "server" is a piece of software that responds to client requests, never a machine.
> In my book, this is purely Windows jargon.  The term "client-server architecture"
> that you quote emphasized that.
>
> Perhaps "machine" would be the preferable term, because "host" is more prone to
> misunderstandings (except in a networking context).

server/host: I agree that we are not interested in the question whether 
there is real hardware or any virtualization container. We are even not 
interested in the operating system. Our primary concern is the existence 
of a port of the Internet Protocol. But is the term "server" appropriate 
to name an IP-port? Additionally, "server" is used for other meanings: 
a) the previously mentioned "database server" b) a (virtual) machine: 
"server-side", "... the file ... loaded by the server ..." c) binaries 
"... the server must be built with SSL support ..." d) whenever it seems 
to be appropriate: "standby server", "... the server parses query ...", 
"server configuration", "server process".


Because of its ambiguous usage, the definition of "server" must clarify 
the allowed meanings. What's about:


--

server: Depending on the context, the term *server* denotes:

 * An IP-port which is offered by any OS.   ?
 * A - possibly virtualized - machine
 * An abbreviation for the slightly longer term "database(s)/cluster
   server"  ??? this will support the readability, but not the clarity ???
 * More ?

--

The term "host" is used mainly for IP configuration "host name", "host 
address" and in the context of compiling "host language", "host 
variable". These are clear situations and can be defined easily.





Re: Add A Glossary

2020-05-19 Thread Peter Eisentraut

On 2020-05-19 08:17, Laurenz Albe wrote:
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.


I don't see what would stop us from renaming some things, with some care.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-18 Thread Andrew Grillet
I think there needs to be a careful analysis of the language and a formal
effort to stabilise it for the future.

In the context of, say, an Oracle T series, which is partitioned into
multiple domains (virtual machines) in it, each
of these has multiple CPUs, and can run an instance of the OS which hosts
multiple virtual instances
of the same or different OSes. Some domains might do this while others do
not!

A host could be a domain, one of many virtual machines, or it could be one
of many hosts on that VM
but even these hosts could be virtual machines that each runs several
virtual servers!

Of course, PostgreSQL can run on any tier of this regime, but the
documentation at least needs to be consistent
about language.

A "machine" should probably refer to hardware, although I would accept that
a domain might count as "virtual
hardware" while a host should probably refer to a single instance of OS.

Of course it is possible for a single  instance of OS to run multiple
instances of PostgreSQL, and people do this. (I have
in the past).

Slightly more confusingly, it would appear possible for a single instance
of an OS to have multiple IP addresses
and if there are multiple instances of PostgreSQL, they may serve different
IP Addresses uniquely, or
share them. I think this case suggests that a host probably best describes
an OS instance. I might be wrong.

The word "server" might be an instance of any of the above, or a waiter
with a bowl of soup. It is best
reserved for situations where clarity is not required.

If you are new to all this, I am sure it is very confusing, and
inconsistent language is not going to help.

Andrew



AFAICT





On Tue, 19 May 2020 at 07:17, Laurenz Albe wrote:

> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that
> commonly
> > act on shared buffers. The processes are very closely related to each
> other
> > and with the buffers. They exist altogether or not at all. They use a
> common
> > initialization file and are incarnated by one command. Everything exists
> > solely in RAM and therefore has a fluctuating nature. In summary: they
> build
> > a unit and this unit needs to have a name of itself. In some pages we
> used
> > to use the term *instance* - sometimes in extended forms: *database
> instance*,
> > *PG instance*, *standby instance*, *standby server instance*, *server
> instance*,
> > or *remote instance*.  For me, the term *instance* makes sense, the
> extensions
> > *standby instance* and *remote instance* in their context too.
>
> FWIW, I feel somewhat like Alvaro on that point; I use those terms
> synonymously,
> perhaps distinguishing between a "started cluster" and a "stopped cluster".
> After all, "cluster" refers to "a cluster of databases", which are there,
> regardless
> if you start the server or not.
>
> The term "cluster" is unfortunate, because to most people it suggests a
> group of
> machines, so the term "instance" is better, but that ship has sailed long
> ago.
>
> The static part of a cluster to me is the "data directory".
>
> > server/host: We need a term to describe the underlying hardware
> respectively
> > the virtual machine or container, where PG is running. I suggest to use
> both
> > *server* and *host*. In computer science, both have their eligibility
> and are
> > widely used. Everybody understands *client/server architecture* or
> *host* in
> > TCP/IP configuration. We cannot change such matter of course. I suggest
> to
> > use both depending on the context, but with the same meaning: "real
> hardware,
> > a container, or a virtual machine".
>
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database
> if they
> are virtualized or not.  You can always disambiguate by adding "virtual"
> or "physical".
>
> A "server" is a piece of software that responds to client requests, never
> a machine.
> In my book, this is purely Windows jargon.  The term "client-server
> architecture"
> that you quote emphasized that.
>
> Perhaps "machine" would be the preferable term, because "host" is more
> prone to
> misunderstandings (except in a networking context).
>
> Yours,
> Laurenz Albe
>
>
>
>


Re: Add A Glossary

2020-05-18 Thread Laurenz Albe
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and with the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefore has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of itself. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*.  For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there,
regardless if you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".

> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not.  You can always disambiguate by adding "virtual" or
"physical".

A "server" is a piece of software that responds to client requests, never a
machine.  In my book, this is purely Windows jargon.  The term "client-server
architecture" that you quote emphasized that.

Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).

Yours,
Laurenz Albe





Re: Add A Glossary

2020-05-18 Thread Jürgen Purtz

On 17.05.20 17:28, Alvaro Herrera wrote:

> On 2020-May-17, Erik Rijkers wrote:
>
> > On 2020-05-17 08:51, Alvaro Herrera wrote:
>
> > > I don't think that's the general understanding of those terms.  For all
> > > I know, they *are* synonyms, and there's no specific term for "the
> > > fluctuating objects" as you call them.  The instance is either running
> > > (in which case there are processes and RAM) or it isn't.
> >
> > For what it's worth, I've also always understood 'instance' as 'a running
> > database'.  I admit it might be a left-over from my oracle years:
> >
> > https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> >
> > There, 'instance' clearly refers to a running database.  When that database
> > is stopped, it ceases to be an instance.
>
> I've never understood it that way, but I'm open to having my opinion on
> it changed.  So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful.  It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences.  (This would be particularly useful for "cluster" ISTM.)


In fact, we have reached a point where we don't have a common
understanding of a group of terms. I'm sure that we will meet some more
situations like this in the future. Such discussions, subsequent
decisions, and implementations in the docs are necessary to gain a solid
foundation - primarily for newcomers (which is my main motivation) as
well as for more complex discussions among experts. Obviously, each of
us will bring along his previous understanding of terms. But we should
also be open to sometimes revising old terms.


Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that 
commonly act on shared buffers. The processes are very closely related 
to each other and with the buffers. They exist altogether or not at all. 
They use a common initialization file and are incarnated by one command. 
Everything exists solely in RAM and therefore has a fluctuating nature.
In summary: they build a unit and this unit needs to have a name of
its own. In some pages we used to use the term *instance* - sometimes in
extended forms: *database instance*, *PG instance*, *standby instance*, 
*standby server instance*, *server instance*, or *remote instance*.  For 
me, the term *instance* makes sense, the extensions *standby instance* 
and *remote instance* in their context too.


The next essential component is the data itself. It is organized as a 
group of databases plus some common management information (global, 
pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a 
whole because the management information concerns all databases. Its 
nature is different from the processes and shared buffers. Of course, 
its content changes, but it has a steady nature. It even survives a 
'power down'. There is one command to instantiate a new incarnation of 
the directory structure and all files. In summary, it's something of its 
own and should have its own name. 'database' is not possible because it 
consists of databases and other things. My favorite is *cluster*; 
*database cluster* is also possible.


server/host: We need a term to describe the underlying hardware, or
the virtual machine or container, where PG is running. I suggest using
both *server* and *host*. In computer science, both are legitimate and
widely used. Everybody understands *client/server architecture* or
*host* in TCP/IP configuration. We cannot change such established
usage. I suggest using both depending on the context, but with the same
meaning: "real hardware, a container, or a virtual machine".


--

Jürgen Purtz

(PS: I added the docs mailing list)




Re: Add A Glossary

2020-05-17 Thread Alvaro Herrera
On 2020-May-17, Erik Rijkers wrote:

> On 2020-05-17 08:51, Alvaro Herrera wrote:

> > I don't think that's the general understanding of those terms.  For all
> > I know, they *are* synonyms, and there's no specific term for "the
> > fluctuating objects" as you call them.  The instance is either running
> > (in which case there are processes and RAM) or it isn't.
> 
> For what it's worth, I've also always understood 'instance' as 'a running
> database'.  I admit it might be a left-over from my oracle years:
> 
> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> 
> There, 'instance' clearly refers to a running database.  When that database
> is stopped, it ceases to be an instance.

I've never understood it that way, but I'm open to having my opinion on
it changed.  So let's discuss it and maybe gather opinions from others.

I think the terms under discussion are just

* cluster
* instance
* server

We don't have "host" (I just made it a synonym for server), but perhaps
we can add that too, if it's useful.  It would be good to be consistent
with historical Postgres usage, such as the initdb usage of "cluster"
etc.

Perhaps we should not only define what our use of each term is, but also
explain how each term is used outside PostgreSQL and highlight the
differences.  (This would be particularly useful for "cluster" ISTM.)

It seems difficult to get this sorted out before beta1, but there's
still time before the glossary is released.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-17 Thread Erik Rijkers

On 2020-05-17 08:51, Alvaro Herrera wrote:

> On 2020-May-17, Jürgen Purtz wrote:
>
> > On 15.05.20 02:00, Alvaro Herrera wrote:
> > > Thanks everybody.  I have compiled together all the suggestions and the
> > >
> > > * I changed "instance", and made "cluster" be mostly a synonym of that.
> > In my understanding, "instance" and "cluster" should be different things,
>
> I don't think that's the general understanding of those terms.  For all
> I know, they *are* synonyms, and there's no specific term for "the
> fluctuating objects" as you call them.  The instance is either running
> (in which case there are processes and RAM) or it isn't.


For what it's worth, I've also always understood 'instance' as 'a 
running database'.  I admit it might be a left-over from my oracle 
years:


  
https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601


There, 'instance' clearly refers to a running database.  When that 
database is stopped, it ceases to be an instance.  I've always 
understood this to be the same for the PostgreSQL 'instance'.  Once 
stopped, it is no longer an instance, but it is, of course, still a 
cluster.


I know, we don't have to do the same as Oracle, but clearly it's going 
to be an ongoing source of misunderstanding if we define such a 
high-level term differently.


Erik Rijkers




Re: Add A Glossary

2020-05-17 Thread Jürgen Purtz

On 17.05.20 08:51, Alvaro Herrera wrote:

> > On 15.05.20 02:00, Alvaro Herrera wrote:
> > > Thanks everybody.  I have compiled together all the suggestions and the
> > > result is in the attached patch.  Some of it is of my own devising.
> > >
> > > * I changed "instance", and made "cluster" be mostly a synonym of that.
> > In my understanding, "instance" and "cluster" should be different things,
> > not only synonyms. "instance" can be the term for permanently fluctuating
> > objects (processes and RAM) and "cluster" can denote the more static objects
> > (directories and files). What do you think? If you agree, I would create a
> > patch.
>
> I don't think that's the general understanding of those terms.  For all
> I know, they *are* synonyms, and there's no specific term for "the
> fluctuating objects" as you call them.  The instance is either running
> (in which case there are processes and RAM) or it isn't.

We have the basic tools "initdb — create a new PostgreSQL database 
cluster" which affects nothing but files, and we have "pg_ctl — 
initialize, start, stop, or control a PostgreSQL server" which - 
directly - affects nothing but processes and RAM. (Here the term 
"server" collides with new definitions in the glossary. But that's 
another story.)
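
To make the split between the two tools concrete, here is a minimal sketch
in Python. It only builds the command lines and does not run anything. The
helper names (initdb_command, pg_ctl_command) and the example path are made
up for illustration; the only thing taken from the tools themselves is the
-D option naming the data directory.

```python
from typing import List


def initdb_command(data_directory: str) -> List[str]:
    """Command line that creates the files of a new database cluster.

    Nothing is started here: the result of initdb is a directory on disk.
    """
    return ["initdb", "-D", data_directory]


def pg_ctl_command(data_directory: str, action: str) -> List[str]:
    """Command line that starts or stops the processes serving a cluster.

    The data directory already exists; only processes and RAM are affected.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["pg_ctl", "-D", data_directory, action]


if __name__ == "__main__":
    datadir = "/tmp/example_pgdata"  # hypothetical path, for illustration only
    print(initdb_command(datadir))           # files only
    print(pg_ctl_command(datadir, "start"))  # processes and RAM
    print(pg_ctl_command(datadir, "stop"))   # files remain afterwards
```

The point of the sketch is that the same data directory appears in all three
commands, while only the last two have anything to do with running processes.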


--

Jürgen Purtz




Re: Add A Glossary

2020-05-17 Thread Jürgen Purtz

On 17.05.20 08:51, Alvaro Herrera wrote:

> Any object that
> exists in a database is local, regardless of whether it exists in a
> schema or not.

This implies that the term "local" is unnecessary, just call them "SQL
object".

> "Extensions" is one type of object that does not belong
> in a schema.  "Foreign data wrapper" is another type of object that does
> not belong in a schema.  ...  They are *not*
> global objects.

postgres_fdw is a module among many others. It's only an example for
"extensions" and has no different nature. Yes, they are not global SQL
objects because they don't belong to the cluster.


In summary we have 3 types of objects: belonging to a schema, to a 
database, or to the cluster (global). Maybe, we can avoid the use of the 
different names 'local SQL object' and 'global SQL object' at all and 
just call them 'SQL object'. 'global SQL object' is used only once. We 
could rephrase "A set of databases and accompanying global SQL objects 
... " to "A set of databases and accompanying SQL objects, which exists 
at the cluster level, ... "
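
As a rough illustration of the three levels, here is a small Python sketch.
The Scope enum, the OBJECT_SCOPE table and scope_of() are names I made up;
the object types listed are only the examples mentioned in this thread and
in the patch, not a complete catalog.

```python
from enum import Enum


class Scope(Enum):
    SCHEMA = "belongs to a schema"
    DATABASE = "belongs to a database, outside any schema"
    CLUSTER = "belongs to the cluster (global)"


# Examples only, taken from the glossary text under discussion.
OBJECT_SCOPE = {
    "table": Scope.SCHEMA,
    "view": Scope.SCHEMA,
    "function": Scope.SCHEMA,
    "extension": Scope.DATABASE,
    "foreign data wrapper": Scope.DATABASE,
    "data type cast": Scope.DATABASE,
    "role": Scope.CLUSTER,
    "tablespace": Scope.CLUSTER,
    "replication origin": Scope.CLUSTER,
    "subscription": Scope.CLUSTER,
    "database": Scope.CLUSTER,
}


def scope_of(object_type: str) -> Scope:
    """Return the level at which the given kind of SQL object lives."""
    return OBJECT_SCOPE[object_type.lower()]


if __name__ == "__main__":
    for name in ("table", "extension", "role"):
        print(f"{name}: {scope_of(name).value}")
```

With such a table, "local" would simply mean SCHEMA or DATABASE, and "global"
would mean CLUSTER.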



> TBH I'm not sure of this term at all.  I think we sometimes use the
> term "bloat" to talk about the dead rows only, ignoring the free space.


That's a good example of the need for the glossary. Currently we
don't have a common understanding of all the terms we use. The
glossary shall fix that and give a binding definition - after a
clarifying discussion.


--

Jürgen Purtz






Re: Add A Glossary

2020-05-16 Thread Alvaro Herrera
On 2020-May-17, Jürgen Purtz wrote:

> On 15.05.20 02:00, Alvaro Herrera wrote:
> > Thanks everybody.  I have compiled together all the suggestions and the
> > result is in the attached patch.  Some of it is of my own devising.
> > 
> > * I changed "instance", and made "cluster" be mostly a synonym of that.
> In my understanding, "instance" and "cluster" should be different things,
> not only synonyms. "instance" can be the term for permanently fluctuating
> objects (processes and RAM) and "cluster" can denote the more static objects
> (directories and files). What do you think? If you agree, I would create a
> patch.

I don't think that's the general understanding of those terms.  For all
I know, they *are* synonyms, and there's no specific term for "the
fluctuating objects" as you call them.  The instance is either running
(in which case there are processes and RAM) or it isn't.


> > * I removed "global SQL object" and made "SQL object" explain it.
> +1., but see the (huge) different spellings in patch.

This seems a misunderstanding of what "local" means.  Any object that
exists in a database is local, regardless of whether it exists in a
schema or not.  "Extensions" is one type of object that does not belong
in a schema.  "Foreign data wrapper" is another type of object that does
not belong in a schema.  Same with data type casts.  They are *not*
global objects.

> bloat: changed 'current row' to 'relevant row' because not only the youngest
> one is relevant (non-bloat).

Hm.  TBH I'm not sure of this term at all.  I think we sometimes use the
term "bloat" to talk about the dead rows only, ignoring the free space.
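
To show how far the two readings can differ, here is a small sketch in
Python. PageSpace and both functions are hypothetical names, and the byte
counts are invented; this is not how any PostgreSQL tool measures bloat.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageSpace:
    live_bytes: int  # space used by current row versions
    dead_bytes: int  # space used by outdated row versions
    free_bytes: int  # unused space in the page


def bloat_including_free_space(pages: Iterable[PageSpace]) -> int:
    """Bloat as in the proposed definition: everything that is not current rows."""
    return sum(p.dead_bytes + p.free_bytes for p in pages)


def bloat_dead_rows_only(pages: Iterable[PageSpace]) -> int:
    """Bloat in the narrower sense: outdated row versions only."""
    return sum(p.dead_bytes for p in pages)


if __name__ == "__main__":
    pages = [PageSpace(6000, 1500, 692), PageSpace(4000, 0, 4192)]
    print(bloat_including_free_space(pages))
    print(bloat_dead_rows_only(pages))
```

For the two invented pages, the first reading counts 6384 bytes and the
second only 1500, so the definition really does need to pick one.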

> data type casts: Are you sure that they are global? In pg_cast 'relisshared'
> is 'false'.

I'm not saying they're global.  I'm saying they're outside schemas.
Maybe this definition needs more rewording, if this bit is unclear.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-16 Thread Jürgen Purtz

On 15.05.20 02:00, Alvaro Herrera wrote:

> Thanks everybody.  I have compiled together all the suggestions and the
> result is in the attached patch.  Some of it is of my own devising.
>
> * I changed "instance", and made "cluster" be mostly a synonym of that.

In my understanding, "instance" and "cluster" should be different
things, not only synonyms. "instance" can be the term for permanently
fluctuating objects (processes and RAM) and "cluster" can denote the
more static objects (directories and files). What do you think? If you
agree, I would create a patch.

> * I removed "global SQL object" and made "SQL object" explain it.

+1., but see the (huge) different spellings in patch.

bloat: changed 'current row' to 'relevant row' because not only the 
youngest one is relevant (non-bloat).


data type casts: Are you sure that they are global? In pg_cast 
'relisshared' is 'false'.


--

Jürgen Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8bb1ea5d87..75f0dc9a8c 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -179,7 +179,7 @@
Bloat

 
- Space in data pages which does not contain current row versions,
+ Space in data pages which does not contain relevant row versions,
  such as unused (free) space or outdated row versions.
 

@@ -1388,23 +1388,27 @@
 
  
   Any object that can be created with a CREATE
-  command.  Most objects are specific to one database, and are commonly
-  known as local objects.
+  command. Most objects are specific to one schema within
+  one database, and are commonly
+  known as local SQL objects.
+ 
+ 
+  Some of the SQL objects do not belong to a single schema but
+  are known in all schemas of a database. Examples are
+  extensions like
+  foreign data wrappers, and
+  data type casts.
+ 
+ 
+  Some others even belong to the complete cluster and are
+  known in all databases and all its schemas.
   Roles,
   tablespaces,
   replication origins, subscriptions for logical replication, and
-  databases themselves are not local SQL objects since they exist
-  entirely outside of any specific database;
-  they are called global objects.
+  all database names exist
+  entirely outside of any specific database.
+  They are called global SQL objects.
  
- 
-  Most local objects belong to a specific
-  schema in their containing database.
-  There also exist local objects that do not belong to schemas; some examples are
-  extensions,
-  data type casts, and
-  foreign data wrappers.
-
 
   For more information, see
   .


Re: Add A Glossary

2020-05-16 Thread Alvaro Herrera
On 2020-May-16, Erik Rijkers wrote:

> On 2020-05-15 19:26, Alvaro Herrera wrote:
> > Applied all these suggestions, and made a few additional very small
> > edits, and pushed -- better to ship what we have now in beta1, but
> > further edits are still possible.
> 
> I've gone through the glossary as committed and found some more small
> things; patch attached.

All pushed!  Many thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-16 Thread Erik Rijkers

On 2020-05-15 19:26, Alvaro Herrera wrote:

> Applied all these suggestions, and made a few additional very small
> edits, and pushed -- better to ship what we have now in beta1, but
> further edits are still possible.


I've gone through the glossary as committed and found some more small 
things; patch attached.


Thanks,


Erik Rijkers



--- doc/src/sgml/glossary.sgml.orig	2020-05-16 23:30:01.132290237 +0200
+++ doc/src/sgml/glossary.sgml	2020-05-16 23:38:35.643371310 +0200
@@ -67,7 +67,7 @@

 
  In reference to a datum:
- the fact that its value that cannot be broken down into smaller
+ the fact that its value cannot be broken down into smaller
  components.
 

@@ -360,7 +360,7 @@
  integrity constraints.
  Transactions may be allowed to violate some of the constraints
  transiently before it commits, but if such violations are not resolved
- by the time it commits, such transaction is automatically
+ by the time it commits, such a transaction is automatically
  rolled back.
  This is one of the ACID properties.
 
@@ -649,8 +649,8 @@
Grant

 
- An SQL command that is used to allow
- users or
+ An SQL command that is used to allow a
+ user or
  role to access
  specific objects within the database.
 
@@ -887,7 +887,7 @@

 
  A relation that is
- defined in the same way that a a view
+ defined in the same way that a view
  is, but stores data in the same way that a
  table does. It cannot be
  modified via INSERT, UPDATE, or
@@ -962,7 +962,7 @@

 
  In reference to a window function:
- a partition is a user-defined criteria that identifies which neighboring
+ a partition is a user-defined criterion that identifies which neighboring
  rows can be considered by the
  function.
 
@@ -1446,8 +1446,8 @@
  The system catalog resides in the schema pg_catalog.
  These tables contain data in internal representation and are
  not typically considered useful for user examination;
- a number of user-friendlier views
- also in schema pg_catalog offer more convenient access to
+ a number of user-friendlier views,
+ also in schema pg_catalog, offer more convenient access to
  some of that information, while additional tables and views
  exist in schema information_schema
  (see ) that expose some
@@ -1739,7 +1739,7 @@
  each page stores two bits: the first one
  (all-visible) indicates that all tuples
  in the page are visible to all transactions.  The second one
- (all-frozen) indicate that all tuples
+ (all-frozen) indicates that all tuples
  in the page are marked frozen.
 

@@ -1755,7 +1755,7 @@

 
  A process that saves copies of WAL files
- for the purposes of creating backups or keeping
+ for the purpose of creating backups or keeping
  replicas current.
 
 
@@ -1777,7 +1777,7 @@
  and are written in sequential order, interspersing changes
  as they occur in multiple simultaneous sessions.
  If the system crashes, the files are read in order, and each of the
- changes is replayed to restore the system to the state as it was
+ changes is replayed to restore the system to the state it was in
  before the crash.
 
 


Re: Add A Glossary

2020-05-15 Thread Alvaro Herrera
Applied all these suggestions, and made a few additional very small
edits, and pushed -- better to ship what we have now in beta1, but
further edits are still possible.

Other possible terms to define, including those from the tweet I linked
to and a couple more:

archive
availability
backup
composite type
common table expression
data type
domain
dump
export
fault tolerance
GUC
high availability
hot standby
LSN
restore
secondary server (?)
snapshot
transactions per second

Anybody want to try their hand at a tentative definition?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 6cdbfd1d9f2d6c7fa815aab853a51f86b3650e11 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 15 May 2020 13:04:11 -0400
Subject: [PATCH v3] Review of the glossary
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add some more terms, clarify some definitions, remove redundant terms,
move a couple of terms to keep alphabetical order.

Co-authored-by: Jürgen Purtz 
Co-authored-by: Erik Rijkers 
Co-authored-by: Laurenz Albe 
Discussion: https://postgr.es/m/7b9b469e804777ac9df4d37716db9...@xs4all.nl
---
 doc/src/sgml/acronyms.sgml |   2 +-
 doc/src/sgml/glossary.sgml | 536 ++---
 2 files changed, 316 insertions(+), 222 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..b05c065546 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -766,7 +766,7 @@
 XID
 
  
-  Transaction Identifier
+  Transaction identifier
  
 

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..a1e8a595c8 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -7,8 +7,23 @@
  
 
  
+  
+   ACID
+   
+
+ Atomicity,
+ Consistency,
+ Isolation, and
+ Durability.
+ This set of properties of database transactions is intended to
+ guarantee validity in concurrent operation and even in event of
+ errors, power failures, etc.
+ 
+   
+  
+
   
-   Aggregate Function
+   Aggregate function

 
  A function that
@@ -35,11 +50,15 @@
  to make decisions about how to execute
  queries.
 
+
+ (Don't confuse this term with the ANALYZE option
+ to the  command.)
+

   
 
   
-   Analytic Function
+   Analytic function

   
 
@@ -106,8 +125,8 @@

 
  Process of an instance
- which act on behalf of client sessions
- and handle their requests.
+ which acts on behalf of a client session
+ and handles its requests.
 
 
  (Don't confuse this term with the similar terms
@@ -118,7 +137,7 @@
   
 
   
-   Background Worker (process)
+   Background worker (process)

 
  Process within an instance,
@@ -138,10 +157,11 @@
   
 
   
-   Background Writer (process)
+   Background writer (process)

 
- A process that continuously writes dirty pages from
+ A process that writes dirty
+ data pages from
  shared memory to
  the file system.  It wakes up periodically, but works only for a short
  period in order to distribute its expensive I/O
@@ -155,6 +175,16 @@

   
 
+  
+   Bloat
+   
+
+ Space in data pages which does not contain current row versions,
+ such as unused (free) space or outdated row versions.
+
+   
+  
+
   
Cast

@@ -190,7 +220,7 @@
   
 
   
-   Check Constraint
+   Check constraint

 
  A type of constraint
@@ -208,15 +238,6 @@

   
 
-  
-   Checkpointer (process)
-   
-
- A specialized process responsible for executing checkpoints.
-
-   
-  
-
   
Checkpoint

@@ -244,6 +265,15 @@

   
 
+  
+   Checkpointer (process)
+   
+
+ A specialized process responsible for executing checkpoints.
+
+   
+  
+
   
Class (archaic)

@@ -262,27 +292,6 @@

   
 
-  
-   Cluster
-   
-
- A group of databases plus their
- global SQL objects. The
- cluster is managed by exactly one
- instance. A newly created
- Cluster will have three databases created automatically. They are
- template0, template1, and
- postgres. It is expected that an application will
- create one or more additional database aside from these three.
-
-
- (Don't confuse the PostgreSQL-specific term
- Cluster with the SQL
- command CLUSTER).
-
-   
-  
-
   
Column

@@ -363,7 +372,10 @@

 
  A restriction on the values of data allowed within a
- Table.
+ table,
+ or in attributes of a
+ 
+ domain.
 
 
  For more information, see
@@ -373,19 +385,19 @@
   
 
   
-   Data Area
+   Data area

   
 
   
-   Data Directory
+   Data directory

 
  The base directory on the filesystem of a
  server that contains all
-

Re: Add A Glossary

2020-05-14 Thread Justin Pryzby
On Thu, May 14, 2020 at 08:00:17PM -0400, Alvaro Herrera wrote:
> +   ACID
> +   
> +
> + Atomicity,
> + consistency,
> + isolation, and
> + durability.
> + A set of properties of database transactions intended to guarantee validity
> + in concurrent operation and even in event of errors, power failures, etc.

I would capitalize Consistency, Isolation, Durability, and say "These four
properties" or "This set of four properties" (although that makes this sound
more like a fun game of DBA jeopardy).

> +   Background writer (process)
> 
>  
> - A process that continuously writes dirty pages from
> + A process that continuously writes dirty

I don't like "continuously"

> + data pages from
>  
> +  
> +   Bloat
> +   
> +
> + Space in data pages which does not contain relevant data,
> + such as unused (free) space or outdated row versions.

"current row versions" instead of relevant ?

> +  
> +   Data page
> +   
> +
> + The basic structure used to store relation data.
> + All pages are of the same size.
> + Data pages are typically stored on disk, each in a specific file,
> + and can be read to shared buffers
> + where they can be modified, becoming
> + dirty.  They get clean by being written down

say "They become clean when written to disk"

> + to disk.  New pages, which initially exist in memory only, are also
> + dirty until written.

> +  
> +   Fork
> +   
> +
> + Each of the separate segmented file sets that a relation stores its
> + data in.  There exist a main fork and two secondary

"in which a relation's data is stored"

> + forks: the free space map
> + visibility map.

missing "and" ?

> +  
> +   Free space map (fork)
> +   
> +
> + A storage structure that keeps metadata about each data page in a table's
> + main storage space.

s/in/of/

just say "main fork"?

> The free space map entry for each space stores the

for each page ?

> + amount of free space that's available for future tuples, and is structured
> + so it is efficient to search for available space for a new tuple of a given
> + size.

..to be efficiently searched to find free space..
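
For readers wondering what "structured to be efficiently searched" could
look like, here is a toy sketch in Python: a binary tree in which every
inner node stores the maximum free space of its children, so a page with
enough room is found without scanning all pages. FreeSpaceTree and
find_page are invented names and the byte counts are arbitrary; this is an
illustration of the idea, not the on-disk format PostgreSQL uses.

```python
from typing import List, Optional


class FreeSpaceTree:
    """Toy max-tree over per-page free space (illustration only)."""

    def __init__(self, free_per_page: List[int]) -> None:
        self.page_count = len(free_per_page)
        leaves = 1
        while leaves < max(1, self.page_count):
            leaves *= 2
        self.leaves = leaves
        # tree[1] is the root; leaves occupy tree[leaves : 2 * leaves].
        self.tree = [0] * (2 * leaves)
        for page_no, free in enumerate(free_per_page):
            self.tree[leaves + page_no] = free
        for node in range(leaves - 1, 0, -1):
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])

    def find_page(self, needed: int) -> Optional[int]:
        """Return the lowest-numbered page with at least `needed` free bytes."""
        if self.page_count == 0 or self.tree[1] < needed:
            return None
        node = 1
        while node < self.leaves:
            left = 2 * node
            # Descend left when the left subtree has room, otherwise right.
            node = left if self.tree[left] >= needed else left + 1
        return node - self.leaves


if __name__ == "__main__":
    fsm = FreeSpaceTree([100, 40, 900, 300])
    print(fsm.find_page(250))
    print(fsm.find_page(1000))
```

The search visits one node per tree level, which is the property the
definition is trying to describe.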

>   The heap is realized within
> - segment files.
> + segmented files
> + in the relation's main fork.

Hm, the files aren't segmented.  Say "one or more file segments per relation"

> +  There also exist local objects that do not belong to schemas; some examples are
> +  extensions,
> +  data type casts, and
> +  foreign data wrappers.

Don't extensions have schemas ?

> +  
> +   Transaction ID
> +   
> +
> + The numerical, unique, sequentially-assigned identifier that each
> + transaction receives when it first causes a database modification.
> + Frequently abbreviated xid.

abbreviated *as* xid

> + approximately four billion write transactions IDs can be generated;
> + to permit the system to run for longer than that would allow,

remove "would allow"
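
For context on the "approximately four billion" figure: transaction IDs are
32-bit counters, so the arithmetic can be sketched in a few lines of Python.
The function names below are mine, and the comparison is a simplified
circular comparison that ignores the reserved special XIDs; take it as an
illustration of why wraparound needs handling, not as the server's code.

```python
XID_BITS = 32
XID_SPACE = 2 ** XID_BITS  # 4_294_967_296, i.e. "approximately four billion"


def next_xid(xid: int) -> int:
    """Advance the counter, wrapping around at the end of the 32-bit space."""
    return (xid + 1) % XID_SPACE


def xid_precedes(a: int, b: int) -> bool:
    """Circular comparison: a is 'older' than b if it lies in the half of the
    XID space behind b (simplified; special XIDs are not considered)."""
    diff = (a - b) % XID_SPACE
    return diff >= XID_SPACE // 2


if __name__ == "__main__":
    print(XID_SPACE)
    print(next_xid(XID_SPACE - 1))
    print(xid_precedes(XID_SPACE - 6, 5))
```

In the last call an XID assigned just before the wrap still counts as older
than one assigned just after it, which is what lets the system keep running
past the four-billion mark.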

>  
>   The process of removing outdated tuple
>   versions from tables, and other closely related

actually tables or materialized views..

> +  
> +   Visibility map (fork)
> +   
> +
> + A storage structure that keeps metadata about each data page
> + in a table's main storage space.  The visibility map entry for

s/in/of/

main fork?

-- 
Justin




Re: Add A Glossary

2020-05-14 Thread Alvaro Herrera
Thanks everybody.  I have compiled together all the suggestions and the
result is in the attached patch.  Some of it is of my own devising.

* I changed "instance", and made "cluster" be mostly a synonym of that.

* I removed "global SQL object" and made "SQL object" explain it.

* Added definitions for ACID, sequence, bloat, fork, FSM, VM, data page,
  transaction ID, epoch.

* Changed "a SQL" to "an SQL" everywhere.

* Sorted alphabetically.

* Removed caps in term names.

I think I should get this pushed, and if there are further suggestions,
they're welcome.

Dim Fontaine and others suggested a number of terms that could be
included; see https://twitter.com/alvherre/status/1246192786287865856

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..b05c065546 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -766,7 +766,7 @@
 XID
 
  
-  Transaction Identifier
+  Transaction identifier
  
 

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..d4255215aa 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -7,8 +7,22 @@
  
 
  
+  
+   ACID
+   
+
+ Atomicity,
+ consistency,
+ isolation, and
+ durability.
+ A set of properties of database transactions intended to guarantee validity
+ in concurrent operation and even in event of errors, power failures, etc.
+ 
+   
+  
+
   
-   Aggregate Function
+   Aggregate function

 
  A function that
@@ -35,11 +49,15 @@
  to make decisions about how to execute
  queries.
 
+
+ (Don't confuse this term with the ANALYZE option
+ to the  command.)
+

   
 
   
-   Analytic Function
+   Analytic function

   
 
@@ -106,8 +124,8 @@

 
  Process of an instance
- which act on behalf of client sessions
- and handle their requests.
+ which acts on behalf of a client session
+ and handles its requests.
 
 
  (Don't confuse this term with the similar terms
@@ -118,7 +136,7 @@
   
 
   
-   Background Worker (process)
+   Background worker (process)

 
  Process within an instance,
@@ -138,10 +156,11 @@
   
 
   
-   Background Writer (process)
+   Background writer (process)

 
- A process that continuously writes dirty pages from
+ A process that continuously writes dirty
+ data pages from
  shared memory to
  the file system.  It wakes up periodically, but works only for a short
  period in order to distribute its expensive I/O
@@ -155,6 +174,16 @@

   
 
+  
+   Bloat
+   
+
+ Space in data pages which does not contain relevant data,
+ such as unused (free) space or outdated row versions.
+
+   
+  
+
   
Cast

@@ -190,7 +219,7 @@
   
 
   
-   Check Constraint
+   Check constraint

 
  A type of constraint
@@ -208,15 +237,6 @@

   
 
-  
-   Checkpointer (process)
-   
-
- A specialized process responsible for executing checkpoints.
-
-   
-  
-
   
Checkpoint

@@ -244,6 +264,15 @@

   
 
+  
+   Checkpointer (process)
+   
+
+ A specialized process responsible for executing checkpoints.
+
+   
+  
+
   
Class (archaic)

@@ -262,27 +291,6 @@

   
 
-  
-   Cluster
-   
-
- A group of databases plus their
- global SQL objects. The
- cluster is managed by exactly one
- instance. A newly created
- Cluster will have three databases created automatically. They are
- template0, template1, and
- postgres. It is expected that an application will
- create one or more additional database aside from these three.
-
-
- (Don't confuse the PostgreSQL-specific term
- Cluster with the SQL
- command CLUSTER).
-
-   
-  
-
   
Column

@@ -363,7 +371,10 @@

 
  A restriction on the values of data allowed within a
- Table.
+ table,
+ or in attributes of a
+ 
+ domain.
 
 
  For more information, see
@@ -373,18 +384,18 @@
   
 
   
-   Data Area
+   Data area

   
 
   
-   Data Directory
+   Data directory

 
  The base directory on the filesystem of a
  server that contains all
  data files and subdirectories associated with a
- cluster with the
+ instance with the
  exception of tablespaces.
  The environment variable PGDATA is commonly used to
  refer to the
@@ -416,15 +427,31 @@
   
 
   
-   Database Server
+   Database server

   
 
+  
+   Data page
+   
+
+ The basic structure used to store relation data.
+ All pages are of the same size.
+ Data pages are typically stored on disk, each in a specific file,
+ and can be read to shared buffers
+ where they can be modified, be

Re: Add A Glossary

2020-04-29 Thread Corey Huinker
On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> Why are all the glossary terms capitalized?  Seems kind of strange.
>
>
They weren't intended to be, and they don't appear to be in the page I'm
looking at. Are you referring to the anchor like in
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? If
so, that all-capping is part of the rendering, as the ids were all named in
all-lower-case.


Re: Add A Glossary

2020-04-29 Thread Peter Eisentraut

Why are all the glossary terms capitalized?  Seems kind of strange.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-12 Thread Jürgen Purtz


On 11.04.20 21:47, Corey Huinker wrote:



Term 'relation': A sequence is internally a table with one row -
right?
Shall we extend the list of concrete relations by 'sequence'? Or
is this
not necessary because 'table' is already there?


I wrote one for sequence, it was a bit math-y for Alvaro's taste, so 
we're going to try again.


This seems to be a misunderstanding. My question was whether we shall 
extend the definition of Relation to: "... Tables, views, foreign 
tables, materialized views, indexes, and *sequences* are all relations."


Kind regards, Jürgen



Re: Add A Glossary

2020-04-11 Thread Corey Huinker
>
>
> Term 'relation': A sequence is internally a table with one row - right?
> Shall we extend the list of concrete relations by 'sequence'? Or is this
> not necessary because 'table' is already there?
>

I wrote one for sequence, it was a bit math-y for Alvaro's taste, so we're
going to try again.


Re: Add A Glossary

2020-04-11 Thread Jürgen Purtz

On 2020-Apr-05, Jürgen Purtz wrote:


a) Some rearrangements of the sequence of terms to meet alphabetical order.

Thanks, will get this pushed.


b)   -->   in
two cases. Or should it be a ?

Ah, yeah, those should be linkend.

Term 'relation': A sequence is internally a table with one row - right? 
Shall we extend the list of concrete relations by 'sequence'? Or is this 
not necessary because 'table' is already there?


Kind regards, Jürgen






Re: Add A Glossary

2020-04-05 Thread Alvaro Herrera
On 2020-Apr-05, Fabien COELHO wrote:

> > > As the definitions are short and to the point, maybe the HTML display
> > > could (also) "hover" the definitions when the mouse passes over the word,
> > > using the "title" attribute?
> > 
> > I like that idea, if it doesn't conflict with accessibility standards
> > (maybe that's just titles on images, not sure).
> 
> The following worked fine:
> 
>   Title Tag Test
>   The ACID
>   property is great.
>   

I don't see myself patching the stylesheet as would be needed to do
this.

> > I suggest we pursue this idea in another thread, as we'd probably want to
> > do it for acronyms as well.
> 
> Or not. I'd test committer temperature before investing time because it
> would mean that backpatching the doc would be a little harder.

TBH I can't get very excited about this idea.  Maybe other documentation
champions would be happier about doing that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-05 Thread Alvaro Herrera
On 2020-Apr-05, Jürgen Purtz wrote:

> a) Some rearrangements of the sequence of terms to meet alphabetical order.

Thanks, will get this pushed.

> b)   -->   in
> two cases. Or should it be a ?

Ah, yeah, those should be linkend.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-05 Thread Jürgen Purtz

a) Some rearrangements of the sequence of terms to meet alphabetical order.

b)   -->   
in two cases. Or should it be a ?



Kind regards, Jürgen


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..25762b7c3a 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -208,15 +208,6 @@

   
 
-  
-   Checkpointer (process)
-   
-
- A specialized process responsible for executing checkpoints.
-
-   
-  
-
   
Checkpoint

@@ -244,6 +235,15 @@

   
 
+  
+   Checkpointer (process)
+   
+
+ A specialized process responsible for executing checkpoints.
+
+   
+  
+
   
Class (archaic)

@@ -761,25 +761,6 @@

   
 
-  
-   Logger (process)
-   
-
- If activated, the
- Logger process
- writes information about database events into the current
- log file.
- When reaching certain time- or
- volume-dependent criteria, a new log file is created.
- Also called syslogger.
-
-
- For more information, see
- .
-
-   
-  
-
   
Log Record
 
@@ -803,6 +784,25 @@

   
 
+  
+   Logger (process)
+   
+
+ If activated, the
+ Logger process
+ writes information about database events into the current
+ log file.
+ When reaching certain time- or
+ volume-dependent criteria, a new log file is created.
+ Also called syslogger.
+
+
+ For more information, see
+ .
+
+   
+  
+
   
Master (server)

@@ -1651,6 +1651,11 @@

   
 
+  
+   WAL
+   
+  
+
   
WAL Archiver (process)

@@ -1696,11 +1701,6 @@

   
 
-  
-   WAL
-   
-  
-
   
WAL Record

@@ -1728,8 +1728,8 @@
   

 A process that writes WAL records
-from shared memory to
-WAL files.
+from shared memory to
+WAL files.


 For more information, see


Re: Add A Glossary

2020-04-05 Thread Fabien COELHO



Hi Corey,


ISTM that occurrences of these words elsewhere in the documentation should
link to the glossary definitions?


Yes, that's a big project. I was considering writing a script to compile
all the terms as search terms, paired with their glossary ids, and then
invoke git grep to identify all pages that have term FOO but don't have
glossary-foo. We would then go about gloss-linking those pages as
appropriate, but only a few pages at a time to keep scope sane.


I'd go for scripting the thing.

Should the glossary be backpatched, to possibly ease doc backpatches?

Also, I'm unclear about the circumstances under which we should _not_ 
tag a term.


At least when they are explained locally.

I remember hearing that we should only tag it on the first usage, but is 
that per section or per page?


Page?


As the definitions are short and to the point, maybe the HTML display
could (also) "hover" the definitions when the mouse passes over the word,
using the "title" attribute?


I like that idea, if it doesn't conflict with accessibility standards
(maybe that's just titles on images, not sure).


The following worked fine:

  Title Tag Test
  The ACID
  property is great.
  

So basically the def can be put on the glossary link, however retrieving 
the definition should be automatic.



I suspect we would want to just carry over the first sentence or so with a
... to avoid cluttering the screen with my overblown definition of a
sequence.


Dunno. The definitions are quite short, maybe they can fit whole.


I suggest we pursue this idea in another thread, as we'd probably want to
do it for acronyms as well.


Or not. I'd test committer temperature before investing time because it 
would mean that backpatching the doc would be a little harder.



Entries could link to relevant wikipedia pages, like the acronyms section
does?


They could. I opted not to do that because each external link invites
debate about how authoritative that link is, which is easier to do with
acronyms. Now that the glossary is a reality, it's easier to have those
discussions.


Ok.

--
Fabien.




Re: Add A Glossary

2020-04-04 Thread Corey Huinker
On Sat, Apr 4, 2020 at 2:55 AM Fabien COELHO  wrote:

>
> > BTW it's now visible at:
> > https://www.postgresql.org/docs/devel/glossary.html


Nice. I went looking for it yesterday and the docs hadn't rebuilt yet.


> ISTM that occurrences of these words elsewhere in the documentation should
> link to the glossary definitions?
>

Yes, that's a big project. I was considering writing a script to compile
all the terms as search terms, paired with their glossary ids, and then
invoke git grep to identify all pages that have term FOO but don't have
glossary-foo. We would then go about gloss-linking those pages as
appropriate, but only a few pages at a time to keep scope sane. Also, I'm
unclear about the circumstances under which we should _not_ tag a term. I
remember hearing that we should only tag it on the first usage, but is that
per section or per page?
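
A minimal sketch of the check described above, assuming the page text is
already loaded into memory rather than found through git grep. The function
name, the term-to-id mapping and the rule "a page counts as linked if the
glossary id appears anywhere in it" are illustrative assumptions, not an
existing tool:

```python
import re


def find_unlinked_terms(pages, glossary_ids):
    """Return {page name: [terms mentioned without a glossary link]}.

    pages:        hypothetical mapping of file name -> page source text
    glossary_ids: hypothetical mapping of term -> glossary id,
                  e.g. "tuple" -> "glossary-tuple"
    """
    missing = {}
    for name, text in pages.items():
        for term, gloss_id in glossary_ids.items():
            # Whole-word, case-insensitive match for the term itself.
            pattern = r"\b" + re.escape(term) + r"\b"
            mentioned = re.search(pattern, text, re.IGNORECASE) is not None
            # Assumption: any occurrence of the id means the page links to it.
            linked = gloss_id in text
            if mentioned and not linked:
                missing.setdefault(name, []).append(term)
    return missing
```

The result could then be worked through a few pages at a time. Whether only
the first usage per section or per page gets tagged is left open here, as it
is in the question above.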


> As the definitions are short and to the point, maybe the HTML display
> could (also) "hover" the definitions when the mouse passes over the word,
> using the "title" attribute?
>

I like that idea, if it doesn't conflict with accessibility standards
(maybe that's just titles on images, not sure).
I suspect we would want to just carry over the first sentence or so with a
... to avoid cluttering the screen with my overblown definition of a
sequence.
I suggest we pursue this idea in another thread, as we'd probably want to
do it for acronyms as well.
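
A rough sketch of the "first sentence plus ..." idea, assuming the hover text
is generated from the glossary definition when the HTML is built. The helper
names, the 120-character cutoff and the link layout are assumptions for
illustration only; nothing like this exists in the doc build:

```python
import html


def hover_title(definition, max_len=120):
    """Shorten a definition to its first sentence for a title attribute."""
    # Collapse line breaks and runs of spaces from the source text.
    text = " ".join(definition.split())
    # Treat the first ". " as the end of the first sentence.
    end = text.find(". ")
    first = text if end == -1 else text[: end + 1]
    shortened = first != text
    if len(first) > max_len:
        first = first[:max_len].rstrip()
        shortened = True
    if shortened:
        first += " ..."
    # Escape quotes and ampersands so the text is safe inside an attribute.
    return html.escape(first, quote=True)


def glossary_link(term, gloss_id, definition):
    """Build a link to a glossary anchor with the definition as hover text."""
    return '<a href="glossary.html#{}" title="{}">{}</a>'.format(
        gloss_id, hover_title(definition), html.escape(term)
    )
```

A short definition passes through whole; a longer one is cut after its first
sentence. The accessibility question raised above is not addressed by this
sketch.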


>
> "ACID" does not appear as an entry, nor in the acronyms sections. Also no
> DCL, although DML & DDL are in acronyms.
>

It needs to be in the acronyms page, and in light of all the docbook
wizardry that I've learned from Alvaro, those should probably get their own
acronym-foo ids as well. The cutoff date for 13 fast approaches, so it
might be for 14+ unless doc-only patches are treated differently.


> Entries could link to relevant wikipedia pages, like the acronyms section
> does?
>

They could. I opted not to do that because each external link invites
debate about how authoritative that link is, which is easier to do with
acronyms. Now that the glossary is a reality, it's easier to have those
discussions.


Re: Add A Glossary

2020-04-04 Thread Jürgen Purtz




- Server: is that really our definition?
   I thought that "server" is what the glossary defines as "instance", and
   the thing called "server" in the glossary should really be called "host".

   Maybe I am too Unix-centered.

   Many people I know use "instance" synonymous to "cluster".


Currently our documentation uses 'server', 'database server', 'host', 
'instance', ...  in an indifferent way. Similar problem with 
database/cluster. Now we have the chance to come to a conclusion about 
preferred terms and their exact meaning. Definitions in the glossary 
shall be the guideline, the documentation itself can adopt these terms 
over time.


Here is my point of view. We have distinguishable things:

(1) (virtual) hardware

(2) an abstract structure of several object types, which models a 
management system for data


(3) a group of closely related processes. They implement the internal 
'business logic' or 'work flow' of (2).


(4) abstract data, which fits into (2)

(5) a physical representation of (4). Mainly and long lasting on disc, 
but - partly - mirrored in RAM.


(6) client processes, which connect to (3)


IMO for (1) the two terms 'server' and 'host' both have their 
justification, depending on the context. There are historical terms 
('server-side', 'foreign server', 'client/server architecture', 'host' 
or 'host name' for IP-specification, 'host variable') which cannot be 
changed. Therefore we shall accept both with identical definition and use 
them as synonyms. Independent from this, there are many paragraphs in 
the documentation, where they are used in a misleading sense ('server 
crash', '... started the server', 'database server'). They should be 
changed over time.


For me, (3) is an 'instance' and (5) is a 'cluster'. There is a 1:1 
relation between the two, because one 'instance' controls exactly one 
'cluster'. But the 'instance' consists of processes and memory, whereas 
the 'cluster' consists of databases which reside (mainly) on disc.


Concerning (6) we are not interested in any hardware-question. We are 
only interested in the processes, which connect to backend processes. We 
should only define the term "Client process".


Kind regards, Jürgen






Re: Add A Glossary

2020-04-03 Thread Fabien COELHO




BTW it's now visible at:
https://www.postgresql.org/docs/devel/glossary.html


Awesome! Linking between defs and to relevant sections is great.

BTW, I'm in favor of "an SQL" because I pronounce it "ess-kew-el", but I 
guess that people who say "sequel" would prefer "a SQL". Failing that, I'm 
fine with some heterogeneity, life is diverse!


ISTM that occurrences of these words elsewhere in the documentation should 
link to the glossary definitions?


As the definitions are short and to the point, maybe the HTML display 
could (also) "hover" the definitions when the mouse passes over the word, 
using the "title" attribute?


"ACID" does not appear as an entry, nor in the acronyms sections. Also no 
DCL, although DML & DDL are in acronyms.


Entries could link to relevant wikipedia pages, like the acronyms section 
does?


--
Fabien.




Re: Add A Glossary

2020-04-03 Thread Laurenz Albe
On Fri, 2020-04-03 at 16:01 -0500, Justin Pryzby wrote:
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Great!

Some comments:

- SQL object: There are more kinds of objects, like roles or full text
  dictionaries.
  Perhaps better:

Anything that is created with a CREATE statement, for example ...
Most objects belong to a database schema, except ...

  Or do we consider a replication slot to be an object?

- The glossary has "Primary (server)", but not "Standby (server)".
  That should be a synonym for "Replica".

- Server: is that really our definition?
  I thought that "server" is what the glossary defines as "instance", and
  the thing called "server" in the glossary should really be called "host".

  Maybe I am too Unix-centered.

  Many people I know use "instance" synonymous to "cluster".

- Role: I understand the motivation behind the definition (except that the
  word "instance" is ill chosen), but a role is more than a collection of
  privileges.
  How can a collection of privileges have a password or own an object?
  Perhaps, instead of the first sentence:

A database object used for authentication, authorization and ownership.
Both database users and user groups are "roles" in PostgreSQL.

  In the second sentence, "roles" is mis-spelled as "roless".

- Null

  I think it should say "It represents the absence of *a definite* value."
  Usually it is better to think of NULL as "unknown".

- Function

  I don't know if "transformation of data" describes it well.
  Quite a lot of functions in PostgreSQL have side effects.
  How about:

Procedural code stored in the database that can be used in SQL statements.

Yours,
Laurenz Albe





Re: Add A Glossary

2020-04-03 Thread Erik Rijkers

On 2020-04-03 22:51, Alvaro Herrera wrote:

On 2020-Apr-03, Erik Rijkers wrote:


On 2020-04-03 18:45, Alvaro Herrera wrote:
> Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> and Roger for the great help, and to Justin for the extensive review and
> Fabien for the initial discussion.

A few improvements:


Thanks!  That gives me the attached patch.


Should also be a lemma in the glossary:

ACID


Agreed.  Wording suggestions welcome.


How about:

"
ACID

Atomicity, consistency, isolation, and durability. ACID is a set of 
properties of database transactions intended to guarantee validity even 
in the event of power failures, etc.
ACID is concerned with how the database recovers from such failures that 
might occur while processing a transaction.

"

'archaic' should maybe be 'obsolete'. That seems to me to be an easier word
for non-native speakers.


Bummer ;-)


OK - we'll figure it out :)



--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Add A Glossary

2020-04-03 Thread Justin Pryzby
On Fri, Apr 03, 2020 at 05:51:43PM -0300, Alvaro Herrera wrote:
> - The internal representation of one value of a SQL
> + The internal representation of one value of an SQL

I'm not sure about this one.  The new glossary says "a SQL" seven times, and
doesn't say "an sql" at all.

"An SQL" does appear to be more common in the rest of the docs, but if you
change one, I think you'd change them all.

BTW it's now visible at:
https://www.postgresql.org/docs/devel/glossary.html

-- 
Justin




Re: Add A Glossary

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Erik Rijkers wrote:

> On 2020-04-03 18:45, Alvaro Herrera wrote:
> > Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> > and Roger for the great help, and to Justin for the extensive review and
> > Fabien for the initial discussion.
> 
> A few improvements:

Thanks!  That gives me the attached patch.

> Should also be a lemma in the glossary:
> 
> ACID

Agreed.  Wording suggestions welcome.

> 'archaic' should maybe be 'obsolete'. That seems to me to be an easier word
> for non-native speakers.

Bummer ;-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..b5155e1a85 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -48,7 +48,7 @@

 
  In reference to a datum:
- the fact that its value that cannot be broken down into smaller
+ the fact that its value cannot be broken down into smaller
  components.
 

@@ -270,14 +270,14 @@
  global SQL objects. The
  cluster is managed by exactly one
  instance. A newly created
- Cluster will have three databases created automatically. They are
+ cluster will have three databases created automatically. They are
  template0, template1, and
  postgres. It is expected that an application will
  create one or more additional database aside from these three.
 
 
  (Don't confuse the PostgreSQL-specific term
- Cluster with the SQL
+ cluster with the SQL
  command CLUSTER).
 

@@ -363,7 +363,7 @@

 
  A restriction on the values of data allowed within a
- Table.
+ table.
 
 
  For more information, see
@@ -424,7 +424,7 @@
Datum

 
- The internal representation of one value of a SQL
+ The internal representation of one value of an SQL
  data type.
 

@@ -617,7 +617,7 @@

 
  Contains the values of row
- attributes (i.e. the data) for a
+ attributes (i.e., the data) for a
  relation.
  The heap is realized within
  segment files.
@@ -835,7 +835,7 @@

 
  A relation that is
- defined in the same way that a a view
+ defined in the same way that a view
  is, but stores data in the same way that a
  table does. It cannot be
  modified via INSERT, UPDATE, or
@@ -900,7 +900,7 @@
 
  In reference to a
  partitioned table:
- One of the tables that each contain part of the data of the partitioned table,
+ One of multiple tables that each contain part of the data of the partitioned table,
  which is said to be the parent.
  The partition is itself a table, so it can also be queried directly;
  at the same time, a partition can sometimes be a partitioned table,
@@ -910,7 +910,7 @@

 
  In reference to a window function:
- a partition is a user-defined criteria that identifies which neighboring
+ a partition is a user-defined criterion that identifies which neighboring
  rows can be considered by the
  function.
 
@@ -1103,7 +1103,7 @@
 
  A data structure transmitted from a
  backend process to
- client program upon the completion of a SQL
+ client program upon the completion of an SQL
  command, usually a SELECT but it can be an
  INSERT, UPDATE, or
  DELETE command if the RETURNING
@@ -1134,8 +1134,8 @@

 
  A collection of access privileges to the
- instance.
- Roless are themselves a privilege that can be granted to other roles.
+ instance.
+ Roles are themselves a privilege that can be granted to other roles.
  This is often done for convenience or to ensure completeness
  when multiple users need
  the same privileges.
@@ -1151,7 +1151,7 @@
Rollback

 
- A command to undo all of the operations performed since the beginning
+ A command to undo all operations performed since the beginning
  of a transaction.
 
 
@@ -1170,7 +1170,7 @@
Savepoint

 
- A special mark inside the sequence of steps in a
+ A special mark in the sequence of steps in a
  transaction.
  Data modifications after this point in time may be reverted
  to the time of the savepoint.
@@ -1192,7 +1192,8 @@
  SQL object must reside in exactly one schema.
 
 
- The names of SQL objects of the same type in the same schema are enforced unique.
+ The names of SQL objects of the same type in the same schema are enforced
+ to be unique.
  There is no restriction on reusing a name in multiple schemas.
 
 
@@ -1205,7 +1206,7 @@


 
- More generically, the term Schema is used to mean
+ More generically, the term schema is used to mean
  all data descriptions (table definitions,
  constraints, comments, etc)
  for a

Re: Add A Glossary

2020-04-03 Thread Erik Rijkers

On 2020-04-03 18:45, Alvaro Herrera wrote:
> Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> and Roger for the great help, and to Justin for the extensive review and
> Fabien for the initial discussion.


A few improvements:

'its value that cannot'  should be
'its value cannot'

'A newly created Cluster'  should be
'A newly created cluster'

'term Cluster'  should be
'term cluster'

'allowed within a Table.'  should be
'allowed within a table.'

'of a SQL data type.'  should be
'of an SQL data type.'

'A SQL command'  should be
'An SQL command'

'i.e. the data'  should be
'i.e., the data'

'that a a view is'  should be
'that a view is'

'One of the tables that each contain part'  should be
'One of multiple tables that each contain part'

'a partition is a user-defined criteria'  should be
'a partition is a user-defined criterion'

'Roless are'  should be
'Roles are'

'undo all of the operations'  should be
'undo all operations'

'A special mark inside the sequence of steps'  should be
'A special mark in the sequence of steps'

'are enforced unique'  should be (?)
'are enforced to be unique'

'the term Schema is used'  should be
'the term schema is used'

'belong to exactly one Schema.'  should be
'belong to exactly one schema.'

'about the Cluster's activities'  should be
'about the cluster's activities'

'the most common form of Relation'  should be
'the most common form of relation'

'A Trigger executes'  should be
'A trigger executes'

'and other closely related garbage-collection-like processing'  should be
'and other processing'

'each of the changes are replayed'  should be
'each of the changes is replayed'

Should also be a lemma in the glossary:

ACID


'archaic' should maybe be 'obsolete'. That seems to me to be an easier 
word for non-native speakers.



Thanks,

Erik Rijkers




Re: Add A Glossary

2020-04-03 Thread Roger Harkavy
On Fri, Apr 3, 2020 at 1:34 PM Corey Huinker 
wrote:

> Thanks for all your work on this!
>

And to add on to Corey's message of thanks, I also want to thank everyone
for their input and assistance on that. I am very grateful for the
opportunity to contribute to this project!


Re: Add A Glossary

2020-04-03 Thread Corey Huinker
>
> we have it, we can start thinking of patching the main part of the docs
> to make reference to it by using  in key spots.  Right now
> the glossary links to itself, but it makes lots of sense to have other
> places point to it.
>

I have some ideas about how to patch the main docs, but will leave those to
a separate thread.


> * I commented out the definition of "sequence", which seemed to go into
>   excessive detail.  Let's have a more concise definition?
>

That one's my fault.


>
> Patches for these omissions, and other contributions, welcome.
>

Thanks for all your work on this!


Re: Add A Glossary

2020-04-03 Thread Alvaro Herrera
Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
and Roger for the great help, and to Justin for the extensive review and
Fabien for the initial discussion.

This is just a starting point.  Let's keep improving it.  And now that
we have it, we can start thinking of patching the main part of the docs
to make reference to it by using  in key spots.  Right now
the glossary links to itself, but it makes lots of sense to have other
places point to it.

On 2020-Apr-02, Justin Pryzby wrote:

> We already have Session:
> A Connection to the Database. 

Yes, but I didn't like that much, so I rewrote it -- I was asking for
suggestions on how to improve it further.  While I think we use those
terms (connection and session) interchangeably sometimes, they're not
exactly the same and the glossary should be more precise or at least
less vague about the distinction.

> I propose: Client:
>   A host (or a process on a host) which connects to a server to make
> queries or other requests.
> 
> But note, "host" is still defined as "server", which I didn't like.
> 
> Maybe it should be:
>   A computer which may act as a >client< or a >server<.

I changed all these terms, and a few others, added a couple more and
commented out some that I was not happy with, and pushed.

I think this still needs more work:

* We had "serializable", but none of the other isolation levels were
  defined.  If we think we should define them, let's define them all.
  But also the definition we had for serializable was not correct;
  it seemed more suited to define "repeatable read".

* I commented out the definition of "sequence", which seemed to go into
  excessive detail.  Let's have a more concise definition?

* We're missing exclusion constraints, and NOT NULL which is also a
  weird type of constraint.

Patches for these omissions, and other contributions, welcome.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-02 Thread Justin Pryzby
On Thu, Apr 02, 2020 at 07:09:32PM -0300, Alvaro Herrera wrote:
> "partition" instead).  If you (or anybody) have suggestions for the
> definition of "client" and "session", I'm all ears.

We already have Session:
A Connection to the Database. 

I propose: Client:
A host (or a process on a host) which connects to a server to make
queries or other requests.

But note, "host" is still defined as "server", which I didn't like.

Maybe it should be:
A computer which may act as a >client< or a >server<.

-- 
Justin




Re: Add A Glossary

2020-04-02 Thread Alvaro Herrera
On 2020-Apr-02, Corey Huinker wrote:

> On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz  wrote:
> 
> > +1 and many thanks to Alvaro's edits.
> >
> >
> I did some of the grunt work Alvaro alluded to in v6, and the results are
> attached and they build, which means there are no invalid links.

Thank you!  I had been working on some other changes myself, and merged
most of your changes.  I give you v8.

> * renamed id glossary-temporary-tables to glossary-temporary-table

Good.

> * temporarily re-added an id for glossary-row as we have many references to
> that. unsure if we should use the term Tuple in all those places or say Row
> while linking to glossary-tuple, or something else

I changed these to link to glossary-tuple; that entry already explains
these two other terms, so this seems acceptable.

> * temporarily re-added an id for glossary-segment, glossary-wal-segment,
> glossary-analytic-function, as those were also referenced and will need
> similar decisions made

Ditto.

> * added a stub entry for glossary-unique-index, unsure if it should have a
> definition on its own, or we split it into unique and index.

I changed Unique Index into Unique Constraint, which is supposed to be
the overarching concept.  Used that in the definition of primary key.

> * I noticed several cases where a glossterm is used twice in a definition,
> but didn't de-term them

Did that for most I found, but I expect that some remain.

> * I'm curious about how we should tag a term when using it in its own
> definition. same as anywhere else?

I think we should not tag those.

I fixed the definition of global object as mentioned previously.  Also
added "client", made "connection" have less importance compared to
"session", and removed "window frame" (made "window function" refer to
"partition" instead).  If you (or anybody) have suggestions for the
definition of "client" and "session", I'm all ears.

I'm quite liking the result of this now.  Thanks for all your efforts.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..7d981a9223
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1692 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+
+ 
+  
+   Aggregate Function
+   
+
+ A function that
+ combines (aggregates) multiple input values,
+ for example by counting, averaging or adding,
+ yielding a single output value.
+
+
+ For more information, see
+ .
+
+
+   
+  
+
+  
+   Analyze (operation)
+   
+
+ The process of collecting statistics from data in
+ tables
+ and other relations
+ to help the query planner
+ to make decisions about how to execute
+ queries.
+
+   
+  
+
+  
+   Analytic Function
+   
+  
+
+  
+   Atomic
+   
+
+ In reference to a datum:
+ the fact that its value that cannot be broken down into smaller
+ components.
+
+   
+   
+
+ In reference to a
+ database transaction:
+ see atomicity.
+
+   
+  
+
+  
+   Atomicity
+   
+
+ The property of a transaction
+ that either all its operations complete as a single unit or none do.
+ In addition, if a system failure occurs during the execution of a
+ transaction, no partial results are visible after recovery.
+ This is one of the ACID properties.
+
+   
+  
+
+  
+   Attribute
+   
+
+ An element with a certain name and data type found within a
+ tuple or
+ table.
+
+   
+  
+
+  
+   Autovacuum (process)
+   
+
+ A set of background processes that routinely perform
+ vacuum
+ and analyze
+ operations.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Backend (process)
+   
+
+ Processes of an instance
+ which act on behalf of client sessions
+ and handle their requests.
+
+
+ (Don't confuse this term with the similar terms
+ Background Worker or
+ Background Writer).
+
+   
+  
+
+  
+   Background Worker (process)
+   
+
+ Individual processes within an instance,
+ which run system- or user-supplied code.
+ They provide infrastructure for several features in
+ PostgreSQL, such as 
+ logical replication
+ and parallel queries.
+ Extensions can add
+ custom background worker processes, as well.
+   
+   
+For more information, see
+.
+   
+   
+  
+
+  
+   Background Writer (process)
+   
+
+ A process that continuously writes dirty pages from
+ Shared Memory to
+ th

Re: Add A Glossary

2020-04-02 Thread Corey Huinker
On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz wrote:

> +1 and many thanks for Alvaro's edits.
>
>
I did some of the grunt work Alvaro alluded to in v6, and the results are
attached and they build, which means there are no invalid links.

Notes:
* no definition wordings were changed
* added a linkend to all remaining glossterms that do not immediately
follow a glossentry
* renamed id glossary-temporary-tables to glossary-temporary-table
* temporarily re-added an id for glossary-row as we have many references to
that. unsure if we should use the term Tuple in all those places or say Row
while linking to glossary-tuple, or something else
* temporarily re-added an id for glossary-segment, glossary-wal-segment,
glossary-analytic-function, as those were also referenced and will need
similar decisions made
* added a stub entry for glossary-unique-index, unsure if it should have a
definition on its own, or we split it into unique and index.
* I noticed several cases where a glossterm is used twice in a definition,
but didn't de-term them
* I'm curious about how we should tag a term when using it in its own
definition. same as anywhere else?
From 4603ce04306e77f5508bb207b42e5dec1425e7c5 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 2 Apr 2020 15:32:43 -0400
Subject: [PATCH] glossary v7

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1589 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1591 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..edfcf9d725
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1589 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+
+ 
+  
+   Aggregate Function
+   
+
+ A function that
+ combines (aggregates) multiple input values,
+ for example by counting, averaging or adding,
+ yielding a single output value.
+
+
+ For more information, see
+ .
+
+
+   
+  
+
+  
+   Analyze (operation)
+   
+
+ The process of collecting statistics from data in
+ tables
+ and other relations
+ to help the query planner
+ to make decisions about how to execute
+ queries.
+
+   
+  
+
+  
+   Analytic Function
+   
+  
+
+  
+   Atomic
+   
+
+ In reference to a datum:
+ the fact that its value that cannot be broken down into smaller
+ components.
+
+   
+   
+
+ In reference to a
+ database transaction:
+ see atomicity.
+
+   
+  
+
+  
+   Atomicity
+   
+
+ The property of a transaction
+ that either all its operations complete as a single unit or none do.
+ This is one of the ACID properties.
+
+   
+  
+
+  
+   Attribute
+   
+
+ An element with a certain name and data type found within a
+ tuple or
+ table.
+
+   
+  
+
+  
+   Autovacuum
+   
+
+ Background processes that routinely perform
+ Vacuum and Analyze
+ operations.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Backend (process)
+   
+
+ Processes of an Instance which act on behalf of
+ client Connections and handle their requests.
+
+
+ (Don't confuse this term with the similar terms
+ Background Worker or
+ Background Writer).
+
+   
+  
+
+  
+   Background Worker (process)
+   
+
+ Individual processes within an Instance, which
+ run system- or user-supplied code.  A typical use case is a process
+ which handles parts of an SQL query to take
+ advantage of parallel execution on servers with multiple
+ CPUs.
+   
+   
+For more information, see
+.
+   
+   
+  
+
+  
+   Background Writer (process)
+   
+
+ A process that continuously writes dirty pages from
+ Shared Memory to the file system.
+ It wakes up periodically, but
+ works only for a short period in order to distribute its expensive
+ I/O activity over time, instead of generating fewer
+ larger I/O peaks which could block other processes.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Cast
+   
+
+ A conversion of a Datum from its current data
+ type to another data type.
+
+   
+  
+
+  
+   Catalog
+   
+
+ The SQL standard uses this term to
+ indicate what is called a Database in
+ PostgreSQL's terminology.
+
+
+ This should not be confused with the
+ System Catalog.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Check Constraint
+   
+
+ A type of Constraint defined on a
+

Re: Add A Glossary

2020-04-02 Thread Jürgen Purtz

+1 and many thanks for Alvaro's edits.


Kind regards

Jürgen Purtz






Re: Add A Glossary

2020-04-02 Thread Alvaro Herrera
On 2020-Apr-01, Corey Huinker wrote:

> > I propose we define "planner" and make "optimizer" a <glosssee> entry.
> 
> I have no objection to more entries, or edits to entries, but am concerned
> that the process leads to someone having to manually merge several
> start-from-scratch patches, with no clear sense of when we'll be done. It
> may make sense to appoint an edit-collector.

I added "query planner" (please suggest edits) and "query" (using
Justin's def) and edited the defs of the ACID terms a little bit (in
particular moved the definition of atomic transaction to "atomicity"
from "atomic", and made the latter reference the former instead of the
other way around).  Also removed "Aggregating" as suggested upthread.  I
moved "master" over to "primary (server)", keeping the ref; we don't use
the former much.

There's only one "serious" mistake in the defs AFAICS which is that of
"global objects".  Only roles, tablespaces, and databases are global objects.
Objects that are not in a schema (extensions, etc) are not "global" in
that sense.

I think all <glossterm> used in definitions should have linkend.

I hope to get this committed today, but I'm going to sleep now so if you
want to suggest further edits, now's the time.  I think the terms
proposed by Justin are good to have -- please discuss the defs he
proposed -- only "normalized" I'd rather stay away from.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..fb4934d322
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1580 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+
+ 
+  
+   Aggregate Function
+   
+
+ A function that
+ combines (aggregates) multiple input values,
+ for example by counting, averaging or adding,
+ yielding a single output value.
+
+
+ For more information, see
+ .
+
+
+   
+  
+
+  
+   Analyze (operation)
+   
+
+ The process of collecting statistics from data in
+ tables
+ and other relations
+ to help the query planner
+ to make decisions about how to execute
+ queries.
+
+   
+  
+
+  
+   Analytic Function
+   
+  
+
+  
+   Atomic
+   
+
+ In reference to a datum:
+ the fact that its value that cannot be broken down into smaller
+ components.
+
+   
+   
+
+ In reference to a
+ database transaction:
+ see atomicity.
+
+   
+  
+
+  
+   Atomicity
+   
+
+ The property of a transaction
+ that either all its operations complete as a single unit or none do.
+ This is one of the ACID properties.
+
+   
+  
+
+  
+   Attribute
+   
+
+ An element with a certain name and data type found within a
+ tuple or
+ table.
+
+   
+  
+
+  
+   Autovacuum
+   
+
+ Background processes that routinely perform
+ Vacuum and Analyze
+ operations.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Backend (process)
+   
+
+ Processes of an Instance which act on behalf of
+ client Connections and handle their requests.
+
+
+ (Don't confuse this term with the similar terms
+ Background Worker or
+ Background Writer).
+
+   
+  
+
+  
+   Background Worker (process)
+   
+
+ Individual processes within an Instance, which
+ run system- or user-supplied code.  A typical use case is a process
+ which handles parts of an SQL query to take
+ advantage of parallel execution on servers with multiple
+ CPUs.
+   
+   
+For more information, see
+.
+   
+   
+  
+
+  
+   Background Writer (process)
+   
+
+ A process that continuously writes dirty pages from
+ Shared Memory to the file system.
+ It wakes up periodically, but
+ works only for a short period in order to distribute its expensive
+ I/O activity over time, instead of generating fewer
+ larger I/O peaks which could block other processes.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Cast
+   
+
+ A conversion of a Datum from its current data
+ type to another data type.
+
+   
+  
+
+  
+   Catalog
+   
+
+ The SQL standard uses this term to
+ indicate what is called a Database in
+ PostgreSQL's terminology.
+
+
+ This should not be confused with the
+ System Catalog.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Check Constraint
+   
+
+ A type of Constraint defined on a
+ Relation which restricts the values 

Re: Add A Glossary

2020-04-01 Thread Corey Huinker
>
> I propose we define "planner" and make "optimizer" a <glosssee> entry.
>

I have no objection to more entries, or edits to entries, but am concerned
that the process leads to someone having to manually merge several
start-from-scratch patches, with no clear sense of when we'll be done. It
may make sense to appoint an edit-collector.


> I further propose not to define the term "normalized", at least not for
> now.  That seems a very deep rabbit hole.
>

+1 I think we appointed a guy named Xeno to work on that definition. He
says he's getting close...


Re: Add A Glossary

2020-04-01 Thread Corey Huinker
>
> 2. I found out that "see xyz" and "see also" have bespoke markup in
> Docbook -- <glosssee> and <glossseealso>.  I changed some glossentries
> to use those, removing some glossdefs and changing a couple of paras to
> glossseealsos.  I also removed all "id" properties from glossentries
> that are just <glosssee>, because I think it's a mistake to have
> references to entries that will make the reader look up a different
> term; for me as a reader that's annoying, and I don't like to annoy
> people.
>

+1 These structural enhancements are great. I'm fine with removing the id
from just-glossee, and glad that we're keeping the entry to aid discovery.


> I rewrote the definition for "atomic" once again.  Made it two
> glossdefs, because I can.  If you don't like this, I can undo.
>

+1 Splitting this into two definitions, one for each context, is the most
sensible thing and I don't know why I didn't do that in the first place.


Re: Add A Glossary

2020-04-01 Thread Alvaro Herrera
On 2020-Apr-01, Justin Pryzby wrote:

> planner/optimizer: ...

I propose we define "planner" and make "optimizer" a <glosssee> entry.

I further propose not to define the term "normalized", at least not for
now.  That seems a very deep rabbit hole.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-01 Thread Justin Pryzby
On Tue, Mar 31, 2020 at 03:26:02PM -0400, Corey Huinker wrote:
> Just so I can prioritize my work, which of these things, along with your
> suggestions in previous emails, would you say is a barrier to considering
> this ready for a committer?

To answer your off-list inquiry, I'm not likely to mark it "ready" myself.
I don't know if any of these would be a "blocker" for someone else.

> > Here's some ideas; I'm *not* suggesting to include all of everything, but
> > hopefully start with a coherent, self-contained list.
> 
> > grep -roh '[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c
> > |sort -nr |less

I looked through that list and found these that might be good to include now or
in the future.  Probably all of these need language polishing; I'm not
requesting you to just copy them in just to say they're there.

join: concept of combining columns from two tables or other relations.  The
result of joining a table with N rows to another table with M rows might have
up to N*M rows (if every row from the first table "joins to" every row on the
second table).
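The N*M upper bound can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL purely so that it runs without a server; the table names `a` and `b`, the column `k`, and the helper `join_row_count` are hypothetical and not part of the proposed glossary.

```python
import sqlite3


def join_row_count(n, m, always_match=True):
    """Return the number of rows produced by joining an n-row table to an m-row table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (k INTEGER)")
    conn.execute("CREATE TABLE b (k INTEGER)")
    # With always_match, every row carries the same key, so each row of "a"
    # joins to every row of "b"; otherwise keys are distinct per row.
    conn.executemany(
        "INSERT INTO a VALUES (?)", [(1 if always_match else i,) for i in range(n)]
    )
    conn.executemany(
        "INSERT INTO b VALUES (?)", [(1 if always_match else i,) for i in range(m)]
    )
    (count,) = conn.execute(
        "SELECT count(*) FROM a JOIN b ON a.k = b.k"
    ).fetchone()
    conn.close()
    return count
```

When every key matches, a 3-row table joined to a 4-row table yields 12 rows; with distinct keys the same join yields only the rows whose keys coincide.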

normalized: A database schema is said to be "normalized" if its redundancy has
been removed.  Typically a "normalized" schema has a larger number of tables,
which include ID columns, and queries typically involve joining together
multiple tables.

query: a request sent by a client to a server, usually to return results or to
modify data on the server;

query plan: the particular procedure by which the database server executes a
query.  A simple query involving a single table might be planned using a
sequential scan or an index scan.  For a complex query involving multiple
tables joined together, the optimizer attempts to determine the
cheapest/fastest/best way to execute the query, by joining tables in the
optimal order, and with the optimal join strategy.

planner/optimizer: ...

transaction isolation:
psql: ...

synchronous: An action is said to be "synchronous" if it does not return to its
requestor until its completion;

bind parameters: arguments to a SQL query that are sent separately from the
query text.  For example, the query text "SELECT * FROM tbl WHERE col=$1" might
be executed for some certain value of the $1 parameter.  If parameters are sent
"in-line" as a part of the query text, they need to be properly
quoted/escaped/sanitized, to avoid accidental or malicious misbehavior if the
input contains special characters like semicolons or quotes.
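A minimal sketch of the difference between bound and in-line parameters, again using Python's sqlite3 module as a stand-in so it runs without a server. Note that sqlite3 spells the placeholder `?` where the PostgreSQL protocol uses `$1`; the function names and the sample data are hypothetical.

```python
import sqlite3


def find_by_col(conn, value):
    # The value is sent separately from the query text, so it needs no
    # quoting or escaping and is always treated as data.
    return conn.execute("SELECT * FROM tbl WHERE col = ?", (value,)).fetchall()


def find_by_col_unsafe(conn, value):
    # Anti-pattern, shown only for contrast: the value is pasted into the
    # query text, so quotes inside it change the meaning of the query.
    return conn.execute(f"SELECT * FROM tbl WHERE col = '{value}'").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (col TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?)", [("a",), ("b",)])
    hostile = "x' OR '1'='1"
    safe = find_by_col(conn, hostile)
    unsafe = find_by_col_unsafe(conn, hostile)
    conn.close()
    return safe, unsafe
```

With the bound parameter the hostile string matches no rows; pasted in-line, it turns the WHERE clause into a condition that is always true and returns the whole table.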

> > Maybe also:
> > object identifier
> > operator classes
> > operator family
> > visibility map

-- 
Justin




Re: Add A Glossary

2020-04-01 Thread Alvaro Herrera
On 2020-Apr-01, Jürgen Purtz wrote:

> 
> On 31.03.20 19:58, Justin Pryzby wrote:
> > On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> > > Please find some minor suggestions in the attachment. They are based on
> > > Corey's last patch 0001-glossary-v4.patch.
> > > @@ -220,7 +220,7 @@
> > > Records to the file system and creates a special
> > > checkpoint record. This process is initiated when predefined
> > > conditions are met, such as a specified amount of time has 
> > > passed, or
> > > -  a certain volume of records have been collected.
> > > +  a certain volume of records has been collected.
> > I think you're correct in that "volume" is singular.  But I think 
> > "collected"
> > > is the wrong word.  I suggested "written".
> > 
> "collected" is not optimal. I suggest "created". Please avoid "written", the
> WAL records will be written when the Checkpointer is running, not before.

Actually, you're mistaken; the checkpointer hardly writes any WAL
records.  In fact, it only writes *one* wal record, which is the
checkpoint record itself.  All the other wal records are written either
by the backends that produce it, or by the wal writer process.  By the
time the checkpoint runs, the wal records are long expected to be written.

Anyway I changed a lot of terms again, as well as changing the way the
terms are marked up -- for two reasons:

1. I didn't like the way the WAL-related entries were structured.  I
created a new entry called "Write-Ahead Log", which explains what WAL
is; this replaces the term "WAL Log", which is redundant (since the L in
WAL stands for "log" already). I kept the id as glossary-wal, though,
because it's shorter and *shrug*.  The definition uses the terms "wal
record" and "wal file", which I also rewrote.

2. I found out that "see xyz" and "see also" have bespoke markup in
Docbook -- <glosssee> and <glossseealso>.  I changed some glossentries
to use those, removing some glossdefs and changing a couple of paras to
glossseealsos.  I also removed all "id" properties from glossentries
that are just <glosssee>, because I think it's a mistake to have
references to entries that will make the reader look up a different
term; for me as a reader that's annoying, and I don't like to annoy
people.


While at it, I again came across "analytic", which is a term we don't
use much, so I made it a glosssee for "window function"; and while at it
I realized we didn't clearly explain what a window was. So I added
"window frame" for that.  I considered adding the term "partition" which
is used in this context, but decided it wasn't necessary.

I also added "(process)" to terms that define processes.  So
now we have "checkpointer (process)" and so on.

I rewrote the definition for "atomic" once again.  Made it two
glossdefs, because I can.  If you don't like this, I can undo.

I added "recycling".

I still have to go through some other defs.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..3c417f2fd3
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1526 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+
+ 
+  
+   Aggregating
+   
+
+ The act of combining a collection of data (input) values into
+ a single output value, which may not be of the same type as the
+ input values.
+
+   
+  
+
+  
+   Aggregate Function
+   
+
+ A function that
+ combines multiple input values,
+ for example by counting, averaging or adding them all together,
+ yielding a single output value.
+
+
+ For more information, see
+ .
+
+
+   
+  
+
+  
+   Analytic Function
+   
+  
+
+  
+   Atomic
+   
+
+ In reference to a datum:
+ the fact that its value that cannot be broken down into smaller
+ components.
+
+   
+   
+
+ In reference to a
+ database transaction:
+ the fact that all the operations in the transaction either complete as
+ a whole, or none of them become visible.
+
+   
+  
+
+  
+   Atomicity
+   
+
+ One of the ACID properties. This is the state of
+ being Atomic in the operational/transactional sense.
+
+   
+  
+
+  
+   Attribute
+   
+
+ An element with a certain name and data type found within a
+ Tuple or Table.
+
+   
+  
+
+  
+   Autovacuum
+   
+
+ Background processes that routinely perform
+ Vacuum and Analyze
+ operations.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Backend Process
+   
+
+ Proc

Re: Add A Glossary

2020-04-01 Thread Jürgen Purtz



On 31.03.20 19:58, Justin Pryzby wrote:

On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:

Please find some minor suggestions in the attachment. They are based on
Corey's last patch 0001-glossary-v4.patch.
@@ -220,7 +220,7 @@
Records to the file system and creates a special
checkpoint record. This process is initiated when predefined
conditions are met, such as a specified amount of time has passed, or
-  a certain volume of records have been collected.
+  a certain volume of records has been collected.

I think you're correct in that "volume" is singular.  But I think "collected"
is the wrong word.  I suggested "written".

"collected" is not optimal. I suggest "created". Please avoid "written", 
the WAL records will be written when the Checkpointer is running, not 
before. So:


 "a certain volume of WAL records has been 
collected."



Everything else is ok for me.

Kind regards, Jürgen




Re: Add A Glossary

2020-04-01 Thread Jürgen Purtz



On 31.03.20 20:07, Justin Pryzby wrote:

On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:

+   
+Aggregating
+
+ 
+  The act of combining a collection of data (input) values into
+  a single output value, which may not be of the same type as the
+  input values.

I think we maybe already tried to address this ; but could we define a noun
form ?  But not "aggregate" since it's the same word as the verb form.  I think
it would maybe be best to merge with "aggregate function", below.


Yes, combine the two. Or remove "aggregating" altogether.



+ 

+Log Writer
+
+ 
+  If activated and parameterized, the

I still don't know what parameterized means here.


Remove "and parameterized". The Log Writer always has (default) parameters.


Everything else is ok for me.

Kind regards, Jürgen




Re: Add A Glossary

2020-03-31 Thread Corey Huinker
On Tue, Mar 31, 2020 at 2:09 PM Justin Pryzby wrote:

> On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> > 1. It's obviously incomplete. There are more terms, a lot more, to add.
>
> How did you come up with the initial list of terms ?
>

1. I asked some newer database people to come up with a list of terms that
they used.
2. I then added some more terms that seemed obvious given that first list.
3. That combined list was long on general database concepts and theory, and
short on administration concepts
4. Then Jürgen suggested that we integrate his working list of terms, very
much focused on internals, so I did that.
5. Everything after that was applying suggested edits and new terms.


> Here's some ideas; I'm *not* suggesting to include all of everything, but
> hopefully start with a coherent, self-contained list.
>

I don't think this list will ever be complete. It will always be a work in
progress. I'd prefer to get the general structure of a glossary committed
in the short term, and we're free to follow up with edits that focus on the
wording.


>
> grep -roh '[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c
> |sort -nr |less
>
> Maybe also:
> object identifier
> operator classes
> operator family
> visibility map
>

Just so I can prioritize my work, which of these things, along with your
suggestions in previous emails, would you say is a barrier to considering
this ready for a committer?


Re: Add A Glossary

2020-03-31 Thread Justin Pryzby
On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> 1. It's obviously incomplete. There are more terms, a lot more, to add.

How did you come up with the initial list of terms ?

Here's some ideas; I'm *not* suggesting to include all of everything, but
hopefully start with a coherent, self-contained list.

grep -roh '[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c |sort -nr |less

Maybe also:
object identifier
operator classes
operator family
visibility map
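For anyone who would rather not rely on GNU sed's `\L`, the same kind of term-frequency survey can be sketched in Python. The tag name the grep pattern originally searched for did not survive in the archived message, so the `tag` default of `firstterm` below is only an assumption, and `count_terms` is a hypothetical helper. Unlike the grep pipeline, it reports the tag's content without the tag itself.

```python
import re
from collections import Counter
from pathlib import Path


def count_terms(root, tag="firstterm"):
    """Count lowercased contents of <tag>... in every file under root.

    Returns (term, count) pairs, most frequent first.
    """
    # Like grep, stop at the next "<" and do not match across line breaks.
    pattern = re.compile(rf"<{re.escape(tag)}>([^<\n]*)")
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            counts.update(match.lower() for match in pattern.findall(text))
    return counts.most_common()
```

Calling `count_terms("doc/src")` from the top of a source tree would produce the candidate list; terms that are marked up often but not yet defined are the natural ones to review first.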

-- 
Justin




Re: Add A Glossary

2020-03-31 Thread Justin Pryzby
On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:
> +   
> +Aggregating
> +
> + 
> +  The act of combining a collection of data (input) values into
> +  a single output value, which may not be of the same type as the
> +  input values.

I think we maybe already tried to address this ; but could we define a noun
form ?  But not "aggregate" since it's the same word as the verb form.  I think
it would maybe be best to merge with "aggregate function", below.

> +   
> +Consistency
> +
> + 
> +  One of the ACID properties. This means that the 
> database
> +  is always in compliance with its own rules such as 
> Table
> +  structure, Constraints,

I don't think the definition of "compliance" is good.  The state of being
consistent means an absence of corruption more than an absence of data
integrity issues (which could be caused by corruption).

> +   
> +Datum
> +
> + 
> +  The internal representation of a SQL data type.

Could you say "..used by PostgreSQL" ?

> +File Segment
> +
> + 
> +   A physical file which stores data for a given
> +   Heap or Index object.
> +   File Segments are limited in size by a
> +   configuration value and if that size is exceeded, it will be split
> +   into multiple physical files.

Say "if an object exceeds that size, then it will be stored across multiple
physical files".

> +  which handles parts of an SQL query to take
...
> +  A SQL command used to add new data into a

I mentioned before, please be consistent: "A SQL or An SQL".

> + 
> + 
> +  Many Instances can run on the same server as

Say "multiple" not many.

> +   
> +Join
> +
> + 
> +  A SQL keyword used in SELECT 
> statements for
> +  combining data from multiple Relations.

Could you add a link to the docs ?

> +   
> +Log Writer
> +
> + 
> +  If activated and parameterized, the

I still don't know what parameterized means here.

> +   
> +System Catalog
> +
> + 
> +  A collection of Tables and
> +  Views which describe the structure of all
> +  SQL objects of the Database

I would say "... a PostgreSQL >Database<"

> +  and the Global SQL Objects of the
> +  Cluster. The System
> +  Catalog resides in the schema
> +  pg_catalog. Main parts are mirrored as
> +  Views in the Schema
> +  information_schema.

I wouldn't say "mirror":  Some information is also exposed as >Views< in the
>information_schema< >Schema<.

> +   
> +Tablespace
> +
> + 
> +  A named location on the server filesystem. All SQL 
> Objects
> +  which require storage beyond their definition in the
> +  System Catalog
> +  must belong to a single tablespace.

Remove "single" as it sounds like we only support one.

> +Transaction
> +
> + 
> +  A combination of commands that must act as a single
> +  Atomic command: they all succeed or all fail
> +  as a single unit, and their effects are not visible to other
> +  Sessions until
> +  the Transaction is complete.

s/complete/committed/ ?


> +   
> +Unique
> +
> + 
> +  The condition of having no duplicate values in the same
> +  Relation. Often used in the concept of

s/concept/context/

> +Vacuum
> +
> + 
> +  The process of removing outdated MVCC

Maybe say "tuples which were deleted or obsoleted by an UPDATE".
But maybe you're trying to use generic language.

-- 
Justin




Re: Add A Glossary

2020-03-31 Thread Justin Pryzby
On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> Please find some minor suggestions in the attachment. They are based on
> Corey's last patch 0001-glossary-v4.patch.

> @@ -220,7 +220,7 @@
>Records to the file system and creates a special
>checkpoint record. This process is initiated when predefined
>conditions are met, such as a specified amount of time has passed, or
> -  a certain volume of records have been collected.
> +  a certain volume of records has been collected.

I think you're correct in that "volume" is singular.  But I think "collected"
is the wrong word.  I suggested "written".

>   
> -  One of the ACID properties. This means that 
> concurrently running 
> +  One of the ACID properties. This means that 
> concurrently running

These could maybe say "required" or "essential" >ACID< properties

>   
> +  In reference to a Table:
>A Table that can be queried directly,

Maybe: "In reference to a >Relation<: A table which can be queried directly,"

>table in the collection.
>   
>   
> -  When referring to an Analytic
> -  Function: a partition is a definition
> -  that identifies which neighboring
> +  In reference to a Analytic Function:
s/a/an/

> @@ -1333,7 +1334,8 @@
>  
>   
>The condition of having no duplicate values in the same
> -  Relation. Often used in the concept of
> +  Column of a Relation.
> +  Often used in the concept of

s/concept/context/, but  I said that before, so maybe it was rejected.

-- 
Justin




Re: Add A Glossary

2020-03-31 Thread Jürgen Purtz

On 30.03.20 19:10, Corey Huinker wrote:



On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz wrote:


On 27.03.20 21:12, Justin Pryzby wrote:
> On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
 + Archiver
>>> Can you change that to archiver process ?
>> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not a native English
>> speaker.
> I didn't like it due to lack of context.
>
> What about "wal archiver" ?
>
> It occurred to me when I read this.
>

https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
>
"WAL archiver" is ok for me. In the current documentation we have 2
places with "WAL archiver" and 4 with "archiver"-only
(high-availability.sgml, monitoring.sgml).

"backend process" is an exception to the other terms because the
standalone term "backend" is sensibly used in diverse situations.

Kind regards, Jürgen


I've taken Alvaro's fixes and done my best to incorporate the 
feedback into a new patch, which Roger (tech writer) reviewed yesterday.


The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of
existence

* de-tagged most second uses of a term within a definition


Did not do:
* Address the " Process" suffix suggested by Justin. There isn't
consensus on these changes, and I'm neutral on the matter
* change the Cast definition. I think it's important to express
that a cast has a FROM datatype as well as a TO
* anything host/server related as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok
with me, as the goal of this glossary is to aid in discovery of
term meanings, so knowing that two terms are interchangeable is
itself helpful


It is my hope that this revision represents the final _structural_ 
change to the glossary. New definitions and edits to existing 
definitions will, of course, go on forever.


Please find some minor suggestions in the attachment. They are based on 
Corey's last patch 0001-glossary-v4.patch.


Kind regards, Jürgen


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index eab14f3c9b..623922a4c3 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -36,10 +36,10 @@

 

-Analytic
+Analytic Function
 
  
-  A Function whose computed value can reference
+  A type of Functions whose result may be based on
   values found in nearby Rows of the same
   Result Set.
  
@@ -59,12 +59,12 @@
   into smaller components.
  
  
+  
   In reference to an operation: an event that cannot be completed in
   part; it must either entirely succeed or entirely fail. For
   example, a series of SQL statements can be
   combined into a Transaction, and that
-  transaction is said to be atomic.
-  Atomic.
+  transaction is said to be Atomic.
  
 

@@ -73,7 +73,7 @@
 Atomicity
 
  
-  One of the ACID properties. This is the state of 
+  One of the ACID properties. This is the state of
   being Atomic in the operational/transactional sense.
  
 
@@ -152,7 +152,7 @@
   A process that continuously writes dirty pages from
   Shared Memory to the file system.
   It wakes up periodically, but
-  works only for a short period in order to distribute expensive
+  works only for a short period in order to distribute its expensive
   I/O activity over time, instead of generating fewer
   larger I/O peaks which could block other processes.
  
@@ -220,7 +220,7 @@
   Records to the file system and creates a special
   checkpoint record. This process is initiated when predefined
   conditions are met, such as a specified amount of time has passed, or
-  a certain volume of records have been collected.
+  a certain volume of records has been collected.
  
 

@@ -303,7 +303,7 @@
 
  
   An established line of communication between a client process
-  and a server process. If the two involved processes reside on the
+  and a Backend Process. If the two involved processes reside on the
   same Server, then the connection can either use
   TCP/IP or Unix-domain sockets. Otherwise,
   only TCP/IP can be used.
@@ -470,7 +470,7 @@
   A type of Constraint defined on one or more
   Columns in a Table which
   requires the value(s) in those Columns to
-  identify

Re: Add A Glossary

2020-03-30 Thread Corey Huinker
On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz  wrote:

> On 27.03.20 21:12, Justin Pryzby wrote:
> > On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>  +Archiver
> >>> Can you change that to archiver process ?
> >> I prefer the short term without the addition of 'process' - concerning
> >> 'Archiver' as well as the other cases. But I'm not a native English
> >> speaker.
> > I didn't like it due to lack of context.
> >
> > What about "wal archiver" ?
> >
> > It occurred to me when I read this.
> >
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
> >
> "WAL archiver" is ok for me. In the current documentation we have 2
> places with "WAL archiver" and 4 with "archiver"-only
> (high-availability.sgml, monitoring.sgml).
>
> "backend process" is an exception to the other terms because the
> standalone term "backend" is sensibly used in diverse situations.
>
> Kind regards, Jürgen
>

I've taken Alvaro's fixes and done my best to incorporate the feedback
into a new patch, which Roger (tech writer) reviewed yesterday.

The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of existence

* de-tagged most second uses of a term within a definition


Did not do:
* Address the " Process" suffix suggested by Justin. There isn't
consensus on these changes, and I'm neutral on the matter
* change the Cast definition. I think it's important to express that a cast
has a FROM datatype as well as a TO
* anything host/server related as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok with me, as
the goal of this glossary is to aid in discovery of term meanings, so
knowing that two terms are interchangeable is itself helpful


It is my hope that this revision represents the final _structural_ change
to the glossary. New definitions and edits to existing definitions will, of
course, go on forever.
From 8a163603102f51a3eddfb05c51baf3b840c5d7f7 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 30 Mar 2020 13:08:27 -0400
Subject: [PATCH] glossary v4

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1551 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1553 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..eab14f3c9b
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1551 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+  
+   
+Aggregating
+
+ 
+  The act of combining a collection of data (input) values into
+  a single output value, which may not be of the same type as the
+  input values.
+ 
+
+   
+
+   
+Aggregate Function
+
+ 
+  A Function that combines multiple input values,
+  for example by counting, averaging or adding them all together,
+  yielding a single output value.
+ 
+ 
+  For more information, see
+  .
+ 
+ 
+  See also Window Function.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+ 
+ 
+  For more information, see
+  .
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or
+  Datum: an item that cannot be broken down
+  into smaller components.
+ 
+ 
+  In reference to an operation: an event that cannot be completed in
+  part; it must either entirely succeed or entirely fail. For
+  example, a series of SQL statements can be
+  combined into a Transaction, and that
+  transaction is said to be atomic.
+  Atomic.
+ 
+
+   
+
+   
+Atomicity
+
+ 
+  One of the ACID properties. This is the state of 
+  being Atomic in the operational/transactional sense.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a
+  Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Background Worker processes that routinely
+  perform Vacuum operations.
+ 
+ 
+  For more information, see
+  .
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Insta

Re: Add A Glossary

2020-03-29 Thread Jürgen Purtz

On 27.03.20 21:12, Justin Pryzby wrote:

On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:

+Archiver

Can you change that to archiver process ?

I prefer the short term without the addition of 'process' - concerning
'Archiver' as well as the other cases. But I'm not a native English
speaker.

I didn't like it due to lack of context.

What about "wal archiver" ?

It occurred to me when I read this.
https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com

"WAL archiver" is ok for me. In the current documentation we have 2 
places with "WAL archiver" and 4 with "archiver"-only 
(high-availability.sgml, monitoring.sgml).


"backend process" is an exception to the other terms because the 
standalone term "backend" is sensibly used in diverse situations.


Kind regards, Jürgen






Re: Add A Glossary

2020-03-27 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > +Archiver
> > Can you change that to archiver process ?
> 
> I prefer the short term without the addition of 'process' - concerning
> 'Archiver' as well as the other cases. But I'm not a native English
> speaker.

I didn't like it due to lack of context.

What about "wal archiver" ?

It occurred to me when I read this.
https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com

-- 
Justin




Re: Add A Glossary

2020-03-24 Thread Robert Haas
On Tue, Mar 24, 2020 at 3:40 PM Jürgen Purtz  wrote:
> On 24.03.20 19:46, Robert Haas wrote:
> Do we use shared_buffers for WAL ?
>
> No.
>
> What about the explanation in 
> https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers 
> (integer): The amount of shared memory used for WAL data that has not yet 
> been written to disk. The default setting of -1 selects a size equal to 
> 1/32nd (about 3%) of shared_buffers, ... "? My understanding was that the 
> parameter wal_buffers grabs some of the existing shared_buffers for its own 
> purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two 
> different shared memory areas?

Yes. The code adds up the shared memory requests from all of the
different subsystems and then allocates one giant chunk of shared
memory which is divided up between them. The overwhelming majority of
that memory goes into shared_buffers, but not all of it. You can use
the new pg_get_shmem_allocations() function to see how it's used. For
example, with shared_buffers=4GB:

rhaas=# select name, pg_size_pretty(size) from
pg_get_shmem_allocations() order by size desc limit 10;
         name         | pg_size_pretty
----------------------+----------------
 Buffer Blocks        | 4096 MB
 Buffer Descriptors   | 32 MB
                      | 32 MB
 XLOG Ctl             | 16 MB
 Buffer IO Locks      | 16 MB
 Checkpointer Data    | 12 MB
 Checkpoint BufferIds | 10 MB
 clog                 | 2067 kB
                      | 1876 kB
 subtrans             | 261 kB
(10 rows)

rhaas=# select count(*), pg_size_pretty(sum(size)) from
pg_get_shmem_allocations();
 count | pg_size_pretty
-------+----------------
    54 | 4219 MB
(1 row)

So, in this configuration, the whole shared memory segment is
4219MB, of which 4096MB is allocated to shared_buffers and the rest to
dozens of smaller allocations, with 1876 kB left over that might get
snapped up later by an extension that wants some shared memory.
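
As a rough cross-check of the figures above, the arithmetic can be sketched in Python. The two sizes are copied from the psql output. The 1/32 rule comes from the wal_buffers documentation quoted earlier in this thread; the cap at one WAL segment (16 MB by default) is an assumption taken from that documentation page rather than from this thread, and the documented lower bound is ignored. The function names are hypothetical.

```python
# Figures (in MB) copied from the pg_get_shmem_allocations() output above.
TOTAL_MB = 4219
SHARED_BUFFERS_MB = 4096


def default_wal_buffers_mb(shared_buffers_mb, wal_segment_mb=16):
    """Sketch of the wal_buffers = -1 rule: 1/32 of shared_buffers,
    assumed here to be capped at one WAL segment (16 MB by default)."""
    return min(shared_buffers_mb / 32, wal_segment_mb)


def other_allocations_mb(total_mb, shared_buffers_mb):
    """Shared memory in the segment that is not the shared_buffers block array."""
    return total_mb - shared_buffers_mb


# The WAL area is sized relative to shared_buffers but requested separately.
print(default_wal_buffers_mb(SHARED_BUFFERS_MB))
# Everything in the segment besides the buffer blocks themselves.
print(other_allocations_mb(TOTAL_MB, SHARED_BUFFERS_MB))
```

Under these assumptions the default WAL buffer size works out to 16 MB, which appears consistent with the "XLOG Ctl" entry, and the remaining allocations add up to 123 MB.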

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-03-24 Thread Jürgen Purtz


On 24.03.20 19:46, Robert Haas wrote:

Do we use shared_buffers for WAL ?

No.


What about the explanation in 
https://www.postgresql.org/docs/12/runtime-config-wal.html : 
"wal_buffers (integer): The amount of shared memory used for WAL data 
that has not yet been written to disk. The default setting of -1 selects 
a size equal to 1/32nd (about 3%) of shared_buffers, ... "? My 
understanding was that the parameter wal_buffers grabs some of the 
existing shared_buffers for its own purpose. Is this a 
misinterpretation? Are shared_buffers and wal_buffers two different 
shared memory areas?


Kind regards, Jürgen




Re: Add A Glossary

2020-03-24 Thread Corey Huinker
>
>
> > > +  Records to the file system and creates a special
> >
> > Does the checkpointer actually write WAL ?
>
> Yes.
>
> > An FK doesn't require the values in its table to be unique, right ?
>
> I believe it does require that the values are unique.
>
> > I think there's some confusion.  Constraints are not objects, right ?
>
> I think constraints are definitely objects. They have names and you
> can, for example, COMMENT on them.
>
> > Do we use shared_buffers for WAL ?
>
> No.
>
> (I have not reviewed the patch; these are just a few comments on your
> comments.)
>
>
I'm going to be coalescing the feedback into an updated patch very soon
(tonight/tomorrow), so please keep the feedback on the text/wording coming
until then.
If anyone has a first attempt at all the ACID definitions, I'd love to see
those as well.


Re: Add A Glossary

2020-03-24 Thread Robert Haas
On Fri, Mar 20, 2020 at 3:58 PM Justin Pryzby  wrote:
> > +  A process that writes dirty pages and WAL
> > +  Records to the file system and creates a special
>
> Does the checkpointer actually write WAL ?

Yes.

> An FK doesn't require the values in its table to be unique, right ?

I believe it does require that the values are unique.

> I think there's some confusion.  Constraints are not objects, right ?

I think constraints are definitely objects. They have names and you
can, for example, COMMENT on them.

> Do we use shared_buffers for WAL ?

No.

(I have not reviewed the patch; these are just a few comments on your comments.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-03-24 Thread Peter Eisentraut

On 2020-03-20 01:11, Alvaro Herrera wrote:

I gave this a look.  I first reformatted it so I could read it; that's
0001.  Second I changed all the long <link> items into <xref>s, which
are shorter and don't have to repeat the title of the referred-to page.
(Of course, this changes the link to be in the same style as every other
link in our documentation; some people don't like it. But it's our
style.)


AFAICT, all the <link> elements in this patch should be changed to <xref>.

If there is something undesirable about the output style, we can change 
that, but it's not this patch's job to make up its own rules.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-03-21 Thread Justin Pryzby
On Sat, Mar 21, 2020 at 03:08:30PM +0100, Jürgen Purtz wrote:
> On 21.03.20 00:03, Justin Pryzby wrote:
> > > > > +   
> > > > > +Host
> > > > > +
> > > > > + 
> > > > > +  See Server.
> > > > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > > > remove this.
> > > Sometimes the term "host" is used in a different meaning. Therefore we shall
> > > have this glossary entry for clarification that it shall be used only in the
> > > sense of a "server".
> > I think that suggests just removing "host" and consistently saying "server".
> 
> "server", "host", "database server": All three terms are used intensively in
> the documentation. When we define glossary terms, we should also take into
> account the consequences for those parts.

The documentation uses "host", but doesn't always mean "server".

$ git grep -Fw host doc/src/
doc/src/sgml/backup.sgml:   that you can perform this backup procedure from any remote host that has

pg_hba appears to be all about client "hosts".
FATAL:  no pg_hba.conf entry for host "123.123.123.123", user "andym", database "testdb"

-- 
Justin




Re: Add A Glossary

2020-03-21 Thread Jürgen Purtz

On 21.03.20 00:03, Justin Pryzby wrote:

+   
+Host
+
+ 
+  See Server.

Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
remove this.

Sometimes the term "host" is used in a different meaning. Therefore we shall
have this glossary entry for clarification that it shall be used only in the
sense of a "server".

I think that suggests just removing "host" and consistently saying "server".


"server", "host", "database server": All three terms are used 
intensively in the documentation. When we define glossary terms, we 
should also take into account the consequences for those parts. 
"database server" is the most diffuse. E.g.: In 'config.sgml' it is used 
in the sense of some hardware or VM "... If you have a dedicated 
database server with 1GB or more of RAM ..." as well as in the sense of 
an instance "... To start the database server on the command prompt 
...". Additionally the term is completely misleading. In both cases we 
do not mean something which is related to a database but something which 
is related to a cluster.


In the past, people accepted such blurs. My - minimal - intention is to 
raise awareness of such ambiguities, or - better - to clearly define the 
situation in the glossary. But this is only a first step. The second 
step shall be a rework of the documentation to use the preferred terms 
defined in the glossary. Because there will be a time gap between the 
two steps, we may want to be a little chatty in the glossary and define 
ambiguous terms as shown in the following example:


---

Server: The term "Server" denotes   .

Host: An outdated term which will be replaced by 
Server over time.


Database Server: An outdated term which sometimes denotes a 
Server and sometimes an 
Instance.


---

This is a pattern for all terms which we currently described with the 
phrase "See ...". Later, after reviewing the documentation by 
eliminating the non-preferred terms, the glossary entries with "An 
outdated term ..." can be dropped.


In the last days we have seen many huge and small proposals. I think it 
will be helpful to summarize this work by waiting for a patch from 
Alvaro containing everything he deems useful from his point of view.


Kind regards, Jürgen




Re: Add A Glossary

2020-03-20 Thread Corey Huinker
On Fri, Mar 20, 2020 at 6:32 PM Jürgen Purtz  wrote:

> man pages: Sorry, if I confused someone with my poor English. I just
> want to express in my 'offline' mail that we don't have to worry about
> man page generation. The patch doesn't affect files in the /ref
> subdirectory from where man pages are created.
>

It wasn't your poor English - everyone else understood what you meant. I
had wondered if our docs went into man page format as well, so my research
was still time well spent.


Re: Add A Glossary

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > +   
> > > +File Segment
> > > +
> > > + 
> > > +   If a heap or index file grows in size over 1 GB, it will be split
> > 1GB is the default "segment size", which you should define.
> 
> ???

"A <> or other >>Relation<<" is larger than a >Cluster's< segment size
is stored in multiple physical files.  This avoids file size limitations which
vary across operating systems."

https://www.postgresql.org/docs/devel/runtime-config-preset.html

ts=# SELECT name, setting, unit, category, short_desc FROM pg_settings WHERE name~'block_size|segment_size';
       name       | setting  | unit |    category    |                  short_desc
------------------+----------+------+----------------+----------------------------------------------
 block_size       | 8192     |      | Preset Options | Shows the size of a disk block.
 segment_size     | 131072   | 8kB  | Preset Options | Shows the number of pages per disk file.
 wal_block_size   | 8192     |      | Preset Options | Shows the block size in the write ahead log.
 wal_segment_size | 16777216 | B    | Preset Options | Shows the size of write ahead log segments.
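
To show where the "1 GB" in the File Segment definition comes from, here is a minimal Python sketch based on the two settings listed above. The helper names are hypothetical, and the block-to-segment mapping is a simplified illustration of splitting a relation into fixed-size files, not a description of the server's storage code.

```python
# Values copied from the pg_settings output above.
BLOCK_SIZE_BYTES = 8192        # block_size
SEGMENT_SIZE_PAGES = 131072    # segment_size, counted in 8kB pages


def segment_bytes(segment_size_pages, block_size_bytes):
    """Maximum size of one physical file of a relation, in bytes."""
    return segment_size_pages * block_size_bytes


def segment_for_block(block_number, segment_size_pages=SEGMENT_SIZE_PAGES):
    """Index of the segment file that would hold a given block
    (0 is the first file; a simplifying assumption for illustration)."""
    return block_number // segment_size_pages


print(segment_bytes(SEGMENT_SIZE_PAGES, BLOCK_SIZE_BYTES) / 1024**3)  # size in GiB
print(segment_for_block(131071), segment_for_block(131072))
```

With the default settings the product is exactly 1 GiB, so the glossary entry could refer to the segment size rather than hard-coding "1 GB".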

> > > +   
> > > +Heap
> > > +
> > > + 
> > > +  Contains the original values of Row attributes
> > I'm not sure what "original" means here ?
> 
> Yes, this may be misleading. I want to express, that values are stored in
> the heap (the 'original') and possibly repeated as a key in an index.

Maybe "this is the content of rows/attributes in >>Tables<< or other >>Relations<<".
or "this is the data store for ..."

> > > +   
> > > +Host
> > > +
> > > + 
> > > +  See Server.
> > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > remove this.
> 
> Sometimes the term "host" is used in a different meaning. Therefore we shall
> have this glossary entry for clarification that it shall be used only in the
> sense of a "server".

I think that suggests just removing "host" and consistently saying "server".

-- 
Justin




Re: Add A Glossary

2020-03-20 Thread Jürgen Purtz



On 20.03.20 20:58, Justin Pryzby wrote:

On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:

+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose
+  value may not be of the same type as the original values.
+  Aggregate Functions
+  combine multiple Rows that share a common set
+  of values into one Row, which means that the
+  only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")


+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+Archiver

Can you change that to archiver process ?



I prefer the short term without the addition of 'process' - concerning 
'Archiver' as well as the other cases. But I'm not a native English 
speaker.




+Atomic

..

+ 
+  In reference to an operation: An event that cannot be completed in
+  part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succeed and then fail,
..."


+Autovacuum

Say autovacuum process ?


+
+ 
+  Processes that remove outdated MVCC

I would say "A set of processes that remove..."


+  Records of the Heap and

I'm not sure, can you say "tuples" ?



This concerns the upcoming MVCC terms. We need a linguistic distinction 
between the different versions of 'records' or 'tuples'. In my 
understanding the term 'tuple' is nearer to a logical construct 
(relational algebra) and a 'record' some concrete implementation on 
disk. Therefore I prefer 'record' in this context.




+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of

Say DATABASE instance



-1: The term 'database' is used in an inflationary way. We shall restrict it to a 
few cases.




+Backend Server
+
+ 
+  See Instance.

same


+Background Worker
+
+ 
+  Individual processes within an Instance, which

same


+  run system- or user-supplied code. Typical use cases are processes
+  which handle parts of an SQL query to take
+  advantage of parallel execution on servers with multiple
+  CPUs.

I would say "A typical use case is"

+1

+Background Writer

Add "process" ?


+
+ 
+  Writes continuously dirty pages from Shared

Say "Continuously writes"

+1

+  Memory to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

+1

+Cast
+
+ 
+  A conversion of a Datum from its current data
+  type to another data type.

maybe just say
A conversion of a Datum to another data type.


+Catalog
+
+ 
+  The SQL standard uses this standalone term to
+  indicate what is called a Database in
+  PostgreSQL's terminology.

Maybe remove "standalone" ?


+Checkpointer

Process


+  A process that writes dirty pages and WAL
+  Records to the file system and creates a special

Does the checkpointer actually write WAL ?



YES, not only WAL Writer.



+  checkpoint record. This process is initiated when predefined
+  conditions are met, such as a specified amount of time has passed, or
+  a certain volume of records have been collected.

collected or written?

I would say:

+  A checkpoint is usually initiated by
+  a specified amount of time having passed, or
+  a certain volume of records having been written.



+-0



+Checkpoint
+
+ 
+  A  Checkpoint is a point in time

Extra space


+   
+Connection
+
+ 
+  A TCP/IP or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.



+1



+Constraint
+
+ 
+  A concept of restricting the values of data allowed within a
+  Table.

Just say: "A restriction on the values..."?


+Data Area

Remove this ?  I've never heard this phrase before.



grep on *.sgml delivers 4 occurrences.



+
+ 
+  The base directory on the filesystem of a
+  Server that contains all data files and
+  subdirectories associated with a Cluster with
+  the exception of tablespaces. The environment variable

Should add an entry for "tablespace".



+1



+Datum
+
+ 
+  The internal representation of a SQL data type.

I'm not sure if we should use "a SQL" or "an SQL", but not both.



grep | wc delivers 106 occurrences for "an SQL" and 63 for "a SQL". It 
depends on how people pronounce the term SQL: "an esquel" or "a sequel".




+Delete
+
+ 
+  A SQL command whose purpose is to remove

just say "which removes"



+1



+   
+File Segment
+
+ 
+   If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.



???



+   
+Foreign Data Wrapper
+
+ 
+  A means of representing data t

Re: Add A Glossary

2020-03-20 Thread Jürgen Purtz
man pages: Sorry, if I confused someone with my poor English. I just 
want to express in my 'offline' mail that we don't have to worry about 
man page generation. The patch doesn't affect files in the /ref 
subdirectory from where man pages are created.


review process: Yes, it will be time-consuming and it may be a hard 
job because a) the patch has multiple authors with divergent writing 
styles and b) the terms affect different fundamental issues: SQL basics 
and PG basics. Concerning PG basics in the past we used a wide range of 
similar terms with different meanings as well as different terms for the 
same matter - within our documentation as well as in secondary 
publications. The terms "backend server" / "instance" are such an 
example and there shall be a clear decision in favor of one of the two. 
Presumably we will see more discussions about the question which one is 
the preferred term (remember the discussion concerning the terms 
master/slave, primary/secondary some weeks ago).


ongoing: Intermediate questions for clarifications are welcome.


Kind regards, Jürgen






Re: Add A Glossary

2020-03-20 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
> +Aggregate
> +
> + 
> +  To combine a collection of data values into a single value, whose
> +  value may not be of the same type as the original values.
> +  Aggregate Functions
> +  combine multiple Rows that share a common set
> +  of values into one Row, which means that the
> +  only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")

> +Analytic
> +
> + 
> +  A Function whose computed value can reference
> +  values found in nearby Rows of the same
> +  Result Set.

> +Archiver

Can you change that to archiver process ?

> +Atomic
..
> + 
> +  In reference to an operation: An event that cannot be completed in
> +  part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succeed and then fail,
..."
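
To make the "entirely succeed or entirely fail" wording concrete, here is a small sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in so that the example is self-contained; it is not PostgreSQL, and the table and function names are hypothetical.

```python
import sqlite3


def transfer(conn, amount, fail_midway):
    """Run two UPDATEs as one transaction; optionally fail between them."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
            if fail_midway:
                raise RuntimeError("simulated failure between the two statements")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
    except RuntimeError:
        pass  # the transaction has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    transfer(conn, 30, fail_midway=True)
    after_failure = balances(conn)   # the first UPDATE must not be visible
    transfer(conn, 30, fail_midway=False)
    after_success = balances(conn)   # both UPDATEs are visible together
    conn.close()
    return after_failure, after_success


print(demo())
```

After the simulated failure the balances are unchanged, so the half-finished debit never becomes visible; only the complete transfer takes effect.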

> +Autovacuum

Say autovacuum process ?

> +
> + 
> +  Processes that remove outdated MVCC

I would say "A set of processes that remove..."

> +  Records of the Heap and

I'm not sure, can you say "tuples" ?

> +Backend Process
> +
> + 
> +  Processes of an Instance which act on behalf of

Say DATABASE instance

> +Backend Server
> +
> + 
> +  See Instance.
same

> +Background Worker
> +
> + 
> +  Individual processes within an Instance, which
same

> +  run system- or user-supplied code. Typical use cases are processes
> +  which handle parts of an SQL query to take
> +  advantage of parallel execution on servers with multiple
> +  CPUs.

I would say "A typical use case is"

> +Background Writer

Add "process" ?

> +
> + 
> +  Writes continuously dirty pages from Shared

Say "Continuously writes"

> +  Memory to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

> +Cast
> +
> + 
> +  A conversion of a Datum from its current data
> +  type to another data type.

maybe just say
A conversion of a Datum to another data type.

> +Catalog
> +
> + 
> +  The SQL standard uses this standalone term to
> +  indicate what is called a Database in
> +  PostgreSQL's terminology.

Maybe remove "standalone" ?

> +Checkpointer

Process

> +  A process that writes dirty pages and WAL
> +  Records to the file system and creates a special

Does the checkpointer actually write WAL ?

> +  checkpoint record. This process is initiated when predefined
> +  conditions are met, such as a specified amount of time has passed, or
> +  a certain volume of records have been collected.

collected or written?

I would say:
> +  A checkpoint is usually initiated by
> +  a specified amount of time having passed, or
> +  a certain volume of records having been written.

> +Checkpoint
> +
> + 
> +  A  Checkpoint is a point in time

Extra space

> +   
> +Connection
> +
> + 
> +  A TCP/IP or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.

> +Constraint
> +
> + 
> +  A concept of restricting the values of data allowed within a
> +  Table.

Just say: "A restriction on the values..."?

> +Data Area

Remove this ?  I've never heard this phrase before.

> +
> + 
> +  The base directory on the filesystem of a
> +  Server that contains all data files and
> +  subdirectories associated with a Cluster with
> +  the exception of tablespaces. The environment variable

Should add an entry for "tablespace".

> +Datum
> +
> + 
> +  The internal representation of a SQL data type.

I'm not sure if we should use "a SQL" or "an SQL", but not both.

> +Delete
> +
> + 
> +  A SQL command whose purpose is to remove

just say "which removes"

> +   
> +File Segment
> +
> + 
> +   If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.

> +   
> +Foreign Data Wrapper
> +
> + 
> +  A means of representing data that is not contained in the local
> +  Database as if were in local
> +  Table(s).

I'd say:

+ A means of representing data as a Table(s) even though
+ it is not contained in the local Database 


> +   
> +Foreign Key
> +
> + 
> +  A type of Constraint defined on one or more
> +  Columns in a Table which
> +  requires the value in those Columns to uniquely
> +  identify a Row in the specified
> +  Table.

An FK doesn't require the values in its table to be unique, right ?
I'd say something like: "..which enforces that the values in those Columns are
also present in an(other) table."
Reference Referential Integrity?
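
One reading that may help with the wording: the referencing columns can hold repeated values, while the referenced columns must be unique (a primary key or unique constraint) so that each value identifies one row. The sketch below illustrates this in Python with the standard-library sqlite3 module as a stand-in; it is not PostgreSQL, and the table names are hypothetical.

```python
import sqlite3


def demo_foreign_key():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_no INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer (id))"
    )
    conn.execute("INSERT INTO customer VALUES (1)")
    # The same referencing value may appear in many rows.
    conn.execute("INSERT INTO orders VALUES (10, 1)")
    conn.execute("INSERT INTO orders VALUES (11, 1)")
    # A value that is not present in the referenced table is rejected.
    try:
        conn.execute("INSERT INTO orders VALUES (12, 99)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute(
        "SELECT count(*) FROM orders WHERE customer_id = 1").fetchone()[0]
    conn.close()
    return count, rejected


print(demo_foreign_key())
```

Two orders share the same customer_id without complaint, while the row pointing at a non-existent customer is refused, which matches the "also present in an(other) table" phrasing.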

> +Function
> +
> + 
> +

Re: Add A Glossary

2020-03-20 Thread Corey Huinker
>
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.


If it makes you feel better, it's a mix of definitions I wrote that Roger
proofed and restructured, ones that Jürgen had written for a separate
effort which then got a Roger-pass, and then some edits of my own and some
by Jürgen which I merged without consulting Roger.


Re: Add A Glossary

2020-03-20 Thread Roger Harkavy
Alvaro, I know that you are joking, but I want to impress on everyone:
please don't feel like anyone here is breaking anything when it comes to
modifying the content and structure of this glossary.

I do have technical writing experience, but everyone else here is a subject
matter expert when it comes to the world of databases and how this one in
particular functions.

On Fri, Mar 20, 2020 at 1:51 PM Alvaro Herrera 
wrote:

> On 2020-Mar-20, Corey Huinker wrote:
>
> > > Jürgen mentioned off-list that the man page doesn't build. I was going to
> > > look into that, but if anyone has more familiarity with that, I'm listening.
>
> > Looking at this some more, I'm not sure anything needs to be done for man
> > pages.
>
> Yeah, I don't think he was saying that we needed to do anything to
> produce a glossary man page; rather that the "make man" command failed.
> I tried it here, and indeed it failed.  But on further investigation,
> after a "make maintainer-clean" it no longer failed.  I'm not sure what
> to make of it, but it seems that this patch needn't concern itself with
> that.
>
> I gave a read through the first few actual definitions.  It's a much
> slower work than I thought!  Attached you'll find the first few edits
> that I propose.
>
> Looking at the definition of "Aggregate" it seemed weird to have it
> stand as a verb infinitive.  I looked up other glossaries, found this
> one
> https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
> and realized that when they do verbs, they put the present participle
> (-ing) form.  So I changed it to "Aggregating", and split out the
> "Aggregate function" into its own term.
>
> In Atomic, there seemed to be excessive use of  in the
> definitions.  Style guides seem to suggest to do that only the first
> time you use a term in a definition.  I removed some markup.
>
> I'm not sure about some terms such as "analytic" and "backend server".
> I put them in XML comments for now.
>
> The other changes should be self-explanatory.
>
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Add A Glossary

2020-03-20 Thread Alvaro Herrera
On 2020-Mar-20, Corey Huinker wrote:

> > Jürgen mentioned off-list that the man page doesn't build. I was going to
> > look into that, but if anyone has more familiarity with that, I'm listening.

> Looking at this some more, I'm not sure anything needs to be done for man
> pages.

Yeah, I don't think he was saying that we needed to do anything to
produce a glossary man page; rather that the "make man" command failed.
I tried it here, and indeed it failed.  But on further investigation,
after a "make maintainer-clean" it no longer failed.  I'm not sure what
to make of it, but it seems that this patch needn't concern itself with
that.

I gave a read through the first few actual definitions.  It's a much
slower work than I thought!  Attached you'll find the first few edits
that I propose.

Looking at the definition of "Aggregate" it seemed weird to have it
stand as a verb infinitive.  I looked up other glossaries, found this
one
https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
and realized that when they do verbs, they put the present participle
(-ing) form.  So I changed it to "Aggregating", and split out the
"Aggregate function" into its own term.

In Atomic, there seemed to be excessive use of  in the
definitions.  Style guides seem to suggest doing that only the first
time you use a term in a definition.  I removed some markup.

I'm not sure about some terms such as "analytic" and "backend server".
I put them in XML comments for now.

The other changes should be self-explanatory.

It's hard to review work from a professional tech writer.  I'm under the
constant impression that I'm ruining somebody's perfect end product,
making a fool of myself.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 612ce6e5f4..05fca33d9b 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -6,25 +6,34 @@
   systems in general.
  
   
-   
-Aggregate
+   
+Aggregating
 
  
-  To combine a collection of data values into a single value, whose
-  value may not be of the same type as the original values.
-  Aggregate Functions
-  combine multiple Rows that share a common set
-  of values into one Row, which means that the
-  only data visible in the values in common, and the aggregates of the
-  non-common data.
+  The act of combining a collection of data (input) values into
+  a single output value, which may not be of the same type as the
+  input values.
+ 
+
+   
+
+   
+Aggregate Function
+
+  A Function that combines multiple input values,
+  for example by counting, averaging or adding them all together,
+  yielding a single output value.
  
  
   For more information, see
   .
  
+ 
+  See also Window Function.
 

 
+
 

 Archiver
 
  
-  A process that backs up WAL Files in order to
-  reclaim space on the file system.
+  A process that saves aside copies of WAL Files,
+  for the purposes of creating backup copies or keeping
+  Replicas current.
  
  
   For more information, see
@@ -59,16 +70,15 @@
 
  
   In reference to the value of an Attribute or
-  Datum: cannot be broken down into smaller
-  components.
+  Datum: the property that it cannot be broken down
+  into smaller components.
  
  
   In reference to an operation: An event that cannot be completed in
   part: it must either entirely succeed or entirely fail. A series of
   SQL statements can be combined into a
   Transaction, and that
-  transaction is said to be
-  Atomic.
+  transaction is said to be atomic.
  
 

@@ -112,6 +122,7 @@
 

 
+
 

 Background Worker
@@ -142,8 +154,9 @@
 Background Writer
 
  
-  Writes continuously dirty pages from Shared
-  Memory to the file system. It starts periodically, but
+  A process that continuously writes dirty pages from
+  Shared Memory to the file system.
+  It starts periodically, but
   works only for a short period in order to distribute expensive
   I/O activity over time instead of generating fewer
   large I/O peaks which could block other processes.
@@ -218,10 +231,9 @@
 Checkpoint
 
  
-  A  Checkpoint is a point in time
-  when all older dirty pages of the Shared
-  Memory, all older WAL records, and
-  a special Checkpoint record have been written
+  A point in time when all older dirty pages of the
+  Shared Memory, all older WAL records,
+  and a special Checkpoint record have been written
   and flushed to disk.
  
 
@@ -543,8 +555,8 @@
 
  
   A Relation that contains data derived from a
-  Table (or Relation such
-  as a M

Re: Add A Glossary

2020-03-20 Thread Corey Huinker
> Jürgen mentioned off-list that the man page doesn't build. I was going to
> look into that, but if anyone has more familiarity with that, I'm listening.

Looking at this some more, I'm not sure anything needs to be done for man
pages. man1 is for executables, man3 seems to be dblink and SPI, and man7
is all SQL commands. This isn't any of those. The only possible thing left
would be how to render the text of a foo
sgml/postgres.sgml:  &acronyms;
sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_] , ,
, 
sgml/stylesheet.css:acronym { font-style: inherit; }

filelist.sgml, postgres.sgml, and stylesheet.css already have the
corresponding change, and the release.sgml is just an incidental mention of
acronym.

Of course I could be missing something.

>


Re: Add A Glossary

2020-03-19 Thread Corey Huinker
On Thu, Mar 19, 2020 at 8:11 PM Alvaro Herrera 
wrote:

> I gave this a look.  I first reformatted it so I could read it; that's
> 0001.  Second I changed all the long  items into s, which
>

Thanks! I didn't know about xrefs, that is a big improvement.


> are shorter and don't have to repeat the title of the referred-to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)
>
> There are some mistakes.  "Tupple" is the most glaring one -- not just the
> typo but also the fact that it goes to sql-revoke.  A few definitions
> we'll want to modify.  Nothing too big.  In general I like this work and
> I think we should have it in pg13.
>
> Please bikeshed the definition of your favorite term, and suggest what
> other terms to add.  No pointing out of mere typos yet, please.
>

Jürgen mentioned off-list that the man page doesn't build. I was going to
look into that, but if anyone has more familiarity with that, I'm listening.


> I think we should have the terms Consistency, Isolation, Durability.
>

+1


Re: Add A Glossary

2020-03-19 Thread Alvaro Herrera
I gave this a look.  I first reformatted it so I could read it; that's
0001.  Second I changed all the long  items into s, which
are shorter and don't have to repeat the title of the referred-to page.
(Of course, this changes the link to be in the same style as every other
link in our documentation; some people don't like it. But it's our
style.)

There are some mistakes.  "Tupple" is the most glaring one -- not just the
typo but also the fact that it goes to sql-revoke.  A few definitions
we'll want to modify.  Nothing too big.  In general I like this work and
I think we should have it in pg13.

Please bikeshed the definition of your favorite term, and suggest what
other terms to add.  No pointing out of mere typos yet, please.

I think we should have the terms Consistency, Isolation, Durability.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 72f7a425fcd9a803010294e6974ecd7680a9aee6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 19 Mar 2020 18:29:12 -0300
Subject: [PATCH 1/2] glossary

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1441 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1443 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..cf20fb759c
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1441 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose
+  value may not be of the same type as the original values.
+  Aggregate Functions
+  combine multiple Rows that share a common set
+  of values into one Row, which means that the
+  only data visible in the values in common, and the aggregates of the
+  non-common data.
+ 
+ 
+  For more information, see
+  Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+ 
+ 
+  For more information, see
+  Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to
+  reclaim space on the file system.
+ 
+ 
+  For more information, see
+  Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or
+  Datum: cannot be broken down into smaller
+  components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in
+  part: it must either entirely succeed or entirely fail. A series of
+  SQL statements can be combined into a
+  Transaction, and that
+  transaction is said to be
+  Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a
+  Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC
+  Records of the Heap and
+  Index.
+ 
+ 
+  For more information, see
+  Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of
+  client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background
+  Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which
+  run system- or user-supplied code. Typical use cases are processes
+  which handle parts of an SQL query to take
+  advantage of parallel execution on servers with multiple
+  CPUs.
+
+
+ For more information, see
+ Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Writes continuously dirty pages from Shared
+  Memory to the file system. It starts periodically, but
+  works only for a short period in order to distribute expensive
+  I/O activity over time instead of generating fewer
+  large I/O peaks which could block other processes.
+ 
+ 
+  For more information, see
+  Server Configuration: Resource Consumption.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A conversi

Re: Add A Glossary

2020-03-18 Thread Corey Huinker
On Fri, Mar 13, 2020 at 12:18 AM Jürgen Purtz  wrote:

>
> The statement that names of schema objects are unique isn't *strictly* true,
> just *mostly* true. Take the case of a unique constraint.
>
> Concerning CONSTRAINTS you are right. Constraints seem to be an exception:
>
>    - Their names belong to a schema, but are not necessarily unique
>      within this context:
>      https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
>    - There is a UNIQUE index within the system catalog pg_constraint:
>      "pg_constraint_conrelid_contypid_conname_index"
>      UNIQUE, btree (conrelid, contypid, conname), which expresses that
>      names are unique within the context of a table/constraint-type.
>      Nevertheless tests have shown that some stronger restrictions exist
>      across table borders (which seems to be implemented in CREATE
>      statements, or as a consequence of the correlation you mentioned
>      between constraint and index?).
>
> I hope that there are no more such exceptions to the global rule 'object
> names in a schema are unique':
> https://www.postgresql.org/docs/current/sql-createschema.html
>
> These facts must be mentioned as a short note in the glossary and in more
> detail in the later patch about the architecture.
>

I did what I could to address the near uniqueness, as well as incorporate
your earlier edits into this new, squashed patch attached.
From dbce6922194eb4ad8de57e81e182b9a6eebf859e Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page with revisions

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1072 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1074 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..d28bfb6fcf
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1072 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible in the values in common, and the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken down into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Writes continuously dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating fewe

Re: Add A Glossary

2020-03-12 Thread Jürgen Purtz


> The statement that names of schema objects are unique isn't
> /strictly/ true, just /mostly/ true. Take the case of a unique
> constraint.


Concerning CONSTRAINTS you are right. Constraints seem to be an exception:

 * Their names belong to a schema, but are not necessarily unique
   within this context:
   https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
 * There is a UNIQUE index within the system catalog pg_constraint:
   "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree
   (conrelid, contypid, conname), which expresses that names are unique
   within the context of a table/constraint-type. Nevertheless tests
   have shown that some stronger restrictions exist across
   table borders (which seems to be implemented in CREATE statements,
   or as a consequence of the correlation you mentioned between
   constraint and index?).
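
As a rough illustration of what such a composite UNIQUE index permits,
the sketch below uses Python's sqlite3 module as a stand-in (an
assumption for self-containedness; it is not PostgreSQL's catalog). The
table "toy_constraint" is hypothetical and merely borrows the three
column names from the index definition above.

```python
import sqlite3


def composite_unique_demo():
    """Return (row_count, same_table_rejected) for a toy composite key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE toy_constraint ("
        " conrelid INTEGER, contypid INTEGER, conname TEXT,"
        " UNIQUE (conrelid, contypid, conname))"
    )
    conn.execute("INSERT INTO toy_constraint VALUES (1, 0, 'c1')")
    # The same name attached to a different table (conrelid) is accepted.
    conn.execute("INSERT INTO toy_constraint VALUES (2, 0, 'c1')")
    # Repeating the name for the same table violates the composite key.
    try:
        conn.execute("INSERT INTO toy_constraint VALUES (1, 0, 'c1')")
        same_table_rejected = False
    except sqlite3.IntegrityError:
        same_table_rejected = True
    row_count = conn.execute(
        "SELECT count(*) FROM toy_constraint"
    ).fetchone()[0]
    conn.close()
    return row_count, same_table_rejected


if __name__ == "__main__":
    print(composite_unique_demo())
```

This only models the index itself; any stronger cross-table restriction
observed in tests would have to come from somewhere else.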

I hope that there are no more such exceptions to the global rule 'object
names in a schema are unique':
https://www.postgresql.org/docs/current/sql-createschema.html


These facts must be mentioned as a short note in the glossary and in more
detail in the later patch about the architecture.


J. Purtz




Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
>
> * Transaction - yes, all those things could be "visible" or they could be
> "side effects". It may be best to leave the over-simplified definition in
> place, and add a "For more information see < tutorial-transactions>>
>

transaction-iso would be a better linkref in this case


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).


I did it that way for the following reasons
1. It aids grep-ability
2. The committers seem to be moving towards that for SQL strings, mostly
for reason #1
3. I recall that the code is put through a linter as one of the final steps
before release; I assumed that the SGML gets the same.
4. Even if #3 is false, it's easy enough for me to do manually for
this one file once we've settled on the text of the definitions.
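
A rough sketch of that manual re-wrapping step, in case it helps. It
uses Python's standard textwrap module; the function name "rewrap" is
hypothetical, this is not an existing tool in the tree, and it knows
nothing about SGML markup, so it would only be safe on plain paragraph
text.

```python
import textwrap


def rewrap(paragraph, width=71):
    """Re-flow one paragraph to at most `width` characters per line.

    A single token longer than `width` (a URL, say) is left intact
    rather than being split.
    """
    # Collapse existing line breaks and runs of spaces first.
    text = " ".join(paragraph.split())
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )


if __name__ == "__main__":
    print(rewrap("word " * 40))
```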

As for the changes, most things seem fine, I specifically like:
* Checkpoint - looks good
* yes, PGDATA should have been a literal
* Partition - the a/b split works for me
* Unlogged - it reads better

I'm not so sure on / responses to your ???s:
* The statement that names of schema objects are unique isn't *strictly* true,
just *mostly* true. Take the case of a unique constraint. The constraint
has a name and the unique index has the same name, to the point where
adding a unique constraint using an existing index renames that index to
conform to the constraint name.
* Serializable "other way around" question - It's both. Outside the
transaction you can't see changes made inside another transaction (though
you can be blocked by them), and inside serializable you can't see any
changes made since you started. Does that make sense? Were you asking a
different question?
* Transaction - yes, all those things could be "visible" or they could be
"side effects". It may be best to leave the over-simplified definition in
place, and add a "For more information see <<tutorial-transactions>>".


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
On Wed, Mar 11, 2020 at 12:50 PM Jürgen Purtz  wrote:

> I made changes on top of 0001-add-glossary-page.patch which was supplied
> by C. Huinker. This affects not only terms proposed by me but also his
> original terms. If my changes are not obvious, please let me know and I
> will describe my motivation.
>
> Please note especially lines marked with question marks.
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).
>
> J. Purtz
>

A new person replied off-list with some suggested edits, all of which
seemed pretty good. I'll incorporate them myself if that person chooses to
remain off-list.


Re: Add A Glossary

2020-03-11 Thread Jürgen Purtz
I made changes on top of 0001-add-glossary-page.patch which was supplied 
by C. Huinker. This affects not only terms proposed by me but also his 
original terms. If my changes are not obvious, please let me know and I 
will describe my motivation.


Please note especially lines marked with question marks.

It will be helpful for diff-ing to restrict the length of lines in the 
SGML files to 71 characters (as usual).


J. Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 851e9debe6..52169b86a2 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1,7 +1,7 @@
 
  Glossary
  
-  This is a list of terms and their in the context of PostgreSQL and Databases in general.
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
  
   

@@ -44,7 +44,7 @@
 Atomic
 
  
-  In reference to the value of an Attribute or Datum: cannot be broken up into smaller components.
+  In reference to the value of an Attribute or Datum: cannot be broken down into smaller components.
  
  
   In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
@@ -56,7 +56,7 @@
 Attribute
 
  
-  A typed data element found within a Tuple or Relation or Table.
+  An element with a certain name and data type found within a Tuple or Table.
  
 

@@ -164,12 +164,30 @@ any Attribute in the Relation, but
 

 
+   
+Checkpoint
+
+ 
+  A 
+  Checkpoint
+  is a point in time when all older dirty
+  pages of the Shared Memory,
+  all older WAL records, and a
+  special Checkpoint record
+  have been written and flushed to disk.
+ 
+
+   
+

 Cluster
 
  
   A group of Databases plus their Global SQL Objects. The Cluster is managed by exactly one Instance. A newly created Cluster will have three Databases created automatically. They are template0, template1, and postgres. It is expected that an application will create one or more additional Databases aside from these three.
  
+ 
+  Don't confuse the PostgreSQL specific term Cluster with the SQL command Cluster.
+ 
 

 
@@ -198,7 +216,7 @@ any Attribute in the Relation, but
 Concurrency
 
  
-  The concept that multiple independent operations can be happening within the Database at the same time.
+  The concept that multiple independent operations happen within the Database at the same time.
  
 

@@ -216,7 +234,7 @@ any Attribute in the Relation, but
 Constraint
 
  
-  A method of restricting the values of data allowed within a Table.
+  A concept of restricting the values of data allowed within a Table.
  
  
   For more information, see Constraints.
@@ -238,7 +256,7 @@ any Attribute in the Relation, but
 
  
   The base directory on the filesystem of a Server that contains all data
-files and subdirectories associated with a Cluster. The name for this directory in configuration files is PGDATA.
+  files and subdirectories associated with a Cluster with the exception of tablespaces. The environment variable PGDATA often - but not always - refers to the Data Directory.
  
  
   For more information, see Database Physical Storage: Database File Layout.
@@ -250,7 +268,7 @@ files and subdirectories associated with a Cluster. The n
 Database
 
  
-  A named collection of SQL Objects.
+  A named collection of SQL Objects.
  
  
   For more information, see Managing Databases: Overview.
@@ -292,14 +310,13 @@ files and subdirectories associated with a Cluster. The n
 File Segment
 
  
-   If a Database object grows in size past a designated limit, it may be
-split into multiple physical files. These files are called File Segments.
+   If a heap or index file grows in size over 1 GB, it will be split into multiple physical files. These files are called File Segments.
  
  
-  (Don't confuse this term with the similar term WAL Segment).
+  For more information, see Database Physical Storage: Database File Layout.
  
  
-  For more information, see Database Physical Storage: Database File Layout.
+  (Don't confuse this term with the similar term WAL Segment).
  
 

@@ -353,7 +370,7 @@ split into multiple physical files. These files are called File Segme
 Function
 
  
-  Any pre-defined tranformation of data. Many Functions are already defined within PostgreSQL itself, but can also be user-defined.
+  Any pre-defined transformation of data. Many Functions are already defined within PostgreSQL itself, but can also be user-defined.
  
  
   For more information, see

Re: Add A Glossary

2020-03-11 Thread Roger Harkavy
Hello, everyone, I'm Roger, the tech writer who worked with Corey on the
glossary file. I just thought I'd announce that I am also on the list, and
I'm looking forward to any questions or comments people may have. Thanks!

On Tue, Mar 10, 2020 at 11:37 AM Corey Huinker 
wrote:

> This latest version is an attempt at merging the work of Jürgen Purtz into
> what I had posted earlier. There was relatively little overlap in the terms
> we had chosen to define.
>
> Each glossary definition now has a reference id (good idea Jürgen), the
> form of which is "glossary-term". So we can link to the glossary from
> outside if we so choose.
>
> I encourage everyone to read the definitions, and suggest fixes to any
> inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on
> the structure itself, and hoping to get that committed.
>
>
> On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker 
> wrote:
>
>>> It seems like this could be a good idea, still the patch has been
>>> waiting on its author for more than two weeks now, so I have marked it
>>> as returned with feedback.
>>>
>>
>> In light of feedback, I enlisted the help of an actual technical writer
>> (Roger Harkavy, CCed) and we eventually found the time to take a second
>> pass at this.
>>
>> Attached is a revised patch.
>>
>>
>


Re: Add A Glossary

2020-03-10 Thread Corey Huinker
This latest version is an attempt at merging the work of Jürgen Purtz into
what I had posted earlier. There was relatively little overlap in the terms
we had chosen to define.

Each glossary definition now has a reference id (good idea Jürgen), the
form of which is "glossary-term". So we can link to the glossary from
outside if we so choose.

I encourage everyone to read the definitions, and suggest fixes to any
inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on
the structure itself, and hoping to get that committed.


On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker 
wrote:

>> It seems like this could be a good idea, still the patch has been
>> waiting on its author for more than two weeks now, so I have marked it
>> as returned with feedback.
>>
>
> In light of feedback, I enlisted the help of an actual technical writer
> (Roger Harkavy, CCed) and we eventually found the time to take a second
> pass at this.
>
> Attached is a revised patch.
>
>
From 690473e51fc442c55c1744f69813795fce9d22dc Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1008 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1010 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..851e9debe6
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1008 @@
+
+ Glossary
+ 
+  This is a list of terms and their in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible in the values in common, and the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken up into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a Tuple or Relation or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Writes continuously dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating fewer large I/O peaks which could block other processes.
+ 
+ 
+  For more information, see Server Configuration: Resource Consumption.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A conversion of a Datum from its current data type to another data type.
+ 
+
+   
+
+ 
+Catalog
+
+ 
+  The SQL standard uses this standalone term to indicate what is called a
+Database in PostgreSQL's terminology.
+ 
+

Re: Add A Glossary

2020-02-11 Thread Corey Huinker
>
> It seems like this could be a good idea, still the patch has been
> waiting on its author for more than two weeks now, so I have marked it
> as returned with feedback.
>

In light of feedback, I enlisted the help of an actual technical writer
(Roger Harkavy, CCed) and we eventually found the time to take a second
pass at this.

Attached is a revised patch.
From f087e44fe4db7996880cf4df982297018d444363 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Wed, 12 Feb 2020 04:17:59 +
Subject: [PATCH] add glossary page with initial definitions

---
 doc/src/sgml/filelist.sgml |   1 +
 doc/src/sgml/glossary.sgml | 540 +
 doc/src/sgml/postgres.sgml |   1 +
 3 files changed, 542 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..1b881690fa
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,540 @@
+
+
+
+ Glossary
+ 
+  This is a list of terms and their definitions in the context of PostgreSQL and databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  The act of combining a defined collection of data values into a single value that may not be the same type as the original values. Aggregate functions are most often used with Grouping operations which define the separate sets of data by the common values shared within those sets.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A function whose computed value can reference values found in nearby rows of the same result set.
+ 
+
+   
+
+   
+Atomic
+
+ 
+  When referring to the value of an attribute or datum: cannot be broken up into smaller components.
+ 
+ 
+  When referring to an operation: An event that cannot be partially completed; it must either completely succeed or completely fail. A series of SQL statements can be combined into a transaction, and that transaction is described as atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a tuple or relation or table.
+ 
+
+   
+
+   
+BYTEA
+
+ 
+  A data type for storing binary data. It is roughly analogous to the BLOB data type in other database products.
+ 
+
+   
+
+   
+Cast
+
+ 
+  The act of converting a datum from its current data type to another data type.
+ 
+
+   
+
+   
+Check Constraint
+
+ 
+  A type of constraint defined for a relation which restricts the values allowed in one or more attributes. The check constraint can make reference to any attribute in the relation, but cannot reference other rows of the same relation or other relations.
+ 
+
+   
+
+   
+Column
+
+ 
+  An attribute found in a table or view.
+ 
+
+   
+
+   
+Commit
+
+ 
+  The act of finalizing a transaction within the database.
+ 
+
+   
+
+   
+Concurrency
+
+ 
+  The concept that multiple independent operations can be happening within the database at the same time.
+ 
+
+   
+
+   
+Constraint
+
+ 
+  A method of restricting the values of data allowed within a relation. Constraints can currently be of the following types: Check Constraint, Unique Constraint, and Exclusion Constraint.
+ 
+
+   
+
+   
+Datum
+
+ 
+  The internal representation of a SQL datatype.
+ 
+
+   
+
+   
+Delete
+
+ 
+  A SQL command that removes rows from a given table or relation.
+ 
+
+   
+
+   
+Exclusion Constraint
+
+ 
+  Exclusion constraints define both a set of columns for matching rows, and rules where values in one row would conflict with values in another.
+ 
+
+   
+
+   
+Foreign Data Wrapper
+
+ 
+  A means of representing data outside the local database so that it appears as if it were in local tables. With a Foreign Data Wrapper it is possible to define a Foreign Server and Foreign Tables.
+ 
+
+   
+
+   
+Foreign Key
+
+ 
+  A type of constraint defined on one or more columns in a table which requires the value in those columns to uniquely identify a row in the specified table.
+ 
+
+   
+
+   
+Foreign Server
+
+ 
+  A named collection of Foreign Tables which all use the same Foreign Data Wrapper and have other configured attributes in common.
+ 
+
+   
+
+   
+Foreign Table
+
+ 
+  A relation which appears to have rows and columns like a regular table, but when queried will instead forward the request for data through its Foreign Data Wrapper, which will return results structured according to the definition of the Foreign T
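
The Check Constraint and Foreign Key definitions in the patch above can be exercised with a small sketch. This is only an illustration under stated assumptions: it uses Python's bundled sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the customer/invoice tables and the rejected() helper are hypothetical names that do not appear in the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer (id),"  # foreign key
    " net INTEGER, gross INTEGER,"
    " CHECK (gross >= net))"  # check constraint comparing two attributes of one row
)
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO invoice VALUES (1, 1, 100, 120)")  # satisfies both constraints


def rejected(sql):
    """Return True if the statement violates a constraint, False if it succeeds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# The check constraint only looks at values within the row being inserted.
print(rejected("INSERT INTO invoice VALUES (2, 1, 100, 90)"))
# The foreign key requires customer_id to identify an existing customer row.
print(rejected("INSERT INTO invoice VALUES (3, 99, 100, 120)"))
```

Exclusion constraints have no SQLite counterpart, so they are deliberately left out of this sketch.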

Re: Add A Glossary

2019-11-24 Thread Michael Paquier
On Sat, Nov 09, 2019 at 09:19:16AM +0100, Fabien COELHO wrote:
> On principle, I'm fine with having a glossary, i.e. word definitions, which
> are expected to be rather stable in the long run.
> 
> I'm wondering whether the effort would not be made redundant by other
> on-line effort such as wikipedia, wiktionary, stackoverflow, standards,
> whatever.
> 
> When explaining something, the teacher I am usually provides some level of
> example. This may or may not be appropriate there.

That's exactly a good reason for being a reviewer here.  You have
quite some insight here.

> I'd consider making SQL keywords uppercase.
> 
> Developing that is a significant undertaking. Do we have the available
> energy?

It seems like this could be a good idea, still the patch has been
waiting on its author for more than two weeks now, so I have marked it
as returned with feedback.
--
Michael




Re: Add A Glossary

2019-11-09 Thread Fabien COELHO


Hello Corey,

My 0.02€:

On principle, I'm fine with having a glossary, i.e. word definitions, 
which are expected to be rather stable in the long run.


I'm wondering whether the effort would not be made redundant by other 
on-line effort such as wikipedia, wiktionary, stackoverflow, standards, 
whatever.


When explaining something, the teacher I am usually provides some level of 
example. This may or may not be appropriate there.


ISTM that there should be pointers to relevant sections in the 
documentation; for instance, the definition provided for "Analytic" 
suggests pointing to window functions.
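
As a hedged illustration of what such a pointer would lead to: the "Analytic" definition in the patch (a function whose computed value can reference values found in nearby rows of the same result set) matches what SQL calls a window function. The sketch below runs one through Python's bundled sqlite3 module, used here only as a stand-in for PostgreSQL so the example is self-contained (it assumes SQLite 3.25 or later). The sales table and the previous_amounts() helper are hypothetical.

```python
import sqlite3


def previous_amounts():
    """Return (day, amount, previous_amount) rows using the LAG window function."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 25), (3, 15)])
    rows = conn.execute(
        "SELECT day, amount, "
        "       LAG(amount) OVER (ORDER BY day) AS previous_amount "
        "FROM sales ORDER BY day"
    ).fetchall()
    conn.close()
    return rows


for row in previous_amounts():
    print(row)
```

Each output row carries a value taken from the row before it, which is the "nearby rows" part of the definition; the first row has no predecessor, so its previous_amount is NULL (None in Python).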

There is significant redundancy involved, because a lot of terms would be 
defined in other sections anyway.


There should be cross references, eg "Column" definition talks about 
Attribute, Table & View, which should be linked to.


I'd consider making SQL keywords uppercase.

Developing that is a significant undertaking. Do we have the available 
energy?


Patch generates a warning on "git apply".

 sh> git apply ...
 ... terms-and-definitions.patch:159: tab in indent. [...]
 warning: 1 line adds whitespace errors.
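
This kind of warning can be approximated before posting a patch. The following is a minimal sketch, not git's actual implementation: it scans a unified diff for added lines whose leading whitespace contains a tab, which is roughly what git's tab-in-indent whitespace rule reports. The function name and the sample patch text are hypothetical.

```python
def tabs_in_indent(patch_text):
    """Return (patch_line_number, text) for added lines with a tab in their indentation."""
    hits = []
    for lineno, line in enumerate(patch_text.splitlines(), start=1):
        # Only added lines matter; "+++ b/file" is a file header, not content.
        # Simplification: an added line whose content itself starts with "++" is skipped too.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        content = line[1:]
        # The indentation is the leading run of spaces and tabs.
        indent = content[: len(content) - len(content.lstrip(" \t"))]
        if "\t" in indent:
            hits.append((lineno, content))
    return hits


# Hypothetical three-line patch; the second added line is indented with a tab.
sample = (
    "--- /dev/null\n"
    "+++ b/doc/src/sgml/glossary.sgml\n"
    "@@ -0,0 +1,3 @@\n"
    "+  <glossentry>\n"
    "+\t<glossterm>Record</glossterm>\n"
    "+  </glossentry>\n"
)
for lineno, content in tabs_in_indent(sample):
    print(f"{lineno}: tab in indent: {content!r}")
```

The authoritative check remains git itself, for example running `git diff --check` in the working tree before generating the patch.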

"Record" def as nested  for some unclear reason.

Basically the definitions as drafted look pretty clear and well written to 
the non-native English speaker I am.


On Sun, 13 Oct 2019, Corey Huinker wrote:


Date: Sun, 13 Oct 2019 16:52:05 -0400
From: Corey Huinker 
To: pgsql-hack...@postgresql.org
Subject: Add A Glossary

Attached is a v1 patch to add a Glossary to the appendix of our current
documentation.

I believe that our documentation needs a glossary for a few reasons:

1. It's hard to ask for help if you don't know the proper terminology of
the problem you're having.

2. Readers who are new to databases may not understand a few of the terms
that are used casually both in the documentation and in forums. This helps
to make our documentation a bit more useful as a teaching tool.

3. Readers whose primary language is not English may struggle to find the
correct search terms, and this glossary may help them grasp that a given
term has a usage in databases that is different from common English usage.

3b. If we are not able to find the resources to translate all of the
documentation into a given language, translating the glossary page would be
a good first step.

4. The glossary would be web-searchable, and draw viewers to the official
documentation.

5. Adding link anchors to each term would make them cite-able, useful in
forum conversations.


A few notes about this patch:

1. It's obviously incomplete. There are more terms, a lot more, to add.

2. The individual definitions supplied are off-the-cuff, and should be
thoroughly reviewed.

3. The definitions as a whole should be reviewed by an actual tech writer
(one was initially involved but had to step back due to prior commitments),
and the definitions should be normalized in terms of voice, tone, audience,
etc.

4. My understanding of DocBook is not strong. The glossary vs glosslist tag
issue is a bit confusing to me, and I'm not sure if the glossary tag is
even appropriate for our needs.

5. I've made no effort at making each term an anchor, nor have I done any
CSS styling at all.

6. I'm not quite sure how to handle terms that have different definitions
in different contexts. Should that be two glossdefs following one
glossterm, or two separate def/term pairs?
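
To make the two options in note 6 concrete, here is a small sketch that builds both shapes with Python's xml.etree.ElementTree and prints the resulting markup. It assumes, as far as I know the DocBook content model, that a glossentry may carry more than one glossdef after its glossterm. The gloss_entry() helper, the "glossary-atomic" id and the shortened definition texts are hypothetical; the id attribute is included only to show where the anchor from note 5 would go.

```python
import xml.etree.ElementTree as ET


def gloss_entry(term, definitions, entry_id=None):
    """Build one DocBook <glossentry> with one <glossdef> per definition."""
    entry = ET.Element("glossentry")
    if entry_id:
        entry.set("id", entry_id)  # gives the term a linkable anchor
    ET.SubElement(entry, "glossterm").text = term
    for text in definitions:
        glossdef = ET.SubElement(entry, "glossdef")
        ET.SubElement(glossdef, "para").text = text
    return entry


# Option A: one term followed by two definitions.
combined = gloss_entry(
    "Atomic",
    [
        "Of a value: cannot be broken up into smaller components.",
        "Of an operation: must either completely succeed or completely fail.",
    ],
    entry_id="glossary-atomic",  # hypothetical id naming scheme
)

# Option B: two separate term/definition pairs, one per context.
separate = [
    gloss_entry("Atomic (value)", ["Cannot be broken up into smaller components."]),
    gloss_entry("Atomic (operation)", ["Must either completely succeed or completely fail."]),
]

print(ET.tostring(combined, encoding="unicode"))
for entry in separate:
    print(ET.tostring(entry, encoding="unicode"))
```

Option A keeps a single anchor per term, which suits citing a term by link; option B gives each context its own entry at the cost of near-duplicate term names.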

Please review and share your thoughts.



--
Fabien Coelho - CRI, MINES ParisTech