On 2020-10-30 11:57, Jürgen Purtz wrote:
On 26.10.20 15:53, David G. Johnston wrote:
Removing -docs as moderation won’t let me cross-post.
Hi,
I applied 0009-architecture-vs-master.patch to head,
went through architecture.sgml (only that file),
and produced the attached .diff.
I also wrote down some separate items:
1.
'Two Phase Locking' and 'TPL' should be, I think,
'Two-Phase Commit'. Could someone please confirm?
(no changes made)
2.
Comparing xid to a sequence because they similarly 'count up' seems
a bad idea.
(I don't think counting up is always true in the case of sequences.)
(no changes made)
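As an aside, here is a minimal sketch (hypothetical sequence name,
standard CREATE SEQUENCE options) of why a sequence doesn't
necessarily count up:

```sql
-- A sequence can descend and even wrap around (CYCLE):
CREATE SEQUENCE countdown
    INCREMENT BY -1
    MINVALUE 1 MAXVALUE 3
    START WITH 3
    CYCLE;

SELECT nextval('countdown');  -- 3
SELECT nextval('countdown');  -- 2
SELECT nextval('countdown');  -- 1
SELECT nextval('countdown');  -- wraps around to 3
```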
3.
'accesses' seems a somewhat strange word; most of the time just 'access'
may be better. Not sure - native speaker wanted. (no changes made)
4.
'heap', in postgres, often (always?) means files. But more generally,
the term is associated with memory. Therefore it would be good,
I think, to explicitly use 'heap file' at least once at the beginning to
make clear that heap implies 'safely written away to disk'. Again, I'm
not quite sure whether my understanding is correct - I have made no
changes in this regard.
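As an aside, the file-backed nature of a heap can be seen with
pg_relation_filepath() (hypothetical table name; requires a running
server, and the actual path will differ):

```sql
CREATE TABLE demo (id int);
-- Returns the path of the table's heap file relative to the data
-- directory, e.g. base/16384/16389; the free space map (_fsm) and
-- visibility map (_vm) forks live alongside it.
SELECT pg_relation_filepath('demo');
```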
Erik Rijkers
--- doc/src/sgml/architecture.sgml.orig 2020-10-30 15:19:54.469275256 +0100
+++ doc/src/sgml/architecture.sgml 2020-10-30 17:28:24.835233482 +0100
@@ -19,19 +19,18 @@
In the case of <productname>PostgreSQL</productname>, the server
launches a single process for each client connection, referred to as a
<glossterm linkend="glossary-backend">Backend</glossterm> process.
- Those Backend processes handle the client's requests by acting on the
+ Such a Backend process handles the client's requests by acting on the
<glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>.
This leads to other activities (file access, WAL, vacuum, ...) of the
<glossterm linkend="glossary-instance">Instance</glossterm>. The
Instance is a group of server-side processes acting on a common
- Shared Memory. Notably, PostgreSQL does not utilize application
- threading within its implementation.
+ Shared Memory. PostgreSQL does not utilize threading.
</para>
<para>
- The first step in an Instance start is the start of the
+ The first step when an Instance starts is the start of the
<glossterm linkend="glossary-postmaster">Postmaster</glossterm>.
- He loads the configuration files, allocates Shared Memory, and
+ It loads the configuration files, allocates Shared Memory, and
starts the other processes of the Instance:
<glossterm linkend="glossary-background-writer">Background Writer</glossterm>,
<glossterm linkend="glossary-checkpointer">Checkpointer</glossterm>,
@@ -66,32 +65,32 @@
<para>
When a client application tries to connect to a
<glossterm linkend="glossary-database">database</glossterm>,
- this request is handled initially by the Postmaster. He
+ this request is handled initially by the Postmaster. It
starts a new Backend process and instructs the client
application to connect to it. All further client requests
- go to this process and are handled by it.
+ are handled by this process.
</para>
<para>
Client requests like <command>SELECT</command> or
<command>UPDATE</command> usually lead to the
- necessity to read or write some data. This is carried out
+ necessity to read or write data. This is carried out
by the client's backend process. Reads involve a page-level
- cache housed in Shared Memory (for details see:
+ cache, located in Shared Memory (for details see:
<xref linkend="sysvipc"/>) for the benefit of all processes
- in the instance. Writes also involve this cache, in additional
+ in the instance. Writes also use this cache, in addition
to a journal, called a write-ahead-log or WAL.
</para>
<para>
- Shared Memory is limited in size. Thus, it becomes necessary
+ Shared Memory is limited in size, and it can become necessary
to evict pages. As long as the content of such pages hasn't
changed, this is not a problem. But in Shared Memory also
write actions take place. Modified pages are called dirty
pages or dirty buffers and before they can be evicted they
- must be written back to disk. This happens regularly by the
+ must be written to disk. This happens regularly by the
Background Writer and the Checkpointer process to ensure
- that the disk version of the pages are kept up-to-date.
+ that the disk versions of the pages are up-to-date.
The synchronisation from RAM to disk consists of two steps.
</para>
@@ -109,7 +108,7 @@
Shared Memory. The parallel running WAL Writer process
reads them and appends them to the end of the current
<glossterm linkend="glossary-wal-record">WAL file</glossterm>.
- Such sequential writes are much faster than writes to random
+ Such sequential writes are faster than writes to random
positions of heap and index files. All WAL records created
out of one dirty page must be transferred to disk before the
dirty page itself can be transferred to disk in the second step.
@@ -119,19 +118,19 @@
Second, the transfer of dirty buffers from Shared Memory to
files must take place. This is the primary task of the
Background Writer process. Because I/O activities can block
- other processes significantly, it starts periodically and
+ other processes, it starts periodically and
acts only for a short period. Doing so, its extensive (and
expensive) I/O activities are spread over time, avoiding
- debilitating I/O peaks. Also, the Checkpointer process
- transfers dirty buffers to file.
+ debilitating I/O peaks. The Checkpointer process
+ also transfers dirty buffers to file.
</para>
<para>
- The Checkpointer creates
+ The Checkpointer process creates
<glossterm linkend="glossary-checkpoint">Checkpoints</glossterm>.
A Checkpoint is a point in time when all older dirty buffers,
all older WAL records, and finally a special Checkpoint record
- have been written and flushed to disk. Heap and index files
+ are written and flushed to disk. Heap and index files
on the one hand and WAL files on the other hand are in sync.
Previous WAL is no longer required. In other words,
a possibly occurring recovery, which integrates the delta
@@ -141,13 +140,13 @@
</para>
<para>
- While the Checkpointer ensures that a running system can crash
+ While the Checkpointer ensures that the database system can crash
and restart itself in a valid state, the administrator needs
to handle the case where the heap and files themselves become
corrupted (and possibly the locally written WAL, though that is
less common). The options and details are covered extensively
in the backup and restore section (<xref linkend="backup"/>).
- For our purposes here, note just that the WAL Archiver process
+ For our purposes here, just note that the WAL Archiver process
can be enabled and configured to run a script on filled WAL
files — usually to copy them to a remote location.
</para>
@@ -234,13 +233,13 @@
<para>
Every database must contain at least one schema because all
<glossterm linkend="glossary-sql-object">SQL Objects</glossterm>
- are contained in a schema.
- Schemas are namespaces for their SQL objects and ensure
- (with one exception) that within their scope names are used
- only once across all types of SQL objects. E.g., it is not possible
+ must be contained in a schema.
+ Schemas are namespaces for SQL objects and ensure
+ (with one exception) that the SQL object names are used only once within
+ their scope across all types of SQL objects. E.g., it is not possible
to have a table <literal>employee</literal> and a view
<literal>employee</literal> within the same schema. But it is
- possible to have two tables <literal>employee</literal> in
+ possible to have two tables <literal>employee</literal> in two
different schemas. In this case, the two tables
are separate objects and independent of each
other. The only exception to this cross-type uniqueness is that
@@ -273,7 +272,7 @@
<firstterm>Global SQL Objects</firstterm>, are outside of the
strict hierarchy: All <firstterm>database names</firstterm>,
all <firstterm>tablespace names</firstterm>, and all
- <firstterm>role names</firstterm> are automatically known and
+ <firstterm>role names</firstterm> are automatically
available throughout the cluster, independent from
the database or schema in which they where defined originally.
<xref linkend="tutorial-internal-objects-hierarchy-figure"/>
@@ -302,7 +301,7 @@
<title>The physical Perspective: Directories and Files</title>
<para>
- <productname>PostgreSQL</productname> organizes long-lasting
+ <productname>PostgreSQL</productname> organizes long-lasting (persistent)
data as well as volatile state information about transactions
or replication actions in the file system. Every
<xref linkend="glossary-db-cluster"/> has its root directory
@@ -352,20 +351,19 @@
every table and every index to store heap and index
data. Those files are accompanied by files for the
<link linkend="storage-fsm">Free Space Maps</link>
- (extension <literal>_fsm</literal>) and
+ (suffixed <literal>_fsm</literal>) and
<link linkend="storage-vm">Visibility Maps</link>
- (extension <literal>_vm</literal>), which contain optimization information.
+ (suffixed <literal>_vm</literal>), which contain optimization information.
</para>
<para>
- Another subdirectory is <literal>global</literal>.
- In analogy to the database-specific
- subdirectories, there are files containing information about
+ Another subdirectory is <literal>global</literal>, which
+ contains files with information about
<glossterm linkend="glossary-sql-object">Global SQL Objects</glossterm>.
One type of such Global SQL Objects are
<glossterm linkend="glossary-tablespace">tablespaces</glossterm>.
In <literal>global</literal> there is information about
- the tablespaces, not the tablespaces themselves.
+ the tablespaces; not the tablespaces themselves.
</para>
<para>
@@ -392,11 +390,11 @@
<para>
In the root directory <literal>data</literal>
there are also some files. In many cases, the configuration
- files of the cluster are stored here. As long as the
+ files of the cluster are stored here. If the
instance is up and running, the file
<literal>postmaster.pid</literal> exists here
and contains the process ID (pid) of the
- Postmaster which has started the instance.
+ Postmaster which started the instance.
</para>
<para>
@@ -411,7 +409,7 @@
<para>
In most cases, <productname>PostgreSQL</productname> databases
- support many clients at the same time. Therefore, it is necessary to
+ support many clients at the same time, which makes it necessary to
protect concurrently running requests from unwanted overwriting
of other's data as well as from reading inconsistent data. Imagine an
online shop offering the last copy of an article. Two clients have the
@@ -432,11 +430,11 @@
<productname>PostgreSQL</productname> implements a third, more
sophisticated technique: <firstterm>Multiversion Concurrency
Control</firstterm> (MVCC). The crucial advantage of MVCC
- over other technologies gets evident in multiuser OLTP
+ over other technologies becomes evident in multiuser OLTP
environments with a massive number of concurrent write
actions. There, MVCC generally performs better than solutions
using locks. In a <productname>PostgreSQL</productname>
- database reading never blocks writing and writing never
+ database, reading never blocks writing and writing never
blocks reading, even in the strictest level of transaction
isolation.
</para>
@@ -444,14 +442,14 @@
<para>
Instead of locking rows, the <firstterm>MVCC</firstterm> technique creates
a new version of the row when a data-change takes place. To
- distinguish between these two versions and to track the timeline
+ distinguish between these two versions, and to track the timeline
of the row, each of the versions contains, in addition to their user-defined
columns, two special system columns, which are not visible
for the usual <command>SELECT * FROM ...</command> command.
The column <literal>xmin</literal> contains the transaction ID (xid)
- of the transaction, which created this version of the row. Accordingly,
- <literal>xmax</literal> contains the xid of the transaction, which has
- deleted this version, or zero, if the version is not
+ of the transaction which created this version of the row.
+ <literal>xmax</literal> contains the xid of the transaction which has
+ deleted this version, or zero if the version is not
deleted. You can read both with the command
<command>SELECT xmin, xmax, * FROM ... </command>.
</para>
@@ -469,7 +467,7 @@
</para>
<para>
- The description in this chapter simplifies by omitting some details.
+ The description in this chapter simplifies by omitting details.
When many transactions are running simultaneously, things can
get complicated. Sometimes transactions get aborted via
<command>ROLLBACK</command> immediately or after a lot of other activities, sometimes
@@ -526,8 +524,8 @@
creates a new version of the row with its xid in
<literal>xmin</literal>, <literal>0</literal> in
<literal>xmax</literal>, and <literal>'y'</literal> in the
- user data (plus all the other user data from the old version).
- This version is now valid for all coming transactions.
+ user data (plus all other user data from the old version).
+ This version is now valid for all future transactions.
</para>
<para>
@@ -624,9 +622,9 @@
<para>
Autovacuum runs automatically by
default. Its default parameters as well as such for
- <command>VACUUM</command> fit well for most standard
+ <command>VACUUM</command> are appropriate for most standard
situations. Therefore a novice database manager can
- easily skip the rest of this chapter which explains
+ skip the rest of this chapter which explains
a lot of details.
</para>
</note>
@@ -687,7 +685,7 @@
<para>
The eagerness — you can call it 'aggression' — of the
- operations <emphasis>eliminating bloat</emphasis> and
+ operations for <emphasis>eliminating bloat</emphasis> and
<emphasis>freeze</emphasis> is controlled by configuration
parameters, runtime flags, and in extreme situations by
the processes themselves. Because vacuum operations typically are I/O
@@ -783,7 +781,7 @@
When a client issues the SQL command <command>VACUUM</command>
with the option <command>FULL</command>.
Also, in this mode, the bloat disappears, but the strategy used
- is very different: In this case, the complete table is copied
+ is very different: in this case, the complete table is copied
to a different file skipping all outdated row versions. This
leads to a significant reduction of used disk space because
the new file contains only the actual data. The old file
@@ -1143,7 +1141,7 @@
atomicity: either all or none of its operations succeed,
regardless of the fact that it may consist of a lot of
different write-operations, and each such operation may
- affect thousands or millions of rows. As soon as one of the
+ affect many rows. As soon as one of the
operations fails, all previous operations fail also, which
means that all modified rows retain their values as of the
beginning of the transaction.
@@ -1157,14 +1155,14 @@
— even in the lowest
<link linkend="transaction-iso">isolation level</link>
of transactions. <productname>PostgreSQL</productname>
- does never show uncommitted changes to other connections.
+ never shows uncommitted changes to other connections.
</para>
<para>
The situation regarding visibility is somewhat different
from the point of view of the modifying transaction.
- <command>SELECT</command> commands issued inside a
- transaction delivers all changes done so far by this
+ A <command>SELECT</command> command issued inside a
+ transaction shows all changes done so far by this
transaction.
</para>
@@ -1231,7 +1229,7 @@
<para>
Transactions ensure that the
<glossterm linkend="glossary-consistency">consistency</glossterm>
- of the complete database always keeps valid. Declarative
+ of the complete database always remains valid. Declarative
rules like
<link linkend="ddl-constraints-primary-keys">primary</link>- or
<link linkend="ddl-constraints-fk">foreign keys</link>,
@@ -1242,13 +1240,6 @@
</para>
<para>
- Also, all self-evident — but possibly not obvious
- — low-level demands on the database system are
- ensured; e.g. index entries for rows must become
- visible at the same moment as the rows themselves.
- </para>
-
- <para>
There is the additional feature
'<link linkend="transaction-iso">isolation level</link>',
which separates transactions from each other in certain ways.
@@ -1287,7 +1278,7 @@
a severe software error like a null pointer exception.
Because <productname>PostgreSQL</productname> uses a
client/server architecture, no direct problem for the
- database will occur. In all of this cases, the
+ database will occur. In all of these cases, the
<glossterm linkend="glossary-backend">Backend process</glossterm>,
which is the client's counterpart at the server-side,
may recognize that the network connection is no longer
@@ -1310,7 +1301,7 @@
automatically recognizes that the last shutdown of the
instance did not happen as expected: files might not be
closed properly and the <literal>postmaster.pid</literal>
- file exists. <productname>PostgreSQL</productname>
+ file unexpectedly exists. <productname>PostgreSQL</productname>
tries to clean up the situation. This is possible because
all changes in the database are stored twice. First,
the WAL files contain them as a chronology of
@@ -1328,8 +1319,8 @@
<glossterm linkend="glossary-checkpoint">checkpoint</glossterm>.
This checkpoint signals that the database files are in
a consistent state, especially that all WAL records up to
- this point were successfully stored in heap and index. Starting
- here, the recovery process copies the following WAL records
+ this point were successfully stored in heap and index files. Starting
+ here, the recovery process copies the remaining WAL records
to heap and index. As a result, the files contain all
changes and reach a consistent state. Changes of committed
transactions are visible; those of uncommited transactions
@@ -1344,10 +1335,10 @@
<bridgehead renderas="sect3">Disk crash</bridgehead>
<para>
If a disk crashes, the course of action described previously
- cannot work. It is likely that the WAL files and/or the
+ cannot work: it is likely that the WAL files and/or the
data and index files are no longer available. The
database administrator must take special actions to
- overcome such situations.
+ prepare for such a situation.
</para>
<para>
He obviously needs a backup. How to take such a backup
@@ -1427,7 +1418,7 @@
</para>
<para>
The obvious disadvantage of this method is that there
- is a downtime where no user interaction is possible.
+ is a downtime.
The other two strategies run during regular operating
times.
</para>
@@ -1456,14 +1447,14 @@
<bridgehead renderas="sect2">Continuous archiving based on pg_basebackup and WAL files</bridgehead>
<para>
<link linkend="continuous-archiving">This method</link>
- is the most sophisticated and complex one. It
+ is the most sophisticated and most complex one. It
consists of two phases.
</para>
<para>
- First, you need to create a so called
+ First, you need to create a so-called
<firstterm>basebackup</firstterm> with the tool
<command>pg_basebackup</command>. The result is a
- directory structure plus files which contains a
+ directory structure plus files which contain a
consistent copy of the original cluster.
<command>pg_basebackup</command> runs in
parallel to other processes in its own transaction.
@@ -1484,7 +1475,7 @@
<glossterm linkend="glossary-wal-archiver">Archiver process</glossterm>
will automatically copy every single WAL file to a save location.
<link linkend="backup-archiving-wal">Its configuration</link>
- consists mainly of a string, which contains a copy command
+ consists mainly of a string that contains a copy command
in the operating system's syntax. In order to protect your
data against a disk crash, the destination location
of a basebackup as well as of the
@@ -1492,9 +1483,8 @@
disk which is different from the data disk.
</para>
<para>
- If it gets necessary to restore the cluster, you have to
- copy the basebackup and the
- archived WAL files to
+ If it becomes necessary to restore the cluster, you have to
+ copy the basebackup and the archived WAL files to
their original directories. The configuration of this
<link linkend="backup-pitr-recovery">recovery procedure</link>
contains a string with the reverse copy command: from