Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages

Kyotaro Horiguchi Thu, 31 Mar 2022 18:31:34 -0700

At Thu, 31 Mar 2022 08:45:56 -0700, Nathan Bossart <nathandboss...@gmail.com> 
wrote in 
>     At all times, <productname>PostgreSQL</productname> maintains a
>     <firstterm>write ahead log</firstterm> (WAL) in the 
> <filename>pg_wal/</filename>
> -   subdirectory of the cluster's data directory. The log records
> -   every change made to the database's data files.  This log exists
> +   subdirectory of the cluster's data directory. The WAL records
> +   capture every change made to the database's data files.  This log exists
> 
> I don't think this change really adds anything.  The preceding sentence
> makes it clear that we are discussing the write-ahead log, and IMO the
> change in phrasing ("the log records every change" is changed to "the
> records capture every change") subtly changes the meaning of the sentence.
> 
> The rest looks good to me.


+1.  "log records" is not a compound noun there.

The original sentence is "S(The log) V(records) O(every change that is
made to .. files)".  The proposed change looks like changing it to
"S(The log records) V(capture) O(every ..files)".  In that sense, the
original one seems rather correct to me, since "capture" seems to
imply "write after log..", to me.


I looked through the documentation and found other uses of "log
record|segment".  What do you think about the attached?
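
For anyone who wants to repeat that kind of search, here is a rough
sketch.  It is not how the attached patch was produced; the pattern and
the function names are made up for illustration, and every hit still
needs a human to look at it.  For example, "the log records every
change" matches, but there "records" is a verb, as discussed above.
Phrases that are split across a line break are not found.

```python
import re
from pathlib import Path

# Illustrative pattern (an assumption, not taken from the patch):
# the redundant "WAL log(s)", or "log(s)" followed by record/segment/file.
PATTERN = re.compile(
    r"\b(?:WAL\s+logs?|logs?\s+(?:records?|segments?|files?))\b",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return (line_number, matched_phrase) pairs for one document's text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root):
    """Yield (path, line_number, phrase) for every .sgml file under root."""
    for path in sorted(Path(root).rglob("*.sgml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, phrase in find_candidates(text):
            yield path, lineno, phrase
```

Calling scan_tree("doc/src/sgml") from the top of a source tree and
printing each tuple gives a list of places to review by hand.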

There are some uncertain points in the change.

      you should at least save the contents of the cluster's <filename>pg_wal</filename>
-     subdirectory, as it might contain logs which
+     subdirectory, as it might contain WAL files which
      were not archived before the system went down.

The "logs" actually means "WAL segment (files)", but the concept of
"segment" is out of focus in this context, so just "file" is used
there.  The same change is applied in a dozen places.


-   disk-space requirements for the <acronym>WAL</acronym> logs are met,
+   disk-space requirements for the <acronym>WAL</acronym> are met,

This might be better as "WAL files" instead of just "WAL".


-   <acronym>WAL</acronym> logs are stored in the directory
+   <acronym>WAL</acronym> is stored in the directory
    <filename>pg_wal</filename> under the data directory, as a set of

I'm not sure which is better: using "WAL" as a collective noun, or "WAL
files" as the concrete objects.


-   The aim of <acronym>WAL</acronym> is to ensure that the log is
+   The aim of <acronym>WAL</acronym> is to ensure that the WAL record is
    written before database records are altered, but this can be subverted by

This is not a mechanical change.  But I think this is correct.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index dd8640b092..941042f646 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1246,7 +1246,7 @@ SELECT pg_stop_backup();
      require that you have enough free space on your system to hold two
      copies of your existing database. If you do not have enough space,
      you should at least save the contents of the cluster's <filename>pg_wal</filename>
-     subdirectory, as it might contain logs which
+     subdirectory, as it might contain WAL files which
      were not archived before the system went down.
     </para>
    </listitem>
@@ -1324,8 +1324,8 @@ SELECT pg_stop_backup();
     which tells <productname>PostgreSQL</productname> how to retrieve archived
     WAL file segments.  Like the <varname>archive_command</varname>, this is
     a shell command string.  It can contain <literal>%f</literal>, which is
-    replaced by the name of the desired log file, and <literal>%p</literal>,
-    which is replaced by the path name to copy the log file to.
+    replaced by the name of the desired WAL file, and <literal>%p</literal>,
+    which is replaced by the path name to copy the WAL file to.
     (The path name is relative to the current working directory,
     i.e., the cluster's data directory.)
     Write <literal>%%</literal> if you need to embed an actual <literal>%</literal>
@@ -1651,9 +1651,9 @@ archive_command = 'local_backup_script.sh "%p" "%f"'
      <link linkend="sql-createtablespace"><command>CREATE 
TABLESPACE</command></link>
      commands are WAL-logged with the literal absolute path, and will
      therefore be replayed as tablespace creations with the same
-     absolute path.  This might be undesirable if the log is being
+     absolute path.  This might be undesirable if the WAL is being
      replayed on a different machine.  It can be dangerous even if the
-     log is being replayed on the same machine, but into a new data
+     WAL is being replayed on the same machine, but into a new data
      directory: the replay will still overwrite the contents of the
      original tablespace.  To avoid potential gotchas of this sort,
      the best practice is to take a new base backup after creating or
@@ -1670,11 +1670,11 @@ archive_command = 'local_backup_script.sh "%p" "%f"'
     we might need to fix partially-written disk pages.  Depending on
     your system hardware and software, the risk of partial writes might
     be small enough to ignore, in which case you can significantly
-    reduce the total volume of archived logs by turning off page
+    reduce the total volume of archived WAL files by turning off page
     snapshots using the <xref linkend="guc-full-page-writes"/>
     parameter.  (Read the notes and warnings in <xref linkend="wal"/>
     before you do so.)  Turning off page snapshots does not prevent
-    use of the logs for PITR operations.  An area for future
+    use of the WAL for PITR operations.  An area for future
     development is to compress archived WAL data by removing
     unnecessary page copies even when <varname>full_page_writes</varname> is
     on.  In the meantime, administrators might wish to reduce the number
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 96f9b3dd70..de0bed2b10 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -53,7 +53,7 @@ PostgreSQL documentation
       <term><replaceable class="parameter">startseg</replaceable></term>
       <listitem>
        <para>
-        Start reading at the specified log segment file.  This implicitly determines
+        Start reading at the specified WAL segment file.  This implicitly determines
         the path in which files will be searched for, and the timeline to use.
        </para>
       </listitem>
@@ -63,7 +63,7 @@ PostgreSQL documentation
       <term><replaceable class="parameter">endseg</replaceable></term>
       <listitem>
        <para>
-        Stop after reading the specified log segment file.
+        Stop after reading the specified WAL segment file.
        </para>
       </listitem>
      </varlistentry>
@@ -141,7 +141,7 @@ PostgreSQL documentation
       <term><option>--path=<replaceable>path</replaceable></option></term>
       <listitem>
        <para>
-        Specifies a directory to search for log segment files or a
+        Specifies a directory to search for WAL segment files or a
         directory with a <literal>pg_wal</literal> subdirectory that
         contains such files.  The default is to search in the current
         directory, the <literal>pg_wal</literal> subdirectory of the
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 2677996f2a..69dd74f4ab 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -322,15 +322,15 @@
 
    <para>
     Using <acronym>WAL</acronym> results in a
-    significantly reduced number of disk writes, because only the log
+    significantly reduced number of disk writes, because only the WAL
     file needs to be flushed to disk to guarantee that a transaction is
     committed, rather than every data file changed by the transaction.
-    The log file is written sequentially,
-    and so the cost of syncing the log is much less than the cost of
+    The WAL file is written sequentially,
+    and so the cost of syncing the WAL is much less than the cost of
     flushing the data pages.  This is especially true for servers
     handling many small transactions touching different parts of the data
     store.  Furthermore, when the server is processing many small concurrent
-    transactions, one <function>fsync</function> of the log file may
+    transactions, one <function>fsync</function> of the WAL file may
     suffice to commit many transactions.
    </para>
 
@@ -340,10 +340,10 @@
     linkend="continuous-archiving"/>.  By archiving the WAL data we can support
     reverting to any time instant covered by the available WAL data:
     we simply install a prior physical backup of the database, and
-    replay the WAL log just as far as the desired time.  What's more,
+    replay the WAL just as far as the desired time.  What's more,
     the physical backup doesn't have to be an instantaneous snapshot
     of the database state &mdash; if it is made over some period of time,
-    then replaying the WAL log for that period will fix any internal
+    then replaying the WAL for that period will fix any internal
     inconsistencies.
    </para>
   </sect1>
@@ -496,15 +496,15 @@
    that the heap and index data files have been updated with all
    information written before that checkpoint.  At checkpoint time, all
    dirty data pages are flushed to disk and a special checkpoint record is
-   written to the log file.  (The change records were previously flushed
+   written to the WAL file.  (The change records were previously flushed
    to the <acronym>WAL</acronym> files.)
    In the event of a crash, the crash recovery procedure looks at the latest
-   checkpoint record to determine the point in the log (known as the redo
+   checkpoint record to determine the point in the WAL (known as the redo
    record) from which it should start the REDO operation.  Any changes made to
    data files before that point are guaranteed to be already on disk.
-   Hence, after a checkpoint, log segments preceding the one containing
+   Hence, after a checkpoint, WAL segments preceding the one containing
    the redo record are no longer needed and can be recycled or removed. (When
-   <acronym>WAL</acronym> archiving is being done, the log segments must be
+   <acronym>WAL</acronym> archiving is being done, the WAL segments must be
    archived before being recycled or removed.)
   </para>
 
@@ -543,7 +543,7 @@
    another factor to consider. To ensure data page consistency,
    the first modification of a data page after each checkpoint results in
    logging the entire page content. In that case,
-   a smaller checkpoint interval increases the volume of output to the WAL log,
+   a smaller checkpoint interval increases the volume of output to the WAL,
    partially negating the goal of using a smaller interval,
    and in any case causing more disk I/O.
   </para>
@@ -613,10 +613,10 @@
   <para>
    The number of WAL segment files in <filename>pg_wal</filename> directory depends on
    <varname>min_wal_size</varname>, <varname>max_wal_size</varname> and
-   the amount of WAL generated in previous checkpoint cycles. When old log
+   the amount of WAL generated in previous checkpoint cycles. When old WAL
    segment files are no longer needed, they are removed or recycled (that is,
    renamed to become future segments in the numbered sequence). If, due to a
-   short-term peak of log output rate, <varname>max_wal_size</varname> is
+   short-term peak of WAL output rate, <varname>max_wal_size</varname> is
    exceeded, the unneeded segment files will be removed until the system
    gets back under this limit. Below that limit, the system recycles enough
    WAL files to cover the estimated need until the next checkpoint, and
@@ -649,7 +649,7 @@
    which are similar to checkpoints in normal operation: the server forces
    all its state to disk, updates the <filename>pg_control</filename> file to
    indicate that the already-processed WAL data need not be scanned again,
-   and then recycles any old log segment files in the <filename>pg_wal</filename>
+   and then recycles any old WAL segment files in the <filename>pg_wal</filename>
    directory.
    Restartpoints can't be performed more frequently than checkpoints on the
    primary because restartpoints can only be performed at checkpoint records.
@@ -675,12 +675,12 @@
    insertion) at a time when an exclusive lock is held on affected
    data pages, so the operation needs to be as fast as possible.  What
    is worse, writing <acronym>WAL</acronym> buffers might also force the
-   creation of a new log segment, which takes even more
+   creation of a new WAL segment, which takes even more
    time. Normally, <acronym>WAL</acronym> buffers should be written
    and flushed by an <function>XLogFlush</function> request, which is
    made, for the most part, at transaction commit time to ensure that
    transaction records are flushed to permanent storage. On systems
-   with high log output, <function>XLogFlush</function> requests might
+   with high WAL output, <function>XLogFlush</function> requests might
    not occur often enough to prevent <function>XLogInsertRecord</function>
    from having to do writes.  On such systems
    one should increase the number of <acronym>WAL</acronym> buffers by
@@ -723,7 +723,7 @@
    <varname>commit_delay</varname>, so this value is recommended as the
    starting point to use when optimizing for a particular workload.  While
    tuning <varname>commit_delay</varname> is particularly useful when the
-   WAL log is stored on high-latency rotating disks, benefits can be
+   WAL is stored on high-latency rotating disks, benefits can be
    significant even on storage media with very fast sync times, such as
    solid-state drives or RAID arrays with a battery-backed write cache;
    but this should definitely be tested against a representative workload.
@@ -815,16 +815,16 @@
   <para>
    <acronym>WAL</acronym> is automatically enabled; no action is
    required from the administrator except ensuring that the
-   disk-space requirements for the <acronym>WAL</acronym> logs are met,
+   disk-space requirements for the <acronym>WAL</acronym> are met,
    and that any necessary tuning is done (see <xref
    linkend="wal-configuration"/>).
   </para>
 
   <para>
    <acronym>WAL</acronym> records are appended to the <acronym>WAL</acronym>
-   logs as each new record is written. The insert position is described by
+   as each new record is written. The insert position is described by
    a Log Sequence Number (<acronym>LSN</acronym>) that is a byte offset into
-   the logs, increasing monotonically with each new record.
+   the WAL, increasing monotonically with each new record.
    <acronym>LSN</acronym> values are returned as the datatype
    <link linkend="datatype-pg-lsn"><type>pg_lsn</type></link>. Values can be
    compared to calculate the volume of <acronym>WAL</acronym> data that
@@ -833,7 +833,7 @@
   </para>
 
   <para>
-   <acronym>WAL</acronym> logs are stored in the directory
+   <acronym>WAL</acronym> is stored in the directory
    <filename>pg_wal</filename> under the data directory, as a set of
    segment files, normally each 16 MB in size (but the size can be changed
    by altering the <option>--wal-segsize</option> <application>initdb</application> option).  Each segment is
@@ -848,7 +848,7 @@
   </para>
 
   <para>
-   It is advantageous if the log is located on a different disk from the
+   It is advantageous if the WAL is located on a different disk from the
    main database files.  This can be achieved by moving the
    <filename>pg_wal</filename> directory to another location (while the server
    is shut down, of course) and creating a symbolic link from the
@@ -856,7 +856,7 @@
   </para>
 
   <para>
-   The aim of <acronym>WAL</acronym> is to ensure that the log is
+   The aim of <acronym>WAL</acronym> is to ensure that the WAL record is
    written before database records are altered, but this can be subverted by
    disk drives<indexterm><primary>disk drive</primary></indexterm> that falsely report a
    successful write to the kernel,
@@ -864,19 +864,19 @@
    on the disk.  A power failure in such a situation might lead to
    irrecoverable data corruption.  Administrators should try to ensure
    that disks holding <productname>PostgreSQL</productname>'s
-   <acronym>WAL</acronym> log files do not make such false reports.
+   <acronym>WAL</acronym> files do not make such false reports.
    (See <xref linkend="wal-reliability"/>.)
   </para>
 
   <para>
-   After a checkpoint has been made and the log flushed, the
+   After a checkpoint has been made and the WAL flushed, the
    checkpoint's position is saved in the file
    <filename>pg_control</filename>. Therefore, at the start of recovery,
    the server first reads <filename>pg_control</filename> and
    then the checkpoint record; then it performs the REDO operation by
-   scanning forward from the log location indicated in the checkpoint
+   scanning forward from the WAL location indicated in the checkpoint
    record.  Because the entire content of data pages is saved in the
-   log on the first page modification after a checkpoint (assuming
+   WAL record on the first page modification after a checkpoint (assuming
    <xref linkend="guc-full-page-writes"/> is not disabled), all pages
    changed since the checkpoint will be restored to a consistent
    state.
@@ -884,7 +884,7 @@
 
   <para>
    To deal with the case where <filename>pg_control</filename> is
-   corrupt, we should support the possibility of scanning existing log
+   corrupt, we should support the possibility of scanning existing WAL
    segments in reverse order &mdash; newest to oldest &mdash; in order to find the
    latest checkpoint.  This has not been implemented yet.
    <filename>pg_control</filename> is small enough (less than one disk page)
