From: Osumi, Takamichi/大墨 昂道 <[email protected]>
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already and is formatted
I think make check-world uses the default setting for wal_level. You need to
set wal_level = none and do make installcheck-world.
> I compared two wal_levels both 'minimal' and 'none'.
> For both levels, I measured
> (1) cluster's restore from pg_dumpall output,
> (2) COPY initial data to a defined table as initial data loading, and
> (3) COPY data to a defined table with tuples, 3 times each for all cases.
> After that, calculated the average and the ratio of the loading speed.
> The conclusion is that wal_level=none cuts about 20% of the loading speed
> compared to 'minimal' in the three cases above.
Hmm. I wonder why pg_dumpall's performance increases by as much as 20%. On
the contrary to my previous hope, pg_dumpall uses COPY to restore data, so it
doesn't emit WAL when wal_level = minimal. (Is it brought by the difference of
whether DDL's WAL is emitted or not?)
> Sharing other scenario to measure is welcome.
How about running multiple concurrent data loading sessions when adding data to
existing tables with data, so that WAL is the bottleneck? That's the use case
of the target customer, isn't it?
> The input data was generated from pgbench with 1000 scale factor.
> It's about 9.3GB. For the table definition or the initial data for appended
> data
IIRC, I thought the scale factor of 1,000 is 1.5 GB. What displayed 9.3 GB?
SELECT pg_database_size() or something?
Below are review comments:
(1)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
return EndPos;
}
+ /* Issues WAL only for transaction end and check point */
+ if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+ {
+ XLogResetInsertion();
+ return GetLatestCheckPointLSN();
+ }
+
do
This does not emit transaction completion WAL records. Their RM ID is
RM_XACT_ID. Also, RM_XLOG_ID includes other kinds of WAL records than the
checkpoint WAL record. Correct the comment accordingly. I don't have a good
idea on how to represent the RM_XLOG_ID,, but the following might help:
[rmgrlist.h]
/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL,
NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL,
NULL, NULL)
(2)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
@@ -2591,10 +2591,10 @@ include_dir 'conf.d'
data to support WAL archiving and replication, including running
read-only queries on a standby server. <literal>minimal</literal>
removes all
logging except the information required to recover from a crash or
- immediate shutdown. Finally,
+ immediate shutdown. <literal>none</literal> generates no WAL in any
case. Finally,
According to the previous code, "none" emits some kinds of WAL records. So I
think we need to come up with a good name and/or description.
(3)
<literal>logical</literal> adds information necessary to support
logical
- decoding. Each level includes the information logged at all lower
- levels. This parameter can only be set at server start.
+ decoding. Each level except for <literal>none</literal> includes the
+ information logged at all lower levels. This parameter can only be
set at server start.
</para>
Why is this change necessary?
(4)
+ On the other hand, an unexpected crash of the server makes the
database cluster
+ inconsistent. For that reason, before utilizing this level, get a full
backup of the cluster and
+ backup of the entire operations that are done under the condition that
+ <varname>wal_level</varname> is <literal>none</literal>.
This gives the impression that the user can start the database server and see
inconsistent data. The reality is that the database server does not start,
isn't it?
(5)
@@ -1751,7 +1752,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND
somethingelse;
Aside from avoiding the time for the archiver or WAL sender to process the
WAL data, doing this will actually make certain commands faster, because
they do not to write WAL at all if <varname>wal_level</varname>
- is <literal>minimal</literal> and the current subtransaction (or top-level
+ is either <literal>minimal</literal> or <literal>minimal</literal>
+ and the current subtransaction (or top-level
transaction) created or truncated the table or index they change. (They
can guarantee crash safety more cheaply by doing
an <function>fsync</function> at the end than by writing WAL.)
This is not correct. In minimal, some conditions need to hold true for WAL to
not be generated as described above. OTOH, wal_level = none does not generate
WAL unconditionally.
(6)
14.4.9. Some Notes about pg_dump
...
If using WAL archiving or streaming replication, consider disabling them during
the restore. To do that, set archive_mode to off, wal_level to minimal, and
max_wal_senders to zero before loading the dump. Afterwards, set them back to
the right values and take a fresh base backup.
Why don't you refer to wal_level = none here as well?
(7)
@@ -918,10 +918,13 @@ PostmasterMain(int argc, char *argv[])
ReservedBackends, MaxConnections);
ExitPostmaster(1);
}
- if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level ==
WAL_LEVEL_MINIMAL)
+ if ((XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_NONE)
||
+ (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level ==
WAL_LEVEL_MINIMAL))
ereport(ERROR,
- (errmsg("WAL archival cannot be enabled when
wal_level is \"minimal\"")));
- if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+ (errmsg("WAL archival cannot be enabled when
wal_level is \"%s\"",
+ wal_level == WAL_LEVEL_MINIMAL
? "minimal" : "none")));
+ if ((max_wal_senders > 0 && wal_level == WAL_LEVEL_NONE) ||
+ (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL))
ereport(ERROR,
(errmsg("WAL streaming (max_wal_senders > 0)
requires wal_level \"replica\" or \"logical\"")));
This style of writing conitions is redundant. You can just change the code to
wal_level <= WAL_LEVEL_MINIMAL.
Also, the first message can be '"minimal" or "none"' like the second one.
(8)
@@ -989,6 +992,12 @@ PostmasterMain(int argc, char *argv[])
LocalProcessControlFile(false);
/*
+ * Check some conditions specific to wal_level='none' and ensures the
+ * database isn't inconsistent.
+ */
+ SafelyFinishedNoneLevel();
+
+ /*
This check should be moved to around the beginning of StartupXLOG().
PostmasterMain() is called in multi-user mode. It's not called in single-user
mode (postgres --single).
The new function is not necessary. StartupXLOG() can see the control file
contents directly.
(9)
+ /*
+ * Detect if we previously crashed under wal_level='none' or not.
+ */
+ unexpected_shutdown = ControlFile->state != DB_SHUTDOWNED &&
+ ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY;
+ if ((ControlFile->wal_level == WAL_LEVEL_NONE && unexpected_shutdown))
+ {
+ ereport(ERROR,
+ (errmsg("Server was unexpectedly shut down when
WAL logging was disabled"),
+ errhint("It looks like you need to deploy a
new cluster from your full backup again.")));
+ }
+}
Refine the message according to the following message guideline. For example,
the primary message has to start with a lower case letter.
https://www.postgresql.org/docs/devel/source.html
(10)
You need to edit this to add none:
src/backend/utils/misc/postgresql.conf.sample
(11)
src/include/access/xlogdefs.h
src/backend/access/transam/varsup.c
Consider modifying the comments in these files that refer to wal_level. Maybe
wal_level <= minimal is enough?
(12)
src/include/utils/rel.h
Modify the RelationNeedsWAL() so that it returns false when wal_level = none.
Adding wal_level != WAL_LEVEL_NONE is would probably be okay.
(13)
@@ -161,7 +161,8 @@ extern int XLogArchiveMode;
/* WAL levels */
typedef enum WalLevel
{
- WAL_LEVEL_MINIMAL = 0,
+ WAL_LEVEL_NONE = 0,
+ WAL_LEVEL_MINIMAL,
WAL_LEVEL_REPLICA,
WAL_LEVEL_LOGICAL
} WalLevel;
I'm a bit concerned about if we can change the values of existing symbols,
because wal_level setting is stored in pg_control file. Having a quick look at
source code, there seems to be no problem. However, I'm not sure about
pg_upgrade. Can you try, for example,
1. Create the database cluster with an older version, say, PG 13.
2. Start and stop the database server.
3. Run pg_controldata and see that it prints replica for the wal_level value.
4. Upgrade the database cluster with pg_upgrade.
5. Run pg_controldata and see the wal_level value.
If you change the values of existing symbols like your patch, you need to bump
PG_CONTROL_VERSION.
If you find we can't tchange the existing values, you can probably set
WAL_LEVEL_NONE to -1.
Regards
Takayuki Tsunakawa