Re: Operation log for major operations

2023-03-13 Thread Dmitry Koval

Kirk, I'm sorry about the long pause in my reply.

>We need some kind of semaphore flag that tells us something awkward
>happened. When it happened, and a little bit of extra information.

I agree that we do not have this kind of information.
Additionally, legal events like start of pg_rewind, pg_reset, ...  are 
interesting.



>You also make the point that if such things have happened, it would
>probably be a good idea to NOT allow pg_upgrade to run.
>It might even be a reason to constantly bother someone until
>the issue is repaired.

I think no reason to forbid the run of pg_upgrade for the user 
(especially in automatic mode).
If we automatically do NOT allow pg_upgrade, what should the user do for 
allow pg_upgrade?
Unfortunately, PostgreSQL does not have the utilities to correct errors 
in the database (in case of errors users uses copies of the DB or 
corrects errors manually).

An ordinary user cannot correct errors on his own ...
So we cannot REQUIRE the user to correct database errors, we can only 
INFORM about them.



>To that point, this feels like a "postgresql_panic.log" file (within
>the postgresql files?)...  Something that would prevent pg_upgrade,
>etc. That everyone recognizes is serious. Especially 3rd party vendors.
>I see the need for such a thing. I have to agree with others about
>questioning the proper place to write this.
>Are there other places that make sense, that you could use, especially
>if knowing it exists means there was a serious issue?

The location of the operation log (like a "postgresql_panic.log") is not 
easy question.
Our technical support is sure that the large number of failures are 
caused by "human factor" (actions of database administrators).
It is not difficult for database administrators to delete the 
"postgresql_panic.log" file or edit it (for example, replacing it with 
an old version; CRC will not save you from such an action).


Therefore, our technical support decided to place the operation log at 
the end of the pg_control file, at an offset of 8192 bytes (and protect 
this log with CRC).
About writing to the pg_control file what worries Tom Lane: information 
in pg_control is written once at system startup (twice in case of 
"promote").
Also, some utilities write information to the operation log too - 
pg_resetwal, pg_rewind, pg_upgrade (these utilities also modify the 
pg_control file without the operation log).


If you are interested, I can attach the current patch (for info - I 
think it makes no sense to offer this patch at commitfest).


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Operation log for major operations

2023-03-02 Thread Kirk Wolak
On Thu, Mar 2, 2023 at 4:09 PM Dmitry Koval  wrote:

> I'll try to expand my explanation.
> I fully understand and accept the arguments about "limited sense to go
> into the control file" and "about recording *anything* in the control
> file". This is totally correct for vanilla.
> But vendors have forks of PostgreSQL with custom features and extensions.
> Sometimes (especially at the first releases) these custom components
> have bugs which can causes rare problems in data.
> These problems can migrate with using pg_upgrade and "lazy" upgrade of
> pages to higher versions of PostgreSQL fork.
>
> So in error cases "recording crash information" etc. is not the only
> important information.
> Very important is history of this database (pg_upgrades, promotions,
> pg_resets, pg_rewinds etc.).
> Often these "history" allows you to determine from which version of the
> PostgreSQL fork the error came from and what causes of errors we can
> discard immediately.
>
> This "history" is the information that our technical support wants (and
> reason of this patch), but this information is not needed for vanilla...
>
> Another important condition is that the user should not have easy ways
> to delete information about "history" (about reason to use pg_control
> file as "history" storage, but write into it from position 4kB, 8kB,...).
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com
>
> Dmitry, this is a great explanation.  Thinking outside the box, it feels
like:
We need some kind of semaphore flag that tells us something awkward
happened.
When it happened, and a little bit of extra information.

You also make the point that if such things have happened, it would
probably be a good idea to NOT
allow pg_upgrade to run.  It might even be a reason to constantly bother
someone until the issue is repaired.

To that point, this feels like a "postgresql_panic.log" file (within the
postgresql files?)...  Something that would prevent pg_upgrade,
etc.  That everyone recognizes is serious.  Especially 3rd party vendors.

I see the need for such a thing.  I have to agree with others about
questioning the proper place to write this.

Are there other places that make sense, that you could use, especially if
knowing it exists means there was a serious issue?

Kirk


Re: Operation log for major operations

2023-03-02 Thread Dmitry Koval

I'll try to expand my explanation.
I fully understand and accept the arguments about "limited sense to go 
into the control file" and "about recording *anything* in the control 
file". This is totally correct for vanilla.

But vendors have forks of PostgreSQL with custom features and extensions.
Sometimes (especially at the first releases) these custom components 
have bugs which can causes rare problems in data.
These problems can migrate with using pg_upgrade and "lazy" upgrade of 
pages to higher versions of PostgreSQL fork.


So in error cases "recording crash information" etc. is not the only 
important information.
Very important is history of this database (pg_upgrades, promotions, 
pg_resets, pg_rewinds etc.).
Often these "history" allows you to determine from which version of the 
PostgreSQL fork the error came from and what causes of errors we can 
discard immediately.


This "history" is the information that our technical support wants (and 
reason of this patch), but this information is not needed for vanilla...


Another important condition is that the user should not have easy ways 
to delete information about "history" (about reason to use pg_control 
file as "history" storage, but write into it from position 4kB, 8kB,...).


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Operation log for major operations

2023-03-02 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Mar 02, 2023 at 08:57:43PM +0300, Dmitry Koval wrote:
>> These changes did not interest the community. It was expected (topic is very
>> specifiс: vendor's technical support). So no sense to distract developers

> Actually, I think there is interest, but it has to be phrased in a
> limited sense to go into the control file.

> In November, I referenced 2 threads, but I think you misunderstood one
> of them.  If you skim the first couple mails, you'll find a discussion
> about recording crash information in the control file.  

> https://www.postgresql.org/message-id/666c2599a07addea00ae2d0af96192def8441974.camel%40j-davis.com

> It's come up several times now, and there seems to be ample support for
> adding some limited information.

> But a "log" which might exceed a few dozen bytes (now or later), that's
> inconsistent with the pre-existing purpose served by pg_control.

I'm pretty dubious about recording *anything* in the control file.
Every time we write to that, we risk the entire database on completing
the write successfully.  I don't want to do that more often than once
per checkpoint.  If you want to put crash info in some less-critical
place, maybe we could talk.

regards, tom lane




Re: Operation log for major operations

2023-03-02 Thread Justin Pryzby
On Thu, Mar 02, 2023 at 08:57:43PM +0300, Dmitry Koval wrote:
> These changes did not interest the community. It was expected (topic is very
> specifiс: vendor's technical support). So no sense to distract developers

Actually, I think there is interest, but it has to be phrased in a
limited sense to go into the control file.

In November, I referenced 2 threads, but I think you misunderstood one
of them.  If you skim the first couple mails, you'll find a discussion
about recording crash information in the control file.  

https://www.postgresql.org/message-id/666c2599a07addea00ae2d0af96192def8441974.camel%40j-davis.com

It's come up several times now, and there seems to be ample support for
adding some limited information.

But a "log" which might exceed a few dozen bytes (now or later), that's
inconsistent with the pre-existing purpose served by pg_control.

-- 
Justin




Re: Operation log for major operations

2023-03-02 Thread Dmitry Koval

Hi!

These changes did not interest the community. It was expected (topic is 
very specifiс: vendor's technical support). So no sense to distract 
developers ...


I'll move patch to Withdrawn.

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Operation log for major operations

2023-03-01 Thread Gregory Stark (as CFM)
On Thu, 19 Jan 2023 at 16:12, Dmitry Koval  wrote:
>
>  >The patch does not apply on top of HEAD ...
>
> Thanks!
> Here is a fixed version.

Sorry to say, but this needs a rebase again... Setting to Waiting on Author...

Are there specific feedback needed to make progress? Once it's rebased
if you think it's ready set it to Ready for Committer or if you still
need feedback then Needs Review -- but it's usually more helpful to do
that with an email expressing what questions you're blocked on.

-- 
Gregory Stark
As Commitfest Manager




Re: Operation log for major operations

2023-01-25 Thread Dmitry Koval

Hi!

Maybe another discussion thread can be created for the consolidation of 
XX_file_exists functions.


Usually XX_file_exists functions are simple. They contain single call 
stat() or open() and specific error processing after this call.


Likely the unified function will be too complex to cover all the uses of 
XX_file_exists functions.


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Operation log for major operations

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 1:19 AM Dmitry Koval  wrote:

> Thanks, Ted Yu!
>
>  > Please update year for the license in pg_controllog.c
>
> License updated for files pg_controllog.c, controllog_utils.c,
> pg_controllog.h, controllog_utils.h.
>
>  > +check_file_exists(const char *datadir, const char *path)
>  > There is existing helper function such as:
>  > src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
>  > Is it possible to reuse that code ?
>
> There are a lot of functions that check the file existence:
>
> 1) src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
> 2) src/backend/jit/jit.c:static bool file_exists(const char *name);
> 3) src/test/regress/pg_regress.c:bool file_exists(const char *file);
> 4) src/bin/pg_upgrade/exec.c:bool pid_lock_file_exists(const char
> *datadir);
> 5) src/backend/commands/extension.c:bool extension_file_exists(const
> char *extensionName);
>
> But there is no unified function: different components use their own
> function with their own specific.
> Probably we can not reuse code of dfmgr.c:file_exists() because this
> function skip "errno == EACCES" (this error is critical for us).
> I copied for src/bin/pg_rewind/file_ops.c:check_file_exists() code of
> function jit.c:file_exists() (with adaptation for the utility).
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,
Maybe another discussion thread can be created for the consolidation of
XX_file_exists functions.

Cheers


Re: Operation log for major operations

2023-01-20 Thread Dmitry Koval

Thanks, Ted Yu!

> Please update year for the license in pg_controllog.c

License updated for files pg_controllog.c, controllog_utils.c, 
pg_controllog.h, controllog_utils.h.


> +check_file_exists(const char *datadir, const char *path)
> There is existing helper function such as:
> src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);
> Is it possible to reuse that code ?

There are a lot of functions that check the file existence:

1) src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);
2) src/backend/jit/jit.c:static bool file_exists(const char *name);
3) src/test/regress/pg_regress.c:bool file_exists(const char *file);
4) src/bin/pg_upgrade/exec.c:bool pid_lock_file_exists(const char *datadir);
5) src/backend/commands/extension.c:bool extension_file_exists(const 
char *extensionName);


But there is no unified function: different components use their own 
function with their own specific.
Probably we can not reuse code of dfmgr.c:file_exists() because this 
function skip "errno == EACCES" (this error is critical for us).
I copied for src/bin/pg_rewind/file_ops.c:check_file_exists() code of 
function jit.c:file_exists() (with adaptation for the utility).


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.comFrom aabff7fa8e51e760c558efa7b53fb675f996c7ff Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 14 Nov 2022 21:39:14 +0300
Subject: [PATCH v5] Operation log

---
 src/backend/access/transam/xlog.c |  10 +
 src/backend/backup/basebackup.c   |   1 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/meson.build|   1 +
 src/backend/utils/misc/pg_controllog.c| 142 
 src/bin/pg_checksums/pg_checksums.c   |   1 +
 src/bin/pg_resetwal/pg_resetwal.c |   4 +
 src/bin/pg_rewind/file_ops.c  |  20 +
 src/bin/pg_rewind/file_ops.h  |   2 +
 src/bin/pg_rewind/pg_rewind.c |  59 ++
 src/bin/pg_upgrade/controldata.c  |  52 ++
 src/bin/pg_upgrade/exec.c |   9 +-
 src/bin/pg_upgrade/pg_upgrade.c   |   2 +
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/common/Makefile   |   1 +
 src/common/controllog_utils.c | 667 ++
 src/common/meson.build|   1 +
 src/include/catalog/pg_controllog.h   | 142 
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/common/controllog_utils.h |  27 +
 src/test/modules/test_misc/meson.build|   1 +
 .../modules/test_misc/t/004_operation_log.pl  | 136 
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 24 files changed, 1290 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/utils/misc/pg_controllog.c
 create mode 100644 src/common/controllog_utils.c
 create mode 100644 src/include/catalog/pg_controllog.h
 create mode 100644 src/include/common/controllog_utils.h
 create mode 100644 src/test/modules/test_misc/t/004_operation_log.pl

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 8f47fb7570..dd3c4c7ac4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/controllog_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -4775,6 +4776,9 @@ BootStrapXLOG(void)
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
 
+   /* Put information into operation log. */
+   put_operation_log_element(DataDir, OLT_BOOTSTRAP);
+
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
@@ -5743,8 +5747,14 @@ StartupXLOG(void)
SpinLockRelease(>info_lck);
 
UpdateControlFile();
+
LWLockRelease(ControlFileLock);
 
+   /* Put information into operation log. */
+   if (promoted)
+   put_operation_log_element(DataDir, OLT_PROMOTED);
+   put_operation_log_element(DataDir, OLT_STARTUP);
+
/*
 * Shutdown the recovery environment.  This must occur after
 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..0ca709b5b2 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -211,6 +211,7 @@ static const struct exclude_list_item excludeFiles[] =
  */
 static const struct exclude_list_item noChecksumFiles[] = {
{"pg_control", false},
+   {"pg_control_log", false},
{"pg_filenode.map", false},
{"pg_internal.init", true},
{"PG_VERSION", false},
diff 

Re: Operation log for major operations

2023-01-19 Thread Ted Yu
On Thu, Jan 19, 2023 at 1:12 PM Dmitry Koval  wrote:

>  >The patch does not apply on top of HEAD ...
>
> Thanks!
> Here is a fixed version.
>
> Additional changes:
> 1) get_operation_log() function doesn't create empty operation log file;
> 2) removed extra unlink() call.
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,

Copyright (c) 1996-2022

Please update year for the license in pg_controllog.c

+check_file_exists(const char *datadir, const char *path)

There is existing helper function such as:

src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);

Is it possible to reuse that code ?

Cheers


Re: Operation log for major operations

2023-01-19 Thread Dmitry Koval

>The patch does not apply on top of HEAD ...

Thanks!
Here is a fixed version.

Additional changes:
1) get_operation_log() function doesn't create empty operation log file;
2) removed extra unlink() call.

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.comFrom d713a32499802395639645412a7c605870280f3a Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 14 Nov 2022 21:39:14 +0300
Subject: [PATCH v4] Operation log

---
 src/backend/access/transam/xlog.c |  10 +
 src/backend/backup/basebackup.c   |   1 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/meson.build|   1 +
 src/backend/utils/misc/pg_controllog.c| 142 
 src/bin/pg_checksums/pg_checksums.c   |   1 +
 src/bin/pg_resetwal/pg_resetwal.c |   4 +
 src/bin/pg_rewind/file_ops.c  |  28 +
 src/bin/pg_rewind/file_ops.h  |   2 +
 src/bin/pg_rewind/pg_rewind.c |  59 ++
 src/bin/pg_upgrade/controldata.c  |  52 ++
 src/bin/pg_upgrade/exec.c |   9 +-
 src/bin/pg_upgrade/pg_upgrade.c   |   2 +
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/common/Makefile   |   1 +
 src/common/controllog_utils.c | 667 ++
 src/common/meson.build|   1 +
 src/include/catalog/pg_controllog.h   | 142 
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/common/controllog_utils.h |  27 +
 src/test/modules/test_misc/meson.build|   1 +
 .../modules/test_misc/t/004_operation_log.pl  | 136 
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 24 files changed, 1298 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/utils/misc/pg_controllog.c
 create mode 100644 src/common/controllog_utils.c
 create mode 100644 src/include/catalog/pg_controllog.h
 create mode 100644 src/include/common/controllog_utils.h
 create mode 100644 src/test/modules/test_misc/t/004_operation_log.pl

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 8f47fb7570..dd3c4c7ac4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/controllog_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -4775,6 +4776,9 @@ BootStrapXLOG(void)
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
 
+   /* Put information into operation log. */
+   put_operation_log_element(DataDir, OLT_BOOTSTRAP);
+
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
@@ -5743,8 +5747,14 @@ StartupXLOG(void)
SpinLockRelease(>info_lck);
 
UpdateControlFile();
+
LWLockRelease(ControlFileLock);
 
+   /* Put information into operation log. */
+   if (promoted)
+   put_operation_log_element(DataDir, OLT_PROMOTED);
+   put_operation_log_element(DataDir, OLT_STARTUP);
+
/*
 * Shutdown the recovery environment.  This must occur after
 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..0ca709b5b2 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -211,6 +211,7 @@ static const struct exclude_list_item excludeFiles[] =
  */
 static const struct exclude_list_item noChecksumFiles[] = {
{"pg_control", false},
+   {"pg_control_log", false},
{"pg_filenode.map", false},
{"pg_internal.init", true},
{"PG_VERSION", false},
diff --git a/src/backend/storage/lmgr/lwlocknames.txt 
b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5673de1669 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock   46
 NotifyQueueTailLock47
+ControlLogFileLock 48
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b9ee4eb48a..3fa20e0368 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
help_config.o \
pg_config.o \
pg_controldata.o \
+   pg_controllog.o \
pg_rusage.o \
ps_status.o \
queryenvironment.o \
diff --git a/src/backend/utils/misc/meson.build 
b/src/backend/utils/misc/meson.build
index 

Re: Operation log for major operations

2023-01-19 Thread vignesh C
On Sat, 14 Jan 2023 at 15:47, Dmitry Koval  wrote:
>
> Hi!
>
>  >The patch does not apply on top of HEAD ...
>
> Here is a fixed version.
> Small additional fixes:
> 1) added CRC calculation for empty 'pg_control_log' file;
> 2) added saving 'errno' before calling LWLockRelease and restoring after
> that;
> 3) corrected pg_upgrade for case old cluster does not have
> 'pg_control_log' file.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
14bdb3f13de16523609d838b725540af5e23ddd3 ===
=== applying patch ./v3-0001-Operation-log.patch
...
patching file src/tools/msvc/Mkvcbuild.pm
Hunk #1 FAILED at 134.
1 out of 1 hunk FAILED -- saving rejects to file src/tools/msvc/Mkvcbuild.pm.rej

[1] - http://cfbot.cputube.org/patch_41_4018.log

Regards,
Vignesh




Re: Operation log for major operations

2023-01-14 Thread Dmitry Koval

Hi!

>The patch does not apply on top of HEAD ...

Here is a fixed version.
Small additional fixes:
1) added CRC calculation for empty 'pg_control_log' file;
2) added saving 'errno' before calling LWLockRelease and restoring after 
that;
3) corrected pg_upgrade for case old cluster does not have 
'pg_control_log' file.


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.comFrom 69faac5e9fc11d17d641a0cff3b26806c2930e3f Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 14 Nov 2022 21:39:14 +0300
Subject: [PATCH v3] Operation log

---
 src/backend/access/transam/xlog.c |  10 +
 src/backend/backup/basebackup.c   |   1 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/meson.build|   1 +
 src/backend/utils/misc/pg_controllog.c| 142 
 src/bin/pg_checksums/pg_checksums.c   |   1 +
 src/bin/pg_resetwal/pg_resetwal.c |   4 +
 src/bin/pg_rewind/file_ops.c  |  28 +
 src/bin/pg_rewind/file_ops.h  |   2 +
 src/bin/pg_rewind/pg_rewind.c |  59 ++
 src/bin/pg_upgrade/controldata.c  |  67 ++
 src/bin/pg_upgrade/exec.c |   9 +-
 src/bin/pg_upgrade/pg_upgrade.c   |   2 +
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/common/Makefile   |   1 +
 src/common/controllog_utils.c | 683 ++
 src/common/meson.build|   1 +
 src/include/catalog/pg_controllog.h   | 142 
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/common/controllog_utils.h |  27 +
 src/test/modules/test_misc/meson.build|   1 +
 .../modules/test_misc/t/004_operation_log.pl  | 136 
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 24 files changed, 1329 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/utils/misc/pg_controllog.c
 create mode 100644 src/common/controllog_utils.c
 create mode 100644 src/include/catalog/pg_controllog.h
 create mode 100644 src/include/common/controllog_utils.h
 create mode 100644 src/test/modules/test_misc/t/004_operation_log.pl

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 0070d56b0b..6b82acb46b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/controllog_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -4774,6 +4775,9 @@ BootStrapXLOG(void)
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
 
+   /* Put information into operation log. */
+   put_operation_log_element(DataDir, OLT_BOOTSTRAP);
+
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
@@ -5740,8 +5744,14 @@ StartupXLOG(void)
SpinLockRelease(>info_lck);
 
UpdateControlFile();
+
LWLockRelease(ControlFileLock);
 
+   /* Put information into operation log. */
+   if (promoted)
+   put_operation_log_element(DataDir, OLT_PROMOTED);
+   put_operation_log_element(DataDir, OLT_STARTUP);
+
/*
 * Shutdown the recovery environment.  This must occur after
 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..0ca709b5b2 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -211,6 +211,7 @@ static const struct exclude_list_item excludeFiles[] =
  */
 static const struct exclude_list_item noChecksumFiles[] = {
{"pg_control", false},
+   {"pg_control_log", false},
{"pg_filenode.map", false},
{"pg_internal.init", true},
{"PG_VERSION", false},
diff --git a/src/backend/storage/lmgr/lwlocknames.txt 
b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5673de1669 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock   46
 NotifyQueueTailLock47
+ControlLogFileLock 48
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b9ee4eb48a..3fa20e0368 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
help_config.o \
pg_config.o \
pg_controldata.o \
+   pg_controllog.o \
pg_rusage.o \
ps_status.o \

Re: Operation log for major operations

2023-01-13 Thread vignesh C
On Mon, 5 Dec 2022 at 13:42, Dmitry Koval  wrote:
>
> Hi!
>
>  >I think storing this in pg_control is a bad idea.  That file is
>  >extremely critical and if you break it, you're pretty much SOL on
>  >recovering your data.  I suggest that this should use a separate file.
>
> Thanks. Operation log location changed to 'global/pg_control_log' (if
> the name is not pretty, it can be changed).
>
> I attached the patch (v2-0001-Operation-log.patch) and description of
> operation log (Operation-log.txt).

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
ff23b592ad6621563d3128b26860bcb41daf9542 ===
=== applying patch ./v2-0001-Operation-log.patch

patching file src/tools/msvc/Mkvcbuild.pm
Hunk #1 FAILED at 134.
1 out of 1 hunk FAILED -- saving rejects to file src/tools/msvc/Mkvcbuild.pm.rej

[1] - http://cfbot.cputube.org/patch_41_4018.log

Regards,
Vignesh




Re: Operation log for major operations

2022-12-05 Thread Dmitry Koval

Hi!

>I think storing this in pg_control is a bad idea.  That file is
>extremely critical and if you break it, you're pretty much SOL on
>recovering your data.  I suggest that this should use a separate file.

Thanks. Operation log location changed to 'global/pg_control_log' (if 
the name is not pretty, it can be changed).


I attached the patch (v2-0001-Operation-log.patch) and description of 
operation log (Operation-log.txt).



With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com-
Introduction.
-
Operation log is designed to store information about critical system events 
(like pg_upgrade, pg_resetwal, pg_resetwal, etc.).
This information is not interesting to the ordinary user, but it is very 
important for the vendor's technical support.
An example: client complains about DB damage to technical support (damage was 
caused by pg_resetwal which was "silent" executed by one of administrators).

Concepts.
-
* operation log is placed in the separate file 'global/pg_control_log' (log 
size is 8kB);
* user can not modify the operation log;  log can be changed  by function call 
only (from postgres or from postgres utilities);
* operation log is a ring buffer (with CRC-32 protection), deleting entries 
from the operation log is possible only when the buffer is overflowed;
* SQL-function is used to read data of operation log.

Example of operation log data.
--

>select * from pg_operation_log();
   event|edition|version|   lsn   | last |count
+---+---+-+--+--
 startup|vanilla|10.22.0|0/828|2022-11-18 23:06:27+03|1
 pg_upgrade |vanilla|15.0.0 |0/828|2022-11-18 23:06:27+03|1
 startup|vanilla|15.0.0 |0/80001F8|2022-11-18 23:11:53+03|3
 pg_resetwal|vanilla|15.0.0 |0/80001F8|2022-11-18 23:09:53+03|2
(4 rows)

Some details about inserting data to operation log.
---
There are two modes of inserting information about events in the operation log.

* MERGE mode (events "startup", "pg_resetwal", "pg_rewind").
We searches in ring buffer of operation log an event with the same type 
("startup" for example) with the same version number.
If event was found, we will increment event counter by 1 and update the 
date/time of event ("last" field) with the current value.
If event was not found, we will add this event to the ring buffer (see INSERT 
mode).
* INSERT mode (events "bootstrap", "pg_upgrade", "promoted").
We will add an event to the ring buffer (without searching).
-
Operation log structures.
-
The operation log is placed in the pg_control_log file (file size 
PG_OPERATION_LOG_FULL_SIZE=8192).
Operation log is a ring buffer of 341 elements (24 bytes each) with header (8 
bytes).

а) Operation log element:

typedef struct OperationLogData
{
uint8   ol_type;/* operation type */
uint8   ol_edition; /* postgres edition */
uint16  ol_count;   /* number of operations */
uint32  ol_version; /* postgres version */
pg_time_t   ol_timestamp;   /* = int64, operation date/time */
XLogRecPtr  ol_lsn; /* = uint64, last check point 
record ptr */
}   OperationLogData;

Description of fields:

* ol_type - event type. The types are contained in an enum:

typedef enum
{
OLT_BOOTSTRAP = 1,  /* bootstrap */
OLT_STARTUP,/* server startup */
OLT_RESETWAL,   /* pg_resetwal */
OLT_REWIND, /* pg_rewind */
OLT_UPGRADE,/* pg_upgrade */
OLT_PROMOTED,   /* promoted */
OLT_NumberOfTypes   /* should be last */
}   ol_type_enum;

* ol_edition - PostgreSQL edition (this field is important for custom 
PostgreSQL editions).
The editions are contained in an enum:

typedef enum
{
ED_PG_ORIGINAL = 0
/* Here can be custom editions */
}   PgNumEdition;

* ol_count - the number of fired events in case events of this type can be 
merged (otherwise 1). Maximum is 65536;
* ol_version - version formed with using PG_VERSION_NUM (PG_VERSION_NUM*100; 
for example: 13000800 for v13.8). Two last digits can be used for patch version;
* ol_timestamp - date/time of the last fired event in case events of this type 
can be merged (otherwise - the date/time of the event);
* ol_lsn - "Latest checkpoint location" value (ControlFile->checkPoint) which 
contains in pg_control file at the time of the event.

б) Operation log header:

typedef struct OperationLogHeader
{
pg_crc32c   ol_crc; /* CRC of operation 

Re: Operation log for major operations

2022-11-23 Thread Alvaro Herrera
On 2022-Nov-21, Dmitry Koval wrote:

> Concepts.
> -
> * operation log is placed in the file 'global/pg_control', starting from
> position 4097 (log size is 4kB);

I think storing this in pg_control is a bad idea.  That file is
extremely critical and if you break it, you're pretty much SOL on
recovering your data.  I suggest that this should use a separate file.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
 (George Orwell's The Lord of the Rings)




Re: Operation log for major operations

2022-11-21 Thread Dmitry Koval

Thanks for references, Justin!

Couple comments about these references.

1) "Make unlogged table resets detectable".
https://www.postgresql.org/message-id/flat/62750df5b126e1d8ee039a79ef3cc64ac3d47cd5.camel%40j-davis.com

This conversation is about specific problem (unlogged table repairing). 
Operation log has another use - it is primary a helper for diagnostics.


2) "RFC: Add 'taint' field to pg_control."
https://www.postgresql.org/message-id/flat/20180228214311.jclah37cwh572t2c%40alap3.anarazel.de

This is almost the same problem that we want to solve with operation 
log. Differences between the operation log and what is discussed in the 
thread:
* there suggested to store operation log in pg_control file - but 
separate from pg_control main data (and write data separately too);
* operation log data can be represented in relational form (not flags), 
this is more usable for RDBMS;
* number of registered event types can be increased easy (without 
changes of storage).


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Operation log for major operations

2022-11-21 Thread Justin Pryzby
See also prior discussions:
https://www.postgresql.org/message-id/flat/62750df5b126e1d8ee039a79ef3cc64ac3d47cd5.camel%40j-davis.com
https://www.postgresql.org/message-id/flat/20180228214311.jclah37cwh572t2c%40alap3.anarazel.de

-- 
Justin




Operation log for major operations

2022-11-21 Thread Dmitry Koval

Hi, hackers!

It is important for customer support to know what system operations 
(pg_resetwal, pg_rewind, pg_upgrade, ...) have been executed on the 
database.  A variant of implementation of the log for system operations 
(operation log) is attached to this email.


Introduction.
-
Operation log is designed to store information about critical system 
events (like pg_upgrade, pg_resetwal, pg_resetwal, etc.).
This information is not interesting to the ordinary user, but it is very 
important for the vendor's technical support.
An example: client complains about DB damage to technical support 
(damage was caused by pg_resetwal which was "silent" executed by one of 
administrators).


Concepts.
-
* operation log is placed in the file 'global/pg_control', starting from 
position 4097 (log size is 4kB);
* user can not modify the operation log;  log can be changed  by 
function call only (from postgres or from postgres utilities);
* operation log is a ring buffer (with CRC-32 protection), deleting 
entries from the operation log is possible only when the buffer is 
overflowed;

* SQL-function is used to read data of operation log.

Example of operation log data.
--

>select * from pg_operation_log();
   event|edition|version|   lsn   | last |count
+---+---+-+--+--
 startup|vanilla|10.22.0|0/828|2022-11-18 23:06:27+03|1
 pg_upgrade |vanilla|15.0.0 |0/828|2022-11-18 23:06:27+03|1
 startup|vanilla|15.0.0 |0/80001F8|2022-11-18 23:11:53+03|3
 pg_resetwal|vanilla|15.0.0 |0/80001F8|2022-11-18 23:09:53+03|2
(4 rows)

Some details about inserting data to operation log.
---
There are two modes of inserting information about events in the 
operation log.


* MERGE mode (events "startup", "pg_resetwal", "pg_rewind").
We searches in ring buffer of operation log an event with the same type 
("startup" for example) with the same version number.
If event was found, we will increment event counter by 1 and update the 
date/time of event ("last" field) with the current value.
If event was not found, we will add this event to the ring buffer (see 
INSERT mode).

* INSERT mode (events "bootstrap", "pg_upgrade", "promoted").
We will add an event to the ring buffer (without searching).


P.S. File 'global/pg_control' was chosen as operation log storage 
because data of this file cannot be removed or modified in a simple way 
and no need to change any extensions and utilities to support this file.


I attached the patch (v1-0001-Operation-log.patch) and extended 
description of operation log (Operation-log.txt).



With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com-
Operation log structures.
-
The operation log is placed in the pg_control file (file size 
PG_CONTROL_FILE_SIZE=8192).
Operation log size is PG_OPERATION_LOG_FULL_SIZE=4096.
Operation log is a ring buffer of 170 elements (24 bytes each) with header (16 
bytes).

а) Operation log element:

typedef struct OperationLogData
{
uint8   ol_type;/* operation type */
uint8   ol_edition; /* postgres edition */
uint16  ol_count;   /* number of operations */
uint32  ol_version; /* postgres version */
pg_time_t   ol_timestamp;   /* = int64, operation date/time */
XLogRecPtr  ol_lsn; /* = uint64, last check point 
record ptr */
}   OperationLogData;

Description of fields:

* ol_type - event type. The types are contained in an enum:

typedef enum
{
OLT_BOOTSTRAP = 1,  /* bootstrap */
OLT_STARTUP,/* server startup */
OLT_RESETWAL,   /* pg_resetwal */
OLT_REWIND, /* pg_rewind */
OLT_UPGRADE,/* pg_upgrade */
OLT_PROMOTED,   /* promoted */
OLT_NumberOfTypes   /* should be last */
}   ol_type_enum;

* ol_edition - PostgreSQL edition (this field is important for custom 
PostgreSQL editions).
The editions are contained in an enum:

typedef enum
{
ED_PG_ORIGINAL = 0
/* Here can be custom editions */
}   PgNumEdition;

* ol_count - the number of fired events in case events of this type can be 
merged (otherwise 1). Maximum is 65536;
* ol_version - version formed with using PG_VERSION_NUM (PG_VERSION_NUM*100; 
for example: 13000800 for v13.8). Two last digits can be used for patch version;
* ol_timestamp - date/time of the last fired event in case events of this type 
can be merged (otherwise - the date/time of the event);
* ol_lsn - "Latest checkpoint location"