Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-27 Thread Bharath Rupireddy
On Mon, Dec 26, 2022 at 4:18 PM Bharath Rupireddy
 wrote:
>
> On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier  wrote:
>
> +1. I think this feature will also be useful in pg_walinspect.

Just for the record - here's the pg_walinspect function to extract
FPIs from WAL records -
https://www.postgresql.org/message-id/CALj2ACVCcvzd7WiWvD%3D6_7NBvVB_r6G0EGSxL4F8vosAi6Se4g%40mail.gmail.com.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Michael Paquier
On Mon, Dec 26, 2022 at 04:18:18PM +0530, Bharath Rupireddy wrote:
> +1. I think this feature will also be useful in pg_walinspect.
> However, I'm a bit concerned that it can flood the running database
> disk - if someone generates a lot of FPI files.

pg_read_file() and pg_waldump can be fed absolute paths outside the
data folder.  So, well, just don't do that then :)
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Michael Paquier
On Mon, Dec 26, 2022 at 02:39:03PM -0600, Justin Pryzby wrote:
> fclose() should be tested, too:

Sure.  Done that too, and applied the change after a last lookup.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Justin Pryzby
On Mon, Dec 26, 2022 at 04:28:52PM +0900, Michael Paquier wrote:
> Comments?

> + file = fopen(filename, PG_BINARY_W);
> + if (!file)
> + pg_fatal("could not open file \"%s\": %m", filename);
> +
> + if (fwrite(page, BLCKSZ, 1, file) != 1)
> + pg_fatal("could not write file \"%s\": %m", filename);
> +
> + fclose(file);

fclose() should be tested, too:

> + if (fwrite(page, BLCKSZ, 1, file) != 1 || fclose(file) != 0)
> + pg_fatal("could not write file \"%s\": %m", filename);

-- 
Justin




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread David Christensen
> On Dec 26, 2022, at 1:29 AM, Michael Paquier  wrote:
> 
> On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:
>> Thanks for the patch. I've made the above change as well as renamed
>> the test file name to be save_fpi.pl, everything else remains the same
>> as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
>> https://commitfest.postgresql.org/41/3628/.
> 
> I have done a review of that, and here are my notes:
> - The variable names were a bit inconsistent, so I have switched most
> of the new code to use "fullpage".
> - The code was not able to handle the case of a target directory
> existing but empty, so I have added a wrapper on pg_check_dir().
> - XLogRecordHasFPW() could be checked directly in the function saving
> the blocks.  Still, there is no need for it as we apply the same
> checks again in the inner loop of the routine.
> - The new test has been renamed.
> - RestoreBlockImage() would report a failure and the code would just
> skip it and continue its work.  This could point out to a compression
> failure for example, so like any code paths calling this routine I
> think that we'd better do a pg_fatal() and fail hard.
> - I did not understand why there is a reason to make this option
> conditional on the record prints or even the stats, so I have moved
> the FPW save routine into a separate code path.  The other two could
> be silenced (or not) using --quiet for example, for the same result as
> v12 without impacting the usability of this feature.
> - Few tweaks to the docs, the --help output, the comments and the
> tests.
> - Indentation applied.
> 
> Being able to filter the blocks saved using start/end LSNs or just
> --relation is really cool, especially as the file names use the same
> order as what's needed for this option.

Sounds good, definitely along the ideas of what I’d originally envisioned.

Thanks,

David





Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Bharath Rupireddy
On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier  wrote:
>
> I have done a review of that, and here are my notes:
> - The variable names were a bit inconsistent, so I have switched most
> of the new code to use "fullpage".
>
> - The new test has been renamed.
>
> - RestoreBlockImage() would report a failure and the code would just
> skip it and continue its work.  This could point out to a compression
> failure for example, so like any code paths calling this routine I
> think that we'd better do a pg_fatal() and fail hard.
>
> - XLogRecordHasFPW() could be checked directly in the function saving
> the blocks.  Still, there is no need for it as we apply the same
> checks again in the inner loop of the routine.
>
> - Few tweaks to the docs, the --help output, the comments and the
> tests.
> - Indentation applied.
>
> - I did not understand why there is a reason to make this option
> conditional on the record prints or even the stats, so I have moved
> the FPW save routine into a separate code path.  The other two could
> be silenced (or not) using --quiet for example, for the same result as
> v12 without impacting the usability of this feature.

Looks good.

> - The code was not able to handle the case of a target directory
> existing but empty, so I have added a wrapper on pg_check_dir().

Looks okay and with the following, we impose the user-provided target
directory must be empty.

+case 4:
+/* Exists and not empty */
+pg_fatal("directory \"%s\" exists but is not empty", path);

> Being able to filter the blocks saved using start/end LSNs or just
> --relation is really cool, especially as the file names use the same
> order as what's needed for this option.
>
> Comments?

+1. I think this feature will also be useful in pg_walinspect.
However, I'm a bit concerned that it can flood the running database
disk - if someone generates a lot of FPI files.

Overall, the v13 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-25 Thread Michael Paquier
On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:
> Thanks for the patch. I've made the above change as well as renamed
> the test file name to be save_fpi.pl, everything else remains the same
> as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
> https://commitfest.postgresql.org/41/3628/.

I have done a review of that, and here are my notes:
- The variable names were a bit inconsistent, so I have switched most
of the new code to use "fullpage".
- The code was not able to handle the case of a target directory
existing but empty, so I have added a wrapper on pg_check_dir().
- XLogRecordHasFPW() could be checked directly in the function saving
the blocks.  Still, there is no need for it as we apply the same
checks again in the inner loop of the routine.
- The new test has been renamed.
- RestoreBlockImage() would report a failure and the code would just
skip it and continue its work.  This could point out to a compression
failure for example, so like any code paths calling this routine I
think that we'd better do a pg_fatal() and fail hard.
- I did not understand why there is a reason to make this option
conditional on the record prints or even the stats, so I have moved
the FPW save routine into a separate code path.  The other two could
be silenced (or not) using --quiet for example, for the same result as
v12 without impacting the usability of this feature.
- Few tweaks to the docs, the --help output, the comments and the
tests.
- Indentation applied.

Being able to filter the blocks saved using start/end LSNs or just
--relation is really cool, especially as the file names use the same
order as what's needed for this option.

Comments?
--
Michael
From 66eedbb96858a4f6804509900b116d8a63b39925 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 26 Dec 2022 16:28:00 +0900
Subject: [PATCH v13] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: _ with
formatting to make things somewhat sortable; for instance:

-01C0.1663.1.6117.0_main
-01000150.1664.0.6115.0_main
-010001E0.1664.0.6114.0_main
-01000270.1663.1.6116.0_main
-01000300.1663.1.6113.0_main
-01000390.1663.1.6112.0_main
-01000420.1663.1.8903.0_main
-010004B0.1663.1.8902.0_main
-01000540.1663.1.6111.0_main
-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 src/bin/pg_waldump/meson.build|   1 +
 src/bin/pg_waldump/pg_waldump.c   | 107 +
 src/bin/pg_waldump/t/002_save_fullpage.pl | 111 ++
 doc/src/sgml/ref/pg_waldump.sgml  |  66 +
 4 files changed, 285 insertions(+)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 3fa1b53e71..0428998350 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -31,6 +31,7 @@ tests += {
   'tap': {
 'tests': [
   't/001_basic.pl',
+  't/002_save_fullpage.pl',
 ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..d90a142c68 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fullpage_path;
 } XLogDumpConfig;
 
 
@@ -112,6 +119,37 @@ verify_directory(const char *directory)
 	return true;
 }
 
+/*
+ * Create if necessary the directory storing the full-page images extracted
+ * from the WAL records read.
+ */
+static void
+create_fullpage_directory(char *path)
+{
+	int			ret;
+
+	switch ((ret = pg_check_dir(path)))
+	{
+		case 0:
+			/* Does 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-24 Thread Bharath Rupireddy
On Sat, Dec 24, 2022 at 12:58 AM David Christensen
 wrote:
>
> On Fri, Dec 23, 2022 at 12:57 PM David Christensen
>  wrote:
> >
> > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
> >  wrote:
>
> > > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > > When enabled with allows_streaming, there are a bunch of things that
> > > happen to the node while initializing, I don't think we need all of
> > > them for this.
> >
> > I think the "allows_streaming" was required to ensure the WAL files
> > were preserved properly, and was the approach we ended up taking
> > rather than trying to fail the archive_command or other approaches I'd
> > taken earlier. I'd rather keep this if we can, unless you can propose
> > a different approach that would continue to work in the same way.
>
> Confirmed that we needed this in order to create the replication slot,
> so this /is/ required for the test to work.

The added test needs wal_level to be replica, but the TAP tests set it
to minimal if allows_streaming isn't passed. However, if passed
allows_streaming, it sets a bunch of other parameters which are not
required for this test (see note->init function in cluster.pm), hence
we could just set the required parameters wal_level = replica and
max_wal_senders for the replication slot to be created.

> Enclosing v11 with yours and Michael's latest feedback.

Thanks for the patch. I've made the above change as well as renamed
the test file name to be save_fpi.pl, everything else remains the same
as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen
On Fri, Dec 23, 2022 at 12:57 PM David Christensen
 wrote:
>
> On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
>  wrote:

[snip]

> > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > When enabled with allows_streaming, there are a bunch of things that
> > happen to the node while initializing, I don't think we need all of
> > them for this.
>
> I think the "allows_streaming" was required to ensure the WAL files
> were preserved properly, and was the approach we ended up taking
> rather than trying to fail the archive_command or other approaches I'd
> taken earlier. I'd rather keep this if we can, unless you can propose
> a different approach that would continue to work in the same way.

Confirmed that we needed this in order to create the replication slot,
so this /is/ required for the test to work.

Enclosing v11 with yours and Michael's latest feedback.

Best,

David


v11-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen
On Mon, Dec 19, 2022 at 12:23 AM Michael Paquier  wrote:
>
> On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:
> > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  
> > wrote:
> > This v10 should incorporate your feedback as well as Bharath's.
>
> Thanks for the new version.  I have minor comments.
>
> >> It seems to me that you could allow things to work even if the
> >> directory exists and is empty.  See for example
> >> verify_dir_is_empty_or_create() in pg_basebackup.c.
> >
> > The `pg_mkdir_p()` supports an existing directory (and I don't think
> > we want to require it to be empty first), so this only errors when it
> > can't create a directory for some reason.
>
> Sure, but things can also be made so as we don't fail if the directory
> exists and is empty?  This would be more consistent with the base
> directories created by pg_basebackup and initdb.

I guess I'm feeling a little dense here; how is this failing if there
is an existing empty directory?

> >> +$node->safe_psql('postgres', < >> +SELECT 'init' FROM 
> >> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, 
> >> false);
> >> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> >> +CHECKPOINT; -- required to force FPI for next writes
> >> +UPDATE test_table SET a = a + 1;
> >> Using an EOF to execute a multi-line query would be a first.  Couldn't
> >> you use the same thing as anywhere else?  009_twophase.pl just to
> >> mention one.  (Mentioned by Bharath upthread, where he asked for an
> >> extra opinion so here it is.)
> >
> > Fair enough, while idiomatic perl to me, not a hill to die on;
> > converted to a standard multiline string.
>
> By the way, knowing that we have an option called --fullpage, could be
> be better to use --save-fullpage for the option name?

Works for me.  I think I'd just wanted to avoid reformatting the
entire usage message which is why I'd gone with the shorter version.

> +   OPF = fopen(filename, PG_BINARY_W);
> +   if (!OPF)
> +   pg_fatal("couldn't open file for output: %s", filename);
> [..]
> +   if (fwrite(page, BLCKSZ, 1, OPF) != 1)
> +   pg_fatal("couldn't write out complete full page image to file: 
> %s", filename);
> These should more more generic, as of "could not open file \"%s\"" and
> "could not write file \"%s\"" as the file name provides all the
> information about what this writes.

Sure, will update.

Best,

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen
On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
 wrote:
>
> On Fri, Dec 16, 2022 at 4:47 AM David Christensen
>  wrote:
> >
> > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  
> > wrote:
> > >
> > > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > > > I can get one sent in tomorrow.
> >
> > This v10 should incorporate your feedback as well as Bharath's.
>
> Thanks for the patch. Here're some minor comments:
>
> 1. +my $node =  PostgreSQL::Test::Cluster->new('primary');
> Can the name be other than 'primary' because we don't create a standby
> for this test? Something like - 'node_a' or 'node_extract_fpi' or some
> other.

Sure, no issues.

> 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> When enabled with allows_streaming, there are a bunch of things that
> happen to the node while initializing, I don't think we need all of
> them for this.

I think the "allows_streaming" was required to ensure the WAL files
were preserved properly, and was the approach we ended up taking
rather than trying to fail the archive_command or other approaches I'd
taken earlier. I'd rather keep this if we can, unless you can propose
a different approach that would continue to work in the same way.

> 3. +$node->init(extra => ['-k'], allows_streaming => 1);
> Can we use --data-checksums instead of -k for more readability?
> Perhaps, a comment on why we need that option helps greatly.

Yeah, can spell out; don't recall exactly why we needed it offhand,
but will confirm or remove if insignificant.

> 4.
> +page = (Page) buf.data;
> +
> +if (!XLogRecHasBlockRef(record, block_id))
> +continue;
> +
> +if (!XLogRecHasBlockImage(record, block_id))
> +continue;
> +
> +if (!RestoreBlockImage(record, block_id, page))
> +continue;
> Can you shift  page = (Page) buf.data; just before the last if
> condition RestoreBlockImage() so that it doesn't get executed for the
> other two continue statements?

Sure; since it was just setting a pointer value I didn't consider it
to be a hotspot for optimization.

Best,

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-21 Thread Bharath Rupireddy
On Fri, Dec 16, 2022 at 4:47 AM David Christensen
 wrote:
>
> On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
> >
> > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > > I can get one sent in tomorrow.
>
> This v10 should incorporate your feedback as well as Bharath's.

Thanks for the patch. Here're some minor comments:

1. +my $node =  PostgreSQL::Test::Cluster->new('primary');
Can the name be other than 'primary' because we don't create a standby
for this test? Something like - 'node_a' or 'node_extract_fpi' or some
other.

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

3. +$node->init(extra => ['-k'], allows_streaming => 1);
Can we use --data-checksums instead of -k for more readability?
Perhaps, a comment on why we need that option helps greatly.

4.
+page = (Page) buf.data;
+
+if (!XLogRecHasBlockRef(record, block_id))
+continue;
+
+if (!XLogRecHasBlockImage(record, block_id))
+continue;
+
+if (!RestoreBlockImage(record, block_id, page))
+continue;
Can you shift  page = (Page) buf.data; just before the last if
condition RestoreBlockImage() so that it doesn't get executed for the
other two continue statements?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-18 Thread Michael Paquier
On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:
> On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
> This v10 should incorporate your feedback as well as Bharath's.

Thanks for the new version.  I have minor comments.

>> It seems to me that you could allow things to work even if the
>> directory exists and is empty.  See for example
>> verify_dir_is_empty_or_create() in pg_basebackup.c.
> 
> The `pg_mkdir_p()` supports an existing directory (and I don't think
> we want to require it to be empty first), so this only errors when it
> can't create a directory for some reason.

Sure, but things can also be made so as we don't fail if the directory
exists and is empty?  This would be more consistent with the base
directories created by pg_basebackup and initdb.

>> +$node->safe_psql('postgres', <> +SELECT 'init' FROM 
>> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
>> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
>> +CHECKPOINT; -- required to force FPI for next writes
>> +UPDATE test_table SET a = a + 1;
>> Using an EOF to execute a multi-line query would be a first.  Couldn't
>> you use the same thing as anywhere else?  009_twophase.pl just to
>> mention one.  (Mentioned by Bharath upthread, where he asked for an
>> extra opinion so here it is.)
> 
> Fair enough, while idiomatic perl to me, not a hill to die on;
> converted to a standard multiline string.

By the way, knowing that we have an option called --fullpage, could be
be better to use --save-fullpage for the option name?

+   OPF = fopen(filename, PG_BINARY_W);
+   if (!OPF)
+   pg_fatal("couldn't open file for output: %s", filename);
[..]
+   if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+   pg_fatal("couldn't write out complete full page image to file: %s", 
filename);
These should more more generic, as of "could not open file \"%s\"" and
"could not write file \"%s\"" as the file name provides all the
information about what this writes.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-15 Thread David Christensen
On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
>
> On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > I can get one sent in tomorrow.

This v10 should incorporate your feedback as well as Bharath's.

> -XLogRecordHasFPW(XLogReaderState *record)
> +XLogRecordHasFPI(XLogReaderState *record)
> This still refers to a FPW, so let's leave that out as well as any
> renamings of this kind..

Reverted those changes.

> +   if (config.save_fpi_path != NULL)
> +   {
> +   /* Create the dir if it doesn't exist */
> +   if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
> +   {
> +   pg_log_error("could not create output directory \"%s\": %m",
> +config.save_fpi_path);
> +   goto bad_argument;
> +   }
> +   }
> It seems to me that you could allow things to work even if the
> directory exists and is empty.  See for example
> verify_dir_is_empty_or_create() in pg_basebackup.c.

The `pg_mkdir_p()` supports an existing directory (and I don't think
we want to require it to be empty first), so this only errors when it
can't create a directory for some reason.

> +my $file_re =
> +  
> qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
> This is artistic to parse for people not used to regexps (I do, a
> little).  Perhaps this could use a small comment with an example of
> name or a reference describing this format?

Added a description of what this is looking for.

> +# Set umask so test directories and files are created with default 
> permissions
> +umask(0077);
> Incorrect copy-paste coming from elsewhere like the TAP tests of
> pg_basebackup with group permissions?  Doesn't
> PostgreSQL::Test::Utils::tempdir give already enough protection in
> terms of umask() and permissions?

I'd expect that's where that came from. Removed.

> +   if (config.save_fpi_path != NULL)
> +   {
> +   /* We fsync our output directory only; since these files are not part
> +* of the production database we do not require the performance hit
> +* that fsyncing every FPI would entail, so are doing this as a
> +* compromise. */
> +
> +   fsync_fname(config.save_fpi_path, true);
> +   }
> Why is it necessary to flush anything at all, then?

I personally don't think it is, but added it per Bharath's request.
Removed in this revision.

> +my $relation = $node->safe_psql('postgres',
> +   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
> dattablespace ELSE reltablespace END, pg_database.oid,
> pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
> relname = 'test_table' AND datname = current_database()}
> Could you rewrite that to be on multiple lines?

Sure, reformated.

> +diag "using walfile: $walfile";
> You should avoid the use of "diag", as this would cause extra output
> noise with a simple make check.

Had been using it for debugging and didn't realize it'd cause issues.
Removed both instances.

> +$node->safe_psql('postgres', "SELECT 
> pg_drop_replication_slot('regress_pg_waldump_slot')")
> That's not really necessary, the nodes are wiped out at the end of the
> test.

Removed.

> +$node->safe_psql('postgres', < +SELECT 'init' FROM 
> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> +CHECKPOINT; -- required to force FPI for next writes
> +UPDATE test_table SET a = a + 1;
> Using an EOF to execute a multi-line query would be a first.  Couldn't
> you use the same thing as anywhere else?  009_twophase.pl just to
> mention one.  (Mentioned by Bharath upthread, where he asked for an
> extra opinion so here it is.)

Fair enough, while idiomatic perl to me, not a hill to die on;
converted to a standard multiline string.

Best,

David


v10-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread Michael Paquier
On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> I can get one sent in tomorrow.

-XLogRecordHasFPW(XLogReaderState *record)
+XLogRecordHasFPI(XLogReaderState *record)
This still refers to a FPW, so let's leave that out as well as any
renamings of this kind..

+   if (config.save_fpi_path != NULL)
+   {
+   /* Create the dir if it doesn't exist */
+   if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+   {
+   pg_log_error("could not create output directory \"%s\": %m",
+config.save_fpi_path);
+   goto bad_argument;
+   }
+   }
It seems to me that you could allow things to work even if the
directory exists and is empty.  See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.

+my $file_re =
+  
qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
This is artistic to parse for people not used to regexps (I do, a
little).  Perhaps this could use a small comment with an example of
name or a reference describing this format?

+# Set umask so test directories and files are created with default permissions
+umask(0077);
Incorrect copy-paste coming from elsewhere like the TAP tests of
pg_basebackup with group permissions?  Doesn't 
PostgreSQL::Test::Utils::tempdir give already enough protection in
terms of umask() and permissions?

+   if (config.save_fpi_path != NULL)
+   {
+   /* We fsync our output directory only; since these files are not part
+* of the production database we do not require the performance hit
+* that fsyncing every FPI would entail, so are doing this as a
+* compromise. */
+
+   fsync_fname(config.save_fpi_path, true);
+   }
Why is it necessary to flush anything at all, then?

+my $relation = $node->safe_psql('postgres',
+   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
dattablespace ELSE reltablespace END, pg_database.oid,
pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
relname = 'test_table' AND datname = current_database()}
Could you rewrite that to be on multiple lines?

+diag "using walfile: $walfile";
You should avoid the use of "diag", as this would cause extra output
noise with a simple make check.

+$node->safe_psql('postgres', "SELECT 
pg_drop_replication_slot('regress_pg_waldump_slot')")
That's not really necessary, the nodes are wiped out at the end of the
test. 

+$node->safe_psql('postgres', <

signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread David Christensen
Hi Bharath,

I can get one sent in tomorrow.

Thanks,

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread Bharath Rupireddy
On Thu, Nov 17, 2022 at 10:02 PM David Christensen
 wrote:
>
> On Wed, Nov 16, 2022 at 3:30 AM Bharath Rupireddy
>  wrote:
> > 1.
> > -if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
> > +if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
> > These changes are not related to this feature, hence renaming those
> > variables/function names must be dealt with separately. If required,
> > proposing another patch can be submitted to change filter_by_fpw to
> > filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().
>
> Not required; can revert the changes unrelated to this specific patch.
> (I'd written the original ones for it, so didn't really think anything
> of it... :-))
>
> > 2.
> > +/* We fsync our output directory only; since these files are not 
> > part
> > + * of the production database we do not require the performance hit
> > + * that fsyncing every FPI would entail, so are doing this as a
> > + * compromise. */
> > The commenting style doesn't match the standard that we follow
> > elsewhere in postgres, please refer to other multi-line comments.
>
> Will fix.
>
> > 3.
> > +fsync_fname(config.save_fpi_path, true);
> > +}
> > It looks like fsync_fname()/fsync() in general isn't recursive, in the
> > sense that it doesn't fsync the files under the directory, but the
> > directory only. So, the idea of directory fsync doesn't seem worth it.
> > We either 1) get rid of fsync entirely or 2) fsync all the files after
> > they are created and the directory at the end or 3) do option (2) with
> > --no-sync option similar to its friends. Since option (2) is a no go,
> > we can either choose option (1) or option (2). My vote at this point
> > is for option (1).
>
> Agree to remove.
>
> > 4.
> > +($walfile_name, $blocksize) = split '\|' =>
> > $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
> > current_setting('block_size')");
> > +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> > I think there's something wrong with this, no? pg_switch_wal() can, at
> > times, return end+1 of the prior segment (see below snippet) and I'm
> > not sure if such a case can happen here.
> >
> >  * The return value is either the end+1 address of the switch record,
> >  * or the end+1 address of the prior segment if we did not need to
> >  * write a switch record because we are already at segment start.
> >  */
> > XLogRecPtr
> > RequestXLogSwitch(bool mark_unimportant)
>
> I think this approach is pretty common to get the walfile name, no?
> While there might be an edge case here, since the rest of the test is
> a controlled environment I'm inclined to just not worry about it; this
> would require the changes prior to this to exactly fill a WAL segment
> which strikes me as extremely unlikely to the point of impossible in
> this specific scenario.
>
> > 5.
> > +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> > +ok(-f $walfile, "Got a WAL file");
> > Is this checking if the WAL file is present or not in PGDATA/pg_wal?
> > If yes, I think this isn't required as pg_switch_wal() ensures that
> > the WAL is written and flushed to disk.
>
> You are correct, probably another artifact of the earlier version.
> That said, not sure I see the harm in keeping it as a sanity-check.
>
> > 6.
> > +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> > Isn't "pgdata" hardcoded here? I think you might need to do the following:
> > $node->data_dir . '/pg_wal/' . $walfile_name;;
>
> Can fix.
>
> > 7.
> > +# save filename for later verification
> > +$files{$file}++;
> >
> > +# validate that we ended up with some FPIs saved
> > +ok(keys %files > 0, 'verify we processed some files');
> > Why do we need to store filenames in an array when we later just check
> > the size of the array? Can't we use a boolean (file_found) or an int
> > variable (file_count) to verify that we found the file.
>
> Another artifact; we were comparing the files output between two
> separate lists of arbitrary numbers of pages being written out and
> verifying the raw/fixup versions had the same lists.
>
> > 8.
> > +$node->safe_psql('postgres', < > +SELECT 'init' FROM
> > pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
> > false);
> > +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> > +CHECKPOINT; -- required to force FPI for next writes
> > +UPDATE test_table SET a = a + 1;
> > +EOF
> > The EOF with append_conf() is being used in 4 files and elsewhere in
> > the TAP test files (more than 100?) qq[] or quotes is being used. I
> > have no strong opinion here, I'll leave it to the other reviewers or
> > committer.
>
> I'm inclined to leave it just for (personal) readability, but can
> change if there's a strong consensus against.
>
> > > > 11.
> > > > +# verify filename formats matches w/--save-fpi
> > > > +for my $fullpath (glob "$tmp_folder/raw/*")
> > > > Do we 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-17 Thread David Christensen
On Wed, Nov 16, 2022 at 3:30 AM Bharath Rupireddy
 wrote:
> 1.
> -if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
> +if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
> These changes are not related to this feature, hence renaming those
> variables/function names must be dealt with separately. If required,
> proposing another patch can be submitted to change filter_by_fpw to
> filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

Not required; can revert the changes unrelated to this specific patch.
(I'd written the original ones for it, so didn't really think anything
of it... :-))

> 2.
> +/* We fsync our output directory only; since these files are not part
> + * of the production database we do not require the performance hit
> + * that fsyncing every FPI would entail, so are doing this as a
> + * compromise. */
> The commenting style doesn't match the standard that we follow
> elsewhere in postgres, please refer to other multi-line comments.

Will fix.

> 3.
> +fsync_fname(config.save_fpi_path, true);
> +}
> It looks like fsync_fname()/fsync() in general isn't recursive, in the
> sense that it doesn't fsync the files under the directory, but the
> directory only. So, the idea of directory fsync doesn't seem worth it.
> We either 1) get rid of fsync entirely or 2) fsync all the files after
> they are created and the directory at the end or 3) do option (2) with
> --no-sync option similar to its friends. Since option (2) is a no go,
> we can either choose option (1) or option (2). My vote at this point
> is for option (1).

Agree to remove.

> 4.
> +($walfile_name, $blocksize) = split '\|' =>
> $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
> current_setting('block_size')");
> +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> I think there's something wrong with this, no? pg_switch_wal() can, at
> times, return end+1 of the prior segment (see below snippet) and I'm
> not sure if such a case can happen here.
>
>  * The return value is either the end+1 address of the switch record,
>  * or the end+1 address of the prior segment if we did not need to
>  * write a switch record because we are already at segment start.
>  */
> XLogRecPtr
> RequestXLogSwitch(bool mark_unimportant)

I think this approach is pretty common to get the walfile name, no?
While there might be an edge case here, since the rest of the test is
a controlled environment I'm inclined to just not worry about it; this
would require the changes prior to this to exactly fill a WAL segment
which strikes me as extremely unlikely to the point of impossible in
this specific scenario.

> 5.
> +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> +ok(-f $walfile, "Got a WAL file");
> Is this checking if the WAL file is present or not in PGDATA/pg_wal?
> If yes, I think this isn't required as pg_switch_wal() ensures that
> the WAL is written and flushed to disk.

You are correct, probably another artifact of the earlier version.
That said, not sure I see the harm in keeping it as a sanity-check.

> 6.
> +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
> Isn't "pgdata" hardcoded here? I think you might need to do the following:
> $node->data_dir . '/pg_wal/' . $walfile_name;;

Can fix.

> 7.
> +# save filename for later verification
> +$files{$file}++;
>
> +# validate that we ended up with some FPIs saved
> +ok(keys %files > 0, 'verify we processed some files');
> Why do we need to store filenames in an array when we later just check
> the size of the array? Can't we use a boolean (file_found) or an int
> variable (file_count) to verify that we found the file.

Another artifact; we were comparing the files output between two
separate lists of arbitrary numbers of pages being written out and
verifying the raw/fixup versions had the same lists.

> 8.
> +$node->safe_psql('postgres', < +SELECT 'init' FROM
> pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
> false);
> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> +CHECKPOINT; -- required to force FPI for next writes
> +UPDATE test_table SET a = a + 1;
> +EOF
> The EOF with append_conf() is being used in 4 files and elsewhere in
> the TAP test files (more than 100?) qq[] or quotes is being used. I
> have no strong opinion here, I'll leave it to the other reviewers or
> committer.

I'm inclined to leave it just for (personal) readability, but can
change if there's a strong consensus against.

> > > 11.
> > > +# verify filename formats matches w/--save-fpi
> > > +for my $fullpath (glob "$tmp_folder/raw/*")
> > > Do we need to look for the exact match of the file that gets created
> > > in the save-fpi path? While checking for this is great, it makes the
> > > test code non-portable (may not work on Windows or other platforms,
> > > no?) and complex? This way, you can get rid of 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-16 Thread Bharath Rupireddy
On Wed, Nov 16, 2022 at 1:20 AM David Christensen
 wrote:
>
> Enclosed is v9.
>
> - code style consistency (FPI instead of FPW) internally.
> - cleanup of no-longer needed checksum-related pieces from code and tests.
> - test cleanup/simplification.
> - other comment cleanup.
>
> Passes all CI checks.

Thanks for the updated patch.

1.
-if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

2.
+/* We fsync our output directory only; since these files are not part
+ * of the production database we do not require the performance hit
+ * that fsyncing every FPI would entail, so are doing this as a
+ * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.

3.
+fsync_fname(config.save_fpi_path, true);
+}
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).

4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

 * The return value is either the end+1 address of the switch record,
 * or the end+1 address of the prior segment if we did not need to
 * write a switch record because we are already at segment start.
 */
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

7.
+# save filename for later verification
+$files{$file}++;

+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.

8.
+$node->safe_psql('postgres', < > 11.
> > +# verify filename formats matches w/--save-fpi
> > +for my $fullpath (glob "$tmp_folder/raw/*")
> > Do we need to look for the exact match of the file that gets created
> > in the save-fpi path? While checking for this is great, it makes the
> > test code non-portable (may not work on Windows or other platforms,
> > no?) and complex? This way, you can get rid of get_block_info() as
> > well? And +for my $fullpath (glob "$tmp_folder/raw/*")
> > will also  get simplified.
> >
> > I think you can further simplify the tests by:
> > create the node
> > generate an FPI
> > call pg_waldump with save-fpi option
> > check the target directory for a file that contains the relid,
> > something like '%relid%'.
> >
> > The above would still serve the purpose, tests the code without much 
> > complexity.
>
> I disagree; I think there is utility in keeping the validation of the
> expected output.  Since we have the code that works for it (and does
> work on Windows, per passing the CI tests) I'm not seeing why we
> wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_block_info(), and in the loop
for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
need to aim for file format checks, it's enough to look for the
written file with '%relid%' by pg_waldump, if needed, the contents of
the files written/FPI can also be verified with, say, pg_checksum
tool. Others may have different opinions though.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread David Christensen
Enclosed is v9.

- code style consistency (FPI instead of FPW) internally.
- cleanup of no-longer needed checksum-related pieces from code and tests.
- test cleanup/simplification.
- other comment cleanup.

Passes all CI checks.

Best,

David


v9-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread David Christensen
On Tue, Nov 15, 2022 at 4:41 AM Bharath Rupireddy
 wrote:
>
> On Tue, Nov 15, 2022 at 1:29 AM David Christensen
>  wrote:
> >
> > Enclosed is v8, which uses the replication slot method to retain WAL
> > as well as fsync'ing the output directory when everything is done.
>
> Thanks. It mostly is in good shape. However, few more comments:
>
> 1.
> +if it does not exist.  The images saved will be subject to the same
> +filtering and limiting criteria as display records, but in this
> +mode pg_waldump will not output any other
> +information.
> May I know what's the intention of the statement 'The images saved
> '? If it's not necessary and convey anything useful to the user,
> can we remove it?

Basically I mean if you're limiting to a specific relation or rmgr
type, etc, it only saves those FPIs.  (So filtering is applied first
before considering whether to save the FPI or not.)

> 2.
> +#include "storage/checksum.h"
> +#include "storage/checksum_impl.h"
> I think we don't need the above includes as we got rid of verifying
> page checksums. The patch compiles without them for me.

Good catch.

> 3.
> +char   *save_fpw_path;
> Can we rename the above variable to save_fpi_path, just to be in sync
> with what we expose to the user, the option name 'save-fpi'?

Sure.

> 4.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Fsync our output directory */
> +fsync_fname(config.save_fpw_path, true);
> +}
> I guess adding a comment there as to why we aren't fsyncing for every
> file that gets created, but once per the directory at the end. That'd
> help clarify doubts that other members might get while looking at the
> code.

Can do.

> 5.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Fsync our output directory */
> +fsync_fname(config.save_fpw_path, true);
> +}
> So, are we sure that we don't want to fsync for time_to_stop exit(0)
> cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
> meaning exiting with return code 0, shouldn't we fsync the directory?

We can. Like I've said before, since these aren't production parts of
the cluster I don't personally have much of an opinion if fsync() is
appropriate at all, so don't have strong feelings here.

> 6.
> +else if (config.save_fpw_path)
> Let's use the same convention to check non-NULLness,
> config.save_fpw_path != NULL.

Good catch.

> 7.
> +CHECKPOINT;
> +SELECT pg_switch_wal();
> +UPDATE test_table SET a = a + 1;
> +SELECT pg_switch_wal();
> I don't think switching WAL after checkpoint is necessary here,
> because the checkpoint ensures all the WAL gets flushed to disk.
> Please remove it.

The point is to ensure we have a clean WAL segment that we know will
contain the relation we are filtering by.  Will test if this still
holds without the extra pg_switch_wal(), but that's the rationale.

> PS: I've seen the following code:
> +my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
> want the second WAL file, which will be a complete WAL file with
> full-page writes for our specific relation.

I don't understand the question.

> 8.
> +$node->safe_psql('postgres', < +EOF
> Why EOF is used here? Can't we do something like below to execute
> multiple statements?
> $node->safe_psql(
> 'postgres', qq[
> SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> ]);
>
> Same here:
> +$node->safe_psql('postgres', < +SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
> +EOQ

As a long-time perl programmer, heredocs seem more natural and easier
to read rather than a string that accomplishes the same function.  If
there is an established project style I'll stick with it, but it just
rolled out that way. :-)

> 9.
> +my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
> want the second WAL file, which will be a complete WAL file with
> full-page writes for our specific relation.
> Is it guaranteed that just looking at the second WAL file in the
> pg_wal directory assures WAL file with FPIs? I think we have to save
> the WAL file that contains FPIs, that is the file after, CHECKPOINT,
> UPDATE and pg_switch_wal. I think you can store output LSN of
> pg_switch_wal

Yeah, I could look at that approach; originally this test was doing a
lot more, so this is sort of residual from that original
implementation.  For a single file, this would probably be an
acceptable route.

> 10.
> +$node->safe_psql('postgres', < +SELECT 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread Bharath Rupireddy
On Tue, Nov 15, 2022 at 1:29 AM David Christensen
 wrote:
>
> Enclosed is v8, which uses the replication slot method to retain WAL
> as well as fsync'ing the output directory when everything is done.

Thanks. It mostly is in good shape. However, few more comments:

1.
+if it does not exist.  The images saved will be subject to the same
+filtering and limiting criteria as display records, but in this
+mode pg_waldump will not output any other
+information.
May I know what's the intention of the statement 'The images saved
'? If it's not necessary and convey anything useful to the user,
can we remove it?

2.
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
I think we don't need the above includes as we got rid of verifying
page checksums. The patch compiles without them for me.

3.
+char   *save_fpw_path;
Can we rename the above variable to save_fpi_path, just to be in sync
with what we expose to the user, the option name 'save-fpi'?

4.
+if (config.save_fpw_path != NULL)
+{
+/* Fsync our output directory */
+fsync_fname(config.save_fpw_path, true);
+}
I guess adding a comment there as to why we aren't fsyncing for every
file that gets created, but once per the directory at the end. That'd
help clarify doubts that other members might get while looking at the
code.

5.
+if (config.save_fpw_path != NULL)
+{
+/* Fsync our output directory */
+fsync_fname(config.save_fpw_path, true);
+}
So, are we sure that we don't want to fsync for time_to_stop exit(0)
cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
meaning exiting with return code 0, shouldn't we fsync the directory?

6.
+else if (config.save_fpw_path)
Let's use the same convention to check non-NULLness,
config.save_fpw_path != NULL.

7.
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
I don't think switching WAL after checkpoint is necessary here,
because the checkpoint ensures all the WAL gets flushed to disk.
Please remove it.

PS: I've seen the following code:
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.

8.
+$node->safe_psql('postgres', safe_psql('postgres', < $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.
Is it guaranteed that just looking at the second WAL file in the
pg_wal directory assures WAL file with FPIs? I think we have to save
the WAL file that contains FPIs, that is the file after, CHECKPOINT,
UPDATE and pg_switch_wal. I think you can store output LSN of
pg_switch_wal

10.
+$node->safe_psql('postgres', 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-14 Thread David Christensen
Enclosed is v8, which uses the replication slot method to retain WAL
as well as fsync'ing the output directory when everything is done.


v8-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-14 Thread David Christensen
On Fri, Nov 11, 2022 at 4:57 AM Bharath Rupireddy
 wrote:
> > Well if it's not the same output then I guess you're right and there's
> > not a use for the `--fixup` mode.  By the same token, I'd say
> > calculating/setting the checksum also wouldn't need to be done, we
> > should just include the page as included in the WAL stream.
>
> Let's hear from others, we may be missing something here. I recommend
> keeping the --fixup patch as 0002, in case if we decide to discard
> it's easier, however I'll leave that to you.

I've whacked in `git` for now; I can resurrect if people consider it useful.

> > > > > 5.
> > > > > +if (!RestoreBlockImage(record, block_id, page))
> > > > > +continue;
> > > > > +
> > > > > +/* we have our extracted FPI, let's save it now */
> > > > > After extracting the page from the WAL record, do we need to perform a
> > > > > checksum on it?
> > >
> > > I think you just need to do the following, this will ensure the
> > > authenticity of the page that pg_waldump returns.
> > > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, 
> > > blk))
> > > {
> > > pg_fatal("page checksum failed");
> > > }
> >
> > The WAL already has a checksum, so not certain this makes sense on its
> > own. Also I'm inclined to make it a warning if it doesn't match rather
> > than a fatal. (I'd also have to verify that the checksum is properly
> > set on the page prior to copying the FPI into WAL, which I'm pretty
> > sure it is but not certain.)
>
> How about having it as an Assert()?

Based on empirical testing, the checksums don't match, so
asserting/alerting on each block extracted seems next to useless, so
going to just remove that.

> > > 5.
> > > +fsync(fileno(OPF));
> > > +fclose(OPF);
> > > I think just the fsync() isn't enough, you still need fsync_fname()
> > > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
> > > <= XLogRecMaxBlockId(record); block_id++) loop.
> >
> > I'm not sure I get the value of the fsyncs here; if you are using this
> > tool at this capacity you're by definition doing some sort of
> > transient investigative steps.  Since the WAL was fsync'd, you could
> > always rerun/recreate as needed in the unlikely event of an OS crash
> > in the middle of this investigation.  Since this is outside the
> > purview of the database operations proper (unlike, say, initdb) seems
> > like it's unnecessary (or definitely shouldn't need to be selectable).
> > My thoughts are that if we're going to fsync, just do the fsyncs
> > unconditionally rather than complicate the interface further.
>
> -1 for fysnc() per file created as it can create a lot of sync load on
> production servers impacting performance. How about just syncing the
> directory at the end assuming it doesn't cost as much as fsync() per
> FPI file created would?

I can fsync the dir if that's a useful compromise.

> > > 4.
> > > +$primary->append_conf('postgresql.conf', "wal_level='replica'");
> > > +$primary->append_conf('postgresql.conf', 'archive_mode=on');
> > > +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
> > > Why do you need to set wal_level to replica, out of the box your
> > > cluster comes with replica only no?
> > > And why do you need archive_mode on and set the command to do nothing?
> > > Why archiving is needed for testing your feature firstly?
> >
> > I think it had shown "minimal" in my testing; I was purposefully
> > failing archives so the WAL would stick around.  Maybe a custom
> > archive command that just copied a single WAL file into a known
> > location so I could use that instead of the current approach would
> > work, though not sure how Windows support would work with that.  Open
> > to other ideas to more cleanly get a single WAL file that isn't the
> > last one.  (Earlier versions of this test were using /all/ of the
> > generated WAL files rather than a single one, so maybe I am
> > overcomplicating things for a single WAL file case.)
>
> Typically we create a physical replication slot at the beginning so
> that the server keeps the WAL required for you in pg_wal itself, for
> instance, please see pg_walinspect:
>
> -- Make sure checkpoints don't interfere with the test.
> SELECT 'init' FROM
> pg_create_physical_replication_slot('regress_pg_walinspect_slot',
> true, false);

Will see if I can get something like this to work; I'm currently
stopping the server before running the file-based tests, but I suppose
there's no reason to do so, so a temporary slot that holds it around
until the test is complete is probably fine.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread David Christensen
On Fri, Nov 11, 2022 at 8:15 AM Justin Pryzby  wrote:
>
> On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:
> > On Wed, Nov 9, 2022 at 2:08 PM David Christensen 
> >  wrote:
> > > Justin sez:
> > > > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > > > majority of TAP tests don't.
> > > >
> > > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> > > >1435 safe_psql('postgres',
> > > > 335 safe_psql(
> > > >  23 safe_psql($connect_db,
> > >
> > > If there was a reason, I don't recall offhand; I will test removing it
> > > and if things still work will consider it good enough.
> >
> > Things blew up when I did that; rather than hunt it down, I just left it 
> > in. :-)
>
> > +$primary->safe_psql('db1',  <
> It worked for me when I removed the 3 references to db1.
> That's good for efficiency of the test.

I did figure that out later; fixed in git.

> > +my $blocksize = 8192;
>
> I think this should be just "my $blocksize;" rather than setting a value
> which is later overwriten.

Yep.  Fixed in git.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread Justin Pryzby
On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:
> On Wed, Nov 9, 2022 at 2:08 PM David Christensen 
>  wrote:
> > Justin sez:
> > > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > > majority of TAP tests don't.
> > >
> > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> > >1435 safe_psql('postgres',
> > > 335 safe_psql(
> > >  23 safe_psql($connect_db,
> >
> > If there was a reason, I don't recall offhand; I will test removing it
> > and if things still work will consider it good enough.
> 
> Things blew up when I did that; rather than hunt it down, I just left it in. 
> :-)

> +$primary->safe_psql('db1',  < +my $blocksize = 8192;

I think this should be just "my $blocksize;" rather than setting a value
which is later overwriten.

-- 
Justin




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread Bharath Rupireddy
On Thu, Nov 10, 2022 at 9:52 PM David Christensen
 wrote:
>
> > > > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > > > pg_waldump is supposed to be just WAL reader, and must not return any
> > > > modified information, with --fixup-fpi option, the patch violates this
> > > > principle i.e. it sets page LSN and returns. Without actually
> > > > replaying the WAL record on the page, how is it correct to just set
> > > > the LSN? How will it be useful? ISTM, we must ignore this option
> > > > unless there's a strong use-case.
> > >
> > > How I was envisioning this was for cases like extreme surgery for
> > > corrupted pages, where you extract the page from WAL but it has lsn
> > > and checksum set so you could do something like `dd if=fixup-block
> > > of=relation ...`, so it *simulates* the replay of said fullpage blocks
> > > in cases where for some reason you can't play the intermediate
> > > records; since this is always a fullpage block, it's capturing what
> > > would be the snapshot so you could manually insert somewhere as needed
> > > without needing to replay (say if dealing with an incomplete or
> > > corrupted WAL stream).
> >
> > Recovery sets the page LSN after it replayed the WAL record on the
> > page right? Recovery does this - base_page/FPI +
> > apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
> > new_version_of_page. Essentially, in your patch, you are just setting
> > the WAL record LSN with the page contents being the base page's. I'm
> > still not sure what's the real use-case here. We don't have an
> > independent function in postgres, given a base page and a WAL record
> > that just replays the WAL record and output's the new version of the
> > page, so I think what you do in the patch with fixup option seems
> > wrong to me.
>
> Well if it's not the same output then I guess you're right and there's
> not a use for the `--fixup` mode.  By the same token, I'd say
> calculating/setting the checksum also wouldn't need to be done, we
> should just include the page as included in the WAL stream.

Let's hear from others, we may be missing something here. I recommend
keeping the --fixup patch as 0002, in case if we decide to discard
it's easier, however I'll leave that to you.

> > > > 5.
> > > > +if (!RestoreBlockImage(record, block_id, page))
> > > > +continue;
> > > > +
> > > > +/* we have our extracted FPI, let's save it now */
> > > > After extracting the page from the WAL record, do we need to perform a
> > > > checksum on it?
> >
> > I think you just need to do the following, this will ensure the
> > authenticity of the page that pg_waldump returns.
> > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
> > {
> > pg_fatal("page checksum failed");
> > }
>
> The WAL already has a checksum, so not certain this makes sense on its
> own. Also I'm inclined to make it a warning if it doesn't match rather
> than a fatal. (I'd also have to verify that the checksum is properly
> set on the page prior to copying the FPI into WAL, which I'm pretty
> sure it is but not certain.)

How about having it as an Assert()?

> > 5.
> > +fsync(fileno(OPF));
> > +fclose(OPF);
> > I think just the fsync() isn't enough, you still need fsync_fname()
> > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
> > <= XLogRecMaxBlockId(record); block_id++) loop.
>
> I'm not sure I get the value of the fsyncs here; if you are using this
> tool at this capacity you're by definition doing some sort of
> transient investigative steps.  Since the WAL was fsync'd, you could
> always rerun/recreate as needed in the unlikely event of an OS crash
> in the middle of this investigation.  Since this is outside the
> purview of the database operations proper (unlike, say, initdb) seems
> like it's unnecessary (or definitely shouldn't need to be selectable).
> My thoughts are that if we're going to fsync, just do the fsyncs
> unconditionally rather than complicate the interface further.

-1 for fysnc() per file created as it can create a lot of sync load on
production servers impacting performance. How about just syncing the
directory at the end assuming it doesn't cost as much as fsync() per
FPI file created would?

> > 4.
> > +$primary->append_conf('postgresql.conf', "wal_level='replica'");
> > +$primary->append_conf('postgresql.conf', 'archive_mode=on');
> > +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
> > Why do you need to set wal_level to replica, out of the box your
> > cluster comes with replica only no?
> > And why do you need archive_mode on and set the command to do nothing?
> > Why archiving is needed for testing your feature firstly?
>
> I think it had shown "minimal" in my testing; I was purposefully
> failing archives so the WAL would stick around.  Maybe a custom
> archive command that just copied a single WAL file into a known
> location so I could use that instead 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-10 Thread David Christensen
On Thu, Nov 10, 2022 at 2:33 AM Bharath Rupireddy
 wrote:
>
> On Thu, Nov 10, 2022 at 1:31 AM David Christensen
>  wrote:
> >
>
> Thanks for providing the v7 patch, please see my comments and responses below.

Hi Bharath, Thanks for the feedback.

> > > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > > pg_waldump is supposed to be just WAL reader, and must not return any
> > > modified information, with --fixup-fpi option, the patch violates this
> > > principle i.e. it sets page LSN and returns. Without actually
> > > replaying the WAL record on the page, how is it correct to just set
> > > the LSN? How will it be useful? ISTM, we must ignore this option
> > > unless there's a strong use-case.
> >
> > How I was envisioning this was for cases like extreme surgery for
> > corrupted pages, where you extract the page from WAL but it has lsn
> > and checksum set so you could do something like `dd if=fixup-block
> > of=relation ...`, so it *simulates* the replay of said fullpage blocks
> > in cases where for some reason you can't play the intermediate
> > records; since this is always a fullpage block, it's capturing what
> > would be the snapshot so you could manually insert somewhere as needed
> > without needing to replay (say if dealing with an incomplete or
> > corrupted WAL stream).
>
> Recovery sets the page LSN after it replayed the WAL record on the
> page right? Recovery does this - base_page/FPI +
> apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
> new_version_of_page. Essentially, in your patch, you are just setting
> the WAL record LSN with the page contents being the base page's. I'm
> still not sure what's the real use-case here. We don't have an
> independent function in postgres, given a base page and a WAL record
> that just replays the WAL record and output's the new version of the
> page, so I think what you do in the patch with fixup option seems
> wrong to me.

Well if it's not the same output then I guess you're right and there's
not a use for the `--fixup` mode.  By the same token, I'd say
calculating/setting the checksum also wouldn't need to be done, we
should just include the page as included in the WAL stream.

> > > 5.
> > > +if (!RestoreBlockImage(record, block_id, page))
> > > +continue;
> > > +
> > > +/* we have our extracted FPI, let's save it now */
> > > After extracting the page from the WAL record, do we need to perform a
> > > checksum on it?
>
> I think you just need to do the following, this will ensure the
> authenticity of the page that pg_waldump returns.
> if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
> {
> pg_fatal("page checksum failed");
> }

The WAL already has a checksum, so not certain this makes sense on its
own. Also I'm inclined to make it a warning if it doesn't match rather
than a fatal. (I'd also have to verify that the checksum is properly
set on the page prior to copying the FPI into WAL, which I'm pretty
sure it is but not certain.)

> > > case 'W':
> > > config.save_fpw_path = pg_strdup(optarg);
> > > case 'X':
> > >config.fixup_fpw = true;
> > >config.save_fpw_path = pg_strdup(optarg);
> >
> > Like separate opt processing with their own `break` statement?
> > Probably a bit more readable/conventional.
>
> Yes.

Moot with the removal of the --fixup mode.

> Some more comments:
>
> 1.
> +PGAlignedBlockzerobuf;
> Essentially, it's not a zero buffer, please rename the variable to
> something like 'buf' or 'page_buf' or someother?

Sure.

> 2.
> +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
> Replace pg_pwrite with fwrite() and avoid fileno() system calls that
> should suffice here, AFICS, we don't need pg_pwrite.

Sure.

> 3.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Create the dir if it doesn't exist */
> +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
> I think  you still need pg_check_dir() here, how about something like below?

I was assuming pg_mkdir_p() acted just like mkdir -p, where it's just
an idempotent action, so an existing dir is just treated the same.
What's the benefit here? Would assume if a non-dir file existed at
that path or other permissions issues arose we'd just get an error
from pg_mkdir_p().  (Will review the code there and confirm.)

> 4.
> +/* Create the dir if it doesn't exist */
> +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
> +{
> +pg_log_error("could not create output directory \"%s\": %m",
> + config.save_fpw_path);
> +goto bad_argument;
> Why is the directory creation error a bad_argument? I think you need
> just pg_fatal() here.

Sure. Was just following the other patterns I'd seen for argument handling.

> 5.
> +fsync(fileno(OPF));
> +fclose(OPF);
> I think just the fsync() isn't enough, you still need fsync_fname()
> and/or 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-10 Thread Bharath Rupireddy
On Thu, Nov 10, 2022 at 1:31 AM David Christensen
 wrote:
>

Thanks for providing the v7 patch, please see my comments and responses below.

> > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > pg_waldump is supposed to be just WAL reader, and must not return any
> > modified information, with --fixup-fpi option, the patch violates this
> > principle i.e. it sets page LSN and returns. Without actually
> > replaying the WAL record on the page, how is it correct to just set
> > the LSN? How will it be useful? ISTM, we must ignore this option
> > unless there's a strong use-case.
>
> How I was envisioning this was for cases like extreme surgery for
> corrupted pages, where you extract the page from WAL but it has lsn
> and checksum set so you could do something like `dd if=fixup-block
> of=relation ...`, so it *simulates* the replay of said fullpage blocks
> in cases where for some reason you can't play the intermediate
> records; since this is always a fullpage block, it's capturing what
> would be the snapshot so you could manually insert somewhere as needed
> without needing to replay (say if dealing with an incomplete or
> corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

> > 5.
> > +if (!RestoreBlockImage(record, block_id, page))
> > +continue;
> > +
> > +/* we have our extracted FPI, let's save it now */
> > After extracting the page from the WAL record, do we need to perform a
> > checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

> > case 'W':
> > config.save_fpw_path = pg_strdup(optarg);
> > case 'X':
> >config.fixup_fpw = true;
> >config.save_fpw_path = pg_strdup(optarg);
>
> Like separate opt processing with their own `break` statement?
> Probably a bit more readable/conventional.

Yes.

Some more comments:

1.
+PGAlignedBlockzerobuf;
Essentially, it's not a zero buffer, please rename the variable to
something like 'buf' or 'page_buf' or someother?

2.
+if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Replace pg_pwrite with fwrite() and avoid fileno() system calls that
should suffice here, AFICS, we don't need pg_pwrite.

3.
+if (config.save_fpw_path != NULL)
+{
+/* Create the dir if it doesn't exist */
+if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
I think  you still need pg_check_dir() here, how about something like below?

if (pg_check_dir(config.save_fpw_path) == 0)
{
if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
{
  /* error */
}
}

4.
+/* Create the dir if it doesn't exist */
+if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+{
+pg_log_error("could not create output directory \"%s\": %m",
+ config.save_fpw_path);
+goto bad_argument;
Why is the directory creation error a bad_argument? I think you need
just pg_fatal() here.

5.
+fsync(fileno(OPF));
+fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

6. Speaking of which, do we need to do fsync()'s optionally? If we
were to write many such FPI files, aren't there going to be more
fsync() calls and imagine this feature being used on a production
server and a lot of fsync() will definitely make running server
fsync() ops slower.  I think we need a new option whether pg_waldump
ever do fsync() or not, something similar to --no-sync of
pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would
like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can
make use of it.

7.
+pg_fatal("couldn't write out complete fullpage image to
file: %s", filename);
We typically use "full page image" in the output strings, please correct.

8.
+
+if (((PageHeader) page)->pd_checksum)
+((PageHeader) page)->pd_checksum =
pg_checksum_page((char *) page, blk);
Why do you need to set the page's checksum by yourself? I don't think
this is the right way, pg_waldump should just return what it sees in
the WAL record, of course, after verifying a few checks (like 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen
On Wed, Nov 9, 2022 at 2:08 PM David Christensen
 wrote:
> Justin sez:
> > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > majority of TAP tests don't.
> >
> > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> >1435 safe_psql('postgres',
> > 335 safe_psql(
> >  23 safe_psql($connect_db,
>
> If there was a reason, I don't recall offhand; I will test removing it
> and if things still work will consider it good enough.

Things blew up when I did that; rather than hunt it down, I just left it in. :-)

Enclosed is v7, with changes thus suggested thus far.


v7-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen
> > 6.
> > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> > Use pg_dir_create_mode instead of hard-coded 0007?
>
> I think I thought of that when I first looked at the patch ...  but, I'm
> not sure, since it says:
>
> src/include/common/file_perm.h-/* Modes for creating directories and files IN 
> THE DATA DIRECTORY */
> src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

Looks like it's pretty evenly split in src/bin:

$ git grep -o -E -w '(pg_mkdir_p|mkdir)' '**.c' | sort | uniq -c
  3 initdb/initdb.c:mkdir
  3 initdb/initdb.c:pg_mkdir_p
  1 pg_basebackup/bbstreamer_file.c:mkdir
  2 pg_basebackup/pg_basebackup.c:pg_mkdir_p
  1 pg_dump/pg_backup_directory.c:mkdir
  1 pg_rewind/file_ops.c:mkdir
  4 pg_upgrade/pg_upgrade.c:mkdir

So if that is the preferred approach I'll go ahead and use it.

> I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> majority of TAP tests don't.
>
> $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
>1435 safe_psql('postgres',
> 335 safe_psql(
>  23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen
On Wed, Nov 9, 2022 at 6:30 AM Bharath Rupireddy
 wrote:
>
> On Wed, Nov 9, 2022 at 5:08 AM David Christensen
>  wrote:
> >
> > Enclosed is v6, which squashes your refactor and adds the additional
> > recent suggestions.
>
> Thanks for working on this feature. Here are some comments for now. I
> haven't looked at the tests, I will take another look at the code and
> tests after these and all other comments are addressed.
>
> 1. For ease of review, please split the test patch to 0002.

Per later discussion seems like new feature tests are fine in the same
patch, yes?

> 2. I'm unable to understand the use-case for --fixup-fpi option.
> pg_waldump is supposed to be just WAL reader, and must not return any
> modified information, with --fixup-fpi option, the patch violates this
> principle i.e. it sets page LSN and returns. Without actually
> replaying the WAL record on the page, how is it correct to just set
> the LSN? How will it be useful? ISTM, we must ignore this option
> unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

> 3.
> +if (fork >= 0 && fork <= MAX_FORKNUM)
> +{
> +if (fork)
> +sprintf(forkname, "_%s", forkNames[fork]);
> +else
> +forkname[0] = 0;
> +}
> +else
> +pg_fatal("found invalid fork number: %u", fork);
>
> Isn't the above complex? What's the problem with something like below?
> Why do we need if (fork) - else block?
>
> if (fork >= 0  && fork <= MAX_FORKNUM)
> sprintf(forkname, "_%s", forkNames[fork]);
> else
> pg_fatal("found invalid fork number: %u", fork);

This was to suppress any suffix for main forks, but yes, could
simplify and include the `_` in the suffix name.  Will include such a
change.

> 3.
> +charpage[BLCKSZ] = {0};
> I think when writing to a file, we need PGAlignedBlock rather than a
> simple char array of bytes, see the description around PGAlignedBlock
> for why it is so.

Easy enough change, and makes sense.

> 4.
> +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
> Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
> avoid fileno() system calls, no? Do you need the file position to
> remain the same after writing, hence pg_pwrite()?

I don't recall the original motivation, TBH.

> 5.
> +if (!RestoreBlockImage(record, block_id, page))
> +continue;
> +
> +/* we have our extracted FPI, let's save it now */
> After extracting the page from the WAL record, do we need to perform a
> checksum on it?

That is there in fixup mode (or should be).  Are you thinking this
should also be set if not in fixup mode?  That defeats the purpose of
the raw page extract, which is to see *exactly* what the WAL stream
has.

> 6.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Use pg_dir_create_mode instead of hard-coded 0007?

Sure.

> 7.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> +fsync(fileno(OPF));
> +fclose(OPF);
> Since you're creating the directory in case it's not available, you
> need to fsync the directory too?

Sure.

> 8.
> +case 'W':
> +case 'X':
> +config.fixup_fpw = (option == 'X');
> +config.save_fpw_path = pg_strdup(optarg);
> +break;
> Just set  config.fixup_fpw = false before the switch block starts,
> like the other variables, and then perhaps doing like below is more
> readable:
> case 'W':
> config.save_fpw_path = pg_strdup(optarg);
> case 'X':
>config.fixup_fpw = true;
>config.save_fpw_path = pg_strdup(optarg);

Like separate opt processing with their own `break` statement?
Probably a bit more readable/conventional.

> 9.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Should we use pg_mkdir_p() instead of mkdir()?

Sure.

Thanks,

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Alvaro Herrera
On 2022-Nov-09, Justin Pryzby wrote:

> On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:

> > 1. For ease of review, please split the test patch to 0002.
> 
> This is just my opinion, but .. why ?  Since it's easy to
> filter/skip/display a file, I don't think it's usually useful to have
> separate patches for tests or docs.

I concur with Justin.  When a patch is bugfix and a test is added that
verifies it, I like to keep the test in a separate commit (for submit
purposes and in my personal repo -- not for the official push!) so that
I can git-checkout to just the test and make sure it fails ahead of
pushing the fix commit.  But for a new feature, there's no reason to do
so.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Justin Pryzby
On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:
> On Wed, Nov 9, 2022 at 5:08 AM David Christensen 
>  wrote:
> >
> > Enclosed is v6, which squashes your refactor and adds the additional
> > recent suggestions.
> 
> Thanks for working on this feature. Here are some comments for now. I
> haven't looked at the tests, I will take another look at the code and
> tests after these and all other comments are addressed.
> 
> 1. For ease of review, please split the test patch to 0002.

This is just my opinion, but .. why ?  Since it's easy to
filter/skip/display a file, I don't think it's usually useful to have
separate patches for tests or docs.

> 6.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Use pg_dir_create_mode instead of hard-coded 0007?

I think I thought of that when I first looked at the patch ...  but, I'm
not sure, since it says:

src/include/common/file_perm.h-/* Modes for creating directories and files IN 
THE DATA DIRECTORY */
src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

I was wondering if there's any reason to do "CREATE DATABASE".  The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
   1435 safe_psql('postgres',
335 safe_psql(
 23 safe_psql($connect_db,

-- 
Justin




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Bharath Rupireddy
On Wed, Nov 9, 2022 at 5:08 AM David Christensen
 wrote:
>
> Enclosed is v6, which squashes your refactor and adds the additional
> recent suggestions.

Thanks for working on this feature. Here are some comments for now. I
haven't looked at the tests, I will take another look at the code and
tests after these and all other comments are addressed.

1. For ease of review, please split the test patch to 0002.

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.
3.
+if (fork >= 0 && fork <= MAX_FORKNUM)
+{
+if (fork)
+sprintf(forkname, "_%s", forkNames[fork]);
+else
+forkname[0] = 0;
+}
+else
+pg_fatal("found invalid fork number: %u", fork);

Isn't the above complex? What's the problem with something like below?
Why do we need if (fork) - else block?

if (fork >= 0  && fork <= MAX_FORKNUM)
sprintf(forkname, "_%s", forkNames[fork]);
else
pg_fatal("found invalid fork number: %u", fork);

3.
+charpage[BLCKSZ] = {0};
I think when writing to a file, we need PGAlignedBlock rather than a
simple char array of bytes, see the description around PGAlignedBlock
for why it is so.

4.
+if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
avoid fileno() system calls, no? Do you need the file position to
remain the same after writing, hence pg_pwrite()?

5.
+if (!RestoreBlockImage(record, block_id, page))
+continue;
+
+/* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

6.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

7.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
+fsync(fileno(OPF));
+fclose(OPF);
Since you're creating the directory in case it's not available, you
need to fsync the directory too?

8.
+case 'W':
+case 'X':
+config.fixup_fpw = (option == 'X');
+config.save_fpw_path = pg_strdup(optarg);
+break;
Just set  config.fixup_fpw = false before the switch block starts,
like the other variables, and then perhaps doing like below is more
readable:
case 'W':
config.save_fpw_path = pg_strdup(optarg);
case 'X':
   config.fixup_fpw = true;
   config.save_fpw_path = pg_strdup(optarg);

9.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Should we use pg_mkdir_p() instead of mkdir()?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread sho kato
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hello,

I tested this patch on Linux and there is no problem.
Also, I reviewed this patch and commented below.

@@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record)
+   if (fork >= 0 && fork <= MAX_FORKNUM)
+   {
+   if (fork)
+   sprintf(forkname, "_%s", forkNames[fork]);
+   else
+   forkname[0] = 0;
+   }
+   else
+   pg_fatal("found invalid fork number: %u", fork);

Should we add the comment if the main fork is saved without "_main" suffix for 
code readability?

@@ -679,6 +788,9 @@ usage(void)
 " (default: 1 or the value used in 
STARTSEG)\n"));
printf(_("  -V, --version  output version information, then 
exit\n"));
printf(_("  -w, --fullpage only show records with a full page 
write\n"));
+   printf(_("  -W, --save-fpi=pathsave full page images to given path as\n"
+" LSN.T.D.R.B_F\n"));
+   printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to 
saved page\n"));
printf(_("  -x, --xid=XID  only show records with transaction ID 
XID\n"));
printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 " (optionally, show per-record 
statistics)\n"));

Since lower-case options are displayed at the top, should we switch the order 
of -x and -X?

@@ -972,6 +1093,25 @@ main(int argc, char **argv)
}
}

+   int dir_status = pg_check_dir(config.save_fpw_path);
+
+   if (dir_status < 0)
+   {
+   pg_log_error("could not access output directory: %s", 
config.save_fpw_path);
+   goto bad_argument;
+   }

Should we output %s enclosed with \"?

Regards,
Sho Kato

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread David Christensen
Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks!


v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread David Christensen
On Tue, Nov 8, 2022 at 4:45 PM Justin Pryzby  wrote:
>
> On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:
> > Hi Justin et al,
> >
> > Enclosed is v5 of this patch which now passes the CirrusCI checks for
> > all supported OSes. I went ahead and reworked the test a bit so it's a
> > little more amenable to the OS-agnostic approach for testing.
>
> Great, thanks.
>
> This includes the changes that I'd started a few months ago.
> Plus adding the test which was missing for meson.

Cool, will review, thanks.

> +format: 
> LSN.TSOID.DBOID.RELNODE.BLKNO
>
> I'd prefer if the abbreviations were "reltablespace" and "datoid"

Sure, no issues there.

> Also, should the test case call pg_relation_filenode() rather than using
> relfilenode directly ?  Is it a problem that the test code assumes
> pagesize=8192 ?

Both good points. Is pagesize just exposed via
`current_setting('block_size')` or is there a different approach?

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread Justin Pryzby
On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:
> Hi Justin et al,
> 
> Enclosed is v5 of this patch which now passes the CirrusCI checks for
> all supported OSes. I went ahead and reworked the test a bit so it's a
> little more amenable to the OS-agnostic approach for testing.

Great, thanks.

This includes the changes that I'd started a few months ago.
Plus adding the test which was missing for meson.

+format: 
LSN.TSOID.DBOID.RELNODE.BLKNO

I'd prefer if the abbreviations were "reltablespace" and "datoid"

Also, should the test case call pg_relation_filenode() rather than using
relfilenode directly ?  Is it a problem that the test code assumes
pagesize=8192 ?

-- 
Justin
>From d4cb23bc2f0edcc65b0bef75fd2e8d6e5aeda21f Mon Sep 17 00:00:00 2001
From: David Christensen 
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH 1/2] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename:  with
formatting to make things somewhat sortable; for instance:

-01C0.1663.1.6117.0
-01000150.1664.0.6115.0
-010001E0.1664.0.6114.0
-01000270.1663.1.6116.0
-01000300.1663.1.6113.0
-01000390.1663.1.6112.0
-01000420.1663.1.8903.0
-010004B0.1663.1.8902.0
-01000540.1663.1.6111.0
-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml  |  79 +++
 src/bin/pg_waldump/pg_waldump.c   | 151 +-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 137 
 3 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5f..cc3ce918905 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation

  
 
+ 
+   -W save_path
+   --save-fpi=save_path
+   -X save_path
+   --fixup-fpi=save_path
+   
+   
+Save full page images seen in the WAL stream to the
+save_path directory, which will be created
+if it does not exist.  The images saved will be subject to the same
+filtering and limiting criteria as display records, but in this
+mode pg_waldump will not output any other
+information.
+   
+   
+If invoked using
+the -X/--fixup-fpi
+option, this page image will include the pd_lsn of
+the replayed record rather than the raw page image; as well,
+the pd_checksum field will be updated if it had
+previously existed.
+   
+   
+The page images will be saved with the file
+format: LSN.TSOID.DBOID.RELNODE.BLKNOFORK
+
+The dot-separated components are (in order):
+
+
+ 
+  
+   
+Component
+Description
+   
+  
+
+  
+   
+LSN
+The LSN of the record with this block, formatted
+ as two 8-character hexadecimal numbers %08X-%08X
+   
+
+   
+TSOID
+tablespace OID for the block
+   
+
+   
+DBOID
+database OID for the block
+   
+
+   
+RELNODE
+relnode id for the block
+   
+
+   
+BLKNO
+the block number of this block
+   
+
+   
+FORK
+
+ if coming from the main fork, will be empty, otherwise will be
+ one of _fsm, _vm,
+ or _init.
+
+   
+  
+ 
+
+   
+   
+ 
+
  
   -x xid
   --xid=xid
diff --git 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-07 Thread David Christensen
Hi Justin et al,

Enclosed is v5 of this patch which now passes the CirrusCI checks for
all supported OSes. I went ahead and reworked the test a bit so it's a
little more amenable to the OS-agnostic approach for testing.

Best,

David


v5-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread David Christensen
On Fri, Nov 4, 2022 at 1:38 PM Justin Pryzby  wrote:
> > > As I recall, that's due to relying on "cp".  And "rsync", which
> > > shouldn't be assumed to exist by regression tests).

Will poke around other TAP tests to see if there's a more consistent
interface, what perl version we can assume and available modules, etc.
If there's not some trivial wrapper at this point so all TAP tests
could use it regardless of OS, it would definitely be good to
introduce such a method.

> > I will work on supporting the windows compatibility here. Is there some 
> > list of guidelines for what you can and can’t use? I don’t have a windows 
> > machine available to develop on.
>
> I think a lot (most?) developers here don't have a windows environment
> available, so now have been using cirrusci's tests to verify.  If you
> haven't used cirrusci directly (not via cfbot) before, start at:
> src/tools/ci/README

Thanks, good starting point.

> > Was it failing on windows? I was attempting to skip it as I recall.
>
> I don't see anything about skipping, and cirrus's logs from 2
> commitfests ago were pruned.  I looked at this patch earlier this year,
> but never got around to replacing the calls to rsync and cp.

Ah, it's skipped (not fixed) in my git repo, but never got around to
submitting that version through email.  That explains it.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread Justin Pryzby
On Fri, Nov 04, 2022 at 09:16:29AM -0500, David Christensen wrote:
> On Nov 4, 2022, at 9:02 AM, Justin Pryzby  wrote:
> > On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
> >> 2022年5月3日(火) 8:45 David Christensen :
> >>> 
> >>> ...and pushing a couple fixups pointed out by cfbot, so here's v4.
> >> 
> >> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> >> currently underway, this would be an excellent time to update the patch.
> > 
> > More important than needing to be rebased, the patch has never passed
> > its current tests on windows.
> > 
> > As I recall, that's due to relying on "cp".  And "rsync", which
> > shouldn't be assumed to exist by regression tests).
> 
> I will work on supporting the windows compatibility here. Is there some list 
> of guidelines for what you can and can’t use? I don’t have a windows machine 
> available to develop on. 

I think a lot (most?) developers here don't have a windows environment
available, so now have been using cirrusci's tests to verify.  If you
haven't used cirrusci directly (not via cfbot) before, start at:
src/tools/ci/README

There's not much assumed about the build environment, and not much more
assumed about the test environment.  Most of the portability is handled
by using C and perl.  I think there's even no assumption that "tar" is
available (except maybe for building releases).  This patch should avoid
relying on tools that aren't already required.

As a practical matter, cfbot needs to pass, not only to demonstrate that
the patch consistently passes tests, but also because if the patch were
merged while it failed tests in cfbot, it would cause every other patch
to start to fail, too.

> Was it failing on windows? I was attempting to skip it as I recall. 

I don't see anything about skipping, and cirrus's logs from 2
commitfests ago were pruned.  I looked at this patch earlier this year,
but never got around to replacing the calls to rsync and cp.

-- 
Justin




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread David Christensen
On Nov 4, 2022, at 9:02 AM, Justin Pryzby  wrote:
> 
> On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
>> 2022年5月3日(火) 8:45 David Christensen :
>>> 
>>> ...and pushing a couple fixups pointed out by cfbot, so here's v4.
>> 
>> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
>> currently underway, this would be an excellent time to update the patch.
> 
> More important than needing to be rebased, the patch has never passed
> its current tests on windows.
> 
> As I recall, that's due to relying on "cp".  And "rsync", which
> shouldn't be assumed to exist by regression tests).

I will work on supporting the windows compatibility here. Is there some list of 
guidelines for what you can and can’t use? I don’t have a windows machine 
available to develop on. 

Was it failing on windows? I was attempting to skip it as I recall. 

Best,

David





Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread Justin Pryzby
On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
> 2022年5月3日(火) 8:45 David Christensen :
> >
> > ...and pushing a couple fixups pointed out by cfbot, so here's v4.
> 
> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.

More important than needing to be rebased, the patch has never passed
its current tests on windows.

As I recall, that's due to relying on "cp".  And "rsync", which
shouldn't be assumed to exist by regression tests).

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3628




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-03 Thread Ian Lawrence Barwick
2022年5月3日(火) 8:45 David Christensen :
>
> ...and pushing a couple fixups pointed out by cfbot, so here's v4.

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1] http://cfbot.cputube.org/patch_40_3628.log

Thanks

Ian Barwick




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-06-14 Thread Michael Paquier
On Tue, May 03, 2022 at 10:34:41AM +0200, Alvaro Herrera wrote:
> I remember Greg Mullane posted a tool that attempts to correct page CRC
> mismatches[1].  This new tool might be useful to feed healing attempts,
> too.  (It's of course not in any way a *solution*, because the page
> might have been modified by other WAL records since the last FPI, but it
> could be a start towards building a solution that scoops page contents
> from WAL.)
> 
> [1] https://github.com/turnstep/pg_healer

Fun.  Thanks for mentioning that.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-03 Thread Alvaro Herrera
On 2022-Apr-27, Michael Paquier wrote:

> On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:
> > True. :-) This does seem like a tool geared towards "expert mode", so
> > maybe we just assume if you need it you know what you're doing?
> 
> This is definitely an expert mode toy.

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1].  This new tool might be useful to feed healing attempts,
too.  (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1] https://github.com/turnstep/pg_healer

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-02 Thread David Christensen
...and pushing a couple fixups pointed out by cfbot, so here's v4.


On Mon, May 2, 2022 at 8:42 AM David Christensen
 wrote:
>
> Enclosed is v3 of this patch; this adds two modes for this feature,
> one with the raw page `--save-fullpage/-W` and one with the
> LSN+checksum fixups `--save-fullpage-fixup/-X`.
>
> I've added at least some basic sanity-checking of the underlying
> feature, as well as run the test file and the changes to pg_waldump.c
> through pgindent/perltidy to make them adhere to project standards.
> Threw in a rebase as well.
>
> Would appreciate any additional feedback here.
>
> Best,
>
> David


v4-pg_waldump-save-fpi.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-02 Thread David Christensen
Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David


v3-pg_waldump-save-fpi.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread Michael Paquier
On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:
> True. :-) This does seem like a tool geared towards "expert mode", so
> maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread David Christensen
On Mon, Apr 25, 2022 at 9:42 PM Michael Paquier  wrote:
> On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:
> > On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
> >  wrote:
> >> Thanks for working on this. I'm just thinking if we can use these FPIs
> >> to repair the corrupted pages? I would like to understand more
> >> detailed usages of the FPIs other than inspecting with pageinspect.
> >
> > My main use case was for being able to look at potential corruption,
> > either in the WAL stream, on heap, or in tools associated with the WAL
> > stream.  I suppose you could use the page images to replace corrupted
> > on-disk pages (and in fact I think I've heard of a tool or two that
> > try to do that), though don't know that I consider this the primary
> > purpose (and having toast tables and the list, as well as clog would
> > make it potentially hard to just drop-in a page version without
> > issues).  Might help in extreme situations though.
>
> You could do a bunch of things with those images, even make things
> worse if you are not careful enough.

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

> >> 10) Along with pg_pwrite(), can we also fsync the files (of course
> >> users can choose it optionally) so that the writes will be durable for
> >> the OS crashes?
> >
> > Can add; you thinking a separate flag to disable this with default true?
>
> We expect data generated by tools like pg_dump, pg_receivewal
> (depending on the use --synchronous) or pg_basebackup to be consistent
> when we exit from the call.  FWIW, flushing this data does not seem
> like a strong requirement for something aimed at being used page-level
> chirurgy or lookups, because the WAL segments should still be around
> even if the host holding the archives is unplugged.

I have added the fsync to the latest path (forthcoming), but I have no
strong preferences here as to the correct/expected behavior.

Best,

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread David Christensen
On Mon, Apr 25, 2022 at 9:54 PM Michael Paquier  wrote:
> On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:
> > On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
> >> I don't think that there is any need to rely on a new logic if there
> >> is already some code in place able to do the same work.  See
> >> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
> >> that relies on pg_check_dir().  I think that you'd better rely at
> >> least on what pgcheckdir.c offers.
> >
> > Yeah, though I am tending towards what another user had suggested and
> > just accepting an existing directory rather than requiring it to be
> > empty, so thinking I might just skip this one. (Will review the file
> > you've pointed out to see if there is a relevant function though.)
>
> OK.  FWIW, pg_check_dir() is used in initdb and pg_basebackup because
> these care about the behavior to use when specifying a target path.
> You could reuse it, but use a different policy depending on its
> returned result for the needs of what you see fit in this case.

I have a new version of the patch (pending tests) that uses
pg_check_dir's return value to handle things appropriately, so at
least some code reuse now.  It did end up simplifying a lot.

> >> +   PageSetLSN(page, record->ReadRecPtr);
> >> +   /* if checksum field is non-zero then we have 
> >> checksums enabled,
> >> +* so recalculate the checksum with new LSN (yes, this 
> >> is a hack)
> >> +*/
> >> Yeah, that looks like a hack, but putting in place a page on a cluster
> >> that has checksums enabled would be more annoying with
> >> zero_damaged_pages enabled if we don't do that, so that's fine by me
> >> as-is.  Perhaps you should mention that FPWs don't have their
> >> pd_checksum updated when written.
> >
> > Can make a mention; you thinking just a comment in the code is
> > sufficient, or want there to be user-visible docs as well?
>
> I was thinking about a comment, at least.

New patch version has significantly more comments.

> > This was to make the page as extracted equivalent to the effect of
> > applying the WAL record block on replay (the LSN and checksum both);
> > since recovery calls this function to mark the LSN where the page came
> > from this is why I did this as well.  (I do see perhaps a case for
> > --raw output that doesn't munge the page whatsoever, just made
> > comparisons easier.)
>
> Hm.  Okay.  The argument goes both ways, I guess, depending on what we
> want to do with those raw pages.  Still you should not need pd_lsn if
> the point is to be able to stick the pages back in place to attempt to
> get back as much data as possible when loading it back to the shared
> buffers?

Yeah, I can see that too; I think there's at least enough of an
argument for a flag to apply the fixups or just extract only the raw
page pre-modification.  Not sure which should be the "default"
behavior; either `--raw` or `--fixup-metadata` or something could
work. (Naming suggestions welcomed.)

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Michael Paquier
On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:
> On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
>> I don't think that there is any need to rely on a new logic if there
>> is already some code in place able to do the same work.  See
>> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
>> that relies on pg_check_dir().  I think that you'd better rely at
>> least on what pgcheckdir.c offers.
> 
> Yeah, though I am tending towards what another user had suggested and
> just accepting an existing directory rather than requiring it to be
> empty, so thinking I might just skip this one. (Will review the file
> you've pointed out to see if there is a relevant function though.)

OK.  FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

>> +   PageSetLSN(page, record->ReadRecPtr);
>> +   /* if checksum field is non-zero then we have checksums 
>> enabled,
>> +* so recalculate the checksum with new LSN (yes, this 
>> is a hack)
>> +*/
>> Yeah, that looks like a hack, but putting in place a page on a cluster
>> that has checksums enabled would be more annoying with
>> zero_damaged_pages enabled if we don't do that, so that's fine by me
>> as-is.  Perhaps you should mention that FPWs don't have their
>> pd_checksum updated when written.
> 
> Can make a mention; you thinking just a comment in the code is
> sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

> This was to make the page as extracted equivalent to the effect of
> applying the WAL record block on replay (the LSN and checksum both);
> since recovery calls this function to mark the LSN where the page came
> from this is why I did this as well.  (I do see perhaps a case for
> --raw output that doesn't munge the page whatsoever, just made
> comparisons easier.)

Hm.  Okay.  The argument goes both ways, I guess, depending on what we
want to do with those raw pages.  Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Michael Paquier
On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:
> On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
>  wrote:
>> Thanks for working on this. I'm just thinking if we can use these FPIs
>> to repair the corrupted pages? I would like to understand more
>> detailed usages of the FPIs other than inspecting with pageinspect.
> 
> My main use case was for being able to look at potential corruption,
> either in the WAL stream, on heap, or in tools associated with the WAL
> stream.  I suppose you could use the page images to replace corrupted
> on-disk pages (and in fact I think I've heard of a tool or two that
> try to do that), though don't know that I consider this the primary
> purpose (and having toast tables and the list, as well as clog would
> make it potentially hard to just drop-in a page version without
> issues).  Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

>> 10) Along with pg_pwrite(), can we also fsync the files (of course
>> users can choose it optionally) so that the writes will be durable for
>> the OS crashes?
> 
> Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call.  FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen
On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
 wrote:
> Thanks for working on this. I'm just thinking if we can use these FPIs
> to repair the corrupted pages? I would like to understand more
> detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream.  I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues).  Might help in extreme situations though.

> Given that others have realistic use-cases (of course I would like to
> know more about those), +1 for the idea. However, I would suggest
> adding a function to extract raw FPI data to the pg_walinspect
> extension that got recently committed in PG 15, the out of which can
> directly be fed to pageinspect functions or

Yeah, makes sense to have some overlap here; will review what is there
and see if there is some shared code base we can utilize.  (ISTR some
work towards getting these two tools using more of the same code, and
this seems like another such instance.)

> Few comments:
> 1) I think it's good to mention the stored file name format.
> + printf(_("  -W, --raw-fpi=path save found full page images to
> given path\n"));

+1, though I've also thought there could be uses to have multiple
possible output formats here (most immediately, there may be cases
where we want *each* FPI for a block vs the *latest*, so files name
with/without the LSN component seem the easiest way forward here).
That would introduce some additional complexity though, so might need
to see if others think that makes any sense.

> 2)
> + for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> + {
> + /* we will now extract the fullpage image from the XLogRecord and save
> + * it to a calculated filename */
> +
> + if (XLogRecHasBlockImage(record, block_id))
>
> I think we need XLogRecHasBlockRef to be true to check
> XLogRecHasBlockImage otherwise, we will see some build farms failing,
> recently I've seen this failure for pg_walinspect..
>
> for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> {
> if (!XLogRecHasBlockRef(record, block_id))
> continue;
>
> if (XLogRecHasBlockImage(record, block_id))
> *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
> }

Good point; my previous patch that got committed here (127aea2a65)
probably also needed this treatment.

> 3) Please correct the commenting format:
> + /* we will now extract the fullpage image from the XLogRecord and save
> + * it to a calculated filename */

Ack.

> 4) Usually we start errors with lower case letters "could not ."
> + pg_fatal("Couldn't open file for output: %s", filename);
> + pg_fatal("Couldn't write out complete FPI to file: %s", filename);
> And the variable name too:
> + FILE *OPF;

Ack.

> 5) Not sure how the FPIs of TOASTed tables get stored, but it would be
> good to check.

What would be different here? Are there issues you can think of, or
just more from the pageinspect side of things?

> 6) Good to specify the known usages of FPIs in the documentation.

Ack. Prob good to get additional info/use cases from others, as mine
is fairly short. :-)

> 7) Isn't it good to emit an error if RestoreBlockImage returns false?
> + if (RestoreBlockImage(record, block_id, page))
> + {

Ack.

> 8) I think I don't mind if a non-empty directory is specified - IMO
> better usability is this - if the directory is non-empty, just go add
> the FPI files if FPI file exists just replace it, if the directory
> isn't existing, create and write the FPI files.
> + /* we accept an empty existing directory */
> + if (stat(config.save_fpw_path, ) == 0 && S_ISDIR(st.st_mode))
> + {

Agreed; was mainly trying to prevent accidental expansion inside
`pg_wal` when an earlier version of the patch implied `.` as the
current dir with an optional path, but I've since made the path
non-optional and agree that this is unnecessarily restrictive.

>  9) Instead of following:
> + if (XLogRecordHasFPW(xlogreader_state))
> + XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
> I will just do this in XLogRecordSaveFPWs:
> for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> {
> if (!XLogRecHasBlockRef(record, block_id))
> continue;
>
> if (XLogRecHasBlockImage(record, block_id))
> {
>
> }
> }

Yeah, a little redundant.

> 10) Along with pg_pwrite(), can we also fsync the files (of course
> users can choose it optionally) so that the writes will be durable for
> the OS crashes?

Can add; you thinking a separate flag to disable this 

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen
On Mon, Apr 25, 2022 at 2:00 AM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 4/25/22 8:11 AM, Michael Paquier wrote:
> > On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> >> Hi Matthias, great point.  Enclosed is a revised version of the patch
> >> that adds the fork identifier to the end if it's a non-main fork.
> > Like Alvaro, I have seen cases where this would have been really
> > handy.  So +1 from me, as well, to have more tooling like what you are
> > proposing.
>
> +1 on the idea.
>
> FWIW, there is an extension doing this [1] but having the feature
> included in pg_waldump would be great.

Cool, glad to see there is some interest; definitely some overlap in
forensics inside and outside the database both, as there are different
use cases for both.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen
On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
>
> On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> > Hi Matthias, great point.  Enclosed is a revised version of the patch
> > that adds the fork identifier to the end if it's a non-main fork.
>
> Like Alvaro, I have seen cases where this would have been really
> handy.  So +1 from me, as well, to have more tooling like what you are
> proposing.  Fine for me to use one file for each block with a name
> like what you are suggesting for each one of them.
>
> +   /* we accept an empty existing directory */
> +   if (stat(config.save_fpw_path, ) == 0 && S_ISDIR(st.st_mode))
> +   {
> I don't think that there is any need to rely on a new logic if there
> is already some code in place able to do the same work.  See
> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
> that relies on pg_check_dir().  I think that you'd better rely at
> least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

> +   {"raw-fpi", required_argument, NULL, 'W'},
> I think that we'd better rename this option.  "fpi", that is not used
> much in the user-facing docs, is additionally not adapted when we have
> an other option called -w/--fullpage.  I can think of
> --save-fullpage.

Makes sense.

> +   PageSetLSN(page, record->ReadRecPtr);
> +   /* if checksum field is non-zero then we have checksums 
> enabled,
> +* so recalculate the checksum with new LSN (yes, this is 
> a hack)
> +*/
> Yeah, that looks like a hack, but putting in place a page on a cluster
> that has checksums enabled would be more annoying with
> zero_damaged_pages enabled if we don't do that, so that's fine by me
> as-is.  Perhaps you should mention that FPWs don't have their
> pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

> +   /* we will now extract the fullpage image from the XLogRecord and save
> +* it to a calculated filename */
> The format of this comment is incorrect.
>
> +The LSN of the record with this block, formatted
> +as %08x-%08X instead of the
> +conventional %X/%X due to filesystem naming
> +limits
> The last part of the sentence about %X/%X could just be removed.  That
> could be confusing, at worse.

Makes sense.

> +   PageSetLSN(page, record->ReadRecPtr);
> Why is pd_lsn set?

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well.  (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

> git diff --check complains a bit.

Can look into this.

> This stuff should include some tests.  With --end, the tests can
> be cheap.

Yeah, makes sense, will include some in the next version.

David




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Bharath Rupireddy
On Sat, Apr 23, 2022 at 4:21 AM David Christensen
 wrote:
>
> Hi -hackers,
>
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.
>
> Description from the commit:
>
> Extracts full-page images from the WAL stream into a target directory,
> which must be empty or not
> exist.  These images are subject to the same filtering rules as normal
> display in pg_waldump, which
> means that you can isolate the full page writes to a target relation,
> among other things.
>
> Files are saved with the filename:  with
> formatting to make things
> somewhat sortable; for instance:
>
> -01C0.1663.1.6117.0
> -01000150.1664.0.6115.0
> -010001E0.1664.0.6114.0
> -01000270.1663.1.6116.0
> -01000300.1663.1.6113.0
> -01000390.1663.1.6112.0
> -01000420.1663.1.8903.0
> -010004B0.1663.1.8902.0
> -01000540.1663.1.6111.0
> -010005D0.1663.1.6110.0
>
> It's noteworthy that the raw images do not have the current LSN stored
> with them in the WAL
> stream (as would be true for on-heap versions of the blocks), nor
> would the checksum be valid in
> them (though WAL itself has checksums, so there is some protection
> there).  This patch chooses to
> place the LSN and calculate the proper checksum (if non-zero in the
> source image) in the outputted
> block.  (This could perhaps be a targetted flag if we decide we don't
> always want this.)
>
> These images could be loaded/inspected via `pg_read_binary_file()` and
> used in the `pageinspect`
> suite of tools to perform detailed analysis on the pages in question,
> based on historical
> information, and may come in handy for forensics work.

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_("  -W, --raw-fpi=path save found full page images to
given path\n"));
2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
*fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
}
3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
4) Usually we start errors with lower case letters "could not ."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;
5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.
6) Good to specify the known usages of FPIs in the documentation.
7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {
8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, ) == 0 && S_ISDIR(st.st_mode))
+ {
 9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}
10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Regards,
Bharath Rupireddy.




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Michael Paquier
On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> Hi Matthias, great point.  Enclosed is a revised version of the patch
> that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy.  So +1 from me, as well, to have more tooling like what you are
proposing.  Fine for me to use one file for each block with a name
like what you are suggesting for each one of them. 

+   /* we accept an empty existing directory */
+   if (stat(config.save_fpw_path, ) == 0 && S_ISDIR(st.st_mode))
+   {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.

+   {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.

+   PageSetLSN(page, record->ReadRecPtr);
+   /* if checksum field is non-zero then we have checksums 
enabled,
+* so recalculate the checksum with new LSN (yes, this is a 
hack)
+*/
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

+   /* we will now extract the fullpage image from the XLogRecord and save
+* it to a calculated filename */
The format of this comment is incorrect.

+The LSN of the record with this block, formatted
+as %08x-%08X instead of the
+conventional %X/%X due to filesystem naming
+limits
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

+   PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

git diff --check complains a bit.

This stuff should include some tests.  With --end, the tests can
be cheap.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-23 Thread David Christensen
On Sat, Apr 23, 2022 at 9:49 AM Matthias van de Meent
 wrote:

> Regardless of my (lack of) opinion on the inclusion of this patch in
> PG (I did not significantly review this patch); I noticed that you do
> not yet identify the 'fork' of the FPI in the file name.
>
> A lack of fork identifier in the exported file names would make
> debugging much more difficult due to the relatively difficult to
> identify data contained in !main forks, so I think this oversight
> should be fixed, be it through `_forkname` postfix like normal fork
> segments, or be it through `.` numerical in- or postfix in
> the filename.
>
> -Matthias

Hi Matthias, great point.  Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Best,

David


v2-pg_waldump-save-fpi.patch
Description: Binary data


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-23 Thread Matthias van de Meent
On Sat, 23 Apr 2022 at 00:51, David Christensen
 wrote:
>
> Hi -hackers,
>
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.
>
> Description from the commit:
>
> Extracts full-page images from the WAL stream into a target directory,
> which must be empty or not
> exist.  These images are subject to the same filtering rules as normal
> display in pg_waldump, which
> means that you can isolate the full page writes to a target relation,
> among other things.
>
> Files are saved with the filename:  with
> formatting to make things

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.` numerical in- or postfix in
the filename.

-Matthias




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-23 Thread Alvaro Herrera
On 2022-Apr-22, David Christensen wrote:

> Hi -hackers,
> 
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.

I already wrote and posted a patch to do exactly this, and found it the
only way to fix a customer problem, so +1 on having this feature.  I
haven't reviewed David's patch.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




[PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-22 Thread David Christensen
Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist.  These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename:  with
formatting to make things
somewhat sortable; for instance:

-01C0.1663.1.6117.0
-01000150.1664.0.6115.0
-010001E0.1664.0.6114.0
-01000270.1663.1.6116.0
-01000300.1663.1.6113.0
-01000390.1663.1.6112.0
-01000420.1663.1.8903.0
-010004B0.1663.1.8902.0
-01000540.1663.1.6111.0
-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored
with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor
would the checksum be valid in
them (though WAL itself has checksums, so there is some protection
there).  This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the
source image) in the outputted
block.  (This could perhaps be a targetted flag if we decide we don't
always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.

Best,

David


v1-pg_waldump-save-fpi.patch
Description: Binary data