above, if recovery on the second
> standby had reached the end of the page-spanning record before
> redirection to the first standby, it would need pg_rewind to connect
> to the first standby.
>
Correct. IMO pg_rewind is the right way to solve it.
Regards,
--
Alexander Kukushkin
ixed. However, for reasons unclear to me, it shows another issue, and
> I am running out of time and need more caffeine. I'll continue
> investigating this tomorrow.
>
Thank you for spending your time on it!
--
Regards,
--
Alexander Kukushkin
Hello Michael, Kyotaro,
Please find attached the patch fixing the problem and the updated TAP test
that addresses the nit.
--
Regards,
--
Alexander Kukushkin
042_no_contrecord_switch.pl
Description: Perl program
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam
copies the intentionally broken file, which differs
> from the data that should be received via streaming.
As I already said, this is a simple way to emulate a primary crash while
standbys are receiving WAL.
It could easily happen that a record that spans multiple pages is not fully
received and flushed.
--
Regards,
--
Alexander Kukushkin
get something done next week.
>
> Nit. In your test, it seems to me that you should not call directly
> set_standby_mode and enable_restoring, just rely on has_restoring with
> the standby option included.
>
Thanks, I'll look into it.
--
Regards,
--
Alexander Kukushkin
ing of
primary_conninfo GUC with reload.
Please find attached the TAP test that reproduces the problem.
To be honest, I don't know yet how to fix it nicely. I am thinking about
returning XLREAD_FAIL from XLogPageRead() if it suddenly switched to a new
timeline while trying to read a
these builds and the
introduced TAP test being executed.
Regards,
--
Alexander Kukushkin
.testschema.testvar = 1;
LET
testdb=# select testdb.testschema.testvar;
 testvar
---------
       1
(1 row)
Regards,
--
Alexander Kukushkin
TER SYSTEM
In my opinion it would be fair to make parsing of config files consistent
with the rest of the code responsible for GUC handling by allowing custom
parameters containing more than one dot.
The fix is rather simple, please find the patch attached.
Regards,
--
Alexander Kukushkin
From 755b7d9a44901f3
" as a alternative to "false", because the
> last one
> 'a' should be 'an'?
>
>
Thanks for the feedback
Please find the new version attached.
Regards,
--
Alexander Kukushkin
On Thu, 2 Nov 2023 at 04:24, torikoshia wrote:
> On 2023-10-31 00:26, Alexander Kukushki
Hi,
On Wed, 18 Oct 2023 at 08:50, torikoshia wrote:
>
> I have very minor questions on the regression tests mainly regarding the
> consistency with other tests for pg_rewind:
>
Please find attached a new version of the patch. It addresses all your
comments.
Regards,
--
Alexand
Hi,
Please find attached v6.
Changes compared to v5:
1. use "perl -e 'exit(1)'" instead of "false" as archive_command, so it
also works on Windows
2. fixed the test name
Regards,
--
Alexander Kukushkin
From 3e1e6c9d968e9b829357b6eb0a7dfa366b550668 Mon Sep 17 00:00:00
it could be extremely inefficient and unnecessary.
3. Added a TAP test that checks that at least one file actually isn't removed.
Regards,
--
Alexander Kukushkin
From 3e1e6c9d968e9b829357b6eb0a7dfa366b550668 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin
Date: Tue, 12 Sep 2023 14:09:47 +0200
Subject: [PATCH v5
> emulate a delay in WAL archiving, it is possible to set
> archive_command to a command that we know will fail, for instance.
>
Yes, I totally agree, it is on our radar, but meanwhile please see the new
version, just to check if I correctly understood your idea.
Regards,
--
Alexander
the target cluster.
*
* This uses a logic based on "postgres -C" to get the value from the
* cluster.
*/
static void
getRestoreCommand(const char *argv0)
For the running source cluster one could just use "SHOW log_directory"
Regards,
--
Alexander Kukushkin
on the target we have
"my_log" (they are configured using "log_directory" GUC).
When doing a rewind in this case we want neither to remove the content of
"my_log" on the target nor to copy the content of "pg_log" from the source.
This can't be achieved just by introducing a static string "log". The
"log_directory" GUC must be examined on both source and target.
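To illustrate the point (this is only a sketch, not pg_rewind's actual code), the snippet below collects the directories to leave alone from the "log_directory" values of both clusters. The function name is hypothetical, and the rule that only relative values point inside PGDATA is an assumption made for the example.

```python
import os.path


def log_dirs_to_skip(source_log_directory, target_log_directory):
    """Return the PGDATA-relative directories a rewind should leave alone.

    Hypothetical helper: both arguments are the "log_directory" values
    read from the source and the target cluster respectively.
    """
    skip = set()
    for value in (source_log_directory, target_log_directory):
        # Assumption: an absolute log_directory lives outside PGDATA,
        # so there is nothing to skip for it inside the data directory.
        if value and not os.path.isabs(value):
            skip.add(os.path.normpath(value))
    return skip


print(sorted(log_dirs_to_skip("pg_log", "my_log")))
```

With "pg_log" on the source and "my_log" on the target, both names end up in the skip set, which a single static "log" string could not express.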
Regards,
--
Alexander Kukushkin
s happened with the old primary
2. Unlike "pg_wal", the "log" directory is not necessarily located inside
PGDATA. The actual value is configured using the "log_directory" GUC, which
just happens to be "log" by default. And in fact the actual values on source
and
Hello Kyotaro,
any further thoughts on it?
Regards,
--
Alexander Kukushkin
name
(and also takes into account TLI) could be much simpler than the current
approach.
Actually, since we are starting to do some additional "manipulations" with
files in pg_wal, we probably should do a symmetric action with files inside
pg_wal/archive_status.
Regards,
--
Alexander Kukushkin
k becomes more complex because we will have to consider both timelines,
1 and 2.
Also, we need to take into account the divergence LSN. Files after it are
not required.
Regards,
--
Alexander Kukushkin
> I did a slight modification of your script that reproduces a problem.
>
It seems that formatting damaged the script, so I'd better attach it as a
file.
Regards,
--
Alexander Kukushkin
pg_rewind-removes-wal-segments-reproduce.sh
Description: application/shellscript
en removed"
# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf
# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
  if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi
done)
postgres -D oldprim # also fails with "requested WAL segment XXX has already been removed"
===
Regards,
--
Alexander Kukushkin
better if pg_rewind didn't remove WAL files between the
last common checkpoint and diverged LSN in the first place.
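As a rough sketch of which files that would be, the snippet below lists the WAL segment names between two LSNs on one timeline. The helper names are hypothetical, the 16 MB segment size is an assumption (the default wal_segment_size), and the example LSNs are made up.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(lsn):
    """Convert an LSN such as '0/3000060' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(tli, segno, segment_size=WAL_SEGMENT_SIZE):
    """Build the 24-character WAL file name for a timeline and segment number."""
    segments_per_xlogid = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        tli,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


def segments_to_keep(tli, last_common_checkpoint, divergence_lsn,
                     segment_size=WAL_SEGMENT_SIZE):
    """List segments from the last common checkpoint up to the divergence LSN."""
    first = parse_lsn(last_common_checkpoint) // segment_size
    last = parse_lsn(divergence_lsn) // segment_size
    return [wal_file_name(tli, s, segment_size) for s in range(first, last + 1)]


# Made-up LSNs, purely for illustration.
for name in segments_to_keep(1, "0/3000060", "0/5000028"):
    print(name)
```

The sketch covers a single timeline only. Segments past the one containing the divergence LSN are left out of the list.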
Regards,
--
Alexander Kukushkin
t to use the -c option,
+provide a (relative or absolute) path to the
postgresql.conf using this option.
+
It took me a while to understand the meaning of -c. Maybe
changing it to --restore-target-wal will make it easier to
understand.
[1] https://www.postgresql.org/message-id/flat/16982-f12294cccd221480%40postgresql.org
Regards,
--
Alexander Kukushkin
wrong, because the actual value didn't change: it was an
empty string in the config and now it remains an empty string due to the
default value in the guc.c
Regards,
--
Alexander Kukushkin
=0x56059f51a080) at ./build/../src/backend/postmaster/postmaster.c:1380
#17 0x56059d37a992 in main (argc=17, argv=0x56059f51a080) at ./build/../src/backend/main/main.c:228
It happened on 11.9, but after looking at HEAD I think the problem
still exists.
Regards,
--
Alexander Kukushkin
to reproduce it with 13.0 and 12.4, and I believe older
versions are also affected.
Regards,
--
Alexander Kukushkin
ide pg_wal just
works.
At the same time, because of such a "fatal" error pg_rewind leaves PGDATA
in an inconsistent state with an empty pg_control file. This is totally bad
and easily fixable. We want the specific file to be absent and it is
already absent, so why should it be a fatal error and not a warning?
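The idea can be sketched in a few lines. This is a Python analogue written for illustration only, not the C code in pg_rewind. The name remove_target_file is borrowed from the thread, while the missing_ok parameter and the warning text are assumptions.

```python
import os


def remove_target_file(path, missing_ok=True):
    """Remove path; return True if removed, False if it was already absent."""
    try:
        os.unlink(path)
    except FileNotFoundError:  # corresponds to errno == ENOENT
        if not missing_ok:
            raise
        # The file we wanted gone is already gone: warn and carry on
        # instead of aborting halfway through.
        print(f'warning: "{path}" is already absent, skipping')
        return False
    return True
```

Any other error (for example a permission problem) still propagates, so only the "already absent" case is downgraded to a warning.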
Regards,
--
Alexander Kukushkin
see anything criminal in skipping non-existing
> file, when executing a file map by pg_rewind.
Good, I will prepare a patch then.
Regards,
--
Alexander Kukushkin
e remove_target_file() if the errno == ENOENT
What do you think?
Regards,
--
Alexander Kukushkin
is not possible to copy the token into the psql password prompt,
but there is a workaround: export PGPASSWORD=verylongtokenstring &&
psql
JWT: https://jwt.io/
PAM module to verify OAuth tokens: https://github.com/CyberDem0n/pam-oauth2
Regards,
--
Alexander Kukushkin
ong other output :(
Regards,
--
Alexander Kukushkin
mple wrapper
> run_simple_command which checks after PGRES_COMMAND_OK, and frees the
> result then? This could be used for the temporary table creation and
> when setting synchronous_commit.
Done, please see the next version attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/bin/pg_rewind/li
use the same wrapper for run_simple_query() and
for places where we call a SET, because PQresultStatus() returns
PGRES_TUPLES_OK and PGRES_COMMAND_OK respectively.
Passing the expected ExecStatusType to the wrapper for comparison
looks a bit ugly to me.
Regards,
--
Alexander Kukushkin
ll of them have certain pros and cons. The third approach works well
for automation, but IMHO we should simply fix pg_rewind itself and SET
statement_timeout after establishing a connection, so everybody will
benefit from it.
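On the client side the same idea could look roughly like the sketch below, which works with any DB-API 2.0 style connection. The function name is hypothetical, and the value 0 (which disables the timeout) is an assumption, since the message above does not say which value to set.

```python
def disable_statement_timeout(conn):
    """Run SET statement_timeout = 0 on a DB-API 2.0 style connection."""
    cur = conn.cursor()
    try:
        # Assumption: 0 turns the timeout off for this session only.
        cur.execute("SET statement_timeout = 0")
    finally:
        cur.close()
```

It would be called once, right after the connection is established and before any long-running query is sent.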
Patch attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/bin
bit broken, perhaps?
Indeed, this is broken psycopg2 behavior :(
I am thinking about submitting a patch fixing it.
Actually I quickly skimmed through the pgjdbc logical replication
source code and example
https://jdbc.postgresql.org/documentation/head/replication.html and I
think that it will also cause problems with the shutdown.
Regards,
--
Alexander Kukushkin
st
> (and quite important) step. So thanks for doing that.
>
> That being said, I think those are two separate issues, with different
> causes and likely different fixes. I don't think fixing the xlog flush
> will resolve your issue, and vice versa.
Agree, these are different issues.
Regards,
--
Alexander Kukushkin
python example behavior very
similar to the pg_recvlogical.
All of the above probably looks like a brain dump, but I don't think
that it conflicts with Tomas's findings; rather, it complements them.
I am very glad that now I know how to mitigate the problem on the
client side, but IMHO it is also very important to fix the server
behavior if that is at all possible.
Regards,
--
Alexander Kukushkin
ERSION,
because we added a new field to the control file, but decided to
leave this change to the committer.
Regards,
--
Alexander Kukushkin
> requiring its value on the replica to be not lower than the one on the
> > primary?
> >
>
> I think it does, we need the proc slots for walsenders on the standby
> same way we do for normal backends.
You are absolutely right. Attaching the new version of the patch.
Regards
Hi,
attaching the new version of the patch.
Now it simply reserves max_wal_senders slots in ProcGlobal, which
guarantees that only walsender processes can use them.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index
Hi,
Attached is the patch rebased onto the current HEAD. I also created a commitfest entry.
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> wrote:
>
> >
> > Instead, we can literally "reserve"
is not possible to put it into ".pgpass" either, because it assumes
that a line cannot be longer than 320 bytes (64*5).
At the moment there are only two ways to use such tokens as a password:
1. export PGPASSWORD=very_long.token
2. specify the token (password) in the connection URL
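A small sketch of the first option, for illustration only: it checks whether an entry would fit under the 320-byte limit quoted above and otherwise prepares an environment carrying PGPASSWORD. Both helper names are hypothetical, and the length check ignores the escaping of ":" and "\" that a real ".pgpass" entry may need.

```python
import os

PGPASS_MAX_LINE = 320  # limit quoted in the message above (64*5 bytes)


def fits_in_pgpass(host, port, dbname, user, password):
    """Check whether a host:port:db:user:password entry stays within the limit."""
    line = ":".join([host, str(port), dbname, user, password])
    return len(line.encode("utf-8")) <= PGPASS_MAX_LINE


def env_with_password(token, base_env=None):
    """Return a copy of the environment with PGPASSWORD set to the token."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGPASSWORD"] = token
    return env
```

The returned dictionary can be passed as the env argument when starting psql from a script, which keeps the token out of the command line.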
Regards,
--
Alexander Kukushkin
rs comes after all the slots are filled to grab an
> available "normal" slot, it works as the same as the current
> behavior when walsender_reserved_connectsions = 0.
>
> What do you think about this?
Sounds reasonable, please see the updated patch.
Regards,
--
Alexander Kukushk
ntroduces
replication_reservd_connections GUC
Regards,
--
Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..80e6ef9f67 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3059,6 +3059,32 @@ include_dir 'conf.d'
+
managed to write to
disk before the first crash.
Regards,
--
Alexander Kukushkin
v=0x5646d2e53390) at postmaster.c:1329
#14 0x5646d08d2880 in main (argc=17, argv=0x5646d2e53390) at main.c:228
Regards,
--
Alexander Kukushkin
in such conditions it reaches a
consistency much earlier than it should!
Regards,
--
Alexander Kukushkin
fix.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..2215ebbb5a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2685,7 +2685,7 @@ pmdie(SIGNAL_ARGS)
".done". Postgres
will not call archive_command for files which are already marked as
".done".
I think most of the good backup tools are already doing that. For example,
pgBackRest, wal-e, wal-g (just naming the tools I have worked with).
Regards,
--
Alexander Kukushkin
will wait forever.
At the same time, if you do immediate or smart shutdown, it works fine.
The problem is in the pmdie function. Proposed fix attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..9b36941a20
aming => restore_command).
I am not sure about the last option, but in any case, before going to
some remote place, postgres should try to find (and try to replay) the
WAL file in pg_wal.
Regards,
--
Alexander Kukushkin
.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d60026d..13caeef 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,6 +2273,10 @@ InitWalSenderSlot(void)
walsnd->apply
I've re-analyzed those columns.
Regards,
Alexander Kukushkin