o use the -c option,
+provide a (relative or absolute) path to the
postgresql.conf using this option.
+
It took me a while to understand the meaning of -c. Maybe
changing it to --restore-target-wal will make it easier to
understand.
[1]
https://www.postgresql.org/message-id/flat/16982-f12294cccd221480%40postgresql.org
Regards,
--
Alexander Kukushkin
reproduce it with 13.0 and 12.4, and I believe older
versions are also affected.
Regards,
--
Alexander Kukushkin
=0x56059f51a080) at
./build/../src/backend/postmaster/postmaster.c:1380 #17
0x56059d37a992 in main (argc=17, argv=0x56059f51a080) at
./build/../src/backend/main/main.c:228
It has happened on 11.9, but after looking at HEAD I think the problem
still exists.
Regards,
--
Alexander Kukushkin
d.
All of them have certain pros and cons. The third approach works good
for automation, but IMHO we should simply fix pg_rewind itself and SET
statement_timeout after establishing a connection, so everybody will
benefit from it.
Patch attached.
Regards,
--
Alexander Kukushkin
diff --git a/sr
nk we can use the same wrapper for run_simple_query() and
for places where we call a SET, because PQresultStatus() returns
PGRES_TUPLES_OK and PGRES_COMMAND_OK respectively.
Passing expected ExecStatusType to the wrapper for comparison is
looking a bit ugly to me.
Regards,
--
Alexander Kukushkin
a simple wrapper
> run_simple_command which checks after PGRES_COMMAND_OK, and frees the
> result then? This could be used for the temporary table creation and
> when setting synchronous_commit.
Done, please see the next version attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/bin/pg_rewi
ong other output :(
Regards,
--
Alexander Kukushkin
it is not possible to copy&paste the token into psql password prompt,
but there is a workaround, export PGPASSWORD=verylongtokenstring &&
psql
JWT: https://jwt.io/
PAM module to verify OAuth tokens: https://github.com/CyberDem0n/pam-oauth2
Regards,
--
Alexander Kukushkin
ally such a little change made the python example behavior very
similar to the pg_recvlogical.
All above text probably looks like a brain dump, but I don't think
that it conflicts with Tomas's findings it rather compliments them.
I am very glad that now I know how to mitigate the problem on the
client side, but IMHO it is also very important to fix the server
behavior if it is ever possible.
Regards,
--
Alexander Kukushkin
eproduce it is a first
> (and quite important) step. So thanks for doing that.
>
> That being said, I think those are two separate issues, with different
> causes and likely different fixes. I don't think fixing the xlog flush
> will resolve your issue, and vice versa.
Agree, these are different issues.
Regards,
--
Alexander Kukushkin
gt;
> Uh, that seems a bit broken, perhaps?
Indeed, this is broken psycopg2 behavior :(
I am thinking about submitting a patch fixing it.
Actually I quickly skimmed through the pgjdbc logical replication
source code and example
https://jdbc.postgresql.org/documentation/head/replication.html and I
think that it will also cause problems with the shutdown.
Regards,
--
Alexander Kukushkin
t is not possible to put it into ".pgpass" either, because it assumes
that line could not be longer than 320 bytes (64*5)
At the moment there are only two ways to use such tokens as a password:
1. export PGPASSWORD=very_long.token
2. specify the token(password) in the connection url
Regards,
--
Alexander Kukushkin
Hi,
Attached rebased version patch to the current HEAD and created commit fest entry
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> wrote:
>
> >
> > Instaed, we can iterally "reserve&quo
Hi,
attaching the new version of the patch.
Now it simply reserves max_wal_senders slots in the ProcGlobal, what
guarantees that only walsender process could use them.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index
for these builds and the
introduced TAP test being executed.
Regards,
--
Alexander Kukushkin
ut what has happened with the old primary
2. Unlike "pg_wal", the "log" directory is not necessarily located inside
PGDATA. The actual value is configured using "log_directory" GUC, which
just happened to be "log" by default. And in fact actual values on
g_log" and on the target we have
"my_log" (they are configured using "log_directory" GUC).
When doing rewind in this case we want neither to remove the content of
"my_log" on the target nor to copy content of "pg_log" from the source.
It couldn't be achieved just by introducing a static string "log". The
"log_directory" GUC must be examined on both, source and target.
Regards,
--
Alexander Kukushkin
from the target cluster.
*
* This uses a logic based on "postgres -C" to get the value from the
* cluster.
*/
static void
getRestoreCommand(const char *argv0)
For the running source cluster one could just use "SHOW log_directory"
Regards,
--
Alexander Kukushkin
regardless of the changes above, if recovery on the second
> standby had reached the end of the page-spanning record before
> redirection to the first standby, it would need pg_rewind to connect
> to the first standby.
>
Correct, IMO pg_rewind is a right way of solving it.
Regards,
--
Alexander Kukushkin
changing of
primary_conninfo GUC with reload.
Please find attached the TAP test that reproduces the problem.
To be honest, I don't know yet how to fix it nicely. I am thinking about
returning XLREAD_FAIL from XLogPageRead() if it suddenly switched to a new
timeline while trying
ly I will get something done next week.
>
> Nit. In your test, it seems to me that you should not call directly
> set_standby_mode and enable_restoring, just rely on has_restoring with
> the standby option included.
>
Thanks, I'll look into it.
--
Regards,
--
Alexander Kukushkin
the second standby copies the intentionally broken file, which differs
> from the data that should be received via streaming.
As I already said, this is a simple way to emulate the primary crash while
standbys receiving WAL.
It could easily happen that the record spans on multiple pages is not fully
received and flushed.
--
Regards,
--
Alexander Kukushkin
Hello Michael, Kyotaro,
Please find attached the patch fixing the problem and the updated TAP test
that addresses Nit.
--
Regards,
--
Alexander Kukushkin
042_no_contrecord_switch.pl
Description: Perl program
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam
ixed. However, for reasons unclear to me, it shows another issue, and
> I am running out of time and need more caffeine. I'll continue
> investigating this tomorrow.
>
Thank you for spending your time on it!
--
Regards,
--
Alexander Kukushkin
Hi,
On Wed, 18 Oct 2023 at 08:50, torikoshia wrote:
>
> I have very minor questions on the regression tests mainly regarding the
> consistency with other tests for pg_rewind:
>
Please find attached a new version of the patch. It addresses all your
comments.
Regards,
--
Alexand
)'" as a alternative to "false", because the
> last one
> 'a' should be 'an'?
>
>
Thanks for the feedback
Please find the new version attached.
Regards,
--
Alexander Kukushkin
On Thu, 2 Nov 2023 at 04:24, torikoshia wrote:
> On 2023-1
TER SYSTEM
In my opinion it would be fair to make parsing of config files with the
rest of the code responsible for GUC handling by allowing custom parameters
containing more than one dot.
The fix is rather simple, please find the patch attached.
Regards,
--
Alexander Kukushkin
From 755b7d9a44901f3
db=# let testdb.testschema.testvar = 1;
LET
testdb=# select testdb.testschema.testvar;
testvar
-
1
(1 row)
Regards,
--
Alexander Kukushkin
could be extremely inefficient and unnecessary.
3. Added TAP test that actually at least one file isn't removed.
Regards,
--
Alexander Kukushkin
From 3e1e6c9d968e9b829357b6eb0a7dfa366b550668 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin
Date: Tue, 12 Sep 2023 14:09:47 +0200
Subject: [PAT
Hi,
Please find attached v6.
Changes compared to v5:
1. use "perl -e 'exit(1)'" instead of "false" as archive_command, so it
also works on Windows
2. fixed the test name
Regards,
--
Alexander Kukushkin
From 3e1e6c9d968e9b829357b6eb0a7dfa366b550668 Mon Sep 17
Hello Kyotaro,
any further thoughts on it?
Regards,
--
Alexander Kukushkin
│ t
(1 row)
IMO is totally wrong, because the actual value didn't change: it was an
empty string in the config and now it remains an empty string due to the
default value in the guc.c
Regards,
--
Alexander Kukushkin
can just set search_path for the current session.
With this feature they will also be able to influence search_path of not
protected functions when they create an extension.
Regards,
--
Alexander Kukushkin
on.
But, I also agree with Jelte, it should be a property of a control file,
rather than a user controlled parameter, so that an attacker can't opt out.
Regards,
--
Alexander Kukushkin
issions on a schema, or we need to remove this check from
ALTER EXTENSION ... SET SCHEMA.
2. Independently of that we need to switch to a BOOTSTRAP_SUPERUSERID in
the AlterExtensionNamespace()
What do you think?
Regards,
--
Alexander Kukushkin
ause XLOGDIR is "pg_wal". So xlogfname has enough
> size but snprintf(xlogfname, MAXPGPATH) is wrong usage.
> (And XLogFileName() uses snprintf(xlogfname, MAXFNAMELEN)
> internally.)
>
Nice catch!
I don't think we need another buffer here, just need to use MAXFNAMELEN,
because
one wants to
> emulate a delay in WAL archiving, it is possible to set
> archive_command to a command that we know will fail, for instance.
>
Yes, I totally agree, it is on our radar, but meanwhile please see the new
version, just to check if I correctly understood your idea.
Regards
Hi Michael and Kyotaro,
Now that beta1 was released I hope you are not so busy and hence would like
to follow up on this problem.
Regards,
--
Alexander Kukushkin
would be much better if pg_rewind didn't remove WAL files between the
last common checkpoint and diverged LSN in the first place.
Regards,
--
Alexander Kukushkin
5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal
postgres -D oldprim # fails with "WAL file has been removed"
# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f
%p'">> oldprim/postgresql.conf
# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal;
fi
done)
postgres -D oldprim # also fails with "requested WAL segment XXX has
already been removed"
===
Regards,
--
Alexander Kukushkin
>
>
> I did a slight modification of your script that reproduces a problem.
>
>
>
>
It seems that formatting damaged the script, so I better attach it as a
file.
Regards,
--
Alexander Kukushkin
pg_rewind-removes-wal-segments-reproduce.sh
Description: application/shellscript
k becomes more complex because we will have to consider both timelines,
1 and 2.
Also, we need to take into account the divergency LSN. Files after it are
not required.
Regards,
--
Alexander Kukushkin
WAL file name
(and also takes into account TLI) could be much simpler than the current
approach.
Actually, since we start doing some additional "manipulations" with files
in pg_wal, we probably should do a symmetric action with files inside
pg_wal/archive_status
Regards,
--
Alexander Kukushkin
> requiring its value on the replica to be not lower than the one on the
> > primary?
> >
>
> I think it does, we need the proc slots for walsenders on the standby
> same way we do for normal backends.
You are absolutely right. Attaching the new version of the patch.
Regards
ERSION,
because we added the new field into the control file, but decided to
leave this change to committer.
Regards,
--
Alexander Kukushkin
.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d60026d..13caeef 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,6 +2273,10 @@ InitWalSenderSlot(void)
walsnd->apply
maybe even
(pg_wal => streaming => restore_command).
I am not sure about the last option, but in any case. before going to
some remote place, postgres should try to find (and try to replay) the
WAL file in the pg_wal.
Regards,
--
Alexander Kukushkin
similarly to
the remove_target_file() if the errno == ENOENT
What do you think?
Regards,
--
Alexander Kukushkin
> idea for me, I do not see anything criminal in skipping non-existing
> file, when executing a file map by pg_rewind.
Good, I will prepare a patch then.
Regards,
--
Alexander Kukushkin
pg_wal just
works.
At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
an inconsistent state with empty pg_control file, this is totally bad
and easily fixable. We want the specific file to be absent and it is
already absent, why should it be a fatal error and not warning?
Regards,
--
Alexander Kukushkin
fter that I've re-analyze those columns.
Regards,
Alexander Kukushkin
wait forever.
At the same time, if you do immediate or smart shutdown, it works fine.
The problem is in the pmdie function. Proposed fix attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..9b36941a20
quot;.done". Postgres
will not call archive_command for files which already marked as
".done".
I think most of the good backup tools already doing that. For example,
pgBackRest, wal-e, wal-g (just named the tools I was working with)/
Regards,
--
Alexander Kukushkin
f the fix.
Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..2215ebbb5a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2685,7 +2685,7 @@ pmdie(SIGNAL_ARG
a happened to crash in such conditions it reaches a
consistency much earlier than it should!
Regards,
--
Alexander Kukushkin
799 in PostmasterMain (argc=17,
argv=0x5646d2e53390) at postmaster.c:1329
#14 0x5646d08d2880 in main (argc=17, argv=0x5646d2e53390) at main.c:228
Regards,
--
Alexander Kukushkin
ay all WAL files it managed to write to
disk before the first crash.
Regards,
--
Alexander Kukushkin
ntroduces
replication_reservd_connections GUC
Regards,
--
Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..80e6ef9f67 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3059,6 +3059,32 @@ include_dir 'conf.d'
omes after all the slots are filled to grab an
> available "normal" slot, it works as the same as the current
> behavior when walsender_reserved_connectsions = 0.
>
> What do you think about this?
Sounds reasonable, please see the updated patch.
Regards,
--
Alexander Kukushk
Hi Michael,
Now that v17 is released and before v18 feature freeze we have a few
months, I hope you will find some time to look at it.
On Wed, 5 Jun 2024 at 07:09, Michael Paquier wrote:
> On Tue, Jun 04, 2024 at 04:16:43PM +0200, Alexander Kukushkin wrote:
> > Now that beta1 was r
e commit message is
> sound?
>
> Thanks
>
> --
> Álvaro Herrera 48°01'N 7°57'E —
> https://www.EnterpriseDB.com/
> "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
>
Regards,
--
Alexander Kukushkin
tgres" (OID 5) @ TID (0,4)
REASSIGN OWNED
It was discovered on 16.6, however the master shows the same behaviour.
I suspect it to be due to aac2c9b4fde889d13f859c233c2523345e72d32b.
Regards,
--
Alexander Kukushkin
27;exit(1)''
>
> The test still succeeds for some reason.
>
Oh, nice catch.
The attached patch should address it.
--
Regards,
--
Alexander Kukushkin
diff --git a/src/bin/pg_rewind/t/010_keep_recycled_wals.pl b/src/bin/pg_rewind/t/010_keep_recycled_wals.pl
index e6dfce2a54.
AL_BLOCK_SIZE
{
my $self = shift;
self->{_WAL_BLOCK_SIZE} = self->get_int_setting('wal_block_size')
unless defined self->{_WAL_BLOCK_SIZE};
return self->{_WAL_BLOCK_SIZE};
}
sub start_of_page
{
my ($self, $lsn) = @_;
return $lsn & ~($self->WAL_BLOCK_SIZE - 1);
}
Regards,
--
Alexander Kukushkin
unctions, pg_last_wal_write_lsn() and
pg_last_wal_flush_lsn()?
These names are more aligned with column names in pg_stat_replication view
and speak for themselves.
And, we can keep pg_last_wal_receive_lsn() as an alias of
pg_last_wal_flush_lsn() for backward compatibility.
--
Regards,
--
Alexander Kukushkin
LSN, because standby may be replaying WALs from the archive
using restore_command and as a result only replay LSN is progressing.
That is, they are supposed to be doing something like max(write_lsn,
replay_lsn).
--
Regards,
--
Alexander Kukushkin
atever LSN client sends.
I tend to agree with Amit, we shouldn't allow confirmed_flush_lsn to move
backwards.
--
Regards,
--
Alexander Kukushkin
n this case doesn't feel right, because these tables are still
different and therefore queries.
Regards,
--
Alexander Kukushkin
so for physical walsenders
it might result in increased resource usage on primary without providing
much additional value. For example, restore_command is failing, but standby
indefinitely continues making replication connection attempts.
I don't mind if it will also work for physical replica
iling, what is cheap.
Calling restore_command is more expensive and therefore the impact on
resource usage will be bigger.
Regards,
--
Alexander Kukushkin
logical replication it is a different story, and it would be
really great if restore_command is used when WAL's are missing to fetch it.
Regards,
--
Alexander Kukushkin
| 2
shared_blks_hit| 4425
That is, two different queries, accessing two absolutely different tables
(one of them has 10 times more rows!) were merged together.
Regards,
--
Alexander Kukushkin
72 matches
Mail list logo