Hi,

On Thursday, 10.03.2022 at 15:28 +0100, Gunnar "Nick" Bluth wrote:
> On 10.03.22 at 14:43, Michael Banck wrote:
> > some minor comments, I didn't look at the added test and I did not
> > test the patch at all, but (as part of the Debian/Ubuntu packaging
> > team) I think this patch is really important:
>
> Much appreciated!
>
> [...]
>
> Fixed.
>
> [...]

Thanks for the updated patch.

The patch applies, make is ok, make check is ok, pg_rewind TAP tests are ok.

I did some tests now with Patroni 2.1.3 and the attached patch applied. The following test was made:

1. Deploy 3-node (pg1, pg2, pg3) patroni cluster on Debian unstable running postgresql-15 (approx. master)
2. Switch on archive_mode, and set archive_command and restore_command
3. Switchover so that pg1 is the primary (if not already)
4. Kill pg1 hard
5. Wait till a new leader has taken over and the (now 2-node) cluster is healthy again
6. Restart pg1

without this patch:

|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,084 INFO: running pg_rewind from pg3
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,102 INFO: running pg_rewind from dbname=postgres user=postgres_rewind host=10.0.3.227 port=5432 target_session_attrs=read-write
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,135 INFO: pg_rewind exit code=1
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,136 INFO: stdout=
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,136 INFO: stderr=postgres: could not access the server configuration file "/var/lib/postgresql/15/main/postgresql.conf": No such file or directory
|Apr 03 19:09:01 pg1 patroni@15-main[99]: no data was returned by command "/usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main -C restore_command"
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,149 ERROR: Failed to rewind from healty master: pg3
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,149 WARNING: remove_data_directory_on_diverged_timelines is set. removing...
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,149 INFO: Removing data directory: /var/lib/postgresql/15/main
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,245 INFO: Lock owner: pg3; I am pg1
|Apr 03 19:09:01 pg1 patroni@15-main[99]: 2022-04-03 19:09:01,248 INFO: trying to bootstrap from leader 'pg3'

So pg_rewind fails and Patroni does a re-bootstrap.

with this patch:

|Apr 03 19:12:35 pg1 patroni@15-main[99]: 2022-04-03 19:12:35,576 INFO: running pg_rewind from pg2
|Apr 03 19:12:35 pg1 patroni@15-main[99]: 2022-04-03 19:12:35,590 INFO: running pg_rewind from dbname=postgres user=postgres_rewind host=10.0.3.145 port=5432 target_session_attrs=read-write
|Apr 03 19:12:37 pg1 patroni@15-main[99]: 2022-04-03 19:12:37,147 INFO: pg_rewind exit code=0
|Apr 03 19:12:37 pg1 patroni@15-main[99]: 2022-04-03 19:12:37,148 INFO: stdout=
|Apr 03 19:12:37 pg1 patroni@15-main[99]: 2022-04-03 19:12:37,148 INFO: stderr=pg_rewind: servers diverged at WAL location 0/1D000180 on timeline 38
|Apr 03 19:12:37 pg1 patroni@15-main[99]: pg_rewind: rewinding from last common checkpoint at 0/1D000108 on timeline 38
|Apr 03 19:12:37 pg1 patroni@15-main[99]: pg_rewind: Done!
|Apr 03 19:12:37 pg1 patroni@15-main[99]: 2022-04-03 19:12:37,151 WARNING: Postgresql is not running.
|Apr 03 19:12:37 pg1 patroni@15-main[99]: 2022-04-03 19:12:37,151 INFO: Lock owner: pg2; I am pg1

Here, pg_rewind is called and works fine.

So I think this works as intended, and I'm marking it Ready for Committer.


Michael

-- 
Michael Banck
Team Lead, PostgreSQL Team
Project Manager
Tel.: +49 2166 9901-171
E-Mail: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz

--- /usr/lib/python3/dist-packages/patroni/postgresql/__init__.py	2022-02-18 13:16:15.000000000 +0000
+++ __init__.py	2022-04-03 19:17:29.952665383 +0000
@@ -798,7 +798,8 @@
         return True
 
     def get_guc_value(self, name):
-        cmd = [self.pgcommand('postgres'), '-D', self._data_dir, '-C', name]
+        cmd = [self.pgcommand('postgres'), '-D', self._data_dir, '-C', name,
+               '--config-file={}'.format(self.config.postgresql_conf)]
         try:
             data = subprocess.check_output(cmd)
             if data:
--- /usr/lib/python3/dist-packages/patroni/postgresql/rewind.py	2022-02-18 13:16:15.000000000 +0000
+++ rewind.py	2022-04-03 19:21:14.479726127 +0000
@@ -314,6 +314,7 @@
         cmd = [self._postgresql.pgcommand('pg_rewind')]
         if self._postgresql.major_version >= 130000 and restore_command:
             cmd.append('--restore-target-wal')
+            cmd.append('--config-file={0}/postgresql.conf'.format(self._postgresql.config._config_dir))
         cmd.extend(['-D', self._postgresql.data_dir, '--source-server', dsn])
 
         while True:
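
For anyone who wants to see what the two Patroni changes amount to without reading Patroni's source, here is a minimal standalone sketch of the resulting command lines. It is not Patroni code: the function names `build_guc_query_cmd` and `build_pg_rewind_cmd` and their parameters are hypothetical, and the config path in the usage note is only an assumed example of a Debian-style layout where postgresql.conf lives outside the data directory. `--config-file` for pg_rewind is the option added by the patch under review; in this sketch it is only passed together with `--restore-target-wal`.

```python
from typing import List, Optional


def build_guc_query_cmd(postgres_bin: str, data_dir: str,
                        config_file: str, name: str) -> List[str]:
    """Build a `postgres -C <name>` call that reads one setting.

    Passing --config-file lets postgres find postgresql.conf when it is
    not located inside the data directory.
    """
    return [postgres_bin, '-D', data_dir, '-C', name,
            '--config-file={}'.format(config_file)]


def build_pg_rewind_cmd(pg_rewind_bin: str, data_dir: str, dsn: str,
                        major_version: int,
                        restore_command: Optional[str],
                        config_file: Optional[str] = None) -> List[str]:
    """Build a pg_rewind call similar in shape to the patched rewind.py.

    --restore-target-wal needs PostgreSQL 13 or later and a configured
    restore_command. --config-file is the option from the patch under
    review, so it is only added when a path was supplied.
    """
    cmd = [pg_rewind_bin]
    if major_version >= 130000 and restore_command:
        cmd.append('--restore-target-wal')
        if config_file:
            cmd.append('--config-file={}'.format(config_file))
    cmd.extend(['-D', data_dir, '--source-server', dsn])
    return cmd
```

With the data directory from the logs above and an assumed config path of /etc/postgresql/15/main/postgresql.conf, `build_pg_rewind_cmd` puts `--restore-target-wal` and `--config-file=...` ahead of `-D` and `--source-server`. For a server older than 13, or with no restore_command set, both options are left out.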