Re: Replication failure, slave requesting old segments

2018-08-13 Thread Phil Endecott

Adrian Klaver wrote:
"If you set up a WAL archive that's accessible from the standby, these 
solutions are not required, since the standby can always use the archive 
to catch up provided it retains enough segments. *This is dependent on 
verification that the archiving is working properly. A belt and 
suspenders approach would be to set wal_keep_segments to a value > 0 in 
the event archiving is not properly functioning*"

"


Adrian, I believe the suggestion that my issue was the result of
my archiving process not working is incorrect.

The quote above does not address the requirement for wal_keep_segments
to be >= 1 even when archiving is functioning correctly.

I will continue to monitor this thread in the hope that others will
confirm my understanding, but otherwise I will bow out now (and file a
bug).


Thanks to all.

Phil.






Re: Replication failure, slave requesting old segments

2018-08-13 Thread Adrian Klaver

On 08/13/2018 05:08 AM, Phil Endecott wrote:

Adrian Klaver wrote:

On 08/12/2018 02:56 PM, Phil Endecott wrote:
Anyway.  Do others agree that my issue was the result of 
wal_keep_segments=0 ?


Only as a sub-issue of the slave losing contact with the master. The 
basic problem is maintaining two separate operations, archiving and 
streaming, in sync. If either or some combination of both lose 
synchronization then it is anyone's guess on what is appropriate for 
wal_keep_segments.


Really?  I thought the intention was that the system should be
able to recover reliably when the slave reconnects after a
period of downtime, subject only to there being sufficient
network/CPU/disk bandwidth etc. for it to eventually catch up.


See also my reply to Stephen earlier. Basically you are trying to 
coordinate two different operations. They start from the same source, 
pg_xlog (pg_wal in 10+), but arrive on different time scales and from 
different locations. Without sufficient sanity checks it is possible 
they diverge enough on one or both paths to render the process unstable.




If that's not true, I think the docs need an even more extensive
overhaul!  Suggestion for the paragraph that I quoted before from
26.2.5:

"If you set up a WAL archive that's accessible from the standby,
it's anyone's guess what is appropriate for wal_keep_segments."


I would say that:

"If you set up a WAL archive that's accessible from the standby, these 
solutions are not required, since the standby can always use the archive 
to catch up provided it retains enough segments."


should be more like:

"If you set up a WAL archive that's accessible from the standby, these 
solutions are not required, since the standby can always use the archive 
to catch up provided it retains enough segments. *This is dependent on 
verification that the archiving is working properly. A belt and 
suspenders approach would be to set wal_keep_segments to a value > 0 in 
the event archiving is not properly functioning*"
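
As an illustration of the kind of verification being suggested here, a
minimal sketch of a check that could run from cron or a monitoring system
(the view and its columns exist in 9.4+; the psql invocation and the
alerting rule are assumptions, not anything from this thread):

  # Flag archiver failures, or the case where nothing has ever been archived.
  # Note failed_count is cumulative since the last stats reset, so a real
  # check would compare it against the previously seen value.
  psql -At -U postgres -c \
    "SELECT failed_count, last_failed_wal, last_archived_wal, last_archived_time
       FROM pg_stat_archiver;" \
    | awk -F'|' '$1 > 0 || $3 == "" { print "WAL archiving problem: " $0 }'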

"




Regards, Phil.








--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 08/13/2018 05:39 AM, Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>Adrian Klaver wrote:
> >>>On 08/12/2018 02:56 PM, Phil Endecott wrote:
> Anyway.  Do others agree that my issue was the result of
> wal_keep_segments=0 ?
> >>>
> >>>Only as a sub-issue of the slave losing contact with the master. The basic
> >>>problem is maintaining two separate operations, archiving and streaming,
> >>>in sync. If either or some combination of both lose synchronization then
> >>>it is anyone's guess on what is appropriate for wal_keep_segments.
> >
> >Uh, no, having an archive_command and a restore_command configured
> >correctly should remove the need to worry about what wal_keep_segments is
> >set to, because anything not on the primary really should be available
> >through what's been archived and PG shouldn't have any trouble figuring
> >that out and working with it.
> >
> >If all you've got is streaming replication then, sure, you have no idea
> >what to set wal_keep_segments to because the replica could be offline
> >for an indeterminate amount of time, but as long as you're keeping track
> >of all the WAL through archive_command, that shouldn't be an issue.
> 
> Therein lies the rub. As I stated previously the bigger issue is syncing two
> different operations, archiving and streaming. 

That's not correct though, there isn't a big issue regarding syncing of
those two operations.

> The OP got caught short
> assuming the archiving would handle the situation where the streaming was
> down for a period. In his particular setup and for this particular situation
> a wal_keep_segments of 1 would have helped. I do not see this as a default
> value though as it depends on too many variables outside the reach of the
> database, most notably the success of the archive command. 

What's been pointed out here is that even if everything is working
(archive_command, restore_command, etc) there's a possible gap in the
transition from replaying from the archive to starting streaming replication
where the replica might not be able to start streaming.

That's an entirely *PostgreSQL* issue, as far as I see it, and hasn't
got anything to do with his particular setup except that he managed to
expose the issue.

> First, is the
> command even valid? Second, is the network link reliable? Third, is there even
> a network link, and is there more than one? Fourth, is the restore command
> valid? That is just off the top of my head; more caffeine and I could come up
> with more. Saying that having archiving, streaming and a wal_keep_segments=1
> has you covered is misleading. I don't see it as detrimental to performance
> but I do see more posts down the road from folks who are surprised when it
> does not cover their case. Personally I think it better to be up front that
> this requires more thought or a third-party solution that has done the
> thinking.

This is all down to "and you should also monitor to make sure things
continue working" which I certainly agree with but that doesn't mean we
shouldn't fix this issue.

This entire side-discussion feels like it's really off in the weeds.
The next steps which I outlined a while ago seem to still be entirely
appropriate and we should figure out a way to solve this issue so that,
when everything else is working (archive command, restore command,
replica is able to connect to the primary, etc), PG behaves sanely
and is able to catch up with and connect to the primary and resume
streaming.

Thanks!

Stephen




Re: Replication failure, slave requesting old segments

2018-08-13 Thread Adrian Klaver

On 08/13/2018 05:39 AM, Stephen Frost wrote:

Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:

Adrian Klaver wrote:

On 08/12/2018 02:56 PM, Phil Endecott wrote:

Anyway.  Do others agree that my issue was the result of
wal_keep_segments=0 ?


Only as a sub-issue of the slave losing contact with the master. The basic
problem is maintaining two separate operations, archiving and streaming,
in sync. If either or some combination of both lose synchronization then
it is anyone's guess on what is appropriate for wal_keep_segments.


Uh, no, having an archive_command and a restore_command configured
correctly should remove the need to worry about what wal_keep_segments is
set to, because anything not on the primary really should be available
through what's been archived and PG shouldn't have any trouble figuring
that out and working with it.

If all you've got is streaming replication then, sure, you have no idea
what to set wal_keep_segments to because the replica could be offline
for an indeterminate amount of time, but as long as you're keeping track
of all the WAL through archive_command, that shouldn't be an issue.


Therein lies the rub. As I stated previously the bigger issue is syncing 
two different operations, archiving and streaming. The OP got caught 
short assuming the archiving would handle the situation where the 
streaming was down for a period. In his particular setup and for this 
particular situation a wal_keep_segments of 1 would have helped. I do 
not see this as a default value though as it depends on too many 
variables outside the reach of the database, most notably the success 
of the archive command. First, is the command even valid? Second, is the 
network link reliable? Third, is there even a network link, and is there 
more than one? Fourth, is the restore command valid? That is just off the 
top of my head; more caffeine and I could come up with more. Saying that 
having archiving, streaming and a wal_keep_segments=1 has you covered is 
misleading. I don't see it as detrimental to performance but I do see more 
posts down the road from folks who are surprised when it does not cover 
their case. Personally I think it better to be up front that this requires 
more thought or a third-party solution that has done the thinking.





Really?  I thought the intention was that the system should be
able to recover reliably when the slave reconnects after a
period of downtime, subject only to there being sufficient
network/CPU/disk bandwidth etc. for it to eventually catch up.


Yes, that's correct, the replica should always be able to catch back up
presuming there's no gaps in the WAL between when the replica failed and
where the primary is at.

Thanks!

Stephen




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Adrian Klaver wrote:
> >On 08/12/2018 02:56 PM, Phil Endecott wrote:
> >>Anyway.  Do others agree that my issue was the result of
> >>wal_keep_segments=0 ?
> >
> >Only as a sub-issue of the slave losing contact with the master. The basic
> >problem is maintaining two separate operations, archiving and streaming,
> >in sync. If either or some combination of both lose synchronization then
> >it is anyone's guess on what is appropriate for wal_keep_segments.

Uh, no, having an archive_command and a restore_command configured
correctly should remove the need to worry about what wal_keep_segments is
set to, because anything not on the primary really should be available
through what's been archived and PG shouldn't have any trouble figuring
that out and working with it.

If all you've got is streaming replication then, sure, you have no idea
what to set wal_keep_segments to because the replica could be offline
for an indeterminate amount of time, but as long as you're keeping track
of all the WAL through archive_command, that shouldn't be an issue.

> Really?  I thought the intention was that the system should be
> able to recover reliably when the slave reconnects after a
> period of downtime, subject only to there being sufficient
> network/CPU/disk bandwidth etc. for it to eventually catch up.

Yes, that's correct, the replica should always be able to catch back up
presuming there's no gaps in the WAL between when the replica failed and
where the primary is at.

Thanks!

Stephen




Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Adrian Klaver wrote:
> >On 08/12/2018 03:54 PM, Stephen Frost wrote:
> >>Greetings,
> >>
> >>* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>>OK.  I think this is perhaps a documentation bug, maybe a missing
> >>>warning when the master reads its configuration, and maybe (as you say)
> >>>a bad default value.
> >>
> >>If we consider it to be an issue worthy of a change then we should
> >>probably just change the default value, and maybe not even allow it to
> >>be set lower than '1'.
> >
> >I would say leave the default at 0 as it leaves no doubt that you are
> >performing without a net. A setting of '1' implies you are covered and for
> >a fast moving cluster or slow moving one with sufficient downtime that
> >would not be the case.
> 
> Can you explain how it can fail in the case of a "slow moving cluster with
> sufficient downtime"?

I'm guessing 'sufficient downtime' here is, basically, 'offline until
the next checkpoint', which isn't actually all *that* much time.

> It seems to me that if I have correctly understood what happened in this
> case then 0, the default, really cannot ever work properly when you have
> enabled WAL archiving plus streaming.

Well, it's not like it'd work without WAL archiving either, though
that's perhaps more obviously true.

Thanks!

Stephen




Re: Replication failure, slave requesting old segments

2018-08-13 Thread Phil Endecott

Adrian Klaver wrote:

On 08/12/2018 03:54 PM, Stephen Frost wrote:

Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:

OK.  I think this is perhaps a documentation bug, maybe a missing
warning when the master reads its configuration, and maybe (as you say)
a bad default value.


If we consider it to be an issue worthy of a change then we should
probably just change the default value, and maybe not even allow it to
be set lower than '1'.



I would say leave the default at 0 as it leaves no doubt that you are 
performing without a net. A setting of '1' implies you are covered and 
for a fast moving cluster or slow moving one with sufficient downtime 
that would not be the case.


Can you explain how it can fail in the case of a "slow moving cluster with
sufficient downtime"?

It seems to me that if I have correctly understood what happened in this
case then 0, the default, really cannot ever work properly when you have
enabled WAL archiving plus streaming.

Better to let the end user know this is not 
a simple problem and some thought needs to go into configuration.


I certainly agree that this is "not a simple problem", having read something
like 18,000 words of documentation multiple times and, apparently, still
got it wrong in multiple ways.


Regards, Phil.







Re: Replication failure, slave requesting old segments

2018-08-13 Thread Phil Endecott

Adrian Klaver wrote:

On 08/12/2018 02:56 PM, Phil Endecott wrote:
Anyway.  Do others agree that my issue was the result of 
wal_keep_segments=0 ?


Only as a sub-issue of the slave losing contact with the master. The 
basic problem is maintaining two separate operations, archiving and 
streaming, in sync. If either or some combination of both lose 
synchronization then it is anyone's guess on what is appropriate for 
wal_keep_segments.


Really?  I thought the intention was that the system should be
able to recover reliably when the slave reconnects after a
period of downtime, subject only to there being sufficient
network/CPU/disk bandwidth etc. for it to eventually catch up.

If that's not true, I think the docs need an even more extensive
overhaul!  Suggestion for the paragraph that I quoted before from
26.2.5:

"If you set up a WAL archive that's accessible from the standby,
it's anyone's guess what is appropriate for wal_keep_segments."


Regards, Phil.







Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 08/12/2018 03:54 PM, Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>OK.  I think this is perhaps a documentation bug, maybe a missing
> >>warning when the master reads its configuration, and maybe (as you say)
> >>a bad default value.
> >
> >If we consider it to be an issue worthy of a change then we should
> >probably just change the default value, and maybe not even allow it to
> >be set lower than '1'.
> 
> I would say leave the default at 0 as it leaves no doubt that you are
> performing without a net. A setting of '1' implies you are covered and for a
> fast moving cluster or slow moving one with sufficient downtime that would
> not be the case. Better to let the end user know this is not a simple
> problem and some thought needs to go into configuration.

Uh, this specific case is where there *is* a 'safety net' though-
archive command and restore command were configured and being used, so I
don't buy off on this argument at all.

Maybe we just internally bump wal_keep_segments to '1' to avoid this
specific risk without actually changing the default or making people
change their existing configurations, but if this is really what's
happening then I don't think the answer is "don't do anything."

Thanks!

Stephen




Re: Replication failure, slave requesting old segments

2018-08-12 Thread Adrian Klaver

On 08/12/2018 03:54 PM, Stephen Frost wrote:

Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:

OK.  I think this is perhaps a documentation bug, maybe a missing
warning when the master reads its configuration, and maybe (as you say)
a bad default value.


If we consider it to be an issue worthy of a change then we should
probably just change the default value, and maybe not even allow it to
be set lower than '1'.



I would say leave the default at 0 as it leaves no doubt that you are 
performing without a net. A setting of '1' implies you are covered and 
for a fast moving cluster or slow moving one with sufficient downtime 
that would not be the case. Better to let the end user know this is not 
a simple problem and some thought needs to go into configuration.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-12 Thread Adrian Klaver

On 08/12/2018 02:56 PM, Phil Endecott wrote:

Stephen Frost wrote:




Specifically, section 26.2.5 of the docs says:

"If you use streaming replication without file-based continuous archiving,
the server might recycle old WAL segments before the standby has received
them. If this occurs, the standby will need to be reinitialized from a new
base backup. You can avoid this by setting wal_keep_segments to a value
large enough to ensure that WAL segments are not recycled too early, or by
configuring a replication slot for the standby. If you set up a WAL archive
that's accessible from the standby, these solutions are not required, since
the standby can always use the archive to catch up provided it retains 
enough

segments."

OR, maybe the WAL reader that process the files that restore_command 
fetches

could be smart enough to realise that it can skip over the gap at the end?

Anyway.  Do others agree that my issue was the result of 
wal_keep_segments=0 ?


Only as a sub-issue of the slave losing contact with the master. The 
basic problem is maintaining two separate operations, archiving and 
streaming, in sync. If either or some combination of both lose 
synchronization then it is anyone's guess on what is appropriate for 
wal_keep_segments.






Regards, Phil.









--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-12 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> OK.  I think this is perhaps a documentation bug, maybe a missing 
> warning when the master reads its configuration, and maybe (as you say) 
> a bad default value.

If we consider it to be an issue worthy of a change then we should
probably just change the default value, and maybe not even allow it to
be set lower than '1'.

> Specifically, section 26.2.5 of the docs says:
> 
> "If you use streaming replication without file-based continuous archiving, 
> the server might recycle old WAL segments before the standby has received 
> them. If this occurs, the standby will need to be reinitialized from a new 
> base backup. You can avoid this by setting wal_keep_segments to a value 
> large enough to ensure that WAL segments are not recycled too early, or by 
> configuring a replication slot for the standby. If you set up a WAL archive 
> that's accessible from the standby, these solutions are not required, since 
> the standby can always use the archive to catch up provided it retains enough 
> segments."
> 
> OR, maybe the WAL reader that processes the files that restore_command fetches 
> could be smart enough to realise that it can skip over the gap at the end?

That strikes me as a whole lot more complication in something we'd
rather not introduce additional complications into without very good
reason.  Then again, there was just a nearby discussion about how it'd
be nice if the backend could realize when a WAL file is complete, and I
do think that'll be more of an issue when users start configuring larger
WAL files, so perhaps we should figure out a way to handle this..

> Anyway.  Do others agree that my issue was the result of wal_keep_segments=0 ?

Yeah, I've not spent much time actually looking at code around this, so
it'd be nice to get:

a) a reproducible case demonstrating it's happening
b) testing to see how long it's been this way
c) if setting wal_keep_segments=1 fixes it
d) perhaps some thought around if we could address this some other way

Thanks!

Stephen




Re: Replication failure, slave requesting old segments

2018-08-12 Thread Phil Endecott

Stephen Frost wrote:

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:

Stephen Frost wrote:
>* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
>>2018-08-11 00:12:15.536 UTC [7954] LOG:  restored log file "0000000100000007000000D0" from archive
>>2018-08-11 00:12:15.797 UTC [7954] LOG:  redo starts at 7/D0F956C0
>>2018-08-11 00:12:16.068 UTC [7954] LOG:  consistent recovery state reached at 7/D0FFF088

Are the last two log lines above telling us anything useful?  Is that
saying that, of the 16 MB (0x1000000 byte) WAL file it restored only as
far as byte 0xf956c0 or 0xfff088?  Is that what we would expect?  Is it
trying to use the streaming connection to get the missing bytes from
FFF088 to FFFFFF?  Is that just an empty gap at the end of the file
due to the next record being too big to fit?


The short answer is that, yes, the next record was likely too big to
fit, but that's what the replica was trying to figure out and couldn't
because D0 was gone from the primary already.  One thing which should
help this would be to use physical replication slots on the primary,
which would keep it from throwing away WAL files until it knows the
replica has them, but that runs the risk of ending up with lots of extra
WAL on the primary if the replica is gone for a while.  If you'd prefer
to avoid that then having wal_keep_segments set to '1' would avoid this
particular issue as well, I'd expect.

I do wonder if perhaps we should just default to having it as '1' to
avoid exactly this case, as it seems like perhaps PG archived D0 and
then flipped to D1 and got rid of D0, which is all pretty reasonable,
except that a replica trying to catch up is going to end up asking for
D0 from the primary because it didn't know if there was anything else
that should have been in D0..
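
As an illustration of the two alternatives Stephen describes, a minimal
sketch (the slot name "standby1" is an assumption, as is putting the
commands in shell form):

  # Option 1: a physical replication slot, created once on the master; the
  # master then keeps WAL until this slave has confirmed receiving it.
  psql -U postgres -c "SELECT pg_create_physical_replication_slot('standby1');"
  # ...and on the slave, in recovery.conf (9.6), alongside primary_conninfo:
  #   primary_slot_name = 'standby1'

  # Option 2: keep the archive-based setup, but have the master retain at
  # least one extra segment, in postgresql.conf:
  #   wal_keep_segments = 1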


OK.  I think this is perhaps a documentation bug, maybe a missing
warning when the master reads its configuration, and maybe (as you say)
a bad default value.

Specifically, section 26.2.5 of the docs says:

"If you use streaming replication without file-based continuous archiving,
the server might recycle old WAL segments before the standby has received
them. If this occurs, the standby will need to be reinitialized from a new
base backup. You can avoid this by setting wal_keep_segments to a value
large enough to ensure that WAL segments are not recycled too early, or by
configuring a replication slot for the standby. If you set up a WAL archive
that's accessible from the standby, these solutions are not required, since
the standby can always use the archive to catch up provided it retains enough
segments."

OR, maybe the WAL reader that processes the files that restore_command fetches
could be smart enough to realise that it can skip over the gap at the end?

Anyway.  Do others agree that my issue was the result of 
wal_keep_segments=0 ?



Regards, Phil.







Re: Replication failure, slave requesting old segments

2018-08-12 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
> >>   scp %p backup:backup/postgresql/archivedir/%f'
> >
> >This is really not a sufficient or particularly intelligent
> >archive_command.  In general, I'd strongly recommend against trying to
> >roll your own archiving/backup solution for PostgreSQL.  In particular,
> >the above is quite expensive, doesn't address the case where a file is
> >only partially copied to the backup server, and doesn't ensure that the
> >file is completely written and fsync'd on the backup server meaning that
> >a failure will likely result in WAL going missing.  There's much better
> >solutions out there and which you really should be using instead of
> >trying to build your own.  In particular, my preference is pgbackrest
> >(though I'm one of the folks in its development, to be clear), but
> >there's other options such as barman or WAL-E which I believe also
> >address the concerns raised above.
> 
> Hmmm, well I based it on what I read in the documentation:
> 
>   https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
> 
> The expense is not an issue in this case.  I would be surprised if
> scp were to exit(0) after only partially copying a file but perhaps
> you know something I don't about that.

The scp might not, but so what?  If a partial file remains on the far
side, then a subsequent invocation of that archive_command will come
back saying that the file is there.  Further, the file isn't going to
actually be sync'd to the filesystem on the far side, so even if the scp
returns success, a failure on the backup server could result in loss of
the WAL file.

> I know that rsync creates a temporary file and moves it into place when
> all the data has been transferred, which scp doesn't do; perhaps that
> would be safer.

No, what would be better is using a tool which will actually fsync() the
file on the backup server after writing it out there, making sure it's
been stored to the filesystem, and then to perform an atomic mv
operation to put it in place, and then to fsync() the directory to
ensure that change has also been sync'd to disk.

Even better is to calculate a hash of the file and to reverify that on
restore, to make sure that it doesn't end up getting corrupted on the
backup server somehow.
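
For illustration only, a rough sketch of that kind of archive-side handling
(temp file, fsync, atomic rename, checksum), assuming the script runs on the
backup host, that GNU coreutils' sync accepts file arguments (8.24+), and
hypothetical paths; a maintained tool such as pgbackrest, barman or WAL-E
remains the better answer:

  #!/bin/sh
  # store_wal.sh <incoming-wal-file> <archive-dir>  -- hypothetical helper
  set -eu
  src="$1"; dir="$2"; name=$(basename "$src")
  # Refuse to overwrite a segment that has already been archived.
  [ -e "$dir/$name" ] && { echo "$name already archived" >&2; exit 1; }
  tmp="$dir/.$name.tmp"
  cp "$src" "$tmp"
  sha256sum "$tmp" | awk '{print $1}' > "$tmp.sha256"
  sync "$tmp" "$tmp.sha256"         # flush file data to disk
  mv "$tmp" "$dir/$name"            # atomic rename within the same filesystem
  mv "$tmp.sha256" "$dir/$name.sha256"
  sync "$dir"                       # persist the directory entries

On restore, the stored checksum can be re-verified (sha256sum -c) before the
segment is handed to restore_command.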

> The master was only writing WAL files maybe 3 or 4 times per day at
> the time, so any scenario that requires its crash to have occurred
> exactly as it was doing an scp involves a substantial coincidence.

While I get that it's unlikely to be relevant to this particular case, I
wouldn't recommend depending on never having that coincidence happen.
What I've seen quite a bit of, just so you know, is that things tend to
all break down at about the same time, so right when something bad is
happening on the primary, the backup server runs into issues too.  This
might be due to networking breaking down in a particular area all at
once or shared storage having some issue but it's happened often enough
that relying on uncommon things to avoid each other is something I've
got a hard time doing these days.

> I am limited in what I can run on the backup server.

You could consider backing up locally using a tool which is built for
PostgreSQL and then sync'ing the results of that backup over to the
backup server, perhaps.  Or point out that you need a backup server
where you don't have those restrictions in order to properly have
backups.  Or use an external service like s3 (or a clone) to back up to.

> >Having it go back to D0 is at least a bit interesting.  I wonder if
> >somehow it was partial..  Have you looked at the files on the archive
> >server to make sure they're all 16MB and the same size?
> 
> Yes, all the files on the backup including D0 were 16 MB.  But:

Ok.  Did you see any failures on the sending side when archive_command
was being run?  PG shouldn't start trying to archive the file until it's
done with it, of course, so unless something failed right at the end of
the scp and left the file at 16MB but it wasn't actually filled..

> >>2018-08-11 00:12:15.536 UTC [7954] LOG:  restored log file "0000000100000007000000D0" from archive
> >>2018-08-11 00:12:15.797 UTC [7954] LOG:  redo starts at 7/D0F956C0
> >>2018-08-11 00:12:16.068 UTC [7954] LOG:  consistent recovery state reached at 7/D0FFF088
> 
> Are the last two log lines above telling us anything useful?  Is that
> saying that, of the 16 MB (0x1000000 byte) WAL file it restored only as
> far as byte 0xf956c0 or 0xfff088?  Is that what we would expect?  Is it
> trying to use the streaming connection to get the missing bytes from
> FFF088 to FFFFFF?  Is that just an empty gap at the end of the file
> due to the next record being too big to fit?

The short answer is that, yes, the next 

Re: Replication failure, slave requesting old segments

2018-08-12 Thread Adrian Klaver

On 08/12/2018 12:53 PM, Phil Endecott wrote:

Phil Endecott wrote:

On the master, I have:

wal_level = replica
archive_mode = on
archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
scp %p backup:backup/postgresql/archivedir/%f'

On the slave I have:

standby_mode = 'on'
primary_conninfo = 'user=postgres host=master port=5432'
restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'

hot_standby = on



2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has already been removed



I am wondering if I need to set wal_keep_segments to at least 1 or 2 for
this to work.  I currently have it unset and I believe the default is 0.


Given that WALs are only 16 MB I would probably bump it up to be on the 
safe side, or use:


https://www.postgresql.org/docs/9.6/static/warm-standby.html

26.2.6. Replication Slots

Though the above does not limit storage of WALs, so a long outage could 
result in WALs piling up.
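
For what it's worth, a rough sketch of keeping an eye on that growth once a
slot exists (function and column names as in 9.6; running it via psql like
this is just an assumption):

  # Show how much WAL the master is retaining on behalf of each slot.
  psql -U postgres -c "
    SELECT slot_name, active, restart_lsn,
           pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                                restart_lsn)) AS retained_wal
      FROM pg_replication_slots;"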




My understanding was that when using archive_command/restore_command to copy
WAL segments it would not be necessary to use wal_keep_segments to retain
files in pg_xlog on the server; the slave can get everything using a
combination of copying files using the restore_command and streaming.
But these lines from the log:

2018-08-11 00:12:15.797 UTC [7954] LOG: redo starts at 7/D0F956C0
2018-08-11 00:12:16.068 UTC [7954] LOG: consistent recovery state reached at 
7/D0FFF088

make me think that there is an issue when the slave reaches the end of the
copied WAL file.  I speculate that the useful content of this WAL segment
ends at FFF088, which is followed by an empty gap due to record sizes.  But
the slave tries to start streaming from this point, D0FFF088, not D1000000.
If the master still had a copy of segment D0 then it would be able to stream
this gap followed by the real content in the current segment D1.

Does that make any sense at all?


Regards, Phil.







--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-12 Thread Adrian Klaver

On 08/12/2018 12:25 PM, Phil Endecott wrote:

Hi Adrian,

Adrian Klaver wrote:

On 08/11/2018 12:42 PM, Phil Endecott wrote:

Hi Adrian,

Adrian Klaver wrote:
Looks like the master recycled the WAL's while the slave could not 
connect.


Yes but... why is that a problem?  The master is copying the WALs to
the backup server using scp, where they remain forever.  The slave gets


To me it looks like that did not happen:

2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has already been removed


Above 0000000100000007000000D0 is gone/recycled on the master and the 
archived version does not seem to be complete as the streaming 
replication is trying to find it.


The files on the backup server were all 16 MB.


WAL files are created/recycled as 16 MB files, which is not the same as 
saying they are complete for the purposes of restoring. In other words 
you could be looking at a 16 MB file full of 0's.





Below you kick the master and it coughs up the files to the archive 
including *D0 and *D1 on up to *D4 and then the streaming picks up using *D5.


When I kicked it, the master wrote D1 to D4 to the backup.  It did not
change D0 (its modification time on the backup is from before the "kick").
The slave re-read D0, again, as it had been doing throughout this period,
and then read D1 to D4.


Well something happened because the slave could not get all the 
information it needed from the D0 in the archive and was trying to get 
it from the master's pg_xlog.






Best guess is the archiving did not work as expected during:

"(During this time the master was also down for a shorter period.)"


Around the time the master was down, the WAL segment names were CB and CC.
Files CD to CF were written between the master coming up and the slave
coming up.  The slave had no trouble restoring those segments when it 
started.
The problematic segments D0 and D1 were the ones that were "current" when the
slave restarted, at which time the master was up consistently.


Regards, Phil.








--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-12 Thread Phil Endecott
Phil Endecott wrote:
> On the master, I have:
>
> wal_level = replica
> archive_mode = on
> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>scp %p backup:backup/postgresql/archivedir/%f'
>
> On the slave I have:
>
> standby_mode = 'on'
> primary_conninfo = 'user=postgres host=master port=5432'
> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
>
> hot_standby = on

> 2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
> 2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from primary at 7/D0000000 on timeline 1
> 2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has already been removed


I am wondering if I need to set wal_keep_segments to at least 1 or 2 for 
this to work.  I currently have it unset and I believe the default is 0.

My understanding was that when using archive_command/restore_command to copy 
WAL segments it would not be necessary to use wal_keep_segments to retain 
files in pg_xlog on the server; the slave can get everything using a 
combination of copying files using the restore_command and streaming. 
But these lines from the log:

2018-08-11 00:12:15.797 UTC [7954] LOG: redo starts at 7/D0F956C0
2018-08-11 00:12:16.068 UTC [7954] LOG: consistent recovery state reached at 
7/D0FFF088

make me think that there is an issue when the slave reaches the end of the 
copied WAL file.  I speculate that the useful content of this WAL segment 
ends at FFF088, which is followed by an empty gap due to record sizes.  But 
the slave tries to start streaming from this point, D0FFF088, not D1000000. 
If the master still had a copy of segment D0 then it would be able to stream 
this gap followed by the real content in the current segment D1.

Does that make any sense at all?
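
One way to check that speculation would be to dump the archived segment and
see where its records actually stop. A minimal sketch, assuming pg_xlogdump
(shipped with 9.6) is available somewhere the archived file can be copied to,
and that the path matches the archive layout described earlier:

  # The last record's LSN shows where the real content of the segment ends
  # and the unused tail begins.
  scp backup:backup/postgresql/archivedir/0000000100000007000000D0 /tmp/
  pg_xlogdump /tmp/0000000100000007000000D0 | tail -n 3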


Regards, Phil.





Re: Replication failure, slave requesting old segments

2018-08-12 Thread Phil Endecott

Hi Adrian,

Adrian Klaver wrote:

On 08/11/2018 12:42 PM, Phil Endecott wrote:

Hi Adrian,

Adrian Klaver wrote:
Looks like the master recycled the WAL's while the slave could not 
connect.


Yes but... why is that a problem?  The master is copying the WALs to
the backup server using scp, where they remain forever.  The slave gets


To me it looks like that did not happen:

2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has already been removed


Above 0000000100000007000000D0 is gone/recycled on the master and the 
archived version does not seem to be complete as the streaming 
replication is trying to find it.


The files on the backup server were all 16 MB.


Below you kick the master and it coughs up the files to the archive 
including *D0 and *D1 on up to *D4 and then the streaming picks up using *D5.


When I kicked it, the master wrote D1 to D4 to the backup.  It did not
change D0 (its modification time on the backup is from before the "kick").
The slave re-read D0, again, as it had been doing throughout this period,
and then read D1 to D4.



Best guess is the archiving did not work as expected during:

"(During this time the master was also down for a shorter period.)"


Around the time the master was down, the WAL segment names were CB and CC.
Files CD to CF were written between the master coming up and the slave
coming up.  The slave had no trouble restoring those segments when it started.
The problematic segments D0 and D1 were the ones that were "current" when the
slave restarted, at which time the master was up consistently.


Regards, Phil.







Re: Replication failure, slave requesting old segments

2018-08-11 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Dear Experts,

Since you're asking ...

> I recently set up replication for the first time.  It seemed to be 
> working OK in my initial tests, but didn't cope when the slave was 
> down for a longer period.  This is all with the Debian stable 
> packages of PostgreSQL 9.6.  My replication setup involves a third 
> server, "backup", to and from which WAL files are copied using scp 
> (and essentially never deleted), plus streaming replication in 
> "hot standby" mode.
> 
> On the master, I have:
> 
> wal_level = replica
> archive_mode = on
> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>scp %p backup:backup/postgresql/archivedir/%f'

This is really not a sufficient or particularly intelligent
archive_command.  In general, I'd strongly recommend against trying to
roll your own archiving/backup solution for PostgreSQL.  In particular,
the above is quite expensive, doesn't address the case where a file is
only partially copied to the backup server, and doesn't ensure that the
file is completely written and fsync'd on the backup server meaning that
a failure will likely result in WAL going missing.  There's much better
solutions out there and which you really should be using instead of
trying to build your own.  In particular, my preference is pgbackrest
(though I'm one of the folks in its development, to be clear), but
there's other options such as barman or WAL-E which I believe also
address the concerns raised above.

> On the slave I have:
> 
> standby_mode = 'on'
> primary_conninfo = 'user=postgres host=master port=5432'
> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
> 
> hot_standby = on
> 
> 
> This all seemed to work OK until the slave was shut down for a few days.  
> (During this time the master was also down for a shorter period.) 
> When the slave came back up it successfully copied several WAL files 
> from the backup server until it reached one that hasn't been created yet:

That all sounds fine.

> 2018-08-10 22:37:38.322 UTC [615] LOG:  restored log file "0000000100000007000000CC" from archive
> 2018-08-10 22:38:02.011 UTC [615] LOG:  restored log file "0000000100000007000000CD" from archive
> 2018-08-10 22:38:29.802 UTC [615] LOG:  restored log file "0000000100000007000000CE" from archive
> 2018-08-10 22:38:55.973 UTC [615] LOG:  restored log file "0000000100000007000000CF" from archive
> 2018-08-10 22:39:24.964 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
> 
> At this point there was a temporary problem due to a firewall rule that 
> blocked the replication connection to the master:
> 
> 2018-08-10 22:41:35.865 UTC [1289] FATAL:  could not connect to the primary 
> server: SSL connection has been closed unexpectedly

Well, that's certainly not good.

> The slave then entered a loop, retrying to scp the next file from the backup 
> archive (and failing) and connecting to the master for streaming replication 
> (and also failing).  That is as expected, except for the "invalid magic 
> number":
> 
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
> 2018-08-10 22:41:37.018 UTC [615] LOG:  invalid magic number 0000 in log segment 0000000100000007000000D1, offset 0
> 2018-08-10 22:43:46.936 UTC [1445] FATAL:  could not connect to the primary 
> server: SSL connection has been closed unexpectedly

Note that the invalid magic number error is just LOG-level, and that
PostgreSQL will certainly ask for WAL files which don't exist in the
archive yet.

> Note that it's complaining about an invalid magic number in log segment D1,
> which is the one that has just failed to scp.  That looked suspicious to me
> and I remembered notes in the docs about ensuring that the archive and/or
> restore commands return an error exit status when they fail; I checked that
> and yes scp does exit(1) when the requested file doesn't exist:

Sure, but scp doesn't produce any output when it's successful, so are
you sure that an scp wasn't performed after the "no such file or
directory" log message and which perhaps performed a partial copy
(see above about *exactly* that concern being mentioned by me, even
before getting down to this part, due to using these simple unix
commands...)?

> $ scp backup:/egheriugherg /tmp/a
> scp: /egheriugherg: No such file or directory
> $ echo $?
> 1

scp has no idea if it made a copy of a partial file though.

> Anyway, the slave continued in its retrying loop as expected, except that
> each time it re-fetched the previous segment, D0; this seems a bit peculiar:
> 
> 2018-08-10 22:44:17.796 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or 
> 

Re: Replication failure, slave requesting old segments

2018-08-11 Thread Adrian Klaver

On 08/11/2018 12:42 PM, Phil Endecott wrote:

Hi Adrian,

Adrian Klaver wrote:
Looks like the master recycled the WAL's while the slave could not 
connect.


Yes but... why is that a problem?  The master is copying the WALs to
the backup server using scp, where they remain forever.  The slave gets


To me it looks like that did not happen:

2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file "0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has already been removed


Above 0000000100000007000000D0 is gone/recycled on the master and the 
archived version does not seem to be complete as the streaming 
replication is trying to find it.



Below you kick the master and it coughs up the files to the archive 
including *D0 and *D1 on up to *D4 and then the streaming picks up using *D5.


2018-08-11 00:55:49.741 UTC [7954] LOG:  restored log file "0000000100000007000000D0" from archive
2018-08-11 00:56:12.304 UTC [7954] LOG:  restored log file "0000000100000007000000D1" from archive
2018-08-11 00:56:35.481 UTC [7954] LOG:  restored log file "0000000100000007000000D2" from archive
2018-08-11 00:56:57.443 UTC [7954] LOG:  restored log file "0000000100000007000000D3" from archive
2018-08-11 00:57:21.723 UTC [7954] LOG:  restored log file "0000000100000007000000D4" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D5: No such file or directory
2018-08-11 00:57:22.915 UTC [7954] LOG:  unexpected pageaddr 7/C7000000 in log segment 0000000100000007000000D5, offset 0
2018-08-11 00:57:23.114 UTC [12348] LOG:  started streaming WAL from primary at 7/D5000000 on timeline 1



Best guess is the archiving did not work as expected during:

"(During this time the master was also down for a shorter period.)"


them from there before it starts streaming.  So it shouldn't matter
if the master recycles them, as the slave should be able to get everything
using the combination of scp and then streaming.

Am I missing something about how this sort of replication is supposed to
work?


Thanks, Phil.








--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Replication failure, slave requesting old segments

2018-08-11 Thread Phil Endecott

Hi Adrian,

Adrian Klaver wrote:

Looks like the master recycled the WAL's while the slave could not connect.


Yes but... why is that a problem?  The master is copying the WALs to
the backup server using scp, where they remain forever.  The slave gets
them from there before it starts streaming.  So it shouldn't matter
if the master recycles them, as the slave should be able to get everything
using the combination of scp and then streaming.

Am I missing something about how this sort of replication is supposed to
work?


Thanks, Phil.