On Mon, Jun 27, 2016 at 8:27 PM, Alex Malek <magicag...@gmail.com> wrote:

> Should I file a bug?
>

Oh, sorry. I did not spot that this was posted to pgsql-admin, which has a
lot fewer readers. I've moved it over to -hackers where more people will
see it.

(And also, please don't top-post on these lists, as it makes the discussion
much harder to follow)



>
> Does anyone know of a workaround?
>
> I could try running pg_basebackup w/ --xlog-method=fetch and see if that
> helps but I do like knowing that the base backup will have all needed WAL
> files regardless of WAL retention policy on master.
>
> Thanks.
> Alex
>
> Recap of what I *think* is going on:
>
> 1) pg_basebackup w/ --xlog-method=stream is started
> 2) archive_command is not executed while pg_basebackup() is running
>

There is no reason why archive_command should not run during a
pg_basebackup run. Do you get anything at all in the logfile indicating
that this would be the case? If you look at the ps output, does the
archiver process indicate what it's doing, and does it have a sub-process
for your rsync command?

Is there any chance that for example the network or other part of the
system gets saturated and the archive_command is simply runniung too slow
to keep up?


3) while pg_basebackup() is running pg_start_backup() is executed
> 4) Later pg_stop_backup() is executed.
> 5) pg_stop_backup() hangs related to the archive_command not running
> 6) hours later when pg_basebackup finishes, back logged archive commands
> get executed then pg_stop_backup() finishes
>

This would be consistent with (2), but there is no reason why (2) would
happen because of pg_basebackup unless there's an actual bug there.

//Magnus



> On Mon, Jun 20, 2016 at 10:15 AM, Alex Malek <magicag...@gmail.com> wrote:
>
>>
>> Thanks for the response Magnus.
>> I've replied inline below.
>>
>> On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <mag...@hagander.net>
>> wrote:
>>
>>> On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicag...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I am experiencing two problems
>>>> 1) pg_stop_backup hangs until pg_basebackup finishes
>>>> 2) WAL contains references to invalid pages
>>>>
>>>> (I suspect when 1. is fixed I won't experience 2. )
>>>>
>>>> Full description below:
>>>>
>>>>
>>>> postgresql: 9.3.13
>>>>
>>>> My situation is I create a writable report database every day using
>>>> pg_start_backup() / rsync / pg_stop_backup()
>>>> (whole process takes 1-2 hours, the rsync actually copies the data dir
>>>> of a  slave/warm spare DB)
>>>>
>>>
>>> Taking what is called an exclusive backup, which is what
>>> pg_start_backup() does, is not supported off a standby node. Only
>>> pg_basebackup is supported against a standby node.
>>>
>>> Can you reproduce your errors if you make this backup from the master?
>>>
>>
>>
>> pg_start_backup() is run on the master.
>> The rsync copies the data dir of the standby b/c it is local.
>> I have tried rsyncing from the master w/ the same results.
>>
>>
>>>
>>>
>>>
>>>> Also once a week I create a database for backup/archive purposes using
>>>> pg_basebackup
>>>> (whole process takes about 13 hours)
>>>>
>>>>
>>>> These two processes used to be able to coincide until recently.
>>>> Recent changes include a major debian upgrade and a minor version of
>>>> postgres upgrade from 9.3.10.
>>>>
>>>> Now when the report sync occurs during the pg_basebackup (w/
>>>> --xlog-method=stream option) the pg_stop_backup() hangs until
>>>> the the pg_basebackup has completed (4 hours later).
>>>>
>>>
>>> Do you also have an archive_command, and what is it? Normally,
>>> pg_stop_backup() blocks on the archive command - so perhaps you have ended
>>> up with a dependency between that one and the base backup command somehow?
>>>
>>>
>> The archive command is:
>>
>> archive_command = 'rsync -a %p postgres@archivedb
>> :/storage/postgres/9.3/main-archive/pg_xlog/%f'
>>
>> I thought the archive command was working properly while pg_stop_backup
>> was hanging based on the timestamps of the files, but upon closer
>> inspection using stat I see that the files actually were not copied over
>> until pg_stop_backup stopped hanging.
>>
>> Actually it appears that the archive_command is not executed the entire
>> time pg_basebackup is running.
>> pg_basebackup is started about 5 hours before pg_start_backup is executed.
>> Based on stat it looks like the wal files get written locally but do not
>> get rsynced until pg_basebackup is done.
>>
>>
>>
>>>
>>>
>>>> A labeled WAL backup file is created when the pg_stop_backup() is first
>>>> executed and another
>>>> is created when the pg_basebackup completes.
>>>>
>>>>
>>>>
>>>> While the pg_stop_backup() hangs the following appears in the logs:
>>>>
>>>> 2016-06-11 07:50:45 EDT: WARNING:  pg_stop_backup still waiting for all
>>>> required WAL segments to be archived (7680 seconds elapsed)
>>>> 2016-06-11 07:50:45 EDT: HINT:  Check that your archive_command is
>>>> executing properly.  pg_stop_backup can be canceled safely, but the
>>>> database backup will not be usable without all the WAL segments.
>>>>
>>>>
>>>> until eventually:
>>>>
>>>> 2016-06-11 12:59:07 EDT:LOG:  duration: 26190082.497 ms  statement:
>>>> SELECT pg_stop_backup()
>>>>
>>>> The archive command appears to work as WAL files are being archived
>>>> successfully with timestamps that occur while pg_stop_backup is hanging.
>>>>
>>>> -rw------- 1 postgres postgres 314 Jun 11 12:56
>>>> 000000030000109200000066.00000028.backup
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
>>>> <snip>
>>>> -rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
>>>> -rw------- 1 postgres postgres 335 Jun 11 05:42
>>>> 0000000300001092000000D4.00000028.backup
>>>> -rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0
>>>>
>>>> Content of 0000000300001092000000D4.00000028.backup:
>>>>
>>>> START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
>>>> STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
>>>> CHECKPOINT LOCATION: 1092/D4004E08
>>>> BACKUP METHOD: pg_start_backup
>>>> BACKUP FROM: master
>>>> START TIME: 2016-06-11 04:51:04 EDT
>>>> LABEL: reporting 2016-06-11 04:51:03.892804-04
>>>> STOP TIME: 2016-06-11 05:42:37 EDT
>>>>
>>>
>>> This file shows that the backup took about an hour, doesn't it? So it
>>> doesn't look like it was waiting?
>>>
>>> Or are you saying the wait happens *after* the .backup file has been
>>> dropped in the archives?
>>>
>>
>> The .backup file gets written locally in a timely manner but doesn't get
>> dropped into the archive until after the
>> pg_stop_backup() stops hanging.
>>
>> So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
>> causing the archive_command to pause which in turn
>> causes pg_stop_backup() to hang?
>>
>> Thanks.
>>
>> Alex
>>
>>
>

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Reply via email to