On Thu, Jul 12, 2012 at 6:07 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander <mag...@hagander.net> wrote:
>> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>> Hi,
>>>> I found several problems in pg_receivexlog, e.g., memory leaks,
>>>> file-descriptor leaks, etc. The attached patch fixes these problems.
>>>> ISTM there are still some other problems in pg_receivexlog, so I'll
>>>> read through it more carefully later.
>>> While the pg_basebackup background process is streaming WAL records,
>>> if its replication connection is terminated (e.g., the walsender on the
>>> server is accidentally terminated by a SIGTERM signal), pg_basebackup ends
>>> up failing to include all required WAL files in the backup. The problem
>>> is that, in this case, pg_basebackup doesn't emit any error message at all,
>>> so a user might wrongly believe that the base backup was taken successfully
>>> even though it doesn't include all required WAL files.
>> Ouch. That is definitely a bug if it behaves that way.
>>> To fix this problem, I think that, when the replication connection is
>>> terminated, ReceiveXlogStream() should check whether we've already
>>> reached the stop point by calling stream_stop() before returning TRUE.
>>> If we haven't (which means we haven't yet received all required WAL
>>> files), ReceiveXlogStream() should return FALSE and pg_basebackup
>>> should emit an error message. Comments?
>> Doesn't it already return false because it detects the connection
>> error? What's the codepath where we end up returning true even
>> though we had a connection failure? Shouldn't that end up under the
>> "could not read copy data" branch, which already returns false?
> You're right. If the error is detected, that function always returns false
> and the error message is emitted (though I think the current error message
> "pg_basebackup: child process exited with error 1" is confusing),
> so it's OK. But if the walsender on the server is terminated by SIGTERM,
> no error is detected, and the pg_basebackup background process exits
> the loop in ReceiveXlogStream() and returns true.

Oh. Because the server does a graceful shutdown. D'uh, of course.

Then yes, your suggested fix seems like a good one.
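For the archives, a rough model of the control flow being proposed. This is not the actual pg_basebackup code: receive_stream_sketch(), fake_msg, msg_kind and reached_stop_point() are made-up stand-ins, and the callback type is not meant to match the real stream_stop signature. The only point it illustrates is that a clean end-of-copy from the server (as happens when the walsender exits on SIGTERM) should count as success only if the stop point has already been reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for pg_basebackup's stream_stop callback: returns
 * true once everything up to the required stop point has been received. */
typedef bool (*stop_check_fn)(uint64_t received_upto, void *arg);

/* Hypothetical, simplified result of one read from the replication link. */
typedef enum { MSG_DATA, MSG_END_OF_COPY, MSG_ERROR } msg_kind;

typedef struct {
    msg_kind kind;
    uint64_t endpos;   /* WAL position reached after a MSG_DATA message */
} fake_msg;

/* Model of the receive loop; returns true only if the stop point was reached. */
static bool receive_stream_sketch(const fake_msg *msgs, size_t n,
                                  stop_check_fn stop_cb, void *arg)
{
    uint64_t received = 0;

    assert(stop_cb != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (msgs[i].kind) {
        case MSG_DATA:
            received = msgs[i].endpos;
            if (stop_cb(received, arg))
                return true;        /* normal finish: all WAL received */
            break;
        case MSG_ERROR:
            /* Connection error: already reported and treated as failure. */
            fprintf(stderr, "could not read copy data\n");
            return false;
        case MSG_END_OF_COPY:
            /* Clean end of stream, e.g. walsender shut down on SIGTERM.
             * The proposed fix: only succeed if the stop point was reached. */
            if (stop_cb(received, arg))
                return true;
            fprintf(stderr, "replication stream ended before reaching stop point\n");
            return false;
        }
    }
    return false;                   /* input exhausted without a clean end */
}

/* Example callback: arg points at the required stop position. */
static bool reached_stop_point(uint64_t received_upto, void *arg)
{
    return received_upto >= *(const uint64_t *) arg;
}
```

With a stop position of 300, a stream that delivers data up to 200 and then a clean end-of-copy returns false (and prints an error) instead of true, while a stream that reaches 300 still returns true.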

 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)