Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 31, 2012 at 6:50 PM, Fujii Masao wrote: > On Wed, Aug 1, 2012 at 12:09 AM, Alvaro Herrera > wrote: >> >> Excerpts from Fujii Masao's message of mar jul 17 13:58:38 -0400 2012: >> >>> >> You're right. If the error is detected, that function always returns >>> >> false >>> >> and the error message is emitted (but I think that current error message >>> >> "pg_basebackup: child process exited with error 1" is confusing), >>> >> so it's OK. But if walsender in the server is terminated by SIGTERM, >>> >> no error is detected and pg_basebackup background process gets out >>> >> of the loop in ReceiveXlogStream() and returns true. >>> > >>> > Oh. Because the server does a graceful shutdown. D'uh, of course. >>> > >>> > Then yes, your suggested fix seems like a good one. >>> >>> Attached patch adds the fix. >>> >>> Also I found I had forgotten to set the file descriptor to -1 at the end of >>> ReceiveXlogStream(), in previously-committed my patch. Attached patch >>> fixes this problem. >> >> This hasn't been committed yet AFAICT, and it probably needs a refresh >> now after my changes to pg_basebackup. Please update the patch. > > I attached the updated version. Thanks, applied. >> Also, >> if this is not in the Open Items list, please put it there so that we >> don't forget it before the 9.2 release. > > Yep, done. And I'll go take it off :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 31, 2012 at 5:06 PM, Alvaro Herrera wrote: > > Excerpts from Magnus Hagander's message of jue jul 12 07:35:11 -0400 2012: >> On Tue, Jul 10, 2012 at 6:45 PM, Fujii Masao wrote: > >> > When an error happens after replication connection has been established, >> > pg_receivexlog doesn't close an open file descriptor and release an >> > allocated >> > memory area. This was harmless before >> > 16282ae688de2b320cf176e9be8a89e4dfc60698 >> > because pg_receivexlog exits immediately when an error happens. But >> > currently in an error case, pg_receivexlog tries reconnecting to the server >> > infinitely, so file descriptors and memory would leak. I think this is >> > problem >> > and should be fixed. The patch which I submitted yesterday changes >> > pg_receivexlog so that it closes the open file and frees the memory area >> > before reconnecting to the server. >> >> Thanks. I get it now, and this explains why I didn't see it before - I >> didn't check properly after we added the loop mode. Patch applied with >> minor changes (e.g. there's no point in doing PQfinish(tmpconn) right >> after you've verified tmpconn is NULL) > > For some reason, Magnus neglected to backpatch this to 9.2, so I just > did. Thanks. I believe that was just an oversight. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Wed, Aug 1, 2012 at 12:09 AM, Alvaro Herrera wrote: > > Excerpts from Fujii Masao's message of mar jul 17 13:58:38 -0400 2012: > >> >> You're right. If the error is detected, that function always returns false >> >> and the error message is emitted (but I think that current error message >> >> "pg_basebackup: child process exited with error 1" is confusing), >> >> so it's OK. But if walsender in the server is terminated by SIGTERM, >> >> no error is detected and pg_basebackup background process gets out >> >> of the loop in ReceiveXlogStream() and returns true. >> > >> > Oh. Because the server does a graceful shutdown. D'uh, of course. >> > >> > Then yes, your suggested fix seems like a good one. >> >> Attached patch adds the fix. >> >> Also I found I had forgotten to set the file descriptor to -1 at the end of >> ReceiveXlogStream(), in previously-committed my patch. Attached patch >> fixes this problem. > > This hasn't been committed yet AFAICT, and it probably needs a refresh > now after my changes to pg_basebackup. Please update the patch. I attached the updated version. > Also, > if this is not in the Open Items list, please put it there so that we > don't forget it before the 9.2 release. Yep, done. Regards, -- Fujii Masao pgreceivexlog_check_stoppoint_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
Excerpts from Fujii Masao's message of mar jul 17 13:58:38 -0400 2012: > >> You're right. If the error is detected, that function always returns false > >> and the error message is emitted (but I think that current error message > >> "pg_basebackup: child process exited with error 1" is confusing), > >> so it's OK. But if walsender in the server is terminated by SIGTERM, > >> no error is detected and pg_basebackup background process gets out > >> of the loop in ReceiveXlogStream() and returns true. > > > > Oh. Because the server does a graceful shutdown. D'uh, of course. > > > > Then yes, your suggested fix seems like a good one. > > Attached patch adds the fix. > > Also I found I had forgotten to set the file descriptor to -1 at the end of > ReceiveXlogStream(), in previously-committed my patch. Attached patch > fixes this problem. This hasn't been committed yet AFAICT, and it probably needs a refresh now after my changes to pg_basebackup. Please update the patch. Also, if this is not in the Open Items list, please put it there so that we don't forget it before the 9.2 release. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
Excerpts from Magnus Hagander's message of jue jul 12 07:35:11 -0400 2012: > On Tue, Jul 10, 2012 at 6:45 PM, Fujii Masao wrote: > > When an error happens after replication connection has been established, > > pg_receivexlog doesn't close an open file descriptor and release an > > allocated > > memory area. This was harmless before > > 16282ae688de2b320cf176e9be8a89e4dfc60698 > > because pg_receivexlog exits immediately when an error happens. But > > currently in an error case, pg_receivexlog tries reconnecting to the server > > infinitely, so file descriptors and memory would leak. I think this is > > problem > > and should be fixed. The patch which I submitted yesterday changes > > pg_receivexlog so that it closes the open file and frees the memory area > > before reconnecting to the server. > > Thanks. I get it now, and this explains why I didn't see it before - I > didn't check properly after we added the loop mode. Patch applied with > minor changes (e.g. there's no point in doing PQfinish(tmpconn) right > after you've verified tmpconn is NULL) For some reason, Magnus neglected to backpatch this to 9.2, so I just did. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Fri, Jul 13, 2012 at 1:15 AM, Magnus Hagander wrote: > On Thu, Jul 12, 2012 at 6:07 PM, Fujii Masao wrote: >> On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander wrote: >>> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao wrote: On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao wrote: > Hi, > > I found several problems in pg_receivexlog, e.g., memory leaks, > file-descripter leaks, ..etc. The attached patch fixes these problems. > > ISTM there are still some other problems in pg_receivexlog, so I'll > read it deeply later. While pg_basebackup background process is streaming WAL records, if its replication connection is terminated (e.g., walsender in the server is accidentally terminated by SIGTERM signal), pg_basebackup ends up failing to include all required WAL files in the backup. The problem is that, in this case, pg_basebackup doesn't emit any error message at all. So an user might misunderstand that a base backup has been successfully taken even though it doesn't include all required WAL files. >>> >>> Ouch. That is definitely a bug if it behaves that way. >>> >>> To fix this problem, I think that, when the replication connection is terminated, ReceiveXlogStream() should check whether we've already reached the stop point by calling stream_stop() before returning TRUE. If we've not yet (this means that we've not received all required WAL files yet), ReceiveXlogStream() should return FALSE and pg_basebackup should emit an error message. Comments? >>> >>> Doesn't it already return false because it detects the error of the >>> connection? What's the codepath where we end up returning true even >>> though we had a connection failure? Shouldn't that end up under the >>> "could not read copy data" branch, which already returns false? >> >> You're right. If the error is detected, that function always returns false >> and the error message is emitted (but I think that current error message >> "pg_basebackup: child process exited with error 1" is confusing), >> so it's OK. But if walsender in the server is terminated by SIGTERM, >> no error is detected and pg_basebackup background process gets out >> of the loop in ReceiveXlogStream() and returns true. > > Oh. Because the server does a graceful shutdown. D'uh, of course. > > Then yes, your suggested fix seems like a good one. Attached patch adds the fix. Also I found I had forgotten to set the file descriptor to -1 at the end of ReceiveXlogStream(), in previously-committed my patch. Attached patch fixes this problem. Regards, -- Fujii Masao pgreceivexlog_check_stoppoint_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Thu, Jul 12, 2012 at 6:07 PM, Fujii Masao wrote: > On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander wrote: >> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao wrote: >>> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao wrote: Hi, I found several problems in pg_receivexlog, e.g., memory leaks, file-descripter leaks, ..etc. The attached patch fixes these problems. ISTM there are still some other problems in pg_receivexlog, so I'll read it deeply later. >>> >>> While pg_basebackup background process is streaming WAL records, >>> if its replication connection is terminated (e.g., walsender in the server >>> is accidentally terminated by SIGTERM signal), pg_basebackup ends >>> up failing to include all required WAL files in the backup. The problem >>> is that, in this case, pg_basebackup doesn't emit any error message at all. >>> So an user might misunderstand that a base backup has been successfully >>> taken even though it doesn't include all required WAL files. >> >> Ouch. That is definitely a bug if it behaves that way. >> >> >>> To fix this problem, I think that, when the replication connection is >>> terminated, ReceiveXlogStream() should check whether we've already >>> reached the stop point by calling stream_stop() before returning TRUE. >>> If we've not yet (this means that we've not received all required WAL >>> files yet), ReceiveXlogStream() should return FALSE and >>> pg_basebackup should emit an error message. Comments? >> >> Doesn't it already return false because it detects the error of the >> connection? What's the codepath where we end up returning true even >> though we had a connection failure? Shouldn't that end up under the >> "could not read copy data" branch, which already returns false? > > You're right. If the error is detected, that function always returns false > and the error message is emitted (but I think that current error message > "pg_basebackup: child process exited with error 1" is confusing), > so it's OK. But if walsender in the server is terminated by SIGTERM, > no error is detected and pg_basebackup background process gets out > of the loop in ReceiveXlogStream() and returns true. Oh. Because the server does a graceful shutdown. D'uh, of course. Then yes, your suggested fix seems like a good one. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander wrote: > On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao wrote: >> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao wrote: >>> Hi, >>> >>> I found several problems in pg_receivexlog, e.g., memory leaks, >>> file-descripter leaks, ..etc. The attached patch fixes these problems. >>> >>> ISTM there are still some other problems in pg_receivexlog, so I'll >>> read it deeply later. >> >> While pg_basebackup background process is streaming WAL records, >> if its replication connection is terminated (e.g., walsender in the server >> is accidentally terminated by SIGTERM signal), pg_basebackup ends >> up failing to include all required WAL files in the backup. The problem >> is that, in this case, pg_basebackup doesn't emit any error message at all. >> So an user might misunderstand that a base backup has been successfully >> taken even though it doesn't include all required WAL files. > > Ouch. That is definitely a bug if it behaves that way. > > >> To fix this problem, I think that, when the replication connection is >> terminated, ReceiveXlogStream() should check whether we've already >> reached the stop point by calling stream_stop() before returning TRUE. >> If we've not yet (this means that we've not received all required WAL >> files yet), ReceiveXlogStream() should return FALSE and >> pg_basebackup should emit an error message. Comments? > > Doesn't it already return false because it detects the error of the > connection? What's the codepath where we end up returning true even > though we had a connection failure? Shouldn't that end up under the > "could not read copy data" branch, which already returns false? You're right. If the error is detected, that function always returns false and the error message is emitted (but I think that current error message "pg_basebackup: child process exited with error 1" is confusing), so it's OK. But if walsender in the server is terminated by SIGTERM, no error is detected and pg_basebackup background process gets out of the loop in ReceiveXlogStream() and returns true. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao wrote: > On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao wrote: >> Hi, >> >> I found several problems in pg_receivexlog, e.g., memory leaks, >> file-descripter leaks, ..etc. The attached patch fixes these problems. >> >> ISTM there are still some other problems in pg_receivexlog, so I'll >> read it deeply later. > > While pg_basebackup background process is streaming WAL records, > if its replication connection is terminated (e.g., walsender in the server > is accidentally terminated by SIGTERM signal), pg_basebackup ends > up failing to include all required WAL files in the backup. The problem > is that, in this case, pg_basebackup doesn't emit any error message at all. > So an user might misunderstand that a base backup has been successfully > taken even though it doesn't include all required WAL files. Ouch. That is definitely a bug if it behaves that way. > To fix this problem, I think that, when the replication connection is > terminated, ReceiveXlogStream() should check whether we've already > reached the stop point by calling stream_stop() before returning TRUE. > If we've not yet (this means that we've not received all required WAL > files yet), ReceiveXlogStream() should return FALSE and > pg_basebackup should emit an error message. Comments? Doesn't it already return false because it detects the error of the connection? What's the codepath where we end up returning true even though we had a connection failure? Shouldn't that end up under the "could not read copy data" branch, which already returns false? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 10, 2012 at 6:45 PM, Fujii Masao wrote: > On Tue, Jul 10, 2012 at 6:27 AM, Magnus Hagander wrote: >> On Mon, Jul 9, 2012 at 8:23 PM, Fujii Masao wrote: >>> Hi, >>> >>> I found several problems in pg_receivexlog, e.g., memory leaks, >>> file-descripter leaks, ..etc. The attached patch fixes these problems. >> >> While I don't doubt that what you've found are real problems, would >> you mind explaining exactly what they are, so we don't have to >> reverse-engineer the patch to figure that out? > > Yep. > > When an error happens after replication connection has been established, > pg_receivexlog doesn't close an open file descriptor and release an allocated > memory area. This was harmless before 16282ae688de2b320cf176e9be8a89e4dfc60698 > because pg_receivexlog exits immediately when an error happens. But > currently in an error case, pg_receivexlog tries reconnecting to the server > infinitely, so file descriptors and memory would leak. I think this is problem > and should be fixed. The patch which I submitted yesterday changes > pg_receivexlog so that it closes the open file and frees the memory area > before reconnecting to the server. Thanks. I get it now, and this explains why I didn't see it before - I didn't check properly after we added the loop mode. Patch applied with minor changes (e.g. there's no point in doing PQfinish(tmpconn) right after you've verified tmpconn is NULL) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao wrote: > Hi, > > I found several problems in pg_receivexlog, e.g., memory leaks, > file-descripter leaks, ..etc. The attached patch fixes these problems. > > ISTM there are still some other problems in pg_receivexlog, so I'll > read it deeply later. While pg_basebackup background process is streaming WAL records, if its replication connection is terminated (e.g., walsender in the server is accidentally terminated by SIGTERM signal), pg_basebackup ends up failing to include all required WAL files in the backup. The problem is that, in this case, pg_basebackup doesn't emit any error message at all. So an user might misunderstand that a base backup has been successfully taken even though it doesn't include all required WAL files. To fix this problem, I think that, when the replication connection is terminated, ReceiveXlogStream() should check whether we've already reached the stop point by calling stream_stop() before returning TRUE. If we've not yet (this means that we've not received all required WAL files yet), ReceiveXlogStream() should return FALSE and pg_basebackup should emit an error message. Comments? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Tue, Jul 10, 2012 at 6:27 AM, Magnus Hagander wrote: > On Mon, Jul 9, 2012 at 8:23 PM, Fujii Masao wrote: >> Hi, >> >> I found several problems in pg_receivexlog, e.g., memory leaks, >> file-descripter leaks, ..etc. The attached patch fixes these problems. > > While I don't doubt that what you've found are real problems, would > you mind explaining exactly what they are, so we don't have to > reverse-engineer the patch to figure that out? Yep. When an error happens after replication connection has been established, pg_receivexlog doesn't close an open file descriptor and release an allocated memory area. This was harmless before 16282ae688de2b320cf176e9be8a89e4dfc60698 because pg_receivexlog exits immediately when an error happens. But currently in an error case, pg_receivexlog tries reconnecting to the server infinitely, so file descriptors and memory would leak. I think this is problem and should be fixed. The patch which I submitted yesterday changes pg_receivexlog so that it closes the open file and frees the memory area before reconnecting to the server. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] several problems in pg_receivexlog
On Mon, Jul 9, 2012 at 8:23 PM, Fujii Masao wrote: > Hi, > > I found several problems in pg_receivexlog, e.g., memory leaks, > file-descripter leaks, ..etc. The attached patch fixes these problems. While I don't doubt that what you've found are real problems, would you mind explaining exactly what they are, so we don't have to reverse-engineer the patch to figure that out? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] several problems in pg_receivexlog
Hi, I found several problems in pg_receivexlog, e.g., memory leaks, file-descripter leaks, ..etc. The attached patch fixes these problems. ISTM there are still some other problems in pg_receivexlog, so I'll read it deeply later. Regards, -- Fujii Masao fix_pgreceivexlog_problems_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers