Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
> I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of a streamed WAL file with xlogdump, I found that recent WAL records that walsender has already sent don't exist in that WAL file. I expected pg_receivexlog to write the streamed WAL records to disk as soon as possible, but it doesn't. Is this intentional, or a bug? Am I missing something?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 09:29, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
>> I've applied this version with a few more minor changes that Heikki found.
>
> Cool!
>
> When I tried pg_receivexlog and checked the contents of streamed WAL file by
> xlogdump, I found that recent WAL records that walsender has already sent don't
> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
> records to the disk as soon as possible, but it doesn't. Is this intentional? Or bug?
> Am I missing something?

It writes it to disk as soon as possible, but doesn't fsync() until the end of each segment. Are you by any chance looking at the file while it's running?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander wrote:
> On Thu, Oct 27, 2011 at 09:29, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
>>> I've applied this version with a few more minor changes that Heikki found.
>>
>> Cool!
>>
>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>> xlogdump, I found that recent WAL records that walsender has already sent don't
>> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
>> records to the disk as soon as possible, but it doesn't. Is this intentional? Or bug?
>> Am I missing something?
>
> It writes it to disk as soon as possible, but doesn't fsync() until
> the end of each segment. Are you by any chance looking at the file
> while it's running?

No. I looked at that file after shutting down the master server.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 09:46, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander wrote:
>> On Thu, Oct 27, 2011 at 09:29, Fujii Masao wrote:
>>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
>>>> I've applied this version with a few more minor changes that Heikki found.
>>>
>>> Cool!
>>>
>>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>>> xlogdump, I found that recent WAL records that walsender has already sent don't
>>> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
>>> records to the disk as soon as possible, but it doesn't. Is this intentional? Or bug?
>>> Am I missing something?
>>
>> It writes it to disk as soon as possible, but doesn't fsync() until
>> the end of each segment. Are you by any chance looking at the file
>> while it's running?
>
> No. I looked at that file after shutting down the master server.

Ugh, in that case something is certainly wrong. There is nothing but setting up some offset values between PQgetCopyData() and write()...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander wrote:
> On Thu, Oct 27, 2011 at 09:46, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander wrote:
>>> On Thu, Oct 27, 2011 at 09:29, Fujii Masao wrote:
>>>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
>>>>> I've applied this version with a few more minor changes that Heikki found.
>>>>
>>>> Cool!
>>>>
>>>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>>>> xlogdump, I found that recent WAL records that walsender has already sent don't
>>>> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
>>>> records to the disk as soon as possible, but it doesn't. Is this intentional? Or bug?
>>>> Am I missing something?
>>>
>>> It writes it to disk as soon as possible, but doesn't fsync() until
>>> the end of each segment. Are you by any chance looking at the file
>>> while it's running?
>>
>> No. I looked at that file after shutting down the master server.
>
> Ugh, in that case something is certainly wrong. There is nothing but
> setting up some offset values between PQgetCopyData() and write()...

When the end of the copy stream is found or an error happens, pg_receivexlog exits without flushing outstanding WAL records, which seems to cause the problem I reported.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On 27.10.2011 09:57, Heikki Linnakangas wrote:
>> My suggestion is to fix the CLOG problem in that same way that you fixed
>> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
>> CheckPointGuts(). Here's what I imagine CreateCheckPoint() should look like:
>>
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>>
>> It's then no longer necessary for LogStandbySnapshot() to modify the
>> nextXid, since we fill out nextXid after LogStandbySnapshot() and will
>> thus derive a higher value than LogStandbySnapshot() would have.
>
> Hmm, I don't think that fully fixes the problem. Even if you're certain
> that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
> decide to truncate it away again while the checkpoint is running.

Oh, scratch that. During recovery, we merrily treat missing slru files as if they were filled with zeros.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 10:12, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander wrote:
>> On Thu, Oct 27, 2011 at 09:46, Fujii Masao wrote:
>>> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander wrote:
>>>> On Thu, Oct 27, 2011 at 09:29, Fujii Masao wrote:
>>>>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander wrote:
>>>>>> I've applied this version with a few more minor changes that Heikki found.
>>>>>
>>>>> Cool!
>>>>>
>>>>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>>>>> xlogdump, I found that recent WAL records that walsender has already sent don't
>>>>> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
>>>>> records to the disk as soon as possible, but it doesn't. Is this intentional? Or bug?
>>>>> Am I missing something?
>>>>
>>>> It writes it to disk as soon as possible, but doesn't fsync() until
>>>> the end of each segment. Are you by any chance looking at the file
>>>> while it's running?
>>>
>>> No. I looked at that file after shutting down the master server.
>>
>> Ugh, in that case something is certainly wrong. There is nothing but
>> setting up some offset values between PQgetCopyData() and write()...
>
> When end-of-copy stream is found or an error happens, pg_receivexlog
> exits without flushing outstanding WAL records. Which seems to cause
> the problem I reported.

Not sure I follow. When we arrive at PQgetCopyData() there should be nothing buffered, and if the end of stream happens there it returns -1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is that what you are referring to? But the data should already have been write()d, so it should still be there, no?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander wrote:
> Not sure I follow. When we arrive at PQgetCopyData() there should be
> nothing buffered, and if the end of stream happens there it returns
> -1, and we exit, no? So where is the data that's lost?
>
> I do realize we don't actually fsync() and close() in this case - is
> that what you are referring to? But the data should already have been
> write()d, so it should still be there, no?

Oh, right. Hmm.. xlogdump might be the cause.

Though I've not read the code of xlogdump, I wonder if it gives up outputting the contents of a WAL file when it finds a partial WAL page... It strikes me that the recovery code has the same problem. No? IOW, when a partial WAL page is found during recovery, I'm afraid that page would not be replayed even though it contains valid data.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote:
>>> This fixes both the subtrans and clog bugs in one patch.
>>
>> I don't see the point of changing StartupCLOG() to be an empty
>> function and adding a new function TrimCLOG() that does everything
>> StartupCLOG() used to do.
>
> +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then we'd need a big comment explaining why CLOG starts up at one point and subtrans starts up at another point, since that is a very confusing way of doing things. I wrote it that way first and it definitely looks strange.

It's much easier to understand that StartupCLOG() is actually a no-op and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better way of expressing what needs to be done, so I think the patch should stay that way.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On 27.10.2011 02:29, Florian Pflug wrote:
> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how:
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetTransactionId().
>    The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot() advances the
>    checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
>    extension makes it to the disk, and before any trace of the XID assigned
>    in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

No, clog extension is WAL-logged while holding the XidGenLock. At step 3, LogStandbySnapshot() would block until the clog-extension record is written to WAL, so crash recovery would see and replay that record before calling StartupCLOG(). That can happen during hot standby, though, because StartupCLOG() is called earlier.

> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts(). Here's what I imagine CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify the nextXid,
> since we fill out nextXid after LogStandbySnapshot() and will thus derive a
> higher value than LogStandbySnapshot() would have.

Hmm, I don't think that fully fixes the problem. Even if you're certain that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might decide to truncate it away again while the checkpoint is running.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander wrote:
>> Not sure I follow. When we arrive at PQgetCopyData() there should be
>> nothing buffered, and if the end of stream happens there it returns
>> -1, and we exit, no? So where is the data that's lost?
>>
>> I do realize we don't actually fsync() and close() in this case - is
>> that what you are referring to? But the data should already have been
>> write()d, so it should still be there, no?
>
> Oh, right. Hmm.. xlogdump might be the cause.
>
> Though I've not read the code of xlogdump, I wonder if it gives up
> outputting the contents of WAL file when it finds a partial WAL page...
> This strikes me that recovery code has the same problem. No?
> IOW, when a partial WAL page is found during recovery, I'm afraid
> that page would not be replayed though it contains valid data.

My concern was right. When I performed a recovery using the streamed WAL files, loss of data happened: a partial WAL page was not replayed.

To avoid this problem, I think that we should change pg_receivexlog so that it writes WAL data *by the block*, or creates, like walreceiver, the WAL file before writing any data. Otherwise, though pg_receivexlog streams WAL data in realtime, the latest WAL data might not be available for recovery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 12:29, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander wrote:
>>> Not sure I follow. When we arrive at PQgetCopyData() there should be
>>> nothing buffered, and if the end of stream happens there it returns
>>> -1, and we exit, no? So where is the data that's lost?
>>>
>>> I do realize we don't actually fsync() and close() in this case - is
>>> that what you are referring to? But the data should already have been
>>> write()d, so it should still be there, no?
>>
>> Oh, right. Hmm.. xlogdump might be the cause.
>>
>> Though I've not read the code of xlogdump, I wonder if it gives up
>> outputting the contents of WAL file when it finds a partial WAL page...
>> This strikes me that recovery code has the same problem. No?
>> IOW, when a partial WAL page is found during recovery, I'm afraid
>> that page would not be replayed though it contains valid data.
>
> My concern was right. When I performed a recovery using the streamed
> WAL files, the loss of data happened. A partial WAL page was not replayed.
>
> To avoid this problem, I think that we should change pg_receivexlog so
> that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
> before writing any data. Otherwise, though pg_receivexlog streams WAL
> data in realtime, the latest WAL data might not be available for recovery.

Ah, so you were recovering data from the last, partial, file? Not from a completed file?

I'm rewriting the handling of partial files per the other thread started by Heikki. The idea is that there will be an actual .partial file in there when pg_receivexlog has ended, and you have to deal with it manually. The typical way would be to pad it with zeroes to the end. Doing such padding would solve this recovery issue, correct?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander wrote:
> On Thu, Oct 27, 2011 at 12:29, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao wrote:
>>> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander wrote:
>>>> Not sure I follow. When we arrive at PQgetCopyData() there should be
>>>> nothing buffered, and if the end of stream happens there it returns
>>>> -1, and we exit, no? So where is the data that's lost?
>>>>
>>>> I do realize we don't actually fsync() and close() in this case - is
>>>> that what you are referring to? But the data should already have been
>>>> write()d, so it should still be there, no?
>>>
>>> Oh, right. Hmm.. xlogdump might be the cause.
>>>
>>> Though I've not read the code of xlogdump, I wonder if it gives up
>>> outputting the contents of WAL file when it finds a partial WAL page...
>>> This strikes me that recovery code has the same problem. No?
>>> IOW, when a partial WAL page is found during recovery, I'm afraid
>>> that page would not be replayed though it contains valid data.
>>
>> My concern was right. When I performed a recovery using the streamed
>> WAL files, the loss of data happened. A partial WAL page was not replayed.
>>
>> To avoid this problem, I think that we should change pg_receivexlog so
>> that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
>> before writing any data. Otherwise, though pg_receivexlog streams WAL
>> data in realtime, the latest WAL data might not be available for recovery.
>
> Ah, so you were recovering data from the last, partial, file? Not from
> a completed file?

Yes. I copied all streamed WAL files to the pg_xlog directory and started recovery.

> I'm rewriting the handling of partial files per the other thread
> started by Heikki. The idea is that there will be an actual .partial
> file in there when pg_receivexlog has ended, and you have to deal with
> it manually. The typical way would be to pad it with zeroes to the
> end. Doing such padding would solve this recovery issue, correct?

Yes. But that sounds user-unfriendly. Is padding the WAL file manually easy for a user to do?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Updated version of pg_receivexlog
On 27.10.2011 14:09, Fujii Masao wrote:
> On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander wrote:
>> I'm rewriting the handling of partial files per the other thread
>> started by Heikki. The idea is that there will be an actual .partial
>> file in there when pg_receivexlog has ended, and you have to deal with
>> it manually. The typical way would be to pad it with zeroes to the
>> end. Doing such padding would solve this recovery issue, correct?
>
> Yes. But that sounds unuserfriendly. Padding the WAL file manually
> is easy-to-do for a user?

"truncate -s 16M <file>" works at least on my Linux system. Not sure how you'd do it on Windows.

Perhaps we should add automatic padding in the server, though. It wouldn't take much code in the server, and would make life easier for people writing their scripts. Maybe even have the backend check for a .partial file if it can't find a regularly named XLOG file. Needs some thought..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas wrote:
> On 27.10.2011 14:09, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander wrote:
>>> I'm rewriting the handling of partial files per the other thread
>>> started by Heikki. The idea is that there will be an actual .partial
>>> file in there when pg_receivexlog has ended, and you have to deal with
>>> it manually. The typical way would be to pad it with zeroes to the
>>> end. Doing such padding would solve this recovery issue, correct?
>>
>> Yes. But that sounds unuserfriendly. Padding the WAL file manually
>> is easy-to-do for a user?
>
> "truncate -s 16M <file>" works at least on my Linux system. Not sure how
> you'd do it on Windows.

Yeah, that's easy enough. I don't think there are similar tools on Windows, but we could probably put together a quick script for people to use if necessary.

> Perhaps we should add automatic padding in the server, though. It wouldn't
> take much code in the server, and would make life easier for people writing
> their scripts. Maybe even have the backend check for a .partial file if it
> can't find a regularly named XLOG file. Needs some thought..

I'd definitely want to avoid anything that requires pg_receivexlog to actually *parse* the WAL. That'll make it way more complex than I'd like.

Having recovery consider a .partial file might be interesting. It could consider that only if there are no other complete files available, or something like that?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane wrote:
>> Robert Haas writes:
>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote:
>>>> This fixes both the subtrans and clog bugs in one patch.
>>>
>>> I don't see the point of changing StartupCLOG() to be an empty
>>> function and adding a new function TrimCLOG() that does everything
>>> StartupCLOG() used to do.
>>
>> +1 ... I found that overly cute also.
>
> It would have been even easier to move StartupCLOG() later, but then
> we'd need a big comment explaining why CLOG starts up at one point and
> subtrans starts up at another point, since that is very confusing way
> of doing things. I wrote it that way first and it definitely looks
> strange.
>
> It's much easier to understand that StartupCLOG() is actually a no-op
> and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all? I know we have a bunch of places in the code where we have empty stubs where there used to be initialization or cleanup code, but I've never found that particularly good style. If something no longer requires initialization in a certain place, I think we should nuke the whole function.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 7:19 AM, Heikki Linnakangas wrote:
> On 27.10.2011 14:09, Fujii Masao wrote:
>> On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander wrote:
>>> I'm rewriting the handling of partial files per the other thread
>>> started by Heikki. The idea is that there will be an actual .partial
>>> file in there when pg_receivexlog has ended, and you have to deal with
>>> it manually. The typical way would be to pad it with zeroes to the
>>> end. Doing such padding would solve this recovery issue, correct?
>>
>> Yes. But that sounds unuserfriendly. Padding the WAL file manually
>> is easy-to-do for a user?
>
> "truncate -s 16M <file>" works at least on my Linux system. Not sure how
> you'd do it on Windows.

One of the common complaints I hear about PostgreSQL is that our replication system is more difficult to set up than people would like, and that it's too easy to make mistakes that can corrupt your data without realizing it; I don't think making people need to truncate a file to 16 megabytes is going to improve things there.

> Perhaps we should add automatic padding in the server, though. It wouldn't
> take much code in the server, and would make life easier for people writing
> their scripts. Maybe even have the backend check for a .partial file if it
> can't find a regularly named XLOG file. Needs some thought..

+1 for figuring out something, though I'm not sure exactly what.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
> On 27.10.2011 02:29, Florian Pflug wrote:
>> Per my theory about the cause of the problem in my other mail, I think you
>> might see StartupCLOG failures even during crash recovery, provided that
>> wal_level was set to hot_standby when the primary crashed. Here's how:
>>
>> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
>> 2) A backend does AssignTransactionId, and gets as far as GetTransactionId().
>>    The assigned XID requires CLOG extension.
>> 3) The checkpoint continues, and LogStandbySnapshot() advances the
>>    checkpoint's nextXid to the XID assigned in (2).
>> 4) We crash after writing the checkpoint record, but before the CLOG
>>    extension makes it to the disk, and before any trace of the XID assigned
>>    in (2) makes it to the xlog.
>>
>> Then StartupCLOG() would fail at the end of recovery, because we'd end up
>> with a nextXid whose corresponding CLOG page doesn't exist.
>
> No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
> LogStandbySnapshot() would block until the clog-extension record is written
> to WAL, so crash recovery would see and replay that record before calling
> StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the checkpoint's nextXid, and even more wrong to do that only if wal_level = hot_standby. Plus, I think it's a smart idea to verify that the required parts of the CLOG are available at the start of recovery, because if they're missing, the data on the standby *will* be corrupted.

Is there any argument against doing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid is computed)?

best regards,
Florian Pflug
Re: [HACKERS] isolationtester and invalid permutations
On Tue, Oct 25, 2011 at 9:13 PM, Alvaro Herrera wrote:
> Instead of simply aborting a spec that specifies running commands on
> blocked sessions (what we call an invalid permutation), it seems more
> useful to report the problem, clean up the sessions, and continue with
> the next permutation.
>
> This, in conjunction with the dry-run patch I submitted earlier, makes
> it easier to determine a working spec: dry-run the spec; copy the
> so-generated permutation lines into the spec; run the spec normally,
> which reports the invalid permutations; comment out the invalid
> permutations from the spec; done.
>
> The attached patch, again from Alexander Shulgin (with some tweaks from
> me), does that.
>
> Comments?

Seems sensible. I think we should avoid including invalid permutations in our regression test suite, but this still seems useful for the reasons you mention.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
(pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch; linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)

Hi, here is another peculiarity, which I think is a bug:

-- first without CORRESPONDING:
$ psql -Xaf null.sql
select 1 a , 2 b
union all
select null a, 4 b ;
 a | b
---+---
 1 | 2
   | 4
(2 rows)

-- then with CORRESPONDING:
select 1 a , 2 b
union all corresponding
select null a, 4 b ;
psql:null.sql:9: ERROR:  failed to find conversion function from unknown to integer

If the null value is in a table column the error does not occur:

drop table if exists t1;
create table t1 (a int, b int);
insert into t1 values (1,2);
drop table if exists t2;
create table t2 (a int, b int);
insert into t2 values (null,2);

select a,b from t1
union all corresponding
select a,b from t2 ;
 a | b
---+---
 1 | 2
   | 2
(2 rows)

I'm not sure it is actually a bug; but it seems an unnecessary error.

thanks,

Erik Rijkers
[HACKERS] Hot Standby startup with overflowed snapshots
Chris Redekop's recent report of slow startup for Hot Standby has made me revisit the code there. Although there isn't a bug, there is a missed opportunity for starting up faster, which could be the source of Chris' annoyance. The following patch allows a faster startup in some circumstances.

The patch also alters the log levels for messages and gives a single simple message for this situation. The log will now say

LOG: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least %u (now %u)
...multiple times until snapshot non-overflowed or xid reached...

whereas before the first LOG message shown was

LOG: consistent state delayed because recovery snapshot incomplete

and only later, at DEBUG2, do you see

LOG: recovery snapshot waiting for %u oldest active xid on standby is %u
...multiple times until xid reached...

Comments please.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

faster_hot_standby_startup_withsubxacts.v1.patch
Description: Binary data
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas wrote:
> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs wrote:
>> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane wrote:
>>> Robert Haas writes:
>>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote:
>>>>> This fixes both the subtrans and clog bugs in one patch.
>>>>
>>>> I don't see the point of changing StartupCLOG() to be an empty
>>>> function and adding a new function TrimCLOG() that does everything
>>>> StartupCLOG() used to do.
>>>
>>> +1 ... I found that overly cute also.
>>
>> It would have been even easier to move StartupCLOG() later, but then
>> we'd need a big comment explaining why CLOG starts up at one point and
>> subtrans starts up at another point, since that is very confusing way
>> of doing things. I wrote it that way first and it definitely looks
>> strange.
>>
>> It's much easier to understand that StartupCLOG() is actually a no-op
>> and that we need to trim the clog at the end of recovery in all cases.
>
> If it's a no-op, why have it at all? I know we have a bunch of places
> in the code where we have empty stubs where there used to be
> initialization or cleanup code, but I've never found that particularly
> good style. If something no longer requires initialization in a
> certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are no-ops - it used to do something but now does not. Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog will immediately ask "why?". IMHO it's easier to have an obviously named function than a comment - it's less invasive for a backpatch as well.

I'm following the current code style. If you wish to change that, feel free to change this and all other locations that do this. Until then, doing this makes the most sense and follows current coding style. If I had done it the way you suggest, I don't doubt someone would say in about 6 months "Which idiot removed StartupClog()?".

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug wrote: > Per my theory about the cause of the problem in my other mail, I think you > might see StartupCLOG failures even during crash recovery, provided that > wal_level was set to hot_standby when the primary crashed. Here's how: > > 1) We start a checkpoint, and get as far as LogStandbySnapshot() > 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId(). > The assigned XID requires CLOG extension. > 3) The checkpoint continues, and LogStandbySnapshot() advances the > checkpoint's nextXid to the XID assigned in (2). > 4) We crash after writing the checkpoint record, but before the CLOG > extension makes it to the disk, and before any trace of the XID assigned > in (2) makes it to the xlog. > > Then StartupCLOG() would fail at the end of recovery, because we'd end up > with a nextXid whose corresponding CLOG page doesn't exist. Clog extension holds XidGenLock, as does LogStandbySnapshot, which specifically excludes the above scenario. > Quite aside from that concern, I think it's probably not a good idea > for the nextXid value of a checkpoint to depend on whether wal_level > was set to hot_standby or not. Our recovery code is already quite complex > and hard to test, and this just adds one more combination that has to > be thought about while coding and that needs to be tested. > > My suggestion is to fix the CLOG problem in the same way that you fixed > the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before > CheckPointGuts(). > > Here's what I imagine CreateCheckPoint() should look like: > > 1) LogStandbySnapshot() and fill out oldestActiveXid > 2) Fill out REDO > 3) Wait for concurrent commits > 4) Fill out nextXid and the other fields > 5) CheckPointGuts() > 6) Rest > > It's then no longer necessary for LogStandbySnapshot() to modify > the nextXid, since we fill out nextXid after LogStandbySnapshot() and > will thus derive a higher value than LogStandbySnapshot() would have. 
> > We could then also fold GetOldestActiveTransactionId() back into > your proposed LogStandbySnapshot() and thus don't need two ProcArray > traversals. I think you make a good case for doing this. However, I'm concerned that moving LogStandbySnapshot() in a backpatch seems more risky than it's worth. We could easily introduce a new bug into what we would all agree is a complex piece of code. Minimal change seems best in this case. And also, two ProcArray traversals are not a problem there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct 27, 2011, at 15:51, Simon Riggs wrote: > On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug wrote: >> Here's what I imagine CreateCheckPoint() should look like: >> >> 1) LogStandbySnapshot() and fill out oldestActiveXid >> 2) Fill out REDO >> 3) Wait for concurrent commits >> 4) Fill out nextXid and the other fields >> 5) CheckPointGuts() >> 6) Rest >> >> It's then no longer necessary for LogStandbySnapshot() to modify >> the nextXid, since we fill out nextXid after LogStandbySnapshot() and >> will thus derive a higher value than LogStandbySnapshot() would have. >> >> We could then also fold GetOldestActiveTransactionId() back into >> your proposed LogStandbySnapshot() and thus don't need two ProcArray >> traversals. > > I think you make a good case for doing this. > > However, I'm concerned that moving LogStandbySnapshot() in a backpatch > seems more risky than it's worth. We could easily introduce a new bug > into what we would all agree is a complex piece of code. Minimal > change seems best in this case. OTOH, we currently compute oldestActiveXid within LogStandbySnapshot(). Your proposed patch changes that, which also carries a risk since something could depend on these values being in sync. Especially since both the logged snapshot and oldestActiveXid influence the snapshot tracking on the slave. But since you wrote most of that code, your judgement about the relative risks of these two approaches obviously outweighs mine. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Simon Riggs writes: > On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas wrote: >> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs wrote: >>> It's much easier to understand that StartupCLOG() is actually a no-op >>> and that we need to trim the clog at the end of recovery in all cases. >> If it's a no-op, why have it at all? > It is a no-op for exactly the same reason other similar functions are > no-ops - it used to do something but now does not. > Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog > will immediately ask "why?". I think it's a good point that StartupCLog doesn't exist in a vacuum but should be parallel to the init functions for the other SLRU modules. So at this point I think I agree with Simon's approach. However, the obvious next question is whether those other modules don't need to be changed also, and if not why not. Another issue is that if StartupCLog is left as a no-op, what will happen if someone mistakenly tries to access clog before the trim function is called? It would be a good idea to make sure that such a thing results in an easily-identifiable failure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup
On Wed, Oct 26, 2011 at 09:52, Heikki Linnakangas wrote: > (CC'ing pgsql-hackers, this started as an IM discussion yesterday but really > belongs in the archives) > > On 25.10.2011 23:52, Magnus Hagander wrote: >>> >>> There's a tiny chance to get incomplete xlog files with pg_receivexlog if >>> you crash: >>> 1. pg_receivexlog finishes write()ing a file but the system crashes before >>> fsync() finishes. >>> 2. When pg_receivexlog restarts after a crash, the last WAL file was not >>> fully flushed to disk, with >>> holes in the middle, but it has the right length. pg_receivexlog will >>> continue streaming from the next file. >>> not sure if we care about such a narrow window, but maybe we do >> >> So how would we go about fixing that? Always unlink the last file in >> the directory and try from there would seem dangerous too - what if >> it's not available on the master anymore, then we might have given up >> on data... > > Start streaming from the beginning of the last segment, but don't unlink it > first. Just overwrite it as you receive the data. > > Or, always create the new xlog file as "0001000100D3.partial", and > only when it's fully written, fsync it, and then rename it to > "0001000100D3". Then you know that if a file doesn't have the > .partial suffix, it's complete. The fact that the last partial file always > has the .partial suffix needs some extra pushups in the restore_command, > though. Here's a version that does this. Turns out this requires a lot less code than what was previously in there, which is always nice. We still need to solve the other part, which is how to deal with the partial files on restore. But this is definitely a cleaner way from a pure pg_receivexlog perspective. Comments/reviews?
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***************
*** 71,104 ****
  usage(void)

  static bool
  segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
- 	char		fn[MAXPGPATH];
- 	struct stat statbuf;
-
  	if (verbose)
  		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
  				progname, segendpos.xlogid, segendpos.xrecoff, timeline);

  	/*
- 	 * Check if there is a partial file for the name we just finished, and if
- 	 * there is, remove it under the assumption that we have now got all the
- 	 * data we need.
- 	 */
- 	segendpos.xrecoff /= XLOG_SEG_SIZE;
- 	PrevLogSeg(segendpos.xlogid, segendpos.xrecoff);
- 	snprintf(fn, sizeof(fn), "%s/%08X%08X%08X.partial",
- 			 basedir, timeline,
- 			 segendpos.xlogid,
- 			 segendpos.xrecoff);
- 	if (stat(fn, &statbuf) == 0)
- 	{
- 		/* File existed, get rid of it */
- 		if (verbose)
- 			fprintf(stderr, _("%s: removing file \"%s\"\n"),
- 					progname, fn);
- 		unlink(fn);
- 	}
-
- 	/*
  	 * Never abort from this - we handle all aborting in continue_streaming()
  	 */
  	return false;
--- 71,81 ----
***************
*** 133,139 ****
  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  	bool		b;
  	uint32		high_log = 0;
  	uint32		high_seg = 0;
- 	bool		partial = false;

  	dir = opendir(basedir);
  	if (dir == NULL)
--- 110,115 ----
***************
*** 195,201 ****
  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  			disconnect_and_exit(1);
  		}

! 		if (statbuf.st_size == 16 * 1024 * 1024)
  		{
  			/* Completed segment */
  			if (log > high_log ||
--- 171,177 ----
  			disconnect_and_exit(1);
  		}

! 		if (statbuf.st_size == XLOG_SEG_SIZE)
  		{
  			/* Completed segment */
  			if (log > high_log ||
***************
*** 208,244 ****
  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  		}
  		else
  		{
! 			/*
! 			 * This is a partial file. Rename it out of the way.
! 			 */
! 			char		newfn[MAXPGPATH];
!
! 			fprintf(stderr, _("%s: renaming partial file \"%s\" to \"%s.partial\"\n"),
! 					progname, dirent->d_name, dirent->d_name);
!
! 			snprintf(newfn, sizeof(newfn), "%s/%s.partial",
! 					 basedir, dirent->d_name);
!
! 			if (stat(newfn, &statbuf) == 0)
! 			{
! 				/*
! 				 * XXX: perhaps we should only error out if the existing file
! 				 * is larger?
! 				 */
! 				fprintf(stderr, _("%s: file \"%s\" already exists. Check and clean up manually.\n"),
! 						progname, newfn);
! 				disconnect_and_exit(1);
! 			}
! 			if (rename(fullpath, newfn) != 0)
! 			{
! 				fprintf(stderr, _("%s: could not rename \"%s\" to \"%s\": %s\n"),
! 						progname, fullpath, newfn, strerror(errno));
! 				disconnect_and_exit(1);
! 			}
!
! 			/* Don't continue looking for more, we assume this is the last */
! 			partial = true;
! 			break;
  		}
  	}
--- 184,192 ----
  		}
  		else
  		{
! 			fprintf(stderr, _("%s: segment file '%s' is incorrect size %d, skipping\n"),
! 					progname, dirent->d_name, (int) statbuf.st_size);
! 			continue;
  		}
  	}
***************
*** 247,263 FindStreamingStart(XLogRe
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug wrote: >> I think you make a good case for doing this. >> >> However, I'm concerned that moving LogStandbySnapshot() in a backpatch >> seems more risky than it's worth. We could easily introduce a new bug >> into what we would all agree is a complex piece of code. Minimal >> change seems best in this case. > > OTOH, we currently compute oldestActiveXid within LogStandbySnapshot(). > Your proposed patch changes that, which also carries a risk since something > could depend on these values being in sync. Especially since both the logged > snapshot and oldestActiveXid influence the snapshot tracking on the slave. > > But since you wrote most of that code, your judgement about the relative > risks of these two approaches obviously out-weights mine. We must move oldestActiveXid since that is the source of a bug. There is no need to move LogStandbySnapshot(), so I am suggesting we don't do that for the backpatch. I was going to implement it the way you suggest in HEAD, since I agree that is a cleaner way. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
Magnus Hagander writes: > On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas > wrote: >> On 27.10.2011 14:09, Fujii Masao wrote: >>> Yes. But that sounds unuserfriendly. Padding the WAL file manually >>> is easy-to-do for a user? > I'd definitely want to avoid anything that requires pg_receivexlog to > actually *parse* the WAL. That'll make it way more complex than I'd > like. What parsing? Just pad to 16MB with zeroes. In fact, I think the receiver should just create the file that size to start with, and then write received data into it, much like normal WAL creation does. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 16:54, Tom Lane wrote: > Magnus Hagander writes: >> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas >> wrote: >>> On 27.10.2011 14:09, Fujii Masao wrote: Yes. But that sounds user-unfriendly. Padding the WAL file manually is easy-to-do for a user? > >> I'd definitely want to avoid anything that requires pg_receivexlog to >> actually *parse* the WAL. That'll make it way more complex than I'd >> like. > > What parsing? Just pad to 16MB with zeroes. In fact, I think the I'm just saying that *if* parsing is required, it would be bad. > receiver should just create the file that size to start with, and then > write received data into it, much like normal WAL creation does. So when pg_receivexlog starts up, how would it know if the last file represents a completed file, or a half-full file, without actually parsing it? It could be a 16MB file with 10 bytes of valid data, or a complete file with 16MB of valid data. We could always ask for a retransmit of the whole file, but if that file is gone on the master, we won't be able to do that, and will error out in a situation that's not actually an error. Though I guess if we leave the file as .partial up until this point (per my other patch just posted), this doesn't actually apply - if the file is called .partial, we'll overwrite into it. If it's not, then we assume it's a complete segment. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
Magnus Hagander writes: > On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas > wrote: >> Perhaps we should add automatic padding in the server, though. It wouldn't >> take much code in the server, and would make life easier for people writing >> their scripts. Maybe even have the backend check for a .partial file if it >> can't find a regularly named XLOG file. Needs some thought.. > > I'd definitely want to avoid anything that requires pg_receivexlog to > actually *parse* the WAL. That'll make it way more complex than I'd > like. What about creating the WAL file filled up with zeroes at the receiving end and then overwriting data as we receive it? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane wrote: > However, the > obvious next question is whether those other modules don't need to be > changed also, and if not why not. Good point. StartupSubtrans() is also changed by this patch, since it will be supplied with an earlier initialisation value. StartupMultiXact() didn't need changing, I thought, but I will review further. > Another issue is that if StartupCLog is left as a no-op, what will > happen if someone mistakenly tries to access clog before the trim > function is called? It would be a good idea to make sure that such > a thing results in an easily-identifiable failure. The old StartupCLOG() didn't do anything that was essential to using the clog, which is why it's a no-op. You can still use the clog, just with zero startup. Maybe setting the current page should go in at startup; I'll think on that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] out-of-order caution
On the docs page for the SELECT statement, there is a caution which starts with: | It is possible for a SELECT command using ORDER BY and FOR | UPDATE/SHARE to return rows out of order. This is because ORDER BY | is applied first. Is this risk limited to queries running in READ COMMITTED transactions? If so, I think that should be mentioned in the caution. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner wrote: > On the docs page for the SELECT statement, there is a caution which > starts with: > > | It is possible for a SELECT command using ORDER BY and FOR > | UPDATE/SHARE to return rows out of order. This is because ORDER BY > | is applied first. > > Is this risk limited to queries running in READ COMMITTED > transactions? If so, I think that should be mentioned in the > caution. I think it should say that if this occurs with SERIALIZABLE transactions it will result in a serialisation error. Just to say there is no effect in serializable mode wouldn't be helpful. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] plpgsql versus long ELSIF chains
Some off-list discussion found that the cause of this problem: http://archives.postgresql.org/pgsql-general/2011-10/msg00879.php was an attempt to write a plpgsql IF-ELSIF-ELSIF-ELSIF-ELSIF-...-ELSE statement with five thousand branches. Putting aside the wisdom of doing that, it seems like the parser ought not fall over when you do. The reason it's doing so is that plpgsql's grammar handles ELSIF with a rule like this: stmt_else: K_ELSIF expr_until_then proc_sect stmt_else This is called right recursion in the Bison manual (as opposed to left recursion, where a nonterminal has itself as the *first* item in its own expansion), and the manual specifically says you should always avoid right recursion in favor of left recursion, else you are vulnerable to parser stack overflow. Duh. So, looking a bit more closely into why it's done that way, it's because ELSIF was shoehorned into a parsetree representation that was only meant for plain IF-THEN-ELSE. The generated parse tree will have a new level of PLpgSQL_stmt_if struct for each ELSIF. That means that even if we fix the grammar, we're still at risk of stack overflow at execution time because of recursion in pl_exec.c. So really what needs to be done here is to make ELSIF chains explicit in the parsetree representation, and handle them via looping not recursion at runtime. This will also make it a lot easier to do the grammar via left-recursion. So I'm going to go off and do that, but I wonder whether anyone thinks this is sufficiently important to back-patch. I'm inclined to think that back-patching isn't a good idea, because changing the representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql debugger; ISTM the number of complaints isn't enough to warrant doing that in released branches. More generally, it seems like a good idea to check whether there are any other unnecessary uses of right recursion in the core and plpgsql grammars. 
Unless somebody knows of an existing tool to do that, I'm thinking of writing a little perl script to check for it (probably by parsing the output of "bison -v") and adding the script to src/tools/ so we'll have it around to check again in the future. Any objections? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
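The execution-time half of the fix Tom describes — flattening the ELSIF arms into a list that is walked with a loop, instead of one nested IF node (and one stack frame) per arm — can be illustrated with a small stand-in. This is a Python model of the data-structure change, not pl_exec.c code; the 5000-arm chain mirrors the report that started the thread:

```python
def exec_if_recursive(branches, else_body, env):
    """Old shape: each ELSIF is a nested IF node, so evaluating the
    chain consumes one stack frame per arm."""
    if not branches:
        return else_body(env)
    cond, body = branches[0]
    if cond(env):
        return body(env)
    return exec_if_recursive(branches[1:], else_body, env)

def exec_if_flat(branches, else_body, env):
    """New shape: the arms are a flat list walked with a loop, so stack
    usage is constant no matter how long the chain is."""
    for cond, body in branches:
        if cond(env):
            return body(env)
    return else_body(env)

# A chain with 5000 arms: IF env = 0 THEN 0 ELSIF env = 1 THEN 1 ... ELSE -1
chain = [(lambda e, i=i: e == i, lambda e, i=i: i) for i in range(5000)]
```

With Python's default recursion limit, the recursive form blows the stack long before reaching arm 4999, while the loop form handles the same chain trivially — the same shape of failure and fix as the parser/executor recursion being discussed.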
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Thanks for the patch Simon, but unfortunately it does not resolve the issue I am seeing. The standby still refuses to finish starting up until long after all clients have disconnected from the primary (>10 minutes). I do see your new log statement on startup, but only once - it does not repeat. Is there any way for me to see what the oldest xid on the standby is via controldata or something like that? The standby does stream to keep up with the primary while the primary has load, and then it becomes idle when the primary becomes idle (when I kill all the connections), so it appears to be current...but it just doesn't finish starting up. I'm not sure if it's relevant, but after it has sat idle for a couple of minutes I start seeing these statements in the log (with the same offset every time): DEBUG: skipping restartpoint, already performed at 9/9520 On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs wrote: > Chris Redekop's recent report of slow startup for Hot Standby has made > me revisit the code there. > > Although there isn't a bug, there is a missed opportunity for starting > up faster which could be the source of Chris' annoyance. > > The following patch allows a faster startup in some circumstances. > > The patch also alters the log levels for messages and gives a single > simple message for this situation. The log will now say > > LOG: recovery snapshot waiting for non-overflowed snapshot or until > oldest active xid on standby is at least %u (now %u) > ...multiple times until snapshot non-overflowed or xid reached... > > whereas before the first LOG message shown was > > LOG: consistent state delayed because recovery snapshot incomplete > and only later, at DEBUG2 do you see > LOG: recovery snapshot waiting for %u oldest active xid on standby is %u > ...multiple times until xid reached... > > Comments please. 
> > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] plpgsql versus long ELSIF chains
On Oct 27, 2011, at 9:18 AM, Tom Lane wrote: > So I'm going to go off and do that, but I wonder whether anyone thinks > this is sufficiently important to back-patch. I'm inclined to think > that back-patching isn't a good idea, because changing the > representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql > debugger; ISTM the number of complaints isn't enough to warrant doing > that in released branches. +1 to not back-patching. Seems like it doesn't come up all that often, right? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus long ELSIF chains
On 27.10.2011 19:18, Tom Lane wrote: > So really what needs to be done here is to make ELSIF chains explicit > in the parsetree representation, and handle them via looping not recursion > at runtime. This will also make it a lot easier to do the grammar via > left-recursion. So I'm going to go off and do that, but I wonder whether > anyone thinks this is sufficiently important to back-patch. This doesn't look like a bug to me, just a suboptimal implementation. So -1 for backpatching. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby startup with overflowed snapshots
On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop wrote: > Thanks for the patch Simon, but unfortunately it does not resolve the issue > I am seeing. The standby still refuses to finish starting up until long > after all clients have disconnected from the primary (>10 minutes). I do > see your new log statement on startup, but only once - it does not repeat. > Is there any way for me to see what the oldest xid on the standby is via > controldata or something like that? The standby does stream to keep up with > the primary while the primary has load, and then it becomes idle when the > primary becomes idle (when I kill all the connections)so it appears to > be current...but it just doesn't finish starting up > I'm not sure if it's relevant, but after it has sat idle for a couple > minutes I start seeing these statements in the log (with the same offset > every time): > DEBUG: skipping restartpoint, already performed at 9/9520 OK, so it looks like there are 2 opportunities to improve, not just one. Try this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services faster_hot_standby_startup_withsubxacts.v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)
Greg Jaskiewicz wrote: > > On 19 Oct 2011, at 18:28, Florian Pflug wrote: > > All the other flags which indicate cancellation reasons are set from signal > > handlers, I believe. We could of course mark ClientConnectionLostPending > > as volatile just to be consistent. Not sure whether that's a good idea, or > > not. It might prevent a mistake should we ever add code to detect lost > > connections asynchronously (i.e., from somewhere else than pq_flush). And > > the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for > > InterruptPending before calling ProcessInterrupts(), so we only pay the > > cost of volatile if there's actually an interrupt pending. But I still > > think it's better to add qualifiers such as volatile only when really > > necessary. A comment about why it *isn't* volatile is probably in order, > > though, so I'll add that in the next version of the patch. > > > Makes sense. > > I had to ask, because it sticks out. And indeed there is a possibility > that someone will one day try to use it from a signal handler, etc. A C comment might help there. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
Simon Riggs wrote: > On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner > wrote: >> On the docs page for the SELECT statement, there is a caution >> which starts with: >> >> | It is possible for a SELECT command using ORDER BY and FOR >> | UPDATE/SHARE to return rows out of order. This is because ORDER >> | BY is applied first. >> >> Is this risk limited to queries running in READ COMMITTED >> transactions? If so, I think that should be mentioned in the >> caution. > > I think it should say that if this occurs with SERIALIZABLE > transactions it will result in a serialisation error. > > Just to say there is no effect in serializable mode wouldn't be > helpful. Hmm. At first reading I thought this was related to the mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. Does anyone know which isolation levels are affected? Barring that, can anyone point to an existing test which demonstrates the problem? If this can happen in snapshot isolation with just one reader and one writer, I doubt that SSI helps with it. :-( -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
On Thu, Oct 27, 2011 at 1:51 PM, Kevin Grittner wrote: > Simon Riggs wrote: >> On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner >> wrote: >>> On the docs page for the SELECT statement, there is a caution >>> which starts with: >>> >>> | It is possible for a SELECT command using ORDER BY and FOR >>> | UPDATE/SHARE to return rows out of order. This is because ORDER >>> | BY is applied first. >>> >>> Is this risk limited to queries running in READ COMMITTED >>> transactions? If so, I think that should be mentioned in the >>> caution. >> >> I think it should say that if this occurs with SERIALIZED >> transactions it will result in a serialisation error. >> >> Just to say there is no effect in serializable mode wouldn't be >> helpful. > > Hmm. At first reading I thought this was related to the > mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. > Does anyone know which isolation levels are affected? Barring that, > can anyone point to an existing test which demonstrates the problem? > > If this can happen in snapshot isolation with just one reader and > one writer, I doubt that SSI helps with it. :-( Simple test case: rhaas=# create table oops (a int); CREATE TABLE rhaas=# insert into oops values (1), (2), (3), (4); INSERT 0 4 rhaas=# begin; BEGIN rhaas=# update oops set a = 5 where a = 2; UPDATE 1 In another session: rhaas=# select * from oops order by 1 for update; Back to the first session: rhaas=# commit; COMMIT Second session now returns: a --- 1 5 3 4 (4 rows) But if you do the same thing at REPEATABLE READ, you get: ERROR: could not serialize access due to concurrent update STATEMENT: select * from oops order by 1 for update; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
Robert Haas wrote: > Simple test case: > > rhaas=# create table oops (a int); > CREATE TABLE > rhaas=# insert into oops values (1), (2), (3), (4); > INSERT 0 4 > rhaas=# begin; > BEGIN > rhaas=# update oops set a = 5 where a = 2; > UPDATE 1 > > In another session: > > rhaas=# select * from oops order by 1 for update; > > > Back to the first session: > > rhaas=# commit; > COMMIT > > Second session now returns: > > a > --- > 1 > 5 > 3 > 4 > (4 rows) > > But if you do the same thing at REPEATABLE READ, you get: > > ERROR: could not serialize access due to concurrent update > STATEMENT: select * from oops order by 1 for update; So it seems to me that the caution about this issue is only half-right. Below REPEATABLE READ isolation it behaves as currently described; REPEATABLE READ or SERIALIZABLE will throw that error. That is probably worth noting, since: (1) People should understand that they can't get incorrect results at either of the stricter isolation levels. (2) They *can* get a serialization failure involving just two transactions: a read and a write. This is not something which normally happens at any level, so it might tend to surprise people. No words leap to mind for me. Anyone else? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
Hi, The UNION-with-NULL error persists even without the CORRESPONDING patch. Here is the output from PostgreSQL without the patch: SELECT a FROM (SELECT 1 a) foo UNION SELECT a FROM (SELECT NULL a) foo2; ERROR: failed to find conversion function from unknown to integer It is thrown from the coerce_type() function in parse_coerce.c. I will try to dig deeper into it. Regards, Kerem KAT On Thu, Oct 27, 2011 at 15:45, Erik Rijkers wrote: > (pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch; > linux x86_64 GNU/Linux 2.6.18-274.3.1.el5) > > Hi, > > here is another peculiarity, which I think is a bug: > > -- first without CORRESPONDING: > > $ psql -Xaf null.sql > select 1 a , 2 b > union all > select null a, 4 b ; > a | b > ---+--- > 1 | 2 > | 4 > (2 rows) > > -- then with CORRESPONDING: > > select 1 a , 2 b > union all > corresponding > select null a, 4 b ; > psql:null.sql:9: ERROR: failed to find conversion function from unknown to > integer > > > If the null value is in a table column the error does not occur: > > drop table if exists t1; create table t1 (a int, b int); insert into t1 > values (1,2); > drop table if exists t2; create table t2 (a int, b int); insert into t2 > values (null,2); > select a,b from t1 > union all > corresponding > select a,b from t2 ; > a | b > ---+--- > 1 | 2 > | 2 > (2 rows) > > > I'm not sure it is actually a bug; but it seems an unnecessary error. > > > thanks, > > Erik Rijkers > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct27, 2011, at 16:30 , Simon Riggs wrote: > On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug wrote: > >>> I think you make a good case for doing this. >>> >>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch >>> seems more risky than it's worth. We could easily introduce a new bug >>> into what we would all agree is a complex piece of code. Minimal >>> change seems best in this case. >> >> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot(). >> Your proposed patch changes that, which also carries a risk since something >> could depend on these values being in sync. Especially since both the logged >> snapshot and oldestActiveXid influence the snapshot tracking on the slave. >> >> But since you wrote most of that code, your judgement about the relative >> risks of these two approaches obviously out-weights mine. > > We must move oldestActiveXid since that is the source of a bug. There > is no need to move LogStandbySnapshot(), so I am suggesting we don't > do that for the backpatch. I was going to implement it the way you > suggest in HEAD, since I agree that is a cleaner way. Sound good. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
"Kevin Grittner" writes: > Simon Riggs wrote: >> On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner >> wrote: >>> | It is possible for a SELECT command using ORDER BY and FOR >>> | UPDATE/SHARE to return rows out of order. This is because ORDER >>> | BY is applied first. >> I think it should say that if this occurs with SERIALIZED >> transactions it will result in a serialisation error. > Hmm. At first reading I thought this was related to the > mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. Simon's comment is correct. If you do a SELECT FOR UPDATE/SHARE in a non-READ-COMMITTED transaction, and it turns out that someone modified the tuple before you could lock it, you'll get a serialization error (cf ExecLockRows()), not updated data. So out-of-order sorting is not possible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] out-of-order caution
"Kevin Grittner" writes: > (2) They *can* get a serialization failure involving just two > transactions: a read and a write. Only if you ignore the difference between SELECT FOR UPDATE/SHARE and plain SELECT. I think calling the former a "read" is a conceptual error to start with. It has the same locking and synchronization behavior as a write. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] portal with hold
I am posting to beg for the implementation of a "with hold" feature for portals, similar to what is available for cursors. This is needed by the JDBC driver to implement Java's ResultSet.HOLD_CURSORS_OVER_COMMIT, which in turn is needed to make setFetchSize() work, which is needed to read large result sets. Please also see: http://postgresql.1045698.n5.nabble.com/setFetchSize-tp4935215p4941614.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
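For reference, the cursor-level feature the poster is asking to extend to protocol-level portals already exists in SQL; the table name here is purely illustrative:

```sql
-- Existing WITH HOLD behavior for explicit cursors.  The request above is
-- for the same semantics at the protocol (portal) level, so the JDBC
-- driver could keep fetching in batches across a commit.
-- "t" is a hypothetical table.
BEGIN;
DECLARE c CURSOR WITH HOLD FOR SELECT * FROM t ORDER BY id;
COMMIT;            -- a WITH HOLD cursor survives the commit
FETCH 100 FROM c;  -- fetching can continue outside the creating transaction
CLOSE c;
```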
[HACKERS] fun with unlogged tables
One of the optimizations that I did for 9.1 was to make transactions that touch only temporary and/or unlogged tables always commit asynchronously, because if the database crashes the table contents will be blown away in their entirety, and whether or not the commit made it down to disk won't matter a bit. In my testing, this hugely improved the performance of unlogged tables. However, Heikki recently reported to me off-list that this can actually cause a significant performance problem in some circumstances, because committing asynchronously means that we can't set hint bits right away - we must wait until the WAL writer has completed its background flush. With default settings, this takes long enough to cause lots of extra clog traffic. He ran into the problem while running pgbench at scale factor 15, and I reproduced it the same way. I believe that at a large scale factor the effect is lessened because you're less likely to reread the same row multiple times before the commit hits the disk. It strikes me that, while it's not safe to set hint bits until the commit record hits the disk for *permanent* relations, it ought to be just fine for temporary and unlogged relations, because those pages will be gone after a crash, and their hint bits with them. Attached is a patch taking that approach. Another approach would be to have transactions that only touch temporary or unlogged relations to avoid changing the value that will be returned by TransactionIdGetCommitLSN(). This approach is better when synchronous_commit=off and a transaction touches both permanent and non-permanent tables, because it makes the decision as to whether hint bits can be set early based on which page is being updated rather than on some characteristic of the transaction. However, it also adds a small amount of overhead to the case where we're doing an asynchronous commit on a permanent table, because we do one more check before concluding that hint bits can't be set. 
I did some benchmarking of this approach using pgbench with scale factor 15. shared_buffers = 8GB, maintenance_work_mem = 1GB, checkpoint_segments = 30, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, synchronous_commit = off. On permanent tables 8 clients came out slower and 16 came out faster; I'm inclined to believe it's all in the noise. On unlogged tables the patch appears to be a clear win, massively so at 32 clients. Results below. The first number on each line is the client count, and the remaining numbers are tps including connections establishing, from individual 5-minute runs.

Unlogged Tables, unpatched:
1 861.841894 752.762490 837.847109
8 3379.832456 4100.539369 3751.842036
16 6259.907605 5523.406202 4437.648873
32 4547.725770 5360.246166 4958.086754

Unlogged Tables, with patch:
1 887.562141 785.539717 920.275452
8 4436.366884 4374.135712 4335.791842
16 7518.908796 7478.427691 7476.817757
32 10433.615767 10450.577573 10374.186566

Permanent Tables, unpatched:
1 648.824800 647.239277 652.116249
8 3785.485647 3481.021391 3827.455756
16 5652.069678 4004.780105 4207.354612
32 5084.804778 4645.997471 5222.387075

Permanent Tables, with patch:
1 636.162775 637.145834 640.298944
8 3383.817707 3388.273809 3815.298676
16 5543.585926 5093.483757 5112.854318
32 5229.295024 4985.736460 5103.441187

Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company bip.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
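The patch itself is attached above as a binary, but the decision it implements, as described in the text, can be sketched as a hypothetical helper. This is not actual PostgreSQL source; the names and types are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's WAL position type */

/*
 * Hypothetical sketch of the rule described above: hint bits on a page may
 * be set once the commit record is known to be flushed to disk, OR when the
 * page belongs to a temporary/unlogged relation, because such pages do not
 * survive a crash anyway -- and neither do their hint bits.
 */
bool
hint_bits_allowed(bool page_is_permanent,
                  XLogRecPtr commit_lsn,   /* LSN of the commit record */
                  XLogRecPtr flushed_lsn)  /* WAL is flushed up to this LSN */
{
    if (!page_is_permanent)
        return true;                      /* a crash wipes the page anyway */
    return commit_lsn <= flushed_lsn;     /* commit record already durable */
}
```

The second approach described above differs only in where this check runs: per-page at hint-bit-setting time (as here) rather than per-transaction at commit time.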
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
Kerem Kat writes: > Union with NULL error persists without the corresponding patch. Here > is the output from postgres without the patch: > SELECT a FROM (SELECT 1 a) foo > UNION > SELECT a FROM (SELECT NULL a) foo2; > ERROR: failed to find conversion function from unknown to integer Yeah, this is a longstanding issue that is not simple to fix without introducing other unpleasantnesses. It is not something you should try to deal with at the same time as implementing CORRESPONDING. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces
Florian Pflug wrote: > On Oct21, 2011, at 16:42 , Phil Sorber wrote: > > If you did want to make them immutable, I also like Florian's idea of > > a dependency graph. This would make the dumps less readable though. > > Hm, I kinda reversed my opinion on that, though - i.e., I no longer think > that the dependency graph idea has much merit. For two reasons > > First, dependencies work on OIDs, not on names. Thus, for the dependency > machinery to work for GUCs, they'd also need to store OIDs instead of > names of referenced schema objects. (Otherwise you get into trouble if > objects are renamed) > > Which of course doesn't work, at least for roles, because roles are > shared objects, but referenced objects might be database-local. > (search_path, for example). Is this a TODO? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
I wrote: > Kerem Kat writes: >> Union with NULL error persists without the corresponding patch. Here >> is the output from postgres without the patch: >> SELECT a FROM (SELECT 1 a) foo >> UNION >> SELECT a FROM (SELECT NULL a) foo2; >> ERROR: failed to find conversion function from unknown to integer > Yeah, this is a longstanding issue that is not simple to fix without > introducing other unpleasantnesses. It is not something you should > try to deal with at the same time as implementing CORRESPONDING. BTW, just to clarify: although that case fails, the case Erik was complaining of does work in unmodified Postgres: regression=# select 1 a , 2 b union all select null a, 4 b ; a | b ---+--- 1 | 2 | 4 (2 rows) and I agree with him that it should still work with CORRESPONDING. Even though the behavior of unlabeled NULLs is less than perfect, we definitely don't want to break cases that work now. I suspect the failure means that you tried to postpone too much work to plan time. You do have to match up the columns honestly at parse time and do the necessary type coercions on them then. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
Kerem Kat writes: > On Thu, Oct 27, 2011 at 23:20, Tom Lane wrote: >> BTW, just to clarify: although that case fails, the case Erik was >> complaining of does work in unmodified Postgres: >> ... >> and I agree with him that it should still work with CORRESPONDING. > That is by design, because CORRESPONDING is implemented as subqueries: Well, this appears to me to be a counterexample sufficient to refute that implementation decision. You can inject subqueries at plan time, if that helps you make things match up, but you can't rearrange things that way at parse time, as I gather you're doing or else you would not be seeing this problem. In any case, I already pointed out to you that rearranging the parse tree that way is problematic for reverse-listing the parse tree. We don't want to see subqueries injected in the results of printing parse trees with ruleutils.c. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby startup with overflowed snapshots
hrmz, still basically the same behaviour. I think it might be a *little* better with this patch. Before when under load it would start up quickly maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4 or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still only seeing your log statement a single time (I'm running at debug2). I have discovered something though - when the standby is in this state if I force a checkpoint on the primary then the standby comes right up. Is there anything I check or try for you to help figure this out?...or is it actually as designed that it could take 10-ish minutes to start up even after all clients have disconnected from the primary? On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs wrote: > On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop wrote: > > > Thanks for the patch Simon, but unfortunately it does not resolve the > issue > > I am seeing. The standby still refuses to finish starting up until long > > after all clients have disconnected from the primary (>10 minutes). I do > > see your new log statement on startup, but only once - it does not > repeat. > > Is there any way for me to see what the oldest xid on the standby is > via > > controldata or something like that? The standby does stream to keep up > with > > the primary while the primary has load, and then it becomes idle when the > > primary becomes idle (when I kill all the connections)...so it appears > to > > be current...but it just doesn't finish starting up > > I'm not sure if it's relevant, but after it has sat idle for a couple > > minutes I start seeing these statements in the log (with the same offset > > every time): > > DEBUG: skipping restartpoint, already performed at 9/9520 > > OK, so it looks like there are 2 opportunities to improve, not just one. > > Try this. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)
On Thu, Oct 27, 2011 at 23:20, Tom Lane wrote: > I wrote: >> Kerem Kat writes: >>> Union with NULL error persists without the corresponding patch. Here >>> is the output from postgres without the patch: > >>> SELECT a FROM (SELECT 1 a) foo >>> UNION >>> SELECT a FROM (SELECT NULL a) foo2; > >>> ERROR: failed to find conversion function from unknown to integer > >> Yeah, this is a longstanding issue that is not simple to fix without >> introducing other unpleasantnesses. It is not something you should >> try to deal with at the same time as implementing CORRESPONDING. > > BTW, just to clarify: although that case fails, the case Erik was > complaining of does work in unmodified Postgres: > > regression=# select 1 a , 2 b > union all > select null a, 4 b ; > a | b > ---+--- > 1 | 2 > | 4 > (2 rows) > > and I agree with him that it should still work with CORRESPONDING. > Even though the behavior of unlabeled NULLs is less than perfect, > we definitely don't want to break cases that work now. I suspect > the failure means that you tried to postpone too much work to plan > time. You do have to match up the columns honestly at parse time > and do the necessary type coercions on them then. > > regards, tom lane > That is by design, because CORRESPONDING is implemented as subqueries: select 1 a , 2 b union all corresponding select null a, 4 b ; is equivalent to SELECT a, b FROM ( SELECT 1 a, 2 b ) foo UNION ALL SELECT a, b FROM ( SELECT null a, 4 b ) foo2; which gives the same error in unpatched postgres. Regards, Kerem KAT -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby startup with overflowed snapshots
On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop wrote: > hrmz, still basically the same behaviour. I think it might be a *little* > better with this patch. Before when under load it would start up quickly > maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4 > or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still > only seeing your log statement a single time (I'm running at debug2). I > have discovered something though - when the standby is in this state if I > force a checkpoint on the primary then the standby comes right up. Is there > anything I check or try for you to help figure this out?...or is it > actually as designed that it could take 10-ish minutes to start up even > after all clients have disconnected from the primary? Thanks for testing. The improvements cover specific cases, so it's not subject to chance; it's not a performance patch. It's not "designed" to act the way you describe, but it does. The reason this occurs is that you have a transaction heavy workload with occasional periods of complete quiet and a base backup time that is much less than checkpoint_timeout. If your base backup was slower the checkpoint would have hit naturally before recovery had reached a consistent state. Which seems fairly atypical. I guess you're doing this on a test system. It seems cheap to add in a call to LogStandbySnapshot() after each call to pg_stop_backup(). Does anyone think this case is worth adding code for? Seems like one more thing to break. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces
On Oct27, 2011, at 23:02 , Bruce Momjian wrote: > Florian Pflug wrote: >> On Oct21, 2011, at 16:42 , Phil Sorber wrote: >>> If you did want to make them immutable, I also like Florian's idea of >>> a dependency graph. This would make the dumps less readable though. >> >> Hm, I kinda reversed my opinion on that, though - i.e., I no longer think >> that the dependency graph idea has much merit. For two reasons >> >> First, dependencies work on OIDs, not on names. Thus, for the dependency >> machinery to work for GUCs, they'd also need to store OIDs instead of >> names of referenced schema objects. (Otherwise you get into trouble if >> objects are renamed) >> >> Which of course doesn't work, at least for roles, because roles are >> shared objects, but referenced objects might be database-local. >> (search_path, for example). > > Is this a TODO? The idea quoted above, no. But Downgrade non-immutable (i.e., dependent on database state) checks during "ALTER ROLE/DATABASE SET" to WARNINGs to avoid breakage during restore makes for a fine TODO, I'd say. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add socket dir to pg_config..?
All, Was just wondering if we might want to include the default socket directory that was compiled in as part of the pg_config output..? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Robert Haas wrote: > On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas > wrote: > > pg_upgrade doesn't work if the 'postgres' database has been dropped in the > > old cluster: > > > > ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d > > ~/pgsql.91stable/data -D data-upgraded/ > >
> > Performing Consistency Checks
> > -----------------------------
> > Checking current, bin, and data directories                 ok
> > Checking cluster versions                                   ok
> > Checking database user is a superuser                       ok
> > Checking for prepared transactions                          ok
> > Checking for reg* system OID user data types                ok
> > Checking for contrib/isn with bigint-passing mismatch       ok
> > Creating catalog dump                                       ok
> > Checking for prepared transactions                          ok
> >
> > New cluster database "postgres" does not exist in the old cluster
> > Failure, exiting
> >
> > That's a bit unfortunate. We have some other tools that don't work without > > the 'postgres' database, like 'reindexdb -all', but it would still be nice > > if pg_upgrade did. > > +1. I think our usual policy is to try postgres first and then try > template1, so it would seem logical for pg_upgrade to do the same. Well, it is a little tricky. The problem is that I am not just connecting to a database --- I am creating support functions in the database. Now, this is complex because template1 is the template for new databases, except for pg_dump which uses template0. So, it is going to be confusing to support both databases because there are cleanup details I have to document if I use template1. Also, pg_dumpall unconditionally connects to the postgres database to restore roles: fprintf(OPF, "\\connect postgres\n\n"); We could connect to template1 for this, but I am not comfortable changing this. And when pg_dumpall throws an error for a missing postgres database, pg_upgrade is going to fail. 
We started using the postgres database as a database for connections --- do we want to change that? We certainly can't have the pg_dumpall output _conditionally_ connecting to template1. I am feeling this isn't worth pursuing. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
* Bruce Momjian (br...@momjian.us) wrote: > So, it is going to be confusing to support both databases because there > is the cleanup details I have to document if I use template1. Presumably there's some other database in the system besides template1 if postgres doesn't exist.. Would it be possible to just make it configurable? Then the user could pick a non-template database. Having it fail if the option isn't used and the default postgres isn't there is fine, imv. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Stephen Frost wrote: -- Start of PGP signed section. > * Bruce Momjian (br...@momjian.us) wrote: > > So, it is going to be confusing to support both databases because there > > is the cleanup details I have to document if I use template1. > > Presumably there's some other database in the system besides template1 > if postgres doesn't exist.. Would it be possible to just make it > configurable? Then the user could pick a non-template database. Having > it fail if the option isn't used and the default postgres isn't there is > fine, imv. I have not seen enough demand to make this a user-visible configuration. We can just tell them to create a postgres database. Frankly, they would have had to _remove_ the postgres database after initdb for it not to be there, and they are instructed to change nothing about the new database. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
* Bruce Momjian (br...@momjian.us) wrote: > I have not seen enough demand to make this a user-visible configuration. > We can just tell them to create a postgres database. Frankly, they > would have had to _remove_ the postgres database after initdb for it not > to be there, and they are instructed to change nothing about the new > database. Yes, they would have removed it because they didn't want it. As I recall, part of the agreement to create an extra database by default was that it could be removed if users didn't want it. Turning around and then saying "but things won't work if it's not there" isn't exactly supporting users who decide to remove it. Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can be configured to connect to other databases instead, including for globals. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Sorry..."designed" was a poor choice of words, I meant "not unexpected". Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help! On a side note I am sporadically seeing another error on hot standby startup. I'm not terribly concerned about it as it is pretty rare and it will work on a retry so it's not a big deal. The error is "FATAL: out-of-order XID insertion in KnownAssignedXids". If you think it might be a bug and are interested in hunting it down let me know and I'll help any way I can...but if you're not too worried about it then neither am I :) On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs wrote: > On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop > wrote: > > > hrmz, still basically the same behaviour. I think it might be a *little* > > better with this patch. Before when under load it would start up quickly > > maybe 2 or 3 times out of 10 attempts...with this patch it might be up > to 4 > > or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm > still > > only seeing your log statement a single time (I'm running at debug2). I > > have discovered something though - when the standby is in this state if I > > force a checkpoint on the primary then the standby comes right up. Is > there > > anything I check or try for you to help figure this out?...or is it > > actually as designed that it could take 10-ish minutes to start up even > > after all clients have disconnected from the primary? > > Thanks for testing. The improvements cover specific cases, so it's not > subject to chance; it's not a performance patch. > > It's not "designed" to act the way you describe, but it does. > > The reason this occurs is that you have a transaction heavy workload > with occasional periods of complete quiet and a base backup time that > is much less than checkpoint_timeout. If your base backup was slower > the checkpoint would have hit naturally before recovery had reached a > consistent state. 
Which seems fairly atypical. I guess you're doing > this on a test system. > > It seems cheap to add in a call to LogStandbySnapshot() after each > call to pg_stop_backup(). > > Does anyone think this case is worth adding code for? Seems like one > more thing to break. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
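The workaround settled on in this exchange can be sketched as follows, run on the primary; the backup label is arbitrary and the copy step is pseudocode:

```sql
-- Hypothetical sketch of the base-backup sequence with the extra
-- checkpoint discussed above, so the standby can reach a consistent
-- state without waiting up to checkpoint_timeout:
SELECT pg_start_backup('base');
-- ... copy the data directory to the standby here ...
SELECT pg_stop_backup();
CHECKPOINT;  -- force a fresh checkpoint record onto the WAL stream
```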
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Stephen Frost wrote: -- Start of PGP signed section. > * Bruce Momjian (br...@momjian.us) wrote: > > I have not seen enough demand to make this a user-visible configuration. > > We can just tell them to create a postgres database. Frankly, they > > would have had to _remove_ the postgres database after initdb for it not > > to be there, and they are instructed to change nothing about the new > > database. > > Yes, they would have removed it because they didn't want it. As I > recall, part of the agreement to create an extra database by default was > that it could be removed if users didn't want it. Turning around and > then saying "but things won't work if it's not there" isn't exactly > supporting users who decide to remove it. Well, you would have to remove it _after_ you did the pg_upgrade. Right now if you do a normal dump/restore upgrade, you also have to re-remove the postgres database. We don't have any mechanism to drop a database as part of pg_dumpall's restore if it didn't exist in the old cluster. > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can > be configured to connect to other databases instead, including for > globals. Well, please show me the code, because the C code I showed you had the '\connect postgres' string hardcoded in there. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
* Bruce Momjian (br...@momjian.us) wrote: > Well, you would have to remove it _after_ you did the pg_upgrade. Right > now if you do a normal dump/restore upgrade, you also have to re-remove > the postgres database. We don't have any mechanism to drop a database > as part of pg_dumpall's restore if it didn't exist in the old cluster. Perhaps not, but it *could* be removed after the restore and all would be well, yes? > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can > > be configured to connect to other databases instead, including for > > globals. > > Well, please show me the code, because the C code I showed you had the > '\connect postgres' string hardcoded in there. I guess there's a difference between "can be used and will work correctly, but might create some extra garbage" and "can't be used at all". pg_dumpall has a -l option for connecting to whatever *existing* database you have to pull the global data, and then it'll restore into a clean initdb'd cluster, after which you could remove postgres. Admittedly, if you initdb the cluster, drop postgres, and then try a restore, it would fail. Personally, I'm not a big fan of that (why don't we use what was passed in to -l for that..?), but, practically, it's not that big a deal. I don't know many folks who are going to restore a pg_dumpall dump into an existing configuration where they've monkied with things (that could cause all kinds of other issues anyway, role conflicts, etc). If I understood correctly (perhaps I didn't..), is that pg_upgrade doesn't have the pg_dumpall equivilant of the '-l' or '--database' option, and that's what is at issue here. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Add socket dir to pg_config..?
Stephen Frost writes: > Was just wondering if we might want to include the default socket > directory that was compiled in as part of the pg_config output..? [ shrug... ] We don't report the compiled-in port number, which is considerably more critical. And we don't report changes in any of the other stuff in pg_config_manual.h. MHO is that changing the socket directory is only marginally supported, and we shouldn't encourage it unless we're prepared to fully support it (which we can't really). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Stephen Frost wrote: -- Start of PGP signed section. > * Bruce Momjian (br...@momjian.us) wrote: > > Well, you would have to remove it _after_ you did the pg_upgrade. Right > > now if you do a normal dump/restore upgrade, you also have to re-remove > > the postgres database. We don't have any mechanism to drop a database > > as part of pg_dumpall's restore if it didn't exist in the old cluster. > > Perhaps not, but it *could* be removed after the restore and all would > be well, yes? I am not sure how much pg_dump worries about removing system objects during a restore --- I don't think it does that at all actually. I thought we did that for plpgsql, but I don't see that in the C code now, and testing doesn't show it being removed by pg_dump. > > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can > > > be configured to connect to other databases instead, including for > > > globals. > > > > Well, please show me the code, because the C code I showed you had the > > '\connect postgres' string hardcoded in there. > > I guess there's a difference between "can be used and will work > correctly, but might create some extra garbage" and "can't be used at > all". pg_dumpall has a -l option for connecting to whatever *existing* > database you have to pull the global data, and then it'll restore into a > clean initdb'd cluster, after which you could remove postgres. Keep in mind -l might connect to a specified database to do the dump, but it will still connect to the 'postgres' database to recreate them. > Admittedly, if you initdb the cluster, drop postgres, and then try a > restore, it would fail. Personally, I'm not a big fan of that (why Right, same with pg_upgrade. > don't we use what was passed in to -l for that..?), but, practically, No idea. > it's not that big a deal. 
I don't know many folks who are going to > restore a pg_dumpall dump into an existing configuration where they've > monkeyed with things (that could cause all kinds of other issues anyway, > role conflicts, etc). > > If I understood correctly (perhaps I didn't..), is that pg_upgrade > doesn't have the pg_dumpall equivalent of the '-l' or '--database' > option, and that's what is at issue here. Well, I can modify pg_upgrade to connect to template1 easily (no need for a switch) --- the problem is that pg_dumpall requires the 'postgres' database to restore its dump file. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Bruce Momjian writes: > Stephen Frost wrote: >> Yes, they would have removed it because they didn't want it. As I >> recall, part of the agreement to create an extra database by default was >> that it could be removed if users didn't want it. Turning around and >> then saying "but things won't work if it's not there" isn't exactly >> supporting users who decide to remove it. > Well, you would have to remove it _after_ you did the pg_upgrade. As far as the *target* cluster is concerned, I have no sympathy for someone who messes with its contents before running pg_upgrade. That's an RTFM matter: you're supposed to upgrade into a virgin just-initdb'd cluster. However, it would be nice if pg_upgrade supported transferring from a *source* cluster that didn't have the postgres DB. What about creating a new, single-purpose database in the source cluster and then removing it again after we're done? regards, tom lane
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Tom Lane wrote: > Bruce Momjian writes: > > Stephen Frost wrote: > >> Yes, they would have removed it because they didn't want it. As I > >> recall, part of the agreement to create an extra database by default was > >> that it could be removed if users didn't want it. Turning around and > >> then saying "but things won't work if it's not there" isn't exactly > >> supporting users who decide to remove it. > > > Well, you would have to remove it _after_ you did the pg_upgrade. > > As far as the *target* cluster is concerned, I have no sympathy for > someone who messes with its contents before running pg_upgrade. > That's an RTFM matter: you're supposed to upgrade into a virgin > just-initdb'd cluster. > > However, it would be nice if pg_upgrade supported transferring from a > *source* cluster that didn't have the postgres DB. I have this C comment in pg_upgrade about this: * If someone removes the 'postgres' database from the old cluster and * the new cluster has a 'postgres' database, the number of databases * will not match. We actually could upgrade such a setup, but it would * violate the 1-to-1 mapping of database counts, so we throw an error * instead. We would detect this as a database count mismatch during * upgrade, but we want to detect it during the check phase and report * the database name. Is this worth fixing? Another problem is that pg_dumpall doesn't remove the postgres database in the new cluster if it was not in the old one, so they are going to end up with a postgres database in the new cluster anyway. I could argue that pg_upgrade is better because it reports that it cannot recreate the new cluster exactly, while pg_dumpall just keeps a postgres database that was not in the old cluster. > What about creating a new, single-purpose database in the source > cluster and then removing it again after we're done? That is not a problem --- I can easily use template1. 
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
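The improvement Bruce's comment describes — detecting the mismatch during the check phase and reporting the offending database name rather than a bare count difference — could be sketched roughly as follows. This is an illustrative sketch only, not pg_upgrade's actual code; the function and parameter names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch (not pg_upgrade's real implementation): given the
 * database name lists of the old and new clusters, return the name of the
 * first database present in the new cluster but missing from the old one
 * (e.g. 'postgres' after a user dropped it from the source cluster), or
 * NULL if the lists match 1-to-1 by name.
 */
static const char *
find_db_missing_from_old(const char **old_dbs, int n_old,
                         const char **new_dbs, int n_new)
{
    for (int i = 0; i < n_new; i++)
    {
        int         found = 0;

        for (int j = 0; j < n_old; j++)
        {
            if (strcmp(new_dbs[i], old_dbs[j]) == 0)
            {
                found = 1;
                break;
            }
        }
        if (!found)
            return new_dbs[i];  /* report the name, not just a count */
    }
    return NULL;
}
```

With old = {template1} and new = {template1, postgres}, a check phase built this way would report 'postgres' up front instead of failing mid-upgrade with an unexplained database count mismatch.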
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian wrote: >> What about creating a new, single-purpose database in the source >> cluster and then removing it again after we're done? > > That is not a problem --- I can easily use template1. Huh? You just said upthread that you didn't want to use template1 because you didn't want to modify the template database. I think the point is that if you're doing something to the database that someone might object to, you oughtn't be doing it to the postgres database either. You should create a database just for pg_upgrade's use and install its crap in there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Robert Haas wrote: > On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian wrote: > >> What about creating a new, single-purpose database in the source > >> cluster and then removing it again after we're done? > > > > That is not a problem --- I can easily use template1. > > Huh? > > You just said upthread that you didn't want to use template1 because > you didn't want to modify the template database. I think the point is I don't want to use postgres and then fall back to template1 if necessary --- I would just use template1 always. > that if you're doing something to the database that someone might > object to, you oughtn't be doing it to the postgres database either. > You should create a database just for pg_upgrade's use and install its > crap in there. It installs crap in all databases to set oids on system tables, for example, so we are only creating it early in postgres (or template1) to set auth_id. Our sticking point now is that pg_dumpall has the 'postgres' database hardcoded for role creation. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Hot Standby startup with overflowed snapshots
On Thu, Oct 27, 2011 at 6:55 PM, Simon Riggs wrote: > It seems cheap to add in a call to LogStandbySnapshot() after each > call to pg_stop_backup(). > > Does anyone think this case is worth adding code for? Seems like one > more thing to break. Why at that particular time? It would maybe be nice if the master could notice when it has a plausible (non-overflowed) snapshot and log it then. But that might be more code than the problem is worth. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Include commit identifier in version() function
Hi. I think it would be useful to include the hexadecimal commit identifier in the output of the version() function, to allow a fast git checkout of that exact commit. -- pasman
Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch
Pavel Stehule writes: > there is final Wojciech's patch I looked this over a little bit, but I don't see an answer to the question I put on the commitfest entry: why is this being done in plpgsql, and not somewhere in the core code? The core has also got the concept of %TYPE, and could stand to have support for attaching [] notation to that. I'm not entirely sure if anything could be shared, but it would sure be worth looking at. I've wished for some time that %TYPE not be handled directly in read_datatype() at all, but rather in the core parse_datatype() function. This would require passing some sort of callback function to parse_datatype() to let it know what variables can be referenced in %TYPE, but we have parser callback functions just like that already. One benefit we could get that way is that the core meaning of %TYPE (type of a table field name) would still be available in plpgsql, if the name didn't match any declared plpgsql variable. However, assuming that we're sticking to just changing plpgsql for the moment ... I cannot escape the feeling that this is a large patch with a small patch struggling to get out. It should not require 500 net new lines of code to provide this functionality, when all we're doing is looking up the array type whose element type is the type the existing code can derive. I would have expected to see the grammar passing one extra flag to plpgsql_parse_cwordtype and plpgsql_parse_cwordrowtype routines that were little changed except for the addition of a step to find the array type. Another complaint is that the grammar changes assume that the first token of decl_datatype must be T_WORD or T_CWORD, which means that it fails for cases such as unreserved plpgsql keywords. This example should work, and does work in 9.1: create domain hint as int; create or replace function foo() returns void as $$ declare x hint; begin end $$ language plpgsql; but fails with this patch because "hint" is returned by the lexer as K_HINT not T_WORD. 
You might be able to get around that by adding a production with unreserved_keyword --- but I'm unsure that that would cover every case that worked before, and in any case I do not see the point of changing the grammar production at all. It gains almost nothing to have Bison parse the [] rather than doing it with C code in read_datatype(). regards, tom lane
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
Bruce Momjian writes: > Robert Haas wrote: >> that if you're doing something to the database that someone might >> object to, you oughtn't be doing it to the postgres database either. >> You should create a database just for pg_upgrade's use and install its >> crap in there. > It installs crap in all databases to set oids on system tables, It seems like you're both confusing the source and target clusters. None of that stuff gets installed in the source, does it? regards, tom lane
Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch
Hello 2011/10/28 Tom Lane : > Pavel Stehule writes: >> there is final Wojciech's patch > this is just a small note about the length of this patch. The patch was significantly smaller before it also solved the problem of derived types for composite types - it is meant to solve the problem described in this thread http://stackoverflow.com/questions/7634704/declare-variable-of-composite-type-in-postgresql-using-type Regards Pavel Stehule > I looked this over a little bit, but I don't see an answer to the > question I put on the commitfest entry: why is this being done in > plpgsql, and not somewhere in the core code? The core has also got > the concept of %TYPE, and could stand to have support for attaching [] > notation to that. I'm not entirely sure if anything could be shared, > but it would sure be worth looking at. I've wished for some time that > %TYPE not be handled directly in read_datatype() at all, but rather in > the core parse_datatype() function. This would require passing some > sort of callback function to parse_datatype() to let it know what > variables can be referenced in %TYPE, but we have parser callback > functions just like that already. One benefit we could get that way > is that the core meaning of %TYPE (type of a table field name) would > still be available in plpgsql, if the name didn't match any declared > plpgsql variable. > > However, assuming that we're sticking to just changing plpgsql for the > moment ... I cannot escape the feeling that this is a large patch with a > small patch struggling to get out. It should not require 500 net new > lines of code to provide this functionality, when all we're doing is > looking up the array type whose element type is the type the existing > code can derive. I would have expected to see the grammar passing one > extra flag to plpgsql_parse_cwordtype and plpgsql_parse_cwordrowtype > routines that were little changed except for the addition of a step to > find the array type. 
> > Another complaint is that the grammar changes assume that the first > token of decl_datatype must be T_WORD or T_CWORD, which means that > it fails for cases such as unreserved plpgsql keywords. This example > should work, and does work in 9.1: > > create domain hint as int; > > create or replace function foo() returns void as $$ > declare > x hint; > begin > end > $$ language plpgsql; > > but fails with this patch because "hint" is returned by the lexer as > K_HINT not T_WORD. You might be able to get around that by adding a > production with unreserved_keyword --- but I'm unsure that that would > cover every case that worked before, and in any case I do not see the > point of changing the grammar production at all. It gains almost > nothing to have Bison parse the [] rather than doing it with C code in > read_datatype(). > > regards, tom lane
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
On Oct 28, 2011 5:22 AM, "Tom Lane" wrote: > > Bruce Momjian writes: > > Stephen Frost wrote: > >> Yes, they would have removed it because they didn't want it. As I > >> recall, part of the agreement to create an extra database by default was > >> that it could be removed if users didn't want it. Turning around and > >> then saying "but things won't work if it's not there" isn't exactly > >> supporting users who decide to remove it. > > > Well, you would have to remove it _after_ you did the pg_upgrade. > > As far as the *target* cluster is concerned, I have no sympathy for > someone who messes with its contents before running pg_upgrade. > That's an RTFM matter: you're supposed to upgrade into a virgin > just-initdb'd cluster. > > However, it would be nice if pg_upgrade supported transferring from a > *source* cluster that didn't have the postgres DB. > > What about creating a new, single-purpose database in the source > cluster and then removing it again after we're done? How about naming this newly created database "postgres"? That would make the code simple enough - always use the postgres database, just drop it at the end if it didn't exist in the source cluster. /Magnus
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
On Oct 28, 2011 5:19 AM, "Bruce Momjian" wrote: > > Stephen Frost wrote: > > > > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can > > > > be configured to connect to other databases instead, including for > > > > globals. > > > > > > Well, please show me the code, because the C code I showed you had the > > > '\connect postgres' string hardcoded in there. > > > > I guess there's a difference between "can be used and will work > > correctly, but might create some extra garbage" and "can't be used at > > all". pg_dumpall has a -l option for connecting to whatever *existing* > > database you have to pull the global data, and then it'll restore into a > > clean initdb'd cluster, after which you could remove postgres. > > Keep in mind -l might connect to a specified database to do the dump, > but it will still connect to the 'postgres' database to recreate them. > > > Admittedly, if you initdb the cluster, drop postgres, and then try a > > restore, it would fail. Personally, I'm not a big fan of that (why > > Right, same with pg_upgrade. > > > don't we use what was passed in to -l for that..?), but, practically, > > No idea. Chicken/egg? If we did that, the pg_dumpall dump could no longer be loaded into an empty cluster since the db it wanted to talk to didn't exist yet. And restoring into an empty cluster has to be the main use for pg_dumpall after all /Magnus
Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup
On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander wrote: > Here's a version that does this. Turns out this requires a lot less > code than what was previously in there, which is always nice. > > We still need to solve the other part which is how to deal with the > partial files on restore. But this is definitely a cleaner way from a > pure pg_receivexlog perspective. > > Comments/reviews? Looks good. Minor comment: the source code comment of FindStreamingStart() seems to need to be updated. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Oct 27, 2011 at 11:57 PM, Magnus Hagander wrote: > On Thu, Oct 27, 2011 at 16:54, Tom Lane wrote: >> Magnus Hagander writes: >>> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas >>> wrote: On 27.10.2011 14:09, Fujii Masao wrote: > Yes. But that sounds unuserfriendly. Padding the WAL file manually > is easy-to-do for a user? >> >>> I'd definitely want to avoid anything that requires pg_receivexlog to >>> actually *parse* the WAL. That'll make it way more complex than I'd >>> like. >> >> What parsing? Just pad to 16MB with zeroes. In fact, I think the > > I'm just saying that *if* parsing is required, it would be bad. > >> receiver should just create the file that size to start with, and then >> write received data into it, much like normal WAL creation does. > > So when pg_receivexlog starts up, how would it know if the last file > represents a completed file, or a half-full file, without actually > parsing it? It could be a 16Mb file with 10 bytes of valid data, or a > complete file with 16Mb of valid data. > > We could always ask for a retransmit of the whole file, but if that > file is gone on the master, we won't be able to do that, and will > error out in a situation that's not actually an error. > > Though I guess if we leave the file as .partial up until this point > (per my other patch just posted), I guess this doesn't actually apply > - if the file is called .partial, we'll overwrite into it. If it's > not, then we assume it's a complete segment. Yeah, I think that we should commit the patch that you posted in the other thread, and should change pg_receivexlog so that it creates a new WAL file filled with zeros, or opens a pre-existing one, like XLogFileInit() does, before writing any streamed data. If we do this, a user can easily use a partial WAL file for recovery by renaming that file. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center