Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Robert Haas writes: > On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote: >> This fixes both the subtrans and clog bugs in one patch. > I don't see the point of changing StartupCLOG() to be an empty > function and adding a new function TrimCLOG() that does everything > StartupCLOG() used to do. +1 ... I found that overly cute also. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > >> Brewing a patch now. > > Latest thinking... confirmations or other error reports please. > > This fixes both the subtrans and clog bugs in one patch. I don't see the point of changing StartupCLOG() to be an empty function and adding a new function TrimCLOG() that does everything StartupCLOG() used to do. Seems simpler to just move the calls to StartupCLOG() wherever they need to be - i.e. remove the one that happens before WAL replay, and extricate the one at end-of-recovery from the if block which currently contains it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct 26, 2011, at 18:08, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug wrote: >> On Oct 26, 2011, at 15:57, Florian Pflug wrote: >> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but >> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint >> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts() >> takes to finish its work the more likely it becomes (assuming that CLOG writing >> and syncing doesn't happen at the very end). This fits the OP's observation of the >> problem vanishing when pg_start_backup() does an immediate checkpoint. > > As it turns out the correct fix is actually just to skip StartupClog() > until the end of recovery because it does nothing useful when executed > at that time. When I wrote the original code I remember thinking that > StartupClog() is superfluous at that point. Hm, that fixes the problem in the hot standby case, but as I said in my reply to Chris Redekop, normal crash recovery is also at risk (although the probability of hitting the bug is much smaller there). Here's my reasoning from that other mail: Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId(), and gets as far as GetNewTransactionId(). The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG extension makes it to the disk, and before any trace of the XID assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up with a nextXid whose corresponding CLOG page doesn't exist.
Quite aside from that concern, I think it's probably not a good idea for the nextXid value of a checkpoint to depend on whether wal_level was set to hot_standby or not. Our recovery code is already quite complex and hard to test, and this just adds one more combination that has to be thought about while coding and that needs to be tested. My suggestion is to fix the CLOG problem in the same way that you fixed the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before CheckPointGuts(). Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify the nextXid, since we fill out nextXid after LogStandbySnapshot() and will thus derive a higher value than LogStandbySnapshot() would have. We could then also fold GetOldestActiveTransactionId() back into your proposed LogStandbySnapshot() and thus don't need two ProcArray traversals. best regards, Florian Pflug
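Florian's proposed reordering can be illustrated with a small simulation. This is an illustrative model only, not PostgreSQL code: the helper names and the state dictionary are hypothetical stand-ins that mimic the ordering of events inside CreateCheckPoint(), showing why letting LogStandbySnapshot() re-read nextXid after CheckPointGuts() can stamp the checkpoint with a nextXid whose CLOG page was never fsynced.

```python
# Illustrative model only -- not PostgreSQL code. All names are hypothetical.
def run_checkpoint(snapshot_before_guts):
    state = {"next_xid": 100, "clog_synced_upto": 0}

    def assign_xid():            # concurrent backend: extends CLOG in memory only
        state["next_xid"] += 1

    def checkpoint_guts():       # writes + fsyncs CLOG up to the current nextXid
        state["clog_synced_upto"] = state["next_xid"]

    if snapshot_before_guts:
        # Proposed order: snapshot + nextXid first, then fsync everything.
        ckpt_next_xid = state["next_xid"]
        assign_xid()             # concurrent backend activity
        checkpoint_guts()
    else:
        # Old order (wal_level = hot_standby): guts first, snapshot second.
        checkpoint_guts()
        assign_xid()             # concurrent backend activity
        ckpt_next_xid = state["next_xid"]   # LogStandbySnapshot() re-reads nextXid

    # True when the checkpoint's nextXid is covered by the fsynced CLOG.
    return ckpt_next_xid <= state["clog_synced_upto"]
```

In this model, `run_checkpoint(False)` (the old ordering) records a nextXid past what was synced, while `run_checkpoint(True)` (the proposed ordering) is always covered by the later CheckPointGuts() fsync.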
Re: [HACKERS] Cannot
On Oct 26, 2011, at 2:58 PM, Tom Lane wrote: >> I read this as equivalent to "can be not released." Which of course is >> silly, so as I read it I realize what it means, but it trips up my overly >> logical brain. It interrupts the flow. There is no such confusion in "cannot >> be released" and thus no tripping up on meaning. > > This particular change seems like an improvement to me, but it's hardly > an adequate argument for a global search-and-replace. There might be > other places where such a change renders things *less* readable. The patch is actually quite modest; there are only a few instances of "can not". Attached. Best, David cannot.patch
Re: [HACKERS] Cannot
"David E. Wheeler" writes: > On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: >> Why? "can not" is perfectly grammatical AFAIK. > True, but there's a logic issue. Take this example from > doc/src/sgml/func.sgml: >> >> pg_advisory_xact_lock works the same as >> pg_advisory_lock, expect the lock is automatically released >> at the end of the current transaction and can not be released explicitly. >> > I read this as equivalent to "can be not released." Which of course is silly, > so as I read it I realize what it means, but it trips up my overly logical > brain. It interrupts the flow. There is no such confusion in "cannot be > released" and thus no tripping up on meaning. This particular change seems like an improvement to me, but it's hardly an adequate argument for a global search-and-replace. There might be other places where such a change renders things *less* readable. regards, tom lane
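Tom's worry about a global search-and-replace is easy to demonstrate. The sentences below are hypothetical examples (not from the PostgreSQL docs): a blind s/can not/cannot/ is safe only where "not" binds to "can"; where "not" binds to what follows, as in "can not only ... but also", the replacement silently changes the meaning.

```python
import re

def blind(s):
    # The mechanical replacement from the proposed one-liner.
    return re.sub(r"can not", "cannot", s)

ok = "The lock can not be released explicitly."
tricky = "A session can not only hold the lock but also extend it."

assert blind(ok) == "The lock cannot be released explicitly."
# The substitution still fires here, but the result reads wrong:
assert blind(tricky) == "A session cannot only hold the lock but also extend it."
```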
Re: [HACKERS] Cannot
On 10/26/2011 05:15 PM, David E. Wheeler wrote: On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Why? "can not" is perfectly grammatical AFAIK. True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml: pg_advisory_xact_lock works the same as pg_advisory_lock, expect the lock is automatically released at the end of the current transaction and can not be released explicitly. I read this as equivalent to "can be not released." Which of course is silly, so as I read it I realize what it means, but it trips up my overly logical brain. It interrupts the flow. There is no such confusion in "cannot be released" and thus no tripping up on meaning.

Here's what I would do:
1. s/expect/except that/
2. s/can not be released explicitly/can not be explicitly released/

cheers andrew
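Andrew's two fixes can be applied mechanically. The sentence below is reproduced from the thread (the original doc excerpt, including its "expect" typo); the two substitutions are just his s/// commands expressed as Python regexes.

```python
import re

sentence = ("pg_advisory_xact_lock works the same as pg_advisory_lock, "
            "expect the lock is automatically released at the end of the "
            "current transaction and can not be released explicitly.")

# 1. s/expect/except that/
fixed = re.sub(r"\bexpect\b", "except that", sentence)
# 2. s/can not be released explicitly/can not be explicitly released/
fixed = re.sub(r"can not be released explicitly",
               "can not be explicitly released", fixed)

assert "except that the lock is automatically released" in fixed
assert fixed.endswith("can not be explicitly released.")
```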
Re: [HACKERS] Cannot
On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: >> Suggested doc “patch”: >> >> grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' > > Why? "can not" is perfectly grammatical AFAIK. True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml: > > pg_advisory_xact_lock works the same as > pg_advisory_lock, expect the lock is automatically released > at the end of the current transaction and can not be released explicitly. > I read this as equivalent to "can be not released." Which of course is silly, so as I read it I realize what it means, but it trips up my overly logical brain. It interrupts the flow. There is no such confusion in "cannot be released" and thus no tripping up on meaning. Best, David
Re: [HACKERS] Cannot
On 10/26/2011 02:16 PM, David E. Wheeler wrote: Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Why? "can not" is perfectly grammatical AFAIK. cheers andrew
Re: [HACKERS] autovacuum workers warning
2011/10/26 Euler Taveira de Oliveira : > I'm not saying that is not the right direction, I'm arguing that a hint is > better than nothing. Right now the only way to know if it is out of workers > is to query pg_stat_activity frequently. The current number of autovacuum workers could go in the errmsg instead of the errhint; the errhint could then be omitted from the patch if there isn't a good hint to report. -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
Re: [HACKERS] autovacuum workers warning
Excerpts from Euler Taveira de Oliveira's message of Wed Oct 26 16:57:18 -0300 2011: > > On 26-10-2011 16:14, Alvaro Herrera wrote: > > Well, just increasing the number of workers would do nothing to solve > > the problem, because the more workers there are, the slower they work. > > The actual solution to the problem would be decreasing > > autovacuum_vacuum_cost_delay, and/or related knobs. > > > Why not? You're saying that parallelizing the work won't help? As for > autovacuum_vacuum_cost_delay, don't you think that 20ms isn't small enough to > suggest decreasing instead of increasing the number of workers? I am saying that if you have two workers running, they increase their cost delay to 40ms internally. If you increase the max to four, they would run at an effective delay of 80ms. > > Wasn't there some discussion recently on measuring the length of the > > work queue, or something like that? > > > Yes, there is. As I said, it is an expensive and approximate measure. I'm not > saying that is not the right direction, I'm arguing that a hint is better > than > nothing. Right now the only way to know if it is out of workers is to query > pg_stat_activity frequently. Well, I'm not saying there aren't other ways. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
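Alvaro's point can be put as a one-line model. This is a deliberate simplification: the real balancing code (autovac_balance_cost()) distributes the configured cost limit and delay across the running workers so the total I/O budget stays constant, and the net effect is roughly that each worker's effective delay scales with the number of active workers, which is what his 20ms/40ms/80ms figures describe.

```python
# Simplified model of autovacuum cost-delay balancing: the configured
# I/O budget is shared, so each worker's effective delay grows in
# proportion to the number of active workers. (Hypothetical helper name;
# the real logic lives in autovac_balance_cost().)
def effective_delay_ms(base_delay_ms, active_workers):
    return base_delay_ms * active_workers

assert effective_delay_ms(20, 2) == 40   # two workers -> 40ms each
assert effective_delay_ms(20, 4) == 80   # four workers -> 80ms each
```

This is why simply raising autovacuum_max_workers doesn't make vacuuming faster: the same budget is just sliced more thinly.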
Re: [HACKERS] autovacuum workers warning
On 26-10-2011 16:14, Alvaro Herrera wrote: Well, just increasing the number of workers would do nothing to solve the problem, because the more workers there are, the slower they work. The actual solution to the problem would be decreasing autovacuum_vacuum_cost_delay, and/or related knobs. Why not? You're saying that parallelizing the work won't help? As for autovacuum_vacuum_cost_delay, don't you think that 20ms isn't small enough to suggest decreasing it instead of increasing the number of workers? Wasn't there some discussion recently on measuring the length of the work queue, or something like that? Yes, there is. As I said, it is an expensive and approximate measure. I'm not saying that is not the right direction, I'm arguing that a hint is better than nothing. Right now the only way to know if it is out of workers is to query pg_stat_activity frequently. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error. Thanks guys! On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs > wrote: > > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs > wrote: > > > >> Brewing a patch now. > > > > Latest thinking... confirmations or other error reports please. > > > > This fixes both the subtrans and clog bugs in one patch. > > I'll be looking to commit that tomorrow afternoon as two separate > patches with appropriate credits. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] autovacuum workers warning
Excerpts from Euler Taveira de Oliveira's message of Tue Oct 25 16:56:12 -0300 2011: > Hi, > > Some time ago [1], I proposed printing a message every time there isn't > an autovacuum slot available and it asks for another one. It is not a complete > solution for autovacuum tuning but it would at least give us a hint that the > number of workers is insufficient to keep up with the current load. The > accurate number of slots needed would be the optimal solution but that > information is not free (it would have to check every table in the databases > available to get the approximate number of slots needed. Approximate because > some table could be finishing the operation). A new warning is better than > nothing. If we decide to improve this area in the future we should remove the > warning but right now it would be an excellent hint to tune autovacuum. Well, just increasing the number of workers would do nothing to solve the problem, because the more workers there are, the slower they work. The actual solution to the problem would be decreasing autovacuum_vacuum_cost_delay, and/or related knobs. I'm sure we need more data on autovacuum activities to properly tune autovac, but I'm not sure that "maxxing out max_workers" is one of them. Wasn't there some discussion recently on measuring the length of the work queue, or something like that? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] autovacuum workers warning
Euler Taveira de Oliveira writes:
> +		if (!can_launch)
> +			ereport(LOG,
> +					(errmsg("maximum number of autovacuum workers reached"),
> +					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
> +							 autovacuum_max_workers)));

Isn't it normal for the launcher to max out the number of workers? A log message that's generated routinely in normal operation doesn't sound particularly helpful to me ... regards, tom lane
Re: [HACKERS] Updated version of pg_receivexlog
On Tue, Oct 25, 2011 at 12:37, Magnus Hagander wrote: > On Mon, Oct 24, 2011 at 14:40, Magnus Hagander wrote: >> On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas >> wrote: + /* + * Looks like an xlog file. Parse it's position. >>> >>> s/it's/its/ >>> + */ + if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3) + { + fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"), + progname, dirent->d_name); + disconnect_and_exit(1); + } + log *= XLOG_SEG_SIZE; >>> >>> That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high. >>> It seems completely unnecessary, anyway, >> >> How do you mean completely unnecessary? We'd have to change the points >> that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a >> way to get around the overflow, but I'm not sure that's what you mean? > > Talked to Heikki on IM about this one, turns out we were both wrong. > It's needed, but there was a bug hiding under it, due to (once again) > mixing up segments and offsets. Has been fixed now. > >>> In pg_basebackup, it would be a good sanity check to check that the systemid >>> returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming >>> connection match. Just to be sure that some connection pooler didn't hijack >>> one of the connections and point to a different server. And better check >>> timelineid too while you're at it. >> >> That's a good idea. Will fix. > > Added to the new version of the patch. > > >>> How does this interact with synchronous replication? If a base backup that >>> streams WAL is in progress, and you have synchronous_standby_names set to >>> '*', I believe the in-progress backup will count as a standby for that >>> purpose. That might give a false sense of security. >> >> Ah yes. Did not think of that. Yes, it will have this problem. > > Actually, thinking more, per other mail, it won't. Because it will > never report that the data is synced to disk, so it will not be > considered for sync standby. 
> > This is something we might consider in the future (it could be a > reasonable scenario where you had this), but not in the first version. > > Updated version of the patch attached. I've applied this version with a few more minor changes that Heikki found. His comment about .partial files still applies, and I intend to address this in a follow-up commit, along with some further documentation enhancements. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
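The filename parsing Heikki commented on can be sketched as follows. This is an illustrative Python rendering, not the actual pg_receivexlog code: WAL segment names are 24 hex digits (timeline, "log" id, segment number), and because tli/log/seg are 32-bit unsigned integers in the C code, the `log *= XLOG_SEG_SIZE` step can overflow for large log ids, which is the hazard Heikki raised. Python integers don't wrap, so the uint32 behavior has to be simulated explicitly.

```python
# Sketch only -- mirrors sscanf(name, "%08X%08X%08X", &tli, &log, &seg).
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segments

def parse_xlog_name(name):
    if len(name) != 24:
        raise ValueError("could not parse xlog filename %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)

tli, log, seg = parse_xlog_name("000000010000000200000003")
assert (tli, log, seg) == (1, 2, 3)

# Simulated uint32 arithmetic: for log ids >= 0x100, log * XLOG_SEG_SIZE
# wraps to zero in 32 bits -- the overflow being discussed.
assert (0x100 * XLOG_SEG_SIZE) & 0xFFFFFFFF == 0
```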
[HACKERS] Cannot
Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Best, David
Re: [HACKERS] Range Types - typo + NULL string constructor
Robert Haas writes: > I believe that we're in trouble with XIDs as soon as you have two > active XIDs that are separated by a billion, because then you could > have a situation where some people think a given XID is in the future > and others think it's in the past. I have been wondering if we should > have some sort of active guard against that scenario; I don't think we > do at present. Sure we do. It's covered by the XID wraparound prevention logic. regards, tom lane
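The future-versus-past flip Robert describes comes from the circular XID comparison. The sketch below models TransactionIdPrecedes-style logic (the XID difference reinterpreted as a signed 32-bit number; the real function also special-cases permanent XIDs below FirstNormalTransactionId, which is omitted here): once two XIDs are more than 2^31 apart, the later-assigned XID starts comparing as being in the past, which is exactly what the wraparound-prevention logic exists to rule out.

```python
# Model of circular XID comparison: diff interpreted as signed int32.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:        # reinterpret high half as negative
        diff -= 0x100000000
    return diff < 0

assert xid_precedes(100, 200)            # normal ordering
assert xid_precedes(0xFFFFFFF0, 10)      # across wraparound: old precedes new

# Two XIDs ~2.2 billion apart (more than 2^31): the comparison flips,
# and the later XID b now looks like it is in a's past.
a, b = 1000, 1000 + 2_200_000_000
assert xid_precedes(b, a) and not xid_precedes(a, b)
```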
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Magnus Hagander writes: > On Wed, Oct 26, 2011 at 16:37, Tom Lane wrote: >> If that was what he meant, I'd vote against it. There are way too many >> people who will *not* want their databases configured to be able to >> reach out onto the net. This feature should be something that has to be >> installed by explicit user action. > That is not what I meant. > I meant installing the shared library by default, but still requiring > CREATE EXTENSION. Whether the shlib is installed by default is a decision for packagers to make, not us. At best we could make a recommendation. regards, tom lane
[HACKERS] autovacuum workers warning
Hi, Some time ago [1], I proposed printing a message every time there isn't an autovacuum slot available and it asks for another one. It is not a complete solution for autovacuum tuning but it would at least give us a hint that the number of workers is insufficient to keep up with the current load. The accurate number of slots needed would be the optimal solution but that information is not free (it would have to check every table in the databases available to get the approximate number of slots needed. Approximate because some table could be finishing the operation). A new warning is better than nothing. If we decide to improve this area in the future we should remove the warning but right now it would be an excellent hint to tune autovacuum.

[1] http://archives.postgresql.org/pgsql-hackers/2011-06/msg00678.php

-- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3b71232..4ec0f87 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -656,6 +656,12 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		can_launch = (AutoVacuumShmem->av_freeWorkers != NULL);
 
+		if (!can_launch)
+			ereport(LOG,
+					(errmsg("maximum number of autovacuum workers reached"),
+					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
+							 autovacuum_max_workers)));
+
 		if (AutoVacuumShmem->av_startingWorker != NULL)
 		{
 			int			waittime;
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan wrote: > > On 10/26/2011 12:47 PM, Magnus Hagander wrote: >>> >>> If that was what he meant, I'd vote against it. There are way too many >>> people who will *not* want their databases configured to be able to >>> reach out onto the net. This feature should be something that has to be >>> installed by explicit user action. >> >> That is not what I meant. >> >> I meant installing the shared library by default, but still requiring >> CREATE EXTENSION. >> > I don't see why it should be different from other standard modules, such as > citext or hstore, both of which have pretty wide use, and less possible > security implications than this. As I stated earlier, it's really back to the old discussion of splitting up contrib. This would be the "additional module" part, but not the "example of how to do things" part of that... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Range Types - typo + NULL string constructor
Heikki Linnakangas writes: > That's not what Jeff is referring to here, though (correct me if I'm > wrong). He's talking about the one-item cache in > TransactionIdLogFetch(). You don't need long-running transactions > for that to get confused. Specifically, this could happen: > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > The row has xmin = 123456, and it is cached as committed in the > one-item cache by TransactionLogFetch. > 2. A lot of time passes. Everything is frozen, and XID wrap-around > happens. (Session A is idle but not in a transaction, so it doesn't > inhibit freezing.) > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > By coincidence, this transaction was assigned XID 123456. > 4. In session A: SELECT * FROM foo WHERE id = 2; > The one-item cache still says that 123456 committed, so we return > the tuple inserted by the aborted transaction. Oops. I think this is probably a red herring, because it's impossible for a session to remain totally idle for that long --- see sinval updating. (If you wanted to be really sure, we could have sinval reset clear the TransactionLogFetch cache, but I doubt it's worth the trouble.) regards, tom lane
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On 10/26/2011 12:47 PM, Magnus Hagander wrote: If that was what he meant, I'd vote against it. There are way too many people who will *not* want their databases configured to be able to reach out onto the net. This feature should be something that has to be installed by explicit user action. That is not what I meant. I meant installing the shared library by default, but still requiring CREATE EXTENSION. I don't see why it should be different from other standard modules, such as citext or hstore, both of which have pretty wide use, and less possible security implications than this. cheers andrew
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > >> Brewing a patch now. > > Latest thinking... confirmations or other error reports please. > > This fixes both the subtrans and clog bugs in one patch. I'll be looking to commit that tomorrow afternoon as two separate patches with appropriate credits. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Range Types - typo + NULL string constructor
On Wed, 2011-10-26 at 12:19 -0400, Robert Haas wrote: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > > The row has xmin = 123456, and it is cached as committed in the one-item > > cache by TransactionLogFetch. > > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > > (Session A is idle but not in a transaction, so it doesn't inhibit > > freezing.) > > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > > By coincidence, this transaction was assigned XID 123456. > > 4. In session A: SELECT * FROM foo WHERE id = 2; > > The one-item cache still says that 123456 committed, so we return the > > tuple inserted by the aborted transaction. Oops. Yes, that's the scenario I was talking about. > Oh, hmm. That sucks. It didn't seem like a major concern a while ago: http://archives.postgresql.org/message-id/28107.1291264...@sss.pgh.pa.us Regards, Jeff Davis
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Wed, Oct 26, 2011 at 16:37, Tom Lane wrote: > Shigeru Hanada writes: >> (2011/10/25 19:15), Magnus Hagander wrote: >>> I have not looked at the code itself, but I wonder if we shouldn't >>> consider making this a part of core-proper, not just a contrib module. >>> The fact that it isn't *already* available in core surprises a lot of >>> people... > >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql >> so that it's available just after initdb? > > If that was what he meant, I'd vote against it. There are way too many > people who will *not* want their databases configured to be able to > reach out onto the net. This feature should be something that has to be > installed by explicit user action. That is not what I meant. I meant installing the shared library by default, but still requiring CREATE EXTENSION. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Range Types - typo + NULL string constructor
Excerpts from Robert Haas's message of Wed Oct 26 13:19:47 -0300 2011: > On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas > wrote: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > > The row has xmin = 123456, and it is cached as committed in the one-item > > cache by TransactionLogFetch. > > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > > (Session A is idle but not in a transaction, so it doesn't inhibit > > freezing.) > > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > > By coincidence, this transaction was assigned XID 123456. > > 4. In session A: SELECT * FROM foo WHERE id = 2; > > The one-item cache still says that 123456 committed, so we return the > > tuple inserted by the aborted transaction. Oops. > > Oh, hmm. That sucks. Can this be solved by simple application of the Xid epoch? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Range Types - typo + NULL string constructor
On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas wrote: > On 26.10.2011 18:42, Robert Haas wrote: >> >> On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: >>> >>> Aren't there a few other cases like this floating around the code? I >>> know the single-xid cache is potentially vulnerable to xid wraparound >>> for the same reason. >> >> I believe that we're in trouble with XIDs as soon as you have two >> active XIDs that are separated by a billion, ... > > That's not what Jeff is referring to here, though (correct me if I'm wrong). > He's talking about the one-item cache in TransactionIdLogFetch(). You don't > need long-running transactions for that to get confused. Specifically, > this could happen: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > The row has xmin = 123456, and it is cached as committed in the one-item > cache by TransactionLogFetch. > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > (Session A is idle but not in a transaction, so it doesn't inhibit > freezing.) > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > By coincidence, this transaction was assigned XID 123456. > 4. In session A: SELECT * FROM foo WHERE id = 2; > The one-item cache still says that 123456 committed, so we return the > tuple inserted by the aborted transaction. Oops. Oh, hmm. That sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
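Heikki's four-step scenario can be replayed with a toy model. This is purely illustrative (the dict-based "clog" and cache are hypothetical stand-ins for pg_clog and the one-item cache in TransactionLogFetch): the cached "committed" verdict for XID 123456 survives a wraparound, so when the same numeric XID is reassigned to a transaction that aborts, the stale cache entry reports the wrong status.

```python
clog = {}     # xid -> "committed"/"aborted": the authoritative transaction log
cache = {}    # session-local one-item cache, never invalidated in this model

def record(xid, status):
    clog[xid] = status

def fetch(xid):
    if xid in cache:
        return cache[xid]      # one-item cache hit -- stale after wraparound
    status = clog[xid]
    cache.clear()              # keep at most one entry, like the real cache
    cache[xid] = status
    return status

record(123456, "committed")
assert fetch(123456) == "committed"   # step 1: session A caches the verdict

# steps 2-3: XID wraparound; the same numeric xid is reused and aborts
record(123456, "aborted")

# step 4: session A consults its cache and gets the wrong answer
assert fetch(123456) == "committed"   # should be "aborted" -- the bug
```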
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services oldestActiveXid_fixed.v2.patch
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug wrote: > On Oct26, 2011, at 15:57 , Florian Pflug wrote: >> As you said, the CLOG page corresponding to nextId >> *should* always be accessible at the start of recovery (Unless the whole file >> has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. >> Yet the error suggests that the CLOG is, in fact, too short. What I said >> is that we shouldn't apply any fix (for the CLOG problem) before we >> understand >> the reason for that apparent contradiction. > > Ha! I think I've got a working theory. > > In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint > record, and then call CheckPointGuts() which does the actual writing and > fsyncing. > So far, that's fine. If a transaction ID is assigned before we compute the > checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() > will > make sure the new CLOG page goes to disk. > > But, if wal_level = hot_standby, we also call LogStandbySnapshot() in > CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be > fine too, except that LogStandbySnapshot() re-assigns the *current* value of > ShmemVariableCache->nextXid to the checkpoint's nextXid field. > > Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), > but > before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint > whose CLOG page hasn't necessarily made it to the disk yet. The longer > CheckPointGuts() > takes to finish its work the more likely it becomes (assuming that CLOG > writing > and syncing doesn't happen at the very end). This fits the OP's observation > of the > problem vanishing when pg_start_backup() does an immediate checkpoint. This is the correct explanation. I've just come back into Wifi range, so I was just writing to you with this explanation but your original point that nextxid must be wrong deserves credit. 
OTOH I was just waiting to find out what the reason for the physical read was, rather than guessing. Notice that the nextxid value isn't wrong; it's just not the correct value to use for starting the clog. As it turns out, the correct fix is actually just to skip StartupCLOG() until the end of recovery, because it does nothing useful when executed at that time. When I wrote the original code I remember thinking that StartupCLOG() is superfluous at that point. Brewing a patch now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 17:36 , Chris Redekop wrote: >> And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. > Yes, this is correct... attempting to start as a hot standby will produce the pg_clog error repeatedly, and then, without changing anything else, just turning hot standby off, it will start up successfully. Yup, because with hot standby disabled (on the client side), StartupCLOG() happens after recovery has completed. That, at the very least, makes the problem very unlikely to occur in the non-hot-standby case. I'm not sure it's completely impossible, though. Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId(), and gets as far as GetNewTransactionId(). The assigned XID requires a CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG extension makes it to disk, and before any trace of the XID assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up with a nextXid whose corresponding CLOG page doesn't exist. >> This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. > Note that this is *not* the behaviour I'm seeing. It's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint. 
Yeah, I should have said "of the problem's likelihood decreasing" instead of "vanishing". The point is, the longer the checkpoint takes, the higher the chance that nextXid is advanced far enough to require a CLOG extension. That alone isn't enough to trigger the error - the CLOG extension must also *not* make it to disk before the checkpoint completes - but it's a required precondition for the error to occur. best regards, Florian Pflug
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug wrote: > On Oct26, 2011, at 15:12 , Simon Riggs wrote: >> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: >>> The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. >> You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. > The actual error message also supports that theory. Here's the relevant snippet from the OP's log (found in ca9fd2fe.1d8d2%linas.virba...@continuent.com):
>
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.
>
> Note that it says "Success" at the end of the second log entry. That can only happen, I think, if we're trying to read the page adjacent to the last page in the file. The seek would be successful, and the subsequent read() would indicate EOF by returning zero bytes. None of the calls would set errno. If there was a real IO error, read() would set errno, and if the page wasn't adjacent to the last page in the file, seek() would set errno. In both cases we'd see the corresponding error message, not "Success".

And even more pointedly, in the original go-around on this: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056 he reported that clog/ after the pg_start_backup call contained:

-rw------- 1 postgres postgres 8192 Sep 23 14:31

which changed during the rsync phase to this:

-rw------- 1 postgres postgres 16384 Sep 23 14:33

But on the slave, of course, it was copied before it was extended, so it was the original size (that's ok, that's the point of recovery after the backup):

-rw------- 1 postgres postgres 8192 Sep 23 14:31

With the error:

2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file "pg_clog/" at offset 8192: Success. 
And that error happens *before* recovery can even get attempted. And that if he copied the "recent" clog/ from the master, it did start up. And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2011/10/26 Robert Haas : > 2011/10/26 Shigeru Hanada : >> (2011/10/25 19:15), Magnus Hagander wrote: >>> 2011/10/25 Shigeru Hanada: >>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. >>> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? It would be accomplished with some more changes: >> * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install a dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included into the core binary because we must avoid linking libpq with the server binary directly. This method is also used for libwalreceiver of the replication module. >> * Create the pgsql_fdw extension during initdb invocation, like plpgsql. >> These are not trivial, but not so difficult. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature. > I agree. pgsql_fdw will be a nice feature, but there's no reason to think that everyone will want it installed by default, and there are some security reasons to think that they might not. On the flip side, pushing it out of contrib and onto pgfoundry or whatever makes it unnecessarily difficult to install, and not as many people will benefit from it. So contrib seems exactly right to me. I also agree. pgsql_fdw will be worth having in the main tree as a contrib module. It will give us a clear opportunity to test new FDW features that use RDBMS characteristics, such as join push-down. 
However, it should be a separate discussion whether it should be installed by default. Thanks, -- KaiGai Kohei
Re: [HACKERS] Range Types - typo + NULL string constructor
On 26.10.2011 18:42, Robert Haas wrote: > On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: >> Aren't there a few other cases like this floating around the code? I know the single-xid cache is potentially vulnerable to xid wraparound for the same reason. > I believe that we're in trouble with XIDs as soon as you have two active XIDs that are separated by a billion, ... That's not what Jeff is referring to here, though (correct me if I'm wrong). He's talking about the one-item cache in TransactionLogFetch(). You don't need long-running transactions for that to get confused. Specifically, this could happen:

1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; The row has xmin = 123456, and it is cached as committed in the one-item cache by TransactionLogFetch.
2. A lot of time passes. Everything is frozen, and XID wrap-around happens. (Session A is idle but not in a transaction, so it doesn't inhibit freezing.)
3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK; By coincidence, this transaction was assigned XID 123456.
4. In session A: SELECT * FROM foo WHERE id = 2; The one-item cache still says that 123456 committed, so we return the tuple inserted by the aborted transaction. Oops.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
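Heikki's four-step scenario is easy to reproduce with a toy model of the one-item cache. This is an illustrative Python sketch, not PostgreSQL's actual code; the class and method names are invented:

```python
# A toy version (not PostgreSQL's real implementation) of the one-item
# cache in TransactionLogFetch(), showing how XID reuse after wraparound
# can return a stale commit status.

COMMITTED, ABORTED = "committed", "aborted"

class TransactionLog:
    def __init__(self):
        self.clog = {}            # authoritative per-XID status
        self.cached_xid = None    # one-item cache: last XID looked up
        self.cached_status = None

    def set_status(self, xid, status):
        self.clog[xid] = status

    def fetch(self, xid):
        if xid == self.cached_xid:    # cache hit: the CLOG is not consulted
            return self.cached_status
        status = self.clog[xid]
        self.cached_xid, self.cached_status = xid, status
        return status

log = TransactionLog()
log.set_status(123456, COMMITTED)
assert log.fetch(123456) == COMMITTED   # session A reads a row with xmin = 123456

# ... XID wrap-around; the recycled XID 123456 is assigned to a new
# transaction, which rolls back:
log.set_status(123456, ABORTED)

print(log.fetch(123456))  # "committed" -- stale cache hit, aborted row visible
```

The stale hit in the last line is exactly step 4 above: the cache is never told that the XID now names a different transaction.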
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Stephen Frost writes: > I'm in favor of making that distinction. I would still have pgsql_fdw, file_fdw, etc, be packaged more-or-less the same way and still use the CREATE EXTENSION framework, of course. We called that idea “core extension” at the latest hackers meeting, and Greg Smith had a patch with a first selection of extensions to package this way. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 15:57 , Florian Pflug wrote: > As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. Yet the error suggests that the CLOG is, in fact, too short. What I said is that we shouldn't apply any fix (for the CLOG problem) before we understand the reason for that apparent contradiction. Ha! I think I've got a working theory. In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint record, and then call CheckPointGuts() which does the actual writing and fsyncing. So far, that's fine. If a transaction ID is assigned before we compute the checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will make sure the new CLOG page goes to disk. But, if wal_level = hot_standby, we also call LogStandbySnapshot() in CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be fine too, except that LogStandbySnapshot() re-assigns the *current* value of ShmemVariableCache->nextXid to the checkpoint's nextXid field. Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint whose CLOG page hasn't necessarily made it to disk yet. The longer CheckPointGuts() takes to finish its work, the more likely this becomes (assuming that CLOG writing and syncing doesn't happen at the very end). This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. I dunno how to fix this, though, since I don't really understand why LogStandbySnapshot() needs to modify the checkpoint's nextXid. Simon, is there some documentation on what assumptions the hot standby code makes about the various XID fields included in a checkpoint? 
best regards, Florian Pflug
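For what it's worth, the window Florian describes can be sketched as a toy simulation. This is illustrative Python, not PostgreSQL code; the dictionaries and function names are invented, with CLOG_XACTS_PER_PAGE matching pg_clog's 8 kB pages at 2 status bits per transaction:

```python
# Sketch of the ordering inside CreateCheckPoint() with wal_level =
# hot_standby: CheckPointGuts() flushes the CLOG pages that exist at that
# moment, but LogStandbySnapshot() later re-reads nextXid into the
# checkpoint record. If the CLOG was extended in between, the recorded
# nextXid can point into a page that was never fsync'd.

CLOG_XACTS_PER_PAGE = 32768  # 8 kB page, 2 bits per transaction

def clog_page(xid):
    return xid // CLOG_XACTS_PER_PAGE

def create_checkpoint(shmem, disk, hot_standby=True):
    """Simulate CreateCheckPoint(): flush, then (optionally) re-read nextXid."""
    ckpt_next_xid = shmem["next_xid"]                  # nextXid captured early
    disk["synced_pages"] = set(shmem["clog_pages"])    # CheckPointGuts()
    # Concurrent activity between CheckPointGuts() and LogStandbySnapshot():
    shmem["next_xid"] += CLOG_XACTS_PER_PAGE           # XID use crosses a page
    shmem["clog_pages"].add(clog_page(shmem["next_xid"]))  # ExtendCLOG (dirty only)
    if hot_standby:
        ckpt_next_xid = shmem["next_xid"]              # LogStandbySnapshot() overwrites it
    return ckpt_next_xid

shmem = {"next_xid": 22811359, "clog_pages": {clog_page(22811359)}}
disk = {"synced_pages": set()}
ckpt_next_xid = create_checkpoint(shmem, disk)

# After a crash, StartupCLOG() would try to read this XID's page:
print(clog_page(ckpt_next_xid) in disk["synced_pages"])  # False: page never hit disk
```

With `hot_standby=False` the early capture of nextXid is kept and its page is always among the flushed ones, which mirrors why the problem is tied to wal_level = hot_standby.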
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2011/10/26 Shigeru Hanada : > (2011/10/25 19:15), Magnus Hagander wrote: >> 2011/10/25 Shigeru Hanada: >>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. >> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... > Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? It would be accomplished with some more changes: > > * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install a dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included into the core binary because we must avoid linking libpq with the server binary directly. This method is also used for libwalreceiver of the replication module. > * Create the pgsql_fdw extension during initdb invocation, like plpgsql. > > These are not trivial, but not so difficult. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature. I agree. pgsql_fdw will be a nice feature, but there's no reason to think that everyone will want it installed by default, and there are some security reasons to think that they might not. On the flip side, pushing it out of contrib and onto pgfoundry or whatever makes it unnecessarily difficult to install, and not as many people will benefit from it. So contrib seems exactly right to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] cache lookup failed in plpgsql - reason?
Hello, one of my customers reported a random issue. He uses a procedure with the following fragment:

IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname = 'tmp_object_state_change' AND relkind = 'r' AND pg_table_is_visible(oid) ) THEN
  CREATE TEMPORARY TABLE tmp_object_state_change ( object_id INTEGER, object_hid INTEGER, new_states INTEGER[], old_states INTEGER[] );
ELSE
  TRUNCATE tmp_object_state_change;
END IF;

These lines sometimes raise an error:

Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-1] 2011-10-25 20:13:44 CEST adifd 29970 ERROR: cache lookup failed for relation 319883311
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-2] 2011-10-25 20:13:44 CEST adifd 29970 CONTEXT: SQL statement "SELECT NOT EXISTS( SELECT relname FROM pg_class WHERE relname =
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-3] 'tmp_object_state_change' AND relkind = 'r' AND pg_table_is_visible(oid) )"
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-4] PL/pgSQL function "update_object_states" line 2 at IF
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-5] 2011-10-25 20:13:44 CEST adifd 29970 STATEMENT: SELECT update_object_states($1::integer)

I don't see a reason why the cache should be broken on this query. He uses Pg 8.3.15. Any idea? Regards, Pavel Stehule
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. Yes, this is correct... attempting to start as a hot standby will produce the pg_clog error repeatedly, and then, without changing anything else, just turning hot standby off, it will start up successfully. > This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. Note that this is *not* the behaviour I'm seeing. It's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint. > This is a different problem and has already been reported by one of your colleagues in a separate thread, and answered in detail by me there. There is no bug related to this error message. Excellent... I will continue this discussion in that thread.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 15:12 , Simon Riggs wrote: > On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: >> The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. > You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. The actual error message also supports that theory. Here's the relevant snippet from the OP's log (found in ca9fd2fe.1d8d2%linas.virba...@continuent.com):

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That can only happen, I think, if we're trying to read the page adjacent to the last page in the file. The seek would be successful, and the subsequent read() would indicate EOF by returning zero bytes. None of the calls would set errno. If there was a real IO error, read() would set errno, and if the page wasn't adjacent to the last page in the file, seek() would set errno. In both cases we'd see the corresponding error message, not "Success". > The error is very specifically referring to 22811359, which is the nextxid from pg_control and updated by checkpoint. Where does that XID come from? The only reference to that XID that I can find in the archives is in your message CA+U5nMKUUoA8kRG=itfso5nzue3x_kdjz78eaun3_fkmq-u...@mail.gmail.com > 22811359 is mid-way through a clog page, so prior xids will already have been allocated, pages extended and then those pages fsynced before the end of pg_start_backup(). So it shouldn't be possible for that page to be absent from the base backup, unless the base backup was taken without a preceding checkpoint, which it seems is not the case from the script output. Or unless the nextId we store in the checkpoint is for some reason higher than it should be. 
Or unless nextId somehow gets mangled during recovery. Or unless there's some interaction between VACUUM and CHECKPOINTs that we're overlooking... > Note that if you are correct, then the solution is to extend clog, which Florian disagrees with as a solution. That's not what I said. As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. Yet the error suggests that the CLOG is, in fact, too short. What I said is that we shouldn't apply any fix (for the CLOG problem) before we understand the reason for that apparent contradiction. Doing so anyway just to get rid of the error seems dangerous. What happens, for example, to the CLOG state of transactions earlier than the checkpoint's nextId? Their COMMIT records may very well lie before the checkpoint's REDO pointer, so the CLOG we copied had better contain their correct state. Yet if it does, then why isn't nextId's CLOG page accessible? best regards, Florian Pflug
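The ": Success" detail in both reports follows from POSIX read() semantics: reading at or past end-of-file is not an error at the OS level, it simply returns 0 bytes and leaves errno alone, so strerror(errno) still reports success after the short read. A small self-contained demonstration using a scratch file (Python; assumes a POSIX-style system where os.pread is available):

```python
# Demonstrate that a read at the offset just past EOF returns zero bytes
# rather than failing -- the situation in the report, where nextXid's CLOG
# page lies just beyond the end of the file copied at backup start.

import os
import tempfile

SLRU_PAGESIZE = 8192

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * SLRU_PAGESIZE)   # a CLOG-like file with one 8 kB page
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Read the page *after* the last one in the file:
    data = os.pread(fd, SLRU_PAGESIZE, SLRU_PAGESIZE)
    print(len(data))   # 0 -- EOF, not an I/O error; errno is left untouched
finally:
    os.close(fd)
    os.unlink(path)
```

A real I/O error would instead raise (set errno at the C level), which is why "Success" in the message pinpoints a short read at EOF rather than hardware trouble.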
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Shigeru Hanada writes: > (2011/10/25 19:15), Magnus Hagander wrote: >> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... > Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? If that was what he meant, I'd vote against it. There are way too many people who will *not* want their databases configured to be able to reach out onto the net. This feature should be something that has to be installed by explicit user action. regards, tom lane
Re: [HACKERS] Range Types - typo + NULL string constructor
On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: > On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote: >> Hmm, I don't think that's safe. After Oid wraparound, a range type oid might get reused for some other range type, and the cache would return stale values. Extremely unlikely to happen by accident, but could be exploited by an attacker. > Any ideas on how to remedy that? I don't have another plan for making it perform well. Plugging it into the cache invalidation mechanism seems like overkill, but I suppose that would solve the problem. > Aren't there a few other cases like this floating around the code? I know the single-xid cache is potentially vulnerable to xid wraparound for the same reason. I believe that we're in trouble with XIDs as soon as you have two active XIDs that are separated by a billion, because then you could have a situation where some people think a given XID is in the future and others think it's in the past. I have been wondering if we should have some sort of active guard against that scenario; I don't think we do at present. But OID wraparound is not the same as XID wraparound. It's far more common, I think, for a single transaction to use lots of OIDs than it is for it to use lots of XIDs (i.e. have many subtransactions). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
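The "in the future vs. in the past" ambiguity Robert mentions comes from XIDs being compared modulo 2^32. A simplified sketch of that circular comparison (in the spirit of PostgreSQL's TransactionIdPrecedes, but ignoring the special-cased permanent XIDs):

```python
# Circular XID comparison: the difference is interpreted as a signed
# 32-bit integer, so ordering is only well-defined while all live XIDs
# fit within half the XID space (2^31) of each other.

def xid_precedes(id1, id2):
    """True if id1 is 'older' than id2 under wraparound arithmetic."""
    diff = (id1 - id2) & 0xFFFFFFFF
    if diff >= 1 << 31:          # reinterpret as signed 32-bit
        diff -= 1 << 32
    return diff < 0

# Nearby XIDs order as expected:
print(xid_precedes(100, 200))                        # True

# Separated by ~1 billion: still consistent.
print(xid_precedes(100, 100 + 10**9))                # True

# Separated by more than 2^31: the ordering silently flips, so one
# backend can see an XID as "in the past" while another, comparing
# against a different reference XID, sees it as "in the future".
print(xid_precedes(100, (100 + 2**31 + 1) % 2**32))  # False
```

The hard limit on the flip is 2^31; Robert's "a billion" figure leaves headroom before live XIDs get anywhere near that boundary.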
Re: [HACKERS] TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
On Tue, Oct 25, 2011 at 11:24 PM, Tom Lane wrote: > Even given your recent changes to reduce the overhead of checking for sinval messages, I'm not sure that it'd be practical to move the sinval message processing to just-before-we-look-up-a-cache-entry. Wait a minute: I'm confused. What's at issue here, at least AIUI, is taking a TOAST pointer and fetching the corresponding value. But that must involve reading from the TOAST table, and our usual paradigm for reading from system catalogs is (1) take AccessShareLock, (2) read the relevant rows, (3) release AccessShareLock. If we're doing that here, then AcceptInvalidationMessages() is already getting called. If we're not, this seems horribly unsafe; aside from the current bug, somebody could rewrite the table in the interim. > Right now, we do AcceptInvalidationMessages basically once per table per query (or maybe it's twice or so, but anyway a very small multiplier on that). If we try to do it every time through SearchSysCache, we are probably talking two to three orders of magnitude more checks, which ISTM is certain to push the sinval queue back up to the top of the heap for contention. I think we've pretty much got the *contention* licked, barring an increase in the number of things that generate sinval messages, but calling it too often will still be slow, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: > The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. The error is very specifically referring to 22811359, which is the nextxid from pg_control and updated by checkpoint. 22811359 is mid-way through a clog page, so prior xids will already have been allocated, pages extended and then those pages fsynced before the end of pg_start_backup(). So it shouldn't be possible for that page to be absent from the base backup, unless the base backup was taken without a preceding checkpoint, which it seems is not the case from the script output. Note that if you are correct, then the solution is to extend clog, which Florian disagrees with as a solution. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs wrote: >> It's very likely that it's a PostgreSQL problem, though. It's probably not a pilot error since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug. > The way forwards here is for someone to show the clog file that causes the error and find out why the call to read() fails. Sorry, I thought the problem was obvious. Either that, or I've completely missed something in these threads... I'll admit to not following this one very closely anymore... When the backup started, the clog was small. So on the "recovering instance", the clog is small. PostgreSQL is supposed to be able to deal with any file as it was when the backup starts. When the backup is stopped, clog is big. But that file was copied after the backup was started, not after the backup finished. So its size is only guaranteed to be as big as it was when the backup started. Recovery is responsible for extending it as it was extended during the backup period on the master. The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug wrote: >> Chris' clog error was caused by a file read error. The file was opened, we did a seek within the file and then the call to read() failed to return a complete page from the file. >> The xid shown is 22811359, which is the nextxid in the control file. >> So StartupCLOG() must have failed trying to read the clog page from disk. > Yep. >> That isn't a Hot Standby problem, a recovery problem, nor is it certain it's a PostgreSQL problem. > It's very likely that it's a PostgreSQL problem, though. It's probably not a pilot error since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug. The way forwards here is for someone to show the clog file that causes the error and find out why the call to read() fails. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] patch for distinguishing PG instances in event log v2
From: "Magnus Hagander" > 2011/7/16 MauMau : >> Hello, The attached file is a revised patch which reflects all review comments by Magnus in: http://archives.postgresql.org/pgsql-hackers/2011-07/msg00839.php > I have applied this patch after another round of rather extensive modifications. Thank you, Magnus. I'll look at the final code later. Regards, MauMau
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct25, 2011, at 13:39 , Florian Pflug wrote: > On Oct25, 2011, at 11:13 , Simon Riggs wrote: >> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs wrote: >>> We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means that whole pages are uninitialised in subtrans. The bug only shows up if you do enough transactions (2048 is always enough) to move to the next subtrans page between the redo pointer and the checkpoint record while at the same time we do not have a long running transaction that spans those two points. That's just enough to happen reasonably frequently on busy systems and yet just enough to have slipped through testing. >>> We must either >>> 1. During CreateCheckpoint() we should derive oldestActiveXid before we derive the redo location >> (1) looks the best way forwards in all cases. > Let me see if I understand this. > The problem seems to be that we currently derive oldestActiveXid at the end of the checkpoint, just before writing the checkpoint record. Since we use oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before that checkpoint record (but after the REDO location, of course) may very well contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt to touch these XIDs' SUBTRANS state, we error out. > Your patch seems sensible, because the checkpoint "logically" occurs at the REDO location, not the checkpoint's location, so we ought to log an oldestActiveXID corresponding to that location. Thinking about this some more (and tracing through the code), I realized that things are a bit more complicated. What we actually need to ensure, I think, is that the XID we pass to StartupSUBTRANS() is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. 
Which, at first glance, implies that we ought to use the nextId at the *beginning* of the checkpoint for SUBTRANS initialization. At second glance, however, that'd be wrong, because backends emit XLOG_XACT_ASSIGNMENT only every PGPROC_MAX_CACHED_SUBXIDS sub-XID assignments. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started may contain sub-XIDs which were assigned *before* the checkpoint started. Using oldestActiveXID works around that because we guarantee that sub-XIDs are always larger than their parent XIDs, and because only active transactions can produce XLOG_XACT_ASSIGNMENT records. So your patch is fine, but I think the reasoning about why oldestActiveXID is the correct value for StartupSUBTRANS deserves an explanation somewhere. best regards, Florian Pflug
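Florian's invariant - every XID carried by an XLOG_XACT_ASSIGNMENT emitted after the checkpoint is at least the checkpoint's oldestActiveXid - can be spot-checked with a small randomized simulation. This is illustrative Python, not PostgreSQL code; it ignores XID wraparound and models only the pieces the argument relies on (sub-XIDs are assigned after, hence greater than, their parents, and only active transactions flush assignment records):

```python
# Randomized property check: records flushed after the "checkpoint" can
# carry sub-XIDs assigned before it, yet none of their XIDs can fall
# below oldestActiveXid as computed at checkpoint time.

import random

PGPROC_MAX_CACHED_SUBXIDS = 64

random.seed(0)
next_xid = 1000
active = {}           # top-level xid -> cached (not-yet-logged) sub-xids
records = []          # (after_ckpt, top_xid, subxids): "XLOG_XACT_ASSIGNMENT"s
checkpointed = False
oldest_active_at_ckpt = None

for step in range(5000):
    op = random.random()
    if op < 0.2 or not active:                # begin: assign a top-level XID
        active[next_xid] = []
        next_xid += 1
    elif op < 0.8:                            # assign a sub-XID to some xact
        top = random.choice(list(active))
        active[top].append(next_xid)
        next_xid += 1
        if len(active[top]) >= PGPROC_MAX_CACHED_SUBXIDS:
            records.append((checkpointed, top, active[top]))  # flush cache
            active[top] = []
    else:                                     # commit/abort some xact
        del active[random.choice(list(active))]
    if step == 2500:                          # "checkpoint" mid-run
        checkpointed = True
        oldest_active_at_ckpt = min(active) if active else next_xid

# Every assignment record emitted after the checkpoint respects the bound:
post = [r for r in records if r[0]]
print(all(min([top] + subs) >= oldest_active_at_ckpt
          for _, top, subs in post))
```

The check holds for the reason Florian gives: a record flushed after the checkpoint belongs to a transaction still active then, so its top-level XID is at least the minimum active XID at checkpoint time, and its sub-XIDs are larger still.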
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct25, 2011, at 14:51 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug wrote:
>
>> What I don't understand is how this affects the CLOG. How does
>> oldestActiveXID factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed that you believed the wrong oldestActiveXID computation solved both the SUBTRANS-related *and* the CLOG-related errors, since you said "We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextXid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem, nor is it certain
> it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably not pilot error, since it happens even for backups taken with pg_basebackup, so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug
Re: [HACKERS] TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
On Wed, Oct 26, 2011 at 4:24 AM, Tom Lane wrote:
> Even given your recent changes to reduce the overhead of checking for
> sinval messages, I'm not sure that it'd be practical to move the sinval
> message processing to just-before-we-look-up-a-cache-entry. Right now,
> we do AcceptInvalidationMessages basically once per table per query
> (or maybe it's twice or so, but anyway a very small multiplier on that).
> If we try to do it every time through SearchSysCache, we are probably
> talking two to three orders of magnitude more checks, which ISTM is
> certain to push the sinval queue back up to the top of the heap for
> contention.

Initially I provided an implementation of eager locking, as Robert suggests, but the above is a great argument against doing it that way.

Incidentally, I'd like to focus on the causal reason for these problems. We have static type definitions, which allow us to completely avoid dynamic type management at runtime. That is a performance advantage for us in normal running, but it can also cause operational problems when we look to make changes.

> But in any case, this isn't the core of the problem. The real point
> here is that we need a guarantee that a syscache entry we're going to
> use is/was valid as of some suitable time point later than the start of
> our current transaction. (Once we have taken a snapshot, VACUUM will
> know that it can't remove any tuples that were deleted after the time of
> that snapshot; so even for SnapshotNow fetches, it's important to have
> an MVCC snapshot to protect toast-table dereferences.) Perhaps rather
> than tying the problem into SearchSysCache, we should attach the
> overhead to GetTransactionSnapshot, which is called appealingly few
> times per query. But right offhand it seems like that only protects us
> against the toast-tuple-deletion problem, not against the more general
> one of getting a stale view of the status of some relation.

Do we need to guarantee that?
It seems strange to me to use the words "cache" and "strict" in the same sentence. I'm sure there are many points in the code where strictness is required, though in many cases it should be acceptable to say that a cache miss is not a problem and so does not require strict guarantees.

Do we need a redesign? Or do we need a way to use relaxation/optimistic techniques? I'm aware that it could be a huge undertaking to examine all call points. If you have any ideas of where to investigate or parts of the problem to assist with, I'll be happy to work on this. This seems important to me.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
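One way to make the "relaxed/optimistic" idea concrete is a generation counter: tag each cache entry with a global invalidation counter at fill time, bump the counter when an invalidation arrives, and only pay for revalidation when the counters differ. This is a minimal sketch of that general technique, not PostgreSQL's syscache or sinval machinery; all names are invented.

```c
#include <stdbool.h>

/* Global invalidation generation; bumped on each invalidation event.
 * Comparing against it is far cheaper than reprocessing a message
 * queue on every cache lookup. */
static unsigned long inval_counter = 0;

typedef struct CachedRelation
{
    unsigned int  rel_oid;     /* what the entry describes */
    unsigned long filled_at;   /* inval_counter when the entry was built */
} CachedRelation;

static void
receive_invalidation(void)
{
    inval_counter++;
}

static void
fill_entry(CachedRelation *e, unsigned int rel_oid)
{
    e->rel_oid = rel_oid;
    e->filled_at = inval_counter;
}

/* Cheap optimistic check at lookup time: only when this returns false
 * does the caller need to do the expensive rebuild. */
static bool
entry_still_valid(const CachedRelation *e)
{
    return e->filled_at == inval_counter;
}
```

The trade-off is coarse granularity: one invalidation anywhere forces revalidation of every entry, which may or may not be acceptable for the call points Simon mentions.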
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop wrote:
>> Chris, can you rearrange the backup so you copy the pg_control file as
>> the first act after the pg_start_backup?
> I tried this and it doesn't seem to make any difference.

It won't; that was a poor initial diagnosis on my part.

> I also tried the patch and I can no longer reproduce the subtrans error

Good.

> however instead it now starts up, but never gets to the point where
> it'll accept connections. It starts up but if I try to do anything I
> always get "FATAL: the database system is starting up"...even if the
> load is removed from the primary, the standby still never finishes
> "starting up".
...
> 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed
> because recovery snapshot incomplete
...

This is a different problem, and it has already been reported by one of your colleagues in a separate thread and answered in detail by me there. There is no bug related to this error message.

From here, it looks like the published patch fixes the originally reported problem.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2011/10/25 19:15), Magnus Hagander wrote:
> 2011/10/25 Shigeru Hanada:
>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>> contrib module. I think that this module would be the basis of further
>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>
> I have not looked at the code itself, but I wonder if we shouldn't
> consider making this a part of core-proper, not just a contrib module.
> The fact that it isn't *already* available in core surprises a lot of
> people...

Do you mean that pgsql_fdw should be a built-in extension like plpgsql, so that it's available just after initdb? It would be accomplished with some more changes:

* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install the dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included in the core binary because we must avoid linking libpq with the server binary directly. This method is also used for the libwalreceiver replication module.
* Create the pgsql_fdw extension during initdb invocation, like plpgsql.

These are not trivial, but not so difficult either. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature.

--
Shigeru Hanada
Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup
(CC'ing pgsql-hackers; this started as an IM discussion yesterday but really belongs in the archives)

On 25.10.2011 23:52, Magnus Hagander wrote:
> There's a tiny chance to get incomplete xlog files with pg_receivexlog
> if you crash:
>
> 1. pg_receivexlog finishes write()ing a file but the system crashes
> before fsync() finishes.
> 2. When pg_receivexlog restarts after the crash, the last WAL file was
> not fully flushed to disk, with holes in the middle, but it has the
> right length. pg_receivexlog will continue streaming from the next file.
>
> Not sure if we care about such a narrow window, but maybe we do.
>
> So how would we go about fixing that? Always unlinking the last file in
> the directory and trying from there would seem dangerous too - what if
> it's not available on the master anymore? Then we might have given up
> on data...

Start streaming from the beginning of the last segment, but don't unlink it first. Just overwrite it as you receive the data.

Or, always create a new xlog file as "0001000100D3.partial", and only when it's fully written, fsync it and then rename it to "0001000100D3". Then you know that if a file doesn't have the .partial suffix, it's complete. The fact that the last partial file always has the .partial suffix needs some extra pushups in the restore_command, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
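The write-to-".partial"-then-rename protocol Heikki describes can be sketched in portable C. This is an illustration of the technique, not the actual pg_receivexlog implementation; the function name and error handling are invented, and it relies on POSIX rename() being atomic within a filesystem.

```c
#include <stdio.h>
#include <unistd.h>

/* Write a segment under a ".partial" suffix, flush it to disk, then
 * rename it into place.  After a crash, any file bearing the final
 * name is guaranteed to be complete and durable; an interrupted write
 * leaves only a ".partial" file behind.  Returns 0 on success. */
static int
finish_segment(const char *final_name, const char *data, size_t len)
{
    char  tmp_name[1024];
    FILE *fp;

    snprintf(tmp_name, sizeof(tmp_name), "%s.partial", final_name);

    fp = fopen(tmp_name, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        return -1;
    }

    /* Flush user-space and kernel buffers before renaming, so a crash
     * can never leave a fully-named segment with holes in it. */
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Atomic on POSIX filesystems: readers see either no file under
     * the final name, or the complete one. */
    return rename(tmp_name, final_name);
}
```

As the mail notes, the remaining wrinkle is the consumer side: a restore_command has to know to look for the trailing ".partial" file when it wants the segment currently being streamed.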
Re: [HACKERS] Range Types - typo + NULL string constructor
On 25.10.2011 19:37, Jeff Davis wrote:
> On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
>> Hmm, I don't think that's safe. After Oid wraparound, a range type oid
>> might get reused for some other range type, and the cache would return
>> stale values. Extremely unlikely to happen by accident, but could be
>> exploited by an attacker.
>
> Any ideas on how to remedy that? I don't have another plan for making
> it perform well. Plugging it into the cache invalidation mechanism
> seems like overkill, but I suppose that would solve the problem.

I think we should look at the array functions for precedent. array_in et al cache the information in fn_extra, so that when it's called repeatedly in one statement for the same type, the information is only looked up once. That's good enough: it covers repeated execution in a single query, as well as COPY and comparison calls from index searches, for example.

> Aren't there a few other cases like this floating around the code?

Not that I know of. That said, I wouldn't be too surprised if there were.

> I know the single-xid cache is potentially vulnerable to xid wraparound
> for the same reason.

True.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
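The fn_extra pattern Heikki points to amounts to per-call-site memoization: the expensive type lookup runs once, and its result is stashed in a slot the executor keeps alive for the statement's lifetime. The sketch below shows the shape of that pattern in isolation; TypeCacheEntry, lookup_type_info, and the lookup counter are invented stand-ins, not the real array_in internals (which allocate in fn_mcxt rather than with malloc).

```c
#include <stdlib.h>

typedef struct TypeCacheEntry
{
    unsigned int type_oid;   /* type this entry describes */
} TypeCacheEntry;

/* Counts how often the expensive path actually runs. */
static int n_real_lookups = 0;

/* Stand-in for the expensive catalog lookup. */
static TypeCacheEntry *
lookup_type_info(unsigned int type_oid)
{
    TypeCacheEntry *e = malloc(sizeof(TypeCacheEntry));

    e->type_oid = type_oid;
    n_real_lookups++;
    return e;
}

/* On the first call through a given slot (think fn_extra), the slot is
 * NULL and we do the real lookup; later calls through the same slot in
 * the same statement reuse the cached entry.  Because the slot's
 * lifetime is bounded by the statement, there is no window for an Oid
 * to be dropped, wrap around, and be reused while the entry lives. */
static TypeCacheEntry *
get_cached_type_info(void **extra, unsigned int type_oid)
{
    if (*extra == NULL)
        *extra = lookup_type_info(type_oid);
    return (TypeCacheEntry *) *extra;
}
```

Scoping the cache to the call site is exactly what sidesteps the wraparound hazard of a long-lived oid-keyed cache discussed above.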