Re: [HACKERS] [PATCHES] WAL logging freezing
We just discussed this in detail with Simon, and it looks like we have 5 (!) different but related problems:

1) The original problem of freeze-then-crash leaving too-high values in relminxid and datminxid. If you then run vacuum, it might truncate CLOG and you lose the commit status of the records that were supposed to be frozen. To fix this, we need to WAL-log freezing, as already discussed.

2) vactuple_get_minxid doesn't take into account the xmax of tuples that have HEAP_XMAX_INVALID set. That's a problem:

    transaction 1001 - BEGIN; DELETE FROM foo WHERE key = 1;
    transaction 1001 - ROLLBACK;
    transaction 1002 - VACUUM foo;
    crash

VACUUM foo will set relminxid to 1002, because HEAP_XMAX_INVALID was set (possibly by vacuum itself) on the tuple touched by the rolled-back deletion. However, that hint-bit update hasn't hit the disk yet, so after recovery the tuple will have an xmax of 1001 with no hint bit, while relminxid is 1002. The simplest fix for this issue is to ignore the HEAP_XMAX_INVALID hint bit, and take any xmax other than InvalidXid into account when calculating relminxid.

3) If you recover from a PITR backup (or have a hot standby) with a base backup that's more than 4 billion transactions older than the newest WAL record, the clog entries of old transactions in the base backup will overlap with the clog entries of new transactions that are in the WAL records. This is the problem you also pointed out below. To fix this, we need to emit a WAL record when truncating the clog. We must also make sure that recovery of any WAL record type doesn't rely on clog, because if we truncate the clog and then crash, recovery won't have the clog available for the old transactions. At the moment, TruncateCLOG issues a checkpoint to protect against that, but that's not going to work when rolling forward logs in PITR, right?
4) If we fix issue 2 so that vactuple_get_minxid always takes xmax into account, even if HEAP_XMAX_INVALID is set, a tuple with an aborted xmax will keep us from advancing relminxid and truncating clog etc. That doesn't lead to data corruption, but you will eventually hit the transaction wrap-around limit. We don't have the same problem with xmin, because we freeze tuples that are older than FreezeLimit to avoid it, but we don't do that for xmax. To fix this, replace any xmax older than FreezeLimit with InvalidXid during vacuum. That also needs to be WAL-logged.

5) We don't freeze tuples that are in the RECENTLY_DEAD or DELETE_IN_PROGRESS state. That doesn't lead to data corruption either, but it might make you hit the transaction wrap-around limit. That can happen if you have a transaction that deletes or updates a very old, but not yet frozen, tuple. If you run vacuum while the deleting transaction is in progress, vacuum won't freeze the tuple, and won't advance the wrap-around limit because of the old tuple. That's not serious if the deleting transaction commits, because the next vacuum will then remove the tuple; but if it aborts, we might run into the same problem on the next vacuum, and the next one, and the next one, until we reach the wrap-around. To fix this, simply do the freezing for tuples in the RECENTLY_DEAD and DELETE_IN_PROGRESS states as well.

Am I missing something? Finding this many bugs makes me nervous...

Simon volunteered to make the clog changes for 3 because it's a PITR-related issue. I can write a patch/patches for the other changes if it helps.

Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
>>> I think it's premature to start writing patches until we've decided how this really needs to work.
>> Not logging hint-bit updates seems safe to me. As long as we have the clog, the hint bit is just a hint. The problem with freezing is that after freezing tuples, the corresponding clog page can go away.
> Actually clog can go away much sooner than that, at least in normal operation --- that's what datvacuumxid is for, to track where we can truncate clog. Maybe it's OK to say that during WAL replay we keep it all the way back to the freeze horizon, but I'm not sure how we keep the system from wiping clog it still needs right after switching to normal operation. Maybe we should somehow not xlog updates of datvacuumxid?
>
> Another thing I'm concerned about is the scenario where a PITR hot-standby machine tracks a master over a period of more than 4 billion transactions. I'm not sure what will happen in the slave's pg_clog directory, but I'm afraid it won't be good :-(
>
> regards, tom lane

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend
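Problem 2 above can be walked through with a toy model (illustrative Python, not PostgreSQL code; all names are made up for the sketch). The key asymmetry is that the hint bit lives in an unlogged page while relminxid is made durable, so a crash can keep the second and lose the first:

```python
# Toy model of problem 2: HEAP_XMAX_INVALID is only a hint, set in shared
# buffers without WAL; relminxid survives the crash. A crash in between
# leaves an xmax on disk that relminxid claims cannot exist.

INVALID_XID = 0

class Tuple:
    def __init__(self, xmax):
        self.xmax = xmax                # on disk, durable
        self.xmax_invalid_hint = False  # hint bit: may be lost in a crash

def vacuum_compute_minxid(tuples, vacuum_xid, honor_hint):
    """Model of vactuple_get_minxid's aggregation over a relation."""
    xids = [vacuum_xid]
    for t in tuples:
        ignore = honor_hint and t.xmax_invalid_hint
        if t.xmax != INVALID_XID and not ignore:
            xids.append(t.xmax)
    return min(xids)

t = Tuple(xmax=1001)         # xact 1001 deleted the row, then rolled back
t.xmax_invalid_hint = True   # vacuum (xact 1002) hints the aborted xmax

minxid = vacuum_compute_minxid([t], vacuum_xid=1002, honor_hint=True)
t.xmax_invalid_hint = False  # crash: the hint bit never reached disk

# relminxid says no xid below 1002 exists, yet xmax=1001 survived:
print(minxid, t.xmax)  # 1002 1001

# The proposed fix: ignore the hint when computing relminxid
assert vacuum_compute_minxid([t], 1002, honor_hint=False) == 1001
```

The fix simply makes the computation conservative: any non-invalid xmax holds relminxid back, hinted or not (which is what creates problem 4).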
Re: [HACKERS] [PATCHES] WAL logging freezing
Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
>>> I think it's premature to start writing patches until we've decided how this really needs to work.
>> Not logging hint-bit updates seems safe to me. As long as we have the clog, the hint bit is just a hint. The problem with freezing is that after freezing tuples, the corresponding clog page can go away.
> Actually clog can go away much sooner than that, at least in normal operation --- that's what datvacuumxid is for, to track where we can truncate clog. Maybe it's OK to say that during WAL replay we keep it all the way back to the freeze horizon, but I'm not sure how we keep the system from wiping clog it still needs right after switching to normal operation. Maybe we should somehow not xlog updates of datvacuumxid?

Oh, I just understood what you were trying to say. We really don't keep clog around all the way back to the freeze horizon.

Well, how about doing just that? With a FreezeLimit of 1 billion transactions, that's 128 megabytes of clog. We could freeze more aggressively to make it less.

We can't just not xlog updates of datvacuumxid, because that buffer might still be written to disk before all the hint-bit updates. One trick would be to include the old value in the WAL record. WAL recovery would have to keep track of those records, and reset any datvacuumxid to the old value if there's no checkpoint record afterwards.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] WAL logging freezing
Heikki Linnakangas wrote:
> We just discussed this in detail with Simon, and it looks like we have 5 (!) different but related problems:

Wow, four of them are mine :-(

> 2) vactuple_get_minxid doesn't take into account the xmax of tuples that have HEAP_XMAX_INVALID set. That's a problem:
>
>     transaction 1001 - BEGIN; DELETE FROM foo WHERE key = 1;
>     transaction 1001 - ROLLBACK;
>     transaction 1002 - VACUUM foo;
>     crash
>
> VACUUM foo will set relminxid to 1002, because HEAP_XMAX_INVALID was set (possibly by vacuum itself) on the tuple touched by the rolled-back deletion. However, that hint-bit update hasn't hit the disk yet, so after recovery the tuple will have an xmax of 1001 with no hint bit, and relminxid is 1002. The simplest fix for this issue is to ignore the HEAP_XMAX_INVALID hint bit, and take any xmax other than InvalidXid into account when calculating the relminxid.

Ugh. Is there another solution to this? Say, sync the buffer so that the hint bits are written to disk? The bug (4) below is problematic if you take this approach; basically it removes all the optimization won by the relminxid patch.

> Simon volunteered to make the clog changes for 3 because it's a PITR-related issue. I can write a patch/patches for the other changes if it helps.

I'm swamped at the moment, so I'd appreciate it.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] bug in on_error_rollback !?
On Fri, Oct 27, 2006 at 01:19:25PM -, Greg Sabino Mullane wrote:
>> This is documented clearly on the psql man page, so it is simply not a bug, and changing this would probably break lots of legacy scripts.
> In a general sense, perhaps, but in this *particular* case, I don't see what harm allowing \set on_error_rollback would be: it certainly won't break any existing scripts. I wrote this feature (but someone else chose the name!) and I still occasionally write it lowercase and wonder why it isn't working. :) Perhaps even allowing all of the \set commands to be case-insensitive may be a good idea?

The problem is that people may be depending on the case-sensitivity in their scripts...

    \set ACounter 1
    \set aCounter 2

Of course, they're just asking for trouble, but suddenly making psql variables case-insensitive would break that code. Perhaps a good compromise would be adopting SQL case syntax (i.e., you have to wrap in double quotes to preserve case).

-- 
Jim Nasby                      [EMAIL PROTECTED]
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
Re: [HACKERS] [PATCHES] WAL logging freezing
Alvaro Herrera [EMAIL PROTECTED] writes:
> Ugh. Is there another solution to this? Say, sync the buffer so that the hint bits are written to disk?

Yeah. The original design for all this is explained by the notes for TruncateCLOG:

 * When this is called, we know that the database logically contains no
 * reference to transaction IDs older than oldestXact.  However, we must
 * not truncate the CLOG until we have performed a checkpoint, to ensure
 * that no such references remain on disk either; else a crash just after
 * the truncation might leave us with a problem.

The pre-8.2 coding is actually perfectly safe within a single database, because TruncateCLOG is only called at the end of a database-wide vacuum, and so the checkpoint is guaranteed to have flushed valid hint bits for all tuples to disk. There is a risk in other databases though.

I think that in the 8.2 structure the equivalent notion must be that VACUUM has to flush and fsync a table before it can advance the table's relminxid.

That still leaves us with the problem of hint bits not being updated during WAL replay. I think the best solution for this is for WAL replay to force relvacuumxid to equal relminxid (btw, these field names seem poorly chosen, and the comment in catalogs.sgml isn't self-explanatory...) rather than adopting the value shown in the WAL record. This probably is best done by abandoning the generic overwrite-tuple WAL record type in favor of something specific to minxid updates. The effect would then be that a PITR slave would not truncate its clog beyond the freeze horizon until it had performed a vacuum of its own.

The point about aborted xmax being a risk factor is a good one. I don't think the risk is material for ordinary crash-recovery scenarios, because ordinarily we'd have many opportunities to set the hint bit before anything really breaks, but it's definitely an issue for long-term PITR replay scenarios.

I'll work on this as soon as I get done with the btree-index issue I'm messing with now.

regards, tom lane
[HACKERS] Index greater than 8k
Hello,

I recently posted about a word being too long with Tsearch2. That isn't actually the problem I am trying to solve (thanks for the feedback though, now I understand it). The problem I am after is the 8k index size issue. It is very easy to get a GiST index entry (especially when using tsearch2) that is larger than that. Is recompiling with a larger block size the option there? What are the downsides, apart from the custom build?

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
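For reference, the block size in the sources of that era is a compile-time constant rather than a configure switch; if memory serves it lives in src/include/pg_config_manual.h, so a custom build means editing that file, recompiling everything, and re-running initdb (the on-disk layout changes). A sketch, with the chosen value as an assumption:

```c
/* src/include/pg_config_manual.h (sketch; exact location per the 8.x sources)
 *
 * BLCKSZ must be a power of 2 and at most 32kB.  Any change requires a
 * full recompile and a fresh initdb -- existing clusters cannot be read.
 */
#define BLCKSZ  32768   /* raised from the default 8192 */
```

Larger pages raise the per-entry index limit proportionally, at the cost of coarser I/O and WAL granularity.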
Re: [HACKERS] Deadlock with pg_dump?
On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
> Chris Campbell [EMAIL PROTECTED] writes:
>> Is there additional logging information I can turn on to get more details? I guess I need to see exactly what locks both processes hold, and what queries they were running when the deadlock occurred? Is that easily done, without turning on logging for *all* statements?
> log_min_error_statement = error would at least get you the statements reporting the deadlocks, though not what they're conflicting against.

Yeh, we need a much better locking logger for performance analysis. We really need to dump the whole wait-for graph for deadlocks, since this might be more complex than just two statements involved. Deadlocks ought to be so infrequent that we can afford the log space to do this - plus if we did this it would likely lead to fewer deadlocks.

For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter that would allow you to dump the wait-for graph for any data-level locks that wait too long, rather than just those that deadlock. Many applications experience heavy locking because of lack of holistic design. That will also show up the need for other utilities to act CONCURRENTLY, if possible.

> [ Memo to hackers: why is it that log_min_error_statement = error isn't the default? ]

For production systems where we expect fewer ERRORs, each one is important, so this would be a good default.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
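As a postgresql.conf sketch, the setting Tom mentions already exists, while Simon's proposed GUC is hypothetical at this point (the name and units are his suggestion, nothing committed):

```
# available today: log the statement for any report of severity ERROR or
# worse, so each deadlock victim logs the statement that failed
log_min_error_statement = error

# proposed in this thread (hypothetical): dump the wait-for graph for any
# data-level lock waited on longer than this, not just actual deadlocks
#log_min_duration_lockwait = 5        # seconds
```

The wait-for graph matters because, as noted above, a deadlock cycle can involve more than two transactions, and the two statements that happen to error out don't identify the rest of the cycle.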
Re: [HACKERS] [PATCHES] WAL logging freezing
On Mon, 2006-10-30 at 12:05 -0500, Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> Ugh. Is there another solution to this? Say, sync the buffer so that the hint bits are written to disk?
> Yeah. The original design for all this is explained by the notes for TruncateCLOG:
>
>  * When this is called, we know that the database logically contains no
>  * reference to transaction IDs older than oldestXact.  However, we must
>  * not truncate the CLOG until we have performed a checkpoint, to ensure
>  * that no such references remain on disk either; else a crash just after
>  * the truncation might leave us with a problem.
>
> The pre-8.2 coding is actually perfectly safe within a single database, because TruncateCLOG is only called at the end of a database-wide vacuum, and so the checkpoint is guaranteed to have flushed valid hint bits for all tuples to disk. There is a risk in other databases though. I think that in the 8.2 structure the equivalent notion must be that VACUUM has to flush and fsync a table before it can advance the table's relminxid.

Ouch! We did discuss that also. Flushing the buffer cache is nasty with very large caches, so this makes autovacuum much less friendly - and could take a seriously long time if you enforce the vacuum delay costings.

ISTM we only need to flush iff the clog would be truncated when we update relminxid. Otherwise we are safe to update even if we crash, since the clog will not have been truncated.

> That still leaves us with the problem of hint bits not being updated during WAL replay. I think the best solution for this is for WAL replay to force relvacuumxid to equal relminxid (btw, these field names seem poorly chosen, and the comment in catalogs.sgml isn't self-explanatory...) rather than adopting the value shown in the WAL record. This probably is best done by abandoning the generic overwrite-tuple WAL record type in favor of something specific to minxid updates.
> The effect would then be that a PITR slave would not truncate its clog beyond the freeze horizon until it had performed a vacuum of its own.

Sounds good. Methinks we do still need the TruncateCLOG patch to ensure we do WAL replay for the truncation? I'm posting that now to -patches as a prototype.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.40
diff -c -r1.40 clog.c
*** src/backend/access/transam/clog.c	4 Oct 2006 00:29:49 -	1.40
--- src/backend/access/transam/clog.c	30 Oct 2006 14:32:14 -
***************
*** 68,74 ****
  static int	ZeroCLOGPage(int pageno, bool writeXlog);
  static bool CLOGPagePrecedes(int page1, int page2);
! static void WriteZeroPageXlogRec(int pageno);
  
  /*
--- 68,74 ----
  static int	ZeroCLOGPage(int pageno, bool writeXlog);
  static bool CLOGPagePrecedes(int page1, int page2);
! static void WriteClogXlogRec(int pageno, int rectype);
  
  /*
***************
*** 198,204 ****
  	slotno = SimpleLruZeroPage(ClogCtl, pageno);
  
  	if (writeXlog)
! 		WriteZeroPageXlogRec(pageno);
  
  	return slotno;
  }
--- 198,204 ----
  	slotno = SimpleLruZeroPage(ClogCtl, pageno);
  
  	if (writeXlog)
! 		WriteClogXlogRec(pageno, CLOG_ZEROPAGE);
  
  	return slotno;
  }
***************
*** 338,343 ****
--- 338,345 ----
  	/* Perform a CHECKPOINT */
  	RequestCheckpoint(true, false);
  
+ 	WriteClogXlogRec(cutoffPage, CLOG_TRUNCATE);
+ 
  	/* Now we can remove the old CLOG segment(s) */
  	SimpleLruTruncate(ClogCtl, cutoffPage);
  }
***************
*** 375,389 ****
   * (Besides which, this is normally done just before entering a transaction.)
   */
  static void
! WriteZeroPageXlogRec(int pageno)
  {
  	XLogRecData rdata;
  
  	rdata.data = (char *) &(pageno);
  	rdata.len = sizeof(int);
  	rdata.buffer = InvalidBuffer;
  	rdata.next = NULL;
! 	(void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE | XLOG_NO_TRAN, &rdata);
  }
  
  /*
--- 377,393 ----
   * (Besides which, this is normally done just before entering a transaction.)
   */
  static void
! WriteClogXlogRec(int pageno, int rectype)
  {
  	XLogRecData rdata;
  
+ 	Assert(rectype == CLOG_ZEROPAGE || rectype == CLOG_TRUNCATE);
+ 
  	rdata.data = (char *) &(pageno);
  	rdata.len = sizeof(int);
  	rdata.buffer = InvalidBuffer;
  	rdata.next = NULL;
! 	(void) XLogInsert(RM_CLOG_ID, rectype | XLOG_NO_TRAN, &rdata);
  }
  
  /*
***************
*** 409,414 ****
--- 413,432 ----
  		LWLockRelease(CLogControlLock);
  	}
+ 	else if (info == CLOG_TRUNCATE)
+ 	{
+ 		int			pageno;
+ 
+ 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+ 
+ 		LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
+ 
+ 		SimpleLruTruncate(ClogCtl, pageno);
+ 
+ 		LWLockRelease(CLogControlLock);
+ 	}
+ 	else
+ 		elog(PANIC, "clog_redo: unknown op code %u", info);
  }
  
  void
Re: [HACKERS] [PATCHES] WAL logging freezing
Simon Riggs wrote:
> On Mon, 2006-10-30 at 12:05 -0500, Tom Lane wrote:
>>> Ugh. Is there another solution to this? Say, sync the buffer so that the hint bits are written to disk?
>> Yeah. The original design for all this is explained by the notes for TruncateCLOG:
>>
>>  * When this is called, we know that the database logically contains no
>>  * reference to transaction IDs older than oldestXact.  However, we must
>>  * not truncate the CLOG until we have performed a checkpoint, to ensure
>>  * that no such references remain on disk either; else a crash just after
>>  * the truncation might leave us with a problem.
>>
>> The pre-8.2 coding is actually perfectly safe within a single database, because TruncateCLOG is only called at the end of a database-wide vacuum, and so the checkpoint is guaranteed to have flushed valid hint bits for all tuples to disk. There is a risk in other databases though. I think that in the 8.2 structure the equivalent notion must be that VACUUM has to flush and fsync a table before it can advance the table's relminxid.
> Ouch! We did discuss that also. Flushing the buffercache is nasty with very large caches, so this makes autovacuum much less friendly - and could take a seriously long time if you enforce the vacuum delay costings. ISTM we only need to flush iff the clog would be truncated when we update relminxid. Otherwise we are safe to update even if we crash, since the clog will not have been truncated.

I don't understand. When clog is actually going to be truncated, if it's determined that there's any page that can be truncated, then a checkpoint is forced. If no page is going to be removed then there's no checkpoint, which makes a lot of sense and of course avoids the problem of useless flushes.

In fact I don't understand what's the point about multiple databases vs. a single database. Surely a checkpoint would flush all buffers in all databases, no? This would flush all hint bits, everywhere. So this bug does not really exist.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [PATCHES] WAL logging freezing
Simon Riggs [EMAIL PROTECTED] writes:
> ISTM we only need to flush iff the clog would be truncated when we update relminxid.

Wrong :-( If the relvacuumxid change (not relminxid ... as I said, these names aren't very transparent) makes it to disk but not all the hint bits do, you're at risk. Crash, restart, vacuum some other table, and *now* the global min vacuumxid advances. The fact that we're WAL-logging the relvacuumxid change makes this scenario exceedingly probable, if no action is taken to force out the hint bits.

The only alternative I can see is the one Heikki suggested: don't truncate clog until the freeze horizon. That's safe (given the planned change to WAL-log tuple freezing) and clean and simple, but a permanent requirement of 250MB+ for pg_clog would put the final nail in the coffin of PG's usability in small-disk-footprint environments. So I don't like it much. I suppose it could be made more tolerable by reducing the freeze horizon, say to 100M instead of 1G transactions. Anyone for a GUC parameter? In a high-volume DB you'd want the larger setting to minimize the amount of tuple-freezing work. OTOH it seems like making this configurable creates a nasty risk for PITR situations: a slave that's configured with a smaller freeze window than the master is probably not safe.

regards, tom lane
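The 250MB+ figure follows from clog's layout of two commit-status bits per transaction ID; a quick back-of-envelope check (plain arithmetic, not PostgreSQL code):

```python
# clog stores 2 bits of commit status per transaction ID,
# so 4 transaction statuses fit in each byte.
BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // BITS_PER_XACT           # 4

freeze_horizon = 1_000_000_000                # 1G transactions (current default)
clog_bytes = freeze_horizon // XACTS_PER_BYTE
print(clog_bytes)                             # 250000000 -> ~250 MB of pg_clog

# The mooted 100M-transaction horizon shrinks it proportionally:
print(100_000_000 // XACTS_PER_BYTE)          # 25000000 -> ~25 MB
```

This is why shrinking the freeze horizon trades disk footprint against extra tuple-freezing (and hence vacuum write) work.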
Re: [HACKERS] [PATCHES] WAL logging freezing
Alvaro Herrera [EMAIL PROTECTED] writes:
> In fact I don't understand what's the point about multiple databases vs. a single database. Surely a checkpoint would flush all buffers in all databases, no?

Yeah --- all the ones that are dirty *now*. Consider the case where you vacuum DB X, update its datvacuumxid, and don't checkpoint because the global min didn't advance. Now you crash, possibly leaving some hint bits unwritten; but the datvacuumxid change did make it to disk. After restart, vacuum DB Y, update its datvacuumxid, and find that the global min *did* advance. You checkpoint, and that guarantees that DB Y is clean for the clog truncation. But DB X isn't.

The 8.2 changes have created the equivalent risk at the level of each individual table. We can't write a vacuumxid change unless we are sure that the hint-bit changes it promises are actually down to disk.

regards, tom lane
Re: [HACKERS] [PATCHES] WAL logging freezing
On Mon, 2006-10-30 at 16:58 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> ISTM we only need to flush iff the clog would be truncated when we update relminxid.
> Wrong :-( If the relvacuumxid change (not relminxid ... as I said, these names aren't very transparent) makes it to disk but not all the hint bits do, you're at risk. Crash, restart, vacuum some other table, and *now* the global min vacuumxid advances. The fact that we're WAL-logging the relvacuumxid change makes this scenario exceedingly probable, if no action is taken to force out the hint bits.

I don't agree: If the truncation points are at 1 million, 2 million etc., then if we advance the relvacuumxid from 1.2 million to 1.5 million and then crash, the hint bits for that last vacuum are lost. Sounds bad, but we have not truncated clog, so there is no danger. In order to truncate up to 2 million we need to re-vacuum; at that point we discover that the 1.5 million setting was wrong, realise it should have been 1.2 million, but don't care because we now set it to 1.8 million etc. No problem, even with repeated crashes.

We only flush when we move the counter past a truncation point.

If you look at this another way, maybe you'll see what I'm saying: Only update relvacuumxid iff the update would allow us to truncate the clog. That way we leap forwards in 1 million Xid chunks, rounded down. No change to clog = no update = no danger that we need to flush to avoid.

> The only alternative I can see is the one Heikki suggested: don't truncate clog until the freeze horizon. That's safe (given the planned change to WAL-log tuple freezing) and clean and simple, but a permanent requirement of 250MB+ for pg_clog would put the final nail in the coffin of PG's usability in small-disk-footprint environments. So I don't like it much. I suppose it could be made more tolerable by reducing the freeze horizon, say to 100M instead of 1G transactions. Anyone for a GUC parameter? In a high-volume DB you'd want the larger setting to minimize the amount of tuple freezing work. OTOH it seems like making this configurable creates a nasty risk for PITR situations: a slave that's configured with a smaller freeze window than the master is probably not safe.

If we need to, just put the CLOG seg size in pg_config_manual.h.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] WAL logging freezing
Simon Riggs [EMAIL PROTECTED] writes:
> I don't agree: If the truncation points are at 1 million, 2 million etc., then if we advance the relvacuumxid from 1.2 million to 1.5 million and then crash, the hint bits for that last vacuum are lost. Sounds bad, but we have not truncated clog, so there is no danger.

You're still wrong though. Suppose that VACUUM moves a particular rel's relvacuumxid from 1.9 to 2.1 million, but because this rel is not currently the oldest vacuumxid, it doesn't truncate clog. Then we crash and lose hint bits, but not the relvacuumxid change. Then VACUUM vacuums some other rel and advances its relvacuumxid from 1.9 to 2.1 million --- but this time that *was* the globally oldest value, and now we think we can truncate clog at 2 million. But the first rel might still have some unhinted xids around 1.9 million.

> If you look at this another way, maybe you'll see what I'm saying: Only update relvacuumxid iff the update would allow us to truncate the clog.

Then you'll never update it at all, because there will always be some other rel constraining the global min.

regards, tom lane
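Tom's counterexample can be walked through with a toy model (illustrative Python, not PostgreSQL code; the names and the 1-million truncation granularity are assumptions taken from the thread):

```python
# Toy model: relvacuumxid is WAL-logged and survives a crash; hint bits
# live in unlogged pages and may not. Truncation is driven by the global
# minimum relvacuumxid, rounded down to a truncation point.

TRUNC_STEP = 1_000_000  # truncation granularity assumed in the thread

class Rel:
    def __init__(self, name, relvacuumxid, unhinted_xids):
        self.name = name
        self.relvacuumxid = relvacuumxid         # durable (WAL-logged)
        self.unhinted_xids = set(unhinted_xids)  # hint bits: lost on crash

def vacuum(rel, new_vacuumxid, crash_before_hint_flush=False):
    hinted = {x for x in rel.unhinted_xids if x < new_vacuumxid}
    if not crash_before_hint_flush:
        rel.unhinted_xids -= hinted     # hint bits reach disk
    rel.relvacuumxid = new_vacuumxid    # the WAL-logged change always survives

def truncate_point(rels):
    return (min(r.relvacuumxid for r in rels) // TRUNC_STEP) * TRUNC_STEP

a = Rel("A", 1_900_000, unhinted_xids=[1_950_000])
b = Rel("B", 1_900_000, unhinted_xids=[])

vacuum(a, 2_100_000, crash_before_hint_flush=True)  # crash: hints lost, xid kept
vacuum(b, 2_100_000)                                # now the global min advances

cutoff = truncate_point([a, b])                     # clog truncated at 2,000,000
stranded = {x for x in a.unhinted_xids if x < cutoff}
print(cutoff, stranded)  # 2000000 {1950000}
```

Rel A is left with an unhinted xid whose clog page is gone, which is exactly the corruption being argued about: the flush-only-at-truncation-points rule fails because the rel that triggers the truncation is not the rel holding the unflushed hints.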
Re: [HACKERS] [PATCHES] WAL logging freezing
On Mon, 2006-10-30 at 19:18 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> I don't agree: If the truncation points are at 1 million, 2 million etc., then if we advance the relvacuumxid from 1.2 million to 1.5 million and then crash, the hint bits for that last vacuum are lost. Sounds bad, but we have not truncated clog, so there is no danger.
> You're still wrong though.

Frequently, I'd say :-)

> Suppose that VACUUM moves a particular rel's relvacuumxid from 1.9 to 2.1 million, but because this rel is not currently the oldest vacuumxid, it doesn't truncate clog. Then we crash and lose hint bits, but not the relvacuumxid change. Then VACUUM vacuums some other rel and advances its relvacuumxid from 1.9 to 2.1 million --- but this time that *was* the globally oldest value, and now we think we can truncate clog at 2 million. But the first rel might still have some unhinted xids around 1.9 million.

That was understood; in the above example I agree you need to flush. If you don't pass a truncation point, you don't need to flush, whether or not you actually truncate. So we don't need to flush *every* time, so IMHO we don't need to play safe and keep clog the size of an iceberg.

Anyway, if PITR is safe again, I'd like to sleep... zzz

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] WAL logging freezing
Simon Riggs [EMAIL PROTECTED] writes:
> That was understood; in the above example I agree you need to flush. If you don't pass a truncation point, you don't need to flush whether or not you actually truncate. So we don't need to flush *every* time,

OK, but does that actually do much of anything for your performance complaint? Just after GlobalXmin has passed a truncation point, *every* vacuum the system does will start performing a flush-n-fsync, which seems like exactly what you didn't like. If the syncs were spread out in time for different rels then maybe this idea would help, but AFAICS they won't be.

regards, tom lane
Re: [HACKERS] [PATCHES] --single-transaction doc clarification
Neil Conway [EMAIL PROTECTED] writes:
> On Tue, 2006-10-31 at 01:07 +, Simon Riggs wrote:
>> As requested.
> Applied, thanks for the patch.

This patch converted a correct statement into a lie: there is not anything that will cause begin/commit in a script file to fail just because you wrapped begin/commit around them. I rewrote the text to:

    If the script itself uses <command>BEGIN</>, <command>COMMIT</>,
    or <command>ROLLBACK</>, this option will not have the desired
    effects. Also, if the script contains any command that cannot be
    executed inside a transaction block, specifying this option will
    cause that command (and hence the whole transaction) to fail.

regards, tom lane
Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?
On Friday 27 October 2006 19:38, Joe wrote:
> Hi Beau,
>
> On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
>> I am hoping that there is an easy way to obtain case-preservation with case-insensitivity, or at the very least, case-preservation and complete case-sensitivity, or case-preservation and a consistent case-conversion strategy. The case of the column names needs to be preserved because that is the way the schema is designed and, most importantly (VERY, VERY IMPORTANT), column names are used in apps as hash values, or as named references, which are case sensitive and as such need to be delivered to the client exactly in the manner specified at the time of table creation.
> I went through the same issue in my conversion from MySQL to Postgres and (since I had a small application) I ended up changing all my tables and columns from UserProfile to user_profile. I'm afraid, however, that it's MySQL that is the odd man out. I haven't researched this completely but I believe PG follows either the FIPS-127 or SQL-92 standard with respect to what are called delimited identifiers. Basically, this says if you want case sensitivity in identifier names, you have to use double quotes wherever you refer to the identifier. Without the double quotes, the SQL implementor can either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as PG does) when it displays those identifiers.
>
> Joe

Again, I am at the same point I was at when searching and searching for information on the problem, and I am not sure what the SQL standard has to say about it: I do not care about case-sensitivity, and I don't think that anyone else who is struggling to deal with the problem does. I am interested in case preservation of column names.
I do, indeed, want identifiers treated in a case insensitive way, but I want the case PRESERVED in the table definitions and I want that case, as preserved, to be reflected in the field names as returned by the server to any client library that connects and initiates a query. Case-preservation is not the same as case-sensitivity; nor is case-normalization the same as case-insensitivity. What PostgreSQL is doing is converting any, and all, identifiers to a lower case and then matching those against the identifiers (as stored in the table definition) in a case-sensitive manner. It 'normalizes' the case of the identifiers so that it has a common internal representation; the desires of the programmer and database architect be damned. Referenced specification details: From FIPS-127: === 3. Delimited identifiers. In the previous ANSI SQL specification, it was not possible for an application to specify identifiers with spaces or other special symbols. Also, it was not possible to protect against future assaults on the name space for (identifier) by additions to the (reserved word) list. The new facility for (delimited identifier) allows a user to enclose all identifiers in double-quotation marks, thereby ensuring that the name defined or referenced may contain spaces or other special symbols and will not be impacted by future additions to the (reserved word) list. === From SQL-92/Sec. 5.2: === 10)The identifier body of a regular identifier is equivalent to an identifier body in which every letter that is a lower- case letter is replaced by the equivalent upper-case letter or letters. This treatment includes determination of equiva- lence, representation in the Information and Definition Schemas, representation in the diagnostics area, and similar uses. 
11) The identifier body of a regular identifier (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters), treated as the repetition of a character string literal that specifies a character set specification of SQL_TEXT, shall not be equal, according to the comparison rules in Subclause 8.2, comparison predicate, to any reserved word (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters), treated as the repetition of a character string literal that specifies a character set specification of SQL_TEXT.

12) Two regular identifiers are equivalent if their identifier bodys, considered as the repetition of a character string literal that specifies a character set specification of SQL_TEXT, compare equally according to the comparison rules in Subclause 8.2, comparison predicate.

13) A regular identifier and a delimited identifier are equivalent if the identifier body of the regular identifier (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters) and the delimited identifier body of the delimited identifier (with all occurrences of quote replaced by quote symbol and all occurrences of doublequote symbol replaced by double quote), considered as the
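The folding rules quoted above can be made concrete with a small sketch. This is illustrative Python, not PostgreSQL source; the function names `normalize` and `equivalent` are my own. It implements the SQL-92 reading (regular identifiers fold to upper case; delimited identifiers keep their exact spelling, with `""` as an escaped quote), whereas PostgreSQL folds to lower case instead, which is exactly the divergence under discussion.

```python
def normalize(identifier: str) -> str:
    """Return the comparison form of an SQL identifier per SQL-92 Sec. 5.2.

    A delimited identifier ("Foo Bar") keeps its case exactly, with an
    embedded "" standing for one double quote.  A regular (unquoted)
    identifier is folded to upper case before any comparison.
    (PostgreSQL folds regular identifiers to lower case instead.)
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # delimited identifier: strip the quotes, un-escape "" -> "
        return identifier[1:-1].replace('""', '"')
    # regular identifier: case-folded, so UserProfile == USERPROFILE
    return identifier.upper()


def equivalent(a: str, b: str) -> bool:
    """Two identifiers are equivalent if their comparison forms match."""
    return normalize(a) == normalize(b)
```

Under these rules an unquoted `UserProfile` matches an unquoted `userprofile`, but a quoted `"UserProfile"` matches neither of them, only `"UserProfile"` spelled identically.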
Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?
beau hargis [EMAIL PROTECTED] writes:
Considering the differences that already exist between database systems and their varying compliance with SQL and the various extensions that have been created, I do not consider that the preservation of case for identifiers would violate any SQL standard.

That's not how I read the spec. It is true that we are not 100% spec compliant, but that isn't a good argument for moving further away from spec. Not to mention breaking backwards compatibility with our historical behavior. The change you propose would fix your application at the cost of breaking other people's applications. Perhaps you should consider fixing your app instead.

regards, tom lane

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Index greater than 8k
The problem I am after is the 8k index size issue. It is very easy to get a GiST index (especially when using tsearch2) that is larger than that.

Hmm, the tsearch2 GiST index is specially designed to support huge index entries: first, every lexeme in the tsvector is transformed to a hash value (with the help of crc32); second, all position info is stripped; third, if the size of the array is greater than TOAST_INDEX_TARGET, then tsearch2 will make a bit signature of the tsvector. The signature's length is fixed and equals 252 bytes by default (+ 8 bytes for the header of the datum). All values on internal pages are represented as signatures as well. So tsearch2 guarantees that an index entry will be small enough. If that's not true, then there is a bug - please provide a test suite demonstrating the problem.

Is recompiling with a different block size an option there? What are the downsides, except for the custom build?

Can you send the exact error message?

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
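The fixed-length bit signature Teodor describes is a Bloom-filter-style summary. Here is a minimal sketch in Python (not the tsearch2 C code; `signature` and `may_contain` are hypothetical names) using crc32 as the hash and the 252-byte default length mentioned in the message. Like any such summary it is lossy: a set bit can be a false positive, but a clear bit proves a lexeme is absent, which is all a GiST internal page needs in order to prune subtrees.

```python
import zlib

SIG_BYTES = 252          # default signature length cited in the message
SIG_BITS = SIG_BYTES * 8  # 2016 addressable bits

def signature(lexemes):
    """Fold an arbitrarily large set of lexemes into a fixed-size bit signature.

    Each lexeme is hashed with crc32 (as tsearch2 does, per the message)
    and one bit of the signature is set.  The result is always SIG_BYTES
    long, no matter how many lexemes go in.
    """
    sig = bytearray(SIG_BYTES)
    for lex in lexemes:
        bit = zlib.crc32(lex.encode("utf-8")) % SIG_BITS
        sig[bit // 8] |= 1 << (bit % 8)
    return bytes(sig)

def may_contain(sig, lexeme):
    """True if the lexeme *may* be present (false positives possible);
    False proves it is definitely absent."""
    bit = zlib.crc32(lexeme.encode("utf-8")) % SIG_BITS
    return bool(sig[bit // 8] & (1 << (bit % 8)))
```

The design trade-off is exactly the one in the message: the signature bounds entry size at the cost of precision, so leaf tuples must still be rechecked against the real tsvector.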
Re: [HACKERS] [SQL] Case Preservation disregarding case
At Teradata, we certainly interpreted the spec to allow case-preserving, but case-insensitive, identifiers. Users really liked it that way: if you re-created a CREATE TABLE statement from the catalog, you could get back exactly the case the user had entered, but people using the table didn't need to worry about case. And column titles in reports would have the nice case-preserving information.

Sort of like how Windows systems treat file names: the case is preserved, but you don't need to know it to access the file. I know UNIX users usually consider case-preserving-with-case-insensitive a foreign concept, but that doesn't mean the average user feels the same. If I want my column named WeeklyTotalSales, I really don't want to have to always quote it and type in the exact case.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Monday, October 30, 2006 7:24 PM
To: beau hargis
Cc: pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?
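The Teradata/Windows behavior Chuck describes - remember the spelling used at creation, ignore case on lookup - amounts to a catalog keyed on a case-folded name that stores the original spelling. A minimal sketch (illustrative Python, not Teradata or PostgreSQL code; the class name is hypothetical):

```python
class ColumnCatalog:
    """Case-preserving, case-insensitive column names, as described above:
    lookups ignore case, but the catalog remembers the spelling used at
    creation time (e.g. to reconstruct CREATE TABLE or title a report)."""

    def __init__(self):
        self._cols = {}  # lower-cased key -> spelling as created

    def add(self, name):
        key = name.lower()
        # Case-insensitive uniqueness: WeeklyTotalSales and WEEKLYTOTALSALES
        # would be the same column, so the second one is rejected.
        if key in self._cols:
            raise ValueError(f"duplicate column (case-insensitive): {name}")
        self._cols[key] = name

    def lookup(self, name):
        """Resolve a reference in any case; return the preserved spelling."""
        return self._cols[name.lower()]
```

Note how this differs from both the SQL-92 rules and PostgreSQL's folding: neither spelling is ever normalized away, but the uniqueness check forbids two columns whose names differ only in case - which is precisely the point Tom Lane raises in his reply below about spec rule 27.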
Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?
Chuck McDevitt [EMAIL PROTECTED] writes:
At Teradata, we certainly interpreted the spec to allow case-preserving, but case-insensitive, identifiers.

Really? As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2):

26) A regular identifier and a delimited identifier are equivalent if the identifier body of the regular identifier (with every letter that is a lower-case letter replaced by the corresponding upper-case letter or letters) and the delimited identifier body of the delimited identifier (with all occurrences of quote replaced by quote symbol and all occurrences of doublequote symbol replaced by double quote), considered as the repetition of a character string literal that specifies a character set specification of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, comparison predicate.

27) Two delimited identifiers are equivalent if their delimited identifier bodys, considered as the repetition of a character string literal that specifies a character set specification of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, comparison predicate.

Note well the "sensitive to case" bits there. Now consider

CREATE TABLE tab (
    foobar int,
    FooBar timestamp,
    FOOBAR varchar(3)
);

We can *not* reject this as containing duplicate column names, else we have certainly violated rule 27. Now what will you do with

SELECT fooBar FROM tab;

? The spec is unquestionably on the side of "you selected the varchar column"; historical Postgres practice is on the side of "you selected the int column". AFAICS a case-insensitive approach would have to fail with some "I can't identify which column you mean" error. I am interested to see where you find support for that in the spec...

regards, tom lane
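Tom's three-way split can be sketched directly. This is an illustration in Python, not any database's resolver; it assumes the three columns in his example were created as delimited (quoted) identifiers, which is the only way all three can coexist under rule 27. Each folding rule gives a different answer for the unquoted reference `fooBar`:

```python
# The three distinct columns from Tom Lane's CREATE TABLE example,
# assumed created with double quotes so all three spellings survive.
COLUMNS = ['foobar', 'FooBar', 'FOOBAR']

def resolve(ref, rule):
    """Resolve an unquoted column reference under a given folding rule.

    'spec'        - SQL-92/SQL99: fold regular identifiers to UPPER case
    'postgres'    - historical PostgreSQL: fold to lower case
    'insensitive' - naive case-insensitive matching (Teradata-style)

    Returns the list of matching columns; more than one match means the
    reference is ambiguous and a real system would have to raise an error.
    """
    if rule == 'spec':
        folded = ref.upper()
        return [c for c in COLUMNS if c == folded]
    if rule == 'postgres':
        folded = ref.lower()
        return [c for c in COLUMNS if c == folded]
    if rule == 'insensitive':
        return [c for c in COLUMNS if c.lower() == ref.lower()]
    raise ValueError(f"unknown rule: {rule}")
```

So `resolve('fooBar', 'spec')` picks the varchar column `FOOBAR`, `resolve('fooBar', 'postgres')` picks the int column `foobar`, and `resolve('fooBar', 'insensitive')` matches all three - the "I can't identify which column you mean" case Tom describes.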