Re: [HACKERS] Page Checksums
On Tue, Dec 27, 2011 at 1:39 PM, Jeff Davis pg...@j-davis.com wrote: On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote: I think it would be regrettable if everyone had to give up 4 bytes per page because some people want checksums. I can understand that some people might not want the CPU expense of calculating CRCs; or the upgrade expense to convert to new pages; but do you think 4 bytes out of 8192 is a real concern? (Aside: it would be MAXALIGNed anyway, so probably more like 8 bytes.) Yeah, I do. Our on-disk footprint is already significantly greater than that of some other systems, and IMHO we should be looking for a way to shrink our overhead in that area, not make it bigger. Admittedly, most of the fat is probably in the tuple header rather than the page header, but at any rate I don't consider burning up 1% of our available storage space to be a negligible overhead. I'm not sure I believe it should need to be MAXALIGN'd, since it is followed by item pointers which IIRC only need 2-byte alignment, but then again Heikki also recently proposed adding 4 bytes per page to allow each page to track its XID generation, to help mitigate the need for anti-wraparound vacuuming. I think Simon's approach of stealing the 16-bit page version field is reasonably clever in this regard, although I also understand why Tom objects to it, and I certainly agree with him that we need to be careful not to back ourselves into a corner. What I'm not too clear about is whether a 16-bit checksum meets the needs of people who want checksums. If we assume that flaky hardware is going to corrupt pages steadily over time, then it seems like it might be adequate, because in the unlikely event that the first corrupted page happens to still pass its checksum test, well, another will come along and we'll probably spot the problem then, likely well before any significant fraction of the data gets eaten. But I'm not sure whether that's the right mental model. I, and I think some others, initially assumed we'd want a 32-bit checksum, but I'm not sure I can justify that beyond well, I think that's what people usually do. It could be that even if we add new page header space for the checksum (as opposed to stuffing it into the page version field) we still want to add only 2 bytes. Not sure... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: How would you know when to look in the double write buffer? You scan the double-write buffer, and every page in the double write buffer that has a valid checksum, you copy to the main storage. There's no need to check validity of pages in the main storage. OK, then we are talking at cross purposes. Double write buffers, in the way you explain them allow us to remove full page writes. They clearly don't do anything to check page validity on read. Torn pages are not the only fault we wish to correct against... and the double writes idea is orthogonal to the idea of checksums. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums
On Wed, Dec 28, 2011 at 9:00 AM, Robert Haas robertmh...@gmail.com wrote: What I'm not too clear about is whether a 16-bit checksum meets the needs of people who want checksums. We need this now, hence the gymnastics to get it into this release. 16-bits of checksum is way better than zero bits of checksum, probably about a million times better (numbers taken from papers quoted earlier on effectiveness of checksums). The strategy I am suggesting is 16-bits now, 32/64 later. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause at end of recovery
On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote: I can see a reason to do this now. I've written patch and will commit on Friday. Nudge me if I don't. It's hard to write this so it works in all cases and doesn't work in the right cases also. Basically, we can't get in the way of crash recovery, so the only way we can currently tell a crash recovery from an archive recovery is the presence of restore_command. If you don't have that and you haven't set a recovery target, it won't pause and there's nothing I can do, AFAICS. Please test this and review before commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 8647024..1e1614f 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -263,6 +263,8 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows para Specifies whether recovery should pause when the recovery target is reached. The default is true. +If varnamepause_at_recovery_target/ is set yet no recovery target +is specified there will be a pause when we reach the end of WAL. This is intended to allow queries to be executed against the database to check if this recovery target is the most desirable point for recovery. The paused state can be resumed by using @@ -275,7 +277,7 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows /para para This setting has no effect if xref linkend=guc-hot-standby is not -enabled, or if no recovery target is set. +enabled, or if the database has not reached a consistent state. /para /listitem /varlistentry diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 41800a4..dc72c00 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -186,6 +186,9 @@ static bool InArchiveRecovery = false; /* Was the last xlog file restored from archive, or local? */ static bool restoredFromArchive = false; +/* Were we explicitly requested to terminate recovery, by any means? */ +static bool triggered = false; + /* options taken from recovery.conf for archive recovery */ static char *recoveryRestoreCommand = NULL; static char *recoveryEndCommand = NULL; @@ -6569,6 +6572,24 @@ StartupXLOG(void) ereport(LOG, (errmsg(last completed transaction was at log time %s, timestamptz_to_str(xtime; + + /* + * If we are in archive recovery and yet didn't have an explicit + * recovery target then pause at the end of recovery, unless we + * already paused above or we have been triggered to go live. + * Pause only if users can connect to send a resume message + */ + if (recoveryPauseAtTarget +standbyState == STANDBY_SNAPSHOT_READY +recoveryTarget == RECOVERY_TARGET_UNSET +InArchiveRecovery +!triggered +!reachedStopPoint) + { +SetRecoveryPause(true); +recoveryPausesHere(); + } + InRedo = false; } else @@ -9836,7 +9857,7 @@ retry: * can from archive and pg_xlog before failover. */ if (CheckForStandbyTrigger()) - goto triggered; + goto trigger_received; } /* @@ -9960,7 +9981,7 @@ next_record_is_invalid: else return false; -triggered: +trigger_received: if (readFile = 0) close(readFile); readFile = -1; @@ -10012,7 +10033,6 @@ static bool CheckForStandbyTrigger(void) { struct stat stat_buf; - static bool triggered = false; if (triggered) return true; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
With help from IBM Japan Ltd. we did some tests on a larger IBM machine than Tom Lane has used for his test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us). In his case it was IBM 8406-71Y, which has 8 physical cores and 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32 physical cores and 4SMT(128 threadings), 256GB RAM. The test method was same as the one in the article above. The differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer size(8GB). We tested 3 methods to enhance spin lock contention: 1) Add hint parameter to lwarx op which is usable POWER6 or later architecure. 2) Add non-locked test in TAS() 3) #1 + #2 We saw small performance enhancement with #1, larger one with #2 and even better with #1+#2. Hmm, so you added the non-locked test in TAS()? Did you try adding it just to TAS_SPIN()? On Itanium, I found that it was slightly better to do it only in TAS_SPIN() - i.e. in the contended case. Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM Japan. Also this patch deal with older Power architectures which do not have hint argument of lwarx opcode. According to him, the patch resulted in much better performance stock git head. Stock git head without patch: pgbench -c 1 -j 1 -S -T 300tps = 11360.472691 (including ... pgbench -c 2 -j 1 -S -T 300tps = 22173.943133 (including ... pgbench -c 4 -j 2 -S -T 300tps = 43397.331641 (including ... pgbench -c 8 -j 4 -S -T 300tps = 73469.073714 (including ... pgbench -c 16 -j 8 -S -T 300 tps = 151094.270443 (including ... pgbench -c 32 -j 16 -S -T 300 tps = 166752.637452 (including ... pgbench -c 64 -j 32 -S -T 300 tps = 148139.338204 (including ... pgbench -c 128 -j 64 -S -T 300 tps = 115412.622895 (including ... Stock git head with patch: pgbench -c 1 -j 1 -S -T 300tps = 11103.370854 (including ... pgbench -c 2 -j 1 -S -T 300tps = 22118.907582 (including ... pgbench -c 4 -j 2 -S -T 300tps = 42608.641820 (including ... pgbench -c 8 -j 4 -S -T 300tps = 77592.862639 (including ... pgbench -c 16 -j 8 -S -T 300 tps = 150469.841892 (including ... pgbench -c 32 -j 16 -S -T 300 tps = 267726.082168 (including ... pgbench -c 64 -j 32 -S -T 300 tps = 322582.271713 (including ... pgbench -c 128 -j 64 -S -T 300 tps = 273071.683663 (including ... (Graph is attached) Test environment: Power 750 (32 physical cores, virtually 128 cores using SMT4) mem: 256GB OS: RHEL6.1 kernel 2.6.32-131.0.15.el6.ppc64 gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) PostgreSQL Git head (0510b62d91151b9d8c1fe1aa15c9cf3ffe9bf25b) -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp inline: PostgreSQL-ppc-TAS_SPIN-20111228.png ppc-TAS_SPIN-20111228.diff.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat wait timeout
Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is currently in test/dev mode. I'm currently seeing the following messages occurring every few seconds: ... Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout Hm, so can you strace the stats collector to see what it's doing? Maybe grab a backtrace with GDB from it before anything else. My guess is 27324 is the autovac launcher and the others are autovac workers just as they die. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ordering op for WHERE
hi, does it make sense to teach the planner (and the executor?) use an ordering op to optimize queries like the following? select * from t where a - 1000 10 YAMAMOTO Takashi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of VS 2010 support patches
On 12/27/2011 11:09 PM, Brar Piening wrote: Brar Piening wrote: I have to admit that it's currently broken (it builds but fails during regression tests becuse it can't connect) when building with Visual Studio 2010 or Windows SDK 7.1 because of commit 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the mingw-w64 32 bit compiler). It seems like VS 2010 has a few of the E... constants in src/include/port/win32.h already defined, but obviously in a way that breaks postgres. Because of my missing experience and as I don't have a Mingw64 build environment I don't feel like I could fix that without breaking anythig else. I'd like to add that I'm certainly willing to test suggested fixes or patches in my VS 2010 build environment. Can you narrow down exactly what in that commit broke VS 2010? Are there any compiler warnings? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of VS 2010 support patches
On Wed, Dec 28, 2011 at 14:38, Andrew Dunstan and...@dunslane.net wrote: On 12/27/2011 11:09 PM, Brar Piening wrote: Brar Piening wrote: I have to admit that it's currently broken (it builds but fails during regression tests becuse it can't connect) when building with Visual Studio 2010 or Windows SDK 7.1 because of commit 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the mingw-w64 32 bit compiler). It seems like VS 2010 has a few of the E... constants in src/include/port/win32.h already defined, but obviously in a way that breaks postgres. Because of my missing experience and as I don't have a Mingw64 build environment I don't feel like I could fix that without breaking anythig else. I'd like to add that I'm certainly willing to test suggested fixes or patches in my VS 2010 build environment. Can you narrow down exactly what in that commit broke VS 2010? Are there any compiler warnings? He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db. Not the exact error or warnings, though ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of VS 2010 support patches
On 12/28/2011 08:43 AM, Magnus Hagander wrote: On Wed, Dec 28, 2011 at 14:38, Andrew Dunstanand...@dunslane.net wrote: On 12/27/2011 11:09 PM, Brar Piening wrote: Brar Piening wrote: I have to admit that it's currently broken (it builds but fails during regression tests becuse it can't connect) when building with Visual Studio 2010 or Windows SDK 7.1 because of commit 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the mingw-w64 32 bit compiler). It seems like VS 2010 has a few of the E... constants in src/include/port/win32.h already defined, but obviously in a way that breaks postgres. Because of my missing experience and as I don't have a Mingw64 build environment I don't feel like I could fix that without breaking anythig else. I'd like to add that I'm certainly willing to test suggested fixes or patches in my VS 2010 build environment. Can you narrow down exactly what in that commit broke VS 2010? Are there any compiler warnings? He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db. I realize that. I'm asking him to narrow it down more. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of VS 2010 support patches
On Wed, Dec 28, 2011 at 15:08, Andrew Dunstan and...@dunslane.net wrote: On 12/28/2011 08:43 AM, Magnus Hagander wrote: On Wed, Dec 28, 2011 at 14:38, Andrew Dunstanand...@dunslane.net wrote: On 12/27/2011 11:09 PM, Brar Piening wrote: Brar Piening wrote: I have to admit that it's currently broken (it builds but fails during regression tests becuse it can't connect) when building with Visual Studio 2010 or Windows SDK 7.1 because of commit 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the mingw-w64 32 bit compiler). It seems like VS 2010 has a few of the E... constants in src/include/port/win32.h already defined, but obviously in a way that breaks postgres. Because of my missing experience and as I don't have a Mingw64 build environment I don't feel like I could fix that without breaking anythig else. I'd like to add that I'm certainly willing to test suggested fixes or patches in my VS 2010 build environment. Can you narrow down exactly what in that commit broke VS 2010? Are there any compiler warnings? He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db. I realize that. I'm asking him to narrow it down more. Meh. I can't read. Sorry about that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote: I bet if you kept a judicious number of clog pages in each local process with some smart invalidation you could cover enough cases that scribbling the bits down would become unnecessary. I don't understand how any cache can completely remove the need for hint bits. Without hint bits the xids in the tuples will be in-doubt forever. No matter how large your cache you'll always come across tuples that are arbitrarily old and are from an unbounded size set of xids. We could replace the xids with a frozen xid sooner but that just amounts to nearly the same thing as the hint bits only with page locking and wal records. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Wed, Dec 28, 2011 at 8:45 AM, Greg Stark st...@mit.edu wrote: On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote: I bet if you kept a judicious number of clog pages in each local process with some smart invalidation you could cover enough cases that scribbling the bits down would become unnecessary. I don't understand how any cache can completely remove the need for hint bits. Without hint bits the xids in the tuples will be in-doubt forever. No matter how large your cache you'll always come across tuples that are arbitrarily old and are from an unbounded size set of xids. well, hint bits aren't needed strictly speaking, they are an optimization to guard against clog lookups. but is marking bits on the tuple the only way to get that effect? I'm conjecturing that some process local memory could be laid on top of the clog slru that would be fast enough such that it could take the place of the tuple bits in the visibility check. Maybe this could reduce clog contention as well -- or maybe the idea is unworkable. That said, it shouldn't be that much work to make a proof of concept to test the idea. We could replace the xids with a frozen xid sooner but that just amounts to nearly the same thing as the hint bits only with page locking and wal records. right -- I don't think that helps. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat wait timeout
On 12/28/2011 05:05 AM, Alvaro Herrera wrote: Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is currently in test/dev mode. I'm currently seeing the following messages occurring every few seconds: ... Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout Hm, so can you strace the stats collector to see what it's doing? Maybe grab a backtrace with GDB from it before anything else. My guess is 27324 is the autovac launcher and the others are autovac workers just as they die. You are correct. 27324 is the launcher and the others are autovac workers. Here's the strace of the stats collector process: getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) rinse...lather...repeat...ad nauseum... And the backtrace: #0 0x7ff4d2e80f58 in poll () from /lib/libc.so.6 #1 0x7ff4d4e6f465 in ?? () #2 0x7ff4d4e6fd83 in pgstat_start () #3 0x7ff4d4e73475 in ?? () #4 signal handler called #5 0x7ff4d2e85fd3 in select () from /lib/libc.so.6 #6 0x7ff4d4e71b93 in ?? () #7 0x7ff4d4e74b01 in PostmasterMain () #8 0x7ff4d4e193b3 in main () Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums
On 28.12.2011 11:00, Robert Haas wrote: Admittedly, most of the fat is probably in the tuple header rather than the page header, but at any rate I don't consider burning up 1% of our available storage space to be a negligible overhead. 8 / 8192 = 0.1%. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat wait timeout
Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011: On 12/28/2011 05:05 AM, Alvaro Herrera wrote: Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is currently in test/dev mode. I'm currently seeing the following messages occurring every few seconds: ... Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout Hm, so can you strace the stats collector to see what it's doing? Maybe grab a backtrace with GDB from it before anything else. My guess is 27324 is the autovac launcher and the others are autovac workers just as they die. You are correct. 27324 is the launcher and the others are autovac workers. Here's the strace of the stats collector process: getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) rinse...lather...repeat...ad nauseum... Weird ... even across more pgstat wait timeout messages? It's like it's not getting the inquiry messages that would tell it to write the file ... something wrong with the UDP socket perhaps? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On 28.12.2011 11:22, Simon Riggs wrote: On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: How would you know when to look in the double write buffer? You scan the double-write buffer, and every page in the double write buffer that has a valid checksum, you copy to the main storage. There's no need to check validity of pages in the main storage. OK, then we are talking at cross purposes. Double write buffers, in the way you explain them allow us to remove full page writes. They clearly don't do anything to check page validity on read. Torn pages are not the only fault we wish to correct against... and the double writes idea is orthogonal to the idea of checksums. The reason we're talking about double write buffers in this thread is that double write buffers can be used to solve the problem with hint bits and checksums. You're right, though, that it's academical whether double write buffers can be used without checksums on data pages, if the whole point of the exercise is to make it possible to have checksums on data pages.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] age(xid) on hot standby
The check_postgres txn_wraparound action[0] runs this query: SELECT datname, age(datfrozenxid) AS age FROM pg_database WHERE datallowconn ORDER BY 1, 2 On a hot standby, this fails with: ERROR: cannot assign TransactionIds during recovery So, a couple of things to wonder about: Is it unreasonable to check for transaction ID wraparound on a standby? It should mirror the situation on the primary, shouldn't it? Should the age(xid) function do something more useful on a standby, e.g., have a custom error message or return null or use the transaction ID from the master? The error message is coded as an elog() call, meaning that users shouldn't see it, but it can evidently be triggered by a user, so maybe we should decorate it with some detail, depending on the outcome of the previous question. (It looks like age(xid) isn't documented at all. Maybe it should be.) [0] - http://bucardo.org/check_postgres/check_postgres.pl.html#txn_wraparound -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat wait timeout
On 12/28/2011 09:34 AM, Alvaro Herrera wrote: Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011: On 12/28/2011 05:05 AM, Alvaro Herrera wrote: Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is currently in test/dev mode. I'm currently seeing the following messages occurring every few seconds: ... Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout Hm, so can you strace the stats collector to see what it's doing? Maybe grab a backtrace with GDB from it before anything else. My guess is 27324 is the autovac launcher and the others are autovac workers just as they die. You are correct. 27324 is the launcher and the others are autovac workers. Here's the strace of the stats collector process: getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) rinse...lather...repeat...ad nauseum... Weird ... even across more pgstat wait timeout messages? It's like it's not getting the inquiry messages that would tell it to write the file ... something wrong with the UDP socket perhaps? Bingo! postgres 27325 postgres8u *IPv6*5379428 0t0UDP localhost:47204-localhost:47204 In working on diagnosing a network timeout issue over an IPv4 to IPv4 VPN I disabled IPv6 via sysctl on this machine and pretty much forgot about it since we are still IPv4 internally. But PostgreSQL had already established a (now non-functional) IPv6 local connection. Re-enabling IPv6, as it was not related to the VPN timeouts, corrected the pgstat wait timeout issue. Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib/README
Tom Lane t...@sss.pgh.pa.us writes: I wonder whether it's time to drop that file altogether ... it served a purpose back before we integrated contrib into the SGML docs, but now I'm not quite sure why we should bother with it. I wonder if we shouldn't keep the file and have it just point to the relevant documentation chapters (extend, contribs). Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] age(xid) on hot standby
Excerpts from Peter Eisentraut's message of mié dic 28 15:04:09 -0300 2011: The check_postgres txn_wraparound action[0] runs this query: SELECT datname, age(datfrozenxid) AS age FROM pg_database WHERE datallowconn ORDER BY 1, 2 On a hot standby, this fails with: ERROR: cannot assign TransactionIds during recovery So, a couple of things to wonder about: Is it unreasonable to check for transaction ID wraparound on a standby? It should mirror the situation on the primary, shouldn't it? Should the age(xid) function do something more useful on a standby, e.g., have a custom error message or return null or use the transaction ID from the master? I think we could just have the xid_age call GetCurrentTransactionIdIfAny, and if that returns InvalidXid, use ReadNewTransactionId instead. That xid_age assigns a transaction seems more of an accident than really intended. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] age(xid) on hot standby
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Peter Eisentraut's message of mié dic 28 15:04:09 -0300 2011: On a hot standby, this fails with: ERROR: cannot assign TransactionIds during recovery I think we could just have the xid_age call GetCurrentTransactionIdIfAny, and if that returns InvalidXid, use ReadNewTransactionId instead. That xid_age assigns a transaction seems more of an accident than really intended. The trouble with using ReadNewTransactionId is that it makes the results volatile, not stable as the function is declared to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
On 28.12.2011 14:03, Tatsuo Ishii wrote: With help from IBM Japan Ltd. we did some tests on a larger IBM machine than Tom Lane has used for his test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us). In his case it was IBM 8406-71Y, which has 8 physical cores and 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32 physical cores and 4SMT(128 threadings), 256GB RAM. The test method was same as the one in the article above. The differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer size(8GB). We tested 3 methods to enhance spin lock contention: 1) Add hint parameter to lwarx op which is usable POWER6 or later architecure. 2) Add non-locked test in TAS() 3) #1 + #2 We saw small performance enhancement with #1, larger one with #2 and even better with #1+#2. Hmm, so you added the non-locked test in TAS()? Did you try adding it just to TAS_SPIN()? On Itanium, I found that it was slightly better to do it only in TAS_SPIN() - i.e. in the contended case. Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM Japan. Also this patch deal with older Power architectures which do not have hint argument of lwarx opcode. According to him, the patch resulted in much better performance stock git head. Impressive results. config/c-compiler.m4 doesn't seem like the right place for the configure test. Would there be any harm in setting the lwarx hint always; what would happen on older ppc processors that don't support it? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
On 28.12.2011 14:03, Tatsuo Ishii wrote: With help from IBM Japan Ltd. we did some tests on a larger IBM machine than Tom Lane has used for his test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us). In his case it was IBM 8406-71Y, which has 8 physical cores and 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32 physical cores and 4SMT(128 threadings), 256GB RAM. The test method was same as the one in the article above. The differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer size(8GB). We tested 3 methods to enhance spin lock contention: 1) Add hint parameter to lwarx op which is usable POWER6 or later architecure. 2) Add non-locked test in TAS() 3) #1 + #2 We saw small performance enhancement with #1, larger one with #2 and even better with #1+#2. Hmm, so you added the non-locked test in TAS()? Did you try adding it just to TAS_SPIN()? On Itanium, I found that it was slightly better to do it only in TAS_SPIN() - i.e. in the contended case. Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM Japan. Also this patch deal with older Power architectures which do not have hint argument of lwarx opcode. According to him, the patch resulted in much better performance stock git head. Impressive results. config/c-compiler.m4 doesn't seem like the right place for the configure test. Would there be any harm in setting the lwarx hint always; what would happen on older ppc processors that don't support it? I think the load module just fails to run in this case, but I'd like to confirm. Ori-san? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of VS 2010 support patches
Andrew Dunstan wrote: Can you narrow down exactly what in that commit broke VS 2010? Are there any compiler warnings? I was able to nail down the problem. Running the regression tests (vcregress check) gives the following messages: snip == creating temporary installation== == initializing database system == == starting postmaster== pg_regress: postmaster did not respond within 60 seconds Examine src/test/regress/log/postmaster.log for the reason /snip postmaster.log shows the following messages: snip LOG: database system was shut down at 2011-12-28 22:09:46 CET LOG: database system is ready to accept connections LOG: autovacuum launcher started LOG: incomplete startup packet /snip with the line LOG: incomplete startup packet repeated several times afterwards. The problem seems to be related to an invalid socket error constant. EWOULDBLOCK gets expanded to 140 with commit 1a0c76c32fe470142d3663dd84ac960d75a4e8db applied whereas it got expanded to 10035L before. Adding the following code to src/include/port/win32.h restores the former (running) behaviour : snip #if _MSC_VER = 1600 #pragma warning(disable:4005) #define EWOULDBLOCK WSAEWOULDBLOCK #endif /snip But according to the winsock docs this minimal invasive surgery isn't really appropriate (at least for visual c). http://msdn.microsoft.com/en-us/library/windows/desktop/ms737828(v=vs.85).aspx It appears that VS 2010 and Windows SDK 7.1 now have an extended errno.h that defines quite a few of the E* constants: snip /* POSIX SUPPLEMENT */ #define EADDRINUSE 100 #define EADDRNOTAVAIL 101 [...] #define ETXTBSY 139 #define EWOULDBLOCK 140 /snip Here we probably run into the conflict that winsock2.h has always been warning about: snip /* * Windows Sockets errors redefined as regular Berkeley error constants. * These are commented out in Windows NT to avoid conflicts with errno.h. * Use the WSA constants instead. */ #if 0 #define EWOULDBLOCK WSAEWOULDBLOCK [...] #define ESTALE WSAESTALE #define EREMOTE WSAEREMOTE #endif /snip A possible solution would be to use something like PGEWOULDBLOCK and similiar constants wherever socket errors are used and set them to the WSAE* constants on windows and the E* constants on other platforms. Anyhow, this would be ways beyond the scope of my patch and there will probably be a better solution to be suggested from a real C hacker. Regards, Brar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: config/c-compiler.m4 doesn't seem like the right place for the configure test. Would there be any harm in setting the lwarx hint always; what would happen on older ppc processors that don't support it? More to the point, a configure test only proves whether the build machine can deal with the flag, not whether the machine the executables will ultimately run on knows what the flag means. We cannot assume that the build and execution boxes are the same. (In general, AC_TRY_RUN tests are best avoided because of this.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
Tom Lane wrote: a configure test only proves whether the build machine can deal with the flag, not whether the machine the executables will ultimately run on knows what the flag means. We cannot assume that the build and execution boxes are the same. (In general, AC_TRY_RUN tests are best avoided because of this.) I understand why that is important in general, but as a shop which builds from source, and is fine with a separate build for each hardware model / OS version combination, it would be great if any optimizations which are only available if you *do* assume that the build machine and the run machine are the same (or at lease identical) could be enabled with some configure switch. Maybe something like --enable-platform-specific-optimizations. I don't know if any such possible optimizations currently exist, I'm just saying that if any are identified, it would be nice to have the option of using them. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
OT: Please use mail address manabu@gmail.com, not manabu@gmailc.com when following this thread. I accidently made a mistake when I posted the first mail in this thread. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp Tom Lane wrote: a configure test only proves whether the build machine can deal with the flag, not whether the machine the executables will ultimately run on knows what the flag means. We cannot assume that the build and execution boxes are the same. (In general, AC_TRY_RUN tests are best avoided because of this.) I understand why that is important in general, but as a shop which builds from source, and is fine with a separate build for each hardware model / OS version combination, it would be great if any optimizations which are only available if you *do* assume that the build machine and the run machine are the same (or at lease identical) could be enabled with some configure switch. Maybe something like --enable-platform-specific-optimizations. I don't know if any such possible optimizations currently exist, I'm just saying that if any are identified, it would be nice to have the option of using them. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on HP-UX
2011/12/29 Tatsuo Ishii is...@postgresql.org Impressive results. config/c-compiler.m4 doesn't seem like the right place for the configure test. Would there be any harm in setting the lwarx hint always; what would happen on older ppc processors that don't support it? I think the load module just fails to run in this case, but I'd like to confirm. Ori-san? I don't know where is the right config/*.m4 to place this kind of configure test. Do you have any idea? I believe lwarx hint would be no harm for recent PowerPC processors. What I tested are: (1) Built postgres on POWER6 + RHEL5, which got lwarx hint included. Then copy these src tree to POWER5 + RHEL4 and run make test, finished successfully. (2) Lwarx test in configure failed on POWER5 + RHEL4. Note that POWER6 understands lwarx hint and POWER5 doesn't. RHEL5 binutils supports lwarx hint and RHEL4 binutils doesn't. The only concern is for very old PowerPC. Referring to Power Instruction Set Architecture manual(*1), on some processors that precede PowerISA v2.00, executing lwarx with hint will cause the illegal instruction error. Lwarx test in configure should fail on these kind of processors, guessing from my test(2). (*1) p.689 of https://www.power.org/resources/downloads/PowerISA_V2.06B_V2_PUBLIC.pdf Regards, Manabu Ori
Re: [HACKERS] spinlocks on HP-UX
a configure test only proves whether the build machine can deal with the flag, not whether the machine the executables will ultimately run on knows what the flag means. We cannot assume that the build and execution boxes are the same. (In general, AC_TRY_RUN tests are best avoided because of this.) I understand why that is important in general, but as a shop which builds from source, and is fine with a separate build for each hardware model / OS version combination, it would be great if any optimizations which are only available if you *do* assume that the build machine and the run machine are the same (or at lease identical) could be enabled with some configure switch. Maybe something like --enable-platform-specific-optimizations. I don't know if any such possible optimizations currently exist, I'm just saying that if any are identified, it would be nice to have the option of using them. I can't say the right way to go for now, but I'd like binary packages could enjoy the effect of my patch as far as possible so that I made lwarx hint test run in configure runtime. Regards, Manabu Ori