Re: [HACKERS] Page Checksums

2011-12-28 Thread Robert Haas
On Tue, Dec 27, 2011 at 1:39 PM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote:
 I
 think it would be regrettable if everyone had to give up 4 bytes per
 page because some people want checksums.

 I can understand that some people might not want the CPU expense of
 calculating CRCs; or the upgrade expense to convert to new pages; but do
 you think 4 bytes out of 8192 is a real concern?

 (Aside: it would be MAXALIGNed anyway, so probably more like 8 bytes.)

Yeah, I do.  Our on-disk footprint is already significantly greater
than that of some other systems, and IMHO we should be looking for a
way to shrink our overhead in that area, not make it bigger.
Admittedly, most of the fat is probably in the tuple header rather
than the page header, but at any rate I don't consider burning up 1%
of our available storage space to be a negligible overhead.  I'm not
sure I believe it should need to be MAXALIGN'd, since it is followed
by item pointers which IIRC only need 2-byte alignment, but then again
Heikki also recently proposed adding 4 bytes per page to allow each
page to track its XID generation, to help mitigate the need for
anti-wraparound vacuuming.

I think Simon's approach of stealing the 16-bit page version field is
reasonably clever in this regard, although I also understand why Tom
objects to it, and I certainly agree with him that we need to be
careful not to back ourselves into a corner.  What I'm not too clear
about is whether a 16-bit checksum meets the needs of people who want
checksums.  If we assume that flaky hardware is going to corrupt pages
steadily over time, then it seems like it might be adequate, because
in the unlikely event that the first corrupted page happens to still
pass its checksum test, well, another will come along and we'll
probably spot the problem then, likely well before any significant
fraction of the data gets eaten.  But I'm not sure whether that's the
right mental model.  I, and I think some others, initially assumed
we'd want a 32-bit checksum, but I'm not sure I can justify that
beyond well, I think that's what people usually do.  It could be
that even if we add new page header space for the checksum (as opposed
to stuffing it into the page version field) we still want to add only
2 bytes.  Not sure...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-28 Thread Simon Riggs
On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 How would you know when to look in the double write buffer?


 You scan the double-write buffer, and every page in the double write buffer
 that has a valid checksum, you copy to the main storage. There's no need to
 check validity of pages in the main storage.

OK, then we are talking at cross purposes. Double write buffers, in
the way you explain them allow us to remove full page writes. They
clearly don't do anything to check page validity on read. Torn pages
are not the only fault we wish to correct against... and the double
writes idea is orthogonal to the idea of checksums.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-28 Thread Simon Riggs
On Wed, Dec 28, 2011 at 9:00 AM, Robert Haas robertmh...@gmail.com wrote:

 What I'm not too clear
 about is whether a 16-bit checksum meets the needs of people who want
 checksums.

We need this now, hence the gymnastics to get it into this release.

16-bits of checksum is way better than zero bits of checksum, probably
about a million times better (numbers taken from papers quoted earlier
on effectiveness of checksums).

The strategy I am suggesting is 16-bits now, 32/64 later.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pause at end of recovery

2011-12-28 Thread Simon Riggs
On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote:

 I can see a reason to do this now. I've written patch and will commit
 on Friday. Nudge me if I don't.

It's hard to write this so it works in all cases and doesn't work in
the right cases also.

Basically, we can't get in the way of crash recovery, so the only way
we can currently tell a crash recovery from an archive recovery is the
presence of restore_command.

If you don't have that and you haven't set a recovery target, it won't
pause and there's nothing I can do, AFAICS.

Please test this and review before commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 8647024..1e1614f 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -263,6 +263,8 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
para
 Specifies whether recovery should pause when the recovery target
 is reached. The default is true.
+If varnamepause_at_recovery_target/ is set yet no recovery target
+is specified there will be a pause when we reach the end of WAL.
 This is intended to allow queries to be executed against the
 database to check if this recovery target is the most desirable
 point for recovery. The paused state can be resumed by using
@@ -275,7 +277,7 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
/para
para
 This setting has no effect if xref linkend=guc-hot-standby is not
-enabled, or if no recovery target is set.
+enabled, or if the database has not reached a consistent state.
/para
   /listitem
  /varlistentry
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 41800a4..dc72c00 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -186,6 +186,9 @@ static bool InArchiveRecovery = false;
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;
 
+/* Were we explicitly requested to terminate recovery, by any means? */
+static bool triggered = false;
+
 /* options taken from recovery.conf for archive recovery */
 static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
@@ -6569,6 +6572,24 @@ StartupXLOG(void)
 ereport(LOG,
 	 (errmsg(last completed transaction was at log time %s,
 			 timestamptz_to_str(xtime;
+
+			/*
+			 * If we are in archive recovery and yet didn't have an explicit
+			 * recovery target then pause at the end of recovery, unless we
+			 * already paused above or we have been triggered to go live.
+			 * Pause only if users can connect to send a resume message
+			 */
+			if (recoveryPauseAtTarget  
+standbyState == STANDBY_SNAPSHOT_READY 
+recoveryTarget == RECOVERY_TARGET_UNSET 
+InArchiveRecovery 
+!triggered 
+!reachedStopPoint)
+			{
+SetRecoveryPause(true);
+recoveryPausesHere();
+			}
+
 			InRedo = false;
 		}
 		else
@@ -9836,7 +9857,7 @@ retry:
 	 * can from archive and pg_xlog before failover.
 	 */
 	if (CheckForStandbyTrigger())
-		goto triggered;
+		goto trigger_received;
 }
 
 /*
@@ -9960,7 +9981,7 @@ next_record_is_invalid:
 	else
 		return false;
 
-triggered:
+trigger_received:
 	if (readFile = 0)
 		close(readFile);
 	readFile = -1;
@@ -10012,7 +10033,6 @@ static bool
 CheckForStandbyTrigger(void)
 {
 	struct stat stat_buf;
-	static bool triggered = false;
 
 	if (triggered)
 		return true;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Tatsuo Ishii
 With help from IBM Japan Ltd. we did some tests on a larger IBM
 machine than Tom Lane has used for his
 test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
 In his case it was IBM 8406-71Y, which has 8 physical cores and
 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32
 physical cores and 4SMT(128 threadings), 256GB RAM.

 The test method was same as the one in the article above. The
 differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer
 size(8GB).

 We tested 3 methods to enhance spin lock contention:

 1) Add hint parameter to lwarx op which is usable POWER6 or later
   architecure.

 2) Add non-locked test in TAS()

 3) #1 + #2

 We saw small performance enhancement with #1, larger one with #2 and
 even better with #1+#2.
 
 Hmm, so you added the non-locked test in TAS()?  Did you try adding it
 just to TAS_SPIN()?  On Itanium, I found that it was slightly better
 to do it only in TAS_SPIN() - i.e. in the contended case.

Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM
Japan. Also this patch deal with older Power architectures which do
not have hint argument of lwarx opcode.

According to him, the patch resulted in much better performance stock
git head.

Stock git head without patch:
pgbench -c 1 -j 1 -S -T 300tps = 11360.472691 (including ...
pgbench -c 2 -j 1 -S -T 300tps = 22173.943133 (including ...
pgbench -c 4 -j 2 -S -T 300tps = 43397.331641 (including ...
pgbench -c 8 -j 4 -S -T 300tps = 73469.073714 (including ...
pgbench -c 16 -j 8 -S -T 300   tps = 151094.270443 (including ...
pgbench -c 32 -j 16 -S -T 300  tps = 166752.637452 (including ...
pgbench -c 64 -j 32 -S -T 300  tps = 148139.338204 (including ...
pgbench -c 128 -j 64 -S -T 300 tps = 115412.622895 (including ...

Stock git head with patch:
pgbench -c 1 -j 1 -S -T 300tps = 11103.370854 (including ...
pgbench -c 2 -j 1 -S -T 300tps = 22118.907582 (including ...
pgbench -c 4 -j 2 -S -T 300tps = 42608.641820 (including ...
pgbench -c 8 -j 4 -S -T 300tps = 77592.862639 (including ...
pgbench -c 16 -j 8 -S -T 300   tps = 150469.841892 (including ...
pgbench -c 32 -j 16 -S -T 300  tps = 267726.082168 (including ...
pgbench -c 64 -j 32 -S -T 300  tps = 322582.271713 (including ...
pgbench -c 128 -j 64 -S -T 300 tps = 273071.683663 (including ...

(Graph is attached)

Test environment:
Power 750 (32 physical cores, virtually 128 cores using SMT4)
mem: 256GB
OS: RHEL6.1 kernel 2.6.32-131.0.15.el6.ppc64
gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) 
PostgreSQL Git head (0510b62d91151b9d8c1fe1aa15c9cf3ffe9bf25b)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
inline: PostgreSQL-ppc-TAS_SPIN-20111228.png

ppc-TAS_SPIN-20111228.diff.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Alvaro Herrera

Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
 I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is 
 currently in test/dev mode. I'm currently seeing the following messages 
 occurring every few seconds:
 
 ...
 Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
 Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait 
 timeout
 Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
 Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait 
 timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ordering op for WHERE

2011-12-28 Thread YAMAMOTO Takashi
hi,

does it make sense to teach the planner (and the executor?) use an ordering op
to optimize queries like the following?

select * from t where a - 1000  10

YAMAMOTO Takashi

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-28 Thread Andrew Dunstan



On 12/27/2011 11:09 PM, Brar Piening wrote:

Brar Piening wrote:
I have to admit that it's currently broken (it builds but fails 
during regression tests becuse it can't connect) when building with 
Visual Studio 2010 or Windows SDK 7.1 because of commit 
1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the 
mingw-w64 32 bit compiler).


It seems like VS 2010 has a few of the E... constants in 
src/include/port/win32.h already defined, but obviously in a way that 
breaks postgres.


Because of my missing experience and as I don't have a Mingw64 build 
environment I don't feel like I could fix that without breaking 
anythig else.


I'd like to add that I'm certainly willing to test suggested fixes or 
patches in my VS 2010 build environment.





Can you narrow down exactly what in that commit broke VS 2010? Are there 
any compiler warnings?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-28 Thread Magnus Hagander
On Wed, Dec 28, 2011 at 14:38, Andrew Dunstan and...@dunslane.net wrote:


 On 12/27/2011 11:09 PM, Brar Piening wrote:

 Brar Piening wrote:

 I have to admit that it's currently broken (it builds but fails during
 regression tests becuse it can't connect) when building with Visual Studio
 2010 or Windows SDK 7.1 because of commit
 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the
 mingw-w64 32 bit compiler).

 It seems like VS 2010 has a few of the E... constants in
 src/include/port/win32.h already defined, but obviously in a way that breaks
 postgres.

 Because of my missing experience and as I don't have a Mingw64 build
 environment I don't feel like I could fix that without breaking anythig
 else.


 I'd like to add that I'm certainly willing to test suggested fixes or
 patches in my VS 2010 build environment.



 Can you narrow down exactly what in that commit broke VS 2010? Are there any
 compiler warnings?

He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db.

Not the exact error or warnings, though ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-28 Thread Andrew Dunstan



On 12/28/2011 08:43 AM, Magnus Hagander wrote:

On Wed, Dec 28, 2011 at 14:38, Andrew Dunstanand...@dunslane.net  wrote:


On 12/27/2011 11:09 PM, Brar Piening wrote:

Brar Piening wrote:

I have to admit that it's currently broken (it builds but fails during
regression tests becuse it can't connect) when building with Visual Studio
2010 or Windows SDK 7.1 because of commit
1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the
mingw-w64 32 bit compiler).

It seems like VS 2010 has a few of the E... constants in
src/include/port/win32.h already defined, but obviously in a way that breaks
postgres.

Because of my missing experience and as I don't have a Mingw64 build
environment I don't feel like I could fix that without breaking anythig
else.


I'd like to add that I'm certainly willing to test suggested fixes or
patches in my VS 2010 build environment.



Can you narrow down exactly what in that commit broke VS 2010? Are there any
compiler warnings?

He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db.



I realize that. I'm asking him to narrow it down more.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-28 Thread Magnus Hagander
On Wed, Dec 28, 2011 at 15:08, Andrew Dunstan and...@dunslane.net wrote:


 On 12/28/2011 08:43 AM, Magnus Hagander wrote:

 On Wed, Dec 28, 2011 at 14:38, Andrew Dunstanand...@dunslane.net  wrote:


 On 12/27/2011 11:09 PM, Brar Piening wrote:

 Brar Piening wrote:

 I have to admit that it's currently broken (it builds but fails during
 regression tests becuse it can't connect) when building with Visual
 Studio
 2010 or Windows SDK 7.1 because of commit
 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the
 mingw-w64 32 bit compiler).

 It seems like VS 2010 has a few of the E... constants in
 src/include/port/win32.h already defined, but obviously in a way that
 breaks
 postgres.

 Because of my missing experience and as I don't have a Mingw64 build
 environment I don't feel like I could fix that without breaking anythig
 else.


 I'd like to add that I'm certainly willing to test suggested fixes or
 patches in my VS 2010 build environment.


 Can you narrow down exactly what in that commit broke VS 2010? Are there
 any
 compiler warnings?

 He did post the commit - 1a0c76c32fe470142d3663dd84ac960d75a4e8db.



 I realize that. I'm asking him to narrow it down more.

Meh. I can't read. Sorry about that.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Greg Stark
On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote:
  I bet if you kept a judicious number of
 clog pages in each local process with some smart invalidation you
 could cover enough cases that scribbling the bits down would become
 unnecessary.

I don't understand how any cache can completely remove the need for
hint bits. Without hint bits the xids in the tuples will be in-doubt
forever. No matter how large your cache you'll always come across
tuples that are arbitrarily old and are from an unbounded size set of
xids.

We could replace the xids with a frozen xid sooner but that just
amounts to nearly the same thing as the hint bits only with page
locking and wal records.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Merlin Moncure
On Wed, Dec 28, 2011 at 8:45 AM, Greg Stark st...@mit.edu wrote:
 On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote:
  I bet if you kept a judicious number of
 clog pages in each local process with some smart invalidation you
 could cover enough cases that scribbling the bits down would become
 unnecessary.

 I don't understand how any cache can completely remove the need for
 hint bits. Without hint bits the xids in the tuples will be in-doubt
 forever. No matter how large your cache you'll always come across
 tuples that are arbitrarily old and are from an unbounded size set of
 xids.

well, hint bits aren't needed strictly speaking, they are an
optimization to guard against clog lookups.   but is marking bits on
the tuple the only way to get that effect?

I'm conjecturing that some process local memory could be laid on top
of the clog slru that would be fast enough such that it could take the
place of the tuple bits in the visibility check.  Maybe this could
reduce clog contention as well -- or maybe the idea is unworkable.
That said, it shouldn't be that much work to make a proof of concept
to test the idea.

 We could replace the xids with a frozen xid sooner but that just
 amounts to nearly the same thing as the hint bits only with page
 locking and wal records.

right -- I don't think that helps.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Steve Crawford

On 12/28/2011 05:05 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:

I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
currently in test/dev mode. I'm currently seeing the following messages
occurring every few seconds:

...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait
timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait
timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.

You are correct. 27324 is the launcher and the others are autovac 
workers. Here's the strace of the stats collector process:


getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
rinse...lather...repeat...ad nauseum...

And the backtrace:

#0  0x7ff4d2e80f58 in poll () from /lib/libc.so.6
#1  0x7ff4d4e6f465 in ?? ()
#2  0x7ff4d4e6fd83 in pgstat_start ()
#3  0x7ff4d4e73475 in ?? ()
#4 signal handler called
#5  0x7ff4d2e85fd3 in select () from /lib/libc.so.6
#6  0x7ff4d4e71b93 in ?? ()
#7  0x7ff4d4e74b01 in PostmasterMain ()
#8  0x7ff4d4e193b3 in main ()

Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-28 Thread Heikki Linnakangas

On 28.12.2011 11:00, Robert Haas wrote:

Admittedly, most of the fat is probably in the tuple header rather
than the page header, but at any rate I don't consider burning up 1%
of our available storage space to be a negligible overhead.


8 / 8192 = 0.1%.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Alvaro Herrera

Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011:
 On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
  Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
  I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
  currently in test/dev mode. I'm currently seeing the following messages
  occurring every few seconds:
 
  ...
  Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
  Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait
  timeout
  Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
  Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait
  timeout
  Hm, so can you strace the stats collector to see what it's doing?  Maybe
  grab a backtrace with GDB from it before anything else.
 
  My guess is 27324 is the autovac launcher and the others are autovac
  workers just as they die.
 
 You are correct. 27324 is the launcher and the others are autovac 
 workers. Here's the strace of the stats collector process:
 
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 rinse...lather...repeat...ad nauseum...

Weird ... even across more pgstat wait timeout messages?  It's like
it's not getting the inquiry messages that would tell it to write the
file ... something wrong with the UDP socket perhaps?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-28 Thread Heikki Linnakangas

On 28.12.2011 11:22, Simon Riggs wrote:

On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


How would you know when to look in the double write buffer?



You scan the double-write buffer, and every page in the double write buffer
that has a valid checksum, you copy to the main storage. There's no need to
check validity of pages in the main storage.


OK, then we are talking at cross purposes. Double write buffers, in
the way you explain them allow us to remove full page writes. They
clearly don't do anything to check page validity on read. Torn pages
are not the only fault we wish to correct against... and the double
writes idea is orthogonal to the idea of checksums.


The reason we're talking about double write buffers in this thread is 
that double write buffers can be used to solve the problem with hint 
bits and checksums.


You're right, though, that it's academical whether double write buffers 
can be used without checksums on data pages, if the whole point of the 
exercise is to make it possible to have checksums on data pages..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] age(xid) on hot standby

2011-12-28 Thread Peter Eisentraut
The check_postgres txn_wraparound action[0] runs this query:

   SELECT datname, age(datfrozenxid) AS age FROM pg_database WHERE datallowconn 
ORDER BY 1, 2

On a hot standby, this fails with:

   ERROR:  cannot assign TransactionIds during recovery

So, a couple of things to wonder about:

Is it unreasonable to check for transaction ID wraparound on a standby?
It should mirror the situation on the primary, shouldn't it?

Should the age(xid) function do something more useful on a standby,
e.g., have a custom error message or return null or use the transaction
ID from the master?

The error message is coded as an elog() call, meaning that users
shouldn't see it, but it can evidently be triggered by a user, so maybe
we should decorate it with some detail, depending on the outcome of the
previous question.

(It looks like age(xid) isn't documented at all.  Maybe it should be.)


[0] - http://bucardo.org/check_postgres/check_postgres.pl.html#txn_wraparound


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Steve Crawford

On 12/28/2011 09:34 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011:

On 12/28/2011 05:05 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:

I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
currently in test/dev mode. I'm currently seeing the following messages
occurring every few seconds:

...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait
timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait
timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.


You are correct. 27324 is the launcher and the others are autovac
workers. Here's the strace of the stats collector process:

getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
rinse...lather...repeat...ad nauseum...

Weird ... even across more pgstat wait timeout messages?  It's like
it's not getting the inquiry messages that would tell it to write the
file ... something wrong with the UDP socket perhaps?


Bingo!

postgres  27325 postgres8u *IPv6*5379428   
0t0UDP localhost:47204-localhost:47204


In working on diagnosing a network timeout issue over an IPv4 to IPv4 
VPN I disabled IPv6 via sysctl on this machine and pretty much forgot 
about it since we are still IPv4 internally. But PostgreSQL had already 
established a (now non-functional) IPv6 local connection. Re-enabling 
IPv6, as it was not related to the VPN timeouts, corrected the pgstat 
wait timeout issue.


Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/README

2011-12-28 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I wonder whether it's time to drop that file altogether ... it served a
 purpose back before we integrated contrib into the SGML docs, but now
 I'm not quite sure why we should bother with it.

I wonder if we shouldn't keep the file and have it just point to the
relevant documentation chapters (extend, contribs).

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] age(xid) on hot standby

2011-12-28 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of mié dic 28 15:04:09 -0300 2011:
 The check_postgres txn_wraparound action[0] runs this query:
 
SELECT datname, age(datfrozenxid) AS age FROM pg_database WHERE 
 datallowconn ORDER BY 1, 2
 
 On a hot standby, this fails with:
 
ERROR:  cannot assign TransactionIds during recovery
 
 So, a couple of things to wonder about:
 
 Is it unreasonable to check for transaction ID wraparound on a standby?
 It should mirror the situation on the primary, shouldn't it?
 
 Should the age(xid) function do something more useful on a standby,
 e.g., have a custom error message or return null or use the transaction
 ID from the master?

I think we could just have the xid_age call
GetCurrentTransactionIdIfAny, and if that returns InvalidXid, use
ReadNewTransactionId instead.  That xid_age assigns a transaction seems
more of an accident than really intended.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] age(xid) on hot standby

2011-12-28 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Peter Eisentraut's message of mié dic 28 15:04:09 -0300 2011:
 On a hot standby, this fails with:
 ERROR:  cannot assign TransactionIds during recovery

 I think we could just have the xid_age call
 GetCurrentTransactionIdIfAny, and if that returns InvalidXid, use
 ReadNewTransactionId instead.  That xid_age assigns a transaction seems
 more of an accident than really intended.

The trouble with using ReadNewTransactionId is that it makes the results
volatile, not stable as the function is declared to be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Heikki Linnakangas

On 28.12.2011 14:03, Tatsuo Ishii wrote:

With help from IBM Japan Ltd. we did some tests on a larger IBM
machine than Tom Lane has used for his
test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
In his case it was IBM 8406-71Y, which has 8 physical cores and
4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32
physical cores and 4SMT(128 threadings), 256GB RAM.

The test method was same as the one in the article above. The
differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer
size(8GB).

We tested 3 methods to enhance spin lock contention:

1) Add hint parameter to lwarx op which is usable POWER6 or later
   architecure.

2) Add non-locked test in TAS()

3) #1 + #2

We saw small performance enhancement with #1, larger one with #2 and
even better with #1+#2.


Hmm, so you added the non-locked test in TAS()?  Did you try adding it
just to TAS_SPIN()?  On Itanium, I found that it was slightly better
to do it only in TAS_SPIN() - i.e. in the contended case.


Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM
Japan. Also this patch deal with older Power architectures which do
not have hint argument of lwarx opcode.

According to him, the patch resulted in much better performance stock
git head.


Impressive results.

config/c-compiler.m4 doesn't seem like the right place for the configure 
test. Would there be any harm in setting the lwarx hint always; what 
would happen on older ppc processors that don't support it?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Tatsuo Ishii
 On 28.12.2011 14:03, Tatsuo Ishii wrote:
 With help from IBM Japan Ltd. we did some tests on a larger IBM
 machine than Tom Lane has used for his
 test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
 In his case it was IBM 8406-71Y, which has 8 physical cores and
 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32
 physical cores and 4SMT(128 threadings), 256GB RAM.

 The test method was same as the one in the article above. The
 differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer
 size(8GB).

 We tested 3 methods to enhance spin lock contention:

 1) Add hint parameter to lwarx op which is usable POWER6 or later
architecure.

 2) Add non-locked test in TAS()

 3) #1 + #2

 We saw small performance enhancement with #1, larger one with #2 and
 even better with #1+#2.

 Hmm, so you added the non-locked test in TAS()?  Did you try adding it
 just to TAS_SPIN()?  On Itanium, I found that it was slightly better
 to do it only in TAS_SPIN() - i.e. in the contended case.

 Here is new patch using TAS_SPIN(), created by Manabu Ori from IBM
 Japan. Also this patch deal with older Power architectures which do
 not have hint argument of lwarx opcode.

 According to him, the patch resulted in much better performance stock
 git head.
 
 Impressive results.
 
 config/c-compiler.m4 doesn't seem like the right place for the
 configure test. Would there be any harm in setting the lwarx hint
 always; what would happen on older ppc processors that don't support
 it?

I think the load module just fails to run in this case, but I'd like
to confirm. Ori-san?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-28 Thread Brar Piening

Andrew Dunstan wrote:
Can you narrow down exactly what in that commit broke VS 2010? Are 
there any compiler warnings?


I was able to nail down the problem.

Running the regression tests (vcregress check) gives the following messages:
snip
== creating temporary installation==
== initializing database system   ==
== starting postmaster==

pg_regress: postmaster did not respond within 60 seconds
Examine src/test/regress/log/postmaster.log for the reason
/snip

postmaster.log shows the following messages:
snip
LOG:  database system was shut down at 2011-12-28 22:09:46 CET
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  incomplete startup packet
/snip
with the line LOG:  incomplete startup packet repeated several times 
afterwards.


The problem seems to be related to an invalid socket error constant.
EWOULDBLOCK gets expanded to 140 with commit 
1a0c76c32fe470142d3663dd84ac960d75a4e8db applied whereas it got expanded 
to 10035L before.
 Adding the following code to src/include/port/win32.h restores the 
former (running) behaviour :

snip
#if _MSC_VER = 1600
#pragma warning(disable:4005)
#define EWOULDBLOCK WSAEWOULDBLOCK
#endif
/snip

But according to the winsock docs this minimal invasive surgery isn't 
really appropriate (at least for visual c).

http://msdn.microsoft.com/en-us/library/windows/desktop/ms737828(v=vs.85).aspx

It appears that VS 2010 and Windows SDK 7.1 now have an extended errno.h 
that defines quite a few of the E* constants:

snip
/* POSIX SUPPLEMENT */
#define EADDRINUSE  100
#define EADDRNOTAVAIL   101
[...]
#define ETXTBSY 139
#define EWOULDBLOCK 140
/snip

Here we probably run into the conflict that winsock2.h has always been 
warning about:

snip
/*
 * Windows Sockets errors redefined as regular Berkeley error constants.
 * These are commented out in Windows NT to avoid conflicts with errno.h.
 * Use the WSA constants instead.
 */
#if 0
#define EWOULDBLOCK WSAEWOULDBLOCK
[...]
#define ESTALE  WSAESTALE
#define EREMOTE WSAEREMOTE
#endif
/snip

A possible solution would be to use something like PGEWOULDBLOCK and 
similiar constants wherever socket errors are used and set them to the 
WSAE* constants on windows and the E* constants on other platforms.


Anyhow, this would be ways beyond the scope of my patch and there will 
probably be a better solution to be suggested from a real C hacker.


Regards,

Brar














--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 config/c-compiler.m4 doesn't seem like the right place for the configure 
 test. Would there be any harm in setting the lwarx hint always; what 
 would happen on older ppc processors that don't support it?

More to the point, a configure test only proves whether the
build machine can deal with the flag, not whether the machine
the executables will ultimately run on knows what the flag means.
We cannot assume that the build and execution boxes are the same.
(In general, AC_TRY_RUN tests are best avoided because of this.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Kevin Grittner
Tom Lane  wrote:
 
 a configure test only proves whether the build machine can deal
 with the flag, not whether the machine the executables will
 ultimately run on knows what the flag means.  We cannot assume that
 the build and execution boxes are the same.  (In general,
 AC_TRY_RUN tests are best avoided because of this.)
 
I understand why that is important in general, but as a shop which
builds from source, and is fine with a separate build for each
hardware model / OS version combination, it would be great if any
optimizations which are only available if you *do* assume that the
build machine and the run machine are the same (or at lease
identical) could be enabled with some configure switch.  Maybe
something like --enable-platform-specific-optimizations.
 
I don't know if any such possible optimizations currently exist, I'm
just saying that if any are identified, it would be nice to have the
option of using them.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Tatsuo Ishii
OT:

Please use mail address manabu@gmail.com, not
manabu@gmailc.com when following this thread.  I accidently made
a mistake when I posted the first mail in this thread.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 Tom Lane  wrote:
  
 a configure test only proves whether the build machine can deal
 with the flag, not whether the machine the executables will
 ultimately run on knows what the flag means.  We cannot assume that
 the build and execution boxes are the same.  (In general,
 AC_TRY_RUN tests are best avoided because of this.)
  
 I understand why that is important in general, but as a shop which
 builds from source, and is fine with a separate build for each
 hardware model / OS version combination, it would be great if any
 optimizations which are only available if you *do* assume that the
 build machine and the run machine are the same (or at lease
 identical) could be enabled with some configure switch.  Maybe
 something like --enable-platform-specific-optimizations.
  
 I don't know if any such possible optimizations currently exist, I'm
 just saying that if any are identified, it would be nice to have the
 option of using them.
  
 -Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Manabu Ori
2011/12/29 Tatsuo Ishii is...@postgresql.org
  Impressive results.
 
  config/c-compiler.m4 doesn't seem like the right place for the
  configure test. Would there be any harm in setting the lwarx hint
  always; what would happen on older ppc processors that don't support
  it?

 I think the load module just fails to run in this case, but I'd like
 to confirm. Ori-san?

I don't know where is the right config/*.m4 to place this kind of
configure test. Do you have any idea?

I believe lwarx hint would be no harm for recent PowerPC processors.
What I tested are:

  (1) Built postgres on POWER6 + RHEL5, which got lwarx hint
  included. Then copy these src tree to POWER5 + RHEL4 and
  run make test, finished successfully.

  (2) Lwarx test in configure failed on POWER5 + RHEL4.

Note that POWER6 understands lwarx hint and POWER5 doesn't.
RHEL5 binutils supports lwarx hint and RHEL4 binutils doesn't.

The only concern is for very old PowerPC.
Referring to Power Instruction Set Architecture manual(*1), on
some processors that precede PowerISA v2.00, executing lwarx with
hint will cause the illegal instruction error.

Lwarx test in configure should fail on these kind of processors,
guessing from my test(2).

(*1) p.689 of
https://www.power.org/resources/downloads/PowerISA_V2.06B_V2_PUBLIC.pdf

Regards,
Manabu Ori


Re: [HACKERS] spinlocks on HP-UX

2011-12-28 Thread Manabu Ori
  a configure test only proves whether the build machine can deal
  with the flag, not whether the machine the executables will
  ultimately run on knows what the flag means.  We cannot assume that
  the build and execution boxes are the same.  (In general,
  AC_TRY_RUN tests are best avoided because of this.)

 I understand why that is important in general, but as a shop which
 builds from source, and is fine with a separate build for each
 hardware model / OS version combination, it would be great if any
 optimizations which are only available if you *do* assume that the
 build machine and the run machine are the same (or at lease
 identical) could be enabled with some configure switch.  Maybe
 something like --enable-platform-specific-optimizations.

 I don't know if any such possible optimizations currently exist, I'm
 just saying that if any are identified, it would be nice to have the
 option of using them.

I can't say the right way to go for now, but I'd like binary
packages could enjoy the effect of my patch as far as possible so
that I made lwarx hint test run in configure runtime.

Regards,
Manabu Ori