[HACKERS] Re: patch-3 (3-allow-wal-record-header-to-be-split.patch) WAL Format Changes

2012-07-01 Thread Amit kapila
From: Heikki Linnakangas [heikki.linnakan...@enterprisedb.com]
Sent: Sunday, July 01, 2012 1:54 AM
On 30.06.2012 10:11, Amit kapila wrote:
 3. General observation, not related to your changes
 XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
.
.
 if (freespace == 0)
   {
   updrqst = AdvanceXLInsertBuffer(false);

 In the code, AdvanceXLInsertBuffer(false); is called after starting the
 critical section and acquiring WALInsertLock. Now if any error occurs
 inside AdvanceXLInsertBuffer() (in one of its paths it calls XLogWrite(),
 which can call XLogFileInit(), where an error can occur), how will it
 release WALInsertLock or end the critical section?

 Yep, if an I/O error happens while writing a WAL record - running out of
 disk space is the typical example - we PANIC. Nope, it's not ideal.

 Even if we somehow managed to avoid doing I/O in the critical section in
 XLogInsert(), most callers of XLogInsert() surround the call in a
 critical section anyway. For example, when a new tuple is inserted to
 heap, once you've modified the page to add the new tuple, it's too late
 to back out. The WAL record is written while holding the lock on the
 page, and if something goes wrong with writing the WAL record, we have
 no choice but PANIC.

PANIC is understandable, as after it the user cannot perform operations
without a restart.
However, if the level is ERROR, there might be other problems, since the
user can keep performing operations through the same session.
The case which I am highlighting is ERROR; please refer to the code of
XLogFileInit().

For example:

fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
				   S_IRUSR | S_IWUSR);
if (fd < 0)
{
	if (errno != ENOENT)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
						path, log, seg)));
}
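
(For reference, the reason this particular ereport(ERROR) cannot leave
WALInsertLock dangling is that elog.c escalates errors raised inside a
critical section. A paraphrased sketch of that logic in errstart(),
from memory, not the verbatim source:)

	/*
	 * Inside a critical section (CritSectionCount > 0), any ERROR is
	 * promoted to PANIC, so the server restarts rather than trying to
	 * continue with the lock still held.
	 */
	if (elevel == ERROR)
	{
		if (CritSectionCount > 0)
			elevel = PANIC;
	}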


Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-01 Thread Alexander Korotkov
On Wed, Jun 27, 2012 at 11:35 PM, Robert Haas robertmh...@gmail.com wrote:

 It looks to me like pg_wchar2utf_with_len will not work, because
 unicode_to_utf8 returns its second argument unmodified - not, as your
 code seems to assume, the byte following what was already written.


Fixed.
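
(For readers following along, the usual shape of the fix is to advance
the output pointer by the length of the character just emitted, since
unicode_to_utf8() returns its second argument unmodified. A sketch with
invented variable names, not the patch's actual code:)

	/* Sketch only: "from", "to" and "len" are illustrative names. */
	while (len > 0)
	{
		unicode_to_utf8(*from, to);	/* writes one character's bytes */
		to += pg_utf_mblen(to);		/* step past the bytes just written */
		from++;
		len--;
	}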


 MULE also looks problematic.  The code that you've written isn't
 symmetric with the opposite conversion, unlike what you did in all
 other cases, and I don't understand why.  I'm also somewhat baffled by
 the reverse conversion: it treats a multi-byte sequence beginning with
 a byte for which IS_LCPRV1(x) returns true as invalid if there are
 less than 3 bytes available, but it only reads two; similarly, for
 IS_LCPRV2(x), it demands 4 bytes but converts only 3.


Should we keep the existing pg_wchar representation for the MULE encoding?
Probably we could modify it as in the 0.1 version of the patch, in order
to make it more transparent.

--
With best regards,
Alexander Korotkov.


wchar2mb-0.4.patch
Description: Binary data



[HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Boszormenyi Zoltan

Hi,

attached is a patch that does $SUBJECT.

It's a usability enhancement, to take a backup, write
a minimalistic recovery.conf and start the streaming
standby in one go.
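
For illustration, the generated file would look roughly like this (a
minimal sketch; the connection values here are invented):

standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'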

Comments?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/

diff -durpN postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml postgresql/doc/src/sgml/ref/pg_basebackup.sgml
--- postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml	2012-06-18 07:06:42.774009894 +0200
+++ postgresql/doc/src/sgml/ref/pg_basebackup.sgml	2012-07-01 12:51:51.050652170 +0200
@@ -186,6 +186,28 @@ PostgreSQL documentation
  </varlistentry>
 
  <varlistentry>
+  <term><option>-R</option></term>
+  <term><option>--write-recovery-conf</option></term>
+  <listitem>
+   <para>
+    Write a minimal recovery.conf into the output directory using
+    the connection parameters from the command line to ease
+    setting up the standby. Conflicts with <option>--xlog</option>
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><option>-S</option></term>
+  <term><option>--start-standby</option></term>
+  <listitem>
+   <para>
+    Start the standby database server. Implies <option>-R</option>.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-X <replaceable class="parameter">method</replaceable></option></term>
   <term><option>--xlog-method=<replaceable class="parameter">method</replaceable></option></term>
   <listitem>
diff -durpN postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c postgresql/src/bin/pg_basebackup/pg_basebackup.c
--- postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c	2012-06-26 09:10:21.301759598 +0200
+++ postgresql/src/bin/pg_basebackup/pg_basebackup.c	2012-07-01 12:44:36.920813960 +0200
@@ -46,6 +46,8 @@ int			compresslevel = 0;
 bool		includewal = false;
 bool		streamwal = false;
 bool		fastcheckpoint = false;
+bool		writerecoveryconf = false;
+bool		startstandby = false;
 int			standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 
 /* Progress counters */
@@ -107,6 +109,10 @@ usage(void)
 	printf(_("\nOptions controlling the output:\n"));
 	printf(_("  -D, --pgdata=DIRECTORY   receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t         output format (plain (default), tar)\n"));
+	printf(_("  -R, --write-recovery-conf\n"
+			 "                           write recovery.conf after backup\n"));
+	printf(_("  -S, --start-standby      start standby after backup, implies -R\n"
+			 "                           conflicts with -x\n"));
 	printf(_("  -x, --xlog               include required WAL files in backup (fetch mode)\n"));
 	printf(_("  -X, --xlog-method=fetch|stream\n"
 			 "                           include required WAL files with specified method\n"));
@@ -1193,6 +1199,85 @@ BaseBackup(void)
 		fprintf(stderr, "%s: base backup completed\n", progname);
 }
 
+static void WriteRecoveryConf(void)
+{
+	char		filename[MAXPGPATH];
+	char		cfhost[MAXPGPATH] = "";
+	char		cfport[MAXPGPATH] = "";
+	char		cfuser[MAXPGPATH] = "";
+	char		cfpass[MAXPGPATH] = "";
+	FILE	   *cf;
+
+	if (!writerecoveryconf)
+		return;
+
+	/* build the file name first, it is used in the message below */
+	sprintf(filename, "%s/recovery.conf", basedir);
+
+	memset(cfhost, 0, MAXPGPATH);
+	if (dbhost)
+		snprintf(cfhost, MAXPGPATH-1, "host=%s", dbhost);
+
+	memset(cfport, 0, MAXPGPATH);
+	if (dbport)
+		snprintf(cfport, MAXPGPATH-1, "port=%s", dbport);
+
+	memset(cfuser, 0, MAXPGPATH);
+	if (dbuser)
+		snprintf(cfuser, MAXPGPATH-1, "user=%s", dbuser);
+
+	memset(cfpass, 0, MAXPGPATH);
+	if (dbpassword)
+		snprintf(cfpass, MAXPGPATH-1, "password='%s'", dbpassword);
+	else
+		printf("add password to primary_conninfo in %s if needed\n", filename);
+
+	cf = fopen(filename, "w");
+	if (cf == NULL)
+	{
+		fprintf(stderr, _("cannot create %s"), filename);
+		exit(1);
+	}
+
+	fprintf(cf, "standby_mode = 'on'\n");
+	fprintf(cf, "primary_conninfo = '%s %s %s %s'\n",
+			cfhost, cfport, cfuser, cfpass);
+
+	fclose(cf);
+}
+
+static void StartStandby(const char *command)
+{
+	char	   *path;
+	char	   *args[5];
+	int		pos, len;
+
+	if (!startstandby)
+		return;
+
+	path = xstrdup(command);
+	len = strlen(path);
+
+	for (pos = len; pos > 0; pos--)
+	{
+		if (path[pos - 1] == '/'
+#ifdef WIN32
+			|| path[pos - 1] == '\\'
+#endif
+		)
+			break;
+	}
+
+	sprintf(&path[pos], "pg_ctl");
+
+	args[0] = path;
+	args[1] = "-D";
+	args[2] = basedir;
+	args[3] = "start";
+	args[4] = NULL;
+
+	if (execvp(path, args) < 0)
+		fprintf(stderr, _("Cannot execute pg_ctl: %s"),
+			strerror(errno));
+}
 
 int
 main(int argc, char **argv)
@@ -1203,6 +1288,8 @@ main(int argc, char **argv)
 		{"pgdata", required_argument, NULL, 'D'},
 		{"format", required_argument, NULL, 'F'},
 		{"checkpoint", required_argument, NULL, 'c'},
+		{"write-recovery-conf", no_argument, NULL, 'R'},
+		{"start-standby", no_argument, NULL, 'S'},
 		{"xlog", no_argument, NULL, 'x'},
 		{"xlog-method", 

Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions

2012-07-01 Thread Alexander Korotkov
Hi, Andres!

There is my review of this patch.

1) The patches don't apply cleanly to head. So I used commit
bed88fceac04042f0105eb22a018a4f91d64400d as the base, and then all the
patches up to this one apply cleanly. Regression tests pass OK, but it
seems that the new functionality isn't covered by regression tests.

2) The patch needs more comments. I think we need at least one comment at
the head of each function describing its behaviour, even if it is evident
from the function name.

4) There is significant code duplication between the APPLY_CACHE_CHANGE_UPDATE
and APPLY_CACHE_CHANGE_DELETE branches of the switch in the apply_change
function. I think this could be refactored to reduce the duplication.

5) The apply mechanism requires a PK on each table. So throwing an error here
if we don't find a PK is necessary. But we need to prevent the user from
starting logical replication earlier, rather than failing while applying
received messages. AFAICS the patch which creates the corresponding log
messages is here:
http://archives.postgresql.org/message-id/1339586927-13156-7-git-send-email-and...@2ndquadrant.com.
And it only throws a warning if it fails to find a PK. At which stage do we
prevent the user from running logical replication on tables which don't have
a PK?

6) I've seen the comment /* FIXME: locking */. But you open the index with
index_rel = index_open(indexoid, AccessShareLock);
and close it with
heap_close(index_rel, NoLock);
Shouldn't we use the same lock level on close as on open? Also, heap_close
doesn't look right to me for closing an index. Why not use index_close or
relation_close? (See the sketch after this list.)

7) We find each updated and deleted tuple by PK. Imagine we update a
significant part of the table (for example, 10%) in a single query and the
planner chooses a sequential scan for it. Then applying these changes could
be more expensive than making the original changes. This is probably OK. But
we could add a heuristic: detect that a sequential scan is cheaper because
of the large number of updates or deletes on one table.
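
(Regarding point 6: index_open/index_close take the lock mode
explicitly, so the suggested pairing would look like this sketch,
assuming the open shown above:)

	index_rel = index_open(indexoid, AccessShareLock);
	/* ... use the index ... */
	index_close(index_rel, AccessShareLock);	/* release the same lock */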

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Robert Haas
On Fri, Jun 29, 2012 at 1:07 PM, Nils Goroll sl...@schokola.de wrote:
 FWIW, I kicked off a looong benchmarking run on this a couple of days
 ago on the IBM POWER7 box, testing pgbench -S, regular pgbench, and
 pgbench --unlogged-tables at various client counts with and without
 the patch; three half-hour test runs for each test configuration.  It
 should be done tonight and I will post the results once they're in.

 Sounds great! I am really curious.

Here are the results.  Each result is the median of three 30-minute
test runs on an IBM POWER7 system with 16 cores, 64 hardware threads.
Configuration was shared_buffers = 8GB, maintenance_work_mem = 1GB,
synchronous_commit = off, checkpoint_segments = 300,
checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
wal_writer_delay = 20ms, log_line_prefix = '%t [%p] '.  Lines
beginning with m show performance on master; lines beginning with p
show performance with patch; the following number is the # of clients
used for the test.

Permanent Tables


m01 tps = 1364.521373 (including connections establishing)
m08 tps = 9175.281381 (including connections establishing)
m32 tps = 14770.652793 (including connections establishing)
m64 tps = 14183.495875 (including connections establishing)
p01 tps = 1366.447001 (including connections establishing)
p08 tps = 9406.181857 (including connections establishing)
p32 tps = 14608.766540 (including connections establishing)
p64 tps = 14182.576636 (including connections establishing)

Unlogged Tables
===

m01 tps = 1459.649000 (including connections establishing)
m08 tps = 11872.102025 (including connections establishing)
m32 tps = 32834.258026 (including connections establishing)
m64 tps = 33404.988834 (including connections establishing)
p01 tps = 1481.876584 (including connections establishing)
p08 tps = 11787.657258 (including connections establishing)
p32 tps = 32959.342248 (including connections establishing)
p64 tps = 33672.008244 (including connections establishing)

SELECT-only
===

m01 tps = 8777.971832 (including connections establishing)
m08 tps = 70695.558964 (including connections establishing)
m32 tps = 201762.696020 (including connections establishing)
m64 tps = 310137.544470 (including connections establishing)
p01 tps = 8914.165586 (including connections establishing)
p08 tps = 71351.501358 (including connections establishing)
p32 tps = 201946.425301 (including connections establishing)
p64 tps = 305627.413716 (including connections establishing)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.3] Row-Level Security

2012-07-01 Thread Kohei KaiGai
2012/6/28 Tom Lane t...@sss.pgh.pa.us:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 2012/6/27 Florian Pflug f...@phlo.org:
 Hm, what happens if a SECURITY DEFINER functions returns a refcursor?

 My impression is, it does not matter even if a SECURITY DEFINER function
 returns a refcursor.

 I think Florian has a point: it *should* work, but *will* it?

 I believe it works today, because the executor only applies permissions
 checks during query startup.  So those checks are executed while still
 within the SECURITY DEFINER context, and should behave as expected.
 Subsequently, the cursor portal is returned to caller and caller can
 execute it to completion, no problem.

 However, with RLS security-related checks will happen throughout the
 execution of the portal.  They might do the wrong thing once the
 SECURITY DEFINER function has been exited.

I tried the scenario that Florian pointed out.

postgres=# CREATE OR REPLACE FUNCTION f_test(refcursor) RETURNS refcursor
postgres-# SECURITY DEFINER LANGUAGE plpgsql
postgres-# AS 'BEGIN OPEN $1 FOR SELECT current_user, * FROM t1;
RETURN $1; END';
CREATE FUNCTION
postgres=# BEGIN;
BEGIN
postgres=# SELECT f_test('abc');
 f_test
--------
 abc
(1 row)

postgres=# FETCH abc;
 current_user | a |  b
--------------+---+-----
 kaigai       | 1 | aaa
(1 row)

postgres=# SET SESSION AUTHORIZATION alice;
SET
postgres=> FETCH abc;
 current_user | a |  b
--------------+---+-----
 alice        | 2 | bbb
(1 row)

postgres=> SET SESSION AUTHORIZATION bob;
SET
postgres=> FETCH abc;
 current_user | a |  b
--------------+---+-----
 bob          | 3 | ccc
(1 row)

Indeed, the output of current_user depends on the setting of session
authorization, even though it was declared within a security definer
function (thus, its security checks are applied with the privileges of
the function owner).

 We might need to consider that a portal has a local value of
 current_user, which is kind of a pain, but probably doable.

It seems to me this is a separate matter, to be fixed independently of
the RLS feature. What is your opinion?

If we go ahead, one idea to tackle this is to switch the user-id to the
one saved in the Portal at the beginning of PortalRun or
PortalRunFetch.
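
(A rough sketch of that idea; Portal has no such fields today, the
"createUserId"/"createSecContext" members below are invented for
illustration:)

	/* at portal creation time, remember who created it */
	GetUserIdAndSecContext(&portal->createUserId, &portal->createSecContext);

	/* at the beginning of PortalRun / PortalRunFetch */
	GetUserIdAndSecContext(&saveUserId, &saveSecContext);
	SetUserIdAndSecContext(portal->createUserId, portal->createSecContext);
	/* ... execute the portal ... */
	SetUserIdAndSecContext(saveUserId, saveSecContext);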

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Nils Goroll
Thank you, Robert.

as this patch was not targeted towards increasing tps, I am happy to hear
that your benchmarks also suggest that performance is comparable.

But my main question is: how about resource consumption? For the issue I am
working on, my current working hypothesis is that spinning on locks saturates
resources and brings down overall performance in a high-contention situation.

Do you have any getrusage figures or anything equivalent?

Thanks, Nils



Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Nils Goroll
 test runs on an IBM POWER7 system with 16 cores, 64 hardware threads.

Could you add the CPU Type / clock speed please?



Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Fujii Masao
On Sun, Jul 1, 2012 at 8:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,

 attached is a patch that does $SUBJECT.

 It's a usability enhancement, to take a backup, write
 a minimalistic recovery.conf and start the streaming
 standby in one go.

 Comments?

Could you add the patch to the next CommitFest?

If the backup is taken from the standby server, the standby's recovery.conf
is included in the backup. What happens in this case?

Regards,

-- 
Fujii Masao



Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Boszormenyi Zoltan

Hi,

On 2012-07-01 17:38, Fujii Masao wrote:

On Sun, Jul 1, 2012 at 8:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

Hi,

attached is a patch that does $SUBJECT.

It's a usability enhancement, to take a backup, write
a minimalistic recovery.conf and start the streaming
standby in one go.

Comments?

Could you add the patch to the next CommitFest?


I will.


If the backup is taken from the standby server, the standby's recovery.conf
is included in the backup. What happens in this case?


As documented, the command line parameters of pg_basebackup
will be used for recovery.conf. So the new standby will replicate
from the previous one. Cascading replication has worked since 9.2.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Magnus Hagander
On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,

 attached is a patch that does $SUBJECT.

 It's a usability enhancement, to take a backup, write
 a minimalistic recovery.conf and start the streaming
 standby in one go.

I like the writing of recovery.conf. In fact, I had it in my code at
one very early point and took it out in order to get a clean patch
ready :)

But I think that part is lacking in functionality: AFAICT it's
hardcoded to only handle host, port, user and password. What about
other connection parameters, likely passed to pg_basebackup through
the environment in that case? Isn't that quite likely to break the
server later?

Maybe the proper way around that is to provide the ability for
pg_basebackup to take a full connection string, just like we allow
psql to do?



I'm not sure we should go the way of providing the "start slave"
functionality, given that how you want to start the slave differs so
much between platforms. The most glaring example is that on Windows
you really need to *start the service* rather than use pg_ctl. Sure,
you can document your way around that, but I'm not sure the
functionality added is really worth it. What about all the other
potential connection parameters?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions

2012-07-01 Thread Alexander Korotkov
On Sun, Jul 1, 2012 at 3:11 PM, Alexander Korotkov aekorot...@gmail.com wrote:

 1) The patches don't apply cleanly to head. So I used commit
 bed88fceac04042f0105eb22a018a4f91d64400d as the base, and then all the
 patches up to this one apply cleanly. Regression tests pass OK, but it
 seems that the new functionality isn't covered by regression tests.

 2) The patch needs more comments. I think we need at least one comment at
 the head of each function describing its behaviour, even if it is evident
 from the function name.

 4) There is significant code duplication between the APPLY_CACHE_CHANGE_UPDATE
 and APPLY_CACHE_CHANGE_DELETE branches of the switch in the apply_change
 function. I think this could be refactored to reduce the duplication.

 5) The apply mechanism requires a PK on each table. So throwing an error here
 if we don't find a PK is necessary. But we need to prevent the user from
 starting logical replication earlier, rather than failing while applying
 received messages. AFAICS the patch which creates the corresponding log
 messages is here:
 http://archives.postgresql.org/message-id/1339586927-13156-7-git-send-email-and...@2ndquadrant.com.
 And it only throws a warning if it fails to find a PK. At which stage do we
 prevent the user from running logical replication on tables which don't have
 a PK?

 6) I've seen the comment /* FIXME: locking */. But you open the index with
 index_rel = index_open(indexoid, AccessShareLock);
 and close it with
 heap_close(index_rel, NoLock);
 Shouldn't we use the same lock level on close as on open? Also, heap_close
 doesn't look right to me for closing an index. Why not use index_close or
 relation_close?

 7) We find each updated and deleted tuple by PK. Imagine we update a
 significant part of the table (for example, 10%) in a single query and the
 planner chooses a sequential scan for it. Then applying these changes could
 be more expensive than making the original changes. This is probably OK. But
 we could add a heuristic: detect that a sequential scan is cheaper because
 of the large number of updates or deletes on one table.


8) If we can't find the tuple for an update or delete, we likely need to put
the PK value into the log message.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Fujii Masao
On Mon, Jul 2, 2012 at 12:42 AM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,

 On 2012-07-01 17:38, Fujii Masao wrote:

 On Sun, Jul 1, 2012 at 8:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

 Hi,

 attached is a patch that does $SUBJECT.

 It's a usability enhancement, to take a backup, write
 a minimalistic recovery.conf and start the streaming
 standby in one go.

 Comments?

 Could you add the patch to the next CommitFest?


 I will.


 If the backup is taken from the standby server, the standby's
 recovery.conf
 is included in the backup. What happens in this case?


 As documented, the command line parameters of pg_basebackup
 will be used for recovery.conf. So, the new standby will replicate
 the previous one. Cascading replication works since 9.2.

So pg_basebackup overwrites the recovery.conf which was backed up
from the standby with the recovery.conf which was created by using
the command line parameters of pg_basebackup?

Regards,

-- 
Fujii Masao



Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Fujii Masao
On Mon, Jul 2, 2012 at 12:44 AM, Magnus Hagander mag...@hagander.net wrote:
 On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,

 attached is a patch that does $SUBJECT.

 It's a usability enhancement, to take a backup, write
 a minimalistic recovery.conf and start the streaming
 standby in one go.

 I like the writing of recovery.conf.

Agreed.

 In fact, I had it in my code at
 one very early point and took it out in order to get a clean patch
 ready :)

 But I think that part is lacking in functionality: AFAICT it's
 hardcoded to only handle host, port, user and password. What about
 other connection parameters, likely passed to pg_basebackup through
 the environment in that case? Isn't that quite likely to break the
 server later?

What about something like PQconninfo which returns the connection
string value established at connection?
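
(No such libpq function existed at the time of this thread; a
hypothetical sketch of how a client could use it to rebuild a full
connection string, with an invented helper:)

	/* Hypothetical: iterate the options libpq actually used. */
	PQconninfoOption *opts = PQconninfo(conn);
	PQconninfoOption *opt;

	for (opt = opts; opt && opt->keyword; opt++)
	{
		if (opt->val && opt->val[0] != '\0')
			appendConnStr(&buf, opt->keyword, opt->val);	/* invented helper */
	}
	PQconninfoFree(opts);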

 Maybe the proper way around that is to provide the ability for
 pg_basebackup to take a full connection string, just like we allow
 psql to do?

+1

 I'm not sure we should go the way of providing the "start slave"
 functionality, given that how you want to start the slave differs so
 much between platforms. The most glaring example is that on Windows
 you really need to *start the service* rather than use pg_ctl. Sure,
 you can document your way around that, but I'm not sure the
 functionality added is really worth it.

Agreed.

Regards,

-- 
Fujii Masao



[HACKERS] compiler warnings on the buildfarm

2012-07-01 Thread Stefan Kaltenbrunner
Seeing some of the latest commits about fixing compiler warnings, I took
a look at the buildfarm to see if there are any interesting ones there
(in total we have thousands of warnings on the buildfarm, but most of
those are from very noisy compilers).

So, in case anybody is interested, here is a selection of the ones that
at least look somewhat interesting (duplicates mostly removed, Windows
ignored):

animal: grebe
Snapshot: 2012-07-01 150224
scan.c: In function 'yy_try_NUL_trans':
scan.c:16243: warning: unused variable 'yyg'
auth.c: In function 'auth_peer':
auth.c:1775: warning: implicit declaration of function 'getpeereid'
ip.c: In function 'getaddrinfo_unix':
ip.c:228: warning: large integer implicitly truncated to unsigned type
Extra instructions are being generated for each reference to a TOC
symbol if the symbol is in the TOC overflow area.
fe-connect.c: In function 'PQconnectPoll':
fe-connect.c:1913: warning: implicit declaration of function 'getpeereid'
ip.c: In function 'getaddrinfo_unix':
ip.c:228: warning: large integer implicitly truncated to unsigned type


animal: spoonbill
Snapshot: 2012-07-01 110005
tuptoaster.c: In function 'heap_tuple_untoast_attr_slice':
tuptoaster.c:198: warning: array size (1) smaller than bound length (16)
tuptoaster.c:198: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_raw_datum_size':
tuptoaster.c:275: warning: array size (1) smaller than bound length (16)
tuptoaster.c:275: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_datum_size':
tuptoaster.c:320: warning: array size (1) smaller than bound length (16)
tuptoaster.c:320: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_save_datum':
tuptoaster.c:1344: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1344: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1458: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1458: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_delete_datum':
tuptoaster.c:1485: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1485: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_fetch_datum':
tuptoaster.c:1610: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1610: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_fetch_datum_slice':
tuptoaster.c:1779: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1779: warning: array size (1) smaller than bound length (16)
relmapper.c: In function 'relmap_redo':
relmapper.c:876: warning: array size (1) smaller than bound length (512)
relmapper.c:876: warning: array size (1) smaller than bound length (512)
elog.c: In function 'write_pipe_chunks':
elog.c:2541: warning: array size (1) smaller than bound length (503)
elog.c:2541: warning: array size (1) smaller than bound length (503)


animal: jaguarundi
Snapshot: 2012-07-01 031500
plpy_exec.c: In function 'PLy_procedure_call':
plpy_exec.c:818: warning: null format string

animal: rover_firefly
Snapshot: 2012-07-01 030400
float.c: In function 'is_infinite':
float.c:167:2: warning: implicit declaration of function 'isinf'
[-Wimplicit-function-declaration]
geo_ops.c: In function 'pg_hypot':
geo_ops.c:5455:2: warning: implicit declaration of function 'isinf'
[-Wimplicit-function-declaration]
execute.c: In function 'sprintf_double_value':
execute.c:473:2: warning: implicit declaration of function 'isinf'
[-Wimplicit-function-declaration]


animal: nightjar
Snapshot: 2012-07-01 023700
In file included from
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/backend/parser/gram.y:13338:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16243: warning: unused variable 'yyg'
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/plpython/plpy_exec.c:
In function 'PLy_procedure_call':
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/plpython/plpy_exec.c:818:
warning: null format string
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c:
In function '_PG_init':
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c:343:
warning: assignment from incompatible pointer type
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c:344:
warning: assignment from incompatible pointer type


animal: locust
Snapshot: 2012-07-01 023122
xlog.c: In function 'StartupXLOG':
xlog.c:5988: warning: 'checkPointLoc' may be used uninitialized in this
function
pgstat.c: In function 'pgstat_report_activity':
pgstat.c:2538: warning: passing argument 1 of
'__dtrace_probe$postgresql$statement__status$v1$63686172202a' discards
qualifiers from pointer target type
In file included from repl_gram.y:172:
postgres.c: In function 'pg_parse_query':
postgres.c:559: warning: passing argument 1 of

Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance

2012-07-01 Thread Fujii Masao
Thanks for the review!

On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 You agreed to add something like NOSYNC option into START_REPLICATION 
 command?

 I'm on the fence. I was hoping somebody else would chime in with an
 opinion as well.

 +1

 Nobody else with any opinion on this? :(

 I don't think we really need a NOSYNC flag at this point.  Just not
 setting the flush location in clients that make a point of flushing in
 a timely fashion seems fine.

 Okay, I'm in the minority, so I'm writing the patch that way. WIP
 patch attached.

 In the patch, pg_basebackup background process and pg_receivexlog always
 return invalid location as flush one, and will never become sync standby even
 if their name is in synchronous_standby_names. The timing of their sending

 That doesn't match with the patch, afaics. The patch always sets the
 correct write location, which means it can become a remote_write
 synchronous standby, no? It will only send it back when timeout
 expires, but it will be sent back.

No. Though the correct write location is sent back, they don't become a
sync standby because the flush location is always invalid. While the
flush location is invalid, the master will never regard the remote
server as a sync one even if synchronous_commit is set to remote_write.
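
(Paraphrased sketch of the master-side selection in syncrep.c, not the
verbatim source: a walsender that has never reported a valid flush
position is simply skipped when choosing the sync standby.)

	if (XLogRecPtrIsInvalid(walsnd->flush))
		continue;		/* can never be the synchronous standby */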


 I wonder if that might actually be a more reasonable mode of operation
 in general:

 * always send back the write position, at the write interval
 * always send back the flush position, when we're flushing (meaning
 when we switch xlog)

 have an option that makes it possible to:
 * always send back the write position as soon as it changes (making
 for a reasonable remote_write sync standby)
 * actually flush the log after each write instead of end of file
 (making for a reasonable full sync standby)

 meaning you'd have something like pg_receivexlog --sync=write and
 pg_receivexlog --sync=flush controlling it instead.

Yeah, in this way, pg_receivexlog can become sync even if
synchronous_commit is on, which seems more useful. But
I'm thinking that the synchronous pg_receivexlog stuff should
be postponed to 9.3 because its patch seems to become too
big to apply at this beta stage. So, in 9.2, to fix the problem,
what about just applying the simple patch which prevents
pg_basebackup background process and pg_receivexlog from
becoming sync standby whatever synchronous_standby_names
and synchronous_commit are set to?

 And deal with the case where the user put "*" in synchronous_standby_names
 and accidentally got pg_receivexlog as the sync standby, by more clearly
 warning people not to use "*" for that parameter... Since it's simply
 dangerous :)

Yep.

 the reply depends on the standby_message_timeout specified in -s option. So
 the write position may lag behind the true position.

 pg_receivexlog accepts new option -S (better option character?). If this 
 option
 is specified, pg_receivexlog returns true flush position, and can become sync
 standby. It sends back the reply to the master each time the write position
 changes or the timeout passes. If synchronous_commit is set to remote_write,
 synchronous replication to pg_receivexlog would work well.

 Yeah, I hadn't considered the remote_write mode, but I guess that's
 why you have to track the current write position across loads, which
 first confused me.

The patch has to track the current write location to decide whether to send
back the reply to the master, IOW to know whether the write location
has changed, IOW to know whether we've already sent the reply about
the latest write location.
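
(A sketch of that logic with invented names, not the patch's actual
code:)

	/* Reply only when the write position advanced or the timeout hit. */
	if (currentWrite != lastReportedWrite || timeoutExpired)
	{
		sendFeedback(conn, currentWrite, flushPos);	/* invented helper */
		lastReportedWrite = currentWrite;
	}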

 Looking at some other usecases for this, I wonder if we should also
 force a status message whenever we switch xlog files, even if we
 aren't running in sync mode, even if the timeout hasn't expired. I
 think that would be a reasonable thing to do, since you often want to
 track things based on files.

You mean that the pg_receivexlog should send back the correct flush
location whenever it switches xlog files?

 The patch needs more documentation. But I think that it's worth reviewing the
 code in advance, so I attached the WIP patch. Comments? Objections?

 Looking at the code, what exactly prompts the changes to the backend
 side? That seems unrelated? Are we actually considering picking a
 standby with InvalidXlogRecPtr as a sync standby today?

 Isn't it enough to just send the proper write and flush locations from
 the frontend?

No, unless I'm missing something.

The problem that we should address first is that the master can pick up
the pg_basebackup background process and pg_receivexlog as a sync
standby even though they always return invalid write and flush positions.
Since they don't send any correct write and flush positions, if they are
accidentally regarded as 

Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Robert Haas
On Sat, Jun 30, 2012 at 5:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 Currently it's possible to cause transactions to fail with ALTER ENUM ADD
 AFTER/BEFORE:

 psql 1:

 CREATE TYPE enumcrash AS ENUM('a', 'b');
 CREATE FUNCTION randenum() RETURNS enumcrash LANGUAGE sql AS $$SELECT * FROM
 unnest(enum_range('a'::enumcrash)) ORDER BY random() LIMIT 1$$;
 CREATE TABLE enumcrash_table(id serial primary key, val enumcrash);
 CREATE INDEX enumcrash_table__val__id ON enumcrash_table (val, id);
 INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1, 1);
 INSERT 0 1

 psql 2:
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1,
 1);

 psql 1:
 ALTER TYPE enumcrash ADD VALUE 'a_1' AFTER 'a';
 INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1,
 1);

 psql 2:
 INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1,
 1);
 ERROR:  XX000: enum value 117745 not found in cache for enum enumcrash
 LOCATION:  compare_values_of_enum, typcache.c:957

 This is not surprising. psql 2's backend finds rows in the index with enum
 values that are not visible in its MVCC snapshot. That MVCC snapshot is needed
 because a_1 is an enum value with an odd-numbered OID, since it was inserted
 after the initial creation.

I think the problem is that load_enum_cache_data() uses
GetTransactionSnapshot() rather than GetLatestSnapshot().
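
(A sketch of the suggested one-line change in load_enum_cache_data(),
paraphrased; the surrounding arguments are from memory, not verbatim:)

-	enum_scan = systable_beginscan(enum_rel, InvalidOid, false,
-								   GetTransactionSnapshot(), 1, &skey);
+	enum_scan = systable_beginscan(enum_rel, InvalidOid, false,
+								   GetLatestSnapshot(), 1, &skey);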

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Robert Haas
On Sun, Jul 1, 2012 at 11:13 AM, Nils Goroll sl...@schokola.de wrote:
 as this patch was not targeted towards increasing tps, I am happy to hear
 that your benchmarks also suggest that performance is comparable.

 But my main question is: how about resource consumption? For the issue I am
 working on, my current working hypothesis is that spinning on locks saturates
 resources and brings down overall performance in a high-contention situation.

 Do you have any getrusage figures or anything equivalent?

Spinlock contentions cause tps to go down.  The fact that tps didn't
change much in this case suggests that either these workloads don't
generate enough spinlock contention to benefit from your patch, or
your patch doesn't meaningfully reduce it, or both.  We might need a
test case that is more spinlock-bound to observe an effect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Robert Haas
On Sun, Jul 1, 2012 at 11:18 AM, Nils Goroll sl...@schokola.de wrote:
 test runs on an IBM POWER7 system with 16 cores, 64 hardware threads.

 Could you add the CPU Type / clock speed please?

cpu : POWER7 (architected), altivec supported
clock   : 3550.00MHz
revision: 2.1 (pvr 003f 0201)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] new --maintenance-db options

2012-07-01 Thread Robert Haas
On Fri, Jun 29, 2012 at 3:32 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Jun 25, 2012 at 11:57:36AM -0400, Robert Haas wrote:
 In retrospect, it seems as though it might have been a good idea to
 make the postgres database read-only and undroppable, so that all
 client utilities could count on being able to connect to it and get a
 list of databases in the cluster without the need for all this
 complexity.  Or else having some other way for a client to
 authenticate and list out all the available databases.  In the absence
 of such a mechanism, I don't think we can turn around and say that not
 having a postgres database is an unsupported configuration, and
 therefore we need some way to cope with it when it happens.

 Well, we certainly don't allow 'template1' to be dropped:

 test=> DROP DATABASE template1;
 ERROR:  cannot drop a template database

 so you could make the argument that making 'postgres' undroppable seems
 reasonable.  I should point out that it was EnterpriseDB that complained
 about this related to their Advanced Server product, that doesn't have a
 'postgres' database, but an 'edb' one.  I said that was their problem,
 but when community users said they also dropped the 'postgres' database,
 it became a community problem too.

 Where are we going on this for PG 9.2?  9.3?  I hate to ship options in
 9.2 that will be gone in 9.3.

 FYI, we do allow the 'template1' database to be renamed:

 test=> ALTER DATABASE template1 RENAME TO template2;
 ALTER DATABASE

 Oops.  TODO?

Not only that, but you can change datistemplate and then drop it OR
rename it.  We don't have a rule that says you can't drop template1.
We have a rule that says you can't drop template databases.
template1 is merely the default template database, but the user can
create more, and they can get rid of, rename, or modify that one.  I
imagine that most people don't, but let's not make up an imaginary
rule that template1 always has to exist, because it doesn't.

Also, even if it does exist, it may have datallowconn = false (I think
I've actually seen this, on a system that also had no postgres
database), or pg_hba.conf may exclude connections to it, or it may be
screwed up in a hundred other ways (databases that can't be connected
to because the system catalogs are screwed up are not terribly rare).
So in my opinion, any code that relies on the existence of, ability to
connect to, or sane state of a database with any particular name is
plain broken, because somebody somewhere is going to have an
installation where it isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance

2012-07-01 Thread Magnus Hagander
On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 You agreed to add something like NOSYNC option into START_REPLICATION 
 command?

 I'm on the fence. I was hoping somebody else would chime in with an
 opinion as well.

 +1

 Nobody else with any opinion on this? :(

 I don't think we really need a NOSYNC flag at this point.  Just not
 setting the flush location in clients that make a point of flushing in
 a timely fashion seems fine.

 Okay, I'm in the minority, so I'm writing the patch that way. WIP
 patch attached.

 In the patch, pg_basebackup background process and pg_receivexlog always
 return invalid location as flush one, and will never become sync standby 
 even
 if their name is in synchronous_standby_names. The timing of their sending

 That doesn't match with the patch, afaics. The patch always sets the
 correct write location, which means it can become a remote_write
 synchronous standby, no? It will only send it back when timeout
 expires, but it will be sent back.

 No. Though the correct write location is sent back, they don't become a
 sync standby because the flush location is always invalid. While the
 flush location is invalid, the master will never regard the remote
 server as a sync one even if synchronous_commit is set to remote_write.

Oh. I wasn't aware of that part.


 I wonder if that might actually be a more reasonable mode of operation
 in general:

 * always send back the write position, at the write interval
 * always send back the flush position, when we're flushing (meaning
 when we switch xlog)

 have an option that makes it possible to:
 * always send back the write position as soon as it changes (making
 for a reasonable remote_write sync standby)
 * actually flush the log after each write instead of end of file
 (making for a reasonable full sync standby)

 meaning you'd have something like pg_receivexlog --sync=write and
 pg_receivexlog --sync=flush controlling it instead.

 Yeah, in this way, pg_receivexlog can become sync even if
 synchronous_commit is on, which seems more useful. But
 I'm thinking that the synchronous pg_receivexlog stuff should
 be postponed to 9.3 because its patch seems to become too
 big to apply at this beta stage. So, in 9.2, to fix the problem,
 what about just applying the simple patch which prevents
 pg_basebackup background process and pg_receivexlog from
 becoming sync standby whatever synchronous_standby_names
 and synchronous_commit are set to?

Agreed.

With the addition that we should set the write location, because
that's very useful and per what you said above should be perfectly
safe.


 And deal with the case where the user put "*" in synchronous_standby_names
 and accidentally got pg_receivexlog as the sync standby, by more clearly
 warning people not to use "*" for that parameter... Since it's simply
 dangerous :)

 Yep.

What would be a good wording? Something along the lines of "Using the *
entry is not recommended since it can lead to unexpected results when
new standbys are added", or something like that?


 the reply depends on the standby_message_timeout specified in -s option. So
 the write position may lag behind the true position.

 pg_receivexlog accepts new option -S (better option character?). If this 
 option
 is specified, pg_receivexlog returns true flush position, and can become 
 sync
 standby. It sends back the reply to the master each time the write position
 changes or the timeout passes. If synchronous_commit is set to remote_write,
 synchronous replication to pg_receivexlog would work well.

 Yeah, I hadn't considered the remote_write mode, but I guess that's
 why you have to track the current write position across loads, which
 first confused me.

 The patch has to track the current write location to decide whether to send
 back the reply to the master, IOW to know whether the write location
 has changed, IOW to know whether we've already sent the reply about
 the latest write location.

Yeah, makes perfect sense.


 Looking at some other usecases for this, I wonder if we should also
 force a status message whenever we switch xlog files, even if we
 aren't running in sync mode, even if the timeout hasn't expired. I
 think that would be a reasonable thing to do, since you often want to
 track things based on files.

 You mean that the pg_receivexlog should send back the correct flush
 location whenever it switches xlog files?

No, I mean just send back a status message. Meaning that without
specifiying the sync modes per above, it would send back the *write*
location. This would be useful for tracking xlog filenames between
master and pg_receivexlog, without extra delay.

 The patch needs more documentation. But I think 

Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Nils Goroll
Hi Robert,

 Spinlock contentions cause tps to go down.  The fact that tps didn't
 change much in this case suggests that either these workloads don't
 generate enough spinlock contention to benefit from your patch, or
 your patch doesn't meaningfully reduce it, or both.  We might need a
 test case that is more spinlock-bound to observe an effect.

Agree. My understanding is that

- for no contention, acquiring a futex should almost be as fast as acquiring a
  spinlock, so we should observe

  - comparable tps
  - comparable resource consumption

  I believe this is what your test has shown for the low concurrency tests.


- for light contention, spinning will be faster than syscalling, so
  we should observe with the patch

  - slightly worse tps
  - more syscalls, otherwise comparable resource consumption

  I believe your test supports the first point for high concurrency tests.


- for high contention, spinning should be
  - unfair (because the time to acquire a lock is not deterministic -
    individual threads could starve)
  - much less efficient

  and we should see with the patch

  - slightly better tps if the system is not saturated because
    the next process to acquire a contended futex gets scheduled immediately,
rather than when a process returns from sleeping

- much better tps if the system is saturated / oversubscribed due to
  increased scheduling latency for spinning processes

  - significantly lower resource consumption
- so we should have much more headroom before running into saturation
  as described above


So would it be possible for you to record resource consumption and rerun the 
test?

Thank you, Nils



Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Hans-Jürgen Schönig
On Jul 1, 2012, at 5:44 PM, Magnus Hagander wrote:

 On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,
 
 attached is a patch that does $SUBJECT.
 
 It's a usability enhancement, to take a backup, write
 a minimalistic recovery.conf and start the streaming
 standby in one go.
 
 I like the writing of recovery.conf. In fact, I had it in my code at
 one very early point and took it out in order to get a clean patch
 ready :)
 
 But I think that part is lacking in functionality: AFAICT it's
 hardcoded to only handle host, port, user and password. What about
 other connection parameters, likely passed to pg_basebackup through
 the environment in that case? Isn't that quite likely to break the
 server later?
 


one option would be to check the environment and take the variables if
needed. however, i am not sure if this is a good idea either - just think
of PGPASSWORD or so. do we really want to take it and write it to a file
straight away? i guess there are arguments for both ideas.

still, i guess your argument is a reasonable one.



 Maybe the proper way around that is to provide the ability for
 pg_basebackup to take a full connection string, just like we allow
 psql to do?
 


this would make things redundant. i am quite sure some users might not get the 
distinction straight away.


 
 
 I'm not sure we should go the way of providing the "start slave"
 functionality, given that how you want to start the slave differs so
 much between platforms. The most glaring example is that on Windows
 you really need to *start the service* rather than use pg_ctl. Sure,
 you can document your way around that, but I'm not sure the
 functionality added is really worth it. What about all the other
 potential connection parameters.


regards,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Support for array_remove and array_replace functions

2012-07-01 Thread Alex Hunsaker
On Sat, Jun 30, 2012 at 3:28 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:

 On 30/06/2012 04:16, Alex Hunsaker wrote:
 
  Hi, I've been reviewing this patch.
 
  Good documentation, and regression tests. The code looked fine but I
  didn't care for the code duplication between array_replace and
  array_remove so I merged those into a helper function,
  array_replace_internal(). Thoughts?

 It looks reasonable.

 There was a typo in array_replace which was caught by regression tests.
 I've fixed the typo and changed a comment in array_replace_internal.

 Patch v3 attached.

Looks good to me, marked ready for committer.



Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jun 30, 2012 at 5:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 This is not surprising. psql 2's backend finds rows in the index with enum
 values that are not visible in its mvcc snapshot.

 I think the problem is that load_enum_cache_data() uses
 GetTransactionSnapshot() rather than GetLatestSnapshot().

That would only make the race condition window smaller (ie, hard
to reproduce manually like this, but not gone).

regards, tom lane



Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think the problem is that load_enum_cache_data() uses
 GetTransactionSnapshot() rather than GetLatestSnapshot().

 That would only make the race condition window smaller (ie, hard
 to reproduce manually like this, but not gone).

No, wait, we made ALTER TYPE ADD VALUE PreventTransactionChain so that
uncommitted enum OIDs could never get into tables or indexes.  So I
think you're right, forcing a new snapshot to be used would fix this.

However, I'm a bit worried by the if (!FirstSnapshotSet) restriction
in GetLatestSnapshot.  Are we sure that enum comparisons could never
happen without a snapshot already being set?  What's the point of
throwing an error there anyway, as opposed to letting it redirect to
GetTransactionSnapshot?
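
(For context, the function in question is roughly this, paraphrased
from snapmgr.c and not guaranteed verbatim:)

	Snapshot
	GetLatestSnapshot(void)
	{
		/* Should not be first call in transaction */
		if (!FirstSnapshotSet)
			elog(ERROR, "no snapshot has been set");

		SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);

		return SecondarySnapshot;
	}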

regards, tom lane



Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Jeff Janes
On Fri, Jun 29, 2012 at 10:07 AM, Nils Goroll sl...@schokola.de wrote:

 On 06/28/12 05:21 PM, Jeff Janes wrote:

 It looks like the hacked code is slower than the original.  That
 doesn't seem so good to me.  Am I misreading this?

 No, you are right - in a way. This is not about maximizing tps, this is about
 maximizing efficiency under load situations

But why wouldn't this maximized efficiency present itself as increased TPS?


 Also, 20 transactions per connection is not enough of a run to make
 any evaluation on.

 As you can see I've repeated the tests 10 times. I've tested slight variations
 as mentioned above, so I was looking for quick results with acceptable 
 variation.

Testing it 10 times doesn't necessarily improve things.  You have ten
times as many transactions, but you also have 10 times as much
start-up and tear-down events polluting the numbers.  (Unless this
start up and tear down are exactly the things you are trying to
measure).  And once you change your benchmark so that it is testing
what you want to be testing, it will probably get even worse.  You
should use at least -T30, rather than -t20.

Anyway, your current benchmark speed of around 600 TPS over such short
time periods suggests you are limited by fsyncs.  It is going to
be pretty hard to get a spinlock bottleneck in simple queries like
pgbench does as long as that is the case.  You could turn fsync off,
or just change your benchmark to a read-only one like -S, or better
the -P option I've been trying to get into pgbench.

Does your production server have fast fsyncs (BBU) while your test
server does not?



 Regarding the actual production issue, I did not manage to synthetically 
 provoke
 the saturation we are seeing in production using pgbench - I could not even 
 get
 anywhere near the production load.

 What metrics/tools are you using to compare the two loads?

 We've got cpu + load avg statistics for the old+new machine and compared 
 values
 before/after the migration. The user load presumably is comparable and the 
 main
 metric is users complaining vs. users happy.

The users probably don't care about the load average.  Presumably they
are unhappy because of lowered throughput (TPS) or higher peak latency
(-l switch in pgbench).  So I think the only use of load average is to
verify that your benchmark is nothing like your production workload.
(But it doesn't go the other way around, just because the load
averages are similar doesn't mean the actual workloads are.)

 I wish we had a synthetic benchmark close to the actual load, and I hope that
 one of the insights from this will be that the customer should have one.

If they could simulate a workload close to what they actually do, that
would be great.  But surely just with fairly simple pgbench
configuration you can get much closer to it than what you are
currently.

 What is the production load like?

 Here's an anonymized excerpt from a pgFouine analysis of 137 seconds worth of
 query logs at average production user load.

 Type    Count   Percentage
 SELECT  80,217  27.1
 INSERT   6,248   2.1
 UPDATE  37,159  12.6
 DELETE   4,579   1.5

Without knowing how complicated the joins involved in the various
statements are, I don't think I can get much info out of this.  but
I'm not familiar with pgFouine, maybe there is another way to
summarize its output that is more informative.


 Queries that took up the most time (N)


 Rank    Total duration  Times executed  Av. duration s  Query
 1       3m39s           83,667           0.00           COMMIT;

So fsync's probably are not totally free on production, but I still
think they must be much cheaper than on your test box.

 2       54.4s                2          27.18           SELECT ...

That is interesting.  Maybe those two queries are hammering everything
else to death.

 3       41.1s              281           0.15           UPDATE ...
 4       25.4s           18,960           0.00           UPDATE ...
 5       21.9s   ...

 the 9th rank is already below 10 seconds Total duration

But how does the 9th rank through the final rank, cumulatively, stack up?

In other words, how many query-seconds worth of time transpired during
the 137 wall seconds?  That would give an estimate of how many
simultaneously active connections the production server has.

 Each transaction has to update one of ten pgbench_branch rows, so you
 can't have more than ten transactions productively active at any given
 time, even though you have 768 connections.  So you need to jack up
 the pgbench scale, or switch to using -N mode.

 Sorry for having omitted that detail. I had initialized pgbench with -i -s 100

Are you sure?  In an earlier email you reported the entire output of
pgbench, and it said it was using 10.  Maybe you've changed it since
then...


Cheers,

Jeff



Re: [HACKERS] pgbench--new transaction type

2012-07-01 Thread Jeff Janes
On Wed, Jun 20, 2012 at 12:32 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 01.06.2012 03:02, Jeff Janes wrote:

 I've attached a new patch which addresses several of your concerns,
 and adds the documentation.  The description is much longer than the
 descriptions of other nearby options, which mostly just give a simple
 statement of what they do rather than a description of why that is
 useful.  I don't know if that means I'm starting a good trend, or a
 bad one, or I'm just putting the exposition in the wrong place.

 In addition to showing the benefits of coding things on the server
 side when that is applicable, it also allows hackers to stress parts
 of the server code that are not easy to stress otherwise.


 As you mentioned in your original email over a year ago, most of this could
 be done as a custom script. It's nice to have another workload to test, but
 then again, there's an infinite number of workloads that might be
 interesting.

I would argue that my specific proposed transaction does occupy a
somewhat privileged place, as it uses the default table structure, and
it fits into a natural progression along the default, -N, -S continuum.
In each step one (or more) source of bottleneck is removed, to allow
different ones to bubble up to the top to be inspected and addressed.

I've written dozens of custom benchmark scripts, and so far this is
the only one I've thought was generally useful enough to think that it
belongs inside the core.

Do I need to make a better argument for why this particular
transaction is as generally useful as I think it is; or is the
objection to the entire idea that any new transactions should be added
at all?


 You can achieve the same with this custom script:

 \set loops 512

 do $$ DECLARE  sum integer default 0; amount integer; account_id integer;
 BEGIN FOR i IN 1..:loops LOOP   account_id=1 + floor(random() * :scale);
 SELECT abalance into strict amount FROM pgbench_accounts WHERE aid =
 account_id;   sum := sum + amount; END LOOP; END; $$;

 It's a bit awkward because it has to be all on one line, and you don't get
 the auto-detection of scale. But those are really the only differences
 AFAICS.


True.  And we could fix the one-line thing by allowing lines ending in
"\" to be continued.  And this might be mostly backwards compatible,
as I don't think there is much of a reason to end a statement with a
literal "\".

We could make the scale be auto-detected, but that would be more
dangerous as a backward compatibility thing, as people with existing
custom scripts might rely on :scale being set to 1.

But it would defeat my primary purpose of making it easy for us to ask
someone else posting on this list or on -performance to run this.
"Copy this file someplace, then make sure you use the correct scale,
and remember where you put it", etc. is a much higher overhead.

Also, my patch changes the output formatting in a way that couldn't be
done with a custom script, and might be very confusing if it were not
changed.  I don't know how good an argument this is.  "Remember to
multiply the reported TPS by 512" is another barrier to easy use.


 I think we would be better off improving pgbench to support those things in
 custom scripts. It would be nice to be able to write initialization steps
 that only run once in the beginning of the test.

I can think of 3 possibly desirable behaviors: "run once at -i time",
"run once per pgbench run", and "run once per connection".  Should all
of those be implemented?

 You could then put the
 SELECT COUNT(*) FROM pgbench_branches there, to do the scale
 auto-detection.

So, we should add a way to do \set variable_name SQL QUERY?

Would we use a command other than \set to do that, or look at the
thing after the variable to decide if it is a query rather than a
simple expression?
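
For illustration, a hypothetical sketch of what such a script could look
like -- not implemented; the \set ... SELECT syntax is made up:

\set scale SELECT count(*) FROM pgbench_branches
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;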

Cheers,

Jeff



Re: [HACKERS] Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Nils Goroll
Hi Jeff,

 It looks like the hacked code is slower than the original.  That
 doesn't seem so good to me.  Am I misreading this?

 No, you are right - in a way. This is not about maximizing tps, this is about
 maximizing efficiency under load situations
 
 But why wouldn't this maximized efficiency present itself as increased TPS?

Because the latency of lock acquisition influences TPS, but this is only marginally
related to the cost in terms of CPU cycles to acquire the locks.

See my posting as of Sun, 01 Jul 2012 21:02:05 +0200 for an overview of my
understanding.

 Also, 20 transactions per connection is not enough of a run to make
 any evaluation on.

 As you can see I've repeated the tests 10 times. I've tested slight 
 variations
 as mentioned above, so I was looking for quick results with acceptable 
 variation.
 
 Testing it 10 times doesn't necessarily improve things.

My intention was to average over the imperfections of rusage accounting because
I was mainly interested in lowering rusage, not maximizing tps.

Yes, in order to get reliable results, I'd have to run longer tests, but
interestingly the results from my quick tests already approximated those from
the huge tests Robert has run with respect to the differences between unpatched
and patched.

 You should use at least -T30, rather than -t20.

Thanks for the advice - it is really appreciated and I will take it when I run
more tests.

But I don't understand yet how to best provoke high spinlock concurrency with
pgbench. Or are there any other test tools out there for this case?

 Anyway, your current benchmark speed of around 600 TPS over such a
 short time periods suggests you are limited by fsyncs.

Definitely. I described the setup in my initial posting (why roll-your-own
s_lock? / improving scalability - Tue, 26 Jun 2012 19:02:31 +0200)

 pgbench does as long as that is the case.  You could turn --fsync=off,
 or just change your benchmark to a read-only one like -S, or better
 the -P option I've been trying to get into pgbench.

I don't like to make assumptions which I haven't validated. The system showing
the behavior is designed to write to persistent SSD storage in order to reduce
the risk of data loss by a (BBU) cache failure. Running a test with fsync=off
would divert even further from reality.

 Does your production server have fast fsyncs (BBU) while your test
 server does not?

No, we're writing directly to SSDs (ref: initial posting).

 The users probably don't care about the load average.  Presumably they
 are unhappy because of lowered throughput (TPS) or higher peak latency
 (-l switch in pgbench).  So I think the only use of load average is to
 verify that your benchmark is nothing like your production workload.
 (But it doesn't go the other way around, just because the load
 averages are similar doesn't mean the actual workloads are.)

Fully agree.


 Rank    Total duration  Times executed  Av. duration s  Query
 1       3m39s           83,667           0.00           COMMIT;
 
 So fsync's probably are not totally free on production, but I still
 think they must be much cheaper than on your test box.

Oh, the two are the same. I ran the tests on the prod machine during quiet 
periods.

 2   54.4s2  27.18   SELECT ...
 
 That is interesting.  Maybe those two queries are hammering everything
 else to death.

With 64 cores?

I should have mentioned that these were simply the result of a missing index
when the data was collected.

 But how does the 9th rank through the final rank, cumulatively, stack up?
 
 In other words, how many query-seconds worth of time transpired during
 the 137 wall seconds?  That would give an estimate of how many
 simultaneously active connections the production server has.

Sorry, I should have given you the stats from pgFouine:

Number of unique normalized queries: 507
Number of queries: 295,949
Total query duration: 8m38s
First query: 2012-06-23 14:51:01
Last query: 2012-06-23 14:53:17
Query peak: 6,532 queries/s at 2012-06-23 14:51:33
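
A back-of-envelope answer to the question above from those numbers,
assuming the query durations all fall inside the ~137 s window:

-- 8m38s = 518 query-seconds over ~137 wall-seconds:
SELECT round((8 * 60 + 38) / 137.0, 1) AS avg_active_queries;  -- ~3.8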

 Sorry for having omitted that detail. I had initialized pgbench with -i -s 
 100
 
 Are you sure?  In an earlier email you reported the entire output of
 pgbench, and is said it was using 10.  Maybe you've changed it since
 then...

good catch, I was wrong in the email you quoted. Sorry.

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ /tmp/data/
...
-bash-4.1$ ./postgres -D /tmp/data -p 55502 &
[1] 38303
-bash-4.1$ LOG:  database system was shut down at 2012-06-26 23:18:42 CEST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
-bash-4.1$ ./psql -p 55502
psql (9.1.3)
Type "help" for help.
postgres=# select count(*) from pgbench_branches;
 count
-------
    10
(1 row)


Thank you very much, Jeff! The one question remains: Do we really have all we
need to provoke very high lock contention?

Nils


Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-07-01 Thread Jeff Janes
On Thu, Jun 28, 2012 at 6:57 PM, Josh Berkus j...@agliodbs.com wrote:

 A second obstacle to opportunistic wraparound vacuum is that
 wraparound vacuum is not interruptable. If you have to kill it off and
 do something else for a couple hours, it can't pick up where it left
 off; it needs to scan the whole table from the beginning again.

Would recording a different relfrozenxid for each 1GB chunk of the
relation solve that?


 Since your users weren't complaining about performance with one or two
 autovac workers running (were they?),

 No, it's when we hit 3 that it fell over.  Thresholds vary with memory
 and table size, of course.

Does that mean it worked fine with 2 workers simultaneously in large
tables, or did that situation not occur and so it is not known whether
it would have worked fine or not?


 BTW, the primary reason I think (based on a glance at system stats) this
 drove the system to its knees was that the simultaneous wraparound
 vacuum of 3 old-cold tables evicted all of the current data out of the
 FS cache, forcing user queries which would normally hit the FS cache
 onto disk.  I/O throughput was NOT at 100% capacity.

Do you know if it was the input or the output that caused that to
happen?  I would think the kernel has logic similar to BAS to prevent
reading a huge amount of data sequentially from evicting all the other
data.  But that logic might be defeated if all that data is dirtied
right after being read.

If the partitions had not been touched since the last freeze, then it
should generate no dirty blocks (right?), but if they were touched
since then you could basically be writing out the entire table.
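
Incidentally, a standard way to see which tables are closest to a forced
anti-wraparound vacuum is to compare relfrozenxid ages against
autovacuum_freeze_max_age:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;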

Cheers,

Jeff



[HACKERS] Proof of concept: auto updatable views

2012-07-01 Thread Dean Rasheed
Hi,

I've been playing around with the idea of supporting automatically
updatable views, and I have a working proof of concept. I've taken a
different approach than the previous attempts to implement this
feature (e.g., 
http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php),
instead doing all the work in the rewriter, substituting the view for
its base relation rather than attempting to auto-generate any rules or
triggers.

Basically what it does is this: in the first stage of query rewriting,
just after any non-SELECT rules are applied, the new code kicks in -
if the target relation is a view, and there were no unqualified
INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the
view is simply updatable. If so, the target view is replaced by its
base relation and columns are re-mapped. Then the remainder of the
rewriting process continues, recursively handling any further
non-SELECT rules or additional simply updatable views. This handles
the case of views on top of views, with or without rules and/or
triggers.

Here's a simple example:

CREATE TABLE my_table(id int primary key, val text);
CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 0;

then any modifications to the view get redirected to underlying table:

EXPLAIN ANALYSE INSERT INTO my_view VALUES(1, 'Test row');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Insert on my_table  (cost=0.00..0.01 rows=1 width=0) (actual time=0.208..0.208 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)
 Total runtime: 0.327 ms
(3 rows)


EXPLAIN ANALYSE UPDATE my_view SET val='Updated' WHERE id=1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Update on my_table  (cost=0.00..8.27 rows=1 width=10) (actual time=0.039..0.039 rows=0 loops=1)
   ->  Index Scan using my_table_pkey on my_table  (cost=0.00..8.27 rows=1 width=10) (actual time=0.014..0.015 rows=1 loops=1)
         Index Cond: ((id > 0) AND (id = 1))
 Total runtime: 0.090 ms
(4 rows)


EXPLAIN ANALYSE DELETE FROM my_view;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..1.01 rows=1 width=6) (actual time=0.030..0.030 rows=0 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..1.01 rows=1 width=6) (actual time=0.015..0.016 rows=1 loops=1)
         Filter: (id > 0)
 Total runtime: 0.063 ms
(4 rows)


The patch is currently very strict about what kinds of views can be
updated (based on SQL-92), and there is no support for WITH CHECK
OPTION, because I wanted to keep this patch as simple as possible.
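
For illustration, views along these lines would presumably be rejected as
not simply updatable under those rules (a sketch reusing the table above;
the view names are made up):

CREATE VIEW my_agg_view AS
    SELECT val, count(*) AS n FROM my_table GROUP BY val;   -- aggregation
CREATE VIEW my_distinct_view AS
    SELECT DISTINCT val FROM my_table;                      -- DISTINCT
CREATE VIEW my_join_view AS
    SELECT a.id, b.val
    FROM my_table a JOIN my_table b ON b.id = a.id;         -- more than one relation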

The consensus last time seemed to be that backwards compatibility
should be offered through a new GUC variable to allow this feature to
be disabled globally, which I've not implemented yet.

I'm also aware that my new function ChangeVarAttnos() is almost
identical to the function map_variable_attnos() that Tom recently
added, but I couldn't immediately see a neat way to merge the two. My
function handles whole-row references to the view by mapping them to a
generic RowExpr based on the view definition. I don't think a
ConvertRowtypeExpr can be used in this case, because the column names
don't necessarily match.

Obviously there's still more work to do but the early signs seem to be
encouraging.

Thoughts?


Regards,
Dean


auto-update-views.patch
Description: Binary data



Re: [HACKERS] Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-07-01 Thread Jeff Janes
On Sun, Jul 1, 2012 at 2:28 PM, Nils Goroll sl...@schokola.de wrote:
 Hi Jeff,

 It looks like the hacked code is slower than the original.  That
 doesn't seem so good to me.  Am I misreading this?

 No, you are right - in a way. This is not about maximizing tps, this is 
 about
 maximizing efficiency under load situations

 But why wouldn't this maximized efficiency present itself as increased TPS?

 Because the latency of lock acquisition influences TPS, but this is only
 marginally related to the cost in terms of CPU cycles to acquire the locks.

 See my posting as of Sun, 01 Jul 2012 21:02:05 +0200 for an overview of my
 understanding.

I still don't see how improving that could not improve TPS.  But let's
focus on reproducing the problem first, otherwise it is all just
talking in the dark.

 But I don't understand yet how to best provoke high spinlock concurrency with
 pgbench. Or are there any other test tools out there for this case?

Use pgbench -S, or apply my patch from "pgbench--new transaction type"
and then run pgbench -P.

Make sure that the scale is such that all of your data fits in
shared_buffers (I find on 64 bit that pgbench takes about 15MB *
scale).
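
A quick sanity check of that, assuming the ~15MB-per-scale-unit estimate
holds:

SHOW shared_buffers;
SELECT pg_size_pretty(pg_database_size(current_database()));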

 Anyway, your current benchmark speed of around 600 TPS over such a
 short time periods suggests you are limited by fsyncs.

 Definitely. I described the setup in my initial posting (why roll-your-own
 s_lock? / improving scalability - Tue, 26 Jun 2012 19:02:31 +0200)

OK.  It looks like several things changed simultaneously.  How likely
do you think it is that the turning off of the write cache caused the
problem?


 pgbench does as long as that is the case.  You could turn --fsync=off,
 or just change your benchmark to a read-only one like -S, or better
 the -P option I've been trying to get into pgbench.

 I don't like to make assumptions which I haven't validated. The system showing
 the behavior is designed to write to persistent SSD storage in order to reduce
 the risk of data loss by a (BBU) cache failure. Running a test with fsync=off
 would divert even further from reality.

I think that you can't get much farther from reality than your current
benchmarks are, I'm afraid.

If your goal is the get pgbench closer to being limited by spinlock
contention, then fsync=off, or using -S or -P, will certainly do that.

So if you have high confidence that spinlock contention is really the
problem, fsync=off will get you closer to the thing you want to focus
on, even if it takes you further away from the holistic big-picture
production environment.   And since you went to the trouble of making
patches for spinlocks, I assume you are fairly confident that that is
the problem.

If you are not confident that spinlocks are really the problem, then I
agree it would be a mistake to try to craft a simple pgbench run which
focuses in on one tiny area which might not actually be the correct
area.  In that case, you would instead want to either create a very
complicated workload that closely simulates your production load (a
huge undertaking) or find a way to capture an oprofile of the
production server while it is actually in distress.  Also, it would
help if you could get oprofile to do a call graph so you can see which
call sites the contended spin locks are coming from (sorry, I don't
know how to do this successfully with oprofile)




 Does your production server have fast fsyncs (BBU) while your test
 server does not?

 No, we're writing directly to SSDs (ref: initial posting).

OK.  So it seems like the pgbench workload you are doing is limited
by fsyncs, and the CPU is basically idle because of that limit.  While
your real work load needs a much larger amount of processing power per
fsync, so it is closer to both limits at the same time.  But, since
the stats you posted were for the normal rather than the distressed
state, maybe I'm way off here.

Anyway, the easiest way to increase the pgbench "CPU per fsync" need
is to turn off fsync or synchronous_commit, or to switch to read-only
queries.
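
Of those, synchronous_commit is the gentlest knob: it can be set per
session, and unlike fsync=off it cannot corrupt the cluster on a crash
(only the last few commits can be lost):

SET synchronous_commit = off;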


 2       54.4s                2          27.18           SELECT ...

 That is interesting.  Maybe those two queries are hammering everything
 else to death.

 With 64 cores?

Maybe.  That is the nature of spin-locks.  The more cores you have,
the more other things each one interferes with.  Except that the
duration is not long enough to cover the entire run period.  But then
again, maybe in the distressed state those same queries did cover the
entire duration.  But yeah, now that I think about it this would not
be my top hypothesis.



 In other words, how many query-seconds worth of time transpired during
 the 137 wall seconds?  That would give an estimate of how many
 simultaneously active connections the production server has.

 Sorry, I should have given you the stats from pgFouine:

     Number of unique normalized queries: 507
     Number of queries: 295,949
     Total query duration: 8m38s
     First query: 2012-06-23 14:51:01
     Last query: 

Re: [HACKERS] Proof of concept: auto updatable views

2012-07-01 Thread Darren Duncan
My thoughts on this are that it would be a very valuable feature to have, and 
would make Postgres views behave more like they were always intended to behave: 
indistinguishable in behavior from tables to users, wherever possible, with the 
reverse mapping derived automatically by the DBMS from the view-defining SELECT, 
where possible. -- Darren Duncan


Dean Rasheed wrote:

I've been playing around with the idea of supporting automatically
updatable views, and I have a working proof of concept. I've taken a
different approach than the previous attempts to implement this
feature (e.g., 
http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php),
instead doing all the work in the rewriter, substituting the view for
its base relation rather than attempting to auto-generate any rules or
triggers.

Basically what it does is this: in the first stage of query rewriting,
just after any non-SELECT rules are applied, the new code kicks in -
if the target relation is a view, and there were no unqualified
INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the
view is simply updatable. If so, the target view is replaced by its
base relation and columns are re-mapped. Then the remainder of the
rewriting process continues, recursively handling any further
non-SELECT rules or additional simply updatable views. This handles
the case of views on top of views, with or without rules and/or
triggers.


snip


Obviously there's still more work to do but the early signs seem to be
encouraging.

Thoughts?





Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Robert Haas
On Jul 1, 2012, at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I'm a bit worried by the if (!FirstSnapshotSet) restriction
 in GetLatestSnapshot.  Are we sure that enum comparisons could never
 happen without a snapshot already being set?  What's the point of
 throwing an error there anyway, as opposed to letting it redirect to
 GetTransactionSnapshot?

I don't know whether it should set the transaction snapshot or just r



Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Robert Haas
On Jul 2, 2012, at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Jul 1, 2012, at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I'm a bit worried by the if (!FirstSnapshotSet) restriction
 in GetLatestSnapshot.  Are we sure that enum comparisons could never
 happen without a snapshot already being set?  What's the point of
 throwing an error there anyway, as opposed to letting it redirect to
 GetTransactionSnapshot?
 
 I don't know whether it should set the transaction snapshot or just r

Argh, sorry.

...or just return a current snapshot, and I also don't know whether it needs to 
be changed because of this; but I agree with changing it. Erroring out always 
seemed kind of pointless to me...

...Robert


Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Jul 2, 2012, at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Jul 1, 2012, at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I'm a bit worried by the if (!FirstSnapshotSet) restriction
 in GetLatestSnapshot.

 I don't know whether it should set the transaction snapshot or just r
 Argh, sorry.
 ...or just return a current snapshot, and I also don't know whether it needs 
 to be changed because of this; but I agree with changing it. Erroring out 
 always seemed kind of pointless to me...

I think it was coded like that because the sole original use was in
ri_triggers.c, in which it would have been a red flag if no transaction
snapshot already existed.  However, the restriction clearly doesn't fit
with GetLatestSnapshot's API spec, so it seems to me to be sensible
to change it (as opposed to, say, inventing a new snapshot function
with a subtly different API spec).

As for creating an MVCC snapshot without causing a transaction snapshot
to be established, no thanks --- that would create a path of control
that exists nowhere today and has gotten no testing at all.  I suspect
that it might actively break some assumptions in snapshot management.

regards, tom lane



Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash

2012-07-01 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Mon Jul 02 00:24:06 -0400 2012:
 Robert Haas robertmh...@gmail.com writes:
  On Jul 2, 2012, at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
  On Jul 1, 2012, at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  However, I'm a bit worried by the if (!FirstSnapshotSet) restriction
  in GetLatestSnapshot.
 
  I don't know whether it should set the transaction snapshot or just r
  Argh, sorry.
  ...or just return a current snapshot, and I also don't know whether it 
  needs to be changed because of this; but I agree with changing it. Erroring 
  out always seemed kind of pointless to me...
 
 I think it was coded like that because the sole original use was in
 ri_triggers.c, in which it would have been a red flag if no transaction
 snapshot already existed.

Yeah, that's correct.  I guess I could have made the check an Assert()
instead of elog().

 However, the restriction clearly doesn't fit
 with GetLatestSnapshot's API spec, so it seems to me to be sensible
 to change it (as opposed to, say, inventing a new snapshot function
 with a subtly different API spec).

Sounds sensible.

 As for creating an MVCC snapshot without causing a transaction snapshot
 to be established, no thanks --- that would create a path of control
 that exists nowhere today and has gotten no testing at all.  I suspect
 that it might actively break some assumptions in snapshot management.

I think it should work fine as long as the snapshot is registered, but
as you say it is untested.  Maybe it'd have interactions with the way
snapshots connect with resource owners.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
