date:20150703

On Thu, Jul 2, 2015 at 8:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Hi,

 While I'm implementing the patch around pg_archivecleanup, I found
 the following warning about pg_archivecleanup in the document.

 Note that the .backup file name passed to the program should not
 include the extension.

 ISTM that pg_archivecleanup works as expected even if the .backup file
 with the extension is specified as follows. So, isn't this warning already
 obsolete? Or I'm missing something?

 $ pg_archivecleanup -d -x .zip myarchive
 00010009.0028.backup.zip
 pg_archivecleanup: keep WAL file myarchive/00010009 and 
 later
 pg_archivecleanup: removing file myarchive/00010005.zip
 pg_archivecleanup: removing file myarchive/00010003.zip
 pg_archivecleanup: removing file myarchive/00010001.zip
 pg_archivecleanup: removing file myarchive/00010007.zip
 pg_archivecleanup: removing file myarchive/00010006.zip
 pg_archivecleanup: removing file myarchive/00010004.zip
 pg_archivecleanup: removing file myarchive/00010002.zip
 pg_archivecleanup: removing file myarchive/00010008.zip

Even in 9.2 where -x option was added firstly, I confirmed that
I could pass the .backup file name including the extension to
pg_archivecleanup program. So I'm wondering if the warning in
question was incorrect from the beginning...
Or am I missing something?

In the past thread of -x option patch, I found the following
Robert comment. This makes me think that we unfortunately failed to
notice his comment and finally added the unnecessary warning into
the document.

-
http://www.postgresql.org/message-id/CA+TgmoZDYD_W7K_S1ZuEnqVNOaRWYCX=eetx+r27vb7akrr...@mail.gmail.com
Also, I'm wondering about this warning in the documentation:

+extension added by the compression program.  Note that the
+filename.backup/ file name passed to the program should not
+include the extension.

IIUC, the latest change you made makes that warning obsolete, no?

[rhaas pgsql]$ contrib/pg_archivecleanup/pg_archivecleanup -d -x .gz .
00010010.0020.backup.gz
pg_archivecleanup: keep WAL file ./00010010 and later
-

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 I was going through the code and have few comments:
 - Why do you change the return value of TryReuseIndex? Can't we use reuse
 the same OidIsValid(stmt-oldNode) check that ATExecAddIndex is doing
 instead?

As pointed out by Heikki previously, that is actually unnecessary,
comments are still lost even if the index is reused for constraints.
So perhaps for simplicity we could just unconditionally recreate the
comments all the time if they are available.

 - I think the changes to ATPostAlterTypeParse should follow more closely the
 coding of transformTableLikeClause - namely use the idxcomment

I am not sure I follow here. Could you elaborate?

 - Also the changes to ATPostAlterTypeParse move the code somewhat
 uncomfortably over the 80 char width, I don't really see a way to fix that
 except for moving some of the nesting out to another function.

Yes, I did some refactoring here by adding a set of new routines
dedicated to attach generated commands to the correct queue. This way
the code is empty of large switch/case blocks.

Update patch is attached, with the issue reported by Heikki upthread
fixed as well.
Regards,
-- 
Michael
From 2244608d3d06b89202b506f31b00a7d58a57f9c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Fri, 3 Jul 2015 22:47:22 +0900
Subject: [PATCH] Ensure COMMENT persistency of indexes and constraint with
 ALTER TABLE

When rewriting a table, in some cases indexes and constraints present
on it need to be recreated from scratch, making any existing comment
entry, as known as a description in pg_description, disappear after
ALTER TABLE.

This commit fixes this issue by tracking the existing constraint,
indexes, and combinations of both when running ALTER TABLE and recreate
COMMENT entries when appropriate. A set of regression tests is added
to test all the new code paths added.
---
 src/backend/commands/tablecmds.c  | 287 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/alter_table.out |  95 ++
 src/test/regress/sql/alter_table.sql  |  37 
 4 files changed, 346 insertions(+), 74 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..78e6b5c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -316,6 +316,13 @@ static void ATRewriteTables(AlterTableStmt *parsetree,
 List **wqueue, LOCKMODE lockmode);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
+static void ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+Node *stm, Relation rel, bool rewrite);
+static void ATAttachQueueIndexStmt(Oid oldId, List **wqueue,
+IndexStmt *stmt, Relation rel, bool rewrite);
+static void ATAttachQueueAlterTableStmt(Oid oldId, Oid refRelId,
+			List **wqueue, AlterTableStmt *stmt,
+			Relation rel, bool rewrite);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
 static void ATWrongRelkindError(Relation rel, int allowed_targets);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
@@ -386,6 +393,12 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 	 char *cmd, List **wqueue, LOCKMODE lockmode,
 	 bool rewrite);
+static void RebuildObjectComment(AlteredTableInfo *tab,
+	 int cmdidx,
+	 ObjectType objtype,
+	 Oid objid,
+	 Oid classoid,
+	 List *objname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -3498,6 +3511,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAddIndex(tab, rel, (IndexStmt *) cmd-def, true,
 	 lockmode);
 			break;
+		case AT_ReAddComment:	/* Re-add existing comment */
+			CommentObject((CommentStmt *) cmd-def);
+			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			address =
 ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd-def,
@@ -4251,6 +4267,162 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	return tab;
 }
 
+
+/*
+ * ATAttachQueueCommand
+ *
+ * Attach each generated command to the proper place in the work queue.
+ * Note this could result in creation of entirely new work-queue entries.
+ *
+ * Also note that the command subtypes have to be tweaked, because it
+ * turns out that re-creation of indexes and constraints has to act a bit
+ * differently from initial creation.
+ */
+static void
+ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+	 Node *stm, Relation rel, bool rewrite)
+{
+	switch (nodeTag(stm))
+	{
+		case T_IndexStmt:
+			ATAttachQueueIndexStmt(oldId, wqueue,
+   (IndexStmt *) stm, rel,

Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-03 Thread Andrew Dunstan



On 07/03/2015 06:27 AM, Heikki Linnakangas wrote:

On 05/27/2015 09:51 PM, Andrew Dunstan wrote:


On 05/27/2015 02:37 PM, Robert Haas wrote:

On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr
oleksandr.shul...@zalando.de wrote:

Is it reasonable to add this patch to CommitFest now?

It's always reasonable to add a patch to the CommitFest if you would
like for it to be reviewed and avoid having it get forgotten about.
There seems to be some disagreement about whether we want this, but
don't let that stop you from adding it to the next CommitFest.


I'm not dead set against it either. When I have time I will take a
closer look.


Andrew, will you have the time to review this? Please add yourself as 
reviewer in the commitfest app if you do.


My 2 cents is that I agree with your initial reaction: This is a lot 
of infrastructure and generalizing things, for little benefit. Let's 
change the current code where we generate JSON to be consistent with 
whitespace, and call it a day.


- Heikki




I'm somewhat on vacation for the next week or so, so I won't claim it, 
but I'll try to make time to look at it. Other people (Merlin?) could 
also provide reviews.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement

2015-07-03 Thread Ashutosh Bapat

On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
  
 http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com
 
  I like the idea of the feature a lot, but the proposal to which you
  refer here mentions some problems, which I'm curious how you think you
  might solve.  (I don't have any good ideas myself, beyond what I
  mentioned there.)
 

 You're right, and we have another design (steps 1 and 2 was implemented
 last year):


 *** ALTER TABLE changes

 1) ATController
 1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c
 - AlterTableGetLockLevel:3023)


 2) Prepare to change relpersistence (src/backend/commands/tablecmds.c -
 ATPrepCmd:3249-3270)
 • check temp table (src/backend/commands/tablecmds.c -
 ATPrepChangePersistence:11074)
 • check foreign key constraints (src/backend/commands/tablecmds.c -
 ATPrepChangePersistence:11102)


 3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync
 (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if
 exists)


 4) Create a new fork called  TRANSIENT INIT FORK:

 • from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
   ∘ new forkName (src/common/relpath.c) called _initl
   ∘ insert XLog record to drop it if transaction abort

 • from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
   ∘ new forkName (src/common/relpath.c) called _initu
   ∘ insert XLog record to drop it if transaction abort


AFAIU, while reading WAL, the server doesn't know whether the transaction
that produced that WAL record aborted or committed. It's only when it sees
a COMMIT/ABORT record down the line, it can confirm whether the transaction
committed or aborted. So, one can only redo the things that WAL tells
have been done. We can not undo things based on the WAL. We might record
this fact somewhere while reading the WAL and act on it once we know the
status of the transaction, but I do not see that as part of this idea. This
comment applies to all the steps inserting WALs for undoing things.




 5) Change the relpersistence in catalog (pg_class-relpersistence) (heap,
 toast, indexes)


 6) Remove/Create INIT_FORK

 • from Unlogged to Logged
   ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the
 pendingDeletes queue
 • from Logged to Unlogged
   ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
   ∘ create the INIT_FORK using heap_create_init_fork
   ∘ insert XLog record to drop init fork if the transaction abort



 *** CRASH RECOVERY changes

 1) During crash recovery
 (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)


This operation is carried out in two phases: one before replaying WAL
records and second after that. Are you sure that the WALs generated for the
unlogged or logged forks, as described above, have been taken care of?


 • if the transient fork _initl exists then
   ∘ drop the transient fork _initl
   ∘ if the init fork doesn't exist then create it
   ∘ reset relation
 • if the transient fork _initu exists then
   ∘ drop the transient fork _initl
   ∘ if the init fork exists then drop it
   ∘ don't reset the relation


Consider case of converting unlogged to logged. The server crashes after
6th step when both the forks are removed. During crash recovery, it will
not see any of the fork and won't reset the unlogged table. Remember the
table is still unlogged since the transaction was aborted due to the crash.
So, it has to have an init fork to be reset on crash recovery.

Similarly while converting from logged to unlogged. The server crashes in
the 6th step after creating the INIT_FORK, during crash recovery the init
fork will be seen and a supposedly logged table will be trashed.

The ideas in 1 and 2 might be better than having yet another init fork.


1. http://www.postgresql.org/message-id/533d457a.4030...@vmware.com


 Regards,

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog: http://fabriziomello.github.io
  Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello
  Github: http://github.com/fabriziomello




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-03 Thread Petr Jelinek


On 2015-07-03 15:50, Michael Paquier wrote:

On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek p...@2ndquadrant.com wrote:

I was going through the code and have few comments:
- Why do you change the return value of TryReuseIndex? Can't we use reuse
the same OidIsValid(stmt-oldNode) check that ATExecAddIndex is doing
instead?


As pointed out by Heikki previously, that is actually unnecessary,
comments are still lost even if the index is reused for constraints.
So perhaps for simplicity we could just unconditionally recreate the
comments all the time if they are available.



Ah ok, I missed Heikki's email.


- I think the changes to ATPostAlterTypeParse should follow more closely the
coding of transformTableLikeClause - namely use the idxcomment


I am not sure I follow here. Could you elaborate?



Well for indexes you don't really need to add the new AT command, as 
IndexStmt has char *idxcomment which it will automatically uses as 
comment if not NULL. While  I am not huge fan of the idxcomment it 
doesn't seem to be easy to remove it in the future and it's what 
transformTableLikeClause uses so it might be good to be consistent with 
that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On Fri, Jul 3, 2015 at 3:01 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Fri, Jul 03, 2015 at 02:34:44PM +0900, Fujii Masao wrote:
  Hmm, for me it is 100% reproducable. Are you familiar with Docker? I
  can probably construct a Dockerfile that reproduces it pretty reliably.

 I could reproduce the problem in the master branch by doing
 the following steps.

 Thank you, I wasn't sure if you could kill the server fast enough
 without containers, but it looks like immediate mode is enough.

 1. start the PostgreSQL server with wal_level = minimal
 2. execute the following SQL statements
  begin;
  create table test(id serial primary key);
  truncate table test;
  commit;
 3. shutdown the server with immediate mode
 4. restart the server (crash recovery occurs)
 5. execute the following SQL statement
 select * from test;

 The optimization of TRUNCATE opereation that we can use when
 CREATE TABLE and TRUNCATE are executed in the same transaction block
 seems to cause the problem. In this case, only index file truncation is
 logged, and index creation in btbuild() is not logged because wal_level
 is minimal. Then at the subsequent crash recovery, index file is truncated
 to 0 byte... Very simple fix is to log an index creation in that case,
 but not sure if that's ok to do..

In 9.2 or before, this problem doesn't occur because no such error is thrown
even if an index file size is zero. But in 9.3 or later, since the planner
tries to read a meta page of an index to get the height of the btree tree,
an empty index file causes such error. The planner was changed that way by
commit 31f38f28, and the problem seems to be an oversight of that commit.

I'm not familiar with that change of the planner, but ISTM that we can
simply change _bt_getrootheight() so that 0 is returned if an index file is
empty, i.e., meta page cannot be read, in order to work around the problem.
Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

Fujii Masao masao.fu...@gmail.com writes:
 The optimization of TRUNCATE opereation that we can use when
 CREATE TABLE and TRUNCATE are executed in the same transaction block
 seems to cause the problem. In this case, only index file truncation is
 logged, and index creation in btbuild() is not logged because wal_level
 is minimal. Then at the subsequent crash recovery, index file is truncated
 to 0 byte... Very simple fix is to log an index creation in that case,
 but not sure if that's ok to do..

 In 9.2 or before, this problem doesn't occur because no such error is thrown
 even if an index file size is zero. But in 9.3 or later, since the planner
 tries to read a meta page of an index to get the height of the btree tree,
 an empty index file causes such error. The planner was changed that way by
 commit 31f38f28, and the problem seems to be an oversight of that commit.

What?  You want to blame the planner for failing because the index was
left corrupt by broken WAL replay?  A failure would occur anyway at
execution.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-03 Thread Marco Atzeri


On 7/3/2015 8:19 AM, Michael Paquier wrote:

On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote:

On 7/2/2015 5:16 PM, Dave Page wrote:



-lldap
hstore_plperl.o: In function `hstore_to_plperl':
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1
/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to
`hstoreUpgrade   '


So... dangomushi is able to build it at least. Here are the logs to
the last build for example:
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dangomushidt=2015-07-02%2019%3A19%3A39stg=make-contrib
What is the name of the library generated in hstore? Perhaps there is
a mismatch here.


OK thanks for the feedback.
It may be a cygwin perl specific issue.
I will investigate

Regards
Marco



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

Fabien COELHO coe...@cri.ensmp.fr writes:
 I'm pretty clearly in favor of doing correct lexing. I think we should
 generalize that and make it reusable. psql has it's own hacked up
 version already, there seems little point in having variedly good copies
 around.

 I must admit that I do not know how to share lexer rules but have 
 different actions on them (psql vs sql parser vs ...), as the action code 
 is intrinsically intertwined with expressions.

Obviously this is scope creep of the first magnitude, but ISTM that
it would be possible to share a lexer between psql and pgbench, since
in both of them the basic requirement is break SQL commands apart and
identify newline-terminated backslash commands.  If we're gonna break
pgbench's backwards compatibility anyway, there would be a whole lot
to be said for just going over to psql's input parsing rules, lock
stock 'n barrel; and this would be a good way to achieve that.

As it stands, psqlscan.l has some external dependencies on the rest of
psql, but we could perhaps refactor some of those away, and provide dummy
implementations to satisfy others (eg pgbench could provide a dummy
GetVariable() that just always returns NULL).

So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it
as-is (possibly after refactoring in psql).  A possible issue is avoiding
unnecessary invocations of flex, though.  Maybe symlinking the .c file
would work better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-03 Thread Andrew Dunstan



On 07/03/2015 01:47 AM, Marco Atzeri wrote:

On 7/2/2015 5:16 PM, Dave Page wrote:

The PostgreSQL Global Development Group announces that the alpha
release of PostgreSQL 9.5, the latest version of the world's leading
open source database, is available today.  This release contains
previews of all of the features which will be available in the final
release of version 9.5, although some details will change before then.
Please download, test, and report what you find.

Help Test for Bugs
--




building on cygwin and

$ perl --version

This is perl 5, version 22, subversion 0 (v5.22.0) built for 
cygwin-thread-multi-64int


build fail here, anyone seeing the same on other platforms ?

make -C hstore_plperl all
make[1]: Entering directory 
'/cygdrive/e/cyg_pub/devel/postgresql/prova/postgres 
ql-9.5alpha1-1.i686/build/contrib/hstore_plperl'
gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -We ndif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f 
   wrapv -fexcess-precision=standard -ggdb -O2 
-pipe -Wimplicit-function-declaratio   n 
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5a 
  lpha1/src/pl/plperl 
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr 
   c/postgresql-9.5alpha1/contrib/hstore -I. 
-I/pub/devel/postgresql/prova/postgres 
ql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl 
-I../../src/i   nclude 
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql- 
 9.5alpha1/src/include 
-I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE  -c 
-o hstore_plperl.o 
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/p 
ostgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c
In file included from 
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr

c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0:
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1 
  /contrib/hstore/hstore.h:79:0: warning: 
HS_KEY redefined

 #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)]))
 ^
In file included from 
/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h: 
 3730:0,
 from 
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr 
 c/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48,
 from 
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr

c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4:
/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: 
this is t   he location of the previous 
definition

 #  define HS_KEY(setxsubfn, popmark, apiver, xsver) \
 ^
gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -We ndif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f 
   wrapv -fexcess-precision=standard -ggdb -O2 
-pipe -Wimplicit-function-declaratio   n 
-shared -o hstore_plperl.dll -Wl,--out-implib=libhstore_plperl.a 
hstore_plpe   rl.o  -L../../src/port 
-L../../src/common -Wl,-no-undefined -Wl,--allow-multiple 
-definition -Wl,--enable-auto-import  -L/usr/local/lib 
-Wl,--as-needed   -L../..   /src/backend 
-lpostgres -lpgcommon -lpgport -lintl -lssl -lcrypto -lz -lreadline 
   -lcrypt -lldap

hstore_plperl.o: In function `hstore_to_plperl':
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1 
/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to 
`hstoreUpgrade   '






That #define clash is annoying, We should probably #undefine HS_KEY if 
it's defined before including hstore.h.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Martijn van Oosterhout

On Fri, Jul 03, 2015 at 12:53:56PM -0400, Tom Lane wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  Okay, so probably we need to change WAL replay of TRUNCATE so that
  the index file is truncated to one containing only meta page instead of
  empty one. That is, the WAL replay of TRUNCATE would need to call
  index_build() after smgrtruncate() maybe.
 
 That seems completely unworkable.  For one thing, index_build would expect
 to be able to do catalog lookups, but we can't assume that the catalogs
 are in a good state yet.
 
 I think the responsibility has to be on the WAL-writing end to emit WAL
 instructions that lead to a correct on-disk state.  Putting complex
 behavior into the reading side is fundamentally misguided.

Am I missing something. ISTM that if the truncate record was simply not
logged at all everything would work fine. The whole point is that the
table was created in this transaction and so if it exists the table on
disk must be the correct representation.

The broken index is just one symptom. The heap also shouldn't be
truncated at all. If you insert a row before commit then after replay
the tuple should be there still.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

On 2015-07-03 10:27:05 -0700, Josh Berkus wrote:
 On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
  Thanks. So we can choice the next master server using by checking the
  progress of each server, if hot standby is enabled.
  And a such procedure is needed even today replication.
  
  I think that the #2 problem which is Josh pointed out seems to be solved;
  1. I need to ensure that data is replicated to X places.
  2. I need to *know* which places data was synchronously replicated
  to when the master goes down.
  And we can address #1 problem using quorum commit.
 
 It's not solved. I still have zero ways of knowing if a replica was in
 sync or not at the time the master went down.

What?

You pick the standby that's furthest ahead. And you use a high enough
quorum so that given your tolerance for failures you'll always be able
to reach at least one of the synchronous replicas. Then you promote the
one with the highest LSN. Done.

This is something that gets *easier* by quorum, not harder.

 I forked the subject line because I think that the inability to
 identify synch replicas under failover conditions is a serious problem
 with synch rep *today*, and pretending that it doesn't exist doesn't
 help us even if we don't fix it in 9.6.

That's just not how failovers can sanely work. And again, you *have* the
information you can have on the standbys already. You *know* what/from
when the last replayed xact is.

 Let me give you three cases where our lack of information on the replica
 side about whether it thinks it's in sync or not causes synch rep to
 fail to protect data.  The first case is one I've actually seen in
 production, and the other two are hypothetical but entirely plausible.
 
 Case #1: two synchronous replica servers have the application name
 synchreplica.  An admin uses the wrong Chef template, and deploys a
 server which was supposed to be an async replica with the same
 recovery.conf template, and it ends up in the synchreplica group as
 well. Due to restarts (pushing out an update release), the new server
 ends up seizing and keeping sync. Then the master dies.  Because the new
 server wasn't supposed to be a sync replica in the first place, it is
 not checked; they just fail over to the furthest ahead of the two
 original synch replicas, neither of which was actually in synch.

Nobody can protect you against such configuration errors. We can make it
harder to misconfigure, sure, but it doesn't have anything to do with
the topic at hand.

 Case #2: 2 { local, london, nyc } setup.  At 2am, the links between
 data centers become unreliable, such that the on-call sysadmin disables
 synch rep because commits on the master are intolerably slow.  Then, at
 10am, the links between data centers fail entirely.  The day shift, not
 knowing that the night shift disabled sync, fail over to London thinking
 that they can do so with zero data loss.

As I said earlier, you can check against that today by checking the last
replayed timestamp. SELECT pg_last_xact_replay_timestamp();

You don't have to pick the one that used to be a sync replica. You pick
the one with the most data received.


If the day shift doesn't bother to check the standbys now, they'd not
check either if they had some way to check whether a node was the chosen
sync replica.

 Case #3 1 { london, frankfurt }, 1 { sydney, tokyo } multi-group
 priority setup.  We lose communication with everything but Europe.  How
 can we decide whether to wait to get sydney back, or to promote London
 immedately?

You normally don't continue automatically at all in that situation. To
avoid/minimize data loss you want to have a majority election system to
select the new primary. That requires reaching the majority of the
nodes. This isn't something specific to postgres, if you look at any
solution out there, they're also doing it that way.

Statically choosing which of the replicas in a group is the current sync
one is a *bad* idea. You want to ensure that at least node in a group
has received the data, and stop waiting as soon that's the case.

 It's an issue *now* that the only data we have about the state of sync
 rep is on the master, and dies with the master.   And it severely limits
 the actual utility of our synch rep.  People implement synch rep in the
 first place because the best effort of asynch rep isn't good enough
 for them, and yet when it comes to failover we're just telling them
 give it your best effort.

We don't tell them that, but apparently you do.


This subthread is getting absurd, stopping here.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix broken Install.bat when target directory contains a space


On 04/21/2015 04:02 PM, Michael Paquier wrote:

On Tue, Apr 21, 2015 at 4:33 PM, Asif Naeem anaeem...@gmail.com wrote:


The v2 patch looks good to me, just a minor concern on usage message i.e.

C:\PG\postgresql\src\tools\msvcinstall

Invalid command line options.
Usage: install.bat targetdir [installtype]
installtype: client



It seems that there are two install options i.e. client, all (any other
string other than client is being considered or treated as all), the
following install command works i.e.

install C:\PG\postgresql\inst option_does_not_exist


As your patch effects this area of code, I thought to share these findings
with you,o BTW, it is a minor thing that can be handled in another patch,


Well, that's the same behavior that this script has been having for ages.
Let's just update the usage message to mention both all and client. I
see no point in breaking a behavior that has been like that for ages, and
the main point of this patch is to fix the install path issue.


Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper 
that just calls install.pl, passing all arguments?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

Fujii Masao masao.fu...@gmail.com writes:
 Okay, so probably we need to change WAL replay of TRUNCATE so that
 the index file is truncated to one containing only meta page instead of
 empty one. That is, the WAL replay of TRUNCATE would need to call
 index_build() after smgrtruncate() maybe.

That seems completely unworkable.  For one thing, index_build would expect
to be able to do catalog lookups, but we can't assume that the catalogs
are in a good state yet.

I think the responsibility has to be on the WAL-writing end to emit WAL
instructions that lead to a correct on-disk state.  Putting complex
behavior into the reading side is fundamentally misguided.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

I wrote:
 As it stands, psqlscan.l has some external dependencies on the rest of
 psql, but we could perhaps refactor some of those away, and provide dummy
 implementations to satisfy others (eg pgbench could provide a dummy
 GetVariable() that just always returns NULL).

 So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it
 as-is (possibly after refactoring in psql).  A possible issue is avoiding
 unnecessary invocations of flex, though.  Maybe symlinking the .c file
 would work better.

A quick experiment with compiling psqlscan inside pgbench yields the
following failures:

pgbench.o: In function `psql_scan_setup':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1239: undefined reference to 
`pset'
pgbench.o: In function `escape_variable':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to 
`GetVariable'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1956: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1957: undefined reference to 
`psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1971: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1973: undefined reference to 
`psql_error'
pgbench.o: In function `evaluate_backtick':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1701: undefined reference to 
`psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1712: undefined reference to 
`psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1722: undefined reference to 
`psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1728: undefined reference to 
`psql_error'
pgbench.o: In function `yylex':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:511: undefined reference to 
`standard_strings'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to 
`GetVariable'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:751: undefined reference to 
`psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to 
`GetVariable'
pgbench.o: In function `psql_scan_slash_option':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1619: undefined reference to 
`pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1628: undefined reference to 
`psql_error'

The pset references are to pset.encoding, pset.db, or pset.vars.  I'd
think the best way to deal with the encoding and connection are to pass
them as parameters to psql_scan_setup() which'd store them in 
the PsqlScanState.  pset.vars is only passed to GetVariable.  We could
refactor that away somehow (although actually, why wouldn't we want to
just implement variable substitution exactly like it is in psql?  Maybe
the right answer is to import psql/variables.c lock stock n barrel too...)
psql_error() and standard_strings() wouldn't be hard to provide.

So this is looking *eminently* doable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

Jeevan Chalke jeevan.cha...@enterprisedb.com writes:
 Patch looks excellent now. No issues.
 Found a typo which I have fixed in the attached patch.

Starting to look at this ...

The business with numbering lines from SELECT seems to me to be completely
nonsensical.  In the first place, it fails to allow for views containing
WITH clauses.  But really it looks like it was cargo-culted over from
\ef/\sf without understanding why those commands number lines the way
they do.  The reason they do that is that for errors occurring inside a
function definition, the PL will typically report a line number relative
to the function body text, and so we're trying to be helpful about
interpreting line numbers of that kind.  But there's no comparable
behavior in the case of a view.  If you fat-finger a view, you'll get
a line number relative to the text of the whole CREATE command, eg

regression=# create or replace view z as
regression-# select 1/col
regression-# from bar;
ERROR:  relation bar does not exist
LINE 3: from bar;
 ^

So AFAICS, \ev and \sv should just number lines straightforwardly, with
1 being the first line of the CREATE command text.  Am I missing
something?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-03 18:49:31 +0200, Andres Freund wrote:
 But the more interesting question is why that's not hhappening
 today. RelationTruncateIndexes() does call the index_build() which
 should end up WAL logging the index creation.

So that's because there's an XLogIsNeeded() preventing it.

Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
right now missing how the whole skip wal logging if relation has just
been truncated optimization can ever actually be crashsafe unless we
use a new relfilenode (which we don't!).

Sure, we do an heap_sync() at the the end of the transaction. That's
nice and all. But it doesn't help if we crash and re-start WAL apply
from a checkpoint before the table was created. Because that'll replay
the truncation.

That's much worse than just the indexes - the rows added by a COPY
without WAL logging will also be truncated away, no?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] more RLS oversights

2015-07-03 Thread Noah Misch

On Wed, Feb 25, 2015 at 11:37:24PM -0500, Stephen Frost wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
  I agree that it's great that we're catching issues prior to when the
  feature is released and look forward to anything else you (or anyone
  else!) finds.
 
 I've pushed a fix for this.  Please let me know if you see any other
 issues or run into any problems.

(1) CreatePolicy() and AlterPolicy() omit to call assign_expr_collations() on
the node trees.  Test case:

begin;
set row_security = force;
create table t (c) as values ('bar'::text);
create policy p on t using (c  ('foo'::text COLLATE C));
alter table t enable row level security;
table pg_policy;  -- note :inputcollid 0
select * from t;  -- ERROR:  could not determine which collation ...
rollback;


(2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for
each role in the TO clause.  Test case:

begin;
create role alice;
create table t (c) as values ('bar'::text);
grant select on table t to alice;
create policy p on t to alice using (true);
select refclassid::regclass, refobjid, refobjsubid, deptype
  from pg_depend where classid = 'pg_policy'::regclass;
select refclassid::regclass, refobjid, deptype
  from pg_shdepend where classid = 'pg_policy'::regclass;
savepoint q; drop role alice; rollback to q;
revoke all on table t from alice;
\d t
drop role alice;
\d t
rollback;


 +static void
 +dumpPolicy(Archive *fout, PolicyInfo *polinfo)
...
 + if (polinfo-polqual != NULL)
 + appendPQExpBuffer(query,  USING %s, polinfo-polqual);

(3) The USING clause needs parentheses; a dump+reload failed like so:

pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at or 
near CASE
LINE 2: CASE
^
Command was: CREATE POLICY p2 ON category FOR ALL TO PUBLIC USING 
CASE
WHEN (current_user() = 'rls_regress_user1'::name) THE...

Add the same parentheses to psql \d output also, keeping that output similar
to the SQL syntax.

(3a) I found this by hacking the rowsecurity.sql regression test to not drop
its objects, then running the pg_upgrade test suite.  New features that affect
pg_dump should leave objects in the regression database to test the pg_dump
support via that suite.


(4) When DefineQueryRewrite() is about to convert a table to a view, it checks
the table for features unavailable to views.  For example, it rejects tables
having triggers.  It omits to reject tables having relrowsecurity or a
pg_policy record.  Test case:

begin;
set row_security = force;
create table t (c) as select * from generate_series(1,5);
create policy p on t using (c % 2 = 1);
alter table t enable row level security;
table t;
truncate t;
create rule _RETURN as on select to t do instead
  select * from generate_series(1,5) t0(c);
table t;
select polrelid::regclass from pg_policy;
select relrowsecurity from pg_class where oid = 't'::regclass;
rollback;


 +  para
 +   Referential integrity checks, such as unique or primary key constraints
 +   and foreign key references, will bypass row security to ensure that
 +   data integrity is maintained.  Care must be taken when developing
 +   schemas and row level policies to avoid a covert channel leak of
 +   information through these referential integrity checks.
...
 + /*
 +  * Row-level security should be disabled in the case where a foreign-key
 +  * relation is queried to check existence of tuples that references the
 +  * primary-key being modified.
 +  */
 + temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
 + if (qkey-constr_queryno == RI_PLAN_CHECK_LOOKUPPK
 + || qkey-constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK
 + || qkey-constr_queryno == RI_PLAN_RESTRICT_DEL_CHECKREF
 + || qkey-constr_queryno == RI_PLAN_RESTRICT_UPD_CHECKREF)
 + temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;

(5) This code does not use SECURITY_ROW_LEVEL_DISABLED for foreign keys with
CASCADE, SET NULL or SET DEFAULT actions.  The associated documentation says
nothing about this presumably-important distinction.

Is SECURITY_ROW_LEVEL_DISABLED mode even needed?  We switch users to the owner
of the FROM-clause table before running an RI query.  That means use of this
mode can only matter under row_security=force, right?  I would rest easier if
this mode went away, because it is a security landmine.  If user code managed
to run in this mode, it would bypass every policy in the database.  (I find no
such vulnerability today, because we use the mode only for parse analysis of
ri_triggers.c queries.)


(6) AlterPolicy() calls InvokeObjectPostAlterHook(PolicyRelationId, ...), but
CreatePolicy() and DropPolicy() lack their respective hook invocations.


(7) Using an aggregate function in a policy predicate elicits an inapposite
error message due to use of EXPR_KIND_WHERE for parse analysis.  Need a new
ParseExprKind.  Test case:

begin;
set row_security = force;
create table t (c) as values

Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-03 Thread Petr Korobeinikov

пт, 3 июля 2015 г. в 19:30, Tom Lane t...@sss.pgh.pa.us:

 So AFAICS, \ev and \sv should just number lines straightforwardly, with
 1 being the first line of the CREATE command text.  Am I missing
 something?


Fixed. Now both \ev and \sv  numbering lines starting with 1. New version
attached.

As I've already noticed that pg_get_viewdef() does not support full syntax
of creating or replacing views. In my opinion, psql source code isn't the
place where some formatting hacks should be. So, can you give me an idea
how to produce already formatted output supporting WITH statement without
breaking backward compatibility of pg_get_viewdef() internals?


psql-ev-sv-support-v6.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts



(although actually, why wouldn't we want to just implement variable 
substitution exactly like it is in psql?


Pgbench variable substitution is performed when the script is run, not 
while the file is being processed for being split, which is when a lexer 
would be used. The situation is not the same with psql. The most it could 
do would be to keep track of what substitution are done in queries.



So this is looking *eminently* doable.


Possibly.  How much more effort would be involved compared to the quick 
patch I did, I wonder:-)


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
 Am I missing something. ISTM that if the truncate record was simply not
 logged at all everything would work fine. The whole point is that the
 table was created in this transaction and so if it exists the table on
 disk must be the correct representation.

That'd not work either. Consider:

BEGIN;
CREATE TABLE ...
INSERT;
TRUNCATE;
INSERT;
COMMIT;

If you replay that without a truncation wal record the second INSERT
will try to add stuff to already occupied space. And they can have
different lengths and stuff, so you cannot just ignore that fact.

 The broken index is just one symptom.

Agreed. I think the problem is something else though. Namely that we
reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
broken afaics. We need to allocate a new relfilenode and write stuff
into that. Then we can forgo WAL logging the truncation record.

 If you insert a row before commit then after replay the tuple should be there 
 still.

The insert would be WAL logged. COPY skips wal logging tho.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reducing the size of BufferTag remodeling forks

On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
 
  2) Replace relation forks, with the exception of the init fork which is
 special anyway, with separate relfilenodes. Stored in seperate
 columns in pg_class.
 
 Different AMs have different fork needs; for heaps you want one main
 fork, one VM, one fsm.  But for indexes, the VM fork is not necessary,
 and some AMs might want different ones.  For instance, GIN would benefit
 from having separate forks to store the internal indexes, and BRIN would
 benefit from a separate fork for the revmap.
 
 What I'm saying is that I'm not sure it's okay to store the forks in
 pg_class columns

Right. Part of the point of this design is that you could easily add
further forks without system wide knowledge and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.

 instead perhaps we should have a separate catalog on
 which each relation can have many forks, or perhaps have the pg_class
 entry store an array (ick).  Or perhaps rather than relvmfork (the
 pg_class column for the relfilenode for the VM fork) we could store
 relfilenode1, relfilenode2 where the value for each N fork is
 AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
 main, 2 is revmap; and so forth).

None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one.

 FWIW the whole idea seems reasonable to me.  I worry about concurrent
 traffic into the pg_relfilenode shared table -- if temp table creation
 is common across many databases, is it going to become a contention
 point?

I don't think it'll be. It's essentially just inserts into a tiny table
with a single index, right? We can do a bootload of inserts into one of
these.

In an *assert enabled* build:
CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);

which is pretty much how pg_relfilenode would look like. Although we'd
not go through the whole executor for the inserts.

pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 
-T 20 -P 1
cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;
progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773

andres@awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f 
/tmp/temptable.sql -j 16 -c 16 -T 20 -P 1
CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;
progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346

and that's with zero actual columns. If you instead add some:
CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON 
COMMIT DROP;
progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011

So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Josh Berkus

On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
 Thanks. So we can choice the next master server using by checking the
 progress of each server, if hot standby is enabled.
 And a such procedure is needed even today replication.
 
 I think that the #2 problem which is Josh pointed out seems to be solved;
 1. I need to ensure that data is replicated to X places.
 2. I need to *know* which places data was synchronously replicated
 to when the master goes down.
 And we can address #1 problem using quorum commit.

It's not solved. I still have zero ways of knowing if a replica was in
sync or not at the time the master went down.

Now, you and others have argued persuasively that there are valuable use
cases for quorum commit even without solving that particular issue, but
there's a big difference between we can work around this problem and
the problem is solved.  I forked the subject line because I think that
the inability to identify synch replicas under failover conditions is a
serious problem with synch rep *today*, and pretending that it doesn't
exist doesn't help us even if we don't fix it in 9.6.

Let me give you three cases where our lack of information on the replica
side about whether it thinks it's in sync or not causes synch rep to
fail to protect data.  The first case is one I've actually seen in
production, and the other two are hypothetical but entirely plausible.

Case #1: two synchronous replica servers have the application name
synchreplica.  An admin uses the wrong Chef template, and deploys a
server which was supposed to be an async replica with the same
recovery.conf template, and it ends up in the synchreplica group as
well. Due to restarts (pushing out an update release), the new server
ends up seizing and keeping sync. Then the master dies.  Because the new
server wasn't supposed to be a sync replica in the first place, it is
not checked; they just fail over to the furthest ahead of the two
original synch replicas, neither of which was actually in synch.

Case #2: 2 { local, london, nyc } setup.  At 2am, the links between
data centers become unreliable, such that the on-call sysadmin disables
synch rep because commits on the master are intolerably slow.  Then, at
10am, the links between data centers fail entirely.  The day shift, not
knowing that the night shift disabled sync, fail over to London thinking
that they can do so with zero data loss.

Case #3 1 { london, frankfurt }, 1 { sydney, tokyo } multi-group
priority setup.  We lose communication with everything but Europe.  How
can we decide whether to wait to get sydney back, or to promote London
immedately?

I could come up with numerous other situations, but all of the three
above completely reasonable cases show how having the knowledge of what
time a replica thought it was last in sync is vital to preventing bad
failovers and data loss, and to knowing the quantity of data loss when
it can't be prevented.

It's an issue *now* that the only data we have about the state of sync
rep is on the master, and dies with the master.   And it severely limits
the actual utility of our synch rep.  People implement synch rep in the
first place because the best effort of asynch rep isn't good enough
for them, and yet when it comes to failover we're just telling them
give it your best effort.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-03 19:02:29 +0200, Andres Freund wrote:
 Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
 right now missing how the whole skip wal logging if relation has just
 been truncated optimization can ever actually be crashsafe unless we
 use a new relfilenode (which we don't!).

We actually used to use a different relfilenode, but optimized that
away: cab9a0656c36739f59277b34fea8ab9438395869

commit cab9a0656c36739f59277b34fea8ab9438395869
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Sun Aug 23 19:23:41 2009 +

Make TRUNCATE do truncate-in-place when processing a relation that was 
created
or previously truncated in the current (sub)transaction.  This is safe since
if the (sub)transaction later rolls back, we'd just discard the rel's 
current
physical file anyway.  This avoids unreasonable growth in the number of
transient files when a relation is repeatedly truncated.  Per a performance
gripe a couple weeks ago from Todd Cook.

to me the reasoning here looks flawed.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On Fri, Jul 3, 2015 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 The optimization of TRUNCATE opereation that we can use when
 CREATE TABLE and TRUNCATE are executed in the same transaction block
 seems to cause the problem. In this case, only index file truncation is
 logged, and index creation in btbuild() is not logged because wal_level
 is minimal. Then at the subsequent crash recovery, index file is truncated
 to 0 byte... Very simple fix is to log an index creation in that case,
 but not sure if that's ok to do..

 In 9.2 or before, this problem doesn't occur because no such error is thrown
 even if an index file size is zero. But in 9.3 or later, since the planner
 tries to read a meta page of an index to get the height of the btree tree,
 an empty index file causes such error. The planner was changed that way by
 commit 31f38f28, and the problem seems to be an oversight of that commit.

 What?  You want to blame the planner for failing because the index was
 left corrupt by broken WAL replay?  A failure would occur anyway at
 execution.

Yep, right. I was not thinking that such index with file size 0 is corrupted
because the reported problem didn't happen before that commit was added.
But that's my fault. Such index can cause an error even in other code paths.

Okay, so probably we need to change WAL replay of TRUNCATE so that
the index file is truncated to one containing only meta page instead of
empty one. That is, the WAL replay of TRUNCATE would need to call
index_build() after smgrtruncate() maybe.

Then how should we implement that? Invent new WAL record type that
calls smgrtruncate() and index_build() during WAL replay? Or add the
special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
call index_build() only if the flag is found? Any other good idea?
Anyway ISTM that we might need to add or modify WAL record.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-04 01:39:42 +0900, Fujii Masao wrote:
 Okay, so probably we need to change WAL replay of TRUNCATE so that
 the index file is truncated to one containing only meta page instead of
 empty one. That is, the WAL replay of TRUNCATE would need to call
 index_build() after smgrtruncate() maybe.
 
 Then how should we implement that? Invent new WAL record type that
 calls smgrtruncate() and index_build() during WAL replay? Or add the
 special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
 call index_build() only if the flag is found? Any other good idea?
 Anyway ISTM that we might need to add or modify WAL record.

It's easy enough to log something like a metapage with
log_newpage().

But the more interesting question is why that's not hhappening
today. RelationTruncateIndexes() does call the index_build() which
should end up WAL logging the index creation.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reducing the size of BufferTag remodeling forks

2015-07-03 Thread Alvaro Herrera

Andres Freund wrote:

 2) Replace relation forks, with the exception of the init fork which is
special anyway, with separate relfilenodes. Stored in seperate
columns in pg_class.

Different AMs have different fork needs; for heaps you want one main
fork, one VM, one fsm.  But for indexes, the VM fork is not necessary,
and some AMs might want different ones.  For instance, GIN would benefit
from having separate forks to store the internal indexes, and BRIN would
benefit from a separate fork for the revmap.

What I'm saying is that I'm not sure it's okay to store the forks in
pg_class columns; instead perhaps we should have a separate catalog on
which each relation can have many forks, or perhaps have the pg_class
entry store an array (ick).  Or perhaps rather than relvmfork (the
pg_class column for the relfilenode for the VM fork) we could store
relfilenode1, relfilenode2 where the value for each N fork is
AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
main, 2 is revmap; and so forth).

FWIW the whole idea seems reasonable to me.  I worry about concurrent
traffic into the pg_relfilenode shared table -- if temp table creation
is common across many databases, is it going to become a contention
point?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More logging for autovacuum

2015-07-03 Thread Amit Kapila

On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh gurj...@singh.im wrote:

 On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera 
alvhe...@commandprompt.com wrote:

 Gregory Stark wrote:
 
  I'm having trouble following what's going on with autovacuum and I'm
finding
  the existing logging insufficient. In particular that it's only
logging vacuum
  runs *after* the vacuum finishes makes it hard to see what vacuums are
running
  at any given time. Also, I want to see what is making autovacuum
decide to
  forgo vacuuming a table and the log with that information is at DEBUG2.

 So did this idea go anywhere?


 Assuming the thread stopped here, I'd like to rekindle the proposal.

 log_min_messages acts as a single gate for everything headed for the
server logs; controls for per-background process logging do not exist. If
one wants to see DEBUG/INFO messages for just the Autovacuum (or
checkpointer, bgwriter, etc.), they have to set log_min_messages to that
level, but the result would be a lot of clutter from other processes to
grovel through, to see the messages of interest.


I think that will be quite helpful.  During the patch development of
parallel sequential scan, it was quite helpful to see the LOG messages
of bgworkers, however one of the recent commits (91118f1a) have
changed those to DEBUG1, now if I have to enable all DEBUG1
messages, then what I need will be difficult to find in all the log
messages.
Having control of separate logging for background tasks will serve such
a purpose.

 The facilities don't yet exist, but it'd be nice if such parameters when
unset (ie NULL) pick up the value of log_min_messages. So by default, the
users will get the same behaviour as today, but can choose to tweak per
background-process logging when needed.


Will this proposal allow user to see all the messages by all the background
workers or will it be per background task.  Example in some cases user
might want to see the messages by all bgworkers and in some cases one
might want to see just messages by AutoVacuum and it's workers.

I think here designing User Interface is an important work if others also
agree
that such an option is useful, could you please elaborate your proposal?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Determine operator from it's function

Jim Nasby jim.na...@bluetreble.com writes:
 On 7/3/15 2:33 AM, Heikki Linnakangas wrote:
 On 07/03/2015 01:20 AM, Jim Nasby wrote:
 Is there a way to determine the operator that resulted in calling the
 operator function? I thought fcinfo-flinfo-fn_expr might get set to
 the OpExpr, but seems it can be a FuncExpr even when called via an
 operator...

 Don't think there is. Why do you need to know?

 I'd like to support arbitrary operators in variant.

Why would you expect there to be multiple operators pointing at the same
function?  If there were multiple operators pointing at the same function,
why would you need to distinguish them?  ISTM that such a situation would
necessarily mean that there was no distinction worthy of notice.

(The particular situation you are bitching about comes from the fact that
eval_const_expressions's simplify_functions code deliberately ignores any
distinction between operators and functions.  But for its purposes, that
is *correct*, and I will strongly resist any claim that it isn't.  If you
are unhappy then you defined your operators wrongly.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

Jan de Visser j...@de-visser.net writes:
 Attached a new patch, rebased against the current head. Errors in
 pg_hba.conf and pg_ident.conf are now also noticed.

 I checked the documentation for pg_ctl reload, and the only place where
 it's explained seems to be runtime.sgml and that description is so
 high-level that adding this new bit of functionality wouldn't make much
 sense.

BTW, it's probably worth pointing out that the recent work on the
pg_file_settings view has taken away a large part of the use-case for
this, in that you can find out more with less risk by inspecting
pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and
hoping you didn't break anything too nastily.  Also, you can use
pg_file_settings remotely, unlike pg_ctl (though admittedly you
still need contrib/adminpack or something to allow uploading a
new config file if you're doing remote admin).

I wonder whether we should consider inventing similar views for
pg_hba.conf and pg_ident.conf.

While that's not necessarily a reason not to adopt this patch anyway,
it does mean that we should think twice about whether we need to add
a couple of hundred lines for a facility that's less useful than
what we already have.

BTW, this version of this patch neglects to update the comments in
miscadmin.h, and it makes the return convention for
ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
and inconsistency in the comments is a symptom of that.  I didn't read it
in enough detail to say whether there are other problems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

On July 3, 2015 06:21:09 PM Tom Lane wrote:
 Jan de Visser j...@de-visser.net writes:
  Attached a new patch, rebased against the current head. Errors in
  pg_hba.conf and pg_ident.conf are now also noticed.
  
  I checked the documentation for pg_ctl reload, and the only place where
  it's explained seems to be runtime.sgml and that description is so
  high-level that adding this new bit of functionality wouldn't make much
  sense.
 
 BTW, it's probably worth pointing out that the recent work on the
 pg_file_settings view has taken away a large part of the use-case for
 this, in that you can find out more with less risk by inspecting
 pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and
 hoping you didn't break anything too nastily.  Also, you can use
 pg_file_settings remotely, unlike pg_ctl (though admittedly you
 still need contrib/adminpack or something to allow uploading a
 new config file if you're doing remote admin).
 
 I wonder whether we should consider inventing similar views for
 pg_hba.conf and pg_ident.conf.
 
 While that's not necessarily a reason not to adopt this patch anyway,
 it does mean that we should think twice about whether we need to add
 a couple of hundred lines for a facility that's less useful than
 what we already have.

Since you were the one proposing the feature, I'm going to leave it to you 
whether or not I should continue with this. I have no use for this feature; 
for me it just seemed like a nice bite-sized feature to get myself acquainted 
with the code base and the development process. As far as I'm concerned that 
goal has already been achieved, even though finalizing a patch towards commit 
(and having my name in the release notes ha ha) would be the icing on the 
cake.

 
 BTW, this version of this patch neglects to update the comments in
 miscadmin.h, and it makes the return convention for
 ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
 and inconsistency in the comments is a symptom of that.  I didn't read it
 in enough detail to say whether there are other problems.

OK, miscadmin.h. I'll go and look what that's all about. And would it make 
sense to find a better solution for the ProcessConfigFileInternal return value 
(which is convoluted, I agree - I went for the solution with the least impact 
on existing code), or should I improve documentation?

 
   regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-03 18:38:37 -0400, Tom Lane wrote:
  Why exactly?  The first truncation in the (sub)xact would have assigned a
 new relfilenode, why do we need another one?  The file in question will
 go away on crash/rollback in any case, and no other transaction can see
 it yet.

Consider:

BEGIN;
CREATE TABLE;
INSERT largeval;
TRUNCATE;
INSERT 1;
COPY;
INSERT 2;
COMMIT;

INSERT 1 is going to be WAL logged. For that to work correctly TRUNCATE
has to be WAL logged, as otherwise there'll be conflicting/overlapping
tuples on the target page.

But:

The truncation itself is not fully wal logged, neither is the COPY. Both
rely on heap_sync()/immedsync(). For that to be correct the current
relfilenode's truncation may *not* be wal-logged, because the contents
of the COPY or the truncation itself will only be on-disk, not in the
WAL.

Only being on-disk but not in the WAL is a problem if we crash and
replay the truncate record.

 I'm prepared to believe that some bit of logic is doing the wrong
 thing in this state, but I do not agree that truncate-in-place is
 unworkable.

Unless we're prepared to make everything that potentially WAL logs
something do the rel-rd_createSubid == mySubid  dance, I can't see
that working.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

On Sat, Jul 4, 2015 at 2:44 AM, Andres Freund wrote:
 This subthread is getting absurd, stopping here.

Yeah, I agree with Andres here, we are making a mountain of nothing
(Frenglish?). I'll send to the other thread some additional ideas soon
using a JSON structure.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

Martijn van Oosterhout klep...@svana.org writes:
 With inserts the WAL records look as follows (relfilenodes changed):
 ...
 And amazingly, the database cluster successfuly recovers and there's no
 error now.  So the problem is *only* because there is no data in the
 table at commit time.  Which indicates that it's the 'newroot record
 that saves the day normally.  And it's apparently generated by the
 first insert.

Yeah, because the correct empty state of a btree index is to have a
metapage but no root page, so the first insert forces creation of a root
page.  And, by chance, btree_xlog_newroot restores the metapage from
scratch, so this works even if the metapage had been missing or corrupt.

However, things would still break if the first access to the index was
a read attempt rather than an insert.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Asynchronous execution on FDW


On 07/02/2015 08:48 AM, Kyotaro HORIGUCHI wrote:

- It was a problem when to give the first kick for async exec. It
   is not in ExecInit phase, and ExecProc phase does not fit,
   too. An extra phase ExecPreProc or something is too
   invasive. So I tried pre-exec callback.

   Any init-node can register callbacks on their turn, then the
   registerd callbacks are called just before ExecProc phase in
   executor. The first patch adds functions and structs to enable
   this.


At a quick glance, I think this has all the same problems as starting 
the execution at ExecInit phase. The correct way to do this is to kick 
off the queries in the first IterateForeignScan() call. You said that 
ExecProc phase does not fit - why not?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

On 2015-07-03 19:26:05 +0200, Andres Freund wrote:
 On 2015-07-03 19:02:29 +0200, Andres Freund wrote:
  Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
  right now missing how the whole skip wal logging if relation has just
  been truncated optimization can ever actually be crashsafe unless we
  use a new relfilenode (which we don't!).
 
 We actually used to use a different relfilenode, but optimized that
 away: cab9a0656c36739f59277b34fea8ab9438395869
 
 commit cab9a0656c36739f59277b34fea8ab9438395869
 Author: Tom Lane t...@sss.pgh.pa.us
 Date:   Sun Aug 23 19:23:41 2009 +
 
 Make TRUNCATE do truncate-in-place when processing a relation that was 
 created
 or previously truncated in the current (sub)transaction.  This is safe 
 since
 if the (sub)transaction later rolls back, we'd just discard the rel's 
 current
 physical file anyway.  This avoids unreasonable growth in the number of
 transient files when a relation is repeatedly truncated.  Per a 
 performance
 gripe a couple weeks ago from Todd Cook.
 
 to me the reasoning here looks flawed.

It looks to me we need to re-neg on this a bit. I think we can still be
more efficient than the general codepath: We can drop the old
relfilenode immediately. But pg_class.relfilenode has to differ from the
old after the truncation.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

Andres Freund and...@anarazel.de writes:
 On 2015-07-03 19:26:05 +0200, Andres Freund wrote:
 commit cab9a0656c36739f59277b34fea8ab9438395869
 Author: Tom Lane t...@sss.pgh.pa.us
 Date:   Sun Aug 23 19:23:41 2009 +
 
 Make TRUNCATE do truncate-in-place when processing a relation that was 
 created
 or previously truncated in the current (sub)transaction.  This is safe since
 if the (sub)transaction later rolls back, we'd just discard the rel's current
 physical file anyway.  This avoids unreasonable growth in the number of
 transient files when a relation is repeatedly truncated.  Per a performance
 gripe a couple weeks ago from Todd Cook.
 
 to me the reasoning here looks flawed.

 It looks to me we need to re-neg on this a bit. I think we can still be
 more efficient than the general codepath: We can drop the old
 relfilenode immediately. But pg_class.relfilenode has to differ from the
 old after the truncation.

Why exactly?  The first truncation in the (sub)xact would have assigned a
new relfilenode, why do we need another one?  The file in question will
go away on crash/rollback in any case, and no other transaction can see
it yet.

I'm prepared to believe that some bit of logic is doing the wrong thing in
this state, but I do not agree that truncate-in-place is unworkable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Determine operator from it's function

2015-07-03 Thread Jim Nasby


On 7/3/15 2:33 AM, Heikki Linnakangas wrote:

On 07/03/2015 01:20 AM, Jim Nasby wrote:

Is there a way to determine the operator that resulted in calling the
operator function? I thought fcinfo-flinfo-fn_expr might get set to
the OpExpr, but seems it can be a FuncExpr even when called via an
operator...


Don't think there is. Why do you need to know?


I'd like to support arbitrary operators in variant. I did initial 
testing and it looked like I was getting an OpExpr in fn_expr, but I 
think that's because I was using a real table to test with. When I do 
something like 'a'  'b' it looks like the operator gets written out of 
the plan. If that's indeed what's happening is there a hook I could use 
to change that behavior?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-07-03 Thread Peter Geoghegan

Since apparently we're back to development work, I thought it was time
to share a patch implementing a few additional simple tricks to make
sorting text under a non-C locale even faster than in 9.5. These
techniques are mostly effective when values are physically clustered
together. This might be because there is a physical/logical
correlation, but cases involving any kind of clustering of values are
helped significantly.

Caching
==

The basic idea is that we cache strxfrm() blobs. Separately, we
exploit temporal locality and clustering of values by caching the
result of the most recent strcoll()-resolved comparison performed. The
strxfrm() technique helps a lot with low cardinality single attribute
sorts if we can avoid most strxfrm() work. On the other hand,
strcoll() comparison caching particularly helps with multi-attribute
sorts where there is a low to moderate cardinality secondary attribute
and low cardinality leading attribute. The master branch will still
opportunistically take the equality memcmp() fastpath plenty of times
for that second attribute, but there are no abbreviated keys to help
when that doesn't work out (because it isn't the leading attribute).

Regressions
==

The patch still helps with strcoll() avoidance when the ordering of a
moderate cardinality attribute is totally random, but it helps much
less there. I have not seen a regression for any case. I'm expecting
someone to ask me to do something with the program I wrote last year,
to prove the opportunistic memcmp() equality fastpath for text is
free [1]. This patch has exactly the same tension as last year's
memcmp() equality one [2]: I add something opportunistic, that in
general might consistently not work out at all in some cases, and on
the face of it implies extra costs for those cases -- costs which must
be paid every single time. So as with the opportunistic memcmp()
equality thing, the *actual* overhead for cases that do not benefit
must be virtually zero for the patch to be worthwhile. That is the
standard that I expect that this patch will be held to, too.

Benchmark
=

The query that I've been trying this out with is a typical rollup
query, using my cities sample data [3] (this is somewhat, although
not perfectly correlated on (country, province) before sorting):

postgres=# select country, province, count(*) from cities group by
rollup (country, province);
   country   |   province
  | count
-+---+
 Afghanistan | Badaẖšan
   |  5
 Afghanistan | Bādgīs
  |  2
 Afghanistan | Baġlān
  |  5
 Afghanistan | Balẖ
   |  6
 Afghanistan | Bāmiyān
  |  3
 Afghanistan | Farāh
  |  3
 Afghanistan | Fāryāb
  |  4
 Afghanistan | Ġawr
  |  3

*** SNIP *
...

 Zimbabwe| Manicaland
  | 22
 Zimbabwe| Mashonaland Central
  | 13
 Zimbabwe| Mashonaland East
  |  9
 Zimbabwe| Mashonaland West
  | 21
 Zimbabwe| Masvingo
  | 11
 Zimbabwe| Matabeleland North
  |  8
 Zimbabwe| Matabeleland South
  | 14
 Zimbabwe| Midlands
  | 14
 Zimbabwe| [null]
  |116
 [null]  | [null]
  | 317102
(3529 rows)

With master, this takes about 525ms when it stabilizes after a few
runs on my laptop. With the patch, it takes about 405ms. That's almost
a 25% reduction in total run time. If I perform a more direct test of
sort performance against this data with minimal non-sorting overhead,
I see a reduction of as much as 30% in total query runtime (I chose
this rollup query because it is obviously representative of the real
world).

If this data is *perfectly* correlated (e.g. because someone ran
CLUSTER) and some sort can use the dubious bubble sort best case
path [4] that we added to qsort back in 2006, the improvement still
hold up at ~20%, I've found.

Performance of the C locale
---

For this particular rollup query, my 25% improvement leaves the
collated text sort perhaps marginally faster than an equivalent query
that uses the C locale (with or without the patch applied). It's
hard to be sure that that effect is real -- many trials are needed --
but it's reasonable to speculate that it's possible to sometimes beat
the C locale because of factors like final abbreviated key
cardinality.

It's easy to *contrive* a case where the C locale is beaten even
with 9.5 -- just sort a bunch

Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-03 Thread Noah Misch

On Wed, Jul 01, 2015 at 03:22:33AM -0400, Noah Misch wrote:
 On Tue, Jun 30, 2015 at 09:45:08AM -0400, Tom Lane wrote:
  Noah Misch n...@leadboat.com writes:
   On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote:
   Another idea would be to make a test during postmaster start to see
   if this bug exists, and fail if so.  I'm generally on board with the
   thought that we don't need to work on systems with such a bad bug,
   but it would be a good thing if the failure was clean and produced
   a helpful error message, rather than looking like a Postgres bug.
  
   Failing cleanly on unpatched Solaris is adequate, agreed.  A check at
   postmaster start isn't enough, because the postmaster might run in the C
   locale while individual databases or collations use problem locales.  The
   safest thing is to test after every setlocale(LC_COLLATE) and
   newlocale(LC_COLLATE).  That's once at backend start and once per backend 
   per
   collation used, more frequent than I would like.  Hmm.

Solaris does not have locale_t or strxfrm_l(), so per-collation checks aren't
relevant after all.  Checking at postmaster start and backend start seems
cheap enough, hence this patch.
commit 17d0abf (HEAD)
Author: Noah Misch n...@leadboat.com
AuthorDate: Fri Jul 3 19:59:50 2015 -0400
Commit: Noah Misch n...@leadboat.com
CommitDate: Fri Jul 3 19:59:50 2015 -0400

Revoke support for strxfrm() that write past the specified array length.

This formalizes a decision implicit in commit
4ea51cdfe85ceef8afabceb03c446574daa0ac23 and adds clean detection of
affected systems.  Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 2ecadd6..4fad6f3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -149,6 +149,8 @@ main(int argc, char *argv[])
 */
unsetenv(LC_ALL);
 
+   check_strxfrm_bug();
+
/*
 * Catch standard options before doing much else, in particular before 
we
 * insist on not being root.
diff --git a/src/backend/utils/adt/pg_locale.c 
b/src/backend/utils/adt/pg_locale.c
index 84215e0..d91959e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -855,6 +855,64 @@ IsoLocaleName(const char *winlocname)
 
 
 /*
+ * Detect aging strxfrm() implementations that, in a subset of locales, write
+ * past the specified buffer length.  Affected users must update OS packages
+ * before using PostgreSQL 9.5 or later.
+ *
+ * Assume that the bug can come and go from one postmaster startup to another
+ * due to physical replication among diverse machines.  Assume that the bug's
+ * presence will not change during the life of a particular postmaster.  Given
+ * those assumptions, call this no less than once per postmaster startup per
+ * LC_COLLATE setting used.  No known-affected system offers strxfrm_l(), so
+ * there is no need to consider pg_collation locales.
+ */
+void
+check_strxfrm_bug(void)
+{
+   charbuf[32];
+   const int   canary = 0x7F;
+   boolok = true;
+
+   /*
+* Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10
+* 05/08 returns 18 and modifies 10 bytes.  It respects limits above or
+* below that range.
+*
+* The bug is present in Solaris 8 as well; it is absent in Solaris 10
+* 01/13 and Solaris 11.2.  Affected locales include is_IS.ISO8859-1,
+* en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R.  Unaffected locales
+* include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C.
+*/
+   buf[7] = canary;
+   (void) strxfrm(buf, ab, 7);
+   if (buf[7] != canary)
+   ok = false;
+
+   /*
+* illumos bug #1594 was present in the source tree from 2010-10-11 to
+* 2012-02-01.  Given an ASCII string of any length and length limit 1,
+* affected systems ignore the length limit and modify a number of bytes
+* one less than the return value.  The problem inputs for this bug do 
not
+* overlap those for the Solaris bug, hence a distinct test.
+*
+* Affected systems include smartos-20110926T021612Z.  Affected locales
+* include en_US.ISO8859-1 and en_US.UTF-8.  Unaffected locales include 
C.
+*/
+   buf[1] = canary;
+   (void) strxfrm(buf, a, 1);
+   if (buf[1] != canary)
+   ok = false;
+
+   if (!ok)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYSTEM_ERROR),
+errmsg_internal(strxfrm(), in locale \%s\, 
writes past the specified array length,
+
setlocale(LC_COLLATE, NULL)),
+errhint(Apply system library package 
updates.)));
+}
+
+
+/*
  * Cache

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

On July 3, 2015 06:21:09 PM Tom Lane wrote:
 I wonder whether we should consider inventing similar views for
 pg_hba.conf and pg_ident.conf.

(Apologies for the flurry of emails).

Was there not an attempt at a view for pg_hba.conf which ended in a lot of 
bikeshedding and no decisions?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

On July 3, 2015 09:24:36 PM Jan de Visser wrote:
 On July 3, 2015 06:21:09 PM Tom Lane wrote:
  BTW, this version of this patch neglects to update the comments in
  miscadmin.h, and it makes the return convention for
  ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
  and inconsistency in the comments is a symptom of that.  I didn't read it
  in enough detail to say whether there are other problems.
 
 OK, miscadmin.h. I'll go and look what that's all about. And would it make
 sense to find a better solution for the ProcessConfigFileInternal return
 value (which is convoluted, I agree - I went for the solution with the
 least impact on existing code), or should I improve documentation?
 

Heh. I actually touched that file. I completely missed those comments (or saw 
them, thought that I should update them, and then forgot about them - just as 
likely). I'll obviously fix this if we carry this to completion.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

Attached a new patch, rebased against the current head. Errors in
pg_hba.conf and pg_ident.conf are now also noticed.

I checked the documentation for pg_ctl reload, and the only place where
it's explained seems to be runtime.sgml and that description is so
high-level that adding this new bit of functionality wouldn't make much
sense.

On Sat, Apr 25, 2015 at 10:03 AM, Jan de Visser j...@de-visser.net wrote:

 On April 22, 2015 06:04:42 PM Payal Singh wrote:
  The following review has been posted through the commitfest application:
  make installcheck-world:  not tested
  Implements feature:   tested, failed
  Spec compliant:   not tested
  Documentation:tested, failed
 
  Error in postgresql.conf gives the expected result on pg_ctl reload,
  although errors in pg_hba.conf file don't. Like before, reload completes
  fine without any information that pg_hba failed to load and only
  information is present in the log file. I'm assuming pg_ctl reload should
  prompt user if file fails to load irrespective of which file it is -
  postgresql.conf or pg_hba.conf.

 Will fix. Not hard, just move the code that writes the .pid file to after
 the
 pg_hba reload.

 
  There is no documentation change so far, but I guess that's not yet
  necessary.

 I will update the page for pg_ctl. At least the behaviour of -w/-W in
 relation
 to the reload command needs explained.

 
  gmake check passed all tests.
 
  The new status of this patch is: Waiting on Author

 jan


diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index df8037b..909a078 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1235,6 +1235,15 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
+	 * Update postmaster.pid with startup time as the last reload time:
+	 */
+	{
+		char last_reload_info[32];
+		snprintf(last_reload_info, 32, %ld %d, (long) MyStartTime, 1);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
+	}
+
+	/*
 	 * Remember postmaster startup time
 	 */
 	PgStartTime = GetCurrentTimestamp();
@@ -2349,6 +2358,8 @@ static void
 SIGHUP_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
+	boolreload_success;
+	charlast_reload_info[32];
 
 	PG_SETMASK(BlockSig);
 
@@ -2356,7 +2367,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 	{
 		ereport(LOG,
 (errmsg(received SIGHUP, reloading configuration files)));
-		ProcessConfigFile(PGC_SIGHUP);
+		reload_success = ProcessConfigFile(PGC_SIGHUP);
+
 		SignalChildren(SIGHUP);
 		SignalUnconnectedWorkers(SIGHUP);
 		if (StartupPID != 0)
@@ -2379,13 +2391,25 @@ SIGHUP_handler(SIGNAL_ARGS)
 			signal_child(PgStatPID, SIGHUP);
 
 		/* Reload authentication config files too */
-		if (!load_hba())
+		if (!load_hba()) {
+			reload_success = 0; 
 			ereport(WARNING,
 	(errmsg(pg_hba.conf not reloaded)));
+		}
 
-		if (!load_ident())
+		if (!load_ident()) {
+			reload_success = 0;
 			ereport(WARNING,
 	(errmsg(pg_ident.conf not reloaded)));
+		}
+
+		/*
+		 * Write the current time and the result of the reload to the
+		 * postmaster.pid file.
+		 */
+		snprintf(last_reload_info, 32, %ld %d,
+(long) time(NULL), reload_success);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
 
 #ifdef EXEC_BACKEND
 		/* Update the starting-point file for future children */
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 5b5846c..2f5537d 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -117,12 +117,13 @@ STRING			\'([^'\\\n]|\\.|\'\')*\'
  * If a hard error occurs, no values will be changed.  (There can also be
  * errors that prevent just one value from being changed.)
  */
-void
+bool
 ProcessConfigFile(GucContext context)
 {
 	int			elevel;
 	MemoryContext config_cxt;
 	MemoryContext caller_cxt;
+bool  ok;
 
 	/*
 	 * Config files are processed on startup (by the postmaster only) and on
@@ -153,16 +154,19 @@ ProcessConfigFile(GucContext context)
 	/*
 	 * Read and apply the config file.  We don't need to examine the result.
 	 */
-	(void) ProcessConfigFileInternal(context, true, elevel);
+ok = ProcessConfigFileInternal(context, true, elevel) != NULL;
 
 	/* Clean up */
 	MemoryContextSwitchTo(caller_cxt);
 	MemoryContextDelete(config_cxt);
+	return ok;
 }
 
 /*
  * This function handles both actual config file (re)loads and execution of
- * show_all_file_settings() (i.e., the pg_file_settings view).  In the latter
+ * show_all_file_settings() (i.e., the pg_file_settings view).  In the former
+ * case, the settings are applied and this function returns the ConfigVariable
+ * list when this is successful, or NULL when it is not.  In the latter
  * case we don't apply any of the settings, but we make all the usual validity
  * checks, and we return the ConfigVariable list so that it can be printed out
  * by show_all_file_settings().
@@ -505,9 +509,13 @@ bail_out:

Re: [HACKERS] Rounding to even for numeric data type

Michael Paquier michael.paqu...@gmail.com writes:
 On Sat, May 2, 2015 at 9:53 PM, Fabien COELHO wrote:
 Quick review: patches applies, make check is fine, all is well.

 Thanks for the feedback, Fabien!

 All the casting tests could be put in numeric.sql, as there are all
 related to numeric and that would avoid duplicating the values lists.

 Not sure about that, the tests are placed here to be consistent with
 for is done for float8.

 For the documentation, I would also add 3.5 so that rounding to even is even
 clearer:-)

 Good idea. I reworked the example in the docs.

Pushed with minor adjustments --- you missed updating
int8-exp-three-digits.out, and I thought the documentation wording
could be better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

Petr Korobeinikov pkorobeini...@gmail.com writes:
 Fixed. Now both \ev and \sv  numbering lines starting with 1. New version
 attached.

Applied with a fair amount of mostly-cosmetic adjustment.

 As I've already noticed that pg_get_viewdef() does not support full syntax
 of creating or replacing views.

Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
of a PITA for this purpose that it doesn't include the CREATE text for
you, but we're surely not changing that behavior now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Martijn van Oosterhout

On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote:
 On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
  Am I missing something. ISTM that if the truncate record was simply not
  logged at all everything would work fine. The whole point is that the
  table was created in this transaction and so if it exists the table on
  disk must be the correct representation.
 
 That'd not work either. Consider:
 
 BEGIN;
 CREATE TABLE ...
 INSERT;
 TRUNCATE;
 INSERT;
 COMMIT;
 
 If you replay that without a truncation wal record the second INSERT
 will try to add stuff to already occupied space. And they can have
 different lengths and stuff, so you cannot just ignore that fact.

I was about to disagree with you by suggesting that if the table was
created in this transaction then WAL logging is skipped. But testing
shows that inserts are indeed logged, as you point out.

With inserts the WAL records look as follows (relfilenodes changed):

martijn@martijn-jessie:~/git/ctm/docker$ sudo 
/usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/ 
00010001 |grep -wE '16386|16384|16390'
rmgr: Storage len (rec/tot): 16/48, tx:  0, lsn: 
0/016A79C8, prev 0/016A79A0, bkp: , desc: file create: base/12139/16384
rmgr: Sequencelen (rec/tot):158/   190, tx:683, lsn: 
0/016B4258, prev 0/016B2508, bkp: , desc: log: rel 1663/12139/16384
rmgr: Storage len (rec/tot): 16/48, tx:683, lsn: 
0/016B4318, prev 0/016B4258, bkp: , desc: file create: base/12139/16386
rmgr: Storage len (rec/tot): 16/48, tx:683, lsn: 
0/016B9468, prev 0/016B9418, bkp: , desc: file create: base/12139/16390
rmgr: Sequencelen (rec/tot):158/   190, tx:683, lsn: 
0/016BC938, prev 0/016BC880, bkp: , desc: log: rel 1663/12139/16384
rmgr: Sequencelen (rec/tot):158/   190, tx:683, lsn: 
0/016BCAF0, prev 0/016BCAA0, bkp: , desc: log: rel 1663/12139/16384
rmgr: Heaplen (rec/tot): 35/67, tx:683, lsn: 
0/016BCBB0, prev 0/016BCAF0, bkp: , desc: insert(init): rel 
1663/12139/16386; tid 0/1
rmgr: Btree   len (rec/tot): 20/52, tx:683, lsn: 
0/016BCBF8, prev 0/016BCBB0, bkp: , desc: newroot: rel 1663/12139/16390; 
root 1 lev 0
rmgr: Btree   len (rec/tot): 34/66, tx:683, lsn: 
0/016BCC30, prev 0/016BCBF8, bkp: , desc: insert: rel 1663/12139/16390; tid 
1/1
rmgr: Storage len (rec/tot): 16/48, tx:683, lsn: 
0/016BCC78, prev 0/016BCC30, bkp: , desc: file truncate: base/12139/16386 
to 0 blocks
rmgr: Storage len (rec/tot): 16/48, tx:683, lsn: 
0/016BCCA8, prev 0/016BCC78, bkp: , desc: file truncate: base/12139/16390 
to 0 blocks
rmgr: Heaplen (rec/tot): 35/67, tx:683, lsn: 
0/016BCCD8, prev 0/016BCCA8, bkp: , desc: insert(init): rel 
1663/12139/16386; tid 0/1
rmgr: Btree   len (rec/tot): 20/52, tx:683, lsn: 
0/016BCD20, prev 0/016BCCD8, bkp: , desc: newroot: rel 1663/12139/16390; 
root 1 lev 0
rmgr: Btree   len (rec/tot): 34/66, tx:683, lsn: 
0/016BCD58, prev 0/016BCD20, bkp: , desc: insert: rel 1663/12139/16390; tid 
1/1
 
   relname   | relfilenode 
-+-
 test|   16386
 test_id_seq |   16384
 test_pkey   |   16390
(3 rows)

And amazingly, the database cluster successfuly recovers and there's no
error now.  So the problem is *only* because there is no data in the
table at commit time.  Which indicates that it's the 'newroot record
that saves the day normally.  And it's apparently generated by the
first insert.

 Agreed. I think the problem is something else though. Namely that we
 reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
 broken afaics. We need to allocate a new relfilenode and write stuff
 into that. Then we can forgo WAL logging the truncation record.

Would that properly initialise the index though?

Anyway, this is way outside my expertise, so I'll bow out now. Let me
know if I can be of more assistance.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

Andreas Karlsson andr...@proxel.se writes:
 On 03/21/2015 01:19 PM, Julien Tachoires wrote:
 I am confused by your fix. Wouldn't cleaner fix be to use
 tbinfo-reltablespace rather than tbinfo-reltoasttablespace when
 calling ArchiveEntry()?

 Yes, doing this that way is cleaner. Here is a new version including
 your fix. Thanks.

 I am now satisfied with how the patch looks. Thanks for your work. I 
 will mark this as ready for committeer now but not expect any committer 
 to look at it until the commitfest starts.

I have just looked through this thread, and TBH I think we should reject
this patch altogether --- not RWF, but no we don't want this.  The
use-case remains hypothetical: no performance numbers showing a real-world
benefit have been exhibited AFAICS.  And allowing a toast table to be in a
different tablespace from its parent opens up a host of corner cases that,
despite the many revisions of the patch so far, probably haven't all been
addressed yet.  (For instance, I wonder whether pg_upgrade copes with
toast tables in non-parent tablespaces.)

I regret the idea of wasting all the work that's been poured into this,
but I think pushing this patch forward will just waste even more time,
now and in the future, for benefit that will be at most marginal.

If any committers nonetheless want to take this up, be advised that it's
far from committable as-is.  Here are some notes just from looking at
the pg_dump part of the patch:

* Addition in pg_backup_archiver.c seems pretty dubious; we don't
handle --no-tablespace that way for other cases.

* Why is getTables() collecting toast tablespace only from latest-model
servers?  This will likely lead to doing the wrong thing (ie, dumping
incorrect ALTER SET TOAST TABLESPACE pg_default commands) when dumping
from an older server.

* dumpTOASTTablespace (man, that's an ugly choice of name ... camel
case combined with all-upper-case-words is a bad idea) neglects the
possibility that it needs to quote the tablespace name.

* Assorted random whitespace inconsistencies, only some of which would
be cleaned up by pgindent.

I've not studied the rest of the patch, but it would clearly need to
be gone over very carefully to get to a committable state.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?



power 1,5 is almost certainly not right for all cases, but it is simple 
and better.


It is better in some cases, as I've been told on my patch. If you have a 
separate disk for WAL writes the power formula may just degrade 
performance, or maybe not, or not too much, or it really should be a guc.


Well, I just think that it needs more performance testing with various 
loads and sizes, really. I'm not against this patch at all.



And easy to remove if something even better arrives.

I don't see the two patches being in conflict.


They are not in conflict from a git point of view, or even so it would 
be trivial to solve.


They are in conflict as the patch changes the checkpoint load 
significantly, which would mean that my X00 hours of performance testing 
on the checkpoint scheduler should more or less be run again. Ok, it is 
somehow egoistic, but I'm trying to avoid wasting people time.


Another point is that I'm not sure I understand the decision process: for 
some patch in some area extensive performance tests are required, and for 
other patches in the same area they would not be.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench


On 07/03/2015 07:50 AM, Fabien COELHO wrote:

This doesn't behave correctly if you set -j to greater than -c, and also use
the rate limit option:


Indeed. v3 attached fixed the case when nthreads  nclients.


Ok, committed. Thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?



Hello Andres,


In conclusion, and very egoistically, I would prefer if this patch could
wait for the checkpoint scheduling patch to be considered, as it would
basically invalidate the X00 hours of performance tests I ran:-)


These two patches target pretty independent mechanics. If you patch were
significantly influenced by this something would be wrong. It might
decrease the benefit of your patch a mite, but that's not really a
problem.


That is not the issue I see. On the principle of performance testing it 
really means that I should rerun the tests, even if I expect that the 
overall influence would be pretty small in this case. This is my egoistic 
argument. Well, probably I would just rerun a few cases to check that the 
impact is mite, as you said, not all cases.


Another point is that I'm not sure that this patch is ripe, in particular 
I'm skeptical about the hardcoded 1.5 without further testing. Maybe it is 
good, maybe 1.3 or 1.6 is better, maybe it depends and it should just be a 
guc with some advises about how to set it. So I really think that it needs 
more performance figures than it has a positive effect on one load.


Well, this is just my opinion, no need to care too much about it:-)

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

On Fri, Jul 3, 2015 at 5:56 AM, Heikki Linnakangas wrote:
 On 04/10/2015 01:46 PM, chenhj wrote:
 I was about to commit the attached, but when I tested this between my
 Windows 8.1 virtual machine and Linux host, I was not able to see any
 performance difference. It may be because the case is hobbled by other
 inefficiencies, in the virtualization or somewhere else, but I wonder if
 others can reproduce the speedup?

I just gave this a shot in a 8.1 VM, but I could not reproduce a
speedup of more than a couple of percents (sorry no servers
available), still this seemed to be some noise.

The approach taken by this patch sounds safe enough to me that it
should be applied. I mean, we know that on Win8/2k12 the default
socket buffer size is 64k so reducing it to 32k would hurt
performance.

By the way, perhaps the link mentioned in this code should be updated
to this one, visibly microsoft has changed an bit the URL shape:
https://support.microsoft.com/en-us/kb/823764
A refresh would not hurt.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix pgbench --progress report under (very) low rate


On 06/15/2015 09:12 PM, Fabien COELHO wrote:



v3 rebase (after pgbench moved to src/bin) and minor style tweaking.


v4 adds a fix to another progress timing issue:

Currently if pgbench/postgres get stuck somewhere, the report catches up
by repeating progresses several time in a row, which looks like that:

progress: 10.0 s ...
progress: 11.0 s ... stuck...
progress: 14.2 s catchup for 11.0 - 14.2
progress: 14.2 s stupid data
progress: 14.2 s stupid data
progress: 15.0 s ...
progress: 16.0 s ...

The correction removes the stupid data lines which compute a reports on
a very short time, including absurd tps figures.

Yet again, shame on me in the first place for this behavior.


Thanks, applied. I chose to also backpatch this, although arguably this 
is a change in behaviour that would not be good to change in a minor 
version. However, progress reports are a very user-facing feature, it's 
not going to break anyone's scripts.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

2015-07-03 Thread David Rowley

On 3 July 2015 at 20:06, Andres Freund and...@anarazel.de wrote:

 On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote:
  On 04/10/2015 01:46 PM, chenhj wrote:
  Result(execute time):
  default(8K),  7.370s
  set SO_SNDBUF to 32K, 4.159s(the current implement)
  set SO_SNDBUF to 64K, 2.875s
  set SO_SNDBUF to 128K,  1.593s
  set SO_SNDBUF to 256K,  1.324s
 
  I was about to commit the attached, but when I tested this between my
  Windows 8.1 virtual machine and Linux host, I was not able to see any
  performance difference. It may be because the case is hobbled by other
  inefficiencies, in the virtualization or somewhere else, but I wonder if
  others can reproduce the speedup?

 Given that too small sockets incur significantly smaller latency bumps
 in a virtualized environment than in a real network, and hit another set
 of buffers inside the virtualization technology,, I'm not particularly
 surprised by that.


I'm wondering what the original test setup was. I'm assuming psql and
postgres both running on separate windows machines?

I've tested the patch just connecting to a database running on localhost
and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I
don't have enough hardware here to have client and server on separate
machines, at least not with a stable network that goes through copper.

Here's the results.

Unpatched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c select
'1'::char(1000),generate_series(1,10)  $null }

TotalMilliseconds : 1997.3908
TotalMilliseconds : 2111.4119
TotalMilliseconds : 2040.4415
TotalMilliseconds : 2167.5532
TotalMilliseconds : 2087.6444
TotalMilliseconds : 2117.3759
TotalMilliseconds : 2100.3229
TotalMilliseconds : 2132.3522
TotalMilliseconds : 2129.9487
TotalMilliseconds : 2101.675

Median: 2106.54345
Average: 2098.61165

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c select
'1'::char(1000),generate_series(1,50)  $null }

TotalMilliseconds : 10344.4251
TotalMilliseconds : 10248.3671
TotalMilliseconds : 10370.3856
TotalMilliseconds : 10412.507
TotalMilliseconds : 10469.173
TotalMilliseconds : 10248.8889
TotalMilliseconds : 10331.9476
TotalMilliseconds : 10320.7841
TotalMilliseconds : 10470.3022
TotalMilliseconds : 10333.4203

Median: 10338.9227
Average: 10355.02009


Patched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c select
'1'::char(1000),generate_series(1,10)  $null }

TotalMilliseconds : 2066.3701
TotalMilliseconds : 2106.6628
TotalMilliseconds : 2110.2459
TotalMilliseconds : 2047.8337
TotalMilliseconds : 2081.9166
TotalMilliseconds : 2034.7086
TotalMilliseconds : 2082.9072
TotalMilliseconds : 2146.6878
TotalMilliseconds : 2133.351
TotalMilliseconds : 2076.6862

Median: 2082.4119
Average: 2088.73699

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c select
'1'::char(1000),generate_series(1,50)  $null }

TotalMilliseconds : 10217.4794
TotalMilliseconds : 10244.8074
TotalMilliseconds : 10451.7265
TotalMilliseconds : 10162.9862
TotalMilliseconds : 10304.1866
TotalMilliseconds : 10374.7922
TotalMilliseconds : 10227.9632
TotalMilliseconds : 10145.5825
TotalMilliseconds : 10298.7048
TotalMilliseconds : 10170.3754

Median: 10236.3853
Average: 10259.86042


Comparison (Unpatched / Patched)
100MB 500MB
Median 101.16% 101.00%
Average 100.47% 100.93%

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] copy.c handling for RLS is insecure

2015-07-03 Thread Noah Misch

On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote:
   Alright, I've done the change to use the RangeVar from CopyStmt, but
   also added a check wherein we verify that the relation's OID returned
   from the planned query is the same as the relation's OID that we did the
   RLS check on- if they're different, we throw an error.  Please let me
   know if there are any remaining concerns.

Here is the check in question (added in commit 143b39c):

plan = planner(query, 0, NULL);

/*
 * If we were passed in a relid, make sure we got the same one 
back
 * after planning out the query.  It's possible that it changed
 * between when we checked the policies on the table and 
decided to
 * use a query and now.
 */
if (queryRelId != InvalidOid)
{
Oid relid = 
linitial_oid(plan-relationOids);

/*
 * There should only be one relationOid in this case, 
since we
 * will only get here when we have changed the command 
for the
 * user from a COPY relation TO to COPY (SELECT * 
FROM
 * relation) TO, to allow row level security policies 
to be
 * applied.
 */
Assert(list_length(plan-relationOids) == 1);

if (relid != queryRelId)
ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg(relation referenced by COPY statement 
has changed)));
}

  That's clearly an improvement, but I'm not sure it's water-tight.
  What if the name that originally referenced a table ended up
  referencing a view?  Then you could get
  list_length(plan-relationOids) != 1.
 
 I'll test it out and see what happens.  Certainly a good question and
 if there's an issue there then I'll get it addressed.

Yes, it can be made to reference a view and trip the assertion.

  (And, in that case, I also wonder if you could get
  eval_const_expressions() to do evil things on your behalf while
  planning.)
 
 If it can be made to reference a view then there's an issue as the view
 might include a function call itself which is provided by the attacker..

Indeed.  As the parenthetical remark supposed, the check happens too late to
prevent a security breach.  planner() has run eval_const_expressions(),
executing code of the view owner's choosing.

 Clearly, if we found a relation originally then we need that same
 relation with the same OID after the conversion to a query.

That is necessary but not sufficient.  CREATE RULE can convert a table to a
view without changing the OID, thereby fooling the check.  Test procedure:

-- as superuser (or createrole)
create user blackhat;
create user alice;

-- as blackhat
begin;
create table exploit_rls_copy (c int);
alter table exploit_rls_copy enable row level security;
grant select on exploit_rls_copy to public;
commit;

-- as alice
-- first, set breakpoint on BeginCopy
copy exploit_rls_copy to stdout;

-- as blackhat
begin;
create or replace function leak() returns int immutable as $$begin
raise notice 'in leak()'; return 7; end$$ language plpgsql;
create rule _RETURN as on select to exploit_rls_copy do instead
select leak() as c from (values (0)) dummy;
commit;

-- Release breakpoint.  leak() function call happens.  After that, assertion
-- fires if enabled.  ERROR does not fire in any case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-03 Thread Beena Emerson

Hello,

This has been registered in the next 2015-09 CF since majority are in favor
of adding this multiple sync replication feature (with quorum/priority).

New patch will be submitted once we have reached a consensus on the design.

--
Beena Emerson

Re: [HACKERS] Determine operator from it's function


On 07/03/2015 01:20 AM, Jim Nasby wrote:

Is there a way to determine the operator that resulted in calling the
operator function? I thought fcinfo-flinfo-fn_expr might get set to
the OpExpr, but seems it can be a FuncExpr even when called via an
operator...


Don't think there is. Why do you need to know?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote:
 On 04/10/2015 01:46 PM, chenhj wrote:
 Result(execute time):
 default(8K),  7.370s
 set SO_SNDBUF to 32K, 4.159s(the current implement)
 set SO_SNDBUF to 64K, 2.875s
 set SO_SNDBUF to 128K,  1.593s
 set SO_SNDBUF to 256K,  1.324s
 
 I was about to commit the attached, but when I tested this between my
 Windows 8.1 virtual machine and Linux host, I was not able to see any
 performance difference. It may be because the case is hobbled by other
 inefficiencies, in the virtualization or somewhere else, but I wonder if
 others can reproduce the speedup?

Given that too small sockets incur significantly smaller latency bumps
in a virtualized environment than in a real network, and hit another set
of buffers inside the virtualization technology,, I'm not particularly
surprised by that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Resource Owner reassign Locks

On 2015-06-07 13:44:08 -0700, Jeff Janes wrote:
 I'd like to advocate for back-patching this to 9.0, 9.1, and 9.2.  It has
 run without problems for a while now, and it can be considered a bug that
 systems with a very large number of objects cannot be upgraded in a
 reasonable time.

In that case, how about working on a version for = 9.2 (single one
should suffice)? This will likely include a bunch of wrapper functions
to avoid changing the API in the back branches.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench




Indeed. v3 attached fixed the case when nthreads  nclients.


Ok, committed. Thanks!


Thanks!

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-03 Thread Sawada Masahiko

On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:


 Also, the flags of each heap page header might be set PD_ALL_FROZEN,
 as well as all-visible


 Is it possible to have VM bits set to frozen but not visible?

 The description makes those two states sound independent of each other.

 Are they? Or not? Do we test for an impossible state?


It's impossible to have VM bits set to frozen but not visible.
These bit are controlled independently. But eventually, when
all-frozen bit is set, all-visible is also set.

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: pgbench - remove thread fork-emulation


On 04/28/2015 02:18 AM, Fabien COELHO wrote:

This patch removes the pgbench thread fork-emulation code and simplifies
things where possible, especially around pthread_create and pthread_join.
The stats collection for the report is done directly instead of using an
intermediate structure.

As a result, if no thread implementation is available, pgbench is
restricted to work with only the main thread (ie pgbench -j 1 ...).


== Rational ==

Pgbench currently provides a thread emulation through process forks. This
feature was developed way back when it may have been common that some
platforms were not supporting threads. This is now very rare (can you name
one such platform?).

However, the thread fork-emulation feature has drawbacks: Namely,
processes are not threads, the memory is not shared (sure), so it hinders
simple implementation for some features, or results in not providing these
features with fork-emulation, or having a different behavior under
fork-emulation:

Latency collection (-r) is not supported with fork emulation.

Progress (-P) is reported differently with fork emulation.

For a new feature under discussion, which consist in allowing one log
instead of per-thread logs, supporting fork-emulation requires a (heavy)
post-processing external sort phase whereas with actual threads all
threads can share and append to the same log file with limited overhead,
which is significantly simpler.


I agree with all that, it's time to let the fork-emulation mode to go. 
Committed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote:
 On 7/2/2015 5:16 PM, Dave Page wrote:

 The PostgreSQL Global Development Group announces that the alpha
 release of PostgreSQL 9.5, the latest version of the world's leading
 open source database, is available today.  This release contains
 previews of all of the features which will be available in the final
 release of version 9.5, although some details will change before then.
 Please download, test, and report what you find.

 Help Test for Bugs
 --




 building on cygwin and

 $ perl --version

 This is perl 5, version 22, subversion 0 (v5.22.0) built for
 cygwin-thread-multi-64int

 build fail here, anyone seeing the same on other platforms ?

 make -C hstore_plperl all
 make[1]: Entering directory
 '/cygdrive/e/cyg_pub/devel/postgresql/prova/postgres
 ql-9.5alpha1-1.i686/build/contrib/hstore_plperl'
 gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
 -We   ndif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv
 -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio
 n
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5a
 lpha1/src/pl/plperl
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore -I.
 -I/pub/devel/postgresql/prova/postgres
 ql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl
 -I../../src/i   nclude
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-
 9.5alpha1/src/include -I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE
 -c -o hstore_plperl.o
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/p
 ostgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c
 In file included from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0:
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1
 /contrib/hstore/hstore.h:79:0: warning: HS_KEY redefined
  #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)]))
  ^
 In file included from
 /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h:
 3730:0,
  from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48,
  from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4:
 /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: this
 is t   he location of the previous definition
  #  define HS_KEY(setxsubfn, popmark, apiver, xsver) \
  ^
 gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
 -We   ndif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv
 -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio
 n -shared -o hstore_plperl.dll -Wl,--out-implib=libhstore_plperl.a
 hstore_plpe   rl.o  -L../../src/port
 -L../../src/common -Wl,-no-undefined -Wl,--allow-multiple
 -definition -Wl,--enable-auto-import  -L/usr/local/lib -Wl,--as-needed
 -L../..   /src/backend -lpostgres -lpgcommon
 -lpgport -lintl -lssl -lcrypto -lz -lreadline-lcrypt
 -lldap
 hstore_plperl.o: In function `hstore_to_plperl':
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1
 /contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to
 `hstoreUpgrade   '

Hm. dangomushi is using perl 5.22 and the build does not fail:



-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Martijn van Oosterhout

On Fri, Jul 03, 2015 at 02:34:44PM +0900, Fujii Masao wrote:
  Hmm, for me it is 100% reproducable. Are you familiar with Docker? I
  can probably construct a Dockerfile that reproduces it pretty reliably.
 
 I could reproduce the problem in the master branch by doing
 the following steps.

Thank you, I wasn't sure if you could kill the server fast enough
without containers, but it looks like immediate mode is enough.

 1. start the PostgreSQL server with wal_level = minimal
 2. execute the following SQL statements
  begin;
  create table test(id serial primary key);
  truncate table test;
  commit;
 3. shutdown the server with immediate mode
 4. restart the server (crash recovery occurs)
 5. execute the following SQL statement
 select * from test;
 
 The optimization of TRUNCATE opereation that we can use when
 CREATE TABLE and TRUNCATE are executed in the same transaction block
 seems to cause the problem. In this case, only index file truncation is
 logged, and index creation in btbuild() is not logged because wal_level
 is minimal. Then at the subsequent crash recovery, index file is truncated
 to 0 byte... Very simple fix is to log an index creation in that case,
 but not sure if that's ok to do..

Looks plausible to me.

For reference I attach a small tarball for reproduction with docker.

1. Unpack tarball into empty dir (it has three small files)
2. docker build -t test .
3. docker run -v /tmp/pgtest:/data test
4. docker run -v /tmp/pgtest:/data test

Data dir is in /tmp/pgtest

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


postgresql-test.tgz
Description: GNU Unix tar archive


signature.asc
Description: Digital signature

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote:
 On 7/2/2015 5:16 PM, Dave Page wrote:

 The PostgreSQL Global Development Group announces that the alpha
 release of PostgreSQL 9.5, the latest version of the world's leading
 open source database, is available today.  This release contains
 previews of all of the features which will be available in the final
 release of version 9.5, although some details will change before then.
 Please download, test, and report what you find.

 Help Test for Bugs
 --




 building on cygwin and

 $ perl --version

 This is perl 5, version 22, subversion 0 (v5.22.0) built for
 cygwin-thread-multi-64int

 build fail here, anyone seeing the same on other platforms ?

 make -C hstore_plperl all
 make[1]: Entering directory
 '/cygdrive/e/cyg_pub/devel/postgresql/prova/postgres
 ql-9.5alpha1-1.i686/build/contrib/hstore_plperl'
 gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
 -We   ndif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv
 -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio
 n
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5a
 lpha1/src/pl/plperl
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore -I.
 -I/pub/devel/postgresql/prova/postgres
 ql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl
 -I../../src/i   nclude
 -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-
 9.5alpha1/src/include -I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE
 -c -o hstore_plperl.o
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/p
 ostgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c
 In file included from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0:
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1
 /contrib/hstore/hstore.h:79:0: warning: HS_KEY redefined
  #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)]))
  ^
 In file included from
 /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h:
 3730:0,
  from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48,
  from
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr
 c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4:
 /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: this
 is t   he location of the previous definition
  #  define HS_KEY(setxsubfn, popmark, apiver, xsver) \
  ^
 gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
 -We   ndif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv
 -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio
 n -shared -o hstore_plperl.dll -Wl,--out-implib=libhstore_plperl.a
 hstore_plpe   rl.o  -L../../src/port
 -L../../src/common -Wl,-no-undefined -Wl,--allow-multiple
 -definition -Wl,--enable-auto-import  -L/usr/local/lib -Wl,--as-needed
 -L../..   /src/backend -lpostgres -lpgcommon
 -lpgport -lintl -lssl -lcrypto -lz -lreadline-lcrypt
 -lldap
 hstore_plperl.o: In function `hstore_to_plperl':
 /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1
 /contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to
 `hstoreUpgrade   '

So... dangomushi is able to build it at least. Here are the logs to
the last build for example:
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dangomushidt=2015-07-02%2019%3A19%3A39stg=make-contrib
What is the name of the library generated in hstore? Perhaps there is
a mismatch here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-07-03 Thread Kouhei Kaigai

On 2015/07/02 23:13, Kouhei Kaigai wrote:
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2]; (1) to
place that API call *after* the set_cheapest call and (2) to place
another set_cheapest call after that API call for each joinrel. By the
first set_cheapest call, I think we could probably save an alternative
path that we need in cheapest_builtin_path. By the second
set_cheapest call following that API call, we could consider
foreign/custom-join-pushdown paths also. What do you think about this
idea?

Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.

Maybe I'm missing something, but my image about this approach is that if
base relations for a given joinrel are all foreign tables and belong to
the same foreign server, then by calling that API there, we consider the
remote join over all the foreign tables, and that if not, we give up to
consider the remote join.

Your understanding is correct, but missing a point. Once foreign tables
to be joined are informed as a bitmap (joinrel-relids), it is not obvious
for extensions which relations are joined with INNER JOIN, and which ones
are joined with OUTER JOIN.
I tried to implement a common utility function under the v9.5 cycle,
however, it was suspicious whether we can make a reliable logic.

Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause.

http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=x...@mail.gmail.com
http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f8010f2...@bpxm15gp.gisp.nec.co.jp

Once we have to pay attention to the case of local/foreign relations
mixed, we have to care about the path of underlying local or foreign
relations managed by other foreign server.

I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.

My idea is that we save the cheapest_total_path of RelOptInfo onto the
new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
Why? It should be a built-in join logic, never be a foreign/custom-join,
because of the hook location; only built-in logic shall be added here.

My concern about your idea is that since that (a) add_paths_to_joinrel
is called multiple times per joinrel and that (b) repetitive add_path
calls through GetForeignJoinPaths in add_paths_to_joinrel might remove
old paths that are builtin, it's possible to save a path that is not
builtin onto the cheapest_total_path and thus to save that path wrongly
onto the cheapest_builtin_path. There might be a good way to cope with
that, though.

For the concern (a), FDW driver can reference RelOptInfo-fdw_private
that shall be initialized to NULL, then FDW driver will set valid data
if it preliminary adds something. IIRC, postgres_fdw also skips to
add same path multiple times.

For the concern (b), yep, we may enhance add_path() to retain built-in
path, instead of the add_paths_to_joinrel().
Let's adjust the logic a bit. The add_path() can know whether the given
path is usual or exceptional (ForeignPath/CustomPath towards none base
relation) one. If path is exceptional, the cheapest_builtin_path shall
be retained unconditionally. Elsewhere, the cheapest one replace here,
then the cheapest built-in path will survive.

Is it still problematic?

Regarding to the development timeline, I prefer to put something
workaround not to kick Assert() on ExecScanFetch().
We may add a warning in the documentation not to replace built-in
join if either/both of sub-trees are target of UPDATE/DELETE or
FOR SHARE/UPDATE.

I'm not sure that that is a good idea, but anyway, I think we need to
hurry fixing this issue.

My approach is not fix, but avoid. :-)

It may be an idea to implement the above fixup even though it may be
too large/late to apply v9.5 features, but we can understand how many
changes are needed to fixup this problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory leak fixes for pg_dump, pg_dumpall, initdb and pg_upgrade

On Fri, Jul 3, 2015 at 3:14 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 I committed some of these that seemed like improvements on readability
 grounds, but please just mark the rest as ignore in coverity.

Done. Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-07-03 Thread Simon Riggs

On 3 July 2015 at 06:38, Fabien COELHO coe...@cri.ensmp.fr wrote:


 Hello Simon,

  We could do better, but that is not a reason not to commit this, as is.
 Commit, please.


 My 0,02€: Please do not commit without further testing...

 I've submitted a patch to improve checkpoint write scheduling, including
 X00 hours of performance test on various cases. This patch changes
 significantly the load distribution over the whole checkpoint, and AFAICS
 has been tested on rather small cases.

 I'm not sure that the power 1.5 is the right one for all cases. For a big
 checkpoint over 30 minutes, it may have, or not, very large and possibly
 unwanted effects. Maybe the 1.5 factor should really be a guc. Well, what I
 really think is that it needs performance measures.


power 1,5 is almost certainly not right for all cases, but it is simple and
better. And easy to remove if something even better arrives.

I don't see the two patches being in conflict.


 In conclusion, and very egoistically, I would prefer if this patch could
 wait for the checkpoint scheduling patch to be considered, as it would
 basically invalidate the X00 hours of performance tests I ran:-)


 I recommend making peace with yourself that probably 50% of development
time is wasted. But we try to keep the best half.

Thank you for your time spent contributing.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

On 2015-07-03 07:38:15 +0200, Fabien COELHO wrote:
 I've submitted a patch to improve checkpoint write scheduling, including X00
 hours of performance test on various cases. This patch changes significantly
 the load distribution over the whole checkpoint, and AFAICS has been tested
 on rather small cases.
 
 I'm not sure that the power 1.5 is the right one for all cases. For a big
 checkpoint over 30 minutes, it may have, or not, very large and possibly
 unwanted effects. Maybe the 1.5 factor should really be a guc. Well, what I
 really think is that it needs performance measures.
 
 In conclusion, and very egoistically, I would prefer if this patch could
 wait for the checkpoint scheduling patch to be considered, as it would
 basically invalidate the X00 hours of performance tests I ran:-)

These two patches target pretty independent mechanics. If you patch were
significantly influenced by this something would be wrong. It might
decrease the benefit of your patch a mite, but that's not really a
problem.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko

On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus j...@agliodbs.com wrote:
 On 07/02/2015 12:44 PM, Andres Freund wrote:
 On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
 So there's two parts to this:

 1. I need to ensure that data is replicated to X places.

 2. I need to *know* which places data was synchronously replicated to
 when the master goes down.

 My entire point is that (1) alone is useless unless you also have (2).

 I think there's a good set of usecases where that's really not the case.

 Please share!  My plea for usecases was sincere.  I can't think of any.

 And do note that I'm talking about information on the replica, not on
 the master, since in any failure situation we don't have the old
 master around to check.

 How would you, even theoretically, synchronize that knowledge to all the
 replicas? Even when they're temporarily disconnected?

 You can't, which is why what we need to know is when the replica thinks
 it was last synced from the replica side.  That is, a sync timestamp and
 lsn from the last time the replica ack'd a sync commit back to the
 master successfully.  Based on that information, I can make an informed
 decision, even if I'm down to one replica.

 ... because we would know definitively which servers were in sync.  So
 maybe that's the use case we should be supporting?

 If you want automated failover you need a leader election amongst the
 surviving nodes. The replay position is all they need to elect the node
 that's furthest ahead, and that information exists today.

 I can do that already.  If quorum synch commit doesn't help us minimize
 data loss any better than async replication or the current 1-redundant,
 why would we want it?  If it does help us minimize data loss, how?

 In your example of 2 : { local_replica, london_server, nyc_server },
 if there is not something like quorum commit, only local_replica is synch
 and the other two are async. In this case, if the local data center gets
 destroyed, you need to promote either london_server or nyc_server. But
 since they are async, they might not have the data which have been already
 committed in the master. So data loss! Of course, as I said yesterday,
 they might have all the data and no data loss happens at the promotion.
 But the point is that there is no guarantee that no data loss happens.
 OTOH, if we use quorum commit, we can guarantee that either london_server
 or nyc_server has all the data which have been committed in the master.

 So I think that quorum commit is helpful for minimizing the data loss.


Yeah, quorum commit is helpful for minimizing data loss in comparison
with today replication.
But in this your case, how can we know which server we should use as
the next master server, after local data center got down?
If we choose a wrong one, we would get the data loss.

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-03 Thread Syed, Rahila

Hello,

TBH, I think that designing this as a hook-based solution is adding a whole 
lot of complexity for no value.  The hard parts of the problem are collecting 
the raw data and making the results visible to users, and both of those 
require involvement of the core code.  Where is the benefit from pushing some 
trivial intermediate arithmetic into an external module?
If there's any at all, it's certainly not enough to justify problems such as 
you mention here.

So I'd just create a pgstat_report_percent_done() type of interface in 
pgstat.c and then teach VACUUM to call it directly.

Thank you for suggestion. I agree that adding code in core will reduce code 
complexity with no additional overhead. 
Going by the consensus, I will update the patch with code to collect and store 
progress information from vacuum in pgstat.c and
UI using pg_stat_activity view.

Thank you,
Rahila Syed

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus j...@agliodbs.com wrote:
 On 07/02/2015 12:44 PM, Andres Freund wrote:
 On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
 So there's two parts to this:

 1. I need to ensure that data is replicated to X places.

 2. I need to *know* which places data was synchronously replicated to
 when the master goes down.

 My entire point is that (1) alone is useless unless you also have (2).

 I think there's a good set of usecases where that's really not the case.

 Please share!  My plea for usecases was sincere.  I can't think of any.

 And do note that I'm talking about information on the replica, not on
 the master, since in any failure situation we don't have the old
 master around to check.

 How would you, even theoretically, synchronize that knowledge to all the
 replicas? Even when they're temporarily disconnected?

 You can't, which is why what we need to know is when the replica thinks
 it was last synced from the replica side.  That is, a sync timestamp and
 lsn from the last time the replica ack'd a sync commit back to the
 master successfully.  Based on that information, I can make an informed
 decision, even if I'm down to one replica.

 ... because we would know definitively which servers were in sync.  So
 maybe that's the use case we should be supporting?

 If you want automated failover you need a leader election amongst the
 surviving nodes. The replay position is all they need to elect the node
 that's furthest ahead, and that information exists today.

 I can do that already.  If quorum synch commit doesn't help us minimize
 data loss any better than async replication or the current 1-redundant,
 why would we want it?  If it does help us minimize data loss, how?

 In your example of 2 : { local_replica, london_server, nyc_server },
 if there is not something like quorum commit, only local_replica is synch
 and the other two are async. In this case, if the local data center gets
 destroyed, you need to promote either london_server or nyc_server. But
 since they are async, they might not have the data which have been already
 committed in the master. So data loss! Of course, as I said yesterday,
 they might have all the data and no data loss happens at the promotion.
 But the point is that there is no guarantee that no data loss happens.
 OTOH, if we use quorum commit, we can guarantee that either london_server
 or nyc_server has all the data which have been committed in the master.

 So I think that quorum commit is helpful for minimizing the data loss.


 Yeah, quorum commit is helpful for minimizing data loss in comparison
 with today replication.
 But in this your case, how can we know which server we should use as
 the next master server, after local data center got down?
 If we choose a wrong one, we would get the data loss.

Check the progress of each server, e.g., by using
pg_last_xlog_replay_location(),
and choose the server which is ahead of as new master.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko

On Fri, Jul 3, 2015 at 6:23 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Yeah, quorum commit is helpful for minimizing data loss in comparison
 with today replication.
 But in this your case, how can we know which server we should use as
 the next master server, after local data center got down?
 If we choose a wrong one, we would get the data loss.

 Check the progress of each server, e.g., by using
 pg_last_xlog_replay_location(),
 and choose the server which is ahead of as new master.


Thanks. So we can choice the next master server using by checking the
progress of each server, if hot standby is enabled.
And a such procedure is needed even today replication.

I think that the #2 problem which is Josh pointed out seems to be solved;
1. I need to ensure that data is replicated to X places.
2. I need to *know* which places data was synchronously replicated
to when the master goes down.
And we can address #1 problem using quorum commit.

Thought?

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Generalized JSON output functions


On 05/27/2015 09:51 PM, Andrew Dunstan wrote:


On 05/27/2015 02:37 PM, Robert Haas wrote:

On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr
oleksandr.shul...@zalando.de wrote:

Is it reasonable to add this patch to CommitFest now?

It's always reasonable to add a patch to the CommitFest if you would
like for it to be reviewed and avoid having it get forgotten about.
There seems to be some disagreement about whether we want this, but
don't let that stop you from adding it to the next CommitFest.


I'm not dead set against it either. When I have time I will take a
closer look.


Andrew, will you have the time to review this? Please add yourself as 
reviewer in the commitfest app if you do.


My 2 cents is that I agree with your initial reaction: This is a lot of 
infrastructure and generalizing things, for little benefit. Let's change 
the current code where we generate JSON to be consistent with 
whitespace, and call it a day.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

On 2015-07-03 13:50:02 +0300, Heikki Linnakangas wrote:
 As Tom pointed out, you need the full lexer to do this correctly. You can
 argue that something that handles the most common cases is enough, but
 realistically, by the time you've handled all the common cases correctly,
 you've just re-invented the lexer.

Yes.

 I think we should either bite the bullet and include the full SQL lexer in
 pgbench, or come up with some new syntax for marking the beginning and end
 of a statement.

I'm pretty clearly in favor of doing correct lexing. I think we should
generalize that and make it reusable. psql has it's own hacked up
version already, there seems little point in having variedly good copies
around.

 We could do something like bash here-documents or Postgres
 dollar-quoting, for example:
 
 \set ...
 select 1234; -- A statement on a single line, no change here
 
 -- Begin a multi-line statement
 \multi-line-statement END_TOKEN
 select *
   from complicated;
 END_TOKEN

Not pretty imo. I could see including something esimpler, in addition to
the lexer, to allow sending multiple statements in one go.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Beena Emerson

Sawada Masahiko wrote:

 I think that the #2 problem which is Josh pointed out seems to be solved;
 1. I need to ensure that data is replicated to X places.
 2. I need to *know* which places data was synchronously replicated
 to when the master goes down.
 And we can address #1 problem using quorum commit.
 
 Thought?

I agree. The knowledge of which servers where in sync(#2) would not actually
help us determine the new master and quorum solves #1.





-
Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856459.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-07-03 Thread Etsuro Fujita


On 2015/07/03 15:32, Kouhei Kaigai wrote:

On 2015/07/02 23:13, Kouhei Kaigai wrote:

To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5.  It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2];



Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.



Maybe I'm missing something, but my image about this approach is that if
base relations for a given joinrel are all foreign tables and belong to
the same foreign server, then by calling that API there, we consider the
remote join over all the foreign tables, and that if not, we give up to
consider the remote join.



Your understanding is correct, but missing a point. Once foreign tables
to be joined are informed as a bitmap (joinrel-relids), it is not obvious
for extensions which relations are joined with INNER JOIN, and which ones
are joined with OUTER JOIN.


Can't FDWs get the join information through the root, which I think we 
would pass to the API as the argument?



Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause.

http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=x...@mail.gmail.com
http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f8010f2...@bpxm15gp.gisp.nec.co.jp


Interesting!


I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.


I agree with you that it's important to ensure the expandability, and my 
question is, is it possible that the API call from standard_join_search 
also realize those idea if FDWs can get the join information through the 
root or something like that?


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts


On 06/21/2015 11:12 AM, Fabien COELHO wrote:


Hello Josh,


Add backslash continuations to pgbench custom scripts.
[...]
IMHO this approach is the best compromise.


I don't personally agree.  I believe that it it worth breaking backwards
compatibility to support line breaks in pgbench statements, and that if
we're not going to do that, supporting \ continuations is of little value.

As someone who actively uses pgbench to write custom benchmarks, I need
to write queries which I can test.  \ continuation does NOT work on the
psql command line, so that's useless for testing my queries; I still
have to reformat and troubleshoot.  If we added \ continuation, I
wouldn't use it.

I think we should support line breaks, and require semicolons for
end-of-statement.  Backwards-compatability in custom pgbench scripts is
not critical; pgbench scripts are neither used in produciton, nor used
in automated systems much that I know of.

I'm not clear on why we'd need a full SQL lexer.


Attached is a without lexer version which does ;-terminated SQL commands
and \-continuated meta commands (may be useful for \shell and long \set
expressions).


As Tom pointed out, you need the full lexer to do this correctly. You 
can argue that something that handles the most common cases is enough, 
but realistically, by the time you've handled all the common cases 
correctly, you've just re-invented the lexer.


The home-grown lexer is missing e.g. dollar-quoting support, so this is 
not be parsed correctly:


do $$
begin
  ...
end;
$$;

That would be very nice to handle correctly, I've used DO-blocks in 
pgbench scripts many times, and it's a pain to have to write them in a 
single line.


Also worth noting that you can currently test so-called multi-statements 
with pgbench, by putting multiple statements on a single line. Your 
patch seems to still do that, but if we went with a full-blown SQL 
lexer, they would probably be split into two statements.


I think we should either bite the bullet and include the full SQL lexer 
in pgbench, or come up with some new syntax for marking the beginning 
and end of a statement. We could do something like bash here-documents 
or Postgres dollar-quoting, for example:


\set ...
select 1234; -- A statement on a single line, no change here

-- Begin a multi-line statement
\multi-line-statement END_TOKEN
select *
  from complicated;
END_TOKEN

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement

2015-07-03 Thread Fabrízio de Royes Mello

On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 
http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com

 I like the idea of the feature a lot, but the proposal to which you
 refer here mentions some problems, which I'm curious how you think you
 might solve.  (I don't have any good ideas myself, beyond what I
 mentioned there.)


You're right, and we have another design (steps 1 and 2 was implemented
last year):


*** ALTER TABLE changes

1) ATController
1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c -
AlterTableGetLockLevel:3023)


2) Prepare to change relpersistence (src/backend/commands/tablecmds.c -
ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c -
ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c -
ATPrepChangePersistence:11102)


3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync
(MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if
exists)


4) Create a new fork called  TRANSIENT INIT FORK:

• from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called _initl
  ∘ insert XLog record to drop it if transaction abort

• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
  ∘ new forkName (src/common/relpath.c) called _initu
  ∘ insert XLog record to drop it if transaction abort


5) Change the relpersistence in catalog (pg_class-relpersistence) (heap,
toast, indexes)


6) Remove/Create INIT_FORK

• from Unlogged to Logged
  ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the
pendingDeletes queue
• from Logged to Unlogged
  ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
  ∘ create the INIT_FORK using heap_create_init_fork
  ∘ insert XLog record to drop init fork if the transaction abort



*** CRASH RECOVERY changes

1) During crash recovery
(src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)

• if the transient fork _initl exists then
  ∘ drop the transient fork _initl
  ∘ if the init fork doesn't exist then create it
  ∘ reset relation
• if the transient fork _initu exists then
  ∘ drop the transient fork _initl
  ∘ if the init fork exists then drop it
  ∘ don't reset the relation


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello

Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

On 2015-07-02 14:54:19 -0700, Josh Berkus wrote:
 On 07/02/2015 12:44 PM, Andres Freund wrote:
  On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
  So there's two parts to this:
 
  1. I need to ensure that data is replicated to X places.
 
  2. I need to *know* which places data was synchronously replicated to
  when the master goes down.
 
  My entire point is that (1) alone is useless unless you also have (2).
  
  I think there's a good set of usecases where that's really not the case.
 
 Please share!  My plea for usecases was sincere.  I can't think of any.

I have important data. I want to survive both a local hardware failure
(it's faster to continue using the local standby) and I want to protect
myself against actual disaster striking the primary datacenter. Pretty
common.

  And do note that I'm talking about information on the replica, not on
  the master, since in any failure situation we don't have the old
  master around to check.
  
  How would you, even theoretically, synchronize that knowledge to all the
  replicas? Even when they're temporarily disconnected?
 
 You can't, which is why what we need to know is when the replica thinks
 it was last synced from the replica side.  That is, a sync timestamp and
 lsn from the last time the replica ack'd a sync commit back to the
 master successfully.  Based on that information, I can make an informed
 decision, even if I'm down to one replica.

I think you're mashing together nearly unrelated topics.

Note that we already have the last replayed lsn, and we have the
timestamp of the last replayed transaction.

  If you want automated failover you need a leader election amongst the
  surviving nodes. The replay position is all they need to elect the node
  that's furthest ahead, and that information exists today.
 
 I can do that already.  If quorum synch commit doesn't help us minimize
 data loss any better than async replication or the current 1-redundant,
 why would we want it?  If it does help us minimize data loss, how?

But it does make us safer against data loss? If your app gets back the
commit you know that the data has made it both to the local replica and
one other datacenter. And you're now safe agains the loss of either the
master's hardware (most likely scenario) and safe against the loss of
the entire primary datacenter. That you need additional logic to know to
which other datacenter to fail over is just yet another piece (which you
*can* build today).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts



Hello Heikki,


I'm not clear on why we'd need a full SQL lexer.


Attached is a without lexer version which does ;-terminated SQL commands
and \-continuated meta commands (may be useful for \shell and long \set
expressions).


As Tom pointed out, you need the full lexer to do this correctly. You can 
argue that something that handles the most common cases is enough, but 
realistically, by the time you've handled all the common cases correctly, 
you've just re-invented the lexer.


Sure. I understand that part of Josh argument is that we are discussing 
pgbench test scripts here, not real full-blown applications, and these are 
expected to be quite basic, plain mostly SQL things.


The home-grown lexer is missing e.g. dollar-quoting support, so this is not 
be parsed correctly:


do $$
begin
 ...
end;
$$;


Hmmm, good one, if indeed you want to use PL/pgSQL or even any arbitrary 
language in a pgbench scripts... I would rather have created functions 
(once, outside of pgbench) and would call them from the script, so that 
would be a simple SELECT.


That would be very nice to handle correctly, I've used DO-blocks in pgbench 
scripts many times, and it's a pain to have to write them in a single line.


Yep. With some languages I'm not sure that it is even possible.

Also worth noting that you can currently test so-called multi-statements 
with pgbench, by putting multiple statements on a single line.


Yes indeed, behind the hood pgbench expects just one line, or you have to 
change significantly the way statements are handled, which is way beyond 
my initial intentions on this one, and this would mean quite a lot of 
changes for more or less corner cases.


Your patch seems to still do that, but if we went with a full-blown SQL 
lexer, they would probably be split into two statements.


I think we should either bite the bullet and include the full SQL lexer in 
pgbench, or come up with some new syntax for marking the beginning and end of 
a statement. We could do something like bash here-documents or Postgres 
dollar-quoting, for example:


\set ...
select 1234; -- A statement on a single line, no change here

-- Begin a multi-line statement
\multi-line-statement END_TOKEN
select *
 from complicated;
END_TOKEN


I do not like the aesthetic of the above much. I really liked the idea of 
simply writing SQL queries just as in psql.


So maybe just handling $$-quoting would be enough to handle reasonable 
use-cases without troubling pgbench internal working too much? That would 
be a simple local changes in the line reader.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assessing parallel-safety

2015-07-03 Thread Amit Kapila

On Thu, Jul 2, 2015 at 8:49 AM, Amit Kapila amit.kapil...@gmail.com wrote:

 On Thu, May 21, 2015 at 10:19 PM, Robert Haas robertmh...@gmail.com
wrote:
 
  On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown t...@linux.com wrote:
   Looks like one of the patches I applied is newer than the one in your
list:
  
   HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
   parallel-mode-v9.patch
   assess-parallel-safety-v4.patch
   parallel-heap-scan.patch
   parallel_seqscan_v11.patch
 
  This patch hasn't been updated for a while, so here is a rebased
  version now that we're hopefully mostly done making changes to
  pg_proc.h for 9.5.  parallel-mode-v10 was mostly committed, and
  parallel-heap-scan has now been incorporated into the parallel_seqscan
  patch.  So you should now be able to test this feature by applying
  just this patch, and the new version of the parallel seqscan patch
  which I am given to understand that Amit will be posting pretty soon.
 

 This patch again needs rebase.

Attached, find the rebased patch which can be used to review/test latest
version of parallel_seqscan patch which I am going to post in the parallel
seq. scan thread soonish.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


assess-parallel-safety-v6.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts