Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-12-04 Thread Rajkumar Raghuwanshi
On Tue, Dec 5, 2017 at 11:04 AM, Rajkumar Raghuwanshi
 wrote:
> On Mon, Dec 4, 2017 at 7:34 AM, Ashutosh Bapat
>  wrote:
>> I agree, the patch looks longer than expected. I think it's important
>> to have some testcases to test partition-wise join with default
>> partitions. I think we need at least one test for range default
>> partitions, one test for list partitioning, one for multi-level
>> partitioning, and one negative testcase with the default partition
>> missing from one side of the join.
>>
>> Maybe we could reduce the number of SQL commands and queries in the
>> patch by adding a default partition to every table that participates in
>> partition-wise join (leaving the tables participating in negative tests
>> aside). But that's going to increase the size of the EXPLAIN outputs and
>> query results. The negative test may simply drop the default partition
>> from one of the tables.
>>
>> For every table being tested, the patch adds two ALTER TABLE commands,
>> one to detach an existing partition and one to attach the same partition
>> as the default partition. The alternative is to just add a new default
>> partition without detaching an existing partition. But then the default
>> partition needs to be populated with some data, which requires at least
>> one INSERT statement. That doesn't reduce the size of the patch, but it
>> increases the query output and EXPLAIN plans.
>>
>> Maybe in the case of the multi-level partitioning test, we don't need to
>> add DEFAULT to every partitioned relation; adding it to one of them would
>> be enough. Maybe add it to the parent, but that too can be avoided. That
>> would reduce the size of the patch a bit.
>
> Thanks, Ashutosh, for the suggestions.
>
> I have reduced the test cases as suggested. Attaching the updated patch.
>
Sorry, I attached the wrong patch.

Attaching the correct patch now.
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27ab852..a60ba7f 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1337,6 +1337,96 @@ SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, ph
  574. | 574.5000 | 1148. | 0011 | 0011 | A0011
 (12 rows)
 
+-- test default partition behavior for range
+ALTER TABLE prt1 DETACH PARTITION prt1_p3;
+ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
+ALTER TABLE prt2 DETACH PARTITION prt2_p3;
+ALTER TABLE prt2 ATTACH PARTITION prt2_p3 DEFAULT;
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+QUERY PLAN
+--
+ Sort
+   Sort Key: t1.a
+   ->  Append
+ ->  Hash Join
+   Hash Cond: (t2.b = t1.a)
+   ->  Seq Scan on prt2_p1 t2
+   ->  Hash
+ ->  Seq Scan on prt1_p1 t1
+   Filter: (b = 0)
+ ->  Hash Join
+   Hash Cond: (t2_1.b = t1_1.a)
+   ->  Seq Scan on prt2_p2 t2_1
+   ->  Hash
+ ->  Seq Scan on prt1_p2 t1_1
+   Filter: (b = 0)
+ ->  Hash Join
+   Hash Cond: (t2_2.b = t1_2.a)
+   ->  Seq Scan on prt2_p3 t2_2
+   ->  Hash
+ ->  Seq Scan on prt1_p3 t1_2
+   Filter: (b = 0)
+(21 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   
+-+--+-+--
+   0 |  |   0 | 
+ 150 | 0150 | 150 | 0150
+ 300 | 0300 | 300 | 0300
+ 450 | 0450 | 450 | 0450
+(4 rows)
+
+-- test default partition behavior for list
+ALTER TABLE plt1 DETACH PARTITION plt1_p3;
+ALTER TABLE plt1 ATTACH PARTITION plt1_p3 DEFAULT;
+ALTER TABLE plt2 DETACH PARTITION plt2_p3;
+ALTER TABLE plt2 ATTACH PARTITION plt2_p3 DEFAULT;
+EXPLAIN (COSTS OFF)
+SELECT avg(t1.a), avg(t2.b), t1.c, t2.c FROM plt1 t1 RIGHT JOIN plt2 t2 ON t1.c = t2.c GROUP BY t1.c, t2.c ORDER BY t1.c, t2.c;
+  QUERY PLAN  
+--
+ Sort
+   Sort Key: t1.c, t2.c
+   ->  HashAggregate
+ Group Key: t1.c, t2.c
+ ->  Result
+   ->  Append
+ ->  Hash Right Join
+   Hash Cond: (t1.c = t2.c)
+   ->  Seq Scan on plt1_p1 t1
+   ->  Hash
+ ->  Seq Scan on plt2_p1 t2
+ ->  Hash Right Join
+   Hash Cond: (t1_1.c = t2_1.c)
+   ->  Seq Scan on plt1_p2 t1_1
+   ->  Hash
+ ->  Seq Scan on plt2_p2 t2_1
+ ->  Hash 

Re: Mention ordered datums in PartitionBoundInfoData comment

2017-12-04 Thread Ashutosh Bapat
On Tue, Dec 5, 2017 at 11:20 AM, Julien Rouhaud  wrote:
> + * PartitionBoundInfoData structures for two partitioned table with
> exactly same
>
> should be "tables".

Sorry. Thanks for pointing it out. Fixed in the attached patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dd4a8d3..4cb3e16 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -72,6 +72,13 @@
  * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
  * given partition.
  *
+ * The datums in datums array are arranged in the increasing order defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * resp. For range and list partitions this simply means that the datums in the
+ * datums array are arranged in the increasing order defined by the partition
+ * key collation.
+ *
  * In the case of list partitioning, the indexes array stores one entry for
  * every datum, which is the index of the partition that accepts a given datum.
  * In case of range partitioning, it stores one entry per distinct range
@@ -82,6 +89,9 @@
  * partition which would accept that datum-tuple would be given by the entry
  * pointed by remainder produced when hash value of the datum-tuple is divided
  * by the greatest modulus.
+ *
+ * PartitionBoundInfoData structures for two partitioned tables with exactly
+ * same bounds look exactly same.
  */
 
 typedef struct PartitionBoundInfoData


BUGFIX: standby disconnect can corrupt serialized reorder buffers

2017-12-04 Thread Craig Ringer
Hi all

TL;DR: we should delete pg_replslot/$SLOTNAME/* at the start of each
decoding session or we can accidentally re-use stale reorder buffer
contents from prior runs and $BADNESS happens. StartupReorderBuffer() is
not sufficient.


Details:

Petr and I have found a bug in logical decoding where changes get appended
multiple times to serialized reorder buffers. This causes duplicate changes
to be sent downstream (conflicts, ERRORs, etc.).

Symptoms are:

* Oversized serialized reorder buffers in pg_replslot. In the case we found
this problem in, there was a 147GB reorder buffer that should only have
been 12GB.

* Downstream/receiver working hard but achieving nothing (pglogical/bdr
with conflict resolution enabled), or failing with unique violations and
other errors (built-in logical replication)

Debugging showed that logical decoding was calling the output plugin many
times with the same set of ReorderBufferChange records. It'd advance
normally, then go back to the start of a page address or similar and go
through them all over again.

Log analysis showed that the downstream had been repeatedly connecting to
the upstream, then ERRORing out, so the upstream logs were full of

LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on standby connection

When the downstream error was fixed, the repeated changes were seen.

The cause appears to be that walsender.c's ProcessRepliesIfAny writes a LOG
for unexpected EOF then calls proc_exit(0). But  serialized txn cleanup is
done by
ReorderBufferRestoreCleanup, as called by ReorderBufferCleanupTXN, which is
invoked from the PG_CATCH() in ReorderBufferCommit() and on various normal
exits. It won't get called if we proc_exit() without an ERROR, so we leave
stale data lying around.

It's not a problem on crash restart because StartupReorderBuffer already
does the required delete.

ReorderBufferSerializeTXN, which spills the txns to disk, doesn't appear to
have any guard to ensure that the segment files don't already exist when it
goes to serialize a snapshot. Adding one there would probably be expensive;
we don't know the last lsn of the txn yet, so to be really safe we'd have
to do a directory listing and scan for any txn-$OURXID-* entries.

So to fix, I suggest that we should do a
slot-specific StartupReorderBuffer-style deletion when we start a new
decoding session on a slot, per attached patch.
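
A minimal sketch of what such a slot-specific cleanup could look like is
below (the spill-file name prefix, error handling, and includes are my
assumptions; the attached patch contains the real implementation):

#include "postgres.h"

#include <unistd.h>

#include "storage/fd.h"			/* AllocateDir, ReadDir, FreeDir */

/*
 * Sketch: remove serialized changes left behind by a previous decoding
 * session on this slot, much as StartupReorderBuffer does for all slots
 * after a crash restart, but scoped to one slot at decoding startup.
 */
static void
ReorderBufferCleanSerializedTXNs(const char *slotname)
{
	DIR		   *spill_dir;
	struct dirent *spill_de;
	char		dirpath[MAXPGPATH];
	char		path[MAXPGPATH];

	snprintf(dirpath, sizeof(dirpath), "pg_replslot/%s", slotname);

	spill_dir = AllocateDir(dirpath);
	while ((spill_de = ReadDir(spill_dir, dirpath)) != NULL)
	{
		/* "xid" is the assumed spill-file prefix; only touch what we own */
		if (strncmp(spill_de->d_name, "xid", 3) == 0)
		{
			snprintf(path, sizeof(path), "%s/%s", dirpath, spill_de->d_name);
			if (unlink(path) != 0)
				ereport(LOG,
						(errcode_for_file_access(),
						 errmsg("could not remove file \"%s\": %m", path)));
		}
	}
	FreeDir(spill_dir);
}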

It might be nice to also add a hook on proc exit, so we don't have stale
buffers lying around until next decoding session, but I didn't want to add
new global state to reorderbuffer.c so I've left that for now.

BTW, while this bug looks similar to
https://www.postgresql.org/message-id/54e4e488-186b-a056-6628-50628e4e4...@lab.ntt.co.jp
"Failed to delete old ReorderBuffer spilled files" it's really quite a
separate issue.

Both this bugfix and the above need backpatching to 9.4.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 0a9b98a857e8de02394dc8aefe365a72e50a222e Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 5 Dec 2017 13:32:45 +0800
Subject: [PATCH v1] Clean up reorder buffer files when starting logical
 decoding

We could fail to clean up reorder buffer files if the walsender exited due to a
client disconnect, because we'd skip both the normal exit and error paths.
---
 src/backend/replication/logical/reorderbuffer.c | 84 +++--
 1 file changed, 50 insertions(+), 34 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index fa95bab..bc562e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -196,6 +196,7 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		   char *change);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferCleanSerializedTXNs(const char *slotname);
 
 static void ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap);
 static Snapshot ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
@@ -214,7 +215,8 @@ static void ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *t
 
 
 /*
- * Allocate a new ReorderBuffer
+ * Allocate a new ReorderBuffer and clean out any old serialized state from
+ * prior ReorderBuffer instances for the same slot.
  */
 ReorderBuffer *
 ReorderBufferAllocate(void)
@@ -223,6 +225,8 @@ ReorderBufferAllocate(void)
 	HASHCTL		hash_ctl;
 	MemoryContext new_ctx;
 
+	Assert(MyReplicationSlot != NULL);
+
 	/* allocate memory in own context, to have better accountability */
 	new_ctx = AllocSetContextCreate(CurrentMemoryContext,
 	"ReorderBuffer",
@@ -266,6 +270,9 @@ ReorderBufferAllocate(void)
 
 	

Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-04 Thread Amit Kapila
On Mon, Dec 4, 2017 at 11:17 PM, Robert Haas  wrote:
> On Sat, Dec 2, 2017 at 8:04 AM, Amit Kapila  wrote:
>> Attached patch contains regression test as well.  Note that I have
>> carefully disabled all variable stats by using (analyze, timing off,
>> summary off, costs off) and then selected parallel sequential scan on
>> the right of join so that we have nloops and rows as variable stats
>> and those should remain constant.
>
> The regression test contains a whitespace error about which git diff
> --check complains.
>

Oops, a silly mistake on my side.

> Also, looking at this again, shouldn't the reinitialization of the
> instrumentation arrays happen in ExecParallelReinitialize rather than
> ExecParallelFinish, so that we don't spend time doing it unless the
> Gather is actually re-executed?
>

Yeah, that sounds better, so I have modified the patch accordingly.

I have another observation in a somewhat related area.  From the
code, it looks like we might have some problem with displaying sort
info for workers for rescans.  I think the problem with the sort info
is that it initializes shared info with local memory in
ExecSortRetrieveInstrumentation, after which it won't be able to access
the values in shared memory changed by workers in rescans.  We might
be able to fix it by having some local_info, same as shared_info, in the
sort node.  But the main problem is how we accumulate stats for
workers across rescans.  The type of sort method can change across
rescans.  We might be able to accumulate the size of memory used, though,
but I am not sure if that is right.  Although this appears to be
somewhat related to the problem being discussed in this thread, I think
it can be dealt with separately if we want to fix it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_accum_instr_parallel_workers_v3.patch
Description: Binary data


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-12-04 Thread Rajkumar Raghuwanshi
On Mon, Dec 4, 2017 at 7:34 AM, Ashutosh Bapat
 wrote:
> I agree, the patch looks longer than expected. I think it's important
> to have some testcases to test partition-wise join with default
> partitions. I think we need at least one test for range default
> partitions, one test for list partitioning, one for multi-level
> partitioning, and one negative testcase with the default partition
> missing from one side of the join.
>
> Maybe we could reduce the number of SQL commands and queries in the
> patch by adding a default partition to every table that participates in
> partition-wise join (leaving the tables participating in negative tests
> aside). But that's going to increase the size of the EXPLAIN outputs and
> query results. The negative test may simply drop the default partition
> from one of the tables.
>
> For every table being tested, the patch adds two ALTER TABLE commands,
> one to detach an existing partition and one to attach the same partition
> as the default partition. The alternative is to just add a new default
> partition without detaching an existing partition. But then the default
> partition needs to be populated with some data, which requires at least
> one INSERT statement. That doesn't reduce the size of the patch, but it
> increases the query output and EXPLAIN plans.
>
> Maybe in the case of the multi-level partitioning test, we don't need to
> add DEFAULT to every partitioned relation; adding it to one of them would
> be enough. Maybe add it to the parent, but that too can be avoided. That
> would reduce the size of the patch a bit.

Thanks, Ashutosh, for the suggestions.

I have reduced the test cases as suggested. Attaching the updated patch.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27ab852..f83166b 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1337,6 +1337,96 @@ SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, ph
  574. | 574.5000 | 1148. | 0011 | 0011 | A0011
 (12 rows)
 
+-- test default partition behavior for range
+ALTER TABLE prt1 DETACH PARTITION prt1_p3;
+ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
+ALTER TABLE prt2 DETACH PARTITION prt2_p3;
+ALTER TABLE prt2 ATTACH PARTITION prt2_p3 DEFAULT;
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+QUERY PLAN
+--
+ Sort
+   Sort Key: t1.a
+   ->  Append
+ ->  Hash Join
+   Hash Cond: (t2.b = t1.a)
+   ->  Seq Scan on prt2_p1 t2
+   ->  Hash
+ ->  Seq Scan on prt1_p1 t1
+   Filter: (b = 0)
+ ->  Hash Join
+   Hash Cond: (t2_1.b = t1_1.a)
+   ->  Seq Scan on prt2_p2 t2_1
+   ->  Hash
+ ->  Seq Scan on prt1_p2 t1_1
+   Filter: (b = 0)
+ ->  Hash Join
+   Hash Cond: (t2_2.b = t1_2.a)
+   ->  Seq Scan on prt2_p3 t2_2
+   ->  Hash
+ ->  Seq Scan on prt1_p3 t1_2
+   Filter: (b = 0)
+(21 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   
+-+--+-+--
+   0 |  |   0 | 
+ 150 | 0150 | 150 | 0150
+ 300 | 0300 | 300 | 0300
+ 450 | 0450 | 450 | 0450
+(4 rows)
+
+-- test default partition behavior for list
+ALTER TABLE plt1 DETACH PARTITION plt1_p3;
+ALTER TABLE plt1 ATTACH PARTITION plt1_p3 DEFAULT;
+ALTER TABLE plt2 DETACH PARTITION plt2_p3;
+ALTER TABLE plt2 ATTACH PARTITION plt2_p3 DEFAULT;
+EXPLAIN (COSTS OFF)
+SELECT avg(t1.a), avg(t2.b), t1.c, t2.c FROM plt1 t1 RIGHT JOIN plt2 t2 ON t1.c = t2.c GROUP BY t1.c, t2.c ORDER BY t1.c, t2.c;
+  QUERY PLAN  
+--
+ Sort
+   Sort Key: t1.c, t2.c
+   ->  HashAggregate
+ Group Key: t1.c, t2.c
+ ->  Result
+   ->  Append
+ ->  Hash Right Join
+   Hash Cond: (t1.c = t2.c)
+   ->  Seq Scan on plt1_p1 t1
+   ->  Hash
+ ->  Seq Scan on plt2_p1 t2
+ ->  Hash Right Join
+   Hash Cond: (t1_1.c = t2_1.c)
+   ->  Seq Scan on plt1_p2 t1_1
+   ->  Hash
+ ->  Seq Scan on plt2_p2 t2_1
+ ->  Hash Right Join
+   Hash Cond: (t1_2.c = t2_2.c)
+   ->  Seq Scan on plt1_p3 t1_2
+  

Re: Is it OK to ignore directory open failure in ResetUnloggedRelations?

2017-12-04 Thread Justin Pryzby
On Mon, Dec 04, 2017 at 03:15:08PM -0500, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.
...
> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure.  That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.

I can offer a data point unrelated to unlogged relations.

Sometimes, following a reboot, if there's a tablespace on ZFS, and if a ZPOOL
backing device is missing/renamed (especially under qemu), postgres (if it was
shut down cleanly) will happily start even though a tablespace is missing
(because the backing device cannot be found - ZFS wants the pool to be
exported and imported before it scans all devices for a matching UUID).

That has been surprising to me in the past and has led me to believe that
"services are up" following a reboot, only to notice a bunch of ERRORs in the
logs a handful of minutes later.

Maybe that counts for a tangential +1.

Justin



Re: Is it OK to ignore directory open failure in ResetUnloggedRelations?

2017-12-04 Thread David Steele
Hi Tom,

On 12/4/17 3:15 PM, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.

By a strange coincidence I spent a while today reading through this code...

> My first thought was to change ResetUnloggedRelations to match the
> rest, but on reflection I'm less sure about that.  What we've got
> at the moment is that a possibly-transient directory open failure
> can result in failure to reset an unlogged relation to empty,
> which to me amounts to data corruption.  

I'm wondering how this transient directory open failure is going to
happen without a bunch of other things going wrong, but I agree that if
it happens then corruption would be the likely result.

> If the contents of the
> unlogged relation are inconsistent, which is plenty likely after
> a crash, we could end up crashing later because of that; and in
> any case the user would not see what they expect in the tables.

Agreed.

> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure.  That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.

+1.  If a tablespace or database directory cannot be opened then I don't
think it makes any sense to continue.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Walsender timeouts and large transactions

2017-12-04 Thread Craig Ringer
On 29 November 2017 at 23:59, Petr Jelinek 
wrote:

> Hi,
>
> On 17/11/17 08:35, Kyotaro HORIGUCHI wrote:
> >
> > Moving the code around allows us to place pq_is_send_pending() in
> > the while condition, which seems to be a more proper place to do
> > that. I haven't added a test for this particular case.
> >
> > I tested that this patch:
> >
> > - cleanly applies on the current master HEAD and passes make
> >   check and the subscription tests.
> >
> > - walsender properly chooses the slow path even if
> >   pq_is_send_pending() is always false. (happens on a fast enough
> >   network)
> >
> > - walsender properly waits on the socket and on process-reply time
> >   in WaitLatchOrSocket.
> >
> > - walsender exits by timeout on a network stall.
> >
> > So, I think the patch is functionally perfect.
> >
> > I'm a reviewer of this patch, but I think I'm not allowed to mark
> > it "Ready for Committer" since the last change was made by me.
> >
>
> Thanks for working on this, but there are a couple of problems with your
> modifications which mean that it does not actually fix the original
> issue anymore (a transaction taking a long time to decode, while sending
> changes over the network works fine, will result in a walsender timeout).
>
> The first one is that you put pq_is_send_pending() in the while, so the
> while is again never executed if there is no network send pending, which
> makes the if above meaningless. Also, you missed ProcessRepliesIfAny()
> when moving code around. That's needed for the timeout calculations to
> work correctly.
>
> So one more revision is attached with those things fixed. This version
> fixes the original issue as well.
>
>
I'm happy with what I see here. Commit message needs tweaking, but any
committer would do that anyway.

To me it looks like it's time to get this committed, marking as such.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] logical decoding of two-phase transactions

2017-12-04 Thread Craig Ringer
On 4 December 2017 at 23:15, Nikhil Sontakke 
wrote:

> PFA, latest patch for this functionality.
> This patch contains the following changes as compared to the earlier patch:
>
> - Fixed a bunch of typos and comments
>
> - Modified HeapTupleSatisfiesVacuum to return HEAPTUPLE_RECENTLY_DEAD
> if the transaction id is newer than OldestXmin. Doing this only for
> CATALOG tables (htup->t_tableOid < (Oid) FirstNormalObjectId).
>

Because logical decoding supports user-catalog relations, we need to use
the same sort of logic that GetOldestXmin uses instead of a simple
oid-range check. See RelationIsAccessibleInLogicalDecoding() and the
user_catalog_table reloption.

Otherwise pseudo-catalogs used by logical decoding output plugins could
still suffer issues with needed tuples getting vacuumed, though only if the
txn being decoded made changes to those tables and then ROLLBACKed. It's a
pretty tiny corner case for decoding of 2PC but a bigger one when we're
addressing streaming decoding.

Otherwise I'm really, really happy with how this is progressing and want to
find time to play with it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pow support for pgbench

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 11:32 AM, Tom Lane  wrote:
> ISTM one key issue here is whether pgbench's expression language is
> meant to model SQL (where we have function overloading) or C (where
> there is no overloading).  I don't think we've really settled on a
> fixed policy on that, but maybe now is the time.

abs() is doing that already. Having some rules in the shape of at
least a comment would be nice.

> If we do think that function overloading is OK, there remains the
> question of when the typing is resolved.  I think Robert is objecting
> to resolving at runtime, and I tend to agree that that's something
> we'd regret in the long run.  It doesn't match either SQL or C.

+1.
-- 
Michael



Usage of epoch in txid_current

2017-12-04 Thread Amit Kapila
Hi,

Currently, txid_current and friends export a 64-bit format of
transaction id that is extended with an “epoch” counter so that it
will not wrap around during the life of an installation.  The epoch
value it uses is based on the epoch that is maintained by checkpoint
(i.e., only a checkpoint increments it).

Now if the epoch changes multiple times between two checkpoints
(practically the chances of this are slim, but there is a theoretical
possibility), then won't the computation of xids go wrong?
Basically, it can give the same value of txid after wraparound if the
checkpoint doesn't occur between the two calls to txid_current.
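
For context, the conversion works roughly as below (a simplified paraphrase
of convert_xid() in src/backend/utils/adt/txid.c, not the verbatim source):

#include "postgres.h"

#include "access/transam.h"		/* TransactionIdPrecedes/Follows */

/*
 * ckpt_epoch and ckpt_last_xid come from checkpoint-maintained state
 * (GetNextXidAndEpoch).
 */
static uint64
convert_xid_sketch(TransactionId xid, uint32 ckpt_epoch,
				   TransactionId ckpt_last_xid)
{
	uint64		epoch = ckpt_epoch;

	/* xid may be on either side of the checkpoint's xid near wraparound */
	if (xid > ckpt_last_xid && TransactionIdPrecedes(xid, ckpt_last_xid))
		epoch--;
	else if (xid < ckpt_last_xid && TransactionIdFollows(xid, ckpt_last_xid))
		epoch++;

	/*
	 * The epoch is only ever adjusted by one here, so if the xid counter
	 * wrapped around more than once since the checkpoint that supplied
	 * ckpt_epoch, the same 64-bit value could be handed out again -- the
	 * scenario described above.
	 */
	return (epoch << 32) | xid;
}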

Am I missing something which ensures that epoch gets incremented at or
after wraparound?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: REPLICA IDENTITY FULL

2017-12-04 Thread Noah Misch
On Fri, Jun 23, 2017 at 03:45:48PM -0400, Peter Eisentraut wrote:
> On 6/23/17 13:14, Alvaro Herrera wrote:
> > Andres Freund wrote:
> >> On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
> >>> Tom Lane wrote:
>  Peter Eisentraut  writes:
> > Any thoughts about keeping datumIsEqual() as a first check?  I did some
> > light performance tests, but it was inconclusive.
> 
>  Seems like it would tend to be a win if, in fact, the values are
>  usually equal.  If they're usually not, then it's a loser.  Do
>  we have any feeling for which case is more common?
> >>
> >> Seems like a premature optimization to me - if you care about
> >> performance and do this frequently, you're not going to end up using
> >> FULL.  If we want to performance optimize, it'd probably be better to
> >> look up candidate keys and use those if available.
> > 
> > I can get behind that argument.
> 
> Thanks for the feedback.  I have committed it after removing the
> datumIsEqual() call.

While reviewing this patch, I noticed a couple of nearby defects:

- RelationFindReplTupleSeq() says "Note that this stops on the first matching
  tuple.", but that's not the case.  It visits every row in the table, and it
  uses the last match.  The claimed behavior sounds more attractive.

- RelationFindReplTupleSeq() has comment "/* Start an index scan. */", an
  inapplicable copy-paste from RelationFindReplTupleByIndex().



Re: Error handling (or lack of it) in RemovePgTempFilesInDir

2017-12-04 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Dec 5, 2017 at 10:51 AM, Tom Lane  wrote:
>> Uh ... I'm confused?  That particular change only concerns whether we emit
>> a log message, not whether the action is attempted or succeeds.

> From the commit mentioned upthread, this switches one hard failure
> when opening pg_tblspc to a LOG report:
> @@ -3014,7 +3018,7 @@ RemovePgTempFiles(void)
>  */
> spc_dir = AllocateDir("pg_tblspc");

> -   while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
> +   while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
> {

That's not the same commit you just mentioned.  The point with this one is
that RemovePgTempFiles is a noncritical operation: if we fail to remove
temp files, it's still safe to start up, because those temp files won't
cause failures later.  (This is the exact opposite of the situation for
ResetUnloggedRelations's directory scans, which is why I changed that one
in the opposite direction.)

The general theory I'm operating on is that we should endeavor to
let the database start in any situation where that doesn't involve
a data-corruption hazard.  Yeah, it might not be nice if we leave
GB worth of temp files around, but is a postmaster start failure
better?  I don't think so.

regards, tom lane



Re: Silly API for do_pg_start_backup()

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 3:58 AM, Tom Lane  wrote:
> Barring objection I'll remove the tblspcdir parameter and have
> do_pg_start_backup() open and close the directory immediately
> around the loop where it's read, like every other case in the
> backend.

+1. Thanks for the change. What has been committed in 066bc21 looks good to me.
-- 
Michael



Re: [HACKERS] pow support for pgbench

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 5:38 AM, Robert Haas  wrote:
> I'm willing to commit any of the following things:
>
> 1. A patch that adds an integer version of pow() but not a double version
> 2. A patch that adds a double version of pow() but not an integer version
> 3. A patch that adds both an integer version of pow() and a double
> version of pow(), with the two versions having different names
>
> If Raúl is happy with only having an integer version, then I suggest
> that he adopt #1 and call it good.  Otherwise, given that Fabien wants
> the double version, I suggest we call the integer version pow() and
> the double version dpow() and go with #3.

It seems to me that 1 and 2 have value on their own for the workloads
being emulated, so what you are suggesting in 3 looks good to
me. Now why are two different function names necessary? The parsing
takes care of argument types through PgBenchValue->type, so having one
function exposed to the user looks like the most sensible approach to
me.
-- 
Michael



Re: Add PGDLLIMPORT lines to some variables

2017-12-04 Thread Peter Geoghegan
On Mon, Dec 4, 2017 at 5:41 PM, Noah Misch  wrote:
>> I don't think we quite have an established protocol for this. My
>> personal position, though I'm biased in this specific case, is that we
>> should adopt a position that PGDLLIMPORTs should basically be backpatched
>> whenever a credible extension even halfway reasonably requires it. There's
>> no easy way to get this done by default, and we're so far unwilling to
>> just slap this onto every variable. So as to not further disadvantage
>> people forced to live in the MS environment, that seems the sanest
>> solution. It's not like these are high risk.
>
> +1

If that's going to be the policy, then I have some requests of my own.
I would like to add some PGDLLIMPORTs to suit the external version of
amcheck (the version that targets earlier versions of Postgres). These
are:

RecentGlobalXmin -- This is only PGDLLIMPORT on Postgres 10+,
following commit 56018bf2. I'd like to get that back to 9.4, although
there is no reason to not include 9.3.

TransactionXmin -- This is needed for the newer heap-matches-index
verification check. Again, I would like this on 9.4+, but 9.3+ works
too.

Note that somebody asked about running it on Windows recently, and on
one other occasion in the past. It does come up.

-- 
Peter Geoghegan



Re: Errands around AllocateDir()

2017-12-04 Thread Tom Lane
Michael Paquier  writes:
> Note I am +/-0 on exposing ReadDirExtended in back-branches, because
> there is no use for it there yet.

My thought is basically that we might end up back-patching some change
that needs that.  Nothing I've done today requires it, but it seems like
a very harmless guard against future problems.

I am BTW wondering whether any of the other directory scan loops in
the backend ought to be converted from ReadDir to ReadDirExtended(LOG).
I'm just finishing up making ResetUnloggedRelations() go the other way,
because of the argument that failing to reset unlogged relations would
be a data corruption hazard.  But there may well be other cases where
the best thing on balance is to log the problem and press on.  With
the changes we've made today, it's a very easy fix to flip from one
behavior to the other.

regards, tom lane



Re: Error handling (or lack of it) in RemovePgTempFilesInDir

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 8:40 AM, Thomas Munro
 wrote:
> On Tue, Dec 5, 2017 at 4:44 AM, Tom Lane  wrote:
>> Anyway, I'm inclined to reverse that choice and emit LOG messages
>> reporting failure of any of the lstat, rmdir, or unlink calls in
>> RemovePgTempFilesInDir.  In the worst case, say that there are a
>> bunch of leftover temp files in a directory that someone has made
>> unwritable, this might cause a fair amount of whining in the postmaster
>> log --- but that's a situation that needs to be fixed anyway, so
>> I cannot see how not printing anything is a good idea.
>
> Belatedly, +1.  The error hiding seemed a bit odd considering that we
> were prepared to log "unexpected file found ...".  I probably should
> have questioned that instead of extending it monkey-see-monkey-do.

Well, I am -1 on this change. The coding before commit 561885d that
you have now pushed (the timezone difference makes me answer late) was way
more conservative, and I honestly preferred it, as *only* the next postmaster
restart would remove remnant temp files, which can cause potentially GBs
of data to stay around. Also, if the failure happens at startup, isn't
it going to fail as well for backends afterwards? This would cause
backends to fail abruptly, and it is actually easier to debug a
completely stopped instance...
-- 
Michael



Re: Add PGDLLIMPORT lines to some variables

2017-12-04 Thread Noah Misch
On Mon, Nov 20, 2017 at 12:02:30PM -0800, Andres Freund wrote:
> On 2017-11-20 11:58:44 -0800, Brian Cloutier wrote:
> > > please, append session_timezone to your list
> > 
> > Here's a v2 patch which also includes session_timezone.
> > 
> > Separately, is this the kind of thing which is eligible for backporting
> > into the next releases of 9.6 and 10?
> 
> I don't think we quite have an established protocol for this. My
> personal position, though I'm biased in this specific case, is that we
> should adopt a position that PGDLLIMPORTs should basically be backpatched
> whenever a credible extension even halfway reasonably requires it. There's
> no easy way to get this done by default, and we're so far unwilling to
> just slap this onto every variable. So as to not further disadvantage
> people forced to live in the MS environment, that seems the sanest
> solution. It's not like these are high risk.

+1



Re: Errands around AllocateDir()

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 4:11 AM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>> Yeah, agreed.  The only thing I'm concerned about back-patching is
>> the places where a wrong errno might be reported.
>
> If we're currently reporting "could not open dir: Success" then
> backpatching such a fix is definitely an improvement.  OTOH, if currently
> we have opendir() trying to report a failure, and then LWLockRelease
> replaces the errno because something completely unrelated also failed,
> having the message report exactly the opendir() failure rather than the
> lwlock failure is surely also an improvement.

Note I am +/-0 on exposing ReadDirExtended in back-branches, because
there is no use for it there yet. Only fixing the actual bugs with
errno is of course fine with me, if my initial message was not clear
about the back-branches.
-- 
Michael



Re: Errands around AllocateDir()

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 7:17 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> I had a close look at all the callers of AllocateDir() and noticed a
>> couple of unwelcome things (Tom noticed some of those in the thread
>> mentioned above, I found others):
>
> Pushed with some minor additional fiddling.

Thanks.

> The most notable thing I changed was that instead of this:
>
>> - perform_base_backup() makes the mistake of not saving errno before
>> CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
>> incorrect error message.
>
> I modified CheckXLogRemoved() to internally guarantee that it does not
> change errno.  This is because there seemed to be other call sites that
> were depending on that, not just this one.  Anyway, that seemed like a
> more future-proof fix than relying on callers to deal with it.

Hm, OK. Yes, I can see the point behind doing it this way instead;
CheckXLogRemoved() is used a lot in error code paths.
-- 
Michael



Re: es_query_dsa is broken

2017-12-04 Thread Thomas Munro
On Thu, Nov 30, 2017 at 11:19 PM, Thomas Munro
 wrote:
> On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas  wrote:
>>> Better ideas?
>>
>> How about this:
>>
>> 1. Remove es_query_dsa altogether.
>> 2. Add a dsa_area * to ExecParallelInitializeDSMContext.
>> 3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate
>> argument to the per-node-type function.
>> 4. If the per-node-type function cares about the dsa_area *, it is
>> responsible for saving a pointer to it in the PlanState node.
>> 5. Also pass it down via the ParallelWorkerContext.
>>
>> In v10 we might need to go with a solution like what you've sketched
>> here, because Tom will complain about breaking binary compatibility
>> with EState (and maybe other PlanState nodes) in a released branch.

Here's a pair of versions of that patch tidied up a bit, for
REL_10_STABLE and master.  Unfortunately I've been unable to come up
with a reproducer that shows misbehaviour (something like what was
reported on -bugs[1] which appears to be a case of calling
dsa_get_address(wrong_area, some_dsa_pointer)).

I don't yet have a patch to remove es_query_dsa.  Instead of adding a
dsa_area * parameter to a ton of per-node functions as you suggested,
I'm wondering if we shouldn't just pass the whole ParallelExecutorInfo
object into all ExecXXXInitializeDSM() and ExecXXXInitializeWorker()
functions so they can hold onto it if they want to.  In the worker
case it'd be only partially filled out: just area and pcxt, and pcxt
would also be only partially filled out.  Alternatively I could create
a new struct ParallelExecutorWorkerInfo which has just the bit needed
for workers (that'd replace ParallelWorkerContext, which I now realise
was put in the wrong place -- it doesn't belong in access/parallel.h,
it belongs in an executor header with an executor-sounding name).  I'd
like to get a couple of other things done before I come back to this.

[1] 
https://www.postgresql.org/message-id/CAEepm=1V-ZjAAyG-xj6s7ESXFhzLsRBpyX=BLz-w2DS=obn...@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


fix-es_query_dsa-pg10-v2.patch
Description: Binary data


fix-es_query_dsa-pg11-v2.patch
Description: Binary data


Re: Signals in a BGW

2017-12-04 Thread Michael Paquier
On Tue, Dec 5, 2017 at 10:03 AM, Craig Ringer  wrote:
> pglogical sets up its own handler 'handle_sigterm'. However, it now does
> much the same as src/backend/tcop/postgres.c's 'die' function, just without
> the single-user mode checks.

Documentation shows a simple example of that:
https://www.postgresql.org/docs/devel/static/source-conventions.html

> IMO it's silly to customise them, and a bad example.

I don't agree. It is not silly to have background workers be able
to take cleanup actions and have their own tracking with some
sig_atomic_t flags.
-- 
Michael



Re: Signals in a BGW

2017-12-04 Thread Craig Ringer
On 5 December 2017 at 00:40, Chapman Flack  wrote:

>
> Is there a known, default behavior that some signals will
> have, if I simply BackgroundWorkerUnblockSignals() without customizing?
> Will SIGINT and SIGTERM, for example, have default handlers that
> interact with the volatile flags in miscadmin.h that are used by
> CHECK_FOR_INTERRUPTS, and set the process latch?
>
>
The default handler is bgworker_die; see src/backend/postmaster/bgworker.c.
I don't know if elog() is safe in a signal handler, but I guess in the
absence of non-reentrant errcontext functions...


pglogical sets up its own handler 'handle_sigterm'. However, it now does
much the same as src/backend/tcop/postgres.c's 'die' function, just without
the single-user mode checks.

void
handle_sigterm(SIGNAL_ARGS)
{
	int			save_errno = errno;

	SetLatch(MyLatch);

	if (!proc_exit_inprogress)
	{
		InterruptPending = true;
		ProcDiePending = true;
	}

	errno = save_errno;
}


so it's not significantly different. We used to have our own signal
handling but it gets seriously messy when you're calling into arbitrary
PostgreSQL code that expects CHECK_FOR_INTERRUPTS() to work.


> I notice that the worker_spi example code customizes SIGHUP
> and SIGTERM, giving them handlers that set a (different, local
> to the module) volatile flag, and set the process latch.
>

IMO it's silly to customise them, and a bad example.

Personally I'd rather change the default bgw handler to 'die', make the
sample bgworkers use CHECK_FOR_INTERRUPTS() properly, and be done with it.
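
A rough sketch of that style, for concreteness (the worker name, wakeup
interval, and wait-event class below are illustrative, not taken from any
existing sample):

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "tcop/tcopprot.h"		/* for die() */

void
my_worker_main(Datum main_arg)
{
	/* use the regular backend SIGTERM handler rather than a custom one */
	pqsignal(SIGTERM, die);
	BackgroundWorkerUnblockSignals();

	for (;;)
	{
		int			rc;

		/* die() only sets ProcDiePending etc.; this is what acts on it */
		CHECK_FOR_INTERRUPTS();

		/* ... one unit of work here ... */

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   10 * 1000L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);
	}
}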

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] compress method for spgist - 2

2017-12-04 Thread Nikita Glukhov

On 30.11.2017 05:31, Michael Paquier wrote:


I can see as well that the patches posted at the beginning of the
thread got reviews, but those did not get answered. The set of
patches also has conflicts with HEAD, so they need a rebase. For those
reasons I am marking this entry as returned with feedback.


Rebased patches are attached.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>From 1c251ff129f8e9f33d14df547d97ba549b109648 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 5 Dec 2017 02:38:50 +0300
Subject: [PATCH 1/2] spgist-compress-method-8

---
 doc/src/sgml/spgist.sgml| 54 ++---
 src/backend/access/spgist/spgdoinsert.c | 44 +++
 src/backend/access/spgist/spgutils.c| 23 --
 src/backend/access/spgist/spgvalidate.c | 24 ++-
 src/include/access/spgist.h |  5 ++-
 src/include/access/spgist_private.h |  8 +++--
 6 files changed, 127 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 139c8ed..55c1b06 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,20 +240,21 @@
 
  
   There are five user-defined methods that an index operator class for
-  SP-GiST must provide.  All five follow the convention
-  of accepting two internal arguments, the first of which is a
-  pointer to a C struct containing input values for the support method,
-  while the second argument is a pointer to a C struct where output values
-  must be placed.  Four of the methods just return void, since
-  all their results appear in the output struct; but
+  SP-GiST must provide and one optional. All five mandatory
+  methos follow the convention of accepting two internal arguments,
+  the first of which is a pointer to a C struct containing input values for 
+  the support method, while the second argument is a pointer to a C struct 
+  where output values must be placed.  Four of the methods just return 
+  void, since all their results appear in the output struct; but
   leaf_consistent additionally returns a boolean result.
   The methods must not modify any fields of their input structs.  In all
   cases, the output struct is initialized to zeroes before calling the
-  user-defined method.
+  user-defined method. Optional method compress accepts
+  datum to be indexed and returns values which actually will be indexed.
  
 
  
-  The five user-defined methods are:
+  The five mandatory user-defined methods are:
  
 
  
@@ -283,6 +284,7 @@ typedef struct spgConfigOut
 {
 Oid prefixType; /* Data type of inner-tuple prefixes */
 Oid labelType;  /* Data type of inner-tuple node labels */
+Oid leafType;   /* Data type of leaf */
 boolcanReturnData;  /* Opclass can reconstruct original data */
 boollongValuesOK;   /* Opclass can cope with values  1 page */
 } spgConfigOut;
@@ -303,7 +305,15 @@ typedef struct spgConfigOut
   longValuesOK should be set true only when the
   attType is of variable length and the operator
   class is capable of segmenting long values by repeated suffixing
-  (see ).
+  (see ). leafType
+  usually has the same value as attType but if
+  it's different then optional method  compress
+  should be provided. Method  compress is responsible
+  for transformation from attType to 
+  leafType. In this case all other function
+  should accept leafType values. Note: both
+  consistent functions will get scankeys
+  unchanged, without compress transformation.
  
  
 
@@ -624,7 +634,8 @@ typedef struct spgInnerConsistentOut
reconstructedValue is the value reconstructed for the
parent tuple; it is (Datum) 0 at the root level or if the
inner_consistent function did not provide a value at the
-   parent level.
+   parent level. reconstructedValue should be always a
+   spgConfigOut.leafType type.
traversalValue is a pointer to any traverse data
passed down from the previous call of inner_consistent
on the parent index tuple, or NULL at the root level.
@@ -730,7 +741,8 @@ typedef struct spgLeafConsistentOut
reconstructedValue is the value reconstructed for the
parent tuple; it is (Datum) 0 at the root level or if the
inner_consistent function did not provide a value at the
-   parent level.
+   parent level. reconstructedValue should be always a
+   spgConfigOut.leafType type. 
traversalValue is a pointer to any traverse data
passed down from the previous call of inner_consistent
on the parent index tuple, or NULL at the root level.
@@ -757,6 +769,26 @@ typedef struct spgLeafConsistentOut
 

 
+ 
+  The optional user-defined method is:
+ 
+
+ 
+
+ Datum compress(Datum in)
+ 
+  
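
As an illustration of what the documentation above describes, a compress
support function whose leafType differs from attType could look like the
following sketch (the polygon-to-bounding-box case; the function name and
opclass wiring are illustrative, not taken from the patch):

#include "postgres.h"

#include "fmgr.h"
#include "utils/geo_decls.h"

/*
 * Sketch of an SP-GiST compress method: attType is polygon, leafType is box,
 * so only the bounding box of each input value is stored in leaf tuples.
 */
PG_FUNCTION_INFO_V1(spg_poly_compress);

Datum
spg_poly_compress(PG_FUNCTION_ARGS)
{
	POLYGON    *polygon = PG_GETARG_POLYGON_P(0);
	BOX		   *box = (BOX *) palloc(sizeof(BOX));

	*box = polygon->boundbox;

	PG_RETURN_BOX_P(box);
}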

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-04 Thread David Rowley
On 2 December 2017 at 03:39, Robert Haas  wrote:
> On Thu, Nov 30, 2017 at 11:39 PM, David Rowley
>  wrote:
>> I feel like we could do better here with little extra effort. The
>> DETACH index feature does not really seem required for this patch.
>
> Because of the dump/restore considerations mentioned in
> http://postgr.es/m/ca+tgmobuhghg9v8saswkhbbfywg5a0qb+jgt0uovq5ycbdu...@mail.gmail.com
> I believe we need a way to create the index on the parent without
> immediately triggering index builds on the children, plus a way to
> create an index on a child after-the-fact and attach it to the parent.
> Detach isn't strictly required, but why support attach and not detach?

I proposed that this work a different way in [1]. I think the way I
mention is cleaner, as it means there's no more reason for a
partitioned index to be indisvalid=false than there is for any other
normal index.

My primary reason for not liking this way is that it leaves indexes
around as indisvalid=false; however, I suppose that an index could be
replaced atomically with a DETACH and ATTACH within a single
transaction. I had previously not really liked the idea of
invalidating an index by DETACHing a leaf table's index from it. Of
course, this patch does nothing special with partitioned indexes, but
I believe one day we will want to do something with them, and that
DETACH/ATTACH is not the best design for replacing part of the index.

[1] 
https://www.postgresql.org/message-id/CAKJS1f_o6v%2BXtT%2B3gfieUdCiU5Sn84humT-CVW5So_x_f%3DkLxQ%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-04 Thread David Rowley
On 2 December 2017 at 11:10, Alvaro Herrera  wrote:
> What you're saying is that I've written code for A+B, and you're
> "interested in C (which is incompatible with B), so can we have A+C and
> drop B".  But in reality, there exists (unwritten) D that solves the
> incompatibility between B and C.  I'm just saying it's essentially the
> same to postpone C+D as to postpone B+D, and I already have B written;
> plus that way we don't have to come up with some novel way to handle
> pg_dump support.  So can we get A+B committed and discuss C+D later?
>
> A = partitioned indexes
> B = pg_dump support based on ATTACH
> C = your proposed planner stuff
> D = correct indisvalid setting for partitioned indexes (set to false
> when a partition does not contain the index)
>
> The patch in this thread is A+B.

Okay,  I wasn't insisting, just asking if you thought this was missing
from the patch.

However, I do still feel that if we're adding an index to an object
then it should be available in RelOptInfo->indexlist, but this patch
is still good progress even if we don't add it there.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] postgres_fdw super user checks

2017-12-04 Thread Ashutosh Bapat
On Tue, Dec 5, 2017 at 2:49 AM, Robert Haas  wrote:
> On Sun, Dec 3, 2017 at 3:42 PM, Stephen Frost  wrote:
>> I'm not a fan of having *only* a warning in the back-branches.  What I
>> would think we'd do here is correct the back-branch documentation to be
>> correct, and then add a warning that it changes in v11.
>>
>> You didn't suggest an actual change wrt the back-branch warning, but it
>> seems to me like it'd end up being morally equivalent to "ok, forget
>> what we just said, what really happens is X, but we fix it in v11."
>
> Yeah, I'm very unclear what, if anything, to do about the back-branch
> documentation.  Suggestions appreciated.

I think the real behaviour can be described as something like this:

"Only superusers may connect to foreign servers without password
authentication, so always specify the password
option for user mappings that may be used by non-superusers." But
which user mappings may be used by non-superusers cannot be defined
without explaining views owned by superusers. I don't think we should
be talking about views in that part of the documentation.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Errands around AllocateDir()

2017-12-04 Thread Tom Lane
Michael Paquier  writes:
> I had a close look at all the callers of AllocateDir() and noticed a
> couple of unwelcome things (Tom noticed some of those in the thread
> mentioned above, I found others):

Pushed with some minor additional fiddling.  The most notable thing
I changed was that instead of this:

> - perform_base_backup() makes the mistake of not saving errno before
> CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
> incorrect error message.

I modified CheckXLogRemoved() to internally guarantee that it does not
change errno.  This is because there seemed to be other call sites that
were depending on that, not just this one.  Anyway, that seemed like a
more future-proof fix than relying on callers to deal with it.
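
The pattern amounts to capturing errno on entry and restoring it on the way
out, schematically (a sketch of the shape of the change, not the committed
diff):

void
CheckXLogRemoved(XLogSegNo segno, TimeLineID tli)
{
	int			save_errno = errno;

	/* ... existing body: compare against lastRemovedSegNo, ereport, etc. ... */

	errno = save_errno;
}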

regards, tom lane



Re: [HACKERS] pow support for pgbench

2017-12-04 Thread Robert Haas
On Mon, Dec 4, 2017 at 10:47 AM, Fabien COELHO  wrote:
>> What's the name of the backend function whose behavior this matches?
>>
>> As Fabien has mentioned, it tries to behave as "numeric_power". Maybe we
>> it'd better if we switch to "dpow" (which is pow with some error handling)
>> and always return a double. What do you think?
>
> My 0.02€: I think that having a integer pow implementation when possible is
> a good think for pgbench, because the main use case is to deal with table
> keys in a benchmarking scripts, which are expected to be integers.

I'm willing to commit any of the following things:

1. A patch that adds an integer version of pow() but not a double version
2. A patch that adds a double version of pow() but not an integer version
3. A patch that adds both an integer version of pow() and a double
version of pow(), with the two versions having different names

If Raúl is happy with only having an integer version, then I suggest
that he adopt #1 and call it good.  Otherwise, given that Fabien wants
the double version, I suggest we call the integer version pow() and
the double version dpow() and go with #3.
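
For what it's worth, the integer version would presumably be an exact
exponentiation-by-squaring loop rather than a round trip through the C
library, something like the sketch below (not taken from any posted patch;
overflow handling omitted, and the caller must ensure exp >= 0):

#include "postgres_fe.h"

/* exact integer power; a double pow() would simply call libm's pow() */
static int64
ipow(int64 base, int64 exp)
{
	int64		result = 1;

	while (exp > 0)
	{
		if (exp & 1)
			result *= base;
		exp >>= 1;
		if (exp > 0)
			base *= base;
	}
	return result;
}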

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Additional logging for VACUUM and ANALYZE

2017-12-04 Thread Bossart, Nathan
On 12/4/17, 2:27 PM, "Robert Haas"  wrote:
> Committed.

Thanks!

Nathan



Is it OK to ignore directory open failure in ResetUnloggedRelations?

2017-12-04 Thread Tom Lane
While working through Michael Paquier's patch to clean up inconsistent
usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
subroutines are not consistent about whether a directory open failure
results in erroring out or just emitting a LOG message and continuing.
ResetUnloggedRelations itself throws a hard error if it fails to open
pg_tblspc, but all the rest of reinit.c thinks a LOG message is
sufficient.

My first thought was to change ResetUnloggedRelations to match the
rest, but on reflection I'm less sure about that.  What we've got
at the moment is that a possibly-transient directory open failure
can result in failure to reset an unlogged relation to empty,
which to me amounts to data corruption.  If the contents of the
unlogged relation are inconsistent, which is plenty likely after
a crash, we could end up crashing later because of that; and in
any case the user would not see what they expect in the tables.

So now I'm thinking we should do the reverse and change these functions
to give a hard error on AllocateDir failure.  That would result in
startup-process failure if we are unable to scan the database, which is
not great, but there's certainly something badly wrong if we can't.

Thoughts?

regards, tom lane



Re: Errands around AllocateDir()

2017-12-04 Thread Alvaro Herrera
Tom Lane wrote:

> Yeah, agreed.  The only thing I'm concerned about back-patching is
> the places where a wrong errno might be reported.

If we're currently reporting "could not open dir: Success" then
backpatching such a fix is definitely an improvement.  OTOH, if currently
we have opendir() trying to report a failure, and then LWLockRelease
replaces the errno because something completely unrelated also failed,
having the message report exactly the opendir() failure rather than the
lwlock failure is surely also an improvement.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Silly API for do_pg_start_backup()

2017-12-04 Thread Tom Lane
Why is it that do_pg_start_backup() expects its callers to pass in
an open DIR pointer for the pg_tblspc directory, rather than doing
the AllocateDir call for itself?  This complicates the callers, provides
no flexibility (since do_pg_start_backup() is hardwired to know that
pg_tblspc is what it was passed), and it doesn't seem to me to offer any
robustness either.  There are a couple of comments suggesting that
somebody thought that opening the directory early might be good, but
there's no defense of why, and I can't see that holding the directory open
across a checkpoint request is adding robustness; if anything more the
reverse, since it's a kernel resource we don't need during that phase.

Barring objection I'll remove the tblspcdir parameter and have
do_pg_start_backup() open and close the directory immediately
around the loop where it's read, like every other case in the
backend.

regards, tom lane



Re: [HACKERS] postgres_fdw super user checks

2017-12-04 Thread Robert Haas
On Sun, Dec 3, 2017 at 3:42 PM, Stephen Frost  wrote:
> I'm not a fan of having *only* a warning in the back-branches.  What I
> would think we'd do here is correct the back-branch documentation to be
> correct, and then add a warning that it changes in v11.
>
> You didn't suggest an actual change wrt the back-branch warning, but it
> seems to me like it'd end up being morally equivalent to "ok, forget
> what we just said, what really happens is X, but we fix it in v11."

Yeah, I'm very unclear what, if anything, to do about the back-branch
documentation.  Suggestions appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-04 Thread Robert Haas
On Sat, Dec 2, 2017 at 8:04 AM, Amit Kapila  wrote:
> Attached patch contains regression test as well.  Note that I have
> carefully disabled all variable stats by using (analyze, timing off,
> summary off, costs off) and then selected parallel sequential scan on
> the right of join so that we have nloops and rows as variable stats
> and those should remain constant.

The regression test contains a whitespace error about which git diff
--check complains.

Also, looking at this again, shouldn't the reinitialization of the
instrumentation arrays happen in ExecParallelReinitialize rather than
ExecParallelFinish, so that we don't spend time doing it unless the
Gather is actually re-executed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Secondary index access optimizations

2017-12-04 Thread Pantelis Theodosiou
On Tue, Sep 5, 2017 at 9:10 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 05.09.2017 04:02, Amit Langote wrote:
>
> Like Thomas, I'm not so sure about the whole predtest.c patch.  The core
> logic in operator_predicate_proof() should be able to conclude that, say,
> k < 21 is implied by k <= 20, which you are trying to address with some
> special case code.  If there is still a problem you think needs to be fixed
> here, a better place to look would be somewhere around get_btree_test_op().
>
>
> Frankly speaking, I also do not like this part of my patch.
> I will be pleased if you or somebody else can propose a better solution.
> I do not understand how get_btree_test_op() can help here.
>
> Yes, k < 21 is implied by k <= 20. That is based on generic properties of
> the < and <= operators.
> But I need to prove something different: given the table partition
> constraint (k < 21), I want to remove the predicate (k <= 20) from the query.
> In other words, operator_predicate_proof() should be able to conclude
> that (k <= 20) is implied by (k < 21).
> But that is true only for integer types, not for floating-point types. And
> the Postgres operator definition
> doesn't provide a way to determine that a user-defined type is an integer
> type, i.e. has integer values for which such a conclusion holds.
>
> Why do I think that this is important? Certainly, it is possible to rewrite
> the query as (k < 21), and then no changes in operator_predicate_proof() are
> needed.
> Assume the most natural use case: I have some positive integer key and I
> want to range partition the table by that key, for example with an interval
> of 10000.
> Currently the standard PostgreSQL partitioning mechanism requires specifying
> intervals with an open upper boundary.
> So if I want the first partition to contain the interval [1,10000], the
> second [10001,20000], ... I have to create the partitions this way:
>
> create table bt (k integer, v integer) partition by range (k);
> create table dt1 partition of bt for values from (1) to (10001);
> create table dt2 partition of bt for values from (10001) to (20001);
> ...
>
> If I want to write a query inspecting the data of a particular partition,
> then most likely I will use the BETWEEN operator:
>
> SELECT * FROM t WHERE k BETWEEN 1 AND 10000;
>
> But right now operator_predicate_proof() is not able to conclude that the
> predicate (k BETWEEN 1 AND 10000), transformed to (k >= 1 AND k <= 10000),
> is equivalent to (k >= 1 AND k < 10001),
> which is used as the partition constraint.
>
> Another very popular use case (for example, mentioned in the PostgreSQL
> partitioning documentation:
> https://www.postgresql.org/docs/10/static/ddl-partitioning.html)
> is using a date as the partition key:
>
> CREATE TABLE measurement (
> city_id int not null,
> logdate date not null,
> peaktemp int,
> unitsales   int
> ) PARTITION BY RANGE (logdate);
>
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
> FOR VALUES FROM ('2006-03-01') TO ('2006-04-01')
>
>
> Assume that now I want to get measurements for March:
>
> There are three ways to write this query:
>
> select * from measurement where extract(month from logdate) = 3;
> select * from measurement where logdate between '2006-03-01' AND
> '2006-03-31';
> select * from measurement where logdate >= '2006-03-01' AND logdate  <
> '2006-04-01';
>
> Right now an optimal query plan will be constructed only for the last query.
>

Perhaps the relevant pages (about partitioning and optimization) should
mention avoiding BETWEEN and using closed-open checks, as in the last query.

Dates are a perfect example to demonstrate that BETWEEN shouldn't be used,
in my opinion. Dates (and timestamps) are not like integers, as they are
often used with different levels of precision: day, month, year, hour,
minute, second, etc. (month in your example). Constructing the correct
expressions for the different precisions can be a nightmare with BETWEEN
but very simple with >= and < (in the example: take the start date,
'2006-03-01', and add one month), as in the sketch below.
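
A sketch of that closed-open pattern, using ordinary date arithmetic (not
from the original mail; any month start works the same way):

    SELECT *
    FROM measurement
    WHERE logdate >= DATE '2006-03-01'
      AND logdate <  DATE '2006-03-01' + INTERVAL '1 month';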

So, just my 2c, is it worth the trouble to implement this feature
(conversion of (k<21) to (k<=20) and vice versa) and how much work would it
need for all data types that are commonly used for partitioning?



> Unfortunately my patch is not covering date type.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] WIP: Covering + unique indexes.

2017-12-04 Thread Andrey Borodin

> On 30 Nov 2017, at 23:07, Andrey Borodin wrote:
> 
> Seems like it was not a big deal to patch; I've fixed those bits (see
> attachment).
> I've done only simple tests for now, but I'm planning to do better testing
> before the next CF.
> Thanks for mentioning "heapallindexed", I'll use it too.

I've tested the patch with the fixed amcheck (including the "heapallindexed"
feature); the tests included bulk index creation, pgbenching, and amcheck of
the index itself and of a WAL-replicated index.
Everything worked fine.

Spotted one more typo:
> Since 10.0 there is an optional INCLUDE clause
should be
> Since 11.0 there is an optional INCLUDE clause

I think that patch set (two patches + 1 amcheck diff) is ready for committer.

Best regards, Andrey Borodin.


Re: [HACKERS] Secondary index access optimizations

2017-12-04 Thread Konstantin Knizhnik



On 04.12.2017 19:44, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


On 30.11.2017 05:16, Michael Paquier wrote:

On Mon, Nov 6, 2017 at 10:13 PM, Konstantin Knizhnik
 wrote:

Concerning the broken partition_join test: it is an "expected" failure: my patch
removes redundant checks from the plans.
So the only required action is to update the expected file with the new results.
Attached please find updated patch.

The last patch version has conflicts in regression tests and did not
get any reviews. Please provide a rebased version. I am moving the
patch to next CF with waiting on author as status. Thanks!

Rebased patch is attached.

I don't think this is a rebase on the previously posted patch ... it's
about 10x as big and appears to be a thorough rewrite of the entire
optimizer.



Oh, sorry. I had accidentally committed a patch for aggregate push-down
in this branch.

The correct rebased patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bce3348..6a7e7fb 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -626,12 +626,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- Nu
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
- QUERY PLAN  
--
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
+QUERY PLAN
+--
  Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS NOT NULL))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1df1e3a..c421530 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -292,7 +292,7 @@ RESET enable_nestloop;
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- NullTest
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;  -- OpExpr(l)
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE 1 = c1!;   -- OpExpr(r)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 44f6b03..c7bf118 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -345,6 +345,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		switch (rel->rtekind)
 		{
 			case RTE_RELATION:
+remove_restrictions_implied_by_constraints(root, rel, rte);
 if (rte->relkind == RELKIND_FOREIGN_TABLE)
 {
 	/* Foreign table */
@@ -1130,6 +1131,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			set_dummy_rel_pathlist(childrel);
 			continue;
 		}
+		remove_restrictions_implied_by_constraints(root, childrel, childRTE);
 
 		/* CE failed, so finish copying/modifying join quals. */
 		childrel->joininfo = (List *)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f743871..f763a97 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1466,6 +1466,51 @@ relation_excluded_by_constraints(PlannerInfo *root,
 	return false;
 }
 
+/*
+ * Remove from restrictions list items implied by table constraints
+ */
+void remove_restrictions_implied_by_constraints(PlannerInfo *root,
+RelOptInfo *rel, RangeTblEntry *rte)
+{
+	List	   *constraint_pred;
+	List	   *safe_constraints = NIL;
+	List	   *safe_restrictions = NIL;
+	ListCell   *lc;
+
+	if (rte->rtekind != RTE_RELATION || rte->inh)
+		return;
+
+	/*
+	 * OK to fetch the constraint 

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2017-12-04 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> > kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> > kqueue patch to perform better on large FreeBSD systems, it would also
> > be a solution to this problem.
> 
> I just noticed that kqueue appears to offer a solution to this problem,
> ie one of the things you can wait for is exit of another process (named
> by PID, looks like).  If that's portable to all kqueue platforms, then
> integrating a substitute for the postmaster death pipe might push that
> patch over the hump to being a net win.
> 
>   regards, tom lane

Hi,

sorry for the long silence.
I forgot about this after being on vacation.

kqueue does indeed support EVFILT_PROC, like you said.
But using this effectively would need a separate monitoring process,
because we cannot use poll(2) and kevent(2) together within the same loop.

But how about this:
we just use getppid() to see whether the process id of our parent
has changed to 1 (init).
Just like calling read on the pipe, that's just one system call.

I did not bother to weed out the postmaster_alive_fds yet, but
this change works fine for me.

Regards,
   Marco

diff --git a/src/backend/storage/ipc/pmsignal.c 
b/src/backend/storage/ipc/pmsignal.c
index 85db6b21f8..a0596c61fd 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -272,21 +272,16 @@ bool
 PostmasterIsAlive(void)
 {
 #ifndef WIN32
-   charc;
-   ssize_t rc;
+   pid_t   ppid;
 
-   rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
-   if (rc < 0)
+   ppid = getppid();
+   if (ppid == 1)
{
-   if (errno == EAGAIN || errno == EWOULDBLOCK)
-   return true;
-   else
-   elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+   elog(FATAL, "postmaster died: our ppid changed to init");
+   return false;
}
-   else if (rc > 0)
-   elog(FATAL, "unexpected data in postmaster death monitoring pipe");
 
-   return false;
+   return true;
 #else  /* WIN32 */
return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
 #endif /* WIN32 */



Re: Errands around AllocateDir()

2017-12-04 Thread Tom Lane
Michael Paquier  writes:
> I had a close look at all the callers of AllocateDir() and noticed a
> couple of unwelcome things (Tom noticed some of those in the thread
> mentioned above, I found others):

The only part of this that seems likely to be controversial is the
decision to get rid of special-case error messages, eg replace
"could not open tablespace directory \"%s\": %m"
with the more generic
"could not open directory \"%s\": %m"

As I said in the previous thread, I don't see anything much wrong
with that; but if anyone doesn't like it, speak now.

Also, I'm inclined to back-patch the exporting of ReadDirExtended,
so that we can rely on it being available if we use it in any
back-patched fixes.  I wouldn't back-patch anything else here that's
not a clear bug fix, but we may need public ReadDirExtended anyway
just to do that much (I didn't really check that yet).

regards, tom lane



Re: [HACKERS] Secondary index access optimizations

2017-12-04 Thread Alvaro Herrera
Konstantin Knizhnik wrote:
> 
> 
> On 30.11.2017 05:16, Michael Paquier wrote:
> > On Mon, Nov 6, 2017 at 10:13 PM, Konstantin Knizhnik
> >  wrote:
> > > Concerning the broken partition_join test: it is an "expected" failure: my patch
> > > removes redundant checks from the plans.
> > > So the only required action is to update the expected file with the new results.
> > > Attached please find updated patch.
> > The last patch version has conflicts in regression tests and did not
> > get any reviews. Please provide a rebased version. I am moving the
> > patch to next CF with waiting on author as status. Thanks!
> 
> Rebased patch is attached.

I don't think this is a rebase on the previously posted patch ... it's
about 10x as big and appears to be a thorough rewrite of the entire
optimizer.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Signals in a BGW

2017-12-04 Thread Chapman Flack
I'm curious about handling signals in a background worker.

As I understand it, these are blocked when the BGW is born, until
enabled with BackgroundWorkerUnblockSignals() after possible
customization.

Is there a known default behavior that some signals will
have if I simply call BackgroundWorkerUnblockSignals() without customizing?
Will SIGINT and SIGTERM, for example, have default handlers that
interact with the volatile flags in miscadmin.h that are used by
CHECK_FOR_INTERRUPTS, and set the process latch?

I notice that the worker_spi example code customizes SIGHUP
and SIGTERM, giving them handlers that set a (different, local
to the module) volatile flag, and set the process latch.

The example code does call CHECK_FOR_INTERRUPTS, though I assume
this won't detect ProcDie at all, now that the custom SIGTERM
handler is setting a different, local flag. Perhaps it still
does something useful with QueryCancel?

Does this use of a custom SIGTERM handler, setting a custom flag,
setting the process latch, and then checked in custom code,
accomplish something important that would not be accomplished
by a generic handler and CHECK_FOR_INTERRUPTS ?

I'm just not clearly grasping the reason it's customized here.
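
For reference, the customization in question looks roughly like this (a
from-memory sketch of the worker_spi pattern, not verbatim code):

    static volatile sig_atomic_t got_sigterm = false;

    static void
    worker_spi_sigterm(SIGNAL_ARGS)
    {
        int         save_errno = errno;

        got_sigterm = true;     /* module-local flag, polled by the main loop */
        SetLatch(MyLatch);      /* wake the worker if it is waiting on its latch */

        errno = save_errno;
    }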

-Chap



Re: [HACKERS] PoC: full merge join on comparison clause

2017-12-04 Thread Alexander Kuzmenkov

Here is the patch rebased to a852cfe9.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index ef9e1ee471..c842ed2968 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-   &op_strategy,
+   &join_op_strategy,
    &op_lefttype,
    &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
- qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+break;
+
+			case BTLessEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+/* fall through */
+
+			case BTLessStrategyNumber:
+parent->mj_UseLesser[iClause] = true;
+break;
+
+			case BTGreaterEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+/* fall through */
+
+			case BTGreaterStrategyNumber:
+parent->mj_UseLesser[iClause] = true;
+break;
+
+			default:
+elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
@@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
  */
-static int
+static MJCompareResult
 MJCompare(MergeJoinState *mergestate)
 {
-	int			result = 0;
+	MJCompareResult result = MJCR_Join;
 	bool		nulleqnull = false;
 	ExprContext *econtext = mergestate->js.ps.ps_ExprContext;
 	int			i;
@@ -408,6 +442,7 @@ MJCompare(MergeJoinState *mergestate)
 	for (i = 0; i < mergestate->mj_NumClauses; i++)
 	{
 		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+		int			

Re: [HACKERS] [POC] Faster processing at Gather node

2017-12-04 Thread Robert Haas
On Sun, Dec 3, 2017 at 10:30 PM, Amit Kapila  wrote:
> I thought there are some cases (though few) where we want to shut down
> the nodes (ExecShutdownNode) earlier and release the resources sooner.
> However, if you are not completely sure about this change, then we can
> leave it as it is.  Thanks for sharing your thoughts.

OK, thanks.  I committed that patch, after first running 100 million
tuples through a Gather over and over again to test for leaks.
Hopefully I haven't missed anything here, but it looks like it's
solid.  Here once again are the remaining patches.  While the
already-committed patches are nice, these two are the ones that
actually produced big improvements in my testing, so it would be good
to move them along.  Any reviews appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-shm-mq-less-spinlocks-v2.patch
Description: Binary data


0002-shm-mq-reduce-receiver-latch-set-v1.patch
Description: Binary data


Re: [HACKERS] pow support for pgbench

2017-12-04 Thread Fabien COELHO


Please add the new function into the documentation table in 
alphabetical order.


Fixed in the attached patch.


Yep. Patch applies cleanly. Make check & pgbench check ok. make html ok. 
POW is in the right place in the table, sorry I did not check before.



What's the name of the backend function whose behavior this matches?

As Fabien has mentioned, it tries to behave as "numeric_power". Maybe
it'd be better if we switched to "dpow" (which is pow with some error
handling) and always returned a double. What do you think?


My 0.02€: I think that having an integer pow implementation when possible
is a good thing for pgbench, because the main use case is to deal with
table keys in benchmarking scripts, which are expected to be integers.
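
A sketch of the sort of script where this matters (the pow() call assumes
the function from this patch; everything else is standard pgbench):

    \set size pow(2, 16)
    \set aid random(1, :size)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

With an integer result, :size can feed directly into random() and key
comparisons; presumably a double result would not be usable there directly.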


--
Fabien.

Re: [HACKERS] Cached plans and statement generalization

2017-12-04 Thread Konstantin Knizhnik



On 30.11.2017 04:59, Michael Paquier wrote:

On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
 wrote:

One more patch passing all regression tests with autoprepare_threshold=1.
I still do not think that it should be switched on by default...

This patch does not apply, and did not get any reviews. So I am moving
it to next CF with waiting on author as status. Please provide a
rebased version. Tsunakawa-san, you are listed as a reviewer of this
patch. If you are not planning to look at it anymore, you may want to
remove your name from the related CF entry
https://commitfest.postgresql.org/16/1150/.


Updated version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index c2a93b2..0e6cc89 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform raw parse tree.
+ *
+ * This function is implementing slightly different approach for tree update than expression_tree_mutator().
+ * Callback is given pointer to pointer to the current node and can update this field instead of returning reference to new node.
+ * It makes it possible to remember changes and easily revert them without extra traversal of the tree.
+ *
+ * This function do not need QTW_DONT_COPY_QUERY flag: it never implicitly copy tree nodes, doing in-place update.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration is immediately stopped and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+SubLink	   *sublink = (SubLink *) node;
+
+if (mutator(&sublink->testexpr, context))
+	return true;
+/* we assume the operName is not interesting */
+if (mutator(&sublink->subselect, context))
+	return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+CaseExpr   *caseexpr = (CaseExpr *) node;
+
+if (mutator(&caseexpr->arg, context))
+	return true;
+/* we assume the mutator doesn't care about CaseWhens, either */
+foreach(temp, caseexpr->args)
+{
+	CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+	Assert(IsA(when, CaseWhen));
+	if (mutator(&when->expr, context))
+		return true;
+	if (mutator(&when->result, context))
+		return true;
+}
+if (mutator(&caseexpr->defresult, context))
+	return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+XmlExpr	   *xexpr = (XmlExpr *) node;
+
+if (mutator(&xexpr->named_args, context))
+	return true;
+/* we assume the mutator doesn't care about arg_names */
+if (mutator(&xexpr->args, context))
+	return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+JoinExpr   *join = (JoinExpr *) node;
+
+if (mutator(&join->larg, context))
+	return true;
+if (mutator(&join->rarg, context))
+	return true;
+if (mutator(&join->quals, context))
+	return true;
+if (mutator(&join->alias, context))
+	return true;
+/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+IntoClause *into = (IntoClause *) node;
+
+if (mutator(&into->rel, context))
+	return true;
+/* colNames, options are deemed uninteresting */
+/* viewQuery 

Error handling (or lack of it) in RemovePgTempFilesInDir

2017-12-04 Thread Tom Lane
The recent changes in commit dc6c4c9dc caused Coverity to start
complaining that RemovePgTempFilesInDir calls rmdir() without checking
its return value, as would be good practice.  Now, this wasn't really a
fault of that commit, because the code was already ignoring the result of
unlink(), but it did cause me to wonder why we're ignoring either one.
Even granted that we don't want to throw ERROR there (because this code
runs in the postmaster, and an ERROR would force postmaster exit), our
usual coding practice would be to print a LOG message about the failure.

Tracing back, the decision to ignore errors altogether seems to originate
with commit 2a6f7ac45, which added this code:

if (strncmp(temp_de->d_name,
PG_TEMP_FILE_PREFIX,
strlen(PG_TEMP_FILE_PREFIX)) == 0)
{
unlink(rm_path);
}
else
{
/*
 * would prefer to use elog here, but it's not
 * up and running during postmaster startup...
 */
fprintf(stderr,
"Unexpected file found in temporary-files directory: %s\n",
rm_path);
}

I'm not real sure that the bit about "elog not being up" was true even
at the time, and it's definitely wrong now; but perhaps thinking that
it wasn't up explains why we chose not to whine about unlink failure.
Or maybe there was no thought that it could fail at all --- it's surely
odd to complain about "this file shouldn't be here" but not about
"I should be able to remove this file but failed to".

Anyway, I'm inclined to reverse that choice and emit LOG messages
reporting failure of any of the lstat, rmdir, or unlink calls in
RemovePgTempFilesInDir.  In the worst case, say that there are a
bunch of leftover temp files in a directory that someone has made
unwritable, this might cause a fair amount of whining in the postmaster
log --- but that's a situation that needs to be fixed anyway, so
I cannot see how not printing anything is a good idea.
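
Roughly what that would look like (a sketch of the direction, not the
committed code):

    if (unlink(rm_path) < 0)
        ereport(LOG,
                (errcode_for_file_access(),
                 errmsg("could not remove file \"%s\": %m", rm_path)));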

I'm also inclined to convert the ReadDir calls in this function to be
ReadDirExtended(..., LOG), as per Michael's proposal in
https://www.postgresql.org/message-id/cab7npqrpocxjiirhmebefhxvtk7v5jvw4bz82p7oimtsm3t...@mail.gmail.com
If we don't want to throw a hard error for failure to remove files,
it seems like throwing an error for failure to read a temp dir isn't
a great choice either.

regards, tom lane



Re: [HACKERS] proposal: psql command \graw

2017-12-04 Thread Pavel Stehule
2017-12-04 9:29 GMT+01:00 Alexander Korotkov :

> On Mon, Dec 4, 2017 at 11:21 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> The problem is that it's hard to read arbitrary formatted psql output
>> from external program (not just gnuplot, but even especially written
>> script).  I made my scripts read few variations, but it doesn't look
>> feasible to read all the combinations.  For sure, it's possible to set
>> format options inside macro, but then it would affect psql format options
>> after execution.
>>
>> This is why I think only one \graw option is just fine, because it
>> produces stable machine-readable output.
>>
>
> Oh, I just realized that in its current state \graw doesn't produce good
> machine-readable output.
>
> # select '|', '|' \graw
> |||
>
> The column separator is a character which can occur in values, and values
> aren't escaped.  Thus, a reader can't correctly divide values between columns
> in all cases.  So, I would rather like to see \graw output in CSV format
> with proper escaping.
>

The current \graw implementation is pretty minimalistic.

Client-side CSV support is an interesting topic.

It could simplify a lot of things.

Regards

Pavel


> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] Surjective functional indexes

2017-12-04 Thread Konstantin Knizhnik

On 30.11.2017 05:02, Michael Paquier wrote:

On Wed, Sep 27, 2017 at 4:07 PM, Simon Riggs  wrote:

On 15 September 2017 at 16:34, Konstantin Knizhnik
 wrote:


Attached please find yet another version of the patch.

Thanks. I'm reviewing it.

Two months later, this patch is still waiting for a review (you are
listed as well as a reviewer of this patch). The documentation of the
patch has conflicts, please provide a rebased version. I am moving
this patch to next CF with waiting on author as status.

Attached please find new patch with resolved documentation conflict.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 0255375..7ec6a0d 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,33 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 The optional WITH clause specifies storage
 parameters for the index.  Each index method has its own set of allowed
-storage parameters.  The B-tree, hash, GiST and SP-GiST index methods all
-accept this parameter:
+storage parameters.  All indexes accept the following parameter:
+   
+
+   
+   
+projection
+
+ 
+   Functional index is based on on projection function: function which extract subset of its argument.
+   In mathematic such functions are called non-injective. For injective function if any attribute used in the indexed
+   expression is changed, then value of index expression is also changed. So to check that index is affected by the
+   update, it is enough to check the set of changed fields. By default this parameters is assigned true value and function is considered
+   as non-injective.
+   In this case change of any of indexed key doesn't mean that value of the function is changed. For example, for
+   the expression expression(bookinfo-'isbn') defined
+   for column of JSON type is changed only when ISBN is changed, which rarely happen. The same is true for most
+   functional indexes. For non-injective functions, Postgres compares values of indexed expression for old and updated tuple and updates
+   index only when function results are different. It allows to eliminate index update and use HOT update.
+   But there are extra evaluations of the functions. So if function is expensive or probability that change of indexed column will not effect
+   the function value is small, then marking index as projection may increase update speed.
+
+
+   
+   
+
+   
+ The B-tree, hash, GiST and SP-GiST index methods all accept this parameter:

 

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index aa9c0f1..0dae5f1 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -131,6 +131,15 @@ static relopt_bool boolRelOpts[] =
 	},
 	{
 		{
+			"projection",
+			"Evaluate functional index expression on update to check if its values is changed",
+			RELOPT_KIND_INDEX,
+			AccessExclusiveLock
+		},
+		true
+	},
+	{
+		{
 			"security_barrier",
 			"View acts as a row security barrier",
 			RELOPT_KIND_VIEW,
@@ -1311,7 +1320,7 @@ fillRelOptions(void *rdopts, Size basesize,
 break;
 			}
 		}
-		if (validate && !found)
+		if (validate && !found && options[i].gen->kinds != RELOPT_KIND_INDEX)
 			elog(ERROR, "reloption \"%s\" not found in parse table",
  options[i].gen->name);
 	}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3acef27..7f356e7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -56,6 +56,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/index.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -74,7 +75,9 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
-
+#include "utils/memutils.h"
+#include "nodes/execnodes.h"
+#include "executor/executor.h"
 
 /* GUC variable */
 bool		synchronize_seqscans = true;
@@ -126,6 +129,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
 	   bool *copy);
+static bool ProjectionIsNotChanged(Relation relation, HeapTuple oldtup, HeapTuple newtup);
 
 
 /*
@@ -3485,6 +3489,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	HTSU_Result result;
 	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
+	Bitmapset  *warm_attrs;
 	Bitmapset  *key_attrs;
 	Bitmapset  *id_attrs;
 	Bitmapset  *interesting_attrs;
@@ -3548,12 +3553,11 @@ 

Re: [HACKERS] pgbench more operators & functions

2017-12-04 Thread Fabien COELHO



I'm not sure why it wasn't failing before, but I have issues building the
doc:

+are built into
pgbench
Missing '/' to close the xref


Indeed, missing xml-ization. The format was still SGML when the patch was 
developed.



+  1  3
Expecting ';' as the previous use ()


Indeed, a typo.


Regenerated v15 that applies cleanly on head. No changes.


Attached v16 fixes those two errors. I regenerated the documentation with
the new xml toolchain, and ran "check" overall and in pgbench.


Thanks for the debugging,

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 94b495e..0708ad2 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -904,14 +904,32 @@ pgbench  options  d
  
   Sets variable varname to a value calculated
   from expression.
-  The expression may contain integer constants such as 5432,
+  The expression may contain the NULL constant,
+  boolean constants TRUE and FALSE,
+  integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
-  function calls, and
-  parentheses.
+  operators
+  with their usual SQL precedence and associativity,
+  function calls,
+  SQL CASE generic conditional
+  expressions and parentheses.
+ 
+
+ 
+  Functions and most operators return NULL on
+  NULL input.
+ 
+
+ 
+  For conditional purposes, non zero numerical values are
+  TRUE, zero numerical values and NULL
+  are FALSE.
+ 
+
+ 
+  When no final ELSE clause is provided to a
+  CASE, the default value is NULL.
  
 
  
@@ -920,6 +938,7 @@ pgbench  options  d
 \set ntellers 10 * :scale
 \set aid (1021 * random(1, 10 * :scale)) % \
(10 * :scale) + 1
+\set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END
 
 

@@ -996,6 +1015,177 @@ pgbench  options  d
   
  
 
+ 
+  Built-In Operators
+
+  
+   The arithmetic, bitwise, comparison and logical operators listed in
+are built into pgbench
+   and may be used in expressions appearing in
+   \set.
+  
+
+  
+   pgbench Operators by increasing precedence
+   
+
+ 
+  Operator
+  Description
+  Example
+  Result
+ 
+
+
+ 
+  OR
+  logical or
+  5 or 0
+  TRUE
+ 
+ 
+  AND
+  logical and
+  3 and 0
+  FALSE
+ 
+ 
+  NOT
+  logical not
+  not false
+  TRUE
+ 
+ 
+  IS [NOT] (NULL|TRUE|FALSE)
+  value tests
+  1 is null
+  FALSE
+ 
+ 
+  ISNULL|NOTNULL
+  null tests
+  1 notnull
+  TRUE
+ 
+ 
+  =
+  is equal
+  5 = 4
+  FALSE
+ 
+ 
+  
+  is not equal
+  5  4
+  TRUE
+ 
+ 
+  !=
+  is not equal
+  5 != 5
+  FALSE
+ 
+ 
+  
+  lower than
+  5  4
+  FALSE
+ 
+ 
+  =
+  lower or equal
+  5 = 4
+  FALSE
+ 
+ 
+  
+  greater than
+  5  4
+  TRUE
+ 
+ 
+  =
+  greater or equal
+  5 = 4
+  TRUE
+ 
+ 
+  |
+  integer bitwise OR
+  1 | 2
+  3
+ 
+ 
+  #
+  integer bitwise XOR
+  1 # 3
+  2
+ 
+ 
+  
+  integer bitwise AND
+  1  3
+  1
+ 
+ 
+  ~
+  integer bitwise NOT
+  ~ 1
+  -2
+ 
+ 
+  
+  integer bitwise shift left
+  1  2
+  4
+ 
+ 
+  
+  integer bitwise shift right
+  8  2
+  2
+ 
+ 
+  +
+  addition
+  5 + 4
+  9
+ 
+ 
+  -
+  substraction
+  3 - 2.0
+  1.0
+ 
+ 
+  *
+  multiplication
+  5 * 4
+  20
+ 
+ 
+  /
+  division (integer truncates the results)
+  5 / 3
+  1
+ 
+ 
+  %
+  modulo
+  3 % 2
+  1
+ 
+ 
+  -
+  opposite
+  - 2.0
+  -2.0
+ 
+
+   
+  
+ 
+
  
   Built-In Functions
 
@@ -1042,6 +1232,13 @@ pgbench  options  d
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
greatest(a [, ... ] )
double if any a is double, else integer
largest value among arguments
@@ -1063,6 +1260,20 @@ pgbench  options  d
2.1
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
+   mod(i, bj)
+   integer
+   modulo
+   mod(54, 32)
+   22
+  
+  
pi()
double
value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9b..770be98 100644
--- a/src/bin/pgbench/exprparse.y
+++ 

Re: [HACKERS] logical decoding of two-phase transactions

2017-12-04 Thread Nikhil Sontakke
PFA, latest patch for this functionality.
This patch contains the following changes as compared to the earlier patch:

- Fixed a bunch of typos and comments

- Modified HeapTupleSatisfiesVacuum to return HEAPTUPLE_RECENTLY_DEAD
if the transaction id is newer than OldestXmin. Doing this only for
CATALOG tables (htup->t_tableOid < (Oid) FirstNormalObjectId).

- Added a filter callback filter_decode_txn_cb_wrapper() to decide if
it's ok to decode the NEXT change record. As of now this filter checks
whether the XID involved got aborted. Additional checks can be
added here as needed (see the sketch after this list).

- Added ABORT callback in the decoding process. This was not needed
before because we always used to decode committed transactions. With
2PC transactions, it is possible that while we are decoding one, another
backend might issue a concurrent ROLLBACK PREPARED. So when
filter_decode_txn_cb_wrapper() gets called, it will tell us not to
decode the next change record. In that case we need to send an ABORT
to the subscriber (and not ROLLBACK PREPARED because we are yet to
issue PREPARE to the subscriber)

- Added all functionality to read the abort command and apply it on
the remote subscriber as needed.

- Added functionality in ReorderBufferCommit() to abort midway based
on the feedback from filter_decode_txn_cb_wrapper()

- Modified LockGXact() and FinishPreparedTransaction() to allow
missing GID in case of "ROLLBACK PREPARED". Currently, this will only
happen in the logical apply code path. We still send it to the
subscriber because it's difficult to identify on the provider if this
transaction was aborted midway through decoding or if it's in PREPARED
state on the subscriber. It will error out as before in all other
cases.

- Totally removed snapshot addition/deletion code while doing the
decoding. That's not needed at all while decoding an ongoing
transaction. The entries in the snapshot are needed for future
transactions to be able to decode older transactions. For 2PC
transactions, we don't need to decode them till COMMIT PREPARED gets
called. This has simplified all that unwanted snapshot push/pop code,
which is nice.
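
To make the filter idea above concrete, a purely illustrative sketch (the
name and shape are made up, not the patch's actual code):

    /*
     * Decide, per change record, whether the transaction being decoded is
     * still worth decoding.  A concurrent ROLLBACK PREPARED may have
     * aborted it while we were streaming its changes.
     */
    static bool
    continue_decoding_txn(TransactionId xid)
    {
        if (TransactionIdDidAbort(xid))
            return false;       /* caller abandons decoding and sends ABORT */

        return true;
    }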

Regards,
Nikhils

On 30 November 2017 at 16:08, Nikhil Sontakke  wrote:
> Hi,
>
>
>>> So perhaps better approach would be to not return
>>> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same
>>> logic we use for deleted tuples of committed transactions) in the
>>> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly
>>> checked HOT pruning and AFAICS the normal HOT pruning (the one not
>>> called by vacuum) also uses the xmin as authoritative even for aborted
>>> txes so nothing needs to be done there probably.
>>>
>>> In case we are worried that this affects cleanups of for example large
>>> aborted COPY transactions and we think it's worth worrying about then we
>>> could limit the new OldestXmin based logic only to catalog tuples as
>>> those are the only ones we need available in decoding.
>>
>>
>> Yeah, if it's limited to catalog tuples only then that sounds good. I was
>> quite concerned about how it'd impact vacuuming otherwise, but if limited to
>> catalogs about the only impact should be on workloads that create lots of
>> TEMPORARY tables then ROLLBACK - and not much on those.
>>
>
> Based on these discussions, I think there are two separate issues here:
>
> 1) Make HeapTupleSatisfiesVacuum() to behave differently for recently
> aborted catalog tuples.
>
> 2) Invent a mechanism to stop a specific logical decoding activity in
> the middle. The reason to stop it could be a concurrent abort, maybe a
> global transaction manager decides to rollback, or any other reason,
> for example.
>
>  ISTM, that for 2, if (1) is able to leave the recently abort tuples
> around for a little bit while (we only really need them till the
> decode of the current change record is ongoing), then we could
> accomplish it via a callback. This callback should be called before
> commencing decode and network send of each change record. In case of
> in-core logical decoding, the callback for pgoutput could check for
> the transaction having aborted (a call to TransactionIdDidAbort() or
> similar such functions), additional logic can be added as needed for
> various scenarios. If it's aborted, we will abandon decoding and send
> an ABORT to the subscribers before returning.
>
> Regards,
> Nikhils



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_logical_04_12_17.patch
Description: Binary data


Re: Re: User defined data types in Logical Replication

2017-12-04 Thread Masahiko Sawada
On Wed, Nov 22, 2017 at 12:25 AM, Huong Dangminh
 wrote:
> Thanks for your response.
> # And sorry again because I could not reply to your gmail
> # address from my environment due to security restriction.

It's okay. I can understand your environment :-)

> Sorry for not replying sooner.
>
>> > Attached draft patch fixed this issue, at least on my environment.
>>
>> It works good for me.
>>
>> > Please review it.
>>
>> I will review it soon.
>
> There is one more case where a user-defined data type is not supported in
> Logical Replication.
> That is when the remote data type's name does not exist on the subscriber.
>
> In relation.c:logicalrep_typmap_gettypname
> we search the OID in syscache by the remote data type's name and map it; if it
> does not exist in syscache,
> we will be faced with the below error.
>
> if (!OidIsValid(entry->typoid))
> ereport(ERROR,
> 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("data type \"%s.%s\" required for 
> logical replication does not exist",
> entry->nspname, 
> entry->typname)));
>
> I think it is not necessary to check typoid here in order to avoid the above
> case, is that right?

I think that's not right. We should end up with an error in the case
where the same type name doesn't exist on the subscriber. With your
proposed patch, logicalrep_typmap_gettypname() can return an invalid
string (entry->typname) in that case, which can cause a SEGV.

> Sorry, I have added this thread to the next CF.

Thank you for adding it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: BUG #14941: Vacuum crashes

2017-12-04 Thread Robert Haas
On Mon, Dec 4, 2017 at 8:59 AM, Lyes Ameddah  wrote:
> sorry guys, yes I'm talking about a FULL VACUUM and not about Auto-Vacuum.
> Thank you very much for your feedback.

OK, but that's not the confusion.  What you said is that it CRASHES,
but the behavior you described is that it BLOCKS waiting for a lock.
Blocking and crashing are not the same thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Would a BGW need shmem_access or database_connection to enumerate databases?

2017-12-04 Thread Craig Ringer
On 1 December 2017 at 23:04, Chapman Flack  wrote:

> On 11/29/2017 05:48 PM, Chapman Flack wrote:
> > I'm thinking of writing a background worker that will enumerate
> > the databases present, and spin off, for each one, another BGW
> > that will establish a connection and do stuff.
>
> Can I even do this?
>
>  "Unlike RegisterBackgroundWorker, which can only be called
>   from within the postmaster, RegisterDynamicBackgroundWorker
>   must be called from a regular backend."
>
> Can I call RegisterDynamicBackgroundWorker when not in the postmaster,
> but also not in a "regular backend", but rather another BGW?
>

Yes. BDR does it a lot.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Would a BGW need shmem_access or database_connection to enumerate databases?

2017-12-04 Thread Craig Ringer
On 30 November 2017 at 06:48, Chapman Flack  wrote:

> I'm thinking of writing a background worker that will enumerate
> the databases present, and spin off, for each one, another BGW
> that will establish a connection and do stuff.
>
> For the "master" one, what capabilities will it need to simply
> enumerate the current names of known databases? I suppose I could
> have it connect to the null dbname and query pg_database. Would
> that be the civilized way to do it, or am I missing a simpler way?
>

pglogical does exactly this. Take a look at start_manager_workers in
pglogical.c

https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical.c#L594

and the caller pglogical_supervisor_main .
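
(The enumeration itself can be as simple as the query below; a sketch,
assuming the worker has already initialized a database connection.)

    SELECT datname
    FROM pg_database
    WHERE datallowconn AND NOT datistemplate;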


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Add GROUPS option to the Window Functions

2017-12-04 Thread Oliver Ford
Adds the GROUPS option to the window framing clause. This further
resolves TODO list item "Implement full support for window framing
clauses" and implements SQL:2011 T620. No other mainstream db has this
feature.

Apply this on top of my previous patch available here:
https://www.postgresql.org/message-id/CAGMVOdvETRCKpeT06Uoq5RsNUOdH7d1iYy7C1Pze%3DL5%3DgBzs-Q%40mail.gmail.com

== Specification ==

The GROUPS option is defined in SQL:2011 in addition to ROWS and
RANGE. Where ROWS calculates frame bounds by the number of rows before
and after the current row, and RANGE by the values of an ORDER BY
column, GROUPS calculates frame bounds by the number of changes to the
values of the ORDER BY columns.

GROUPS behaves similarly to RANGE in that if two rows are peers, they
are both included in the frame. A row is out of frame if it is both
not a peer of the current row and also outside of the bounds specified
by start_value and end_value. Note that if neither start_value nor
end_value is specified, then GROUPS will always produce the same
results as RANGE. So UNBOUNDED PRECEDING AND CURRENT ROW, or CURRENT
ROW AND UNBOUNDED FOLLOWING produce the same results in GROUPS and
RANGE mode (the syntax is slightly confusing as CURRENT ROW in these
modes includes peers of the actual current row).

The standard also defines an EXCLUDE GROUP option which excludes the
current row and any peers from the frame. This can be used in all
three modes, and is included in the patch.
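
A hypothetical example of the proposed syntax, to make the difference
concrete (not taken from the patch's regression tests):

    -- The frame runs from one peer group before the current row's group to
    -- one peer group after it, however many rows each group contains.
    SELECT x,
           sum(x) OVER (ORDER BY x
                        GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS s
    FROM (VALUES (1), (2), (2), (3)) AS t(x);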

== Performance Considerations ==

The code calculates the size of each window group for every partition
and stores this in a dynamic array. I chose 16 as the initial capacity
of the array, which doubles as needed. Real-world testing may show
that a lower or higher initial capacity is preferable for the majority
of use cases. The code also calls pfree on this array at the end of
each partition, to avoid memory hogging if there are many partitions.

== Testing ==

Tested on Windows with MinGW. All existing regression tests pass. New
tests and updated documentation are included. Tests show the results of
the GROUPS option and the EXCLUDE GROUP option also working in RANGE
and ROWS mode.


0001-window-groups-v1.patch
Description: Binary data


Re: using index or check in ALTER TABLE SET NOT NULL

2017-12-04 Thread Sergei Kornilov
Hello
I updated the patch and also rebased it to current head. I hope it is now
better aligned with the project style.

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dd4a8d3..36bcb3f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1227,7 +1227,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 	 * not contain any row that would belong to the new partition, we can
 	 * avoid scanning the default partition.
 	 */
-	if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
+	if (ConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
 	{
 		ereport(INFO,
 (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
@@ -1271,8 +1271,8 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 			 * that it will not contain any row that would belong to the new
 			 * partition, we can avoid scanning the child table.
 			 */
-			if (PartConstraintImpliedByRelConstraint(part_rel,
-	 def_part_constraints))
+			if (ConstraintImpliedByRelConstraint(part_rel,
+ def_part_constraints))
 			{
 ereport(INFO,
 		(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d979ce2..35eac39 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -162,7 +162,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
-	bool		new_notnull;	/* T if we added new NOT NULL constraints */
+	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
 	bool		chgPersistence; /* T if SET LOGGED/UNLOGGED is used */
@@ -370,6 +370,7 @@ static ObjectAddress ATExecDropNotNull(Relation rel, const char *colName, LOCKMO
 static void ATPrepSetNotNull(Relation rel, bool recurse, bool recursing);
 static ObjectAddress ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
  const char *colName, LOCKMODE lockmode);
+static bool NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr);
 static ObjectAddress ATExecColumnDefault(Relation rel, const char *colName,
 	Node *newDefault, LOCKMODE lockmode);
 static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
@@ -4304,7 +4305,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
 			 * Test the current data within the table against new constraints
 			 * generated by ALTER TABLE commands, but don't rebuild data.
 			 */
-			if (tab->constraints != NIL || tab->new_notnull ||
+			if (tab->constraints != NIL || tab->verify_new_notnull ||
 tab->partition_constraint != NULL)
 ATRewriteTable(tab, InvalidOid, lockmode);
 
@@ -4465,7 +4466,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	}
 
 	notnull_attrs = NIL;
-	if (newrel || tab->new_notnull)
+	if (newrel || tab->verify_new_notnull)
 	{
 		/*
 		 * If we are rebuilding the tuples OR if we added any new NOT NULL
@@ -5461,7 +5462,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		 * null, but since it's filled specially, there's no need to test
 		 * anything.)
 		 */
-		tab->new_notnull |= colDef->is_not_null;
+		tab->verify_new_notnull |= colDef->is_not_null;
 	}
 
 	/*
@@ -5863,8 +5864,11 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
 
 		CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);
 
-		/* Tell Phase 3 it needs to test the constraint */
-		tab->new_notnull = true;
+		if (!NotNullImpliedByRelConstraints(rel, (Form_pg_attribute) GETSTRUCT(tuple)))
+		{
+			/* Tell Phase 3 it needs to test the constraint */
+			tab->verify_new_notnull = true;
+		}
 
 		ObjectAddressSubSet(address, RelationRelationId,
 			RelationGetRelid(rel), attnum);
@@ -5881,6 +5885,46 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
 }
 
 /*
+ * NotNullImpliedByRelConstraints
+ *		Does rel's existing constraints imply NOT NULL for given attribute
+ */
+static bool
+NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr)
+{
+	List	   *notNullConstraint = NIL;
+	NullTest   *nnulltest = makeNode(NullTest);
+
+	nnulltest->arg = (Expr *) makeVar(1,
+	  attr->attnum,
+	  attr->atttypid,
+	  attr->atttypmod,
+	  attr->attcollation,
+	  0);
+	nnulltest->nulltesttype = IS_NOT_NULL;
+
+	/*
+	 * same thing as in ConstraintImpliedByRelConstraint argisrow=false is
+	 * correct even for a composite column, because attnotnull does not
+	 * represent a SQL-spec IS NOT NULL test in such a case, just IS DISTINCT
+	 * FROM NULL.
+	 */
+	nnulltest->argisrow = false;
+	

Re: BUG #14941: Vacuum crashes

2017-12-04 Thread Lyes Ameddah
sorry guys, yes I'm talking about a FULL VACUUM and not about Auto-Vacuum.
Thank you very much for your feedback.

That's what I do:
vacuum FULL VERBOSE ANALYZE;
reindex database postgres;

I would be happy if there were a patch to fix that!

2017-12-01 22:16 GMT+01:00 Bossart, Nathan :

> On 12/1/17, 3:04 PM, "Robert Haas"  wrote:
> > On Fri, Dec 1, 2017 at 1:53 PM, Bossart, Nathan 
> wrote:
> >> There is already an internal flag called VACOPT_NOWAIT that gives
> >> autovacuum this ability in certain cases (introduced in the
> >> aforementioned commit), and I recently brought up this potential
> >> use-case as justification for a patch [0].  I'd be happy to submit a
> >> patch for providing VACOPT_NOWAIT to users if others think it's
> >> something we should do.
> >
> > Seems entirely reasonable to me, provided that we only update the
> > extensible-options syntax:
> >
> > VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
> >
> > I don't want to add any more options to the older syntax:
> >
> > VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
>
> Right.  This seems like the right path forward to me, as the
> VACUUM documentation states that the unparenthesized syntax is
> deprecated, and the DISABLE_PAGE_SKIPPING option was not added
> to the old syntax, either.
>
> > I am slightly confused as to how we got on to this topic since the
> > subject is "Vacuum crashes", but perhaps Lyes was simply speaking
> > imprecisely.
>
> I'm hoping Lyes chimes in soon to clarify if I am interpreting
> the original report incorrectly.
>
> Nathan
>
>
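
For illustration only, a sketch of what a user-facing option could look
like under the extensible syntax; the NOWAIT spelling below is just a
placeholder borrowed from the internal flag, not a committed interface,
and the table name is arbitrary:

-- Hypothetical: skip relations whose locks cannot be acquired immediately,
-- exposed only through the parenthesized option list.
VACUUM (VERBOSE, NOWAIT) pgbench_accounts;
-- The deprecated unparenthesized form would be left untouched:
VACUUM FULL VERBOSE ANALYZE;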


-- 
Lyes AMEDDAH
Mobile: 06 66 24 50 70
Titre RNCP I - Web Development
HiTema


Re: [HACKERS] pgbench more operators & functions

2017-12-04 Thread Raúl Marín Rodríguez
Hi,


> Regenerated v15 that applies cleanly on head. No changes.


I'm not sure why it wasn't failing before, but I have issues building the
doc:

+are built into
pgbench
Missing '/' to close the xref

+  1  3
Expecting ';' as the previous use ()

On Fri, Dec 1, 2017 at 1:57 PM, Fabien COELHO  wrote:

>
> Here is a v13. No code changes, but TAP tests added to maintain pgbench
> coverage to green.
>

>> Here is a v14, which is just a rebase after the documentation xml-ization.
>>
>
> Regenerated v15 that applies cleanly on head. No changes.
>
> --
> Fabien.




-- 

Raúl Marín Rodríguez
carto.com


RE: Re: User defined data types in Logical Replication

2017-12-04 Thread Huong Dangminh
Hi,

> From: Huong Dangminh [mailto:huo-dangm...@ys.jp.nec.com]
> I attached a patch based on Sawada-san's patch, with some messages
> modified and the above check removed.
> Could somebody check it for me, or should I add it to a CF?

Sorry, I have added this thread to the next CF.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/



Re: Doc tweak for huge_pages?

2017-12-04 Thread Adrien Nayrat
On 12/01/2017 05:35 AM, Thomas Munro wrote:
>>  since it also
>> supports "transparent" hugepages.
> Hmm.  Yeah, it does, but apparently it's not so transparent.

+1. We saw a performance drop with transparent_hugepage enabled on servers
with more than 256GB of RAM. Access to the cache slowed down when the kernel
tried to defragment pages.

When that happened, isolate_freepages_block showed up among the most
CPU-consuming functions in perf top.

Thanks to Marc Cousin's analysis, setting
/sys/kernel/mm/transparent_hugepage/enabled to "madvise" solved the problem.
He also noticed that THP only works for "anonymous memory mappings"[1]
(shared_buffers is not anonymous).

1: https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Regards,

-- 
Adrien NAYRAT





Re: [HACKERS] proposal: psql command \graw

2017-12-04 Thread Alexander Korotkov
On Mon, Dec 4, 2017 at 11:21 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> The problem is that it's hard to read arbitrarily formatted psql output
> from an external program (not just gnuplot, but even a specially written
> script).  I made my scripts read a few variations, but it doesn't look
> feasible to handle all the combinations.  For sure, it's possible to set
> format options inside a macro, but then it would affect psql format
> options after execution.
>
> This is why I think a single \graw option is just fine, because it
> produces stable machine-readable output.
>

Oh, I just realized that in its current state \graw doesn't produce good
machine-readable output.

# select '|', '|' \graw
|||

The column separator is a character that can occur in values, and values
aren't escaped.  Thus, a reader can't correctly split values into columns in
all cases.  So, I would rather see \graw output in CSV format with proper
escaping.
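
For comparison, a small sketch of the escaping CSV gives you, using COPY,
which already implements it (the values are chosen just to show the
quoting):

-- Values containing the delimiter or quotes are quoted, and embedded
-- quotes are doubled, so a reader can always split the columns.
COPY (SELECT 'a,b' AS x, 'c"d' AS y) TO STDOUT WITH (FORMAT csv);
-- prints: "a,b","c""d"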

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] proposal: psql command \graw

2017-12-04 Thread Alexander Korotkov
On Fri, Dec 1, 2017 at 6:36 PM, Robert Haas  wrote:

> On Fri, Dec 1, 2017 at 12:16 AM, Michael Paquier
>  wrote:
> > On Sat, Nov 11, 2017 at 12:57 AM, Pavel Stehule 
> wrote:
> >> 2017-11-10 16:38 GMT+01:00 Fabien COELHO :
> >>> So I switched the patch to "ready for committer".
> >>
> >> Thank you very much
> >
> > Patch moved to CF 2018-01 with same status: ready for committer.
>
> I vote to reject this patch.  It doesn't do anything that you can't
> already do; it just adds some syntactic sugar.  And that syntactic
> sugar saves only a handful of keystrokes.  If you want unaligned,
> tuples-only mode, you can set it in 5 keystrokes:
>
> rhaas=# \a\t
> Output format is unaligned.
> Tuples only is on.
>
> If you use this command, it takes 4 keystrokes; instead of ending your
> command with a semicolon (1 character) you end it with \graw (5
> characters).
>
> Now, granted, \graw lets you set those options for a single command
> rather than persistently, but I'm just not very interested in having a
> bunch of \g options that enable various combinations of
> options.  Soon we'll have a thicket of \g variants that force
> whichever combinations of options particular developers like to use,
> and if \graw is any indication, the \g variant won't
> necessarily look anything like the normal way of setting those
> options.  And that's not easy to fix, either: \graw could be spelled
> \gat since it forces \a on and \t on, but somebody's bound to
> eventually propose a variant that sets an option that has no
> single-character shorthand.
>

For sure, a bunch of \g options doesn't look appealing.  But I think the
\graw option would be quite useful even if it remained the only extra form
of \g forever.

Our command line client psql has two great features:
1. Output format customization, including borders, separators, spacing,
titles and so on.
2. The ability to export data into files or pipes.  It's even more powerful
with macros, which let you do cool things with your data in a few
keystrokes.  I have a couple of examples on my blog, but for sure it's
possible to do much more.
http://akorotkov.github.io/blog/2015/08/26/psql-gdb-attach/
http://akorotkov.github.io/blog/2016/06/09/psql-graph/

The problem is that it's hard to read arbitrarily formatted psql output
from an external program (not just gnuplot, but even a specially written
script).  I made my scripts read a few variations, but it doesn't look
feasible to handle all the combinations.  For sure, it's possible to set
format options inside a macro, but then it would affect psql format options
after execution.
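
A tiny illustration of that leak, assuming a plain interactive session (the
query and output file are arbitrary):

\a\t
SELECT x, x * x FROM generate_series(1, 5) AS x \g /tmp/data.txt
\a\t

If the second \a\t is forgotten, every later query in the session keeps the
unaligned, tuples-only output, which is exactly the per-command leak a
one-shot variant like \graw avoids.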

This is why I think a single \graw option is just fine, because it produces
stable machine-readable output.  Even if someone needs another format, she
can write a conversion script and add it to her macro.  Thus, I wouldn't
like to see this feature rejected without a better alternative being
proposed.

Alternatively, we could introduce the ability to change psql output format
options for a single line, with the options set back to their original
values after the current line is executed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company