Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread Amit Langote
Just realized something...

On 2018/07/03 15:29, Amit Langote wrote:
> Sorry for jumping in late here.  I have a comment on the patch.
> 
> + /* if there are no partitions then treat this as non-inheritance case. */
> + if (partdesc->nparts == 0)
> + {
> + parentrte->inh = false;
> + return;
> + }
> +
> 
> Why is this not near the beginning of expand_partitioned_rtentry()?

This one still stands I think.

> Also, ISTM, this code would be unreachable because
> expand_inherited_rtentry would not call here if the above if statement is
> true, no?

I forgot that expand_partitioned_rtentry() will recursively call itself if
a partition is itself a partitioned table, in which case the above code helps.

Thanks,
Amit




Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 03:29:36PM +0900, Amit Langote wrote:
> Why is this not near the beginning of expand_partitioned_rtentry()?
> 
> Also, ISTM, this code would be unreachable because
> expand_inherited_rtentry would not call here if the above if statement is
> true, no?

FWIW, I understood that the intention here is to be careful,
particularly if expand_partitioned_rtentry begins to get called from a
different code path in the future, which is something that would likely
happen.  We could replace that with an assertion or even an elog(), and
change this code again in the future, but what's proposed here makes
quite some sense to me as well.
--
Michael




Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread Amit Langote
On 2018/07/03 15:16, David Rowley wrote:
> On 3 July 2018 at 18:11, Michael Paquier  wrote:
>> On Tue, Jul 03, 2018 at 06:00:46PM +1200, David Rowley wrote:
>>> I think it should be backpatched to v11 and v10. Your original commit
>>> went there too. I don't see any reason to do any different here than
>>> what you did with the original commit.
>>
>> expand_partitioned_rtentry is new as of v11.  Or you mean to tweak
>> expand_inherited_rtentry() perhaps?  I am not sure that it is worth it
>> as the code has already diverged between 10 and 11.
> 
> Oh right. I'd forgotten that changed in v11. I think the v10 code is
> fine as is then.

Sorry for jumping in late here.  I have a comment on the patch.

+   /* if there are no partitions then treat this as non-inheritance case. */
+   if (partdesc->nparts == 0)
+   {
+   parentrte->inh = false;
+   return;
+   }
+

Why is this not near the beginning of expand_partitioned_rtentry()?

Also, ISTM, this code would be unreachable because
expand_inherited_rtentry would not call here if the above if statement is
true, no?

I see the following two blocks in expand_inherited_rtentry before one gets
to the call to expand_partitioned_rtentry:

if (!has_subclass(parentOID))
{
/* Clear flag before returning */
rte->inh = false;
return;
}

and

if (list_length(inhOIDs) < 2)
{
/* Clear flag before returning */
rte->inh = false;
return;
}

Thanks,
Amit




Re: Possible bug in logical replication.

2018-07-02 Thread Arseny Sher


Michael Paquier  writes:

> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
>> Could it be possible to get a patch from all the feedback and exchange
>> gathered here?  Petr, I think that it would not hurt if you use the set
>> of words and comments you think is most adapted as the primary author of
>> the feature.
>
> I have seen no patch, so attached is one to finally close the loop and
> this open item, which includes both my suggestions and what Arseny has
> mentioned based on the latest emails exchanged.  Any objections to that?

I'm practically happy with this.

>  * while confirmed_lsn is used as base point for the decoding context.

This line is excessive now that we have a comment below saying it doesn't
matter.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread David Rowley
On 3 July 2018 at 18:11, Michael Paquier  wrote:
> On Tue, Jul 03, 2018 at 06:00:46PM +1200, David Rowley wrote:
>> I think it should be backpatched to v11 and v10. Your original commit
>> went there too. I don't see any reason to do any different here than
>> what you did with the original commit.
>
> expand_partitioned_rtentry is new as of v11.  Or you mean to tweak
> expand_inherited_rtentry() perhaps?  I am not sure that it is worth it
> as the code has already diverged between 10 and 11.

Oh right. I'd forgotten that changed in v11. I think the v10 code is
fine as is then.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 06:00:46PM +1200, David Rowley wrote:
> Thanks for fixing it up. It looks fine apart from "Temporation" should
> be "Temporary".

Of course, thanks.

> I think it should be backpatched to v11 and v10. Your original commit
> went there too. I don't see any reason to do any different here than
> what you did with the original commit.

expand_partitioned_rtentry is new as of v11.  Or you mean to tweak
expand_inherited_rtentry() perhaps?  I am not sure that it is worth it
as the code has already diverged between 10 and 11.
--
Michael




Re: Should contrib modules install .h files?

2018-07-02 Thread Pavel Stehule
2018-07-03 6:43 GMT+02:00 Craig Ringer :

> On 2 July 2018 at 02:23, Andrew Gierth 
> wrote:
>
>> So I have this immediate problem: a PGXS build of a module, specifically
>> an hstore transform for a non-core PL, is much harder than it should be
>> because it has no way to get at hstore.h since that file is never
>> installed anywhere.
>>
>> Should that be changed?
>>
>>
> I think there's agreement in the thread that it should, and strong +1 from
> me.
>

+1

A similar issue exists with plpgsql. Its header files are not exported in a
generic way.

Regards

Pavel


> I just wanted to pipe up with something Petr pointed out during pglogical
> development, which is that Pg offers a handy tool to help extensions link
> up with each other - find_rendezvous_variable(...) from dfmgr.c / fmgr.h
> .
>
> It's a real shame it's not more visible in contrib/ examples and the docs.
> Any suggestions on where it should appear in the docs? Somewhere in
> extend.sgml, presumably.
>
> You still need a header from the other extension to *use* it, but it
> provides a massively easier way to find a struct of API function pointers.
> Prior to using it, I had a hack where I dlopen()ed the other shared library
> directly, and I'd also trialled using the fmgr to call a 'returns internal'
> function to get the API pointers struct that way.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread David Rowley
On 3 July 2018 at 16:55, Michael Paquier  wrote:
> Okay, the patch looks logically correct to me, I just tweaked the
> comments as per the attached.  I would also back-patch that down to v11
> to keep the code consistent with HEAD..  What do you think?

Thanks for fixing it up. It looks fine apart from "Temporation" should
be "Temporary".

I think it should be backpatched to v11 and v10. Your original commit
went there too. I don't see any reason to do any different here than
what you did with the original commit.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Possible bug in logical replication.

2018-07-02 Thread Alvaro Herrera
On 2018-Jul-03, Michael Paquier wrote:

> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> > Could it be possible to get a patch from all the feedback and exchange
> > gathered here?  Petr, I think that it would not hurt if you use the set
> > of words and comments you think is most adapted as the primary author of
> > the feature.
> 
> I have seen no patch, so attached is one to finally close the loop and
> this open item, which includes both my suggestions and what Arseny has
> mentioned based on the latest emails exchanged.  Any objections to that?

Let me review tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Clarify use of temporary tables within partition trees

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 12:59:33PM +1200, David Rowley wrote:
> On 3 July 2018 at 10:16, Michael Paquier  wrote:
>> On Mon, Jul 02, 2018 at 02:07:37PM -0400, Robert Haas wrote:
>>> I'd rather keep an elog(ERROR) than completely remove the check.
>>
>> +1.
>
> Attached

Okay, the patch looks logically correct to me, I just tweaked the
comments as per the attached.  I would also back-patch that down to v11
to keep the code consistent with HEAD..  What do you think?
--
Michael
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2d470240d5..71e256f8fd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1681,7 +1681,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
 	int			i;
 	RangeTblEntry *childrte;
 	Index		childRTindex;
-	bool		has_child = false;
 	PartitionDesc partdesc = RelationGetPartitionDesc(parentrel);
 
 	check_stack_depth();
@@ -1707,6 +1706,16 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
 	top_parentrc, parentrel,
 	appinfos, &childrte, &childRTindex);
 
+	/*
+	 * If the partitioned table has no partitions, treat this as the
+	 * non-inheritance case.
+	 */
+	if (partdesc->nparts == 0)
+	{
+		parentrte->inh = false;
+		return;
+	}
+
 	for (i = 0; i < partdesc->nparts; i++)
 	{
 		Oid			childOID = partdesc->oids[i];
@@ -1715,15 +1724,13 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
 		/* Open rel; we already have required locks */
 		childrel = heap_open(childOID, NoLock);
 
-		/* As in expand_inherited_rtentry, skip non-local temp tables */
+		/*
+		 * Temporation partitions belonging to other sessions should have been
+		 * disallowed at definition, but for paranoia's sake, let's double
+		 * check.
+		 */
 		if (RELATION_IS_OTHER_TEMP(childrel))
-		{
-			heap_close(childrel, lockmode);
-			continue;
-		}
-
-		/* We have a real partition. */
-		has_child = true;
+			elog(ERROR, "temporary relation from another session found as partition");
 
 		expand_single_inheritance_child(root, parentrte, parentRTindex,
 		parentrel, top_parentrc, childrel,
@@ -1738,14 +1745,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
 		/* Close child relation, but keep locks */
 		heap_close(childrel, NoLock);
 	}
-
-	/*
-	 * If the partitioned table has no partitions or all the partitions are
-	 * temporary tables from other backends, treat this as non-inheritance
-	 * case.
-	 */
-	if (!has_child)
-		parentrte->inh = false;
 }
 
 /*




Re: Should contrib modules install .h files?

2018-07-02 Thread Craig Ringer
On 2 July 2018 at 02:23, Andrew Gierth  wrote:

> So I have this immediate problem: a PGXS build of a module, specifically
> an hstore transform for a non-core PL, is much harder than it should be
> because it has no way to get at hstore.h since that file is never
> installed anywhere.
>
> Should that be changed?
>
>
I think there's agreement in the thread that it should, and strong +1 from
me.

I just wanted to pipe up with something Petr pointed out during pglogical
development, which is that Pg offers a handy tool to help extensions link
up with each other - find_rendezvous_variable(...) from dfmgr.c / fmgr.h .

It's a real shame it's not more visible in contrib/ examples and the docs.
Any suggestions on where it should appear in the docs? Somewhere in
extend.sgml, presumably.

You still need a header from the other extension to *use* it, but it
provides a massively easier way to find a struct of API function pointers.
Prior to using it, I had a hack where I dlopen()ed the other shared library
directly, and I'd also trialled using the fmgr to call a 'returns internal'
function to get the API pointers struct that way.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-07-02 Thread Ashutosh Bapat
On Tue, Jul 3, 2018 at 8:19 AM, Robert Haas  wrote:
> On Mon, Jul 2, 2018 at 1:46 AM, Ashutosh Bapat
>  wrote:
>> This constraint was added to the partitioned table and inherited from
>> there. If user wants to drop that constraint for some reason, this
>> error message doesn't help. The error message tells why he can't drop
>> it, but doesn't tell, directly or indirectly, the user what he should
>> do in order to drop it.
>
> That doesn't really sound like an actual problem to me.  If the error
> is that the constraint is inherited, that suggests that the solution
> is to find the place from which it got inherited and drop it there.
> And that's in fact what you have to do.  What's the problem?  I mean,
> we could add a hint, but it's possible to make yourself sound dumb by
> giving hints that are basically obvious implications from the error
> message.  Nobody wants this sort of thing:
>
> rhaas=# drop table foo;
> ERROR:  table "foo" does not exist
> HINT: Try dropping a table with a different name that does exist, or
> first create this table before trying to drop it.

Again a wrong example and wrong comparison. I think I was clear about
the problem when I wrote

--
When there was only a single way to "inherit" things, i.e. table
inheritance, users could probably guess that. But now that there are
multiple ways to inherit things, we have to help the user a bit more.
The user might figure out that it's a partition of a table, so the
constraint is inherited from the partitioned table. But it will
help if we give a hint about from where the constraint was inherited
like the error message itself reads like "can not drop constraint
"p_a_check" on relation "p1" inherited from "partitioned table's name"
OR a hint "you may drop constraint "p_a_check" on the partitioned
table "partitioned table's name".
--

For some reason you have chosen to remove this from the email and
commented on previous part of it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Possible bug in logical replication.

2018-07-02 Thread Michael Paquier
On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> Could it be possible to get a patch from all the feedback and exchange
> gathered here?  Petr, I think that it would not hurt if you use the set
> of words and comments you think is most adapted as the primary author of
> the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged.  Any objections to that?
--
Michael
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index c2d0e0c723..7e10a027ca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..6c2addd5b7 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -343,10 +343,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * Helper function for advancing logical replication slot forward.
  * The slot's restart_lsn is used as start point for reading records,
  * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just use LogicalConfirmReceivedLocation to update
+ * confirmed_flush_lsn, we had better digest WAL to advance restart_lsn
+ * allowing to recycle WAL and old catalog tuples.  As decoding is done
+ * with fast_forward mode, no changes are generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +358,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Note that start_lsn does not matter here, as with fast_forward mode
+		 * no transactions are replayed.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	NIL,
 	true,
@@ -378,6 +381,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			XLogRecord *record;
 			char	   *errm = NULL;
 
+			/*
+			 * Start reading WAL at the slot's restart_lsn, which certainly
+			 * points to the valid record.
+			 */
 			record = XLogReadRecord(ctx->reader, startlsn, &errm);
 			if (errm)
 elog(ERROR, "%s", errm);
@@ -389,8 +396,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, and
+			 * that the slot's data is still updated.
 			 */
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);




Re: Copy function for logical replication slots

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 04:31:32PM +0900, Masahiko Sawada wrote:
> Attached an updated patch including copy function support for logical
> slots as well as physical slots. Please review it.

I had a look at this patch.

As the output plugin can be changed for logical slots, having two
functions is a good thing.

+# Basic decoding works using the copied slot
+$result = $node_master->safe_psql('postgres',
+   qq[SELECT pg_logical_slot_get_changes('copy_slot', NULL, NULL);]);
+is(scalar(@foobar = split /^/m, $result),
+   12, 'Decoding produced 12 rows inc BEGIN/COMMIT using the copied slot');

This could also live as part of the test suite for test_decoding, which
would be faster as well.  There are more scenarios that could be tested
(a rough sketch follows this list):
- Copy a temporary slot to become permanent and switch its plugin type,
then check the state of the slot in pg_replication_slots.
- Do the reverse, aka copy a permanent slot and make it temporary.
- Checking that the default behaviors are preserved: if persistent is
not changed then the new slot should have the same persistence as the
origin.  The same applies for the output plugin, and for both logical
and replication slots.
- Check that the reserved LSN is the same for both the origin and the
target.
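
For instance, the first scenario could look roughly like this, assuming the
copy functions end up with signatures along the lines of
pg_copy_logical_replication_slot(src, dst, temporary, plugin) -- the names
and argument order here are assumptions, not necessarily what the patch uses:

  -- create a temporary logical slot, copy it to a permanent one while
  -- switching the output plugin, then check the catalog view
  SELECT pg_create_logical_replication_slot('orig_slot', 'test_decoding', true);
  SELECT pg_copy_logical_replication_slot('orig_slot', 'copy_slot', false, 'pgoutput');
  SELECT slot_name, plugin, temporary, restart_lsn
    FROM pg_replication_slots
   WHERE slot_name IN ('orig_slot', 'copy_slot');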

+Copy a existing src_slot_name physical slot
+to dst_slot_name. The copied physical slot starts
+to reserve WAL from the same LSN as the source slot if the source slot
+already reserves WAL. temporary is optional. If
+temporary is omitted, the same value of the
+source slot is used.

Typo here.  Copy AN existing slot.  I would reword that a bit:
Copies an existing physical replication slot name src_slot_name to a
physical replication slot named dst_slot_name.
Extra one: "the same value AS the source slot is used."

+Copy a existing src_slot_name logical (decoding) 
slot
+to dst_slot_name while changing the output plugin
+and persistence.

There may be no need for "decoding" here, the same phrasing as suggested
above would be nicer I think.  For LSN I would suggest to add an
 markup.

I am not sure if it makes much sense to be able to copy from a slot
which has not yet been used to reserve WAL, but if the point is to change
the properties of a slot, I can see that forbidding the behavior would not
be really intuitive either.

-   ReplicationSlotReserveWal();
+   /* Find start location to read WAL if not specified */
+   if (XLogRecPtrIsInvalid(start_lsn))
+   ReplicationSlotReserveWal();
+   else
+   {
+   SpinLockAcquire(&slot->mutex);
+   slot->data.restart_lsn = start_lsn;
+   SpinLockRelease(&slot->mutex);
+   }
Hmm.  Instead of all this stanza in CreateInitDecodingContext(), I would
have imagined that what should happen is that the new fresh slot gets
created with the same status data as the origin.  This saves quite a bit
of extra post-creation computing, and we are also sure that the origin
slot has consistent data as it is owned by the process doing the copy.

One property which seems important to me is to make sure that the target
slot has the same data as the origin slot once the caller knows that the
copy has completed, and not that the target slot may perhaps have the
same data as the origin while creating the target.  Do you see the
difference?  Your patch does the latter, because it creates the new slot
after releasing the lock it held on the origin, while I would think that
the former is more important, aka keep the lock for the duration of the
copy.  I am not completely sure if that's a property we want to keep,
but that deserves discussion as we should not do a copy while the origin
slot may still be consumed in parallel.  For physical slot the copy is
straight-forward, less for logical slots.
--
Michael




Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-07-02 Thread Amit Langote
On 2018/07/03 11:49, Robert Haas wrote:
> On Mon, Jul 2, 2018 at 1:46 AM, Ashutosh Bapat
>  wrote:
>> This constraint was added to the partitioned table and inherited from
>> there. If user wants to drop that constraint for some reason, this
>> error message doesn't help. The error message tells why he can't drop
>> it, but doesn't tell, directly or indirectly, the user what he should
>> do in order to drop it.
> 
> That doesn't really sound like an actual problem to me.  If the error
> is that the constraint is inherited, that suggests that the solution
> is to find the place from which it got inherited and drop it there.
> And that's in fact what you have to do.  What's the problem?  I mean,
> we could add a hint, but it's possible to make yourself sound dumb by
> giving hints that are basically obvious implications from the error
> message.  Nobody wants this sort of thing:
> 
> rhaas=# drop table foo;
> ERROR:  table "foo" does not exist
> HINT: Try dropping a table with a different name that does exist, or
> first create this table before trying to drop it.
> 
> A hint here wouldn't be as silly as that, but I think it is
> unnecessary.  I doubt there's likely to be much confusion here.

I have to agree with that.  "cannot drop inherited ..." conveys enough for
a user to find out more.

Thanks,
Amit




Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-07-02 Thread Robert Haas
On Mon, Jul 2, 2018 at 1:46 AM, Ashutosh Bapat
 wrote:
> This constraint was added to the partitioned table and inherited from
> there. If user wants to drop that constraint for some reason, this
> error message doesn't help. The error message tells why he can't drop
> it, but doesn't tell, directly or indirectly, the user what he should
> do in order to drop it.

That doesn't really sound like an actual problem to me.  If the error
is that the constraint is inherited, that suggests that the solution
is to find the place from which it got inherited and drop it there.
And that's in fact what you have to do.  What's the problem?  I mean,
we could add a hint, but it's possible to make yourself sound dumb by
giving hints that are basically obvious implications from the error
message.  Nobody wants this sort of thing:

rhaas=# drop table foo;
ERROR:  table "foo" does not exist
HINT: Try dropping a table with a different name that does exist, or
first create this table before trying to drop it.

A hint here wouldn't be as silly as that, but I think it is
unnecessary.  I doubt there's likely to be much confusion here.
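
For reference, a bare-bones version of the case being discussed (table
names are made up; p_a_check is the name PostgreSQL generates by default
for the check constraint):

CREATE TABLE p (a int CHECK (a > 0)) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);

ALTER TABLE p1 DROP CONSTRAINT p_a_check;  -- cannot drop inherited constraint
ALTER TABLE p  DROP CONSTRAINT p_a_check;  -- drop it on the parent instead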

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-07-02 Thread Robert Haas
On Mon, Jul 2, 2018 at 10:33 PM, Robert Haas  wrote:
> Yes, the original proposal was that we should be relaxed about it.

...in both directions i.e. DROP TABLE would work on a VIEW and DROP
VIEW on a table.  That definitely seems like it's going too far.

> Another possibility that would also seem to meet the OP's needs is to
> make it do this:
>
> DROP TABLE IF EXISTS X;
> NOTICE:  relation "X" is not a table, skipping
>
> His complaint was really that it generated an ERROR, IIUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-07-02 Thread Robert Haas
On Mon, Jul 2, 2018 at 5:25 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jun 26, 2018 at 1:06 PM, Tom Lane  wrote:
>>> Certainly we *could* change it, but it's not at all clear that it's a good
>>> idea.  The current behavior seemed sensible when it was implemented, and
>>> it has stood for quite some years now.  Now, we have one person
>>> complaining that it wasn't what he expected.  If we change the behavior on
>>> the strength of that, will we change it back on the first complaint from
>>> someone else?  What about the possibility that the change breaks peoples'
>>> applications?
>
>> Hmm.  I think we generally allow ALTER TABLE to work on non-table
>> relations.  For example, ALTER TABLE .. OWNER TO happily accepts a
>> view or materialized view as an argument.  I have had my doubts about
>> that policy from time to time (cf.
>> 1489e2f26a4c0318938b3085f50976512f321d84) but it does seem to be the
>> policy (cf. b14206862278347a379f2bb72d92d16fb9dcea45).  Is there some
>> reason to think that the same policy that we apply to ALTER should not
>> also apply to DROP?
>
> Uh ... error proofing?  The thing about ALTER is that if you mistakenly
> point at the wrong object, it's quite likely your command will fail
> because the properties of the object don't match (eg, it doesn't have a
> column by that name).  Or if it goes through anyway, many varieties of
> ALTER are reversible -- certainly ALTER OWNER is.  There's no comparable
> safety margin with DROP, and it's not reversible, and the consequences of
> a mistake could be pretty bad.

I suppose that's true, but it still seems inconsistent.

> Note that in any case this wasn't what the OP was proposing, if I
> understood that correctly.

Yes, the original proposal was that we should be relaxed about it.

Another possibility that would also seem to meet the OP's needs is to
make it do this:

DROP TABLE IF EXISTS X;
NOTICE:  relation "X" is not a table, skipping

His complaint was really that it generated an ERROR, IIUC.
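
For context, what happens today is roughly this (the exact message wording
is from memory and may differ slightly):

rhaas=# CREATE VIEW x AS SELECT 1;
rhaas=# DROP TABLE IF EXISTS x;
ERROR:  "x" is not a table
HINT:  Use DROP VIEW to remove a view.

so the IF EXISTS clause does not save you when the object exists but is of
the wrong kind.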

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speedup of relation deletes during recovery

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 04:13:15AM +0900, Fujii Masao wrote:
> OK, so what about the attached patch?

I have been looking at this patch, and this looks in good shape to me
(please indent!).

+* Call smgrclose() in reverse order as when smgropen() is called.
+* This trick enables remove_from_unowned_list() in smgrclose()
+* to search the SMgrRelation from the unowned list,
+* in O(1) performance.

A nit here: with O(1) performance.

Could it be possible to add an assertion so as isRedo is never enabled
out of recovery?
--
Michael




Re: automatic restore point

2018-07-02 Thread Jaime Casanova
On Mon, 2 Jul 2018 at 20:07, Yotsunaga, Naoki
 wrote:
>
> Hi. Thanks for comments.
>
> Explanation of the background of the function proposal was inadequate.
> So, I explain again.
>
> I assume the following situation.
> User needs to make a quick, seemingly simple fix to an important production 
> database. User composes the query, gives it an once-over, and lets it run. 
> Seconds later user realizes that user forgot the WHERE clause, dropped the 
> wrong table, or made another serious mistake, and interrupts the query, but 
> the damage has been done.
> Also user did not record the time and did not look at a lsn position.
>

Thinking on Michael's suggestion of using event triggers, you can
create an event trigger to run pg_create_restore_point() on DROP,
here's a simple example of how that should look:
https://www.postgresql.org/docs/current/static/functions-event-triggers.html

You can also create a normal trigger BEFORE TRUNCATE to create a
restore point just before running the TRUNCATE command.
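
To make that concrete, here is a minimal sketch of both approaches (the
function, trigger and table names are made up, and pg_create_restore_point()
requires superuser privileges):

-- restore point before any DROP, via an event trigger
CREATE FUNCTION mark_restore_point_on_drop() RETURNS event_trigger AS $$
BEGIN
    PERFORM pg_create_restore_point('before_drop_' ||
                                    to_char(now(), 'YYYYMMDD_HH24MISS'));
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER restore_point_on_drop
    ON sql_drop
    EXECUTE PROCEDURE mark_restore_point_on_drop();

-- restore point just before a TRUNCATE on a given table
CREATE FUNCTION mark_restore_point_on_truncate() RETURNS trigger AS $$
BEGIN
    PERFORM pg_create_restore_point('before_truncate_' || TG_TABLE_NAME);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER restore_point_before_truncate
    BEFORE TRUNCATE ON important_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE mark_restore_point_on_truncate();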

Those would run *in the background* (you don't need to call them
manually), you can use them right now, and they won't affect performance
for people not wanting this "functionality".

BTW, Michael's suggestion also included the idea of recording
xid/time/lsn which can be done through triggers too

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2018-07-02 Thread Andres Freund
Hi,

On 2018-07-02 21:50:36 -0400, Alvaro Herrera wrote:
> On 2018-Jul-02, Andrew Dunstan wrote:
> 
> > Andres suggested back in March (and again privately to me) that given how
> > much this has changed from the original this CF item should be marked
> > Returned With Feedback and the current patchset submitted as a new item.
> > 
> > Does anyone object to that course of action?
> 
> If doing that makes the "CF count" reset back to one for the new
> submission, then I object to that course of action.  If we really
> think this item does not belong into this commitfest, lets punt it to
> the next one.  However, it seems rather strange to do so this early in
> the cycle.  Is there really no small item that could be cherry-picked
> from this series to be committed standalone?

Well, I think it should just have been RWFed last cycle. It got plenty
of feedback. So it doesn't seem that strange to me, not to include it in
the "mop-up" CF? Either way, I don't feel strongly about it, I just know
that I won't have energy for the topic in this CF.

Greetings,

Andres Freund



Re: Protect syscache from bloating with negative cache entries

2018-07-02 Thread Alvaro Herrera
On 2018-Jul-02, Andrew Dunstan wrote:

> Andres suggested back in March (and again privately to me) that given how
> much this has changed from the original this CF item should be marked
> Returned With Feedback and the current patchset submitted as a new item.
> 
> Does anyone object to that course of action?

If doing that makes the "CF count" reset back to one for the new
submission, then I object to that course of action.  If we really
think this item does not belong in this commitfest, let's punt it to
the next one.  However, it seems rather strange to do so this early in
the cycle.  Is there really no small item that could be cherry-picked
from this series to be committed standalone?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 03:11:46PM -0700, Carter Thaxton wrote:
> Also, there may be some misunderstanding about "foo:bar" above.  That's an
> example of using a namespaced table, where "bar" is a table in the
> namespace "foo".  Normally, assuming your table is named "bar" in the
> default namespace, you would just say something like:
> 
>   pg_dump --where "bar:created_at >= '2018-05-01'"

I am wondering how this works at parsing if the table name, or one of
the columns includes a colon character :)
--
Michael




Re: Old small commitfest items

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 10:30:11AM -0400, Andrew Dunstan wrote:
> 528 1146 Fix the optimization to skip WAL-logging on table created in
> same transaction

This has been around for an astonishing amount of time...  I don't
recall all the details but rewriting most of the relation sync handling
around heapam for a corner optimization is no fun.  There is no way that
we could go down to eliminate wal_level = minimal, so I am wondering if
we should not silently ignore the optimization if possible instead of
throwing an error.  Perhaps logging a WARNING could make sense.

> 669 847 pgbench - allow to store query results into variables

I think that I could look into this one as well.

> 922 180 Failure at replay for corrupted 2PC files + reduce window
> between end-of-recovery record and history file write

I know this one pretty well :), waiting for reviews, and the patches are
not complicated.

> 1113 68 Replication status in logical replication

I think that I could finish this one as well.
--
Michael




Re: automatic restore point

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 01:06:31AM +, Yotsunaga, Naoki wrote:
>> I'd rather spend effort making the initial execution of said commands
>> less likely.
>
> I think that the function to prohibit DELETE and UPDATE without a
> WHERE clause in the later response is good way.

This has popped up already in the lists in the past.

> But I think that it is impossible to completely eliminate the failure
> of the other commands.  For example, drop the wrong table.

This kind of thing is heavily application-dependent.  For example, you
would likely not care if an operator, who has newly joined the team in
charge of maintaining this data, unfortunately drops a table which
includes logs from 10 years back, but you would very likely care
about a dropped table which holds users' login data.  My point is that you
need to carefully design the shape of the configuration you would use,
so as any application's admin would be able to cope with it, for example
allowing exclusion filters with regular expressions could be a good idea
to dig into.  And also you need to think about it so as it is backward
compatible.
--
Michael




Re: automatic restore point

2018-07-02 Thread Michael Paquier
On Tue, Jul 03, 2018 at 01:07:41AM +, Yotsunaga, Naoki wrote:
>> There is also recovery_target_lsn which is new as of v10.
>  In this method, it is necessary to look at an lsn position before operating.
>  But I assume a user who did not look at it before operating.
>  So I think that this method is not appropriate.

You should avoid top-posting on the mailing lists, this breaks the
consistency of the thread.

>> So basically what you are looking for here is a way to enforce a
>> restore point to be created depending on a set of pre-defined
>> conditions?  How would you define and choose those?
>
> I understand that I was asked how to set up a command to apply this function. 
>  Ex) DROP = on 
>  TRUNCATE = off
>  Is my interpretation right?
>  If my interpretation is correct, all the above commands will be
>  applied.
>  When this function is turned on, this function works when all the
>  above commands are executed. 

Yeah, but based on which factors are you able to define that such
conditions are enough to say that this feature is fully-compliant with
user's need, and how can you be sure that this is not going to result in
an additional maintenance burden if you need to define a new set of
conditions in the future.  For example an operator has issued a costly
ALTER TABLE which causes a full table rewrite, which could be also an
operation that you would like to prevent.  Having a set of GUCs which
define such low-level behavior is not really user-friendly.
--
Michael




RE: automatic restore point

2018-07-02 Thread Yotsunaga, Naoki
Hi. Thanks for comments.

>There is also recovery_target_lsn which is new as of v10.
 In this method, it is necessary to look at an lsn position before operating.
 But I assume a user who did not look at it before operating.
 So I think that this method is not appropriate.

> So basically what you are looking for here is a way to enforce a restore 
> point to be created depending on a set of pre-defined conditions?  
>How would you define and choose those?
 I understand that I was asked how to set up a command to apply this function. 
 Ex) DROP = on 
 TRUNCATE = off
 Is my interpretation right?
 If my interpretation is correct, all the above commands will be applied.
 When this function is turned on, this function works when all the above 
commands are executed.

---
Naoki Yotsunaga
-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Tuesday, June 26, 2018 2:31 PM
To: Yotsunaga, Naoki/四ツ永 直輝 
Cc: Postgres hackers 
Subject: Re: automatic restore point

On Tue, Jun 26, 2018 at 01:17:31AM +, Yotsunaga, Naoki wrote:
> The following is a description of "automatic restore point".
> 【Background】
>   When DBA's operation failure, for example DBA accidently drop table, 
> the database is restored from the file system backup and recovered by 
> using time or transaction ID. The transaction ID is identified from 
> WAL.
>   
>   In order to solve the above problem, 
>   I'd like propose a feature to implement automatic recording function
>   of recovery point.

There is also recovery_target_lsn which is new as of v10.  This parameter is 
way better than having to track down time or XID, which is a reason why I 
developped it.  Please note that this is also one of the reasons why it is 
possible to delay WAL replays on standbys, so as an operator has room to fix 
such operator errors.  Having of course cold backups with a proper WAL archive 
and a correct retention policy never hurts.

> 【Setting file】
>   Set postgres.conf.
>   auto_create_restore_point = on # Switch on/off automatic recording
>   function of recovery point. The default value is 'off'.
> 
> So what do you think about it? Do you think is it useful?

So basically what you are looking for here is a way to enforce a restore point 
to be created depending on a set of pre-defined conditions?  How would you 
define and choose those?

> Also, when recovering with the current specification, tables other 
> than the returned table also return to the state of the specified 
> recovery point.
> So, I’m looking for ways to recover only specific tables. Do you have 
> any ideas?

Why not using the utility hook which filters out for commands you'd like to 
forbid, in this case TRUNCATE or a DROP TABLE on a given relation?  Or why not 
simply using an event trigger at your application level so as you can actually 
*prevent* the error to happen first?  With the last option you don't have to 
write C code, but this would not filter TRUNCATE.  In short, what you propose 
looks over-complicated to me and there are options on the table which allow the 
problem you are trying to solve to not happen at all.  You could also use the 
utility hook to log or register somewhere the XID/time/LSN associated to a 
given command and then use it as your restore point.  This could also happen 
out of core.
--
Michael


RE: automatic restore point

2018-07-02 Thread Yotsunaga, Naoki
Hi. Thanks for comments.

Explanation of the background of the function proposal was inadequate.
So, I explain again.

I assume the following situation.
User needs to make a quick, seemingly simple fix to an important production 
database. User composes the query, gives it a once-over, and lets it run. 
Seconds later user realizes that user forgot the WHERE clause, dropped the 
wrong table, or made another serious mistake, and interrupts the query, but the 
damage has been done.
Also user did not record the time and did not look at a lsn position.

Certainly, I thought about reducing the possibility of executing the wrong 
command, but I thought that the possibility could not be completely eliminated.
So I proposed the “automatic restore point”.
With this function, the user can recover quickly and reliably even after 
performing a mistaken operation.

> I'd rather spend effort making the initial execution of said commands less 
> likely.  
  I think that the function to prohibit DELETE and UPDATE without a WHERE 
clause in the later response is a good way.
  But I think that it is impossible to completely eliminate the failure of the 
other commands.
  For example, dropping the wrong table.

-
Naoki Yotsunaga

-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Tuesday, June 26, 2018 2:16 PM
To: Isaac Morland 
Cc: David G. Johnston ; Yotsunaga, Naoki/四ツ永 直輝 
; Postgres hackers 

Subject: Re: automatic restore point

On Mon, Jun 25, 2018 at 11:01:06PM -0400, Isaac Morland wrote:
> I think an optional setting making DELETE and UPDATE without a WHERE 
> clause illegal would be handy. Obviously this would have to be 
> optional for backward compatibility. Perhaps even just a GUC setting, 
> with the intent being that one would set it in .psqlrc so that 
> omitting the WHERE clause at the command line would just be a syntax 
> error. If one actually does need to affect the whole table one can 
> just say WHERE TRUE. For applications, which presumably have their SQL 
> queries tightly controlled and pre-written anyway, this would most likely not 
> be particularly useful.

There was a patch doing exactly that which was discussed last year:
https://commitfest.postgresql.org/13/948/
https://www.postgresql.org/message-id/20160721045746.ga25...@fetter.org
What was proposed was rather limiting though, see my messages on the thread.  
Using a hook, that's simple enough to develop an extension which does that.
--
Michael




Re: Protect syscache from bloating with negative cache entries

2018-07-02 Thread Kyotaro HORIGUCHI
Hello. The previous v4 patchset was just broken.

At Tue, 26 Jun 2018 18:00:03 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180626.180003.127457941.horiguchi.kyot...@lab.ntt.co.jp>
> Hello. I rebased this patchset.
..
> > The attached is the patch set including this plancache stuff.
> > 
> > 0001- catcache time-based expiration (The origin of this thread)
> > 0002- introduces dynahash pruning feature
> > 0003- implement relcache pruning using 0002
> > 0004- (perhaps) independent from the three above. PoC of
> >   plancache pruning. Details are shown above.
> 
> I found up to v3 in this thread so I named this version 4.

Somehow 0004 was merged into 0003 and applying 0004 results in
failure. I removed the 0004 part from 0003, rebased, and am
reposting it here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e267985853b100a8ecfd10cc02f464f8c802d19e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 26 Dec 2017 17:43:09 +0900
Subject: [PATCH 1/4] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. This patch adds consideration of
removing entries that haven't been used for a certain time before
enlarging the hash array.
---
 doc/src/sgml/config.sgml  |  38 ++
 src/backend/access/transam/xact.c |   3 +
 src/backend/utils/cache/catcache.c| 153 +++-
 src/backend/utils/cache/plancache.c   | 163 ++
 src/backend/utils/misc/guc.c  |  33 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/utils/catcache.h  |  19 +++
 src/include/utils/plancache.h |   7 +-
 8 files changed, 413 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5b913f00c1..76745047af 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1617,6 +1617,44 @@ include_dir 'conf.d'
   
  
 
+ 
+  syscache_memory_target (integer)
+  
+   syscache_memory_target configuration parameter
+  
+  
+  
+   
+Specifies the maximum amount of memory to which syscache is expanded
+without pruning. The value defaults to 0, indicating that pruning is
+always considered. After exceeding this size, syscache pruning is
+considered according to
+. If you need to keep
+a certain amount of syscache entries with intermittent usage, try
+increasing this setting.
+   
+  
+ 
+
+ 
+  syscache_prune_min_age (integer)
+  
+   syscache_prune_min_age configuration parameter
+  
+  
+  
+   
+Specifies the minimum amount of unused time in seconds at which a
+syscache entry is considered to be removed. -1 indicates that syscache
+pruning is disabled entirely. The value defaults to 600 seconds
+(10 minutes). The syscache entries that are not
+used for the duration can be removed to prevent syscache bloat. This
+behavior is suppressed until the size of syscache exceeds
+.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8e6aef332c..e4a4a5874c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -732,6 +732,9 @@ void
 SetCurrentStatementStartTimestamp(void)
 {
 	stmtStartTimestamp = GetCurrentTimestamp();
+
+	/* Set this timestamp as aproximated current time */
+	SetCatCacheClock(stmtStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 5ddbf6eab1..9f421cd242 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -71,9 +71,24 @@
 #define CACHE6_elog(a,b,c,d,e,f,g)
 #endif
 
+/*
+ * GUC variable to define the minimum size of hash to consider entry eviction.
+ * This variable is shared among various cache mechanisms.
+ */
+int cache_memory_target = 0;
+
+/* GUC variable to define the minimum age of entries that will be considered to
+ * be evicted in seconds. This variable is shared among various cache
+ * mechanisms.
+ */
+int cache_prune_min_age = 600;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Timestamp used for any operation on caches. */
+TimestampTz	catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 	   int nkeys,
 	   Datum v1, Datum v2,
@@ -866,9 +881,130 @@ InitCatCache(int id,
 	 */
 	MemoryContextSwitchTo(oldcxt);
 
+	/* initilize catcache reference clock if haven't done yet */
+	if (catcacheclock == 0)
+		catcacheclock = GetCurrentTimestamp();
+
 	return cp;
 }
 
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries

Re: How to find the base version cf-bot is using?

2018-07-02 Thread Kyotaro HORIGUCHI
Hello. Thomas.

At Mon, 2 Jul 2018 22:14:42 +1200, Thomas Munro  
wrote in 
> On Mon, Jul 2, 2018 at 9:39 PM, Kyotaro HORIGUCHI
>  wrote:
> >> === applying patch 
> >> ./v4-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch
> > ...
> >> |Date: Tue, 26 Dec 2017 17:43:09 +0900
> > ...
> >> Patching file src/backend/utils/cache/plancache.c using Plan A...
> >> Hunk #1 failed at 63.
> 
> That's complaining about patch 0004 in your patch set.  The first 3
> apply OK, but it looks like maybe 0004 includes stuff that was already
> in 0001?  cfbot tries to apply them all.  I just tried it on HEAD
> myself:
...
> Here you show only patch 0001.

Mmm. The offsets grabbed my attention. As you suggested, 0003
included the whole of 0004. Thank you (and sorry) for the time to
check it.

Still, I don't understand how git emitted the broken four-patch
set, though.

> > How can I find the base version the CF-bot used at the time?
> 
> Yeah.  Sorry.  I should include that information in the log to make it
> clearer what it's doing.  I will do that in the next update.

Thank you.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Protect syscache from bloating with negative cache entries

2018-07-02 Thread Andrew Dunstan




On 06/26/2018 05:00 AM, Kyotaro HORIGUCHI wrote:



The attached is the patch set including this plancache stuff.

0001- catcache time-based expiration (The origin of this thread)
0002- introduces dynahash pruning feature
0003- implement relcache pruning using 0002
0004- (perhaps) independent from the three above. PoC of
   plancache pruning. Details are shown above.

I found up to v3 in this thread so I named this version 4.




Andres suggested back in March (and again privately to me) that given 
how much this has changed from the original this CF item should be 
marked Returned With Feedback and the current patchset submitted as a 
new item.


Does anyone object to that course of action?

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Threat models for DB cryptography (Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS))

2018-07-02 Thread Nico Williams
On Mon, Jul 02, 2018 at 06:22:46PM +0900, Masahiko Sawada wrote:
> On Fri, Jun 22, 2018 at 2:31 PM, Tsunakawa, Takayuki
>  wrote:
> > From: Nico Williams [mailto:n...@cryptonector.com]
> >
> >> One shortcoming of relying on OS functionality for protection against
> >> malicious storage is that not all OSes may provide such functionality.
> >> This could be an argument for implementing full, transparent encryption
> >> for an entire DB in the postgres server.  Not a very compelling
> >> argument, but that's just my opinion -- reasonable people could differ
> >> on this.
> >
> > Yes, this is one reason I developed TDE in our product.  And
> > in-database encryption allows optimization by encrypting only user
> > data.

You're likely getting some things terribly wrong.  E.g., integrity
protection.  Most likely you're getting a false sense of security.

> Me too. In-database encryption is helpful in practice. I think 1) and
> 2) seem to cover the threat models which the data encryption in the
> database needs to defend against.

Yes, but piecemeal encryption seems like a bad idea to me.

Nico
-- 



Re: [HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2018-07-02 Thread Nico Williams
On Tue, Jun 26, 2018 at 04:54:13PM -0400, Robbie Harwood wrote:
> Nico Williams  writes:
> 
> > [Re-send; first attempt appears to have hit /dev/null somewhere.  My
> > apologies if you get two copies.]
> >
> > I've finally gotten around to rebasing this patch and making the change
> > that was requested, which was: merge the now-would-be-three deferral-
> > related bool columns in various pg_catalog tables into one char column.
> >
> > Instead of (deferrable, initdeferred, alwaysdeferred), now there is just
> > (deferral).
> 
> This design seems correct to me.  I have a couple questions inline, and
> some nits to go with them.

Thanks for the review.  I'm traveling (on vacation).  I'll try to get to
your comments within a week.  Thanks!



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-07-02 Thread Nico Williams
On Mon, Jul 02, 2018 at 06:56:34PM +0300, Alvaro Hernandez wrote:
> On 21/06/18 21:43, Nico Williams wrote:
> >Incidentally, PG w/ pgcrypto and FDW does provide everything one needs
> >to be able to implement client-side crypto:
> >
> >  - use PG w/ FDW as a client-side proxy for the real DB
> >  - use pgcrypto in VIEWs with INSTEAD OF triggers in the proxy
> >  - access the DB via the proxy
> 
> Sounds a bit hackish, but it could work. I doubt however the acceptance
> of a solution like this, given the number of "moving parts" and operational
> complexity associated with it.

Well, you could use SQLite3 instead as the client.  Like how GDA does
it.

I do wish there was a libpostgres -- a light-weight postgres for running
an in-process RDBMS in applications without having to have a separate
set of server processes.  That would work really well for a use case
like this one where you're really going to be using FDW to access the
real data store.

If your objection is to an RDBMS in the application accessing real data
via FDW, well, see all the commentary about threat models.  You really
can't protect against DBAs without client-side crypto (and lots of bad
trade-offs).  You can do the crypto in the application, but you start to
lose the power of SQL.  Anyways, I don't think client-side crypto is the
best answer to the DBA threat -- access reduction + auditing is easier
and better.

In any case, spinning up a postgres instance just for this use case is
easy because it wouldn't have any persistent state to keep locally.

> >Any discussion of cryptographic applications should start with a
> >discussion of threat models.  This is not a big hurdle.
> 
> You already mentioned that there are also "business criteria" to
> consider here, and they are important. And there are even more to consider.

The threat model *is* a business criterion.  What are the threats you
want to protect against?  Why aren't you interested in these other
threats?  These are *business* questions.

Of course, one has to understand the issues, including intangibles.  For
example, reputation risk is a huge business concern, but as an
intangible it's easy to miss.  People have a blind spot for intangibles.

> For instance, cases where (1) and even (2) under your proposed threat model
> cannot be fixed by external (device/filesystem) encryption. Consider for
> example the managed database services provided by the cloud vendors. While
> (all?) of them provide transparent disk encryption, are they trust-able? Do

Databases won't be your only cloud security issue.  At some point you're
either using things like SGX or, while you wait for practical, high-
performance homomorphic cryptographic computing, you'll settle for using
the power of contracts and contract law -- contract law is a *business*
tool.

At some point there's not much difference between an insider you can
fire and an insider at a vendor who can fire them (and which vendor you
can fire as well), especially when you can use contract law to get the
vendor to do special things for you, like show you how they do things,
reduce admin access, let you audit them, and so on.

> businesses want to rely on their encryption scheme, key management, and how
> they respond to requests to hand off encryption keys? I believe
> self-contained solutions are very much worth it, also because of this.

I don't, because client-side crypto to deal with this is just so very
unsatisfactory.  Besides, what happens when first you move the DB into
the cloud and put a lot of effort into client-side crypto, then you move
the clients into the same cloud too?

And anyways, what was proposed by OP is *server-side* crypto, which
clearly does not help at all in this case.

Nico
-- 



Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-07-02 Thread Carter Thaxton
The whole reason for the colon in the --where option is to indicate which
table the WHERE clause should refer to, so that one can dump less than all
of the rows.
The --table option is totally different.  It specifies which tables to dump
at all.

If I provide a --where option, and no --table option, I want the WHERE
clause to apply to the given table, and otherwise dump all tables.
If one supplies a --table option, it won't dump all tables - it will only
dump the one specified.  I don't want to have to specify all the tables
with --table, just to use the --where option.

Also, there may be some misunderstanding about "foo:bar" above.  That's an
example of using a namespaced table, where "bar" is a table in the
namespace "foo".  Normally, assuming your table is named "bar" in the
default namespace, you would just say something like:

  pg_dump --where "bar:created_at >= '2018-05-01'"


On Mon, Jul 2, 2018 at 11:27 AM, Robert Haas  wrote:

> On Fri, Jun 29, 2018 at 8:09 AM, Surafel Temesgen 
> wrote:
> > hey,
> > i am reviewing this patch
> > On Thu, May 31, 2018 at 4:49 AM, Carter Thaxton <
> carter.thax...@gmail.com>
> > wrote:
> >>
> >>
> >>   pg_dump --where '"foo:bar":created_at >= '2018-05-01'" dbname
> >
> > it would be more sqlish if it specified like:
> > --table=foo --where ="bar created_at >= 2018-05-01"
> > and i don't like the idea of duplicating the existing --table behavior it
> > may confuse user
> > i rather recommend extending it. And when i test it with --table option
> the
> > content of dump
> > file depend on the option specified first.
>
> But you can specify multiple tables.  You wouldn't want the same WHERE
> clause to apply to all of them.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Small improvement to compactify_tuples

2018-07-02 Thread Andrew Dunstan




On 03/04/2018 04:57 AM, Yura Sokolov wrote:


BTW, I have small change to templated version that improves sorting of
random tuples a bit (1-1.5%). Will post it a bit later with test.





There doesn't seem to have been any progress since this email.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-07-02 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 26, 2018 at 1:06 PM, Tom Lane  wrote:
>> Certainly we *could* change it, but it's not at all clear that it's a good
>> idea.  The current behavior seemed sensible when it was implemented, and
>> it has stood for quite some years now.  Now, we have one person
>> complaining that it wasn't what he expected.  If we change the behavior on
>> the strength of that, will we change it back on the first complaint from
>> someone else?  What about the possibility that the change breaks peoples'
>> applications?

> Hmm.  I think we generally allow ALTER TABLE to work on non-table
> relations.  For example, ALTER TABLE .. OWNER TO happily accepts a
> view or materialized view as an argument.  I have had my doubts about
> that policy from time to time (cf.
> 1489e2f26a4c0318938b3085f50976512f321d84) but it does seem to be the
> policy (cf. b14206862278347a379f2bb72d92d16fb9dcea45).  Is there some
> reason to think that the same policy that we apply to ALTER should not
> also apply to DROP?

Uh ... error proofing?  The thing about ALTER is that if you mistakenly
point at the wrong object, it's quite likely your command will fail
because the properties of the object don't match (eg, it doesn't have a
column by that name).  Or if it goes through anyway, many varieties of
ALTER are reversible -- certainly ALTER OWNER is.  There's no comparable
safety margin with DROP, and it's not reversible, and the consequences of
a mistake could be pretty bad.

Note that in any case this wasn't what the OP was proposing, if I
understood that correctly.

regards, tom lane



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-07-02 Thread Daniel Gustafsson
> On 2 Jul 2018, at 14:01, Masahiko Sawada  wrote:

> Thank you for updating the patch! There are two review comments.

Thanks for reviewing!

> The current select_active_windows() function compares the all fields
> of WindowClause for the sorting but with this patch we compare only
> tleSortGroupRef, sortop and the number of uniqueOrder. I think this
> leads a degradation as follows.

You are right, that was an oversight.  The attached patch takes a stab at
fixing this.

> s/readibility/readability/

Fixed.

cheers ./daniel



window_prefix_sort-v4.patch
Description: Binary data


Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Might as well follow the MODULEDIR precedent (though I'm not wedded
 Tom> to that if somebody has an argument for something else).
 [...]
 Tom> I'd definitely vote for "error".  Likewise if any .h file listed in
 Tom> the macro doesn't exist.

OK, so that matches the patch I just posted. If any .h file listed
doesn't exist, then $(INSTALL_DATA) should fail (and a quick test seems
to confirm that it does).

I only added one HEADERS= entry, for contrib/hstore; the other modules
should also be reviewed to see if they should install any headers, but
the basic stuff should be nailed down first.

-- 
Andrew (irc:RhodiumToad)



Re: Should contrib modules install .h files?

2018-07-02 Thread Tom Lane
Andrew Gierth  writes:
> Two questions arise:

> 1) include/server has a lot of files and subdirs, so using
>include/server/$(MODULE)/ looks likely to be error-prone. So it
>should be something like include/server/contrib/$(MODULE)/ or
>include/server/extension/$(MODULE)/. Which one, or should it use
>$(MODULEDIR) to choose between the two the way that DATA and DOCS do?
>Or something else?

Might as well follow the MODULEDIR precedent (though I'm not wedded
to that if somebody has an argument for something else).

> 2) Specifying HEADERS_blah for some name "blah" that's not listed in
>MODULES or MODULE_big should do what:
>   a) install into blah/ anyway
>   b) be ignored with a warning
>   c) be silently ignored
>   d) be an error

I'd definitely vote for "error".  Likewise if any .h file listed in
the macro doesn't exist.

regards, tom lane



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> OK, I'm working on an updated patch

and here it is.

This installs to $(includedir_server)/$(MODULEDIR)/$MODULE/file.h
(e.g. include/server/extension/hstore/hstore.h for an actual example),
and errors if HEADERS_xxx is defined for anything that's not a module
listed in MODULES or MODULE_big.
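
As a usage sketch (not part of the patch; the module and file names are made
up), an out-of-tree extension built through PGXS would then only need
something like:

  MODULE_big = myext
  OBJS = myext.o
  EXTENSION = myext
  DATA = myext--1.0.sql
  HEADERS = myext.h        # installed as extension/myext/myext.h

  PG_CONFIG = pg_config
  PGXS := $(shell $(PG_CONFIG) --pgxs)
  include $(PGXS)

after which other extensions can #include the installed myext.h from
$(includedir_server)/extension/myext/.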

-- 
Andrew (irc:RhodiumToad)

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index ab7fef3979..46d26f8052 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -11,6 +11,8 @@ DATA = hstore--1.4.sql hstore--1.4--1.5.sql \
 	hstore--unpackaged--1.0.sql
 PGFILEDESC = "hstore - key/value pair data type"
 
+HEADERS = hstore.h
+
 REGRESS = hstore
 
 ifdef USE_PGXS
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bfae02f90c..3e7817931e 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -38,6 +38,10 @@
 #   SCRIPTS -- script files (not binaries) to install into $PREFIX/bin
 #   SCRIPTS_built -- script files (not binaries) to install into $PREFIX/bin,
 # which need to be built first
+#   HEADERS -- files to install into $(includedir_server)/$MODULEDIR/$MODULE_big
+#   HEADERS_$(MODULE) -- files to install into
+# $(includedir_server)/$MODULEDIR/$MODULE; the value of $MODULE must be
+# listed in MODULES or MODULE_big
 #   REGRESS -- list of regression test cases (without suffix)
 #   REGRESS_OPTS -- additional switches to pass to pg_regress
 #   NO_INSTALLCHECK -- don't define an installcheck target, useful e.g. if
@@ -94,13 +98,16 @@ endif
 ifdef MODULEDIR
 datamoduledir := $(MODULEDIR)
 docmoduledir := $(MODULEDIR)
+incmoduledir := $(MODULEDIR)
 else
 ifdef EXTENSION
 datamoduledir := extension
 docmoduledir := extension
+incmoduledir := extension
 else
 datamoduledir := contrib
 docmoduledir := contrib
+incmoduledir := contrib
 endif
 endif
 
@@ -108,6 +115,41 @@ ifdef PG_CPPFLAGS
 override CPPFLAGS := $(PG_CPPFLAGS) $(CPPFLAGS)
 endif
 
+HEADER_alldirs := $(patsubst HEADERS_%,%,$(filter HEADERS_%, $(.VARIABLES)))
+
+# HEADERS is an error in the absence of MODULE_big to provide a dir name
+ifdef MODULE_big
+HEADER_dirs := $(MODULE_big)
+ifdef HEADERS
+HEADERS_$(MODULE_big) := $(HEADERS)
+endif
+else
+ifdef HEADERS
+$(error HEADERS requires MODULE_big to be set)
+endif
+HEADER_dirs := $(filter $(MODULES),$(HEADER_alldirs))
+endif
+
+# HEADERS_foo requires that "foo" is in MODULES as a sanity check
+ifneq ($(filter-out $(HEADER_dirs),$(HEADER_alldirs)),)
+$(error $(patsubst %,HEADERS_%,$(filter-out $(HEADER_dirs),$(HEADER_alldirs))) defined with no module)
+endif
+
+# Functions for generating install/uninstall commands; the blank lines
+# before the "endef" are required, don't lose them
+# $(call install_headers,dir,headers)
+define install_headers
+$(MKDIR_P) '$(DESTDIR)$(includedir_server)/$(incmoduledir)/$(1)/'
+$(INSTALL_DATA) $(addprefix $(srcdir)/, $(2)) '$(DESTDIR)$(includedir_server)/$(incmoduledir)/$(1)/'
+
+endef
+# $(call uninstall_headers,dir,headers)
+define uninstall_headers
+rm -f $(addprefix '$(DESTDIR)$(includedir_server)/$(incmoduledir)/$(1)'/, $(notdir $(2)))
+
+endef
+
+
 all: $(PROGRAM) $(DATA_built) $(SCRIPTS_built) $(addsuffix $(DLSUFFIX), $(MODULES)) $(addsuffix .control, $(EXTENSION))
 
 ifeq ($(with_llvm), yes)
@@ -154,6 +196,9 @@ endif # SCRIPTS
 ifdef SCRIPTS_built
 	$(INSTALL_SCRIPT) $(SCRIPTS_built) '$(DESTDIR)$(bindir)/'
 endif # SCRIPTS_built
+ifneq ($(strip $(HEADER_dirs)),)
	$(foreach dir,$(HEADER_dirs),$(call install_headers,$(dir),$(HEADERS_$(dir))))
+endif # HEADERS
 ifdef MODULE_big
 ifeq ($(with_llvm), yes)
 	$(call install_llvm_module,$(MODULE_big),$(OBJS))
@@ -218,6 +263,9 @@ endif
 ifdef SCRIPTS_built
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, $(SCRIPTS_built))
 endif
+ifneq ($(strip $(HEADER_dirs)),)
	$(foreach dir,$(HEADER_dirs),$(call uninstall_headers,$(dir),$(HEADERS_$(dir))))
+endif # HEADERS
 
 ifdef MODULE_big
 ifeq ($(with_llvm), yes)


Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-07-02 Thread Peter Geoghegan
On Mon, Jul 2, 2018 at 9:28 AM, Peter Geoghegan  wrote:
>> Execution time of last "VACUUM test;" command on my notebook was:
>>
>> with bulk deletion: 1.6 s;
>> with Quick Vacuum Strategy: 5.2 s;
>> with Quick Vacuum Strategy & TID sorting: 0.6 s.
>
> I'm glad that you looked into this. You could make this faster still,
> by actually passing the lowest heap TID in the list of TIDs to kill to
> _bt_search() and _bt_binsrch(). You might have to work through several
> extra B-Tree leaf pages per bttargetdelete() call without this (you'll
> move right multiple times within bttargetdelete()).

I should add: I think that this doesn't matter so much in your
original test case with v1 of my patch, because you're naturally
accessing the index tuples in almost the most efficient way already,
since you VACUUM works its way from the start of the table until the
end of the table. You'll definitely need to pass a heap TID to
routines like _bt_search() once you start using my v2, though, since
that puts the heap TIDs in DESC sort order. Otherwise, it'll be almost
as slow as the plain "Quick Vacuum Strategy" case was.

In general, the big idea with my patch is that heap TID is just
another attribute. I am not "cheating" in any way; if it's not
possible to descend the tree and arrive at the right leaf page without
looking through several leaf pages, then my patch is broken.

You might also use _bt_moveright() with my patch. That way, you can
quickly detect that you need to move right immediately, without going
through all the items on the page. This should only be an issue in the
event of a concurrent page split, though. In my patch, I use
_bt_moveright() in a special way for unique indexes: I need to start
at the first leaf page a duplicate could be on for duplicate checking,
but once that's over I want to "jump immediately" to the leaf page the
index tuple actually needs to be inserted on. That's when
_bt_moveright() is called. (Actually, that looks like it breaks unique
index enforcement in the case of my patch, which I need to fix, but
you might still do this.)

-- 
Peter Geoghegan



Re: Speedup of relation deletes during recovery

2018-07-02 Thread Fujii Masao
On Thu, Jun 28, 2018 at 3:23 AM, Andres Freund  wrote:
> On 2018-06-28 03:21:51 +0900, Fujii Masao wrote:
>> On Wed, Jun 27, 2018 at 10:44 AM, Thomas Munro
>>  wrote:
>> > On Wed, Jun 27, 2018 at 1:13 PM, Thomas Munro
>> >  wrote:
>> >> On Wed, Jun 27, 2018 at 12:16 PM, Thomas Munro
>> >>  wrote:
>> >>> I think we should take the hint in the comments and make it O(1)
>> >>> anyway.  See attached draft patch.
>> >>
>> >> Alternatively, here is a shorter and sweeter dlist version (I did the
>> >> open-coded one thinking of theoretical back-patchability).
>> >
>> > ... though, on second thoughts, the dlist version steam-rolls over the
>> > possibility that it might not be in the list (mentioned in the
>> > comments, though it's not immediately clear how that would happen).
>> >
>> > On further reflection, on the basis that it's the most conservative
>> > change, +1 for Fujii-san's close-in-reverse-order idea.  We should
>> > reconsider that data structure for 12
>>
>> +1
>>
>> Attached is v3 patch which I implemented close-in-reverse-order idea
>> on v2.
>>
>> Regards,
>>
>> --
>> Fujii Masao
>
>> --- a/src/backend/access/transam/twophase.c
>> +++ b/src/backend/access/transam/twophase.c
>> @@ -1457,6 +1457,7 @@ FinishPreparedTransaction(const char *gid, bool 
>> isCommit)
>>   int ndelrels;
>>   SharedInvalidationMessage *invalmsgs;
>>   int i;
>> + SMgrRelation *srels;
>>
>>   /*
>>* Validate the GID, and lock the GXACT to ensure that two backends do 
>> not
>> @@ -1549,13 +1550,22 @@ FinishPreparedTransaction(const char *gid, bool 
>> isCommit)
>>   delrels = abortrels;
>>   ndelrels = hdr->nabortrels;
>>   }
>> +
>> + srels = palloc(sizeof(SMgrRelation) * ndelrels);
>>   for (i = 0; i < ndelrels; i++)
>> - {
>> - SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
>> + srels[i] = smgropen(delrels[i], InvalidBackendId);
>>
>> - smgrdounlink(srel, false);
>> - smgrclose(srel);
>> - }
>> + smgrdounlinkall(srels, ndelrels, false);
>> +
>> + /*
>> +  * Call smgrclose() in reverse order as when smgropen() is called.
>> +  * This trick enables remove_from_unowned_list() in smgrclose()
>> +  * to search the SMgrRelation from the unowned list,
>> +  * in O(1) performance.
>> +  */
>> + for (i = ndelrels - 1; i >= 0; i--)
>> + smgrclose(srels[i]);
>> + pfree(srels);
>>
>>   /*
>>* Handle cache invalidation messages.
>> --- a/src/backend/access/transam/xact.c
>> +++ b/src/backend/access/transam/xact.c
>> @@ -5618,6 +5618,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
>>   /* Make sure files supposed to be dropped are dropped */
>>   if (parsed->nrels > 0)
>>   {
>> + SMgrRelation *srels;
>> +
>>   /*
>>* First update minimum recovery point to cover this WAL 
>> record. Once
>>* a relation is deleted, there's no going back. The buffer 
>> manager
>> @@ -5635,6 +5637,7 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
>>*/
>>   XLogFlush(lsn);
>>
>> + srels = palloc(sizeof(SMgrRelation) * parsed->nrels);
>>   for (i = 0; i < parsed->nrels; i++)
>>   {
>>   SMgrRelation srel = smgropen(parsed->xnodes[i], 
>> InvalidBackendId);
>> @@ -5642,9 +5645,20 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
>>
>>   for (fork = 0; fork <= MAX_FORKNUM; fork++)
>>   XLogDropRelation(parsed->xnodes[i], fork);
>> - smgrdounlink(srel, true);
>> - smgrclose(srel);
>> + srels[i] = srel;
>>   }
>> +
>> + smgrdounlinkall(srels, parsed->nrels, true);
>> +
>> + /*
>> +  * Call smgrclose() in reverse order as when smgropen() is 
>> called.
>> +  * This trick enables remove_from_unowned_list() in smgrclose()
>> +  * to search the SMgrRelation from the unowned list,
>> +  * in O(1) performance.
>> +  */
>> + for (i = parsed->nrels - 1; i >= 0; i--)
>> + smgrclose(srels[i]);
>> + pfree(srels);
>>   }
>>
>>   /*
>> @@ -5685,6 +5699,7 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, 
>> TransactionId xid)
>>  {
>>   int i;
>>   TransactionId max_xid;
>> + SMgrRelation *srels;
>>
>>   Assert(TransactionIdIsValid(xid));
>>
>> @@ -5748,6 +5763,7 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, 
>> TransactionId xid)
>>   }
>>
>>   /* Make sure files supposed to be dropped are dropped */
>> + srels = palloc(sizeof(SMgrRelation) * parsed->nrels);
>>   for (i = 0; i < parsed->nrels; i++)
>>   {
>>   SMgrRelation srel = smgropen(parsed->xnodes[i], 
>> Inv

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-07-02 Thread Alvaro Hernandez




On 11/06/18 12:22, Masahiko Sawada wrote:

On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
 wrote:

Hello Hackers,

This proposes a way to develop "Table-level" Transparent Data Encryption (TDE) 
and Key Management Service (KMS) support in
PostgreSQL.


Issues on data encryption of PostgreSQL
==
Currently, in PostgreSQL, data encryption can be done using the pgcrypto tool.
However, it is inconvenient to use pgcrypto to encrypt data in some cases.

There are two significant inconveniences.

First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto 
functions everywhere we encrypt/decrypt.
Second, we must modify a lot of application code if we want to migrate to
PostgreSQL from other databases that are
using TDE.

To resolve these inconveniences, many users want TDE support.
There have also been a few proposals, comments, and questions about supporting
TDE in the PostgreSQL community.

However, PostgreSQL does not currently support TDE, so there are discussions in
the development community about whether it is necessary to support TDE or not.
support TDE or not.

In these discussions, there were requirements necessary to support TDE in 
PostgreSQL.

1) The performance overhead of encrypting and decrypting database data must be 
minimized
2) Need to support WAL encryption.
3) Need to support Key Management Service.

Therefore, I'd like to propose a new design of TDE that deals with the above
requirements.
Since this feature will become very large, I'd like to hear opinions from the
community before starting to make the patch.

First, my proposal is table-level TDE, in which the user can specify which
tables are encrypted.
Indexes, the TOAST table, and WAL associated with a table that enables TDE are
also encrypted.

Moreover, I want to support encryption for large objects as well.
But I haven't found a good way to do it so far, so I'd like to leave it as a
future TODO.

My proposal has five characteristic features of "table-level TDE".

1) Buffer-level data encryption and decryption
2) Per-table encryption
3) 2-tier encryption key management
4) Working with external key management services(KMS)
5) WAL encryption

Here are more details for each items.


1. Buffer-level data encryption and decryption
==
Transparent data encryption and decryption accompany storage operations.
With the ordinary approach, such as using pgcrypto, the biggest problem with
encrypted data is the performance overhead of decrypting the data each time a
query runs.

My proposal is to encrypt and decrypt data when performing disk I/O operations,
to minimize performance overhead.
Therefore, the data in the shared memory layer is unencrypted so that the
performance overhead is minimized.

With this design, data encryption/decryption implementations can be developed
by modifying the code of the storage and buffer
manager modules,
which are responsible for performing disk I/O operations.


2. Per-table encryption
==
Users can enable TDE per table as they want.
I introduce a new storage parameter "encryption_enabled" which enables TDE at 
the table level.

 // Generate  the encryption table
CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON );

 // Change to the non-encryption table
ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );

This approach minimizes the overhead for tables that do not require encryption 
options.
For tables that enable TDE, the corresponding table key will be generated with 
random values, and it's stored into the new system
catalog after being encrypted by the master key.

BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use 
the IV in CBC mode for this proposal.
I'd like to hear opinions from security engineers.


3. 2-tier encryption key management
==
When it comes time to change cryptographic keys, there is a performance
overhead in decrypting and re-encrypting all data.

To solve this problem we employ 2-tier encryption.
With 2-tier encryption, all table keys are stored in the database cluster after
being encrypted by the master key, and master keys
must be stored outside of PostgreSQL.

Therefore, without the master key, it is impossible to decrypt the table key,
and thus it is impossible to decrypt the database data.

When changing the key, it's not necessary to re-encrypt all data.
We use the new master key only to decrypt and re-encrypt the table keys, which
minimizes the performance overhead.

For table keys, all TDE-enabled tables have different table keys.
As for master keys, all databases have different master keys. Table keys are
encrypted by the master key of their own database.
For WAL encryption, we have another cryptographic key. The WAL key is also
encrypted by a master key, but it is shared across the
database cluster.


4. Working with external key management services(KMS)
==
A key management service is an integrated approach for generating, fetching and 
managing encryption keys for key co

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-07-02 Thread Alvaro Hernandez




On 21/06/18 21:43, Nico Williams wrote:

On Fri, May 25, 2018 at 08:41:46PM +0900, Moon, Insung wrote:

Issues on data encryption of PostgreSQL
==
Currently, in PostgreSQL, data encryption can be done using the pgcrypto tool.
However, it is inconvenient to use pgcrypto to encrypt data in some cases.

There are two significant inconveniences.

First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto 
functions everywhere we encrypt/decrypt.

Not so.  VIEWs with INSTEAD OF triggers allow you to avoid this.


Second, we must modify a lot of application code if we want to migrate
to PostgreSQL from other databases that are using
TDE.

Not so.  See above.

However, I have at times been told that I should use SQL Server or
whatever because it has column encryption.  My answer is always that it
doesn't help (see my other posts on this thread), but from a business
perspective I understand the problem: the competition has a shiny (if
useless) feature XYZ, therefore we must have it also.

I'm not opposed to PG providing encryption features similar to the
competition's provided the documentation makes their false-sense-of-
security dangers clear.

Incidentally, PG w/ pgcrypto and FDW does provide everything one needs
to be able to implement client-side crypto:

  - use PG w/ FDW as a client-side proxy for the real DB
  - use pgcrypto in VIEWs with INSTEAD OF triggers in the proxy
  - access the DB via the proxy
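
As a rough sketch of those three steps (hypothetical object names; only the
pgcrypto pgp_sym_* functions and an already-configured postgres_fdw server
are assumed), the proxy side might look like:

  -- foreign table pointing at the real table on the remote server
  CREATE FOREIGN TABLE remote_secrets (id int, payload bytea)
      SERVER real_db OPTIONS (table_name 'secrets');

  -- decrypting view; the key source (a GUC here) is just a placeholder
  CREATE VIEW secrets AS
      SELECT id,
             pgp_sym_decrypt(payload, current_setting('app.enc_key')) AS payload
        FROM remote_secrets;

  -- INSTEAD OF trigger that encrypts on the way in
  CREATE FUNCTION secrets_insert() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO remote_secrets
          VALUES (NEW.id,
                  pgp_sym_encrypt(NEW.payload, current_setting('app.enc_key')));
      RETURN NEW;
  END $$;

  CREATE TRIGGER secrets_insert INSTEAD OF INSERT ON secrets
      FOR EACH ROW EXECUTE PROCEDURE secrets_insert();

Clients then talk only to the "secrets" view on the proxy, so the real
database never sees plaintext.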


    Sounds a bit hackish, but it could work. However, I doubt a solution
like this would gain acceptance, given the number of "moving parts"
and the operational complexity associated with it.





Presto: transparent client-side crypto that protects against DBAs.  See
other posts about properly binding ciphertext and plaintext.

Protection against malicious DBAs is ultimately a very difficult thing
to get right -- if you really have DBAs as a threat and take that threat
seriously then you'll end up implementing a Merkle tree and performance
will go out the window.


In these discussions, there were requirements necessary to support TDE in 
PostgreSQL.

1) The performance overhead of encrypting and decrypting database data must be 
minimized
2) Need to support WAL encryption.
3) Need to support Key Management Service.

(2) and full database encryption could be done by the filesystem /
device drivers.  I think this is a much better answer than including
encryption in the DB just because it means not adding all that
complexity to PG, though it's not as portable as doing it in the DB (and
this may well be a winning argument).

What (3) looks like depends utterly on the threat model.  We must
discuss threat models first.

The threat models will drive the design, and (1) will drive some
trade-offs.


Therefore, I'd like to propose a new design of TDE that deals with
the above requirements.  Since this feature will become very large,
I'd like to hear opinions from the community before starting to make the
patch.

Any discussion of cryptographic applications should start with a
discussion of threat models.  This is not a big hurdle.



    You already mentioned that there are also "business criteria" to 
consider here, and they are important. And there are even more to 
consider. For instance, cases where (1) and even (2) under your proposed 
threat model cannot be fixed by external (device/filesystem) encryption. 
Consider for example the managed database services provided by the cloud 
vendors. While (all?) of them provide transparent disk encryption, are 
they trust-able? Do businesses want to rely on their encryption scheme, 
their key management, and how they respond to requests to hand off 
encryption keys? I believe self-contained solutions are very worthwhile, also 
because of this.



    Álvaro

--

Alvaro Hernandez


---
OnGres




Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-07-02 Thread Robert Haas
On Tue, Jun 26, 2018 at 1:06 PM, Tom Lane  wrote:
> Certainly we *could* change it, but it's not at all clear that it's a good
> idea.  The current behavior seemed sensible when it was implemented, and
> it has stood for quite some years now.  Now, we have one person
> complaining that it wasn't what he expected.  If we change the behavior on
> the strength of that, will we change it back on the first complaint from
> someone else?  What about the possibility that the change breaks peoples'
> applications?

Hmm.  I think we generally allow ALTER TABLE to work on non-table
relations.  For example, ALTER TABLE .. OWNER TO happily accepts a
view or materialized view as an argument.  I have had my doubts about
that policy from time to time (cf.
1489e2f26a4c0318938b3085f50976512f321d84) but it does seem to be the
policy (cf. b14206862278347a379f2bb72d92d16fb9dcea45).  Is there some
reason to think that the same policy that we apply to ALTER should not
also apply to DROP?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-07-02 Thread Robert Haas
On Fri, Jun 29, 2018 at 8:09 AM, Surafel Temesgen  wrote:
> hey,
> i am reviewing this patch
> On Thu, May 31, 2018 at 4:49 AM, Carter Thaxton 
> wrote:
>>
>>
>>   pg_dump --where '"foo:bar":created_at >= '2018-05-01'" dbname
>
> it would be more sqlish if it specified like:
> --table=foo --where ="bar created_at >= 2018-05-01"
> and i don't like the idea of duplicating the existing --table behavior it
> may confuse user
> i rather recommend extending it. And when i test it with --table option the
> content of dump
> file depend on the option specified first.

But you can specify multiple tables.  You wouldn't want the same WHERE
clause to apply to all of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: shared-memory based stats collector

2018-07-02 Thread Robert Haas
On Fri, Jun 29, 2018 at 4:34 AM, Kyotaro HORIGUCHI
 wrote:
> Nowadays PostgreSQL has dynamic shared hash (dshash) so we can
> use this as the main storage of statistics. We can share data
> without a stress using this.
>
> A PoC previously posted tried to use "locally copied" dshash but
> it doesn't looks fine so I steered to different direction.
>
> With this patch dshash can create a local copy based on dynhash.

Copying the whole hash table kind of sucks, partly because of the
time it will take to copy it, but also because it means that memory
usage is still O(nbackends * ntables).  Without looking at the patch,
I'm guessing that you're doing that because we need a way to show each
transaction a consistent snapshot of the data, and I admit that I
don't see another obvious way to tackle that problem.  Still, it would
be nice if we had a better idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> How about this: it's most likely that modules that install include
 >> files will also be using MODULE_big, so use that as the default
 >> name; if a makefile that uses only MODULES also wants to install
 >> include files, have it define MODULE_NAME (or some such variable)
 >> itself.

 Tom> AFAIR, you're supposed to define at most one of those macros
 Tom> anyway, so I don't see why it couldn't be like "use MODULE_big if
 Tom> set, else use MODULE if set, else fail if MODULE_NAME isn't set".

OK, I'm working on an updated patch, that will allow this:

if using MODULE_big:

MODULE_big = mymodule
HEADERS = whatever.h

# gets installed as mymodule/whatever.h  in whatever dir we decide on
# also works if you define HEADERS_mymodule = whatever.h

if not using MODULE_big:

MODULES = foo bar baz
HEADERS_foo = foo.h
HEADERS_bar = bar.h
# baz doesn't have any headers

# foo.h installed as foo/foo.h
# bar.h installed as bar/bar.h

Two questions arise:

1) include/server has a lot of files and subdirs, so using
   include/server/$(MODULE)/ looks likely to be error-prone. So it
   should be something like include/server/contrib/$(MODULE)/ or
   include/server/extension/$(MODULE)/. Which one, or should it use
   $(MODULEDIR) to choose between the two the way that DATA and DOCS do?
   Or something else?

2) Specifying HEADERS_blah for some name "blah" that's not listed in
   MODULES or MODULE_big should do what:

  a) install into blah/ anyway
  b) be ignored with a warning
  c) be silently ignored
  d) be an error

-- 
Andrew (irc:RhodiumToad)



Re: Fix error message when trying to alter statistics on included column

2018-07-02 Thread Robert Haas
On Thu, Jun 28, 2018 at 5:28 AM, Yugo Nagata  wrote:
> According to the error message, it is not allowed to alter statistics on
> included column because this is "non-expression column".
>
>  postgres=# create table test (i int, d int);
>  CREATE TABLE
>  postgres=# create index idx on test(i) include (d);
>  CREATE INDEX
>  postgres=# alter index idx alter column 2 set statistics 10;
>  ERROR:  cannot alter statistics on non-expression column "d" of index "idx"
>  HINT:  Alter statistics on table column instead.
>
> However, I think this should be forbidded in that this is not a key column
> but a included column. Even if we decide to support expressions in included
> columns in future, it is meaningless to do this because any statistics on
> included column is never used by the planner.
>
> Attached is the patch to fix the error message. In this fix, column number
> is checked first. After applying this, the message is changed as below;
>
>  postgres=# alter index idx alter column 2 set statistics 10;
>  ERROR:  cannot alter statistics on included column "d" of index "idx"

I think you should add an open item for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: alter index WITH ( storage_parameter = value [, ... ] ) for partition index.

2018-07-02 Thread Robert Haas
On Wed, Jun 27, 2018 at 5:42 AM, Rajkumar Raghuwanshi
 wrote:
> postgres=# alter index part_idx reset (fillfactor);
> ERROR:  "part_idx" is not a table, view, materialized view, or index

I don't know whether that should work, but it seems like the error
message needs improvement, at the least.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Regression on PostgreSQL 10 ORDER/GROUP BY expression not found in targetlist

2018-07-02 Thread Robert Haas
On Sat, Jun 30, 2018 at 5:38 PM, Tom Lane  wrote:
> I also think that there's some horribly unsafe coding in
> apply_scanjoin_target_to_paths: it clobbers a RelOptInfo's reltarget,
> with no thought for whether that might affect things elsewhere,
> and it also clobbers individual paths' PathTargets like so:
>
>if (tlist_same_exprs)
>subpath->pathtarget->sortgrouprefs =
>scanjoin_target->sortgrouprefs;
>
> with similarly little regard for whether those PathTarget structures are
> shared with anything else.

Regarding this point specifically, I definitely was not thrilled about
that hack, but it was tricky to avoid having the tlist_same_exprs case
regress in terms of performance, so I didn't feel I could turn down
performance optimizations without some evidence of an actual problem.
For this to be a problem, it would have to be the case that somebody
builds a path, returns it, and then later mutates the sortgrouprefs
array in place.  Does that happen?  Off-hand I would have thought that
you'd have to apply the correct sortgroupref labeling to a path before
using it as a building block in some other path, just like it's not
really OK to change the cost of a path once you've passed it to
add_path() or add_partial_path().  Decisions might have already been
made based on the old information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-07-02 Thread Peter Geoghegan
On Thu, Jun 14, 2018 at 11:44 AM, Peter Geoghegan  wrote:
> I attach an unfinished prototype of suffix truncation, that also
> sometimes *adds* a new attribute in pivot tuples. It adds an extra
> heap TID from the leaf level when truncating away non-distinguishing
> attributes during a leaf page split, though only when it must. The
> patch also has nbtree treat heap TID as a first class part of the key
> space of the index. Claudio wrote a patch that did something similar,
> though without the suffix truncation part [2] (I haven't studied his
> patch, to be honest). My patch is actually a very indirect spin-off of
> Anastasia's covering index patch, and I want to show what I have in
> mind now, while it's still swapped into my head. I won't do any
> serious work on this project unless and until I see a way to implement
> retail index tuple deletion, which seems like a multi-year project
> that requires the buy-in of multiple senior community members. On its
> own, my patch regresses performance unacceptably in some workloads,
> probably due to interactions with kill_prior_tuple()/LP_DEAD hint
> setting, and interactions with page space management when there are
> many "duplicates" (it can still help performance in some pgbench
> workloads with non-unique indexes, though).

I attach a revised version, which is still very much of prototype
quality, but manages to solve a few of the problems that v1 had.
Andrey Lepikhov (CC'd) asked me to post any improved version I might
have for use with his retail index tuple deletion patch, so I thought
I'd post what I have.

The main development for v2 is that the sort order of the implicit
heap TID attribute is flipped. In v1, it was in "ascending" order. In
v2, comparisons of heap TIDs are inverted to make the attribute order
"descending". This has a number of advantages:

* It's almost consistent with the current behavior when there are
repeated insertions of duplicates. Currently, this tends to result in
page splits of the leftmost leaf page among pages that mostly consist
of the same duplicated value. This means that the destabilizing impact
on DROP SCHEMA ... CASCADE regression test output noted before [1] is
totally eliminated. There is now only a single trivial change to
regression test "expected" files, whereas in v1 dozens of "expected"
files had to be changed, often resulting in less useful reports for
the user.

* The performance regression I observed with various pgbench workloads
seems to have gone away, or is now within the noise range. A patch
like this one requires a lot of validation and testing, so this should
be taken with a grain of salt.

I may have been too quick to give up on my original ambition of
writing a stand-alone patch that can be justified entirely on its own
merits, without being tied to some much more ambitious project like
retail index tuple deletion by VACUUM, or zheap's deletion marking. I
still haven't tried to replace the kludgey handling of unique index
enforcement, even though that would probably have a measurable
additional performance benefit. I think that this patch could become
an unambiguous win.

[1] 
https://postgr.es/m/CAH2-Wz=wAKwhv0PqEBFuK2_s8E60kZRMzDdyLi=-mvcm_ph...@mail.gmail.com
-- 
Peter Geoghegan


v2-0001-Ensure-nbtree-leaf-tuple-keys-are-always-unique.patch
Description: Binary data


Proposed fix for Bug #15259 (invalid empty array returned by intarray "&" operator)

2018-07-02 Thread Alexey Kryuchkov
The attached patch fixes Bug #15259 [1] in the intarray module, making the
'&' array intersection operator return proper zero-dimensional empty arrays
instead of one-dimensional, zero-length "empty" arrays.

In [2] this problem was addressed by changing the behaviour of
construct_[md_]array(), but the intarray module does not use these
functions. The patch I propose contains the relevant fixes to the
intarray module, along with regression tests.

[1]:
https://www.postgresql.org/message-id/153053285112.13258.434620894305716755%40wrigleys.postgresql.org
[2]: https://www.postgresql.org/message-id/20570.1506198...@sss.pgh.pa.us

Best regards,

Alexey Kryuchkov
diff --git a/contrib/intarray/_int_tool.c b/contrib/intarray/_int_tool.c
index ee8fb64a47..7fe82a8f12 100644
--- a/contrib/intarray/_int_tool.c
+++ b/contrib/intarray/_int_tool.c
@@ -4,6 +4,7 @@
 #include "postgres.h"
 
 #include "catalog/pg_type.h"
+#include "utils/array.h"
 
 #include "_int.h"
 
@@ -222,6 +223,13 @@ new_intArrayType(int num)
 	ArrayType  *r;
 	int			nbytes = ARR_OVERHEAD_NONULLS(1) + sizeof(int) * num;
 
+	/* if no elements, return a zero-dimensional array */
+	if (num == 0)
+	{
+		r = construct_empty_array(INT4OID);
+		return r;
+	}
+
 	r = (ArrayType *) palloc0(nbytes);
 
 	SET_VARSIZE(r, nbytes);
diff --git a/contrib/intarray/expected/_int.out b/contrib/intarray/expected/_int.out
index 0a5dd463ac..105c951bad 100644
--- a/contrib/intarray/expected/_int.out
+++ b/contrib/intarray/expected/_int.out
@@ -151,6 +151,30 @@ SELECT '{-1,3,1}'::int[] & '{1,2}';
  {1}
 (1 row)
 
+SELECT '{1}'::int[] & '{2}'::int[];
+ ?column? 
+--
+ {}
+(1 row)
+
+SELECT array_dims('{1}'::int[] & '{2}'::int[]);
+ array_dims 
+
+ 
+(1 row)
+
+SELECT ('{1}'::int[] & '{2}'::int[]) = '{}'::int[];
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT ('{}'::int[] & '{}'::int[]) = '{}'::int[];
+ ?column? 
+--
+ t
+(1 row)
+
 --test query_int
 SELECT '1'::query_int;
  query_int 
diff --git a/contrib/intarray/sql/_int.sql b/contrib/intarray/sql/_int.sql
index 44e1a729b4..40225c65ab 100644
--- a/contrib/intarray/sql/_int.sql
+++ b/contrib/intarray/sql/_int.sql
@@ -30,6 +30,10 @@ SELECT '{123,623,445}'::int[] | 1623;
 SELECT '{123,623,445}'::int[] | '{1623,623}';
 SELECT '{123,623,445}'::int[] & '{1623,623}';
 SELECT '{-1,3,1}'::int[] & '{1,2}';
+SELECT '{1}'::int[] & '{2}'::int[];
+SELECT array_dims('{1}'::int[] & '{2}'::int[]);
+SELECT ('{1}'::int[] & '{2}'::int[]) = '{}'::int[];
+SELECT ('{}'::int[] & '{}'::int[]) = '{}'::int[];
 
 
 --test query_int


Re: psql \df option for procedures

2018-07-02 Thread Isaac Morland
While you're looking at \df, you might want to consider removing the
display of source code (we have \sf for that, and it takes up a lot of
space), and adding a column to display access permissions (who can execute the
function).

On 2 July 2018 at 09:22, Fabrízio de Royes Mello 
wrote:

> On Mon, Jul 2, 2018 at 7:07 AM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> >
> > psql's \df command current has options a/n/t/w to show
> > aggregates/normal/trigger/window functions.  Do we want to add something
> > for procedures?
> >
>
> +1. I can write a patch to save your time If you don't do it yet.
>
> Regards,
>
> --
>Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
>PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>


Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-07-02 Thread Peter Geoghegan
On Mon, Jul 2, 2018 at 7:29 AM, Andrey V. Lepikhov
 wrote:
> In the new version the patch [1] was used in cooperation with 'retail
> indextuple deletion' and 'quick vacuum strategy' patches (see
> '0004-Retail-IndexTuple-Deletion-with-TID-sorting-in-leaf-.patch'.

Cool.

I'm going to post a revised version of the unique key patch soon. I've
found that it's slightly faster to use DESC ordering for the implicit
heap TID attribute. Doing so also restores the old user-visible
behavior for DROP dependency management, which allows me to remove all
changes to the regression test output.

> Execution time of last "VACUUM test;" command on my notebook was:
>
> with bulk deletion: 1.6 s;
> with Quick Vacuum Strategy: 5.2 s;
> with Quick Vacuum Strategy & TID sorting: 0.6 s.

I'm glad that you looked into this. You could make this faster still,
by actually passing the lowest heap TID in the list of TIDs to kill to
_bt_search() and _bt_binsrch(). You might have to work through several
extra B-Tree leaf pages per bttargetdelete() call without this (you'll
move right multiple times within bttargetdelete()).

-- 
Peter Geoghegan



Re: Tips on committing

2018-07-02 Thread Alvaro Herrera
On 2018-Jul-02, Stephen Frost wrote:

> > * Do a dry run before really pushing by using --dry-run.
> 
> In addition to this, I'd recommend using 'git show' on the results of
> the --dry-run, so that you see what you're really about to push.

Since commit 653530c8b196 I use this little script I borrowed from Magnus, then
page through all of it before pushing.

git push --dry-run 2>&1 | grep -v '^To' | while read line; do
    if [ "$line" == "Everything up-to-date" ]; then
        echo $line
    else
        topush=$(echo $line | awk '{print $1}')
        git log --format=oneline $topush | cat
        git show --format=fuller --color $topush | cat
    fi
done | less -R

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-02 Thread Alexander Korotkov
Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov  wrote:
> On 06.03.2018 17:30, David Steele wrote:
>
> > I agree with Andres.  Pushing this patch to the next CF.
>
> Attached 4th version of the patches rebased onto the current master.
> Nothing interesting has changed from the previous version.

I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't cause a regression in GiST, and that's why it
shouldn't cause a regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  At the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.

* I think KNN support should be briefly described in
src/backend/access/spgist/README.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Tips on committing

2018-07-02 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> FWIW, I developed a document on committing for my own reference, with
> some help from Andres. A lot of it is about commit message style, the
> use of fields, and so on. But I've also developed a check list for
> committing, knowing that there are a few classic mistakes that
> committers will make from time to time despite knowing better. These
> are worth checking against mechanically, IMV. Here are some actual
> examples from this document that I refer to:

Good list.

> * Do a dry run before really pushing by using --dry-run.

In addition to this, I'd recommend using 'git show' on the results of
the --dry-run, so that you see what you're really about to push.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Fix "base" snapshot handling in logical decoding

2018-07-02 Thread Arseny Sher

Arseny Sher  writes:

> There is also one thing that puzzles me as I don't know much about
> vacuum internals. If I do plain VACUUM of pg_attribute in the test, it
> shouts "catalog is missing 1 attribute(s) for relid" error (which is
> quite expected), while with 'VACUUM FULL pg_attribute' the tuple is
> silently (and wrongly, with dropped column missing) decoded. Moreover,
> if I perform the test manually, and do 'VACUUM FULL;', sometimes test
> becomes useless -- that is, tuple is successfully decoded with all three
> columns, as though VACUUM was not actually executed. All this is without
> the main patch, of course. I think I will look into this soon.

So I have been jumping around this and learned a few curious things.

1) The test in its current shape sometimes doesn't fulfill its aim indeed --
  that is, despite the issued VACUUM the tuple is still decoded with all
  three fields. This happens because during an attempt to advance xmin
  there is a good possibility of encountering an xl_running_xacts record logged
  before our CHECKPOINT (they are logged every 15 seconds). We do not
  try to advance xmin twice without client acknowledgment, so in this
  case xmin will not be advanced far enough to allow vacuum to prune the entry
  from pg_attribute.

2) This is not easy to notice because often (but not always) an explicit
   VACUUM is not needed at all: the tuple is often pruned by microvacuum
   (heap_page_prune_opt) right in the final decoding session. If we
   hadn't bumped xmin far enough during the previous get_changes, we do that
   now, so microvacuum actually purges the entry. But if we were so
   unfortunate that 1) an extra xl_running_xacts record was logged and 2)
   microvacuum was in a bad mood and didn't come, pg_attribute is not
   vacuumed and the test becomes useless. To make this bulletproof, in the
   attached patch I doubled the first get_changes: now there are two client
   acks, so our VACUUM always does the job.

3) As a side note, the answer to my question 'why do we get different errors
   with VACUUM and VACUUM FULL' is the following. With VACUUM FULL, not
   only is the old pg_attribute entry pruned, but the xmin of the new entry
   with attisdropped=true is also reset to the frozen xid. This means that
   the decoding session (RelationBuildTupleDesc) actually sees 3 attributes,
   and the fact that one of them is dropped doesn't bother this
   function (apparently relnatts in pg_class is never decremented) --
   we just go ahead and decode only the live attributes.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index d09342c4be..d1b4f17e3a 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -1,6 +1,6 @@
 Parsed test spec with 2 sessions
 
-starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_get_changes s1_commit s0_vacuum s0_get_changes
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_get_changes s0_get_changes s1_commit s0_vacuum s0_get_changes
 step s0_begin: BEGIN;
 step s0_getxid: SELECT txid_current() IS NULL;
 ?column?   
@@ -14,8 +14,11 @@ step s0_checkpoint: CHECKPOINT;
 step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 data   
 
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data   
+
 step s1_commit: COMMIT;
-step s0_vacuum: VACUUM FULL;
+step s0_vacuum: VACUUM pg_attribute;
 step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 data   
 
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 4f8af70aa2..141fe2b145 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -22,7 +22,7 @@ step "s0_getxid" { SELECT txid_current() IS NULL; }
 step "s0_alter" { ALTER TYPE basket DROP ATTRIBUTE mangos; }
 step "s0_commit" { COMMIT; }
 step "s0_checkpoint" { CHECKPOINT; }
-step "s0_vacuum" { VACUUM FULL; }
+step "s0_vacuum" { VACUUM pg_attribute; }
 step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
 
 session "s1"
@@ -30,8 +30,11 @@ step "s1_begin" { BEGIN; }
 step "s1_insert" { INSERT INTO harvest VALUES ((1, 2, 3)); }
 step "s1_commit" { COMMIT; }
 
-# Checkpoint with following get_changes forces to advance xmin. ALTER of a
+# Checkpoint with following get_changes forces xmin advancement. We do
+# get_changes twice because if one more xl_running_xacts record had slipped
+# before our CHECKPOINT, xmin will be advanced only on this

Re: branches_of_interest.txt

2018-07-02 Thread Tom Lane
Andres Freund  writes:
> On 2018-07-01 11:41:07 -0400, Tom Lane wrote:
>> I can see the value of people other than you being able to change it,
>> but keeping it in the core repo seems like a kluge not a proper solution.

> FWIW, I've a manually maintained version of this in the scripts I use to
> commit / backpatch things.  I'd appreciate not having to manually
> maintain it, and be afraid to forget updating it ;)

Yeah, I have something similar as well, in fact a couple of them.
I thought about whether I'd change those scripts to use an in-tree
branches_of_interest list, and I concluded that most likely I would
not, because the instant at which I decide a given branch is no longer
interesting to me is not necessarily the same instant at which we
shut down the buildfarm support for it.  It has even less to do with
the time that I next do a "git pull" after that shutdown.  Contrariwise,
once a new branch has been created in the repo, I need to set up a workdir
for it before it can safely be added to the list of branches I auto
push/pull.  So giving up control of the timing just isn't gonna work well.

If we were to keep this info in a separate repo requiring a separate
"git pull", it might be manageable since I could control when I update
that.  But then you're right back to the situation of requiring a manual
step you might forget.

In any case, if we do put this into the tree, I want it to be named
and treated as something that *only* controls the buildfarm, not any
other stuff.  We'll regret tying other stuff to that.

regards, tom lane



Re: Should contrib modules install .h files?

2018-07-02 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> So, given that we have to add something to the module makefiles
>  Tom> anyway, we could also add a macro specifying the subdirectory name
>  Tom> to use. (Although in practice this should always be equal to the
>  Tom> contrib/ subdirectory name, so maybe we could extract it on that
>  Tom> basis?)

> Using the subdir name may work for in-tree contrib/ builds but it's not
> so good for PGXS, which should not be making assumptions about the build
> directory name.

Fair point.

> How about this: it's most likely that modules that install include files
> will also be using MODULE_big, so use that as the default name; if a
> makefile that uses only MODULES also wants to install include files,
> have it define MODULE_NAME (or some such variable) itself.

AFAIR, you're supposed to define at most one of those macros anyway,
so I don't see why it couldn't be like "use MODULE_big if set, else
use MODULE if set, else fail if MODULE_NAME isn't set".

regards, tom lane



Re: ERROR: cannot start subtransactions during a parallel operation

2018-07-02 Thread Andres Freund
Hi,

On 2018-07-01 11:02:24 +0200, Mai Peng wrote:
> Hello, how could I relax the subtransaction restriction, I used the
> Parallel Unsafe option, but still have the same issue.
> Rgds.

I'm unclear why you still get the error. Could you please give us the
query triggering the error, including an explain of it, and the involved
function definitions?

Greetings,

Andres Freund



Re: Should contrib modules install .h files?

2018-07-02 Thread Andres Freund
On 2018-07-02 16:11:07 +0100, Andrew Gierth wrote:
> > "Tom" == Tom Lane  writes:
> 
>  >> A slight snag in trying to use a subdir for each module is that
>  >> there is not in fact anywhere in the existing makefiles that uses or
>  >> assigns such a name. Indeed some contrib subdirs install multiple
>  >> modules.
> 
>  Tom> So, given that we have to add something to the module makefiles
>  Tom> anyway, we could also add a macro specifying the subdirectory name
>  Tom> to use. (Although in practice this should always be equal to the
>  Tom> contrib/ subdirectory name, so maybe we could extract it on that
>  Tom> basis?)
> 
> Using the subdir name may work for in-tree contrib/ builds but it's not
> so good for PGXS, which should not be making assumptions about the build
> directory name.
> 
> How about this: it's most likely that modules that install include files
> will also be using MODULE_big, so use that as the default name; if a
> makefile that uses only MODULES also wants to install include files,
> have it define MODULE_NAME (or some such variable) itself.

For LLVM bitcode generation I made it so MODULE_big uses that as the
directory name, whereas MODULES just iterates over the components. Now
for the bitcode there was less need to make it selective, so it's a bit
easier.  We could just define that HEADERS_$component is referenced for
each component in MODULES or something like that?

Greetings,

Andres Freund



Re: branches_of_interest.txt

2018-07-02 Thread Andres Freund
On 2018-07-01 11:41:07 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > This file on the buildfarm server is used to tell clients which branches 
> > we'd like built. When a new stable branch is created it's added manually 
> > to this file, and when one gets to EOL it's removed from the file. This 
> > is a rather cumbersome process, and it occurred to me that it could be 
> > streamlined by our keeping it in the core repo instead.
> 
> I can see the value of people other than you being able to change it,
> but keeping it in the core repo seems like a kluge not a proper solution.
> In particular, once it'd been around for awhile so that the master copy
> had diverged from the back branches' copies, that would be pretty
> confusing IMO.

FWIW, I've a manually maintained version of this in the scripts I use to
commit / backpatch things.  I'd appreciate not having to manually
maintain it, and be afraid to forget updating it ;)

FWIW, I don't really see the problem with maintaining it in-tree; it has
the advantage that the set of known-to-be-maintained branches
is guaranteed to be current.

Greetings,

Andres Freund



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> BTW, it's somewhat interesting to think about whether we ought to
 Tom> change the coding conventions so that extensions refer to their
 Tom> own headers with a subdirectory, e.g., #include "bloom/bloom.h".
 Tom> Having done that, all of contrib could build with a single
 Tom> centrally-provided -I switch pointing at BUILDDIR/contrib/, and
 Tom> there would be a path to allowing the code to build out of tree by
 Tom> pointing that common -I at $(includedir_server)/ or
 Tom> $(includedir_server)/MODULEDIR. This seems like it could be a lot
 Tom> less messy as we accrete more cross-module references.

I'm slightly skeptical of this because it could cause unexpected issues
when you rebuild (especially in the PGXS case) a module that has already
been installed; without care, you'd end up getting the module's own
headers from the installed version rather than the one being built,
which would be very bad.

-- 
Andrew (irc:RhodiumToad)



Re: Test-cases for deferred constraints in plpgsql_transaction.sql

2018-07-02 Thread Ashutosh Sharma
On Mon, Jul 2, 2018 at 7:07 PM, Tom Lane  wrote:
> Ashutosh Sharma  writes:
>> On Mon, Jul 2, 2018 at 3:30 PM, Peter Eisentraut
>>  wrote:
>>> I won't say we don't need more tests, but I don't see a particular
>>> testing gap in this area.
>
>> I am not saying that the existing test-case is not enough to test
>> deferred constraints but, it would have been good to add test-cases
>> having ROLLBACK statements with deferred constraints as well.
>
> What exactly would that test, other than that ROLLBACK discards changes?
> Which is a point that seems to me to be pretty well covered already.
>

Firstly, it would test whether ROLLBACK works as expected when used
with deferred constraints in plpgsql procedures. Secondly, it
would test whether COMMIT/ROLLBACK works as expected for deferrable
constraints which were initially of the immediate type but were later set
to deferred using the SET CONSTRAINTS command. I just raised this point
because I couldn't find such a test anywhere in the plpgsql_transaction.sql
file. Please let me know if I am missing something here. Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> A slight snag in trying to use a subdir for each module is that
 >> there is not in fact anywhere in the existing makefiles that uses or
 >> assigns such a name. Indeed some contrib subdirs install multiple
 >> modules.

 Tom> So, given that we have to add something to the module makefiles
 Tom> anyway, we could also add a macro specifying the subdirectory name
 Tom> to use. (Although in practice this should always be equal to the
 Tom> contrib/ subdirectory name, so maybe we could extract it on that
 Tom> basis?)

Using the subdir name may work for in-tree contrib/ builds but it's not
so good for PGXS, which should not be making assumptions about the build
directory name.

How about this: it's most likely that modules that install include files
will also be using MODULE_big, so use that as the default name; if a
makefile that uses only MODULES also wants to install include files,
have it define MODULE_NAME (or some such variable) itself.

-- 
Andrew (irc:RhodiumToad)



Old small commitfest items

2018-07-02 Thread Andrew Dunstan
Andres talked about us concentrating on old items and very small
items. Here's a list of items that are both old and small (FSVO
"small"):

The first number is the CF item number, the second the patch line count:

528 1146 Fix the optimization to skip WAL-logging on table created in
same transaction
669 847 pgbench - allow to store query results into variables
713 346 Correct space parsing in to_timestamp()
922 180 Failure at replay for corrupted 2PC files + reduce window
between end-of-recovery record and history file write
931 1527 Protect syscache from bloating with negative cache entries
962 553 new plpgsql extra_checks
990 248 add GUCs to control custom plan logic
1001 851 Convert join OR clauses into UNION queries
1004 1159 SERIALIZABLE with parallel query
1085 922 XML XPath default namespace support
1113 68 Replication status in logical replication
1138 733 Improve compactify_tuples and PageRepairFragmentation
1141 1851 Full merge join on comparison clause
1166 1162 Fix LWLock degradation on NUMA

The first item on the list is just plain embarrassing. It's a bug fix
item that we've been punting for 10 CFs.

If people want a priority list for items to attack during the CF this
list is probably a good place to start.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-07-02 Thread Andrey V. Lepikhov



On 29.06.2018 14:07, Юрий Соколов wrote:
On Thu, 28 Jun 2018 at 8:37, Andrey V. Lepikhov wrote:




On 28.06.2018 05:00, Peter Geoghegan wrote:
 > On Tue, Jun 26, 2018 at 11:40 PM, Andrey V. Lepikhov wrote:
 >> I still believe that the patch for physical TID ordering in btree:
 >> 1) has its own value, not only for target deletion,
 >> 2) will require only a few local changes in my code,
 >> and this patches can be developed independently.
 >
 > I want to be clear on something now: I just don't think that this
 > patch has any chance of getting committed without something like my
 > own patch to go with it. The worst case for your patch without that
 > component is completely terrible. It's not really important for
you to
 > actually formally make it part of your patch, so I'm not going to
 > insist on that or anything, but the reality is that my patch does not
 > have independent value -- and neither does yours.
 >
As I wrote before in the last email, I will integrate TID sorting to my
patches right now. Please, give me access to the last version of your
code, if it possible.
You can track the progress at https://github.com/danolivo/postgres git
repository


Peter is absolutely right, imho: tie-breaking by TID within index
ordering is inevitable for reliable performance of this patch.



In the new version, the patch [1] is used in cooperation with the 'retail
indextuple deletion' and 'quick vacuum strategy' patches (see
'0004-Retail-IndexTuple-Deletion-with-TID-sorting-in-leaf-.patch').


To demonstrate the potential benefits, I did a test:

CREATE TABLE test (id serial primary key, value integer, factor integer);
INSERT INTO test (value, factor) SELECT random()*1e5, random()*1e3 FROM 
generate_series(1, 1e7);

CREATE INDEX ON test(value);
VACUUM;
DELETE FROM test WHERE (factor = 1);
VACUUM test;

Execution time of last "VACUUM test;" command on my notebook was:

with bulk deletion: 1.6 s;
with Quick Vacuum Strategy: 5.2 s;
with Quick Vacuum Strategy & TID sorting: 0.6 s.

[1] 
https://www.postgresql.org/message-id/CAH2-WzkVb0Kom%3DR%2B88fDFb%3DJSxZMFvbHVC6Mn9LJ2n%3DX%3DkS-Uw%40mail.gmail.com



With regards,
Sokolov Yura.


--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

>From 7f2384691de592bf4d11cc8b4d75eca5500cd500 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 2 Jul 2018 16:13:08 +0500
Subject: [PATCH 1/4] Retail-IndexTuple-Deletion-Access-Method

---
 contrib/bloom/blutils.c  |   1 +
 src/backend/access/brin/brin.c   |   1 +
 src/backend/access/gin/ginutil.c |   1 +
 src/backend/access/gist/gist.c   |   1 +
 src/backend/access/hash/hash.c   |   1 +
 src/backend/access/index/indexam.c   |  15 
 src/backend/access/nbtree/nbtree.c   | 138 +++
 src/backend/access/spgist/spgutils.c |   1 +
 src/include/access/amapi.h   |   6 ++
 src/include/access/genam.h   |  18 +
 src/include/access/nbtree.h  |   4 +
 11 files changed, 187 insertions(+)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 6b2b9e3..96f1d47 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -126,6 +126,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = blbuild;
 	amroutine->ambuildempty = blbuildempty;
 	amroutine->aminsert = blinsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e95fbbc..a0e06bd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -103,6 +103,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = brinbuild;
 	amroutine->ambuildempty = brinbuildempty;
 	amroutine->aminsert = brininsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 0a32182..acf14e7 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -58,6 +58,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = ginbuild;
 	amroutine->ambuildempty = ginbuildempty;
 	amroutine->aminsert = gininsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8a42eff..d7a53d2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -80,6 +80,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = gistbuild;
 	amroutine->ambuildempty = gistbuildempty;
 	

Re: Should contrib modules install .h files?

2018-07-02 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> FWIW, I agree with Andres' thought that each contrib module should
>  Tom> have its own subdirectory under $(includedir_server). Otherwise
>  Tom> we're going to be faced with questions about whether .h files need
>  Tom> to be renamed because they're not globally unique enough. There
>  Tom> are already some that are pretty shaky from this standpoint:

> I'm not suggesting that all modules should install a .h file or that all
> of a module's .h files should be installed.

I agree with that, which implies the need for a new macro comparable to
DATA and DOCS that lists the .h files to be installed.

> A slight snag in trying to use a subdir for each module is that there is
> not in fact anywhere in the existing makefiles that uses or assigns such
> a name. Indeed some contrib subdirs install multiple modules.

So, given that we have to add something to the module makefiles anyway,
we could also add a macro specifying the subdirectory name to use.
(Although in practice this should always be equal to the contrib/
subdirectory name, so maybe we could extract it on that basis?)

regards, tom lane



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Andrew Gierth  writes:
 >> I'm thinking that $(includedir_server)/$(MODULEDIR) would be a
 >> reasonable place? MODULEDIR defaults to either "contrib" or
 >> "extension" depending on whether EXTENSION is set. Something like
 >> the attached patch seem reasonable?

 Tom> FWIW, I agree with Andres' thought that each contrib module should
 Tom> have its own subdirectory under $(includedir_server). Otherwise
 Tom> we're going to be faced with questions about whether .h files need
 Tom> to be renamed because they're not globally unique enough. There
 Tom> are already some that are pretty shaky from this standpoint:

I'm not suggesting that all modules should install a .h file or that all
of a module's .h files should be installed.

A slight snag in trying to use a subdir for each module is that there is
not in fact anywhere in the existing makefiles that uses or assigns such
a name. Indeed some contrib subdirs install multiple modules.

 Tom> Not sure about whether the MODULEDIR part is useful. Seems like
 Tom> making a distinction between extensions and other contrib is
 Tom> likely to create more headaches than it avoids.

Sure, but that's just copied from DATA and DOCS which already do it that
way. For DATA there seems some justification based on CREATE EXTENSION,
but for docs?

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2018-07-02 Thread Jesper Pedersen

Hi,

On 03/01/2018 10:50 AM, Jesper Pedersen wrote:
As the patch still applies, make check-world passes, and I believe that 
Yura has provided feedback for Andres' comments, I'll leave this entry 
in "Ready for Committer".




The patch from November 27, 2017 still applies (with hunks), passes 
"make check-world" and shows performance improvements.


Keeping it in "Ready for Committer".



The patch from November 27, 2017 still applies (with hunks),

 https://commitfest.postgresql.org/18/1166/

passes "make check-world" and shows performance improvements.

Keeping it in "Ready for Committer".

Best regards,
 Jesper



Re: PANIC during crash recovery of a recently promoted standby

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 04:25:13PM +0900, Kyotaro HORIGUCHI wrote:
> When minRecoveryPoint is invalid, there're only two possible
> cases. It may be at very beginning of archive reovery or may be
> running a crash recovery. In the latter case, we have detected
> crash recovery before redo starts. So we can turn off
> updateMinRecoveryPoint immediately and no further check is
> needed and it is (I think) easier to understand.

Er, you are missing the point that updateMinRecoveryPoint is also used
by processes, like the checkpointer, other than the startup process,
which actually needs to update minRecoveryPoint and rely on the default
value of updateMinRecoveryPoint which is true...

I am planning to finish wrapping up this patch, hopefully on Wednesday JST
time, or in the worst case on Thursday.  I have had this problem on my mind
for a couple of days now and I could not find a case where the approach
taken could cause a problem.  Opinions are welcome.
--
Michael


signature.asc
Description: PGP signature


Re: Test-cases for deferred constraints in plpgsql_transaction.sql

2018-07-02 Thread Tom Lane
Ashutosh Sharma  writes:
> On Mon, Jul 2, 2018 at 3:30 PM, Peter Eisentraut
>  wrote:
>> I won't say we don't need more tests, but I don't see a particular
>> testing gap in this area.

> I am not saying that the existing test-case is not enough to test
> deferred constraints but, it would have been good to add test-cases
> having ROLLBACK statements with deferred constraints as well.

What exactly would that test, other than that ROLLBACK discards changes?
Which is a point that seems to me to be pretty well covered already.

regards, tom lane



Re: Should contrib modules install .h files?

2018-07-02 Thread Tom Lane
Andrew Gierth  writes:
> I'm thinking that $(includedir_server)/$(MODULEDIR) would be a
> reasonable place? MODULEDIR defaults to either "contrib" or "extension"
> depending on whether EXTENSION is set.
> Something like the attached patch seem reasonable?

FWIW, I agree with Andres' thought that each contrib module should have
its own subdirectory under $(includedir_server).  Otherwise we're going
to be faced with questions about whether .h files need to be renamed
because they're not globally unique enough.  There are already some that
are pretty shaky from this standpoint:

contrib$ ls */*.h
bloom/bloom.h   pg_trgm/trgm.h
btree_gist/btree_gist.h pgcrypto/blf.h
btree_gist/btree_utils_num.hpgcrypto/imath.h
btree_gist/btree_utils_var.hpgcrypto/mbuf.h
cube/cubedata.h pgcrypto/md5.h
hstore/hstore.h pgcrypto/pgcrypto.h
intarray/_int.h pgcrypto/pgp.h
isn/EAN13.h pgcrypto/px-crypt.h
isn/ISBN.h  pgcrypto/px.h
isn/ISMN.h  pgcrypto/rijndael.h
isn/ISSN.h  pgcrypto/sha1.h
isn/UPC.h   postgres_fdw/postgres_fdw.h
isn/isn.h   seg/segdata.h
ltree/crc32.h   sepgsql/sepgsql.h
ltree/ltree.h   tablefunc/tablefunc.h
pageinspect/pageinspect.h

Not sure about whether the MODULEDIR part is useful.  Seems like
making a distinction between extensions and other contrib is
likely to create more headaches than it avoids.

BTW, it's somewhat interesting to think about whether we ought to
change the coding conventions so that extensions refer to their
own headers with a subdirectory, e.g., #include "bloom/bloom.h".
Having done that, all of contrib could build with a single
centrally-provided -I switch pointing at BUILDDIR/contrib/,
and there would be a path to allowing the code to build out of
tree by pointing that common -I at $(includedir_server)/ or
$(includedir_server)/MODULEDIR.  This seems like it could be
a lot less messy as we accrete more cross-module references.

regards, tom lane



Re: Monitoring time of fsyncing WALs

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 11:36:06AM +0800, Craig Ringer wrote:
> On 1 July 2018 at 11:29, Michael Paquier  wrote:
>> So at the end, I would like to use the proposed patch and call it a
>> day.  Thoughts?
>>
> Patch looks good.

Thanks Craig for the review!  I have just pushed the previous patch with
the adjustments mentioned.  Regarding enforcing pg_fsync with a wait
event, I think I'll step back on this one.  There are also arguments for
allowing code paths to not have wait events as that's quite low-level,
and for extensions the choice is limited as long as there is no way to
register custom names, if that ever happens, which I am not sure of.

> I'll hijack the thread to add a few more perf/dtrace tracepoints in the WAL
> code, as they were also missing. Proposed rider patch attached.

:)

As that is a separate discussion and as the commit fest has already
begun with many items in the queue, could you begin a new thread and add
this stuff within a new CF entry?  I would say that having more trace
points in this area could be helpful, but let's discuss that
appropriately so as your proposal catches proper attention.

> BTW, we might want to instrument the pgstat_ counter calls
> and pgstat_report_wait_start / pgstat_report_wait_end, but it's easy enough
> to use dynamic tracepoints for those so I haven't yet. Maybe even just
> document them as points of interest.

Perhaps.  Feel free to propose anything you have in mind of course.
--
Michael


signature.asc
Description: PGP signature


Re: psql \df option for procedures

2018-07-02 Thread Fabrízio de Royes Mello
On Mon, Jul 2, 2018 at 7:07 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
>
> psql's \df command current has options a/n/t/w to show
> aggregates/normal/trigger/window functions.  Do we want to add something
> for procedures?
>

+1. I can write a patch to save you time if you haven't done it yet.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: [PATCH] Include application_name in "connection authorized" log message

2018-07-02 Thread Don Seiler
On Mon, Jul 2, 2018 at 2:13 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 21.06.18 16:21, Don Seiler wrote:
> > -   (errmsg("connection
> > authorized: user=%s database=%s",
> > -
> >  port->user_name, port->database_name)));
> > +   (errmsg("connection
> > authorized: user=%s database=%s application=%s",
> > +
> >  port->user_name, port->database_name, port->application_name)));
>
> Why is it "application" and not "application_name"?


I was trying to be consistent since we don't use "user_name" or
"database_name" as labels even though those are the variable names.


-- 
Don Seiler
www.seiler.us


Re: branches_of_interest.txt

2018-07-02 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 2, 2018 at 8:18 AM, Andrew Dunstan
>  wrote:
>> An alternative would be to create a special branch within the core
>> repo for such data, something like this (The first two lines are the
>> ones that are most important):
>> ...
>> The new branch won't share any history or files with the existing branches.

> Seems like too much magic to me.

Dunno, I was wondering yesterday whether something like that would be
possible.  It'd be easier to maintain than a separate repo, for sure.

I wonder what that would look like in gitweb, though.  If the website
treated it like a version branch, it'd likely be weird.

regards, tom lane



Re: branches_of_interest.txt

2018-07-02 Thread Andrew Dunstan
On Mon, Jul 2, 2018 at 8:33 AM, Robert Haas  wrote:
> On Mon, Jul 2, 2018 at 8:18 AM, Andrew Dunstan
>  wrote:
>> Ideally this would be done as part of creating the new branch. Since
>> the web site doesn't have the same set of committers, a second metadata
>> repo like this seems sensible.
>> An alternative would be to create a special branch within the core
>> repo for such data, something like this (The first two lines are the
>> ones that are most important):
>>
>> git checkout --orphan metadata
>> git rm --cached -r .
>> wget https://buildfarm.postgresql.org/branches_of_interest.txt
>> git add branches_of_interest.txt
>> git commit -m 'initial content' branches_of_interest.txt
>> git push origin HEAD
>> git checkout master
>>
>> The new branch won't share any history or files with the existing branches.
>
> Seems like too much magic to me.
>


This is pretty much how GitHub's gh-pages docco mechanism works. It's
not particularly deep magic. But if it makes people uncomfortable,
let's go for a second repo. It's not worth having a huge argument
over.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: branches_of_interest.txt

2018-07-02 Thread Robert Haas
On Mon, Jul 2, 2018 at 8:18 AM, Andrew Dunstan
 wrote:
> Ideally this would be done as part of creating the new branch. Since
> the web site doesn't have the same set of committers, a second metadata
> repo like this seems sensible.
> An alternative would be to create a special branch within the core
> repo for such data, something like this (The first two lines are the
> ones that are most important):
>
> git checkout --orphan metadata
> git rm --cached -r .
> wget https://buildfarm.postgresql.org/branches_of_interest.txt
> git add branches_of_interest.txt
> git commit -m 'initial content' branches_of_interest.txt
> git push origin HEAD
> git checkout master
>
> The new branch won't share any history or files with the existing branches.

Seems like too much magic to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Explain buffers wrong counter with parallel plans

2018-07-02 Thread Robert Haas
On Sun, Jun 10, 2018 at 1:18 AM, Amit Kapila  wrote:
> Right, I think we have following options:
> (a) Come up with a solution which allows percolating the buffer usage
> and or similar stats to upper nodes in all cases.
> (b) Allow it to work for some of the cases as it was earlier.
>
> I think (b) can cause confusion and could lead to further questions on
> which specific cases will it work and for which it won't work.  If we
> think (a) is a reasonable approach, then we can close this item with a
> conclusion as a work item for future and OTOH if we think option (b)
> is the better way to deal with it, then we can come up with a patch to
> do so.  My inclination is to go with option (a), but I don't mind if
> the decision is to choose option (b).

I think the core problem here is this hunk from gather_readnext:

 {
 Assert(!tup);
-DestroyTupleQueueReader(reader);
 --gatherstate->nreaders;
 if (gatherstate->nreaders == 0)
-{
-ExecShutdownGatherWorkers(gatherstate);
 return NULL;
-}
 memmove(&gatherstate->reader[gatherstate->nextreader],
 &gatherstate->reader[gatherstate->nextreader + 1],
 sizeof(TupleQueueReader *)

Since ExecShutdownGatherWorkers() is no longer called there, the
instrumentation data isn't accumulated into the Gather node when the
workers are shut down.  I think that's a bug and we should fix it.

To fix the problem with Limit that you mention, we could just modify
nodeLimit.c so that when the state is changed from LIMIT_INWINDOW to
LIMIT_WINDOWEND, we also call ExecShutdownNode on the child plan.
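
A rough sketch of what that might look like inside ExecLimit() in
src/backend/executor/nodeLimit.c (an illustration of the idea only, not a
finished patch; the exact condition and placement would need review):

    /*
     * Sketch only: once the limit window has been exhausted in the
     * forward direction, shut down the subtree so that nodes such as
     * Gather stop their workers and accumulate the workers'
     * instrumentation (buffer usage etc.) before EXPLAIN reads it.
     */
    if (node->position - node->offset >= node->count)
    {
        node->lstate = LIMIT_WINDOWEND;

        /* Let the child plan (e.g. Gather) collect worker statistics. */
        ExecShutdownNode(outerPlanState(node));

        return NULL;
    }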

We can fix other cases as we find them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Explain buffers wrong counter with parallel plans

2018-07-02 Thread Robert Haas
On Fri, Jun 29, 2018 at 6:12 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I'm CCing Tom here, as author of the patch that caused (most of) the
>> issue.
>
> Uh ... me?  I thought this was a parallel-query issue, which I've
> pretty much not been involved in.

Well, it was your commit that caused the behavior change, as the
original poster mentioned in his email.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: branches_of_interest.txt

2018-07-02 Thread Andrew Dunstan
On Mon, Jul 2, 2018 at 4:45 AM, Magnus Hagander  wrote:
>
>
> On Mon, Jul 2, 2018 at 10:39 AM, Peter Eisentraut
>  wrote:
>>
>> On 01.07.18 17:41, Tom Lane wrote:
>> > I can see the value of people other than you being able to change it,
>> > but keeping it in the core repo seems like a kluge not a proper
>> > solution.
>> > In particular, once it'd been around for awhile so that the master copy
>> > had diverged from the back branches' copies, that would be pretty
>> > confusing IMO.
>>
>> Yeah, I'd find this kind of weird.  The version control system contains
>> the code, not the other way around.
>>
>> > Is it worth the trouble of having a separate repo for info that
>> > shouldn't
>> > be tied to a particular PG development branch?  I'm not quite sure what
>> > else we would keep there besides this file, but perhaps other use-cases
>> > will come up.  Some of the stuff in src/tools/ has a bit of this flavor.
>>
>> The web site has some information about which versions are current, so
>> maybe there is an opportunity to hook the buildfarm in there.
>>
>> (I also have to change some things on babel.postgresql.org every time a
>> branch is made.)
>
>
> We could definitely do that on the website, if people think that's the
> easiest.
>
> We could also set up a separate git repository with exactly the same set of
> committers as the main one (but only use the master branch there) as one
> option. The maintenance cost of doing so would be very close to zero as we
> have all the infrastructure for it already. so it'd be more about which one
> is more convenient to use.
>


Ideally this would be done as part of creating the new branch. Since
the web site doesn't have the same set of committers, a second metadata
repo like this seems sensible.
An alternative would be to create a special branch within the core
repo for such data, something like this (The first two lines are the
ones that are most important):

git checkout --orphan metadata
git rm --cached -r .
wget https://buildfarm.postgresql.org/branches_of_interest.txt
git add branches_of_interest.txt
git commit -m 'initial content' branches_of_interest.txt
git push origin HEAD
git checkout master

The new branch won't share any history or files with the existing branches.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: cannot start subtransactions during a parallel operation

2018-07-02 Thread Robert Haas
On Sun, Jul 1, 2018 at 5:02 AM, Mai Peng  wrote:
> Hello, how could I relax the subtransaction restriction? I used the Parallel
> Unsafe option, but still have the same issue.

There's no user option for that.  Somebody would need to enhance
PostgreSQL by writing a patch.

I agree with Andres that the issue of XID-assignment needs to be
considered.  There are probably separate guards against that problem,
though, so it may not be an issue.  But somebody really ought to go
through xact.c and see if they can find any other problems - e.g.
whether
nParallelCurrentXids/ParallelCurrentXids will work properly, whether
any places check the current transaction for a parallel-in-progress
status rather than the top-transaction, whether aborting out of
multiple subtransactions works directly from a parallel worker.  As
the comment says:

/*
 * Workers synchronize transaction state at the beginning of
each parallel
 * operation, so we can't account for new subtransactions after that
 * point. We might be able to make an exception for the type of
 * subtransaction established by this function, which is
typically used in
 * contexts where we're going to release or roll back the subtransaction
 * before proceeding further, so that no enduring change to the
 * transaction state occurs. For now, however, we prohibit
this case along
 * with all the others.
 */

I think that at the time I wrote that comment (and I believe I was the
one who did write it) I mostly didn't have the time to do a careful
investigation for lurking problems, and it seemed better to be
conservative.  But it's quite possible that there is nothing to do
other than remove that error check.
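
For reference, the check in question is the guard that the
subtransaction-starting entry points in src/backend/access/transam/xact.c
(BeginInternalSubTransaction, DefineSavepoint, ...) begin with; paraphrased
from memory (the exact errcode may differ), it looks roughly like this:

    /*
     * The guard that raises the error in the subject line.  Relaxing the
     * restriction would mean deciding in which of these entry points the
     * test can safely be removed or weakened.
     */
    if (IsInParallelMode())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot start subtransactions during a parallel operation")));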

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: branches_of_interest.txt

2018-07-02 Thread Robert Haas
On Mon, Jul 2, 2018 at 4:39 AM, Peter Eisentraut
 wrote:
> On 01.07.18 17:41, Tom Lane wrote:
>> I can see the value of people other than you being able to change it,
>> but keeping it in the core repo seems like a kluge not a proper solution.
>> In particular, once it'd been around for awhile so that the master copy
>> had diverged from the back branches' copies, that would be pretty
>> confusing IMO.
>
> Yeah, I'd find this kind of weird.  The version control system contains
> the code, not the other way around.

+1.  But I like the idea of using something to which there is broader access.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-07-02 Thread Masahiko Sawada
On Mon, Jul 2, 2018 at 5:25 PM, Daniel Gustafsson  wrote:
>> On 26 Jun 2018, at 17:11, Alexander Kuzmenkov  
>> wrote:
>
>> I took a look at the patch. It applies and compiles, the tests pass.
>
> Thanks for reviewing, and apologies for the slow response.
>
>> Some thoughts about the code:
>>
>> * Postgres lists cache their lengths, so you don't need uniqueLen.
>
> Good point, fixed.
>
>> * Use an array of WindowClauseSortNode's instead of a list. It's more 
>> suitable here because you are going to sort (qsort_list creates a temporary 
>> array).
>
> I was thinking about that, but opted for code simplicity with a List approach.
> The required size of the array isn’t known ahead of time, so it must either
> potentially overallocate to the upper bound of root->parse->windowClause or 
> use
> heuristics and risk reallocating when growing, neither of which is terribly
> appealing.  Do you have any suggestions or preferences?
>
>> * Reversing the sorted list looks more confusing to me than just sorting it 
>> in the proper order right away. A qsort() comparator returning negative 
>> means "left goes before right", but here it is effectively the other way 
>> around.
>
> Changed.
>
>> * There is a function named make_pathkeys_for_window that makes a list of 
>> canonical pathkeys given a window clause. Using this function to give you 
>> the pathkeys, and then comparing them, would be more future-proof in case we 
>> ever start using hashing for windowing. Moreover, the canonical pathkeys can 
>> be compared with pointer comparison which is much faster than equal(). 
>> Still, I'm not sure whether it's going to be convenient to use in this case, 
>> or even whether it is a right thing to do. What do you think?
>
> That’s an interesting thought, one that didn’t occur to me while hacking.  I’m
> not sure whether is would be wise/clean to overload with this functionality
> though.
>
> Attached updated version also adds a testcase that was clearly missing from 
> the
> previous version and an updated window.out.
>
> cheers ./daniel
>

Thank you for updating the patch! There are two review comments.

The current select_active_windows() function compares all fields
of WindowClause for the sorting, but with this patch we compare only
tleSortGroupRef, sortop and the number of uniqueOrder. I think this
leads to a degradation, as follows.

=# explain select *, sum(b) over w1, sum(a) over w2, sum(b) over w3
from w window w1 as (partition by a order by a nulls first), w2 as
(partition by a order by a), w3 as (partition by a order by a nulls
first);

* Current HEAD
QUERY PLAN
---
 WindowAgg  (cost=369.16..414.36 rows=2260 width=32)
   ->  Sort  (cost=369.16..374.81 rows=2260 width=24)
 Sort Key: a
 ->  WindowAgg  (cost=158.51..243.26 rows=2260 width=24)
   ->  WindowAgg  (cost=158.51..203.71 rows=2260 width=16)
 ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
   Sort Key: a NULLS FIRST
   ->  Seq Scan on w  (cost=0.00..32.60
rows=2260 width=8)
(8 rows)

* With patch
   QUERY PLAN
-
 WindowAgg 3  (cost=500.72..545.92 rows=2260 width=32)
   ->  Sort  (cost=500.72..506.37 rows=2260 width=24)
 Sort Key: a NULLS FIRST
 ->  WindowAgg 2  (cost=329.61..374.81 rows=2260 width=24)
   ->  Sort  (cost=329.61..335.26 rows=2260 width=16)
 Sort Key: a
 ->  WindowAgg 1  (cost=158.51..203.71 rows=2260 width=16)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
 Sort Key: a NULLS FIRST
 ->  Seq Scan on w  (cost=0.00..32.60
rows=2260 width=8)
(10 rows)

---
+* Generating the uniqueOrder can be offloaded
to the comparison
+* function to optimize for the case where we
only have a single
+* window.  For now, optimize for readibility.

s/readibility/readability/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: jsonpath

2018-07-02 Thread Nikita Glukhov

On 28.06.2018 05:39, Thomas Munro wrote:


On Thu, Jun 28, 2018 at 11:38 AM, Nikita Glukhov
 wrote:

Attached 15th version of the patches.

Hi Nikita,

I wonder why the Windows build scripts are not finding and processing
your new .y and .l files:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.3672

Perhaps Mkvcbuild.pm needs to be told about them?  For example, it has this:

 $postgres->AddFiles('src/backend/parser', 'scan.l', 'gram.y');

Yes, generation of jsonpath_gram.* was missing in Mkvcbuild.pm.

I have fixed it in the new 16th version of patches, and now it seems to
work on Windows
(https://ci.appveyor.com/project/NikitaGlukhov/postgres/build/1.0.4).


PS If you need a way to test on Windows but don't have a local Windows
build system, this may be useful:

https://wiki.postgresql.org/wiki/Continuous_Integration


Thank, this is really helpful.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


sqljson_jsonpath_v16.tar.gz
Description: application/gzip


Re: Cache lookup errors with functions manipulation object addresses

2018-07-02 Thread Michael Paquier
On Sun, Jul 01, 2018 at 12:31:17PM -0400, Andrew Dunstan wrote:
> I think you're asserting far too broad a policy for the CF, and in any case
> there has been no discussion of what exactly is a large patch. I don't see
> any great need to defer patch 3. It is substantial although not what I would
> class as large, but it also has relatively low impact, ISTM.

I am fine with any conclusion.  As the patch has rotten a bit, I have
switched it from "Ready for Committer" to "Needs Review" by the way;
that seems more appropriate.
--
Michael


signature.asc
Description: PGP signature


Re: effect of JIT tuple deform?

2018-07-02 Thread Pierre Ducroquet
On Wednesday, June 27, 2018 5:38:31 PM CEST Pavel Stehule wrote:
> 2018-06-27 17:19 GMT+02:00 Tomas Vondra :
> > On 06/26/2018 09:25 PM, Pavel Stehule wrote:
> >> Hi
> >> 
> >> ...
> >> 
> >> So I am able to see effect of jit_tuple_deforming, and very well, but
> >> only if optimization is active. When optimization is not active then
> >> jit_tuple_deforming does slowdown.
> >> 
> >> So maybe a usage of jit_tuple_deforming can be conditioned by
> >> jit_optimization?
> > 
> > Can you share the test case and some detail about the hardware and
> > PostgreSQL configuration?
> 
> I did very simple test
> 
> 
> 0.
> 
> master branch without asserts, shared buffer to 1GB
> tested on Lenovo T520 8GB RAM 8CPU, i7
> Fedora 28, gcc  CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"  --with-llvm
> 
> 1.
> 
> select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') ||
> ')' from generate_series(1,100) g(i) \gexec
> 
> 
> 2.
> 
> begin;
> select 'insert into wt values(' || (select
> string_agg((random()*1)::int::text,',') from generate_series(1,j - j +
> 100) g(i)) || ')' from generate_series(1,100) gg(j) \gexec
> insert into wt select * from wt;
> commit;
> 
> 3.
> 
> set max_paralel_workers to 0; -- the effect of JIT will be more visible
> 
> analyze wt;
> \timing
> 
> select sum(c99) from wt;
> 
> I tested some combination of:
> 
> jit: off on
> jit_inline_above_cost: 0, 10
> jit_optimize_above_cost: 0, 10
> jit_tuple_deforming: on, off
> 
> 
> My primitive tests shows nice possitive effect of jit_tuple_deforming if
> jit optimization is active. When jit optimization is not active, then
> jit_tuple_deforming did slowdown in my test.
> 
> So there is range of costs between 10 and 50 where
> jit_tuple_deforming didn't work well (without optimization)
> 
> I am limmited by small memory of my notebook - when I created table larger
> than 3GB, then I got IO waits on my crypted disc, and any effect of JIT was
> eliminated.
> 
> Regards
> 
> Pavel

Hi

I have studied this case a bit, and I think too that there is something wrong 
here.
Right now, jit_optimize is a -O3. It's really expensive, and triggering it 
here is not the right solution. In the attached patch, I force a -O1 for tuple 
deforming. With your test case, on my computer, the results are :
- no jit : 850ms
- jit with tuple deforming without optimizations : 1650 ms (1.5ms spent 
optimizing)
- jit without tuple deforming : 820ms (0.2ms)
- jit with tuple deforming with optimization (-O3) : 770ms (105ms)
- jit with tuple deforming with patch (-O1) : 725ms (54ms)

I will look at the generated code for tuple deforming, but I think we should 
pre-optimize the LLVM bytecode if we do not want to involve the LLVM 
optimization passes.

Regards

 Pierre
>From c2e70c8fbb7715283d3d53bdf5a70e4db18c99a9 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Mon, 2 Jul 2018 13:44:10 +0200
Subject: [PATCH] Introduce opt1 in LLVM/JIT, and force it with deforming

---
 src/backend/jit/llvm/llvmjit.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 5d0cdab1fc..025319e9c1 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -91,10 +91,12 @@ static const char *llvm_layout = NULL;
 
 
 static LLVMTargetMachineRef llvm_opt0_targetmachine;
+static LLVMTargetMachineRef llvm_opt1_targetmachine;
 static LLVMTargetMachineRef llvm_opt3_targetmachine;
 
 static LLVMTargetRef llvm_targetref;
 static LLVMOrcJITStackRef llvm_opt0_orc;
+static LLVMOrcJITStackRef llvm_opt1_orc;
 static LLVMOrcJITStackRef llvm_opt3_orc;
 
 
@@ -277,6 +279,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 #if LLVM_VERSION_MAJOR < 5
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, funcname)))
 		return (void *) (uintptr_t) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt1_orc, funcname)))
+		return (void *) (uintptr_t) addr;
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, funcname)))
 		return (void *) (uintptr_t) addr;
 #else
@@ -284,6 +288,10 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
 		return (void *) (uintptr_t) addr;
+	if (LLVMOrcGetSymbolAddress(llvm_opt1_orc, &addr, funcname))
+		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
+	if (addr)
+		return (void *) (uintptr_t) addr;
 	if (LLVMOrcGetSymbolAddress(llvm_opt3_orc, &addr, funcname))
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
@@ -420,6 +428,8 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 
 	if (context->base.flags & PGJIT_OPT3)
 		compile_optlevel = 3;
+	else if (context->base.flags & PGJIT_DEFORM)
+		compile_optlevel = 1;
 	else
 		compile_optlevel = 0;
 
@@ -491,6 +501,8 @@ llvm_compile_module(LLVMJitContext *context)
 
 	if (context->base.flags & PGJIT_OPT3)
 		compile_or

Re: Concurrency bug in UPDATE of partition-key

2018-07-02 Thread Amit Khandekar
On 30 June 2018 at 19:20, Amit Kapila  wrote:
> On Fri, Jun 29, 2018 at 11:22 PM, Alvaro Herrera  
> wrote:
>> I was a bit surprised by the new epqslot output argument being added,
>> and now I think I know why: we already have es_trig_tuple_slot, so
>> shouldn't we be using that here instead?  Seems like it'd all be simpler ...

es_trig_tuple_slot is already allocated in ExecInitModifyTable(). And
the slot returned by EvalPlanQual is a separately allocated tuple
slot. I didn't get how we can assign the epqslot to
estate->es_trig_tuple_slot. That would mean throwing away the already
allocated es_trig_tuple_slot. I believe the es_trig_tuple_slot
variable is not meant to have an already-allocated slot assigned to it.

>>
>
> Hmm, maybe, but not sure if it will be simpler.  The point is that we
> don't need to always return the epqslot, it will only be returned for
> the special case, so you might need to use an additional boolean
> variable to indicate when to fill the epqslot or someway indicate the
> same via es_trig_tuple_slot.  I think even if we somehow do that, we
> need to do something additional like taking tuple from epqslot and
> store it in es_trig_tuple_slot as I am not sure if we can directly
> assign the slot returned by EvalPlanQual to es_trig_tuple_slot.

Right, I think making use of es_trig_tuple_slot will cause redundancy
in our case, because epqslot is a separately allocated slot; so it
makes sense to pass it back separately.

> OTOH, the approach used by Amit Khandekar seems somewhat better as you
> can directly return the slot returned by EvalPlanQual in the output
> parameter.  IIUC, the same technique is already used by
> GetTupleForTrigger where it returns the epqslot in an additional
> parameter.
>



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Test-cases for deferred constraints in plpgsql_transaction.sql

2018-07-02 Thread Ashutosh Sharma
On Mon, Jul 2, 2018 at 3:30 PM, Peter Eisentraut
 wrote:
> On 02.07.18 11:46, Ashutosh Sharma wrote:
>> Currently, I could see only one test-case for deferred constraints in
>> plpgsql_transaction.sql file which tests if the constraint checking is
>> happening during commit time or not with the help of COMMIT statement.
>> Shouldn't we add some more test-cases to test ROLLBACK and SET
>> CONSTRAINTS statements with deferrable constraints inside DO blocks.
>
> The purpose of that test is to check what happens when the COMMIT
> command fails.  Using deferrable constraints is just a way to trigger an
> error coming from the COMMIT command.
>
> I won't say we don't need more tests, but I don't see a particular
> testing gap in this area.
>

I am not saying that the existing test-case is not enough to test
deferred constraints but, it would have been good to add test-cases
having ROLLBACK statements with deferred constraints as well.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: How to find the base version cf-bot is using?

2018-07-02 Thread Thomas Munro
On Mon, Jul 2, 2018 at 9:39 PM, Kyotaro HORIGUCHI
 wrote:
>> === applying patch 
>> ./v4-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch
> ...
>> |Date: Tue, 26 Dec 2017 17:43:09 +0900
> ...
>> Patching file src/backend/utils/cache/plancache.c using Plan A...
>> Hunk #1 failed at 63.

That's complaining about patch 0004 in your patch set.  The first 3
apply OK, but it looks like maybe 0004 includes stuff that was already
in 0001?  cfbot tries to apply them all.  I just tried it on HEAD
myself:

$ patch -p1 < v4-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch
patching file doc/src/sgml/config.sgml
patching file src/backend/access/transam/xact.c
patching file src/backend/utils/cache/catcache.c
patching file src/backend/utils/cache/plancache.c
patching file src/backend/utils/misc/guc.c
patching file src/backend/utils/misc/postgresql.conf.sample
patching file src/include/utils/catcache.h
patching file src/include/utils/plancache.h
$ patch -p1 < v4-0002-introduce-dynhash-pruning.patch
patching file src/backend/utils/hash/dynahash.c
patching file src/include/utils/catcache.h
patching file src/include/utils/hsearch.h
$ patch -p1 < v4-0003-Apply-purning-to-relcache.patch
patching file src/backend/utils/cache/relcache.c
$ patch -p1 < v4-0004-Generic-plan-removal-of-PlanCacheSource.patch
patching file src/backend/utils/cache/plancache.c
Reversed (or previously applied) patch detected!  Assume -R? [n] ^C

Let's see, 0001 and 0004 both include this hunk:

--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -63,12 +63,14 @@
 #include "storage/lmgr.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/inval.h"

--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -63,12 +63,14 @@
 #include "storage/lmgr.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/inval.h"

> On the other hand the same patch cleanly applies on the master
> HEAD with no offsets.
>
>> $ patch -p1 < 
>> ~/work/patches/v4-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch
>> patching file doc/src/sgml/config.sgml
>> patching file src/backend/access/transam/xact.c
>> patching file src/backend/utils/cache/catcache.c
>> patching file src/backend/utils/cache/plancache.c
>> patching file src/backend/utils/misc/guc.c
>> patching file src/backend/utils/misc/postgresql.conf.sample
>> patching file src/include/utils/catcache.h
>> patching file src/include/utils/plancache.h

Here you show only patch 0001.

> How can I find the base version the CF-bot used at the time?

Yeah.  Sorry.  I should include that information in the log to make it
clearer what it's doing.  I will do that in the next update.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: psql \df option for procedures

2018-07-02 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 Peter> psql's \df command current has options a/n/t/w to show
 Peter> aggregates/normal/trigger/window functions. Do we want to add
 Peter> something for procedures?

yes

-- 
Andrew (irc:RhodiumToad)



psql \df option for procedures

2018-07-02 Thread Peter Eisentraut
psql's \df command current has options a/n/t/w to show
aggregates/normal/trigger/window functions.  Do we want to add something
for procedures?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should contrib modules install .h files?

2018-07-02 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 >> So I have this immediate problem: a PGXS build of a module,
 >> specifically an hstore transform for a non-core PL, is much harder
 >> than it should be because it has no way to get at hstore.h since
 >> that file is never installed anywhere.
 >> 
 >> Should that be changed?

 Peter> Yes, just install it, I think.

I'm thinking that $(includedir_server)/$(MODULEDIR) would be a
reasonable place? MODULEDIR defaults to either "contrib" or "extension"
depending on whether EXTENSION is set.

Something like the attached patch seem reasonable?

-- 
Andrew (irc:RhodiumToad)

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index ab7fef3979..470474b062 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -11,6 +11,8 @@ DATA = hstore--1.4.sql hstore--1.4--1.5.sql \
 	hstore--unpackaged--1.0.sql
 PGFILEDESC = "hstore - key/value pair data type"
 
+INCLUDES = hstore.h
+
 REGRESS = hstore
 
 ifdef USE_PGXS
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bfae02f90c..f6662be2b8 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -38,6 +38,7 @@
 #   SCRIPTS -- script files (not binaries) to install into $PREFIX/bin
 #   SCRIPTS_built -- script files (not binaries) to install into $PREFIX/bin,
 # which need to be built first
+#   INCLUDES -- files to install into $PREFIX/include/server/$MODULEDIR
 #   REGRESS -- list of regression test cases (without suffix)
 #   REGRESS_OPTS -- additional switches to pass to pg_regress
 #   NO_INSTALLCHECK -- don't define an installcheck target, useful e.g. if
@@ -94,13 +95,16 @@ endif
 ifdef MODULEDIR
 datamoduledir := $(MODULEDIR)
 docmoduledir := $(MODULEDIR)
+incmoduledir := $(MODULEDIR)
 else
 ifdef EXTENSION
 datamoduledir := extension
 docmoduledir := extension
+incmoduledir := extension
 else
 datamoduledir := contrib
 docmoduledir := contrib
+incmoduledir := contrib
 endif
 endif
 
@@ -154,6 +158,9 @@ endif # SCRIPTS
 ifdef SCRIPTS_built
 	$(INSTALL_SCRIPT) $(SCRIPTS_built) '$(DESTDIR)$(bindir)/'
 endif # SCRIPTS_built
+ifdef INCLUDES
+	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(INCLUDES)) '$(DESTDIR)$(includedir_server)/$(incmoduledir)/'
+endif # INCLUDES
 ifdef MODULE_big
 ifeq ($(with_llvm), yes)
 	$(call install_llvm_module,$(MODULE_big),$(OBJS))
@@ -184,6 +191,9 @@ endif # DOCS
 ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built))
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
 endif
+ifdef INCLUDES
+	$(MKDIR_P) '$(DESTDIR)$(includedir_server)/$(incmoduledir)'
+endif
 
 ifdef MODULE_big
 installdirs: installdirs-lib
@@ -218,6 +228,9 @@ endif
 ifdef SCRIPTS_built
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, $(SCRIPTS_built))
 endif
+ifdef INCLUDES
+	rm -f $(addprefix '$(DESTDIR)$(includedir_server)/$(incmoduledir)'/, $(INCLUDES))
+endif
 
 ifdef MODULE_big
 ifeq ($(with_llvm), yes)


Re: Test-cases for deferred constraints in plpgsql_transaction.sql

2018-07-02 Thread Peter Eisentraut
On 02.07.18 11:46, Ashutosh Sharma wrote:
> Currently, I could see only one test-case for deferred constraints in
> plpgsql_transaction.sql file which tests if the constraint checking is
> happening during commit time or not with the help of COMMIT statement.
> Shouldn't we add some more test-cases to test ROLLBACK and SET
> CONSTRAINTS statements with deferrable constraints inside DO blocks.

The purpose of that test is to check what happens when the COMMIT
command fails.  Using deferrable constraints is just a way to trigger an
error coming from the COMMIT command.

I won't say we don't need more tests, but I don't see a particular
testing gap in this area.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Global shared meta cache

2018-07-02 Thread Konstantin Knizhnik




On 26.06.2018 09:48, Ideriha, Takeshi wrote:

Hi, hackers!

My customer created hundreds of thousands of partition tables and selected
data from them via hundreds of applications, which resulted in enormous
memory consumption because the total is the number of backends multiplied by
the per-backend local cache size (e.g. 100 backends x 1GB = 100GB).
Relation caches are loaded into each backend's local memory.

To address this issue I'm trying to move meta caches like catcache or relcache 
into shared memory.

This topic seems to have been discussed several times.
For instance this thread:
https://www.postgresql.org/message-id/CA%2BTgmobjDw_SWsxyJwT9z-YOwWv0ietuQx5fb%3DWEYdDfvCbzGQ%40mail.gmail.com

In my understanding, it discussed moving catcache and relcache to shared memory
rather than the current local backend memory,
and was mostly concerned with performance overhead.

Robert Haas wrote:

I think it would be interested for somebody to build a prototype here
that ignores all the problems but the first and uses some
straightforward, relatively unoptimized locking strategy for the first
problem. Then benchmark it. If the results show that the idea has
legs, then we can try to figure out what a real implementation would
look like.
(One possible approach: use Thomas Munro's DHT stuff to build the shared cache.)

Inspired by this comment, I'm now developing a prototype (please see
attached),
but I haven't yet put the cache structure in shared memory.
Instead, I put dummy data in shared memory, initialized at startup,
and acquire/release a lock just before/after searching for or creating a
catcache entry.
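
As a rough picture of that pattern (hypothetical names only;
"SharedCatCacheLock" and the wrapper function below are invented for
illustration and are not necessarily how the attached prototype structures it):

    /*
     * Illustration: wrap the existing catcache lookup in a lock that
     * protects a shared-memory cache.  Creating a new entry would need
     * LW_EXCLUSIVE rather than LW_SHARED.
     */
    static HeapTuple
    shared_catcache_search(CatCache *cache, Datum v1, Datum v2,
                           Datum v3, Datum v4)
    {
        HeapTuple   tuple;

        LWLockAcquire(SharedCatCacheLock, LW_SHARED);
        tuple = SearchCatCache(cache, v1, v2, v3, v4);
        LWLockRelease(SharedCatCacheLock);

        return tuple;
    }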

I haven't considered relcache and the catcache lists either.
It is difficult for me to do everything at once in the right direction,
so I'm starting with a small prototype to check whether I'm walking on the
proper path.

I tested pgbench to compare master branch with my patch.

0) Environment
- RHEL 7.4
- 16 cores
- 128 GB memory

1) Initialized with pgbench -i -s10

2) Benchmarked 3 times for each condition and took the average TPS result.

                                | master branch | prototype | proto/master (%)
--------------------------------+---------------+-----------+-----------------
pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%

This result seems to show that, except for the prepared protocol with a
"not only SELECT" workload, it didn't make much difference.



What do you think about it?
Before I dig deeper, I want to hear your thoughts.

Best regards,
Takeshi Ideriha



Hi,
I really think that we need to move to global caches (and especially  
catalog caches) in Postgres.
Modern NUMA servers may have hundreds of cores and to be able to utilize  
all of them, we may need to start large number (hundreds) of backends.

Memory overhead of local cache multiplied by 1000 can be quite significant.

But I am not sure that just using an RW lock will be enough to replace the
local cache with a global one.

I am quite skeptical concerning the performance results you have provided.
Once the dataset completely fits in memory (which is true in your case),
select-only pgbench with prepared statements should be about two times faster
than without prepared statements. And in your case performance with
prepared statements is even worse.


I wonder if you have repeated each measurement multiple times, to make
sure that it is not just a fluctuation.
Also, which postgresql configuration did you use? If it is the default
postgresql.conf with a 128MB shared buffers size,
then you are measuring disk access time and the catalog cache is not
relevant for performance in this case.


Below are the results I got with pgbench scale 100 (with scale 10 the results
are slightly better) on my desktop with just 16GB of RAM and 4 cores:


                                | master branch | prototype | proto/master (%)
--------------------------------+---------------+-----------+-----------------
pgbench -c10 -T60 -Msimple -S   | 187189        | 182123    | 97%
pgbench -c10 -T60 -Msimple      | 15495         | 15112     | 97%
pgbench -c10 -T60 -Mprepared -S | 98273         | 92810     | 94%
pgbench -c10 -T60 -Mprepared    | 25796         | 25169     | 97%

As you see, there are no surprises here: the negative effect of the shared
cache is largest for non-prepared selects
(because selects themselves are much faster than updates, and during
compilation we have to access relations multiple times).











--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-02 Thread Etsuro Fujita

(2018/06/22 23:58), Robert Haas wrote:

I think this approach is going to run into trouble if the level at
which we have to apply the ConvertRowTypeExpr happens not to be a
projection-capable node.


Actually, the level at which we have to do that would be a child rel of a 
partitioned table or a child join of a partitionwise join, so the plan 
node would be a scan or join plan unless the child rel or child join is 
itself partitioned, in which case the plan node would be 
Append/MergeAppend (and the proposed patch would recursively apply that 
conversion to the child plans of the Append/MergeAppend).



And, in general, it seems to me that we want
to produce the right outputs at the lowest possible level of the plan
tree.  For instance, suppose that one of the relations being scanned
is not parallel-safe but the others are.  Then, we could end up with a
plan like this:

Append
->  Seq Scan on temp_rela
->  Gather
   ->  Parallel Seq Scan on thing1
->  Gather
   ->  Parallel Seq Scan on thing2

If a projection is required to convert the row type expression, we
certainly want that to get pushed below the Gather, not to happen at
the Gather level itself.


IIUC, we currently don't consider such a plan for partition-wise join; 
we'll only consider gathering partial paths for the parent appendrel. 
So, I'm not sure we need to take that into account when applying the 
ConvertRowtypeExpr.  Maybe I'm missing something, though.



We certainly don't want it to happen at the
Append level, which can't even handle it.


I think so too.

Best regards,
Etsuro Fujita


