Re: On login trigger: take three

2020-12-09 Thread Pavel Stehule
On Wed, Dec 9, 2020 at 2:28 PM Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:

>
>
> On 09.12.2020 15:34, Pavel Stehule wrote:
>
> Hi
>
> On Wed, Dec 9, 2020 at 1:17 PM Greg Nancarrow wrote:
>
>> On Tue, Dec 8, 2020 at 3:26 PM Pavel Stehule 
>> wrote:
>> >
>> >
>> > There are maybe two generic questions:
>> >
>> > 1. Maybe we can introduce a more generic GUC for all event triggers, like
>> > disable_event_triggers? This GUC could be checked only by the database owner
>> > or a superuser. It would be an alternative to ALTER TABLE DISABLE TRIGGER ALL,
>> > and a protection against having to restart in single-user mode to repair a
>> > broken event trigger. I think a more generic solution is better than a
>> > special disable_client_connection_trigger GUC.
>> >
>> > 2. I have no objection to client_connection. It is probably better for the
>> > mentioned purpose - the possibility to block connections to a database. It
>> > could be interesting, and I am not sure how much work it is, to introduce a
>> > second event - session_start. This event should be fired after connecting -
>> > so an exception there doesn't block the connection - and also after the new
>> > statement "DISCARD SESSION", which would be run automatically after DISCARD
>> > ALL.  This feature need not be implemented in the first step, but it can be
>> > a plan for supporting pooled connections.
>> >
>>
>> I've created a separate patch to address question (1), rather than
>> include it in the main patch, which I've adjusted accordingly. I'll
>> leave question (2) until another time, as you suggest.
>> See the attached patches.
>>
>
> I see two possible questions:
>
> 1. When you introduce this event, is the new hook then useless?
>
> 2. What is the performance impact for users who do not want to use this
> feature? What is the overhead of EventTriggerOnConnect, and is it possible
> to skip this step if the database does not have any event trigger?
>
>
> As far as I understand, these are questions for me rather than for Greg.
> 1. Do you mean client_connection_hook? It is used to implement this new
> event type. It can also be used for other purposes.
>

OK. I don't like it, but there are already hooks that are redundant with event
triggers.


> 2. Connection overhead is quite large. Supporting the on-connect hook requires
> a traversal of the event trigger relation, but this overhead is negligible
> compared with the overhead of establishing a connection. In any case I did the
> following test (with a local connection):
>
> pgbench -C -S -T 100 -P 1 -M prepared postgres
>
> without this patch:
> tps = 424.287889 (including connections establishing)
> tps = 952.911068 (excluding connections establishing)
>
> with this patch (but without any connection trigger defined):
> tps = 434.642947 (including connections establishing)
> tps = 995.525383 (excluding connections establishing)
>
> As you can see, there is almost no difference (the patched version is even
> faster, but that seems to be just "white noise").
>

This is probably not the worst case. In this patch, StartTransactionCommand is
executed on every connection, although it is not necessary - and for most use
cases it will not be used.

I did more tests - see the attachments - and I see a 5% slowdown. I don't think
that is acceptable for this feature. The feature is nice, and for some users
important, but really only a few users will use it.

┌────────────────┬─────────┬────────────┬─────────────┐
│      test      │ WITH LT │ LT ENABLED │ LT DISABLED │
╞════════════════╪═════════╪════════════╪═════════════╡
│ ro_constant_10 │     539 │        877 │         905 │
│ ro_index_10    │     562 │        808 │         855 │
│ ro_constant_50 │     527 │        843 │         863 │
│ ro_index_50    │     544 │        731 │         742 │
└────────────────┴─────────┴────────────┴─────────────┘
(4 rows)

I tested the performance of the trigger (results in the first column of the table):

CREATE OR REPLACE FUNCTION public.foo()
 RETURNS event_trigger
 LANGUAGE plpgsql
AS $function$
begin
  if not pg_has_role(session_user, 'postgres', 'member') then
raise exception 'you are not super user';
  end if;
end;
$function$;
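
For context, a minimal sketch of how such a trigger would be attached, assuming
the client_connection event name proposed in this thread (the trigger name is
made up):

CREATE EVENT TRIGGER block_non_admin_connection
  ON client_connection
  EXECUTE FUNCTION public.foo();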

There is a snapshot available in InitPostgres, and then it is possible to check
whether some connect event trigger exists for the current database. This can
reduce the overhead of this patch when there are no logon triggers.
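
Expressed in SQL, the check I have in mind is roughly the following (the C code
would do the equivalent catalog lookup; the event name client_connection is the
one proposed in this patch):

SELECT EXISTS (SELECT 1
                 FROM pg_event_trigger
                WHERE evtevent = 'client_connection'
                  AND evtenabled <> 'D');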

I think the implemented and used names are OK, but for this feature the
performance impact should really be very minimal.

There is another small issue - missing tab-completion support for the CREATE
TRIGGER statement in psql.

Regards

Pavel


> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


test-ro-2.sql
Description: application/sql


data
Description: Binary data


test-ro.sql
Description: application/sql


Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Greg Nancarrow
On Thu, Dec 10, 2020 at 1:23 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Greg Nancarrow 
> > Firstly, in order to perform parallel-safety checks in the case of 
> > partitions, the
> > patch currently recursively locks/unlocks
> > (AccessShareLock) each partition during such checks (as each partition may
> > itself be a partitioned table). Is there a better way of performing the
> > parallel-safety checks and reducing the locking requirements?
>
> First of all, as you demonstrated the planning time and execution time of
> parallel insert, I think the increased planning time is negligible when
> parallel insert is intentionally used for loading a large amount of data.
> However, it's a problem if the overhead is imposed on OLTP transactions.
> Does the overhead occur with the default values of
> max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
>
> To avoid this heavy checking during planning, I'm wondering if we can have an
> attribute in pg_class, something like relhasindex and relhastriggers.  The
> concerning point is that we have to maintain the accuracy of the value when
> dropping ancillary objects around the table/partition.
>

Having information in another table that needs to be accessed is
likely to also have locking requirements.
Here the issue is specifically with partitions, because otherwise if
the target relation is not a partitioned table, it will already be
locked prior to planning as part of the parse/re-write phase (and you
will notice that the initial lock-mode, used by the parallel-safety
checking code for opening the table, is NoLock).

>
> > Secondly, I found that when running "make check-world", the
> > "partition-concurrent-attach" test fails, because it is expecting a 
> > partition
> > constraint to be violated on insert, while an "alter table attach partition 
> > ..." is
> > concurrently being executed in another transaction. Because of the partition
> > locking done by the patch's parallel-safety checking code, the insert 
> > blocks on
> > the exclusive lock held by the "alter table" in the other transaction until 
> > the
> > transaction ends, so the insert ends up successfully completing (and thus 
> > fails
> > the test) when the other transaction ends. To overcome this test failure, 
> > the
> > patch code was updated to instead perform a conditional lock on the 
> > partition,
> > and on failure (i.e. because of an exclusive lock held somewhere else), just
> > assume it's parallel-unsafe because the parallel-safety can't be determined
> > without blocking on the lock. This is not ideal, but I'm not sure of what 
> > other
> > approach could be used and I am somewhat reluctant to change that test. If
> > anybody is familiar with the "partition-concurrent-attach" test, any ideas 
> > or
> > insights would be appreciated.
>
> That test looks sane.  I think what we should do is to disable parallel
> operation during that test.  It looks like some of the other existing test
> cases disable parallel query by setting max_parallel_workers_per_gather to 0.
> It's not strange that some tests fail with some configurations.  autovacuum is
> disabled in many places in the regression tests.
>
> Rather, I don't think we should introduce the trick to use 
> ConditionalLockAcquire().  Otherwise, the insert would be executed in a 
> serial fashion without the user knowing it -- "What?  The insert suddenly 
> slowed down multiple times today, and it didn't finish within the planned 
> maintenance window.  What's wrong?"
>
>

I think that's probably the best idea, to disable parallel operation
during that test.
However, that doesn't change the fact that, after removal of that
"trick", the partition locking used in the parallel-safety
checking code will block if a concurrent transaction has exclusively
locked that partition (as in this test case), and thus there is no
guarantee that a parallel insert will execute faster compared to
serial execution (as such locks tend to be held until the end of the
transaction).
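
Coming back to the test change itself, forcing serial execution there would
just mean something like the following in the spec's setup (a sketch, following
what other regression tests already do; exact placement in the isolation spec
is a judgment call):

SET max_parallel_workers_per_gather = 0;
-- then run the concurrent attach / insert steps as before, now serial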

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Asynchronous Append on postgres_fdw nodes.

2020-12-09 Thread Andrey V. Lepikhov

On 11/17/20 2:56 PM, Etsuro Fujita wrote:
> On Mon, Oct 5, 2020 at 3:35 PM Etsuro Fujita wrote:
> Comments welcome!  The attached is still WIP and maybe I'm missing
> something, though.

I reviewed your patch and used it in my TPC-H benchmarks. It is still
WIP. Will you improve this patch?


I also want to say that, in my opinion, Horiguchi-san's version seems
preferable: it is more structured, simpler to understand, executor-native,
and reduces the FDW interface changes. This code really only needs
one procedure - IsForeignPathAsyncCapable.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Dilip Kumar
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

 /*
+ * PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+    Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
+
+    if (IsModifySupportedInParallelMode(commandType))
+    {
+        /*
+         * Prepare for entering parallel mode by assigning a
+         * FullTransactionId, to be included in the transaction state that is
+         * serialized in the parallel DSM.
+         */
+        (void) GetCurrentTransactionId();
+    }
+}

Why do we need to serialize the transaction ID for 0001?  I mean, in
0001 we are just allowing the SELECT to be executed in parallel, so why
would we need the transaction ID for that?  I agree that we would need
this once we try to perform the insert from the workers as well, in the
remaining patches.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2020-12-09 Thread Dilip Kumar
On Mon, Nov 30, 2020 at 2:40 PM Dilip Kumar  wrote:
>
> On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA  wrote:
>
> Thanks for looking into this.
>
> > On Thu, 22 Oct 2020 20:36:48 +0530
> > Dilip Kumar  wrote:
> >
> > > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar  wrote:
> > > >
> > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > > >  wrote:
> > > > >
> > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas 
> > > > >  wrote in
> > > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar  
> > > > > > wrote:
> > > > > > > One idea could be, if the recovery process is waiting for WAL and 
> > > > > > > a
> > > > > > > recovery pause is requested then we can assume that the recovery 
> > > > > > > is
> > > > > > > paused because before processing the next wal it will always check
> > > > > > > whether the recovery pause is requested or not.
> > > > > ..
> > > > > > However, it might be better to implement this by having the system
> > > > > > absorb the pause immediately when it's in this state, rather than
> > > > > > trying to detect this state and treat it specially.
> > > > >
> > > > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > > > column.
> > > >
> > > > Right
> > > >
> > > > To make them consistent, we need to call recoveryPausesHere()
> > > > > at the end of WaitForWALToBecomeAvailable() and let
> > > > > pg_wal_replay_pause() call WakeupRecovery().
> > > > >
> > > > > I think we don't need a separate function to find the state.
> > > >
> > > > The idea makes sense to me.  I will try to change the patch as per the
> > > > suggestion.
> > >
> > > Here is the patch based on this idea.
> >
> > I reviewed this patch.
> >
> > First, I made a recovery conflict situation using a table lock.
> >
> > Standby:
> > #= begin;
> > #= select * from t;
> >
> > Primary:
> > #= begin;
> > #= lock t in ;
> >
> > After this, WAL of the table lock cannot be replayed due to a lock acquired
> > in the standby.
> >
> > Second, during the delay, I executed pg_wal_replay_pause() and
> > pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
> > max_standby_streaming_delay was expired, and eventually returned true.
> >
> > I can also see the same behaviour by setting recovery_min_apply_delay.
> >
> > So, pg_is_wal_replay_paused waits for recovery to get paused and this works
> > successfully as expected.
> >
> > However, I wonder whether users expect pg_is_wal_replay_paused to wait.
> > Especially, if max_standby_streaming_delay is -1, this will be blocked
> > forever, although this setting may not be usual. In addition, some users
> > may set recovery_min_apply_delay to a large value.  If such users call
> > pg_is_wal_replay_paused, it could wait for a long time.
> >
> > At least, I think we need some description in the documentation to explain
> > that pg_is_wal_replay_paused could wait for a while.
>
> Ok

Fixed this, added some comments in .sgml as well as in function header

> > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > control whether this waits for recovery to get paused or not? By setting its
> > default value to true or false, users can use the old format for calling 
> > this
> > and the backward compatibility can be maintained.
>
> So basically, if the wait_recovery_pause flag is false then we will
> immediately return true if the pause is requested?  I agree that it is
> good to have an API to know whether the recovery pause is requested or
> not, but I am not sure whether it is a good idea to make this API serve
> both purposes.  Anyone else have any thoughts on this?
>
> >
> > As another comment, while pg_is_wal_replay_paused is blocking, I can not 
> > cancel
> > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> >
> >
> > +   errhint("Recovery control functions can only be 
> > executed during recovery.")));
> >
> > There are a few tabs at the end of this line.
>
> I will fix.

Fixed this as well.
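
To restate the interface question above in SQL terms, a rough sketch (the
optional argument is only a proposal in this thread, not committed syntax):

SELECT pg_is_wal_replay_paused();       -- with the patch: may wait until
                                        -- recovery has actually paused
SELECT pg_is_wal_replay_paused(false);  -- proposed: return immediately,
                                        -- reporting only whether a pause
                                        -- has been requested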

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From a9f280e3b1bf87d18cf56771a3733c7576816191 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Thu, 10 Dec 2020 11:22:08 +0530
Subject: [PATCH v2] pg_is_wal_replay_paused will wait for recovery to pause

Currently, pg_is_wal_replay_paused just checks whether a recovery
pause is requested, but it doesn't actually wait for recovery
to get paused.  With this patch, if a recovery pause is not requested
the API returns false; otherwise it waits for recovery to get
paused.
---
 doc/src/sgml/func.sgml |  5 +++-
 src/backend/access/transam/xlog.c  | 48 +-
 src/backend/access/transam/xlogfuncs.c | 37 +++---
 src/include/access/xlog.h  | 11 +++-
 4 files changed, 90 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 

Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Greg Nancarrow
On Thu, Dec 10, 2020 at 3:50 PM vignesh C  wrote:
> Few comments:
> +    Node       *index_expr;
> +
> +    if (index_expr_item == NULL)    /* shouldn't happen */
> +        elog(ERROR, "too few entries in indexprs list");
> +
> +    index_expr = (Node *) lfirst(index_expr_item);
>
> We can change this elog to the following to maintain consistency:
>
> if (index_expr_item == NULL)    /* shouldn't happen */
> {
>     context->max_hazard = PROPARALLEL_UNSAFE;
>     return context->max_hazard;
> }
>

Thanks. I think you pointed out something similar to this before, but
somehow I must have missed updating this as well (I was just following
existing error handling for this case in the Postgres code).
I'll update it as you suggest, in the next version of the patch I post.

> static HeapTuple
> heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> CommandId cid, int options)
> {
> /*
> * To allow parallel inserts, we need to ensure that they are safe to be
> * performed in workers. We have the infrastructure to allow parallel
> * inserts in general except for the cases where inserts generate a new
> * CommandId (eg. inserts into a table having a foreign key column).
> */
> I felt we could remove the above comments or maybe rephrase it.
>

That is Amit's comment, and I'm reluctant to change it because it is
still applicable even after application of this patch.
Amit has previously suggested that I add an Assert here, to match the
comment (to replace the original Parallel-worker error-check that I
removed), so I am looking into that.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread vignesh C
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

Thanks for fixing and posting a new patch.
Few comments:
+    Node       *index_expr;
+
+    if (index_expr_item == NULL)    /* shouldn't happen */
+        elog(ERROR, "too few entries in indexprs list");
+
+    index_expr = (Node *) lfirst(index_expr_item);

We can change this elog to the following to maintain consistency:

if (index_expr_item == NULL)    /* shouldn't happen */
{
    context->max_hazard = PROPARALLEL_UNSAFE;
    return context->max_hazard;
}

static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
/*
* To allow parallel inserts, we need to ensure that they are safe to be
* performed in workers. We have the infrastructure to allow parallel
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
I felt we could remove the above comments or maybe rephrase it.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-09 Thread Pavel Stehule
On Wed, Dec 9, 2020 at 10:59 PM, Tom Lane wrote:

> Here's a couple of little finger exercises to move this along a bit.
>
> 0001 adds the ability to attach a subscript handler to an existing
> data type with ALTER TYPE.  This is clearly going to be necessary
> if we want extension types to be able to use this facility.  The
> only thing that I think might be controversial here is that I did
> not add the ability to set pg_type.typelem.  While that'd be easy
> enough so far as ALTER TYPE is concerned, I'm not sure that we want
> to encourage people to change it.  The dependency rules mean that
> the semantics of typelem aren't something you really want to change
> after-the-fact on an existing type.  Also, if we did allow it, any
> existing SubscriptingRef.refelemtype values in stored views would
> fail to be updated.
>
> 0002 makes use of that to support subscripting of hstore.  I'm not
> sure how much we care about that from a functionality standpoint,
> but it seems like it might be good to have a contrib module testing
> that extensions can use this.  Also, I thought possibly an example
> showing what's basically the minimum possible amount of complexity
> would be good to have.  If people like this, I'll finish it up (it
> lacks docs) and add it.
>

+1, using subscripts for hstore is a nice idea.
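
A quick sketch of the kind of usage 0002 would enable (the exact syntax is
whatever the generic subscripting patch settles on; table t and column h are
hypothetical):

SELECT ('a=>1, b=>2'::hstore)['a'];        -- fetch a value by key
UPDATE t SET h['c'] = '3' WHERE id = 1;    -- assign through a subscript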

Pavel



> regards, tom lane
>
>


Re: [Patch] Optimize dropping of relation buffers using dlist

2020-12-09 Thread Amit Kapila
On Thu, Dec 10, 2020 at 7:11 AM Kyotaro Horiguchi
 wrote:
>
> At Wed, 9 Dec 2020 16:27:30 +0530, Amit Kapila  
> wrote in
> > On Wed, Dec 9, 2020 at 6:32 AM Kyotaro Horiguchi
> > > Mmm. At least btree doesn't need to call smgrnblocks except at
> > > expansion, so we cannot get to the optimized path in major cases of
> > > truncation involving btree (and/or maybe other indexes).
> > >
> >
> > AFAICS, btree insert should call smgrnblocks via
> > btree_xlog_insert->XLogReadBufferForRedo->XLogReadBufferForRedoExtended->XLogReadBufferExtended->smgrnblocks.
> > Similarly delete should also call smgrnblocks. Can you be bit more
> > specific related to the btree case you have in mind?
>
> Oh, sorry. I wrongly looked at the non-recovery path. smgrnblocks is
> called during buffer loading while in recovery. So, smgrnblocks is called
> for indexes if any update happens on the heap relation.
>

Okay, so this means that we can get the benefit of the optimization in
many cases in the Truncate code path as well, even if we use the 'cached'
flag? If so, then I would prefer to keep the code consistent for both the
vacuum and truncate recovery code paths.

-- 
With Regards,
Amit Kapila.




Re: Parallel Inserts in CREATE TABLE AS

2020-12-09 Thread Bharath Rupireddy
On Thu, Dec 10, 2020 at 7:48 AM Zhihong Yu  wrote:
> +   if (!OidIsValid(col->collOid) &&
> +   type_is_collatable(col->typeName->typeOid))
> +   ereport(ERROR,
> ...
> +   attrList = lappend(attrList, col);
>
> Should attrList be freed when ereport is called ?
>

I think that's not necessary since we are going to throw an error
anyway. Also, this is not new code added as part of this
feature; it is existing code adjusted for parallel inserts. On
looking further in the code base, there are many places where we don't
free up the lists before throwing errors.

errmsg("column privileges are only valid for relations")));
errmsg("check constraint \"%s\" already exists",
errmsg("name or argument lists may not contain nulls")));
elog(ERROR, "no tlist entry for key %d", keyresno);

> +   query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
>
> Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning 
> the value of 0 ?
>

Yeah, both are equivalent. For now I will keep it that way; I will
change it in the next version of the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Refactor MD5 implementations and switch to EVP for OpenSSL

2020-12-09 Thread Michael Paquier
On Wed, Dec 09, 2020 at 01:52:32PM +0100, Daniel Gustafsson wrote:
> The tiniest level of nitpicking would be that md5.h doesn't have the 
> /* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's
> not even the fault of this patch - I just happened to notice when looking
> at that file).

Good catch.  I have fixed this one, looked again at the code this
morning, did more tests on Linux/Windows with/without OpenSSL, and
finally committed the patch.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS

2020-12-09 Thread Craig Ringer
On Thu, 3 Dec 2020 at 15:53, Craig Ringer  wrote:

>
> On Tue, 27 Oct 2020 at 16:34, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>> On 2020-09-25 09:40, Craig Ringer wrote:
>> > While working on extensions I've often wanted to enable cache
>> clobbering
>> > for a targeted piece of code, without paying the price of running it
>> for
>> > all backends during postgres startup and any initial setup tasks that
>> > are required.
>> >
>> > So here's a patch that, when CLOBBER_CACHE_ALWAYS or
>> > CLOBBER_CACHE_RECURSIVE are defined, adds a GUC named
>> > clobber_cache_depth . It defaults to 1 for CLOBBER_CACHE_ALWAYS or 3
>> for
>> > CLOBBER_CACHE_RECURSIVE to match the existing compiled-in behaviour.
>> But
>> > with this change it's now possible to run Pg with clobber_cache_depth=0
>> > then set it to 1 only for targeted tests.
>>
>> I think it would be great if the cache clobber code is always compiled
>> in (but turned off) when building with assertions.  That would reduce the
>> number of hoops to jump through to actually use this locally and reduce
>> the number of surprises from the build farm.
>
>
>
Updated per your comments Peter, see mail immediately up-thread.

Moved to this CF as https://commitfest.postgresql.org/31/2749/ .
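
For anyone trying the patch, targeted use looks roughly like this (GUC name and
defaults as described up-thread; it requires a build with the cache-clobber
code compiled in):

SET clobber_cache_depth = 0;  -- leave setup and unrelated work at full speed
-- ... create extension objects, load test data ...
SET clobber_cache_depth = 1;  -- clobber only around the code under test
                              -- (3 corresponds to CLOBBER_CACHE_RECURSIVE)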


RE: [Patch] Optimize dropping of relation buffers using dlist

2020-12-09 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> Oh, sorry. I wrongly looked at the non-recovery path. smgrnblocks is
> called during buffer loading while in recovery. So, smgrnblocks is called
> for indexes if any update happens on the heap relation.

I misunderstood that you said there's no problem with the TOAST index because 
TRUNCATE creates the meta page, resulting in the caching of the page and size 
of the relation.  Anyway, I'm relieved the concern disappeared.

Then, I'd like to hear your vote in my previous mail...


Regards
Takayuki Tsunakawa





Re: Deleting older versions in unique indexes to avoid page splits

2020-12-09 Thread Zhihong Yu
Hi,
In v11-0001-Pass-down-logically-unchanged-index-hint.patch :

+   if (hasexpression)
+   return false;
+
+   return true;

The above can be written as 'return !hasexpression;'

For +index_unchanged_by_update_var_walker:

+ * Returns true when Var that appears within allUpdatedCols located.

the sentence seems incomplete.

Currently the return value of index_unchanged_by_update_var_walker() is the
reverse of index_unchanged_by_update().
Maybe it is easier to read the code if their return values have the same
meaning.

Cheers

On Wed, Dec 9, 2020 at 5:13 PM Peter Geoghegan  wrote:

> On Tue, Dec 1, 2020 at 2:18 PM Peter Geoghegan  wrote:
> > On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas 
> wrote:
> > > This is a wholly new concept with a lot of heuristics. It's a lot of
> > > swallow.
>
> Attached is v11, which cleans everything up around the tableam
> interface. There are only two patches in v11, since the tableam
> refactoring made it impossible to split the second patch into a heapam
> patch and nbtree patch (there is no reduction in functionality
> compared to v10).
>
> Most of the real changes in v11 (compared to v10) are in heapam.c.
> I've completely replaced the table_compute_xid_horizon_for_tuples()
> interface with a new interface that supports all existing requirements
> (from index deletions that support LP_DEAD deletion), while also
> supporting these new requirements (e.g. bottom-up index deletion). So
> now heap_compute_xid_horizon_for_tuples() becomes
> heap_compute_delete_for_tuples(), which has different arguments but
> the same overall structure. All of the new requirements can now be
> thought of as additive things that we happen to use for nbtree
> callers, that could easily also be used in other index AMs later on.
> This means that there is a lot less new code in heapam.c.
>
> Prefetching of heap blocks for the new bottom-up index deletion caller
> now works in the same way as it has worked in
> heap_compute_xid_horizon_for_tuples() since Postgres 12 (more or
> less). This is a significant improvement compared to my original
> approach.
>
> Changing heap_compute_xid_horizon_for_tuples() rather than adding a
> sibling function started to make sense when v10 of the patch taught
> regular nbtree LP_DEAD deletion (the thing that has been around since
> PostgreSQL 8.2) to add "extra" TIDs to check in passing, just in case
> we find that they're also deletable -- why not just have one function?
> It also means that hash indexes and GiST indexes now use the
> heap_compute_delete_for_tuples() function, though they get the same
> behavior as before. I imagine that it would be pretty straightforward
> to bring that same benefit to those other index AMs, since their
> implementations are already derived from the nbtree implementation of
> LP_DEAD deletion (and because adding extra TIDs to check in passing in
> these other index AMs should be fairly easy).
>
> > > > +} TM_IndexDelete;
> >
> > > > +} TM_IndexStatus;
> >
> > > Is it really significantly faster to have two arrays? If you had just
> > > one array, each element would be only 12 bytes long. That's not much
> > > smaller than TM_IndexDeletes, which is 8 bytes.
>
> I kept this facet of the design in v11, following some deliberation. I
> found that the TID sort operation appeared quite prominently in
> profiles, and so the optimizations mostly seemed to still make sense.
> I also kept one shellsort specialization. However, I removed the
> second specialized sort implementation, so at least there is only one
> specialization now (which is small anyway). I found that the second
> sort specialization (added to heapam.c in v10) really wasn't pulling
> its weight, even in more extreme cases of the kind that justified the
> optimizations in the first place.
>
> --
> Peter Geoghegan
>


RE: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread tsunakawa.ta...@fujitsu.com
From: Greg Nancarrow 
> Firstly, in order to perform parallel-safety checks in the case of 
> partitions, the
> patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition may
> itself be a partitioned table). Is there a better way of performing the
> parallel-safety checks and reducing the locking requirements?

First of all, as you demonstrated the planning time and execution time of
parallel insert, I think the increased planning time is negligible when
parallel insert is intentionally used for loading a large amount of data.
However, it's a problem if the overhead is imposed on OLTP transactions.  Does
the overhead occur with the default values of
max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?

To avoid this heavy checking during planning, I'm wondering if we can have an
attribute in pg_class, something like relhasindex and relhastriggers.  The
concerning point is that we have to maintain the accuracy of the value when
dropping ancillary objects around the table/partition.


> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a partition
> constraint to be violated on insert, while an "alter table attach partition 
> ..." is
> concurrently being executed in another transaction. Because of the partition
> locking done by the patch's parallel-safety checking code, the insert blocks 
> on
> the exclusive lock held by the "alter table" in the other transaction until 
> the
> transaction ends, so the insert ends up successfully completing (and thus 
> fails
> the test) when the other transaction ends. To overcome this test failure, the
> patch code was updated to instead perform a conditional lock on the partition,
> and on failure (i.e. because of an exclusive lock held somewhere else), just
> assume it's parallel-unsafe because the parallel-safety can't be determined
> without blocking on the lock. This is not ideal, but I'm not sure of what 
> other
> approach could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> insights would be appreciated.

That test looks sane.  I think what we should do is to disable parallel
operation during that test.  It looks like some of the other existing test
cases disable parallel query by setting max_parallel_workers_per_gather to 0.
It's not strange that some tests fail with some configurations.  autovacuum is
disabled in many places in the regression tests.

Rather, I don't think we should introduce the trick to use 
ConditionalLockAcquire().  Otherwise, the insert would be executed in a serial 
fashion without the user knowing it -- "What?  The insert suddenly slowed down 
multiple times today, and it didn't finish within the planned maintenance 
window.  What's wrong?"


Regards
Takayuki Tsunakawa



Re: Parallel Inserts in CREATE TABLE AS

2020-12-09 Thread Zhihong Yu
Hi,

+   if (!OidIsValid(col->collOid) &&
+   type_is_collatable(col->typeName->typeOid))
+   ereport(ERROR,
...
+   attrList = lappend(attrList, col);

Should attrList be freed when ereport is called ?

+   query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;

Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning
the value of 0 ?

Cheers

On Wed, Dec 9, 2020 at 5:43 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar  wrote:
> >
> > On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie 
> wrote:
> > >
> > > > > I'm not quite sure how to address this. Can we not allow the
> planner
> > > > > to consider that the select is for CTAS and check only after the
> > > > > planning is done for the Gather node and other checks?
> > > > >
> > > >
> > > > IIUC, you are saying that we should not influence the cost of gather
> node
> > > > even when the insertion would be done by workers? I think that
> should be
> > > > our fallback option anyway but that might miss some paths to be
> considered
> > > > parallel where the cost becomes more due to parallel_tuple_cost (aka
> tuple
> > > > transfer cost). I think the idea is we can avoid the tuple transfer
> cost
> > > > only when Gather is the top node because only at that time we can
> push
> > > > insertion down, right? How about if we have some way to detect the
> same
> > > > before calling generate_useful_gather_paths()? I think when we are
> calling
> > > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > > query_level is 1, it is for CTAS, and it doesn't have a chance to
> create
> > > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we
> can
> > > > probably assume that the Gather will be top_node. I am not sure
> about this
> > > > but I think it is worth exploring.
> > > >
> > >
> > > I took a look at the parallel insert patch and have the same idea.
> > > https://commitfest.postgresql.org/31/2844/
> > >
> > >  * Consider generating Gather or Gather Merge paths.  We must
> only do this
> > >  * if the relation is parallel safe, and we don't do it for
> child rels to
> > >  * avoid creating multiple Gather nodes within the same plan.
> We must do
> > >  * this after all paths have been generated and before
> set_cheapest, since
> > >  * one of the generated paths may turn out to be the cheapest
> one.
> > >  */
> > > if (rel->consider_parallel && !IS_OTHER_REL(rel))
> > > generate_useful_gather_paths(root, rel, false);
> > >
> > > IMO Gatherpath created here seems the right one which can possible
> ignore parallel cost if in CTAS.
> > > But We need check the following parse option which will create path to
> be the parent of Gatherpath here.
> > >
> > > if (root->parse->rowMarks)
> > > if (limit_needed(root->parse))
> > > if (root->parse->sortClause)
> > > if (root->parse->distinctClause)
> > > if (root->parse->hasWindowFuncs)
> > > if (root->parse->groupClause || root->parse->groupingSets ||
> root->parse->hasAggs || root->root->hasHavingQual)
> > >
> >
> > Yeah, and as I pointed earlier, along with this we also need to
> > consider that the RelOptInfo must be the final target(top level rel).
> >
>
> Attaching v10 patch set that includes the change suggested above for
> ignoring parallel tuple cost and also few more test cases. I split the
> patch as per Amit's suggestion. v10-0001 contains parallel inserts
> code without planner tuple cost changes and test cases. v10-0002 has
> required changes for ignoring planner tuple cost calculations.
>
> Please review it further.
>
> After the review and addressing all the comments, I plan to make some
> code common so that it can be used for Parallel Inserts in REFRESH
> MATERIALIZED VIEW. Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Setof RangeType returns

2020-12-09 Thread Patrick Handja
Hello,

After some days, I finally found what I was looking for.
Actually this worked:

> Oid rngtypid = get_fn_expr_rettype(fcinfo->flinfo);

> typcache = range_get_typcache(fcinfo, rngtypid);

Thanks for the help.

Andjasubu Bungama, Patrick



On Tue, Dec 1, 2020 at 5:39 PM, Tom Lane wrote:

> Patrick Handja  writes:
> > In my case, I do not have a range as an argument, I am receiving 2 int,
> > which I am using to create a range. How can I initialize typcache in this
> > case?
>
> You should be passing down the pointer from the outer function, which
> does have it at hand, no?
>
> regards, tom lane
>


Re: Get memory contexts of an arbitrary backend process

2020-12-09 Thread torikoshia

On 2020-12-04 19:16, torikoshia wrote:

On 2020-12-03 10:36, Tom Lane wrote:

Fujii Masao  writes:
I'm starting to study how this feature behaves. At first, when I executed
the following query, the function never returned. ISTM that since
the autovacuum launcher cannot respond to the request for a memory
contexts dump, the function keeps waiting indefinitely. Is this a bug?
Probably we should exclude non-backend processes from the target
processes to dump? Sorry if this was already discussed.


 SELECT pg_get_backend_memory_contexts(pid) FROM 
pg_stat_activity;


Thanks for trying it!

It was not discussed explicitly, and I was going to do it later
as commented.


+   /* TODO: Check also whether backend or not. */




FWIW, I think this patch is fundamentally unsafe.  It's got a
lot of the same problems that I complained about w.r.t. the
nearby proposal to allow cross-backend stack trace dumping.
It does avoid the trap of thinking that it can do work in
a signal handler, but instead it supposes that it can do
work involving very high-level objects such as shared hash tables
in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
never going to be safe: the only real expectation the system
has is that CHECK_FOR_INTERRUPTS is called at places where our
state is sane enough that a transaction abort can clean up.
Trying to do things like taking LWLocks is going to lead to
deadlocks or worse.  We need not even get into the hard questions,
such as what happens when one process or the other exits
unexpectedly.


Thanks for reviewing!

I may misunderstand something, but the dumper works not at
CHECK_FOR_INTERRUPTS but during the client read, i.e.,
ProcessClientReadInterrupt().

Is it also unsafe?


BTW, since there was a comment that the shared hash table
used too much memory, I'm now rewriting this patch not to use
the shared hash table but a simpler static shared memory struct.


Attached a rewritten patch.

Accordingly, I also slightly modified the basic design as below.

---
# Communication flow between the dumper and the requestor
- (1) When requesting a memory context dump, the requestor changes
the state in the shared memory struct from 'ACCEPTABLE' to
'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits on
the latch.
- (3) When the dumper notices the signal, it changes the state to
'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared
memory state to 'ACCEPTABLE'.

# Query cancellation
- When the requestor cancels dumping, e.g. signaling using ctrl-C,
the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it initializes the
dump file and reset the shared memory state to 'ACCEPTABLE'.

# Cleanup dump file and the shared memory
- In the normal case, the requestor removes the dump file and resets
the shared memory entry, as described in (5).
- When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words,
the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3)
and before (4), in other words, the state is 'DUMPING', the dumper
does the cleanup. Specifically, if the requestor cancels the query,
it just changes the state to 'CANCELING' and the dumper notices it
and cleans up things later.
OTOH, when the dumper fails to dump, it cleans up the dump file and
reset the shared memory state.
- When something happens on the requestor after (4), i.e., the state
is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or power failure, all dump files
are removed in the crash recovery process.
---





I also find the idea that this should be the same SQL function
as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine
only your own process's memory --- with this proposal, that means
granting the right to inspect every other process as well.

Beyond that, the fact that there's no way to restrict the capability
to just, say, other processes owned by the same user means that
it's not really safe to GRANT to non-superusers anyway.  Even with
such a restriction added, things are problematic, since for example
it would be possible to inquire into the workings of a
security-definer function executing in another process that
nominally is owned by your user.


I'm going to change the function name and restrict execution to
superusers. Is that enough?


In the attached patch, I changed the function name to
pg_get_target_backend_memory_contexts() for now.
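
A rough sketch of the intended call, using the renamed function (the pid value
is just an example; per the discussion above this would be superuser-only):

SELECT pg_get_target_backend_memory_contexts(12345);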


Regards,
From 120b2c33e2b295305dedc63921183c103e65ddc6 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: 

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-12-09 Thread Bharath Rupireddy
On Wed, Dec 9, 2020 at 4:49 PM Fujii Masao  wrote:
> I started reviewing 0001 patch.
>

Thanks!

> IMO the 0001 patch should be self-contained so that we can commit it at 
> first. That is, I think that it's better to move the documents and tests for 
> the functions 0001 patch adds from 0004 to 0001.
>

+1. I will make each patch self-contained in the next version which I
plan to submit soon.

> Since 0001 introduces new user-visible functions into postgres_fdw, the 
> version of postgres_fdw should be increased?
>

Yeah, it looks like we should do that; dblink did the same when it
introduced new functions. In case the new functions are not required,
users can choose to go back to 1.0.

Should we make the new version 1.1 or 2.0? I prefer 1.1, as we are just
adding a little functionality on top of 1.0. I will change the
default_version from 1.0 to 1.1 and add a new
postgres_fdw--1.1.sql file.

If okay, I will make changes to 0001 patch.
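
In that case, existing installations would pick up the new functions with the
usual extension upgrade step (version number as proposed above, not final):

ALTER EXTENSION postgres_fdw UPDATE TO '1.1';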

> The similar code to get the server name from cached connection entry exists 
> also in pgfdw_reject_incomplete_xact_state_change(). I'm tempted to make the 
> "common" function for that code and use it both in 
> postgres_fdw_get_connections() and 
> pgfdw_reject_incomplete_xact_state_change(), to simplify the code.
>

+1. I will move the server name finding code to a new function, say
char *pgfdw_get_server_name(ConnCacheEntry *entry);

> +   /* We only look for active and open remote 
> connections. */
> +   if (entry->invalidated || !entry->conn)
> +   continue;
>
> We should return even invalidated entry because it has still cached 
> connection?
>

I checked this point earlier, for invalidated connections, the tuple
returned from the cache is also invalid and the following error will
be thrown. So, we can not get the server name for that user mapping.
Cache entries too would have been invalidated after the connection is
marked as invalid in pgfdw_inval_callback().

umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
if (!HeapTupleIsValid(umaptup))
  elog(ERROR, "cache lookup failed for user mapping with OID %u",
entry->key);

Can we reload the sys cache entries of USERMAPPINGOID (if there is a
way) for invalid connections in our new function and then do a look
up? If not, another way could be storing the associated server name or
oid in the ConnCacheEntry. Currently we store user mapping oid(in
key), its hash value(in mapping_hashvalue) and foreign server oid's
hash value (in server_hashvalue). If we have the foreign server oid,
then we can just look up for the server name, but I'm not quite sure
whether we get the same issue i.e. invalid tuples when the entry gets
invalided (via pgfdw_inval_callback) due to some change in foreign
server options.

IMHO, we can simply choose to show all the active, valid connections. Thoughts?

> Also this makes me wonder if we should return both the server name and 
> boolean flag indicating whether it's invalidated or not. If so, users can 
> easily find the invalidated connection entry and disconnect it because there 
> is no need to keep invalidated connection.
>

Currently we are returning a list of foreign server names with which
there exist active connections. If we somehow address the above
mentioned problem for invalid connections and choose to show them as
well, then how should our output look? Is it something like we
prepare a list of pairs (servername, validflag)?
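
For illustration, one possible shape for that output (purely a sketch; the
column names are made up):

SELECT * FROM postgres_fdw_get_connections();
 server_name | valid
-------------+-------
 loopback1   | t
 loopback2   | f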

> +   if (all)
> +   {
> +   hash_destroy(ConnectionHash);
> +   ConnectionHash = NULL;
> +   result = true;
> +   }
>
> Could you tell me why ConnectionHash needs to be destroyed?
>

Say, in a session there are hundreds of different foreign server
connections made and if users want to disconnect all of them with the
new function and don't want any further foreign connections in that
session, they can do it. But then why keep the cache just lying around
and holding those many entries? Instead we can destroy the cache and
if necessary it will be allocated later on next foreign server
connections.

IMHO, it is better to destroy the cache in the disconnect-all case, hoping
to save memory and expecting that (next time, if required) the cache
allocation doesn't take much time. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-09 Thread Bharath Rupireddy
On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar  wrote:
>
> On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie  wrote:
> >
> > > > I'm not quite sure how to address this. Can we not allow the planner
> > > > to consider that the select is for CTAS and check only after the
> > > > planning is done for the Gather node and other checks?
> > > >
> > >
> > > IIUC, you are saying that we should not influence the cost of gather node
> > > even when the insertion would be done by workers? I think that should be
> > > our fallback option anyway but that might miss some paths to be considered
> > > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > > only when Gather is the top node because only at that time we can push
> > > insertion down, right? How about if we have some way to detect the same
> > > before calling generate_useful_gather_paths()? I think when we are calling
> > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > > probably assume that the Gather will be top_node. I am not sure about this
> > > but I think it is worth exploring.
> > >
> >
> > I took a look at the parallel insert patch and have the same idea.
> > https://commitfest.postgresql.org/31/2844/
> >
> >  * Consider generating Gather or Gather Merge paths.  We must only 
> > do this
> >  * if the relation is parallel safe, and we don't do it for child 
> > rels to
> >  * avoid creating multiple Gather nodes within the same plan. We 
> > must do
> >  * this after all paths have been generated and before 
> > set_cheapest, since
> >  * one of the generated paths may turn out to be the cheapest one.
> >  */
> > if (rel->consider_parallel && !IS_OTHER_REL(rel))
> > generate_useful_gather_paths(root, rel, false);
> >
> > IMO Gatherpath created here seems the right one which can possible ignore 
> > parallel cost if in CTAS.
> > But We need check the following parse option which will create path to be 
> > the parent of Gatherpath here.
> >
> > if (root->parse->rowMarks)
> > if (limit_needed(root->parse))
> > if (root->parse->sortClause)
> > if (root->parse->distinctClause)
> > if (root->parse->hasWindowFuncs)
> > if (root->parse->groupClause || root->parse->groupingSets || 
> > root->parse->hasAggs || root->root->hasHavingQual)
> >
>
> Yeah, and as I pointed earlier, along with this we also need to
> consider that the RelOptInfo must be the final target(top level rel).
>

Attaching v10 patch set that includes the change suggested above for
ignoring parallel tuple cost and also few more test cases. I split the
patch as per Amit's suggestion. v10-0001 contains parallel inserts
code without planner tuple cost changes and test cases. v10-0002 has
required changes for ignoring planner tuple cost calculations.

Please review it further.

After the review and addressing all the comments, I plan to make some
code common so that it can be used for Parallel Inserts in REFRESH
MATERIALIZED VIEW. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 2939b2c51bff3548ea15c9054f9e2ab6619b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 10 Dec 2020 05:38:35 +0530
Subject: [PATCH v10] Parallel Inserts in CREATE TABLE AS

The idea of this patch is to allow the leader and each worker to
insert tuples in parallel if the SELECT part of the CTAS is
parallelizable.

The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to the Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After planning,
check whether the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the INTO clause,
object id, and command id from the leader to the workers, so that each
worker can create its own CTAS dest receiver. The leader inserts its
share of tuples if instructed to do so, and so do the workers. Each
worker atomically writes its number of inserted tuples into a
shared memory variable; the leader combines this with its own
number of inserted tuples and reports the total to the client.
---
 src/backend/access/heap/heapam.c |  11 -
 src/backend/access/transam/xact.c|  30 +-
 src/backend/commands/createas.c  | 332 
 src/backend/commands/explain.c   |  32 ++
 src/backend/executor/execParallel.c  |  70 ++-
 src/backend/executor/nodeGather.c| 113 -
 src/backend/executor/nodeGatherMerge.c   |   4 +-
 src/include/access/xact.h|   1 +
 src/include/commands/createas.h  |  28 ++
 src/include/executor/execParallel.h  |   6 +-
 

Re: [Patch] Optimize dropping of relation buffers using dlist

2020-12-09 Thread Kyotaro Horiguchi
At Wed, 9 Dec 2020 16:27:30 +0530, Amit Kapila  wrote 
in 
> On Wed, Dec 9, 2020 at 6:32 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Tue, 8 Dec 2020 16:28:41 +0530, Amit Kapila  
> > wrote in
> > > On Tue, Dec 8, 2020 at 12:13 PM tsunakawa.ta...@fujitsu.com
> > >  wrote:
> > > >
> > > > From: Jamison, Kirk/ジャミソン カーク 
> > > > > Because one of the rel's cached value was false, it forced the
> > > > > full-scan path for TRUNCATE.
> > > > > Is there a possible workaround for this?
> > > >
> > > > Hmm, the other two relfilenodes are for the TOAST table and index of 
> > > > the target table.  I think the INSERT didn't access those TOAST 
> > > > relfilenodes because the inserted data was stored in the main storage.  
> > > > But TRUNCATE always truncates all the three relfilenodes.  So, the 
> > > > standby had not opened the relfilenode for the TOAST stuff or cached 
> > > > its size when replaying the TRUNCATE.
> > > >
> > > > I'm afraid this is more common than we can ignore and accept the slow 
> > > > traditional path, but I don't think of a good idea to use the cached 
> > > > flag.
> > > >
> > >
> > > I also can't think of a way to use an optimized path for such cases
> > > but I don't agree with your comment on if it is common enough that we
> > > leave this optimization entirely for the truncate path.
> >
> > Mmm. At least btree doesn't need to call smgrnblocks except at
> > expansion, so we cannot get to the optimized path in major cases of
> > truncation involving btree (and/or maybe other indexes).
> >
> 
> AFAICS, btree insert should call smgrnblocks via
> btree_xlog_insert->XLogReadBufferForRedo->XLogReadBufferForRedoExtended->XLogReadBufferExtended->smgrnblocks.
> Similarly delete should also call smgrnblocks. Can you be bit more
> specific related to the btree case you have in mind?

Oh, sorry. I wrongly looked at the non-recovery path. smgrnblocks is
called during buffer loading while in recovery. So, smgrnblocks is called
for indexes if any update happens on the heap relation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Proposed patch for key managment

2020-12-09 Thread Bruce Momjian
On Fri, Dec  4, 2020 at 10:32:45PM -0500, Bruce Momjian wrote:
> I can break out the -R/file descriptor passing part as a separate patch,
> and have the ssl_passphrase_command use that, but that's the only part I
> know can be useful on its own.
> 
> Since the patch is large, I found a way to push the branch to git and
> how to make a download link that tracks whatever I push to the 'key'
> branch on my github account.  Here is the updated patch link:
> 
>   https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

I have made some good progress on the patch.  I realized that pg_upgrade
can't just copy the keys from the old cluster --- they encrypt the user
heap/index files that are copied/linked by pg_upgrade, but also encrypt
the system tables, which initdb creates, so the keys have to be copied
at initdb bootstrap time --- I have added an option to do that.  I also
realized that pg_upgrade will be starting/stopping the server, so I need
to add an option to pg_upgrade to allow that prompting.  I can now
successfully pg_upgrade a cluster that uses cluster file encryption, and
keep the same keys.  All at the same URL.

In addition I have completed the command-line tool to allow changing of
the cluster passphrase, which applies over the first diff;  diff at:


https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

My next task is to write a script for Yubikey authentication.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-09 Thread tsunakawa.ta...@fujitsu.com
From: Bharath Rupireddy 
> I'm not sure how many more of such commands exist which require changes.
> 
> How about doing it this way?
> 
> 1) Have a separate common thread listing the commands and specifying
> clearly the agreed behaviour for such commands.
> 2) Whenever the separate patches are submitted(hopefully) by the
> hackers for such commands, after the review we can make a note of that
> patch in the common thread.
> 3) Once the patches for all the listed commands are
> received(hopefully) , then they can be committed together in one
> release.

That sounds interesting and nice.  Having said that, I rather think it's better 
to release the fixes separately.  Of course, we should first confirm the principle of 
consistency that the individual fixes are based on, and which actions need 
fixing (SET LOGGED/UNLOGGED was missing in the past thread.)


Regards
Takayuki Tsunakawa



RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-12-09 Thread Shinya11.Kato
Thanks for the reply, Mr. Horiguchi.

I reviewed the patch and found some problems.

>+ if(startSegNo != endSegNo)
>+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
>+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
>+ if(ri == RM_XLOG_ID)
>+ if(info == XLOG_SWITCH)
You need to put a space after the "if".

>@@ -24,6 +24,7 @@
>#include "common/logging.h"
>#include "getopt_long.h"
>#include "rmgrdesc.h"
>+#include "catalog/pg_control.h"
I think the include statements should be arranged in alphabetical order.

>+ info = (rj << 4) & ~XLR_INFO_MASK;
>+ if(info == XLOG_SWITCH)
>+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
>+ 0, total_count, stats->junk_size, total_rec_len,
>+ 0, total_fpi_len, stats->junk_size, total_len);

Couldn't this be written in the same way as "XLogDumpStatsRow(psprintf("%s/%s", 
desc->rm_name, id)..."?
Only this part looks strange.

Why are the "count" and "fpi_len" fields 0?

I think you need to improve the duplicated output for "XLOG/SWITCH_JUNK".


Regards,
Shinya Kato


RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-09 Thread tsunakawa.ta...@fujitsu.com
From: Alvaro Herrera 
> Well, that definition seems unfriendly to me.  I prefer the stance that
> if you change the value for the parent, then future partitions inherit
> that value.

That would be right when the storage property is an optional specification such 
as fillfactor.  For example, when I run ALTER TABLE mytable SET (fillfactor = 
70) and then CREATE TABLE mytable_p1 PARTITION OF mytable, I find it nice that 
the fillfactor of mytable_p1 is also 70 (but I won't complain if it isn't, 
since I can run ALTER TABLE SET on the parent table again.)

OTOH, CREATE TABLE and CREATE UNLOGGED TABLE are explicit requests to create a 
logged and an unlogged relation, respectively.  I would find it strange if CREATE 
TABLE mytable_p1 PARTITION OF mytable created an unlogged partition.
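
For reference, a minimal sketch of the scenario under discussion (names and
bounds are illustrative; what the last two statements should do is exactly
what this thread is debating):

CREATE TABLE mytable (a int) PARTITION BY RANGE (a);
CREATE TABLE mytable_p1 PARTITION OF mytable FOR VALUES FROM (1) TO (100);
ALTER TABLE mytable SET UNLOGGED;   -- per the thread title: accepted today, but silently does nothing
CREATE TABLE mytable_p2 PARTITION OF mytable FOR VALUES FROM (100) TO (200);  -- logged or unlogged?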


> > I got an impression from the discussion that some form of consensus on
> > the principle was made and you were trying to create a fix for REPLICA
> > IDENTITY.  Do you think the principle was unclear and we should state
> > it first (partly to refresh people's memories)?
> 
> I think the principle was sound -- namely, that we should make all those
> commands recurse by default, and for cases where it matters, the
> parent's setting should determine the default for future children.

Yeah, recurse by default sounded nice.  But I didn't find a consensus related 
to "parent's setting should determine the default for future children."  Could 
you point me there?


> > I'm kind of for it, but I'm hesitant to create the fix for all ALTER
> > actions, because it may take a lot of time and energy as you were
> > afraid.  Can we define the principle, and then create individual fixes
> > independently based on that principle?
> 
> That seems acceptable to me, as long as we get all changes in the same
> release.  What we don't want (according to my reading of that
> discussion) is to change the semantics of a few subcommands in one
> release, and the semantics of a few other subcommands in another
> release.

Requiring all fixes in one release seems constricting to me...  Reading the 
following quote from the past discussion, I understood that consistency is a must and 
a simultaneous release is an ideal.  So I think we can release the individual fixes 
separately.  I don't think it will worsen the situation for users, at least.

"try to make them all work the same, ideally in one release, rather than 
changing them at different times, back-patching sometimes, and having no 
consistency in the details."

BTW, do you think you can continue to work on your REPLICA IDENTITY patch soon?


Regards
Takayuki Tsunakawa






Re: [Patch] ALTER SYSTEM READ ONLY

2020-12-09 Thread Andres Freund
Hi,

On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> That's not good. On a typical busy system, the server is going to be in
> the middle of a checkpoint most of the time, and the checkpoint will
> take a long time to finish - maybe minutes.

Or hours, even. Due to the cost of FPWs it can make a lot of sense to
reduce the frequency of that cost...


> We want this feature to respond within milliseconds or a few seconds,
> not minutes. So we need something better here.

Indeed.


> I'm inclined to think
> that we should try to CompleteWALProhibitChange() at the same places
> we AbsorbSyncRequests(). We know from experience that bad things
> happen if we fail to absorb sync requests in a timely fashion, so we
> probably have enough calls to AbsorbSyncRequests() to make sure that
> we always do that work in a timely fashion. So, if we do this work in
> the same place, then it will also be done in a timely fashion.

Sounds sane, without having looked in detail.


> I'm not 100% sure whether that introduces any other problems.
> Certainly, we're not going to be able to finish the checkpoint once
> we've gone read-only, so we'll fail when we try to write the WAL
> record for that, or maybe earlier if there's anything else that tries
> to write WAL. Either the checkpoint needs to error out, like any other
> attempt to write WAL, and we can attempt a new checkpoint if and when
> we go read/write, or else we need to finish writing stuff out to disk
> but not actually write the checkpoint completion record (or any other
> WAL) unless and until the system goes back into read/write mode - and
> then at that point the previously-started checkpoint will finish
> normally. The latter seems better if we can make it work, but the
> former is probably also acceptable. What you've got right now is not.

I mostly wonder which of those two has which implications for how many
FPWs we need to redo. Presumably stalling but not cancelling the current
checkpoint is better?

Greetings,

Andres Freund




Re: [Patch] ALTER SYSTEM READ ONLY

2020-12-09 Thread Andres Freund
On 2020-11-20 11:23:44 -0500, Robert Haas wrote:
> Andres, do you like the new loop better?

I do!




Re: Proposed patch for key managment

2020-12-09 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 2 Dec 2020, at 22:38, Bruce Momjian  wrote:
> > Attached is a patch for key management, which will eventually be part of
> > cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> > by Oracle.
> 
> The attack vector protected against seems to be operating system users
> (*without* any legitimate database access) gaining access to the data files.

That isn't the only attack vector that this addresses (though it is one
that this is envisioned to help with- to wit, someone rsync'ing the DB
directory).  TDE also helps with traditional data at rest requirements,
in environments where you don't trust the storage layer to handle that
(for whatever reason), and it's at least imagined that backups with
pg_basebackup would also be encrypted, helping protect against backup
theft.

There's, further, certainly no shortage of folks asking for this.

> Such an attacker would also gain access to postgresql.conf and therefore may
> have the cluster passphrase command in case it's stored there.  That would
> imply the attacker is likely (or perhaps better phrased "not entirely
> unlikely") to be able to run that command and decrypt the data *iff* there is
> no interactive/2fa aspect to the passphrase command.  Is that an accurate
> interpretation?  Do Oracle TDE et al. use a similar setup where a database
> restart requires manual intervention?

This is very much an 'it depends' as other products have different
capabilities in this area.  The most similar implementation to what is
being contemplated for PG today would be the big O's "tablespace" TDE,
which offers options of either "Password-based software keystores" (you
have to provide a password when you want to bring that tablespace
online), or "Auto-login software keystores" (there's a "system generated
password" that's used to encrypt the keystore, which seems to be
more-or-less the username and hostname of the system), or HSM based
options.  As such, a DB restart might require manual intervention (if
the keystore is password based, or an HSM that requires manual
involvement) or it might not (auto-login keystore of some kind).

> Admittedly I haven't read the other threads on the topic so if it's discussed
> somewhere else then I'd appreciate a whacking with a cluestick.

I'm sure it was, but hopefully the above helps you avoid having to dig
through the very (very (very)) long threads on this topic.

Thanks,

Stephen




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-09 Thread Tom Lane
Here's a couple of little finger exercises to move this along a bit.

0001 adds the ability to attach a subscript handler to an existing
data type with ALTER TYPE.  This is clearly going to be necessary
if we want extension types to be able to use this facility.  The
only thing that I think might be controversial here is that I did
not add the ability to set pg_type.typelem.  While that'd be easy
enough so far as ALTER TYPE is concerned, I'm not sure that we want
to encourage people to change it.  The dependency rules mean that
the semantics of typelem aren't something you really want to change
after-the-fact on an existing type.  Also, if we did allow it, any
existing SubscriptingRef.refelemtype values in stored views would
fail to be updated.
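
For reference, a minimal sketch of what 0001 adds, lifted from the
regression-test change further down (it assumes the myvarchar toy type from
that test, and raw_array_subscript_handler is the handler the test borrows):

ALTER TYPE myvarchar SET (subscript = raw_array_subscript_handler);
ALTER TYPE myvarchar SET (subscript = NONE);  -- remove the handler again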

0002 makes use of that to support subscripting of hstore.  I'm not
sure how much we care about that from a functionality standpoint,
but it seems like it might be good to have a contrib module testing
that extensions can use this.  Also, I thought possibly an example
showing what's basically the minimum possible amount of complexity
would be good to have.  If people like this, I'll finish it up (it
lacks docs) and add it.
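
A sketch of the kind of usage 0002 would enable; the exact syntax here is my
assumption of how generic subscripting would apply to hstore:

CREATE EXTENSION IF NOT EXISTS hstore;
SELECT h['b'] FROM (VALUES ('a=>1,b=>2'::hstore)) AS v(h);  -- presumably yields '2'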

regards, tom lane

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 64bf266373..21887e88a0 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -194,6 +194,14 @@ ALTER TYPE name SET ( 

+   
+
+ SUBSCRIPT can be set to the name of a type-specific
+ subscripting handler function, or NONE to remove
+ the type's subscripting handler function.  Using this option
+ requires superuser privilege.
+
+   

 
  STORAGE
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 29fe52d2ce..7c0b2c3bf0 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -94,6 +94,7 @@ typedef struct
 	bool		updateTypmodin;
 	bool		updateTypmodout;
 	bool		updateAnalyze;
+	bool		updateSubscript;
 	/* New values for relevant attributes */
 	char		storage;
 	Oid			receiveOid;
@@ -101,6 +102,7 @@ typedef struct
 	Oid			typmodinOid;
 	Oid			typmodoutOid;
 	Oid			analyzeOid;
+	Oid			subscriptOid;
 } AlterTypeRecurseParams;
 
 /* Potentially set by pg_upgrade_support functions */
@@ -3885,6 +3887,18 @@ AlterType(AlterTypeStmt *stmt)
 			/* Replacing an analyze function requires superuser. */
 			requireSuper = true;
 		}
+		else if (strcmp(defel->defname, "subscript") == 0)
+		{
+			if (defel->arg != NULL)
+atparams.subscriptOid =
+	findTypeSubscriptingFunction(defGetQualifiedName(defel),
+ typeOid);
+			else
+atparams.subscriptOid = InvalidOid; /* NONE, remove function */
+			atparams.updateSubscript = true;
+			/* Replacing a subscript function requires superuser. */
+			requireSuper = true;
+		}
 
 		/*
 		 * The rest of the options that CREATE accepts cannot be changed.
@@ -4042,6 +4056,11 @@ AlterTypeRecurse(Oid typeOid, bool isImplicitArray,
 		replaces[Anum_pg_type_typanalyze - 1] = true;
 		values[Anum_pg_type_typanalyze - 1] = ObjectIdGetDatum(atparams->analyzeOid);
 	}
+	if (atparams->updateSubscript)
+	{
+		replaces[Anum_pg_type_typsubscript - 1] = true;
+		values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(atparams->subscriptOid);
+	}
 
 	newtup = heap_modify_tuple(tup, RelationGetDescr(catalog),
 			   values, nulls, replaces);
@@ -4098,6 +4117,7 @@ AlterTypeRecurse(Oid typeOid, bool isImplicitArray,
 	atparams->updateReceive = false;	/* domains use F_DOMAIN_RECV */
 	atparams->updateTypmodin = false;	/* domains don't have typmods */
 	atparams->updateTypmodout = false;
+	atparams->updateSubscript = false;	/* domains don't have subscriptors */
 
 	/* Skip the scan if nothing remains to be done */
 	if (!(atparams->updateStorage ||
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index f85afcb31e..14394cc95c 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -260,38 +260,40 @@ ALTER TYPE myvarchar SET (
 receive = myvarcharrecv,
 typmod_in = varchartypmodin,
 typmod_out = varchartypmodout,
-analyze = array_typanalyze  -- bogus, but it doesn't matter
+-- these are bogus, but it's safe as long as we don't use the type:
+analyze = ts_typanalyze,
+subscript = raw_array_subscript_handler
 );
 SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout,
-   typanalyze, typstorage
+   typanalyze, typsubscript, typstorage
 FROM pg_type WHERE typname = 'myvarchar';
-  typinput   |  typoutput   |  typreceive   |typsend|typmodin |typmodout |typanalyze| typstorage 
--+--+---+---+-+--+--+
- myvarcharin | myvarcharout | 

Re: Proposed patch for key managment

2020-12-09 Thread Daniel Gustafsson
> On 2 Dec 2020, at 22:38, Bruce Momjian  wrote:
> 
> Attached is a patch for key management, which will eventually be part of
> cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> by Oracle.

The attack vector protected against seems to be operating system users
(*without* any legitimate database access) gaining access to the data files.
Such an attacker would also gain access to postgresql.conf and therefore may
have the cluster passphrase command in case it's stored there.  That would
imply the attacker is likely (or perhaps better phrased "not entirely
unlikely") to be able to run that command and decrypt the data *iff* there is
no interactive/2fa aspect to the passphrase command.  Is that an accurate
interpretation?  Do Oracle TDE et al. use a similar setup where a database
restart requires manual intervention?

Admittedly I haven't read the other threads on the topic so if it's discussed
somewhere else then I'd appreciate a whacking with a cluestick.

A few other comments from skimming the patch:

+  is data encryption keys, specifically keys zero and one.  Key zero is
+  the key uses to encrypt database heap and index files which are stored in
".. is the key used to .."?

+   matches the initdb-supplied passphrase.  If this check fails, and the
+   the server will refuse to start.
Seems like something is missing, perhaps just s/and the//?

+ https://tools.ietf.org/html/rfc2315;>RFC2315.
Tiny nitpick: turns out that RFC 2315 (with a space) is the correct spelling.

+ * are read-only.  All wrapping and unwrapping key routines depends on
+ * the openssl library for now.
OpenSSL is a name so it should be cased as OpenSSL in text like this.

 #include 
+#include 
Why do we need this header in be-secure-openssl.c?  There are no other changes
to that file?

+/* Define to 1 if you have the `OPENSSL_init_crypto' function. */
+#undef HAVE_OPENSSL_INIT_CRYPTO
This seems to be unused?

KmgrSaveCryptoKeys doesn't seem to explicitly clamp down permissions on
the generated file; is that something we want for belt-and-suspenders level
paranoia around keys? The same goes for read_one_keyfile etc.

Also, KmgrSaveCryptoKeys assumes that keys are stored in plain files which is
true for OpenSSL but won't necessarily be true for other crypto libraries.
Would it make sense to have an API in be-kmgr.c with the OpenSSL specifics in
be-kmgr-openssl.c, like how the TLS backend support is written?  We have spent a
lot of effort in making the backend not assume a particular TLS library; it seems
a shame to step away from that here with a tight coupling. (yes, I am biased)

The same goes for src/common/cipher.c, which wraps calls in ifdefs; why not just
have a thin wrapper API which cipher-openssl.c implements?

+   case 'K':
+   file_encryption_keylen = atoi(optarg);
+   break;
atoi() will return zero if it fails to parse the keylen, and 0 implies
"disabled".  Wouldn't it make sense to parse this with code which can't
silently fall back to disabling it if the user mistypes the value?

+* Skip cryptographic keys. It's generally not good idea to copy the
".. not *a* good idea .."

+   pg_log_fatal("cluser passphrase referenced %%R, but -R 
not specified");
s/cluser/cluster/  (there are few copy-pasteos of that elsewhere too)

+   elog(ERROR, "too many cryptographic kes");
s/kes/keys/

+#ifndef CIPHER_H
+#define CIPHER_H
The risk of a header-guard collision with non-PG headers seems quite high;
maybe PG_CIPHER_H would be better?

I will try to dive in a bit deeper over the holidays.

cheers ./daniel



Re: [Patch] ALTER SYSTEM READ ONLY

2020-12-09 Thread Robert Haas
On Sat, Sep 12, 2020 at 1:23 AM Amul Sul  wrote:
> > So, if we're in the middle of a paced checkpoint with a large
> > checkpoint_timeout - a sensible real world configuration - we'll not
> > process ASRO until that checkpoint is over?  That seems very much not
> > practical. What am I missing?
>
> Yes, the process doing ASRO will wait until that checkpoint is over.

That's not good. On a typical busy system, the server is going to be in
the middle of a checkpoint most of the time, and the checkpoint will
take a long time to finish - maybe minutes. We want this feature to
respond within milliseconds or a few seconds, not minutes. So we need
something better here. I'm inclined to think that we should try to
CompleteWALProhibitChange() at the same places we
AbsorbSyncRequests(). We know from experience that bad things happen
if we fail to absorb sync requests in a timely fashion, so we probably
have enough calls to AbsorbSyncRequests() to make sure that we always
do that work in a timely fashion. So, if we do this work in the same
place, then it will also be done in a timely fashion.

I'm not 100% sure whether that introduces any other problems.
Certainly, we're not going to be able to finish the checkpoint once
we've gone read-only, so we'll fail when we try to write the WAL
record for that, or maybe earlier if there's anything else that tries
to write WAL. Either the checkpoint needs to error out, like any other
attempt to write WAL, and we can attempt a new checkpoint if and when
we go read/write, or else we need to finish writing stuff out to disk
but not actually write the checkpoint completion record (or any other
WAL) unless and until the system goes back into read/write mode - and
then at that point the previously-started checkpoint will finish
normally. The latter seems better if we can make it work, but the
former is probably also acceptable. What you've got right now is not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-09 Thread David Rowley
On Tue, 8 Dec 2020 at 20:15, David Rowley  wrote:
> I've attached another patchset that addresses some comments left by
> Zhihong Yu over on [1].  The version number got bumped to v12 instead
> of v11 as I still have a copy of the other version of the patch which
> I made some changes to and internally named v11.

If anyone else wants to have a look at these, please do so soon. I'm
planning on starting to take a serious look at getting 0001-0003 in
early next week.

David




Re: SELECT INTO deprecation

2020-12-09 Thread Peter Eisentraut

On 2020-12-03 20:26, Peter Eisentraut wrote:

On 2020-12-03 16:34, Tom Lane wrote:

As I recall, a whole lot of the pain we have with INTO has to do
with the semantics we've chosen for INTO in a set-operation nest.
We think you can write something like

 SELECT ... INTO foo FROM ... UNION SELECT ... FROM ...

but we insist on the INTO being in the first component SELECT.
I'd like to know exactly how much of that messiness is shared
by SQL Server.


On sqlfiddle.com, this works:

select a into t3 from t1 union select a from t2;

but this gets an error:

select a from t1 union select a into t4 from t2;

SELECT INTO must be the first query in a statement containing a UNION,
INTERSECT or EXCEPT operator.


So, with that in mind, here is an alternative proposal that points out 
that SELECT INTO has some use for compatibility.
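
For reference, the rewrite the patch applies throughout, spelled out on a toy
table (names are illustrative):

CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1), (2), (3);
SELECT a INTO t2 FROM t1;              -- deprecated form
CREATE TABLE t3 AS SELECT a FROM t1;   -- equivalent preferred form used below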
From ab0dc84e0519d618a316456598fc277a2f1c9f6f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 Dec 2020 12:09:39 +0100
Subject: [PATCH v2 1/2] Remove gratuitous uses of deprecated SELECT INTO

CREATE TABLE AS has been preferred over SELECT INTO (outside of ecpg
and PL/pgSQL) for a long time.  There were still a few uses of SELECT
INTO in tests and documentation, some old, some more recent.  This
changes them to CREATE TABLE AS.  Some occurrences in the tests remain
where they are specifically testing SELECT INTO parsing or similar.
---
 contrib/sepgsql/expected/label.out  | 2 +-
 contrib/sepgsql/sql/label.sql   | 2 +-
 doc/src/sgml/hstore.sgml| 2 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl| 4 ++--
 src/bin/pg_checksums/t/002_actions.pl   | 2 +-
 src/test/regress/expected/create_index.out  | 2 +-
 src/test/regress/expected/create_misc.out   | 2 +-
 src/test/regress/expected/random.out| 3 ++-
 src/test/regress/expected/select_implicit.out   | 6 --
 src/test/regress/expected/select_implicit_1.out | 6 --
 src/test/regress/expected/select_implicit_2.out | 6 --
 src/test/regress/sql/create_index.sql   | 2 +-
 src/test/regress/sql/create_misc.sql| 2 +-
 src/test/regress/sql/random.sql | 3 ++-
 src/test/regress/sql/select_implicit.sql| 6 --
 15 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/contrib/sepgsql/expected/label.out 
b/contrib/sepgsql/expected/label.out
index 0300bc6fb4..b1b7db55f6 100644
--- a/contrib/sepgsql/expected/label.out
+++ b/contrib/sepgsql/expected/label.out
@@ -6,7 +6,7 @@
 --
 CREATE TABLE t1 (a int, b text);
 INSERT INTO t1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
-SELECT * INTO t2 FROM t1 WHERE a % 2 = 0;
+CREATE TABLE t2 AS SELECT * FROM t1 WHERE a % 2 = 0;
 CREATE FUNCTION f1 () RETURNS text
 AS 'SELECT sepgsql_getcon()'
 LANGUAGE sql;
diff --git a/contrib/sepgsql/sql/label.sql b/contrib/sepgsql/sql/label.sql
index d19c6edb4c..76e261bee8 100644
--- a/contrib/sepgsql/sql/label.sql
+++ b/contrib/sepgsql/sql/label.sql
@@ -7,7 +7,7 @@
 --
 CREATE TABLE t1 (a int, b text);
 INSERT INTO t1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
-SELECT * INTO t2 FROM t1 WHERE a % 2 = 0;
+CREATE TABLE t2 AS SELECT * FROM t1 WHERE a % 2 = 0;
 
 CREATE FUNCTION f1 () RETURNS text
 AS 'SELECT sepgsql_getcon()'
diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml
index 14a36ade00..25904d9562 100644
--- a/doc/src/sgml/hstore.sgml
+++ b/doc/src/sgml/hstore.sgml
@@ -841,7 +841,7 @@ Statistics
   
Using a table:
 
-SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore;
+CREATE TABLE stat AS SELECT (each(h)).key, (each(h)).value FROM testhstore;
 
   
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl 
b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f674a7c94e..9eba7d8d7d 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -502,10 +502,10 @@
 
 # create tables to corrupt and get their relfilenodes
 my $file_corrupt1 = $node->safe_psql('postgres',
-   q{SELECT a INTO corrupt1 FROM generate_series(1,1) AS a; ALTER 
TABLE corrupt1 SET (autovacuum_enabled=false); SELECT 
pg_relation_filepath('corrupt1')}
+   q{CREATE TABLE corrupt1 AS SELECT a FROM generate_series(1,1) AS a; 
ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT 
pg_relation_filepath('corrupt1')}
 );
 my $file_corrupt2 = $node->safe_psql('postgres',
-   q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE 
corrupt2 SET (autovacuum_enabled=false); SELECT 
pg_relation_filepath('corrupt2')}
+   q{CREATE TABLE corrupt2 AS SELECT b FROM generate_series(1,2) AS b; 
ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT 
pg_relation_filepath('corrupt2')}
 );
 
 # set page header and block sizes
diff --git a/src/bin/pg_checksums/t/002_actions.pl 
b/src/bin/pg_checksums/t/002_actions.pl
index 4e4934532a..8a81f36a06 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ 

Re: libpq compression

2020-12-09 Thread Robert Haas
On Tue, Dec 8, 2020 at 9:42 AM Daniil Zakhlystov
 wrote:
> I proposed a slightly different handshake (three-phase):
>
> 1. At first, the client sends _pq_.compression parameter in startup packet
> 2. Server replies with CompressionAck and following it with 
> SetCompressionMethod message.
> These two might be combined but I left them like this for symmetry reasons. 
> In most cases they
> will arrive as one piece without any additional delay.
> 3. Client replies with SetCompressionMethod message.

I think that's pretty similar to what I proposed, actually, except I
think that SetCompressionMethod in one direction should be decoupled
from SetCompressionMethod in the other direction, so that those things
don't have to be synchronized with respect to each other. Each affects
its own direction only.

> Yes, there is actually some amount of complexity involved in implementing the 
> switchable on-the-fly compression.
> Currently, compression itself operates on a different level, independently of 
> libpq protocol. By allowing
> the compression to be switchable on the fly, we need to solve these tasks:
>
> 1. When the new portion of bytes comes to the decompressor from the 
> socket.read() call, there may be
> a situation when the first part of these bytes is a compressed fragment and 
> the other part is uncompressed, or worse,
> in a single portion of new bytes, there may be the end of some ZLIB 
> compressed message and the beginning of the ZSTD compressed message.
> The problem is that we don’t know the exact end of the ZLIB compressed 
> message before decompressing the entire chunk of new bytes
> and reading the SetCompressionMethod message. Moreover, streaming compression 
> by itself may involve some internal buffering,
> which also complexifies this problem.
>
> 2. When sending the new portion of bytes, it may be not sufficient to keep 
> track of only the current compression method.
> There may be a situation when there could be multiple SetCompressionMessages 
> in PqSendBuffer (backend) or conn->outBuffer (frontend).
> It means that it is not enough to simply track the current compression method 
> but also keep track of all compression method
> switches in PqSendBuffer or conn->outBuffer. Also, same as for decompression,
> internal buffering of streaming compression makes the situation more complex 
> in this case too.

Good points. I guess you need to arrange to "flush" at the compression
layer as well as the libpq layer so that you don't end up with data
stuck in the compression buffers.

Another idea is that you could have a new message type that says "hey,
the payload of this is 1 or more compressed messages." It uses the
most-recently set compression method. This would make switching
compression methods easier since the SetCompressionMethod message
itself could always be sent uncompressed and/or not take effect until
the next compressed message. It also allows for a prudential decision
not to bother compressing messages that are short anyway, which might
be useful. On the downside it adds a little bit of overhead. Andres
was telling me on a call that he liked this approach; I'm not sure if
it's actually best, but have you considered this sort of approach?

> I personally think that this approach is the most practical one. For example:
>
> In the server’s postgresql.conf:
>
> compress_algorithms = ‘uncompressed' // means that the server forbids any 
> server-to-client compression
> decompress_algorithms = 'zstd:7,8;uncompressed' // means that the server can 
> only decompress zstd with compression ratio 7 and 8 or communicate with 
> uncompressed messages
>
> In the client connection string:
>
> “… compression=zlib:1,3,5;zstd:6,7,8;uncompressed …” // means that the client 
> is able to compress/decompress zlib, zstd, or communicate with uncompressed 
> messages
>
> For the sake of simplicity, the client’s “compression” parameter in the 
> connection string is basically an analog of the server’s compress_algorithms 
> and decompress_algorithms.
> So the negotiation process for the above example would look like this:
>
> 1. Client sends startup packet with 
> “algorithms=zlib:1,3,5;zstd:6,7,8;uncompressed;”
> Since there is no compression mode specified, assume that the client wants 
> permanent compression.
> In future versions, the client can request the switchable compression 
> after the ‘;’ at the end of the message
>
> 2. Server replies with two messages:
> - CompressionAck message containing “algorithms=zstd:7,8;uncompressed;”
> Where the algorithms section basically matches the “decompress_algorithms” 
> server GUC parameter.
> In future versions, the server can specify the chosen compression mode after 
> the ‘;’ at the end of the message
>
> - Following SetCompressionMethod message containing “alg_idx=1;level_idx=1” 
> which
> essentially means that the server chose zstd with compression level 7 for 
> server-to-client compression. Every next message from the server is now 
> 

create table like: ACCESS METHOD

2020-12-09 Thread Justin Pryzby
I thought this was a good idea, but didn't hear back when I raised it before.

Failing to preserve access method is arguably a bug, reminiscent of CREATE
STATISTICS and 5564c1181.  But maybe it's not important to backpatch a fix in
this case, since access methods are still evolving.

https://www.postgresql.org/message-id/20190818193533.gl11...@telsasoft.com
On Sun, Aug 18, 2019 at 02:35:33PM -0500, Justin Pryzby wrote:
>  . What do you think about pg_restore --no-tableam; similar to
>--no-tablespaces, it would allow restoring a table to a different AM:
>PGOPTIONS='-c default_table_access_method=zedstore' pg_restore 
> --no-tableam ./pg_dump.dat -d postgres
>Otherwise, the dump says "SET default_table_access_method=heap", which
>overrides any value from PGOPTIONS and precludes restoring to new AM.
...
>  . it'd be nice if there was an ALTER TABLE SET ACCESS METHOD, to allow
>migrating data.  Otherwise I think the alternative is:
>   begin; lock t;
>   CREATE TABLE new_t LIKE (t INCLUDING ALL) USING (zedstore);
>   INSERT INTO new_t SELECT * FROM t;
>   for index; do CREATE INDEX...; done
>   DROP t; RENAME new_t (and all its indices). attach/inherit, etc.
>   commit;
>
>  . Speaking of which, I think LIKE needs a new option for ACCESS METHOD, which
>is otherwise lost.
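
For reference, a minimal sketch of what the attached patch enables; the syntax
is taken from the grammar and documentation changes below, and heap stands in
for whatever access method you want preserved:

CREATE TABLE t (a int) USING heap;
CREATE TABLE t_copy (LIKE t INCLUDING ACCESS METHOD);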
>From 3df7de7f3f2b15c447534bcd7e05c5be79030404 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Nov 2020 16:54:53 -0600
Subject: [PATCH v1] create table (like .. including ACCESS METHOD)

---
 doc/src/sgml/ref/create_table.sgml  | 12 +++-
 src/backend/parser/gram.y   |  3 ++-
 src/backend/parser/parse_utilcmd.c  |  7 +++
 src/include/nodes/parsenodes.h  | 17 +
 src/test/regress/expected/create_table_like.out |  2 +-
 src/test/regress/sql/create_table_like.sql  |  2 +-
 6 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 569f4c9da7..cb95177e92 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -87,7 +87,7 @@ class="parameter">referential_action ] [ ON UPDATE and like_option is:
 
-{ INCLUDING | EXCLUDING } { COMMENTS | CONSTRAINTS | DEFAULTS | GENERATED | IDENTITY | INDEXES | STATISTICS | STORAGE | ALL }
+{ INCLUDING | EXCLUDING } { ACCESS METHOD | COMMENTS | CONSTRAINTS | DEFAULTS | GENERATED | IDENTITY | INDEXES | STATISTICS | STORAGE | ALL }
 
 and partition_bound_spec is:
 
@@ -593,6 +593,16 @@ WITH ( MODULUS numeric_literal, REM
   available options are:
 
   
+   
+INCLUDING ACCESS METHOD
+
+ 
+  The table's access method will be copied.  By default, the
+  default_table_access_method is used.
+ 
+
+   
+

 INCLUDING COMMENTS
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8f341ac006..b32861a04e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3651,7 +3651,8 @@ TableLikeOptionList:
 		;
 
 TableLikeOption:
-COMMENTS			{ $$ = CREATE_TABLE_LIKE_COMMENTS; }
+ACCESS METHOD			{ $$ = CREATE_TABLE_LIKE_ACCESSMETHOD; }
+| COMMENTS			{ $$ = CREATE_TABLE_LIKE_COMMENTS; }
 | CONSTRAINTS		{ $$ = CREATE_TABLE_LIKE_CONSTRAINTS; }
 | DEFAULTS			{ $$ = CREATE_TABLE_LIKE_DEFAULTS; }
 | IDENTITY_P		{ $$ = CREATE_TABLE_LIKE_IDENTITY; }
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 89ee990599..3507fd4738 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -96,6 +96,7 @@ typedef struct
 	bool		ispartitioned;	/* true if table is partitioned */
 	PartitionBoundSpec *partbound;	/* transformed FOR VALUES */
 	bool		ofType;			/* true if statement contains OF typename */
+	char		*accessMethod;	/* table access method */
 } CreateStmtContext;
 
 /* State shared by transformCreateSchemaStmt and its subroutines */
@@ -252,6 +253,7 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	cxt.ispartitioned = stmt->partspec != NULL;
 	cxt.partbound = stmt->partbound;
 	cxt.ofType = (stmt->ofTypename != NULL);
+	cxt.accessMethod = NULL;
 
 	Assert(!stmt->ofTypename || !stmt->inhRelations);	/* grammar enforces */
 
@@ -345,6 +347,8 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	 */
 	stmt->tableElts = cxt.columns;
 	stmt->constraints = cxt.ckconstraints;
+	if (cxt.accessMethod != NULL)
+		stmt->accessMethod = cxt.accessMethod;
 
 	result = lappend(cxt.blist, stmt);
 	result = list_concat(result, cxt.alist);
@@ -1118,6 +1122,9 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 		cxt->likeclauses = lappend(cxt->likeclauses, table_like_clause);
 	}
 
+	if (table_like_clause->options & CREATE_TABLE_LIKE_ACCESSMETHOD)
+		cxt->accessMethod = get_am_name(relation->rd_rel->relam);
+
 	

Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-09 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
>> I'd be happier if the "container" terminology were reserved for
>> that sort of physical containment, which means that I think a lot
>> of the commentary around SubscriptingRef is misleading.  But I do
>> not have a better word to suggest offhand.  Thoughts?

> I have only 'a composite'/'a compound' alternative in mind, but it's
> probably just as confusing as a container.

Yeah, in fact likely worse, since we tend to use those words for
rowtypes.  Naming is the hardest problem in CS :-(

regards, tom lane




Re: pg_stat_statements and "IN" conditions

2020-12-09 Thread Dmitry Dolgov
> On Wed, Dec 09, 2020 at 03:37:40AM +, Chengxi Sun wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Hi, I did some tests and it works well

Thanks for testing!




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-09 Thread Dmitry Dolgov
> On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
>
> I've pushed the core patch now.

Thanks a lot!

> The jsonb parts now have to be
> rebased onto this design, which I'm assuming Dmitry will tackle

Yes, I'm already on it, just couldn't keep up with the changes in this
thread.

> BTW, while reviewing the thread to write the commit message,
> I was reminded of my concerns around the "is it a container"
> business.  As things stand, if type A has a typelem link to
> type B, then the system supposes that A contains B physically;
> this has implications for what's allowed in DDL, for example
> (cf find_composite_type_dependencies() and other places).
> We now have a feature whereby subscripting can yield a type
> that is not contained in the source type in that sense.
> I'd be happier if the "container" terminology were reserved for
> that sort of physical containment, which means that I think a lot
> of the commentary around SubscriptingRef is misleading.  But I do
> not have a better word to suggest offhand.  Thoughts?

I have only 'a composite'/'a compound' alternative in mind, but it's
probably just as confusing as a container.




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-09 Thread Tom Lane
I've pushed the core patch now.  The jsonb parts now have to be
rebased onto this design, which I'm assuming Dmitry will tackle
(I do not intend to).  It's not quite clear to me whether we have
a meeting of the minds on what the jsonb functionality should be,
anyway.  Alexander seemed to be thinking about offering an option
to let the subscript be a jsonpath, but how would we distinguish
that from a plain-text field name?

BTW, while reviewing the thread to write the commit message,
I was reminded of my concerns around the "is it a container"
business.  As things stand, if type A has a typelem link to
type B, then the system supposes that A contains B physically;
this has implications for what's allowed in DDL, for example
(cf find_composite_type_dependencies() and other places).
We now have a feature whereby subscripting can yield a type
that is not contained in the source type in that sense.
I'd be happier if the "container" terminology were reserved for
that sort of physical containment, which means that I think a lot
of the commentary around SubscriptingRef is misleading.  But I do
not have a better word to suggest offhand.  Thoughts?
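
(For reference, the physical-containment link in question is the typelem
field; a quick illustration against the built-in int4[] array type:)

SELECT typname, typelem::regtype AS element
FROM pg_type
WHERE typname = '_int4';   -- the array type physically contains its typelem type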

regards, tom lane




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-12-09 Thread Justin Pryzby
On Fri, Dec 04, 2020 at 12:23:23PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> [ v24-0001-Document-historic-behavior-of-links-to-directori.patch ]
> 
> The cfbot is unhappy with one of the test cases you added:

> Maybe it could be salvaged by reversing the sense of the WHERE condition
> so that instead of trying to blacklist stuff, you whitelist just a small
> number of files that should certainly be there.

Yes, I had noticed this one.

-- 
Justin
>From 00d390ac3754db140bb98c5b09bc38e3e9b0e45c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v25 01/11] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..7cafdb9107 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25897,7 +25897,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory.
+indicating if it is a directory (or a symbolic link to a directory).


 This function is restricted to superusers by default, but other users
-- 
2.17.0

>From 9ac2f2cf76a53348c505482f8df27ab0464006d1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 30 Mar 2020 18:59:16 -0500
Subject: [PATCH v25 02/11] pg_stat_file and pg_ls_dir_* to use lstat()..

pg_ls_dir_* will now skip (no longer show) symbolic links, same as other
non-regular file types, as we have advertised we do since 8b6d94cf6.  That seems to
be the intended behavior, since irregular file types are 1) less portable; and,
2) we don't currently show a file's type except for "bool is_dir".

pg_stat_file will now 1) show metadata of links themselves, rather than their
target; and, 2) specifically, show links to directories with "is_dir=false";
and, 3) not error on broken symlinks.
---
 doc/src/sgml/func.sgml  | 2 +-
 src/backend/utils/adt/genfile.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7cafdb9107..df29af6371 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25897,7 +25897,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory (or a symbolic link to a directory).
+indicating if it is a directory.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index d34182a7b0..9f4927220b 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -406,7 +406,7 @@ pg_stat_file(PG_FUNCTION_ARGS)
 
 	filename = convert_and_check_filename(filename_t);
 
-	if (stat(filename, &fst) < 0)
+	if (lstat(filename, &fst) < 0)
 	{
 		if (missing_ok && errno == ENOENT)
 			PG_RETURN_NULL();
@@ -632,7 +632,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 
 		/* Get the file info */
 		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
-		if (stat(path, &attrib) < 0)
+		if (lstat(path, &attrib) < 0)
 		{
 			/* Ignore concurrently-deleted files, else complain */
 			if (errno == ENOENT)
-- 
2.17.0

>From a7729e28c3361fad3adc09388a59d47bdce46eb1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v25 03/11] Add tests on pg_ls_dir before changing it

---
 src/test/regress/expected/misc_functions.out | 24 
 src/test/regress/sql/misc_functions.sql  |  8 +++
 2 files changed, 32 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d3acb98d04..edbfd9abc1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -201,6 +201,30 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or 

Insert Documentation - Returning Clause and Order

2020-12-09 Thread David G. Johnston
Hey,

Would it be accurate to add the following sentence to the INSERT
documentation under "Outputs", right after the existing text ending in
"...inserted or updated by the command."?

"For a multiple-values insertion, the order of output rows will match the
order in which rows are presented in the VALUES or query clause."

https://www.postgresql.org/docs/current/sql-insert.html
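
For reference, a small example of the case the proposed sentence covers
(whether the ordering guarantee actually holds is exactly what is being asked):

CREATE TABLE demo (id serial PRIMARY KEY, v text);
INSERT INTO demo (v) VALUES ('a'), ('b'), ('c') RETURNING id, v;
-- under the proposed wording, the RETURNING rows come back in the same
-- order as the VALUES list: (1,'a'), (2,'b'), (3,'c')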

David J.


Re: get_constraint_index() and conindid

2020-12-09 Thread Peter Eisentraut

On 2020-12-09 07:37, Michael Paquier wrote:

Only thing I could think of is that it maybe could use a (small)
comment in the message on that/why get_constraint_index is moved to
utils/lsyscache from catalog/dependency, as that took me some time to
understand.


commit message could reasonably say that maybe, but I don't think we
need to memorialize it in a comment.  lsyscache.c *is* where one
would expect to find a simple catalog-field-fetch function like this.
The previous implementation was not that, so it didn't belong there.


Agreed.


Thanks, I committed it with an expanded commit message.

After further inspection, I'm not going to do anything about the nearby 
get_index_constraint() at this time.  The current implementation can use 
an index on pg_depend.  A scan of pg_constraint would have no index available.
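
(For reference, conindid is the pg_constraint field that get_constraint_index()
now fetches directly; a quick illustration on a throwaway table:)

CREATE TABLE demo (id int PRIMARY KEY);
SELECT conname, contype, conindid::regclass AS backing_index
FROM pg_constraint
WHERE conrelid = 'demo'::regclass;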





Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-09 Thread Alvaro Herrera
On 2020-Dec-09, Bharath Rupireddy wrote:

> I'm not sure how many more of such commands exist which require changes.

The other thread has a list.  I think it is complete, but if you want to
double-check, that would be helpful.

> How about doing it this way?
> 
> 1) Have a separate common thread listing the commands and specifying
> clearly the agreed behaviour for such commands.
> 2) Whenever the separate patches are submitted(hopefully) by the
> hackers for such commands, after the review we can make a note of that
> patch in the common thread.
> 3) Once the patches for all the listed commands are
> received(hopefully) , then they can be committed together in one
> release.

Sounds good.  I think this thread is a good place to collect those
patches, but if you would prefer to have a new thread, feel free to
start one (I'd suggest CC'ing me and Tsunakawa-san).




Re: pg_rewind race condition just after promotion

2020-12-09 Thread Heikki Linnakangas

On 08/12/2020 06:45, Kyotaro Horiguchi wrote:

At Mon, 7 Dec 2020 20:13:25 +0200, Heikki Linnakangas  wrote in

I think we should fix this properly. I'm not sure if it can lead to a
broken cluster, but at least it can cause pg_rewind to fail
unnecessarily and in a user-unfriendly way. But this is actually
pretty simple to fix. pg_rewind looks at the control file to find out
the timeline the server is on. When promotion happens, the startup
process updates minRecoveryPoint and minRecoveryPointTLI fields in the
control file. We just need to read it from there. Patch attached.


Looks fine to me.  A bit concerned about making sourceHistory
needlessly file-local but on the other hand unifying sourceHistory and
targetHistory looks better.


Looking closer, findCommonAncestorTimeline() was freeing sourceHistory, 
which would have been pretty horrible once it became a file-local variable. I changed it 
so that both the source and target histories are passed to 
findCommonAncestorTimeline() as arguments. That seems clearer.



For the test part, that change doesn't necessarily catch the failure
of the current version, but I *believe* the previous code is the result
of an actual failure in the past, so the test probabilistically (or
depending on the platform?) hits the failure if it happens.


Right. I think the current test coverage is good enough. We've been 
bitten by this a few times by now, when we've forgotten to add the 
manual checkpoint commands to new tests, and the buildfarm has caught it 
pretty quickly.



I think we should also backpatch this. Back in 2015, we decided that
we can live with this, but it's always been a bit bogus, and seems
simple enough to fix.


I don't think this changes any successful behavior and it just saves
the failure case so +1 for back-patching.


Thanks for the review! New patch version attached.

- Heikki
>From 5894d560ca4a4c97b805cfb2c396cedd05f5a19c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 9 Dec 2020 15:28:59 +0200
Subject: [PATCH v2 1/1] pg_rewind: Fix determining TLI when server was just
 promoted.

If the source server was just promoted, and it hasn't written the
checkpoint record yet, pg_rewind considered the server to be still on
the old timeline. Because of that, it would claim incorrectly that no
rewind is required. Fix that by looking at the minRecoveryPointTLI in
the control file in addition to the ThisTimeLineID on the checkpoint.

This has been a known issue since forever, and we had worked around it
in the regression tests by issuing a checkpoint after each promotion,
before running pg_rewind. But that was always quite hacky, so better
to fix this properly. This doesn't add any new tests for this, but
removes the previously-added workarounds from the existing tests, so
that they should occasionally hit this codepath again.

Backpatch to all supported versions.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi
---
 src/bin/pg_rewind/pg_rewind.c | 96 ---
 src/bin/pg_rewind/t/007_standby_source.pl |  1 -
 src/bin/pg_rewind/t/008_min_recovery_point.pl |  9 --
 src/bin/pg_rewind/t/RewindTest.pm |  8 --
 4 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index d89c08f81d..8cb3bdac48 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -44,7 +44,13 @@ static void digestControlFile(ControlFileData *ControlFile,
 			  const char *content, size_t size);
 static void getRestoreCommand(const char *argv0);
 static void sanityChecks(void);
-static void findCommonAncestorTimeline(XLogRecPtr *recptr, int *tliIndex);
+static TimeLineHistoryEntry *getTimelineHistory(TimeLineID tli, bool is_source,
+int *nentries);
+static void findCommonAncestorTimeline(TimeLineHistoryEntry *sourceHistory,
+	   int sourceNentries,
+	   TimeLineHistoryEntry *targetHistory,
+	   int targetNentries,
+	   XLogRecPtr *recptr, int *tliIndex);
 static void ensureCleanShutdown(const char *argv0);
 static void disconnect_atexit(void);
 
@@ -129,6 +135,8 @@ main(int argc, char **argv)
 	XLogRecPtr	chkptrec;
 	TimeLineID	chkpttli;
 	XLogRecPtr	chkptredo;
+	TimeLineID	source_tli;
+	TimeLineID	target_tli;
 	XLogRecPtr	target_wal_endrec;
 	size_t		size;
 	char	   *buffer;
@@ -325,14 +333,28 @@ main(int argc, char **argv)
 
 	sanityChecks();
 
+	/*
+	 * Usually, the TLI can be found in the latest checkpoint record. But if
+	 * the source server is just being promoted (or it's a standby that's
+	 * following a primary that's just being promoted), and the checkpoint
+	 * requested by the promotion hasn't completed yet, the latest timeline is
+	 * in minRecoveryPoint. So we check which is later, the TLI of the
+	 * minRecoveryPoint or the latest checkpoint.
+	 */
+	source_tli = Max(ControlFile_source.minRecoveryPointTLI,
+	 

Re: On login trigger: take three

2020-12-09 Thread Konstantin Knizhnik



On 09.12.2020 15:34, Pavel Stehule wrote:

Hi

st 9. 12. 2020 v 13:17 odesílatel Greg Nancarrow > napsal:


On Tue, Dec 8, 2020 at 3:26 PM Pavel Stehule
mailto:pavel.steh...@gmail.com>> wrote:
>
>
> There are two maybe generic questions?
>
> 1. Maybe we can introduce more generic GUC for all event
triggers like disable_event_triggers? This GUC can be checked only
by the database owner or super user. It can be an alternative
ALTER TABLE DISABLE TRIGGER ALL. It can be protection against
necessity to restart to single mode to repair the event trigger. I
think so more generic solution is better than special
disable_client_connection_trigger GUC.
>
> 2. I have no objection against client_connection. It is probably
better for the mentioned purpose - possibility to block connection
to database. Can be interesting, and I am not sure how much work
it is to introduce the second event - session_start. This event
should be started after connecting - so the exception there
doesn't block connect, and should be started also after the new
statement "DISCARD SESSION", that will be started automatically
after DISCARD ALL.  This feature should not be implemented in
first step, but it can be a plan for support pooled connections
>

I've created a separate patch to address question (1), rather than
include it in the main patch, which I've adjusted accordingly. I'll
leave question (2) until another time, as you suggest.
See the attached patches.


I see two possible questions?

1. when you introduce this event, then the new hook is useless ?

2. what is a performance impact for users that want not to use this 
feature. What is a overhead of EventTriggerOnConnect and is possible 
to skip this step if database has not any event trigger


As far as I understand this are questions to me rather than to Greg.
1. Do you mean client_connection_hook? It is used to implement this new 
event type. It can be also used for other purposes.
2. Connection overhead is quite large. Supporting on connect hook 
requires traversal of event trigger relation. But this overhead is 
negligible comparing with overhead of establishing connection. In any 
case I did the following test (with local connection):


pgbench -C -S -T 100 -P 1 -M prepared postgres

without this patch:
tps = 424.287889 (including connections establishing)
tps = 952.911068 (excluding connections establishing)

with this patch (but without any connection trigger defined):
tps = 434.642947 (including connections establishing)
tps = 995.525383 (excluding connections establishing)

As you can see - there is almost now different (patched version is even 
faster, but it seems to be just "white noise".


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-09 Thread Bharath Rupireddy
On Wed, Dec 9, 2020 at 6:22 PM Alvaro Herrera  wrote:
>
> On 2020-Dec-09, tsunakawa.ta...@fujitsu.com wrote:
>
> > From: Alvaro Herrera 
> > > But what happens when you create another partition after you change the
> > > "loggedness" of the partitioned table?
> >
> > The new partition will have a property specified when the user creates
> > it.  That is, while the storage property of each storage unit
> > (=partition) is basically independent, ALTER TABLE on a partitioned
> > table is a convenient way to set the same property value to all its
> > underlying storage units.
>
> Well, that definition seems unfriendly to me.  I prefer the stance that
> if you change the value for the parent, then future partitions inherit
> that value.
>
> > I got an impression from the discussion that some form of consensus on
> > the principle was made and you were trying to create a fix for REPLICA
> > IDENTITY.  Do you think the principle was unclear and we should state
> > it first (partly to refresh people's memories)?
>
> I think the principle was sound -- namely, that we should make all those
> commands recurse by default, and for cases where it matters, the
> parent's setting should determine the default for future children.
>
> > I'm kind of for it, but I'm hesitant to create the fix for all ALTER
> > actions, because it may take a lot of time and energy as you were
> > afraid.  Can we define the principle, and then create individual fixes
> > independently based on that principle?
>
> That seems acceptable to me, as long as we get all changes in the same
> release.  What we don't want (according to my reading of that
> discussion) is to change the semantics of a few subcommands in one
> release, and the semantics of a few other subcommands in another
> release.
>

I'm not sure how many more of such commands exist which require changes.

How about doing it this way?

1) Have a separate common thread listing the commands and specifying
clearly the agreed behaviour for such commands.
2) Whenever the separate patches are submitted (hopefully) by the
hackers for such commands, after review we can make a note of each
patch in the common thread.
3) Once the patches for all the listed commands have been
received (hopefully), they can be committed together in one
release.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Refactor MD5 implementations and switch to EVP for OpenSSL

2020-12-09 Thread Daniel Gustafsson
> On 9 Dec 2020, at 06:47, Michael Paquier  wrote:

> In short, this seems rather committable to me.

Agreed, this version of the patch looks good to me.  I've looked over the
attributions for the code movement and it seems to match now, and all tests
pass etc. +1 on going ahead with this version.

The tiniest level of nitpicking would be that md5.h doesn't have the 
/* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's
not even the fault of this patch - I just happened to notice when looking
at that file).
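
For anyone unfamiliar with the convention being referenced, a minimal sketch of
the include-guard style in question looks like this (illustrative only; the
declarations are placeholders, not the actual contents of md5.h):

#ifndef PG_MD5_H
#define PG_MD5_H

/* ... declarations ... */

#endif							/* PG_MD5_H */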

cheers ./daniel




Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-09 Thread Alvaro Herrera
On 2020-Dec-09, tsunakawa.ta...@fujitsu.com wrote:

> From: Alvaro Herrera 
> > But what happens when you create another partition after you change the
> > "loggedness" of the partitioned table?
> 
> The new partition will have a property specified when the user creates
> it.  That is, while the storage property of each storage unit
> (=partition) is basically independent, ALTER TABLE on a partitioned
> table is a convenient way to set the same property value to all its
> underlying storage units.

Well, that definition seems unfriendly to me.  I prefer the stance that
if you change the value for the parent, then future partitions inherit
that value.

> I got an impression from the discussion that some form of consensus on
> the principle was made and you were trying to create a fix for REPLICA
> IDENTITY.  Do you think the principle was unclear and we should state
> it first (partly to refresh people's memories)?

I think the principle was sound -- namely, that we should make all those
commands recurse by default, and for cases where it matters, the
parent's setting should determine the default for future children.

> I'm kind of for it, but I'm hesitant to create the fix for all ALTER
> actions, because it may take a lot of time and energy as you were
> afraid.  Can we define the principle, and then create individual fixes
> independently based on that principle?

That seems acceptable to me, as long as we get all changes in the same
release.  What we don't want (according to my reading of that
discussion) is to change the semantics of a few subcommands in one
release, and the semantics of a few other subcommands in another
release.




Re: On login trigger: take three

2020-12-09 Thread Pavel Stehule
Hi

st 9. 12. 2020 v 13:17 odesílatel Greg Nancarrow 
napsal:

> On Tue, Dec 8, 2020 at 3:26 PM Pavel Stehule 
> wrote:
> >
> >
> > There are two maybe generic questions?
> >
> > 1. Maybe we can introduce more generic GUC for all event triggers like
> disable_event_triggers? This GUC can be checked only by the database owner
> or super user. It can be an alternative ALTER TABLE DISABLE TRIGGER ALL. It
> can be protection against necessity to restart to single mode to repair the
> event trigger. I think so more generic solution is better than special
> disable_client_connection_trigger GUC.
> >
> > 2. I have no objection against client_connection. It is probably better
> for the mentioned purpose - possibility to block connection to database.
> Can be interesting, and I am not sure how much work it is to introduce the
> second event - session_start. This event should be started after connecting
> - so the exception there doesn't block connect, and should be started also
> after the new statement "DISCARD SESSION", that will be started
> automatically after DISCARD ALL.  This feature should not be implemented in
> first step, but it can be a plan for support pooled connections
> >
>
> I've created a separate patch to address question (1), rather than
> include it in the main patch, which I've adjusted accordingly. I'll
> leave question (2) until another time, as you suggest.
> See the attached patches.
>

I see two possible questions?

1. when you introduce this event, then the new hook is useless ?

2. what is a performance impact for users that want not to use this
feature. What is a overhead of EventTriggerOnConnect and is possible to
skip this step if database has not any event trigger

Pavel


> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: On login trigger: take three

2020-12-09 Thread Pavel Stehule
st 9. 12. 2020 v 13:17 odesílatel Greg Nancarrow 
napsal:

> On Tue, Dec 8, 2020 at 3:26 PM Pavel Stehule 
> wrote:
> >
> >
> > There are two maybe generic questions?
> >
> > 1. Maybe we can introduce more generic GUC for all event triggers like
> disable_event_triggers? This GUC can be checked only by the database owner
> or super user. It can be an alternative ALTER TABLE DISABLE TRIGGER ALL. It
> can be protection against necessity to restart to single mode to repair the
> event trigger. I think so more generic solution is better than special
> disable_client_connection_trigger GUC.
> >
> > 2. I have no objection against client_connection. It is probably better
> for the mentioned purpose - possibility to block connection to database.
> Can be interesting, and I am not sure how much work it is to introduce the
> second event - session_start. This event should be started after connecting
> - so the exception there doesn't block connect, and should be started also
> after the new statement "DISCARD SESSION", that will be started
> automatically after DISCARD ALL.  This feature should not be implemented in
> first step, but it can be a plan for support pooled connections
> >
>

PGC_SU_BACKEND is too strong; it should be PGC_BACKEND if this option
can be used by the database owner.
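
As a rough illustration of the difference, a boolean GUC entry in guc.c along
these lines (the GUC name, variable, and group below are assumptions for this
sketch, not the patch's actual definition):

	{
		/*
		 * PGC_BACKEND can be set by any client in the connection startup
		 * packet (and in postgresql.conf); PGC_SU_BACKEND restricts the
		 * startup-packet case to superusers.
		 */
		{"disable_event_triggers", PGC_BACKEND, CLIENT_CONN_STATEMENT,
			gettext_noop("Disables firing of event triggers for this backend."),
			NULL
		},
		&disable_event_triggers,
		false,
		NULL, NULL, NULL
	},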

Pavel


> I've created a separate patch to address question (1), rather than
> include it in the main patch, which I've adjusted accordingly. I'll
> leave question (2) until another time, as you suggest.
> See the attached patches.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Dilip Kumar
On Wed, 9 Dec 2020 at 5:41 PM, Amit Kapila  wrote:

> On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar  wrote:
> >
> > On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila 
> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar 
> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow 
> wrote:
> > > > >
> > > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C 
> wrote:
> > > > > >
> > > > > > Most of the code present in
> > > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make
> it
> > > > > > generic by handling like the comments below, that way this
> parallel
> > > > > > safety checks can be used based on the value set in
> > > > > > max_parallel_hazard_context. There is nothing wrong with the
> changes,
> > > > > > I'm providing these comments so that this patch can be
> generalized for
> > > > > > parallel checks and the same can also be used by parallel copy.
> > > > >
> > > > > Hi Vignesh,
> > > > >
> > > > > You are absolutely right in pointing that out, the code was taking
> > > > > short-cuts knowing that for Parallel Insert,
> > > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > > re-used by other callers.
> > > > >
> > > > > I've attached a new set of patches that includes your suggested
> improvements.
> > > >
> > > > I was going through v10-0001 patch where we are parallelizing only
> the
> > > > select part.
> > > >
> > > > + /*
> > > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > > + */
> > > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > > ONCONFLICT_UPDATE)
> > > > + return PROPARALLEL_UNSAFE;
> > > >
> > > > I understand that we can now allow updates from the worker, but what
> > > > is the problem if we allow the parallel select even if there is an
> > > > update in the leader?
> > > >
> > >
> > > I think we can't allow update even in leader without having a
> > > mechanism for a shared combocid table. Right now, we share the
> > > ComboCids at the beginning of the parallel query and then never change
> > > it during the parallel query but if we allow updates in the leader
> > > backend which can generate a combocid then we need a mechanism to
> > > propagate that change. Does this make sense?
> > >
> >
> > Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> > transaction block and there is a possibility that update may try to
> > update the same tuple is previously inserted by the same transaction
> > and in that case, it will generate the combo cid.  Thanks for
> > clarifying.
> >
>
> We can probably add a comment in the patch so that it is clear why we
> are not allowing this case.


+1

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: On login trigger: take three

2020-12-09 Thread Greg Nancarrow
On Tue, Dec 8, 2020 at 3:26 PM Pavel Stehule  wrote:
>
>
> There are two maybe generic questions?
>
> 1. Maybe we can introduce more generic GUC for all event triggers like 
> disable_event_triggers? This GUC can be checked only by the database owner or 
> super user. It can be an alternative ALTER TABLE DISABLE TRIGGER ALL. It can 
> be protection against necessity to restart to single mode to repair the event 
> trigger. I think so more generic solution is better than special 
> disable_client_connection_trigger GUC.
>
> 2. I have no objection against client_connection. It is probably better for 
> the mentioned purpose - possibility to block connection to database. Can be 
> interesting, and I am not sure how much work it is to introduce the second 
> event - session_start. This event should be started after connecting - so the 
> exception there doesn't block connect, and should be started also after the 
> new statement "DISCARD SESSION", that will be started automatically after 
> DISCARD ALL.  This feature should not be implemented in first step, but it 
> can be a plan for support pooled connections
>

I've created a separate patch to address question (1), rather than
include it in the main patch, which I've adjusted accordingly. I'll
leave question (2) until another time, as you suggest.
See the attached patches.

Regards,
Greg Nancarrow
Fujitsu Australia


v1-0001-Add-new-config-parameter-disable_event_triggers.patch
Description: Binary data


v1-0002-Add-new-client_connection-event-supporting-a-logon-trigger.patch
Description: Binary data


Re: Feature improvement for pg_stat_statements

2020-12-09 Thread Fujii Masao




On 2020/12/09 20:42, Li Japin wrote:

Hi,


On Dec 9, 2020, at 6:37 PM, Seino Yuki  wrote:

2020-12-01 01:04 に Fujii Masao さんは書きました:

On 2020/11/30 23:05, Seino Yuki wrote:

2020-11-30 15:02 に Fujii Masao さんは書きました:

On 2020/11/30 12:08, Seino Yuki wrote:

2020-11-27 22:39 に Fujii Masao さんは書きました:

On 2020/11/27 21:39, Seino Yuki wrote:

2020-11-27 21:37 に Seino Yuki さんは書きました:

2020-11-16 12:28 に Seino Yuki さんは書きました:

Due to similar fixed, we have also merged the patches discussed in the
following thread.
https://commitfest.postgresql.org/30/2736/ 

The patch is posted on the 2736 side of the thread.
Regards.

I forgot the attachment and will resend it.
The following patches have been committed and we have created a
compliant patch for them.
https://commitfest.postgresql.org/30/2736/ 

Please confirm.

Thanks for updating the patch! Here are review comments from me.
+    OUT reset_exec_time TIMESTAMP WITH TIME ZONE
I prefer "stats_reset" as the name of this column for the sake of
consistency with the similar column in other stats views like
pg_stat_database.
Isn't it better to use lower cases for "TIMESTAMP WITH TIME ZONE"
because other DDLs in pg_stat_statements do that for the declaration
of data type?
@@ -565,6 +568,8 @@ pgss_shmem_startup(void)
 pgss->n_writers = 0;
 pgss->gc_count = 0;
 pgss->stats.dealloc = 0;
+    pgss->stats.reset_exec_time = 0;
+    pgss->stats.reset_exec_time_isnull = true;
The reset time should be initialized with GetCurrentTimestamp() instead
of 0? If so, I think that we can completely get rid of reset_exec_time_isnull.
+    memset(nulls, 0, sizeof(nulls));
If some entries in "values[]" may not be set, "values[]" also needs to
be initialized with 0.
MemSet() should be used, instead?
+    /* Read dealloc */
+    values[0] = stats.dealloc;
Int64GetDatum() should be used here?
+    reset_ts = GetCurrentTimestamp();
GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.
Regards,

Thanks for the review!
Fixed the patch.

Thanks for updating the patch! Here are another review comments.
+   reset_exec_time timestamp
with time zone
You forgot to update the column name in the doc?
+    Shows the time at which
pg_stat_statements_reset(0,0,0) was last called.
What about updating this to something like "Time at which all statistics
in the pg_stat_statements view were last reset." for the sake of
consistency with the description about the stats_reset column in other
stats views?
+    /* Read stats_reset */
+    values[1] = stats.stats_reset;
TimestampTzGetDatum() seems necessary.
+    reset_ts = GetCurrentTimestamp();
 /* Reset global statistics for pg_stat_statements */
Isn't it better to call GetCurrentTimestamp() before taking
an exclusive lwlock, in entry_reset()?
Regards,

Thanks for the new comment.
I got the following pointers earlier.

+    reset_ts = GetCurrentTimestamp();
GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.

Which do you think is better? I think the new pointing out is better,
because entry_reset is not likely to be called often.

I was thinking that GetCurrentTimestamp() should be called before
pgss->lock lwlock is taken, only when all three arguments userid, dbid
and queryid are zero. But on second thought, we should call
GetCurrentTimestamp() and reset the stats, after the following codes?
/* All entries are removed? */
if (num_entries != num_remove)
goto release_lock;
That is, IMO that even when pg_stat_statements_reset() with non-zero
arguments is executed, if it removes all the entries, we should reset
the stats. Thought?
Regards,


+1.Fixed the patch.
We tested various reset patterns and they seemed to be fine.



Since BlessTupleDesc() is for SRFs according to the comments, I think, we can
remove it here.  Correct?

+       if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+               elog(ERROR, "return type must be a row type");
+
+       tupdesc = BlessTupleDesc(tupdesc);


Here are other comments from me:

The patch seems to contain whitespace errors.
You can see them by "git diff --check".

+  
+Time at which all statistics in the pg_stat_statements view were last 
reset.
+  

"pg_stat_statements" in the above should be enclosed with
 and .

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Amit Kapila
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar  wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  
> > > wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested 
> > > > improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we can now allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> transaction block and there is a possibility that update may try to
> update the same tuple is previously inserted by the same transaction
> and in that case, it will generate the combo cid.  Thanks for
> clarifying.
>

We can probably add a comment in the patch so that it is clear why we
are not allowing this case.
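
One possible wording for such a comment, placed above the check quoted
earlier in the thread (a sketch only, not actual patch text):

	/*
	 * UPDATE is not currently supported in parallel mode.  Even though the
	 * DO UPDATE part would run in the leader, it may update a tuple that
	 * was inserted earlier in the same transaction and thereby generate a
	 * combo CID.  Combo CIDs are shared with workers only once at the start
	 * of the parallel query and there is no mechanism to propagate newly
	 * generated ones, so this case has to remain parallel unsafe for now.
	 */
	if (parse->onConflict != NULL &&
		parse->onConflict->action == ONCONFLICT_UPDATE)
		return PROPARALLEL_UNSAFE;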

-- 
With Regards,
Amit Kapila.




Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Amit Kapila
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>

In v10-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO,
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2049,10 +2049,6 @@ heap_prepare_insert(Relation relation,
HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));

I think I gave a comment a while back that here we can have an
Assert to check if it is a parallel worker and the relation has a
foreign key, and probably other conditions if possible. It is better to
protect against such cases happening due to any bugs. Is there a reason
you have not handled it?
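
For illustration, the kind of defensive check meant here could look roughly
like the following (the helper that detects inserts needing a new CommandId
is hypothetical and named only for this sketch):

	/*
	 * Inserts that generate a new CommandId (e.g. into a table with a
	 * foreign key) must not run in a parallel worker, so keep a defensive
	 * check for assert-enabled builds in place of the removed ereport().
	 */
	Assert(!(IsParallelWorker() &&
			 InsertGeneratesNewCommandId(relation)));	/* hypothetical helper */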

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Feature improvement for pg_stat_statements

2020-12-09 Thread Li Japin
Hi,

On Dec 9, 2020, at 6:37 PM, Seino Yuki  wrote:

2020-12-01 01:04 に Fujii Masao さんは書きました:
On 2020/11/30 23:05, Seino Yuki wrote:
2020-11-30 15:02 に Fujii Masao さんは書きました:
On 2020/11/30 12:08, Seino Yuki wrote:
2020-11-27 22:39 に Fujii Masao さんは書きました:
On 2020/11/27 21:39, Seino Yuki wrote:
2020-11-27 21:37 に Seino Yuki さんは書きました:
2020-11-16 12:28 に Seino Yuki さんは書きました:
Due to similar fixed, we have also merged the patches discussed in the
following thread.
https://commitfest.postgresql.org/30/2736/
The patch is posted on the 2736 side of the thread.
Regards.
I forgot the attachment and will resend it.
The following patches have been committed and we have created a
compliant patch for them.
https://commitfest.postgresql.org/30/2736/
Please confirm.
Thanks for updating the patch! Here are review comments from me.
+OUT reset_exec_time TIMESTAMP WITH TIME ZONE
I prefer "stats_reset" as the name of this column for the sake of
consistency with the similar column in other stats views like
pg_stat_database.
Isn't it better to use lower cases for "TIMESTAMP WITH TIME ZONE"
because other DDLs in pg_stat_statements do that for the declaration
of data type?
@@ -565,6 +568,8 @@ pgss_shmem_startup(void)
 pgss->n_writers = 0;
 pgss->gc_count = 0;
 pgss->stats.dealloc = 0;
+pgss->stats.reset_exec_time = 0;
+pgss->stats.reset_exec_time_isnull = true;
The reset time should be initialized with GetCurrentTimestamp() instead
of 0? If so, I think that we can completely get rid of reset_exec_time_isnull.
+memset(nulls, 0, sizeof(nulls));
If some entries in "values[]" may not be set, "values[]" also needs to
be initialized with 0.
MemSet() should be used, instead?
+/* Read dealloc */
+values[0] = stats.dealloc;
Int64GetDatum() should be used here?
+reset_ts = GetCurrentTimestamp();
GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.
Regards,
Thanks for the review!
Fixed the patch.
Thanks for updating the patch! Here are another review comments.
+   reset_exec_time timestamp
with time zone
You forgot to update the column name in the doc?
+Shows the time at which
pg_stat_statements_reset(0,0,0) was last called.
What about updating this to something like "Time at which all statistics
in the pg_stat_statements view were last reset." for the sake of
consistency with the description about the stats_reset column in other
stats views?
+/* Read stats_reset */
+values[1] = stats.stats_reset;
TimestampTzGetDatum() seems necessary.
+reset_ts = GetCurrentTimestamp();
 /* Reset global statistics for pg_stat_statements */
Isn't it better to call GetCurrentTimestamp() before taking
an exclusive lwlock, in entry_reset()?
Regards,
Thanks for the new comment.
I got the following pointers earlier.
+reset_ts = GetCurrentTimestamp();
GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.
Which do you think is better? I think the new pointing out is better,
because entry_reset is not likely to be called often.
I was thinking that GetCurrentTimestamp() should be called before
pgss->lock lwlock is taken, only when all three arguments userid, dbid
and queryid are zero. But on second thought, we should call
GetCurrentTimestamp() and reset the stats, after the following codes?
/* All entries are removed? */
if (num_entries != num_remove)
goto release_lock;
That is, IMO that even when pg_stat_statements_reset() with non-zero
arguments is executed, if it removes all the entries, we should reset
the stats. Thought?
Regards,

+1.Fixed the patch.
We tested various reset patterns and they seemed to be fine.


Since BlessTupleDesc() is for SRFs according to the comments, I think, we can
remove it here.  Correct?

+   if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+   elog(ERROR, "return type must be a row type");
+
+   tupdesc = BlessTupleDesc(tupdesc);

--
Best regards
ChengDu WenWu Information Technology Co.,Ltd.
Japin Li



Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-12-09 Thread Fujii Masao




On 2020/12/04 20:15, Bharath Rupireddy wrote:

On Fri, Dec 4, 2020 at 1:46 PM Bharath Rupireddy
 wrote:


On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao  wrote:



Attaching the v2 patch set. Please review it further.


Regarding the 0001 patch, we should add the function that returns
the information of cached connections like dblink_get_connections(),
together with 0001 patch? Otherwise it's not easy for users to
see how many cached connections are and determine whether to
disconnect them or not. Sorry if this was already discussed before.



Thanks for bringing this up. Exactly this is what I was thinking a few
days back. Say the new function postgres_fdw_get_connections() which
can return an array of server names whose connections exist in the
cache. Without this function, the user may not know how many
connections this backend has until he checks it manually on the remote
server.

Thoughts? If okay, I can code the function in the 0001 patch.



Added a new function postgres_fdw_get_connections() into 0001 patch,


Thanks!



which returns a list of server names for which there exists an
existing open and active connection.

Attaching v3 patch set, please review it further.


I started reviewing the 0001 patch.

IMO the 0001 patch should be self-contained so that we can commit it first. 
That is, I think it's better to move the documentation and tests for the 
functions the 0001 patch adds from 0004 to 0001.

Since 0001 introduces new user-visible functions into postgres_fdw, the version 
of postgres_fdw should be increased?

Similar code to get the server name from a cached connection entry also exists in 
pgfdw_reject_incomplete_xact_state_change(). I'm tempted to make a "common" 
function for that code and use it both in postgres_fdw_get_connections() and 
pgfdw_reject_incomplete_xact_state_change(), to simplify the code.
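
A sketch of what such a common helper could look like (the function name and
the serverid field on the entry are assumptions, not the actual patch code):

static char *
pgfdw_get_server_name(ConnCacheEntry *entry)
{
	/* using a server OID cached in the entry is an assumption of this sketch */
	ForeignServer *server = GetForeignServer(entry->serverid);

	return server->servername;
}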

+   /* We only look for active and open remote connections. 
*/
+   if (entry->invalidated || !entry->conn)
+   continue;

Should we return even an invalidated entry, because it still has a cached connection?
Also, this makes me wonder if we should return both the server name and a boolean 
flag indicating whether it's invalidated or not. If so, users can easily find 
the invalidated connection entries and disconnect them, because there is no need to 
keep an invalidated connection.

+   if (all)
+   {
+   hash_destroy(ConnectionHash);
+   ConnectionHash = NULL;
+   result = true;
+   }

Could you tell me why ConnectionHash needs to be destroyed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: MultiXact\SLRU buffers configuration

2020-12-09 Thread Gilles Darold

Le 09/12/2020 à 11:51, Gilles Darold a écrit :

Also PG regression tests are failing too on several part.


Forget this - I had not run the regression tests in the right repository:

...

===
 All 189 tests passed.
===


I'm looking into why the application is failing.


--
Gilles Darold
http://www.darold.net/





Re: [Patch] Optimize dropping of relation buffers using dlist

2020-12-09 Thread Amit Kapila
On Wed, Dec 9, 2020 at 6:32 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 8 Dec 2020 16:28:41 +0530, Amit Kapila  
> wrote in
> > On Tue, Dec 8, 2020 at 12:13 PM tsunakawa.ta...@fujitsu.com
> >  wrote:
> > >
> > > From: Jamison, Kirk/ジャミソン カーク 
> > > > Because one of the rel's cached value was false, it forced the
> > > > full-scan path for TRUNCATE.
> > > > Is there a possible workaround for this?
> > >
> > > Hmm, the other two relfilenodes are for the TOAST table and index of the 
> > > target table.  I think the INSERT didn't access those TOAST relfilenodes 
> > > because the inserted data was stored in the main storage.  But TRUNCATE 
> > > always truncates all the three relfilenodes.  So, the standby had not 
> > > opened the relfilenode for the TOAST stuff or cached its size when 
> > > replaying the TRUNCATE.
> > >
> > > I'm afraid this is more common than we can ignore and accept the slow 
> > > traditional path, but I don't think of a good idea to use the cached flag.
> > >
> >
> > I also can't think of a way to use an optimized path for such cases
> > but I don't agree with your comment on if it is common enough that we
> > leave this optimization entirely for the truncate path.
>
> Mmm. At least btree doesn't need to call smgrnblocks except at
> expansion, so we cannot get to the optimized path in major cases of
> truncation involving btree (and/or maybe other indexes).
>

AFAICS, btree insert should call smgrnblocks via
btree_xlog_insert->XLogReadBufferForRedo->XLogReadBufferForRedoExtended->XLogReadBufferExtended->smgrnblocks.
Similarly, delete should also call smgrnblocks. Can you be a bit more
specific about the btree case you have in mind?

-- 
With Regards,
Amit Kapila.




Re: MultiXact\SLRU buffers configuration

2020-12-09 Thread Gilles Darold

Hi Andrey,

Thanks for the backport. I have an issue with the first patch "Use shared 
lock in GetMultiXactIdMembers for offsets and members" 
(v1106-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-o.patch): the 
applications are not working anymore when I apply it. Also, the PG 
regression tests are failing on several parts.



   test insert_conflict  ... ok
   test create_function_1    ... FAILED
   test create_type  ... FAILED
   test create_table ... FAILED
   test create_function_2    ... FAILED
   test copy ... FAILED
   test copyselect   ... ok
   test copydml  ... ok
   test create_misc  ... FAILED
   test create_operator  ... FAILED
   test create_procedure ... ok
   test create_index ... FAILED
   test index_including  ... ok
   test create_view  ... FAILED
   test create_aggregate ... ok
   test create_function_3    ... ok
   test create_cast  ... ok
   test constraints  ... FAILED
   test triggers ... FAILED
   test inherit  ...
   ^C



This is also where I left my last attempt to backport to PG11. I will try 
to fix it again, but it could take time to get it working.


Best regards,

--
Gilles Darold


Le 08/12/2020 à 18:52, Andrey Borodin a écrit :

Hi Gilles!

Many thanks for your message!


8 дек. 2020 г., в 21:05, Gilles Darold  написал(а):

I know that this report is not really helpful

Quite contrary - this benchmarks prove that controllable reproduction exists. 
I've rebased patches for PG11. Can you please benchmark them (without extending 
SLRU)?

Best regards, Andrey Borodin.



Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Dilip Kumar
On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila  wrote:
>
> On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar  wrote:
> >
> > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
> > >
> > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> > > >
> > > > Most of the code present in
> > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > applicable for parallel copy patch also. The patch in this thread
> > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > generic by handling like the comments below, that way this parallel
> > > > safety checks can be used based on the value set in
> > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > I'm providing these comments so that this patch can be generalized for
> > > > parallel checks and the same can also be used by parallel copy.
> > >
> > > Hi Vignesh,
> > >
> > > You are absolutely right in pointing that out, the code was taking
> > > short-cuts knowing that for Parallel Insert,
> > > "max_parallel_hazard_context.max_interesting" had been set to
> > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > re-used by other callers.
> > >
> > > I've attached a new set of patches that includes your suggested 
> > > improvements.
> >
> > I was going through v10-0001 patch where we are parallelizing only the
> > select part.
> >
> > + /*
> > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > + * INSERT...ON CONFLICT...DO UPDATE...
> > + */
> > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > ONCONFLICT_UPDATE)
> > + return PROPARALLEL_UNSAFE;
> >
> > I understand that we can now allow updates from the worker, but what
> > is the problem if we allow the parallel select even if there is an
> > update in the leader?
> >
>
> I think we can't allow update even in leader without having a
> mechanism for a shared combocid table. Right now, we share the
> ComboCids at the beginning of the parallel query and then never change
> it during the parallel query but if we allow updates in the leader
> backend which can generate a combocid then we need a mechanism to
> propagate that change. Does this make sense?
>

Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
transaction block and there is a possibility that the update may try to
update the same tuple that was previously inserted by the same transaction,
and in that case it will generate a combo CID.  Thanks for
clarifying.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-12-09 Thread vignesh C
On Mon, Dec 7, 2020 at 3:00 PM Hou, Zhijie  wrote:
>
> > Attached v11 patch has the fix for this, it also includes the changes to
> > rebase on top of head.
>
> Thanks for the explanation.
>
> I think there is still chances we can know the size.
>
> +* line_size will be set. Read the line_size again to be sure 
> if it is
> +* completed or partial block.
> +*/
> +   dataSize = pg_atomic_read_u32(>line_size);
> +   if (dataSize != -1)
> +   {
>
> If I am not wrong, this seems the branch that procsssing the populated block.
> I think we can check the copiedSize here, if copiedSize == 0, that means
> Datasizes is the size of the whole line and in this case we can do the 
> enlarge.
>
>

Yes, this optimization can be done; I will handle it in the next patch set.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Feature improvement for pg_stat_statements

2020-12-09 Thread Seino Yuki

2020-12-01 01:04 に Fujii Masao さんは書きました:

On 2020/11/30 23:05, Seino Yuki wrote:

2020-11-30 15:02 に Fujii Masao さんは書きました:

On 2020/11/30 12:08, Seino Yuki wrote:

2020-11-27 22:39 に Fujii Masao さんは書きました:

On 2020/11/27 21:39, Seino Yuki wrote:

2020-11-27 21:37 に Seino Yuki さんは書きました:

2020-11-16 12:28 に Seino Yuki さんは書きました:
Due to similar fixed, we have also merged the patches discussed 
in the

following thread.
https://commitfest.postgresql.org/30/2736/
The patch is posted on the 2736 side of the thread.

Regards.




I forgot the attachment and will resend it.

The following patches have been committed and we have created a
compliant patch for them.
https://commitfest.postgresql.org/30/2736/

Please confirm.


Thanks for updating the patch! Here are review comments from me.

+    OUT reset_exec_time TIMESTAMP WITH TIME ZONE

I prefer "stats_reset" as the name of this column for the sake of
consistency with the similar column in other stats views like
pg_stat_database.

Isn't it better to use lower cases for "TIMESTAMP WITH TIME ZONE"
because other DDLs in pg_stat_statements do that for the declaration
of data type?

@@ -565,6 +568,8 @@ pgss_shmem_startup(void)
 pgss->n_writers = 0;
 pgss->gc_count = 0;
 pgss->stats.dealloc = 0;
+    pgss->stats.reset_exec_time = 0;
+    pgss->stats.reset_exec_time_isnull = true;

The reset time should be initialized with GetCurrentTimestamp() 
instead
of 0? If so, I think that we can completely get rid of 
reset_exec_time_isnull.


+    memset(nulls, 0, sizeof(nulls));

If some entries in "values[]" may not be set, "values[]" also needs 
to

be initialized with 0.

MemSet() should be used, instead?

+    /* Read dealloc */
+    values[0] = stats.dealloc;

Int64GetDatum() should be used here?

+    reset_ts = GetCurrentTimestamp();

GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.

Regards,


Thanks for the review!
Fixed the patch.


Thanks for updating the patch! Here are another review comments.

+   reset_exec_time timestamp
with time zone

You forgot to update the column name in the doc?

+    Shows the time at which
pg_stat_statements_reset(0,0,0) was last called.

What about updating this to something like "Time at which all 
statistics

in the pg_stat_statements view were last reset." for the sake of
consistency with the description about the stats_reset column in other
stats views?

+    /* Read stats_reset */
+    values[1] = stats.stats_reset;

TimestampTzGetDatum() seems necessary.

+    reset_ts = GetCurrentTimestamp();
 /* Reset global statistics for pg_stat_statements */

Isn't it better to call GetCurrentTimestamp() before taking
an exclusive lwlock, in entry_reset()?

Regards,


Thanks for the new comment.

I got the following pointers earlier.


+    reset_ts = GetCurrentTimestamp();



GetCurrentTimestamp() needs to be called only when all the entries
are removed. But it's called even in other cases.


Which do you think is better? I think the new pointing out is better,
because entry_reset is not likely to be called often.


I was thinking that GetCurrentTimestamp() should be called before
pgss->lock lwlock is taken, only when all three arguments userid, dbid
and queryid are zero. But on second thought, we should call
GetCurrentTimestamp() and reset the stats, after the following codes?

/* All entries are removed? */
if (num_entries != num_remove)
goto release_lock;

That is, IMO that even when pg_stat_statements_reset() with non-zero
arguments is executed, if it removes all the entries, we should reset
the stats. Thought?

Regards,


+1.Fixed the patch.
We tested various reset patterns and they seemed to be fine.

Regards.
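
For readers following along, the agreed ordering inside entry_reset() amounts
to something like the sketch below; the spinlock around the global stats and
the surrounding code are assumptions based on the quoted fragments, and the
diff that follows is the actual patch:

	/* All entries are removed? */
	if (num_entries != num_remove)
		goto release_lock;

	/* Every entry was removed, so record the reset of the global stats too. */
	reset_ts = GetCurrentTimestamp();

	SpinLockAcquire(&pgss->mutex);
	pgss->stats.dealloc = 0;
	pgss->stats.stats_reset = reset_ts;
	SpinLockRelease(&pgss->mutex);

release_lock:
	LWLockRelease(pgss->lock);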

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
index 2019a4ffe0..3504ca7eb1 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
@@ -5,9 +5,10 @@
 
 --- Define pg_stat_statements_info
 CREATE FUNCTION pg_stat_statements_info(
-OUT dealloc bigint
+OUT dealloc bigint,
+OUT stats_reset timestamp with time zone
 )
-RETURNS bigint
+RETURNS record
 AS 'MODULE_PATHNAME'
 LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
 
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 70cfdb2c9d..0656308040 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -81,6 +81,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
@@ -199,6 +200,7 @@ typedef struct Counters
 typedef struct pgssGlobalStats
 {
 	int64		dealloc;		/* # of times entries were deallocated */
+	TimestampTz stats_reset;		/* timestamp with all stats 

Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Amit Kapila
On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar  wrote:
>
> On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
> >
> > On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> > >
> > > Most of the code present in
> > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > applicable for parallel copy patch also. The patch in this thread
> > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > generic by handling like the comments below, that way this parallel
> > > safety checks can be used based on the value set in
> > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > I'm providing these comments so that this patch can be generalized for
> > > parallel checks and the same can also be used by parallel copy.
> >
> > Hi Vignesh,
> >
> > You are absolutely right in pointing that out, the code was taking
> > short-cuts knowing that for Parallel Insert,
> > "max_parallel_hazard_context.max_interesting" had been set to
> > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > re-used by other callers.
> >
> > I've attached a new set of patches that includes your suggested 
> > improvements.
>
> I was going through v10-0001 patch where we are parallelizing only the
> select part.
>
> + /*
> + * UPDATE is not currently supported in parallel-mode, so prohibit
> + * INSERT...ON CONFLICT...DO UPDATE...
> + */
> + if (parse->onConflict != NULL && parse->onConflict->action ==
> ONCONFLICT_UPDATE)
> + return PROPARALLEL_UNSAFE;
>
> I understand that we can now allow updates from the worker, but what
> is the problem if we allow the parallel select even if there is an
> update in the leader?
>

I think we can't allow an update even in the leader without having a
mechanism for a shared combo CID table. Right now, we share the
ComboCids at the beginning of the parallel query and then never change
them during the parallel query, but if we allow updates in the leader
backend, which can generate a combo CID, then we need a mechanism to
propagate that change. Does this make sense?

-- 
With Regards,
Amit Kapila.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-12-09 Thread Amit Kapila
On Wed, Dec 9, 2020 at 2:56 PM Noah Misch  wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find 
> src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx--2 nm   usr 256 Dec 08 03:20 .
> drwx--4 nm   usr 256 Dec 08 03:20 ..
> -rw---1 nm   usr  207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx--2 nm   usr 256 Dec 08 03:20 .
> drwx--4 nm   usr 256 Dec 08 03:20 ..
> -rw---1 nm   usr   0 Dec 08 03:20 16393-511.changes.0
>
> > > I have executed "make check" in the loop with only this file.  I have
> > > repeated it 5000 times but no failure, I am wondering shall we try to
> > > execute in the same machine in a loop where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp().  For
> this to matter, xidhash must contain more than one element.  Existing tests
> rarely exercise the multi-element scenario.  Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash.  (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.)  This would be much harder to reproduce on a
> little-endian system, where strcmp(, _plus_one)!=0.  On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>

Your analysis is correct.

> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system.  I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them.  (They'd need more comments and timeout handling.)
>

I think it is better to keep this test which can always test multiple
streams on the subscriber.

Thanks for working on this.

-- 
With Regards,
Amit Kapila.




Re: Yet another fast GiST build

2020-12-09 Thread Andrey Borodin



> 7 дек. 2020 г., в 23:56, Peter Geoghegan  написал(а):
> 
> On Mon, Dec 7, 2020 at 2:05 AM Andrey Borodin  wrote:
>> Here's version with tests and docs. I still have no idea how to print some 
>> useful information about tuples keys.
> 
> I suggest calling BuildIndexValueDescription() from your own custom
> debug instrumentation code.
Thanks for the hint, Peter!
This function does exactly what I want. But I have no Relation inside the 
gist_page_items(bytea) function... probably I'll add a gist_page_items(relname, 
blockno) overload to fetch keys.

Thanks!

Best regards, Andrey Borodin.



Re: Change default of checkpoint_completion_target

2020-12-09 Thread Laurenz Albe
On Tue, 2020-12-08 at 17:29 +, Bossart, Nathan wrote:
> +1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

While we are at it, could we change the default of "log_lock_waits" to "on"?
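
For reference, the proposed settings would read like this in postgresql.conf
(proposed values, not the current defaults):

checkpoint_completion_target = 0.9
log_lock_waits = on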

Yours,
Laurenz Albe





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-12-09 Thread Noah Misch
On Wed, Dec 02, 2020 at 01:50:25PM +0530, Amit Kapila wrote:
> On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar  wrote:
> > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila  
> > > wrote:
> > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch  wrote:
> > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > > > > see the buildfarm reports
> > > > >
> > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14
> > > > > failed the new 015_stream.pl test with the subscriber looping like 
> > > > > this:
> > > > >
> > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG:  logical replication 
> > > > > apply worker for subscription "tap_sub" has started
> > > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR:  could not open 
> > > > > temporary file "16393-510.changes.0" from BufFile 
> > > > > "16393-510.changes": No such file or directory
> > > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG:  logical replication 
> > > > > apply worker for subscription "tap_sub" has started
> > > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG:  background worker 
> > > > > "logical replication worker" (PID 13959252) exited with exit code 1
> > > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR:  could not open 
> > > > > temporary file "16393-510.changes.0" from BufFile 
> > > > > "16393-510.changes": No such file or directory
> > > > > ...

> > > > The above kind of error can happen due to the following reasons: (a)
> > > > the first time we sent the stream and created the file and that got
> > > > removed before the second stream reached the subscriber. (b) from the
> > > > publisher-side, we never sent the indication that it is the first
> > > > stream and the subscriber directly tries to open the file thinking it
> > > > is already there.

Further testing showed it was a file location problem, not a deletion problem.
The worker tried to open
base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
were the files actually existing:

[nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find 
src/test/subscription/tmp_check -name '*sharedfileset*')
src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
total 408
drwx--2 nm   usr 256 Dec 08 03:20 .
drwx--4 nm   usr 256 Dec 08 03:20 ..
-rw---1 nm   usr  207806 Dec 08 03:20 16393-510.changes.0

src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
total 0
drwx--2 nm   usr 256 Dec 08 03:20 .
drwx--4 nm   usr 256 Dec 08 03:20 ..
-rw---1 nm   usr   0 Dec 08 03:20 16393-511.changes.0

> > I have executed "make check" in the loop with only this file.  I have
> > repeated it 5000 times but no failure, I am wondering shall we try to
> > execute in the same machine in a loop where it failed once?
> 
> Yes, that might help. Noah, would it be possible for you to try that

The problem is xidhash using strcmp() to compare keys; it needs memcmp().  For
this to matter, xidhash must contain more than one element.  Existing tests
rarely exercise the multi-element scenario.  Under heavy load, on this system,
the test publisher can have two active transactions at once, in which case it
does exercise multi-element xidhash.  (The publisher is sensitive to timing,
but the subscriber is not; once WAL contains interleaved records of two XIDs,
the subscriber fails every time.)  This would be much harder to reproduce on a
little-endian system, where strcmp(, _plus_one)!=0.  On big-endian,
every small XID has zero in the first octet; they all look like empty strings.
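
To make the failure mode concrete, here is a small standalone sketch (not
PostgreSQL code) of how the raw key bytes compare under the two schemes:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int
main(void)
{
	uint32_t	a = 510;		/* bytes 00 00 01 FE on big-endian */
	uint32_t	b = 511;		/* bytes 00 00 01 FF on big-endian */

	/*
	 * A string-keyed hash compares keys like this: on big-endian both keys
	 * start with a zero byte, so they compare equal as "empty strings".
	 */
	printf("strcmp: %d\n", strcmp((const char *) &a, (const char *) &b));

	/* HASH_BLOBS compares the full key length and always tells them apart. */
	printf("memcmp: %d\n", memcmp(&a, &b, sizeof(a)));

	return 0;
}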

The attached patch has the one-line fix and some test suite changes that make
this reproduce frequently on any big-endian system.  I'm currently planning to
drop the test suite changes from the commit, but I could keep them if folks
like them.  (They'd need more comments and timeout handling.)
Author: Noah Misch 
Commit: Noah Misch 

Use HASH_BLOBS for xidhash.

This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was
possible.  Conditions for reaching those symptoms were more frequent on
big-endian systems.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20201129214441.ga691...@rfd.leadboat.com

diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index c37aafe..fce1dee 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -804,7 +804,7 @@ apply_handle_stream_start(StringInfo s)
hash_ctl.entrysize = sizeof(StreamXidHash);
hash_ctl.hcxt = ApplyContext;
xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
-  

Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread Dilip Kumar
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.

I was going through v10-0001 patch where we are parallelizing only the
select part.

+ /*
+ * UPDATE is not currently supported in parallel-mode, so prohibit
+ * INSERT...ON CONFLICT...DO UPDATE...
+ */
+ if (parse->onConflict != NULL && parse->onConflict->action ==
ONCONFLICT_UPDATE)
+ return PROPARALLEL_UNSAFE;

I understand that we cannot allow updates from the worker, but what
is the problem if we allow the parallel select even if there is an
update in the leader?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Global temporary tables

2020-12-09 Thread Pavel Stehule
st 9. 12. 2020 v 7:34 odesílatel 曾文旌  napsal:

>
>
> 2020年12月8日 13:28,Pavel Stehule  写道:
>
> Hi
>
> čt 26. 11. 2020 v 12:46 odesílatel 曾文旌 
> napsal:
>
>> This is the latest patch for feature GTT(v38).
>> Everybody, if you have any questions, please let me know.
>>
>
> please, co you send a rebased patch. It is broken again
>
>
> Sure
>
> This patch is base on 16c302f51235eaec05a1f85a11c1df04ef3a6785
> Simplify code for getting a unicode codepoint's canonical class.
>
>
Thank you

Pavel

>
> Wenjing
>
>
> Regards
>
> Pavel
>
>
>>
>> Wenjing
>>
>>
>>
>


Re: Disallow cancellation of waiting for synchronous replication

2020-12-09 Thread Andrey Borodin


> 9 июня 2020 г., в 23:32, Jeff Davis  написал(а):
> 
> 

After using the patch for a while it became obvious that PANICing during 
termination is not a good idea, even when we wait for synchronous replication: 
it generates undesired core dumps.
I think that in the presence of SIGTERM it's reasonable to say that we cannot 
protect the user anymore.

PFA v3.

Best regards, Andrey Borodin.


v3-0001-Disallow-cancelation-of-syncronous-commit.patch
Description: Binary data


A failure of standby to follow timeline switch

2020-12-09 Thread Kyotaro Horiguchi
Hello.

We found a behavioral change (which seems to be a bug) in recovery at
PG13.

The following steps might seem somewhat strange, but the replication
code deliberately copes with this case.  This is a sequence seen while
operating an HA cluster using Pacemaker.

- Run initdb to create a primary.
- Set archive_mode=on on the primary.
- Start the primary.

- Create a standby using pg_basebackup from the primary.
- Stop the standby.
- Stop the primary.

- Put standby.signal in the primary's data directory, then start it.
- Promote the primary.

- Start the standby.


Up to PG12, the primary signals end-of-timeline to the standby and
switches to the next timeline.  Since PG13, that doesn't happen and
the standby continues to request the segment of the older
timeline, which no longer exists.

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 
00010003 has already been removed

This is because WalSndSegmentOpen() can fail to detect a timeline switch
on a historic timeline, due to the use of the wrong variable for that
check. It is using state->seg.ws_segno, but that seems to be a thinko from
when the surrounding code was refactored in 709d003fbd.

The first patch detects the wrong behavior.  The second small patch
fixes it.

In the first patch, the test added to 001_stream_rep.pl involves two
functions related to server-log investigation copied from
019_repslot_limit.pl.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f7fbb1574a412100a6732dbe17b0d7f66a497298 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 9 Dec 2020 17:21:49 +0900
Subject: [PATCH 1/2] Add a new test to detect a replication bug

---
 src/test/recovery/t/001_stream_rep.pl | 64 ++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9e31a53de7..9dd1a70c90 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 37;
 
 # Initialize primary node
 my $node_primary = get_new_node('primary');
@@ -409,3 +409,65 @@ ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0,
 my $primary_data = $node_primary->data_dir;
 ok(!-f "$primary_data/pg_wal/$segment_removed",
 	"WAL segment $segment_removed recycled after physical slot advancing");
+
+#
+# Check if timeline-increment works while reading a historic timeline.
+my $node_primary_2 = get_new_node('primary_2');
+$node_primary_2->init(allows_streaming => 1);
+$node_primary_2->enable_archiving; # needed to make .paritial segment
+$node_primary_2->start;
+$node_primary_2->backup($backup_name);
+my $node_standby_3 = get_new_node('standby_3');
+$node_standby_3->init_from_backup($node_primary_2, $backup_name,
+  has_streaming => 1);
+$node_primary_2->stop;
+$node_primary_2->set_standby_mode; # increment primary timeline
+$node_primary_2->start;
+$node_primary_2->promote;
+my $logstart = get_log_size($node_standby_3);
+$node_standby_3->start;
+
+my $success = 0;
+for (my $i = 0 ; $i < 1; $i++)
+{
+	if (find_in_log(
+			$node_standby_3,
+			"requested WAL segment [0-9A-F]+ has already been removed",
+			$logstart))
+	{
+		last;
+	}
+	elsif (find_in_log(
+			$node_standby_3,
+			"End of WAL reached on timeline",
+			   $logstart))
+	{
+		$success = 1;
+		last;
+	}
+	usleep(100_000);
+}
+
+ok($success, 'Timeline increment while reading a historic timeline');
+
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+	my ($node) = @_;
+
+	return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+	my ($node, $pat, $off) = @_;
+
+	$off = 0 unless defined $off;
+	my $log = TestLib::slurp_file($node->logfile);
+	return 0 if (length($log) <= $off);
+
+	$log = substr($log, $off);
+
+	return $log =~ m/$pat/;
+}
-- 
2.27.0

>From f3862b02b928c4f8aa6380bbccbac8e3c4bf800e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 9 Dec 2020 17:22:41 +0900
Subject: [PATCH 2/2] Fix a bug of timeline-tracking while sending a historic
 timeline

Since PG13, we should track a timeline switch while sending a historic
timeline running physical replication, but WalSndSegmentOpen fails to
do that. It is a thinko of 709d003fbd.
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2eb19ad293..0a3f2f1359 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2497,7 +2497,7 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 		XLogSegNo	endSegNo;
 
 		XLByteToSeg(sendTimeLineValidUpto, endSegNo, state->segcxt.ws_segsize);
-		if (state->seg.ws_segno == endSegNo)
+		if (nextSegNo == endSegNo)
 			*tli_p = sendTimeLineNextTLI;
 	}
 
-- 
2.27.0



Re: pg_stat_statements and "IN" conditions

2020-12-09 Thread Chengxi Sun
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hi, I did some tests and it works well.