Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-10 Thread Pavel Stehule
On Wed, 11 Mar 2020 at 06:43, Julien Rouhaud  wrote:

> On Wed, 11 Mar 2020 at 05:28, Laurenz Albe  wrote:
>
>> On Wed, 2020-03-11 at 11:25 +0800, Craig Ringer wrote:
>> > Short version: Currently if the server is built with --with-llvm the
>> > -devel packages must depend on clang for PGXS to work, even though
>> > llvm is otherwise optional. This is a particular problem on older
>> > platforms where 3rd party LLVM may be required to build the server's
>> > llvmjit support. Work around by skipping the default .bc generation if
>> > no clang is found by PGXS, as if $(with_llvm) was false.
>>
>> +1
>>
>> I have struggled with this, as have several users trying to build
>> oracle_fdw.
>>
>
> +1, I had a similar experience with other extensions.
>

+1

Pavel


Re: Do we need to handle orphaned prepared transactions in the server?

2020-03-10 Thread Masahiko Sawada
On Mon, 2 Mar 2020 at 21:42, Hamid Akhtar  wrote:
>
> Here is the v2 of the same patch after rebasing it and running it through 
> pgindent. There are no other code changes.
>

Thank you for working on this. I think what this patch tries to
achieve would be helpful: informing users of orphaned prepared
transactions that can be a cause of inefficient vacuum.

As far as I read the patch, configuring this feature with the newly
added parameters seems complicated to me. IIUC, even if a prepared
transaction is older than max_age_prepared_xacts, we don't warn
unless prepared_xacts_vacuum_warn_timeout has elapsed since the
"first" prepared transaction was created, where the first prepared
transaction means the first entry of
TwoPhaseStateData->prepXacts. Therefore, if there is always more than
one prepared transaction, we don't warn for at least
prepared_xacts_vacuum_warn_timeout seconds even if the first added
prepared transaction has already been removed. So I'm not sure how
users should choose a value for prepared_xacts_vacuum_warn_timeout.

Regarding the warning message, I wonder if the current message is too
detailed. If we want to inform users that there are orphaned prepared
transactions, it seems to me that it's enough to report their
existence (and possibly the number of orphaned prepared transactions)
rather than individual details.

Given the above, we could make this feature simpler: have only
max_age_prepared_xacts, let autovacuum check the minimum prepared_at
among the prepared transactions, and compare it to
max_age_prepared_xacts. We can warn if (CurrentTimestamp -
min(prepared_at)) > max_age_prepared_xacts. In addition, if we also
want to control this behavior by the age of xids, we can have another
GUC parameter for comparing age(min(xid of prepared transactions))
to that value.
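
To illustrate, the check could be as simple as something like this (a
rough sketch only; the GUC, the helper fetching min(prepared_at), and
the wiring into autovacuum are all assumptions, not actual code):

/*
 * Sketch: warn when the oldest prepared transaction is older than
 * max_age_prepared_xacts (milliseconds).
 * GetOldestPreparedXactTimestamp() is a hypothetical helper returning
 * min(prepared_at) across TwoPhaseStateData->prepXacts, or 0 if there
 * are no prepared transactions.
 */
static void
WarnOrphanedPreparedXacts(void)
{
	TimestampTz oldest = GetOldestPreparedXactTimestamp();

	if (oldest != 0 &&
		TimestampDifferenceExceeds(oldest, GetCurrentTimestamp(),
								   max_age_prepared_xacts))
		ereport(WARNING,
				(errmsg("prepared transactions older than %d ms exist and may be blocking vacuum",
						max_age_prepared_xacts)));
}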

Finally, regarding the names of the parameters: when we mention the
age of a transaction, it means the age of its xid, not a time. Please
refer to other GUC parameters having "age" in their names, such as
autovacuum_freeze_max_age and vacuum_freeze_min_age. The patch adds
max_age_prepared_xacts, but I think it should be renamed. For example,
prepared_xact_warn_min_duration for time and
prepared_xact_warn_max_age for age.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-10 Thread Julien Rouhaud
On Wed, 11 Mar 2020 at 05:28, Laurenz Albe  wrote:

> On Wed, 2020-03-11 at 11:25 +0800, Craig Ringer wrote:
> > Short version: Currently if the server is built with --with-llvm the
> > -devel packages must depend on clang for PGXS to work, even though
> > llvm is otherwise optional. This is a particular problem on older
> > platforms where 3rd party LLVM may be required to build the server's
> > llvmjit support. Work around by skipping the default .bc generation if
> > no clang is found by PGXS, as if $(with_llvm) was false.
>
> +1
>
> I have struggled with this, as have several users trying to build
> oracle_fdw.
>

+1, I had a similar experience with other extensions.

>


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Fujii Masao



On 2020/03/11 3:39, Magnus Hagander wrote:

On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  wrote:




On 2020/03/10 22:43, Amit Langote wrote:

On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  wrote:

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.


Patch attached.


Like the idea and the patch looks mostly good.


Thanks for reviewing the patch!


+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.

"always" seems unnecessary.


Fixed.


+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.

Here too.


Fixed.

Attached is the updated version of the patch.


Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?

Also, should it really  be the server version that decides how this
feature behaves, and not the pg_basebackup version? Given that the
implementation is entirely in the client, it seems that's more
logical?


Yeah, you're right. I changed the patch that way.
Attached is the updated version of the patch.
 

and a few docs nitpicks:

 
  Whether this is enabled or not, the
  pg_stat_progress_basebackup view
-report the progress of the backup in the server side. But note
-that the total amount of data that will be streamed is estimated
-and reported only when this option is enabled. In other words,
-backup_total column in the view always
-indicates 0 if this option is disabled.
+report the progress of the backup in the server side.
+   
+   
+This option is not allowed when using
+--no-estimate-size.
 

I think you should just remove that whole paragraph. The details are
now listed under the disable parameter.


Fixed.


+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view indicates 0 if this option is enabled.

I suggest just "This option prevents the server from estimating the
total amount of backup data that will be streamed, resulting in the
backup_total column in pg_stat_progress_basebackup to be (zero or NULL
depending on above)".

(Markup needed on that of course ,but you get the idea)


Yes, fixed.


+When this is disabled, the backup will start by enumerating

I'd try to avoid the double negation, with something "without this
parameter, the backup will start..."


Fixed. I used "Without using this option ...".
 

+  
+   pg_basebackup asks the server to estimate
+   the total amount of data that will be streamed by default (unless
+   --no-estimate-size is specified) in version 13 or later,
+   and does that only when --progress is specified in
+   the older versions.
+  

That's an item for the release notes, not for the reference page, I
think. It's already explained under the --disable parameter, so I
suggest removing this paragraph as well.


Fixed.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..2c9f51daf4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4388,10 +4388,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  backup_total
  bigint
  
-  Total amount of data that will be streamed. If progress reporting
-  is not enabled in pg_basebackup
-  (i.e., --progress option is not specified),
-  this is 0. Otherwise, this is estimated and
+  Total amount of data that will be streamed. This is estimated and
   reported as of the beginning of
   streaming database files phase. Note that
   this is only an approximation since the database
@@ -4399,7 +4396,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   and WAL log may be included in the backup later. This is always
   the same value as backup_streamed
   once the amount of data streamed exceeds the estimated
-  total size.
+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is 0.
  
 
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 29bf2f9b97..90638aad0e 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -460,21 +460,6 @@ PostgreSQL documentation
 in this case the estimated target size will increase once it passes the
 total estimate without WAL.

-   
-When this is enabled, 

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Fujii Masao




On 2020/03/11 13:44, Amit Langote wrote:

On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander  wrote:

On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  wrote:

Attached is the updated version of the patch.


Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?


NULL sounds better than 0.


Ok, I will make the patch that changes the view so that NULL is
returned instead of 0. I'm thinking of committing that patch
after applying the pg_basebackup client-side change first.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Amit Langote
On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander  wrote:
> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  
> wrote:
> > Attached is the updated version of the patch.
>
> Would it perhaps be better to return NULL instead of 0 in the
> statistics view if there is no data?

NULL sounds better than 0.

Thank you,
Amit




Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-10 Thread Laurenz Albe
On Wed, 2020-03-11 at 11:25 +0800, Craig Ringer wrote:
> Short version: Currently if the server is built with --with-llvm the
> -devel packages must depend on clang for PGXS to work, even though
> llvm is otherwise optional. This is a particular problem on older
> platforms where 3rd party LLVM may be required to build the server's
> llvmjit support. Work around by skipping the default .bc generation if
> no clang is found by PGXS, as if $(with_llvm) was false.

+1

I have struggled with this, as have several users trying to build oracle_fdw.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Laurenz Albe
On Wed, 2020-03-11 at 12:00 +0900, Masahiko Sawada wrote:
> > > I have one question about this patch from architectural perspective:
> > > have you considered to use autovacuum_vacuum_threshold and
> > > autovacuum_vacuum_scale_factor also for this purpose?
> >
> > I am torn.
> > 
> > On the one hand it would be wonderful not to have to add yet more GUCs
> > to the already complicated autovacuum configuration.  It already confuses
> > too many users.
> > 
> > On the other hand that will lead to unnecessary vacuums for small
> > tables.
> > Worse, the progression caused by the comparatively large scale
> > factor may make it vacuum large tables too seldom.
> 
> I might be missing your point, but could you elaborate on the kind of
> case where you think this leads to unnecessary vacuums?

If you have an insert-only table that has 10 entries, it will get
vacuumed roughly every 2 new entries (0.2 * 10 = 2 with the default
scale factor, ignoring the base threshold).  The impact is probably
too small to matter, but it will increase the contention for the
three autovacuum workers available by default.

Yours,
Laurenz Albe





Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread Andy Fan
On Wed, Mar 11, 2020 at 6:49 AM David Rowley  wrote:

> On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
>  wrote:
> >
> > On Tue, Mar 10, 2020 at 1:49 PM Andy Fan 
> wrote:
> > > In my current implementation, it calculates the uniqueness for each
> > > > BaseRel only, but in your way, it looks like we need to calculate the
> > > > UniquePathKey for both BaseRel and JoinRel.   This makes more of a
> > > > difference for multi-table joins.
> >
> > I didn't understand this concern. I think, it would be better to do it
> > for all kinds of relation types base, join etc. This way we are sure
> > that one method works across the planner to eliminate the need for
> > Distinct or grouping. If we just implement something for base
> > relations right now and don't do that for joins, there is a chance
> > that that method may not work for joins when we come to implement it.
>
> Yeah, it seems to me that we're seeing more and more features that
> require knowledge of uniqueness of a RelOptInfo. The skip scans patch
> needs to know if a join will cause row duplication so it knows if the
> skip scan path can be joined to without messing up the uniqueness of
> the skip scan.  Adding more and more places that loop over the rel's
> indexlist just does not seem the right way to do it, especially so
> when you have to dissect the join rel down to its base rel components
> to check which indexes there are. Having the knowledge on-hand at the
> RelOptInfo level means we no longer have to look at indexes for unique
> proofs.
>
> > > Another concern is UniquePathKey
> > > is designed for a general purpose,  we need to maintain it no matter
> > > distinctClause/groupbyClause.
> >
> > This should be ok. The time spent in annotating a RelOptInfo about
> > uniqueness is not going to be a lot. But doing so would help generic
> > elimination of Distinct/Group/Unique operations. What is
> > UniquePathKey; I didn't find this in your patch or in the code.
>
> Possibly a misinterpretation. There is some overlap between this patch
> and the skip scan patch, both would like to skip doing explicit work
> to implement DISTINCT. Skip scans just go about it by adding new path
> types that scan the index and only gathers up unique values.  In that
> case, the RelOptInfo won't contain the unique keys, but the skip scan
> path will. How I imagine both of these patches working together in
> create_distinct_paths() is that we first check if the DISTINCT clause
> is a subset of a set of the RelOptInfo's unique keys (this patch),
> else we check if there are any paths with uniquekeys that we can use
> to perform a no-op on the DISTINCT clause (the skip scan patch), if
> none of those apply, we create the required paths to uniquify the
> results.
>

Now I am convinced that we should maintain UniqueKeys on RelOptInfo.
I will look into how to make that work with the "Index Skip Scan" patch.


Re: Improve search for missing parent downlinks in amcheck

2020-03-10 Thread Peter Geoghegan
On Tue, Mar 10, 2020 at 8:30 AM Alexander Korotkov
 wrote:
> Yes, current example looks confusing in this aspect.  But your comment
> spotted to me an algorithmic issue.  We don't match highkey of
> leftmost child against parent pivot key.  But we can.  The "if
> (!BlockNumberIsValid(blkno))" branch survived from the patch version
> when we didn't match high keys.  I've revised it.  Now we enter the
> loop even for leftmost page on child level and match high key for that
> page.

Great. That looks better.

> > BTW, a P_LEFTMOST() assertion at the beginning of
> > bt_child_highkey_check() would make this easier to follow.
>
> Yes, but why should it be an assert?  We can imagine corruption where
> there is a left sibling of the first child of the leftmost target.

I agree. I would only make it an assertion when it concerns an
implementation detail of amcheck, but that doesn't apply here.

> Thank you.  I'd like to have another feedback from you assuming there
> are logic changes.

This looks committable. I only noticed one thing: The comments above
bt_target_page_check() need to be updated to reflect the new check,
which no longer has anything to do with "heapallindexed = true".

-- 
Peter Geoghegan




Re: Add an optional timeout clause to isolationtester step.

2020-03-10 Thread Michael Paquier
On Tue, Mar 10, 2020 at 02:53:36PM +0100, Julien Rouhaud wrote:
> So basically we could just change pg_isolation_test_session_is_blocked() to
> also return the wait_event_type and wait_event, and adding something like

Hmm.  I think that Tom has in mind the reasons behind 511540d here.

> step "" { SQL } [ cancel on "" "" ]
> 
> to the step definition should be enough.  I'm attaching a POC patch for that.
> On my laptop, the full test now complete in about 400ms.

Not much of a fan of that, given the lack of flexibility, but we have a
single function to avoid a huge performance impact when using
CLOBBER_CACHE_ALWAYS, so we cannot really use a SQL-based logic
either...

> FTR the REINDEX TABLE CONCURRENTLY case is eventually locked on a virtualxid,
> I'm not sure if that could lead to too-early cancellation.

WaitForLockersMultiple() is called three times in this case, but your
test case is waiting on a lock to be released for the old index which
REINDEX CONCURRENTLY would like to drop at the beginning of step 5, so
this should work reliably here.

> + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "wait_even",
> +TEXTOID, -1, 0);
Guess who is missing a 't' here.

pg_isolation_test_session_is_blocked() is not documented and it is
only used internally in the isolation test suite, so breaking its
compatibility should be fine in practice.  Now you are actually
changing it so that we get a more complex state of the blocked
session, so I think that we should use a different function name, and
a different function.  Like pg_isolation_test_session_state?
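
To illustrate what I mean, a skeleton could look like this (only a
sketch; the returned wait-state values are made-up examples and the
actual blocking test is elided):

Datum
pg_isolation_test_session_state(PG_FUNCTION_ARGS)
{
	TupleDesc	tupdesc;
	Datum		values[3];
	bool		nulls[3] = {false, false, false};
	bool		blocked;

	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	/* reuse the existing blocked-on-interesting-pids logic here */
	blocked = false;

	values[0] = BoolGetDatum(blocked);
	values[1] = CStringGetTextDatum("Lock");		/* wait_event_type */
	values[2] = CStringGetTextDatum("relation");	/* wait_event */

	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc,
													  values, nulls)));
}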
--
Michael


signature.asc
Description: PGP signature


Re: Index Skip Scan

2020-03-10 Thread Andy Fan
>
>
> I think the UniqueKeys may need to be changed from using
> EquivalenceClasses to use Exprs instead.
>

While trying to understand why UniqueKeys needs EquivalenceClasses, I
saw your comments here.  I feel that a FuncExpr can't be used as a
unique proof even if we can create a unique index on f(a) and
f->strict == true.  The reason is that even if we know a is not null,
it is still possible that f(a) is null, and a unique index allows more
than one null value.  So shall we go further and use varattno instead
of Expr?  If so, we could also use a list of Bitmapsets to represent
multiple unique paths of a single RelOptInfo.
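
To illustrate the rule I have in mind (just a sketch; the function and
its inputs are hypothetical, not from the patch):

/*
 * A unique index admits any number of NULL entries, so an expression
 * column such as f(a) cannot prove distinctness: even for a non-NULL
 * 'a', a strict f may still return NULL.  Only a plain NOT NULL
 * column is a safe unique proof.
 */
static bool
index_col_proves_uniqueness(Node *indexcol, Bitmapset *notnull_attrs)
{
	if (IsA(indexcol, Var))
		return bms_is_member(((Var *) indexcol)->varattno, notnull_attrs);

	/* expression column: cannot rule out NULLs */
	return false;
}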


Re: Proposal: PqSendBuffer removal

2020-03-10 Thread Craig Ringer
On Sat, 7 Mar 2020 at 02:45, Andres Freund  wrote:
>
> Hi,
>
> On March 5, 2020 1:23:21 PM PST, Aleksei Ivanov  wrote:
> >Thank you for your reply!
> >
> >Yes, you are right there will be a separate call to send the data, but
> >is
> >copying data each time more costly operation than just one syscall?
>
> Yes, it's very likely to be more expensive to execute a syscall in a lot of 
> cases. They've gotten a lot more expensive with all the security issues.

Good to know.

> >Besides, if we already have a ready message packet to be sent why
> >should we
> >wait?
>
> In a lot of cases we'll send a number of small messages after each other. We 
> don't want to send those out separately, that'd just increase overhead.

Right. We presently set `TCP_NODELAY` to disable Nagle's algorithm, so
we'd tend to generate these messages as separate packets as well as
separate syscalls, making it doubly undesirable.

We can dynamically twiddle TCP_NODELAY like many other applications do
to optimise the wire packets, but not the syscalls. It'd actually cost
extra syscalls.

> But in some paths/workloads the copy is quite noticable. I've mused before 
> whether we could extend StringInfo to handle cases like this. E.g. by having 
> StringInfo have two lengths. One that is the offset to the start of the 
> allocated memory (0 for plain StringInfos), and one for the length of the 
> string being built.
>
> Then we could get a StringInfo pointing directly to the current insertion 
> point in the send buffer.  To support growing it, enlargeStringInfo would 
> first subtract the offset to the start of the allocation, and then reallocate 
> that.
>
> I can imagine that bring useful in a number of places. And because there only 
> would be additional overhead when actually growing the StringInfo, I don't 
> think the cost would be measurable.

That sounds pretty sensible as it'd be minimally intrusive.
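
Roughly (a sketch only; the field name and exact bookkeeping are
assumptions layered on the existing StringInfoData):

/*
 * 'alloc_off' is the offset of data[0] within the underlying
 * allocation (0 for plain StringInfos), letting a StringInfo point
 * directly at the current insertion point of a larger send buffer.
 * enlargeStringInfo() would call repalloc() on (data - alloc_off)
 * and then re-apply the offset.
 */
typedef struct StringInfoData
{
	char	   *data;
	int			len;		/* length of the string being built */
	int			maxlen;		/* usable bytes from data[0] onwards */
	int			cursor;
	int			alloc_off;	/* offset of data[0] in the allocation */
} StringInfoData;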

I've wondered whether we can further optimise some uses by having
libpq-be manage an iovec for us instead, much like we support iovec
for shm_mq. Instead of a StringInfo we'd use an iovec wrapped by a
libpq-managed struct. libpq would reserve the first few entries in the
managed iovec for the message header. Variants of pq_sendint64(...)
etc would add entries to the iovec and could be inlined since they'd
just be convenience routines. The message would be flushed by having
libpq call writev() on the iovec container.

We'd want a wrapper struct for the iovec so we could have libpq keep a
cursor for the next entry in the iovec. For libpq-fe it'd also contain
a map of which iovec entries need to be free()d; for libpq-be we'd
probably palloc(), maybe with a child memory context. To support a
stack-allocated iovec for when we know all the message fields in
advance we'd have an init function that takes the address  of the
preallocated iovec and its length limit.

We could also support a libpq-wrapped iovec where libpq can realloc
it if it fills, as well as a stack-allocated iovec where the caller
would be responsible for managing the max size.


That way we can do zero-copy scatter-gather I/O for messages that
don't require binary-to-text-format transformations etc.
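
For illustration, the container could look roughly like this (all
names hypothetical):

#include <sys/uio.h>

/*
 * Hypothetical iovec message container: pq_send*-style helpers append
 * entries instead of copying bytes, and a flush is one writev() over
 * iov[0..niov-1].  The first 'nreserved' entries are left for the
 * message type byte and length word, filled in at flush time.
 */
typedef struct PqIovecMsg
{
	struct iovec *iov;		/* entry array (stack- or heap-allocated) */
	int			niov;		/* entries used so far */
	int			maxiov;		/* capacity of iov */
	int			nreserved;	/* leading entries reserved for the header */
} PqIovecMsg;

static inline void
pq_iovec_add(PqIovecMsg *msg, const void *data, size_t len)
{
	/* caller must keep 'data' valid until the message is flushed */
	msg->iov[msg->niov].iov_base = (void *) data;
	msg->iov[msg->niov].iov_len = len;
	msg->niov++;
}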

BTW, if we change StringInfo, I'd like to also officially bless the
usage pattern where we wrap a buffer in a StringInfo so we can use
pq_getmsgint64 etc on it. Add a initConstStringInfo(StringInfo si,
const char * buf, size_t buflen) or something that assigns the
StringInfo values and sets maxlen = -1.  The only in-core usage I see
for this so far is in src/backend/replication/logical/worker.c but
it's used extremely heavily in pglogical etc. It'd just be a
convenience function that blesses and documents existing usage.
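
Something as thin as this would do (a sketch that just mirrors the
hand-rolled pattern in worker.c):

/*
 * Wrap an existing read-only buffer so the pq_getmsg* routines can
 * parse it.  maxlen = -1 marks the buffer as not owned by the
 * StringInfo, so it must never be enlarged or pfree'd through it.
 */
static inline void
initConstStringInfo(StringInfo si, const char *buf, size_t buflen)
{
	si->data = unconstify(char *, buf);
	si->len = (int) buflen;
	si->maxlen = -1;
	si->cursor = 0;
}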

But like Tom I really want to first come back to the evidence. Why
should we bother? Are we solving an actual problem here? PostgreSQL is
immensely memory-allocation-happy and copy-happy. Shouldn't we be more
interested in things like reducing the cost of multiple copies and
transform passes of Datum values? Especially since that's an actual
operational pain point when you're working with multi-hundred-megabyte
bytea or text fields.

Can you come up with some profiling/performance numbers that track
time spent on memory copying in the areas you propose to target, plus
malloc overheads? With a tool like systemtap or perf it should not be
overly difficult to do so by making targeted probes that filter based
on callstack, or on file / line-range or function.




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-10 Thread Michael Paquier
On Tue, Mar 10, 2020 at 12:39:53PM -0300, Alvaro Herrera wrote:
> Another option is to return the command as a palloc'ed string (per
> psprintf), instead of using a caller-stack-allocated variable.  Passing
> the buffer len is widely used, but more error prone (and I think getting
> this one wrong might be more catastrophic than a mistake elsewhere.)
> This is not a performance-critical enough path that we *need* the
> optimization that avoids the palloc.  (Failure can be
> reported by returning NULL.)

That's a better approach here.
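
With that interface, a caller would look roughly like this (a sketch;
the restore command and file names are made-up examples):

char	   *cmd;

cmd = BuildRestoreCommand("cp /mnt/server/archivedir/%f \"%p\"",
						  "pg_wal/RECOVERYXLOG",		/* %p */
						  "00000001000000000000000A",	/* %f */
						  "000000010000000000000008");	/* %r */
if (cmd == NULL)
	elog(ERROR, "could not build restore command");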

> Also, I think the function comment could stand some more detailing.

What kind of additional information would you like to add on top of
what the updated version attached does?

> Also, I think Msvcbuild.pm could follow Makefile's ideas of one line per
> file.  Maybe no need to fix all of that in this patch, but let's start
> by adding the new file on its own line rather than reflowing two
> adjacent lines (oh wait ... does perltidy put it that way?  if so,
> nevermind.)

Good idea.  It happens that perltidy does not care about that, but I
would rather keep that stuff for a separate patch/thread.
--
Michael
From 33d9a6837f2435966006947dad18c21e40e0c271 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 11 Mar 2020 12:25:18 +0900
Subject: [PATCH v3] Move routine generating restore_command to src/common/

restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.

Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee3616895...@postgrespro.ru
---
 src/include/common/archive.h |  27 +
 src/backend/access/transam/xlogarchive.c |  66 ++---
 src/common/Makefile  |   1 +
 src/common/archive.c | 119 +++
 src/tools/msvc/Mkvcbuild.pm  |   4 +-
 5 files changed, 159 insertions(+), 58 deletions(-)
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/common/archive.c

diff --git a/src/include/common/archive.h b/src/include/common/archive.h
new file mode 100644
index 00..8af0b4566f
--- /dev/null
+++ b/src/include/common/archive.h
@@ -0,0 +1,27 @@
+/*-
+ *
+ * archive.h
+ *	  Common WAL archive routines
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/archive.h
+ *
+ *-
+ */
+#ifndef ARCHIVE_H
+#define ARCHIVE_H
+
+/*
+ * Routine to build a restore command to retrieve WAL segments from a WAL
+ * archive.  This uses the arguments given by the caller to replace the
+ * corresponding alias fields as defined in the GUC parameter
+ * restore_command.
+ */
+extern char *BuildRestoreCommand(const char *restoreCommand,
+ const char *xlogpath,	/* %p */
+ const char *xlogfname, /* %f */
+ const char *lastRestartPointFname);	/* %r */
+
+#endif			/* ARCHIVE_H */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..914ad340ea 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -53,11 +54,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	bool cleanupEnabled)
 {
 	char		xlogpath[MAXPGPATH];
-	char		xlogRestoreCmd[MAXPGPATH];
+	char	   *xlogRestoreCmd;
 	char		lastRestartPointFname[MAXPGPATH];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
 	int			rc;
 	struct stat stat_buf;
 	XLogSegNo	restartSegNo;
@@ -149,58 +147,13 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogRestoreCmd;
-	endp = xlogRestoreCmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = recoveryRestoreCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-case 'p':
-	/* %p: relative path of target file */
-	sp++;
-	StrNCpy(dp, xlogpath, endp - dp);
-	make_native_path(dp);
-	dp += strlen(dp);
-	break;
-case 'f':
-	/* %f: filename of desired file */
-	sp++;
-	StrNCpy(dp, xlogfname, endp - dp);
-	dp += strlen(dp);
-	break;
-case 'r':
-	/* %r: filename of last restartpoint */
-	sp++;
-	StrNCpy(dp, lastRestartPointFname, endp - dp);
-	dp += strlen(dp);
-	break;
-case '%':
-	/* convert %% to a single % */
-	sp++;
-	if (dp < 

[PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-10 Thread Craig Ringer
Short version: Currently if the server is built with --with-llvm the
-devel packages must depend on clang for PGXS to work, even though
llvm is otherwise optional. This is a particular problem on older
platforms where 3rd party LLVM may be required to build the server's
llvmjit support. Work around by skipping the default .bc generation if
no clang is found by PGXS, as if $(with_llvm) was false.

Detail:

If PostgreSQL is configured with --with-llvm it writes with_llvm=yes
into Makefile.global via AC_SUBST, along with the CLANG path and the
path to the LLVM_TOOLSET if supplied. PGXS sets up a %.bc dependency
for OBJS if it detects that the server was compiled with llvm support.

If clang is not found at PGXS extension build-time the extension build
will then fail, despite the user not having installed the
postgresql11-llvmjit (or whatever) package and the extension not
declaring any explicit LLVM dependency or requirement.

Andres and others went to a great deal of effort to make it possible
to separate PostgreSQL's LLVM support, so a server built with LLVM
support doesn't actually have a runtime dependency on llvm unless the
llvmjit module is loaded. This allows packagers to separate it out and
avoid the need to declare an llvm dependency on the main server
package.

I've found that this falls down for -devel packages like those the
PGDG Yum team produces. The problem arises when a container or VM is
used to build the server and its Makefile.global etc. Then the -devel
package containing Makefile.global and other PGXS bits are installed
on a new machine that does not have llvm's clang. PGXS builds will
fail when they attempt to generate bytecode, since they expect clang
to be present at the path baked in to Makefile.global - but it's
absent.

I propose that per the attached patch PGXS should simply skip adding
the automatic dependency for .bc files if clang cannot be found.
Extensions may still choose to explicitly declare the rule in their
own Makefile if they want to force bitcode generation.

If we want to get fancier about it, we could instead split the llvm
support out from Makefile.global into a Makefile.llvm or similar,
which is then conditionally included by Makefile.global if it exists.
Makefile.llvm would be packaged in a new postgresqlXX-llvmjit-devel
package since distros get uppity if you put makefile fragments in
runtime packages. This package would depend on llvm-toolset and clang.
If you install postgresqlXX-devel but not postgresqlXX-llvmjit-devel
you don't get bitcode and don't need clang. If you install
postgresqlXX-llvmjit-devel, the same clang as we built the server with
is declared as a dependency and Makefile.llvm is included, so it all
works.

But I don't think it's worth the hassle and I'd rather just skip
automatic bitcode generation if we don't find clang.

See also yum packagers report at
https://www.postgresql.org/message-id/CAMsr+YHokx0rWLV561z3=gai6cm4yjekgclkqmdwqstejvy...@mail.gmail.com
.

--
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e3ea7f21a7..df191709b2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -1020,6 +1020,21 @@ endif # enable_coverage
 # LLVM support
 #
 
+# Don't try to build llvm bitcode if the clang used when Pg itself was built
+# can no longer be found. This is possible when this Makefile.global is
+# installed via a -devel package that doesn't have a hard dependency on clang,
+# particularly for enterprise distros that may use a newer clang for builds.
+ifeq ($(with_llvm),yes)
+# Only override setting made by Makefile.global defaults not on cmdline etc
+ifeq ($(origin with_llvm),file)
+ifeq ($(wildcard $(CLANG)),)
+$(warning disabling llvm bitcode generation: $$(CLANG) = $(CLANG) not found)
+with_llvm = no
+endif
+endif
+endif
+
 ifndef COMPILE.c.bc
 # -Wno-ignored-attributes added so gnu_printf doesn't trigger
 # warnings, when the main binary is compiled with C.


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Masahiko Sawada
On Wed, 11 Mar 2020 at 04:17, Laurenz Albe  wrote:
>
> On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote:
>
> Thanks for the review and your thoughts!
>
> > FYI actually vacuum could perform index cleanup phase (i.g.
> > PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a
> > truly INSERT-only table, depending on
> > vacuum_cleanup_index_scale_factor. Anyway, I also agree with not
> > disabling index cleanup in insert-only vacuum case, because it could
> > become not only a cause of index bloat but also a big performance
> > issue. For example, if autovacuum on a table always run without index
> > cleanup, gin index on that table will accumulate insertion tuples in
> > its pending list and will be cleaned up by a backend process while
> > inserting new tuple, not by a autovacuum process. We can disable index
> > vacuum by index_cleanup storage parameter per tables, so it would be
> > better to defer these settings to users.
>
> Thanks for the confirmation.
>
> > I have one question about this patch from architectural perspective:
> > have you considered to use autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor also for this purpose? That is, we
> > compare the threshold computed by these values to not only the number
> > of dead tuples but also the number of inserted tuples. If the number
> > of dead tuples exceeds the threshold, we trigger autovacuum as usual.
> > On the other hand if the number of inserted tuples exceeds, we trigger
> > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user
> > consider the settings of newly added two parameters. We will have in
> > total 4 parameters. Amit also was concerned about that[1].
> >
> > I think this idea also works fine. In insert-only table case, since
> > only the number of inserted tuples gets increased, only one threshold
> > (that is, threshold computed by autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And
> > in mostly-insert table case, in the first place, we can trigger
> > autovacuum even in current PostgreSQL, since we have some dead tuples.
> > But if we want to trigger autovacuum more frequently by the number of
> > newly inserted tuples, we can set that threshold lower while
> > considering only the number of inserted tuples.
>
> I am torn.
>
> On the one hand it would be wonderful not to have to add yet more GUCs
> to the already complicated autovacuum configuration.  It already confuses
> too many users.
>
> On the other hand that will lead to unnecessary vacuums for small
> tables.
> Worse, the progression caused by the comparatively large scale
> factor may make it vacuum large tables too seldom.
>

I might be missing your point, but could you elaborate on the kind of
case where you think this leads to unnecessary vacuums?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-10 Thread Amit Kapila
On Tue, Mar 10, 2020 at 6:48 PM Robert Haas  wrote:
>
> On Sat, Mar 7, 2020 at 10:23 AM Tom Lane  wrote:
> > I continue to think that we'd be better off getting all of this
> > out of the heavyweight lock manager.  There is no reason why we
> > should need deadlock detection, or multiple holds of the same
> > lock, or pretty much anything that LWLocks don't give you.
>
> Well, that was my initial inclination too, but Andres didn't like it.
> I don't know whether it's better to take his advice or yours.
>
> The one facility that we need here which the heavyweight lock facility
> does provide and the lightweight lock facility does not is the ability
> to take locks on an effectively unlimited number of distinct objects.
> That is, we can't have a separate LWLock for every relation, because
> there are ~2^32 relation OIDs per database, and ~2^32 database OIDs, and a
> patch that tried to allocate a tranche of 2^64 LWLocks would probably
> get shot down.
>

I think if we have to follow any LWLock-based design, then we also
need to think about the case where the lock is already acquired by the
backend (say in X mode): it should be granted if the same backend
tries to acquire it in the same mode (or a mode compatible with the
mode in which it is already held).  As per my analysis above [1],
we do this at multiple places for the relation extension lock.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BE8Vu%3D9PYZBZvMrga0Ynz_m6jmT3G_vJv-3L1PWv9Krg%40mail.gmail.com
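
To illustrate the requirement (purely a sketch with assumed names, not
proposed code):

/*
 * Re-grant an extension lock this backend already holds in the same
 * mode, instead of blocking; otherwise fall through to a normal
 * LWLock acquisition.
 */
typedef struct RelExtLockLocal
{
	int			nLocks;		/* times this backend holds the lock */
	LWLockMode	heldMode;
} RelExtLockLocal;

static void
RelExtLockAcquire(LWLock *lock, LWLockMode mode, RelExtLockLocal *local)
{
	if (local->nLocks > 0 && local->heldMode == mode)
	{
		local->nLocks++;		/* already held: grant without waiting */
		return;
	}
	(void) LWLockAcquire(lock, mode);
	local->heldMode = mode;
	local->nLocks = 1;
}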

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Improve handling of parameter differences in physical replication

2020-03-10 Thread Kyotaro Horiguchi
At Tue, 10 Mar 2020 14:47:47 +0100, Peter Eisentraut 
 wrote in 
> On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
> >> Well I meant to periodically send warning messages while waiting for
> >> parameter change, that is after exhausting resources and stopping
> >> recovery. In this situation user need to notice that as soon as
> >> possible.
> > If we lose connection, standby continues to complain about lost
> > connection every 5 seconds.  This is a situation of that kind.
> 
> My argument is that it's not really the same.  If a standby is
> disconnected for more than a few minutes, it's really not going to be
> a good standby anymore after a while.  In this case, however, having
> certain parameter discrepancies is really harmless and you can run
> with it for a long time.  I'm not strictly opposed to a periodic
> warning, but it's unclear to me how we would find a good interval.

I meant the behavior after streaming is paused.  That situation leads
to loss of WAL, or to running out of WAL storage on the master.  A
5-second interval would actually be too frequent, but maybe we need at
least one message per WAL segment?

> > By the way, when I reduced max_connection only on master then take
> > exclusive locks until standby complains on lock exhaustion, I see a
> > WARNING that is saying max_locks_per_transaction instead of
> > max_connection.
...
> > WARNING: recovery paused because of insufficient setting of parameter
> > max_locks_per_transaction (currently 10)
> > DETAIL:  The value must be at least as high as on the primary server.
> > HINT: Recovery cannot continue unless the parameter is changed and the
> > server restarted.
> > CONTEXT:  WAL redo at 0/6004A80 for Standb
> 
> This is all a web of half-truths.  The lock tables are sized based on
> max_locks_per_xact * (MaxBackends + max_prepared_xacts).  So if you
> run out of lock space, we currently recommend (in the single-server
> case), that you raise max_locks_per_xact, but you could also raise
> max_prepared_xacts or something else.  So this is now the opposite
> case where the lock table on the master was bigger because of
> max_connections.

Yeah, I know.  So I'm not sure whether the checks on individual GUC
variables (other than wal_level) make sense.  We might not even need
the WARNING on parameter change.

> We could make the advice less specific and just say, in essence, you
> need to make some parameter changes; see earlier for some hints.

In that sense, the direction mentioned above seems sensible.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Asynchronous Append on postgres_fdw nodes.

2020-03-10 Thread movead li
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I redid make installcheck-world as Kyotaro Horiguchi pointed out, and
nothing was wrong with the results. I think the patch is good in both
feature and performance; here is the test result thread I posted before:
https://www.postgresql.org/message-id/CA%2B9bhCK7chd0qx%2Bmny%2BU9xaOs2FDNJ7RaxG4%3D9rpgT6oAKBgWA%40mail.gmail.com

The new status of this patch is: Ready for Committer


Re: Re: Asynchronous Append on postgres_fdw nodes.

2020-03-10 Thread movead...@highgo.ca

>It seems to me that you are setting a path containing ".." to PGDATA.
Thanks for pointing it out.



Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

2020-03-10 Thread Tom Lane
Adé  writes:
> Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
> has a relatively low setting.
> ...
> Attached is SQL to test and observe this issue and also attached is a patch
> I want to eventually submit to a commitfest.

I took a brief look at this.  It seems like what you're actually trying
to accomplish is to ensure that entryLoadMoreItems's "stepright" path
is taken, instead of re-descending the index from the root.  Okay,
I can see why that'd be a big win, but why are you tying it to the
dropItem behavior?  It should apply any time we're iterating this loop
more than once.  IOW, it seems to me like the logic about when to step
right is just kinda broken, and this is a band-aid rather than a full fix.
The performance hit is worse for fuzzy-search mode because it will
iterate the loop more (relative to the amount of work done elsewhere),
but there's still a potential for wasted work.

Actually, a look at the code coverage report shows that the
not-step-right path is never taken at all in our standard regression
tests.  Maybe that just says bad things about the tests' coverage, but
now I'm wondering whether we could just flush that code path altogether,
and assume that we should always step right at this point.

[ cc'ing heikki and alexander, who seem to have originated that code
at 626a120656a75bf4fe64b1d0d83c23cb38d3771a.  The commit message says
it saves a lot of I/O, but I'm wondering if this report disproves that.
In any case the lack of test coverage is pretty damning. ]

While we're here, what do you think about the comment in the other
code branch just above:

/* XXX: shouldn't we apply the fuzzy search limit here? */

I'm personally inclined to suspect that the fuzzy-search business has
got lots of bugs, which haven't been noticed because (a) it's so squishily
defined that one can hardly tell whether a given result is buggy or not,
and (b) nearly nobody uses it anyway (possibly not unrelated to (a)).
As somebody who evidently is using it, you're probably more motivated
to track down bugs than the rest of us.

regards, tom lane




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread David Rowley
On Tue, 10 Mar 2020 at 21:19, Andy Fan  wrote:
>
>> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>
>
> Actually I want to keep the distinct for this case now.  One reason is
> that only 1 row is returned, so erasing the distinct can't help much.
> The more important reason is that Query->hasAggs is true for
> "select distinct (select count(*) filter (where t2.c2 = 6
> and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2
> where t2.c2 % 6 = 0 order by 1;"
> (this sql came from postgres_fdw.sql).

I think that sort of view is part of the problem here.  If you want to
invent some new way to detect uniqueness that does not count that case
then we have more code with more possible places to have bugs.

FWIW, query_is_distinct_for() does detect that case with:

/*
* If we have no GROUP BY, but do have aggregates or HAVING, then the
* result is at most one row so it's surely unique, for any operators.
*/
if (query->hasAggs || query->havingQual)
return true;

which can be seen by the fact that the following find the unique join on t2.

postgres=# explain verbose select * from t1 inner join (select
count(*) c from t1) t2 on t1.a=t2.c;
 QUERY PLAN

 Hash Join  (cost=41.91..84.25 rows=13 width=12)
   Output: t1.a, (count(*))
   Inner Unique: true
   Hash Cond: (t1.a = (count(*)))
   ->  Seq Scan on public.t1  (cost=0.00..35.50 rows=2550 width=4)
 Output: t1.a
   ->  Hash  (cost=41.89..41.89 rows=1 width=8)
 Output: (count(*))
 ->  Aggregate  (cost=41.88..41.88 rows=1 width=8)
   Output: count(*)
   ->  Seq Scan on public.t1 t1_1  (cost=0.00..35.50
rows=2550 width=0)
 Output: t1_1.a
(12 rows)

It will be very simple to add an empty List of UniqueKeys to the GROUP
BY's RelOptInfo to indicate that all expressions are unique.  That way,
any code that checks whether some of the RelOptInfo's unique keys are a
subset of the expressions it would like to prove unique will get a
match.

It does not really matter how much effort is saved in your example
above. The UniqueKey infrastructure won't know how much effort
properly adding all the uniquekeys will save. It should just add all
the keys it can and let whichever code cares about that reap the
benefits.
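
The consumer side could then be as simple as this sketch (all field
and helper names hypothetical; RelOptInfo has none of these today):

static bool
rel_is_unique_for(RelOptInfo *rel, List *distinct_exprs)
{
	ListCell   *lc;

	/* empty-but-valid unique keys mean "at most one row" */
	if (rel->has_uniquekeys && rel->uniquekeys == NIL)
		return true;

	foreach(lc, rel->uniquekeys)
	{
		List	   *ukey = (List *) lfirst(lc);

		/* hypothetical helper: every ukey member appears in the exprs */
		if (exprs_contain_all(distinct_exprs, ukey))
			return true;
	}
	return false;
}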




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread David Rowley
On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
 wrote:
>
> On Tue, Mar 10, 2020 at 1:49 PM Andy Fan  wrote:
> > In my current implementation, it calculates the uniqueness for each
> > BaseRel only, but in your way, it looks like we need to calculate the
> > UniquePathKey for both BaseRel and JoinRel.   This makes more of a
> > difference for multi-table joins.
>
> I didn't understand this concern. I think, it would be better to do it
> for all kinds of relation types base, join etc. This way we are sure
> that one method works across the planner to eliminate the need for
> Distinct or grouping. If we just implement something for base
> relations right now and don't do that for joins, there is a chance
> that that method may not work for joins when we come to implement it.

Yeah, it seems to me that we're seeing more and more features that
require knowledge of uniqueness of a RelOptInfo. The skip scans patch
needs to know if a join will cause row duplication so it knows if the
skip scan path can be joined to without messing up the uniqueness of
the skip scan.  Adding more and more places that loop over the rel's
indexlist just does not seem the right way to do it, especially so
when you have to dissect the join rel down to its base rel components
to check which indexes there are. Having the knowledge on-hand at the
RelOptInfo level means we no longer have to look at indexes for unique
proofs.

> > Another concern is UniquePathKey
> > is designed for a general purpose,  we need to maintain it no matter
> > distinctClause/groupbyClause.
>
> This should be ok. The time spent in annotating a RelOptInfo about
> uniqueness is not going to be a lot. But doing so would help generic
> elimination of Distinct/Group/Unique operations. What is
> UniquePathKey; I didn't find this in your patch or in the code.

Possibly a misinterpretation. There is some overlap between this patch
and the skip scan patch, both would like to skip doing explicit work
to implement DISTINCT. Skip scans just go about it by adding new path
types that scan the index and only gathers up unique values.  In that
case, the RelOptInfo won't contain the unique keys, but the skip scan
path will. How I imagine both of these patches working together in
create_distinct_paths() is that we first check if the DISTINCT clause
is a subset of a set of the RelOptInfo's unique keys (this patch),
else we check if there are any paths with uniquekeys that we can use
to perform a no-op on the DISTINCT clause (the skip scan patch), if
none of those apply, we create the required paths to uniquify the
results.




Re: Index Skip Scan

2020-03-10 Thread David Rowley
On Wed, 11 Mar 2020 at 01:38, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > >On Tue, Mar 10, 2020 at 09:29:32AM +1300, David Rowley wrote:
> > There's also some weird looking assumptions that an EquivalenceMember
> > can only be a Var in create_distinct_paths(). I think you're only
> > saved from crashing there because a ProjectionPath will be created
> > atop of the IndexPath to evaluate expressions, in which case you're
> > not seeing the IndexPath.  This results in the optimisation not
> > working in cases like:
>
> I've read it as "an assumption that an EquivalenceMember can only be a
> Var" results in "the optimisation not working in cases like this". But
> you've meant that ignoring a ProjectionPath with an IndexPath inside
> results in this optimisation not working, right? If so, then everything
> is clear, and my apologies, maybe I need to finally fix my sleep
> schedule :)

Yes, I was complaining that a ProjectionPath breaks the optimisation
and I don't believe there's any reason that it should.

I believe the way to make that work correctly requires paying
attention to the Path's uniquekeys rather than what type of path it
is.




Re: control max length of parameter values logged

2020-03-10 Thread Tom Lane
Alexey Bashtanov  writes:
> I personally don't think that we necessarily need to cut the values, we 
> can rely on the users
> being careful when using this feature -- in the same way we trusted them 
> use similarly dangerous
> log_min_duration_statement and especially log_statement for ages. At 
> least it's better than having
> no option to disable it. Alvaro's opinion was different [3]. What do you 
> think
> of adding a parameter to limit max logged parameter length? See patch 
> attached.

This patch is failing to build docs (per the cfbot) and it also fails
check-world because you changed behavior tested by ba79cb5dc's test case.
Attached is an update that hopefully will make the cfbot happy.

I agree that something ought to be done here, but I'm not sure that
this is exactly what.  It appears to me that there are three related
but distinct behaviors under discussion:

1. Truncation of bind parameters returned to clients in error message
   detail fields.
2. Truncation of bind parameters written to the server log in logged
   error messages.
3. Truncation of bind parameters written to the server log in non-error
   statement logging actions (log_statement and variants).

Historically we haven't truncated any of these, I believe.  As of
ba79cb5dc we forcibly truncate #1 and #2 at 64 bytes, but not #3.
Your patch proposes to provide a SUSET GUC that controls the
truncation length for all 3.

I think that the status quo as of ba79cb5dc is entirely unacceptable,
because there is no recourse if you want to find out why a statement
is failing and the reason is buried in some long parameter string.
However, this patch doesn't really fix it, because it still seems
pretty annoying that you need to be superuser to adjust what gets
sent back to the client.  Maybe that isn't a problem in practice
(since the client presumably has the original parameter value laying
about), but it seems conceptually wrong.

On the other hand, that line of reasoning leads to wanting two
separate GUCs (one for #1 and one for the other two), which seems
like overkill, plus it's going to be hard/expensive to have the
outputs for #1 and #2 not be the same.

I do agree that it seems weird and random (not to say 100% backward)
that error cases provide only truncated values but routine query
logging insists on emitting full untruncated values.  I should think
that the most common use-cases would want it the other way round.

So I feel like we'd better resolve these definitional questions about
what behavior we actually want.  I agree that ba79cb5dc is not
terribly well thought out as it stands.

regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 371d783..b002df5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6632,6 +6632,24 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameter_max_length (integer)
+  
+   log_parameter_max_length configuration parameter
+  
+  
+  
+   
+If greater than zero, bind parameter values reported in error
+messages and statement-logging messages are trimmed to no more
+than this many bytes.
+If this value is specified without units, it is taken as bytes.
+Zero (the default) disables trimming.
+Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 9dba3b0..70ce5bb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1843,13 +1843,24 @@ exec_bind_message(StringInfo input_message)
 		if (knownTextValues == NULL)
 			knownTextValues =
 palloc0(numParams * sizeof(char *));
-		/*
-		 * Note: must copy at least two more full characters
-		 * than BuildParamLogString wants to see; otherwise it
-		 * might fail to include the ellipsis.
-		 */
-		knownTextValues[paramno] =
-			pnstrdup(pstring, 64 + 2 * MAX_MULTIBYTE_CHAR_LEN);
+
+		if (log_parameter_max_length > 0)
+		{
+			/*
+			 * We can trim the saved string, knowing that we
+			 * won't print all of it.  But we must copy at
+			 * least two more full characters than
+			 * BuildParamLogString wants to use; otherwise it
+			 * might fail to include the trailing ellipsis.
+			 */
+			knownTextValues[paramno] =
+pnstrdup(pstring,
+		 log_parameter_max_length
+		 + 2 * MAX_MULTIBYTE_CHAR_LEN);
+		}
+		else
+			knownTextValues[paramno] = pstrdup(pstring);
+
 		MemoryContextSwitchTo(oldcxt);
 	}
 	if (pstring != pbuf.data)
@@ -1912,7 +1923,9 @@ exec_bind_message(StringInfo input_message)
 		 */
 		if (log_parameters_on_error)
 			params->paramValuesStr =
-BuildParamLogString(params, knownTextValues, 64);
+BuildParamLogString(params,
+	knownTextValues,
+	

Re: shared-memory based stats collector

2020-03-10 Thread Andres Freund
Hi,

On 2020-03-10 19:52:22 +0100, Julien Rouhaud wrote:
> On Tue, Mar 10, 2020 at 1:48 PM Alvaro Herrera  
> wrote:
> >
> > On 2020-Mar-10, Kyotaro Horiguchi wrote:
> >
> > > At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund  
> > > wrote in
> > > > On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > > > > That's true, but I have the same concern with Tom. The archiver became
> > > > > too-tightly linked with other processes than actual relation.
> > > >
> > > > What's the problem here? We have a number of helper processes
> > > > (checkpointer, bgwriter) that are attached to shared memory, and it's
> > > > not a problem.
> > >
> > > That theoretically raises the chance of server-crash by a small amount
> > > of probability. But, yes, it's absurd to premise that the archiver process
> > > crashes.
> >
> > The case I'm worried about is a misconfigured archive_command that
> > causes the archiver to misbehave (exit with a code other than 0); if
> > that already doesn't happen, or we can make it not happen, then I'm okay
> > with the changes to archiver.
> 
> Any script that gets killed, or that exit with a return code > 127
> would do that.

But just with a FATAL, not with something worse. And the default
handling for aux backends accepts exit code 1 (which elog uses for
FATAL) as a normal shutdown. Am I missing something here?

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread David Rowley
On Wed, 11 Mar 2020 at 08:17, Laurenz Albe  wrote:
>
> On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote:
> > I have one question about this patch from architectural perspective:
> > have you considered to use autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor also for this purpose? That is, we
> > compare the threshold computed by these values to not only the number
> > of dead tuples but also the number of inserted tuples. If the number
> > of dead tuples exceeds the threshold, we trigger autovacuum as usual.
> > On the other hand if the number of inserted tuples exceeds, we trigger
> > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user
> > consider the settings of newly added two parameters. We will have in
> > total 4 parameters. Amit also was concerned about that[1].
> >
> > I think this idea also works fine. In insert-only table case, since
> > only the number of inserted tuples gets increased, only one threshold
> > (that is, threshold computed by autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And
> > in mostly-insert table case, in the first place, we can trigger
> > autovacuum even in current PostgreSQL, since we have some dead tuples.
> > But if we want to trigger autovacuum more frequently by the number of
> > newly inserted tuples, we can set that threshold lower while
> > considering only the number of inserted tuples.
>
> I am torn.
>
> On the one hand it would be wonderful not to have to add yet more GUCs
> to the already complicated autovacuum configuration.  It already confuses
> too many users.
>
> On the other hand that will lead to unnecessary vacuums for small
> tables.
> Worse, the progression caused by the comparatively large scale
> factor may make it vacuum large tables too seldom.

I think we really need to discuss what the default values for these
INSERT-only vacuums should be before we can decide if we need 2
further GUCc to control the feature.  Right now the default is 0.0 on
the scale factor and 10 million tuples threshold.  I'm not saying
those are good or bad values, but if they are good, then they're
pretty different from the normal threshold of 50 and the normal scale
factor of 0.2, therefore (assuming the delete/update thresholds are
also good), then we need the additional GUCs.

If someone wants to put forward a case for making the defaults more
similar, then perhaps we can consider merging the options.  One case
might be the fact that we want INSERT-only tables to benefit from
Index Only Scans more often than after 10 million inserts.

As for pros and cons. Feel free to add to the following list:

For new GUCs/reloptions:
1. Gives users more control over this new auto-vacuum behaviour
2. The new feature can be completely disabled. This might be very
useful for people who suffer from auto-vacuum starvation.

Against new GUCs/reloptions:
1. Adds more code, documentation and maintenance.
2. Adds more complexity to auto-vacuum configuration.

As for my opinion, I'm leaning towards keeping the additional options.
I think if we were just adding auto-vacuum to core code now, then I'd
be voting to keep the configuration as simple as possible. However,
that's far from the case, and we do have over a decade of people that
have gotten used to how auto-vacuum currently behaves. Many people are
unlikely to even notice the change, but some will, and then there will
be another group of people who want to turn it off, and that group
might be upset when we tell them that they can't, at least not without
flipping the big red "autovacuum" switch into the off position (of
which, I'm pretty hesitant to recommend that anyone ever does).




Re: shared-memory based stats collector

2020-03-10 Thread Andres Freund
On 2020-03-10 09:48:07 -0300, Alvaro Herrera wrote:
> On 2020-Mar-10, Kyotaro Horiguchi wrote:
> 
> > At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund  wrote 
> > in 
> > > On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > > > That's true, but I have the same concern with Tom. The archive bacame
> > > > too-tightly linked with other processes than actual relation.
> > > 
> > > What's the problem here? We have a number of helper processes
> > > (checkpointer, bgwriter) that are attached to shared memory, and it's
> > > not a problem.
> > 
> > That theoretically raises the chance of server-crash by a small amount
> > of probability. But, yes, it's absurd to prmise that archiver process
> > crashes.
> 
> The case I'm worried about is a misconfigured archive_command that
> causes the archiver to misbehave (exit with a code other than 0); if
> that already doesn't happen, or we can make it not happen, then I'm okay
> with the changes to archiver.

Well, an exit(1) is also fine, afaict. No?

The archive command can just trigger either a FATAL or a LOG:

rc = system(xlogarchcmd);
if (rc != 0)
{
/*
 * If either the shell itself, or a called command, died on a 
signal,
 * abort the archiver.  We do this because system() ignores 
SIGINT and
 * SIGQUIT while waiting; so a signal is very likely something 
that
 * should have interrupted us too.  Also die if the shell got a 
hard
 * "command not found" type of error.  If we overreact it's no 
big
 * deal, the postmaster will just start the archiver again.
 */
int lev = wait_result_is_any_signal(rc, 
true) ? FATAL : LOG;

if (WIFEXITED(rc))
{
ereport(lev,
(errmsg("archive command failed with 
exit code %d",
WEXITSTATUS(rc)),
 errdetail("The failed archive command 
was: %s",
   xlogarchcmd)));
}
else if (WIFSIGNALED(rc))
{
#if defined(WIN32)
ereport(lev,
(errmsg("archive command was terminated 
by exception 0x%X",
WTERMSIG(rc)),
 errhint("See C include file 
\"ntstatus.h\" for a description of the hexadecimal value."),
 errdetail("The failed archive command 
was: %s",
   xlogarchcmd)));
#else
ereport(lev,
(errmsg("archive command was terminated 
by signal %d: %s",
WTERMSIG(rc), 
pg_strsignal(WTERMSIG(rc))),
 errdetail("The failed archive command 
was: %s",
   xlogarchcmd)));
#endif
}
else
{
ereport(lev,
(errmsg("archive command exited with 
unrecognized status %d",
rc),
 errdetail("The failed archive command 
was: %s",
   xlogarchcmd)));
}

snprintf(activitymsg, sizeof(activitymsg), "failed on %s", 
xlog);
set_ps_display(activitymsg, false);

return false;
}

I.e. there's only normal ways to shut down the archiver due to a failing
archvie command.

Greetings,

Andres Freund




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-10 Thread Daniel Gustafsson
> On 10 Mar 2020, at 18:38, Peter Eisentraut  
> wrote:
> 
> On 2020-03-10 12:41, Daniel Gustafsson wrote:
>>> On 10 Mar 2020, at 11:53, Hugh McMaster  wrote:
>>> Debian (and Ubuntu) are beginning to remove foo-config legacy scripts.
>>> Already, xml2-config has been flagged for removal, with packages being
>>> asked to switch to pkg-config.
>>> 
>>> This patch uses pkg-config's PKG_CHECK_MODULES macro to detect libxml2
>>> or, if pkg-config is not available, falls back to xml2-confg.
>> This was previously discussed in 20200120204715.ga73...@msg.df7cb.de which
>> ended without a real conclusion on what could/should be done (except that
>> nothing *had* to be done).
>> What is the situation on non-Debian/Ubuntu systems (BSD's, macOS etc etc)?  
>> Is
>> it worth adding pkg-config support if we still need a fallback to 
>> xml2-config?
> 
> Btw., here is an older thread for the same issue 
> .
>   Might be worth reflecting on the issues discussed there.

Thanks, didn't realize that the subject had been up for discussion earlier as
well.

For me, the duplication aspect is the most troubling, since we'd still need the
xml2-config fallback and thus won't be able to simplify the code.

cheers ./daniel



Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Laurenz Albe
On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote:

Thanks for the review and your thoughts!

> FYI actually vacuum could perform index cleanup phase (i.g.
> PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a
> truly INSERT-only table, depending on
> vacuum_cleanup_index_scale_factor. Anyway, I also agree with not
> disabling index cleanup in insert-only vacuum case, because it could
> become not only a cause of index bloat but also a big performance
> issue. For example, if autovacuum on a table always run without index
> cleanup, gin index on that table will accumulate insertion tuples in
> its pending list and will be cleaned up by a backend process while
> inserting new tuple, not by a autovacuum process. We can disable index
> vacuum by index_cleanup storage parameter per tables, so it would be
> better to defer these settings to users.

Thanks for the confirmation.

> I have one question about this patch from architectural perspective:
> have you considered to use autovacuum_vacuum_threshold and
> autovacuum_vacuum_scale_factor also for this purpose? That is, we
> compare the threshold computed by these values to not only the number
> of dead tuples but also the number of inserted tuples. If the number
> of dead tuples exceeds the threshold, we trigger autovacuum as usual.
> On the other hand if the number of inserted tuples exceeds, we trigger
> autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user
> consider the settings of newly added two parameters. We will have in
> total 4 parameters. Amit also was concerned about that[1].
> 
> I think this idea also works fine. In insert-only table case, since
> only the number of inserted tuples gets increased, only one threshold
> (that is, threshold computed by autovacuum_vacuum_threshold and
> autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And
> in mostly-insert table case, in the first place, we can trigger
> autovacuum even in current PostgreSQL, since we have some dead tuples.
> But if we want to trigger autovacuum more frequently by the number of
> newly inserted tuples, we can set that threshold lower while
> considering only the number of inserted tuples.

I am torn.

On the one hand it would be wonderful not to have to add yet more GUCs
to the already complicated autovacuum configuration.  It already confuses
too many users.

On the other hand that will lead to unnecessary vacuums for small
tables.
Worse, the progression caused by the comparatively large scale
factor may make it vacuum large tables too seldom.

I'd be grateful if somebody knowledgeable could throw his or her opinion
into the scales.

> And I briefly looked at this patch:
> 
> @@ -2889,7 +2898,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
> tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
> /* As of now, we don't support parallel vacuum for autovacuum */
> tab->at_params.nworkers = -1;
> -   tab->at_params.freeze_min_age = freeze_min_age;
> +   tab->at_params.freeze_min_age = freeze_all ? 0 : freeze_min_age;
> tab->at_params.freeze_table_age = freeze_table_age;
> tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
> tab->at_params.multixact_freeze_table_age = 
> multixact_freeze_table_age;
> 
> I think we can set multixact_freeze_min_age to 0 as well.

Ugh, yes, that is a clear oversight.
I have fixed it in the latest version.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Laurenz Albe
On Tue, 2020-03-10 at 00:00 -0500, Justin Pryzby wrote:
> > +++ b/src/backend/utils/misc/postgresql.conf.sample
> > +#autovacuum_vacuum_insert_threshold = 1000   # min number of row 
> > inserts
> > + # before vacuum
> 
> Similar to a previous comment [0] about reloptions or GUC:
> 
> Can we say "threshold number of insertions before vacuum" ?
> ..or "maximum number of insertions before triggering autovacuum"

Hmm.  I copied the wording from "autovacuum_vacuum_threshold".

Since the parameters have similar semantics, a different wording
would confuse.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Laurenz Albe
On Tue, 2020-03-10 at 13:53 +1300, David Rowley wrote:
> 1. Do we need to change documentation on freeze_min_age to mention
> that it does not apply in all cases? I'm leaning towards not changing
> this as `VACUUM FREEZE` is also an exception to this, which I don't
> see mentioned.

I agree with that.  Too little documentation is bad, but too much of
it can also confuse and make it hard to find the needle in the haystack.

> 2. Perhaps the documentation in maintenance.sgml should mention that
> the table will be vacuumed with the equivalent of having
> vacuum_freeze_min_age = 0, instead of:
> 
> "Such a vacuum will aggressively freeze tuples."
> 
> aggressive is the wrong word here. We call it an aggressive vacuum if
> we disable page skipping, not for setting the vacuum_freeze_min_age to
> 0.

Agreed, see below.

> 3. The following DEBUG3 elog should be updated to include the new values:
> 
> elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
> NameStr(classForm->relname),
> vactuples, vacthresh, anltuples, anlthresh);

Done.

> Someone might be confused at why auto-vacuum is running if you don't
> put those in.
> 
> 4. This would be nicer if you swapped the order of the operands to the
> < condition and replaced the operator with >. That'll match the way it
> is done above.
> 
> /*
> * If the number of inserted tuples exceeds the threshold and no
> * vacuum is necessary for other reasons, run an "insert-only" vacuum
> * that freezes aggressively.
> */
> if (!(*dovacuum) && vacinsthresh < tabentry->inserts_since_vacuum)
> {
> *dovacuum = true;
> *freeze_all = true;
> }
> 
> It would also be nicer if you assigned the value of
> tabentry->inserts_since_vacuum to a variable, so as to match what the
> other code there is doing. That'll also make the change for #3 neater.

Changed that way.

> 5. The following text:
> 
> A threshold similar to the above is calculated from
>  and
> .
> Tables that have received more inserts than the calculated threshold
> since they were last vacuumed (and are not eligible for vacuuming for
> other reasons) will be vacuumed to reduce the impact of a future
> anti-wraparound vacuum run.
> 
> I think "... will be vacuumed with the equivalent of having  linkend="guc-vacuum-freeze-min-age"/> set to 0".
> I'm not sure we need to mention the reduction of impact to
> anti-wraparound vacuums.

Done like that.

I left in the explanation of the purpose of this setting.
Understanding the purpose of the GUCs will make it easier to tune them
correctly.

> 6. Please run the regression tests and make sure they pass. The
> "rules" test is currently failing due to the new column in
> "pg_stat_all_tables"

Oops, sorry.  I ran pgindent, but forgot to re-run the regression tests.

Done.


Attached is V5, which also fixes the bug discovered my Masahiko Sawada.
He made an interesting suggestion which we should consider before committing.

Yours,
Laurenz Albe
From 83fa15ef51836ae07a235f6070637857c9c6a9c8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 10 Mar 2020 19:58:39 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that it is
effectively disabled, but it offers some flexibility
to tune the feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed, freezing as many tuples as
possible.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 ++
 doc/src/sgml/maintenance.sgml | 12 
 doc/src/sgml/monitoring.sgml  |  5 ++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 56 ---
 src/backend/postmaster/pgstat.c   |  5 ++
 src/backend/utils/adt/pgstatfuncs.c   | 16 ++
 src/backend/utils/misc/guc.c  | 20 +++
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 

Re: backend type in log_line_prefix?

2020-03-10 Thread Justin Pryzby
On Thu, Feb 13, 2020 at 06:43:32PM +0900, Fujii Masao wrote:
> If we do this, backend type should be also included in csvlog?

+1, I've been missing that

Note, this patch seems to correspond to:
b025f32e0b Add leader_pid to pg_stat_activity

I had mentioned privately to Julien missing this info in CSV log.

Should leader_pid be exposed instead (or in addition)?  Or backend_type be a
positive number giving the leader's PID if it's a parallel worker, or a some
special negative number like -BackendType to indicate a nonparallel worker.
NULL for a B_BACKEND which is not a parallel worker.

My hope is to answer to questions like these:

. is query (ever? usually?) using parallel paths?
. is query usefully using parallel paths?
. what queries are my max_parallel_workers(_per_process) being used for ?
. Are certain longrunning or frequently running queries which are using
  parallel paths using all max_parallel_workers and precluding other queries
  from using parallel query ?  Or, are semi-short queries sometimes precluding
  longrunning queries from using parallelism, when the long queries would
  better benefit ?

I think this patch alone wouldn't provide that, and there'd need to either be a
line logged for each worker.  Maybe it'd log full query+details (ugh), or just
log "parallel worker of pid...".  Or maybe there'd be a new column with which
the leader would log nworkers (workers planned vs workers launched - I would
*not* want to get this out of autoexplain).

-- 
Justin




Re: Constraint's NO INHERIT option is ignored in CREATE TABLE LIKE statement

2020-03-10 Thread Tom Lane
Amit Langote  writes:
> On Thu, Feb 20, 2020 at 8:20 AM David G. Johnston
>  wrote:
>> Not sure I agree with the premise that it is not supposed to be copied; is 
>> there some other object type the allows NO INHERIT that isn't copied when 
>> CREATE TABLE LIKE is used and check constraints are the odd ones out?

> Syntax currently allows only CHECK constraints to be marked NO INHERIT.

Hearing no further comments, pushed, with a bit of cosmetic polishing.

regards, tom lane




Re: shared-memory based stats collector

2020-03-10 Thread Julien Rouhaud
On Tue, Mar 10, 2020 at 1:48 PM Alvaro Herrera  wrote:
>
> On 2020-Mar-10, Kyotaro Horiguchi wrote:
>
> > At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund  wrote 
> > in
> > > On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > > > That's true, but I have the same concern with Tom. The archive bacame
> > > > too-tightly linked with other processes than actual relation.
> > >
> > > What's the problem here? We have a number of helper processes
> > > (checkpointer, bgwriter) that are attached to shared memory, and it's
> > > not a problem.
> >
> > That theoretically raises the chance of server-crash by a small amount
> > of probability. But, yes, it's absurd to prmise that archiver process
> > crashes.
>
> The case I'm worried about is a misconfigured archive_command that
> causes the archiver to misbehave (exit with a code other than 0); if
> that already doesn't happen, or we can make it not happen, then I'm okay
> with the changes to archiver.

Any script that gets killed, or that exit with a return code > 127
would do that.




Re: BEFORE ROW triggers for partitioned tables

2020-03-10 Thread Peter Eisentraut

On 2020-02-27 17:51, Alvaro Herrera wrote:

Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy
-- just remove the check against them.  With that, they work fine.


This looks good to me in principle.  It's a useful thing to support.


1. Just let it be.  If the user does something silly, it's their problem
if they get an ugly error message.

2. If the partition keys are changed, raise an error.  The trigger is
allowed to change everything but those columns.  Then there's no
conflict, and it allows desirable use cases.

3. Allow the partition keys to change, as long as the tuple ends up in
the same partition.  This is the same as (1) except the error message is
nicer.

The attached patch implements (2).


That seems alright to me.


* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).   Also, it has a performance issue,
namely that we do heap_getattr() potentially repeatedly -- maybe it'd be
better to "slotify" the tuple prior to doing the checks.


The attribute ordering issue obviously needs to be addressed, but the 
performance issue is probably not so important.  How many levels of 
partitioning are we expecting?



Another
possible controversial point is that its location in commands/trigger.c
isn't great.  (Frankly, I don't understand why the executor trigger
firing functions are in commands/ at all.)


yeah ...

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Magnus Hagander
On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  wrote:
>
>
>
> On 2020/03/10 22:43, Amit Langote wrote:
> > On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  
> > wrote:
> >>> So, I will make the patch adding support for --no-estimate-size option
> >>> in pg_basebackup.
> >>
> >> Patch attached.
> >
> > Like the idea and the patch looks mostly good.
>
> Thanks for reviewing the patch!
>
> > +  total size. If the estimation is disabled in
> > +  pg_basebackup
> > +  (i.e., --no-estimate-size option is specified),
> > +  this is always 0.
> >
> > "always" seems unnecessary.
>
> Fixed.
>
> > +This option prevents the server from estimating the total
> > +amount of backup data that will be streamed. In other words,
> > +backup_total column in the
> > +pg_stat_progress_basebackup
> > +view always indicates 0 if this option is 
> > enabled.
> >
> > Here too.
>
> Fixed.
>
> Attached is the updated version of the patch.

Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?

Also, should it really  be the server version that decides how this
feature behaves, and not the pg_basebackup version? Given that the
implementation is entirely in the client, it seems that's more
logical?


and a few docs nitpicks:


 Whether this is enabled or not, the
 pg_stat_progress_basebackup view
-report the progress of the backup in the server side. But note
-that the total amount of data that will be streamed is estimated
-and reported only when this option is enabled. In other words,
-backup_total column in the view always
-indicates 0 if this option is disabled.
+report the progress of the backup in the server side.
+   
+   
+This option is not allowed when using
+--no-estimate-size.


I think you should just remove that whole paragraph. The details are
now listed under the disable parameter.

+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view indicates 0 if this option is enabled.

I suggest just "This option prevents the server from estimating the
total amount of backup data that will be streamed, resulting in the
ackup_total column in pg_stat_progress_basebackup to be (zero or NULL
depending on above)".

(Markup needed on that of course ,but you get the idea)

+When this is disabled, the backup will start by enumerating

I'd try to avoid the double negation, with something "without this
parameter, the backup will start..."



+  
+   pg_basebackup asks the server to estimate
+   the total amount of data that will be streamed by default (unless
+   --no-estimate-size is specified) in version 13 or later,
+   and does that only when --progress is specified in
+   the older versions.
+  

That's an item for the release notes, not for the reference page, I
think. It's already explained under the --disable parameter, so I
suggest removing this paragraph as well.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-10 Thread Justin Pryzby
I took a step back, and I wondered whether we should add a generic function for
listing a dir with metadata, possibly instead of changing the existing
functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);

Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
(some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
pg_ls_archive_statusdir, which also show metadata.

...but there's no a function which lists the metadata of an directory other
than tmp, wal, log.

One can do this:
|SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT 
a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
..but that's not as helpful as allowing:
|SELECT * FROM pg_ls_dir_metadata('.',true,true);

There's also no function which recurses into an arbitrary directory, so it
seems shortsighted to provide a function to recursively list a tmpdir.

Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
write a SQL function to show the dir recursively.  It'd be trivial to plug in
wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely
trivial).
|SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');

Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions
should enumerate all files during the initial call, which sounds like a bad
idea when recursively showing directories.  If we add a function recursing into
a directory, we'd need to discuss all the flags to expose to it, like recurse,
ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the
rest of the flags in find(1)).

My initial patch [2] changed ls_tmpdir to show metadata columns including
is_dir, but not decend.  It's pretty unfortunate if a function called
pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
(it's new in v12).

I'm interested to in feedback on the alternative approach, as attached.  The
final patch to include all the rest of columns shown by pg_stat_file() is more
of an idea/proposal and not sure if it'll be desirable.  But pg_ls_tmpdir() is
essentially the same as my v1 patch.

This is intended to be mostly independent of any fix to the WARNING I reported
[1].  Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need
to fix one place.  I'm planning to eventually look into Tom's suggestion of
returning tuplestore to fix that, and maybe rebase this patchset on top of
that.

-- 
Justin

[1] 
https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com
[2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com
>From 2c4b2c408490ecde3cfb4e336a78942f7a6f8197 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v9 01/11] BUG: in errmsg

Note there's two changes here.
Should backpatch to v12, where pg_ls_tmpdir was added.
---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3741b87486..897b11a77d 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, ) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.17.0

>From f3ef0c6ff664f2f26e95ce97e8b50a813bd1aab8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 6 Mar 2020 16:50:07 -0600
Subject: [PATCH v9 02/11] Document historic behavior about hiding directories
 and special files

Should backpatch to v10: tmpdir, waldir and archive_statusdir
---
 doc/src/sgml/func.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..4c0ea5ab3f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21450,6 +21450,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 (mtime) of each file in the log directory. By default, only superusers
 and members of the pg_monitor role can use this function.
 Access may be granted to others using GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21461,6 +21462,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 default only superusers and members of the pg_monitor role
 can use this function. Access may be granted to others using
 GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21473,6 +21475,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 superusers and members of the pg_monitor role can
 use this 

Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Anyway, I don't anticipate having time to do anything with this patch
> > but I disagree that this is a "we don't want it" kind of thing, rather
> > we maybe want it, since someone cared enough to write a patch, but the
> > patch needs work and maybe we want it to look a bit different and be
> > better defined.
> 
> I think Peter's primary argument was that this doesn't belong in
> \conninfo, which is about reporting the parameters required to
> establish the connection.  We have kind of broken that already by
> cramming SSL and GSS encryption info into the results, but that
> doesn't mean it should become a kitchen-sink listing of anything
> anybody says they'd like to know.

I could certainly agree with wanting to have a psql command that's "give
me what I need to connect", but that idea and what conninfo actually
returns are pretty distant from each other.  For one thing, if I wanted
that from psql, I'd sure hope to get back something that I could
directly use when starting up a new psql session.

> Anyway, I think your point is that maybe this should be RWF
> not Rejected, and I agree with that.

Ok.

> (I had not looked at the last version of the patch, but now that
> I have, I still don't like the fact that it has the client tracking
> session start time separately from what the server does.  The small
> discrepancy that introduces is going to confuse somebody.  I see
> that there's no documentation update either.)

This worries me about as much as I worry that someone's going to be
confused by explain-analyze output vs. \timing.  Yes, it happens, and
actually pretty often, but I wouldn't change how it works because
they're two different things, not to mention that if I'm going to be
impacted by the time being off on one of the systems, I'd at least like
to know when my client thought it connected relative to the clock on my
client.

THanks,

Stephen


signature.asc
Description: PGP signature


Re: backend type in log_line_prefix?

2020-03-10 Thread Alvaro Herrera
I like these patches; the first two are nice cleanup.

My only gripe is that pgstat_get_backend_desc() is not really a pgstat
function; I think it should have a different name with a prototype in
miscadmin.h (next to the enum's new location, which I would put
someplace near the "pmod.h" comment rather than where you put it;
perhaps just above the AuxProcType definition), and implementation
probably in miscinit.c.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2020-03-10 18:38, Stephen Frost wrote:
> >>On 2/27/20 4:21 AM, Peter Eisentraut wrote:
> >>>My opinion is that this is not particularly useful and not appropriate to
> >>>piggy-back onto \conninfo.  Connection information including host, port,
> >>>database, user name is a well-established concept in PostgreSQL programs
> >>>and tools and it contains a delimited set of information. Knowing what
> >>>server and what database you are connected to also seems kind of
> >>>important.  Moreover, this is information that is under control of the
> >>>client, so it must be tracked on the client side.
> >
> >I have to say that I disagree.  Wishing to know when you connected to a
> >server is entirely reasonable and it's also rather clearly under control
> >of the client (though I don't entirely understand that argument in the
> >first place).
> 
> The argument is that the server already knows when the client connected and
> that information is already available.  So there is no reason for the client
> to also track and display that information, other than perhaps convenience.

Yes, it'd be for convenience.

> But the server does not, in general, know what host and port the client
> connected to (because of proxying and other network stuff), so this is
> something that the client must record itself.

I agree that there are some things the client has to track and not ask
the server for, because the server's answers could be different, but I
don't agree that conninfo should only ever include that info (and I
seriously doubt that was the original idea, considering 'database' is
included at least as far back as 1999..).

As a side-note, we should probably update our documentation for psql,
since the description of things under Prompting tends to presume that
there isn't anything between the client and the server (for example,
%> says "The port number at which the database server is listening.",
but it's really "The port number used to connect to the database
server." or something along those lines).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread Tom Lane
Stephen Frost  writes:
> Anyway, I don't anticipate having time to do anything with this patch
> but I disagree that this is a "we don't want it" kind of thing, rather
> we maybe want it, since someone cared enough to write a patch, but the
> patch needs work and maybe we want it to look a bit different and be
> better defined.

I think Peter's primary argument was that this doesn't belong in
\conninfo, which is about reporting the parameters required to
establish the connection.  We have kind of broken that already by
cramming SSL and GSS encryption info into the results, but that
doesn't mean it should become a kitchen-sink listing of anything
anybody says they'd like to know.

Anyway, I think your point is that maybe this should be RWF
not Rejected, and I agree with that.

(I had not looked at the last version of the patch, but now that
I have, I still don't like the fact that it has the client tracking
session start time separately from what the server does.  The small
discrepancy that introduces is going to confuse somebody.  I see
that there's no documentation update either.)

regards, tom lane




Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread Peter Eisentraut

On 2020-03-10 18:38, Stephen Frost wrote:

On 2/27/20 4:21 AM, Peter Eisentraut wrote:

My opinion is that this is not particularly useful and not appropriate to
piggy-back onto \conninfo.  Connection information including host, port,
database, user name is a well-established concept in PostgreSQL programs
and tools and it contains a delimited set of information. Knowing what
server and what database you are connected to also seems kind of
important.  Moreover, this is information that is under control of the
client, so it must be tracked on the client side.


I have to say that I disagree.  Wishing to know when you connected to a
server is entirely reasonable and it's also rather clearly under control
of the client (though I don't entirely understand that argument in the
first place).


The argument is that the server already knows when the client connected 
and that information is already available.  So there is no reason for 
the client to also track and display that information, other than 
perhaps convenience.  But the server does not, in general, know what 
host and port the client connected to (because of proxying and other 
network stuff), so this is something that the client must record itself.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-10 Thread Peter Eisentraut

On 2020-03-10 12:41, Daniel Gustafsson wrote:

On 10 Mar 2020, at 11:53, Hugh McMaster  wrote:



Debian (and Ubuntu) are beginning to remove foo-config legacy scripts.
Already, xml2-config has been flagged for removal, with packages being
asked to switch to pkg-config.

This patch uses pkg-config's PKG_CHECK_MODULES macro to detect libxml2
or, if pkg-config is not available, falls back to xml2-confg.


This was previously discussed in 20200120204715.ga73...@msg.df7cb.de which
ended without a real conclusion on what could/should be done (except that
nothing *had* to be done).

What is the situation on non-Debian/Ubuntu systems (BSD's, macOS etc etc)?  Is
it worth adding pkg-config support if we still need a fallback to xml2-config?


Btw., here is an older thread for the same issue 
. 
 Might be worth reflecting on the issues discussed there.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 2/27/20 4:21 AM, Peter Eisentraut wrote:
> >My opinion is that this is not particularly useful and not appropriate to
> >piggy-back onto \conninfo.  Connection information including host, port,
> >database, user name is a well-established concept in PostgreSQL programs
> >and tools and it contains a delimited set of information. Knowing what
> >server and what database you are connected to also seems kind of
> >important.  Moreover, this is information that is under control of the
> >client, so it must be tracked on the client side.

I have to say that I disagree.  Wishing to know when you connected to a
server is entirely reasonable and it's also rather clearly under control
of the client (though I don't entirely understand that argument in the
first place).

> >Knowing how long you've been connected on the other hand seems kind of
> >fundamentally unimportant.  If we add that, what's to stop us from adding
> >other statistics of minor interest such as how many commands you've run,
> >how many errors there were, etc.  The connection time is already
> >available, and perhaps we should indeed make it a bit easier to get, but
> >it doesn't need to be a psql command.

Of 'minor' interest, or not, if it's something you'd like to know then
it's pretty handy if there's a way to get that info.  This also isn't
inventing a new psql command but rather adding on to one that's clearly
already a 'creature comfort' command as everything included could be
gotten at in other ways.

As for the comparison to other things of minor interest, we do give
users a way to get things like line number (%l in PROMPT), so including
something akin to that certainly doesn't seem out of the question.
Having a '%T' or something in the prompt for connected-at-time seems
like it could certainly be useful too.

Going a bit farther, I wonder if conninfo could be configurable as a
Prompt-like string, that seems like it'd be pretty neat.

Anyway, I don't anticipate having time to do anything with this patch
but I disagree that this is a "we don't want it" kind of thing, rather
we maybe want it, since someone cared enough to write a patch, but the
patch needs work and maybe we want it to look a bit different and be
better defined.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Fujii Masao



On 2020/03/10 22:43, Amit Langote wrote:

On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  wrote:

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.


Patch attached.


Like the idea and the patch looks mostly good.


Thanks for reviewing the patch!


+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.

"always" seems unnecessary.


Fixed.


+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.

Here too.


Fixed.

Attached is the updated version of the patch.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..2c9f51daf4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4388,10 +4388,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  backup_total
  bigint
  
-  Total amount of data that will be streamed. If progress reporting
-  is not enabled in pg_basebackup
-  (i.e., --progress option is not specified),
-  this is 0. Otherwise, this is estimated and
+  Total amount of data that will be streamed. This is estimated and
   reported as of the beginning of
   streaming database files phase. Note that
   this is only an approximation since the database
@@ -4399,7 +4396,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   and WAL log may be included in the backup later. This is always
   the same value as backup_streamed
   once the amount of data streamed exceeds the estimated
-  total size.
+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is 0.
  
 
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 29bf2f9b97..f3d05b4659 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -460,20 +460,14 @@ PostgreSQL documentation
 in this case the estimated target size will increase once it passes the
 total estimate without WAL.

-   
-When this is enabled, the backup will start by enumerating the size of
-the entire database, and then go back and send the actual contents.
-This may make the backup take slightly longer, and in particular it
-will take longer before the first data is sent.
-   

 Whether this is enabled or not, the
 pg_stat_progress_basebackup view
-report the progress of the backup in the server side. But note
-that the total amount of data that will be streamed is estimated
-and reported only when this option is enabled. In other words,
-backup_total column in the view always
-indicates 0 if this option is disabled.
+report the progress of the backup in the server side.
+   
+   
+This option is not allowed when using
+--no-estimate-size.

   
  
@@ -552,6 +546,30 @@ PostgreSQL documentation

   
  
+
+ 
+  --no-estimate-size
+  
+   
+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view indicates 0 if this option is enabled.
+   
+   
+When this is disabled, the backup will start by enumerating
+the size of the entire database, and then go back and send
+the actual contents. This may make the backup take slightly
+longer, and in particular it will take longer before the first
+data is sent. This option is useful to avoid such estimation
+time if it's too long.
+   
+   
+This option is not allowed when using --progress.
+   
+  
+ 
 

 
@@ -767,6 +785,13 @@ PostgreSQL documentation
permissions are enabled on the source cluster.
   
 
+  
+   pg_basebackup asks the server to estimate
+   the total amount of data that will be streamed by default (unless
+   --no-estimate-size is specified) in version 13 or later,
+   and does that only when --progress is specified in
+   the older versions.
+  
  
 
  
diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 48bd838803..1032b474b4 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -102,6 +102,11 @@ typedef void (*WriteDataCallback) (size_t nbytes, char 
*buf,
  */
 #define MINIMUM_VERSION_FOR_TEMP_SLOTS 

Re: backend type in log_line_prefix?

2020-03-10 Thread Kuntal Ghosh
On Tue, Mar 10, 2020 at 9:11 PM Peter Eisentraut
 wrote:
> On 2020-03-09 16:20, Kuntal Ghosh wrote:
> > In v3-0001-Refactor-ps_status.c-API.patch,
> > - * postgres: walsender   
> > This part is still valid, right?
> Sure but I figured this comment was in the context of the explanation of
> how the old API was being abused, so it's no longer necessary.
>
Makes sense.

> set_ps_display() checks update_process_title itself and does nothing if
> it's off, so this should work okay.
Right.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Conflict handling for COPY FROM

2020-03-10 Thread Alexey Kondratov

On 09.03.2020 15:34, Surafel Temesgen wrote:


okay attached is a rebased patch with it



+    Portal        portal = NULL;
...
+        portal = GetPortalByName("");
+        SetRemoteDestReceiverParams(dest, portal);

I think that you do not need this, since you are using a ready 
DestReceiver. The whole idea of passing DestReceiver down to the 
CopyFrom was to avoid that code. This unnamed portal is created in the 
exec_simple_query [1] and has been already set to the DestReceiver there 
[2].


Maybe I am missing something, but I have just removed this code and 
everything works just fine.


[1] 
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1178


[2] 
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1226



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: [PATCH] Connection time for \conninfo

2020-03-10 Thread David Steele

Hi Rodrigo,

On 2/27/20 4:21 AM, Peter Eisentraut wrote:
My opinion is that this is not particularly useful and not appropriate 
to piggy-back onto \conninfo.  Connection information including host, 
port, database, user name is a well-established concept in PostgreSQL 
programs and tools and it contains a delimited set of information. 
Knowing what server and what database you are connected to also seems 
kind of important.  Moreover, this is information that is under control 
of the client, so it must be tracked on the client side.


Knowing how long you've been connected on the other hand seems kind of 
fundamentally unimportant.  If we add that, what's to stop us from 
adding other statistics of minor interest such as how many commands 
you've run, how many errors there were, etc.  The connection time is 
already available, and perhaps we should indeed make it a bit easier to 
get, but it doesn't need to be a psql command.


The consensus from the committers (with the exception of Álvaro) seems 
to be that this is not a feature we want. FWIW, I agree with the majority.


I'll mark this as rejected on MAR-16 unless Álvaro makes an argument for 
committing it.


Regards,
--
-David
da...@pgmasters.net




Re: backend type in log_line_prefix?

2020-03-10 Thread Julien Rouhaud
On Tue, Mar 10, 2020 at 4:41 PM Peter Eisentraut
 wrote:
>
> On 2020-03-09 16:20, Kuntal Ghosh wrote:
> > In v3-0002-Unify-several-ways-to-tracking-backend-type.patch,

In pgstat_get_backend_desc(), the fallback "unknown process type"
description shouldn't be required anymore.

Other than that, it all looks good to me.




Re: Add absolute value to dict_int

2020-03-10 Thread Tom Lane
I wrote:
> So all we really need to do is upgrade [de]serialize_deflist to be smarter
> about int and float nodes.  This is still a bit invasive because somebody
> decided to make deserialize_deflist serve two masters, and I don't feel
> like working through whether the prsheadline code would cope nicely with
> non-string output nodes.  So the first patch attached adds a flag argument
> to deserialize_deflist to tell it whether to force all the values to
> strings.

On closer inspection, it doesn't seem that changing the behavior for
prsheadline will make any difference.  The only extant code that
reads that result is prsd_headline which always uses defGetString,
and probably any third-party text search parsers would too.
So I've pushed this without the extra flag argument.

regards, tom lane




Re: improve transparency of bitmap-only heap scans

2020-03-10 Thread David Steele

Hi Jeff,

On 2/7/20 10:22 AM, Alexey Bashtanov wrote:
I've changed it all to "unfetched" for at least not to call the same 
thing differently
in the code and in the output, and also rebased it and fit in 80 lines 
width limit.


What do you think of Alexey's updates?

Regards,
--
-David
da...@pgmasters.net




Re: range_agg

2020-03-10 Thread David Fetter
On Mon, Mar 09, 2020 at 06:34:04PM -0700, Jeff Davis wrote:
> On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote:
> > Actually ... have you given any thought to just deciding that ranges
> > and
> > multiranges are the same type? 
> 
> It has come up in a number of conversations, but I'm not sure if it was
> discussed on this list.
> 
> > I think on the whole the advantages win,
> > and I feel like that might also be the case here.
> 
> Some things to think about:
> 
> 1. Ranges are common -- at least implicitly -- in a lot of
> applications/systems. It's pretty easy to represent extrernal data as
> ranges in postgres, and also to represent postgres ranges in external
> systems. But I can see multiranges causing friction around a lot of
> common tasks, like displaying in a UI. If you only expect ranges, you
> can add a CHECK constraint, so this is annoying but not necessarily a
> deal-breaker.

It could become well and truly burdensome in a UI or an API. The
difference between one, as ranges are now, and many, as multi-ranges
would be if we shoehorn them into the range type, are pretty annoying
to deal with.

> 2. There are existing client libraries[1] that support range types and
> transform them to types within the host language. Obviously, those
> would need to be updated to expect multiple ranges.

The type systems that would support such types might get unhappy with
us if we started messing with some of the properties like
contiguousness.

> 3. It seems like we would want some kind of base "range" type. When you
> try to break a multirange down into constituent ranges, what type would
> those pieces be? (Aside: how do you get the constituent ranges?)
> 
> I'm thinking more about casting to see if there's a possible compromise
> there.

I think the right compromise is to recognize that the closure of a set
(ranges) over an operation (set union) may well be a different set
(multi-ranges). Other operations have already been proposed, complete
with concrete use cases that could really make PostgreSQL stand out.

That we don't have an obvious choice of "most correct" operation over
which to close ranges makes it even bigger a potential foot-gun
when we choose one arbitrarily and declare it to be the canonical one.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: time for catalog/pg_cast.c?

2020-03-10 Thread Alvaro Herrera
On 2020-Mar-09, Alvaro Herrera wrote:

> I extracted from the latest multirange patch a bit that creates a new
> routine CastCreate() in src/backend/catalog/pg_cast.c.  It contains the
> catalog-accessing bits to create a new cast.  It seems harmless, so I
> thought I'd apply it to get rid of a couple of hunks in the large patch.

Pushed, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread Andy Fan
Hi Tom & David & Bapat:

Thanks for your review so far.  I want to summarize the current issues to
help
our following discussion.

1. Shall we bypass the AggNode as well with the same logic.

I think yes, since the rules to bypass a AggNode and UniqueNode is exactly
same.
The difficulty of bypassing AggNode is the current aggregation function
call is closely
coupled with AggNode.  In the past few days, I have make the aggregation
call can
run without AggNode (at least I tested sum(without finalized  fn),  avg
(with finalized fn)).
But there are a few things to do, like acl check,  anynull check and maybe
more check.
also there are some MemoryContext mess up need to fix.
I still need some time for this goal,  so I think the complex of it
deserves another thread
to discuss it,  any thought?

2. Shall we used the UniquePath as David suggested.

Actually I am trending to this way now.  Daivd, can you share more insights
about the
benefits of UniquePath?  Costing size should be one of them,  another one
may be
changing the semi join to normal join as the current innerrel_is_unique
did.  any others?

3. Can we make the rule more general?

Currently it requires every relation yields a unique result.  Daivd & Bapat
provides another example:
select m2.pk from m1, m2 where m1.pk = m2.non_unqiue_key. That's
interesting and not easy to
handle in my current framework.  This is another reason I want to take the
UniquePath framework.

Do we have any other rules to think about before implementing it?

Thanks for your feedback.


> This should be ok. The time spent in annotating a RelOptInfo about
> uniqueness is not going to be a lot. But doing so would help generic
> elimination of Distinct/Group/Unique operations. What is
> UniquePathKey; I didn't find this in your patch or in the code.
>
> This is a proposal from David,  so not in current patch/code :)

Regards
Andy Fan


Re: backend type in log_line_prefix?

2020-03-10 Thread Peter Eisentraut

On 2020-03-09 16:20, Kuntal Ghosh wrote:

In v3-0001-Refactor-ps_status.c-API.patch,

- *
- * For a walsender, the ps display is set in the following form:
- *
- * postgres: walsender   
This part is still valid, right?


Sure but I figured this comment was in the context of the explanation of 
how the old API was being abused, so it's no longer necessary.



+ init_ps_display(ps_data.data);
+ pfree(ps_data.data);
+
+ set_ps_display("initializing");
As per the existing behaviour, if update_process_title is true, we
display "authentication" as the initial string. On the other hand,
this patch, irrespective of the GUC variable, always displays
"initializing" as the initial string and In PerformAuthentication, it
sets the display as "authentication" Is this intended? Should we check
the GUC here as well?


set_ps_display() checks update_process_title itself and does nothing if 
it's off, so this should work okay.



In v3-0002-Unify-several-ways-to-tracking-backend-type.patch,
+ * If fixed_part is NULL, a default will be obtained from BackendType.
s/BackendType/MyBackendType?


yup


In v3-0003-Add-backend-type-to-csvlog-and-optionally-log_lin.patch,
+ Backend process type
In other places you've written "backend type".


ok changed



In v3-0004-Remove-am_syslogger-global-variable.patch,
+ * This is exported so that elog.c can call it when BackendType is B_LOGGER.
s/BackendType/MyBackendType?


ok

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-10 Thread Alvaro Herrera
On 2020-Mar-10, Michael Paquier wrote:

> On Tue, Mar 10, 2020 at 01:05:40PM +0300, Alexander Korotkov wrote:
> > Two options seem reasonable to me in this case.  The first is to pass
> > length as additional argument as you did.  The second option is to
> > make argument a pointer to fixed-size array as following.

Another option is to return the command as a palloc'ed string (per
psprintf), instead of using a caller-stack-allocated variable.  Passing
the buffer len is widely used, but more error prone (and I think getting
this one wrong might be more catastrophic than a mistake elsewhere.)
This is not a performance-critical path enough that we *need* the
optimization that avoids the palloc is important.  (Failure can be
reported by returning NULL.)  Also, I think the function comment could
stand some more detailing.

Also, I think Msvcbuild.pm could follow Makefile's ideas of one line per
file.  Maybe no need to fix all of that in this patch, but let's start
by adding the new file it its own line rather than reflowing two
adjacent lines (oh wait ... does perltidy put it that way?  if so,
nevermind.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-03-10 Thread Corey Huinker
This latest version is an attempt at merging the work of Jürgen Purtz into
what I had posted earlier. There was relatively little overlap in the terms
we had chosen to define.

Each glossary definition now has a reference id (good idea Jürgen), the
form of which is "glossary-term". So we can link to the glossary from
outside if we so choose.

I encourage everyone to read the definitions, and suggest fixes to any
inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on
the structure itself, and hoping to get that committed.


On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker 
wrote:

> It seems like this could be a good idea, still the patch has been
>> waiting on his author for more than two weeks now, so I have marked it
>> as returned with feedback.
>>
>
> In light of feedback, I enlisted the help of an actual technical writer
> (Roger Harkavy, CCed) and we eventually found the time to take a second
> pass at this.
>
> Attached is a revised patch.
>
>
From 690473e51fc442c55c1744f69813795fce9d22dc Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1008 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1010 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..851e9debe6
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1008 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible is the values in common, and the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken up into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a Tuple or Relation or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Continuously writes dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating fewer large I/O peaks which could block other processes.
+ 
+ 
+  For more information, see Server Configuration: Resource Consumption.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A conversion of a Datum from its current data type to another data type.
+ 
+
+   
+
+ 
+Catalog
+
+ 
+  The SQL standard uses this standalone term to indicate what is called a
+Database in PostgreSQL's terminology.
+ 
+

Re: Improve search for missing parent downlinks in amcheck

2020-03-10 Thread Alexander Korotkov
On Tue, Mar 10, 2020 at 3:07 AM Peter Geoghegan  wrote:
> On Sun, Mar 8, 2020 at 2:52 PM Alexander Korotkov
>  wrote:
> > I've revised this comment.  Hopefully it's better now.
>
> I think that the new comments about why we need a low key for the page
> are much better now.

Good, thank you.

> > I've updated this comment using terms "cousin page" and "subtree".  I
> > didn't refer the diagram example, because it doesn't contain
> > appropriate case.  And I wouldn't like this diagram to contain such
> > case, because that probably makes this diagram too complex.  I've also
> > invented term "uncle page". BTW, should it be "aunt page"?  I don't
> > know.
>
> I have never heard the term "uncle page" before, but I like it --
> though maybe say "right uncle page". That happens to be the exact
> relationship that we're talking about here. I think any one of
> "uncle", "aunt", or "uncle/aunt" are acceptable. We'll probably never
> need to use this term again, but it seems like the right term to use
> here.

According to the context, that should be the left uncle page.  I've
changed the text accordingly.

> Anyway, this section also seems much better now.
>
> Other things that I noticed:
>
> * Typo:
>
> > +   /*
> > +* We don't call bt_child_check() for "negative infinity" items.
> > +* But if we're performatin downlink connectivity check, we do 
> > it
> > +* for every item including "negative infinity" one.
> > +*/
>
> s/performatin/performing/

Fixed.

> * Suggest that you say "has incomplete split flag set" here:
>
> > + * - The call for block 4 will initialize data structure, but doesn't do 
> > actual
> > + *   checks assuming page 4 has incomplete split.

Yes, that sounds better.  Changed here and in the other places.

> * More importantly, is this the right thing to say about page 4? Isn't
> it also true that page 4 is the leftmost leaf page, and therefore kind
> of special in another way? Even without having the incomplete split
> flag set at all? Wouldn't it be better to address the incomplete split
> flag issue by making that apply to some other page that isn't also the
> leftmost? That would allow you to talk about the leftmost case
> directly here. Or it would at least make it less confusing.

Yes, the current example looks confusing in this respect.  But your comment
pointed me to an algorithmic issue.  We don't match the high key of the
leftmost child against the parent pivot key.  But we can.  The "if
(!BlockNumberIsValid(blkno))" branch survived from the patch version
in which we didn't match high keys.  I've revised it.  Now we enter the
loop even for the leftmost page on the child level and match the high key
for that page.

> BTW, a P_LEFTMOST() assertion at the beginning of
> bt_child_highkey_check() would make this easier to follow.

Yes, but why should it be an assert?  We can imagine corruption where
there is a left sibling of the first child of the leftmost target.  I guess
the current code would report such a situation as an error, because this
left sibling lacks a parent downlink.  I've revised that "if" branch,
so we don't load a child page there anymore.  Error reporting is added
to the main loop.

> * Correct spelling is "happens" here:
>
> > +* current child page is not incomplete split, then its high key
> > +* should match to the target's key of current offset number. 
> > This
> > +* happends when child page referenced by previous downlink is
>
> * Actually, maybe this whole sentence should be reworded instead --
> say "This happens when a previous call here (to
> bt_child_highkey_check()) found an incomplete split, and we reach a
> right sibling page without a downlink -- the right sibling page's high
> key still needs to be matched to a separator key on the parent/target
> level".
>
> * Maybe say "Don't apply OffsetNumberNext() to target_downlinkoffnum
> when we already had to step right on the child level. Our traversal of
> the child level must try to move in perfect lockstep behind (to the
> left of) the target/parent level traversal."
>
> I found this detail very confusing at first.
>
> * The docs should say "...relationships, including checking that there
> are no missing downlinks in the index structure" here:
>
> >unlike bt_index_check,
> >bt_index_parent_check also checks
> > -  invariants that span parent/child relationships.
> > +  invariants that span parent/child relationships including check
> > +  that there are no missing downlinks in the index structure.
> >bt_index_parent_check follows the general
> >convention of raising an error if it finds a logical
> >inconsistency or other problem.

These comments are revised as you proposed.

> This is very close now. I would be okay with you committing the patch
> once you deal with this feedback. If you prefer, I can take another
> look at a new revision.

Thank you.  I'd like to have more feedback from you assuming there
are logic 

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-10 Thread Alexander Korotkov
On Tue, Mar 10, 2020 at 3:44 PM Michael Paquier  wrote:
> On Tue, Mar 10, 2020 at 01:05:40PM +0300, Alexander Korotkov wrote:
> > Two options seem reasonable to me in this case.  The first is to pass
> > length as additional argument as you did.  The second option is to
> > make argument a pointer to fixed-size array as following.
> >
> > extern bool BuildRestoreCommand(const char *restoreCommand,
> > const char *xlogpath,   /* %p */
> > const char *xlogfname,  /* %f */
> > const char *lastRestartPointFname,  /* %r */
> > char (*commandResult)[MAXPGPATH]);
> >
> > Passing pointer to array of different size would cause an error.  The
> > downside of this approach is that passing palloc'd chunk of memory as
> > commandResult would be less convenient.
>
> Thanks Alexander for the input.  Another argument is that what you are
> suggesting here is not the usual project style, so I would still stick
> with two arguments to be consistent.

Yes, another argument is valid as well.  I'm OK with current solution.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: time for catalog/pg_cast.c?

2020-03-10 Thread Alvaro Herrera
I would even say that DropCastById belongs in the new file, which is
just the attached.  However, none of the Drop.*ById or Remove.*ById
functions seem to be in backend/catalog/ at all, and moving just a
single one seems to make things even more inconsistent.  I think all
these catalog-accessing functions should be in backend/catalog/ but I'm
not in a hurry to patch half of backend/commands/ to move them all.

(I think the current arrangement is just fallout from having created the
dependency.c system to drop objects, which rid us of a bunch of bespoke
deletion-handling code.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/pg_cast.c b/src/backend/catalog/pg_cast.c
index 3854455637..bd2a369aad 100644
--- a/src/backend/catalog/pg_cast.c
+++ b/src/backend/catalog/pg_cast.c
@@ -14,7 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/htup_details.h"
+#include "access/skey.h"
 #include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -24,6 +26,7 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -121,3 +124,29 @@ CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext,
 
 	return myself;
 }
+
+void
+DropCastById(Oid castOid)
+{
+	Relation	relation;
+	ScanKeyData scankey;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	relation = table_open(CastRelationId, RowExclusiveLock);
+
+	ScanKeyInit(&scankey,
+Anum_pg_cast_oid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(castOid));
+	scan = systable_beginscan(relation, CastOidIndexId, true,
+			  NULL, 1, &scankey);
+
+	tuple = systable_getnext(scan);
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "could not find tuple for cast %u", castOid);
+	CatalogTupleDelete(relation, &tuple->t_self);
+
+	systable_endscan(scan);
+	table_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 5eac55aaca..355e93840a 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1645,33 +1645,6 @@ CreateCast(CreateCastStmt *stmt)
 	return myself;
 }
 
-void
-DropCastById(Oid castOid)
-{
-	Relation	relation;
-	ScanKeyData scankey;
-	SysScanDesc scan;
-	HeapTuple	tuple;
-
-	relation = table_open(CastRelationId, RowExclusiveLock);
-
-	ScanKeyInit(&scankey,
-Anum_pg_cast_oid,
-BTEqualStrategyNumber, F_OIDEQ,
-ObjectIdGetDatum(castOid));
-	scan = systable_beginscan(relation, CastOidIndexId, true,
-			  NULL, 1, &scankey);
-
-	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for cast %u", castOid);
-	CatalogTupleDelete(relation, &tuple->t_self);
-
-	systable_endscan(scan);
-	table_close(relation, RowExclusiveLock);
-}
-
-
 static void
 check_transform_function(Form_pg_proc procstruct)
 {
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 2620ff40f0..78ba395f17 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -95,5 +95,6 @@ extern ObjectAddress CastCreate(Oid sourcetypeid,
 char castcontext,
 char castmethod,
 DependencyType behavior);
+extern void DropCastById(Oid castOid);
 
 #endif			/* PG_CAST_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c77c9a6ed5..1d22cc8083 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -56,7 +56,6 @@ extern ObjectAddress CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt
 extern void RemoveFunctionById(Oid funcOid);
 extern ObjectAddress AlterFunction(ParseState *pstate, AlterFunctionStmt *stmt);
 extern ObjectAddress CreateCast(CreateCastStmt *stmt);
-extern void DropCastById(Oid castOid);
 extern ObjectAddress CreateTransform(CreateTransformStmt *stmt);
 extern void DropTransformById(Oid transformOid);
 extern void IsThereFunctionInNamespace(const char *proname, int pronargs,


Re: WIP: System Versioned Temporal Table

2020-03-10 Thread David Steele

On 3/10/20 9:00 AM, Surafel Temesgen wrote:
On Tue, Mar 3, 2020 at 9:33 PM David Steele > wrote:


The patch is not exactly new for this CF but since the first version
was
posted 2020-01-01 and there have been no updates (except a rebase)
since
then it comes pretty close.

Were you planning to work on this for PG13?  If so we'll need to see a
rebased and updated patch pretty soon.  My recommendation is that we
move this patch to PG14.

I agree with moving to  PG14 . Its hard to make it to PG13.


The target version is now PG14.

Regards,
--
-David
da...@pgmasters.net




Re: Add an optional timeout clause to isolationtester step.

2020-03-10 Thread Julien Rouhaud
On Tue, Mar 10, 2020 at 12:09:12AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Mar 09, 2020 at 10:32:27PM -0400, Tom Lane wrote:
> >> It strikes me to wonder whether we could improve matters by teaching
> >> isolationtester to watch for particular values in a connected backend's
> >> pg_stat_activity.wait_event_type/wait_event columns.  Those columns
> >> didn't exist when isolationtester was designed, IIRC, so it's not
> >> surprising that they're not used in the current design.  But we could
> >> use them perhaps to detect that a backend has arrived at some state
> >> that's not a heavyweight-lock-wait state.
>
> > Interesting idea.  So that would be basically an equivalent of
> > PostgresNode::poll_query_until but for the isolation tester?
>
> No, more like the existing isolationtester wait query, which watches
> for something being blocked on a heavyweight lock.  Right now, that
> one depends on a bespoke function pg_isolation_test_session_is_blocked(),
> but it used to be a query on pg_stat_activity/pg_locks.

Ah interesting indeed!

> > In short
> > we gain a meta-command that runs a SELECT query that waits until the
> > query defined in the command returns true.  The polling interval may
> > be tricky to set though.
>
> I think it'd be just the same as the polling interval for the existing
> wait query.  We'd have to have some way to mark a script step to say
> what to check to decide that it's blocked ...

So basically we could just change pg_isolation_test_session_is_blocked() to
also return the wait_event_type and wait_event, and adding something like

step "<name>" { <SQL> } [ cancel on "<wait_event_type>" "<wait_event>" ]

to the step definition should be enough.  I'm attaching a POC patch for that.
On my laptop, the full test now complete in about 400ms.

FTR the REINDEX TABLE CONCURRENTLY case is eventually locked on a virtualxid;
I'm not sure if that could lead to too early a cancellation.
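
For illustration, a step using the proposed clause might look like this
(the spec names here are made up; the wait event is the virtualxid lock
mentioned above):

	step "s2ric" { REINDEX TABLE CONCURRENTLY tab; } cancel on "Lock" "virtualxid"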
>From 77c214c9fb2fa7f9a003b96db0e5ec6506217e38 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 6 Mar 2020 13:40:56 +0100
Subject: [PATCH v2 1/2] Add an optional cancel on clause to isolationtester
 steps.

Some sanity checks can require a command to wait on a lock and eventually be
cancelled.  The only way to do that was to rely on calls to pg_cancel_backend()
filtering pg_stat_activity view, but this isn't a satisfactory solution as
there's no way to guarantee that only the wanted backend will be canceled.

Instead, add a new optional "cancel on <wait_event_type> <wait_event>" clause
to the step definition.  When this clause is specified, isolationtester will
actively wait on that command, and issue a cancel when the query is waiting on
the given wait event.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/20200305035354.GQ2593%40paquier.xyz
---
 src/backend/utils/adt/lockfuncs.c| 54 ++--
 src/include/catalog/pg_proc.dat  |  5 ++-
 src/test/isolation/README| 12 +--
 src/test/isolation/isolationtester.c | 43 +-
 src/test/isolation/isolationtester.h |  8 +
 src/test/isolation/specparse.y   | 20 +--
 src/test/isolation/specscanner.l |  2 ++
 7 files changed, 127 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index ecb1bf92ff..cc2bc94d0d 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -17,7 +17,9 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "storage/predicate_internals.h"
+#include "storage/procarray.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -578,17 +580,28 @@ pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
 Datum
 pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 {
+	TupleDesc	tupdesc;
 	int			blocked_pid = PG_GETARG_INT32(0);
 	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
 	ArrayType  *blocking_pids_a;
 	int32	   *interesting_pids;
 	int32	   *blocking_pids;
+	Datum		values[3];
+	bool		nulls[3];
+	PGPROC	   *proc;
+	uint32		raw_wait_event = 0;
+	const char *wait_event_type = NULL;
+	const char *wait_event = NULL;
 	int			num_interesting_pids;
 	int			num_blocking_pids;
 	int			dummy;
 	int			i,
 j;
 
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
+
 	/* Validate the passed-in array */
 	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
 	if (array_contains_nulls(interesting_pids_a))
@@ -597,6 +610,34 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
 		  ARR_DIMS(interesting_pids_a));
 
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(3);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "blocked",
+	   BOOLOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wait_event_type",
+	   TEXTOID, -1, 0);

Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread Ashutosh Bapat
Hi Andy,

On Tue, Mar 10, 2020 at 1:49 PM Andy Fan  wrote:
>
> Hi David:
>
>>
>> 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
>> rel itself (I've not looked for the best location in detail),
>> determine if the join can cause rows to be duplicated. If it can't,
>> then add the UniqueKeys from that rel.
>
>
> I have some concerns about this method,  maybe I misunderstand
> something, if so, please advise.
>
> In my current implementation, it calculates the uniqueness for each
> BaseRel only, but in your way, it looks like we need to calculate the
> UniquePathKey for both BaseRel and JoinRel.   This makes a bigger
> difference for multi-table joins.

I didn't understand this concern. I think, it would be better to do it
for all kinds of relation types base, join etc. This way we are sure
that one method works across the planner to eliminate the need for
Distinct or grouping. If we just implement something for base
relations right now and don't do that for joins, there is a chance
that that method may not work for joins when we come to implement it.

> Another concern is that UniquePathKey
> is designed for a general purpose, so we would need to maintain it
> regardless of whether a distinctClause/groupbyClause is present.

This should be ok. The time spent in annotating a RelOptInfo about
uniqueness is not going to be a lot. But doing so would help generic
elimination of Distinct/Group/Unique operations. What is
UniquePathKey? I didn't find this in your patch or in the code.

>
>
>>
>>  For example: SELECT * FROM t1
>> INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
>> {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
>> since it's an eqijoin and t1.unique has a unique index).
>
>
> Thanks for raising this.  My current rule requires that *every* relation
> yield a unique result, *no matter* what the join method is.  Actually I
> want to make the rule less strict; for example, we may just need one table
> to yield a unique result, and the final result will be unique as well
> under some join types.

That is desirable.

>
> As for t1 INNER JOIN t2 ON t1.unique = t2.not_unique, it looks like we
> can't remove the distinct based on this.
>
> create table m1(a int primary key, b int);
> create table m2(a int primary key, b int);
> insert into m1 values(1, 1), (2, 1);
> insert into m2 values(1, 1), (2, 1);
> select distinct m1.a from m1, m2 where m1.a = m2.b;

IIUC, David's rule is the other way round. "select distinct m2.a from m1,
m2 where m1.a = m2.b" won't need a DISTINCT node, since the result of
joining m1 and m2 has a unique value of m2.a for each row. In your
example the join will produce two rows (m1.a, m1.b, m2.a, m2.b): (1, 1,
1, 1) and (1, 1, 2, 1), where m2.a is the unique key.
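
To spell that out with the tables above (a sketch; the two result rows
follow from the INSERTs):

	-- join result: (1, 1, 1, 1) and (1, 1, 2, 1)
	select m1.a, m1.b, m2.a, m2.b from m1, m2 where m1.a = m2.b;
	-- m2.a is distinct in every output row, so DISTINCT on m2.a is free;
	-- m1.a repeats, so DISTINCT on m1.a still needs a Unique step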

--
Best Wishes,
Ashutosh Bapat




Ecpg dependency

2020-03-10 Thread Filip Janus
Hello,
After upgrading from 11.2 to 12.2, I found that the build of the ecpg
component depends on pgcommon_shlib and pgport_shlib, but building ecpg
doesn't also build pgcommon_shlib and pgport_shlib. That means that if I
want to build ecpg, I first need to build pgcommon_shlib and pgport_shlib,
and only after that am I able to build ecpg.

I would like to ask whether this behavior is expected or not, because
previous versions didn't require these separate builds.
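
For what it's worth, a workaround sketch, assuming an in-tree build run
from the top of the source tree:

	make -C src/port
	make -C src/common
	make -C src/interfaces/ecpg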

Thanks
Filip Januš


Re: Improve handling of parameter differences in physical replication

2020-03-10 Thread Peter Eisentraut

On 2020-03-10 09:57, Kyotaro Horiguchi wrote:

Well, I meant to periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and stopping
recovery. In this situation the user needs to notice that as soon as
possible.


If we lose the connection, the standby continues to complain about the
lost connection every 5 seconds.  This is a situation of that kind.


My argument is that it's not really the same.  If a standby is 
disconnected for more than a few minutes, it's really not going to be a 
good standby anymore after a while.  In this case, however, having 
certain parameter discrepancies is really harmless and you can run with 
it for a long time.  I'm not strictly opposed to a periodic warning, but 
it's unclear to me how we would find a good interval.



By the way, when I reduced max_connections only on the master and then
took exclusive locks until the standby complained about lock exhaustion,
I saw a WARNING about max_locks_per_transaction instead of
max_connections.


WARNING:  insufficient setting for parameter max_connections
DETAIL:  max_connections = 2 is a lower setting than on the master server 
(where its value was 3).
HINT:  Change parameters and restart the server, or there may be resource 
exhaustion errors sooner or later.
CONTEXT:  WAL redo at 0/6A0 for XLOG/PARAMETER_CHANGE: max_connections=3 
max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 
max_locks_per_xact=10 wal_level=replica wal_log_hints=off 
track_commit_timestamp=off
WARNING:  recovery paused because of insufficient setting of parameter 
max_locks_per_transaction (currently 10)
DETAIL:  The value must be at least as high as on the primary server.
HINT:  Recovery cannot continue unless the parameter is changed and the server 
restarted.
CONTEXT:  WAL redo at 0/6004A80 for Standb


This is all a web of half-truths.  The lock tables are sized based on 
max_locks_per_xact * (MaxBackends + max_prepared_xacts).  So if you run 
out of lock space, we currently recommend (in the single-server case), 
that you raise max_locks_per_xact, but you could also raise 
max_prepared_xacts or something else.  So this is now the opposite case 
where the lock table on the master was bigger because of max_connections.
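
To put numbers on it (illustrative only): with max_locks_per_xact = 10,
MaxBackends = 100, and max_prepared_xacts = 0, the lock table holds
10 * (100 + 0) = 1000 locks.  Lowering max_connections on the standby
therefore shrinks its lock table even though max_locks_per_transaction
itself is unchanged, which is why the warning can name a parameter the
user never touched.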


We could make the advice less specific and just say, in essence, you 
need to make some parameter changes; see earlier for some hints.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Amit Langote
On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  wrote:
> > So, I will make the patch adding support for --no-estimate-size option
> > in pg_basebackup.
>
> Patch attached.

Like the idea and the patch looks mostly good.

+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.

"always" seems unnecessary.

+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.

Here too.
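
For reference, the option under discussion would be used like this (a
sketch, using the patch's spelling):

	pg_basebackup -D /path/to/backup --no-estimate-size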

Thanks,
Amit




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-10 Thread Robert Haas
On Sat, Mar 7, 2020 at 10:23 AM Tom Lane  wrote:
> I continue to think that we'd be better off getting all of this
> out of the heavyweight lock manager.  There is no reason why we
> should need deadlock detection, or multiple holds of the same
> lock, or pretty much anything that LWLocks don't give you.

Well, that was my initial inclination too, but Andres didn't like it.
I don't know whether it's better to take his advice or yours.

The one facility that we need here which the heavyweight lock facility
does provide and the lightweight lock facility does not is the ability
to take locks on an effectively unlimited number of distinct objects.
That is, we can't have a separate LWLock for every relation, because
there are ~2^32 relation OIDs per database, and ~2^32 database OIDs, and a
patch that tried to allocate a tranche of 2^64 LWLocks would probably
get shot down.

The patch I wrote for this tried to work around this by having an
array of LWLocks and hashing <database, relation> pairs onto array slots.
This produces some false sharing, though, which Andres didn't like
(and I can understand his concern). We could work around that problem
with a more complex design, where the LWLocks in the array do not
themselves represent the right to extend the relation, but only
protect the list of lockers. But at that point it starts to look like
you are reinventing the whole LOCK/PROCLOCK division.

So from my point of view we've got three possible approaches here, all
imperfect:

- Hash <database, relation> pairs onto an array of LWLocks that represent the
right to extend the relation. Problem: false sharing for the whole
time the lock is held.

- Hash <database, relation> pairs onto an array of LWLocks that protect a list of
lockers. Problem: looks like reinventing LOCK/PROCLOCK mechanism,
which is a fair amount of complexity to be duplicating.

- Adapt the heavyweight lock manager. Problem: Code is old, complex,
grotty, and doesn't need more weird special cases.
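
For concreteness, a minimal sketch of the first approach (the array size
and names are invented, and this ignores the false-sharing problem
described above):

	/* requires storage/lwlock.h and utils/hashutils.h */
	#define N_REL_EXT_LOCKS 1024	/* assumed fixed-size tranche */

	static LWLockPadded *RelExtLockArray;	/* allocated in shared memory */

	static inline LWLock *
	RelExtLockFor(Oid dbid, Oid relid)
	{
		uint32		h = hash_combine(murmurhash32(dbid), murmurhash32(relid));

		return &RelExtLockArray[h % N_REL_EXT_LOCKS].lock;
	}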

Whatever we choose, I think we ought to try to get Page locks and
Relation Extension locks into the same system. They're conceptually
the same kind of thing: you're not locking an SQL object, you
basically want an LWLock, but you can't use an LWLock because you want
to lock an OID not a piece of shared memory, so you can't have enough
LWLocks to use them in the regular way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Restore replication settings when modifying a field type

2020-03-10 Thread Euler Taveira
On Thu, 5 Mar 2020 at 09:45, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-02-11 00:38, Quan Zongliang wrote:
> > new patch attached.
>
> I didn't like so much how the updating of the replica identity was
> hacked into the middle of ATRewriteCatalogs().  I have an alternative
> proposal in the attached patch that queues up an ALTER TABLE ... REPLICA
> IDENTITY command into the normal ALTER TABLE processing.  I have also
> added tests to the test suite.
>
LGTM. Tests are ok. I've rebased it (because of commit
61d7c7bce3686ec02bd64abac742dd35ed9b9b01). Are you planning to backpatch
it? IMHO you should, because it is a bug (since REPLICA IDENTITY was
introduced in 9.4). This patch can be applied as-is to 12 but not to
older branches. I attached new patches.
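
For reference, the scenario being fixed looks roughly like this (a
sketch; table and index names are made up):

	CREATE TABLE t (id int NOT NULL, v text);
	CREATE UNIQUE INDEX t_id_idx ON t (id);
	ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_idx;
	ALTER TABLE t ALTER COLUMN id TYPE bigint;  -- rewrites t_id_idx
	-- without the fix, the rebuilt index loses its indisreplident marking;
	-- with it, the REPLICA IDENTITY setting survives the rewrite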

Regards,


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e35463df17f25940eabe92142237c2a85fb15b6f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 5 Mar 2020 10:11:13 +0100
Subject: [PATCH] Preserve replica identity index across ALTER TABLE rewrite

---
 src/backend/commands/tablecmds.c  | 42 +
 src/backend/utils/cache/lsyscache.c   | 23 ++
 src/include/utils/lsyscache.h |  1 +
 .../regress/expected/replica_identity.out | 46 +++
 src/test/regress/sql/replica_identity.sql | 21 +
 5 files changed, 133 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 38386fb9cf..80725dcf92 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -175,6 +175,7 @@ typedef struct AlteredTableInfo
 	List	   *changedConstraintDefs;	/* string definitions of same */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
+	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
 } AlteredTableInfo;
 
 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -10314,6 +10315,22 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	return address;
 }
 
+/*
+ * Subroutine for ATExecAlterColumnType: remember that a replica identity
+ * needs to be reset.
+ */
+static void
+RememberReplicaIdentityForRebuilding(Oid indoid, AlteredTableInfo *tab)
+{
+	if (!get_index_is_replident(indoid))
+		return;
+
+	if (tab->replicaIdentityIndex)
+		elog(ERROR, "relation %u has multiple indexes marked as replica identity", tab->relid);
+
+	tab->replicaIdentityIndex = get_rel_name(indoid);
+}
+
 /*
  * Subroutine for ATExecAlterColumnType: remember that a constraint needs
  * to be rebuilt (which we might already know).
@@ -10332,11 +10349,16 @@ RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab)
 	{
 		/* OK, capture the constraint's existing definition string */
 		char	   *defstring = pg_get_constraintdef_command(conoid);
+		Oid			indoid;
 
 		tab->changedConstraintOids = lappend_oid(tab->changedConstraintOids,
  conoid);
 		tab->changedConstraintDefs = lappend(tab->changedConstraintDefs,
 			 defstring);
+
+		indoid = get_constraint_index(conoid);
+		if (OidIsValid(indoid))
+			RememberReplicaIdentityForRebuilding(indoid, tab);
 	}
 }
 
@@ -10379,6 +10401,8 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 indoid);
 			tab->changedIndexDefs = lappend(tab->changedIndexDefs,
 			defstring);
+
+			RememberReplicaIdentityForRebuilding(indoid, tab);
 		}
 	}
 }
@@ -10593,6 +10617,24 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
 		add_exact_object_address(, objects);
 	}
 
+	/*
+	 * Queue up command to restore replica identity index marking
+	 */
+	if (tab->replicaIdentityIndex)
+	{
+		AlterTableCmd *cmd = makeNode(AlterTableCmd);
+		ReplicaIdentityStmt *subcmd = makeNode(ReplicaIdentityStmt);
+
+		subcmd->identity_type = REPLICA_IDENTITY_INDEX;
+		subcmd->name = tab->replicaIdentityIndex;
+		cmd->subtype = AT_ReplicaIdentity;
+		cmd->def = (Node *) subcmd;
+
+		/* do it after indexes and constraints */
+		tab->subcmds[AT_PASS_OLD_CONSTR] =
+			lappend(tab->subcmds[AT_PASS_OLD_CONSTR], cmd);
+	}
+
 	/*
 	 * It should be okay to use DROP_RESTRICT here, since nothing else should
 	 * be depending on these objects.
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a4da935371..4a2510e2ed 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -3156,3 +3156,26 @@ get_range_collation(Oid rangeOid)
 	else
 		return InvalidOid;
 }
+
+/*
+ * get_index_is_replident
+ *
+ * Returns indisreplident field.
+ */
+bool
+get_index_is_replident(Oid index_oid)
+{
+	HeapTuple		tuple;
+	Form_pg_index	rd_index;
+	bool			result;
+
+	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(index_oid));
+	if (!HeapTupleIsValid(tuple))
+		return 

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-10 Thread Robert Haas
On Fri, Mar 6, 2020 at 11:27 PM Dilip Kumar  wrote:
> I think instead of the flag we need to keep the counter because we can
> acquire the same relation extension lock multiple times.  So
> basically, every time we acquire the lock we can increment the counter
> and while releasing we can decrement it.   During an error path, I
> think it is fine to set it to 0 in CommitTransaction/AbortTransaction.
> But, I am not sure that we can set to 0 or decrement it in
> AbortSubTransaction because we are not sure whether we have acquired
> the lock under this subtransaction or not.

I think that CommitTransaction, AbortTransaction, and friends have
*zero* business touching this. I think the counter - or flag - should
track whether we've got a PROCLOCK entry for a relation extension
lock. We either do, or we do not, and that does not change because of
anything having to do with the transaction state. It changes because
somebody calls LockRelease() or LockReleaseAll().
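
A sketch of that idea (the names here are invented, not a committed
design):

	/* count of relation-extension PROCLOCK entries we currently hold */
	static int	nRelExtLocksHeld = 0;

	/* in LockAcquire(), once such a PROCLOCK is granted: */
	nRelExtLocksHeld++;

	/* in LockRelease()/LockReleaseAll(), when it goes away: */
	nRelExtLocksHeld--;
	Assert(nRelExtLocksHeld >= 0);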

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: adding partitioned tables to publications

2020-03-10 Thread Amit Langote
On Tue, Mar 10, 2020 at 5:52 PM Peter Eisentraut
 wrote:
> I have committed the 0001 patch of this series (partitioned table member
> of publication).  I changed the new argument of
> GetPublicationRelations() to an enum and reformatted some comments.
> I'll continue looking through the subsequent patches.

Thank you.

- Amit




Re: WIP: System Versioned Temporal Table

2020-03-10 Thread Surafel Temesgen
Hi,

On Tue, Mar 3, 2020 at 9:33 PM David Steele  wrote:

> Hi Surafel,
>
> On 1/3/20 5:57 AM, Surafel Temesgen wrote:
> > Rebased and conflict resolved i hope it build clean this time
>
> This patch no longer applies according to cfbot and there are a number
> of review comments that don't seem to have been addressed yet.
>
> The patch is not exactly new for this CF but since the first version was
> posted 2020-01-01 and there have been no updates (except a rebase) since
> then it comes pretty close.
>
> Were you planning to work on this for PG13?  If so we'll need to see a
> rebased and updated patch pretty soon.  My recommendation is that we
> move this patch to PG14.
>
>
I agree with moving to  PG14 . Its hard to make it to PG13.

regards
Surafel


Re: shared-memory based stats collector

2020-03-10 Thread Alvaro Herrera
On 2020-Mar-10, Kyotaro Horiguchi wrote:

> At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund  wrote 
> in 
> > On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > > That's true, but I have the same concern as Tom. The archiver became
> > > more tightly linked with other processes than its actual relationship
> > > warrants.
> > 
> > What's the problem here? We have a number of helper processes
> > (checkpointer, bgwriter) that are attached to shared memory, and it's
> > not a problem.
> 
> That theoretically raises the chance of a server crash by a small
> amount. But, yes, it's absurd to premise that the archiver process
> crashes.

The case I'm worried about is a misconfigured archive_command that
causes the archiver to misbehave (exit with a code other than 0); if
that already doesn't happen, or we can make it not happen, then I'm okay
with the changes to archiver.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-10 Thread Michael Paquier
On Tue, Mar 10, 2020 at 01:05:40PM +0300, Alexander Korotkov wrote:
> Two options seem reasonable to me in this case.  The first is to pass
> length as additional argument as you did.  The second option is to
> make argument a pointer to fixed-size array as following.
> 
> extern bool BuildRestoreCommand(const char *restoreCommand,
> const char *xlogpath,   /* %p */
> const char *xlogfname,  /* %f */
> const char *lastRestartPointFname,  /* %r */
> char (*commandResult)[MAXPGPATH]);
> 
> Passing pointer to array of different size would cause an error.  The
> downside of this approach is that passing palloc'd chunk of memory as
> commandResult would be less convenient.

Thanks Alexander for the input.  Another argument is that what you are
suggesting here is not the usual project style, so I would still stick
with two arguments to be consistent.
--
Michael


signature.asc
Description: PGP signature


Re: Index Skip Scan

2020-03-10 Thread Dmitry Dolgov
> >On Tue, Mar 10, 2020 at 09:29:32AM +1300, David Rowley wrote:
> On Tue, 10 Mar 2020 at 08:56, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > Assuming we'll implement it in a way where we do not know what kind
> > of path type it is in create_distinct_path, then it can also work for
> > ProjectionPath or anything else (if UniqueKeys are present). But then
> > still EquivalenceMember are used only to figure out correct
> > distinctPrefixKeys and do not affect whether or not skipping is applied.
> > What do I miss?
>
> I'm not sure I fully understand the question correctly, but let me
> explain further.
>
> In the 0001 patch, standard_qp_callback sets the query_uniquekeys
> depending on the DISTINCT / GROUP BY clause.  When building index
> paths in build_index_paths(), the 0002 patch should be looking at the
> root->query_uniquekeys to see if it can build any index paths that
> suit those keys.  Such paths should be tagged with the uniquekeys they
> satisfy, basically, exactly the same as how pathkeys work.  Many
> create_*_path functions will need to be modified to carry forward
> their uniquekeys. For example, create_projection_path(),
> create_limit_path() don't do anything which would cause the created
> path to violate the unique keys.   This way when you get down to
> create_distinct_paths(), paths other than IndexPath may have
> uniquekeys.  You'll be able to check which existing paths satisfy the
> unique keys required by the DISTINCT / GROUP BY and select those paths
> instead of having to create any HashAggregate / Unique paths.
>
> Does that answer the question?

Hmm... I'm afraid not, this was already clear. But it looks like I now see
that I've misinterpreted one part.

> There's also some weird looking assumptions that an EquivalenceMember
> can only be a Var in create_distinct_paths(). I think you're only
> saved from crashing there because a ProjectionPath will be created
> atop of the IndexPath to evaluate expressions, in which case you're
> not seeing the IndexPath.  This results in the optimisation not
> working in cases like:

I've read it as "an assumption that an EquivalenceMember can only be a
Var" results in "the optimisation not working in cases like this". But
you've meant that ignoring a ProjectionPath with an IndexPath inside
results in this optimisation not working, right? If so, then everything
is clear, and my apologies, maybe I need to finally fix my sleep
schedule :)




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-10 Thread Daniel Gustafsson
> On 10 Mar 2020, at 11:53, Hugh McMaster  wrote:

> Debian (and Ubuntu) are beginning to remove foo-config legacy scripts.
> Already, xml2-config has been flagged for removal, with packages being
> asked to switch to pkg-config.
> 
> This patch uses pkg-config's PKG_CHECK_MODULES macro to detect libxml2
> or, if pkg-config is not available, falls back to xml2-config.

This was previously discussed in 20200120204715.ga73...@msg.df7cb.de which
ended without a real conclusion on what could/should be done (except that
nothing *had* to be done).

What is the situation on non-Debian/Ubuntu systems (BSD's, macOS etc etc)?  Is
it worth adding pkg-config support if we still need a fallback to xml2-config?

cheers ./daniel



Re: Remove win32ver.rc from version_stamp.pl

2020-03-10 Thread Peter Eisentraut

On 2020-03-09 15:55, Tom Lane wrote:

Peter Eisentraut  writes:

On 2020-03-01 23:51, Tom Lane wrote:

In general, while I'm on board with the idea, I wonder whether it
wouldn't be smarter to keep on defining PG_MAJORVERSION as a string,
and just add PG_MAJORVERSION_NUM alongside of it.



Agreed.  Here is another patch.


This version LGTM.  (I can't actually test the Windows aspects
of this, but I assume you did.)


committed


I'm wondering a little bit whether it'd be worth back-patching the
additions of the new #defines.  That would cut about five years off
the time till they could be relied on by extensions.  However,
I'm not sure anyone is eager to rely on them, so it may not be
worth the effort.


I doubt external code really needs these symbols.  You can always use 
PG_VERSION_NUM.
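
For example, extension code can already branch portably on that symbol
(illustrative):

	#if PG_VERSION_NUM >= 130000
	/* use a v13-or-later code path */
	#endif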


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-10 Thread Hugh McMaster
Dear developers,

Debian (and Ubuntu) are beginning to remove foo-config legacy scripts.
Already, xml2-config has been flagged for removal, with packages being
asked to switch to pkg-config.

This patch uses pkg-config's PKG_CHECK_MODULES macro to detect libxml2
or, if pkg-config is not available, falls back to xml2-config.
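
The detection order sketches roughly as follows (the module name follows
libxml2's pkg-config metadata; the version floor shown here is an
assumption mirroring configure's existing minimum):

	PKG_CHECK_MODULES([XML2], [libxml-2.0 >= 2.6.23], [],
	  [AC_PATH_PROG(XML2_CONFIG, xml2-config)
	   dnl ... fall back to xml2-config --cflags / --libs ...])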

The patch was created against the master branch of git.

I have built PostgreSQL and run `make check`. All 196 tests passed.

Hugh


0001-pkg-check-modules-libxml2.patch
Description: Binary data


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-10 Thread Amit Kapila
On Mon, Feb 24, 2020 at 3:38 PM Amit Kapila  wrote:
>
> On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
> > What I'm advocating is that extension locks should continue to go
> > through lock.c. And yes, that requires some changes to group locking,
> > but I still don't see why they'd be complicated.
> >
>
> Fair position, as per initial analysis, I think if we do below three
> things, it should work out without changing to a new way of locking
> for relation extension or page type locks.
> a. As per the discussion above, ensure in code we will never try to
> acquire another heavy-weight lock after acquiring relation extension
> or page type locks (probably by having Asserts in code or maybe some
> other way).

I have done an analysis of the relation extension lock (which can be
acquired via LockRelationForExtension or
ConditionalLockRelationForExtension) and found that we don't acquire
any other heavyweight lock after acquiring it. However, we do
sometimes try to acquire it again in the places where we update FSM
after extension, see points (e) and (f) described below.  The usage of
this lock can be broadly divided into six categories and each one is
explained as follows:

a. Where after taking the relation extension lock we call ReadBuffer
(or its variant) and then LockBuffer.  The LockBuffer internally calls
either LWLock to acquire or release neither of which acquire another
heavy-weight lock. It is quite obvious as well that while taking some
lightweight lock, there is no reason to acquire another heavyweight
lock on any object.  The specs/comments of the ReadBufferExtended API
(which gets called from variants of ReadBuffer) say that if the blknum
requested is P_NEW, only one backend can call it at a time, which
indicates that we don't need to acquire any heavy-weight lock inside
this API.  Otherwise, also, this API won't need a heavy-weight lock to
read the existing block into shared buffer as two different backends
are allowed to read the same block.  I have also gone through all the
functions called/used in this path to ensure that we don't use
heavy-weight locks inside it.

The usage by APIs BloomNewBuffer, GinNewBuffer, gistNewBuffer,
_bt_getbuf, and SpGistNewBuffer falls in this category.  Another API
that falls under this category is revmap_physical_extend which uses
ReadBuffer, LockBuffer and ReleaseBuffer. The ReleaseBuffer API
unpins the buffer, i.e., decrements its reference count, and
disassociates it from the resource owner.  None of that requires a
heavy-weight lock.

b. After taking relation extension lock, we call
RelationGetNumberOfBlocks which primarily calls file-level functions
to determine the size of the file. This doesn't acquire any other
heavy-weight lock after relation extension lock.

The usage by APIs ginvacuumcleanup, gistvacuumscan, btvacuumscan, and
spgvacuumscan falls in this category.

c. There is a usage in API brin_page_cleanup() where we just acquire
and release the relation extension lock to avoid reinitializing the
page. As there is no call in between acquire and release, there is
no chance of acquiring another heavy-weight lock while holding the
relation extension lock.

d. In fsm_extend() and vm_extend(), after acquiring relation extension
lock, we perform various file-level operations like RelationOpenSmgr,
smgrexists, smgrcreate, smgrnblocks, smgrextend.  First, from theory,
we don't have any heavy-weight lock other than relation extension lock
which can cover such operations and then I have verified it by going
through these APIs that these don't acquire any other heavy-weight
lock.  These APIs also call PageSetChecksumInplace, which computes a
checksum of the page and sets it in the page header; this is quite
straightforward and doesn't acquire any heavy-weight lock.

In vm_extend, we additionally call CacheInvalidateSmgr to send a
shared-inval message to force other backends to close any smgr
references they may have for the relation for which we extending
visibility map which has no reason to acquire any heavy-weight lock.
I have checked the code path as well and I didn't find any
heavy-weight lock call in that.

e. In brin_getinsertbuffer, we call ReadBuffer() and LockBuffer(), the
usage of which is the same as what is mentioned in (a).  In addition
to that it calls brin_initialize_empty_new_buffer() which further
calls RecordPageWithFreeSpace which can again acquire relation
extension lock for the same relation.  This usage is safe because we have
a mechanism in the heavy-weight lock manager whereby if we already hold a
lock and a request comes for the same lock in the same mode, the lock
will be granted.

f. In RelationGetBufferForTuple(), there are multiple APIs that get
called and like (e), it can try to reacquire the relation extension
lock in one of those APIs.  The main APIs it calls after acquiring
relation extension lock are described as follows:
   - GetPageWithFreeSpace: This tries to find a page in the given
relation with at least the specified amount of 

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-10 Thread Alexander Korotkov
On Tue, Mar 10, 2020 at 7:28 AM Michael Paquier  wrote:
> On Mon, Mar 09, 2020 at 03:38:29PM +0530, Kuntal Ghosh wrote:
> > That's a good suggestion. But, it's unlikely that a caller would pass
> > something longer than MAXPGPATH and we indeed use that value a lot in
> > the code. IMHO, it looks okay to me to have that assumption here as
> > well.
>
> Well, a more serious problem would be to allocate something smaller
> than MAXPGPATH.  This reminds me a bit of 09ec55b9 where we did not
> correctly design from the start the base64 encode and decode routines
> for SCRAM, so I'd rather design this one correctly from the start as
> per the attached.  Alexey, Alexander, what do you think?

Two options seem reasonable to me in this case.  The first is to pass
length as additional argument as you did.  The second option is to
make argument a pointer to fixed-size array as following.

extern bool BuildRestoreCommand(const char *restoreCommand,
const char *xlogpath,   /* %p */
const char *xlogfname,  /* %f */
const char *lastRestartPointFname,  /* %r */
char (*commandResult)[MAXPGPATH]);

Passing pointer to array of different size would cause an error.  The
downside of this approach is that passing palloc'd chunk of memory as
commandResult would be less convenient.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




GSOC 2020 - Develop Performance Farm Benchmarks and Website (2020)

2020-03-10 Thread do w.r. (wrd1e16)
Hi,

I am very interested in the Develop Performance Farm Benchmarks and Website
(2020) project as one of the GSoC projects. Is it possible to put me in touch
with Andreas Scherbaum to discuss it further and better understand the project?

Regards,
Wen Rei
MEng Electrical and Electronics, ECS
University of Southampton




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread Ashutosh Bapat
On Tue, Mar 10, 2020 at 3:51 AM David Rowley  wrote:

> On Sat, 7 Mar 2020 at 00:47, Andy Fan  wrote:
> > Upload the newest patch so that the cfbot can pass.  The last patch
> failed
> > because some explain without the (cost off).
>
> I've only really glanced at this patch, but I think we need to do this
> in a completely different way.
>
> I've been mentioning UniqueKeys around this mailing list for quite a
> while now [1]. To summarise the idea:
>
> 1. Add a new List field to RelOptInfo named unique_keys
> 2. During get_relation_info() process the base relation's unique
> indexes and add to the RelOptInfo's unique_keys list the indexed
> expressions from each unique index (this may need to be delayed until
> check_index_predicates() since predOK is only set there)
> 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
> rel itself (I've not looked for the best location in detail),
>

build_*_join_rel() will be a good place for this. The paths created might
take advantage of this information for costing.


> determine if the join can cause rows to be duplicated. If it can't,
> then add the UniqueKeys from that rel.  For example: SELECT * FROM t1
> INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
> {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
> since it's an eqijoin and t1.unique has a unique index).


this is interesting.


> If the
> condition was t1.unique = t2.unique then we could take the unique keys
> from both sides of the join, and with t1.non_unique = t2.non_unique,
> we can take neither.
> 4. When creating the GROUP BY paths (when there are no aggregates),
> don't bother doing anything if the input rel's unique keys are a
> subset of the GROUP BY clause. Otherwise, create the group by paths
> and tag the new unique keys onto the GROUP BY rel.
> 5. When creating the DISTINCT paths, don't bother if the input rel has
> unique keys that are a subset of the distinct clause.
>

Thanks for laying this out in more details. Two more cases can be added to
this
6. When creating RelOptInfo for a grouped/aggregated result, if all the
columns of a group by clause are part of the result, i.e. the targetlist, the
columns in the group by clause serve as the unique keys of the result. So the
corresponding RelOptInfo can be marked as such.
7. The result of DISTINCT is unique for the columns contained in the
DISTINCT clause. Hence we can add those columns to the unique_keys of the
RelOptInfo representing the result of the distinct clause.
8. If each partition of a partitioned table has a unique key with the same
columns in it and that happens to be superset of the partition key, then
the whole partitioned table gets that unique key as well.

With this we could actually pass the uniqueness information through
Subquery scans as well, and the overall query will benefit from that.
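
For example (a sketch):

	-- the inner DISTINCT makes sub.b unique, so once that uniqueness flows
	-- through the SubqueryScan, the outer DISTINCT could be elided
	select distinct b from (select distinct b from t1) as sub;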


>
> 4 and 5 will mean that: SELECT DISTINCT non_unique FROM t1 GROUP BY
> non_unique will just uniquify once for the GROUP BY and not for the
> distinct.  SELECT DISTINCT unique FROM t1 GROUP BY unique; won't do
> anything to uniquify the results.
>
> Because both 4 and 5 require that the uniquekeys are a subset of the
> distinct/group by clause, an empty uniquekey set would mean that the
> RelOptInfo returns no more than 1 row.  That would allow your:
>
> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>
> There's a separate effort in
> https://commitfest.postgresql.org/27/1741/ to implement some parts of
> the uniquekeys idea.  However the implementation currently only covers
> adding the unique keys to Paths, not to RelOptInfos.
>

I haven't looked at that patch, but as discussed upthread, in this case we
want the uniqueness associated with the RelOptInfo and not the path.


>
> I also believe that the existing code in analyzejoins.c should be
> edited to make use of unique keys rather than how it looks at unique
> indexes and group by / distinct clauses.
>

+1.
-- 
Best Wishes,
Ashutosh Bapat


Re: Remove utils/acl.h from catalog/objectaddress.h

2020-03-10 Thread Peter Eisentraut

On 2020-03-09 17:07, Alvaro Herrera wrote:

On 2020-Mar-07, Peter Eisentraut wrote:


I noticed that catalog/objectaddress.h includes utils/acl.h for no apparent
reason.  It turns out this used to be needed but not anymore. So removed it
and cleaned up the fallout.  Patch attached.


parser/parse_nodes.h already includes nodes/parsenodes.h, so the seeming
redundancy in places such as


diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index c27d255d8d..be63e043c6 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -19,6 +19,7 @@
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
  #include "nodes/parsenodes.h"
+#include "parser/parse_node.h"


(and others) is not just apparent; it's also redundant in practice.  And
it's not like parse_node.h is ever going to be able not to depend on
parsenodes.h, so I would vote to remove nodes/parsenodes.h from the
headers where you're adding parser/parse_node.h.


OK, committed with your and Tom's changes.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-10 Thread Masahiko Sawada
On Fri, 6 Mar 2020 at 23:46, Laurenz Albe  wrote:
>
> Thanks, Justin, for the review.
> I have applied the changes where still applicable.
>
> On Fri, 2020-03-06 at 10:52 +1300, David Rowley wrote:
> > On Fri, 6 Mar 2020 at 03:27, Laurenz Albe  wrote:
> > > On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> > > > 1. I'd go for 2 new GUCs and reloptions.
> > > > autovacuum_vacuum_insert_threshold (you're currently calling this
> > > > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is
> > > > relevant here). The other GUC I think should be named
> > > > autovacuum_vacuum_insert_scale_factor and these should work exactly
> > > > the same way as autovacuum_vacuum_threshold and
> > > > autovacuum_vacuum_scale_factor, but be applied in a similar way to the
> > > > vacuum settings, but only be applied after we've checked to ensure the
> > > > table is not otherwise eligible to be vacuumed.
> > >
> > > I disagree about the scale_factor (and have not added it to the
> > > updated version of the patch).  If we have a scale_factor, then the
> > > time between successive autovacuum runs would increase as the table
> > > gets bigger, which defeats the purpose of reducing the impact of each
> > > autovacuum run.
> >
> > My view here is not really to debate what logically makes the most
> > sense.  I don't really think for a minute that the current
> > auto-vacuums scale_factor and thresholds are perfect for the job. It's
> > true that the larger a table becomes, the less often it'll be
> > vacuumed, but these are control knobs that people have become
> > accustomed to and I don't really think that making an exception for
> > this is warranted.  Perhaps we can zero out the scale factor by
> > default and set the threshold into the millions of tuples. We can have
> > people chime in on what they think about that and why once the code is
> > written and even perhaps committed.
>
> Ok, I submit.  My main desire was to keep the number of new GUCs as
> low as reasonably possible, but making the feature tunable along the
> known and "trusted" lines may be a good thing.
>
> The new parameter is called "autovacuum_vacuum_insert_scale_factor".
>
> > Lack of a scale_factor does leave people who regularly truncate their
> > "append-only" tables out in the cold a bit.  Perhaps they'd like
> > index-only scans to kick in soon after they truncate without having to
> > wait for 10 million tuples, or so.
>
> That point I don't see.
> Truncating a table resets the counters to 0.
>
> > > > 10. I'm slightly worried about the case where we don't quite trigger a
> > > > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning
> > > > up the indexes but proceed to leave dead index entries causing indexes
> > > > to become bloated.  It does not seem impossible that given the right
> > > > balance of INSERTs and UPDATE/DELETEs that this could happen every
> > > > time and the indexes would just become larger and larger.
> > >
> > > Perhaps we can take care of the problem by *not* skipping index
> > > cleanup if "changes_since_analyze" is substantially greater than 0.
> > >
> > > What do you think?
> >
> > Well, there is code that skips the index scans when there are 0 dead
> > tuples found in the heap. If the table is truly INSERT-only then it
> > won't do any harm since we'll skip the index scan anyway.  I think
> > it's less risky to clean the indexes. If we skip that then there will
> > be a group of people will suffer from index bloat due to this, no
> > matter if they realise it or not.

+1

FYI, vacuum can actually perform the index cleanup phase (i.e. the
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a
truly INSERT-only table, depending on
vacuum_cleanup_index_scale_factor. Anyway, I also agree with not
disabling index cleanup in the insert-only vacuum case, because that
could become not only a cause of index bloat but also a big performance
issue. For example, if autovacuum on a table always runs without index
cleanup, a gin index on that table will accumulate inserted tuples in
its pending list, and they will be cleaned up by a backend process
while it is inserting a new tuple, not by an autovacuum process. We can
already disable index vacuuming per table via the vacuum_index_cleanup
storage parameter, so it would be better to leave these settings to
users.
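
For reference, a minimal sketch of those per-table controls, assuming the
vacuum_index_cleanup reloption and the INDEX_CLEANUP vacuum option
available since PostgreSQL 12 (the table name is made up):

-- per-table default used by autovacuum and plain VACUUM
alter table append_only_tbl set (vacuum_index_cleanup = off);
-- or for a single run
vacuum (index_cleanup off) append_only_tbl;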

I have one question about this patch from an architectural perspective:
have you considered using autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor for this purpose as well? That is, we
would compare the threshold computed from these values not only to the
number of dead tuples but also to the number of inserted tuples. If the
number of dead tuples exceeds the threshold, we trigger autovacuum as
usual. On the other hand, if the number of inserted tuples exceeds it,
we trigger autovacuum with vacuum_freeze_min_age = 0. I'm concerned
about how users will reason about the settings of the two newly added
parameters; we will have four parameters in total. Amit was also
concerned about that[1].
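
A sketch of that single-pair-of-knobs idea (the condition mirrors the
documented dead-tuple trigger formula; the counter names and the exact
condition are my assumptions, not taken from the patch):

-- existing trigger condition, evaluated per table:
--   dead_tuples > autovacuum_vacuum_threshold
--                 + autovacuum_vacuum_scale_factor * reltuples
-- proposed additional condition, reusing the same two parameters:
--   inserted_tuples > autovacuum_vacuum_threshold
--                     + autovacuum_vacuum_scale_factor * reltuples
-- and in the insert-triggered case, run the equivalent of
vacuum (freeze) some_table;  -- i.e. with vacuum_freeze_min_age = 0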

I think 

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-10 Thread Fujii Masao



On 2020/03/10 11:36, Fujii Masao wrote:



On 2020/03/09 14:21, Magnus Hagander wrote:

On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
 wrote:


At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander  wrote 
in

On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao  wrote:

I believe that the time required to estimate the backup size is not so large
in most cases, so in the above idea, most users don't need to specify any
extra option for the estimation. This is good from a UI perspective.

OTOH, users who are worried about the estimation time can use the
--no-estimate-backup-size option and skip the time-consuming estimation.


Personally, I think this is the best idea. It brings a "reasonable
default", since most people are not going to have this problem, and
yet a good way to get out of the issue for those that potentially
have it. Especially since we are now already showing the state that
"walsender is estimating the size", it should be easy enough for people
to determine if they need to use this flag or not.

In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
pretty clear things are about backups when you call pg_basebackup, and
it keeps the option a bit more reasonable in length.


+1


I agree with the negative option and the shortened name.  What if both
--no-estimate-size and -P are specified?  Reject them as conflicting
options, or does -P supersede?  I would choose the former because we don't
know which of them has priority.


I would definitely prefer rejecting an invalid combination of options.


+1

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.


Patch attached.
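
For reference, once the patch is applied, the effect should be visible
with a query like this (a hypothetical monitoring query; the column
names follow the documentation change below):

select phase, backup_streamed, backup_total
  from pg_stat_progress_basebackup;
-- with --no-estimate-size, backup_total stays 0 while
-- backup_streamed keeps growing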

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 987580d6df..a35b690e33 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4388,10 +4388,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  backup_total
  bigint
  
-  Total amount of data that will be streamed. If progress reporting
-  is not enabled in pg_basebackup
-  (i.e., --progress option is not specified),
-  this is 0. Otherwise, this is estimated and
+  Total amount of data that will be streamed. This is estimated and
   reported as of the beginning of
   streaming database files phase. Note that
   this is only an approximation since the database
@@ -4399,7 +4396,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   and WAL log may be included in the backup later. This is always
   the same value as backup_streamed
   once the amount of data streamed exceeds the estimated
-  total size.
+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.
  
 
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 29bf2f9b97..33f9e69418 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -460,20 +460,14 @@ PostgreSQL documentation
 in this case the estimated target size will increase once it passes the
 total estimate without WAL.

-   
-When this is enabled, the backup will start by enumerating the size of
-the entire database, and then go back and send the actual contents.
-This may make the backup take slightly longer, and in particular it
-will take longer before the first data is sent.
-   

 Whether this is enabled or not, the
 pg_stat_progress_basebackup view
-report the progress of the backup in the server side. But note
-that the total amount of data that will be streamed is estimated
-and reported only when this option is enabled. In other words,
-backup_total column in the view always
-indicates 0 if this option is disabled.
+report the progress of the backup in the server side.
+   
+   
+This option is not allowed when using
+--no-estimate-size.

   
  
@@ -552,6 +546,30 @@ PostgreSQL documentation

   
  
+
+ 
+  --no-estimate-size
+  
+   
+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.
+   
+   
+When this is disabled, the backup will start by enumerating
+the size of the entire database, and then go back and send
+the actual contents. This may make the backup take slightly
+longer, and in particular it will take longer before the first
+data is sent. This option is useful to avoid such estimation
+time if it's too long.
+ 

Re: Improve handling of parameter differences in physical replication

2020-03-10 Thread Kyotaro Horiguchi
At Mon, 9 Mar 2020 21:13:38 +0900, Masahiko Sawada 
 wrote in 
> On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
>  wrote:
> >
> > On 2020-03-09 09:11, Masahiko Sawada wrote:
> > > I think after recovery is paused it will be better for users to restart
> > > the server rather than resume the recovery. I agree with this idea but
> > > I'm slightly concerned that users might not realize that recovery is
> > > paused until they look at that line in the server log or at
> > > pg_stat_replication, because the standby server is still functional. So
> > > I think we can periodically send a WARNING to inform users that we're
> > > still waiting for a parameter change and restart.
> >
> > I think that would be annoying, unless you create a system for
> > configuring those periodic warnings.
> >
> > I imagine in a case like having set max_prepared_transactions but never
> > actually using prepared transactions, people will just ignore the
> > warning until they have their next restart, so it could be months of
> > periodic warnings.
> 
> Well, I meant to periodically send warning messages while waiting for
> the parameter change, that is, after exhausting resources and stopping
> recovery. In this situation users need to notice that as soon as
> possible.

If we lose the connection, the standby continues to complain about the
lost connection every 5 seconds.  This situation is of the same kind.

By the way, when I reduced max_connections only on the master and then
took exclusive locks until the standby complained about lock exhaustion,
I saw a WARNING mentioning max_locks_per_transaction instead of
max_connections.


WARNING:  insufficient setting for parameter max_connections
DETAIL:  max_connections = 2 is a lower setting than on the master server 
(where its value was 3).
HINT:  Change parameters and restart the server, or there may be resource 
exhaustion errors sooner or later.
CONTEXT:  WAL redo at 0/6A0 for XLOG/PARAMETER_CHANGE: max_connections=3 
max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 
max_locks_per_xact=10 wal_level=replica wal_log_hints=off 
track_commit_timestamp=off
WARNING:  recovery paused because of insufficient setting of parameter 
max_locks_per_transaction (currently 10)
DETAIL:  The value must be at least as high as on the primary server.
HINT:  Recovery cannot continue unless the parameter is changed and the server 
restarted.
CONTEXT:  WAL redo at 0/6004A80 for Standb


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: adding partitioned tables to publications

2020-03-10 Thread Peter Eisentraut
I have committed the 0001 patch of this series (partitioned table member 
of publication).  I changed the new argument of 
GetPublicationRelations() to an enum and reformatted some comments. 
I'll continue looking through the subsequent patches.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-10 Thread Andy Fan
Hi David:


> 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
> rel itself (I've not looked for the best location in detail),
> determine if the join can cause rows to be duplicated. If it can't,
> then add the UniqueKeys from that rel.


I have some concerns about this method; maybe I misunderstand
something, and if so, please advise.

My current implementation calculates the uniqueness for each BaseRel
only, but in your way it looks like we need to calculate the
UniquePathKey for both BaseRel and JoinRel.  This makes more of a
difference for multi-table joins.  Another concern is that UniquePathKey
is designed for a general purpose, so we would need to maintain it
regardless of the distinctClause/groupbyClause.


>  For example: SELECT * FROM t1
> INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
> {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
> since it's an eqijoin and t1.unique has a unique index).
>

Thanks for raising this.  My current rule requires that *every* relation
yield a unique result, *no matter* what the join method is.  Actually I
want to make the rule less strict; for example, we may need only one
table to yield a unique result, and the final result will still be
unique under some join types.

As for t1 INNER JOIN t2 ON t1.unique = t2.not_unique, it looks like we
can't remove the distinct based on this:

create table m1(a int primary key, b int);
create table m2(a int primary key, b int);
insert into m1 values(1, 1), (2, 1);
insert into m2 values(1, 1), (2, 1);
select distinct m1.a from m1, m2 where m1.a = m2.b;
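-- m1.a = 1 matches both m2 rows (their b is 1), so without the distinct
-- the result would contain a = 1 twice: the uniqueness of m1.a does not
-- survive this join even though m1.a is a primary key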



> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>

Actually I want to keep the distinct for this case for now.  One reason
is that only one row is returned, so erasing the distinct can't help
much.  The more important reason is that Query->hasAggs is true for
"select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10)
from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;"
(this SQL came from postgres_fdw.sql).

> There's a separate effort in
> https://commitfest.postgresql.org/27/1741/ to implement some parts of
> the uniquekeys idea.  However the implementation currently only covers
> adding the unique keys to Paths, not to RelOptInfos.


Thanks for this info.  I guess this patch has not been merged so far,
but it looks like the cfbot run for my patch [1] failed due to this :(
(search for "explain (costs off) select distinct on(pk1) pk1, pk2 from
select_distinct_a;")


> I also believe that the existing code in analyzejoins.c should be
> edited to make use of unique keys rather than how it looks at unique
> indexes and group by / distinct clauses.

I can do this after we have agreement on the UniquePath.

For my cfbot failure, another strange thing is that "A" appears ahead of
"a" after the order by.  I still haven't found out why.

[1]
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.83298

Regards
Andy Fan


Re: reindex concurrently and two toast indexes

2020-03-10 Thread Michael Paquier
On Tue, Mar 10, 2020 at 12:09:42PM +0900, Michael Paquier wrote:
> On Mon, Mar 09, 2020 at 08:04:27AM +0100, Julien Rouhaud wrote:
>> Agreed.
> 
> Thanks for checking the patch.

And applied as 61d7c7b.  Regarding the isolation tests, let's
brainstorm on what we can do, but I am afraid that it is too late for
13. 
--
Michael




Re: Asynchronous Append on postgres_fdw nodes.

2020-03-10 Thread Kyotaro Horiguchi
Hello. Thank you for testing.

At Tue, 10 Mar 2020 05:13:42 +, movead li  wrote in 
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested
> 
> I ran into a strange issue when I executed 'make installcheck-world':

I hadn't done that... But it worked for me.

> ##
> ...
> == running regression test queries==
> test adminpack... FAILED   60 ms
> 
> ==
>  1 of 1 tests failed. 
> ==
> 
> The differences that caused some tests to fail can be viewed in the
> file "/work/src/postgres_app_for/contrib/adminpack/regression.diffs".  A copy 
> of the test summary that you see
> above is saved in the file 
> "/work/src/postgres_app_for/contrib/adminpack/regression.out".
> ...
> ##
> 
> And the content in 'contrib/adminpack/regression.out' is:

I don't see that file. Maybe regression.diffs?

> ##
> SELECT pg_file_write('/tmp/test_file0', 'test0', false);
>  ERROR:  absolute path not allowed
>  SELECT pg_file_write(current_setting('data_directory') || '/test_file4', 
> 'test4', false);
> - pg_file_write 
> 
> - 5
> -(1 row)
> -
> +ERROR:  reference to parent directory ("..") not allowed

It seems to me that you are using a path containing ".." as PGDATA.

> However the issue does not occur when I do a 'make check-world'.
> And it doesn't occur when I test the 'make installcheck-world' without the 
> patch.

check-world doesn't use a path containing ".." as PGDATA.
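
In other words (an illustration based on the regression output above,
with a made-up data_directory value): if the server was started with
something like data_directory = '/work/../postgres_app_for/data', the
test's call

select pg_file_write(current_setting('data_directory') || '/test_file4',
       'test4', false);

builds an absolute path containing "..", which adminpack rejects with
"reference to parent directory ("..") not allowed".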

> The new status of this patch is: Waiting on Author

Thanks for noticing that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-03-10 Thread Kyotaro Horiguchi
(Hmm. My attention span must be as short as two words..)

At Tue, 10 Mar 2020 14:59:00 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> In the first place the "seg" is "fileno". Honestly I don't think the
> test doesn't reach to fileno boundary but I did in the attached. Since

Of course, that was a mistake; it should have read "Honestly I don't
think the test reaches the fileno boundary".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: shared-memory based stats collector

2020-03-10 Thread Kyotaro Horiguchi
At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund  wrote in 
> On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > That's true, but I have the same concern as Tom. The archiver became
> > too tightly linked with processes it has no actual relation to.
> 
> What's the problem here? We have a number of helper processes
> (checkpointer, bgwriter) that are attached to shared memory, and it's
> not a problem.

That theoretically raises the chance of a server crash by a small
probability. But, yes, it's absurd to premise that the archiver process
crashes.

> > We may need the second static shared memory segment apart from the
> > current one.
> 
> That seems absurd to me. Solving a non-problem by introducing complex
> new infrastructure.

Ok. I think I must be worrying too much.

Thanks for the suggestion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-03-10 Thread Kyotaro Horiguchi
At Tue, 10 Mar 2020 10:50:52 +0900, Fujii Masao  
wrote in 
> Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Thanks!

> Regarding the remaining patch adding the regression test,

I didn't seriously intend it to be in the tree.

> +$result =
> + $node_standby->safe_psql('postgres', "SELECT
> pg_last_wal_replay_lsn()");
> +my ($seg, $off) = split('/', $result);
> +my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) *
> $segsize);
> 
> What happens if "off" part gets the upper limit and "seg" part needs
> to be incremented? What happens if pg_last_wal_replay_lsn() advances
> very much (e.g., because of autovacuum) beyond the segment boundary
> until the standby restarts? Of course, these situations very rarely
> happen,
> but I'd like to avoid adding such not stable test if possible.

In the first place the "seg" is "fileno". Honestly I don't think the
test doesn't reach to fileno boundary but I did in the attached. Since
Perl complains about over-32-bit integer arithmetic being incompatible,
the calculation takes a somewhat odd shape to avoid over-32-bit
arithmetic.
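
(A worked example, assuming the default 16MB segment size: if
pg_last_wal_replay_lsn() returns 0/6004A80, then $segsize is 0x1000000,
$newsegnum = int(0x6004A80 / 0x1000000) + 1 = 7, no overflow handling is
needed, and the target comes out as 0/07000000, the next segment
boundary.)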

For the second point, which seems more likely to happen, I added a
VACUUM/pg_switch_wal() sequence and a wait for the standby to catch up
before doing the test.

Does it make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 89621911e09df73f0f3a63501bb41ffb37edef05 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 10 Mar 2020 14:43:29 +0900
Subject: [PATCH v5] TAP test for a crash bug

Add a test for targetted promotion happening at a segment boundary.
---
 src/test/recovery/t/003_recovery_targets.pl | 50 -
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..6b319bc891 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,51 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## make sure replication stays at the beginning of a segment
+$node_master->safe_psql('postgres', "VACUUM; SELECT pg_switch_wal();");
+my $catch_up_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+my $caughtup_query =
+  "SELECT '$catch_up_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to catch up";
+## calculate the next segment boundary
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($fileno, $off) = split('/', $result);
+my ($newoff, $newfileno) = (hex($off), hex($fileno));
+my $newsegnum = int($newoff / $segsize) + 1;
+## treat segment-overflow, avoiding over-32bit arithmetic, segsize is
+## a power of 2 larger than 1MB.
+if ($newsegnum * ($segsize >> 1) >= 0x8000)
+{
+	$newfileno += ($newsegnum * ($segsize >> 1)) >> 31;
+	$newsegnum %= (0x8000 / ($segsize >> 1));
+}
+my $target = sprintf("%X/%08X", $newfileno, $newsegnum * $segsize);
+## the standby stops just after the next segment boundary
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+$caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2