Patch: to pass query string to pg_plan_query()

2020-03-09 Thread legrand legrand
Hello,

This is a call for committers, reviewers and users
regarding the "planning counters in pg_stat_statements"
patch [1], but not only that.

Historically, this version of pg_stat_statements
with planning counters performed 3 calls to
pgss_store() for non-utility statements, in:
1 - pgss_post_parse_analyze (init entry with queryid 
and store query text)
2 - pgss_planner_hook (to store planning counters)
3 - pgss_ExecutorEnd (to store execution counters)

Then a new version was proposed that removes one call
to pgss_store() by adding the query string to the
planner function pg_plan_query():
1 - pgss_planner_hook (to store planning counters)
2 - pgss_ExecutorEnd (to store execution counters)

Many performance tests were performed, all concluding
that there is no measurable performance impact.

Patch "to pass query string to the planner", could be 
committed by itself, and (maybe) used by other extensions.

If this were done, this new version of pgss with planning
counters could be committed as well, or even later
(being usable as a non-core extension starting with pg13).

So please give us your feedback regarding this patch
"to pass query string to the planner", any other
use cases you may have, or any comments regarding the core architecture.

note:
A problem was discovered during IVM testing:
some queries without SQL text were planned
without being parsed, ending up in pgss with a
zero queryid.

A workaround is to set track_planning = false;
we have chosen to fix that in pgss by ignoring
a zero queryid inside pgss_planner_hook.
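
Just to illustrate the idea (a rough sketch, not the actual patch:
pgss_track_planning and pgss_store_planning are placeholder names, and
it assumes the proposed planner signature that receives the query
string), the hook could look something like:

static PlannedStmt *
pgss_planner_hook(Query *parse, const char *query_string,
				  int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;
	instr_time	start;
	instr_time	duration;

	/* Unparsed queries (e.g. coming from IVM) carry a zero queryId: skip them. */
	if (!pgss_track_planning || parse->queryId == UINT64CONST(0))
		return standard_planner(parse, query_string, cursorOptions,
								boundParams);

	INSTR_TIME_SET_CURRENT(start);
	result = standard_planner(parse, query_string, cursorOptions, boundParams);
	INSTR_TIME_SET_CURRENT(duration);
	INSTR_TIME_SUBTRACT(duration, start);

	/* placeholder helper: accumulate planning time under parse->queryId */
	pgss_store_planning(query_string, parse->queryId,
						INSTR_TIME_GET_MILLISEC(duration));

	return result;
}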

Thanks in advance
Regards
PAscal
  
[1] " https://www.postgresql.org/message-id/20200309103142.GA45401%40nol
  "







Re: Proposal: PqSendBuffer removal

2020-03-09 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-07 13:54:57 -0500, Tom Lane wrote:
>> This seems way overcomplicated compared to what I suggested (ie,
>> just let internal_putbytes bypass the buffer in cases where the
>> data would get flushed next time through its loop anyway).

> Well, we quite frequently send out multiple messages in a row, without a
> flush inbetween. It'd be nice if we could avoid both copying buffers for
> each message, as well as allocating a new stringinfo.

That gets you right into the situation where trouble adding more messages
could corrupt/destroy messages that were supposedly already sent (but in
reality aren't flushed to the client quite yet).  I really think that
there is not enough win available here to justify introducing that kind
of fragility.

To be blunt, no actual evidence has been offered in this thread that
it's worth changing anything at all in this area.  All of the bytes
in question eventually have to be delivered to the client, which is
going to involve two kernel-space/user-space copy steps along with
who-knows-what network transmission overhead.  The idea that an
extra process-local memcpy or two is significant compared to that
seems like mostly wishful thinking to me.

regards, tom lane




Re: Should we remove a fallback promotion? take 2

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-06 16:33:18 -0300, Alvaro Herrera wrote:
> On 2020-Mar-06, Michael Paquier wrote:
> > On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> > > Seems reasonable, but it would be better if people proposed these
> > > kinds of changes closer to the beginning of the release cycle rather
> > > than in the crush at the end.
> > 
> > +1, to both points.
> 
> Why?  Are you saying that there's some actual risk of breaking
> something?  We're not even near beta or feature freeze yet.
> 
> I'm not seeing the reason for the "please propose this sooner in the
> cycle" argument.  It has already been proposed sooner -- seven years
> sooner.  We're not waiting for users to complain anymore; clearly nobody
> cared.

Yea. There are changes that are so invasive that it's useful to go very
early, but in this case I'm not seeing it?

+1 for removing non-fast promotions.

FWIW, I find "fallback promotion" a confusing description.


Btw, I'd really like to make the crash recovery environment more like
the replication environment. I.e. have checkpointer, bgwriter running,
and have an 'end-of-recovery' record instead of a checkpoint at the end.


Greetings,

Andres Freund




Re: Index Skip Scan

2020-03-09 Thread James Coleman
On Mon, Mar 9, 2020 at 3:56 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Assuming we'll implement it in a way that we do not know about what kind
> of path type is that in create_distinct_path, then it can also work for
> ProjectionPath or anything else (if UniqueKeys are present). But then
> still EquivalenceMember are used only to figure out correct
> distinctPrefixKeys and do not affect whether or not skipping is applied.
> What do I miss?


Part of the puzzle seems to me to be this part of the response:

> I think the UniqueKeys may need to be changed from using
> EquivalenceClasses to use Exprs instead.

But I can't say I'm being overly helpful by pointing that out, since I
don't have my head in the code enough to understand how you'd
accomplish that :)

James




time for catalog/pg_cast.c?

2020-03-09 Thread Alvaro Herrera
I extracted from the latest multirange patch a bit that creates a new
routine CastCreate() in src/backend/catalog/pg_cast.c.  It contains the
catalog-accessing bits to create a new cast.  It seems harmless, so I
thought I'd apply it to get rid of a couple of hunks in the large patch.

(I also threw in a move of get_cast_oid from functioncmds.c to
lsyscache.c, which seems its natural place; at first I thought to put it
in catalog/pg_cast.c but really it's not a great place IMO.  This
function was invented out of whole cloth in commit fd1843ff8979.  I also
contemplated the move of CreateCast and DropCastById from functioncmds.c
to some new place, but creating a new commands/castcmds.c seemed a bit
excessive, so I left them in their current locations.)

-- 
Álvaro Herrera39°49'30"S 73°17'W
"The problem with the future is that it keeps turning into the present"
(Hobbes)
commit 928aa3fa75a2f244dfb3ac08c4b4d166e426f9cf
Author: Alvaro Herrera 
AuthorDate: Mon Mar 9 17:19:50 2020 -0300
CommitDate: Mon Mar 9 17:59:30 2020 -0300

add pg_cast.c for CastCreate()

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index f8f0b4841c..9499bb33e5 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -25,6 +25,7 @@ OBJS = \
 	objectaddress.o \
 	partition.o \
 	pg_aggregate.o \
+	pg_cast.o \
 	pg_collation.o \
 	pg_constraint.o \
 	pg_conversion.o \
diff --git a/src/backend/catalog/pg_cast.c b/src/backend/catalog/pg_cast.c
new file mode 100644
index 00..19f410337d
--- /dev/null
+++ b/src/backend/catalog/pg_cast.c
@@ -0,0 +1,116 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_cast.c
+ *	  routines to support manipulation of the pg_cast relation
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/catalog/pg_cast.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/table.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_cast.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+/*
+ * ----------------------------------------------------------------
+ *		CastCreate
+ * ----------------------------------------------------------------
+ */
+ObjectAddress
+CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext,
+		   char castmethod, DependencyType behavior)
+{
+	Relation		relation;
+	HeapTuple		tuple;
+	Oid				castid;
+	Datum			values[Natts_pg_cast];
+	bool			nulls[Natts_pg_cast];
+	ObjectAddress	myself,
+	referenced;
+
+	relation = table_open(CastRelationId, RowExclusiveLock);
+
+	/*
+	 * Check for duplicate.  This is just to give a friendly error message,
+	 * the unique index would catch it anyway (so no need to sweat about race
+	 * conditions).
+	 */
+	tuple = SearchSysCache2(CASTSOURCETARGET,
+			ObjectIdGetDatum(sourcetypeid),
+			ObjectIdGetDatum(targettypeid));
+	if (HeapTupleIsValid(tuple))
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("cast from type %s to type %s already exists",
+						format_type_be(sourcetypeid),
+						format_type_be(targettypeid))));
+
+	/* ready to go */
+	castid = GetNewOidWithIndex(relation, CastOidIndexId, Anum_pg_cast_oid);
+	values[Anum_pg_cast_oid - 1] = ObjectIdGetDatum(castid);
+	values[Anum_pg_cast_castsource - 1] = ObjectIdGetDatum(sourcetypeid);
+	values[Anum_pg_cast_casttarget - 1] = ObjectIdGetDatum(targettypeid);
+	values[Anum_pg_cast_castfunc - 1] = ObjectIdGetDatum(funcid);
+	values[Anum_pg_cast_castcontext - 1] = CharGetDatum(castcontext);
+	values[Anum_pg_cast_castmethod - 1] = CharGetDatum(castmethod);
+
+	MemSet(nulls, false, sizeof(nulls));
+
+	tuple = heap_form_tuple(RelationGetDescr(relation), values, nulls);
+
+	CatalogTupleInsert(relation, tuple);
+
+	/* make dependency entries */
+	myself.classId = CastRelationId;
+	myself.objectId = castid;
+	myself.objectSubId = 0;
+
+	/* dependency on source type */
+	referenced.classId = TypeRelationId;
+	referenced.objectId = sourcetypeid;
+	referenced.objectSubId = 0;
+	recordDependencyOn(&myself, &referenced, behavior);
+
+	/* dependency on target type */
+	referenced.classId = TypeRelationId;
+	referenced.objectId = targettypeid;
+	referenced.objectSubId = 0;
+	recordDependencyOn(&myself, &referenced, behavior);
+
+	/* dependency on function */
+	if (OidIsValid(funcid))
+	{
+		referenced.classId = ProcedureRelationId;
+		referenced.objectId = funcid;
+		referenced.objectSubId = 0;
+		recordDependencyOn(&myself, &referenced, behavior);
+	}
+
+	/* dependency on extension */
+	recordDependencyOnCurrentExtension(&myself, false);
+
+	/* Post creation hook 

Re: Support external parameters in EXECUTE command

2020-03-09 Thread Tom Lane
Peter Eisentraut  writes:
> Continuing the discussion in [0], here is a patch that allows parameter 
> references in the arguments of the EXECUTE command.  The main purpose is 
> submitting protocol-level parameters, but the added regression test case 
> shows another way to exercise it.

I spent a bit of time looking at this.

> What's confusing is that the code already contains a reference that 
> indicates that this should be possible:
> ...
> I'm not sure what this is supposed to do without my patch on top of it. 
> If I remove the estate->es_param_list_info assignment, no tests fail 
> (except the one I added).  Either this is a leftover from previous 
> variants of this code (as discussed in [0]), or there is something I 
> haven't understood.

That case is reachable with an example like this:

create or replace function foo(int) returns int language plpgsql as $$
declare
  x int := $1;
  y int;
begin
  execute 'execute fool($1 + 11)' into y using x;
  return y;
end
$$;

deallocate fool;
prepare fool(int) as select $1 + 1;

select foo(42);

In the existing code this draws "ERROR:  there is no parameter $1".
Your patch makes it work, which is an improvement.

There are a few things bothering me, nonetheless.

1. The patch only implements part of the API for dynamic ParamListInfos,
that is it honors paramFetch but not parserSetup.  This seems not to
matter for any of the existing code paths (although we may just be lacking
a test case to reveal it); but I feel that it's just a matter of time
before somebody sets up a case where it would matter.  We would have such
a problem today if plpgsql treated EXECUTE as a plain SQL command rather
than its own magic thing, because then "EXECUTE prepared_stmt(plpgsql_var)"
would require the parserSetup hook to be honored in order to resolve the
variable reference.

It's fairly simple to fix this in ExplainExecuteQuery, since that is
creating its own ParseState; it can just apply the plist's parserSetup
to that pstate.  I did that in the attached v2 so you can see what
I'm talking about.  However, it's a lot less clear what to do in
ExecuteQuery, which as it stands is re-using a passed-in ParseState;
how do we know that the parse hooks aren't already set up in that?
(Or if they are, what do we do to merge their effects?)

2. Actually that latter problem exists already in your patch, because
it's cavalierly overwriting the passed-in ParseState's p_paramref_hook
without regard for the possibility that that's set already.  I added
an Assert that it's not set, and we get through check-world that way,
but even to write the assertion is to think that there is surely
going to be a code path that breaks it, soon if not already.

3. Both of the above points seem closely related to the vague worry
I had in the previous discussion about nested contexts all wanting
to control the resolution of parameters.  We'll get away with this,
perhaps, as long as that situation never occurs; but once it does
we have issues.

4. I'm inclined to feel that the reason we have these problems is
that this patch handles parameter resolution in the wrong place.
It would likely be better if parameter resolution were already
set up in the ParseState passed to ExecuteQuery (and then we'd fix
ExplainExecuteQuery by likewise passing it a ParseState for the
EXPLAIN EXECUTE).  However, that approach would probably result in
Params being available to any utility statement, and then we've
got issues for statements where expressions are only parsed and not
immediately executed: we have to define sane semantics for Param
references in such contexts, and make sure they get honored.

5. So that brings us back to the other point I made earlier, which
is that I'm not happy with patching this locally in EXECUTE rather
than having a design that works across-the-board for utility
statements.  You had expressed similar concerns a ways upthread:
>> Come to think of it, it would probably also be useful if PREPARE did
>> parameter processing, again in order to allow use with PQexecParams().

I think it's possible that we could solve the semantics problem
by defining the behavior for Params in utility statements as
"the parameter value is immediately substituted at parse time,
producing a Const node".  This doesn't change the behavior in EXECUTE,
because the expression will straightaway be evaluated and produce the
correct value.  In situations like

ALTER TABLE foo ADD COLUMN newcol int DEFAULT ($1);

you would also get what seem sane semantics, ie the default is the
value provided as a Param.

I feel that perhaps a patch that does this wouldn't be tremendously
more code than you have here; but the param resolution hook would be
installed at some different more-global place, and it would be code
to generate a Const node not a Param.
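
Just to illustrate what I mean (a sketch only, not part of the attached
v2; the function name is invented, and collation and datum-copying
details are glossed over), such a paramref hook might look roughly like:

static Node *
substitute_param_as_const(ParseState *pstate, ParamRef *pref)
{
	ParamListInfo params = (ParamListInfo) pstate->p_ref_hook_state;
	int			paramno = pref->number;
	ParamExternData *prm;
	ParamExternData prmdata;
	int16		typlen;
	bool		typbyval;

	if (params == NULL || paramno < 1 || paramno > params->numParams)
		return NULL;			/* let the parser complain about it */

	if (params->paramFetch != NULL)
		prm = params->paramFetch(params, paramno, false, &prmdata);
	else
		prm = &params->params[paramno - 1];

	if (prm == NULL || !OidIsValid(prm->ptype))
		return NULL;

	get_typlenbyval(prm->ptype, &typlen, &typbyval);

	/* Fold the parameter value into a Const right away. */
	return (Node *) makeConst(prm->ptype,
							  -1,			/* typmod */
							  InvalidOid,	/* collation: simplified */
							  typlen,
							  prm->value,
							  prm->isnull,
							  typbyval);
}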

I attach a v2 with the trivial mods mentioned above, but just for
illustration not because I think this is the way to go.

 

Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-09 Thread David Rowley
On Sat, 7 Mar 2020 at 00:47, Andy Fan  wrote:
> Upload the newest patch so that the cfbot can pass.  The last patch failed
> because some explain without the (cost off).

I've only really glanced at this patch, but I think we need to do this
in a completely different way.

I've been mentioning UniqueKeys around this mailing list for quite a
while now [1]. To summarise the idea:

1. Add a new List field to RelOptInfo named unique_keys
2. During get_relation_info() process the base relation's unique
indexes and add to the RelOptInfo's unique_keys list the indexed
expressions from each unique index (this may need to be delayed until
check_index_predicates() since predOK is only set there)
3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
rel itself (I've not looked for the best location in detail),
determine if the join can cause rows to be duplicated. If it can't,
then add the UniqueKeys from that rel.  For example: SELECT * FROM t1
INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
{t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
since it's an equijoin and t1.unique has a unique index). If the
condition was t1.unique = t2.unique then we could take the unique keys
from both sides of the join, and with t1.non_unique = t2.non_unique,
we can take neither.
4. When creating the GROUP BY paths (when there are no aggregates),
don't bother doing anything if the input rel's unique keys are a
subset of the GROUP BY clause. Otherwise, create the group by paths
and tag the new unique keys onto the GROUP BY rel.
5. When creating the DISTINCT paths, don't bother if the input rel's
unique keys are a subset of the distinct clause.

4 and 5 will mean that: SELECT DISTINCT non_unique FROM t1 GROUP BY
non_unique will just uniquify once for the GROUP BY and not for the
distinct.  SELECT DISTINCT unique FROM t1 GROUP BY unique; won't do
anything to uniquify the results.

Because both 4 and 5 require that the uniquekeys are a subset of the
distinct/group by clause, an empty uniquekey set would mean that the
RelOptInfo returns no more than 1 row.  That would allow your:

SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.

There's a separate effort in
https://commitfest.postgresql.org/27/1741/ to implement some parts of
the uniquekeys idea.  However the implementation currently only covers
adding the unique keys to Paths, not to RelOptInfos.

I also believe that the existing code in analyzejoins.c should be
edited to make use of unique keys rather than how it looks at unique
indexes and group by / distinct clauses.

[1] https://www.postgresql.org/search/?m=1=pgsql-hackers=uniquekeys




Re: effective_io_concurrency's steampunk spindle maths

2020-03-09 Thread Thomas Munro
On Sat, Mar 7, 2020 at 9:07 AM Tom Lane  wrote:
> Tomas Vondra  writes:
> > So I think we should either rename e_i_c or keep it as is, and then also
> > have a new GUC. And then translate the values between those (but that
> > might be overkill).
>
> Please DON'T try to have two interrelated GUCs for this.  We learned
> our lesson about that years ago.

Ack.

> I think dropping the existing GUC is a perfectly sane thing to do,
> if the new definition wouldn't be compatible.  In practice few
> people will notice, because few will have set it.

That's what I thought too, but if Andres is right that "it's not like
anybody knew how to infer a useful value", I'm wondering it's enough
if we just provide an explanation of the change in the release notes.
The default doesn't change (1 goes to 1), so most people will
experience no change, but if you had it set to (say) 42 after careful
experimentation, you might like to consider updating it to the result
of:

  select round(sum(42 / n::float)) as new_setting from
generate_series(1, 42) s(n)

Here's a patch set to remove the spindle stuff, add a maintenance
variant, and use the maintenance one in
heap_compute_xid_horizon_for_tuples().
From fc472c5b473e3495e92ec9cf021b6e6393579ca5 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 7 Mar 2020 15:04:34 +1300
Subject: [PATCH 1/2] Simplify the effective_io_concurrency setting.

The effective_io_concurrency GUC and tablespace option were previously not used
directly to control the number of prefetches we'd issue.  Instead, they were
passed through a formula based on a theory about RAID spindles and
probabilities.  Today, a lot of people are familiar with the concept of I/O
request queues and have never seen a spinning disk, so let's switch to
something a little easier to justify.

If users were using a setting higher than the default value 1, then they will
need to increase the value for the same effect.  The query to compute the new
setting corresponding to the old setting OLD is:

  select round(sum(OLD / n::float)) as new_setting
from generate_series(1, OLD) s(n);

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
---
 src/backend/executor/nodeBitmapHeapscan.c | 18 +-
 src/backend/storage/buffer/bufmgr.c   | 74 +++
 src/backend/utils/misc/guc.c  | 29 +
 src/include/storage/bufmgr.h  |  1 -
 4 files changed, 13 insertions(+), 109 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index ae8a11da30..726d3a2d9a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -707,7 +707,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 {
 	BitmapHeapScanState *scanstate;
 	Relation	currentRelation;
-	int			io_concurrency;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -737,8 +736,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->prefetch_iterator = NULL;
 	scanstate->prefetch_pages = 0;
 	scanstate->prefetch_target = 0;
-	/* may be updated below */
-	scanstate->prefetch_maximum = target_prefetch_pages;
 	scanstate->pscan_len = 0;
 	scanstate->initialized = false;
 	scanstate->shared_tbmiterator = NULL;
@@ -794,20 +791,11 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 		ExecInitQual(node->bitmapqualorig, (PlanState *) scanstate);
 
 	/*
-	 * Determine the maximum for prefetch_target.  If the tablespace has a
-	 * specific IO concurrency set, use that to compute the corresponding
-	 * maximum value; otherwise, we already initialized to the value computed
-	 * by the GUC machinery.
+	 * Maximum number of prefetches for the tablespace if configured, otherwise
+	 * the current value of the effective_io_concurrency GUC.
 	 */
-	io_concurrency =
+	scanstate->prefetch_maximum =
 		get_tablespace_io_concurrency(currentRelation->rd_rel->reltablespace);
-	if (io_concurrency != effective_io_concurrency)
-	{
-		double		maximum;
-
-		if (ComputeIoConcurrency(io_concurrency, ))
-			scanstate->prefetch_maximum = rint(maximum);
-	}
 
 	scanstate->ss.ss_currentRelation = currentRelation;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 5880054245..7a7748b695 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -110,6 +110,13 @@ bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
 double		bgwriter_lru_multiplier = 2.0;
 bool		track_io_timing = false;
+
+/*
+ * How many buffers PrefetchBuffer callers should try to stay ahead of their
+ * ReadBuffer calls by.  Zero means "never prefetch".  This value is only used
+ * for buffers not belonging to tablespaces that have their
+ * effective_io_concurrency parameter set.
+ */
 int			effective_io_concurrency = 0;
 
 /*
@@ 

Re: Reducing WaitEventSet syscall churn

2020-03-09 Thread Kyotaro Horiguchi
Hello.

I looked this.

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro  wrote 
in 
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro  wrote:
> > > > Here are some patches to get rid of frequent system calls.
> 
> Here's a new version of this patch set.  It gets rid of all temporary
> WaitEventSets except a couple I mentioned in another thread[1].
> WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
> replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
> auth are also candidates) or a special purpose long lived WaitEventSet
> (replication, postgres_fdw, pgstats).  It passes make check-world with
> WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
> -DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).
> 
> 0001: "Don't use EV_CLEAR for kqueue events."
> 
> This fixes a problem in the kqueue implementation that only shows up
> once you switch to long lived WaitEventSets.  It needs to be
> level-triggered like the other implementations, for example because
> there's a place in the recovery tests where we wait twice in a row
> without trying to do I/O in between.  (This is a bit like commit
> 3b790d256f8 that fixed a similar problem on Windows.)

It looks fine in the light of document of kqueue.

> 0002: "Use a long lived WaitEventSet for WaitLatch()."
> 
> In the last version, I had a new function WaitMyLatch(), but that
> didn't help with recoveryWakeupLatch.  Switching between latches
> doesn't require a syscall, so I didn't want to have a separate WES and
> function just for that.  So I went back to using plain old
> WaitLatch(), and made it "modify" the latch every time before waiting
> on CommonWaitSet.  An alternative would be to get rid of the concept
> of latches other than MyLatch, and change the function to
> WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> for which I didn't want to have a separate WES, so I just set that
> flag every time.  Thoughts?

It is surely an improvement over creating a full-fledged WES
every time. The name CommonWaitSet gives the impression that it is
used for a variety of wait event sets, but it is used only by
WaitLatch. So I would name it LatchWaitSet. With that name I wouldn't
be surprised that the function refers to WL_LATCH_SET by index 0
without any explanation when calling ModifyWaitSet.

@@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
ReleaseExternalFD();
 #elif defined(WAIT_USE_KQUEUE)
close(set->kqueue_fd);
-   ReleaseExternalFD();
+   if (set->kqueue_fd >= 0)
+   {
+   close(set->kqueue_fd);
+   ReleaseExternalFD();
+   }

Did you forget to remove the close() outside the if block?
Don't we need the same amendment for epoll_fd as for kqueue_fd?
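
In other words, I suppose the corrected hunk should end up looking
something like this (just a sketch of what I mean; the epoll_fd case
would need the same treatment):

#elif defined(WAIT_USE_KQUEUE)
	if (set->kqueue_fd >= 0)
	{
		close(set->kqueue_fd);
		ReleaseExternalFD();
	}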

WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
is given), the function returns immediately.". But now the function
reacts to latch even if WL_LATCH_SET is not set. I think actually it
is alwys set so I think we need to modify Assert and function comment
following the change.

> 0003: "Use regular WaitLatch() for condition variables."
> 
> That mechanism doesn't need its own WES anymore.

Looks fine.

> 0004: "Introduce RemoveWaitEvent()."
> 
> We'll need that to be able to handle connections that are closed and
> reopened under the covers by libpq (replication, postgres_fdw).  We
> also wanted this on a couple of other threads for multiplexing FDWs,
> to be able to adjust the wait set dynamically for a proposed async
> Append feature.
> 
> The implementation is a little naive, and leaves holes in the
> "pollfds" and "handles" arrays (for poll() and win32 implementations).
> That could be improved with a bit more footwork in a later patch.
> 
> XXX The Windows version is based on reading documentation.  I'd be
> very interested to know if check-world passes (especially
> contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> appveyor.yml fu is not yet strong enough.

I didn't find the documentation about INVALID_HANDLE_VALUE in
lpHandles. Could you give me the URL for that?

I didn't run recoverycheck because I couldn't install IPC::Run
for ActivePerl, but contribcheck succeeded.

+   for (int i = 0; i < nevents; ++i)
+   set->handles[i + 1] = INVALID_HANDLE_VALUE;

It accesses set->handles[nevents], which overruns the array.

+   /* Set up the free list. */
+   for (int i = 0; i < nevents; ++i)
+   set->events[i].next_free = i + 1;
+   set->events[nevents - 1].next_free = -1;

It sets the last element twice. (harmless but useless).
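
Something along these lines would avoid the double write (only to
illustrate the point):

	/* Set up the free list. */
	for (int i = 0; i < nevents - 1; ++i)
		set->events[i].next_free = i + 1;
	set->events[nevents - 1].next_free = -1;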

set->handles = (HANDLE) data;
data += MAXALIGN(sizeof(HANDLE) * nevents);

It is not an issue of this patch, but does handles need nevents + 1
elements?

WaitEventSetSize is not checking its parameter range.

I'll continue reviewing in a later mail.

> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
 

regards.

-- 
Kyotaro Horiguchi

Re: Nicer error when connecting to standby with hot_standby=off

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-09 18:40:32 -0400, James Coleman wrote:
> On Mon, Mar 9, 2020 at 6:28 PM Andres Freund  wrote:
> > > I wanted to get some initial feedback on the idea before writing a patch:
> > > does that seem like a reasonable change? Is it actually plausible to
> > > distinguish between this state and "still recovering" (i.e., when starting
> > > up a hot standby but initial recovery hasn't completed so it legitimately
> > > can't accept connections yet)? If so, should we include the possibility if
> > > hot_standby isn't on, just in case?
> >
> > Yes, it is feasible to distinguish those cases. And we should, if we're
> > going to change things around.
> 
> I'll look into this hopefully soon, but it's helpful to know that it's
> possible. Is it basically along the lines of checking to see if the
> LSN is past the minimum recovery point?

No, I don't think that's the right approach. IIRC the startup process
(i.e. the one doing the WAL replay) signals postmaster once consistency
has been achieved. So you can just use that state.

Greetings,

Andres Freund




Re: Proposal: PqSendBuffer removal

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-07 13:54:57 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > What I'm thinking is that we'd have pg_beginmessage() (potentially a
> > different named version) initialize the relevant StringInfo basically as
> 
> > (StringInfoData){
> > .data = PqSendPointer,
> > .len = 0,
> > .alloc_offset = PqSendPointer - PqSendBuffer
> > }
> 
> This seems way overcomplicated compared to what I suggested (ie,
> just let internal_putbytes bypass the buffer in cases where the
> data would get flushed next time through its loop anyway).

Well, we quite frequently send out multiple messages in a row, without a
flush inbetween. It'd be nice if we could avoid both copying buffers for
each message, as well as allocating a new stringinfo.

We've reduced the number of wholesale stringinfo reallocations with
pq_beginmessage_reuse(), which is e.g. significant when actually
returning tuples, and that was a noticable performance improvement.

I don't believe that the copy is a performance relevant factor solely
for messages that are individually too large to fit in the send
buffer. For one, there'll often be some pending send data from a
previous "round", which'd mean we'd need to call send() more often, or
use vectorized IO (i.e. switch to writev()). But also,


> What you're suggesting would be a lot more invasive and restrictive
> --- for example, why is it a good idea to have a hard-wired
> assumption that we can't build more than one message at once?

Well, we don't seem to have many (any?) places where that's not the
case. And having to use only one layer of buffering for outgoing data
does seem advantageous to me.  It'd not be hard to fall back to a
separate buffer just for the cases where there are multiple messages
built concurrently, if we want to support that.


> I'm also concerned that trying to do too much optimization here will
> break one of the goals of the existing code, which is to not get into
> a situation where an OOM failure causes a wire protocol violation
> because we've already sent part of a message but are no longer able to
> send the rest of it.  To ensure that doesn't happen, we must construct
> the whole message before we start to send it, and we can't let
> buffering of the last message be too entwined with construction of the
> next one.  Between that and the (desirable) arms-length separation
> between datatype I/O functions and the actual I/O, a certain amount of
> data copying seems unavoidable.

Sure. But I don't see why that requires two levels of buffering for
messages? If we were to build the message in the output buffer, resizing
as needed, we can send the data once the message is complete, or not at
all.

I don't think anything on the datatype I/O level would be affected?

While I think it'd be quite desirable to avoid e.g. the separate
stringinfo allocation for send functions, I think that's quite a
separate project. One which I have no really good idea to tackle.

Greetings,

Andres Freund


[1] Since I had looked it up:

We do a separate message for each of:
1) result description
2) each result row
3) ReadyForQuery

And we separately call through PQcommMethods for things like
pq_putemptymessage() and uses of pq_putmessage() not going through
pq_endmessage. The former is called a lot, especially when using the
extended query protocol (which we want clients to use!).


For a SELECT 1 in the simple protocol we end up calling putmessage via:
1) SendRowDescriptionMessage
2) printtup()
3) EndCommand()
4) ReadyForQuery()

For extended:
1) exec_parse_message()
2) exec_bind_message()
3) exec_describe_portal_message()
4) printtup()
5) EndCommand()
6) ReadyForQuery()




Re: Nicer error when connecting to standby with hot_standby=off

2020-03-09 Thread James Coleman
On Mon, Mar 9, 2020 at 6:28 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-08 20:12:21 -0400, James Coleman wrote:
> > I recently noticed while setting up a test environment that attempting to
> > connect to a standby running without hot_standby=on results in a fairly
> > generic error (I believe "the database system is starting up"). I don't
> > have my test setup running right now, so can't confirm with a repro case at
> > the moment, but with a little bit of spelunking I noticed that error text
> > only shows up in src/backend/postmaster/postmaster.c when
> > port->canAcceptConnections has the value CAC_STARTUP.
> >
> > Ideally the error message would include something along the lines of "The
> > server is running as a standby but cannot accept connections with
> > hot_standby=off".
>
> Yea, something roughly like that would be good.

Awesome, thanks for the early feedback!

> > I wanted to get some initial feedback on the idea before writing a patch:
> > does that seem like a reasonable change? Is it actually plausible to
> > distinguish between this state and "still recovering" (i.e., when starting
> > up a hot standby but initial recovery hasn't completed so it legitimately
> > can't accept connections yet)? If so, should we include the possibility if
> > hot_standby isn't on, just in case?
>
> Yes, it is feasible to distinguish those cases. And we should, if we're
> going to change things around.

I'll look into this hopefully soon, but it's helpful to know that it's
possible. Is it basically along the lines of checking to see if the
LSN is past the minimum recovery point?

> > The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
> > which makes me wonder if changing this value would result in a wire
> > protocol change/something the client wants to know about? If so, I assume
> > it's not reasonable to change the value, but would it still be reasonable
> > to change the error text?
>
> The value shouldn't be visible to clients in any way. While not obvious
> from the name, there's this comment at the top of the header:
>
>  *Note that this is backend-internal and is NOT exported to clients.
>  *Structs that need to be client-visible are in pqcomm.h.

This is also helpful.

Thanks,
James




Re: Bug in pg_restore with EventTrigger in parallel mode

2020-03-09 Thread Fabrízio de Royes Mello
On Mon, Mar 9, 2020 at 3:59 PM Tom Lane  wrote:
>
> =?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?=  writes:
> > On Mon, Mar 9, 2020 at 12:27 PM Tom Lane  wrote:
> >> Which, TBH, makes me wonder about the validity of the original complaint
> >> in this thread.  I don't mind delaying ET restore as long as we feasibly
> >> can; but if you have an ET that is going to misbehave during restore,
> >> you are in for pain, and it's hard to consider that that pain isn't
> >> self-inflicted.
>
> > The proposed patch solve the original complain. I was just trying to
> > understand completely what you pointed out before and I agree with you.
> > Thanks for the clear explanation.
>
> OK, thanks for confirming that this solves your issue in practice.
>
> > About the patch LGTM and IMHO we should back-patch it to all supported
> > versions.
>
> Done.
>

Great, thanks!

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-03-09 Thread Justin Pryzby
On Sat, Feb 29, 2020 at 08:53:04AM -0600, Justin Pryzby wrote:
> On Sat, Feb 29, 2020 at 03:35:27PM +0300, Alexey Kondratov wrote:
> > Anyway, new version is attached. It is rebased in order to resolve conflicts
> > with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes
> > this small comment fix.
> 
> Thanks for rebasing - I actually started to do that yesterday.
> 
> I extracted the bits from your original 0001 patch which handled CLUSTER and
> VACUUM FULL.  I don't think if there's any interest in combining that with
> ALTER anymore.  On another thread (1), I tried to implement that, and Tom
> pointed out problem with the implementation, but also didn't like the idea.
> 
> I'm including some proposed fixes, but didn't yet update the docs, errors or
> tests for that.  (I'm including your v8 untouched in hopes of not messing up
> the cfbot).  My fixes avoid an issue if you try to REINDEX onto pg_default, I
> think due to moving system toast indexes.

I was able to avoid this issue by adding a call to GetNewRelFileNode, even
though that's already called by RelationSetNewRelfilenode().  Not sure if
there's a better way, or if it's worth Alexey's v3 patch which added a
tablespace param to RelationSetNewRelfilenode.

The current logic allows moving all the indexes and toast indexes, but I think
we should use IsSystemRelation() unless allow_system_table_mods is set, like the
existing behavior of ALTER.

template1=# ALTER TABLE pg_extension_oid_index SET tablespace pg_default;
ERROR:  permission denied: "pg_extension_oid_index" is a system catalog
template1=# REINDEX INDEX pg_extension_oid_index TABLESPACE pg_default;
REINDEX

Finally, I think the CLUSTER part is missing permission checks.  It looks like
relation_is_movable was factored out, but I don't see how that helps?

Alexey, I'm hoping to hear back if you think these changes are ok or if you'll
publish a new version of the patch addressing the crash I reported.
Or if you're too busy, maybe someone else can adopt the patch (I can help).

-- 
Justin
>From 259e84e31b5ef0348987036ebc8ef3cc1ba85aa9 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Sat, 29 Feb 2020 15:35:27 +0300
Subject: [PATCH v10 1/5] Allow REINDEX to change tablespace

>From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 30 Dec 2019 20:00:37 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml | 24 +-
 src/backend/catalog/index.c   | 75 --
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 96 ---
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y | 14 ++--
 src/backend/tcop/utility.c|  6 +-
 src/bin/psql/tab-complete.c   |  6 ++
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 49 
 src/test/regress/output/tablespace.source | 66 
 15 files changed, 323 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c54a7c420d..0628c94bb1 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
 where option can be one of:
 
@@ -174,6 +174,28 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7223679033..3d98e9164a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1235,9 +1235,13 @@ index_create(Relation heapRelation,
  * Create concurrently an index based on the definition of the one provided by
  * caller.  The index is inserted into catalogs and needs to be built later
  * on.  This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the new 

Re: time for catalog/pg_cast.c?

2020-03-09 Thread Alvaro Herrera
On 2020-Mar-09, Alvaro Herrera wrote:

> I extracted from the latest multirange patch a bit that creates a new
> routine CastCreate() in src/backend/catalog/pg_cast.c.  It contains the
> catalog-accessing bits to create a new cast.  It seems harmless, so I
> thought I'd apply it to get rid of a couple of hunks in the large patch.

I forgot to "git add" this comment addition before sending:

/*
 * ----------------------------------------------------------------
 *  CastCreate
 *
 * Forms and inserts catalog tuples for a new cast being created.
 * Caller must have already checked privileges, and done consistency
 * checks on the given datatypes and cast function (if applicable).
 *
 * 'behavior' indicates the dependency that the new cast will have on
 * its input and output types and the cast function.
 * ----------------------------------------------------------------
 */
ObjectAddress
CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext,
   char castmethod, DependencyType behavior)


I think the only API consideration here is for 'castcontext', which we
pass here as the pg_cast.h symbol, but could alternatively be passed as
CoercionContext enum values (from primnodes.h).  I think the pg_cast.h
char is okay, but maybe somebody has a different opinion.
We could also add some trivial asserts (like if procoid is not invalid,
then method is function, etc.), but it doesn't seem worth fussing too
much over.
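
For the record, the kind of trivial asserts I mean would be something
like this (illustrative only, not included in the commit):

	/* sanity checks -- callers should already ensure these */
	Assert(!OidIsValid(funcid) || castmethod == COERCION_METHOD_FUNCTION);
	Assert(OidIsValid(funcid) || castmethod != COERCION_METHOD_FUNCTION);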

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Nicer error when connecting to standby with hot_standby=off

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-08 20:12:21 -0400, James Coleman wrote:
> I recently noticed while setting up a test environment that attempting to
> connect to a standby running without hot_standby=on results in a fairly
> generic error (I believe "the database system is starting up"). I don't
> have my test setup running right now, so can't confirm with a repro case at
> the moment, but with a little bit of spelunking I noticed that error text
> only shows up in src/backend/postmaster/postmaster.c when
> port->canAcceptConnections has the value CAC_STARTUP.
> 
> Ideally the error message would include something along the lines of "The
> server is running as a standby but cannot accept connections with
> hot_standby=off".

Yea, something roughly like that would be good.


> I wanted to get some initial feedback on the idea before writing a patch:
> does that seem like a reasonable change? Is it actually plausible to
> distinguish between this state and "still recovering" (i.e., when starting
> up a hot standby but initial recovery hasn't completed so it legitimately
> can't accept connections yet)? If so, should we include the possibility if
> hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.


> The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
> which makes me wonder if changing this value would result in a wire
> protocol change/something the client wants to know about? If so, I assume
> it's not reasonable to change the value, but would it still be reasonable
> to change the error text?

The value shouldn't be visible to clients in any way. While not obvious
from the name, there's this comment at the top of the header:

 *Note that this is backend-internal and is NOT exported to clients.
 *Structs that need to be client-visible are in pqcomm.h.


Greetings,

Andres Freund




Re: effective_io_concurrency's steampunk spindle maths

2020-03-09 Thread Thomas Munro
On Sat, Mar 7, 2020 at 11:54 PM Evgeniy Shishkin  wrote:
> > On Mar 7, 2020, at 00:33, Thomas Munro  wrote:
> > That is indeed what led me to start thinking about what a good new
> > name would be.
>
> MySQL has a term io_capacity.
> https://dev.mysql.com/doc/refman/8.0/en/innodb-configuring-io-capacity.html
> > The innodb_io_capacity variable defines the overall I/O capacity available 
> > to InnoDB. It should be set to approximately the number of I/O operations 
> > that the system can perform per second (IOPS). When innodb_io_capacity is 
> > set, InnoDB estimates the I/O bandwidth available for background tasks 
> > based on the set value.
> >
>
> Perhaps we can have maintenance_io_capacity as well.

That sounds like total I/O capacity for your system that will be
shared out for various tasks, which would definitely be nice to have,
but here we're talking about a simpler per-operation setting.  What
we have is a bit like work_mem (a memory limit used for each
individual hash, sort, tuplestore, ...), compared to a hypothetical
whole-system memory budget (which would definitely also be nice to
have).




Re: range_agg

2020-03-09 Thread Jeff Davis
On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote:
> Actually ... have you given any thought to just deciding that ranges
> and
> multiranges are the same type? 

It has come up in a number of conversations, but I'm not sure if it was
discussed on this list.

> I think on the whole the advantages win,
> and I feel like that might also be the case here.

Some things to think about:

1. Ranges are common -- at least implicitly -- in a lot of
applications/systems. It's pretty easy to represent external data as
ranges in postgres, and also to represent postgres ranges in external
systems. But I can see multiranges causing friction around a lot of
common tasks, like displaying in a UI. If you only expect ranges, you
can add a CHECK constraint, so this is annoying but not necessarily a
deal-breaker.

2. There are existing client libraries[1] that support range types and
transform them to types within the host language. Obviously, those
would need to be updated to expect multiple ranges.

3. It seems like we would want some kind of base "range" type. When you
try to break a multirange down into constituent ranges, what type would
those pieces be? (Aside: how do you get the constituent ranges?)

I'm thinking more about casting to see if there's a possible compromise
there.

Regards,
Jeff Davis

[1] 
https://sfackler.github.io/rust-postgres-range/doc/v0.8.2/postgres_range/





Re: range_agg

2020-03-09 Thread Alvaro Herrera
I wonder what's the point of multirange arrays.  Is there a reason we
create those?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: range_agg

2020-03-09 Thread Tom Lane
Alvaro Herrera  writes:
> I wonder what's the point of multirange arrays.  Is there a reason we
> create those?

That's what we thought about arrays of composites to start with,
too.

regards, tom lane




Re: Improve search for missing parent downlinks in amcheck

2020-03-09 Thread Peter Geoghegan
On Sun, Mar 8, 2020 at 2:52 PM Alexander Korotkov
 wrote:
> I've revised this comment.  Hopefully it's better now.

I think that the new comments about why we need a low key for the page
are much better now.

> I've updated this comment using terms "cousin page" and "subtree".  I
> didn't refer the diagram example, because it doesn't contain
> appropriate case.  And I wouldn't like this diagram to contain such
> case, because that probably makes this diagram too complex.  I've also
> invented term "uncle page". BTW, should it be "aunt page"?  I don't
> know.

I have never heard the term "uncle page" before, but I like it --
though maybe say "right uncle page". That happens to be the exact
relationship that we're talking about here. I think any one of
"uncle", "aunt", or "uncle/aunt" are acceptable. We'll probably never
need to use this term again, but it seems like the right term to use
here.

Anyway, this section also seems much better now.

Other things that I noticed:

* Typo:

> +   /*
> +* We don't call bt_child_check() for "negative infinity" items.
> +* But if we're performatin downlink connectivity check, we do it
> +* for every item including "negative infinity" one.
> +*/

s/performatin/performing/

* Suggest that you say "has incomplete split flag set" here:

> + * - The call for block 4 will initialize data structure, but doesn't do 
> actual
> + *   checks assuming page 4 has incomplete split.

* More importantly, is this the right thing to say about page 4? Isn't
it also true that page 4 is the leftmost leaf page, and therefore kind
of special in another way? Even without having the incomplete split
flag set at all? Wouldn't it be better to address the incomplete split
flag issue by making that apply to some other page that isn't also the
leftmost? That would allow you to talk about the leftmost case
directly here. Or it would at least make it less confusing.

BTW, a P_LEFTMOST() assertion at the beginning of
bt_child_highkey_check() would make this easier to follow.

* Correct spelling is "happens" here:

> +* current child page is not incomplete split, then its high key
> +* should match to the target's key of current offset number. This
> +* happends when child page referenced by previous downlink is

* Actually, maybe this whole sentence should be reworded instead --
say "This happens when a previous call here (to
bt_child_highkey_check()) found an incomplete split, and we reach a
right sibling page without a downlink -- the right sibling page's high
key still needs to be matched to a separator key on the parent/target
level".

* Maybe say "Don't apply OffsetNumberNext() to target_downlinkoffnum
when we already had to step right on the child level. Our traversal of
the child level must try to move in perfect lockstep behind (to the
left of) the target/parent level traversal."

I found this detail very confusing at first.

* The docs should say "...relationships, including checking that there
are no missing downlinks in the index structure" here:

>unlike bt_index_check,
>bt_index_parent_check also checks
> -  invariants that span parent/child relationships.
> +  invariants that span parent/child relationships including check
> +  that there are no missing downlinks in the index structure.
>bt_index_parent_check follows the general
>convention of raising an error if it finds a logical
>inconsistency or other problem.

This is very close now. I would be okay with you committing the patch
once you deal with this feedback. If you prefer, I can take another
look at a new revision.

--
Peter Geoghegan




RE: Proposal: Add more compile-time asserts to expose inconsistencies.

2020-03-09 Thread Smith, Peter
Hi Michael,

Sorry I was AWOL for a couple of months. Thanks for taking the patch further, 
and committing it.

Kind Regards
---
Peter Smith
Fujitsu Australia





Re: Index Skip Scan

2020-03-09 Thread David Rowley
On Tue, 10 Mar 2020 at 08:56, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Assuming we'll implement it in a way that we do not know about what kind
> of path type is that in create_distinct_path, then it can also work for
> ProjectionPath or anything else (if UniqueKeys are present). But then
> still EquivalenceMember are used only to figure out correct
> distinctPrefixKeys and do not affect whether or not skipping is applied.
> What do I miss?

I'm not sure I fully understand the question correctly, but let me
explain further.

In the 0001 patch, standard_qp_callback sets the query_uniquekeys
depending on the DISTINCT / GROUP BY clause.  When building index
paths in build_index_paths(), the 0002 patch should be looking at the
root->query_uniquekeys to see if it can build any index paths that
suit those keys.  Such paths should be tagged with the uniquekeys they
satisfy, basically, exactly the same as how pathkeys work.  Many
create_*_path functions will need to be modified to carry forward
their uniquekeys. For example, create_projection_path(),
create_limit_path() don't do anything which would cause the created
path to violate the unique keys.   This way when you get down to
create_distinct_paths(), paths other than IndexPath may have
uniquekeys.  You'll be able to check which existing paths satisfy the
unique keys required by the DISTINCT / GROUP BY and select those paths
instead of having to create any HashAggregate / Unique paths.

Does that answer the question?




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-09 Thread Thomas Munro
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO  wrote:
> >> Thank you very much! I'm going to send a new patch set until the end of
> >> this week (I'm sorry I was very busy in the release of Postgres Pro
> >> 11...).
> >
> > Is anyone interested in rebasing this, and summarising what needs to
> > be done to get it in?  It's arguably a bug or at least quite
> > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > that a couple of forks already ship Marina's patch set.
>
> I'm a reviewer on this patch, that I find a good thing (tm), and which was
> converging to a reasonable and simple enough addition, IMHO.
>
> If I proceed in place of Marina, who is going to do the reviews?

Hi Fabien,

Cool.  I'll definitely take it for a spin if you post a fresh patch
set.  Any place that we arbitrarily don't support SERIALIZABLE, I
consider a bug, so I'd like to commit this if we can agree it's ready.
It sounds like it's actually in pretty good shape.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-09 Thread Fabien COELHO



Hello Thomas,


>> Thank you very much! I'm going to send a new patch set until the end of
>> this week (I'm sorry I was very busy in the release of Postgres Pro
>> 11...).

> Is anyone interested in rebasing this, and summarising what needs to
> be done to get it in?  It's arguably a bug or at least quite
> unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> that a couple of forks already ship Marina's patch set.


I'm a reviewer on this patch, that I find a good thing (tm), and which was 
converging to a reasonable and simple enough addition, IMHO.


If I proceed in place of Marina, who is going to do the reviews?

--
Fabien.




Re: Index Skip Scan

2020-03-09 Thread Dmitry Dolgov
> On Mon, Mar 09, 2020 at 10:27:26AM +1300, David Rowley wrote:
>
> > > I think the changes in create_distinct_paths() need more work.  The
> > > way I think this should work is that create_distinct_paths() gets to
> > > know exactly nothing about what path types support the elimination of
> > > duplicate values.  The Path should carry the UniqueKeys so that can be
> > > determined. In create_distinct_paths() you should just be able to make
> > > use of those paths, which should already have been created when
> > > creating index paths for the rel due to PlannerInfo's query_uniquekeys
> > > having been set.
> >
> > Just for me to clarify. The idea is to "move" information about what
> > path types support skipping into UniqueKeys (derived from PlannerInfo's
> > query_uniquekeys), but other checks (e.g. if index am supports that)
> > still perform in create_distinct_paths?
>
> create_distinct_paths() shouldn't know any details specific to the
> pathtype that it's using or considering using.  All the details should
> just be in Path. e.g. uniquekeys, pathkeys, costs etc. There should be
> no IsA(path, ...).  Please have a look over the details in my reply to
> Tomas. I hope that reply has enough information in it, but please
> reply there if I've missed something.

Yes, I've read this reply, just wanted to ask here, since I had other
questions as well. Speaking of which:

> > > On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
> > > There's also some weird looking assumptions that an EquivalenceMember
> > > can only be a Var in create_distinct_paths(). I think you're only
> > > saved from crashing there because a ProjectionPath will be created
> > > atop of the IndexPath to evaluate expressions, in which case you're
> > > not seeing the IndexPath.

I'm probably missing something, so to eliminate any misunderstanding
from my side:

> > > This results in the optimisation not working in cases like:
> > >
> > > postgres=# create table t (a int); create index on t ((a+1)); explain
> > > select distinct a+1 from t;
> > > CREATE TABLE
> > > CREATE INDEX
> > > QUERY PLAN
> > > ---
> > >  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
> > >Group Key: (a + 1)
> > >->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)

In this particular example skipping is not applied because, as you've
mentioned, we're dealing with a ProjectionPath (not an IndexScan /
IndexOnlyScan). This means we're not even reaching the code involving
EquivalenceMember, so I'm still not sure how they are connected?
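
For comparison, the plain-column case being contrasted here, where the
DISTINCT path really is an IndexScan / IndexOnlyScan rather than a
ProjectionPath, would be something like this (illustrative only, no plan
output claimed):

create table t2 (a int);
create index on t2 (a);
explain select distinct a from t2;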

Assuming we implement it in a way where create_distinct_paths does not know
what kind of path it is dealing with, this can also work for
ProjectionPath or anything else (if UniqueKeys are present). But even then,
EquivalenceMembers are used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-09 Thread David Rowley
On Sat, 7 Mar 2020 at 03:45, Laurenz Albe  wrote:
>
> Thanks, Justin, for the review.
> I have applied the changes where still applicable.
>
> On Fri, 2020-03-06 at 10:52 +1300, David Rowley wrote:
> > Lack of a scale_factor does leave people who regularly truncate their
> > "append-only" tables out in the cold a bit.  Perhaps they'd like
> > index-only scans to kick in soon after they truncate without having to
> > wait for 10 million tuples, or so.
>
> That point I don't see.
> Truncating a table resets the counters to 0.

The scenario there is this: we have no
autovacuum_vacuum_insert_scale_factor and we set the threshold to 10
million tuples.  The user truncates the table on a monthly basis, and
toward the end of the month the table has accumulated around 100
million tuples; roughly 3.2 million are inserted per day, so
auto-vacuum kicks in for this table around once every 3 days.  At the
start of the month, the table is truncated and it begins refilling.
The n_ins_since_vacuum is reset to 0 during the truncate. Meanwhile,
the table is being queried constantly and it takes 3 days for us to
vacuum the table again. Queries hitting the table are unable to use
Index Only Scans for 3 days.  The DBAs don't have a lot of control
over this.

I think we can help users with that by giving them a bit more control
over when auto-vacuum will run for the table, i.e. both a scale_factor
and a threshold.
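
To make that concrete, a DBA could then tighten both knobs for such a table,
along these lines (a sketch only; the reloption names are assumed to mirror
the GUCs under discussion, and the values are made up):

ALTER TABLE append_only_tab SET (
    autovacuum_vacuum_insert_threshold = 1000000,
    autovacuum_vacuum_insert_scale_factor = 0.05
);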

> Updated patch attached.

Great. I'll have a look.




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-07 22:17:09 +0100, Julien Rouhaud wrote:
> For reindex concurrently, a SELECT FOR UPDATE on a different connection can
> ensure that the reindex will be stuck at some point, so canceling the command
> after a long enough timeout reproduces the original faulty behavior.

That kind of thing can already be done using statement_timeout or
lock_timeout, no?
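
For instance, a sketch of that approach (table and index names are made up):

-- session 1: hold a row lock so that the concurrent reindex blocks
BEGIN;
SELECT * FROM reind_tab FOR UPDATE;

-- session 2: bound the wait; the cancellation leaves the invalid index behind
SET statement_timeout = '1s';   -- lock_timeout would work similarly
REINDEX INDEX CONCURRENTLY reind_tab_idx;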

Greetings,

Andres Freund




Re: Add absolute value to dict_int

2020-03-09 Thread Tom Lane
I wrote:
> There are at least three things we could do here:
> 1. Insist that defGetBoolean and its siblings should accept the
> string equivalent of any non-string value they accept.  This
> would change the behavior of a whole lot of utility commands,
> not only the text search commands, and I'm not exactly convinced
> it's a good idea.  Seems like it's losing error detection
> capability.
> 2. Improve the catalog representation of text search parameters
> so that the original Value node can be faithfully reconstructed.
> I'd be for this, except it seems like a lot of work for a rather
> minor benefit.
> 3. Rearrange text search parameter validation so we smash parameters
> to strings *before* we validate them, ensuring we won't take any
> settings that will be rejected later.

I still don't much like #1, but after looking closer, #2 is not as
impractical as I thought.  The catalog representation doesn't need
any redefinition really, because per the existing comments,

 * For the convenience of pg_dump, the output is formatted exactly as it
 * would need to appear in CREATE TEXT SEARCH DICTIONARY to reproduce the
 * same options.

So all we really need to do is upgrade [de]serialize_deflist to be smarter
about int and float nodes.  This is still a bit invasive because somebody
decided to make deserialize_deflist serve two masters, and I don't feel
like working through whether the prsheadline code would cope nicely with
non-string output nodes.  So the first patch attached adds a flag argument
to deserialize_deflist to tell it whether to force all the values to
strings.
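
For reference, the kind of option list involved here, using dict_int, whose
maxlen option is an integer and rejectlong a boolean (purely illustrative):

CREATE EXTENSION dict_int;
ALTER TEXT SEARCH DICTIONARY intdict (maxlen = 4, rejectlong = true);
-- the serialized form that has to round-trip is stored here:
SELECT dictinitoption FROM pg_ts_dict WHERE dictname = 'intdict';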

Alternatively, we could do #3, as in the second patch below.  This
seems clearly Less Good, but it's a smaller/safer patch, and it's
at least potentially back-patchable, whereas changing the signature
of deserialize_deflist in stable branches would risk trouble.

(I didn't bother with regression test additions yet, but some would
be appropriate for either patch.)

Given the lack of field complaints, I'm not that excited about
back-patching anything for this.  So my inclination is to go with #2
(first patch) and fix it in HEAD only.

Thoughts?

regards, tom lane

diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 9dca682..0731ca5 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -36,6 +36,7 @@
 #include "commands/alter.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
+#include "common/string.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
@@ -52,6 +53,8 @@ static void MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 	 HeapTuple tup, Relation relMap);
 static void DropConfigurationMapping(AlterTSConfigurationStmt *stmt,
 	 HeapTuple tup, Relation relMap);
+static DefElem *buildDefItem(const char *name, const char *val,
+			 bool force_strings);
 
 
 /* - TS Parser commands  */
@@ -566,7 +569,7 @@ AlterTSDictionary(AlterTSDictionaryStmt *stmt)
 	if (isnull)
 		dictoptions = NIL;
 	else
-		dictoptions = deserialize_deflist(opt);
+		dictoptions = deserialize_deflist(opt, false);
 
 	/*
 	 * Modify the options list as per specified changes
@@ -1519,9 +1522,6 @@ DropConfigurationMapping(AlterTSConfigurationStmt *stmt,
  * For the convenience of pg_dump, the output is formatted exactly as it
  * would need to appear in CREATE TEXT SEARCH DICTIONARY to reproduce the
  * same options.
- *
- * Note that we assume that only the textual representation of an option's
- * value is interesting --- hence, non-string DefElems get forced to strings.
  */
 text *
 serialize_deflist(List *deflist)
@@ -1539,19 +1539,30 @@ serialize_deflist(List *deflist)
 
 		appendStringInfo(&buf, "%s = ",
 						 quote_identifier(defel->defname));
-		/* If backslashes appear, force E syntax to determine their handling */
-		if (strchr(val, '\\'))
-			appendStringInfoChar(&buf, ESCAPE_STRING_SYNTAX);
-		appendStringInfoChar(&buf, '\'');
-		while (*val)
+
+		/*
+		 * If the value is a T_Integer or T_Float, emit it without quotes,
+		 * otherwise with quotes.  This is essential to allow correct
+		 * reconstruction of the node type as well as the value.
+		 */
+		if (IsA(defel->arg, Integer) || IsA(defel->arg, Float))
+			appendStringInfoString(&buf, val);
+		else
 		{
-			char		ch = *val++;
+			/* If backslashes appear, force E syntax to quote them safely */
+			if (strchr(val, '\\'))
+				appendStringInfoChar(&buf, ESCAPE_STRING_SYNTAX);
+			appendStringInfoChar(&buf, '\'');
+			while (*val)
+			{
+				char		ch = *val++;
 
-			if (SQL_STR_DOUBLE(ch, true))
+				if (SQL_STR_DOUBLE(ch, true))
+					appendStringInfoChar(&buf, ch);
 				appendStringInfoChar(&buf, ch);
-			appendStringInfoChar(&buf, ch);
+			}
+			appendStringInfoChar(&buf, '\'');
 		}
-		appendStringInfoChar(&buf, '\'');
 		if (lnext(deflist, l) != NULL)
 			appendStringInfoString(&buf, ", ");
 	}
@@ -1566,10 +1577,12 @@ 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-09 Thread David Rowley
On Tue, 10 Mar 2020 at 09:56, David Rowley  wrote:
>
> On Sat, 7 Mar 2020 at 03:45, Laurenz Albe  wrote:
> > Updated patch attached.
>
> Great. I'll have a look.

I don't really have many complaints about the v4 patch.  However,
during my pass of it, I did note down a few things that you might want
to have a look at.

1. Do we need to change documentation on freeze_min_age to mention
that it does not apply in all cases? I'm leaning towards not changing
this as `VACUUM FREEZE` is also an exception to this, which I don't
see mentioned.

2. Perhaps the documentation in maintenance.sgml should mention that
the table will be vacuumed with the equivalent of having
vacuum_freeze_min_age = 0, instead of:

"Such a vacuum will aggressively freeze tuples."

"aggressive" is the wrong word here. We call it an aggressive vacuum when
we disable page skipping, not when we set vacuum_freeze_min_age to 0.

See heap_vacuum_rel()

/*
* We request an aggressive scan if the table's frozen Xid is now older
* than or equal to the requested Xid full-table scan limit; or if the
* table's minimum MultiXactId is older than or equal to the requested
* mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
*/
aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
   xidFullScanLimit);
aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
  mxactFullScanLimit);
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
aggressive = true;

3. The following DEBUG3 elog should be updated to include the new values:

elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
NameStr(classForm->relname),
vactuples, vacthresh, anltuples, anlthresh);

Someone might be confused about why autovacuum is running if you don't
put those in.

4. This would be nicer if you swapped the order of the operands to the
< condition and replaced the operator with >. That'll match the way it
is done above.

/*
* If the number of inserted tuples exceeds the threshold and no
* vacuum is necessary for other reasons, run an "insert-only" vacuum
* that freezes aggressively.
*/
if (!(*dovacuum) && vacinsthresh < tabentry->inserts_since_vacuum)
{
*dovacuum = true;
*freeze_all = true;
}

It would also be nicer if you assigned the value of
tabentry->inserts_since_vacuum to a variable, so as to match what the
other code there is doing. That'll also make the change for #3 neater.

5. The following text:

A threshold similar to the above is calculated from
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor.
Tables that have received more inserts than the calculated threshold
since they were last vacuumed (and are not eligible for vacuuming for
other reasons) will be vacuumed to reduce the impact of a future
anti-wraparound vacuum run.

I think "... will be vacuumed with the equivalent of having
vacuum_freeze_min_age set to 0".
I'm not sure we need to mention the reduction of impact on
anti-wraparound vacuums.

6. Please run the regression tests and make sure they pass. The
"rules" test is currently failing due to the new column in
"pg_stat_all_tables"

Apart from the above, does anyone else have objections or concerns
with the patch?  I'd like to take a serious look at pushing it once
the above points are resolved.




Re: Nicer error when connecting to standby with hot_standby=off

2020-03-09 Thread James Coleman
On Mon, Mar 9, 2020 at 8:06 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-09 18:40:32 -0400, James Coleman wrote:
> > On Mon, Mar 9, 2020 at 6:28 PM Andres Freund  wrote:
> > > > I wanted to get some initial feedback on the idea before writing a 
> > > > patch:
> > > > does that seem like a reasonable change? Is it actually plausible to
> > > > distinguish between this state and "still recovering" (i.e., when 
> > > > starting
> > > > up a hot standby but initial recovery hasn't completed so it 
> > > > legitimately
> > > > can't accept connections yet)? If so, should we include the possibility 
> > > > if
> > > > hot_standby isn't on, just in case?
> > >
> > > Yes, it is feasible to distinguish those cases. And we should, if we're
> > > going to change things around.
> >
> > I'll look into this hopefully soon, but it's helpful to know that it's
> > possible. Is it basically along the lines of checking to see if the
> > LSN is past the minimum recovery point?
>
> No, I don't think that's the right approach. IIRC the startup process
> (i.e. the one doing the WAL replay) signals postmaster once consistency
> has been achieved. So you can just use that state.

I've taken that approach in the attached patch (I'd expected to wait
until later to work on this...but it seemed pretty small so I ended up
hacking on it this evening).

I don't have tests included: I tried intentionally breaking the
existing behavior (returning no error when hot_standby=off), but
running make check-world (including tap tests) didn't find any
breakages. I can look into that more deeply at some point, but if you
happen to know a place we test similar things, then I'd be happy to
hear it.

One other question: how is error message translation handled? I
haven't added entries to the relevant files, but also I'm obviously
not qualified to write them.

Thanks,
James
From 0389f9dd7b48b3058849960826a49079abb58e58 Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v1] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needlessly confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 12 +---
 src/include/libpq/libpq-be.h|  7 ++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 55187eb910..63258287f7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2287,6 +2287,12 @@ retry1:
 	(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 	 errmsg("the database system is starting up")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+	(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+	 errmsg("the database system is up, but hot_standby is off")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 	(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2436,10 +2442,10 @@ canAcceptConnections(int backend_type)
 			result = CAC_WAITBACKUP;	/* allow superusers only */
 		else if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
- (pmState == PM_STARTUP ||
-  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else if (!FatalError &&
  pmState == PM_HOT_STANDBY)
 			result = CAC_OK;	/* connection OK during hot standby */
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 82e57afc64..effa78a84a 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_WAITBACKUP
 } CAC_state;
 

base-commit: 71e0d0a73773b3985db658d3c5366ce5ceef76ae
-- 
2.17.1



Re: Asynchronous Append on postgres_fdw nodes.

2020-03-09 Thread movead li
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I ran into a strange issue when I executed 'make installcheck-world':

##
...
== running regression test queries==
test adminpack... FAILED   60 ms

==
 1 of 1 tests failed. 
==

The differences that caused some tests to fail can be viewed in the
file "/work/src/postgres_app_for/contrib/adminpack/regression.diffs".  A copy 
of the test summary that you see
above is saved in the file 
"/work/src/postgres_app_for/contrib/adminpack/regression.out".
...
##

And the content in 'contrib/adminpack/regression.out' is:
##
SELECT pg_file_write('/tmp/test_file0', 'test0', false);
 ERROR:  absolute path not allowed
 SELECT pg_file_write(current_setting('data_directory') || '/test_file4', 
'test4', false);
- pg_file_write 

- 5
-(1 row)
-
+ERROR:  reference to parent directory ("..") not allowed
 SELECT pg_file_write(current_setting('data_directory') || '/../test_file4', 
'test4', false);
 ERROR:  reference to parent directory ("..") not allowed
 RESET ROLE;
@@ -149,7 +145,7 @@
 SELECT pg_file_unlink('test_file4');
  pg_file_unlink 
 
- t
+ f
 (1 row)
##

However the issue does not occur when I do a 'make check-world'.
And it doesn't occur when I test the 'make installcheck-world' without the 
patch.

The new status of this patch is: Waiting on Author


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-09 Thread Fujii Masao




On 2020/03/09 14:21, Magnus Hagander wrote:

On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
 wrote:


At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander  wrote 
in

On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao  wrote:

I believe that the time required to estimate the backup size is not so large
in most cases, so in the above idea, most users don't need to specify more
option for the estimation. This is good for UI perspective.

OTOH, users who are worried about the estimation time can use
--no-estimate-backup-size option and skip the time-consuming estimation.


Personally, I think this is the best idea. It brings a "reasonable
default", since most people are not going to have this problem, and
yet gives a good way to get out of the issue for those that potentially
have it. Especially since we are now already showing the state that
"walsender is estimating the size", it should be easy enough for people
to determine whether they need to use this flag or not.

In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
pretty clear things are about backups when you call pg_basebackup, and
it keeps the option a bit more reasonable in length.


+1


I agree with the negative option and the shortened name.  What if both
--no-estimate-size and -P are specified?  Reject them as conflicting
options, or does -P supersede?  I would choose the former because we don't
know which of them has priority.


I would definitely prefer rejecting an invalid combination of options.


+1

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.
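
For reference, the estimation phase under discussion is what a monitoring
session sees in the new progress view, roughly like this (column and phase
names as in the patch, so treat this as illustrative):

SELECT pid, phase, backup_streamed, backup_total
  FROM pg_stat_progress_basebackup;
-- phase reads 'estimating backup size' while the walsender walks the data
-- directory; with --no-estimate-size, backup_total would simply stay NULL.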

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: shared-memory based stats collector

2020-03-09 Thread Kyotaro Horiguchi
At Tue, 10 Mar 2020 12:27:25 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> That's true, but I have the same concern as Tom. The archiver becomes
> too tightly linked with processes it has no actual relation to. We may
> need a second static shared memory segment apart from the current
> one.

Anyway, I changed the target version of the entry to 14.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: DROP and ddl_command_end.

2020-03-09 Thread Kyotaro Horiguchi
Thanks.

At Mon, 9 Mar 2020 13:29:47 -0400, Robert Haas  wrote in 
> On Mon, Mar 9, 2020 at 3:54 AM Kyotaro Horiguchi
>  wrote:
> > I may be missing something, and any opinions, thoughts or suggestions
> > are welcome.
> 
> Since it's a set-returning function, I would have expected that
> instead of trying to assign the result to a variable, you'd loop over
> it using FOR var IN query.

Yes, right, and I know. I intended the sample to be simple, but sorry
for the bogus example. But the problem is not there. The problem is that
when the trigger is called for DROP, the function returns no tuples. I'm
not sure DROP is the only command that causes this behavior, but if no
tuples means DROP, we should document that behavior.  Otherwise, we
need to document something like:

"pg_event_trigger_ddl_commands() may omit some of the commands and may
 return no tuples."

But it is quite odd.
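
For reference, the FOR var IN query shape Robert suggests would look roughly
like this (function and trigger names are made up):

CREATE FUNCTION log_ddl_end() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        RAISE NOTICE 'tag: %, object: %', r.command_tag, r.object_identity;
    END LOOP;
END;
$$;
CREATE EVENT TRIGGER log_ddl ON ddl_command_end EXECUTE FUNCTION log_ddl_end();
-- For DROP details specifically, the sql_drop event and
-- pg_event_trigger_dropped_objects() are the documented way to get them.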

> But if that's the problem, the error message is a bit odd.

The error message is correct if we allow a zero-tuple return from the
function.  Would it be useful to return the DROP event with more information
than just "DROP"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-09 Thread Michael Paquier
On Mon, Mar 09, 2020 at 03:38:29PM +0530, Kuntal Ghosh wrote:
> That's a good suggestion. But, it's unlikely that a caller would pass
> something longer than MAXPGPATH and we indeed use that value a lot in
> the code. IMHO, it looks okay to me to have that assumption here as
> well.

Well, a more serious problem would be to allocate something smaller
than MAXPGPATH.  This reminds me a bit of 09ec55b9 where we did not
correctly design from the start the base64 encode and decode routines
for SCRAM, so I'd rather design this one correctly from the start as
per the attached.  Alexey, Alexander, what do you think?
--
Michael
From 4e2b5b60e003ec67a4854f2625cc610f01428fe2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 9 Mar 2020 17:15:00 +0900
Subject: [PATCH v2] Move routine generating restore_command to src/common/

restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.

Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee3616895...@postgrespro.ru
---
 src/include/common/archive.h |  29 ++
 src/backend/access/transam/xlogarchive.c |  62 ++---
 src/common/Makefile  |   1 +
 src/common/archive.c | 110 +++
 src/tools/msvc/Mkvcbuild.pm  |   4 +-
 5 files changed, 149 insertions(+), 57 deletions(-)
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/common/archive.c

diff --git a/src/include/common/archive.h b/src/include/common/archive.h
new file mode 100644
index 00..805930490b
--- /dev/null
+++ b/src/include/common/archive.h
@@ -0,0 +1,29 @@
+/*-
+ *
+ * archive.h
+ *	  Common WAL archive routines
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/archive.h
+ *
+ *-
+ */
+#ifndef ARCHIVE_H
+#define ARCHIVE_H
+
+/*
+ * Routine to build a restore command to retrieve WAL segments from a WAL
+ * archive.  This uses the arguments given by the caller to replace the
+ * corresponding alias fields as defined in the GUC parameter
+ * restore_command.
+ */
+extern bool BuildRestoreCommand(const char *restoreCommand,
+const char *xlogpath,	/* %p */
+const char *xlogfname,	/* %f */
+const char *lastRestartPointFname,	/* %r */
+char *commandResult,
+int commandResultLen);
+
+#endif			/* ARCHIVE_H */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..e1d6df89e1 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -55,9 +56,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	char		xlogpath[MAXPGPATH];
 	char		xlogRestoreCmd[MAXPGPATH];
 	char		lastRestartPointFname[MAXPGPATH];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
 	int			rc;
 	struct stat stat_buf;
 	XLogSegNo	restartSegNo;
@@ -149,58 +147,12 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogRestoreCmd;
-	endp = xlogRestoreCmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = recoveryRestoreCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-case 'p':
-	/* %p: relative path of target file */
-	sp++;
-	StrNCpy(dp, xlogpath, endp - dp);
-	make_native_path(dp);
-	dp += strlen(dp);
-	break;
-case 'f':
-	/* %f: filename of desired file */
-	sp++;
-	StrNCpy(dp, xlogfname, endp - dp);
-	dp += strlen(dp);
-	break;
-case 'r':
-	/* %r: filename of last restartpoint */
-	sp++;
-	StrNCpy(dp, lastRestartPointFname, endp - dp);
-	dp += strlen(dp);
-	break;
-case '%':
-	/* convert %% to a single % */
-	sp++;
-	if (dp < endp)
-		*dp++ = *sp;
-	break;
-default:
-	/* otherwise treat the % as not special */
-	if (dp < endp)
-		*dp++ = *sp;
-	break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
+	/* Build the restore command to execute */
+	if (!BuildRestoreCommand(recoveryRestoreCommand, xlogpath, xlogfname,
+			 lastRestartPointFname, xlogRestoreCmd,
+			 MAXPGPATH))
+		elog(ERROR, "could not build restore command \"%s\"",
+			 xlogRestoreCmd);
 
 	ereport(DEBUG3,
 			

Re: logical copy_replication_slot issues

2020-03-09 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 21:46, Arseny Sher  wrote:
>
>
> Masahiko Sawada  writes:
>
> > /*
> > -* Create logical decoding context, to build the initial snapshot.
> > +* Create logical decoding context to find start point or, if we don't
> > +* need it, to 1) bump slot's restart_lsn and xmin 2) check plugin 
> > sanity.
> >  */
> >
> > Do we need the numbering even though we don't refer to them?
>
> No, it just seemed clearer to me this way. I don't mind removing the
> numbers if you feel this is better.
>

Okay.

> > ctx = CreateInitDecodingContext(plugin, NIL,
> > -   false,  /* do not build snapshot */
> > +   false,  /* do not build data snapshot */
> > restart_lsn,
> > logical_read_local_xlog_page, NULL, 
> > NULL,
> > NULL);
> > I'm not sure this change makes the comment better. Could you elaborate
> > on the motivation of this change?
>
> Well, DecodingContextFindStartpoint always builds a snapshot allowing
> historical *catalog* lookups. This bool controls whether the snapshot
> should additionally be suitable for looking at the actual data, this is
> e.g. used by initial data sync in the native logical replication.

Okay.

Anyway, since the patch looks good to me I've marked this patch as
"Ready for Committer". I think we can defer these things to
committers.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2020-03-09 Thread Michael Paquier
On Tue, Mar 10, 2020 at 12:22:32AM +, Smith, Peter wrote:
> Sorry I was AWOL for a couple of months. Thanks for taking the patch
> further, and committing it. 

No problem, I am glad to see you around.
--
Michael




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Michael Paquier
On Mon, Mar 09, 2020 at 10:32:27PM -0400, Tom Lane wrote:
> It strikes me to wonder whether we could improve matters by teaching
> isolationtester to watch for particular values in a connected backend's
> pg_stat_activity.wait_event_type/wait_event columns.  Those columns
> didn't exist when isolationtester was designed, IIRC, so it's not
> surprising that they're not used in the current design.  But we could
> use them perhaps to detect that a backend has arrived at some state
> that's not a heavyweight-lock-wait state.

Interesting idea.  So that would be basically an equivalent of
PostgresNode::poll_query_until but for the isolation tester?  In short
we gain a meta-command that runs a SELECT query that waits until the
query defined in the command returns true.  The polling interval may
be tricky to set though.  If set too low it would consume resources
for nothing, and if set too large it would make the tests using this
meta-command slower than they actually need to be.  Perhaps something
like 100ms may be fine..
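
Something like this would be the shape of condition such a meta-command could
poll (the pid and the wait-state values are placeholders, to be supplied by
the spec):

SELECT count(*) > 0
  FROM pg_stat_activity
 WHERE pid = 12345                      -- pid of the session being watched
   AND wait_event_type = 'Lock'
   AND wait_event = 'virtualxid';       -- whatever state the test cares about
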
--
Michael




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-09 Thread Laurenz Albe
On Tue, 2020-03-10 at 09:56 +1300, David Rowley wrote:
> > > Lack of a scale_factor does leave people who regularly truncate their
> > > "append-only" tables out in the cold a bit.  Perhaps they'd like
> > > index-only scans to kick in soon after they truncate without having to
> > > wait for 10 million tuples, or so.
> > 
> > That point I don't see.
> > Truncating a table resets the counters to 0.
> 
> The scenario there is this: we have no
> autovacuum_vacuum_insert_scale_factor and we set the threshold to 10
> million tuples.  The user truncates the table on a monthly basis, and
> toward the end of the month the table has accumulated around 100
> million tuples; roughly 3.2 million are inserted per day, so
> auto-vacuum kicks in for this table around once every 3 days.  At the
> start of the month, the table is truncated and it begins refilling.
> The n_ins_since_vacuum is reset to 0 during the truncate. Meanwhile,
> the table is being queried constantly and it takes 3 days for us to
> vacuum the table again. Queries hitting the table are unable to use
> Index Only Scans for 3 days.  The DBAs don't have a lot of control
> over this.
> 
> I think we can help users with that by giving them a bit more control
> over when auto-vacuum will run for the table, i.e. both a scale_factor
> and a threshold.

Oh, that's a good point.
I only thought about anti-wraparound vacuum, but the feature might be useful
for index-only scans as well.

Yours,
Laurenz Albe





Re: reindex concurrently and two toast indexes

2020-03-09 Thread Michael Paquier
On Mon, Mar 09, 2020 at 08:04:27AM +0100, Julien Rouhaud wrote:
> On Mon, Mar 09, 2020 at 02:52:31PM +0900, Michael Paquier wrote:
>> For the index-level operation, issuing a WARNING is not consistent
>> with the existing practice to use an ERROR, which is more adapted as
>> the operation is done on a single index at a time.
> 
> Agreed.

Thanks for checking the patch.

>> It would have been nice to test this stuff.  However, this requires an
>> invalid toast index and we cannot create that except by canceling a
>> concurrent reindex, leading us back to the upthread discussion about
>> isolation tests, timeouts and fault injection :/
> 
> Yes, unfortunately I don't see an acceptable way to add tests for that without
> some kind of fault injection, so this will have to wait :(

Let's discuss that separately.

I have also been reviewing the isolation test you have added upthread
about the dependency handling of invalid indexes, and one thing that
we cannot really do is attempt a reindex at the index or table level
with invalid toast indexes, as this leads to unstable ERROR
or WARNING messages.  But at least one thing we can do is to extend
the query you sent directly so that it exposes the toast relation name
filtered with regexp_replace().  This gives us a stable output, and
this way the test makes sure that the query cancellation happened
after the dependencies were swapped, and not at build or validation
time (indisvalid got appended to the end of the output):
+pg_toast.pg_toast__index_ccoldf
+pg_toast.pg_toast__indext

Please feel free to see the attached for reference, that's not
something for commit in upstream, but I am going to keep that around
in my own plugin tree.
--
Michael
From fd988b4141725082b30148da31d9c2a6a88c7319 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 10 Mar 2020 12:01:23 +0900
Subject: [PATCH] Add isolation test to check dependency handling of invalid
 indexes

This uses a statement_timeout of 1s, which is not really a good idea :D
---
 .../isolation/expected/reindex-invalid.out| 34 +
 src/test/isolation/isolation_schedule |  1 +
 src/test/isolation/specs/reindex-invalid.spec | 38 +++
 3 files changed, 73 insertions(+)
 create mode 100644 src/test/isolation/expected/reindex-invalid.out
 create mode 100644 src/test/isolation/specs/reindex-invalid.spec

diff --git a/src/test/isolation/expected/reindex-invalid.out b/src/test/isolation/expected/reindex-invalid.out
new file mode 100644
index 00..ef1eff9ce7
--- /dev/null
+++ b/src/test/isolation/expected/reindex-invalid.out
@@ -0,0 +1,34 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_select s2_timeout s2_reindex s2_check s1_commit s1_drop s2_check
+step s1_select: SELECT data FROM reind_con_invalid FOR UPDATE;
+data   
+
+foo
+bar
+step s2_timeout: SET statement_timeout = 1000;
+step s2_reindex: REINDEX TABLE CONCURRENTLY reind_con_invalid; 
+step s2_reindex: <... completed>
+ERROR:  canceling statement due to statement timeout
+step s2_check: SELECT regexp_replace(i.indexrelid::regclass::text, '(pg_toast_)([0-9+/=]+)(_index)', '\1\3'),
+	 i.indisvalid
+	FROM pg_class c
+	  JOIN pg_class t ON t.oid = c.reltoastrelid
+	  JOIN pg_index i ON i.indrelid = t.oid
+	WHERE c.relname = 'reind_con_invalid'
+	  ORDER BY c.relname;
+regexp_replace indisvalid 
+
+pg_toast.pg_toast__index_ccoldf  
+pg_toast.pg_toast__indext  
+step s1_commit: COMMIT;
+step s1_drop: DROP TABLE reind_con_invalid;
+step s2_check: SELECT regexp_replace(i.indexrelid::regclass::text, '(pg_toast_)([0-9+/=]+)(_index)', '\1\3'),
+	 i.indisvalid
+	FROM pg_class c
+	  JOIN pg_class t ON t.oid = c.reltoastrelid
+	  JOIN pg_index i ON i.indrelid = t.oid
+	WHERE c.relname = 'reind_con_invalid'
+	  ORDER BY c.relname;
+regexp_replace indisvalid 
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 2873cd7c21..28e5b8b16a 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -48,6 +48,7 @@ test: lock-committed-update
 test: lock-committed-keyupdate
 test: update-locked-tuple
 test: reindex-concurrently
+test: reindex-invalid
 test: propagate-lock-delete
 test: tuplelock-conflict
 test: tuplelock-update
diff --git a/src/test/isolation/specs/reindex-invalid.spec b/src/test/isolation/specs/reindex-invalid.spec
new file mode 100644
index 00..bf866d2279
--- /dev/null
+++ b/src/test/isolation/specs/reindex-invalid.spec
@@ -0,0 +1,38 @@
+# REINDEX CONCURRENTLY with invalid indexes
+#
+# Check that handling of dependencies with invalid indexes is correct.
+# When the parent table is dropped, all the indexes are dropped.
+
+setup
+{
+	CREATE TABLE reind_con_invalid (id serial primary key, data text);
+	INSERT INTO reind_con_invalid (data) VALUES 

Re: shared-memory based stats collector

2020-03-09 Thread Kyotaro Horiguchi
Thank you all for the suggestions.

At Mon, 9 Mar 2020 12:25:39 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2020-03-09 15:04:23 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2020-03-09 15:37:05 -0300, Alvaro Herrera wrote:
> > >> I'm worried that we're causing all processes to terminate when an
> > >> archiver dies in some ugly way; but in the current coding, it's pretty
> > >> harmless and we'd just start a new one.  I think this needs to be
> > >> reconsidered.  As far as I know, pgarchiver remains unconnected to
> > >> shared memory so a crash-restart cycle is not necessary.  We should
> > >> continue to just log the error message and move on.
> > 
> > > Why is it worth having the archiver be "robust" that way?
> > 
> > I'd ask a different question: what the heck is this patchset doing
> > touching the archiver in the first place?  I can see no plausible
> > reason for that doing anything related to stats collection.
> 
> As of a release or two back, it sends stats messages for archiving
> events:
> 
>   if (pgarch_archiveXlog(xlog))
>   {
>   /* successful */
>   pgarch_archiveDone(xlog);
> 
>   /*
>* Tell the collector about the WAL file that 
> we successfully
>* archived
>*/
>   pgstat_send_archiver(xlog, false);
> 
>   break;  /* out of inner retry 
> loop */
>   }
>   else
>   {
>   /*
>* Tell the collector about the WAL file that 
> we failed to
>* archive
>*/
>   pgstat_send_archiver(xlog, true);
> 
> 
> > If we now need some new background processing for stats, let's make a
> > new postmaster child process to do that, not overload the archiver
> > with unrelated responsibilities.
> 
> I don't think that's what's going on. It's just changing the archiver so
> it can report stats via shared memory - which subsequently means that it
> needs to deal differently with errors than when it wasn't connected.

That's true, but I have the same concern as Tom. The archiver becomes
too tightly linked with processes it has no actual relation to. We may
need a second static shared memory segment apart from the current
one.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: bad logging around broken restore_command

2020-03-09 Thread Kyotaro Horiguchi
At Thu, 6 Feb 2020 23:23:42 +0900, Fujii Masao  
wrote in 
> On 2020/02/06 1:10, Jeff Janes wrote:
> > If the restore command claims to have succeeded, but fails to have created
> > a file with the right name (due to typos or escaping or quoting issues, for
> > example), no complaint is issued at the time, and it then fails later with
> > a relatively uninformative error message like "could not locate required
> > checkpoint record".
...
> > I don't see why ENOENT is thought to deserve the silent treatment.  It
> > seems weird that success gets logged ("restored log file \"%s\" from
> > archive"), but one particular type of unexpected failure does not.
> 
> Agreed.

In the first place it is not perfectly silent, and that problem cannot
happen.  In the ENOENT case, the function reports "could not restore
file \"%s\" from archive: %s", though only at DEBUG2, then returns false,
and the callers treat the failure properly.

> I've attached a patch which emits a LOG message for ENOENT.
> 
> Isn't it better to use "could not stat file" message even in ENOENT
> case?
> If yes, something like message that you used in the patch should be
> logged as DETAIL or HINT message. So, what about the attached patch?

If you want to see some log messages in that case, it is sufficient to
raise the log level of the existing message from DEBUG2 to LOG.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: shared-memory based stats collector

2020-03-09 Thread Andres Freund
On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> That's true, but I have the same concern as Tom. The archiver becomes
> too tightly linked with processes it has no actual relation to.

What's the problem here? We have a number of helper processes
(checkpointer, bgwriter) that are attached to shared memory, and it's
not a problem.


> We may need a second static shared memory segment apart from the
> current one.

That seems absurd to me. Solving a non-problem by introducing complex
new infrastructure.




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Mar 09, 2020 at 10:32:27PM -0400, Tom Lane wrote:
>> It strikes me to wonder whether we could improve matters by teaching
>> isolationtester to watch for particular values in a connected backend's
>> pg_stat_activity.wait_event_type/wait_event columns.  Those columns
>> didn't exist when isolationtester was designed, IIRC, so it's not
>> surprising that they're not used in the current design.  But we could
>> use them perhaps to detect that a backend has arrived at some state
>> that's not a heavyweight-lock-wait state.

> Interesting idea.  So that would be basically an equivalent of
> PostgresNode::poll_query_until but for the isolation tester?

No, more like the existing isolationtester wait query, which watches
for something being blocked on a heavyweight lock.  Right now, that
one depends on a bespoke function pg_isolation_test_session_is_blocked(),
but it used to be a query on pg_stat_activity/pg_locks.

> In short
> we gain a meta-command that runs a SELECT query that waits until the
> query defined in the command returns true.  The polling interval may
> be tricky to set though.

I think it'd be just the same as the polling interval for the existing
wait query.  We'd have to have some way to mark a script step to say
what to check to decide that it's blocked ...

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-09 Thread Justin Pryzby
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> +#autovacuum_vacuum_insert_threshold = 1000   # min number of row 
> inserts
> + # before vacuum

Similar to a previous comment [0] about reloptions or GUC:

Can we say "threshold number of insertions before vacuum" ?
..or "maximum number of insertions before triggering autovacuum"

-- 
Justin

[0] 
https://www.postgresql.org/message-id/602873766faa0e9200a60dcc26dc10c636761d5d.camel%40cybertec.at




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Mar 09, 2020 at 03:15:58PM -0700, Andres Freund wrote:
>> That kind of thing can already be done using statement_timeout or
>> lock_timeout, no?

> Yep, still that's not something I would recommend to commit in the
> tree as that's a double-edged sword as you already know.  For slower
> machines, you need a statement_timeout large enough so as you make
> sure that the state you want the query to wait for is reached, which
> has a cost on all other faster machines as it makes the tests slower.

It strikes me to wonder whether we could improve matters by teaching
isolationtester to watch for particular values in a connected backend's
pg_stat_activity.wait_event_type/wait_event columns.  Those columns
didn't exist when isolationtester was designed, IIRC, so it's not
surprising that they're not used in the current design.  But we could
use them perhaps to detect that a backend has arrived at some state
that's not a heavyweight-lock-wait state.

regards, tom lane




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-09 Thread Dilip Kumar
On Mon, Mar 9, 2020 at 2:36 PM Amit Kapila  wrote:
>
> On Mon, Mar 9, 2020 at 2:09 PM Masahiko Sawada 
>  wrote:
>>
>> On Mon, 9 Mar 2020 at 15:50, Amit Kapila  wrote:
>> >
>> > On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada 
>> >  wrote:
>> >>
>> >> On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
>> >> >
>> >> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada 
>> >> >  wrote:
>> >> >> >
>> >> >> > Fair position, as per initial analysis, I think if we do below three
>> >> >> > things, it should work out without changing to a new way of locking
>> >> >> > for relation extension or page type locks.
>> >> >> > a. As per the discussion above, ensure in code we will never try to
>> >> >> > acquire another heavy-weight lock after acquiring relation extension
>> >> >> > or page type locks (probably by having Asserts in code or maybe some
>> >> >> > other way).
>> >> >>
>> >> >> The current patch
>> >> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> >> >> doesn't check that acquiring a heavy-weight lock after page type lock,
>> >> >> is that right?
>> >> >
>> >> >
>> >> > No, it should do that.
>> >> >
>> >> >>
>> >> >> There is the path doing that: ginInsertCleanup() holds
>> >> >> a page lock and insert the pending list items, which might hold a
>> >> >> relation extension lock.
>> >> >
>> >> >
>> >> > Right, I could also see that, but do you see any problem with that?  I 
>> >> > agree that Assert should cover this case, but I don't see any 
>> >> > fundamental problem with that.
>> >>
>> >> I think that could be a problem if we change the group locking so that
>> >> it doesn't consider page lock type.
>> >
>> >
>> > I might be missing something, but won't that be a problem only if
>> > there is a case where we acquire page lock after acquiring a relation 
>> > extension lock?
>>
>> Yes, you're right.
>>
>> Well I meant that the reason why we need to make Assert should cover
>> page locks case is the same as the reason for extension lock type
>> case. If we change the group locking so that it doesn't consider
>> extension lock and change deadlock so that it doesn't make a wait edge
>> for it, we need to ensure that the same backend doesn't acquire
>> heavy-weight lock after holding relation extension lock. These are
>> already done in the current patch. Similarly, if we did the similar
>> change for page lock in the group locking and deadlock , we need to
>> ensure the same things for page lock.
>
>
> Agreed.
>
>>
>> But ISTM it doesn't necessarily
>> need to support page lock for now because currently we use it only for
>> cleanup pending list of gin index.
>>
>
> I agree, but I think it is better to have a patch for the same even if we 
> want to review/commit that separately.  That will help us to look at how the 
> complete solution looks.

Please find the updated patch (summary of the changes)
- Instead of searching the lock hash table for assert, it maintains a counter.
- Also, handled the case where we can acquire the relation extension
lock while holding the relation extension lock on the same relation.
- Handled the error case.

In addition to that, I prepared a WIP patch for handling the PageLock.
First, I thought that we could use the same counter for the PageLock and
the RelationExtensionLock, because in the assert we just need to check
whether we are trying to acquire any other heavyweight lock while
holding either of these locks.  But the exceptional case where we are
allowed to acquire a relation extension lock while holding one of
these locks is a bit different.  If we are holding a relation
extension lock, then we are allowed to acquire the relation extension
lock on the same relation, but not on any other relation, otherwise it
could create a cycle.  The same is not true for the PageLock:
while holding the PageLock you can acquire the relation extension
lock on any relation, and that will be safe because the relation
extension lock guarantees that it will never create a cycle.
However, I agree that we don't have any such case where we want to
acquire a relation extension lock on a different relation while
holding the PageLock.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Add-assert-to-check-that-we-should-not-acquire-an.patch
Description: Binary data


v3-0003-Conflict-Extension-Page-lock-in-group-member.patch
Description: Binary data


v3-0002-WIP-Extend-the-patch-for-handling-PageLock.patch
Description: Binary data


Re: Some problems of recovery conflict wait events

2020-03-09 Thread Masahiko Sawada
On Tue, 10 Mar 2020 at 00:57, Fujii Masao  wrote:
>
>
>
> On 2020/03/09 14:10, Masahiko Sawada wrote:
> > On Mon, 9 Mar 2020 at 13:24, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/08 13:52, Masahiko Sawada wrote:
> >>> On Thu, 5 Mar 2020 at 20:16, Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2020/03/05 16:58, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 15:21, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/04 14:31, Masahiko Sawada wrote:
> >>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao 
> >>>  wrote:
> 
> 
> 
>  On 2020/03/04 13:27, Michael Paquier wrote:
> > On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
> >> Yeah, so 0001 patch sets existing wait events to recovery conflict
> >> resolution. For instance, it sets (PG_WAIT_LOCK | 
> >> LOCKTAG_TRANSACTION)
> >> to the recovery conflict on a snapshot. 0003 patch improves these 
> >> wait
> >> events by adding the new type of wait event such as
> >> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) 
> >> patch
> >> is the fix for existing versions and 0003 patch is an improvement 
> >> for
> >> only PG13. Did you mean even 0001 patch doesn't fit for 
> >> back-patching?
> 
>  Yes, it looks like a improvement rather than bug fix.
> 
> >>>
> >>> Okay, understand.
> >>>
> > I got my eyes on this patch set.  The full patch set is in my 
> > opinion
> > just a set of improvements, and not bug fixes, so I would refrain 
> > from
> > back-backpatching.
> 
>  I think that the issue (i.e., "waiting" is reported twice needlessly
>  in PS display) that 0002 patch tries to fix is a bug. So it should be
>  fixed even in the back branches.
> >>>
> >>> So we need only two patches: one fixes process title issue and another
> >>> improve wait event. I've attached updated patches.
> >>
> >> Thanks for updating the patches! I started reading 0001 patch.
> >
> > Thank you for reviewing this patch.
> >
> >>
> >> -   /*
> >> -* Report via ps if we have been waiting for 
> >> more than 500 msec
> >> -* (should that be configurable?)
> >> -*/
> >> -   if (update_process_title && new_status == NULL 
> >> &&
> >> -   TimestampDifferenceExceeds(waitStart, 
> >> GetCurrentTimestamp(),
> >> -  
> >> 500))
> >>
> >> The patch changes ResolveRecoveryConflictWithSnapshot() and
> >> ResolveRecoveryConflictWithTablespace() so that they always add
> >> "waiting" into the PS display, whether wait is really necessary or not.
> >> But isn't it better to display "waiting" in PS basically when wait is
> >> necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
> >> does as the above?
> >
> > You're right. Will fix it.
> >
> >>
> >>  ResolveRecoveryConflictWithDatabase(Oid dbid)
> >>  {
> >> +   char*new_status = NULL;
> >> +
> >> +   /* Report via ps we are waiting */
> >> +   new_status = set_process_title_waiting();
> >>
> >> In  ResolveRecoveryConflictWithDatabase(), there seems no need to
> >> display "waiting" in PS because no wait occurs when recovery conflict
> >> with database happens.
> >
> > Isn't the startup process waiting for other backend to terminate?
> 
>  Yeah, you're right. I agree that "waiting" should be reported in this 
>  case.
> 
>  Currently ResolveRecoveryConflictWithLock() and
>  ResolveRecoveryConflictWithBufferPin() don't call
>  ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
>  in PS display. You changed them so that they report "waiting". I agree
>  to have this change. But this change is an improvement rather than
>  a bug fix, i.e., we should apply this change only in v13?
> 
> >>>
> >>> Did you mean ResolveRecoveryConflictWithDatabase and
> >>> ResolveRecoveryConflictWithBufferPin?
> >>
> >> Yes! Sorry for my typo.
> >>
> >>> In the current code as far as I
> >>> researched there are two cases where we don't add "waiting" and one
> >>> case where we doubly add "waiting".
> >>>
> >>> ResolveRecoveryConflictWithDatabase and
> >>> ResolveRecoveryConflictWithBufferPin don't update the ps title.
> >>> Although the path where GetCurrentTimestamp() >= ltime is false in
> >>> ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
> >>> already updated in WaitOnLock. On the other hand, the path where
> >>> GetCurrentTimestamp() 

Re: Crash by targetted recovery

2020-03-09 Thread Fujii Masao




On 2020/03/09 15:46, Fujii Masao wrote:



On 2020/03/09 13:49, Kyotaro Horiguchi wrote:

At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.


That meant 0003.


Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


Looks good to me. Thanks for writing the detailed comments.


Thanks for the review! Pushed.

I will review other two patches later.


Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Regarding the remaining patch adding the regression test,

+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);

What happens if the "off" part reaches its upper limit and the "seg" part needs
to be incremented? What happens if pg_last_wal_replay_lsn() advances
very far (e.g., because of autovacuum) beyond the segment boundary
before the standby restarts? Of course, these situations happen very rarely,
but I'd like to avoid adding such an unstable test if possible.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Michael Paquier
On Mon, Mar 09, 2020 at 03:15:58PM -0700, Andres Freund wrote:
> On 2020-03-07 22:17:09 +0100, Julien Rouhaud wrote:
>> For reindex concurrently, a SELECT FOR UPDATE on a different connection can
>> ensure that the reindex will be stuck at some point, so canceling the command
>> after a long enough timeout reproduces the original faulty behavior.
> 
> That kind of thing can already be done using statement_timeout or
> lock_timeout, no?

Yep, still that's not something I would recommend to commit in the
tree as that's a double-edged sword as you already know.  For slower
machines, you need a statement_timeout large enough so as you make
sure that the state you want the query to wait for is reached, which
has a cost on all other faster machines as it makes the tests slower.
--
Michael




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-09 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
>
> On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada 
>  wrote:
>>
>> On Mon, 24 Feb 2020 at 19:08, Amit Kapila  wrote:
>> >
>> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
>> > >
>> > > Hi,
>> > >
>> > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
>> > > > I think till we know the real need for changing group locking, going
>> > > > in the direction of what Tom suggested to use an array of LWLocks [1]
>> > > > to address the problems in hand is a good idea.
>> > >
>> > > -many
>> > >
>> > > I think that building yet another locking subsystem is the entirely
>> > > wrong idea - especially when there's imo no convincing architectural
>> > > reasons to do so.
>> > >
>> >
>> > Hmm, AFAIU, it will be done by having an array of LWLocks which we do
>> > at other places as well (like BufferIO locks).  I am not sure if we
>> > can call it as new locking subsystem, but if we decide to continue
>> > using lock.c and change group locking then I think we can do that as
>> > well, see my comments below regarding that.
>> >
>> > >
>> > > > It is not very clear to me that are we thinking to give up on Tom's
>> > > > idea [1] and change group locking even though it is not clear or at
>> > > > least nobody has proposed an idea/patch which requires that?  Or are
>> > > > we thinking that we can do what Tom suggested for relation extension
>> > > > lock and also plan to change group locking for future parallel
>> > > > operations that might require it?
>> > >
>> > > What I'm advocating is that extension locks should continue to go
>> > > through lock.c. And yes, that requires some changes to group locking,
>> > > but I still don't see why they'd be complicated.
>> > >
>> >
>> > Fair position, as per initial analysis, I think if we do below three
>> > things, it should work out without changing to a new way of locking
>> > for relation extension or page type locks.
>> > a. As per the discussion above, ensure in code we will never try to
>> > acquire another heavy-weight lock after acquiring relation extension
>> > or page type locks (probably by having Asserts in code or maybe some
>> > other way).
>>
>> The current patch
>> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> doesn't check that acquiring a heavy-weight lock after page type lock,
>> is that right?
>
>
> No, it should do that.
>
>>
>> There is the path doing that: ginInsertCleanup() holds
>> a page lock and insert the pending list items, which might hold a
>> relation extension lock.
>
>
> Right, I could also see that, but do you see any problem with that?  I agree 
> that Assert should cover this case, but I don't see any fundamental problem 
> with that.

I think that could be a problem if we change the group locking so that
it doesn't consider page lock type.

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-09 Thread Amit Kapila
On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
> >
> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >> >
> >> > Fair position, as per initial analysis, I think if we do below three
> >> > things, it should work out without changing to a new way of locking
> >> > for relation extension or page type locks.
> >> > a. As per the discussion above, ensure in code we will never try to
> >> > acquire another heavy-weight lock after acquiring relation extension
> >> > or page type locks (probably by having Asserts in code or maybe some
> >> > other way).
> >>
> >> The current patch
> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
> >> doesn't check that acquiring a heavy-weight lock after page type lock,
> >> is that right?
> >
> >
> > No, it should do that.
> >
> >>
> >> There is the path doing that: ginInsertCleanup() holds
> >> a page lock and insert the pending list items, which might hold a
> >> relation extension lock.
> >
> >
> > Right, I could also see that, but do you see any problem with that?  I
> agree that Assert should cover this case, but I don't see any fundamental
> problem with that.
>
> I think that could be a problem if we change the group locking so that
> it doesn't consider page lock type.
>

I might be missing something, but won't that be a problem only if there is
a case where we acquire a page lock after acquiring a relation extension
lock?  Can you please explain the scenario you have in mind which can
create a problem?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: reindex concurrently and two toast indexes

2020-03-09 Thread Julien Rouhaud
On Mon, Mar 09, 2020 at 02:52:31PM +0900, Michael Paquier wrote:
> On Fri, Mar 06, 2020 at 01:36:48PM +0100, Julien Rouhaud wrote:
> >
> > v4 attached, which doesn't prevent a REINDEX INDEX CONCURRENTLY on any 
> > invalid
> > non-TOAST index anymore.
>
> Thanks.  The position of the error check in reindex_relation() is
> correct, but as it opens a relation for the cache lookup let's invent
> a new routine in lsyscache.c to grab pg_index.indisvalid.  It is
> possible to make use of this routine with all the other checks:
> - WARNING for REINDEX TABLE (non-concurrent)
> - ERROR for REINDEX INDEX (non-concurrent)
> - ERROR for REINDEX INDEX CONCURRENTLY
> (There is already a WARNING for REINDEX TABLE CONCURRENTLY.)
>
> I did not find the addition of an error check in ReindexIndex()
> consistent with the existing practice to check the state of the
> relation reindexed in reindex_index() (for the non-concurrent case)
> and ReindexRelationConcurrently() (for the concurrent case).  Okay,
> this leads to the introduction of two new ERROR messages related to
> invalid toast indexes for the concurrent and the non-concurrent cases
> when using REINDEX INDEX instead of one, but having two messages leads
> to something much more consistent with the rest, and all checks remain
> centralized in the same routines.

I wanted to go this way at first but hesitated and finally chose to add fewer
checks, so I'm fine with this approach, and the patch looks good to me.
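
For reference, a minimal sketch of the kind of lsyscache.c helper being
discussed (the name and details are my assumption, not the committed code;
it relies on the syscache machinery already included in that file):

bool
get_index_isvalid(Oid index_oid)
{
	HeapTuple	tuple;
	Form_pg_index rd_index;
	bool		isvalid;

	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(index_oid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for index %u", index_oid);

	rd_index = (Form_pg_index) GETSTRUCT(tuple);
	isvalid = rd_index->indisvalid;
	ReleaseSysCache(tuple);

	return isvalid;
}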

> For the index-level operation, issuing a WARNING is not consistent
> with the existing practice to use an ERROR, which is more adapted as
> the operation is done on a single index at a time.

Agreed.

> For the check in reindex_relation, it is more consistent to check the
> namespace of the index instead of the parent relation IMO (the
> previous patch used "rel", which refers to the parent table).  This
> has in practice no consequence though.

Oops yes.


> It would have been nice to test this stuff.  However, this requires an
> invalid toast index and we cannot create that except by canceling a
> concurrent reindex, leading us back to the upthread discussion about
> isolation tests, timeouts and fault injection :/

Yes, unfortunately I don't see an acceptable way to add tests for that without
some kind of fault injection, so this will have to wait :(




Re: Additional improvements to extended statistics

2020-03-09 Thread Dean Rasheed
On Mon, 9 Mar 2020 at 00:02, Tomas Vondra  wrote:
>
> Speaking of which, would you take a look at [1]? I think supporting SAOP
> is fine, but I wonder if you agree with my conclusion we can't really
> support inclusion @> as explained in [2].
>

Hmm, I'm not sure. However, thinking about your example in [2] reminds
me of a thought I had a while ago, but then forgot about --- there is
a flaw in the formula used for computing probabilities with functional
dependencies:

  P(a,b) = P(a) * [f + (1-f)*P(b)]

because it might return a value that is larger than P(b), which
obviously should not be possible. We should amend that formula to
prevent a result larger than P(b). The obvious way to do that would be
to use:

  P(a,b) = Min(P(a) * [f + (1-f)*P(b)], P(b))

but actually I think it would be better and more principled to use:

  P(a,b) = f*Min(P(a),P(b)) + (1-f)*P(a)*P(b)

I.e., for those rows believed to be functionally dependent, we use the
minimum probability, and for the rows believed to be independent, we
use the product.
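
To make that concrete with a hypothetical example (numbers made up for
illustration): take f = 0.9, P(a) = 0.5 and P(b) = 0.1.  The current
formula gives 0.5 * [0.9 + 0.1*0.1] = 0.455, which is larger than
P(b) = 0.1, whereas the proposed formula gives
0.9*Min(0.5,0.1) + 0.1*0.5*0.1 = 0.09 + 0.005 = 0.095, which stays
at or below P(b).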

I think that would solve the problem with the example you gave at the
end of [2], but I'm not sure if it helps with the general case.

Regards,
Dean


> [1] 
> https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy@pierred-pdoc
> [2] 
> https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development




Re: [PATCH] Support built control file in PGXS VPATH builds

2020-03-09 Thread Peter Eisentraut
On 2020-02-07 04:14, Craig Ringer wrote:
> The attached patch fixes this by having PGXS resolve
> $(EXTENSION).control along the VPATH.

Simpler patch:

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index 271e7eaba8..1cd750eecd 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -229,7 +229,7 @@ endif # MODULE_big
 
 install: all installdirs
 ifneq (,$(EXTENSION))
-   $(INSTALL_DATA) $(addprefix $(srcdir)/, $(addsuffix .control, $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
+   $(INSTALL_DATA) $(call vpathsearch,$(addsuffix .control, $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
 endif # EXTENSION
 ifneq (,$(DATA)$(DATA_built))
	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) '$(DESTDIR)$(datadir)/$(datamoduledir)/'

Does that work for you?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improve handling of parameter differences in physical replication

2020-03-09 Thread Peter Eisentraut

On 2020-02-28 16:33, Alvaro Herrera wrote:

Hmm, so what is the actual end-user behavior?  As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error.  Presumably, at
that point the startup process terminates and is relaunched, and replay
continues normally.  Is that it?


No, at that point you get the original, current behavior that the server 
instance shuts down with a fatal error.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: allow trigger to get updated columns

2020-03-09 Thread Peter Eisentraut

On 2020-03-05 13:53, Daniel Gustafsson wrote:

The 0001 refactoring patch seems a clear win to me.

In the 0002 patch:

+For UPDATE triggers, a bitmap set indicating the
+columns that were updated by the triggering command.  Generic trigger

Is it worth pointing out that tg_updatedcols will be NULL rather than an empty
Bitmapset for non-UPDATE triggers?  bitmapset.c treats NULL as an empty bitmap
but since a Bitmapset can be allocated but empty, maybe it's worth being
explicit to help developers?


done
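
As a side note, here is a minimal sketch of how a C trigger could consult
the new field under those conventions (function name, includes and column
number are hypothetical, for illustration only):

#include "postgres.h"
#include "fmgr.h"
#include "access/sysattr.h"
#include "commands/trigger.h"
#include "nodes/bitmapset.h"

PG_FUNCTION_INFO_V1(check_updated_col);

Datum
check_updated_col(PG_FUNCTION_ARGS)
{
	TriggerData *trigdata = (TriggerData *) fcinfo->context;
	AttrNumber	attnum = 2;		/* column of interest, made up */

	if (!CALLED_AS_TRIGGER(fcinfo))
		elog(ERROR, "not called by trigger manager");

	/* tg_updatedcols is only set for UPDATE; NULL otherwise */
	if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event) &&
		trigdata->tg_updatedcols != NULL &&
		bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
					  trigdata->tg_updatedcols))
		elog(NOTICE, "column %d was assigned by the UPDATE", attnum);

	return PointerGetDatum(NULL);	/* AFTER trigger, return value ignored */
}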


There isn't really a test suite that exercises this IIUC, how about adding
something like the attached diff to contrib/lo?  It seemed like a lower impact
change than widening test_tsvector.


done


+1 on the patchset, marking this entry as Ready For Committer.


and done

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Julien Rouhaud
On Mon, Mar 09, 2020 at 04:47:27PM +0900, Michael Paquier wrote:
> On Sat, Mar 07, 2020 at 10:46:34AM -0500, Tom Lane wrote:
> > The arbitrarily-set timeouts that exist in some of the isolation tests
> > are horrid kluges that have caused us lots of headaches in the past
> > and no doubt will again in the future.  Aside from occasionally failing
> > when a machine is particularly overloaded, they cause the tests to
> > take far longer than necessary on decently-fast machines.  So ideally
> > we'd get rid of those entirely in favor of some more-dynamic approach.
> > Admittedly, I have no proposal for what that would be.  But adding yet
> > more ways to set a (guaranteed-to-be-wrong) timeout seems like the
> > wrong direction to be going in.  What's the actual need that you're
> > trying to deal with?
>
> As a matter of fact, the buildfarm member petalura just reported a
> failure with the isolation test "timeouts", the machine being
> extremely slow:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-08%2011%3A20%3A05
>
test timeouts ... FAILED    60330 ms
> [...]
> -step update: DELETE FROM accounts WHERE accountid = 'checking'; 
> -step update: <... completed>
> +step update: DELETE FROM accounts WHERE accountid = 'checking';
>  ERROR:  canceling statement due to statement timeout

Indeed.  I guess we could add some kind of environment variable facility in
isolationtester to let slow-machine owners set a much bigger timeout without
making the test super slow for everyone else, but that seems overkill for just
one test, and given the other thread about deploying the REL_11 buildfarm
client, that wouldn't be an immediate fix either.
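
Just to illustrate the idea (entirely hypothetical, nothing like this exists
in isolationtester today), the tester could scale such timeouts through an
environment variable instead of hard-coding them:

#include <stdlib.h>

/* Multiply a hard-coded timeout by an optional per-machine factor. */
static int
scaled_timeout_ms(int base_ms)
{
	const char *factor = getenv("PG_ISOLATION_TIMEOUT_FACTOR");

	if (factor != NULL && atoi(factor) > 1)
		return base_ms * atoi(factor);
	return base_ms;
}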




Re: Add an optional timeout clause to isolationtester step.

2020-03-09 Thread Michael Paquier
On Sat, Mar 07, 2020 at 10:46:34AM -0500, Tom Lane wrote:
> The arbitrarily-set timeouts that exist in some of the isolation tests
> are horrid kluges that have caused us lots of headaches in the past
> and no doubt will again in the future.  Aside from occasionally failing
> when a machine is particularly overloaded, they cause the tests to
> take far longer than necessary on decently-fast machines.  So ideally
> we'd get rid of those entirely in favor of some more-dynamic approach.
> Admittedly, I have no proposal for what that would be.  But adding yet
> more ways to set a (guaranteed-to-be-wrong) timeout seems like the
> wrong direction to be going in.  What's the actual need that you're
> trying to deal with?

As a matter of fact, the buildfarm member petalura just reported a
failure with the isolation test "timeouts", the machine being
extremely slow:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-08%2011%3A20%3A05

test timeouts ... FAILED    60330 ms
[...]
-step update: DELETE FROM accounts WHERE accountid = 'checking'; 
-step update: <... completed>
+step update: DELETE FROM accounts WHERE accountid = 'checking';
 ERROR:  canceling statement due to statement timeout
--
Michael


signature.asc
Description: PGP signature


Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-09 Thread Michael Paquier
On Fri, Mar 06, 2020 at 05:22:16PM +0900, Michael Paquier wrote:
> Thanks for the suggestion.  Avoiding dead code makes sense as well
> here.  I'll think about this stuff a bit more first.

Okay, after pondering more on that, here is a first cut with a patch
refactoring restore_command build to src/common/.  One thing I have
changed compared to the other versions is that BuildRestoreCommand()
now returns a boolean to tell if the command was successfully built or
not.

A second thing.  As of now the interface of BuildRestoreCommand()
assumes that the resulting buffer has an allocation of MAXPGPATH.
This should be fine because that's an assumption we rely on a lot in
the code, be it frontend or backend.  However, it could also be a trap
for any caller of this routine if they allocate something smaller.
Wouldn't it be cleaner to pass down the size of the result buffer
directly to BuildRestoreCommand() and use the length given by the
caller of the routine as a sanity check?

Any thoughts?
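
For illustration, the variant I have in mind would look roughly like this
(purely a sketch, not what the attached patch implements):

/* Hypothetical interface: the caller supplies the result buffer size. */
extern bool BuildRestoreCommand(const char *restoreCommand,
								const char *xlogpath,	/* %p */
								const char *xlogfname,	/* %f */
								const char *lastRestartPointFname,	/* %r */
								char *result,
								size_t resultlen);

/* and a caller would then do something like: */
char		xlogRestoreCmd[MAXPGPATH];

if (!BuildRestoreCommand(recoveryRestoreCommand, xlogpath, xlogfname,
						 lastRestartPointFname,
						 xlogRestoreCmd, sizeof(xlogRestoreCmd)))
	elog(ERROR, "could not build restore command");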
--
Michael
From bad2c04b0eb3a146f0c2719ed8360b3b255c3c47 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 9 Mar 2020 17:15:00 +0900
Subject: [PATCH] Move routine generating restore_command to src/common/

restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.

Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee3616895...@postgrespro.ru
---
 src/include/common/archive.h |  28 ++
 src/backend/access/transam/xlogarchive.c |  61 ++---
 src/common/Makefile  |   1 +
 src/common/archive.c | 109 +++
 src/tools/msvc/Mkvcbuild.pm  |   4 +-
 5 files changed, 146 insertions(+), 57 deletions(-)
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/common/archive.c

diff --git a/src/include/common/archive.h b/src/include/common/archive.h
new file mode 100644
index 00..bdcc49ffd3
--- /dev/null
+++ b/src/include/common/archive.h
@@ -0,0 +1,28 @@
+/*-
+ *
+ * archive.h
+ *	  Common WAL archive routines
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/archive.h
+ *
+ *-
+ */
+#ifndef ARCHIVE_H
+#define ARCHIVE_H
+
+/*
+ * Routine to build a restore command to retrieve WAL segments from a WAL
+ * archive.  This uses the arguments given by the caller to replace the
+ * corresponding alias fields as defined in the GUC parameter
+ * restore_command.
+ */
+extern bool BuildRestoreCommand(const char *restoreCommand,
+const char *xlogpath,	/* %p */
+const char *xlogfname,	/* %f */
+const char *lastRestartPointFname,	/* %r */
+char *result);
+
+#endif			/* ARCHIVE_H */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..6e8cd88a5e 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -55,9 +56,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	char		xlogpath[MAXPGPATH];
 	char		xlogRestoreCmd[MAXPGPATH];
 	char		lastRestartPointFname[MAXPGPATH];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
 	int			rc;
 	struct stat stat_buf;
 	XLogSegNo	restartSegNo;
@@ -149,58 +147,11 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogRestoreCmd;
-	endp = xlogRestoreCmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = recoveryRestoreCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-case 'p':
-	/* %p: relative path of target file */
-	sp++;
-	StrNCpy(dp, xlogpath, endp - dp);
-	make_native_path(dp);
-	dp += strlen(dp);
-	break;
-case 'f':
-	/* %f: filename of desired file */
-	sp++;
-	StrNCpy(dp, xlogfname, endp - dp);
-	dp += strlen(dp);
-	break;
-case 'r':
-	/* %r: filename of last restartpoint */
-	sp++;
-	StrNCpy(dp, lastRestartPointFname, endp - dp);
-	dp += strlen(dp);
-	break;
-case '%':
-	/* convert %% to a single % */
-	sp++;
-	if (dp < endp)
-		*dp++ = *sp;
-	break;
-default:
-	/* otherwise treat the % as not special */
-	if (dp < endp)
-		*dp++ = *sp;
-	break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-			

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-09 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 15:50, Amit Kapila  wrote:
>
> On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada 
>  wrote:
>>
>> On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
>> >
>> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada 
>> >  wrote:
>> >> >
>> >> > Fair position, as per initial analysis, I think if we do below three
>> >> > things, it should work out without changing to a new way of locking
>> >> > for relation extension or page type locks.
>> >> > a. As per the discussion above, ensure in code we will never try to
>> >> > acquire another heavy-weight lock after acquiring relation extension
>> >> > or page type locks (probably by having Asserts in code or maybe some
>> >> > other way).
>> >>
>> >> The current patch
>> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> >> doesn't check that acquiring a heavy-weight lock after page type lock,
>> >> is that right?
>> >
>> >
>> > No, it should do that.
>> >
>> >>
>> >> There is the path doing that: ginInsertCleanup() holds
>> >> a page lock and insert the pending list items, which might hold a
>> >> relation extension lock.
>> >
>> >
>> > Right, I could also see that, but do you see any problem with that?  I 
>> > agree that Assert should cover this case, but I don't see any fundamental 
>> > problem with that.
>>
>> I think that could be a problem if we change the group locking so that
>> it doesn't consider page lock type.
>
>
> I might be missing something, but won't that be a problem only when if there 
> is a case where we acquire page lock after acquiring a relation extension 
> lock?

Yes, you're right.

Well, I meant that the reason the Asserts should cover the page lock case
is the same as the reason for the relation extension lock case. If we
change the group locking so that it doesn't consider the extension lock,
and change deadlock detection so that it doesn't make a wait edge for it,
we need to ensure that the same backend doesn't acquire another
heavy-weight lock after holding a relation extension lock. These are
already done in the current patch. Similarly, if we made the same change
for page locks in group locking and deadlock detection, we would need to
ensure the same things for page locks. But ISTM it doesn't necessarily
need to support page locks for now because currently we use them only for
cleaning up the pending list of a GIN index.
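
To make the "same things" concrete, the kind of Assert we are talking about
could look roughly like this (flag and function names are hypothetical, not
the actual patch code):

/* Set while this backend holds a relation extension or page lock. */
static bool holdingRelExtOrPageLock = false;

/*
 * Would be called from LockAcquire(): while such a lock is held, only
 * another relation extension or page lock may be requested.
 */
static inline void
AssertNoOtherHeavyweightLock(const LOCKTAG *locktag)
{
	Assert(!holdingRelExtOrPageLock ||
		   locktag->locktag_type == LOCKTAG_RELATION_EXTEND ||
		   locktag->locktag_type == LOCKTAG_PAGE);
}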


Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-09 Thread Amit Kapila
On Mon, Mar 9, 2020 at 2:09 PM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Mon, 9 Mar 2020 at 15:50, Amit Kapila  wrote:
> >
> > On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >>
> >> On Mon, 9 Mar 2020 at 14:16, Amit Kapila 
> wrote:
> >> >
> >> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >> >> >
> >> >> > Fair position, as per initial analysis, I think if we do below
> three
> >> >> > things, it should work out without changing to a new way of locking
> >> >> > for relation extension or page type locks.
> >> >> > a. As per the discussion above, ensure in code we will never try to
> >> >> > acquire another heavy-weight lock after acquiring relation
> extension
> >> >> > or page type locks (probably by having Asserts in code or maybe
> some
> >> >> > other way).
> >> >>
> >> >> The current patch
> >> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
> >> >> doesn't check that acquiring a heavy-weight lock after page type
> lock,
> >> >> is that right?
> >> >
> >> >
> >> > No, it should do that.
> >> >
> >> >>
> >> >> There is the path doing that: ginInsertCleanup() holds
> >> >> a page lock and insert the pending list items, which might hold a
> >> >> relation extension lock.
> >> >
> >> >
> >> > Right, I could also see that, but do you see any problem with that?
> I agree that Assert should cover this case, but I don't see any fundamental
> problem with that.
> >>
> >> I think that could be a problem if we change the group locking so that
> >> it doesn't consider page lock type.
> >
> >
> > I might be missing something, but won't that be a problem only when if
> there is a case where we acquire page lock after acquiring a relation
> extension lock?
>
> Yes, you're right.
>
> Well I meant that the reason why we need to make Assert should cover
> page locks case is the same as the reason for extension lock type
> case. If we change the group locking so that it doesn't consider
> extension lock and change deadlock so that it doesn't make a wait edge
> for it, we need to ensure that the same backend doesn't acquire
> heavy-weight lock after holding relation extension lock. These are
> already done in the current patch. Similarly, if we did the similar
> change for page lock in the group locking and deadlock , we need to
> ensure the same things for page lock.


Agreed.


> But ISTM it doesn't necessarily
> need to support page lock for now because currently we use it only for
> cleanup pending list of gin index.
>
>
I agree, but I think it is better to have a patch for the same even if we
want to review/commit that separately.  That will help us to look at how
the complete solution looks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Identifying user-created objects

2020-03-09 Thread Kyotaro Horiguchi
At Sun, 8 Mar 2020 11:55:06 +0900, Masahiko Sawada 
 wrote in 
> On Thu, 5 Mar 2020 at 18:39, Kyotaro Horiguchi  
> wrote:
> >
> > At Thu, 5 Mar 2020 18:06:26 +0900, Masahiko Sawada 
> >  wrote in
> > > On Thu, 5 Mar 2020 at 16:36, Kyotaro Horiguchi  
> > > wrote:
> > > I think normally users don't want to remove built-in functions because
> > > they think these functions are trusted and it's hard to restore them
> > > when they want later. So I thought user want to search functions that
> > > is unnecessary but not a built-in function.
> >
> > I'm not sure those who want to introduce PCI-DSS are under a normal
> > situation, though:p
> >
> > That seems beside the point. pg_read_file is known to be usable for
> > drawing out database files. If you leave the function alone, the
> > security officer (designer?) have to consider the possibility that
> > someone draws out files in the database system using the function and
> > have to plan the action for the threat.  In that context,
> > built-in-or-not distinction is useless.
> 
> So how do you check if unnecessary, malicious or unauthorized function
> exists in database after that, for example when periodical security
> check? Functions defined after initdb must be checked under user's
> responsibility but since normally there are many built-in functions in
> pg_proc the check in pg_proc could be cumbersome. So the idea of this
> feature is to make that check easier by marking built-in functions.

I think there's no easy way to accomplish it.  If the PostgreSQL
documentation says "Yeah, the function tells you whether using the
function or feature complies with the requirement of PCI-DSS 2.2.5!" and
it reports all built-in functions as safe, that is an outright lie even if
the server has just been initdb'ed.

Separating this from PCI-DSS, if we document it just as "the function
tells whether the function is built-in or not", it's true. (But I'm not
sure about its usage..)

I might be misunderstanding the operation steps you have in mind.

> >
> > In the first place, if you assume that someone may install malicious
> > functions in the server after beginning operation, distinction by OID
> > doesn't work at all because who can illegally install a malicious
> > function also be able to modify its OID with quite low odds.
> 
> Yes, that's what Fujii-san also pointed out. It's better to find a way
> to distinct functions while not relying on OID.

And it is out of scope for PCI-DSS 2.2.5, which concerns design or
system-building time.

Apart from PCI-DSS, if you are concerned about operation-time threats:
once someone malicious could install a function on the server, I think
that kind of feature, with any criteria, no longer works as a
countermeasure against further threats. Maybe something like tripwire
would work, that is, a kind of checksum over the system catalogs.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-09 Thread Kuntal Ghosh
On Mon, Mar 9, 2020 at 1:46 PM Michael Paquier  wrote:
>
> On Fri, Mar 06, 2020 at 05:22:16PM +0900, Michael Paquier wrote:
> > Thanks for the suggestion.  Avoiding dead code makes sense as well
> > here.  I'll think about this stuff a bit more first.
>
> Okay, after pondering more on that, here is a first cut with a patch
> refactoring restore_command build to src/common/.  One thing I have
> changed compared to the other versions is that BuildRestoreCommand()
> now returns a boolean to tell if the command was successfully built or
> not.
>
Yeah. If we're returning only -1 and 0, it's better to use a boolean.
If we're planning to provide some information about which parameter is
missing, a few negative integer values might be useful. But, I feel
that's unnecessary in this case.

> A second thing.  As of now the interface of BuildRestoreCommand()
> assumes that the resulting buffer has an allocation of MAXPGPATH.
> This should be fine because that's an assumption we rely on a lot in
> the code, be it frontend or backend.  However, it could also be a trap
> for any caller of this routine if they allocate something smaller.
> Wouldn't it be cleaner to pass down the size of the result buffer
> directly to BuildRestoreCommand() and use the length given by the
> caller of the routine as a sanity check?
>
That's a good suggestion. But, it's unlikely that a caller would pass
something longer than MAXPGPATH and we indeed use that value a lot in
the code. IMHO, it looks okay to me to have that assumption here as
well.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Crash by targetted recovery

2020-03-09 Thread Fujii Masao




On 2020/03/09 13:49, Kyotaro Horiguchi wrote:

At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.


That meant 0003.


Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


Looks good to me. Thanks for writing the detailed comments.


Thanks for the review! Pushed.

I will review other two patches later.


There seems to be more other places where XLogSource and
XLOG_FROM_XXX are not used yet. For example, the initial values
of readSource and XLogReceiptSource, the type of argument
"source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
These also should be updated?


Right. I checked through the file and AFAICS that's all.  The attached
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.


Thanks for the patch!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




DROP and ddl_command_end.

2020-03-09 Thread Kyotaro Horiguchi
Hello.

When I created an event trigger for ddl_command_end, I think the only
means of identifying what the trigger function was called for is
pg_event_trigger_ddl_commands(), so I wrote the following function and
defined an event trigger for ddl_command_end.

CREATE OR REPLACE FUNCTION hoge() RETURNS event_trigger AS $$
DECLARE
  cmd record =  pg_event_trigger_ddl_commands();
BEGIN
  RAISE NOTICE '"%" "%" "%"',
cmd.command_tag, cmd.object_type, cmd.object_identity;
END
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER hoge_trigger ON ddl_command_end EXECUTE FUNCTION hoge();

Finally I got an ERROR while DROP.

=# CREATE TABLE t (a int);
NOTICE:  "CREATE TABLE" "table" "public.t"
CREATE TABLE
postgres=# DROP TABLE t;
ERROR:  record "cmd" is not assigned yet
DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
CONTEXT:  PL/pgSQL function hoge() line 5 at RAISE

The function doesn't return a record for DROP statements.

The documentation is written as the follows:

https://postgresql.org/docs/current/event-trigger-definition.html
> The ddl_command_end event occurs just after the execution of this same
> set of commands. To obtain more details on the DDL operations that
> took place, use the set-returning function
> pg_event_trigger_ddl_commands() from the ddl_command_end event trigger
> code (see Section 9.28). Note that the trigger fires after the actions
> have taken place (but before the transaction commits), and thus the
> system catalogs can be read as already changed.

So I think at least pg_event_trigger_ddl_commands() must return a record
for all commands that fire ddl_command_end, and the record should
have the correct command_tag.  DROP TABLE is currently classified as
supporting event triggers.  If we don't do that, some workaround and
documentation are needed.

I may be missing something, and any opinions, thoughts or suggestions
are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 91800d1fac..8376ce429b 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -1939,7 +1939,8 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 		 * state anyway.
 		 */
 		if (cmd->type == SCT_Simple &&
-			!OidIsValid(cmd->d.simple.address.objectId))
+			!OidIsValid(cmd->d.simple.address.objectId) &&
+			!IsA(cmd->parsetree, DropStmt))
 			continue;
 
 		MemSet(nulls, 0, sizeof(nulls));
@@ -1952,8 +1953,8 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 			case SCT_CreateOpClass:
 			case SCT_AlterTSConfig:
 {
-	char	   *identity;
-	char	   *type;
+	char	   *identity = NULL;
+	char	   *type = NULL;
 	char	   *schema = NULL;
 
 	if (cmd->type == SCT_Simple)
@@ -1969,8 +1970,12 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 	else if (cmd->type == SCT_AlterTSConfig)
 		addr = cmd->d.atscfg.address;
 
-	type = getObjectTypeDescription(&addr);
-	identity = getObjectIdentity(&addr);
+	if (memcmp(&addr, &InvalidObjectAddress,
+			   sizeof(ObjectAddress)) != 0)
+	{
+		type = getObjectTypeDescription(&addr);
+		identity = getObjectIdentity(&addr);
+	}
 
 	/*
 	 * Obtain schema name, if any ("pg_temp" if a temp
@@ -2023,14 +2028,20 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 	/* command tag */
 	values[i++] = CStringGetTextDatum(CreateCommandName(cmd->parsetree));
 	/* object_type */
-	values[i++] = CStringGetTextDatum(type);
+	if (type != NULL)
+		values[i++] = CStringGetTextDatum(type);
+	else
+		nulls[i++] = true;
 	/* schema */
 	if (schema == NULL)
 		nulls[i++] = true;
 	else
 		values[i++] = CStringGetTextDatum(schema);
 	/* identity */
-	values[i++] = CStringGetTextDatum(identity);
+	if (identity != NULL)
+		values[i++] = CStringGetTextDatum(identity);
+	else
+		nulls[i++] = true;
 	/* in_extension */
 	values[i++] = BoolGetDatum(cmd->in_extension);
 	/* command */
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..3b801d279a 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1686,8 +1686,8 @@ ProcessUtilitySlow(ParseState *pstate,
 
 			case T_DropStmt:
 ExecDropStmt((DropStmt *) parsetree, isTopLevel);
-/* no commands stashed for DROP */
-commandCollected = true;
+/* Dropped object is not available */
+address =  InvalidObjectAddress;
 break;
 
 			case T_RenameStmt:


Re: pg_rewind docs correction

2020-03-09 Thread Michael Paquier
On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> I've added the information about how the backup label control file is
> written, and updated the How It Works steps to refer to that separately
> from restart.
> 
> Additionally the How It Works is updated to include WAL segments and new
> relation files in the list of files copied wholesale, since that was
> previously stated but somewhat contradicted there.

-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base 
backup, or
-   tools like rsync, is that 
pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied; all other files
+   are copied in full, including new relation files, configuration files, and 
WAL
+   segments. The advantage of pg_rewind over taking 
a

The first sentence you are adding refers to "subsequent WAL replay".
However, this paragraph is concerned with the state of the target
cluster after running pg_rewind but *before* making the target cluster
start recovery.  So shouldn't you just remove the part "and subsequent
WAL replay" from your first new sentence?

In the same paragraph, I think that you should remove the "While" from
"While only changed blocks", as the second part of the sentence refers
to the other files, WAL segments, etc.

The second paragraph of the docs regarding timeline lookup is
unchanged, which is fine.

-   When the target server is started for the first time after running
-   pg_rewind, it will go into recovery mode and 
replay all
-   WAL generated in the source server after the point of divergence.
+   After running pg_rewind the data directory is
+   not immediately in a consistent state. However
+   pg_rewind configures the control file so that 
when
+   the target server is started again it will enter recovery mode and replay 
all
+   WAL generated in the source server after the point of divergence.

The second part of the third paragraph is not changed, and the
modification you are doing here is about the control file.  I am
still unconvinced that this is a good change, because mentioning the
control file would be actually more adapted to the part "How it
works", where you are adding details about the backup_label file, and
already include details about the minimum consistency LSN itself
stored in the control file.

+   
+Because pg_rewind copies configuration files
+entirely from the source, correcting recovery configuration options before
+restarting the server is necessary if you intend to re-introduce the target
+as a replica of the source. If you restart the server after the rewind
+operation has finished but without configuring recovery, the target will
+again diverge from the primary.
+   

True that this is not outlined enough.

+  The relation files are now to their state at the last checkpoint 
completed
+  prior to the point at which the WAL timelines of the source and target
+  diverged plus the current state on the source of any blocks changed on 
the
+  target after that divergence.

"Relation files are now in a state equivalent to the moment of the
last completed checkpoint prior to the point.."?

-  pg_stat_tmp/, and
-  pg_subtrans/ are omitted from the data copied
-  from the source cluster. Any file or directory beginning with
-  pgsql_tmp is omitted, as well as are
+  pg_stat_tmp/, and pg_subtrans/
+  are omitted from the data copied from the source cluster. The files

This is just reorganizing an existing list, why?

+  Create a backup label file to begin WAL replay at the checkpoint created
+  at failover and  a minimum consistency LSN using
+  pg_current_wal_insert_lsn(), when using a live source
+  and the last checkpoint LSN, when using a stopped source.

Now would be the moment to mention the control file.

> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.

The last CF of Postgres 13 began at the beginning of February :(
--
Michael


signature.asc
Description: PGP signature


Re: Improve handling of parameter differences in physical replication

2020-03-09 Thread Masahiko Sawada
On Sat, 29 Feb 2020 at 06:39, Alvaro Herrera  wrote:
>
> On 2020-Feb-27, Peter Eisentraut wrote:
>
> > So this patch relaxes this a bit.  Upon receipt of
> > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> > warning and set a global flag if there is a problem.  Then when we
> > actually hit the resource issue and the flag was set, we issue another
> > warning message with relevant information.  Additionally, at that
> > point we pause recovery, so a hot standby remains usable.
>
> Hmm, so what is the actual end-user behavior?  As I read the code, we
> first send the WARNING, then pause recovery until the user resumes
> replication; at that point we raise the original error.

I think that after recovery is paused it would be better for users to
restart the server rather than resume the recovery. I agree with this idea
but I'm slightly concerned that users might not realize that recovery is
paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we can periodically send a WARNING to inform users that we're
still waiting for the parameter change and a restart.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improve handling of parameter differences in physical replication

2020-03-09 Thread Peter Eisentraut

On 2020-03-09 09:11, Masahiko Sawada wrote:

I think after recovery is paused users will be better to restart the
server rather than resume the recovery. I agree with this idea but I'm
slightly concerned that users might not realize that recovery is
paused until they look at that line in server log or at
pg_stat_replication because the standby server is still functional. So
I think we can periodically send WARNING to inform user that we're
still waiting for parameter change and restart.


I think that would be annoying, unless you create a system for 
configuring those periodic warnings.


I imagine in a case like having set max_prepared_transactions but never 
actually using prepared transactions, people will just ignore the 
warning until they have their next restart, so it could be months of 
periodic warnings.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Crash by targetted recovery

2020-03-09 Thread Kyotaro Horiguchi
At Mon, 9 Mar 2020 15:46:49 +0900, Fujii Masao  
wrote in 
> On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
> > At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao
> >> Regarding 0003 patch, I added a bit more detail comments into
> >> the patch so that we can understand the code more easily.
> >> Updated version of 0003 patch attached. Barring any objection,
> >> at first, I plan to commit this patch.
> > Looks good to me. Thanks for writing the detailed comments.
> 
> Thanks for the review! Pushed.

Thanks for committing.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-09 Thread Julien Rouhaud
On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote:
> Please consider PG13 shortest path ;o)
>
> My one is  parse->queryId != UINT64CONST(0) in pgss_planner_hook().
> It fixes IVM problem (verified),
> and keep CTAS equal to pgss without planning counters (verified too).

I still disagree that hiding this problem is the right fix, but since no one
objected here's a v5 with that behavior.  Hopefully this will be fixed in v14.
>From 1e9e7d7223482bb1976292176da7fa01deb84e93 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 28 Mar 2019 13:33:23 +0100
Subject: [PATCH v5 1/3] Pass query string to the planner

---
 src/backend/commands/copy.c  |  3 ++-
 src/backend/commands/createas.c  |  3 ++-
 src/backend/commands/explain.c   |  2 +-
 src/backend/commands/extension.c |  2 +-
 src/backend/commands/matview.c   |  2 +-
 src/backend/commands/portalcmds.c|  2 +-
 src/backend/executor/functions.c |  1 +
 src/backend/optimizer/plan/planner.c | 10 ++
 src/backend/tcop/postgres.c  | 13 -
 src/backend/utils/cache/plancache.c  |  3 ++-
 src/include/optimizer/optimizer.h|  3 ++-
 src/include/optimizer/planner.h  |  4 +++-
 src/include/tcop/tcopprot.h  |  6 --
 13 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..62d7635a34 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1579,7 +1579,8 @@ BeginCopy(ParseState *pstate,
}
 
/* plan the query */
-   plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);
+   plan = pg_plan_query(query, pstate->p_sourcetext,
+
CURSOR_OPT_PARALLEL_OK, NULL);
 
/*
 * With row level security and a user using "COPY relation TO", 
we
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3a5676fb39..9febdc5165 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,7 +329,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt 
*stmt,
Assert(query->commandType == CMD_SELECT);
 
/* plan the query */
-   plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+   plan = pg_plan_query(query, pstate->p_sourcetext,
+
CURSOR_OPT_PARALLEL_OK, params);
 
/*
 * Use a snapshot with an updated command ID to ensure this 
query sees
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50..fb772aa5cd 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -375,7 +375,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
INSTR_TIME_SET_CURRENT(planstart);
 
/* plan the query */
-   plan = pg_plan_query(query, cursorOptions, params);
+   plan = pg_plan_query(query, queryString, cursorOptions, params);
 
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 00cf4ef268..38cbea385a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -751,7 +751,7 @@ execute_sql_string(const char *sql)

   NULL,

   0,

   NULL);
-   stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, 
NULL);
+   stmt_list = pg_plan_queries(stmt_list, sql, 
CURSOR_OPT_PARALLEL_OK, NULL);
 
foreach(lc2, stmt_list)
{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c3954f3e24..e5a5eef102 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -391,7 +391,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
CHECK_FOR_INTERRUPTS();
 
/* Plan the query which will generate data for the refresh. */
-   plan = pg_plan_query(query, 0, NULL);
+   plan = pg_plan_query(query, queryString, 0, NULL);
 
/*
 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/portalcmds.c 
b/src/backend/commands/portalcmds.c
index 40be5069fe..6a2c233615 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -90,7 +90,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt 
*cstmt, ParamListInfo pa
elog(ERROR, "non-SELECT statement in DECLARE CURSOR");
 
/* Plan the query, applying 

Re: Remove win32ver.rc from version_stamp.pl

2020-03-09 Thread Peter Eisentraut

On 2020-03-01 23:51, Tom Lane wrote:

In general, while I'm on board with the idea, I wonder whether it
wouldn't be smarter to keep on defining PG_MAJORVERSION as a string,
and just add PG_MAJORVERSION_NUM alongside of it.  This would
eliminate some hunks from the patch in places where it's more
convenient to have the version as a string, and it would avoid
what could otherwise be a pretty painful cross-version incompatibility
for extensions.  We already provide PG_VERSION in both forms, so
I don't see any inconsistency in doing likewise for PG_MAJORVERSION.


Agreed.  Here is another patch.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0770c6ef67b78208791e463f4453175e2d152750 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Mar 2020 08:42:02 +0100
Subject: [PATCH v2] Remove win32ver.rc from version_stamp.pl

This removes another relic from the old nmake-based Windows build.
version_stamp.pl put version number information into win32ver.rc.  But
win32ver.rc already gets other version number information from the
preprocessor at build time, so it would make more sense if all version
number information would be handled in the same way and we don't have
two places that do it.

What we need for this is having the major version number and the minor
version number as separate integer symbols.  Both configure and
Solution.pm already have that logic, because they compute
PG_VERSION_NUM.  So we just keep all the logic there now.  Put the
minor version number into a new symbol PG_MINORVERSION_NUM.  Also, add
a symbol PG_MAJORVERSION_NUM, which is a number, alongside the
existing PG_MAJORVERSION, which is a string.

Discussion: 
https://www.postgresql.org/message-id/flat/1ee46ac4-a9b2-4531-bf54-5ec2e3746...@2ndquadrant.com
---
 configure  | 15 +--
 configure.in   |  7 +--
 src/include/pg_config.h.in |  6 ++
 src/port/win32ver.rc   |  4 ++--
 src/tools/msvc/Solution.pm | 16 +---
 src/tools/version_stamp.pl | 14 +-
 6 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/configure b/configure
index 45dbeb4d19..d6d3f26d03 100755
--- a/configure
+++ b/configure
@@ -2805,6 +2805,8 @@ _ACEOF
 
 
 PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\)'`
+PG_MINORVERSION=`expr "$PACKAGE_VERSION" : '.*\.\([0-9][0-9]*\)'`
+test -n "$PG_MINORVERSION" || PG_MINORVERSION=0
 
 
 cat >>confdefs.h <<_ACEOF
@@ -2812,6 +2814,16 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+cat >>confdefs.h <<_ACEOF
+#define PG_MAJORVERSION_NUM $PG_MAJORVERSION
+_ACEOF
+
+
+cat >>confdefs.h <<_ACEOF
+#define PG_MINORVERSION_NUM $PG_MINORVERSION
+_ACEOF
+
+
 
 
 
@@ -18875,8 +18887,7 @@ _ACEOF
 
 # Supply a numeric version string for use by 3rd party add-ons
 # awk -F is a regex on some platforms, and not on others, so make "." a tab
-PG_VERSION_NUM="`echo "$PACKAGE_VERSION" | sed 's/[A-Za-z].*$//' |
-tr '.' '   ' |
+PG_VERSION_NUM="`echo "$PG_MAJORVERSION$PG_MINORVERSION" |
 $AWK '{printf "%d%04d", $1, $2}'`"
 
 cat >>confdefs.h <<_ACEOF
diff --git a/configure.in b/configure.in
index 22f096a5ac..78902fb60d 100644
--- a/configure.in
+++ b/configure.in
@@ -30,8 +30,12 @@ AC_PREFIX_DEFAULT(/usr/local/pgsql)
 AC_DEFINE_UNQUOTED(CONFIGURE_ARGS, ["$ac_configure_args"], [Saved arguments 
from configure])
 
 [PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\)'`]
+[PG_MINORVERSION=`expr "$PACKAGE_VERSION" : '.*\.\([0-9][0-9]*\)'`]
+test -n "$PG_MINORVERSION" || PG_MINORVERSION=0
 AC_SUBST(PG_MAJORVERSION)
 AC_DEFINE_UNQUOTED(PG_MAJORVERSION, "$PG_MAJORVERSION", [PostgreSQL major 
version as a string])
+AC_DEFINE_UNQUOTED(PG_MAJORVERSION_NUM, $PG_MAJORVERSION, [PostgreSQL major 
version number])
+AC_DEFINE_UNQUOTED(PG_MINORVERSION_NUM, $PG_MINORVERSION, [PostgreSQL minor 
version number])
 
 PGAC_ARG_REQ(with, extra-version, [STRING], [append STRING to version],
  [PG_VERSION="$PACKAGE_VERSION$withval"],
@@ -2318,8 +2322,7 @@ AC_DEFINE_UNQUOTED(PG_VERSION_STR,
 
 # Supply a numeric version string for use by 3rd party add-ons
 # awk -F is a regex on some platforms, and not on others, so make "." a tab
-[PG_VERSION_NUM="`echo "$PACKAGE_VERSION" | sed 's/[A-Za-z].*$//' |
-tr '.' '   ' |
+[PG_VERSION_NUM="`echo "$PG_MAJORVERSION   $PG_MINORVERSION" |
 $AWK '{printf "%d%04d", $1, $2}'`"]
 AC_DEFINE_UNQUOTED(PG_VERSION_NUM, $PG_VERSION_NUM, [PostgreSQL version as a 
number])
 AC_SUBST(PG_VERSION_NUM)
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d758dfd36e..41ad209380 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -780,6 +780,12 @@
 /* PostgreSQL major version as a string */
 #undef PG_MAJORVERSION
 
+/* PostgreSQL major version number */
+#undef PG_MAJORVERSION_NUM
+
+/* PostgreSQL minor version number */
+#undef PG_MINORVERSION_NUM
+
 /* Define to best printf format archetype, 

Re: ccache is required by postgresql12-devel RPM

2020-03-09 Thread Kyotaro Horiguchi
At Sat, 07 Mar 2020 13:41:58 -0500, Tom Lane  wrote in 
> Bruce Momjian  writes:
> > On Thu, Jan 30, 2020 at 08:36:17PM +0900, Kyotaro Horiguchi wrote:
> >> I notieced that the official PG12-devel RPM pacakge for RHEL8 mandates
> >> ccache being installed on building of an extension.
> >> 
> >> $ grep ccache /usr/pgsql-12/lib/pgxs/src/Makefile.global 
> >> CLANG = /usr/lib64/ccache/clang
> >> # ccache loses .gcno files
> >> 
> >> However it can be changed by explicitly setting CLANG, it seems that
> >> that setting is by accident since gcc is not ccachi'fied. Anyway, I'm
> >> not sure but, I think that that decision should be left with extension
> >> developers. Is it intentional?
> 
> > That certainly seems wrong.  Is it this URL?

Maybe, and 10 and 11 have the same configuration. For clarity, my
objective is just to know whether it is deliberate or not.

> I can't get too excited about this.  ccache has been part of the standard
> development environment on Red Hat systems for at least a decade, and thus
> /usr/lib64/ccache has been in the standard PATH setting for just as long.
> Devrim would have to go out of his way to force CLANG to not show up that
> way in the configure result, and I don't see that it would be an
> improvement for any real-world usage of the RPM results.

I don't remember how I configured the development environments on
CentOS 7 and 8, but neither of them initially had ccache.  And I'm not
the only one who stumbled on the CLANG setting. But we might be a
small minority..

> The core reason why it shows up that way is that llvm.m4 uses
> PGAC_PATH_PROGS to set CLANG, while we don't forcibly path-ify
> the CC setting.  But arguably the latter is a bug, not the former.
> I recall that there's been some discussion about how it'd be safer
> if configure made sure that all the tool names it records are fully
> path-ified.

Although I'm not sure we should path-ify tool chains for building
extensions, if the -devel package doesn't work without ccache, the
package needs to depend on the ccache package. But such a policy creates
many bothersome dependencies on every tool.  I'm not sure how to deal
with that...

So, I'm satisfied to know it is the intended behavior.

Thank you for the explanation, Tom.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove win32ver.rc from version_stamp.pl

2020-03-09 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-03-01 23:51, Tom Lane wrote:
>> In general, while I'm on board with the idea, I wonder whether it
>> wouldn't be smarter to keep on defining PG_MAJORVERSION as a string,
>> and just add PG_MAJORVERSION_NUM alongside of it.

> Agreed.  Here is another patch.

This version LGTM.  (I can't actually test the Windows aspects
of this, but I assume you did.)

I'm wondering a little bit whether it'd be worth back-patching the
additions of the new #defines.  That would cut about five years off
the time till they could be relied on by extensions.  However,
I'm not sure anyone is eager to rely on them, so it may not be
worth the effort.
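
FWIW, the sort of extension code that could rely on them is as simple as
(illustrative only):

#include "pg_config.h"

#if defined(PG_MAJORVERSION_NUM) && PG_MAJORVERSION_NUM >= 13
	/* use an interface that only exists in v13 and later */
#else
	/* fall back to the pre-13 code path */
#endif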

regards, tom lane




Re: backend type in log_line_prefix?

2020-03-09 Thread Kuntal Ghosh
Hello,

On Sat, Mar 7, 2020 at 8:38 PM Peter Eisentraut
 wrote:
>
> Updated patch set because of conflicts.
>
Thank you for the patch. This feature is really helpful. Here are some
minor comments:

In v3-0001-Refactor-ps_status.c-API.patch,

- *
- * For a walsender, the ps display is set in the following form:
- *
- * postgres: walsender   
This part is still valid, right?

+ init_ps_display(ps_data.data);
+ pfree(ps_data.data);
+
+ set_ps_display("initializing");
As per the existing behaviour, if update_process_title is true, we
display "authentication" as the initial string. On the other hand,
this patch, irrespective of the GUC variable, always displays
"initializing" as the initial string, and in PerformAuthentication it
sets the display to "authentication". Is this intended? Should we check
the GUC here as well?

In v3-0002-Unify-several-ways-to-tracking-backend-type.patch,
+ * If fixed_part is NULL, a default will be obtained from BackendType.
s/BackendType/MyBackendType?

In v3-0003-Add-backend-type-to-csvlog-and-optionally-log_lin.patch,
+ Backend process type
In other places you've written "backend type".

In v3-0004-Remove-am_syslogger-global-variable.patch,
+ * This is exported so that elog.c can call it when BackendType is B_LOGGER.
s/BackendType/MyBackendType?

Done some basic testing. Working as expected.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Bug in pg_restore with EventTrigger in parallel mode

2020-03-09 Thread Tom Lane
=?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?=  writes:
> On Sat, Mar 7, 2020 at 8:42 PM Tom Lane  wrote:
>> However, I think that the existing code is correct to restore event
>> triggers before matview refreshes, not after as this patch would have us
>> do.  The basic idea for matview refresh is that it should happen in the
>> normal running state of the database.  If an event trigger interferes with
>> that, it would've done so in normal running as well.

> I'm not totally sure if it's entirely correct.

> For example if I write an EventTrigger to perform some kind of DDL auditing
> then during the restore the "refresh maview" operation will be audited and
> IMHO it's wrong.

The big problem I've got with this line of reasoning is that not
everything can be the last restore step.  There was already an argument
that matviews should be refreshed last so they can see the final state
of the catalogs, in case you have a matview over some catalog (and of
course that applies to pg_event_trigger as much as any other catalog).
Admittedly, that seems like an unlikely use-case, but it demonstrates
that there are limits to how much we can guarantee about dump/restore
producing just the same state that prevailed before the dump.

In the case of event triggers, the obvious counterexample is that if
you restore ET A and then ET B, ET A might interfere with the attempt
to restore ET B.  (And we have no way to know whether restoring B
before A would be better or worse.)

So on the whole I find "restore matviews as if they'd been refreshed
after the restore" to be a more trustworthy approach than the other
way.  At some level we have to trust that ETs aren't going to totally
bollix the restore.

Which, TBH, makes me wonder about the validity of the original complaint
in this thread.  I don't mind delaying ET restore as long as we feasibly
can; but if you have an ET that is going to misbehave during restore,
you are in for pain, and it's hard to consider that that pain isn't
self-inflicted.

regards, tom lane




Re: [Proposal] Global temporary tables

2020-03-09 Thread tushar

On 3/6/20 12:35 PM, 曾文旌(义从) wrote:

Fixed in global_temporary_table_v17-pg13.patch


I observed that we do support the 'global temp' keyword with views:

postgres=# create or replace  global temp view v1 as select 5;
CREATE VIEW

but if we take the dump (using pg_dumpall) then it only displays 'create
view'.


Looks like we are skipping it?

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-03-09 Thread tushar

On 3/6/20 12:35 PM, 曾文旌(义从) wrote:

Fixed in global_temporary_table_v17-pg13.patch


Thanks Wenjing.

Please refer to this scenario, where I am able to set
'on_commit_delete_rows=true' on a regular table using the 'alter' syntax,
which is not allowed using the 'create' syntax.


--Expected -

postgres=# CREATE TABLE foo () WITH (on_commit_delete_rows='true');
ERROR:  The parameter on_commit_delete_rows is exclusive to the global 
temp table, which cannot be specified by a regular table


--But user can do this with 'alter' command -
postgres=# create table foo();
CREATE TABLE
postgres=# alter table foo set (on_commit_delete_rows='true');
ALTER TABLE
postgres=#

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: Improve handling of parameter differences in physical replication

2020-03-09 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
 wrote:
>
> On 2020-03-09 09:11, Masahiko Sawada wrote:
> > I think after recovery is paused users will be better to restart the
> > server rather than resume the recovery. I agree with this idea but I'm
> > slightly concerned that users might not realize that recovery is
> > paused until they look at that line in server log or at
> > pg_stat_replication because the standby server is still functional. So
> > I think we can periodically send WARNING to inform user that we're
> > still waiting for parameter change and restart.
>
> I think that would be annoying, unless you create a system for
> configuring those periodic warnings.
>
> I imagine in a case like having set max_prepared_transactions but never
> actually using prepared transactions, people will just ignore the
> warning until they have their next restart, so it could be months of
> periodic warnings.

Well, I meant to periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and stopping
recovery. In this situation the user needs to notice that as soon as
possible.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Bug in pg_restore with EventTrigger in parallel mode

2020-03-09 Thread Fabrízio de Royes Mello
On Sat, Mar 7, 2020 at 8:42 PM Tom Lane  wrote:
>
> vignesh C  writes:
> > I'm not sure if we can add a test for this, can you have a thought
> > about this to check if we can add a test.
>
> Yeah, I'm not quite sure if a test is worth the trouble or not.
>
> We clearly do need to restore event triggers later than we do now, even
> without considering parallel restore: they should not be able to prevent
> us from executing other restore actions.  This is just like the rule that
> we don't restore DML triggers until after we've loaded data.
>

Ok.


> However, I think that the existing code is correct to restore event
> triggers before matview refreshes, not after as this patch would have us
> do.  The basic idea for matview refresh is that it should happen in the
> normal running state of the database.  If an event trigger interferes with
> that, it would've done so in normal running as well.
>

I'm not totally sure if it's entirely correct.

For example if I write an EventTrigger to perform some kind of DDL auditing
then during the restore the "refresh matview" operation will be audited and
IMHO it's wrong.
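
For concreteness, a minimal sketch of the kind of auditing event trigger I
mean (all names here are hypothetical, just to illustrate the concern;
during a restore this would also capture the matview refreshes):

CREATE TABLE ddl_audit (logged_at timestamptz DEFAULT now(), command_tag text);

CREATE FUNCTION audit_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO ddl_audit (command_tag)
        SELECT command_tag FROM pg_event_trigger_ddl_commands();
END;
$$;

CREATE EVENT TRIGGER audit_ddl_trg ON ddl_command_end
    EXECUTE FUNCTION audit_ddl();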


> I'm also not terribly on board with loading more functionality onto the
> RestorePass mechanism.  That's a crock that should go away someday,
> because it basically duplicates and overrides pg_dump's normal object
> sorting mechanism.  So we don't want it doing more than it absolutely
> has to.  But in this case, I don't see any reason why we can't just
> restore event triggers and matviews in the same post-ACL restore pass.

Totally agree with it.


> In a serial restore, that will make the event triggers come first
> because of the existing sort rules.  In a parallel restore, it's possible
> that they'd be intermixed, but that doesn't bother me.  Again, if your
> event triggers have side-effects on your matview refreshes, you're
> going to have some issues anyway.
>

IMHO EventTriggers can't be fired during pg_restore under any circumstances
because can lead us to a different database state than the dump used.


> So that leads me to the attached, which renames the "RESTORE_PASS_REFRESH"
> symbol for clarity, and updates the pg_dump_sort.c code and comments
> to match what's really going on.
>

Ok.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: pg_rewind docs correction

2020-03-09 Thread James Coleman
On Mon, Mar 9, 2020 at 2:59 AM Michael Paquier  wrote:

> On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> > I've added the information about how the backup label control file is
> > written, and updated the How It Works steps to refer to that separately
> > from restart.
> >
> > Additionally the How It Works is updated to include WAL segments and new
> > relation files in the list of files copied wholesale, since that was
> > previously stated but somewhat contradicted there.
>
> -   The result is equivalent to replacing the target data directory with
> the
> -   source one. Only changed blocks from relation files are copied;
> -   all other files are copied in full, including configuration files. The
> -   advantage of pg_rewind over taking a new
> base backup, or
> -   tools like rsync, is that
> pg_rewind does
> -   not require reading through unchanged blocks in the cluster. This makes
> -   it a lot faster when the database is large and only a small
> -   fraction of blocks differ between the clusters.
> +   After a successful rewind and subsequent WAL replay, the target data
> +   directory is equivalent to a base backup of the source data directory.
> While
> +   only changed blocks from existing relation files are copied; all other
> files
> +   are copied in full, including new relation files, configuration files,
> and WAL
> +   segments. The advantage of pg_rewind over
> taking a
>
> The first sentence you are adding refers to "subsequent WAL replay".
> However, this paragraph emphasizes the state of the target
> cluster after running pg_rewind but *before* making the target cluster
> start recovery.  So shouldn't you just remove the part "and subsequent
> WAL replay" from your first new sentence?
>

I'd originally typed this:
I'm not sure I follow. After pg_rewind but before replay the directory is
*not* equivalent to a base backup. I don't see how the paragraph is clearly
limited to describing what pg_rewind does. While the 2nd sentence is about
pg_rewind steps specifically, the paragraph (even in the original) goes on
to compare it to a base backup, so we're talking about the operation in
totality, not just the one tool.

But I realized while typing it that I was probably missing something of
what you were getting at: is the hangup on calling out the WAL replay that
a base backup (or rsync even) *also* requires WAL replay to reach a
consistent state? I hadn't thought of that while writing this initially, so
I've updated the patch to eliminate that part but also to make the analogy
to base backups more direct, since it's helpful in understanding what
result the tool is trying to accomplish and how it differs.

> In the same paragraph, I think that you should remove the "While" from
> "While only changed blocks", as the second part of the sentence refers
> to the other files, WAL segments, etc.
>

Fixed as part of the above.


> The second paragraph of the docs regarding timeline lookup is
> unchanged, which is fine.
>
> -   When the target server is started for the first time after running
> -   pg_rewind, it will go into recovery mode
> and replay all
> -   WAL generated in the source server after the point of divergence.
> +   After running pg_rewind the data directory
> is
> +   not immediately in a consistent state. However
> +   pg_rewind configures the control file so
> that when
> +   the target server is started again it will enter recovery mode and
> replay all
> +   WAL generated in the source server after the point of divergence.
>
> The second part of the third paragraph is not changed, and the
> modification you are doing here is about the control file.  I am
> still unconvinced that this is a good change, because mentioning the
> control file would be actually more adapted to the part "How it
> works", where you are adding details about the backup_label file, and
> already include details about the minimum consistency LSN itself
> stored in the control file.
>

I've removed the control file reference and instead continued the analogy
to base backups.


> +   
> +Because pg_rewind copies configuration
> files
> +entirely from the source, correcting recovery configuration options
> before
> +restarting the server is necessary if you intend to re-introduce the
> target
> +as a replica of the source. If you restart the server after the rewind
> +operation has finished but without configuring recovery, the target
> will
> +again diverge from the primary.
> +   
>
> True that this is not outlined enough.
>

Thanks.


> +  The relation files are now to their state at the last checkpoint
> completed
> +  prior to the point at which the WAL timelines of the source and
> target
> +  diverged plus the current state on the source of any blocks changed
> on the
> +  target after that divergence.
>
> "Relation files are now in a state equivalent to the moment of the
> last completed checkpoint prior to the point.."?
>

Updated.


> -  

Re: [Proposal] Global temporary tables

2020-03-09 Thread Prabhat Sahu
Hi All,

Kindly check the below scenario.

*Case 1: *
postgres=# CREATE GLOBAL TEMPORARY TABLE gtt1(c1 int) on commit delete rows;
CREATE TABLE
postgres=# CREATE GLOBAL TEMPORARY TABLE gtt2(c1 int) on commit preserve
rows;
CREATE TABLE
postgres=# vacuum gtt1;
VACUUM
postgres=# vacuum gtt2;
VACUUM
postgres=# vacuum;
VACUUM
postgres=# \q

*Case 2: Exit and reconnect to psql prompt.*
[edb@localhost bin]$ ./psql  postgres
psql (13devel)
Type "help" for help.

postgres=# vacuum gtt1;
WARNING:  skipping vacuum empty global temp table "gtt1"
VACUUM
postgres=# vacuum gtt2;
WARNING:  skipping vacuum empty global temp table "gtt2"
VACUUM
postgres=# vacuum;
WARNING:  skipping vacuum empty global temp table "gtt1"
WARNING:  skipping vacuum empty global temp table "gtt2"
VACUUM

Although in Case 1 the gtt1/gtt2 tables are empty, we are not getting the
"WARNING: skipping vacuum empty global temp table" message for VACUUM in
Case 1, whereas we are getting that WARNING for VACUUM in Case 2.


On Fri, Mar 6, 2020 at 12:41 PM 曾文旌(义从)  wrote:

>
>
> > On March 5, 2020 at 10:38 PM, Robert Haas  wrote:
> >
> > On Thu, Mar 5, 2020 at 9:19 AM tushar 
> wrote:
> >> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash
> when forget
> >> ERROR:  out of shared memory
> >> HINT:  You might need to increase max_active_gtt.
> >>
> >> also , would be great  if we can make this error message  user friendly
> like  - "max connection reached"  rather than memory error
> >
> > That would be nice, but the bigger problem is that the WARNING there
> > looks totally unacceptable. It's looks like it's complaining of some
> > internal issue (i.e. a bug or corruption) and the grammar is poor,
> > too.
>
> Yes, WARNING should not exist.
> This is a bug in the rollback process and I have fixed it in
> global_temporary_table_v17-pg13.patch
>
>
> Wenjing
>
>
> >
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
>
>
>
>

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Conflict handling for COPY FROM

2020-03-09 Thread Surafel Temesgen
On Fri, Mar 6, 2020 at 11:30 AM asaba.takan...@fujitsu.com <
asaba.takan...@fujitsu.com> wrote:

> Hello Surafel,
>
> Sorry for my late reply.
>
> From: Surafel Temesgen 
> >On Thu, Dec 12, 2019 at 7:51 AM asaba.takan...@fujitsu.com wrote:
> >>2. I have a question about copy meta-command.
> >>When I executed copy meta-command, output wasn't displayed.
> >>Does it correspond to copy meta-command?
> >
> >Fixed
> Thank you.
>
> I think we need regression test that constraint violating row is returned
> back to the caller.
> How about this?
>
>
Okay, attached is a rebased patch with it.
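
For reference, a quick usage sketch of the new option (table and file names
are just examples; the option spelling follows the docs hunk in the patch):

-- ignore up to 10 problem rows; beyond that the COPY errors out
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, ERROR_LIMIT 10);

-- ignore any number of problem rows
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, ERROR_LIMIT -1);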

regards
Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..845902b824 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
 ENCODING 'encoding_name'
+ERROR_LIMIT limit_number
 
  
 
@@ -355,6 +356,28 @@ COPY { table_name [ ( 

 
+   
+ERROR_LIMIT
+
+ 
+  Enables ignoring of errored out rows up to limit_number. If limit_number is set
+  to -1, then all errors will be ignored.
+ 
+
+ 
+  Currently, only unique or exclusion constraint violation
+  and rows formatting errors are ignored. Malformed
+  rows will raise warnings, while constraint violating rows
+  will be returned back to the caller unless any error is raised;
+  i.e. if any error is raised because error_limit is exceeded, no rows
+  will be returned back to the caller.
+ 
+
+
+   
+

 WHERE
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..4184e2e755 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -20,6 +20,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/printtup.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -48,6 +49,8 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
+#include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+	 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 			   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 defel->defname),
 		 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	

Re: Run-time pruning for ModifyTable

2020-03-09 Thread David Rowley
Thanks for having a look at this, Amit.

On Fri, 24 Jan 2020 at 21:57, Amit Langote  wrote:
>
> On Thu, Jan 23, 2020 at 4:31 PM Amit Langote  wrote:
> Part of the optimizer patch that looks a bit complex is the changes to
> inheritance_planner() which is to be expected, because that function
> is a complex beast itself.  I have suggestions to modify some comments
> around the code added/modified by the patch for clarity; attaching a
> delta patch for that.

I've made another pass over the patch and made various changes, the
biggest of which was the required modifications to nodeModifyTable.c
so that it can now prune all partitions. Append and MergeAppend were
modified to allow this in 5935917ce59 (Thanks for pushing that Tom).
I've also slightly simplified the code in ExecModifyTable() and added
slightly more code to ExecInitModifyTable(). We now only set
mt_whichplan to WHICHPLAN_CHOOSE_SUBPLAN when run-time pruning is
enabled and do_exec_prune is true. I also made it so when all
partitions are pruned that we set mt_whichplan to
WHICHPLAN_CHOOSE_SUBPLAN as this saves an additional run-time check
during execution.

Over in inheritance_planner(), I noticed that the RT index of the
SELECT query and the UPDATE/DELETE query can differ. There was some
code that performed translations. I changed that code slightly so that
it's a bit more optimal.  It was building two lists, one for the old
RT index and one for the new. It added elements to these lists
regardless of whether the RT indexes were the same or not. I've now changed
that to only add to the list if they differ, which I feel should never
be slower and most likely always faster.   I'm also now building a
translation map between the old and new RT indexes, however, I only
found one test in the regression tests which require any sort of
translation of these RT indexes.  This was with an inheritance table,
so I need to do a bit more work to find a case where this happens with
a partitioned table to ensure all this works.

> The executor patch looks pretty benign too.  Diffs that looked a bit
> suspicious at first are due to replacing
> ModifyTableState.resultRelInfo that is a pointer into
> EState.es_result_relations array by an array of ResultRelInfo
> pointers, but doing that seems to make the relevant code easier to
> follow, especially if you consider the changes that the patch makes to
> that code.

Yeah, that's because the ModifyTableState's resultRelInfo field was
just a pointer to the estate->es_result_relations array offset by the
ModifyTable's resultRelIndex.  This was fine previously because we
always initialised the plans for each ResultRelInfo.  However, now
that we might be pruning some of those, that array can't be used as
it'll still contain ResultRelInfos for relations we're not going to
touch. Changing this to an array of pointers allows us to point to the
elements in estate->es_result_relations that we're going to use.  I
also renamed the field just to ensure nothing can compile (thinking of
extensions here) that's not got updated code.

Tom, I'm wondering if you wouldn't mind looking over my changes to
inheritance_planner()?

David


modifytable_runtime_prune_2020-03-10.patch
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-03-09 Thread David Steele

Hi Sergei,

On 1/30/20 12:00 PM, Fujii Masao wrote:


- check for standby triggers only for recovery_target_action - I am 
not sure this would be safe for pg_wal_replay_pause() call case


Agreed. Basically I think that recoveryPausesHere() should check the promotion
trigger whether the recovery target is reached or not. But one question is:
how should the recovery behave if recovery target is reached with
recovery_target_action=pause after the promotion is requested?
It should pause? Or promote?


Do you have thoughts on Fujii's comments?

Regards,
--
-David
da...@pgmasters.net




Re: POC: rational number type (fractions)

2020-03-09 Thread David Steele

Hi Joe,

On 2/7/20 11:25 PM, Joe Nelson wrote:

Hi hackers, attached is a proof of concept patch adding a new base type
called "rational" to represent fractions.


I have set the target version for this patch to PG14 because it is POC 
and this is the first CF it has appeared in.


Regards,
--
-David
da...@pgmasters.net




Re: logical copy_replication_slot issues

2020-03-09 Thread Arseny Sher


Masahiko Sawada  writes:

> /*
> -* Create logical decoding context, to build the initial snapshot.
> +* Create logical decoding context to find start point or, if we don't
> +* need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
>  */
>
> Do we need to numbering that despite not referring them?

No, it just seemed clearer to me this way. I don't mind removing the
numbers if you feel this is better.

> ctx = CreateInitDecodingContext(plugin, NIL,
> -   false,  /* do not build snapshot */
> +   false,  /* do not build data snapshot */
> restart_lsn,
> logical_read_local_xlog_page, NULL, NULL,
> NULL);
> I'm not sure this change makes the comment better. Could you elaborate
> on the motivation of this change?

Well, DecodingContextFindStartpoint always builds a snapshot allowing
historical *catalog* lookups. This bool controls whether the snapshot
should additionally be suitable for looking at the actual data; this is
used, e.g., by the initial data sync in native logical replication.


-- cheers, arseny




Re: bad logging around broken restore_command

2020-03-09 Thread David Steele

Hi Jeff,

On 2/6/20 9:23 AM, Fujii Masao wrote:



I've attached a patch which emits a LOG message for ENOENT.


Isn't it better to use "could not stat file" message even in ENOENT case?
If yes, something like message that you used in the patch should be
logged as DETAIL or HINT message. So, what about the attached patch?


What do you think of Fujii's changes?

Regards,
--
-David
da...@pgmasters.net




New feature request: Query Rewrite Cache

2020-03-09 Thread Bert Scalzo
MySQL has a really useful feature they call the query rewrite cache. The
optimizer checks incoming queries to see if a known better rewrite has been
placed within the query rewrite cache table. If one is found, the rewrite
replaces the incoming query before sending it to the execution engine. This
capability allows one to fix poorly performing queries in 3rd party
application code that cannot be modified.

For example, suppose a 3rd party application contains the following
inefficient query: SELECT COUNT(*) FROM table WHERE SUBSTRING(column,1,3) =
'ABC'. One can place the following rewrite in the query rewrite cache:
SELECT COUNT(*) FROM table WHERE column LIKE 'ABC%'. The original query
cannot use an index while the rewrite can. Since it's a 3rd party
application, there is really no other way to make such an improvement.

The existing rewrite rules in PostgreSQL are too narrowly defined to permit
such a substitution, as the incoming query could involve many tables, so
what's needed is a general "if input SQL string matches X then replace it
with Y". This check could be placed at the beginning of the parser.c code. I
suggest that, for efficiency, the matching code should first check the
string lengths and hash values before comparing entire strings.
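
For comparison, here is roughly how the example above is registered with
MySQL's rewriter (the query_rewrite schema, rewrite_rules table and
flush_rewrite_rules() procedure names are from MySQL's documentation as I
recall them; take this as an illustrative sketch, not an exact recipe):

INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
VALUES ('SELECT COUNT(*) FROM table WHERE SUBSTRING(column,1,3) = ''ABC''',
        'SELECT COUNT(*) FROM table WHERE column LIKE ''ABC%''');
CALL query_rewrite.flush_rewrite_rules();

A PostgreSQL equivalent would only need a comparable rule table plus the
string-match check described above.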


Re: pgsql: pageinspect: Fix types used for bt_metap() columns.

2020-03-09 Thread Alvaro Herrera
On 2020-Mar-08, Peter Geoghegan wrote:

> Fix these issues by changing the declaration of bt_metap() to
> consistently use data types that can reliably represent all possible
> values.  This fixes things on HEAD only.  No backpatch, since it doesn't
> seem like there is a safe way to fix the issue without including a new
> version of the pageinspect extension (HEAD/Postgres 13 already
> introduced a new version of the extension).  Besides, the oldest_xact
> issue has been around since the release of Postgres 11, and we haven't
> heard any complaints about it before now.

This may be a good time to think through about how to implement a
version history for an extension that enables users to go from pg12's
current 1.7 pageinspect to a new fixed version in pg12, say 1.7.1, and
in HEAD provide an upgrade path from both 1.7 and 1.7.1 to master's 1.8.
Then you can pg_upgrade from pg12 to pg13 having either 1.7 or 1.7.1,
and you will be able to get to 1.8 nonetheless.

Does that make sense?

The current problem might not be serious enough to warrant actually
writing the code that would be needed, but I propose to at least think
about it.
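
For illustration, the upgrade-path scripts such a scheme would roughly
require (version numbers and file names are hypothetical, following the
usual extension-script naming):

    pageinspect--1.7--1.7.1.sql    -- fix released into the pg12 branch
    pageinspect--1.7--1.8.sql      -- existing path on HEAD
    pageinspect--1.7.1--1.8.sql    -- new path, so a pg_upgrade'd cluster
                                   -- still on 1.7.1 can reach 1.8

    -- with those in place, either starting point upgrades the same way:
    ALTER EXTENSION pageinspect UPDATE TO '1.8';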

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: recovery_target_action=pause with confusing hint

2020-03-09 Thread Fujii Masao




On 2020/03/09 21:30, David Steele wrote:

Hi Sergei,

On 1/30/20 12:00 PM, Fujii Masao wrote:



- check for standby triggers only for recovery_target_action - I am not sure 
this would be safe for pg_wal_replay_pause() call case


Agreed. Basically I think that recoveryPausesHere() should check the promotion
trigger whether the recovery target is reached or not. But one question is:
how should the recovery behave if recovery target is reached with
recovery_target_action=pause after the promotion is requested?
It should pause? Or promote?


Do you have thoughts on Fujii's comments?


Thanks for the ping! And sorry for not reporting the current status.

I started the discussion about the topic very related to this.
I'm thinking to apply the change that Sergei proposes after applying
the patch I attached in this thread.
https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048e...@oss.nttdata.com

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: [Proposal] Global temporary tables

2020-03-09 Thread 曾文旌(义从)



> On March 9, 2020 at 10:37 PM, tushar  wrote:
> 
> On 3/6/20 12:35 PM, 曾文旌(义从) wrote:
>> Fixed in global_temporary_table_v17-pg13.patch
> 
> I observed that , we do support 'global temp' keyword with views
> 
> postgres=# create or replace  global temp view v1 as select 5;
> CREATE VIEW
I think we should not support global temp view.
Fixed in global_temporary_table_v18-pg13.patch.



Wenjing


> 
> but if we take the dump (using pg_dumpall) then it only displays 'create view'
> 
> Looks like we are skipping it?
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company





Re: Some problems of recovery conflict wait events

2020-03-09 Thread Fujii Masao




On 2020/03/09 14:10, Masahiko Sawada wrote:

On Mon, 9 Mar 2020 at 13:24, Fujii Masao  wrote:




On 2020/03/08 13:52, Masahiko Sawada wrote:

On Thu, 5 Mar 2020 at 20:16, Fujii Masao  wrote:




On 2020/03/05 16:58, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 15:21, Fujii Masao  wrote:




On 2020/03/04 14:31, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 13:48, Fujii Masao  wrote:




On 2020/03/04 13:27, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:

Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?


Yes, it looks like a improvement rather than bug fix.



Okay, understand.


I got my eyes on this patch set.  The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-backpatching.


I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.


So we need only two patches: one fixes process title issue and another
improve wait event. I've attached updated patches.


Thanks for updating the patches! I started reading 0001 patch.


Thank you for reviewing this patch.



-   /*
-    * Report via ps if we have been waiting for more than 500 msec
-    * (should that be configurable?)
-    */
-   if (update_process_title && new_status == NULL &&
-       TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
-                                  500))

The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" into the PS display, whether wait is really necessary or not.
But isn't it better to display "waiting" in PS basically when wait is
necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
does as the above?


You're right. Will fix it.



 ResolveRecoveryConflictWithDatabase(Oid dbid)
 {
+   char       *new_status = NULL;
+
+   /* Report via ps we are waiting */
+   new_status = set_process_title_waiting();

In  ResolveRecoveryConflictWithDatabase(), there seems no need to
display "waiting" in PS because no wait occurs when recovery conflict
with database happens.


Isn't the startup process waiting for other backends to terminate?


Yeah, you're right. I agree that "waiting" should be reported in this case.

Currently ResolveRecoveryConflictWithLock() and
ResolveRecoveryConflictWithBufferPin() don't call
ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
in PS display. You changed them so that they report "waiting". I agree
to have this change. But this change is an improvement rather than
a bug fix, i.e., we should apply this change only in v13?



Did you mean ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin?


Yes! Sorry for my typo.


In the current code as far as I
researched there are two cases where we don't add "waiting" and one
case where we doubly add "waiting".

ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin don't update the ps title.
Although the path where GetCurrentTimestamp() >= ltime is false in
ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
already updated in WaitOnLock. On the other hand, the path where
GetCurrentTimestamp() >= ltime is true in
ResolveRecoveryConflictWithLock updates the ps title but it's wrong as
I reported.

I've split the patch into two patches: 0001 patch fixes the issue
about doubly updating ps title, 0002 patch makes the recovery conflict
resolution on database and buffer pin update the ps title.


Thanks for splitting the patches. I think that 0001 patch can be back-patched.

-   /*
-    * Report via ps if we have been waiting for more than 500 msec
-    * (should that be configurable?)
-    */
-   if (update_process_title && new_status == NULL &&
-       TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
-                                  500))

Originally, "waiting" is reported in PS if we've been waiting for more than
500 msec, as the above does. But you got rid of those codes in the patch.
Did you confirm that it's safe to do that? If not, isn't it better to apply
the 

Re: Remove utils/acl.h from catalog/objectaddress.h

2020-03-09 Thread Alvaro Herrera
On 2020-Mar-07, Peter Eisentraut wrote:

> I noticed that catalog/objectaddress.h includes utils/acl.h for no apparent
> reason.  It turns out this used to be needed but not anymore. So removed it
> and cleaned up the fallout.  Patch attached.

parser/parse_node.h already includes nodes/parsenodes.h, so the seeming
redundancy in places such as 

> diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
> index c27d255d8d..be63e043c6 100644
> --- a/src/include/commands/vacuum.h
> +++ b/src/include/commands/vacuum.h
> @@ -19,6 +19,7 @@
>  #include "catalog/pg_statistic.h"
>  #include "catalog/pg_type.h"
>  #include "nodes/parsenodes.h"
> +#include "parser/parse_node.h"

(and others) is not just apparent; it's also redundant in practice.  And
it's not like parse_node.h is ever going to be able not to depend on
parsenodes.h, so I would vote to remove nodes/parsenodes.h from the
headers where you're adding parser/parse_node.h.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

2020-03-09 Thread Rémi Lapeyre
Here's a rebased version that should apply cleanly on HEAD.
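
For context, a quick usage sketch of what the patch enables (the option
spelling follows the file_fdw tests below; the COPY-level syntax may read
slightly differently):

-- text format can now emit a header line, as CSV already could
COPY measurements TO '/tmp/measurements.txt' WITH (FORMAT text, HEADER true);

-- 'match' additionally verifies that the header columns match the target
-- column names, instead of just skipping the first line
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, HEADER 'match');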

---
 contrib/file_fdw/input/file_fdw.source  |  7 +-
 contrib/file_fdw/output/file_fdw.source | 13 ++--
 doc/src/sgml/ref/copy.sgml  |  9 ++-
 src/backend/commands/copy.c | 93 ++---
 src/test/regress/input/copy.source  | 71 ++-
 src/test/regress/output/copy.source | 58 ++-
 6 files changed, 202 insertions(+), 49 deletions(-)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;

 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');  -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');  -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');-- ERROR
@@ -80,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);

+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
 word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..d76a3dc6f8 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');  -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');  -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':'); -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');-- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');-- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');   -- ERROR
@@ -95,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
 word1 text OPTIONS (force_not_null 'true'),
@@ -441,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 

Re: backup manifests

2020-03-09 Thread Robert Haas
On Mon, Mar 9, 2020 at 12:22 PM tushar  wrote:
> On 3/5/20 10:25 PM, Robert Haas wrote:
> > Here's an updated patch set responding to many of the comments
> > received thus far.
> Thanks Robert. There is a scenario: if the user provides the port of a v11
> server when creating a base backup with pg_basebackup (v13 + your patch
> applied) with the option --manifest-checksums, it will lead to the error
> below
>
> [centos@tushar-ldap-docker bin]$ ./pg_basebackup -R -p 9045
> --manifest-checksums=SHA224 -D dc1
> pg_basebackup: error: could not initiate base backup: ERROR: syntax error
> pg_basebackup: removing data directory "dc1"
> [centos@tushar-ldap-docker bin]$
>
> Steps to reproduce -
> PG v11 is running
> run pg_basebackup against that with option --manifest-checksums

Seems like expected behavior to me. We could consider providing a more
descriptive error message, but there's no way for it to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovery_target_action=pause with confusing hint

2020-03-09 Thread Robert Haas
On Mon, Mar 9, 2020 at 12:03 PM Fujii Masao  wrote:
> I started the discussion about the topic very related to this.
> I'm thinking to apply the change that Sergei proposes after applying
> the patch I attached in this thread.
> https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048e...@oss.nttdata.com

I think it would be good to change the primary message, not just the hint.  e.g.

LOG: pausing at end of recovery
HINT: Execute pg_wal_replay_resume() to promote.

vs.

LOG: recovery has paused
HINT: Execute pg_wal_replay_resume() to continue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Atomics in localbuf.c

2020-03-09 Thread Robert Haas
On Fri, Mar 6, 2020 at 2:06 PM Andres Freund  wrote:
> > Since local/shared buffers share the buffer header definition, we still 
> > have to use proper functions to access
> > the atomic variables.
>
> There's only one struct BufferDesc. We could separate them out /
> introduce a union or such. But that'd add some complexity / potential
> for mistakes too.

OK, got it. Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Bug in pg_restore with EventTrigger in parallel mode

2020-03-09 Thread Fabrízio de Royes Mello
On Mon, Mar 9, 2020 at 12:27 PM Tom Lane  wrote:
>
> In the case of event triggers, the obvious counterexample is that if
> you restore ET A and then ET B, ET A might interfere with the attempt
> to restore ET B.  (And we have no way to know whether restoring B
> before A would be better or worse.)
>

Yeap... you're correct.


> So on the whole I find "restore matviews as if they'd been refreshed
> after the restore" to be a more trustworthy approach than the other
> way.  At some level we have to trust that ETs aren't going to totally
> bollix the restore.
>

Ok.

> Which, TBH, makes me wonder about the validity of the original complaint
> in this thread.  I don't mind delaying ET restore as long as we feasibly
> can; but if you have an ET that is going to misbehave during restore,
> you are in for pain, and it's hard to consider that that pain isn't
> self-inflicted.
>

The proposed patch solves the original complaint. I was just trying to
understand completely what you pointed out before and I agree with you.
Thanks for the clear explanation.

The patch LGTM, and IMHO we should back-patch it to all supported
versions.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

