Re: Poll: are people okay with function/operator table redesign?

2020-04-18 Thread Pavel Stehule
On Sat, 18 Apr 2020 at 22:36, Tom Lane  wrote:

> Robert Haas  writes:
> > On Fri, Apr 17, 2020 at 6:30 PM David G. Johnston
> >  wrote:
> >> I feel like writing them as:
> >> + (date, integer) -> date
> >> makes more sense as they are mainly sorted on the operator symbol as
> opposed to the left operand.
>
> > I thought about that, too, but I think the way Tom did it is better.
> > It's much more natural to see it using the syntax with which it will
> > actually be invoked.
>
> Just for the record, I experimented with putting back an "operator name"
> column, as attached.  I think it could be argued either way whether this
> is an improvement or not.
>
> Some notes:
>
> * The column seems annoyingly wide, but the only way to make it narrower
> is to narrow or eliminate the column title, which could be confusing.
> Also, if there's not a fair amount of whitespace, it looks as if the
> initial name is part of the signature, which is *really* confusing,
> cf second screenshot.  (I'm not sure why the vertical rule is rendered
> so much more weakly in this case, but it is.)
>
> * I also tried it with valign="middle" to center the operator name among
> its entries.  This was *not* an improvement, it largely breaks the
> ability to see which entries belong to the name.
>

The first variant looks better, because the column with the operator is wider.

Maybe it would look better if the content were placed at the mid point. In
the left upper corner it is less readable.

Regards

Pavel


> regards, tom lane
>
>


Re: Implementation DISTINCT for window aggregate function: SUM

2020-04-18 Thread David Rowley
On Sat, 18 Apr 2020 at 23:47, Eugen Konkov  wrote:
> select id, amount, sum(DISTINCT amount) over () as total
>   from xx;

> Why is this not possible in PG?

Mainly because nobody has committed anything to implement it yet.

> Why do window-specific functions not allow DISTINCT to be used within the
> function argument list?
> What problems exist?

There are some details in [1] which you might be interested in.

David

[1] 
https://www.postgresql.org/message-id/flat/CAN1Pwonf4waD%2BPWkEFK8ANLua8fPjZ4DmV%2BhixO62%2BLiR8gwaA%40mail.gmail.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-18 Thread Justin Pryzby
On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > And, should it use two spaces before "Sort Method", "Memory" and 
> > > "Pre-sorted
> ...
> > I read through that subthread, and the ending seemed to be Peter
> > wanting things to be unified. Was there a conclusion beyond that?
> 
> This discussion is ongoing.  I think let's wait until that's settled before
> addressing this more complex and even newer case.  We can add "explain, two
> spaces and equals vs colon" to the "Open items" list if need be - I hope the
> discussion will not delay the release.

The change proposed on the WAL thread is minimal, and makes the new explain(WAL)
output consistent with that of explain(BUFFERS).

That uses a different format from "Sort", which is what incremental sort should
follow.  (Hash Join also uses Sort's format of two spaces and colons rather
than equals.)

So the attached 0001 makes explain output for incremental sort more consistent
with sort:

 - Two spaces;
 - colons rather than equals;
 - Don't use semicolon, which isn't in use anywhere else;

I tested with this:
template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON t(i); VACUUM VERBOSE ANALYZE t;
template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
...
   Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB  Pre-sorted Groups: 1000  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB

On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > > Should "Pre-sorted Groups:" be on a separate line ?
> > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB
> > 
> > I'd originally had that, but Tomas wanted it to be more compact. It's
> > easy to adjust though if the consensus changes on that.
> 
> I'm OK with changing the format if there's a consensus. The current
> format seemed better to me, but I'm not particularly attached to it.

I still think Pre-sorted groups should be on a separate line, as in 0002.
In addition to looking better (to me), and being easier to read, another reason
is that there are essentially key=>values here, but the keys are repeated (Sort
Method, etc).
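To make the repeated-keys point concrete, here is a standalone C sketch of the per-line rendering — this is not the patch itself (the real explain.c code appends to es->str with appendStringInfo(); the function name and the snprintf-based buffer here are my own, for illustration):

```c
#include <stdio.h>
#include <string.h>

/*
 * Render one group category ("Full-sort" or "Pre-sorted") as its own line,
 * using the Sort-style conventions: two spaces between fields, colons
 * rather than equals, no semicolons.
 */
static void
format_group_line(char *buf, size_t len, const char *label,
				  long groups, const char *method,
				  long avg_kb, long peak_kb)
{
	snprintf(buf, len,
			 "%s Groups: %ld  Sort Method: %s  "
			 "Average Memory: %ldkB  Peak Memory: %ldkB",
			 label, groups, method, avg_kb, peak_kb);
}
```

With one call per category, the repeated keys (Sort Method, Average/Peak Memory) line up vertically instead of running together on a single line.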

I also suggested renaming s/Presorted/Pre-sorted/, but I didn't do that here.

Michael already patched most of the comment typos; the remainder I'm including
here as a "nearby patch".

-- 
Justin
>From 55044341f82b847d136cd17df5a3c8d80c8371b4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 15 Apr 2020 08:45:21 -0500
Subject: [PATCH v1 1/3] Fix explain output for incr sort:

 - Two spaces;
 - colons rather than equals;
 - Don't use semicolon;
---
 src/backend/commands/explain.c | 18 +++---
 src/test/regress/expected/incremental_sort.out | 12 ++--
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..9257c52707 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2778,7 +2778,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 	{
 		if (indent)
 			appendStringInfoSpaces(es->str, es->indent * 2);
-		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel,
+		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT "  Sort Method", groupLabel,
 						 groupInfo->groupCount);
 		/* plural/singular based on methodNames size */
 		if (list_length(methodNames) > 1)
@@ -2798,24 +2798,20 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 							 spaceTypeName, avgSpace,
-							 groupInfo->maxMemorySpaceUsed);
+							 spaceTypeName, groupInfo->maxMemorySpaceUsed);
 		}
 
 		if (groupInfo->maxDiskSpaceUsed > 0)
 		{
 			long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
-
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
-			/* Add a semicolon separator only if memory stats were printed. */
-			if (groupInfo->maxMemorySpaceUsed > 0)
-				appendStringInfo(es->str, ";");
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 							 spaceTypeName, avgSpace,
-							 groupInfo->maxDiskSpaceUsed);
+							 spaceTypeName, groupInfo->maxDiskSpaceUsed);
 		}
 	}
 	else
@@ -2899,7 +2895,7 @@ 

Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-18 Thread Justin Pryzby
v3 fixes a brown-paper-bag logic error.

-- 
Justin
>From b5de1fc71f805bb8c7ec7e77bfce9a604ccd4bc8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH v3] fix detaching tables with inherited row triggers

The old behavior is buggy, and the intended behavior was not previously
documented.  So define the behavior that the trigger is removed if the table is
detached.  It is an error if a table being attached already has a trigger of
the same name.  This differs from the behavior for indexes and constraints.

See also:
86f575948c773b0ec5b0f27066e37dd93a7f0a96

Discussion:
https://www.postgresql.org/message-id/flat/20200408152412.GZ2228%40telsasoft.com
---
 doc/src/sgml/ref/create_trigger.sgml   |  1 +
 src/backend/commands/tablecmds.c   | 38 ++
 src/bin/psql/describe.c| 12 +--
 src/test/regress/expected/triggers.out | 44 ++
 src/test/regress/sql/triggers.sql  | 18 +++
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index bde3a63f90..303881fcfb 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -526,6 +526,7 @@ UPDATE OF column_name1 [, column_name2INSTEAD OF.
   
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 037d457c3d..480ea777ac 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16797,6 +16797,44 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 	}
 	table_close(classRel, RowExclusiveLock);
 
+	/*
+	 * Drop any ROW triggers inherited from partitioned table.
+	 * This undoes what CloneRowTriggersToPartition did.
+	 */
+	{
+		ScanKeyData skey;
+		SysScanDesc	scan;
+		HeapTuple	trigtup;
+		Relation	tgrel;
+
+		ScanKeyInit(&skey, Anum_pg_trigger_tgrelid, BTEqualStrategyNumber,
+					F_OIDEQ, ObjectIdGetDatum(RelationGetRelid(partRel)));
+		tgrel = table_open(TriggerRelationId, RowExclusiveLock);
+		scan = systable_beginscan(tgrel, TriggerRelidNameIndexId,
+								  true, NULL, 1, &skey);
+
+		while (HeapTupleIsValid(trigtup = systable_getnext(scan)))
+		{
+			Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(trigtup);
+
+			if (!OidIsValid(pg_trigger->tgparentid) ||
+				!pg_trigger->tgisinternal ||
+				!TRIGGER_FOR_ROW(pg_trigger->tgtype))
+				continue;
+
+			RemoveTriggerById(pg_trigger->oid);
+			deleteDependencyRecordsFor(TriggerRelationId,
+									   pg_trigger->oid,
+									   false);
+			deleteDependencyRecordsFor(RelationRelationId,
+									   pg_trigger->oid,
+									   false);
+		}
+
+		systable_endscan(scan);
+		table_close(tgrel, RowExclusiveLock);
+	}
+
 	/*
 	 * Detach any foreign keys that are inherited.  This includes creating
 	 * additional action triggers.
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f05e914b4d..4cbc741c97 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2939,14 +2939,17 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT t.tgname, "
 		  "pg_catalog.pg_get_triggerdef(t.oid%s), "
-		  "t.tgenabled, %s\n"
+		  "t.tgenabled, %s, %s\n"
 		  "FROM pg_catalog.pg_trigger t\n"
 		  "WHERE t.tgrelid = '%s' AND ",
 		  (pset.sversion >= 9 ? ", true" : ""),
 		  (pset.sversion >= 9 ? "t.tgisinternal" :
 		   pset.sversion >= 80300 ?
 		   "t.tgconstraint <> 0 AS tgisinternal" :
-		   "false AS tgisinternal"), oid);
+		   "false AS tgisinternal"),
+		  (pset.sversion >= 130000 ? "t.tgparentid" :
+		   "0 AS tgparentid"),
+		  oid);
 		if (pset.sversion >= 11)
 			appendPQExpBufferStr(&buf, "(NOT t.tgisinternal OR (t.tgisinternal AND t.tgenabled = 'D') \n"
  "OR EXISTS (SELECT 1 FROM pg_catalog.pg_depend WHERE objid = t.oid \n"
@@ -3062,6 +3065,11 @@ describeOneTableDetails(const char *schemaname,
 		tgdef = usingpos + 9;
 
 	printfPQExpBuffer(&buf, "%s", tgdef);
+
+	/* Visually distinguish inherited triggers XXX: ROW only? */
+	if (*PQgetvalue(result, i, 4) != '0')
+		appendPQExpBufferStr(&buf, ", PARTITION");
+
 	printTableAddFooter(, buf.data);
 }
 			}
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index e9da4ef983..9b544148bf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2023,6 +2023,50 @@ select tgrelid::regclass, tgname, tgfoid::regproc from pg_trigger
 -++
 (0 rows)
 
+-- check detach behavior
+create trigger trg1 after insert on trigpart for each row execute procedure trigger_nothing();
+\d trigpart3
+ Table "public.trigpart3"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+ b  | integer |   |  | 
+Partition of: trigpart FOR 

Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-18 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 09:46:38AM -0400, Tom Lane wrote:
> Amit Langote  writes:
> > On Thu, Apr 9, 2020 at 3:09 AM Tom Lane  wrote:
> >> My point is that so long as you only allow the case of exactly one parent,
> >> you can just delete the child trigger, because it must belong to that
> >> parent.  As soon as there's any flexibility, you are going to end up
> >> reinventing all the stuff we had to invent to manage
> >> maybe-or-maybe-not-inherited columns.  So I think the "detach" idea
> >> is the first step on that road, and I counsel not taking that step.
> 
> > As mentioned upthread, we have behavior #1 for indexes (attach
> > existing / detach & keep), without any of the *islocal, *inhcount
> > infrastructure. It is a bit complex, because we need logic to check
> > the equivalence of an existing index on the partition being attached,
> > so implementing the same behavior for trigger is going to have to be
> > almost as complex.  Considering that #2 will be much simpler to
> > implement, but would be asymmetric with everything else.
> 
> I think there is justification for jumping through some hoops for
> indexes, because they can be extremely expensive to recreate.
> The same argument doesn't hold even a little bit for child
> triggers, though.
> 
> Also it can be expected that an index will still behave sensibly after
> its table is standalone, whereas that's far from obvious for a trigger
> that was meant to work on partition member tables.

I haven't heard a compelling argument for or against either way.

Perhaps the worst behavior would be if, during ATTACH, we searched for a matching
trigger and "merged" it (marked it inherited) if it matched.  That could be
bad if someone *wanted* two triggers, which seems unlikely, but to each their
own.

I implemented the simple way (and, as an experiment, 75% of the hard way).

It occurred to me that we don't currently distinguish between a trigger on a
child table and a trigger on a parent table which was recursively created on a
child.  That makes sense for indexes and constraints, since those objects
persist if the table is detached, so it doesn't matter how they were defined.

But if we remove the trigger during DETACH, then it's *not* the same as a
trigger that was defined on the child, and I suggest that in v13 we should
make that visible.

-- 
Justin
>From 19fe0c70e0b4f2c05c538d1a9042f9303c927ae2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH v2] fix detaching tables with inherited after-row triggers

The old behavior is buggy, and the intended behavior was not previously
documented.  So define the behavior that the trigger is removed if the table is
detached.  It is an error if a table being attached already has a trigger of
the same name.  This differs from the behavior for indexes and constraints.

See also:
86f575948c773b0ec5b0f27066e37dd93a7f0a96

Discussion:
https://www.postgresql.org/message-id/flat/20200408152412.GZ2228%40telsasoft.com
---
 doc/src/sgml/ref/create_trigger.sgml   |  1 +
 src/backend/commands/tablecmds.c   | 37 ++
 src/bin/psql/describe.c| 12 +--
 src/test/regress/expected/triggers.out | 44 ++
 src/test/regress/sql/triggers.sql  | 18 +++
 5 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index bde3a63f90..303881fcfb 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -526,6 +526,7 @@ UPDATE OF column_name1 [, column_name2INSTEAD OF.
   
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 037d457c3d..78236d152f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16797,6 +16797,43 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 	}
 	table_close(classRel, RowExclusiveLock);
 
+	/*
+	 * Drop any ROW triggers inherited from partitioned table.
+	 * This undoes what CloneRowTriggersToPartition did.
+	 */
+	{
+		ScanKeyData skey;
+		SysScanDesc	scan;
+		HeapTuple	trigtup;
+		Relation	tgrel;
+
+		ScanKeyInit(&skey, Anum_pg_trigger_tgrelid, BTEqualStrategyNumber,
+					F_OIDEQ, ObjectIdGetDatum(RelationGetRelid(partRel)));
+		tgrel = table_open(TriggerRelationId, RowExclusiveLock);
+		scan = systable_beginscan(tgrel, TriggerRelidNameIndexId,
+								  true, NULL, 1, &skey);
+
+		while (HeapTupleIsValid(trigtup = systable_getnext(scan)))
+		{
+			Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(trigtup);
+
+			if (OidIsValid(pg_trigger->tgparentid) &&
+				pg_trigger->tgisinternal &&
+				TRIGGER_FOR_ROW(pg_trigger->tgtype))
+				RemoveTriggerById(pg_trigger->oid);
+
+			deleteDependencyRecordsFor(TriggerRelationId,
+									   pg_trigger->oid,
+									   false);
+			deleteDependencyRecordsFor(RelationRelationId,
+									   pg_trigger->oid,
+									   false);
+		}
+
+		systable_endscan(scan);
+		table_close(tgrel, RowExclusiveLock);
+	}

Re: relocating the server's backup manifest code

2020-04-18 Thread Michael Paquier
On Sat, Apr 18, 2020 at 10:43:52AM -0400, Robert Haas wrote:
> I'm not in favor of this renaming. Different people have different
> preferences, of course, but my impression is that the general project
> style is to choose names that follow English word ordering, i.e.
> appendStringInfo rather than stringInfoAppend; add_paths_to_joinrel
> rather than joinrel_append_paths; etc. There are many exceptions, but
> I tend to lean toward English word ordering unless it seems to create
> a large amount of awkwardness in a particular case. At any rate, it
> seems a separate question from moving the code.

We clearly have rather different views on the matter, then.  I still
prefer my suggestion because it's more consistent and easier to
grep, but I'll be fine with your call in the end.  I would still
suggest using BackupManifest instead of Manifest in those functions and
structures, though, as we cannot really know whether the concept of a
manifest might apply to other parts of the system.  A recent example of
an API name conflict I have in mind is index AM vs. table AM: index AMs
had been using rather generic names, causing issues when table AMs were
introduced.

> I'm OK with that. I don't think it's a big deal because "manifest"
> isn't a widely-used PostgreSQL term already, but it doesn't bother me
> to rename it.

Thanks.
--
Michael




Re: where should I stick that backup?

2020-04-18 Thread Greg Stark
Fwiw, it was a common trick in the Oracle world to create a named pipe
to gzip and then write your backup to it. I really like that way of
doing things, but I suppose it's probably too old-fashioned to expect
it to survive. And in practice, while it worked as a manual process for a
sysadmin, it's pretty awkward to automate reliably.
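The trick above can be sketched as a self-contained C program. This is an illustration, not anyone's actual backup tooling: the byte-counting child process stands in for "gzip < backup.fifo > backup.gz", and the /tmp path is made up.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Named-pipe backup sketch: the producer (the "backup tool") writes into a
 * FIFO while a separate consumer process reads it.  Returns the number of
 * bytes the consumer saw, smuggled back via the exit status (so keep the
 * payload under 128 bytes in this toy version), or -1 on setup failure.
 */
static long
fifo_backup_demo(const char *payload)
{
	char		dir[] = "/tmp/fifo_demo_XXXXXX";
	char		path[512];
	pid_t		pid;
	int			status;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/backup.fifo", dir);
	if (mkfifo(path, 0600) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
	{
		/* consumer: stand-in for the compressor reading the pipe */
		char		buf[4096];
		ssize_t		n;
		long		total = 0;
		int			fd = open(path, O_RDONLY);

		while ((n = read(fd, buf, sizeof(buf))) > 0)
			total += n;
		close(fd);
		_exit((int) total);
	}

	/* producer: just write the "backup" into the named pipe */
	{
		int			fd = open(path, O_WRONLY);

		(void) write(fd, payload, strlen(payload));
		close(fd);
	}

	waitpid(pid, &status, 0);
	unlink(path);
	rmdir(dir);
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The awkward part for automation is visible even here: the producer and consumer must be started and reaped together, and a failure on either side can leave the other blocked on the FIFO.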




Re: WAL usage calculation patch

2020-04-18 Thread Justin Pryzby
On Sat, Apr 18, 2020 at 05:39:35PM +0200, Julien Rouhaud wrote:
> On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila  wrote:
> >
> > On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut 
> >  wrote:
> > > On 2020-04-14 05:57, Amit Kapila wrote:
> > > > Peter E, others, any suggestions on how to move forward?  I think here
> > > > we should follow the rule "follow the style of nearby code" which in
> > > > this case would be to have one space after each field as we would like
> > > > it to be closer to the "Buffers" format.  It would be good if we have
> > > > a unified format among all Explain stuff but we might not want to
> > > > change the existing things and even if we want to do that it might be
> > > > a broader/bigger change and we should do that as a PG14 change.  What
> > > > do you think?
> > >
> > > It looks like shortening to fpw= and using one space is the easiest way
> > > to solve this issue.
> > >
> >
> > I am fine with this approach and will change accordingly.  I will wait
> > for a few days (3-4 days) to see if someone shows up with either an
> > objection to this or with a better idea for the display of WAL usage
> > information.
> 
> That was also my preferred alternative.  PFA a patch for that.  I also
> changed to "fpw" for the non textual output for consistency.

Should we capitalize at least the non-text one?  And maybe the text one for
consistency?

+   ExplainPropertyInteger("WAL fpw", NULL, 

 

And add the acronym to the docs:

$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml:  number of records, number of full page writes and amount of WAL bytes

"..full page writes (FPW).."

Should we also change vacuumlazy.c for consistency?

+			_("WAL usage: %ld records, %ld full page writes, "
+			  UINT64_FORMAT " bytes"),

-- 
Justin




Re: fixing old_snapshot_threshold's time->xid mapping

2020-04-18 Thread Andres Freund
Hi,

On 2020-04-17 14:12:44 +1200, Thomas Munro wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp?  It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

It'd be better than the current test situation, and it would probably be
good to have as part of testing anyway (since it'd allow making the
tests not take long / not be racy on slow machines). But I still don't think
it really allows testing the feature in a natural way. It makes it
easier to test for known edge cases / problems, but not really to discover
unknown ones. For that I think we need more granular bins.

- Andres




Re: sqlsmith crash incremental sort

2020-04-18 Thread James Coleman
On Thu, Apr 16, 2020 at 9:26 PM Tom Lane  wrote:
>
> Tomas Vondra  writes:
> > I think we have essentially three options:
> > 1) assuming there's just a single group
> > 2) assuming each row is a separate group
> > 3) something in between
> > If (1) and (2) are worst/best-case scenarios, maybe we should pick
> > something in between. We have DEFAULT_NUM_DISTINCT (200) which
> > essentially says "we don't know what the number of groups is" so maybe
> > we should use that.
>
> I wouldn't recommend picking either the best or worst cases.
>
> Possibly DEFAULT_NUM_DISTINCT is a sane choice, though it's fair to
> wonder if it's quite applicable to the case where we already know
> we've grouped by some columns.

Do you think defining a new default, say,
DEFAULT_NUM_DISTINCT_PRESORTED, is preferred then? And choosing some
value like "1/2 of the normal DEFAULT_NUM_DISTINCT groups" or some
such?
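Option (3) from upthread could be sketched like this — the function name and the clamping rule are my own, nothing committed: fall back to DEFAULT_NUM_DISTINCT when statistics give no distinct-count, clamped between the best case (a single group) and the worst case (one group per input row).

```c
#define DEFAULT_NUM_DISTINCT 200	/* the planner's "don't know" guess */

/*
 * Illustrative only: pick a group-count estimate for an incremental sort.
 * If statistics supply a distinct-count, use it; otherwise fall back to
 * the default, never exceeding the worst case (every row its own group)
 * nor dropping below the best case (one group).
 */
static double
estimate_presorted_groups(double input_rows, double ndistinct)
{
	if (ndistinct > 0)			/* statistics available: use them */
		return ndistinct;
	if (input_rows < 1)
		return 1.0;
	return input_rows < DEFAULT_NUM_DISTINCT ? input_rows
											 : DEFAULT_NUM_DISTINCT;
}
```

A DEFAULT_NUM_DISTINCT_PRESORTED constant would simply replace the default in the fallback branch.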

James




Re: relocating the server's backup manifest code

2020-04-18 Thread Magnus Hagander
On Sat, Apr 18, 2020 at 5:43 PM Tom Lane  wrote:

> Robert Haas  writes:
> > Despite the fact that we are after feature freeze, I think it would be
> > a good idea to commit this to PG 13. It could be saved for PG 14, but
> > that will make future back-patching substantially harder. Also, a
> > patch that's just moving code is pretty low-risk.
>
> +1 in principle --- I didn't read the patch though.
>

Same here, +1 in principle. This is not a new feature, this is "polishing a
feature that was added in 13", and doing so now will save a lot of work
down the road vs doing it later.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-18 Thread Tom Lane
Masahiko Sawada  writes:
> On Sat, 18 Apr 2020 at 00:31, Tom Lane  wrote:
>> +   /* Quick out if not even configured to be synchronous */
>> +   if (SyncRepConfig == NULL)
>> +   return false;

> I felt it a bit strange that we do the above check in
> SyncRepGetSyncRecPtr() because SyncRepReleaseWaiters(), which is the
> only caller, says the following before calling it:

Notice there was such a test in SyncRepGetSyncRecPtr already --- I just
moved it to be before doing some work instead of after.

> Can we either change it to an assertion, move it to before acquiring
> SyncRepLock in SyncRepReleaseWaiters or just remove it?

I have no objection to that in principle, but it seems like it's a
change in SyncRepGetSyncRecPtr's API that is not necessary to fix
this bug.  So I'd rather leave it to happen along with the larger
API changes (getting rid of am_sync) that are proposed for v14.

regards, tom lane




Re: relocating the server's backup manifest code

2020-04-18 Thread Tom Lane
Robert Haas  writes:
> Despite the fact that we are after feature freeze, I think it would be
> a good idea to commit this to PG 13. It could be saved for PG 14, but
> that will make future back-patching substantially harder. Also, a
> patch that's just moving code is pretty low-risk.

+1 in principle --- I didn't read the patch though.

regards, tom lane




Re: WAL usage calculation patch

2020-04-18 Thread Julien Rouhaud
On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila  wrote:
>
> On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut
>  wrote:
> >
> > On 2020-04-14 05:57, Amit Kapila wrote:
> > > Peter E, others, any suggestions on how to move forward?  I think here
> > > we should follow the rule "follow the style of nearby code" which in
> > > this case would be to have one space after each field as we would like
> > > it to be closer to the "Buffers" format.  It would be good if we have
> > > a unified format among all Explain stuff but we might not want to
> > > change the existing things and even if we want to do that it might be
> > > a broader/bigger change and we should do that as a PG14 change.  What
> > > do you think?
> >
> > It looks like shortening to fpw= and using one space is the easiest way
> > to solve this issue.
> >
>
> I am fine with this approach and will change accordingly.  I will wait
> for a few days (3-4 days) to see if someone shows up with either an
> objection to this or with a better idea for the display of WAL usage
> information.

That was also my preferred alternative.  PFA a patch for that.  I also
changed to "fpw" for the non textual output for consistency.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..9cc1b13b76 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3350,13 +3350,13 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
 			appendStringInfoString(es->str, "WAL:");
 
 			if (usage->wal_records > 0)
-				appendStringInfo(es->str, "  records=%ld",
+				appendStringInfo(es->str, " records=%ld",
 								 usage->wal_records);
 			if (usage->wal_fpw > 0)
-				appendStringInfo(es->str, "  full page writes=%ld",
+				appendStringInfo(es->str, " fpw=%ld",
 								 usage->wal_fpw);
 			if (usage->wal_bytes > 0)
-				appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
+				appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
 								 usage->wal_bytes);
 			appendStringInfoChar(es->str, '\n');
 		}
@@ -3365,7 +3365,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
 	{
 		ExplainPropertyInteger("WAL records", NULL,
 							   usage->wal_records, es);
-		ExplainPropertyInteger("WAL full page writes", NULL,
+		ExplainPropertyInteger("WAL fpw", NULL,
 							   usage->wal_fpw, es);
 		ExplainPropertyUInteger("WAL bytes", NULL,
 								usage->wal_bytes, es);


Re: Error on failed COMMIT

2020-04-18 Thread Tony Locke
On Thu, 16 Apr 2020 at 21:16, Shay Rojansky  wrote:
> Npgsql would be fine. In fact, Npgsql doesn't have any specific expectations 
> nor any specific logic around commit; it assumes errors may be returned for 
> any command (COMMIT or otherwise), and surfaces those errors as .NET 
> exceptions.

Hi all, I work on the pg8000 Python driver for Postgres, and having
read through the thread I'd like to echo Shay Rojansky's comment:
pg8000 would be able to handle the behaviour resulting from the
proposed patch, and I support the change of having a call to commit()
*always* produce an error if it has failed. I can understand
people's reluctance in general to change server behaviour, but in this
case I think the good outweighs the bad. I think most people expected
the server to behave like this anyway.

Regards,

Tony.




Re: where should I stick that backup?

2020-04-18 Thread Robert Haas
On Fri, Apr 17, 2020 at 7:44 PM Andres Freund  wrote:
> This suggest that pipes do have a considerably higher overhead on
> windows, but that it's not all that terrible if one takes care to use
> large buffers in each pipe element.
>
> It's notable though that even the simplest use of a pipe does add a
> considerable overhead compared to using the files directly.

Thanks for these results. I think that this shows that it's probably
not a great idea to force everything to go through pipes in every
case, but on the other hand, there's no reason to be particularly
scared of the performance implications of letting some things go
through pipes. For instance, if we decide that LZ4 compression is
going to be a good choice for most users, we might want to do that
in-process rather than via pipes. However, if somebody wants to pipe
through an external compressor that they prefer, that's going to be a
little slower, but not necessarily to a degree that creates big
problems. People with bigger databases will need to be more careful
about which options they choose, but that's kind of inevitable.

Do you agree?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: relocating the server's backup manifest code

2020-04-18 Thread Robert Haas
On Sat, Apr 18, 2020 at 8:57 AM Michael Paquier  wrote:
> +static inline bool
> +IsManifestEnabled(manifest_info *manifest)
> +{
> +   return (manifest->buffile != NULL);
> +}
> I would keep this one static and located within backup_manifest.c as
> it is only used there.

Oh, OK.

> +extern void InitializeManifest(manifest_info *manifest,
> +  manifest_option want_manifest,
> +  pg_checksum_type manifest_checksum_type);
> +extern void AppendStringToManifest(manifest_info *manifest, char *s);
> +extern void AddFileToManifest(manifest_info *manifest, const char *spcoid,
> + const char *pathname, size_t size,
> + pg_time_t mtime,
> + pg_checksum_context *checksum_ctx);
> +extern void AddWALInfoToManifest(manifest_info *manifest, XLogRecPtr 
> startptr,
> +TimeLineID starttli, XLogRecPtr endptr,
> +TimeLineID endtli);
> +extern void SendBackupManifest(manifest_info *manifest);
>
> Now the names of those routines is not really consistent if you wish
> to make one single family.  Here is a suggestion:
> - BackupManifestInit()
> - BackupManifestAppendString()
> - BackupManifestAddFile()
> - BackupManifestAddWALInfo()
> - BackupManifestSend()

I'm not in favor of this renaming. Different people have different
preferences, of course, but my impression is that the general project
style is to choose names that follow English word ordering, i.e.
appendStringInfo rather than stringInfoAppend; add_paths_to_joinrel
rather than joinrel_append_paths; etc. There are many exceptions, but
I tend to lean toward English word ordering unless it seems to create
a large amount of awkwardness in a particular case. At any rate, it
seems a separate question from moving the code.

> + * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
> I would vote for some more consistency.  Personally I just use that
> all the time:
>  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development  Group
>  * Portions Copyright (c) 1994, Regents of the University of California

Sure, that's fine.

> +typedef enum manifest_option
> +{
> [...]
> +typedef struct manifest_info
> +{
> These ought to be named backup_manifest_info and backup_manifest_option?

I'm OK with that. I don't think it's a big deal because "manifest"
isn't a widely-used PostgreSQL term already, but it doesn't bother me
to rename it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-04-18 Thread Dilip Kumar
On Sat, Apr 18, 2020 at 6:12 PM Erik Rijkers  wrote:
>
> On 2020-04-18 11:10, Erik Rijkers wrote:
> > On 2020-04-18 11:07, Erik Rijkers wrote:
>  Hi Erik,
> 
>  While setting up the cascading replication I have hit one issue on
>  base code[1].  After fixing that I have got one crash with streaming
>  on patch.  I am not sure whether you are facing any of these 2
>  issues
>  or any other issue.  If your issue is not any of these then please
>  share the callstack and steps to reproduce.
> >>
> >> I figured out a few things about this. Attached is a bash script
> >> test.sh, to reproduce:
> >
> > And the attached file, test.sh.  (sorry)
>
> It turns out I must have been mistaken somewhere.  (I probably missed
> bugfix_in_schema_sent.patch)
>
> I have just now rebuilt all the instances on top of master with these
> patches:
>
> > [v14-0001-Immediately-WAL-log-assignments.patch]
> > [v14-0002-Issue-individual-invalidations-with.patch]
> > [v14-0003-Extend-the-output-plugin-API-with-stream-methods.patch]
> > [v14-0004-Gracefully-handle-concurrent-aborts-of-uncommitt.patch]
> > [v14-0005-Implement-streaming-mode-in-ReorderBuffer.patch]
> > [v14-0006-Add-support-for-streaming-to-built-in-replicatio.patch]
> > [v14-0007-Track-statistics-for-streaming.patch]
> > [v14-0008-Enable-streaming-for-all-subscription-TAP-tests.patch]
> > [v14-0009-Add-TAP-test-for-streaming-vs.-DDL.patch]
> > [v14-0010-Bugfix-handling-of-incomplete-toast-tuple.patch]
> > [bugfix_in_schema_sent.patch]
>
> (by the way: this build's regression tests  'ddl', 'toast', and
> 'spill' fail)
>
> I seem now able to run all my test programs on these instances without
> errors.
>
> Sorry, I seem to have raised a false alarm (although there was initially
> certainly a problem).

No problem, thanks for confirming.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




valgrind error

2020-04-18 Thread Andrew Dunstan


I was just trying to revive lousyjack, my valgrind buildfarm animal
which has been offline for 12 days, after having upgraded the machine
(fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:


2020-04-17 19:26:03.483 EDT [63741:3] pg_regress LOG:  statement: CREATE
DATABASE "regression" TEMPLATE=template0
==63717== VALGRINDERROR-BEGIN
==63717== Use of uninitialised value of size 8
==63717==    at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)
==63717==    by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)
==63717==    by 0x55A268: XLogInsert (xloginsert.c:461)
==63717==    by 0x8BC9E0: LogCurrentRunningXacts (standby.c:1005)
==63717==    by 0x8BC8F9: LogStandbySnapshot (standby.c:961)
==63717==    by 0x550CB3: CreateCheckPoint (xlog.c:8937)
==63717==    by 0x82A3B2: CheckpointerMain (checkpointer.c:441)
==63717==    by 0x56347D: AuxiliaryProcessMain (bootstrap.c:453)
==63717==    by 0x83CA18: StartChildProcess (postmaster.c:5474)
==63717==    by 0x83A120: reaper (postmaster.c:3045)
==63717==    by 0x4874B1F: ??? (in /usr/lib64/libpthread-2.30.so)
==63717==    by 0x5056F29: select (in /usr/lib64/libc-2.30.so)
==63717==    by 0x8380A0: ServerLoop (postmaster.c:1691)
==63717==    by 0x837A1F: PostmasterMain (postmaster.c:1400)
==63717==    by 0x74A71D: main (main.c:210)
==63717==  Uninitialised value was created by a stack allocation
==63717==    at 0x8BC942: LogCurrentRunningXacts (standby.c:984)
==63717==
==63717== VALGRINDERROR-END
{
   
   Memcheck:Value8
   fun:pg_comp_crc32c_sb8
   fun:XLogRecordAssemble
   fun:XLogInsert
   fun:LogCurrentRunningXacts
   fun:LogStandbySnapshot
   fun:CreateCheckPoint
   fun:CheckpointerMain
   fun:AuxiliaryProcessMain
   fun:StartChildProcess
   fun:reaper
   obj:/usr/lib64/libpthread-2.30.so
   fun:select
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}


I can't see what the problem is immediately.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: relocating the server's backup manifest code

2020-04-18 Thread Michael Paquier
On Sat, Apr 18, 2020 at 08:37:58AM -0400, Robert Haas wrote:
> Despite the fact that we are after feature freeze, I think it would be
> a good idea to commit this to PG 13. It could be saved for PG 14, but
> that will make future back-patching substantially harder. Also, a
> patch that's just moving code is pretty low-risk.

Makes sense to move this code around, so that's fine by me to do it
even after the feature freeze as it means less back-patching pain in
the future.

+static inline bool
+IsManifestEnabled(manifest_info *manifest)
+{
+   return (manifest->buffile != NULL);
+}
I would keep this one static and located within backup_manifest.c as
it is only used there.

+extern void InitializeManifest(manifest_info *manifest,
+  manifest_option want_manifest,
+  pg_checksum_type manifest_checksum_type);
+extern void AppendStringToManifest(manifest_info *manifest, char *s);
+extern void AddFileToManifest(manifest_info *manifest, const char *spcoid,
+ const char *pathname, size_t size,
+ pg_time_t mtime,
+ pg_checksum_context *checksum_ctx);
+extern void AddWALInfoToManifest(manifest_info *manifest, XLogRecPtr startptr,
+TimeLineID starttli, XLogRecPtr endptr,
+TimeLineID endtli);
+extern void SendBackupManifest(manifest_info *manifest);

Now the names of those routines are not really consistent if you wish
to make one single family.  Here is a suggestion:
- BackupManifestInit()
- BackupManifestAppendString()
- BackupManifestAddFile()
- BackupManifestAddWALInfo()
- BackupManifestSend()

+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
I would vote for some more consistency.  Personally I just use that
all the time:
 * Portions Copyright (c) 1996-2020, PostgreSQL Global Development  Group
 * Portions Copyright (c) 1994, Regents of the University of California

+typedef enum manifest_option
+{
[...]
+typedef struct manifest_info
+{
These ought to be named backup_manifest_info and backup_manifest_option?
--
Michael




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-04-18 Thread Erik Rijkers

On 2020-04-18 11:10, Erik Rijkers wrote:

On 2020-04-18 11:07, Erik Rijkers wrote:

Hi Erik,

While setting up the cascading replication I have hit one issue on
base code[1].  After fixing that I have got one crash with streaming
on patch.  I am not sure whether you are facing any of these 2 
issues

or any other issue.  If your issue is not any of these then please
share the callstack and steps to reproduce.


I figured out a few things about this. Attached is a bash script
test.sh, to reproduce:


And the attached file, test.sh.  (sorry)


It turns out I must have been mistaken somewhere.  (I probably missed 
bugfix_in_schema_sent.patch)


I have just now rebuilt all the instances on top of master with these 
patches:



[v14-0001-Immediately-WAL-log-assignments.patch]
[v14-0002-Issue-individual-invalidations-with.patch]
[v14-0003-Extend-the-output-plugin-API-with-stream-methods.patch]
[v14-0004-Gracefully-handle-concurrent-aborts-of-uncommitt.patch]
[v14-0005-Implement-streaming-mode-in-ReorderBuffer.patch]
[v14-0006-Add-support-for-streaming-to-built-in-replicatio.patch]
[v14-0007-Track-statistics-for-streaming.patch]
[v14-0008-Enable-streaming-for-all-subscription-TAP-tests.patch]
[v14-0009-Add-TAP-test-for-streaming-vs.-DDL.patch]
[v14-0010-Bugfix-handling-of-incomplete-toast-tuple.patch]
[bugfix_in_schema_sent.patch]


   (by the way: this build's regression tests  'ddl', 'toast', and 
'spill' fail)


I seem now able to run all my test programs on these instances without 
errors.


Sorry, I seem to have raised a false alarm (although there was initially 
certainly a problem).



Erik Rijkers









relocating the server's backup manifest code

2020-04-18 Thread Robert Haas
Hi,

The backup manifest patch added a bunch of new code to
src/backend/replication/basebackup.c, and while lamenting the
complexity of that source file yesterday, I suddenly realized that I'd
unwittingly contributed to making that problem worse, and that it
would be quite easy to move the code added by that patch into a
separate file. Attached is a patch to do that.

Despite the fact that we are after feature freeze, I think it would be
a good idea to commit this to PG 13. It could be saved for PG 14, but
that will make future back-patching substantially harder. Also, a
patch that's just moving code is pretty low-risk.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v1-0001-Move-the-server-s-backup-manifest-code-to-a-separ.patch
Description: Binary data


Re: snapshot too old issues, first around wraparound and then more.

2020-04-18 Thread Robert Haas
On Fri, Apr 17, 2020 at 4:40 PM Thomas Munro  wrote:
> I understood that you'd forked a new thread to discuss one particular
> problem among the many that Andres nailed to the door, namely the xid
> map's failure to be monotonic, and here I was responding to other
> things from his list, namely the lack of defences against wrap-around
> and the lack of testing.  Apparently I misunderstood.

Oh, maybe I'm the one who misunderstood...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Poll: are people okay with function/operator table redesign?

2020-04-18 Thread Robert Haas
On Fri, Apr 17, 2020 at 6:30 PM David G. Johnston
 wrote:
> I feel like writing them as:
>
> + (date, integer) -> date
>
> makes more sense as they are mainly sorted on the operator symbol as opposed 
> to the left operand.

I thought about that, too, but I think the way Tom did it is better.
It's much more natural to see it using the syntax with which it will
actually be invoked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Implementation DISTINCT for window aggregate function: SUM

2020-04-18 Thread Eugen Konkov
Hello PostgreSQL-development,

Oracle has implementation:

select id, amount, sum(DISTINCT amount) over () as total
  from xx;


https://dbfiddle.uk/?rdbms=oracle_18=8eeb60183ec9576ddb4b2c9f2874d09f


Why is this not possible in PG?
https://dbfiddle.uk/?rdbms=postgres_12=97c05203af4c927ff9f206e164752767


Why do window-specific functions not allow DISTINCT to be used within the
function argument list?
Which problems exist?


-- 
Best regards,
Eugen Konkov





Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-18 Thread Ranier Vilela
On Fri, Apr 17, 2020 at 15:44, Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> On Fri, Apr 17, 2020 at 10:33 AM Amit Kapila 
> wrote:
>
>>
>> I see some differences in the output when _create_locale() is used vs.
>> when GetLocaleInfoEx() is used.  Forex.
>>
>
> Thanks for the thorough review.
>
>
>> The function IsoLocaleName() header comment says "Convert a Windows
>> setlocale() argument to a Unix-style one", so I was expecting above
>> cases which gives valid values with _create_locale() should also work
>> with GetLocaleInfoEx().  If it is fine for GetLocaleInfoEx() to return
>> an error for the above cases, then we need an explanation of the same
>> and probably add a few comments as well.  So, I am not sure if we can
>> conclude that GetLocaleInfoEx() is an alternative to _create_locale()
>> at this stage.
>>
>
> We can get a match for those locales in non-ISO format by enumerating
> available locales with EnumSystemLocalesEx(), and trying to find a match.
>
> Please find a new patch for so.
>
I have some observations about this patch, related to style, if you will
allow me.
1. The scope of the argv variable in enum_locales_fn can be reduced.
2. The declaration of len deviates from Postgres style.
3. Why call wcslen(test_locale) again, when the variable len already holds
the size?

+static BOOL CALLBACK
+enum_locales_fn(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
+{
+ WCHAR test_locale[LOCALE_NAME_MAX_LENGTH];
+
+ memset(test_locale, 0, sizeof(test_locale));
+ if (GetLocaleInfoEx(pStr, LOCALE_SENGLISHLANGUAGENAME,
+ test_locale, LOCALE_NAME_MAX_LENGTH) > 0)
+ {
+ size_t len;
+
+ wcscat(test_locale, L"_");
+ len = wcslen(test_locale);
+ if (GetLocaleInfoEx(pStr, LOCALE_SENGLISHCOUNTRYNAME,
+ test_locale + len, LOCALE_NAME_MAX_LENGTH - len) > 0)
+ {
+ WCHAR **argv = (WCHAR **) lparam;
+
+ if (wcsncmp(argv[0], test_locale, len) == 0)
+ {
+ wcscpy(argv[1], pStr);
+ return FALSE;
+ }
+ }
+ }
+ return TRUE;
+}

regards,
Ranier Vilela


Re: fixing old_snapshot_threshold's time->xid mapping

2020-04-18 Thread Dilip Kumar
On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro  wrote:
>
> On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro  wrote:
> > What about a contrib function that lets you clobber
> > oldSnapshotControl->current_timestamp?  It looks like all times in
> > this system come ultimately from GetSnapshotCurrentTimestamp(), which
> > uses that variable to make sure that time never goes backwards.
>
> Here's a draft TAP test that uses that technique successfully, as a
> POC.  It should probably be extended to cover more cases, but I
> thought I'd check what people thought of the concept first before
> going further.  I didn't see a way to do overlapping transactions with
> PostgresNode.pm, so I invented one (please excuse the bad perl); am I
> missing something?  Maybe it'd be better to do 002 with an isolation
> test instead, but I suppose 001 can't be in an isolation test, since
> it needs to connect to multiple databases, and it seemed better to do
> them both the same way.  It's also not entirely clear to me that
> isolation tests can expect a database to be fresh and then mess with
> dangerous internal state, whereas TAP tests set up and tear down a
> cluster each time.
>
> I think I found another bug in MaintainOldSnapshotTimeMapping(): if
> you make time jump by more than old_snapshot_threshold in one go, then
> the map gets cleared and then no early pruning or snapshot-too-old
> errors happen.  That's why in 002_too_old.pl it currently advances
> time by 10 minutes twice, instead of 20 minutes once.  To be
> continued.

IMHO that doesn't seem to be a problem, because even if we jump more
than old_snapshot_threshold in one go we don't clear the complete map,
right?  The latest snapshot timestamp will become the head timestamp.
So in TransactionIdLimitedForOldSnapshots, if (current_ts -
old_snapshot_threshold) is still >= head_timestamp then we can still do
early pruning.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-04-18 Thread Erik Rijkers

On 2020-04-18 11:07, Erik Rijkers wrote:

Hi Erik,

While setting up the cascading replication I have hit one issue on
base code[1].  After fixing that I have got one crash with streaming
on patch.  I am not sure whether you are facing any of these 2 issues
or any other issue.  If your issue is not any of these then please
share the callstack and steps to reproduce.


I figured out a few things about this. Attached is a bash script
test.sh, to reproduce:


And the attached file, test.sh.  (sorry)


There is a variable  CRASH_IT  that determines whether the whole thing
will fail (with a segmentation fault) or not.  As attached it has
CRASH_IT=0 and does not crash.  When you change that to CRASH_IT=1,
then it will crash.  It turns out that this just depends on a short
wait state (3 seconds, on my machine) between setting up the
replication, and the running of pgbench.  It's possible that on very
fast machines maybe it does not occur; we've had such difference
between hardware before. This is a i5-3330S.

It deletes files so look it over before you run it.  It may also
depend on some of my local set-up but I guess that should be easily
fixed.

Can you let me know if you can reproduce the problem with this?

thanks,

Erik Rijkers





[1]
https://www.postgresql.org/message-id/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w%40mail.gmail.com


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
#!/bin/bash
unset PGSERVICE PGSERVICEFILE PGDATA PGPORT PGDATABASE
# PGPASSFILE must be set and have the appropriate entries

env | grep -E '^PG'

  PROJECT=large_logical
# PROJECT=HEAD
NUM_INSTANCES=3

  BIN_DIR=$HOME/pg_stuff/pg_installations/pgsql.$PROJECT/bin
 POSTGRES=$BIN_DIR/postgres
   INITDB=$BIN_DIR/initdb
  TMP_DIR=$HOME'/tmp/'$PROJECT
   devel_file=${TMP_DIR}'/.devel'

  BASE_PORT=6015  #   ports 6015, 6016, 6017 
# BASE_PORT=6515
port1=$(( $BASE_PORT + 0 ))
port2=$(( $port1 + 1 ))
port3=$(( $port1 + 2 ))
scale=1  dbname=postgres  pubname=pub1  subname=sub1
if [[ ! -d $TMP_DIR ]]; then mkdir $TMP_DIR; fi
echo 's3kr1t' > $devel_file
  max_wal_senders=10  # publication side
max_replication_slots=10  # publication side and subscription side
 max_worker_processes=12  # subscription side
  max_logical_replication_workers=10  # subscription side
max_sync_workers_per_subscription=4   # subscription side
for n in `seq 1 $NUM_INSTANCES`; do
  port=$(( $BASE_PORT + $n -1 ))
data_dir=$TMP_DIR/pgsql.instance${n}/data
  server_dir=$TMP_DIR/pgsql.instance${n}
  $INITDB --pgdata=$data_dir --encoding=UTF8 --auth=scram-sha-256 --pwfile=$devel_file &> initdb.$port.log 
  rc=$?
  if [[ $rc -ne 0 ]]; then
echo "-- initdb $?"
  fi
 ( $POSTGRES -D $data_dir -p $port \
--wal_level=logical \
--max_replication_slots=$max_replication_slots \
--max_worker_processes=$max_worker_processes \
--max_logical_replication_workers=$max_logical_replication_workers \
--max_wal_senders=$max_wal_senders \
--max_sync_workers_per_subscription=$max_sync_workers_per_subscription \
--logging_collector=on \
--log_directory=${server_dir} \
--log_filename=logfile.${port} \
--log_replication_commands=on \
--autovacuum=off  & )
#   --logical_work_mem=128MB & )
#   pg_isready -d $dbname --timeout=60 -p $port
done

#sleep $NUM_INSTANCES
#sleep $NUM_INSTANCES

#pg_isready -d $dbname -qp 6015 --timeout=60
#pg_isready -d $dbname -qp 6016 --timeout=60
num_loop=$(( $NUM_INSTANCES - 1 ))

$BIN_DIR/pgbench --port=$BASE_PORT --quiet --initialize --scale=$scale $dbname

echo "alter table pgbench_history add column hid serial primary key" | $BIN_DIR/psql -d $dbname -p $BASE_PORT -X
#pg_isready -d $dbname -qp 6015 --timeout=60
#pg_isready -d $dbname -qp 6016 --timeout=60
for n in `seq 1 $num_loop`; do
  target_port=$(( $BASE_PORT + $n ))
  pg_dump -Fc -p $BASE_PORT \
--exclude-table-data=pgbench_history  --exclude-table-data=pgbench_accounts \
--exclude-table-data=pgbench_branches --exclude-table-data=pgbench_tellers \
-tpgbench_history -tpgbench_accounts \
-tpgbench_branches -tpgbench_tellers \
$dbname | pg_restore -1 -p $target_port -d $dbname
done

for n in `seq 1 $num_loop`; do
  pubport=$(( $BASE_PORT + $n - 1 ))
  subport=$(( $BASE_PORT + $n ))
  appname='casc:'${subport}'<'${pubport}
  echo "create publication  $pubname for all tables" | psql -d $dbname -p $pubport -X
  echo "create subscription $subname
connection 'port=${pubport} dbname=${dbname} application_name=${appname}'
publication $pubname with (enabled=false, slot_name=${subname}_${subport});" | psql -d $dbname -p $subport -X
  echo "alter subscription $subname enable; " | psql -d $dbname -p $subport -X
done

CRASH_IT=0  # 0: no crash;  1: crash

if [[ $CRASH_IT -ne 1 ]]
then
  echo "-- sleep 3" 
   sleep 3  #  this wait will avoid the crash
fi

echo " (doing a pgbench run for 3 seconds)"
echo "-- 

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-04-18 Thread Erik Rijkers

On 2020-04-16 11:46, Erik Rijkers wrote:

On 2020-04-16 11:33, Dilip Kumar wrote:

On Tue, Apr 14, 2020 at 9:14 PM Erik Rijkers  wrote:


On 2020-04-14 12:10, Dilip Kumar wrote:

> v14-0001-Immediately-WAL-log-assignments.patch +
> v14-0002-Issue-individual-invalidations-with.patch +
> v14-0003-Extend-the-output-plugin-API-with-stream-methods.patch+
> v14-0004-Gracefully-handle-concurrent-aborts-of-uncommitt.patch+
> v14-0005-Implement-streaming-mode-in-ReorderBuffer.patch   +
> v14-0006-Add-support-for-streaming-to-built-in-replicatio.patch+
> v14-0007-Track-statistics-for-streaming.patch  +
> v14-0008-Enable-streaming-for-all-subscription-TAP-tests.patch +
> v14-0009-Add-TAP-test-for-streaming-vs.-DDL.patch  +
> v14-0010-Bugfix-handling-of-incomplete-toast-tuple.patch

applied on top of 8128b0c (a few hours ago)




I've added your new patch

[bugfix_replica_identity_full_on_subscriber.patch]

on top of all those above but the crash (apparently the same crash)
that I had earlier still occurs (and pretty soon).

server process (PID 1721) was terminated by signal 11: Segmentation 
fault


I'll try to isolate it better and get a stacktrace



Hi Erik,

While setting up the cascading replication I have hit one issue on
base code[1].  After fixing that I have got one crash with streaming
on patch.  I am not sure whether you are facing any of these 2 issues
or any other issue.  If your issue is not any of these then please
share the callstack and steps to reproduce.


I figured out a few things about this. Attached is a bash script 
test.sh, to reproduce:


There is a variable  CRASH_IT  that determines whether the whole thing 
will fail (with a segmentation fault) or not.  As attached it has  
CRASH_IT=0 and does not crash.  When you change that to CRASH_IT=1, then 
it will crash.  It turns out that this just depends on a short wait 
state (3 seconds, on my machine) between setting up the replication, and 
the running of pgbench.  It's possible that on very fast machines maybe 
it does not occur; we've had such difference between hardware before. 
This is a i5-3330S.


It deletes files so look it over before you run it.  It may also depend 
on some of my local set-up but I guess that should be easily fixed.


Can you let me know if you can reproduce the problem with this?

thanks,

Erik Rijkers





[1]
https://www.postgresql.org/message-id/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w%40mail.gmail.com


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com





Re: 001_rep_changes.pl stalls

2020-04-18 Thread Noah Misch
On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > By the way, if the latch is consumed in WalSndLoop, a succeeding call to
> > WalSndWaitForWal cannot be woken up by the latch being set.  Doesn't that
> > cause missed wakeups? (in other words, an overlooked latch set).
> 
> - Since the only source other than timeout of walsender wakeup is latch,
> - we should avoid wasteful consuming of latch. (It is the same issue
> - with [1]).
> 
> + Since walsender is woken up by LSN advancement via latch, we should
> + avoid wasteful consuming of latch. (It is the same issue with [1]).
> 
> 
> > If the wakeup signal is not remembered in the walsender (like
> > InterruptPending), WalSndPhysical cannot enter a sleep with
> > confidence.

No; per latch.h, "What must be avoided is placing any checks for asynchronous
events after WaitLatch and before ResetLatch, as that creates a race
condition."  In other words, the thing to avoid is calling ResetLatch()
without next examining all pending work that a latch would signal.  Each
walsender.c WaitLatch call does follow the rules.

On Sat, Apr 18, 2020 at 12:29:58AM +0900, Fujii Masao wrote:
> On 2020/04/17 14:41, Noah Misch wrote:
> >1. Make XLogSendPhysical() more like XLogSendLogical(), calling
> >WalSndWaitForWal() when no WAL is available.  A quick version of this
> >passes tests, but I'll need to audit WalSndWaitForWal() for things that 
> > are
> >wrong for physical replication.
> 
> (1) makes even physical replication walsender sleep in two places and
> which seems to make the code for physical replication complicated
> more than necessary. I'd like to avoid (1) if possible.

Good point.

> >2. Make XLogSendLogical() more like XLogSendPhysical(), returning when
> >insufficient WAL is available.  This complicates the xlogreader.h API to
> >pass back "wait for this XLogRecPtr", and we'd then persist enough state 
> > to
> >resume decoding.  This doesn't have any advantages to make up for those.
> >
> >3. Don't avoid waiting in WalSndLoop(); instead, fix the stall by copying the
> >WalSndKeepalive() call from WalSndWaitForWal() to WalSndLoop().  This 
> > risks
> >further drift between the two wait sites; on the other hand, one could
> >refactor later to help avoid that.
> 
> Since the additional call of WalSndKeepalive() is necessary only for
> logical replication, it should be copied to, e.g., XLogSendLogical(),
> instead of WalSndLoop()? For example, when XLogSendLogical() sets
> WalSndCaughtUp to true, it should call WalSndKeepalive()?

We'd send a keepalive even when pq_flush_if_writable() can't empty the output
buffer.  That could be acceptable, but it's not ideal.

> The root problem seems to be that when a WAL record that's a no-op for
> logical rep is processed, the keepalive message is not sent immediately,
> even though we want pg_stat_replication to be updated promptly.

The degree of promptness should be predictable, at least.  If we removed the
WalSndKeepalive() from WalSndWaitForWal(), pg_stat_replication updates would
not be prompt, but they would be predictable.  I do, however, think prompt
updates are worthwhile.

> (3) seems to try to address this problem straightly and looks better to me.
> 
> >4. Keep the WalSndLoop() wait, but condition it on !logical.  This is the
> >minimal fix, but it crudely punches through the abstraction between
> >WalSndLoop() and its WalSndSendDataCallback.
> 
> (4) also looks good because it's simple, if we can redesign those
> functions in good shape.

Let's do that.  I'm attaching the replacement implementation and the revert of
v1.
Author: Noah Misch 
Commit: Noah Misch 

Revert "When WalSndCaughtUp, sleep only in WalSndWaitForWal()."

This reverts commit 421685812290406daea58b78dfab0346eb683bbb.  It caused
idle physical walsenders to busy-wait, as reported by Fujii Masao.

Discussion: https://postgr.es/m/20200417054146.ga1061...@rfd.leadboat.com

diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index fc475d1..122d884 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1428,10 +1428,8 @@ WalSndWaitForWal(XLogRecPtr loc)
/*
 * We only send regular messages to the client for full decoded
 * transactions, but a synchronous replication and walsender 
shutdown
-* possibly are waiting for a later location. So, before 
sleeping, we
-* send a ping containing the flush location. If the receiver is
-* otherwise idle, this keepalive will trigger a reply. 
Processing the
-* reply will update these MyWalSnd locations.
+* possibly are waiting for a later location. So we send pings
+* containing the flush location every now and then.
 */
   

Re: fixing old_snapshot_threshold's time->xid mapping

2020-04-18 Thread Thomas Munro
On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro  wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp?  It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a
POC.  It should probably be extended to cover more cases, but I
thought I'd check what people thought of the concept first before
going further.  I didn't see a way to do overlapping transactions with
PostgresNode.pm, so I invented one (please excuse the bad perl); am I
missing something?  Maybe it'd be better to do 002 with an isolation
test instead, but I suppose 001 can't be in an isolation test, since
it needs to connect to multiple databases, and it seemed better to do
them both the same way.  It's also not entirely clear to me that
isolation tests can expect a database to be fresh and then mess with
dangerous internal state, whereas TAP tests set up and tear down a
cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen.  That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once.  To be
continued.
From b78644a0f9580934b136ca8413366de91198203f Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v2 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +--
 src/include/utils/old_snapshot.h | 75 
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 1c063c592c..abaaea569a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h