Re: Replication slot stats misgivings

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada  wrote:
>
> Thank you for the update! The patch looks good to me.
>

I have pushed the first patch. Comments on the next patch
v13-0001-Use-HTAB-for-replication-slot-statistics:
1.
+ /*
+ * Check for all replication slots in stats hash table. We do this check
+ * when replSlotStats has more than max_replication_slots entries, i.e,
+ * when there are stats for the already-dropped slot, to avoid frequent
+ * call SearchNamedReplicationSlot() which acquires LWLock.
+ */
+ if (replSlotStats && hash_get_num_entries(replSlotStats) >
max_replication_slots)
+ {
+ PgStat_ReplSlotEntry *slotentry;
+
+ hash_seq_init(&hstat, replSlotStats);
+ while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+ {
+ if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+ pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+ }
+ }

Is SearchNamedReplicationSlot() used so frequently that we need to do
this only when the hash table has more entries than
max_replication_slots? I think it would be better if we could do it
without such a condition, to reduce the chances of missing the slot
stats. We don't have any such restriction for any other case in this
function.

I also think it would be better to add CHECK_FOR_INTERRUPTS() in the above while loop.
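For illustration, a minimal sketch of what I have in mind, with the condition
dropped and an interrupt check added (variable and function names follow the
quoted hunk, so treat them as assumptions about the patch):

    HASH_SEQ_STATUS hstat;
    PgStat_ReplSlotEntry *slotentry;

    /* check every slot stats entry against the live slots */
    hash_seq_init(&hstat, replSlotStats);
    while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
    {
        CHECK_FOR_INTERRUPTS();

        if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
            pgstat_report_replslot_drop(NameStr(slotentry->slotname));
    }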

2.
/*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry

I think the comment above this structure can be changed to "The
collector's data per slot" or something like that. Also, if we want to
follow the table/function/db style, then this structure should probably
be named PgStat_StatReplSlotEntry.

3.
- * create the statistics for the replication slot.
+ * create the statistics for the replication slot. In case where the
+ * message for dropping the old slot gets lost and a slot with the same is

/the same is/the same name is/.

Can we mention something similar to what you have added here in docs as well?

4.
+CREATE VIEW pg_stat_replication_slots AS
+SELECT
+s.slot_name,
+s.spill_txns,
+s.spill_count,
+s.spill_bytes,
+s.stream_txns,
+s.stream_count,
+s.stream_bytes,
+s.total_txns,
+s.total_bytes,
+s.stats_reset
+FROM pg_replication_slots as r,
+LATERAL pg_stat_get_replication_slot(slot_name) as s
+WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..

-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think that with the above changes, getting all the slot stats has become
much costlier. Is there any reason why we can't get all the stats from
the new hash table in one shot and return them to the user?
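Just to illustrate the shape I mean (this is essentially the pre-patch
definition, shown only as a sketch): a single set-returning call that walks
the hash table once, with the view built directly on top of it:

    CREATE VIEW pg_stat_replication_slots AS
        SELECT
            s.slot_name,
            s.spill_txns,
            s.spill_count,
            s.spill_bytes,
            s.stream_txns,
            s.stream_count,
            s.stream_bytes,
            s.total_txns,
            s.total_bytes,
            s.stats_reset
        FROM pg_stat_get_replication_slots() AS s;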

-- 
With Regards,
Amit Kapila.




Re: TRUNCATE on foreign table

2021-04-15 Thread Kyotaro Horiguchi
At Fri, 16 Apr 2021 11:54:16 +0900, Fujii Masao  
wrote in 
> On 2021/04/16 9:15, Bharath Rupireddy wrote:
> > On Thu, Apr 15, 2021 at 8:19 PM Fujii Masao 
> > wrote:
> >> On 2021/04/14 12:54, Bharath Rupireddy wrote:
> >>> IMHO, we can push all the TRUNCATE options (ONLY, RESTRICTED, CASCADE,
> >>> RESTART/CONTINUE IDENTITY), because it doesn't have any major
> >>> challenge(implementation wise) unlike pushing some clauses in
> >>> SELECT/UPDATE/DELETE and we already do this on the master. It doesn't
> >>> look good and may confuse users, if we push some options and restrict
> >>> others. We should have an explicit note in the documentation saying we
> >>> push all these options to the remote server. We can leave it to the
> >>> user to write TRUNCATE for foreign tables with the appropriate
> >>> options. If somebody complains about a problem that they will face
> >>> with this behavior, we can revisit.
> >>
> >> That's one of the options. But I'm afraid it's hard to drop (revisit)
> >> the feature once it has been released. So if there is no explicit
> >> use case for that, basically I'd like to drop that before release
> >> like we agree to drop unused TRUNCATE_REL_CONTEXT_CASCADING.
> > Thanks. Looks like the decision is going in the direction of
> > restricting those options, I will withdraw my point.
> 
> We are still discussing whether RESTRICT option should be pushed down to
> a foreign data wrapper. But ISTM at least we could reach the consensus about
> the drop of extra information for each foreign table. So what about applying
> the attached patch and remove the extra information at first?

I'm fine with that direction. Thanks for the patch.

The change is straight-forward and looks fine, except the following
part.

 contrib/postgres_fdw/sql/postgres_fdw.sql: 2436 -- after patching
2436> -- in case when remote table has inherited children
2437> CREATE TABLE tru_rtable0_child () INHERITS (tru_rtable0);
2438> INSERT INTO tru_rtable0 (SELECT x FROM generate_series(5,9) x);
2439> INSERT INTO tru_rtable0_child (SELECT x FROM generate_series(10,14) x);
2440> SELECT sum(id) FROM tru_ftable;   -- 95
2441>
2442> TRUNCATE ONLY tru_ftable; -- truncate both parent and child
2443> SELECT count(*) FROM tru_ftable;   -- 0
2444>
2445> INSERT INTO tru_rtable0 (SELECT x FROM generate_series(21,25) x);
2446> SELECT sum(id) FROM tru_ftable;   -- 115
2447> TRUNCATE tru_ftable;  -- truncate both of parent and 
child
2448> SELECT count(*) FROM tru_ftable;-- 0

Lines 2445-2448 don't work as described, since line 2445 inserts tuples only
into the parent.

And there's a slight difference, for no reason, between the comments at
lines 2442 and 2447.

(The attached is a fix on top of the proposed patch.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1a3f5cb4ad..d32f291089 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8388,7 +8388,7 @@ SELECT sum(id) FROM tru_ftable;   -- 95
   95
 (1 row)
 
-TRUNCATE ONLY tru_ftable;  -- truncate both parent and child
+TRUNCATE ONLY tru_ftable;  -- truncate both of parent and child
 SELECT count(*) FROM tru_ftable;   -- 0
  count 
 ---
@@ -8396,10 +8396,11 @@ SELECT count(*) FROM tru_ftable;   -- 0
 (1 row)
 
 INSERT INTO tru_rtable0 (SELECT x FROM generate_series(21,25) x);
-SELECT sum(id) FROM tru_ftable;-- 115
+INSERT INTO tru_rtable0_child (SELECT x FROM generate_series(26,30) x);
+SELECT sum(id) FROM tru_ftable;-- 255
  sum 
 -
- 115
+ 255
 (1 row)
 
 TRUNCATE tru_ftable;   -- truncate both of parent and child
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 97c156a472..65643e120d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2439,11 +2439,12 @@ INSERT INTO tru_rtable0 (SELECT x FROM 
generate_series(5,9) x);
 INSERT INTO tru_rtable0_child (SELECT x FROM generate_series(10,14) x);
 SELECT sum(id) FROM tru_ftable;   -- 95
 
-TRUNCATE ONLY tru_ftable;  -- truncate both parent and child
+TRUNCATE ONLY tru_ftable;  -- truncate both of parent and child
 SELECT count(*) FROM tru_ftable;   -- 0
 
 INSERT INTO tru_rtable0 (SELECT x FROM generate_series(21,25) x);
-SELECT sum(id) FROM tru_ftable;-- 115
+INSERT INTO tru_rtable0_child (SELECT x FROM generate_series(26,30) x);
+SELECT sum(id) FROM tru_ftable;-- 255
 TRUNCATE tru_ftable;   -- truncate both of parent and child
 SELECT count(*) FROM tru_ftable;-- 0
 


Re: Retry in pgbench

2021-04-15 Thread Fabien COELHO



>> It would be useful to test replicating clusters with a (switch|fail)over
>> procedure.
>
> Interesting idea but in general a failover takes some time (like a few
> minutes), and it will strongly affect TPS. I think in the end it just
> compares the failover time.
>
> Or are you suggesting to ignore the time spent in failover?

Or simply to be able to measure it from a client perspective? How much
delay is introduced, how long does it take to get back to the previous
tps level…

My recollection of Marina's patch is that it was non-trivial; adding such a
new and interesting feature suggests a set of patches, not just one patch.


--
Fabien.

Re: Replication slot stats misgivings

2021-04-15 Thread vignesh C
On Thu, Apr 15, 2021 at 2:46 PM Amit Kapila  wrote:
>
> On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada  wrote:
> >
> > On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  wrote:
> > >
> > Thank you for updating the patch.
> >
> > I have one question on the doc change:
> >
> > +so the counter is not incremented for subtransactions. Note that 
> > this
> > +includes the transactions streamed and or spilled.
> > +   
> >
> > The patch uses the sentence "streamed and or spilled" in two places.
> > You meant “streamed and spilled”? Even if it actually means “and or”,
> > using "and or” (i.g., connecting “and” to “or” by a space) is general?
> > I could not find we use it other places in the doc but found we're
> > using "and/or" instead.
> >
>
> I changed it to 'and/or' and made another minor change.

I have rebased the remaining patches on top of head. Attached the
patches for the same.
Thoughts?

Regards,
Vignesh
From e66015cc7ce9c67280f6779bca56a257652a4342 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Fri, 16 Apr 2021 08:33:40 +0530
Subject: [PATCH v13 1/2] Use HTAB for replication slot statistics.

Previously, we used an array to store stats for replication slots.
But this had two problems in the case where a drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded, and 2) we
could write beyond the end of the array when, after a restart, the
number of slots whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to using an HTAB for replication slot statistics,
resolving both problems. Instead, pgstat_vacuum_stat() now checks
whether a slot still exists for each stats entry in the stats
collector, and sends a drop-slot-stats message if not.
---
 src/backend/catalog/system_views.sql  |  30 +--
 src/backend/postmaster/pgstat.c   | 268 +++---
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c|  23 +-
 src/backend/utils/adt/pgstatfuncs.c   | 135 ++-
 src/include/catalog/pg_proc.dat   |  14 +-
 src/include/pgstat.h  |   8 +-
 src/include/replication/slot.h|   2 +-
 src/test/regress/expected/rules.out   |   4 +-
 9 files changed, 258 insertions(+), 228 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6d78b33590..fba45473c1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
 JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-SELECT
-s.slot_name,
-s.spill_txns,
-s.spill_count,
-s.spill_bytes,
-s.stream_txns,
-s.stream_count,
-s.stream_bytes,
-s.total_txns,
-s.total_bytes,
-s.stats_reset
-FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
 SELECT
 s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
 FROM pg_get_replication_slots() AS L
 LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+SELECT
+s.slot_name,
+s.spill_txns,
+s.spill_count,
+s.spill_bytes,
+s.stream_txns,
+s.stream_count,
+s.stream_bytes,
+s.total_txns,
+s.total_bytes,
+s.stats_reset
+FROM pg_replication_slots as r,
+LATERAL pg_stat_get_replication_slot(slot_name) as s
+WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
 SELECT
 D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..9ded0a75e9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* --
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void 

Re: Replication slot stats misgivings

2021-04-15 Thread Justin Pryzby
On Fri, Apr 16, 2021 at 08:48:29AM +0530, Amit Kapila wrote:
> I am fine with your proposed changes. There are one or two more
> patches in this area. I can include your suggestions along with those
> if you don't mind?

However's convenient is fine 

-- 
Justin




Re: Replication slot stats misgivings

2021-04-15 Thread Amit Kapila
On Fri, Apr 16, 2021 at 8:22 AM Justin Pryzby  wrote:
>
> On Thu, Apr 15, 2021 at 02:46:35PM +0530, Amit Kapila wrote:
> > On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  
> > > wrote:
> > > >
> > > Thank you for updating the patch.
> > >
> > > I have one question on the doc change:
> > >
> > > +so the counter is not incremented for subtransactions. Note that 
> > > this
> > > +includes the transactions streamed and or spilled.
> > > +   
> > >
> > > The patch uses the sentence "streamed and or spilled" in two places.
> > > You meant “streamed and spilled”? Even if it actually means “and or”,
> > > using "and or” (i.g., connecting “and” to “or” by a space) is general?
> > > I could not find we use it other places in the doc but found we're
> > > using "and/or" instead.
> > >
> >
> > I changed it to 'and/or' and made another minor change.
>
> I'm suggesting some doc changes.  If these are fine, I'll include in my next
> round of doc review, in case you don't want to make another commit just for
> that.
>

I am fine with your proposed changes. There are one or two more
patches in this area. I can include your suggestions along with those
if you don't mind?

-- 
With Regards,
Amit Kapila.




Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-15 Thread Kyotaro Horiguchi
At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada  
wrote in 
> .
> 
> On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada  wrote:
> >
> > Hi,
> >
> > While discussing freezing tuples during CTAS[1], we found that
> > heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
> > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
> > it took 12 sec whereas the code without the patch took 10 sec with the
> > following query:
> >
> > create table t1 (a, b, c, d) as select i,i,i,i from
> > generate_series(1,2000) i;
> >
> > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> > following queries:
> >
> > create table source as select generate_series(1, 5000);
> > create materialized view mv as select * from source;
> > refresh materialized view mv;
> >
> > The execution time of REFRESH MATERIALIZED VIEW are:
> >
> > w/ HEAP_INSERT_FROZEN flag : 42 sec
> > w/o HEAP_INSERT_FROZEN flag : 33 sec
> >
> > After investigation, I found that such performance degradation happens
> > on only HEAD code. It seems to me that commit 39b66a91b (and
> > 7db0cd2145) is relevant that has heap_insert() set VM bits and
> > PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas
> > Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the
> > page when inserting a tuple for the first time on the page (around
> > L2133 in heapam.c), every subsequent heap_insert() on the page reads
> > and pins a VM buffer (see RelationGetBufferForTuple()).
> 
> IIUC RelationGetBufferForTuple() pins vm buffer if the page is
> all-visible since the caller might clear vm bit during operation. But
> it's not necessarily true in HEAP_FROZEN_INSERT case. When inserting
> HEAP_FROZEN_INSERT, we might set PD_ALL_VISIBLE flag and all-visible
> bit but never clear those flag and bit during insertion. Therefore to
> fix this issue, I think we can have RelationGetBufferForTuple() not to
> pin vm buffer if we're inserting a frozen tuple (i.g.,
> HEAP_FROZEN_INSERT case) and the target page is already all-visible.

It seems right to me.

> In HEAP_FROZEN_INSERT, the cases where we need to pin vm buffer would
> be the table is empty. That way, we will pin vm buffer only for the
> first time of inserting frozen tuple into the empty page, then set
> PD_ALL_VISIBLE to the page and all-frozen bit on vm. Also set
> XLH_INSERT_ALL_FROZEN_SET to WAL. At further insertions, we would not
> pin vm buffer as long as we’re inserting a frozen tuple into the same
> page.
> 
> If the target page is neither empty nor all-visible we will not pin vm
> buffer, which is fine because if the page has non-frozen tuple we
> cannot set bit on vm during heap_insert(). If all tuples on the page
> are already frozen but PD_ALL_VISIBLE is not set for some reason, we
> would be able to set all-frozen bit on vm but it seems not a good idea
> since it requires checking during insertion if all existing tuples are
> frozen or not.
> 
> The attached patch does the above idea. With this patch, the same
> performance tests took 33 sec.

Great! The direction of the patch looks fine to me.

+* If we're inserting frozen entry into empty page, we will set
+* all-visible to page and all-frozen on visibility map.
+*/
+   if (PageGetMaxOffsetNumber(page) == 0)
all_frozen_set = true;

AFAICS the page is always empty when RelationGetBufferForTuple
returned a valid vmbuffer.  So the "if" should be an "Assert" instead.
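To make that concrete, a minimal sketch using the patch's variable names
(assuming a valid vmbuffer really does imply an empty page here):

    /* the page must be empty if we got here with a valid vmbuffer */
    Assert(PageGetMaxOffsetNumber(page) == 0);
    all_frozen_set = true;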

And, the patch changes the value of all_frozen_set to false when the
page was already all-frozen (thus not empty). It would be fine since
we don't need to change the visibility of the page in that case but
the variable name is no longer correct.  set_all_visible or such?

> Also, I've measured the number of page read during REFRESH
> MATERIALIZED VIEW using by pg_stat_statements. There were big
> different on shared_blks_hit on pg_stat_statements:
> 
> 1. w/ HEAP_INSERT_FROZEN flag (HEAD) : 50221781
> 2. w/o HEAP_INSERT_FROZEN flag (HEAD) : 221782
> 3. Patched: 443014
> 
> Since the 'source' table has 5000 and each heap_insert() read vm
> buffer, test 1 read pages as many as the number of insertion tuples.
> The value of test 3 is about twice as much as the one of test 2. This
> is because heap_insert() read the vm buffer for each first insertion
> to the page. The table has 221239 blocks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Forget close an open relation in ReorderBufferProcessTXN()

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 10:56 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Thu, Apr 15, 2021 at 4:00 PM Japin Li  wrote:
> >>
> >> The RelationIdGetRelation() comment says:
> >>
> > Caller should eventually decrement count. (Usually,
> > that happens by calling RelationClose().)
> >>
> >> However, it doesn't do it in ReorderBufferProcessTXN().
> >> I think we should close it, here is a patch that fixes it. Thoughts?
> >>
>
> > +1. Your fix looks correct to me but can we test it in some way?
>
> I think this code has a bigger problem: it should not be using
> RelationIdGetRelation and RelationClose directly.  99.44% of
> the backend goes through relation_open or one of the other
> relation.c wrappers, so why doesn't this?
>

I think it is because relation_open expects either caller to have a
lock on the relation or don't use 'NoLock' lockmode. AFAIU, we don't
need to acquire a lock on relation while decoding changes from WAL
because it uses a historic snapshot to build a relcache entry and all
the later changes to the rel are absorbed while decoding WAL.

I think it is also important to *not* acquire any lock on the relation,
otherwise it can lead to some sort of deadlock or infinite wait in the
decoding process. Consider an operation like Truncate (or the user
acquiring an exclusive lock on the relation in some other way, say via
the LOCK command), which takes an exclusive lock on the relation: it
won't get replicated in synchronous mode (when
synchronous_standby_names is configured). The truncate operation will
wait for the transaction to be replicated to the subscriber, and the
decoding process will wait for the Truncate operation to finish.

> Possibly the answer is "it copied the equally misguided code
> in pgoutput.c".
>

I think it is following what is done during decoding; otherwise, it
will lead to the problems described above. We are already
discussing a similar problem [1] where pgoutput
unintentionally acquired a lock on the index and led to a sort of
deadlock.

If the above understanding is correct, I think we might want to
improve comments in this area.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-15 Thread Julien Rouhaud
On Thu, Apr 15, 2021 at 10:06:24AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote:
> >> (To be clear: 0002 passes check-world as-is, while 0001 is not
> >> committable without some regression-test fiddling.)
> 
> > I'm probably missing something obvious but both 0001 and 0002 pass 
> > check-world
> > for me, on a glibc box and --with-icu.
> 
> 0001 fails for me :-(.  I think that requires default collation to be C.

Oh right, adding --no-locale to the regress opts I see that create_index is
failing, and that's not the one I was expecting.

We could change the create_index test to create c2 with a C collation, in order
to test that we don't track dependencies on unversioned locales, and add an extra
test in collate.linux.utf8 to check that we do track a dependency on the
default collation, as that test isn't run in the --no-locale case.  The only
case not tested would be an unversioned default collation, but I'm not sure where
to properly test that.  Maybe a short leading test in collate.linux.utf8 that
would be run on Linux in that case (when getdatabaseencoding() != 'UTF8')?  It
would require an extra alternate file, but it wouldn't cause too much of a
maintenance problem as there should be only one test.
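A rough sketch of what those two tests could look like (table and index names
here are hypothetical, not the actual regression-test objects):

    -- create_index (also run with --no-locale): an unversioned collation,
    -- so no collation-version dependency should be recorded
    CREATE TABLE collate_dep_c (c2 text COLLATE "C");
    CREATE INDEX collate_dep_c_idx ON collate_dep_c (c2);

    -- collate.linux.utf8 (never run with --no-locale): the default collation,
    -- so a versioned dependency on it should be recorded
    CREATE TABLE collate_dep_default (c2 text);
    CREATE INDEX collate_dep_default_idx ON collate_dep_default (c2);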




Re: TRUNCATE on foreign table

2021-04-15 Thread Fujii Masao



On 2021/04/16 9:15, Bharath Rupireddy wrote:
> On Thu, Apr 15, 2021 at 8:19 PM Fujii Masao  wrote:
>> On 2021/04/14 12:54, Bharath Rupireddy wrote:
>>> IMHO, we can push all the TRUNCATE options (ONLY, RESTRICTED, CASCADE,
>>> RESTART/CONTINUE IDENTITY), because it doesn't have any major
>>> challenge(implementation wise) unlike pushing some clauses in
>>> SELECT/UPDATE/DELETE and we already do this on the master. It doesn't
>>> look good and may confuse users, if we push some options and restrict
>>> others. We should have an explicit note in the documentation saying we
>>> push all these options to the remote server. We can leave it to the
>>> user to write TRUNCATE for foreign tables with the appropriate
>>> options. If somebody complains about a problem that they will face
>>> with this behavior, we can revisit.
>>
>> That's one of the options. But I'm afraid it's hard to drop (revisit)
>> the feature once it has been released. So if there is no explicit
>> use case for that, basically I'd like to drop that before release
>> like we agree to drop unused TRUNCATE_REL_CONTEXT_CASCADING.
>
> Thanks. Looks like the decision is going in the direction of
> restricting those options, I will withdraw my point.

We are still discussing whether RESTRICT option should be pushed down to
a foreign data wrapper. But ISTM at least we could reach the consensus about
the drop of extra information for each foreign table. So what about applying
the attached patch and remove the extra information at first?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bdc4c3620d..7a798530e3 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2179,24 +2179,19 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List 
**retrieved_attrs)
 void
 deparseTruncateSql(StringInfo buf,
   List *rels,
-  List *rels_extra,
   DropBehavior behavior,
   bool restart_seqs)
 {
-   ListCell   *lc1,
-  *lc2;
+   ListCell   *cell;
 
appendStringInfoString(buf, "TRUNCATE ");
 
-   forboth(lc1, rels, lc2, rels_extra)
+   foreach(cell, rels)
{
-   Relationrel = lfirst(lc1);
-   int extra = lfirst_int(lc2);
+   Relationrel = lfirst(cell);
 
-   if (lc1 != list_head(rels))
+   if (cell != list_head(rels))
appendStringInfoString(buf, ", ");
-   if (extra & TRUNCATE_REL_CONTEXT_ONLY)
-   appendStringInfoString(buf, "ONLY ");
 
deparseRelation(buf, rel);
}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 5070d93239..1a3f5cb4ad 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8218,13 +8218,13 @@ drop table loc3;
 -- test for TRUNCATE
 -- ===
 CREATE TABLE tru_rtable0 (id int primary key);
-CREATE TABLE tru_rtable1 (id int primary key);
 CREATE FOREIGN TABLE tru_ftable (id int)
SERVER loopback OPTIONS (table_name 'tru_rtable0');
 INSERT INTO tru_rtable0 (SELECT x FROM generate_series(1,10) x);
 CREATE TABLE tru_ptable (id int) PARTITION BY HASH(id);
 CREATE TABLE tru_ptable__p0 PARTITION OF tru_ptable
 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
+CREATE TABLE tru_rtable1 (id int primary key);
 CREATE FOREIGN TABLE tru_ftable__p1 PARTITION OF tru_ptable
 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
SERVER loopback OPTIONS (table_name 'tru_rtable1');
@@ -8388,22 +8388,22 @@ SELECT sum(id) FROM tru_ftable;   -- 95
   95
 (1 row)
 
-TRUNCATE ONLY tru_ftable;  -- truncate only parent portion
-SELECT sum(id) FROM tru_ftable;   -- 60
- sum 
--
-  60
+TRUNCATE ONLY tru_ftable;  -- truncate both parent and child
+SELECT count(*) FROM tru_ftable;   -- 0
+ count 
+---
+ 0
 (1 row)
 
 INSERT INTO tru_rtable0 (SELECT x FROM generate_series(21,25) x);
-SELECT sum(id) FROM tru_ftable;-- 175
+SELECT sum(id) FROM tru_ftable;-- 115
  sum 
 -
- 175
+ 115
 (1 row)
 
 TRUNCATE tru_ftable;   -- truncate both of parent and child
-SELECT count(*) FROM tru_ftable;-- empty
+SELECT count(*) FROM tru_ftable;-- 0
  count 
 ---
  0
diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index c590f374c6..c521cba3fc 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -401,7 +401,6 @@ static void postgresExplainForeignModify(ModifyTableState 
*mtstate,
 static void 

Re: Replication slot stats misgivings

2021-04-15 Thread Justin Pryzby
On Thu, Apr 15, 2021 at 02:46:35PM +0530, Amit Kapila wrote:
> On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada  wrote:
> >
> > On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  wrote:
> > >
> > Thank you for updating the patch.
> >
> > I have one question on the doc change:
> >
> > +so the counter is not incremented for subtransactions. Note that 
> > this
> > +includes the transactions streamed and or spilled.
> > +   
> >
> > The patch uses the sentence "streamed and or spilled" in two places.
> > You meant “streamed and spilled”? Even if it actually means “and or”,
> > using "and or” (i.g., connecting “and” to “or” by a space) is general?
> > I could not find we use it other places in the doc but found we're
> > using "and/or" instead.
> >
> 
> I changed it to 'and/or' and made another minor change.

I'm suggesting some doc changes.  If these are fine, I'll include in my next
round of doc review, in case you don't want to make another commit just for
that.

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d90eb0f21..18c5bba254 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2720,9 +2720,9 @@ SELECT pid, wait_event_type, wait_event FROM 
pg_stat_activity WHERE wait_event i


 Number of decoded transactions sent to the decoding output plugin for
-this slot. This counter is used to maintain the top level transactions,
-so the counter is not incremented for subtransactions. Note that this
-includes the transactions that are streamed and/or spilled.
+this slot. This counts top-level transactions only,
+and is not incremented for subtransactions. Note that this
+includes transactions that are streamed and/or spilled.

  
 
@@ -2731,10 +2731,10 @@ SELECT pid, wait_event_type, wait_event FROM 
pg_stat_activity WHERE wait_event i
 total_bytesbigint


-Amount of decoded transactions data sent to the decoding output plugin
+Amount of decoded transaction data sent to the decoding output plugin
 while decoding the changes from WAL for this slot. This can be used to
 gauge the total amount of data sent during logical decoding. Note that
-this includes the data that is streamed and/or spilled.
+this includes data that is streamed and/or spilled.

   
  
-- 
2.17.0





Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-15 Thread Zhihong Yu
On Thu, Apr 15, 2021 at 6:27 PM Tomas Vondra 
wrote:

>
>
> On 4/15/21 7:35 PM, James Coleman wrote:
> > On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming  wrote:
> >>
> >> On 15-04-2021 04:01, James Coleman wrote:
> >>> On Wed, Apr 14, 2021 at 5:42 PM James Coleman 
> wrote:
> 
>  On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
>   wrote:
> >
> > On 4/12/21 2:24 PM, Luc Vlaming wrote:
> >> Hi,
> >>
> >> When trying to run on master (but afaik also PG-13) TPC-DS queries
> 94,
> >> 95 and 96 on a SF10 I get the error "could not find pathkey item to
> sort".
> >> When I disable enable_gathermerge the problem goes away and then the
> >> plan for query 94 looks like below. I tried figuring out what the
> >> problem is but to be honest I would need some pointers as the code
> that
> >> tries to matching equivalence members in prepare_sort_from_pathkeys
> is
> >> something i'm really not familiar with.
> >>
> >
> > Could be related to incremental sort, which allowed some gather merge
> > paths that were impossible before. We had a couple issues related to
> > that fixed in November, IIRC.
> >
> >> To reproduce you can either ingest and test using the toolkit I
> used too
> >> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> >> alternatively just use the schema (see
> >>
> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native
> )
> >>
> >
> > Thanks, I'll see if I can reproduce that with your schema.
> >
> >
> > regards
> >
> > --
> > Tomas Vondra
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
> 
>  The query in question is:
> 
>  select  count(*)
>   from store_sales
>   ,household_demographics
>   ,time_dim, store
>   where ss_sold_time_sk = time_dim.t_time_sk
>   and ss_hdemo_sk = household_demographics.hd_demo_sk
>   and ss_store_sk = s_store_sk
>   and time_dim.t_hour = 15
>   and time_dim.t_minute >= 30
>   and household_demographics.hd_dep_count = 7
>   and store.s_store_name = 'ese'
>   order by count(*)
>   limit 100;
> 
>   From debugging output it looks like this is the plan being chosen
>  (cheapest total path):
>   Gather(store_sales household_demographics time_dim)
> rows=60626
>  cost=3145.73..699910.15
>   HashJoin(store_sales household_demographics time_dim)
>  rows=25261 cost=2145.73..692847.55
> clauses: store_sales.ss_hdemo_sk =
>  household_demographics.hd_demo_sk
>   HashJoin(store_sales time_dim) rows=252609
>  cost=1989.73..692028.08
> clauses: store_sales.ss_sold_time_sk =
>  time_dim.t_time_sk
>   SeqScan(store_sales) rows=11998564
>  cost=0.00..658540.64
>   SeqScan(time_dim) rows=1070
>  cost=0.00..1976.35
>   SeqScan(household_demographics) rows=720
>  cost=0.00..147.00
> 
>  prepare_sort_from_pathkeys fails to find a pathkey because
>  tlist_member_ignore_relabel returns null -- which seemed weird because
>  the sortexpr is an Aggref (in a single member equivalence class) and
>  the tlist contains a single member that's also an Aggref. It turns out
>  that the only difference between the two Aggrefs is that the tlist
>  entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
>  aggsplit = AGGSPLIT_SIMPLE.
> 
>  That's as far as I've gotten so far, but I figured I'd get that info
>  out to see if it means anything obvious to anyone else.
> >>>
> >>> This really goes back to [1] where we fixed a similar issue by making
> >>> find_em_expr_usable_for_sorting_rel parallel the rules in
> >>> prepare_sort_from_pathkeys.
> >>>
> >>> Most of those conditions got copied, and the case we were trying to
> >>> handle is the fact that prepare_sort_from_pathkeys can generate a
> >>> target list entry under those conditions if one doesn't exist. However
> >>> there's a further restriction there I don't remember looking at: it
> >>> uses pull_var_clause and tlist_member_ignore_relabel to ensure that
> >>> all of the vars that feed into the sort expression are found in the
> >>> target list. As I understand it, that is: it will build a target list
> >>> entry for something like "md5(column)" if "column" (and that was one
> >>> of our test cases for the previous fix) is in the target list already.
> >>>
> >>> But there's an additional detail here: the call to pull_var_clause
> >>> requests aggregates, window functions, and placeholders be treated as
> >>> vars. That means for 

Re: Converting built-in SQL functions to new style

2021-04-15 Thread Justin Pryzby
On Thu, Apr 15, 2021 at 07:25:39PM -0400, Tom Lane wrote:
> One thing I was wondering about, but did not pull the trigger on
> here, is whether to split off the function-related stuff in
> system_views.sql into a new file "system_functions.sql"

+1

-- 
Justin




Re: Truncate in synchronous logical replication failed

2021-04-15 Thread Japin Li


On Thu, 15 Apr 2021 at 19:25, Amit Kapila  wrote:
> On Thu, Apr 15, 2021 at 4:30 PM Japin Li  wrote:
>>
>>
>> On Thu, 15 Apr 2021 at 18:22, Amit Kapila  wrote:
>> > On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila  
>> > wrote:
>> >>
>> >> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
>> >>  wrote:
>> >> >
>> >> > > On 12 Apr 2021, at 08:58, Amit Kapila  wrote:
>> >> > >
>> >> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
>> >> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
>> >> > > information which locks the required indexes. Now, because TRUNCATE
>> >> > > has already acquired an exclusive lock on the index, it seems to
>> >> > > create a sort of deadlock where the actual Truncate operation waits
>> >> > > for logical replication of operation to complete and logical
>> >> > > replication waits for actual Truncate operation to finish.
>> >> > >
>> >> > > Do we really need to use RelationGetIndexAttrBitmap() to build
>> >> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
>> >> > > relation, we just scan the system table and build that information
>> >> > > using a historic snapshot. Can't we do something similar here?
>> >> > >
>> >> > > Adding Petr J. and Peter E. to know their views as this seems to be an
>> >> > > old problem (since the decoding of Truncate operation is introduced).
>> >> >
>> >> > We used RelationGetIndexAttrBitmap because it already existed, no other 
>> >> > reason.
>> >> >
>> >>
>> >> Fair enough. But I think we should do something about it because using
>> >> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
>> >> logical replication. I think this is broken since the logical
>> >> replication of Truncate is supported.
>> >>
>> >> > I am not sure what exact locking we need but I would have guessed at 
>> >> > least AccessShareLock would be needed.
>> >> >
>> >>
>> >> Are you telling that we need AccessShareLock on the index? If so, why
>> >> is it different from how we access the relation during decoding
>> >> (basically in ReorderBufferProcessTXN, we directly use
>> >> RelationIdGetRelation() without any lock on the relation)? I think we
>> >> do it that way because we need it to process WAL entries and we need
>> >> the relation state based on the historic snapshot, so, even if the
>> >> relation is later changed/dropped, we are fine with the old state we
>> >> got. Isn't the same thing applies here in logicalrep_write_attrs? If
>> >> that is true then some equivalent of RelationGetIndexAttrBitmap where
>> >> we use RelationIdGetRelation instead of index_open should work?
>> >>
>> >
>> > Today, again I have thought about this and don't see a problem with
>> > the above idea. If the above understanding is correct, then I think
>> > for our purpose in pgoutput, we just need to call RelationGetIndexList
>> > and then build the idattr list for relation->rd_replidindex.
>>
>> Sorry, I don't know how we can build the idattr without opening the index.
>> If we should open the index, then we should use NoLock, since the TRUNCATE
>> side holds AccessExclusiveLock. As Osumi points out in [1], the NoLock mode
>> assumes that the appropriate lock on the index is already taken.
>>
>
> Why can't we use RelationIdGetRelation() by passing the required
> indexOid to it?

Thanks for the reminder.  That might be a way to solve this problem.
I'll try it later.
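In case it helps, here is a rough, untested sketch of the idea (the field and
function names are from my reading of HEAD, so treat the details as
assumptions):

    Bitmapset  *idattrs = NULL;
    Relation    idxrel;
    Oid         replidindex;
    int         i;

    /* RelationGetIndexList() also fills in rd_replidindex */
    (void) RelationGetIndexList(relation);
    replidindex = relation->rd_replidindex;

    if (OidIsValid(replidindex))
    {
        /* open the index without taking any lock, as during decoding */
        idxrel = RelationIdGetRelation(replidindex);

        for (i = 0; i < IndexRelationGetNumberOfKeyAttributes(idxrel); i++)
        {
            int         attnum = idxrel->rd_index->indkey.values[i];

            /* expression index columns have attnum 0; skip them */
            if (attnum != 0)
                idattrs = bms_add_member(idattrs,
                                         attnum - FirstLowInvalidHeapAttributeNumber);
        }

        RelationClose(idxrel);
    }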

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Converting contrib SQL functions to new style

2021-04-15 Thread Noah Misch
On Wed, Apr 14, 2021 at 02:03:56PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Wed, Apr 14, 2021 at 1:41 PM Tom Lane  wrote:
> >> Could we hack things so that extension scripts are only allowed to
> >> reference objects created (a) by the system, (b) earlier in the
> >> same script, or (c) owned by one of the declared prerequisite
> >> extensions?  Seems like that might provide a pretty bulletproof
> >> defense against trojan-horse objects, though I'm not sure how much
> >> of a pain it'd be to implement.

Good idea.

> > That doesn't seem like a crazy idea, but the previous idea of having
> > some magic syntax that means "the schema where extension FOO is" seems
> > like it might be easier to implement and more generally useful.
> 
> I think that's definitely useful, but it's not a fix for the
> reference-capture problem unless you care to assume that the other
> extension's schema is free of trojan-horse objects.

I could see using that, perhaps in a non-SQL-language function.  I agree it
solves different problems.




Re: Retry in pgbench

2021-04-15 Thread Tatsuo Ishii
> By the way, I've been playing with the idea of failing gracefully and retry
> indefinitely (or until given -T) on SQL error AND connection issue.
> 
> It would be useful to test replicating clusters with a (switch|fail)over
> procedure.

Interesting idea, but in general a failover takes some time (like a few
minutes), and it will strongly affect TPS. I think in the end it just
compares the failover time.

Or are you suggesting to ignore the time spent in failover?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-15 Thread Tomas Vondra


On 4/15/21 7:35 PM, James Coleman wrote:
> On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming  wrote:
>>
>> On 15-04-2021 04:01, James Coleman wrote:
>>> On Wed, Apr 14, 2021 at 5:42 PM James Coleman  wrote:

 On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
  wrote:
>
> On 4/12/21 2:24 PM, Luc Vlaming wrote:
>> Hi,
>>
>> When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
>> 95 and 96 on a SF10 I get the error "could not find pathkey item to 
>> sort".
>> When I disable enable_gathermerge the problem goes away and then the
>> plan for query 94 looks like below. I tried figuring out what the
>> problem is but to be honest I would need some pointers as the code that
>> tries to matching equivalence members in prepare_sort_from_pathkeys is
>> something i'm really not familiar with.
>>
>
> Could be related to incremental sort, which allowed some gather merge
> paths that were impossible before. We had a couple issues related to
> that fixed in November, IIRC.
>
>> To reproduce you can either ingest and test using the toolkit I used too
>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
>> alternatively just use the schema (see
>> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
>>
>
> Thanks, I'll see if I can reproduce that with your schema.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

 The query in question is:

 select  count(*)
  from store_sales
  ,household_demographics
  ,time_dim, store
  where ss_sold_time_sk = time_dim.t_time_sk
  and ss_hdemo_sk = household_demographics.hd_demo_sk
  and ss_store_sk = s_store_sk
  and time_dim.t_hour = 15
  and time_dim.t_minute >= 30
  and household_demographics.hd_dep_count = 7
  and store.s_store_name = 'ese'
  order by count(*)
  limit 100;

  From debugging output it looks like this is the plan being chosen
 (cheapest total path):
  Gather(store_sales household_demographics time_dim) rows=60626
 cost=3145.73..699910.15
  HashJoin(store_sales household_demographics time_dim)
 rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
 household_demographics.hd_demo_sk
  HashJoin(store_sales time_dim) rows=252609
 cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
 time_dim.t_time_sk
  SeqScan(store_sales) rows=11998564
 cost=0.00..658540.64
  SeqScan(time_dim) rows=1070
 cost=0.00..1976.35
  SeqScan(household_demographics) rows=720
 cost=0.00..147.00

 prepare_sort_from_pathkeys fails to find a pathkey because
 tlist_member_ignore_relabel returns null -- which seemed weird because
 the sortexpr is an Aggref (in a single member equivalence class) and
 the tlist contains a single member that's also an Aggref. It turns out
 that the only difference between the two Aggrefs is that the tlist
 entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
 aggsplit = AGGSPLIT_SIMPLE.

 That's as far as I've gotten so far, but I figured I'd get that info
 out to see if it means anything obvious to anyone else.
>>>
>>> This really goes back to [1] where we fixed a similar issue by making
>>> find_em_expr_usable_for_sorting_rel parallel the rules in
>>> prepare_sort_from_pathkeys.
>>>
>>> Most of those conditions got copied, and the case we were trying to
>>> handle is the fact that prepare_sort_from_pathkeys can generate a
>>> target list entry under those conditions if one doesn't exist. However
>>> there's a further restriction there I don't remember looking at: it
>>> uses pull_var_clause and tlist_member_ignore_relabel to ensure that
>>> all of the vars that feed into the sort expression are found in the
>>> target list. As I understand it, that is: it will build a target list
>>> entry for something like "md5(column)" if "column" (and that was one
>>> of our test cases for the previous fix) is in the target list already.
>>>
>>> But there's an additional detail here: the call to pull_var_clause
>>> requests aggregates, window functions, and placeholders be treated as
>>> vars. That means for our Aggref case it would require that the two
>>> Aggrefs be fully equal, so the differing aggsplit member would cause a
>>> target list entry not to be built, hence our error here.
>>>
>>> I've attached a quick and dirty patch that encodes that final rule
>>> from 

Re: wal stats questions

2021-04-15 Thread Masahiro Ikeda


On 2021/04/13 9:33, Fujii Masao wrote:
> 
> 
> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>> OK, I added the condition to the fast-return check. I noticed that I
>> misunderstood that the purpose is to avoid expanding a clock check using WAL
>> stats counters. But, the purpose is to make the conditions stricter, right?
> 
> Yes. Currently if the following condition is false even when the WAL counters
> are updated, nothing is sent to the stats collector. But with your patch,
> in this case the WAL stats are sent.
> 
> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>     pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>     !have_function_stats && !disconnect)
> 
> Thanks for the patch! It now fails to be applied to the master cleanly.
> So could you rebase the patch?

Thanks for your comments!
I rebased it.


> pgstat_initialize() should initialize pgWalUsage with zero?

Thanks. But I didn't handle it, because I went back to the logic that
calculates the diff, as you mentioned below.


> +    /*
> + * set the counters related to generated WAL data.
> + */
> +    WalStats.m_wal_records = pgWalUsage.wal_records;
> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
> 
> This should be skipped if pgWalUsage.wal_records is zero?

Yes, fixed it.


> + * Be careful that the counters are cleared after reporting them to
> + * the stats collector although you can use WalUsageAccumDiff()
> + * to computing differences to previous values. For backends,
> + * the counters may be reset after a transaction is finished and
> + * pgstat_send_wal() is invoked, so you can compute the difference
> + * in the same transaction only.
> 
> On the second thought, I'm afraid that this can be likely to be a foot-gun
> in the future. So I'm now wondering if the current approach (i.e., calculate
> the diff between the current and previous pgWalUsage and don't reset it
> to zero) is better. Thought? Since the similar data structure pgBufferUsage
> doesn't have this kind of restriction, I'm afraid that the developer can
> easily miss that only pgWalUsage has the restriction.
> 
> But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
> even when the WAL counters are not updated. Then if that calculated diff is
> zero, we skip sending the WAL stats. This logic should be improved. That is,
> probably we should be able to check whether the WAL counters are updated
> or not, without calculating the diff, because the calculation is not free.
> We can implement this by introducing new flag variable that we discussed
> upthread. This flag is set to true whenever the WAL counters are incremented
> and used to determine whether the WAL stats need to be sent.

Sounds reasonable. I agree that the restriction risks leading to mistakes.
I made a patch introducing a new flag.

- v4-0001-performance-improvements-of-reporting-wal-stats.patch


I think introducing a new flag is not necessary, because we can know whether
the WAL stats have been updated using the counters of generated WAL records,
WAL writes, and WAL syncs. That's fast compared to getting a timestamp. The
attached patch checks whether the counters have been updated using them.

-
v4-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch
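To make that concrete, a minimal sketch of the fast-return check I mean (the
counter and variable names are assumptions, not necessarily what the attached
patch does):

    /* skip reporting if no WAL activity has happened since the last report */
    if (pgWalUsage.wal_records == prevWalUsage.wal_records &&
        WalStats.m_wal_write == 0 &&
        WalStats.m_wal_sync == 0)
        return;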


> If we do this, another issue is that the data types for wal_records and 
> wal_fpi
> in pgWalUsage are long. Which may lead to overflow? If yes, it's should be
> replaced with uint64.

Yes, I fixed it. BufferUsage's counters have the same issue, so I fixed them too.

BTW, would it be better to change PgStat_Counter from int64 to uint64, since
there aren't any counters with negative numbers?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14be850cb3..265bb16cf6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -173,21 +173,21 @@ typedef struct Counters
 	double		sum_var_time[PGSS_NUMKIND]; /* sum of variances in
 			 * planning/execution time in msec */
 	int64		rows;			/* total # of retrieved or affected rows */
-	int64		shared_blks_hit;	/* # of shared buffer hits */
-	int64		shared_blks_read;	/* # of shared disk blocks read */
-	int64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
-	int64		shared_blks_written;	/* # of shared disk blocks written */
-	int64		local_blks_hit; /* # of local buffer hits */
-	int64		local_blks_read;	/* # of local disk blocks read */
-	int64		local_blks_dirtied; /* # of local disk blocks dirtied */
-	int64		local_blks_written; /* # of local disk blocks written */
-	int64		temp_blks_read; /* # of temp blocks read */
-	int64		temp_blks_written;	/* # of temp blocks written */
+	uint64		shared_blks_hit;	/* # of shared buffer hits */
+	uint64		shared_blks_read;	/* # of shared disk blocks read */
+	uint64		

Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows

2021-04-15 Thread Andrew Dunstan


On 4/15/21 8:36 PM, Michael Paquier wrote:
> On Thu, Apr 15, 2021 at 07:16:05AM -0400, Andrew Dunstan wrote:
>> Reviewing the history, I don't want to undo 114541d58e5.
> Maybe we could remove it, but that may be better as a separate
> discussion if it is proving to not improve the situation, and I don't 
> really want to take any risks in destabilizing the buildfarm these
> days.  
>
>> So I'm trying your patch.
> Thanks!  If you need any help, please feel free to ping me.



It's worked on fairywren. I will double-check on drongo and, if all is
well, will commit.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Bharath Rupireddy
On Thu, Apr 15, 2021 at 8:40 PM Jeevan Ladhe
 wrote:
> IMHO, I think the idea here was to just get rid of an unnecessary variable
> rather than refactoring.
>
> On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy 
>  wrote:
>>
>> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
>> >
>> > Hi,
>> >
>> > Attached patch removes "is_foreign_table" from transformCreateStmt()
>> > since it already has cxt.isforeign that can serve the same purpose.
>>
>> Yeah having that variable as "is_foreign_table" doesn't make sense
>> when we have the info in ctx. I'm wondering whether we can do the
>> following (like transformFKConstraints). It will be more readable and
>> we could also add more comments on why we don't skip validation for
>> check constraints i.e. constraint->skip_validation = false in case for
>> foreign tables.
>
> To address your concern here, I think it can be addressed by adding a comment
> just before we make a call to transformCheckConstraints().

+1. The comment "If creating a new table (but not a foreign table), we
can safely skip ..." in transformCheckConstraints just says that we
don't mark skip_validation = true for foreign tables. But the
discussion that led to commit 86705aa8 [1] has the information as to
why it is so. Although I have not gone through it entirely, having
something like "newly created foreign tables can have data at the
moment they are created, so the constraint validation cannot be skipped"
in transformCreateStmt before calling transformCheckConstraints gives
an idea as to why we don't skip validation.
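As an illustration only, the kind of comment I mean (the wording is mine, and
the call shown assumes Amul's patch using cxt.isforeign):

    /*
     * Postpone validation of CHECK constraints for foreign tables: unlike a
     * plain table created by this command, a newly-created foreign table may
     * already contain data on the remote side, so validation cannot be
     * skipped for it.
     */
    transformCheckConstraints(&cxt, !cxt.isforeign);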

[1] - 
https://www.postgresql.org/message-id/flat/d2b7419f-4a71-cf86-cc99-bfd0f359a1ea%40lab.ntt.co.jp

> I think this is intentional, to keep the code consistent with the CREATE
> TABLE path i.e. transformCreateStmt(). Here is what the comment atop
> transformCheckConstraints() reads:
>
> /*
>  * transformCheckConstraints
>  * handle CHECK constraints
>  *
>  * Right now, there's nothing to do here when called from ALTER TABLE,
>  * but the other constraint-transformation functions are called in both
>  * the CREATE TABLE and ALTER TABLE paths, so do the same here, and just
>  * don't do anything if we're not authorized to skip validation.
>  */

Yeah, I re-read it and it looks like it's intentional for consistency reasons.

I'm not opposed to this patch as it clearly removes an unnecessary variable.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows

2021-04-15 Thread Michael Paquier
On Thu, Apr 15, 2021 at 07:16:05AM -0400, Andrew Dunstan wrote:
> Reviewing the history, I don't want to undo 114541d58e5.

Maybe we could remove it, but that may be better as a separate
discussion if it is proving to not improve the situation, and I don't 
really want to take any risks in destabilizing the buildfarm these
days.  

> So I'm trying your patch.

Thanks!  If you need any help, please feel free to ping me.
--
Michael




Re: New IndexAM API controlling index vacuum strategies

2021-04-15 Thread Peter Geoghegan
On Thu, Apr 15, 2021 at 5:12 PM Andres Freund  wrote:
> > https://thebuild.com/blog/2019/02/08/do-not-change-autovacuum-age-settings/
>
> Not at all convinced. The issue of needing to modify a lot of
> all-visible pages again to freeze them is big enough to let it be a
> problem even after the freeze map. Yes, there's workloads where it's
> much less of a problem, but not all the time.

Not convinced of what? I only claimed that it was much less common.
Many users live in fear of the extreme worst case of the database no
longer being able to accept writes. That is a very powerful fear.

> > As I said, we handle the case where autovacuum_freeze_max_age is set
> > to something larger than vacuum_failsafe_age in a straightforward and
> > pretty sensible way. I am curious, though: what
> > autovacuum_freeze_max_age setting is "much higher" than 1.6 billion,
> > but somehow also not extremely ill-advised and dangerous? What number
> > is that, precisely? Apparently this is common, but I must confess that
> > it's the first I've heard about it.
>
> I didn't intend to say that the autovacuum_freeze_max_age would be set
> much higher than 1.6 billion, just that the headroom would be much
> less. I've set it, and seen it set, to 1.5-1.8bio without problems,
> while reducing overhead substantially.

Okay, that makes way more sense. (Though I still think that a
autovacuum_freeze_max_age beyond 1 billion is highly dubious.)

Let's say you set autovacuum_freeze_max_age to 1.8 billion (and you
really know what you're doing). This puts you in a pretty select group
of Postgres users -- the kind of select group that might be expected
to pay very close attention to the compatibility section of the
release notes. In any case it makes the failsafe kick in when
relfrozenxid age is 1.89 billion. Is that so bad? In fact, isn't this
feature actually pretty great for this select cohort of Postgres users
that live dangerously? Now it's far safer to live on the edge (perhaps
with some additional tuning that ought to be easy for this elite group
of users).

-- 
Peter Geoghegan




Re: TRUNCATE on foreign table

2021-04-15 Thread Bharath Rupireddy
On Thu, Apr 15, 2021 at 8:19 PM Fujii Masao  wrote:
> On 2021/04/14 12:54, Bharath Rupireddy wrote:
> > IMHO, we can push all the TRUNCATE options (ONLY, RESTRICTED, CASCADE,
> > RESTART/CONTINUE IDENTITY), because it doesn't have any major
> > challenge(implementation wise) unlike pushing some clauses in
> > SELECT/UPDATE/DELETE and we already do this on the master. It doesn't
> > look good and may confuse users, if we push some options and restrict
> > others. We should have an explicit note in the documentation saying we
> > push all these options to the remote server. We can leave it to the
> > user to write TRUNCATE for foreign tables with the appropriate
> > options. If somebody complains about a problem that they will face
> > with this behavior, we can revisit.
>
> That's one of the options. But I'm afraid it's hard to drop (revisit)
> the feature once it has been released. So if there is no explicit
> use case for that, basically I'd like to drop that before release
> like we agree to drop unused TRUNCATE_REL_CONTEXT_CASCADING.

Thanks. Looks like the decision is going in the direction of
restricting those options, I will withdraw my point.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: New IndexAM API controlling index vacuum strategies

2021-04-15 Thread Andres Freund
Hi,

On 2021-04-14 21:30:29 -0700, Peter Geoghegan wrote:
> I think that this was once true, but is now much less common, mostly
> due to the freeze map stuff in 9.6. And due to a general recognition that
> the *risk* of increasing them is just too great (a risk that we can
> hope was diminished by the failsafe, incidentally). As an example of
> this, Christophe Pettus had a Damascene conversion when it came to
> increasing autovacuum_freeze_max_age aggressively, which he explains
> here:
> 
> https://thebuild.com/blog/2019/02/08/do-not-change-autovacuum-age-settings/

Not at all convinced. The issue of needing to modify a lot of
all-visible pages again to freeze them is big enough to let it be a
problem even after the freeze map. Yes, there's workloads where it's
much less of a problem, but not all the time.



> As I said, we handle the case where autovacuum_freeze_max_age is set
> > to something larger than vacuum_failsafe_age in a straightforward and
> pretty sensible way. I am curious, though: what
> autovacuum_freeze_max_age setting is "much higher" than 1.6 billion,
> but somehow also not extremely ill-advised and dangerous? What number
> is that, precisely? Apparently this is common, but I must confess that
> it's the first I've heard about it.

I didn't intend to say that the autovacuum_freeze_max_age would be set
much higher than 1.6 billion, just that the headroom would be much
less. I've set it, and seen it set, to 1.5-1.8bio without problems,
while reducing overhead substantially.

Greetings,

Andres Freund




Re: Replacing pg_depend PIN entries with a fixed range check

2021-04-15 Thread Tom Lane
Andres Freund  writes:
> On 2021-04-15 19:59:24 -0400, Tom Lane wrote:
>> No, *neither* of them are pinned, and we don't want them to be.
>> It's something of a historical artifact that template1 has a low OID.

> Hm, it makes sense for template1 not to be pinned, but it doesn't seem
> as obvious why that should be the case for template0.

IIRC, the docs suggest that in an emergency you could recreate either
of them from the other.  Admittedly, if you've put stuff in template1
then this might cause problems later, but I think relatively few
people do that.

> I'm not at all concerned about the speed after the change - it just
> seems cleaner and easier to understand not to have exceptions.

We had these exceptions already, they were just implemented in initdb
rather than the backend.

regards, tom lane




Re: Replacing pg_depend PIN entries with a fixed range check

2021-04-15 Thread Andres Freund
Hi,

On 2021-04-15 19:59:24 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Hm, maybe we ought to swap template0 and template1 instead? I.e. have
> > template0 be in pg_database.dat and thus get a pinned oid, and then
> > create template1, postgres etc from that?
> 
> No, *neither* of them are pinned, and we don't want them to be.
> It's something of a historical artifact that template1 has a low OID.

Hm, it makes sense for template1 not to be pinned, but it doesn't seem
as obvious why that should be the case for template0.


> In short, I'm really skeptical of changing any of these pin-or-not
> decisions to save one or two comparisons in IsPinnedObject.  That
> function is already orders of magnitude faster than what it replaces;
> we don't need to sweat over making it faster yet.

I'm not at all concerned about the speed after the change - it just
seems cleaner and easier to understand not to have exceptions.

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-04-15 Thread Andres Freund
Hi,

On 2021-04-12 19:49:36 -0700, Melanie Plageman wrote:
> So, I took a stab at implementing this in PgBackendStatus.

Cool!


> The attached patch is not quite on top of current master, so, alas,
> don't try and apply it. I went to rebase today and realized I needed
> to make some changes in light of e1025044cd4, however, I wanted to
> share this WIP so that I could pose a few questions that I imagine
> will still be relevant after I rewrite the patch.
>
> I removed buffers_backend and buffers_backend_fsync from
> pg_stat_bgwriter and have created a new view which tracks
>   - number of shared buffers the checkpointer and bgwriter write out
>   - number of shared buffers a regular backend is forced to flush
>   - number of extends done by a regular backend through shared buffers
>   - number of buffers flushed by a backend or autovacuum using a
> BufferAccessStrategy which, were they not to use this strategy,
> could perhaps have been avoided if a clean shared buffer was
> available
>   - number of fsyncs done by a backend which could have been done by
> checkpointer if sync queue had not been full

I wonder if leaving buffers_alloc in pg_stat_bgwriter makes sense after
this? I'm tempted to move that to pg_stat_buffers or such...

I'm not quite convinced by having separate columns for checkpointer,
bgwriter, etc. That doesn't seem to scale all that well. What if we
instead made it a view that has one row for each BackendType?
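
For illustration only, a rough SQL sketch of what such a per-backend-type view
could look like; the view, column, and function names below are invented for
the example and are not from the proposed patch:

-- Hypothetical sketch: one row per backend type, with made-up names
-- (pg_stat_get_backend_io() does not exist as such).
CREATE VIEW pg_stat_buffers AS
SELECT b.backend_type,     -- e.g. 'checkpointer', 'background writer', 'client backend'
       b.writes,           -- shared buffers written out
       b.extends,          -- relation extensions through shared buffers
       b.strategy_writes,  -- writes forced by a BufferAccessStrategy
       b.fsyncs            -- fsyncs that could not be handed to the checkpointer
FROM pg_stat_get_backend_io() AS b;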


> In implementing this counting per backend, it is easy for all types of
> backends to keep track of the number of writes, extends, fsyncs, and
> strategy writes they are doing. So, as recommended upthread, I have
> added columns in the view for the number of writes for checkpointer and
> bgwriter and others. Thus, this view becomes more than just stats on
> "avoidable I/O done by backends".
>
> So, my question is, does it make sense to track all extends -- those to
> extend the fsm and visimap and when making a new relation or index? Is
> that information useful? If so, is it different than the extends done
> through shared buffers? Should it be tracked separately?

I don't fully understand what you mean with "extends done through shared
buffers"?


> Another question I have is, should the number of extends be for every
> single block extended or should we try to track the initiation of a set
> of extends (all of those added in RelationAddExtraBlocks(), in this
> case)?

I think it should be 8k blocks, i.e. RelationAddExtraBlocks() should be
tracked as many individual extends. It's implemented that way, but more
importantly, it should be in BLCKSZ units. If we later add some actually
batched operations, we can have separate stats for that.


> Of course, this view, when grown, will begin to overlap with pg_statio,
> which is another consideration. What is its identity? I would find
> "avoidable I/O" either avoidable entirely or avoidable for that
> particular type of process, to be useful.

I think it's fine to overlap with pg_statio_* - those are for individual
objects, so it seems to be expected to overlap with coarser stats.


> Or maybe, it should have a more expansive mandate. Maybe it would be
> useful to aggregate some of the info from pg_stat_statements at a higher
> level -- like maybe shared_blks_read counted across many statements for
> a period of time/context in which we expected the relation in shared
> buffers becomes potentially interesting.

Let's do something more basic first...


Greetings,

Andres Freund




Re: Replacing pg_depend PIN entries with a fixed range check

2021-04-15 Thread Tom Lane
Andres Freund  writes:
> Hm, maybe we ought to swap template0 and template1 instead? I.e. have
> template0 be in pg_database.dat and thus get a pinned oid, and then
> create template1, postgres etc from that?

No, *neither* of them are pinned, and we don't want them to be.
It's something of a historical artifact that template1 has a low OID.

>> +/*
>> + * Large objects are never pinned.  We need this special case because
>> + * their OIDs can be user-assigned.
>> + */
>> +if (classId == LargeObjectRelationId)
>> +return false;

> Huh, shouldn't we reject that when creating them?

We've got regression tests that create blobs with small OIDs :-(.
We could change those tests of course, but they're pretty ancient
and I'm hesitant to move those goal posts.

> I guess you didn't because of dump/restore concerns?

That too.

In short, I'm really skeptical of changing any of these pin-or-not
decisions to save one or two comparisons in IsPinnedObject.  That
function is already orders of magnitude faster than what it replaces;
we don't need to sweat over making it faster yet.

regards, tom lane




Re: Replacing pg_depend PIN entries with a fixed range check

2021-04-15 Thread Andres Freund
Hi,

On 2021-04-14 21:43:28 -0400, Tom Lane wrote:
> In [1] Andres and I speculated about whether we really need all
> those PIN entries in pg_depend.  Here is a draft patch that gets
> rid of them.

Yay.

> There are a couple of objects, namely template1 and the public
> schema, that are in the catalog .dat files but are not supposed
> to be pinned.  The existing code accomplishes that by excluding them
> (in two different ways :-() while filling pg_depend.  This patch
> just hard-wires exceptions for them in IsPinnedObject(), which seems
> to me not much uglier than what we had before.  The existing code
> also handles pinning of the standard tablespaces in an idiosyncratic
> way; I just dropped that and made them be treated as pinned.

Hm, maybe we ought to swap template0 and template1 instead? I.e. have
template0 be in pg_database.dat and thus get a pinned oid, and then
create template1, postgres etc from that?

I guess we could also just create public in initdb.

Not that it matters much, having those exceptions doesn't seem too bad.



> Anyway, as to concrete results:
> 
> * pg_depend's total relation size, in a freshly made database,
> drops from 1269760 bytes to 368640 bytes.

Nice!



> I didn't try to reproduce the original performance bottleneck
> that was complained of in [1], but that might be fun to check.

I hope it's not reproducible as is, because I hopefully did fix the bug
leading to it ;)

> +bool
> +IsPinnedObject(Oid classId, Oid objectId)
> +{
> + /*
> +  * Objects with OIDs above FirstUnpinnedObjectId are never pinned.  
> Since
> +  * the OID generator skips this range when wrapping around, this check
> +  * guarantees that user-defined objects are never considered pinned.
> +  */
> + if (objectId >= FirstUnpinnedObjectId)
> + return false;
> +
> + /*
> +  * Large objects are never pinned.  We need this special case because
> +  * their OIDs can be user-assigned.
> +  */
> + if (classId == LargeObjectRelationId)
> + return false;
> +

Huh, shouldn't we reject that when creating them? IIRC we already use
oid range checks in a bunch of places? I guess you didn't because of
dump/restore concerns?

Greetings,

Andres Freund




Converting built-in SQL functions to new style

2021-04-15 Thread Tom Lane
[ moving this to a new thread so as not to confuse the cfbot ]

I wrote:
> Andrew Dunstan  writes:
>> Is there anything else we should be doing along the eat your own dogfood
>> line that don't have these security implications?

> We can still convert the initdb-created SQL functions to new style,
> since there's no security threat during initdb.  I'll make a patch
> for that soon.

Here's a draft patch that converts all the built-in and information_schema
SQL functions to new style, except for half a dozen that cannot be
converted because they use polymorphic arguments.

Leaving that remaining half-a-dozen as old style seems okay from a
security standpoint, because they are few enough and simple enough
that it's no big notational headache to make their source text 100%
search-path-proof.  I've inserted OPERATOR() notation where necessary
to make them bulletproof.

Also worth a comment perhaps is that for the functions that are being
converted, I replaced the prosrc text in pg_proc.dat with "see
system_views.sql".  I think this might reduce confusion by making
it clear that these are not the operative definitions.

One thing this patch does that's not strictly within the charter
is to give the two forms of ts_debug() pg_proc.dat entries, just
so they are more like their new neighbors.  This means they'll be
pinned where before they were not, but that seems desirable to me.

I'm pretty confident the conversion is accurate, because I used \sf
to generate the text for the replacement definitions.  So I think
this is committable, though review is welcome.

One thing I was wondering about, but did not pull the trigger on
here, is whether to split off the function-related stuff in
system_views.sql into a new file "system_functions.sql", as has
long been speculated about by the comments in system_views.sql.
I think it is time to do this because

(a) The function stuff now amounts to a full third of the file.

(b) While the views made by system_views.sql are intentionally
not pinned, the function-related commands are messing with
pre-existing objects that *are* pinned.  This seems quite
confusing to me, and it might interfere with the intention that
you could reload the system view definitions using this file.

Thoughts?

regards, tom lane

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 941a9f664c..0f2c1833e9 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -43,7 +43,8 @@ SET search_path TO information_schema;
 CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
 RETURNS SETOF RECORD
 LANGUAGE sql STRICT IMMUTABLE PARALLEL SAFE
-AS 'select $1[s], s - pg_catalog.array_lower($1,1) + 1
+AS 'select $1[s],
+s operator(pg_catalog.-) pg_catalog.array_lower($1,1) operator(pg_catalog.+) 1
 from pg_catalog.generate_series(pg_catalog.array_lower($1,1),
 pg_catalog.array_upper($1,1),
 1) as g(s)';
@@ -52,28 +53,26 @@ CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
  * column's position in the index (NULL if not there) */
 CREATE FUNCTION _pg_index_position(oid, smallint) RETURNS int
 LANGUAGE sql STRICT STABLE
-AS $$
+BEGIN ATOMIC
 SELECT (ss.a).n FROM
   (SELECT information_schema._pg_expandarray(indkey) AS a
FROM pg_catalog.pg_index WHERE indexrelid = $1) ss
   WHERE (ss.a).x = $2;
-$$;
+END;
 
 CREATE FUNCTION _pg_truetypid(pg_attribute, pg_type) RETURNS oid
 LANGUAGE sql
 IMMUTABLE
 PARALLEL SAFE
 RETURNS NULL ON NULL INPUT
-AS
-$$SELECT CASE WHEN $2.typtype = 'd' THEN $2.typbasetype ELSE $1.atttypid END$$;
+RETURN CASE WHEN $2.typtype = 'd' THEN $2.typbasetype ELSE $1.atttypid END;
 
 CREATE FUNCTION _pg_truetypmod(pg_attribute, pg_type) RETURNS int4
 LANGUAGE sql
 IMMUTABLE
 PARALLEL SAFE
 RETURNS NULL ON NULL INPUT
-AS
-$$SELECT CASE WHEN $2.typtype = 'd' THEN $2.typtypmod ELSE $1.atttypmod END$$;
+RETURN CASE WHEN $2.typtype = 'd' THEN $2.typtypmod ELSE $1.atttypmod END;
 
 -- these functions encapsulate knowledge about the encoding of typmod:
 
@@ -82,8 +81,7 @@ CREATE FUNCTION _pg_char_max_length(typid oid, typmod int4) RETURNS integer
 IMMUTABLE
 PARALLEL SAFE
 RETURNS NULL ON NULL INPUT
-AS
-$$SELECT
+RETURN
   CASE WHEN $2 = -1 /* default typmod */
THEN null
WHEN $1 IN (1042, 1043) /* char, varchar */
@@ -91,15 +89,14 @@ $$SELECT
WHEN $1 IN (1560, 1562) /* bit, varbit */
THEN $2
ELSE null
-  END$$;
+  END;
 
 CREATE FUNCTION _pg_char_octet_length(typid oid, typmod int4) RETURNS integer
 LANGUAGE sql
 IMMUTABLE
 PARALLEL SAFE
 RETURNS NULL ON NULL INPUT
-AS
-$$SELECT
+RETURN
   CASE WHEN $1 IN (25, 1042, 1043) /* text, char, varchar */
THEN CASE WHEN $2 = -1 /* default typmod */

Re: Collation versioning

2021-04-15 Thread Thomas Munro
On Mon, Mar 15, 2021 at 2:25 PM Thomas Munro  wrote:
> FYI I have added this as an open item for PostgreSQL 14.  My default
> action will be to document this limitation, if we can't come up with
> something better in time.

Here is a short doc update to explain the situation on Windows and
close that open item.

PS  While trying to find official names to use to refer to the "en-US"
and "English_United States.1252" forms, I came across these sentences
in the Windows documentation[1], which support the idea already
discussed of trying to prevent the latter format from ever entering
our catalogs, in some future release:

"The locale-name form is a short, IETF-standardized string; for
example, en-US for English (United States) or bs-Cyrl-BA for Bosnian
(Cyrillic, Bosnia and Herzegovina).  These forms are preferred. [...]"

"The language[_country-region[.code-page]] form is stored in the
locale setting for a category when a language string, or language
string and country or region string, is used to create the locale.
[...] We do not recommend this form for locale strings embedded in
code or serialized to storage, because these strings are more likely
to be changed by an operating system update than the locale name
form."

[1] 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
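
For context, an illustrative query (not part of the patch) to check whether a
given collation has any version information recorded; collversion is NULL when
none is available, as with the legacy Windows locale-name form:

SELECT collname, collcollate, collversion
FROM pg_collation
WHERE collname IN ('en-US', 'en_US');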
From fd6c376dba21fdb0020d9b9de08fb878bb66f23d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 16 Apr 2021 10:21:48 +1200
Subject: [PATCH] Doc: Document known problem with Windows collation versions.

Warn users that locales with traditional Windows NLS names like
"English_United States.1252" won't provide version information, and that
something like initdb --lc-collate=en-US would be needed to fix that
problem for the database default collation.

Discussion: https://postgr.es/m/CA%2BhUKGJ_hk3rU%3D%3Dg2FpAMChb_4i%2BTJacpjjqFsinY-tRM3FBmA%40mail.gmail.com
---
 doc/src/sgml/charset.sgml | 9 +
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1b00e543a6..9630b18988 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -985,6 +985,15 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
  approach is imperfect as maintainers are free to back-port newer
  collation definitions to older C library releases.
 
+
+ When using Windows collations, version information is only available for
+ collations defined with IETF BCP47 locale names such as
+ en-US.  Currently, initdb selects
+ a default locale using a traditional Windows language and country
+ string such as English_United States.1252.  The
+ --lc-collate option can be used to provide an explicit
+ locale name in IETF-standardized form.
+

   
  
-- 
2.30.1



Re: SQL-standard function body

2021-04-15 Thread Tom Lane
Based on the discussion so far, I've committed 0001 and 0002 but not 0003,
and marked this open issue as closed.

regards, tom lane




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-15 Thread Tomas Vondra



On 4/15/21 2:21 AM, Robert Haas wrote:
> On Wed, Apr 14, 2021 at 8:20 PM James Coleman  wrote:
>>> Hmm, could be. Although, the stack trace at issue doesn't seem to show
>>> a call to create_incrementalsort_plan().
>>
>> The changes to gather merge path generation made it possible to use
>> those paths in more cases for both incremental sort and regular sort,
>> so by "incremental sort" I read Tomas as saying "the patches that
>> brought in incremental sort" not specifically "incremental sort
>> itself".
> 
> I agree. That's why I said "hmm, could be" even though the plan
> doesn't involve one.
> 

Yeah, that's what I meant. The difference to pre-13 behavior is that we
now call generate_useful_gather_paths, which also considers adding extra
sort (unlike plain generate_gather_paths).

So now we can end up with "Gather Merge -> Sort" paths that would not be
considered before.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ATTACH PARTITION locking documentation for DEFAULT partitions

2021-04-15 Thread Justin Pryzby
On Thu, Apr 15, 2021 at 08:47:26PM +0200, Matthias van de Meent wrote:
> I recently noticed that ATTACH PARTITION also recursively locks the
> default partition with ACCESS EXCLUSIVE mode when its constraints do
> not explicitly exclude the to-be-attached partition, which I couldn't
> find documented (has been there since PG10 I believe).

I'm not sure it's what you're looking for, but maybe you saw:
https://www.postgresql.org/docs/12/sql-altertable.html
|The default partition can't contain any rows that would need to be moved to the
|new partition, and will be scanned to verify that none are present. This scan,
|like the scan of the new partition, can be avoided if an appropriate
|CHECK constraint is present.

And since 2a4d96ebb:
|Attaching a partition acquires a SHARE UPDATE EXCLUSIVE lock on the parent 
table, in addition to ACCESS EXCLUSIVE locks on the table to be attached and on 
the default partition (if any).

From your patch:

> +
> + Similarly, if you have a default partition on the parent table, it is
> + recommended to create a CHECK constraint that 
> excludes
> + the to be attached partition constraint. Here, too, without the
> + CHECK constraint, this table will be scanned to
> + validate that the updated default partition constraints while holding
> + an ACCESS EXCLUSIVE lock on the default partition.
> +

The AEL is acquired in any case, right ?

I think whatever we say here needs to be crystal clear that only the scan can
be skipped.

I suggest that maybe the existing paragraph in alter_table.sgml could say
that an exclusive lock is held, like:

|The default partition can't contain any rows that would need to be moved to the
|new partition, and will be scanned to verify that none are present. This scan,
|like the scan of the new partition, can be avoided if an appropriate
|CHECK constraint is present.
|The scan of the default partition occurs while it is exclusively locked.

-- 
Justin




ATTACH PARTITION locking documentation for DEFAULT partitions

2021-04-15 Thread Matthias van de Meent
Hi,

I recently noticed that ATTACH PARTITION also recursively locks the
default partition with ACCESS EXCLUSIVE mode when its constraints do
not explicitly exclude the to-be-attached partition, which I couldn't
find documented (has been there since PG10 I believe).

PFA a patch that documents just that.

With regards,

Matthias van de Meent.
From 2bf23cd8018c7e2cbff4f00be4aba1e806750998 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Thu, 15 Apr 2021 20:43:00 +0200
Subject: [PATCH v1] ATTACH PARTITION locking documentation for DEFAULT
 partitions.

---
 doc/src/sgml/ddl.sgml | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 30e4170963..f001f0b8a3 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3934,6 +3934,9 @@ ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
 \copy measurement_y2008m02 from 'measurement_y2008m02'
 -- possibly some other data preparation work
 
+ALTER TABLE measurement_default ADD CONSTRAINT excl_y2008m02
+   CHECK ( (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01') IS FALSE );
+
 ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 FOR VALUES FROM ('2008-02-01') TO ('2008-03-01' );
 
@@ -3953,6 +3956,21 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
  constraint after ATTACH PARTITION is finished.
 
 
+
+ Similarly, if you have a default partition on the parent table, it is
+ recommended to create a CHECK constraint that excludes
+ the to be attached partition constraint. Here, too, without the
+ CHECK constraint, this table will be scanned to
+ validate that the updated default partition constraints while holding
+ an ACCESS EXCLUSIVE lock on the default partition.
+
+
+
+ Note that the locks on both the table that is being attached and the
+ default partition are recursive to all their child partitions, if
+ these tables are partitioned tables themselves.
+
+
 
  As explained above, it is possible to create indexes on partitioned tables
  so that they are applied automatically to the entire hierarchy.
-- 
2.20.1



Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation

2021-04-15 Thread Matthias van de Meent
-- Targeting PG15; if too early / noise then please ignore.

I've noticed there are a lot of places in the btree index
infrastructure (and also some other index AMs) that effectively
iterate over the attributes of the index tuple, but won't use
index_deform_tuple for reasons. However, this implies that they must
repeatedly call index_getattr, which in the worst case is O(n) for the
n-th attribute, slowing down extraction of multi-column indexes
significantly. As such, I've added some API that allows for iteration
(ish) over index attributes.

Please find attached patch 0001 that improves the runtime complexity
of many of these places by storing and reusing the offset of the last
extracted attribute. This improves the worst-case runtime of
extracting all attributes to O(n) for incremental attribute extraction
(from O(n*n)). Note that finding the first offsets is still an O(n)
worst case for starting at the n-th attribute, but nothing can be done
about that.

Almost all workloads for multi-column nbtree indexes that cannot use
attcacheoff should see a benefit from this patch; only those that only
use row scans cannot use this optimization. Additionally, multi-column
gist indexes could also see some (albeit limited) benefit, which is
indeed useful when considering the newly added INCLUDE support in the
gist AM.

Also attached is 0002, which dynamically truncates attribute prefixes
of tuples whilst _binsrch-ing through an nbtree page. It benefits greatly from
the improved performance of 0001; they work very well together. The
problems that Peter (cc-ed) mentions in [0] only result in invalid
search bounds when traversing the tree, but on the page level valid
bounds can be constructed.

This is patchset 1 of a series of patches I'm starting for eventually
adding static prefix truncation into nbtree infrastructure in
PostgreSQL. I've put up a wiki page [1] with my current research and
thoughts on that topic.

Performance
---

I've run some tests with regards to performance on my laptop; which
tests nbtree index traversal. The test is based on a recent UK land
registry sales prices dataset (25744780 rows), being copied from one
table into an unlogged table with disabled autovacuum, with one index
as specified by the result. Master @ 99964c4a, patched is with both
0001 and 0002. The results are averages over 3 runs, with plain
configure, compiled by gcc (Debian 6.3.0-18+deb9u1).

INSERT (index definition)| master (s) | patched (s) | improv(%)
UNIQUE (transaction) | 256851 |  251705 | 2.00
(county, city, locality) | 154529 |  147495 | 4.55
(county COLLATE "en_US", city, locality) | 174028 |  164165 | 5.67
(always_null, county, city, locality)| 173090 |  166851 | 3.60

Some testing for reindex indicates improvements there as well: Same
compiled version; all indexes on an unlogged table; REINDEX run 4
times on each index, last 3 were averaged.

REINDEX (index definition)   | master (s) | patched (s) | improv(%)
UNIQUE (transaction) |  11623 |   11692 | -0.6
(county, city, locality) |  58299 |   54770 |  6.1
(county COLLATE "en_US", city, locality) |  61790 |   55887 |  9.6
(always_null, county, city, locality)|  69703 |   63925 |  8.3

I am quite surprised by the results for the single-column unique
index insertions, as that was one of the points where I was suspecting
a slight decrease in performance for inserts. I haven't really checked
why the performance increased, but I suspect it has to do with an
improved fast-path for finding the first attribute (we know it always
starts at offset 0 of the data section), but it might also just as
well be due to throttling (sadly, I do not have a stable benchmarking
machine, so my laptop will do).

I'm also slightly disappointed with the results of the always_null
insert load; I had hoped for better results there, seeing the results
for the other 2 multi-column indexes.


With regards,

Matthias van de Meent.

[0] 
https://www.postgresql.org/message-id/cah2-wzn_nayk4pr0hrwo0stwhmxjp5qyu+x8vppt030xpqr...@mail.gmail.com
[1] https://wiki.postgresql.org/wiki/NBTree_Prefix_Truncation
From b832318acad276ef991785141f606d624c67cb69 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 22 Feb 2021 14:13:13 +0100
Subject: [PATCH v1 2/2] Implement page-level dynamic prefix truncation for
 _bt_binsrch*

Because tuples are ordered on the page, if some prefix of the scan columns on
both sides of the compared tuple are equal to the scankey, then the current
tuple that is being compared must also have those prefixing columns that equal
the scankey.

We cannot propagate this information to _binsrch on subsequent pages, as this
downstream page may concurrently have split and/or have merged with its
deleted left neighbour (see [0]), moving the keyspace of the linked page, so
we can only trust the current state of 

Re: psql - add SHOW_ALL_RESULTS option

2021-04-15 Thread Peter Eisentraut

On 15.04.21 13:51, Fabien COELHO wrote:

That's a lot IMO, so my vote would be to discard this feature for now
and revisit it properly in the 15 dev cycle, so as resources are
redirected into more urgent issues (13 open items as of the moment of
writing this email).


I don't wish to tell people which open issues they ought to work on
... but this patch seems like it could be quite a large can of worms,
and I'm not detecting very much urgency about getting it fixed.
If it's not to be reverted then some significant effort needs to be
put into it soon.


My overly naive trust in the non-regression tests to catch any issues has 
been largely proven wrong. Three key features do not have a single 
test. Sigh.


I'll have some time to look at it over next week-end, but not before.


I have reverted the patch and moved the commit fest entry to CF 2021-07.




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-15 Thread James Coleman
On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming  wrote:
>
> On 15-04-2021 04:01, James Coleman wrote:
> > On Wed, Apr 14, 2021 at 5:42 PM James Coleman  wrote:
> >>
> >> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
> >>  wrote:
> >>>
> >>> On 4/12/21 2:24 PM, Luc Vlaming wrote:
>  Hi,
> 
>  When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
>  95 and 96 on a SF10 I get the error "could not find pathkey item to 
>  sort".
>  When I disable enable_gathermerge the problem goes away and then the
>  plan for query 94 looks like below. I tried figuring out what the
>  problem is but to be honest I would need some pointers as the code that
>  tries to matching equivalence members in prepare_sort_from_pathkeys is
>  something i'm really not familiar with.
> 
> >>>
> >>> Could be related to incremental sort, which allowed some gather merge
> >>> paths that were impossible before. We had a couple issues related to
> >>> that fixed in November, IIRC.
> >>>
>  To reproduce you can either ingest and test using the toolkit I used too
>  (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
>  alternatively just use the schema (see
>  https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
> 
> >>>
> >>> Thanks, I'll see if I can reproduce that with your schema.
> >>>
> >>>
> >>> regards
> >>>
> >>> --
> >>> Tomas Vondra
> >>> EnterpriseDB: http://www.enterprisedb.com
> >>> The Enterprise PostgreSQL Company
> >>
> >> The query in question is:
> >>
> >> select  count(*)
> >>  from store_sales
> >>  ,household_demographics
> >>  ,time_dim, store
> >>  where ss_sold_time_sk = time_dim.t_time_sk
> >>  and ss_hdemo_sk = household_demographics.hd_demo_sk
> >>  and ss_store_sk = s_store_sk
> >>  and time_dim.t_hour = 15
> >>  and time_dim.t_minute >= 30
> >>  and household_demographics.hd_dep_count = 7
> >>  and store.s_store_name = 'ese'
> >>  order by count(*)
> >>  limit 100;
> >>
> >>  From debugging output it looks like this is the plan being chosen
> >> (cheapest total path):
> >>  Gather(store_sales household_demographics time_dim) rows=60626
> >> cost=3145.73..699910.15
> >>  HashJoin(store_sales household_demographics time_dim)
> >> rows=25261 cost=2145.73..692847.55
> >>clauses: store_sales.ss_hdemo_sk =
> >> household_demographics.hd_demo_sk
> >>  HashJoin(store_sales time_dim) rows=252609
> >> cost=1989.73..692028.08
> >>clauses: store_sales.ss_sold_time_sk =
> >> time_dim.t_time_sk
> >>  SeqScan(store_sales) rows=11998564
> >> cost=0.00..658540.64
> >>  SeqScan(time_dim) rows=1070
> >> cost=0.00..1976.35
> >>  SeqScan(household_demographics) rows=720
> >> cost=0.00..147.00
> >>
> >> prepare_sort_from_pathkeys fails to find a pathkey because
> >> tlist_member_ignore_relabel returns null -- which seemed weird because
> >> the sortexpr is an Aggref (in a single member equivalence class) and
> >> the tlist contains a single member that's also an Aggref. It turns out
> >> that the only difference between the two Aggrefs is that the tlist
> >> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
> >> aggsplit = AGGSPLIT_SIMPLE.
> >>
> >> That's as far as I've gotten so far, but I figured I'd get that info
> >> out to see if it means anything obvious to anyone else.
> >
> > This really goes back to [1] where we fixed a similar issue by making
> > find_em_expr_usable_for_sorting_rel parallel the rules in
> > prepare_sort_from_pathkeys.
> >
> > Most of those conditions got copied, and the case we were trying to
> > handle is the fact that prepare_sort_from_pathkeys can generate a
> > target list entry under those conditions if one doesn't exist. However
> > there's a further restriction there I don't remember looking at: it
> > uses pull_var_clause and tlist_member_ignore_relabel to ensure that
> > all of the vars that feed into the sort expression are found in the
> > target list. As I understand it, that is: it will build a target list
> > entry for something like "md5(column)" if "column" (and that was one
> > of our test cases for the previous fix) is in the target list already.
> >
> > But there's an additional detail here: the call to pull_var_clause
> > requests aggregates, window functions, and placeholders be treated as
> > vars. That means for our Aggref case it would require that the two
> > Aggrefs be fully equal, so the differing aggsplit member would cause a
> > target list entry not to be built, hence our error here.
> >
> > I've attached a quick and dirty patch that encodes that final rule
> > from prepare_sort_from_pathkeys into
> > 

Re: Forget close an open relation in ReorderBufferProcessTXN()

2021-04-15 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Apr 15, 2021 at 4:00 PM Japin Li  wrote:
>> 
>> The RelationIdGetRelation() comment says:
>> 
> Caller should eventually decrement count. (Usually,
> that happens by calling RelationClose().)
>> 
>> However, it doesn't do it in ReorderBufferProcessTXN().
>> I think we should close it, here is a patch that fixes it. Thoughts?
>> 

> +1. Your fix looks correct to me but can we test it in some way?

I think this code has a bigger problem: it should not be using
RelationIdGetRelation and RelationClose directly.  99.44% of
the backend goes through relation_open or one of the other
relation.c wrappers, so why doesn't this?

Possibly the answer is "it copied the equally misguided code
in pgoutput.c".  A quick grep shows nothing else doing it this
way.

regards, tom lane




Re: Converting contrib SQL functions to new style

2021-04-15 Thread Alvaro Herrera
On 2021-Apr-15, Vik Fearing wrote:

> CREATE DOMAIN earth AS "$extension".cube.cube
>   CONSTRAINT not_point check("$extension".cube.cube_is_point(value))
>   CONSTRAINT not_3d check("$extension".cube.cube_dim(value <= 3)
>   ...;

I find this syntax pretty weird -- here, the ".cube." part of the
identifier is acting as an argument of sorts for the preceding
$extension thingy.  This looks very surprising.

Something similar to OPERATOR() syntax may be more palatable:

 CREATE DOMAIN earth AS PG_EXTENSION_SCHEMA(cube).cube
   CONSTRAINT not_point check(PG_EXTENSION_SCHEMA(cube).cube_is_point(value))
   CONSTRAINT not_3d check(PG_EXTENSION_SCHEMA(cube).cube_dim(value) <= 3)
   ...;

Here, the PG_EXTENSION_SCHEMA() construct expands into the schema of the
given extension.  This looks more natural to me, since the extension
that acts as argument to PG_EXTENSION_SCHEMA() does look like an
argument.

I don't know if the parser would like this, though.

-- 
Álvaro Herrera   Valdivia, Chile




Re: pg_amcheck contrib application

2021-04-15 Thread Mark Dilger


> On Apr 14, 2021, at 10:17 AM, Robert Haas  wrote:
> 
> On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger
>  wrote:
>> It now reports:
>> 
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> # toast value 16461 missing chunk 3 with expected size 1996
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> # toast value 16461 was expected to end at chunk 6 with total size 
>> 1, but ended at chunk 5 with total size 8004
>> 
>> It sounds like you weren't expecting the second of these reports.  I think 
>> it is valuable, especially when there are multiple missing chunks and 
>> multiple extraneous chunks, as it makes it easier for the user to reconcile 
>> the missing chunks against the extraneous chunks.
> 
> I wasn't, but I'm not overwhelmingly opposed to it, either. I do think
> I would be in favor of splitting this kind of thing up into two
> messages:
> 
> # toast value 16459 unexpected chunks 1000 through 1004 each with
> size 1996 followed by chunk 1005 with size 20
> 
> We'll have fewer message variants and I don't think any real
> regression in usability if we say:
> 
> # toast value 16459 has unexpected chunks 1000 through 1004 each
> with size 1996
> # toast value 16459 has unexpected chunk 1005 with size 20

Changed.

> (Notice that I also inserted "has" so that the sentence has a verb. Or we
> could "contains.")

I have added the verb "has" rather than "contains" because "has" is more 
consistent with the phrasing of other similar corruption reports.



v21-0001-amcheck-adding-toast-pointer-corruption-checks.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





wrong units in ExplainPropertyFloat

2021-04-15 Thread Justin Pryzby
|commit 7a50bb690b4837d29e715293c156cff2fc72885c
|Author: Andres Freund 
|Date:   Fri Mar 16 23:13:12 2018 -0700
|
|Add 'unit' parameter to ExplainProperty{Integer,Float}.
|
|This allows to deduplicate some existing code, but mainly avoids some
|duplication in upcoming commits.
|
|In passing, fix variable names indicating wrong unit (seconds instead
|of ms).
|
|Author: Andres Freund
|Discussion: 
https://postgr.es/m/20180314002740.cah3mdsonz5mx...@alap3.anarazel.de

@@ -1304,8 +1299,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
planstate->instrument && planstate->instrument->nloops > 0)
{
double  nloops = planstate->instrument->nloops;
-   double  startup_sec = 1000.0 * planstate->instrument->startup / nloops;
-   double  total_sec = 1000.0 * planstate->instrument->total / nloops;
+   double  startup_ms = 1000.0 * planstate->instrument->startup / nloops;
+   double  total_ms = 1000.0 * planstate->instrument->total / nloops;
...
if (es->timing)
{
-   ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es);
-   ExplainPropertyFloat("Actual Total Time", total_sec, 3, es);
+   ExplainPropertyFloat("Actual Startup Time", "s", startup_ms,
+3, es);
+   ExplainPropertyFloat("Actual Total Time", "s", total_ms,
+3, es);

There's 3 pairs of these, and the other two pairs use "ms":

$ git grep 'Actual.*Time' src/backend/commands/explain.c
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Startup Time", "s", startup_ms,
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Total Time", "s", total_ms,
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Total Time", "ms", 0.0, 3, es);
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Startup Time", "ms",
src/backend/commands/explain.c: ExplainPropertyFloat("Actual Total Time", "ms",

Text mode uses appendStringInfo() for high density output, so this only affects
non-text output, but it turns out that units aren't shown for nontext format
anyway - this seems like a deficiency, but it means there's no visible bug.

-- 
Justin




Re: Replacing pg_depend PIN entries with a fixed range check

2021-04-15 Thread Tom Lane
[ connecting up two threads here ]

I wrote:
> Hence, what this patch does is to establish a manually-managed cutoff
> point akin to FirstBootstrapObjectId, and make initdb push the OID
> counter up to that once it's made the small number of pinned objects
> it's responsible for.  With the value I used here, a couple hundred
> OIDs are wasted, but there seems to be a reasonable amount of headroom
> still beyond that.  On my build, the OID counter at the end of initdb
> is 15485 (with a reasonable number of glibc and ICU locales loaded).
> So we still have about 900 free OIDs there; and there are 500 or so
> free just below FirstBootstrapObjectId, too.  So this approach does
> hasten the day when we're going to run out of free OIDs below 16384,
> but not by all that much.

In view of the discussion at [1], there's more pressure on the OID supply
above 10K than I'd realized.  While I don't have any good ideas about
eliminating the problem altogether, I did have a thought that would remove
the extra buffer zone created by my first-draft patch in this thread.
Namely, let's have genbki.pl write out its final OID assignment counter
value in a command in the postgres.bki file, say "set_next_oid 12036".
This would cause the bootstrap backend to set the server's OID counter to
that value.  Then the initial part of initdb's post-bootstrap processing
could assign pinned OIDs working forward from there, with no gap.  We'd
still need a gap before FirstBootstrapObjectId (which we might as well
rename to FirstUnpinnedObjectId), but we don't need two gaps, and so this
patch wouldn't make things any worse than they are today.

I'm not planning to put more effort into this patch right now, but
I'll revise it along these lines once v15 development opens.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAGPqQf3JYTrTB1E1fu_zOGj%2BrG_kwTfa3UcUYPfNZL9o1bcYNw%40mail.gmail.com




Re: Options given both on cmd-line and in the config with different values

2021-04-15 Thread Honza Horak

On 4/14/21 7:55 PM, Tom Lane wrote:

Honza Horak  writes:

I'm trying to understand what is happening in the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1935301



The upgrade process makes it a bit more difficult, but it seems to boil
down to this problem -- even when pg_ctl gets clear guidance where to
find datadir using -D option on the command-line, it forgets this
guidance once finding data_directory option in the postgresql.conf.



Is this the expected behavior actually?


The rule actually is that -D on the command line says where to find
the configuration file.  While -D is then also the default for where
to find the data directory, the config file can override that by
giving data_directory explicitly.

This is intended to support situations where the config file is kept
outside the data directory for management reasons.  If you are not
actively doing that, I'd recommend *not* setting data_directory
explicitly in the file.

While I've not studied the bug report carefully, it sounds like the
update process you're using involves copying the old config file
across verbatim.  You'd at minimum need to filter out data_directory
and related settings to make that safe.
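
As an aside (not from the original exchange), a running server reports which
files it actually resolved, which can help confirm this behavior:

SHOW config_file;      -- the postgresql.conf that was read
SHOW data_directory;   -- the data directory actually in use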


Thanks for explaining, it makes perfect sense. You're right that there 
is some dbdata directory moving involved, so in that case removing 
data_directory option from postgresql.conf makes sense.


Thanks,
Honza





Re: Schema variables - new implementation for Postgres 15

2021-04-15 Thread Pavel Stehule
On Thu, Apr 15, 2021 at 18:02, tsunakawa.ta...@fujitsu.com wrote:

> From: Pavel Stehule 
>
> --
> do $$
> declare x int ;
> begin
>   for i in 1..100
>   loop
> let ooo = i;
>   end loop;
> end;
> $$;
>
> variant 1 .. 1500 ms
> variant 2 with PLpgSQL support .. 140 ms
> variant 2 without PLpgSQL support 9000 ms
> --
>
> That's impressive!  But 1 million variable assignments took only
> 140 ms?  That means one assignment took only 140 nanoseconds, which is near
> one DRAM access?  Can PL/pgSQL processing really be so fast?
>

In this case PL/pgSQL can be very fast - and after the changes in PG 13,
PL/pgSQL is not significantly slower than Lua or PHP.

Everybody can repeat these tests - I did it on my Lenovo T520 notebook.

Pavel



>
> Regards
>
> Takayuki Tsunakawa


RE: Schema variables - new implementation for Postgres 15

2021-04-15 Thread tsunakawa.ta...@fujitsu.com
From: Pavel Stehule 
--
do $$
declare x int ;
begin
  for i in 1..100
  loop
let ooo = i;
  end loop;
end;
$$;

variant 1 .. 1500 ms
variant 2 with PLpgSQL support .. 140 ms
variant 2 without PLpgSQL support 9000 ms
--


That's impressive!  But 1 million variable assignments took only 140 
ms?  That means one assignment took only 140 nanoseconds, which is near one DRAM 
access?  Can PL/pgSQL processing really be so fast?


Regards
Takayuki Tsunakawa



buildfarm xversion diff

2021-04-15 Thread Justin Pryzby
Forking 
https://www.postgresql.org/message-id/20210328231433.gi15...@telsasoft.com

I gave a suggestion for how to reduce the "lines of diff" metric almost to nothing,
allowing a very small "fudge factor", which I think makes this a pretty
good metric rather than a passable one.

Thoughts ?

On Sun, Mar 28, 2021 at 06:14:33PM -0500, Justin Pryzby wrote:
> On Sun, Mar 28, 2021 at 04:48:29PM -0400, Andrew Dunstan wrote:
> > Nothing is hidden here - the diffs are reported, see for example
> > 
> > What we're comparing here is target pg_dumpall against the original
> > source vs target pg_dumpall against the upgraded source.
> 
> The command being run is:
> 
> https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm#L610
> system( "diff -I '^-- ' -u $upgrade_loc/origin-$oversion.sql "
>   . "$upgrade_loc/converted-$oversion-to-$this_branch.sql "
>   . "> $upgrade_loc/dumpdiff-$oversion 2>&1");
> ...
>   my $difflines = `wc -l < $upgrade_loc/dumpdiff-$oversion`;
> 
> where -I means: --ignore-matching-lines=RE
> 
> I think wc -l should actually be grep -c '^[-+]'
> otherwise context lines count for as much as diff lines.
> You could write that with diff -U0 |wc -l, except the context is useful to
> humans.
> 
> With some more effort, the number of lines of diff can be very small, allowing
> a smaller fudge factor.  
> 
> For upgrade from v10:
> time make -C src/bin/pg_upgrade check oldsrc=`pwd`/10 
> oldbindir=`pwd`/10/tmp_install/usr/local/pgsql/bin
> 
> $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql 
> src/bin/pg_upgrade/tmp_check/dump2.sql |wc -l
> 622
> 
> Without context:
> $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql 
> src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]'
> 142
> 
> Without comments:
> $ diff -I '^-- ' -u src/bin/pg_upgrade/tmp_check/dump1.sql 
> src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]'
> 130
> 
> Without SET default stuff:
> diff -I '^$' -I "SET default_table_access_method = heap;" -I "^SET 
> default_toast_compression = 'pglz';$" -I '^-- ' -u 
> /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql 
> /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |less |grep 
> -c '^[-+]'
> 117
> 
> Without trigger function call noise:
> diff -I "^CREATE TRIGGER [_[:alnum:]]\+ .* FOR EACH \(ROW\|STATEMENT\) 
> EXECUTE \(PROCEDURE\|FUNCTION\)" -I '^$' -I "SET default_table_access_method 
> = heap;" -I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u 
> /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql 
> /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c 
> '^[-+]'
> 11
> 
> Maybe it's important not to totally ignore that, and instead perhaps clean up
> the known/accepted changes like s/FUNCTION/PROCEDURE/:
> 
>  '/^CREATE TRIGGER/s/FUNCTION/PROCEDURE/' |diff -I '^$' -I "SET 
> default_table_access_method = heap;" -I "^SET default_toast_compression = 
> 'pglz';$" -I '^-- ' -u 
> /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql - |grep -c 
> '^[-+]'
> 11
> 
> It seems weird that we don't quote "heap" but we quote tablespaces and not
> toast compression methods.




Re: Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Tom Lane
Rushabh Lathia  writes:
> On Thu, Apr 15, 2021 at 7:49 PM Tom Lane  wrote:
>> As of right now, genbki.pl's OID counter reaches 12036, so it's
>> pretty clear that 12000 no longer works.  (I have this figure in
>> my head because I noted it while working on [1].)  13000 might
>> well be an excessive jump though.  Do you have a concrete problem
>> with it?

> In EDB Advanced Server, we have our own set of system objects.  Due
> to the mentioned commit (where it changes FirstBootstrapObjectId to 13000),
> system objects now exceed FirstNormalObjectId.

You might want to rethink where you're allocating those OIDs.  Even if
we didn't move FirstBootstrapObjectId today, it's inevitably going to
creep up over time.

As I recall the discussions about this, we'd expected that add-on products
that need OIDs in the bootstrap range would take them from the 8K-10K
range, not above FirstBootstrapObjectId.  Because of the possibility of
having lots of system locales creating lots of collations, the amount of
available OID space above FirstBootstrapObjectId is not as predictable as
you might wish.  (I suspect eventually we're going to have to back off
the idea of creating every possible locale at bootstrap, but we haven't
addressed that yet.)

We are overlapping development use of the 8K-10K OID range with it being
available for add-ons post-release, which might make it hard to do testing
against HEAD.  But you could renumber the not-yet-frozen objects' IDs out
of the way whenever you want to make a merge.

regards, tom lane




Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Jeevan Ladhe
IMHO, the idea here was just to get rid of an unnecessary variable
rather than refactoring.

On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Attached patch removes "is_foreign_table" from transformCreateStmt()
> > since it already has cxt.isforeign that can serve the same purpose.
>
> Yeah having that variable as "is_foreign_table" doesn't make sense
> when we have the info in ctx. I'm wondering whether we can do the
> following (like transformFKConstraints). It will be more readable and
> we could also add more comments on why we don't skip validation for
> check constraints i.e. constraint->skip_validation = false in case for
> foreign tables.
>

I think your concern here can be addressed by adding a comment
just before we make the call to transformCheckConstraints().

In transformAlterTableStmt: we can remove transformCheckConstraints
> entirely because calling transformCheckConstraints with skipValidation
> = false does nothing and has no value. This way we could save a
> function call.
>
> I prefer removing the skipValidation parameter from
> transformCheckConstraints. Others might have different opinions.
>

I think this is intentional, to keep the code consistent with the CREATE
TABLE path i.e. transformCreateStmt(). Here is what the comment atop
transformCheckConstraints() reads:

/*
 * transformCheckConstraints
 * handle CHECK constraints
 *
 * Right now, there's nothing to do here when called from ALTER TABLE,
 * but the other constraint-transformation functions are called in both
 * the CREATE TABLE and ALTER TABLE paths, so do the same here, and just
 * don't do anything if we're not authorized to skip validation.
 */

This was originally discussed in thread[1] and commit:
f27a6b15e6566fba7748d0d9a3fc5bcfd52c4a1b

[1]
https://www.postgresql.org/message-id/flat/1238779931.11913728.1449143089410.JavaMail.yahoo%40mail.yahoo.com#f2d8318b6beef37dfff06baa9a1538b7


Regards,
Jeevan Ladhe


Re: TRUNCATE on foreign table

2021-04-15 Thread Fujii Masao




On 2021/04/14 13:41, Kyotaro Horiguchi wrote:

At Wed, 14 Apr 2021 13:17:55 +0900, Kohei KaiGai  wrote in

Please assume the internal heap data is managed by PostgreSQL core, and
external data source is managed by postgres_fdw (or other FDW driver).
TRUNCATE command requires these object managers to eliminate the data
on behalf of the foreign tables picked up.

Even though the object manager tries to eliminate the managed data, it may be
restricted by some reason; FK restrictions in case of PostgreSQL internal data.
In this case, CASCADE/RESTRICT option suggests the object manager how
to handle the target data.

The ONLY clause controls whose data shall be eliminated.
On the other hand, CASCADE/RESTRICT and CONTINUE/RESTART controls
how data shall be eliminated. It is a primitive difference.


I have a different view on this classification. IMO ONLY and RESTRICT/CASCADE
should be categorized into the same group. Because both options specify
whether to truncate dependent tables or not. If we treat a foreign table as
an abstraction of an external data source, ISTM that we should not take care of
table dependencies in the remote server. IOW, we should truncate the entire
external data source, i.e., postgres_fdw should push neither ONLY nor
RESTRICT down to the remote server.



I object to unconditionally pushing ONLY to remote. As Kaigai-san said,
it works in an apparently wrong way when a user wants to truncate only
the specified foreign table in an inheritance tree, and there's no way to
avoid the behavior.

I also don't think it is right to push down CASCADE/RESTRICT.  The
options suggest to propagate truncation to *local* referrer tables
from the *foreign* table, not to the remote referrer tables from the
original table on remote.


Agreed.



If a user wants to allow that behavior it
should be specified by foreign table options.  (It is bothersome when
someone wants to specify the behavior on-the-fly.)

alter foreign table ft1 options (add truncate_cascade 'true');


Maybe. I think this is the item for v15 or later.



Also, CONTINUE/RESTART IDENTITY should not work since foreign tables
don't have an identity sequence. However, we might be able to
push down these options since they affect only the target table.


+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Tomas Vondra
On 4/15/21 4:29 PM, Rushabh Lathia wrote:
> 
> 
> On Thu, Apr 15, 2021 at 7:49 PM Tom Lane  wrote:
> 
> Rushabh Lathia  writes:
> > Commit mentioned in the $subject changed the FirstBootstrapObjectId
> > (transam.h) from 12000 to 13000.  I was trying to understand the
> reason
> > behind this change, but was not able to gather that information.
> Also didn't
> > find anything in the commit message either.
> 
> As of right now, genbki.pl's OID counter reaches 12036, so it's
> pretty clear that 12000 no longer works.  (I have this figure in
> my head because I noted it while working on [1].)  13000 might
> well be an excessive jump though.  Do you have a concrete problem
> with it?
> 

Yeah, the bump from 12000 to 13000 might be unnecessarily large. But
considering the bootstrap uses only about 1000 OIDs from the >=13000
range, I don't see this as a problem. Surely we can move the ranges in
the future, if needed?

> 
> In EDB Advanced Server, we have our own set of system objects.  Due
> to the mentioned commit (where it changes FirstBootstrapObjectId to 13000),
> system objects now exceed FirstNormalObjectId.
> 

I haven't checked what exactly EDBAS is doing, but how could it hit
16384 because of custom catalogs? Surely it does not have thousands of
catalogs, right?

It's OK to lower FirstBootstrapObjectId to e.g. 12500 during the merge,
if that solves the issue for EDBAS. As I said, those ranges are mostly
arbitrary anyway, and EDBAS already has catalog differences.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: TRUNCATE on foreign table

2021-04-15 Thread Fujii Masao




On 2021/04/14 12:54, Bharath Rupireddy wrote:

IMHO, we can push all the TRUNCATE options (ONLY, RESTRICTED, CASCADE,
RESTART/CONTINUE IDENTITY), because it doesn't have any major
challenge(implementation wise) unlike pushing some clauses in
SELECT/UPDATE/DELETE and we already do this on the master. It doesn't
look good and may confuse users, if we push some options and restrict
others. We should have an explicit note in the documentation saying we
push all these options to the remote server. We can leave it to the
user to write TRUNCATE for foreign tables with the appropriate
options. If somebody complains about a problem that they will face
with this behavior, we can revisit.


That's one of the options. But I'm afraid it's hard to drop (revisit)
the feature once it has been released. So if there is no explicit
use case for that, basically I'd like to drop it before release,
like we agreed to drop the unused TRUNCATE_REL_CONTEXT_CASCADING.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Tomas Vondra
Hi,

On 4/15/21 4:03 PM, Rushabh Lathia wrote:
> Hi,
> 
> Commit mentioned in the $subject changed the FirstBootstrapObjectId
> (transam.h) from 12000 to 13000.  I was trying to understand the reason
> behind this change, but was not able to gather that information. Also didn't
> find anything in the commit message either.
> 
> Can you please explain those changes? Is it accidental or intentional?
> 

Yeah, it's an intentional change - I should have mentioned it explicitly
in the thread, probably.

We're assigning OIDs to catalog entries at different phases, and each
phase has a range of OIDs to ensure the values are unique. The first
phase is genbki.pl, which transforms the .dat files and assigns OIDs in the
[FirstGenbkiObjectId, FirstBootstrapObjectId) range.

However, patches are adding new stuff to the .dat files, so we may hit
the upper limit. The minmax patch happened to add enough new entries to
hit it, i.e. genbki.pl needed OIDs above FirstBootstrapObjectId and
the compilation would fail. Try lowering the value back to 12000 and run
"make check" again - it'll fail.

The limits are mostly arbitrary, the primary purpose is to ensure the
OIDs are unique etc. So the patch simply added 1000 values to the genbki
range, to fix this.

Not sure what'll happen once we fill all those ranges, but we're quite
far from that, I think. It took us ~20 years to get 2000 OIDs in the
genbki range, and the bootstrap has ~1000 OIDs. So we've used only about
half the values between 10k and 16k, so far ...
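
For reference, the boundaries under discussion roughly look like this (a
sketch using the post-commit values; see src/include/access/transam.h for
the authoritative definitions):

#define FirstGenbkiObjectId     10000   /* genbki.pl-assigned OIDs start here */
#define FirstBootstrapObjectId  13000   /* bootstrap-assigned OIDs start here (was 12000) */
#define FirstNormalObjectId     16384   /* first OID available to user objects */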


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Rushabh Lathia
On Thu, Apr 15, 2021 at 7:49 PM Tom Lane  wrote:

> Rushabh Lathia  writes:
> > Commit mentioned in the $subject changed the FirstBootstrapObjectId
> > (transam.h) from 12000 to 13000.  I was trying to understand the reason
> > behind this change, but was not able to gather that information. Also
> didn't
> > find anything in the commit message either.
>
> As of right now, genbki.pl's OID counter reaches 12036, so it's
> pretty clear that 12000 no longer works.  (I have this figure in
> my head because I noted it while working on [1].)  13000 might
> well be an excessive jump though.  Do you have a concrete problem
> with it?
>

EDB Advanced Server has its own set of system objects.  Due to the
mentioned commit (which changes FirstBootstrapObjectId to 13000), its
system objects now exceed FirstNormalObjectId.


> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/flat/3737988.1618451...@sss.pgh.pa.us
>


-- 
Rushabh Lathia


Re: Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Tom Lane
Rushabh Lathia  writes:
> Commit mentioned in the $subject changed the FirstBootstrapObjectId
> (transam.h) from 12000 to 13000.  I was trying to understand the reason
> behind this change, but was not able to gather that information. Also didn't
> find anything in the commit message either.

As of right now, genbki.pl's OID counter reaches 12036, so it's
pretty clear that 12000 no longer works.  (I have this figure in
my head because I noted it while working on [1].)  13000 might
well be an excessive jump though.  Do you have a concrete problem
with it?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3737988.1618451...@sss.pgh.pa.us




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-15 Thread Tom Lane
Julien Rouhaud  writes:
> On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote:
>> (To be clear: 0002 passes check-world as-is, while 0001 is not
>> committable without some regression-test fiddling.)

> I'm probably missing something obvious but both 0001 and 0002 pass check-world
> for me, on a glibc box and --with-icu.

0001 fails for me :-(.  I think that requires default collation to be C.

regards, tom lane




Commit ab596105b55 - BRIN minmax-multi indexes

2021-04-15 Thread Rushabh Lathia
Hi,

Commit mentioned in the $subject changed the FirstBootstrapObjectId
(transam.h) from 12000 to 13000.  I was trying to understand the reason
behind this change, but was not able to gather that information. Also didn't
find anything in the commit message either.

Can you please explain those changes? Is it accidental or intentional?

Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Amul Sul
On Thu, Apr 15, 2021 at 5:47 PM Bharath Rupireddy
 wrote:
>
> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Attached patch removes "is_foreign_table" from transformCreateStmt()
> > since it already has cxt.isforeign that can serve the same purpose.
>
> Yeah having that variable as "is_foreign_table" doesn't make sense
> when we have the info in ctx. I'm wondering whether we can do the
> following (like transformFKConstraints). It will be more readable and
> we could also add more comments on why we don't skip validation for
> check constraints i.e. constraint->skip_validation = false in case for
> foreign tables.
>
> bool skip_validation = true;
> if (IsA(stmt, CreateForeignTableStmt))
> {
> cxt.stmtType = "CREATE FOREIGN TABLE";
> cxt.isforeign = true;
> skip_validation = false;
> }
> transformCheckConstraints(&cxt, skip_validation);
>
> Alternatively, we could also remove the skipValidation function parameter
> (since transformCheckConstraints is a static function, I think it's
> okay) and modify transformCheckConstraints, then we can do the following:
>
> In transformCreateStmt:
> if (!cxt.isforeign)
>   transformCheckConstraints(&cxt);
>
> In transformAlterTableStmt: we can remove transformCheckConstraints
> entirely because calling transformCheckConstraints with skipValidation
> = false does nothing and has no value. This way we could save a
> function call.
>
> I prefer removing the skipValidation parameter from
> transformCheckConstraints. Others might have different opinions.
>

Then we also need to remove the transformCheckConstraints() dummy call
from transformAlterTableStmt(), which was added for readability.
Also, this change to transformCheckConstraints() will make it
inconsistent with transformFKConstraints().

I think we shouldn't worry too much about this function call overhead
here since this is a slow utility path, and that is the reason the
current structure doesn't really bother me.

Regards,
Amul




Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Bharath Rupireddy
On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
>
> Hi,
>
> Attached patch removes "is_foreign_table" from transformCreateStmt()
> since it already has cxt.isforeign that can serve the same purpose.

Yeah having that variable as "is_foreign_table" doesn't make sense
when we have the info in ctx. I'm wondering whether we can do the
following (like transformFKConstraints). It will be more readable and
we could also add more comments on why we don't skip validation for
check constraints i.e. constraint->skip_validation = false in case for
foreign tables.

bool skip_validation = true;
if (IsA(stmt, CreateForeignTableStmt))
{
cxt.stmtType = "CREATE FOREIGN TABLE";
cxt.isforeign = true;
skip_validation = false;
}
transformCheckConstraints(&cxt, skip_validation);

Alternatively, we could also remove the skipValidation function parameter
(since transformCheckConstraints is a static function, I think it's
okay) and modify transformCheckConstraints, then we can do the following:

In transformCreateStmt:
if (!cxt.isforeign)
  transformCheckConstraints(&cxt);

In transformAlterTableStmt: we can remove transformCheckConstraints
entirely because calling transformCheckConstraints with skipValidation
= false does nothing and has no value. This way we could save a
function call.

I prefer removing the skipValidation parameter from
transformCheckConstraints. Others might have different opinions.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: psql - add SHOW_ALL_RESULTS option

2021-04-15 Thread Fabien COELHO



Hello Tom,


That's a lot IMO, so my vote would be to discard this feature for now
and revisit it properly in the 15 dev cycle, so as resources are
redirected into more urgent issues (13 open items as of the moment of
writing this email).


I don't wish to tell people which open issues they ought to work on
... but this patch seems like it could be quite a large can of worms,
and I'm not detecting very much urgency about getting it fixed.
If it's not to be reverted then some significant effort needs to be
put into it soon.


My overly naive trust in non-regression tests to catch any issues has been
largely proven wrong. Three key features do not have a single test. Sigh.


I'll have some time to look at it over the next weekend, but not before.

--
Fabien.




Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Jeevan Ladhe
Thanks Amul, this looks pretty straightforward. LGTM.
I have also run the regression tests on master and it seems good.

Regards,
Jeevan Ladhe


Remove redundant variable from transformCreateStmt

2021-04-15 Thread Amul Sul
Hi,

Attached patch removes "is_foreign_table" from transformCreateStmt()
since it already has cxt.isforeign that can serve the same purpose.

Regards,
Amul
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 9dd30370dae..97e6d65158c 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -176,7 +176,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	Oid			namespaceid;
 	Oid			existing_relid;
 	ParseCallbackState pcbstate;
-	bool		is_foreign_table = IsA(stmt, CreateForeignTableStmt);
 
 	/*
 	 * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
@@ -334,7 +333,7 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	/*
 	 * Postprocess check constraints.
 	 */
-	transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
+	transformCheckConstraints(&cxt, !cxt.isforeign);
 
 	/*
 	 * Postprocess extended statistics.


Re: Truncate in synchronous logical replication failed

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 4:30 PM Japin Li  wrote:
>
>
> On Thu, 15 Apr 2021 at 18:22, Amit Kapila  wrote:
> > On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila  wrote:
> >>
> >> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
> >>  wrote:
> >> >
> >> > > On 12 Apr 2021, at 08:58, Amit Kapila  wrote:
> >> > >
> >> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
> >> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> >> > > information which locks the required indexes. Now, because TRUNCATE
> >> > > has already acquired an exclusive lock on the index, it seems to
> >> > > create a sort of deadlock where the actual Truncate operation waits
> >> > > for logical replication of operation to complete and logical
> >> > > replication waits for actual Truncate operation to finish.
> >> > >
> >> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> >> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> >> > > relation, we just scan the system table and build that information
> >> > > using a historic snapshot. Can't we do something similar here?
> >> > >
> >> > > Adding Petr J. and Peter E. to know their views as this seems to be an
> >> > > old problem (since the decoding of Truncate operation is introduced).
> >> >
> >> > We used RelationGetIndexAttrBitmap because it already existed, no other 
> >> > reason.
> >> >
> >>
> >> Fair enough. But I think we should do something about it because using
> >> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
> >> logical replication. I think this is broken since the logical
> >> replication of Truncate is supported.
> >>
> >> > I am not sure what exact locking we need but I would have guessed at 
> >> > least AccessShareLock would be needed.
> >> >
> >>
> >> Are you telling that we need AccessShareLock on the index? If so, why
> >> is it different from how we access the relation during decoding
> >> (basically in ReorderBufferProcessTXN, we directly use
> >> RelationIdGetRelation() without any lock on the relation)? I think we
> >> do it that way because we need it to process WAL entries and we need
> >> the relation state based on the historic snapshot, so, even if the
> >> relation is later changed/dropped, we are fine with the old state we
> >> got. Isn't the same thing applies here in logicalrep_write_attrs? If
> >> that is true then some equivalent of RelationGetIndexAttrBitmap where
> >> we use RelationIdGetRelation instead of index_open should work?
> >>
> >
> > Today, again I have thought about this and don't see a problem with
> > the above idea. If the above understanding is correct, then I think
> > for our purpose in pgoutput, we just need to call RelationGetIndexList
> > and then build the idattr list for relation->rd_replidindex.
>
> Sorry, I don't know how we can build the idattr without opening the index.
> If we have to open the index, then we should use NoLock, since the TRUNCATE
> side holds AccessExclusiveLock. As Osumi points out in [1], the NoLock mode
> assumes that the appropriate lock on the index is already taken.
>

Why can't we use RelationIdGetRelation() by passing the required
indexOid to it?


-- 
With Regards,
Amit Kapila.




Re: Forget close an open relation in ReorderBufferProcessTXN()

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 4:00 PM Japin Li  wrote:
>
> The RelationIdGetRelation() comment says:
>
> > Caller should eventually decrement count. (Usually,
> > that happens by calling RelationClose().)
>
> However, it doesn't do it in ReorderBufferProcessTXN().
> I think we should close it, here is a patch that fixes it. Thoughts?
>

+1. Your fix looks correct to me but can we test it in some way?

-- 
With Regards,
Amit Kapila.




Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows

2021-04-15 Thread Andrew Dunstan


On 4/15/21 12:57 AM, Michael Paquier wrote:
> On Wed, Apr 14, 2021 at 09:26:19PM -0400, Andrew Dunstan wrote:
>> Well, let me try it on fairywren tomorrow. Since we manage this on the
>> buildfarm without any use at all of Win32API::File it might not be
>> necessary in TAP code either, particularly if we're not truncating the file.
> Thanks.  If that proves to not be necessary, +1 to remove this code.
> I have been playing with this stuff, and the attached patch seems to
> work properly on Windows.  On top of that, I have also tested the
> non-Win32 path on an MSVC box to see that it was working, but my
> environment is not really noisy usually with such compatibility
> issues.


Reviewing the history, I don't want to undo 114541d58e5. So I'm trying
your patch.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replication slot stats misgivings

2021-04-15 Thread Masahiko Sawada
On Thu, Apr 15, 2021 at 6:16 PM Amit Kapila  wrote:
>
> On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada  wrote:
> >
> > On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  wrote:
> > >
> > Thank you for updating the patch.
> >
> > I have one question on the doc change:
> >
> > +so the counter is not incremented for subtransactions. Note that this
> > +includes the transactions streamed and or spilled.
> > +   
> >
> > The patch uses the sentence "streamed and or spilled" in two places.
> > You meant “streamed and spilled”? Even if it actually means “and or”,
> > is using "and or” (i.e., connecting “and” to “or” by a space) common?
> > I could not find it used in other places in the docs, but found we're
> > using "and/or" instead.
> >
>
> I changed it to 'and/or' and made another minor change.


Thank you for the update! The patch looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Truncate in synchronous logical replication failed

2021-04-15 Thread Japin Li


On Thu, 15 Apr 2021 at 18:22, Amit Kapila  wrote:
> On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila  wrote:
>>
>> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
>>  wrote:
>> >
>> > > On 12 Apr 2021, at 08:58, Amit Kapila  wrote:
>> > >
>> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
>> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
>> > > information which locks the required indexes. Now, because TRUNCATE
>> > > has already acquired an exclusive lock on the index, it seems to
>> > > create a sort of deadlock where the actual Truncate operation waits
>> > > for logical replication of operation to complete and logical
>> > > replication waits for actual Truncate operation to finish.
>> > >
>> > > Do we really need to use RelationGetIndexAttrBitmap() to build
>> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
>> > > relation, we just scan the system table and build that information
>> > > using a historic snapshot. Can't we do something similar here?
>> > >
>> > > Adding Petr J. and Peter E. to know their views as this seems to be an
>> > > old problem (since the decoding of Truncate operation is introduced).
>> >
>> > We used RelationGetIndexAttrBitmap because it already existed, no other 
>> > reason.
>> >
>>
>> Fair enough. But I think we should do something about it because using
>> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
>> logical replication. I think this is broken since the logical
>> replication of Truncate is supported.
>>
>> > I am not sure what exact locking we need but I would have guessed at least 
>> > AccessShareLock would be needed.
>> >
>>
>> Are you telling that we need AccessShareLock on the index? If so, why
>> is it different from how we access the relation during decoding
>> (basically in ReorderBufferProcessTXN, we directly use
>> RelationIdGetRelation() without any lock on the relation)? I think we
>> do it that way because we need it to process WAL entries and we need
>> the relation state based on the historic snapshot, so, even if the
>> relation is later changed/dropped, we are fine with the old state we
>> got. Isn't the same thing applies here in logicalrep_write_attrs? If
>> that is true then some equivalent of RelationGetIndexAttrBitmap where
>> we use RelationIdGetRelation instead of index_open should work?
>>
>
> Today, again I have thought about this and don't see a problem with
> the above idea. If the above understanding is correct, then I think
> for our purpose in pgoutput, we just need to call RelationGetIndexList
> and then build the idattr list for relation->rd_replidindex.

Sorry, I don't know how we can build the idattr without opening the index.
If we have to open the index, then we should use NoLock, since the TRUNCATE
side holds AccessExclusiveLock. As Osumi points out in [1], the NoLock mode
assumes that the appropriate lock on the index is already taken.

Please let me know if I have misunderstood.

[1] 
https://www.postgresql.org/message-id/OSBPR01MB488834BDBD67BFF2FB048B3DED4E9%40OSBPR01MB4888.jpnprd01.prod.outlook.com

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-15 Thread Julien Rouhaud
On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote:
> 
> One question here is whether it's correct that the domain's dependency
> on collation "aa_DJ" is unversioned.  Maybe that's intentional, but it
> seems worth asking.

This is intentional I think: we should record the collation version only for
objects that might break if the collation version is updated.  So creating an
index on that domain would record the collation version.
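
A hypothetical illustration of that rule (object names are made up and an
ICU build is assumed; this is not taken from the thread):

CREATE COLLATION mycoll (provider = icu, locale = 'und');
CREATE DOMAIN d_text AS text COLLATE mycoll CHECK (value <> '');
-- the domain's dependency on mycoll is recorded without a version
CREATE TABLE t (v d_text);
CREATE INDEX t_v_idx ON t (v);
-- the index's dependency on mycoll is recorded with the collation version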

> Anyway, there are two pretty obvious bugs in the dependencies for the
> domain's CHECK constraint: the version for collation mycoll leaks
> into the entry for function foo, and an entirely useless (because
> unversioned) dependency is recorded on the default collation.

Agreed.

> (To be clear: 0002 passes check-world as-is, while 0001 is not
> committable without some regression-test fiddling.)

I'm probably missing something obvious but both 0001 and 0002 pass check-world
for me, on a glibc box and --with-icu.

> Thoughts?

I think this is an open item, so I added one for now.




Forget close an open relation in ReorderBufferProcessTXN()

2021-04-15 Thread Japin Li


The RelationIdGetRelation() comment says:

> Caller should eventually decrement count. (Usually,
> that happens by calling RelationClose().)

However, it doesn't do it in ReorderBufferProcessTXN().
I think we should close it, here is a patch that fixes it. Thoughts?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.


diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..aac6ffc602 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2261,7 +2261,10 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							elog(ERROR, "could not open relation with OID %u", relid);
 
 						if (!RelationIsLogicallyLogged(relation))
+						{
+							RelationClose(relation);
 							continue;
+						}
 
 						relations[nrelations++] = relation;
 					}




Re: Truncate in synchronous logical replication failed

2021-04-15 Thread Amit Kapila
On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila  wrote:
>
> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
>  wrote:
> >
> > > On 12 Apr 2021, at 08:58, Amit Kapila  wrote:
> > >
> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> > > information which locks the required indexes. Now, because TRUNCATE
> > > has already acquired an exclusive lock on the index, it seems to
> > > create a sort of deadlock where the actual Truncate operation waits
> > > for logical replication of operation to complete and logical
> > > replication waits for actual Truncate operation to finish.
> > >
> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> > > relation, we just scan the system table and build that information
> > > using a historic snapshot. Can't we do something similar here?
> > >
> > > Adding Petr J. and Peter E. to know their views as this seems to be an
> > > old problem (since the decoding of Truncate operation is introduced).
> >
> > We used RelationGetIndexAttrBitmap because it already existed, no other 
> > reason.
> >
>
> Fair enough. But I think we should do something about it because using
> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
> logical replication. I think this is broken since the logical
> replication of Truncate is supported.
>
> > I am not sure what exact locking we need but I would have guessed at least 
> > AccessShareLock would be needed.
> >
>
> Are you telling that we need AccessShareLock on the index? If so, why
> is it different from how we access the relation during decoding
> (basically in ReorderBufferProcessTXN, we directly use
> RelationIdGetRelation() without any lock on the relation)? I think we
> do it that way because we need it to process WAL entries and we need
> the relation state based on the historic snapshot, so, even if the
> relation is later changed/dropped, we are fine with the old state we
> got. Isn't the same thing applies here in logicalrep_write_attrs? If
> that is true then some equivalent of RelationGetIndexAttrBitmap where
> we use RelationIdGetRelation instead of index_open should work?
>

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex. We can
then cache it in relation->rd_idattr. I am not sure if it is really
required to do all the other work in RelationGetIndexAttrBitmap.
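
To sketch the idea (a rough outline with an assumed function name, not an
actual patch): build idattr from the replica identity index without locking
it, mirroring how decoding opens the main relation with
RelationIdGetRelation() under a historic snapshot.

/*
 * Rough sketch only.  Needs utils/rel.h, utils/relcache.h,
 * nodes/bitmapset.h and access/sysattr.h in a real implementation.
 */
static Bitmapset *
get_idattrs_without_lock(Relation relation)
{
	Bitmapset  *idattrs = NULL;
	Relation	indexrel;
	int			i;

	/* RelationGetIndexList() also sets up relation->rd_replidindex */
	(void) RelationGetIndexList(relation);
	if (!OidIsValid(relation->rd_replidindex))
		return NULL;

	/* open the index descriptor without taking a lock on it */
	indexrel = RelationIdGetRelation(relation->rd_replidindex);
	if (!RelationIsValid(indexrel))
		return NULL;

	/* collect the key columns of the replica identity index */
	for (i = 0; i < indexrel->rd_index->indnkeyatts; i++)
	{
		int			attrnum = indexrel->rd_index->indkey.values[i];

		/* attrnum == 0 means an expression column with no plain attribute */
		if (attrnum != 0)
			idattrs = bms_add_member(idattrs,
									 attrnum - FirstLowInvalidHeapAttributeNumber);
	}

	RelationClose(indexrel);
	return idattrs;
}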

-- 
With Regards,
Amit Kapila.




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-15 Thread Luc Vlaming

On 15-04-2021 04:01, James Coleman wrote:

On Wed, Apr 14, 2021 at 5:42 PM James Coleman  wrote:


On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
 wrote:


On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.



Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.


To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)



Thanks, I'll see if I can reproduce that with your schema.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


The query in question is:

select  count(*)
 from store_sales
 ,household_demographics
 ,time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
 and ss_hdemo_sk = household_demographics.hd_demo_sk
 and ss_store_sk = s_store_sk
 and time_dim.t_hour = 15
 and time_dim.t_minute >= 30
 and household_demographics.hd_dep_count = 7
 and store.s_store_name = 'ese'
 order by count(*)
 limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
 Gather(store_sales household_demographics time_dim) rows=60626 cost=3145.73..699910.15
   HashJoin(store_sales household_demographics time_dim) rows=25261 cost=2145.73..692847.55
     clauses: store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
     HashJoin(store_sales time_dim) rows=252609 cost=1989.73..692028.08
       clauses: store_sales.ss_sold_time_sk = time_dim.t_time_sk
       SeqScan(store_sales) rows=11998564 cost=0.00..658540.64
       SeqScan(time_dim) rows=1070 cost=0.00..1976.35
     SeqScan(household_demographics) rows=720 cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.


This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.
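
For illustration, the extra check might look roughly like this (a sketch
using assumed variable names such as "em" and "exprlist"; this is not the
attached patch):

/*
 * Sketch only: mirror the requirement in prepare_sort_from_pathkeys that
 * every Var (treating Aggrefs, WindowFuncs and PlaceHolderVars as Vars)
 * used by the candidate sort expression already appears among the
 * relation's target expressions.
 */
List	   *exprvars;
ListCell   *lc;
bool		usable = true;

exprvars = pull_var_clause((Node *) em->em_expr,
						   PVC_INCLUDE_AGGREGATES |
						   PVC_INCLUDE_WINDOWFUNCS |
						   PVC_INCLUDE_PLACEHOLDERS);
foreach(lc, exprvars)
{
	if (!list_member(exprlist, lfirst(lc)))
	{
		usable = false;		/* a needed input is missing; skip this em_expr */
		break;
	}
}
list_free(exprvars);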

James

1: 
https://www.postgresql.org/message-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_%3DNS9GWXuNiFANg%40mail.gmail.com



Hi,

The patch seems to make the planner proceed and not error out anymore. 
Cannot judge if it's doing the right thing however or if it's enough :) 
It works for me for all reported 

Re: Replication slot stats misgivings

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada  wrote:
>
> On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  wrote:
> >
> Thank you for updating the patch.
>
> I have one question on the doc change:
>
> +so the counter is not incremented for subtransactions. Note that this
> +includes the transactions streamed and or spilled.
> +   
>
> The patch uses the sentence "streamed and or spilled" in two places.
> You meant “streamed and spilled”? Even if it actually means “and or”,
> is using "and or” (i.e., connecting “and” to “or” by a space) common?
> I could not find it used in other places in the docs, but found we're
> using "and/or" instead.
>

I changed it to 'and/or' and made another minor change.


-- 
With Regards,
Amit Kapila.


v12-0001-Add-information-of-total-data-processed-to-repli.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2021-04-15 Thread shawn wang
wenjing wrote on Thu, Apr 15, 2021 at 3:26 PM:

> HI Pavel
>
> I added user documentation.
> Please give me feedback.
>
>
> Wenjing
>
>
Hi, Wenjing,

I have checked your documentation section, fixed a spelling mistake,
and adjusted some sentences for you.
All the modified content is in the new patch; please check it.

Regards

Shawn


0002-gtt-v47-doc.patch
Description: Binary data


Re: Extensions not dumped when --schema is used

2021-04-15 Thread Guillaume Lelarge
On Thu, Apr 15, 2021 at 09:58, Michael Paquier wrote:

> On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote:
> > The name "without_extension_explicit_schema" arose because that test
> differs
> > from the "without_extension" test by adding --schema=public.  The test
> named
> > "without_extension_implicit_schema" differs from "without_extension" by
> adding
> > --schema=regress_pg_dump_schema, so the word "implicit" feels
> not-descriptive
> > of the test.  I recommend picking a different name.  Other than that, the
> > change looks good.
>
> Thanks for the review.  I have picked up "internal" instead, as
> that's the schema created within the extension itself, and applied the
> patch.
>

Thanks for the work on this. I didn't understand everything about the issue,
which is why I didn't say anything, but I followed the thread and very much
appreciated the fix.


-- 
Guillaume.


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-15 Thread Julien Rouhaud
On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Thanks Bruce.  It looks good to me.




Re: Extensions not dumped when --schema is used

2021-04-15 Thread Michael Paquier
On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote:
> The name "without_extension_explicit_schema" arose because that test differs
> from the "without_extension" test by adding --schema=public.  The test named
> "without_extension_implicit_schema" differs from "without_extension" by adding
> --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive
> of the test.  I recommend picking a different name.  Other than that, the
> change looks good.

Thanks for the review.  I have picked up "internal" instead, as
that's the schema created within the extension itself, and applied the
patch.
--
Michael




Re: [PATCH] expand the units that pg_size_pretty supports on output

2021-04-15 Thread Michael Paquier
On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote:
> Enclosed is a patch that expands the unit output for
> pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
> numeric output code to account for the larger number of units we're using
> rather than just adding nesting levels.
> 
> There are also a few other places that could benefit from expanded units,
> but this is a standalone starting point.

Please don't forget to add this patch to the next commit fest of July
if you want to get some reviews:
https://commitfest.postgresql.org/33/

Note that the development of Postgres 14 is over, and that there was a
feature freeze last week, but this can be considered for 15.
--
Michael




Re: Problems around compute_query_id

2021-04-15 Thread Julien Rouhaud
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> I think we should simply document that %Q is not compatible with
> log_statements.

Hearing no objection I documented that limitation.
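
For reference, the kind of configuration where this limitation becomes
visible might look like this (illustrative values only, not from the patch):

compute_query_id = on
log_line_prefix = '%m [%p] %q%u@%d/%a queryid=%Q '
log_statement = 'all'            # entries logged this way will show queryid=0
log_min_duration_statement = 0   # entries logged at end of execution carry the real ID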

> 
> > While making the feature run on some test server, I have noticed that
> > %Q would log some garbage query ID for autovacuum workers that's kept
> > around.  That looks wrong.
> 
> I've not been able to reproduce it, do you have some hint on how to do it?
> 
> Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
> the problem?

It turns out that the problem was simply that some processes can inherit a
PgBackendStatus for which a previous backend reported a queryid.  Processes
like autovacuum workers never report a new identifier, so they kept
reporting the previous one.  Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.
From eb8a90b88d8572b110535c6350830cf58632fecd Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 15 Apr 2021 15:28:03 +0800
Subject: [PATCH v1] Additional fixes for compute_query_id.

Statements logged by log_statement parameter always get a zero queryid.  This
is a limitation due to how log_statement is implemented, as it logs all wanted
syntactically valid statements before an identifier can be calculated,
including invalid statements such as

SELECT not_a_column;

for which an identifier can't be calculated, so document that limitation.

It was also possible that a new process ends up emitting logs with an incorrect
query identifier if the previous PgBackendStatus user reported an identifier as
the pgstat field wasn't correctly reset.

Author: Julien Rouhaud
Reported-by: Fujii Masao
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/yhpku8hfi4no4...@paquier.xyz
---
 doc/src/sgml/config.sgml| 10 ++
 src/backend/utils/activity/backend_status.c |  1 +
 2 files changed, 11 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 776ab1a8c8..d5735b72c8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7139,6 +7139,16 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 
 

+
+   
+
+ The %Q escape will always report a zero identifier
+ when used together with  as this
+ parameter logs statements before an identifier can be calculated,
+ including invalid statements for which an identifier cannot be
+ calculated.
+
+   
   
  
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 6110113e56..80ffcdf6de 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -398,6 +398,7 @@ pgstat_bestart(void)
 	lbeentry.st_state = STATE_UNDEFINED;
 	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
 	lbeentry.st_progress_command_target = InvalidOid;
+	lbeentry.st_queryid = UINT64CONST(0);
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
-- 
2.30.1



Re: Replication slot stats misgivings

2021-04-15 Thread Masahiko Sawada
On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila  wrote:
>
> On Wed, Apr 14, 2021 at 5:52 PM vignesh C  wrote:
> >
>
> I have made minor changes to the 0001 and 0002 patches. Attached is
> the combined patch for them, I think we can push them as one patch.
> Changes made are (a) minor editing in comments, (b) changed the
> condition when to report stats such that unless we have processed any
> bytes, we shouldn't send those, (c) removed some unrelated changes
> from 0002, (d) ran pgindent.
>
> Let me know what you think of the attached?

Thank you for updating the patch.

I have one question on the doc change:

+so the counter is not incremented for subtransactions. Note that this
+includes the transactions streamed and or spilled.
+   

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
is using "and or” (i.e., connecting “and” to “or” by a space) common?
I could not find it used in other places in the docs, but found we're
using "and/or" instead.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Replication slot stats misgivings

2021-04-15 Thread Amit Kapila
On Thu, Apr 15, 2021 at 12:45 PM vignesh C  wrote:
>
> On Thu, Apr 15, 2021 at 11:52 AM Amit Kapila  wrote:
> >
> > On Wed, Apr 14, 2021 at 5:52 PM vignesh C  wrote:
> > >
> >
> > I have made minor changes to the 0001 and 0002 patches. Attached is
> > the combined patch for them, I think we can push them as one patch.
> > Changes made are (a) minor editing in comments, (b) changed the
> > condition when to report stats such that unless we have processed any
> > bytes, we shouldn't send those, (c) removed some unrelated changes
> > from 0002, (d) ran pgindent.
> >
> > Let me know what you think of the attached?
>
> Changes look fine to me, the patch applies neatly and make check-world passes.
>

Thanks! Sawada-San, others, unless you have any suggestions, I am
planning to push
v11-0001-Add-information-of-total-data-processed-to-repli.patch
tomorrow.


-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-15 Thread vignesh C
On Thu, Apr 15, 2021 at 11:52 AM Amit Kapila  wrote:
>
> On Wed, Apr 14, 2021 at 5:52 PM vignesh C  wrote:
> >
>
> I have made minor changes to the 0001 and 0002 patches. Attached is
> the combined patch for them, I think we can push them as one patch.
> Changes made are (a) minor editing in comments, (b) changed the
> condition when to report stats such that unless we have processed any
> bytes, we shouldn't send those, (c) removed some unrelated changes
> from 0002, (d) ran pgindent.
>
> Let me know what you think of the attached?

Changes look fine to me, the patch applies neatly and make check-world passes.

Regards,
Vignesh




Re: logical replication empty transactions

2021-04-15 Thread Ajin Cherian
On Thu, Apr 15, 2021 at 1:29 PM Ajin Cherian  wrote:

>
> I've rebased the patch and made changes so that the patch supports
> "streaming in-progress transactions" and handling of logical decoding
> messages (transactional and non-transactional).
> I see that this patch not only makes sure that empty transactions are not
> sent but also calls OutputPluginUpdateProgress when an empty
> transaction is skipped; as a result the confirmed_flush_lsn keeps
> moving. I also see no hangs when synchronous_standby is configured.
> Do let me know your thoughts on this patch.
>
>
Removed some debug logs and typos.
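
For context, the core of the approach is roughly the following (a sketch
with assumed names such as PGOutputTxnData and sent_begin_txn; the actual
patch differs in details):

/*
 * Rough sketch only: BEGIN is deferred until the first change, and at
 * COMMIT of a transaction that sent nothing we skip the COMMIT message
 * but still call OutputPluginUpdateProgress() so confirmed_flush_lsn
 * keeps advancing.
 */
static void
pgoutput_commit_txn_sketch(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
						   XLogRecPtr commit_lsn)
{
	PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;

	if (txndata && !txndata->sent_begin_txn)
	{
		/* empty transaction: report progress instead of sending COMMIT */
		OutputPluginUpdateProgress(ctx);
		return;
	}

	OutputPluginPrepareWrite(ctx, true);
	logicalrep_write_commit(ctx->out, txn, commit_lsn);
	OutputPluginWrite(ctx, true);
}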

regards,
Ajin Cherian
Fujitsu Australia


v3-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: Replication slot stats misgivings

2021-04-15 Thread Amit Kapila
On Wed, Apr 14, 2021 at 5:52 PM vignesh C  wrote:
>

I have made minor changes to the 0001 and 0002 patches. Attached is
the combined patch for them, I think we can push them as one patch.
Changes made are (a) minor editing in comments, (b) changed the
condition when to report stats such that unless we have processed any
bytes, we shouldn't send those, (c) removed some unrelated changes
from 0002, (d) ran pgindent.

Let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v11-0001-Add-information-of-total-data-processed-to-repli.patch
Description: Binary data


Re: Can a child process detect postmaster death when in pg_usleep?

2021-04-15 Thread Bharath Rupireddy
On Thu, Apr 15, 2021 at 5:28 AM Thomas Munro  wrote:

Thanks a lot for the detailed explanation.

> On Thu, Apr 15, 2021 at 2:06 AM Bharath Rupireddy
>  wrote:
> > 1) Is it really harmful to use pg_usleep in a postmaster child process
> > as it doesn't let the child process detect postmaster death?
>
> Yeah, that's a bad idea.  Any long-term waiting (including short waits
> in a loop) should ideally be done with the latch infrastructure.

Agreed. Along with short waits in a loop, I think we should also
replace with WaitLatch those pg_usleep calls whose wait time depends on
a user-configurable parameter, like the ones below:

pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
pg_usleep(PostAuthDelay * 1000000L);
pg_usleep(CommitDelay);

> > 4) Are there any places where we need to replace pg_usleep with
> > WaitLatch/equivalent of pg_sleep to detect the postmaster death
> > properly?
>
> We definitely have replaced a lot of sleeps with latch.c primitives
> over the past few years, since we got WL_EXIT_ON_PM_DEATH and
> condition variables.  There may be many more to improve...  You
> mentioned autovacuum... yeah, Stephen fixed one of these with commit
> 4753ef37, but yeah it's not great to have those others in there...

I have not looked at the commit 4753ef37 previously, but it
essentially addresses the problem with pg_usleep for vacuum delay. I'm
thinking we can also replace pg_usleep in the below places, based on the
fact that pg_usleep should be avoided for 1) short waits in a loop and 2)
waits whose time depends on user-configurable parameters. And using
WaitLatch may require us to add wait event types to WaitEventTimeout
enum, but that's okay.

1) pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L); in lazy_truncate_heap
2) pg_usleep(CommitDelay); in XLogFlush
3) pg_usleep(10000L); in CreateCheckPoint
4) pg_usleep(1000000L); in do_pg_stop_backup
5) pg_usleep(1000L); in read_local_xlog_page
6) pg_usleep(PostAuthDelay * 1000000L); in AutoVacLauncherMain,
AutoVacWorkerMain, StartBackgroundWorker, InitPostgres
7) pg_usleep(100000L); in RequestCheckpoint
8) pg_usleep(1000000L); in pgarch_ArchiverCopyLoop
9) pg_usleep(PGSTAT_RETRY_DELAY * 1000L); in backend_read_statsfile
10) pg_usleep(PreAuthDelay * 1000000L); in BackendInitialize
11) pg_usleep(10000L); in WalSndWaitStopping
12) pg_usleep(standbyWait_us); in WaitExceedsMaxStandbyDelay
13) pg_usleep(10000L); in RegisterSyncRequest

I'm sure we won't be changing all of the above places. It would be
good to review and correct the above list.
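
For illustration, the kind of replacement being discussed might look like
this (a minimal sketch; WAIT_EVENT_PG_SLEEP is borrowed here merely as a
placeholder, a dedicated wait event may be preferable):

/* Instead of, e.g., pg_usleep(PostAuthDelay * 1000000L); */
(void) WaitLatch(MyLatch,
				 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				 PostAuthDelay * 1000L,		/* timeout in milliseconds */
				 WAIT_EVENT_PG_SLEEP);		/* placeholder wait event */
ResetLatch(MyLatch);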

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com