Re: Higher level questions around shared memory stats

2022-03-31 Thread Kyotaro Horiguchi
At Thu, 31 Mar 2022 22:12:25 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2022-03-31 20:06:07 -0700, David G. Johnston wrote:
> > Do we really think no one has taken our advice in the documentation and
> > moved their stats_temp_directory to a RAM-based file system?
> 
> I'm pretty sure some have, I've seen it in the field in the past.

Oh. But there is no problem as long as none of the extensions
enumerated upthread are used there.  If one of them is used, the DBA
would have seen some files generated under pg_stat_tmp..

> > The question is whether current uses of PG_STAT_TMP_DIR can be made to use
> > the value of the GUC without them knowing or caring about the fact we
> > changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever
> > the current value of stats_temp_directory is.  I take it from the compiler
> > directive of #define that this isn't doable.
> 
> Correct, we can't.
> 
> 
> > If we later want to coerce extensions (and even our own code) to use a
> > user-supplied directory we can then remove the define and give them an API
> > to use instead.
> 
> FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit
> further work, a number of such extensions could use the shared memory stats
> interface to store their additional stats. And they wouldn't have to care
> about where the files get stored.

pg_stat_statements moved its stored query texts from shared memory to
a file as a trade-off between memory efficiency and speed.  So fast
storage could give a benefit there.  I'm not sure the file is flushed
that frequently, though.

But, as Andres says, currently it *is* stored in the data directory
and no one seems to be complaining.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: psql - add SHOW_ALL_RESULTS option

2022-03-31 Thread Fabien COELHO



Attached a rebase.


Again, after the SendQuery refactoring extraction.

--
Fabien.
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..8f7f93172a 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -50,8 +50,28 @@ BEGIN \;
 SELECT 2.0 AS "float" \;
 SELECT 'world' AS "text" \;
 COMMIT;
+ float 
+---
+   2.0
+(1 row)
+
+ text  
+---
+ world
+(1 row)
+
 -- compound with empty statements and spurious leading spacing
 \;\;   SELECT 3 + 3 \;\;\;   SELECT ' ' || ' !' \;\;   SELECT 1 + 4 \;;
+ ?column? 
+--
+6
+(1 row)
+
+ ?column? 
+--
+   !
+(1 row)
+
  ?column? 
 --
 5
@@ -61,6 +81,11 @@ COMMIT;
 SELECT 1 + 1 + 1 AS "add" \gset
 SELECT :add + 1 + 1 AS "add" \;
 SELECT :add + 1 + 1 AS "add" \gset
+ add 
+-
+   5
+(1 row)
+
 -- set operator
 SELECT 1 AS i UNION SELECT 2 ORDER BY i;
  i 
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index caabb06c53..f01adb1fd2 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql
commands included in the string to divide it into multiple
transactions.  (See 
for more details about how the server handles multi-query strings.)
-   Also, psql only prints the
-   result of the last SQL command in the string.
-   This is different from the behavior when the same string is read from
-   a file or fed to psql's standard input,
-   because then psql sends
-   each SQL command separately.
   
   
-   Because of this behavior, putting more than one SQL command in a
-   single -c string often has unexpected results.
-   It's better to use repeated -c commands or feed
-   multiple commands to psql's standard input,
+   If having several commands executed in one transaction is not desired, 
+   use repeated -c commands or feed multiple commands to
+   psql's standard input,
either using echo as illustrated above, or
via a shell here-document, for example:
 
@@ -3570,10 +3563,6 @@ select 1\; select 2\; select 3;
 commands included in the string to divide it into multiple
 transactions.  (See 
 for more details about how the server handles multi-query strings.)
-psql prints only the last query result
-it receives for each request; in this example, although all
-three SELECTs are indeed executed, psql
-only prints the 3.
 
 
   
@@ -4160,6 +4149,18 @@ bar
   
 
   
+SHOW_ALL_RESULTS
+
+
+When this variable is set to off, only the last
+result of a combined query (\;) is shown instead of
+all of them.  The default is on.  The off behavior
+is for compatibility with older versions of psql.
+
+
+  
+
+   
 SHOW_CONTEXT
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index c9847c8f9a..c84688e89c 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -32,9 +32,10 @@
 
 static bool DescribeQuery(const char *query, double *elapsed_msec);
 static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec);
-static bool ExecQueryAndProcessResult(const char *query, double *elapsed_msec, bool *svpt_gone_p);
 static bool command_no_begin(const char *query);
 static bool is_select_command(const char *query);
+static int SendQueryAndProcessResults(const char *query, double *pelapsed_msec,
+	bool is_watch, const printQueryOpt *opt, FILE *printQueryFout, bool *tx_ended);
 
 
 /*
@@ -355,7 +356,7 @@ CheckConnection(void)
  * Returns true for valid result, false for error state.
  */
 static bool
-AcceptResult(const PGresult *result)
+AcceptResult(const PGresult *result, bool show_error)
 {
 	bool		OK;
 
@@ -386,7 +387,7 @@ AcceptResult(const PGresult *result)
 break;
 		}
 
-	if (!OK)
+	if (!OK && show_error)
 	{
 		const char *error = PQerrorMessage(pset.db);
 
@@ -474,6 +475,18 @@ ClearOrSaveResult(PGresult *result)
 	}
 }
 
+/*
+ * Consume all results
+ */
+static void
+ClearOrSaveAllResults()
+{
+	PGresult	*result;
+
+	while ((result = PQgetResult(pset.db)) != NULL)
+		ClearOrSaveResult(result);
+}
+
 
 /*
  * Print microtiming output.  Always print raw milliseconds; if the interval
@@ -574,7 +587,7 @@ PSQLexec(const char *query)
 
 	ResetCancelConn();
 
-	if (!AcceptResult(res))
+	if (!AcceptResult(res, true))
 	{
 		ClearOrSaveResult(res);
 		res = NULL;
@@ -597,11 +610,8 @@ int
 PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout)
 {
 	bool		timing = pset.timing;
-	PGresult   *res;
 	double		elapsed_msec = 0;
-	instr_time	before;
-	instr_time	after;
-	FILE	   *fout;
+	int			res;
 
 	if (!pset.db)
 	{
@@ 

Re: Error "initial slot snapshot too large" in create replication slot

2022-03-31 Thread Kyotaro Horiguchi
At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov  
wrote in 
> On Mon, 07/02/2022 at 13:52 +0530, Dilip Kumar wrote:
> > Right. I think the patch looks fine to me.
> > 
> 
> Good day.
> 
> I've looked at the patch. Personally I'd prefer to dynamically resize
> the xip array. But I think there is an issue with upgrade if the replica
> source is upgraded before the destination, right?

I don't think that is relevant.  I don't think we convey snapshots via
streaming replication.  But I think that expanding xip or subxip is
wrong, since they are tied to the ProcArray structure (even though we
abuse the arrays in some situations, like this).

> Concerning patch, I think more comments should be written about new
> usage case for `takenDuringRecovery`. May be this field should be renamed
> at all?

I don't feel much need to rename it. It just signals that "this
snapshot is in the form used for recovery".  The more significant reason
is that I can't come up with a better name :p

And the comment is slightly modified and gains a pointer to the
detailed comment.

+	 * Although this snapshot is not actually taken during recovery, all XIDs
+	 * are stored in subxip.  See GetSnapshotData for the details of that form
+	 * of snapshot.


> And there are checks for `takenDuringRecovery` in `heapgetpage` and
> `heapam_scan_sample_next_tuple`. Are this checks affected by the change?
> Neither the preceding discussion nor commit message answer me.

The snapshot works correctly, but in the heapgetpage case it forces
all_visible to be false.  That unnecessarily prevents the visibility
check from being skipped.

An annoying thing in SnapBuildInitialSnapshot is that we don't know
the number of xids before looping over the xid range, and we don't
want to bother sorting top-level xids and subxids unless we have to do
so.

Would it be better to go to the trouble, in SnapBuildInitialSnapshot,
of creating a !takenDuringRecovery snapshot?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: logical decoding and replication of sequences

2022-03-31 Thread Amit Kapila
On Fri, Apr 1, 2022 at 10:22 AM Masahiko Sawada  wrote:
>
> On Sat, Mar 26, 2022 at 6:56 PM Tomas Vondra
>  wrote:
> > >>
> > >> But there's a more serious issue, I think. So far, we allowed this:
> > >>
> > >>   BEGIN;
> > >>   CREATE SEQUENCE s2;
> > >>   ALTER PUBLICATION p ADD SEQUENCE s2;
> > >>   INSERT INTO seq_test SELECT nextval('s2') FROM generate_series(1,100);
> > >>   COMMIT;
> > >>
> > >> and the behavior was that we replicated the changes. But with the patch
> > >> applied, that no longer happens, because should_apply_changes_for_rel
> > >> says the change should not be applied.
> > >>
> > >> And after thinking about this, I think that's correct - we can't apply
> > >> changes until ALTER SUBSCRIPTION ... REFRESH PUBLICATION gets executed,
> > >> and we can't do that until the transaction commits.
> > >>
> > >> So I guess that's correct, and the current behavior is a bug.
> > >>
> > >
> > > Yes, I also think that is a bug.
> > >
> >
> > OK
>
> I also think that this is a bug. Given this behavior is a bug and
> newly-added sequence data should be replicated only after ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION, is there any case where the
> sequence message applied on the subscriber is transactional?
>

It could be required for ALTER SEQUENCE, as that can also rewrite the
relfilenode. However, IIUC, there is a bigger problem with the
non-transactional sequence implementation, as it can cause an
inconsistent replica. See the problem description and test case in my
previous email [1] ("While thinking about this, I think I see a problem
with the non-transactional handling of sequences"). Can you please
check that and let me know if I am missing something there? If not,
then I think we may need to first think of a solution for
non-transactional sequence handling.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KAFdQEULk%2B4C%3DieWA5UPSUtf_gtqKsFj9J9f2c%3D8hm4g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas

2022-03-31 Thread Michael Paquier
On Thu, Mar 31, 2022 at 12:09:35PM +0200, Matthias van de Meent wrote:
> PageInit MAXALIGNs the size of the special area that it receives as an
> argument; so any changes to the page header that would misalign the
> value would be AM-specific; in which case it is quite unlikely that
> this is the right accessor for your page's special area.

Right.  I'd still be tempted to keep that per-AM rather than making
the checks deeper with one extra macro layer in the page header or
with a different macro that would depend on the opaque type, though.
Like in the attached, for example.
--
Michael
diff --git a/src/include/access/ginblock.h b/src/include/access/ginblock.h
index 9347f464f3..050fff80dc 100644
--- a/src/include/access/ginblock.h
+++ b/src/include/access/ginblock.h
@@ -108,7 +108,11 @@ typedef struct GinMetaPageData
 /*
  * Macros for accessing a GIN index page's opaque data
  */
-#define GinPageGetOpaque(page) ( (GinPageOpaque) PageGetSpecialPointer(page) )
+#define GinPageGetOpaque(page) \
+( \
+	AssertMacro(PageGetSpecialSize(page) == MAXALIGN(sizeof(GinPageOpaqueData))), \
+	(GinPageOpaque) PageGetSpecialPointer(page) \
+)
 
 #define GinPageIsLeaf(page)( (GinPageGetOpaque(page)->flags & GIN_LEAF) != 0 )
 #define GinPageSetLeaf(page)   ( GinPageGetOpaque(page)->flags |= GIN_LEAF )
diff --git a/src/include/access/gist.h b/src/include/access/gist.h
index a3337627b8..a10caf8ae9 100644
--- a/src/include/access/gist.h
+++ b/src/include/access/gist.h
@@ -164,7 +164,11 @@ typedef struct GISTENTRY
 	bool		leafkey;
 } GISTENTRY;
 
-#define GistPageGetOpaque(page) ( (GISTPageOpaque) PageGetSpecialPointer(page) )
+#define GistPageGetOpaque(page) \
+( \
+	AssertMacro(PageGetSpecialSize(page) == MAXALIGN(sizeof(GISTPageOpaqueData))), \
+	(GISTPageOpaque) PageGetSpecialPointer(page) \
+)
 
 #define GistPageIsLeaf(page)	( GistPageGetOpaque(page)->flags & F_LEAF)
 #define GIST_LEAF(entry) (GistPageIsLeaf((entry)->page))
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index da372841c4..b442ddc4c8 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -85,7 +85,11 @@ typedef struct HashPageOpaqueData
 
 typedef HashPageOpaqueData *HashPageOpaque;
 
-#define HashPageGetOpaque(page) ((HashPageOpaque) PageGetSpecialPointer(page))
+#define HashPageGetOpaque(page) \
+( \
+	AssertMacro(PageGetSpecialSize(page) == MAXALIGN(sizeof(HashPageOpaqueData))), \
+	(HashPageOpaque) PageGetSpecialPointer(page) \
+)
 
 #define H_NEEDS_SPLIT_CLEANUP(opaque)	(((opaque)->hasho_flag & LH_BUCKET_NEEDS_SPLIT_CLEANUP) != 0)
 #define H_BUCKET_BEING_SPLIT(opaque)	(((opaque)->hasho_flag & LH_BUCKET_BEING_SPLIT) != 0)
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 93f8267b48..e11839857d 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -70,7 +70,11 @@ typedef struct BTPageOpaqueData
 
 typedef BTPageOpaqueData *BTPageOpaque;
 
-#define BTPageGetOpaque(page) ((BTPageOpaque) PageGetSpecialPointer(page))
+#define BTPageGetOpaque(page) \
+( \
+	AssertMacro(PageGetSpecialSize(page) == MAXALIGN(sizeof(BTPageOpaqueData))), \
+	(BTPageOpaque) PageGetSpecialPointer(page) \
+)
 
 /* Bits defined in btpo_flags */
 #define BTP_LEAF		(1 << 0)	/* leaf page, i.e. not internal page */
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index eb56b1c6b8..d61433b5af 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -75,7 +75,12 @@ typedef SpGistPageOpaqueData *SpGistPageOpaque;
 #define SPGIST_LEAF			(1<<2)
 #define SPGIST_NULLS		(1<<3)
 
-#define SpGistPageGetOpaque(page) ((SpGistPageOpaque) PageGetSpecialPointer(page))
+#define SpGistPageGetOpaque(page) \
+( \
+	AssertMacro(PageGetSpecialSize(page) == MAXALIGN(sizeof(SpGistPageOpaqueData))), \
+	(SpGistPageOpaque) PageGetSpecialPointer(page) \
+)
+
 #define SpGistPageIsMeta(page) (SpGistPageGetOpaque(page)->flags & SPGIST_META)
 #define SpGistPageIsDeleted(page) (SpGistPageGetOpaque(page)->flags & SPGIST_DELETED)
 #define SpGistPageIsLeaf(page) (SpGistPageGetOpaque(page)->flags & SPGIST_LEAF)




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-31 Thread Fabien COELHO



Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, 
skipped, retried retries?


Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?


My 0.02€: I'm not sure how the official guidelines are to be interpreted 
in that case, but if the design is to be changed, ISTM that it is better 
to do it before a release than to let the release go out with one format 
and change it in the next release.


--
Fabien.

Re: unnecessary (same) restart_lsn handing in LogicalIncreaseRestartDecodingForSlot

2022-03-31 Thread Amit Kapila
On Wed, Mar 16, 2022 at 5:02 PM Ashutosh Bapat
 wrote:
>
> Hi All,
> At the beginning of LogicalIncreaseRestartDecodingForSlot(), we have code
> ```
> 1623 /* don't overwrite if have a newer restart lsn */
> 1624 if (restart_lsn <= slot->data.restart_lsn)
> 1625 {
> 1626 }
> 1627
> 1628 /*
> 1629  * We might have already flushed far enough to directly
> accept this lsn,
> 1630  * in this case there is no need to check for existing candidate LSNs
> 1631  */
> 1632 else if (current_lsn <= slot->data.confirmed_flush)
> 1633 {
> 1634 slot->candidate_restart_valid = current_lsn;
> 1635 slot->candidate_restart_lsn = restart_lsn;
> 1636
> 1637 /* our candidate can directly be used */
> 1638 updated_lsn = true;
> 1639 }
> ```
> This code avoids directly writing slot's persistent data multiple
> times if the restart_lsn does not change between successive running
> transactions WAL records and the confirmed flush LSN is higher than
> all of those.
>
> But if the downstream is fast enough to send at least one newer
> confirmed flush between every two running transactions WAL records, we
> end up overwriting slot's restart LSN even if it does not change
> because of following code
> ```
> 1641 /*
> 1642  * Only increase if the previous values have been applied, otherwise 
> we
> 1643  * might never end up updating if the receiver acks too
> slowly. A missed
> 1644  * value here will just cause some extra effort after reconnecting.
> 1645  */
> 1646 if (slot->candidate_restart_valid == InvalidXLogRecPtr)
> 1647 {
> 1648 slot->candidate_restart_valid = current_lsn;
> 1649 slot->candidate_restart_lsn = restart_lsn;
> 1650 SpinLockRelease(&slot->mutex);
> 1651
> 1652 elog(LOG, "got new restart lsn %X/%X at %X/%X",
> 1653  LSN_FORMAT_ARGS(restart_lsn),
> 1654  LSN_FORMAT_ARGS(current_lsn));
> 1655 }
> ```
>

Are you worried that in corner cases we will update the persistent
copy of the slot with the same restart_lsn multiple times?

AFAICS, we update the persistent copy via LogicalConfirmReceivedLocation(),
which is called only when 'updated_lsn' is set, and that doesn't get
set in the if check (slot->candidate_restart_valid ==
InvalidXLogRecPtr) you quoted. It is not very clear to me from your
email what exactly your concern is, so I might be missing something.

-- 
With Regards,
Amit Kapila.




Re: Higher level questions around shared memory stats

2022-03-31 Thread Andres Freund
Hi,

On 2022-03-31 20:06:07 -0700, David G. Johnston wrote:
> Do we really think no one has taken our advice in the documentation and
> moved their stats_temp_directory to a RAM-based file system?

I'm pretty sure some have, I've seen it in the field in the past.


> The question is whether current uses of PG_STAT_TMP_DIR can be made to use
> the value of the GUC without them knowing or caring about the fact we
> changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever
> the current value of stats_temp_directory is.  I take it from the compiler
> directive of #define that this isn't doable.

Correct, we can't.


> If we later want to coerce extensions (and even our own code) to use a
> user-supplied directory we can then remove the define and give them an API
> to use instead.

FWIW, it's not quite there yet (as in, not a goal for 15), but with a bit
further work, a number of such extensions could use the shared memory stats
interface to store their additional stats. And they wouldn't have to care
about where the files get stored.

Greetings,

Andres Freund




Re: logical decoding and replication of sequences

2022-03-31 Thread Masahiko Sawada
On Sat, Mar 26, 2022 at 6:56 PM Tomas Vondra
 wrote:
>
>
>
> On 3/26/22 08:28, Amit Kapila wrote:
> > On Fri, Mar 25, 2022 at 10:20 PM Tomas Vondra
> >  wrote:
> >>
> >> Hmm, so fixing this might be a bit trickier than I expected.
> >>
> >> Firstly, currently we only send nspname/relname in the sequence message,
> >> not the remote OID or schema. The idea was that for sequences we don't
> >> really need schema info, so this seemed OK.
> >>
> >> But should_apply_changes_for_rel() needs LogicalRepRelMapEntry, and to
> >> create/maintain that those records we need to send the schema.
> >>
> >> Attached is a WIP patch does that.
> >>
> >> Two places need more work, I think:
> >>
> >> 1) maybe_send_schema needs ReorderBufferChange, but we don't have that
> >> for sequences, we only have TXN. I created a simple wrapper, but maybe
> >> we should just tweak maybe_send_schema to use TXN.
> >>
> >> 2) The transaction handling in is a bit confusing. The non-transactional
> >> increments won't have any explicit commit later, so we can't just rely
> >> on begin_replication_step/end_replication_step. But I want to try
> >> spending a bit more time on this.
> >>
> >
> > I didn't understand what you want to say in point (2).
> >
>
> My point is that handle_apply_sequence() either needs to use the same
> transaction handling as other apply methods, or start (and commit) a
> separate transaction for the "transactional" case.
>
> Which means we can't use the begin_replication_step/end_replication_step
> and the current code seems a bit complex. And I'm not sure it's quite
> correct. So this place needs more work.
>
> >>
> >> But there's a more serious issue, I think. So far, we allowed this:
> >>
> >>   BEGIN;
> >>   CREATE SEQUENCE s2;
> >>   ALTER PUBLICATION p ADD SEQUENCE s2;
> >>   INSERT INTO seq_test SELECT nextval('s2') FROM generate_series(1,100);
> >>   COMMIT;
> >>
> >> and the behavior was that we replicated the changes. But with the patch
> >> applied, that no longer happens, because should_apply_changes_for_rel
> >> says the change should not be applied.
> >>
> >> And after thinking about this, I think that's correct - we can't apply
> >> changes until ALTER SUBSCRIPTION ... REFRESH PUBLICATION gets executed,
> >> and we can't do that until the transaction commits.
> >>
> >> So I guess that's correct, and the current behavior is a bug.
> >>
> >
> > Yes, I also think that is a bug.
> >
>
> OK

I also think that this is a bug. Given this behavior is a bug and
newly-added sequence data should be replicated only after ALTER
SUBSCRIPTION ... REFRESH PUBLICATION, is there any case where the
sequence message applied on the subscriber is transactional?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2022-03-31 Thread Pavel Stehule
On Thu, Mar 31, 2022 at 23:12, Tom Lane  wrote:

> Pavel Stehule  writes:
> > I am sending updated patch
>
> After studying the list of exposed functions for awhile, it seemed
> to me that we should also expose exec_assign_value.  The new pointers
> allow a plugin to compute a value in Datum+isnull format, but then it
> can't do much of anything with it: exec_assign_expr is a completely
> inconvenient API if what you want to do is put a specific Datum
> value into a variable.  Adding exec_assign_value provides "store"
> and "fetch" APIs that are more or less inverses, which should be
> easier to work with.
>
> So I did that and pushed it.
>

great

Thank you

Pavel


>
> regards, tom lane
>


Re: head fails to build on SLES 12 (wal_compression=zstd)

2022-03-31 Thread Michael Paquier
On Thu, Mar 31, 2022 at 09:10:00PM -0400, Tom Lane wrote:
> In short, I think we should push Justin's version-check patch,
> and also fix the SGML docs to say that we require zstd >= 1.4.0.

1.4.0 was released in April 2019, just 3 years ago.  It does not sound
that bad to me to make this version number a requirement for 15~ if
one wants to use zstd.
--
Michael




Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Peter Geoghegan
On Thu, Mar 31, 2022 at 9:08 PM Dilip Kumar  wrote:
> But with the conveyor belt
> we remember the conveyor belt pageno upto which we have done the index
> vacuum and then we only need to do vacuum for the remaining tids which
> will definitely reduce the index vacuuming passes, right?

Right, exactly -- the index or two that really need to be vacuumed a
lot can have relatively small dead_items arrays.

Other indexes (when eventually vacuumed) will need a larger dead_items
array, with everything we need to get rid of from the index in one big
array. Hopefully this won't matter much. Vacuuming these indexes
should be required infrequently (compared to the bloat-prone indexes).

As I said upthread, when we finally have to perform heap vacuuming
(not heap pruning), it'll probably happen because the heap itself
needs it. We could probably get away with *never* vacuuming certain
indexes on tables prone to non-HOT updates, without that ever causing
index bloat.  But heap line pointer bloat is eventually going to
become a real problem with non-HOT updates, no matter what.

-- 
Peter Geoghegan




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-31 Thread Kyotaro Horiguchi
At Tue, 29 Mar 2022 09:31:42 -0400, Robert Haas  wrote 
in 
> On Tue, Mar 29, 2022 at 9:28 AM Alvaro Herrera  
> wrote:
> > OK, this is a bug that's been open for years.   A fix can be committed
> > after the feature freeze anyway.
> 
> +1

By the way, may I ask how we want to fix this?  The existing recovery
code already sometimes generates just-to-be-deleted files in a real
directory under pg_tblspc, and otherwise skips applying WAL records on
nonexistent heap pages.  It is a "mixed" approach.

1. Stop XLogReadBufferForRedo from creating files in nonexistent
  directories, then remember the failure.  (I'm not sure how big the
  impact is.)


2. Unconditionally create all objects required for recovery to proceed,
  2.1 and ignore the failures, or
  2.2 and remember the failures.

3. Any other?

Option 2 needs to create a real directory under pg_tblspc.  So option 1?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira  wrote:
>
> On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote:
>
> This is exactly our initial analysis and we have tried a patch on
> these lines and it has a noticeable overhead. See [1]. Calling this
> for each change or each skipped change can bring noticeable overhead,
> which is why we decided to call it after a certain threshold (100) of
> skipped changes. Now, surely as mentioned in my previous reply we can
> make it generic such that instead of calling this (update_progress
> function as in the patch) for skipped cases, we call it always. Will
> that make it better?
>
> That's what I have in mind but using a different approach.
>
> > The functions CreateInitDecodingContext and CreateDecodingContext receives 
> > the
> > update_progress function as a parameter. These functions are called in 2
> > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> > WalSndUpdateProgress as a progress function. Case (b) does not have one 
> > because
> > it is not required -- local decoding/communication. There is no custom 
> > update
> > progress routine for each output plugin which leads me to the question:
> > couldn't we encapsulate the update progress call into the callback 
> > functions?
> >
>
> Sorry, I don't get your point. What exactly do you mean by this?
> AFAIS, currently we call this output plugin API in pgoutput functions
> only, do you intend to get it invoked from a different place?
>
> It seems I didn't make myself clear. The callbacks I'm referring to are
> the *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a
> *_cb_wrapper() function, we have something like:
>
> if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
> NewUpdateProgress(ctx, false);
>
> The NewUpdateProgress function would contain a logic similar to the
> update_progress() from the proposed patch. (A different function name here 
> just
> to avoid confusion.)
>
> The output plugin is responsible for setting ctx->progress with the
> callback flags (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb())
> for which we would like NewUpdateProgress to run.
>

This sounds like a conflicting approach to what we currently do.
Currently, OutputPluginUpdateProgress() is called from the xact
related pgoutput functions like pgoutput_commit_txn(),
pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we
follow what you are saying then for some of the APIs like
pgoutput_change/_message/_truncate, we need to set the parameter to
invoke NewUpdateProgress() which will internally call
OutputPluginUpdateProgress(), and for the remaining APIs, we will call
in the corresponding pgoutput_* function. I feel if we want to make it
more generic than the current patch, it is better to directly call
what you are referring to here as NewUpdateProgress() in all remaining
APIs like pgoutput_change/_truncate, etc.

-- 
With Regards,
Amit Kapila.




Re: generic plans and "initial" pruning

2022-03-31 Thread David Rowley
On Fri, 1 Apr 2022 at 16:09, Amit Langote  wrote:
> definition of PlannedStmt says this:
>
> /* 
>  *  PlannedStmt node
>  *
>  * The output of the planner
>
> With the ideas that you've outlined below, perhaps we can frame most
> of the things that the patch wants to do as the planner and the
> plancache changes.  If we twist the above definition a bit to say what
> the plancache does in this regard is part of planning, maybe it makes
> sense to add the initial pruning related fields (nodes, outputs) into
> PlannedStmt.

How about the PartitionPruneInfos go into PlannedStmt as a List
indexed in the way I mentioned and the cache of the results of pruning
in EState?

I think that leaves you adding List *partpruneinfos and Bitmapset
*minimumlockrtis to PlannedStmt, plus the thing you need to cache the
pruning results in to EState.  I'm not very clear on where you should
stash the results of run-time pruning in the meantime, before you can
put them in EState.  You might need to invent some intermediate struct
that gets passed around so you can scribble down some details you're
going to need during execution.

> One question is whether the planner should always pay the overhead of
> initializing this bitmapset?  I mean it's only worthwhile if
> AcquireExecutorLocks() is going to be involved, that is, the plan will
> be cached and reused.

Maybe the Bitmapset for the minimal locks needs to be built with
bms_add_range(NULL, 0, list_length(rtable));  then do
bms_del_members() on the relevant RTIs you find in the listed
PartitionPruneInfos.  That way it's very simple and cheap to do when
there are no PartitionPruneInfos.

> > 4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
> > revived here.  Why don't you just add a partitioned_relids to
> > PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
> > you a Relids of them. PartitionedRelPruneInfo already has an rtindex
> > field, so you just need to bms_add_member whatever that rtindex is.
>
> Hmm, not all Append/MergeAppend nodes in the plan tree may have
> make_partition_pruneinfo() called on them though.

For Append/MergeAppends without run-time pruning you'll want to add
the RTIs to the minimal locking set of RTIs to go into PlannedStmt.
The only things you want to leave out of that are RTIs for the RTEs
that you might run-time prune away during AcquireExecutorLocks().

David




Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Dilip Kumar
On Fri, Apr 1, 2022 at 1:55 AM Robert Haas  wrote:
>

> But having said that, coming back to this with fresh eyes, I think
> Dilip has a really good point here. If the first thing we do at the
> start of every VACUUM is scan the heap in a way that is guaranteed to
> rediscover all of the dead TIDs that we've previously added to the
> conveyor belt plus maybe also new ones, we may as well just forget the
> whole idea of having a conveyor belt at all. At that point we're just
> talking about a system for deciding when to skip index vacuuming, and
> the conveyor belt is a big complicated piece of machinery that stores
> data we don't really need for anything because if we threw it out the
> next vacuum would reconstruct it anyhow and we'd get exactly the same
> results with less code.

After thinking more about this, I see there is some value in
remembering the dead tids in the conveyor belt.  Consider the case
where there are multiple indexes and we do the index vacuum for some
of them while skipping others.  When we next do a complete vacuum
cycle, we will again get all the old dead tids plus the new ones;
without the conveyor belt we might need to perform multiple
index-vacuum passes even for the indexes we had already vacuumed in
the previous cycle (if all the tids do not fit in
maintenance_work_mem).  With the conveyor belt, we remember the
conveyor belt pageno up to which each index has been vacuumed, so we
only need to vacuum the remaining tids, which will definitely reduce
the number of index vacuuming passes, right?

So my stand is: a) for global indexes we definitely need the conveyor
belt to remember the partition-wise dead tids (because after
vacuuming certain partitions, when we go to vacuum the global index we
don't want to rescan all the partitions to get the same dead items);
b) even without global indexes there are advantages to storing dead
items in the conveyor belt, as explained in the previous paragraph.  So
I think it is worth adding the conveyor belt infrastructure.
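A minimal sketch of the bookkeeping being argued for, with invented names
and an integer "belt page" position standing in for the real conveyor belt
machinery: each index remembers how far along the belt it has been
vacuumed, so a later cycle only feeds it the pages added since.

```c
#include <assert.h>

/* Hypothetical model: the conveyor belt accumulates dead-TID pages
 * 0..belt_len-1, and each index remembers the belt page up to which it
 * has already been vacuumed. */
typedef struct IndexVacState
{
    int vacuumed_upto;  /* belt pages [0, vacuumed_upto) already done */
} IndexVacState;

/* How many belt pages still need an index-vacuum pass for this index? */
static int pages_left_to_vacuum(const IndexVacState *idx, int belt_len)
{
    int left = belt_len - idx->vacuumed_upto;
    return left > 0 ? left : 0;
}

/* After vacuuming this index, advance its remembered position. */
static void mark_index_vacuumed(IndexVacState *idx, int belt_len)
{
    idx->vacuumed_upto = belt_len;
}
```

An index that was vacuumed in an earlier cycle only has to process the
belt pages added since, while a skipped index still sees everything.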

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-31 Thread Michael Paquier
On Thu, Mar 31, 2022 at 08:42:41PM -0700, Noah Misch wrote:
> The failure looked like this:
> 
> # Running: diff -q 
> /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump1.sql
>  
> /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump2.sql
> /usr/bin/diff: illegal option -- q
> usage: diff [-bitw] [-c | -e | -f | -h | -n | -u] file1 file2
>diff [-bitw] [-C number | -U number] file1 file2
>diff [-bitw] [-D string] file1 file2
>diff [-bitw] [-c | -e | -f | -h | -n | -u] [-l] [-r] [-s] [-S name] 
> directory1 directory2
> not ok 4 - old and new dump match after pg_upgrade

Ah, thanks for the information!  So the problem was that the first
commit of the patch took the diff command from the MSVC scripts, and
there is no -q on Solaris 11.3.  Using File::Compare should be enough
to fix the problem, then.  Hopefully.
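For reference, the portable behaviour File::Compare provides here, namely a
byte-by-byte comparison with no dependence on the platform's diff(1) flags,
can be sketched in C like this (a hypothetical helper, not PostgreSQL or
buildfarm code):

```c
#include <stdio.h>

/* Return 0 if the two files have identical contents, 1 otherwise
 * (including when either file cannot be opened). */
static int files_differ(const char *path1, const char *path2)
{
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");
    int c1, c2, differ = 1;

    if (f1 && f2)
    {
        /* Walk both files in lockstep until a mismatch or EOF. */
        do
        {
            c1 = fgetc(f1);
            c2 = fgetc(f2);
        } while (c1 == c2 && c1 != EOF);
        differ = (c1 != c2);
    }
    if (f1)
        fclose(f1);
    if (f2)
        fclose(f2);
    return differ;
}
```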
--
Michael




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-31 Thread Michael Paquier
On Fri, Apr 01, 2022 at 10:16:48AM +0900, Michael Paquier wrote:
> Okay, done after an extra round of self-review.  I have finished by
> tweaking a couple of comments, and adjusted further TESTING to explain
> what needs to be done to have a dump compatible with the test.  Let's
> now see what goes wrong.

So, the first reports are published, and the buildfarm is rather cool
on the matter.  wrasse is the only buildfarm member that has reported
a failure, complaining that the generated dumps do not match.  I am
not completely sure what's going on there, so I have applied an extra
patch to get more information from the logs on failures, and switched
the test to use File::Compare::compare() to check whether the dumps
match.  This last part feels safer in the long run, anyway.  A diff
command should be present there, as previous runs used test.sh, so
perhaps this is an issue with its perl.  The next report should tell more.
--
Michael




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-31 Thread Justin Pryzby
On Thu, Mar 31, 2022 at 08:42:41PM -0700, Noah Misch wrote:
> On Fri, Apr 01, 2022 at 10:16:48AM +0900, Michael Paquier wrote:
> > On Thu, Mar 31, 2022 at 09:49:50AM -0400, Tom Lane wrote:
> > > Well, let's go ahead with it and see what happens.  If it's too
> > > much of a mess we can always revert.
> > 
> > Okay, done after an extra round of self-review.  I have finished by
> > tweaking a couple of comments, and adjusted further TESTING to explain
> > what needs to be done to have a dump compatible with the test.  Let's
> > now see what goes wrong.
> 
> The REL_14 buildfarm client did not grab logs from the first failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-01%2001%3A39%3A04
> 
> The failure looked like this:
> 
> # Running: diff -q 
> /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump1.sql
>  
> /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump2.sql
> /usr/bin/diff: illegal option -- q
> usage: diff [-bitw] [-c | -e | -f | -h | -n | -u] file1 file2
>diff [-bitw] [-C number | -U number] file1 file2
>diff [-bitw] [-D string] file1 file2
>diff [-bitw] [-c | -e | -f | -h | -n | -u] [-l] [-r] [-s] [-S name] 
> directory1 directory2
> not ok 4 - old and new dump match after pg_upgrade

Is diff -q defined somewhere?  I can't find it in the postgres sources
nor in the sources for the bf client.

Maybe your bf member could use git diff --exit-code --quiet?

-- 
Justin




Re: Unit tests for SLRU

2022-03-31 Thread Noah Misch
On Thu, Mar 31, 2022 at 05:30:41PM +0300, Aleksander Alekseev wrote:
> I used src/test/modules/test_* modules as an example.

> If time permits, please take a quick look at the patch and let me know
> if I'm moving the right direction. There will be more tests in the
> final version, but I would appreciate an early feedback.

The default place for this kind of test is regress.c, with plain "make check"
calling the regress.c function.  src/test/modules is for things requiring an
extension module or otherwise unable to run through regress.c.




Re: generic plans and "initial" pruning

2022-03-31 Thread Tom Lane
Amit Langote  writes:
> On Fri, Apr 1, 2022 at 10:32 AM David Rowley  wrote:
>> 1. You've changed the signature of various functions by adding
>> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
>> put the ExecLockRelsInfo as a new field in PlannedStmt?

> I'm worried about that churn myself and did consider this idea, though
> I couldn't shake the feeling that it's maybe wrong to put something in
> PlannedStmt that the planner itself doesn't produce.

PlannedStmt is part of the plan tree, which MUST be read-only to
the executor.  This is not negotiable.  However, there's other
places that this data could be put, such as QueryDesc.
Or for that matter, couldn't the data structure be created by
the planner?  (It looks like David is proposing exactly that
further down.)

regards, tom lane




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-31 Thread Noah Misch
On Fri, Apr 01, 2022 at 10:16:48AM +0900, Michael Paquier wrote:
> On Thu, Mar 31, 2022 at 09:49:50AM -0400, Tom Lane wrote:
> > Well, let's go ahead with it and see what happens.  If it's too
> > much of a mess we can always revert.
> 
> Okay, done after an extra round of self-review.  I have finished by
> tweaking a couple of comments, and adjusted further TESTING to explain
> what needs to be done to have a dump compatible with the test.  Let's
> now see what goes wrong.

The REL_14 buildfarm client did not grab logs from the first failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-01%2001%3A39%3A04

The failure looked like this:

# Running: diff -q 
/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump1.sql
 
/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_lPFv/dump2.sql
/usr/bin/diff: illegal option -- q
usage: diff [-bitw] [-c | -e | -f | -h | -n | -u] file1 file2
   diff [-bitw] [-C number | -U number] file1 file2
   diff [-bitw] [-D string] file1 file2
   diff [-bitw] [-c | -e | -f | -h | -n | -u] [-l] [-r] [-s] [-S name] 
directory1 directory2
not ok 4 - old and new dump match after pg_upgrade




Re: generic plans and "initial" pruning

2022-03-31 Thread Amit Langote
Thanks a lot for looking into this.

On Fri, Apr 1, 2022 at 10:32 AM David Rowley  wrote:
> I've been looking over the v8 patch and I'd like to propose semi-baked
> ideas to improve things.  I'd need to go and write them myself to
> fully know if they'd actually work ok.
>
> 1. You've changed the signature of various functions by adding
> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
> put the ExecLockRelsInfo as a new field in PlannedStmt?
>
> I think the above gets around messing the signatures of
> CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
> PortalDefineQuery(), ProcessQuery() It would get rid of your change of
> foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
> rid of a number of places where your carrying around a variable named
> execlockrelsinfo_list. It would also make the patch significantly
> easier to review as you'd be touching far fewer files.

I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce.  I mean the
definition of PlannedStmt says this:

/* ----------------
 *      PlannedStmt node
 *
 * The output of the planner

With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as planner and plancache
changes.  If we twist the above definition a bit to say that what the
plancache does in this regard is part of planning, maybe it makes
sense to add the initial-pruning-related fields (nodes, outputs) into
PlannedStmt.

> 2. I don't really like the way you've gone about most of the patch...
>
> The way I imagine this working is that during create_plan() we visit
> all nodes that have run-time pruning then inside create_append_plan()
> and create_merge_append_plan() we'd tag those onto a new field in
> PlannerGlobal  That way you can store the PartitionPruneInfos in the
> new PlannedStmt field in standard_planner() after the
> makeNode(PlannedStmt).
>
> Instead of storing the PartitionPruneInfo in the Append / MergeAppend
> struct, you'd just add a new index field to those structs. The index
> would start with 0 for the 0th PartitionPruneInfo. You'd basically
> just know the index by assigning
> list_length(root->glob->partitionpruneinfos).
>
> You'd then assign the root->glob->partitionpruneinfos to
> PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
> pruning during execution, you'd need to use the Append / MergeAppend's
> partition_prune_info_idx to lookup the PartitionPruneInfo in some new
> field you add to EState to store those.  You'd leave that index as -1
> if there's no PartitionPruneInfo for the Append / MergeAppend node.
>
> When you do AcquireExecutorLocks(), you'd iterate over the
> PlannedStmt's PartitionPruneInfo to figure out which subplans to
> prune. You'd then have an array sized
> list_length(plannedstmt->runtimepruneinfos) where you'd store the
> result.  When the Append/MergeAppend node starts up you just check if
> the part_prune_info_idx >= 0 and if there's a non-NULL result stored
> then use that result.  That's how you'd ensure you always got the same
> run-time prune result between locking and plan startup.

Actually, Robert too suggested such an idea to me off-list and I think
it's worth trying.  I was not sure about the implementation, because
then we'd be passing around lists of initial pruning nodes/results
across many function/module boundaries that you mentioned in your
comment 1, but if we agree that PlannedStmt is an acceptable place for
those things to be stored, then I agree it's an attractive idea.
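The indexing scheme in the quoted proposal could be sketched roughly like
this; the struct and function names here are hypothetical stand-ins, not
the actual planner/executor data structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Each Append/MergeAppend carries part_prune_info_idx, with -1 meaning
 * "no PartitionPruneInfo".  AcquireExecutorLocks() would fill a result
 * array sized by the number of PruneInfos; node startup then reuses the
 * cached result instead of re-pruning, so both see the same answer. */
#define MAX_PRUNE_INFOS 8

typedef struct PruneCache
{
    bool valid[MAX_PRUNE_INFOS];     /* result computed during locking? */
    int  surviving[MAX_PRUNE_INFOS]; /* e.g. count of surviving subplans */
} PruneCache;

/* Executed once, at AcquireExecutorLocks() time. */
static void cache_prune_result(PruneCache *pc, int idx, int surviving)
{
    pc->valid[idx] = true;
    pc->surviving[idx] = surviving;
}

/* Executed at Append/MergeAppend startup; falls back to "prune now"
 * (here just a caller-supplied value) when nothing was cached. */
static int startup_surviving(const PruneCache *pc, int part_prune_info_idx,
                             int freshly_pruned)
{
    if (part_prune_info_idx >= 0 && pc->valid[part_prune_info_idx])
        return pc->surviving[part_prune_info_idx];
    return freshly_pruned;
}
```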

> 3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
> job to determine the minimum set of relations which must be locked?  I
> think the plan tree traversal during execution is not great.  Seems the
> whole point of this patch is to reduce overhead during execution. A
> full additional plan traversal aside from the 3 that we already do for
> start/run/end of execution seems not great.
>
> I think this means that during AcquireExecutorLocks() you'd start with
> the minimum set of RTEs that need to be locked as determined during
> create_plan() and stored in some Bitmapset field in PlannedStmt.

The patch did have a PlannedStmt.lockrels till v6.  Though, it wasn't
quite the same thing as what you are describing...

> This
> minimal set would also only exclude RTIs that would only possibly be
> used due to a PartitionPruneInfo with initial pruning steps, i.e.
> include RTIs from PartitionPruneInfo with no init pruning steps (you
> can't skip any locks for those).  All you need to do to determine the
> RTEs to lock is to take the minimal set and execute each
> PartitionPruneInfo in the PlannedStmt that has init steps

So just thinking about an Append/MergeAppend, the minimum set must
include the RT indexes of all the partitioned tables whose direct and
indirect children's plans 

Re: Higher level questions around shared memory stats

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 7:33 PM Kyotaro Horiguchi 
wrote:

> At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund 
> wrote in
> > Hi,
> >
> > On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> > > After moving to shared stats, we might want to expose the GUC variable
> > > itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> > > extensions but it is better than keeping using PG_STAT_TMP_DIR for
> > > uncertain reasons. The existence of the macro can be used as the
> > > marker of the feature change.  This is the chance to break the (I
> > > think) bad practice shared among the extensions.  At least I am okay
> > > with that.
> >
> > I don't really understand why we'd want to do it that way round? Why not
> just
> > leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses
> the
> GUC, we're not losing anything, and we'd not break extensions
> unnecessarily?
>
> Yeah, I'm of two minds here.
>
> I thought so and did so in past versions of this patch.  But when
> asked anew, I came to think I might want to keep (and make more use
> of) the configurable aspect of the dbserver's dedicated temporary
> directory.  The change is reliably detectable by extensions and the
> required change is easy.
>
> > Obviously there's no strong demand for pg_stat_statements et al to use
> the
> > user-configurable stats_temp_directory, given they've not done so for
> years
> > without complaints?  The code to support the configurable stats_temp_dir
> isn't
> > huge, but it's not small either.
>
> I even doubt anyone has stats_temp_directory changed in a cluster.
> Thus I agree that it is reasonable that we take advantage of the
> chance to remove the feature of little significance.
>
>
Do we really think no one has taken our advice in the documentation and
moved their stats_temp_directory to a RAM-based file system?

https://www.postgresql.org/docs/current/runtime-config-statistics.html

It doesn't seem right that extensions are making the decision of where to
place their temporary statistics files.  If the user has specified a
directory for them the system should be placing them there.

The question is whether current uses of PG_STAT_TMP_DIR can be made to use
the value of the GUC without them knowing or caring about the fact we
changed PG_STAT_TMP_DIR from a static define for pg_stat_tmp to whatever
the current value of stats_temp_directory is.  I take it from the compiler
directive of #define that this isn't doable.

Given all that I'd say we remove stats_temp_directory (noting it as being
obsolete as only core code was ever given leave to use it and that core
code now uses shared memory instead - basically forcing the choice of
RAM-based file system onto the user).

If we later want to coerce extensions (and even our own code) to use a
user-supplied directory we can then remove the define and give them an API
to use instead.  I suspect such an API would look different than just "here
is a directory name" anyway (e.g., force extensions to use a subdirectory
under the base directory for their data).  We would name the base directory
GUC something like "stats_temp_base_directory" and be happy that
stats_temp_directory isn't sitting there already giving us the stink eye.

David J.


Re: Logical replication timeout problem

2022-03-31 Thread Euler Taveira
On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote:
> This is exactly our initial analysis and we have tried a patch on
> these lines and it has a noticeable overhead. See [1]. Calling this
> for each change or each skipped change can bring noticeable overhead
> that is why we decided to call it after a certain threshold (100) of
> skipped changes. Now, surely as mentioned in my previous reply we can
> make it generic such that instead of calling this (update_progress
> function as in the patch) for skipped cases, we call it always. Will
> that make it better?
That's what I have in mind but using a different approach.

> > The functions CreateInitDecodingContext and CreateDecodingContext receives 
> > the
> > update_progress function as a parameter. These functions are called in 2
> > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> > WalSndUpdateProgress as a progress function. Case (b) does not have one 
> > because
> > it is not required -- local decoding/communication. There is no custom 
> > update
> > progress routine for each output plugin which leads me to the question:
> > couldn't we encapsulate the update progress call into the callback 
> > functions?
> >
> 
> Sorry, I don't get your point. What exactly do you mean by this?
> AFAIS, currently we call this output plugin API in pgoutput functions
> only, do you intend to get it invoked from a different place?
It seems I didn't make myself clear.  The callbacks I'm referring to are
the *_cb_wrapper functions.  After every ctx->callbacks.foo_cb() call
inside a *_cb_wrapper() function, we would have something like:

if (ctx->progress & PGOUTPUT_PROGRESS_FOO)
    NewUpdateProgress(ctx, false);

The NewUpdateProgress function would contain logic similar to
update_progress() from the proposed patch.  (A different function name is
used here just to avoid confusion.)

The output plugin is responsible for setting ctx->progress with the
callback flags (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) for
which we would like NewUpdateProgress to run.
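As a rough illustration of that flag-gated scheme, here is a self-contained
sketch; DecodingCtx, the PROGRESS_* bits, and the counter are invented
stand-ins, not pgoutput's actual symbols:

```c
#include <assert.h>

/* The output plugin sets bits in ctx->progress for the callbacks after
 * which it wants a progress update; each *_cb_wrapper() checks its own
 * bit before calling the update-progress routine. */
#define PROGRESS_CHANGE  (1 << 0)
#define PROGRESS_MESSAGE (1 << 1)

typedef struct DecodingCtx
{
    unsigned progress;     /* which wrappers should report progress */
    int      updates_sent; /* stand-in for OutputPluginUpdateProgress */
} DecodingCtx;

static void update_progress(DecodingCtx *ctx)
{
    ctx->updates_sent++;
}

static void change_cb_wrapper(DecodingCtx *ctx)
{
    /* ... the real wrapper would invoke ctx->callbacks.change_cb() ... */
    if (ctx->progress & PROGRESS_CHANGE)
        update_progress(ctx);
}

static void message_cb_wrapper(DecodingCtx *ctx)
{
    /* ... the real wrapper would invoke ctx->callbacks.message_cb() ... */
    if (ctx->progress & PROGRESS_MESSAGE)
        update_progress(ctx);
}
```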


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Higher level questions around shared memory stats

2022-03-31 Thread Kyotaro Horiguchi
At Thu, 31 Mar 2022 14:04:16 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> > After moving to shared stats, we might want to expose the GUC variable
> > itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> > extensions but it is better than keeping using PG_STAT_TMP_DIR for
> > uncertain reasons. The existence of the macro can be used as the
> > marker of the feature change.  This is the chance to break the (I
> > think) bad practice shared among the extensions.  At least I am okay
> > with that.
> 
> I don't really understand why we'd want to do it that way round? Why not just
> leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
> GUC, we're not losing anything, and we'd not break extensions unnecessarily?

Yeah, I'm of two minds here.

I thought so and did so in past versions of this patch.  But when
asked anew, I came to think I might want to keep (and make more use
of) the configurable aspect of the dbserver's dedicated temporary
directory.  The change is reliably detectable by extensions and the
required change is easy.

> Obviously there's no strong demand for pg_stat_statements et al to use the
> user-configurable stats_temp_directory, given they've not done so for years
> without complaints?  The code to support the configurable stats_temp_dir isn't
> huge, but it's not small either.

I even doubt anyone has stats_temp_directory changed in a cluster.
Thus I agree that it is reasonable to take advantage of this chance to
remove a feature of little significance.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 6:03 PM Robert Haas  wrote:

> On Thu, Mar 31, 2022 at 8:44 PM David G. Johnston
>  wrote:
>
> > The "give the user power" argument is also valid.  But since they
> already have power through unowned sequences, having the owned sequences
> more narrowly defined doesn't detract from usability, and in many ways
> enhances it by further reinforcing the fact that the sequence internally
> used when you say "GENERATED ALWAYS AS IDENTITY" is an implementation
> detail - one that has the same persistence as the table.
>
> I think there's a question about what happens in the GENERATED ALWAYS
> AS IDENTITY case. The DDL commands that create such sequences are of
> the form ALTER TABLE something ALTER COLUMN somethingelse GENERATED
> ALWAYS AS (sequence_parameters), and if we need to specify somewhere
> in the whether the sequence should be logged or unlogged, how do we do
> that?


I give answers for the "owned sequences match their owning table's
persistence" model below:

You would not need to specify it - the table is specified and that is
sufficient to know what value to choose.


> Consider:
>
> rhaas=# create unlogged table xyz (a int generated always as identity);
> CREATE TABLE
> rhaas=# \d+ xyz
>                                             Unlogged table "public.xyz"
>  Column |  Type   | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
> --------+---------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
>  a      | integer |           | not null | generated always as identity | plain   |             |              |
> Access method: heap
>
> rhaas=# \d+ xyz_a_seq
>                    Sequence "public.xyz_a_seq"
>   Type   | Start | Minimum |  Maximum   | Increment | Cycles? | Cache
> ---------+-------+---------+------------+-----------+---------+-------
>  integer |     1 |       1 | 2147483647 |         1 | no      |     1
> Sequence for identity column: public.xyz.a
>
> In this new system, does the user still get a logged sequence?


No


> If they
> get an unlogged sequence, how does dump-and-restore work?


As described in the first response, since ALTER COLUMN is used during
dump-and-restore, the sequence creation occurs in a command where we know
the owning table is unlogged so the created sequence is unlogged.


> What if they
> want to still have a logged sequence?


I was expecting the following to work, though it does not presently:

ALTER SEQUENCE yetanotherthing OWNED BY NONE;
ERROR: cannot change ownership of identity sequence

ALTER SEQUENCE yetanotherthing SET LOGGED;

IMO, the generated case is the stronger one for not allowing them to be
different.  They can fall back onto the DEFAULT
nextval('sequence_that_is_unowned') option to get the desired behavior.

David J.


Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Fri, Apr 1, 2022 at 7:33 AM Euler Taveira  wrote:
>
> On Thu, Mar 31, 2022, at 9:24 AM, Masahiko Sawada wrote:
>
> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
>
> No new callback is required.
>
> In the current code, each output plugin callback is responsible to call
> OutputPluginUpdateProgress. It is up to the output plugin author to add calls
> to this function. The lack of a call in a callback might cause issues like 
> what
> was described in the initial message.
>

This is exactly our initial analysis, and we have tried a patch along
these lines; it has a noticeable overhead.  See [1].  Calling this
for each change or each skipped change can bring noticeable overhead,
which is why we decided to call it only after a certain threshold (100)
of skipped changes.  Now, surely, as mentioned in my previous reply, we
can make it generic such that instead of calling this (the
update_progress function in the patch) only for skipped cases, we call
it always.  Will that make it better?
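The thresholding idea can be sketched like this (hypothetical names; the
actual patch keeps equivalent state inside pgoutput, and the real report
would do walsender keepalive/timeout handling rather than bump a counter):

```c
#include <assert.h>

/* Rather than reporting progress on every skipped change (measurably
 * expensive), only report once every CHANGES_THRESHOLD skipped changes. */
#define CHANGES_THRESHOLD 100

typedef struct ProgressState
{
    int skipped;  /* skipped changes since the last report */
    int reports;  /* stand-in for OutputPluginUpdateProgress calls */
} ProgressState;

static void maybe_update_progress(ProgressState *ps)
{
    if (++ps->skipped >= CHANGES_THRESHOLD)
    {
        ps->reports++;   /* keepalive/timeout handling would happen here */
        ps->skipped = 0;
    }
}
```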

> The functions CreateInitDecodingContext and CreateDecodingContext receives the
> update_progress function as a parameter. These functions are called in 2
> places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
> SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
> WalSndUpdateProgress as a progress function. Case (b) does not have one 
> because
> it is not required -- local decoding/communication. There is no custom update
> progress routine for each output plugin which leads me to the question:
> couldn't we encapsulate the update progress call into the callback functions?
>

Sorry, I don't get your point. What exactly do you mean by this?
AFAIS, currently we call this output plugin API in pgoutput functions
only, do you intend to get it invoked from a different place?

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Peter Geoghegan
On Thu, Mar 31, 2022 at 5:31 PM Robert Haas  wrote:
> > I agree.
>
> But in 
> http://postgr.es/m/ca+tgmoa6kveeurtyeoi3a+ra2xuynwqmj_s-h4kun6-bkmm...@mail.gmail.com
> (and the messages just before and just after it) we seemed to be
> agreeing on a design where that's exactly what happens. It seemed like
> a good idea to me at the time, but now it seems like it's a bad idea,
> because it involves using the conveyor belt in a way that adds no
> value.

There are two types of heap scans (in my mind, at least): those that
prune, and those that VACUUM. While there has traditionally been a 1:1
correspondence between these two scans (barring cases with no LP_DEAD
items whatsoever), that's no longer the case as of Postgres 14, which
added the "bypass index scan in the event of few LP_DEAD items left by
pruning" optimization (or Postgres 12 if you count INDEX_CLEANUP=off).

When I said "I agree" earlier today, I imagined that I was pretty much
affirming everything else that I'd said up until that point of the
email. Which is that the conveyor belt is interesting as a way of
breaking (or even just loosening) dependencies on the *order* in which
we perform work within a given "VACUUM cycle". Things can be much
looser than they are today, with indexes (which we've discussed a lot
already), and even with heap pruning (which I brought up for the first
time just today).

However, I don't see any way that it will be possible to break one
particular ordering dependency, even with the conveyor belt stuff: The
"basic invariant" described in comments above lazy_scan_heap(), which
describes rules about TID recycling -- we can only recycle TIDs when a
full "VACUUM cycle" completes, just like today.

That was a point I was making in the email from back in February:
obviously it's unsafe to do lazy_vacuum_heap_page() processing of a
page until we're already 100% sure that the LP_DEAD items are not
referenced by any indexes, even indexes that have very little bloat
(that don't really need to be vacuumed for their own sake). However,
the conveyor belt can add value by doing much more frequent processing
in lazy_scan_prune() (of different pages each time, or perhaps even
repeat processing of the same heap pages), and much more frequent
index vacuuming for those indexes that seem to need it.

So the lazy_scan_prune() work (pruning and freezing) can and probably
should be separated in time from the index vacuuming (compared to the
current design). Maybe not for all of the indexes -- typically for the
majority, maybe 8 out of 10. We can do much less index vacuuming in
those indexes that don't really need it, in order to be able to do
much more in those that do. At some point we must "complete a whole
cycle of heap vacuuming" by processing, with lazy_vacuum_heap_page(),
all the heap pages that need it.
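The "basic invariant" being described can be modeled in miniature as
follows; the struct and names are invented, and a boolean per index stands
in for the real per-index vacuum state:

```c
#include <assert.h>
#include <stdbool.h>

/* A dead TID may only be recycled (lazy_vacuum_heap_page() processing)
 * once every index has completed an index-vacuum pass covering it, i.e.
 * once the whole VACUUM cycle it belongs to has finished. */
#define NINDEXES 3

typedef struct VacCycle
{
    bool index_done[NINDEXES];
} VacCycle;

static bool can_recycle_tids(const VacCycle *c)
{
    for (int i = 0; i < NINDEXES; i++)
        if (!c->index_done[i])
            return false;  /* some index may still reference the TIDs */
    return true;
}
```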

Separately, the conveyor belt seems to have promise as a way of
breaking up work for multiplexing, or parallel processing.

-- 
Peter Geoghegan




Re: Logical replication timeout problem

2022-03-31 Thread Euler Taveira
On Thu, Mar 31, 2022, at 9:24 AM, Masahiko Sawada wrote:
> The patch basically looks good to me. But the only concern to me is
> that once we get the patch committed, we will have to call
> update_progress() at all paths in callbacks that process changes.
> Which seems poor maintainability.
I didn't like the current fix for the same reason. We need a robust feedback
system for logical replication. We had this discussion in the "skip empty
transactions" thread [1].

> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
No new callback is required.

In the current code, each output plugin callback is responsible for
calling OutputPluginUpdateProgress. It is up to the output plugin author
to add calls to this function. The lack of a call in a callback might
cause issues like what was described in the initial message.

The functions CreateInitDecodingContext and CreateDecodingContext receive
the update_progress function as a parameter. These functions are called in 2
places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b)
SQL logical decoding functions (pg_logical_*_changes). Case (a) uses
WalSndUpdateProgress as a progress function. Case (b) does not have one because
it is not required -- local decoding/communication. There is no custom update
progress routine for each output plugin which leads me to the question:
couldn't we encapsulate the update progress call into the callback functions?
If so, we could have an output plugin parameter to inform which callbacks we
would like to call the update progress routine. This would simplify the code,
make it less error prone and wouldn't impose a burden on maintainability.

[1] 
https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Logical replication timeout problem

2022-03-31 Thread Amit Kapila
On Thu, Mar 31, 2022 at 5:55 PM Masahiko Sawada  wrote:
> On Wed, Mar 30, 2022 at 6:00 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 30, 2022 at 1:24 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tues, Mar 29, 2022 at 9:45 AM I wrote:
> > > > Attach the new patch.
> > >
> > > Rebase the patch because the commit d5a9d86d in current HEAD.
> > >
> >
> > Thanks, this looks good to me apart from a minor indentation change
> > which I'll take care of before committing. I am planning to push this
> > day after tomorrow on Friday unless there are any other major
> > comments.
>
> The patch basically looks good to me. But the only concern to me is
> that once we get the patch committed, we will have to call
> update_progress() at all paths in callbacks that process changes.
> Which seems poor maintainability.
>
> On the other hand, possible another solution would be to add a new
> callback that is called e.g., every 1000 changes so that walsender
> does its job such as timeout handling while processing the decoded
> data in reorderbuffer.c. The callback is set only if the walsender
> does logical decoding, otherwise NULL. With this idea, other plugins
> will also be able to benefit without changes. But I’m not really sure
> it’s a good design, and adding a new callback introduces complexity.
>

Yeah, same here. I have also mentioned another way to expose an API
from reorderbuffer [1], by introducing a skip API, but I am just not
sure whether that API or this one is generic enough to make adding it
worthwhile. Also, note that the current patch makes the progress
recording of large transactions somewhat better when most of the
changes are skipped. We can further extend it to cover other cases as
well, but that can probably be done separately if required, as it is
not needed for this bug fix.

I intend to commit this patch today but I think it is better to wait
for a few more days to see if anybody has any opinion on this matter.
I'll push this on Tuesday unless we decide to do something different
here.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: generic plans and "initial" pruning

2022-03-31 Thread David Rowley
On Thu, 31 Mar 2022 at 16:25, Amit Langote  wrote:
> Rebased.

I've been looking over the v8 patch and I'd like to propose semi-baked
ideas to improve things.  I'd need to go and write them myself to
fully know if they'd actually work ok.

1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?

I think the above gets around messing with the signatures of
CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
PortalDefineQuery() and ProcessQuery(). It would get rid of your change of
foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
rid of a number of places where you're carrying around a variable named
execlockrelsinfo_list. It would also make the patch significantly
easier to review as you'd be touching far fewer files.

2. I don't really like the way you've gone about most of the patch...

The way I imagine this working is that during create_plan() we visit
all nodes that have run-time pruning, then inside create_append_plan()
and create_merge_append_plan() we'd tag those onto a new field in
PlannerGlobal.  That way you can store the PartitionPruneInfos in the
new PlannedStmt field in standard_planner() after the
makeNode(PlannedStmt).

Instead of storing the PartitionPruneInfo in the Append / MergeAppend
struct, you'd just add a new index field to those structs. The index
would start with 0 for the 0th PartitionPruneInfo. You'd basically
just get the index from list_length(root->glob->partitionpruneinfos)
at the time you append the new PartitionPruneInfo.

You'd then assign the root->glob->partitionpruneinfos to
PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
pruning during execution, you'd need to use the Append / MergeAppend's
partition_prune_info_idx to look up the PartitionPruneInfo in some new
field you add to EState to store those.  You'd leave that index as -1
if there's no PartitionPruneInfo for the Append / MergeAppend node.

When you do AcquireExecutorLocks(), you'd iterate over the
PlannedStmt's PartitionPruneInfo to figure out which subplans to
prune. You'd then have an array sized
list_length(plannedstmt->runtimepruneinfos) where you'd store the
result.  When the Append/MergeAppend node starts up you just check if
the part_prune_info_idx >= 0 and if there's a non-NULL result stored
then use that result.  That's how you'd ensure you always got the same
run-time prune result between locking and plan startup.

3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
job to determine the minimum set of relations which must be locked?  I
think the plan tree traversal during execution is not great.  Seems the
whole point of this patch is to reduce overhead during execution. A
full additional plan traversal, aside from the 3 that we already do for
start/run/end of execution, seems not great.

I think this means that during AcquireExecutorLocks() you'd start with
the minimum set of RTEs that need to be locked, as determined during
create_plan() and stored in some Bitmapset field in PlannedStmt. This
minimal set would only exclude RTIs that would only possibly be
used due to a PartitionPruneInfo with initial pruning steps, i.e. it
would include RTIs from PartitionPruneInfos with no init pruning steps
(you can't skip any locks for those).  All you need to do to determine
the RTEs to lock is to take the minimal set and execute each
PartitionPruneInfo in the PlannedStmt that has init steps.

4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here.  Why don't you just add a partitioned_relids field to
PartitionPruneInfo and have make_partitionedrel_pruneinfo build
you a Relids of them?  PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.

It's a fairly high-level review at this stage. I can look in more
detail if the above points get looked at.  You may find or know of
some reason why it can't be done like I mention above.

David




Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages

2022-03-31 Thread Kyotaro Horiguchi
At Thu, 31 Mar 2022 08:45:56 -0700, Nathan Bossart  
wrote in 
> At all times, PostgreSQL maintains a
> write ahead log (WAL) in the 
> pg_wal/
> -   subdirectory of the cluster's data directory. The log records
> -   every change made to the database's data files.  This log exists
> +   subdirectory of the cluster's data directory. The WAL records
> +   capture every change made to the database's data files.  This log exists
> 
> I don't think this change really adds anything.  The preceding sentence
> makes it clear that we are discussing the write-ahead log, and IMO the
> change in phrasing ("the log records every change" is changed to "the
> records capture every change") subtly changes the meaning of the sentence.
> 
> The rest looks good to me. 

+1.  It is not the compound noun "log records" there.

The original sentence is "S(The log) V(records) O(every change that is
made to .. files)".  The proposed change turns it into "S(The log
records) V(capture) O(every .. files)".  In that sense, the original
one seems rather correct to me, since "capture" seems, to me, to have
the implication of "write after log..".


I looked through the document and found other uses of "log
record|segment".  What do you think about the attached?

There are some uncertain points in the change.

  you should at least save the contents of the cluster's 
pg_wal
- subdirectory, as it might contain logs which
+ subdirectory, as it might contain WAL files which
  were not archived before the system went down.

Here "logs" actually means "WAL segment (files)", but the concept of a
"segment" is out of focus in the context, so just "file" is used
there.  The same change is applied in dozens of places.


-   disk-space requirements for the WAL logs are met,
+   disk-space requirements for the WAL are met,

This might be better as "WAL files" instead of just "WAL".


-   WAL logs are stored in the directory
+   WAL is stored in the directory
pg_wal under the data directory, as a set of

I'm not sure which is better: using "WAL" as a collective noun, or
"WAL files" for the concrete objects.


-   The aim of WAL is to ensure that the log is
+   The aim of WAL is to ensure that the WAL record is
written before database records are altered, but this can be subverted by

This is not a mechanical change.  But I think this is correct.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index dd8640b092..941042f646 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1246,7 +1246,7 @@ SELECT pg_stop_backup();
  require that you have enough free space on your system to hold two
  copies of your existing database. If you do not have enough space,
  you should at least save the contents of the cluster's 
pg_wal
- subdirectory, as it might contain logs which
+ subdirectory, as it might contain WAL files which
  were not archived before the system went down.
 

@@ -1324,8 +1324,8 @@ SELECT pg_stop_backup();
 which tells PostgreSQL how to retrieve archived
 WAL file segments.  Like the archive_command, this is
 a shell command string.  It can contain %f, which is
-replaced by the name of the desired log file, and %p,
-which is replaced by the path name to copy the log file to.
+replaced by the name of the desired WAL file, and %p,
+which is replaced by the path name to copy the WAL file to.
 (The path name is relative to the current working directory,
 i.e., the cluster's data directory.)
 Write %% if you need to embed an actual 
%
@@ -1651,9 +1651,9 @@ archive_command = 'local_backup_script.sh "%p" "%f"'
  CREATE 
TABLESPACE
  commands are WAL-logged with the literal absolute path, and will
  therefore be replayed as tablespace creations with the same
- absolute path.  This might be undesirable if the log is being
+ absolute path.  This might be undesirable if the WAL is being
  replayed on a different machine.  It can be dangerous even if the
- log is being replayed on the same machine, but into a new data
+ WAL is being replayed on the same machine, but into a new data
  directory: the replay will still overwrite the contents of the
  original tablespace.  To avoid potential gotchas of this sort,
  the best practice is to take a new base backup after creating or
@@ -1670,11 +1670,11 @@ archive_command = 'local_backup_script.sh "%p" "%f"'
 we might need to fix partially-written disk pages.  Depending on
 your system hardware and software, the risk of partial writes might
 be small enough to ignore, in which case you can significantly
-reduce the total volume of archived logs by turning off page
+reduce the total volume of archived WAL files by turning off page
 snapshots using the 
 parameter.  (Read the notes and warnings in 
 before you do so.)  Turning off page snapshots does 

Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-31 Thread Michael Paquier
On Thu, Mar 31, 2022 at 09:49:50AM -0400, Tom Lane wrote:
> Well, let's go ahead with it and see what happens.  If it's too
> much of a mess we can always revert.

Okay, done after an extra round of self-review.  I have finished by
tweaking a couple of comments, and adjusted further TESTING to explain
what needs to be done to have a dump compatible with the test.  Let's
now see what goes wrong.
--
Michael




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 6:03 PM Robert Haas  wrote:

>
> In this new system, does the user still get a logged sequence? If they
> get an unlogged sequence, how does dump-and-restore work? What if they
> want to still have a logged sequence? But for sequences that are
> simply owned, there is no problem here, and I think that inventing one
> would not be a good plan.
>

There is no design problem here, just coding (including special handling
for pg_upgrade).  When making a sequence owned, we can, without requiring
any new syntax, choose to change its persistence to match that of the owning
table.  Or not.  These are basically options 1 and 2 I laid out earlier:

https://www.postgresql.org/message-id/CAKFQuwY6GsC1CvweCkgaYi-%2BHNF2F-fqCp8JpdFK9bk18gqzFA%40mail.gmail.com

I slightly favor 1, treating owned sequences as implementation details with
a matched persistence mode.  But we seem to be leaning toward option 2 per
subsequent emails. I'm ok with that - just give me an easy way to change
all my upgraded logged sequences to unlogged.  And probably a way to do the
same if I change my table's mode as well.
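
For what it's worth, that bulk conversion could be scripted with a catalog
query, assuming the patch's ALTER SEQUENCE ... SET UNLOGGED form (a sketch
only; the exact syntax is whatever the patch settles on):

```sql
-- Sketch: assumes the patch adds ALTER SEQUENCE ... SET UNLOGGED.
-- Flips every permanent (logged) sequence in the database to unlogged.
DO $$
DECLARE
    seq regclass;
BEGIN
    FOR seq IN
        SELECT c.oid::regclass
        FROM pg_class c
        WHERE c.relkind = 'S'          -- sequences only
          AND c.relpersistence = 'p'   -- currently permanent/logged
    LOOP
        EXECUTE format('ALTER SEQUENCE %s SET UNLOGGED', seq);
    END LOOP;
END $$;
```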

That there is less implementation complexity is nice, but the end user won't
see that.  I think the typical end user would appreciate having the
sequence stay in sync with the table instead of having to worry about those
kinds of details.  Hence my slight preference for option 1.

David J.


Re: head fails to build on SLES 12 (wal_compression=zstd)

2022-03-31 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 31, 2022 at 7:38 PM Tom Lane  wrote:
>> Indeed.  I tried building against 1.3.6 (mainly because it was laying
>> around) and the error reported by Devrim is just the tip of the iceberg.
>> ...
>> I wonder whether Robert's ambition to be compatible with old versions
>> extends that far.

> It definitely doesn't, and the fact that there's that much difference
> in the APIs between 2018 and the present frankly makes my heart sink.

AFAICS it's just additions; this stuff is not evidence that they
broke anything.

> It looks like this stuff may be what libzstd calls the "advanced API"
> which was considered unstable until 1.4.0 according to
> https://github.com/facebook/zstd/releases -- so maybe we ought to
> declare anything pre-1.4.0. to be unsupported.

That seems like the appropriate answer to me.  I verified that we
build cleanly and pass check-world against 1.4.0, and later I'm
going to set up BF member longfin to use that.  So that will give
us an anchor that we support zstd that far back.  Had we written
this code earlier, maybe we'd have confined ourselves to 1.3.x
features ... but we didn't, and I don't see much value in doing
so now.

In short, I think we should push Justin's version-check patch,
and also fix the SGML docs to say that we require zstd >= 1.4.0.

regards, tom lane




Re: unlogged sequences

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 8:44 PM David G. Johnston
 wrote:
> It seems reasonable to extend the definition of "ownership of a sequence" in 
> this way.  We always let you create unowned sequences with whatever 
> persistence you like if you need flexibility.

I'd say it doesn't seem to have any benefit, and therefore seems
unreasonable. Right now, OWNED BY is documented as a way of getting
the sequence to automatically be dropped if the table column goes
away. If it already did five things, maybe you could argue that this
thing is just like the other five and therefore changing it is the
right idea. But going from one thing to two that don't seem to have
much to do with each other seems much less reasonable, especially
since it doesn't seem to buy anything.

> The "give the user power" argument is also valid.  But since they already 
> have power through unowned sequences, having the owned sequences more 
> narrowly defined doesn't detract from usability, and in many ways enhances it 
> by further reinforcing the fact that the sequence internally used when you 
> say "GENERATED ALWAYS AS IDENTITY" is an implementation detail - one that has 
> the same persistence as the table.

I think there's a question about what happens in the GENERATED ALWAYS
AS IDENTITY case. The DDL commands that create such sequences are of
the form ALTER TABLE something ALTER COLUMN somethingelse GENERATED
ALWAYS AS (sequence_parameters), and if we need to specify somewhere
in there whether the sequence should be logged or unlogged, how do we do
that? Consider:

rhaas=# create unlogged table xyz (a int generated always as identity);
CREATE TABLE
rhaas=# \d+ xyz
                                               Unlogged table "public.xyz"
 Column |  Type   | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 a      | integer |           | not null | generated always as identity | plain   |             |              |
Access method: heap

rhaas=# \d+ xyz_a_seq
                   Sequence "public.xyz_a_seq"
  Type   | Start | Minimum |  Maximum   | Increment | Cycles? | Cache
---------+-------+---------+------------+-----------+---------+-------
 integer |     1 |       1 | 2147483647 |         1 | no      |     1
Sequence for identity column: public.xyz.a

In this new system, does the user still get a logged sequence? If they
get an unlogged sequence, how does dump-and-restore work? What if they
want to still have a logged sequence? But for sequences that are
simply owned, there is no problem here, and I think that inventing one
would not be a good plan.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-31 Thread Tatsuo Ishii
> Alvaro Herrera  writes:
>>> After:
>>> interval_start num_transactions sum_latency sum_latency_2 min_latency 
>>> max_latency
>>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
>>> min_lag max_lag [ skipped ] ] [ retried retries ]
> 
>> I think that the explanatory paragraph is way too long now, particularly
>> since it explains --failures-detailed starting in the middle.  Also, the
>> example output doesn't include the failures-detailed mode.
> 
> I think the problem is not merely one of documentation, but one of
> bad design.  Up to now it was possible to tell what was what from
> counting the number of columns in the output; but with this design,
> that is impossible.  That should be fixed.  The first thing you have
> got to do is drop the alternation { failures | serialization_failures
> deadlock_failures }.  That doesn't make any sense in the first place:
> counting serialization and deadlock failures doesn't make it impossible
> for other errors to occur.  It'd probably make the most sense to have
> three columns always, serialization, deadlock and total.

+1.

> Now maybe
> that change alone is sufficient, but I'm not convinced, because the
> multiple options at the end of the line mean we will never again be
> able to add any more columns without reintroducing ambiguity.  I
> would be happier if the syntax diagram were such that columns could
> only be dropped from right to left.

Or should sum_lag sum_lag_2 min_lag max_lag, skipped, and retried
retries also always be printed, like those three columns?

Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: unlogged sequences

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 8:42 PM Tomas Vondra
 wrote:
> Well, yeah. I did this because the patch was somewhat inconsistent when
> handling owned sequences - it updated persistence for owned sequences
> when persistence for the table changed, expecting to keep them in sync,
> but then it also allowed operations that'd break it.

Oops.

> But that started a discussion about exactly this, and AFAICS there's
> agreement we want to allow the table and owned sequences to have
> different persistence values.
>
> The discussion about the details is still ongoing, but I think it's
> clear we'll ditch the restrictions you point out.

Great.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [WIP] ALTER COLUMN IF EXISTS

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 8:02 PM David G. Johnston
 wrote:
> At present the project seems to largely consider the IF EXISTS/IF NOT EXISTS 
> features to have been largely a mistake and while removing it is not going to 
> happen the desire to change or extend it is not strong.

I like the IF [NOT] EXISTS stuff quite a bit. I wish it had existed
back when I was doing application programming with PostgreSQL. I would
have used it for exactly the sorts of things that Bradley mentions.

I don't know how far it's worth taking this stuff. I dislike the fact
that when you get beyond what you can do with IF [NOT] EXISTS, you're
suddenly thrown into having to write SQL against system catalog
contents which, if you're the sort of person who really likes the IF
[NOT] EXISTS commands, may well be something you don't feel terribly
comfortable doing. It's almost tempting to propose new SQL functions
just for these kinds of scripts. Like instead of adding support
for

  ALTER TABLE myschema.mytable IF EXISTS RENAME IF EXISTS this TO that;

(and I presume you need IF EXISTS twice, once for the table and once
for the column), we could instead make it possible for people to write:

IF pg_table_exists('myschema.mytable') AND
pg_table_has_column('myschema.mytable', 'this') THEN
ALTER TABLE myschema.mytable RENAME this TO that;
END IF;

An advantage of that approach is that you could also do more
complicated things that are never going to work with any number of
IF-EXISTS clauses. For example, imagine you want to rename foo to bar
and bar to baz, unless that's been done already. Well with these
functions you can just do this:

IF pg_table_has_column('mytab', 'foo') THEN
ALTER TABLE mytab RENAME bar TO baz;
ALTER TABLE mytab RENAME foo TO bar;
END;

There's no way to get there with just IF EXISTS.
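
For comparison, the catalog-query version of that rename dance -- what such
users have to write today -- might look roughly like this (a sketch;
mytab/foo/bar/baz are the example names from above):

```sql
-- Sketch of the catalog-based equivalent available today.
-- The renames run only if column "foo" still exists on "mytab".
DO $$
BEGIN
    IF EXISTS (SELECT 1
               FROM information_schema.columns
               WHERE table_name = 'mytab'
                 AND column_name = 'foo') THEN
        ALTER TABLE mytab RENAME COLUMN bar TO baz;
        ALTER TABLE mytab RENAME COLUMN foo TO bar;
    END IF;
END $$;
```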

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 5:25 PM Robert Haas  wrote:

> On Thu, Mar 31, 2022 at 10:14 AM Tomas Vondra
>  wrote:
> > * When linking a sequence to a table (ALTER SEQUENCE ... OWNED BY),
> > there's an ereport(ERROR) if the relpersistence values do not match.
> >
> > * Disallow changing persistence for owned sequences directly.
>
> Wait, what? I don't understand why we would want to do either of these
> things.
>
> It seems to me that it's totally fine to use a logged table with an
> unlogged sequence, or an unlogged table with a logged sequence, or any
> of the other combinations. You get what you ask for, so make sure to
> ask for what you want. And that's it.
>

It seems reasonable to extend the definition of "ownership of a sequence"
in this way.  We always let you create unowned sequences with whatever
persistence you like if you need flexibility.

The "give the user power" argument is also valid.  But since they already
have power through unowned sequences, having the owned sequences more
narrowly defined doesn't detract from usability, and in many ways enhances
it by further reinforcing the fact that the sequence internally used when
you say "GENERATED ALWAYS AS IDENTITY" is an implementation detail - one
that has the same persistence as the table.

David J.


Re: unlogged sequences

2022-03-31 Thread Tomas Vondra
On 4/1/22 02:25, Robert Haas wrote:
> On Thu, Mar 31, 2022 at 10:14 AM Tomas Vondra
>  wrote:
>> * When linking a sequence to a table (ALTER SEQUENCE ... OWNED BY),
>> there's an ereport(ERROR) if the relpersistence values do not match.
>>
>> * Disallow changing persistence for owned sequences directly.
> 
> Wait, what? I don't understand why we would want to do either of these things.
> 
> It seems to me that it's totally fine to use a logged table with an
> unlogged sequence, or an unlogged table with a logged sequence, or any
> of the other combinations. You get what you ask for, so make sure to
> ask for what you want. And that's it.
> 
> If you say something like CREATE [UNLOGGED] TABLE foo (a serial) it's
> fine for serial to attribute the same persistence level to the
> sequence as it does to the table. But when that's dumped, it's going
> to be dumped as a CREATE TABLE command and a CREATE SEQUENCE command,
> each of which has a separate persistence level. So you can recreate
> whatever state you have.
> 

Well, yeah. I did this because the patch was somewhat inconsistent when
handling owned sequences - it updated persistence for owned sequences
when persistence for the table changed, expecting to keep them in sync,
but then it also allowed operations that'd break it.

But that started a discussion about exactly this, and AFAICS there's
agreement we want to allow the table and owned sequences to have
different persistence values.

The discussion about the details is still ongoing, but I think it's
clear we'll ditch the restrictions you point out.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: head fails to build on SLES 12 (wal_compression=zstd)

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 7:38 PM Tom Lane  wrote:
> Indeed.  I tried building against 1.3.6 (mainly because it was laying
> around) and the error reported by Devrim is just the tip of the iceberg.
> With "make -k", I see unknown-symbol failures on
>
> ZSTD_CCtx_setParameter
> ZSTD_c_compressionLevel
> ZSTD_c_nbWorkers
> ZSTD_CCtx_reset
> ZSTD_reset_session_only
> ZSTD_compressStream2
> ZSTD_e_continue
> ZSTD_e_end
>
> I wonder whether Robert's ambition to be compatible with old versions
> extends that far.

It definitely doesn't, and the fact that there's that much difference
in the APIs between 2018 and the present frankly makes my heart sink.

It looks like this stuff may be what libzstd calls the "advanced API"
which was considered unstable until 1.4.0 according to
https://github.com/facebook/zstd/releases -- so maybe we ought to
declare anything pre-1.4.0 to be unsupported. I really hope they're
serious about keeping the advanced API stable, though. I'm excited
about the potential of using libzstd, but I don't want to have to keep
adjusting our code to work with new library versions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 6:43 PM Peter Geoghegan  wrote:
> > The only way the conveyor belt system has any
> > value is if we think that there is some set of circumstances where the
> > heap scan is separated in time from the index vacuum, such that we
> > might sometimes do an index vacuum without having done a heap scan
> > just before.
>
> I agree.

But in 
http://postgr.es/m/ca+tgmoa6kveeurtyeoi3a+ra2xuynwqmj_s-h4kun6-bkmm...@mail.gmail.com
(and the messages just before and just after it) we seemed to be
agreeing on a design where that's exactly what happens. It seemed like
a good idea to me at the time, but now it seems like it's a bad idea,
because it involves using the conveyor belt in a way that adds no
value.

Am I confused here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [WIP] ALTER COLUMN IF EXISTS

2022-03-31 Thread Tom Lane
"David G. Johnston"  writes:
> On Thu, Mar 31, 2022 at 4:39 PM Bradley Ayers 
> wrote:
>> I'm interested in adding more ergonomics to DDL commands, in
>> particular supporting IF EXISTS for ALTER TABLE … ALTER COLUMN, so
>> that if a column doesn't exist the command is skipped.

> At present the project seems to largely consider the IF EXISTS/IF NOT
> EXISTS features to have been largely a mistake and while removing it is not
> going to happen the desire to change or extend it is not strong.

That might be an overstatement.  There's definitely a camp that
doesn't like CREATE IF NOT EXISTS, precisely on the grounds that it's
not idempotent --- success of the command tells you very little about
the state of the object, beyond the fact that some object of that name
now exists.  (DROP IF EXISTS, by comparison, *is* idempotent: success
guarantees that the object now does not exist.  CREATE OR REPLACE
is also idempotent, or at least much closer than IF NOT EXISTS.)
It's not entirely clear to me whether ALTER IF EXISTS could escape any
of that concern, but offhand it seems like it's close to the CREATE
problem.  I do kind of wonder what the use-case for it is, anyway.
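
To illustrate the CREATE IF NOT EXISTS concern concretely (a sketch; the
table name is arbitrary):

```sql
CREATE TABLE IF NOT EXISTS t (a integer);
-- The next command also "succeeds" (with a NOTICE), but it does not make
-- the table match the new definition: t still has only column "a".
CREATE TABLE IF NOT EXISTS t (b text, c numeric);

-- By contrast, DROP IF EXISTS is idempotent: after it succeeds, the
-- object is guaranteed not to exist.
DROP TABLE IF EXISTS t;
DROP TABLE IF EXISTS t;  -- still succeeds, with a NOTICE
```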

One thing to keep in mind is that unlike some other DBMSes, you
can script pretty much any conditional DDL you want in Postgres.
This considerably reduces the pressure to provide conditionalization
built right into the DDL commands.  As a result, we (or at least I)
prefer to offer only the most clearly useful, best-defined cases
as built-in DDL features.  So there's definitely a hurdle that
an ALTER IF EXISTS patch would have to clear before having a chance
of being accepted.

regards, tom lane




Re: unlogged sequences

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 10:14 AM Tomas Vondra
 wrote:
> * When linking a sequence to a table (ALTER SEQUENCE ... OWNED BY),
> there's an ereport(ERROR) if the relpersistence values do not match.
>
> * Disallow changing persistence for owned sequences directly.

Wait, what? I don't understand why we would want to do either of these things.

It seems to me that it's totally fine to use a logged table with an
unlogged sequence, or an unlogged table with a logged sequence, or any
of the other combinations. You get what you ask for, so make sure to
ask for what you want. And that's it.

If you say something like CREATE [UNLOGGED] TABLE foo (a serial) it's
fine for serial to attribute the same persistence level to the
sequence as it does to the table. But when that's dumped, it's going
to be dumped as a CREATE TABLE command and a CREATE SEQUENCE command,
each of which has a separate persistence level. So you can recreate
whatever state you have.
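
Under that model, a dump of CREATE UNLOGGED TABLE foo (a serial) might look
roughly like this (a sketch assuming the patch's UNLOGGED SEQUENCE syntax;
actual pg_dump output will differ in detail):

```sql
-- Each object carries its own persistence, so any combination
-- of logged/unlogged table and sequence can be recreated.
CREATE UNLOGGED SEQUENCE foo_a_seq;

CREATE UNLOGGED TABLE foo (
    a integer NOT NULL DEFAULT nextval('foo_a_seq')
);

ALTER SEQUENCE foo_a_seq OWNED BY foo.a;
```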

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [WIP] ALTER COLUMN IF EXISTS

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 4:39 PM Bradley Ayers 
wrote:

>
> I'm interested in adding more ergonomics to DDL commands, in
> particular supporting IF EXISTS for ALTER TABLE … ALTER COLUMN, so
> that if a column doesn't exist the command is skipped.
>
> IF EXISTS is already supported in various places (e.g. ALTER TABLE …
> ADD COLUMN IF NOT EXISTS, and ALTER TABLE … DROP COLUMN IF EXISTS),
> but it's not available for any of the ALTER COLUMN sub commands.
>

At present the project seems largely to consider the IF EXISTS/IF NOT
EXISTS features to have been a mistake, and while removing them is not
going to happen, the desire to change or extend them is not strong.

If you want to make a go at this, I would suggest not writing any new code
at first, but instead taking inventory of what is already implemented, how
it is implemented, what gaps there are, and making proposals to fill those
gaps.
Write the theory/rules that we follow in our existing (or future)
implementation of this idempotence feature.  Then get agreement to
implement the proposals from enough important people that a well-written
patch would be considered acceptable to commit.
I don't know if any amount of planning and presentation will convince
everyone this is a good idea in theory, let alone one that we want to
maintain while the author goes off to other projects (this being your first
patch, that seems like a reasonable assumption).

I can say you have some community support in this endeavor but - and maybe
this is biasing me - my (fairly recent) attempt at what I considered
bug-fixing in this area was not accepted.  On that note, as part of your
research, you should find the previous email threads on this topic (there
are quite a few, I am sure), and make your own judgements from those.  Aside
from it being my opinion I don't have any information at hand that isn't in
the email archives.

David J.


Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-31 Thread Michail Nikolaev
Hello, David.

Thanks for your review!

> As a specific recommendation here - submit patches with a complete commit 
> message.
> Tweak it for each new version so that any prior discussion that informed the 
> general design of
> the patch is reflected in the commit message.

Yes, agreed. Applied to my other patch [1].

> In terms of having room for bugs this description seems like a lot of logic 
> to have to get correct.

Yes, it is the scary part. But for now it is all contained in a single
function, is_index_lp_dead_maybe_allowed.

> Could we just do this first pass as:
> Enable recovery mode LP_DEAD hint bit updates after the first streamed 
> CHECKPOINT record comes over from the primary.
> ?

Not sure, but yes, it is better to split the patch into more detailed commits.

Thanks,
Michail.

[1]: 
https://www.postgresql.org/message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b260120b3173a66382




Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-31 Thread Michail Nikolaev
Hello, Peter.

> The simple answer is: I don't know. I could probably come up with a
> better answer than that, but it would take real effort, and time.

I remember you had an idea about using the LP_REDIRECT bit in btree
indexes as some kind of “recently dead” flag (1).
Is this idea still in progress? Maybe an additional bit could provide
room for a better solution.

> I think that you could do a better job of explaining and promoting the
> problem that you're trying to solve here. Emphasis on the problem, not
> so much the solution.

The system I am working on depends heavily on the performance of reading
from standbys. In our workloads, queries on a standby are sometimes
10-100x slower than on the primary due to the absence of LP_DEAD support.
Other users have the same issues (2). I believe such functionality is
great optimization for read replicas with both analytics and OLTP
(read-only) workloads.

> You must understand that this whole area is *scary*. The potential for
> serious data corruption bugs is very real. And because the whole area
> is so complicated (it is at the intersection of 2-3 complicated
> areas), we can expect those bugs to be hidden for a long time. We
> might never be 100% sure that we've fixed all of them if the initial
> design is not generally robust. Most patches are not like that.

Moved to “Waiting for Author” for now.

[1]: 
https://www.postgresql.org/message-id/flat/CAH2-Wz%3D-BoaKgkN-MnKj6hFwO1BOJSA%2ByLMMO%2BLRZK932fNUXA%40mail.gmail.com#6d7cdebd68069cc493c11b9732fd2040
[2]: 
https://www.postgresql.org/message-id/flat/20170428133818.24368.33533%40wrigleys.postgresql.org

Thanks,
Michail.




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2022-03-31 Thread Justin Pryzby
On Mon, Mar 14, 2022 at 09:37:25PM -0500, Justin Pryzby wrote:
> The original, minimal goal of this patch was to show shared tempdirs in
> pg_ls_tmpfile() - rather than hiding them misleadingly as currently happens.
> 20200310183037.ga29...@telsasoft.com
> 20200313131232.go29...@telsasoft.com
> 
> I added the metadata function 2 years ago since it's silly to show metadata 
> for
> tmpdir but not other, arbitrary directories.
> 20200310183037.ga29...@telsasoft.com
> 20200313131232.go29...@telsasoft.com
> 20201223191710.gr30...@telsasoft.com

I renamed the CF entry to make the original motive for the patches even
clearer (I'm not maintaining the patch to add the metadata function just to
avoid writing a lateral join).

> > In the whole set, improving the docs as of 0001 makes sense, but the
> > change is incomplete.  Most of 0002 also makes sense and should be
> > stable enough.  I am less enthusiastic about any of the other changes
> > proposed and what we can gain from these parts.
> 
> It is frustrating to hear this feedback now, after the patch has gone through
> multiple rewrites over 2 years - based on other positive feedback and review.
> I went to the effort to ask, numerous times, whether to write the patch and 
> how
> its interfaces should look.  Now, I'm hearing that not only the implementation
> but its goals are wrong.  What should I have done to avoid that ?
> 
> 20200503024215.gj28...@telsasoft.com
> 20191227195918.gf12...@telsasoft.com
> 20200116003924.gj26...@telsasoft.com
> 20200908195126.gb18...@telsasoft.com

Michael said he's not enthusiastic about the patch.  But I haven't heard a
suggestion about how else to address the issue that pg_ls_tmpdir() hides shared
filesets.

On Mon, Mar 28, 2022 at 09:13:52PM -0500, Justin Pryzby wrote:
> On Sat, Mar 26, 2022 at 08:23:54PM +0900, Michael Paquier wrote:
> > On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote:
> > > FWIW, per my review the bit of the patch set that I found the most
> > > relevant is the addition of a note in the docs of pg_stat_file() about
> > > the case where "filename" is a link, because the code internally uses
> > > stat().   The function name makes that obvious, but that's not
> > > commonly known, I guess.  Please see the attached, that I would be
> > > fine to apply.
> > 
> > Hmm.  I am having second thoughts on this one, as on Windows we rely
> > on GetFileInformationByHandle() for the emulation of stat() in
> > win32stat.c, and it looks like this returns some information about the
> > junction point and not the directory or file this is pointing to, it
> > seems.
> 
> Where did you find that ?  What metadata does it return about the junction
> point ?  We only care about a handful of fields.

Pending your feedback, I didn't modify this beyond your original suggestion -
which seemed like a good one.

This also adds some comments you requested and fixes your coding style
complaints, and causes cfbot to test my proposed patch rather than your doc
patch.

-- 
Justin
>From 354e62f07f345de403d1b2894eba98aec44217b5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v35 1/7] Document historic behavior of links to directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4001cb2bda5..d01aeec9f88 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27631,6 +27631,10 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 platforms only), file creation time stamp (Windows only), and a flag
 indicating if it is a directory.

+   <para>
+    If <parameter>filename</parameter> is a link, this function returns
+    information about the file or directory the link refers to.
+   </para>

 This function is restricted to superusers by default, but other users
 can be granted EXECUTE to run the function.
-- 
2.17.1

>From ee202767ae0d6651c552056424b004be2b110ec1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v35 2/7] Add tests before changing pg_ls_*

---
 src/test/regress/expected/misc_functions.out | 59 
 src/test/regress/sql/misc_functions.sql  | 15 +
 2 files changed, 74 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 01d1ad0b9a4..45544469af5 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -416,6 +416,65 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, 

[WIP] ALTER COLUMN IF EXISTS

2022-03-31 Thread Bradley Ayers
Hi,

I'm interested in adding more ergonomics to DDL commands, in
particular supporting IF EXISTS for ALTER TABLE … ALTER COLUMN, so
that if a column doesn't exist the command is skipped.

IF EXISTS is already supported in various places (e.g. ALTER TABLE …
ADD COLUMN IF NOT EXISTS, and ALTER TABLE … DROP COLUMN IF EXISTS),
but it's not available for any of the ALTER COLUMN sub commands.

The motivation is to make it easier to write idempotent migrations
that can be incrementally authored, such that they can be re-executed
multiple times without having to write an "up" and "down" migration.
https://github.com/graphile/migrate#idempotency elaborates a bit more
on the approach.

The current approach I see is to write something like:

DO $$
  BEGIN
IF EXISTS (SELECT 1
 FROM information_schema.columns
 WHERE table_schema = 'myschema' AND table_name = 'mytable' AND
column_name = 'mycolume')
THEN
  ALTER TABLE myschema.mytable RENAME mycolume TO mycolumn;
END IF;
  END
$$;

I think ideally the IF EXISTS would be added to all of the ALTER
COLUMN commands, however for the moment I have only added it to the {
SET | DROP } NOT NULL command to demonstrate the approach and see if
there's in-principle support for such a change.
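For comparison, with the IF EXISTS support sketched in the attached patch, the
guard above could collapse to a single statement. This is hypothetical until
the patch lands, and this first version covers only { SET | DROP } NOT NULL;
the exact grammar is part of what's up for discussion:

```sql
-- Hypothetical: skipped (presumably with a NOTICE, matching the existing
-- IF EXISTS convention) when the column does not exist.
ALTER TABLE myschema.mytable ALTER COLUMN IF EXISTS mycolumn SET NOT NULL;
```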

Questions:

1. I assume this is not part of the SQL specification, so this would
introduce more deviation from the standard into PostgreSQL. Is that accurate? Is that
problematic?
2. I believe I'm missing some code paths for table inheritance, is that correct?
3. I haven't updated the documentation—is it correct to do that in
doc/src/sgml/ref/alter_table.sgml?
4. This is my first time attempting to contribute to PostgreSQL, have
I missed anything?

-- 
Cheers,
Brad


v1-0001-Add-IF-EXISTS-support-to-ALTER-COLUMN-SET-DROP-NO.patch
Description: Binary data


Re: head fails to build on SLES 12 (wal_compression=zstd)

2022-03-31 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Mar 31, 2022 at 11:44:40AM -0400, Tom Lane wrote:
>> In view of 51c0d186d ("Allow parallel zstd compression"), I agree
>> that some clarity about the minimum supported version of zstd
>> seems essential.  I don't want to be dealing with threading bugs
>> in ancient zstd versions.  However, why do you suggest 1.3.7 in
>> particular?

> That's where I found that ZSTD_CLEVEL_DEFAULT was added, in their git.

> I've just installed a .deb for 1.3.8, and discovered that the APIs used by
> basebackup were considered experimental/nonpublic/static-lib-only until 1.4.0

Indeed.  I tried building against 1.3.6 (mainly because it was laying
around) and the error reported by Devrim is just the tip of the iceberg.
With "make -k", I see unknown-symbol failures on

ZSTD_CCtx_setParameter
ZSTD_c_compressionLevel
ZSTD_c_nbWorkers
ZSTD_CCtx_reset
ZSTD_reset_session_only
ZSTD_compressStream2
ZSTD_e_continue
ZSTD_e_end

I wonder whether Robert's ambition to be compatible with old versions
extends that far.

regards, tom lane




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 3:43 PM Tomas Vondra 
wrote:

> On 3/31/22 22:40, David G. Johnston wrote:
> > On Thu, Mar 31, 2022 at 1:05 PM Tomas Vondra
> > mailto:tomas.von...@enterprisedb.com>>
> > wrote:
> >
> >
> > I agree the first part is not contentious, so shall we extract this
> part
> > of the patch and get that committed for PG15? Or is that too late to
> > make such changes to the patch?
> >
> >
> > The minimum viable feature for me, given the written goal for the patch
> > and the premise of not changing any existing behavior, is:
> >
> > DB State: Allow a sequence to be unlogged.
> > Command: ALTER SEQUENCE SET UNLOGGED
> > Limitation: The above command fails if the sequence is unowned, or it is
> > owned and the table owning it is not UNLOGGED
> >
> > (optional safety) Limitation: Changing a table from unlogged to logged
> > while owning unlogged sequences would be prohibited
> > (optional safety) Compensatory Behavior: Add the ALTER SEQUENCE SET
> > LOGGED command for owned sequences to get them logged again in
> > preparation for changing the table to being logged.
> >
> > In particular, I don't see CREATE UNLOGGED SEQUENCE to be all that
> > valuable since CREATE UNLOGGED TABLE wouldn't leverage it.
> >
>
> Hmm, so what about doing a little bit different thing:
>
> 1) owned sequences inherit persistence of the table by default
>

This is the contentious point.  If we are going to do it by default - thus
changing existing behavior - I would rather just do it always.  This is
also underspecified; there are multiple ways for a sequence to become owned.

Personally I'm for the choice to effectively remove the sequence's own
concept of logged/unlogged when it is owned by a table and to always just
use the table's value.


> 2) allow ALTER SEQUENCE to change persistence for all sequences (no
> restriction for owned sequences)
>

A generalization that is largely incontrovertible.

>
> 3) ALTER TABLE ... SET [UN]LOGGED changes persistence for sequences
> matching the initial table persistence
>

I'm leaning against this, leaving users to set each owned sequence to
logged/unlogged as they see fit if they want something other than
all-or-nothing.  I would stick to only providing an easy method to get the
assumed desired all-same behavior.
ALTER TABLE SET [UN]LOGGED, SET ALL SEQUENCES TO [UN]LOGGED;


> IMHO (1) would address vast majority of cases, which simply want the
> same persistence for the whole table and all auxiliary objects. (2)
> would address use cases requiring different persistence for sequences
> (including owned ones).
>
> I'm not sure about (3) though, maybe that's overkill.
>
> Of course, we'll always have problems with older releases, as it's not
> clear whether a logged sequence on unlogged table would be desirable or
> is used just because unlogged sequences were not supported. (We do have
> the same issue for logged tables too, but I doubt anyone really needs
> defining unlogged sequences on logged tables.)
>
> So no matter what we do, we'll make the wrong decision in some cases.
>

Again, I don't have too much concern here because you lose very little by
having an unowned sequence.  Which is why I'm fine with owned sequences
becoming even more so implementation details that adhere to the persistence
mode of the owning relation.  But if the goal here is to defer such a
decision then the tradeoff is the DBA is given control and they get to
enforce consistency even if they are not benefitting from the flexibility.

> > Not having
> > CREATE TABLE make an unlogged sequence by default is annoying though and
> > likely should be changed - though it can leverage ALTER SEQUENCE too.
>
> Yeah. I think my proposal is pretty close to that, except that the
> sequence would first inherit persistence from the table, and there'd be
> an ALTER SEQUENCE for owned sequences where it differs. (And non-owned
> sequences would be created as logged/unlogged explicitly.)
>

I don't have any real problem with 1 or 2, they fill out the feature so it
is generally designed as opposed to solving a very specific problem.

For 1:
The "ADD COLUMN" (whether in CREATE TABLE or ALTER TABLE) pathway will
produce a new sequence whose persistence matches that of the target table.
While a behavior change, it is one aligned with the goal of the patch for
typical ongoing behavior and should benefit way more people than it may
inconvenience.  The "sequence not found" error that would be generated
seems minimally impactful.

The "ALTER SEQUENCE OWNED BY" pathway will not change the sequence's
persistence.  This is what pg_dump will use for serial/bigserial
The "ALTER TABLE ALTER COLUMN" pathway will not change the sequence's
persistence.  This is what pg_dump will use for generated always as identity

Provide a general purpose ALTER SEQUENCE SET [UN]LOGGED command

Provide an SQL Command to change all owned sequences of a table to be
UNLOGGED or LOGGED (I mentioned a function as well if someone thinks it
worth 

Re: Slow standby snapshot

2022-03-31 Thread Michail Nikolaev
Hello.

Just an updated commit message.

Thanks,
Michail.
From 934d649a18c66f8b448463e57375865b33ce53e7 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Fri, 1 Apr 2022 02:17:55 +0300
Subject: [PATCH v5] Optimize KnownAssignedXidsGetAndSetXmin by maintaining
 offset to next valid xid.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently KnownAssignedXidsGetAndSetXmin requires an iterative loop through
the KnownAssignedXids array, including xids marked as invalid. The performance
impact is especially noticeable in the presence of long (a few seconds)
transactions on the primary, a high value (a few thousand) of max_connections,
and a heavy read workload on the standby. Most of the CPU is spent looping
through KnownAssignedXids even though almost all xids are invalid anyway.
KnownAssignedXidsCompress removes invalid xids from time to time, but
performance is still affected.

To increase performance we lazily maintain an offset in the
KnownAssignedXidsNext array to skip xids known to be invalid.
KnownAssignedXidsNext does not always point to the next valid xid; it is just
some offset that is safe to skip (i.e., known to cover only invalid xids).

It helps to skip the gaps and can significantly increase performance - up to
10% more TPS on standby.
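The idea can be sketched outside of procarray.c roughly as follows. This is an
illustrative simplification, not the patch's code: names, the locking, and the
exact update rule differ, but the lazy "record the gap so the next scan jumps
it in one hop" mechanism is the same.

```python
def scan_valid(xids, valid, next_offset, tail, head):
    """Yield valid xids in [tail, head), lazily widening skip offsets.

    next_offset[i] is always a distance that is safe to jump from slot i
    (every slot it skips over is invalid). It starts at 1 everywhere and
    grows as scans discover runs of invalid entries.
    """
    i = tail
    while i < head:
        start = i
        # walk over invalid entries, following any offsets already recorded
        while i < head and not valid[i]:
            i += next_offset[i]
        # record the whole gap so the next scan skips it in a single hop
        if i > start:
            next_offset[start] = i - start
        if i < head and valid[i]:
            yield xids[i]
            i += next_offset[i]
```

A second scan over the same array then touches only the valid slots plus one
entry per gap, which is where the observed speedup comes from.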

Thread: 
https://www.postgresql.org/message-id/flat/CALdSSPgahNUD_%3DpB_j%3D1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw%40mail.gmail.com

Benchmark:
https://www.postgresql.org/message-id/flat/CANtu0ohzBFTYwdLtcanWo4%2B794WWUi7LY2rnbHyorJdE8_ZnGg%40mail.gmail.com#379c1be7b8134ada5a574078d51b64c6
---
 src/backend/storage/ipc/procarray.c | 56 -
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c 
b/src/backend/storage/ipc/procarray.c
index 13d192ec2b..b5cb3423fb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -267,6 +267,7 @@ static PGPROC *allProcs;
  */
 static TransactionId *KnownAssignedXids;
 static bool *KnownAssignedXidsValid;
+static int32 *KnownAssignedXidsNext;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
 /*
@@ -455,6 +456,12 @@ CreateSharedProcArray(void)
 			ShmemInitStruct("KnownAssignedXidsValid",
 							mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS),
 							&found);
+		KnownAssignedXidsNext = (int32 *)
+			ShmemInitStruct("KnownAssignedXidsNext",
+							mul_size(sizeof(int32), TOTAL_MAX_CACHED_SUBXIDS),
+							&found);
+		for (int i = 0; i < TOTAL_MAX_CACHED_SUBXIDS; i++)
+			KnownAssignedXidsNext[i] = 1;
 	}
 }
 
 
@@ -4544,7 +4551,13 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
  * XID entry itself.  This preserves the property that the XID entries are
  * sorted, so we can do binary searches easily.  Periodically we compress
  * out the unused entries; that's much cheaper than having to compress the
- * array immediately on every deletion.
+ * array immediately on every deletion. Also, we lazily maintain an offset
+ * in the KnownAssignedXidsNext[] array to skip xids known to be invalid.
+ * It helps to skip the gaps; it can significantly increase performance in
+ * the case of long transactions on the primary. KnownAssignedXidsNext[] is
+ * updated while taking a snapshot. In the general case KnownAssignedXidsNext
+ * holds not an exact offset to the next valid xid, but a value that
+ * converges toward that offset.
  *
  * The actually valid items in KnownAssignedXids[] and KnownAssignedXidsValid[]
  * are those with indexes tail <= i < head; items outside this subscript range
@@ -4582,7 +4595,7 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
  * must happen)
  * * Compressing the array is O(S) and requires exclusive lock
  * * Removing an XID is O(logS) and requires exclusive lock
- * * Taking a snapshot is O(S) and requires shared lock
+ * * Taking a snapshot is O(S), O(N) next call; requires shared lock
  * * Checking for an XID is O(logS) and requires shared lock
  *
  * In comparison, using a hash table for KnownAssignedXids would mean that
@@ -4642,12 +4655,13 @@ KnownAssignedXidsCompress(bool force)
 * re-aligning data to 0th element.
 */
compress_index = 0;
-   for (i = tail; i < head; i++)
+   for (i = tail; i < head; i += KnownAssignedXidsNext[i])
{
if (KnownAssignedXidsValid[i])
{
KnownAssignedXids[compress_index] = 
KnownAssignedXids[i];
KnownAssignedXidsValid[compress_index] = true;
+   KnownAssignedXidsNext[compress_index] = 1;
compress_index++;
}
}
@@ 

Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Peter Geoghegan
On Thu, Mar 31, 2022 at 1:25 PM Robert Haas  wrote:
> > I don't think we want to, exactly. Maybe it's easier to store
> > redundant TIDs than to avoid storing them in the first place (we can
> > probably just accept some redundancy). There is a trade-off to be made
> > there. I'm not at all sure of what the best trade-off is, though.
>
> My intuition is that storing redundant TIDs will turn out to be a very
> bad idea. I think that if we do that, the system will become prone to
> a new kind of vicious cycle (to try to put this in words similar to
> the ones you've been using to write about similar effects).

I don't feel very strongly about it either way. I definitely think
that there are workloads for which that will be true, and in general
I am no fan of putting off work that cannot possibly turn out to be
unnecessary in the end. I am in favor of "good laziness", not "bad
laziness".

> The more data we collect on the conveyor belt, the
> harder it's going to be when we eventually need to deduplicate.
>
> Also, I think it's possible to deduplicate at a very reasonable cost
> so long as (1) we enter each batch of TIDs into the conveyor belt as a
> distinguishable "run"

I definitely think that's the way to go, in general (regardless of
what we do about deduplicating TIDs).

> and (2) we never accumulate so many of these
> runs at the same time that we can't do a single merge pass to turn
> them into a sorted output stream. We're always going to discover dead
> TIDs in sorted order, so as we make our pass over the heap, we can
> simultaneously be doing an on-the-fly merge pass over the existing
> runs that are on the conveyor belt. That means we never need to store
> all the dead TIDs in memory at once.

That's a good idea IMV. I am vaguely reminded of an LSM tree.

> We just fetch enough data from
> each "run" to cover the block numbers for which we're performing the
> heap scan, and when the heap scan advances we throw away data for
> blocks that we've passed and fetch data for the blocks we're now
> reaching.
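That streaming merge can be sketched with a standard k-way merge. This is a
hypothetical illustration, not the conveyor-belt code: TIDs are modeled as
(block, offset) tuples and each run is any already-sorted iterable.

```python
import heapq

def merge_dead_tid_runs(runs):
    """Merge already-sorted runs of dead TIDs into one sorted,
    de-duplicated stream, keeping only one element per run in memory."""
    prev = None
    for tid in heapq.merge(*runs):  # streaming k-way merge over the runs
        if tid != prev:             # collapse duplicates present in 2+ runs
            yield tid
            prev = tid
```

Fetching per-block slices, as described above, corresponds to consuming this
stream only up to the block range the heap scan is currently covering.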

I wonder if you've thought about exploiting the new approach to
skipping pages using the visibility map from my patch series (which
you reviewed recently). I think that putting that in scope here could
be very helpful. As a way of making the stuff we already want to do
with [global] indexes easier, but also as independently useful work
based on the conveyor belt. The visibility map is very underexploited
as a source of accurate information about what VACUUM should be doing
IMV (in autovacuum.c's scheduling logic, but also in vacuumlazy.c
itself).

Imagine a world in which we decide up-front what pages we're going to
scan (i.e. our final vacrel->scanned_pages), by first scanning the
visibility map, and serializing it in local memory, or sometimes on
disk using the conveyor belt. Now you'll have a pretty good idea how
much TID deduplication will be necessary when you're done (to give one
example). In general "locking in" a plan of action for VACUUM seems
like it would be very useful. It will tend to cut down on the number
of "dead but not yet removable" tuples that VACUUM encounters right
now -- you at least avoid visiting concurrently modified heap pages
that were all-visible back when OldestXmin was first established.

When all skippable ranges are known in advance, we can reorder things
in many different ways -- since the work of vacuuming can be
decomposed on the lazy_scan_heap side, too. The only ordering
dependency among heap pages (that I can think of offhand) is FSM
vacuuming, which seems like it could be addressed without great
difficulty.

Ideally everything can be processed in whatever order is convenient as
of right now, based on costs and benefits. We could totally decompose
the largest VACUUM operations into individually processable units of
work (heap and index units), so that individual autovacuum workers no
longer own particular VACUUM operations at all. Autovacuum workers
would instead multiplex all of the units of work from all pending
VACUUMs, that are scheduled centrally, based on the current load of
the system. We can probably afford to be much more sensitive to the
number of pages we'll dirty relative to what the system can take right
now, and so on.

Cancelling an autovacuum worker may just make the system temporarily
suspend the VACUUM operation for later with this design (in fact
cancelling an autovacuum may not really be meaningful at all). A
design like this could also enable things like changing our mind about
advancing relfrozenxid: if we decided to skip all-visible pages in
this VACUUM operation, and come to regret that decision by the end
(because lots of XIDs are consumed in the interim), maybe we can go
back and get the all-visible pages we missed first time around.

I don't think that all of these ideas will turn out to be winners, but
I'm just trying to stimulate discussion. Breaking almost all
dependencies on the order that we process things in seems like it has

Re: unlogged sequences

2022-03-31 Thread Tomas Vondra
On 3/31/22 22:40, David G. Johnston wrote:
> On Thu, Mar 31, 2022 at 1:05 PM Tomas Vondra
> mailto:tomas.von...@enterprisedb.com>>
> wrote:
> 
> 
> I agree the first part is not contentious, so shall we extract this part
> of the patch and get that committed for PG15? Or is that too late to
> make such changes to the patch?
> 
> 
> The minimum viable feature for me, given the written goal for the patch
> and the premise of not changing any existing behavior, is:
> 
> DB State: Allow a sequence to be unlogged.
> Command: ALTER SEQUENCE SET UNLOGGED
> Limitation: The above command fails if the sequence is unowned, or it is
> owned and the table owning it is not UNLOGGED
> 
> (optional safety) Limitation: Changing a table from unlogged to logged
> while owning unlogged sequences would be prohibited
> (optional safety) Compensatory Behavior: Add the ALTER SEQUENCE SET
> LOGGED command for owned sequences to get them logged again in
> preparation for changing the table to being logged.
> 
> In particular, I don't see CREATE UNLOGGED SEQUENCE to be all that
> valuable since CREATE UNLOGGED TABLE wouldn't leverage it.
> 

Hmm, so what about doing a little bit different thing:

1) owned sequences inherit persistence of the table by default

2) allow ALTER SEQUENCE to change persistence for all sequences (no
restriction for owned sequences)

3) ALTER TABLE ... SET [UN]LOGGED changes persistence for sequences
matching the initial table persistence

IMHO (1) would address vast majority of cases, which simply want the
same persistence for the whole table and all auxiliary objects. (2)
would address use cases requiring different persistence for sequences
(including owned ones).

I'm not sure about (3) though, maybe that's overkill.

Of course, we'll always have problems with older releases, as it's not
clear whether a logged sequence on unlogged table would be desirable or
is used just because unlogged sequences were not supported. (We do have
the same issue for logged tables too, but I doubt anyone really needs
defining unlogged sequences on logged tables.)

So no matter what we do, we'll make the wrong decision in some cases.

> The above, possibly only half-baked, patch scope does not change any
> existing behavior but allows for the stated goal: an unlogged table
> having an unlogged sequence.  The DBA just has to execute the ALTER
> SEQUENCE command on all relevant sequences.  They can't even really get
> it wrong since only relevant sequences can be altered.  Not having
> CREATE TABLE make an unlogged sequence by default is annoying though and
> likely should be changed - though it can leverage ALTER SEQUENCE too.
> 
> Anything else they wish to do can be done via a combination of ownership
> manipulation and, worse case, dropping and recreating the sequence. 
> Though allowed for unowned unlogged sequences, while outside the
> explicit goal of the patch, would be an easy add (just don't error on
> the ALTER SEQUENCE SET UNLOGGED when the sequence is unowned).
> 

Yeah. I think my proposal is pretty close to that, except that the
sequence would first inherit persistence from the table, and there'd be
an ALTER SEQUENCE for owned sequences where it differs. (And non-owned
sequences would be created as logged/unlogged explicitly.)

I don't think we need to worry about old pg_dump versions on new PG
versions, because that's not really supported.

And for old PG versions the behavior would differ a bit depending on the
pg_dump version used. With old pg_dump version, the ALTER SEQUENCE would
not be emitted, so all owned sequences would inherit table persistence.
With new pg_dump we'd get the expected persistence (which might differ).

That's need to be documented, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: head fails to build on SLES 12 (wal_compression=zstd)

2022-03-31 Thread Justin Pryzby
On Thu, Mar 31, 2022 at 11:44:40AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > Possible responses look like:
> >  - Use 0 which also means "default" (need to verify that works across 
> > versions);
> >  - Or #ifndef ZSTD_CLEVEL_DEFAULT #define ZSTD_CLEVEL_DEFAULT 3;
> >  - Add a test for a minimum zstd version v1.3.7.  This may be a good idea 
> > for
> >v15 in any case, since we're using a few different APIs (at least
> >ZSTD_compress and ZSTD_compressStream2 and execve(zstd)).
> 
> In view of 51c0d186d ("Allow parallel zstd compression"), I agree
> that some clarity about the minimum supported version of zstd
> seems essential.  I don't want to be dealing with threading bugs
> in ancient zstd versions.  However, why do you suggest 1.3.7 in
> particular?

That's where I found that ZSTD_CLEVEL_DEFAULT was added, in their git.

I've just installed a .deb for 1.3.8, and discovered that the APIs used by
basebackup were considered experimental/nonpublic/static-lib-only until 1.4.0
https://github.com/facebook/zstd/releases/tag/v1.4.0
ZSTD_CCtx_setParameter ZSTD_c_compressionLevel ZSTD_c_nbWorkers ZSTD_CCtx_reset 
ZSTD_reset_session_only ZSTD_compressStream2 ZSTD_e_continue ZSTD_e_end

FYI: it looks like parallel, thread workers were also a nonpublic,
"experimental" API until v1.3.7.  Actually, ZSTD_p_nbThreads was renamed to
ZSTD_p_nbWorkers and then (in 1.3.8) renamed to ZSTD_c_nbWorkers.

Versions less than 1.3.8 would have required compile-time conditionals for both
ZSTD_CLEVEL_DEFAULT and ZSTD_p_nbThreads/ZSTD_p_nbWorkers/ZSTD_c_nbWorkers (but
that is moot).

Negative compression levels were added in 1.3.4 (but I think the range of their
levels was originally -1..-7 and now expanded).  And long-distance matching
was added in 1.3.2.

cirrus has installed:
linux (debian bullseye) 1.4.8
macos has 1.5.0
freebsd has 1.5.0

debian buster (oldstable) has v1.3.8, which is too old.
ubuntu focal LTS has 1.4.4 (bionic LTS has 1.3.3 which seems too old)
rhel7 has 1.5.2 in EPEL;

SLES 12.5 has zstd 1.3.3, so it won't be supported.  But postgres should fail
gracefully during ./configure rather than during build.
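The attached patch gets this gracefulness by passing the "libzstd >= 1.4.0"
constraint to pkg-config. The comparison it relies on can be sketched in plain
shell; version_ge here is a hypothetical helper using sort -V, shown only to
illustrate the gate, not what configure.ac actually emits.

```shell
# Hypothetical version gate; sort -V does the dotted-version comparison.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

for v in 1.3.3 1.3.8 1.4.0 1.4.8; do
    if version_ge "$v" 1.4.0; then
        echo "libzstd $v: ok"
    else
        echo "libzstd $v: too old (need >= 1.4.0)"
    fi
done
```

Run against the versions mentioned in this thread, only 1.4.0 and 1.4.8 pass,
matching the cutoff the patch enforces.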

Note that check-world fails if wal_compression is enabled due to two
outstanding issues, so it's not currently possible to set that in CI or
buildfarm...
https://www.postgresql.org/message-id/c86ce84f-dd38-9951-102f-13a931210f52%40dunslane.net
>From 45b94681af7fb822455f5f6114298b030d52e820 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 31 Mar 2022 17:20:59 -0500
Subject: [PATCH] zstd: require library >=1.4.0

Older versions would fail to compile anyway.
---
 configure| 22 +++---
 configure.ac |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/configure b/configure
index 8b361e211b8..14d2ca28534 100755
--- a/configure
+++ b/configure
@@ -9092,19 +9092,19 @@ $as_echo "$with_zstd" >&6; }
 if test "$with_zstd" = yes; then
 
 pkg_failed=no
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
-$as_echo_n "checking for libzstd... " >&6; }
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd >= 1.4.0" >&5
+$as_echo_n "checking for libzstd >= 1.4.0... " >&6; }
 
 if test -n "$ZSTD_CFLAGS"; then
 pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
  elif test -n "$PKG_CONFIG"; then
 if test -n "$PKG_CONFIG" && \
-{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
-  ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd >= 1.4.0\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libzstd >= 1.4.0") 2>&5
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-  pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+  pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd >= 1.4.0" 2>/dev/null`
 		  test "x$?" != "x0" && pkg_failed=yes
 else
   pkg_failed=yes
@@ -9116,12 +9116,12 @@ if test -n "$ZSTD_LIBS"; then
 pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
  elif test -n "$PKG_CONFIG"; then
 if test -n "$PKG_CONFIG" && \
-{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
-  ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd >= 1.4.0\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libzstd >= 1.4.0") 2>&5
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-  pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+  pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd >= 1.4.0" 2>/dev/null`
 		  test "x$?" != "x0" && pkg_failed=yes
 else
   pkg_failed=yes
@@ -9142,14 +9142,14 @@ else
 _pkg_short_errors_supported=no
 fi
 if test $_pkg_short_errors_supported = yes; then
-	ZSTD_PKG_ERRORS=`$PKG_CONFIG 

Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-03-31 Thread Cary Huang
Hello

pg_receivewal creates this .partial WAL file during WAL streaming and already 
treats it as a temporary file. It fills this .partial file with zeroes up to 
16777216 bytes by default before streaming real WAL data into it. 

If your .partial file is only 8396800 bytes, then this could mean that 
pg_receivewal was terminated abruptly while appending zeroes, or that your 
system ran out of disk space. Do you have any error message? 

If this is the case, the uninitialized .partial file should still be all 
zeroes, so it should be OK to delete it and have pg_receivewal recreate a new 
.partial file.
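
If you want to be certain before deleting it, checking that the file really is
all zeroes is simple to script; a minimal C sketch (the helper name is my own
illustration, not part of pg_receivewal):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if the file at path can be opened and contains only zero
 * bytes.  Illustrative helper only; not from the pg_receivewal sources.
 */
static bool
file_is_all_zeroes(const char *path)
{
	FILE	   *fp = fopen(path, "rb");
	unsigned char buf[8192];
	size_t		nread;
	bool		result = true;

	if (fp == NULL)
		return false;

	while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
	{
		for (size_t i = 0; i < nread; i++)
		{
			if (buf[i] != 0)
			{
				result = false;
				break;
			}
		}
		if (!result)
			break;
	}
	fclose(fp);
	return result;
}
```

A truncated-at-8396800-bytes .partial that passes this check should be safe to
remove and let pg_receivewal start over.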

Also, in your patch, you are using the pad_to_size argument of the function 
dir_open_for_write to determine whether it needs to create a temp file, but I 
see that this function is always given pad_to_size = 16777216, and never 0. Am 
I missing something?

Cary Huang
===
HighGo Software Canada

Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-31 Thread Jacob Champion
On Tue, 2022-03-29 at 23:38 +, Jacob Champion wrote:
> v8 rebases over the recent SSL changes to get the cfbot green again.

I think the Windows failure [1] is unrelated to this patch, but for
posterity:

> [03:01:58.925] c:\cirrus>call "C:/Program Files/Git/usr/bin/timeout.exe" -v 
> -k60s 15m perl src/tools/msvc/vcregress.pl recoverycheck 
> [03:03:16.106] [03:03:16] t/001_stream_rep.pl .. ok 76551 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:16.120] [03:03:16] t/002_archiving.pl ... ok     0 ms ( 0.02 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.02 CPU)
> [03:03:16.128] [03:03:16] t/003_recovery_targets.pl  ok     0 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:16.138] [03:03:16] t/004_timeline_switch.pl . ok     0 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:16.141] [03:03:16] t/005_replay_delay.pl  ok     0 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:24.561] [03:03:24] t/006_logical_decoding.pl  ok  8416 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:32.496] [03:03:32] t/007_sync_rep.pl  ok  7895 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:03:32.496] [03:03:32] t/008_fsm_truncation.pl .. ok     0 ms ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
> [03:16:58.985] /usr/bin/timeout: sending signal TERM to command ‘perl’

The server and client logs don't quite match up; it looks like we get
partway through t/018_wal_optimize.pl, maybe to the end of the
`wal_level = minimal, SET TABLESPACE commit subtransaction` test,
before hanging.

I see that there's an active thread about a hang later in the recovery
suite [2]. That's suspicious since 019 is just the next test, but I
don't see any evidence in the logs that we actually started test 019 in
this run.

--Jacob

[1] https://cirrus-ci.com/task/5434752374145024
[2] 
https://www.postgresql.org/message-id/flat/83b46e5f-2a52-86aa-fa6c-8174908174b8%40iki.fi


Re: A qsort template

2022-03-31 Thread Thomas Munro
On Thu, Mar 31, 2022 at 11:09 PM John Naylor
 wrote:
> In a couple days I'm going to commit the v3 patch "accelerate tuple
> sorting for common types" as-is after giving it one more look, barring
> objections.

Hi John,

Thanks so much for all the work you've done here!  I feel bad that I
lobbed so many experimental patches in here and then ran away due to
lack of cycles.  That particular patch (the one cfbot has been chewing
on all this time) does indeed seem committable, despite the
deficiencies/opportunities listed in comments.  It's nice to reduce
code duplication, it gives the right answers, and it goes faster.

> I started towards incorporating the change in insertion sort threshold
> (part of 0010), but that caused regression test failures, so that will
> have to wait for a bit of analysis and retesting. (My earlier tests
> were done in a separate module.)
>
> The rest in this series that I looked at closely were either
> refactoring or could use some minor tweaks so likely v16 material.

Looking forward to it.




Re: Extensible Rmgr for Table AMs

2022-03-31 Thread Jeff Davis
On Mon, 2022-03-28 at 11:00 -0700, Andres Freund wrote:
> I think this has been discussed before, but to me it's not obvious
> that it's a
> good idea to change RmgrTable from RmgrData to RmgrData *. That adds
> an
> indirection, without obvious benefit.

I did some performance tests. I created a narrow table, took a base
backup, loaded 100M records, finished the base backup. Then I recovered
using the different build combinations (patched/unpatched, clang/gcc).

             compiler   run1   run2
unpatched:   gcc        102s   106s
patched:     gcc        107s   105s
unpatched:   clang      109s   110s
patched:     clang      109s   111s

I don't see a clear signal that this patch worsens performance. The
102s run was the very first run, so I suspect it was just due to the
processor starting out cold. Let me know if you think the test is
testing the right paths.

Perhaps I should address your other perf concerns around GetRmgr (make
it static inline, reduce number of calls), and then leave the
indirection for the sake of cleanliness?

If you are still concerned, I can switch back to separate tables to
eliminate the indirection for built-in rmgrs. Separate rmgr tables
still require a branch (to decide which table to access), but it should
be a highly predictable one.
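
To illustrate what I mean, here is a minimal sketch of the split-table lookup
with the predictable id-range branch (constants, fields, and the registration
function are illustrative only, not the actual patch):

```c
#include <stddef.h>

#define RM_MAX_BUILTIN_ID	127		/* illustrative cutoff */
#define RM_MAX_CUSTOM		128

typedef struct RmgrData
{
	const char *rm_name;
	/* redo/desc callbacks would live here */
} RmgrData;

/* Built-in rmgrs stay in a flat array: no pointer indirection. */
static RmgrData BuiltinRmgrTable[RM_MAX_BUILTIN_ID + 1] = {
	[0] = {"XLOG"},
	[10] = {"Heap"},
};

/* Extension rmgrs pay one extra pointer hop. */
static RmgrData *CustomRmgrTable[RM_MAX_CUSTOM];

static void
RegisterCustomRmgr(int rmid, RmgrData *rmgr)
{
	CustomRmgrTable[rmid - RM_MAX_BUILTIN_ID - 1] = rmgr;
}

/*
 * The branch on the id range is the same for every record of a given
 * rmgr, so the branch predictor should handle it well.
 */
static inline RmgrData *
GetRmgr(int rmid)
{
	if (rmid <= RM_MAX_BUILTIN_ID)
		return &BuiltinRmgrTable[rmid];
	return CustomRmgrTable[rmid - RM_MAX_BUILTIN_ID - 1];
}
```

That keeps the hot built-in path free of indirection while still allowing a
table AM to register its own rmgr.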

Regards,
Jeff Davis






Re: Commitfest closing

2022-03-31 Thread Tom Lane
Andrew Dunstan  writes:
> On Mar 31, 2022, at 4:47 PM, Greg Stark  wrote:
>> So the last commitfest often runs over and "feature freeze" isn't
>> scheduled until April 7. Do committers want to keep the commitfest
>> open until then? Or close it now and focus it only on a few pending
>> features they're already working on?

> In past years the CF has been kept open. Let’s stick with that.

Yeah, I was assuming it would stay open till late next week.

We do have at least two patches in the queue that we want to put in
after everything else: my frontend logging patch, and the
PGDLLIMPORT-for-everything patch that I believe Robert has taken
responsibility for.  So if we want those in before the nominal
feature freeze, other stuff is going to need to be done a day or
so beforehand.  But that still gives us most of a week.

regards, tom lane




Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2022-03-31 Thread Tom Lane
Pavel Stehule  writes:
> I am sending updated patch

After studying the list of exposed functions for awhile, it seemed
to me that we should also expose exec_assign_value.  The new pointers
allow a plugin to compute a value in Datum+isnull format, but then it
can't do much of anything with it: exec_assign_expr is a completely
inconvenient API if what you want to do is put a specific Datum
value into a variable.  Adding exec_assign_value provides "store"
and "fetch" APIs that are more or less inverses, which should be
easier to work with.

So I did that and pushed it.

regards, tom lane




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 1:40 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> The DBA just has to execute the ALTER SEQUENCE command on all relevant
> sequences.
>

Additional, if we do not implement the forced matching of persistence mode,
we should consider adding an "ALTER TABLE SET ALL SEQUENCES TO UNLOGGED"
command for convenience.  Or maybe make it a function - which would allow
for SQL execution against a catalog lookup.

David J.


Re: Higher level questions around shared memory stats

2022-03-31 Thread Andres Freund
Hi,

On 2022-03-31 16:16:31 +0900, Kyotaro Horiguchi wrote:
> After moving to shared stats, we might want to expose the GUC variable
> itself. Then hide/remove the macro PG_STAT_TMP_DIR.  This breaks the
> extensions but it is better than keeping using PG_STAT_TMP_DIR for
> uncertain reasons. The existence of the macro can be used as the
> marker of the feature change.  This is the chance to break the (I
> think) bad practice shared among the extensions.  At least I am okay
> with that.

I don't really understand why we'd want to do it that way round? Why not just
leave PG_STAT_TMP_DIR in place, and remove the GUC? Since nothing uses the
GUC, we're not losing anything, and we'd not break extensions unnecessarily?

Obviously there's no strong demand for pg_stat_statements et al to use the
user-configurable stats_temp_directory, given they've not done so for years
without complaints?  The code to support the configurable stats_temp_dir isn't
huge, but it's not small either.

Greetings,

Andres Freund




Re: Commitfest closing

2022-03-31 Thread Andrew Dunstan



> On Mar 31, 2022, at 4:47 PM, Greg Stark  wrote:
> 
> So the last commitfest often runs over and "feature freeze" isn't
> scheduled until April 7. Do committers want to keep the commitfest
> open until then? Or close it now and focus it only on a few pending
> features they're already working on?

In past years the CF has been kept open. Let’s stick with that.

Cheers

Andrew




Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-31 Thread Jeff Davis
On Thu, 2022-03-31 at 09:05 -0400, Robert Haas wrote:
> 1. An AM that doesn't care about having anything happening during
> recovery, but wants to be able to get logical decoding to do some
> work. Maybe the intention of the AM is that data is available only
> when the server is not in recovery and all data is lost on shutdown,
> or maybe the AM has its own separate durability mechanism.

This is a speculative use case that is not what I would use it for, but
perhaps someone wants to do that with a table AM or maybe an FDW.

> 2. An AM that wants things to happen during recovery, but handles
> that
> separately. For example, maybe it logs all the data changes via
> log_newpage() and then separately emits these log records.

Yes, or Generic WAL. Generic WAL seems like a half-feature without this
Logical WAL patch, because it's hopeless to support logical
decoding/replication of whatever data you're logging with Generic WAL.
That's probably the strongest argument for this patch.

> 3. An AM that wants things to happen during recovery, and expects
> these records to serve that purpose. That would require getting
> control during WAL replay as well as during decoding, and would
> probably only work for an AM whose data is not block-structured (e.g.
> an in-memory btree that is dumped to disk at every checkpoint).

This patch would not work in this case because the records are ignored
during REDO.

Regards,
Jeff Davis







Commitfest closing

2022-03-31 Thread Greg Stark
So the last commitfest often runs over and "feature freeze" isn't
scheduled until April 7. Do committers want to keep the commitfest
open until then? Or close it now and focus it only on a few pending
features they're already working on?

-- 
greg




Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 1:05 PM Tomas Vondra 
wrote:

>
> I agree the first part is not contentious, so shall we extract this part
> of the patch and get that committed for PG15? Or is that too late to
> make such changes to the patch?
>
>
The minimum viable feature for me, given the written goal for the patch and
the premise of not changing any existing behavior, is:

DB State: Allow a sequence to be unlogged.
Command: ALTER SEQUENCE SET UNLOGGED
Limitation: The above command fails if the sequence is unowned, or it is
owned and the table owning it is not UNLOGGED

(optional safety) Limitation: Changing a table from unlogged to logged
while owning unlogged sequences would be prohibited
(optional safety) Compensatory Behavior: Add the ALTER SEQUENCE SET LOGGED
command for owned sequences to get them logged again in preparation for
changing the table to being logged.

In particular, I don't see CREATE UNLOGGED SEQUENCE to be all that valuable
since CREATE UNLOGGED TABLE wouldn't leverage it.

The above, possibly only half-baked, patch scope does not change any
existing behavior but allows for the stated goal: an unlogged table having
an unlogged sequence.  The DBA just has to execute the ALTER SEQUENCE
command on all relevant sequences.  They can't even really get it wrong
since only relevant sequences can be altered.  Not having CREATE TABLE make
an unlogged sequence by default is annoying though and likely should be
changed - though it can leverage ALTER SEQUENCE too.

Anything else they wish to do can be done via a combination of ownership
manipulation and, worst case, dropping and recreating the sequence.  Allowing
unowned unlogged sequences, while outside the explicit goal of the patch,
would be an easy add (just don't error on the ALTER SEQUENCE SET UNLOGGED
when the sequence is unowned).

David J.


Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2022-03-31 Thread Greg Stark
On Mon, 22 Nov 2021 at 02:14, Andy Fan  wrote:
>
> _I_ think there is no way to bypass that.  I guess Tom has a bigger plan on
> Var (not only for notnull), but no time to invest in them so far.  If
> that is the case,
> personally I think we can go ahead with my method first and continue the 
> review
> process.

This discussion has gone on for two years now and meandered into
different directions. There have been a number of interesting
proposals and patches in that time but it's not clear now what patch
is even under consideration and what questions remain for it. And I
think this message from last November is the last comment on it so I
wonder if it's reached a bit of an impasse.

I think I would suggest starting a fresh thread with a patch distilled
from the previous discussions. Then once that's settled repeat with
additional patches, keeping the discussion focused just on the current
change.

Personally I think these kinds of optimizations are important because
they're what allow people to use SQL without micro-optimizing each
query.

-- 
greg




Re: SQL/JSON: functions

2022-03-31 Thread Andrew Dunstan


On 3/31/22 15:54, Greg Stark wrote:
> I see several commits referencing this entry. Can we mark it committed
> or are there still chunks of commits to go?



No code chunks left, only a documentation patch which should land
tomorrow or Saturday.


I am also planning on committing the JSON_TABLE patches before feature
freeze (April 7). They depend on this set.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: should vacuum's first heap pass be read-only?

2022-03-31 Thread Robert Haas
[ returning to this thread after a bit of a hiatus ]

On Fri, Feb 25, 2022 at 12:15 PM Peter Geoghegan  wrote:
> I can only speak for myself, but that sounds correct to me. IMO what
> we really want here is to create lots of options for VACUUM.

I agree. That seems like a good way of thinking about it.

> I don't think we want to, exactly. Maybe it's easier to store
> redundant TIDs than to avoid storing them in the first place (we can
> probably just accept some redundancy). There is a trade-off to be made
> there. I'm not at all sure of what the best trade-off is, though.

My intuition is that storing redundant TIDs will turn out to be a very
bad idea. I think that if we do that, the system will become prone to
a new kind of vicious cycle (to try to put this in words similar to
the ones you've been using to write about similar effects). Imagine
that we vacuum a table which contains N dead tuples a total of K times
in succession, but each time we either deliberately decide against
index vacuuming or get killed before we can complete it. If we don't
have logic to prevent duplicate entries from being added to the
conveyor belt, we will have N*K TIDs in the conveyor belt rather than
only N, and I think it's not hard to imagine situations where K could
be big enough for that to be hugely painful. Moreover, at some point,
we're going to need to deduplicate anyway, because each of those dead
TIDs can only be marked unused once. Letting the data on the conveyor
belt in the hopes of sorting it out later seems almost certain to be a
losing proposition. The more data we collect on the conveyor belt, the
harder it's going to be when we eventually need to deduplicate.

Also, I think it's possible to deduplicate at a very reasonable cost
so long as (1) we enter each batch of TIDs into the conveyor belt as a
distinguishable "run" and (2) we never accumulate so many of these
runs at the same time that we can't do a single merge pass to turn
them into a sorted output stream. We're always going to discover dead
TIDs in sorted order, so as we make our pass over the heap, we can
simultaneously be doing an on-the-fly merge pass over the existing
runs that are on the conveyor belt. That means we never need to store
all the dead TIDs in memory at once. We just fetch enough data from
each "run" to cover the block numbers for which we're performing the
heap scan, and when the heap scan advances we throw away data for
blocks that we've passed and fetch data for the blocks we're now
reaching.
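
For what it's worth, the on-the-fly merge described above is just a k-way
merge over sorted runs with adjacent-duplicate suppression; a minimal sketch
with plain arrays standing in for conveyor-belt runs (all names are
illustrative, not from the patch):

```c
#include <stddef.h>
#include <stdint.h>

/* One sorted run of dead TIDs (here just 64-bit keys). */
typedef struct TidRun
{
	const uint64_t *tids;
	size_t		ntids;
	size_t		pos;			/* next unread element */
} TidRun;

/*
 * Merge nruns sorted runs into out[], dropping duplicates.  Returns the
 * number of distinct TIDs written.  A binary heap over the run heads
 * would be the usual choice for many runs; a linear scan keeps the
 * sketch short.
 */
static size_t
merge_dedup_runs(TidRun *runs, int nruns, uint64_t *out, size_t outsz)
{
	size_t		nout = 0;

	for (;;)
	{
		int			best = -1;
		uint64_t	bestval = 0;

		/* Find the smallest head element across all runs. */
		for (int i = 0; i < nruns; i++)
		{
			if (runs[i].pos < runs[i].ntids)
			{
				uint64_t	v = runs[i].tids[runs[i].pos];

				if (best < 0 || v < bestval)
				{
					best = i;
					bestval = v;
				}
			}
		}
		if (best < 0)
			break;				/* all runs exhausted */
		runs[best].pos++;

		/* Emit only if distinct from the previous output TID. */
		if (nout == 0 || out[nout - 1] != bestval)
		{
			if (nout >= outsz)
				break;
			out[nout++] = bestval;
		}
	}
	return nout;
}
```

Because the output is produced in TID order, the consumer only ever needs the
slice of each run covering the block range currently being scanned, which is
the property Robert relies on above.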

> > but if you are going to rescan the heap
> > again next time before doing any index vacuuming then why we want to
> > store them anyway.
>
> It all depends, of course. The decision needs to be made using a cost
> model. I suspect it will be necessary to try it out, and see.

But having said that, coming back to this with fresh eyes, I think
Dilip has a really good point here. If the first thing we do at the
start of every VACUUM is scan the heap in a way that is guaranteed to
rediscover all of the dead TIDs that we've previously added to the
conveyor belt plus maybe also new ones, we may as well just forget the
whole idea of having a conveyor belt at all. At that point we're just
talking about a system for deciding when to skip index vacuuming, and
the conveyor belt is a big complicated piece of machinery that stores
data we don't really need for anything because if we threw it out the
next vacuum would reconstruct it anyhow and we'd get exactly the same
results with less code. The only way the conveyor belt system has any
value is if we think that there is some set of circumstances where the
heap scan is separated in time from the index vacuum, such that we
might sometimes do an index vacuum without having done a heap scan
just before.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-31 Thread James Coleman
On Thu, Mar 31, 2022 at 3:25 PM Robert Haas  wrote:
>
> On Thu, Mar 31, 2022 at 10:51 AM James Coleman  wrote:
> > Updated.
>
> This version looks fine to me. If nobody objects I will commit it and
> credit myself as a co-author.

Sounds great; thanks again for the review.

James Coleman




Re: Pluggable toaster

2022-03-31 Thread Greg Stark
It looks like it's still not actually building on some compilers. I
see a bunch of warnings as well as an error:

[03:53:24.660] dummy_toaster.c:97:2: error: void function
'dummyDelete' should not return a value [-Wreturn-type]

Also the "publication" regression test needs to be adjusted as it
includes \d+ output which has changed to include the toaster.




Re: Temporary tables versus wraparound... again

2022-03-31 Thread Greg Stark
I've updated the patches.

Adding the assertion actually turned up a corner case where RecentXmin
was *not* set. If you lock a temporary table and that's the only thing
you do in a transaction then the flag is set indicating you've used
the temp schema but you never take a snapshot :(

I also split out the warnings and added a test that relfrozenxid was
advanced on the toast table as well.

I haven't wrapped my head around multixacts yet. It's complicated by
this same codepath being used for truncates of regular tables that
were created in the same transaction.
From 63801cbb7c20676df98be47c52269bb6d7cf7b06 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 31 Mar 2022 15:49:19 -0400
Subject: [PATCH v5 2/3] Update relfrozenxmin when truncating temp tables

Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
like normal truncate. Otherwise even typical short-lived transactions
using temporary tables can easily cause them to reach relfrozenxid.
---
 src/backend/catalog/heap.c | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6eb78a9c0f..3f593f03dc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -30,6 +30,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relation.h"
 #include "access/table.h"
@@ -70,6 +71,7 @@
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
 
@@ -2980,6 +2982,54 @@ RelationTruncateIndexes(Relation heapRelation)
 	}
 }
 
+/*
+ * Reset the relfrozenxid and other stats to the same values used when
+ * creating tables. This is used after non-transactional truncation.
+ *
+ * Doing this reduces the need for long-running programs to vacuum their own
+ * temporary tables (since they're not covered by autovacuum) at least in the
+ * case where they're ON COMMIT DELETE ROWS.
+ *
+ * see also src/backend/commands/vacuum.c vac_update_relstats()
+ * also see AddNewRelationTuple() above
+ */
+
+static void
+ResetVacStats(Relation rel)
+{
+	HeapTuple	ctup;
+	Form_pg_class pgcform;
+	Relation classRel;
+
+	/* Ensure RecentXmin is valid -- it almost certainly is but regression
+	 * tests turned up an unlikely corner case where it might not be */
+	if (!FirstSnapshotSet)
+		(void)GetLatestSnapshot();
+	Assert(FirstSnapshotSet);
+	
+	/* Fetch a copy of the tuple to scribble on */
+	classRel = table_open(RelationRelationId, RowExclusiveLock);
+	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel)));
+	if (!HeapTupleIsValid(ctup))
+		elog(ERROR, "pg_class entry for relid %u vanished during truncation",
+			 RelationGetRelid(rel));
+	pgcform = (Form_pg_class) GETSTRUCT(ctup);
+
+	/*
+	 * Update relfrozenxid
+	 */
+
+	pgcform->relpages = 0;
+	pgcform->reltuples = -1;
+	pgcform->relallvisible = 0;
+	pgcform->relfrozenxid = RecentXmin;
+	pgcform->relminmxid = GetOldestMultiXactId();
+
+	heap_inplace_update(classRel, ctup);
+
+	table_close(classRel, RowExclusiveLock);
+}
+
 /*
  *	 heap_truncate
  *
@@ -2988,6 +3038,14 @@ RelationTruncateIndexes(Relation heapRelation)
  * This is not transaction-safe!  There is another, transaction-safe
  * implementation in commands/tablecmds.c.  We now use this only for
  * ON COMMIT truncation of temporary tables, where it doesn't matter.
+ *
+ * Or whenever a table's relfilenode was created within the same transaction
+ * such as when the table was created or truncated (normally) within this
+ * transaction.
+ *
+ * The correctness of this code depends on the fact that the table creation or
+ * truncation would be rolled back *including* the insert/update to the
+ * pg_class row that we update in place here.
  */
 void
 heap_truncate(List *relids)
@@ -3044,6 +3102,7 @@ heap_truncate_one_rel(Relation rel)
 
 	/* Truncate the underlying relation */
 	table_relation_nontransactional_truncate(rel);
+	ResetVacStats(rel);
 
 	/* If the relation has indexes, truncate the indexes too */
 	RelationTruncateIndexes(rel);
@@ -3055,6 +3114,7 @@ heap_truncate_one_rel(Relation rel)
 		Relation	toastrel = table_open(toastrelid, AccessExclusiveLock);
 
 		table_relation_nontransactional_truncate(toastrel);
+		ResetVacStats(toastrel);
 		RelationTruncateIndexes(toastrel);
 		/* keep the lock... */
 		table_close(toastrel, NoLock);
-- 
2.35.1

From 717c416a1bb26ac46ece04eb7b755d9ee12cb9c1 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 31 Mar 2022 15:50:02 -0400
Subject: [PATCH v5 3/3] Add test for truncating temp tables advancing
 relfrozenxid

This test depends on other transactions not running at the same time
so that relfrozenxid can advance so it has to be moved to its own
schedule.
---
 src/test/regress/expected/temp.out | 37 ++
 src/test/regress/parallel_schedule | 10 +---
 src/test/regress/sql/temp.sql  

Re: unlogged sequences

2022-03-31 Thread Tomas Vondra



On 3/31/22 21:55, David G. Johnston wrote:
> On Thu, Mar 31, 2022 at 12:36 PM Tomas Vondra wrote:
> 
> On 3/31/22 19:35, David G. Johnston wrote:
> > On Thu, Mar 31, 2022 at 9:28 AM Andres Freund wrote:
> >
> >     I agree it makes sense to have logged sequences with unlogged
> tables. We
> >     should call out the behavioural change somewhere prominent in the
> >     release
> >     notes.
> >
> 
> I'm not sure I follow. If we allow logged sequences with unlogged
> tables, there'd be no behavioral change, no?
> 
> 
> As noted below, the behavior change is in how CREATE TABLE behaves.  Not
> whether or not mixed persistence is allowed.
>  
> 
> or maybe we could
> modify pg_dump to emit UNLOGGED when the table is unlogged (but that
> would work only when using the new pg_dump).
> 
> 
> Yes, the horse has already left the barn.  I don't really have an
> opinion on whether to leave the barn door open or closed.
>  
> 
> 
> > If choosing option 2, are you on board with changing the behavior of
> > CREATE UNLOGGED TABLE with respect to any auto-generated sequences?
> >
> 
> What behavior change, exactly? To create the sequences as UNLOGGED, but
> we'd not update the persistence after that?
> 
> 
> Today, a newly created unlogged table with an automatically owned
> sequence (serial, generated identity) has a logged sequence.  This patch
> changes that so the new automatically owned sequence is unlogged.  This
> seems to be generally agreed upon as being desirable - but given the
> fact that unlogged tables will not have unlogged sequences it seems
> worth confirming that this minor inconsistency is acceptable.
> 
> The first newly added behavior is just allowing sequences to be
> unlogged.  That is the only mandatory feature introduced by this patch
> and doesn't seem contentious.
> 
> The second newly added behavior being proposed is to have the
> persistence of the sequence be forcibly matched to the table.  Whether
> this is desirable is the main point under discussion.
> 

Right. The latest version of the patch also prohibits changing
persistence of owned sequences directly. But that's probably considered
to be part of the second behavior.

I agree the first part is not contentious, so shall we extract this part
of the patch and get that committed for PG15? Or is that too late to
make such changes to the patch?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: head fails to build on SLES 12

2022-03-31 Thread Larry Rosenman

On 03/31/2022 2:58 pm, Andrew Dunstan wrote:

On 3/31/22 10:23, Devrim Gündüz wrote:

Hi,

Latest snapshot tarball fails to build on SLES 12.5, which uses GCC
4.8-8. Build log is attached.

Please let me know if you want me to provide more info.




AFAICT we don't have any buildfarm animals currently reporting on SLES
12 (and the one we did have was on s390x, if that matters).

So if this is something that needs support we should address that. After
all, that's exactly what the buildfarm was created for.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


I can spin up a VM on SLES 12 assuming someone can point me to the right
place to get an ISO, etc., and I don't need a pay-for license.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: head fails to build on SLES 12

2022-03-31 Thread Andrew Dunstan


On 3/31/22 10:23, Devrim Gündüz wrote:
> Hi,
>
> Latest snapshot tarball fails to build on SLES 12.5, which uses GCC
> 4.8-8. Build log is attached.
>
> Please let me know if you want me to provide more info.
>


AFAICT we don't have any buildfarm animals currently reporting on SLES
12 (and the one we did have was on s390x, if that matters).

So if this is something that needs support we should address that. After
all, that's exactly what the buildfarm was created for.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: unlogged sequences

2022-03-31 Thread David G. Johnston
On Thu, Mar 31, 2022 at 12:36 PM Tomas Vondra 
wrote:

> On 3/31/22 19:35, David G. Johnston wrote:
> > On Thu, Mar 31, 2022 at 9:28 AM Andres Freund wrote:
> >
> > I agree it makes sense to have logged sequences with unlogged
> tables. We
> > should call out the behavioural change somewhere prominent in the
> > release
> > notes.
> >
>
> I'm not sure I follow. If we allow logged sequences with unlogged
> tables, there'd be no behavioral change, no?
>
>
As noted below, the behavior change is in how CREATE TABLE behaves.  Not
whether or not mixed persistence is allowed.


> or maybe we could
> modify pg_dump to emit UNLOGGED when the table is unlogged (but that
> would work only when using the new pg_dump).
>

Yes, the horse has already left the barn.  I don't really have an opinion
on whether to leave the barn door open or closed.


>
> > If choosing option 2, are you on board with changing the behavior of
> > CREATE UNLOGGED TABLE with respect to any auto-generated sequences?
> >
>
> What behavior change, exactly? To create the sequences as UNLOGGED, but
> we'd not update the persistence after that?
>
>
Today, a newly created unlogged table with an automatically owned sequence
(serial, generated identity) has a logged sequence.  This patch changes
that so the new automatically owned sequence is unlogged.  This seems to be
generally agreed upon as being desirable - but given the fact that unlogged
tables will not have unlogged sequences it seems worth confirming that this
minor inconsistency is acceptable.

The first newly added behavior is just allowing sequences to be unlogged.
That is the only mandatory feature introduced by this patch and doesn't
seem contentious.

The second newly added behavior being proposed is to have the persistence
of the sequence be forcibly matched to the table.  Whether this is
desirable is the main point under discussion.

David J.


Re: SQL/JSON: functions

2022-03-31 Thread Greg Stark
I see several commits referencing this entry. Can we mark it committed
or are there still chunks of commits to go?




Re: unlogged sequences

2022-03-31 Thread Tomas Vondra
On 3/31/22 19:35, David G. Johnston wrote:
> On Thu, Mar 31, 2022 at 9:28 AM Andres Freund wrote:
> 
> I agree it makes sense to have logged sequences with unlogged tables. We
> should call out the behavioural change somewhere prominent in the
> release
> notes.
> 

I'm not sure I follow. If we allow logged sequences with unlogged
tables, there'd be no behavioral change, no?

> 
> We can/do already support that unlikely use case by allowing one to
> remove the OWNERSHIP dependency between the table and the sequence.
> 
> I'm fine with owned sequences tracking the persistence attribute of the
> owning table.
> 

So essentially an independent sequence, used in a default value.

> I don't think we should make pg_upgrade change the loggedness of
> sequences.
> 
> 
> We are willing to change the default behavior here so it is going to
> affect dump/restore anyway, might as well fully commit and do the same
> for pg_upgrade.  The vast majority of users will benefit from the new
> default behavior.
> 

Whatever we do, I think we should keep the pg_dump and pg_upgrade
behavior as consistent as possible.

> I don't actually get, though, how that would play with pg_dump since it
> always emits an unowned, and thus restored as logged, sequence first and
> then alters the sequence to be owned by the table.  Thus restoring an
> old SQL dump into v15 is going to fail if we prohibit
> unlogged-table/logged-sequence; unless we actively change the
> logged-ness of the sequence when subordinating it to a table.
> 

Yeah. I guess we'd need to either automatically switch the sequence to
the right persistence when linking it to the table, or maybe we could
modify pg_dump to emit UNLOGGED when the table is unlogged (but that
would work only when using the new pg_dump).

> Thus, the choices seem to be:
> 
> 1) implement forced persistence agreement for owned sequences, changing
> the sequence to match the table when the alter table happens, and during
> pg_upgrade.
> 2) do not force persistence agreement for owned sequences
> 
> If choosing option 2, are you on board with changing the behavior of
> CREATE UNLOGGED TABLE with respect to any auto-generated sequences?
> 

What behavior change, exactly? To create the sequences as UNLOGGED, but
we'd not update the persistence after that?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 10:51 AM James Coleman  wrote:
> Updated.

This version looks fine to me. If nobody objects I will commit it and
credit myself as a co-author.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: jsonpath syntax extensions

2022-03-31 Thread Nikita Malakhov
Hi,
Ok, we'll rebase it onto actual master for the next iteration.
Thank you!

On Thu, Mar 31, 2022 at 10:17 PM Greg Stark  wrote:

> Well I still think this would be a good candidate to get reviewed.
>
> But it currently needs a rebase and it's the last day of the CF so I
> guess it'll get moved forward again. I don't think "returned with
> feedback" is helpful given there's been basically no feedback :(
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 2:44 PM Andres Freund  wrote:
> On 2022-03-31 14:31:43 -0400, Robert Haas wrote:
> > On Thu, Mar 31, 2022 at 12:25 PM Andres Freund  wrote:
> > > > Andres, thoughts? Do you want me to polish and commit 0001?
> > >
> > > Yes please!
> >
> > Here is a polished version. Comments?
>
> LGTM.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: jsonpath syntax extensions

2022-03-31 Thread Greg Stark
Well I still think this would be a good candidate to get reviewed.

But it currently needs a rebase and it's the last day of the CF so I
guess it'll get moved forward again. I don't think "returned with
feedback" is helpful given there's been basically no feedback :(




Re: minor gripe about lax reloptions parsing for views

2022-03-31 Thread Greg Stark
The patch is currently not applying.

And it looks like there hasn't been any discussion since Alvaro's
comments last december. I'm marking the patch Returned with Feedback.

On Fri, 24 Dec 2021 at 16:49, Alvaro Herrera  wrote:
>
> On 2021-Dec-21, Mark Dilger wrote:
>
> > Rebased patch attached:
>
> These tests are boringly repetitive.  Can't we have something like a
> nested loop, with AMs on one and reloptions on the other, where each
> reloption is tried on each AM and an exception block to report the
> failure or success for each case?  Maybe have the list of AMs queried
> from pg_am with hardcoded additions if needed?
>
> --
> Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
> "No herd of beasts has a voice as horrible as the human one" (Orual)
>
>


-- 
greg




Re: [PATCH] pgbench: add multiconnect option

2022-03-31 Thread Greg Stark
According to the cfbot this patch needs a rebase




Re: ORDER BY pushdowns seem broken in postgres_fdw

2022-03-31 Thread Tom Lane
Ronan Dunklau  writes:
> On Thursday, 31 March 2022 at 01:41:37 CEST, Tom Lane wrote:
>> If no objections, I think this is committable.

> No objections on my end.

Pushed.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-31 Thread Andres Freund
On 2022-03-31 14:31:43 -0400, Robert Haas wrote:
> On Thu, Mar 31, 2022 at 12:25 PM Andres Freund  wrote:
> > > Andres, thoughts? Do you want me to polish and commit 0001?
> >
> > Yes please!
> 
> Here is a polished version. Comments?

LGTM.




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-31 Thread Andres Freund
Hi,

On 2022-03-30 12:58:26 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-03-30 12:34:34 -0400, Tom Lane wrote:
> >> One refinement that comes to mind as I look at the patch is to distinguish
> >> between "check" and "installcheck".  Not sure that's worthwhile, but not
> >> sure it isn't, either.
> 
> > As it's just about "free" to do so, I see no reason not to go for showing 
> > that
> > difference.  How about:
> 
> > echo "+++ (tap|regress|isolation) [install-]check in $(subdir) +++" && \
> 
> WFM.

Pushed like that.

Greetings,

Andres Freund




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-31 Thread Andrew Dunstan


On 3/31/22 12:45, Tom Lane wrote:
>
> On looking closer, the combination of --config-auth and --create-role
> *only* fixes the config files for SSPI, it doesn't expect the server
> to be running.
>
> I agree that the documentation of this is nonexistent and the design
> is probably questionable, but I'm not volunteering to fix either.
> If you are, step right up.  In the meantime, I believe (without
> having tested) that the correct incantation is to use auth_extra
> but *also* create the user further down.
>

I will take fixing it as a TODO. But not until after feature freeze.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 12:25 PM Andres Freund  wrote:
> > Andres, thoughts? Do you want me to polish and commit 0001?
>
> Yes please!

Here is a polished version. Comments?

> FWIW, once the freeze is done I'm planning to set up scripting to see which
> parts of the code we whacked around don't have test coverage...

Sounds terrifying.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-initdb-When-running-CREATE-DATABASE-use-STRATEGY-WAL.patch
Description: Binary data


Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-31 Thread Andrew Dunstan


On 3/31/22 14:12, Robert Haas wrote:
> On Thu, Mar 31, 2022 at 12:56 PM Robert Haas  wrote:
>> I agree. That's exactly what I said in
>> http://postgr.es/m/CA+TgmoasOhqLR=tsymhd4tyx-qnfwtde_u19zphkunpsckh...@mail.gmail.com
>> ...
> OK, so I pushed a commit adding that incantation to the test script,
> and also a comment explaining why it's there. Possibly we ought to go
> add similar comments to other places where this incantation is used,
> or find a way to make this all a bit more self-documenting, but that
> doesn't necessarily need to be done today.
>
> The buildfarm does look rather green at the moment, though, so I'm not
> sure how I know whether this "worked".



You should know when jacana reports next (in the next hour or three), as
it's not set up for Unix sockets.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Unit tests for SLRU

2022-03-31 Thread Daniel Gustafsson
> On 31 Mar 2022, at 16:30, Aleksander Alekseev wrote:

Thanks for hacking on increasing test coverage!

> While on it, I moved the Asserts() outside of SimpleLruInit().  It didn't seem
> to be the right place to check the IsUnderPostmaster value, and complicated the
> test implementation.

+ *
+ * Returns false if the cache didn't exist before the call, true otherwise.
  */
-void
+bool
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,

If we're changing the API to make it testable, that should be noted in the
comment and how the return value should be interpreted (and when it can be
ignored).  It also doesn't seem all that appealing that SimpleLruInit can
return false on a successful function invocation; it makes for a confusing API.

--
Daniel Gustafsson   https://vmware.com/





Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-31 Thread Peter Geoghegan
On Thu, Mar 31, 2022 at 10:50 AM Andres Freund  wrote:
> > So, to be clear: vac_update_relstats() never actually considered the
> > new relfrozenxid value from its vacuumlazy.c caller to be "in the
> > future"?
>
> No, I added separate debug messages for those, and also applied your patch,
> and it didn't trigger.

The assert is "Assert(diff > 0)", and not "Assert(diff >= 0)". Plus
the other related assert I mentioned did not trigger. So when this
"diff" assert did trigger, the value of "diff" must have been 0 (not a
negative value). While this state does technically indicate that the
"existing" relfrozenxid value (actually a stale version) appears to be
"in the future" (because the OldestXmin XID might still never have
been allocated), it won't ever be in the future according to
vac_update_relstats() (even if it used that version).

I suppose that I might be wrong about that, somehow -- anything is
possible. The important point is that there is currently no evidence
that this bug (or any very recent bug) could ever allow
vac_update_relstats() to actually believe that it needs to update
relfrozenxid/relminmxid, purely because the existing value is in the
future.

The fact that vac_update_relstats() doesn't log/warn when this happens
is very unfortunate, but there is nevertheless no evidence that that
would have informed us of any bug on HEAD, even including the actual
bug here, which is a bug in vacuumlazy.c (not in vac_update_relstats).

> I do think we should apply a version of the warnings you have (with a WARNING
> instead of PANIC obviously). I think it's bordering on insanity that we have
> so many paths to just silently fix stuff up around vacuum. It's like we want
> things to be undebuggable, and to give users no warnings about something being
> up.

Yeah, it's just totally self defeating to not at least log it. I mean
this is a code path that is only hit once per VACUUM, so there is
practically no risk of that causing any new problems.

> Can you repro the issue with my recipe? FWIW, adding log_min_messages=debug5
> and fsync=off made the crash trigger more quickly.

I'll try to do that today. I'm not feeling the most energetic right
now, to be honest.

--
Peter Geoghegan




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-31 Thread Jacob Champion
On Thu, 2022-03-31 at 16:32 +0200, Peter Eisentraut wrote:
> Why add a (failry complicated) pg_inet_pton() when a perfectly
> reasonable inet_pton() exists?

I think it was mostly just that inet_aton() and pg_inet_net_ntop() both
had ports, and I figured I might as well port the other one since we
already had the implementation. (I don't have a good intuition yet for
the community's preference for port vs dependency.)

> I would get rid of all that refactoring and just have your code call
> inet_pton()/inet_ntop() directly.
> 
> If you're worried about portability, and you don't want to go through
> the effort of proving libpgport substitutes, just have your code raise
> an error in the "#else" code paths.  We can fill that in later if there
> is demand.

Switched to inet_pton() in v12, with no #if/else for now. I think this
should work with Winsock as-is; let's see if the bot agrees...

Thanks,
--Jacob
From a973663f1b4b98a427692bb6291ddc2c3ce9bb4f Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Thu, 18 Nov 2021 15:36:18 -0800
Subject: [PATCH v12] libpq: allow IP address SANs in server certs

The current implementation supports exactly one IP address in a server
certificate's Common Name, which is brittle (the strings must match
exactly).  This patch adds support for IPv4 and IPv6 addresses in a
server's Subject Alternative Names.

Per discussion on-list:

- If the client's expected host is an IP address, we allow fallback to
  the Subject Common Name if an iPAddress SAN is not present, even if
  a dNSName is present. This matches the behavior of NSS, in violation
  of the relevant RFCs.

- We also, counter-intuitively, match IP addresses embedded in dNSName
  SANs. From inspection this appears to have been the behavior since the
  SAN matching feature was introduced in acd08d76.

- Unlike NSS, we don't map IPv4 to IPv6 addresses, or vice-versa.

- Move PGSQL_AF_INET* to inet-common.h to reduce copy-paste.

Co-authored-by: Kyotaro Horiguchi 
Co-authored-by: Daniel Gustafsson 
---
 doc/src/sgml/libpq.sgml   |  21 ++-
 src/backend/utils/adt/inet_net_pton.c |   2 +-
 src/include/common/inet-common.h  |  35 +
 src/include/utils/inet.h  |  13 +-
 src/interfaces/libpq/fe-secure-common.c   |  99 +
 src/interfaces/libpq/fe-secure-common.h   |   4 +
 src/interfaces/libpq/fe-secure-openssl.c  | 139 --
 src/port/inet_net_ntop.c  |  12 +-
 .../conf/server-cn-and-ip-alt-names.config|  24 +++
 src/test/ssl/conf/server-ip-alt-names.config  |  19 +++
 .../conf/server-ip-cn-and-alt-names.config|  21 +++
 .../server-ip-cn-and-dns-alt-names.config |  21 +++
 src/test/ssl/conf/server-ip-cn-only.config|  12 ++
 src/test/ssl/conf/server-ip-in-dnsname.config |  18 +++
 .../ssl/ssl/server-cn-and-ip-alt-names.crt|  20 +++
 .../ssl/ssl/server-cn-and-ip-alt-names.key|  27 
 src/test/ssl/ssl/server-ip-alt-names.crt  |  19 +++
 src/test/ssl/ssl/server-ip-alt-names.key  |  27 
 .../ssl/ssl/server-ip-cn-and-alt-names.crt|  19 +++
 .../ssl/ssl/server-ip-cn-and-alt-names.key|  27 
 .../ssl/server-ip-cn-and-dns-alt-names.crt|  20 +++
 .../ssl/server-ip-cn-and-dns-alt-names.key|  27 
 src/test/ssl/ssl/server-ip-cn-only.crt|  18 +++
 src/test/ssl/ssl/server-ip-cn-only.key|  27 
 src/test/ssl/ssl/server-ip-in-dnsname.crt |  18 +++
 src/test/ssl/ssl/server-ip-in-dnsname.key |  27 
 src/test/ssl/sslfiles.mk  |   6 +
 src/test/ssl/t/001_ssltests.pl| 110 +-
 28 files changed, 791 insertions(+), 41 deletions(-)
 create mode 100644 src/include/common/inet-common.h
 create mode 100644 src/test/ssl/conf/server-cn-and-ip-alt-names.config
 create mode 100644 src/test/ssl/conf/server-ip-alt-names.config
 create mode 100644 src/test/ssl/conf/server-ip-cn-and-alt-names.config
 create mode 100644 src/test/ssl/conf/server-ip-cn-and-dns-alt-names.config
 create mode 100644 src/test/ssl/conf/server-ip-cn-only.config
 create mode 100644 src/test/ssl/conf/server-ip-in-dnsname.config
 create mode 100644 src/test/ssl/ssl/server-cn-and-ip-alt-names.crt
 create mode 100644 src/test/ssl/ssl/server-cn-and-ip-alt-names.key
 create mode 100644 src/test/ssl/ssl/server-ip-alt-names.crt
 create mode 100644 src/test/ssl/ssl/server-ip-alt-names.key
 create mode 100644 src/test/ssl/ssl/server-ip-cn-and-alt-names.crt
 create mode 100644 src/test/ssl/ssl/server-ip-cn-and-alt-names.key
 create mode 100644 src/test/ssl/ssl/server-ip-cn-and-dns-alt-names.crt
 create mode 100644 src/test/ssl/ssl/server-ip-cn-and-dns-alt-names.key
 create mode 100644 src/test/ssl/ssl/server-ip-cn-only.crt
 create mode 100644 src/test/ssl/ssl/server-ip-cn-only.key
 create mode 100644 src/test/ssl/ssl/server-ip-in-dnsname.crt
 create mode 100644 src/test/ssl/ssl/server-ip-in-dnsname.key

diff --git a/doc/src/sgml/libpq.sgml 

Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-31 Thread Robert Haas
On Thu, Mar 31, 2022 at 12:56 PM Robert Haas  wrote:
> I agree. That's exactly what I said in
> http://postgr.es/m/CA+TgmoasOhqLR=tsymhd4tyx-qnfwtde_u19zphkunpsckh...@mail.gmail.com
> ...

OK, so I pushed a commit adding that incantation to the test script,
and also a comment explaining why it's there. Possibly we ought to go
add similar comments to other places where this incantation is used,
or find a way to make this all a bit more self-documenting, but that
doesn't necessarily need to be done today.

The buildfarm does look rather green at the moment, though, so I'm not
sure how I know whether this "worked".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: support for MERGE

2022-03-31 Thread Daniel Gustafsson
> On 31 Mar 2022, at 19:38, Ranier Vilela  wrote:

> I think that there is an oversight at 7103ebb
> There is no chance of Assert preventing this bug.

This seems reasonable from brief reading of the code, NULL is a legitimate
value for the map and that should yield an empty list AFAICT.

--
Daniel Gustafsson   https://vmware.com/





Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-31 Thread Andres Freund
Hi,

On 2022-03-31 10:12:49 -0700, Peter Geoghegan wrote:
> On Wed, Mar 30, 2022 at 9:59 PM Andres Freund  wrote:
> > I'm not sure there's a proper bug on HEAD here. I think at worst it can 
> > delay
> > the horizon increasing a bunch, by falsely not using an aggressive vacuum 
> > when
> > we should have - might even be limited to a single autovacuum cycle.
> 
> So, to be clear: vac_update_relstats() never actually considered the
> new relfrozenxid value from its vacuumlazy.c caller to be "in the
> future"?

No, I added separate debug messages for those, and also applied your patch,
and it didn't trigger.

I don't immediately see how we could end up computing a frozenxid value that
would be problematic? The pgcform->relfrozenxid value will always be the
"local" value, which afaics can be behind the other database's value (and thus
behind the value from the relcache init file). But it can't be ahead, we have
the proper invalidations for that (I think).


I do think we should apply a version of the warnings you have (with a WARNING
instead of PANIC obviously). I think it's bordering on insanity that we have
so many paths to just silently fix stuff up around vacuum. It's like we want
things to be undebuggable, and to give users no warnings about something being
up.


> It just looked that way to the failing assertion in
> vacuumlazy.c, because its own version of the original relfrozenxid was
> stale from the beginning? And so the worst problem is probably just
> that we don't use aggressive VACUUM when we really should in rare
> cases?

Yes, I think that's right.

Can you repro the issue with my recipe? FWIW, adding log_min_messages=debug5
and fsync=off made the crash trigger more quickly.

Greetings,

Andres Freund




support for MERGE

2022-03-31 Thread Ranier Vilela
On 2022-Mar-28, Alvaro Herrera wrote:

>> I intend to get this pushed after lunch.

> Pushed, with one more change: fetching the tuple ID junk attribute in
> ExecMerge was not necessary, since we already had done that in
> ExecModifyTable. We just needed to pass that down to ExecMerge, and
> make sure to handle the case where there isn't one.
Hi,

I think that there is an oversight at 7103ebb

There is no chance of Assert preventing this bug.

regards,
Ranier Vilela


0001-avoid-dereference-null-map.patch
Description: Binary data

