Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Bharath Rupireddy
On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila  wrote:
>
> > Thanks for the patch. I have some comments on v42:
>
> Probably trying to keep it similar to
> binary_upgrade_create_empty_extension(). I think it depends on what
> behaviour we expect for NULL input.

confirmed_flush_lsn for a logical slot can be null (for instance,
before confirmed_flush is updated for a newly created logical slot, if
someone calls pg_stat_replication -> pg_get_replication_slots), and
when it is, binary_upgrade_create_empty_extension() errors out.
Is this behaviour wanted? I think having the function return null on
null input is better behaviour here.

> > 2.
> > +Datum
> > +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
> >
> > The function name looks too generic in the sense that it validates WAL
> > records for correctness/corruption, but it is not. Can it be something
> > like binary_upgrade_{check_for_wal_logical_end,
> > check_for_logical_end_of_wal} or such?
> >
>
> How about a slightly modified version like
> binary_upgrade_validate_wal_logical_end?

Works for me.

> > 3.
> > +/* Quick exit if the given lsn is larger than current one */
> > +if (start_lsn >= GetFlushRecPtr(NULL))
> > +PG_RETURN_BOOL(false);
> > +
> >
> > An LSN that doesn't exist yet is an error IMO; maybe an error is better here?
> >
>
> It will lead to an error at a later point anyway, but this way we
> provide more information about all the slots that have an invalid
> value of confirmed_flush LSN.

I disagree with the function returning false for a non-existing LSN.
IMO, failing fast when an LSN that doesn't exist yet is supplied to
the function is the right approach. You never know: the slot's on-disk
content can get corrupted for some reason, leaving confirmed_flush_lsn
invalid or pointing to a non-existing LSN.

> > 4.
> > + * This function is used to verify that there are no WAL records (except 
> > some
> > + * types) after confirmed_flush_lsn of logical slots, which means all the
> > + * changes were replicated to the subscriber. There is a possibility that 
> > some
> > + * WALs are inserted during upgrade, so such types would be ignored.
> > + *
> >
> > This comment before the function better be at the callsite of the
> > function, because as far as this function is concerned, it checks if
> > there are any WAL records that are not "certain" types after the given
> > LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
> >
>
> Yeah, we should give information at the callsite but I guess we need
> to give some context atop this function as well so that it is easier
> to explain the functionality.

At the call site a detailed description is good; at the function
definition, just a reference to the call site is enough.

> > 5. Trying to understand the interaction of this feature with custom
> > WAL records that a custom WAL resource manager puts in. Is it okay to
> > have custom WAL records after the "logical WAL end"?
> > +/*
> > + * There is a possibility that following records may be generated
> > + * during the upgrade.
> > + */
> >
>
> I don't think so. The only valid records for the checks in this
> function are probably the ones that can get generated by the upgrade
> process because we ensure that walsender sends all the records before
> it exits at shutdown time.

Can you help me understand how the list of WAL records that pg_upgrade
can generate was put together? Were they identified by running some tests?

> > 10.
> > Why just wal_level and max_replication_slots, why not
> > max_worker_processes and max_wal_senders too?
>
> Isn't it sufficient to check the parameters that are required to
> create a slot aka what we check in the function
> CheckLogicalDecodingRequirements()? We are only creating logical slots
> here so I think that should be sufficient.

Ah, that makes sense.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z

2023-09-21 Thread Michael Paquier
On Thu, Sep 21, 2023 at 06:56:29PM +1200, David Rowley wrote:
> I deem it pretty unlikely that someone will accidentally remove the
> code that I just committed, and a test to check that vacuumdb -Z
> --buffer-usage-limit ... passes the BUFFER_USAGE_LIMIT option would
> likely just forever mark that we once had a trivial bug where we
> forgot to include --buffer-usage-limit when -Z was specified.

Perhaps so.

> If others feel strongly that a test is worthwhile, then I'll reconsider.

I don't know if you would like that, but the addition is as simple as
the attached, FYI.
--
Michael
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index eccfcc54a1..597bbd8f54 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -40,6 +40,10 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--disable-page-skipping', 'postgres' ],
 	qr/statement: VACUUM \(DISABLE_PAGE_SKIPPING, SKIP_DATABASE_STATS\).*;/,
 	'vacuumdb --disable-page-skipping');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--buffer-usage-limit', '0', 'postgres' ],
+	qr/statement: VACUUM \(SKIP_DATABASE_STATS, BUFFER_USAGE_LIMIT '0'\).*;/,
+	'vacuumdb --buffer-usage-limit');
 $node->issues_sql_like(
 	[ 'vacuumdb', '--skip-locked', 'postgres' ],
 	qr/statement: VACUUM \(SKIP_DATABASE_STATS, SKIP_LOCKED\).*;/,
@@ -48,6 +52,10 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--skip-locked', '--analyze-only', 'postgres' ],
 	qr/statement: ANALYZE \(SKIP_LOCKED\).*;/,
 	'vacuumdb --skip-locked --analyze-only');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--buffer-usage-limit', '0', '--analyze-only', 'postgres' ],
+	qr/statement: ANALYZE \(BUFFER_USAGE_LIMIT '0'\).*;/,
+	'vacuumdb --buffer-usage-limit --analyze-only');
 $node->command_fails(
 	[ 'vacuumdb', '--analyze-only', '--disable-page-skipping', 'postgres' ],
 	'--analyze-only and --disable-page-skipping specified together');




Re: Row pattern recognition

2023-09-21 Thread Tatsuo Ishii
> Attached is the fix against v6 patch. I will include this in upcoming v7 
> patch.

Attached is the v7 patch. It includes the fix mentioned above. Also,
this time the pattern matching engine is enhanced: previously it
recursed in the row direction, which means that if the number of rows
in a frame was large, it could exceed the stack depth limit. The new
version recurses over the matched pattern variables in a row, of which
there are at most 26, which should be small enough.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
From 4d5be4c27ae5e4a50924520574e2c1ca3e398551 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Fri, 22 Sep 2023 13:53:35 +0900
Subject: [PATCH v7 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 222 ++--
 src/include/nodes/parsenodes.h  |  56 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 273 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..74c2069050 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	RPSkipTo	skipto;
 }
 
 %type 	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -453,8 +456,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +558,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
+%type 	first_or_last
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +642,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +671,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +729,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
-	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -731,7 +742,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
 	MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NAME_P NAMES NATIONAL 

Doesn't pgstat_report_wal() handle the argument "force" incorrectly

2023-09-21 Thread Ryoga Yoshida

Hi,

pgstat_report_wal() calls pgstat_flush_wal() and pgstat_flush_io(). When 
calling them, pgstat_report_wal() specifies its argument "force" as the 
argument of them, as follows. But according to the code of 
pgstat_flush_wal() and pgstat_flush_io(), their argument is "nowait" and 
its meaning seems the opposite of "force". This means that, even when 
checkpointer etc calls pgstat_report_wal() with force=true to forcibly 
flush the statistics, pgstat_flush_wal() and pgstat_flush_io() skip 
flushing the statistics if they fail to acquire the lock immediately 
because they are called with nowait=true. This seems unexpected behavior 
and a bug.

void
pgstat_report_wal(bool force)
{
pgstat_flush_wal(force);

pgstat_flush_io(force);
}

BTW, pgstat_report_stat() treats "nowait" and "force" as the opposite 
one, as follows.

/* don't wait for lock acquisition when !force */
nowait = !force;

Ryoga Yoshida

diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index bcaed14d02..f9b5ee8a7b 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -42,9 +42,9 @@ static WalUsage prevWalUsage;
 void
 pgstat_report_wal(bool force)
 {
-	pgstat_flush_wal(force);
+	pgstat_flush_wal(!force);
 
-	pgstat_flush_io(force);
+	pgstat_flush_io(!force);
 }
 
 /*



Re: [HACKERS] Should logtape.c blocks be of type long?

2023-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2023 at 9:46 PM Michael Paquier  wrote:
> So, attached is a patch to change these longs to int64.  Any thoughts?

No new thoughts. I'm still all in favor of this. Thanks for picking it up.

At some point we should completely ban the use of "long".

-- 
Peter Geoghegan




Try adding type cast with tablespace

2023-09-21 Thread 程ゆき
Hi all,


Good day!


I am a newbie to PostgreSQL and recently came across an idea about
type-casting tablespace OIDs.

The motivation is that when I have to upgrade a PostgreSQL database, we
need to join other tables to track the tablespace name. I have created
a simple patch to resolve this.

Hope you can take a look at this.


My execution sample:

# After patch:

postgres=# SELECT oid,oid::regtablespace,spcname from pg_tablespace ;
 oid  |    oid     |  spcname
------+------------+------------
 1663 | pg_default | pg_default
 1664 | pg_global  | pg_global
(2 rows)

# Before patch:

postgres-# SELECT oid,oid::regtablespace,spcname from pg_tablespace ;
ERROR:  syntax error at or near "oid"
LINE 1: oid  |oid |  spcname
             ^




I added the "::regtablespace" part to the source.

Note: While developing, I also had to add several rows to pg_catalog tables.
Please point out if any newly assigned OID is not appropriate.


Kind Regards,

Yuki Tei
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 296930eb3b..3ab0f75c84 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "commands/tablespace.h"
 #include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -869,6 +870,122 @@ regoperatorsend(PG_FUNCTION_ARGS)
 	return oidsend(fcinfo);
 }
 
+/*
+ * regtablespacein- converts "tablespacename" to tablespace OID
+ *
+ * We also accept a numeric OID, for symmetry with the output routine.
+ *
+ * '-' signifies unknown (OID 0).  In all other cases, the input must
+ * match an existing pg_tablespace entry.
+ */
+Datum
+regtablespacein(PG_FUNCTION_ARGS)
+{
+char   *tablespace_name_or_oid = PG_GETARG_CSTRING(0);
+Node   *escontext = fcinfo->context;
+Oid result;
+List   *names;
+ 
+/* Handle "-" or numeric OID */
+if (parseDashOrOid(tablespace_name_or_oid, &result, escontext))
+PG_RETURN_OID(result);
+ 
+/* The rest of this wouldn't work in bootstrap mode */
+if (IsBootstrapProcessingMode())
+elog(ERROR, "regtablespace values must be OIDs in bootstrap mode");
+ 
+/* Normal case: see if the name matches any pg_tablespace entry. */
+names = stringToQualifiedNameList(tablespace_name_or_oid, escontext);
+if (names == NIL)
+PG_RETURN_NULL();
+ 
+if (list_length(names) != 1)
+ereturn(escontext, (Datum) 0,
+(errcode(ERRCODE_INVALID_NAME),
+ errmsg("invalid name syntax")));
+ 
+result = get_tablespace_oid(strVal(linitial(names)), true);
+ 
+if (!OidIsValid(result))
+ereturn(escontext, (Datum) 0,
+(errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablespace \"%s\" does not exist",
+strVal(linitial(names)))));
+ 
+PG_RETURN_OID(result);
+}
+ 
+/*
+ * to_regtablespace   - converts "tablespacename" to tablespace OID
+ *
+ * If the name is not found, we return NULL.
+ */
+Datum
+to_regtablespace(PG_FUNCTION_ARGS)
+{
+char   *tablespace_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+Datum   result;
+ErrorSaveContext escontext = {T_ErrorSaveContext};
+ 
+if (!DirectInputFunctionCallSafe(regtablespacein, tablespace_name,
+ InvalidOid, -1,
+ (Node *) &escontext,
+ &result))
+PG_RETURN_NULL();
+PG_RETURN_DATUM(result);
+}
+ 
+/*
+ * regtablespaceout   - converts tablespace OID to "tablespace_name"
+ */
+Datum
+regtablespaceout(PG_FUNCTION_ARGS)
+{
+Oid spcid = PG_GETARG_OID(0);
+char   *result;
+ 
+if (spcid == InvalidOid)
+{
+result = pstrdup("-");
+PG_RETURN_CSTRING(result);
+}
+ 
+result = get_tablespace_name(spcid);
+ 
+if (result)
+{
+/* pstrdup is not really necessary, but it avoids a compiler warning */
+result = pstrdup(quote_identifier(result));
+}
+else
+{
+/* If OID doesn't match any tablespace, return it numerically */
+result = (char *) palloc(NAMEDATALEN);
+snprintf(result, NAMEDATALEN, "%u", spcid);
+}
+ 
+PG_RETURN_CSTRING(result);
+}
+
+/*
+ *  regtablespacerecv - converts external binary format to regtablespace
+ */
+Datum
+regtablespacerecv(PG_FUNCTION_ARGS)
+{
+/* Exactly the same as oidrecv, so share code */
+return oidrecv(fcinfo);
+}
+ 
+/*
+ *  regtablespacesend - converts regtablespace to binary format
+ */
+Datum

Re: [HACKERS] Should logtape.c blocks be of type long?

2023-09-21 Thread Michael Paquier
On Sun, Feb 26, 2017 at 10:04:20AM -0800, Peter Geoghegan wrote:
> I think it's worth being consistent about a restriction like this, as
> Robert said. Given that fixing this issue will not affect the machine
> code generated by compilers for the majority of platforms we support,
> doing so seems entirely worthwhile to me.

(Reviving an old thread, five years later..)

As far as I can see, no patches have been sent to do that, and the
changes required to replace long with int64 in the logtape and
tuplesort code are rather minimal, as long as some care is taken with
all the internals of logtape (correct me, of course, if I'm wrong).  I
really don't see why we shouldn't make the switch, given that, as
things stand, Windows cannot handle more than 16TB worth of blocks
because of the use of long in these code paths.

So, attached is a patch to change these longs to int64.  Any thoughts?
--
Michael
diff --git a/src/include/utils/logtape.h b/src/include/utils/logtape.h
index 5420a24ac9..2de7add81c 100644
--- a/src/include/utils/logtape.h
+++ b/src/include/utils/logtape.h
@@ -51,7 +51,7 @@ typedef struct TapeShare
 	 * Currently, all the leader process needs is the location of the
 	 * materialized tape's first block.
 	 */
-	long		firstblocknumber;
+	int64		firstblocknumber;
 } TapeShare;
 
 /*
@@ -70,8 +70,8 @@ extern void LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size);
 extern void LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size);
 extern void LogicalTapeFreeze(LogicalTape *lt, TapeShare *share);
 extern size_t LogicalTapeBackspace(LogicalTape *lt, size_t size);
-extern void LogicalTapeSeek(LogicalTape *lt, long blocknum, int offset);
-extern void LogicalTapeTell(LogicalTape *lt, long *blocknum, int *offset);
-extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
+extern void LogicalTapeSeek(LogicalTape *lt, int64 blocknum, int offset);
+extern void LogicalTapeTell(LogicalTape *lt, int64 *blocknum, int *offset);
+extern int64 LogicalTapeSetBlocks(LogicalTapeSet *lts);
 
 #endif			/* LOGTAPE_H */
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f31878bdee..604fd00308 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -94,9 +94,9 @@
  */
 typedef struct TapeBlockTrailer
 {
-	long		prev;			/* previous block on this tape, or -1 on first
+	int64		prev;			/* previous block on this tape, or -1 on first
  * block */
-	long		next;			/* next block on this tape, or # of valid
+	int64		next;			/* next block on this tape, or # of valid
  * bytes on last block (if < 0) */
 } TapeBlockTrailer;
 
@@ -153,10 +153,10 @@ struct LogicalTape
 	 * When concatenation of worker tape BufFiles is performed, an offset to
 	 * the first block in the unified BufFile space is applied during reads.
 	 */
-	long		firstBlockNumber;
-	long		curBlockNumber;
-	long		nextBlockNumber;
-	long		offsetBlockNumber;
+	int64		firstBlockNumber;
+	int64		curBlockNumber;
+	int64		nextBlockNumber;
+	int64		offsetBlockNumber;
 
 	/*
 	 * Buffer for current data block(s).
@@ -172,7 +172,7 @@ struct LogicalTape
 	 * order; blocks are consumed from the end of the array (lowest block
 	 * numbers first).
 	 */
-	long	   *prealloc;
+	int64	   *prealloc;
 	int			nprealloc;		/* number of elements in list */
 	int			prealloc_size;	/* number of elements list can hold */
 };
@@ -200,9 +200,9 @@ struct LogicalTapeSet
 	 * blocks that are in unused holes between worker spaces following BufFile
 	 * concatenation.
 	 */
-	long		nBlocksAllocated;	/* # of blocks allocated */
-	long		nBlocksWritten; /* # of blocks used in underlying file */
-	long		nHoleBlocks;	/* # of "hole" blocks left */
+	int64		nBlocksAllocated;	/* # of blocks allocated */
+	int64		nBlocksWritten; /* # of blocks used in underlying file */
+	int64		nHoleBlocks;	/* # of "hole" blocks left */
 
 	/*
 	 * We store the numbers of recycled-and-available blocks in freeBlocks[].
@@ -213,19 +213,19 @@ struct LogicalTapeSet
 	 * LogicalTapeSetForgetFreeSpace().
 	 */
 	bool		forgetFreeSpace;	/* are we remembering free blocks? */
-	long	   *freeBlocks;		/* resizable array holding minheap */
-	long		nFreeBlocks;	/* # of currently free blocks */
+	int64	   *freeBlocks;		/* resizable array holding minheap */
+	int64		nFreeBlocks;	/* # of currently free blocks */
 	Size		freeBlocksLen;	/* current allocated length of freeBlocks[] */
 	bool		enable_prealloc;	/* preallocate write blocks? */
 };
 
 static LogicalTape *ltsCreateTape(LogicalTapeSet *lts);
-static void ltsWriteBlock(LogicalTapeSet *lts, long blocknum, const void *buffer);
-static void ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer);
-static long ltsGetBlock(LogicalTapeSet *lts, LogicalTape *lt);
-static long ltsGetFreeBlock(LogicalTapeSet *lts);
-static long ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt);
-static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum);
+static void 

Re: New WAL record to detect the checkpoint redo location

2023-09-21 Thread Dilip Kumar
On Thu, Sep 21, 2023 at 1:50 AM Robert Haas  wrote:
>
> On Mon, Sep 18, 2023 at 2:57 PM Robert Haas  wrote:
> > I've been brainstorming about this today, trying to figure out some
> > ideas to make it work.
>
> Here are some patches.
>
> 0001 refactors XLogInsertRecord to unify a couple of separate tests of
> isLogSwitch, hopefully making it cleaner and cheaper to add more
> special cases.

Yeah, this looks like an improvement as it removes one isLogSwitch
check from the code.

> 0002 is a very minimal patch to add XLOG_CHECKPOINT_REDO without using
> it for anything.
>
> 0003 modifies CreateCheckPoint() to insert an XLOG_CHECKPOINT_REDO
> record for any non-shutdown checkpoint, and modifies
> XLogInsertRecord() to treat that as a new special case, wherein after
> inserting the record the redo pointer is reset while still holding the
> WAL insertion locks.

Yeah, from a design POV it looks fine to me, because the main goal was
to insert the XLOG_CHECKPOINT_REDO record and set "RedoRecPtr" under
the same exclusive WAL insertion lock, and this patch does that.  As
you already mentioned, it is an improvement over my first patch
because a) it holds the exclusive WAL insertion lock for a very short
duration, and b) it does not increase the number of branches in
XLogInsertRecord().

Some review comments:
1.
I feel we can remove one more branch from the normal path by adding
one branch to this special case, i.e.

Your code is
if (class == WALINSERT_SPECIAL_SWITCH)
{
/*execute isSwitch case */
}
else if (class == WALINSERT_SPECIAL_CHECKPOINT)
{
/*execute checkpoint redo case */
}
else
{
/* common case*/
}

My suggestion
if (xl_rmid == RM_XLOG_ID)
{
if (class == WALINSERT_SPECIAL_SWITCH)
{
  /*execute isSwitch case */
}
else if (class == WALINSERT_SPECIAL_CHECKPOINT)
   {
 /*execute checkpoint redo case */
   }
}
else
{
 /* common case*/
}

2.
In fact, I feel that we can remove this branch as well, right? I mean,
why do we need to have this separate thing called "class"?  We can
very much use "info" for that purpose, right?

+ /* Does this record type require special handling? */
+ if (rechdr->xl_rmid == RM_XLOG_ID)
+ {
+ if (info == XLOG_SWITCH)
+ class = WALINSERT_SPECIAL_SWITCH;
+ else if (XLOG_CHECKPOINT_REDO)
+ class = WALINSERT_SPECIAL_CHECKPOINT;
+ }

So if we remove this, then we do not have this class, and the above
case would look like:

if (xl_rmid == RM_XLOG_ID)
{
 if (info == XLOG_SWITCH)
{
  /*execute isSwitch case */
}
else if (info == XLOG_CHECKPOINT_REDO)
   {
 /*execute checkpoint redo case */
   }
}
else
{
 /* common case*/
}

3.
+ /* Does this record type require special handling? */
+ if (rechdr->xl_rmid == RM_XLOG_ID)
+ {
+ if (info == XLOG_SWITCH)
+ class = WALINSERT_SPECIAL_SWITCH;
+ else if (XLOG_CHECKPOINT_REDO)
+ class = WALINSERT_SPECIAL_CHECKPOINT;
+ }
+

the above check in the else if is wrong; I mean,
else if (XLOG_CHECKPOINT_REDO) should be else if (info == XLOG_CHECKPOINT_REDO)

That's all I have for now.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-21 Thread Lepikhov Andrei
On Wed, Sep 20, 2023, at 5:04 PM, Yuya Watari wrote:
> On Tue, Sep 19, 2023 at 5:21 PM Andrey Lepikhov
>  wrote:
>> Working on self-join removal in the thread [1] nearby, I ran into a
>> problem, which gave an additional argument for working in this new
>> direction beyond a couple of previous ones.
>> With indexing positions in the list of equivalence members, we make some
>> optimizations like join elimination more complicated - they may need to
>> remove some clauses and equivalence class members.
>> To change the lists of derived clauses or ec_members, we would have to
>> go through all the index lists and fix them, which is a non-trivial
>> operation.
>
> Thank you for looking into this and pointing that out. I understand
> that this problem will occur somewhere like your patch [1] quoted
> below because we need to modify RelOptInfo->eclass_child_members in
> addition to ec_members. Is my understanding correct? (Of course, I
> know ec_[no]rel_members, but I doubt we need them.)

It is okay if we talk about the self-join-removal feature specifically, because
joins are removed before inheritance expansion.
But ec_source_indexes and ec_derive_indexes point to specific places in the
eq_sources and eq_derives lists. If I removed an EquivalenceClass or a
restriction during an optimisation, I would have to fix up all the indexes, too.
Right now, I use a workaround here and remove the index link without removing
the element from the list. But I'm not sure how good this approach is in the
long run.
So, having eq_sources and eq_derives localised in the EC could make such
optimisations a bit simpler.

-- 
Regards,
Andrei Lepikhov




Re: pg_upgrade --check fails to warn about abstime

2023-09-21 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Sep 20, 2023 at 06:54:24PM +0200, Álvaro Herrera wrote:
>> I got a complaint that pg_upgrade --check fails to raise red flags when
>> the source database contains type abstime when upgrading from pg11.  The
>> type (along with reltime and tinterval) was removed by pg12.

> Wow, I never added code to pg_upgrade to check for that, and no one
> complained either.

Yeah, so most people had indeed listened to warnings and moved away
from those datatypes.  I'm inclined to think that adding code for this
at this point is a bit of a waste of time.

regards, tom lane




Re: Questioning an errcode and message in jsonb.c

2023-09-21 Thread Andy Fan
Hi Chap,


> As to how to redesign the error message is a bit confusing to
> me; it would be good to see the proposed code as well.
>
> The only concern from me is that the new error in a newer
> version is not compatible with older versions, which may or
> may not matter, I don't know.
>
>
Do you mind providing the patch you have in mind? Let's just ignore
the compatibility issue for now.  I think that would be pretty helpful
for further discussion.

-- 
Best Regards
Andy Fan


Re: Report planning memory in EXPLAIN ANALYZE

2023-09-21 Thread Andy Fan
Hi Ashutosh,

Thanks for the patch!

I think the reason why others are not excited about this is that the
benefit is not clear enough to them at first glance, and the chosen
words "Planning Memory: used" are still confusing to different people.
I admit it is absolutely clear to you, just not to others.  Here is
some minor feedback from me:

1). The commit message of patch 1 says what it does but not why. After
reading through the discussion, I suggest making that clearer to
others, e.g.:

...
After planning is done, lots of memory may still be occupied: allocated
but not pfree-d. None of this memory can be used in later stages. This
patch reports such memory usage so that future mistakes can be found
more easily.
...

(followed by a description of how it works, like what you already have
in the commit message).

So if people read the commit message, they can understand what the
feature is for.

2). At the second level, it would be good if the user could understand
the metric without reading the commit message.

Planning Memory: used=15088 bytes allocated=16384 bytes

The word 'Planning' suggests a time duration, so 'used' may mean used
during that duration or used after it. Obviously you mean the latter,
but it would not be surprising if others read it the other way. So I'd
like to reword the metrics as:

Memory Occupied (Now): Parser: 1k Planner: 10k

'Now' is a clearer signal that this is a point in time rather than a
duration. 'Parser' and 'Planner' also convey a timeline of the
important points. At the same time, this costs very little coding
effort on top of patch 01. Different people may feel differently about
these words; I just hope I have described the goal clearly.

About patches 2 & 3: since I didn't find a good use for them, I didn't
put much effort into them. "Interesting" is probably not reason enough
to add something, IMO, since there are too many interesting things.

Personally I quite like patch 1; we just need to refactor some wording
to make everything clearer.

--
Best Regards
Andy Fan


RE: CI: Unfamiliar error while testing macOS

2023-09-21 Thread Hayato Kuroda (Fujitsu)
Dear Daniel,

Thank you for the confirmation!

> 
> That looks like a transient infrastructure error and not something related to
> your patch, it will most likely go away for the next run.
>

I checked the next run and found that the tests passed on all platforms.
I'm still not sure of the root cause, but I'll set it aside for now.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: pg_upgrade --check fails to warn about abstime

2023-09-21 Thread Bruce Momjian
On Wed, Sep 20, 2023 at 06:54:24PM +0200, Álvaro Herrera wrote:
> I got a complaint that pg_upgrade --check fails to raise red flags when
> the source database contains type abstime when upgrading from pg11.  The
> type (along with reltime and tinterval) was removed by pg12.

Wow, I never added code to pg_upgrade to check for that, and no one
complained either.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Questioning an errcode and message in jsonb.c

2023-09-21 Thread Chapman Flack

On 2023-09-21 20:38, Andy Fan wrote:

insert into tb select '{"a": "foo", "b": 1}';
...
select cast(a->'a' as numeric) from tb;
ERRCODE_INVALID_PARAMETER_VALUE  cannot cast jsonb string to type 
numeric

...
select cast(a->'b' as int2) from tb;
NUMERIC_VALUE_OUT_OF_RANGE smallint out of range


... and perhaps driving home the point:

insert into tb select '{"a": "1", "b": 1}';
select cast(a->'a' as int2) from tb;
ERRCODE_INVALID_PARAMETER_VALUE  cannot cast jsonb string to type 
smallint


which illustrates that:

1) it is of no consequence whether the non-numeric JSON type of
the cast source is something that does or doesn't look castable to
numeric: in the first-step test that produces this message, the
only thing tested is whether the JSON type of the source is JSON
numeric. If it is not, there will be no attempt to cast it.

2) it is immaterial what the SQL target type of the cast is;
the message will misleadingly say "to smallint" if you are
casting to smallint, or "to double precision" if you are casting
to that, but the only thing that has been tested is whether the
source has JSON type numeric.

The message in this case only really means "JSON type string
where JSON type numeric needed".

The issue is fully general:

insert into tb select '{"a": 1}';
select cast(a->'a' as boolean) from tb;
ERRCODE_INVALID_PARAMETER_VALUE  cannot cast jsonb numeric to type 
boolean


Again, all that has been tested is whether the JSON type is
JSON boolean. If it is not, no effort is made to cast it, and
the message really only means "JSON type numeric where
JSON type boolean needed".

The most annoying cases are the ones where JSON type numeric
is needed, because of the several different SQL types that one
might want as the ultimate target type, so extra machinations
are needed to get this message to misleadingly mention that
ultimate type.

As I mentioned in my earlier message, the behavior here
differs from the exactly analogous specified behavior for
XMLCAST in SQL/XML. I am not saying the behavior here is
wrong; perhaps SQL/JSON has chosen to specify it differently
(I haven't got a copy). But I pointed out the difference as
it may help to pinpoint the relevant part of the spec.

In the SQL/XML XMLCAST, the same two-step process exists:
a first step that is only concerned with the XML Schema
type (say, is it xs:string or xs:decimal?), and a second
step where the right xs type is then cast to the wanted SQL type.

The difference is, XMLCAST in the first step will try to
cast a different xs type to the right xs type. By contrast
our JSON casting simply requires the JSON type to be the
right JSON type, or fails. And for all I know, that different
approach may be as specified in SQL/JSON.

But I would not have it use ERRCODE_INVALID_PARAMETER_VALUE,
or issue a message talking about the ultimate SQL type when the
only thing checked in that step is the JSON type ... unless
the spec really says to do so.

Regards,
-Chap




Re: Questioning an errcode and message in jsonb.c

2023-09-21 Thread Andy Fan
Hi Peter,

On Wed, Sep 20, 2023 at 4:51 PM Peter Eisentraut 
wrote:

> On 18.09.23 18:55, Chapman Flack wrote:
> > It would make me happy if the message could be changed, and maybe
> > ERRCODE_INVALID_PARAMETER_VALUE also changed, perhaps to one of
> > the JSON-specific ones in the 2203x range.
>
> What is an example of a statement or function call that causes this
> error?  Then we can look in the SQL standard for guidance.
>

Thanks for showing interest in this.   The issue comes from this situation.

create table tb(a jsonb);

insert into tb select '{"a": "foo", "b": 1}';


select cast(a->'a' as numeric) from tb;
ERRCODE_INVALID_PARAMETER_VALUE  cannot cast jsonb string to type numeric

the call stack is:
0  in errstart of elog.c:351
1  in errstart_cold of elog.c:333
2  in cannotCastJsonbValue of jsonb.c:2033
3  in jsonb_numeric of jsonb.c:2063
4  in ExecInterpExpr of execExprInterp.c:758

select cast(a->'b' as int2) from tb;
NUMERIC_VALUE_OUT_OF_RANGE smallint out of range

the call stack is:
1  in errstart_cold of elog.c:333
2  in numeric_int2 of numeric.c:4503
3  in DirectFunctionCall1Coll of fmgr.c:785
4  in jsonb_int2 of jsonb.c:2086

There are two different errcodes involved here, and two different
functions play a part in it (jsonb_numeric and numeric_int2); the
error code jsonb_numeric uses is improper as well.

The difference is not huge, but it would be cool if we could make
it better. If something really improves here, it will make the code in [0]
cleaner as well. The bad code in [0]:

+Datum
+jsonb_finish_numeric(PG_FUNCTION_ARGS)
+{
+ JsonbValue *v = (JsonbValue *)PG_GETARG_POINTER(0);
+ Oid final_oid = PG_GETARG_OID(1);
+ if (v->type != jbvNumeric)
+ cannotCastJsonbValue(v->type, format_type_be(final_oid));
+ PG_RETURN_NUMERIC(v->val.numeric);
+}

To match the error message in the older version, I have to pass a
{final_oid} argument to the jsonb_finish_numeric function, which
is not good.

How to redesign the error message is a bit unclear to me; it would
be good to see proposed code as well.

The only concern from me is that the new error in the newer version
is not compatible with older versions, which may or may not matter,
I don't know.

[0]
https://www.postgresql.org/message-id/43a988594ac91a63dc4bb49a94303a42%40anastigmatix.net
-- 
Best Regards
Andy Fan


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Michael Paquier
On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:
> We have discussed this point. Normally, we don't have such options in
> upgrade, so we were hesitent to add a new one for this but there is a
> discussion to add an --exclude-logical-slots option. We are planning
> to add that as a separate patch after getting some more consensus on
> it. Right now, the idea is to get the main patch ready.

Okay.  I am wondering if the subscriber part is OK now without an
option, but that could also be considered separately.  At least I hope
so.
--
Michael


signature.asc
Description: PGP signature


Re: Disabling Heap-Only Tuples

2023-09-21 Thread Andres Freund
Hi,

On 2023-09-19 20:20:06 +0200, Matthias van de Meent wrote:
> Mostly agreed, but I think there's a pitfall here. You seem to assume
> we have a perfect oracle that knows the optimal data size, but we
> already know that our estimates can be significantly off. I don't
> quite trust the statistics enough to do any calculations based on the
> number of tuples in the relation. That also ignores the fact that we
> don't actually have any good information about the average size of the
> tuples in the table. So with current statistics, any automated "this
> is how large the table should be" decisions would result in an
> automated footgun, instead of the current patch's where the user has
> to decide to configure it to an explicit value.

The proposed patch already relies on the FSM being reasonably up to date, no? If
the FSM doesn't know about free space, the patch won't be able to place tuples
earlier in the relation. And if the FSM wrongly thinks there's lots of free
space, it'll make updates very expensive.

We obviously don't want to scan the whole FSM on an ongoing basis, but
visiting the top-level FSM pages and/or having vacuum/analyze update some
statistic based on a more thorough analysis of the FSM doesn't seem insane.


A related issue is that an accurate tuple size and accurate number of tuples
isn't really sufficient - if tuples are wider, there can be plenty space on
pages without updates being able to reuse that space. And the width of tuples
doesn't have to be evenly distributed, so a simple approach of calculating how
many tuples of the average width fit in a page and then using that to come up
with the overall number of required pages isn't necessarily accurate either.


> But about that: I'm not sure what the "footgun" is that you've
> mentioned recently?
> The issue with excessive bloat (when the local_update_limit is set too
> small and fillfactor is low) was fixed in the latest patch nearly
> three weeks ago, so the only remaining issue with misconfiguration is
> slower updates.

There seem to be plenty of footguns. Just to name a few:

- The user has to determine a good value for local_update_limit, without
  really any good way of doing so.

- A "too low" local_update_limit will often succeed in finding some space in
  earlier pages, without that providing useful progress on compaction -
  e.g. because tuples on the earlier page will subsequently be updated and
  there's no space anymore. Leading to index bloat.

- Configuring local_update_limit as a fixed size will be fragile when the data
  actually grows, leading to lots of pointless out-of-page updates.


I think a minimal working approach could be to have the configuration be based
on the relation size vs space known to the FSM. If the target block of an
update is higher than ((relation_size - fsm_free_space) *
new_reloption_or_guc), try finding the target block via the FSM, even if
there's space on the page.


> Sure, that's not great, but in my opinion not a
> "footgun": performance returns immediately after resetting
> local_update_limit, and no space was lost.

I think there's plenty of ways to get pointless out-of-page updates, and
therefore index bloat, with local_update_limit as-proposed (see earlier in the
email). Once you have such pointless out-of-page updates, disabling
local_update_limit won't bring performance back immediately (space usage due
to index bloat and lookup performance issues due to the additional index
entries).


> Updating the reloption after relation truncation implies having the
> same lock as relation truncation, i.e. AEL (if the vacuum docs are to
> be believed).

Aside: We really need to get rid of the AEL for relation truncation - it's
quite painful for hot standby workloads...

Thomas has been talking about a patch (and perhaps even posted it) that adds
infrastructure providing a "shared smgrrelation". Once we have that I think we
could lower the required lock level for truncation, by storing both the
filesystem size and the "valid" size. There's a few potential models:

- Vacuum truncation could lower the valid size in-memory, end its transaction,
  wait for concurrent accesses to the relation to finish, check if/where to
  the relation has been extended since, acquire the extension lock and
  truncate down to the "valid" size.

  The danger with that is that the necessary waiting can be long, threatening
  to starve autovacuum of workers.

- Instead of making a single vacuum wait, we could have one vacuum update the
  valid size of the relation and also store an xid horizon. Later vacuums can
  truncate the physical size down to the valid size if there are no snapshot
  conflicts with said xid anymore.


If we had such a shared smgrrel, we could also make relation extension a lot
more efficient, because we would not need to pin all pages that a relation
extension "covers" - the reason that we need to pin the to-be-extended-pages
is to prevent concurrent scans from reading "new" blocks while 

Re: pg_upgrade and logical replication

2023-09-21 Thread Michael Paquier
On Thu, Sep 21, 2023 at 02:35:55PM +0530, Amit Kapila wrote:
> It is because after upgrade of both publisher and subscriber, the
> subscriptions won't work. Both publisher and subscriber should work,
> otherwise, the logical replication set up won't work. I think we can
> probably do this, if we can document clearly how the user can make
> their logical replication set up work after upgrade.

Yeah, well, this comes back to my original point that the upgrade of
publisher nodes and subscriber nodes should be treated as two
different problems or we're mixing apples and oranges (and a node
could have both subscriber and publishers).  While being able to
support both is a must, it is going to be a two-step process at the
end, with the subscribers done first and the publishers done after.
That's also kind of the point that Julien makes in top message of this
thread.

I agree that docs are lacking in the proposed patch in terms of
restrictions, assumptions and process flow, but taken in isolation the
problem of the publishers is not something that this patch has to take
care of.  I'd certainly agree that it should mention, at least and if
merged first, to be careful if upgrading the publishers as its slots
are currently removed.
--
Michael


signature.asc
Description: PGP signature


Re: Disabling Heap-Only Tuples

2023-09-21 Thread Andres Freund
Hi,

On 2023-09-19 14:50:13 -0400, Robert Haas wrote:
> On Tue, Sep 19, 2023 at 12:56 PM Andres Freund  wrote:
> > Yea, a setting like what's discussed here seems, uh, not particularly useful
> > for achieving the goal of compacting tables.  I don't think guiding this
> > through SQL makes a lot of sense. For decent compaction you'd want to scan 
> > the
> > table backwards, and move rows from the end to earlier, but stop once
> > everything is filled up. You can somewhat do that from SQL, but it's going 
> > to
> > be awkward and slow.  I doubt you even want to use the normal UPDATE WAL
> > logging.
> >
> > I think having explicit compaction support in VACUUM or somewhere similar
> > would make sense, but I don't think the proposed GUC is a useful stepping
> > stone.
> 
> I think there's a difference between wanting to compact instantly and
> wanting to compact over time. I think that this kind of thing is
> reasonably well-suited to the latter, if we can engineer away the
> cases where it backfires.
> 
> But I know people will try to use it for instant compaction too, and
> there it's worth remembering why we removed old-style VACUUM FULL. The
> main problem is that it was mind-bogglingly slow.

I think some of the slowness was implementation related, rather than
fundamental. But more importantly, storage was something entirely different
back then than it is now.


> The other really bad problem is that it caused massive index bloat. I think
> any system that's based on moving around my tuples right now to make my
> table smaller right now is likely to have similar issues.

I think the problem of exploding WAL usage exists both for compaction being
done in VACUUM (or a dedicated command) and being done by backends. I think to
make using a facility like this realistic, you really need some form of rate
limiting, regardless of when compaction is performed. Even leaving WAL volume
aside, naively doing on-update compaction will cause lots of additional
contention on early FSM pages.


> In the case where you're trying to compact gradually, I think there
> are potentially serious issues with index bloat, but only potentially.
> It seems like there are reasonable cases where it's fine.

> Specifically, if you have relatively few indexes per table, relatively
> few long-running transactions, and all tuples get updated on a
> semi-regular basis, I'm thinking that you're more likely to win than
> lose.

Maybe - but are you going to have a significant bloat issue in that case?
Sure, if the updates update most of the table, you're going to - but then
on-update compaction won't really be needed either, since you're going to run
out of space on pages on a regular basis.

Greetings,

Andres Freund




Re: GenBKI emits useless open;close for catalogs without rows

2023-09-21 Thread Andres Freund
Hi,

On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
> On 18/09/2023 17:50, Matthias van de Meent wrote:
> > (initdb takes about 73ms locally with syncing disabled)
>
> That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
> goes into processing the BKI file. And that's with "initdb -no-sync" option.

I think there must be a digit missing in Matthias' numbers.


> > Various methods of reducing the size of postgres.bki were applied, as
> > detailed in the patch's commit message. I believe the current output
> > is still quite human readable.
>
> Overall this does not seem very worthwhile to me.

Because the wins are too small?

FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
worthwhile to me. But it doesn't seem quite right to handle the batching in
the file format, it should be on the server side, no?

We really should stop emitting WAL during initdb...


> Looking at "perf" profile of initdb, I also noticed that a small but
> measurable amount of time is spent in the "isatty(0)" call in do_end(). Does
> anyone care about doing bootstrap mode interactively? We could probably
> remove that.

Heh, yea, that's pretty pointless.

Greetings,

Andres Freund




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-21 Thread Jeff Davis
On Tue, 2023-09-19 at 20:23 -0700, Maciek Sakrejda wrote:
> On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis  wrote:
> > ...
> > On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > > That leads to a second idea, which is having it continue
> > > to be a GUC but only affect directly-entered SQL, with all
> > > indirectly-entered SQL either being stored as a node tree or
> > > having a
> > > search_path property attached somewhere.
> > 
> > That's not too far from the proposed patch and I'd certainly be
> > interested to hear more and/or adapt my patch towards this idea.
> 
> As an interested bystander, that's the same thing I was thinking when
> reading this. I reread your original e-mail, Jeff, and I still think
> that.

I have attached an updated patch. Changes:

  * Syntax is now: SEARCH FROM { DEFAULT | TRUSTED | SESSION }
- added "FROM" to suggest that it's the source, and only a starting
place, rather than a specific and final setting. I don't feel strongly
about the FROM one way or another, so I can take it out if it's not
helpful.
- changed "SYSTEM" to "TRUSTED", which better reflects the purpose,
and doesn't suggest any connection to ALTER SYSTEM.
  * Removed GUC -- we can reconsider this kind of thing later.
  * ERROR if IMMUTABLE is combined with SEARCH FROM SESSION
  * pg_dump support. Emits "SEARCH FROM SESSION" or "SEARCH FROM
TRUSTED" only if explicitly specified; otherwise emits no SEARCH
clause. Differentiating the unspecified cases may be useful for
migration purposes later.
  * psql support.
  * Updated docs to try to better present the concept, and document
CREATE PROCEDURE as well.


The SEARCH clause declares a new property that will be useful to both
enforce safety and also to guide users to migrate in a safe direction
over time.

For instance, the current patch prohibits the combination of IMMUTABLE
and SEARCH FROM SESSION; but allows IMMUTABLE if no SEARCH clause is
specified at all (to avoid breaking anything). We could extend that
slowly over several releases ratchet up the pressure (with warnings or
changing defaults) until all IMMUTABLE functions require SEARCH FROM
TRUSTED. Perhaps IMMUTABLE would even imply SEARCH FROM TRUSTED.

The search property is consistent with other properties, like
IMMUTABLE, which is both a marker and also enforces some restrictions
(e.g. you can't CREATE TABLE). It's also a lot nicer to use than a SET
clause, and provides a nice place to document certain behaviors.

(Aside: the concept of IMMUTABLE is basically broken today, due to
search_path problems.)

SEARCH FROM DEFAULT is just a way to get an object back to the
"unspecified search clause" state. It has the same behavior as SEARCH
FROM SESSION, except that the former will cause a hard error when
combined with IMMUTABLE. I think it's worth differentiating the
unspecified search clause from the explicit SEARCH FROM SESSION clause
for the purposes of migration.

There were three main complaints:

Complaint A: That it creates a new mechanism[1].

The patch doesn't create a new internal mechanism, it almost entirely
reuses the existing SET clause mechanism. I think complaint A is really
about the user-facing mechanics, which is essentially the same as the
complaint B.

Complaint B: That it's overlapping in functionality with the SET
clause[2][3]. In other words:

   CREATE FUNCTION ... SEARCH FROM TRUSTED ...;
   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...;

do similar things. But the latter is much worse:

   * it's user-unfriendly (requiring pg_temp is highly unintuitive)
   * it doesn't allow Postgres to warn if the function is being used in
an unsafe context
   * there's no way to explicitly declare that you want the search path
to come from the session instead (either to be more clear about your
intentions, or to be forward-compatible)

In my opinion, the "SET search_path = ..." form should be used when you
actually want the search_path to contain some specific schema, not in
cases where you're just using built-in objects.

Complaint C: search_path is hopeless[4].

I think we can make meaningful improvements to the status quo, like
with the attached patch, that will reduce a lot of the surface area for
security risks. Right now our privilege model breaks down very quickly
even with trivial and typical use cases and we can do better.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaRPJJN%3DAOqC4b9t90vFQX81hKiXNPNhbxR0-Sm8F8nCA%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CA%2BTgmoah_bTjUFng-vZnivPQs0kQWUaSwAu49SU5M%2BzTxA%2B3Qw%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/15464811-18fb-c7d4-4620-873366d367d6%40eisentraut.org
[4]
https://www.postgresql.org/message-id/20230812182559.d7plqwx3p65ys4i7%40awork3.anarazel.de


From 1cdb1e604535e7e0e1c3a1d8c9d38d99c9d42ba3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Fri, 11 Aug 2023 14:33:36 -0700
Subject: [PATCH v2] CREATE FUNCTION ... 

Re: Index range search optimization

2023-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov  wrote:
> I looked at the patch code and I agree with this optimization.
> Implementation also looks good to me except change :
> + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> + !(key->sk_flags & SK_ROW_HEADER))
> + requiredDir = true;
> ...
> - if ((key->sk_flags & SK_BT_REQFWD) &&
> - ScanDirectionIsForward(dir))
> - *continuescan = false;
> - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> - ScanDirectionIsBackward(dir))
> + if (requiredDir)
>   *continuescan = false;
>
> looks like changing behavior in the case when key->sk_flags &
> SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> (!requiredDirMatched)
> Originally it doesn't set *continuescan = false; and with the patch it will 
> set.

I agree that this is a problem. Inequality strategy scan keys are used
when the initial positioning strategy used by _bt_first (for its
_bt_search call) is based on an operator other than the "=" operator
for the opclass. These scan keys are required in one direction only
(Konstantin's original patch just focussed on these cases, actually).
Obviously, that difference matters. I don't think that this patch
should do anything that even looks like it might be revising the
formal definition of "required in the current scan direction".

Why is SK_ROW_HEADER treated as a special case by the patch? Could it
be related to the issues with required-ness and scan direction? Note
that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row
comparisons, so they're only ever required for one scan direction.
(Equality-type row constructor syntax can of course be used without
preventing the system from using an index scan, but the nbtree code
will not see that case as a row comparison in the first place. This is
due to preprocessing by the planner -- nbtree just sees conventional
scan keys with multiple simple equality scan keys with = row
comparisons.)

Also, what about NULLs? While "key IS NULL" is classified as an
equality check (see _bt_preprocess_keys comments), the same isn't true
with "key IS NOT NULL". The latter case usually has scan key flags
"SK_ISNULL | SK_SEARCHNOTNULL | SK_BT_REQFWD" -- there is no
SK_BT_REQBKWD here.

> This may be relevant for the first page when requiredDirMatched is
> intentionally skipped to be set and for call
> _bt_checkkeys(scan, itup, truncatt, dir, , false);

Also, requiredDirMatched isn't initialized by _bt_readpage() when
"so->firstPage". Shouldn't it be initialized to false?

Also, don't we need to take more care with a fully empty page? The "if
(!so->firstPage) ... " block should be gated using a condition such as
"if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff"
test would also work, but then the optimization will get applied on
pages with only one non-pivot tuple. That would be harmless, but a
waste of cycles.)

> Also naming of requiredDirMatched and requiredDir seems semantically
> hard to understand the meaning without looking at the patch commit
> message. But I don't have better proposals yet, so maybe it's
> acceptable.

I agree. How about "requiredMatchedByPrecheck" instead of
"requiredDirMatched", and "required" instead of "requiredDir"?

It would be nice if this patch worked in a way that could be verified
by an assertion. Under this scheme, the optimization would only really
be used in release builds (builds without assertions enabled, really).
We'd only verify that the optimized case agreed with the slow path in
assert-enabled builds. It might also make sense to always "apply the
optimization" on assert-enabled builds, even for the first page seen
by _bt_readpage by any _bt_first-wise scan. Maybe this sort of
approach is impractical here for some reason, but I don't see why it
should be.

Obviously, the optimization should lower the amount of work in some
calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys
gives. Ideally, it should be done in a way that makes that very
obvious. There are some very subtle interactions between _bt_checkkeys
and other, distant code -- which makes me feel paranoid. Notably, with
required equality strategy scan keys, we're crucially dependent on
_bt_first using an equality strategy for its initial positioning call
to _bt_search. This is described by comments in both _bt_checkkeys and
in _bt_first.

Note, in particular, that it is essential that the initial offnum
passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take
place that could cause it to become confused by a required equality
strategy scan key, leading to _bt_checkkeys terminating the whole scan
"early" -- causing wrong answers. For a query "WHERE foo = 5" (and a
forward scan), we had better not pass _bt_readpage an offset number
for a tuple with "foo" value 4. If that is ever allowed then
_bt_checkkeys will terminate the scan immediately, leading to wrong
answers. All because _bt_checkkeys can't tell if 4 comes before 5 or
comes after 5 -- it only has an "=" operator 

Re: Eliminate redundant tuple visibility check in vacuum

2023-09-21 Thread Robert Haas
On Thu, Sep 21, 2023 at 3:53 PM Robert Haas  wrote:
> > >  static int   heap_prune_chain(Buffer buffer,
> > >OffsetNumber 
> > > rootoffnum,
> > > +  int8 *htsv,
> > >PruneState 
> > > *prstate);
> >
> > Hm, do we really want to pass this explicitly to a bunch of functions? Seems
> > like it might be better to either pass the PruneResult around or to have a
> > pointer in PruneState?
>
> > As far as I can see, 0002 adds it to one function (heap_page_prune) and
> 0003 adds it to one more (heap_prune_chain). That's not much of a
> bunch.

I didn't read this carefully enough. Actually, heap_prune_chain() is
the *only* function that gets int8 *htsv as an argument. I don't
understand how that's a bunch ... unless there are later patches not
shown here that you're worried about. What happens in 0002 is a
function getting PruneResult * as an argument, not int8 *htsv.

Honestly I think 0002 and 0003 are ready to commit, if you're not too
opposed to them, or if we can find some relatively small changes that
would address your objections.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eliminate redundant tuple visibility check in vacuum

2023-09-21 Thread Robert Haas
[ sorry for the delay getting back to this ]

On Wed, Sep 13, 2023 at 1:29 PM Andres Freund  wrote:
> I think it might be worth making the names a bit less ambiguous than they
> were. It's a bit odd that one has "new" in the name, the other doesn't,
> despite both being about newly marked things. And "deleted" seems somewhat
> ambiguous, it could also be understood as marking things LP_DEAD. Maybe
> nnewunused?

I like it the better the way Melanie did it. The current name may not
be for the best, but that can be changed some other time, in a
separate patch, if someone likes. For now, changing the name seems
like a can of worms we don't need to open; the existing names have
precedent on their side if nothing else.

> >  static int   heap_prune_chain(Buffer buffer,
> >OffsetNumber 
> > rootoffnum,
> > +  int8 *htsv,
> >PruneState *prstate);
>
> Hm, do we really want to pass this explicitly to a bunch of functions? Seems
> like it might be better to either pass the PruneResult around or to have a
> pointer in PruneState?

As far as I can see, 0002 adds it to one function (heap_page_prune) and
0003 adds it to one more (heap_prune_chain). That's not much of a
bunch.

> >   /*
> >* The criteria for counting a tuple as live in this block 
> > need to
> > @@ -1682,7 +1664,7 @@ retry:
> >* (Cases where we bypass index vacuuming will violate this 
> > optimistic
> >* assumption, but the overall impact of that should be 
> > negligible.)
> >*/
> > - switch (res)
> > + switch ((HTSV_Result) presult.htsv[offnum])
> >   {
> >   case HEAPTUPLE_LIVE:
>
> I think we should assert that we have a valid HTSV_Result here, i.e. not
> -1. You could wrap the cast and Assert into an inline function as well.

This isn't a bad idea, although I don't find it completely necessary either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: how to do profile for pg?

2023-09-21 Thread David Geier

Hi,

This talk from Andres seems to have some relevant information for you:

https://www.youtube.com/watch?v=HghP4D72Noc

--
David Geier
(ServiceNow)





Re: Questions about the new subscription parameter: password_required

2023-09-21 Thread Robert Haas
On Thu, Sep 21, 2023 at 8:03 AM Benoit Lobréau
 wrote:
> I am confused about the new subscription parameter: password_required.
>
> I have two instances. The publisher's pg_hba is configured to allow
> connections without authentication. On the subscriber, I have an
> unprivileged user with pg_create_subscription and CREATE on the database.
>
> I tried using a superuser to create a subsciption without setting the
> password_required parameter (the default is true). Then I changed the
> owner to the unprivileged user.
>
> This user can use the subscription without limitation (including ALTER
> SUBSCRIPTION ENABLE / DISABLE). The \dRs+ metacommand shows that a
> password is required, which is not the case (or it is but it's not
> enforced).
>
> Is this normal? I was expecting the ALTER SUBSCRIPTION .. OWNER to fail.

Which one? I see 2 ALTER SUBSCRIPTION ... OWNER commands in
password_required.log and 1 more in password_required2.log, but
they're all performed by the superuser, who is entitled to do anything
they want.

The intention here is that most subscriptions will have
passwordrequired=true. If such a subscription is owned by a superuser,
the superuser can still use them however they like. If owned by a
non-superuser, they can use them however they like *provided* that the
password must be used to authenticate. If the superuser wants a
non-superuser to be able to own a subscription that doesn't use a
password, the superuser can set that up by configuring
passwordrequired=false. But then that non-superuser is not allowed to
further manipulate that subscription.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-21 Thread Robert Haas
On Tue, Sep 19, 2023 at 7:56 PM Jeff Davis  wrote:
> Good documentation includes some guidance. Sure, it should describe the
> system behavior, but without anchoring it to some kind of expected use
> case, it can be equally frustrating.

Fair.

> I don't mean to set some major new standard in the documentation that
> should apply to everything; but for the privilege system, even hackers
> are having trouble keeping up (myself included). A bit of guidance
> toward supported use cases helps a lot.

Yeah, this stuff is complicated, and I agree that it's hard even for
hackers to keep up with. I don't really have a strong view on the
concrete case you mentioned involving password_required. I always
worry that if there are three cases and we suggest one of them then
the others will be viewed negatively when really they're all equally
fine. On the other hand, that can often be addressed by starting the
sentence with "For example, you could" or similar, so perhaps
there's no problem here at all. I generally agree with the idea that
examples can be useful for clarifying points that may otherwise be too
theoretical.

> I hope what I'm saying is not useless vitriol. I am offering the best
> solutions I see in a bad situation. And I believe I've uncovered some
> emergent behaviors that are not well-understood even among prominent
> hackers.

Yeah, I wasn't really intending to say that you were. I just get
nervous about statements like "don't ever do X!" because I find it
completely unconvincing. In my experience, when you tell people stuff
like that, some of them just go off and do it anyway and, especially
in a case like this, chances are very good that nothing bad will ever
happen to them, simply because most PostgreSQL installations don't
have malicious local users. When you tell them "but that's really bad"
they say "why?" and if the documentation doesn't have an answer to the
question well then that sucks.

> >  because it would break a lot of stuff for a lot
> > of people, but if we could get that then I think we could maybe
> > emerge
> > in a better spot once the pain of the compatibility break receded.
>
> Are there ways we can soften this a bit? I know compatibility GUCs are
> not to be added lightly, but perhaps one is justified here?

I don't know. I'm skeptical. This behavior is so complicated and hard
to get right. Having it GUC-dependent makes it even more confusing
than it is already. But I guess it also depends on what the GUC does.

Let's say we make a rule that every function or procedure has to have
a search_path attached to it as a function property. That is, CREATE
FUNCTION .. SEARCH something sets pg_proc.prosearch = 'something'. If
you omit the SEARCH clause, one is implicitly supplied for you. If you
say SEARCH NULL, then the function is executed with the search_path
taken from the GUC; SEARCH 'anything_else' specified a literal
search_path to be used.

In such a world, I can imagine having a GUC that determines whether
the implicitly supplied SEARCH clause is SEARCH
${WHATEVER_THE_SEARCH_PATH_IS_RIGHT_NOW} or SEARCH NULL. Such a GUC
only takes effect at CREATE FUNCTION time. However, I cannot imagine
having a GUC that causes the SEARCH clauses attached to all functions
to be globally ignored at execution time. That seems like a choice we
would likely regret bitterly. The first thing is already painful, but
the second one is exponentially worse, because in the first world, you
have to be careful to get your functions defined correctly, but if you
do, you know they'll run OK on any PostgreSQL cluster anywhere,
whereas in the second world, there's no way to define a function that
behaves the same way on every PostgreSQL instance. Imagine being an
extension author, for example.

I am a little worried that this kind of design might end up reversing
the direction of some security problems that we have now. For
instance, right now, if you call a function with a SET search_path
clause, you know that it can't make any changes to search_path that
survive function exit. You'll get your old search_path back. With this
kind of design, it seems like it would be a lot easier to get back to
the SQL toplevel and find the search_path surprisingly changed under
you. I think we have that problem in some cases already, though. I'm
unclear how much worse this makes it.

> > Another option is something around sandboxing and/or function trust.
> > The idea here is to not care too much about the search_path behavior
> > itself, and instead focus on the consequences, namely what code is
> > getting executed as which user and perhaps what kinds of operations
> > it's performing.
>
> I'm open to discussing that further, and it certainly may solve some
> problems, but it does not seem to solve the fundamental problem with
> search_path: that the caller can (intentionally or unintentionally)
> cause a function to do unexpected things.

Well, I think it's meant to solve that problem.  How effectively it
does so is a 

Re: how to do profile for pg?

2023-09-21 Thread Alvaro Herrera
This talk by D. Dolgov 
https://www.postgresql.eu/events/pgconfeu2022/sessions/session/3861/slides/325/Dmitrii_Dolgov_PGConf_EU_2022.pdf
might be insightful.  Or not, because you need to fill in a lot of
blanks.  Maybe you can find a recording of that talk somewhere, if
you're lucky.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)




Re: how to do profile for pg?

2023-09-21 Thread Aleksander Alekseev
Hi,

> but I need a quick demo to see memory or CPU profiling. A blog post or a
> video would be better for me. Thanks

Well, then I guess you better hurry with reading these books :)

There is no shortcut I'm afraid. One of the first things that Brendan
explains is how to do benchmarks *properly*. This is far from trivial,
and you may often end up measuring something other than what you
intended. E.g. you may think that you are profiling CPU while in fact
there is lock contention and CPU is not even the bottleneck. Another
thing worth considering, which is often neglected, is to make sure your
optimization doesn't cause any degradations under different workloads.

Last but not least you should be mindful of different configuration
parameters of PostgreSQL - shared_buffers, synchronous_commit = off,
to name a few, and also understand the architecture of the system
quite well. In this context I recommend Database System Concepts, 7th
Edition by Avi Silberschatz et al and also CMU Intro to Database
Systems [1] and CMU Advanced Database Systems [2] courses.

[1]: https://www.youtube.com/playlist?list=PLSE8ODhjZXjZaHA6QcxDfJ0SIWBzQFKEG
[2]: https://www.youtube.com/playlist?list=PLSE8ODhjZXjYzlLMbX3cR0sxWnRM7CLFn

-- 
Best regards,
Aleksander Alekseev




Re: New WAL record to detect the checkpoint redo location

2023-09-21 Thread Robert Haas
On Thu, Sep 21, 2023 at 4:22 AM Amit Kapila  wrote:
> After the 0003 patch, do we need acquire exclusive lock via
> WALInsertLockAcquireExclusive() for non-shutdown checkpoints. Even the
> comment "We must block concurrent insertions while examining insert
> state to determine the checkpoint REDO pointer." seems to indicate
> that it is not required. If it is required then we may want to change
> the comments and also acquiring the locks twice will have more cost
> than acquiring it once and write the new WAL record under that lock.

I think the comment needs updating. I don't think we can do curInsert
= XLogBytePosToRecPtr(Insert->CurrBytePos) without taking the locks.
Same for Insert->fullPageWrites.

I agree that it looks a little wasteful to release the lock and then
reacquire it, but I suppose checkpoints don't happen often enough for
it to matter. You're not going to notice an extra set of insertion
lock acquisitions once every 5 minutes, or every half hour, or even
every 1 minute if your checkpoints are super-frequent.

Also notice that the current code is also quite inefficient in this
way. GetLastImportantRecPtr() acquires and releases each lock one at a
time, and then we immediately turn around and do
WALInsertLockAcquireExclusive(). If the overhead that you're concerned
about here were big enough to matter, we could reclaim what we're
losing by having a version of GetLastImportantRecPtr() that expects to
be called with all locks already held. But when I asked Andres, he
thought that it didn't matter, and I bet he's right.

> One minor comment:
> + else if (XLOG_CHECKPOINT_REDO)
> + class = WALINSERT_SPECIAL_CHECKPOINT;
> + }
>
> Isn't the check needs to compare the record type with info?

Yeah wow. That's a big mistake.

> Your v6-0001* patch looks like an improvement to me even without the
> other two patches.

Good to know, thanks.

> BTW, I would like to mention that there is a slight interaction of
> this work with the patch to upgrade/migrate slots [1]. Basically in
> [1], to allow slots migration from lower to higher version, we need to
> ensure that all the WAL has been consumed by the slots before clean
> shutdown. However, during upgrade we can generate few records like
> checkpoint which we will ignore for the slot consistency checking as
> such records doesn't matter for data consistency after upgrade. We
> probably need to add this record to that list. I'll keep an eye on
> both the patches so that we don't miss that interaction but mentioned
> it here to make others also aware of the same.

If your approach requires a code change every time someone adds a new
WAL record that doesn't modify table data, you might want to rethink
the approach a bit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: how to do profile for pg?

2023-09-21 Thread jack...@gmail.com
but I need a quick demo to see memory or CPU profiling. A blog post or a
video would be better for me. Thanks.










Re: how to do profile for pg?

2023-09-21 Thread Aleksander Alekseev
Hi jacktby,

PostgreSQL is literally a large and complicated program in C. Thus it
can be profiled as such. E.g. you can use `perf` and build flamegraphs
using `perf record`. Often pgbench is an adequate tool to compare
before and after results. There are many other tools available
depending on what exactly you want to profile - CPU, lock contention,
disk I/O, etc. People write books (plural) on the subject. Personally
I would recommend "System Performance, Enterprise and the Cloud, 2nd
Edition" and "BPF Performance Tools" by Brendan Gregg.

-- 
Best regards,
Aleksander Alekseev




Re: Infinite Interval

2023-09-21 Thread Ashutosh Bapat
On Wed, Sep 20, 2023 at 8:23 PM Dean Rasheed  wrote:
>
> On Wed, 20 Sept 2023 at 15:00, Ashutosh Bapat
>  wrote:
> >
> > 0005 - Refactored Jian's code fixing window functions. Does not
> > contain the changes for serialization and deserialization. Jian,
> > please let me know if I have missed anything else.
> >
>
> That looks a lot neater. One thing I don't care for is this code
> pattern in finite_interval_pl():
>
> +result->month = span1->month + span2->month;
> +/* overflow check copied from int4pl */
> +if (SAMESIGN(span1->month, span2->month) &&
> +!SAMESIGN(result->month, span1->month))
> +ereport(ERROR,
>
> The problem is that this is a bug waiting to happen for anyone who
> uses this function with "result" pointing to the same Interval struct
> as "span1" or "span2". I understand that the current code avoids this
> by careful use of temporary Interval structs, but it's still a pretty
> ugly pattern. This can be avoided by using pg_add_s32/64_overflow(),
> which then allows the callers to be simplified, getting rid of the
> temporary Interval structs and memcpy()'s.

That's a good idea. Done.

>
> Also, in do_interval_discard(), this seems a bit risky:
>
> +neg_val.day = -newval->day;
> +neg_val.month = -newval->month;
> +neg_val.time = -newval->time;
>
> because it could in theory turn a finite large negative interval into
> an infinite one (-INT_MAX -> INT_MAX), leading to an assertion failure
> in finite_interval_pl(). Now maybe that's not possible for some other
> reasons, but I think we may as well do the same refactoring for
> interval_mi() as we're doing for interval_pl() -- i.e., introduce a
> finite_interval_mi() function, making the addition and subtraction
> code match, and removing the need for neg_val in
> do_interval_discard().

Your suspicion is correct. It did throw an error. Added tests for the
same. Introduced finite_interval_mi() which uses
pg_sub_s32/s64_overflow() functions.

I will send updated patches with my next reply.

--
Best Wishes,
Ashutosh Bapat




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Thank you for reviewing! Before addressing them, I would like to reply some 
comments.

> 6.
> +if (PQntuples(res) != 1)
> +pg_fatal("could not count the number of logical replication slots");
> +
> 
> Is not having a single logical replication slot an error? I think it
> must be if (PQntuples(res) == 0) return;?
>

The query executes "SELECT count(*)...", which IIUC returns exactly 1 row.

> 7. A nit:
> +nslots_on_new = atoi(PQgetvalue(res, 0, 0));
> +
> +if (nslots_on_new)
> 
> Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

Note that the value is used by the upcoming pg_fatal(). I prefer the current
style because repeating atoi(PQgetvalue(res, 0, 0)) would not be so beautiful.

> 
> 11.
> +# 2. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> +$old_publisher->safe_psql('postgres',
> +"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
> +);
> +$old_publisher->stop;
> 
> This might be a recipe for sporadic test failures - how is it
> guaranteed that the newly generated WAL records aren't consumed?

You mentioned line 118, but at that point the logical replication system has
not been created yet; the subscriber is created at line 163.
Therefore the WAL records would not be consumed automatically.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


how to do profile for pg?

2023-09-21 Thread jack...@gmail.com


Re: Allow specifying a dbname in pg_basebackup connection string

2023-09-21 Thread Daniel Gustafsson
> On 18 Sep 2023, at 14:11, Daniel Gustafsson  wrote:

> Unless something sticks out in a second pass over it I will go ahead and apply
> it.

And applied, closing the CF entry.

--
Daniel Gustafsson





Re: Add 'worker_type' to pg_stat_subscription

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 5:36 AM Peter Smith  wrote:
>
> On Thu, Sep 21, 2023 at 9:34 AM Nathan Bossart  
> wrote:
> >
> > On Thu, Sep 21, 2023 at 08:14:23AM +0900, Michael Paquier wrote:
> > > On Thu, Sep 21, 2023 at 09:01:01AM +1000, Peter Smith wrote:
> > >> One question -- the patch comment still says "Bumps catversion.", but
> > >> catversion.h change is missing from the v9 patch?
> > >
> > > Yeah, previous patches did that, but it is no big deal.  My take is
> > > that it is a good practice to never do a catversion bump in posted
> > > patches, and that it is equally a good practice from Nathan to be
> > > reminded about that with the addition of a note in the commit message
> > > of the patch posted.
> >
> > Right, I'll take care of it before committing.  I'm trying to make sure I
> > don't forget.  :)
>
> OK, all good.
>
> ~~~
>
> This is a bit of a late entry, but looking at the PG DOCS, I felt it
> might be simpler if we don't always refer to every other worker type
> when explaining NULLs. The descriptions are already bigger than they
> need to be, and if more types ever get added they will keep growing.
>
> ~
>
> BEFORE
> leader_pid integer
> Process ID of the leader apply worker if this process is a parallel
> apply worker; NULL if this process is a leader apply worker or a table
> synchronization worker
>
> SUGGESTION
> leader_pid integer
> Process ID of the leader apply worker; NULL if this process is not a
> parallel apply worker
>
> ~
>
> BEFORE
> relid oid
> OID of the relation that the worker is synchronizing; NULL for the
> leader apply worker and parallel apply workers
>
> SUGGESTION
> relid oid
> OID of the relation being synchronized; NULL if this process is not a
> table synchronization worker
>

I find the current descriptions better than the proposed. But I am not
opposed to your proposal if others are okay with it. Personally, I
feel even if we want to change these descriptions, we can do it as a
separate patch.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 4:57 PM Bharath Rupireddy
 wrote:
>
> On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Good catch, I could not notice because it worked well in my RHEL. Here is 
> > the
> > updated version.
>
> Thanks for the patch. I have some comments on v42:
>
> 1.
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
> +  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>
> +if (PG_ARGISNULL(0))
> +elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> Can proisstrict => 'f' be removed so that there's no need for explicit
> PG_ARGISNULL check? Any specific reason to keep it?
>

Probably trying to keep it similar with
binary_upgrade_create_empty_extension(). I think it depends what
behaviour we expect for NULL input.

> And, the arg is read before the ISNULL check, which isn't good.
>

Right.

> 2.
> +Datum
> +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
>
> The function name looks too generic in the sense that it validates WAL
> records for correctness/corruption, but it is not. Can it be something
> like binary_upgrade_{check_for_wal_logical_end,
> check_for_logical_end_of_wal} or such?
>

How about slightly modified version like
binary_upgrade_validate_wal_logical_end?

> 3.
> +/* Quick exit if the given lsn is larger than current one */
> +if (start_lsn >= GetFlushRecPtr(NULL))
> +PG_RETURN_BOOL(false);
> +
>
> An LSN that doesn't exist yet is an error IMO; maybe an error is better here?
>

It will anyway lead to error at later point but we will provide more
information about all the slots that have invalid value of
confirmed_flush LSN.

> 4.
> + * This function is used to verify that there are no WAL records (except some
> + * types) after confirmed_flush_lsn of logical slots, which means all the
> + * changes were replicated to the subscriber. There is a possibility that 
> some
> + * WALs are inserted during upgrade, so such types would be ignored.
> + *
>
> This comment before the function better be at the callsite of the
> function, because as far as this function is concerned, it checks if
> there are any WAL records that are not "certain" types after the given
> LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
>

Yeah, we should give information at the callsite but I guess we need
to give some context atop this function as well so that it is easier
to explain the functionality.

> 5. Trying to understand the interaction of this feature with custom
> WAL records that a custom WAL resource manager puts in. Is it okay to
> have custom WAL records after the "logical WAL end"?
> +/*
> + * There is a possibility that following records may be generated
> + * during the upgrade.
> + */
>

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

>
> 10.
> Why just wal_level and max_replication_slots, why not
> max_worker_processes and max_wal_senders too?

Isn't it sufficient to check the parameters that are required to
create a slot aka what we check in the function
CheckLogicalDecodingRequirements()? We are only creating logical slots
here so I think that should be sufficient.

-- 
With Regards,
Amit Kapila.




Re: Index range search optimization

2023-09-21 Thread Pavel Borisov
On Thu, 21 Sept 2023 at 15:17, Alexander Korotkov  wrote:
>
> On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov  
> wrote:
> > On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan  wrote:
> >  This also makes sense. I've rephrased the comment.
>
> The revised patch is attached.  It contains better comments and the
> commit message.  Peter, could you please check if you're OK with this?
Hi, Alexander!

I looked at the patch code and I agree with this optimization.
Implementation also looks good to me except change :
+ if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
+ !(key->sk_flags & SK_ROW_HEADER))
+ requiredDir = true;
...
- if ((key->sk_flags & SK_BT_REQFWD) &&
- ScanDirectionIsForward(dir))
- *continuescan = false;
- else if ((key->sk_flags & SK_BT_REQBKWD) &&
- ScanDirectionIsBackward(dir))
+ if (requiredDir)
  *continuescan = false;

This looks like it changes behavior in the case where key->sk_flags &
SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
(!requiredDirMatched).
Originally it didn't set *continuescan = false, but with the patch it will.

This may be relevant for the first page, where requiredDirMatched is
intentionally left unset, and for the call
_bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);

Maybe I missed something and this can not appear for some reason?

Also, the names requiredDirMatched and requiredDir seem semantically
hard to understand without looking at the patch commit message. But I
don't have better proposals yet, so maybe it's acceptable.

Kind regards,
Pavel Borisov
Supabase.




Re: pg_stat_statements and "IN" conditions

2023-09-21 Thread Jakub Wartak
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I've tested the patch on 17devel/master and it is my feeling - especially
given the proliferation of ORMs - that we need such a thing in pgss. The thread
has already taken almost 3 years, so it would be a pity to waste so much of
your development time. Cfbot is green, and the patch works very well for me.
IMVHO the commitfest status should even be set to ready-for-committer.

Given the:
SET query_id_const_merge = on;
SELECT pg_stat_statements_reset();
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11);
SELECT * FROM test WHERE a IN (1, 2, 3);
SELECT * FROM test WHERE a = ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12}');
SELECT * FROM test WHERE a = ANY (ARRAY[11,10,9,8,7,6,5,4,3,2,1]);

The patch results in:
  q  | calls
-+---
 SELECT * FROM test WHERE a = ALL($1)| 1
 SELECT pg_stat_statements_reset()   | 1
 SELECT * FROM test WHERE a IN ($1, $2, $3)  | 1
 SELECT * FROM test WHERE a IN (... [10-99 entries]) | 2

Of course it's a pity it doesn't collapse the ones below:

SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11)) 
AS t (num);
INSERT INTO dummy VALUES(1, 'text 1'),(2, 'text 2'),(3, 'text 3'),(4, 'text 
3'),(5, 'text 3'),(6, 'text 3'),(7, 'text 3'),(8, 'text 3'),(9, 'text 3'),(10, 
'text 3') ON CONFLICT (id) DO NOTHING;
PREPARE s3(int[], int[], int[], int[], int[], int[], int[], int[], int[], 
int[], int[]) AS SELECT * FROM test WHERE 
a = ANY ($1::int[]) OR 
a = ANY ($2::int[]) OR
[..]
a = ANY ($11::int[]) ;

but given the convoluted thread history, it's understandable and, as you
stated, maybe in the future.

There's one additional benefit to this patch: the pg_hint_plan extension seems
to borrow pgss's generate_normalized_query(). So if that changes in the next
major release, the pg_hint_plan hint table (transparent plan rewrite using a
table) will automatically benefit from the generalization of the query string
here (imagine fixing plans for an ORM that generates anywhere from 1 to 1024
IN() array elements; today that would be N entries in the "hint_plan.hints"
table).

The new status of this patch is: Needs review


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Bharath Rupireddy
On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

Thanks for the patch. I have some comments on v42:

1.
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',

+if (PG_ARGISNULL(0))
+elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Can proisstrict => 'f' be removed so that there's no need for explicit
PG_ARGISNULL check? Any specific reason to keep it?

And, the arg is read before the ISNULL check, which isn't good.

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

3.
+/* Quick exit if the given lsn is larger than current one */
+if (start_lsn >= GetFlushRecPtr(NULL))
+PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO; maybe an error is better here?

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+/*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */

6.
+if (PQntuples(res) != 1)
+pg_fatal("could not count the number of logical replication slots");
+

Is not having a single logical replication slot an error? I think it
must be if (PQntuples(res) == 0) return;?

7. A nit:
+nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+if (nslots_on_new)

Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

8.
+if (nslots_on_new)
+pg_fatal("expected 0 logical replication slots but found %d",
+ nslots_on_new);

How about "New cluster database contains logical replication
slots"? Note that some of the fatal messages start with an
upper-case letter.

9.
+res = executeQueryOrDie(conn, "SHOW wal_level;");
+res = executeQueryOrDie(conn, "SHOW max_replication_slots;");

Instead of 2 queries to determine required parameters, isn't it better
with a single query like the following?

select setting from pg_settings where name in ('wal_level',
'max_replication_slots') order by name;

10.
Why just wal_level and max_replication_slots, why not
max_worker_processes and max_wal_senders too? I'm looking at
RecoveryRequiresIntParameter and if they are different on the upgraded
instance, chances that the logical replication won't work, no?

11.
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;

This might be a recipe for sporadic test failures - how is it
guaranteed that the newly generated WAL records aren't consumed?

May be stop subscriber or temporarily disable the subscription and
then generate WAL records?

12.
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+

Why not these functions be defined in xlogreader.h with elog/ereport
in #ifndef FRONTEND #endif blocks? IMO, xlogreader.h seems right
location for these functions.

13.
+LogicalReplicationSlotInfo

Where is this structure defined?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Comment about set_join_pathlist_hook()

2023-09-21 Thread Etsuro Fujita
Hi,

On Wed, Sep 20, 2023 at 7:49 PM David Rowley  wrote:
> On Wed, 20 Sept 2023 at 22:06, Etsuro Fujita  wrote:
> > So I would like to propose to extend the comment to explain what they
> > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c.
> > Attached is a patch for that.
>
> Looks good to me.
>
> I see you've copy/edited the comment just above the call to
> set_rel_pathlist_hook(). That makes sense.

Cool!  Pushed.

Thanks for taking a look!

Best regards,
Etsuro Fujita




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Hayato Kuroda (Fujitsu)
Dear Hackers,

> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

I made some cosmetic changes to the patch; the functionality was not changed.
E.g., a macro function was replaced with an inline function.

Note that cfbot got angry at the old patch, but it seemed to be an
infrastructure-side error. Let's see again.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v43-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patch
Description:  v43-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patch


Re: Add 'worker_type' to pg_stat_subscription

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 1:00 AM Nathan Bossart  wrote:
>
> On Tue, Sep 19, 2023 at 08:36:35AM +0530, Amit Kapila wrote:
> > I am of the opinion that worker_type should be 'apply' instead of
> > 'leader apply' because even when it is a leader for parallel apply
> > worker, it could perform individual transactions apply. For reference,
> > I checked pg_stat_activity.backend_type, there is nothing called main
> > or leader backend even when the backend is involved in parallel query.
>
> Okay.  Here is v9 of the patch with this change.
>

The changes look good to me, though I haven't tested them. But feel
free to commit if you are fine with this patch.

-- 
With Regards,
Amit Kapila.




Questions about the new subscription parameter: password_required

2023-09-21 Thread Benoit Lobréau

Hi,

I am confused about the new subscription parameter: password_required.

I have two instances. The publisher's pg_hba is configured to allow 
connections without authentication. On the subscriber, I have an 
unprivileged user with pg_create_subscription and CREATE on the database.


I tried using a superuser to create a subscription without setting the 
password_required parameter (the default is true). Then I changed the 
owner to the unprivileged user.


This user can use the subscription without limitation (including ALTER 
SUBSCRIPTION ENABLE / DISABLE). The \dRs+ metacommand shows that a 
password is required, which is not the case (or it is, but it's not 
enforced).


Is this normal? I was expecting the ALTER SUBSCRIPTION .. OWNER to fail.

When I try to drop the subscription with the unprivileged user or a 
superuser, I get an error:


ERROR:  password is required
DETAIL:  Non-superuser cannot connect if the server does not request a 
password.
HINT:  Target server's authentication method must be changed, or set 
password_required=false in the subscription parameters.


I have to change the subscription owner back to the superuser to be able 
to drop it.


(See password_required.sql and password_required.log)

I tried the same setup and changed the connection string to add an 
application_name with the unprivileged user. In this case, I am reminded 
that I need a password. I tried modifying password_required to false 
with the superuser and modifying the connection string with the 
unprivileged user again. It fails with:


HINT:  Subscriptions with the password_required option set to false may 
only be created or modified by the superuser.


I think that this part works as intended.

I tried dropping the subscription with the unprivileged user: it works. 
Is that normal (given the previous message)?


(see password_required2.sql and password_required2.log)

--
Benoit Lobréau
Consultant
http://dalibo.com

--
\c tests_pg16 postgres
You are now connected to database "tests_pg16" as user "postgres".
--
SELECT version();
 version  
--
 PostgreSQL 16.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1), 64-bit
(1 row)

--
CREATE SUBSCRIPTION sub_pg16   
   CONNECTION 'host=/var/run/postgresql port=5437 user=user_pub_pg16 dbname=tests_pg16'
   PUBLICATION pub_pg16;
psql:/home/benoit/tmp/password_required.sql:8: NOTICE:  created replication slot "sub_pg16" on publisher
CREATE SUBSCRIPTION
--
ALTER SUBSCRIPTION sub_pg16 OWNER TO sub_owner ;
ALTER SUBSCRIPTION
--
\x
Expanded display is on.
\dRs+
List of subscriptions
-[ RECORD 1 ]--+
Name   | sub_pg16
Owner  | sub_owner
Enabled| t
Publication| {pub_pg16}
Binary | f
Streaming  | off
Two-phase commit   | d
Disable on error   | f
Origin | any
Password required  | t
Run as owner?  | f
Synchronous commit | off
Conninfo   | host=/var/run/postgresql port=5437 user=user_pub_pg16 dbname=tests_pg16
Skip LSN   | 0/0

\du+
List of roles
-[ RECORD 1 ]---
Role name   | postgres
Attributes  | Superuser, Create role, Create DB, Replication, Bypass RLS
Description | 
-[ RECORD 2 ]---
Role name   | sub_owner
Attributes  | 
Description | 

\l tests_pg16
List of databases
-[ RECORD 1 ]-+--
Name  | tests_pg16
Owner | postgres
Encoding  | UTF8
Locale Provider   | libc
Collate   | C
Ctype | C
ICU Locale| 
ICU Rules | 
Access privileges | =Tc/postgres +
  | postgres=CTc/postgres+
  | sub_owner=C/postgres

\x
Expanded display is off.
--
\c - sub_owner 
You are now connected to database "tests_pg16" as user "sub_owner".
--
ALTER SUBSCRIPTION sub_pg16 DISABLE;
ALTER SUBSCRIPTION
--
ALTER SUBSCRIPTION sub_pg16 ENABLE;
ALTER SUBSCRIPTION
--
ALTER SUBSCRIPTION sub_pg16 RENAME TO sub_pg16_renamed;
ALTER SUBSCRIPTION
--
DROP SUBSCRIPTION sub_pg16_renamed ;
psql:/home/benoit/tmp/password_required.sql:26: ERROR:  password is required
DETAIL:  Non-superuser cannot connect if the server does not request a password.
HINT:  Target server's authentication method must be changed, or set password_required=false in the subscription parameters.
--
\c - postgres
You are now connected to database "tests_pg16" as user "postgres".
--
DROP SUBSCRIPTION sub_pg16_renamed;
psql:/home/benoit/tmp/password_required.sql:30: ERROR:  password is required
DETAIL:  Non-superuser cannot connect if the server does not request a password.
HINT:  Target server's authentication method must be changed, or set password_required=false in the subscription parameters.

Re: Guiding principle for dropping LLVM versions?

2023-09-21 Thread Daniel Gustafsson
> On 21 Sep 2023, at 07:28, Tom Lane  wrote:
> 
> Thomas Munro  writes:
>> I wonder if there is a good way to make this sort of thing more
>> systematic.  If we could agree on a guiding principle vaguely like the
>> above, then perhaps we just need a wiki page that lists relevant
>> distributions, versions and EOL dates, that could be used to reduce
>> the combinations of stuff we have to consider and make the pruning
>> decisions into no-brainers.

As someone who on occasion pokes at OpenSSL compat code, I would very much like a
more structured approach to dealing with dependencies.

> Thus, I think it's worthwhile to spend effort on back-patching
> new-LLVM compatibility fixes into old PG branches, but I agree
> that newer PG branches can drop compatibility with obsolete
> LLVM versions.

+1

> LLVM is maybe not the poster child for these concerns -- for
> either direction of compatibility problems, someone could build
> without JIT support and not really be dead in the water.

Right; OpenSSL, on the other hand, might be a better example, since removing TLS
support is likely a no-show.  I can see both the need to use an old OpenSSL
version in a backbranch due to certifications etc., as well as a requirement in
other cases to use the latest version due to CVEs.

> In any case, I agree with your prior decision to not touch v11
> for this.  With that branch's next release being its last,
> I think the odds of introducing a bug we can't fix later
> outweigh any arguable portability gain.

Agreed.

--
Daniel Gustafsson





Re: Index range search optimization

2023-09-21 Thread Alexander Korotkov
On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov  wrote:
> On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan  wrote:
>  This also makes sense. I've rephrased the comment.

The revised patch is attached.  It contains better comments and the
commit message.  Peter, could you please check if you're OK with this?

--
Regards,
Alexander Korotkov


0001-Skip-checking-of-scan-keys-required-for-direction-v3.patch


Re: pg_upgrade and logical replication

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 4:39 AM Michael Paquier  wrote:
>
> On Wed, Sep 20, 2023 at 04:54:36PM +0530, Amit Kapila wrote:
> > Also, the patch seems to be allowing subscription relations from PG
> > >=10 to be migrated but how will that work if the corresponding
> > publisher is also upgraded without slots? Won't the corresponding
> > workers start failing as soon as you restart the upgrade server? Do we
> > need to document the steps for users?
>
> Hmm?  How is that related to the upgrade of the subscribers?
>

It is because after the upgrade of both publisher and subscriber, the
subscriptions won't work. Both publisher and subscriber must work;
otherwise, the logical replication setup won't work. I think we can
probably do this if we document clearly how users can make their
logical replication setup work after the upgrade.

>
>  And how
> is that different from the case where a subscriber tries to connect
> back to a publisher where a slot has been dropped?
>

It is different because we don't drop slots automatically anywhere else.

-- 
With Regards,
Amit Kapila.




Re: pg_upgrade and logical replication

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 11:37 AM Michael Paquier  wrote:
>
> On Wed, Sep 20, 2023 at 04:54:36PM +0530, Amit Kapila wrote:
> > Is the check to ensure remote_lsn is valid correct in function
> > check_for_subscription_state()? How about the case where the apply
> > worker didn't receive any change but just marked the relation as
> > 'ready'?
>
> I may be missing something, of course, but a relation is switched to
> SUBREL_STATE_READY only once a sync has happened and its state was
> SUBREL_STATE_SYNCDONE, implying that SubscriptionRelState->lsn is
> never InvalidXLogRecPtr, no?
>

The check in the patch is about the logical replication worker's
origin's LSN. The value of SubscriptionRelState->lsn won't matter for
the check.

-- 
With Regards,
Amit Kapila.




Re: New WAL record to detect the checkpoint redo location

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 7:05 AM Robert Haas  wrote:
>
> On Mon, Sep 18, 2023 at 2:57 PM Robert Haas  wrote:
> > I've been brainstorming about this today, trying to figure out some
> > ideas to make it work.
>
> Here are some patches.
>
> 0001 refactors XLogInsertRecord to unify a couple of separate tests of
> isLogSwitch, hopefully making it cleaner and cheaper to add more
> special cases.
>
> 0002 is a very minimal patch to add XLOG_CHECKPOINT_REDO without using
> it for anything.
>
> 0003 modifies CreateCheckPoint() to insert an XLOG_CHECKPOINT_REDO
> record for any non-shutdown checkpoint, and modifies
> XLogInsertRecord() to treat that as a new special case, wherein after
> inserting the record the redo pointer is reset while still holding the
> WAL insertion locks.
>

After the 0003 patch, do we still need to acquire an exclusive lock via
WALInsertLockAcquireExclusive() for non-shutdown checkpoints? Even the
comment "We must block concurrent insertions while examining insert
state to determine the checkpoint REDO pointer." seems to indicate
that it is not required. If it is required, then we may want to change
the comment; also, acquiring the locks twice will cost more than
acquiring them once and writing the new WAL record under that lock.

One minor comment:
+ else if (XLOG_CHECKPOINT_REDO)
+ class = WALINSERT_SPECIAL_CHECKPOINT;
+ }

Doesn't the check need to compare the record type with info?

Your v6-0001* patch looks like an improvement to me even without the
other two patches.

BTW, I would like to mention that there is a slight interaction of
this work with the patch to upgrade/migrate slots [1]. Basically, in
[1], to allow slot migration from a lower to a higher version, we need to
ensure that all the WAL has been consumed by the slots before a clean
shutdown. However, during upgrade we can generate a few records, like
checkpoint records, which we will ignore for the slot consistency
check, as such records don't matter for data consistency after the
upgrade. We probably need to add this record to that list. I'll keep
an eye on both patches so that we don't miss that interaction, but I
mentioned it here to make others aware of it as well.

[1] - 
https://www.postgresql.org/message-id/TYAPR01MB586615579356A84A8CF29A00F5F9A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: remaining sql/json patches

2023-09-21 Thread Alvaro Herrera
I keep looking at 0001, and in the struct definition I think putting the
escontext at the bottom is not great, because there's a comment a few
lines above that says "XXX: following fields only needed during
"compilation"), could be thrown away afterwards".  This comment is not
strictly true, because innermost_caseval is actually used by
array_map(); yet it seems that ->escontext should appear before that
comment.

However, if you put it before steps_len, it would push members steps_len
and steps_alloc beyond the struct's first cache line(*).  If those
struct members are critical for expression init performance, then maybe
it's not a good tradeoff.  I don't know whether this struct was laid out
carefully with that consideration in mind or not.

Also, ->escontext's own comment in ExprState seems to be saying too much
and not saying enough.  I would reword it as "For expression nodes that
support soft errors.  NULL if caller wants them thrown instead".  The
shortest I could make it so that it fits on a single line is "For nodes
that can error softly. NULL if caller wants them thrown", or "For
soft-error-enabled nodes.  NULL if caller wants errors thrown".  Not
sure if those are good enough, or whether to just make the comment the
whole four lines ...


(*) This is what pahole says about the struct as 0001 would put it:

struct ExprState {
NodeTagtype; /* 0 4 */
uint8  flags;/* 4 1 */
_Bool  resnull;  /* 5 1 */

/* XXX 2 bytes hole, try to pack */

Datum  resvalue; /* 8 8 */
TupleTableSlot *   resultslot;   /*16 8 */
struct ExprEvalStep *  steps;/*24 8 */
ExprStateEvalFunc  evalfunc; /*32 8 */
Expr * expr; /*40 8 */
void * evalfunc_private; /*48 8 */
intsteps_len;/*56 4 */
intsteps_alloc;  /*60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct PlanState * parent;   /*64 8 */
ParamListInfo  ext_params;   /*72 8 */
Datum *innermost_caseval;/*80 8 */
_Bool *innermost_casenull;   /*88 8 */
Datum *innermost_domainval;  /*96 8 */
_Bool *innermost_domainnull; /*   104 8 */
ErrorSaveContext * escontext;/*   112 8 */

/* size: 120, cachelines: 2, members: 18 */
/* sum members: 118, holes: 1, sum holes: 2 */
/* last cacheline: 56 bytes */
};

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: should frontend tools use syncfs() ?

2023-09-21 Thread Maxim Orlov
On Wed, 20 Sept 2023 at 22:08, Nathan Bossart 
wrote:

> I think we should also consider renaming things like SYNC_METHOD_FSYNC to
> WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options.
>

I've already renamed sync_method_options in the previous patch.
@@ -171,7 +171,7 @@ static bool check_wal_consistency_checking_deferred = false;
 /*
  * GUC support
  */
-const struct config_enum_entry sync_method_options[] = {
+const struct config_enum_entry wal_sync_method_options[] = {

As for the SYNC_METHOD_FSYNC rename, PFA the patch.
It also makes an enum for the WAL sync methods instead of defines.

-- 
Best regards,
Maxim Orlov.


v11-0001-Fix-conflicting-types-for-sync_method.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Amit Kapila
On Thu, Sep 21, 2023 at 1:10 PM Michael Paquier  wrote:
>
> On Wed, Sep 20, 2023 at 11:28:33AM +, Hayato Kuroda (Fujitsu) wrote:
> > Good catch, I could not notice because it worked well in my RHEL. Here is 
> > the
> > updated version.
>
> I am getting slowly up to date with this patch..  But before going in
> depth with more review, there is something that I got to ask: why is
> there no option to control if the slots are copied across the upgrade?
> At least, I would have imagined that an option to disable the copy of
> the slots would be adapted, say a --no-slot-copy or similar to get
> back to the old behavior if need be.
>

We have discussed this point. Normally, we don't have such options in
pg_upgrade, so we were hesitant to add a new one for this, but there is
a discussion to add an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

> + * This is because before that the logical slots are not saved at shutdown, 
> so
> + * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
>
> Is this comment in get_old_cluster_logical_slot_infos() still true
> after e0b2eed047d?
>

Yes, we didn't backpatch it, so slots from pre-17 won't be flushed
at shutdown time even if required.

-- 
With Regards,
Amit Kapila.




Re: Array initialisation notation in syscache.c

2023-09-21 Thread Peter Eisentraut

On 31.03.23 04:16, Thomas Munro wrote:

 From the light relief department, here is some more variadic macrology:

-   tp = SearchSysCache1(TSPARSEROID, ObjectIdGetDatum(prsId));
+   tp = SearchSysCache(TSPARSEROID, ObjectIdGetDatum(prsId));


I'm worried that if we remove the variants with the explicit 
numbers, it will make it difficult for extensions to maintain 
compatibility with previous PG major versions.  They would probably have 
to copy much of your syscache.h changes into their own code.  Seems messy.






Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z

2023-09-21 Thread Ryoga Yoshida

On 2023-09-21 14:50, David Rowley wrote:

I've pushed your patch plus some additional code to escape the text
specified in --buffer-usage-limit before passing it to the server in
commit 5cfba1ad6

Thanks again for the report.


Thank you for the commit. I didn't notice about the escaping and it 
seemed like it would be difficult for me to fix, so I appreciate your 
help.


Ryoga Yoshida




Re: CI: Unfamiliar error while testing macOS

2023-09-21 Thread Daniel Gustafsson
> On 21 Sep 2023, at 05:25, Hayato Kuroda (Fujitsu)  
> wrote:
> 
> Dear hackers,
> 
> While developing my patch, I found that the CI for macOS failed with unknown 
> error [1].
> Do you know the reason why it happened? Please tell me if you have 
> workarounds...
> 
> It failed the test at "Upload 'ccache' cache". The Cirrus app said a 
> following message:
> 
>> Persistent worker failed to start the task: remote agent failed: failed to 
>> run agent: wait: remote command exited without exit status or exit signal

That looks like a transient infrastructure error and not something related to
your patch; it will most likely go away on the next run.

--
Daniel Gustafsson





Re: bug fix and documentation improvement about vacuumdb

2023-09-21 Thread Daniel Gustafsson
> On 21 Sep 2023, at 03:53, Kuwamura Masaki  
> wrote:

> When sending an update, please include the previous patch as well with your 
> new
> tests as a 0002 patch in a patchset.  The CFBot can only apply and build/test
> patches when the entire patchset is attached to the email.  The below
> testresults indicate that the patch failed the new tests (which in a way is
> good since without the fix the tests *should* fail), since the fix patch was
> not applied:
> 
> http://cfbot.cputube.org/masaki-kuwamura.html 
> 
> I'm sorry, I didn't know that. I attached both the test and fix patch to this 
> mail.

No worries at all.  If you look at the page now you will see all green
checkmarks indicating that the patch was tested in CI.  So now we know that
your tests fail without the fix and work with the fix applied, so all is well.

--
Daniel Gustafsson





Re: Make --help output fit within 80 columns per line

2023-09-21 Thread Peter Eisentraut

On 31.08.23 09:47, torikoshia wrote:
BTW, psql --help outputs the content of PGHOST, which caused a failure 
in the test:


```
-h, --host=HOSTNAME  database server host or socket directory
  (default: 
"/var/folders/m7/9snkd5b54cx_b4lxkl9ljlccgn/T/LobrmSUf7t")

```

It may be overkill, but I added logic to remove the content of PGHOST.


I wonder, should we remove this?  We display the 
environment-variable-based defaults in psql --help, but not for any 
other programs.  This is potentially misleading.






Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-21 Thread Michael Paquier
On Wed, Sep 20, 2023 at 11:28:33AM +, Hayato Kuroda (Fujitsu) wrote:
> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

I am getting slowly up to date with this patch..  But before going in
depth with more review, there is something that I got to ask: why is
there no option to control if the slots are copied across the upgrade?
At least, I would have imagined that an option to disable the copy of
the slots would be adapted, say a --no-slot-copy or similar to get
back to the old behavior if need be.

+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk

Is this comment in get_old_cluster_logical_slot_infos() still true
after e0b2eed047d?
--
Michael




Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z

2023-09-21 Thread David Rowley
On Thu, 21 Sept 2023 at 17:59, Michael Paquier  wrote:
> That was fast.  If I may ask, why don't you have some regression tests
> for the two code paths of vacuumdb that append this option to the
> commands generated for VACUUM and ANALYZE?

I think we have differing standards for what constitutes a useful
test.  For me, there would have to be a much higher likelihood of this
ever getting broken again.

I deem it pretty unlikely that someone will accidentally remove the
code that I just committed and a test to test that vacuumdb -Z
--buffer-usage-limit ... passes the BUFFER_USAGE_LIMIT option would
likely just forever mark that we once had a trivial bug that forgot to
include the --buffer-usage-limit when -Z was specified.

If others feel strongly that a test is worthwhile, then I'll reconsider.

David




Re: pg_upgrade --check fails to warn about abstime

2023-09-21 Thread Suraj Kharage
Thanks, Alvaro, for working on this.

The patch looks good to me.

+ * similar to the above, but for types that were removed in 12.
The comment can start with a capital letter.

Also, we need to backport this, right?

On Wed, Sep 20, 2023 at 10:24 PM Alvaro Herrera 
wrote:

> I got a complaint that pg_upgrade --check fails to raise red flags when
> the source database contains type abstime when upgrading from pg11.  The
> type (along with reltime and tinterval) was removed by pg12.
>
>
> In passing, while testing this, I noticed that the translation
> infrastructure in pg_upgrade/util.c is broken: we do have the messages
> in the translation catalog, but the translations for the messages from
> prep_status are never displayed.  So I made the quick hack of adding _()
> around the fmt, and this was the result:
>
> Verificando Consistencia en Vivo en el Servidor Antiguo
> ---
> Verificando las versiones de los clústerséxito
> Verificando que el usuario de base de datos es el usuario de
> instalaciónéxito
> Verificando los parámetros de conexión de bases de datoséxito
> Verificando transacciones preparadas  éxito
> Verificando tipos compuestos definidos por el sistema en tablas de
> usuarioéxito
> Verificando tipos de datos reg* en datos de usuario   éxito
> Verificando contrib/isn con discordancia en mecanismo de paso de
> bigintéxito
> Checking for incompatible "aclitem" data type in user tables  éxito
> Checking for removed "abstime" data type in user tables   éxito
> Checking for removed "reltime" data type in user tables   éxito
> Checking for removed "tinterval" data type in user tables éxito
> Verificando conversiones de codificación definidas por el usuarioéxito
> Verificando operadores postfix definidos por el usuario   éxito
> Verificando funciones polimórficas incompatibles éxito
> Verificando tablas WITH OIDS  éxito
> Verificando columnas de usuario del tipo «sql_identifier»   éxito
> Verificando la presencia de las bibliotecas requeridaséxito
> Verificando que el usuario de base de datos es el usuario de
> instalaciónéxito
> Verificando transacciones preparadas  éxito
> Verificando los directorios de tablespaces para el nuevo clústeréxito
>
> Note how nicely they line up ... not.  There is some code that claims to
> do this correctly, but apparently it counts bytes, not characters, and
> also it appears to be measuring the original rather than the
> translation.
>
> I think we're trimming the strings in the wrong places.  We need to
> apply _() to the originals, not the trimmed ones.  Anyway, clearly
> nobody has looked at this very much.
>
> --
> Álvaro Herrera PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
> "We’ve narrowed the problem down to the customer’s pants being in a
> situation
>  of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)
>


-- 
--

Thanks & Regards,
Suraj kharage,



edbpostgres.com


Re: Unlogged relation copy is not fsync'd

2023-09-21 Thread Noah Misch
On Fri, Sep 15, 2023 at 02:47:45PM +0300, Heikki Linnakangas wrote:
> On 05/09/2023 21:20, Robert Haas wrote:

> Thinking about this some more, I think this is still not 100% correct, even
> with the patch I posted earlier:
> 
> > /*
> >  * When we WAL-logged rel pages, we must nonetheless fsync them.  The
> >  * reason is that since we're copying outside shared buffers, a 
> > CHECKPOINT
> >  * occurring during the copy has no way to flush the previously written
> >  * data to disk (indeed it won't know the new rel even exists).  A crash
> >  * later on would replay WAL from the checkpoint, therefore it wouldn't
> >  * replay our earlier WAL entries. If we do not fsync those pages here,
> >  * they might still not be on disk when the crash occurs.
> >  */
> > if (use_wal || relpersistence == RELPERSISTENCE_UNLOGGED)
> > smgrimmedsync(dst, forkNum);
> 
> Let's walk through each case one by one:
> 
> 1. Temporary table. No fsync() needed. This is correct.
> 
> 2. Unlogged rel, main fork. Needs to be fsync'd, because we skipped WAL, and
> also because we bypassed the buffer manager. Correct.

Agreed.

> 3. Unlogged rel, init fork. Needs to be fsync'd, even though we WAL-logged
> it, because we bypassed the buffer manager. Like the comment explains. This
> is now correct with the patch.

This has two subtypes:

3a. Unlogged rel, init fork, use_wal (wal_level!=minimal). Matches what
you wrote.

3b. Unlogged rel, init fork, !use_wal (wal_level==minimal). Needs to be
fsync'd because we didn't WAL-log it and RelationCreateStorage() won't call
AddPendingSync().  (RelationCreateStorage() could start calling
AddPendingSync() for this case.  I think we chose not to do that because there
will never be additional writes to the init fork, and smgrDoPendingSyncs()
would send the fork to FlushRelationsAllBuffers() even though the fork will
never appear in shared buffers.  On the other hand, grouping the sync with the
other end-of-xact syncs could improve efficiency for some filesystems.  Also,
the number of distinguishable cases is unpleasantly high.)

> 4. Permanent rel, use_wal == true. Needs to be fsync'd, even though we
> WAL-logged it, because we bypassed the buffer manager. Like the comment
> explains. Correct.
> 
> 5. Permanent rel, use_wal == false. We skip fsync, because it means that
> it's a new relation, so we have a sync request pending for it. (An assertion
> for that would be nice). At the end of transaction, in smgrDoPendingSyncs(),
> we will either fsync it or we will WAL-log all the pages if it was a small
> relation. I believe this is *wrong*. If smgrDoPendingSyncs() decides to
> WAL-log the pages, we have the same race condition that's explained in the
> comment, because we bypassed the buffer manager:
> 
> 1. RelationCopyStorage() copies some of the pages.
> 2. Checkpoint happens, which fsyncs the relation (smgrcreate() registered a
> dirty segment when the relation was created)
> 3. RelationCopyStorage() copies the rest of the pages.
> 4. smgrDoPendingSyncs() WAL-logs all the pages.

smgrDoPendingSyncs() delegates to log_newpage_range().  log_newpage_range()
loads each page into the buffer manager and calls MarkBufferDirty() on each.
Hence, ...

> 5. Another checkpoint happens. It does *not* fsync the relation.

... I think this will fsync the relation.  No?

> 6. Crash.




Re: Comment about set_join_pathlist_hook()

2023-09-21 Thread Lepikhov Andrei
On Thu, Sep 21, 2023, at 12:53 PM, Etsuro Fujita wrote:
> Hi,
>
> On Thu, Sep 21, 2023 at 11:49 AM Lepikhov Andrei
>  wrote:
>> On Wed, Sep 20, 2023, at 5:05 PM, Etsuro Fujita wrote:
>> > What I am concerned about from the report [1] is that this comment is
>> > a bit too terse; it might cause a misunderstanding that extensions can
>> > do different things than we intend to allow:
>> >
>> > /*
>> >  * 6. Finally, give extensions a chance to manipulate the path list.
>> >  */
>> > if (set_join_pathlist_hook)
>> > set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
>> >jointype, );
>> >
>> > So I would like to propose to extend the comment to explain what they
>> > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c.
>> > Attached is a patch for that.
>>
>> It makes sense. But why do you restrict addition to pathlist by only the 
>> add_path() routine? It can fail to add a path to the pathlist. We need to 
>> find out the result of the add_path operation and need to check the pathlist 
>> each time. So, sometimes, it can be better to add a path manually.
>
> I do not agree with you on this point; I think you can do so at your
> own responsibility, but I think it is better for extensions to use
> add_path(), because that makes them stable.  (Assuming that add_path()
> has a bug and we change the logic of it to fix the bug, extensions
> that do not follow the standard procedure might not work anymore.)

Ok, I got it. This question relates to the add_path() interface itself, not to 
the comment. It is awkward to have to check the pathlist for the result of 
add_path() every time.

>> One more slip-up could be prevented by the comment: removing a path from the 
>> pathlist we should remember about the cheapest_* pointers.
>
> Do we really need to do this?  I mean we do set_cheapest() afterward.
> See standard_join_search().

Agreed; in the case of the current join it doesn't make sense. I got stuck in 
this situation because, when providing an additional path at the current level, 
I need to arrange paths for the inner and outer relations too.

Thanks for the explanation!

-- 
Regards,
Andrei Lepikhov




Re: pg_upgrade and logical replication

2023-09-21 Thread Michael Paquier
On Wed, Sep 20, 2023 at 04:54:36PM +0530, Amit Kapila wrote:
> Is the check to ensure remote_lsn is valid correct in function
> check_for_subscription_state()? How about the case where the apply
> worker didn't receive any change but just marked the relation as
> 'ready'?

I may be missing something, of course, but a relation is switched to
SUBREL_STATE_READY only once a sync has happened and its state was
SUBREL_STATE_SYNCDONE, implying that SubscriptionRelState->lsn is
never InvalidXLogRecPtr, no?

For instance, nothing happens when a
Assert(!XLogRecPtrIsInvalid(rstate->lsn)) is added in
process_syncing_tables_for_apply().
--
Michael

