Re: New strategies for freezing, advancing relfrozenxid early

2022-10-03 Thread Peter Geoghegan
On Mon, Oct 3, 2022 at 10:13 PM Jeff Davis  wrote:
> Take the case where you load a lot of data in one transaction. After
> the loading transaction finishes, those new pages will soon be marked
> all-visible.
>
> In the future, vacuum runs will have to decide what to do. If a vacuum
> decides to do an aggressive scan to freeze all of those pages, it may
> be at some unfortunate time and disrupt the workload. But if it skips
> them all, then it's just deferring the work until it runs up against
> autovacuum_freeze_max_age, which might also be at an unfortunate time.

Predicting the future accurately is intrinsically hard. We're already
doing that today by freezing lazily. I think that we can come up with
a better overall strategy, but there is always a risk that we'll come
out worse off in some individual cases. I think it's worth it if it
avoids ever really flying off the rails.

> So how does your patch series handle this case? I assume there's some
> mechanism to freeze a moderate number of pages without worrying about
> advancing relfrozenxid?

It mostly depends on whether or not the table exceeds the new
vacuum_freeze_strategy_threshold GUC in size at the time of the
VACUUM. This is 4GB by default, at least right now.

In the case where the table size doesn't exceed that threshold yet,
each VACUUM will advance relfrozenxid whenever it happens to be very
cheap to do so, in terms of the amount of extra scanned_pages. If the
number of extra scanned_pages is less than 5% of the total table size
(current rel_pages), then we'll advance relfrozenxid early by making
sure to scan any all-visible pages.

This scanned_pages threshold starts out at 5% and usually stays there,
but it will eventually start to grow (i.e. make VACUUM behave eagerly
more often) once table age exceeds 50% of autovacuum_freeze_max_age at
the start of the VACUUM. So the skipping strategy threshold is more or
less a blend of physical units (heap pages) and logical units (XID
age).
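For illustration only -- this is a sketch, not the patch's actual code,
and the exact ramp past the 50% point is invented -- the decision might
be modeled like this:

```python
def advance_relfrozenxid_early(rel_pages, extra_scanned_pages,
                               table_age, autovacuum_freeze_max_age):
    """Should this VACUUM scan all-visible pages in order to advance
    relfrozenxid?  Illustrative sketch only; the real ramp in the
    patch may differ."""
    threshold = 0.05                      # usually 5% of rel_pages
    if table_age > 0.5 * autovacuum_freeze_max_age:
        # Past 50% of autovacuum_freeze_max_age the threshold grows,
        # making eager behavior progressively more likely.
        age_frac = min(table_age / autovacuum_freeze_max_age, 1.0)
        threshold = 0.05 + (age_frac - 0.5) * 2 * 0.95   # 5% -> 100%
    return extra_scanned_pages < threshold * rel_pages
```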

Then there is the case where it's already a larger table at the point
a given VACUUM begins -- a table that ends up exceeding the same table
size threshold, vacuum_freeze_strategy_threshold. When that happens
we'll freeze all pages that are going to be marked all-visible as a
matter of policy (i.e. use eager freezing strategy), so that the same
pages can be marked all-frozen instead. We won't freeze pages that
aren't full of all-visible tuples (except for LP_DEAD items), unless
they have XIDs that are so old that vacuum_freeze_min_age triggers
freezing.

Once a table becomes larger than vacuum_freeze_strategy_threshold,
VACUUM stops marking pages all-visible in the first place,
consistently marking them all-frozen instead. So naturally there just
cannot be any all-visible pages after the first eager freezing VACUUM
(actually there are some obscure edge cases that can result in the odd
all-visible page here or there, but this should be extremely rare, and
have only negligible impact).
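For illustration only, the strategy choice described above boils down to
a size check against the new GUC (the docstring summarizes the behavior
as described in this mail; it is a sketch, not the patch's code):

```python
def choose_freeze_strategy(rel_size_bytes,
                           vacuum_freeze_strategy_threshold=4 * 1024**3):
    """Pick a per-VACUUM freezing strategy from table size alone
    (illustrative sketch; the GUC's default is 4GB right now).

    "lazy":  freeze only per vacuum_freeze_min_age; pages get marked
             all-visible.
    "eager": freeze every page that would have been marked all-visible,
             and mark it all-frozen instead.
    """
    if rel_size_bytes > vacuum_freeze_strategy_threshold:
        return "eager"
    return "lazy"
```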

Bigger tables always have pages frozen eagerly, and in practice always
advance relfrozenxid early. In other words, the eager freezing strategy
implies eager relfrozenxid advancement -- though not the other way
around. Again, these are details that may change in the future. My
focus is on validating the high-level concepts.

So we avoid big spikes, and try to do the work when it's cheapest.

-- 
Peter Geoghegan




Re: ssl tests aren't concurrency safe due to get_free_port()

2022-10-03 Thread Andrew Dunstan


On 2022-10-02 Su 12:49, Andres Freund wrote:
>
> 2) Use a lockfile containing a pid to protect the choice of a port within a
>build directory. Before accepting a port, get_free_port() would check if
>a lockfile exists for the port and, if so, whether the test using it is still
>alive.  That will protect against raciness between multiple tests inside a
>build directory, but won't protect against races between multiple builds
>running concurrently on a machine (e.g. on a buildfarm host)
>
>

I think this is the right solution. To deal with the last issue, the
lockdir should be overrideable, like this:


  my $port_lockdir = $ENV{PG_PORT_LOCKDIR} || $build_dir;


Buildfarm animals could set this, probably to the global lockdir (see
run_branches.pl). Prior to that, buildfarm owners could do that manually.
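As an illustrative sketch (Python here rather than the buildfarm's Perl;
error handling is deliberately minimal), the pid-lockfile scheme could
look like:

```python
import os

def try_lock_port(port, lockdir):
    """Reserve `port` by atomically creating a lockfile containing our
    pid.  Sketch of the scheme discussed above, not buildfarm code."""
    path = os.path.join(lockdir, "%d.lock" % port)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # A lockfile exists -- is the test that created it still alive?
        try:
            with open(path) as f:
                pid = int(f.read())
        except ValueError:
            pid = None
        if pid is not None:
            try:
                os.kill(pid, 0)       # signal 0 just probes existence
                return False          # owner still running: port taken
            except ProcessLookupError:
                pass                  # owner is gone: lock is stale
            except PermissionError:
                return False          # pid exists under another user
        os.unlink(path)               # reclaim the stale lockfile
        return try_lock_port(port, lockdir)
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True
```

The atomic O_CREAT|O_EXCL create avoids a check-then-create race within
one machine, which is exactly what the per-build-directory case needs.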


There are numerous examples of lockfile code in the buildfarm sources.
I'll try to hack something up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Use XLogFromFileName() in pg_resetwal to parse position from WAL file

2022-10-03 Thread Bharath Rupireddy
Hi,

It looks like there's an opportunity to replace the explicit WAL file
parsing code in pg_resetwal.c with XLogFromFileName(). This wasn't done
originally (in PG 10) because XLogFromFileName() didn't accept the WAL
segment size as an input parameter (see [1]), and pg_resetwal needed to
use the WAL segment size from the control file. That changed with commit
fc49e24fa69a15efacd5b8958115ed9c43c48f9a, which added the
wal_segsz_bytes parameter to XLogFromFileName().
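For reference, here is a sketch (in Python, standing in for the C macro
and function in xlog_internal.h) of what XLogFromFileName() computes: a
WAL file name is three 8-hex-digit fields -- timeline, "log" (the high
32 bits of the position) and "seg" -- and the segment number depends on
the segment size:

```python
def xlog_from_file_name(fname, wal_segsz_bytes):
    """Parse (timeline, segment number) out of a 24-character WAL file
    name, mirroring XLogFromFileName() (sketch only)."""
    tli = int(fname[0:8], 16)
    log = int(fname[8:16], 16)
    seg = int(fname[16:24], 16)
    # XLogSegmentsPerXLogId: how many segments fit in one 4GB "xlogid"
    segs_per_xlogid = 0x100000000 // wal_segsz_bytes
    return tli, log * segs_per_xlogid + seg
```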

I'm attaching a small patch. It removes a few extra variables in
pg_resetwal.c and a bit of duplicate code (5 LOC).

Thoughts?

[1] 
https://github.com/postgres/postgres/blob/REL_10_STABLE/src/include/access/xlog_internal.h

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Use-XLogFromFileName-in-pg_resetwal-to-parse-posi.patch


Re: [POC] Allow flattening of subquery with a link to upper query

2022-10-03 Thread Andrey Lepikhov

On 9/13/22 16:40, Andrey Lepikhov wrote:

On 5/9/2022 12:22, Richard Guo wrote:
On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov wrote:
To resolve both issues, the lower outer join is passed through pull_sublinks_*
into the flattening routine (see attachment).

I've added these cases into subselect.sql

Attached is a new version of the patch, rebased onto current master.

--
Regards
Andrey Lepikhov
Postgres Professional
From 6462578f8789cb831f45ebfad65308d6afabb833 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Sep 2022 13:42:12 +0300
Subject: [PATCH] Transform correlated subqueries of type N-J [1] into ordinary
 join queries. With many restrictions, check each subquery and pull up
 expressions that reference the upper query block. Works for operators '=' and 'IN'.
 The machinery for converting an ANY subquery to a JOIN stays the same, with
 minor changes in the walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass a lower outer join through the pullup_sublink recursive procedure to detect
situations where a qual references the outer side of an outer join.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c    | 320 +++-
 src/backend/optimizer/prep/prepjointree.c |  71 ++--
 src/backend/optimizer/util/tlist.c        |   2 +-
 src/backend/optimizer/util/var.c          |   8 +
 src/backend/utils/misc/guc_tables.c       |  10 +
 src/include/optimizer/optimizer.h         |   1 +
 src/include/optimizer/subselect.h         |   3 +-
 src/include/optimizer/tlist.h             |   1 +
 src/test/regress/expected/prepare.out     |  18 +
 src/test/regress/expected/subselect.out   | 438 ++
 src/test/regress/sql/prepare.sql          |   7 +
 src/test/regress/sql/subselect.sql        | 162 
 12 files changed, 1004 insertions(+), 37 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e3338584..29da43bb4d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
 		   List *plan_params,
 		   SubLinkType subLinkType, int subLinkId,
@@ -1229,6 +1232,288 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 	return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Query))
+		return query_tree_walker((Query *) node, contain_placeholders, context, 0);
+	else if (IsA(node, PlaceHolderVar))
+	{
+		return true;
+	}
+
+	return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable, all clauses of the correlated subquery being flattened
+ * should be ANDed and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+	if (!contain_vars_of_level(quals, 1))
+		return true;
+
+	if (quals && IsA(quals, OpExpr))
+	{
+		OpExpr *expr = (OpExpr *) quals;
+		Node   *leftarg;
+
+		/* Contains only one expression */
+		leftarg = linitial(expr->args);
+		if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */
+			return false;
+
+		if (contain_placeholders(quals, NULL))
+			return false;
+
+		return true;
+	}
+	else if (is_andclause(quals))
+	{
+		ListCell   *l;
+
+		foreach(l, ((BoolExpr *) quals)->args)
+		{
+			Node *andarg = (Node *) lfirst(l);
+
+			if (!IsA(andarg, OpExpr))
+				return false;
+			if (!quals_is_pullable(andarg))
+				return false;
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+typedef struct
+{
+	Query  *subquery;
+	int		newvarno;
+	List   *pulling_quals;
+	bool	varlevel_up;
+} correlated_t;
+
+static Node *
+pull_subquery_clauses_mutator(Node *node, correlated_t *ctx)
+{
+	if (node == NULL)
+		return NULL;
+
+	if (IsA(node, OpExpr) && !ctx->varlevel_up)
+	{
+		if (!contain_vars_of_level(node, 1))
+			return node;
+
+		/*
+		 * The expression contains links to the upper relation, so it will be
+		 * pulled up a level. All links to vars of upper levels must be changed.
+		 */
+
+		ctx->varlevel_up = true;
+		ctx->pulling_quals =
+			lappend(ctx->pulling_quals,
+	expression_tree_mutator(node,
+			pull_subquery_clauses_mutator,
+			(void *) ctx));
+		ctx->varlevel_up = false;
+
+		/* Replace position of pulled expression by the 'true' value */
+		return makeBoolConst(true, 

Re: GUC tables - use designated initializers

2022-10-03 Thread Peter Smith
On Wed, Sep 28, 2022 at 12:04 PM Peter Smith  wrote:
>
> On Wed, Sep 28, 2022 at 2:21 AM Tom Lane  wrote:
> >
> > Peter Smith  writes:
> > > Enums index a number of the GUC tables. This all relies on the
> > > elements being carefully arranged to be in the same order as those
> > > enums. There are comments to say what enum index belongs to each table
> > > element.
> > > But why not use designated initializers to enforce what the comments
> > > are hoping for?
> >
> > Interesting proposal, but it's not clear to me that this solution makes
> > the code more bulletproof rather than less so.  Yeah, you removed the
> > order dependency, but the other concern here is that the array gets
> > updated at all when adding a new enum entry.  This method seems like
> > it'd help disguise such an oversight.  In particular, the adjacent
> > StaticAssertDecls about the array lengths are testing something different
> > than they used to, and I fear they lost some of their power.
>
> Thanks for the feedback!
>
> The current code StaticAssertDecl asserts that the array length is the
> same as the number of enums by using hardwired knowledge of what enum
> is the "last" one (e.g. DEVELOPER_OPTIONS in the example below).
>
> StaticAssertDecl(lengthof(config_group_names) == (DEVELOPER_OPTIONS + 2),
> "array length mismatch");
>
> Hmmm. I think maybe I found the example to justify your fear. It's a
> bit subtle and AFAIK the HEAD code would not suffer this problem ---
> imagine if the developer adds the new enum before the "last" one (e.g.
> ADD_NEW_BEFORE_LAST comes before DEVELOPER_OPTIONS) and at the same
> time they *forgot* to update the table elements, then that designated
> index [DEVELOPER_OPTIONS] will still ensure the table becomes the
> correct increased length (so the StaticAssertDecl will be ok) except
> now there will be an undetected "hole" in the table at the forgotten
> [ADD_NEW_BEFORE_LAST] index.
>
> > Can we
> > improve those checks so they'd catch a missing entry again?
>
> Thinking...
>

Although adding designated initializers fixes some weaknesses of the
current code (which assumes a particular array element order but cannot
enforce it), it also introduces new unwanted quirks (e.g. accidentally
omitted table elements can now go undetected).

I can't see any good coming from exchanging one kind of problem for a
new kind of problem, so I am abandoning the idea of using designated
initializers in these GUC tables.
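For the record, the "hole" scenario from upthread can be demonstrated
with a small stand-in for the C arrays (Python here; every enum member
name other than DEVELOPER_OPTIONS is invented, and the length check is
simplified -- the real StaticAssertDecl uses DEVELOPER_OPTIONS + 2
because of an extra trailing slot):

```python
from enum import IntEnum

class ConfigGroup(IntEnum):
    FILE_LOCATIONS = 0
    CONN_AUTH = 1
    ADD_NEW_BEFORE_LAST = 2     # newly added member...
    DEVELOPER_OPTIONS = 3       # ...inserted before the "last" one

def designated_array(entries, length):
    """Mimic a C designated-initializer array: each entry lands at its
    enum index, and unmentioned slots are silently left empty."""
    arr = [None] * length
    for idx, val in entries.items():
        arr[idx] = val
    return arr

config_group_names = designated_array({
    ConfigGroup.FILE_LOCATIONS: "File Locations",
    ConfigGroup.CONN_AUTH: "Connections and Authentication",
    ConfigGroup.DEVELOPER_OPTIONS: "Developer Options",
    # oops: the entry for ADD_NEW_BEFORE_LAST was forgotten
}, length=ConfigGroup.DEVELOPER_OPTIONS + 1)

# The length check (the StaticAssertDecl analogue) still passes...
assert len(config_group_names) == ConfigGroup.DEVELOPER_OPTIONS + 1
# ...but there is now an undetected "hole" at the forgotten index.
assert config_group_names[ConfigGroup.ADD_NEW_BEFORE_LAST] is None
```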

~

The v2 patches are updated as follows:

0001 - Now this patch only fixes a comment that had a wrong enum name.
0002 - Removes unnecessary whitespace (same as v1-0002)

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0002-GUC-tables-remove-unnecessary-whitespace.patch


v2-0001-GUC-tables-fix-enum-name.patch


Re: New strategies for freezing, advancing relfrozenxid early

2022-10-03 Thread Jeff Davis
On Mon, 2022-10-03 at 20:11 -0700, Peter Geoghegan wrote:
> True. Though I think that a strong bias in the direction of advancing
> relfrozenxid by some amount (not necessarily by very many XIDs) still
> makes sense, especially when we're already freezing aggressively.

Take the case where you load a lot of data in one transaction. After
the loading transaction finishes, those new pages will soon be marked
all-visible.

In the future, vacuum runs will have to decide what to do. If a vacuum
decides to do an aggressive scan to freeze all of those pages, it may
be at some unfortunate time and disrupt the workload. But if it skips
them all, then it's just deferring the work until it runs up against
autovacuum_freeze_max_age, which might also be at an unfortunate time.

So how does your patch series handle this case? I assume there's some
mechanism to freeze a moderate number of pages without worrying about
advancing relfrozenxid?

Regards,
Jeff Davis





Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-10-03 Thread Ken Kato

The problem is that you're not closing the tag


Thank you for the reviews and comments.
I closed the tag, so the problem should be fixed now.

Regards,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 342b20ebeb..14cb496fdc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4562,6 +4562,16 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
daemon
   
  
+
+ 
+  
+   last_vacuum_index_scans bigint
+  
+  
+   Number of index scans performed during the last vacuum
+   on this table
+  
+ 
 

   
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dfbe37472f..c5b4405f4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -629,7 +629,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 		 rel->rd_rel->relisshared,
 		 Max(vacrel->new_live_tuples, 0),
 		 vacrel->recently_dead_tuples +
-		 vacrel->missed_dead_tuples);
+		 vacrel->missed_dead_tuples,
+		 vacrel->num_index_scans);
 	pgstat_progress_end_command();
 
 	if (instrument)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..f854576b20 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -675,7 +675,9 @@ CREATE VIEW pg_stat_all_tables AS
 pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
 pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
 pg_stat_get_analyze_count(C.oid) AS analyze_count,
-pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
+pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count,
+pg_stat_get_last_vacuum_index_scans(C.oid) AS last_vacuum_index_scans
+
 FROM pg_class C LEFT JOIN
  pg_index I ON C.oid = I.indrelid
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index a846d9ffb6..ffc9daf944 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -208,8 +208,8 @@ pgstat_drop_relation(Relation rel)
  * Report that the table was just vacuumed.
  */
 void
-pgstat_report_vacuum(Oid tableoid, bool shared,
-	 PgStat_Counter livetuples, PgStat_Counter deadtuples)
+pgstat_report_vacuum(Oid tableoid, bool shared, PgStat_Counter livetuples,
+	 PgStat_Counter deadtuples, PgStat_Counter num_index_scans)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
@@ -232,6 +232,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	tabentry->n_live_tuples = livetuples;
 	tabentry->n_dead_tuples = deadtuples;
+	tabentry->n_index_scans = num_index_scans;
 
 	/*
 	 * It is quite possible that a non-aggressive VACUUM ended up skipping
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index eadd8464ff..573f761f6b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -179,6 +179,20 @@ pg_stat_get_dead_tuples(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+Datum
+pg_stat_get_last_vacuum_index_scans(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	int64		result;
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		result = 0;
+	else
+		result = tabentry->n_index_scans;
+
+	PG_RETURN_INT64(result);
+}
 
 Datum
 pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 68bb032d3e..9cadf4df7a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5334,6 +5334,10 @@
   proname => 'pg_stat_get_autoanalyze_count', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_autoanalyze_count' },
+{ oid => '3813', descr => 'statistics: last vacuum index scans for a table',
+  proname => 'pg_stat_get_last_vacuum_index_scans', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_last_vacuum_index_scans' },
 { oid => '1936', descr => 'statistics: currently active backend IDs',
   proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
   provolatile => 's', proparallel => 'r', prorettype => 'int4',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ad7334a0d2..4fa6894ec8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -366,6 +366,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter n_live_tuples;
 	PgStat_Counter n_dead_tuples;
+	PgStat_Counter n_index_scans;
 	PgStat_Counter changes_since_analyze;
 	

Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Jaime Casanova
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> On 10/3/22 21:25, Jaime Casanova wrote:
> > On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> >> On 9/29/22 08:53, Jaime Casanova wrote:
> >>> ...
> >>>
> >>> Just found one more ocurrance of this one with this index while an
> >>> autovacuum was running:
> >>>
> >>> """
> >>> CREATE INDEX bt_f8_heap_seqno_idx 
> >>> ON public.bt_f8_heap 
> >>> USING brin (seqno float8_minmax_multi_ops);
> >>> """
> >>> Attached is a backtrace.
> >>
> >> Thanks for the report!
> >>
> >> I think I see the issue - brin_minmax_multi_union does not realize the
> >> two summaries could have just one range each, and those can overlap so
> >> that merge_overlapping_ranges combines them into a single one.
> >>
> >> This is harmless, except that the assert in build_distances is overly
> >> strict. Not sure if we should just remove the assert or only compute the
> >> distances when (neranges > 1).
> >>
> >> Do you happen to have the core dump? It'd be useful to look at ranges_a
> >> and ranges_b, to confirm this is indeed what's happening.
> >>
> > 
> > I do have it.
> > 
> > (gdb) p *ranges_a
> > $4 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea1987c8
> > }
> > (gdb) p *ranges_b
> > $5 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea196da8
> > }
> > 
> 
> Thanks. That mostly confirms my theory. I'd bet that this
> 
> (gdb) p ranges_a->values[0]
> (gdb) p ranges_b->values[0]
> 
> will print the same value. 
> 

you're right, they are same value

(gdb) p ranges_a->values[0]
$1 = 4679532294229561068
(gdb) p ranges_b->values[0]
$2 = 4679532294229561068
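For anyone following along, the failing scenario can be reproduced
conceptually with a small Python sketch (standing in for the C code:
two summaries each holding the same single value merge into a single
range, so there are no distances to compute):

```python
def merge_overlapping_ranges(ranges):
    """Merge sorted, possibly-overlapping (min, max) ranges
    (conceptual stand-in for the C routine of the same name)."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def build_distances(ranges):
    """Gaps between consecutive ranges.  The real build_distances()
    asserts neranges > 1, which is the assert tripping here."""
    return [ranges[i + 1][0] - ranges[i][1] for i in range(len(ranges) - 1)]

v = 4679532294229561068            # the value seen in both core dumps
merged = merge_overlapping_ranges([(v, v), (v, v)])
```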

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Small miscellaneous fixes

2022-10-03 Thread Michael Paquier
On Mon, Oct 03, 2022 at 08:05:57AM -0300, Ranier Vilela wrote:
> Em seg., 3 de out. de 2022 às 05:01, Masahiko Sawada 
> escreveu:
>> On Fri, Sep 30, 2022 at 9:08 AM Ranier Vilela  wrote:
>>> 1. Avoid useless reassigning var _logsegno
>> (src/backend/access/transam/xlog.c)
>>> Commit 7d70809 left a little oversight.
>>> XLByteToPrevSeg and XLByteToSeg are macros, and both assign _logsegno.
>>> So, the first assignment is lost and is useless.

Right, I have missed this one.  We do that now in
build_backup_content() when building the contents of the backup
history file.

>>> 4. Fix declaration volatile signal var (src/bin/pgbench/pgbench.c)
>>> Like how to commit 5ac9e86, this is a similar case.
>>
>> The same is true also for alarm_triggered in pg_test_fsync.c?
>>
> I don't think so.
> If I understand the problem correctly, the failure can occur with true
> signals provided by the OS.
> In the case at hand, it seems to me more like an internal form of signal,
> that is, a simulated one.
> So bool works fine.

I am not following your reasoning here.  Why does it matter to change
one but not the other?  Both are used with SIGALRM, it seems.

The other three seem fine, so fixed.
--
Michael




Re: ExecRTCheckPerms() and many prunable partitions

2022-10-03 Thread Amit Langote
On Tue, Oct 4, 2022 at 12:54 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Thu, Jul 28, 2022 at 6:18 AM Tom Lane  wrote:
> >> ... One more thing: maybe we should rethink where to put
> >> extraUpdatedCols.  Between the facts that it's not used for
> >> actual permissions checks, and that it's calculated by the
> >> rewriter not parser, it doesn't seem like it really belongs
> >> in RelPermissionInfo.  Should we keep it in RangeTblEntry?
> >> Should it go somewhere else entirely?  I'm just speculating,
> >> but now is a good time to think about it.
>
> > I've kept extraUpdatedCols in RangeTblEntry in the latest patch, but
> > perhaps it makes sense to put that into Query?
>
> That's got morally the same problem as keeping it in RangeTblEntry:
> those are structures that are built by the parser.  Hacking on them
> later isn't terribly clean.
>
> I wonder if it could make sense to postpone calculation of the
> extraUpdatedCols out of the rewriter and into the planner, with
> the idea that it ends up $someplace in the finished plan tree
> but isn't part of the original parsetree.

Looking at PlannerInfo.update_colnos, something that's needed for
execution but not in Query, maybe we can make preprocess_targetlist()
also populate an PlannerInfo.extraUpdatedCols?

> A different aspect of this is that putting it in Query doesn't
> make a lot of sense unless there is only one version of the
> bitmap per Query.  In simple UPDATEs that would be true, but
> I think that inherited/partitioned UPDATEs would need one per
> result relation, which is likely the reason it got dumped in
> RangeTblEntry to begin with.

Yeah, so if we have PlannerInfo.extraUpdatedCols as the root table's
version of that, grouping_planner() can make copies for all result
relations and put the list in ModifyTable.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: ExecRTCheckPerms() and many prunable partitions

2022-10-03 Thread Tom Lane
Amit Langote  writes:
> On Thu, Jul 28, 2022 at 6:18 AM Tom Lane  wrote:
>> ... One more thing: maybe we should rethink where to put
>> extraUpdatedCols.  Between the facts that it's not used for
>> actual permissions checks, and that it's calculated by the
>> rewriter not parser, it doesn't seem like it really belongs
>> in RelPermissionInfo.  Should we keep it in RangeTblEntry?
>> Should it go somewhere else entirely?  I'm just speculating,
>> but now is a good time to think about it.

> I've kept extraUpdatedCols in RangeTblEntry in the latest patch, but
> perhaps it makes sense to put that into Query?

That's got morally the same problem as keeping it in RangeTblEntry:
those are structures that are built by the parser.  Hacking on them
later isn't terribly clean.

I wonder if it could make sense to postpone calculation of the
extraUpdatedCols out of the rewriter and into the planner, with
the idea that it ends up $someplace in the finished plan tree
but isn't part of the original parsetree.

A different aspect of this is that putting it in Query doesn't
make a lot of sense unless there is only one version of the
bitmap per Query.  In simple UPDATEs that would be true, but
I think that inherited/partitioned UPDATEs would need one per
result relation, which is likely the reason it got dumped in
RangeTblEntry to begin with.

regards, tom lane




Re: ExecRTCheckPerms() and many prunable partitions

2022-10-03 Thread Amit Langote
On Thu, Jul 28, 2022 at 6:18 AM Tom Lane  wrote:
> ... One more thing: maybe we should rethink where to put
> extraUpdatedCols.  Between the facts that it's not used for
> actual permissions checks, and that it's calculated by the
> rewriter not parser, it doesn't seem like it really belongs
> in RelPermissionInfo.  Should we keep it in RangeTblEntry?
> Should it go somewhere else entirely?  I'm just speculating,
> but now is a good time to think about it.

After fixing the issue related to this mentioned by Andres, I started
thinking more about this, especially the "Should it go somewhere else
entirely?" part.

I've kept extraUpdatedCols in RangeTblEntry in the latest patch, but
perhaps it makes sense to put that into Query?  So, the stanza in
RewriteQuery() that sets extraUpdatedCols will be changed as:

@@ -3769,7 +3769,8 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
NULL, 0, NULL);

/* Also populate extraUpdatedCols (for generated columns) */
-   fill_extraUpdatedCols(rt_entry, rt_perminfo, rt_entry_relation);
+   parsetree->extraUpdatedCols =
+   get_extraUpdatedCols(rt_entry, rt_perminfo, rt_entry_relation);
}
else if (event == CMD_MERGE)
{

Then, like withCheckOptionLists et al, have grouping_planner()
populate an extraUpdatedColsBitmaps in ModifyTable.
ExecInitModifyTable() will assign them directly into ResultRelInfos.
That way, anyplace in the executor that needs to look at
extraUpdatedCols of a given result relation can get that from its
ResultRelInfo, rather than from the RangeTblEntry as now.

Thoughts?


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Miscellaneous tab completion issue fixes

2022-10-03 Thread Michael Paquier
On Mon, Oct 03, 2022 at 06:29:32PM +0100, Dagfinn Ilmari Mannsåker wrote:
> vignesh C  writes:
>> +else if (TailMatchesCS("\\dRp*"))
>> +COMPLETE_WITH_QUERY(Query_for_list_of_publications[0].query);
>> +else if (TailMatchesCS("\\dRs*"))
>> +COMPLETE_WITH_QUERY(Query_for_list_of_subscriptions[0].query);
> 
> These are version-specific queries, so should be passed in their
> entirety to COMPLETE_WITH_VERSIONED_QUERY() so that psql can pick the
> right version, and avoid sending the query at all if the server is too
> old.

+1.

>> +/* add these to Query_for_list_of_roles in OWNER TO contexts */
>> +#define Keywords_for_list_of_owner_to_roles \
>> +"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
> 
> I think this would read better without the TO, both in the comment and
> the constant name, similar to the below only having GRANT without TO:

Keywords_for_list_of_grant_roles is used in six code paths, so it
seems to me that there is little gain in having a separate #define
here.  Let's just specify the list of allowed roles (RoleSpec) where
OWNER TO is parsed.
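For background on the versioned-query point above, the machinery behind
COMPLETE_WITH_VERSIONED_QUERY conceptually picks the first query variant
the connected server is new enough for, and sends nothing at all if the
server predates every variant (illustrative sketch; names and the
abbreviated SQL are not psql's actual code):

```python
def pick_versioned_query(versioned_queries, server_version_num):
    """Return the first query variant whose minimum server version the
    connected server satisfies; None means 'send no query at all'."""
    for min_server_version, query in versioned_queries:
        if server_version_num >= min_server_version:
            return query
    return None

# pg_publication appeared in PostgreSQL 10 (version number 100000);
# the SQL text here is abbreviated.
publication_queries = [
    (100000, "SELECT pubname FROM pg_catalog.pg_publication WHERE ..."),
]
```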
--
Michael




Re: [RFC] building postgres with meson - v13

2022-10-03 Thread Andres Freund
Hi,

On 2022-09-30 15:35:26 -0700, Andres Freund wrote:
> I was thinking of starting at least the following threads / CF entries once a
> few of the remaining things are resolved:
> 
> - PGXS compatibility, plus related autoconf simplification patches
> - pkg-config files for building postgres extensions
> - relative rpath support
> 
> I am a bit on the fence about whether it's worth doing so for:
> 
> - installcheck equivalent
> - precompiled header support (would like it soon, because it reduces
>   compile-test times substantially)
> 
> and, for no really tangible reason, considered
> - resource files generation
> - docs
> - docs dependency
> 
> to be part of this thread / CF entry.
> 
> Now that I think about it more, I am inclined to also push the docs changes to
> a new thread, just for wider visibility.
> 
> I think it'd be ok to commit the docs dependency fix soon, without a separate
> thread, as it really fixes a "build bug".

I've not yet posted these different threads, but I've split up the meson tree
into subtrees corresponding to pretty much the above.


The meson tree now mainly merges those subtrees together. It still directly
contains the xml-tools dependency wrapper (to be merged soon) and the CI
changes (either later or never).

I've attached a revised version of the xml-tools dependency wrapper (0001):
Cleanups, minor error handling improvements, and bit of comment polishing. I'd
welcome review. But as it fixes a build-dependency bug / FIXME, I'm planning
to push it relatively soon otherwise.

0002 fixes libpq's .pc file (static dependencies didn't show up anymore) and
AIX compilation. AIX doesn't yet support link_whole (support was merged into
meson yesterday though). On the way it also improves comments and a bit of
generic infrastructure.  The price for now is that the static libpq is built
separately from the shared one, not reusing any objects. I felt that the
complexity of reusing the objects isn't worth it for now.

Peter, it'd be great if you could have a look at 0002.

0003 mirrors the setup of libpq to the various ecpg libraries. This is a
prerequisite to adding resource files.

0004 adds the resource files


I think after that we could close the CF entry (and create a bunch of followup
entries, as discussed above). Although it somehow seems frivolous to start a
separate thread for "installcheck equivalent" :)

Greetings,

Andres Freund
From b21850b49a9dd9f4a60eb9c243ba64be61e7e9c0 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 3 Oct 2022 18:26:24 -0700
Subject: [PATCH v18 01/22] meson: docs: Add xml{lint,proc} wrapper to collect
 dependencies

meson/ninja do not support specifying dependencies via globs (as those make it
significantly more expensive to check the current build state). Instead
targets should emit dependency information when running that then can be
cheaply re-checked during future builds.

To handle xmllint and xsltproc invocations in the docs, add and use a wrapper
that uses --load-trace to collect dependency information.

Author: Nazir Bilal Yavuz 
Author: Andres Freund 
Discussion: https://postgr.es/m/c5736f70-bb6d-8d25-e35c-e3d886e4e...@enterprisedb.com
---
 doc/src/sgml/meson.build  | 41 ---
 doc/src/sgml/xmltools_dep_wrapper | 54 +++
 2 files changed, 84 insertions(+), 11 deletions(-)
 create mode 100644 doc/src/sgml/xmltools_dep_wrapper

diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index ba2a261e7a4..65fd6131344 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -2,7 +2,7 @@ docs = []
 alldocs = []
 doc_generated = []
 
-xmllint = find_program(get_option('XMLLINT'), native: true, required: false)
+xmllint_bin = find_program(get_option('XMLLINT'), native: true, required: false)
 
 
 version_sgml = configure_file(
@@ -60,14 +60,23 @@ doc_generated += custom_target('keywords-table.sgml',
 )
 
 # For everything else we need at least xmllint
-if not xmllint.found()
+if not xmllint_bin.found()
   subdir_done()
 endif
 
 pandoc = find_program('pandoc', native: true, required: false)
-xsltproc = find_program(get_option('XSLTPROC'), native: true, required: false)
+xsltproc_bin = find_program(get_option('XSLTPROC'), native: true, required: false)
 fop = find_program('fop', native: true, required: false)
 
+xmltools_wrapper = [
+  python, files('xmltools_dep_wrapper'),
+  '--targetname', '@OUTPUT@', '--depfile', '@DEPFILE@'
+]
+
+xmllint = xmltools_wrapper + [
+  '--tool', xmllint_bin, '--',
+]
+
 # Run validation only once, common to all subsequent targets.  While
 # we're at it, also resolve all entities (that is, copy all included
 # files into one big file).  This helps tools that don't understand
@@ -75,6 +84,7 @@ fop = find_program('fop', native: true, required: false)
 postgres_full_xml = custom_target('postgres-full.xml',
   input: 'postgres.sgml',
   output: 'postgres-full.xml',
+  depfile: 'postgres-full.xml.d',
   command: [xmllint, '--noent', 

Re: New strategies for freezing, advancing relfrozenxid early

2022-10-03 Thread Peter Geoghegan
On Mon, Oct 3, 2022 at 5:41 PM Jeff Davis  wrote:
> I like this general approach. The existing GUCs have evolved in a
> confusing way.

Thanks for taking a look!

> > For the most part the
> > skipping/freezing strategy stuff has a good sense of what matters
> > already, and shouldn't need to be guided very often.
>
> I'd like to know more clearly where manual VACUUM fits in here. Will it
> use a more aggressive strategy than an autovacuum, and how so?

There is no change whatsoever in the relationship between manually
issued VACUUMs and autovacuums. We interpret autovacuum_freeze_max_age
in almost the same way as HEAD. The only detail that's changed is that
we almost always interpret "freeze_table_age" as "just use
autovacuum_freeze_max_age" in the patch, rather than as
"vacuum_freeze_table_age, though never more than 95% of
autovacuum_freeze_max_age", as on HEAD.

Maybe this would be less confusing if I went just a bit further, and
totally got rid of the concept that vacuumlazy.c calls aggressive
VACUUM on HEAD -- then there really would be exactly one kind of
VACUUM, just like before the visibility map was first introduced back
in 2009. This would relegate antiwraparound-ness to just another
condition that autovacuum.c used to launch VACUUMs.

Giving VACUUM the freedom to choose where and how to freeze and
advance relfrozenxid based on both costs and benefits is key here.
Anything that needlessly imposes a rigid rule on vacuumlazy.c
undermines that -- it ties VACUUM's hands. The user can still
influence many of the details using high-level GUCs that work at the
table level, rather than GUCs that can only work at the level of
individual VACUUM operations (that leaves too much to chance). Users
shouldn't want or need to micromanage VACUUM.

> > The patch relegates vacuum_freeze_table_age to a compatibility
> > option,
> > making its default -1, meaning "just use autovacuum_freeze_max_age".
>
> The purpose of vacuum_freeze_table_age seems to be that, if you
> regularly issue VACUUM commands, it will prevent a surprise
> antiwraparound vacuum. Is that still the case?

The user really shouldn't need to do anything with
vacuum_freeze_table_age at all now. It's mostly just a way for the
user to optionally insist on advancing relfrozenxid via an
antiwraparound/aggressive VACUUM -- like in a manual VACUUM FREEZE.
Even VACUUM FREEZE shouldn't be necessary very often.

> Maybe it would make more sense to have vacuum_freeze_table_age be a
> fraction of autovacuum_freeze_max_age, and be treated as a maximum so
> that other intelligence might kick in and freeze sooner?

That's kind of how the newly improved skipping strategy stuff works.
It gives some weight to table age as one additional factor (based on
how close the table's age is to autovacuum_freeze_max_age or its Multi
equivalent).

If table age is (say) 60% of autovacuum_freeze_max_age, then VACUUM
should be "60% as aggressive" as a conventional
aggressive/antiwraparound autovacuum would be. What that actually
means is that the VACUUM will tend to prefer advancing relfrozenxid
the closer we get to the cutoff, gradually giving less and less
consideration to putting off work as we get closer and closer. When we
get to 100% then we'll definitely advance relfrozenxid (via a
conventional aggressive/antiwraparound VACUUM).
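The scaling idea above can be sketched numerically. This is an illustration only, not the patch's actual formula (the thread explicitly says the precise details are unsettled): assume the extra-scanned-pages threshold starts at 5% of rel_pages and ramps linearly once table age passes 50% of autovacuum_freeze_max_age, reaching "always advance" at 100%:

```python
def eager_scan_threshold(table_age, freeze_max_age, base=0.05):
    """Fraction of rel_pages we are willing to scan *extra* in order to
    advance relfrozenxid early.  Assumption: a linear ramp from the 5%
    base at half of freeze_max_age up to 100% at freeze_max_age."""
    frac = table_age / freeze_max_age
    if frac < 0.5:
        return base
    if frac >= 1.0:
        return 1.0  # mandatory: conventional aggressive/antiwraparound VACUUM
    return base + (1.0 - base) * (frac - 0.5) / 0.5

def should_advance_relfrozenxid(extra_scanned_pages, rel_pages,
                                table_age, freeze_max_age):
    """Advance relfrozenxid early iff the extra all-visible pages to scan
    are cheap enough, with 'cheap enough' loosening as wraparound nears."""
    limit = rel_pages * eager_scan_threshold(table_age, freeze_max_age)
    return extra_scanned_pages <= limit
```

With these numbers, a table at 60% of autovacuum_freeze_max_age would tolerate scanning up to 24% of rel_pages extra, versus 5% for a young table.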

The precise details are unsettled, but I'm pretty sure that the
general idea is sound. Basically we're replacing
vacuum_freeze_table_age with a dynamic, flexible version of the same
basic idea. Now we don't just care about the need to advance
relfrozenxid (benefits), though; we also care about costs.

> >  This makes things less confusing for users and hackers.
>
> It may take an adjustment period ;-)

Perhaps this is more of an aspiration at this point.  :-)

> Yes, it's clearing things up, but it's still a complex problem.
> There's:
>
>  a. xid age vs the actual amount of deferred work to be done
>  b. advancing relfrozenxid vs skipping all-visible pages
>  c. difficulty in controlling reasonable behavior (e.g.
> vacuum_freeze_min_age often being ignored, freezing
> individual tuples rather than pages)
>
> Your first email described the motivation in terms of (a), but the
> patches seem more focused on (b) and (c).

I think that all 3 areas are deeply and hopelessly intertwined.

For example, vacuum_freeze_min_age is effectively ignored in many
important cases right now precisely because we senselessly skip
all-visible pages with unfrozen tuples, no matter what -- the problem
actually comes from the visibility map, which vacuum_freeze_min_age
predates by quite a few years. So how can you possibly address the
vacuum_freeze_min_age issues without also significantly revising VM
skipping behavior? They're practically the same problem!
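A toy model makes the interaction concrete: once a page is marked all-visible, unconditional skipping means its unfrozen tuples are never revisited, so their ages sail past vacuum_freeze_min_age no matter how that GUC is set. The numbers here (10 XIDs consumed per vacuum cycle, a page becoming all-visible after its first vacuum) are arbitrary illustration, not VACUUM's real accounting:

```python
def simulate_vacuums(n_vacuums, freeze_min_age, skip_all_visible):
    """Toy model of one unfrozen tuple on a page that becomes all-visible
    after its first vacuum.  Returns (frozen?, final tuple age)."""
    tuple_age = 0
    all_visible = False
    frozen = False
    for _ in range(n_vacuums):
        tuple_age += 10  # XIDs consumed between vacuums (arbitrary)
        if all_visible and skip_all_visible:
            continue  # page is never examined again
        if not frozen and tuple_age >= freeze_min_age:
            frozen = True  # vacuum_freeze_min_age finally applies
        all_visible = True
    return frozen, tuple_age
```

With unconditional skipping the tuple is never frozen, however old it gets; let VACUUM revisit all-visible pages and freeze_min_age behaves as documented.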

And once you've fixed vacuum_freeze_min_age (and skipping), how can
you then pass up the opportunity to advance relfrozenxid early when
doing so will require only a little extra work? I'm going to regress

Re: shadow variables - pg15 edition

2022-10-03 Thread Justin Pryzby
On Tue, Oct 04, 2022 at 02:27:09PM +1300, David Rowley wrote:
> On Tue, 30 Aug 2022 at 17:44, Justin Pryzby  wrote:
> > Would you check if any of these changes are good enough ?
> 
> I looked through v5.txt and modified it so that the fixes for the shadow
> warnings are more aligned to the spreadsheet I created.

Thanks

> diff --git a/src/backend/utils/adt/datetime.c 
> b/src/backend/utils/adt/datetime.c
> index 350039cc86..7848deeea9 100644
> --- a/src/backend/utils/adt/datetime.c
> +++ b/src/backend/utils/adt/datetime.c
> @@ -1019,17 +1019,17 @@ DecodeDateTime(char **field, int *ftype, int nf,
>   if (ptype == DTK_JULIAN)
>   {
>   char   *cp;
> - int val;
> + int jday;
>  
>   if (tzp == NULL)
>   return DTERR_BAD_FORMAT;
>  
>   errno = 0;
> - val = strtoint(field[i], , 10);
> + jday = strtoint(field[i], , 10);
>   if (errno == ERANGE || val < 0)
>   return DTERR_FIELD_OVERFLOW;

Here, you forgot to change "val < 0".

I tried to see how to make that fail (differently), but I can't yet see how
to pass a negative Julian date..

-- 
Justin




RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder,

> As far as I can see this patch is mostly useful for detecting the failures
> on the initial remote command. This is especially common when the remote
> server does a failover/switchover and postgres_fdw uses a cached connection
> to access the remote server.

Sounds reasonable. Do you mean that we can add an additional GUC like 
"postgres_fdw.initial_check",
wait on WL_SOCKET_CLOSED if the connection is found in the hash table, and 
reconnect if it might be closed, right?
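A conceptual sketch of that check, peeking whether a cached connection's socket has been closed by the peer before reusing it, might look like this in miniature. This is a Python stand-in for the WaitEventSet/WL_SOCKET_CLOSED mechanism, not the actual postgres_fdw code, and it assumes the cached connection is idle (no data pending):

```python
import socket

def connection_looks_closed(sock):
    """Peek for EOF without consuming data.  If the peer has closed the
    connection, a non-blocking MSG_PEEK recv returns b''; if the peer is
    alive but idle, the recv raises BlockingIOError instead."""
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return False  # nothing pending; peer still open
    except OSError:
        return True   # connection reset or similar hard failure
    return data == b''  # orderly EOF from the peer
```

If the check reports a closed connection, the caller would discard the cached entry and establish a fresh one before issuing the initial remote command.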

> I think any extension that deals with multiple Postgres nodes can benefit
> from such logic. In fact, the reason I realized this patch is that on the
> Citus extension, we are trying to solve a similar problem [1], [2].

> Thinking even more, I think any extension that uses libpq and WaitEventSets
> can benefit from such a function.

OK, I agree it may be useful, but where should this function live?
It cannot be used from applications, so it should not be in the interfaces/libpq dir. 
So backend/libpq/pqcomm.c?

> I think it also depends on where you decide to put
> pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
> could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?
> But if you decide to put it into the Postgres side, the API
> for pgfdw_connection_check_internal() -- or equivalent function -- could be
> discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
> else use what is passed to the function? Not sure, maybe you can come up
> with a better API.

Thank you for describing it in more detail. I can now picture what you meant.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: shadow variables - pg15 edition

2022-10-03 Thread David Rowley
On Tue, 30 Aug 2022 at 17:44, Justin Pryzby  wrote:
> Would you check if any of these changes are good enough ?

I looked through v5.txt and modified it so that the fixes for the shadow
warnings are more aligned to the spreadsheet I created.

I also fixed some additional warnings, which leaves just 5 warnings. Namely:

../../../src/include/utils/elog.h:317:29: warning: declaration of
‘_save_exception_stack’ shadows a previous local
../../../src/include/utils/elog.h:318:39: warning: declaration of
‘_save_context_stack’ shadows a previous local
../../../src/include/utils/elog.h:319:28: warning: declaration of
‘_local_sigjmp_buf’ shadows a previous local
../../../src/include/utils/elog.h:320:22: warning: declaration of
‘_do_rethrow’ shadows a previous local
pgbench.c:7509:40: warning: declaration of ‘now’ shadows a previous local

The first 4 of those are due to a nested PG_TRY().  For the final one, I
just ran out of inspiration on what to rename the variable to.

If there are no objections then I'll push this in the next day or 2.

David
diff --git a/src/backend/access/brin/brin_minmax_multi.c 
b/src/backend/access/brin/brin_minmax_multi.c
index ed16f93acc..9a0bcf6698 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -3059,16 +3059,16 @@ brin_minmax_multi_summary_out(PG_FUNCTION_ARGS)
char   *a,
   *b;
text   *c;
-   StringInfoData str;
+   StringInfoData buf;
 
-   initStringInfo();
+   initStringInfo();
 
a = OutputFunctionCall(, 
ranges_deserialized->values[idx++]);
b = OutputFunctionCall(, 
ranges_deserialized->values[idx++]);
 
-   appendStringInfo(, "%s ... %s", a, b);
+   appendStringInfo(, "%s ... %s", a, b);
 
-   c = cstring_to_text_with_len(str.data, str.len);
+   c = cstring_to_text_with_len(buf.data, buf.len);
 
astate_values = accumArrayResult(astate_values,

 PointerGetDatum(c),
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index fc85ba99ac..553500cec0 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -397,7 +397,7 @@ restartScanEntry:
{
BlockNumber rootPostingTree = GinGetPostingTree(itup);
GinBtreeStack *stack;
-   Pagepage;
+   Pageentrypage;
ItemPointerData minItem;
 
/*
@@ -428,13 +428,13 @@ restartScanEntry:
 */
IncrBufferRefCount(entry->buffer);
 
-   page = BufferGetPage(entry->buffer);
+   entrypage = BufferGetPage(entry->buffer);
 
/*
 * Load the first page into memory.
 */
ItemPointerSetMin();
-   entry->list = GinDataLeafPageGetItems(page, 
>nlist, minItem);
+   entry->list = GinDataLeafPageGetItems(entrypage, 
>nlist, minItem);
 
entry->predictNumberResult = stack->predictNumber * 
entry->nlist;
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 75b214824d..bd4d85041d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6283,14 +6283,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 */
if (ISUPDATE_from_mxstatus(members[i].status))
{
-   TransactionId xid = members[i].xid;
+   TransactionId txid = members[i].xid;
 
-   Assert(TransactionIdIsValid(xid));
-   if (TransactionIdPrecedes(xid, relfrozenxid))
+   Assert(TransactionIdIsValid(txid));
+   if (TransactionIdPrecedes(txid, relfrozenxid))
ereport(ERROR,

(errcode(ERRCODE_DATA_CORRUPTED),
 errmsg_internal("found update 
xid %u from before relfrozenxid %u",
-   
 xid, relfrozenxid)));
+   
 txid, relfrozenxid)));
 
/*
 * It's an update; should we keep it?  If the 
transaction is known
@@ -6304,13 +6304,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 * because of race conditions explained in detail in
 * heapam_visibility.c.
 */
-  

Re: New strategies for freezing, advancing relfrozenxid early

2022-10-03 Thread Jeff Davis
On Thu, 2022-09-08 at 13:23 -0700, Peter Geoghegan wrote:
> The new patch unifies the concept of antiwraparound
> VACUUM with the concept of aggressive VACUUM. Now there is only
> antiwraparound and regular VACUUM (uh, barring VACUUM FULL). And now
> antiwraparound VACUUMs are not limited to antiwraparound autovacuums
> -- a manual VACUUM can also be antiwraparound (that's just the new
> name for "aggressive").

I like this general approach. The existing GUCs have evolved in a
confusing way.

> For the most part the
> skipping/freezing strategy stuff has a good sense of what matters
> already, and shouldn't need to be guided very often.

I'd like to know more clearly where manual VACUUM fits in here. Will it
use a more aggressive strategy than an autovacuum, and how so?

> The patch relegates vacuum_freeze_table_age to a compatibility
> option,
> making its default -1, meaning "just use autovacuum_freeze_max_age".

The purpose of vacuum_freeze_table_age seems to be that, if you
regularly issue VACUUM commands, it will prevent a surprise
antiwraparound vacuum. Is that still the case?

Maybe it would make more sense to have vacuum_freeze_table_age be a
fraction of autovacuum_freeze_max_age, and be treated as a maximum so
that other intelligence might kick in and freeze sooner?

>  This makes things less confusing for users and hackers.

It may take an adjustment period ;-)

> The details of the skipping-strategy-choice algorithm are still
> unsettled in v3 (no real change there). ISTM that the important thing
> is still the high level concepts. Jeff was slightly puzzled by the
> emphasis placed on the cost model/strategy stuff, at least at one
> point. Hopefully my intent will be made clearer by the ideas featured
> in the new patch.

Yes, it's clearing things up, but it's still a complex problem.
There's:

 a. xid age vs the actual amount of deferred work to be done
 b. advancing relfrozenxid vs skipping all-visible pages
 c. difficulty in controlling reasonable behavior (e.g.
vacuum_freeze_min_age often being ignored, freezing
individual tuples rather than pages)

Your first email described the motivation in terms of (a), but the
patches seem more focused on (b) and (c).

>  The skipping strategy decision making process isn't
> particularly complicated, but it now looks more like an optimization
> problem of some kind or other.

There's another important point here, which is that it gives an
opportunity to decide to freeze some all-visible pages in a given round
just to reduce the deferred work, without worrying about advancing
relfrozenxid.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Reducing the chunk header sizes on all memory context types

2022-10-03 Thread Ranier Vilela
On Thu, 29 Sept 2022 at 18:30, David Rowley wrote:
> Does anyone have any opinions on this?
Hi,

Revisiting my work on reducing memory consumption, I found this patch left
out. I'm not sure I can help, but basically I was able to write and read the
block size in the chunk. Could the context pointer be written and read in the
same way? Sure, this leads to some performance loss, but would it make it
possible to get the context pointer from the chunk? In other words, would it
be possible to save the context pointer and compare it later in
MemoryContextContains?

regards,
Ranier Vilela


001-reduces-memory-consumption.patch
Description: Binary data


installcheck-world concurrency issues

2022-10-03 Thread Andres Freund
Hi,

while working on installcheck support with meson, I noticed that currently
running installcheck-world fails regularly with meson and occasionally with make.

A way to quite reliably reproduce this with make is

make -s -j48 -C contrib/ USE_MODULE_DB=1 installcheck-adminpack-recurse 
installcheck-passwordcheck-recurse

that will fail with diffs like:

diff -du10 
/home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out 
/home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/res>
--- 
/home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out
2022-10-03 15:56:57.900326662 -0700
+++ 
/home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/results/passwordcheck.out
2022-10-03 15:56:59.930329973 -0700
@@ -1,19 +1,22 @@
 LOAD 'passwordcheck';
 CREATE USER regress_user1;
 -- ok
 ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR:  tuple concurrently deleted
 -- error: too short
 ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR:  password is too short
+ERROR:  role "regress_user1" does not exist
 -- error: contains user name
 ALTER USER regress_user1 PASSWORD 'xyzregress_user1';
-ERROR:  password must not contain user name
+ERROR:  role "regress_user1" does not exist
 -- error: contains only letters

 LOAD 'passwordcheck';
 CREATE USER regress_user1;
 -- ok
 ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR:  tuple concurrently deleted
 -- error: too short
 ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR:  password is too short
+ERROR:  role "regress_user1" does not exist
 -- error: contains user name


That's not surprising, given the common name of "regress_user1".

The attached patch fixes a number of instances of this issue. With it I got
through ~5 iterations of installcheck-world on ac, and >30 iterations with
meson.

There are a few further roles that seem to pose some danger going forward:

./contrib/file_fdw/sql/file_fdw.sql:CREATE ROLE regress_no_priv_user LOGIN; 
-- has priv but no user mapping
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_view_owner 
SUPERUSER;
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_nosuper 
NOSUPERUSER;
./contrib/passwordcheck/sql/passwordcheck.sql:CREATE USER 
regress_passwordcheck_user1;
./contrib/citext/sql/create_index_acl.sql:CREATE ROLE regress_minimal;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_r1;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_s1;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE ROLE 
regress_role_joe;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE USER 
regress_test_user;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol0 
SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrolx 
SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol2 
SUPERUSER;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol1 
SUPERUSER LOGIN IN ROLE regress_testrol2;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE 
regress_role_haspriv;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE 
regress_role_nopriv;
./src/test/modules/unsafe_tests/sql/guc_privs.sql:CREATE ROLE regress_admin 
SUPERUSER;
./src/test/modules/test_ddl_deparse/sql/alter_function.sql:CREATE ROLE 
regress_alter_function_role;


BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow?  Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.


A second issue I noticed is that advisory_lock.sql often fails, because the
pg_locks queries don't restrict to the current database. Patch attached.

I haven't seen that with autoconf installcheck-world, presumably because of
this:

# There are too many interdependencies between the subdirectories, so
# don't attempt parallel make here.
.NOTPARALLEL:


With those two patches applied, I got through 10 iterations of running all
regress / isolation tests concurrently with meson without failures.

I attached the meson patch as well, but just because I used it to get to
these patches.

Greetings,

Andres Freund
>From fee4e186f18134df929618bbd00cc1519149f144 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 25 Sep 2022 16:49:51 -0700
Subject: [PATCH v2 1/3] tests: Rename conflicting role names

These cause problems when running installcheck concurrently.
---
 src/pl/plperl/expected/plperl_setup.out   | 30 +--
 src/pl/plperl/sql/plperl_setup.sql| 30 +--
 contrib/adminpack/expected/adminpack.out  | 20 ++---
 contrib/adminpack/sql/adminpack.sql   | 20 ++---
 .../passwordcheck/expected/passwordcheck.out  | 16 +-
 contrib/passwordcheck/sql/passwordcheck.sql   | 16 +-
 6 files changed, 66 

Re: Error-safe user functions

2022-10-03 Thread Nikita Glukhov

Sorry, I hadn't tried building using meson.
One line was fixed in the new test module's meson.build.

--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


errorsafe_functions_v01.tgz
Description: application/compressed-tar


installing static libraries (was building postgres with meson)

2022-10-03 Thread Andres Freund
Hi,

Splitting out to a new thread. Started at
https://www.postgresql.org/message-id/20220915051754.ccx35szetbvnjv7g%40awork3.anarazel.de

On 2022-09-14 22:17:54 -0700, Andres Freund wrote:
> On 2022-09-15 01:10:16 -0400, Tom Lane wrote:
> > I realize that there are people for whom other considerations outweigh
> > that, but I don't think that we should install static libraries by
> > default.  Long ago it was pretty common for configure scripts to
> > offer --enable-shared and --enable-static options ... should we
> > resurrect that?
>
> It'd be easy enough. I don't really have an opinion on whether it's worth
> having the options. I think most packaging systems have ways of not including
> files even if $software installs them.

A few questions, in case we want to do this:

1) should this affect libraries we build only as static libraries, like
   pgport, pgcommon, pgfeutils?

   I assume there's some extensions that build binaries with pgxs, which then
   presumably need pgport, pgcommon.

2) Would we want to add the option to both autoconf and meson, or just meson?

3) For meson, I'd be inclined to leave the static libraries in as build
   targets, but just not build and install them by default.

4) Why are we installing the static libraries into libdir? Given that they're
   not versioned at all, it somehow seems pkglibdir would be more appropriate?

   Debian apparently patches postgres in an attempt to do so:
   
https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/libpgport-pkglibdir
   but I think it's not quite complete, because the libpq.pc doesn't know
   about the new location for the static libraries

5) currently we build the constituents of libpq.a with -fPIC, but then propose
   to link with -lpgport -lpgcommon, rather than linking with the *_shlib
   versions.  That seems a bit bogus to me - either we care about providing an
   efficient non-PIC library (in which case libpq.a elements would need to be
   build separately), or we don't (in which case we should just link to
   *_shlib).

Greetings,

Andres Freund




Re: Pluggable toaster

2022-10-03 Thread Nikita Malakhov
Hi hackers!
Cfbot failed in meson build with previous patchsets, so I've rebased them
onto the latest master and added necessary meson build info.

Patchset consists of:
v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with
reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API
- reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov  wrote:

> Hi,
> Meson build for the patchset failed, meson build files attached and
> README/Doc package
> reworked with more detailed explanation of virtual function table along
> with other corrections.
>
> On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov  wrote:
>
>> Hi hackers!
>> Last patchset has an invalid patch file - v16-0003-toaster-docs.patch.
>> Here's corrected patchset,
>> sorry for the noise.
>>
>> On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov 
>> wrote:
>>
>>> Hi hackers!
>>>
>>> Cfbot is still not happy with the patchset, so I'm attaching a rebased
>>> one, rebased onto the current
>>> master (from today). The third patch contains documentation package, and
>>> the second one contains large
>>> README.toastapi file providing additional in-depth docs for developers.
>>>
>>> Comments would be greatly appreciated.
>>>
>>> Again, after checking patch sources I have a strong opinion that it
>>> needs some refactoring -
>>> move all files related to TOAST implementation into new folder
>>> /backend/access/toast where
>>> Generic (default) Toaster resides.
>>>
>>> Patchset consists of:
>>> v16-0001-toaster-interface.patch - Pluggable TOAST API interface along
>>> with reference TOAST mechanics;
>>> v16-0002-toaster-default.patch - Default TOAST re-implemented using
>>> Toaster API;
>>> v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>>>
>>> Actual GitHub branch resides at
>>> https://github.com/postgrespro/postgres/tree/toasterapi_clean
>>>
>>> On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov 
>>> wrote:
>>>
 Hi hackers!

 Cfbot is not happy with previous patchset, so I'm attaching new one,
 rebased onto current master
 (15b4). Also providing patch with documentation package, and the second
 one contains large
 README.toastapi file providing additional in-depth docs for developers.

 Comments would be greatly appreciated.

 Also, after checking patch sources I have a strong opinion that it
 needs some refactoring -
 move all files related to TOAST implementation into new folder
 /backend/access/toast where
 Generic (default) Toaster resides.

 Patchset consists of:
 v15-0001-toaster-interface.patch - Pluggable TOAST API interface along
 with reference TOAST mechanics;
 v15-0002-toaster-default.patch - Default TOAST re-implemented using
 Toaster API;
 v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

 On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion 
 wrote:

> On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov 
> wrote:
> > It would be more clear for complex data types like JSONB, where
> developers could
> > need some additional functionality to work with internal
> representation of data type,
> > and its full potential is revealed in our JSONB toaster extension.
> The JSONB toaster
> > is still in development but we plan to make it available soon.
>
> Okay. It'll be good to have that, because as it is now it's hard to
> see the whole picture.
>
> > On installing dummy_toaster contrib: I've just checked it by making
> a patch from commit
> > and applying onto my clone of master and 2 patches provided in
> previous email without
> > any errors and sll checks passed - applying with git am, configure
> with debug, cassert,
> > depend and enable-tap-tests flags and run checks.
> > Please advice what would cause such a behavior?
>
> I don't think the default pg_upgrade tests will upgrade contrib
> objects (there are instructions in src/bin/pg_upgrade/TESTING that
> cover manual dumps, if you prefer that method). My manual steps were
> roughly
>
> =# CREATE EXTENSION dummy_toaster;
> =# CREATE TABLE test (t TEXT
> STORAGE external
> TOASTER dummy_toaster_handler);
> =# \q
> $ initdb -D newdb
> $ pg_ctl -D olddb stop
> $ pg_upgrade -b /bin -B /bin -d
> ./olddb -D ./newdb
>
> (where /bin is on the PATH, so we're using the right
> binaries).
>
> Thanks,
> --Jacob
>


 --
 Regards,
 Nikita Malakhov
 Postgres Professional
 https://postgrespro.ru/

Re: [PATCH] Fix build with LLVM 15 or above

2022-10-03 Thread Zhihong Yu
On Mon, Oct 3, 2022 at 2:41 PM Andres Freund  wrote:

> Hi,
>
> On 2022-10-03 12:16:12 -0700, Andres Freund wrote:
> > I haven't yet run through the whole regression test with an assert
> enabled
> > llvm because an assert-enabled llvm is *SLOW*, but it got through the
> first
> > few parallel groups ok.  Using an optimized llvm 15, all tests pass with
> > PGOPTIONS=-cjit_above_cost=0.
>
> That did pass. But to be able to use clang >= 15 one more piece is
> needed. Updated patch attached.
>
> Greetings,
>
> Andres Freund
>
Hi,

+* When targetting an llvm version with opaque pointers enabled by

I think `targetting` should be spelled as targeting

Cheers


Re: [PATCH] Fix build with LLVM 15 or above

2022-10-03 Thread Andres Freund
Hi,

On 2022-10-03 12:16:12 -0700, Andres Freund wrote:
> I haven't yet run through the whole regression test with an assert enabled
> llvm because an assert-enabled llvm is *SLOW*, but it got through the first
> few parallel groups ok.  Using an optimized llvm 15, all tests pass with
> PGOPTIONS=-cjit_above_cost=0.

That did pass. But to be able to use clang >= 15 one more piece is
needed. Updated patch attached.

Greetings,

Andres Freund
>From 0fed706031df781bb5889a859c47379e3c37f4f4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 22 Sep 2022 23:38:56 +1200
Subject: [PATCH v2] WIP: jit: LLVM 15: Minimal changes.

Per https://llvm.org/docs/OpaquePointers.html, support for non-opaque
pointers still exists and we can request that on our context.  We have
until LLVM 16 to move to opaque pointers.
---
 src/backend/jit/llvm/llvmjit.c   | 18 +++
 src/backend/jit/llvm/meson.build |  3 ++
 configure| 89 
 configure.ac |  3 ++
 4 files changed, 113 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index fd3eecf27d3..bccfcfa9698 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -798,6 +798,16 @@ llvm_session_initialize(void)
 	LLVMInitializeNativeAsmPrinter();
 	LLVMInitializeNativeAsmParser();
 
+	/*
+	 * When targetting an llvm version with opaque pointers enabled by
+	 * default, turn them off for the context we build our code in. Don't need
+	 * to do so for other contexts (e.g. llvm_ts_context) - once the IR is
+	 * generated, it carries the necessary information.
+	 */
+#if LLVM_VERSION_MAJOR > 14
+	LLVMContextSetOpaquePointers(LLVMGetGlobalContext(), false);
+#endif
+
 	/*
 	 * Synchronize types early, as that also includes inferring the target
 	 * triple.
@@ -1112,7 +1122,11 @@ llvm_resolve_symbols(LLVMOrcDefinitionGeneratorRef GeneratorObj, void *Ctx,
 	 LLVMOrcJITDylibRef JD, LLVMOrcJITDylibLookupFlags JDLookupFlags,
 	 LLVMOrcCLookupSet LookupSet, size_t LookupSetSize)
 {
+#if LLVM_VERSION_MAJOR > 14
+	LLVMOrcCSymbolMapPairs symbols = palloc0(sizeof(LLVMOrcCSymbolMapPair) * LookupSetSize);
+#else
 	LLVMOrcCSymbolMapPairs symbols = palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize);
+#endif
 	LLVMErrorRef error;
 	LLVMOrcMaterializationUnitRef mu;
 
@@ -1230,7 +1244,11 @@ llvm_create_jit_instance(LLVMTargetMachineRef tm)
 	 * Symbol resolution support for "special" functions, e.g. a call into an
 	 * SQL callable function.
 	 */
+#if LLVM_VERSION_MAJOR > 14
+	ref_gen = LLVMOrcCreateCustomCAPIDefinitionGenerator(llvm_resolve_symbols, NULL, NULL);
+#else
 	ref_gen = LLVMOrcCreateCustomCAPIDefinitionGenerator(llvm_resolve_symbols, NULL);
+#endif
 	LLVMOrcJITDylibAddGenerator(LLVMOrcLLJITGetMainJITDylib(lljit), ref_gen);
 
 	return lljit;
diff --git a/src/backend/jit/llvm/meson.build b/src/backend/jit/llvm/meson.build
index de2e624ab58..e5a702163b7 100644
--- a/src/backend/jit/llvm/meson.build
+++ b/src/backend/jit/llvm/meson.build
@@ -51,6 +51,9 @@ endif
 
 # XXX: Need to determine proper version of the function cflags for clang
 bitcode_cflags = ['-fno-strict-aliasing', '-fwrapv']
+if llvm.version().version_compare('>=15.0')
+  bitcode_cflags += ['-Xclang', '-no-opaque-pointers']
+endif
 bitcode_cflags += cppflags
 
 # XXX: Worth improving on the logic to find directories here
diff --git a/configure b/configure
index 1caca21b625..cf457a0ed4a 100755
--- a/configure
+++ b/configure
@@ -7391,6 +7391,95 @@ if test x"$pgac_cv_prog_CLANGXX_cxxflags__fexcess_precision_standard" = x"yes";
 fi
 
 
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CLANG} supports -Xclang -no-opaque-pointers, for BITCODE_CFLAGS" >&5
+$as_echo_n "checking whether ${CLANG} supports -Xclang -no-opaque-pointers, for BITCODE_CFLAGS... " >&6; }
+if ${pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CLANG}
+CFLAGS="${BITCODE_CFLAGS} -Xclang -no-opaque-pointers"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers=yes
+else
+  pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers" >&5
+$as_echo "$pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers" >&6; }
+if test x"$pgac_cv_prog_CLANG_cflags__Xclang__no_opaque_pointers" = x"yes"; then
+  BITCODE_CFLAGS="${BITCODE_CFLAGS} -Xclang -no-opaque-pointers"
+fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking 

Re: problems with making relfilenodes 56-bits

2022-10-03 Thread Andres Freund
Hi,

On 2022-10-03 19:40:30 +0200, Matthias van de Meent wrote:
> On Mon, 3 Oct 2022, 19:01 Andres Freund,  wrote:
> > Random idea: xl_prev is large. Store a full xl_prev in the page header, but
> > only store a 2 byte offset from the page header xl_prev within each record.
> 
> With that small xl_prev we may not detect partial page writes in
> recycled segments; or other issues in the underlying file system. With
> small record sizes, the chance of returning incorrect data would be
> significant for small records (it would be approximately the chance of
> getting a record boundary on the underlying page boundary * chance of
> getting the same MAXALIGN-adjusted size record before the persistence
> boundary). That issue is part of the reason why my proposed change
> upthread still contains the full xl_prev.

What exactly is the theory for this significant increase? I don't think
xl_prev provides a meaningful protection against torn pages in the first
place?

Greetings,

Andres Freund




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Tomas Vondra
On 10/3/22 21:25, Jaime Casanova wrote:
> On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
>> On 9/29/22 08:53, Jaime Casanova wrote:
>>> ...
>>>
>>> Just found one more occurrence of this one with this index while an
>>> autovacuum was running:
>>>
>>> """
>>> CREATE INDEX bt_f8_heap_seqno_idx 
>>> ON public.bt_f8_heap 
>>> USING brin (seqno float8_minmax_multi_ops);
>>> """
>>> Attached is a backtrace.
>>
>> Thanks for the report!
>>
>> I think I see the issue - brin_minmax_multi_union does not realize the
>> two summaries could have just one range each, and those can overlap so
>> that merge_overlapping_ranges combines them into a single one.
>>
>> This is harmless, except that the assert in build_distances is overly
>> strict. Not sure if we should just remove the assert or only compute the
>> distances with (neranges>1).
>>
>> Do you happen to have the core dump? It'd be useful to look at ranges_a
>> and ranges_b, to confirm this is indeed what's happening.
>>
> 
> I do have it.
> 
> (gdb) p *ranges_a
> $4 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea1987c8
> }
> (gdb) p *ranges_b
> $5 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea196da8
> }
> 

Thanks. That mostly confirms my theory. I'd bet that this

(gdb) p ranges_a->values[0]
(gdb) p ranges_b->values[0]

will print the same value. I've been able to reproduce this, but it's
pretty difficult to get the timing right (and it requires a table with
just a single value in that BRIN range).

I'll get this fixed in a couple days. Considering the benign nature of
this issue (unnecessary assert) I'm not going to rush.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Error-safe user functions

2022-10-03 Thread Nikita Glukhov

Hi, hackers!


Trying to implement the error handling behavior required by SQL/JSON, we
came to an agreement that we need special infrastructure for catching
errors in the input and type conversion functions without heavy-weight
things like subtransactions.  See the whole thread "SQL/JSON features
for v15" [1], or the last ~5 messages in the branch starting from [2].

The idea is simple -- introduce a new "error-safe" calling mode of user
functions by passing a special node through FunctionCallInfo.context, into
which the function should write error info and return, instead of throwing
it.  Also, such functions should manually free resources before
returning an error.  This makes it possible to avoid PG_TRY/PG_CATCH and
subtransactions.


I have submitted two patch sets to the old thread: the first [3] POC
example for NULL_ON_ERROR option for COPY, and the second [4] with the
set of error-safe functions needed for SQL/JSON.

Now I'm starting this separate thread with the new version of the
patch set, which includes error-safe functions for a subset of
data types (unfinished domains were removed) and a NULL_ON_ERROR option
for COPY (which may need its own thread).



In the previous version of the patch, error-safe functions were marked
in the catalog using a new column, pg_proc.proissafe, but that is not
the best solution:

On 30.09.2022, Tom Lane wrote:

I strongly recommend against having a new pg_proc column at all.
I doubt that you really need it, and having one will create
enormous mechanical burdens to making the conversion.  (For example,
needing a catversion bump every time we convert one more function,
or an extension version bump to convert extensions.)


I think the only way to avoid catalog modification (adding new columns
to pg_proc or pg_type, introducing new function signatures, etc.) and
to avoid adding extra code at the entry of error-safe functions is to
bump the version of our calling convention.  I simply added a flag,
Pg_finfo_record.errorsafe, which is set to true when the new
PG_FUNCTION_INFO_V2_ERRORSAFE() macro is used.  We could avoid adding
this flag by treating every V2 function as error-safe, but I'm not sure
that is acceptable.

Built-in error-safe functions are marked in pg_proc.dat using the
special flag "errorsafe", which is stored only in FmgrBuiltin, not in
the catalog like the previous "proissafe" was.



On 2022-09-3 Andrew Dunstan wrote:

I suggest just submitting the Input function stuff on its own, I
think that means not patches 3,4,15 at this stage. Maybe we would
also need a small test module to call the functions, or at least
some of them.  The earlier we can get this in the earlier SQL/JSON
patches based on it can be considered.



+1


I have added test module in patch #14.


On 2022-09-3 Andrew Dunstan wrote:
proissafe isn't really a very informative name. Safe for what? maybe
proerrorsafe or something would be better?


I have renamed "safe" to "errorsafe".


On 2022-09-3 Andrew Dunstan wrote:

I don't think we need the if test or else clause here:

+   if (edata)
+   return InputFunctionCallInternal(flinfo, str, typioparam, typmod, edata);
+   else
+   return InputFunctionCall(flinfo, str, typioparam, typmod);

"If" statement removed.


On 2022-09-3 Andrew Dunstan wrote:

I think we should probably cover float8 as well as float4, and there
might be some other odd gaps.


I have added error-safe function for float8 too.


[1] https://www.postgresql.org/message-id/flat/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
[2] https://www.postgresql.org/message-id/flat/13351.1661965592%40sss.pgh.pa.us#3d23aa20c808d0267ac1f7ef2825f0dd
[3] https://www.postgresql.org/message-id/raw/379e5365-9670-e0de-ee08-57ba61cbc976%40postgrespro.ru
[4] https://www.postgresql.org/message-id/raw/0574201c-bd35-01af-1557-8936f99ce5aa%40postgrespro.ru

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


errorsafe_functions_v01.tgz
Description: application/compressed-tar


Re: Reducing the chunk header sizes on all memory context types

2022-10-03 Thread Tom Lane
David Rowley  writes:
> Andres did mention to me off-list about perhaps adding a boolean field
> to FunctionCallInfoBaseData to indicate if the return value can be
> assumed to be in CurrentMemoryContext.  I feel like that might be
> quite a bit of work to go and change all functions to ensure that
> that's properly populated.

That seems like the right basic answer, but wrong in detail.  We have only
a tiny number of places that care about this --- aggregates and window
functions basically --- and those already have a bunch of special calling
conventions.  So changing the generic fmgr APIs has side-effects far
beyond what's justified.

I think what we should look at is extending the aggregate/window
function APIs so that such functions can report where they put their
output, and then we can nuke MemoryContextContains(), with the
code set up to assume that it has to copy if the called function
didn't report anything.  The existing FunctionCallInfo.resultinfo
mechanism (cf. ReturnSetInfo for SRFs) is probably the right thing
to pass the flag through.

I can take a look at this once the release dust settles a little.

regards, tom lane




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Jaime Casanova
On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> On 9/29/22 08:53, Jaime Casanova wrote:
> > ...
> > 
> > Just found one more occurrence of this one with this index while an
> > autovacuum was running:
> > 
> > """
> > CREATE INDEX bt_f8_heap_seqno_idx 
> > ON public.bt_f8_heap 
> > USING brin (seqno float8_minmax_multi_ops);
> > """
> > Attached is a backtrace.
> 
> Thanks for the report!
> 
> I think I see the issue - brin_minmax_multi_union does not realize the
> two summaries could have just one range each, and those can overlap so
> that merge_overlapping_ranges combines them into a single one.
> 
> This is harmless, except that the assert in build_distances is overly
> strict. Not sure if we should just remove the assert or only compute the
> distances with (neranges>1).
> 
> Do you happen to have the core dump? It'd be useful to look at ranges_a
> and ranges_b, to confirm this is indeed what's happening.
> 

I do have it.

(gdb) p *ranges_a
$4 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea1987c8
}
(gdb) p *ranges_b
$5 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea196da8
}

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: [PATCH] Fix build with LLVM 15 or above

2022-10-03 Thread Andres Freund
Hi,

On 2022-10-03 18:34:18 +1300, Thomas Munro wrote:
> One option I thought about as a stopgap measure is to use
> LLVMContextSetOpaquePointers(context, false) to turn the new code
> paths off, but it doesn't seem to work for me and I couldn't figure
> out why yet (it still aborts -- probably there are more 'contexts'
> around that I didn't handle, something like that).

I think that's just because of this hunk:

@@ -992,7 +1000,12 @@ llvm_create_types(void)
 }

 /* eagerly load contents, going to need it all */
+#if LLVM_VERSION_MAJOR > 14
+	if (LLVMParseBitcodeInContext2(LLVMOrcThreadSafeContextGetContext(llvm_ts_context),
+								   buf, &llvm_types_module))
+#else
 	if (LLVMParseBitcode2(buf, &llvm_types_module))
+#endif
 	{
 		elog(ERROR, "LLVMParseBitcode2 of %s failed", path);
 	}

This is the wrong context to use here. Because of that we end up with types
from two different contexts being used, which leads to this assertion to fail:

#5  0x7f945a036ab2 in __GI___assert_fail (
assertion=0x7f93cf5a4a1b "getOperand(0)->getType() == 
getOperand(1)->getType() && \"Both operands to ICmp instruction are not of the 
same type!\"",
file=0x7f93cf66062a 
"/home/andres/src/llvm-project/llvm/include/llvm/IR/Instructions.h", line=1191,
function=0x7f93cf5f2db6 "void llvm::ICmpInst::AssertOK()") at 
./assert/assert.c:101
#6  0x7f93cf9e3a3c in llvm::ICmpInst::AssertOK (this=0x56482c3b4b50) at 
/home/andres/src/llvm-project/llvm/include/llvm/IR/Instructions.h:1190
#7  0x7f93cf9e38ca in llvm::ICmpInst::ICmpInst (this=0x56482c3b4b50, 
pred=llvm::CmpInst::ICMP_UGE, LHS=0x56482c3b98d0, RHS=0x56482c3b9920, 
NameStr="")
at /home/andres/src/llvm-project/llvm/include/llvm/IR/Instructions.h:1245
#8  0x7f93cf9dc6f9 in llvm::IRBuilderBase::CreateICmp (this=0x56482c3b4650, 
P=llvm::CmpInst::ICMP_UGE, LHS=0x56482c3b98d0, RHS=0x56482c3b9920, Name="")
at /home/andres/src/llvm-project/llvm/include/llvm/IR/IRBuilder.h:2212
#9  0x7f93cfa650cd in LLVMBuildICmp (B=0x56482c3b4650, Op=LLVMIntUGE, 
LHS=0x56482c3b98d0, RHS=0x56482c3b9920, Name=0x7f9459722cf2 "")
at /home/andres/src/llvm-project/llvm/lib/IR/Core.cpp:3883
#10 0x7f945971b4d7 in llvm_compile_expr (state=0x56482c31f878) at 
/home/andres/src/postgresql/src/backend/jit/llvm/llvmjit_expr.c:302
#11 0x56482a28f76b in jit_compile_expr (state=state@entry=0x56482c31f878) 
at /home/andres/src/postgresql/src/backend/jit/jit.c:177
#12 0x564829f44e62 in ExecReadyExpr (state=state@entry=0x56482c31f878) at 
/home/andres/src/postgresql/src/backend/executor/execExpr.c:885

because types (compared by pointer value) are only unique within a context.

I think all that is needed for this aspect would be:

#if LLVM_VERSION_MAJOR > 14
LLVMContextSetOpaquePointers(LLVMGetGlobalContext(), false);
#endif


I haven't yet run through the whole regression test with an assert enabled
llvm because an assert-enabled llvm is *SLOW*, but it got through the first
few parallel groups ok.  Using an optimized llvm 15, all tests pass with
PGOPTIONS=-cjit_above_cost=0.


> That option is available for LLVM 15 but will be taken out in LLVM 16, so
> that's supposed to be the last chance to stop using pre-opaque pointers; see
> the bottom of the page I linked above for that, where they call it
> setOpaquePointers(false) (the C++ version of
> LLVMContextSetOpaquePointers()).  I don't really want to go with that if we
> can avoid it, though, because it says "Opaque pointers are enabled by
> default. Typed pointers are still available, but only supported on a
> best-effort basis and may be untested" so I expect it to be blighted with
> problems.

I think it'd be ok for the back branches, while we figure out the opaque stuff
in HEAD.

Greetings,

Andres Freund
>From d2a383fe07f962e3a1e17a7f8f112a868cbd8175 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 22 Sep 2022 23:38:56 +1200
Subject: [PATCH v1] WIP: jit: LLVM 15: Minimal changes.

Per https://llvm.org/docs/OpaquePointers.html, support for non-opaque
pointers still exists and we can request that on our context.  We have
until LLVM 16 to move to opaque pointers.
---
 src/backend/jit/llvm/llvmjit.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index fd3eecf27d3..bccfcfa9698 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -798,6 +798,16 @@ llvm_session_initialize(void)
 	LLVMInitializeNativeAsmPrinter();
 	LLVMInitializeNativeAsmParser();
 
+	/*
+	 * When targetting an llvm version with opaque pointers enabled by
+	 * default, turn them off for the context we build our code in. Don't need
+	 * to do so for other contexts (e.g. llvm_ts_context) - once the IR is
+	 * generated, it carries the necessary information.
+	 */
+#if LLVM_VERSION_MAJOR > 14
+	LLVMContextSetOpaquePointers(LLVMGetGlobalContext(), false);
+#endif
+
 	/*
 	 * Synchronize 

Re: [patch] \g with multiple result sets and \watch with copy queries

2022-10-03 Thread Tom Lane
I wrote:
> Ugh.  I think we'd better fix that before 15.0, else somebody may
> think this is the new intended behavior and raise compatibility
> concerns when we fix it.  I will see if I can squeeze it in before
> this afternoon's 15rc2 wrap.

Pushed after making some corrections.

Given the time pressure, I did not worry about installing regression
test coverage for this stuff, but I wonder if we shouldn't add some.

regards, tom lane




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-10-03 Thread Andrey Chudnovsky
> I think we can probably prototype a callback hook for approach (1)
> pretty quickly. (2) is a lot more work and investigation, but it's
> work that I'm interested in doing (when I get the time). I think there
> are other very good reasons to consider a third-party SASL library,
> and some good lessons to be learned, even if the community decides not
> to go down that road.

Makes sense. We will work on (1) and check whether there are any
blockers for a shared solution supporting GitHub and Google.

On Fri, Sep 30, 2022 at 1:45 PM Jacob Champion  wrote:
>
> On Fri, Sep 30, 2022 at 7:47 AM Andrey Chudnovsky
>  wrote:
> > > How should we communicate those pieces to a custom client when it's
> > > passing a token directly? The easiest way I can see is for the custom
> > > client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin).
> > > If you had to parse the libpq error message, I don't think that'd be
> > > particularly maintainable.
> >
> > I agree that parsing the message is not a sustainable way.
> > Could you provide more details on the SASL plugin approach you propose?
> >
> > Specifically, is this basically a set of extension hooks for the client 
> > side?
> > With the need for the client to be compiled with the plugins based on
> > the set of providers it needs.
>
> That's a good question. I can see two broad approaches, with maybe
> some ability to combine them into a hybrid:
>
> 1. If there turns out to be serious interest in having libpq itself
> handle OAuth natively (with all of the web-facing code that implies,
> and all of the questions still left to answer), then we might be able
> to provide a "token hook" in the same way that we currently provide a
> passphrase hook for OpenSSL keys. By default, libpq would use its
> internal machinery to take the provider details, navigate its builtin
> flow, and return the Bearer token. If you wanted to override that
> behavior as a client, you could replace the builtin flow with your
> own, by registering a set of callbacks.
>
> 2. Alternatively, OAuth support could be provided via a mechanism
> plugin for some third-party SASL library (GNU libgsasl, Cyrus
> libsasl2). We could provide an OAuth plugin in contrib that handles
> the default flow. Other providers could publish their alternative
> plugins to completely replace the OAUTHBEARER mechanism handling.
>
> Approach (2) would make for some duplicated effort since every
> provider has to write code to speak the OAUTHBEARER protocol. It might
> simplify provider-specific distribution, since (at least for Cyrus) I
> think you could build a single plugin that supports both the client
> and server side. But it would be a lot easier to unknowingly (or
> knowingly) break the spec, since you'd control both the client and
> server sides. There would be less incentive to interoperate.
>
> Finally, we could potentially take pieces from both, by having an
> official OAuth mechanism plugin that provides a client-side hook to
> override the flow. I have no idea if the benefits would offset the
> costs of a plugin-for-a-plugin style architecture. And providers would
> still be free to ignore it and just provide a full mechanism plugin
> anyway.
>
> > > Well... I don't quite understand why we'd go to the trouble of
> > > providing a provider-agnostic communication solution only to have
> > > everyone write their own provider-specific client support. Unless
> > > you're saying Microsoft would provide an officially blessed plugin for
> > > the *server* side only, and Google would provide one of their own, and
> > > so on.
> >
> > Yes, via extensions. Identity providers can open source extensions to
> > use their auth services outside of first party PaaS offerings.
> > For 3rd party Postgres PaaS or on premise deployments.
>
> Sounds reasonable.
>
> > > The server side authorization is the only place where I think it makes
> > > sense to specialize by default. libpq should remain agnostic, with the
> > > understanding that we'll need to make hard decisions when a major
> > > provider decides not to follow a spec.
> >
> > Completely agree with agnostic libpq. Though needs validation with
> > several major providers to know if this is possible.
>
> Agreed.
>
> > > Specifically it delivers that message to an end user. If you want a
> > > generic machine client to be able to use that, then we'll need to talk
> > > about how.
> >
> > Yes, that's what needs to be decided.
> > In both Device code and Authorization code scenarios, libpq and the
> > client would need to exchange a couple of pieces of metadata.
> > Plus, after success, the client should be able to access a refresh token 
> > for further use.
> >
> > Can we implement a generic protocol for this between libpq and the
> > clients?
>
> I think we can probably prototype a callback hook for approach (1)
> pretty quickly. (2) is a lot more work and investigation, but it's
> work that I'm interested in doing (when I get the time). I think there

Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Tomas Vondra
On 9/29/22 08:53, Jaime Casanova wrote:
> ...
> 
> Just found one more occurrence of this one with this index while an
> autovacuum was running:
> 
> """
> CREATE INDEX bt_f8_heap_seqno_idx 
> ON public.bt_f8_heap 
> USING brin (seqno float8_minmax_multi_ops);
> """
> Attached is a backtrace.

Thanks for the report!

I think I see the issue - brin_minmax_multi_union does not realize the
two summaries could have just one range each, and those can overlap so
that merge_overlapping_ranges combines them into a single one.

This is harmless, except that the assert in build_distances is overly
strict. Not sure if we should just remove the assert or only compute the
distances with (neranges>1).

Do you happen to have the core dump? It'd be useful to look at ranges_a
and ranges_b, to confirm this is indeed what's happening.

If not, how reproducible is it?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: problems with making relfilenodes 56-bits

2022-10-03 Thread Matthias van de Meent
On Mon, 3 Oct 2022, 19:01 Andres Freund,  wrote:
>
> Hi,
>
> On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> > On Fri, Sep 30, 2022 at 8:20 PM Andres Freund  wrote:
> > > I think it'd be interesting to look at per-record-type stats between two
> > > equivalent workloads, to see where practical workloads suffer the most
> > > (possibly with fpw=off, to make things more repeatable).
> >
> > I would expect, and Dilip's results seem to confirm, the effect to be
> > pretty uniform: basically, nearly every record gets bigger by 4 bytes.
> > That's because most records contain at least one block reference, and
> > if they contain multiple block references, likely all but one will be
> > marked BKPBLOCK_SAME_REL, so we pay the cost just once.
>
> But it doesn't really matter that much if an already large record gets a bit
> bigger. Whereas it does matter if it's a small record. Focussing on optimizing
> the record types where the increase is large seems like a potential way
> forward to me, even if we can't find something generic.
>
>
> > I thought about trying to buy back some space elsewhere, and I think
> > that would be a reasonable approach to getting this committed if we
> > could find a way to do it. However, I don't see a terribly obvious way
> > of making it happen.
>
> I think there's plenty of potential...
>
>
> > Trying to do it by optimizing specific WAL record
> > types seems like a real pain in the neck, because there's tons of
> > different WAL records that all have the same problem.
>
> I am not so sure about that. Improving a bunch of the most frequent small
> records might buy you back enough on just about every workload to be OK.
>
> I put the top record sizes for an installcheck run with full_page_writes off
> at the bottom. Certainly our regression tests aren't generally
> representative. But I think it still decently highlights how just improving a
> few records could buy you back more than enough.
>
>
> > Trying to do it in a generic way makes more sense, and the fact that we have
> > 2 padding bytes available in XLogRecord seems like a place to start looking,
> > but the way forward from there is not clear to me.
>
> Random idea: xl_prev is large. Store a full xl_prev in the page header, but
> only store a 2 byte offset from the page header xl_prev within each record.

With that small xl_prev we may not detect partial page writes in
recycled segments; or other issues in the underlying file system. With
small record sizes, the chance of returning incorrect data would be
significant for small records (it would be approximately the chance of
getting a record boundary on the underlying page boundary * chance of
getting the same MAXALIGN-adjusted size record before the persistence
boundary). That issue is part of the reason why my proposed change
upthread still contains the full xl_prev.

A different idea is removing most block_ids from the record, and
optionally reducing per-block length fields to 1B. Used block ids are
effectively always sequential, and we only allow 33+4 valid values, so
we can use 2 bits to distinguish between 'block belonging to this ID
field have at most 255B of data registered' and 'blocks up to this ID
follow sequentially without own block ID'. That would save 2N-1 total
bytes for N blocks. It is scraping the barrel, but I think it is quite
possible.

Lastly, we could add XLR_BLOCK_ID_DATA_MED for values >255 containing
up to UINT16_MAX lengths. That would save 2 bytes for records that
only just pass the 255B barrier, where 2B is still a fairly
significant part of the record size.

Kind regards,

Matthias van de Meent




Re: Miscellaneous tab completion issue fixes

2022-10-03 Thread Dagfinn Ilmari Mannsåker
vignesh C  writes:

> Hi,
>
> There were a couple of tab completion issues present:
> a) \dRp and \dRs tab completion displays tables instead of displaying
> publications and subscriptions.
> b) "ALTER ... OWNER TO" does not include displaying of CURRENT_ROLE,
> CURRENT_USER and SESSION_USER.
>
> The attached patch has the changes to handle the same.

Good catches! Just a few comments:

> + else if (TailMatchesCS("\\dRp*"))
> + COMPLETE_WITH_QUERY(Query_for_list_of_publications[0].query);
> + else if (TailMatchesCS("\\dRs*"))
> + COMPLETE_WITH_QUERY(Query_for_list_of_subscriptions[0].query);

These are version-specific queries, so should be passed in their
entirety to COMPLETE_WITH_VERSIONED_QUERY() so that psql can pick the
right version, and avoid sending the query at all if the server is too
old.

> +/* add these to Query_for_list_of_roles in OWNER TO contexts */
> +#define Keywords_for_list_of_owner_to_roles \
> +"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"

I think this would read better without the TO, both in the comment and
the constant name, similar to the below only having GRANT without TO:

>  /* add these to Query_for_list_of_roles in GRANT contexts */
>  #define Keywords_for_list_of_grant_roles \
>  "PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"

- ilmari




Re: Question: test "aggregates" failed in 32-bit machine

2022-10-03 Thread Tomas Vondra


On 10/3/22 16:05, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
>> Based on the above discussion, the RMT asks for a revert of db0d67db2 in 
>> the v15 release. The RMT also recommends a revert in HEAD but does not 
>> have the power to request that.
> 
> Roger, I'll push these shortly.
> 

Thanks for resolving this, and apologies for not noticing this thread
earlier (and for the bugs in the code, ofc).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: problems with making relfilenodes 56-bits

2022-10-03 Thread Andres Freund
Hi,

On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> On Fri, Sep 30, 2022 at 8:20 PM Andres Freund  wrote:
> > I think it'd be interesting to look at per-record-type stats between two
> equivalent workloads, to see where practical workloads suffer the most
> > (possibly with fpw=off, to make things more repeatable).
>
> I would expect, and Dilip's results seem to confirm, the effect to be
> pretty uniform: basically, nearly every record gets bigger by 4 bytes.
> That's because most records contain at least one block reference, and
> if they contain multiple block references, likely all but one will be
> marked BKPBLOCK_SAME_REL, so we pay the cost just once.

But it doesn't really matter that much if an already large record gets a bit
bigger. Whereas it does matter if it's a small record. Focussing on optimizing
the record types where the increase is large seems like a potential way
forward to me, even if we can't find something generic.


> I thought about trying to buy back some space elsewhere, and I think
> that would be a reasonable approach to getting this committed if we
> could find a way to do it. However, I don't see a terribly obvious way
> of making it happen.

I think there's plenty of potential...


> Trying to do it by optimizing specific WAL record
> types seems like a real pain in the neck, because there's tons of
> different WAL records that all have the same problem.

I am not so sure about that. Improving a bunch of the most frequent small
records might buy you back enough on just about every workload to be OK.

I put the top record sizes for an installcheck run with full_page_writes off
at the bottom. Certainly our regression tests aren't generally
representative. But I think it still decently highlights how just improving a
few records could buy you back more than enough.


> Trying to do it in a generic way makes more sense, and the fact that we have
> 2 padding bytes available in XLogRecord seems like a place to start looking,
> but the way forward from there is not clear to me.

Random idea: xl_prev is large. Store a full xl_prev in the page header, but
only store a 2 byte offset from the page header xl_prev within each record.

Greetings,

Andres Freund

by total size:

Type                        N      (%)   Record size      (%)    FPI size      (%)   Combined size      (%)
----                        -      ---   -----------      ---    --------      ---   -------------      ---
Heap/INSERT           1041666 ( 50.48)     106565255 ( 50.54)           0 (  0.00)       106565255 ( 43.92)
Btree/INSERT_LEAF      352196 ( 17.07)      24067672 ( 11.41)           0 (  0.00)        24067672 (  9.92)
Heap/DELETE            250852 ( 12.16)      13546008 (  6.42)           0 (  0.00)        13546008 (  5.58)
Hash/INSERT            108499 (  5.26)       7811928 (  3.70)           0 (  0.00)         7811928 (  3.22)
Transaction/COMMIT      16053 (  0.78)       6402657 (  3.04)           0 (  0.00)         6402657 (  2.64)
Gist/PAGE_UPDATE        57225 (  2.77)       5217100 (  2.47)           0 (  0.00)         5217100 (  2.15)
Gin/UPDATE_META_PAGE    23943 (  1.16)       4539970 (  2.15)           0 (  0.00)         4539970 (  1.87)
Gin/INSERT              27004 (  1.31)       3623998 (  1.72)           0 (  0.00)         3623998 (  1.49)
Gist/PAGE_SPLIT           448 (  0.02)       3391244 (  1.61)           0 (  0.00)         3391244 (  1.40)
SPGist/ADD_LEAF         38968 (  1.89)       3341696 (  1.58)           0 (  0.00)         3341696 (  1.38)
...
XLOG/FPI                 7228 (  0.35)        378924 (  0.18)    29788166 ( 93.67)        30167090 ( 12.43)
...
Gin/SPLIT                 141 (  0.01)         13011 (  0.01)     1187588 (  3.73)         1200599 (  0.49)
...
--------              -------            ----------             --------             ---------
Total                 2063609            210848282 [86.89%]     31802766 [13.11%]    242651048 [100%]

(Included XLOG/FPI and Gin/SPLIT to explain why there's FPIs despite running 
with fpw=off)

sorted by number of records:
Heap/INSERT           1041666 ( 50.48)     106565255 ( 50.54)           0 (  0.00)       106565255 ( 43.92)
Btree/INSERT_LEAF      352196 ( 17.07)      24067672 ( 11.41)

Re: [patch] \g with multiple result sets and \watch with copy queries

2022-10-03 Thread Tom Lane
"Daniel Verite"  writes:
> The psql improvement in v15 to output multiple result sets does not
> behave as one might expect with \g: the output file or program
> to pipe into is opened/closed on each result set, overwriting the
> previous ones in the case of \g file.

Ugh.  I think we'd better fix that before 15.0, else somebody may
think this is the new intended behavior and raise compatibility
concerns when we fix it.  I will see if I can squeeze it in before
this afternoon's 15rc2 wrap.

regards, tom lane




Re: Question: test "aggregates" failed in 32-bit machine

2022-10-03 Thread Tom Lane
[ Just for the archives' sake at this point, in case somebody has
another go at this feature. ]

I wrote:
> ... I'm now discovering that the code I'd hoped to salvage in
> pathkeys.c (get_useful_group_keys_orderings and related) has its very own
> bugs.  It's imagining that it can rearrange a PathKeys list arbitrarily
> and then rearrange the GROUP BY SortGroupClause list to match, but that's
> easier said than done, for a couple of different reasons.

It strikes me that the easy solution here is to *not* rearrange the
SortGroupClause list at all.  What that would be used for later is
to generate a Unique node's list of columns to compare, but since
Unique only cares about equality-or-not, there's no strong reason
why it has to compare the columns in the same order they're sorted
in.  (Indeed, if anything we should prefer to compare them in the
opposite order, since the least-significant column should be the
most likely to be different from the previous row.)

I'm fairly sure that the just-reverted code is buggy on its
own terms, in that it might sometimes produce a clause list
that's not ordered the same as the pathkeys; but there's no
visible misbehavior, because that does not in fact matter.

So this'd let us simplify the APIs here, in particular PathKeyInfo
seems unnecessary, because we don't have to pass the SortGroupClause
list into or out of the pathkey-reordering logic.

regards, tom lane




Re: Possible solution for masking chosen columns when using pg_dump

2022-10-03 Thread Julien Rouhaud
Hi,

On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote:
>
> Hello, here's my take on masking data when using pg_dump
>  
> The main idea is using PostgreSQL functions to replace data during a SELECT.
> When table data is dumped, a SELECT a,b,c,d ... FROM ... query is generated, 
> and the columns that are marked for masking are replaced with the result of 
> functions applied to those columns.
> Example: columns name, count are to be masked, so the query will look as 
> such: SELECT id, mask_text(name), mask_int(count), date from ...
>  
> So about the interface: I added 2 more command-line options: 
>  
> --mask-columns, which specifies what columns from what tables will be masked 
>     usage example:
>             --mask-columns "t1.name, t2.description" - both columns will be 
> masked with the same corresponding function
>             or --mask-columns name - ALL columns with name "name" from all 
> dumped tables will be masked with the corresponding function
>  
> --mask-function, which specifies what functions will mask data
>     usage example:
>             --mask-function mask_int - corresponding columns will be masked 
> with function named "mask_int" from default schema (public)
>             or --mask-function my_schema.mask_varchar - same as above but 
> with specified schema where the function is stored
>             or --mask-function somedir/filename - the function is "defined" 
> here - more on the structure below

FTR I wrote an extension POC [1] last weekend that does that but on the backend
side.  The main advantage is that it's working with any existing versions of
pg_dump (or any client relying on COPY or even plain interactive SQL
statements), and that the DBA can force a dedicated role to only get a masked
dump, even if they forgot to ask for it.

I only had a quick look at your patch but it seems that you left some todo in
russian, which isn't helpful at least to me.

[1] https://github.com/rjuju/pg_anonymize




Possible solution for masking chosen columns when using pg_dump

2022-10-03 Thread Олег Целебровский

Hello, here's my take on masking data when using pg_dump
 
The main idea is using PostgreSQL functions to replace data during a SELECT.
When table data is dumped, a SELECT a,b,c,d ... FROM ... query is generated, 
and the columns that are marked for masking are replaced with the result of 
functions applied to those columns.
Example: columns name, count are to be masked, so the query will look as such: 
SELECT id, mask_text(name), mask_int(count), date from ...
 
So about the interface: I added 2 more command-line options: 
 
--mask-columns, which specifies what columns from what tables will be masked 
    usage example:
            --mask-columns "t1.name, t2.description" - both columns will be 
masked with the same corresponding function
            or --mask-columns name - ALL columns with name "name" from all 
dumped tables will be masked with the corresponding function
 
--mask-function, which specifies what functions will mask data
    usage example:
            --mask-function mask_int - corresponding columns will be masked 
with function named "mask_int" from default schema (public)
            or --mask-function my_schema.mask_varchar - same as above but with 
specified schema where the function is stored
            or --mask-function somedir/filename - the function is "defined" 
here - more on the structure below
 
Structure of the file with function description:
 
First row - function name (with or without schema name)
Second row - type of in and out value (the design is to only work with the same 
input/output type, so no int-to-text shenanigans)
Third row - language of function
Fourth and later rows - body of a function
 
Example of such file:
 
mask_text
text
plpgsql
res := '***';
 
The first iteration of file-described functions used just a plain SQL query, 
but since it executed over a read-write connection, things such as writing 
"DROP TABLE t1;" after the CREATE FUNCTION ...; were possible.
Now even if something harmful is written in the function body, it will be executed 
over the dump's read-only connection, where it will just throw an error.
 
About "corresponding columns and functions" - masking functions and columns are 
paired with each other based on their input order, though the --mask-columns and 
--mask-function options don't have to be adjacent.
Example: pg_dump -t table_name --mask-columns name --mask-columns count 
--mask-function mask_text --mask-function mask_int - here 'name' will be paired 
with the function 'mask_text' and 'count' with 'mask_int'.
 
Patch includes regression tests
 
I'm open to discussion of this patch
 
Best regards,
 
Oleg Tselebrovskiy

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bd9b066e4e..457290064d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -97,6 +97,14 @@ typedef enum OidOptions
 	zeroAsNone = 4
 } OidOptions;
 
+typedef struct
+{
+	char* column; 	/* name of masked column */
+	char* table;	/* name of table where masked column is stored */
+	char* func;		/* name of masking function */
+	char* schema;	/* name of schema where masking function is stored */
+} MaskColumnInfo;
+
 /* global decls */
 static bool dosync = true;		/* Issue fsync() to make dump durable on disk. */
 
@@ -105,6 +113,14 @@ static Oid	g_last_builtin_oid; /* value of the last builtin oid */
 /* The specified names/patterns should to match at least one entity */
 static int	strict_names = 0;
 
+/*
+* mask_column_info_list contains info about every to-be-masked column:
+* its name, the name of its table (if none is specified, all columns with this name are masked),
+* the name of the masking function, and the name of the schema containing that function (public if not specified)
+*/
+
+static SimplePtrList mask_column_info_list = {NULL, NULL};
+
 /*
  * Object inclusion/exclusion lists
  *
@@ -160,6 +176,8 @@ static int	nseclabels = 0;
    (obj)->dobj.name)
 
 static void help(const char *progname);
+static void addFuncToDatabase(MaskColumnInfo* cur_mask_column_info, 
+			 FILE* mask_func_file, DumpOptions* dopt);
 static void setup_connection(Archive *AH,
 			 const char *dumpencoding, const char *dumpsnapshot,
 			 char *use_role);
@@ -184,6 +202,8 @@ static void prohibit_crossdb_refs(PGconn *conn, const char *dbname,
 
 static NamespaceInfo *findNamespace(Oid nsoid);
 static void dumpTableData(Archive *fout, const TableDataInfo *tdinfo);
+static void maskColumns(TableInfo *tbinfo, char* current_column_name,
+		PQExpBuffer* q, SimpleStringList* column_names);
 static void refreshMatViewData(Archive *fout, const TableDataInfo *tdinfo);
 static const char *getRoleName(const char *roleoid_str);
 static void collectRoleNames(Archive *fout);
@@ -342,6 +362,19 @@ main(int argc, char **argv)
 	int			numWorkers = 1;
 	int			compressLevel = -1;
 	int			plainText = 0;
+
+	/* needed for masking */
+	SimpleStringList mask_columns_list = {NULL, NULL};
+	SimpleStringList mask_func_list = {NULL, NULL};
+	SimpleStringListCell *mask_func_cell;
+	SimpleStringListCell *mask_columns_cell;
+	

Miscellaneous tab completion issue fixes

2022-10-03 Thread vignesh C
Hi,

There were a couple of tab completion issues present:
a) \dRp and \dRs tab completion displays tables instead of displaying
publications and subscriptions.
b) "ALTER ... OWNER TO" tab completion does not include CURRENT_ROLE,
CURRENT_USER and SESSION_USER.

The attached patch has the changes to handle the same.

Regards,
Vignesh
From 1d454be7e89828bdacb191681e469c9581b615d5 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Oct 2022 10:45:04 +0530
Subject: [PATCH v1 1/2] Display publications and subscriptions for tab
 completion of \dRp and \dRs.

Display publications and subscriptions for tab completion of \dRp and
\dRs.
---
 src/bin/psql/tab-complete.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 71cfe8aec1..4cda92208b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4614,6 +4614,10 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables);
 	else if (TailMatchesCS("\\dP*"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_relations);
+	else if (TailMatchesCS("\\dRp*"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_publications[0].query);
+	else if (TailMatchesCS("\\dRs*"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_subscriptions[0].query);
 	else if (TailMatchesCS("\\ds*"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_sequences);
 	else if (TailMatchesCS("\\dt*"))
-- 
2.32.0

From 1c16b36c183f610fb79607f86663a8f6d655d70c Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Oct 2022 10:59:59 +0530
Subject: [PATCH v1 2/2] Include CURRENT_ROLE, CURRENT_USER and SESSION_USER in
 tab completion while changing owner.

Include CURRENT_ROLE, CURRENT_USER and SESSION_USER in tab completion
while changing owner.
---
 src/bin/psql/tab-complete.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 4cda92208b..4e71f4371b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1031,6 +1031,10 @@ static const SchemaQuery Query_for_trigger_of_table = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE rolname LIKE '%s'"
 
+/* add these to Query_for_list_of_roles in OWNER TO contexts */
+#define Keywords_for_list_of_owner_to_roles \
+"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+
 /* add these to Query_for_list_of_roles in GRANT contexts */
 #define Keywords_for_list_of_grant_roles \
 "PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
@@ -4160,7 +4164,8 @@ psql_completion(const char *text, int start, int end)
 
 /* OWNER TO  - complete with available roles */
 	else if (TailMatches("OWNER", "TO"))
-		COMPLETE_WITH_QUERY(Query_for_list_of_roles);
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
+ Keywords_for_list_of_owner_to_roles);
 
 /* ORDER BY */
 	else if (TailMatches("FROM", MatchAny, "ORDER"))
-- 
2.32.0



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,



> If the checking function is called not periodically but GetConnection(),
> it means that the health of foreign servers will be checked only when remote
> connections are used.
> So following workload described in [1] cannot handle the issue.
>
> BEGIN --- remote operations--- local operations --- COMMIT
>
>
As far as I can see this patch is mostly useful for detecting the failures
on the initial remote command. This is especially common when the remote
server does a failover/switchover and postgres_fdw uses a cached connection
to access the remote server.

The remote server failures within a transaction block sounds like a much
less common problem. Still, you can give a good error message before COMMIT
is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure
if it is worthwhile to fix such a problem.


> > Can we have this function/logic on Postgres core, so that other
> extensions
> > can also use?
>
> I was not sure about any use-case, but I think it can because it does
> quite general things.
> Is there any good motivation to do that?
>

I think any extension that deals with multiple Postgres nodes can benefit
from such logic. In fact, the reason I realized this patch is that on the
Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets
can benefit from such a function.


> > Do you see any performance implications of creating/freeing waitEventSets
> > per check? I wonder if we can somehow re-use the same waitEventSet by
> > modifyWaitEvent? I guess no, but still, if this check causes a
> performance
> > implication, can we somehow cache 1 waitEventSet per connection?
>
> I have not tested it yet, but I agree this will cause a performance
> decrease.
> In the next version I will re-use the event set anyway, and it must be
> considered later.
> Actually I'm not sure about your suggestion,
> but you mean to say that we can add a hash table that associates a PGconn
> with a WaitEventSet, right?
>
>
I think it also depends on where you decide to put
pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API
for pgfdw_connection_check_internal() -- or equivalent function -- could be
discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
else use what is passed to the function? Not sure, maybe you can come up
with a better API.

Thanks,
Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci ·
Pull Request #6259 · citusdata/citus (github.com)

[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by
marcocitus · Pull Request #6283 · citusdata/citus (github.com)



PostgreSQL 15 RC2 + GA release dates

2022-10-03 Thread Jonathan S. Katz

Hi,

We are planning a PostgreSQL 15 RC2 release for October 6, 2022. We are 
releasing a second release candidate due to the revert of an 
optimization around the GROUP BY clause.


We are still planning the PostgreSQL 15 GA release for October 13, but 
we may push this to October 20 based on reports from the RCs.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Question: test "aggregates" failed in 32-bit machine

2022-10-03 Thread Tom Lane
"Jonathan S. Katz"  writes:
> Based on the above discussion, the RMT asks for a revert of db0d67db2 in 
> the v15 release. The RMT also recommends a revert in HEAD but does not 
> have the power to request that.

Roger, I'll push these shortly.

regards, tom lane




Re: Question: test "aggregates" failed in 32-bit machine

2022-10-03 Thread Jonathan S. Katz

On 10/2/22 8:45 PM, Michael Paquier wrote:

On Sun, Oct 02, 2022 at 02:11:12PM -0400, Tom Lane wrote:

"Jonathan S. Katz"  writes:

OK. For v15 I am heavily in favor for the least risky approach given the
point we are at in the release cycle. The RMT hasn’t met yet to discuss,
but from re-reading this thread again, I would recommend to revert
(i.e. the “straight up revert”).


OK by me.


I don't quite see why we should let this code live on HEAD if it is not
ready to be merged, as there is a risk of creating side issues with things
tied to the costing, so I agree that the reversion done on both branches is
the way to go for now.  This could always be reworked and reproposed in the
future.


[RMT-hat]

Just to follow things procedure-wise[1], while there do not seem to be 
any objections to reverting through regular community processes, I do 
think the RMT has to make this ask as Tomas (patch committer) has not 
commented and we are up against release deadlines.


Based on the above discussion, the RMT asks for a revert of db0d67db2 in 
the v15 release. The RMT also recommends a revert in HEAD but does not 
have the power to request that.


We do hope to see continued work and inclusion of this feature for a 
future release. We understand that the work on this optimization is 
complicated and appreciate all of the efforts on it.


Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/Release_Management_Team



OpenPGP_signature
Description: OpenPGP digital signature


Re: Avoid memory leaks during base backups

2022-10-03 Thread Bharath Rupireddy
On Thu, Sep 29, 2022 at 10:38 PM Bharath Rupireddy
 wrote:
>
> Please review the v4 patch.

I used valgrind for testing. Without the patch, there's an obvious memory
leak [1]; with the patch, there is no memory leak.

I used ALLOCSET_START_SMALL_SIZES instead of ALLOCSET_DEFAULT_SIZES
for backup memory context so that it can start small and grow if
required.

I'm attaching v5 patch, please review it further.

[1]
==00:00:01:36.306 145709== VALGRINDERROR-BEGIN
==00:00:01:36.306 145709== 24 bytes in 1 blocks are still reachable in
loss record 122 of 511
==00:00:01:36.306 145709==at 0x98E501: palloc (mcxt.c:1170)
==00:00:01:36.306 145709==by 0x9C1795: makeStringInfo (stringinfo.c:45)
==00:00:01:36.306 145709==by 0x2DE22A: pg_backup_start (xlogfuncs.c:96)
==00:00:01:36.306 145709==by 0x4D2DB6: ExecMakeTableFunctionResult
(execSRF.c:234)
==00:00:01:36.306 145709==by 0x4F08DA: FunctionNext (nodeFunctionscan.c:95)
==00:00:01:36.306 145709==by 0x4D48EA: ExecScanFetch (execScan.c:133)
==00:00:01:36.306 145709==by 0x4D4963: ExecScan (execScan.c:182)
==00:00:01:36.306 145709==by 0x4F0C84: ExecFunctionScan
(nodeFunctionscan.c:270)
==00:00:01:36.306 145709==by 0x4D0255: ExecProcNodeFirst
(execProcnode.c:464)
==00:00:01:36.306 145709==by 0x4C32D4: ExecProcNode (executor.h:259)
==00:00:01:36.306 145709==by 0x4C619C: ExecutePlan (execMain.c:1636)
==00:00:01:36.306 145709==by 0x4C3A0F: standard_ExecutorRun (execMain.c:363)
==00:00:01:36.306 145709==
==00:00:01:36.306 145709== VALGRINDERROR-END

==00:00:01:36.334 145709== VALGRINDERROR-BEGIN
==00:00:01:36.334 145709== 1,024 bytes in 1 blocks are still reachable
in loss record 426 of 511
==00:00:01:36.334 145709==at 0x98E501: palloc (mcxt.c:1170)
==00:00:01:36.334 145709==by 0x9C17CF: initStringInfo (stringinfo.c:63)
==00:00:01:36.334 145709==by 0x9C17A5: makeStringInfo (stringinfo.c:47)
==00:00:01:36.334 145709==by 0x2DE22A: pg_backup_start (xlogfuncs.c:96)
==00:00:01:36.334 145709==by 0x4D2DB6: ExecMakeTableFunctionResult
(execSRF.c:234)
==00:00:01:36.334 145709==by 0x4F08DA: FunctionNext (nodeFunctionscan.c:95)
==00:00:01:36.334 145709==by 0x4D48EA: ExecScanFetch (execScan.c:133)
==00:00:01:36.334 145709==by 0x4D4963: ExecScan (execScan.c:182)
==00:00:01:36.334 145709==by 0x4F0C84: ExecFunctionScan
(nodeFunctionscan.c:270)
==00:00:01:36.334 145709==by 0x4D0255: ExecProcNodeFirst
(execProcnode.c:464)
==00:00:01:36.334 145709==by 0x4C32D4: ExecProcNode (executor.h:259)
==00:00:01:36.334 145709==by 0x4C619C: ExecutePlan (execMain.c:1636)
==00:00:01:36.334 145709==
==00:00:01:36.334 145709== VALGRINDERROR-END

==00:00:01:36.335 145709== VALGRINDERROR-BEGIN
==00:00:01:36.335 145709== 1,096 bytes in 1 blocks are still reachable
in loss record 431 of 511
==00:00:01:36.335 145709==at 0x98E766: palloc0 (mcxt.c:1201)
==00:00:01:36.335 145709==by 0x2DE152: pg_backup_start (xlogfuncs.c:81)
==00:00:01:36.335 145709==by 0x4D2DB6: ExecMakeTableFunctionResult
(execSRF.c:234)
==00:00:01:36.335 145709==by 0x4F08DA: FunctionNext (nodeFunctionscan.c:95)
==00:00:01:36.335 145709==by 0x4D48EA: ExecScanFetch (execScan.c:133)
==00:00:01:36.335 145709==by 0x4D4963: ExecScan (execScan.c:182)
==00:00:01:36.335 145709==by 0x4F0C84: ExecFunctionScan
(nodeFunctionscan.c:270)
==00:00:01:36.335 145709==by 0x4D0255: ExecProcNodeFirst
(execProcnode.c:464)
==00:00:01:36.335 145709==by 0x4C32D4: ExecProcNode (executor.h:259)
==00:00:01:36.335 145709==by 0x4C619C: ExecutePlan (execMain.c:1636)
==00:00:01:36.335 145709==by 0x4C3A0F: standard_ExecutorRun (execMain.c:363)
==00:00:01:36.335 145709==by 0x4C37FA: ExecutorRun (execMain.c:307)
==00:00:01:36.335 145709==
==00:00:01:36.335 145709== VALGRINDERROR-END

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b99336d3a5510aa10036844ef8d3a443694d5dec Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 3 Oct 2022 13:06:30 +
Subject: [PATCH v5] Avoid memory leaks during backups

Postgres currently can leak memory if a failure occurs during base
backup in do_pg_backup_start() or do_pg_backup_stop() or
perform_base_backup(). The palloc'd memory such as backup_state or
tablespace_map in xlogfuncs.c or basebackup.c, or tablespaceinfo in
perform_base_backup(), or any other, is left out, which may cause
memory bloat on the server eventually. To experience this issue,
run pg_basebackup with a --label name longer than 1024 characters and
observe memory with the watch command; the memory usage goes up. It looks
like the memory leak issue has been there for quite some time.

To solve the above problem, we create a separate memory context for
backups and add a cleanup callback in PostgresMain()'s error handling
code using sigsetjmp().

The design of the patch is as follows:

For backup functions or on-line backups, a new memory context is
created as a child of TopMemoryContext 

[Commitfest 2022-09] Date is Over.

2022-10-03 Thread Ibrar Ahmed
The date of the current commitfest is over; here is the final status of
the September 2022 commitfest.
There were 296 patches in the commitfest and 58 were committed.

Total: 296.
Needs review: 155.
Waiting on Author: 41.
Ready for Committer: 19.
Committed: 58.
Moved to next CF: 8.
Returned with Feedback: 5.
Rejected: 2.
Withdrawn: 8.

-- 
Ibrar Ahmed.


Re: PostgreSQL 15 GA release date

2022-10-03 Thread Ranier Vilela
Em seg., 3 de out. de 2022 às 09:40, Julien Rouhaud 
escreveu:

> On Mon, Oct 03, 2022 at 09:05:34AM -0300, Ranier Vilela wrote:
> > >Please let us know if you have any questions. We're excited that we are
> > >very close to officially releasing PostgreSQL 15.
> > Hi, forgive my ignorance.
> > What are the rules for a commit to be included in the release notes?
> > Does it need to be explicitly requested?
> >
> > Why is the commit 8cb2a22
> > <
> https://github.com/postgres/postgres/commit/8cb2a22bbb2cf4212482ac15021ceaa2e9c52209
> >
> > (bug fix), not included?
> > I think users of this function should be better informed.
>
> That commit has been backpatched, so it's not new in pg15 and will be in
> the
> release notes of the next minor versions.
>
Thanks Julien.

regards,
Ranier Vilela


Re: PostgreSQL 15 GA release date

2022-10-03 Thread Ranier Vilela
Em seg., 3 de out. de 2022 às 09:39, Justin Pryzby 
escreveu:

> On Mon, Oct 03, 2022 at 09:05:34AM -0300, Ranier Vilela wrote:
> > >Please let us know if you have any questions. We're excited that we are
> > >very close to officially releasing PostgreSQL 15.
> > Hi, forgive my ignorance.
> > What are the rules for a commit to be included in the release notes?
> > Does it need to be explicitly requested?
> >
> > Why is the commit 8cb2a22 (bug fix), not included?
>
> It's not included in the release notes for major version 15, since it
> was backpatched to v12, so it's not a "change in v15" (for example,
> someone upgrading from v14.6 will already have that change).
>
> It'll be included in the minor release notes, instead.
>
Thanks Justin for the clarification.

regards,
Ranier Vilela


Re: [BUG] parenting a PK constraint to a self-FK one (Was: Self FK oddity when attaching a partition)

2022-10-03 Thread Jehan-Guillaume de Rorthais
On Fri, 30 Sep 2022 16:11:09 -0700
Zhihong Yu  wrote:

> On Fri, Sep 30, 2022 at 3:30 PM Jehan-Guillaume de Rorthais 
> wrote:
...
> 
> +* Self-Foreign keys are ignored as the index was preliminary
> created
> 
> preliminary created -> primarily created

Thank you! This is fixed and rebased on current master branch in patches
attached.

Regards,
>From 34ee3a3737e36d440824db3e8eb7c8f802a7276a Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Fri, 30 Sep 2022 23:42:21 +0200
Subject: [PATCH 1/3] Fix bug between constraint parenting and self-FK

Function get_relation_idx_constraint_oid() assumed an index can
only be associated with zero or one constraint for a given relation.

However, if the relation has a self-foreign key, the index could
be referenced as enforcing two different constraints on the same
relation: the PK and the FK.

This led to a bug with partitioned tables where the self-FK could
become the parent of a partition's PK.
---
 src/backend/catalog/pg_constraint.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index bb65fb1e0a..ba7a8ff965 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -985,8 +985,8 @@ get_relation_constraint_attnos(Oid relid, const char *conname,
 }
 
 /*
- * Return the OID of the constraint associated with the given index in the
- * given relation; or InvalidOid if no such index is catalogued.
+ * Return the OID of the original constraint enforced by the given index
+ * in the given relation; or InvalidOid if no such index is catalogued.
  */
 Oid
 get_relation_idx_constraint_oid(Oid relationId, Oid indexId)
@@ -1011,6 +1011,18 @@ get_relation_idx_constraint_oid(Oid relationId, Oid indexId)
 		Form_pg_constraint constrForm;
 
 		constrForm = (Form_pg_constraint) GETSTRUCT(tuple);
+
+		/*
+		 * Self-Foreign keys are ignored as the index was primarily created
+		 * to enforce a PK or unique constraint in the first place.  This means
+		 * only a primary key, unique, or exclusion constraint can be
+		 * returned.
+		 */
+		if (constrForm->contype != CONSTRAINT_PRIMARY
+			&& constrForm->contype != CONSTRAINT_UNIQUE
+			&& constrForm->contype != CONSTRAINT_EXCLUSION)
+			continue;
+
 		if (constrForm->conindid == indexId)
 		{
 			constraintId = constrForm->oid;
-- 
2.37.3

>From 6c8d9595dc1b2b62e3cc5ce3bc1c970b6eedcbc2 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Fri, 30 Sep 2022 23:54:04 +0200
Subject: [PATCH 2/3] Fix bug where a self-FK was cloned twice on partitions

A table with a self-foreign key appears on both the referenced
and referencing sides of the constraint.

Because of this, when creating or attaching a new partition to
a table, the self-FK was cloned twice, by the CloneFkReferenced() and
CloneFkReferencing() functions.
---
 src/backend/commands/tablecmds.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d8a75d23c..d19d917571 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9968,6 +9968,9 @@ CloneForeignKeyConstraints(List **wqueue, Relation parentRel,
  * clone those constraints to the given partition.  This is to be called
  * when the partition is being created or attached.
  *
+ * Note that this ignores self-referenced FKs. Those are handled in
+ * CloneFkReferencing().
+ *
  * This recurses to partitions, if the relation being attached is partitioned.
  * Recursion is done by calling addFkRecurseReferenced.
  */
@@ -10047,10 +10050,14 @@ CloneFkReferenced(Relation parentRel, Relation partitionRel)
 		constrForm = (Form_pg_constraint) GETSTRUCT(tuple);
 
 		/*
-		 * As explained above: don't try to clone a constraint for which we're
-		 * going to clone the parent.
+		 * As explained above and in the function header:
+		 * - don't try to clone a constraint for which we're going to clone
+		 *   the parent.
+		 * - don't clone a self-referenced constraint here as it is handled
+		 *   on the CloneFkReferencing() side
 		 */
-		if (list_member_oid(clone, constrForm->conparentid))
+		if (list_member_oid(clone, constrForm->conparentid) ||
+			constrForm->conrelid == RelationGetRelid(parentRel))
 		{
 			ReleaseSysCache(tuple);
 			continue;
-- 
2.37.3

>From a08ed953e3cd559ea1fec78301cb72e611f9387e Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Sat, 1 Oct 2022 00:00:28 +0200
Subject: [PATCH 3/3] Add regression tests about self-FK with partitions

---
 src/test/regress/expected/constraints.out | 37 +++
 src/test/regress/sql/constraints.sql  | 31 +++
 2 files changed, 68 insertions(+)

diff --git a/src/test/regress/expected/constraints.out b/src/test/regress/expected/constraints.out
index e6f6602d95..49aa659330 100644
--- a/src/test/regress/expected/constraints.out

Re: PostgreSQL 15 GA release date

2022-10-03 Thread Julien Rouhaud
On Mon, Oct 03, 2022 at 09:05:34AM -0300, Ranier Vilela wrote:
> >Please let us know if you have any questions. We're excited that we are
> >very close to officially releasing PostgreSQL 15.
> Hi, forgive my ignorance.
> What are the rules for a commit to be included in the release notes?
> Does it need to be explicitly requested?
> 
> Why is the commit 8cb2a22 (bug fix), not included?
> I think users of this function should be better informed.

That commit has been backpatched, so it's not new in pg15 and will be in the
release notes of the next minor versions.




Re: PostgreSQL 15 GA release date

2022-10-03 Thread Justin Pryzby
On Mon, Oct 03, 2022 at 09:05:34AM -0300, Ranier Vilela wrote:
> >Please let us know if you have any questions. We're excited that we are
> >very close to officially releasing PostgreSQL 15.
> Hi, forgive my ignorance.
> What are the rules for a commit to be included in the release notes?
> Does it need to be explicitly requested?
> 
> Why is the commit 8cb2a22 (bug fix), not included?

It's not included in the release notes for major version 15, since it
was backpatched to v12, so it's not a "change in v15" (for example,
someone upgrading from v14.6 will already have that change).

It'll be included in the minor release notes, instead.

-- 
Justin




Re: Support load balancing in libpq

2022-10-03 Thread Jelte Fennema
I attached a new patch which does the following:
1. adds tap tests
2. adds random_seed parameter to libpq (required for tap tests)
3. frees conn->loadbalance in freePGConn
4. adds more expansive docs on the feature and its behaviour

Apart from bikeshedding on the name of the option, I think it's pretty good now.

> Isn't this exactly what connect_timeout is providing? In my tests, it
> worked exactly as I would expect it, i.e. after connect_timeout seconds,
> libpq was re-shuffling and going for another host.

Yes, this was the main purpose of multiple hosts previously. This patch
doesn't change that, and it indeed continues to work when enabling
load balancing too. I included this in the tap tests.

> I tested this some more, and found it somewhat surprising that at least
> when looking at it on a microscopic level, some hosts are chosen more
> often than the others for a while.

That does seem surprising, but it looks like it might simply be bad luck.
Did you compile with OpenSSL support? Otherwise, the strong random
source might not be used. 

> So it looks like it load-balances between pg1 and pg3, and not between
> the three IPs -  is this expected?
>
> If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP adress is
> hit roughly equally.
>
> So I guess this is how it should work, but in that case I think the
> documentation should be more explicit about what is to be expected if a
> host has multiple IP addresses or hosts are specified multiple times in
> the connection string.

Yes, this behaviour is expected. I tried to make that clearer in the newest
version of the docs.
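To illustrate the documented behaviour, here is a rough Python sketch (the helper and the host names are hypothetical, not libpq code) of why listing a host twice gives it twice the weight: the shuffle is over connection-string entries, not resolved IP addresses.

```python
import random

def pick_first_host(host_entries, rng):
    # Shuffle the connection-string entries, mimicking how load balancing
    # randomizes the order in which hosts are tried.  Each *entry* is
    # equally likely to come first, so "host=pg1,pg1,pg3" weights pg1 2:1.
    entries = list(host_entries)
    rng.shuffle(entries)
    return entries[0]

rng = random.Random(42)
counts = {"pg1": 0, "pg3": 0}
for _ in range(9000):
    counts[pick_first_host(["pg1", "pg1", "pg3"], rng)] += 1
# pg1 is listed twice, so it is chosen first roughly 2/3 of the time.
```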

> For the patch itself, I think it is better to use a more precise time
> function in libpq_prng_init or call it only once.
> Though it is a special corner case, imagine all the connection attempts in
> the first second will be seeded with the same

I agree that using microseconds would probably be preferable. But that seems
like a separate patch, since I took this initialization code from the
InitProcessGlobals function. Also, it shouldn't be a big issue in practice,
since usually the strong random source will be used.
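The collision window is easy to see in a sketch: with seconds-resolution seeding, every attempt started within the same second draws the same sequence. A hypothetical fallback seed mixing microseconds and the pid (not the actual libpq/InitProcessGlobals code) avoids most of that:

```python
import os
import time

def fallback_seed():
    # Combine wall-clock microseconds with the pid so that two connection
    # attempts started within the same second still get different seeds.
    # Hypothetical sketch; libpq prefers a strong random source
    # (e.g. OpenSSL) when one is available.
    micros = time.time_ns() // 1_000
    return micros ^ (os.getpid() << 32)

s1 = fallback_seed()
time.sleep(0.001)   # even 1 ms apart, the microsecond part already differs
s2 = fallback_seed()
```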

0001-Support-load-balancing-in-libpq.patch
Description: 0001-Support-load-balancing-in-libpq.patch


Re: problems with making relfilenodes 56-bits

2022-10-03 Thread Robert Haas
On Fri, Sep 30, 2022 at 8:20 PM Andres Freund  wrote:
> I think it'd be interesting to look at per-record-type stats between two
> equivalent workload, to see where practical workloads suffer the most
> (possibly with fpw=off, to make things more repeatable).

I would expect, and Dilip's results seem to confirm, the effect to be
pretty uniform: basically, nearly every record gets bigger by 4 bytes.
That's because most records contain at least one block reference, and
if they contain multiple block references, likely all but one will be
marked BKPBLOCK_SAME_REL, so we pay the cost just once.

Because of alignment padding, the practical effect is probably that
about half of the records get bigger by 8 bytes and the other half
don't get bigger at all. But I see no reason to believe that things
are any better or worse than that. Most interesting record types are
going to contain some kind of variable-length payload, so the chances
that a 4 byte size increase pushes you across a MAXALIGN boundary seem
to be no better or worse than fifty-fifty.
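The fifty-fifty claim can be checked mechanically. A small sketch (Python standing in for the C macro; MAXALIGN assumed to be 8, as on common 64-bit platforms):

```python
def maxalign(n, align=8):
    # Round n up to the next multiple of `align`, like PostgreSQL's
    # MAXALIGN macro on a platform whose maximum alignment is 8 bytes.
    return (n + align - 1) & ~(align - 1)

# Count how many record lengths actually grow after MAXALIGN when the
# header gets 4 bytes bigger: lengths of 1-4 (mod 8) absorb the increase
# in existing padding, while lengths of 5-7 and 0 (mod 8) cross a boundary.
grown = sum(1 for n in range(1, 1001) if maxalign(n + 4) > maxalign(n))
# grown == 500, i.e. exactly half of the lengths in 1..1000.
```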

> I think it'd be an OK tradeoff to optimize WAL usage for a few of the worst to
> pay off for 56bit relfilenodes. The class of problems foreclosed is large
> enough to "waste" "improvement potential" on this.

I thought about trying to buy back some space elsewhere, and I think
that would be a reasonable approach to getting this committed if we
could find a way to do it. However, I don't see a terribly obvious way
of making it happen. Trying to do it by optimizing specific WAL record
types seems like a real pain in the neck, because there's tons of
different WAL records that all have the same problem. Trying to do it
in a generic way makes more sense, and the fact that we have 2 padding
bytes available in XLogRecord seems like a place to start looking, but
the way forward from there is not clear to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




re: PostgreSQL 15 GA release date

2022-10-03 Thread Ranier Vilela
>Please let us know if you have any questions. We're excited that we are
>very close to officially releasing PostgreSQL 15.
Hi, forgive my ignorance.
What are the rules for a commit to be included in the release notes?
Does it need to be explicitly requested?

Why is the commit 8cb2a22 (bug fix), not included?
I think users of this function should be better informed.

regards,
Ranier Vilela


Re: Tracking last scan time

2022-10-03 Thread Dave Page
Hi

On Fri, 30 Sept 2022 at 18:58, Andres Freund  wrote:

> Hi,
>
> On 2022-09-30 17:58:31 +0200, Vik Fearing wrote:
> > On 9/7/22 12:03, Dave Page wrote:
> > > Here's a v4 patch. This reverts to using
> > > GetCurrentTransactionStopTimestamp() for the last_scan times, and will
> > > set xactStopTimestamp the first time
> GetCurrentTransactionStopTimestamp()
> > > is called, thus avoiding multiple gettimeofday() calls.
> > > SetCurrentTransactionStopTimestamp() is removed, as is use
> > > of xactStopTimestamp (except when resetting it to 0).
> >
> > This patch looks good to me and has much saner behavior than what it
> > replaces.
>
> I agree. However, it seems like a significant enough behavioural change
> that
> I'd rather commit it as a separate patch.  I agree with Vik's judgement
> that
> the patch otherwise is otherwise ready. Happy to do that split myself, or
> you
> can do it...
>

Thanks. It's just the changes in xact.c, so it doesn't seem like it would
cause you any more work either way, in which case, I'll leave it to you :-)

FYI, the OID I chose was simply the closest single value to those used for
the other related functions (e.g. pg_stat_get_numscans). Seemed like a good
way to use up one more random unused value, but I don't care if it gets
changed to the 8000+ range.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Allow logical replication to copy tables in binary format

2022-10-03 Thread Melih Mutlu
Hi Takamichi,

Thanks for your reviews.

I addressed your reviews, please find the attached patch.

On Fri, 16 Sep 2022 at 16:51, osumi.takami...@fujitsu.com wrote:

> (1) whitespace issues
>

Fixed

> (2) Suggestion to update another general description about the subscription
>
> Kindly have a look at doc/src/sgml/logical-replication.sgml.
>
> "The data types of the columns do not need to match,
> as long as the text representation of the data can be converted to the
> target type.
> For example, you can replicate from a column of type integer to a column
> of type bigint."
>
> With the patch, I think we have an impact about those descriptions
> since enabling the binary option for a subscription and executing the
> initial synchronization requires the same data types for binary format.
>
> I suggest that we update those descriptions as well.
>

You're right, this needs to be stated in the docs. Modified descriptions
accordingly.


> (3) shouldn't test that we fail expectedly with binary copy for different
> types ?
>
> How about having a test that we correctly fail with different data types
> between the publisher and the subscriber, for instance ?
>

Modified 002_types.pl test such that it now tests the replication between
different data types.
It's expected to fail if the binary is enabled, and succeed if not.


> (4) Error message of the binary format copy
>
> I've gotten below message from data type contradiction (between integer
> and bigint).
> Probably, this is unclear for the users to understand the direct cause
> and needs to be adjusted ?
> This might be a similar comment Euler mentioned in [1].
>
> 2022-09-16 11:54:54.835 UTC [4570] ERROR:  insufficient data left in
> message
> 2022-09-16 11:54:54.835 UTC [4570] CONTEXT:  COPY tab, line 1, column id
>

It's already unclear to users what the issue is if they're copying data
between different column types via the COPY command.
This issue comes from COPY, and logical replication just utilizes COPY.
I don't think it would make sense to adjust an error message from a
functionality that logical replication merely uses and has no direct
control over.
It might be better to do this in a separate patch. What do you think?


> (5) Minor adjustment of the test comment in 002_types.pl.
>
> +is( $result, $sync_result, 'check initial sync on subscriber');
> +is( $result_binary, $sync_result, 'check initial sync on subscriber in
> binary');
>
>  # Insert initial test data
>
> There are two same comments which say "Insert initial test data" in this
> file.
> We need to update them, one for the initial table sync and
> the other for the application of changes.
>

Fixed.

> I agree with the direction to support binary copy for v16 and later.
>
> IIUC, the binary format replication with different data types fails even
> during apply phase on HEAD.
> I thought that means, the upgrade concern only applies to a scenario that
> the user executes
> only initial table synchronizations between the publisher and subscriber
> and doesn't replicate any data at apply phase after that. I would say
> this isn't a valid scenario and your proposal makes sense.
>

No, logical replication in binary does not fail on apply phase if data
types are different.
The concern with upgrade (if data types are not the same) would be not
being able to create a new subscription with binary enabled or replicate
new tables added into publication.
Replication of tables from existing subscriptions would not be affected by
this change since they will already be in the apply phase, not tablesync.
Do you think this would still be an issue?


Thanks,
Melih


v3-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: Small miscellaneous fixes

2022-10-03 Thread Ranier Vilela
On Mon, 3 Oct 2022 at 05:01, Masahiko Sawada wrote:

> On Fri, Sep 30, 2022 at 9:08 AM Ranier Vilela  wrote:
> >
> > Hi.
> >
> > There are assorted fixes to the head branch.
> >
> > 1. Avoid useless reassigning var _logsegno
> (src/backend/access/transam/xlog.c)
> > Commit 7d70809 left a little oversight.
> > XLByteToPrevSeg and XLByteToSeg are macros, and both assign _logsegno.
> > So, the first assignment is lost and is useless.
> >
> > 2. Avoid retesting log_min_duration (src/backend/commands/analyze.c)
> > The log_min_duration has already been tested before and the second test
> > can be safely removed.
> >
> > 3. Avoid useless var declaration record (src/backend/utils/misc/guc.c)
> > The var record is never really used.
>
> Three changes look good to me.
>
Hi, thanks for reviewing this.


>
> >
> > 4. Fix declaration volatile signal var (src/bin/pgbench/pgbench.c)
> > Like how to commit 5ac9e86, this is a similar case.
>
> The same is true also for alarm_triggered in pg_test_fsync.c?
>
I don't think so.
If I understand the problem correctly, the failure can occur with true
signals provided by the OS.
In the case at hand, it seems to me more like an internal form of signal,
that is, simulated.
So bool works fine.

CF entry created:
https://commitfest.postgresql.org/40/3925/

regards,
Ranier Vilela


RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder,

Thank you for your interest in my patch! Your suggestions will be included
in a newer version.

> In other words, what is the trade-off for calling
> pgfdw_connection_check_internal() inside GetConnection() when we are about
> to use a "cached" connection? I think that might simplify the patch as well

If the checking function is called not periodically but from GetConnection(),
it means that the health of foreign servers will be checked only when remote
connections are used.
So the issue in the following workload described in [1] cannot be handled:

BEGIN --- remote operations --- local operations --- COMMIT

But, yes, I perfectly agree that it could simplify the code
because we can reduce the timer part. This is the second plan for this patch;
I may move to this approach if it is still useful.

> Can we have this function/logic on Postgres core, so that other extensions
> can also use?

I was not sure about any use-case, but I think it can be, since it does quite
general things.
Is there any good motivation to do that?

> What if PQsocket(conn) returns -1? Maybe we move certain connection state
> checks into pgfdw_connection_check_internal() such that it is more generic?
> I can think of checks like: conn!=NULL, PQsocket(conn) != PGINVALID_SOCKET,
> PQstatus == CONNECTION_OK

ereport(ERROR) will be thrown if PQsocket(conn) returns -1.
All of what you said should be handled here. I will modify it.

> Do you see any performance implications of creating/freeing waitEventSets
> per check? I wonder if we can somehow re-use the same waitEventSet by
> modifyWaitEvent? I guess no, but still, if this check causes a performance
> implication, can we somehow cache 1 waitEventSet per connection?

I have not tested it yet, but I agree this could cause a performance decrease.
In the next version I will re-use the event set anyway, and it can be
considered further later.
Actually I'm not sure about your suggestion, but you mean that we can add
a hash table that associates a PGconn with a WaitEventSet, right?

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58662809E678253B90E82CE5F5889%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PATCH] Add peer authentication TAP test

2022-10-03 Thread Drouvot, Bertrand

Hi,

On 10/3/22 9:46 AM, Michael Paquier wrote:

> On Fri, Sep 30, 2022 at 07:51:29PM +0200, Drouvot, Bertrand wrote:
>> Agree that it could be simplified, thanks for the hints!
>>
>> Attached a simplified version.
>
> While looking at that, I have noticed that it is possible to reduce
> the number of connection attempts (for example no need to re-test that
> the connection works when the map is not set, and the authn log would
> be the same with the map in place).

Yeah that's right, thanks for simplifying further.

> Note that a path's meson.build
> needs a refresh for any new file added into the tree, with 003_peer.pl
> missing so this new test was not running in the recent CI runs.  The
> indentation was also a bit wrong and I have tweaked a few comments,
> before finally applying it.


Thanks!

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,

Thanks for the patch. I think there are some non-fdw extensions out there
which could benefit from this logic. That's why I want to first learn some
more about high-level design/goals of the patch a little more.

> +/*
> + * Call callbacks for checking remote servers.
> + */
> +void
> +CallCheckingRemoteServersCallbacks(void)


Why do we run this periodically, but not when a specific connection is
going to be used? Wouldn't running it periodically prevent detecting some
already-closed sockets at the time of the connection used (e.g., we checked
10 seconds ago, the server died 5 seconds ago)?

In other words, what is the trade-off for calling
pgfdw_connection_check_internal() inside GetConnection() when we are about
to use a "cached" connection? I think that might simplify the patch as well.

> +/*
> + * Helper function for pgfdw_connection_check
> + */
> +static bool
> +pgfdw_connection_check_internal(PGconn *conn)
> +{


Can we have this function/logic on Postgres core, so that other extensions
can also use?

> + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);


What if PQsocket(conn) returns -1? Maybe we move certain connection state
checks into pgfdw_connection_check_internal() such that it is more generic?
I can think of checks like: conn!=NULL, PQsocket(conn) != PGINVALID_SOCKET,
PQstatus == CONNECTION_OK


> + eventset = CreateWaitEventSet(CurrentMemoryContext, 1);
> + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);
> +
> + WaitEventSetWait(eventset, 0, &events, 1, 0);
> +
> + if (events.events & WL_SOCKET_CLOSED)
> + {
> + FreeWaitEventSet(eventset);
> + return false;
> + }
> + FreeWaitEventSet(eventset);


Do you see any performance implications of creating/freeing waitEventSets
per check? I wonder if we can somehow re-use the same waitEventSet by
modifyWaitEvent? I guess no, but still, if this check causes a performance
implication, can we somehow cache 1 waitEventSet per connection?
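For readers following along, the semantics under discussion -- "is the remote end of this socket already closed?" -- can be approximated in a few lines of Python (a portable polling sketch, not the WaitEventSet/WL_SOCKET_CLOSED machinery, which needs kernel support that not every platform has):

```python
import selectors
import socket

def peer_closed(sock):
    # Non-blocking check: if the socket polls readable and a peek reads
    # zero bytes, the peer has closed the connection (EOF).  An idle but
    # healthy connection simply polls as not-readable.
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        if not sel.select(timeout=0):
            return False
        return sock.recv(1, socket.MSG_PEEK) == b""
    finally:
        sel.close()

a, b = socket.socketpair()
healthy = peer_closed(a)   # peer open and idle -> False
b.close()
dead = peer_closed(a)      # peer closed -> read side reports EOF -> True
a.close()
```

Note that creating and tearing down a selector per check, as above, mirrors the per-check WaitEventSet cost being questioned; caching one poller per connection would avoid it.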


Thanks,
Onder KALACI
Developing the Citus extension @Microsoft


Re: Improve description of XLOG_RUNNING_XACTS

2022-10-03 Thread Masahiko Sawada
On Mon, Oct 3, 2022 at 5:15 PM Michael Paquier  wrote:
>
> On Fri, Sep 16, 2022 at 10:55:53AM +0900, Kyotaro Horiguchi wrote:
> > Putting an arbitrary upper-bound on the number of subxids to print
> > might work? I'm not sure how we can determine the upper-bound, though.
>
> You could hardcode it so as it does not blow up the whole view, say
> 20~30.  Anyway, I agree with the concern raised upthread about the
> amount of extra data this would add to the output, so having at least
> the number of subxids would be better than the existing state of
> things telling only if the list overflowed.  So let's stick to
> that.

Why are only subtransaction information in XLOG_RUNNING_XACTS limited?
I think we have other information shown without bounds such as lock
information in XLOG_STANDBY_LOCK and invalidation messages in
XLOG_INVALIDATIONS.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Improve description of XLOG_RUNNING_XACTS

2022-10-03 Thread Michael Paquier
On Fri, Sep 16, 2022 at 10:55:53AM +0900, Kyotaro Horiguchi wrote:
> Putting an arbitrary upper-bound on the number of subxids to print
> might work? I'm not sure how we can determine the upper-bound, though.

You could hardcode it so as it does not blow up the whole view, say
20~30.  Anyway, I agree with the concern raised upthread about the
amount of extra data this would add to the output, so having at least
the number of subxids would be better than the existing state of
things telling only if the list overflowed.  So let's stick to
that.

Here is another idea for the list of subxids: show the full list of
subxids only when using --xid.  We could always introduce an extra
switch, but that does not seem worth the extra workload here.
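The capping idea is simple enough to sketch (a hypothetical formatting helper, not pg_waldump's actual output):

```python
def describe_subxids(subxids, limit=20):
    # Show at most `limit` subxids and summarize the remainder, so one
    # record carrying thousands of subtransactions cannot blow up the
    # whole description.
    shown = ", ".join(str(x) for x in subxids[:limit])
    hidden = len(subxids) - limit
    if hidden > 0:
        shown += f" ... ({hidden} more)"
    return f"{len(subxids)} subxids: {shown}"

short = describe_subxids([101, 102, 103])
longer = describe_subxids(list(range(1000, 1030)))
```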
--
Michael


signature.asc
Description: PGP signature


Re: Small miscellaneous fixes

2022-10-03 Thread Masahiko Sawada
On Fri, Sep 30, 2022 at 9:08 AM Ranier Vilela  wrote:
>
> Hi.
>
> There are assorted fixes to the head branch.
>
> 1. Avoid useless reassigning var _logsegno (src/backend/access/transam/xlog.c)
> Commit 7d70809 left a little oversight.
> XLByteToPrevSeg and XLByteToSeg are macros, and both assign _logsegno.
> So, the first assignment is lost and is useless.
>
> 2. Avoid retesting log_min_duration (src/backend/commands/analyze.c)
> The log_min_duration has already been tested before and the second test
> can be safely removed.
>
> 3. Avoid useless var declaration record (src/backend/utils/misc/guc.c)
> The var record is never really used.

Three changes look good to me.

>
> 4. Fix declaration volatile signal var (src/bin/pgbench/pgbench.c)
> Like how to commit 5ac9e86, this is a similar case.

The same is true also for alarm_triggered in pg_test_fsync.c?

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-10-03 Thread Kshetrapaldesi Tutika
I applied this patch in my local environment and would like to review the same
before 14-October-2022 with some test data.

postgres=# \d pg_stat_all_tables
                      View "pg_catalog.pg_stat_all_tables"
         Column          |           Type           | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
 relid                   | oid                      |           |          |
 schemaname              | name                     |           |          |
 relname                 | name                     |           |          |
 seq_scan                | bigint                   |           |          |
 seq_tup_read            | bigint                   |           |          |
 idx_scan                | bigint                   |           |          |
 idx_tup_fetch           | bigint                   |           |          |
 n_tup_ins               | bigint                   |           |          |
 n_tup_upd               | bigint                   |           |          |
 n_tup_del               | bigint                   |           |          |
 n_tup_hot_upd           | bigint                   |           |          |
 n_live_tup              | bigint                   |           |          |
 n_dead_tup              | bigint                   |           |          |
 n_mod_since_analyze     | bigint                   |           |          |
 n_ins_since_vacuum      | bigint                   |           |          |
 last_vacuum             | timestamp with time zone |           |          |
 last_autovacuum         | timestamp with time zone |           |          |
 last_analyze            | timestamp with time zone |           |          |
 last_autoanalyze        | timestamp with time zone |           |          |
 vacuum_count            | bigint                   |           |          |
 autovacuum_count        | bigint                   |           |          |
 analyze_count           | bigint                   |           |          |
 autoanalyze_count       | bigint                   |           |          |
 last_vacuum_index_scans | bigint                   |           |          |

postgres=# select version();
                                                 version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 15beta4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Andres,

> This seems to reliably fail on windows. See

Thanks for reporting. Actually this feature cannot be used on Windows machines.
To check the status of each socket that connects to the foreign server,
the socket event WL_SOCKET_CLOSED is used.
The event is only enabled on some OSes, and Windows is not one of them.

That part must be skipped if the system cannot use the event, but I was not
sure how to do that...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [PATCH] Add peer authentication TAP test

2022-10-03 Thread Michael Paquier
On Fri, Sep 30, 2022 at 07:51:29PM +0200, Drouvot, Bertrand wrote:
> Agree that it could be simplified, thanks for the hints!
> 
> Attached a simplified version.

While looking at that, I have noticed that it is possible to reduce
the number of connection attempts (for example no need to re-test that
the connection works when the map is not set, and the authn log would
be the same with the map in place).  Note that a path's meson.build
needs a refresh for any new file added into the tree, with 003_peer.pl
missing so this new test was not running in the recent CI runs.  The
indentation was also a bit wrong and I have tweaked a few comments,
before finally applying it.
--
Michael


signature.asc
Description: PGP signature


Re: [RFC] building postgres with meson - v13

2022-10-03 Thread samay sharma
Hi,

On Mon, Sep 26, 2022 at 6:02 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 24.09.22 20:09, Andres Freund wrote:
> > On 2022-09-24 13:52:29 -0400, Tom Lane wrote:
> >> ... btw, shouldn't the CF entry [1] get closed now?
> >
> > Unfortunately not - there's quite a few followup patches that haven't
> been
> > [fully] reviewed and thus not applied yet.
>
> Here is some review of the remaining ones (might not match exactly what
> you attached, I was working off your branch):
>
>
> 9f789350a7a7 meson: ci: wip: move compilerwarnings task to meson
>
> This sounds reasonable to me in principle, but I haven't followed the
> cirrus stuff too closely, and it doesn't say why it's "wip".  Perhaps
> others could take a closer look.
>
>
> ccf20a68f874 meson: ci: Add additional CI coverage
>
> IIUC, this is just for testing your branch, not meant for master?
>
>
> 02d84c21b227 meson: prereq: win: remove date from version number in
> win32ver.rc
>
> do it
>
>
> 5c42b3e7812e meson: WIP: Add some of the windows resource files
>
> What is the thinking on this now?  What does this change over the
> current state?
>
>
> 9bc60bccfd10 meson: Add support for relative rpaths, fixing tests on
> MacOS w/ SIP
>
> I suggest a separate thread and/or CF entry for this.  There have been
> various attempts to deal with SIP before, with varying results.  This
> is not part of the meson transition as such.
>
>
> 9f5be26c1215 meson: Add docs for building with meson
>
> I do like the overall layout of this.
>
> The "Supported Platforms" section should be moved back to near the end
> of the chapter.  I don't see a reason to move it forward, at least
> none that is related to the meson issue.
>

Agreed that it's unrelated to meson. However, I think it's better to move
it to the front, as it's generally useful to know whether your platform is
supported before you start performing the installation steps and get stuck
somewhere.

Do you think I should submit that as a separate commit in the same
patch-set or just move it out to a completely different patch submission?


>
> The changes to the "Getting the Source" section are also not
> appropriate for this patch.
>
>
Given that many developers are now using Git for downloading the source
code, I think it makes sense to be in the Getting the source section. Also,
meson today doesn't cleanly build via the tarballs. Hence, I added it to
the section (and patchset).

Do you think I should move this to a different patch?


> In the section "Building and Installation with meson":
>
> - Remove the "git clone" stuff.


> - The "Running tests" section should be moved to Chapter 33. Regression
> Tests.
>

The autoconf / make section also has a small section on how to run the
regression tests. The "Running tests" section is meant to be the equivalent
of that for meson (i.e. a brief overview). I do intend to add a detailed
section to Chapter 33 with more info on how to interpret test results etc.
Do you think the current section is too verbose for where it is?


> Some copy-editing will probably be suitable, but I haven't focused on
> that yet.
>
>
> 9c00d355d0e9 meson: Add PGXS compatibility
>
> This looks like a reasonable direction to me.  How complete is it?  It
> says it works for some extensions but not others.  How do we define
> the target line here?
>
>
> 3fd5e13dcad3 meson: Add postgresql-extension.pc for building extension
> libraries
>
> Separate thread for this as well.  This is good and important, but we
> must also add it to the make build.
>
>
> 4b5bfa1c19aa meson: Add LLVM bitcode emission
>
> still in progress
>
>
> eb40f6e53104 meson: Add support for building with precompiled headers
>
> Any reason not to enable this by default?  The benefits on non-Windows
> appear to be less dramatic, but they are not zero.  Might be better to
> enable it consistently so that for example any breakage is easier
> caught.
>
>
> 377bfdea6042 meson: Add xmllint/xsltproc wrapper script to handle
> dependencies automatically
>
> Is this part of the initial transition, required for correctness, or
> is it an optional optimization?  Could use more explanation.  Maybe
> move to separate thread also?
>
>


Re: pg_upgrade test failure

2022-10-03 Thread Michael Paquier
On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote:
> So I think that setting is_lnk = false is good enough here.  Do
> you see a hole in it?

I cannot think on one, on top of my head.  Thanks for the
explanation.
--
Michael


signature.asc
Description: PGP signature