Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Thomas Munro
On Tue, Mar 21, 2023 at 10:59 PM Alvaro Herrera  wrote:
> I gave a talk on Friday at a private EDB mini-conference about the
> PostgreSQL open source process; and while preparing for that one, I
> ran some 'git log' commands to obtain the number of code contributors
> for each release, going back to 9.4 (when we started using the
> 'Authors:' tag more prominently).  What I saw is a decline in the number
> of unique contributors, from its maximum at version 12, down to the
> numbers we had in 9.5.  We went back 4 years.  That scared me a lot.

Can you share the subtotals?

One immediate thought about commit log-based data is that we're not
using git Author, and the Author footer convention is only used by
some committers.  So I guess it must have been pretty laborious to
read the prose-form data?  We do have machine-readable Discussion
footers though.  By scanning those threads for SMTP From headers on
messages that had patches attached, we can find the set of (distinct)
addresses that contributed to each commit.  (I understand that some
people are co-authors and may not send an email, but if you counted
those and I didn't then you counted more, not fewer, contributors I
guess?  On the other hand if someone posted a patch that wasn't used
in the commit, or posted from two home/work/whatever accounts that's a
false positive for my technique.)

In a quick and dirty attempt at this made from bits of Python I
already had lying around (which may of course later turn out to be
flawed and need refinement), I extracted, for example:

postgres=# select * from t where commit = '8d578b9b2e37a4d9d6f422ced5126acec62365a7';
                  commit                  |          time          |     address
------------------------------------------+------------------------+------------------
 8d578b9b2e37a4d9d6f422ced5126acec62365a7 | 2023-03-21 14:29:34+13 | Melanie Plageman
 8d578b9b2e37a4d9d6f422ced5126acec62365a7 | 2023-03-21 14:29:34+13 | Thomas Munro
(2 rows)

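For reference, the ad-hoc table "t" that these queries run against is
shaped roughly like this (a minimal sketch; the column types are
assumptions):

CREATE TABLE t (
    commit  text,         -- commit hash, from 'git log'
    time    timestamptz,  -- commit timestamp
    address text          -- From: address of a patch-bearing message
);
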
You can really only go back about 5-7 years before that technique runs
out of steam, as the links run out.  For what they're worth, these
numbers seem to suggest that around 260 distinct email addresses send
patches to threads referenced by commits each year.  Maybe we're in a
3-year plateau, but I don't see a peak back at version 12:

postgres=# select date_trunc('year', time), count(distinct address)
from t group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2015-01-01 00:00:00+13 |    13
 2016-01-01 00:00:00+13 |    37
 2017-01-01 00:00:00+13 |   144
 2018-01-01 00:00:00+13 |   187
 2019-01-01 00:00:00+13 |   225
 2020-01-01 00:00:00+13 |   260
 2021-01-01 00:00:00+13 |   256
 2022-01-01 00:00:00+13 |   262
 2023-01-01 00:00:00+13 |   119
(9 rows)

Of course 2023 is only just getting started.  Zooming in closer, the
peak period for this measurement is March/April, as I guess a lot of
little things make it into the final push:

postgres=# select date_trunc('month', time), count(distinct address)
from t where time > '2021-01-01' group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2021-01-01 00:00:00+13 |    83
 2021-02-01 00:00:00+13 |    70
 2021-03-01 00:00:00+13 |   100
 2021-04-01 00:00:00+13 |   109
 2021-05-01 00:00:00+12 |    54
 2021-06-01 00:00:00+12 |    82
 2021-07-01 00:00:00+12 |    86
 2021-08-01 00:00:00+12 |    83
 2021-09-01 00:00:00+12 |    73
 2021-10-01 00:00:00+13 |    68
 2021-11-01 00:00:00+13 |    66
 2021-12-01 00:00:00+13 |    48
 2022-01-01 00:00:00+13 |    68
 2022-02-01 00:00:00+13 |    73
 2022-03-01 00:00:00+13 |   110
 2022-04-01 00:00:00+13 |    90
 2022-05-01 00:00:00+12 |    47
 2022-06-01 00:00:00+12 |    50
 2022-07-01 00:00:00+12 |    72
 2022-08-01 00:00:00+12 |    81
 2022-09-01 00:00:00+12 |   105
 2022-10-01 00:00:00+13 |    68
 2022-11-01 00:00:00+13 |    74
 2022-12-01 00:00:00+13 |    58
 2023-01-01 00:00:00+13 |    65
 2023-02-01 00:00:00+13 |    61
 2023-03-01 00:00:00+13 |    64
(27 rows)

Perhaps the present March is looking a little light compared to the
usual 100+ number, but if you take just the 1st to the 21st of previous
Marches, they show similar sorts of numbers:

postgres=# select date_trunc('month', time), count(distinct address)
   from t
   where (time >= '2022-03-01' and time <= '2022-03-21') or
         (time >= '2021-03-01' and time <= '2021-03-21') or
         (time >= '2020-03-01' and time <= '2020-03-21') or
         (time >= '2019-03-01' and time <= '2019-03-21')
   group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2019-03-01 00:00:00+13 |    57
 2020-03-01 00:00:00+13 |    57
 2021-03-01 00:00:00+13 |    77
 2022-03-01 00:00:00+13 |    72
(4 rows)

Another thing we could count is distinct names in the Commitfest app.
I count 162 names in Commitfest 42 today.  Unfortunately I don't have
the data to 

Re: Error "initial slot snapshot too large" in create replication slot

2023-03-21 Thread Kyotaro Horiguchi
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" wrote:
> On Wed, 12 Oct 2022 at 01:10, Michael Paquier  wrote:
> > The discussion seems to have stopped here.  As this is classified as a
> > bug fix, I have moved this patch to next CF, waiting on author for the
> > moment.
> 
> Kyotaro Horiguchi, any chance you'll be able to work on this for this
> commitfest? If so shout (or anyone else is planning to push it over
> the line Andres?) otherwise I'll move it on to the next release.

Ugh, sorry for being lazy.  I have lost track of the conversation. I'm
currently working on this and will come back soon with a new version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Initial Schema Sync for Logical Replication

2023-03-21 Thread Amit Kapila
On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada  wrote:
>
> On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira  wrote:
> >
> > > You should
> > > exclude them by removing these objects from the TOC before running
> > > pg_restore or by adding a few pg_dump options to exclude these objects.
> > > Another issue is related to different versions. Let's say the publisher
> > > has a version ahead of the subscriber version, a new table syntax can
> > > easily break your logical replication setup. IMO pg_dump doesn't seem
> > > like a good solution for initial synchronization.
> > >
> > > Instead, the backend should provide infrastructure to obtain the
> > > required DDL commands for the specific (set of) tables. This can work
> > > around the issues from the previous paragraph:
> > >
> > ...
> > > * don't need to worry about different versions.
> > >
> >
> > AFAICU some of the reasons why pg_dump is not allowed to dump from the
> > newer version are as follows: (a) there could be more columns in the
> > newer version of the system catalog and then Select * type of stuff
> > won't work because the client won't have knowledge of additional
> > columns. (b) the newer version could have new features (represented by
> > say new columns in existing catalogs or new catalogs) that the older
> > version of pg_dump has no knowledge of and will fail to get that data
> > and hence produce an inconsistent dump. The subscriber could easily end
> > up out of sync because of that.
> >
> > Now, how do we avoid these problems even if we have our own version of
> > functionality similar to pg_dump for selected objects? I guess we will
> > face similar problems.
>
> Right. I think that such functionality needs to return DDL commands
> that can be executed on the requested version.
>
> > If so, we may need to deny schema sync in any such case.
>
> Yes. Do we have any concrete use case where the subscriber is an older
> version, in the first place?
>

As per my understanding, it is mostly because it can work today. During
an off-list discussion with Jonathan on this point today, he pointed me
to a similar incompatibility in MySQL replication; see the "SQL
incompatibilities" section in the docs[1]. Also, please note that this
applies not only to the initial sync but also to schema sync during
replication. I don't think it would be feasible to keep such
cross-version compatibility for DDL replication.

Having said that, I don't intend to insist that we use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this
stage is primarily to write a POC patch to see what difficulties we may
face. The other options that we could try out are (a) duplicating parts
of the pg_dump code in some way (by extracting the required code) for
the subscription's initial sync, or (b) having common code (probably as
a library or some other way) for the required functionality. There could
be more possibilities that we haven't thought of yet. But the main point
is that for approaches other than using pg_dump, we should consider ways
to avoid duplicating various parts of its code. Because of this, I think
that before ruling out pg_dump, we should be clear about its risks and
limitations.

Thoughts?

[1] - https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html
[2] - 
https://www.postgresql.org/message-id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 09:34:14 -0700, Andres Freund wrote:
> On 2023-03-21 11:33:59 -0400, Robert Haas wrote:
> > That feels like it would be slightly more rational behavior,
> > but I'm not smart enough to guess whether anyone would actually be
> > happier (or less happy) after such a change than they are now.
> 
> Yea, I'm not either. The current behaviour does have the feature that it will
> read in some data for each table, but limits trashing of shared buffers for
> huge tables. That's good if your small to medium sized source database isn't
> in s_b, because the next CREATE DATABASE has a chance of not needing to read
> the data again. But if you have a source database with lots of small
> relations, it can easily lead to swamping s_b.

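For reference, the command shape under discussion (hypothetical database
and template names):

CREATE DATABASE newdb
    TEMPLATE template_with_many_rels
    STRATEGY wal_log;
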
Patch with the two minimal fixes attached. As we don't know whether it's worth
changing the strategy, the more minimal fixes seem more appropriate.

Greetings,

Andres Freund
>From caa172f4faeb4a1c2f8c03d9c31f15d91c97dde3 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 20 Mar 2023 21:57:40 -0700
Subject: [PATCH v1] Fix memory leak and inefficiency in CREATE DATABASE ...
 STRATEGY WAL_LOG

RelationCopyStorageUsingBuffer() did not free the strategies used to access
the source / target relation. The memory was released at the end of the
transaction, but when using a template database with a lot of relations, the
temporary leak can become prohibitively big.

RelationCopyStorageUsingBuffer() acquired the buffer for the target relation
with RBM_NORMAL, therefore requiring a read of a block guaranteed to be
zero. Use RBM_ZERO_AND_LOCK instead.

Discussion: https://postgr.es/m/20230321070113.o2vqqxogjykwg...@awork3.anarazel.de
---
 src/backend/storage/buffer/bufmgr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0a05577b68d..95212a39416 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3833,11 +3833,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
 		srcPage = BufferGetPage(srcBuf);
 
-		/* Use P_NEW to extend the destination relation. */
 		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
-		   RBM_NORMAL, bstrategy_dst,
+		   RBM_ZERO_AND_LOCK, bstrategy_dst,
 		   permanent);
-		LockBuffer(dstBuf, BUFFER_LOCK_EXCLUSIVE);
 		dstPage = BufferGetPage(dstBuf);
 
 		START_CRIT_SECTION();
@@ -3855,6 +3853,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		UnlockReleaseBuffer(dstBuf);
 		UnlockReleaseBuffer(srcBuf);
 	}
+
+	FreeAccessStrategy(bstrategy_src);
+	FreeAccessStrategy(bstrategy_dst);
 }
 
 /* -
-- 
2.38.0



Re: Data is copied twice when specifying both child and parent table in publication

2023-03-21 Thread Peter Smith
Here are some review comments for patch code of HEAD_v19-0001

==
doc/src/sgml/ref/create_publication.sgml

1.
+ 
+  There can be a case where a subscription combines multiple
+  publications. If a root partitioned table is published by any
+  subscribed publications which set
+  publish_via_partition_root = true, changes on this
+  root partitioned table (or on its partitions) will be published using
+  the identity and schema of this root partitioned table rather than
+  that of the individual partitions.
+ 

1a.
The paragraph prior to this one just refers to "partitioned tables"
instead of "root partitioned table", so IMO we should continue with
the same terminology.

I also modified the remaining text slightly. AFAIK my suggestion
conveys exactly the same information but is shorter.

SUGGESTION
There can be a case where one subscription combines multiple
publications. If any of those publications has set
publish_via_partition_root = true, then changes in
a partitioned table (or on its partitions) will be published using the
identity and schema of the partitioned table.

~

1b.
Shouldn't that paragraph (or possibly somewhere in the CREATE
SUBSCRIPTION notes?) also explain that in this scenario the logical
replication will only publish one set of changes? After all, this is
the whole point of the patch, but I am not sure if the user will know
of this behaviour from the current documentation.

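To make the scenario concrete (hypothetical object names):

CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE parent_p1 PARTITION OF parent FOR VALUES FROM (0) TO (10);

CREATE PUBLICATION pub_root FOR TABLE parent
    WITH (publish_via_partition_root = true);
CREATE PUBLICATION pub_part FOR TABLE parent_p1;

-- A subscription combining both publications should publish each change
-- only once, using the identity and schema of "parent".
CREATE SUBSCRIPTION sub
    CONNECTION 'host=... dbname=...'  -- placeholder connection string
    PUBLICATION pub_root, pub_part;
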
==
src/backend/catalog/pg_publication.c

2. filter_partitions

BEFORE:
static void
filter_partitions(List *table_infos)
{
    ListCell   *lc;

    foreach(lc, table_infos)
    {
        bool        skip = false;
        List       *ancestors = NIL;
        ListCell   *lc2;
        published_rel *table_info = (published_rel *) lfirst(lc);

        if (get_rel_relispartition(table_info->relid))
            ancestors = get_partition_ancestors(table_info->relid);

        foreach(lc2, ancestors)
        {
            Oid         ancestor = lfirst_oid(lc2);

            /* Is ancestor exists in the published table list? */
            if (is_ancestor_member_tableinfos(ancestor, table_infos))
            {
                skip = true;
                break;
            }
        }

        if (skip)
            table_infos = foreach_delete_current(table_infos, lc);
    }
}

~

2a.
My previous review [1] (see #1) suggested putting most code within the
condition. AFAICT my comment is still applicable but has not yet been
addressed.

2b.
IMO the comment "/* Is ancestor exists in the published table list? */"
is unnecessary because the purpose of the function named
"is_ancestor_member_tableinfos" is already clear.


SUGGESTION
static void
filter_partitions(List *table_infos)
{
    ListCell   *lc;

    foreach(lc, table_infos)
    {
        published_rel *table_info = (published_rel *) lfirst(lc);

        if (get_rel_relispartition(table_info->relid))
        {
            bool        skip = false;
            ListCell   *lc2;
            List       *ancestors = get_partition_ancestors(table_info->relid);

            foreach(lc2, ancestors)
            {
                Oid         ancestor = lfirst_oid(lc2);

                if (is_ancestor_member_tableinfos(ancestor, table_infos))
                {
                    skip = true;
                    break;
                }
            }

            if (skip)
                table_infos = foreach_delete_current(table_infos, lc);
        }
    }
}

==
src/backend/commands/subscriptioncmds.c

3.
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
 {
  WalRcvExecResult *res;
- StringInfoData cmd;
+ StringInfoData cmd,
+ pub_names;
  TupleTableSlot *slot;
  Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List       *tablelist = NIL;
- bool check_columnlist = (walrcv_server_version(wrconn) >= 15);
+ int server_version = walrcv_server_version(wrconn);
+ bool check_columnlist = (server_version >= 15);
+
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

- /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ /* Get the list of tables from the publisher. */
+ if (server_version >= 16)
+ {

~

I think the 'pub_names' is only needed within that ">= 16" condition.

So all of the code below can be moved into that scope, can't it?

+ StringInfoData pub_names;
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

+ pfree(pub_names.data);

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuNsvO9o9XzeJuSLsAsndgCKVphDPBRqYuOTy2bR28E%2Bg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-21 Thread Michael Paquier
On Wed, Mar 22, 2023 at 11:37:03AM +0900, Kyotaro Horiguchi wrote:
> In the original description, "buffer fetches" appears to be a plural
> form of a compound noun and correct, similar to "buffer hits"
> mentioned later. If we reword it, I think it should be "number of
> buffers fetched".

Using the plural makes sense, yes.
--
Michael


signature.asc
Description: PGP signature


Re: Using AF_UNIX sockets always for tests on Windows

2023-03-21 Thread Thomas Munro
Thanks Tom, Andres, Juan José, Andrew for your feedback.  I agree that
it's a better "OS harmonisation" outcome if we can choose both ways on
both OSes.  I will leave the 0003 patch aside for now due to lack of
time, but here's a rebase of the first two patches.  Since this is
really just more cleanup-obsolete-stuff background work, I'm going to
move it to the next CF.

0001 -- Teach mkdtemp() to care about permissions on Windows.
Reviewed by Juan José, who confirmed my blind-coded understanding and
showed an alternative version but didn't suggest doing it that way.
It's a little confusing that NULL "attributes" means something
different from NULL "descriptor", so I figured I should highlight that
difference more clearly with some new comments.  I guess one question
is "why should we expect the calling process to have the default
access token?"

0002 -- Use AF_UNIX for pg_regress.  This one removes a couple of
hundred Windows-only lines that set up SSPI stuff, and some comments
about a signal handling hazard that -- as far as I can tell by reading
manuals -- was never really true.

0003 -- TAP testing adjustments needs some more work based on feedback
already given, not included in this revision.
From e7fac7a15ed0eda6516e7fa0917c06e005341b00 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 7 Sep 2022 07:35:11 +1200
Subject: [PATCH v3 1/2] Make mkdtemp() more secure on Windows.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Our POSIX mkdtemp() implementation in src/port/mkdtemp.c would create
directories with default permissions on Windows.  Fix that by using the
native Windows API instead of mkdir().

This function is currently used by pg_regress's make_temp_sockdir().

Reviewed-by: Juan José Santamaría Flecha 
Discussion: https://postgr.es/m/CA%2BhUKGK30uLx9dpgkYwomgH0WVLUHytkChDgf3iUM2zp0pf_nA%40mail.gmail.com
---
 src/port/mkdtemp.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/src/port/mkdtemp.c b/src/port/mkdtemp.c
index 4578e8384c..9d3c4fce71 100644
--- a/src/port/mkdtemp.c
+++ b/src/port/mkdtemp.c
@@ -187,8 +187,35 @@ GETTEMP(char *path, int *doopen, int domkdir)
 		}
 		else if (domkdir)
 		{
+#ifdef WIN32
+			/*
+			 * Plain mkdir(path, 0700) would ignore the mode argument, so
+			 * we'll use the native Windows API to create the directory.  By
+			 * setting lpSecurityDescriptor to NULL, we get "the default
+			 * security descriptor associated with the access token of the
+			 * calling process.  [...]  By default, the default DACL in the
+			 * access token of a process allows access only to the user
+			 * represented by the access token."
+			 *
+			 * Note that a NULL lpSecurityDescriptor is not the same as a NULL
+			 * lpSecurityAttributes argument.  The latter would mean that the
+			 * ACL is inherited from the parent directory, which would
+			 * probably work out the same if it's the TMP directory, but by a
+			 * different route.
+			 */
+			SECURITY_ATTRIBUTES sa = {
+.nLength = sizeof(SECURITY_ATTRIBUTES),
+.lpSecurityDescriptor = NULL,
+.bInheritHandle = false
+			};
+
+			if (CreateDirectory(path, &sa))
+return 1;
+			_dosmaperr(GetLastError());
+#else
 			if (mkdir(path, 0700) >= 0)
 return 1;
+#endif
 			if (errno != EEXIST)
 return 0;
 		}
-- 
2.39.2

From 43ae66dc965d807dded8d434b7a1ea0b3f12e986 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 19 Aug 2022 11:28:38 +1200
Subject: [PATCH v3 2/2] Always use AF_UNIX sockets in pg_regress on Windows.

Since we can now rely on AF_UNIX sockets working on supported Windows
versions (10+), we can remove some Windows-only code for setting up
secure TCP in pg_regress.

Previously, we thought the socket cleanup code was unsafe, so we made
Unix-domain sockets an option with a "use-at-your-own-risk" note.  On
closer inspection, the concerns about signal handlers don't seem to
apply here.  (initdb.c has similar concerns but needs separate
investigation.)

Previously, commit f6dc6dd5 secured temporary installations using TCP/IP
on Windows, while commit be76a6d3 used file system permissions on Unix.
Now that our src/port/mkdtemp.c file creates non-world-accessible
directories on Windows, we can just use file system permissions there too.

Note that this doesn't affect the TAP tests, which continue to use
TCP/IP on Windows by default.

Discussion: https://postgr.es/m/CA%2BhUKGK30uLx9dpgkYwomgH0WVLUHytkChDgf3iUM2zp0pf_nA%40mail.gmail.com
---
 src/test/regress/pg_regress.c | 274 --
 1 file changed, 32 insertions(+), 242 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 7b23cc80dc..9c3251ca3b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -55,6 +55,20 @@ char	   *host_platform = HOST_TUPLE;
 static char *shellprog = SHELLPROG;
 #endif
 
+/*
+ * The name of the environment variable that controls where we put temporary
+ * files, to override 

Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Greg Stark
So, a week later:

 Status summary:            March 15   March 22
   Needs review:                 152        128
   Waiting on Author:             42         36
   Ready for Committer:           39         32
   Committed:                     61         82
   Moved to next CF:               4         15
   Withdrawn:                     17         16 (?)
   Rejected:                       0          5
   Returned with Feedback:         4          5
 Total: 319.


These patches that are "Needs Review" and have received no comments at
all since before March 1st are listed below. There are about 20 fewer
such patches than there were last week.

No emails since August-December 2022:

* New hooks in the connection path
* Add log messages when replication slots become active and inactive
* Remove dead macro exec_subplan_get_plan
* Consider parallel for LATERAL subqueries having LIMIT/OFFSET
* pg_rewind WAL deletion pitfall
* Simplify find_my_exec by using realpath(3)
* Move backup-related code to xlogbackup.c/.h
* Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
showing metadata ...)
* Fix bogus error emitted by pg_recvlogical when interrupted
* Check consistency of GUC defaults between .sample.conf and
pg_settings.boot_val
* Code checks for App Devs, using new options for transaction behavior
* Lockless queue of waiters based on atomic operations for LWLock
* Fix assertion failure with next_phase_at in snapbuild.c
* Add sortsupport for range types and btree_gist
* asynchronous execution support for Custom Scan
* CREATE INDEX CONCURRENTLY on partitioned table
* Partial aggregates push down
* Non-replayable WAL records through overflows and >MaxAllocSize lengths

No emails since January 2023

* Enable jitlink as an alternative jit linker of legacy Rtdyld and add
riscv jitting support
* basebackup: support zstd long distance matching
* pgbench - adding pl/pgsql versions of tests
* Function to log backtrace of postgres processes
* More scalable multixacts buffers and locking
* COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
* postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
* Add semi-join pushdown to postgres_fdw
* Skip replicating the tables specified in except table option
* Post-special Page Storage TDE support
* Direct I/O (developer-only feature)
* Improve doc for autovacuum on partitioned tables
* Set arbitrary GUC options during initdb
* An attempt to avoid
locally-committed-but-not-replicated-to-standby-transactions in
synchronous replication
* Check lateral references within PHVs for memoize cache keys
* monitoring usage count distribution
* Reduce wakeup on idle for bgwriter & walwriter for >5s
* Report the query string that caused a memory error under Valgrind

No emails since February 2023

* New [relation] options engine
* possibility to take name, signature and oid of currently executed
function in GET DIAGNOSTICS statement
* Named Operators
* nbtree performance improvements through specialization on key shape
* Fix assertion failure in SnapBuildInitialSnapshot()
* Speed up releasing of locks
* Improve pg_bsd_indent's handling of multiline initialization expressions
* User functions for building SCRAM secrets
* Refactoring postgres_fdw/connection.c
* Add pg_stat_session
* Doc: Improve note about copying into postgres_fdw foreign tables in batch
* archive modules loose ends
* Fix dsa_free() to re-bin segment
* Reduce timing overhead of EXPLAIN ANALYZE using rdtsc
* clean up permission checks after 599b33b94
* Some revises in adding sorting path
* ResourceOwner refactoring
* Fix the description of GUC "max_locks_per_transaction" and
"max_pred_locks_per_transaction" in guc_table.c
* some namespace.c refactoring
* Add function to_oct
* Switching XLog source from archive to streaming when primary available
* Dynamic result sets from procedures
* BRIN - SK_SEARCHARRAY and scan key preprocessing
* Reuse Workers and Replication Slots during Logical Replication




Re: Kerberos delegation support in libpq and postgres_fdw

2023-03-21 Thread Greg Stark
The CFBot says there's a function be_gssapi_get_proxy() which is
undefined. Presumably this is a missing #ifdef or a definition that
should be outside an #ifdef.

[14:05:21.532] dblink.c: In function ‘dblink_security_check’:
[14:05:21.532] dblink.c:2606:38: error: implicit declaration of
function ‘be_gssapi_get_proxy’ [-Werror=implicit-function-declaration]
[14:05:21.532] 2606 | if (PQconnectionUsedGSSAPI(conn) &&
be_gssapi_get_proxy(MyProcPort))
[14:05:21.532] | ^~~
[14:05:21.532] cc1: all warnings being treated as errors

[13:56:28.789] dblink.c.obj : error LNK2019: unresolved external
symbol be_gssapi_get_proxy referenced in function dblink_connstr_check
[13:56:29.040] contrib\dblink\dblink.dll : fatal error LNK1120: 1
unresolved externals

On Mon, 20 Mar 2023 at 09:30, Stephen Frost  wrote:
>
> Greetings,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> > > On Mon, Sep 19, 2022 at 02:05:39PM -0700, Jacob Champion wrote:
> > > > It's not prevented, because a password is being used. In my tests I'm
> > > > connecting as an unprivileged user.
> > > >
> > > > You're claiming that the middlebox shouldn't be doing this. If this new
> > > > default behavior were the historical behavior, then I would have agreed.
> > > > But the cat's already out of the bag on that, right? It's safe today.
> > > > And if it's not safe today for some other reason, please share why, and
> > > > maybe I can work on a patch to try to prevent people from doing it.
> > >
> > > Please note that this has been marked as returned with feedback in the
> > > current CF, as this has remained unanswered for a bit more than three
> > > weeks.
> >
> > There's some ongoing discussion about how to handle outbound connections
> > from the server ending up picking up credentials from the server's
> > environment (that really shouldn't be allowed unless specifically asked
> > for..), that's ultimately an independent change from what this patch is
> > doing.
>
> That got committed, which is great, though it didn't go quite as far as
> I had been hoping regarding dealing with outbound connections from the
> server- perhaps we should make it clear at least for postgres_fdw that
> it might be good for administrators to explicitly say which options are
> allowed for a given user-map when it comes to how authentication is
> done to the remote server?  Seems like mostly a documentation
> improvement, I think?  Or should we have some special handling around
> that option for postgres_fdw/dblink?
>
> > Here's an updated version which does address Robert's concerns around
> > having this disabled by default and having options on both the server
> > and client side saying if it is to be enabled or not.  Also added to
> > pg_stat_gssapi a field that indicates if credentials were proxied or not
> > and made some other improvements and added additional regression tests
> > to test out various combinations.
>
> I've done some self-review and also reworked how the security checks are
> done to be sure that we're not ending up pulling credentials from the
> environment (with added regression tests to check for it too).  If
> there's remaining concerns around that, please let me know.  Of course,
> other review would be great also.  Presently though:
>
> - Rebased up to today
> - Requires explicitly being enabled on client and server
> - Authentication to a remote server via dblink or postgres_fdw with
>   GSSAPI requires that credentials were proxied by the client to the
>   server, except if the superuser set 'password_required' to false on
>   the postgres_fdw (which has lots of caveats around it in the
>   documentation because it's inherently un-safe to do).
> - Includes updated documentation
> - Quite a few additional regression tests to check for unrelated
>   credentials coming from the environment in either cases where
>   credentials have been proxied and in cases where they haven't.
> - Only changes to existing regression tests for dblink/postgres_fdw are
>   in the error message wording updates.
>
> Thanks!
>
> Stephen



-- 
greg




RE: Allow logical replication to copy tables in binary format

2023-03-21 Thread shiy.f...@fujitsu.com
On Wed Mar 22, 2023 7:29 AM Peter Smith  wrote:
> 
> Thanks for all the patch updates. Patch v19 LGTM.
> 

+1

Regards,
Shi Yu


Re: [PATCH] Optional OR REPLACE in CREATE OPERATOR statement

2023-03-21 Thread Gregory Stark (as CFM)
On Tue, 5 Jul 2022 at 11:29, Tom Lane  wrote:
>
> No, that's not acceptable.  CREATE OR REPLACE should always produce
> exactly the same final state of the object, but in this case we cannot
> change the underlying function if the operator already exists.

It sounds like this patch isn't the direction to go in. I don't know
if IF NOT EXISTS is better but that design discussion should probably
happen after this commitfest.

I'm sorry but I guess I'll mark this patch Rejected.

-- 
Gregory Stark
As Commitfest Manager




RE: Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?

2023-03-21 Thread Hayato Kuroda (Fujitsu)
Dear Daniel, Tom,

> > On 20 Mar 2023, at 15:31, Tom Lane  wrote:
> >
> > "Hayato Kuroda (Fujitsu)"  writes:
> >> While checking the documentation, I found that one line notes our product
> >> as "<productname>PostgreSQL</productname>", whereas another line notes it
> >> as just "PostgreSQL".
> >
> > IMO the convention is to use the <productname> tag everywhere that we
> > spell out "PostgreSQL".  I don't think it's actually rendered differently
> > with our current stylesheets, but maybe someday it will be.
> 
> IIRC the main use in DocBook is for automatically decorating productnames with
> trademark signs etc, and to generate lists of trademarks, but also that they
> can be rendered differently.

OK, I understood that even if the string is not rendered differently, it
should still be tagged as <productname>.

> IIRC the main use in DocBook is for automatically decorating productnames with
> trademark signs etc, and to generate lists of trademarks, but also that they
> can be rendered differently.
> 
> This reminded me that I was planning to apply the below to make the markup of
> PostgreSQL consistent:

I have also grepped to detect other wrong markups, and I think at least the
untagged "PostgreSQL" occurrences should be changed. PSA the patch.

```
$ grep -rI \>PostgreSQL\< | grep -v productname
config.sgml:the log. The default is PostgreSQL.
func.sgml:PostgreSQL
func.sgml:PostgreSQL
runtime.sgml:   event source named PostgreSQL.
problems.sgml:   The software package in total is called 
PostgreSQL,
ref/pg_ctl-ref.sgml:   default is PostgreSQL.  Note that 
this only controls
ref/pg_ctl-ref.sgml:   source name PostgreSQL.
ref/pg_ctl-ref.sgml:   The default is PostgreSQL.
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



fix_inconsistent_tag.patch
Description: fix_inconsistent_tag.patch


Re: Raising the SCRAM iteration count

2023-03-21 Thread Gregory Stark (as CFM)
On Tue, 14 Mar 2023 at 14:54, Gregory Stark (as CFM)
 wrote:
>
> CFBot is failing with this test failure... I'm not sure if this just
> represents a timing dependency or a bad test or what?

CFBot is now consistently showing these test failures. I think there
might actually be a problem here?


> [09:44:49.937] --- stdout ---
> [09:44:49.937] # executing test in
> /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password
> group authentication test 001_password
> [09:44:49.937] ok 1 - scram_iterations in server side ROLE
> [09:44:49.937] # test failed
> [09:44:49.937] --- stderr ---
> [09:44:49.937] # Tests were run but no plan was declared and
> done_testing() was not seen.
> [09:44:49.937] # Looks like your test exited with 2 just after 1.
> [09:44:49.937]
> [09:44:49.937] (test program exited with status code 2)
>
> It looks like perhaps a Perl issue?
>
> # Running: /tmp/cirrus-ci-build/build-32/src/test/regress/pg_regress
> --config-auth 
> /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata
> ### Starting node "primary"
> # Running: pg_ctl -w -D
> /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata
> -l 
> /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/log/001_password_primary.log
> -o --cluster-name=primary start
> waiting for server to start done
> server started
> # Postmaster PID for node "primary" is 66930
> [09:44:07.411](1.875s) ok 1 - scram_iterations in server side ROLE
> Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty
> module) (@INC contains: /tmp/cirrus-ci-build/src/test/perl
> /tmp/cirrus-ci-build/src/test/authentication /etc/perl
> /usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1
> /usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5
> /usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32
> /usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1828.
> Unexpected SCALAR(0x5814b508) in harness() parameter 3 at
> /tmp/cirrus-ci-build/src/test/perl/PostgreSQL/Test/Cluster.pm line
> 2112.
> Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty
> module) (@INC contains: /tmp/cirrus-ci-build/src/test/perl
> /tmp/cirrus-ci-build/src/test/authentication /etc/perl
> /usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1
> /usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5
> /usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32
> /usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1939.
> # Postmaster PID for node "primary" is 66930
> ### Stopping node "primary" using mode immediate
> # Running: pg_ctl -D
> /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata
> -m immediate stop
> waiting for server to shut down done
> server stopped
> # No postmaster PID for node "primary"
> [09:44:07.521](0.110s) # Tests were run but no plan was declared and
> done_testing() was not seen.
> [09:44:07.521](0.000s) # Looks like your test exited with 2 just after 1.



-- 
Gregory Stark
As Commitfest Manager




Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Greg Stark
On Tue, 21 Mar 2023 at 05:59, Alvaro Herrera  wrote:
>
> On 2023-Mar-20, Thomas Munro wrote:
>
> This led me to suggest that perhaps we need to be more lenient when
> it comes to new contributors.  As I said, for seasoned contributors,
> it's not a problem to keep up with our requirements, however silly they
> are.  But people who spend their evenings for a whole week or month trying
> to understand how to write a patch for the one thing that they want, only
> to be met with six months of silence followed by a constant influx of
> "please rebase please rebase please rebase", no useful feedback, and
> termination with "eh, you haven't rebased for the 1001th time, your patch
> has been WoA for X days, we're setting it RwF, feel free to return next
> year" ... they are most certainly put off and will *not* try again next
> year.

I feel like the "no useful feedback" is the real problem though. If
the patches had been reviewed in earlier commitfests the original
contributor would still have been around to finish it... Like, I think
what would actually solve this problem would be if we kept a "clean"
house where patches were committed within one or two commitfests
rather than dragged forward until the final commitfest.

I do agree though. It would be nice if it was easier for anyone to do
trivial merges and update the commitfest entry. That's the kind of
thing gitlab/github are better positioned to solve when they can have
integral editors and built-in CI...

I haven't been marking patches RwF or moving them to the next commitfest
when the merge looked trivial. And in one case I actually did the merge
myself :) But
that only goes so far. If the merge actually requires understanding
the patch in depth then the counter-argument is that the committer
might be spending a lot of time on a patch that won't get committed
while others sit ignored entirely.

--
greg




Re: Initial Schema Sync for Logical Replication

2023-03-21 Thread Masahiko Sawada
On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila  wrote:
>
> On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira  wrote:
> >
> > On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> >
> > > From: Alvaro Herrera 
> > > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > > On 2023-Mar-15, Kumar, Sachin wrote:
> > >
> > > > 1. In CreateSubscription(), when we create the replication slot
> > > > (walrcv_create_slot()), we should use CRS_EXPORT_SNAPSHOT, so that we
> > > > can use this snapshot later in pg_dump.
> > > >
> > > > 2. Now we can call pg_dump with the above snapshot from CreateSubscription.
> > >
> > > Overall I'm not on board with the idea that logical replication would
> > > depend on pg_dump; that seems like it could run into all sorts of
> > > trouble (what if calling external binaries requires additional security
> > > setup?  what about pg_hba connection requirements? what about
> > > max_connections in tight circumstances?).
> > > what if calling external binaries requires additional security setup
> > I am not sure what kind of security restriction would apply in this
> > case; maybe the pg_dump binary can be changed?
> >
> > Using pg_dump as part of this implementation is not acceptable because we
> > expect the backend to be decoupled from the client. Besides that, pg_dump
> > provides all table dependencies (such as tablespaces, privileges, security
> > labels, comments); not all dependencies should be replicated.
> >
>
> I agree that in the initial version we may not support sync of all
> objects but why that shouldn't be possible in the later versions?
>
> > You should
> > exclude them by removing these objects from the TOC before running
> > pg_restore or by adding a few pg_dump options to exclude these objects.
> > Another issue is related to different versions. Let's say the publisher
> > has a version ahead of the subscriber version, a new table syntax can
> > easily break your logical replication setup. IMO pg_dump doesn't seem
> > like a good solution for initial synchronization.
> >
> > Instead, the backend should provide infrastructure to obtain the
> > required DDL commands for the specific (set of) tables. This can work
> > around the issues from the previous paragraph:
> >
> ...
> > * don't need to worry about different versions.
> >
>
> AFAICU some of the reasons why pg_dump is not allowed to dump from the
> newer version are as follows: (a) there could be more columns in the
> newer version of the system catalog and then Select * type of stuff
> won't work because the client won't have knowledge of additional
> columns. (b) the newer version could have new features (represented by
> say new columns in existing catalogs or new catalogs) that the older
> version of pg_dump has no knowledge of and will fail to get that data
> and hence produce an inconsistent dump. The subscriber could easily end
> up out of sync because of that.
>
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems.

Right. I think that such functionality needs to return DDL commands
that can be executed on the requested version.

> If so, we may need to deny schema sync in any such case.

Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




misplaced GUC in pqcomm.h -- where to put actual common variable though...?

2023-03-21 Thread Greg Stark
So back in 2002 in 7.3 there was a commit 2c6b34d9598 which added a
GUC db_user_namespace which is stored in a variable Db_user_namespace.
All that seems fine except...

The variable this GUC is stored in is Db_user_namespace which... is
actually declared in pqcomm.h which is intended to be "Definitions
common to frontends and backends".

Afaics it's never actually defined in any FE code, neither libpq nor
any clients. I was a bit surprised this isn't producing a warning
about an extern declaration that's never defined but I guess that's
not actually that unusual.

The actual variable is defined in the backend in postmaster.c. I'm
guessing this declaration can just move to libpq/libpq.h which
(counterintuitively) is for the backend iiuc.

I don't think this causes any actual problems aside from namespace
pollution, but it confused me.  I found this because I was looking for
where to put the ALPN protocol version, which (at least at the moment)
would be the same for the server and client. But as far as I can tell it
would be the only variable (other than the above) declared in both the
frontend and the backend, and that means there's no particularly handy
place to put the definition.

--
greg




Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-21 Thread Kyotaro Horiguchi
At Wed, 22 Mar 2023 10:16:12 +0900, Michael Paquier wrote:
> On Mon, Mar 20, 2023 at 11:57:31AM +0100, Drouvot, Bertrand wrote:
> > "Buffer" sounds more appropriate to me, so the attached has been done that 
> > way.
> 
> This choice is OK for me.
> 
> > +
> > + pg_stat_get_xact_blocks_fetched
> > +
> > +pg_stat_get_xact_blocks_fetched ( 
> > oid )
> > +bigint
> > +   
> > +   
> > +Returns the number of buffer fetches for table or index, in the 
> > current transaction
> 
> This should be "number of buffer fetched", no?

In the original description, "buffer fetches" appears to be a plural
form of a compound noun and correct, similar to "buffer hits"
mentioned later. If we reword it, I think it should be "number of
buffers fetched".

> > +
> > +pg_stat_get_xact_blocks_hit ( 
> > oid )
> > +bigint
> > +   
> > +   
> > +Returns the number of buffer hits for table or index, in the 
> > current transaction
> > +   
> 
> This one looks OK to me too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Improve logging when using Huge Pages

2023-03-21 Thread Justin Pryzby
On Mon, Mar 20, 2023 at 02:03:13PM +0900, Michael Paquier wrote:
> On Mon, Mar 20, 2023 at 01:54:46PM +0900, Michael Paquier wrote:
> > The main advantage of a read-only GUC over a function is that users
> > would not need to start a postmaster to know if huge pages would be
> > active or not.  This is the main reason why a GUC would be a better
> > fit, in my opinion, because it makes for a cheaper check, while still
> > allowing a SQL query to check the value of the GUC.
> 
> [ Should have read more carefully ]
> 
> ..  Which is something you cannot do with -C because mmap() happens
> after the runtime-computed logic for postgres -C.  It does not sound
> right to do the mmap() for a GUC check, so indeed a function may be
> more adapted rather than move mmap() call a bit earlier in the
> postmaster startup sequence.

On Mon, Mar 20, 2023 at 02:17:33PM +0900, Michael Paquier wrote:
> On Mon, Mar 20, 2023 at 01:09:09AM -0400, Tom Lane wrote:
> > I'm confused here, because Horiguchi-san is saying that that
> > won't work.  I've not checked the code lately, but I think that
> > "postgres -C var" prints its results before actually attempting
> > to establish shared memory, so I suspect Horiguchi-san is right.
> 
> Yes, I haven't read correctly through.  Sorry for the noise.

You set this patch to "waiting on author" twice.  Would you let me know
what I could do to help progress the patch?  Right now, I have no idea.

Most recently, you said it'd be better implemented as a GUC to allow
using -C, but then recanted because -C doesn't work for this (which is
why I implemented it as a string back on 2023-02-08).  Which is why I
reset its status on 2023-03-20.

2023-03-22 01:36:58 Michael Paquier (michael-kun)  New status: Waiting on Author
2023-03-20 13:05:32 Justin Pryzby (justinpryzby)   New status: Needs review
2023-03-20 05:03:53 Michael Paquier (michael-kun)  New status: Waiting on Author

-- 
Justin




RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tuesday, March 21, 2023 8:03 PM Önder Kalacı  wrote:
> 
> Attached patches again.
> 

Thanks for updating the patch.

@@ -408,15 +412,18 @@ $node_subscriber->wait_for_subscription_sync;
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));

Is there any reason why we drop a column here? The dropped-column case has
already been tested on table dropped_cols, and on my machine the generated
column problem can be detected without dropping any columns.

Regards,
Shi Yu


Re: Show various offset arrays for heap WAL records

2023-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan  wrote:
> One problem that I often run into when performing analysis of VACUUM
> using pg_walinspect is the issue of *who* pruned which heap page, for
> any given PRUNE record. Was it VACUUM/autovacuum, or was it
> opportunistic pruning? There is no way of knowing for sure right now.
> You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
> record coming from VACUUM; it could just have been an opportunistic
> prune operation that happened to take place when a SELECT query ran,
> before any XID was ever allocated.

In case it's unclear how much of a problem this can be, here's an example:

The misc.sql regression test does a bulk update of the table "onek". A
little later, one of the queries that appears under the section "copy"
from the same file SELECTs from "onek". This produces a succession of
opportunistic prune records that look exactly like what you'd expect from
a VACUUM when viewed through pg_walinspect (without this patch). Each
PRUNE record has XID 0. The records appear in ascending heap block
number order, since there is a sequential scan involved (we go through
heapgetpage() to get to heap_page_prune_opt(), where the query prunes
opportunistically).

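The kind of pg_walinspect query involved looks something like this (a
sketch, assuming pg_get_wal_records_info() plus the resource_manager and
record_type columns visible in the output below, and an arbitrary start
LSN):

SELECT start_lsn, xid, record_type
FROM pg_get_wal_records_info('0/1000000', pg_current_wal_lsn())
WHERE resource_manager = 'Heap2' AND record_type LIKE 'PRUNE%';

Without the patch, rows with xid = 0 here could be from either VACUUM or
opportunistic pruning.
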
Another slightly surprising fact revealed by the patch is the ratio of
opportunistic prunes ("Heap2/PRUNE") to prunes run during VACUUM
("Heap2/PRUNE+BYVACUUM") with the regression tests:

│ resource_manager/record_type │ Heap2/PRUNE          │
│ count                        │ 4,521                │
│ count_perc                   │ 0.220                │
│ rec_size                     │ 412,442              │
│ avg_rec_size                 │ 91                   │
│ rec_size_perc                │ 0.194                │
│ fpi_size                     │ 632,828              │
│ fpi_size_perc                │ 1.379                │
│ combined_size                │ 1,045,270            │
│ combined_size_perc           │ 0.404                │
├─[ RECORD 61 ]────────────────┼──────────────────────┤
│ resource_manager/record_type │ Heap2/PRUNE+BYVACUUM │
│ count                        │ 2,784                │
│ count_perc                   │ 0.135                │
│ rec_size                     │ 467,057              │
│ avg_rec_size                 │ 167                  │
│ rec_size_perc                │ 0.219                │
│ fpi_size                     │ 546,344              │
│ fpi_size_perc                │ 1.190                │
│ combined_size                │ 1,013,401            │
│ combined_size_perc           │ 0.391                │
├─[ RECORD 62 ]────────────────┼──────────────────────┤
│ resource_manager/record_type │ Heap2/VACUUM         │
│ count                        │ 3,463                │
│ count_perc                   │ 0.168                │
│ rec_size                     │ 610,038              │
│ avg_rec_size                 │ 176                  │
│ rec_size_perc                │ 0.286                │
│ fpi_size                     │ 893,964              │
│ fpi_size_perc                │ 1.948                │
│ combined_size                │ 1,504,002            │
│ combined_size_perc           │ 0.581                │
├─[ RECORD 63 ]────────────────┼──────────────────────┤
│ resource_manager/record_type │ Heap2/VISIBLE        │
│ count                        │ 7,293                │
│ count_perc                   │ 0.354                │
│ rec_size                     │ 431,382              │
│ avg_rec_size                 │ 59                   │
│ rec_size_perc                │ 0.202                │
│ fpi_size                     │ 1,794,048            │
│ fpi_size_perc                │ 3.909                │
│ combined_size                │ 2,225,430            │
│ combined_size_perc           │ 0.859                │


--
Peter Geoghegan




Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-03-21 Thread Michael Paquier
On Mon, Mar 20, 2023 at 10:05:21AM +0100, Drouvot, Bertrand wrote:
> Good point and keeping it as it is currently would not
> affect the work that is/will be done in [1].

I guess that I'm OK with that to get more of pgstatfuncs.c to use
macros for the function definitions there..  Alvaro, Tom, perhaps you
still think that this is unadapted?  Based on the file split and the
references to funcentry and tabentry, I think that's OK, but that
stands just as one opinion among many..

> So, please find attached V2 attached taking this comment into account. 
> [1]: 
> https://www.postgresql.org/message-id/flat/89606d96-cd94-af74-18f3-c7ab2b684ba2%40gmail.com

Nice.  I am pretty sure that finishing some of that is doable by the
end of this CF to reduce the size of pgstatfuncs.c overall.
--
Michael


signature.asc
Description: PGP signature


Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-21 Thread Michael Paquier
On Mon, Mar 20, 2023 at 11:57:31AM +0100, Drouvot, Bertrand wrote:
> "Buffer" sounds more appropriate to me, so the attached has been done that 
> way.

This choice is OK for me.

> +
> + pg_stat_get_xact_blocks_fetched
> +
> +pg_stat_get_xact_blocks_fetched ( 
> oid )
> +bigint
> +   
> +   
> +Returns the number of buffer fetches for table or index, in the 
> current transaction

This should be "number of buffer fetched", no?

> +
> +pg_stat_get_xact_blocks_hit ( oid )
> +bigint
> +   
> +   
> +Returns the number of buffer hits for table or index, in the current 
> transaction
> +   

This one looks OK to me too.
--
Michael


signature.asc
Description: PGP signature


Re: Track IO times in pg_stat_io

2023-03-21 Thread Melanie Plageman
On Mon, Mar 20, 2023 at 10:34 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-03-16 17:19:16 -0400, Melanie Plageman wrote:
> > > > > I wonder if we should get rid of pgStatBlockReadTime, 
> > > > > pgStatBlockWriteTime,
> > > >
> > > > And then have pg_stat_reset_shared('io') reset pg_stat_database IO
> > > > stats?
> > >
> > > Yes.
> >
> > I think this makes sense but I am hesitant to do it in this patchset,
> > because it feels a bit hidden...maybe?
>
> I'd not do it in the same commit, but I don't see a problem with doing it in
> the same patchset.
>
> Now that I think about it again, this wouldn't make pg_stat_reset_shared('io')
> affect pg_stat_database - I was thinking we should use pgstat_io.c stats to
> provide the information for pgstat_database.c, using its own pending counter.

So, I've done this in the attached. But won't resetting pgstat_database
be a bit weird if you have built up some IO timing in pending counters,
a flush happens right after you reset, and then suddenly the values are
way above 0 again?

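In other words, the worry is a sequence like this (illustrative only):

SELECT pg_stat_reset_shared('io');
-- a backend with pending IO timing flushes right afterwards ...
SELECT blk_read_time, blk_write_time
FROM pg_stat_database WHERE datname = current_database();
-- ... and the freshly "reset" values are already well above zero
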
> > I considered refactoring pgstat_io_end() to use INSTR_TIME_ACCUM_DIFF()
> > like [1], but, in the end I actually think I would end up with more
> > operations because of the various different counters needing to be
> > updated. As it is now, I do a single subtract and a few adds (one for
> > each of the different statistics objects tracking IO times
> > (pgBufferUsage, pgStatBlockWrite/ReadTime). Whereas, I would need to do
> > an accum diff for every one of those.
>
> Right - that only INSTR_TIME_ACCUM_DIFF() only makes sense if there's just a
> single counter to update.
>
>
> WRT:
> /*
>  * TODO: AFAICT, pgstat_count_buffer_write_time is only called
>  * for shared buffers whereas pgstat_count_buffer_read_time is
>  * called for temp relations and shared buffers.
>  *
>  * is this intentional and should I match current behavior or
>  * not?
>  */
>
> It's hard to see how that behaviour could be intentional.  Probably worth
> fixing in a separate patch. I don't think we're going to backpatch, but it
> would make this clearer nonetheless.


Attached v7 does this in separate commits.

Remaining feedback is about FlushLocalBuffers(). Is the idea simply to
get it into bufmgr.c because that is cleaner from an API perspective?

- Melanie
From a7ba3cce6dbbde49efa5b20e2db5cd49c259d3ad Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 21 Mar 2023 18:20:44 -0400
Subject: [PATCH v7 2/4] FlushRelationBuffers() counts temp relation IO timing

Add pgstat_database and pgBufferUsage IO timing counting to
FlushRelationBuffers() for writes of temporary relations.
---
 src/backend/storage/buffer/bufmgr.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0a05577b68..dea2e8fe40 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3591,6 +3591,8 @@ FlushRelationBuffers(Relation rel)
 {
 	int			i;
 	BufferDesc *bufHdr;
+	instr_time	io_start,
+				io_time;
 
 	if (RelationUsesLocalBuffers(rel))
 	{
@@ -3616,17 +3618,33 @@ FlushRelationBuffers(Relation rel)
 
 PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
+if (track_io_timing)
+	INSTR_TIME_SET_CURRENT(io_start);
+else
+	INSTR_TIME_SET_ZERO(io_start);
+
 smgrwrite(RelationGetSmgr(rel),
  		  BufTagGetForkNum(&bufHdr->tag),
 		  bufHdr->tag.blockNum,
 		  localpage,
 		  false);
 
+
 buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
 pgstat_count_io_op(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_WRITE);
 
+if (track_io_timing)
+{
+	INSTR_TIME_SET_CURRENT(io_time);
+	INSTR_TIME_SUBTRACT(io_time, io_start);
+	pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
+	INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);
+}
+
+pgBufferUsage.local_blks_written++;
+
 /* Pop the error context stack */
 error_context_stack = errcallback.previous;
 			}
-- 
2.37.2

From 1088cde0ea0d39b7e55cd919f4c9151136f36e28 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 21 Mar 2023 16:00:55 -0400
Subject: [PATCH v7 1/4] Count IO time for temp relation writes

Both pgstat_database and pgBufferUsage write times failed to count
timing for flushes of dirty local buffers when acquiring a new local
buffer for a temporary relation block.
---
 src/backend/storage/buffer/localbuf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 5325ddb663..80510411ae 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ 

Re: Fix fseek() detection of unseekable files on WIN32

2023-03-21 Thread Michael Paquier
On Mon, Mar 20, 2023 at 07:06:22AM +0900, Michael Paquier wrote:
> Not sure about this one.  I have considered it and dirmod.c includes
> also bits for cygwin, while being aimed for higher-level routines like
> rename(), unlink() or symlink().  This patch is only for WIN32, and
> aimed for common parts in win32*.c code, so a separate file seemed a
> bit cleaner to me at the end.

By the way, do you think that we could be able to get a TAP test for
that?  It does not seem that it needs to be that complicated, as long
as we use a pg_dump command that pipes its output to a pg_restore
command launched by system()?
--
Michael


signature.asc
Description: PGP signature


Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-21 Thread Michael Paquier
On Mon, Mar 20, 2023 at 07:56:42AM -0400, Robert Haas wrote:
> Anyone want to comment on this?

I have not checked the patch in details, but perhaps this needs at
least one test?
--
Michael


signature.asc
Description: PGP signature


Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean

2023-03-21 Thread Michael Paquier
On Mon, Mar 20, 2023 at 04:04:31PM +0800, Richard Guo wrote:
> Agreed. +1 to remove the counts.

Thanks.  Adjusted this way, then.
--
Michael


signature.asc
Description: PGP signature


Re: Allow logical replication to copy tables in binary format

2023-03-21 Thread Peter Smith
Thanks for all the patch updates. Patch v19 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Show various offset arrays for heap WAL records

2023-03-21 Thread Peter Geoghegan
On Mon, Mar 13, 2023 at 6:41 PM Peter Geoghegan  wrote:
> There are several different things that seem important to me
> personally. These are in tension with each other, to a degree. These
> are:
>
> 1. Like Andres, I'd really like to have some way of inspecting things
> like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
> detail. These record types happen to be very important in general, and
> the ability to see detailed information about the WAL record would
> definitely help with some debugging scenarios. I've really missed
> stuff like this while debugging serious issues under time pressure.

One problem that I often run into when performing analysis of VACUUM
using pg_walinspect is the issue of *who* pruned which heap page, for
any given PRUNE record. Was it VACUUM/autovacuum, or was it
opportunistic pruning? There is no way of knowing for sure right now.
You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
record coming from VACUUM; it could just have been an opportunistic
prune operation that happened to take place when a SELECT query ran,
before any XID was ever allocated.

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.
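
A minimal sketch of the mechanism (the bit value and the caller-side
flag are assumptions for illustration, not the patch's actual code):

/* Reuse a free XLOG_HEAP2 info bit, in the style of XLOG_HEAP_INIT_PAGE. */
#define XLOG_HEAP2_BYVACUUM		0x80

/* At the XLogInsert() call site for a PRUNE record: */
XLogRecPtr	recptr;
uint8		info = XLOG_HEAP2_PRUNE;

if (called_by_vacuum)			/* hypothetical flag passed down by VACUUM */
	info |= XLOG_HEAP2_BYVACUUM;

recptr = XLogInsert(RM_HEAP2_ID, info);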

I think that it makes sense to treat this within the scope of
Melanie's ongoing work to improve the instrumentation of these records
-- meaning that it's in scope for Postgres 16. Admittedly this is a
slightly creative interpretation, so if others disagree then I won't
argue. This is quite a small patch, though, which makes debugging
significantly easier. I think that there could be a great deal of
utility in being able to easily "pair up" corresponding
"Heap2/PRUNE+BYVACUUM" and "Heap2/VACUUM" records in debugging
scenarios. I can imagine linking these to "Heap2/FREEZE_PAGE" and
"Heap2/VISIBLE" records, too, since they're all closely related record
types.

--
Peter Geoghegan


v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patch
Description: Binary data


Re: Save a few bytes in pg_attribute

2023-03-21 Thread Matthias van de Meent
On Tue, 21 Mar 2023 at 23:05, Andres Freund  wrote:
>
> Hi,
>
> On 2023-03-21 21:02:08 +0100, Matthias van de Meent wrote:
> > On Tue, 21 Mar 2023 at 20:58, Andres Freund  wrote:
> > > On 2023-03-21 20:20:40 +0100, Matthias van de Meent wrote:
> > > > Yes, attcacheoff is a tremendous performance boon in many cases.
> > >
> > > Which? We don't use fastgetattr() in many places these days. And in some 
> > > quick
> > > measurements it's a wash or small loss when deforming slot tuples, even 
> > > when
> > > the attcacheoff optimization would apply, because the branches for 
> > > managing it
> > > add more overhead than they save.
> >
> > My experience with attcacheoff performance is in indexes, specifically
> > index_getattr(). Sure, multi-column indexes are uncommon, but the
> > difference between have and have-not for cached attribute offsets is
> > several %.
>
> I did indeed not think of index_getattr(), just heap related things.
>
> Do you have a good test workload handy - I'm kinda curious to compare the cost
> of removing attcacheoff vs the gain of not maintaining it for index workloads.

Rebuilding indexes has been my go-to workload for comparing
attribute-related btree performance optimizations in [0] and [1].
Results of tests from '21 in which we're always calculating offsets
from 0 show a slowdown of 4-18% in attcacheoff-enabled workloads if
we're calculating offsets dynamically.

> It looks like many of the index_getattr() cases could be made faster without
> attcacheoff. A lot of places seem to loop over all attributes, and the key to
> accelerating that is to keep state between the iterations.

Indeed, it's not great. You can take a look at [1], which is where I'm
trying to optimize btree's handling of comparing tuples; which
includes work on reducing overhead for attribute accesses.

Note that each btree page should be able to get by with comparing at most
2*log(ntups) columns, whereas currently it is natts * log(ntups).
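
To illustrate "keeping state between the iterations", a hedged sketch of
a one-pass walk that carries the running offset instead of per-attribute
cached offsets (the macros are the existing ones from access/tupmacs.h;
declarations and the packed-varlena handling are elided for brevity):

off = 0;
for (int i = 0; i < natts; i++)
{
	Form_pg_attribute att = TupleDescAttr(tupdesc, i);

	if (att_isnull(i, nullbitmap))
	{
		isnull[i] = true;
		values[i] = (Datum) 0;
		continue;				/* a NULL consumes no data space */
	}

	off = att_align_nominal(off, att->attalign);
	values[i] = fetchatt(att, dataptr + off);
	off = att_addlength_pointer(off, att->attlen, dataptr + off);
	isnull[i] = false;
}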

> Attcacheoff is
> that, but quite stunted, because it only works if there aren't any NULLs (even
> if the NULL is in a later column).

Yes, that isn't great either, but most indexes I've seen have tuples
that are either all NULL, or have no nulls; only seldom I see indexes
that have mixed NULL/not-null index tuple attributes.


Kind regards,

Matthias van de Meent.


[0] 
https://www.postgresql.org/message-id/flat/CAEze2WhyBT2bKZRdj_U0KS2Sbewa1XoO_BzgpzLC09sa5LUROg%40mail.gmail.com#fe3369c4e202a7ed468e47bf5420f530
[1] 
https://www.postgresql.org/message-id/flat/caeze2wg52tsswa9fy7ocxx-k7pplmnxa_fmq6-+_pzr-aoo...@mail.gmail.com




Re: doc: add missing "id" attributes to extension packaging page

2023-03-21 Thread Brar Piening

On 17.01.2023 at 23:43, Karl O. Pinc wrote:

It's good you asked.  After poking about the XSLT 1.0 spec to refresh
my memory I abandoned the "mode" approach entirely.  The real "right
way" is to use the import mechanism.


It actually is not.

After quite some time trying to figure out why things don't work as
intended, I ended up reading the XSLT 1.0 spec.

As the name already suggests, <xsl:apply-imports> is closely related to
<xsl:apply-templates>, with the difference that it *applies* a *template
rule* from an imported style sheet instead of applying a template rule
from the current style sheet
(https://www.w3.org/TR/1999/REC-xslt-19991116#apply-imports). What it
does not do is *call* a *named template*
(https://www.w3.org/TR/1999/REC-xslt-19991116#named-templates).

What this basically means is that in XSLT 1.0 you can use
<xsl:apply-imports> to override template rules (<xsl:template match="...">)
but you cannot use it to override named templates (<xsl:template name="...">).
If you want to override named templates, you basically have to duplicate
and change them.

While there are mechanisms to call overridden named templates in XSLT 3,
this is out of scope here, since we're bound to XSLT 1.0.

As a consequence, there was little I could change in the initial patch
to avoid the code duplication, and all attempts to do so ultimately led
to even longer and more complex code without really reducing the amount
of duplication.

The <xsl:apply-imports> approach actually does work in the varlistentry
case, although this doesn't really change a lot regarding the length of
the patch (it's a bit nicer though, since in this case it really avoids
duplication). I've also taken the advice to terminate the build and
print the XPath if a required id is missing.

The attached patch is my best-effort approach to implement discoverable
links.

Best regards,

Brar
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 62d9f9eb22..9a0241a8d6 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -157,7 +157,7 @@ combined_size_percentage | 2.8634072910530795
 

 
-   
+   
 
  pg_get_wal_block_info(start_lsn pg_lsn, end_lsn pg_lsn) returns 
setof record
 
diff --git a/doc/src/sgml/stylesheet-html-common.xsl 
b/doc/src/sgml/stylesheet-html-common.xsl
index 3f85ea9536..9df2782ce4 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -301,120 +301,4 @@ set   toc,title
   
 
 
-  [The remaining ~115 removed lines of this hunk (the XSLT named
-   templates that generate the id links, and the "is missing an id"
-   build-error message) were reduced to fragments by the list archive
-   and are omitted here.]
 
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 15bcc95d41..cc14efa1ca 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -169,13 +169,3 @@ acronym{ font-style: inherit; }
 width: 75%;
   }
 }
-
-/* Links to ids of headers and definition terms */
-a.id_link {
-color: inherit;
-visibility: hidden;
-}
-
-*:hover > a.id_link {
-visibility: visible;
-}


Re: Save a few bytes in pg_attribute

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 21:02:08 +0100, Matthias van de Meent wrote:
> On Tue, 21 Mar 2023 at 20:58, Andres Freund  wrote:
> > On 2023-03-21 20:20:40 +0100, Matthias van de Meent wrote:
> > > Yes, attcacheoff is a tremendous performance boon in many cases.
> >
> > Which? We don't use fastgetattr() in many places these days. And in some 
> > quick
> > measurements it's a wash or small loss when deforming slot tuples, even when
> > the attcacheoff optimization would apply, because the branches for managing 
> > it
> > add more overhead than they save.
> 
> My experience with attcacheoff performance is in indexes, specifically
> index_getattr(). Sure, multi-column indexes are uncommon, but the
> difference between have and have-not for cached attribute offsets is
> several %.

I did indeed not think of index_getattr(), just heap related things.

Do you have a good test workload handy - I'm kinda curious to compare the cost
of removing attcacheoff vs the gain of not maintaining it for index workloads.

It looks like many of the index_getattr() cases could be made faster without
attcacheoff. A lot of places seem to loop over all attributes, and the key to
accelerating that is to keep state between the iterations. Attcacheoff is
that, but quite stunted, because it only works if there aren't any NULLs (even
if the NULL is in a later column).

Greetings,

Andres Freund




Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Pavel Stehule
út 21. 3. 2023 v 16:32 odesílatel Justin Pryzby 
napsal:

> On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> > ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
> napsal:
> >
> > > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > > rebase + enhancing about related option from a563c24
> > >
> > > Thanks.
> > >
> > > It looks like this doesn't currently handle extensions, which were
> added
> > > at 6568cef26e.
>
> What about this part ?  Should extension filters be supported ?
>

should be fixed


>
> I think the comment that I'd patched that lists all the filter types
> should be minimized, rather than duplicating the list of all the
> possible filters that's already in the user-facing documentation.
>

I modified this comment. Please, check

>
> One new typo: childrent
>

fixed

Regards

Pavel
From f91f525f32f51f7c5784dd7af57fe2b692db5e7f Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 127 
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1768 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa804..17bfc661a9 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: extensions, works like
+   -e/--extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting 

Re: Request for comment on setting binary format output per session

2023-03-21 Thread Jeff Davis
On Tue, 2023-03-21 at 09:22 -0400, Dave Cramer wrote:
> As Jeff mentioned there is a visibility problem if the search path is
> changed. The simplest solution IMO is to look up the OID at the time
> the format is requested and use the OID going forward to format the
> output as binary. If the search path changes and a type with the same
> name is now first in the search path then the data would be returned
> in text. 

The binary format parameter would ordinarily be set by the maintainer
of the client library, who knows nothing about the schema the client
might be accessing, and nothing about the search_path that might be
set. They would only know which binary parsers they've already written
and included with their client library.

With that in mind, using search_path at all seems weird. Why would a
change in search_path affect which types the client library knows how
to parse? If the client library knows how to parse "foo.mytype"'s
binary representation, and you change the search path such that it
finds "bar.mytype" instead, did the client library all of a sudden
forget how to parse "foo.mytype" and learn to parse "bar.mytype"?

If there's some extension that offers type "mytype", and perhaps allows
it to be installed in any schema, then it seems that the client library
would know how to parse all instances of "mytype" regardless of the
schema or search_path.

Of course, a potential problem is that ordinary users can create types
(e.g. enum types) and so you'd have to be careful about some tricks
where someone shadows a well-known extension in order to confuse the
client with unexpected binary data (not sure if that's a security
concern or not, just thinking out loud).

One solution might be that unqualified type names would work on all
types of that name (in any schema) that are owned by a superuser,
regardless of search_path. Most extension scripts will be run as
superuser anyway. It would feel a little magical, which I don't like,
but would work in any practical case I can think of.

Another solution would be to have some extra catalog field in pg_type
that would be a "binary format identifier" and use that rather than the
type name to match up binary parsers with the proper type.

Am I over-thinking this?

Regards,
Jeff Davis





Re: pgsql: Stop recommending auto-download of DTD files, and indeed disable

2023-03-21 Thread Robert Haas
On Tue, Mar 21, 2023 at 4:45 PM Robert Haas  wrote:
> I use MacPorts, rather than Homebrew, but still found it necessary to
> do something similar, specifically:
>
> export XML_CATALOG_FILES=/opt/local/etc/xml/catalog

Ah, never mind. I had an incorrect value in my environment. If I unset
it completely, it works just as well as setting a correct value.

Sorry for the noise.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Stop recommending auto-download of DTD files, and indeed disable

2023-03-21 Thread Robert Haas
On Wed, Feb 8, 2023 at 5:16 PM Tom Lane  wrote:
> Stop recommending auto-download of DTD files, and indeed disable it.

According to this commit:

   
The Homebrew-supplied programs require the following environment variable
to be set:

export XML_CATALOG_FILES=/usr/local/etc/xml/catalog

Without it, xsltproc will throw errors like this:

I/O error : Attempt to load network entity
http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd
postgres.sgml:21: warning: failed to load external entity
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd;
...

   

I use MacPorts, rather than Homebrew, but still found it necessary to
do something similar, specifically:

export XML_CATALOG_FILES=/opt/local/etc/xml/catalog

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-03-21 Thread Sergei Kornilov
Hello!

The documentation still describes the function pg_stat_statements_reset like 
this

>   By default, this function can only be executed by superusers.

But unfortunately, this part was lost somewhere.

-- Don't want this to be available to non-superusers.
REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, boolean) FROM 
PUBLIC;

should be added to the upgrade script

Also, shouldn't we first do:

/* First we have to remove them from the extension */
ALTER EXTENSION pg_stat_statements DROP VIEW ..
ALTER EXTENSION pg_stat_statements DROP FUNCTION ..

like in previous upgrade scripts?

> +   Time at which min/max statistics gathering started for this
> +   statement

I think it would be better to explicitly mention in the documentation all 4 
fields for which minmax_stats_since displays the time.

regards, Sergei




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 15:26:38 -0400, Tom Lane wrote:
> Matthias van de Meent  writes:
> > ... with that patch we actually don't need the attcacheoff in the
> > pg_atttribute struct: it only needs to be present in the derived
> > "TupleAttrAlignData" structs which carry the
> > length/alignment/storage/byval info.
> 
> Yeah, I was wondering about that too: keeping attcacheoff as local
> state in slots might get us all its win without so much conceptual
> dirtiness.

It's also the place where it's the least likely to help - afaict attcacheoff
is only really beneficial for fastgetattr(). Which conditions its use more
strictly - not only can there not be any NULLs before the accessed column,
there may not be any NULLs in the tuple at all.
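
That stricter condition, simplified from the real fastgetattr(), is
roughly:

	/* cached offsets apply only if the tuple has no NULLs at all */
	if (HeapTupleNoNulls(tup) && att->attcacheoff >= 0)
		return fetchatt(att,
						(char *) tup->t_data + tup->t_data->t_hoff +
						att->attcacheoff);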

Greetings,

Andres Freund




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Matthias van de Meent
On Tue, 21 Mar 2023 at 20:58, Andres Freund  wrote:
>
> Hi,
>
> On 2023-03-21 20:20:40 +0100, Matthias van de Meent wrote:
> > On Tue, 21 Mar 2023 at 19:55, Tom Lane  wrote:
> > >
> > > Andres Freund  writes:
> > > > FWIW, I think we should consider getting rid of attcacheoff. I doubt 
> > > > it's
> > > > worth its weight these days, because deforming via slots starts at the
> > > > beginning anyway. The overhead of maintaining it is not insubstantial, 
> > > > and
> > > > it's just architecturally ugly to update tupledescs continually.
> > >
> > > I'd be for that if we can convince ourselves there's not a material
> > > speed penalty.  As you say, it's quite ugly.
> >
> > Yes, attcacheoff is a tremendous performance boon in many cases.
>
> Which? We don't use fastgetattr() in many places these days. And in some quick
> measurements it's a wash or small loss when deforming slot tuples, even when
> the attcacheoff optimization would apply, because the branches for managing it
> add more overhead than they save.

My experience with attcacheoff performance is in indexes, specifically
index_getattr(). Sure, multi-column indexes are uncommon, but the
difference between have and have-not for cached attribute offsets is
several %.

Kind regards,

Matthias van de Meent




Re: Comment in preptlist.c

2023-03-21 Thread David Rowley
On Tue, 21 Mar 2023 at 22:41, Etsuro Fujita  wrote:
> I think that “planner/rewriter” should be parser/rewriter.  Attached
> is a patch for that.

Pushed.

David




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 20:20:40 +0100, Matthias van de Meent wrote:
> On Tue, 21 Mar 2023 at 19:55, Tom Lane  wrote:
> >
> > Andres Freund  writes:
> > > FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
> > > worth its weight these days, because deforming via slots starts at the
> > > beginning anyway. The overhead of maintaining it is not insubstantial, and
> > > it's just architecturally ugly to update tupledescs continually.
> >
> > I'd be for that if we can convince ourselves there's not a material
> > speed penalty.  As you say, it's quite ugly.
> 
> Yes, attcacheoff is a tremendous performance boon in many cases.

Which? We don't use fastgetattr() in many places these days. And in some quick
measurements it's a wash or small loss when deforming slot tuples, even when
the attcacheoff optimization would apply, because the branches for managing it
add more overhead than they save.

Greetings,

Andres Freund




Re: Adjust Memoize hit_ratio calculation

2023-03-21 Thread David Rowley
On Tue, 21 Mar 2023 at 09:41, David Rowley  wrote:
> Because that's being changed in v16, I think it might also be a good
> idea to fix the hit_ratio calculation problem reported by David
> Johnston in [1].  In the attached, I've adjusted David's calculation
> slightly so that we divide by Max(ndistinct, est_cache_entries)
> instead of ndistinct.  This saves from overestimating when ndistinct
> is smaller than est_cache_entries.  I'd rather fix this now for v16
> than wait until v17 and further adjust the Memoize costing.

I've now pushed this change.
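
For reference, the adjustment described above amounts to roughly this
(a sketch; see cost_memoize_rescan() in costsize.c for the committed
form):

	/*
	 * Dividing by the larger of ndistinct and est_cache_entries avoids
	 * overestimating the hit ratio when ndistinct is smaller than
	 * est_cache_entries.
	 */
	hit_ratio = ((calls - ndistinct) / calls) *
		(est_cache_entries / Max(ndistinct, est_cache_entries));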

David




Re: Add SHELL_EXIT_CODE to psql

2023-03-21 Thread Corey Huinker
>
>
> As you had it, any nonzero result would prevent backtick substitution.
> I'm not really sure how much thought went into the existing behavior,
> but I am pretty sure that it's not part of this patch's charter to
> change that.
>
> regards, tom lane
>

 The changes all make sense, thanks!


Re: Save a few bytes in pg_attribute

2023-03-21 Thread Tom Lane
Matthias van de Meent  writes:
> ... with that patch we actually don't need the attcacheoff in the
> pg_atttribute struct: it only needs to be present in the derived
> "TupleAttrAlignData" structs which carry the
> length/alignment/storage/byval info.

Yeah, I was wondering about that too: keeping attcacheoff as local
state in slots might get us all its win without so much conceptual
dirtiness.

regards, tom lane




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Matthias van de Meent
On Tue, 21 Mar 2023 at 19:55, Tom Lane  wrote:
>
> Andres Freund  writes:
> > FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
> > worth its weight these days, because deforming via slots starts at the
> > beginning anyway. The overhead of maintaining it is not insubstantial, and
> > it's just architecturally ugly to update tupledescs continually.
>
> I'd be for that if we can convince ourselves there's not a material
> speed penalty.  As you say, it's quite ugly.

Yes, attcacheoff is a tremendous performance boon in many cases. But
all is not lost:

When I was working on other improvements I experimented with storing
the attributes used in (de)serializing tuples to disk in a separate
structured array in the TupleDesc, a prototype patch of which I shared
here [0]. I didn't see a speed difference back then so I didn't
further venture into that path (as it adds complexity without
performance benefits), but I think it can be relevant to this thread
because with that patch we actually don't need the attcacheoff in the
pg_atttribute struct: it only needs to be present in the derived
"TupleAttrAlignData" structs which carry the
length/alignment/storage/byval info.

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/CAEze2Wh8-metSryZX_Ubj-uv6kb%2B2YnzHAejmEdubjhmGusBAg%40mail.gmail.com




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Tom Lane
Andres Freund  writes:
> FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
> worth its weight these days, because deforming via slots starts at the
> beginning anyway. The overhead of maintaining it is not insubstantial, and
> it's just architecturally ugly to update tupledescs continually.

I'd be for that if we can convince ourselves there's not a material
speed penalty.  As you say, it's quite ugly.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-21 Thread Justin Pryzby
On Thu, Mar 16, 2023 at 07:04:16PM +0400, Ilya Gladyshev wrote:
> > 16 марта 2023 г., в 04:07, Justin Pryzby  написал(а):
> > 
> > On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> >>> The only change from the current patch is (3).  (1) still calls
> >>> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> >>> the progress reporting to set the TOTAL in ProcessUtilitySlow().
> >> 
> >> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
> >> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
> >> don’t know whether all of them need progress tracking), for instance, 
> >> there is ALTER TABLE that calls DefineIndex to create index for 
> >> constraints. So I feel like rearranging progress reporting will result in 
> >> unnecessary code duplication in those call sites, so passing in an 
> >> optional parameter seems to be easier here, if we are going to optimize 
> >> it, after all. Especially if back-patching is a non-issue.
> > 
> > Yeah.  See attached.  I don't like duplicating the loop.  Is this really
> > the right direction to go ?
> > 
> > I haven't verified if the child tables are locked in all the paths which
> > would call count_leaf_partitions().  But why is it important to lock
> > them for this?  If they weren't locked before, that'd be a pre-existing
> > problem...
> > <0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>
> 
> I’m not sure what the general policy on locking is, but I have checked ALTER 
> TABLE ADD INDEX, and all the partitions seem to be locked on the first 
> entry to DefineIndex there. All other call sites pass in the parentIndexId, 
> which means the progress tracking machinery will not be initialized, so I 
> think, we don’t need to do locking in count_leaf_partitions(). 

> The approach in the patch looks good to me. Some nitpicks on the patch: 
> 1. There’s an unnecessary second call to get_rel_relkind in 
> ProcessUtilitySlow, we can just use what’s in the variable relkind.
> 2. We can also combine else and if to have one less nested level like that:
> + else if (!RELKIND_HAS_PARTITIONS(child_relkind))
> 
> 3. There was a part of the comment saying "If the index was built by calling 
> DefineIndex() recursively, the called function is responsible for updating 
> the progress report for built indexes.", I think it is still useful to have 
> it there.

Thanks, I addressed (1) and (3).  (2) is deliberate, to allow a place to
put the comment which is not specific to the !HAS_PARTITIONS case.

-- 
Justin
>From 90300bc4ca08088de3a0015ff5c1a251537feacc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/bootstrap/bootparse.y |  2 +
 src/backend/commands/indexcmds.c  | 74 +--
 src/backend/commands/tablecmds.c  |  4 +-
 src/backend/tcop/utility.c|  8 +-
 src/backend/utils/activity/backend_progress.c | 28 +++
 src/include/commands/defrem.h |  1 +
 src/include/utils/backend_progress.h  |  1 +
 8 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect 

Re: PostgreSQL 16 Release Management Team & Feature Freeze

2023-03-21 Thread Jonathan S. Katz

On 3/21/23 1:17 PM, Roberto Mello wrote:

On Tue, Mar 21, 2023 at 9:35 AM Jonathan S. Katz  wrote:


You can track open items for the PostgreSQL 16 release here:

 https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items


The wiki page references April 8th, 2022, btw.


Fixed :) Thanks!

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: Transparent column encryption

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 18:05:15 +0100, Peter Eisentraut wrote:
> On 16.03.23 17:36, Andres Freund wrote:
> > Maybe a daft question, but why do we need a separate type and typmod for
> > encrypted columns? Why isn't the fact that the column is encrypted exactly 
> > one
> > new field, and we use the existing type/typmod fields?
> 
> The way this is implemented is that for an encrypted column, the real
> atttypid and atttypmod are one of the encrypted special types
> (pg_encrypted_*).  That way, most of the system doesn't need to care about
> the details of encryption or whatnot, it just unpacks tuples etc. by looking
> at atttypid, atttyplen, etc., and queries on encrypted data behave normally
> by just looking at what operators etc. those types have.  This approach
> heavily contains the number of places that need to know about this feature
> at all.

I get that for the type, but why do we need the typmod duplicated as well?

Greetings,

Andres Freund




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 18:15:40 +0100, Peter Eisentraut wrote:
> On 21.03.23 17:43, Andres Freund wrote:
> > > The context of my message was to do the proposed change for PG16 to buy 
> > > back
> > > a few bytes that are being added by another feature
> > How much would you need to buy back to "reach parity"?
> 
> I don't think we can find enough to make the impact zero bytes.  It's also
> not clear exactly what the impact of each byte would be (compared to
> possible complications in other parts of the code, for example).  But if
> there are a few low-hanging fruit, it seems like we could pick them, to hold
> us over until we have a better solution to the underlying issue.

attndims 4->2
attstattarget 4->2
attinhcount 4->2

+ some reordering only gets you from 112->108 unfortunately, due to a 1 byte
alignment hole and 2 bytes of trailing padding.

before:
/* size: 112, cachelines: 2, members: 22 */
/* sum members: 111, holes: 1, sum holes: 1 */
/* last cacheline: 48 bytes */

after:
/* size: 108, cachelines: 2, members: 22 */
/* sum members: 105, holes: 1, sum holes: 1 */
/* padding: 2 */
/* last cacheline: 44 bytes */

You might be able to fill the hole + padding with your data - but IIRC that
was three 4-byte integers?


FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
worth its weight these days, because deforming via slots starts at the
beginning anyway. The overhead of maintaining it is not insubstantial, and
it's just architecturally ugly to update tupledescs continually.


Not for your current goal, but I do wonder how hard it'd be to make it work to
store multiple booleans as bitmasks. Probably ties into the discussion around
not relying on struct "mapping" for catalog tables (which we IIRC decided is
the sensible way to address the NAMEDATALEN restriction).

E.g. pg_attribute has 6 booleans, and attgenerated effectively is a boolean
too, and attidentity could easily be modeled as such as well.

If we were to not rely on struct mapping anymore, we could possibly transparently
do this as part of forming/deforming heap tuples. Using something like
TYPALIGN_BIT. The question is whether it'd be too expensive to decode...
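
Purely as an illustration of the packing idea (bit positions and names
are invented here, not a concrete proposal):

#include <stdint.h>

typedef uint8_t AttrFlags;

#define ATT_NOTNULL		(1 << 0)	/* attnotnull */
#define ATT_HASDEF		(1 << 1)	/* atthasdef */
#define ATT_HASMISSING	(1 << 2)	/* atthasmissing */
#define ATT_ISDROPPED	(1 << 3)	/* attisdropped */
#define ATT_ISLOCAL		(1 << 4)	/* attislocal */
#define ATT_BYVAL		(1 << 5)	/* attbyval */
#define ATT_IDENTITY	(1 << 6)	/* attidentity != '\0' */
#define ATT_GENERATED	(1 << 7)	/* attgenerated != '\0' */

#define AttrFlagsIsSet(flags, f)	(((flags) & (f)) != 0)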

Greetings,

Andres Freund




Re: PostgreSQL 16 Release Management Team & Feature Freeze

2023-03-21 Thread Roberto Mello
On Tue, Mar 21, 2023 at 9:35 AM Jonathan S. Katz  wrote:
>
> You can track open items for the PostgreSQL 16 release here:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

The wiki page references April 8th, 2022, btw.

Roberto




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Peter Eisentraut

On 21.03.23 17:43, Andres Freund wrote:

The context of my message was to do the proposed change for PG16 to buy back
a few bytes that are being added by another feature

How much would you need to buy back to "reach parity"?


I don't think we can find enough to make the impact zero bytes.  It's 
also not clear exactly what the impact of each byte would be (compared 
to possible complications in other parts of the code, for example).  But 
if there are a few low-hanging fruit, it seems like we could pick them, 
to hold us over until we have a better solution to the underlying issue.






Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-21 Thread Robert Haas
On Tue, Mar 21, 2023 at 12:34 PM Andres Freund  wrote:
> More generally, I still think we need logic to use unused buffers even when
> strategies are in use

Yep.

> (my current theory is that we wouldn't increase the
> usagecount when strategies use unused buffers, so they can be replaced more
> easily).

Don't know about this part.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add SHELL_EXIT_CODE to psql

2023-03-21 Thread Tom Lane
Corey Huinker  writes:
> On Mon, Mar 20, 2023 at 1:01 PM Tom Lane  wrote:
>> * Why do you have wait_result_to_exit_code defaulting to return 0
>> if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED?
>> That seems pretty misleading; again -1 would be a better idea.

> That makes sense as well. Given that WIFSIGNALED is currently defined as
> the negation of WIFEXITED, whatever default result we have here is
> basically a this-should-never-happen.

Good point.  So we'd better have it first pass through -1 literally,
else pclose() or system() failure will be reported as something
misleading (probably signal 127?).
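
A sketch of the resulting shape (the committed helper may differ in
detail; the literal -1 pass-through is the point under discussion):

#include <sys/wait.h>

/*
 * Map a wait(2)-style status to a shell-style exit code, passing a
 * literal -1 (pclose()/system() failure) through untouched and
 * returning -1 for anything unrecognized.
 */
static int
wait_result_to_exit_code(int exitstatus)
{
	if (exitstatus == -1)
		return -1;				/* pclose() or system() itself failed */
	if (WIFEXITED(exitstatus))
		return WEXITSTATUS(exitstatus);
	if (WIFSIGNALED(exitstatus))
		return -WTERMSIG(exitstatus);
	return -1;					/* should never happen */
}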

Pushed with that change, some cosmetic adjustments, and one significant
logic change in do_backtick: I made it do

	if (fd)
	{
		/*
		 * Although pclose's result always sets SHELL_EXIT_CODE, we
		 * historically have abandoned the backtick substitution only if it
		 * returns -1.
		 */
		exit_code = pclose(fd);
		if (exit_code == -1)
		{
			pg_log_error("%s: %m", cmd);
			error = true;
		}
	}

As you had it, any nonzero result would prevent backtick substitution.
I'm not really sure how much thought went into the existing behavior,
but I am pretty sure that it's not part of this patch's charter to
change that.

regards, tom lane




Re: Transparent column encryption

2023-03-21 Thread Peter Eisentraut

On 16.03.23 17:36, Andres Freund wrote:

Maybe a daft question, but why do we need a separate type and typmod for
encrypted columns? Why isn't the fact that the column is encrypted exactly one
new field, and we use the existing type/typmod fields?


The way this is implemented is that for an encrypted column, the real 
atttypid and atttypmod are one of the encrypted special types 
(pg_encrypted_*).  That way, most of the system doesn't need to care 
about the details of encryption or whatnot, it just unpacks tuples etc. 
by looking at atttypid, atttyplen, etc., and queries on encrypted data 
behave normally by just looking at what operators etc. those types have. 
 This approach heavily contains the number of places that need to know 
about this feature at all.



Do we need to decouple tuple descriptors from pg_attribute altogether?


Yes. Very clearly. The amount of memory and runtime we spent on tupledescs is
disproportionate. A second angle is that we build tupledescs way way too
frequently. The executor infers them everywhere, so not even prepared
statements protect against that.



How do we decide what goes into the tuple descriptor and what does not?  I'm
interested in addressing this, because obviously we do want the ability to
add more features in the future, but I don't know what the direction should
be.


We've had some prior discussion around this, see e.g.
https://postgr.es/m/20210819114435.6r532qbadcsyfscp%40alap3.anarazel.de


This sounds like a good plan.






Re: Save a few bytes in pg_attribute

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 17:36:48 +0100, Peter Eisentraut wrote:
> On 21.03.23 00:51, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2023-03-20 10:37:36 -0400, Tom Lane wrote:
> > > > I agree that attinhcount could be narrowed, but I have some concern
> > > > about attstattarget.  IIRC, the limit on attstattarget was once 1000
> > > > and then we raised it to 10000.  Is it inconceivable that we might
> > > > want to raise it to 100000 someday?
> > 
> > > Hard to believe that'd happen in a minor version - and I don't think 
> > > there'd
> > > an issue with widening it again in a major version?
> > 
> > True.  However, I think Tomas' idea of making these columns nullable
> > is even better than narrowing them.

Why not do both?


> The context of my message was to do the proposed change for PG16 to buy back
> a few bytes that are being added by another feature

How much would you need to buy back to "reach parity"?

Greetings,

Andres Freund




Re: Save a few bytes in pg_attribute

2023-03-21 Thread Peter Eisentraut

On 21.03.23 00:51, Tom Lane wrote:

Andres Freund  writes:

On 2023-03-20 10:37:36 -0400, Tom Lane wrote:

I agree that attinhcount could be narrowed, but I have some concern
about attstattarget.  IIRC, the limit on attstattarget was once 1000
and then we raised it to 10000.  Is it inconceivable that we might
want to raise it to 100000 someday?



Hard to believe that'd happen in a minor version - and I don't think there'd
an issue with widening it again in a major version?


True.  However, I think Tomas' idea of making these columns nullable
is even better than narrowing them.


The context of my message was to do the proposed change for PG16 to buy 
back a few bytes that are being added by another feature, and then 
consider doing a larger detangling of pg_attribute and tuple descriptors 
in PG17, which might well involve taking the attstattarget out of the 
hot path.  Making attstattarget nullable (i.e., not part of the fixed 
part of pg_attribute) would require fairly significant surgery, so I 
think it would be better done as part of a more comprehensive change 
that would allow the same treatment for other columns as well.







Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-21 Thread Andres Freund
Hi,

On 2023-03-21 11:33:59 -0400, Robert Haas wrote:
> On Tue, Mar 21, 2023 at 3:01 AM Andres Freund  wrote:
> > Easy enough to fix and shows clear improvement. One thing I wonder is if 
> > it's
> > worth moving the strategies up one level? Probably not, but ...
> 
> Hmm, so share a strategy across all relation forks? You could even
> push it up a level beyond that and share it across all relations being
> copied.

The latter is what I was wondering about.


> That feels like it would be slightly more rational behavior,
> but I'm not smart enough to guess whether anyone would actually be
> happier (or less happy) after such a change than they are now.

Yea, I'm not either. The current behaviour does have the feature that it will
read in some data for each table, but limits trashing of shared buffers for
huge tables. That's good if your small to medium sized source database isn't
in s_b, because the next CREATE DATABASE has a chance to not need to read the
data again. But if you have a source database with lots of small relations, it
can easily lead to swamping s_b.

More generally, I still think we need logic to use unused buffers even when
strategies are in use (my current theory is that we wouldn't increase the
usagecount when strategies use unused buffers, so they can be replaced more
easily).

Greetings,

Andres Freund




Re: Request for comment on setting binary format output per session

2023-03-21 Thread Dave Cramer
On Tue, 21 Mar 2023 at 11:52, Jeff Davis  wrote:

> On Mon, 2023-03-20 at 20:18 -0400, Dave Cramer wrote:
> > For JAVA pools it's probably OK, but for pools like pgbouncer we have
> > no control of who is going to get the connection next.
>
> Can pgbouncer use different pools for different settings of
> format_binary?
>
>
My concern here is that if I can only change binary format in the startup
parameter then when I return the connection to the pool I would expect the
pool to reset all session level settings including binary format.
The next time I borrow the connection I can no longer set binary format.


>


Re: Schema variables - new implementation for Postgres 15

2023-03-21 Thread Peter Eisentraut

On 17.03.23 21:50, Pavel Stehule wrote:

rebase + fix-update pg_dump tests


I have a few comments on the code:

0001

ExecGrant_Variable() could probably use ExecGrant_common().

The additions to syscache.c should be formatted to the new style.

in pg_variable.h:

- create_lsn ought to have a "var" prefix.

- typo: "typmode for variable's type"

- What is the purpose of struct Variable?  It seems very similar to
  FormData_pg_variable.  At least a comment would be useful.

Preserve the trailing comma in ParseExprKind.


0002

expr_kind_allows_session_variables() should have some explanation
about criteria for determining which expression kinds should allow
variables.

Usually, we handle EXPR_KIND_* switches without default case, so we
get notified what needs to be changed if a new enum symbol is added.


0010

The material from the tutorial (advanced.sgml) might be better in
ddl.sgml.

In catalogs.sgml, the columns don't match the ones actually defined in
pg_variable.h in patch 0001 (e.g., create_lsn is missing and the order
doesn't match).

(The order of columns in pg_variable.h didn't immediately make sense to 
me either, so maybe there is a middle ground to be found.)


session_variables_ambiguity_warning: There needs to be more
information about this.  The current explanation is basically just,
"warn if your query is confusing".  Why do I want that?  Why would I
not want that?  What is the alternative?  What are some examples?
Shouldn't there be a standard behavior without a need to configure
anything?

In allfiles.sgml, dropVariable should be before dropView.





Re: Request for comment on setting binary format output per session

2023-03-21 Thread Jeff Davis
On Mon, 2023-03-20 at 20:18 -0400, Dave Cramer wrote:
> For JAVA pools it's probably OK, but for pools like pgbouncer we have
> no control of who is going to get the connection next.

Can pgbouncer use different pools for different settings of
format_binary?

Regards,
Jeff Davis





PostgreSQL 16 Release Management Team & Feature Freeze

2023-03-21 Thread Jonathan S. Katz

Hi,

We are pleased to announce the Release Management Team (RMT) (cc'd) for
the PostgreSQL 16 release:

   - Alvaro Herrera
   - Amit Kapila
   - Jonathan Katz

You can find information about the responsibilities of the RMT here:

   https://wiki.postgresql.org/wiki/Release_Management_Team

Additionally, the RMT has set the feature freeze to be **April 8, 2023 
at 0:00 AoE**[1]. This is the last time to commit features for 
PostgreSQL 16. In other words, no new PostgreSQL 16 feature can be 
committed after April 8, 2023 at 0:00 AoE.


You can track open items for the PostgreSQL 16 release here:

   https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

Finally, the Release Team is considering making April 8 0:00 AoE the 
standard feature freeze date/time for future years. This has been the 
cutoff over the past several years, and standardizing will give 
predictability to when the feature development cycle ends. If you have 
reasons why this should not be the case, please voice your concerns.


Please let us know if you have any questions.

On behalf of the PG16 RMT,

Jonathan

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth


OpenPGP_signature
Description: OpenPGP digital signature


Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-21 Thread Robert Haas
On Tue, Mar 21, 2023 at 3:01 AM Andres Freund  wrote:
> While hacking on my relation extension patch I found two issues with WAL_LOG:
>
> 1) RelationCopyStorageUsingBuffer() doesn't free the used strategies. This
>means we'll use #relations * ~10k memory

Woops.

> 2) RelationCopyStorageUsingBuffer() gets the buffer for the target relation
>with RBM_NORMAL, therefore requiring a read of a block guaranteed to be
>zero

Woops.
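
Presumably the fixes amount to something like this in
RelationCopyStorageUsingBuffer() (a sketch against the existing bufmgr
API; variable names are illustrative):

	/* (2) the destination block is guaranteed to be zero, so don't read it */
	dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
									   RBM_ZERO_AND_LOCK, bstrategy_dst,
									   permanent);

	/* ... copy the source page into dstBuf, WAL-log, release buffers ... */

	/* (1) don't leak the ~10kB ring strategies, one pair per relation */
	FreeAccessStrategy(bstrategy_src);
	FreeAccessStrategy(bstrategy_dst);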

> Easy enough to fix and shows clear improvement. One thing I wonder is if it's
> worth moving the strategies up one level? Probably not, but ...

Hmm, so share a strategy across all relation forks? You could even
push it up a level beyond that and share it across all relations being
copied. That feels like it would be slightly more rational behavior,
but I'm not smart enough to guess whether anyone would actually be
happier (or less happy) after such a change than they are now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Pavel Stehule
út 21. 3. 2023 v 16:32 odesílatel Justin Pryzby 
napsal:

> On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> > ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
> napsal:
> >
> > > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > > rebase + enhancing about related option from a563c24
> > >
> > > Thanks.
> > >
> > > It looks like this doesn't currently handle extensions, which were
> added
> > > at 6568cef26e.
>
> What about this part ?  Should extension filters be supported ?
>

I missed this, yes, it should be supported.



>
> I think the comment that I'd patched that lists all the filter types
> should be minimized, rather than duplicating the list of all the
> possible filters that's already in the user-facing documentation.
>
> One new typo: childrent
>


Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-03-21 Thread Christoph Berg
Hi,

I reviewed the patch and find the idea of allowing $placeholders with
EXPLAIN very useful, it will solve relevant real-world use-cases.
(Queries from pg_stat_statements, found in the log, or in application
code.)

I have some comments:

> This allows EXPLAIN to generate generic plans for parameterized statements
> (that have parameter placeholders like $1 in the statement text).

> +   
> +GENERIC_PLAN
> +
> + 
> +  Generate a generic plan for the statement (see <xref linkend="sql-prepare"/>
> +  for details about generic plans).  The statement can contain parameter
> +  placeholders like $1 (but then it has to be a 
> statement
> +  that supports parameters).  This option cannot be used together with
> +  ANALYZE, since a statement with unknown parameters
> +  cannot be executed.

Like in the commit message quoted above, I would put more emphasis on
"parameterized query" here:

  Allow the statement to contain parameter placeholders like
  $1 and generate a generic plan for it.
  This option cannot be used together with ANALYZE.

> + /* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */
> + if (es->generic && es->analyze)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("EXPLAIN ANALYZE cannot be used with 
> GENERIC_PLAN")));

To put that in line with the other error messages in that context, I'd
inject an extra "option":

  errmsg("EXPLAIN option ANALYZE cannot be used with GENERIC_PLAN")));

> --- a/src/test/regress/sql/explain.sql
> +++ b/src/test/regress/sql/explain.sql
> @@ -128,3 +128,33 @@ select explain_filter('explain (verbose) select * from 
> t1 where pg_temp.mysin(f1
>  -- Test compute_query_id
>  set compute_query_id = on;
>  select explain_filter('explain (verbose) select * from int8_tbl i8');
> +
> +-- Test EXPLAIN (GENERIC_PLAN)
> +select explain_filter('explain (generic_plan) select unique1 from tenk1 
> where thousand = $1');
> +-- should fail
> +select explain_filter('explain (analyze, generic_plan) select unique1 from 
> tenk1 where thousand = $1');
> +-- Test EXPLAIN (GENERIC_PLAN) with partition pruning
> +-- should prune at plan time, but not at execution time
> +create extension if not exists postgres_fdw;

"create extension postgres_fdw" cannot be used from src/test/regress/
since contrib/ might not have been built.

> +create server loop42 foreign data wrapper postgres_fdw;
> +create user mapping for current_role server loop42 options 
> (password_required 'false');
> +create table gen_part (
> +  key1 integer not null,
> +  key2 integer not null
> +) partition by list (key1);
> +create table gen_part_1
> +  partition of gen_part for values in (1)
> +  partition by range (key2);
> +create foreign table gen_part_1_1
> +  partition of gen_part_1 for values from (1) to (2)
> +  server loop42 options (table_name 'whatever_1_1');
> +create foreign table gen_part_1_2
> +  partition of gen_part_1 for values from (2) to (3)
> +  server loop42 options (table_name 'whatever_1_2');
> +create foreign table gen_part_2
> +  partition of gen_part for values in (2)
> +  server loop42 options (table_name 'whatever_2');
> +select explain_filter('explain (generic_plan) select key1, key2 from 
> gen_part where key1 = 1 and key2 = $1');

I suggest leaving this test in place here, but with local tables (to
show that plan time pruning using the one provided parameter works),
and add a comment here explaining that is being tested:

-- create a partition hierarchy to show that plan time pruning removes
-- the key1=2 table but generates a generic plan for key2=$1

The test involving postgres_fdw is still necessary to exercise the new
EXEC_FLAG_EXPLAIN_GENERIC code path, but needs to be moved elsewhere,
probably src/test/modules/.

In the new location, likewise add a comment why this test needs to
look this way.

Christoph




Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Justin Pryzby
On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby  napsal:
> 
> > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > rebase + enhancing about related option from a563c24
> >
> > Thanks.
> >
> > It looks like this doesn't currently handle extensions, which were added
> > at 6568cef26e.

What about this part ?  Should extension filters be supported ?

I think the comment that I'd patched that lists all the filter types
should be minimized, rather than duplicating the list of all the
possible filters that's already in the user-facing documentation.

One new typo: childrent




Making the initial and maximum DSA segment sizes configurable

2023-03-21 Thread Masahiko Sawada
Hi all,

I started this new thread from another thread[1] where we're
discussing a new storage for TIDs, TidStore, since we found a
difficulty about the memory usage limit for TidStores on DSA.

TidStore is a new data structure to efficiently store TIDs, backed by
a radix tree. In the patch series proposed on that thread, in addition
to radix tree and TidStore, there is another patch for lazy (parallel)
vacuum to replace the array of dead tuple TIDs with a TidStore. To
support parallel vacuum, radix tree (and TidStore) can be created on a
local memory as well as on DSA. Also, it has memory usage limit
functionality; we can specify the memory limit (e.g.,
maintenance_work_mem) to TidStoreCreate() function. Once the total DSA
segment size (area->control->total_segment_size) exceeds the limit,
TidStoreIsFull() returns true. The lazy vacuum can continue scanning
heap blocks to collect dead tuple TIDs until TidStoreIsFull() returns
true. Currently lazy vacuum is the sole user of TidStore, but maybe it
can be used by other code such as tidbitmap.c, where memory would be
limited by work_mem.
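
The intended usage pattern then looks roughly like this (function names
follow the WIP patch series; the exact signatures here are assumptions):

	/* collect dead TIDs until the store reaches the limit given at creation */
	TidStore   *ts = TidStoreCreate(maintenance_work_mem * 1024L,
									NULL /* or a dsa_area for parallel vacuum */ );

	for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
	{
		/* ... prune the page and gather dead item offsets ... */
		TidStoreSetBlockOffsets(ts, blkno, offsets, noffsets);

		if (TidStoreIsFull(ts))
			break;				/* vacuum indexes, then resume the heap scan */
	}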

During the development, we found out that DSA memory growth is
unpredictable, leading to inefficient memory limitation.

DSA is built on top of DSM segments and it manages a set of DSM
segments, adding new segments as required and detaching them when they
are no longer needed. The DSA segment starts with 1MB in size and a
new segment size is at least big enough to follow a geometric series
that approximately doubles the total storage each time we create a new
segment. Because of this fact, it's not efficient to simply compare
the memory limit to the total segment size. For example, if
maintenance_work_mem is 512MB, the total segment size will be like:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) = 510MB -> less than the
limit, continue heap scan.

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) + 256 = 766MB -> stop (exceeds the limit by 254MB).

One might think we can use dsa_set_size_limit() but it cannot; lazy
vacuum ends up with an error. If we set DSA_ALLOC_NO_OOM, we might end
up stopping the insertion halfway.

Besides excessively allocating memory, since the initial DSM segment
size is fixed 1MB, memory usage of a shared TidStore will start from
1MB+. This is higher than the minimum values of both work_mem and
maintenance_work_mem, 64kB and 1MB respectively. Increasing the
minimum m_w_m to 2MB might be acceptable but not for work_mem.

Researching possible solutions, we found that aset.c has a similar
characteristic: it allocates an 8kB block (by default) upon the first
allocation in a context and doubles that size for each successive
block request, but we can specify the initial and maximum block sizes.
This made me think of an idea: specify both for DSA as well, with both
values calculated based on m_w_m. I've attached the patch for this
idea. The changes to dsa.c are straightforward since dsa.c already
uses the macros DSA_INITIAL_SEGMENT_SIZE and DSA_MAX_SEGMENT_SIZE; I
just made these values configurable.

FYI with this patch, we can create a DSA in parallel_vacuum_init()
with initial and maximum block sizes as follows:

initial block size = min(m_w_m / 4, 1MB)
max block size = max(m_w_m / 8, 8MB)
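
Expressed as code, the sizing could look roughly like this (a sketch
with invented helper names, not the patch itself):

```c
#include <stddef.h>

#define KB	((size_t) 1024)
#define MB	(1024 * KB)

/* initial block size = min(m_w_m / 4, 1MB) */
static size_t
initial_seg_size(size_t m_w_m)
{
	size_t		sz = m_w_m / 4;

	return sz < 1 * MB ? sz : 1 * MB;
}

/* max block size = max(m_w_m / 8, 8MB) */
static size_t
max_seg_size(size_t m_w_m)
{
	size_t		sz = m_w_m / 8;

	return sz > 8 * MB ? sz : 8 * MB;
}
```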

In most cases, we can start with a 1MB initial segment, the same as
before. For larger memory, the heap scan stops after DSA allocates at
most 1.25 times more memory than m_w_m. For example, if m_w_m = 512MB,
the initial and maximum segment sizes are 1MB and 64MB respectively,
and DSA allocates the segments as follows until heap scanning stops:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 4) = 510MB -> less than the
limit, continue heap scan.

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 5) = 574MB -> stop
(allocated additional 62MB).

It also works with smaller memory: if the limit is 1MB, we start with
a 256kB initial segment and heap scanning stops after DSA has
allocated 1.5MB (= 256kB + 256kB + 512kB + 512kB).

There is room to consider better formulas for the initial and maximum
block sizes, but making both values configurable is a promising idea.
And behavior analogous to aset.c could be a good thing for readability
and maintainability. There is another test result where I used this
idea on top of the radix tree[2].

We need to consider the total number of allocated DSA segments, as the
total number of DSM segments available on the system is fixed[3]. But
it seems unproblematic even with this patch since we allocate only a
few additional segments (17 vs. 19 segments in the above examples).
There was also no big difference in performance[2].

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoDBmD5q%3DeO%2BK%3DgyuVt53XvwpJ2dgxPwrtZ-eVOjVmtJjg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAD21AoDKr%3D4YHphy6cRojE5eyT6E2ao8xb44E309eTrUEOC6xw%40mail.gmail.com
[3] from dsm.c, the total number of DSM segments available on the
system is calculated as PG_DYNSHMEM_FIXED_SLOTS +
PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends, with:
#define PG_DYNSHMEM_FIXED_SLOTS 64
#define PG_DYNSHMEM_SLOTS_PER_BACKEND   5

Re: Request for comment on setting binary format output per session

2023-03-21 Thread Merlin Moncure
On Tue, Mar 21, 2023 at 8:22 AM Dave Cramer  wrote:

>
> On Tue, 21 Mar 2023 at 07:35, Merlin Moncure  wrote:
>
>>
>>
>> On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer  wrote:
>>
>>>
>>>
>>>
>>> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure  wrote:
>>>


 On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer 
 wrote:

>
> OIDs are a pain to deal with IMO.   They will not survive a dump style
 restore, and are hard to keep synchronized between databases...type names
 don't have this problem.   OIDs are an implementation artifact that ought
 not need any extra dependency.

>>> AFAIK, OID's for built-in types don't change.
>>> Clearly we need more thought on how to deal with UDT's
>>>
>>
>> Yeah.  Not having a solution that handles arrays and composites though
>> would feel pretty incomplete since they would be the one of the main
>> beneficiaries from a performance standpoint.
>>
> I don't think arrays of built-in types are a problem; drivers already know
> how to deal with these.
>
>
>> I guess minimally you'd need to expose some mechanic to look up oids, but
>> being able to specify "foo"."bar", in the GUC would be pretty nice (albeit
>> a lot more work).
>>
>
> As Jeff mentioned there is a visibility problem if the search path is
> changed.
>

Only if the name is not fully qualified. By allowing OID to bypass
visibility, it stands to reason visibility ought to be bypassed for type
requests as well, or at least be able to be.  If we are setting things in
GUC, that suggests we can establish things in postgresql.conf, and oids
feel out of place there.

>> Yep.  Your rationale is starting to click.  How would this interact with
>> existing code bases?
>>
> Actually JDBC wasn't the first to ask for this.  Default result formats
> should be settable per session · postgresql-interfaces/enhancement-ideas ·
> Discussion #5 (github.com)
>  
> I've
> tested it with JDBC and it requires no code changes on our end. Jack tested
> it and it required no code changes on his end either. He did some
> performance tests and found "At 100 rows the text format takes 48% longer
> than the binary format."
> https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599
>

Yeah, the general need is very clear IMO.


>> I get that JDBC is the main target, but how does this interact with libpq
>> code that explicitly sets resultformat?
>>
> Honestly I have no idea how it would function with libpq. I presume if the
> client did not request binary format then things would work as they do
> today.
>

I see your argument here, but IMO this is another nudge away from a
GUC, unless you're willing to establish that behavior.  The thinking
here is that the GUC wouldn't do anything for libpq use cases, and
couldn't, since resultformat would be overriding the behavior in all
interesting cases...it seems odd to implement server-side-specified
behavior that the client library doesn't implement.

merlin


Re: Ability to reference other extensions by schema in extension scripts

2023-03-21 Thread Tom Lane
"Regina Obe"  writes:
>> making the no_relocate values visible somehow, I'm not convinced that
>> pg_available_extension_versions should be the place to do it.  ISTM what's
>> relevant is the no_relocate values of *installed* extensions, not those of
>> potentially-installable extensions.  If we had a view over pg_extension then
>> that might be a place to add this, but we don't.  On the whole it didn't seem
>> important enough to pursue, so I just left it out.

> Thanks.  Agree with get_available_versions_for_extension, not necessary.

If we did feel like doing something about this, on reflection I think
the thing to do would be to add no_relocate as an actual column in
pg_extension, probably of type "oid[]".  Then we could modify the
SET SCHEMA code to check that instead of parsing the extension control
files.  That'd be a little cleaner, but I can't say that I'm hugely
excited about it.

regards, tom lane




RE: Ability to reference other extensions by schema in extension scripts

2023-03-21 Thread Regina Obe
> Pushed with some mostly-cosmetic adjustments (in particular I tried to
make
> the docs and tests neater).
> 
> I did not commit the changes in get_available_versions_for_extension
> to add no_relocate as an output column.  Those were dead code because you
> hadn't done anything to connect them up to an actual output parameter of
> pg_available_extension_versions().  While I'm not necessarily averse to
> making the no_relocate values visible somehow, I'm not convinced that
> pg_available_extension_versions should be the place to do it.  ISTM what's
> relevant is the no_relocate values of *installed* extensions, not those of
> potentially-installable extensions.  If we had a view over pg_extension
then
> that might be a place to add this, but we don't.  On the whole it didn't
seem
> important enough to pursue, so I just left it out.
> 
>   regards, tom lane


Thanks.  Agree with get_available_versions_for_extension, not necessary.

Thanks,
Regina





Re: Comment in preptlist.c

2023-03-21 Thread Tom Lane
Etsuro Fujita  writes:
> While working on something else, I noticed $SUBJECT added by commit 86dc90056:
>  * For UPDATE and DELETE queries, the targetlist must also contain "junk"
>  * tlist entries needed to allow the executor to identify the rows to be
>  * updated or deleted; for example, the ctid of a heap row.  (The planner
>  * adds these; they're not in what we receive from the planner/rewriter.)

> I think that “planner/rewriter” should be parser/rewriter.  Attached
> is a patch for that.

Agreed, obviously a thinko :-(

regards, tom lane




Re: doc: add missing "id" attributes to extension packaging page

2023-03-21 Thread Brar Piening

On 20.03.2023 at 19:47, Gregory Stark (as CFM) wrote:

Looks like a lot of good work was happening on this patch right up
until mid-February. Is there a lot of work left? Do you think you'll
have a chance to wrap it up this commitfest for this release?


Thanks for the ping.

I had another look this morning and I think I can probably finish this
by the end of the week.





Re: real/float example for testlibpq3

2023-03-21 Thread Mark Wong
On Mon, Mar 20, 2023 at 01:37:57PM -0400, Gregory Stark (as CFM) wrote:
> On Mon, 23 Jan 2023 at 11:54, Mark Wong  wrote:
> fficient way to communicate useful information.
> >
> > Yeah, I will try to cover all the data types we ship. :)  I'll keep at
> > it and drop the code examples.
> 
> I assume this isn't going to happen for this commitfest? If you think
> it is then shout otherwise I'll mark it Returned with Feedback and
> move it on to the next release.

Sounds good.  I actually thought I did that already, thanks for catching
that.

Regards,
Mark
--
Mark Wong
EDB https://enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-03-21 Thread Dave Cramer
On Tue, 21 Mar 2023 at 07:35, Merlin Moncure  wrote:

>
>
> On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer  wrote:
>
>>
>>
>>
>> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure  wrote:
>>
>>>
>>>
>>> On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer 
>>> wrote:
>>>

 OIDs are a pain to deal with IMO.   They will not survive a dump style
>>> restore, and are hard to keep synchronized between databases...type names
>>> don't have this problem.   OIDs are an implementation artifact that ought
>>> not need any extra dependency.
>>>
>> AFAIK, OID's for built-in types don't change.
>> Clearly we need more thought on how to deal with UDT's
>>
>
> Yeah.  Not having a solution that handles arrays and composites though
> would feel pretty incomplete since they would be the one of the main
> beneficiaries from a performance standpoint.
>
I don't think arrays of built-in types are a problem; drivers already know
how to deal with these.


> I guess minimally you'd need to expose some mechanic to look up oids, but
> being able to specify "foo"."bar", in the GUC would be pretty nice (albeit
> a lot more work).
>

As Jeff mentioned there is a visibility problem if the search path is
changed. The simplest solution IMO is to look up the OID at the time the
format is requested and use the OID going forward to format the output as
binary. If the search path changes and a type with the same name is now
first in the search path then the data would be returned in text.


>

>
>> This seems like a protocol or even a driver issue rather than a GUC
>>> issue. Why does the server need to care what format the client might want
>>> to prefer on a query by query basis?
>>>
>>
>> Actually this isn't a query by query basis. The point of this is that the
>> client wants all the results for given OID's in binary.
>>
>
> Yep.  Your rationale is starting to click.  How would this interact with
> existing code bases?
>
Actually JDBC wasn't the first to ask for this.  Default result formats
should be settable per session · postgresql-interfaces/enhancement-ideas ·
Discussion #5 (github.com)
 I've
tested it with JDBC and it requires no code changes on our end. Jack tested
it and it required no code changes on his end either. He did some
performance tests and found "At 100 rows the text format takes 48% longer
than the binary format."
https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599

I get that JDBC is the main target, but how does this interact with libpq
> code that explicitly sets resultformat?
>
Honestly I have no idea how it would function with libpq. I presume if the
client did not request binary format then things would work as they do
today.
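
For reference, the libpq mechanism in question is the resultFormat
argument of PQexecParams(); below is a minimal sketch (the query is
made up) of a caller that explicitly requests binary results, which
any session-level default would have to coexist with:

```c
#include <libpq-fe.h>

void
fetch_binary(PGconn *conn)
{
	PGresult   *res = PQexecParams(conn,
								   "SELECT id FROM foo",
								   0, NULL, NULL, NULL, NULL,
								   1);	/* resultFormat: 0 = text, 1 = binary */

	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
	{
		/* values now arrive in the type's binary wire format */
		char	   *raw = PQgetvalue(res, 0, 0);
		int			len = PQgetlength(res, 0, 0);

		(void) raw;
		(void) len;
	}
	PQclear(res);
}
```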


Dave

>
>


Re: [BUG] pg_stat_statements and extended query protocol

2023-03-21 Thread Imseih (AWS), Sami
> This indeed feels a bit more natural seen from here, after looking at
> the code paths using an Instrumentation in the executor and explain,
> for example. At least, this stresses me much less than adding 16
> bytes to EState for something restricted to the extended protocol when
> it comes to monitoring capabilities.

Attached is the patch that uses Instrumentation. 

I did not add any new tests, as we currently have no way
of setting a row count when going through the Execute message.
I think this may need to be addressed separately, since there
seems to be a gap in extended query protocol testing.

For this fix, however, the JDBC test does show correct results.


Regards,

Sami Imseih
Amazon Web Services (AWS)





0001-Correct-accumulation-of-counters-for-extended-query-.patch
Description: 0001-Correct-accumulation-of-counters-for-extended-query-.patch


Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-03-21 Thread Alvaro Herrera
On 2023-Mar-21, Dean Rasheed wrote:

> Looking at it with fresh eyes though, I realise that I could have just written
> 
> action->qual = make_and_qual((Node *) ntest, action->qual);
> 
> which is equivalent, but more concise.

Nice.

I have no further observations about this patch.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-03-21 Thread Dean Rasheed
On Tue, 21 Mar 2023 at 10:28, Alvaro Herrera  wrote:
>
> > + /* Combine it with the action's WHEN condition */
> > + if (action->qual == NULL)
> > + action->qual = (Node *) ntest;
> > + else
> > + action->qual =
> > + (Node *) makeBoolExpr(AND_EXPR,
> > +   
> > list_make2(ntest, action->qual),
> > +   
> > -1);
>
> Hmm, I think ->qual is already in implicit-and form, so do you really
> need to makeBoolExpr, or would it be sufficient to append this new
> condition to the list?
>

No, this has come directly from transformWhereClause() in the parser,
so it's an expression tree, not a list. Transforming to implicit-and
form doesn't happen until later.

Looking at it with fresh eyes though, I realise that I could have just written

action->qual = make_and_qual((Node *) ntest, action->qual);

which is equivalent, but more concise.
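
For reference, make_and_qual() folds the NULL case in, which is why
the two forms are equivalent. As I recall (quoting from memory, so
approximate):

```c
Node *
make_and_qual(Node *qual1, Node *qual2)
{
	if (qual1 == NULL)
		return qual2;
	if (qual2 == NULL)
		return qual1;
	return (Node *) make_andclause(list_make2(qual1, qual2));
}
```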

Regards,
Dean




Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread Önder Kalacı
Hi Shi Yu,

>
>
> 1.
>  $node_publisher->safe_psql(
> 'postgres', qq(
> ALTER TABLE dropped_cols DROP COLUMN b_drop;
> +   ALTER TABLE generated_cols DROP COLUMN b_gen;
>  ));
>  $node_subscriber->safe_psql(
> 'postgres', qq(
> ALTER TABLE dropped_cols DROP COLUMN b_drop;
> +   ALTER TABLE generated_cols DROP COLUMN b_gen;
>  ));
>
> I think we want to test generated columns, so we don't need to drop
> columns.
> Otherwise the generated column problem can't be detected.
>
>
Ow, what a mistake. Now changed (and ensured that without the patch
the test fails).



> 2.
> # The bug was that when the REPLICA IDENTITY FULL is used with dropped
> columns,
> # we fail to apply updates and deletes
>
> Maybe we should mention generated columns in comment of the test.
>
makes sense


> 3.
> I ran pgindent and it modified some lines. Maybe we can improve the patch
> as the following.
>
> @@ -292,8 +292,8 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot
> *slot2,
> att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
>
> /*
> -* Ignore dropped and generated columns as the publisher
> -* doesn't send those
> +* Ignore dropped and generated columns as the publisher
> doesn't send
> +* those
>  */
> if (att->attisdropped || att->attgenerated)
> continue;
>
fixed


Attached patches again.


Thanks,
Onder KALACI


v1-0001-REL_12-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_15-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_14-REL_13-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-HEAD-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-03-21 Thread Aleksander Alekseev
Hi John,

> Thanks for picking up this badly-needed topic again!

Many thanks for the review!

> 0001:
>
> +In this condition the system can still execute read-only transactions.
> +The active transactions will continue to execute and will be able to
> +commit.
>
> This is ambiguous. I'd first say that any transactions already started can 
> continue, and then say that only new read-only transactions can be started.

Fixed.

> 0004:
>
> -HINT:  Stop the postmaster and vacuum that database in single-user mode.
> +HINT:  VACUUM or VACUUM FREEZE that database.
>
> VACUUM FREEZE is worse and should not be mentioned, since it does unnecessary 
> work. Emergency vacuum is not school -- you don't get extra credit for doing 
> unnecessary work.

Fixed.

> Also, we may consider adding a boxed NOTE warning specifically against 
> single-user mode, especially if this recommendation will change in at least 
> some minor releases so people may not hear about it. See also [1].

Done.

> - * If we're past xidStopLimit, refuse to execute transactions, unless
> - * we are running in single-user mode (which gives an escape hatch
> - * to the DBA who somehow got past the earlier defenses).
> + * If we're past xidStopLimit, refuse to allocate new XIDs.
>
> This patch doesn't completely get rid of the need for single-user mode, so it 
> should keep all information about it. If a DBA wanted to e.g. drop or 
> truncate a table to save vacuum time, it is still possible to do it in 
> single-user mode, so the escape hatch is still useful.

Fixed.

> In swapping this topic back in my head, I also saw [2] where Robert suggested
>
> "that old prepared transactions and stale replication
> slots should be emphasized more prominently.  Maybe something like:
>
> HINT:  Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."

It looks like the hint regarding replication slots was added at some
point. Currently we have:

```
errhint( [...]
"You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.")));
```

So I chose to keep it as is for now. Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.

-- 
Best regards,
Aleksander Alekseev


v3-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data


v3-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data


v3-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v3-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data


Re: Request for comment on setting binary format output per session

2023-03-21 Thread Merlin Moncure
On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer  wrote:

>
>
>
> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure  wrote:
>
>>
>>
>> On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer  wrote:
>>
>>>
>>> OIDs are a pain to deal with IMO.   They will not survive a dump style
>> restore, and are hard to keep synchronized between databases...type names
>> don't have this problem.   OIDs are an implementation artifact that ought
>> not need any extra dependency.
>>
> AFAIK, OID's for built-in types don't change.
> Clearly we need more thought on how to deal with UDT's
>

Yeah.  Not having a solution that handles arrays and composites though
would feel pretty incomplete since they would be the one of the main
beneficiaries from a performance standpoint.  I guess minimally you'd
need to expose some mechanic to look up oids, but being able to
specify "foo"."bar", in the GUC would be pretty nice (albeit a lot more
work).


> This seems like a protocol or even a driver issue rather than a GUC issue.
>> Why does the server need to care what format the client might want to
>> prefer on a query by query basis?
>>
>
> Actually this isn't a query by query basis. The point of this is that the
> client wants all the results for given OID's in binary.
>

Yep.  Your rationale is starting to click.  How would this interact with
existing code bases?  I get that JDBC is the main target, but how does this
interact with libpq code that explicitly sets resultformat? Perhaps the
answer is that it shouldn't change documented behavior, and a
hypothetical resultformat=2 could be reserved to default to text but allow
for server control, with 3 the same but defaulting to binary.

merlin


Re: Initial Schema Sync for Logical Replication

2023-03-21 Thread Amit Kapila
On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira  wrote:
>
> On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
>
> > From: Alvaro Herrera 
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> >
> > > 1. In  CreateSubscription()  when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > can use this snapshot later in the pg_dump.
> > >
> > > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> >
> > Overall I'm not on board with the idea that logical replication would 
> > depend on
> > pg_dump; that seems like it could run into all sorts of trouble (what if 
> > calling
> > external binaries requires additional security setup?  what about pg_hba
> > connection requirements? what about max_connections in tight
> > circumstances?).
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case, 
> maybe pg_dump
> binary can be changed?
>
> Using pg_dump as part of this implementation is not acceptable because we
> expect the backend to be decoupled from the client. Besides that, pg_dump
> provides all table dependencies (such as tablespaces, privileges, security
> labels, comments); not all dependencies should be replicated.
>

I agree that in the initial version we may not support sync of all
objects but why that shouldn't be possible in the later versions?

> You should
> exclude them removing these objects from the TOC before running pg_restore or
> adding a few pg_dump options to exclude these objects. Another issue is 
> related
> to different version. Let's say the publisher has a version ahead of the
> subscriber version, a new table syntax can easily break your logical
> replication setup. IMO pg_dump doesn't seem like a good solution for initial
> synchronization.
>
> Instead, the backend should provide infrastructure to obtain the required DDL
> commands for the specific (set of) tables. This can work around the issues 
> from
> the previous paragraph:
>
...
> * don't need to worry about different versions.
>

AFAICU some of the reasons why pg_dump is not allowed to dump from the
newer version are as follows: (a) there could be more columns in the
newer version of the system catalog and then Select * type of stuff
won't work because the client won't have knowledge of additional
columns. (b) the newer version could have new features (represented by
say new columns in existing catalogs or new catalogs) that the older
version of pg_dump has no knowledge of and will fail to get that data
and hence an inconsistent dump. The subscriber will easily be not in
sync due to that.

Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.

-- 
With Regards,
Amit Kapila.




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-03-21 Thread Alvaro Herrera
On 2023-Mar-19, Dean Rasheed wrote:

> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> new file mode 100644
> index efe88cc..e1ebc8d
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y

> +merge_when_tgt_matched:
> + WHEN MATCHED{ $$ = 
> MERGE_WHEN_MATCHED; }
> + | WHEN NOT MATCHED BY SOURCE{ $$ = 
> MERGE_WHEN_NOT_MATCHED_BY_SOURCE; }
> + ;

I think a one-line comment on why this "matched" production matches "NOT
MATCHED BY" would be useful.  I think you have a big one in
transformMergeStmt already.


> + /* Combine it with the action's WHEN condition */
> + if (action->qual == NULL)
> + action->qual = (Node *) ntest;
> + else
> + action->qual =
> + (Node *) makeBoolExpr(AND_EXPR,
> + 
>   list_make2(ntest, action->qual),
> + 
>   -1);

Hmm, I think ->qual is already in implicit-and form, so do you really
need to makeBoolExpr, or would it be sufficient to append this new
condition to the list?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)




RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tue, Mar 21, 2023 4:51 PM Önder Kalacı  wrote:
> 
> Hi Amit, Shi Yu
> 
> Now attaching the similar patches for generated columns.
> 

Thanks for your patches. Here are some comments.

1.
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));

I think we want to test generated columns, so we don't need to drop columns.
Otherwise the generated column problem can't be detected.

2. 
# The bug was that when the REPLICA IDENTITY FULL is used with dropped columns,
# we fail to apply updates and deletes

Maybe we should mention generated columns in comment of the test.

3.
I ran pgindent and it modified some lines. Maybe we can improve the patch
as the following.

@@ -292,8 +292,8 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
 
/*
-* Ignore dropped and generated columns as the publisher
-* doesn't send those
+* Ignore dropped and generated columns as the publisher 
doesn't send
+* those
 */
if (att->attisdropped || att->attgenerated)
continue;

Regards,
Shi Yu


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-21 Thread Ants Aasma
On Mon, 20 Mar 2023 at 00:59, Melanie Plageman
 wrote:
>
> On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma  wrote:
> >
> > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman
> >  wrote:
> > > As for routine vacuuming and the other buffer access strategies, I think
> > > there is an argument for configurability based on operator knowledge --
> > > perhaps your workload will use the data you are COPYing as soon as the
> > > COPY finishes, so you might as well disable a buffer access strategy or
> > > use a larger fraction of shared buffers. Also, the ring sizes were
> > > selected sixteen years ago and average server memory and data set sizes
> > > have changed.
> >
> > To be clear I'm not at all arguing against configurability. I was
> > thinking that dynamic use could make the configuration simpler by self
> > tuning to use no more buffers than is useful.
>
> Yes, but I am struggling with how we would define "useful".

For copy and vacuum, the only reason I can see for keeping visited
buffers around is to avoid flushing WAL or at least doing it in larger
batches. Once the ring is big enough that WAL doesn't need to be
flushed on eviction, making it bigger only wastes space that could be
used by something that is not going to be evicted soon.

> > > StrategyRejectBuffer() will allow bulkreads to, as you say, use more
> > > buffers than the original ring size, since it allows them to kick
> > > dirty buffers out of the ring and claim new shared buffers.
> > >
> > > Bulkwrites and vacuums, however, will inevitably dirty buffers and
> > > require flushing the buffer (and thus flushing the associated WAL) when
> > > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of
> > > the ring, since dirtying buffers is their common case. A dynamic
> > > resizing like the one you suggest would likely devolve to vacuum and
> > > bulkwrite strategies always using the max size.
> >
> > I think it should self stabilize around the point where the WAL is
> > either flushed by other commit activity, WAL writer or WAL buffers
> > filling up. Writing out their own dirtied buffers will still happen,
> > just the associated WAL flushes will be in larger chunks and possibly
> > done by other processes.
>
> They will have to write out any WAL associated with modifications to the
> dirty buffer before flushing it, so I'm not sure I understand how this
> would work.

By the time the dirty buffer needs eviction the WAL associated with it
can already be written out by concurrent commits, WAL writer or by WAL
buffers filling up. The bigger the ring is, the higher the chance that
one of these will happen before we loop around.

> > > As for decreasing the ring size, buffers are only "added" to the ring
> > > lazily and, technically, as it is now, buffers which have been added
> > > added to the ring can always be reclaimed by the clocksweep (as long as
> > > they are not pinned). The buffer access strategy is more of a
> > > self-imposed restriction than it is a reservation. Since the ring is
> > > small and the buffers are being frequently reused, odds are the usage
> > > count will be 1 and we will be the one who set it to 1, but there is no
> > > guarantee. If, when attempting to reuse the buffer, its usage count is
> > > > 1 (or it is pinned), we also will kick it out of the ring and go look
> > > for a replacement buffer.
> >
> > Right, but while the buffer is actively used by the ring it is
> > unlikely that clocksweep will find it at usage 0 as the ring buffer
> > should cycle more often than the clocksweep. Whereas if the ring stops
> > using a buffer, clocksweep will eventually come and reclaim it. And if
> > the ring shrinking decision turns out to be wrong before the
> > clocksweep gets around to reusing it, we can bring the same buffer
> > back into the ring.
>
> I can see what you mean about excluding a buffer from the ring being a
> more effective way of allowing it to be reclaimed. However, I'm not sure
> I understand the use case. If the operation, say vacuum, is actively
> using the buffer and keeping its usage count at one, then what would be
> the criteria for it to decide to stop using it?

The criteria for reducing ring size could be that we have cycled the
ring buffer n times without having to do any WAL flushes.
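
Something like this, as a rough sketch of that heuristic (all names
invented here, and the threshold is arbitrary; this is not a patch):

```c
#include <stdbool.h>

typedef struct AdaptiveRing
{
	int		nbuffers;		/* current ring size, in buffers */
	int		clean_cycles;	/* full cycles since the last forced WAL flush */
} AdaptiveRing;

static void
ring_cycle_completed(AdaptiveRing *ring, bool eviction_flushed_wal,
					 int min_size, int max_size)
{
	if (eviction_flushed_wal)
	{
		/* WAL couldn't keep up: a bigger ring batches flushes better */
		if (ring->nbuffers * 2 <= max_size)
			ring->nbuffers *= 2;
		ring->clean_cycles = 0;
	}
	else if (++ring->clean_cycles >= 3)	/* the "n times" above */
	{
		/* WAL kept up on its own: extra buffers buy nothing, give some back */
		if (ring->nbuffers / 2 >= min_size)
			ring->nbuffers /= 2;
		ring->clean_cycles = 0;
	}
}
```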

> Also, if vacuum used the buffer once and then didn't reuse it but, for
> some reason, the vacuum isn't over, it isn't any different at that point
> than some other buffer with a usage count of one. It isn't any harder
> for it to be reclaimed by the clocksweep.
>
> The argument I could see for decreasing the size even when the buffers
> are being used by the operation employing the strategy is if there is
> pressure from other workloads to use those buffers. But, designing a
> system that would reclaim buffers when needed by other workloads is more
> complicated than what is being proposed here.

I don't think any specific reclaim is needed, if the ring stops using
a buffer *and* there is pressure 

Re: Comment in preptlist.c

2023-03-21 Thread Richard Guo
On Tue, Mar 21, 2023 at 5:41 PM Etsuro Fujita 
wrote:

> While working on something else, I noticed $SUBJECT added by commit
> 86dc90056:
>
>  * For UPDATE and DELETE queries, the targetlist must also contain "junk"
>  * tlist entries needed to allow the executor to identify the rows to be
>  * updated or deleted; for example, the ctid of a heap row.  (The planner
>  * adds these; they're not in what we receive from the planner/rewriter.)
>
> I think that “planner/rewriter” should be parser/rewriter.  Attached
> is a patch for that.


Yes of course.  It should be parser/rewriter here.

Thanks
Richard


Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Alvaro Herrera
On 2023-Mar-20, Thomas Munro wrote:

> I realised that part of Alvaro's complaint was probably caused by
> cfbot's refusal to show any useful information just because it
> couldn't apply a patch the last time it tried.  A small improvement
> today: now it shows a ♲ symbol (with hover text "Rebase needed") if it
> doesn't currently apply, but you can still see the most recent CI test
> results.  And from there you can find your way to the parent commit
> ID.

Thank you for improving and continuing to think about further
enhancements to the CF bot.  It has clearly improved our workflow a lot.

My complaint wasn't actually targetted at the CF bot.  It turns out that
I gave a talk on Friday at a private EDB mini-conference about the
PostgreSQL open source process; and while preparing for that one, I
ran some 'git log' commands to obtain the number of code contributors
for each release, going back to 9.4 (when we started using the
'Authors:' tag more prominently).  What I saw is a decline in the number
of unique contributors, from its maximum at version 12, down to the
numbers we had in 9.5.  We went back 4 years.  That scared me a lot.

So I started a conversation about that and some people told me that it's
very easy to be discouraged by our process.  I don't need to mention
that it's antiquated -- this in itself turns off youngsters.  But in
addition to that, I think newbies might be discouraged because their
contributions seem to go nowhere even after following the process.

This led me to suggesting that perhaps we need to be more lenient when
it comes to new contributors.  As I said, for seasoned contributors,
it's not a problem to keep up with our requirements, however silly they
are.  But people who spend their evenings a whole week or month trying
to understand how to patch for one thing that they want, to be received
by six months of silence followed by a constant influx of "please rebase
please rebase please rebase", no useful feedback, and termination with
"eh, you haven't rebased for the 1001th time, your patch has been WoA
for X days, we're setting it RwF, feel free to return next year" ...
they are most certainly off-put and will *not* try again next year.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Por suerte hoy explotó el califont porque si no me habría muerto
 de aburrido"  (Papelucho)




Re: [PATCH] Add initial xid/mxid/mxoff to initdb

2023-03-21 Thread Maxim Orlov
On Mon, 20 Mar 2023 at 22:31, Gregory Stark (as CFM) 
wrote:

>
> So is that other thread tracked in a different commitfest entry and
> this one completely redundant? I'll mark it Rejected then?
>

Yep, it appears so.

-- 
Best regards,
Maxim Orlov.


Comment in preptlist.c

2023-03-21 Thread Etsuro Fujita
While working on something else, I noticed $SUBJECT added by commit 86dc90056:

 * For UPDATE and DELETE queries, the targetlist must also contain "junk"
 * tlist entries needed to allow the executor to identify the rows to be
 * updated or deleted; for example, the ctid of a heap row.  (The planner
 * adds these; they're not in what we receive from the planner/rewriter.)

I think that “planner/rewriter” should be parser/rewriter.  Attached
is a patch for that.

Best regards,
Etsuro Fujita


fix-comment.patch
Description: Binary data


Re: ICU locale validation / canonicalization

2023-03-21 Thread Peter Eisentraut

On 17.03.23 18:55, Jeff Davis wrote:

On Wed, 2023-03-15 at 15:18 -0700, Jeff Davis wrote:

I left out the validation patch for now, and I'm evaluating a
different
approach that will attempt to match to the locales retrieved with
uloc_countAvailable()/uloc_getAvailable().


I like this approach, attached new patch series with that included as
0006.


I have looked at the first three patches.  I think we want what those 
patches do.



[PATCH v6 1/6] Support language tags in older ICU versions (53 and
 earlier).

In pg_import_system_collations(), this is now redundant and can be 
simplified:


-   if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+   if (!pg_is_ascii(langtag))

icu_set_collation_attributes() needs more commenting about what is going 
on.  My guess is that uloc_canonicalize() converts from language tag to 
ICU locale ID, and then the existing logic to parse that apart would 
apply.  Is that how it works?
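
If that guess is right, the call involved would be roughly the
following (a sketch of the ICU API only, not of the patch):

```c
#include <unicode/uloc.h>

static void
canonicalize_example(void)
{
	char		locale_id[ULOC_FULLNAME_CAPACITY];
	UErrorCode	status = U_ZERO_ERROR;

	/* canonicalize a tag-ish spelling into an ICU locale ID */
	uloc_canonicalize("en-us", locale_id, sizeof(locale_id), &status);
	/* if the guess holds, locale_id is now something like "en_US" */
}
```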



[PATCH v6 2/6] Wrap ICU ucol_open().

It makes sense to try to unify some of this.  But I find the naming 
confusing.  If I see pg_ucol_open(), then I would expect that all calls 
to ucol_open() would be replaced by this.  But here it's only a few, 
without explanation.  (pg_ucol_open() has no explanation at all AFAICT.)


I have in my notes that check_icu_locale() and make_icu_collator() 
should be combined into a single function.  I think that would be a 
better way to slice it.


Btw., I had intentionally not written code like this

+#if U_ICU_VERSION_MAJOR_NUM < 54
+   icu_set_collation_attributes(collator, loc_str);
+#endif

The disadvantage of doing it that way is that you then need to dig out 
an old version of ICU in order to check whether the code compiles at 
all.  With the current code, you can be sure that that code compiles if 
you make changes elsewhere.



[PATCH v6 3/6] Handle the "und" locale in ICU versions 54 and older.

This makes sense, but the same comment about not #if'ing out code for 
old ICU versions applies here.


The

+#ifdef USE_ICU
+

before pg_ucol_open() probably belongs in patch 2.




Re: Allow logical replication to copy tables in binary format

2023-03-21 Thread Melih Mutlu
Hi,

On Tue, Mar 21, 2023 at 4:33 AM Peter Smith wrote:

> Here are my review comments for v18-0001
>
> ==
> doc/src/sgml/logical-replication.sgml
>
> 1.
> +   target table. However, logical replication in binary format is more
> +   restrictive. See the binary option of
> +   CREATE
> SUBSCRIPTION
> +   for details.
>
>
> Because you've changed the linkend to be the binary option, IMO now
> the  part also needs to be modified. Otherwise, this page has
> multiple "CREATE SUBSCRIPTION" links which jump to different places,
> which just seems wrong to me.
>

Makes sense. I changed it as you suggested.


> 3.
> I think there can only be 0 or 1 list element in 'options'.
>
> So, why does the code here use lappend(options,...) instead of just
> using list_make1(...)?
>

Changed it to list_make1.

On Tue, Mar 21, 2023 at 12:27 PM Amit Kapila wrote:

> > Explaining the issue explicitly with a comment seems better to me than
> the trick of changing order of table creation just for some test cases.
> > But I'm also ok with removing the use of disable_on_error if that's what
> you agree on.
> >
>
> Let's do that way for now.
>

Done.

Thanks,
-- 
Melih Mutlu
Microsoft
From 8d94a5f18a433efaeae907b5dc0225272b7442aa Mon Sep 17 00:00:00 2001
From: Melih Mutlu 
Date: Mon, 8 Aug 2022 14:14:44 +0300
Subject: [PATCH v19] Allow logical replication to copy table in binary

If binary option is enabled in a subscription, then copy tables in binary format
during table synchronization.

Without this patch, tables are copied in text format even if the
subscription is created with binary option enabled. This patch allows logical replication
to perform in binary format starting from initial sync. Copying
tables in binary format may reduce the time spent depending on column
types.

Binary copy for initial table synchronization is supported
only when both publisher and subscriber are v16 or later.

Discussion: https://postgr.es/m/CAGPVpCQvAziCLknEnygY0v1-KBtg%2BOm-9JHJYZOnNPKFJPompw%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml   |   4 +-
 doc/src/sgml/ref/alter_subscription.sgml|   5 +
 doc/src/sgml/ref/create_subscription.sgml   |  28 ++-
 src/backend/replication/logical/tablesync.c |  17 +-
 src/test/subscription/t/014_binary.pl   | 180 ++--
 5 files changed, 216 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 6b0e300adc..3836d13ad3 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -251,7 +251,9 @@
column of type bigint.  The target table can also have
additional columns not provided by the published table.  Any such columns
will be filled with the default value as specified in the definition of the
-   target table.
+   target table. However, logical replication in binary format is more
+   restrictive. See the binary
+   option of CREATE SUBSCRIPTION for details.
   
 
   
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 964fcbb8ff..e92346edef 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -177,6 +177,11 @@ ALTER SUBSCRIPTION name RENAME TO <
   how copy_data = true can interact with the
   origin parameter.
  
+ 
+  See the binary
+  option of CREATE SUBSCRIPTION for details
+  about copying pre-existing data in binary format.
+ 
 

   
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 51c45f17c7..9d4b9d4e33 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -185,15 +185,25 @@ CREATE SUBSCRIPTION subscription_name
 
-   
+   
 binary (boolean)
 
  
-  Specifies whether the subscription will request the publisher to
-  send the data in binary format (as opposed to text).
-  The default is false.
-  Even when this option is enabled, only data types having
-  binary send and receive functions will be transferred in binary.
+  Specifies whether the subscription will request the publisher to send
+  the data in binary format (as opposed to text). The default is
+  false. Any initial table synchronization copy
+  (see copy_data) also uses the same format. Binary
+  format can be faster than the text format, but it is less portable
+  across machine architectures and PostgreSQL
+  versions. Binary format is very data type specific; for example, it
+  will not allow copying from a smallint column to an
+  integer column, even though that would work fine in text
+  format. Even when this option is enabled, only data types having binary
+  send and receive functions will be 

Re: Allow logical replication to copy tables in binary format

2023-03-21 Thread Amit Kapila
On Tue, Mar 21, 2023 at 2:16 PM Melih Mutlu  wrote:
>
> On Tue, Mar 21, 2023 at 9:03 AM Amit Kapila wrote:
>>
>> On Tue, Mar 21, 2023 at 7:03 AM Peter Smith  wrote:
>> >
>> >
>> > ==
>> > src/test/subscription/t/014_binary.pl
>> >
>> > 4.
>> > # -
>> > # Test mismatched column types with/without binary mode
>> > # -
>> >
>> > # Test syncing tables with mismatching column types
>> > $node_publisher->safe_psql(
>> > 'postgres', qq(
>> > CREATE TABLE public.test_mismatching_types (
>> > a bigint PRIMARY KEY
>> > );
>> > INSERT INTO public.test_mismatching_types (a)
>> > VALUES (1), (2);
>> > ));
>> >
>> > # Check the subscriber log from now on.
>> > $offset = -s $node_subscriber->logfile;
>> >
>> > # Ensure the subscription is enabled. disable_on_error is still true,
>> > # so the subscription can be disabled due to missing relation until
>> > # the test_mismatching_types table is created.
>> > $node_subscriber->safe_psql(
>> > 'postgres', qq(
>> > CREATE TABLE public.test_mismatching_types (
>> > a int PRIMARY KEY
>> > );
>> > ALTER SUBSCRIPTION tsub ENABLE;
>> > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
>> > ));
>> >
>> > ~~
>> >
>> > I found the "Ensure the subscription is enabled..." comment and the
>> > necessity for enabling the subscription to be confusing.
>> >
>> > Can't some complications all be eliminated just by creating the table
>> > on the subscribe side first?
>> >
>>
>> Hmm, that would make this test inconsistent with other tests and
>> probably difficult to understand and extend. I don't like to say this
>> but I think introducing disable_on_error has introduced more
>> complexities in the patch due to the requirement of enabling
>> subscription again and again. I feel it would be better without using
>> disable_on_error option in these tests.
>
>
> While this change would make the test inconsistent, I think it would also
> make it more confusing.
>

I also think so.

> Explaining the issue explicitly with a comment seems better to me than the 
> trick of changing order of table creation just for some test cases.
> But I'm also ok with removing the use of disable_on_error if that's what you 
> agree on.
>

Let's do that way for now.

-- 
With Regards,
Amit Kapila.




Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread Önder Kalacı
Hi Amit, Shi Yu

Now attaching the similar patches for generated columns.

Thanks,
Onder KALACI



On Tue, Mar 21, 2023 at 9:07 AM Amit Kapila wrote:

> On Mon, Mar 20, 2023 at 6:28 PM Amit Kapila 
> wrote:
> >
> > On Mon, Mar 20, 2023 at 2:58 PM Önder Kalacı 
> wrote:
> > >
> > >
> > > I applied all patches for all the versions, and re-run the
> subscription tests,
> > > all looks good to me.
> > >
> >
> > LGTM. I'll push this tomorrow unless there are more comments.
> >
>
> Pushed.
>
> --
> With Regards,
> Amit Kapila.
>


v1-0001-HEAD-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_15-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_14-REL_13-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_12-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


Re: Allow logical replication to copy tables in binary format

2023-03-21 Thread Melih Mutlu
On Tue, Mar 21, 2023 at 9:03 AM Amit Kapila wrote:

> On Tue, Mar 21, 2023 at 7:03 AM Peter Smith  wrote:
> >
> >
> > ==
> > src/test/subscription/t/014_binary.pl
> >
> > 4.
> > # -
> > # Test mismatched column types with/without binary mode
> > # -
> >
> > # Test syncing tables with mismatching column types
> > $node_publisher->safe_psql(
> > 'postgres', qq(
> > CREATE TABLE public.test_mismatching_types (
> > a bigint PRIMARY KEY
> > );
> > INSERT INTO public.test_mismatching_types (a)
> > VALUES (1), (2);
> > ));
> >
> > # Check the subscriber log from now on.
> > $offset = -s $node_subscriber->logfile;
> >
> > # Ensure the subscription is enabled. disable_on_error is still true,
> > # so the subscription can be disabled due to missing relation until
> > # the test_mismatching_types table is created.
> > $node_subscriber->safe_psql(
> > 'postgres', qq(
> > CREATE TABLE public.test_mismatching_types (
> > a int PRIMARY KEY
> > );
> > ALTER SUBSCRIPTION tsub ENABLE;
> > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
> > ));
> >
> > ~~
> >
> > I found the "Ensure the subscription is enabled..." comment and the
> > necessity for enabling the subscription to be confusing.
> >
> > Can't some complications all be eliminated just by creating the table
> > on the subscribe side first?
> >
>
> Hmm, that would make this test inconsistent with other tests and
> probably difficult to understand and extend. I don't like to say this
> but I think introducing disable_on_error has introduced more
> complexities in the patch due to the requirement of enabling
> subscription again and again. I feel it would be better without using
> disable_on_error option in these tests.
>

While this change would make the test inconsistent, I think it would also
make it more confusing.
Explaining the issue explicitly with a comment seems better to me than the
trick of changing order of table creation just for some test cases.
But I'm also ok with removing the use of disable_on_error if that's what
you agree on.

Thanks,
-- 
Melih Mutlu
Microsoft


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-21 Thread wangw.f...@fujitsu.com
On Mon, Mar 20, 2023 at 21:18 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang,
> 
> I have tested about multilevel partitions, and it worked well.
> Followings are my comments for v18-0001.

Thanks for your comments and testing.

> 01. pg_get_publication_tables
> 
> ```
> +   ListCell   *lc;
> ```
> 
> This definition can be inside of the "for (i = 0; i < nelems; i++)".

Changed.

> 02. pg_get_publication_tables
> 
> ```
> -* If the publication publishes partition changes via 
> their
> -* respective root partitioned tables, we must 
> exclude partitions
> -* in favor of including the root partitioned tables. 
> Otherwise,
> -* the function could return both the child and 
> parent tables
> -* which could cause data of the child table to be
> -* double-published on the subscriber side.
> +* Publications support partitioned tables. If
> +* publish_via_partition_root is false, all changes 
> are replicated
> +* using leaf partition identity and schema, so we 
> only need those.
> +* Otherwise, get the partitioned table itself.
> ```
> 
> The comments can be  inside of the "else".

Since I think there are related operations in the function
GetAllTablesPublicationRelations, it might be better to write it above the
if-statement.

> 03. pg_get_publication_tables
> 
> ```
> +   pfree(elems);
> ```
> 
> Only elems is pfree()'d here, but how about other variables like pub_elem and
> pub_elem_tables?

Added releases to these two variables.

Regards,
Wang Wei


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-21 Thread wangw.f...@fujitsu.com
On Mon, Mar 20, 2023 at 15:32 PM Peter Smith  wrote:
> Here are some review comments for v17-0001.

Thanks for your comments.

> ==
> src/backend/catalog/pg_publication.c
> 
> 1. filter_partitions
> 
> -static List *
> -filter_partitions(List *relids)
> +static void
> +filter_partitions(List *table_infos)
>  {
> - List*result = NIL;
>   ListCell   *lc;
> - ListCell   *lc2;
> 
> - foreach(lc, relids)
> + foreach(lc, table_infos)
>   {
> - bool skip = false;
> - List*ancestors = NIL;
> - Oid relid = lfirst_oid(lc);
> + bool skip = false;
> + List*ancestors = NIL;
> + ListCell*lc2;
> + published_rel*table_info = (published_rel *) lfirst(lc);
> 
> - if (get_rel_relispartition(relid))
> - ancestors = get_partition_ancestors(relid);
> + if (get_rel_relispartition(table_info->relid))
> + ancestors = get_partition_ancestors(table_info->relid);
> 
>   foreach(lc2, ancestors)
>   {
>   Oid ancestor = lfirst_oid(lc2);
> + ListCell   *lc3;
> 
>   /* Check if the parent table exists in the published table list. */
> - if (list_member_oid(relids, ancestor))
> + foreach(lc3, table_infos)
>   {
> - skip = true;
> - break;
> + Oid relid = ((published_rel *) lfirst(lc3))->relid;
> +
> + if (relid == ancestor)
> + {
> + skip = true;
> + break;
> + }
>   }
> +
> + if (skip)
> + break;
>   }
> 
> - if (!skip)
> - result = lappend_oid(result, relid);
> + if (skip)
> + table_infos = foreach_delete_current(table_infos, lc);
>   }
> -
> - return result;
>  }
> 
> 
> It seems the 'skip' and 'ancestors' and 'lc2' vars are not needed
> except when "if (get_rel_relispartition(table_info->relid))" is true,
> so won't it be better to restructure the code to put everything inside
> that condition. Then you will save a few unnecessary tests of
> foreach(lc2, ancestors) and (skip).
> 
> For example,
> 
> static void
> filter_partitions(List *table_infos)
> {
> ListCell   *lc;
> 
> foreach(lc, table_infos)
> {
> published_rel*table_info = (published_rel *) lfirst(lc);
> 
> if (get_rel_relispartition(table_info->relid))
> {
> bool skip = false;
> List *ancestors = get_partition_ancestors(table_info->relid);
> ListCell *lc2;
> 
> foreach(lc2, ancestors)
> {
> Oid ancestor = lfirst_oid(lc2);
> ListCell   *lc3;
> /* Check if the parent table exists in the published table list. */
> foreach(lc3, table_infos)
> {
> Oid relid = ((published_rel *) lfirst(lc3))->relid;
> 
> if (relid == ancestor)
> {
> skip = true;
> break;
> }
> }
> if (skip)
> break;
> }
> 
> if (skip)
> table_infos = foreach_delete_current(table_infos, lc);
> }
> }
> }

Refactored this part of code based on other comments.

> ~~~
> 
> 3. pg_get_publication_tables
> 
>   /* Show all columns when the column list is not specified. */
> - if (nulls[1] == true)
> + if (nulls[2] == true)
> 
> Since you are changing this line anyway, you might as well change it
> to remove the redundant "== true" part.
> 
> SUGGESTION
> if (nulls[2])

Changed.

Regards,
Wang Wei


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-21 Thread wangw.f...@fujitsu.com
On Sat, Mar 18, 2023 at 7:37 AM Jacob Champion  wrote:
> On Thu, Mar 16, 2023 at 11:28 PM wangw.f...@fujitsu.com
>  wrote:
> > Attach the new patch set.

Thanks for your comments and testing.

> For example, the corner case mentioned in 0003, with multiple
> publications having conflicting pubviaroot settings, isn't tested as far
> as I can see. (I checked manually, and it appears to work as intended.)
> And the related pub_lower_level test currently only covers the case
> where multiple publications have pubviaroot=true, so the following test
> comment is now misleading:
> 
> > # for tab4, we publish changes through the "middle" partitioned table
> > $node_publisher->safe_psql('postgres',
> > "CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH
> (publish_via_partition_root = true)"
> > );
> 
> ...since the changes are now in fact published via the tab4 root after
> this patchset is applied.

Makes sense.
Tried to improve this comment like below:
```
If we subscribe only to pub_lower_level, changes for tab4 will be published
through the "middle" partition table. However, since we will be subscribing to
both pub_lower_level and pub_all (see subscription sub2 below), we will publish
changes via the root table (tab4).
```

Regards,
Wang Wei


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-21 Thread wangw.f...@fujitsu.com
On Mon, Mar 20, 2023 at 18:15 PM Amit Kapila  wrote:
>

Thanks for your comments.

> On Mon, Mar 20, 2023 at 1:02 PM Peter Smith 
> wrote:
> >
> >
> > ==
> > src/include/catalog/pg_proc.dat
> >
> > 4.
> > +{ oid => '6119',
> > +  descr => 'get information of the tables in the given publication array',
> >
> > Should that be worded in a way to make it more clear that the
> > "publication array" is really an "array of publication names"?
> >
> 
> I don't know how important it is to tell that the array is an array of
> publication names but the current description can be improved. How
> about something like: "get information of the tables that are part of
> the specified publications"

Changed.

> Few other comments:
> =
> 1.
>   foreach(lc2, ancestors)
>   {
>   Oid ancestor = lfirst_oid(lc2);
> + ListCell   *lc3;
> 
>   /* Check if the parent table exists in the published table list. */
> - if (list_member_oid(relids, ancestor))
> + foreach(lc3, table_infos)
>   {
> - skip = true;
> - break;
> + Oid relid = ((published_rel *) lfirst(lc3))->relid;
> +
> + if (relid == ancestor)
> + {
> + skip = true;
> + break;
> + }
>   }
> +
> + if (skip)
> + break;
>   }
> 
> - if (!skip)
> - result = lappend_oid(result, relid);
> + if (skip)
> + table_infos = foreach_delete_current(table_infos, lc);
> 
> The usage of skip looks a bit ugly to me. Can we move the code for the
> inner loop to a separate function (like
> is_ancestor_member_tableinfos()) and remove the current cell if it
> returns true?

Changed.

> 2.
>   * Filter out the partitions whose parent tables were also specified in
>   * the publication.
>   */
> -static List *
> -filter_partitions(List *relids)
> +static void
> +filter_partitions(List *table_infos)
> 
> The comment atop filter_partitions is no longer valid. Can we slightly
> change it to: "Filter out the partitions whose parent tables are also
> present in the list."?

Changed.

> 3.
> -# Note: We create two separate tables, not a partitioned one, so that we can
> -# easily identity through which relation were the changes replicated.
> +# Note: We only create one table (tab4) here. We specified
> +# publish_via_partition_root = true (see pub_all and pub_lower_level above), 
> so
> +# all data will be replicated to that table.
>  $node_subscriber2->safe_psql('postgres',
>   "CREATE TABLE tab4 (a int PRIMARY KEY)");
> -$node_subscriber2->safe_psql('postgres',
> - "CREATE TABLE tab4_1 (a int PRIMARY KEY)");
> 
> I am not sure if it is a good idea to remove tab4_1 here. It is
> testing something different as mentioned in the comments. Also, I
> don't see any data in tab4 for the initial sync, so not sure if this
> tests the behavior changed by this patch.

Reverted this change, and inserted initial sync data into table tab4 so that
the behavior changed by this patch is tested more clearly.

> 4.
> --- a/src/test/subscription/t/031_column_list.pl
> +++ b/src/test/subscription/t/031_column_list.pl
> @@ -959,7 +959,8 @@ $node_publisher->safe_psql(
>   CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO (10);
>   CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO (20);
> 
> - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH (publish_via_partition_root = true);
> 
>   -- initial data
>   INSERT INTO test_root VALUES (1, 2, 3);
> @@ -968,7 +969,7 @@ $node_publisher->safe_psql(
> 
>  $node_subscriber->safe_psql(
>   'postgres', qq(
> - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true;
> + CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true_1, pub_root_true_2;
> 
> It is not clear to me what exactly you want to test here. Please add
> some comments.

I tried to add the following comment to make it clear:
```
+# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which
+# means that the initial data will be synced once, and only the column list of
+# the parent table (test_root) in the publication pub_root_true_1 will be used
+# for both table sync and data replication.
 $node_subscriber->safe_psql(
'postgres', qq(
-   CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true;
+   CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true_1, pub_root_true_2;
```

> 5. I think you can merge the 0001 and 0003 patches.

Merged.

> Apart from the above, attached is a patch to change some of the
> comments in the patch.

Thanks for this improvement. I've checked and merged it.

Attached is the new patch set.

Regards,
Wang Wei


HEAD_v19-0001-Fix-data-replicated-twice-when-specifying-publis.patch
Description:  HEAD_v19-0001-Fix-data-replicated-twice-when-specifying-publis.patch



Re: Timeline ID hexadecimal format

2023-03-21 Thread Sébastien Lardière

On 21/03/2023 08:15, Peter Eisentraut wrote:
>>> On 20.03.23 10:40, Sébastien Lardière wrote:
>>>> About option_parse_int(), actually, strtoint() is used, do we need
>>>> an option_parse_ul() function?
>>>
>>> For the option parsing, I propose the attached patch.  This follows
>>> the structure of option_parse_int(), so in the future it could be
>>> extracted and refactored in the same way, if there is more need.
>>
>> ok for me, it accepts 0x values and refuses wrong values
>
> committed


thanks,


--
Sébastien





Hash table scans outside transactions

2023-03-21 Thread Ashutosh Bapat
Hi,
Hash table scans (tracked in seq_scan_table/level) are cleaned up at the
end of a transaction in AtEOXact_HashTables(). If a hash seq scan continues
beyond the end of the transaction, it hits "ERROR: no hash_seq_search scan
for hash table" in deregister_seq_scan(). That seems like a limitation on
hash table usage.

Our use case is
1. Add/update/remove entries in hash table
2. Scan the existing entries and perform one transaction per entry
3. Close scan

We repeat the above steps in an infinite loop. Note that we do not
add/modify/delete entries in step 2. We can't use linked lists since the
entries need to be updated or deleted using hash keys. Because the hash seq
scan registration is cleaned up at the end of the transaction, we hit the
above error in step 3. I don't see that the actual hash table scan depends
upon the seq_scan_table/level[] bookkeeping that is cleaned up at the end
of the transaction.
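
Here is a minimal sketch of the pattern (the entry type and names are made
up; the dynahash calls are from utils/hsearch.h and the transaction calls
from access/xact.h):
```
#include "postgres.h"
#include "access/xact.h"
#include "utils/hsearch.h"

typedef struct MyEntry
{
	Oid			key;			/* hash key */
	int			state;			/* updated/removed in step 1, only read here */
} MyEntry;

static void
process_entries(HTAB *htab)
{
	HASH_SEQ_STATUS status;
	MyEntry    *entry;

	hash_seq_init(&status, htab);	/* registers the scan */

	while ((entry = hash_seq_search(&status)) != NULL)
	{
		StartTransactionCommand();
		/* ... one transaction's worth of work for this entry ... */
		CommitTransactionCommand();	/* AtEOXact_HashTables() drops the
									 * scan registration here */
	}

	/*
	 * hash_seq_search() deregisters the scan when it reaches the end of
	 * the table, but the registration was already dropped at commit, so
	 * deregister_seq_scan() raises
	 * "ERROR: no hash_seq_search scan for hash table".
	 */
}
```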

I have the following questions:
1. Is there a way to avoid cleaning up seq_scan_table/level when the
transaction ends?
2. Is there a way that we can use the hash table implementation in
PostgreSQL code for our purpose?


-- 
Best Wishes,
Ashutosh Bapat




Re: Timeline ID hexadecimal format

2023-03-21 Thread Peter Eisentraut

On 20.03.23 10:40, Sébastien Lardière wrote:
>>> About option_parse_int(), actually, strtoint() is used, do we need
>>> an option_parse_ul() function?
>>
>> For the option parsing, I propose the attached patch.  This follows
>> the structure of option_parse_int(), so in the future it could be
>> extracted and refactored in the same way, if there is more need.
>
> ok for me, it accepts 0x values and refuses wrong values

committed
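
(For context, a hypothetical sketch of an option_parse_ul()-style parser
that accepts 0x input; this is not the attached patch, just an illustration
of strtoul() with base 0, which parses "0x..." as hex, "0..." as octal, and
anything else as decimal. It is modeled loosely on option_parse_int() in
src/fe_utils/option_utils.c, with fprintf() standing in for the frontend
logging calls.)
```
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool
option_parse_ul(const char *optarg, const char *optname,
				unsigned long min_range, unsigned long max_range,
				unsigned long *result)
{
	char	   *endptr;
	unsigned long val;

	errno = 0;
	val = strtoul(optarg, &endptr, 0);	/* base 0: accepts 0x..., 0..., decimal */

	/* Skip trailing whitespace; anything else left over is garbage. */
	while (*endptr != '\0' && isspace((unsigned char) *endptr))
		endptr++;

	if (endptr == optarg || *endptr != '\0')
	{
		fprintf(stderr, "invalid value \"%s\" for option %s\n",
				optarg, optname);
		return false;
	}

	if (errno == ERANGE || val < min_range || val > max_range)
	{
		fprintf(stderr, "%s must be in range %lu..%lu\n",
				optname, min_range, max_range);
		return false;
	}

	if (result)
		*result = val;
	return true;
}
```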





CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-21 Thread Andres Freund
Hi,

While hacking on my relation extension patch I found two issues with WAL_LOG:

1) RelationCopyStorageUsingBuffer() doesn't free the strategies it uses. This
   means we'll use #relations * ~10kB of memory

2) RelationCopyStorageUsingBuffer() gets the buffer for the target relation
   with RBM_NORMAL, therefore requiring a read of a block guaranteed to be
   zero

Easy enough to fix, and the fixes show a clear improvement. One thing I wonder
is whether it's worth moving the strategies up one level? Probably not, but ...
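
Roughly, the two fixes look like this (a sketch with hypothetical names, not
the actual patch; the real code is in RelationCopyStorageUsingBuffer()):
```
static void
copy_fork_using_buffers(SMgrRelation src, SMgrRelation dst,
						ForkNumber forkNum, bool permanent)
{
	BufferAccessStrategy bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
	BufferAccessStrategy bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
	BlockNumber nblocks = smgrnblocks(src, forkNum);

	for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		srcBuf;
		Buffer		dstBuf;

		srcBuf = ReadBufferWithoutRelcache(src->smgr_rlocator.locator,
										   forkNum, blkno, RBM_NORMAL,
										   bstrategy_src, permanent);

		/*
		 * Fix 2: the target block is guaranteed to be zero/new, so get a
		 * zeroed, locked buffer instead of pointlessly reading it from
		 * disk with RBM_NORMAL.
		 */
		dstBuf = ReadBufferWithoutRelcache(dst->smgr_rlocator.locator,
										   forkNum, blkno, RBM_ZERO_AND_LOCK,
										   bstrategy_dst, permanent);

		/* ... copy the page, WAL-log it, release both buffers ... */
	}

	/* Fix 1: free the strategies instead of leaking ~10kB per relation. */
	FreeAccessStrategy(bstrategy_src);
	FreeAccessStrategy(bstrategy_dst);
}
```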

Greetings,

Andres Freund



