Re: Trigger violates foreign key constraint

2023-12-21 Thread Pavel Luzanov
I fully support this addition to the documentation. The fact that data
consistency can legitimately be broken this way should be documented, at the
very least.


Please consider a small suggestion: replace the last sentence.

- This is not considered a bug, and it is the responsibility of the user 
to write triggers so that such problems are avoided.

+ It is the trigger programmer's responsibility to avoid such scenarios.

To be consistent with the sentence about recursive trigger calls: [1]
"It is the trigger programmer's responsibility to avoid infinite 
recursion in such scenarios."


Also, I don't really like the "This is not considered a bug" part, since it
reads like an excuse.



1. https://www.postgresql.org/docs/devel/trigger-definition.html

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: [PATCH]: Not to invaldiate CatalogSnapshot for local invalidation messages

2023-12-21 Thread Xiaoran Wang
Hi,
I updated the comment about the CatalogSnapshot in
`src/backend/utils/time/snapmgr.c`.

Xiaoran Wang  于2023年12月18日周一 15:02写道:

> Hi,
> Thanks for your reply.
>
> jian he  于2023年12月18日周一 08:20写道:
>
>> Hi
>> ---setup.
>> drop table s2;
>> create table s2(a int);
>>
>> After applying the patch
>> alter table s2 add primary key (a);
>>
>> watch CatalogSnapshot
>> 
>> #0  GetNonHistoricCatalogSnapshot (relid=1259)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
>> #1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
>> #2  0x55ba785ffbe1 in systable_beginscan
>> (heapRelation=0x7f256f30b5a8, indexId=2662, indexOK=false,
>> snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
>> (More stack frames follow...)
>>
>> -
>> Hardware watchpoint 13: CatalogSnapshot
>>
>> Old value = (Snapshot) 0x55ba7980b6a0 
>> New value = (Snapshot) 0x0
>> InvalidateCatalogSnapshot () at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> 435 SnapshotResetXmin();
>> (gdb) bt 4
>> #0  InvalidateCatalogSnapshot ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> #1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true,
>> resetXmin=false)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
>> #2  0x55ba7868201b in CommitTransaction ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
>> #3  0x55ba78683495 in CommitTransactionCommand ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
>> (More stack frames follow...)
>>
>> --
>> but the whole process changes pg_class, pg_index,
>> pg_attribute, pg_constraint etc.
>> Only one GetCatalogSnapshot and one InvalidateCatalogSnapshot does not seem
>> correct, does it?
>> What if there are concurrent changes in the related pg_catalog tables?
>>
>> your patch did pass the isolation test!
>>
>
> Yes, I have run installcheck-world locally, and all the tests passed.
> There are two kinds of Invalidation Messages.
> One kind comes from the local backend, such as what you did in the example
> "alter table s2 add primary key (a);": it modifies pg_class,
> pg_attribute etc.,
> so it generates some Invalidation Messages to invalidate the "s2"-related
> tuples in pg_class, pg_attribute etc., and an Invalidation Message to
> invalidate the s2 relation cache. When the command is finished, those
> Invalidation Messages are processed in CommandCounterIncrement to make the
> system caches work correctly for the following commands.
>
> The other kind of Invalidation Message comes from other backends.
> Suppose there are two sessions:
> session1
> ---
> 1: create table foo(a int);
> ---
> session 2
> ---
> 1: create table test(a int); (before session1:1)
> 2: insert into foo values(1); (execute after session1:1)
> ---
> Session 1 will generate Invalidation Messages and send them when its
> transaction is committed, and session 2 will accept those Invalidation
> Messages from session 1 and then execute its second command.
>
> Before the patch, Postgres invalidates the CatalogSnapshot for both
> kinds of Invalidation Messages. So I did a small optimization in this
> patch: for local Invalidation Messages, we don't call
> InvalidateCatalogSnapshot; we can keep using one CatalogSnapshot within a
> transaction even if we modify the catalog and generate Invalidation
> Messages, because the visibility of a tuple is determined by the curcid.
> As long as we update the curcid of the CatalogSnapshot in
> SnapshotSetCommandId (sketched after the quoted text below), it works
> correctly.
>
>
>
>> I think what your patch does goes against the following code comments in
>> src/backend/utils/time/snapmgr.c
>>
>> /*
>>  * CurrentSnapshot points to the only snapshot taken in
>> transaction-snapshot
>>  * mode, and to the latest one taken in a read-committed transaction.
>>  * SecondarySnapshot is a snapshot that's always up-to-date as of the
>> current
>>  * instant, even in transaction-snapshot mode.  It should only be used for
>>  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
>>  * MVCC snapshot intended to be used for catalog scans; we must
>> invalidate it
>>  * whenever a system catalog change occurs.
>>  *
>>  * These SnapshotData structs are static to simplify memory allocation
>>  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
>>  */
>> static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
>> static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
>> SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
>> SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
>> SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
>>
>
> Thank you for pointing it out, I think I need to update the 
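
To make the CatalogSnapshot idea above concrete, here is a rough sketch (an
editorial illustration, not the submitted patch) of what advancing the
catalog snapshot's command id in SnapshotSetCommandId() could look like.  It
assumes snapmgr.c's existing function has roughly this shape; the
CatalogSnapshot lines are the illustrative addition.

void
SnapshotSetCommandId(CommandId curcid)
{
    if (!FirstSnapshotSet)
        return;

    if (CurrentSnapshot)
        CurrentSnapshot->curcid = curcid;
    if (SecondarySnapshot)
        SecondarySnapshot->curcid = curcid;

    /*
     * Illustrative addition: keep the catalog snapshot usable across this
     * backend's own catalog changes; visibility of our own tuples is
     * governed by curcid, so advancing it is enough for local changes.
     */
    if (CatalogSnapshot)
        CatalogSnapshot->curcid = curcid;
}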

Optimization outcome depends on the index order

2023-12-21 Thread Andrei Lepikhov

On 21/12/2023 12:10, Alexander Korotkov wrote:
> I took a closer look at the patch in [9].  I should drop my argument
> about breaking the model, because add_path() already considers other
> aspects than just costs.  But I have two more notes about that patch:
>
> 1) It seems that you're determining the fact that the index path
> should return strictly one row by checking path->rows <= 1.0 and
> indexinfo->unique.  Is it really guaranteed that in this case quals
> are matching unique constraint?  path->rows <= 1.0 could be just an
> estimation error.  Or one row could be correctly estimated, but it's
> going to be selected by some quals matching unique constraint and
> other quals in recheck.  So, it seems there is a risk to select
> suboptimal index due to this condition.

Operating inside the optimizer, we treat all estimations as the truth.
This patch modifies only one place: given two assumptions judged equal,
we simply choose the one that generally looks more stable.
Filtered tuples should be calculated and included in the cost of the
path, so the decision that two paths are equal is made with the
estimation of these filtered tuples already taken into account.


> 2) Even for non-unique indexes this patch is putting new logic on top
> of the subsequent code.  How we can prove it's going to be a win?
> That could lead, for instance, to dropping parallel-safe paths in
> cases we didn't do so before.
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner's logic, that
is the path with the smaller selectivity. We can make mistakes anyway,
simply because of the nature of estimation.


> Anyway, please start a separate thread if you're willing to put more
> work into this.

Done

> 9. https://www.postgresql.org/message-id/154f786a-06a0-4fb1-b8a4-16c66149731b%40postgrespro.ru

--
regards,
Andrei Lepikhov
Postgres Professional

From 7b044de1449a5fdc450cb629caafb4e15ded7a93 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 27 Nov 2023 11:23:48 +0700
Subject: [PATCH] Choose an index path with the best selectivity estimation.

In the case when the optimizer predicts only one row, prefer choosing UNIQUE
indexes. In other cases, if the optimizer treats indexes as equal, make a last
attempt by selecting the index with the lower selectivity. This decision
removes the dependency on the order of indexes in the index list (good for
reproducing some issues) and adds one more objective argument for choosing a
specific index.
---
 src/backend/optimizer/util/pathnode.c | 42 +++
 .../expected/drop-index-concurrently-1.out| 16 +++
 src/test/regress/expected/functional_deps.out | 39 +
 src/test/regress/expected/join.out| 40 +-
 src/test/regress/sql/functional_deps.sql  | 32 ++
 5 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..4b5aedd579 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -454,6 +454,48 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		costcmp = compare_path_costs_fuzzily(new_path, old_path,
 											 STD_FUZZ_FACTOR);
 
+		/*
+		 * Apply some heuristics on index paths.
+		 */
+		if (IsA(new_path, IndexPath) && IsA(old_path, IndexPath))
+		{
+			IndexPath  *inp = (IndexPath *) new_path;
+			IndexPath  *iop = (IndexPath *) old_path;
+
+			if (new_path->rows <= 1.0 && old_path->rows <= 1.0)
+			{
+				/*
+				 * When both paths are predicted to produce only one tuple,
+				 * the optimiser should prefer choosing a unique index scan
+				 * in all cases.
+				 */
+				if (inp->indexinfo->unique && !iop->indexinfo->unique)
+					costcmp = COSTS_BETTER1;
+				else if (!inp->indexinfo->unique && iop->indexinfo->unique)
+					costcmp = COSTS_BETTER2;
+				else if (costcmp != COSTS_DIFFERENT)
+					/*
+					 * If the optimiser doesn't have an obviously stable choice
+					 * of unique index, increase the chance of avoiding mistakes
+					 * by choosing an index with smaller selectivity.
+					 * This option makes decision more conservative and looks
+					 * debatable.
+

A typo in a messsage?

2023-12-21 Thread Kyotaro Horiguchi
I found the following message introduced by a recent commit.

> errdetail("The first unsummarized LSN is this range is %X/%X.",
 
Shouldn't the "is" following "LSN" be "in"?


diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 42bbe564e2..22b861ce52 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -575,7 +575,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 								  tle->tli,
 								  LSN_FORMAT_ARGS(tli_start_lsn),
 								  LSN_FORMAT_ARGS(tli_end_lsn)),
-						errdetail("The first unsummarized LSN is this range is %X/%X.",
+						errdetail("The first unsummarized LSN in this range is %X/%X.",
								  LSN_FORMAT_ARGS(tli_missing_lsn)));
 	}
 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Transaction timeout

2023-12-21 Thread Japin Li

On Tue, 19 Dec 2023 at 22:06, Japin Li  wrote:
> On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin  wrote:
>>> On 19 Dec 2023, at 13:26, Andrey M. Borodin  wrote:
>>>
>>> I don’t have Windows machine, so I hope CF bot will pick this.
>>
>> I used GitHub CI to produce a version of the tests that seems to be stable
>> on Windows.
>
> It still failed on Windows Server 2019 [1].
>
> diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out 
> C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out
> --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 
> 10:34:30.354721100 +
> +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out  
> 2023-12-19 10:38:25.877981600 +
> @@ -100,7 +100,7 @@
>  step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE 
> application_name = 'isolation/timeouts/stt2'
>  count
>  -
> -0
> +1
>  (1 row)
>
>  step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET 
> statement_timeout = '10s'; SET lock_timeout = '10s'; SET transaction_timeout 
> = '10s';
>
> [1] 
> https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs

Hi,

I tried to split the test for transaction timeout, and all tests passed on my CI [1].

OTOH, I found that if I set transaction_timeout inside a transaction, it does
not take effect immediately.  For example:

[local]:2049802 postgres=# BEGIN;
BEGIN
[local]:2049802 postgres=*# SET transaction_timeout TO '1s';
SET
[local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;  -- wait 10s
   relname
--
 pg_statistic
(1 row)

[local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;
FATAL:  terminating connection due to transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

It looks odd.  Is this expected?  I haven't read the whole thread;
am I missing something?

[1] https://cirrus-ci.com/build/6574686130143232

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

>From fb87e5fe2ea5ced51a7e443243cdd40115423449 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" 
Date: Sun, 3 Dec 2023 23:18:00 +0500
Subject: [PATCH v13 1/1] Introduce transaction_timeout

This commit adds a timeout that is expected to be used as a safeguard
against long-running queries. Any session that stays in a transaction
longer than this timeout will be terminated.

However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.

Author: Andrey Borodin 
Reviewed-by: Nikolay Samokhvalov 
Reviewed-by: Andres Freund 
Reviewed-by: Fujii Masao 
Reviewed-by: bt23nguyent 
Reviewed-by: Yuhang Qiu 
Reviewed-by: Japin Li 

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 35 +++
 src/backend/postmaster/autovacuum.c   |  2 +
 src/backend/storage/lmgr/proc.c   |  1 +
 src/backend/tcop/postgres.c   | 27 +++-
 src/backend/utils/errcodes.txt|  1 +
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 10 +++
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/bin/pg_dump/pg_backup_archiver.c  |  2 +
 src/bin/pg_dump/pg_dump.c |  2 +
 src/bin/pg_rewind/libpq_source.c  |  1 +
 src/include/miscadmin.h   |  1 +
 src/include/storage/proc.h|  1 +
 src/include/utils/timeout.h   |  1 +
 src/test/isolation/Makefile   |  5 +-
 src/test/isolation/expected/timeouts.out  | 63 ++-
 src/test/isolation/specs/timeouts.spec| 30 +
 18 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5624ca884..d62edcf83b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9134,6 +9134,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  transaction_timeout (integer)
+  
+   transaction_timeout configuration parameter
+  
+  
+  
+   
+Terminate any session that spans longer than the specified amount of
+time in transaction. The limit applies both to explicit transactions
+(started with BEGIN) and to implicitly started
+transaction corresponding to single statement. But this limit is not
+applied to prepared transactions.
+If this value is specified without units, it is taken as milliseconds.
+A value of zero (the default) disables the timeout.
+   
+
+   
+If transaction_timeout is shorter than
+ 

Re: trying again to get incremental backup

2023-12-21 Thread Alexander Lakhin

21.12.2023 23:43, Robert Haas wrote:
>> There are also two deadcode.DeadStores complaints from clang. First one is
>> about:
>>   /*
>>    * Align the wait time to prevent drift. This doesn't really matter,
>>    * but we'd like the warnings about how long we've been waiting to say
>>    * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
>>    * drifting to something that is not a multiple of ten.
>>    */
>>   timeout_in_ms -=
>>   TimestampDifferenceMilliseconds(current_time, initial_time) %
>>   timeout_in_ms;
>> It looks like this timeout is really not used.
>
> Oops. It should be. See attached.


My quick experiment shows that that TimestampDifferenceMilliseconds call
always returns zero, due to its arguments being swapped.
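
For illustration, the reason swapped arguments yield zero is that the
function clamps negative differences.  A minimal stand-alone rendering of
that behaviour (not the backend code itself; its exact overflow handling is
omitted here) looks like this:

#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in the backend */

static long
timestamp_difference_ms(TimestampTz start_time, TimestampTz stop_time)
{
    TimestampTz diff = stop_time - start_time;

    if (diff <= 0)
        return 0;               /* negative elapsed time is clamped to zero */
    return (long) ((diff + 999) / 1000);
}

/*
 * With the arguments swapped, start_time is the *later* timestamp, the
 * difference is negative, and the result is always 0 - so the "align the
 * wait time" adjustment above never does anything.
 */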

The other changes look good to me.

Thank you!

Best regards,
Alexander




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Junwang Zhao
On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada  wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>* Add CopyToState{Get,Set}Opaque()
>
> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>* Add CopyToStateGetAttNumList()
>
> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>
> 3. Export CopySend*()
>* Rename CopySend*() to CopyToStateSend*() and export them
>* Exception: CopySendEndOfRow() to CopyToStateFlush() because
>  it just flushes the internal buffer now.
>
> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>
I guess the purpose of these helpers is to avoid exposing CopyToState in
copy.h, but I think exposing CopyToState to users might make life easier;
users might want to use the memory contexts of the structure (though I agree
not all the fields are necessary for extension handlers).

> The attached patch is based on the Sawada-san's patch and
> includes the above changes. Note that this patch is also
> dirty so just for discussion.
>
> My suggestions from this experience:
>
> 1. Split COPY handler to COPY TO handler and COPY FROM handler
>
>* CopyFormatRoutine is a bit tricky. An extension needs
>  to create a CopyFormatRoutine node and
>  a CopyToFormatRoutine node.
>
>* If we just require "copy_to_${FORMAT}(internal)"
>  function and "copy_from_${FORMAT}(internal)" function,
>  we can remove the tricky approach. And it also avoid
>  name collisions with other handler such as tablesample
>  handler.
>  See also:
>  
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> 2. Need an opaque space like IndexScanDesc::opaque does
>
>* A custom COPY TO handler needs to keep its data

I once thought users might want to parse their own options; maybe that
is a use case for this opaque space.

For the name, I thought private_data might be a better candidate than
opaque, but I don't insist on it.
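
As an illustration of what such an opaque (or private_data) slot buys a
handler, here is a hedged sketch.  The accessor names come from the patch
description quoted above, but their signatures and the callback shapes used
here are assumptions for illustration only, not the actual proposed API:

typedef struct MyFormatState
{
    StringInfoData buf;         /* per-row output buffer */
    int64       rows_emitted;
} MyFormatState;

static void
myformat_copyto_start(CopyToState cstate, TupleDesc tupdesc)
{
    MyFormatState *state = (MyFormatState *) palloc0(sizeof(MyFormatState));

    initStringInfo(&state->buf);
    CopyToStateSetOpaque(cstate, state);    /* stash handler-private data */
}

static void
myformat_copyto_end(CopyToState cstate)
{
    MyFormatState *state = (MyFormatState *) CopyToStateGetOpaque(cstate);

    elog(DEBUG1, "myformat: emitted %lld rows", (long long) state->rows_emitted);
    pfree(state->buf.data);
    pfree(state);
}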
>
> 3. Export CopySend*()
>
>* If we like minimum API, we just need to export
>  CopySendData() and CopySendEndOfRow(). But
>  CopySend{String,Char,Int32,Int16}() will be convenient
>  custom COPY TO handlers. (A custom COPY TO handler for
>  Apache Arrow doesn't need them.)

Do you use the Arrow library to control the memory? Is there a way we
can make Arrow use postgres' memory contexts? I'm not sure this is
necessary; I'm just raising the question for discussion.
>
> Questions:
>
> 1. What value should be used for "format" in
>PgMsg_CopyOutResponse message?
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
>
>It's 1 for binary format and 0 for text/csv format.
>
>Should we make it customizable by custom COPY TO handler?
>If so, what value should be used for this?
>
> 2. Do we need more tries for design discussion for the first
>implementation? If we need, what should we try?
>
>
> Thanks,
> --
> kou

+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+ CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
+

extra semicolon.

-- 
Regards
Junwang Zhao




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Amit Kapila
On Fri, Dec 22, 2023 at 5:00 AM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> > On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> >> We can return int2 value from the function pg_get_replication_slots()
> >> and then use that to display a string in the view
> >> pg_replication_slots.
> >
> > I strongly dislike that pattern. It just leads to complicated views - and
> > doesn't provide a single benefit that I am aware of. It's much better to
> > simply populate the text version in pg_get_replication_slots().
>
> I agree that this is a better integration in the view, and that's what
> I would do FWIW.
>
> Amit, how much of a problem would it be to do a text->enum mapping
> when synchronizing the slots from a primary to a standby?
>

There is no problem as such with that. We were trying to see if there is
a more convenient way, but let's move ahead by having the cause as text in
both the function and the view, as that seems to be the preferred way.

-- 
With Regards,
Amit Kapila.




Re: int4->bool test coverage

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 11:56:22AM +0100, Christoph Berg wrote:
> The first cast is the int4_bool function, but it isn't covered by the
> regression tests at all. The attached patch adds tests.

I don't see why not.

Interesting that there are a few more of these in int.c, like int2up,
int4inc, int2smaller, int{2,4}shr, int{2,4}not, etc.
--
Michael


signature.asc
Description: PGP signature


Re: ci: Build standalone INSTALL file

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 02:22:02PM -0500, Tom Lane wrote:
> Here's a draft patch for this.  Most of it is mechanical removal of
> infrastructure for building the INSTALL file.  If anyone wants to
> bikeshed on the new wording of README, feel free.

Thanks for putting this together.  That looks reasonable.

> diff --git a/README b/README
> index 56d0c951a9..e40e610ccb 100644
> --- a/README
> +++ b/README
> @@ -9,14 +9,13 @@ that supports an extended subset of the SQL standard, 
> including
> -See the file INSTALL for instructions on how to build and install
> -PostgreSQL.  That file also lists supported operating systems and
> -hardware platforms and contains information regarding any other
> -software packages that are required to build or run the PostgreSQL
> -system.  Copyright and license information can be found in the
> -file COPYRIGHT.  A comprehensive documentation set is included in this
> -distribution; it can be read as described in the installation
> -instructions.
> +Copyright and license information can be found in the file COPYRIGHT.
> +
> +General documentation about this version of PostgreSQL can be found at:
> +https://www.postgresql.org/docs/devel/
> +In particular, information about building PostgreSQL from the source
> +code can be found at:
> +https://www.postgresql.org/docs/devel/installation.html

Sounds fine by me, including the extra step documented in
RELEASE_CHANGES.  No information is lost.
--
Michael


signature.asc
Description: PGP signature


Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Masahiko Sawada
On Thu, Dec 21, 2023 at 6:35 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada  wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>* Add CopyToState{Get,Set}Opaque()
>
> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>* Add CopyToStateGetAttNumList()
>
> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

I think we can move CopyToState to copy.h and we don't need to have
set/get functions for its fields.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Masahiko Sawada
On Fri, Dec 22, 2023 at 10:00 AM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
> >* If we just require "copy_to_${FORMAT}(internal)"
> >  function and "copy_from_${FORMAT}(internal)" function,
> >  we can remove the tricky approach. And it also avoid
> >  name collisions with other handler such as tablesample
> >  handler.
> >  See also:
> >  
> > https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> Hmm.  I prefer the unique name approach for the COPY portions without
> enforcing any naming policy on the function names returning the
> handlers, actually, though I can see your point.

Yeah, another idea is to provide support functions to return a
CopyFormatRoutine wrapping either CopyToFormatRoutine or
CopyFromFormatRoutine. For example:

extern CopyFormatRoutine *MakeCopyToFormatRoutine(const CopyToFormatRoutine *routine);

extensions can do something like:

static const CopyToFormatRoutine testfmt_handler = {
.type = T_CopyToFormatRoutine,
.start_fn = testfmt_copyto_start,
.onerow_fn = testfmt_copyto_onerow,
.end_fn = testfmt_copyto_end
};

Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
CopyFormatRoutine *routine = MakeCopyToFormatRoutine(&testfmt_handler);
:

>
> > 2. Need an opaque space like IndexScanDesc::opaque does
> >
> >* A custom COPY TO handler needs to keep its data
>
> Sounds useful to me to have a private area passed down to the
> callbacks.
>

+1

>
> > Questions:
> >
> > 1. What value should be used for "format" in
> >PgMsg_CopyOutResponse message?
> >
> >It's 1 for binary format and 0 for text/csv format.
> >
> >Should we make it customizable by custom COPY TO handler?
> >If so, what value should be used for this?
>
> Interesting point.  It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1.  Hence it feels natural to have a callback
> for that.

+1

>
> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step.  Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly.  Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.
>
> > 2. Do we need more tries for design discussion for the first
> >implementation? If we need, what should we try?
>
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler.  I would have assumed that this
> stuff returns a handler as a const struct like table AMs.

+1

The example I mentioned above does that.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
>* If we just require "copy_to_${FORMAT}(internal)"
>  function and "copy_from_${FORMAT}(internal)" function,
>  we can remove the tricky approach. And it also avoid
>  name collisions with other handler such as tablesample
>  handler.
>  See also:
>  
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

Hmm.  I prefer the unique name approach for the COPY portions without
enforcing any naming policy on the function names returning the
handlers, actually, though I can see your point.

> 2. Need an opaque space like IndexScanDesc::opaque does
> 
>* A custom COPY TO handler needs to keep its data

Sounds useful to me to have a private area passed down to the
callbacks.

> 3. Export CopySend*()
> 
>* If we like minimum API, we just need to export
>  CopySendData() and CopySendEndOfRow(). But
>  CopySend{String,Char,Int32,Int16}() will be convenient
>  custom COPY TO handlers. (A custom COPY TO handler for
>  Apache Arrow doesn't need them.)

Hmm.  Not sure on this one.  This may come down to externalizing the
manipulation of fe_msgbuf.  Particularly, could it be possible that
some custom formats don't care at all about the network order?

> Questions:
> 
> 1. What value should be used for "format" in
>PgMsg_CopyOutResponse message?
> 
>It's 1 for binary format and 0 for text/csv format.
> 
>Should we make it customizable by custom COPY TO handler?
>If so, what value should be used for this?

Interesting point.  It looks very tempting to give more flexibility to
people who'd like to use their own code as we have one byte in the
protocol but just use 0/1.  Hence it feels natural to have a callback
for that.

It also means that we may want to think harder about copy_is_binary in
libpq in the future step.  Now, having a backend implementation does
not need any libpq bits, either, because a client stack may just want
to speak the Postgres protocol directly.  Perhaps a custom COPY
implementation would be OK with how things are in libpq, as well,
tweaking its way through with just text or binary.

> 2. Do we need more tries for design discussion for the first
>implementation? If we need, what should we try?

A makeNode() is used with an allocation in the current memory context
in the function returning the handler.  I would have assumed that this
stuff returns a handler as a const struct like table AMs.
--
Michael


signature.asc
Description: PGP signature


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
>> We can return int2 value from the function pg_get_replication_slots()
>> and then use that to display a string in the view
>> pg_replication_slots.
> 
> I strongly dislike that pattern. It just leads to complicated views - and
> doesn't provide a single benefit that I am aware of. It's much better to
> simply populate the text version in pg_get_replication_slots().

I agree that this is a better integration in the view, and that's what
I would do FWIW.

Amit, how much of a problem would it be to do a text->enum mapping
when synchronizing the slots from a primary to a standby?  Sure you
could have a system function that does some of the mapping work, but I
am not sure what's the best integration when it comes to the other
patch.
--
Michael


signature.asc
Description: PGP signature


Re: Remove MSVC scripts from the tree

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 03:43:32PM -0500, Andrew Dunstan wrote:
> On 2023-12-21 Th 03:01, Michael Paquier wrote:
>> Andrew, was the original target of pgperlsyncheck committers and
>> hackers who played with the MSVC scripts but could not run sanity
>> checks on Windows (see [1])?
> 
> 
> yes.

Okay, thanks.  Wouldn't it be better to remove it in the end?  With
the main use case behind its introduction being gone, it is less
attractive to keep maintaining it.  If some people have been using it
in their workflows, I'm OK to keep it but the rest of the tree can be
checked at runtime as well.

> I'm actually a bit dubious about win32tzlist.pl. Win32::Registry is not
> present in a recent Strawberry Perl installation, and its latest version
> says it is obsolete, although it's still included in the cpan bundle
> libwin32.
> 
> I wonder who has actually run the script any time recently?

Hmm...  I've never run it with meson on Win32.

> In any case, we can probably work around the syncheck issue by making the
> module a runtime requirement rather than a compile time requirement, by
> using "require" instead of "use".

Interesting.  Another trick would be needed for HKEY_LOCAL_MACHINE,
like what the dummylib does, but local to win32tzlist.pl.  Roughly along
these lines:
-use Win32::Registry;
+use Config;
+
+require Win32::Registry;
 
 my $tzfile = 'src/bin/initdb/findtimezone.c';
 
+if ($Config{osname} ne 'MSWin32' && $Config{osname} ne 'msys')
+{
+   use vars qw($HKEY_LOCAL_MACHINE);
+}
--
Michael


signature.asc
Description: PGP signature


Re: broken master regress tests

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 17:48 -0800, Jeff Davis wrote:
> Attached.

It appears to increase the coverage. I committed it and I'll see how
the buildfarm reacts.

Regards,
Jeff Davis





Re: Built-in CTYPE provider

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 15:47 -0800, Jeremy Schneider wrote:

> One other thing that comes to mind: how does the parser do case
> folding
> for relation names? Is that using OS-provided libc as of today? Or
> did
> we code it to use ICU if that's the DB default? I'm guessing libc,
> and
> global catalogs probably need to be handled in a consistent manner,
> even
> across different encodings.

The code is in downcase_identifier():

  /*  
   * SQL99 specifies Unicode-aware case normalization, which we don't 
   * yet have the infrastructure for...
   */
  if (ch >= 'A' && ch <= 'Z')
ch += 'a' - 'A';
  else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
ch = tolower(ch);
  result[i] = (char) ch;

My proposal would add the infrastructure that the comment above says is
missing.

It seems like we should be using the database collation at this point
because you don't want inconsistency between the catalogs and the
parser here. Then again, the SQL spec doesn't seem to support tailoring
of case conversions, so maybe we are avoiding it for that reason? Or
maybe we're avoiding catalog access? Or perhaps the work for ICU just
wasn't done here yet?

> (Kind of related... did you ever see the demo where I create a user
> named
> '' and then I try to connect to a database with non-unicode
> encoding?
>   ...at least it seems to be able to walk the index without
> decoding
> strings to find other users - but the way these global catalogs work
> scares me a little bit)

I didn't see that specific demo, but in general we seem to change
between pg_wchar and unicode code points too freely, so I'm not
surprised that something went wrong.

Regards,
Jeff Davis





Re: Built-in CTYPE provider

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 16:29 -0800, Jeremy Schneider wrote:
> found some more. here's my running list of everything user-facing I
> see
> in core PG code so far that might involve case:
> 
> * upper/lower/initcap
> * regexp_*() and *_REGEXP()
> * ILIKE, operators ~* !~* ~~ !~~ ~~* !~~*
> * citext + replace(), split_part(), strpos() and translate()
> * full text search - everything is case folded
> * unaccent? not clear to me whether CTYPE includes accent folding

No, ctype has nothing to do with accents as far as I can tell. I don't
know if I'm using the right terminology, but I think "case" is a
variant of a character whereas "accent" is a modifier/mark, and the
mark is a separate concept from the character itself.

> * ltree
> * pg_trgm
> * core PG parser, case folding of relation names

Let's separate it into groups.

(1) Callers that use a collation OID or pg_locale_t:

  * collation & hashing
  * upper/lower/initcap
  * regex, LIKE, formatting
  * pg_trgm (which uses regexes)
  * maybe postgres_fdw, but might just be a passthrough
  * catalog cache (always uses DEFAULT_COLLATION_OID)
  * citext (always uses DEFAULT_COLLATION_OID, but probably shouldn't)

(2) A long tail of callers that depend on what LC_CTYPE/LC_COLLATE are
set to, or use ad-hoc ASCII-only semantics:

  * core SQL parser downcase_identifier()
  * callers of pg_strcasecmp() (DDL, etc.)
  * GUC name case folding
  * full text search ("mylocale = 0 /* TODO */")
  * a ton of stuff uses isspace(), isdigit(), etc.
  * various callers of tolower()/toupper()
  * some selfuncs.c stuff
  * ...

Might have missed some places.

The user impact of a new builtin provider would affect (1), but only
for those actually using the provider. So there's no compatibility risk
there, but it's good to understand what it will affect.

We can, on a case-by-case basis, also consider using the new APIs I'm
proposing for instances of (2). There would be some compatibility risk
there for existing callers, and we'd have to consider whether it's
worth it or not. Ideally, new callers would either use the new APIs or
use the pg_ascii_* APIs.

Regards,
Jeff Davis





Re: pg_serial bloat

2023-12-21 Thread Thomas Munro
On Fri, Dec 15, 2023 at 9:53 AM Thomas Munro  wrote:
> ... We've seen a system with ~30GB of files in there
> (note: full/untruncated it would be 2³² xids × sizeof(uint64_t) =
> 32GB).  It's not just a gradual disk space leak: according to disk
> space monitoring, this system suddenly wrote ~half of that data, which
> I think must be the while loop in SerialAdd() zeroing out pages.

Attempt at an analysis of this rare anti-social I/O pattern:

SerialAdd() writes zero pages in a range from the old headPage up to
some target page, but headPage can be any number, arbitrarily far in
the past (or apparently, the future).  It only keeps up with the
progress of the xid clock and spreads that work out if we happen to
call SerialAdd() often enough.  If we call SerialAdd() only every
couple of billion xids (eg very occasionally you leave a transaction
open and go out to lunch on a very busy system using SERIALIZABLE
everywhere), you might find yourself suddenly needing to write out
many gigabytes of zeroes there.

One observation is that headPage gets periodically zapped to -1 by
checkpoints, near the comment "SLRU is no longer needed", providing a
periodic dice-roll that chops the range down.  Unfortunately the
historical "apparent wraparound" bug prevents that from being reached.
That bug was fixed by commit d6b0c2b (master only, no back-patch).  On
the system where we saw pg_serial going bananas, that message appeared
regularly.

Attempts to find a solution:

I think it might make sense to clamp firstZeroPage into the page range
implied by tailXid, headXid.  Those values are eagerly maintained and
interlock with snapshots and global xmin (correctly but
under-documented-ly, AFAICS so far), and we will never try to look up
the CSN for any xid outside that range.  I think that should exclude
the pathological zero-writing cases.  I wouldn't want to do this
without a working reproducer though, which will take some effort.
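
A very rough, self-contained sketch of the clamping idea (not a patch
against predicate.c: the page arithmetic, names and constants are simplified
stand-ins and deliberately ignore xid wraparound):

#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for the real typedef */

#define ENTRIES_PER_PAGE 1024       /* illustrative, not the real constant */

static int64_t
serial_page_of(TransactionId xid)
{
    return (int64_t) xid / ENTRIES_PER_PAGE;
}

/*
 * Never start zeroing before the page that holds tailXid: the SLRU is only
 * consulted for xids in the tailXid..headXid window, so pages before it
 * would be zeroed for nothing (potentially gigabytes of wasted I/O).
 */
static int64_t
clamp_first_zero_page(int64_t first_zero_page, TransactionId tail_xid)
{
    int64_t     tail_page = serial_page_of(tail_xid);

    return (first_zero_page < tail_page) ? tail_page : first_zero_page;
}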

Another thought is that in the glorious 64 bit future, we might be
able to invent a "sparse" SLRU, where if the file or page doesn't
exist, we just return a zero CSN, and when we write a new page we just
let the OS provide filesystem holes as required.  The reason I
wouldn't want to invent sparse SLRUs with 32 bit indexing is that we
have no confidence in the truncation logic, which might leave stray
files from earlier epochs.  So I think we need zero'd pages (or
perhaps at least to confirm that there is nothing already there, but I
have zero desire to make the current wraparound-ridden system more
complex).




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2023-12-21 Thread Melanie Plageman
On Fri, Nov 17, 2023 at 6:12 PM Melanie Plageman
 wrote:
>
> On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
>  wrote:
> > When there are no indexes on the relation, we can set would-be dead
> > items LP_UNUSED and remove them during pruning. This saves us a vacuum
> > WAL record, reducing WAL volume (and time spent writing and syncing
> > WAL).
> ...
> > Note that (on principle) this patch set is on top of the bug fix I
> > proposed in [1].
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
>
> Rebased on top of fix in b2e237afddc56a and registered for the january fest
> https://commitfest.postgresql.org/46/4665/

I got an off-list question about whether or not this codepath is
exercised in existing regression tests. It is -- vacuum.sql tests
include those which vacuum a table with no indexes and tuples that can
be deleted.

I also looked through [1] to see if there were any user-facing docs
which happened to mention the exact implementation details of how and
when tuples are deleted by vacuum. I didn't see anything like that, so
I don't think there are user-facing docs which need updating.

- Melanie

[1] https://www.postgresql.org/docs/devel/routine-vacuuming.html




Re: Eager page freeze criteria clarification

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:56 AM Melanie Plageman
 wrote:
> Agreed. I plan to test with another distribution. Though, the exercise
> of determining which ones are useful is probably more challenging.
> I imagine we will have to choose one distribution (as opposed to
> supporting different distributions and choosing based on data access
> patterns for a table). Though, even with a normal distribution, I
> think it should be an improvement.

Our current algorithm isn't adaptive at all, so I like our chances of
coming out ahead. It won't surprise me if somebody finds a case where
there is a regression, but if we handle some common and important
cases correctly (e.g. append-only, update-everything-nonstop) then I
think we're probably ahead even if there are some cases where we do
worse. It does depend on how much worse they are, and how realistic
they are, but we don't want to be too fearful here: we know what we're
doing right now isn't too great.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eager page freeze criteria clarification

2023-12-21 Thread Melanie Plageman
On Wed, Dec 13, 2023 at 12:24 PM Robert Haas  wrote:
>
> Great results.

Thanks!

> On Sat, Dec 9, 2023 at 5:12 AM Melanie Plageman
>  wrote:
> > Values can be "removed" from the accumulator by simply decrementing its
> > cardinality and decreasing the sum and sum squared by a value that will
> > not change the mean and standard deviation of the overall distribution.
> > To adapt to a table's changing access patterns, we'll need to remove
> > values from this accumulator over time, but this patch doesn't yet
> > decide when to do this. A simple solution may be to cap the cardinality
> > of the accumulator to the greater of 1% of the table size, or some fixed
> > number of values (perhaps 200?). Even without such removal of values,
> > the distribution recorded in the accumulator will eventually skew toward
> > more recent data, albeit at a slower rate.
>
> I think we're going to need something here. Otherwise, after 6 months
> of use, changing a table's perceived access pattern will be quite
> difficult.
>
> I think one challenge here is to find something that doesn't decay too
> often and end up with cases where it basically removes all the data.
>
> As long as you avoid that, I suspect that the algorithm might not be
> terribly sensitive to other kinds of changes. If you decay after 200
> values or 2000 or 20,000, it will only affect how fast we can change
> our notion of the access pattern, and my guess would be that any of
> those values would produce broadly acceptable results, with some
> differences in the details. If you decay after 200,000,000 values or
> not at all, then I think there will be problems.

I'll add the decay logic and devise a benchmark that will exercise it.
I can test at least one or two of these ideas.
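
For reference, a minimal self-contained sketch of the mean- and
stddev-preserving removal described above (the struct and function names are
illustrative, not taken from the patch):

typedef struct UnfreezeAccumulator
{
    double      n;              /* cardinality */
    double      sum;            /* sum of observed values */
    double      sumsq;          /* sum of squared values */
} UnfreezeAccumulator;

static void
accum_add(UnfreezeAccumulator *acc, double value)
{
    acc->n += 1.0;
    acc->sum += value;
    acc->sumsq += value * value;
}

/*
 * Drop one synthetic observation while preserving mean and stddev:
 * subtracting sum/n keeps the mean, subtracting sumsq/n keeps sumsq/n and
 * therefore the variance, while the cardinality shrinks by one.
 */
static void
accum_decay_one(UnfreezeAccumulator *acc)
{
    if (acc->n <= 1.0)
        return;                 /* nothing sensible to remove */
    acc->sum -= acc->sum / acc->n;
    acc->sumsq -= acc->sumsq / acc->n;
    acc->n -= 1.0;
}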

> > The goal is to keep pages frozen for at least target_freeze_duration.
> > target_freeze_duration is in seconds and pages only have a last
> > modification LSN, so target_freeze_duration must be converted to LSNs.
> > To accomplish this, I've added an LSNTimeline data structure, containing
> > XLogRecPtr, TimestampTz pairs stored with decreasing precision as they
> > age. When we need to translate the guc value to LSNs, we linearly
> > interpolate it on this timeline. For the time being, the global
> > LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There
> > is no reason it can't be updated with some other cadence and/or by some
> > other process (nothing about it is inherently tied to vacuum). The
> > cached translated value of target_freeze_duration is stored in each
> > table's stats. This is arbitrary as it is not a table-level stat.
> > However, it needs to be located somewhere that is accessible on
> > update/delete. We may want to recalculate it more often than once per
> > table vacuum, especially in case of long-running vacuums.
>
> This part sounds like it isn't quite baked yet. The idea of the data
> structure seems fine, but updating it once per vacuum sounds fairly
> unprincipled to me? Don't we want the updates to happen on a somewhat
> regular wall clock cadence?

Yes, this part was not fully baked. I actually discussed this with
Andres at PGConf EU last week, and he suggested that the background writer
update the LSNTimeline. He also suggested I propose the LSNTimeline in
a new thread. I could add a pageinspect function returning the
estimated time of last page modification given the page LSN (so that the
proposal comes with a user of it).
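
For what it's worth, the interpolation step itself is tiny; here is a
self-contained sketch (types and names are illustrative stand-ins, not the
proposed LSNTimeline API):

#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-ins for the real typedefs */
typedef int64_t TimestampTz;

typedef struct LSNTimePair
{
    XLogRecPtr  lsn;
    TimestampTz time;
} LSNTimePair;

/*
 * Estimate the LSN corresponding to "target", given two known (lsn, time)
 * points that bracket it, by linear interpolation.
 */
static XLogRecPtr
interpolate_lsn(const LSNTimePair *older, const LSNTimePair *newer,
                TimestampTz target)
{
    double      fraction;

    if (newer->time == older->time)
        return newer->lsn;      /* avoid division by zero */

    fraction = (double) (target - older->time) /
        (double) (newer->time - older->time);

    return older->lsn + (XLogRecPtr) (fraction * (double) (newer->lsn - older->lsn));
}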

- Melanie




Re: Remove MSVC scripts from the tree

2023-12-21 Thread Andrew Dunstan



On 2023-12-21 Th 03:01, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 11:39:15PM -0800, Andres Freund wrote:
>> Can't we teach the tool that it should not validate src/tools/win32tzlist.pl
>> on !windows? It's obviously windows specific code, and it's special case
>> enough that there doesn't seem like a need to develop it on !windows.
>
> I am not really excited about keeping a dummy library for the sake of
> a script checking if this WIN32-only file is correctly written, and
> I've never used pgperlsyncheck, TBH, since it exists in af616ce48347.
> Anyway, we could just tweak the list of files returned by
> find_perl_files as win32tzlist.pl is valid for perltidy and
> perlcritic.
>
> Andrew, was the original target of pgperlsyncheck committers and
> hackers who played with the MSVC scripts but could not run sanity
> checks on Windows (see [1])?


yes.


> There are a few more cases like the
> Unicode scripts or some of the stuff in src/tools/ where that can be
> useful still these are not touched on a daily basis.  The rest of the
> pm files are for TAP tests, one for Unicode.  I'm OK to tweak the
> script, still, if its main purpose is gone..
>
> [1]: 
> https://www.postgresql.org/message-id/f3c12e2c-618f-cb6f-082b-a2f604dbe...@2ndquadrant.com



I'm actually a bit dubious about win32tzlist.pl. Win32::Registry is not 
present in a recent Strawberry Perl installation, and its latest version 
says it is obsolete, although it's still included in the cpan bundle 
libwin32.


I wonder who has actually run the script any time recently?

In any case, we can probably work around the syncheck issue by making 
the module a runtime requirement rather than a compile time requirement, 
by using "require" instead of "use".



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: trying again to get incremental backup

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:00 AM Alexander Lakhin  wrote:
> Please look at the attached patch; it corrects all 29 items ("recods"
> fixed in two places), but maybe you find some substitutions wrong...

Thanks, committed with a few additions.

> I've also observed that those commits introduced new warnings:
> $ CC=gcc-12 CPPFLAGS="-Wtype-limits" ./configure -q && make -s -j8
> reconstruct.c: In function ‘read_bytes’:
> reconstruct.c:511:24: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>511 | if (rb < 0)
>|^
> reconstruct.c: In function ‘write_reconstructed_file’:
> reconstruct.c:650:40: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>650 | if (rb < 0)
>|^
> reconstruct.c:662:32: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>662 | if (wb < 0)

Oops. I think the variables should be type int. See attached.

> There are also two deadcode.DeadStores complaints from clang. First one is
> about:
>  /*
>   * Align the wait time to prevent drift. This doesn't really matter,
>   * but we'd like the warnings about how long we've been waiting to 
> say
>   * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
>   * drifting to something that is not a multiple of ten.
>   */
>  timeout_in_ms -=
>  TimestampDifferenceMilliseconds(current_time, initial_time) %
>  timeout_in_ms;
> It looks like this timeout is really not used.

Oops. It should be. See attached.

> And the minor one (similar to many existing, maybe doesn't deserve fixing):
> walsummarizer.c:808:5: warning: Value stored to 'summary_end_lsn' is never 
> read [deadcode.DeadStores]
>  summary_end_lsn = 
> private_data->read_upto;
>  ^ ~~~

It kind of surprises me that this is dead, but it seems best to keep
it there to be on the safe side, in case some change to the logic
renders it not dead in the future.

> >> Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
> >> comment above redo_pointer_at_last_summary_removal declaration, but
> >> perhaps it should say about removing summaries instead?
> > Wow, yeah. Thanks, will fix.
>
> Thank you for paying attention to it!

I'll fix this next.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


fix-ib-thinkos.patch
Description: Binary data


Re: Fixing backslash dot for COPY FROM...CSV

2023-12-21 Thread Daniel Verite
vignesh C wrote:

> Thanks for the updated patch, any reason why this is handled only in csv.
> postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
> COPY 1
> postgres=# select * from test1;
>  c1
> ---
> line1
> (1 row)

I believe it's safer not to change anything in the normal "non-csv"
text mode.
The current doc says that \. will not be taken as data in this format.
From https://www.postgresql.org/docs/current/sql-copy.html :

   Any other backslashed character that is not mentioned in the above
   table will be taken to represent itself. However, beware of adding
   backslashes unnecessarily, since that might accidentally produce a
   string matching the end-of-data marker (\.) or the null string (\N
   by default). These strings will be recognized before any other
   backslash processing is done.



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 10:46:02 -0500, Tom Lane wrote:
>> Let's go with "/devel/ in master and a number in release branches"
>> for now, and tweak that if the web team wants to take on maintaining
>> a redirect.  I'll put together a concrete patch proposal in a little
>> bit.

> Cool.

Here's a draft patch for this.  Most of it is mechanical removal of
infrastructure for building the INSTALL file.  If anyone wants to
bikeshed on the new wording of README, feel free.

regards, tom lane

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 80db4c73f8..eba569e930 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -109,10 +109,7 @@ distdir:
 	  || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
 	  fi || exit; \
 	done
-	$(MAKE) -C $(distdir)/doc/src/sgml/ INSTALL
-	cp $(distdir)/doc/src/sgml/INSTALL $(distdir)/
 	$(MAKE) -C $(distdir) distclean
-	rm -f $(distdir)/README.git
 
 distcheck: dist
 	rm -rf $(dummy)
diff --git a/Makefile b/Makefile
index c66fb3027b..9bc1a4ec17 100644
--- a/Makefile
+++ b/Makefile
@@ -17,13 +17,7 @@ all:
 
 all check install installdirs installcheck installcheck-parallel uninstall clean distclean maintainer-clean dist distcheck world check-world install-world installcheck-world:
 	@if [ ! -f GNUmakefile ] ; then \
-	   if [ -f INSTALL ] ; then \
-	 INSTRUCTIONS="INSTALL"; \
-	   else \
-	 INSTRUCTIONS="README.git"; \
-	   fi; \
-	   echo "You need to run the 'configure' program first. See the file"; \
-	   echo "'$$INSTRUCTIONS' for installation instructions, or visit: " ; \
+	   echo "You need to run the 'configure' program first. Please see"; \
 	   echo "" ; \
 	   false ; \
 	 fi
diff --git a/README b/README
index 56d0c951a9..e40e610ccb 100644
--- a/README
+++ b/README
@@ -9,14 +9,13 @@ that supports an extended subset of the SQL standard, including
 transactions, foreign keys, subqueries, triggers, user-defined types
 and functions.  This distribution also contains C language bindings.
 
-See the file INSTALL for instructions on how to build and install
-PostgreSQL.  That file also lists supported operating systems and
-hardware platforms and contains information regarding any other
-software packages that are required to build or run the PostgreSQL
-system.  Copyright and license information can be found in the
-file COPYRIGHT.  A comprehensive documentation set is included in this
-distribution; it can be read as described in the installation
-instructions.
+Copyright and license information can be found in the file COPYRIGHT.
+
+General documentation about this version of PostgreSQL can be found at:
+https://www.postgresql.org/docs/devel/
+In particular, information about building PostgreSQL from the source
+code can be found at:
+https://www.postgresql.org/docs/devel/installation.html
 
 The latest version of this software, and related software, may be
 obtained at https://www.postgresql.org/download/.  For more information
diff --git a/README.git b/README.git
deleted file mode 100644
index 4bf614eea4..00
--- a/README.git
+++ /dev/null
@@ -1,14 +0,0 @@
-(This file does not appear in release tarballs.)
-
-In a release or snapshot tarball of PostgreSQL, a documentation file named
-INSTALL will appear in this directory.  However, this file is not stored in
-git and so will not be present if you are using a git checkout.
-
-If you are using a git checkout, you can view the most recent installation
-instructions at:
-	https://www.postgresql.org/docs/devel/installation.html
-
-Users compiling from git will also need compatible versions of Bison, Flex,
-and Perl, as discussed in the install documentation.  These programs are not
-needed when using a tarball, since the files they are needed to build are
-already present in the tarball.  (On Windows, however, you need Perl anyway.)
diff --git a/doc/src/sgml/.gitignore b/doc/src/sgml/.gitignore
index 88a07d852e..91f2781fe7 100644
--- a/doc/src/sgml/.gitignore
+++ b/doc/src/sgml/.gitignore
@@ -6,7 +6,6 @@
 /man7/
 /man-stamp
 # Other popular build targets
-/INSTALL
 /postgres-US.pdf
 /postgres-A4.pdf
 /postgres.html
@@ -21,7 +20,5 @@
 /wait_event_types.sgml
 # Assorted byproducts from building the above
 /postgres-full.xml
-/INSTALL.html
-/INSTALL.xml
 /postgres-US.fo
 /postgres-A4.fo
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 2ef818900f..725fec59e7 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -36,6 +36,8 @@ ifndef FOP
 FOP = $(missing) fop
 endif
 
+PANDOC = pandoc
+
 XMLINCLUDE = --path . --path $(srcdir)
 
 ifdef XMLLINT
@@ -113,25 +115,6 @@ wait_event_types.sgml: $(top_srcdir)/src/backend/utils/activity/wait_event_names
 targets-meson.sgml: targets-meson.txt $(srcdir)/generate-targets-meson.pl
 	$(PERL) $(srcdir)/generate-targets-meson.pl $^ > $@
 
-##
-## Generation of some text files.
-##
-
-ICONV = iconv
-PANDOC = pandoc
-
-INSTALL: % : %.html
-	$(PANDOC) -t plain -o 

Re: index prefetching

2023-12-21 Thread Tomas Vondra
On 12/21/23 18:14, Robert Haas wrote:
> On Thu, Dec 21, 2023 at 11:08 AM Andres Freund  wrote:
>> But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)
> 
> Sadly, you're more likely to get the first one than you are to get the
> second one. I can't really see going back to revisit that decision as
> a basis for somebody else's new work -- it'd be better if the person
> doing the new work figured out what makes sense here.
> 

I think it's a great example of "hindsight is 20/20". There were
perfectly valid reasons to have two separate nodes, and it's not like
these reasons somehow disappeared. It still is a perfectly reasonable
decision.

It's just that allowing index-only filters for regular index scans seems
to eliminate pretty much all executor differences between the two nodes.
But that's hard to predict - I certainly would not even have thought about
that back when index-only scans were added.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




authentication/t/001_password.pl trashes ~/.psql_history

2023-12-21 Thread Tom Lane
I happened to notice this stuff getting added to my .psql_history:

\echo background_psql: ready
SET password_encryption='scram-sha-256';
;
\echo background_psql: QUERY_SEPARATOR
SET scram_iterations=42;
;
\echo background_psql: QUERY_SEPARATOR
\password scram_role_iter
\q

After grepping for these strings, this is evidently the fault of
src/test/authentication/t/001_password.pl by way of BackgroundPsql.pm,
which fires up an interactive psql run that is not given the -n switch.

Currently the only other user of interactive_psql() seems to be
psql/t/010_tab_completion.pl, which avoids this problem by
explicitly redirecting the history file.  We could have 001_password.pl
do likewise, or we could have it pass the -n switch, but I think we're
going to have this problem resurface repeatedly if we leave it to the
outer test script to remember to do it.

My first idea was that BackgroundPsql.pm should take responsibility for
preventing this, by explicitly setting $ENV{PSQL_HISTORY} to "/dev/null"
if the calling script hasn't set some other value.  However, that could
fail if the user who runs the test habitually sets PSQL_HISTORY.

A messier but safer alternative would be to supply the -n switch by
default, with some way for 010_tab_completion.pl to override that.

Thoughts?

regards, tom lane




Re: Functions to return random numbers in a given range

2023-12-21 Thread Pavel Stehule
Hi

čt 21. 12. 2023 v 18:06 odesílatel Dean Rasheed 
napsal:

> Attached is a patch that adds 3 SQL-callable functions to return
> random integer/numeric values chosen uniformly from a given range:
>
>   random(min int, max int) returns int
>   random(min bigint, max bigint) returns bigint
>   random(min numeric, max numeric) returns numeric
>
> The return value is in the range [min, max], and in the numeric case,
> the result scale equals Max(scale(min), scale(max)), so it can be used
> to generate large random integers, as well as decimals.
>
> The goal is to provide simple, easy-to-use functions that operate
> correctly over arbitrary ranges, which is trickier than it might seem
> using the existing random() function. The main advantages are:
>
> 1. Support for arbitrary bounds (provided that max >= min). A SQL or
> PL/pgSQL implementation based on the existing random() function can
> suffer from integer overflow if the difference max-min is too large.
>
> 2. Uniform results over the full range. It's easy to overlook the fact
> that in a naive implementation doing something like
> "((max-min)*random()+min)::int", the endpoint values will be half as
> likely as any other value, since casting to integer rounds to nearest.
>
> 3. Makes better use of the underlying PRNG, not limited to the 52-bits
> of double precision values.
>
> 4. Simpler and more efficient generation of random numeric values.
> This is something I have commonly wanted in the past, and have usually
> resorted to hacks involving multiple calls to random() to build
> strings of digits, which is horribly slow, and messy.
>
> The implementation moves the existing random functions to a new source
> file, so the new functions all share a common PRNG state with the
> existing random functions, and that state is kept private to that
> file.
>

+1

Regards

Pavel


> Regards,
> Dean
>


Re: index prefetching

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 11:08 AM Andres Freund  wrote:
> But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)

Sadly, you're more likely to get the first one than you are to get the
second one. I can't really see going back to revisit that decision as
a basis for somebody else's new work -- it'd be better if the person
doing the new work figured out what makes sense here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Functions to return random numbers in a given range

2023-12-21 Thread Dean Rasheed
Attached is a patch that adds 3 SQL-callable functions to return
random integer/numeric values chosen uniformly from a given range:

  random(min int, max int) returns int
  random(min bigint, max bigint) returns bigint
  random(min numeric, max numeric) returns numeric

The return value is in the range [min, max], and in the numeric case,
the result scale equals Max(scale(min), scale(max)), so it can be used
to generate large random integers, as well as decimals.

The goal is to provide simple, easy-to-use functions that operate
correctly over arbitrary ranges, which is trickier than it might seem
using the existing random() function. The main advantages are:

1. Support for arbitrary bounds (provided that max >= min). A SQL or
PL/pgSQL implementation based on the existing random() function can
suffer from integer overflow if the difference max-min is too large.

2. Uniform results over the full range. It's easy to overlook the fact
that in a naive implementation doing something like
"((max-min)*random()+min)::int", the endpoint values will be half as
likely as any other value, since casting to integer rounds to nearest.

3. Makes better use of the underlying PRNG, not limited to the 52-bits
of double precision values.

4. Simpler and more efficient generation of random numeric values.
This is something I have commonly wanted in the past, and have usually
resorted to hacks involving multiple calls to random() to build
strings of digits, which is horribly slow, and messy.
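
To make points 1 and 2 above concrete, this is the kind of usage it enables
(the results are random, so the values shown in the comments are only
illustrative):

    -- with the patch applied:
    SELECT random(1, 10);                    -- e.g. 7, uniform over 1..10 inclusive
    SELECT random(-5000000000, 5000000000);  -- bigint range, no overflow trouble
    SELECT random(1.000, 2.000);             -- numeric with scale 3, e.g. 1.537

    -- the naive double-precision approach from point 2, for comparison;
    -- here the endpoints 1 and 10 show up only about half as often as the
    -- other values, because the cast rounds to nearest:
    SELECT ((10 - 1) * random() + 1)::int;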

The implementation moves the existing random functions to a new source
file, so the new functions all share a common PRNG state with the
existing random functions, and that state is kept private to that
file.

Regards,
Dean
From 0b7015668387c337114adb4b3c24fe1d8053bf9c Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 25 Aug 2023 10:42:38 +0100
Subject: [PATCH v1] Add random-number-in-range functions.

This adds 3 functions:

random(min int, max int) returns int
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number in the range [min, max].

In the numeric case, the result scale is Max(scale(min), scale(max)).
---
 doc/src/sgml/func.sgml|  39 ++-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/float.c |  95 --
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/numeric.c   | 219 +
 src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
 src/common/pg_prng.c  |  36 +++
 src/include/catalog/pg_proc.dat   |  12 +
 src/include/common/pg_prng.h  |   1 +
 src/include/utils/numeric.h   |   4 +
 src/test/regress/expected/random.out  | 360 ++
 src/test/regress/sql/random.sql   | 164 ++
 12 files changed, 1017 insertions(+), 100 deletions(-)
 create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 20da3ed033..b0b65d81dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

   
 
+  
+   
+
+ random
+
+random ( min integer, max integer )
+integer
+   
+   
+random ( min bigint, max bigint )
+bigint
+   
+   
+random ( min numeric, max numeric )
+numeric
+   
+   
+Return a random value in the range
+min <= x <= max.
+   
+   
+random(1, 10)
+7
+   
+   
+random(-0.499, 0.499)
+0.347
+   
+  
+
   

 
@@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 
   
-   The random() function uses a deterministic
-   pseudo-random number generator.
+   The random functions listed in 
+   use a deterministic pseudo-random number generator.
It is fast but not suitable for cryptographic
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to these random functions in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
-   session, the first random() call obtains a seed
+   session, the first call to any of these random functions obtains a seed
from a platform-dependent source of random bits.
-   These remarks hold equally for random_normal().
   
 
   
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 199eae525d..610ccf2f79 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -82,6 +82,7 @@ OBJS = \
 	pg_lsn.o \
 	pg_upgrade_support.o \

Re: Eager page freeze criteria clarification

2023-12-21 Thread Joe Conway

On 12/21/23 10:56, Melanie Plageman wrote:

On Sat, Dec 9, 2023 at 9:24 AM Joe Conway  wrote:

However, even if we assume a more-or-less normal distribution, we should
consider using subgroups in a way similar to Statistical Process
Control[1]. The reasoning is explained in this quote:

 The Math Behind Subgroup Size

 The Central Limit Theorem (CLT) plays a pivotal role here. According
 to CLT, as the subgroup size (n) increases, the distribution of the
 sample means will approximate a normal distribution, regardless of
 the shape of the population distribution. Therefore, as your
 subgroup size increases, your control chart limits will narrow,
 making the chart more sensitive to special cause variation and more
 prone to false alarms.


I haven't read anything about statistical process control until you
mentioned this. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better. But, it seems like this may not be the assumption in
statistical process control?

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be.  The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (each time we modify a page that was previously
all-visible)?


Basically, yes.

It might not make sense, but I think we could test the theory by 
plotting a histogram of the raw data, and then also plot a histogram 
based on sub-grouping every 5 sequential values in your accumulator.


If the former does not look very normal (I would guess for most workloads it 
will be skewed with a long tail) and the latter looks to be more normal, 
then it would say we were on the right track.


There are statistical tests for "normalness" that could be applied too 
( e.g. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/#sec2-13title ) 
which would be a more rigorous approach, but a quick look at the histograms 
might be sufficiently convincing.
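
To be concrete, here is a rough SQL sketch of the sub-grouping idea, assuming 
(hypothetically) that the raw measurements were loaded into a table 
unset_samples(sample_no int, value float8); the histogram of the raw values 
is the same query minus the sub-grouping step:

    -- means of subgroups of 5 sequential samples, bucketed into 20 bins
    WITH subgroups AS (
        SELECT avg(value) AS mean
        FROM unset_samples
        GROUP BY (sample_no - 1) / 5
    ),
    bounds AS (
        SELECT min(mean) AS lo, max(mean) AS hi FROM subgroups
    )
    SELECT width_bucket(mean, lo, hi, 20) AS bucket,
           count(*) AS n,
           repeat('*', count(*)::int) AS bar   -- poor man's plot
    FROM subgroups, bounds
    GROUP BY bucket
    ORDER BY bucket;

(The maximum value lands in the overflow bucket 21, which is fine for 
eyeballing the shape.)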


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 11:00:34 -0500, Robert Haas wrote:
> On Thu, Dec 21, 2023 at 10:33 AM Tomas Vondra
>  wrote:
> > > I continue to think that we should not have split plain and index only scans
> > > into separate files...
> >
> > I do agree with that opinion. Not just because of this prefetching
> > thread, but also because of the discussions about index-only filters in
> > a nearby thread.
> 
> For the record, in the original patch I submitted for this feature, it
> wasn't in separate files. If memory serves, Tom changed it.
> 
> So don't blame me. :-)

But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:33 AM Tomas Vondra
 wrote:
> > I continue to think that we should not have split plain and index only scans
> > into separate files...
>
> I do agree with that opinion. Not just because of this prefetching
> thread, but also because of the discussions about index-only filters in
> a nearby thread.

For the record, in the original patch I submitted for this feature, it
wasn't in separate files. If memory serves, Tom changed it.

So don't blame me. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 10:46:02 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
> >> We could make it version-specific,
> >> https://www.postgresql.org/docs/17/installation.html
> >> and task src/tools/version_stamp.pl with updating it.  But that's
> >> problematic for not-yet-released branches (there's no 17 today
> >> for example).
> 
> > Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
> > off?
> 
> Hmm, maybe, but then there's a moving part in version-stamping that's not
> accessible to the average committer.  On the other hand, it wouldn't
> be too awful if that redirect didn't get updated instantly after a
> branch.  This is probably a point where we need advice from the web
> team about how they manage documentation branches on the site.

IIRC the relevant part of the website code has access to a table of
documentation versions, so the redirect could be implemented based on not
knowing the version.


> >> Perhaps we can use /devel/ in the master branch
> >> and try to remember to replace that with a version number as soon
> >> as a release branch is forked off --- but does the docs website
> >> get populated as soon as the branch is made?
> 
> > I think it runs a few times a day - breaking the link for a few hours wouldn't
> > be optimal, but also not the end of the world. But redirecting $vnext -> to
> > devel would probably be a more reliable approach.
> 
> Let's go with "/devel/ in master and a number in release branches"
> for now, and tweak that if the web team wants to take on maintaining
> a redirect.  I'll put together a concrete patch proposal in a little
> bit.

Cool.

Greetings,

Andres Freund




Re: Eager page freeze criteria clarification

2023-12-21 Thread Melanie Plageman
On Sat, Dec 9, 2023 at 9:24 AM Joe Conway  wrote:
>
> On 12/8/23 23:11, Melanie Plageman wrote:
> >
> > I'd be delighted to receive any feedback, ideas, questions, or review.
>
>
> This is well thought out, well described, and a fantastic improvement in
> my view -- well done!

Thanks, Joe! That means a lot! I see work done by hackers on the
mailing list a lot that makes me think, "hey, that's
cool/clever/awesome!" but I don't give that feedback. I appreciate you
doing that!

> I do think we will need to consider distributions other than normal, but
> I don't know offhand what they will be.

Agreed. I plan to test with another distribution. Though, the exercise
of determining which ones are useful is probably more challenging.
I imagine we will have to choose one distribution (as opposed to
supporting different distributions and choosing based on data access
patterns for a table). Though, even with a normal distribution, I
think it should be an improvement.

> However, even if we assume a more-or-less normal distribution, we should
> consider using subgroups in a way similar to Statistical Process
> Control[1]. The reasoning is explained in this quote:
>
>  The Math Behind Subgroup Size
>
>  The Central Limit Theorem (CLT) plays a pivotal role here. According
>  to CLT, as the subgroup size (n) increases, the distribution of the
>  sample means will approximate a normal distribution, regardless of
>  the shape of the population distribution. Therefore, as your
>  subgroup size increases, your control chart limits will narrow,
>  making the chart more sensitive to special cause variation and more
>  prone to false alarms.

I haven't read anything about statistical process control until you
mentioned this. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better. But, it seems like this may not be the assumption in
statistical process control?

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be.  The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (each time we modify a page that was previously
all-visible)?

Thanks for the feedback!

- Melanie




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
>> We could make it version-specific,
>> https://www.postgresql.org/docs/17/installation.html
>> and task src/tools/version_stamp.pl with updating it.  But that's
>> problematic for not-yet-released branches (there's no 17 today
>> for example).

> Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
> off?

Hmm, maybe, but then there's a moving part in version-stamping that's not
accessible to the average committer.  On the other hand, it wouldn't
be too awful if that redirect didn't get updated instantly after a
branch.  This is probably a point where we need advice from the web
team about how they manage documentation branches on the site.

>> Perhaps we can use /devel/ in the master branch
>> and try to remember to replace that with a version number as soon
>> as a release branch is forked off --- but does the docs website
>> get populated as soon as the branch is made?

> I think it runs a few times a day - breaking the link for a few hours wouldn't
> be optimal, but also not the end of the world. But redirecting $vnext -> to
> devel would probably be a more reliable approach.

Let's go with "/devel/ in master and a number in release branches"
for now, and tweak that if the web team wants to take on maintaining
a redirect.  I'll put together a concrete patch proposal in a little
bit.

regards, tom lane




Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 16:20:45 +0100, Tomas Vondra wrote:
> On 12/21/23 14:43, Andres Freund wrote:
> >> AFAICS this seems similar to some of the AIO patch, I wonder what that
> >> plans to do. I need to check.
> > 
> > Yes, most of this exists there.  The difference is that with the AIO you don't
> > need to prefetch, as you can just initiate the IO for real, and wait for it to
> > complete.
> > 
> 
> Right, although the line where things stop being "prefetch" and become
> "async" seems a bit unclear to me / perhaps more a point of view.

Agreed. What I meant with not needing prefetching was that you'd not use
fadvise(), because it's better to instead just asynchronously read data into
shared buffers. That way you don't have the doubling of syscalls and you
need to care less about the buffering rate in the kernel.

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 14:27, Andres Freund wrote:
> Hi,
> 
> On 2023-12-09 19:08:20 +0100, Tomas Vondra wrote:
>> But there's a layering problem that I don't know how to solve - I don't
>> see how we could make indexam.c entirely oblivious to the prefetching,
>> and move it entirely to the executor. Because how else would you know
>> what to prefetch?
> 
>> With index_getnext_tid() I can imagine fetching TIDs ahead, stashing
>> them into a queue, and prefetching based on that. That's kinda what the
>> patch does, except that it does it from inside index_getnext_tid(). But
>> that does not work for index_getnext_slot(), because that already reads
>> the heap tuples.
> 
>> We could say prefetching only works for index_getnext_tid(), but that
>> seems a bit weird because that's what regular index scans do. (There's a
>> patch to evaluate filters on index, which switches index scans to
>> index_getnext_tid(), so that'd make prefetching work too, but I'd ignore
>> that here.
> 
> I think we should just switch plain index scans to index_getnext_tid(). It's
> one of the primary places triggering index scans, so a few additional lines
> don't seem problematic.
> 
> I continue to think that we should not have split plain and index only scans
> into separate files...
> 

I do agree with that opinion. Not just because of this prefetching
thread, but also because of the discussions about index-only filters in
a nearby thread.

> 
>> There are other index_getnext_slot() callers, and I don't
>> think we should accept that it does not work for those places - that seems wrong (e.g.
>> execIndexing/execReplication would benefit from prefetching, I think).
> 
> I don't think it'd be a problem to have to opt into supporting
> prefetching. There's plenty places where it doesn't really seem likely to be
> useful, e.g. doing prefetching during syscache lookups is very likely just a
> waste of time.
> 
> I don't think e.g. execReplication is likely to benefit from prefetching -
> you're just fetching a single row after all. You'd need a lot of dead rows to
> make it beneficial.  I think it's similar in execIndexing.c.
> 

Yeah, systable scans are unlikely to benefit from prefetching of this
type. I'm not sure about execIndexing/execReplication, it wasn't clear
to me but maybe you're right.

> 
> I suspect we should work on providing executor nodes with some estimates about
> the number of rows that are likely to be consumed. If an index scan is under a
> LIMIT 1, we shouldn't prefetch. Similar for sequential scan with the
> infrastructure in
> https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
> 

Isn't this mostly addressed by the incremental ramp-up at the beginning?
Even with target set to 1000, we only start prefetching 1, 2, 3, ...
blocks ahead, it's not like we'll prefetch 1000 blocks right away.

I did initially plan to also consider the number of rows we're expected
to need, but I think it's actually harder than it might seem. With LIMIT
for example we often don't know how selective the qual is, it's not like
we can just stop prefetching after the reading the first N tids. With
other nodes it's good to remember those are just estimates - it'd be
silly to be bitten both by a wrong estimate and also prefetching doing
the wrong thing based on an estimate.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
> I think the only real question is what URL to point at exactly.  We can't
> simply say
> 
> https://www.postgresql.org/docs/current/installation.html
> 
> because that will be wrong for any version more than one major
> release back.

Right.


> We could make it version-specific,
> 
> https://www.postgresql.org/docs/17/installation.html
> 
> and task src/tools/version_stamp.pl with updating it.  But that's
> problematic for not-yet-released branches (there's no 17 today
> for example).

Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
off?


> Perhaps we can use /devel/ in the master branch
> and try to remember to replace that with a version number as soon
> as a release branch is forked off --- but does the docs website
> get populated as soon as the branch is made?

I think it runs a few times a day - breaking the link for a few hours wouldn't
be optimal, but also not the end of the world. But redirecting $vnext -> to
devel would probably be a more reliable approach.

Greetings,

Andres Freund




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
> > We clearly can't just expose the numerical value for a C enum. So it has to 
> > be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much better to
simply populate the text version in pg_get_replication_slots().
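
Just to illustrate the shape I mean - the column name and the example strings
below are made up, nothing here is committed syntax:

    -- NULL would mean "not conflicting"; otherwise the reason, already
    -- spelled out as text by pg_get_replication_slots() itself:
    SELECT slot_name, conflict_reason
    FROM pg_replication_slots
    WHERE conflict_reason IS NOT NULL;

    --  slot_name | conflict_reason
    -- -----------+-----------------
    --  sub_a     | wal_removed
    --  sub_b     | rows_removed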




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 08:39:26 +0900, Michael Paquier wrote:
>> On Wed, Dec 20, 2023 at 11:36:28AM -0500, Tom Lane wrote:
>>> I thought the plan was to get rid of that file, in pursuit of making
>>> our distribution tarballs be more or less pure git pulls.  Instead of
>>> expending more effort on it, why not just push that project forward?

> Ah, right.  I don't really care what solution we go for, just that as long as
> we have INSTALL, we should make sure we don't regularly break it... Both
> Michael and I have in the last couple weeks.

So let's just do it.  I think the only real question is what URL
to point at exactly.  We can't simply say

https://www.postgresql.org/docs/current/installation.html

because that will be wrong for any version more than one major
release back.  We could make it version-specific,

https://www.postgresql.org/docs/17/installation.html

and task src/tools/version_stamp.pl with updating it.  But that's
problematic for not-yet-released branches (there's no 17 today
for example).  Perhaps we can use /devel/ in the master branch
and try to remember to replace that with a version number as soon
as a release branch is forked off --- but does the docs website
get populated as soon as the branch is made?

regards, tom lane




Re: Set log_lock_waits=on by default

2023-12-21 Thread Frédéric Yhuel




Le 21/12/2023 à 14:29, Laurenz Albe a écrit :

Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.


I think it's a really good idea. At Dalibo, we advise our customers to 
switch it on. AFAICT, it's never been a problem.


Best regards,
Frédéric





Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 14:43, Andres Freund wrote:
> Hi,
> 
> On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
>> You're right a lot of this is a guesswork. I don't think we can do much
>> better, because it depends on stuff that's out of our control - each OS
>> may do things differently, or perhaps it's just configured differently.
>>
>> But I don't think this is really a serious issue - all the read-ahead
>> implementations need to work about the same, because they are meant to
>> work in a transparent way.
>>
>> So it's about deciding at which point we think this is a sequential
>> pattern. Yes, the OS may use a slightly different threshold, but the
>> exact value does not really matter - in the worst case we prefetch a
>> couple more/fewer blocks.
>>
>> The OS read-ahead can't really prefetch anything except sequential
>> cases, so the whole question is "When does the access pattern get
>> sequential enough?". I don't think there's a perfect answer, and I don't
>> think we need a perfect one - we just need to be reasonably close.
> 
> For the streaming read interface (initially backed by fadvise, to then be
> replaced by AIO) we found that it's clearly necessary to avoid fadvises in
> cases of actual sequential IO - the overhead otherwise leads to easily
> reproducible regressions.  So I don't think we have much choice.
> 

Yeah, the regressions are pretty easy to demonstrate. In fact, I didn't
have such detection in the first patch, but after the first round of
benchmarks it became obvious it's needed.

> 
>> Also, while I don't want to lazily dismiss valid cases that might be
>> affected by this, I think that sequential access for index paths is not
>> that common (with the exception of clustered indexes).
> 
> I think sequential access is common in other cases as well. There's lots of
> indexes where heap tids are almost perfectly correlated with index entries,
> consider insert-only tables and serial PKs or inserted_at
> timestamp columns.  Even leaving those aside, for indexes with many entries
> for the same key, we sort by tid these days, which will also result in
> "runs" of sequential access.
> 

True. I should have thought about those cases.

> 
>> Obviously, the latter case has much more severe impact, but it depends
>> on the exact workload / access pattern etc. The only "perfect" solution
>> would be to actually check the page cache, but well - that seems to be
>> fairly expensive.
> 
>> What I was envisioning was something self-tuning, based on the I/O we
>> may do later. If the prefetcher decides to prefetch something, but finds
>> it's already in cache, we'd increase the distance, to remember more
>> blocks. Likewise, if a block is not prefetched but then requires I/O
>> later, decrease the distance. That'd make it adaptive, but I don't think
>> we actually have the info about I/O.
> 
> How would the prefetcher know that the data wasn't in cache?
> 

I don't think there's a good way to do that, unfortunately, or at least
I'm not aware of it. That's what I meant by "we don't have the info" at
the end. Which is why I haven't tried implementing it.

The only "solution" I could come up with was some sort of "timing" for
the I/O requests and deducing what was cached. Not great, of course.

> 
>> Alternatively, I was thinking about moving the prefetches into a
>> separate worker process (or multiple workers), so we'd just queue the
>> request and all the overhead would be done by the worker. The main
>> problem is the overhead of calling posix_fadvise() for blocks that are
>> already in memory, and this would just move it to a separate backend. I
>> wonder if that might even make the custom cache unnecessary / optional.
> 
> The AIO patchset provides this.
> 

OK, I guess it's time for me to take a look at the patch again.

> 
>> AFAICS this seems similar to some of the AIO patch, I wonder what that
>> plans to do. I need to check.
> 
> Yes, most of this exists there.  The difference is that with the AIO you don't
> need to prefetch, as you can just initiate the IO for real, and wait for it to
> complete.
> 

Right, although the line where things stop being "prefetch" and become
"async" seems a bit unclear to me / perhaps more a point of view.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: trying again to get incremental backup

2023-12-21 Thread Alexander Lakhin

21.12.2023 15:07, Robert Haas wrote:

On Wed, Dec 20, 2023 at 11:00 PM Alexander Lakhin  wrote:

I've found several typos/inconsistencies introduced with 174c48050 and
dc2123400. Maybe you would want to fix them, while on it?:

That's an impressively long list of mistakes in something I thought
I'd been careful about. Sigh.

I don't suppose you could provide these corrections in the form of a
patch? I don't really want to run these sed commands across the entire
tree and then try to figure out what's what...


Please look at the attached patch; it corrects all 29 items ("recods"
fixed in two places), but maybe you find some substitutions wrong...

I've also observed that those commits introduced new warnings:
$ CC=gcc-12 CPPFLAGS="-Wtype-limits" ./configure -q && make -s -j8
reconstruct.c: In function ‘read_bytes’:
reconstruct.c:511:24: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  511 | if (rb < 0)
  |    ^
reconstruct.c: In function ‘write_reconstructed_file’:
reconstruct.c:650:40: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  650 | if (rb < 0)
  |    ^
reconstruct.c:662:32: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  662 | if (wb < 0)

There are also two deadcode.DeadStores complaints from clang. First one is
about:
    /*
 * Align the wait time to prevent drift. This doesn't really matter,
 * but we'd like the warnings about how long we've been waiting to say
 * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
 * drifting to something that is not a multiple of ten.
 */
    timeout_in_ms -=
    TimestampDifferenceMilliseconds(current_time, initial_time) %
    timeout_in_ms;
It looks like this timeout is really not used.

And the minor one (similar to many existing, maybe doesn't deserve fixing):
walsummarizer.c:808:5: warning: Value stored to 'summary_end_lsn' is never read 
[deadcode.DeadStores]
    summary_end_lsn = 
private_data->read_upto;
    ^ ~~~


Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
comment above redo_pointer_at_last_summary_removal declaration, but
perhaps it should instead say something about removing summaries?

Wow, yeah. Thanks, will fix.


Thank you for paying attention to it!

Best regards,
Alexanderdiff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 7c183a5cfd..e411ddbf45 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -213,7 +213,7 @@ PostgreSQL documentation
 
  
   -i old_manifest_file
-  --incremental=old_meanifest_file
+  --incremental=old_manifest_file
   

 Performs an incremental
diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
index e1cb31607e..8a0a600c2b 100644
--- a/doc/src/sgml/ref/pg_combinebackup.sgml
+++ b/doc/src/sgml/ref/pg_combinebackup.sgml
@@ -83,7 +83,7 @@ PostgreSQL documentation
   

 The -n/--dry-run option instructs
-pg_cominebackup to figure out what would be done
+pg_combinebackup to figure out what would be done
 without actually creating the target directory or any output files.
 It is particularly useful in combination with --debug.

diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 1e5a5ac33a..42bbe564e2 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -158,7 +158,7 @@ CreateIncrementalBackupInfo(MemoryContext mcxt)
 
 /*
  * Before taking an incremental backup, the caller must supply the backup
- * manifest from a prior backup. Each chunk of manifest data recieved
+ * manifest from a prior backup. Each chunk of manifest data received
  * from the client should be passed to this function.
  */
 void
@@ -462,7 +462,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 			++deadcycles;
 
 		/*
-		 * If we've managed to wait for an entire minute withot the WAL
+		 * If we've managed to wait for an entire minute without the WAL
 		 * summarizer absorbing a single WAL record, error out; probably
 		 * something is wrong.
 		 *
@@ -473,7 +473,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 		 * likely to catch a reasonable number of the things that can go wrong
 		 * in practice (e.g. the summarizer process is completely hung, say
 		 * because somebody hooked up a debugger to it or something) without
-		 * giving up too quickly when the sytem is just slow.
+		 * giving up too quickly when the system is just slow.
 		 */
 		if (deadcycles >= 6)
 			

Re: Set log_lock_waits=on by default

2023-12-21 Thread Nikolay Samokhvalov
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe  wrote:

> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.


I think it's a very good idea. On all heavily loaded systems I have
observed so far, we have always turned it on. 1s (the default deadlock_timeout)
is quite a large value for web/mobile apps, meaning that the default frequency of
logging is quite low, so any potential suffering from the observer effect
doesn't happen -- saturation of the number of active sessions happens much,
much earlier, even if you have very slow disk IO for logging.

At the same time, I like Robert's idea to separate the logging of lock
waits from the deadlock_timeout logic -- the current implementation is quite
confusing for new users. I have also had cases where people wanted to log lock
waits earlier than deadlock detection. Also, lock wait logging almost always
lacks information about the blocking session (its state and last query, first
of all), but that is maybe off topic and worth a separate improvement effort.

Nik


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Bertrand Drouvot
Hi,

On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
> > I'm not entirely sure I understand the difference - just whether we add one
> > new column or replace the existing 'conflicting' column? I can see arguments
> > for either.
> >
> 
> Agreed. I think the argument against replacing the existing
> 'conflicting' column is that there is a chance that it is being used
> by some monitoring script which I guess shouldn't be a big deal to
> change. So, if we don't see that as a problem, I would prefer to have
> a single column with conflict reason where one of its values indicates
> there is no conflict.

+1

> > A conflicting column where NULL indicates no conflict, and other
> > values indicate the reason for the conflict, doesn't seem too bad.
> >
> 
> This is fine too.

+1

> >
> > > And if we plan to return/display cause from either function or view,
> > > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > > description/text corresponding to enum?
> >
> > We clearly can't just expose the numerical value for a C enum. So it has to 
> > be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

Yeah, and in the sync slot related work we could use pg_get_replication_slots()
then to get the enum.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: GIN-Indexable JSON Patterns

2023-12-21 Thread David E. Wheeler
On Dec 17, 2023, at 13:10, David E. Wheeler  wrote:

> Quick follow-up to my slew of questions back in [September][1]. I wanted to 
> update [my patch][2] to note that only JSON Path equality operators are 
> supported by indexes, as [previously discussed][3].
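
(For anyone skimming the thread, the distinction in question is roughly the
following; the table and data here are made up for illustration:)

    CREATE TABLE docs (body jsonb);
    CREATE INDEX ON docs USING gin (body jsonb_path_ops);

    -- can use the GIN index: the jsonpath predicate is an equality
    SELECT * FROM docs WHERE body @? '$.tags[*] ? (@ == "postgres")';

    -- not helped by the index: a non-equality comparison cannot be
    -- turned into index search keys
    SELECT * FROM docs WHERE body @? '$.price ? (@ > 10)';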

Should I just add it to the patch and let the reviews fall where they may? :-)

Best,

David

 [1]: 
https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com
 [2]: https://commitfest.postgresql.org/45/4624/
 [3]: 
https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl





Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Amit Kapila
On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either.
>

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

> A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>

This is fine too.

>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

-- 
With Regards,
Amit Kapila.




Re: Set log_lock_waits=on by default

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe  wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.

I think it depends somewhat on the lock type, and also on your
threshold for what constitutes a problem. For example, you can wait
for 1 second for a relation extension lock pretty easily, I think,
just because the I/O system is busy. Or I think also a VXID lock held
by some transaction that has a tuple locked could be not particularly
exciting. A conflict on a relation lock seems more likely to represent
a real issue, but I guess it's all kind of a judgement call. A second
isn't really all that long on an overloaded system, and I see an awful
lot of overloaded systems (because those are the people who call me).

Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
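
To spell that out, the first block below is what we have today, and the second
is only the hypothetical shape of the idea, not working syntax:

    -- today: deadlock_timeout doubles as the logging threshold
    ALTER SYSTEM SET log_lock_waits = on;
    ALTER SYSTEM SET deadlock_timeout = '1s';
    SELECT pg_reload_conf();

    -- hypothetical, if log_lock_waits became time-valued (not implemented):
    -- ALTER SYSTEM SET log_lock_waits = '10s';  -- would have to be >= deadlock_timeout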

In general, I do kind of like the idea of trying to log more problem
situations by default, so that when someone has a major issue, you
don't have to start by having them change all the logging settings and
then wait until they get hosed a second time before you can
troubleshoot anything. I'm just concerned that 1s might be too
sensitive for a lot of users who aren't as, let's say, diligent about
keeping the system healthy as you probably are.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences, take 2

2023-12-21 Thread Tomas Vondra
On 12/15/23 03:33, Amit Kapila wrote:
> On Thu, Dec 14, 2023 at 9:14 PM Ashutosh Bapat
>  wrote:
>>
>> On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila  wrote:
>>>
>>> It can only be cleaned if we process it but xact_decode won't allow us
>>> to process it and I don't think it would be a good idea to add another
>>> hack for sequences here. See below code:
>>>
>>> xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
>>> {
>>> SnapBuild  *builder = ctx->snapshot_builder;
>>> ReorderBuffer *reorder = ctx->reorder;
>>> XLogReaderState *r = buf->record;
>>> uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
>>>
>>> /*
>>> * If the snapshot isn't yet fully built, we cannot decode anything, so
>>> * bail out.
>>> */
>>> if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
>>> return;
>>
>> That may be true for a transaction which is decoded, but I think all
>> the transactions which are added to ReorderBuffer should be cleaned up
>> once they have been processed irrespective of whether they are
>> decoded/sent downstream or not. In this case I see the sequence hash
>> being cleaned up for the sequence related transaction in Hayato's
>> reproducer.
>>
> 
> It was because the test you are using was not designed to show the
> problem I mentioned. In this case, the rollback was after a full
> snapshot state was reached.
> 

Right, I haven't tried to reproduce this, but it very much looks like
the entry would not be removed if the xact aborts/commits before the
snapshot reaches FULL state.

I suppose one way to deal with this would be to first check if an entry
for the same relfilenode exists. If it does, the original transaction
must have terminated, but we haven't cleaned it up yet - in which case
we can just "move" the relfilenode to the new one.

However, can't that happen even with full snapshots? I mean, let's say a
transaction creates a relfilenode and terminates without writing an
abort record (surely that's possible, right?). And then another xact
comes and generates the same relfilenode (presumably that's unlikely,
but perhaps possible?). Aren't we in pretty much the same situation,
until the next RUNNING_XACTS cleans up the hash table?


I think tracking all relfilenodes would fix the original issue (with
treating some changes as transactional), and the tweak that "moves" the
relfilenode to the new xact would fix this other issue too.

That being said, I feel a bit uneasy about it, for similar reasons as
Amit. If we start processing records before full snapshot, that seems
like moving the assumptions a bit. For example it means we'd create
ReorderBufferTXN entries for cases that'd have skipped before. OTOH this
is (or should be) only a very temporary period while starting the
replication, I believe.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: "pgoutput" options missing on documentation

2023-12-21 Thread Emre Hasegeli
> But the xref seems present only in the master/v16/v15 patches, but not
> for the earlier patches v14/v13/v12. Why not?

I missed it.

> But the change was only in the patches v14 onwards. Although the new
> error message was only added for HEAD, isn't it still correct to say
> "A valid version is required." for all the patches including v12 and
> v13?

Yes, it's still correct.

Fixed versions are attached.


v12-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


v14-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


v13-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
> You're right a lot of this is a guesswork. I don't think we can do much
> better, because it depends on stuff that's out of our control - each OS
> may do things differently, or perhaps it's just configured differently.
> 
> But I don't think this is really a serious issue - all the read-ahead
> implementations need to work about the same, because they are meant to
> work in a transparent way.
> 
> So it's about deciding at which point we think this is a sequential
> pattern. Yes, the OS may use a slightly different threshold, but the
> exact value does not really matter - in the worst case we prefetch a
> couple more/fewer blocks.
> 
> The OS read-ahead can't really prefetch anything except sequential
> cases, so the whole question is "When does the access pattern get
> sequential enough?". I don't think there's a perfect answer, and I don't
> think we need a perfect one - we just need to be reasonably close.

For the streaming read interface (initially backed by fadvise, to then be
replaced by AIO) we found that it's clearly necessary to avoid fadvises in
cases of actual sequential IO - the overhead otherwise leads to easily
reproducible regressions.  So I don't think we have much choice.


> Also, while I don't want to lazily dismiss valid cases that might be
> affected by this, I think that sequential access for index paths is not
> that common (with the exception of clustered indexes).

I think sequential access is common in other cases as well. There's lots of
indexes where heap tids are almost perfectly correlated with index entries,
consider insert-only tables and serial PKs or inserted_at
timestamp columns.  Even leaving those aside, for indexes with many entries
for the same key, we sort by tid these days, which will also result in
"runs" of sequential access.


> Obviously, the latter case has much more severe impact, but it depends
> on the exact workload / access pattern etc. The only "perfect" solution
> would be to actually check the page cache, but well - that seems to be
> fairly expensive.

> What I was envisioning was something self-tuning, based on the I/O we
> may do later. If the prefetcher decides to prefetch something, but finds
> it's already in cache, we'd increase the distance, to remember more
> blocks. Likewise, if a block is not prefetched but then requires I/O
> later, decrease the distance. That'd make it adaptive, but I don't think
> we actually have the info about I/O.

How would the prefetcher know that the data wasn't in cache?


> Alternatively, I was thinking about moving the prefetches into a
> separate worker process (or multiple workers), so we'd just queue the
> request and all the overhead would be done by the worker. The main
> problem is the overhead of calling posix_fadvise() for blocks that are
> already in memory, and this would just move it to a separate backend. I
> wonder if that might even make the custom cache unnecessary / optional.

The AIO patchset provides this.


> AFAICS this seems similar to some of the AIO patch, I wonder what that
> plans to do. I need to check.

Yes, most of this exists there.  The difference that with the AIO you don't
need to prefetch, as you can just initiate the IO for real, and wait for it to
complete.

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-09 19:08:20 +0100, Tomas Vondra wrote:
> But there's a layering problem that I don't know how to solve - I don't
> see how we could make indexam.c entirely oblivious to the prefetching,
> and move it entirely to the executor. Because how else would you know
> what to prefetch?

> With index_getnext_tid() I can imagine fetching TIDs ahead, stashing
> them into a queue, and prefetching based on that. That's kinda what the
> patch does, except that it does it from inside index_getnext_tid(). But
> that does not work for index_getnext_slot(), because that already reads
> the heap tuples.

> We could say prefetching only works for index_getnext_tid(), but that
> seems a bit weird because that's what regular index scans do. (There's a
> patch to evaluate filters on index, which switches index scans to
> index_getnext_tid(), so that'd make prefetching work too, but I'd ignore
> that here.

I think we should just switch plain index scans to index_getnext_tid(). It's
one of the primary places triggering index scans, so a few additional lines
don't seem problematic.

I continue to think that we should not have split plain and index only scans
into separate files...


> There are other index_getnext_slot() callers, and I don't
> think we should accept that it does not work for those places - that seems wrong (e.g.
> execIndexing/execReplication would benefit from prefetching, I think).

I don't think it'd be a problem to have to opt into supporting
prefetching. There's plenty places where it doesn't really seem likely to be
useful, e.g. doing prefetching during syscache lookups is very likely just a
waste of time.

I don't think e.g. execReplication is likely to benefit from prefetching -
you're just fetching a single row after all. You'd need a lot of dead rows to
make it beneficial.  I think it's similar in execIndexing.c.


I suspect we should work on providing executor nodes with some estimates about
the number of rows that are likely to be consumed. If an index scan is under a
LIMIT 1, we shouldn't prefetch. Similar for sequential scan with the
infrastructure in
https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

Greetings,

Andres Freund




Set log_lock_waits=on by default

2023-12-21 Thread Laurenz Albe
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.

Yours,
Laurenz Albe
From a767e69c724fbbff14114729272be5d29c3d69d8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 21 Dec 2023 14:24:00 +0100
Subject: [PATCH v1] Default to log_lock_waits=on

If someone is stuck behind a lock for more than a second,
that is almost always a problem that is worth a log entry.
---
 doc/src/sgml/config.sgml  | 2 +-
 src/backend/storage/lmgr/proc.c   | 2 +-
 src/backend/utils/misc/guc_tables.c   | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 44cada2b40..f96ec5a72e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7383,7 +7383,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 Controls whether a log message is produced when a session waits
 longer than  to acquire a
 lock.  This is useful in determining if lock waits are causing
-poor performance.  The default is off.
+poor performance.  The default is on.
 Only superusers and users with the appropriate SET
 privilege can change this setting.

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b6451d9d08..4f25cebbd8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -60,7 +60,7 @@ int			StatementTimeout = 0;
 int			LockTimeout = 0;
 int			IdleInTransactionSessionTimeout = 0;
 int			IdleSessionTimeout = 0;
-bool		log_lock_waits = false;
+bool		log_lock_waits = true;
 
 /* Pointer to this process's PGPROC struct, if any */
 PGPROC	   *MyProc = NULL;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..1f3d56e6ee 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1519,7 +1519,7 @@ struct config_bool ConfigureNamesBool[] =
 			NULL
 		},
 		_lock_waits,
-		false,
+		true,
 		NULL, NULL, NULL
 	},
 	{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..451e09d3d8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -586,7 +586,7 @@
 	#processes
 	#   %% = '%'
 	# e.g. '<%u%%%d> '
-#log_lock_waits = off			# log lock waits >= deadlock_timeout
+#log_lock_waits = on			# log lock waits >= deadlock_timeout
 #log_recovery_conflict_waits = off	# log standby recovery conflict waits
 	# >= deadlock_timeout
 #log_parameter_max_length = -1		# when logging statements, limit logged
-- 
2.43.0



Re: GUC names in messages

2023-12-21 Thread Peter Eisentraut

On 21.12.23 07:24, Peter Smith wrote:

#1. GUC name quoting.

Some basic guidelines were decided and a patch is already pushed [1].


 In messages containing configuration variable names, do not include quotes
 when the names are visibly not natural English words, such as when they
 have underscores, are all-uppercase or have mixed case. Otherwise, quotes
 must be added. Do include quotes in a message where an arbitrary variable
 name is to be expanded.


AFAIK there is nothing controversial there, although maybe the
guideline for 'mixed case' needs revisiting depending on objections
about point #2.


Now that I read this again, I think this is wrong.

We should decide the quoting for a category, not the actual content. 
Like, quote all file names; do not quote keywords.


This led to the attempted patch to decide the quoting of GUC parameter 
names dynamically based on the actual content, which no one really 
liked.  But then, to preserve consistency, we also need to be uniform in 
quoting GUC parameter names where the name is hardcoded.






Re: logical decoding and replication of sequences, take 2

2023-12-21 Thread Tomas Vondra
On 12/19/23 13:54, Christophe Pettus wrote:
> Hi,
> 
> I wanted to hop in here on one particular issue:
> 
>> On Dec 12, 2023, at 02:01, Tomas Vondra  
>> wrote:
>> - desirability of the feature: Random IDs (UUIDs etc.) are likely a much
>> better solution for distributed (esp. active-active) systems. But there
>> are important use cases that are likely to keep using regular sequences
>> (online upgrades of single-node instances, existing systems, ...).
> 
> +1.
> 
> Right now, the lack of sequence replication is a rather large 
> foot-gun on logical replication upgrades.  Copying the sequences
> over during the cutover period is doable, of course, but:
> 
> (a) There's no out-of-the-box tooling that does it, so everyone has
> to write some scripts just for that one function.
>
> (b) It's one more thing that extends the cutover window.
> 

I agree it's an annoying gap for this use case. But if this is the only
use case, maybe a better solution would be to provide such tooling
instead of adding it to the logical decoding?

It might seem a bit strange if most data is copied by replication
directly, while sequences need special handling, ofc.
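
The tooling people write today is typically a few lines of SQL run on the
publisher at cutover, roughly like this sketch (pg_sequences is the real view;
the generated statements are then executed on the subscriber):

    SELECT format('SELECT setval(%L, %s, true);',
                  quote_ident(schemaname) || '.' || quote_ident(sequencename),
                  last_value)
    FROM pg_sequences
    WHERE last_value IS NOT NULL;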

> I don't think it is a good idea to make it mandatory: for example, 
> there's a strong use case for replicating a table but not a sequence 
> associated with it.  But it's definitely a missing feature in
> logical replication.

I don't think the plan was to make replication of sequences mandatory,
certainly not with the built-in replication. If you don't add sequences
to the publication, the sequence changes will be skipped.

But it still needs to be part of the decoding, which adds overhead for
all logical decoding uses, even if the sequence changes end up being
discarded. That's somewhat annoying, especially considering sequences
are a fairly common part of the WAL stream.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-21 Thread Michail Nikolaev
Hello.

Realized my last idea is invalid (because tuples are frozen by using
dynamically calculated horizon) - so, don't waste your time on it :)

Need to think a little bit more here.

Thanks,
Mikhail.




RE: Synchronizing slots from primary to standby

2023-12-21 Thread Hayato Kuroda (Fujitsu)
Dear Shveta,

Thanks for updating the patch! Here is my comments for v52-0002.

~
system-views.sgml

01. 

```
+
+ 
+  
+   sync_state char
+  
+  
+  Defines slot synchronization state. This is meaningful on the physical
+  standby which has configured  = 
true.
+  Possible values are:
+   
+
+ n = none for user created slots,
...
```

Hmm. I'm not sure why we must show a single character to the user. I'm OK with
pg_subscription.srsubstate because it is a "catalog" - the actual value would be
recorded in the heap. But pg_replication_slots is just a view, so we can replace
the internal representation with other strings there, e.g.
pg_replication_slots.wal_status.
How about using {none, initialized, ready} or something?
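
As a rough illustration only (the field and enum names below are invented, not
taken from the patch), the view function could translate the internal state to
text the same way pg_replication_slots.wal_status is produced:

```
/* hypothetical mapping inside pg_get_replication_slots(); names invented */
switch (slot_contents.data.sync_state)
{
	case SYNCSLOT_STATE_NONE:
		values[i++] = CStringGetTextDatum("none");
		break;
	case SYNCSLOT_STATE_INITIATED:
		values[i++] = CStringGetTextDatum("initialized");
		break;
	case SYNCSLOT_STATE_READY:
		values[i++] = CStringGetTextDatum("ready");
		break;
}
```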

~
postmaster.c

02. bgworker_should_start_now

```
+if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
+ pmState != PM_RUN)
+return true;
```

I'm not sure the second condition is really needed. The line will be executed 
when
pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed 
around here?

~
libpqwalreceiver.c

03. PQWalReceiverFunctions

```
+.walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo,
```

Just to confirm - is there a rule for ordering?

~
slotsync.c

04. SlotSyncWorkerCtx

```
typedef struct SlotSyncWorkerCtx
{
pid_t   pid;
slock_t mutex;
} SlotSyncWorkerCtx;

SlotSyncWorkerCtx *SlotSyncWorker = NULL;
```

Per other files like launcher.c, should we use a name like 
"SlotSyncWorkerCtxStruct"?

05. SlotSyncWorkerRegister()

Your coding will work, but there is another approach: validate the slotsync
parameters here. In that case, the postmaster would exit ASAP, which can
notify users of wrong settings earlier. Thoughts?

06. wait_for_primary_slot_catchup

```
+CHECK_FOR_INTERRUPTS();
+
+/* Handle any termination request if any */
+ProcessSlotSyncInterrupts(wrconn);
```

ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call.

07. wait_for_primary_slot_catchup

```
+/*
+ * XXX: Is waiting for 2 seconds before retrying enough or more or
+ * less?
+ */
+rc = WaitLatch(MyLatch,
+   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+   2000L,
+   WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
+
+ResetLatch(MyLatch);
+
+/* Emergency bailout if postmaster has died */
+if (rc & WL_POSTMASTER_DEATH)
+proc_exit(1);
```

Are there any reasons not to use the WL_EXIT_ON_PM_DEATH event? If not, you can use it.
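
For reference, a sketch of the simplified wait with WL_EXIT_ON_PM_DEATH (same
timeout and wait event as above; with this flag WaitLatch() exits the process
itself if the postmaster dies, so the explicit WL_POSTMASTER_DEATH check goes
away):

```
/* sketch only */
(void) WaitLatch(MyLatch,
				 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				 2000L,
				 WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);

ResetLatch(MyLatch);
```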

08. synchronize_slots

```
+SpinLockAcquire(&WalRcv->mutex);
+if (!WalRcv ||
+(WalRcv->slotname[0] == '\0') ||
+XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
+{
...
```

Assume that WalRcv is still NULL. In this case, wouldn't the first
SpinLockAcquire() lead to a segmentation fault?
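
To make the concern concrete, a possible rearrangement (sketch only; the local
variable "started" is invented, the fields are as quoted above) would be to
test the pointer before touching the spinlock:

```
/* sketch: never dereference WalRcv before checking it */
if (!WalRcv)
	return;

SpinLockAcquire(&WalRcv->mutex);
started = (WalRcv->slotname[0] != '\0') &&
	!XLogRecPtrIsInvalid(WalRcv->latestWalEnd);
SpinLockRelease(&WalRcv->mutex);

if (!started)
	return;
```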

09. synchronize_slots

```
+elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
```

The query is not a dynamic one, so I think there is no need to print it even in
debug mode.

10. synchronize_one_slot

IIUC, this function can synchronize slots even if the plugin used on the primary
is not installed on the standby server. If the slot is created by the slotsync
worker, users will only notice this after the server is promoted and decoding
starts. I feel this is not a good specification. Can we detect it in the
validation phase?

~
not the source code

11. 

I tested the typical case - promoting a publisher from the diagram below.
A physical replication slot "physical" was specified as standby_slot_names.

```
node A (primary) --> node B (secondary)
|
|
node C (subscriber)
```

And after the promoting, below lines were periodically output on logfiles for
node B and C.

```
WARNING:  replication slot "physical" specified in parameter 
"standby_slot_names" does not exist, ignoring
```

Do you have an idea how to suppress the warning? IIUC it is normal behavior of the
walsender, so we cannot avoid the periodic output.

The steps of the test were as follows:

1. stop the node A via pg_ctl stop
2. promote the node B via pg_ctl promote
3. change the connection string of the subscription via ALTER SUBSCRIPTION ... 
CONNECTION ...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Pavel Stehule
Hi

On Thu, Dec 21, 2023 at 13:37 Ishaan Adarsh 
wrote:

> The recent documentation patches are part of my GSoC 2023 project
> 
> to develop a comprehensive PostgreSQL extension development tutorial, it
> assumes only a basic knowledge of Postgres and the target programming
> language.
>
> The entire project is available on GitHub: Postgres-extension-tutorial
> .
> It covers many topics, including prerequisites, writing extensions,
> creating Makefiles, using procedural languages, incorporating external
> languages, writing regression tests, and managing extension releases. *The 
> patch submitted
> for procedural languages, specifically PL/pgSQL and PL/Python, is part of
> the procedural language section within the broader tutorial. *
>
> Based on the feedback I think there is a real need
>  for this as this
> is a very important and growing part of the Postgres ecosystem. Currently,
> all the extension material is scattered and very limited. There are
> various third-party blog posts focusing on different areas, and sometimes
> contradictory. The main motivation behind making this is to make the barrier
> for entry less prohibitive for new contributors.
>
> I would greatly appreciate your input on how to add it to the existing
> documentation (this is where I have major doubts) and any suggestions on
> how to proceed. If there are areas where the existing documentation is
> already sufficient or if there are ways to improve the overall structure, I
> am open to making adjustments.
>

https://www.postgresql.org/docs/current/plpgsql-development-tips.html and a
new section - deployment or packaging of extensions

I agree that https://www.postgresql.org/docs/current/plpgsql-overview.html is
under-dimensioned, but packaging should not be there

Regards

Pavel


>
> Best,
> Ishaan Adarsh
>
>
> On Thu, Dec 21, 2023 at 4:17 PM Pavel Stehule 
> wrote:
>
>> Hi
>>
>> On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
>> wrote:
>>
>>> On 19.12.23 17:26, Ishaan Adarsh wrote:
>>> > Subject: Clarification on the Purpose of the Patch
>>> >
>>> > Hi Peter,
>>> >
>>> > The intention was to address the challenge faced by newcomers in
>>> > understanding how to write an extension for PostgreSQL. The existing
>>> > documentation, while comprehensive, lacks a consolidated and
>>> > easy-to-follow tutorial that serves as a quick start guide. The goal
>>> was
>>> > to create a beginner-friendly resource that assumes only knowledge of
>>> > Postgres and the target language, making it accessible for new
>>> > contributors because the barrier for entry is prohibitive for new
>>> > contributors. There are various third-party blog posts focusing on
>>> > different areas, and sometimes contradictory.
>>>
>>> Have you seen this:
>>>
>>> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>>>
>>> Maybe that could be extended/modified/simplified?
>>>
>>> > Specifically:
>>> > 1. The new section titled "Quick Start Guide" aims to provide
>>> > step-by-step instructions for users to get started with writing
>>> > extensions in PL/pgSQL and PL/Python.
>>>
>>> What's confusing here is writing an extension in a PL language is not a
>>> normal use case I'd say.  The normal use case involves some C code.
>>>
>>
>>  Extensions were designed for C, but they work well with PL too.
>> Some of my customers use extensions for PLpgSQL and they are almost happy.
>> 1) there is nothing else, 2) it really works
>>
>> I agree with Peter - this topic is not what I imagine under "Quick start
>> guide"
>>
>> Regards
>>
>> Pavel
>>
>


Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 07:49, Dilip Kumar wrote:
> On Wed, Dec 20, 2023 at 7:11 AM Tomas Vondra
>  wrote:
>>
> I was going through to understand the idea, couple of observations
> 
> --
> + for (int i = 0; i < PREFETCH_LRU_SIZE; i++)
> + {
> + entry = >prefetchCache[lru * PREFETCH_LRU_SIZE + i];
> +
> + /* Is this the oldest prefetch request in this LRU? */
> + if (entry->request < oldestRequest)
> + {
> + oldestRequest = entry->request;
> + oldestIndex = i;
> + }
> +
> + /*
> + * If the entry is unused (identified by request being set to 0),
> + * we're done. Notice the field is uint64, so empty entry is
> + * guaranteed to be the oldest one.
> + */
> + if (entry->request == 0)
> + continue;
> 
> If the 'entry->request == 0' then we should break instead of continue, right?
> 

Yes, I think that's true. The small LRU caches are accessed/filled
linearly, so once we find an empty entry, all following entries are
going to be empty too.

I thought this shouldn't make any difference, because the LRUs are very
small (only 8 entries, and I don't think we should make them larger).
And it's going to go away once the cache gets full. But now that I think
about it, maybe this could matter for small queries that only ever hit a
couple rows. Hmmm, I'll have to check.

Thanks for noticing this!
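
For clarity, the loop with that change would look roughly like this (the array
access is abbreviated here; variable names otherwise follow the hunk quoted
above):

for (int i = 0; i < PREFETCH_LRU_SIZE; i++)
{
    entry = &lru_entries[i];    /* stands in for the quoted array access */

    /* Is this the oldest prefetch request in this LRU? */
    if (entry->request < oldestRequest)
    {
        oldestRequest = entry->request;
        oldestIndex = i;
    }

    /*
     * Unused entries (request == 0) are guaranteed to be the oldest ones,
     * and the LRU is filled linearly, so everything after the first empty
     * entry is empty as well -- stop scanning.
     */
    if (entry->request == 0)
        break;
}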

> ---
> /*
>  * Used to detect sequential patterns (and disable prefetching).
>  */
> #define PREFETCH_QUEUE_HISTORY 8
> #define PREFETCH_SEQ_PATTERN_BLOCKS 4
> 
> If for sequential patterns we search only 4 blocks then why we are
> maintaining history for 8 blocks
> 
> ---

Right, I think there's no reason to keep these two separate constants. I
believe this is a remnant from an earlier patch version which tried to
do something smarter, but I ended up abandoning that.

> 
> + *
> + * XXX Perhaps this should be tied to effective_io_concurrency somehow?
> + *
> + * XXX Could it be harmful that we read the queue backwards? Maybe memory
> + * prefetching works better for the forward direction?
> + */
> + for (int i = 1; i < PREFETCH_SEQ_PATTERN_BLOCKS; i++)
> 
> Correct, I think if we fetch this forward it will have an advantage
> with memory prefetching.
> 

OK, although we only really have a couple uint32 values, so it should be
the same cacheline I guess.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Ishaan Adarsh
The recent documentation patches are part of my GSoC 2023 project

to develop a comprehensive PostgreSQL extension development tutorial, it
assumes only a basic knowledge of Postgres and the target programming
language.

The entire project is available on GitHub: Postgres-extension-tutorial
.
It covers many topics, including prerequisites, writing extensions,
creating Makefiles, using procedural languages, incorporating external
languages, writing regression tests, and managing extension releases.
*The patch submitted
for procedural languages, specifically PL/pgSQL and PL/Python, is part of
the procedural language section within the broader tutorial. *

Based on the feedback I think there is a real need
 for this as this is
a very important and growing part of the Postgres ecosystem. Currently, all
the extension material is scattered and very limited. There are various
third-party blog posts focusing on different areas, and sometimes
contradictory. The main motivation behind making this is to make the barrier
for entry less prohibitive for new contributors.

I would greatly appreciate your input on how to add it to the existing
documentation (this is where I have major doubts) and any suggestions on
how to proceed. If there are areas where the existing documentation is
already sufficient or if there are ways to improve the overall structure, I
am open to making adjustments.

Best,
Ishaan Adarsh


On Thu, Dec 21, 2023 at 4:17 PM Pavel Stehule 
wrote:

> Hi
>
> On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
> wrote:
>
>> On 19.12.23 17:26, Ishaan Adarsh wrote:
>> > Subject: Clarification on the Purpose of the Patch
>> >
>> > Hi Peter,
>> >
>> > The intention was to address the challenge faced by newcomers in
>> > understanding how to write an extension for PostgreSQL. The existing
>> > documentation, while comprehensive, lacks a consolidated and
>> > easy-to-follow tutorial that serves as a quick start guide. The goal
>> was
>> > to create a beginner-friendly resource that assumes only knowledge of
>> > Postgres and the target language, making it accessible for new
>> > contributors because the barrier for entry is prohibitive for new
>> > contributors. There are various third-party blog posts focusing on
>> > different areas, and sometimes contradictory.
>>
>> Have you seen this:
>>
>> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>>
>> Maybe that could be extended/modified/simplified?
>>
>> > Specifically:
>> > 1. The new section titled "Quick Start Guide" aims to provide
>> > step-by-step instructions for users to get started with writing
>> > extensions in PL/pgSQL and PL/Python.
>>
>> What's confusing here is writing an extension in a PL language is not a
>> normal use case I'd say.  The normal use case involves some C code.
>>
>
>  Extensions were designed for C, but they work well with PL too.
> Some of my customers use extensions for PLpgSQL and they are almost happy.
> 1) there is nothing else, 2) it really works
>
> I agree with Peter - this topic is not what I imagine under "Quick start
> guide"
>
> Regards
>
> Pavel
>


Re: index prefetching

2023-12-21 Thread Tomas Vondra
On 12/20/23 20:09, Robert Haas wrote:
> On Tue, Dec 19, 2023 at 8:41 PM Tomas Vondra
> ...
>> I have imagined something like this:
>>
>> nodeIndexscan / index_getnext_slot()
>> -> no callback, all TIDs are prefetched
>>
>> nodeIndexonlyscan / index_getnext_tid()
>> -> callback checks VM for the TID, prefetches if not all-visible
>> -> the VM check result is stored in the queue with the VM (but in an
>>extensible way, so that other callback can store other stuff)
>> -> index_getnext_tid() also returns this extra information
>>
>> So not that different from the WIP patch, but in a "generic" and
>> extensible way. Instead of hard-coding the all-visible flag, there'd be
>> a something custom information. A bit like qsort_r() has a void* arg to
>> pass custom context.
>>
>> Or if envisioned something different, could you elaborate a bit?
> 
> I can't totally follow the sketch you give above, but I think we're
> thinking along similar lines, at least.
> 

Yeah, it's hard to discuss vague descriptions of code that does not
exist yet. I'll try to do the actual patch, then we can discuss.

>>> index_prefetch_is_sequential() makes me really nervous
>>> because it seems to depend an awful lot on whether the OS is doing
>>> prefetching, and how the OS is doing prefetching, and I think those
>>> might not be consistent across all systems and kernel versions.
>>
>> If the OS does not have read-ahead, or it's not configured properly,
>> then the patch does not perform worse than what we have now. I'm far
>> more concerned about the opposite issue, i.e. causing regressions with
>> OS-level read-ahead. And the check handles that well, I think.
> 
> I'm just not sure how much I believe that it's going to work well
> everywhere. I mean, I have no evidence that it doesn't, it just kind
> of looks like guesswork to me. For instance, the behavior of the
> algorithm depends heavily on PREFETCH_QUEUE_HISTORY and
> PREFETCH_SEQ_PATTERN_BLOCKS, but those are just magic numbers. Who is
> to say that on some system or workload you didn't test the required
> values aren't entirely different, or that the whole algorithm doesn't
> need rethinking? Maybe we can't really answer that question perfectly,
> but the patch doesn't really explain the reasoning behind this choice
> of algorithm.
> 

You're right that a lot of this is guesswork. I don't think we can do much
better, because it depends on stuff that's out of our control - each OS
may do things differently, or perhaps it's just configured differently.

But I don't think this is really a serious issue - all the read-ahead
implementations need to work about the same, because they are meant to
work in a transparent way.

So it's about deciding at which point we think this is a sequential
pattern. Yes, the OS may use a slightly different threshold, but the
exact value does not really matter - in the worst case we prefetch a
couple more/fewer blocks.

The OS read-ahead can't really prefetch anything except sequential
cases, so the whole question is "When does the access pattern get
sequential enough?". I don't think there's a perfect answer, and I don't
think we need a perfect one - we just need to be reasonably close.

Also, while I don't want to lazily dismiss valid cases that might be
affected by this, I think that sequential access for index paths is not
that common (with the exception of clustered indexes).

FWIW bitmap index scans have exactly the same "problem" except that no
one cares about it because that's how it worked from the start, so it's
not considered a regression.

>>> Similarly with index_prefetch(). There's a lot of "magical"
>>> assumptions here. Even index_prefetch_add_cache() has this problem --
>>> the function assumes that it's OK if we sometimes fail to detect a
>>> duplicate prefetch request, which makes sense, but under what
>>> circumstances is it necessary to detect duplicates and in what cases
>>> is it optional? The function comments are silent about that, which
>>> makes it hard to assess whether the algorithm is good enough.
>>
>> I don't quite understand what problem with duplicates you envision here.
>> Strictly speaking, we don't need to detect/prevent duplicates - it's
>> just that if you do posix_fadvise() for a block that's already in
>> memory, it's overhead / wasted time. The whole point is to not do that
>> very often. In this sense it's entirely optional, but desirable.
> 
> Right ... but the patch sets up some data structure that will
> eliminate duplicates in some circumstances and fail to eliminate them
> in others. So it's making a judgement that the things it catches are
> the cases that are important enough that we need to catch them, and
> the things that it doesn't catch are cases that aren't particularly
> important to catch. Here again, PREFETCH_LRU_SIZE and
> PREFETCH_LRU_COUNT seem like they will have a big impact, but why
> these values? The comments suggest that it's because we want to cover
> ~8MB of data, but it's not clear 

Re: Building PosgresSQL with LLVM fails on Solaris 11.4

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-13 15:18:02 +, Sacha Hottinger wrote:
> Thanks for your reply.
> The reason I was suspicious with the warnings of the gcc build was, because 
> gmake check reported 138 out of 202 tests to have failed. I have attached the 
> output of gmake check.

That'll likely be due to assertion / segmentation failures.

You'd need to enable core dumps and show a backtrace.

I assume that if you run tests without JIT support (e.g. by export
PGOPTIONS='-c jit=0'; gmake check), no such problem occurs?


> After you mentioned that gcc did not report any errors, just warnings, we 
> installed the build.
> First, it seemed to work and SELECT pg_jit_available(); showed 
> "pg_jit_available" as "t" but the DB showed strange behaviour. I.e. not 
> always, but sometimes running "show parallel_tuple_cost" caused postmaster to 
> restart a server process.
> We had to go back to the previous installation.
> 
> It seems there is definitely something wrong with the result gcc created.

I suspect that the LLVM version you used does something wrong on sparc. Which
version of LLVM is it?

Greetings,

Andres Freund




Re: Detecting some cases of missing backup_label

2023-12-21 Thread David Steele

On 12/21/23 07:37, Andres Freund wrote:


On 2023-12-20 13:11:37 -0400, David Steele wrote:

I've run this through a bunch of scenarios (in my head) with parallel
backups and it does seem to hold up.

I think we'd need to write the state file before XLOG_BACKUP_START just in
case. Seems better to have an extra state file rather than have one be
missing.


That'd very significantly weaken the approach, afaict, because "external" base
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so - because we'd only verify that a backup start has
been replayed when replaying XLOG_BACKUP_STOP there's no danger in not
creating the files during XLOG_BACKUP_START, but doing so just before logging
the XLOG_BACKUP_STOP.


Ugh, I meant XLOG_BACKUP_STOP. So sounds like we are on the same page.


Probably we'd want to exclude *all* state files from backups, though.


I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.


Fair enough.


Seems like in various PITR scenarios it could be hard to determine when to
remove them.


Why? I think we can basically remove the files when:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
we already have the infrastructure to queue file deletions that we can hook
into
b) when replaying a shutdown checkpoint / after creation of a shutdown
checkpoint


I thought about this some more. I *think* any state files a backup can 
see would have to be for XLOG_BACKUP_STOP records generated during the 
backup and they would get removed before the cluster had recovered to 
consistency.


I'd still prefer to exclude state files from the backup, but I agree 
there is no actual need to do so.


Regards,
-David




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread shveta malik
On Thu, Dec 21, 2023 at 5:04 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > > While listening at Bertrand's talk about logical decoding on standbys
> > > > last week at Prague, I got surprised by the fact that we do not
> > > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > > - WAL removed.
> > > > - Invalid horizon.
> > > > - Insufficient WAL level.
> > >
> > > It should be extremely rare to hit any of these other than "WAL removed", 
> > > so
> > > I'm not sure it's worth adding interface complexity to show them.
> > >
> > >
> > > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > > for monitoring purposes to know why a slot got invalidated and add a
> > > > column to pg_get_replication_slots()?  This could just be an extra
> > > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > > see.
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do 
> > > something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either. A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>
>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>
>
> >  In the other feature being discussed "Synchronize slots from primary
> > to standby" [1] , there is a requirement to replicate invalidation
> > cause of slot from the primary to standby and thus it is needed in
> > enum form there. And thus there was a suggestion earlier to have the
> > function return enum-value and let the view display it as
> > text/description to the user.  So kindly let us know your thoughts.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com
>
> Can you point me to a more specific message for that requirement? It seems
> pretty odd to me. Your link goes to the top of a 400 message thread, I don't
> have time to find one specific design point in that...

It is currently implemented there as a new function
'pg_get_slot_invalidation_cause()'  without changing existing view
pg_replication_slots. (See 2.1 in [1] where it was introduced).

Then it was suggested in [2] to fork a new thread as it makes sense to
have it independent of this slot-synchronization feature.

The new thread forked is [3]. In that thread, the issues in having a
new function pg_get_slot_invalidation_cause() are discussed and also
we came to know about this very thread that started the next day.

[1]: 
https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

thanks
Shveta




Re: trying again to get incremental backup

2023-12-21 Thread Robert Haas
On Wed, Dec 20, 2023 at 11:00 PM Alexander Lakhin  wrote:
> I've found several typos/inconsistencies introduced with 174c48050 and
> dc2123400. Maybe you would want to fix them, while on it?:

That's an impressively long list of mistakes in something I thought
I'd been careful about. Sigh.

I don't suppose you could provide these corrections in the form of a
patch? I don't really want to run these sed commands across the entire
tree and then try to figure out what's what...

> Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
> comment above redo_pointer_at_last_summary_removal declaration, but
> perhaps it should say about removing summaries instead?

Wow, yeah. Thanks, will fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Detecting some cases of missing backup_label

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-20 13:11:37 -0400, David Steele wrote:
> I've run this through a bunch of scenarios (in my head) with parallel
> backups and it does seem to hold up.
>
> I think we'd need to write the state file before XLOG_BACKUP_START just in
> case. Seems better to have an extra state file rather than have one be
> missing.

That'd very significantly weaken the approach, afaict, because "external" base
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so - because we'd only verify that a backup start has
been replayed when replaying XLOG_BACKUP_STOP there's no danger in not
creating the files during XLOG_BACKUP_START, but doing so just before logging
the XLOG_BACKUP_STOP.



> I'm a little worried about what happens if a state file goes missing, but I
> guess that could be true of any file in PGDATA.

Yea, that seems like a non-issue to me.


> Probably we'd want to exclude *all* state files from backups, though.

I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.


> Seems like in various PITR scenarios it could be hard to determine when to
> remove them.

Why? I think we can basically remove the files when:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
   we already have the infrastructure to queue file deletions that we can hook
   into
b) when replaying a shutdown checkpoint / after creation of a shutdown
   checkpoint

Greetings,

Andres Freund




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > While listening at Bertrand's talk about logical decoding on standbys
> > > last week at Prague, I got surprised by the fact that we do not
> > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > - WAL removed.
> > > - Invalid horizon.
> > > - Insufficient WAL level.
> >
> > It should be extremely rare to hit any of these other than "WAL removed", so
> > I'm not sure it's worth adding interface complexity to show them.
> >
> >
> > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > for monitoring purposes to know why a slot got invalidated and add a
> > > column to pg_get_replication_slots()?  This could just be an extra
> > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > see.
> >
> > Extra columns aren't free from a usability perspective. IFF we do 
> > something, I
> > think it should be a single column with a cause.
>
> Thanks for the feedback. But do you mean that we replace existing
> 'conflicting' column with 'cause' in both the function and view
> (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> that we expose 'cause' from pg_get_replication_slots() and use that to
> display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either. A conflicting column where NULL indicates no conflict, and other
values indicate the reason for the conflict, doesn't seem too bad.


> And if we plan to return/display cause from either function or view,
> then shall it be enum 'ReplicationSlotInvalidationCause' or
> description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.


>  In the other feature being discussed "Synchronize slots from primary
> to standby" [1] , there is a requirement to replicate invalidation
> cause of slot from the primary to standby and thus it is needed in
> enum form there. And thus there was a suggestion earlier to have the
> function return enum-value and let the view display it as
> text/description to the user.  So kindly let us know your thoughts.
>
> [1] - 
> https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com

Can you point me to a more specific message for that requirement? It seems
pretty odd to me. Your link goes to the top of a 400 message thread, I don't
have time to find one specific design point in that...

Greetings,

Andres




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 14:41:37 +0700, John Naylor wrote:
> I've attached v47, which is v46 plus some fixes for radix tree.

Could either of you summarize what the design changes you've made in the last
months are and why you've done them? Unfortunately this thread is very long,
and the comments in the file just say "FIXME" in places that apparently are
affected by design changes.  This makes it hard to catch up here.

Greetings,

Andres Freund




Re: Autonomous transactions 2023, WIP

2023-12-21 Thread Andrey M. Borodin



> On 15 Dec 2023, at 16:28, Ivan Kush  wrote:
> 
> 
> 
> Hello. I'm working on the support of autonomous transactions in Postgres.
> 

> # Summary
> * Add pragma AUTONOMOUS_TRANSACTION in functions. When a function 
> contains this pragma, it is executed autonomously
> * Background workers are used to run autonomous sessions.
> * Synchronous execution between backend and autonomous session
> * Postgres Client-Server Protocol is used to communicate between them
> * Pool of autonomous sessions. Pool is created lazily.
> * Infinite nested calls of autonomous functions are allowed. Limited 
> only by computer resources.
> * If another 2nd autonomous function is called in the 1st autonomous 
> function, the 2nd is executed at the beginning, and then the 1st 
> continues execution.

Cool, looks interesting! As far as I know EnterpriseDB, Postgres Pro and
OracleDB have this functionality. So it seems the feature is in demand.
How does your version compare to these widely used databases? Is anyone else
using background connections? Which syntax is used by other DBMSs?

Looking into the code, it seems like an easy way for a PL/pgSQL function to have
a client connection. I think this might work for other PLs too.

The patch touches translations (src/backend/po/). I think we typically do not
do this in code patches, because this work is better handled by translators.


Best regards, Andrey Borodin.



Re: partitioning and identity column

2023-12-21 Thread Peter Eisentraut

On 19.12.23 11:47, Ashutosh Bapat wrote:

At this point I am looking for opinions on the above rules and whether
the implementation is on the right track.


This looks on the right track to me.


0001 - change to get_partition_ancestors() prologue. Can be reviewed
and committed independent of other patches.


I committed that.


0004 - An attached partition inherits identity property and uses the
underlying sequence for direct INSERTs. When inheriting the identity
property it should also inherit the NOT NULL constraint, but that's a
TODO in this patch. We expect matching NOT NULL constraints to be
present in the partition being attached. I am not sure whether we want
to add NOT NULL constraints automatically for an identity column. We
require a NOT NULL constraint to be present when adding identity
property to a column. The behavior in the patch seems to be consistent
with this.


I think it makes sense that the NOT NULL constraint must be added 
manually before attaching  is allowed.






Re: Add --check option to pgindent

2023-12-21 Thread Jelte Fennema-Nio
On Tue, 19 Dec 2023 at 17:54, Tristan Partin  wrote:
> I was envisioning something along the lines of:
>
> pgindent --check --diff > patches.txt
> status=$?
> patch  with manual parsing
> exit $status

Okay, I got a working version. And I updated the pre-commit hook on
the wiki accordingly.




int4->bool test coverage

2023-12-21 Thread Christoph Berg
I was surprised to learn that 2 is a valid boolean (thanks Berge):

# select 2::boolean;
 bool
──
 t

... while '2' is not:

# select '2'::boolean;
ERROR:  22P02: invalid input syntax for type boolean: "2"
LINE 1: select '2'::boolean;
   ^
LOCATION:  boolin, bool.c:151


The first cast is the int4_bool function, but it isn't covered by the
regression tests at all. The attached patch adds tests.
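
For context (paraphrasing from memory, not a verbatim copy of the backend
code), the int4-to-bool cast boils down to "nonzero means true":

Datum
int4_bool(PG_FUNCTION_ARGS)
{
	if (PG_GETARG_INT32(0) == 0)
		PG_RETURN_BOOL(false);
	else
		PG_RETURN_BOOL(true);
}

boolin(), in contrast, only accepts a fixed set of spellings ('t'/'f',
'true'/'false', 'yes'/'no', 'on'/'off', '1'/'0'), which is why '2' as a string
literal is rejected.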

Christoph
>From 5752d75122db323b4066dd604d0c7a19077641a6 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Thu, 21 Dec 2023 11:43:28 +0100
Subject: [PATCH] Add tests for int4_bool

This cast was previously not covered at all by the regression tests.
---
 src/test/regress/expected/boolean.out | 19 +++
 src/test/regress/sql/boolean.sql  |  6 ++
 2 files changed, 25 insertions(+)

diff --git a/src/test/regress/expected/boolean.out b/src/test/regress/expected/boolean.out
index ee9c244bf8..ff9440df7e 100644
--- a/src/test/regress/expected/boolean.out
+++ b/src/test/regress/expected/boolean.out
@@ -566,6 +566,25 @@ SELECT isnul OR istrue OR isfalse FROM booltbl4;
  t
 (1 row)
 
+-- Casts
+SELECT 0::boolean;
+ bool 
+--
+ f
+(1 row)
+
+SELECT 1::boolean;
+ bool 
+--
+ t
+(1 row)
+
+SELECT 2::boolean;
+ bool 
+--
+ t
+(1 row)
+
 --
 -- Clean up
 -- Many tables are retained by the regression test, but these do not seem
diff --git a/src/test/regress/sql/boolean.sql b/src/test/regress/sql/boolean.sql
index bc9937d692..cde6cd3576 100644
--- a/src/test/regress/sql/boolean.sql
+++ b/src/test/regress/sql/boolean.sql
@@ -251,6 +251,12 @@ SELECT istrue OR isfalse OR isnul FROM booltbl4;
 SELECT isnul OR istrue OR isfalse FROM booltbl4;
 
 
+-- Casts
+SELECT 0::boolean;
+SELECT 1::boolean;
+SELECT 2::boolean;
+
+
 --
 -- Clean up
 -- Many tables are retained by the regression test, but these do not seem
-- 
2.43.0



Re: POC: GROUP BY optimization

2023-12-21 Thread Alexander Korotkov
Hi!

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
 wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.

Thank you (and other authors) for working on this subject.  Indeed,
GROUP BY clauses are order-agnostic.  Reordering them into the most
suitable order could give us significant query planning benefits.  I
went through the thread: I see significant work has already been done
on this patch, and the code is quite polished.

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern in the
code following get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please do
this as a separate patch coming before the group-by patch. That would
simplify the review.

2) I wonder what planning overhead this patch could introduce.  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially matching
every index?  This should give us an understanding of whether we need
a separate GUC to control this feature.

3) I see that get_useful_group_keys_orderings() makes 3 calls to the
get_cheapest_group_keys_order() function.  Each time,
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies system
catalog lookups (which are quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then re-use it?  Then, every time we
need to reorder a group by, we can just pull the required keys to the
top and use the saved ordering for the rest.  I also wonder if we could do
this once for the add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So it probably should be
somewhere in create_ordinary_grouping_paths().
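
To illustrate the idea in isolation (standalone toy code, not planner code; all
names invented): sort the keys by cost once, and for each path do a stable
partition that pulls the keys the path already provides to the front while
keeping the precomputed cost order among the rest.

#include <stdbool.h>
#include <stdlib.h>

typedef struct GroupKey { int keyno; double cost; } GroupKey;

static int
cmp_cost(const void *a, const void *b)
{
    const GroupKey *ka = a, *kb = b;
    return (ka->cost > kb->cost) - (ka->cost < kb->cost);
}

/* Done once: order the group-by keys from cheapest to most expensive. */
static void
sort_keys_by_cost(GroupKey *keys, int nkeys)
{
    qsort(keys, nkeys, sizeof(GroupKey), cmp_cost);
}

/*
 * Done per input path: copy the keys the path already provides first (in the
 * path's order), then append the remaining keys in the saved cost order.
 */
static void
reorder_for_path(const GroupKey *sorted, int nkeys,
                 const int *path_keynos, int npathkeys, GroupKey *out)
{
    int n = 0;

    for (int i = 0; i < npathkeys; i++)
        for (int j = 0; j < nkeys; j++)
            if (sorted[j].keyno == path_keynos[i])
                out[n++] = sorted[j];

    for (int j = 0; j < nkeys; j++)
    {
        bool already = false;

        for (int i = 0; i < npathkeys && !already; i++)
            already = (sorted[j].keyno == path_keynos[i]);
        if (!already)
            out[n++] = sorted[j];
    }
}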

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

--
Regards,
Alexander Korotkov




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Daniel Gustafsson
> On 21 Dec 2023, at 10:16, Andres Freund  wrote:
> 
> Hi,
> 
> On 2023-12-20 15:28:56 +0100, Daniel Gustafsson wrote:
>> +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
>> unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
>> targets..
> 
> Well, building the PDF takes a *long* time and is rarely required. I think
> there's an argument for adding INSTALL to all - however, there's a reason not
> to as well: It has pandoc as an additional dependency, which isn't small...

Yeah, I'm not advocating changing anything.

--
Daniel Gustafsson





Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Pavel Stehule
Hi

On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
wrote:

> On 19.12.23 17:26, Ishaan Adarsh wrote:
> > Subject: Clarification on the Purpose of the Patch
> >
> > Hi Peter,
> >
> > The intention was to address the challenge faced by newcomers in
> > understanding how to write an extension for PostgreSQL. The existing
> > documentation, while comprehensive, lacks a consolidated and
> > easy-to-follow tutorial that serves as a quick start guide. The goal was
> > to create a beginner-friendly resource that assumes only knowledge of
> > Postgres and the target language, making it accessible for new
> > contributors because the barrier for entry is prohibitive for new
> > contributors. There are various third-party blog posts focusing on
> > different areas, and sometimes contradictory.
>
> Have you seen this:
>
> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>
> Maybe that could be extended/modified/simplified?
>
> > Specifically:
> > 1. The new section titled "Quick Start Guide" aims to provide
> > step-by-step instructions for users to get started with writing
> > extensions in PL/pgSQL and PL/Python.
>
> What's confusing here is writing an extension in a PL language is not a
> normal use case I'd say.  The normal use case involves some C code.
>

 Extensions were designed for C, but they work well with PL too.
Some of my customers use extensions for PLpgSQL and they are almost happy.
1) there is nothing else, 2) it really works

I agree with Peter - this topic is not what I imagine under "Quick start
guide"

Regards

Pavel


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread shveta malik
On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > While listening at Bertrand's talk about logical decoding on standbys
> > last week at Prague, I got surprised by the fact that we do not
> > reflect in the catalogs the reason why a conflict happened for a slot.
> > There are three of them depending on ReplicationSlotInvalidationCause:
> > - WAL removed.
> > - Invalid horizon.
> > - Insufficient WAL level.
>
> It should be extremely rare to hit any of these other than "WAL removed", so
> I'm not sure it's worth adding interface complexity to show them.
>
>
> > ReplicationSlotCtl holds this information, so couldn't it be useful
> > for monitoring purposes to know why a slot got invalidated and add a
> > column to pg_get_replication_slots()?  This could just be an extra
> > text conflicting_reason, defaulting to NULL when there's nothing to
> > see.
>
> Extra columns aren't free from a usability perspective. IFF we do something, I
> think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

 In the other feature being discussed "Synchronize slots from primary
to standby" [1] , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user.  So kindly let us know your thoughts.

[1] - 
https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com

thanks
Shveta




Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Peter Eisentraut

On 19.12.23 17:26, Ishaan Adarsh wrote:

Subject: Clarification on the Purpose of the Patch

Hi Peter,

The intention was to address the challenge faced by newcomers in 
understanding how to write an extension for PostgreSQL. The existing 
documentation, while comprehensive, lacks a consolidated and 
easy-to-follow tutorial that serves as a quick start guide. The goal was 
to create a beginner-friendly resource that assumes only knowledge of 
Postgres and the target language, making it accessible for new 
contributors because the barrier for entry is prohibitive for new 
contributors. There are various third-party blog posts focusing on 
different areas, and sometimes contradictory.


Have you seen this: 
https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE


Maybe that could be extended/modified/simplified?


Specifically:
1. The new section titled "Quick Start Guide" aims to provide 
step-by-step instructions for users to get started with writing 
extensions in PL/pgSQL and PL/Python.


What's confusing here is writing an extension in a PL language is not a 
normal use case I'd say.  The normal use case involves some C code.






Re: speed up a logical replica setup

2023-12-21 Thread Amit Kapila
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
 wrote:
>
> Here are some comments about functionality and design.
>
> + 
> + 
> + pg_subscriber creates one replication slot for
> + each specified database on the source server. The replication slot name
> + contains a pg_subscriber prefix. These replication
> + slots will be used by the subscriptions in a future step. Another
> + replication slot is used to get a consistent start location. This
> + consistent LSN will be used as a stopping point in the <xref linkend="guc-recovery-target-lsn"/> parameter and by the
> + subscriptions as a replication starting point. It guarantees that no
> + transaction will be lost.
> + 
> + 
>
> CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
> complete. So it may not be possible to have an incomplete transaction
> on standby when it comes out of recovery. Am I correct? Can we please
> have a testcase where we test this scenario? What about a prepared
> transactions?
>

It will wait even for prepared transactions to commit. So, there
shouldn't be any behavior difference for prepared and non-prepared
transactions.

> +
> + 
> + 
> + pg_subscriber writes recovery parameters into
> + the target data directory and start the target server. It specifies a LSN
> + (consistent LSN that was obtained in the previous step) of write-ahead
> + log location up to which recovery will proceed. It also specifies
> + promote as the action that the server should take once
> + the recovery target is reached. This step finishes once the server ends
> + standby mode and is accepting read-write operations.
> + 
> + 
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.
>

Good points, but even if we make it user-configurable, how do we exclude
such replication objects? And if we don't exclude them, what will be
their use? If one wants to use it as a logical standby, we only need
publications and failover/sync slots in it, and there also won't be a
need to create new slots and publications on the primary to make the
current physical standby a logical subscriber.

> There may be other objects in this category which need special consideration 
> on
> the subscriber. I haven't fully thought through the list of such objects.
>
> + uses the replication slot that was created in a previous step. The
> + subscription is created but it is not enabled yet. The reason is the
> + replication progress must be set to the consistent LSN but replication
> + origin name contains the subscription oid in its name. Hence, the
>
> Not able to understand the sentence "The reason is ... in its name".
> Why is subscription OID in origin name matters?
>

Using the subscription OID in the origin name is probably to uniquely identify
the origin corresponding to the subscription; we do that while creating a
subscription as well.

-- 
With Regards,
Amit Kapila.




Re: Postgres picks suboptimal index after building of an extended statistics

2023-12-21 Thread Alexander Korotkov
On Thu, Dec 21, 2023 at 10:41 AM Andrei Lepikhov
 wrote:
>
> On 18/12/2023 15:29, Alexander Korotkov wrote:
> > Also, there is a set of patches [7], [8], and [9], which makes the
> > optimizer consider path selectivity as long as path costs during the
> > path selection.  I've rechecked that none of these patches could resolve
> > the original problem described in [1].
> It is true. We accidentally mixed two different problems in one thread.
> >  Also, I think they are quite
> > tricky.  The model of our optimizer assumes that paths in the list
> > should be the different ways of getting the same result.  If we choose
> > the paths by their selectivity, that breaks this model.  I don't say
> > there is no way for this.  But if we do this, that would require
> > significant rethinking of our optimizer model and possible revision of a
> > significant part of it.
> I can't understand that. In [9] we just elaborate the COSTS_EQUAL case
> and establish final decision on more stable basis than a casual order of
> indexes in the list.

I took a closer look at the patch in [9].  I should drop my argument
about breaking the model, because add_path() already considers other
aspects than just costs.  But I have two more notes about that patch:

1) It seems that you're determining the fact that the index path
should return strictly one row by checking path->rows <= 1.0 and
indexinfo->unique.  Is it really guaranteed that in this case quals
are matching unique constraint?  path->rows <= 1.0 could be just an
estimation error.  Or one row could be correctly estimated, but it's
going to be selected by some quals matching unique constraint and
other quals in recheck.  So, it seems there is a risk to select
suboptimal index due to this condition.

2) Even for non-unique indexes this patch is putting new logic on top
of the subsequent code.  How can we prove it's going to be a win?
That could lead, for instance, to dropping parallel-safe paths in
cases where we didn't do so before.

Anyway, please start a separate thread if you're willing to put more
work into this.

> >  Anyway, I think if there is still interest in
> > this, that should be moved into a separate thread to keep this thread
> > focused on the problem described in [1].
> Agree. IMO, the problem of optimizer dependency on an order of indexes
> in the relation index list is more urgent for now.
> >
> > Finally, I'd like to note that the issue described in [1] is mostly the
> > selectivity estimation problem.  It could be solved by adding the
> > multi-column MCV statistics.  The patches published so far look more
> > like hacks for particular use cases rather than appropriate solutions.
> > It still looks promising to me to use the knowledge of unique
> > constraints during selectivity estimation [10].  Even though it's hard
> > to implement and possibly implies some overhead, it fits the current
> > model.  I also think unique contracts could probably be used in some way
> > to improve estimates even when there is no full match.
> I have tried to use the knowledge about unique indexes in the
> selectivity estimation routine. But it looks invasive and adds a lot of
> overhead.

I got it.  But that doesn't look like enough to decide there is no way.  Could
you please share some of your results?  It might turn out that we just
need to rework some of the data structures to make this information more
easily accessible at the selectivity estimation stage.

--
Regards,
Alexander Korotkov




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> While listening at Bertrand's talk about logical decoding on standbys
> last week at Prague, I got surprised by the fact that we do not
> reflect in the catalogs the reason why a conflict happened for a slot.
> There are three of them depending on ReplicationSlotInvalidationCause:
> - WAL removed.
> - Invalid horizon.
> - Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.


> ReplicationSlotCtl holds this information, so couldn't it be useful
> for monitoring purposes to know why a slot got invalidated and add a
> column to pg_get_replication_slots()?  This could just be an extra
> text conflicting_reason, defaulting to NULL when there's nothing to
> see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Greetings,

Andres Freund




Re: Synchronizing slots from primary to standby

2023-12-21 Thread Bertrand Drouvot
Hi,

On Thu, Dec 21, 2023 at 02:23:12AM +, Zhijie Hou (Fujitsu) wrote:
> On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) 
>  wrote:
> > 
> > Attach the V51 patch set which addressed Kuroda-san's comments.
> > I also tried to improve the test in 0003 to make it stable.
> 
> The patches conflict with a recent commit dc21234.
> Here is the rebased V51_2 version, there is no code changes in this version.
> 

Thanks!

I've a few remarks regarding 0001:

1 === 

In the commit message what about replacing "Allow logical walsenders to wait for
the physical standbys" with "Force some logical walsenders to wait for the
physical standbys"?

Also I think it would be better to first explain what we are trying to achieve
and then explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so
on).

2 ===

+  
+   
+List of physical replication slots that logical replication slots with
+failover enabled waits for.

Worth adding a few words about what we are actually waiting for?

3 ===

+   ereport(ERROR,
+   (errcode(ERRCODE_PROTOCOL_VIOLATION),
+errmsg("could not alter replication slot 
\"%s\" on publisher: %s",
+   slotname, 
pchomp(PQerrorMessage(conn->streamConn);

Should we mention "on publisher" here? What about removing the word "publisher"?

4 ===

@@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int elevel)
  * during getting changes, if the two_phase option is enabled it can skip
  * prepare because by that time start decoding point has been moved. So the
  * user will only get commit prepared.
+ * failover: If enabled, allows the slot to be synced to physical standbys so
+ * that logical replication can be resumed after failover.

s/allows/forces ?

5 ===

+   boolok;

parse_ok maybe?

6 ===

+   /* Need a modifiable copy of string. */
+   rawname = pstrdup(*newval);

It seems to me that the single-line comments in the neighboring functions (see
RestoreSlotFromDisk() for example) don't end with ".". Worth following the
same format for everything we add in slot.c?

7 ===

+static void
+parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)

ParseAlterReplSlotOptions instead?

8 ===

+* We do not need to change the failover to false if 
the server
+* does not support failover (e.g. pre-PG17)

Missing "." at the end.

9 ===

+* See comments above for twophasestate, same holds true for
+* 'failover'

Missing "." at the end.

10 ===

+++ b/src/include/replication/walsender.h
@@ -12,6 +12,8 @@
 #ifndef _WALSENDER_H
 #define _WALSENDER_H

+#include "access/xlogdefs.h"

Is this include needed?

11 ===

+* When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on another
+* CV that is woken up by physical walsenders when the walreceiver has
+* confirmed the receipt of LSN.

s/that is woken up by/that is broadcasted by/ ?

12 ===

We mention in several places that replication can be resumed after a
failover. Should we add a few words about the possible lag? (see [1])

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiGNyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Autonomous transactions 2023, WIP

2023-12-21 Thread Pavel Stehule
Hi

Although I like the idea of autonomous transactions, I don't think this
approach is the best one.

1. The solution based on background workers looks too fragile - it is easy to
exhaust all background workers, and because this feature is proposed mainly
for logging, that is a little dangerous: it can mean losing the ability to
log at all.

2. Although the Oracle syntax is interesting, and I have proposed PRAGMA
several times, it does not make this functionality available in other PLs.

I don't propose exactly the Firebird syntax
https://firebirdsql.org/refdocs/langrefupd25-psql-autonomous-trans.html,
but I think this solution is better than Ada's PRAGMAs. I can imagine some
special flag for a function, like

CREATE OR REPLACE FUNCTION ...
AS $$
$$ LANGUAGE plpgsql AUTONOMOUS TRANSACTION;

as another possibility.
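
For illustration, with this (hypothetical, not existing today) flag, a
logging helper for the use case from point 1 could look like:

CREATE TABLE app_log (ts timestamptz DEFAULT now(), message text);

-- hypothetical syntax: the AUTONOMOUS TRANSACTION flag is only a proposal
CREATE OR REPLACE FUNCTION log_message(msg text) RETURNS void
AS $$
BEGIN
    -- this INSERT should survive even when the caller's transaction rolls back
    INSERT INTO app_log (message) VALUES (msg);
END;
$$ LANGUAGE plpgsql AUTONOMOUS TRANSACTION;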

3. Heikki wrote about the possibility of supporting threads in Postgres. One
significant part of that project is the elimination of global variables, and
that work could be shared with autonomous transactions.

Surely, the first topic should be the method of implementation. Maybe I
missed it, but there is no agreement on a background-worker-based approach.

Regards

Pavel


Re: Synchronizing slots from primary to standby

2023-12-21 Thread shveta malik
On Wed, Dec 20, 2023 at 12:02 PM Peter Smith  wrote:
>
> Here are some comments for the patch v50-0002.

Thank you for the feedback. I have addressed these in v52.

> ==
> GENERAL
>
> (I made a short study of all the ereports in this patch -- here are
> some findings)
>
> ~~~
>
> 0.1 Don't need the parentheses.
>
> Checking all the ereports I see that half of them have the redundant
> parentheses and half of them do not; You might as well make them all
> use the new style where the extra parentheses are not needed.
>
> e.g.
> + ereport(LOG,
> + (errmsg("skipping slot synchronization"),
> + errdetail("enable_syncslot is disabled.")));
>
> e.g.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot drop replication slot \"%s\"", name),
> + errdetail("This slot is being synced from the primary server.")));
>
> and many more like this. Search for all the ereports.
>
> ~~~
>
> 0.2
> + ereport(LOG,
> + (errmsg("dropped replication slot \"%s\" of dbid %d as it "
> + "was not sync-ready", NameStr(s->data.name),
> + s->data.database)));
>
> I felt maybe that could be:
>
> errmsg("dropped replication slot \"%s\" of dbid %d", ...
> errdetail("It was not sync-ready.")
>
> (now this shares the same errmsg with another ereport)
>
> ~~~
>
> 0.3.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("skipping sync of slot \"%s\" as it is a user created"
> + " slot", remote_slot->name),
> + errdetail("This slot has failover enabled on the primary and"
> +" thus is sync candidate but user created slot with"
> +" the same name already exists on the standby.")));
>
> This seemed too wordy. Can't it be shortened (maybe like below)
> without losing any of the vital information?
>
> errmsg("skipping sync of slot \"%s\"", ...)
> errdetail("A user-created slot with the same name already exists on
> the standby.")

I have modified it a little more; please take a look. I wanted to add the
information that the slot-sync worker exits instead of just skipping the slot,
and that the slot in question is a failover slot on the primary. There were
other comments along the same lines.

> ~~~
>
> 0.4
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> +PrimarySlotName, "primary_slot_name")));
>
> /The primary slot/The primary server slot/
>
> ~~~
>
> 0.5
> + ereport(ERROR,
> + (errmsg("could not fetch primary_slot_name \"%s\" info from the "
> + "primary: %s", PrimarySlotName, res->err)));
>
> /primary:/primary server:/
>
> ~~~
>
> 0.6
> The continuations for long lines are inconsistent. Sometimes there are
> trailing spaces and sometimes there are leading spaces. And sometimes
> there are both at the same time which would cause double-spacing in
> the message! Please make them all the same. I think using leading
> spaces is easier but YMMV.
>
> e.g.
> + elog(ERROR,
> + "not synchronizing local slot \"%s\" LSN(%X/%X)"
> + " to remote slot's LSN(%X/%X) as synchronization "
> + " would move it backwards", remote_slot->name,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
>
> ==
> src/backend/replication/logical/slotsync.c
>
> 1. check_primary_info
>
> + /* No need to check further, return that we are cascading standby */
> + if (remote_in_recovery)
> + {
> + *am_cascading_standby = true;
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> + return;
> + }
> +
> + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> + Assert(!isnull);
> +
> + if (!valid)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> +PrimarySlotName, "primary_slot_name")));
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> +}
>
> Now that there is a common cleanup/return code this function be
> reduced further like below:
>
> SUGGESTION
>
> if (remote_in_recovery)
> {
>   /* No need to check further, return that we are cascading standby */
>   *am_cascading_standby = true;
> }
> else
> {
>   /* We are a normal standby. */
>
>   valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
>   Assert(!isnull);
>
>   if (!valid)
> ...
> }
>
> ExecClearTuple(tupslot);
> walrcv_clear_result(res);
> CommitTransactionCommand();
> }
>
> ~~~
>
> 2. ReplSlotSyncWorkerMain
>
> + /*
> + * One can promote the standby and we can no longer be a cascading
> + * standby. So recheck here.
> + */
> + if (am_cascading_standby)
> + check_primary_info(wrconn, &am_cascading_standby);
>
> Minor rewording of that new comment.
>
> SUGGESTION
> If the standby was promoted then what was 

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Dec 2023 23:31:29 +0900,
  Masahiko Sawada  wrote:

> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.

I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.

I needed to extend the patch:

1. Add an opaque space for custom COPY TO handler
   * Add CopyToState{Get,Set}Opaque()
   
https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944

2. Export CopyToState::attnumlist
   * Add CopyToStateGetAttNumList()
   
https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

3. Export CopySend*()
   * Rename CopySend*() to CopyToStateSend*() and export them
   * Exception: CopySendEndOfRow() to CopyToStateFlush() because
 it just flushes the internal buffer now.
   
https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e

The attached patch is based on Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty, so it is just for discussion.

My suggestions from this experience:

1. Split COPY handler to COPY TO handler and COPY FROM handler

   * CopyFormatRoutine is a bit tricky. An extension needs
 to create a CopyFormatRoutine node and
 a CopyToFormatRoutine node.

   * If we just require a "copy_to_${FORMAT}(internal)"
     function and a "copy_from_${FORMAT}(internal)" function,
     we can remove the tricky approach. It also avoids
     name collisions with other handlers such as the
     tablesample handler (see the sketch after this list).
 See also:
 
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

2. Need an opaque space like IndexScanDesc::opaque does

   * A custom COPY TO handler needs to keep its data

3. Export CopySend*()

   * If we like a minimal API, we just need to export
     CopySendData() and CopySendEndOfRow(). But
     CopySend{String,Char,Int32,Int16}() would be convenient
     for custom COPY TO handlers. (A custom COPY TO handler for
     Apache Arrow doesn't need them.)
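
To illustrate suggestion 1, registering and using a custom format could then
look roughly like the following. This is only a sketch: the "copy_to_handler"
return type and the 'arrow' format are placeholders, nothing like them exists
yet.

-- sketch only; "copy_to_handler" and the C symbol are placeholders
CREATE FUNCTION copy_to_arrow(internal) RETURNS copy_to_handler
    AS 'MODULE_PATHNAME', 'copy_to_arrow'
    LANGUAGE C STRICT;

COPY my_table TO '/tmp/my_table.arrow' WITH (FORMAT 'arrow');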

Questions:

1. What value should be used for "format" in
   PgMsg_CopyOutResponse message?

   
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144

   It's 1 for binary format and 0 for text/csv format.

   Should we make it customizable by custom COPY TO handler?
   If so, what value should be used for this?

2. Do we need more experiments for the design discussion before the first
   implementation? If so, what should we try?


Thanks,
-- 
kou
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..e7597894bf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -23,6 +23,7 @@
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -32,6 +33,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "utils/acl.h"
@@ -427,6 +429,8 @@ ProcessCopyOptions(ParseState *pstate,
 
 	opts_out->file_encoding = -1;
 
+	/* Text is the default format. */
+	opts_out->to_ops = 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
 	{
@@ -442,9 +446,26 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 opts_out->csv_mode = true;
+opts_out->to_ops = 
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 opts_out->binary = true;
+opts_out->to_ops = 
+			}
+			else if (!is_from)
+			{
+/*
+ * XXX: Currently we support custom COPY format only for COPY
+ * TO.
+ *
+ * XXX: need to check the combination of the existing options
+ * and a custom format (e.g., FREEZE)?
+ */
+opts_out->to_ops = GetCopyToFormatRoutine(fmt);
+			}
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -864,3 +885,62 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
 	return attnums;
 }
+
+static CopyFormatRoutine *
+GetCopyFormatRoutine(char *format_name, bool is_from)
+{
+	Oid			handlerOid;
+	Oid			funcargtypes[1];
+	CopyFormatRoutine *cp;
+	Datum		datum;
+
+	funcargtypes[0] = INTERNALOID;
+	handlerOid = LookupFuncName(list_make1(makeString(format_name)), 1,
+funcargtypes, true);
+
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+(errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("COPY format \"%s\" not recognized", format_name)));
+
+	datum = 

Re: Function to get invalidation cause of a replication slot.

2023-12-21 Thread Amit Kapila
On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote:
> > On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier  
> > wrote:
> > Yeah, if one uses them independently then there is no such guarantee.
>
> This could be possible in the same query as well, still less likely,
> as the contents are volatile.
>

True, this is quite obvious, but that was not the recommended way to use
the function. Anyway, now that we agree to expose it via an existing
function, there is no point in arguing this further.

> >>  A lot could happen between both function calls while the
> >> repslot LWLock is not hold.
> >>
> >> Yeah, you could keep the reason text as NULL when there is no
> >> conflict, replacing the boolean by the text in the function, and keep
> >> the view definition compatible with v16 while adding an extra column.
> >
> > But as mentioned we also want the enum value to be exposed in some way
> > so that it can be used by the sync slot feature [1] as well,
> > otherwise, we may need some mappings to convert the text back to an
> > enum. I guess if we want to expose via view, then we can return an
> > enum value by pg_get_replication_slots() and the view can replace it
> > with text based on the value.
>
> Sure.  Something like that is OK by me as long as the data is retrieved
> from a single scan of the slot data while holding the slot data's
> LWLock.
>

Okay, so let's go this way unless someone feels otherwise.
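
For illustration, the end result would then look something like this (the
column and value names here are placeholders until the patch settles):

-- hypothetical shape of the additional column discussed above
SELECT slot_name, conflicting, conflict_reason
FROM pg_replication_slots;
-- conflict_reason would be NULL when there is no conflict, otherwise a
-- textual cause such as 'wal_removed'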

-- 
With Regards,
Amit Kapila.




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 08:39:26 +0900, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 11:36:28AM -0500, Tom Lane wrote:
> > Andres Freund  writes:
> >> We fairly regularly have commits breaking the generation of INSTALL. IIRC 
> >> we
> >> recently discussed building it locally unconditionally, but I couldn't
> >> immediately find that discussion.  Until then, I think we should at least
> >> build it in CI so that cfbot can warn.
> > 
> > I thought the plan was to get rid of that file, in pursuit of making
> > our distribution tarballs be more or less pure git pulls.  Instead of
> > expending more effort on it, why not just push that project forward?
> > (IIRC, what we intended to do instead was to modify the top-level
> > README to point at the HTML install directions on the web.)

Ah, right.  I don't really care what solution we go for, just that as long as
we have INSTALL, we should make sure we don't regularly break it... Both
Michael and I have broken it in the last couple of weeks.


> Hmm.  It depends on if the next release should include it or not, but
> let me add my +1 for replacing it with a simple redirect.

Are you going to submit a patch for that bit?

Greetings,

Andres Freund




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-20 15:28:56 +0100, Daniel Gustafsson wrote:
> +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
> unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
> targets..

Well, building the PDF takes a *long* time and is rarely required. I think
there's an argument for adding INSTALL to all - however, there's a reason not
to as well: It has pandoc as an additional dependency, which isn't small...

Andres




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
On 2023-12-21 08:44:33 +0900, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 03:28:56PM +0100, Daniel Gustafsson wrote:
> > +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
> > unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
> > targets..
> 
> That seems relevant in terms of coverage.  Why not just moving the
> INSTALL bit to a different line?

I am confused - which coverage could we be losing here?




Re: Remove MSVC scripts from the tree

2023-12-21 Thread Peter Eisentraut

On 20.12.23 16:43, Peter Eisentraut wrote:
> On 20.12.23 12:40, Andres Freund wrote:
>> Hm, or perhaps we should just get rid of sed use altogether. The sepgsql
>> case is trivially translatable to perl, and postprocess_dtrace.sed isn't
>> much harder.
>
> Maybe yeah, but also it seems fine as is and we can easily fix the
> present issue ...
>
>> OTOH, I actually don't think it's valid to not have sed when you have
>> dtrace. Erroring out in a weird way in such an artificially constructed
>> test doesn't really seem like a problem.
>
> Agreed.  So let's just make it not-required, and that should work.
>
> Updated patch set attached.


I have committed these two.




Re: Postgres picks suboptimal index after building of an extended statistics

2023-12-21 Thread Andrei Lepikhov

On 18/12/2023 15:29, Alexander Korotkov wrote:
> Also, there is a set of patches [7], [8], and [9], which makes the
> optimizer consider path selectivity as well as path costs during
> path selection.  I've rechecked that none of these patches could resolve
> the original problem described in [1].

It is true. We accidentally mixed two different problems in one thread.

> Also, I think they are quite tricky.  The model of our optimizer assumes
> that paths in the list should be different ways of getting the same
> result.  If we choose the paths by their selectivity, that breaks this
> model.  I don't say there is no way for this.  But if we do this, that
> would require significant rethinking of our optimizer model and possibly
> revision of a significant part of it.

I can't understand that. In [9] we just elaborate the COSTS_EQUAL case
and establish the final decision on a more stable basis than the casual
order of indexes in the list.

> Anyway, I think if there is still interest in this, that should be moved
> into a separate thread to keep this thread focused on the problem
> described in [1].

Agree. IMO, the problem of the optimizer's dependency on the order of
indexes in the relation index list is more urgent for now.

> Finally, I'd like to note that the issue described in [1] is mostly a
> selectivity estimation problem.  It could be solved by adding
> multi-column MCV statistics.  The patches published so far look more
> like hacks for particular use cases rather than appropriate solutions.
> It still looks promising to me to use the knowledge of unique
> constraints during selectivity estimation [10].  Even though it's hard
> to implement and possibly implies some overhead, it fits the current
> model.  I also think unique constraints could probably be used in some
> way to improve estimates even when there is no full match.

I have tried to use the knowledge about unique indexes in the
selectivity estimation routine. But it looks invasive and adds a lot of
overhead.
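
Just for reference, the multi-column MCV statistics mentioned above can
already be created today with the existing extended statistics machinery,
for example:

CREATE STATISTICS t_ab_mcv (mcv) ON a, b FROM t;
ANALYZE t;

The discussion above is about whether we can do better without requiring
users to create such statistics by hand.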


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Remove MSVC scripts from the tree

2023-12-21 Thread Michael Paquier
On Wed, Dec 20, 2023 at 11:39:15PM -0800, Andres Freund wrote:
> Can't we teach the tool that it should not validate src/tools/win32tzlist.pl
> on !windows? It's obviously windows specific code, and it's special case
> enough that there doesn't seem like a need to develop it on !windows.

I am not really excited about keeping a dummy library just so a script can
check that this WIN32-only file is correctly written, and TBH I've never
used pgperlsyncheck since it was introduced in af616ce48347.
Anyway, we could just tweak the list of files returned by
find_perl_files, as win32tzlist.pl is valid for perltidy and
perlcritic.

Andrew, was the original target of pgperlsyncheck committers and
hackers who worked on the MSVC scripts but could not run sanity
checks on Windows (see [1])?  There are a few more cases, like the
Unicode scripts or some of the stuff in src/tools/, where it can still
be useful, though these are not touched on a daily basis.  The rest of
the .pm files are for TAP tests, plus one for Unicode.  I'm still OK
with tweaking the script if its main purpose is gone.

[1]: 
https://www.postgresql.org/message-id/f3c12e2c-618f-cb6f-082b-a2f604dbe...@2ndquadrant.com
--
Michael

