Re: Trigger violates foreign key constraint

2023-12-21 Thread Pavel Luzanov
I fully support this addition to the documentation. The fact that data
consistency can legitimately be broken this way should be documented, at the
very least.


Please consider a small suggestion: replace the last sentence.

- This is not considered a bug, and it is the responsibility of the user 
to write triggers so that such problems are avoided.

+ It is the trigger programmer's responsibility to avoid such scenarios.

To be consistent with the sentence about recursive trigger calls: [1]
"It is the trigger programmer's responsibility to avoid infinite 
recursion in such scenarios."


Also, I don't really like the "This is not considered a bug" part, since it
reads like an excuse.



1. https://www.postgresql.org/docs/devel/trigger-definition.html

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: [PATCH]: Not to invaldiate CatalogSnapshot for local invalidation messages

2023-12-21 Thread Xiaoran Wang
Hi,
I updated the comment about the CatalogSnapshot in
`src/backend/utils/time/snapmgr.c`.

Xiaoran Wang  于2023年12月18日周一 15:02写道:

> Hi,
> Thanks for your reply.
>
> jian he  于2023年12月18日周一 08:20写道:
>
>> Hi
>> ---setup.
>> drop table s2;
>> create table s2(a int);
>>
>> After applying the patch
>> alter table s2 add primary key (a);
>>
>> watch CatalogSnapshot
>> 
>> #0  GetNonHistoricCatalogSnapshot (relid=1259)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
>> #1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
>> #2  0x55ba785ffbe1 in systable_beginscan
>> (heapRelation=0x7f256f30b5a8, indexId=2662, indexOK=false,
>> snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
>> (More stack frames follow...)
>>
>> -
>> Hardware watchpoint 13: CatalogSnapshot
>>
>> Old value = (Snapshot) 0x55ba7980b6a0 
>> New value = (Snapshot) 0x0
>> InvalidateCatalogSnapshot () at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> 435 SnapshotResetXmin();
>> (gdb) bt 4
>> #0  InvalidateCatalogSnapshot ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> #1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true,
>> resetXmin=false)
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
>> #2  0x55ba7868201b in CommitTransaction ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
>> #3  0x55ba78683495 in CommitTransactionCommand ()
>> at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
>> (More stack frames follow...)
>>
>> --
>> but the whole process changes pg_class, pg_index,
>> pg_attribute, pg_constraint etc.
>> Only one GetCatalogSnapshot and one InvalidateCatalogSnapshot does not seem
>> correct, does it?
>> What if there are concurrent changes in the related pg_catalog tables?
>>
>> your patch did pass the isolation test!
>>
>
> Yes, I have run installcheck-world locally, and all the tests passed.
> There are two kinds of Invalidation Messages.
> One kind comes from the local backend, such as what you did in the example
> "alter table s2 add primary key (a);": it modifies pg_class,
> pg_attribute etc.,
> so it generates some Invalidation Messages to invalidate the "s2"-related
> tuples in pg_class, pg_attribute etc., and an Invalidation Message to
> invalidate the s2 relation cache. When the command is finished, those
> Invalidation Messages are processed in CommandCounterIncrement to make the
> system caches work correctly for the following commands.
>
> The other kind of Invalidation Message comes from other backends.
> Suppose there are two sessions:
> session1
> ---
> 1: create table foo(a int);
> ---
> session 2
> ---
> 1: create table test(a int); (before session1:1)
> 2: insert into foo values(1); (execute after session1:1)
> ---
> Session 1 will generate Invalidation Messages and send them when its
> transaction is committed, and session 2 will accept those Invalidation
> Messages from session 1 and then execute its second command.
>
> Before the patch, Postgres invalidates the CatalogSnapshot for both
> kinds of Invalidation Messages. So I did a small optimization in this
> patch: for local Invalidation Messages, we don't call
> InvalidateCatalogSnapshot; we can keep using one CatalogSnapshot within a
> transaction even if we modify the catalog and generate Invalidation
> Messages, because the visibility of a tuple is determined by the curcid.
> As long as we update the curcid of the CatalogSnapshot in
> SnapshotSetCommandId (sketched after the quoted text below), it works
> correctly.
>
>
>
>> I think what your patch does goes against the following code comments in
>> src/backend/utils/time/snapmgr.c
>>
>> /*
>>  * CurrentSnapshot points to the only snapshot taken in
>> transaction-snapshot
>>  * mode, and to the latest one taken in a read-committed transaction.
>>  * SecondarySnapshot is a snapshot that's always up-to-date as of the
>> current
>>  * instant, even in transaction-snapshot mode.  It should only be used for
>>  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
>>  * MVCC snapshot intended to be used for catalog scans; we must
>> invalidate it
>>  * whenever a system catalog change occurs.
>>  *
>>  * These SnapshotData structs are static to simplify memory allocation
>>  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
>>  */
>> static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
>> static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
>> SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
>> SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
>> SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
>>
>
> Thank you for pointing it out, I think I need to update the 
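
To make the CatalogSnapshot idea above concrete, here is a rough sketch (an
editorial illustration, not the submitted patch) of what advancing the
catalog snapshot's command id in SnapshotSetCommandId() could look like.  It
assumes snapmgr.c's existing function has roughly this shape; the
CatalogSnapshot lines are the illustrative addition.

void
SnapshotSetCommandId(CommandId curcid)
{
    if (!FirstSnapshotSet)
        return;

    if (CurrentSnapshot)
        CurrentSnapshot->curcid = curcid;
    if (SecondarySnapshot)
        SecondarySnapshot->curcid = curcid;

    /*
     * Illustrative addition: keep the catalog snapshot usable across this
     * backend's own catalog changes; visibility of our own tuples is
     * governed by curcid, so advancing it is enough for local changes.
     */
    if (CatalogSnapshot)
        CatalogSnapshot->curcid = curcid;
}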

Optimization outcome depends on the index order

2023-12-21 Thread Andrei Lepikhov

On 21/12/2023 12:10, Alexander Korotkov wrote:
> I took a closer look at the patch in [9].  I should drop my argument
> about breaking the model, because add_path() already considers other
> aspects than just costs.  But I have two more notes about that patch:
>
> 1) It seems that you're determining the fact that the index path
> should return strictly one row by checking path->rows <= 1.0 and
> indexinfo->unique.  Is it really guaranteed that in this case quals
> are matching unique constraint?  path->rows <= 1.0 could be just an
> estimation error.  Or one row could be correctly estimated, but it's
> going to be selected by some quals matching unique constraint and
> other quals in recheck.  So, it seems there is a risk to select
> suboptimal index due to this condition.

Operating inside the optimizer, we treat all estimations as the truth.
This patch modifies only one place: given two assumptions judged equal,
we simply choose the one that generally looks more stable.
Filtered tuples should be calculated and included in the cost of the
path, so the decision that two paths are equal is made with the
estimation of these filtered tuples already taken into account.


> 2) Even for non-unique indexes this patch is putting new logic on top
> of the subsequent code.  How we can prove it's going to be a win?
> That could lead, for instance, to dropping parallel-safe paths in
> cases we didn't do so before.
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner's logic, that
is the path with the smaller selectivity. We can make mistakes anyway,
simply because of the nature of estimation.


> Anyway, please start a separate thread if you're willing to put more
> work into this.

Done

> 9. https://www.postgresql.org/message-id/154f786a-06a0-4fb1-b8a4-16c66149731b%40postgrespro.ru

--
regards,
Andrei Lepikhov
Postgres Professional

From 7b044de1449a5fdc450cb629caafb4e15ded7a93 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 27 Nov 2023 11:23:48 +0700
Subject: [PATCH] Choose an index path with the best selectivity estimation.

In the case when the optimizer predicts only one row, prefer choosing UNIQUE
indexes. In other cases, if the optimizer treats indexes as equal, make a last
attempt by selecting the index with the lower selectivity. This decision
removes the dependency on the order of indexes in the index list (good for
reproducing some issues) and adds one more objective argument for choosing a
specific index.
---
 src/backend/optimizer/util/pathnode.c | 42 +++
 .../expected/drop-index-concurrently-1.out| 16 +++
 src/test/regress/expected/functional_deps.out | 39 +
 src/test/regress/expected/join.out| 40 +-
 src/test/regress/sql/functional_deps.sql  | 32 ++
 5 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..4b5aedd579 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -454,6 +454,48 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		costcmp = compare_path_costs_fuzzily(new_path, old_path,
 											 STD_FUZZ_FACTOR);
 
+		/*
+		 * Apply some heuristics on index paths.
+		 */
+		if (IsA(new_path, IndexPath) && IsA(old_path, IndexPath))
+		{
+			IndexPath  *inp = (IndexPath *) new_path;
+			IndexPath  *iop = (IndexPath *) old_path;
+
+			if (new_path->rows <= 1.0 && old_path->rows <= 1.0)
+			{
+				/*
+				 * When both paths are predicted to produce only one tuple,
+				 * the optimiser should prefer choosing a unique index scan
+				 * in all cases.
+				 */
+				if (inp->indexinfo->unique && !iop->indexinfo->unique)
+					costcmp = COSTS_BETTER1;
+				else if (!inp->indexinfo->unique && iop->indexinfo->unique)
+					costcmp = COSTS_BETTER2;
+				else if (costcmp != COSTS_DIFFERENT)
+					/*
+					 * If the optimiser doesn't have an obviously stable choice
+					 * of unique index, increase the chance of avoiding mistakes
+					 * by choosing an index with smaller selectivity.
+					 * This option makes decision more conservative and looks
+					 * debatable.
+

A typo in a messsage?

2023-12-21 Thread Kyotaro Horiguchi
I found the following message introduced by a recent commit.

> errdetail("The first unsummarized LSN is this range is %X/%X.",
 
Shouldn't the "is" following "LSN" be "in"?


diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 42bbe564e2..22b861ce52 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -575,7 +575,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 								  tle->tli,
 								  LSN_FORMAT_ARGS(tli_start_lsn),
 								  LSN_FORMAT_ARGS(tli_end_lsn)),
-						errdetail("The first unsummarized LSN is this range is %X/%X.",
+						errdetail("The first unsummarized LSN in this range is %X/%X.",
								  LSN_FORMAT_ARGS(tli_missing_lsn)));
 	}
 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Transaction timeout

2023-12-21 Thread Japin Li

On Tue, 19 Dec 2023 at 22:06, Japin Li  wrote:
> On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin  wrote:
>>> On 19 Dec 2023, at 13:26, Andrey M. Borodin  wrote:
>>>
>>> I don’t have Windows machine, so I hope CF bot will pick this.
>>
>> I used GitHub CI to produce a version of the tests that seems to be stable
>> on Windows.
>
> It still failed on Windows Server 2019 [1].
>
> diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out 
> C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out
> --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 
> 10:34:30.354721100 +
> +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out  
> 2023-12-19 10:38:25.877981600 +
> @@ -100,7 +100,7 @@
>  step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE 
> application_name = 'isolation/timeouts/stt2'
>  count
>  -
> -0
> +1
>  (1 row)
>
>  step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET 
> statement_timeout = '10s'; SET lock_timeout = '10s'; SET transaction_timeout 
> = '10s';
>
> [1] 
> https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs

Hi,

I tried to split the test for transaction timeout, and all tests passed on my CI [1].

OTOH, I found that if I set transaction_timeout inside a transaction, it does
not take effect immediately.  For example:

[local]:2049802 postgres=# BEGIN;
BEGIN
[local]:2049802 postgres=*# SET transaction_timeout TO '1s';
SET
[local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;  -- wait 10s
   relname
--
 pg_statistic
(1 row)

[local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;
FATAL:  terminating connection due to transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

It looks odd.  Is this expected?  I haven't read the whole thread;
am I missing something?

[1] https://cirrus-ci.com/build/6574686130143232

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

>From fb87e5fe2ea5ced51a7e443243cdd40115423449 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" 
Date: Sun, 3 Dec 2023 23:18:00 +0500
Subject: [PATCH v13 1/1] Introduce transaction_timeout

This commit adds a timeout that is expected to be used as a safeguard
against long-running queries. Any session that stays in a transaction
longer than this timeout will be terminated.

However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.

Author: Andrey Borodin 
Reviewed-by: Nikolay Samokhvalov 
Reviewed-by: Andres Freund 
Reviewed-by: Fujii Masao 
Reviewed-by: bt23nguyent 
Reviewed-by: Yuhang Qiu 
Reviewed-by: Japin Li 

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 35 +++
 src/backend/postmaster/autovacuum.c   |  2 +
 src/backend/storage/lmgr/proc.c   |  1 +
 src/backend/tcop/postgres.c   | 27 +++-
 src/backend/utils/errcodes.txt|  1 +
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 10 +++
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/bin/pg_dump/pg_backup_archiver.c  |  2 +
 src/bin/pg_dump/pg_dump.c |  2 +
 src/bin/pg_rewind/libpq_source.c  |  1 +
 src/include/miscadmin.h   |  1 +
 src/include/storage/proc.h|  1 +
 src/include/utils/timeout.h   |  1 +
 src/test/isolation/Makefile   |  5 +-
 src/test/isolation/expected/timeouts.out  | 63 ++-
 src/test/isolation/specs/timeouts.spec| 30 +
 18 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5624ca884..d62edcf83b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9134,6 +9134,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  transaction_timeout (integer)
+  
+   transaction_timeout configuration parameter
+  
+  
+  
+   
+Terminate any session that spans longer than the specified amount of
+time in transaction. The limit applies both to explicit transactions
+(started with BEGIN) and to implicitly started
+transaction corresponding to single statement. But this limit is not
+applied to prepared transactions.
+If this value is specified without units, it is taken as milliseconds.
+A value of zero (the default) disables the timeout.
+   
+
+   
+If transaction_timeout is shorter than
+ 

Re: trying again to get incremental backup

2023-12-21 Thread Alexander Lakhin

21.12.2023 23:43, Robert Haas wrote:
>> There are also two deadcode.DeadStores complaints from clang. First one is
>> about:
>>   /*
>>    * Align the wait time to prevent drift. This doesn't really matter,
>>    * but we'd like the warnings about how long we've been waiting to say
>>    * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
>>    * drifting to something that is not a multiple of ten.
>>    */
>>   timeout_in_ms -=
>>   TimestampDifferenceMilliseconds(current_time, initial_time) %
>>   timeout_in_ms;
>> It looks like this timeout is really not used.
>
> Oops. It should be. See attached.


My quick experiment shows that that TimestampDifferenceMilliseconds call
always returns zero, due to its arguments being swapped.
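
For illustration, the reason swapped arguments yield zero is that the
function clamps negative differences.  A minimal stand-alone rendering of
that behaviour (not the backend code itself; its exact overflow handling is
omitted here) looks like this:

#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in the backend */

static long
timestamp_difference_ms(TimestampTz start_time, TimestampTz stop_time)
{
    TimestampTz diff = stop_time - start_time;

    if (diff <= 0)
        return 0;               /* negative elapsed time is clamped to zero */
    return (long) ((diff + 999) / 1000);
}

/*
 * With the arguments swapped, start_time is the *later* timestamp, the
 * difference is negative, and the result is always 0 - so the "align the
 * wait time" adjustment above never does anything.
 */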

The other changes look good to me.

Thank you!

Best regards,
Alexander




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Junwang Zhao
On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada  wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>* Add CopyToState{Get,Set}Opaque()
>
> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>* Add CopyToStateGetAttNumList()
>
> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>
> 3. Export CopySend*()
>* Rename CopySend*() to CopyToStateSend*() and export them
>* Exception: CopySendEndOfRow() to CopyToStateFlush() because
>  it just flushes the internal buffer now.
>
> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>
I guess the purpose of these helpers is to avoid exposing CopyToState in
copy.h, but I think exposing CopyToState to users might make life easier;
users might want to use the memory contexts of the structure (though I agree
not all the fields are necessary for extension handlers).

> The attached patch is based on the Sawada-san's patch and
> includes the above changes. Note that this patch is also
> dirty so just for discussion.
>
> My suggestions from this experience:
>
> 1. Split COPY handler to COPY TO handler and COPY FROM handler
>
>* CopyFormatRoutine is a bit tricky. An extension needs
>  to create a CopyFormatRoutine node and
>  a CopyToFormatRoutine node.
>
>* If we just require "copy_to_${FORMAT}(internal)"
>  function and "copy_from_${FORMAT}(internal)" function,
>  we can remove the tricky approach. And it also avoid
>  name collisions with other handler such as tablesample
>  handler.
>  See also:
>  
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> 2. Need an opaque space like IndexScanDesc::opaque does
>
>* A custom COPY TO handler needs to keep its data

I once thought users might want to parse their own options; maybe that
is a use case for this opaque space.

For the name, I thought private_data might be a better candidate than
opaque, but I don't insist on it.
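
As an illustration of what such an opaque (or private_data) slot buys a
handler, here is a hedged sketch.  The accessor names come from the patch
description quoted above, but their signatures and the callback shapes used
here are assumptions for illustration only, not the actual proposed API:

typedef struct MyFormatState
{
    StringInfoData buf;         /* per-row output buffer */
    int64       rows_emitted;
} MyFormatState;

static void
myformat_copyto_start(CopyToState cstate, TupleDesc tupdesc)
{
    MyFormatState *state = (MyFormatState *) palloc0(sizeof(MyFormatState));

    initStringInfo(&state->buf);
    CopyToStateSetOpaque(cstate, state);    /* stash handler-private data */
}

static void
myformat_copyto_end(CopyToState cstate)
{
    MyFormatState *state = (MyFormatState *) CopyToStateGetOpaque(cstate);

    elog(DEBUG1, "myformat: emitted %lld rows", (long long) state->rows_emitted);
    pfree(state->buf.data);
    pfree(state);
}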
>
> 3. Export CopySend*()
>
>* If we like minimum API, we just need to export
>  CopySendData() and CopySendEndOfRow(). But
>  CopySend{String,Char,Int32,Int16}() will be convenient
>  custom COPY TO handlers. (A custom COPY TO handler for
>  Apache Arrow doesn't need them.)

Do you use the Arrow library to control the memory? Is there a way we
can make Arrow use postgres' memory contexts? I'm not sure this is
necessary; I'm just raising the question for discussion.
>
> Questions:
>
> 1. What value should be used for "format" in
>PgMsg_CopyOutResponse message?
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
>
>It's 1 for binary format and 0 for text/csv format.
>
>Should we make it customizable by custom COPY TO handler?
>If so, what value should be used for this?
>
> 2. Do we need more tries for design discussion for the first
>implementation? If we need, what should we try?
>
>
> Thanks,
> --
> kou

+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+ CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
+

extra semicolon.

-- 
Regards
Junwang Zhao




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Amit Kapila
On Fri, Dec 22, 2023 at 5:00 AM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> > On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> >> We can return int2 value from the function pg_get_replication_slots()
> >> and then use that to display a string in the view
> >> pg_replication_slots.
> >
> > I strongly dislike that pattern. It just leads to complicated views - and
> > doesn't provide a single benefit that I am aware of. It's much better to
> > simply populate the text version in pg_get_replication_slots().
>
> I agree that this is a better integration in the view, and that's what
> I would do FWIW.
>
> Amit, how much of a problem would it be to do a text->enum mapping
> when synchronizing the slots from a primary to a standby?
>

There is no problem as such with that. We were trying to see if there is
a more convenient way, but let's move ahead by having the cause as text in
both the function and the view, as that seems to be the preferred way.

-- 
With Regards,
Amit Kapila.




Re: int4->bool test coverage

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 11:56:22AM +0100, Christoph Berg wrote:
> The first cast is the int4_bool function, but it isn't covered by the
> regression tests at all. The attached patch adds tests.

I don't see why not.

Interesting that there are a few more of these in int.c, like int2up,
int4inc, int2smaller, int{2,4}shr, int{2,4}not, etc.
--
Michael


signature.asc
Description: PGP signature


Re: ci: Build standalone INSTALL file

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 02:22:02PM -0500, Tom Lane wrote:
> Here's a draft patch for this.  Most of it is mechanical removal of
> infrastructure for building the INSTALL file.  If anyone wants to
> bikeshed on the new wording of README, feel free.

Thanks for putting this together.  That looks reasonable.

> diff --git a/README b/README
> index 56d0c951a9..e40e610ccb 100644
> --- a/README
> +++ b/README
> @@ -9,14 +9,13 @@ that supports an extended subset of the SQL standard, 
> including
> -See the file INSTALL for instructions on how to build and install
> -PostgreSQL.  That file also lists supported operating systems and
> -hardware platforms and contains information regarding any other
> -software packages that are required to build or run the PostgreSQL
> -system.  Copyright and license information can be found in the
> -file COPYRIGHT.  A comprehensive documentation set is included in this
> -distribution; it can be read as described in the installation
> -instructions.
> +Copyright and license information can be found in the file COPYRIGHT.
> +
> +General documentation about this version of PostgreSQL can be found at:
> +https://www.postgresql.org/docs/devel/
> +In particular, information about building PostgreSQL from the source
> +code can be found at:
> +https://www.postgresql.org/docs/devel/installation.html

Sounds fine by me, including the extra step documented in
RELEASE_CHANGES.  No information is lost.
--
Michael


signature.asc
Description: PGP signature


Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Masahiko Sawada
On Thu, Dec 21, 2023 at 6:35 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada  wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>* Add CopyToState{Get,Set}Opaque()
>
> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>* Add CopyToStateGetAttNumList()
>
> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

I think we can move CopyToState to copy.h and we don't need to have
set/get functions for its fields.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Masahiko Sawada
On Fri, Dec 22, 2023 at 10:00 AM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
> >* If we just require "copy_to_${FORMAT}(internal)"
> >  function and "copy_from_${FORMAT}(internal)" function,
> >  we can remove the tricky approach. And it also avoid
> >  name collisions with other handler such as tablesample
> >  handler.
> >  See also:
> >  
> > https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> Hmm.  I prefer the unique name approach for the COPY portions without
> enforcing any naming policy on the function names returning the
> handlers, actually, though I can see your point.

Yeah, another idea is to provide support functions to return a
CopyFormatRoutine wrapping either CopyToFormatRoutine or
CopyFromFormatRoutine. For example:

extern CopyFormatRoutine *MakeCopyToFormatRoutine(const CopyToFormatRoutine *routine);

extensions can do something like:

static const CopyToFormatRoutine testfmt_handler = {
.type = T_CopyToFormatRoutine,
.start_fn = testfmt_copyto_start,
.onerow_fn = testfmt_copyto_onerow,
.end_fn = testfmt_copyto_end
};

Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
CopyFormatRoutine *routine = MakeCopyToFormatRoutine(&testfmt_handler);
:

>
> > 2. Need an opaque space like IndexScanDesc::opaque does
> >
> >* A custom COPY TO handler needs to keep its data
>
> Sounds useful to me to have a private area passed down to the
> callbacks.
>

+1

>
> > Questions:
> >
> > 1. What value should be used for "format" in
> >PgMsg_CopyOutResponse message?
> >
> >It's 1 for binary format and 0 for text/csv format.
> >
> >Should we make it customizable by custom COPY TO handler?
> >If so, what value should be used for this?
>
> Interesting point.  It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1.  Hence it feels natural to have a callback
> for that.

+1

>
> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step.  Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly.  Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.
>
> > 2. Do we need more tries for design discussion for the first
> >implementation? If we need, what should we try?
>
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler.  I would have assumed that this
> stuff returns a handler as a const struct like table AMs.

+1

The example I mentioned above does that.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
>* If we just require "copy_to_${FORMAT}(internal)"
>  function and "copy_from_${FORMAT}(internal)" function,
>  we can remove the tricky approach. And it also avoid
>  name collisions with other handler such as tablesample
>  handler.
>  See also:
>  
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

Hmm.  I prefer the unique name approach for the COPY portions without
enforcing any naming policy on the function names returning the
handlers, actually, though I can see your point.

> 2. Need an opaque space like IndexScanDesc::opaque does
> 
>* A custom COPY TO handler needs to keep its data

Sounds useful to me to have a private area passed down to the
callbacks.

> 3. Export CopySend*()
> 
>* If we like minimum API, we just need to export
>  CopySendData() and CopySendEndOfRow(). But
>  CopySend{String,Char,Int32,Int16}() will be convenient
>  custom COPY TO handlers. (A custom COPY TO handler for
>  Apache Arrow doesn't need them.)

Hmm.  Not sure on this one.  This may come down to externalizing the
manipulation of fe_msgbuf.  Particularly, could it be possible that
some custom formats don't care at all about the network order?

> Questions:
> 
> 1. What value should be used for "format" in
>PgMsg_CopyOutResponse message?
> 
>It's 1 for binary format and 0 for text/csv format.
> 
>Should we make it customizable by custom COPY TO handler?
>If so, what value should be used for this?

Interesting point.  It looks very tempting to give more flexibility to
people who'd like to use their own code as we have one byte in the
protocol but just use 0/1.  Hence it feels natural to have a callback
for that.

It also means that we may want to think harder about copy_is_binary in
libpq in the future step.  Now, having a backend implementation does
not need any libpq bits, either, because a client stack may just want
to speak the Postgres protocol directly.  Perhaps a custom COPY
implementation would be OK with how things are in libpq, as well,
tweaking its way through with just text or binary.

> 2. Do we need more tries for design discussion for the first
>implementation? If we need, what should we try?

A makeNode() is used with an allocation in the current memory context
in the function returning the handler.  I would have assumed that this
stuff returns a handler as a const struct like table AMs.
--
Michael


signature.asc
Description: PGP signature


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
>> We can return int2 value from the function pg_get_replication_slots()
>> and then use that to display a string in the view
>> pg_replication_slots.
> 
> I strongly dislike that pattern. It just leads to complicated views - and
> doesn't provide a single benefit that I am aware of. It's much better to
> simply populate the text version in pg_get_replication_slots().

I agree that this is a better integration in the view, and that's what
I would do FWIW.

Amit, how much of a problem would it be to do a text->enum mapping
when synchronizing the slots from a primary to a standby?  Sure you
could have a system function that does some of the mapping work, but I
am not sure what's the best integration when it comes to the other
patch.
--
Michael


signature.asc
Description: PGP signature


Re: Remove MSVC scripts from the tree

2023-12-21 Thread Michael Paquier
On Thu, Dec 21, 2023 at 03:43:32PM -0500, Andrew Dunstan wrote:
> On 2023-12-21 Th 03:01, Michael Paquier wrote:
>> Andrew, was the original target of pgperlsyncheck committers and
>> hackers who played with the MSVC scripts but could not run sanity
>> checks on Windows (see [1])?
> 
> 
> yes.

Okay, thanks.  Wouldn't it be better to remove it in the end?  With
the main use case behind its introduction being gone, it is less
attractive to keep maintaining it.  If some people have been using it
in their workflows, I'm OK to keep it but the rest of the tree can be
checked at runtime as well.

> I'm actually a bit dubious about win32tzlist.pl. Win32::Registry is not
> present in a recent Strawberry Perl installation, and its latest version
> says it is obsolete, although it's still included in the cpan bundle
> libwin32.
> 
> I wonder who has actually run the script any time recently?

Hmm...  I've never run it with meson on Win32.

> In any case, we can probably work around the syncheck issue by making the
> module a runtime requirement rather than a compile time requirement, by
> using "require" instead of "use".

Interesting.  Another trick would be needed for HKEY_LOCAL_MACHINE,
like what the dummylib does, but local to win32tzlist.pl.  Roughly along
these lines:
-use Win32::Registry;
+use Config;
+
+require Win32::Registry;
 
 my $tzfile = 'src/bin/initdb/findtimezone.c';
 
+if ($Config{osname} ne 'MSWin32' && $Config{osname} ne 'msys')
+{
+   use vars qw($HKEY_LOCAL_MACHINE);
+}
--
Michael


signature.asc
Description: PGP signature


Re: broken master regress tests

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 17:48 -0800, Jeff Davis wrote:
> Attached.

It appears to increase the coverage. I committed it and I'll see how
the buildfarm reacts.

Regards,
Jeff Davis





Re: Built-in CTYPE provider

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 15:47 -0800, Jeremy Schneider wrote:

> One other thing that comes to mind: how does the parser do case
> folding
> for relation names? Is that using OS-provided libc as of today? Or
> did
> we code it to use ICU if that's the DB default? I'm guessing libc,
> and
> global catalogs probably need to be handled in a consistent manner,
> even
> across different encodings.

The code is in downcase_identifier():

  /*  
   * SQL99 specifies Unicode-aware case normalization, which we don't 
   * yet have the infrastructure for...
   */
  if (ch >= 'A' && ch <= 'Z')
ch += 'a' - 'A';
  else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
ch = tolower(ch);
  result[i] = (char) ch;

My proposal would add the infrastructure that the comment above says is
missing.

It seems like we should be using the database collation at this point
because you don't want inconsistency between the catalogs and the
parser here. Then again, the SQL spec doesn't seem to support tailoring
of case conversions, so maybe we are avoiding it for that reason? Or
maybe we're avoiding catalog access? Or perhaps the work for ICU just
wasn't done here yet?

> (Kind of related... did you ever see the demo where I create a user
> named
> '' and then I try to connect to a database with non-unicode
> encoding?
>   ...at least it seems to be able to walk the index without
> decoding
> strings to find other users - but the way these global catalogs work
> scares me a little bit)

I didn't see that specific demo, but in general we seem to change
between pg_wchar and unicode code points too freely, so I'm not
surprised that something went wrong.

Regards,
Jeff Davis





Re: Built-in CTYPE provider

2023-12-21 Thread Jeff Davis
On Wed, 2023-12-20 at 16:29 -0800, Jeremy Schneider wrote:
> found some more. here's my running list of everything user-facing I
> see
> in core PG code so far that might involve case:
> 
> * upper/lower/initcap
> * regexp_*() and *_REGEXP()
> * ILIKE, operators ~* !~* ~~ !~~ ~~* !~~*
> * citext + replace(), split_part(), strpos() and translate()
> * full text search - everything is case folded
> * unaccent? not clear to me whether CTYPE includes accent folding

No, ctype has nothing to do with accents as far as I can tell. I don't
know if I'm using the right terminology, but I think "case" is a
variant of a character whereas "accent" is a modifier/mark, and the
mark is a separate concept from the character itself.

> * ltree
> * pg_trgm
> * core PG parser, case folding of relation names

Let's separate it into groups.

(1) Callers that use a collation OID or pg_locale_t:

  * collation & hashing
  * upper/lower/initcap
  * regex, LIKE, formatting
  * pg_trgm (which uses regexes)
  * maybe postgres_fdw, but might just be a passthrough
  * catalog cache (always uses DEFAULT_COLLATION_OID)
  * citext (always uses DEFAULT_COLLATION_OID, but probably shouldn't)

(2) A long tail of callers that depend on what LC_CTYPE/LC_COLLATE are
set to, or use ad-hoc ASCII-only semantics:

  * core SQL parser downcase_identifier()
  * callers of pg_strcasecmp() (DDL, etc.)
  * GUC name case folding
  * full text search ("mylocale = 0 /* TODO */")
  * a ton of stuff uses isspace(), isdigit(), etc.
  * various callers of tolower()/toupper()
  * some selfuncs.c stuff
  * ...

Might have missed some places.

The user impact of a new builtin provider would affect (1), but only
for those actually using the provider. So there's no compatibility risk
there, but it's good to understand what it will affect.

We can, on a case-by-case basis, also consider using the new APIs I'm
proposing for instances of (2). There would be some compatibility risk
there for existing callers, and we'd have to consider whether it's
worth it or not. Ideally, new callers would either use the new APIs or
use the pg_ascii_* APIs.

Regards,
Jeff Davis





Re: pg_serial bloat

2023-12-21 Thread Thomas Munro
On Fri, Dec 15, 2023 at 9:53 AM Thomas Munro  wrote:
> ... We've seen a system with ~30GB of files in there
> (note: full/untruncated it would be 2³² xids × sizeof(uint64_t) =
> 32GB).  It's not just a gradual disk space leak: according to disk
> space monitoring, this system suddenly wrote ~half of that data, which
> I think must be the while loop in SerialAdd() zeroing out pages.

Attempt at an analysis of this rare anti-social I/O pattern:

SerialAdd() writes zero pages in a range from the old headPage up to
some target page, but headPage can be any number, arbitrarily far in
the past (or apparently, the future).  It only keeps up with the
progress of the xid clock and spreads that work out if we happen to
call SerialAdd() often enough.  If we call SerialAdd() only every
couple of billion xids (eg very occasionally you leave a transaction
open and go out to lunch on a very busy system using SERIALIZABLE
everywhere), you might find yourself suddenly needing to write out
many gigabytes of zeroes there.

One observation is that headPage gets periodically zapped to -1 by
checkpoints, near the comment "SLRU is no longer needed", providing a
periodic dice-roll that chops the range down.  Unfortunately the
historical "apparent wraparound" bug prevents that from being reached.
That bug was fixed by commit d6b0c2b (master only, no back-patch).  On
the system where we saw pg_serial going bananas, that message appeared
regularly.

Attempts to find a solution:

I think it might make sense to clamp firstZeroPage into the page range
implied by tailXid, headXid.  Those values are eagerly maintained and
interlock with snapshots and global xmin (correctly but
under-documented-ly, AFAICS so far), and we will never try to look up
the CSN for any xid outside that range.  I think that should exclude
the pathological zero-writing cases.  I wouldn't want to do this
without a working reproducer though, which will take some effort.
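
A very rough, self-contained sketch of the clamping idea (not a patch
against predicate.c: the page arithmetic, names and constants are simplified
stand-ins and deliberately ignore xid wraparound):

#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for the real typedef */

#define ENTRIES_PER_PAGE 1024       /* illustrative, not the real constant */

static int64_t
serial_page_of(TransactionId xid)
{
    return (int64_t) xid / ENTRIES_PER_PAGE;
}

/*
 * Never start zeroing before the page that holds tailXid: the SLRU is only
 * consulted for xids in the tailXid..headXid window, so pages before it
 * would be zeroed for nothing (potentially gigabytes of wasted I/O).
 */
static int64_t
clamp_first_zero_page(int64_t first_zero_page, TransactionId tail_xid)
{
    int64_t     tail_page = serial_page_of(tail_xid);

    return (first_zero_page < tail_page) ? tail_page : first_zero_page;
}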

Another thought is that in the glorious 64 bit future, we might be
able to invent a "sparse" SLRU, where if the file or page doesn't
exist, we just return a zero CSN, and when we write a new page we just
let the OS provide filesystem holes as required.  The reason I
wouldn't want to invent sparse SLRUs with 32 bit indexing is that we
have no confidence in the truncation logic, which might leave stray
files from earlier epochs.  So I think we need zero'd pages (or
perhaps at least to confirm that there is nothing already there, but I
have zero desire to make the current wraparound-ridden system more
complex).




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2023-12-21 Thread Melanie Plageman
On Fri, Nov 17, 2023 at 6:12 PM Melanie Plageman
 wrote:
>
> On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
>  wrote:
> > When there are no indexes on the relation, we can set would-be dead
> > items LP_UNUSED and remove them during pruning. This saves us a vacuum
> > WAL record, reducing WAL volume (and time spent writing and syncing
> > WAL).
> ...
> > Note that (on principle) this patch set is on top of the bug fix I
> > proposed in [1].
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
>
> Rebased on top of fix in b2e237afddc56a and registered for the january fest
> https://commitfest.postgresql.org/46/4665/

I got an off-list question about whether or not this codepath is
exercised in existing regression tests. It is -- vacuum.sql tests
include those which vacuum a table with no indexes and tuples that can
be deleted.

I also looked through [1] to see if there were any user-facing docs
which happened to mention the exact implementation details of how and
when tuples are deleted by vacuum. I didn't see anything like that, so
I don't think there are user-facing docs which need updating.

- Melanie

[1] https://www.postgresql.org/docs/devel/routine-vacuuming.html




Re: Eager page freeze criteria clarification

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:56 AM Melanie Plageman
 wrote:
> Agreed. I plan to test with another distribution. Though, the exercise
> of determining which ones are useful is probably more challenging.
> I imagine we will have to choose one distribution (as opposed to
> supporting different distributions and choosing based on data access
> patterns for a table). Though, even with a normal distribution, I
> think it should be an improvement.

Our current algorithm isn't adaptive at all, so I like our chances of
coming out ahead. It won't surprise me if somebody finds a case where
there is a regression, but if we handle some common and important
cases correctly (e.g. append-only, update-everything-nonstop) then I
think we're probably ahead even if there are some cases where we do
worse. It does depend on how much worse they are, and how realistic
they are, but we don't want to be too fearful here: we know what we're
doing right now isn't too great.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eager page freeze criteria clarification

2023-12-21 Thread Melanie Plageman
On Wed, Dec 13, 2023 at 12:24 PM Robert Haas  wrote:
>
> Great results.

Thanks!

> On Sat, Dec 9, 2023 at 5:12 AM Melanie Plageman
>  wrote:
> > Values can be "removed" from the accumulator by simply decrementing its
> > cardinality and decreasing the sum and sum squared by a value that will
> > not change the mean and standard deviation of the overall distribution.
> > To adapt to a table's changing access patterns, we'll need to remove
> > values from this accumulator over time, but this patch doesn't yet
> > decide when to do this. A simple solution may be to cap the cardinality
> > of the accumulator to the greater of 1% of the table size, or some fixed
> > number of values (perhaps 200?). Even without such removal of values,
> > the distribution recorded in the accumulator will eventually skew toward
> > more recent data, albeit at a slower rate.
>
> I think we're going to need something here. Otherwise, after 6 months
> of use, changing a table's perceived access pattern will be quite
> difficult.
>
> I think one challenge here is to find something that doesn't decay too
> often and end up with cases where it basically removes all the data.
>
> As long as you avoid that, I suspect that the algorithm might not be
> terribly sensitive to other kinds of changes. If you decay after 200
> values or 2000 or 20,000, it will only affect how fast we can change
> our notion of the access pattern, and my guess would be that any of
> those values would produce broadly acceptable results, with some
> differences in the details. If you decay after 200,000,000 values or
> not at all, then I think there will be problems.

I'll add the decay logic and devise a benchmark that will exercise it.
I can test at least one or two of these ideas.
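
For reference, a minimal self-contained sketch of the mean- and
stddev-preserving removal described above (the struct and function names are
illustrative, not taken from the patch):

typedef struct UnfreezeAccumulator
{
    double      n;              /* cardinality */
    double      sum;            /* sum of observed values */
    double      sumsq;          /* sum of squared values */
} UnfreezeAccumulator;

static void
accum_add(UnfreezeAccumulator *acc, double value)
{
    acc->n += 1.0;
    acc->sum += value;
    acc->sumsq += value * value;
}

/*
 * Drop one synthetic observation while preserving mean and stddev:
 * subtracting sum/n keeps the mean, subtracting sumsq/n keeps sumsq/n and
 * therefore the variance, while the cardinality shrinks by one.
 */
static void
accum_decay_one(UnfreezeAccumulator *acc)
{
    if (acc->n <= 1.0)
        return;                 /* nothing sensible to remove */
    acc->sum -= acc->sum / acc->n;
    acc->sumsq -= acc->sumsq / acc->n;
    acc->n -= 1.0;
}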

> > The goal is to keep pages frozen for at least target_freeze_duration.
> > target_freeze_duration is in seconds and pages only have a last
> > modification LSN, so target_freeze_duration must be converted to LSNs.
> > To accomplish this, I've added an LSNTimeline data structure, containing
> > XLogRecPtr, TimestampTz pairs stored with decreasing precision as they
> > age. When we need to translate the guc value to LSNs, we linearly
> > interpolate it on this timeline. For the time being, the global
> > LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There
> > is no reason it can't be updated with some other cadence and/or by some
> > other process (nothing about it is inherently tied to vacuum). The
> > cached translated value of target_freeze_duration is stored in each
> > table's stats. This is arbitrary as it is not a table-level stat.
> > However, it needs to be located somewhere that is accessible on
> > update/delete. We may want to recalculate it more often than once per
> > table vacuum, especially in case of long-running vacuums.
>
> This part sounds like it isn't quite baked yet. The idea of the data
> structure seems fine, but updating it once per vacuum sounds fairly
> unprincipled to me? Don't we want the updates to happen on a somewhat
> regular wall clock cadence?

Yes, this part was not fully baked. I actually discussed this with
Andres at PGConf EU last week, and he suggested that the background writer
update the LSNTimeline. He also suggested I propose the LSNTimeline in
a new thread. I could add a pageinspect function returning the
estimated time of last page modification given the page LSN (so that the
proposal comes with a user of it).
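
For what it's worth, the interpolation step itself is tiny; here is a
self-contained sketch (types and names are illustrative stand-ins, not the
proposed LSNTimeline API):

#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-ins for the real typedefs */
typedef int64_t TimestampTz;

typedef struct LSNTimePair
{
    XLogRecPtr  lsn;
    TimestampTz time;
} LSNTimePair;

/*
 * Estimate the LSN corresponding to "target", given two known (lsn, time)
 * points that bracket it, by linear interpolation.
 */
static XLogRecPtr
interpolate_lsn(const LSNTimePair *older, const LSNTimePair *newer,
                TimestampTz target)
{
    double      fraction;

    if (newer->time == older->time)
        return newer->lsn;      /* avoid division by zero */

    fraction = (double) (target - older->time) /
        (double) (newer->time - older->time);

    return older->lsn + (XLogRecPtr) (fraction * (double) (newer->lsn - older->lsn));
}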

- Melanie




Re: Remove MSVC scripts from the tree

2023-12-21 Thread Andrew Dunstan



On 2023-12-21 Th 03:01, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 11:39:15PM -0800, Andres Freund wrote:
>> Can't we teach the tool that it should not validate src/tools/win32tzlist.pl
>> on !windows? It's obviously windows specific code, and it's special case
>> enough that there doesn't seem like a need to develop it on !windows.
>
> I am not really excited about keeping a dummy library for the sake of
> a script checking if this WIN32-only file is correctly written, and
> I've never used pgperlsyncheck, TBH, since it exists in af616ce48347.
> Anyway, we could just tweak the list of files returned by
> find_perl_files as win32tzlist.pl is valid for perltidy and
> perlcritic.
>
> Andrew, was the original target of pgperlsyncheck committers and
> hackers who played with the MSVC scripts but could not run sanity
> checks on Windows (see [1])?


yes.


> There are a few more cases like the
> Unicode scripts or some of the stuff in src/tools/ where that can be
> useful still these are not touched on a daily basis.  The rest of the
> pm files are for TAP tests, one for Unicode.  I'm OK to tweak the
> script, still, if its main purpose is gone..
>
> [1]: 
> https://www.postgresql.org/message-id/f3c12e2c-618f-cb6f-082b-a2f604dbe...@2ndquadrant.com



I'm actually a bit dubious about win32tzlist.pl. Win32::Registry is not 
present in a recent Strawberry Perl installation, and its latest version 
says it is obsolete, although it's still included in the cpan bundle 
libwin32.


I wonder who has actually run the script any time recently?

In any case, we can probably work around the syncheck issue by making 
the module a runtime requirement rather than a compile time requirement, 
by using "require" instead of "use".



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: trying again to get incremental backup

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:00 AM Alexander Lakhin  wrote:
> Please look at the attached patch; it corrects all 29 items ("recods"
> fixed in two places), but maybe you find some substitutions wrong...

Thanks, committed with a few additions.

> I've also observed that those commits introduced new warnings:
> $ CC=gcc-12 CPPFLAGS="-Wtype-limits" ./configure -q && make -s -j8
> reconstruct.c: In function ‘read_bytes’:
> reconstruct.c:511:24: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>511 | if (rb < 0)
>|^
> reconstruct.c: In function ‘write_reconstructed_file’:
> reconstruct.c:650:40: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>650 | if (rb < 0)
>|^
> reconstruct.c:662:32: warning: comparison of unsigned expression in ‘< 0’ is 
> always false [-Wtype-limits]
>662 | if (wb < 0)

Oops. I think the variables should be type int. See attached.

> There are also two deadcode.DeadStores complaints from clang. First one is
> about:
>  /*
>   * Align the wait time to prevent drift. This doesn't really matter,
>   * but we'd like the warnings about how long we've been waiting to 
> say
>   * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
>   * drifting to something that is not a multiple of ten.
>   */
>  timeout_in_ms -=
>  TimestampDifferenceMilliseconds(current_time, initial_time) %
>  timeout_in_ms;
> It looks like this timeout is really not used.

Oops. It should be. See attached.

> And the minor one (similar to many existing, maybe doesn't deserve fixing):
> walsummarizer.c:808:5: warning: Value stored to 'summary_end_lsn' is never 
> read [deadcode.DeadStores]
>  summary_end_lsn = 
> private_data->read_upto;
>  ^ ~~~

It kind of surprises me that this is dead, but it seems best to keep
it there to be on the safe side, in case some change to the logic
renders it not dead in the future.

> >> Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
> >> comment above redo_pointer_at_last_summary_removal declaration, but
> >> perhaps it should say about removing summaries instead?
> > Wow, yeah. Thanks, will fix.
>
> Thank you for paying attention to it!

I'll fix this next.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


fix-ib-thinkos.patch
Description: Binary data


Re: Fixing backslash dot for COPY FROM...CSV

2023-12-21 Thread Daniel Verite
vignesh C wrote:

> Thanks for the updated patch, any reason why this is handled only in csv.
> postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
> COPY 1
> postgres=# select * from test1;
>  c1
> ---
> line1
> (1 row)

I believe it's safer not to change anything in the normal "non-csv"
text mode.
The current doc says that \. will not be taken as data in this format.
From https://www.postgresql.org/docs/current/sql-copy.html :

   Any other backslashed character that is not mentioned in the above
   table will be taken to represent itself. However, beware of adding
   backslashes unnecessarily, since that might accidentally produce a
   string matching the end-of-data marker (\.) or the null string (\N
   by default). These strings will be recognized before any other
   backslash processing is done.



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 10:46:02 -0500, Tom Lane wrote:
>> Let's go with "/devel/ in master and a number in release branches"
>> for now, and tweak that if the web team wants to take on maintaining
>> a redirect.  I'll put together a concrete patch proposal in a little
>> bit.

> Cool.

Here's a draft patch for this.  Most of it is mechanical removal of
infrastructure for building the INSTALL file.  If anyone wants to
bikeshed on the new wording of README, feel free.

regards, tom lane

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 80db4c73f8..eba569e930 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -109,10 +109,7 @@ distdir:
 	  || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
 	  fi || exit; \
 	done
-	$(MAKE) -C $(distdir)/doc/src/sgml/ INSTALL
-	cp $(distdir)/doc/src/sgml/INSTALL $(distdir)/
 	$(MAKE) -C $(distdir) distclean
-	rm -f $(distdir)/README.git
 
 distcheck: dist
 	rm -rf $(dummy)
diff --git a/Makefile b/Makefile
index c66fb3027b..9bc1a4ec17 100644
--- a/Makefile
+++ b/Makefile
@@ -17,13 +17,7 @@ all:
 
 all check install installdirs installcheck installcheck-parallel uninstall clean distclean maintainer-clean dist distcheck world check-world install-world installcheck-world:
 	@if [ ! -f GNUmakefile ] ; then \
-	   if [ -f INSTALL ] ; then \
-	 INSTRUCTIONS="INSTALL"; \
-	   else \
-	 INSTRUCTIONS="README.git"; \
-	   fi; \
-	   echo "You need to run the 'configure' program first. See the file"; \
-	   echo "'$$INSTRUCTIONS' for installation instructions, or visit: " ; \
+	   echo "You need to run the 'configure' program first. Please see"; \
 	   echo "" ; \
 	   false ; \
 	 fi
diff --git a/README b/README
index 56d0c951a9..e40e610ccb 100644
--- a/README
+++ b/README
@@ -9,14 +9,13 @@ that supports an extended subset of the SQL standard, including
 transactions, foreign keys, subqueries, triggers, user-defined types
 and functions.  This distribution also contains C language bindings.
 
-See the file INSTALL for instructions on how to build and install
-PostgreSQL.  That file also lists supported operating systems and
-hardware platforms and contains information regarding any other
-software packages that are required to build or run the PostgreSQL
-system.  Copyright and license information can be found in the
-file COPYRIGHT.  A comprehensive documentation set is included in this
-distribution; it can be read as described in the installation
-instructions.
+Copyright and license information can be found in the file COPYRIGHT.
+
+General documentation about this version of PostgreSQL can be found at:
+https://www.postgresql.org/docs/devel/
+In particular, information about building PostgreSQL from the source
+code can be found at:
+https://www.postgresql.org/docs/devel/installation.html
 
 The latest version of this software, and related software, may be
 obtained at https://www.postgresql.org/download/.  For more information
diff --git a/README.git b/README.git
deleted file mode 100644
index 4bf614eea4..00
--- a/README.git
+++ /dev/null
@@ -1,14 +0,0 @@
-(This file does not appear in release tarballs.)
-
-In a release or snapshot tarball of PostgreSQL, a documentation file named
-INSTALL will appear in this directory.  However, this file is not stored in
-git and so will not be present if you are using a git checkout.
-
-If you are using a git checkout, you can view the most recent installation
-instructions at:
-	https://www.postgresql.org/docs/devel/installation.html
-
-Users compiling from git will also need compatible versions of Bison, Flex,
-and Perl, as discussed in the install documentation.  These programs are not
-needed when using a tarball, since the files they are needed to build are
-already present in the tarball.  (On Windows, however, you need Perl anyway.)
diff --git a/doc/src/sgml/.gitignore b/doc/src/sgml/.gitignore
index 88a07d852e..91f2781fe7 100644
--- a/doc/src/sgml/.gitignore
+++ b/doc/src/sgml/.gitignore
@@ -6,7 +6,6 @@
 /man7/
 /man-stamp
 # Other popular build targets
-/INSTALL
 /postgres-US.pdf
 /postgres-A4.pdf
 /postgres.html
@@ -21,7 +20,5 @@
 /wait_event_types.sgml
 # Assorted byproducts from building the above
 /postgres-full.xml
-/INSTALL.html
-/INSTALL.xml
 /postgres-US.fo
 /postgres-A4.fo
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 2ef818900f..725fec59e7 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -36,6 +36,8 @@ ifndef FOP
 FOP = $(missing) fop
 endif
 
+PANDOC = pandoc
+
 XMLINCLUDE = --path . --path $(srcdir)
 
 ifdef XMLLINT
@@ -113,25 +115,6 @@ wait_event_types.sgml: $(top_srcdir)/src/backend/utils/activity/wait_event_names
 targets-meson.sgml: targets-meson.txt $(srcdir)/generate-targets-meson.pl
 	$(PERL) $(srcdir)/generate-targets-meson.pl $^ > $@
 
-##
-## Generation of some text files.
-##
-
-ICONV = iconv
-PANDOC = pandoc
-
-INSTALL: % : %.html
-	$(PANDOC) -t plain -o 

Re: index prefetching

2023-12-21 Thread Tomas Vondra
On 12/21/23 18:14, Robert Haas wrote:
> On Thu, Dec 21, 2023 at 11:08 AM Andres Freund  wrote:
>> But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)
> 
> Sadly, you're more likely to get the first one than you are to get the
> second one. I can't really see going back to revisit that decision as
> a basis for somebody else's new work -- it'd be better if the person
> doing the new work figured out what makes sense here.
> 

I think it's a great example of "hindsight is 20/20". There were
perfectly valid reasons to have two separate nodes, and it's not like
these reasons somehow disappeared. It still is a perfectly reasonable
decision.

It's just that allowing index-only filters for regular index scans seems
to eliminate pretty much all executor differences between the two nodes.
But that's hard to predict - I certainly would not even have thought about
that back when index-only scans were added.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




authentication/t/001_password.pl trashes ~/.psql_history

2023-12-21 Thread Tom Lane
I happened to notice this stuff getting added to my .psql_history:

\echo background_psql: ready
SET password_encryption='scram-sha-256';
;
\echo background_psql: QUERY_SEPARATOR
SET scram_iterations=42;
;
\echo background_psql: QUERY_SEPARATOR
\password scram_role_iter
\q

After grepping for these strings, this is evidently the fault of
src/test/authentication/t/001_password.pl by way of BackgroundPsql.pm,
which fires up an interactive psql run that is not given the -n switch.

Currently the only other user of interactive_psql() seems to be
psql/t/010_tab_completion.pl, which avoids this problem by
explicitly redirecting the history file.  We could have 001_password.pl
do likewise, or we could have it pass the -n switch, but I think we're
going to have this problem resurface repeatedly if we leave it to the
outer test script to remember to do it.

My first idea was that BackgroundPsql.pm should take responsibility for
preventing this, by explicitly setting $ENV{PSQL_HISTORY} to "/dev/null"
if the calling script hasn't set some other value.  However, that could
fail if the user who runs the test habitually sets PSQL_HISTORY.

A messier but safer alternative would be to supply the -n switch by
default, with some way for 010_tab_completion.pl to override that.

Thoughts?

regards, tom lane




Re: Functions to return random numbers in a given range

2023-12-21 Thread Pavel Stehule
Hi

čt 21. 12. 2023 v 18:06 odesílatel Dean Rasheed 
napsal:

> Attached is a patch that adds 3 SQL-callable functions to return
> random integer/numeric values chosen uniformly from a given range:
>
>   random(min int, max int) returns int
>   random(min bigint, max bigint) returns bigint
>   random(min numeric, max numeric) returns numeric
>
> The return value is in the range [min, max], and in the numeric case,
> the result scale equals Max(scale(min), scale(max)), so it can be used
> to generate large random integers, as well as decimals.
>
> The goal is to provide simple, easy-to-use functions that operate
> correctly over arbitrary ranges, which is trickier than it might seem
> using the existing random() function. The main advantages are:
>
> 1. Support for arbitrary bounds (provided that max >= min). A SQL or
> PL/pgSQL implementation based on the existing random() function can
> suffer from integer overflow if the difference max-min is too large.
>
> 2. Uniform results over the full range. It's easy to overlook the fact
> that in a naive implementation doing something like
> "((max-min)*random()+min)::int", the endpoint values will be half as
> likely as any other value, since casting to integer rounds to nearest.
>
> 3. Makes better use of the underlying PRNG, not limited to the 52-bits
> of double precision values.
>
> 4. Simpler and more efficient generation of random numeric values.
> This is something I have commonly wanted in the past, and have usually
> resorted to hacks involving multiple calls to random() to build
> strings of digits, which is horribly slow, and messy.
>
> The implementation moves the existing random functions to a new source
> file, so the new functions all share a common PRNG state with the
> existing random functions, and that state is kept private to that
> file.
>

+1

Regards

Pavel


> Regards,
> Dean
>


Re: index prefetching

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 11:08 AM Andres Freund  wrote:
> But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)

Sadly, you're more likely to get the first one than you are to get the
second one. I can't really see going back to revisit that decision as
a basis for somebody else's new work -- it'd be better if the person
doing the new work figured out what makes sense here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Functions to return random numbers in a given range

2023-12-21 Thread Dean Rasheed
Attached is a patch that adds 3 SQL-callable functions to return
random integer/numeric values chosen uniformly from a given range:

  random(min int, max int) returns int
  random(min bigint, max bigint) returns bigint
  random(min numeric, max numeric) returns numeric

The return value is in the range [min, max], and in the numeric case,
the result scale equals Max(scale(min), scale(max)), so it can be used
to generate large random integers, as well as decimals.

The goal is to provide simple, easy-to-use functions that operate
correctly over arbitrary ranges, which is trickier than it might seem
using the existing random() function. The main advantages are:

1. Support for arbitrary bounds (provided that max >= min). A SQL or
PL/pgSQL implementation based on the existing random() function can
suffer from integer overflow if the difference max-min is too large.

2. Uniform results over the full range. It's easy to overlook the fact
that in a naive implementation doing something like
"((max-min)*random()+min)::int", the endpoint values will be half as
likely as any other value, since casting to integer rounds to nearest.

3. Makes better use of the underlying PRNG, not limited to the 52-bits
of double precision values.

4. Simpler and more efficient generation of random numeric values.
This is something I have commonly wanted in the past, and have usually
resorted to hacks involving multiple calls to random() to build
strings of digits, which is horribly slow, and messy.
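
To make points 1 and 2 above concrete, this is the kind of usage it enables
(the results are random, so the values shown in the comments are only
illustrative):

    -- with the patch applied:
    SELECT random(1, 10);                    -- e.g. 7, uniform over 1..10 inclusive
    SELECT random(-5000000000, 5000000000);  -- bigint range, no overflow trouble
    SELECT random(1.000, 2.000);             -- numeric with scale 3, e.g. 1.537

    -- the naive double-precision approach from point 2, for comparison;
    -- here the endpoints 1 and 10 show up only about half as often as the
    -- other values, because the cast rounds to nearest:
    SELECT ((10 - 1) * random() + 1)::int;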

The implementation moves the existing random functions to a new source
file, so the new functions all share a common PRNG state with the
existing random functions, and that state is kept private to that
file.

Regards,
Dean
From 0b7015668387c337114adb4b3c24fe1d8053bf9c Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 25 Aug 2023 10:42:38 +0100
Subject: [PATCH v1] Add random-number-in-range functions.

This adds 3 functions:

random(min int, max int) returns int
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number in the range [min, max].

In the numeric case, the result scale is Max(scale(min), scale(max)).
---
 doc/src/sgml/func.sgml|  39 ++-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/float.c |  95 --
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/numeric.c   | 219 +
 src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
 src/common/pg_prng.c  |  36 +++
 src/include/catalog/pg_proc.dat   |  12 +
 src/include/common/pg_prng.h  |   1 +
 src/include/utils/numeric.h   |   4 +
 src/test/regress/expected/random.out  | 360 ++
 src/test/regress/sql/random.sql   | 164 ++
 12 files changed, 1017 insertions(+), 100 deletions(-)
 create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 20da3ed033..b0b65d81dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

   
 
+  
+   
+
+ random
+
+random ( min integer, max integer )
+integer
+   
+   
+random ( min bigint, max bigint )
+bigint
+   
+   
+random ( min numeric, max numeric )
+numeric
+   
+   
+Return a random value in the range
+min <= x <= max.
+   
+   
+random(1, 10)
+7
+   
+   
+random(-0.499, 0.499)
+0.347
+   
+  
+
   

 
@@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 
   
-   The random() function uses a deterministic
-   pseudo-random number generator.
+   The random functions listed in 
+   use a deterministic pseudo-random number generator.
It is fast but not suitable for cryptographic
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to these random functions in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
-   session, the first random() call obtains a seed
+   session, the first call to any of these random functions obtains a seed
from a platform-dependent source of random bits.
-   These remarks hold equally for random_normal().
   
 
   
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 199eae525d..610ccf2f79 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -82,6 +82,7 @@ OBJS = \
 	pg_lsn.o \
 	pg_upgrade_support.o \

Re: Eager page freeze criteria clarification

2023-12-21 Thread Joe Conway

On 12/21/23 10:56, Melanie Plageman wrote:

On Sat, Dec 9, 2023 at 9:24 AM Joe Conway  wrote:

However, even if we assume a more-or-less normal distribution, we should
consider using subgroups in a way similar to Statistical Process
Control[1]. The reasoning is explained in this quote:

 The Math Behind Subgroup Size

 The Central Limit Theorem (CLT) plays a pivotal role here. According
 to CLT, as the subgroup size (n) increases, the distribution of the
 sample means will approximate a normal distribution, regardless of
 the shape of the population distribution. Therefore, as your
 subgroup size increases, your control chart limits will narrow,
 making the chart more sensitive to special cause variation and more
 prone to false alarms.


I haven't read anything about statistical process control until you
mentioned this. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better. But, it seems like this may not be the assumption in
statistical process control?

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be.  The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (each time we modify a page that was previously
all-visible)?


Basically, yes.

It might not make sense, but I think we could test the theory by 
plotting a histogram of the raw data, and then also plot a histogram 
based on sub-grouping every 5 sequential values in your accumulator.


If the former does not look very normal (I would guess for most workloads it 
will be skewed with a long tail) and the latter looks to be more normal, 
then it would say we were on the right track.


There are statistical tests for "normalness" that could be applied too 
( e.g. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/#sec2-13title ) 
which would be a more rigorous approach, but a quick look at the histograms 
might be sufficiently convincing.
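
To be concrete, here is a rough SQL sketch of the sub-grouping idea, assuming 
(hypothetically) that the raw measurements were loaded into a table 
unset_samples(sample_no int, value float8); the histogram of the raw values 
is the same query minus the sub-grouping step:

    -- means of subgroups of 5 sequential samples, bucketed into 20 bins
    WITH subgroups AS (
        SELECT avg(value) AS mean
        FROM unset_samples
        GROUP BY (sample_no - 1) / 5
    ),
    bounds AS (
        SELECT min(mean) AS lo, max(mean) AS hi FROM subgroups
    )
    SELECT width_bucket(mean, lo, hi, 20) AS bucket,
           count(*) AS n,
           repeat('*', count(*)::int) AS bar   -- poor man's plot
    FROM subgroups, bounds
    GROUP BY bucket
    ORDER BY bucket;

(The maximum value lands in the overflow bucket 21, which is fine for 
eyeballing the shape.)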


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 11:00:34 -0500, Robert Haas wrote:
> On Thu, Dec 21, 2023 at 10:33 AM Tomas Vondra
>  wrote:
> > > I continue to think that we should not have split plain and index only scans
> > > into separate files...
> >
> > I do agree with that opinion. Not just because of this prefetching
> > thread, but also because of the discussions about index-only filters in
> > a nearby thread.
> 
> For the record, in the original patch I submitted for this feature, it
> wasn't in separate files. If memory serves, Tom changed it.
> 
> So don't blame me. :-)

But I'd like you to feel guilty (no, not really) and fix it (yes, really) :)

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 10:33 AM Tomas Vondra
 wrote:
> > I continue to think that we should not have split plain and index only scans
> > into separate files...
>
> I do agree with that opinion. Not just because of this prefetching
> thread, but also because of the discussions about index-only filters in
> a nearby thread.

For the record, in the original patch I submitted for this feature, it
wasn't in separate files. If memory serves, Tom changed it.

So don't blame me. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 10:46:02 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
> >> We could make it version-specific,
> >> https://www.postgresql.org/docs/17/installation.html
> >> and task src/tools/version_stamp.pl with updating it.  But that's
> >> problematic for not-yet-released branches (there's no 17 today
> >> for example).
> 
> > Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
> > off?
> 
> Hmm, maybe, but then there's a moving part in version-stamping that's not
> accessible to the average committer.  On the other hand, it wouldn't
> be too awful if that redirect didn't get updated instantly after a
> branch.  This is probably a point where we need advice from the web
> team about how they manage documentation branches on the site.

IIRC the relevant part of the website code has access to a table of
documentation versions, so the redirect could be implemented based on not
knowing the version.


> >> Perhaps we can use /devel/ in the master branch
> >> and try to remember to replace that with a version number as soon
> >> as a release branch is forked off --- but does the docs website
> >> get populated as soon as the branch is made?
> 
> > I think it runs a few times a day - breaking the link for a few hours wouldn't
> > be optimal, but also not the end of the world. But redirecting $vnext -> to
> > devel would probably be a more reliable approach.
> 
> Let's go with "/devel/ in master and a number in release branches"
> for now, and tweak that if the web team wants to take on maintaining
> a redirect.  I'll put together a concrete patch proposal in a little
> bit.

Cool.

Greetings,

Andres Freund




Re: Eager page freeze criteria clarification

2023-12-21 Thread Melanie Plageman
On Sat, Dec 9, 2023 at 9:24 AM Joe Conway  wrote:
>
> On 12/8/23 23:11, Melanie Plageman wrote:
> >
> > I'd be delighted to receive any feedback, ideas, questions, or review.
>
>
> This is well thought out, well described, and a fantastic improvement in
> my view -- well done!

Thanks, Joe! That means a lot! I see work done by hackers on the
mailing list a lot that makes me think, "hey, that's
cool/clever/awesome!" but I don't give that feedback. I appreciate you
doing that!

> I do think we will need to consider distributions other than normal, but
> I don't know offhand what they will be.

Agreed. I plan to test with another distribution. Though, the exercise
of determining which ones are useful is probably more challenging.
I imagine we will have to choose one distribution (as opposed to
supporting different distributions and choosing based on data access
patterns for a table). Though, even with a normal distribution, I
think it should be an improvement.

> However, even if we assume a more-or-less normal distribution, we should
> consider using subgroups in a way similar to Statistical Process
> Control[1]. The reasoning is explained in this quote:
>
>  The Math Behind Subgroup Size
>
>  The Central Limit Theorem (CLT) plays a pivotal role here. According
>  to CLT, as the subgroup size (n) increases, the distribution of the
>  sample means will approximate a normal distribution, regardless of
>  the shape of the population distribution. Therefore, as your
>  subgroup size increases, your control chart limits will narrow,
>  making the chart more sensitive to special cause variation and more
>  prone to false alarms.

I haven't read anything about statistical process control until you
mentioned this. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better. But, it seems like this may not be the assumption in
statistical process control?

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be.  The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (each time we modify a page that was previously
all-visible)?

Thanks for the feedback!

- Melanie




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
>> We could make it version-specific,
>> https://www.postgresql.org/docs/17/installation.html
>> and task src/tools/version_stamp.pl with updating it.  But that's
>> problematic for not-yet-released branches (there's no 17 today
>> for example).

> Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
> off?

Hmm, maybe, but then there's a moving part in version-stamping that's not
accessible to the average committer.  On the other hand, it wouldn't
be too awful if that redirect didn't get updated instantly after a
branch.  This is probably a point where we need advice from the web
team about how they manage documentation branches on the site.

>> Perhaps we can use /devel/ in the master branch
>> and try to remember to replace that with a version number as soon
>> as a release branch is forked off --- but does the docs website
>> get populated as soon as the branch is made?

> I think it runs a few times a day - breaking the link for a few hours wouldn't
> be optimal, but also not the end of the world. But redirecting $vnext -> to
> devel would probably be a more reliable approach.

Let's go with "/devel/ in master and a number in release branches"
for now, and tweak that if the web team wants to take on maintaining
a redirect.  I'll put together a concrete patch proposal in a little
bit.

regards, tom lane




Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 16:20:45 +0100, Tomas Vondra wrote:
> On 12/21/23 14:43, Andres Freund wrote:
> >> AFAICS this seems similar to some of the AIO patch, I wonder what that
> >> plans to do. I need to check.
> > 
> > Yes, most of this exists there.  The difference is that with the AIO you don't
> > need to prefetch, as you can just initiate the IO for real, and wait for it to
> > complete.
> > 
> 
> Right, although the line where things stop being "prefetch" and become
> "async" seems a bit unclear to me / perhaps more a point of view.

Agreed. What I meant with not needing prefetching was that you'd not use
fadvise(), because it's better to instead just asynchronously read data into
shared buffers. That way you don't have the doubling of syscalls and you
need to care less about the buffering rate in the kernel.

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 14:27, Andres Freund wrote:
> Hi,
> 
> On 2023-12-09 19:08:20 +0100, Tomas Vondra wrote:
>> But there's a layering problem that I don't know how to solve - I don't
>> see how we could make indexam.c entirely oblivious to the prefetching,
>> and move it entirely to the executor. Because how else would you know
>> what to prefetch?
> 
>> With index_getnext_tid() I can imagine fetching TIDs ahead, stashing
>> them into a queue, and prefetching based on that. That's kinda what the
>> patch does, except that it does it from inside index_getnext_tid(). But
>> that does not work for index_getnext_slot(), because that already reads
>> the heap tuples.
> 
>> We could say prefetching only works for index_getnext_tid(), but that
>> seems a bit weird because that's what regular index scans do. (There's a
>> patch to evaluate filters on index, which switches index scans to
>> index_getnext_tid(), so that'd make prefetching work too, but I'd ignore
>> that here.
> 
> I think we should just switch plain index scans to index_getnext_tid(). It's
> one of the primary places triggering index scans, so a few additional lines
> don't seem problematic.
> 
> I continue to think that we should not have split plain and index only scans
> into separate files...
> 

I do agree with that opinion. Not just because of this prefetching
thread, but also because of the discussions about index-only filters in
a nearby thread.

> 
>> There are other index_getnext_slot() callers, and I don't
>> think we should accept that it does not work for those places - that seems wrong (e.g.
>> execIndexing/execReplication would benefit from prefetching, I think).
> 
> I don't think it'd be a problem to have to opt into supporting
> prefetching. There's plenty places where it doesn't really seem likely to be
> useful, e.g. doing prefetching during syscache lookups is very likely just a
> waste of time.
> 
> I don't think e.g. execReplication is likely to benefit from prefetching -
> you're just fetching a single row after all. You'd need a lot of dead rows to
> make it beneficial.  I think it's similar in execIndexing.c.
> 

Yeah, systable scans are unlikely to benefit from prefetching of this
type. I'm not sure about execIndexing/execReplication, it wasn't clear
to me but maybe you're right.

> 
> I suspect we should work on providing executor nodes with some estimates about
> the number of rows that are likely to be consumed. If an index scan is under a
> LIMIT 1, we shouldn't prefetch. Similar for sequential scan with the
> infrastructure in
> https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
> 

Isn't this mostly addressed by the incremental ramp-up at the beginning?
Even with target set to 1000, we only start prefetching 1, 2, 3, ...
blocks ahead, it's not like we'll prefetch 1000 blocks right away.

I did initially plan to also consider the number of rows we're expected
to need, but I think it's actually harder than it might seem. With LIMIT
for example we often don't know how selective the qual is, it's not like
we can just stop prefetching after the reading the first N tids. With
other nodes it's good to remember those are just estimates - it'd be
silly to be bitten both by a wrong estimate and also prefetching doing
the wrong thing based on an estimate.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 10:22:49 -0500, Tom Lane wrote:
> I think the only real question is what URL to point at exactly.  We can't
> simply say
> 
> https://www.postgresql.org/docs/current/installation.html
> 
> because that will be wrong for any version more than one major
> release back.

Right.


> We could make it version-specific,
> 
> https://www.postgresql.org/docs/17/installation.html
> 
> and task src/tools/version_stamp.pl with updating it.  But that's
> problematic for not-yet-released branches (there's no 17 today
> for example).

Perhaps we could make the website redirect 17 to /devel/ until 17 is branched
off?


> Perhaps we can use /devel/ in the master branch
> and try to remember to replace that with a version number as soon
> as a release branch is forked off --- but does the docs website
> get populated as soon as the branch is made?

I think it runs a few times a day - breaking the link for a few hours wouldn't
be optimal, but also not the end of the world. But redirecting $vnext -> to
devel would probably be a more reliable approach.

Greetings,

Andres Freund




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
> > We clearly can't just expose the numerical value for a C enum. So it has to 
> > be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much better to
simply populate the text version in pg_get_replication_slots().
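
Just to illustrate the shape I mean - the column name and the example strings
below are made up, nothing here is committed syntax:

    -- NULL would mean "not conflicting"; otherwise the reason, already
    -- spelled out as text by pg_get_replication_slots() itself:
    SELECT slot_name, conflict_reason
    FROM pg_replication_slots
    WHERE conflict_reason IS NOT NULL;

    --  slot_name | conflict_reason
    -- -----------+-----------------
    --  sub_a     | wal_removed
    --  sub_b     | rows_removed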




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Tom Lane
Andres Freund  writes:
> On 2023-12-21 08:39:26 +0900, Michael Paquier wrote:
>> On Wed, Dec 20, 2023 at 11:36:28AM -0500, Tom Lane wrote:
>>> I thought the plan was to get rid of that file, in pursuit of making
>>> our distribution tarballs be more or less pure git pulls.  Instead of
>>> expending more effort on it, why not just push that project forward?

> Ah, right.  I don't really care what solution we go for, just that as long as
> we have INSTALL, we should make sure we don't regularly break it... Both
> Michael and I have in the last couple weeks.

So let's just do it.  I think the only real question is what URL
to point at exactly.  We can't simply say

https://www.postgresql.org/docs/current/installation.html

because that will be wrong for any version more than one major
release back.  We could make it version-specific,

https://www.postgresql.org/docs/17/installation.html

and task src/tools/version_stamp.pl with updating it.  But that's
problematic for not-yet-released branches (there's no 17 today
for example).  Perhaps we can use /devel/ in the master branch
and try to remember to replace that with a version number as soon
as a release branch is forked off --- but does the docs website
get populated as soon as the branch is made?

regards, tom lane




Re: Set log_lock_waits=on by default

2023-12-21 Thread Frédéric Yhuel




Le 21/12/2023 à 14:29, Laurenz Albe a écrit :

Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.


I think it's a really good idea. At Dalibo, we advise our customers to 
switch it on. AFAICT, it's never been a problem.


Best regards,
Frédéric





Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 14:43, Andres Freund wrote:
> Hi,
> 
> On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
>> You're right a lot of this is a guesswork. I don't think we can do much
>> better, because it depends on stuff that's out of our control - each OS
>> may do things differently, or perhaps it's just configured differently.
>>
>> But I don't think this is really a serious issue - all the read-ahead
>> implementations need to work about the same, because they are meant to
>> work in a transparent way.
>>
>> So it's about deciding at which point we think this is a sequential
>> pattern. Yes, the OS may use a slightly different threshold, but the
>> exact value does not really matter - in the worst case we prefetch a
>> couple more/fewer blocks.
>>
>> The OS read-ahead can't really prefetch anything except sequential
>> cases, so the whole question is "When does the access pattern get
>> sequential enough?". I don't think there's a perfect answer, and I don't
>> think we need a perfect one - we just need to be reasonably close.
> 
> For the streaming read interface (initially backed by fadvise, to then be
> replaced by AIO) we found that it's clearly necessary to avoid fadvises in
> cases of actual sequential IO - the overhead otherwise leads to easily
> reproducible regressions.  So I don't think we have much choice.
> 

Yeah, the regressions are pretty easy to demonstrate. In fact, I didn't
have such detection in the first patch, but after the first round of
benchmarks it became obvious it's needed.

> 
>> Also, while I don't want to lazily dismiss valid cases that might be
>> affected by this, I think that sequential access for index paths is not
>> that common (with the exception of clustered indexes).
> 
> I think sequential access is common in other cases as well. There's lots of
> indexes where heap tids are almost perfectly correlated with index entries,
> consider insert-only tables and serial PKs or inserted_at
> timestamp columns.  Even leaving those aside, for indexes with many entries
> for the same key, we sort by tid these days, which will also result in
> "runs" of sequential access.
> 

True. I should have thought about those cases.

> 
>> Obviously, the latter case has much more severe impact, but it depends
>> on the exact workload / access pattern etc. The only "perfect" solution
>> would be to actually check the page cache, but well - that seems to be
>> fairly expensive.
> 
>> What I was envisioning was something self-tuning, based on the I/O we
>> may do later. If the prefetcher decides to prefetch something, but finds
>> it's already in cache, we'd increase the distance, to remember more
>> blocks. Likewise, if a block is not prefetched but then requires I/O
>> later, decrease the distance. That'd make it adaptive, but I don't think
>> we actually have the info about I/O.
> 
> How would the prefetcher know that the data wasn't in cache?
> 

I don't think there's a good way to do that, unfortunately, or at least
I'm not aware of it. That's what I meant by "we don't have the info" at
the end. Which is why I haven't tried implementing it.

The only "solution" I could come up with was some sort of "timing" for
the I/O requests and deducing what was cached. Not great, of course.

> 
>> Alternatively, I was thinking about moving the prefetches into a
>> separate worker process (or multiple workers), so we'd just queue the
>> request and all the overhead would be done by the worker. The main
>> problem is the overhead of calling posix_fadvise() for blocks that are
>> already in memory, and this would just move it to a separate backend. I
>> wonder if that might even make the custom cache unnecessary / optional.
> 
> The AIO patchset provides this.
> 

OK, I guess it's time for me to take a look at the patch again.

> 
>> AFAICS this seems similar to some of the AIO patch, I wonder what that
>> plans to do. I need to check.
> 
> Yes, most of this exists there.  The difference is that with the AIO you don't
> need to prefetch, as you can just initiate the IO for real, and wait for it to
> complete.
> 

Right, although the line where things stop being "prefetch" and become
"async" seems a bit unclear to me / perhaps more a point of view.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: trying again to get incremental backup

2023-12-21 Thread Alexander Lakhin

21.12.2023 15:07, Robert Haas wrote:

On Wed, Dec 20, 2023 at 11:00 PM Alexander Lakhin  wrote:

I've found several typos/inconsistencies introduced with 174c48050 and
dc2123400. Maybe you would want to fix them, while on it?:

That's an impressively long list of mistakes in something I thought
I'd been careful about. Sigh.

I don't suppose you could provide these corrections in the form of a
patch? I don't really want to run these sed commands across the entire
tree and then try to figure out what's what...


Please look at the attached patch; it corrects all 29 items ("recods"
fixed in two places), but maybe you find some substitutions wrong...

I've also observed that those commits introduced new warnings:
$ CC=gcc-12 CPPFLAGS="-Wtype-limits" ./configure -q && make -s -j8
reconstruct.c: In function ‘read_bytes’:
reconstruct.c:511:24: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  511 | if (rb < 0)
  |    ^
reconstruct.c: In function ‘write_reconstructed_file’:
reconstruct.c:650:40: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  650 | if (rb < 0)
  |    ^
reconstruct.c:662:32: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]
  662 | if (wb < 0)

There are also two deadcode.DeadStores complaints from clang. First one is
about:
    /*
 * Align the wait time to prevent drift. This doesn't really matter,
 * but we'd like the warnings about how long we've been waiting to say
 * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
 * drifting to something that is not a multiple of ten.
 */
    timeout_in_ms -=
    TimestampDifferenceMilliseconds(current_time, initial_time) %
    timeout_in_ms;
It looks like this timeout is really not used.

And the minor one (similar to many existing, maybe doesn't deserve fixing):
walsummarizer.c:808:5: warning: Value stored to 'summary_end_lsn' is never read 
[deadcode.DeadStores]
    summary_end_lsn = 
private_data->read_upto;
    ^ ~~~


Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
comment above redo_pointer_at_last_summary_removal declaration, but
perhaps it should instead say something about removing summaries?

Wow, yeah. Thanks, will fix.


Thank you for paying attention to it!

Best regards,
Alexanderdiff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 7c183a5cfd..e411ddbf45 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -213,7 +213,7 @@ PostgreSQL documentation
 
  
   -i old_manifest_file
-  --incremental=old_meanifest_file
+  --incremental=old_manifest_file
   

 Performs an incremental
diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
index e1cb31607e..8a0a600c2b 100644
--- a/doc/src/sgml/ref/pg_combinebackup.sgml
+++ b/doc/src/sgml/ref/pg_combinebackup.sgml
@@ -83,7 +83,7 @@ PostgreSQL documentation
   

 The -n/--dry-run option instructs
-pg_cominebackup to figure out what would be done
+pg_combinebackup to figure out what would be done
 without actually creating the target directory or any output files.
 It is particularly useful in combination with --debug.

diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 1e5a5ac33a..42bbe564e2 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -158,7 +158,7 @@ CreateIncrementalBackupInfo(MemoryContext mcxt)
 
 /*
  * Before taking an incremental backup, the caller must supply the backup
- * manifest from a prior backup. Each chunk of manifest data recieved
+ * manifest from a prior backup. Each chunk of manifest data received
  * from the client should be passed to this function.
  */
 void
@@ -462,7 +462,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 			++deadcycles;
 
 		/*
-		 * If we've managed to wait for an entire minute withot the WAL
+		 * If we've managed to wait for an entire minute without the WAL
 		 * summarizer absorbing a single WAL record, error out; probably
 		 * something is wrong.
 		 *
@@ -473,7 +473,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 		 * likely to catch a reasonable number of the things that can go wrong
 		 * in practice (e.g. the summarizer process is completely hung, say
 		 * because somebody hooked up a debugger to it or something) without
-		 * giving up too quickly when the sytem is just slow.
+		 * giving up too quickly when the system is just slow.
 		 */
 		if (deadcycles >= 6)
 			

Re: Set log_lock_waits=on by default

2023-12-21 Thread Nikolay Samokhvalov
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe  wrote:

> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.


I think it's a very good idea. On all heavily loaded systems I have
observed so far, we have always turned it on. 1s (the default deadlock_timeout)
is quite a large value for web/mobile apps, meaning that the default frequency of
logging is quite low, so any potential suffering from the observer effect
doesn't happen -- saturation of the number of active sessions happens much,
much earlier, even if you have very slow disk IO for logging.

At the same time, I like Robert's idea to separate the logging of lock
waits from the deadlock_timeout logic -- the current implementation is quite
confusing for new users. I have also had cases where people wanted to log lock
waits earlier than deadlock detection. Also, lock wait logging almost always
lacks information about the blocking session (its state and last query, first
of all), but that is maybe off topic and worth a separate improvement effort.

Nik


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Bertrand Drouvot
Hi,

On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
> > I'm not entirely sure I understand the difference - just whether we add one
> > new column or replace the existing 'conflicting' column? I can see arguments
> > for either.
> >
> 
> Agreed. I think the argument against replacing the existing
> 'conflicting' column is that there is a chance that it is being used
> by some monitoring script which I guess shouldn't be a big deal to
> change. So, if we don't see that as a problem, I would prefer to have
> a single column with conflict reason where one of its values indicates
> there is no conflict.

+1

> > A conflicting column where NULL indicates no conflict, and other
> > values indicate the reason for the conflict, doesn't seem too bad.
> >
> 
> This is fine too.

+1

> >
> > > And if we plan to return/display cause from either function or view,
> > > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > > description/text corresponding to enum?
> >
> > We clearly can't just expose the numerical value for a C enum. So it has to 
> > be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

Yeah, and in the sync slot related work we could use pg_get_replication_slots()
then to get the enum.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: GIN-Indexable JSON Patterns

2023-12-21 Thread David E. Wheeler
On Dec 17, 2023, at 13:10, David E. Wheeler  wrote:

> Quick follow-up to my slew of questions back in [September][1]. I wanted to 
> update [my patch][2] to note that only JSON Path equality operators are 
> supported by indexes, as [previously discussed][3].
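
(For anyone skimming the thread, the distinction in question is roughly the
following; the table and data here are made up for illustration:)

    CREATE TABLE docs (body jsonb);
    CREATE INDEX ON docs USING gin (body jsonb_path_ops);

    -- can use the GIN index: the jsonpath predicate is an equality
    SELECT * FROM docs WHERE body @? '$.tags[*] ? (@ == "postgres")';

    -- not helped by the index: a non-equality comparison cannot be
    -- turned into index search keys
    SELECT * FROM docs WHERE body @? '$.price ? (@ > 10)';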

Should I just add it to the patch and let the reviews fall where they may? :-)

Best,

David

 [1]: 
https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com
 [2]: https://commitfest.postgresql.org/45/4624/
 [3]: 
https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl





Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Amit Kapila
On Thu, Dec 21, 2023 at 5:05 PM Andres Freund  wrote:
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either.
>

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

> A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>

This is fine too.

>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

-- 
With Regards,
Amit Kapila.




Re: Set log_lock_waits=on by default

2023-12-21 Thread Robert Haas
On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe  wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.

I think it depends somewhat on the lock type, and also on your
threshold for what constitutes a problem. For example, you can wait
for 1 second for a relation extension lock pretty easily, I think,
just because the I/O system is busy. Or I think also a VXID lock held
by some transaction that has a tuple locked could be not particularly
exciting. A conflict on a relation lock seems more likely to represent
a real issue, but I guess it's all kind of a judgement call. A second
isn't really all that long on an overloaded system, and I see an awful
lot of overloaded systems (because those are the people who call me).

Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
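
To spell that out, the first block below is what we have today, and the second
is only the hypothetical shape of the idea, not working syntax:

    -- today: deadlock_timeout doubles as the logging threshold
    ALTER SYSTEM SET log_lock_waits = on;
    ALTER SYSTEM SET deadlock_timeout = '1s';
    SELECT pg_reload_conf();

    -- hypothetical, if log_lock_waits became time-valued (not implemented):
    -- ALTER SYSTEM SET log_lock_waits = '10s';  -- would have to be >= deadlock_timeout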

In general, I do kind of like the idea of trying to log more problem
situations by default, so that when someone has a major issue, you
don't have to start by having them change all the logging settings and
then wait until they get hosed a second time before you can
troubleshoot anything. I'm just concerned that 1s might be too
sensitive for a lot of users who aren't as, let's say, diligent about
keeping the system healthy as you probably are.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences, take 2

2023-12-21 Thread Tomas Vondra
On 12/15/23 03:33, Amit Kapila wrote:
> On Thu, Dec 14, 2023 at 9:14 PM Ashutosh Bapat
>  wrote:
>>
>> On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila  wrote:
>>>
>>> It can only be cleaned if we process it but xact_decode won't allow us
>>> to process it and I don't think it would be a good idea to add another
>>> hack for sequences here. See below code:
>>>
>>> xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
>>> {
>>> SnapBuild  *builder = ctx->snapshot_builder;
>>> ReorderBuffer *reorder = ctx->reorder;
>>> XLogReaderState *r = buf->record;
>>> uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
>>>
>>> /*
>>> * If the snapshot isn't yet fully built, we cannot decode anything, so
>>> * bail out.
>>> */
>>> if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
>>> return;
>>
>> That may be true for a transaction which is decoded, but I think all
>> the transactions which are added to ReorderBuffer should be cleaned up
>> once they have been processed irrespective of whether they are
>> decoded/sent downstream or not. In this case I see the sequence hash
>> being cleaned up for the sequence related transaction in Hayato's
>> reproducer.
>>
> 
> It was because the test you are using was not designed to show the
> problem I mentioned. In this case, the rollback was after a full
> snapshot state was reached.
> 

Right, I haven't tried to reproduce this, but it very much looks like
the entry would not be removed if the xact aborts/commits before the
snapshot reaches FULL state.

I suppose one way to deal with this would be to first check if an entry
for the same relfilenode exists. If it does, the original transaction
must have terminated, but we haven't cleaned it up yet - in which case
we can just "move" the relfilenode to the new one.

However, can't that happen even with full snapshots? I mean, let's say a
transaction creates a relfilenode and terminates without writing an
abort record (surely that's possible, right?). And then another xact
comes and generates the same relfilenode (presumably that's unlikely,
but perhaps possible?). Aren't we in pretty much the same situation,
until the next RUNNING_XACTS cleans up the hash table?


I think tracking all relfilenodes would fix the original issue (with
treating some changes as transactional), and the tweak that "moves" the
relfilenode to the new xact would fix this other issue too.

That being said, I feel a bit uneasy about it, for similar reasons as
Amit. If we start processing records before full snapshot, that seems
like moving the assumptions a bit. For example it means we'd create
ReorderBufferTXN entries for cases that'd have skipped before. OTOH this
is (or should be) only a very temporary period while starting the
replication, I believe.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: "pgoutput" options missing on documentation

2023-12-21 Thread Emre Hasegeli
> But the xref seems present only in the master/v16/v15 patches, but not
> for the earlier patches v14/v13/v12. Why not?

I missed it.

> But the change was only in the patches v14 onwards. Although the new
> error message was only added for HEAD, isn't it still correct to say
> "A valid version is required." for all the patches including v12 and
> v13?

Yes, it's still correct.

Fixed versions are attached.


v12-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


v14-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


v13-0001-doc-Clarify-pgoutput-options.patch
Description: Binary data


Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
> You're right a lot of this is a guesswork. I don't think we can do much
> better, because it depends on stuff that's out of our control - each OS
> may do things differently, or perhaps it's just configured differently.
> 
> But I don't think this is really a serious issue - all the read-ahead
> implementations need to work about the same, because they are meant to
> work in a transparent way.
> 
> So it's about deciding at which point we think this is a sequential
> pattern. Yes, the OS may use a slightly different threshold, but the
> exact value does not really matter - in the worst case we prefetch a
> couple more/fewer blocks.
> 
> The OS read-ahead can't really prefetch anything except sequential
> cases, so the whole question is "When does the access pattern get
> sequential enough?". I don't think there's a perfect answer, and I don't
> think we need a perfect one - we just need to be reasonably close.

For the streaming read interface (initially backed by fadvise, to then be
replaced by AIO) we found that it's clearly necessary to avoid fadvises in
cases of actual sequential IO - the overhead otherwise leads to easily
reproducible regressions.  So I don't think we have much choice.


> Also, while I don't want to lazily dismiss valid cases that might be
> affected by this, I think that sequential access for index paths is not
> that common (with the exception of clustered indexes).

I think sequential access is common in other cases as well. There's lots of
indexes where heap tids are almost perfectly correlated with index entries,
consider insert-only tables and serial PKs or inserted_at
timestamp columns.  Even leaving those aside, for indexes with many entries
for the same key, we sort by tid these days, which will also result in
"runs" of sequential access.


> Obviously, the latter case has much more severe impact, but it depends
> on the exact workload / access pattern etc. The only "perfect" solution
> would be to actually check the page cache, but well - that seems to be
> fairly expensive.

> What I was envisioning was something self-tuning, based on the I/O we
> may do later. If the prefetcher decides to prefetch something, but finds
> it's already in cache, we'd increase the distance, to remember more
> blocks. Likewise, if a block is not prefetched but then requires I/O
> later, decrease the distance. That'd make it adaptive, but I don't think
> we actually have the info about I/O.

How would the prefetcher know that the data wasn't in cache?


> Alternatively, I was thinking about moving the prefetches into a
> separate worker process (or multiple workers), so we'd just queue the
> request and all the overhead would be done by the worker. The main
> problem is the overhead of calling posix_fadvise() for blocks that are
> already in memory, and this would just move it to a separate backend. I
> wonder if that might even make the custom cache unnecessary / optional.

The AIO patchset provides this.


> AFAICS this seems similar to some of the AIO patch, I wonder what that
> plans to do. I need to check.

Yes, most of this exists there.  The difference that with the AIO you don't
need to prefetch, as you can just initiate the IO for real, and wait for it to
complete.

Greetings,

Andres Freund




Re: index prefetching

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-09 19:08:20 +0100, Tomas Vondra wrote:
> But there's a layering problem that I don't know how to solve - I don't
> see how we could make indexam.c entirely oblivious to the prefetching,
> and move it entirely to the executor. Because how else would you know
> what to prefetch?

> With index_getnext_tid() I can imagine fetching TIDs ahead, stashing
> them into a queue, and prefetching based on that. That's kinda what the
> patch does, except that it does it from inside index_getnext_tid(). But
> that does not work for index_getnext_slot(), because that already reads
> the heap tuples.

> We could say prefetching only works for index_getnext_tid(), but that
> seems a bit weird because that's what regular index scans do. (There's a
> patch to evaluate filters on index, which switches index scans to
> index_getnext_tid(), so that'd make prefetching work too, but I'd ignore
> that here.

I think we should just switch plain index scans to index_getnext_tid(). It's
one of the primary places triggering index scans, so a few additional lines
don't seem problematic.

I continue to think that we should not have split plain and index only scans
into separate files...


> There are other index_getnext_slot() callers, and I don't
> think we should accept that it does not work for those places - that seems wrong (e.g.
> execIndexing/execReplication would benefit from prefetching, I think).

I don't think it'd be a problem to have to opt into supporting
prefetching. There's plenty places where it doesn't really seem likely to be
useful, e.g. doing prefetching during syscache lookups is very likely just a
waste of time.

I don't think e.g. execReplication is likely to benefit from prefetching -
you're just fetching a single row after all. You'd need a lot of dead rows to
make it beneficial.  I think it's similar in execIndexing.c.


I suspect we should work on providing executor nodes with some estimates about
the number of rows that are likely to be consumed. If an index scan is under a
LIMIT 1, we shouldn't prefetch. Similar for sequential scan with the
infrastructure in
https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

Greetings,

Andres Freund




Set log_lock_waits=on by default

2023-12-21 Thread Laurenz Albe
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.

Yours,
Laurenz Albe
From a767e69c724fbbff14114729272be5d29c3d69d8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 21 Dec 2023 14:24:00 +0100
Subject: [PATCH v1] Default to log_lock_waits=on

If someone is stuck behind a lock for more than a second,
that is almost always a problem that is worth a log entry.
---
 doc/src/sgml/config.sgml  | 2 +-
 src/backend/storage/lmgr/proc.c   | 2 +-
 src/backend/utils/misc/guc_tables.c   | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 44cada2b40..f96ec5a72e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7383,7 +7383,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 Controls whether a log message is produced when a session waits
 longer than  to acquire a
 lock.  This is useful in determining if lock waits are causing
-poor performance.  The default is off.
+poor performance.  The default is on.
 Only superusers and users with the appropriate SET
 privilege can change this setting.

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b6451d9d08..4f25cebbd8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -60,7 +60,7 @@ int			StatementTimeout = 0;
 int			LockTimeout = 0;
 int			IdleInTransactionSessionTimeout = 0;
 int			IdleSessionTimeout = 0;
-bool		log_lock_waits = false;
+bool		log_lock_waits = true;
 
 /* Pointer to this process's PGPROC struct, if any */
 PGPROC	   *MyProc = NULL;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..1f3d56e6ee 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1519,7 +1519,7 @@ struct config_bool ConfigureNamesBool[] =
 			NULL
 		},
 		_lock_waits,
-		false,
+		true,
 		NULL, NULL, NULL
 	},
 	{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..451e09d3d8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -586,7 +586,7 @@
 	#processes
 	#   %% = '%'
 	# e.g. '<%u%%%d> '
-#log_lock_waits = off			# log lock waits >= deadlock_timeout
+#log_lock_waits = on			# log lock waits >= deadlock_timeout
 #log_recovery_conflict_waits = off	# log standby recovery conflict waits
 	# >= deadlock_timeout
 #log_parameter_max_length = -1		# when logging statements, limit logged
-- 
2.43.0



Re: GUC names in messages

2023-12-21 Thread Peter Eisentraut

On 21.12.23 07:24, Peter Smith wrote:

#1. GUC name quoting.

Some basic guidelines were decided and a patch is already pushed [1].


 In messages containing configuration variable names, do not include quotes
 when the names are visibly not natural English words, such as when they
 have underscores, are all-uppercase or have mixed case. Otherwise, quotes
 must be added. Do include quotes in a message where an arbitrary variable
 name is to be expanded.


AFAIK there is nothing controversial there, although maybe the
guideline for 'mixed case' needs revisiting depending on objections
about point #2.


Now that I read this again, I think this is wrong.

We should decide the quoting for a category, not the actual content. 
Like, quote all file names; do not quote keywords.


This led to the attempted patch to decide the quoting of GUC parameter 
names dynamically based on the actual content, which no one really 
liked.  But then, to preserve consistency, we also need to be uniform in 
quoting GUC parameter names where the name is hardcoded.






Re: logical decoding and replication of sequences, take 2

2023-12-21 Thread Tomas Vondra
On 12/19/23 13:54, Christophe Pettus wrote:
> Hi,
> 
> I wanted to hop in here on one particular issue:
> 
>> On Dec 12, 2023, at 02:01, Tomas Vondra  
>> wrote:
>> - desirability of the feature: Random IDs (UUIDs etc.) are likely a much
>> better solution for distributed (esp. active-active) systems. But there
>> are important use cases that are likely to keep using regular sequences
>> (online upgrades of single-node instances, existing systems, ...).
> 
> +1.
> 
> Right now, the lack of sequence replication is a rather large 
> foot-gun on logical replication upgrades.  Copying the sequences
> over during the cutover period is doable, of course, but:
> 
> (a) There's no out-of-the-box tooling that does it, so everyone has
> to write some scripts just for that one function.
>
> (b) It's one more thing that extends the cutover window.
> 

I agree it's an annoying gap for this use case. But if this is the only
use case, maybe a better solution would be to provide such tooling
instead of adding it to the logical decoding?

It might seem a bit strange if most data is copied by replication
directly, while sequences need special handling, ofc.
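
The tooling people write today is typically a few lines of SQL run on the
publisher at cutover, roughly like this sketch (pg_sequences is the real view;
the generated statements are then executed on the subscriber):

    SELECT format('SELECT setval(%L, %s, true);',
                  quote_ident(schemaname) || '.' || quote_ident(sequencename),
                  last_value)
    FROM pg_sequences
    WHERE last_value IS NOT NULL;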

> I don't think it is a good idea to make it mandatory: for example, 
> there's a strong use case for replicating a table but not a sequence 
> associated with it.  But it's definitely a missing feature in
> logical replication.

I don't think the plan was to make replication of sequences mandatory,
certainly not with the built-in replication. If you don't add sequences
to the publication, the sequence changes will be skipped.

But it still needs to be part of the decoding, which adds overhead for
all logical decoding uses, even if the sequence changes end up being
discarded. That's somewhat annoying, especially considering sequences
are a fairly common part of the WAL stream.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-21 Thread Michail Nikolaev
Hello.

Realized my last idea is invalid (because tuples are frozen by using
dynamically calculated horizon) - so, don't waste your time on it :)

Need to think a little bit more here.

Thanks,
Mikhail.




RE: Synchronizing slots from primary to standby

2023-12-21 Thread Hayato Kuroda (Fujitsu)
Dear Shveta,

Thanks for updating the patch! Here is my comments for v52-0002.

~
system-views.sgml

01. 

```
+
+ 
+  
+   sync_state char
+  
+  
+  Defines slot synchronization state. This is meaningful on the physical
+  standby which has configured  = 
true.
+  Possible values are:
+   
+
+ n = none for user created slots,
...
```

Hmm. I'm not sure why we must show a single character to the user. I'm OK with
pg_subscription.srsubstate because it is a "catalog" - the actual value would be
recorded in the heap. But pg_replication_slots is just a view, so we can replace
the internal representation with other strings there, e.g.
pg_replication_slots.wal_status.
How about using {none, initialized, ready} or something?
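
As a rough illustration only (the field and enum names below are invented, not
taken from the patch), the view function could translate the internal state to
text the same way pg_replication_slots.wal_status is produced:

```
/* hypothetical mapping inside pg_get_replication_slots(); names invented */
switch (slot_contents.data.sync_state)
{
	case SYNCSLOT_STATE_NONE:
		values[i++] = CStringGetTextDatum("none");
		break;
	case SYNCSLOT_STATE_INITIATED:
		values[i++] = CStringGetTextDatum("initialized");
		break;
	case SYNCSLOT_STATE_READY:
		values[i++] = CStringGetTextDatum("ready");
		break;
}
```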

~
postmaster.c

02. bgworker_should_start_now

```
+if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
+ pmState != PM_RUN)
+return true;
```

I'm not sure the second condition is really needed. The line will be executed 
when
pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed 
around here?

~
libpqwalreceiver.c

03. PQWalReceiverFunctions

```
+.walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo,
```

Just to confirm - is there a rule for ordering?

~
slotsync.c

04. SlotSyncWorkerCtx

```
typedef struct SlotSyncWorkerCtx
{
pid_t   pid;
slock_t mutex;
} SlotSyncWorkerCtx;

SlotSyncWorkerCtx *SlotSyncWorker = NULL;
```

Per other files like launcher.c, should we use a name like 
"SlotSyncWorkerCtxStruct"?

05. SlotSyncWorkerRegister()

Your coding will work, but there is another approach: validate the slotsync
parameters here. In that case, the postmaster would exit ASAP, which can
notify users of wrong settings earlier. Thoughts?

06. wait_for_primary_slot_catchup

```
+CHECK_FOR_INTERRUPTS();
+
+/* Handle any termination request if any */
+ProcessSlotSyncInterrupts(wrconn);
```

ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call.

07. wait_for_primary_slot_catchup

```
+/*
+ * XXX: Is waiting for 2 seconds before retrying enough or more or
+ * less?
+ */
+rc = WaitLatch(MyLatch,
+   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+   2000L,
+   WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
+
+ResetLatch(MyLatch);
+
+/* Emergency bailout if postmaster has died */
+if (rc & WL_POSTMASTER_DEATH)
+proc_exit(1);
```

Are there any reasons not to use the WL_EXIT_ON_PM_DEATH event? If not, you can use it.
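
For reference, a sketch of the simplified wait with WL_EXIT_ON_PM_DEATH (same
timeout and wait event as above; with this flag WaitLatch() exits the process
itself if the postmaster dies, so the explicit WL_POSTMASTER_DEATH check goes
away):

```
/* sketch only */
(void) WaitLatch(MyLatch,
				 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				 2000L,
				 WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);

ResetLatch(MyLatch);
```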

08. synchronize_slots

```
+SpinLockAcquire(&WalRcv->mutex);
+if (!WalRcv ||
+(WalRcv->slotname[0] == '\0') ||
+XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
+{
...
```

Assume that WalRcv is still NULL. In this case, wouldn't the first
SpinLockAcquire() lead to a segmentation fault?
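
To make the concern concrete, a possible rearrangement (sketch only; the local
variable "started" is invented, the fields are as quoted above) would be to
test the pointer before touching the spinlock:

```
/* sketch: never dereference WalRcv before checking it */
if (!WalRcv)
	return;

SpinLockAcquire(&WalRcv->mutex);
started = (WalRcv->slotname[0] != '\0') &&
	!XLogRecPtrIsInvalid(WalRcv->latestWalEnd);
SpinLockRelease(&WalRcv->mutex);

if (!started)
	return;
```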

09. synchronize_slots

```
+elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
```

The query is not a dynamic one, so I think there is no need to print it even in
debug mode.

10. synchronize_one_slot

IIUC, this function can synchronize slots even if the plugin used on the primary
is not installed on the standby server. If the slot is created by the slotsync
worker, users will only notice this after the server is promoted and decoding
starts. I feel this is not a good specification. Can we detect it in the
validation phase?

~
not the source code

11. 

I tested the typical case - promoting a publisher from the diagram below.
A physical replication slot "physical" was specified as standby_slot_names.

```
node A (primary) --> node B (secondary)
|
|
node C (subscriber)
```

And after the promoting, below lines were periodically output on logfiles for
node B and C.

```
WARNING:  replication slot "physical" specified in parameter 
"standby_slot_names" does not exist, ignoring
```

Do you have an idea how to suppress the warning? IIUC it is normal behavior of the
walsender, so we cannot avoid the periodic output.

The steps of the test were as follows:

1. stop the node A via pg_ctl stop
2. promote the node B via pg_ctl promote
3. change the connection string of the subscription via ALTER SUBSCRIPTION ... 
CONNECTION ...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Pavel Stehule
Hi

On Thu, Dec 21, 2023 at 13:37 Ishaan Adarsh 
wrote:

> The recent documentation patches are part of my GSoC 2023 project
> 
> to develop a comprehensive PostgreSQL extension development tutorial, it
> assumes only a basic knowledge of Postgres and the target programming
> language.
>
> The entire project is available on GitHub: Postgres-extension-tutorial
> .
> It covers many topics, including prerequisites, writing extensions,
> creating Makefiles, using procedural languages, incorporating external
> languages, writing regression tests, and managing extension releases. *The 
> patch submitted
> for procedural languages, specifically PL/pgSQL and PL/Python, is part of
> the procedural language section within the broader tutorial. *
>
> Based on the feedback I think there is a real need
>  for this as this
> is a very important and growing part of the Postgres ecosystem. Currently,
> all the extension material is scattered and very limited. There are
> various third-party blog posts focusing on different areas, and sometimes
> contradictory. The main motivation behind making this is to make the barrier
> for entry less prohibitive for new contributors.
>
> I would greatly appreciate your input on how to add it to the existing
> documentation (this is where I have major doubts) and any suggestions on
> how to proceed. If there are areas where the existing documentation is
> already sufficient or if there are ways to improve the overall structure, I
> am open to making adjustments.
>

https://www.postgresql.org/docs/current/plpgsql-development-tips.html and a
new section - deployment or packaging of extensions

I agree that https://www.postgresql.org/docs/current/plpgsql-overview.html is
under-dimensioned, but packaging should not be there

Regards

Pavel


>
> Best,
> Ishaan Adarsh
>
>
> On Thu, Dec 21, 2023 at 4:17 PM Pavel Stehule 
> wrote:
>
>> Hi
>>
>> On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
>> wrote:
>>
>>> On 19.12.23 17:26, Ishaan Adarsh wrote:
>>> > Subject: Clarification on the Purpose of the Patch
>>> >
>>> > Hi Peter,
>>> >
>>> > The intention was to address the challenge faced by newcomers in
>>> > understanding how to write an extension for PostgreSQL. The existing
>>> > documentation, while comprehensive, lacks a consolidated and
>>> > easy-to-follow tutorial that serves as a quick start guide. The goal
>>> was
>>> > to create a beginner-friendly resource that assumes only knowledge of
>>> > Postgres and the target language, making it accessible for new
>>> > contributors because the barrier for entry is prohibitive for new
>>> > contributors. There are various third-party blog posts focusing on
>>> > different areas, and sometimes contradictory.
>>>
>>> Have you seen this:
>>>
>>> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>>>
>>> Maybe that could be extended/modified/simplified?
>>>
>>> > Specifically:
>>> > 1. The new section titled "Quick Start Guide" aims to provide
>>> > step-by-step instructions for users to get started with writing
>>> > extensions in PL/pgSQL and PL/Python.
>>>
>>> What's confusing here is writing an extension in a PL language is not a
>>> normal use case I'd say.  The normal use case involves some C code.
>>>
>>
>>  Extensions were designed for C, but they work well with PL too.
>> Some of my customers use extensions for PLpgSQL and they are almost happy.
>> 1) there is nothing else, 2) it really works
>>
>> I agree with Peter - this topic is not what I imagine under "Quick start
>> guide"
>>
>> Regards
>>
>> Pavel
>>
>


Re: index prefetching

2023-12-21 Thread Tomas Vondra



On 12/21/23 07:49, Dilip Kumar wrote:
> On Wed, Dec 20, 2023 at 7:11 AM Tomas Vondra
>  wrote:
>>
> I was going through to understand the idea, couple of observations
> 
> --
> + for (int i = 0; i < PREFETCH_LRU_SIZE; i++)
> + {
> + entry = >prefetchCache[lru * PREFETCH_LRU_SIZE + i];
> +
> + /* Is this the oldest prefetch request in this LRU? */
> + if (entry->request < oldestRequest)
> + {
> + oldestRequest = entry->request;
> + oldestIndex = i;
> + }
> +
> + /*
> + * If the entry is unused (identified by request being set to 0),
> + * we're done. Notice the field is uint64, so empty entry is
> + * guaranteed to be the oldest one.
> + */
> + if (entry->request == 0)
> + continue;
> 
> If the 'entry->request == 0' then we should break instead of continue, right?
> 

Yes, I think that's true. The small LRU caches are accessed/filled
linearly, so once we find an empty entry, all following entries are
going to be empty too.

I thought this shouldn't make any difference, because the LRUs are very
small (only 8 entries, and I don't think we should make them larger).
And it's going to go away once the cache gets full. But now that I think
about it, maybe this could matter for small queries that only ever hit a
couple rows. Hmmm, I'll have to check.

Thanks for noticing this!
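
For clarity, the loop with that change would look roughly like this (the array
access is abbreviated here; variable names otherwise follow the hunk quoted
above):

for (int i = 0; i < PREFETCH_LRU_SIZE; i++)
{
    entry = &lru_entries[i];    /* stands in for the quoted array access */

    /* Is this the oldest prefetch request in this LRU? */
    if (entry->request < oldestRequest)
    {
        oldestRequest = entry->request;
        oldestIndex = i;
    }

    /*
     * Unused entries (request == 0) are guaranteed to be the oldest ones,
     * and the LRU is filled linearly, so everything after the first empty
     * entry is empty as well -- stop scanning.
     */
    if (entry->request == 0)
        break;
}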

> ---
> /*
>  * Used to detect sequential patterns (and disable prefetching).
>  */
> #define PREFETCH_QUEUE_HISTORY 8
> #define PREFETCH_SEQ_PATTERN_BLOCKS 4
> 
> If for sequential patterns we search only 4 blocks then why we are
> maintaining history for 8 blocks
> 
> ---

Right, I think there's no reason to keep these two separate constants. I
believe this is a remnant from an earlier patch version which tried to
do something smarter, but I ended up abandoning that.

> 
> + *
> + * XXX Perhaps this should be tied to effective_io_concurrency somehow?
> + *
> + * XXX Could it be harmful that we read the queue backwards? Maybe memory
> + * prefetching works better for the forward direction?
> + */
> + for (int i = 1; i < PREFETCH_SEQ_PATTERN_BLOCKS; i++)
> 
> Correct, I think if we fetch this forward it will have an advantage
> with memory prefetching.
> 

OK, although we only really have a couple uint32 values, so it should be
the same cacheline I guess.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Ishaan Adarsh
The recent documentation patches are part of my GSoC 2023 project

to develop a comprehensive PostgreSQL extension development tutorial, it
assumes only a basic knowledge of Postgres and the target programming
language.

The entire project is available on GitHub: Postgres-extension-tutorial
.
It covers many topics, including prerequisites, writing extensions,
creating Makefiles, using procedural languages, incorporating external
languages, writing regression tests, and managing extension releases.
*The patch submitted
for procedural languages, specifically PL/pgSQL and PL/Python, is part of
the procedural language section within the broader tutorial. *

Based on the feedback I think there is a real need
 for this as this is
a very important and growing part of the Postgres ecosystem. Currently, all
the extension material is scattered and very limited. There are various
third-party blog posts focusing on different areas, and sometimes
contradictory. The main motivation behind making this is to make the barrier
for entry less prohibitive for new contributors.

I would greatly appreciate your input on how to add it to the existing
documentation (this is where I have major doubts) and any suggestions on
how to proceed. If there are areas where the existing documentation is
already sufficient or if there are ways to improve the overall structure, I
am open to making adjustments.

Best,
Ishaan Adarsh


On Thu, Dec 21, 2023 at 4:17 PM Pavel Stehule 
wrote:

> Hi
>
> On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
> wrote:
>
>> On 19.12.23 17:26, Ishaan Adarsh wrote:
>> > Subject: Clarification on the Purpose of the Patch
>> >
>> > Hi Peter,
>> >
>> > The intention was to address the challenge faced by newcomers in
>> > understanding how to write an extension for PostgreSQL. The existing
>> > documentation, while comprehensive, lacks a consolidated and
>> > easy-to-follow tutorial that serves as a quick start guide. The goal
>> was
>> > to create a beginner-friendly resource that assumes only knowledge of
>> > Postgres and the target language, making it accessible for new
>> > contributors because the barrier for entry is prohibitive for new
>> > contributors. There are various third-party blog posts focusing on
>> > different areas, and sometimes contradictory.
>>
>> Have you seen this:
>>
>> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>>
>> Maybe that could be extended/modified/simplified?
>>
>> > Specifically:
>> > 1. The new section titled "Quick Start Guide" aims to provide
>> > step-by-step instructions for users to get started with writing
>> > extensions in PL/pgSQL and PL/Python.
>>
>> What's confusing here is writing an extension in a PL language is not a
>> normal use case I'd say.  The normal use case involves some C code.
>>
>
>  Extensions were designed for C, but they work well with PL too.
> Some of my customers use extensions for PLpgSQL and they are almost happy.
> 1) there is nothing else, 2) it really works
>
> I agree with Peter - this topic is not what I imagine under "Quick start
> guide"
>
> Regards
>
> Pavel
>


Re: index prefetching

2023-12-21 Thread Tomas Vondra
On 12/20/23 20:09, Robert Haas wrote:
> On Tue, Dec 19, 2023 at 8:41 PM Tomas Vondra
> ...
>> I have imagined something like this:
>>
>> nodeIndexscan / index_getnext_slot()
>> -> no callback, all TIDs are prefetched
>>
>> nodeIndexonlyscan / index_getnext_tid()
>> -> callback checks VM for the TID, prefetches if not all-visible
>> -> the VM check result is stored in the queue with the VM (but in an
>>extensible way, so that other callback can store other stuff)
>> -> index_getnext_tid() also returns this extra information
>>
>> So not that different from the WIP patch, but in a "generic" and
>> extensible way. Instead of hard-coding the all-visible flag, there'd be
>> a something custom information. A bit like qsort_r() has a void* arg to
>> pass custom context.
>>
>> Or if envisioned something different, could you elaborate a bit?
> 
> I can't totally follow the sketch you give above, but I think we're
> thinking along similar lines, at least.
> 

Yeah, it's hard to discuss vague descriptions of code that does not
exist yet. I'll try to do the actual patch, then we can discuss.

>>> index_prefetch_is_sequential() makes me really nervous
>>> because it seems to depend an awful lot on whether the OS is doing
>>> prefetching, and how the OS is doing prefetching, and I think those
>>> might not be consistent across all systems and kernel versions.
>>
>> If the OS does not have read-ahead, or it's not configured properly,
>> then the patch does not perform worse than what we have now. I'm far
>> more concerned about the opposite issue, i.e. causing regressions with
>> OS-level read-ahead. And the check handles that well, I think.
> 
> I'm just not sure how much I believe that it's going to work well
> everywhere. I mean, I have no evidence that it doesn't, it just kind
> of looks like guesswork to me. For instance, the behavior of the
> algorithm depends heavily on PREFETCH_QUEUE_HISTORY and
> PREFETCH_SEQ_PATTERN_BLOCKS, but those are just magic numbers. Who is
> to say that on some system or workload you didn't test the required
> values aren't entirely different, or that the whole algorithm doesn't
> need rethinking? Maybe we can't really answer that question perfectly,
> but the patch doesn't really explain the reasoning behind this choice
> of algorithm.
> 

You're right that a lot of this is guesswork. I don't think we can do much
better, because it depends on stuff that's out of our control - each OS
may do things differently, or perhaps it's just configured differently.

But I don't think this is really a serious issue - all the read-ahead
implementations need to work about the same, because they are meant to
work in a transparent way.

So it's about deciding at which point we think this is a sequential
pattern. Yes, the OS may use a slightly different threshold, but the
exact value does not really matter - in the worst case we prefetch a
couple more/fewer blocks.

The OS read-ahead can't really prefetch anything except sequential
cases, so the whole question is "When does the access pattern get
sequential enough?". I don't think there's a perfect answer, and I don't
think we need a perfect one - we just need to be reasonably close.

Also, while I don't want to lazily dismiss valid cases that might be
affected by this, I think that sequential access for index paths is not
that common (with the exception of clustered indexes).

FWIW bitmap index scans have exactly the same "problem" except that no
one cares about it because that's how it worked from the start, so it's
not considered a regression.

>>> Similarly with index_prefetch(). There's a lot of "magical"
>>> assumptions here. Even index_prefetch_add_cache() has this problem --
>>> the function assumes that it's OK if we sometimes fail to detect a
>>> duplicate prefetch request, which makes sense, but under what
>>> circumstances is it necessary to detect duplicates and in what cases
>>> is it optional? The function comments are silent about that, which
>>> makes it hard to assess whether the algorithm is good enough.
>>
>> I don't quite understand what problem with duplicates you envision here.
>> Strictly speaking, we don't need to detect/prevent duplicates - it's
>> just that if you do posix_fadvise() for a block that's already in
>> memory, it's overhead / wasted time. The whole point is to not do that
>> very often. In this sense it's entirely optional, but desirable.
> 
> Right ... but the patch sets up some data structure that will
> eliminate duplicates in some circumstances and fail to eliminate them
> in others. So it's making a judgement that the things it catches are
> the cases that are important enough that we need to catch them, and
> the things that it doesn't catch are cases that aren't particularly
> important to catch. Here again, PREFETCH_LRU_SIZE and
> PREFETCH_LRU_COUNT seem like they will have a big impact, but why
> these values? The comments suggest that it's because we want to cover
> ~8MB of data, but it's not clear 

Re: Building PosgresSQL with LLVM fails on Solaris 11.4

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-13 15:18:02 +, Sacha Hottinger wrote:
> Thanks for your reply.
> The reason I was suspicious with the warnings of the gcc build was, because 
> gmake check reported 138 out of 202 tests to have failed. I have attached the 
> output of gmake check.

That'll likely be due to assertion / segmentation failures.

You'd need to enable core dumps and show a backtrace.

I assume that if you run tests without JIT support (e.g. by export
PGOPTIONS='-c jit=0'; gmake check), no such problem occurs?


> After you mentioned that gcc did not report any errors, just warnings, we 
> installed the build.
> First, it seemed to work and SELECT pg_jit_available(); showed 
> "pg_jit_available" as "t" but the DB showed strange behaviour. I.e. not 
> always, but sometimes running "show parallel_tuple_cost" caused postmaster to 
> restart a server process.
> We had to go back to the previous installation.
> 
> It seems there is definitely something wrong with the result gcc created.

I suspect that the LLVM version you used does something wrong on sparc. Which
version of LLVM is it?

Greetings,

Andres Freund




Re: Detecting some cases of missing backup_label

2023-12-21 Thread David Steele

On 12/21/23 07:37, Andres Freund wrote:


On 2023-12-20 13:11:37 -0400, David Steele wrote:

I've run this through a bunch of scenarios (in my head) with parallel
backups and it does seem to hold up.

I think we'd need to write the state file before XLOG_BACKUP_START just in
case. Seems better to have an extra state file rather than have one be
missing.


That'd very significantly weaken the approach, afaict, because "external" base
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so - because we'd only verify that a backup start has
been replayed when replaying XLOG_BACKUP_STOP there's no danger in not
creating the files during XLOG_BACKUP_START, but doing so just before logging
the XLOG_BACKUP_STOP.


Ugh, I meant XLOG_BACKUP_STOP. So sounds like we are on the same page.


Probably we'd want to exclude *all* state files from backups, though.


I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.


Fair enough.


Seems like in various PITR scenarios it could be hard to determine when to
remove them.


Why? I think we can basically remove the files when:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
we already have the infrastructure to queue file deletions that we can hook
into
b) when replaying a shutdown checkpoint / after creation of a shutdown
checkpoint


I thought about this some more. I *think* any state files a backup can 
see would have to be for XLOG_BACKUP_STOP records generated during the 
backup and they would get removed before the cluster had recovered to 
consistency.


I'd still prefer to exclude state files from the backup, but I agree 
there is no actual need to do so.


Regards,
-David




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread shveta malik
On Thu, Dec 21, 2023 at 5:04 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > > While listening at Bertrand's talk about logical decoding on standbys
> > > > last week at Prague, I got surprised by the fact that we do not
> > > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > > - WAL removed.
> > > > - Invalid horizon.
> > > > - Insufficient WAL level.
> > >
> > > It should be extremely rare to hit any of these other than "WAL removed", 
> > > so
> > > I'm not sure it's worth adding interface complexity to show them.
> > >
> > >
> > > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > > for monitoring purposes to know why a slot got invalidated and add a
> > > > column to pg_get_replication_slots()?  This could just be an extra
> > > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > > see.
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do 
> > > something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either. A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>
>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>
>
> >  In the other feature being discussed "Synchronize slots from primary
> > to standby" [1] , there is a requirement to replicate invalidation
> > cause of slot from the primary to standby and thus it is needed in
> > enum form there. And thus there was a suggestion earlier to have the
> > function return enum-value and let the view display it as
> > text/description to the user.  So kindly let us know your thoughts.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com
>
> Can you point me to a more specific message for that requirement? It seems
> pretty odd to me. Your link goes to the top of a 400 message thread, I don't
> have time to find one specific design point in that...

It is currently implemented there as a new function
'pg_get_slot_invalidation_cause()'  without changing existing view
pg_replication_slots. (See 2.1 in [1] where it was introduced).

Then it was suggested in [2] to fork a new thread as it makes sense to
have it independent of this slot-synchronization feature.

The new thread forked is [3]. In that thread, the issues in having a
new function pg_get_slot_invalidation_cause() are discussed and also
we came to know about this very thread that started the next day.

[1]: 
https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

thanks
Shveta




Re: trying again to get incremental backup

2023-12-21 Thread Robert Haas
On Wed, Dec 20, 2023 at 11:00 PM Alexander Lakhin  wrote:
> I've found several typos/inconsistencies introduced with 174c48050 and
> dc2123400. Maybe you would want to fix them, while on it?:

That's an impressively long list of mistakes in something I thought
I'd been careful about. Sigh.

I don't suppose you could provide these corrections in the form of a
patch? I don't really want to run these sed commands across the entire
tree and then try to figure out what's what...

> Also, a comment above MaybeRemoveOldWalSummaries() basically repeats a
> comment above redo_pointer_at_last_summary_removal declaration, but
> perhaps it should say about removing summaries instead?

Wow, yeah. Thanks, will fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Detecting some cases of missing backup_label

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-20 13:11:37 -0400, David Steele wrote:
> I've run this through a bunch of scenarios (in my head) with parallel
> backups and it does seem to hold up.
>
> I think we'd need to write the state file before XLOG_BACKUP_START just in
> case. Seems better to have an extra state file rather than have one be
> missing.

That'd very significantly weaken the approach, afaict, because "external" base
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so - because we'd only verify that a backup start has
been replayed when replaying XLOG_BACKUP_STOP there's no danger in not
creating the files during XLOG_BACKUP_START, but doing so just before logging
the XLOG_BACKUP_STOP.



> I'm a little worried about what happens if a state file goes missing, but I
> guess that could be true of any file in PGDATA.

Yea, that seems like a non-issue to me.


> Probably we'd want to exclude *all* state files from backups, though.

I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.


> Seems like in various PITR scenarios it could be hard to determine when to
> remove them.

Why? I think we can basically remove the files when:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
   we already have the infrastructure to queue file deletions that we can hook
   into
b) when replaying a shutdown checkpoint / after creation of a shutdown
   checkpoint

Greetings,

Andres Freund




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > While listening at Bertrand's talk about logical decoding on standbys
> > > last week at Prague, I got surprised by the fact that we do not
> > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > - WAL removed.
> > > - Invalid horizon.
> > > - Insufficient WAL level.
> >
> > It should be extremely rare to hit any of these other than "WAL removed", so
> > I'm not sure it's worth adding interface complexity to show them.
> >
> >
> > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > for monitoring purposes to know why a slot got invalidated and add a
> > > column to pg_get_replication_slots()?  This could just be an extra
> > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > see.
> >
> > Extra columns aren't free from a usability perspective. IFF we do 
> > something, I
> > think it should be a single column with a cause.
>
> Thanks for the feedback. But do you mean that we replace existing
> 'conflicting' column with 'cause' in both the function and view
> (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> that we expose 'cause' from pg_get_replication_slots() and use that to
> display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either. A conflicting column where NULL indicates no conflict, and other
values indicate the reason for the conflict, doesn't seem too bad.


> And if we plan to return/display cause from either function or view,
> then shall it be enum 'ReplicationSlotInvalidationCause' or
> description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.


>  In the other feature being discussed "Synchronize slots from primary
> to standby" [1] , there is a requirement to replicate invalidation
> cause of slot from the primary to standby and thus it is needed in
> enum form there. And thus there was a suggestion earlier to have the
> function return enum-value and let the view display it as
> text/description to the user.  So kindly let us know your thoughts.
>
> [1] - 
> https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com

Can you point me to a more specific message for that requirement? It seems
pretty odd to me. Your link goes to the top of a 400 message thread, I don't
have time to find one specific design point in that...

Greetings,

Andres




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 14:41:37 +0700, John Naylor wrote:
> I've attached v47, which is v46 plus some fixes for radix tree.

Could either of you summarize what the design changes you've made in the last
months are and why you've done them? Unfortunately this thread is very long,
and the comments in the file just say "FIXME" in places that apparently are
affected by design changes.  This makes it hard to catch up here.

Greetings,

Andres Freund




Re: Autonomous transactions 2023, WIP

2023-12-21 Thread Andrey M. Borodin



> On 15 Dec 2023, at 16:28, Ivan Kush  wrote:
> 
> 
> 
> Hello. I'm working on the support of autonomous transactions in Postgres.
> 

> # Summary
> * Add pragma AUTONOMOUS_TRANSACTION in functions. When a function 
> contains this pragma, it is executed autonomously
> * Background workers are used to run autonomous sessions.
> * Synchronous execution between backend and autonomous session
> * Postgres Client-Server Protocol is used to communicate between them
> * Pool of autonomous sessions. Pool is created lazily.
> * Infinite nested calls of autonomous functions are allowed. Limited 
> only by computer resources.
> * If another 2nd autonomous function is called in the 1st autonomous 
> function, the 2nd is executed at the beginning, and then the 1st 
> continues execution.

Cool, looks interesting! As far as I know EnterpriseDB, Postgres Pro and
OracleDB have this functionality. So it seems the feature is in demand.
How does your version compare to these widely used databases? Is anyone else
using background connections? Which syntax is used by other DBMSs?

Looking into the code, it seems like an easy way for a PL/pgSQL function to have
a client connection. I think this might work for other PLs too.

The patch touches translations (src/backend/po/). I think we typically do not
do this in code patches, because this work is better handled by translators.


Best regards, Andrey Borodin.



Re: partitioning and identity column

2023-12-21 Thread Peter Eisentraut

On 19.12.23 11:47, Ashutosh Bapat wrote:

At this point I am looking for opinions on the above rules and whether
the implementation is on the right track.


This looks on the right track to me.


0001 - change to get_partition_ancestors() prologue. Can be reviewed
and committed independent of other patches.


I committed that.


0004 - An attached partition inherits identity property and uses the
underlying sequence for direct INSERTs. When inheriting the identity
property it should also inherit the NOT NULL constraint, but that's a
TODO in this patch. We expect matching NOT NULL constraints to be
present in the partition being attached. I am not sure whether we want
to add NOT NULL constraints automatically for an identity column. We
require a NOT NULL constraint to be present when adding identity
property to a column. The behavior in the patch seems to be consistent
with this.


I think it makes sense that the NOT NULL constraint must be added 
manually before attaching  is allowed.






Re: Add --check option to pgindent

2023-12-21 Thread Jelte Fennema-Nio
On Tue, 19 Dec 2023 at 17:54, Tristan Partin  wrote:
> I was envisioning something along the lines of:
>
> pgindent --check --diff > patches.txt
> status=$?
> patch  with manual parsing
> exit $status

Okay, I got a working version. And I updated the pre-commit hook on
the wiki accordingly.




int4->bool test coverage

2023-12-21 Thread Christoph Berg
I was surprised to learn that 2 is a valid boolean (thanks Berge):

# select 2::boolean;
 bool
──
 t

... while '2' is not:

# select '2'::boolean;
ERROR:  22P02: invalid input syntax for type boolean: "2"
LINE 1: select '2'::boolean;
   ^
LOCATION:  boolin, bool.c:151


The first cast is the int4_bool function, but it isn't covered by the
regression tests at all. The attached patch adds tests.
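
For context (paraphrasing from memory, not a verbatim copy of the backend
code), the int4-to-bool cast boils down to "nonzero means true":

Datum
int4_bool(PG_FUNCTION_ARGS)
{
	if (PG_GETARG_INT32(0) == 0)
		PG_RETURN_BOOL(false);
	else
		PG_RETURN_BOOL(true);
}

boolin(), in contrast, only accepts a fixed set of spellings ('t'/'f',
'true'/'false', 'yes'/'no', 'on'/'off', '1'/'0'), which is why '2' as a string
literal is rejected.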

Christoph
>From 5752d75122db323b4066dd604d0c7a19077641a6 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Thu, 21 Dec 2023 11:43:28 +0100
Subject: [PATCH] Add tests for int4_bool

This cast was previously not covered at all by the regression tests.
---
 src/test/regress/expected/boolean.out | 19 +++
 src/test/regress/sql/boolean.sql  |  6 ++
 2 files changed, 25 insertions(+)

diff --git a/src/test/regress/expected/boolean.out b/src/test/regress/expected/boolean.out
index ee9c244bf8..ff9440df7e 100644
--- a/src/test/regress/expected/boolean.out
+++ b/src/test/regress/expected/boolean.out
@@ -566,6 +566,25 @@ SELECT isnul OR istrue OR isfalse FROM booltbl4;
  t
 (1 row)
 
+-- Casts
+SELECT 0::boolean;
+ bool 
+--
+ f
+(1 row)
+
+SELECT 1::boolean;
+ bool 
+--
+ t
+(1 row)
+
+SELECT 2::boolean;
+ bool 
+--
+ t
+(1 row)
+
 --
 -- Clean up
 -- Many tables are retained by the regression test, but these do not seem
diff --git a/src/test/regress/sql/boolean.sql b/src/test/regress/sql/boolean.sql
index bc9937d692..cde6cd3576 100644
--- a/src/test/regress/sql/boolean.sql
+++ b/src/test/regress/sql/boolean.sql
@@ -251,6 +251,12 @@ SELECT istrue OR isfalse OR isnul FROM booltbl4;
 SELECT isnul OR istrue OR isfalse FROM booltbl4;
 
 
+-- Casts
+SELECT 0::boolean;
+SELECT 1::boolean;
+SELECT 2::boolean;
+
+
 --
 -- Clean up
 -- Many tables are retained by the regression test, but these do not seem
-- 
2.43.0



Re: POC: GROUP BY optimization

2023-12-21 Thread Alexander Korotkov
Hi!

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
 wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.

Thank you (and other authors) for working on this subject.  Indeed,
GROUP BY clauses are order-agnostic.  Reordering them into the most
suitable order could give us significant query planning benefits.  I
went through the thread: I see significant work has already been done
on this patch, and the code is quite polished.

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern in the
code following get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please do
this as a separate patch coming before the group-by patch. That would
simplify the review.

2) I wonder what planning overhead this patch could introduce.  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially matching
every index?  This should give us an understanding of whether we need
a separate GUC to control this feature.

3) I see that get_useful_group_keys_orderings() makes 3 calls to the
get_cheapest_group_keys_order() function.  Each time,
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies system
catalog lookups (which are quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then re-use it?  Then, every time we
need to reorder a group by, we can just pull the required keys to the
top and use the saved ordering for the rest.  I also wonder if we could do
this once for the add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So it probably should be
somewhere in create_ordinary_grouping_paths().
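
To illustrate the idea in isolation (standalone toy code, not planner code; all
names invented): sort the keys by cost once, and for each path do a stable
partition that pulls the keys the path already provides to the front while
keeping the precomputed cost order among the rest.

#include <stdbool.h>
#include <stdlib.h>

typedef struct GroupKey { int keyno; double cost; } GroupKey;

static int
cmp_cost(const void *a, const void *b)
{
    const GroupKey *ka = a, *kb = b;
    return (ka->cost > kb->cost) - (ka->cost < kb->cost);
}

/* Done once: order the group-by keys from cheapest to most expensive. */
static void
sort_keys_by_cost(GroupKey *keys, int nkeys)
{
    qsort(keys, nkeys, sizeof(GroupKey), cmp_cost);
}

/*
 * Done per input path: copy the keys the path already provides first (in the
 * path's order), then append the remaining keys in the saved cost order.
 */
static void
reorder_for_path(const GroupKey *sorted, int nkeys,
                 const int *path_keynos, int npathkeys, GroupKey *out)
{
    int n = 0;

    for (int i = 0; i < npathkeys; i++)
        for (int j = 0; j < nkeys; j++)
            if (sorted[j].keyno == path_keynos[i])
                out[n++] = sorted[j];

    for (int j = 0; j < nkeys; j++)
    {
        bool already = false;

        for (int i = 0; i < npathkeys && !already; i++)
            already = (sorted[j].keyno == path_keynos[i]);
        if (!already)
            out[n++] = sorted[j];
    }
}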

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

--
Regards,
Alexander Korotkov




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Daniel Gustafsson
> On 21 Dec 2023, at 10:16, Andres Freund  wrote:
> 
> Hi,
> 
> On 2023-12-20 15:28:56 +0100, Daniel Gustafsson wrote:
>> +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
>> unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
>> targets..
> 
> Well, building the PDF takes a *long* time and is rarely required. I think
> there's an argument for adding INSTALL to all - however, there's a reason not
> to as well: It has pandoc as an additional dependency, which isn't small...

Yeah, I'm not advocating changing anything.

--
Daniel Gustafsson





Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Pavel Stehule
Hi

On Thu, Dec 21, 2023 at 11:18 Peter Eisentraut 
wrote:

> On 19.12.23 17:26, Ishaan Adarsh wrote:
> > Subject: Clarification on the Purpose of the Patch
> >
> > Hi Peter,
> >
> > The intention was to address the challenge faced by newcomers in
> > understanding how to write an extension for PostgreSQL. The existing
> > documentation, while comprehensive, lacks a consolidated and
> > easy-to-follow tutorial that serves as a quick start guide. The goal was
> > to create a beginner-friendly resource that assumes only knowledge of
> > Postgres and the target language, making it accessible for new
> > contributors because the barrier for entry is prohibitive for new
> > contributors. There are various third-party blog posts focusing on
> > different areas, and sometimes contradictory.
>
> Have you seen this:
>
> https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE
>
> Maybe that could be extended/modified/simplified?
>
> > Specifically:
> > 1. The new section titled "Quick Start Guide" aims to provide
> > step-by-step instructions for users to get started with writing
> > extensions in PL/pgSQL and PL/Python.
>
> What's confusing here is writing an extension in a PL language is not a
> normal use case I'd say.  The normal use case involves some C code.
>

 Extensions were designed for C, but they work well with PL too.
Some of my customers use extensions for PLpgSQL and they are almost happy.
1) there is nothing else, 2) it really works

I agree with Peter - this topic is not what I imagine under "Quick start
guide"

Regards

Pavel


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread shveta malik
On Thu, Dec 21, 2023 at 3:10 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > While listening at Bertrand's talk about logical decoding on standbys
> > last week at Prague, I got surprised by the fact that we do not
> > reflect in the catalogs the reason why a conflict happened for a slot.
> > There are three of them depending on ReplicationSlotInvalidationCause:
> > - WAL removed.
> > - Invalid horizon.
> > - Insufficient WAL level.
>
> It should be extremely rare to hit any of these other than "WAL removed", so
> I'm not sure it's worth adding interface complexity to show them.
>
>
> > ReplicationSlotCtl holds this information, so couldn't it be useful
> > for monitoring purposes to know why a slot got invalidated and add a
> > column to pg_get_replication_slots()?  This could just be an extra
> > text conflicting_reason, defaulting to NULL when there's nothing to
> > see.
>
> Extra columns aren't free from a usability perspective. IFF we do something, I
> think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

 In the other feature being discussed "Synchronize slots from primary
to standby" [1] , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user.  So kindly let us know your thoughts.

[1] - 
https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011...@enterprisedb.com

thanks
Shveta




Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-21 Thread Peter Eisentraut

On 19.12.23 17:26, Ishaan Adarsh wrote:

Subject: Clarification on the Purpose of the Patch

Hi Peter,

The intention was to address the challenge faced by newcomers in 
understanding how to write an extension for PostgreSQL. The existing 
documentation, while comprehensive, lacks a consolidated and 
easy-to-follow tutorial that serves as a quick start guide. The goal was 
to create a beginner-friendly resource that assumes only knowledge of 
Postgres and the target language, making it accessible for new 
contributors because the barrier for entry is prohibitive for new 
contributors. There are various third-party blog posts focusing on 
different areas, and sometimes contradictory.


Have you seen this: 
https://www.postgresql.org/docs/devel/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE


Maybe that could be extended/modified/simplified?


Specifically:
1. The new section titled "Quick Start Guide" aims to provide 
step-by-step instructions for users to get started with writing 
extensions in PL/pgSQL and PL/Python.


What's confusing here is writing an extension in a PL language is not a 
normal use case I'd say.  The normal use case involves some C code.






Re: speed up a logical replica setup

2023-12-21 Thread Amit Kapila
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
 wrote:
>
> Here are some comments about functionality and design.
>
> + 
> + 
> + pg_subscriber creates one replication slot for
> + each specified database on the source server. The replication slot name
> + contains a pg_subscriber prefix. These replication
> + slots will be used by the subscriptions in a future step. Another
> + replication slot is used to get a consistent start location. This
> + consistent LSN will be used as a stopping point in the <xref linkend="guc-recovery-target-lsn"/> parameter and by the
> + subscriptions as a replication starting point. It guarantees that no
> + transaction will be lost.
> + 
> + 
>
> CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
> complete. So it may not be possible to have an incomplete transaction
> on standby when it comes out of recovery. Am I correct? Can we please
> have a testcase where we test this scenario? What about a prepared
> transactions?
>

It will wait even for prepared transactions to commit. So, there
shouldn't be any behavior difference for prepared and non-prepared
transactions.

> +
> + 
> + 
> + pg_subscriber writes recovery parameters into
> + the target data directory and start the target server. It specifies a LSN
> + (consistent LSN that was obtained in the previous step) of write-ahead
> + log location up to which recovery will proceed. It also specifies
> + promote as the action that the server should take once
> + the recovery target is reached. This step finishes once the server ends
> + standby mode and is accepting read-write operations.
> + 
> + 
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.
>

Good points, but even if we make it user-configurable, how do we exclude
such replication objects? And if we don't exclude them, what will be
their use? If one wants to use it as a logical standby, we only need
publications and failover/sync slots in it, and there also won't be a
need to create new slots and publications on the primary to make the
current physical standby a logical subscriber.

> There may be other objects in this category which need special consideration 
> on
> the subscriber. I haven't fully thought through the list of such objects.
>
> + uses the replication slot that was created in a previous step. The
> + subscription is created but it is not enabled yet. The reason is the
> + replication progress must be set to the consistent LSN but replication
> + origin name contains the subscription oid in its name. Hence, the
>
> Not able to understand the sentence "The reason is ... in its name".
> Why is subscription OID in origin name matters?
>

Using the subscription OID in the origin name is probably to uniquely identify
the origin corresponding to the subscription; we do that while creating a
subscription as well.

-- 
With Regards,
Amit Kapila.




Re: Postgres picks suboptimal index after building of an extended statistics

2023-12-21 Thread Alexander Korotkov
On Thu, Dec 21, 2023 at 10:41 AM Andrei Lepikhov
 wrote:
>
> On 18/12/2023 15:29, Alexander Korotkov wrote:
> > Also, there is a set of patches [7], [8], and [9], which makes the
> > optimizer consider path selectivity as long as path costs during the
> > path selection.  I've rechecked that none of these patches could resolve
> > the original problem described in [1].
> It is true. We accidentally mixed two different problems in one thread.
> >  Also, I think they are quite
> > tricky.  The model of our optimizer assumes that paths in the list
> > should be the different ways of getting the same result.  If we choose
> > the paths by their selectivity, that breaks this model.  I don't say
> > there is no way for this.  But if we do this, that would require
> > significant rethinking of our optimizer model and possible revision of a
> > significant part of it.
> I can't understand that. In [9] we just elaborate the COSTS_EQUAL case
> and establish final decision on more stable basis than a casual order of
> indexes in the list.

I took a closer look at the patch in [9].  I should drop my argument
about breaking the model, because add_path() already considers other
aspects than just costs.  But I have two more notes about that patch:

1) It seems that you're determining the fact that the index path
should return strictly one row by checking path->rows <= 1.0 and
indexinfo->unique.  Is it really guaranteed that in this case quals
are matching unique constraint?  path->rows <= 1.0 could be just an
estimation error.  Or one row could be correctly estimated, but it's
going to be selected by some quals matching unique constraint and
other quals in recheck.  So, it seems there is a risk to select
suboptimal index due to this condition.

2) Even for non-unique indexes this patch is putting new logic on top
of the subsequent code.  How can we prove it's going to be a win?
That could lead, for instance, to dropping parallel-safe paths in
cases where we didn't do so before.

Anyway, please start a separate thread if you're willing to put more
work into this.

> >  Anyway, I think if there is still interest in
> > this, that should be moved into a separate thread to keep this thread
> > focused on the problem described in [1].
> Agree. IMO, the problem of optimizer dependency on an order of indexes
> in the relation index list is more urgent for now.
> >
> > Finally, I'd like to note that the issue described in [1] is mostly the
> > selectivity estimation problem.  It could be solved by adding the
> > multi-column MCV statistics.  The patches published so far look more
> > like hacks for particular use cases rather than appropriate solutions.
> > It still looks promising to me to use the knowledge of unique
> > constraints during selectivity estimation [10].  Even though it's hard
> > to implement and possibly implies some overhead, it fits the current
> > model.  I also think unique contracts could probably be used in some way
> > to improve estimates even when there is no full match.
> I have tried to use the knowledge about unique indexes in the
> selectivity estimation routine. But it looks invasive and adds a lot of
> overhead.

I got it.  But that doesn't look like enough to decide there is no way.  Could
you please share some of your results?  It might turn out that we just
need to rework some of the data structures to make this information more
easily accessible at the selectivity estimation stage.

--
Regards,
Alexander Korotkov




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> While listening at Bertrand's talk about logical decoding on standbys
> last week at Prague, I got surprised by the fact that we do not
> reflect in the catalogs the reason why a conflict happened for a slot.
> There are three of them depending on ReplicationSlotInvalidationCause:
> - WAL removed.
> - Invalid horizon.
> - Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.


> ReplicationSlotCtl holds this information, so couldn't it be useful
> for monitoring purposes to know why a slot got invalidated and add a
> column to pg_get_replication_slots()?  This could just be an extra
> text conflicting_reason, defaulting to NULL when there's nothing to
> see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Greetings,

Andres Freund




Re: Synchronizing slots from primary to standby

2023-12-21 Thread Bertrand Drouvot
Hi,

On Thu, Dec 21, 2023 at 02:23:12AM +, Zhijie Hou (Fujitsu) wrote:
> On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) 
>  wrote:
> > 
> > Attach the V51 patch set which addressed Kuroda-san's comments.
> > I also tried to improve the test in 0003 to make it stable.
> 
> The patches conflict with a recent commit dc21234.
> Here is the rebased V51_2 version, there is no code changes in this version.
> 

Thanks!

I've a few remarks regarding 0001:

1 === 

In the commit message what about replacing "Allow logical walsenders to wait for
the physical standbys" with "Force some logical walsenders to wait for the
physical standbys"?

Also I think it would be better to first explain what we are trying to achieve
and then explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so
on).

2 ===

+  
+   
+List of physical replication slots that logical replication slots with
+failover enabled waits for.

Worth adding a few words about what we are actually waiting for?

3 ===

+   ereport(ERROR,
+   (errcode(ERRCODE_PROTOCOL_VIOLATION),
+errmsg("could not alter replication slot 
\"%s\" on publisher: %s",
+   slotname, 
pchomp(PQerrorMessage(conn->streamConn);

Should we mention "on publisher" here? What about removing the word "publisher"?

4 ===

@@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int elevel)
  * during getting changes, if the two_phase option is enabled it can skip
  * prepare because by that time start decoding point has been moved. So the
  * user will only get commit prepared.
+ * failover: If enabled, allows the slot to be synced to physical standbys so
+ * that logical replication can be resumed after failover.

s/allows/forces ?

5 ===

+   boolok;

parse_ok maybe?

6 ===

+   /* Need a modifiable copy of string. */
+   rawname = pstrdup(*newval);

It seems to me that the single-line comments in the neighboring functions (see
RestoreSlotFromDisk() for example) don't end with ".". Worth following the
same format for everything we add in slot.c?

7 ===

+static void
+parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)

ParseAlterReplSlotOptions instead?

8 ===

+* We do not need to change the failover to false if 
the server
+* does not support failover (e.g. pre-PG17)

Missing "." at the end.

9 ===

+* See comments above for twophasestate, same holds true for
+* 'failover'

Missing "." at the end.

10 ===

+++ b/src/include/replication/walsender.h
@@ -12,6 +12,8 @@
 #ifndef _WALSENDER_H
 #define _WALSENDER_H

+#include "access/xlogdefs.h"

Is this include needed?

11 ===

+* When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on another
+* CV that is woken up by physical walsenders when the walreceiver has
+* confirmed the receipt of LSN.

s/that is woken up by/that is broadcasted by/ ?

12 ===

We mention in several places that replication can be resumed after a
failover. Should we add a few words about the possible lag? (see [1])

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiGNyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Autonomous transactions 2023, WIP

2023-12-21 Thread Pavel Stehule
Hi

Although I like the idea of autonomous transactions, I don't think this
approach is the best one.

1. The solution based on background workers looks too fragile - it is easy to
exhaust all background workers, and because this feature is proposed mainly
for logging, that is a little dangerous: it can mean losing the ability to
log at all.

2. Although the Oracle syntax is interesting, and I have proposed PRAGMA
several times, it does not make this functionality available in other PLs.

I don't propose exactly the Firebird syntax
https://firebirdsql.org/refdocs/langrefupd25-psql-autonomous-trans.html,
but I think this solution is better than Ada's PRAGMAs. I can imagine some
special flag for a function, like

CREATE OR REPLACE FUNCTION ...
AS $$
$$ LANGUAGE plpgsql AUTONOMOUS TRANSACTION;

as another possibility.
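
For illustration, with this (hypothetical, not existing today) flag, a
logging helper for the use case from point 1 could look like:

CREATE TABLE app_log (ts timestamptz DEFAULT now(), message text);

-- hypothetical syntax: the AUTONOMOUS TRANSACTION flag is only a proposal
CREATE OR REPLACE FUNCTION log_message(msg text) RETURNS void
AS $$
BEGIN
    -- this INSERT should survive even when the caller's transaction rolls back
    INSERT INTO app_log (message) VALUES (msg);
END;
$$ LANGUAGE plpgsql AUTONOMOUS TRANSACTION;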

3. Heikki wrote about the possibility of supporting threads in Postgres. One
significant part of that project is the elimination of global variables, and
that work could be shared with autonomous transactions.

Surely, the first topic should be the method of implementation. Maybe I
missed it, but there is no agreement on a background-worker-based approach.

Regards

Pavel


Re: Synchronizing slots from primary to standby

2023-12-21 Thread shveta malik
On Wed, Dec 20, 2023 at 12:02 PM Peter Smith  wrote:
>
> Here are some comments for the patch v50-0002.

Thank you for the feedback. I have addressed these in v52.

> ==
> GENERAL
>
> (I made a short study of all the ereports in this patch -- here are
> some findings)
>
> ~~~
>
> 0.1 Don't need the parentheses.
>
> Checking all the ereports I see that half of them have the redundant
> parentheses and half of them do not; You might as well make them all
> use the new style where the extra parentheses are not needed.
>
> e.g.
> + ereport(LOG,
> + (errmsg("skipping slot synchronization"),
> + errdetail("enable_syncslot is disabled.")));
>
> e.g.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot drop replication slot \"%s\"", name),
> + errdetail("This slot is being synced from the primary server.")));
>
> and many more like this. Search for all the ereports.
>
> ~~~
>
> 0.2
> + ereport(LOG,
> + (errmsg("dropped replication slot \"%s\" of dbid %d as it "
> + "was not sync-ready", NameStr(s->data.name),
> + s->data.database)));
>
> I felt maybe that could be:
>
> errmsg("dropped replication slot \"%s\" of dbid %d", ...
> errdetail("It was not sync-ready.")
>
> (now this shares the same errmsg with another ereport)
>
> ~~~
>
> 0.3.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("skipping sync of slot \"%s\" as it is a user created"
> + " slot", remote_slot->name),
> + errdetail("This slot has failover enabled on the primary and"
> +" thus is sync candidate but user created slot with"
> +" the same name already exists on the standby.")));
>
> This seemed too wordy. Can't it be shortened (maybe like below)
> without losing any of the vital information?
>
> errmsg("skipping sync of slot \"%s\"", ...)
> errdetail("A user-created slot with the same name already exists on
> the standby.")

I have modified it a little more; please take a look. I wanted to add the
information that the slot-sync worker exits instead of just skipping the slot,
and that the slot in question is a failover slot on the primary. There were
other comments along the same lines.

> ~~~
>
> 0.4
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> +PrimarySlotName, "primary_slot_name")));
>
> /The primary slot/The primary server slot/
>
> ~~~
>
> 0.5
> + ereport(ERROR,
> + (errmsg("could not fetch primary_slot_name \"%s\" info from the "
> + "primary: %s", PrimarySlotName, res->err)));
>
> /primary:/primary server:/
>
> ~~~
>
> 0.6
> The continuations for long lines are inconsistent. Sometimes there are
> trailing spaces and sometimes there are leading spaces. And sometimes
> there are both at the same time which would cause double-spacing in
> the message! Please make them all the same. I think using leading
> spaces is easier but YMMV.
>
> e.g.
> + elog(ERROR,
> + "not synchronizing local slot \"%s\" LSN(%X/%X)"
> + " to remote slot's LSN(%X/%X) as synchronization "
> + " would move it backwards", remote_slot->name,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
>
> ==
> src/backend/replication/logical/slotsync.c
>
> 1. check_primary_info
>
> + /* No need to check further, return that we are cascading standby */
> + if (remote_in_recovery)
> + {
> + *am_cascading_standby = true;
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> + return;
> + }
> +
> + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> + Assert(!isnull);
> +
> + if (!valid)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> +PrimarySlotName, "primary_slot_name")));
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> +}
>
> Now that there is a common cleanup/return code this function be
> reduced further like below:
>
> SUGGESTION
>
> if (remote_in_recovery)
> {
>   /* No need to check further, return that we are cascading standby */
>   *am_cascading_standby = true;
> }
> else
> {
>   /* We are a normal standby. */
>
>   valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
>   Assert(!isnull);
>
>   if (!valid)
> ...
> }
>
> ExecClearTuple(tupslot);
> walrcv_clear_result(res);
> CommitTransactionCommand();
> }
>
> ~~~
>
> 2. ReplSlotSyncWorkerMain
>
> + /*
> + * One can promote the standby and we can no longer be a cascading
> + * standby. So recheck here.
> + */
> + if (am_cascading_standby)
> + check_primary_info(wrconn, &am_cascading_standby);
>
> Minor rewording of that new comment.
>
> SUGGESTION
> If the standby was promoted then what was 

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Dec 2023 23:31:29 +0900,
  Masahiko Sawada  wrote:

> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.

I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.

I needed to extend the patch:

1. Add an opaque space for custom COPY TO handler
   * Add CopyToState{Get,Set}Opaque()
   
https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944

2. Export CopyToState::attnumlist
   * Add CopyToStateGetAttNumList()
   
https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

3. Export CopySend*()
   * Rename CopySend*() to CopyToStateSend*() and export them
   * Exception: CopySendEndOfRow() to CopyToStateFlush() because
 it just flushes the internal buffer now.
   
https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e

The attached patch is based on Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty, so it is just for discussion.

My suggestions from this experience:

1. Split COPY handler to COPY TO handler and COPY FROM handler

   * CopyFormatRoutine is a bit tricky. An extension needs
 to create a CopyFormatRoutine node and
 a CopyToFormatRoutine node.

   * If we just require a "copy_to_${FORMAT}(internal)"
     function and a "copy_from_${FORMAT}(internal)" function,
     we can remove the tricky approach. It also avoids
     name collisions with other handlers such as the
     tablesample handler (see the sketch after this list).
 See also:
 
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

2. Need an opaque space like IndexScanDesc::opaque does

   * A custom COPY TO handler needs to keep its data

3. Export CopySend*()

   * If we like a minimal API, we just need to export
     CopySendData() and CopySendEndOfRow(). But
     CopySend{String,Char,Int32,Int16}() would be convenient
     for custom COPY TO handlers. (A custom COPY TO handler for
     Apache Arrow doesn't need them.)
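
To illustrate suggestion 1, registering and using a custom format could then
look roughly like the following. This is only a sketch: the "copy_to_handler"
return type and the 'arrow' format are placeholders, nothing like them exists
yet.

-- sketch only; "copy_to_handler" and the C symbol are placeholders
CREATE FUNCTION copy_to_arrow(internal) RETURNS copy_to_handler
    AS 'MODULE_PATHNAME', 'copy_to_arrow'
    LANGUAGE C STRICT;

COPY my_table TO '/tmp/my_table.arrow' WITH (FORMAT 'arrow');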

Questions:

1. What value should be used for "format" in
   PgMsg_CopyOutResponse message?

   
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144

   It's 1 for binary format and 0 for text/csv format.

   Should we make it customizable by custom COPY TO handler?
   If so, what value should be used for this?

2. Do we need more experiments for the design discussion before the first
   implementation? If so, what should we try?


Thanks,
-- 
kou
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..e7597894bf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -23,6 +23,7 @@
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -32,6 +33,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "utils/acl.h"
@@ -427,6 +429,8 @@ ProcessCopyOptions(ParseState *pstate,
 
 	opts_out->file_encoding = -1;
 
+	/* Text is the default format. */
+	opts_out->to_ops = 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
 	{
@@ -442,9 +446,26 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 opts_out->csv_mode = true;
+opts_out->to_ops = 
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 opts_out->binary = true;
+opts_out->to_ops = 
+			}
+			else if (!is_from)
+			{
+/*
+ * XXX: Currently we support custom COPY format only for COPY
+ * TO.
+ *
+ * XXX: need to check the combination of the existing options
+ * and a custom format (e.g., FREEZE)?
+ */
+opts_out->to_ops = GetCopyToFormatRoutine(fmt);
+			}
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -864,3 +885,62 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
 	return attnums;
 }
+
+static CopyFormatRoutine *
+GetCopyFormatRoutine(char *format_name, bool is_from)
+{
+	Oid			handlerOid;
+	Oid			funcargtypes[1];
+	CopyFormatRoutine *cp;
+	Datum		datum;
+
+	funcargtypes[0] = INTERNALOID;
+	handlerOid = LookupFuncName(list_make1(makeString(format_name)), 1,
+funcargtypes, true);
+
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+(errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("COPY format \"%s\" not recognized", format_name)));
+
+	datum = 

Re: Function to get invalidation cause of a replication slot.

2023-12-21 Thread Amit Kapila
On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier  wrote:
>
> On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote:
> > On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier  
> > wrote:
> > Yeah, if one uses them independently then there is no such guarantee.
>
> This could be possible in the same query as well, still less likely,
> as the contents are volatile.
>

True, this is quite obvious, but that was not the recommended way to use
the function. Anyway, now that we agree to expose it via an existing
function, there is no point in arguing this further.

> >>  A lot could happen between both function calls while the
> >> repslot LWLock is not hold.
> >>
> >> Yeah, you could keep the reason text as NULL when there is no
> >> conflict, replacing the boolean by the text in the function, and keep
> >> the view definition compatible with v16 while adding an extra column.
> >
> > But as mentioned we also want the enum value to be exposed in some way
> > so that it can be used by the sync slot feature [1] as well,
> > otherwise, we may need some mappings to convert the text back to an
> > enum. I guess if we want to expose via view, then we can return an
> > enum value by pg_get_replication_slots() and the view can replace it
> > with text based on the value.
>
> Sure.  Something like that is OK by me as long as the data is retrieved
> from a single scan of the slot data while holding the slot data's
> LWLock.
>

Okay, so let's go this way unless someone feels otherwise.
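
For illustration, the end result would then look something like this (the
column and value names here are placeholders until the patch settles):

-- hypothetical shape of the additional column discussed above
SELECT slot_name, conflicting, conflict_reason
FROM pg_replication_slots;
-- conflict_reason would be NULL when there is no conflict, otherwise a
-- textual cause such as 'wal_removed'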

-- 
With Regards,
Amit Kapila.




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-21 08:39:26 +0900, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 11:36:28AM -0500, Tom Lane wrote:
> > Andres Freund  writes:
> >> We fairly regularly have commits breaking the generation of INSTALL. IIRC 
> >> we
> >> recently discussed building it locally unconditionally, but I couldn't
> >> immediately find that discussion.  Until then, I think we should at least
> >> build it in CI so that cfbot can warn.
> > 
> > I thought the plan was to get rid of that file, in pursuit of making
> > our distribution tarballs be more or less pure git pulls.  Instead of
> > expending more effort on it, why not just push that project forward?
> > (IIRC, what we intended to do instead was to modify the top-level
> > README to point at the HTML install directions on the web.)

Ah, right.  I don't really care what solution we go for, just that as long as
we have INSTALL, we should make sure we don't regularly break it... Both
Michael and I have broken it in the last couple of weeks.


> Hmm.  It depends on if the next release should include it or not, but
> let me add my +1 for replacing it with a simple redirect.

Are you going to submit a patch for that bit?

Greetings,

Andres Freund




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
Hi,

On 2023-12-20 15:28:56 +0100, Daniel Gustafsson wrote:
> +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
> unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
> targets..

Well, building the PDF takes a *long* time and is rarely required. I think
there's an argument for adding INSTALL to all - however, there's a reason not
to as well: It has pandoc as an additional dependency, which isn't small...

Andres




Re: ci: Build standalone INSTALL file

2023-12-21 Thread Andres Freund
On 2023-12-21 08:44:33 +0900, Michael Paquier wrote:
> On Wed, Dec 20, 2023 at 03:28:56PM +0100, Daniel Gustafsson wrote:
> > +  time make -s -j${BUILD_JOBS} -C doc/src/sgml all INSTALL
> > unrelated pet peeve: "make -C doc/src/sgml all" doesn't build all docs 
> > targets..
> 
> That seems relevant in terms of coverage.  Why not just moving the
> INSTALL bit to a different line?

I am confused - which coverage could we be losing here?




Re: Remove MSVC scripts from the tree

2023-12-21 Thread Peter Eisentraut

On 20.12.23 16:43, Peter Eisentraut wrote:
> On 20.12.23 12:40, Andres Freund wrote:
>> Hm, or perhaps we should just get rid of sed use altogether. The sepgsql
>> case is trivially translatable to perl, and postprocess_dtrace.sed isn't
>> much harder.
>
> Maybe yeah, but also it seems fine as is and we can easily fix the
> present issue ...
>
>> OTOH, I actually don't think it's valid to not have sed when you have
>> dtrace. Erroring out in a weird way in such an artificially constructed
>> test doesn't really seem like a problem.
>
> Agreed.  So let's just make it not-required, and that should work.
>
> Updated patch set attached.


I have committed these two.




Re: Postgres picks suboptimal index after building of an extended statistics

2023-12-21 Thread Andrei Lepikhov

On 18/12/2023 15:29, Alexander Korotkov wrote:
> Also, there is a set of patches [7], [8], and [9], which makes the
> optimizer consider path selectivity as well as path costs during
> path selection.  I've rechecked that none of these patches could resolve
> the original problem described in [1].

It is true. We accidentally mixed two different problems in one thread.

> Also, I think they are quite tricky.  The model of our optimizer assumes
> that paths in the list should be different ways of getting the same
> result.  If we choose the paths by their selectivity, that breaks this
> model.  I don't say there is no way for this.  But if we do this, that
> would require significant rethinking of our optimizer model and possibly
> revision of a significant part of it.

I can't understand that. In [9] we just elaborate the COSTS_EQUAL case
and establish the final decision on a more stable basis than the casual
order of indexes in the list.

> Anyway, I think if there is still interest in this, that should be moved
> into a separate thread to keep this thread focused on the problem
> described in [1].

Agree. IMO, the problem of the optimizer's dependency on the order of
indexes in the relation index list is more urgent for now.

> Finally, I'd like to note that the issue described in [1] is mostly a
> selectivity estimation problem.  It could be solved by adding
> multi-column MCV statistics.  The patches published so far look more
> like hacks for particular use cases rather than appropriate solutions.
> It still looks promising to me to use the knowledge of unique
> constraints during selectivity estimation [10].  Even though it's hard
> to implement and possibly implies some overhead, it fits the current
> model.  I also think unique constraints could probably be used in some
> way to improve estimates even when there is no full match.

I have tried to use the knowledge about unique indexes in the
selectivity estimation routine. But it looks invasive and adds a lot of
overhead.
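
Just for reference, the multi-column MCV statistics mentioned above can
already be created today with the existing extended statistics machinery,
for example:

CREATE STATISTICS t_ab_mcv (mcv) ON a, b FROM t;
ANALYZE t;

The discussion above is about whether we can do better without requiring
users to create such statistics by hand.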


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Remove MSVC scripts from the tree

2023-12-21 Thread Michael Paquier
On Wed, Dec 20, 2023 at 11:39:15PM -0800, Andres Freund wrote:
> Can't we teach the tool that it should not validate src/tools/win32tzlist.pl
> on !windows? It's obviously windows specific code, and it's special case
> enough that there doesn't seem like a need to develop it on !windows.

I am not really excited about keeping a dummy library just so a script can
check that this WIN32-only file is correctly written, and TBH I've never
used pgperlsyncheck since it was introduced in af616ce48347.
Anyway, we could just tweak the list of files returned by
find_perl_files, as win32tzlist.pl is valid for perltidy and
perlcritic.

Andrew, was the original target of pgperlsyncheck committers and
hackers who worked on the MSVC scripts but could not run sanity
checks on Windows (see [1])?  There are a few more cases, like the
Unicode scripts or some of the stuff in src/tools/, where it can still
be useful, though these are not touched on a daily basis.  The rest of
the .pm files are for TAP tests, plus one for Unicode.  I'm still OK
with tweaking the script if its main purpose is gone.

[1]: 
https://www.postgresql.org/message-id/f3c12e2c-618f-cb6f-082b-a2f604dbe...@2ndquadrant.com
--
Michael

