RE: refactoring basebackup.c

2022-02-11 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi, Hackers.
Thank you for developing a great feature.
The current help message, shown below, does not seem to allow specifying
the 'client-' or 'server-' prefix for lz4 compression.
 --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]

The attached small patch fixes the help message as follows:
 --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Robert Haas  
Sent: Saturday, February 12, 2022 12:50 AM
To: Justin Pryzby 
Cc: Jeevan Ladhe ; Dipesh Pandit 
; Abhijit Menon-Sen ; Dmitry Dolgov 
<9erthali...@gmail.com>; Jeevan Ladhe ; Mark 
Dilger ; pgsql-hack...@postgresql.org; tushar 

Subject: Re: refactoring basebackup.c

On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby  wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect or worse 
than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com 




pg_basebackup_help_v1.diff


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-11 Thread Peter Geoghegan
On Sat, Jan 29, 2022 at 8:42 PM Peter Geoghegan  wrote:
> Attached is v7, a revision that overhauls the algorithm that decides
> what to freeze. I'm now calling it block-driven freezing in the commit
> message. Also included is a new patch, that makes VACUUM record zero
> free space in the FSM for an all-visible page, unless the total amount
> of free space happens to be greater than one half of BLCKSZ.

I pushed the earlier refactoring and instrumentation patches today.

Attached is v8. No real changes -- just a rebased version.

It will be easier to benchmark and test the page-driven freezing stuff
now, since the master/baseline case will now output instrumentation
showing how relfrozenxid has been advanced (if at all) -- whether (and
to what extent) each VACUUM operation advances relfrozenxid can now be
directly compared, just by monitoring the log_autovacuum_min_duration
output for a given table over time.
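
For example, to capture that output for every autovacuum of a table of
interest (table name hypothetical):

ALTER TABLE pgbench_accounts SET (log_autovacuum_min_duration = 0);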

--
Peter Geoghegan
From 41136d2a8af434a095ce3e6dfdfbe4b48b9ec338 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sun, 23 Jan 2022 21:10:38 -0800
Subject: [PATCH v8 3/3] Add all-visible FSM heuristic.

When recording free space in an all-frozen page, record that the page has
zero free space when it has less than half BLCKSZ worth of space,
according to the traditional definition.  Otherwise record free space as
usual.

Making all-visible pages resistant to change like this can be thought of
as a form of hysteresis.  The page is given an opportunity to "settle"
and permanently stay in the same state when the tuples on the page will
never be updated or deleted.  But when they are updated or deleted, the
page can once again be used to store any tuple.  Over time, most pages
tend to settle permanently in many workloads, though perhaps only on the
second or third attempt.
---
 src/backend/access/heap/vacuumlazy.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ea4b75189..95049ed25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1231,6 +1231,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
  */
 freespace = PageGetHeapFreeSpace(page);
 
+/*
+ * An all-visible page should not have its free space
+ * available from the FSM unless it's more than half empty
+ */
+if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+	freespace = 0;
+
 UnlockReleaseBuffer(buf);
 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 continue;
@@ -1368,6 +1375,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
+			/*
+			 * An all-visible page should not have its free space available
+			 * from the FSM unless it's more than half empty
+			 */
+			if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+freespace = 0;
+
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
@@ -2537,6 +2551,13 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		freespace = PageGetHeapFreeSpace(page);
 
+		/*
+		 * An all-visible page should not have its free space available from
+		 * the FSM unless it's more than half empty
+		 */
+		if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+			freespace = 0;
+
 		UnlockReleaseBuffer(buf);
 		RecordPageWithFreeSpace(vacrel->rel, tblk, freespace);
 		vacuumed_pages++;
-- 
2.30.2

From 4838bd1f11b748d2082caedfe4da506b8fe3f67a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 13 Dec 2021 15:00:49 -0800
Subject: [PATCH v8 2/3] Make block-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM won't freeze _any_ tuples on the page unless
_all_ tuples (that remain after pruning) are all-visible.  It may
occasionally be necessary to freeze the page due to the presence of a
particularly old XID, from before VACUUM's FreezeLimit cutoff.  But the
FreezeLimit mechanism will seldom have any impact on which pages are
frozen anymore -- it is just a backstop now.

Freezing can now informally be thought of as something that takes place
at the level of an entire page, or not at all -- differences in XIDs
among tuples on the same page are not interesting, barring extreme
cases.  Freezing a page is now practically synonymous with setting the
page to all-visible in the visibility map, at least to users.

The main upside of the new approach to freezing is that it makes the
overhead of vacuuming much more predictable over time.  We avoid the
need for large balloon payments, since the system no longer accumulates
"freezing debt" that can only be paid off by anti-wraparound vacuuming.
This seems to have been particularly troublesome with append-only
tables, especially in the common case where XIDs from pages that are
marked all-visible for the first 

Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Michael Paquier
On Sat, Feb 12, 2022 at 09:49:47AM +0900, Michael Paquier wrote:
> I guess that it would be better to revert the TAP test and rework all
> that for the time being, then.

And this part is done.
--
Michael




Re: Teach pg_receivewal to use lz4 compression

2022-02-11 Thread Michael Paquier
On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> Over in 
> http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
> I was noticing that CreateWalTarMethod doesn't support LZ4
> compression. It would be nice if it did. I thought maybe the patch on
> this thread would fix that, but I think maybe it doesn't, because it
> looks like that's touching the WalDirectoryMethod part of that file,
> rather than the WalTarMethod part. Is that correct?

Correct.  pg_receivewal only cares about the directory method, so this
thread was limited to that part.  Yes, it would be nice to fully extend
the tar method of walmethods.c to support LZ4, but I was not sure what
needed to be done, and I am still not sure based on what has just been
done as of 751b8d23.

> And, on a related note, Michael, do you plan to get something
> committed here? 

Apart from f79962d, ba5 and 50e1441, I don't think that there was
something left to do for this thread.  Perhaps I am missing something?
--
Michael




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:
> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
> r/w workload that's been modified to start 70 savepoints at the start shows
> 
> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table 
> "pgbench_branches"
> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table 
> "pgbench_branches"
> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table 
> "pgbench_branches"
> 
> after a few minutes of running with a local, not slowed down, syncrep. Without
> any other artificial slowdowns or such.

And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() call causes an xlog fetch, priming the
xid cache, which then causes TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. That then
allows the update to proceed, despite S1 not having "properly committed"
yet.
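
Spelled out as two psql sessions (a sketch: it assumes
synchronous_standby_names is set so that S1's COMMIT blocks in
SyncRepWaitForLSN; the table and xid are hypothetical):

-- S1
BEGIN;
SELECT txid_current();                  -- say it returns 1234
UPDATE t SET n = n + 1 WHERE id = 1;
COMMIT;                                 -- stalls waiting for the sync standby

-- S2, while S1's COMMIT is still waiting
SELECT pg_xact_status('1234');          -- primes the single-item xid cache
UPDATE t SET n = n + 1 WHERE id = 1;    -- proceeds anyway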

Greetings,

Andres Freund




Re: Replacing TAP test planning with done_testing()

2022-02-11 Thread Julien Rouhaud
On Sat, Feb 12, 2022 at 04:28, Alvaro Herrera wrote:

> On 2022-Feb-11, Tom Lane wrote:
>
> > Andres Freund  writes:
>
> > > +1 on backpatching. Backpatching tests now is less likely to cause
> conflicts,
> > > but more likely to fail during tests.
> >
> > If you've got the energy to do it, +1 for backpatching.  I agree
> > with Michael's opinion that doing so will probably save us
> > future annoyance.
>
> +1
>

+1



Re: PGroonga index-only scan problem with yesterday’s PostgreSQL updates

2022-02-11 Thread Anders Kaseorg

On 2/11/22 15:57, Tom Lane wrote:

Possibly same issue I just fixed?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e5691cc9170bcd6c684715c2755d919c5a16fea2


Yeah, that does seem to fix my test cases.  Thanks!

Any chance this will make it into minor releases sooner than three 
months from now?  I know extra minor releases are unusual, but this 
regression will be critical at least for self-hosted Zulip users and 
presumably other PGroonga users.


Anders




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Michael Paquier
On Fri, Feb 11, 2022 at 12:34:31PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
>> Or, is it okay to use ABS_SRCDIR?
> 
> Don't see why not --- other tests certainly do.

Another solution would be to rely on TESTDIR, which is always set by
the Makefile invocations and vcregress.pl, but that's doomed for
VPATH builds, as it points to the build directory, not the source
directory, so we would not have access to the sample file.  The root
issue here is that we don't have access to top_srcdir when running the
TAP tests, if we want to keep this stuff as a TAP test.

libpq's regress.pl tweaks things by passing down a SRCDIR to take care
of the VPATH problem, but there is nothing in %ENV that points to the
source directory in the context of a TAP test by default, if we cannot
rely on pg_config to find where the sample file is located.  That's
different than this thread, but should we make an effort and expand
the data available here for TAP tests?

Hmm.  Relying on ABS_SRCDIR in the main test suite would be fine,
though.  It also looks like nothing would prevent adding the three
tests to the TAP test in the form of three SQL queries instead.  These
had better be written with one or more CTEs, for readability at least,
with the inner query reading the contents of the sample file to grab
the list of all the GUCs.
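
For illustration, such a query might look like this (only a sketch: it
assumes the sample file's path can be handed to pg_read_file(), which is
precisely the open question above, and the parsing regex is simplified):

WITH sample AS (
    SELECT substring(line from '^#?([a-z_]+) = ') AS name
    FROM regexp_split_to_table(
             pg_read_file('/path/to/postgresql.conf.sample'), '\n') AS line
    WHERE line ~ '^#?[a-z_]+ = '
)
SELECT name FROM sample   -- GUCs in the sample file unknown to the server
EXCEPT
SELECT name FROM pg_settings;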

I guess that it would be better to revert the TAP test and rework all
that for the time being, then.
--
Michael




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Michael Paquier
On Fri, Feb 11, 2022 at 10:48:11AM +0100, Christoph Berg wrote:
> It never caused any problem in the 12+ years we have been doing this,
> but Debian is patching pg_config not to be relocatable since we are
> installing into /usr/lib/postgresql/NN /usr/share/postgresql/NN, so
> not a single prefix.
> 
> https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/50-per-version-dirs.patch

Wow.  This is the way for Debian to bypass the fact that ./configure
is itself patched, hence you cannot rely on options like --libdir and
--bindir at build time?  That's..  Err..  Fancy, I'd say.
--
Michael




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-10 22:11:38 -0800, Andres Freund wrote:
> Looks like syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  
t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table 
"pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  
t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table 
"pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 
4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artificial slowdowns or such.

Greetings,

Andres Freund




Re: logical decoding and replication of sequences

2022-02-11 Thread Tomas Vondra
On 2/10/22 19:17, Tomas Vondra wrote:
> I've polished & pushed the first part adding sequence decoding
> infrastructure etc. Attached are the two remaining parts.
> 
> I plan to wait a day or two and then push the test_decoding part. The
> last part (for built-in replication) will need more work and maybe
> rethinking the grammar etc.
> 

I've pushed the second part, adding sequences to test_decoding.

Here's the remaining part, rebased, with a small tweak in the TAP test
to eliminate the issue with not waiting for sequence increments. I've
kept the tweak in a separate patch, so that we can throw it away easily
if we happen to resolve the issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From c456270469cdb6ad769455eb3d16aea6db2c02af Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 10 Feb 2022 15:18:59 +0100
Subject: [PATCH 1/2] Add support for decoding sequences to built-in
 replication

---
 doc/src/sgml/catalogs.sgml  |  71 
 doc/src/sgml/ref/alter_publication.sgml |  24 +-
 doc/src/sgml/ref/alter_subscription.sgml|   4 +-
 src/backend/catalog/pg_publication.c| 149 -
 src/backend/catalog/system_views.sql|  10 +
 src/backend/commands/publicationcmds.c  | 350 +++-
 src/backend/commands/sequence.c |  79 +
 src/backend/commands/subscriptioncmds.c | 272 +++
 src/backend/executor/execReplication.c  |   2 +-
 src/backend/nodes/copyfuncs.c   |   1 +
 src/backend/nodes/equalfuncs.c  |   1 +
 src/backend/parser/gram.y   |  32 ++
 src/backend/replication/logical/proto.c |  52 +++
 src/backend/replication/logical/tablesync.c | 118 ++-
 src/backend/replication/logical/worker.c|  60 
 src/backend/replication/pgoutput/pgoutput.c |  85 -
 src/backend/utils/cache/relcache.c  |   4 +-
 src/bin/psql/tab-complete.c |  14 +-
 src/include/catalog/pg_proc.dat |   5 +
 src/include/catalog/pg_publication.h|  14 +
 src/include/commands/sequence.h |   1 +
 src/include/nodes/parsenodes.h  |   6 +
 src/include/replication/logicalproto.h  |  19 ++
 src/include/replication/pgoutput.h  |   1 +
 src/test/regress/expected/rules.out |   8 +
 src/test/subscription/t/028_sequences.pl| 196 +++
 26 files changed, 1542 insertions(+), 36 deletions(-)
 create mode 100644 src/test/subscription/t/028_sequences.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 879d2dbce03..271dc03e5a2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9540,6 +9540,11 @@ SCRAM-SHA-256$iteration count:
   prepared transactions
  
 
+ 
+  pg_publication_sequences
+  publications and their associated sequences
+ 
+
  
   pg_publication_tables
   publications and their associated tables
@@ -11375,6 +11380,72 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_publication_sequences
+
+  
+   pg_publication_sequences
+  
+
+  
+   The view pg_publication_sequences provides
+   information about the mapping between publications and the sequences they
+   contain.  Unlike the underlying catalog
+   pg_publication_rel,
+   this view expands
+   publications defined as FOR ALL SEQUENCES, so for such
+   publications there will be a row for each eligible sequence.
+  
+
+  
+   pg_publication_sequences Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   pubname name
+   (references pg_publication.pubname)
+  
+  
+   Name of publication
+  
+ 
+
+ 
+  
+   schemaname name
+   (references pg_namespace.nspname)
+  
+  
+   Name of schema containing sequence
+  
+ 
+
+ 
+  
+   sequencename name
+   (references pg_class.relname)
+  
+  
+   Name of sequence
+  
+ 
+
+   
+  
+ 
+
  
   pg_publication_tables
 
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index 7c7c27bf7ce..9da8274ae2c 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -31,7 +31,9 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of:
 
 TABLE [ ONLY ] table_name [ * ] [, ... ]
+SEQUENCE sequence_name [ * ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
+ALL SEQUENCES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
 
@@ -56,7 +58,18 @@ ALTER PUBLICATION name RENAME TO 
 
   
-   The fourth variant of this command listed in the synopsis can change
+   The next three variants change which sequences are part of the publication.
+   The SET SEQUENCE clause will replace the list of sequences
+   in the publication with the 

Re: PGroonga index-only scan problem with yesterday’s PostgreSQL updates

2022-02-11 Thread Tom Lane
Anders Kaseorg  writes:
> With yesterday’s release of PostgreSQL 11.15, 12.10, and 13.6 
> (presumably 10.20 and 14.2 as well), Zulip’s test suite started failing 
> with “variable not found in subplan target list” errors from PostgreSQL 
> on a table that has a PGroonga index.

Possibly same issue I just fixed?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e5691cc9170bcd6c684715c2755d919c5a16fea2

regards, tom lane




PGroonga index-only scan problem with yesterday’s PostgreSQL updates

2022-02-11 Thread Anders Kaseorg
With yesterday’s release of PostgreSQL 11.15, 12.10, and 13.6 
(presumably 10.20 and 14.2 as well), Zulip’s test suite started failing 
with “variable not found in subplan target list” errors from PostgreSQL 
on a table that has a PGroonga index.  I found the following 
reproduction recipe from a fresh database:


psql (11.15 (Debian 11.15-1.pgdg100+1))
Type "help" for help.

test=# CREATE EXTENSION pgroonga;
CREATE EXTENSION
test=# CREATE TABLE t AS SELECT CAST(c AS text) FROM generate_series(1, 1) AS c;
SELECT 1
test=# CREATE INDEX t_c ON t USING pgroonga (c);
CREATE INDEX
test=# VACUUM t;
VACUUM
test=# SELECT COUNT(*) FROM t;
ERROR:  variable not found in subplan target list

I filed https://github.com/pgroonga/pgroonga/issues/203, and confirmed 
with a Git bisection of PostgreSQL that this error first appears with 
ec36745217 (REL_11_15~42) “Fix index-only scan plans, take 2.”  I’m 
aware that this likely just exposed a previously hidden PGroonga bug, 
but I figure PostgreSQL developers might want to know about this anyway 
and help come up with the right fix.  The PGroonga author suggested I 
start a thread here:


https://github.com/pgroonga/pgroonga/issues/203#issuecomment-1036708841

Thanks for investigating this!

Our CI is also broken with new PostgreSQL:
https://github.com/pgroonga/pgroonga/runs/5149762901?check_suite_focus=true
 
https://www.postgresql.org/message-id/602391641208390%40iva4-92c901fae84c.qloud-c.yandex.net

says partial-retrieval index-only scan but our case is a
non-retrievable index-only scan. With a non-retrievable index-only
scan, the error occurs.

We asked about non-retrievable index-only scan on the PostgreSQL
mailing list in the past:
https://www.postgresql.org/message-id/5148.1584372043%40sss.pgh.pa.us
We thought a non-retrievable index-only scan should not be used but
PostgreSQL may use it as a valid plan. So I think that our case
should be supported with postgres/postgres@ec36745 “Fix index-only
scan plans, take 2.”

Could you ask about this case on the PostgreSQL mailing list
https://www.postgresql.org/list/pgsql-hackers/ ?

The following patch fixes our case and PostgreSQL's test cases still
pass, but a review by the original author is needed:

--- postgresql-11.15.orig/src/backend/optimizer/plan/setrefs.c  2022-02-08 06:20:23.0 +0900
+++ postgresql-11.15/src/backend/optimizer/plan/setrefs.c   2022-02-12 07:32:20.355567317 +0900
@@ -1034,7 +1034,7 @@
 	{
 		TargetEntry *indextle = (TargetEntry *) lfirst(lc);
 
-		if (!indextle->resjunk)
+		if (!indextle->resjunk || indextle->expr->type == T_Var)
 			stripped_indextlist = lappend(stripped_indextlist, indextle);
 	}



Anders




Re: support for MERGE

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 05:25:49PM -0300, Alvaro Herrera wrote:
> > I'm not sure git diff --cherry-pick is widely known/used, but I think
> > using that relative to master may be good enough.  
> 
> I had never heard of git diff --cherry-pick, and the manpages I found
> don't document it, so frankly I doubt it's known.  I still have no idea
> what does it do.

See git-log(1)

   --cherry-pick
   Omit any commit that introduces the same change as another commit on 
the “other side” when the set of commits are limited with symmetric difference.

The "symmetric difference" is the triple-dot notation.
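
For example ("my-branch" being a placeholder):

# commits on the branch side of master...my-branch that master doesn't
# already contain, omitting any cherry-picked there under another hash
git log --oneline --cherry-pick --right-only origin/master...my-branch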

The last few years I've used this to check for missing bits in the draft
release notes.  (Actually, I tend to start my own list of features before
that).  It's doing a generic version of what git_changelog does.

https://www.postgresql.org/message-id/20210510144045.gc27...@telsasoft.com

> I suppose there is an obvious reason why using git diff with
> $(git merge-base ...) as one of the arguments doesn't work for these purposes.
> 
> > Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
> > for patches against backbranches.
> 
> I wonder if it's sufficient to handle these things (coverage
> specifically) for branch master only.

Or default to master, and maybe try to parse the commit message and pull out
Backpatch-through: NN.  It's intended to be machine-readable, after all.

Let's talk about it more on the/another thread :)

-- 
Justin




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-02-11 Thread Andrew Dunstan


On 2/6/22 10:45, Robert Haas wrote:
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL.



Almost completely off topic, but this reminded me of an incident about
30 years ago at my first gig as an SA/DBA. There was an application
programmer who insisted on loading a set of values from a text file into
a temp table (it was Ingres, anyone remember that?). Why? Because he
knew how to write "Select * from mytable order by mycol" but didn't know
how to drive the Unix sort utility at the command line. When I was
unable to restrain myself from smiling at this he got very angry and
yelled at me loudly.

So, yes, some people do like SQL and hate the command line.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Fix overflow in DecodeInterval

2022-02-11 Thread Joseph Koshakow
Tom Lane  writes:
> Joseph Koshakow  writes:
> > The attached patch fixes an overflow bug in DecodeInterval when applying
> > the units week, decade, century, and millennium. The overflow check logic
> > was modelled after the overflow check at the beginning of `int
> > tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in timestamp.c.
>
>
> Good catch, but I don't think that tm2interval code is best practice
> anymore.  Rather than bringing "double" arithmetic into the mix,
> you should use the overflow-detecting arithmetic functions in
> src/include/common/int.h.  The existing code here is also pretty
> faulty in that it doesn't notice addition overflow when combining
> multiple units.  So for example, instead of
>
>
> tm->tm_mday += val * 7;
>
>
> I think we should write something like
>
>
> if (pg_mul_s32_overflow(val, 7, &tmp))
> return DTERR_FIELD_OVERFLOW;
> if (pg_add_s32_overflow(tm->tm_mday, tmp, &tm->tm_mday))
> return DTERR_FIELD_OVERFLOW;
>
>
> Perhaps some macros could be used to make this more legible?
>
>
> regards, tom lane
>
>

Thanks for the feedback Tom, I've attached an updated patch with
your suggestions. Feel free to rename the horribly named macro.

Also while fixing this I noticed that fractional intervals can also
cause an overflow issue.
postgres=# SELECT INTERVAL '0.1 months 2147483647 days';
 interval
--
 -2147483646 days
(1 row)
I haven't looked into it, but it's probably a similar cause.
From 49b5beb4132a0c9e793e1fd7b91a4da0c96a6196 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Fri, 11 Feb 2022 14:18:32 -0500
Subject: [PATCH] Check for overflow when decoding an interval

When decoding an interval there are various date units which are
aliases for some multiple of another unit. For example a week is 7 days
and a decade is 10 years. When decoding these specific units, there is
no check for overflow, allowing the interval to overflow. This commit
adds an overflow check for all of these units.
---
 src/backend/utils/adt/datetime.c   | 22 ++
 src/test/regress/expected/interval.out | 32 ++
 src/test/regress/sql/interval.sql  |  8 +++
 3 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 7926258c06..59d391adda 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "common/string.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -3106,6 +3107,19 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	int			val;
 	double		fval;
 
+	/*
+	 * Multiply src by mult and add it to dest while checking for overflow.
+	 */
+#define SAFE_MUL_ADD(src, mult, dst)	\
+	do\
+	{\
+		int tmp;		\
+		if (pg_mul_s32_overflow(src, mult, &tmp))		\
+			return DTERR_FIELD_OVERFLOW;			\
+		if (pg_add_s32_overflow(dst, tmp, &(dst)))		\
+			return DTERR_FIELD_OVERFLOW;			\
+	} while (0)
+
 	*dtype = DTK_DELTA;
 	type = IGNORE_DTF;
 	ClearPgTm(tm, fsec);
@@ -3293,7 +3307,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 		break;
 
 	case DTK_WEEK:
-		tm->tm_mday += val * 7;
+		SAFE_MUL_ADD(val, 7, tm->tm_mday);
 		AdjustFractDays(fval, tm, fsec, 7);
 		tmask = DTK_M(WEEK);
 		break;
@@ -3311,19 +3325,19 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 		break;
 
 	case DTK_DECADE:
-		tm->tm_year += val * 10;
+		SAFE_MUL_ADD(val, 10, tm->tm_year);
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 10);
 		tmask = DTK_M(DECADE);
 		break;
 
 	case DTK_CENTURY:
-		tm->tm_year += val * 100;
+		SAFE_MUL_ADD(val, 100, tm->tm_year);
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 100);
 		tmask = DTK_M(CENTURY);
 		break;
 
 	case DTK_MILLENNIUM:
-		tm->tm_year += val * 1000;
+		SAFE_MUL_ADD(val, 1000, tm->tm_year);
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 1000);
 		tmask = DTK_M(MILLENNIUM);
 		break;
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index accd4a7d90..88865483fa 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -232,6 +232,38 @@ INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years');
 ERROR:  interval out of range
 LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years'...
  ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks');
+ERROR:  interval field value out of range: "2147483647 weeks"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks')...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks');

Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

2022-02-11 Thread Thomas Munro
On Thu, Jan 6, 2022 at 10:23 AM Robert Haas  wrote:
> That's fixed now. So what should we do about this patch? This is a
> bug, so it would be nice to do *something*. I don't really like the
> fact that this makes the behavior contingent on USE_ASSERT_CHECKING,
> and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE
> which by default gets defined on WIN32, but can be defined elsewhere
> if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h).

Ok, done like that.
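
Concretely, that amounts to roughly this in pg_config_manual.h (a sketch;
the comment wording is mine, not the committed text):

/*
 * Force all backends to close their smgr file descriptors via a
 * ProcSignalBarrier instead of waiting for the next access.  Needed on
 * Windows, where open handles block unlinking; can be defined on other
 * platforms for testing.
 */
#if defined(WIN32)
#define USE_BARRIER_SMGR_RELEASE
#endif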

> Furthermore, I can't see back-patching this, given that it would be
> the very first use of the barrier machinery. But I think it would be
> good to get something into master, because then we'd actually be using
> this procsignalbarrier stuff for something. On a good day we've fixed
> a bug. On a bad day we'll learn something new about how
> procsignalbarrier needs to work.

Agreed.

Pushed.  The basic Windows/tablespace bug seen occasionally in CI[1]
should now be fixed.

For the sake of the archives, here's a link to the ongoing discussion
about further potential uses of this mechanism:

https://www.postgresql.org/message-id/flat/20220209220004.kb3dgtn2x2k2gtdm%40alap3.anarazel.de

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 4:08 PM Alvaro Herrera  wrote:
> It seems you're thinking deciding what to do based on an option that
> gets a boolean argument.  But what about making the argument be an enum?
> For example
>
> CREATE DATABASE ... WITH (STRATEGY = LOG);  -- default if option is 
> omitted
> CREATE DATABASE ... WITH (STRATEGY = CHECKPOINT);
>
> So the user has to think about it in terms of some strategy to choose,
> rather than enabling or disabling some flag with nontrivial
> implications.

I don't like those particular strategy names very much, but in general
I think that could be a way to go, too. I somewhat hope we never end
up with THREE strategies for creating a new database, but now that I
think about it, we might. Somebody might want to use a fancy FS
primitive that clones a directory at the FS level, or something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Merging statistics from children instead of re-sampling everything

2022-02-11 Thread Robert Haas
On Wed, Jun 30, 2021 at 11:15 AM Tomas Vondra
 wrote:
> You're right maintaining a per-partition samples and merging those might
> solve (or at least reduce) some of the problems, e.g. eliminating most
> of the I/O that'd be needed for sampling. And yeah, it's not entirely
> clear how to merge some of the statistics types (like ndistinct). But
> for a lot of the basic stats it works quite nicely, I think.

It feels like you might in some cases get very different answers.
Let's say you have 1000 partitions. In each of those partitions, there
is a particular value that appears in column X in 50% of the rows.
This value differs for every partition. So you can imagine for example
that in partition 1, X = 1 with probability 50%; in partition 2, X = 2
with probability 50%, etc. There is also a value, let's say 0, which
appears in 0.5% of the rows in every partition. It seems possible that
0 is not an MCV in any partition, or in only some of them, but it
might be more common overall than the #1 MCV of any single partition.
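
That scenario is easy to set up for experimentation (a sketch, scaled down
to 100 partitions; all object names hypothetical):

CREATE TABLE t (part int, x int) PARTITION BY LIST (part);

DO $$
BEGIN
    FOR k IN 1..100 LOOP
        EXECUTE format(
            'CREATE TABLE t_%s PARTITION OF t FOR VALUES IN (%s)', k, k);
        -- in partition k: 0.5% of rows get x = 0, ~50% get x = k,
        -- and the rest get a (mostly) unique value
        EXECUTE format(
            'INSERT INTO t
                 SELECT %s,
                        CASE WHEN i %% 200 = 0 THEN 0
                             WHEN i %% 2 = 0 THEN %s
                             ELSE i END
                 FROM generate_series(1, 10000) i', k, k);
    END LOOP;
END $$;

ANALYZE t;

After the ANALYZE, each partition's MCV list is dominated by its own k,
while 0 is only ~0.5% of every partition and may or may not make those
lists, even though table-wide it is more common than any single k.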

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Andrew Dunstan


On 2/11/22 15:47, Robert Haas wrote:
> On Fri, Feb 11, 2022 at 3:40 PM Andrew Dunstan  wrote:
>> I'm not really sure any single parameter name is going to capture the
>> subtlety involved here.
> I mean to some extent that's inevitable, but it's not a reason not to
> do the best we can.


True.

I do think we should be wary of any name starting with "LOG", though.
Long experience tells us that's something that confuses users when it
refers to the WAL.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Alvaro Herrera
On 2022-Feb-11, Robert Haas wrote:

> What I find difficult about doing that is that this is all a bunch of
> technical details that users may have difficulty understanding. If we
> say WAL_LOG or WAL_LOG_DATA, a reasonably but not incredibly
> well-informed user will assume that skipping WAL is not really an
> option. If we say CHECKPOINT, a reasonably but not incredibly
> well-informed user will presume they don't want one (I think).
> CHECKPOINT also seems like it's naming the switch by the unwanted side
> effect, which doesn't seem too flattering to the existing method.

It seems you're thinking deciding what to do based on an option that
gets a boolean argument.  But what about making the argument be an enum?
For example

CREATE DATABASE ... WITH (STRATEGY = LOG);  -- default if option is omitted
CREATE DATABASE ... WITH (STRATEGY = CHECKPOINT);

So the user has to think about it in terms of some strategy to choose,
rather than enabling or disabling some flag with nontrivial
implications.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere."(Lamar Owen)




Re: Per-table storage parameters for TableAM/IndexAM extensions

2022-02-11 Thread Robert Haas
On Wed, Dec 29, 2021 at 12:08 PM Sadhuprasad Patro  wrote:
> Open Question: When a user changes AM, then what should be the
> behavior for not supported storage options? Should we drop the options
> and go with only system storage options?
> Or throw an error, in which case the user has to clean the added parameters.

A few people have endorsed the error behavior here but I foresee problems.

Imagine that I am using the "foo" tableam with "compression=lots" and
I want to switch to the "bar" AM which does not support that option.
If I remove the "compression=lots" option using a separate command,
the "foo" table AM may rewrite my whole table and decompress
everything. Then when I convert to the "bar" AM it's going to have to
be rewritten again. That's painful. I clearly need some way to switch
AMs without having to rewrite the table twice.

It's also interesting to consider the other direction. If I am
switching from "bar" to "foo" I would really like to be able to add
the "compression=lots" option at the same time I make the switch.
There needs to be some syntax for that.

One way to solve the first of these problem is to silently drop
unsupported options. Maybe a better way is to have syntax that allows
you to specify options to be added and removed at the time you switch
AMs e.g.:

ALTER TABLE mytab SET ACCESS METHOD bar OPTIONS (DROP compression);
ALTER TABLE mytab SET ACCESS METHOD foo OPTIONS (ADD compression 'lots');

I don't like that particular syntax a ton personally but it does match
what we already use for ALTER SERVER. Unfortunately it's wildly
inconsistent with what we do for ALTER TABLE. Another idea might be
something like:

ALTER TABLE mytab SET ACCESS METHOD bar RESET compression;
ALTER TABLE mytab SET ACCESS METHOD foo SET compression = 'lots';

You'd need to be able to do multiple things with one command e.g.

ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
preferred_fruit = 'banana';

Regardless of the details, I don't think it's viable to just say,
well, rewrite the table multiple times if that's what it takes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix overflow in DecodeInterval

2022-02-11 Thread Tom Lane
Joseph Koshakow  writes:
> The attached patch fixes an overflow bug in DecodeInterval when applying
> the units week, decade, century, and millennium. The overflow check logic
> was modelled after the overflow check at the beginning of `int
> tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in timestamp.c.

Good catch, but I don't think that tm2interval code is best practice
anymore.  Rather than bringing "double" arithmetic into the mix,
you should use the overflow-detecting arithmetic functions in
src/include/common/int.h.  The existing code here is also pretty
faulty in that it doesn't notice addition overflow when combining
multiple units.  So for example, instead of

tm->tm_mday += val * 7;

I think we should write something like

if (pg_mul_s32_overflow(val, 7, &tmp))
return DTERR_FIELD_OVERFLOW;
if (pg_add_s32_overflow(tm->tm_mday, tmp, &tm->tm_mday))
return DTERR_FIELD_OVERFLOW;

Perhaps some macros could be used to make this more legible?

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 1:32 PM Bruce Momjian  wrote:
> > Yeah, maybe. But it's not clear to me with that kind of naming whether
> > TRUE or FALSE would be the existing behavior? One version logs a
> > single record for the whole database, and the other logs a record per
> > database block. Neither version logs per file. LOG_COPIED_BLOCKS,
> > maybe?
>
> Yes, I like BLOCKS more than FILE.

Cool.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 3:40 PM Andrew Dunstan  wrote:
> I'm not really sure any single parameter name is going to capture the
> subtlety involved here.

I mean to some extent that's inevitable, but it's not a reason not to
do the best we can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: support for CREATE MODULE

2022-02-11 Thread Robert Haas
On Thu, Feb 10, 2022 at 4:17 PM Alvaro Herrera  wrote:
> On 2022-Feb-04, Tom Lane wrote:
> > If we invent modules I think they need to work more like extensions
> > naming-wise, ie they group objects but have no effect for naming.
>
> I think modules are an orthogonal concept to extensions, and trying to
> mix both is messy.
>
> The way I see modules working is as a "logical" grouping of objects --
> they provide encapsulated units of functionality.  A module has private
> functions, which cannot be called except from other functions in the
> same module.  You can abstract them out of the database design, leaving
> you with only the exposed functions, the public API.
>
> An extension is a way to distribute or package objects.  An extension
> can contain a module, and of course you should be able to use an
> extension to distribute a module, or even several modules.  In fact, I
> think it should be possible to have several extensions contribute
> different objects to the same module.
>
> But things like name resolution rules (search path) are not affected by
> how the objects are distributed, whereas the search path is critical in
> how you think about the objects in a module.
>
> If modules are just going to be extensions, I see no point.

+1.

I think that extensions, as we have them in PostgreSQL today, are
basically feature-complete. In my experience, they do all the things
that we need them to do, and pretty well. I think Tom was correct when
he predicted many years ago that getting the extension feature into
PostgreSQL would be remembered as one of the biggest improvements of that
era. (I do not recall his exact words.)

But the same is clearly not true of schemas. Schemas are missing
features that people expect to have, private objects being one of
them, and variables another, and I think there are other things
missing, too. Also, search_path is an absolutely wretched design and
I'd propose ripping it out if I had any sensible idea what would
replace it.

IMHO, if we're going to do anything at all in this area, it ought to
be targeting the rather-large weaknesses of the schema system, rather
than anything about the extension system.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Andrew Dunstan


On 2/11/22 13:32, Bruce Momjian wrote:
> On Fri, Feb 11, 2022 at 01:18:58PM -0500, Robert Haas wrote:
>> On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian  wrote:
>>> On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote:
 How about something like LOG_AS_CLONE? That makes it clear, I hope,
 that we're logging it a different way, but that method of logging it
 is different in each case. You'd still have to read the documentation
 to find out what it really means, but at least it seems like it points
 you more in the right direction. To me, anyway.
>>> I think CLONE would be confusing since we don't use that term often,
>>> maybe LOG_DB_COPY or LOG_FILE_COPY?
>> Yeah, maybe. But it's not clear to me with that kind of naming whether
>> TRUE or FALSE would be the existing behavior? One version logs a
>> single record for the whole database, and the other logs a record per
>> database block. Neither version logs per file. LOG_COPIED_BLOCKS,
>> maybe?
> Yes, I like BLOCKS more than FILE.


I'm not really sure any single parameter name is going to capture the
subtlety involved here.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replacing TAP test planning with done_testing()

2022-02-11 Thread Alvaro Herrera
On 2022-Feb-11, Tom Lane wrote:

> Andres Freund  writes:

> > +1 on backpatching. Backpatching tests now is less likely to cause 
> > conflicts,
> > but more likely to fail during tests.
> 
> If you've got the energy to do it, +1 for backpatching.  I agree
> with Michael's opinion that doing so will probably save us
> future annoyance.

+1

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: support for MERGE

2022-02-11 Thread Alvaro Herrera
On 2022-Feb-11, Justin Pryzby wrote:

> Because I put your patch on top of some other branch with the CI coverage (and
> other stuff).

Ah, that makes sense.

> But it has to figure out where the branch "starts".  Which I did by looking at
> "git diff --cherry-pick origin..."
>
> I'm not sure git diff --cherry-pick is widely known/used, but I think
> using that relative to master may be good enough.  

I had never heard of git diff --cherry-pick, and the manpages I found
don't document it, so frankly I doubt it's well known.  I still have no
idea what it does.

I suppose there is an obvious reason why using git diff with
$(git merge-base ...) as one of the arguments doesn't work for these purposes.

> Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
> for patches against backbranches.

I wonder if it's sufficient to handle these things (coverage
specifically) for branch master only.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"




Re: Replacing TAP test planning with done_testing()

2022-02-11 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-11 21:04:53 +0100, Daniel Gustafsson wrote:
>> I opted out of backpatching for now, to solicit more comments on that.  It's
>> not a bugfix, but it's also not affecting the compiled bits that we ship, so 
>> I
>> think there's a case to be made both for and against a backpatch.  Looking at
>> the oldest branch we support, it seems we've done roughly 25 changes to the
>> test plans in REL_10_STABLE over the years, so it's neither insignificant nor
>> an everyday activity.  Personally I don't have strong opinions, what do 
>> others
>> think?

> +1 on backpatching. Backpatching tests now is less likely to cause conflicts,
> but more likely to fail during tests.

If you've got the energy to do it, +1 for backpatching.  I agree
with Michael's opinion that doing so will probably save us
future annoyance.

regards, tom lane




Re: Replacing TAP test planning with done_testing()

2022-02-11 Thread Andres Freund
On 2022-02-11 21:04:53 +0100, Daniel Gustafsson wrote:
> I opted out of backpatching for now, to solicit more comments on that.  It's
> not a bugfix, but it's also not affecting the compiled bits that we ship, so I
> think there's a case to be made both for and against a backpatch.  Looking at
> the oldest branch we support, it seems we've done roughly 25 changes to the
> test plans in REL_10_STABLE over the years, so it's neither insignificant nor
> an everyday activity.  Personally I don't have strong opinions, what do others
> think?

+1 on backpatching. Backpatching tests now is less likely to cause conflicts,
but more likely to fail during tests.




Re: Replacing TAP test planning with done_testing()

2022-02-11 Thread Daniel Gustafsson
> On 10 Feb 2022, at 01:58, Michael Paquier  wrote:
>>> Daniel Gustafsson  writes:

 The attached patch removes all Test::More planning and instead ensures 
 that all
 tests conclude with a done_testing() call.

Pushed to master now with a few additional hunks fixing test changes that
happened between posting this and now.
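
The resulting shape of each test file is simply (a minimal sketch):

use strict;
use warnings;
use Test::More;    # note: no "tests => N" plan anymore

ok(1 == 1, 'an example check');

done_testing();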

> Could it be possible to backpatch that even if
> this could be qualified as only cosmetic?  Each time a test is
> backpatched we need to tweak the number of tests planned, and that may
> change slightly depending on the branch dealt with.

I opted out of backpatching for now, to solicit more comments on that.  It's
not a bugfix, but it's also not affecting the compiled bits that we ship, so I
think there's a case to be made both for and against a backpatch.  Looking at
the oldest branch we support, it seems we've done roughly 25 changes to the
test plans in REL_10_STABLE over the years, so it's neither insignificant nor
an everyday activity.  Personally I don't have strong opinions, what do others
think?

--
Daniel Gustafsson   https://vmware.com/





Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tomas Vondra
On 2/11/22 19:13, Robert Haas wrote:
> On Fri, Feb 11, 2022 at 1:06 PM Tom Lane  wrote:
>> While I'm not opposed to moving those goalposts at some point,
>> I think pushing them all the way up to 9.5 for this one easily-fixed
>> problem is not very reasonable.
>>
>> Given other recent discussion, an argument could be made for moving
>> the cutoff to 9.2, on the grounds that it's too hard to test against
>> anything older.  But that still leaves us needing a version check
>> if we want to use TABLESAMPLE.
> 
> OK, thanks.
> 

Yeah, I think checking server_version is fairly simple, which is mostly
why I haven't done anything about it in the submitted patch; the
other issues (what fraction to sample) seemed more important.
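
A sketch of what that check might look like in the deparsing code (the
variable names and sampling fraction are assumed, not from the patch):

/* TABLESAMPLE appeared in 9.5; fall back to a plain scan before that */
if (PQserverVersion(conn) >= 90500)
    appendStringInfo(&buf, " TABLESAMPLE SYSTEM (%f)", sample_frac * 100.0);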

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Fix overflow in DecodeInterval

2022-02-11 Thread Joseph Koshakow
The attached patch fixes an overflow bug in DecodeInterval when applying
the units week, decade, century, and millennium. The overflow check logic
was modelled after the overflow check at the beginning of `int
tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in timestamp.c.

This is my first patch, so apologies if I screwed up the process somewhere.

- Joe Koshakow
From b3c096039a4684646806e2959d7ac9318504b031 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Fri, 11 Feb 2022 14:18:32 -0500
Subject: [PATCH] Check for overflow when decoding an interval

When decoding an interval there are various date units which are
aliases for some multiple of another unit. For example a week is 7 days
and a decade is 10 years. When decoding these specific units, there is
no check for overflow, allowing the interval to overflow. This commit
adds an overflow check for all of these units.
---
 src/backend/utils/adt/datetime.c   | 28 ++
 src/test/regress/expected/interval.out | 32 ++
 src/test/regress/sql/interval.sql  |  8 +++
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 7926258c06..2813e8647d 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3293,10 +3293,15 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 		break;
 
 	case DTK_WEEK:
-		tm->tm_mday += val * 7;
+	{
+		double days = (double) val * 7;
+		if (days > INT_MAX || days < INT_MIN)
+			return DTERR_FIELD_OVERFLOW;
+		tm->tm_mday += days;
 		AdjustFractDays(fval, tm, fsec, 7);
 		tmask = DTK_M(WEEK);
 		break;
+	}
 
 	case DTK_MONTH:
 		tm->tm_mon += val;
@@ -3311,22 +3316,37 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 		break;
 
 	case DTK_DECADE:
-		tm->tm_year += val * 10;
+	{
+		double years = (double) val * 10;
+		if (years > INT_MAX || years < INT_MIN)
+			return DTERR_FIELD_OVERFLOW;
+		tm->tm_year += years;
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 10);
 		tmask = DTK_M(DECADE);
 		break;
+	}
 
 	case DTK_CENTURY:
-		tm->tm_year += val * 100;
+	{
+		double years = (double) val * 100;
+		if (years > INT_MAX || years < INT_MIN)
+			return DTERR_FIELD_OVERFLOW;
+		tm->tm_year += years;
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 100);
 		tmask = DTK_M(CENTURY);
 		break;
+	}
 
 	case DTK_MILLENNIUM:
-		tm->tm_year += val * 1000;
+	{
+		double years = (double) val * 1000;
+		if (years > INT_MAX || years < INT_MIN)
+			return DTERR_FIELD_OVERFLOW;
+		tm->tm_year += years;
 		tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 1000);
 		tmask = DTK_M(MILLENNIUM);
 		break;
+	}
 
 	default:
 		return DTERR_BAD_FORMAT;
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index accd4a7d90..88865483fa 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -232,6 +232,38 @@ INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years');
 ERROR:  interval out of range
 LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years'...
  ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks');
+ERROR:  interval field value out of range: "2147483647 weeks"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks')...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks');
+ERROR:  interval field value out of range: "-2147483648 weeks"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks'...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 decades');
+ERROR:  interval field value out of range: "2147483647 decades"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 decades...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 decades');
+ERROR:  interval field value out of range: "-2147483648 decades"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 decade...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 centuries');
+ERROR:  interval field value out of range: "2147483647 centuries"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 centuri...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 centuries');
+ERROR:  interval field value out of range: "-2147483648 centuries"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 centur...
+ ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 

Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-11 13:48:59 +, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 08:48, Simon Riggs  
> wrote:
> >
> > On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
> >
> > > Looks like syncrep will make this a lot worse, because it can drastically
> > > increase the window between the TransactionIdCommitTree() and
> > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But 
> > > at
> > > least it might make it easier to write tests exercising this scenario...
> >
> > Agreed
> >
> > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > item cache is set too early in some cases. The single item cache is
> > important for performance, so we just need to be more careful about
> > setting the cache.
> 
> Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.


> just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?


> just removes the known offending call, passes make check, but IMHO
> leaves the same error just as likely by other callers.

Which other callers are you referring to?



To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status. The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().
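
One way the wrapper could look (the enum values are from above; everything
else is assumed, not an actual patch):

typedef enum XactStatus
{
	XACT_STATUS_CURRENT,
	XACT_STATUS_IN_PROGRESS,
	XACT_STATUS_COMMITTED,
	XACT_STATUS_ABORTED
} XactStatus;

XactStatus
TransactionIdStatus(TransactionId xid)
{
	/* same order as the existing three-step idiom */
	if (TransactionIdIsCurrentTransactionId(xid))
		return XACT_STATUS_CURRENT;
	if (TransactionIdIsInProgress(xid))
		return XACT_STATUS_IN_PROGRESS;
	if (TransactionIdDidCommit(xid))
		return XACT_STATUS_COMMITTED;
	return XACT_STATUS_ABORTED;
}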

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.


It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Greetings,

Andres Freund




Re: support for MERGE

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 03:21:43PM -0300, Alvaro Herrera wrote:
> On 2022-Jan-28, Justin Pryzby wrote:
> > Have you looked at code coverage ?  I have an experimental patch to add 
> > that to
> > cirrus, and ran it with this patch; visible here:
> > https://cirrus-ci.com/task/6362512059793408
> 
> Ah, thanks, this is useful.  I think it is showing that the new code is
> generally well covered, but there are some exceptions, particularly
> ExecMergeMatched in some concurrent cases (TM_Deleted and
> TM_SelfModified after table_tuple_lock -- those code pages have
> user-facing errors but are not covered by any tests.)
> 
> How does this work?  I notice it is reporting for
> src/bin/pg_upgrade/relfilenode.c, but that file is not changed by this
> patch.

Because I put your patch on top of some other branch with the CI coverage (and
other stuff).

It tries to only show files changed by the branch being checked:
https://github.com/justinpryzby/postgres/commit/d668142040031915

But it has to figure out where the branch "starts".  Which I did by looking at
"git diff --cherry-pick origin..."

Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
for patches against backbranches.  I'm not sure git diff --cherry-pick is
widely known/used, but I think using that relative to master may be good
enough.  

Ongoing discussion here.
https://www.postgresql.org/message-id/20220203035827.GG23027%40telsasoft.com

-- 
Justin




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Bruce Momjian
On Fri, Feb 11, 2022 at 01:18:58PM -0500, Robert Haas wrote:
> On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian  wrote:
> > On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote:
> > > How about something like LOG_AS_CLONE? That makes it clear, I hope,
> > > that we're logging it a different way, but that method of logging it
> > > is different in each case. You'd still have to read the documentation
> > > to find out what it really means, but at least it seems like it points
> > > you more in the right direction. To me, anyway.
> >
> > I think CLONE would be confusing since we don't use that term often,
> > maybe LOG_DB_COPY or LOG_FILE_COPY?
> 
> Yeah, maybe. But it's not clear to me with that kind of naming whether
> TRUE or FALSE would be the existing behavior? One version logs a
> single record for the whole database, and the other logs a record per
> database block. Neither version logs per file. LOG_COPIED_BLOCKS,
> maybe?

Yes, I like BLOCKS more than FILE.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: support for MERGE

2022-02-11 Thread Alvaro Herrera
On 2022-Jan-28, Justin Pryzby wrote:

> Have you looked at code coverage ?  I have an experimental patch to add that 
> to
> cirrus, and ran it with this patch; visible here:
> https://cirrus-ci.com/task/6362512059793408

Ah, thanks, this is useful.  I think it is showing that the new code is
generally well covered, but there are some exceptions, particularly
ExecMergeMatched in some concurrent cases (TM_Deleted and
TM_SelfModified after table_tuple_lock -- those code paths have
user-facing errors but are not covered by any tests.)


How does this work?  I notice it is reporting for
src/bin/pg_upgrade/relfilenode.c, but that file is not changed by this
patch.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian  wrote:
> On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote:
> > How about something like LOG_AS_CLONE? That makes it clear, I hope,
> > that we're logging it a different way, but that method of logging it
> > is different in each case. You'd still have to read the documentation
> > to find out what it really means, but at least it seems like it points
> > you more in the right direction. To me, anyway.
>
> I think CLONE would be confusing since we don't use that term often,
> maybe LOG_DB_COPY or LOG_FILE_COPY?

Yeah, maybe. But it's not clear to me with that kind of naming whether
TRUE or FALSE would be the existing behavior? One version logs a
single record for the whole database, and the other logs a record per
database block. Neither version logs per file. LOG_COPIED_BLOCKS,
maybe?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane  wrote:
> While I'm not opposed to moving those goalposts at some point,
> I think pushing them all the way up to 9.5 for this one easily-fixed
> problem is not very reasonable.
>
> Given other recent discussion, an argument could be made for moving
> the cutoff to 9.2, on the grounds that it's too hard to test against
> anything older.  But that still leaves us needing a version check
> if we want to use TABLESAMPLE.

OK, thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tom Lane
Robert Haas  writes:
> I think it's going to be necessary to compromise on that at some
> point.

Sure.  The existing postgres_fdw documentation [1] already addresses
this issue:

postgres_fdw can be used with remote servers dating back to PostgreSQL
8.3. Read-only capability is available back to 8.1. A limitation
however is that postgres_fdw generally assumes that immutable built-in
functions and operators are safe to send to the remote server for
execution, if they appear in a WHERE clause for a foreign table. Thus,
a built-in function that was added since the remote server's release
might be sent to it for execution, resulting in “function does not
exist” or a similar error. This type of failure can be worked around
by rewriting the query, for example by embedding the foreign table
reference in a sub-SELECT with OFFSET 0 as an optimization fence, and
placing the problematic function or operator outside the sub-SELECT.

While I'm not opposed to moving those goalposts at some point,
I think pushing them all the way up to 9.5 for this one easily-fixed
problem is not very reasonable.

Given other recent discussion, an argument could be made for moving
the cutoff to 9.2, on the grounds that it's too hard to test against
anything older.  But that still leaves us needing a version check
if we want to use TABLESAMPLE.

regards, tom lane

[1] https://www.postgresql.org/docs/devel/postgres-fdw.html#id-1.11.7.45.17




Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-11 Thread Nathan Bossart
Here is a rebased patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 506aa95dd77f16dc64d7fe9c52ca67dd3c10212e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v4 1/8] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks that could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 213 
 src/backend/postmaster/postmaster.c |  44 -
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  17 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 11 files changed, 300 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index dbbeac5a82..1b7aae60f5 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 0587e45920..7eae34884d 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..dd86f0f5ce
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,213 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process is new as of Postgres 15.  Its main purpose is to
+ * offload tasks that could otherwise delay startup and checkpointing, but
+ * it needn't be restricted to just those things.  Offloaded tasks should
+ * not be synchronous (e.g., checkpointing shouldn't need to wait for the
+ * custodian to complete a task before proceeding).  Also, ensure that any
+ * offloaded tasks are either not required during single-user mode or are
+ * performed separately during single-user mode.
+ *
+ * The custodian is not an essential process and can shutdown quickly when
+ * requested.  The custodian will wake up approximately once every 5
+ * minutes to perform its tasks, but backends can (and should) set its
+ * latch to wake it up sooner.
+ *
+ * Normal termination is by SIGTERM, which instructs the custodian to
+ * exit(0).  Emergency termination is by SIGQUIT; like any backend, the
+ * custodian will simply abort and exit on SIGQUIT.
+ *
+ * If the custodian exits unexpectedly, the postmaster treats that the same
+ * as a backend crash: shared memory may be corrupted, so remaining
+ * backends should be killed by SIGQUIT and then a recovery cycle started.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "utils/memutils.h"
+
+#define CUSTODIAN_TIMEOUT_S (300)		/* 5 minutes */
+
+/*
+ * Main entry point for custodian process
+ *
+ * This is invoked from AuxiliaryProcessMain, which has already created the
+ * basic execution environment, but not enabled signals yet.
+ */
+void
+CustodianMain(void)
+{
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext custodian_context;
+
+	/*
+	 * Properly accept or ignore signals that might be sent to us.
+	 */
+	

[PATCH] Support % wildcard in extension upgrade filenames

2022-02-11 Thread Sandro Santilli
At PostGIS we've been thinking of ways to get better support, from
PostgreSQL proper, for our upgrade strategy, which has always consisted
of a SINGLE upgrade file good for upgrading from any older version.

The current lack of such support for EXTENSIONs forced us to install
a lot of files on disk and we'd like to stop doing that at some point
in the future.

The proposal is to support wildcards for versions encoded in the
filename so that (for our case) a single wildcard could match "any
version". I've been thinking about the '%' character for that, to
resemble the wildcard used for LIKE.
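For example (filenames hypothetical): instead of shipping
postgis--3.1.0--3.3.0.sql, postgis--3.1.1--3.3.0.sql and so on for every
supported old version, the extension would install a single
postgis--%--3.3.0.sql script, which would match an upgrade from any
installed version to 3.3.0.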

Here's the proposal:
https://lists.osgeo.org/pipermail/postgis-devel/2022-February/029500.html

A very very short (and untested) patch which might (or
might not) support our case is attached.

The only problem with my proposal/patch would be the presence, in the
wild, of PostgreSQL extensions actually using the '%' character in
their version strings, which is currently considered legit by
PostgreSQL.

How do you feel about the proposal (which is wider than the patch) ?

--strk; 

  Libre GIS consultant/developer
  https://strk.kbt.io/services.html
From e9d060b19c655924c4edb6679169261d605f735d Mon Sep 17 00:00:00 2001
From: Sandro Santilli 
Date: Fri, 11 Feb 2022 18:49:45 +0100
Subject: [PATCH] Support % wildcard in extension upgrade scripts

---
 src/backend/commands/extension.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index a2e77c418a..aafd9c17ee 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1072,6 +1072,8 @@ get_ext_ver_info(const char *versionname, List **evi_list)
 	foreach(lc, *evi_list)
 	{
 		evi = (ExtensionVersionInfo *) lfirst(lc);
+		if (strcmp(evi->name, "%") == 0)
+			return evi;
 		if (strcmp(evi->name, versionname) == 0)
 			return evi;
 	}
-- 
2.30.2



Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Bruce Momjian
On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote:
> How about something like LOG_AS_CLONE? That makes it clear, I hope,
> that we're logging it a different way, but that method of logging it
> is different in each case. You'd still have to read the documentation
> to find out what it really means, but at least it seems like it points
> you more in the right direction. To me, anyway.

I think CLONE would be confusing since we don't use that term often,
maybe LOG_DB_COPY or LOG_FILE_COPY?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 12:39 PM Tom Lane  wrote:
> Tomas Vondra  writes:
> > So here we go. The patch does a very simple thing - it uses TABLESAMPLE
> > to collect/transfer just a small sample from the remote node, saving
> > both CPU and network.
>
> This is great if the remote end has TABLESAMPLE, but pre-9.5 servers
> don't, and postgres_fdw is supposed to still work with old servers.
> So you need some conditionality for that.

I think it's going to be necessary to compromise on that at some
point. I don't, for example, think it would be reasonable for
postgres_fdw to have detailed knowledge of which operators can be
pushed down as a function of the remote PostgreSQL version. Nor do I
think that we care about whether this works at all against, say,
PostgreSQL 8.0. I'm not sure where it's reasonable to draw a line and
say we're not going to expend any more effort, and maybe 15 with 9.5
is a small enough gap that we still care at least somewhat about
compatibility. But even that is not 100% obvious to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tom Lane
Tomas Vondra  writes:
> So here we go. The patch does a very simple thing - it uses TABLESAMPLE 
> to collect/transfer just a small sample from the remote node, saving 
> both CPU and network.

This is great if the remote end has TABLESAMPLE, but pre-9.5 servers
don't, and postgres_fdw is supposed to still work with old servers.
So you need some conditionality for that.
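A sketch of the sort of conditionality that might be needed (using the
deparse entry points from the patch; the exact placement is illustrative):

	/* Fall back to fetching all rows from servers without TABLESAMPLE. */
	if (PQserverVersion(conn) >= 90500)
		deparseAnalyzeSampleSql(&sql, rel, &retrieved_attrs, sample_frac);
	else
		deparseAnalyzeSql(&sql, rel, &retrieved_attrs);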

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 12:11 PM Andrew Dunstan  wrote:
> The last one at least has the advantage that it doesn't invent yet
> another keyword.

We don't need a new keyword for this as long as it lexes as one token,
because createdb_opt_name accepts IDENT. So I think we should focus on
trying to come up with something that is as clear as we know how to
make it.

What I find difficult about doing that is that this is all a bunch of
technical details that users may have difficulty understanding. If we
say WAL_LOG or WAL_LOG_DATA, a reasonably but not incredibly
well-informed user will assume that skipping WAL is not really an
option. If we say CHECKPOINT, a reasonably but not incredibly
well-informed user will presume they don't want one (I think).
CHECKPOINT also seems like it's naming the switch by the unwanted side
effect, which doesn't seem too flattering to the existing method.

How about something like LOG_AS_CLONE? That makes it clear, I hope,
that we're logging it a different way, but that method of logging it
is different in each case. You'd still have to read the documentation
to find out what it really means, but at least it seems like it points
you more in the right direction. To me, anyway.
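For illustration only (the option name is still up in the air at this point
in the thread), the user-facing syntax would be something like:

	CREATE DATABASE mydb TEMPLATE mytemplate LOG_AS_CLONE = true;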

> I can live with the new method being the default. I'm sure it would be
> highlighted in the release notes too.

That would make sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Tom Lane
Justin Pryzby  writes:
> Or, is it okay to use ABS_SRCDIR?

Don't see why not --- other tests certainly do.

regards, tom lane




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 10:41:27AM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote:
> >> This seems like a pretty bad idea even if it weren't failing outright.
> >> We should be examining the version of the file that's in the source
> >> tree; the one in the installation tree might have version-skew
> >> problems, if you've not yet done "make install".
> 
> > My original way used the source tree, but Michael thought it would be an 
> > issue
> > for "installcheck" where the config may not be available.
> 
> Yeah, you are at risk either way, but in practice nobody is going to be
> running these TAP tests without a source tree.
> 
> > This is what I had written:
> > FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf
> 
> That's not using the source tree either, but the copy in the
> cluster-under-test.  I'd fear it to be unstable in the buildfarm, where
> animals can append whatever they please to the config file being used by
> tests.

The important test is that "new GUCs exist in sample.conf".  The converse test
is less interesting, so I think the important half would be okay.

I see the BF client appends to postgresql.conf.
https://github.com/PGBuildFarm/client-code/blob/main/run_build.pl#L1374

It could use "include", which was added in v8.2.  Or it could add a marker,
like pg_regress does:
src/test/regress/pg_regress.c:  fputs("\n# Configuration added by pg_regress\n\n", pg_conf);

If it were desirable to also check that "things that look like GUCs in the conf
are actually GUCs", either of those would avoid false positives.

Or, is it okay to use ABS_SRCDIR?

+\getenv abs_srcdir PG_ABS_SRCDIR
+\set filename :abs_srcdir '../../../../src/backend/utils/misc/postgresql.conf.sample'
+-- test that GUCS are in postgresql.conf
+SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample EXCEPT
+SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc
+FROM (SELECT regexp_split_to_table(pg_read_file(:'filename'), '\n') AS ln) conf
+WHERE ln ~ '^#?[[:alpha:]]'
+ORDER BY 1;
+
+-- test that lines in postgresql.conf that look like GUCs are GUCs
+SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc
+FROM (SELECT regexp_split_to_table(pg_read_file(:'filename'), '\n') AS ln) conf
+WHERE ln ~ '^#?[[:alpha:]]'
+EXCEPT SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample
+ORDER BY 1;

-- 
Justin




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-11 Thread Andrew Dunstan


On 2/10/22 07:32, Dilip Kumar wrote:
> On Wed, Feb 9, 2022 at 9:31 PM Robert Haas  wrote:
>> On Wed, Feb 9, 2022 at 10:59 AM Dilip Kumar  wrote:
>>> On Wed, Feb 9, 2022 at 9:25 PM Andrew Dunstan  wrote:
 On 6/16/21 03:52, Dilip Kumar wrote:
> On Tue, Jun 15, 2021 at 7:01 PM Andrew Dunstan  
> wrote:
>> Rather than use size, I'd be inclined to say use this if the source
>> database is marked as a template, and use the copydir approach for
>> anything that isn't.
> Yeah, that is possible, on the other thought wouldn't it be good to
> provide control to the user by providing two different commands, e.g.
> COPY DATABASE for the existing method (copydir) and CREATE DATABASE
> for the new method (fully wal logged)?
 This proposal seems to have gotten lost.
>>> Yeah, I am planning to work on this part so that we can support both 
>>> methods.
>> But can we pick a different syntax? In my view this should be an
>> option to CREATE DATABASE rather than a whole new command.
> Maybe we can provide something like
>
> CREATE DATABASE..WITH WAL_LOG=true/false ? OR
> CREATE DATABASE..WITH WAL_LOG_DATA_PAGE=true/false ? OR
> CREATE DATABASE..WITH CHECKPOINT=true/false ? OR
>
> And then we can explain in documentation about these options?  I think
> default should be new method?
>
>

The last one at least has the advantage that it doesn't invent yet
another keyword.

I can live with the new method being the default. I'm sure it would be
highlighted in the release notes too.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: the build farm is ok, but not the hippopotamus (or the jay)

2022-02-11 Thread Tomas Vondra

On 2/11/22 16:16, Robert Haas wrote:

Hi,

I was looking over the buildfarm results yesterday and this morning,
and things seem to be mostly green, but not entirely. 'hippopotamus'
was failing yesterday with strange LDAP-related errors, and is now
complaining about pgsql-Git-Dirty, which I assume means that someone
is doing something funny on that machine manually. 'jay' has the same
maintainer and is also showing some failures.



Yes, those are "my" machines. I've been upgrading the OS on them, and 
this is likely a fallout from that. Sorry about the noise.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tomas Vondra

Hi,

here's a small patch modifying postgres_fdw to use TABLESAMPLE to 
collect sample when running ANALYZE on a foreign table. Currently the 
sampling happens locally, i.e. we fetch all data from the remote server 
and then pick rows. But that's hugely expensive for large relations 
and/or with limited network bandwidth, of course.


Alexander Lepikhov mentioned this in [1], but it was actually proposed 
by Stephen in 2020 [2] but no patch even materialized.


So here we go. The patch does a very simple thing - it uses TABLESAMPLE 
to collect/transfer just a small sample from the remote node, saving 
both CPU and network.


And indeed, that helps quite a bit:

-
create table t (a int);
Time: 10.223 ms

insert into t select i from generate_series(1,1000) s(i);
Time: 552.207 ms

analyze t;
Time: 310.445 ms

CREATE FOREIGN TABLE ft (a int)
  SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 't');
Time: 4.838 ms

analyze ft;
WARNING:  SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t TABLESAMPLE 
SYSTEM(0.375001)

Time: 44.632 ms

alter foreign table ft options (sample 'false');
Time: 4.821 ms

analyze ft;
WARNING:  SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t
Time: 6690.425 ms (00:06.690)
-

6690ms without sampling, and 44ms with sampling - quite an improvement. 
Of course, the difference may be much smaller/larger in other cases.


Now, there's a couple issues ...

Firstly, the FDW API forces a bit strange division of work between 
different methods and duplicating some of it (and it's already mentioned 
in postgresAnalyzeForeignTable). But that's minor and can be fixed.


The other issue is which sampling method to use - we have SYSTEM and 
BERNOULLI built in, and the tsm_system_rows as an extension (and _time, 
but that's not very useful here). I guess we'd use one of the built-in 
ones, because that'll work on more systems out of the box.


But that leads to the main issue - determining the fraction of rows to 
sample. We know how many rows we want to sample, but we have no idea how 
many rows there are in total. We can look at reltuples, but we can't be 
sure how accurate / up-to-date that value is.


The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and 
applies some simple sanity checks, but I wonder if we need to do more 
(e.g. look at the current relation size and scale reltuples by the ratio
of current pages to relpages).
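For illustration, a rough sketch of such a cross-check, run against the
remote pg_class (table name assumed to be public.t as in the example above):

	-- scale reltuples by how much the relation has grown since the
	-- last ANALYZE, before deriving the sampling fraction from it
	SELECT c.reltuples
	         * (pg_relation_size(c.oid) / current_setting('block_size')::int)
	         / GREATEST(c.relpages, 1) AS est_tuples
	FROM pg_catalog.pg_class c
	WHERE c.oid = 'public.t'::regclass;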


FWIW this is yet a bit more convoluted when analyzing a partitioned table
with foreign partitions, because we only ever look at relation sizes to 
determine how many rows to sample from each partition.



regards

[1] 
https://www.postgresql.org/message-id/bdb0bea2-a0da-1f1d-5c92-96ff90c198eb%40postgrespro.ru


[2] 
https://www.postgresql.org/message-id/20200829162231.GE29590%40tamriel.snowman.net


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..68f875c47a6 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 	appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }
 
+/*
+ * Construct SELECT statement to acquire number of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+	StringInfoData relname;
+
+	/* We'll need the remote relation name as a literal. */
+	initStringInfo(&relname);
+	deparseRelation(&relname, rel);
+
+	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	deparseStringLiteral(buf, relname.data);
+	appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
  *
@@ -2328,6 +2348,72 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	deparseRelation(buf, rel);
 }
 
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+	Oid			relid = RelationGetRelid(rel);
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	int			i;
+	char	   *colname;
+	List	   *options;
+	ListCell   *lc;
+	bool		first = true;
+
+	*retrieved_attrs = NIL;
+
+	appendStringInfoString(buf, "SELECT ");
+	for (i = 0; i < tupdesc->natts; i++)
+	{
+		/* Ignore dropped columns. */
+		if 

Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-02-11 Thread Bharath Rupireddy
On Fri, Feb 11, 2022 at 6:31 PM Dilip Kumar  wrote:
>
> On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar  wrote:
> > >
>
> > IIUC, the issue can happen while the walreceiver failed to get WAL
> > from primary for whatever reasons and its status is not
> > WALRCV_STOPPING or WALRCV_STOPPED, and the startup process moved ahead
> > in WaitForWALToBecomeAvailable for reading from archive which ends up
> > in this assertion failure. ISTM a rare scenario, and it depends on
> > what walreceiver does between failure to get WAL from primary and
> > updating status to WALRCV_STOPPING or WALRCV_STOPPED.
> >
> > If the above race condition is a serious problem, if one thinks at
> > least it is a problem at all, that needs to be fixed.
>
> I don't think we can design software which has open race conditions
> even though they are rarely occurring :)

Yes.

> > I don't think
> > just making InstallXLogFileSegmentActive false is enough. By looking
> > at the comment [1], it doesn't make sense to move ahead for restoring
> > from the archive location without the WAL receiver fully stopped.
> > IMO, the real fix is to just remove WalRcvStreaming() and call
> > XLogShutdownWalRcv() unconditionally. Anyways, we have the
> > Assert(!WalRcvStreaming()); down below. I don't think it will create
> > any problem.
>
> If WalRcvStreaming() is returning false that means the walreceiver is
> already stopped, so we don't need to shut it down externally.  I think,
> just as we set this flag outside of start streaming, we can reset it
> outside of XLogShutdownWalRcv too.  Or I am fine even if we call
> XLogShutdownWalRcv() because if the walreceiver is stopped it will just
> reset the flag we want it to reset and it will do nothing else.

As I said, I'm okay with just calling XLogShutdownWalRcv()
unconditionally as it just returns in case walreceiver has already
stopped and updates the InstallXLogFileSegmentActive flag to false.

Let's also hear what other hackers have to say about this.

Regards,
Bharath Rupireddy.




Re: pg_receivewal.exe unhandled exception in zlib1.dll

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-11 16:33:11 +0100, Juan José Santamaría Flecha wrote:
> If you execute pg_receivewal.exe with the option "--compression-method
> gzip" it will fail with no error. You can see an error in the db log:
> 
> 2022-02-10 11:46:32.725 CET [11664][walsender] [pg_receivewal][3/0:0] LOG:
> could not receive data from client: An existing connection was forcibly
> closed by the remote host.
> 
> Tracing the execution you can see:
> 
> Unhandled exception at 0x7FFEA78C1208 (ucrtbase.dll) in pg_receivewal.exe: An invalid
> parameter was passed to a function that considers invalid parameters fatal.
>   ucrtbase.dll!7ff8e8ae36a2() Unknown
> > zlib1.dll!gz_comp(gz_state * state, int flush) Line 111 C
>   zlib1.dll!gz_write(gz_state * state, const void * buf, unsigned __int64
> len) Line 235 C
>   pg_receivewal.exe!dir_write(void * f, const void * buf, unsigned __int64
> count) Line 300 C
>   pg_receivewal.exe!ProcessXLogDataMsg(pg_conn * conn, StreamCtl * stream,
> char * copybuf, int len, unsigned __int64 * blockpos) Line 1150 C
>   pg_receivewal.exe!HandleCopyStream(pg_conn * conn, StreamCtl * stream,
> unsigned __int64 * stoppos) Line 850 C
>   pg_receivewal.exe!ReceiveXlogStream(pg_conn * conn, StreamCtl * stream)
> Line 605 C
>   pg_receivewal.exe!StreamLog() Line 636 C
>   pg_receivewal.exe!main(int argc, char * * argv) Line 1005 C
> 
> The problem comes from the file descriptor passed to gzdopen() in
> 'src/bin/pg_basebackup/walmethods.c'. Using gzopen() instead, solves the
> issue without ifdefing for WIN32. Please find attached a patch for so.

I hit this as well. The problem is really caused by using a debug build of
postgres vs a production build of zlib. They use different C runtimes, with
different file descriptors. A lot of resources in the Windows world are
unfortunately tied to the C runtime, and there can be multiple C runtimes in
a single process.
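To illustrate the hazard (a sketch, not the actual walmethods.c code): a
file descriptor created by one CRT is meaningless to another CRT, which is
why the path-based gzopen() sidesteps the problem:

	fd = open(tmppath, O_WRONLY | O_CREAT | PG_BINARY, 0600);

	/*
	 * If zlib1.dll was built against a different CRT, this fd does not
	 * exist in that CRT's descriptor table -> "invalid parameter" crash.
	 */
	gzfp = gzdopen(fd, "wb");

	/* A pathname is CRT-neutral; zlib opens the file with its own CRT. */
	gzfp = gzopen(tmppath, "wb");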

Greetings,

Andres Freund




Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby  wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect
or worse than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote:
>> This seems like a pretty bad idea even if it weren't failing outright.
>> We should be examining the version of the file that's in the source
>> tree; the one in the installation tree might have version-skew
>> problems, if you've not yet done "make install".

> My original way used the source tree, but Michael thought it would be an issue
> for "installcheck" where the config may not be available.

Yeah, you are at risk either way, but in practice nobody is going to be
running these TAP tests without a source tree.

> This is what I had written:
> FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf

That's not using the source tree either, but the copy in the
cluster-under-test.  I'd fear it to be unstable in the buildfarm, where
animals can append whatever they please to the config file being used by
tests.

regards, tom lane




Re: OpenSSL conflicts with wincrypt.h

2022-02-11 Thread Tom Lane
Juan José Santamaría Flecha  writes:
> There is a comment in 'src/backend/libpq/be-secure-openssl.c' addressing
> this issue, but I have to explicitly undefine X509_NAME. Please find
> attached a patch for so.

Um ... why?  Shouldn't the #undef in the OpenSSL headers take care
of the problem?

regards, tom lane




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote:
> Christoph Berg  writes:
> > this test is failing at Debian package compile time:
> > Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such 
> > file or directory at t/003_check_guc.pl line 47.
> 
> > So it's trying to read from /usr/share/postgresql which doesn't exist
> > yet at build time.
> 
> > The relevant part of the test is this:
> 
> > # Find the location of postgresql.conf.sample, based on the information
> > # provided by pg_config.
> > my $sample_file =
> >   $node->config_data('--sharedir') . '/postgresql.conf.sample';
> 
> This seems like a pretty bad idea even if it weren't failing outright.
> We should be examining the version of the file that's in the source
> tree; the one in the installation tree might have version-skew
> problems, if you've not yet done "make install".

My original way used the source tree, but Michael thought it would be an issue
for "installcheck" where the config may not be available.

https://www.postgresql.org/message-id/YfTg/WHNLVVygy8v%40paquier.xyz

This is what I had written:

-- test that GUCS are in postgresql.conf
SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample EXCEPT
SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc
FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf
WHERE ln ~ '^#?[[:alpha:]]'
ORDER BY 1;
lower
-
 config_file
 plpgsql.check_asserts
 plpgsql.extra_errors
 plpgsql.extra_warnings
 plpgsql.print_strict_params
 plpgsql.variable_conflict
(6 rows)

-- test that lines in postgresql.conf that look like GUCs are GUCs
SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc
FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf
WHERE ln ~ '^#?[[:alpha:]]'
EXCEPT SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample
ORDER BY 1;
guc
---
 include
 include_dir
 include_if_exists
(3 rows)

-- 
Justin




pg_receivewal.exe unhandled exception in zlib1.dll

2022-02-11 Thread Juan José Santamaría Flecha
Hello,

If you execute pg_receivewal.exe with the option "--compression-method
gzip" it will fail with no error. You can see an error in the db log:

2022-02-10 11:46:32.725 CET [11664][walsender] [pg_receivewal][3/0:0] LOG:
could not receive data from client: An existing connection was forcibly
closed by the remote host.

Tracing the execution you can see:

Unhandled exception at 0x7FFEA78C1208 (ucrtbase.dll) in pg_receivewal.exe: An invalid
parameter was passed to a function that considers invalid parameters fatal.
  ucrtbase.dll!7ff8e8ae36a2() Unknown
> zlib1.dll!gz_comp(gz_state * state, int flush) Line 111 C
  zlib1.dll!gz_write(gz_state * state, const void * buf, unsigned __int64
len) Line 235 C
  pg_receivewal.exe!dir_write(void * f, const void * buf, unsigned __int64
count) Line 300 C
  pg_receivewal.exe!ProcessXLogDataMsg(pg_conn * conn, StreamCtl * stream,
char * copybuf, int len, unsigned __int64 * blockpos) Line 1150 C
  pg_receivewal.exe!HandleCopyStream(pg_conn * conn, StreamCtl * stream,
unsigned __int64 * stoppos) Line 850 C
  pg_receivewal.exe!ReceiveXlogStream(pg_conn * conn, StreamCtl * stream)
Line 605 C
  pg_receivewal.exe!StreamLog() Line 636 C
  pg_receivewal.exe!main(int argc, char * * argv) Line 1005 C

The problem comes from the file descriptor passed to gzdopen() in
'src/bin/pg_basebackup/walmethods.c'. Using gzopen() instead, solves the
issue without ifdefing for WIN32. Please find attached a patch for so.

Regards,

Juan José Santamaría Flecha
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index a6d08c1..bc0790a 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -143,7 +143,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 #ifdef HAVE_LIBZ
 	if (dir_data->compression_method == COMPRESSION_GZIP)
 	{
-		gzfp = gzdopen(fd, "wb");
+		gzfp = gzopen(tmppath, "wb");
 		if (gzfp == NULL)
 		{
 			dir_data->lasterrno = errno;


OpenSSL conflicts with wincrypt.h

2022-02-11 Thread Juan José Santamaría Flecha
Hello,

When building Postgres using MSVC with Kerberos Version 4.1 and OpenSSL
1.1.1l (both of them, using only one will raise no errors), I see errors
like:

"C:\postgres\pgsql.sln" (default target) (1) ->
"C:\postgres\postgres.vcxproj" (default target) (2) ->
(ClCompile target) ->
  C:\postgres\src\backend\libpq\be-secure-openssl.c(583,43): warning C4047:
'function': 'X509_NAME *' differs in levels of indirection from 'int'
[C:\postgres\pos
tgres.vcxproj]

"C:\postgres\pgsql.sln" (default target) (1) ->
"C:\postgres\postgres.vcxproj" (default target) (2) ->
(ClCompile target) ->
  C:\postgres\src\backend\libpq\be-secure-openssl.c(74,35): error C2143:
syntax error: missing ')' before '(' [C:\postgres\postgres.vcxproj]

There is a comment in 'src/backend/libpq/be-secure-openssl.c' addressing
this issue, but I have to explicitly undefine X509_NAME. Please find
attached a patch for so.

Regards,

Juan José Santamaría Flecha
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 3d0168a..649bdc3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -49,6 +49,9 @@
 #ifndef OPENSSL_NO_ECDH
 #include <openssl/ec.h>
 #endif
+#ifdef WIN32
+#undef X509_NAME
+#endif
 #include <openssl/x509v3.h>
 
 


Re: refactoring basebackup.c

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 08:35:25PM +0530, Jeevan Ladhe wrote:
> Thanks Robert for the bravity :-)

FYI: there's a couple typos in the last 2 patches.

I added them to my typos branch; feel free to wait until April if you'd prefer
to see them fixed in bulk.

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd19..649b91208f3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -419,7 +419,7 @@ PostgreSQL documentation

 The compression method can be set to gzip or
 lz4, or none for no
-compression. A compression level can be optionally specified, by
+compression. A compression level can optionally be specified, by
 appending the level number after a colon (:). If no
 level is specified, the default compression level will be used. If
 only a level is specified without mentioning an algorithm,
@@ -440,7 +440,7 @@ PostgreSQL documentation
 -Xstream, pg_wal.tar will
 be compressed using gzip if client-side gzip
 compression is selected, but will not be compressed if server-side
-compresion or LZ4 compresion is selected.
+compression or LZ4 compression is selected.

   
  




the build farm is ok, but not the hippopotamus (or the jay)

2022-02-11 Thread Robert Haas
Hi,

I was looking over the buildfarm results yesterday and this morning,
and things seem to be mostly green, but not entirely. 'hippopotamus'
was failing yesterday with strange LDAP-related errors, and is now
complaining about pgsql-Git-Dirty, which I assume means that someone
is doing something funny on that machine manually. 'jay' has the same
maintainer and is also showing some failures.

The last failure on jay looks like this:

test pgp-armor... ok  126 ms
test pgp-decrypt  ... ok  192 ms
test pgp-encrypt  ... ok  611 ms
test pgp-compression  ... FAILED (test process exited with exit code 127)        2 ms
test pgp-pubkey-decrypt   ... FAILED (test process exited with exit code 127)        2 ms
test pgp-pubkey-encrypt   ... FAILED        9 ms
test pgp-info ... FAILED        8 ms

Is there some maintenance happening on these machines that means we
should discount these failures, or is something broken?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Merging statistics from children instead of re-sampling everything

2022-02-11 Thread Tomas Vondra




On 2/11/22 05:29, Andrey V. Lepikhov wrote:

On 2/11/22 03:37, Tomas Vondra wrote:

That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

IMO, to solve the problem we should do two things:
1. Avoid repeated partition scans in the case of an inheritance tree.
2. Avoid re-analyzing everything in the case of active changes in a small
subset of partitions.


For (1) I can imagine a solution like multiplexing: at the stage of
defining which relations to scan, group them and prepare the scan
parameters so as to collect multiple samples in one shot.

It looks like we need separate logic for the analysis of partitioned
tables - we should form and cache samples on each partition before an
analysis.
It requires a prototype to understand the complexity of such a solution,
and it can be done separately from (2).




I'm not sure I understand what you mean by multiplexing. The term 
usually means "sending multiple signals at once" but I'm not sure how 
that applies to this issue. Can you elaborate?


I assume you mean something like collecting a sample for a partition 
once, and then keeping and reusing the sample for future ANALYZE runs, 
until invalidated in some sense.


Yeah, I agree that'd be useful - and not just for partitions, actually. 
I've been pondering something like that for regular tables, because the 
sample might be used for estimation of clauses directly.


But it requires storing the sample somewhere, and I haven't found a good 
and simple way to do that. We could serialize that into bytea, or we 
could create a new fork, or something, but what should that do with 
oversized attributes (how would TOAST work for a fork) and/or large 
samples (which might not fit into 1GB bytea)?



Task (2) is more difficult to solve. Here we can store samples from each
partition in the values[] field of pg_statistic or in a specific table which
stores a 'most probable values' snapshot of each table.


I think storing samples in pg_statistic is problematic, because values[] 
is subject to 1GB limit etc. Not an issue for small MCV list of a single 
attribute, certainly an issue for larger samples. Even if the data fit, 
the size of pg_statistic would explode.


The most difficult problem here, as you mentioned, is the ndistinct value.
Is it possible to store not an exactly calculated value of ndistinct, but
an 'expected value', based on analysis of the samples and histograms on the
partitions? Such a value could also solve the problem of estimating the
grouping of a SETOP result (joins of them, etc.), where we have statistics
only on the sources of the union.




I think ndistinct is a problem only when merging final estimates. But if
we're able to calculate and store some intermediate results, we can 
easily use HLL and merge that.
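That works because HyperLogLog sketches merge losslessly: the union of two
sketches is simply the element-wise maximum of their registers. A minimal
sketch of the merge step (assuming both counters were built with the same
number of registers):

	for (i = 0; i < nregisters; i++)
		merged->registers[i] = Max(a->registers[i], b->registers[i]);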



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Teach pg_receivewal to use lz4 compression

2022-02-11 Thread Robert Haas
On Thu, Nov 4, 2021 at 10:47 PM Michael Paquier  wrote:
> Indeed.  My rebase was a bit sloppy here.

Hi!

Over in 
http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
I was noticing that CreateWalTarMethod doesn't support LZ4
compression. It would be nice if it did. I thought maybe the patch on
this thread would fix that, but I think maybe it doesn't, because it
looks like that's touching the WalDirectoryMethod part of that file,
rather than the WalTarMethod part. Is that correct? And, on a related
note, Michael, do you plan to get something committed here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
Thanks Robert for the bravity :-)

Regards,
Jeevan Ladhe


On Fri, 11 Feb 2022, 20:31 Robert Haas,  wrote:

> On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit 
> wrote:
> > > Sure, please find the rebased patch attached.
> >
> > Thanks, I have validated v2 patch on top of rebased patch.
>
> I'm still feeling brave, so I committed this too after fixing a few
> things. In the process I noticed that we don't have support for LZ4
> compression of streamed WAL (cf. CreateWalTarMethod). It would be good
> to fix that. I'm not quite sure whether
>
> http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
> is basically what we need or whether something else is required.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit  wrote:
> > Sure, please find the rebased patch attached.
>
> Thanks, I have validated v2 patch on top of rebased patch.

I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether
http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Tom Lane
Christoph Berg  writes:
> this test is failing at Debian package compile time:
> Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such file 
> or directory at t/003_check_guc.pl line 47.

> So it's trying to read from /usr/share/postgresql which doesn't exist
> yet at build time.

> The relevant part of the test is this:

> # Find the location of postgresql.conf.sample, based on the information
> # provided by pg_config.
> my $sample_file =
>   $node->config_data('--sharedir') . '/postgresql.conf.sample';

This seems like a pretty bad idea even if it weren't failing outright.
We should be examining the version of the file that's in the source
tree; the one in the installation tree might have version-skew
problems, if you've not yet done "make install".

regards, tom lane




Re: Synchronizing slots from primary to standby

2022-02-11 Thread Peter Eisentraut



On 05.02.22 20:59, Andres Freund wrote:

On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:

 From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 3 Jan 2022 14:43:36 +0100
Subject: [PATCH v3] Synchronize logical replication slots from primary to
  standby

I've just skimmed the patch and the related threads. As far as I can tell this
cannot be safely used without the conflict handling in [1], is that correct?


This or similar questions have been asked a few times about this or 
similar patches, but they always come with some doubt.  If we think so, 
it would be useful perhaps if we could come up with test cases that 
would demonstrate why that other patch/feature is necessary.  (I'm not 
questioning it personally, I'm just throwing out ideas here.)





Re: Accommodate startup process in a separate ProcState array slot instead of in MaxBackends slots.

2022-02-11 Thread Yura Sokolov
On Sat, 16/10/2021 at 16:37 +0530, Bharath Rupireddy wrote:
> On Thu, Oct 14, 2021 at 10:56 AM Fujii Masao
>  wrote:
> > On 2021/10/12 15:46, Bharath Rupireddy wrote:
> > > On Tue, Oct 12, 2021 at 5:37 AM Fujii Masao  
> > > wrote:
> > > > On 2021/10/12 4:07, Bharath Rupireddy wrote:
> > > > > Hi,
> > > > > 
> > > > > While working on [1], it is found that currently the ProcState array
> > > > > doesn't have entries for auxiliary processes, it does have entries for
> > > > > MaxBackends. But the startup process is eating up one slot from
> > > > > MaxBackends. We need to increase the size of the ProcState array by 1
> > > > > at least for the startup process. The startup process uses ProcState
> > > > > slot via InitRecoveryTransactionEnvironment->SharedInvalBackendInit.
> > > > > The procState array size is initialized to MaxBackends in
> > > > > SInvalShmemSize.
> > > > > 
> > > > > The consequence of not fixing this issue is that the database may hit
> > > > > the error "sorry, too many clients already" soon in
> > > > > SharedInvalBackendInit.
> > 
> > On second thought, I wonder if this error could not happen in practice. No?
> > Because autovacuum doesn't work during recovery and the startup process
> > can safely use the ProcState entry for autovacuum worker process.
> > Also since the minimal allowed value of autovacuum_max_workers is one,
> > the ProcState array guarantees to have at least one entry for autovacuum 
> > worker.
> > 
> > If this understanding is right, we don't need to enlarge the array and
> > can just update the comment. I don't strongly oppose to enlarge
> > the array in the master, but I'm not sure it's worth doing that
> > in back branches if the issue can cause no actual error.
> 
> Yes, the issue can't happen. The comment in the SInvalShmemSize,
> mentioning about the startup process always having an extra slot
> because the autovacuum worker is not active during recovery, looks
> okay. But, is it safe to assume that always? Do we have a way to
> specify that in the form of an Assert(when_i_am_startup_proc &&
> autovacuum_not_running) (this looks a bit dirty though)? Instead, we
> can just enlarge the array in the master and be confident about the
> fact that the startup process always has one dedicated slot.

But this slot won't be used for most of the cluster's life. It will just
be a waste.

And `Assert(there_is_startup_proc && autovacuum_not_running)` has
value on its own, doesn't it? So why not add it with a comment.
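For illustration, the documenting assertion might be as simple as the
following sketch (mostly documentation value; the real invariant is that
autovacuum workers are never launched while recovery is in progress, which
is what makes borrowing the slot safe):

	/*
	 * The startup process borrows a ProcState entry sized for an
	 * autovacuum worker; that is safe only because autovacuum does
	 * not run during recovery.
	 */
	Assert(!AmStartupProcess() || RecoveryInProgress());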

regards,
Yura Sokolov





Re: Synchronizing slots from primary to standby

2022-02-11 Thread Peter Eisentraut

On 10.02.22 22:47, Bruce Momjian wrote:

On Tue, Feb  8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:

Which means that if e.g. the standby_slot_names GUC differs from
synchronize_slot_names on the physical replica, the slots synchronized on the
physical replica are not going to be valid.  Or if the primary drops its
logical slots.



Should the redo function for the drop replication slot have the capability
to drop it on standby and its subscribers (if any) as well?


Slots are not WAL logged (and shouldn't be).

I think you pretty much need the recovery conflict handling infrastructure I
referenced upthread, which recognized during replay if a record has a conflict
with a slot on a standby.  And then ontop of that you can build something like
this patch.



OK. Understood, thanks Andres.


I would love to see this feature in PG 15.  Can someone explain its
current status?  Thanks.


The way I understand it:

1. This feature (probably) depends on the "Minimal logical decoding on 
standbys" patch.  The details there aren't totally clear (to me).  That 
patch had some activity lately, but I don't see it in a state where it's
nearing readiness.


2. I think the way this (my) patch is currently written needs some 
refactoring about how we launch and manage workers.  Right now, it's all 
mangled together with logical replication, since that is a convenient 
way to launch and manage workers, but it really doesn't need to be tied 
to logical replication, since it can also be used for other logical slots.


3. It's an open question how to configure this.  My patch shows a very
minimal configuration that allows you to keep all logical slots always 
behind one physical slot, which addresses one particular use case.  In 
general, you might have things like, one set of logical slots should 
stay behind one physical slot, another set behind another physical slot, 
another set should not care, etc.  This could turn into something like 
the synchronous replication feature, where it ends up with its own 
configuration language.
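For illustration, using the GUC names already floated in this thread
(values hypothetical):

	# primary: logical slots must not get ahead of this physical slot
	standby_slot_names = 'standby1_slot'

	# standby: which logical slots to synchronize from the primary
	synchronize_slot_names = 'sub1_slot, sub2_slot'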


Each of these is clearly a significant job on its own.




Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 5:58 AM Jeevan Ladhe  wrote:
> >Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase.
>
> Sure, please find the rebased patch attached.

It's Friday today, but I'm feeling brave, and it's still morning here,
so ... committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Simon Riggs
On Fri, 11 Feb 2022 at 08:48, Simon Riggs  wrote:
>
> On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
>
> > Looks lik syncrep will make this a lot worse, because it can drastically
> > increase the window between the TransactionIdCommitTree() and
> > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > least it might make it easier to write tests exercising this scenario...
>
> Agreed
>
> TransactionIdIsKnownCompleted(xid) is only broken because the single
> item cache is set too early in some cases. The single item cache is
> important for performance, so we just need to be more careful about
> setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch
just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


fix_cachedFetchXid.v1.patch
Description: Binary data


just_remove_TransactionIdIsKnownCompleted_call.v1.patch
Description: Binary data


Re: Identify missing publications from publisher while create/alter subscription.

2022-02-11 Thread Ashutosh Sharma
I have spent a little time looking at the latest patch. The patch looks
to be in good shape, as it has already been reviewed by many people
here, although I do have some comments. Please take a look and let me
know your thoughts.


+   /* Try to connect to the publisher. */
+   wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+   if (!wrconn)
+   ereport(ERROR,
+   (errmsg("could not connect to the publisher: %s", err)));

I think it would be good to also include the errcode
(ERRCODE_CONNECTION_FAILURE) here?
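That is, a sketch of the suggested change:

	if (!wrconn)
		ereport(ERROR,
				(errcode(ERRCODE_CONNECTION_FAILURE),
				 errmsg("could not connect to the publisher: %s", err)));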

--

@@ -514,6 +671,8 @@ CreateSubscription(ParseState *pstate,
CreateSubscriptionStmt *stmt,

PG_TRY();
{
+   check_publications(wrconn, publications, opts.validate_publication);
+


Instead of passing the opts.validate_publication argument to the
check_publications function, why can't we first check whether this option
is set and call check_publications accordingly? For other options I see
that it has been done in a similar way, e.g. the checks for opts.connect,
opts.refresh, opts.enabled, etc.

--

Above comment also applies for:

@@ -968,6 +1130,8 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
replaces[Anum_pg_subscription_subpublications - 1] = true;

update_tuple = true;
+   connect_and_check_pubs(sub, stmt->publication,
+  opts.validate_publication);


--

+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publication doesn't exist. The default is
+  false.

publication -> publications (in the 4th line : throw an error if any
of the publication doesn't exist)

This applies for both CREATE and ALTER subscription commands.

--
With Regards,
Ashutosh Sharma.

On Sat, Nov 13, 2021 at 12:50 PM vignesh C  wrote:
>
> On Wed, Nov 10, 2021 at 11:16 AM Bharath Rupireddy
>  wrote:
> >
> > On Tue, Nov 9, 2021 at 9:27 PM vignesh C  wrote:
> > > Attached v12 version is rebased on top of Head.
> >
> > Thanks for the patch. Here are some comments on v12:
> >
> > 1) I think ERRCODE_TOO_MANY_ARGUMENTS isn't the right error code, the
> > ERRCODE_UNDEFINED_OBJECT is more meaningful. Please change.
> > + ereport(ERROR,
> > + (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
> > + errmsg_plural("publication %s does not exist in the publisher",
> > +"publications %s do not exist in the publisher",
> >
> > The existing code using
> > ereport(ERROR,
> > (errcode(ERRCODE_UNDEFINED_OBJECT),
> >  errmsg("subscription \"%s\" does not exist", subname)));
>
> Modified
>
> > 2) Typo: It is "One of the specified publications exists."
> > +# One of the specified publication exist.
>
> Modified
>
> > 3) I think we can remove the test case "+# Specified publication does
> > not exist." because the "+# One of the specified publication exist."
> > covers the code.
>
> Modified
>
> > 4) Do we need the below test case? Even with refresh = false, it does
> > call connect_and_check_pubs() right? Please remove it.
> > +# Specified publication does not exist with refresh = false.
> > +($ret, $stdout, $stderr) = $node_subscriber->psql('postgres',
> > +   "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub
> > WITH (REFRESH = FALSE, VALIDATE_PUBLICATION = TRUE)"
> > +);
> > +ok( $stderr =~
> > + m/ERROR:  publication "non_existent_pub" does not exist in
> > the publisher/,
> > +   "Alter subscription for non existent publication fails");
> > +
>
> Modified
>
> > 5) Change the test case names to different ones instead of the same.
> > Have something like:
> > "Create subscription fails with single non-existent publication");
> > "Create subscription fails with multiple non-existent publications");
> > "Create subscription fails with mutually exclusive options");
> > "Alter subscription add publication fails with non-existent publication");
> > "Alter subscription set publication fails with non-existent publication");
> > "Alter subscription set publication fails with connection to a
> > non-existent database");
>
> Modified
>
> Thanks for the comments, the attached v13 patch has the fixes for the same.
>
> Regards,
> Vignesh




Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-02-11 Thread Dilip Kumar
On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy
 wrote:
>
> On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar  wrote:
> >

> IIUC, the issue can happen when the walreceiver fails to get WAL
> from the primary for whatever reason and its status is not yet
> WALRCV_STOPPING or WALRCV_STOPPED, and the startup process has moved
> ahead in WaitForWALToBecomeAvailable for reading from the archive,
> which ends up in this assertion failure. ISTM a rare scenario, and it
> depends on what the walreceiver does between failing to get WAL from
> the primary and updating its status to WALRCV_STOPPING or WALRCV_STOPPED.
>
> If the above race condition is a serious problem (or if one thinks it
> is a problem at all), it needs to be fixed.

I don't think we can design software with open race conditions, even
if they occur only rarely :)

> I don't think
> just making InstallXLogFileSegmentActive false is enough. By looking
> at the comment [1], it doesn't make sense to move ahead with restoring
> from the archive location without the WAL receiver fully stopped.
> IMO, the real fix is to just remove the WalRcvStreaming() check and call
> XLogShutdownWalRcv() unconditionally. Anyways, we have the
> Assert(!WalRcvStreaming()); down below. I don't think it will create
> any problem.

If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally.  I think
that, just as we set this flag outside of starting streaming, we can
reset it outside of XLogShutdownWalRcv() as well.  Or I am fine even if
we call XLogShutdownWalRcv(), because if the walreceiver is already
stopped it will just reset the flag we want reset and do nothing else.
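
In other words, something like this at the switch-over point (just a
sketch of the idea; it assumes XLogShutdownWalRcv() is cheap and safe
to call when the walreceiver has already stopped):

    /*
     * Always shut down the walreceiver here; if it has already stopped,
     * this only resets XLogCtl->InstallXLogFileSegmentActive.
     */
    XLogShutdownWalRcv();
    Assert(!WalRcvStreaming());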

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-11 Thread Etsuro Fujita
On Tue, Feb 8, 2022 at 3:49 AM Fujii Masao  wrote:
> Here are the review comments for 0001 patch.
>
> I got the following compiler warning.
>
> [16:58:07.120] connection.c: In function ‘pgfdw_finish_pre_commit_cleanup’:
> [16:58:07.120] connection.c:1726:4: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
> [16:58:07.120]  1726 |    PGresult   *res;
> [16:58:07.120]       |    ^~~~

Sorry, I didn’t notice this, because my compiler doesn’t produce it.
I tried to fix it.  Attached is an updated version of the patch set.
I hope this works for you.

> +   /* Ignore errors in the DEALLOCATE (see note above) */
> +   if ((res = PQgetResult(entry->conn)) != NULL)
>
> Doesn't PQgetResult() need to be called repeatedly until it returns NULL or
> the connection is lost, because there can be more than one message to receive?

Yeah, we would receive a single message here, but PQgetResult must be
called repeatedly until it returns NULL (see the documentation note
about it in libpq.sgml); else the PQtransactionStatus of the
connection would remain PQTRANS_ACTIVE, causing the connection to be
closed at transaction end, because we do this in
pgfdw_reset_xact_state called from pgfdw_xact_callback:

/*
 * If the connection isn't in a good idle state, it is marked as
 * invalid or keep_connections option of its server is disabled, then
 * discard it to recover. Next GetConnection will open a new
 * connection.
 */
if (PQstatus(entry->conn) != CONNECTION_OK ||
PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
entry->changing_xact_state ||
entry->invalidated ||
!entry->keep_connections)
{
elog(DEBUG3, "discarding connection %p", entry->conn);
disconnect_pg_server(entry);
}

But I noticed a brown-paper-bag bug in the bit you showed above: the
if test should be turned into a while loop.  :-(  I fixed this in the
attached.
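
For reference, the corrected loop would look roughly like this (a sketch,
keeping the same ignore-errors behavior as before):

    /* Ignore errors in the DEALLOCATE (see note above) */
    while ((res = PQgetResult(entry->conn)) != NULL)
        PQclear(res);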

> +   if (pending_deallocs)
> +   {
> +   foreach(lc, pending_deallocs)
>
> If pending_deallocs is NIL, we don't enter this foreach loop. So probably
> the "if (pending_deallocs)" seems not necessary.

Yeah, I think we could omit the if test, but I added it to match other
places (see e.g., foreign_grouping_ok() in postgres_fdw.c).  It looks
cleaner to me to have it before the loop.

> entry->keep_connections = defGetBoolean(def);
> +   if (strcmp(def->defname, "parallel_commit") == 0)
> +   entry->parallel_commit = defGetBoolean(def);
>
> Isn't it better to use "else if" here, instead?

Yeah, that would be better.  Done.
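
So the option-parsing code now reads roughly like this (a sketch; the
keep_connections branch is assumed from the surrounding code):

    if (strcmp(def->defname, "keep_connections") == 0)
        entry->keep_connections = defGetBoolean(def);
    else if (strcmp(def->defname, "parallel_commit") == 0)
        entry->parallel_commit = defGetBoolean(def);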

> +static void do_sql_command_begin(PGconn *conn, const char *sql);
> +static void do_sql_command_end(PGconn *conn, const char *sql);
>
> To simplify the code more, I'm tempted to change do_sql_command() so that it
> just calls the above two functions, instead of calling PQsendQuery() and
> pgfdw_get_result() directly. Thought? If we do this, probably we also need to
> change do_sql_command_end() so that it accepts a boolean flag which specifies
> whether PQconsumeInput() is called or not, as follows.

Done.  Actually, I was planning to do this for consistency with a
similar refactoring for pgfdw_cancel_query and
pgfdw_exec_cleanup_query that had been done
in the parallel-abort patch.
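
For illustration, the refactored function might look like this (a sketch
based on the suggestion above; the exact flag semantics are an
assumption):

    static void
    do_sql_command(PGconn *conn, const char *sql)
    {
        do_sql_command_begin(conn, sql);
        /* false: no PQconsumeInput() needed before reading the result */
        do_sql_command_end(conn, sql, false);
    }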

I tweaked comments/docs a little bit as well.

Thanks for reviewing!

Best regards,
Etsuro Fujita


v4-0001-postgres-fdw-Add-support-for-parallel-commit.patch
Description: Binary data


v4-0002-postgres_fdw-Minor-cleanup-for-pgfdw_abort_cleanup.patch
Description: Binary data


v4-0003-postgres-fdw-Add-support-for-parallel-abort.patch
Description: Binary data


Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-02-11 Thread Bharath Rupireddy
On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar  wrote:
>
> Hi,
>
> The problem is that whenever we are going for streaming we always set
> XLogCtl->InstallXLogFileSegmentActive to true, but while switching
> from streaming to archive we do not always reset it, so it hits an
> assertion in some cases. Basically, we reset it inside
> XLogShutdownWalRcv(), but while switching from streaming mode we only
> call that conditionally, if WalRcvStreaming().  But it is very much
> possible that even before we call WalRcvStreaming() the walreceiver
> might have set walrcv->walRcvState to WALRCV_STOPPED.  So now
> WalRcvStreaming() will return false.  So I agree that we do not want
> to really shut down the walreceiver, but then who will reset the flag?
>
> I just ran some tests on the primary, attached the walreceiver to gdb,
> and waited for it to exit with a timeout; then the recovery process
> hit the assertion.
>
> 2022-02-11 14:33:56.976 IST [60978] FATAL:  terminating walreceiver
> due to timeout
> cp: cannot stat
> ‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’:
> No such file or directory
> 2022-02-11 14:33:57.002 IST [60973] LOG:  restored log file
> "00010003" from archive
> TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File:
> "xlog.c", Line: 3823, PID: 60973)
>
> I have just applied a quick fix and that solved the issue: basically,
> if the last failed source was streaming and WalRcvStreaming() is
> false, then just reset this flag.

IIUC, the issue can happen when the walreceiver fails to get WAL
from the primary for whatever reason and its status is not yet
WALRCV_STOPPING or WALRCV_STOPPED, and the startup process has moved
ahead in WaitForWALToBecomeAvailable for reading from the archive,
which ends up in this assertion failure. ISTM a rare scenario, and it
depends on what the walreceiver does between failing to get WAL from
the primary and updating its status to WALRCV_STOPPING or WALRCV_STOPPED.

If the above race condition is a serious problem (or if one thinks it
is a problem at all), it needs to be fixed. I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead with restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove the WalRcvStreaming() check and call
XLogShutdownWalRcv() unconditionally. Anyways, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.

[1]
/*
 * Before we leave XLOG_FROM_STREAM state, make sure that
 * walreceiver is not active, so that it won't overwrite
 * WAL that we restore from archive.
 */
if (WalRcvStreaming())
XLogShutdownWalRcv();

Regards,
Bharath Rupireddy.




Re: Database-level collation version tracking

2022-02-11 Thread Julien Rouhaud
On Fri, Feb 11, 2022 at 12:07:02PM +0100, Peter Eisentraut wrote:
> On 10.02.22 12:08, Julien Rouhaud wrote:
> > > +  errhint("Rebuild all objects affected by collation in the template database and run "
> > > +          "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> > > +          "or build PostgreSQL with the right library version.",
> > > +          quote_identifier(dbtemplate))));
> > 
> > After a second read I think the messages are slightly ambiguous.  What do
> > you think about specifying the problematic collation name and provider?
> > 
> > For now we only support libc default collation so users will probably have
> > to reindex almost everything on that database (not sure if the versioning
> > is more fine grained on Windows), but we should probably still specify
> > "affected by libc collation" in the errhint so they have a chance to avoid
> > unnecessary reindex.
> 
> I think accurate would be something like "objects using the default
> collation", since objects using a specific collation are not meant, even if
> they use the same provider.

Technically, if the objects explicitly use the same collation as the default
collation, they should be impacted the same way, but agreed.

> > > +/*
> > > + * ALTER DATABASE name REFRESH COLLATION VERSION
> > > + */
> > > +ObjectAddress
> > > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> > 
> > I'm wondering why you changed this function to return an ObjectAddress
> > rather than an Oid?  There's no event trigger support for ALTER DATABASE,
> > and the rest of the similar utility commands also return Oid.
> 
> Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
> return ObjectAddress.

Apparently I managed to check only AlterDatabase and AlterDatabaseSet, which
both return an Oid.  Maybe we could update those two to return an
ObjectAddress as well, for consistency?
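
For example, something like this (a sketch of the suggested consistency
change, not existing code):

    ObjectAddress
    AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
    {
        ObjectAddress address;

        /* ... existing body, producing dboid ... */

        ObjectAddressSet(address, DatabaseRelationId, dboid);
        return address;
    }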




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-02-11 Thread Matthias van de Meent
Hi,

Today I noticed the inefficiencies of our dead tuple storage once
again and started theorizing about a better storage method, which is
when I remembered that this thread exists, and that it already has
amazing results.

Are there any plans to get the results of this thread from PoC to committable?

Kind regards,

Matthias van de Meent




Re: refactoring basebackup.c

2022-02-11 Thread Dipesh Pandit
> Sure, please find the rebased patch attached.

Thanks, I have validated the v2 patch on top of the rebased patch.

Thanks,
Dipesh


Re: Database-level collation version tracking

2022-02-11 Thread Peter Eisentraut

On 10.02.22 12:08, Julien Rouhaud wrote:

+                errhint("Rebuild all objects affected by collation in the template database and run "
+                        "ALTER DATABASE %s REFRESH COLLATION VERSION, "
+                        "or build PostgreSQL with the right library version.",
+                        quote_identifier(dbtemplate))));


After a second read I think the messages are slightly ambiguous.  What do you
think about specifying the problematic collation name and provider?

For now we only support libc default collation so users will probably have to
reindex almost everything on that database (not sure if the versioning is more
fine grained on Windows), but we should probably still specify "affected by
libc collation" in the errhint so they have a chance to avoid unnecessary
reindex.


I think accurate would be something like "objects using the default 
collation", since objects using a specific collation are not meant, even 
if they use the same provider.



+/*
+ * ALTER DATABASE name REFRESH COLLATION VERSION
+ */
+ObjectAddress
+AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)


I'm wondering why you changed this function to return an ObjectAddress
rather than an Oid?  There's no event trigger support for ALTER DATABASE,
and the rest of the similar utility commands also return Oid.


Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which 
return ObjectAddress.





Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
> Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase.

Sure, please find the rebased patch attached.

Regards,
Jeevan

On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit  wrote:

> Hi,
>
> Thanks for the feedback, I have incorporated the suggestions
> and updated a new patch. PFA v2 patch.
>
> > I think similar to bbstreamer_lz4_compressor_content() in
> > bbstreamer_lz4_decompressor_content() we can change len to avail_in.
>
> In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
> based on the number of bytes decompressed in each iteration. I think
> we cannot replace it with "len" here.
>
> Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase. I have applied it on commit
> 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
> to validate my v2 patch.
>
> Thanks,
> Dipesh
>


v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: [BUG]Update Toast data failure in logical replication

2022-02-11 Thread Amit Kapila
On Fri, Feb 11, 2022 at 12:00 PM tanghy.f...@fujitsu.com
 wrote:
>
> Attached the patches which fixed the above two comments and the first
> comment in my previous mail [1], the rest is the same as before.
> I ran the tests on all branches, they all passed as expected.
>

Thanks, these look good to me. I'll push these early next week
(Monday) unless there are any more suggestions or comments.

-- 
With Regards,
Amit Kapila.




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-11 Thread Andy Fan
>> So I think knowing what is bad about having this feature is the key
>> point to discuss now.

I re-read the discussion from 2015 [1], and the topics below were added
for the above question.  Here is a summary for easy discussion.


From the planner's perspective:

> While I've only read your description of the patch not the patch itself,
> the search methodology you propose seems pretty brute-force and
> unlikely to solve that issue.  It's particularly important to avoid O(N^2)
> behaviors when there are N expressions ...

The patch has 3 steps in general:  1) Gather the filter_qual_list during
deconstruct_jointree; only unmergeable quals are gathered here.
2) After root->eq_classes is built, scan each of the above quals to find
out whether there is a matching EC; if yes, add the qual to the EC.
There are some fast paths here, like ec->relids and em->em_relids.
3) Compose the quals in ec_filter with the members in ec_members, then
distribute them to the relations.  This step takes the most cycles of
this feature, and it is the most important part of the feature as well.

Fortunately, thousands of partitions of a table would not make it worse,
since they are not generated at that stage.  So I'd believe the number
of ECs, or of EMs in an EC, would be pretty small in common cases.

> time would be spent on searches for matching subexpressions whether
> or not anything was learned (and often nothing would be learned).

This is about cases like "SELECT * FROM t1, t2 WHERE t1.a = t2.a AND
t1.b > 3".  In this case we still need to go through steps 1 & 2: none
of the fast paths work, and the equal() calls are unavoidable.  However,
step 3 can be skipped.  If we want to improve this, could we maintain an
attr_eq_indexes in RelOptInfo which indicates whether a given attribute
appears in any EC member?

=============
From the executor's perspective:

> The reason why the answer isn't automatically "all of them"
> is because, first of all, it's possible that enforcing the condition
> at a particular table costs more to filter out the rows than we save
> in execution time at higher levels of the plan tree.  For example,
> consider A JOIN B ON A.X = B.X WHERE A.X > 100.  It might be that
> the range of A.X is [0,101] but the range of B.X is
> [100,200]; so enforcing the inequality against A is very
> selective but enforcing it against B filters out basically nothing.

I think we can classify this as: we push down / execute a qual, the
qual takes lots of cycles, but it doesn't filter out many rows.

> A first cut might be to enforce the inequality against the relation
> where it's believed to be most selective, equivalence-class column
> mentioned in the inequality provided that the
> selectivity is thought to be above some threshold ... but I'm not sure
> this is very principled,

I can only offer a +1 after some deep thought.

>> Furthermore, there are some cases involving parameterized paths where
>> enforcing the inequality multiple times is definitely bad: for
>> example, if we've got a nested loop where the outer side is a seq scan
>> that enforces the condition and the inner side is an index probe, it
>> is just a waste to retest it on the inner side.  We already know that
>> the outer row passes the inequality, so the inner row will necessarily
>> pass also.  This doesn't apply to merge or hash joins, and it also
>> doesn't apply to all nested loops: scans that aren't paramaterized by
>> the equivalence-class column can still benefit from separate
>> enforcement of the inequality.
>>
> I guess that could be fixed by somehow marking these pushed quals as
> optional and having parameterised scans ignore optional quals.

This has been done by patch 4.

> Now, all that having been said, I think this is a pretty important
> optimization.  Lots of people have asked for it, and I think it would
> be worth expending some brainpower to try to figure out a way to be
> smarter than we are now, which is, in a nutshell, as dumb as possible.

+1.  Last year I asked a customer to add the derivable quals manually,
for 10+ tables per query, and they gained great results.

Does anyone still have interest in this?  Or is a better solution really
possible?  Or is the current method too bad to rescue?

-- 
Best Regards
Andy Fan


Assertion failure in WaitForWALToBecomeAvailable state machine

2022-02-11 Thread Dilip Kumar
Hi,

The problem is that whenever we are going for streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but while switching
from streaming to archive we do not always reset it, so it hits an
assertion in some cases. Basically, we reset it inside
XLogShutdownWalRcv(), but while switching from streaming mode we only
call that conditionally, if WalRcvStreaming().  But it is very much
possible that even before we call WalRcvStreaming() the walreceiver
might have set walrcv->walRcvState to WALRCV_STOPPED.  So now
WalRcvStreaming() will return false.  So I agree that we do not want
to really shut down the walreceiver, but then who will reset the flag?

I just ran some tests on the primary, attached the walreceiver to gdb,
and waited for it to exit with a timeout; then the recovery process
hit the assertion.

2022-02-11 14:33:56.976 IST [60978] FATAL:  terminating walreceiver
due to timeout
cp: cannot stat
‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’:
No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG:  restored log file
"00010003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File:
"xlog.c", Line: 3823, PID: 60973)

I have just applied a quick fix and that solved the issue: basically,
if the last failed source was streaming and WalRcvStreaming() is
false, then just reset this flag.

@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 */
 						if (WalRcvStreaming())
 							XLogShutdownWalRcv();
+						else
+						{
+							LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+							XLogCtl->InstallXLogFileSegmentActive = false;
+							LWLockRelease(ControlFileLock);
+						}



--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[PATCH] minor reloption regression tests improvement

2022-02-11 Thread Nikolay Shaplov
I'd like to suggest a patch for reloption regression tests.

This patch tests a case that is rarely met in real life: when reloptions
have some illegal option set (as a result of a malfunction, an extension
downgrade, or something similar), and the user tries to remove this
option using RESET.  The current postgres behaviour is to actually
remove the option.

Like:

UPDATE pg_class SET reloptions = '{illegal_option=4}'
    WHERE oid = 'reloptions_test'::regclass;

ALTER TABLE reloptions_test RESET (illegal_option);

Why this should be tested:
1. It is what postgres actually does now.
2. This behaviour is reasonable: a DB user can fix the problem without
updating pg_class, having only the rights to change his own table.
3. It is better to get a test alarm if this behavior is accidentally
changed.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su
diff --git a/src/test/regress/expected/reloptions.out b/src/test/regress/expected/reloptions.out
index b6aef6f654..9f460a7e60 100644
--- a/src/test/regress/expected/reloptions.out
+++ b/src/test/regress/expected/reloptions.out
@@ -183,6 +183,17 @@ SELECT reloptions FROM pg_class WHERE oid = (
  {autovacuum_vacuum_cost_delay=23}
 (1 row)
 
+-- Can reset option that is not allowed, but for some reason is already set
+UPDATE pg_class
+	SET reloptions = '{fillfactor=13,autovacuum_enabled=false,illegal_option=4}'
+	WHERE oid = 'reloptions_test'::regclass;
+ALTER TABLE reloptions_test RESET (illegal_option);
+SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
+                reloptions                
+------------------------------------------
+ {fillfactor=13,autovacuum_enabled=false}
+(1 row)
+
 --
 -- CREATE INDEX, ALTER INDEX for btrees
 --
diff --git a/src/test/regress/sql/reloptions.sql b/src/test/regress/sql/reloptions.sql
index 4252b0202f..fadce3384d 100644
--- a/src/test/regress/sql/reloptions.sql
+++ b/src/test/regress/sql/reloptions.sql
@@ -105,6 +105,13 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
 SELECT reloptions FROM pg_class WHERE oid = (
 	SELECT reltoastrelid FROM pg_class WHERE oid = 'reloptions_test'::regclass);
 
+-- Can reset option that is not allowed, but for some reason is already set
+UPDATE pg_class
+	SET reloptions = '{fillfactor=13,autovacuum_enabled=false,illegal_option=4}'
+	WHERE oid = 'reloptions_test'::regclass;
+ALTER TABLE reloptions_test RESET (illegal_option);
+SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
+
 --
 -- CREATE INDEX, ALTER INDEX for btrees
 --


Re: pgsql: Add TAP test to automate the equivalent of check_guc

2022-02-11 Thread Christoph Berg
Re: Michael Paquier
> Add TAP test to automate the equivalent of check_guc
> 
> src/backend/utils/misc/check_guc is a script that cross-checks the
> consistency of the GUCs with postgresql.conf.sample, making sure that

Hi,

this test is failing at Debian package compile time:

00:55:07 ok 18 - drop tablespace 2
00:55:07 ok 19 - drop tablespace 3
00:55:07 ok 20 - drop tablespace 4
00:55:07 ok
00:55:07 # Looks like your test exited with 2 before it could output anything.
00:55:09 t/003_check_guc.pl ..
00:55:09 1..3
00:55:09 Dubious, test returned 2 (wstat 512, 0x200)
00:55:09 Failed 3/3 subtests
00:55:09
00:55:09 Test Summary Report
00:55:09 -------------------
00:55:09 t/003_check_guc.pl (Wstat: 512 Tests: 0 Failed: 0)
00:55:09   Non-zero exit status: 2
00:55:09   Parse errors: Bad plan.  You planned 3 tests but ran 0.
00:55:09 Files=3, Tests=62,  6 wallclock secs ( 0.05 usr  0.01 sys +  3.31 cusr  1.35 csys =  4.72 CPU)
00:55:09 Result: FAIL
00:55:09 make[3]: *** [/<>/build/../src/makefiles/pgxs.mk:457: check] Error 1
00:55:09 make[3]: Leaving directory '/<>/build/src/test/modules/test_misc'

### Starting node "main"
# Running: pg_ctl -w -D /srv/projects/postgresql/pg/master/build/src/test/modules/test_misc/tmp_check/t_003_check_guc_main_data/pgdata -l /srv/projects/postgresql/pg/master/build/src/test/modules/test_misc/tmp_check/log/003_check_guc_main.log -o --cluster-name=main start
waiting for server to start done
server started
# Postmaster PID for node "main" is 1432398
Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such file or directory at t/003_check_guc.pl line 47.
### Stopping node "main" using mode immediate


So it's trying to read from /usr/share/postgresql which doesn't exist
yet at build time.

The relevant part of the test is this:

# Find the location of postgresql.conf.sample, based on the information
# provided by pg_config.
my $sample_file =
  $node->config_data('--sharedir') . '/postgresql.conf.sample';


It never caused any problem in the 12+ years we have been doing this,
but Debian patches pg_config not to be relocatable, since we are
installing into /usr/lib/postgresql/NN and /usr/share/postgresql/NN,
i.e. not a single prefix.

https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/50-per-version-dirs.patch

Christoph




Re: unlogged sequences

2022-02-11 Thread Peter Eisentraut

On 25.06.19 20:37, Andres Freund wrote:

I.e. I think it'd be better if we just added a fork argument to
fill_seq_with_data(), and then do something like

smgrcreate(srel, INIT_FORKNUM, false);
log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
fill_seq_with_data(rel, tuple, INIT_FORKNUM);

and add a FlushBuffer() to the end of fill_seq_with_data() if writing
INIT_FORKNUM. The if (RelationNeedsWAL(rel)) would need an || forkNum ==
INIT_FORKNUM.


Now that logical replication of sequences is nearing completion, I 
figured it would be suitable to dust off this old discussion on unlogged 
sequences, mainly so that sequences attached to unlogged tables can be 
excluded from replication.


Attached is a new patch that incorporates the above suggestions, with 
some slight refactoring.  The only thing I didn't/couldn't do was to 
call FlushBuffer(), since that is not an exported function.  So this 
still calls FlushRelationBuffers(), which was previously not liked. 
Ideas welcome.
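
Concretely, the shape being discussed is roughly this (a sketch with
details assumed, not the actual patch hunk):

    static void
    fill_seq_with_data(Relation rel, HeapTuple tuple, ForkNumber forkNum)
    {
        /* ... */
        if (RelationNeedsWAL(rel) || forkNum == INIT_FORKNUM)
        {
            /* emit the XLOG_SEQ_LOG record as before */
        }
        /* ... */
        /* make sure the init fork reaches disk before commit */
        if (forkNum == INIT_FORKNUM)
            FlushRelationBuffers(rel);
    }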


I have also re-tested the crash reported by Michael Paquier in the old 
discussion and added test cases that catch them.


The rest of the patch is just documentation, DDL support, client 
support, etc.


What is not done yet is support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED.  This is a bit of a problem because:


1. The new behavior is that a serial/identity sequence of a new unlogged 
table is now also unlogged.
2. There is also a new restriction that changing a table to logged is 
not allowed if it is linked to an unlogged sequence.  (This is IMO 
similar to the existing restriction on linking mixed logged/unlogged 
tables via foreign keys.)
3. Thus, currently, you can't create an unlogged table with a 
serial/identity column and then change it to logged.  This is reflected 
in some of the test changes I had to make in alter_table.sql to work 
around this.  These should eventually go away.


Interestingly, there is grammar support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED because there is this:


|   ALTER SEQUENCE qualified_name alter_table_cmds
{
AlterTableStmt *n = makeNode(AlterTableStmt);
n->relation = $3;
n->cmds = $4;
n->objtype = OBJECT_SEQUENCE;
n->missing_ok = false;
$$ = (Node *)n;
}

But it is rejected later in tablecmds.c.  In fact, it appears that this 
piece of grammar is currently useless because there are no 
alter_table_cmds that actually work for sequences.  (This used to be 
different because things like OWNER TO also went through here.)


I tried to make tablecmds.c handle sequences as well, but that became 
messy.  So I'm thinking about making ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED an entirely separate code path and rip out the above 
grammar, but that needs some further pondering.


But all that is a bit of a separate effort, so in the meantime some
review of the changes in and around fill_seq_with_data() would be useful.

From 761d9be68da6557ccd03f06f41e7858380679406 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 11 Feb 2022 09:44:37 +0100
Subject: [PATCH v2] Unlogged sequences

Add support for unlogged sequences.  Unlike for unlogged tables, this
is not a performance feature.  It allows sequences associated with
unlogged tables to be excluded from replication.

An identity/serial sequence is automatically made unlogged when its
owning table is.  (See also discussion in bug #15631.)

TODO:
- ALTER SEQUENCE ... SET LOGGED/UNLOGGED

Discussion: https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com
---
 doc/src/sgml/ref/create_sequence.sgml  | 23 +-
 doc/src/sgml/ref/pg_dump.sgml  |  6 +--
 src/backend/commands/sequence.c| 36 +++
 src/backend/commands/tablecmds.c   | 22 ++
 src/backend/parser/parse_utilcmd.c |  1 +
 src/bin/pg_dump/pg_dump.c  |  4 +-
 src/bin/psql/describe.c|  8 +++-
 src/test/recovery/t/014_unlogged_reinit.pl | 51 +++---
 src/test/regress/expected/alter_table.out  | 40 +
 src/test/regress/expected/sequence.out | 12 -
 src/test/regress/sql/alter_table.sql   |  9 ++--
 src/test/regress/sql/sequence.sql  |  8 +++-
 12 files changed, 173 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/ref/create_sequence.sgml 
b/doc/src/sgml/ref/create_sequence.sgml
index 20bdbc002f..a84aa5bf56 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -21,7 +21,7 @@
 
  
 
-CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] name
+CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] name
 [ AS data_type ]
 [ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
@@ -92,6 +92,27 @@ Parameters
 

 
+   
+

Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Simon Riggs
On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:

> Looks like syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.
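
For context, the single-item cache in question lives in transam.c and
looks roughly like this (quoted from memory, so treat the details as
approximate):

    static TransactionId cachedFetchXid = InvalidTransactionId;
    static XidStatus cachedFetchXidStatus;
    static XLogRecPtr cachedCommitLSN;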

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: refactoring basebackup.c

2022-02-11 Thread Dipesh Pandit
Hi,

Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.

> I think similar to bbstreamer_lz4_compressor_content() in
> bbstreamer_lz4_decompressor_content() we can change len to avail_in.

In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.
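
To illustrate (a hand-written sketch of the pattern, not the patch itself;
the buffer names are assumptions): the input cursor and the remaining-input
count both advance as LZ4F_decompress() consumes bytes, so a fixed "len"
would not track the remaining input.

    size_t      avail_in = len;
    const char *next_in = data;

    while (avail_in > 0)
    {
        size_t  out_size = out_buf_len; /* assumed output buffer size */
        size_t  in_size = avail_in;
        size_t  ret;

        ret = LZ4F_decompress(mystreamer->dctx, out_buf, &out_size,
                              next_in, &in_size, NULL);
        if (LZ4F_isError(ret))
            pg_log_error("could not decompress data: %s",
                         LZ4F_getErrorName(ret));

        /* advance past however much input this iteration consumed */
        next_in += in_size;
        avail_in -= in_size;
    }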

Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.

Thanks,
Dipesh
From 47a0ef4348747ffa61eccd7954e00f3cf5fc7222 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4

---
 src/bin/pg_basebackup/Makefile|   1 +
 src/bin/pg_basebackup/bbstreamer.h|   3 +
 src/bin/pg_basebackup/bbstreamer_lz4.c| 431 ++
 src/bin/pg_basebackup/pg_basebackup.c |  32 +-
 src/bin/pg_verifybackup/t/009_extract.pl  |   7 +-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++
 src/tools/msvc/Mkvcbuild.pm   |   1 +
 7 files changed, 580 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
 create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
 	bbstreamer_file.o \
 	bbstreamer_gzip.o \
 	bbstreamer_inject.o \
+	bbstreamer_lz4.o \
 	bbstreamer_tar.o
 
 all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
 			void (*report_output_file) (const char *));
 
 extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 0000000..f0bc226
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,431 @@
+/*-
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+	bbstreamer	base;
+
+	LZ4F_compressionContext_t	cctx;
+	LZ4F_decompressionContext_t	dctx;
+	LZ4F_preferences_t			prefs;
+
+	size_t		bytes_written;
+	bool		header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+			  bbstreamer_member *member,
+			  const char *data, int len,
+			  bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+	.content = bbstreamer_lz4_compressor_content,
+	.finalize = bbstreamer_lz4_compressor_finalize,
+	.free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+bbstreamer_member *member,
+const char *data, int len,
+bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+	.content = bbstreamer_lz4_decompressor_content,
+	.finalize = bbstreamer_lz4_decompressor_finalize,
+	.free = bbstreamer_lz4_decompressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs lz4 compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBLZ4
+	bbstreamer_lz4_frame   *streamer;
+	LZ4F_errorCode_t		ctxError;
+	LZ4F_preferences_t	   *prefs;
+	size_t	compressed_bound;
+
+