Re: Race conditions in 019_replslot_limit.pl

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 12:15:58 -0800, Andres Freund wrote:
> On 2022-02-25 15:07:01 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > Hmm.  Maybe put a couple more debug messages into ReplicationSlotCleanup
> > and/or ReplicationSlotDropPtr?  It doesn't seem very clear where in that
> > sequence it's hanging up.
> 
> Yea, was thinking that as well.
> 
> 
> I'm also wondering whether it's worth adding an assert, or at least a WARNING,
> about no LWLocks being held at the tail end of ShutdownPostgres?  I don't want
> to add an LWLockReleaseAll() yet, before I understand what's actually happening.

Did those things. Sure looks like there's some interaction with
checkpointer. There's a similar sequence in the two failures since the
additional debugging.

2022-02-26 06:35:52.453 CET [6219bc37.8717f:20] DEBUG:  replication slot drop: 
pg_basebackup_553343: removed on-disk
2022-02-26 06:35:52.453 CET [6219bc37.87168:17] DEBUG:  snapshot of 0+0 running 
transaction ids (lsn 0/700060 oldest xid 720 latest complete 719 next xid 720)
2022-02-26 06:35:52.453 CET [6219bc37.87168:18] DEBUG:  begin invalidating 
obsolete replication slots older than 0/70
2022-02-26 06:35:52.453 CET [6219bc37.87168:19] DEBUG:  done invalidating 
obsolete replication slots
2022-02-26 06:35:52.453 CET [6219bc37.87168:20] DEBUG:  attempting to remove 
WAL segments older than log file 0006
2022-02-26 06:35:52.453 CET [6219bc37.87168:21] DEBUG:  removing write-ahead 
log file "00010006"
2022-02-26 06:35:52.453 CET [6219bc37.87168:22] DEBUG:  SlruScanDirectory 
invoking callback on pg_subtrans/
2022-02-26 06:35:52.453 CET [6219bc37.87168:23] LOG:  checkpoint complete: 
wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 
s, sync=0.001 s, total=0.351 s; sync files=0, longest=0.000 s, average=0.000 s; 
distance=1024 kB, estimate=1024 kB
[...]
2022-02-26 06:35:52.970 CET [6219bc37.87139:71] LOG:  received fast shutdown 
request
2022-02-26 06:35:52.970 CET [6219bc37.87139:72] LOG:  aborting any active 
transactions
[...]
2022-02-26 06:35:52.971 CET [6219bc37.87168:24] LOG:  shutting down
2022-02-26 06:35:52.976 CET [6219bc37.8717f:21] DEBUG:  replication slot drop: 
pg_basebackup_553343: done
2022-02-26 06:35:52.976 CET [6219bc37.8717f:22] DEBUG:  temporary replication 
slot cleanup: begin
2022-02-26 06:35:52.976 CET [6219bc37.8717f:23] DEBUG:  temporary replication 
slot cleanup: 0 in use, active_pid: 553385
2022-02-26 06:35:52.976 CET [6219bc37.8717f:24] DEBUG:  temporary replication 
slot cleanup: done
2022-02-26 06:35:52.977 CET [6219bc37.8717f:25] DEBUG:  shmem_exit(0): 7 
on_shmem_exit callbacks to make
2022-02-26 06:35:52.977 CET [6219bc37.8717f:26] DEBUG:  proc_exit(0): 3 
callbacks to make
2022-02-26 06:35:52.977 CET [6219bc37.8717f:27] LOG:  disconnection: session 
time: 0:00:00.996 user=bf database= host=[local]
2022-02-26 06:35:52.977 CET [6219bc37.8717f:28] DEBUG:  exit(0)
2022-02-26 06:35:52.977 CET [6219bc37.8717f:29] DEBUG:  shmem_exit(-1): 0 
before_shmem_exit callbacks to make
2022-02-26 06:35:52.977 CET [6219bc37.8717f:30] DEBUG:  shmem_exit(-1): 0 
on_shmem_exit callbacks to make
2022-02-26 06:35:52.977 CET [6219bc37.8717f:31] DEBUG:  proc_exit(-1): 0 
callbacks to make

So the backend (8717f/553343) is in the middle of ReplicationSlotDrop(), after
removing on-disk data. And then for 500ms nothing happens until checkpointer
wakes up again. As soon as it does, the slot drop continues.

Just before calling ReplicationSlotDrop() we were able to acquire
ReplicationSlotControlLock in share mode. Just after the log message after
which there's the delay, ReplicationSlotControlLock is locked in exclusive
mode.

Too tired now..

Greetings,

Andres Freund




Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-25 Thread Julien Rouhaud
On Sat, Feb 26, 2022 at 03:36:19PM +0900, Michael Paquier wrote:
> On Sat, Feb 26, 2022 at 02:27:15PM +0800, Julien Rouhaud wrote:
> 
> > Note that in order to do so we would need to expose quite a lot more about 
> > hba
> > internals, like tokenize_file() and parse_hba_line(), along with structs
> > HbaToken and TokenizedLine.
> 
> I'd be rather fine with exposing all that in the shape of a clean
> split, renaming some of those structures and/or functions with an
> Hba-like prefix, for consistency.

Of course.  I was thinking of using "auth" for something that's common to pg_hba
and pg_ident (like e.g. TokenizeAuthFile()), and otherwise keeping the current
hba/ident prefix.

Unless someone objects or suggests better naming in the next few days, I will
take care of that.

I would also welcome some opinion on the points I mentioned about 0002.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-25 Thread Michael Paquier
On Sat, Feb 26, 2022 at 02:27:15PM +0800, Julien Rouhaud wrote:
> I'm fine with it.  Assuming that you also meant to move the underlying
> functions that go with it (fill_hba_line and such), that would end up
> removing about 15% of hba.c (after applying 0001, 0002 and 0003).

Cool.  Thanks for the number.

> Note that in order to do so we would need to expose quite a lot more about hba
> internals, like tokenize_file() and parse_hba_line(), along with structs
> HbaToken and TokenizedLine.

I'd be rather fine with exposing all that in the shape of a clean
split, renaming some of those structures and/or functions with an
Hba-like prefix, for consistency.
--
Michael




Re: Commitfest manager for 2022-03

2022-02-25 Thread Julien Rouhaud
On Sat, Feb 26, 2022 at 02:42:33PM +0900, Michael Paquier wrote:
> On Fri, Feb 25, 2022 at 01:58:55PM -0600, David Steele wrote:
> > On 2/25/22 12:39, Greg Stark wrote:
> >> I would like to volunteer.
> 
> > I've been hoping somebody would volunteer, so I'm all in favor of you being
> > CF.
> 
> Greg as CFM would be fine.  It's nice to see someone volunteer.

+1, thanks a lot Greg!

Note that the last CF of a major version is probably even more work than a
"regular" CF, so if other people are interested there's probably room for
multiple CFMs.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-25 Thread Julien Rouhaud
Hi,

On Sat, Feb 26, 2022 at 03:04:43PM +0900, Michael Paquier wrote:
> On Wed, Feb 23, 2022 at 09:44:58AM -0800, Nathan Bossart wrote:
> > On Wed, Feb 23, 2022 at 12:59:59PM +0800, Julien Rouhaud wrote:
> >> 0001 adds a new pg_ident_file_mappings view, which is basically the same as
> >> pg_hba_file_rules view but for mappings.  It's probably already useful, for
> >> instance if you need to tweak some regexp.
> > 
> > This seems reasonable.
> 
> Interesting.  One can note that hba.c is already large, and this makes
> the file larger.  I'd like to think that it would be better to move
> all the code related to the SQL functions for pg_hba.conf and such to
> a new hbafuncs.c under adt/.  Would that make sense?

I'm fine with it.  Assuming that you also meant to move the underlying
functions that go with it (fill_hba_line and such), that would end up
removing about 15% of hba.c (after applying 0001, 0002 and 0003).

Note that in order to do so we would need to expose quite a lot more about hba
internals, like tokenize_file() and parse_hba_line(), along with structs
HbaToken and TokenizedLine.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-25 Thread Michael Paquier
On Wed, Feb 23, 2022 at 09:44:58AM -0800, Nathan Bossart wrote:
> On Wed, Feb 23, 2022 at 12:59:59PM +0800, Julien Rouhaud wrote:
>> 0001 adds a new pg_ident_file_mappings view, which is basically the same as
>> pg_hba_file_rules view but for mappings.  It's probably already useful, for
>> instance if you need to tweak some regexp.
> 
> This seems reasonable.

Interesting.  One can note that hba.c is already large, and this makes
the file larger.  I'd like to think that it would be better to move
all the code related to the SQL functions for pg_hba.conf and such to
a new hbafuncs.c under adt/.  Would that make sense?
--
Michael




Re: Frontend error logging style

2022-02-25 Thread Tom Lane
Michael Paquier  writes:
> Hm.  I am not sure I like this abstraction much, with so many
> macros that understand the log level through their name.  Wouldn't it
> be cleaner to have only one pg_log_hint() and one pg_log_detail() with
> the log level passed as an argument of the macro?

Mmm ... that doesn't sound better to me.  I think it wouldn't be
obvious that pg_log_warning and pg_log_hint are fundamentally
different sorts of things: in the first, "warning" refers to
an error severity level, while in the second, "hint" refers to
a message component.  I'm not wedded to the way I did it in this
patch, but I think we ought to maintain a notational distinction
between those two sorts of concepts.

regards, tom lane




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-25 Thread Michael Paquier
On Fri, Feb 25, 2022 at 10:35:49AM +0100, Gunnar "Nick" Bluth wrote:
> Am 24.02.22 um 14:46 schrieb Daniel Gustafsson:
>> Actually, I think this looks like a saner approach.  Putting a config setting
>> in two place (postgresql.conf and on the commandline for pg_rewind) is a 
>> recipe
>> for them diverging.

FWIW, I have a bad feeling about passing a command down directly through
an option that is itself part of a command, so what's discussed on this
thread is refreshing.

+   } else {
+   snprintf(postgres_cmd, sizeof(postgres_cmd),
+"\"%s\" -D \"%s\" --config_file=\"%s\" -C restore_command",
+postgres_exec_path, datadir_target, config_file);
+   }

Shouldn't this one use appendShellString() on config_file?
--
Michael




Re: Commitfest manager for 2022-03

2022-02-25 Thread Michael Paquier
On Fri, Feb 25, 2022 at 01:58:55PM -0600, David Steele wrote:
> On 2/25/22 12:39, Greg Stark wrote:
>> I would like to volunteer.
> 
> Since I've done the March CF for the last seven years, I thought it would be
> good if I chimed in. I've been agonizing a bit because I have travel planned
> during March and while I *could* do it I don't think I'd be able to do a
> good job.

That's a time-consuming task, and there have been few occasions to
enjoy travelling lately, so have a good time :)

> I've been hoping somebody would volunteer, so I'm all in favor of you being
> CF.

Greg as CFM would be fine.  It's nice to see someone volunteer.
--
Michael




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-25 Thread Michael Paquier
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
> Looks to me like authn_id isn't synchronized to parallel workers right now. So
> the function will return the wrong thing when executed as part of a parallel
> query.

FWIW, I am not completely sure what the use case is for being able to
see the authn of the current session through a trigger.  We expose
that when log_connections is enabled, for audit purposes.  I can also
get behind something more central, so that one can get a full picture of
the authn used by a bunch of sessions, particularly with complex HBA
policies, but this looks rather limited in usability to me.  Perhaps
that's not enough to stand as an objection, though, and the patch is
dead simple.

> I don't think we should add further functions not prefixed with pg_.

Yep.

> Perhaps a few tests for less trivial authn_ids could be worthwhile?
> E.g. certificate DNs.

Yes, src/test/ssl would handle that just fine.  Now, this stuff
already looks after authn results with log_connections=on, so that
feels like a duplicate.
--
Michael




Re: Frontend error logging style

2022-02-25 Thread Michael Paquier
On Fri, Feb 25, 2022 at 12:15:25PM -0500, Tom Lane wrote:
> I feel that the reasonable alternatives are either to drop the FATAL
> log level, or try to make it actually mean something by consistently
> using it for errors that are indeed fatal.  I had a go at doing the
> latter, but eventually concluded that that way madness lies.  It's
> not too hard to use "pg_log_fatal" when there's an exit(1) right
> after it, but there are quite a lot of cases where a subroutine
> reports an error and returns a failure code to its caller, whereupon
> the caller exits.  Either the subroutine has to make an unwarranted
> assumption about what its callers will do, or we need to make an API
> change to allow the subroutine itself to exit(), or we are going to
> present a user experience that is inconsistently different depending
> on internal implementation details.

Nice code cut.

> I conclude that we ought to drop the separate FATAL level and just
> use ERROR instead.

FWIW, I have found the uses of pg_log_error() and pg_log_fatal()
rather confusing in some of the src/bin/ tools, so I am in favor of
removing completely one or the other, and getting rid of FATAL is the
best choice.

> The attached revision does that, standardizes on pg_fatal() as the
> abbreviation for pg_log_error() + exit(1), and invents detail/hint
> features as per previous discussion.

+#define pg_log_warning_hint(...) do { \
+   if (likely(__pg_log_level <= PG_LOG_WARNING)) \
+   pg_log_generic(PG_LOG_WARNING, PG_LOG_HINT, __VA_ARGS__); \
+} while(0)

Hm.  I am not sure I like this abstraction much, with so many
macros that understand the log level through their name.  Wouldn't it
be cleaner to have only one pg_log_hint() and one pg_log_detail() with
the log level passed as an argument of the macro?
--
Michael




Re: should vacuum's first heap pass be read-only?

2022-02-25 Thread Dilip Kumar
On Fri, Feb 25, 2022 at 10:45 PM Peter Geoghegan  wrote:
>
> On Fri, Feb 25, 2022 at 5:06 AM Dilip Kumar  wrote:
> > Based on this discussion, IIUC, we are saying that now we will do the
> > lazy_scan_heap every time like we are doing now.  And we will
> > conditionally skip the index vacuum for all or some of the indexes and
> > then based on how much index vacuum is done we will conditionally do
> > the lazy_vacuum_heap_rel().  Is my understanding correct?
>
> Bear in mind that the cost of lazy_scan_heap is often vastly less than
> the cost of vacuuming all indexes -- and so doing a bit more work
> there than theoretically necessary is not necessarily a problem.
> Especially if you have something like UUID indexes, where there is no
> natural locality. Many tables have 10+ indexes. Even large tables.

Completely agree with that.

> > IMHO, if we are doing the heap scan every time and then we are going
> > to get the same dead items again which we had previously collected in
> > the conveyor belt.  I agree that we will not add them again into the
> > conveyor belt but why do we want to store them in the conveyor belt
> > when we want to redo the whole scanning again?
>
> I don't think we want to, exactly. Maybe it's easier to store
> redundant TIDs than to avoid storing them in the first place (we can
> probably just accept some redundancy). There is a trade-off to be made
> there. I'm not at all sure of what the best trade-off is, though.

Yeah we can think of that.

> > I think (without global indexes) the main advantage of using the
> > conveyor belt is that if we skip the index scan for some of the
> > indexes then we can save the dead item somewhere so that without
> > scanning the heap again we have those dead items to do the index
> > vacuum sometime in future
>
> Global indexes are important in their own right, but ISTM that they
> have similar needs to other things anyway. Having this flexibility is
> even more important with global indexes, but the concepts themselves
> are similar. We want options and maximum flexibility, everywhere.

+1

> > but if you are going to rescan the heap
> > again next time before doing any index vacuuming then why we want to
> > store them anyway.
>
> It all depends, of course. The decision needs to be made using a cost
> model. I suspect it will be necessary to try it out, and see.

Yeah, right.  But I still think that we should also be thinking about
conditionally skipping the first vacuum pass.  I mean, if there are not
many new dead tuples, which we realize before even starting the heap
scan, then why not jump directly to the index vacuuming if some of the
indexes need vacuuming.  But I agree that based on some testing and a
cost model we can decide what is the best way forward.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Fix typo in logicalfuncs.c - :%s/private date/Private data

2022-02-25 Thread Amit Kapila
On Fri, Feb 25, 2022 at 8:39 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> Here's a tiny patch to do $subject.
>

LGTM. I'll push this in some time.

-- 
With Regards,
Amit Kapila.




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-25 Thread Amit Kapila
On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
 wrote:
>
> Since _hash_alloc_buckets() WAL-logs the last page of the
> splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> page of the splitpoint doesn't end up having any tuples added to it
> during the index build and the redo pointer is moved past the WAL for
> this page and then later there is a crash sometime before this page
> makes it to permanent storage. Does it matter that this page is lost? If
> not, then why bother WAL-logging it?
>

I think we don't care if the page is lost before we update the
meta-page in the caller, because we will try to reallocate in that
case.  But we do care after the meta-page update (which carries the
updated information about this extension via different masks), in which
case we won't lose this last page, because the sync request for it
would have been registered via smgrextend before the meta-page update.

-- 
With Regards,
Amit Kapila.




Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?

2022-02-25 Thread Michael Paquier
On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote:
> This one has been quiet for a while.  Should we mark it as
> returned-with-feedback?

Yes, that's my feeling, and I got cold feet about this change.  This
patch would bring some extra visibility for something that's not
incorrect on HEAD either, as end-of-recovery checkpoints are the same
thing as shutdown checkpoints.  And there is the extra argument that
back-patching would become a bit more tricky in an area that's already
quite sensitive.
--
Michael




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Amit Kapila
On Fri, Feb 25, 2022 at 7:26 AM Peter Smith  wrote:
>
> Below are my review comments for the v3 patch.
>
...
> 2. doc/src/sgml/monitoring.sgml
>
> +  
> pg_stat_subscription_activitypg_stat_subscription_activity
> +  One row per subscription, showing statistics about subscription
> +  activity.
> +  See 
> +  pg_stat_subscription_activity
> for details.
>
>   
>
> Currently these stats are only about errors. These are not really
> statistics about "activity" though. Probably it is better just to
> avoid that word altogether?
>
> SUGGESTIONS
>
> e.g.1. "One row per subscription, showing statistics about
> subscription activity." --> "One row per subscription, showing
> statistics about errors."
>

I preferred this one, and made another change you suggested in the
latest version I posted.  Thanks!

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Amit Kapila
On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada  wrote:
>

I have reviewed the latest version and made a few changes along with
fixing some of the pending comments by Peter Smith. The changes are as
follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
that is not required now; (b) changed the struct name
PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
similar to DropDb; (c) changed the view name to
pg_stat_subscription_stats, we can reconsider it in future if there is
a consensus on some other name, accordingly changed the reset function
name to pg_stat_reset_subscription_stats; (d) moved some of the newly
added subscription stats functions adjacent to the slot ones, to maintain
consistency in the code; (e) changed comments at a few places; (f) added
LATERAL back to the system_views query as we refer to pg_subscription's oid
in the function call, which was previously not clear.

Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v4-0001-Reconsider-pg_stat_subscription_workers-view.patch
Description: Binary data


Re: Adding CI to our tree

2022-02-25 Thread Justin Pryzby
This is the other half of my CI patches, which are unrelated to the TAP ones on
the other thread.
From 88c01c09ee26db2817629265fc12b2dbcd8c9a91 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9..1b7c36283e 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -479,6 +485,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_os_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
   # Test that code can be built with gcc/clang without warnings
   ###
-- 
2.17.1

From 7a80dfccb17f454849679ebe9abc89bd0901828a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Jan 2022 11:27:28 -0600
Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

Always run doc build; to allow them to be shown in cfbot, they should not be
skipped if the linux build fails.

This could be done on the client side (cfbot).  One advantage of doing it here
is that fewer docs are uploaded - many patches won't upload docs at all.

XXX: if this is run in the same task, the configure flags should probably be
consistent ?

https://cirrus-ci.com/task/5396696388599808

ci-os-only: linux
---
 .cirrus.yml | 46 ++
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 1b7c36283e..f21b249b6f 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -450,10 +450,6 @@ task:
 task:
   name: CompilerWarnings
 
-  # To limit unnecessary work only run this once the normal linux test succeeds
-  depends_on:
-- Linux - Debian Bullseye
-
   env:
 CPUS: 4
 BUILD_JOBS: 4
@@ -482,6 +478,13 @@ task:
 clang -v
 export
 
+git remote -v
+git branch -a
+git remote add postgres https://github.com/postgres/postgres
+time git fetch -v postgres master
+git log -1 postgres/master
+git diff --name-only postgres/master..
+
   ccache_cache:
 folder: $CCACHE_DIR
 
@@ -557,16 +560,35 @@ task:
   ###
   # Verify docs can be built
   ###
-  # XXX: Only do this if there have been changes in doc/ since last build
   always:
 docs_build_script: |
-  time ./configure \
---cache gcc.cache \
-CC="ccache gcc" \
-CXX="ccache g++" \
-CLANG="ccache clang"
-  make -s -j${BUILD_JOBS} clean
-  time make -s -j${BUILD_JOBS} -C doc
+  # Do nothing if the patch doesn't update doc:
+  git diff --name-only --cherry-pick --exit-code postgres/master... doc && exit
+
+  # Exercise HTML and other docs:
+  ./configure >/dev/null
+  make -s -C doc
+  cp -r doc new-docs
+
+  # Build HTML docs from the parent commit
+  git checkout postgres/master -- doc
+  make -s -C doc clean
+  make -s -C doc html
+  cp -r doc old-docs
+
+  # Copy HTML which differ into html_docs
+  # This will show any files which differ from files generated from postgres/master.
+  # Commits to postgres/master which affect doc/ will cause more files to be
+  # included until the patch is rebased.
+  mkdir html_docs
+  cp new-docs/src/sgml/html/*.css new-docs/src/sgml/html/*.svg html_docs/
+  for f in 

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Peter Geoghegan
On Fri, Feb 25, 2022 at 3:26 PM Andres Freund  wrote:
> freeze_required_limit, freeze_desired_limit? Or s/limit/cutoff/? Or
> s/limit/below/? I kind of like below because that answers < vs <= which I find
> hard to remember around freezing.

I like freeze_required_limit the most.

> That may be true, but I think working more incrementally is better in this
> area. I'd rather have a smaller improvement for a release, collect some data,
> get another improvement in the next, than see a bunch of reports of large
> wins and large regressions.

I agree.

There is an important practical way in which it makes sense to treat
0001 as separate to 0002. It is true that 0001 is independently quite
useful. In practical terms, I'd be quite happy to just get 0001 into
Postgres 15, without 0002. I think that that's what you meant here, in
concrete terms, and we can agree on that now.

However, it is *also* true that there is an important practical sense
in which they *are* related. I don't want to ignore that either -- it
does matter. Most of the value to be had here comes from the synergy
between 0001 and 0002 -- or what I've been calling a "virtuous cycle",
the thing that makes it possible to advance relfrozenxid/relminmxid in
almost every VACUUM. Having both 0001 and 0002 together (or something
along the same lines) is way more valuable than having just one.

Perhaps we can even agree on this second point. I am encouraged by the
fact that you at least recognize the general validity of the key ideas
from 0002. If I am going to commit 0001 (and not 0002) ahead of
feature freeze for 15, I better be pretty sure that I have at least
roughly the right idea with 0002, too -- since that's the direction
that 0001 is going in. It almost seems dishonest to pretend that I
wasn't thinking of 0002 when I wrote 0001.

I'm glad that you seem to agree that this business of accumulating
freezing debt without any natural limit is just not okay. That is
really fundamental to me. I mean, vacuum_freeze_min_age kind of
doesn't work as designed. This is a huge problem for us.

> > Under these conditions, we will have many more opportunities to
> > advance relminmxid for most of the tables (including the larger
> > tables) all the way up to current-oldestMxact with the patch series.
> > Without needing to freeze *any* MultiXacts early (just freezing some
> > XIDs early) to get that benefit. The patch series is not just about
> > spreading the burden of freezing, so that non-aggressive VACUUMs
> > freeze more -- it's also about making relfrozenxid and relminmxid more
> > recent and therefore *reliable* indicators of which tables *really*
> > have wraparound problems.
>
> My concern was explicitly about the case where we have to create new
> multixacts...

It was a mistake on my part to counter your point about that with this
other point about eager relminmxid advancement. As I said in the last
email, while that is very valuable, it's not something that needs to
be brought into this.

> > Does that make sense to you?
>
> Yes.

Okay, great. The fact that you recognize the value in that comes as a relief.

> > You mean to change the signature of heap_tuple_needs_freeze, so it
> > doesn't return a bool anymore? It just has two bool pointers as
> > arguments, can_freeze and need_freeze?
>
> Something like that. Or return true if there's anything to do, and then rely
> on can_freeze and need_freeze for finer details. But it doesn't matter that 
> much.

Got it.

> > The problem that all of these heuristics have is that they will tend
> > to make it impossible for future non-aggressive VACUUMs to be able to
> > advance relfrozenxid. All that it takes is one single all-visible page
> > to make that impossible. As I said upthread, I think that being able
> > to advance relfrozenxid (and especially relminmxid) by *some* amount
> > in every VACUUM has non-obvious value.
>
> I think that's a laudable goal. But I don't think we should go there unless we
> are quite confident we've mitigated the potential downsides.

True. But that works both ways. We also shouldn't err in the direction
of adding these kinds of heuristics (which have real downsides) until
the idea of mostly swallowing the cost of freezing whole pages (while
making it possible to disable) has lost, fairly. Overall, it looks
like the cost is acceptable in most cases.

I think that users will find it very reassuring to regularly and
reliably see confirmation that wraparound is being kept at bay, by
every VACUUM operation, with details that they can relate to their
workload. That has real value IMV -- even when it's theoretically
unnecessary for us to be so eager with advancing relfrozenxid.

I really don't like the idea of falling behind on freezing
systematically. You always run the "risk" of freezing being wasted.
But that way of looking at it can be penny wise, pound foolish --
maybe we should just accept that trying to predict what will happen in
the future (whether or not freezing will be 

Re: convert libpq uri-regress tests to tap test

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 09:56:47 -0800, Andres Freund wrote:
> On 2022-02-24 08:46:23 -0800, Andres Freund wrote:
> > I'm mildly inclined to only do 0001 and 0002 for now. We'd not lose msvc
> > coverage, because it already doesn't build the test. Once we've ironed that
> > stuff out, we could do 0003?
> 
> From what I can see in the buildfarm client, we'd not lose (nor gain) any
> buildfarm coverage either. It doesn't run the test today.

Attached are rebased patches. I polished 0001, the regress.pl -> 001_uri.pl
conversion some more (although some of perltidy's changes aren't clearly an
improvement).

I'd like to commit 0001 and 0002 soon, unless somebody sees a reason not to?

Greetings,

Andres Freund
From 87017e4b0a78c486eca24aab96f32e1168c82b93 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 23 Feb 2022 12:22:56 -0800
Subject: [PATCH v3 1/3] Convert src/interfaces/libpq/test to a tap test.

The old form of the test needed a bunch of custom infrastructure. These days
tap tests provide the necessary infrastructure to do better.

We discussed whether to move this test to src/test/modules, alongside
libpq_pipeline, but concluded that the opposite direction would be
better. libpq_pipeline will be moved at a later date, once the buildfarm and
msvc build infrastructure is ready for it.

The invocation of the tap test will be added in the next commit. It involves
just enough buildsystem changes to be worth committing separately. Can't happen
the other way round because prove errors out when invoked without tests.

Discussion: https://postgr.es/m/20220223203031.ezrd73ohvjgfk...@alap3.anarazel.de
---
 src/interfaces/libpq/t/001_uri.pl  | 244 +
 src/interfaces/libpq/test/.gitignore   |   2 -
 src/interfaces/libpq/test/Makefile |   7 +-
 src/interfaces/libpq/test/README   |   7 -
 src/interfaces/libpq/test/expected.out | 171 -
 src/interfaces/libpq/test/regress.in   |  57 --
 src/interfaces/libpq/test/regress.pl   |  65 ---
 7 files changed, 246 insertions(+), 307 deletions(-)
 create mode 100644 src/interfaces/libpq/t/001_uri.pl
 delete mode 100644 src/interfaces/libpq/test/README
 delete mode 100644 src/interfaces/libpq/test/expected.out
 delete mode 100644 src/interfaces/libpq/test/regress.in
 delete mode 100644 src/interfaces/libpq/test/regress.pl

diff --git a/src/interfaces/libpq/t/001_uri.pl b/src/interfaces/libpq/t/001_uri.pl
new file mode 100644
index 000..90f370f8fd6
--- /dev/null
+++ b/src/interfaces/libpq/t/001_uri.pl
@@ -0,0 +1,244 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Utils;
+use Test::More;
+use IPC::Run;
+
+
+# List of URIs tests. For each test the first element is the input string, the
+# second the expected stdout and the third the expected stderr.
+my @tests = (
+	[
+		q{postgresql://uri-user:secret@host:12345/db},
+		q{user='uri-user' password='secret' dbname='db' host='host' port='12345' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://uri-user@host:12345/db},
+		q{user='uri-user' dbname='db' host='host' port='12345' (inet)}, q{},
+	],
+	[
+		q{postgresql://uri-user@host/db},
+		q{user='uri-user' dbname='db' host='host' (inet)}, q{},
+	],
+	[
+		q{postgresql://host:12345/db},
+		q{dbname='db' host='host' port='12345' (inet)}, q{},
+	],
+	[ q{postgresql://host/db}, q{dbname='db' host='host' (inet)}, q{}, ],
+	[
+		q{postgresql://uri-user@host:12345/},
+		q{user='uri-user' host='host' port='12345' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://uri-user@host/},
+		q{user='uri-user' host='host' (inet)},
+		q{},
+	],
+	[ q{postgresql://uri-user@},   q{user='uri-user' (local)}, q{}, ],
+	[ q{postgresql://host:12345/}, q{host='host' port='12345' (inet)}, q{}, ],
+	[ q{postgresql://host:12345},  q{host='host' port='12345' (inet)}, q{}, ],
+	[ q{postgresql://host/db}, q{dbname='db' host='host' (inet)},  q{}, ],
+	[ q{postgresql://host/},   q{host='host' (inet)},  q{}, ],
+	[ q{postgresql://host},q{host='host' (inet)},  q{}, ],
+	[ q{postgresql://},q{(local)}, q{}, ],
+	[
+		q{postgresql://?hostaddr=127.0.0.1}, q{hostaddr='127.0.0.1' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://example.com?hostaddr=63.1.2.4},
+		q{host='example.com' hostaddr='63.1.2.4' (inet)},
+		q{},
+	],
+	[ q{postgresql://%68ost/}, q{host='host' (inet)}, q{}, ],
+	[
+		q{postgresql://host/db?user=uri-user},
+		q{user='uri-user' dbname='db' host='host' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://host/db?user=uri-user&port=12345},
+		q{user='uri-user' dbname='db' host='host' port='12345' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://host/db?u%73er=someotheruser&port=12345},
+		q{user='someotheruser' dbname='db' host='host' port='12345' (inet)},
+		q{},
+	],
+	[
+		q{postgresql://host/db?u%7aer=someotheruser&port=12345}, q{},
+		q{uri-regress: invalid URI query parameter: "uzer"},
+	],
+	[
+		

Re: set TESTDIR from perl rather than Makefile

2022-02-25 Thread Justin Pryzby
On Sun, Feb 20, 2022 at 04:39:08PM -0800, Andres Freund wrote:
> On 2022-02-19 17:53:09 -0600, Justin Pryzby wrote:
> > I also meant to also attach it.
> 
> Is the patch actually independent of the other patches in your stack?

Yes - I rearranged it that way for this thread.

However, it's best served/combined with the alltaptests patch :)

> > -   $expected = slurp_file_eval("traces/$testname.trace");
> > +   my $inputdir = "$ENV{'TESTDIR'}/tmp_check";
> > +   $expected = slurp_file_eval("$inputdir/traces/$testname.trace");
> 
> Why is this needed? Shouldn't we end up in exactly the same dir with/without
> this patch?

Right - I'd incorrectly set test_dir to t/ rather than the parent dir.

> > +++ b/src/tools/msvc/vcregress.pl
> > @@ -261,10 +261,8 @@ sub tap_check
> > $ENV{PG_REGRESS}= "$topdir/$Config/pg_regress/pg_regress";
> > $ENV{REGRESS_SHLIB} = "$topdir/src/test/regress/regress.dll";
> >  
> > -   $ENV{TESTDIR} = "$dir";
> > my $module = basename $dir;
> > -   # add the module build dir as the second element in the PATH
> > -   $ENV{PATH} =~ s!;!;$topdir/$Config/$module;!;
> > +   #$ENV{VCREGRESS_MODE} = $Config;
> 
> Hm. How does the module build dir end up on PATH after this?

That patch only dealt with TESTDIR.  PATH was still set by makefiles.
For MSVC, it was being set to top_builddir/tmp_install.

I've added a 2nd patch to also set PATH.  Maybe it should also set
PATH=$bindir:$PATH - I'm not sure.

https://github.com/justinpryzby/postgres/runs/5340884168
From 9268dc1f10eed605d584ade20e263a6c32439e2e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 19 Feb 2022 13:06:52 -0600
Subject: [PATCH 1/7] wip: set TESTDIR from src/test/perl rather than
 Makefile/vcregress

TODO: PATH

These seem most likely to break:

make check -C src/bin/psql
make check -C src/bin/pgbench
make check -C src/test/modules/test_misc
make check -C src/test/modules/libpq_pipeline
PROVE_TESTS=t/027_stream_regress.pl make check -C src/test/recovery
---
 src/Makefile.global.in |  9 ++---
 src/test/perl/PostgreSQL/Test/Utils.pm | 16 +---
 src/tools/msvc/vcregress.pl|  1 -
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index c980444233..92649d0193 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -451,8 +451,9 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \
+   PG_SUBDIR='$(CURDIR)' \
PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
@@ -461,8 +462,9 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
+   PG_SUBDIR='$(CURDIR)' \
PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
@@ -472,7 +474,8 @@ define prove_check
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
+   $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
+   PG_SUBDIR='$(CURDIR)' \
PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 46cd746796..0e2abb6c9e 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -184,19 +184,21 @@ INIT
 	# test may still fail, but it's more likely to report useful facts.
 	$SIG{PIPE} = 'IGNORE';
 
-	# Determine output directories, and create them.  The base path is the
-	# TESTDIR environment variable, which is normally set by the invoking
-	# Makefile.
-	$tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check";
+	my $test_dir = File::Spec->rel2abs($ENV{PG_SUBDIR} || dirname(dirname($0)));
+	my $test_name = basename($0);
+	$test_name =~ s/\.[^.]+$//;
+
+	$ENV{TESTDIR} = $test_dir;
+
+	# Determine output directories, and create them.
+	$tmp_check = "$test_dir/tmp_check";
 	$log_path = "$tmp_check/log";
 
 	mkdir $tmp_check;
 	mkdir $log_path;
 
 	# Open the test log file, whose name depends on the test name.
-	$test_logfile = basename($0);
-	$test_logfile =~ s/\.[^.]+$//;
-	$test_logfile = "$log_path/regress_log_$test_logfile";
+	$test_logfile = "$log_path/regress_log_$test_name";
 	open my $testlog, '>', $test_logfile
 	 

Re: Mingw task for Cirrus CI

2022-02-25 Thread Andres Freund
Hi,

Andrew, CCIng you both because you might be interested in the CI bit, and
because you might know the answer.

On 2022-02-25 19:44:27 +0300, Melih Mutlu wrote:
> I've been working on adding Windows+MinGW environment into cirrus-ci tasks
> (discussion about ci is here [1]).
> It uses MSYS2 to set the environment. UCRT is chosen as C standard library,
> instead of MSVCRT.
> The task configures postgres with features that are available in MSYS2 (see
> available packages [2]) and tap tests are enabled.
> I already added the necessary docker image, you can find the related PR at
> [3] and a successful cirrus-ci run with these changes at [4].
> Attached patch adds a task that runs on Windows with MinGW for cirrus-ci
>
> However, I cannot run configure with --with-python, --with-perl or
> --with-tcl.
> There are two issues I encountered while trying to enable.  e.g. for
> --with-python
>
> 1-  python_ldlibrary is set to "libpython3.9.dll.a". So the related line in
> configure cannot retrieve "ldlibrary"

This presumably is due to using mingw's python rather than python.org's
python. Seems like a reasonable thing to support for the mingw build.

Melih, you could try to build against the python.org python (I installed it in
the CI container).


> 2-  src/pl/plpython/Makefile looks under "C:/Windows/system32/" for
> PYTHONDLL. gendef cannot open that file, give an error like " failed to
> open()" when creating a .def file.

On my win10 VM in which I installed python.org python I don't see a python dll
in c:/windows/system32 either. Looks like none of our windows mingw animals
build with python. So it might just be bitrot.


> In the end, make check-world still fails, even though I was able to run
> configure and make without any obvious error.
> I see bunch of errors in tests like:
> +ERROR:  language "plpython3u" does not exist
> +HINT:  Use CREATE EXTENSION to load the language into the database.

> Here is the logs from failed ci run:
> https://api.cirrus-ci.com/v1/artifact/task/4645682031099904/log/build/src/pl/plpython/regression.diffs

The URL to the rest of the CI run is https://cirrus-ci.com/task/4645682031099904


The relevant failure is earlier:

+ERROR:  could not load library 
"C:/cirrus/build/tmp_install/ucrt64/lib/postgresql/plpython3.dll": The 
specified module could not be found.

Clearly plpython is being built, because we see some warnings:

[22:44:24.456] C:/msys64/ucrt64/include/python3.9/pyconfig.h:1474: warning: 
"SIZEOF_OFF_T" redefined
[22:44:24.456]  1474 | #define SIZEOF_OFF_T 8
[22:44:24.456]   |
[22:44:24.456] In file included from c:/cirrus/src/include/c.h:54,
[22:44:24.456]  from c:/cirrus/src/include/postgres.h:46,
[22:44:24.456]  from 
c:/cirrus/contrib/jsonb_plpython/jsonb_plpython.c:1:
[22:44:24.456] ../../src/include/pg_config.h:875: note: this is the location of 
the previous definition
[22:44:24.456]   875 | #define SIZEOF_OFF_T 4

Seems we're doing something wrong and end up with a 4 byte off_t, whereas
python ends up with an 8-byte one. We probably need to fix that. But it's not
the cause of this problem.


You could take out -s from the make flags and see whether plpython3.dll is
being built and installed, and where to.


Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Peter Geoghegan
On Fri, Feb 25, 2022 at 3:48 PM Andres Freund  wrote:
> I don't see why it matters that OldestXmin and OldestMxact are computed at the
> same time?  It's a question of the workload, not vacuum algorithm.

I think it's both.

> OldestMxact inherently lags OldestXmin. OldestMxact can only advance after all
> members are older than OldestXmin (not quite true, but that's the bound), and
> they always have more than one member.
>
>
> > How many of these "skewed MultiXacts" can we really expect?
>
> I don't think they're skewed in any way. It's a fundamental aspect of
> multixacts.

Having this happen to some degree is fundamental to MultiXacts, sure.
But also seems like the approach of using FreezeLimit and
MultiXactCutoff in the way that we do right now seems like it might
make the problem a lot worse. Because they're completely meaningless
cutoffs. They are magic numbers that have no relationship whatsoever
to each other.

There are problems with assuming that OldestXmin and OldestMxact
"align" -- no question. But at least it's approximately true -- which
is a start. They are at least not arbitrarily, unpredictably
different, like FreezeLimit and MultiXactCutoff are, and always will
be. I think that that's a meaningful and useful distinction.

I am okay with making the most pessimistic possible assumptions about
how any changes to how we freeze might cause FreezeMultiXactId() to
allocate more MultiXacts than before. And I accept that the patch
series shouldn't "get credit" for "offsetting" any problem like that
by making relminmxid advancement occur much more frequently (even
though that does seem very valuable). All I'm really saying is this:
in general, there are probably quite a few opportunities for
FreezeMultiXactId() to avoid allocating new XMIDs (just to freeze
XIDs) by having the full context. And maybe by making the dialog
between lazy_scan_prune and heap_prepare_freeze_tuple a bit more
nuanced.

-- 
Peter Geoghegan




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 15:28:17 -0800, Peter Geoghegan wrote:
> But why should we ever get to the FreezeMultiXactId() loop with the
> stuff from 0002 in place? The whole purpose of the loop is to handle
> cases where we have to remove *some* (not all) XIDs from before
> cutoff_xid that appear in a MultiXact, which requires careful checking
> of each XID (this is only possible when the MultiXactId is <
> cutoff_multi to begin with, which is OldestMxact in the patch, which
> is presumably very recent).
> 
> It's not impossible that we'll get some number of "skewed MultiXacts"
> with the patch -- cases that really do necessitate allocating a new
> MultiXact, just to "freeze some XIDs from a MultiXact". That is, there
> will sometimes be some number of XIDs that are < OldestXmin, but
> nevertheless appear in some MultiXactIds >= OldestMxact. This seems
> likely to be rare with the patch, though, since VACUUM calculates its
> OldestXmin and OldestMxact (which are what cutoff_xid and cutoff_multi
> really are in the patch) at the same point in time. Which was the
> point I made in my email yesterday.

I don't see why it matters that OldestXmin and OldestMxact are computed at the
same time?  It's a question of the workload, not vacuum algorithm.

OldestMxact inherently lags OldestXmin. OldestMxact can only advance after all
members are older than OldestXmin (not quite true, but that's the bound), and
they always have more than one member.


> How many of these "skewed MultiXacts" can we really expect?

I don't think they're skewed in any way. It's a fundamental aspect of
multixacts.

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Peter Geoghegan
On Fri, Feb 25, 2022 at 2:00 PM Peter Geoghegan  wrote:
> > Hm. I guess I'll have to look at the code for it. It doesn't immediately
> > "feel" quite right.
>
> I kinda think it might be. Please let me know if you see a problem
> with what I've said.

Oh, wait. I have a better idea of what you meant now. The loop towards
the end of FreezeMultiXactId() will indeed "Determine whether to keep
this member or ignore it." when we need a new MultiXactId. The loop is
exact in the sense that it will only include those XIDs that are truly
needed -- those that are still running.

But why should we ever get to the FreezeMultiXactId() loop with the
stuff from 0002 in place? The whole purpose of the loop is to handle
cases where we have to remove *some* (not all) XIDs from before
cutoff_xid that appear in a MultiXact, which requires careful checking
of each XID (this is only possible when the MultiXactId is <
cutoff_multi to begin with, which is OldestMxact in the patch, which
is presumably very recent).

It's not impossible that we'll get some number of "skewed MultiXacts"
with the patch -- cases that really do necessitate allocating a new
MultiXact, just to "freeze some XIDs from a MultiXact". That is, there
will sometimes be some number of XIDs that are < OldestXmin, but
nevertheless appear in some MultiXactIds >= OldestMxact. This seems
likely to be rare with the patch, though, since VACUUM calculates its
OldestXmin and OldestMxact (which are what cutoff_xid and cutoff_multi
really are in the patch) at the same point in time. Which was the
point I made in my email yesterday.

How many of these "skewed MultiXacts" can we really expect? Seems like
there might be very few in practice. But I'm really not sure about
that.

-- 
Peter Geoghegan




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 14:00:12 -0800, Peter Geoghegan wrote:
> On Thu, Feb 24, 2022 at 11:14 PM Andres Freund  wrote:
> > I am not a fan of the backstop terminology. It's still the reason we need to
> > do freezing for correctness reasons.
> 
> Thanks for the review!
> 
> I'm not wedded to that particular terminology, but I think that we
> need something like it. Open to suggestions.
>
> How about limit-based? Something like that?

freeze_required_limit, freeze_desired_limit? Or s/limit/cutoff/? Or
s/limit/below/? I kind of like below because that answers < vs <= which I find
hard to remember around freezing.


> > I'm a tad concerned about replacing mxids that have some members that are
> > older than OldestXmin but not older than FreezeLimit. It's not too hard to
> > imagine that accelerating mxid consumption considerably.  But we can 
> > probably,
> > if not already done, special case that.
> 
> Let's assume for a moment that this is a real problem. I'm not sure if
> it is or not myself (it's complicated), but let's say that it is. The
> problem may be more than offset by the positive impact on relminxmid
> advancement. I have placed a large emphasis on enabling
> relfrozenxid/relminxmid advancement in every non-aggressive VACUUM,
> for a number of reasons -- this is one of the reasons. Finding a way
> for every VACUUM operation to be "vacrel->scanned_pages +
> vacrel->frozenskipped_pages == orig_rel_pages" (i.e. making *some*
> amount of relfrozenxid/relminxmid advancement possible in every
> VACUUM) has a great deal of value.

That may be true, but I think working more incrementally is better in this
area. I'd rather have a smaller improvement for a release, collect some data,
and get another improvement in the next, than see a bunch of reports of large
wins and large regressions.


> As I said recently on the "do only critical work during single-user
> vacuum?" thread, why should the largest tables in databases that
> consume too many MXIDs do so evenly, across all their tables? There
> are usually one or two large tables, and many more smaller tables. I
> think it's much more likely that the largest tables consume
> approximately zero MultiXactIds in these databases -- actual
> MultiXactId consumption is probably concentrated in just one or two
> smaller tables (even when we burn through MultiXacts very quickly).
> But we don't recognize these kinds of distinctions at all right now.

Recognizing those distinctions seems independent of freezing multixacts with
live members. I am happy with freezing them more aggressively if they don't
have live members. It's freezing mxids with live members that has me
concerned.  The limits you're proposing are quite aggressive and can advance
quickly.

I've seen large tables with plenty multixacts. Typically concentrated over a
value range (often changing over time).


> Under these conditions, we will have many more opportunities to
> advance relminmxid for most of the tables (including the larger
> tables) all the way up to current-oldestMxact with the patch series.
> Without needing to freeze *any* MultiXacts early (just freezing some
> XIDs early) to get that benefit. The patch series is not just about
> spreading the burden of freezing, so that non-aggressive VACUUMs
> freeze more -- it's also making relfrozenxid and relminmxid more
> recent and therefore *reliable* indicators of which tables any
> wraparound problems *really* are in.

My concern was explicitly about the case where we have to create new
multixacts...


> Does that make sense to you?

Yes.


> > > On HEAD, FreezeMultiXactId() doesn't get passed down the VACUUM 
> > > operation's
> > > OldestXmin at all (it actually just gets FreezeLimit passed as its
> > > cutoff_xid argument). It cannot possibly recognize any of this for itself.
> >
> > It does recognize something like OldestXmin in a more precise and expensive
> > way - MultiXactIdIsRunning() and TransactionIdIsCurrentTransactionId().
> 
> It doesn't look that way to me.
> 
> While it's true that FreezeMultiXactId() will call MultiXactIdIsRunning(),
> that's only a cross-check.

> This cross-check is made at a point where we've already determined that the
> MultiXact in question is < cutoff_multi. In other words, it catches cases
> where a "MultiXactId < cutoff_multi" Multi contains an XID *that's still
> running* -- a correctness issue. Nothing to do with being smart about
> avoiding allocating new MultiXacts during freezing, or exploiting the fact
> that "FreezeLimit < OldestXmin" (which is almost always true, very true).

If there's <= 1 live members in a mxact, we replace it with with a plain xid
iff the xid also would get frozen. With the current freezing logic I don't see
what passing down OldestXmin would change. Or how it differs to a meaningful
degree from heap_prepare_freeze_tuple()'s logic.  I don't see how it'd avoid a
single new mxact from being allocated.



> > > Caller should make temp
> > > + * copies of global tracking variables 

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-25 Thread Peter Geoghegan
On Thu, Feb 24, 2022 at 11:14 PM Andres Freund  wrote:
> I am not a fan of the backstop terminology. It's still the reason we need to
> do freezing for correctness reasons.

Thanks for the review!

I'm not wedded to that particular terminology, but I think that we
need something like it. Open to suggestions.

How about limit-based? Something like that?

> It'd make more sense to me to turn it
> around and call the "non-backstop" freezing opportunistic freezing or such.

The problem with that scheme is that it leads to a world where
"standard freezing" is incredibly rare (it often literally never
happens), whereas "opportunistic freezing" is incredibly common. That
doesn't make much sense to me.

We tend to think of 50 million XIDs (the vacuum_freeze_min_age
default) as being not that many. But I think that it can be a huge
number, too. Even then, it's unpredictable -- I suspect that it can
change without very much changing in the application, from the point
of view of users. That's a big part of the problem I'm trying to
address -- freezing outside of aggressive VACUUMs is way too rare (it
might barely happen at all). FreezeLimit/vacuum_freeze_min_age was
designed at a time when there was no visibility map at all, when it
made somewhat more sense as the thing that drives freezing.

Incidentally, this is part of the problem with anti-wraparound vacuums
and freezing debt -- the fact that some quite busy databases take
weeks or months to go through 50 million XIDs (or 200 million)
increases the pain of the eventual aggressive VACUUM. It's not
completely unbounded -- autovacuum_freeze_max_age is not 100% useless
here. But the extent to which that stuff bounds the debt can vary
enormously, for not-very-good reasons.

> > Whenever we opt to
> > "freeze a page", the new page-level algorithm *always* uses the most
> > recent possible XID and MXID values (OldestXmin and oldestMxact) to
> > decide what XIDs/XMIDs need to be replaced. That might sound like it'd
> > be too much, but it only applies to those pages that we actually
> > decide to freeze (since page-level characteristics drive everything
> > now). FreezeLimit is only one way of triggering that now (and one of
> > the least interesting and rarest).
>
> That largely makes sense to me and doesn't seem weird.

I'm very pleased that the main intuition behind 0002 makes sense to
you. That's a start, at least.

> I'm a tad concerned about replacing mxids that have some members that are
> older than OldestXmin but not older than FreezeLimit. It's not too hard to
> imagine that accelerating mxid consumption considerably.  But we can probably,
> if not already done, special case that.

Let's assume for a moment that this is a real problem. I'm not sure if
it is or not myself (it's complicated), but let's say that it is. The
problem may be more than offset by the positive impact on relminxmid
advancement. I have placed a large emphasis on enabling
relfrozenxid/relminxmid advancement in every non-aggressive VACUUM,
for a number of reasons -- this is one of the reasons. Finding a way
for every VACUUM operation to be "vacrel->scanned_pages +
vacrel->frozenskipped_pages == orig_rel_pages" (i.e. making *some*
amount of relfrozenxid/relminxmid advancement possible in every
VACUUM) has a great deal of value.

As I said recently on the "do only critical work during single-user
vacuum?" thread, why should the largest tables in databases that
consume too many MXIDs do so evenly, across all their tables? There
are usually one or two large tables, and many more smaller tables. I
think it's much more likely that the largest tables consume
approximately zero MultiXactIds in these databases -- actual
MultiXactId consumption is probably concentrated in just one or two
smaller tables (even when we burn through MultiXacts very quickly).
But we don't recognize these kinds of distinctions at all right now.

Under these conditions, we will have many more opportunities to
advance relminmxid for most of the tables (including the larger
tables) all the way up to current-oldestMxact with the patch series.
Without needing to freeze *any* MultiXacts early (just freezing some
XIDs early) to get that benefit. The patch series is not just about
spreading the burden of freezing, so that non-aggressive VACUUMs
freeze more -- it's also making relfrozenxid and relminmxid more
recent and therefore *reliable* indicators of which tables any
wraparound problems *really* are in.

Does that make sense to you? This kind of "virtuous cycle" seems
really important to me. It's a subtle point, so I have to ask.

> > It seems that heap_prepare_freeze_tuple allocates new MXIDs (when
> > freezing old ones) in large part so it can NOT freeze XIDs that it
> > would have been useful (and much cheaper) to remove anyway.
>
> Well, we may have to allocate a new mxid because some members are older than
> FreezeLimit but others are still running. When do we not remove xids that
> would have been cheaper to remove once we 

Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-25 Thread Melanie Plageman
On Thu, Feb 24, 2022 at 10:24 PM Amit Kapila  wrote:
>
> On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman
>  wrote:
> >
> > I'm trying to understand why hash indexes are built primarily in shared
> > buffers except when allocating a new splitpoint's worth of bucket pages
> > -- which is done with smgrextend() directly in _hash_alloc_buckets().
> >
> > Is this just so that the value returned by smgrnblocks() includes the
> > new splitpoint's worth of bucket pages?
> >
> > All writes of tuple data to pages in this new splitpoint will go
> > through shared buffers (via hash_getnewbuf()).
> >
> > I asked this and got some thoughts from Robert in [1], but I still don't
> > really get it.
> >
> > When a new page is needed during the hash index build, why can't
> > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of
> > _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping?
> >
>
> We allocate the chunk of pages (power-of-2 groups) at the time of
> split which allows them to appear consecutively in an index. This
> helps us to compute the physical block number from bucket number
> easily (BUCKET_TO_BLKNO mapping) with some minimal control
> information.

got it, thanks.

Since _hash_alloc_buckets() WAL-logs the last page of the
splitpoint, is it safe to skip the smgrimmedsync()? What if the last
page of the splitpoint doesn't end up having any tuples added to it
during the index build and the redo pointer is moved past the WAL for
this page and then later there is a crash sometime before this page
makes it to permanent storage. Does it matter that this page is lost? If
not, then why bother WAL-logging it?

- Melanie




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 14:25:52 +0800, Julien Rouhaud wrote:
> > Basically the type of stats we're trying to move to dynamic shared memory is
> > about counters that should persist for a while and are accumulated across
> > connections etc. Whereas backend_status.c is more about tracking the current
> > state (what query is a backend running, what user, etc).
> 
> But would it be acceptable to use dynamic shared memory in backend_status and
> e.g. have a dsa_pointer rather than a fixed length array?

Might be OK, but it does add a fair bit of complexity. Suddenly there's a
bunch more initialization order dependencies that you don't have right
now. I'd not go there for just this.

Greetings,

Andres Freund




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-25 Thread Andres Freund
On 2022-02-25 20:19:24 +0000, Jacob Champion wrote:
> From 2fde60a6bc3739f1894c8c264120e4fa0f04df64 Mon Sep 17 00:00:00 2001
> From: Jacob Champion 
> Date: Mon, 14 Feb 2022 08:10:53 -0800
> Subject: [PATCH v3] Add API to retrieve authn_id from SQL

> The authn_id field in MyProcPort is currently only accessible to the
> backend itself.  Add a SQL function, session_authn_id(), to expose the
> field to triggers that may want to make use of it.

Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.

I don't think we should add further functions not prefixed with pg_.


Perhaps a few tests for less trivial authn_ids could be worthwhile?
E.g. certificate DNs.

Greetings,

Andres Freund




Re: [PATCH] pg_permissions

2022-02-25 Thread Chapman Flack
I would be happy to review this patch, but a look through the email leaves me
thinking it may still be waiting on a C implementation of pg_get_acl(). Is that
right? And perhaps a view rename to pg_privileges, following Peter's comment?

Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?

2022-02-25 Thread Nathan Bossart
On Mon, Jan 31, 2022 at 04:25:21PM +0900, Michael Paquier wrote:
> At the end of the day, it may be better to just let this stuff be.
> Another argument for doing nothing is that this could cause
> hard-to-spot conflicts when it comes to back-patch something.

This one has been quiet for a while.  Should we mark it as
returned-with-feedback?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-25 Thread Jacob Champion
On Fri, 2022-02-25 at 16:28 +, Jacob Champion wrote:
> Ha, opr_sanity caught my use of cstring. I'll work on a fix later
> today.

Fixed in v3.

--Jacob
From 2fde60a6bc3739f1894c8c264120e4fa0f04df64 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v3] Add API to retrieve authn_id from SQL

The authn_id field in MyProcPort is currently only accessible to the
backend itself.  Add a SQL function, session_authn_id(), to expose the
field to triggers that may want to make use of it.
---
 src/backend/utils/adt/name.c  | 12 +++-
 src/include/catalog/catversion.h  |  2 +-
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/authentication/t/001_password.pl | 11 +++
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index e8bba3670c..b892d25c29 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "libpq/libpq-be.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER
+ * SQL-functions CURRENT_USER, SESSION_USER, SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false;
 }
 
+Datum
+session_authn_id(PG_FUNCTION_ARGS)
+{
+	if (!MyProcPort || !MyProcPort->authn_id)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
+}
+
 
 /*
  * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 1addb568ef..14194afe1c 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			yyyymmddN */
-#define CATALOG_VERSION_NO	202202221
+#define CATALOG_VERSION_NO	202202251
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7f1ee97f55..3ddcbae55e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1508,6 +1508,9 @@
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
+{ oid => '9774', descr => 'session authenticated identity',
+  proname => 'session_authn_id', provolatile => 's', prorettype => 'text',
+  proargtypes => '', prosrc => 'session_authn_id' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..2aa28ed547 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -82,6 +82,11 @@ test_role($node, 'scram_role', 'trust', 0,
 test_role($node, 'md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
+my $res = $node->safe_psql('postgres',
+	"SELECT session_authn_id() IS NULL;"
+);
+is($res, 't', "users with trust authentication have NULL authn_id");
+
 # For plain "password" method, all users should also be able to connect.
 reset_pg_hba($node, 'password');
 test_role($node, 'scram_role', 'password', 0,
@@ -91,6 +96,12 @@ test_role($node, 'md5_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
+$res = $node->safe_psql('postgres',
+	"SELECT session_authn_id();",
+	connstr  => "user=md5_role"
+);
+is($res, 'md5_role', "users with md5 authentication have authn_id matching role name");
+
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
 reset_pg_hba($node, 'scram-sha-256');
 test_role(
-- 
2.25.1



Re: Race conditions in 019_replslot_limit.pl

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 15:07:01 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Seems to suggest something is holding a problematic lock in a way that I do 
> > not understand yet:
> 
> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake=2022-02-23%2013%3A47%3A20=recovery-check
> > 2022-02-23 09:09:52.299 EST [2022-02-23 09:09:52 EST 1997084:6] 
> > 019_replslot_limit.pl LOG:  received replication command: 
> > CREATE_REPLICATION_SLOT "pg_basebackup_1997084" TEMPORARY PHYSICAL ( 
> > RESERVE_WAL)
> > ...
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:14] 
> > 019_replslot_limit.pl DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks 
> > to make
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:15] 
> > 019_replslot_limit.pl DEBUG:  replication slot exit hook, without active 
> > slot
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:16] 
> > 019_replslot_limit.pl DEBUG:  temporary replication slot cleanup: begin
> 
> > last message from 1997084 until the immediate shutdown.
> 
> Hmm.  Maybe put a couple more debug messages into ReplicationSlotCleanup
> and/or ReplicationSlotDropPtr?  It doesn't seem very clear where in that
> sequence it's hanging up.

Yea, was thinking that as well.


I'm also wondering whether it's worth adding an assert, or at least a WARNING,
about no lwlocks held to the tail end of ShutdownPostgres?  I don't want to
add an LWLockReleaseAll() yet, before I understand what's actually happening.


> > We could be more certain if we shut down the cluster in fast rather than
> > immediate mode. So I'm thinking of doing something like
> 
> Does that risk an indefinite hangup of the buildfarm run?

I think not. The pg_ctl stop -m fast should time out after PGCTLTIMEOUT,
$self->_update_pid(-1); should notice it's not dead. The END{} block should
then shut it down in immediate mode.

Greetings,

Andres Freund




Re: Race conditions in 019_replslot_limit.pl

2022-02-25 Thread Tom Lane
Andres Freund  writes:
> Seems to suggest something is holding a problematic lock in a way that I do 
> not understand yet:

> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake=2022-02-23%2013%3A47%3A20=recovery-check
> 2022-02-23 09:09:52.299 EST [2022-02-23 09:09:52 EST 1997084:6] 
> 019_replslot_limit.pl LOG:  received replication command: 
> CREATE_REPLICATION_SLOT "pg_basebackup_1997084" TEMPORARY PHYSICAL ( 
> RESERVE_WAL)
> ...
> 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:14] 
> 019_replslot_limit.pl DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to 
> make
> 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:15] 
> 019_replslot_limit.pl DEBUG:  replication slot exit hook, without active slot
> 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:16] 
> 019_replslot_limit.pl DEBUG:  temporary replication slot cleanup: begin

> last message from 1997084 until the immediate shutdown.

Hmm.  Maybe put a couple more debug messages into ReplicationSlotCleanup
and/or ReplicationSlotDropPtr?  It doesn't seem very clear where in that
sequence it's hanging up.

> We could be more certain if we shut down the cluster in fast rather than
> immediate mode. So I'm thinking of doing something like

Does that risk an indefinite hangup of the buildfarm run?

regards, tom lane




Re: Commitfest manager for 2022-03

2022-02-25 Thread David Steele

Hi Greg,

On 2/25/22 12:39, Greg Stark wrote:

I would like to volunteer.

On Fri, 25 Feb 2022 at 05:31, Julien Rouhaud  wrote:


Is there any volunteer?


Since I've done the March CF for the last seven years, I thought it 
would be good if I chimed in. I've been agonizing a bit because I have 
travel planned during March and while I *could* do it I don't think I'd 
be able to do a good job.


I've been hoping somebody would volunteer, so I'm all in favor of you 
being CFM.


Regards,
-David




Re: Add index scan progress to pg_stat_progress_vacuum

2022-02-25 Thread Nathan Bossart
On Wed, Feb 23, 2022 at 10:41:36AM -0800, Peter Geoghegan wrote:
> On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami  
> wrote:
>> If the failsafe kicks in midway through a vacuum, the number indexes_total 
>> will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will 
>> be 0 at the start of the vacuum.
> 
> The way that this works with num_index_scans is that we "round up"
> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> not if the precheck in lazy_vacuum_all_indexes() fails. That seems
> like a good model to generalize from here. Note that this makes
> INDEX_CLEANUP=off affect num_index_scans in much the same way as a
> VACUUM where the failsafe kicks in very early, during the initial heap
> pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
> for the first time (which is not unlikely), or even in the
> lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
> at 0, just like INDEX_CLEANUP=off.
> 
> The actual failsafe WARNING shows num_index_scans, possibly before it
> gets incremented one last time (by "rounding up"). So it's reasonably
> clear how this all works from that context (assuming that the
> autovacuum logging stuff, which reports num_index_scans, outputs a
> report for a table where the failsafe kicked in).

I am confused.  If failsafe kicks in during the middle of a vacuum, I
(perhaps naively) would expect indexes_total and indexes_processed to go to
zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
up indexes" phases.  Otherwise, how would I know that we are now skipping
indexes?  Of course, you won't have any historical context about the index
work done before failsafe kicked in, but IMO it is misleading to still
include it in the progress view.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 14:10:39 -0500, Tom Lane wrote:
> I'm happy to add support for custom auth methods if they can use
> a protocol that's safer than cleartext-password.  But if that's the
> only feasible option, then we're just encouraging people to use
> insecure methods.

It looks like scram can be used without much trouble. All the necessary
infrastructure to implement it without duplication appears to be public.  The
plugin would need to get a secret from wherever and call
  CheckSASLAuth(&pg_be_scram_mech, port, shadow_pass, logdetail);

or if it needs to do something more complicated than CheckSASLAuth(), it can
use AUTH_REQ_SASL{,_CONT,_FIN} itself.

Of course whether it's viable depends on what the custom auth method wants to
do. But it's not a restriction of the authentication plugin model.

Greetings,

Andres Freund




Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Jacob Champion
On Fri, 2022-02-25 at 13:40 -0500, Jonathan S. Katz wrote:
> I'm generally in favor of being able to support additional 
> authentication methods, the first one coming to mind is supporting OIDC. 
> Having a pluggable auth infrastructure could possibly make such efforts 
> easier. I'm definitely intrigued.

I'm hoping to dust off my OAuth patch and see if it can be ported on
top of this proposal.

--Jacob


Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-25 Thread Nathan Bossart
On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> Thanks Satya and others for the inputs. Here's the v1 patch that
> basically allows async wal senders to wait until the sync standbys
> report their flush lsn back to the primary. Please let me know your
> thoughts.

I haven't had a chance to look too closely yet, but IIUC this adds a new
function that waits for synchronous replication.  This new function
essentially spins until the synchronous LSN has advanced.

I don't think it's a good idea to block sending any WAL like this.  AFAICT
it is possible that there will be a lot of synchronously replicated WAL
that we can send, and it might just be the last several bytes that cannot
yet be replicated to the asynchronous standbys.  I believe this patch will
cause the server to avoid sending _any_ WAL until the synchronous LSN
advances.

Perhaps we should instead just choose the SendRqstPtr based on the current
synchronous LSN.  Presumably there are other things we'd need to consider,
but in general, I think we ought to send as much WAL as possible for a
given call to XLogSendPhysical().

> I've done pgbench testing to see if the patch causes any problems. I
> ran tests two times, there isn't much difference in the txns per
> seconds (tps), although there's a delay in the async standby receiving
> the WAL, after all, that's the feature we are pursuing.

I'm curious what a longer pgbench run looks like when the synchronous
replicas are in the same region.  That is probably a more realistic
use-case.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 16:25:01 -0300, Euler Taveira wrote:
> On Fri, Feb 25, 2022, at 11:52 AM, Greg Stark wrote:
> > On Tue, 25 Jan 2022 at 01:32, Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > I was looking the shared memory stats patch again.
> > 
> > Can you point me to this thread? I looked for it but couldn't find it.

> https://postgr.es/m/20180629.173418.190173462.horiguchi.kyot...@lab.ntt.co.jp

I'll post a rebased version as soon as this is resolved... I have a local one,
but it just works by nuking a bunch of tests / #ifdefing out code related to
this.

Greetings,

Andres Freund




Re: Race conditions in 019_replslot_limit.pl

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-22 18:06:24 -0800, Andres Freund wrote:
> On 2022-02-18 15:14:15 -0800, Andres Freund wrote:
> > I'm running out of ideas for how to try to reproduce this. I think we might
> > need some additional debugging information to get more information from the
> > buildfarm.
> 
> > I'm thinking of adding log_min_messages=DEBUG2 to primary3, passing 
> > --verbose
> > to pg_basebackup in $node_primary3->backup(...).
> >
> > It might also be worth adding DEBUG2 messages to ReplicationSlotShmemExit(),
> > ReplicationSlotCleanup(), InvalidateObsoleteReplicationSlots().
> 
> Planning to commit something like the attached.

This did provide us a bit more detail.

Seems to suggest something is holding a problematic lock in a way that I do not 
understand yet:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake=2022-02-23%2013%3A47%3A20=recovery-check
2022-02-23 09:09:52.299 EST [2022-02-23 09:09:52 EST 1997084:6] 
019_replslot_limit.pl LOG:  received replication command: 
CREATE_REPLICATION_SLOT "pg_basebackup_1997084" TEMPORARY PHYSICAL ( 
RESERVE_WAL)
...
2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:14] 
019_replslot_limit.pl DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to 
make
2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:15] 
019_replslot_limit.pl DEBUG:  replication slot exit hook, without active slot
2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:16] 
019_replslot_limit.pl DEBUG:  temporary replication slot cleanup: begin

last message from 1997084 until the immediate shutdown. Just checking for
temporary replication slots that need to be dropped requires
ReplicationSlotControlLock. Actually dropping also requires
ReplicationSlotAllocationLock

1997084 does have a temporary replication slot to clean up.


2022-02-23 09:09:52.519 EST [2022-02-23 09:09:52 EST 1997083:35] 
019_replslot_limit.pl DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to 
make
2022-02-23 09:09:52.519 EST [2022-02-23 09:09:52 EST 1997083:36] 
019_replslot_limit.pl DEBUG:  replication slot exit hook, without active slot
2022-02-23 09:09:52.519 EST [2022-02-23 09:09:52 EST 1997083:37] 
019_replslot_limit.pl DEBUG:  temporary replication slot cleanup: begin
2022-02-23 09:09:52.519 EST [2022-02-23 09:09:52 EST 1997083:38] 
019_replslot_limit.pl DEBUG:  temporary replication slot cleanup: done

1997083 succeeds in doing the cleanup. It does not have a temporary
replication slot. Making it look like somehow ReplicationSlotAllocationLock
hasn't been released.


2022-02-23 09:09:52.519 EST [2022-02-23 09:09:52 EST 1997083:39] 
019_replslot_limit.pl DEBUG:  shmem_exit(0): 7 on_shmem_exit callbacks to make
...
2022-02-23 09:09:53.076 EST [2022-02-23 09:09:52 EST 1997072:87] LOG:  received 
immediate shutdown request
...
2022-02-23 09:09:53.095 EST [2022-02-23 09:09:52 EST 1997072:90] DEBUG:  server 
process (PID 1997084) exited with exit code 2

It's *possible*, but not likely, that somehow 1997084 just doesn't get
scheduled for a prolonged amount of time.


We could be more certain if we shut down the cluster in fast rather than
immediate mode. So I'm thinking of doing something like

# We've seen occasional cases where multiple walsender pids are active. An
# immediate shutdown may hide evidence of a locking bug. So if multiple
# walsenders are observed, shut down in fast mode, and collect some more
# information.
if (not like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid"))
{
my ($stdout, $stderr);
$node_primary3->psql('postgres',
 "\\a\\t\nSELECT * FROM pg_stat_activity",
 stdout => \$stdout, stderr => \$stderr);
diag $stdout, $stderr;
$node_primary3->stop('fast');
$node_standby3->stop('fast');
die "could not determine walsender pid, can't continue";
}

Does that make sense? Better ideas?

Greetings,

Andres Freund




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Euler Taveira
On Fri, Feb 25, 2022, at 11:52 AM, Greg Stark wrote:
> On Tue, 25 Jan 2022 at 01:32, Andres Freund  wrote:
> >
> > Hi,
> >
> > I was looking the shared memory stats patch again.
> 
> Can you point me to this thread? I looked for it but couldn't find it.
https://postgr.es/m/20180629.173418.190173462.horiguchi.kyot...@lab.ntt.co.jp


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Tom Lane
Jeff Davis  writes:
> On Fri, 2022-02-25 at 12:39 -0500, Tom Lane wrote:
>> My point is that sending cleartext passwords over the wire is an
>> insecure-by-definition protocol that we shouldn't be encouraging
>> more use of.

> We can require custom auth entries in pg_hba.conf to also specify
> local, hostssl or hostgssenc.

That's just a band-aid, though.  It does nothing for the other
reasons not to want cleartext passwords, notably that you might
be giving your password to a compromised server.

I'm happy to add support for custom auth methods if they can use
a protocol that's safer than cleartext-password.  But if that's the
only feasible option, then we're just encouraging people to use
insecure methods.

I also have in mind here that there has been discussion of giving
libpq a feature to refuse, on the client side, to send cleartext
passwords.  (I don't recall right now if that's been mentioned
publicly or only among the security team, but it's definitely
under consideration.)

So I think this proposal needs more thought.  A server-side hook
alone is not a useful improvement IMO; it's closer to being an
attractive nuisance.

regards, tom lane




Re: Fix overflow in justify_interval related functions

2022-02-25 Thread Nathan Bossart
On Fri, Feb 25, 2022 at 10:30:57AM -0500, Joseph Koshakow wrote:
> Just checking because I'm not very familiar with the process,
> are there any outstanding items that I need to do for this patch?

Unless someone has additional feedback, I don't think so.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-02-25 Thread David Christensen
Bharath Rupireddy  writes:

> On Fri, Feb 25, 2022 at 12:36 AM David Christensen
>  wrote:
>>
>> Greetings,
>>
>> This patch adds the ability to specify a RelFileNode and optional BlockNum 
>> to limit output of
>> pg_waldump records to only those which match the given criteria.  This 
>> should be more performant
>> than `pg_waldump | grep` as well as more reliable given specific variations 
>> in output style
>> depending on how the blocks are specified.
>>
>> This currently affects only the main fork, but we could presumably add the 
>> option to filter by fork
>> as well, if that is considered useful.
>
> Thanks for the patch. This is not adding something that users can't do
> right now, but definitely improves the usability of the pg_waldump as
> it avoids external filterings. Also, it can give the stats/info at
> table and block level. So, +1 from my side.

Attached is V2 with additional feedback from this email, as well as the 
specification of the
ForkNumber and FPW as specifiable options.

Best,

David

From 1b04f04317d364006371bdab0db9086f79138b25 Mon Sep 17 00:00:00 2001
From: David Christensen 
Date: Fri, 25 Feb 2022 12:52:56 -0600
Subject: [PATCH] Add additional filtering options to pg_waldump

This feature allows you to only output records that are targeting a specific RelFileNode and optional
BlockNumber within this relation, while specifying which ForkNum you want to filter to.

We also add the independent ability to filter via Full Page Write.
---
 doc/src/sgml/ref/pg_waldump.sgml |  48 
 src/bin/pg_waldump/pg_waldump.c  | 128 ++-
 2 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 5735a161ce..f157175764 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -100,6 +100,44 @@ PostgreSQL documentation
   
  
 
+ 
+  -k block
+  --block=block
+  
+   
+Display only records touching the given block. (Requires also
+providing the relation via --relation.)
+   
+  
+ 
+
+ 
+  --fork=fork
+  
+   
+When using the --relation filter, output only records
+from the given fork.  The valid values here are 0
+for the main fork, 1 for the Free Space
+Map, 2 for the Visibility Map,
+and 3 for the Init fork.  If unspecified, defaults
+to the main fork.
+   
+  
+ 
+
+ 
+  -l tbl/db/rel
+  --relation=tbl/db/rel
+  
+   
+Display only records touching the given relation.  The relation is
+specified via tablespace OID, database OID, and relfilenode separated
+by slashes, for instance, 1234/12345/12345.  This
+is the same format used for relations in the WAL dump output.
+   
+  
+ 
+
  
   -n limit
   --limit=limit
@@ -183,6 +221,16 @@ PostgreSQL documentation

  
 
+ 
+   -w
+   --fullpage
+   
+   
+   Filter records to only those which have full page writes.
+   
+   
+ 
+
  
   -x xid
   --xid=xid
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..a527cd4dc6 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -55,6 +55,12 @@ typedef struct XLogDumpConfig
 	bool		filter_by_rmgr_enabled;
 	TransactionId filter_by_xid;
 	bool		filter_by_xid_enabled;
+	RelFileNode filter_by_relation;
+	bool		filter_by_relation_enabled;
+	BlockNumber	filter_by_relation_block;
+	bool		filter_by_relation_block_enabled;
+	ForkNumber	filter_by_relation_forknum;
+	bool		filter_by_fpw;
 } XLogDumpConfig;
 
 typedef struct Stats
@@ -394,6 +400,56 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 	return count;
 }
 
+/*
+ * Boolean to return whether the given WAL record matches a specific relation and optional block
+ */
+static bool
+XLogRecordMatchesRelationBlock(XLogReaderState *record, RelFileNode matchRnode, BlockNumber matchBlock, ForkNumber matchFork)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		RelFileNode rnode;
+		ForkNumber	forknum;
+		BlockNumber blk;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
		XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk);
+
+		if (forknum == matchFork &&
+			matchRnode.spcNode == rnode.spcNode &&
+			matchRnode.dbNode == rnode.dbNode &&
+			matchRnode.relNode == rnode.relNode &&
+			(matchBlock == InvalidBlockNumber || matchBlock == blk))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Boolean to return whether the given WAL record contains a full page write
+ */
+static bool
+XLogRecordHasFPW(XLogReaderState *record)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if 

Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Jeff Davis
On Fri, 2022-02-25 at 12:39 -0500, Tom Lane wrote:
> My point is that sending cleartext passwords over the wire is an
> insecure-by-definition protocol that we shouldn't be encouraging
> more use of.

We can require custom auth entries in pg_hba.conf to also specify
local, hostssl or hostgssenc.

It might annoy people who have a network secured at some other layer,
or who have the client on the same machine as the host. We could allow
plain "host" if someone specifies "customplain" explicitly.

Regards,
Jeff Davis






Re: Two noncritical bugs of pg_waldump

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:
> > If I give an empty file to the tool it complains as the follows.
> > 
> > > pg_waldump: fatal: could not read file "hoge": No such file or directory
> > 
> > No, the file exists.  The cause is it reads uninitialized errno to
> > detect errors from the system call.  read(2) is defined to set errno
> > always when it returns -1 and doesn't otherwise. Thus it seems to me
> > that it is better to check that the return value is less than zero
> > than to clear errno before the call to read().
> 
> So I post a patch contains only the indisputable part.

Thanks for the report and fix. Pushed. This was surprisingly painful, all but
one branch had conflicts...

Greetings,

Andres Freund




Re: Using operators to do query hints

2022-02-25 Thread Bruce Momjian
On Tue, Feb 22, 2022 at 04:12:15PM -0500, Greg Stark wrote:
> I've been playing with an idea I had a while back. Basically that it
> would be useful to have some "noop" operators that are used purely to
> influence the planner.
> 
> For context I've suggested in the past that there are two categories of hints:
> 
> 1 Hints that override the planner's decisions with explicit planning
> decisions. These we don't like for a variety of reasons, notably
> because users don't have very good knowledge of all the plan types
> available and new plan types come out in the future.
> 
> 2 Hints that just influence the estimates that the planner uses to
> make its decisions. These seem more acceptable as it amounts to
> admitting the users know the distribution of their data and the
> behaviour of their functions better than the statistics. Also, there
> are plenty of cases where the statistics are really just wild guesses.
> They still allow the planner to make decisions about what join orders
> or types and so on given the updated data.

#2 is an interesting distinction.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: C++ Trigger Framework

2022-02-25 Thread Bruce Momjian
On Tue, Feb 22, 2022 at 11:33:05PM +0200, Shmuel Kamensky wrote:
> Hi Nathan,
> 
> Thanks for the reply. I did indeed read that document and it's a great place 
> to
> get started. But it doesn't quite answer my questions. Do you personally have
> experience writing software in C++ that interacts with Postgres?

As far as I know, it should just work like C.  I know you can compile
the backend server using a C++ compiler because we get bug reports when
we break it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Jonathan S. Katz

On 2/25/22 12:39 PM, Tom Lane wrote:

Jeff Davis  writes:

On Thu, 2022-02-24 at 20:47 -0500, Tom Lane wrote:

... and, since we can't readily enforce that the client only sends
those cleartext passwords over suitably-encrypted connections, this
could easily be a net negative for security.  Not sure that I think
it's a good idea.



I don't understand your point. Can't you just use "hostssl" rather than
"host"?


My point is that sending cleartext passwords over the wire is an
insecure-by-definition protocol that we shouldn't be encouraging
more use of.


This is my general feeling as well. We just spent a bunch of effort 
adding, refining, and making SCRAM the default method. I think doing 
anything that would drive more use of sending plaintext passwords, even 
over TLS, is counter to that.


I do understand arguments for (e.g. systems that require checking 
password complexity), but I wonder if it's better for us to delegate 
that to an external auth system. Regardless, I can get behind Andres' 
point to "check Port->ssl_in_use before sendAuthRequest(AUTH_REQ_PASSWORD)".


I'm generally in favor of being able to support additional 
authentication methods, the first one coming to mind is supporting OIDC. 
Having a pluggable auth infrastructure could possibly make such efforts 
easier. I'm definitely intrigued.


Jonathan




Re: trigger example for plsample

2022-02-25 Thread Chapman Flack
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

This patch is straightforward, does what it says, and passes the tests.

Regarding the duplication of code between plsample_func_handler and
plsample_trigger_handler, perhaps that's for the best for now, as 3554 in
the same commitfest also touches plsample, so merge conflicts may be
minimized by not doing more invasive refactoring.

That would leave low-hanging fruit for a later patch that could refactor
plsample to reduce the duplication (maybe adding a validator at the same
time? That would also duplicate some of the checks in the existing handlers.)

I am not sure that structuring the trigger handler with separate compile and
execute steps is worth the effort for a simple example like plsample. The main
plsample_func_handler is not so structured.

It's likely that many real PLs will have some notion of compilation separate 
from
execution. But those will also have logic to do the compilation only once, and
somewhere to cache the result of that for reuse across calls, and those kinds of
details might make plsample's basic skeleton more complex than needed.

I know that in just looking at expected/plsample.out, I was a little distracted 
by
seeing multiple "compile" messages for the same trigger function in the same
session and wondering why that was.

So maybe it would be simpler and less distracting to assume that the PL targeted
by plsample is one that just has a simple interpreter that works from the 
source text.

Regards,
-Chap

The new status of this patch is: Waiting on Author


Re: Commitfest manager for 2022-03

2022-02-25 Thread Greg Stark
I would like to volunteer.

On Fri, 25 Feb 2022 at 05:31, Julien Rouhaud  wrote:
>
> Hi,
>
> The final commitfest for pg15 will start in a few days, and I didn't see any
> discussion on it or anyone volunteering to be a CFM.
>
> I thought it would be a good idea to send this reminder now and avoid the same
> situation as the last commitfest, to avoid unnecessary pain for the CFM(s).
>
> Is there any volunteer?
>
> For the record there are already 246 active patches registered.
>
>


-- 
greg




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Julien Rouhaud
On Sat, Feb 26, 2022 at 02:30:36AM +0800, Julien Rouhaud wrote:
> On Fri, Feb 25, 2022 at 06:49:42PM +0100, Matthias van de Meent wrote:
> >
> > The point I was trying to make was "If cps->ckpt_flags is
> > CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is
> > actually immediate". That doesn't mean that this checkpoint was
> > created with IMMEDIATE or running using IMMEDIATE, only that optional
> > delays are now being skipped instead.
> 
> Ah, I now see what you mean.
> 
> > To let the user detect _why_ the optional delays are now being
> > skipped, I propose not to report this currently running checkpoint's
> > "flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next
> > checkpoint's flags; which would allow the detection and display of the
> > CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more
> interesting information flags).
> 
> I'm still not convinced that's a sensible approach.  The next checkpoint will
> be displayed in the view as CHECKPOINT_IMMEDIATE, so you will then know about
> it.  I'm not sure that having that specific information in the view is
> going to help, especially if users have to understand "a slow checkpoint is
> actually fast even if it's displayed as slow if the next checkpoint is going
> to be fast".  Saying "it's timed" (which implies slow) and "it's fast" is maybe
> still counter intuitive, but at least have a better chance to see there's
> something going on and refer to the doc if you don't get it.

Just to be clear, I do think that it's worthwhile to add some information that
some backends are waiting for that next checkpoint.  As discussed before, an
int for the number of backends looks like enough information to me.




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Julien Rouhaud
On Fri, Feb 25, 2022 at 06:49:42PM +0100, Matthias van de Meent wrote:
>
> The point I was trying to make was "If cps->ckpt_flags is
> CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is
> actually immediate". That doesn't mean that this checkpoint was
> created with IMMEDIATE or running using IMMEDIATE, only that optional
> delays are now being skipped instead.

Ah, I now see what you mean.

> To let the user detect _why_ the optional delays are now being
> skipped, I propose not to report this currently running checkpoint's
> "flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next
> checkpoint's flags; which would allow the detection and display of the
> CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more
> interesting information flags).

I'm still not convinced that's a sensible approach.  The next checkpoint will
be displayed in the view as CHECKPOINT_IMMEDIATE, so you will then know about
it.  I'm not sure that having that specific information in the view is
going to help, especially if users have to understand "a slow checkpoint is
actually fast even if it's displayed as slow if the next checkpoint is going to
be fast".  Saying "it's timed" (which implies slow) and "it's fast" is maybe
still counter intuitive, but at least have a better chance to see there's
something going on and refer to the doc if you don't get it.




Re: Some optimisations for numeric division

2022-02-25 Thread Tom Lane
Dean Rasheed  writes:
> And another update following feedback from the cfbot.

This patchset LGTM.  On my machine there doesn't seem to be any
measurable performance change for the numeric regression test,
but numeric_big gets about 15% faster.

The only additional thought I have, based on your comments about
0001, is that maybe in mul_var we should declare var1digit as
being NumericDigit not int.  I tried that here and didn't see
any material change in the generated assembly code (using gcc
8.5.0), so you're right that modern gcc doesn't need any help
there; but I wonder if it could help on other compiler versions.

I'll mark this RFC.

regards, tom lane




Re: [PATCH] Add reloption for views to enable RLS

2022-02-25 Thread Dean Rasheed
On Fri, 18 Feb 2022 at 14:57, Laurenz Albe  wrote:
>
> Here is a new version, with improved documentation and the option renamed
> to "check_permissions_owner".  I just prefer the shorter form.
>

Re-reading this thread, I think I preferred the name
"security_invoker". The main objection seemed to come from the
potential confusion with SECURITY INVOKER/DEFINER functions, but I
think that's really a different thing. As long as the documentation
for the default behaviour is clear (which I think it was), then it
should be easy to explain how a security invoker view behaves
differently. Also, there's value in using the same terminology as
other databases, because many users will already be familiar with the
feature from those databases.

Some other review comments:

1). This new comment:

+   
+Be aware that USAGE privileges on schemas containing
+the underlying base relations are not checked.
+   

is not entirely accurate. It's more accurate to say that a user
creating or replacing a view must have CREATE privileges on the schema
containing the view and USAGE privileges on any schemas referred to in
the view query, whereas a user using the view only needs USAGE
privileges on the schema containing the view.

(Note that, for the view creator, USAGE is required on any schema
referred to in the query -- e.g., schemas containing functions as well
as base relations.)

2). The patch is adding a new field to RangeTblEntry which seems to be
unnecessary -- it's set, and copied around, but never read, so it
should just be removed.

3). Looking at this change:

-setRuleCheckAsUser((Node *) rule->actions, relation->rd_rel->relowner);
-setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
+if (!(relation->rd_rel->relkind == RELKIND_VIEW
+  && !RelationSubqueryCheckPermsOwner(relation)))
+{
+setRuleCheckAsUser((Node *) rule->actions,
relation->rd_rel->relowner);
+setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
+}

I think it should call setRuleCheckAsUser() in all cases. It might be
true that the rule fetched has checkAsUser set to InvalidOid
throughout its action and quals, but it seems unwise to rely on that
-- better to code defensively and explicitly set it in all cases.

4). In the same code block, I think the new behaviour should be
applied to SELECT rules only. The view may have other non-SELECT rules
(just as a table may have non-SELECT rules), created using CREATE
RULE, but their actions are independent of the view definition.
Currently their permissions are checked as the view/table owner, and
if anyone wanted to change that, it should be an option on the rule,
not the view (just as triggers can be made security definer or
invoker, depending on how the trigger function is defined).

(Note: I'm not suggesting that anyone actually spend any time adding
such an option to rules. Given all the pitfalls associated with rules,
I think their use should be discouraged, and no development effort
should be expended enhancing them.)

5). In the same function, the block of code that fetches rules and
triggers has been moved. I think it would be worth adding a comment to
explain why it's now important to extract the reloptions *before*
fetching the relation's rules and triggers.

6). The second set of tests added to rowsecurity.sql seem to have
nothing to do with RLS, and probably belong in updatable_views.sql,
and I think it would be worth adding a few more tests for things like
views on top of views.

Regards,
Dean




Re: convert libpq uri-regress tests to tap test

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-24 08:46:23 -0800, Andres Freund wrote:
> I'm mildly inclined to only do 0001 and 0002 for now. We'd not lose msvc
> coverage, because it already doesn't build the test. Once we've ironed that
> stuff out, we could do 0003?

From what I can see in the buildfarm client, we'd not lose (nor gain) any
buildfarm coverage either. It doesn't run the test today.

Greetings,

Andres Freund




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Matthias van de Meent
On Fri, 25 Feb 2022 at 17:35, Julien Rouhaud  wrote:
>
> On Fri, Feb 25, 2022 at 08:53:50PM +0530, Nitin Jadhav wrote:
> > >
> > > I'm not sure what Matthias meant, but as far as I know there's no
> > > fundamental difference between checkpoint with and without the
> > > CHECKPOINT_IMMEDIATE flag, and there's also no scheduling for multiple
> > > checkpoints.
> > >
> > > Yes, the flags will remain the same but checkpoint.c will test both the
> > > passed flags and the shmem flags to see whether a delay should be added
> > > or not, which is the only difference in checkpoint processing for this
> > > flag.  See the call to ImmediateCheckpointRequested() which will look at
> > > the value in shmem:
> > >
> > >/*
> > > * Perform the usual duties and take a nap, unless we're behind schedule,
> > > * in which case we just try to catch up as quickly as possible.
> > > */
> > >if (!(flags & CHECKPOINT_IMMEDIATE) &&
> > >!ShutdownRequestPending &&
> > >!ImmediateCheckpointRequested() &&
> > >IsCheckpointOnSchedule(progress))
> >
> > I understand that the checkpointer considers flags as well as the
> > shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the
> > current checkpoint operation (No further delay) but does not change
> > the current flag value. Should we display this change in the kind
> > field of the view or not? Please share your thoughts.
>
> I think the fields should be added.  It's good to know that a checkpoint was
> triggered due to normal activity and should be spread, and then something
> upgraded it to an immediate checkpoint.  If you're desperately waiting for the
> end of a checkpoint for some reason and ask for an immediate checkpoint,
> you'll certainly be happy to see that the checkpointer is aware of it.
>
> But maybe I missed something in the code, so let's wait for Matthias input
> about it.

The point I was trying to make was "If cps->ckpt_flags is
CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is
actually immediate". That doesn't mean that this checkpoint was
created with IMMEDIATE or running using IMMEDIATE, only that optional
delays are now being skipped instead.

To let the user detect _why_ the optional delays are now being
skipped, I propose not to report this currently running checkpoint's
"flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next
checkpoint's flags; which would allow the detection and display of the
CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more
interesting information flags).

-Matthias

PS. I just noticed that the checkpoint flags are also being parsed and
stringified twice in LogCheckpointStart; and adding another duplicate
in the current code would put that at 3 copies of effectively the same
code. Do we maybe want to deduplicate that into macros, similar to
LSN_FORMAT_ARGS?




Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 09:33:45 -0800, Jeff Davis wrote:
> On Thu, 2022-02-24 at 20:47 -0500, Tom Lane wrote:
> > ... and, since we can't readily enforce that the client only sends
> > those cleartext passwords over suitably-encrypted connections, this
> > could easily be a net negative for security.  Not sure that I think
> > it's a good idea.
> 
> I don't understand your point. Can't you just use "hostssl" rather than
> "host"?

And the extension could check Port->ssl_in_use before
sendAuthRequest(AUTH_REQ_PASSWORD) if it wanted to restrict it.

Greetings,

Andres Freund




Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Tom Lane
Jeff Davis  writes:
> On Thu, 2022-02-24 at 20:47 -0500, Tom Lane wrote:
>> ... and, since we can't readily enforce that the client only sends
>> those cleartext passwords over suitably-encrypted connections, this
>> could easily be a net negative for security.  Not sure that I think
>> it's a good idea.

> I don't understand your point. Can't you just use "hostssl" rather than
> "host"?

My point is that sending cleartext passwords over the wire is an
insecure-by-definition protocol that we shouldn't be encouraging
more use of.

regards, tom lane




Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Jeff Davis
On Thu, 2022-02-24 at 20:47 -0500, Tom Lane wrote:
> ... and, since we can't readily enforce that the client only sends
> those cleartext passwords over suitably-encrypted connections, this
> could easily be a net negative for security.  Not sure that I think
> it's a good idea.

I don't understand your point. Can't you just use "hostssl" rather than
"host"?

Also there are some useful cases that don't really require SSL, like
when the client and host are on the same machine, or if you have a
network secured some other way.

Regards,
Jeff Davis






Re: Proposal: Support custom authentication methods using hooks

2022-02-25 Thread Jeff Davis
On Thu, 2022-02-24 at 19:47 -0800, Andres Freund wrote:
> Why is it restricted to that? You could do sasl negotiation as well from
> what I can see? And that'd theoretically also allow to negotiate whether
> the client supports different ways of doing auth?  Not saying that that's
> easy, but I don't think it's a fundamental restriction.

Good point! It would only work with enhanced clients though -- maybe in
the future we'd make libpq pluggable with new auth methods?

> We have several useful authentication technologies built on top of
> plaintext exchange. Radius, Ldap, Pam afaics could be implemented as an
> extension?

Yes, and it means that we won't have to extend that list in core in the
future when new methods become popular.

Regards,
Jeff Davis






Re: Readd use of TAP subtests

2022-02-25 Thread Andrew Dunstan


On 2/25/22 11:41, Andres Freund wrote:
> Hi,
>
> On 2022-02-25 09:43:20 -0500, Andrew Dunstan wrote:
>> AIUI TAP consumers are supposed to ignore lines they don't understand.
> Are they?
>
> In http://testanything.org/tap-version-13-specification.html there's:
>
> "Lines written to standard output matching /^(not )?ok\b/ must be interpreted
> as test lines. [...]All other lines must not be considered test output."
>
> That says that all other lines aren't "test output". But what does that mean?
> It certainly doesn't mean they can just be ignored, because obviously
> ^(TAP version|#|1..|Bail out) isn't to be ignored.



I don't think we're following spec 13.


>
> And then there's:
>
>
> "
> Anything else
>
>   Any output that is not a version, a plan, a test line, a YAML block, a
>   diagnostic or a bail out is incorrect. How a harness handles the incorrect
>   line is undefined. Test::Harness silently ignores incorrect lines, but will
>   become more stringent in the future. TAP::Harness reports TAP syntax
>   errors at the end of a test run.
> "
>
> If I were to implement a tap parser this wouldn't make me ignore lines.
>
>
> Contrasting to that:
> http://testanything.org/tap-specification.html
>
> "
> Anything else
>
>   A TAP parser is required to not consider an unknown line as an error but may
>   optionally choose to capture said line and hand it to the test harness,
>   which may have custom behavior attached. This is to allow for forward
>   compatability. Test::Harness silently ignores incorrect lines, but will
>   become more stringent in the future. TAP::Harness reports TAP syntax errors
>   at the end of a test run.
> "
>
> I honestly don't know what to make of that. Parsers are supposed to ignore
> unknown lines, Test::Harness silently ignores, but also "TAP::Harness reports
> TAP syntax errors"?  This seems like a self contradictory mess.
>

I agree it's a mess. Both of these "specs" are incredibly loose. I guess
that happens when the spec comes after the implementation.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: should vacuum's first heap pass be read-only?

2022-02-25 Thread Peter Geoghegan
On Fri, Feb 25, 2022 at 5:06 AM Dilip Kumar  wrote:
> Based on this discussion, IIUC, we are saying that now we will do the
> lazy_scan_heap every time like we are doing now.  And we will
> conditionally skip the index vacuum for all or some of the indexes and
> then based on how much index vacuum is done we will conditionally do
> the lazy_vacuum_heap_rel().  Is my understanding correct?

I can only speak for myself, but that sounds correct to me. IMO what
we really want here is to create lots of options for VACUUM. To do the
work of index vacuuming when it is most convenient, based on very
recent information about what's going on in each index. There are some
specific obvious ways that it might help. For example, it would be
nice if the failsafe could not really skip index vacuuming -- it could
just put it off until later, after relfrozenxid has been advanced to a
safe value.

Bear in mind that the cost of lazy_scan_heap is often vastly less than
the cost of vacuuming all indexes -- and so doing a bit more work
there than theoretically necessary is not necessarily a problem.
Especially if you have something like UUID indexes, where there is no
natural locality. Many tables have 10+ indexes. Even large tables.

> IMHO, if we are doing the heap scan every time and then we are going
> to get the same dead items again which we had previously collected in
> the conveyor belt.  I agree that we will not add them again into the
> conveyor belt but why do we want to store them in the conveyor belt
> when we want to redo the whole scanning again?

I don't think we want to, exactly. Maybe it's easier to store
redundant TIDs than to avoid storing them in the first place (we can
probably just accept some redundancy). There is a trade-off to be made
there. I'm not at all sure of what the best trade-off is, though.

> I think (without global indexes) the main advantage of using the
> conveyor belt is that if we skip the index scan for some of the
> indexes then we can save the dead item somewhere so that without
> scanning the heap again we have those dead items to do the index
> vacuum sometime in future

Global indexes are important in their own right, but ISTM that they
have similar needs to other things anyway. Having this flexibility is
even more important with global indexes, but the concepts themselves
are similar. We want options and maximum flexibility, everywhere.

> but if you are going to rescan the heap
> again next time before doing any index vacuuming then why do we want to
> store them anyway.

It all depends, of course. The decision needs to be made using a cost
model. I suspect it will be necessary to try it out, and see.

--
Peter Geoghegan




Re: BufferAlloc: don't take two simultaneous locks

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote:
> > > +  * The usage_count starts out at 1 so that the buffer can survive one
> > > +  * clock-sweep pass.
> > > +  *
> > > +  * We use direct atomic OR instead of Lock+Unlock since no other backend
> > > +  * could be interested in the buffer. But StrategyGetBuffer,
> > > +  * Flush*Buffers, Drop*Buffers are scanning all buffers and locks them to
> > > +  * compare tag, and UnlockBufHdr does raw write to state. So we have to
> > > +  * spin if we found buffer locked.
> > 
> > So basically the first half of of the paragraph is wrong, because no, we
> > can't?
> 
> Logically, there are no backends that could be interested in the buffer.
> Physically they do LockBufHdr/UnlockBufHdr just to check they are not
> interested.

Yea, but that's still being interested in the buffer...


> > > +  * Note that we write tag unlocked. It is also safe since there is
> > > +  * always a check for BM_VALID when the tag is compared.
> > 
> > 
> > >*/
> > >   buf->tag = newTag;
> > > - buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
> > > -BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
> > > -BUF_USAGECOUNT_MASK);
> > >   if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
> > > - buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > > + new_bits = BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > >   else
> > > - buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > -
> > > - UnlockBufHdr(buf, buf_state);
> > > + new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > >  
> > > - if (oldPartitionLock != NULL)
> > > + buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits);
> > > + while (unlikely(buf_state & BM_LOCKED))
> > 
> > I don't think it's safe to atomic in arbitrary bits. If somebody else has
> > locked the buffer header in this moment, it'll lead to completely bogus
> > results, because unlocking overwrites concurrently written contents (which
> > there shouldn't be any, but here there are)...
> 
> That is why there is a safety loop for the case buf->state was locked just
> after the first optimistic atomic_fetch_or. 99.999% of the time this loop
> will have no work to do. But in case another backend did lock buf->state,
> the loop waits until it releases the lock and retries atomic_fetch_or.

> > And or'ing contents in also doesn't make sense because we it doesn't work to
> > actually unset any contents?
> 
> Sorry, I didn't understand sentence :((


You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in
BM_LOCKED. ORing BM_LOCKED is fine:
Either the buffer is not already locked, in which case it just sets the
BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because
BM_LOCKED already was set.

But OR'ing in multiple bits is *not* fine, because it'll actually change the
contents of ->state while the buffer header is locked.


> > Why don't you just use LockBufHdr/UnlockBufHdr?
> 
> This pair makes two atomic writes to memory. Two writes are heavier than
> one write in this version (if the optimistic case succeeds).

UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an
unlocked write.

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 17:28:41 +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 5:20 PM Andres Freund  wrote:
> > On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> > > This patch adds a configuration parameter jit_warn_above_fraction that
> > > will cause a warning to be logged if the fraction of time spent on
> > > doing JIT is bigger than the specified one. For example, this can be
> > > used to track down those cases where JIT ends up taking 90% of the
> > > query runtime because of bad estimates...
> >
> > Hm. Could it make sense to do this as a auto_explain feature?
> 
> It could be. But I was looking for something a lot more "light weight"
> than having to install an extension. But yes, if we wanted to, we
> could certainly change jit_warn_above_fraction to be
> auto_explain.log_min_jit_fraction or something like that, and do
> basically the same thing.  But then, we could also have
> log_min_duration_statement be part of auto_explain instead, so it's
> all about where to draw the line :)

I guess it feels a tad on the "too narrow/specific" side of things for the
general code. We don't have log_min_duration_{parsing,planning,execution}
either.  But I also get it. So I just wanted to raise it ;)

Greetings,

Andres Freund




Mingw task for Cirrus CI

2022-02-25 Thread Melih Mutlu
Hi All,

I've been working on adding Windows+MinGW environment into cirrus-ci tasks
(discussion about ci is here [1]).
It uses MSYS2 to set the environment. UCRT is chosen as C standard library,
instead of MSVCRT.
The task configures postgres with features that are available in MSYS2 (see
available packages [2]) and tap tests are enabled.
I already added the necessary docker image, you can find the related PR at
[3] and a successful cirrus-ci run with these changes at [4].
The attached patch adds a task that runs on Windows with MinGW for cirrus-ci.

However, I cannot run configure with --with-python, --with-perl or
--with-tcl.
There are two issues I encountered while trying to enable them, e.g. for
--with-python

1-  python_ldlibrary is set to "libpython3.9.dll.a". So the related line in
configure cannot retrieve "ldlibrary"
2-  src/pl/plpython/Makefile looks under "C:/Windows/system32/" for
PYTHONDLL. gendef cannot open that file, giving an error like "failed to
open()" when creating a .def file.

To overcome those, I added the correct pattern to extract ldlibrary by
appending  "-e 's/\.dll.a$//'" at the end of the related line.
Then, I tried to use python dll from MSYS instead of windows/system32 by
changing PYTHONDLL path to "/ucrt64/bin/libpython3.9.dll". I'm not sure if
this is the correct dll.
Here is the diff of these two changes:

diff --git a/configure b/configure
index f3cb5c2b51..42ea580442 100755
--- a/configure
+++ b/configure
@@ -10536,7 +10536,7 @@ python_libdir=`${PYTHON} -c "import sysconfig; print(' '.join(filter(None,sysconfig.get_config_vars('LIBDIR'))))"`
 python_ldlibrary=`${PYTHON} -c "import sysconfig; print(' '.join(filter(None,sysconfig.get_config_vars('LDLIBRARY'))))"`

 # If LDLIBRARY exists and has a shlib extension, use it verbatim.
-ldlibrary=`echo "${python_ldlibrary}" | sed -e 's/\.so$//' -e 's/\.dll$//' -e 's/\.dylib$//' -e 's/\.sl$//'`
+ldlibrary=`echo "${python_ldlibrary}" | sed -e 's/\.so$//' -e 's/\.dll$//' -e 's/\.dylib$//' -e 's/\.sl$//' -e 's/\.dll.a$//'`
 if test -e "${python_libdir}/${python_ldlibrary}" -a x"${python_ldlibrary}" != x"${ldlibrary}"
 then
ldlibrary=`echo "${ldlibrary}" | sed "s/^lib//"`
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index a83ae8865c..4254ef94d7 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -61,7 +61,8 @@ INCS =plpython.h \
 ifeq ($(PORTNAME), win32)

 pytverstr=$(subst .,,${python_version})
-PYTHONDLL=$(subst \,/,$(WINDIR))/system32/python${pytverstr}.dll
+#PYTHONDLL=$(subst \,/,$(WINDIR))/system32/python${pytverstr}.dll
+PYTHONDLL=/ucrt64/bin/libpython3.9.dll

 OBJS += libpython${pytverstr}.a



In the end, make check-world still fails, even though I was able to run
configure and make without any obvious error.
I see a bunch of errors in tests like:
+ERROR:  language "plpython3u" does not exist
+HINT:  Use CREATE EXTENSION to load the language into the database.

Here is the logs from failed ci run:
https://api.cirrus-ci.com/v1/artifact/task/4645682031099904/log/build/src/pl/plpython/regression.diffs


Any thoughts on how postgres can be built with --with-python etc. on mingw?

Best,
Melih


[1]
https://www.postgresql.org/message-id/flat/20211001222752.wrz7erzh4cajvgp6%40alap3.anarazel.de

[2] https://packages.msys2.org/package/

[3] https://github.com/anarazel/pg-vm-images/pull/8

[4] https://cirrus-ci.com/build/4999469182746624

Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Julien Rouhaud
On Fri, Feb 25, 2022 at 05:38:45PM +0100, Magnus Hagander wrote:
>
> Per some off-list discussion with Julien, we have clearly been talking
> in slightly different terms. So let's summarize the options into what
> they'd actually be:
>
> Option 0: what is in the patch now
>
> Option 1:
> jit_count - number of executions using jit
> total_jit_time - for sum of functions+inlining+optimization+emission time
> min_jit_time - for sum of functions+inlining+optimization+emission time
> max_jit_time - for sum of functions+inlining+optimization+emission time
> mean_jit_time - for sum of functions+inlining+optimization+emission time
> stddev_jit_time - for sum of functions+inlining+optimization+emission time
> jit_functions - number of functions
> jit_inlining_count - number of executions where inlining happened
> jit_optimization_count - number of executions where optimization happened
> jit_emission_count - number of executions where emission happened
>
> Option 2:
> jit_count
> jit_functions
> jit_generation_time
> jit_inlining_count
> jit_inlining_time
> jit_optimization_count
> jit_optimization_time
> jit_emission_count
> jit_emission_time

I'm for option 2, I think it's important to have the timing details for
inlining and optimization and be able to compute correct stats.




Re: Readd use of TAP subtests

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 09:43:20 -0500, Andrew Dunstan wrote:
> AIUI TAP consumers are supposed to ignore lines they don't understand.

Are they?

In http://testanything.org/tap-version-13-specification.html there's:

"Lines written to standard output matching /^(not )?ok\b/ must be interpreted
as test lines. [...]All other lines must not be considered test output."

That says that all other lines aren't "test output". But what does that mean?
It certainly doesn't mean they can just be ignored, because obviously
^(TAP version|#|1..|Bail out) isn't to be ignored.

And then there's:


"
Anything else

  Any output that is not a version, a plan, a test line, a YAML block, a
  diagnostic or a bail out is incorrect. How a harness handles the incorrect
  line is undefined. Test::Harness silently ignores incorrect lines, but will
  become more stringent in the future. TAP::Harness reports TAP syntax errors at
  the end of a test run.
"

If I were to implement a tap parser this wouldn't make me ignore lines.


Contrasting to that:
http://testanything.org/tap-specification.html

"
Anything else

  A TAP parser is required to not consider an unknown line as an error but may
  optionally choose to capture said line and hand it to the test harness,
  which may have custom behavior attached. This is to allow for forward
  compatability. Test::Harness silently ignores incorrect lines, but will
  become more stringent in the future. TAP::Harness reports TAP syntax errors
  at the end of a test run.
"

I honestly don't know what to make of that. Parsers are supposed to ignore
unknown lines, Test::Harness silently ignores, but also "TAP::Harness reports
TAP syntax errors"?  This seems like a self contradictory mess.

Greetings,

Andres Freund




Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Dmitry Dolgov
> On Fri, Feb 25, 2022 at 04:19:27PM +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 2:33 PM Julien Rouhaud  wrote:
> >
> > Hi,
> >
> > On Fri, Feb 25, 2022 at 02:06:29PM +0100, Magnus Hagander wrote:
> > > Here's a patch to add the sum of timings for JIT counters to
> > > pg_stat_statements, as a way to follow-up on if JIT is doing a good or
> > > a bad job in a configuration.
> >
> > +1, it seems like something quite useful.
>
> Given the amount of time often spent debugging JIT -- getting more
> insight is going to make it easier to tune it instead of like what
> unfortunately many people do and just turn it off..

Indeed, sounds convenient, although I wonder how exactly one would use it
to tune JIT? I'm curious, because I got used to situations when one
single long query takes much longer than expected due to JIT issues --
but it seems the target of this patch is situations where there are a
lot of long queries using JIT, and it's easier to analyze them via pgss?

> > > I decided to only store the total time for the timings, since there
> > > are 4 different timings and storing max/min/etc for each one of them
> > > would lead to a bit too much data. This can of course be reconsidered,
> > > but I think that's a reasonable tradeoff.
> >
> > I think the cumulated total time is enough.  Looking at the patch, I think
> > we should also cumulate the number of times jit was triggered, and
> > probably the same for each other main operation (optimization and inlining).
> > Otherwise the values may be wrong and look artificially low.
>
> So just to be clear, you're basically thinking:
>
> jit_count = count of entries where jit_functions>0
> jit_functions = 
> jit_optimizations = count of entries where time spent on jit_optimizations > 0

One interesting thing I've noticed while reading it, not very relevant to
the patch itself: there seems to be no way to find out what fraction of
time jit_tuple_deforming is taking alone; it's merged together with
jit_expressions in generation_counter.




Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 4:41 PM Magnus Hagander  wrote:
>
> On Fri, Feb 25, 2022 at 4:40 PM Julien Rouhaud  wrote:
> >
> > On Fri, Feb 25, 2022 at 04:19:27PM +0100, Magnus Hagander wrote:
> > >
> > > So just to be clear, you're basically thinking:
> > >
> > > jit_count = count of entries where jit_functions>0
> > > jit_functions = 
> > > jit_optimizations = count of entries where time spent on
> > > jit_optimizations > 0
> > >
> > > etc?
> >
> > Yes exactly, so 3 new fields on top of the one you already added.
> >
> > > So we count the times with min/max like other times for the total one,
> > > but instead add a counter for each of the details?
> >
> > I don't understand this one.  Did you mean we *don't* count times with 
> > min/max?
> > If that's the case then yes :)
>
> Hmm. Yeah, that was a bit unclear. I mean we track total time with
> min/max, and detailed time not at all. For details, we only track
> count, not time.

Per some off-list discussion with Julien, we have clearly been talking
in slightly different terms. So let's summarize the options into what
they'd actually be:

Option 0: what is in the patch now

Option 1:
jit_count - number of executions using jit
total_jit_time - for sum of functions+inlining+optimization+emission time
min_jit_time - for sum of functions+inlining+optimization+emission time
max_jit_time - for sum of functions+inlining+optimization+emission time
mean_jit_time - for sum of functions+inlining+optimization+emission time
stddev_jit_time - for sum of functions+inlining+optimization+emission time
jit_functions - number of functions
jit_inlining_count - number of executions where inlining happened
jit_optimization_count - number of executions where optimization happened
jit_emission_count - number of executions where emission happened

Option 2:
jit_count
jit_functions
jit_generation_time
jit_inlining_count
jit_inlining_time
jit_optimization_count
jit_optimization_time
jit_emission_count
jit_emission_time


(We can bikeshed on the exact names of the fields once we have decided
which option is preferred)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Readd use of TAP subtests

2022-02-25 Thread Jacob Champion
On Fri, 2022-02-25 at 16:35 +, Jacob Champion wrote:
> On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
> > AIUI TAP consumers are supposed to ignore lines they don't understand.
> 
> It's undefined behavior [1]:

And of course the minute I send this I notice that I've linked the v13
spec instead of the original... sorry. Assuming Perl isn't marking its
tests as version 13, you are correct:

> Any output line that is not a version, a plan, a test line, a
> diagnostic or a bail out is considered an “unknown” line. A TAP
> parser is required to not consider an unknown line as an error but
> may optionally choose to capture said line and hand it to the test
> harness, which may have custom behavior attached. This is to allow
> for forward compatibility. Test::Harness silently ignores incorrect
> lines, but will become more stringent in the future. TAP::Harness
> reports TAP syntax errors at the end of a test run.

--Jacob


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Julien Rouhaud
On Fri, Feb 25, 2022 at 08:53:50PM +0530, Nitin Jadhav wrote:
> >
> > I'm not sure what Matthias meant, but as far as I know there's no 
> > fundamental
> > difference between checkpoint with and without the CHECKPOINT_IMMEDIATE 
> > flag,
> > and there's also no scheduling for multiple checkpoints.
> >
> > Yes, the flags will remain the same but checkpoint.c will test both the 
> > passed
> > flags and the shmem flags to see whether a delay should be added or not, 
> > which
> > is the only difference in checkpoint processing for this flag.  See the 
> > call to
> > ImmediateCheckpointRequested() which will look at the value in shmem:
> >
> >/*
> > * Perform the usual duties and take a nap, unless we're behind 
> > schedule,
> > * in which case we just try to catch up as quickly as possible.
> > */
> >if (!(flags & CHECKPOINT_IMMEDIATE) &&
> >!ShutdownRequestPending &&
> >!ImmediateCheckpointRequested() &&
> >IsCheckpointOnSchedule(progress))
> 
> I understand that the checkpointer considers flags as well as the
> shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the
> current checkpoint operation (No further delay) but does not change
> the current flag value. Should we display this change in the kind
> field of the view or not? Please share your thoughts.

I think the fields should be added.  It's good to know that a checkpoint was
trigger due to normal activity and should be spreaded, and then something
upgraded it to an immediate checkpoint.  If you're desperately waiting for the
end of a checkpoint for some reason and ask for an immediate checkpoint, you'll
certainly be happy to see that the checkpointer is aware of it.

But maybe I missed something in the code, so let's wait for Matthias' input
about it.




Re: Readd use of TAP subtests

2022-02-25 Thread Jacob Champion
On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
> AIUI TAP consumers are supposed to ignore lines they don't understand.

It's undefined behavior [1]:

> Any output that is not a version, a plan, a test line, a YAML block,
> a diagnostic or a bail out is incorrect. How a harness handles the
> incorrect line is undefined. Test::Harness silently ignores incorrect
> lines, but will become more stringent in the future. TAP::Harness
> reports TAP syntax errors at the end of a test run.

--Jacob

[1] https://testanything.org/tap-version-13-specification.html


Re: Add parameter jit_warn_above_fraction

2022-02-25 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 5:20 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> > This patch adds a configuration parameter jit_warn_above_fraction that
> > will cause a warning to be logged if the fraction of time spent on
> > doing JIT is bigger than the specified one. For example, this can be
> > used to track down those cases where JIT ends up taking 90% of the
> > query runtime because of bad estimates...
>
> Hm. Could it make sense to do this as a auto_explain feature?

It could be. But I was looking for something a lot more "light weight"
than having to install an extension. But yes, if we wanted to, we
could certainly change jit_warn_above_fraction to be
auto_explain.log_min_jit_fraction or something like that, and do
basically the same thing. But then, we could also have
log_min_duration_statement be part of auto_explain instead, so it's
all about where to draw the line :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-25 Thread Jacob Champion
On Thu, 2022-02-24 at 20:44 +, Jacob Champion wrote:
> That simplifies things. PFA a smaller v2; if pgaudit can't make use of
> libpq-be.h for some reason, then I guess we can tailor a fix to that
> use case.

Ha, opr_sanity caught my use of cstring. I'll work on a fix later
today.

--Jacob


Re: Readd use of TAP subtests

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 14:39:15 +0100, Peter Eisentraut wrote:
> On 24.02.22 16:00, Andres Freund wrote:
> > I've incidentally played with subtests yesterdays, when porting
> > src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems
> > that subtests aren't actually specified in the tap format, and that 
> > different
> > libraries generate different output formats. The reason this matters 
> > somewhat
> > is that meson's testrunner can parse tap and give nicer progress / error
> > reports. But since subtests aren't in the spec it can't currently parse
> > them...
>
> Ok that's good to know.  What exactly happens when it tries to parse them?
> Does it not count them or does it fail somehow?  The way the output is
> structured

It says that it can't parse a line of the TAP output:
16:06:55 MALLOC_PERTURB_=156 /usr/bin/perl /tmp/meson-test/build/../subtest.pl
--- output ---
stdout:
# Subtest: a
ok 1 - a: a
ok 2 - a: b
1..2
ok 1 - a
1..1
stderr:

TAP parsing error: unexpected input at line 4
--


> t/001_basic.pl ..
> # Subtest: vacuumlo --help
> ok 1 - exit code 0
> ok 2 - goes to stdout
> ok 3 - nothing to stderr
> 1..3
> ok 1 - vacuumlo --help
>
> it appears that it should be able to parse it nonetheless and should just
> count the non-indented lines.

It looks like it's not ignoring indented lines...

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-02-25 Thread Justin Pryzby
On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> + {
> + {"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
> + gettext_noop("Sets the fraction of query time spent on 
> JIT before writing"
> +  "a warning to the log."),
> + gettext_noop("Write a message tot he server log if more 
> than this"
> +  "fraction of the query runtime 
> is spent on JIT."
> +  "Zero turns off the warning.")
> + },
> + &jit_warn_above_fraction,
> + 0.0, 0.0, 1.0,
> + NULL, NULL, NULL
> + },

Should be PGC_USERSET ?

+   gettext_noop("Write a message tot he server log if more than this"

to the

+   if (jit_enabled && jit_warn_above_fraction > 0)
+   {
+       int64 jit_time =
+           INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
+           INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
+           INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
+           INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
+
+       if (jit_time > msecs * jit_warn_above_fraction)
+       {
+           ereport(WARNING,
+                   (errmsg("JIT time was %ld ms of %d ms",
+                           jit_time, msecs)));
+       }
+   }

I think it should be a NOTICE (or less?)

Would it be more useful if this were combined with log_min_duration_statement?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly.  You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

I should say that this is already available by processing the output of
auto_explain.

-- 
Justin




Re: Add parameter jit_warn_above_fraction

2022-02-25 Thread Andres Freund
Hi,

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> This patch adds a configuration parameter jit_warn_above_fraction that
> will cause a warning to be logged if the fraction of time spent on
> doing JIT is bigger than the specified one. For example, this can be
> used to track down those cases where JIT ends up taking 90% of the
> query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-02-25 Thread Julien Rouhaud
Hi,

On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> This patch adds a configuration parameter jit_warn_above_fraction that
> will cause a warning to be logged if the fraction of time spent on
> doing JIT is bigger than the specified one. For example, this can be
> used to track down those cases where JIT ends up taking 90% of the
> query runtime because of bad estimates...

I think that's tremendously useful, huge +1.

Just a few minor nits:

+ A value of 0 (the default)disables the warning.

missing space

+   ereport(WARNING,
+   (errmsg("JIT time was %ld ms of %d ms",
+   jit_time, msecs)));

"JIT time" may a bit obscure for users, how about "JIT total processing time"?"

+   gettext_noop("Sets the fraction of query time spent on JIT before writing"
+                "a warning to the log."),
+   gettext_noop("Write a message tot he server log if more than this"
+                "fraction of the query runtime is spent on JIT."
+                "Zero turns off the warning.")

missing spaces in the concatenated strings.

The rest of the patch looks good to me.




Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 4:40 PM Julien Rouhaud  wrote:
>
> On Fri, Feb 25, 2022 at 04:19:27PM +0100, Magnus Hagander wrote:
> >
> > So just to be clear, you're basically thinking:
> >
> > jit_count = count of entries where jit_functions>0
> > jit_functions = 
> > jit_optimizations = count of entries where time spent on
> > jit_optimizations > 0
> >
> > etc?
>
> Yes exactly, so 3 new fields on top of the one you already added.
>
> > So we count the times with min/max like other times for the total one,
> > but instead add a counter for each of the details?
>
> I don't understand this one.  Did you mean we *don't* count times with 
> min/max?
> If that's the case then yes :)

Hmm. Yeah, that was a bit unclear. I mean we track total time with
min/max, and detailed time not at all. For details, we only track
count, not time.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Julien Rouhaud
On Fri, Feb 25, 2022 at 04:19:27PM +0100, Magnus Hagander wrote:
> 
> So just to be clear, you're basically thinking:
> 
> jit_count = count of entries where jit_functions>0
> jit_functions = 
> jit_optimizations = count of entries where time spent on jit_optimizations > 0
> 
> etc?

Yes exactly, so 3 new fields on top of the one you already added.

> So we count the times with min/max like other times for the total one,
> but instead add a counter for each of the details?

I don't understand this one.  Did you mean we *don't* count times with min/max?
If that's the case then yes :)




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-02-25 Thread David Christensen
On Fri, Feb 25, 2022 at 7:08 AM Japin Li  wrote:

>
> On Fri, 25 Feb 2022 at 20:48, David Christensen <
> david.christen...@crunchydata.com> wrote:
> >> Cool.  I think we can report an error instead of reading wal files,
> >> if the tablespace, database, or relation is invalid.  Does there any
> >> WAL record that has invalid tablespace, database, or relation OID?
> >
> > The only sort of validity check we could do here is range checking for
> the underlying data types
> > (which we certainly could/should add if it’s known to never be valid for
> the underlying types);
>
> The invalid OIDs I mean here are things like negative numbers and zero; for
> those parameters we do not need to read the WAL files, since they are always
> invalid.
>

Agreed.  Can add some additional range validation to the parsed values.

David


Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-02-25 Thread David Christensen
On Fri, Feb 25, 2022 at 7:33 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Thanks for the patch. This is not adding something that users can't do
> right now, but definitely improves the usability of the pg_waldump as
> it avoids external filterings. Also, it can give the stats/info at
> table and block level. So, +1 from my side.
>

Thanks for the feedback; I will be incorporating most of this into a new
version, with a couple of responses below.


> 3) Crossing 80 char limit
>

This is neither here nor there, but have we as a project considered
increasing that to something more modern?  I know a lot of current projects
consider 132 to be a more reasonable limit.  (Will reduce it down to that
for now, but consider this a vote towards increasing that limit.)


> 5) I would also see a need for "filter by FPW" i.e. list all WAL
> records with "FPW".
>

Yes, that wouldn't be too hard to add to this; I can add it to the next
version.  We probably ought to also make the fork number specifiable as
well. Other thoughts could be some wildcard value in the relation part,
so `1234/23456/*` could filter WAL to a specific database only, say, or
some other multiple specifier, like `--block=1234,123456,121234`.  (I
honestly consider this to be more advanced than we'd need to support in
this patch, but it probably wouldn't be too hard to add.)

Thanks,

David


Re: Fix overflow in justify_interval related functions

2022-02-25 Thread Joseph Koshakow
Just checking because I'm not very familiar with the process,
are there any outstanding items that I need to do for this patch?

- Joe Koshakow




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Nitin Jadhav
> > Thank you Alvaro and Matthias for your views. I understand your point
> > of not updating the progress-report flag here as it just checks
> > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action
> > based on that but it doesn't change the checkpoint flags. I will
> > modify the code but I am a bit confused here. As per Alvaro, we need
> > to make the progress-report flag change in whatever is the place that
> > *requests* an immediate checkpoint. I feel this gives information
> > about the upcoming checkpoint not the current one. So updating here
> > provides wrong details in the view. The flags available during
> > CreateCheckPoint() will remain same for the entire checkpoint
> > operation and we should show the same information in the view till it
> > completes.
>
> I'm not sure what Matthias meant, but as far as I know there's no fundamental
> difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag,
> and there's also no scheduling for multiple checkpoints.
>
> Yes, the flags will remain the same but checkpoint.c will test both the passed
> flags and the shmem flags to see whether a delay should be added or not, which
> is the only difference in checkpoint processing for this flag.  See the call 
> to
> ImmediateCheckpointRequested() which will look at the value in shmem:
>
>/*
> * Perform the usual duties and take a nap, unless we're behind 
> schedule,
> * in which case we just try to catch up as quickly as possible.
> */
>if (!(flags & CHECKPOINT_IMMEDIATE) &&
>!ShutdownRequestPending &&
>!ImmediateCheckpointRequested() &&
>IsCheckpointOnSchedule(progress))

I understand that the checkpointer considers flags as well as the
shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the
current checkpoint operation (No further delay) but does not change
the current flag value. Should we display this change in the kind
field of the view or not? Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Fri, Feb 25, 2022 at 12:33 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Fri, Feb 25, 2022 at 12:23:27AM +0530, Nitin Jadhav wrote:
> > > I think the change to ImmediateCheckpointRequested() makes no sense.
> > > Before this patch, that function merely inquires whether there's an
> > > immediate checkpoint queued.  After this patch, it ... changes a
> > > progress-reporting flag?  I think it would make more sense to make the
> > > progress-report flag change in whatever is the place that *requests* an
> > > immediate checkpoint rather than here.
> > >
> > > > diff --git a/src/backend/postmaster/checkpointer.c 
> > > > b/src/backend/postmaster/checkpointer.c
> > > > +ImmediateCheckpointRequested(int flags)
> > > >  if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> > > > +{
> > > > +updated_flags |= CHECKPOINT_IMMEDIATE;
> > >
> > > I don't think that these changes are expected behaviour. In this
> > > condition, the currently running checkpoint is still not 'immediate',
> > > but it is going to hurry up for a new, actually immediate checkpoint.
> > > Those are different kinds of checkpoint handling, and I don't think
> > > you should modify the reported flags to show that we're going to do
> > > stuff faster than usual. Maybe maintain a separate 'upcoming
> > > checkpoint flags' field instead?
> >
> > Thank you Alvaro and Matthias for your views. I understand your point
> > of not updating the progress-report flag here as it just checks
> > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action
> > based on that but it doesn't change the checkpoint flags. I will
> > modify the code but I am a bit confused here. As per Alvaro, we need
> > to make the progress-report flag change in whatever is the place that
> > *requests* an immediate checkpoint. I feel this gives information
> > about the upcoming checkpoint not the current one. So updating here
> > provides wrong details in the view. The flags available during
> > CreateCheckPoint() will remain same for the entire checkpoint
> > operation and we should show the same information in the view till it
> > completes.
>
> I'm not sure what Matthias meant, but as far as I know there's no fundamental
> difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag,
> and there's also no scheduling for multiple checkpoints.
>
> Yes, the flags will remain the same but checkpoint.c will test both the passed
> flags and the shmem flags to see whether a delay should be added or not, which
> is the only difference in checkpoint processing for this flag.  See the call 
> to
> ImmediateCheckpointRequested() which will look at the value in shmem:
>
> /*
>  * Perform the usual duties and take a nap, unless we're behind 
> schedule,
>  * in which case we just try to catch up as quickly as possible.
>  */
> if (!(flags & CHECKPOINT_IMMEDIATE) &&
>   

Re: Expose JIT counters/timing in pg_stat_statements

2022-02-25 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 2:33 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Fri, Feb 25, 2022 at 02:06:29PM +0100, Magnus Hagander wrote:
> > Here's a patch to add the sum of timings for JIT counters to
> > pg_stat_statements, as a way to follow-up on if JIT is doing a good or
> > a bad job in a configuration.
>
> +1, it seems like something quite useful.

Given the amount of time often spent debugging JIT, getting more
insight is going to make it easier to tune it, instead of doing what
unfortunately many people do and just turning it off...


> > I decided to only store the total time for the timings, since there
> > are 4 different timings and storing max/min/etc for each one of them
> > would lead to a bit too much data. This can of course be reconsidered,
> > but I think that's a reasonable tradeoff.
>
> I think the cumulated total time is enough.  Looking at the patch, I think
> we should also cumulate the number of times JIT was triggered, and probably
> the same for each of the other main operations (optimization and inlining).
> Otherwise the values may be wrong and look artificially low.

So just to be clear, you're basically thinking:

jit_count = count of entries where jit_functions>0
jit_functions = 
jit_optimizations = count of entries where time spent on jit_optimizations > 0

etc?

So we count the times with min/max like other times for the total one,
but instead add a counter for each of the details?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Size functions inconsistent results

2022-02-25 Thread Fabrízio de Royes Mello
On Fri, Feb 25, 2022 at 12:10 PM Japin Li  wrote:
>
>
> I think you forgot the index size of the toast table.
>
> with relations as (
>   select schemaname, relname, relid
>   from pg_stat_user_tables
>   where relname = 'test_size'
> ),
> sizes as (
>   select
> schemaname,
> r.relname,
>
> pg_total_relation_size(relid) AS total_bytes,
>
> pg_relation_size(relid, 'main') +
> pg_relation_size(relid, 'init') +
> pg_relation_size(relid, 'fsm') +
> pg_relation_size(relid, 'vm') AS heap_bytes,
> pg_indexes_size(relid) AS index_bytes,
> pg_table_size(reltoastrelid) + pg_indexes_size(reltoastrelid) AS toast_bytes
>   from relations r
>   join pg_class on pg_class.oid = r.relid
> )
> select
>   total_bytes, heap_bytes, index_bytes, toast_bytes,
>   (total_bytes = (heap_bytes+index_bytes+toast_bytes)) as "Equal?",
>   (total_bytes - (heap_bytes+index_bytes+toast_bytes)) as "Diff"
> from sizes;
>

Ah, perfect... thanks... makes sense, because pg_table_size doesn't compute
the index sizes. Now it works:

fabrizio=# with relations as (
  select schemaname, relname, relid
  from pg_stat_user_tables
  where relname = 'test_size'
),
sizes as (
  select
schemaname,
r.relname,

pg_total_relation_size(relid) AS total_bytes,

pg_relation_size(relid, 'main') +
pg_relation_size(relid, 'init') +
pg_relation_size(relid, 'fsm') +
pg_relation_size(relid, 'vm') AS heap_bytes,
pg_indexes_size(relid) AS index_bytes,
pg_table_size(reltoastrelid) + pg_indexes_size(reltoastrelid) AS toast_bytes
  from relations r
  join pg_class on pg_class.oid = r.relid
)
select
  total_bytes, heap_bytes, index_bytes, toast_bytes,
  (total_bytes = (heap_bytes+index_bytes+toast_bytes)) as "Equal?",
  (total_bytes - (heap_bytes+index_bytes+toast_bytes)) as "Diff"
from sizes;
 total_bytes | heap_bytes | index_bytes | toast_bytes | Equal? | Diff
-------------+------------+-------------+-------------+--------+------
    14622720 |      65536 |       40960 |    14516224 | t      |    0
(1 row)

Regards,

--
Fabrízio de Royes Mello


Add parameter jit_warn_above_fraction

2022-02-25 Thread Magnus Hagander
This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..7c3bc56227 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6592,6 +6592,21 @@ local0.*/var/log/postgresql

  
 
+ 
+  jit_warn_above_fraction (floating point)
+  
+   jit_warn_above_fraction configuration parameter
+  
+  
+   
+
+ Causes a warning to be written to the log if the time spent on JIT (see )
+ goes above this fraction of the total query runtime.
+ A value of 0 (the default)disables the warning.
+
+   
+ 
+
  
   log_startup_progress_interval (integer)
   
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index d429aa4663..a1bff893a3 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -196,6 +196,7 @@ HandleFunctionRequest(StringInfo msgBuf)
 	struct fp_info *fip;
 	bool		callit;
 	bool		was_logged = false;
+	int			msecs;
 	char		msec_str[32];
 
 	/*
@@ -305,7 +306,7 @@ HandleFunctionRequest(StringInfo msgBuf)
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, was_logged))
+	switch (check_log_duration(&msecs, msec_str, was_logged))
 	{
 		case 1:
 			ereport(LOG,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3c7d08209f..c0487ea67f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -961,7 +961,9 @@ exec_simple_query(const char *query_string)
 	bool		save_log_statement_stats = log_statement_stats;
 	bool		was_logged = false;
 	bool		use_implicit_block;
+	int			msecs;
 	char		msec_str[32];
+	int64		jit_time = 0;
 
 	/*
 	 * Report query to various monitoring facilities.
@@ -1220,6 +1222,16 @@ exec_simple_query(const char *query_string)
 
 		receiver->rDestroy(receiver);
 
+		/* Collect JIT timings in case it's active */
+		if (jit_enabled && jit_warn_above_fraction > 0 && portal->queryDesc && portal->queryDesc->estate->es_jit)
+		{
+			jit_time +=
+INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
+INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
+INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
+INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
+		}
+
 		PortalDrop(portal, false);
 
 		if (lnext(parsetree_list, parsetree_item) == NULL)
@@ -1290,7 +1302,7 @@ exec_simple_query(const char *query_string)
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, was_logged))
+	switch (check_log_duration(&msecs, msec_str, was_logged))
 	{
 		case 1:
 			ereport(LOG,
@@ -1306,6 +1318,16 @@ exec_simple_query(const char *query_string)
 			break;
 	}
 
+	if (jit_enabled && jit_warn_above_fraction > 0)
+	{
+		if (jit_time > msecs * jit_warn_above_fraction)
+		{
+			ereport(WARNING,
+	(errmsg("JIT time was %ld ms of %d ms",
+			jit_time, msecs)));
+		}
+	}
+
 	if (save_log_statement_stats)
 		ShowUsage("QUERY STATISTICS");
 
@@ -1333,6 +1355,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	CachedPlanSource *psrc;
 	bool		is_named;
 	bool		save_log_statement_stats = log_statement_stats;
+	int			msecs;
 	char		msec_str[32];
 
 	/*
@@ -1548,7 +1571,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, false))
+	switch (check_log_duration(&msecs, msec_str, false))
 	{
 		case 1:
 			ereport(LOG,
@@ -1595,6 +1618,7 @@ exec_bind_message(StringInfo input_message)
 	MemoryContext oldContext;
 	bool		save_log_statement_stats = log_statement_stats;
 	bool		snapshot_set = false;
+	int			msecs;
 	char		msec_str[32];
 	ParamsErrorCbData params_data;
 	ErrorContextCallback params_errcxt;
@@ -2007,7 +2031,7 @@ exec_bind_message(StringInfo input_message)
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, false))
+	switch (check_log_duration(&msecs, msec_str, false))
 	{
 		case 1:
 			ereport(LOG,
@@ -2053,6 +2077,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 	bool		is_xact_command;
 	bool		execute_is_fetch;
 	bool		was_logged = false;
+	int			msecs;
 	char		msec_str[32];
 	ParamsErrorCbData params_data;
 	ErrorContextCallback params_errcxt;
@@ -2244,7 +2269,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch 

Re: Size functions inconsistent results

2022-02-25 Thread Japin Li


On Fri, 25 Feb 2022 at 22:58, Fabrízio de Royes Mello  
wrote:
> Hi all,
>
> While doing some work using our functions [1] for calculate relations size
> I noticed an inconsistency between pg_total_relation_size and calculate
> everything separately, have a look in this example:
>
> fabrizio=# create table test_size (id bigserial primary key, toast_column
> text);
> CREATE TABLE
>
> fabrizio=# insert into test_size (toast_column)
>
>   select repeat('X'::text, pg_size_bytes('1MB')::integer)
>   from generate_series(1,1000);
> INSERT 0 1000
>
> fabrizio=# with relations as (
>   select schemaname, relname, relid
>   from pg_stat_user_tables
>   where relname = 'test_size'
> ),
> sizes as (
>   select
> schemaname,
> r.relname,
>
> pg_total_relation_size(relid) AS total_bytes,
>
> pg_relation_size(relid, 'main') +
> pg_relation_size(relid, 'init') +
> pg_relation_size(relid, 'fsm') +
> pg_relation_size(relid, 'vm') AS heap_bytes,
> pg_indexes_size(relid) AS index_bytes,
> pg_table_size(reltoastrelid) AS toast_bytes
>   from relations r
>   join pg_class on pg_class.oid = r.relid
> )
> select
>   total_bytes, heap_bytes, index_bytes, toast_bytes,
>   (total_bytes = (heap_bytes+index_bytes+toast_bytes)) as "Equal?",
>   (total_bytes - (heap_bytes+index_bytes+toast_bytes)) as "Diff"
> from sizes;
 total_bytes | heap_bytes | index_bytes | toast_bytes | Equal? |  Diff
-------------+------------+-------------+-------------+--------+--------
    14000128 |      90112 |       40960 |    13688832 | f      | 180224
(1 row)
>
> I want to calculate separately HEAP, INDEXES and TOAST (including indexes)
> sizes but it seems it's a bit inconsistent with pg_total_relation_size.
>
> Is it correct or am I missing something?
>

I think you forgot the index size of the toast table.

with relations as (
  select schemaname, relname, relid
  from pg_stat_user_tables
  where relname = 'test_size'
),
sizes as (
  select
schemaname,
r.relname,

pg_total_relation_size(relid) AS total_bytes,

pg_relation_size(relid, 'main') +
pg_relation_size(relid, 'init') +
pg_relation_size(relid, 'fsm') +
pg_relation_size(relid, 'vm') AS heap_bytes,
pg_indexes_size(relid) AS index_bytes,
pg_table_size(reltoastrelid) + pg_indexes_size(reltoastrelid) AS toast_bytes
  from relations r
  join pg_class on pg_class.oid = r.relid
)
select
  total_bytes, heap_bytes, index_bytes, toast_bytes,
  (total_bytes = (heap_bytes+index_bytes+toast_bytes)) as "Equal?",
  (total_bytes - (heap_bytes+index_bytes+toast_bytes)) as "Diff"
from sizes;

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Fix typo in logicalfuncs.c - :%s/private date/Private data

2022-02-25 Thread Bharath Rupireddy
Hi,

Here's a tiny patch to do $subject.

Regards,
Bharath Rupireddy.


v1-0001-Fix-typo-in-logicalfuncs.c.patch
Description: Binary data


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Nitin Jadhav
> +   if ((ckpt_flags &
> +(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0)
> +   {
>
> This code (present at multiple places) looks a little ugly to me, what
> we can do instead is add a macro probably named IsShutdownCheckpoint()
> which does the above check and use it in all the functions that have
> this check. See below:
>
> #define IsShutdownCheckpoint(flags) \
>  (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY) != 0)
>
> And then you may use this macro like:
>
> if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags))
>return;

Good suggestion. In the v3 patch, I have removed the corresponding
code as these checks are not required. Hence this suggestion is not
applicable now.
---

> pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT,
> InvalidOid);
> +
> +   val[0] = XLogCtl->InsertTimeLineID;
> +   val[1] = flags;
> +   val[2] = PROGRESS_CHECKPOINT_PHASE_INIT;
> +   val[3] = CheckpointStats.ckpt_start_t;
> +
> +   pgstat_progress_update_multi_param(4, index, val);
> +   }
>
> Any specific reason for recording the timelineID in checkpoint stats
> table? Will this ever change in our case?

The timelineID is used to decide whether the current operation is a
checkpoint or a restartpoint. There is a field in the view to display
this information.

Thanks & Regards,
Nitin Jadhav

On Wed, Feb 23, 2022 at 9:46 PM Ashutosh Sharma  wrote:
>
> +   if ((ckpt_flags &
> +(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0)
> +   {
>
> This code (present at multiple places) looks a little ugly to me, what
> we can do instead is add a macro probably named IsShutdownCheckpoint()
> which does the above check and use it in all the functions that have
> this check. See below:
>
> #define IsShutdownCheckpoint(flags) \
>   (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY) != 0)
>
> And then you may use this macro like:
>
> if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags))
> return;
>
> This change can be done in all these functions:
>
> +void
> +checkpoint_progress_start(int flags)
>
> --
>
> + */
> +void
> +checkpoint_progress_update_param(int index, int64 val)
>
> --
>
> + * Stop reporting progress of the checkpoint.
> + */
> +void
> +checkpoint_progress_end(void)
>
> ==
>
> +
> pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT,
> InvalidOid);
> +
> +   val[0] = XLogCtl->InsertTimeLineID;
> +   val[1] = flags;
> +   val[2] = PROGRESS_CHECKPOINT_PHASE_INIT;
> +   val[3] = CheckpointStats.ckpt_start_t;
> +
> +   pgstat_progress_update_multi_param(4, index, val);
> +   }
>
> Any specific reason for recording the timelineID in checkpoint stats
> table? Will this ever change in our case?
>
> --
> With Regards,
> Ashutosh Sharma.
>
> On Wed, Feb 23, 2022 at 6:59 PM Nitin Jadhav
>  wrote:
> >
> > > I will make use of pgstat_progress_update_multi_param() in the next
> > > patch to replace multiple calls to checkpoint_progress_update_param().
> >
> > Fixed.
> > ---
> >
> > > > The other progress tables use [type]_total as column names for counter
> > > > targets (e.g. backup_total for backup_streamed, heap_blks_total for
> > > > heap_blks_scanned, etc.). I think that `buffers_total` and
> > > > `files_total` would be better column names.
> > >
> > > I agree and I will update this in the next patch.
> >
> > Fixed.
> > ---
> >
> > > How about this "The checkpoint is started because max_wal_size is 
> > > reached".
> > >
> > > "The checkpoint is started because checkpoint_timeout expired".
> > >
> > > "The checkpoint is started because some operation forced a checkpoint".
> >
> > I have used the above description. Kindly let me know if any changes
> > are required.
> > ---
> >
> > > > > +  checkpointing CommitTs pages
> > > >
> > > > CommitTs -> Commit time stamp
> > >
> > > I will handle this in the next patch.
> >
> > Fixed.
> > ---
> >
> > > There are more scenarios where you can have a backend requesting a 
> > > checkpoint
> > > and waiting for its completion, and there may be more than one backend
> > > concerned, so I don't think that storing only one / the first backend pid 
> > > is
> > > ok.
> >
> > Thanks for this information. I am not considering backend_pid.
> > ---
> >
> > > I think all the information should be exposed.  Only knowing why the 
> > > current
> > > checkpoint has been triggered without any further information seems a bit
> > > useless.  Think for instance for cases like [1].
> >
> > I have supported all possible checkpoint kinds. Added
> > pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a
> > string representing a combination of flags and also checking for the
> > flag update in ImmediateCheckpointRequested() which checks whether
> > CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other
> > cases where the 

Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-25 Thread Bharath Rupireddy
On Thu, Jan 6, 2022 at 1:29 PM SATYANARAYANA NARLAPURAM
 wrote:
>
> Consider a cluster formation where we have a Primary(P), Sync Replica(S1), 
> and multiple async replicas for disaster recovery and read scaling (within 
> the region and outside the region). In this setup, S1 is the preferred 
> failover target in an event of the primary failure. When a transaction is 
> committed on the primary, it is not acknowledged to the client until the 
> primary gets an acknowledgment from the sync standby that the WAL is flushed 
> to the disk (assume the synchronous_commit configuration is remote_flush).
> However, walsenders corresponding to the async replicas on the primary don't
> wait for that flush acknowledgment and send the WAL to the async standbys
> (and to any logical replication/decoding clients). So it is possible for the
> async replicas and logical clients to get ahead of the sync replica.
> If a failover is initiated in such a scenario, to bring the formation into a 
> healthy state we have to either
>
>  - run pg_rewind on the async replicas for them to reconnect with the new
>    primary, or
>  - collect the latest WAL across the replicas and feed it to the standby.
>
> Both these operations are involved, error prone, and can cause multiple 
> minutes of downtime if done manually. In addition, there is a window where 
> the async replicas can show the data that was neither acknowledged to the 
> client nor committed on standby. Logical clients if they are ahead may need 
> to reseed the data as no easy rewind option for them.
>
> I would like to propose a GUC send_Wal_after_quorum_committed which when set 
> to ON, walsenders corresponds to async standbys and logical replication 
> workers wait until the LSN is quorum committed on the primary before sending 
> it to the standby. This not only simplifies the post failover steps but 
> avoids unnecessary downtime for the async replicas. Thoughts?

Thanks Satya and others for the inputs. Here's the v1 patch that
basically allows async wal senders to wait until the sync standbys
report their flush lsn back to the primary. Please let me know your
thoughts.

I've done pgbench testing to see if the patch causes any problems. I
ran the tests twice; there isn't much difference in transactions per
second (tps), although there is a delay in the async standby receiving
the WAL; after all, that's the feature we are pursuing.

[1]
HEAD or WITHOUT PATCH:
./pgbench -c 10 -t 500 -P 10 testdb
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 247.395 ms
latency stddev = 74.409 ms
initial connection time = 13.622 ms
tps = 39.713114 (without initial connection time)

PATCH:
./pgbench -c 10 -t 500 -P 10 testdb
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 251.757 ms
latency stddev = 72.846 ms
initial connection time = 13.025 ms
tps = 39.315862 (without initial connection time)

TEST SETUP:
primary in region 1
async standby 1 in the same region as that of the primary region 1
i.e. close to primary
sync standby 1 in region 2
sync standby 2 in region 3
an archive location in a region different from the primary and
standbys regions, region 4
Note that I intentionally kept sync standbys in regions far from
primary because it allows sync standbys to receive WAL a bit late by
default, which works well for our testing.

PGBENCH SETUP:
./psql -d postgres -c "drop database testdb"
./psql -d postgres -c "create database testdb"
./pgbench -i -s 100 testdb
./psql -d testdb -c "\dt"
./psql -d testdb -c "SELECT pg_size_pretty(pg_database_size('testdb'))"
./pgbench -c 10 -t 500 -P 10 testdb

Regards,
Bharath Rupireddy.


v1-0001-Allow-async-standbys-wait-for-sync-replication.patch
Description: Binary data


Size functions inconsistent results

2022-02-25 Thread Fabrízio de Royes Mello
Hi all,

While doing some work using our functions [1] to calculate relation sizes,
I noticed an inconsistency between pg_total_relation_size and calculating
everything separately; have a look at this example:

fabrizio=# create table test_size (id bigserial primary key, toast_column
text);
CREATE TABLE

fabrizio=# insert into test_size (toast_column)

  select repeat('X'::text, pg_size_bytes('1MB')::integer)
  from generate_series(1,1000);
INSERT 0 1000

fabrizio=# with relations as (
  select schemaname, relname, relid
  from pg_stat_user_tables
  where relname = 'test_size'
),
sizes as (
  select
schemaname,
r.relname,

pg_total_relation_size(relid) AS total_bytes,

pg_relation_size(relid, 'main') +
pg_relation_size(relid, 'init') +
pg_relation_size(relid, 'fsm') +
pg_relation_size(relid, 'vm') AS heap_bytes,
pg_indexes_size(relid) AS index_bytes,
pg_table_size(reltoastrelid) AS toast_bytes
  from relations r
  join pg_class on pg_class.oid = r.relid
)
select
  total_bytes, heap_bytes, index_bytes, toast_bytes,
  (total_bytes = (heap_bytes+index_bytes+toast_bytes)) as "Equal?",
  (total_bytes - (heap_bytes+index_bytes+toast_bytes)) as "Diff"
from sizes;
 total_bytes | heap_bytes | index_bytes | toast_bytes | Equal? |  Diff
-------------+------------+-------------+-------------+--------+--------
    14000128 |      90112 |       40960 |    13688832 | f      | 180224
(1 row)

I want to calculate separately HEAP, INDEXES and TOAST (including indexes)
sizes but it seems it's a bit inconsistent with pg_total_relation_size.

Is it correct or am I missing something?

Regards,

[1]
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE

-- 
Fabrízio de Royes Mello


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-25 Thread Nitin Jadhav
> Thank you Alvaro and Matthias for your views. I understand your point
> of not updating the progress-report flag here as it just checks
> whether the CHECKPOINT_IMMEDIATE is set or not and takes an action
> based on that but it doesn't change the checkpoint flags. I will
> modify the code but I am a bit confused here. As per Alvaro, we need
> to make the progress-report flag change in whatever is the place that
> *requests* an immediate checkpoint. I feel this gives information
> about the upcoming checkpoint not the current one. So updating here
> provides wrong details in the view. The flags available during
> CreateCheckPoint() will remain same for the entire checkpoint
> operation and we should show the same information in the view till it
> completes. So just removing the above piece of code (modified in
> ImmediateCheckpointRequested()) in the patch will make it correct. My
> opinion about maintaining a separate field to show upcoming checkpoint
> flags is it makes the view complex. Please share your thoughts.

I have modified the code accordingly.
---

> I think the use of capitals in CHECKPOINT and CHECKPOINTER in the
> documentation is excessive.

Fixed. Here the word CHECKPOINT represents the command/checkpoint
operation. If we treat it as a checkpoint operation, I agree to use
lowercase, but if we treat it as a command, then I think uppercase is
recommended (Refer
https://www.postgresql.org/docs/14/sql-checkpoint.html). Is it ok to
always use lowercase here?
---

> (Same for terms such as MULTIXACT and
> others in those docs; we typically use those in lowercase when
> user-facing; and do we really use term CLOG anymore? Don't we call it
> "commit log" nowadays?)

I have observed the CLOG term in the existing documentation. Anyways I
have changed MULTIXACT to multixact, SUBTRANS to subtransaction and
CLOG to commit log.
---

> +   Whenever the checkpoint operation is running, the
> +   pg_stat_progress_checkpoint view will contain a
> +   single row indicating the progress of the checkpoint. The tables below
>
> Maybe it should show a single row , unless the checkpointer isn't running at
> all (like in single user mode).

Nice thought. Can we add an additional checkpoint phase like 'Idle'?
Idle is ON whenever the checkpointer process is running and there is
no ongoing checkpoint. Thoughts?
---

> +   Process ID of a CHECKPOINTER process.
>
> It's *the* checkpointer process.

Fixed.
---

> pgstatfuncs.c has a whitespace issue (tab-space).

I have verified with 'git diff --check' and also manually. I did not
find any issue. Kindly mention the specific code which has an issue.
---

> I suppose the functions should set provolatile.

Fixed.
---

> > I am storing the checkpoint start timestamp in the st_progress_param[]
> > and this gets set only once during the checkpoint (at the start of the
> > checkpoint). I have added function
> > pg_stat_get_progress_checkpoint_elapsed() which calculates the elapsed
> > time and returns a string. This function gets called whenever
> > pg_stat_progress_checkpoint view is queried. Kindly refer v2 patch and
> > share your thoughts.
>
> I dislike the lack of access to the actual value of the checkpoint
> start / checkpoint elapsed field.
>
> As a user, if I query the pg_stat_progress_* views, my terminal or
> application can easily interpret an `interval` value and cast it to
> string, but the opposite is not true: the current implementation for
> pg_stat_get_progress_checkpoint_elapsed loses precision. This is why
> we use typed numeric fields in effectively all other places instead of
> stringified versions of the values: oid fields, counters, etc are all
> rendered as bigint in the view, so that no information is lost and
> interpretation is trivial.

Displaying start time of the checkpoint.
---

> > I understand that the log based reporting is very costly and very
> > frequent updates are not advisable.  I am planning to use the existing
> > infrastructure of 'log_startup_progress_interval' which provides an
> > option for the user to configure the interval between each progress
> > update. Hence it avoids frequent updates to server logs. This approach
> > is used only during shutdown and end-of-recovery cases because we
> > cannot access pg_stat_progress_checkpoint view during those scenarios.
>
> I see; but log_startup_progress_interval seems to be exclusively
> consumed through the ereport_startup_progress macro. Why put
> startup/shutdown logging on the same path as the happy flow of normal
> checkpoints?

You mean to say while updating the progress of the checkpoint, call
pgstat_progress_update_param() and then call
ereport_startup_progress() ?

> I think that, instead of looking to what might at some point be added,
> it is better to use the currently available functions instead, and
> move to new functions if and when the log-based reporting requires it.

Makes sense. Removing checkpoint_progress_update_param() and
checkpoint_progress_end(). I would like 

Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Greg Stark
On Tue, 25 Jan 2022 at 01:32, Andres Freund  wrote:
>
> Hi,
>
> I was looking the shared memory stats patch again.

Can you point me to this thread? I looked for it but couldn't find it.


-- 
greg




Re: Readd use of TAP subtests

2022-02-25 Thread Andrew Dunstan


On 2/25/22 08:39, Peter Eisentraut wrote:
> On 24.02.22 16:00, Andres Freund wrote:
>> I've incidentally played with subtests yesterdays, when porting
>> src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it
>> seems
>> that subtests aren't actually specified in the tap format, and that
>> different
>> libraries generate different output formats. The reason this matters
>> somewhat
>> is that meson's testrunner can parse tap and give nicer progress / error
>> reports. But since subtests aren't in the spec it can't currently parse
>> them...
>
> Ok that's good to know.  What exactly happens when it tries to parse
> them?  Does it not count them or does it fail somehow?  The way the
> output is structured
>
> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3
> ok 1 - vacuumlo --help
>
> it appears that it should be able to parse it nonetheless and should
> just count the non-indented lines.


AIUI TAP consumers are supposed to ignore lines they don't understand.
The Node TAP setup produces output like this, so perl is hardly alone
here. See 


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: set TESTDIR from perl rather than Makefile

2022-02-25 Thread Andrew Dunstan


On 2/24/22 20:17, Justin Pryzby wrote:
> On Mon, Feb 21, 2022 at 07:00:54AM -0500, Andrew Dunstan wrote:
>> On 2/19/22 18:53, Justin Pryzby wrote:
>>> On Sat, Feb 19, 2022 at 05:41:49PM -0600, Justin Pryzby wrote:
 I rebased and fixed the check-guc script to work, made it work with vpath
 builds, and cleaned it up some.
>>> I also meant to also attach it.
>> This is going to break a bunch of stuff as written.
>>
>> First, it's not doing the same thing. The current system sets TESTDIR to
>> be the parent of the directory that holds the test. e.g. for
>> src/bin/pg_ctl/t/001_start_stop.pl it's src/bin/pg_ctl in the build
>> tree, not the 't' subdirectory. This patch apparently sets it to the 't'
>> subdirectory. That will break anything that goes looking for log files
>> in the current location, like the buildfarm client, and possibly some CI
>> setups as well.
> Yes, thanks.
>
> I changed the patch to use ENV{CURDIR} || dirname(dirname($0)).  If I'm not
> wrong, that seems to be doing the right thing.
>
>> Also, unless I'm mistaken it appears to to the wrong thing for vpath
>> builds:
>>
>> my $test_dir = File::Spec->rel2abs(dirname($0));
>>
>> is completely wrong for vpaths, since that will place it in the source
>> tree, not the build tree.
>>
>> Last, and for no explained reason that I can see, the patch undoes
>> commit f4ce6c4d3a, but only for msvc builds. Even if that's justified it
>> appears to have no relevance to this patch.
> Andres' idea is that perl should set TESTDIR and PATH.  Here I commented out
> PATH, and had the improbable result that nothing seemed to break,
> including the pipeline test under msvc.  It'd be helpful to know which
> configuration breaks so I can test that it's broken and then test that
> it's fixed when PATH is set from within perl...


I'm fairly sure what's broken here is the MSVC install procedure, which
is massively overeager. We should fix that rather than take it for granted.


>
> I got busy here, and may not be able to progress this for awhile.



You have fixed the vpath issue. But the changes in vcregress.pl look
wrong to me.


-    $ENV{TESTDIR} = "$dir";


If we set PG_SUBDIR in the Makefile shouldn't we set it here too? Yes it
probably doesn't matter as our MSVC build system doesn't support vpath
builds, but it's good to maintain as much equivalence between the two as
possible. (Supporting vpath builds for MSVC might be a nice
student/intern project)


 my $module = basename $dir;
-    # add the module build dir as the second element in the PATH
-    $ENV{PATH} =~ s!;!;$topdir/$Config/$module;!;


See above comment re msvc install. If we're not reverting f4ce6c4d3a in
the Makefile we shouldn't be reverting here either.


+    # $ENV{VCREGRESS_MODE} = $Config;


Why is this commented out line here?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Readd use of TAP subtests

2022-02-25 Thread Peter Eisentraut

On 24.02.22 16:00, Andres Freund wrote:

I've incidentally played with subtests yesterdays, when porting
src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems
that subtests aren't actually specified in the tap format, and that different
libraries generate different output formats. The reason this matters somewhat
is that meson's testrunner can parse tap and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...


Ok that's good to know.  What exactly happens when it tries to parse 
them?  Does it not count them or does it fail somehow?  The way the 
output is structured


t/001_basic.pl ..
# Subtest: vacuumlo --help
    ok 1 - exit code 0
    ok 2 - goes to stdout
    ok 3 - nothing to stderr
    1..3
ok 1 - vacuumlo --help

it appears that it should be able to parse it nonetheless and should 
just count the non-indented lines.




