Re: full-text search with GiST indexes seems to be getting slower since 9.1

2020-05-02 Thread Tom Lane
Tomas Vondra  writes:
> over the past couple of weeks I've been running various benchmarks with
> the intent to demonstrate how the performance evolved since ~8.3. In
> most tests we're doing pretty good, but I've noticed that full-text
> search using GiST indexes is an annoying exception - it's getting slower
> and slower since ~9.1.

Hmmm... given that TS_execute is near the top, I wonder if this has
anything to do with the several times we've mostly-rewritten that
to make it less incorrect for phrase search cases.  But that came in
in 9.6 IIRC, and the fixes have only been back-patched that far.
Don't know what might've changed in 9.3.

regards, tom lane




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-05-02 Thread Justin Pryzby
On Sun, Apr 12, 2020 at 01:53:40PM +0200, Fabien COELHO wrote:
> About v15, seen as one patch.

Thanks for looking.

> - I'm wondering whether pg_stat_file could call pg_ls_dir_files without
>   too much effort? ISTM that the output structure is nearly the same. I do
>   not like much having one function specialized for files and one for
>   directories.

I refactored but not like that.  As I mentioned in the commit message, I don't
see a good way to make a function operate on a file when the function's primary
data structure is a DIR*.  Do you?  I don't think it should call stat() and
then conditionally branch off to pg_stat_file().

There are two functions because they wrap two separate syscalls, which I see
as a good, transparent goal.  If we want a function that does what "ls -al"
does, that would also be a good example to follow, except that we already
didn't follow it.

/bin/ls first stat()s the path, and then either outputs its metadata (if it's a
file or if -d was specified) or lists a dir.  It's essentially a wrapper around
*two* system calls (stat and readdir/getdents).

Maybe we could invent a new pg_ls() which does that, and then refactor existing
code.  Or, maybe it would be a SQL function which calls stat() and then
conditionally calls pg_ls_dir if isdir=True (or type='d').  That would be easy
if we merge the commit which outputs all stat fields.
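
For illustration, that stat-then-dispatch flow looks roughly like this as a
standalone C sketch (plain POSIX, not backend code; pg_ls_like() is a made-up
name):

    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /*
     * Hypothetical "ls"-style wrapper: stat() the path first, then either
     * report the file's own metadata or, for a directory, list its entries.
     */
    static int
    pg_ls_like(const char *path)
    {
        struct stat st;
        DIR        *dir;
        struct dirent *de;

        if (stat(path, &st) < 0)
        {
            perror(path);
            return -1;
        }

        if (!S_ISDIR(st.st_mode))
        {
            /* plain file: one row of metadata, like pg_stat_file() */
            printf("%s\t%lld\t%d\n", path,
                   (long long) st.st_size, S_ISDIR(st.st_mode) ? 1 : 0);
            return 0;
        }

        /* directory: one row per entry, like pg_ls_dir_files() */
        dir = opendir(path);
        if (dir == NULL)
        {
            perror(path);
            return -1;
        }
        while ((de = readdir(dir)) != NULL)
            printf("%s\n", de->d_name);
        closedir(dir);
        return 0;
    }

    int
    main(int argc, char **argv)
    {
        return (argc > 1 && pg_ls_like(argv[1]) < 0) ? 1 : 0;
    }

A SQL-level pg_ls() could follow the same shape: one stat probe, then either
the single-row metadata path or the per-entry listing path.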

I'm still hoping for confirmation from a committer if this approach is worth
pursuing:

https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com
https://www.postgresql.org/message-id/20200313131232.GO29065%40telsasoft.com
|Rather than making "recurse into tmpdir" the end goal:
|
|  - add a function to show metadata of an arbitrary dir;
|  - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not
|pg_ls_dir).
|  - maybe add pg_ls_dir_recurse, which satisfies the original need;
|  - retire pg_ls_dir (does this work with tuplestore?)
|  - profit

-- 
Justin
>From 6175cecd312296bbb9099834d91ffaa50e059f6c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v16 01/10] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 96ea57eedd..9b885102da 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25486,7 +25486,8 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 size, last accessed time stamp, last modified time stamp,
 last file status change time stamp (Unix platforms only),
 file creation time stamp (Windows only), and a boolean
-indicating if it is a directory.  Typical usages include:
+indicating if it is a directory (or a symbolic link to a directory).
+Typical usages include:
 
 SELECT * FROM pg_stat_file('filename');
 SELECT (pg_stat_file('filename')).modification;
-- 
2.17.0

>From 52fd41bfacaba66f26f1e75b34e258b3a0f12372 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 30 Mar 2020 18:59:16 -0500
Subject: [PATCH v16 02/10] pg_stat_file and pg_ls_dir_* to use lstat()..

pg_ls_dir_* will now skip (no longer show) symbolic links, same as other
non-regular file types, as we advertise we do since 8b6d94cf6.  That seems to
be the intended behavior, since irregular file types are 1) less portable; and,
2) we don't currently show a file's type except for "bool is_dir".

pg_stat_file will now 1) show metadata of links themselves, rather than their
target; and, 2) specifically, show links to directories with "is_dir=false";
and, 3) not error on broken symlinks.
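
(For reference, the stat()/lstat() distinction this change relies on, shown as
a standalone POSIX sketch rather than the patched genfile.c:)

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /*
     * stat() follows symlinks: a link to a directory reports is_dir = true and
     * a broken link is an error.  lstat() describes the link itself: is_dir
     * comes back false and broken links still return metadata.
     */
    static void
    describe(const char *path)
    {
        struct stat st;

        if (lstat(path, &st) == 0)
            printf("lstat: %s is_dir=%d is_link=%d\n", path,
                   S_ISDIR(st.st_mode) ? 1 : 0, S_ISLNK(st.st_mode) ? 1 : 0);

        if (stat(path, &st) == 0)
            printf("stat:  %s is_dir=%d (link followed)\n", path,
                   S_ISDIR(st.st_mode) ? 1 : 0);
        else
            perror(path);       /* e.g. a dangling symlink */
    }

    int
    main(int argc, char **argv)
    {
        if (argc > 1)
            describe(argv[1]);
        return 0;
    }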
---
 doc/src/sgml/func.sgml  | 2 +-
 src/backend/utils/adt/genfile.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9b885102da..96b08d0500 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25486,7 +25486,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 size, last accessed time stamp, last modified time stamp,
 last file status change time stamp (Unix platforms only),
 file creation time stamp (Windows only), and a boolean
-indicating if it is a directory (or a symbolic link to a directory).
+indicating if it is a directory.
 Typical usages include:
 
 SELECT * FROM pg_stat_file('filename');
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa6180da..219ac160f8 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -370,7 +370,7 @@ pg_stat_file(PG_FUNCTION_ARGS)
 
 	filename = convert_and_check_filename(filename_t);
 
-	if (stat(filename, &fst) < 0)
+	if (lstat(filename, &fst) < 0)
 	{
 		if (missing_ok && errno == ENOENT)
 			PG_RETURN_NULL();
@@ -596,7 +596,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 
 		/* Get the 

Re: design for parallel backup

2020-05-02 Thread Andres Freund
Hi,

On 2020-05-01 16:32:15 -0400, Robert Haas wrote:
> On Thu, Apr 30, 2020 at 6:06 PM Robert Haas  wrote:
> > On Thu, Apr 30, 2020 at 3:52 PM Andres Freund  wrote:
> > > Why 8kb? That's smaller than what we currently do in pg_basebackup,
> > > afaictl, and you're actually going to be bottlenecked by syscall
> > > overhead at that point (unless you disable / don't have the whole intel
> > > security mitigation stuff).
> >
> > I just picked something. Could easily try other things.
> 
> I tried changing the write size to 64kB, keeping the rest the same.
> Here are the results:
> 
> filesystem media 1@64GB 2@32GB 4@16GB 8@8GB 16@4GB
> xfs mag 65 53 64 74 79
> ext4 mag 96 68 75 303 437
> xfs ssd 75 43 29 33 38
> ext4 ssd 96 68 63 214 254
> spread spread n/a n/a 43 38 40
> 
> And here again are the previous results with an 8kB write size:
> 
> xfs mag 97 53 60 67 71
> ext4 mag 94 68 66 335 549
> xfs ssd 97 55 33 27 25
> ext4 ssd 116 70 66 227 450
> spread spread n/a n/a 48 42 44
> 
> Generally, those numbers look better than the previous numbers, but
> parallelism still looks fairly appealing on the SSD storage - less so
> on magnetic disks, at least in this test.

I spent a fair bit of time analyzing this, and my conclusion is that you
might largely be seeing numa effects. Yay.

I don't have an as large numa machine at hand, but here's what I'm
seeing on my local machine, during a run of writing out 400GiB (this is
a run with noise on the machine, the benchmarks below are without
that). The machine has 192GiB of ram, evenly distributed to two sockets
/ numa domains.


At start I see
numastat -m | grep -E 'MemFree|MemUsed|Dirty|Writeback|Active\(file\)|Inactive\(file\)'

                    Node 0      Node 1       Total
MemFree           91908.20    92209.85   184118.05
MemUsed            3463.05     4553.33     8016.38
Active(file)        105.46      328.52      433.98
Inactive(file)       68.29      190.14      258.43
Dirty                 0.86        0.90        1.76
Writeback             0.00        0.00        0.00
WritebackTmp          0.00        0.00        0.00

For a while there's pretty decent IO throughput (all 10s samples):
Device      w/s    wMB/s  w_await  wareq-sz  aqu-sz  %util
nvme1n1   1955.67  2299.32   42.48   1203.94   82.10  89.33
(read, discard, and flush columns were all zero)

Then it starts to be slower on a sustained basis:
Device      w/s    wMB/s  w_await  wareq-sz  aqu-sz  %util
nvme1n1   1593.33  1987.85   42.90   1277.55   67.55  76.53
(read, discard, and flush columns were all zero)

And then performance tanks completely:
Device      w/s    wMB/s  w_await  wareq-sz  aqu-sz  %util
nvme1n1    646.33   781.85  132.68   1238.70   85.43  58.63
(read, discard, and flush columns were all zero)


That amount of degradation confused me for a while, especially because I
couldn't reproduce it the more controlled I made the setup. In particular,
I stopped seeing the same magnitude of issues after pinning processes to one
NUMA socket (both CPU and memory).
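
(For reference, that pinning is roughly what numactl --cpunodebind=0
--membind=0 gives you; programmatically it can be done with libnuma, as in the
sketch below, which is just an illustration and not the harness used for these
numbers.)

    /* cc pin_node0.c -lnuma */
    #include <numa.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct bitmask *nodes;

        if (numa_available() < 0)
        {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        /* run only on node 0 ... */
        if (numa_run_on_node(0) != 0)
        {
            perror("numa_run_on_node");
            return 1;
        }

        /* ... and satisfy memory allocations only from node 0 */
        nodes = numa_allocate_nodemask();
        numa_bitmask_setbit(nodes, 0);
        numa_set_membind(nodes);
        numa_free_nodemask(nodes);

        /* the workload being benchmarked would run here */
        return 0;
    }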

After a few seconds:
Device      w/s    wMB/s  w_await  wareq-sz  aqu-sz  %util
nvme1n1   1882.00  2320.07   42.50   1262.35   79.05  88.07
(read, discard, and flush columns were all zero)

                    Node 0      Node 1       Total
MemFree           35356.50    80986.46   116342.96
MemUsed           60014.75    15776.72    75791.47
Active(file)        179.44      163.28      342.72
Inactive(file)    58293.18    13385.15    71678.33
Dirty             18407.50      882.00    19289.50
Writeback           235.78      335.43      571.21
WritebackTmp          0.00        0.00        0.00

A bit later io starts to get slower:

Device      w/s    wMB/s  ...
nvme1n1   1556.30  1898.70  ...
(read columns zero; the rest of this sample is truncated in the archive)

Re: pg_stat_reset_slru(name) doesn't seem to work as documented

2020-05-02 Thread Atsushi Torikoshi
On Sat, May 2, 2020 at 11:05 PM Tomas Vondra 
wrote

> Pushed. Thanks for the report.
>

Thanks!

--
Atsushi Torikoshi


Re: Failed test 'pg_recvlogical acknowledged changes, nothing pending on slot'

2020-05-02 Thread Noah Misch
On Mon, Jan 29, 2018 at 10:09:49AM +1300, Thomas Munro wrote:
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/334334417
> 
> #   Failed test 'pg_recvlogical acknowledged changes, nothing pending on slot'
> #   at t/006_logical_decoding.pl line 91.
> #  got: 'BEGIN
> # table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> # table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> # table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> # table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> # COMMIT'
> # expected: ''
> # Looks like you failed 1 test of 16.

The problem is this StreamLogicalLog() code:

if (PQresultStatus(res) == PGRES_COPY_OUT)
{
    /*
     * We're doing a client-initiated clean exit and have sent CopyDone to
     * the server. We've already sent replay confirmation and fsync'd so
     * we can just clean up the connection now.
     */
    goto error;
}

Once pg_recvlogical receives the XLogData containing the sought-after end
position, that code makes pg_recvlogical exit without draining the remainder
of the backend messages.  If pg_recvlogical exits quickly enough, the backend
send()s libpq messages after pg_recvlogical disconnects, which can cause
internal_flush() to fail with EPIPE ("LOG:  could not send data to client:
Broken pipe").  If that precedes LogicalConfirmReceivedLocation(), the test
fails as you experienced.  Such failure happened once on the buildfarm[1];
post-LogicalConfirmReceivedLocation() EPIPE also happens[2].  The first
attached patch causes this failure to happen almost every run.  The fix
(second attachment) is to call PQgetCopyData() until no records remain, then
issue PQresultStatus() again[3].  This closes an additional defect, described
in the log message.
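
For illustration, the shape of that fix in libpq terms is roughly the
following (a sketch of the approach, not the attached patch itself):

    #include <libpq-fe.h>

    /*
     * Drain the COPY stream before disconnecting: consume the remaining
     * CopyData messages, then fetch the final result, so the server never
     * hits EPIPE while we are the side ending the copy.
     */
    static void
    drain_copy_stream(PGconn *conn)
    {
        char   *buf;
        int     len;

        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
            PQfreemem(buf);             /* discard remaining rows */

        if (len == -1)                  /* COPY finished normally */
        {
            PGresult *res = PQgetResult(conn);

            /* inspect PQresultStatus(res) here, as the described fix does */
            PQclear(res);
        }
        /* len == -2 would mean an error; report it via PQerrorMessage() */
    }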


I looked at the other instances of "Broken pipe" in a couple of check-world
runs.  Clients might prevent those with cleaner shutdown on error, but it's
cosmetic.  They appeared in cases where the client or the server had already
recognized some other failure, whereas $SUBJECT taints a successful run.


This led me to notice other pg_recvlogical bugs, which I left unchanged:

1) An "unexpected termination of replication stream" error doesn't preclude
exit(0).

2) sigint_handler() doesn't trigger a PQputCopyEnd().  The connection status
remains PGRES_COPY_BOTH, prompting this weird message:

$ pg_recvlogical --create-slot --start -S foo -d postgres -f- && echo success
^Cpg_recvlogical: error: unexpected termination of replication stream: 
success

Other aspects of signal handling surprised me, but they may not be bugs.  The
documentation says that --start continues "until terminated by a signal".  We
don't trap SIGTERM, just SIGINT (initiate clean exit) and SIGHUP (reopen
output file).  pg_recvlogical copied SIGINT behavior from pg_receivewal, and
the pg_receivewal documentation is specific about signals.  Both programs
react to SIGINT with exit(0), whether or not they reached --endpos.
sigint_handler() doesn't trigger a flushAndSendFeedback(), so the slot's next
pg_recvlogical will repeat messages that followed the last fsync/feedback.

3) sendFeedback() wants s/last_fsync_lsn !=/last_fsync_lsn ==/.  This just
changes the volume of feedback messages.


[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogfish=2020-03-07%2018%3A49%3A34
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=florican=2020-04-27%2017%3A25%3A08=recovery-check
[3] https://www.postgresql.org/docs/devel/libpq-copy.html writes "After
PQgetCopyData returns -1, call PQgetResult to obtain the final result
status of the COPY command. One can wait for this result to be available
in the usual way. Then return to normal operation."
Author: Noah Misch 
Commit: Noah Misch 

(not for commit) Reproduce failure seen here:


https://postgr.es/m/CAEepm=1MzM2Z_xNe4foGwZ1a+MO_2S9oYDq3M5D11=jdu_+...@mail.gmail.com

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogfish=2020-03-07%2018%3A49%3A34

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7717bb2..2268009 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1448,7 +1448,8 @@ internal_flush(void)
 				last_reported_send_errno = errno;
 				ereport(COMMERROR,
 						(errcode_for_socket_access(),
-						 errmsg("could not send data to client: %m")));
+						 errmsg("could not send data to client: %m"),
+						 errbacktrace()));
 			}
 
/*
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index ae4a9cb..d660038 100644
--- 

Re: Fixes for two separate bugs in nbtree VACUUM's page deletion

2020-05-02 Thread Peter Geoghegan
On Thu, Apr 30, 2020 at 12:20 AM Masahiko Sawada
 wrote:
> For the part of treating that case as an index corruption I will need
> some time to review because of lacking knowledge of btree indexes. So
> I'll review it later.

I pushed the refactoring patch today. Thanks for the review.

The final test for corruption that I added to btvacuumscan() is less
aggressive than what you saw in the patch I posted. We only report
corruption when backtracking/recursing if the page is "new", not a
leaf page, or is half-dead. We don't treat a fully deleted page as
corruption, because there is a case where the same call to
btvacuumscan() may have called _bt_pagedel() already, which may have
deleted the block that we backtrack/recurse to. The "sibling links
cannot point to a deleted page without concurrent deletion, and we
know that can't happen because we are VACUUM" stuff doesn't really
apply -- we remember which block we will recurse to *before* we
actually call _bt_pagedel().

-- 
Peter Geoghegan




Re: Problems with GSS encryption and SSL in libpq in 12~

2020-05-02 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote:
> > It is possible to enforce this flag to false by using
> > gssencmode=disable, but that's not really user-friendly in my opinion
> > because nobody is going to remember that for connection strings with
> > SSL settings so a lot of applications are taking a performance hit at
> > connection because of that in my opinion.  I think that's also a bad
> > idea from the start to assume that we have to try GSS by default, as
> > any new code path opening a secured connection may fall into the trap
> > of attempting to use GSS if this flag is not reset.  Shouldn't we try
> > to set this flag to false by default, and set it to true only if
> > necessary depending on gssencmode?  A quick hack switching this flag
> > to false in makeEmptyPGconn() gives back the past performance to
> > src/test/ssl/, FWIW.
> > 
> > Looking around, it seems to me that there is a second issue as of
> > PQconnectPoll(), where we don't reset the state machine correctly for
> > try_gss within reset_connection_state_machine, and instead HEAD does
> > it in connectDBStart().
> 
> So, a lot of things come down to PQconnectPoll() here.  Once the
> connection state reached is CONNECTION_MADE, we first try a GSS
> connection if try_gss is true, and a SSL connection attempt follows
> just after.  This makes me wonder about the following things:
> - gssencmode is prefer by default, the same as sslmode.  Shouldn't we
> issue an error if any of them is not disabled to avoid any conflicts
> in the client, making the choice of gssencmode=prefer by default a bad
> choice?  It seems to me that there could be an argument to make
> gssencmode disabled by default, and issue an error if somebody
> attempts a connection without at least gssencmode or sslmode set as
> disabled.

I don't see why it would make sense to throw an error and require that
one of them be disabled.  I certainly don't agree that we should disable
GSS encryption by default, or that there's any reason to throw an error
if both GSS and SSL are set to 'prefer' (as our current default is).

> - The current code tries a GSS connection first, and then it follows
> with SSL, which is annoying because gssencmode=prefer by default means
> that any user would pay the cost of attempting a GSS connection for
> nothing (with or even without SSL).  Shouldn't we do the opposite
> here, by trying SSL first, and then GSS?

A GSS connection is only attempted, as mentioned, if your GSS library
claims that there's a possibility that credentials could be acquired for
the connection.

> For now, I am attaching a WIP patch which seems like a good angle of
> attack for libpq, meaning that if sslmode and gssencmode are both set,
> then we would attempt a SSL connection before attempting GSS, so as
> any user of SSL does not pay a performance hit compared to past
> versions (I know that src/test/kerberos/ fails with that because
> sslmode=prefer is triggered first in PQconnectPoll(), but that's just
> to show the idea I had in mind).
> 
> Any thoughts?

I don't agree with the assumption that, in the face of having GSS up and
running, which actually validates both the client and the server, we should
prefer SSL, where our default configuration for SSL does *not* properly
validate *either* the client or the server.

What it sounds like to me is that it'd be helpful for you to review why
your environment has a GSS credential cache (or, at least,
gss_acquire_cred() returns without any error) but GSS isn't properly
working.  If you're using OSX, it's possible the issue here is actually
the broken and ridiculously ancient kerberos code that OSX ships with
(and which, in another thread, there's a discussion about detecting and
failing to build if it's detected as it's just outright broken).  If
it's not that then it'd be great to understand more about the specific
environment and what things like 'klist' return.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Problems with GSS encryption and SSL in libpq in 12~

2020-05-02 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a
> lot of difference in run time, using the same set of options with SSL
> and the same compilation flags (OpenSSL 1.1.1f, with debugging and
> assertions enabled among other things FWIW), with 12 taking up to five
> minutes to complete and 11 finishing as a matter of seconds for me.
> 
> I have spent a couple of hours on that, to find out that libpq tries
> to initialize a GSS context where the client remains stuck:
> #9  0x7fcd839bf72c in krb5_expand_hostname () from
> /usr/lib/x86_64-linux-gnu/libkrb5.so.3
> #10 0x7fcd839bf9e0 in krb5_sname_to_principal () from
> /usr/lib/x86_64-linux-gnu/libkrb5.so.3
> #11 0x7fcd83ad55b4 in ?? () from
> /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
> #12 0x7fcd83ac0a98 in ?? () from
> /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
> #13 0x7fcd83ac200f in gss_init_sec_context () from
> /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
> #14 0x7fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at
> fe-secure-gssapi.c:626
> #15 0x7fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at
> fe-connect.c:3165
> #16 0x7fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at
> fe-connect.c:2182
> #17 0x7fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0,
> values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647
> #18 0x5582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at
> startup.c:266
> 
> However this makes little sense, why would libpq do that in the
> context of an OpenSSL connection?  Well, makeEmptyPGconn() does that,
> which means that libpq would try by default to use GSS just if libpq
> is *built* with GSS:
> #ifdef ENABLE_GSS
> conn->try_gss = true;
> #endif

Sure, but if you look at what is done with it:

/*
 * If GSSAPI encryption is enabled, then call
 * pg_GSS_have_cred_cache() which will return true if we can
 * acquire credentials (and give us a handle to use in
 * conn->gcred), and then send a packet to the server asking
 * for GSSAPI Encryption (and skip past SSL negotiation and
 * regular startup below).
 */
if (conn->try_gss && !conn->gctx)
    conn->try_gss = pg_GSS_have_cred_cache(&conn->gcred);

In other words, it's trying because a call to gss_acquire_cred() (called
from pg_GSS_have_cred_cache()) returned without error, indicating that
GSS should be possible to attempt.

If you have GSS compiled in, and you've got a credential cache such that
gss_acquire_cred() returns true, it seems entirely reasonable that you'd
like to connect using GSS encryption.
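
For illustration, the check is essentially "can the GSS library hand us
initiator credentials right now?", which standalone looks something like the
sketch below (build against the GSSAPI library, e.g. -lgssapi_krb5; this is a
sketch of the idea, not the libpq code):

    #include <stdio.h>
    #include <gssapi/gssapi.h>

    int
    main(void)
    {
        OM_uint32       major,
                        minor;
        gss_cred_id_t   cred = GSS_C_NO_CREDENTIAL;

        /* Ask for default "initiate" credentials, i.e. a usable ticket cache. */
        major = gss_acquire_cred(&minor, GSS_C_NO_NAME, 0, GSS_C_NO_OID_SET,
                                 GSS_C_INITIATE, &cred, NULL, NULL);

        if (major == GSS_S_COMPLETE)
        {
            printf("credentials available: libpq would attempt GSS encryption\n");
            gss_release_cred(&minor, &cred);
            return 0;
        }

        printf("no usable credentials: libpq skips the GSS attempt\n");
        return 1;
    }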

> It is possible to enforce this flag to false by using
> gssencmode=disable, but that's not really user-friendly in my opinion
> because nobody is going to remember that for connection strings with
> SSL settings so a lot of applications are taking a performance hit at
> connection because of that in my opinion.  I think that's also a bad
> idea from the start to assume that we have to try GSS by default, as
> any new code path opening a secured connection may fall into the trap
> of attempting to use GSS if this flag is not reset.  Shouldn't we try
> to set this flag to false by default, and set it to true only if
> necessary depending on gssencmode?  A quick hack switching this flag
> to false in makeEmptyPGconn() gives back the past performance to
> src/test/ssl/, FWIW.

We don't just always try to do GSS, that certainly wouldn't make sense-
we only try when gss_acquire_cred() comes back without an error.

As this is part of the initial connection, it's also not possible to
decide to do it by "depending on gssencmode", as we haven't talked to
the server at all at this point and need to decide if we're going to do
GSS encryption or not with the initial packet.  Note that this is
more-or-less identical to what we do with SSL, and, as you saw, we
default to 'prefer' with GSSENCMODE, but you can set it to 'disable' on
the client side if you don't want to try GSS, even when you have a
client compiled with GSS and you have a credential cache.
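
For completeness, opting out on the client side is just a connection option;
e.g. (connection string values here are placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /*
         * Even with a GSS-enabled libpq and a valid credential cache,
         * gssencmode=disable skips the GSS negotiation entirely.
         */
        PGconn *conn = PQconnectdb("host=db.example.com dbname=postgres "
                                   "sslmode=require gssencmode=disable");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }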

> Looking around, it seems to me that there is a second issue as of
> PQconnectPoll(), where we don't reset the state machine correctly for
> try_gss within reset_connection_state_machine, and instead HEAD does
> it in connectDBStart().

Not following exactly what you're referring to here, but I see you've
sent a follow-up email about this and will respond to that
independently.

> Also, I have noted a hack as of pqsecure_open_gss() which does that:
>/*
> * We're done - hooray!  Kind of gross, but we need to disable SSL
> * here so that we don't accidentally tunnel one over the other.
> */
> #ifdef USE_SSL
>conn->allow_ssl_try = false;
> #endif
> And that looks like a rather bad idea to me..

Tunneling SSL over GSS encryption is definitely a bad idea, which is why
we prevent that from happening.  I'm not sure what the issue here is-
are you suggesting that we should 

Re: Protocol problem with GSSAPI encryption?

2020-05-02 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> > > "Peter" == Peter Eisentraut  writes:
> > 
> >  >> It seems to me that this is a bug in ProcessStartupPacket, which
> >  >> should accept both GSS or SSL negotiation requests on a connection
> >  >> (in either order). Maybe secure_done should be two flags rather than
> >  >> one?
> > 
> >  Peter> I have also seen reports of that. I think your analysis is
> >  Peter> correct.
> > 
> > I figure something along these lines for the fix. Anyone in a position
> > to test this?
> 
> At least at first blush, I tend to agree with your analysis and patch.
> 
> I'll see about getting this actually set up and tested in the next week
> or so (and maybe there's some way to also manage to have a regression
> test for it..).

After testing this and playing around with it a bit, I've gone ahead and
pushed it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: SLRU statistics

2020-05-02 Thread Tomas Vondra

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found is; pgstat_send_slru() should be called also by
other processes than backend? For example, since clog data is flushed
basically by checkpointer, checkpointer seems to need to send slru stats.
Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In previous email, I mentioned checkpointer just as an example.
So probably we need to investigate what process should send slru stats,
other than checkpointer. I guess that at least autovacuum worker,
logical replication walsender and parallel query worker (maybe this has
been already covered by parallel query some mechanisms. Sorry I've
not checked that) would need to send its slru stats.



Probably. Do you have any other process type in mind?



I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP: WAL prefetch (another approach)

2020-05-02 Thread Dmitry Dolgov
> On Sat, Apr 25, 2020 at 09:19:35PM +0200, Dmitry Dolgov wrote:
> > On Tue, Apr 21, 2020 at 05:26:52PM +1200, Thomas Munro wrote:
> >
> > One report I heard recently said that if  you get rid of I/O stalls,
> > pread() becomes cheap enough that the much higher frequency lseek()
> > calls I've complained about elsewhere[1] become the main thing
> > recovery is doing, at least on some systems, but I haven't pieced
> > together the conditions required yet.  I'd be interested to know if
> > you see that.
>
> At the moment I've performed couple of tests for the replication in case
> when almost everything is in memory (mostly by mistake, I was expecting
> that a postgres replica within a badly memory limited cgroup will cause
> more IO, but looks like kernel do not evict pages anyway). Not sure if
> that's what you mean by getting rid of IO stalls, but in these tests
> profiling shows lseek & pread appear in similar amount of samples.
>
> If I understand correctly, eventually one can measure prefetching
> influence by looking at different redo function execution time (assuming
> that data they operate with is already prefetched they should be
> faster). I still have to clarify what is the exact reason, but even in
> the situation described above (in memory) there is some visible
> difference, e.g.

I've finally performed a couple of tests involving more IO: a not-that-big
dataset of 1.5 GB for the replica, with memory allowing ~1/6 of it to fit,
default prefetching parameters, and an update workload with uniform
distribution. Rather a small setup, but it causes stable reading into the page
cache on the replica and makes the influence of the patch visible (more
measurement samples tend to happen at lower latencies):

# with patch
Function = b'heap_redo' [206]
     nsecs            : count
      1024 -> 2047    : 0
      2048 -> 4095    : 32833
      4096 -> 8191    : 59476
      8192 -> 16383   : 18617
     16384 -> 32767   : 3992
     32768 -> 65535   : 425
     65536 -> 131071  : 5
    131072 -> 262143  : 326
    262144 -> 524287  : 6

# without patch
Function = b'heap_redo' [130]
     nsecs            : count
      1024 -> 2047    : 0
      2048 -> 4095    : 20062
      4096 -> 8191    : 70662
      8192 -> 16383   : 12895
     16384 -> 32767   : 9123
     32768 -> 65535   : 560
     65536 -> 131071  : 1
    131072 -> 262143  : 460
    262144 -> 524287  : 3
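
(As far as I understand, the prefetch requests issued by the patch end up as
posix_fadvise(POSIX_FADV_WILLNEED) hints on Linux; a minimal standalone
illustration of that hint, with an arbitrary file name and block size:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    int
    main(void)
    {
        const off_t blksz = 8192;
        int         rc;
        int         fd = open("some/relation/file", O_RDONLY);  /* placeholder path */

        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* hint: the block at offset 100 * blksz will be read soon */
        rc = posix_fadvise(fd, 100 * blksz, blksz, POSIX_FADV_WILLNEED);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

        /* ... the actual pread() of that block happens later ... */
        return 0;
    }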

Not that there were any doubts, but at the same time it was surprising to me
how well linux readahead works in this situation. The results above are shown
with readahead disabled for the filesystem and device; without that there was
almost no difference, since a lot of IO was avoided by readahead (which was in
fact the majority of all reads):

# with patch
flags = Read
     usecs            : count
        16 -> 31      : 0
        32 -> 63      : 1
        64 -> 127     : 5

flags = ReadAhead-Read
     usecs            : count
        32 -> 63      : 0
        64 -> 127     : 131
       128 -> 255     : 12
       256 -> 511     : 6

# without patch
flags = Read
     usecs            : count
        16 -> 31      : 0
        32 -> 63      : 0
        64 -> 127     : 4

flags = ReadAhead-Read
     usecs            : count
        32 -> 63      : 0
        ...

Re: pg_stat_reset_slru(name) doesn't seem to work as documented

2020-05-02 Thread Tomas Vondra

On Sat, May 02, 2020 at 02:27:33PM +0200, Tomas Vondra wrote:

On Fri, May 01, 2020 at 07:10:23PM +0900, Atsushi Torikoshi wrote:

Hi,

When I tried to reset a counter in pg_stat_slru using
pg_stat_reset_slru(name),
not only the specified counter but all the counters were reset.

postgres=# SELECT * FROM pg_stat_slru ;
       name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
 async            |           3 |        0 |         0 |            3 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 clog             |           0 |       56 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 (snip)

postgres=# SELECT pg_stat_reset_slru('clog');

postgres=# SELECT * FROM pg_stat_slru ;
       name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
 async            |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
 clog             |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:37:02.525006+09
 commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
 (snip)


Attached a patch.



Yeah, that memset is a left-over from some earlier version of the patch.
Thanks for the report and patch, I'll push this shortly.



Pushed. Thanks for the report.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unify drop-by-OID functions

2020-05-02 Thread Ranier Vilela
On Sat, May 2, 2020 at 05:01 Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-05-01 23:31, Ranier Vilela wrote:
> > I can suggest improvements?
> >
> > 1. In case Object is cached, delay open_table until the last moment, for
> > the row to be blocked as little as possible and close the table as
> > quickly as possible.
> > 2. In case Object is cached and the tuple is invalid, do not open table.
> > 3. Otherwise, is it possible to call systable_endscan, after table_close?
>
> What do you mean by "object is cached"?
>
Well, that's what I deduced from the cacheId variable name.

regards,
Ranier Vilela


Re: pg_stat_reset_slru(name) doesn't seem to work as documented

2020-05-02 Thread Tomas Vondra

On Fri, May 01, 2020 at 07:10:23PM +0900, Atsushi Torikoshi wrote:

Hi,

When I tried to reset a counter in pg_stat_slru using
pg_stat_reset_slru(name),
not only the specified counter but all the counters were reset.

postgres=# SELECT * FROM pg_stat_slru ;
       name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
 async            |           3 |        0 |         0 |            3 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 clog             |           0 |       56 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
 (snip)

postgres=# SELECT pg_stat_reset_slru('clog');

postgres=# SELECT * FROM pg_stat_slru ;
       name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
 async            |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
 clog             |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:37:02.525006+09
 commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
 (snip)


Attached a patch.



Yeah, that memset is a left-over from some earlier version of the patch.
Thanks for the report and patch, I'll push this shortly.
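
(The bug class in a nutshell, as a tiny standalone C illustration rather than
the actual pgstat code: a reset for one SLRU must clear only that element,
while a leftover memset over the whole array wipes every counter.)

    #include <string.h>

    typedef struct SLRUCounters
    {
        long    blks_zeroed;
        long    blks_hit;
        /* ... */
    } SLRUCounters;

    static SLRUCounters counters[8];    /* one slot per SLRU (sketch only) */

    static void
    reset_one(int index)
    {
        /* intended behaviour: clear just the requested entry */
        memset(&counters[index], 0, sizeof(counters[index]));
    }

    static void
    reset_all(void)
    {
        /* the leftover call: clears every entry, which is what was observed */
        memset(counters, 0, sizeof(counters));
    }

    int
    main(void)
    {
        reset_one(1);
        reset_all();
        return 0;
    }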


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WAL usage calculation patch

2020-05-02 Thread Amit Kapila
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud  wrote:
>
> On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud  wrote:
> >
> > On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila  wrote:
> > >
> > > Julien, are you planning to write a cleanup patch for this open item?
> >
> > Sorry Amit, I've been quite busy at work for the last couple of days.
> > I'll take care of that this morning for sure!
>
> Here's the patch.
>

Thanks for the patch. I will look into it early next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.

2020-05-02 Thread Amit Kapila
On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao  wrote:
>
> On 2020/04/08 1:49, Fujii Masao wrote:
> >
> >
> > On 2020/04/07 20:21, David Steele wrote:
> >>
> >> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
> >>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao 
> >>>  wrote in
> >> This doesn't seem a bug, so I'm thinking to merge this to next *major*
> >> version release, i.e., v13.
> > Not a bug, perhaps, but I think we do consider back-patching
> > performance problems. The rise in S3 usage has just exposed how poorly
> > this code performed in high-latency environments.
> 
>  I understood the situation and am fine to back-patch that. But I'm not
>  sure
>  if it's fair to do that. Maybe we need to hear more opinions about
>  this?
>  OTOH, feature freeze for v13 is today, so what about committing the
>  patch
>  in v13 at first, and then doing the back-patch after hearing opinions
>  and
>  receiving many +1?
> >>>
> >>> +1 for commit only v13 today, then back-patch if people wants and/or
> >>> accepts.
>
> Please let me revisit this. Currently Grigory Smolkin, David Steele,
> Michael Paquier and Pavel Suderevsky agree to the back-patch and
> there has been no objection to that. So should we do the back-patch?
> Or does anyone object to that?
>
> I don't think that this is a feature bug because archive recovery works
> fine from a functional perspective without this commit. OTOH, I understand
> that, without the commit, there is a complaint that archive recovery may be
> unnecessarily slow when archival storage is located remotely, e.g., on
> Amazon S3, and it takes a long time to fetch the non-existent archive WAL
> file. So I'm OK with the back-patch unless there is a strong objection to
> that.
>

I don't see any obvious problem with the changed code, but we normally
don't backpatch performance improvements.  I can see that the code
change here appears to be straightforward, so it might be fine to
backpatch this.  Have we seen similar reports earlier as well?  AFAIK,
this functionality has been there for a long time, and if people were
facing this on a regular basis then we would have seen such reports
multiple times.  I mean to say that if the chances of hitting this are
low, then we can even choose not to backpatch it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Rotten parts of src/backend/replication/README

2020-05-02 Thread Amit Kapila
On Sat, May 2, 2020 at 8:16 AM Michael Paquier  wrote:
>
> Hi all,
>
> The first part of src/backend/replication/README lists all the APIs
> usable for a WAL receiver, but these have aged and lost track of most
> changes that happened over the years.  Four functions are listed in
> the README, with incorrect signatures and many others are missing:
> - walrcv_connect()
> - walrcv_receive()
> - walrcv_send()
> - walrcv_disconnect()
>
> I think that we should clean up that.

+1.

>  And as it seems to me that
> nobody really remembers to update this README, I would suggest to
> update the first section of the README to refer to walreceiver.h for
> details about each function, then move the existing API descriptions
> from the README to walreceiver.h (while fixing the incomplete
> descriptions of course).  This way, if a new function is added or if
> an existing function is changed, it is going to be hard to miss an
> update of the function descriptions.
>

I think the README can have a general description of the module and,
at a broad level, how the different APIs help to achieve the required
functionality, and then for the API descriptions we can refer to the
.h/.c files.  The detailed description of the APIs should be where those
APIs are defined.  The header file can contain some generic description.
The detailed description I am referring to is the text below from the README:
"Retrieve any message available without blocking through the
connection.  If a message was successfully read, returns its length.
If the connection is closed, returns -1.  Otherwise returns 0 to
indicate that no data is available, and sets *wait_fd to a socket
descriptor which can be waited on before trying again.  On success, a
pointer to the message payload is stored in *buffer. The returned
buffer is valid until the next call to walrcv_* functions, and the
caller should not attempt to free it."

I think having such a description near the actual definition helps in
keeping it updated whenever we change the function.
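
For illustration, a caller of that contract looks roughly like this (a sketch
against the walreceiver API, not standalone code; process_message() and
wait_for_socket() are placeholder helpers, not real functions):

    /*
     * Receive loop following the contract quoted above: >0 is a payload
     * length, 0 means "nothing yet, wait on *wait_fd", -1 means the
     * connection was closed.
     */
    for (;;)
    {
        char       *buf;
        pgsocket    wait_fd;
        int         len = walrcv_receive(conn, &buf, &wait_fd);

        if (len > 0)
            process_message(buf, len);  /* buf is valid until the next walrcv_* call */
        else if (len == 0)
            wait_for_socket(wait_fd);   /* placeholder: wait before retrying */
        else
            break;                      /* connection closed */
    }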


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: SLRU statistics

2020-05-02 Thread Tomas Vondra

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:



On 2020/05/02 9:08, Tomas Vondra wrote:

On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:



On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" slru ("clog" slru exists instead),
"pg_xact" should be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!



OK, pushed. Thanks!


Thanks a lot!

But, like the patch that I attached in the previous email does,
"pg_xact" should be removed from the description of pg_stat_reset_slru()
in monitoring.sgml.



Whooops. My bad, will fix.


Another thing I found is; pgstat_send_slru() should be called also by
other processes than backend? For example, since clog data is flushed
basically by checkpointer, checkpointer seems to need to send slru stats.
Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In previous email, I mentioned checkpointer just as an example.
So probably we need to investigate what process should send slru stats,
other than checkpointer. I guess that at least autovacuum worker,
logical replication walsender and parallel query worker (maybe this has
been already covered by parallel query some mechanisms. Sorry I've
not checked that) would need to send its slru stats.



Probably. Do you have any other process type in mind?


Atsushi-san reported another issue in pg_stat_slru.
You're planning to work on that?
https://postgr.es/m/cacz0uyfe16pjzxqyatn53mspym7dgmpyl3djljjpw69gmcc...@mail.gmail.com



Yes, I'll investigate.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unify drop-by-OID functions

2020-05-02 Thread Peter Eisentraut

On 2020-05-01 23:31, Ranier Vilela wrote:

I can suggest improvements?

1. In case Object is cached, delay open_table until the last moment, for 
the row to be blocked as little as possible and close the table as 
quickly as possible.

2. In case Object is cached and the tuple is invalid, do not open table.
3. Otherwise, is it possible to call systable_endscan, after table_close?


What do you mean by "object is cached"?

In any case, this is a refactoring patch, so significant changes to the 
internal logic would not really be in scope.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-02 Thread Fujii Masao




On 2020/05/02 9:08, Tomas Vondra wrote:

On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:



On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" slru ("clog" slru exists instead),
"pg_xact" should be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!



OK, pushed. Thanks!


Thanks a lot!

But, like the patch that I attached in the previous email does,
"pg_xact" should be removed from the description of pg_stat_reset_slru()
in monitoring.sgml.


Another thing I found is; pgstat_send_slru() should be called also by
other processes than backend? For example, since clog data is flushed
basically by checkpointer, checkpointer seems to need to send slru stats.
Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?

In previous email, I mentioned checkpointer just as an example.
So probably we need to investigate what process should send slru stats,
other than checkpointer. I guess that at least autovacuum worker,
logical replication walsender and parallel query worker (maybe this has
been already covered by parallel query some mechanisms. Sorry I've
not checked that) would need to send its slru stats.

Atsushi-san reported another issue in pg_stat_slru.
You're planning to work on that?
https://postgr.es/m/cacz0uyfe16pjzxqyatn53mspym7dgmpyl3djljjpw69gmcc...@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION