Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-28 Thread Dilip Kumar
On Sat, Mar 28, 2020 at 11:56 AM Amit Kapila  wrote:
>
> On Wed, Mar 4, 2020 at 9:14 AM Dilip Kumar  wrote:
> >
> > On Wed, Mar 4, 2020 at 3:16 AM Tomas Vondra
> >  wrote:
> > >
> > >
> > > The first thing I realized that WAL-logging of assignments in v12 does
> > > both the "old" logging (using dedicated message) and "new" with
> > > toplevel-XID embedded in the first message. Yes, the patch was wrong,
> > > because it eliminated all calls to ProcArrayApplyXidAssignment() and so
> > > it was trivial to crash the replica due to KnownAssignedXids overflow.
> > > But I don't think re-introducing XLOG_XACT_ASSIGNMENT message is the
> > > right fix.
> > >
> > > I actually proposed doing this (having both ways to log assignments) so
> > > that there's no regression risk with (wal_level < logical). But IIRC
> > > Andres objected to it, arguing that we should not log the same piece
> > > of information in two very different ways at the same time (IIRC it was
> > > discussed on the FOSDEM dev meeting, so I don't have a link to share).
> > > And I do agree with him ...
> > >
> > > The question is, why couldn't the replica use the same assignment info
> > > we already write for logical decoding? The main challenge is that now
> > > the assignment can be sent in many different xlog messages, from a bunch
> > > of resource managers (essentially, any xlog message with a xid can have
> > > embedded XID of the toplevel xact). So the handling would either need to
> > > happen in every rmgr, or we need to move it before we call the rmgr.
> > >
> > > For example, we might do this e.g. in StartupXLOG() I think, per the
> > > attached patch (FWIW this particular fix was written by Masahiko Sawada,
> > > not me). This does the trick for me - I'm no longer able to reproduce
> > > the KnownAssignedXids overflow.
> > >
> > > The one difference is that we used to call ProcArrayApplyXidAssignment
> > > for larger groups of XIDs, as sent in the assignment message. Now we
> > > call it for each individual assignment. I don't know if this is an
> > > issue, but I suppose we might introduce some sort of local caching
> > > (accumulate the assignments into a local array, call the function only
> > > when we have enough of them).
> >
> > Thanks for the pointers, I will think over these points.
> >
>
> I have looked at the solution proposed and I would like to share my
> findings.  I think calling ProcArrayApplyXidAssignment for each
> subtransaction is not a good idea for a couple of reasons:
> (a) It will just defeat the purpose of maintaining the KnownAssignedXids
> array, which is to avoid looking at pg_subtrans in
> TransactionIdIsInProgress() on standby.  Basically, if we remove it
> for each subXid, it will consider the KnownAssignedXids to be
> overflowed and check pg_subtrans frequently.

Right, I also think this is a problem with this solution.  I think we
may try to avoid this by caching this information.  But then we would
have to maintain it in a two-dimensional array that stores
sub-transaction ids per top transaction, or maintain a list of
sub-transactions for each transaction.  I haven't thought about how
much complexity this solution would add.

> (b)  Calling ProcArrayApplyXidAssignment() for each subtransaction can
> be costly from the perspective of concurrency because it acquires
> ProcArrayLock in Exclusive mode, so concurrently running transactions
> might start blocking at this lock.

Right

 Also, I see that
> SubTransSetParent() makes the page dirty, so it might lead to more
> writes if we spread out setting that by calling it separately for each
> sub-transaction.

Right.

>
> Apart from this, I don't see how the proposed fix is correct because
> as far as I can see it tries to remove the Xid before we even record
> it via RecordKnownAssignedTransactionIds().  It seems that after the
> patch, RecordKnownAssignedTransactionIds() will be called after
> ProcArrayApplyXidAssignment(); how could that be correct?

Valid point.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: A bug when use get_bit() function for a long bytea string

2020-03-28 Thread movead...@highgo.ca
>I think the second argument indicates the bit position, which would be max
>bytea length * 8. If max bytea length covers whole int32, the second argument
>needs to be wider i.e. int64.
Yes, it makes sense; changed as suggested.

>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
Sorry, my mistake; it is the two functions above that should be changed.


> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
>  unsigned (*decode_len) (const char *data, unsigned dlen);
>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>  /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this 
>cast.
I changed the 'encode' function, since it needs an int64 return type, but kept
the other functions in 'pg_encoding' unchanged, as I see no necessary reason.

>Ok, let's leave it for a committer to decide. 
Well, I changed all of them this time, because Tom Lane supported that in a later mail.

 
>Some more review comments.
>+   int64   res,resultlen;
>We need those on separate lines, possibly.
Done

>+   byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>-   int len;
>+   int64   len;
>Why do we need this change?
>int i;
It is my mistake, as described above; bitgetbit()/bitsetbit() should not
have been changed.



>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
Added two test cases.
 
>+
>+select get_bit(
>+    set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
>+            1024 * 1024 * 1024 + 1, 0)
>+    ,1024 * 1024 * 1024 + 1);

>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
> pg_column_size
>----------------
>              4
>(1 row)

>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
> pg_column_size
>----------------
>              8
>(1 row)
I intended to test a size larger than 1G; I think you gave a better idea,
which I followed.




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



long_bytea_string_bug_fix_ver4.patch
Description: Binary data


[PATCH] postgresql.conf.sample->postgresql.conf.sample.in

2020-03-28 Thread i . taranov
A patch for converting postgresql.conf.sample to 
postgresql.conf.sample.in . This feature allows you to manage the 
contents of postgresql.conf.sample at the configure phase.


Usage example:

./configure --enable-foo


configure.in:

foo_params=$(cat <<-END
  foo_param1 = on
  foo_param2 = 16
END
)
AC_SUBST(foo_params)


postgresql.conf.sample.in:

@foo_params@


postgresql.conf.sample:

foo_param1 = on
foo_param2 = 16

--
From d8ac8bb8f49e8a528a3e89de38c23b014361dab3 Mon Sep 17 00:00:00 2001
From: "Ivan N. Taranov" 
Date: Fri, 27 Mar 2020 22:27:08 +0300
Subject: [PATCH] postgresql.conf.sample->postgresql.conf.sample.in


diff --git a/configure.in b/configure.in
index ecdf172396..c20992be77 100644
--- a/configure.in
+++ b/configure.in
@@ -2389,7 +2389,7 @@ fi
 AC_SUBST(vpath_build)
 
 
-AC_CONFIG_FILES([GNUmakefile src/Makefile.global])
+AC_CONFIG_FILES([GNUmakefile src/Makefile.global src/backend/utils/misc/postgresql.conf.sample])
 
 AC_CONFIG_LINKS([
   src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION}
diff --git a/src/backend/utils/misc/.gitignore b/src/backend/utils/misc/.gitignore
index 495b1aec76..0adfc77197 100644
--- a/src/backend/utils/misc/.gitignore
+++ b/src/backend/utils/misc/.gitignore
@@ -1 +1,2 @@
 /guc-file.c
+/postgresql.conf.sample
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample.in
similarity index 100%
rename from src/backend/utils/misc/postgresql.conf.sample
rename to src/backend/utils/misc/postgresql.conf.sample.in
-- 
2.26.0



Re: A bug when use get_bit() function for a long bytea string

2020-03-28 Thread movead...@highgo.ca
I want to resend the mail, because the last one was in the wrong format and
hard to read.

In addition, I think 'Size' or 'size_t' depend on the platform; they may not
work on a 32-bit system. So I chose 'int64' after Ashutosh's review.

>>I think the second argument indicates the bit position, which would be max
>>bytea length * 8. If max bytea length covers whole int32, the second argument
>>needs to be wider i.e. int64.
>Yes, it makes sense; changed as suggested.

>>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
Sorry, my mistake; it is the two functions above that should be changed.


>>Some more comments on the patch
>> struct pg_encoding
>> {
>>- unsigned (*encode_len) (const char *data, unsigned dlen);
>>+ int64 (*encode_len) (const char *data, unsigned dlen);
>>  unsigned (*decode_len) (const char *data, unsigned dlen);
>>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
>> Why not use return type of int64 for rest of the functions here as well?
>>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>>  /* Make this FATAL 'cause we've trodden on memory ... */
>>- if (res > resultlen)
>>+ if ((int64)res > resultlen)
>>
>>if we change return type of all those functions to int64, we won't need this 
>>cast.
>I changed the 'encode' function, since it needs an int64 return type, but kept
>the other functions in 'pg_encoding' unchanged, as I see no necessary reason.

>>Ok, let's leave it for a committer to decide. 
Well, I changed all of them this time, because Tom Lane supported that in a later mail.

 
>Some more review comments.
>+   int64   res,resultlen;
>We need those on separate lines, possibly.
Done

>+   byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>-   int len;
>+   int64   len;
>Why do we need this change?
>int i;
It is my mistake, as described above; bitgetbit()/bitsetbit() should not
have been changed.



>>It might help to add a test where we could pass the second argument something
>>greater than 1G. But it may be difficult to write such a test case.
>Added two test cases.
 
>+
>+select get_bit(
>+    set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
>+            1024 * 1024 * 1024 + 1, 0)
>+    ,1024 * 1024 * 1024 + 1);

>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
> pg_column_size
>----------------
>              4
>(1 row)

>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
> pg_column_size
>----------------
>              8
>(1 row)
I intended to test a size larger than 1G; I think you gave a better idea,
which I followed.





Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sat, 28 Mar 2020 at 19:21, David Rowley  wrote:
> Thank you.  Pushed.

I'm unsure yet if this has caused an instability in lousyjack's run in
[1]. I see that table does have 30,000 rows inserted, so it does seem
probable that it may receive an autovacuum now when it didn't before. I
did a quick local test to see if swapping the "ANALYZE pagg_tab_ml;"
to "VACUUM ANALYZE pagg_tab_ml;" would do the same on my local
machine, but it didn't.

I'll keep an eye on lousyjack's next run.  If it passes next run, I
may add some SQL to determine if pg_stat_all_tables.autovacuum_count
for those tables are varying between passing and failing runs.

David

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02




Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2020-03-28 Thread Fabien COELHO


Hello Tom,

Thanks for your feedback,


I'm rather unclear about what the actual feedback is, though. I'd
interpret it as "pg does not care much about code coverage". Most clients
are in the red on coverage.postgresql.org. I'd like pgbench at least to be
in the green, but it does not look like that will ever be the case.



> The reason why the first iteration failed was that it was insufficiently
> insensitive to timing.


Yes.

> So the problem for a replacement patch is "how do you test fundamentally
> timing-sensitive behavior in a timing-insensitive way?".


The answer is that it depends on the test objective, and there can be no 
miracle because the test environment is not controlled, in particular it
cannot expect the execution to be responsive (i.e. the host not to be 
overloaded for some reason).


As written in the added test comments, these tests mostly aim at coverage, 
so the time-related part checks are loose, although real.


> It's not really clear to me that that's possible, so I don't have a lot
> of faith that this patch wouldn't fail as well.


AFAICR I think that it is pretty unlikely to fail.

Many other pg tests can fail but mostly don't: some rely on random stuff,
and you can get unlucky once in a while; other events can occur, such as
the file system filling up.



> I'm also a bit disturbed that the patch seems to be changing pgbench's
> behavior for no reason other than to try to make the test produce the
> same results no matter what the actual machine performance is ... which
> seems, at best, rather contrary to pgbench's real purpose.


No, not exactly.

The behavior change is to remove a corner-case optimization which creates
an exception to -T/-P compliance: under a very low test rate (with -R)
and -T and -P, pgbench shortens the run (i.e. ends before the prescribed
time) if there is nothing left to do, so that progress is not shown.


This optimization makes it impossible to test that pgbench complies with -T
and -P, because it does not always comply. Using -R is useful to avoid
making too many assumptions about the test host load.


> So I wonder, are those behavioral changes a net win from a user's
> standpoint?


Would a user complain that they asked for -T 10 -P 1 but the test ran for
10 seconds and they got 10 progress reports along the way?


The net win is that the feature they asked for has been tested a little 
before they actually ran it. It is small, but better than nothing.


> If not we're letting the tail wag the dog.  (If they are a win, they
> ought to be submitted and defended independently, and maybe there ought
> to be some documentation changes alongside.)


The shortened execution time is an undocumented corner case that nobody
really expects as a feature, IMHO. It is a side effect of the
implementation. I do not think it is worth documenting.



> I'm not against having better test coverage numbers, but it's a means
> to an end not an end in itself.


Sure, numbers are not an end in themselves.

For me what is not tested should not be expected to work, so I like to 
have most/all lines run at least once by some tests, as a minimal 
insurance that it is not completely broken… which means at least green.


> It has to be balanced against the amount of effort to be put into
> testing (as opposed to actually improving our software).


I'm all for balanced and meaningful testing, obviously.

However, the current balance results in the coverage status being abysmal. 
I do not think it is the right one.


--
Fabien.

Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Fabien COELHO



Hello Tom,


I cannot say that I "want" to fix something which already works the same
way, because it is against my coding principles. [...]
I counted nearly 3500 calls under src/bin.


Yeah, that's the problem.  If someone does come forward with a patch to do
that, I think it'd be summarily rejected, at least in high-traffic code
like pg_dump.  The pain it'd cause for back-patching would outweigh the
value.


What about "typedef StringInfoData PQExpBufferData" and replacing
PQExpBuffer by StringInfo internally, just keeping the old interface
around because it is there? That would remove a few hundred lines of code.


ISTM that with inline functions and varargs macros the substitution can be
managed reasonably lightly, depending on what level of compatibility is
required for libpq: should it be linkability, or is requiring a
recompilation ok?


A clear benefit is that there are quite a few utils for PQExpBuffer in 
"fe_utils/string_utils.c" which would become available for StringInfo, 
which would help using StringInfo without duplicating them.
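The aliasing idea can be sketched with simplified stand-ins for both types; the real StringInfoData and PQExpBufferData structs have largely overlapping fields (data, len, maxlen), which is what makes the typedef plausible. Everything below is a toy illustration, not the libpq implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's StringInfoData */
typedef struct StringInfoData
{
	char	   *data;
	int			len;
	int			maxlen;
} StringInfoData;
typedef StringInfoData *StringInfo;

static void
appendStringInfoString(StringInfo str, const char *s)
{
	int			slen = (int) strlen(s);

	while (str->len + slen + 1 > str->maxlen)
	{
		str->maxlen *= 2;
		str->data = realloc(str->data, str->maxlen);
	}
	memcpy(str->data + str->len, s, slen + 1);
	str->len += slen;
}

/* The proposed compatibility shim: the old PQExpBuffer API becomes a
 * thin alias over StringInfo, so existing callers keep compiling. */
typedef StringInfoData PQExpBufferData;
typedef PQExpBufferData *PQExpBuffer;
#define appendPQExpBufferStr(buf, s) appendStringInfoString((buf), (s))

int
main(void)
{
	StringInfoData buf = {malloc(16), 0, 16};

	buf.data[0] = '\0';
	appendStringInfoString(&buf, "hello");
	appendPQExpBufferStr(&buf, ", world");	/* old interface, same storage */

	printf("%s (len=%d)\n", buf.data, buf.len);
	assert(strcmp(buf.data, "hello, world") == 0 && buf.len == 12);
	free(buf.data);
	return 0;
}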



That being the case, I'd think a better design principle is "make your
new code look like the code around it",


Yep.

which would tend to weigh against introducing StringInfo uses into 
pgbench when there's none there now and a bunch of PQExpBuffer instead.

So I can't help thinking the advice you're being given here is suspect.


Well, that is what I was saying, but at 2 against 1, I fold.

--
Fabien.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-28 Thread Amit Kapila
On Sat, Mar 28, 2020 at 2:19 PM Dilip Kumar  wrote:
>
> On Sat, Mar 28, 2020 at 11:56 AM Amit Kapila  wrote:
> >
> >
> > I have looked at the solution proposed and I would like to share my
> > findings.  I think calling ProcArrayApplyXidAssignment for each
> > subtransaction is not a good idea for a couple of reasons:
> > (a) It will just defeat the purpose of maintaining the KnownAssignedXids
> > array, which is to avoid looking at pg_subtrans in
> > TransactionIdIsInProgress() on standby.  Basically, if we remove it
> > for each subXid, it will consider the KnownAssignedXids to be
> > overflowed and check pg_subtrans frequently.
>
> Right, I also think this is a problem with this solution.  I think we
> may try to avoid this by caching this information.  But, then we will
> have to maintain this in some dimensional array which stores
> sub-transaction ids per top transaction or we can maintain a list of
> sub-transaction for each transaction.  I haven't thought about how
> much complexity this solution will add.
>

How about if instead of writing an XLOG_XACT_ASSIGNMENT WAL, we set a
flag in TransactionStateData and then log that as special information
whenever we write next WAL record for a new subtransaction?  Then
during recovery, we can only call ProcArrayApplyXidAssignment when we
find that special flag is set in a WAL record.  One idea could be to
use a flag bit in XLogRecord.xl_info.  If that is feasible then the
solution can work as it is now, without any overhead or change in the
way we maintain KnownAssignedXids.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: doc: vacuum full, fillfactor, and "extra space"

2020-03-28 Thread Amit Kapila
On Wed, Jan 29, 2020 at 9:10 PM Peter Eisentraut
 wrote:
>
> On 2020-01-20 06:30, Justin Pryzby wrote:
>
> About your patch, I don't think this is clearer.  The fillfactor stuff
> is valid to be mentioned, but the way it's being proposed makes it sound
> like the main purpose of VACUUM FULL is to bloat the table to make
> fillfactor room.  The "no extra space" wording made sense to me, with
> the fillfactor business perhaps worth being put into a parenthetical note.
>

Justin, would you like to address this comment of Peter E.?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WAL usage calculation patch

2020-03-28 Thread Amit Kapila
On Sat, Mar 28, 2020 at 12:54 AM Julien Rouhaud  wrote:
>
> On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik  wrote:
> >
> >
> > All these are really valuable objections. Unfortunately, I won't be
> > able to get all sorted out soon, due to total lack of time. I would be
> > very glad if somebody could step in for this patch.
>
> I'll try to do that tomorrow!
>

I see some basic problems with the patch.  The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me.  Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index?  Basically,
I don't know whether the changes done in ExecInitParallelPlan and friends
allow us to compute WAL for parallel operations.  Those will primarily
cover parallel queries that won't write WAL.  How have you tested those
changes?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Possible copy and past error? (\usr\backend\commands\analyze.c)

2020-03-28 Thread Ranier Vilela
Em sex., 27 de mar. de 2020 às 20:49, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > Can someone check if there is a copy and paste error, at file:
> > \usr\backend\commands\analyze.c, at lines 2225 and 2226?
> > int num_mcv = stats->attr->attstattarget;
> > int num_bins = stats->attr->attstattarget;
>
> No, that's intentional I believe.  Those are independent variables
> that just happen to start out with the same value.
>
Neither you nor I can say with 100% certainty what the original author's
intention was.

>
> > If they really are the same values, it could be changed to:
>
> > int num_mcv = stats->attr->attstattarget;
> > int num_bins = num_mcv;
>
> That would make it look like they are interdependent, which they are not.
>
> That's exactly why, instead of proposing a patch, I asked a question.


> > To silence this alert.
>
> If you have a tool that complains about that coding, I think the
> tool needs a solid whack upside the head.  There's nothing wrong
> with the code, and it clearly expresses the intent, which the other
> way doesn't.  (Or in other words: it's the compiler's job to
> optimize away the duplicate fetch.  Not the programmer's.)
>
I completely disagree. My tools have proven their worth, including finding
serious errors in the code, which fortunately have been fixed by other
committers.
When issuing this alert, the tool does not make a value judgment about
performance or optimization, but it does an excellent job of finding
similar patterns in adjacent lines, and the only thing it asked was
whether this was really the original author's intention.

regards,
Ranier Vilela


Re: allow online change primary_conninfo

2020-03-28 Thread Sergei Kornilov
Hello

Thank you very much!

I attached updated patch: walreceiver will use configured primary_slot_name as 
temporary slot name if wal_receiver_create_temp_slot is enabled.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de21903a1..8983cb5f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4170,8 +4170,8 @@ ANY num_sync ( ).
+slot on the remote instance. The slot name can be configured
+(using ), otherwise it will be generated.
 The default is off.  This parameter can only be set in the 
 postgresql.conf file or on the server command line.
 If this parameter is changed while the WAL receiver process is running,
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index fd9ac35dac..134600db7d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -116,13 +116,8 @@ StartupRereadConfig(void)
 
 	conninfoChanged = strcmp(conninfo, PrimaryConnInfo) != 0;
 	slotnameChanged = strcmp(slotname, PrimarySlotName) != 0;
+	tempSlotChanged = (tempSlot != wal_receiver_create_temp_slot);
 
-	/*
-	 * wal_receiver_create_temp_slot is used only when we have no slot
-	 * configured.  We do not need to track this change if it has no effect.
-	 */
-	if (!slotnameChanged && strcmp(PrimarySlotName, "") == 0)
-		tempSlotChanged = tempSlot != wal_receiver_create_temp_slot;
 	pfree(conninfo);
 	pfree(slotname);
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 760e3c7ab0..303e739b5b 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -245,12 +245,6 @@ WalReceiverMain(void)
 	startpoint = walrcv->receiveStart;
 	startpointTLI = walrcv->receiveStartTLI;
 
-	/*
-	 * At most one of is_temp_slot and slotname can be set; otherwise,
-	 * RequestXLogStreaming messed up.
-	 */
-	Assert(!is_temp_slot || (slotname[0] == '\0'));
-
 	/* Initialise to a sanish value */
 	walrcv->lastMsgSendTime =
 		walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = now;
@@ -365,14 +359,14 @@ WalReceiverMain(void)
 
 		/*
 		 * Create temporary replication slot if requested, and update slot
-		 * name in shared memory.  (Note the slot name cannot already be set
-		 * in this case.)
+		 * name in shared memory.
 		 */
 		if (is_temp_slot)
 		{
-			snprintf(slotname, sizeof(slotname),
-					 "pg_walreceiver_%lld",
-					 (long long int) walrcv_get_backend_pid(wrconn));
+			if (slotname[0] == '\0')
+				snprintf(slotname, sizeof(slotname),
+						 "pg_walreceiver_%lld",
+						 (long long int) walrcv_get_backend_pid(wrconn));
 
 			walrcv_create_slot(wrconn, slotname, true, 0, NULL);
 
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 21d1823607..28117883e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -218,7 +218,7 @@ ShutdownWalRcv(void)
  * "recptr" indicates the position where streaming should begin.  "conninfo"
  * is a libpq connection string to use.  "slotname" is, optionally, the name
  * of a replication slot to acquire.  "create_temp_slot" indicates to create
- * a temporary slot when no "slotname" is given.
+ * a temporary slot.
  *
  * WAL receivers do not directly load GUC parameters used for the connection
  * to the primary, and rely on the values passed down by the caller of this
@@ -254,23 +254,17 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
 	else
 		walrcv->conninfo[0] = '\0';
 
-	/*
-	 * Use configured replication slot if present, and ignore the value
-	 * of create_temp_slot as the slot name should be persistent.  Otherwise,
-	 * use create_temp_slot to determine whether this WAL receiver should
-	 * create a temporary slot by itself and use it, or not.
-	 */
 	if (slotname != NULL && slotname[0] != '\0')
 	{
 		strlcpy((char *) walrcv->slotname, slotname, NAMEDATALEN);
-		walrcv->is_temp_slot = false;
 	}
 	else
 	{
 		walrcv->slotname[0] = '\0';
-		walrcv->is_temp_slot = create_temp_slot;
 	}
 
+	walrcv->is_temp_slot = create_temp_slot;
+
 	if (walrcv->walRcvState == WALRCV_STOPPED)
 	{
 		launch = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 79bc7ac8ca..e702e3eddc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2051,7 +2051,7 @@ static struct config_bool ConfigureNamesBool[] =
 
 	{
 		{"wal_receiver_create_temp_slot", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Sets whether a WAL receiver should create a temporary replication slot if no permanent slot is configured."),
+			gettext_noop("Sets whether a WAL receiver should create a temporary replication slot."),
 		},
 		&wal_receiver_create_temp_slot,
 		false,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e9f8ca775d..e09be86940 100644

Re: [PATCH] postgresql.conf.sample->postgresql.conf.sample.in

2020-03-28 Thread Peter Eisentraut

On 2020-03-28 10:00, i.tara...@postgrespro.ru wrote:

A patch for converting postgresql.conf.sample to
postgresql.conf.sample.in . This feature allows you to manage the
contents of postgresql.conf.sample at the configure phase.

Usage example:

  ./configure --enable-foo


configure.in:

  foo_params=$(cat <<-END
foo_param1 = on
foo_param2 = 16
  END
  )
  AC_SUBST(foo_params)


postgresql.conf.sample.in:

  @foo_params@


postgresql.conf.sample:

  foo_param1 = on
  foo_param2 = 16


Why do we need that?  We already have the capability to make initdb edit 
postgresql.conf.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow some recovery parameters to be changed with reload

2020-03-28 Thread Sergei Kornilov
Hello

I want to return to this discussion, since primary_conninfo is now PGC_SIGHUP 
(and I hope will not be reverted)

> On 2019-02-08 09:19:31 +0900, Michael Paquier wrote:
>>  On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:
>>  > Probably right. I figured it would be useful to see what the outcome is
>>  > with primary_conninfo, so they can be treated similarly.
>>
>>  The interactions with waiting for WAL to be available and the WAL
>>  receiver stresses me a bit for restore_command, as you could finish
>>  with the startup process switching to use restore_command with a WAL
>>  receiver still working behind and overwriting partially the recovered
>>  segment, which could lead to corruption. We should be *very* careful
>>  about that.
>
> I'm not clear on the precise mechanics you're imagining here, could you
> expand a bit? We kill the walreceiver when switching from receiver to
> restore command, and wait for it to acknowledge that, no?
> C.F. ShutdownWalRcv() call in the lastSourceFailed branch of
> WaitForWALToBecomeAvailable().

So...
We call restore_command only when the walreceiver is stopped.
We use restore_command only in the startup process, so we have no race
condition between processes.
Do we have any issues here? Or can we just make restore_command reloadable, as
attached?

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de21903a1..454bf95d9b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3303,7 +3303,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows

 

-This parameter can only be set at server start.
+This parameter can only be set in the postgresql.conf
+file or on the server command line.

   
  
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 79bc7ac8ca..f340369dcf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3641,7 +3641,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will retrieve an archived WAL file."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e9f8ca775d..8078249e1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -247,7 +247,6 @@
 # placeholders: %p = path of file to restore
 #   %f = file name only
 # e.g. 'cp /mnt/server/archivedir/%f %p'
-# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
 #recovery_end_command = ''	# command to execute at completion of recovery
 


Re: [PATCH] postgresql.conf.sample->postgresql.conf.sample.in

2020-03-28 Thread Ivan N. Taranov
This is useful for building an installable postgresql.conf.sample. At the
configure phase, it is possible to include/exclude parameters in the
sample depending on the selected options (--enable-*/--disable-*,
etc.).

On Sat, Mar 28, 2020 at 2:21 PM Peter Eisentraut
 wrote:
>
> On 2020-03-28 10:00, i.tara...@postgrespro.ru wrote:
> > A patch for converting postgresql.conf.sample to
> > postgresql.conf.sample.in . This feature allows you to manage the
> > contents of postgresql.conf.sample at the configure phase.
> >
> > Usage example:
> >
> >   ./configure --enable-foo
> >
> >
> > configure.in:
> >
> >   foo_params=$(cat <<-END
> > foo_param1 = on
> > foo_param2 = 16
> >   END
> >   )
> >   AC_SUBST(foo_params)
> >
> >
> > postgresql.conf.sample.in:
> >
> >   @foo_params@
> >
> >
> > postgresql.conf.sample:
> >
> >   foo_param1 = on
> >   foo_param2 = 16
>
> Why do we need that?  We already have the capability to make initdb edit
> postgresql.conf.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Online checksums verification in the backend

2020-03-28 Thread Julien Rouhaud
On Sat, Mar 28, 2020 at 12:28:27PM +0900, Masahiko Sawada wrote:
> On Wed, 18 Mar 2020 at 19:11, Julien Rouhaud  wrote:
> >
> > v5 attached
> 
> Thank you for updating the patch. I have some comments:

Thanks a lot for the review!

> 1.
> +pg_check_relation(relation oid, fork text)
> 
> Looking at the declaration of pg_check_relation, 'relation' and 'fork'
> are optional arguments. So I think the above is not correct. But as I
> commented below, 'relation' should not be optional, so maybe the above
> line could be:
> 
> +pg_check_relation(relation oid[, fork text])

Yes I missed that when making relation mandatory.  Fixed.

> 2.
> +pg_check_relation
> +
> +pg_check_relation iterates over all the blocks of all
> +or the specified fork of a given relation and verify their checksum.  It
> +returns the list of blocks for which the found checksum doesn't match the
> +expected one.  You must be a member of the
> +pg_read_all_stats role to use this function.  It can
> +only be used if data checksums are enabled.  See
> +<xref linkend="app-initdb-data-checksums"/> for more information.
> 
> * I think we need a description about possible values for 'fork'
> (i.g., 'main', 'vm', 'fsm' and 'init'), and the behavior when 'fork'
> is omitted.

Done.

> * Do we need to explain about checksum cost-based delay here?

It's probably better in config.sgml, near the vacuum cost-based delay material,
so done this way with a link to that part.

> 3.
> +CREATE OR REPLACE FUNCTION pg_check_relation(
> +  IN relation regclass DEFAULT NULL::regclass, IN fork text DEFAULT 
> NULL::text,
> +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> +  OUT expected_checksum integer, OUT found_checksum integer)
> +  RETURNS SETOF record VOLATILE LANGUAGE internal AS 'pg_check_relation'
> +  PARALLEL RESTRICTED;
> 
> Now that pg_check_relation doesn't accept NULL as 'relation', I think
> we need to make 'relation' a mandatory argument.

Correct, fixed.

> 4.
> +   /* Check if the relation (still) exists */
> +   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> +   {
> +   /*
> +* We use a standard relation_open() to acquire the initial lock.  It
> +* means that this will block until the lock is acquired, or will
> +* raise an ERROR if lock_timeout has been set.  If caller wants to
> +* check multiple tables while relying on a maximum wait time, it
> +* should process tables one by one instead of relying on a global
> +* processing with the main SRF.
> +*/
> +   relation = relation_open(relid, AccessShareLock);
> +   }
> 
> IIUC the above was necessary because we used to have
> check_all_relations() which iterates all relations on the database to
> do checksum checks. But now that we don't have that function and
> pg_check_relation processes one relation. Can we just do
> relation_open() here?

Ah yes I missed that comment.  I think only the comment needed to be updated to
remove any part related to NULL-relation call.  I ended up removing the whole
comment since locking and lock_timeout behavior is inherent to relation_open
and there's no need to document that any further now that we always only check
one relation at a time.

> 5.
> I think we need to check if the relation is a temp relation. I'm not
> sure it's worth to check checksums of temp relations but at least we
> need not to check other's temp relations.

Good point.  I think it's still worthwhile to check the backend's temp
relation, although typical usage should be a bgworker/cron job doing that check
so there shouldn't be any.

> 6.
> +/*
> + * Safely read the wanted buffer from disk, dealing with possible concurrency
> + * issue.  Note that if a buffer is found dirty in shared_buffers, no read will
> + * be performed and the caller will be informed that no check should be done.
> + * We can safely ignore such buffers as they'll be written before next
> + * checkpoint's completion.
> [...] 
> + */
> 
> I think the above comment also needs some "/*---" at the beginning and 
> end.

Fixed.

> 7.
> +static void
> +check_get_buffer(Relation relation, ForkNumber forknum,
> +                 BlockNumber blkno, char *buffer, bool needlock,
> +                 bool *checkit, bool *found_in_sb)
> +{
> 
> Maybe we can make check_get_buffer() return a bool indicating we found
> a buffer to check, instead of having '*checkit'. That way, we can
> simplify the following code:
> 
> +   check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> +&checkit, &found_in_sb);
> +
> +   if (!checkit)
> +   continue;
> 
> to something like:
> 
> + if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> +  &found_in_sb))
> + continue;

Changed.

> 8.
> +   if (PageIsVerified(buffer, blkno))
> +   {
> +   /*
>

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-28 Thread Dean Rasheed
On Wed, 25 Mar 2020 at 00:28, Tomas Vondra  wrote:
>
> Seems OK to me.
>
> I'd perhaps name deps_clauselist_selectivity differently, it's a bit too
> similar to dependencies_clauselist_selectivity. Perhaps something like
> clauselist_apply_dependencies? But that's a minor detail.
>

OK, I've pushed that with your recommendation for that function name.

Regards,
Dean




Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

2020-03-28 Thread Dave Sharpe
From Gilles Darold  on 2020-03-26T16:09:04.
> Actually the callback function is called when the error is thrown:

> psql:eat_rollback2.sql:20: INFO:  0: myTransactionCallback() XactEvent 2 
> (is abort) level 1 <-
> LOCATION:  myTransactionCallback, eat_rollback.c:52
> psql:eat_rollback2.sql:20: ERROR:  XX000: no no no
> LOCATION:  mySubtransactionCallback, eat_rollback.c:65

> this is probably why the callback is not called on the subsequent ROLLBACK 
> execution because abort processing is
>  already done (src/backend/access/transam/xact.c:3890).
So I withdraw this patch and fix. The callback during the error will drive the
remote ROLLBACK, as required in the FDW.
Great catch, thanks Gilles!

Cheers,
Dave



GSoC Proposal

2020-03-28 Thread Kartik Ohri
Hi! Can I get some review on my GSoC proposal?

https://docs.google.com/document/d/1EiIHZjOjf6yWfGzKeHCbu8bJ6K1tCEcmPsD3i8lPXbg/edit?usp=sharing



Thanks.


[PATCH'] Variables assigned with values that is never used.

2020-03-28 Thread Ranier Vilela
Hi,

These variables are assigned values that are never used, and the
assignments can safely be removed.

best regards,
Ranier Vilela


variables_assigned_unused_value.patch
Description: Binary data


Re: WAL usage calculation patch

2020-03-28 Thread Julien Rouhaud
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> 
> I see some basic problems with the patch.  The way it tries to compute
> WAL usage for parallel stuff doesn't seem right to me.  Can you share
> or point me to any test done where we have computed WAL for parallel
> operations like Parallel Vacuum or Parallel Create Index?

Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for.  That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statement is already broken wrt. buffer
usage in parallel utility, unless I'm missing something.

> Basically,
> I don't know changes done in ExecInitParallelPlan and friends allow us
> to compute WAL for parallel operations.  Those will primarily cover
> parallel queries that won't write WAL.  How you have tested those
> changes?

I didn't test those, and I'm not even sure how to properly and reliably test
that.  Do you have any advice on how to achieve that?

However the patch is mimicking the buffer instrumentation that already exists,
and the approach also looks correct to me.  Do you have a reason to believe
that the approach that works for buffer usage wouldn't work for WAL records? (I
of course agree that this should be tested anyway)




Re: base backup client as auxiliary backend process

2020-03-28 Thread Alvaro Herrera
On 2020-Jan-14, Peter Eisentraut wrote:

> On 2020-01-14 07:32, Masahiko Sawada wrote:
> > - Replication slot name used by this WAL receiver
> > + 
> > +  Replication slot name used by this WAL receiver.  This is only set if a
> > +  permanent replication slot is set using
> > +  <xref linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
> > +  a temporary replication slot (determined by
> > +  <xref linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
> > +  here.
> > + 
> > 
> > Now that the slot name is shown even if it's a temp slot the above
> > documentation changes needs to be changed. Other changes look good to
> > me.
> 
> committed, thanks

Sergei has just proposed a change in semantics: if primary_slot_name is
specified as well as wal_receiver_create_temp_slot, then a temp slot is
used and it uses the specified name, instead of ignoring the temp-slot
option as currently.

Patch is at https://postgr.es/m/3109511585392...@myt6-887fb48a9c29.qloud-c.yandex.net

(To clarify: the current semantics if both options are set is that an
existing permanent slot is sought with the given name, and an error is
raised if it doesn't exist.)

What do you think?  Preliminarily I think the proposed semantics are
saner.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal \gcsv

2020-03-28 Thread Daniel Verite
Erik Rijkers wrote:

> 2. let the psql command-line option '--csv' honour the value given by  
> psql -F/--field-separator (it does not do so now)
>
> or
> 
> 3. add an psql -commandline option:
> --csv-field-separator

Setting the field separator on the command line is already supported
through this kind of invocation:

psql --csv -P csv_fieldsep=$'\t'

bash expands $'\t' to a tab character. Other shells might need
different tricks.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: proposal \gcsv

2020-03-28 Thread Pavel Stehule
On Sat, Mar 28, 2020 at 3:06 PM Daniel Verite 
wrote:

> Erik Rijkers wrote:
>
> > 2. let the psql command-line option '--csv' honour the value given by
> > psql -F/--field-separator (it does not do so now)
> >
> > or
> >
> > 3. add an psql -commandline option:
> > --csv-field-separator
>
> Setting the field separator on the command line is already supported
> through this kind of invocation:
>
> psql --csv -P csv_fieldsep=$'\t'
>
> bash expands $'\t' to a tab character. Other shells might need
> different tricks.
>

We have named parameters in shell, but not in psql



>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
 wrote:
> ...
> >> As a side note here, I'm wondering if this (determining useful pathkeys)
> >> can be made a bit smarter by looking both at query_pathkeys and pathkeys
> >> useful for merging, similarly to what truncate_useless_pathkeys() does.
> >> But that can be seen as an improvement of what we do now.
> >
> >Unless your comment below about looking at truncate_useless_pathkeys
> >is implying you're considering aiming to get this in now, I wonder if
> >we should just expand the comment to reference pathkeys useful for
> >merging as a possible future extension.
> >
>
> Maybe. I've been thinking about how to generate those path keys, but
> it's a bit tricky, because pathkeys_useful_for_merging() - the thing
> that backs truncate_useless_pathkeys() actually gets pathkeys and merely
> verifies if those are useful for merging. But this code needs to do the
> opposite - generate those pathkeys.
>
> But let's say we know we have a join on columns (a,b,c). For each of
> those PathKey values we know it's useful for merging, but should we
> generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I
> suppose we can look at pathkeys for existing paths of the relation to
> prune the possibilities a bit, but what then?

I'm not convinced it's worth it this time around. Both because of the
late hour in the CF etc., but also because it seems likely to become
pretty complex quickly, and also far more likely to raise performance
questions in planning if there's not a lot of work done to keep it
limited.

> BTW I wonder if we actually need the ec_has_volatile check in
> get_useful_pathkeys_for_relation. The comment says we can't but is it
> really true? pathkeys_useful_for_ordering doesn't do any such checks, so
> I'd bet this is merely an unnecessary copy-paste from postgres_fdw.
> Opinions?

I hadn't really looked at that part in depth before, but after reading
it over, re-reading the definition of volatility, and running a few
basic queries, I agree.

For example: we already do allow volatile pathkeys in a simple query like:
-- t(a, b) with index on (a)
select * from t order by a, random();

It makes sense that you couldn't push down such a sort to a foreign
server, given there's no constraints said function isn't operating
directly on the primary database (in the fdw case). But that obviously
wouldn't apply here.

> >> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
> >> >* Considering query_pathkeys is always worth it, because it might let us
> >> >* avoid a local sort.
> >> >
> >> >That originally was a copy from the fdw code, but since the two
> >> >functions have diverged (Is that concerning? I could be confusing, but
> >> >isn't a compilation problem) I didn't move the function.
> >> >
> >>
> >> I think it's OK the two functions diverged, it's simply because the FDW
> >> one needs to check other things too. But I might rework this once I look
> >> closer at truncate_useless_pathkeys.
> >
> >Agreed, for now at least. It's tempting to think they should always be
> >shared, but I'm not convinced (without a lot more digging) that this
> >represents structural rather than incidental duplication.
> >
>
> The more I look at pathkeys_useful_for_ordering() the more I think the
> get_useful_pathkeys_for_relation() function should look more like it
> than the postgres_fdw one ...

If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly? The
only thing you lose then is the check that each eclass has a member
in the rel. But that check probably wasn't quite right anyway: at
least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in
the rel, but rather that some prefix does, right? The idea there would
be that we could create a gather merge here that supplies a partial
ordering (that prefix of query_pathkeys) and then above it the planner
might place another incremental sort (say above a join), or perhaps
even a join above that would actually be sufficient (since many joins
are capable of providing an ordering anyway).

I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.

The attached also adds a few XXX comments. In particular, I wonder if
we should verify that some prefix of the root->query_pathkeys is
actually made up of eclass members for the current rel, because
otherwise I think we can skip the loop on the subpaths entirely.

> >> >I did notice though that find_em_expr_for_rel() is wholesale copied
> >> >(and unchanged) from the fdw code, so I moved it to equivclass.c so
> >> >both places can share it.
> >> >
> >>
> >

Re: improve transparency of bitmap-only heap scans

2020-03-28 Thread James Coleman
On Fri, Mar 27, 2020 at 9:24 PM Amit Kapila  wrote:
>
> On Wed, Mar 25, 2020 at 5:44 PM James Coleman  wrote:
> >
> > On Tue, Mar 24, 2020 at 11:02 PM Amit Kapila  
> > wrote:
> > >
> > > On Wed, Mar 25, 2020 at 12:44 AM Tom Lane  wrote:
> > > >
> > > > I took a quick look through this patch.  While I see nothing to complain
> > > > about implementation-wise, I'm a bit befuddled as to why we need this
> > > > reporting when there is no comparable data provided for regular 
> > > > index-only
> > > > scans.  Over there, you just get "Heap Fetches: n", and the existing
> > > > counts for bitmap scans seem to cover the same territory.
> > > >
> > >
> > > Isn't deducing similar information ("Skipped Heap Fetches: n") there
> > > is a bit easier than it is here?
> >
> > While I'm not the patch author so can't speak to the original thought
> > process, I do think it makes sense to show it.
> >
>
> Yeah, I also see this information could be useful.  It seems Tom Lane
> is not entirely convinced of this.  I am not sure if this is the right
> time to seek more opinions as we are already near the end of CF.  So,
> we should either decide to move this to the next CF if we think of
> getting the opinion of others or simply reject it and see a better way
> for EXPLAIN to identify an index-only BMS.

I'm curious if Tom's objection is mostly on the grounds that we should
be consistent in what's displayed, or that he thinks the information
is likely to be useless.

If consistency is the goal you might e.g., do something that just
changes the node type output, but in favor of changing that, it seems
to me that showing "how well did the optimization do" is actually more
valuable than "did we do the optimization at all". Additionally I
think showing it as an optimization of an existing node is actually
likely less confusing anyway.

One other thing: my understanding is that this actually matches the
underlying code split too. For the index only scan case, we actually
have a separate node (it's not just an optimization of the standard
index scan). There are discussions about whether that's a good thing,
but it's what we have. In contrast, the bitmap scan actually has it as
a pure optimization of the existing bitmap heap scan node.

James




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread David Steele

On 3/27/20 9:52 PM, Alvaro Herrera wrote:

On 2020-Mar-27, Tom Lane wrote:


That being the case, I'd think a better design principle is "make your
new code look like the code around it", which would tend to weigh against
introducing StringInfo uses into pgbench when there's none there now and
a bunch of PQExpBuffer instead.  So I can't help thinking the advice
you're being given here is suspect.


+1 for keeping it PQExpBuffer-only, until such a time when you need a
StringInfo feature that's not in PQExpBuffer -- and even at that point,
I think you'd switch just that one thing to StringInfo, not the whole
program.


I think I need to be careful what I joke about.  It wasn't my intention 
to advocate changing all the existing *PQExpBuffer() calls in bin.


But, the only prior committer to look at this patch expressed a 
preference for StringInfo so in the absence of any other input I thought 
it might move the patch forward if I reinforced that.  Now it seems the 
consensus has moved in favor of *PQExpBuffer().


Fabien has provided a patch in each flavor, so I guess the question is: 
is it committable either way?


Regards,
--
-David
da...@pgmasters.net




Re: proposal \gcsv

2020-03-28 Thread Erik Rijkers

On 2020-03-28 15:06, Daniel Verite wrote:

Erik Rijkers wrote:


2. let the psql command-line option '--csv' honour the value given by
psql -F/--field-separator (it does not do so now)

or

3. add an psql -commandline option:
--csv-field-separator


Setting the field separator on the command line is already supported
through this kind of invocation:

psql --csv -P csv_fieldsep=$'\t'

bash expands $'\t' to a tab character. Other shells might need
different tricks.


Ah yes, that works.  I had not seen that psql -P option.  Thanks!




Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite





pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-28 Thread Julien Rouhaud
On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > 
> > I see some basic problems with the patch.  The way it tries to compute
> > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > or point me to any test done where we have computed WAL for parallel
> > operations like Parallel Vacuum or Parallel Create Index?
> 
> Ah, that's indeed a good point and AFAICT WAL records from parallel utility
> workers won't be accounted for.  That being said, I think that an argument
> could be made that proper infrastructure should have been added in the 
> original
> parallel utility patches, as pg_stat_statement is already broken wrt. buffer
> usage in parallel utility, unless I'm missing something.

Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.

I added an open item for that, and I'm adding Robert in Cc, as 9da0cc352 is the first
commit adding parallel maintenance.




Re: [PATCH] postgresql.conf.sample->postgresql.conf.sample.in

2020-03-28 Thread Tom Lane
"Ivan N. Taranov"  writes:
> This is usable for build installable postgresql.conf.SAMPLE. At the
> configure phase, it is possible to include / exclude parameters in the
> sample depending on the selected options (--enable - * / - disable- *
> etc ..)

I'm with Peter on this: you're proposing to complicate matters for
no real gain.

As a former packager, I can readily imagine situations where somebody
wants to adjust the initial contents of postgresql.conf compared to
what's distributed --- I've done it myself.  But anybody who's in that
situation has got lots of other tools they can use for the purpose
(patch(1) being a pretty favorite one, since it can also apply other
sorts of code changes).  Even more to the point, they've probably got
an existing process for this, which would be needlessly broken by
renaming the file as-distributed.

Also, of the various ways that one might inject a modification,
editing the configure.in file and then having to re-autoconf is
one of the more painful ones, probably only exceeded by trying
to maintain a patch against configure itself :-(

As far as the project's own internal needs go, we do already have
cases where configure's choices need to feed into postgresql.conf, but
having initdb do all the actual editing has worked out fine for that.
I don't think splitting the responsibility between configure time and
initdb time would be an improvement --- for one thing, it'd be more
painful not less so to deal with cases where considerations at both
levels affect the same postgresql.conf entries.

So if you want this proposal to go anywhere, you need a much more
concrete and compelling example of something for which this is the
only sane way to do it.

regards, tom lane




Re: [PATCH] postgresql.conf.sample->postgresql.conf.sample.in

2020-03-28 Thread Ivan N. Taranov
A patch is indeed a good way, but (1) it requires invasive changes to the
makefile, and (2) it makes changes to a file stored in git.

If postgresql.conf.sample.in is a template, there are no such problems,
and it does not bother anyone who assumes the existence of the
postgresql.conf.sample file.

>Even more to the point, they've probably got an existing process for this, 
>which would be needlessly broken by renaming the file as-distributed.


I agree, this is a serious reason not to do this, especially if the
vendor stores changes to postgresql.conf.sample in git.

> So if you want this proposal to go anywhere, you need a much more concrete 
> and compelling example of something for which this is the  only sane way to 
> do it.


This feature seems useful for preparing a number of packages with
different feature sets. Each feature can have its own set of sample
settings in postgresql.conf.sample. In that case, using makefile +
patch is uglier.

In any case, I am grateful for the answer and clarification!




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-28 Thread Tom Lane
Justin Pryzby  writes:
> Thanks for fixing my test case and pushing.

The buildfarm just showed up another instability in the test cases
we added:

=== regression.diffs
diff -U3 /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out
--- /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out	2020-03-17 08:14:50.292037956 +0100
+++ /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out	2020-03-28 13:55:12.490024822 +0100
@@ -169,11 +169,7 @@
 
 select (w).size = :segsize as ok
 from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
- ok 
-
- t
-(1 row)
-
+ERROR:  could not stat file "pg_wal/00010078": No such file or directory
 select count(*) >= 0 as ok from pg_ls_archive_statusdir();
  ok 
 

It's pretty obvious what happened here: concurrent activity renamed or
removed the WAL segment between when we saw it in the directory and
when we tried to stat() it.

This seems like it would be just as much of a hazard for field usage
as it is for regression testing, so I propose that we fix these
directory-scanning functions to silently ignore ENOENT failures from
stat().  Are there any for which we should not do that?

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread Tom Lane
David Rowley  writes:
> On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  wrote:
>> In the light of that, I have no objections.

> Thank you.  Pushed.

It seems like this commit has resulted in some buildfarm instability:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02

apparent change of plan

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05

unstable results in stats_ext test

I initially thought that Dean's functional-stats adjustment might be
the culprit, but the timestamps on these failures disprove that.

regards, tom lane




Re: ssl passphrase callback

2020-03-28 Thread Tom Lane
Andreas Karlsson  writes:
> On 3/22/20 1:08 AM, Andrew Dunstan wrote:
>> Latest patch attached, I think all comments have been addressed. I
>> propose to push this later this coming week if there are no more comments.

> I do not have any objections.

This CF entry is still open, should it not be closed as committed?

https://commitfest.postgresql.org/27/2338/

regards, tom lane




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Andres Freund
Hi,

On 2020-03-27 19:57:12 -0400, Tom Lane wrote:
> Fabien COELHO  writes:
> >>> Ok. I find it strange to mix PQExpBuffer & StringInfo in the same file.
> 
> >> Agreed, but we'd rather use StringInfo going forward.  However, I don't 
> >> think 
> >> that puts you on the hook for updating all the PQExpBuffer references.
> >> Unless you want to...
> 
> > I cannot say that I "want" to fix something which already works the same 
> > way, because it is against my coding principles.
> > However there may be some fun in writing a little script to replace one 
> > with the other automatically. I counted nearly 3500 calls under src/bin.
> 
> Yeah, that's the problem.  If someone does come forward with a patch to do
> that, I think it'd be summarily rejected, at least in high-traffic code
> like pg_dump.  The pain it'd cause for back-patching would outweigh the
> value.

Sure, but that's not at all what was proposed.


> That being the case, I'd think a better design principle is "make your
> new code look like the code around it", which would tend to weigh against
> introducing StringInfo uses into pgbench when there's none there now and
> a bunch of PQExpBuffer instead.  So I can't help thinking the advice
> you're being given here is suspect.

I don't agree with this. This is a "fresh" usage of StringInfo. That's
different to adding one new printed line among others built with
pqexpbuffer. If we continue adding large numbers of new uses of both
pieces of infrastructure, we're just making things more confusing.

Greetings,

Andres Freund




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-28 Thread Justin Pryzby
On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
> The buildfarm just showed up another instability in the test cases
> we added:

Yea, as you said, this is an issue with the *testcase*.  The function behavior
didn't change, we just weren't previously exercising it.

>  select (w).size = :segsize as ok
>  from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
> - ok 
> -
> - t
> -(1 row)
> -
> +ERROR:  could not stat file "pg_wal/00010078": No such file or directory
>  select count(*) >= 0 as ok from pg_ls_archive_statusdir();
>   ok 
>  
> 
> It's pretty obvious what happened here: concurrent activity renamed or
> removed the WAL segment between when we saw it in the directory and
> when we tried to stat() it.
> 
> This seems like it would be just as much of a hazard for field usage
> as it is for regression testing,

That's clearly true for pg_ls_waldir(), which calls pg_ls_dir_files and
includes some metadata columns.

> so I propose that we fix these directory-scanning functions to silently
> ignore ENOENT failures from stat().  Are there any for which we should not do
> that?

I think it's reasonable to ignore transient ENOENT for tmpdir, logdir, and
probably archive_statusdir.  That doesn't currently affect pg_ls_dir(), which
lists files but not metadata for an arbitrary dir, so doesn't call stat().

Note that dangling links in the other functions currently cause a (wrong [0])
error.  I guess it should be documented whether a broken link will be ignored
due to ENOENT.

Maybe we should lstat() the file to determine if it's a dangling link; if
lstat() fails, then skip it.  Currently, we use stat(), which shows metadata of
a link's *target*.  Maybe we'd change that.

Note that I have a patch which generalizes pg_ls_dir_files and makes
pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
unless I add another flag to do otherwise (but behaving the same has its
merits).  It already uses lstat() to show links to dirs as isdir=no, which was
needed to avoid recursing into links-to-dirs in the new helper function
pg_ls_dir_recurse().  https://commitfest.postgresql.org/26/2377/

[0] Which you fixed in 085b6b667 and I previously fixed at:
https://www.postgresql.org/message-id/attachment/106478/v2-0001-BUG-in-errmsg.patch
|$ sudo ln -s foo /var/log/postgresql/bar
|ts=# SELECT * FROM pg_ls_logdir() ORDER BY 3;
|ERROR:  could not stat directory "/var/log/postgresql": No such file or directory




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-27 19:57:12 -0400, Tom Lane wrote:
>> That being the case, I'd think a better design principle is "make your
>> new code look like the code around it", which would tend to weigh against
>> introducing StringInfo uses into pgbench when there's none there now and
>> a bunch of PQExpBuffer instead.  So I can't help thinking the advice
>> you're being given here is suspect.

> I don't agree with this. This is a "fresh" usage of StringInfo. That's
> different to adding one new printed line among others built with
> pqexpbuffer. If we continue adding large numbers of new uses of both
> pieces of infrastructure, we're just making things more confusing.

Why?  I'm not aware of any intention to deprecate/remove PQExpBuffer,
and I doubt it'd be a good thing to try.  It does some things that
StringInfo won't, notably cope with OOM without crashing.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:

On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
 wrote:

...
>> As a side note here, I'm wondering if this (determining useful pathkeys)
>> can be made a bit smarter by looking both at query_pathkeys and pathkeys
>> useful for merging, similarly to what truncate_useless_pathkeys() does.
>> But that can be seen as an improvement of what we do now.
>
>Unless your comment below about looking at truncate_useless_pathkeys
>is implying you're considering aiming to get this in now, I wonder if
>we should just expand the comment to reference pathkeys useful for
>merging as a possible future extension.
>

Maybe. I've been thinking about how to generate those path keys, but
it's a bit tricky, because pathkeys_useful_for_merging() - the thing
that backs truncate_useless_pathkeys() actually gets pathkeys and merely
verifies if those are useful for merging. But this code needs to do the
opposite - generate those pathkeys.

But let's say we know we have a join on columns (a,b,c). For each of
those PathKey values we know it's useful for merging, but should we
generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I
suppose we can look at pathkeys for existing paths of the relation to
prune the possibilities a bit, but what then?


I'm not convinced it's worth it this time around. Both because of the
late hour in the CF etc., but also because it seems likely to become
pretty complex quickly, and also far more likely to raise performance
questions in planning if there's not a lot of work done to keep it
limited.



Agreed. There'll be time to add more optimizations in the future.


BTW I wonder if we actually need the ec_has_volatile check in
get_useful_pathkeys_for_relation. The comment says we can't but is it
really true? pathkeys_useful_for_ordering doesn't do any such checks, so
I'd bet this is merely an unnecessary copy-paste from postgres_fdw.
Opinions?


I hadn't really looked at that part in depth before, but after reading
it over, re-reading the definition of volatility, and running a few
basic queries, I agree.

For example: we already do allow volatile pathkeys in a simple query like:
-- t(a, b) with index on (a)
select * from t order by a, random();

It makes sense that you couldn't push down such a sort to a foreign
server, given there's no guarantee said function isn't operating
directly on the primary database (in the FDW case). But that obviously
wouldn't apply here.



Thanks for confirming my reasoning.


>> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
>> >* Considering query_pathkeys is always worth it, because it might let us
>> >* avoid a local sort.
>> >
>> >That originally was a copy from the fdw code, but since the two
>> >functions have diverged (Is that concerning? It could be confusing, but
>> >isn't a compilation problem) I didn't move the function.
>> >
>>
>> I think it's OK the two functions diverged, it's simply because the FDW
>> one needs to check other things too. But I might rework this once I look
>> closer at truncate_useless_pathkeys.
>
>Agreed, for now at least. It's tempting to think they should always be
>shared, but I'm not convinced (without a lot more digging) that this
>represents structural rather than incidental duplication.
>

The more I look at pathkeys_useful_for_ordering() the more I think the
get_useful_pathkeys_for_relation() function should look more like it
than the postgres_fdw one ...


If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly?


Yes, basically.


The only thing you lose then is the check that each eclass has a
member in the rel. But that check probably wasn't quite right anyway:
at least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in the
rel, but rather that some prefix does, right?


Probably, although I always forget how exactly this EC business works.
My reasoning is more "If pathkeys_useful_for_ordering does not need
that, why should we need it here?"


The idea there would be that we could create a gather merge here that
supplies a partial ordering (that prefix of query_pathkeys) and then
above it the planner might place another incremental sort (say above a
join), or perhaps even a join above that would actually be sufficient
(since many joins are capable of providing an ordering anyway).



Not sure. Isn't the idea that we do the Incremental Sort below the
Gather Merge, because then it's actually done in parallel?


I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.

The attached also adds a few XXX comments. In pa

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2020-03-28 Thread Nikolay Shaplov
A new version of the patch.
Autovacuum options were extended in b07642db

So I added that options to the current patch.diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index e136116..55bd1ec 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -46,9 +46,8 @@
  * upper and lower bounds (if applicable); for strings, consider a validation
  * routine.
  * (ii) add a record below (or use add__reloption).
- * (iii) add it to the appropriate options struct (perhaps StdRdOptions)
- * (iv) add it to the appropriate handling routine (perhaps
- * default_reloptions)
+ * (iii) add it to the appropriate options struct
+ * (iv) add it to the appropriate handling routine
  * (v) make sure the lock level is set correctly for that operation
  * (vi) don't forget to document the option
  *
@@ -1147,9 +1146,11 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 	switch (classForm->relkind)
 	{
 		case RELKIND_RELATION:
-		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-			options = heap_reloptions(classForm->relkind, datum, false);
+			options = heap_reloptions(datum, false);
+			break;
+		case RELKIND_TOASTVALUE:
+			options = toast_reloptions(datum, false);
 			break;
 		case RELKIND_PARTITIONED_TABLE:
 			options = partitioned_table_reloptions(datum, false);
@@ -1521,59 +1522,66 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions.
+ * Option parsing definition for autovacuum. Used in toast and heap options.
+ */
+
+#define AUTOVACUUM_RELOPTIONS(OFFSET)\
+		{"autovacuum_enabled", RELOPT_TYPE_BOOL, \
+		OFFSET + offsetof(AutoVacOpts, enabled)},\
+		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_threshold)},   \
+		{"autovacuum_vacuum_insert_threshold", RELOPT_TYPE_INT,  \
+		OFFSET + offsetof(AutoVacOpts, vacuum_ins_threshold)},   \
+		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, analyze_threshold)},  \
+		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_REAL,   \
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_delay)},  \
+		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_limit)},  \
+		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_min_age)}, \
+		{"autovacuum_freeze_max_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_max_age)}, \
+		{"autovacuum_freeze_table_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, freeze_table_age)},   \
+		{"autovacuum_multixact_freeze_min_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_min_age)},   \
+		{"autovacuum_multixact_freeze_max_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_max_age)},   \
+		{"autovacuum_multixact_freeze_table_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_table_age)}, \
+		{"log_autovacuum_min_duration", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, log_min_duration)},   \
+		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_scale_factor)},\
+		{"autovacuum_vacuum_insert_scale_factor", RELOPT_TYPE_REAL,  \
+		OFFSET  + offsetof(AutoVacOpts, vacuum_ins_scale_factor)},   \
+		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,\
+		OFFSET + offsetof(AutoVacOpts, analyze_scale_factor)}
+/*
+ * Option parser for heap
  */
 bytea *
-default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
+heap_reloptions(Datum reloptions, bool validate)
 {
 	static const relopt_parse_elt tab[] = {
-		{"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
-		{"autovacuum_enabled", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, enabled)},
-		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_threshold)},
-		{"autovacuum_vacuum_insert_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_threshold)},
-		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_threshold)},
-		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_cost_limit)},
-		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, freeze_min_age)},
-		{"autovacuum_freeze_max_age", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, freeze_max_age)},
-		{"autovacuum_freeze_table_age"

Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Andres Freund
On 2020-03-28 14:49:31 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-03-27 19:57:12 -0400, Tom Lane wrote:
> >> That being the case, I'd think a better design principle is "make your
> >> new code look like the code around it", which would tend to weigh against
> >> introducing StringInfo uses into pgbench when there's none there now and
> >> a bunch of PQExpBuffer instead.  So I can't help thinking the advice
> >> you're being given here is suspect.
> 
> > I don't agree with this. This is a "fresh" usage of StringInfo. That's
> > different to adding one new printed line among others built with
> > pqexpbuffer. If we continue adding large numbers of new uses of both
> > pieces of infrastructure, we're just making things more confusing.
> 
> Why?  I'm not aware of any intention to deprecate/remove PQExpBuffer,
> and I doubt it'd be a good thing to try.  It does some things that
> StringInfo won't, notably cope with OOM without crashing.

- code using it cannot easily be shared between frontend/backend (no
  memory context integration etc)
- most code does *not* want to deal with the potential for OOM without
  erroring out
- its naming is even more confusing than StringInfo
- it introduces dependencies to libpq even when not needed
- both stringinfo and pqexpbuffer are performance relevant in some uses,
  needing to optimize both is wasted effort
- we shouldn't expose everyone to both APIs except where needed - it's
  stuff one has to learn




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-28 Thread Tom Lane
Justin Pryzby  writes:
> On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
>> so I propose that we fix these directory-scanning functions to silently
>> ignore ENOENT failures from stat().  Are there any for which we should not do
>> that?

> Maybe we should lstat() the file to determine if it's a dangling link; if
> lstat() fails, then skip it.  Currently, we use stat(), which shows metadata of
> a link's *target*.  Maybe we'd change that.

Hm, good point that ENOENT could refer to a symlink's target.  Still,
I'm not sure it's worth going out of our way to disambiguate that,
given that these directories aren't really supposed to contain symlinks.
(And on the third hand, if they aren't supposed to, then maybe these
functions needn't look through any symlinks?  In which case just
substituting lstat for stat would resolve the ambiguity.)

> Note that I have a patch which generalizes pg_ls_dir_files and makes
> pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
> unless I add another flag to do otherwise (but behaving the same has its
> merits).  It already uses lstat() to show links to dirs as isdir=no, which was
> needed to avoid recursing into links-to-dirs in the new helper function
> pg_ls_dir_recurse().  https://commitfest.postgresql.org/26/2377/

I think we need a back-patchable fix for the ENOENT failure, seeing that
we back-patched the new regression test; intermittent buildfarm failures
are no fun in any branch.  So new functions aren't too relevant here,
although it's fair to look ahead at whether the same behavior will be
appropriate for them.

regards, tom lane




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-28 14:49:31 -0400, Tom Lane wrote:
>> Why?  I'm not aware of any intention to deprecate/remove PQExpBuffer,
>> and I doubt it'd be a good thing to try.  It does some things that
>> StringInfo won't, notably cope with OOM without crashing.

> - code using it cannot easily be shared between frontend/backend (no
>   memory context integration etc)

True, but also pretty irrelevant for pgbench and similar code.

> - most code does *not* want to deal with the potential for OOM without
>   erroring out

Fair point.

> - its naming is even more confusing than StringInfo

Eye of the beholder ...

> - it introduces dependencies to libpq even when not needed

Most of our FE programs do include libpq, and pgbench certainly does,
so this seems like a pretty irrelevant objection as well.

> - both stringinfo and pqexpbuffer are performance relevant in some uses,
>   needing to optimize both is wasted effort

I'm not aware that anybody is trying to micro-optimize either.  Even
if someone is, it doesn't imply that they need to change both.

> - we shouldn't expose everyone to both APIs except where needed - it's
>   stuff one has to learn

That situation is unlikely to change in the foreseeable future.
Moreover, using both APIs in one program, where we were not before,
makes it worse not better.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Sat, Mar 28, 2020 at 2:54 PM Tomas Vondra
 wrote:
> ...
> >> >> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
> >> >> >* Considering query_pathkeys is always worth it, because it might let 
> >> >> >us
> >> >> >* avoid a local sort.
> >> >> >
> >> >> >That originally was a copy from the fdw code, but since the two
> >> >> >functions have diverged (Is that concerning? It could be confusing, but
> >> >> >isn't a compilation problem) I didn't move the function.
> >> >> >
> >> >>
> >> >> I think it's OK the two functions diverged, it's simply because the FDW
> >> >> one needs to check other things too. But I might rework this once I look
> >> >> closer at truncate_useless_pathkeys.
> >> >
> >> >Agreed, for now at least. It's tempting to think they should always be
> >> >shared, but I'm not convinced (without a lot more digging) that this
> >> >represents structural rather than incidental duplication.
> >> >
> >>
> >> The more I look at pathkeys_useful_for_ordering() the more I think the
> >> get_useful_pathkeys_for_relation() function should look more like it
> >> than the postgres_fdw one ...
> >
> >If we go down that path (and indeed this is actually implied by
> >removing the volatile check too), doesn't that really just mean (at
> >least for now) that get_useful_pathkeys_for_relation goes away
> >entirely and we effectively use root->query_pathkeys directly?
>
> Yes, basically.
>
> >The only thing you lose then is the check that each eclass has a
> >member in the rel. But that check probably wasn't quite right anyway:
> >at least for incremental sort (maybe not regular sort), I think we
> >probably don't necessarily care that the entire list has members in the
> >rel, but rather that some prefix does, right?
>
> Probably, although I always forget how exactly this EC business works.
> My reasoning is more "If pathkeys_useful_for_ordering does not need
> that, why should we need it here?"

I think it effectively does, since it's called by
truncate_useless_pathkeys with the pathkeys list a path provides and
root->query_pathkeys, and tries to find a prefix.

So if there wasn't a prefix of matching eclasses, we'd return 0 as the
number of matching prefix pathkeys, and thus a NIL list to the caller
of truncate_useless_pathkeys.

> >The idea there would be that we could create a gather merge here that
> >supplies a partial ordering (that prefix of query_pathkeys) and then
> >above it the planner might place another incremental sort (say above a
> >join), or perhaps even a join above that would actually be sufficient
> >(since many joins are capable of providing an ordering anyway).
> >
>
> Not sure. Isn't the idea that we do the Incremental Sort below the
> Gather Merge, because then it's actually done in parallel?

Yeah, I think as I was typing this my ideas got kinda mixed up a bit,
or at least came out confusing. The incremental sort *would* need a path
that is ordered by a prefix of query_pathkeys and provide the sort on
the full query_pathkeys.

But the incremental sort at this point would still be on a path with a
given rel, and that path's rel would need to contain all of the
eclasses for the pathkeys we want as the final ordering to be able to
sort on them, right? For example, suppose:
select * from t join s order by t.a, s.b -- index on t.a

We'd have a presorted path on t.a that has a prefix of the
query_pathkeys, but we couldn't have the incremental sort on path
whose rel only contained t, right? It'd have to be a rel that was the
result of the join, otherwise we don't yet have access to s.b.

Given that, I think we have two options in this code. Suppose query
pathkeys (a, b, c):
1. Build an incremental sort path on (a, b, c) if we have a path
ordered by (a) or (a, b) but only when (c) is already available.
2. Additionally, (for the most possible paths generated): build an
incremental sort path on (a, b) if we have a path ordered by (a) but
(c) isn't yet available.

Either way I think we need the ability to know if all (or some subset)
of the query pathkeys are for eclasses we have access to in the
current rel.

> >I've attached (added prefix .txt to avoid the cfbot assuming this is a
> >full series) an idea in that direction to see if we're thinking along
> >the same route. You'll want to apply and view with `git diff -w`
> >probably.
> >
> >The attached also adds a few XXX comments. In particular, I wonder if
> >we should verify that some prefix of the root->query_pathkeys is
> >actually made up of eclass members for the current rel, because
> >otherwise I think we can skip the loop on the subpaths entirely.
> >
>
> Hmmm, that's an interesting possible optimization. I wonder if that
> actually saves anything, though, because the number of paths in the loop
> is likely fairly low.

If what I said above is correct (and please poke holes in it if
possible), then I think we have to know the matching eclass count
anyway, so we might as well include the optimization since it'd be a
simple int 

Re: Memory-Bounded Hash Aggregation

2020-03-28 Thread Jeff Davis
On Fri, 2020-03-27 at 02:31 +0100, Tomas Vondra wrote:
> On Thu, Mar 26, 2020 at 05:56:56PM +0800, Richard Guo wrote:
> > If nbatches is some number between 1.0 and 2.0, we would have a
> > negative
> > depth. As a result, we may have a negative cost for hash
> > aggregation
> > plan node, as described in [1].
> > numGroups / ngroups_limit );
> 
> and we should probably do
> 
>nbatches = ceil(nbatches);
> 

Thank you both. I also protected against nbatches == 0 (shouldn't
happen), and against num_partitions <= 1. That allowed me to remove the
conditional and simplify a bit.

Regards,
Jeff Davis






Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-03-28 Thread Andres Freund
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > Turns out that I am to blame for that. All the way back in 9.4. For
> > logical decoding I needed to make ScanPgRelation() use a specific type
> > of snapshot during one corner case of logical decoding. For reasons lost
> > to time, I didn't continue to pass NULL to systable_beginscan() in the
> > plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> > Note the missing RegisterSnapshot()...
> 
> Oh, I pierced through the veil: It's likely because the removal of
> SnapshotNow happened concurrently to the development of logical
> decoding. Before using proper snapshot for catalog scans passing in
> SnapshotNow was precisely the right thing to do...
> 
> I think that's somewhat unlikely to actively cause problems in practice,
> as ScanPgRelation() requires that we already have a lock on the
> relation, we only look for a single row, and I don't think we rely on
> the result's tid to be correct. I don't immediately see a case where
> this would trigger in a problematic way.

Pushed a fix for this.


> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
> 
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away, and replaced with a different row. That does seem like a serious
> issue?
> 
> I think there's likely a lot of places that can hit this? What makes it
> safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications?
> 
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?
> 
> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets up things so that GetCatalogSnapshot() will
> return a freshly taken snapshot?  ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that behaviour?

I'd still like to get some input here.

Greetings,

Andres Freund




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Andres Freund
On 2020-03-28 15:16:21 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > - both stringinfo and pqexpbuffer are performance relevant in some uses,
> >   needing to optimize both is wasted effort
> 
> I'm not aware that anybody is trying to micro-optimize either.

https://postgr.es/m/5450.1578797036%40sss.pgh.pa.us




Re: ssl passphrase callback

2020-03-28 Thread Andrew Dunstan


On 3/28/20 1:45 PM, Tom Lane wrote:
> Andreas Karlsson  writes:
>> On 3/22/20 1:08 AM, Andrew Dunstan wrote:
>>> Latest patch attached, I think all comments have been addressed. I
>>> propose to push this later this coming week if there are no more comments.
>> I do not have any objections.
> This CF entry is still open, should it not be closed as committed?
>
> https://commitfest.postgresql.org/27/2338/
>
>   


Done, thanks for the reminder.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-03-28 Thread Andres Freund
Hi,

On 2020-03-28 12:30:40 -0700, Andres Freund wrote:
> On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> > On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > So, um. What happens is that doDeletion() does a catalog scan, which
> > sets a snapshot. The results of that catalog scan are then used to
> > perform modifications. But at that point there's no guarantee that we
> > still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> > snapshot being released.
> > 
> > I don't see how that's safe. Without ->xmin preventing that,
> > intermediate row versions that we did look up could just get vacuumed
> > away, and replaced with a different row. That does seem like a serious
> > issue?
> > 
> > I think there's likely a lot of places that can hit this? What makes it
> > safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> > ->xmin when there previously has been *any* catalog access? Because in
> > contrast to normal table modifications, there's currently nothing at all
> > forcing us to hold a snapshot between catalog lookups and their
> > modifications?
> > 
> > Am I missing something? Or is this a fairly significant hole in our
> > arrangements?
> > 
> > The easiest way to fix this would probably be to have inval.c call a
> > version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> > snapshot around, but sets up things so that GetCatalogSnapshot() will
> > return a freshly taken snapshot?  ISTM that pretty much every
> > InvalidateCatalogSnapshot() call within a transaction needs that behaviour?
> 
> I'd still like to get some input here.

Attached is a one patch that adds assertions to detect this, and one
that puts enough workarounds in place to make the tests pass. I don't
like this much, but I thought it'd be useful for others to understand
the problem.

Greetings,

Andres Freund
>From a22780e3544d7c83b5ad8851de240c3d5ef8f221 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 29 Feb 2020 18:07:25 -0800
Subject: [PATCH v1 1/2] TMP: work around missing snapshot registrations.

This is just what's hit by the tests. It's not an actual fix.
---
 src/backend/catalog/namespace.c |  7 +++
 src/backend/catalog/pg_subscription.c   |  4 
 src/backend/commands/indexcmds.c|  9 +
 src/backend/commands/tablecmds.c|  8 
 src/backend/replication/logical/tablesync.c | 12 
 src/backend/replication/logical/worker.c|  4 
 src/backend/utils/time/snapmgr.c|  4 
 7 files changed, 48 insertions(+)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 2ec23016fe5..e4696d8d417 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -55,6 +55,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
 
@@ -4244,12 +4245,18 @@ RemoveTempRelationsCallback(int code, Datum arg)
 {
 	if (OidIsValid(myTempNamespace))	/* should always be true */
 	{
+		Snapshot snap;
+
 		/* Need to ensure we have a usable transaction. */
 		AbortOutOfAnyTransaction();
 		StartTransactionCommand();
 
+		/* ensure xmin stays set */
+		snap = RegisterSnapshot(GetCatalogSnapshot(InvalidOid));
+
 		RemoveTempRelations(myTempNamespace);
 
+		UnregisterSnapshot(snap);
 		CommitTransactionCommand();
 	}
 }
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index cb157311154..4a324dfb4f1 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -31,6 +31,7 @@
 #include "utils/fmgroids.h"
 #include "utils/pg_lsn.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
 static List *textarray_to_stringlist(ArrayType *textarray);
@@ -286,6 +287,7 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
 	bool		nulls[Natts_pg_subscription_rel];
 	Datum		values[Natts_pg_subscription_rel];
 	bool		replaces[Natts_pg_subscription_rel];
+	Snapshot snap = RegisterSnapshot(GetCatalogSnapshot(InvalidOid));
 
 	LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);
 
@@ -321,6 +323,8 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
 
 	/* Cleanup. */
 	table_close(rel, NoLock);
+
+	UnregisterSnapshot(snap);
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4e8263af4be..b5fe3649a22 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2720,6 +2720,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	char	   *relationName = NULL;
 	char	   *relationNamespace = NULL;
 	PGRUsage	ru0;
+	Snapshot	snap;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3189,6 +3190,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	 */
 
 	StartTransactionCommand();
+	snap = Regi

Potential (low likelihood) wraparound hazard in heap_abort_speculative()

2020-03-28 Thread Andres Freund
Hi,

heap_abort_speculative() does:
/*
 * The tuple will become DEAD immediately.  Flag that this page
 * immediately is a candidate for pruning by setting xmin to
 * RecentGlobalXmin.  That's not pretty, but it doesn't seem worth
 * inventing a nicer API for this.
 */
Assert(TransactionIdIsValid(RecentGlobalXmin));
PageSetPrunable(page, RecentGlobalXmin);

but that doesn't seem right to me. RecentGlobalXmin could very well be
older than the table's relfrozenxid. Especially when multiple databases
are used, or logical replication is active, it's not unlikely at all.

That's because RecentGlobalXmin is a) the minimum xmin of all databases,
whereas horizon computations for relations are done for only the current
database b) RecentGlobalXmin may have been computed a while ago (when
the snapshot for the transaction was computed, for example), but a
concurrent vacuum could be more recent c) RecentGlobalXmin includes the
more "pessimistic" xmin for catalog relations.

Unless somebody has a better idea for how to solve this in a
back-paptchable way, I think it'd be best to replace RecentGlobalXmin
with RecentXmin. That'd be safe as far as I can see.

Greetings,

Andres Freund




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:

On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
 wrote:


...

The more I look at pathkeys_useful_for_ordering() the more I think the
get_useful_pathkeys_for_relation() function should look more like it
than the postgres_fdw one ...


If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly? The
only thing you lose then is the check that each eclass has a member
in the rel. But that check probably wasn't quite right anyway: at
least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in
the rel, but rather that some prefix does, right? The idea there would
be that we could create a gather merge here that supplies a partial
ordering (that prefix of query_pathkeys) and then above it the planner
might place another incremental sort (say above a join), or perhaps
even a join above that would actually be sufficient (since many joins
are capable of providing an ordering anyway).

I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.



Hmmm, I'm not sure the patch is quite correct.

Firstly, I suggest we don't remove get_useful_pathkeys_for_relation
entirely, because that allows us to add more useful pathkeys in the
future (even if we don't consider pathkeys useful for merging now).
We could also do the EC optimization in the function, return NIL and
not loop over partial_pathlist at all. That's a cosmetic issue, though.

More importantly, I'm not sure this makes sense:

  /* consider incremental sort for interesting orderings */
  useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys);

  ...

  is_sorted = pathkeys_common_contained_in(useful_pathkeys,
   subpath->pathkeys,
   &presorted_keys);

I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys,
right? So how could this ever return is_sorted=false?

The whole point is to end up with query_pathkeys (or whatever pathkeys
we deem useful), but this does not do that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sun, 29 Mar 2020 at 06:26, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  wrote:
> >> In the light of that, I have no objections.
>
> > Thank you.  Pushed.
>
> It seems like this commit has resulted in some buildfarm instability:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
>
> apparent change of plan
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
>
> unstable results in stats_ext test

Yeah, thanks for pointing that out.  I'm just doing some tests locally
to see if I can recreate those results after vacuuming the mcv_list
table, so far I'm unable to.




[PATCH] Redundant initialization

2020-03-28 Thread Ranier Vilela
Hi,
This patch fixes some redundant initializations that are safe to remove.

best regards,
Ranier Vilela


redudant_initialization.patch
Description: Binary data


debian bugrept involving fast default crash in pg11.7

2020-03-28 Thread Justin Pryzby
I happened across this bugreport, which seems to have just enough information
to be interesting.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953204
|Version: 11.7-0+deb10u1
|2020-03-05 16:55:55.511 UTC [515] LOG:  background worker "parallel worker" 
(PID 884) was terminated by signal 11: Segmentation fault
|2020-03-05 16:55:55.511 UTC [515] DETAIL:  Failed process was running: 
|SELECT distinct student_prob.student_id, student_prob.score, student_name, 
v_capacity_score.capacity
|FROM data JOIN model on model.id = 2 AND data_stage(data) = 
model.target_begin_field_id
|JOIN student_prob ON data.crm_id = student_prob.student_id AND model.id = 
student_prob.model_id AND (student_prob.additional_aid < 1)
|LEFT JOIN v_capacity_score ON data.crm_id = v_capacity_score.student_id AND 
student_prob.model_id = v_capacity_score.model_id
|WHERE data.term_code = '202090' AND student_prob.score > 0
|ORDER BY student_prob.score DESC, student_name
|LIMIT 100 OFFSET 100 ;

Tim: it'd be nice to get more information, if and when possible:
 - "explain" plan for that query;
 - \d for the tables involved: constraints, inheritance, defaults;
 - corefile or backtrace; it looks like there's two different crashes (maybe 
same problem) so both would be useful;
 - Can you reproduce the crash if you "SET max_parallel_workers_per_gather=0" ?
 - Do you know if it crashed under v11.6 ?

If anyone wants to hack on the .deb:
https://packages.debian.org/buster/amd64/postgresql-11/download and (I couldn't 
find the dbg package anywhere else)
https://snapshot.debian.org/package/postgresql-11/11.7-0%2Bdeb10u1/#postgresql-11-dbgsym_11.7-0:2b:deb10u1

$ mkdir pg11
$ cd pg11
$ wget -q 
http://security.debian.org/debian-security/pool/updates/main/p/postgresql-11/postgresql-11_11.7-0+deb10u1_amd64.deb
$ ar x ./postgresql-11_11.7-0+deb10u1_amd64.deb 
$ tar xf ./data.tar.xz 
$ ar x postgresql-11-dbgsym_11.7-0+deb10u1_amd64.deb
$ tar xf data.tar.xz
$ gdb usr/lib/postgresql/11/bin/postgres 

(gdb) set debug-file-directory usr/lib/debug/
(gdb) file usr/lib/postgresql/11/bin/postmaster 
(gdb) info target 

If I repeat the process Bernhard used (thanks for that) on the first crash in
libc6, I get:

(gdb) find /b 0x00022320, 0x0016839b, 0xf9, 0x20, 0x77, 0x1f, 
0xc5, 0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0x85, 0xc0, 0x0f, 0x85, 0xdf, 
0x00, 0x00, 0x00, 0x48, 0x83, 0xc7, 0x20, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 
0xe0, 0xeb, 0x36, 0x66, 0x90, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 0xe0, 0xc5, 
0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0xd3, 0xf8, 0x85, 0xc0, 0x74, 0x1b, 
0xf3, 0x0f, 0xbc, 0xc0, 0x48, 0x01, 0xf8, 0x48
0x15c17d <__strlen_avx2+13>
warning: Unable to access 1631 bytes of target memory at 0x167d3d, halting 
search.
1 pattern found.

I'm tentatively guessing that heap_modify_tuple() is involved, since it calls
getmissingattr and (probably) fill_val.  It looks like maybe some data
structure is corrupted which crashed two parallel workers, one in
fill_val()/strlen() and one in heap_deform_tuple()/getmissingattr().  Maybe
something not initialized in parallel worker, or a use-after-free?  I'll stop
guessing.

Justin




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Sat, Mar 28, 2020 at 5:30 PM Tomas Vondra
 wrote:
>
> On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:
> >On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
> > wrote:
> >>
> >> ...
> >>
> >> The more I look at pathkeys_useful_for_ordering() the more I think the
> >> get_useful_pathkeys_for_relation() function should look more like it
> >> than the postgres_fdw one ...
> >
> >If we go down that path (and indeed this is actually implied by
> >removing the volatile check too), doesn't that really just mean (at
> >least for now) that get_useful_pathkeys_for_relation goes away
> >entirely and we effectively use root->query_pathkeys directly? The
> >only thing you lose then is the check that each eclass has a member
> >in the rel. But that check probably wasn't quite right anyway: at
> >least for incremental sort (maybe not regular sort), I think we
> >probably don't necessarily care that the entire list has members in
> >the rel, but rather that some prefix does, right? The idea there would
> >be that we could create a gather merge here that supplies a partial
> >ordering (that prefix of query_pathkeys) and then above it the planner
> >might place another incremental sort (say above a join), or perhaps
> >even a join above that would actually be sufficient (since many joins
> >are capable of providing an ordering anyway).
> >
> >I've attached (added prefix .txt to avoid the cfbot assuming this is a
> >full series) an idea in that direction to see if we're thinking along
> >the same route. You'll want to apply and view with `git diff -w`
> >probably.
> >
>
> Hmmm, I'm not sure the patch is quite correct.
>
> Firstly, I suggest we don't remove get_useful_pathkeys_for_relation
> entirely, because that allows us to add more useful pathkeys in the
> future (even if we don't consider pathkeys useful for merging now).
> We could also do the EC optimization in the function, return NIL and
> not loop over partial_pathlist at all. That's a cosmetic issue, though.
>
> More importantly, I'm not sure this makes sense:
>
>/* consider incremental sort for interesting orderings */
>useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys);
>
>...
>
>is_sorted = pathkeys_common_contained_in(useful_pathkeys,
> subpath->pathkeys,
> &presorted_keys);
>
> I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys,
> right? So how could this ever return is_sorted=false?
>
> The whole point is to end up with query_pathkeys (or whatever pathkeys
> we deem useful), but this does not do that.

Yes, that's obviously a thinko in my rush to get an idea out.

I think useful_pathkeys there would be essentially
root->query_pathkeys, or, more correctly the prefix of query_pathkeys
that has eclasses shared with the current rel. Or, if we go with the
more restrictive approach (see the two approaches I mentioned
earlier), then either NIL or query_pathkeys (if they all match
eclasses in the current rel).

Given those requirements, I agree that keeping
get_useful_pathkeys_for_relation makes sense to wrap up all of that
behavior.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

Hi,

Attached is my take on simplification of the useful pathkeys thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.

I've also noticed an issue in explain output. EXPLAIN ANALYZE on a simple
query gives me this:

  QUERY PLAN
---
 Gather Merge  (cost=66.30..816060.48 rows=8333226 width=24) (actual 
time=6.464..19091.006 rows=1000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Incremental Sort  (cost=66.28..729188.13 rows=4166613 width=24) (actual 
time=1.836..13401.109 rows=333 loops=3)
 Sort Key: a, b, c
 Presorted Key: a, b
 Full-sort Groups: 4156 (Methods: quicksort) Memory: 30kB (avg), 30kB 
(max)
 Presorted Groups: 4137 (Methods: quicksort) Memory: 108kB (avg), 111kB 
(max)
 Full-sort Groups: 6888 (Methods: ) Memory: 30kB (avg), 30kB (max)
 Presorted Groups: 6849 (Methods: ) Memory: 121kB (avg), 131kB (max)
 Full-sort Groups: 6869 (Methods: ) Memory: 30kB (avg), 30kB (max)
 Presorted Groups: 6816 (Methods: ) Memory: 128kB (avg), 132kB (max)
 ->  Parallel Index Scan using t_a_b_idx on t  (cost=0.43..382353.69 
rows=4166613 width=24) (actual time=0.033..9346.679 rows=333 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 23998.669 ms
(15 rows)

while with incremental sort disabled it looks like this:

 QUERY PLAN
-
 Gather Merge  (cost=734387.50..831676.35 rows=8333226 width=24) (actual 
time=5597.978..14967.246 rows=1000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734387.47..744804.00 rows=4166613 width=24) (actual 
time=5584.616..7645.711 rows=333 loops=3)
 Sort Key: a, b, c
 Sort Method: external merge  Disk: 111216kB
 Worker 0:  Sort Method: external merge  Disk: 109552kB
 Worker 1:  Sort Method: external merge  Disk: 112112kB
 ->  Parallel Seq Scan on t  (cost=0.00..105361.13 rows=4166613 
width=24) (actual time=0.011..1753.128 rows=333 loops=3)
 Planning Time: 0.048 ms
 Execution Time: 19682.582 ms
(11 rows)

So I think there's a couple of issues:

1) Missing worker identification (Worker #).

2) Missing method for workers (we have it for the leader, though).

3) I'm not sure why the label is "Methods" instead of "Sort Method", and
why it's in parenthesis.

4) Not sure having two lines for each worker is a great idea.

5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 32bf734820..49d4990e66 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2748,7 +2748,6 @@ static List *
 get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 {
List   *useful_pathkeys_list = NIL;
-   ListCell   *lc;
 
/*
 * Considering query_pathkeys is always worth it, because it might 
allow us
@@ -2756,29 +2755,27 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, 
RelOptInfo *rel)
 */
if (root->query_pathkeys)
{
-   boolquery_pathkeys_ok = true;
+   ListCell   *lc;
+   List   *pathkeys = NIL;
 
foreach(lc, root->query_pathkeys)
{
PathKey*pathkey = (PathKey *) lfirst(lc);
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-   Expr   *em_expr;
 
/*
-* We can't use incremental sort for pathkeys 
containing volatile
-* expressions. We could walk the exppression itself, 
but checking
-* ec_has_volatile here saves some cycles.
+* We can't build Incremental Sort when pathkeys 
contain elements
+* without an EC member not contained in the current 
relation, so
+* just ignore anything after the first such pathkey.
 */
-   if (pathkey_ec->ec_has_volatile ||
-   !(em_expr = find_em_expr_for_rel(pathkey_ec, 
rel)))
-   {
-   quer

Re: fix for BUG #3720: wrong results at using ltree

2020-03-28 Thread Tom Lane
Nikita Glukhov  writes:
> On 24.01.2020 21:29, Tomas Vondra wrote:
>> Unfortunately, the current code is somewhat undercommented :-(

> The main problem is that no one really understands how it works now.

Indeed.  I was disturbed to realize that lquery_op.c, despite being
far from trivial code, contained NOT ONE SINGLE COMMENT before today,
other than the content-free file header and a commented-out (visibly
unsafe, too) debugging printing function.  This is a long way south
of minimally acceptable, in my book.

Anyway, I concur that Nikita's two patches are bug fixes, so I pushed
them.  Nonetheless, he *did* hijack this thread, so in hopes of restoring
attention to the original topic, here's a rebased version of the original
patch.

My main complaint about it remains the same, that it changes a
disturbingly large number of existing regression-test results,
suggesting that there's not a meeting of the minds about what
this logic is supposed to do.  Maybe it's okay or maybe it's
not, but who's going to decide?

Also, now that I've looked at it a bit more, I'd be inclined to
strip out the parts of the patch that remove setting up the
LQUERY_HASNOT flag.  Even if we're not using that right now,
we might want it again someday, and we're not saving much of
anything by introducing a minor on-disk incompatibility.

regards, tom lane

diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 1b31dbc..1eef086 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -722,7 +722,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.d.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*.!d.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!d';
@@ -752,7 +752,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!e';
 SELECT 'a.b.c.d.e'::ltree ~ '*.!e.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!e';
@@ -770,7 +770,7 @@ SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!d';
 SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!d.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!f.*';
@@ -788,7 +788,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!f.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!d.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.a.!d.*';
@@ -812,13 +812,13 @@ SELECT 'a.b.c.d.e'::ltree ~ 'a.!d.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!d.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.c.*';
@@ -830,7 +830,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!b.c.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.c.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '!b.*.c.*';
@@ -878,31 +878,31 @@ SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*.!c.*.e';
 SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*{1}.!c.*.e';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ 'a.!b.*{1}.!c.*.e';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '!b.*{1}.!c.*.e';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*.e';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.!c.*.e';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '!b.!c.*';
@@ -932,19 +932,19 @@ SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*.!c.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*{1}.!c.*';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ 'a.!b.*{1}.!c.*';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '!b.*{1}.!c.*';
  ?column? 
 --
- t
+ f
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*';
@@ -956,7 +956,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*';
 SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.!c.*';
  ?column? 
 --
- f
+ t
 (1 row)
 
 SELECT 'a.b.c.d.e'::ltree ~ 'a.*{2}.*{2}';
@@ -983,6 +983,18 @@ SELECT 'a.b.c.d.e'::ltree ~ 'a.*{5}.*';
  f
 (1 row)
 
+SELECT '5.0.1.0'::ltree ~ '5.!0.!0.0';
+ ?column? 
+--
+ f
+(1 row)
+
+SELECT 'a.b'::ltree ~ '!a.!a';
+ ?column? 
+--
+ f
+(1 row)
+
 SELECT 'QWER_TY'::ltree ~ 'q%@*';
  ?column? 
 --
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index 5c7afe5..81fcb0e 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -19,16 +19,6 @@ PG_FUNCTION_INFO_V1(lt_q_rregex);
 
 #define NEXTVAL(x) ( (lquery*)( (char*)(x) + INTALIGN( VARSIZE(x) ) ) )
 
-typedef struct
-{
-	lquery_level *q;
-	int			nq;
-	ltree_level *t;
-	int			nt;
-	int			posq;
-	int			post;
-} FieldNot;
-
 static char *
 getlexeme(char *start, char *end, int *len)
 {
@@ -109,6 +99,9 @@ checkLevel(lquery_level *curq, ltree_level *curt)
 	int			(*cmpptr) (const char *, const char *, size_t);
 	lquery_variant *curvar = LQL_FIRST(curq);
 	int			i;
+	bool		success;
+
+	success = (curq->flag & LQL_NOT) ? false : true;
 
 	for (i = 0; i < curq->numvar; i++)
 	{
@@ -116,39 +109,26 @@ checkLevel(lquery_level

Minor bug in suffix truncation of non-key attributes from INCLUDE indexes

2020-03-28 Thread Peter Geoghegan
During a stress test of an experimental patch (which implements a new
technique for managing B-Tree index bloat caused by non-HOT updates),
I noticed a minor bug in _bt_truncate(). The issue affects Postgres 12
+ master.

The problem is that INCLUDE indexes don't have their non-key
attributes physically truncated away in a small minority of cases. I
say "physically" because we nevertheless encode the number of
remaining key attributes in the tuple correctly in all cases (we also
correctly encode the presence or absence of the special pivot tuple
heap TID representation in all cases). That is, we don't consistently
avoid non-key attribute space overhead in the new high key, even
though _bt_truncate() is clearly supposed to consistently avoid said
overhead. Even when it must add a heap TID to the high key using the
special pivot representation of heap TID, it shouldn't have to keep
around the non-key attributes.

This only happens when we cannot truncate away any key columns (and
must therefore include a heap TID in the new leaf high key) in an
INCLUDE index. This condition is rare because in general nbtsplitloc.c
goes out of its way to at least avoid having to include a heap TID in
new leaf page high keys. It's also rare because INCLUDE indexes are
generally only used with unique constraints/indexes, which makes it
particularly likely that nbtsplitloc.c will be able to avoid including
a heap TID in the final high key (unique indexes seldom have enough
duplicates to make nbtsplitloc.c ever use its "single value"
strategy).

Attached patch fixes the issue. Barring objections, I'll push this to
v12 + master branches early next week. The bug is low severity, but
then the fix is very low risk.

-- 
Peter Geoghegan


v1-0001-Consistently-truncate-non-key-suffix-columns.patch
Description: Binary data


Re: Improving connection scalability: GetSnapshotData()

2020-03-28 Thread Andres Freund
Hi,

On 2020-03-17 23:59:14 +1300, David Rowley wrote:
> Nice performance gains.

Thanks.


> On Sun, 1 Mar 2020 at 21:36, Andres Freund  wrote:
> 2. I don't quite understand your change in
> UpdateSubscriptionRelState(). snap seems unused. Drilling down into
> SearchSysCacheCopy2, in SearchCatCacheMiss() the systable_beginscan()
> passes a NULL snapshot.
> 
> the whole patch does this. I guess I don't understand why 0002 does this.

See the thread at 
https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2%40alap3.anarazel.de

Basically, the way catalog snapshots are handled right now, it's not
correct to much without a snapshot held. Any concurrent invalidation can
cause the catalog snapshot to be released, which can reset the backend's
xmin. Which in turn can allow for pruning etc to remove required data.

This is part of this series only because I felt I needed to add stronger
asserts to be confident in what's happening. And they started to trigger
all over :( - and weren't related to the patchset :(.


> 4. Did you consider the location of 'delayChkpt' in PGPROC. Couldn't
> you slot it in somewhere it would fit in existing padding?
> 
> 0007

Hm, maybe. I'm not sure what the best thing to do here is - there's some
arguments to be made that we should keep the fields moved from PGXACT
together on their own cacheline. Compared to some of the other stuff in
PGPROC they're still accessed from other backends fairly frequently.


> 5. GinPageIsRecyclable() has no comments at all. I know that
> ginvacuum.c is not exactly the modal citizen for function header
> comments, but likely this patch is no good reason to continue the
> trend.

Well, I basically just moved the code from the macro of the same
name... I'll add something.


> 9. I get the idea you don't intend to keep the debug message in
> InvisibleToEveryoneTestFullXid(), but if you do, then shouldn't it be
> using UINT64_FORMAT?

Yea, I don't intend to keep them - they're way too verbose, even for
DEBUG*. Note that there's some advantage in the long long cast approach
- it's easier to deal with for translations IIRC.

> 13. is -> are
> 
> * accessed data is stored contiguously in memory in as few cache lines as

Oh? 'data are stored' sounds wrong to me, somehow.

Greetings,

Andres Freund




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 03:29:34PM +0530, Amit Kapila wrote:

On Sat, Mar 28, 2020 at 2:19 PM Dilip Kumar  wrote:


On Sat, Mar 28, 2020 at 11:56 AM Amit Kapila  wrote:
>
>
> I have looked at the solution proposed and I would like to share my
> findings.  I think calling ProcArrayApplyXidAssignment for each
> subtransaction is not a good idea for a couple of reasons:
> (a) It will just beat the purpose of maintaining KnowAssignedXids
> array which is to avoid looking at pg_subtrans in
> TransactionIdIsInProgress() on standby.  Basically, if we remove it
> for each subXid, it will consider the KnowAssignedXids to be
> overflowed and check pg_subtrans frequently.

Right, I also think this is a problem with this solution.  I think we
may try to avoid this by caching this information.  But, then we will
have to maintain this in some dimensional array which stores
sub-transaction ids per top transaction or we can maintain a list of
sub-transaction for each transaction.  I haven't thought about how
much complexity this solution will add.



How about if instead of writing an XLOG_XACT_ASSIGNMENT WAL, we set a
flag in TransactionStateData and then log that as special information
whenever we write next WAL record for a new subtransaction?  Then
during recovery, we can only call ProcArrayApplyXidAssignment when we
find that special flag is set in a WAL record.  One idea could be to
use a flag bit in XLogRecord.xl_info.  If that is feasible then the
solution can work as it is now, without any overhead or change in the
way we maintain KnownAssignedXids.



Ummm, how is that different from what the patch is doing now? I mean, we
only write the top-level XID for the first WAL record in each subxact,
right? Or what would be the difference with your approach?

Anyway, I think you're right the ProcArrayApplyXidAssignment call was
done too early, but I think that can be fixed by moving it until after
the RecordKnownAssignedTransactionIds call, no? Essentially, right
before rm_redo().

You're right calling ProcArrayApplyXidAssignment() may be an issue,
because it exclusively acquires the ProcArrayLock. I've actually hinted
that might be an issue in my original message, suggesting we might add a
local cache of assigned XIDs (a small static array, doing essentially
the same thing we used to do on the upstream node). I haven't done that
in my WIP patch to keep it simple, but AFAICS it'd work.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improving connection scalability: GetSnapshotData()

2020-03-28 Thread Thomas Munro
On Sun, Mar 29, 2020 at 1:49 PM Andres Freund  wrote:
> > 13. is -> are
> >
> > * accessed data is stored contiguously in memory in as few cache lines as
>
> Oh? 'data are stored' sounds wrong to me, somehow.

In computer contexts it seems pretty well established that we treat
"data" as an uncountable noun (like "air"), so I think "is" is right
here.  In maths or science contexts it's usually treated as a plural
following Latin, which admittedly sounds cleverer, but it also has a
slightly different meaning, not bits and bytes but something more like
samples or (wince) datums.




Can we get rid of GetLocaleInfoEx() yet?

2020-03-28 Thread Tom Lane
Commit 0fb54de9a ("Support building with Visual Studio 2015")
introduced a hack in chklocale.c's win32_langinfo() to make
it use GetLocaleInfoEx() in place of _create_locale().

There's a problem with this, which is that if I'm reading the
docs correctly, GetLocaleInfoEx() accepts a smaller set of
possible locale strings (only "locale names") than do either
_create_locale() or setlocale().  The _create_locale() docs say

The locale argument can take a locale name, a language string, a
language string and country/region code, a code page, or a language
string, country/region code, and code page.

and they imply (but don't quite manage to say in so many words)
that these are the same strings accepted by setlocale().

The reason this is a problem is that when given a locale string,
in either initdb or CREATE DATABASE, we first validate it by
seeing if setlocale() likes it.  We produce a reasonable error
message if not.  Otherwise we then go on to try to identify the
implied encoding via chklocale.c.  But if GetLocaleInfoEx()
fails, we fall back to trying to parse out the codepage for
ourselves, which can lead to a silly/unhelpful error message,
as recently complained of at [1].

The reason for the hack, per the comments, is that VS2015
omits a codepage field from the result of _create_locale();
and some optimism is expressed therein that Microsoft might
undo that oversight in future.  Has this been fixed in more
recent VS versions?  If not, can we find another, more robust
way to do it?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/F4D04849032C4464B8FF17CB0F896F9E%40dell2




Re: Improving connection scalability: GetSnapshotData()

2020-03-28 Thread Peter Geoghegan
On Sun, Mar 1, 2020 at 12:36 AM Andres Freund  wrote:
> The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> -j $conns -S -n for each client count.  This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized.
>
> conns   tps master  tps pgxact-split
>
> 1   26842.492845    26524.194821
> 10  246923.158682   249224.782661
> 50  695956.539704   709833.746374
> 100 1054727.043139  1903616.306028
> 200 964795.282957   1949200.338012
> 300 906029.377539   1927881.231478
> 400 845696.690912   1911065.369776
> 500 812295.222497   1926237.255856
> 600 888030.104213   1903047.236273
> 700 866896.532490   1886537.202142
> 800 863407.341506   1883768.592610
> 900 871386.608563   1874638.012128
> 1000    887668.277133   1876402.391502
> 1500    860051.361395   1815103.564241
> 2000    890900.098657   1775435.271018
> 3000    874184.980039   1653953.817997
> 4000    845023.080703   1582582.316043
> 5000    817100.195728   1512260.802371
>
> I think these are pretty nice results.

This scalability improvement is clearly very significant. There is
little question that this is a strategically important enhancement for
the Postgres project in general. I hope that you will ultimately be
able to commit the patchset before feature freeze.

I have heard quite a few complaints about the scalability of snapshot
acquisition in Postgres. Generally from very large users that are not
well represented on the mailing lists, for a variety of reasons. The
GetSnapshotData() bottleneck is a *huge* problem for us. (As problems
for Postgres users go, I would probably rank it second behind issues
with VACUUM.)

-- 
Peter Geoghegan




Re: Improving connection scalability: GetSnapshotData()

2020-03-28 Thread Bruce Momjian
On Sat, Mar 28, 2020 at 06:39:32PM -0700, Peter Geoghegan wrote:
> On Sun, Mar 1, 2020 at 12:36 AM Andres Freund  wrote:
> > The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> > -j $conns -S -n for each client count.  This is on a machine with 2
> > Intel(R) Xeon(R) Platinum 8168, but virtualized.
> >
> > conns   tps master  tps pgxact-split
> >
> > 1   26842.492845    26524.194821
> > 10  246923.158682   249224.782661
> > 50  695956.539704   709833.746374
> > 100 1054727.043139  1903616.306028
> > 200 964795.282957   1949200.338012
> > 300 906029.377539   1927881.231478
> > 400 845696.690912   1911065.369776
> > 500 812295.222497   1926237.255856
> > 600 888030.104213   1903047.236273
> > 700 866896.532490   1886537.202142
> > 800 863407.341506   1883768.592610
> > 900 871386.608563   1874638.012128
> > 1000    887668.277133   1876402.391502
> > 1500    860051.361395   1815103.564241
> > 2000    890900.098657   1775435.271018
> > 3000    874184.980039   1653953.817997
> > 4000    845023.080703   1582582.316043
> > 5000    817100.195728   1512260.802371
> >
> > I think these are pretty nice results.
> 
> This scalability improvement is clearly very significant. There is
> little question that this is a strategically important enhancement for
> the Postgres project in general. I hope that you will ultimately be
> able to commit the patchset before feature freeze.

+1

> I have heard quite a few complaints about the scalability of snapshot
> acquisition in Postgres. Generally from very large users that are not
> well represented on the mailing lists, for a variety of reasons. The
> GetSnapshotData() bottleneck is a *huge* problem for us. (As problems
> for Postgres users go, I would probably rank it second behind issues
> with VACUUM.)

+1

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sun, 29 Mar 2020 at 10:30, David Rowley  wrote:
>
> On Sun, 29 Mar 2020 at 06:26, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  
> > > wrote:
> > >> In the light of that, I have no objections.
> >
> > > Thank you.  Pushed.
> >
> > It seems like this commit has resulted in some buildfarm instability:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
> >
> > apparent change of plan
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
> >
> > unstable results in stats_ext test
>
> Yeah, thanks for pointing that out.  I'm just doing some tests locally
> to see if I can recreate those results after vacuuming the mcv_list
> table, so far I'm unable to.

I'm considering pushing the attached to try to get some confirmation
that additional autovacuums are the issue. However, I'm not too sure
it's a wise idea, as I can trigger an additional auto-vacuum and
have these new tests fail with make installcheck after setting
autovacuum_naptime to 1s, but I'm not getting the other diffs
experienced by lousyjack and petalura.  The patch may just cause more
failures without proving much, especially so with slower machines.

The other idea I had was just to change the
autovacuum_vacuum_insert_threshold relopt to -1 for the problem tables
and see if that stabilises things.

Yet another option would be to see if reltuples varies between runs
and ditch the autovacuum_count column from the attached.  There does
not appear to be any part of the tests which would cause any dead
tuples in any of the affected relations, so I'm unsure why reltuples
would vary between what ANALYZE and VACUUM would set it to.

I'm still thinking. Input welcome.

David


temp_telemetry_checks_for_unstable_tests.patch
Description: Binary data


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote:

On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra
 wrote:


Hi,

Attached is my take on simplification of the useful pathkeys thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.


Thanks. I've included that in the patch series in this email (as a
separate patch) with a few additional comments.



Thanks.


I've also noticed that the enabled_incrementalsort check in
generate_useful_gather_paths seemed broken, because it returned us out
of the function before creating either a plain gather merge (if
already sorted) or an explicit sort path. I've included a patch that
moves it to the if block that actually builds the incremental sort
path.



Hmmm, that's probably right.

The reason why the GUC check was right after generate_gather_paths is
that the intent was to disable all the useful-pathkeys business,
essentially reducing it back to plain generate_gather_paths.

But I think you're right that's wrong, because it might lead to strange
behavior when the GUC switches between plans without any incremental
sort nodes - setting it to 'true' might end up picking GM on top of
plain Sort, for example.



...

1) Missing worker identification (Worker #).


Fixed.


2) Missing method for workers (we have it for the leader, though).


Fixed. Since we can't have pointers in the parallel shared memory
space, we can't store the sort methods used in a list. To accomplish
the same goal, I've assigned the TuplesortMethod enum entries unique
bit positions, and store the methods used in a bitmask.



OK, makes sense.


3) I'm not sure why the label is "Methods" instead of "Sort Method", and
why it's in parentheses.


I've removed the parentheses. It's labeled "Methods" since there can
be more than one (different batches could use different methods). I've
updated this to properly use singular/plural depending on the number
of methods used.



I'm a bit confused. How do I know which batch used which method? Is it
actually worth printing in explain analyze? Maybe only print it in the
verbose mode?


4) Not sure having two lines for each worker is a great idea.


I've left these in for now because the lines are already very long
(much, much longer than the worker lines in a standard sort node).
This is largely because we're trying to summarize many sort batches,
while standard sort nodes only have to give the exact stats from a
single batch.

See the example output later in the email.



OK


5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


Updated.



OK


Here's the current output:

 Limit  (cost=1887419.20..1889547.68 rows=1 width=8) (actual time=13218.403..13222.519 rows=1 loops=1)
   ->  Gather Merge  (cost=1887419.20..19624748.03 rows=8360 width=8) (actual time=13218.401..13229.750 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Incremental Sort  (cost=1886419.17..10005010.55 rows=4180 width=8) (actual time=13208.004..13208.586 rows=4425 loops=3)
               Sort Key: a, b
               Presorted Key: a
               Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
               Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1681kB peak=1681kB
               Worker 0:  Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
                 Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1680kB peak=1680kB
               Worker 1:  Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
                 Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1682kB peak=1682kB
               ->  Parallel Index Scan using index_s_a on s  (cost=0.57..4967182.06 rows=4180 width=8) (actual time=0.455..11730.878 rows=668 loops=3)

James


Looks reasonable. Did you try it in other output formats - json/yaml?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Sat, Mar 28, 2020 at 11:14 PM Tomas Vondra
 wrote:
>
> On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote:
> >On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra
> > wrote:
> >>
> >> Hi,
> >>
> >> Attached is my take on simplification of the useful pathkeys thing. It
> >> keeps the function, but it truncates query_pathkeys to only members with
> >> EC members in the relation. I think that's essentially the optimization
> >> you've proposed.
> >
> >Thanks. I've included that in the patch series in this email (as a
> >separate patch) with a few additional comments.
> >
>
> Thanks.
>
> >I've also noticed that the enabled_incrementalsort check in
> >generate_useful_gather_paths seemed broken, because it returned us out
> >of the function before creating either a plain gather merge (if
> >already sorted) or an explicit sort path. I've included a patch that
> >moves it to the if block that actually builds the incremental sort
> >path.
> >
>
> Hmmm, that's probably right.
>
> The reason why the GUC check was right after generate_gather_paths is
> that the intent was to disable all the useful-pathkeys business,
> essentially reducing it back to plain generate_gather_paths.
>
> But I think you're right that's wrong, because it might lead to strange
> behavior when the GUC switches between plans without any incremental
> sort nodes - setting it to 'true' might end up picking GM on top of
> plain Sort, for example.

Thanks.

> >>
> >> ...
> >>
> >> 1) Missing worker identification (Worker #).
> >
> >Fixed.
> >
> >> 2) Missing method for workers (we have it for the leader, though).
> >
> >Fixed. Since we can't have pointers in the parallel shared memory
> >space, we can't store the sort methods used in a list. To accomplish
> >the same goal, I've assigned the TuplesortMethod enum entries unique
> >bit positions, and store the methods used in a bitmask.
> >
>
> OK, makes sense.
>
> >> 3) I'm not sure why the label is "Methods" instead of "Sort Method", and
> >> why it's in parentheses.
> >
> >I've removed the parentheses. It's labeled "Methods" since there can
> >be more than one (different batches could use different methods). I've
> >updated this to properly use singular/plural depending on the number
> >of methods used.
> >
>
> I'm a bit confused. How do I know which batch used which method? Is it
> actually worth printing in explain analyze? Maybe only print it in the
> verbose mode?

The alternative is showing no sort method information at all, or only
showing it if all batches used the same method (which seems confusing
to me). It seems weird that we wouldn't try to find some rough
analogue to what a regular sort node outputs, so I've attempted to
summarize.

This is similar to the memory information: the average doesn't apply
to any one batch, and you don't know which one (or how many) hit the
peak memory usage either, but I think it's meaningful to know a
summary.

With the sort methods, I think it's useful to be able to, for example,
know if any of the groups happened to trigger the top-n heapsort
optimization, or not, and as a corollary, if all of them did or not.

> >> 4) Not sure having two lines for each worker is a great idea.
> >
> >I've left these in for now because the lines are already very long
> >(much, much longer than the worker lines in a standard sort node).
> >This is largely because we're trying to summarize many sort batches,
> >while standard sort nodes only have to give the exact stats from a
> >single batch.
> >
> >See the example output later in the email.
> >
>
> OK
>
> >> 5) I'd probably prefer having multiple labels for avg/max memory values,
> >> instead of (avg) and (max) notes. Also, I think we use "peak" in this
> >> context instead of "max".
> >
> >Updated.
> >
>
> OK
>
> >Here's the current output:
> >
> > Limit  (cost=1887419.20..1889547.68 rows=1 width=8) (actual time=13218.403..13222.519 rows=1 loops=1)
> >   ->  Gather Merge  (cost=1887419.20..19624748.03 rows=8360 width=8) (actual time=13218.401..13229.750 rows=1 loops=1)
> >         Workers Planned: 2
> >         Workers Launched: 2
> >         ->  Incremental Sort  (cost=1886419.17..10005010.55 rows=4180 width=8) (actual time=13208.004..13208.586 rows=4425 loops=3)
> >               Sort Key: a, b
> >               Presorted Key: a
> >               Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
> >               Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1681kB peak=1681kB
> >               Worker 0:  Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
> >                 Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1680kB peak=1680kB
> >               Worker 1:  Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB
> >                 Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1682kB peak=1682kB
> >               ->  Parallel Index Scan using index_s_a on 

Re: error context for vacuum to include block number

2020-03-28 Thread Masahiko Sawada
On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
>
> On Sat, Mar 28, 2020 at 7:04 AM Justin Pryzby  wrote:
> >
> > On Sat, Mar 28, 2020 at 06:59:10AM +0530, Amit Kapila wrote:
> > > On Sat, Mar 28, 2020 at 6:46 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > > > > > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > > > > > kill+sleep.  The kill() could come from running 
> > > > > > pg_cancel_backend().  And the
> > > > > > sleep() just encourages a context switch, which can happen at any 
> > > > > > time.
> > > > >
> > > > > pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> > > > > have accepted the signal sent via pg_cancel_backend().  Can you try
> > > > > your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> > > > > pg_sleep() or maybe better by using OS Sleep call?
> > > >
> > > > Ah, that explains it.  Right, I'm not able to induce a crash with 
> > > > usleep().
> > > >
> > > > Do you want me to resend a patch without that change ?  I feel like 
> > > > continuing
> > > > to trade patches is more likely to introduce new errors or lose someone 
> > > > else's
> > > > changes than to make much progress.  The patch has been through enough
> > > > iterations and it's very easy to miss an issue if I try to eyeball it.
> > >
> > > I can do it but we have to agree on the other two points (a) I still
> > > feel that switching to the truncate phase should be done at the place
> > > from where we are calling lazy_truncate_heap and (b)
> > > lazy_cleanup_index should switch back the error phase after calling
> > > index_vacuum_cleanup.  I have explained my reasoning for these points
> > > a few emails back.
> >
> > I have no objection to either.  It was intuitive to me to do it how I
> > originally wrote it but I'm not wedded to it.
> >
>
> Please find attached the updated patch with all the changes discussed.
> Let me know if I have missed anything?
>

Thank you for updating the patch! Looks good to me.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improving connection scalability: GetSnapshotData()

2020-03-28 Thread Andres Freund
Hi,

On 2020-03-28 18:39:32 -0700, Peter Geoghegan wrote:
> I have heard quite a few complaints about the scalability of snapshot
> acquisition in Postgres. Generally from very large users that are not
> well represented on the mailing lists, for a variety of reasons. The
> GetSnapshotData() bottleneck is a *huge* problem for us. (As problems
> for Postgres users go, I would probably rank it second behind issues
> with VACUUM.)

Yea, I see it similarly. For busy databases, my experience is that
vacuum is the big problem for write heavy workloads (or the write
portion), and snapshot scalability the big problem for read heavy oltp
workloads.


> This scalability improvement is clearly very significant. There is
> little question that this is a strategically important enhancement for
> the Postgres project in general. I hope that you will ultimately be
> able to commit the patchset before feature freeze.

I've done a fair bit of cleanup, but I'm still fighting with how to
implement old_snapshot_threshold in a good way. It's not hard to get it
back to kind of working, but it requires some changes that go into the
wrong direction.

The problem basically is that the current old_snapshot_threshold
implementation just reduces OldestXmin to whatever is indicated by
old_snapshot_threshold, even if not necessary for pruning to do the
specific cleanup that's about to be done. If OldestXmin < threshold,
it'll set shared state that fails all older accesses.  But that doesn't
really work well with approach in the patch of using a lower/upper
boundary for potentially valid xmin horizons.

I think the right approach would be to split
TransactionIdLimitedForOldSnapshots() into separate parts. One that
determines the most aggressive horizon that old_snapshot_threshold
allows, and a separate part that increases the threshold after which
accesses need to error out
(i.e. SetOldSnapshotThresholdTimestamp()). Then we can only call
SetOldSnapshotThresholdTimestamp() for exactly the xids that are
removed, not for the most aggressive interpretation.

Unfortunately I think that basically requires changing
HeapTupleSatisfiesVacuum's signature, to take a more complex parameter
than OldestXmin (to take InvisibleToEveryoneState *), which quickly
increases the size of the patch.


I'm currently doing that and seeing how the result makes me feel about
the patch.

Alternatively we also can just be less efficient and call
GetOldestXmin() more aggressively when old_snapshot_threshold is
set. That'd be easier to implement - but seems like an ugly gotcha.

Greetings,

Andres Freund




Re: backup manifests

2020-03-28 Thread Noah Misch
On Fri, Mar 27, 2020 at 01:53:54PM -0400, Robert Haas wrote:
> - Replace a doc paragraph about the advantages and disadvantages of
> CRC-32C with one by Stephen Frost, with a slightly change by me that I
> thought made it sound more grammatical.

Defaulting to CRC-32C seems prudent to me:

- As Andres Freund said, SHA-512 is slow relative to storage now available.
  Since gzip is a needlessly-slow choice for backups (or any application that
  copies the compressed data just a few times), comparison to "gzip -6" speed
  is immaterial.

- While I'm sure some other fast hash would be a superior default, introducing
  a new algorithm is a bikeshed, as you said.  This design makes it easy,
  technically, for someone to introduce a new algorithm later.  CRC-32C is not
  catastrophically unfit for 1GiB files.

- Defaulting to SHA-512 would, in the absence of a WAL archive that also uses
  a cryptographic hash function, give a false sense of having achieved some
  coherent cryptographic goal.  With the CRC-32C default, WAL and the rest get
  similar protection.  I'm discounting the case of using BASE_BACKUP without a
  WAL archive, because I expect little intersection between sites "worried
  enough to hash everything" and those "not worried enough to use an archive".
  (On the other hand, the program that manages the WAL archive can reasonably
  own hashing base backups; putting ownership in the server isn't achieving
  much extra.)

> + 
> +  pg_validatebackup
> +  verify the integrity of a base backup of a
> +  PostgreSQL cluster
> + 

> +
> +  
> +pg_wal is ignored because WAL files are sent
> +separately from the backup, and are therefore not described by the
> +backup manifest.
> +  
> +

Stephen Frost mentioned that a backup could pass validation even if
pg_basebackup were killed after writing the base backup and before finishing
the writing of pg_wal.  One might avoid that by simply writing the manifest to
a temporary name and renaming it to the final name after populating pg_wal.

What do you think of having the verification process also call pg_waldump to
validate the WAL CRCs (shown upthread)?  That looked helpful and simple.

I think this functionality doesn't belong in its own program.  If you suspect
pg_basebackup or pg_restore will eventually gain the ability to merge
incremental backups into a recovery-ready base backup, I would put the
functionality in that program.  Otherwise, I would put it in pg_checksums.
For me, part of the friction here is that the program description indicates
general verification, but the actual functionality merely checks hashes on a
directory tree that happens to represent a PostgreSQL base backup.

> + parse->pathname = palloc(raw_length + 1);

I don't see this freed anywhere; is it?  (It's useful to make peak memory
consumption not grow in proportion to the number of files backed up.)

[This message is not a full code review.]




snapper vs. HEAD

2020-03-28 Thread Tom Lane
Buildfarm member snapper has been crashing in the core regression tests
since commit 17a28b0364 (well, there's a bit of a range of uncertainty
there, but 17a28b0364 looks to be the only such commit that could have
affected code in gistget.c where the crash is).  Curiously, its sibling
skate is *not* failing, despite being on the same machine and compiler.

I looked into this by dint of setting up a similar environment in a
qemu VM.  I might not have reproduced things exactly, but I managed
to get the same kind of crash at approximately the same place, and
what it looks like to me is a compiler bug.  It's iterating
gistindex_keytest's loop over search keys

ScanKey key = scan->keyData;
int keySize = scan->numberOfKeys;

while (keySize > 0)
{
...
key++;
keySize--;
}

one time too many, and accessing a garbage ScanKey value off the end of
the keyData array, leading to a function call into never-never land.

I'm no expert on Sparc assembly code, but it looks like the compiler
forgot the ",a" (annul) modifier here:

.loc 1 181 0
andcc   %o7, 64, %g0
be,pt   %icc, .L134 <
 addcc  %l5, -1, %l5
.loc 1 183 0
lduh[%i4+16], %o7
add %i4, %o7, %o7
lduh[%o7+12], %o7
andcc   %o7, 1, %g0
bne,pt  %icc, .L141
 ld [%fp-32], %g2
.loc 1 163 0
ba,pt   %xcc, .L134
 addcc  %l5, -1, %l5

causing %l5 (which contains the keySize value) to be decremented
an extra time in the case where that branch is not taken and
we fall through as far as the "ba" at the end.  Even that would
not be fatal, perhaps, except that the compiler also decided to
optimize the "keySize > 0" test to "keySize != 0", for ghod only
knows what reason (surely it's not any faster on a RISC machine),
so that arriving at .L134 with %l5 containing -1 allows the loop
to be iterated again.  Kaboom.

It's unclear how 17a28b0364 would have affected this, but there is
an elog() call elsewhere in the same function, so maybe the new
coding for that changed register assignments or some other
phase-of-the-moon effect.

I doubt that anyone's going to take much interest in fixing this
old compiler version, so my recommendation is to back off the
optimization level on snapper to -O1, and probably on skate as
well because there's no obvious reason why the same compiler bug
might not bite skate at some point.  I was able to get through
the core regression tests on my qemu VM after recompiling
gistget.c at -O1 (with other flags the same as snapper is using).

regards, tom lane




Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-28 Thread Amit Kapila
On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud  wrote:
>
> On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > >
> > > I see some basic problems with the patch.  The way it tries to compute
> > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > or point me to any test done where we have computed WAL for parallel
> > > operations like Parallel Vacuum or Parallel Create Index?
> >
> > Ah, that's indeed a good point and AFAICT WAL records from parallel utility
> > workers won't be accounted for.  That being said, I think that an argument
> > could be made that proper infrastructure should have been added in the 
> > original
> > parallel utility patches, as pg_stat_statement is already broken wrt. buffer
> > usage in parallel utility, unless I'm missing something.
>
> Just to be sure I did a quick test with pg_stat_statements behavior using
> parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> doesn't reflect parallel workers' activity.
>

Sawada-San would like to investigate this? If not, I will look into
this next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WAL usage calculation patch

2020-03-28 Thread Amit Kapila
On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud  wrote:
>
> On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> >
> > Basically,
> > I don't know changes done in ExecInitParallelPlan and friends allow us
> > to compute WAL for parallel operations.  Those will primarily cover
> > parallel queries that won't write WAL.  How you have tested those
> > changes?
>
> I didn't test those, and I'm not even sure how to properly and reliably test
> that.  Do you have any advice on how to achieve that?
>
> However the patch is mimicking the buffer instrumentation that already exists,
> and the approach also looks correct to me.  Do you have a reason to believe
> that the approach that works for buffer usage wouldn't work for WAL records? 
> (I
> of course agree that this should be tested anyway)
>

The buffer usage infrastructure is for read-only queries (for ex. for
stats like blks_hit, blks_read).  As far as I can think, there is no
easy way to test the WAL usage via that API.  It might or might not be
required in the future depending on whether we decide to use the same
infrastructure for parallel writes.  I think for now we should remove
that part of changes and rather think how to get that for parallel
operations that can write WAL.  For ex. we might need to do something
similar to what this patch has done in begin_parallel_vacuum and
end_parallel_vacuum.  Would you like to attempt that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - refactor init functions with buffers

2020-03-28 Thread Fabien COELHO


Hello Andres,


That being the case, I'd think a better design principle is "make your
new code look like the code around it", which would tend to weigh against
introducing StringInfo uses into pgbench when there's none there now and
a bunch of PQExpBuffer instead.  So I can't help thinking the advice
you're being given here is suspect.


I don't agree with this. This is a "fresh" usage of StringInfo. That's
different to adding one new printed line among others built with
pqexpbuffer. If we continue adding large numbers of new uses of both
pieces of infrastructure, we're just making things more confusing.


My 0.02 € :

 - I'm in favor or having one tool for one purpose, so a fe/be common
StringInfo interface is fine with me;

 - I prefer to avoid using both PQExpBuffer & StringInfo in the same file, 
because they do the exact same thing and it is locally confusing;


 - I'd be fine with switching all of pgbench to StringInfo, as there are 
only 31 uses;


 - But, pgbench relies on psql scanner, which uses PQExpBuffer in
PsqlScanState, so mixing is unavoidable, unless PQExpBuffer & StringInfo
are the same thing (i.e. typedef + cpp/inline/function wrappers);

 - There are 1260 uses of PQExpBuffer in psql that, although they are 
trivial, I'm in no hurry to update.


--
Fabien.

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-28 Thread Amit Kapila
On Sun, Mar 29, 2020 at 6:29 AM Tomas Vondra
 wrote:
>
> On Sat, Mar 28, 2020 at 03:29:34PM +0530, Amit Kapila wrote:
> >On Sat, Mar 28, 2020 at 2:19 PM Dilip Kumar  wrote:
> >
> >How about if instead of writing an XLOG_XACT_ASSIGNMENT WAL, we set a
> >flag in TransactionStateData and then log that as special information
> >whenever we write next WAL record for a new subtransaction?  Then
> >during recovery, we can only call ProcArrayApplyXidAssignment when we
> >find that special flag is set in a WAL record.  One idea could be to
> >use a flag bit in XLogRecord.xl_info.  If that is feasible then the
> >solution can work as it is now, without any overhead or change in the
> >way we maintain KnownAssignedXids.
> >
>
> Ummm, how is that different from what the patch is doing now? I mean, we
> only write the top-level XID for the first WAL record in each subxact,
> right? Or what would be the difference with your approach?
>

We have to do what the patch is currently doing and, additionally, we
will set this flag after PGPROC_MAX_CACHED_SUBXIDS subxacts, which
would allow us to call ProcArrayApplyXidAssignment during WAL replay
only after PGPROC_MAX_CACHED_SUBXIDS number of subxacts.  It will help us in
clearing the KnownAssignedXids at the same time as we do now, so no
additional performance overhead.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-28 Thread Amit Kapila
On Sun, Mar 29, 2020 at 9:04 AM Masahiko Sawada
 wrote:
>
> On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
> >
> >
> > Please find attached the updated patch with all the changes discussed.
> > Let me know if I have missed anything?
> >
>
> Thank you for updating the patch! Looks good to me.
>

Okay, I will push this tomorrow.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-28 Thread Julien Rouhaud
On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote:
> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao  
> wrote:
> >
> 
> > So what I'd like to say is that the information that users are interested
> > in would vary on each situation and case. At least for me it seems
> > enough for pgss to report only the basic information. Then users
> > can calculate to get the numbers (like total_time) they're interested in,
> > from those basic information.
> >
> > But of course, I'd like to hear more opinions about this...
> 
> +1
> 
> Unless someone chime in by tomorrow, I'll just drop the sum as it
> seems less controversial and not a blocker in userland if users are
> interested.

Done in attached v11, which also fixes the s/querytext/query_text/
discrepancy noted previously.

> > >
> > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems
> > > clearner and will avoid future useless code churn, and run pgindent.
> >
> > Many thanks!! I'm thinking to commit this part separately.
> > So I made that patch based on your patch. Attached.
> 
> Thanks! It looks good to me.

I also kept that part in a distinct commit for convenience.
From 0d9bf628dac15ce9e790d3a040ef1123c7c6e7d9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 28 Mar 2019 13:33:23 +0100
Subject: [PATCH v11 1/3] Pass query string to the planner

---
 src/backend/commands/copy.c  |  3 ++-
 src/backend/commands/createas.c  |  3 ++-
 src/backend/commands/explain.c   |  2 +-
 src/backend/commands/extension.c |  2 +-
 src/backend/commands/matview.c   |  2 +-
 src/backend/commands/portalcmds.c|  2 +-
 src/backend/executor/functions.c |  1 +
 src/backend/optimizer/plan/planner.c | 10 ++
 src/backend/tcop/postgres.c  | 13 -
 src/backend/utils/cache/plancache.c  |  3 ++-
 src/include/optimizer/optimizer.h|  3 ++-
 src/include/optimizer/planner.h  |  4 +++-
 src/include/tcop/tcopprot.h  |  6 --
 13 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbde9f88e7..efb1e0d03e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1580,7 +1580,8 @@ BeginCopy(ParseState *pstate,
 		}
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, NULL);
 
 		/*
 		 * With row level security and a user using "COPY relation TO", we
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3a5676fb39..9febdc5165 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,7 +329,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		Assert(query->commandType == CMD_SELECT);
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, params);
 
 		/*
 		 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ff2f45cfb2..ee0e638f33 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -376,7 +376,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 		INSTR_TIME_SET_CURRENT(planstart);
 
 		/* plan the query */
-		plan = pg_plan_query(query, cursorOptions, params);
+		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 00cf4ef268..38cbea385a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -751,7 +751,7 @@ execute_sql_string(const char *sql)
 		   NULL,
 		   0,
 		   NULL);
-		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
+		stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
 
 		foreach(lc2, stmt_list)
 		{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c3954f3e24..e5a5eef102 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -391,7 +391,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, 0, NULL);
+	plan = pg_plan_query(query, queryString, 0, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 40be5069fe..6a2c233615 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -90,7 +90,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
 		elog(ERROR, "non-SELECT s

Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-28 Thread Masahiko Sawada
On Sun, 29 Mar 2020 at 14:23, Amit Kapila  wrote:
>
> On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud  wrote:
> >
> > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > >
> > > > I see some basic problems with the patch.  The way it tries to compute
> > > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > > or point me to any test done where we have computed WAL for parallel
> > > > operations like Parallel Vacuum or Parallel Create Index?
> > >
> > > Ah, that's indeed a good point, and AFAICT WAL records from parallel
> > > utility workers won't be accounted for.  That being said, I think an
> > > argument could be made that the proper infrastructure should have been
> > > added in the original parallel utility patches, as pg_stat_statements
> > > is already broken wrt. buffer usage in parallel utilities, unless I'm
> > > missing something.
> >
> > Just to be sure, I did a quick test of pg_stat_statements behavior using
> > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer
> > usage doesn't reflect parallel workers' activity.
> >
> >
>
> Sawada-San, would you like to investigate this? If not, I will look
> into this next week.

Sure, I'll investigate this issue today.

Regards,

-- 
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services