Re: Problems around compute_query_id

2021-04-12 Thread Michael Banck
Hi,

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
> > Fujii-san has reported on Twitter that enabling the computation of
> > query IDs does not work properly with log_statement as the query ID is
> > calculated at parse analyze time and the query is logged before that.
> > As far as I can see, that's really a problem, as any queries logged are
> > tagged with a query ID of 0, and log parsers would not really be
> > able to use that, even though the information provided by, for example,
> > log_duration gives the computed query ID present in pg_stat_activity.
> 
> I don't see any way around that.  The goal of log_statements is to log all
> syntactically valid queries, including otherwise invalid queries.  For
> instance, it would log
> 
> SELECT notacolumn;
> 
> and there's no way to compute a queryid in that case.  I think that
> log_statements already causes some issues with log parsers.  At least pgbadger
> recommends avoiding it:
> 
> "Do not enable log_statement as its log format will not be parsed by 
> pgBadger."
> 
> I think we should simply document that %Q is not compatible with
> log_statements.

What about log_statement_sample_rate? Does compute_query_id have the
same problem with that?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: Problems around compute_query_id

2021-04-12 Thread Julien Rouhaud
On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
> 
> What about log_statement_sample_rate ? Does compute_query_id have the
> same problem with that?

No, log_statement_sample_rate samples log_min_duration_statement, not
log_statement, so it works as expected.




Re: Bug on update timing of walrcv->flushedUpto variable

2021-04-12 Thread 蔡梦娟(玊于)
   Hi.

   I still feel confused about some points; I hope to get your answers:

   1) You said that "We shouldn't rewind flushedUpto backward. The
variable notifies how far recovery (or startup process) can read WAL content
safely."
   This fix only rewinds flushedUpto when requesting WAL streaming, and
rewinding flushedUpto means one WAL segment file is being re-received, which
indicates that there may be some wrong data in the previous one. flushedUpto
notifies how far the startup process can read WAL content safely because WAL
before that point has been flushed to disk. However, this doesn't mean it is
correct to read that WAL, because the content may be invalid. And if we
rewind flushedUpto, the WaitForWALToBecomeAvailable function returns true
only when the re-received WAL goes beyond what XLogReadRecord needs, at
which point it is both correct and safe to read the content.
   By the way, what do you think of updating flushedUpto to
LogstreamResult.Flush when streaming starts rather than when streaming is
requested? The value of LogstreamResult.Flush is set to replayPtr when
streaming starts.

2) You said that "Once startup process reads the beginning of a record,
XLogReadRecord tries to continue fetching *only the rest* of the record, which
is inconsistent with the first part in this scenario."
If XLogReadRecord finds that the record is invalid, it will read the whole
record again, including the beginning of the record, not only the rest of the
record. Am I missing something?

3) I wonder how you are going to achieve preventing immature records
at a segment boundary from being archived and inhibiting them from being
sent to the standby.

 Thanks & Best Regards

--
From: Kyotaro Horiguchi 
Sent: Monday, March 29, 2021, 09:54
To: 蔡梦娟(玊于) 
Cc: bossartn ; x4mmm ; hlinnaka 
; pgsql-hackers 
Subject: Re: Bug on update timing of walrcv->flushedUpto variable

Hi.

(Added Nathan, Andrey and Heikki in Cc:)

At Fri, 26 Mar 2021 23:44:21 +0800, "蔡梦娟(玊于)"  
wrote in 
> Hi, all
> 
> Recently, I found a bug on update timing of walrcv->flushedUpto variable, 
> consider the following scenario, there is one Primary node, one Standby node 
> which streaming from Primary:
> There are a large number of SQL statements running on the Primary, and the 
> length of the xlog record generated by these statements may be greater than 
> the space left on the current page, so it needs to be written across pages. 
> As shown below, the length of the last_xlog of wal_1 is greater than the 
> space left on the last_page, so it has to be written into wal_2. If the 
> Primary crashed after flushing the last_page of wal_1 to disk, and the 
> remaining content of last_xlog hasn't been flushed in time, then the 
> last_xlog in wal_1 will be incomplete. And the Standby also received wal_1 
> by WAL streaming in this case.

It seems like the same with the issue discussed in [1].

There are two symptoms of the issue. One is that the archive ends with a
segment that ends with an immature WAL record, which causes
inconsistency between the archive and the pg_wal directory. The other is, as
you saw, that walreceiver receives an immature record at the end of a
segment, which prevents recovery from proceeding.

In the thread, we are trying to solve that by preventing such immature
records at a segment boundary from being archived and inhibiting them from
being sent to the standby.

> [日志1.png]

It doesn't seem attached..

> The confusing point is: why is walrcv->flushedUpto only updated at the first 
> startup of walreceiver on a specific timeline, not each time xlog streaming 
> is requested? In the above case, it is also reasonable to update 
> walrcv->flushedUpto to wal_1 when the Standby re-receives wal_1. So I changed 
> it to update walrcv->flushedUpto each time xlog streaming is requested, which 
> is the patch I want to share with you, based on postgresql-13.2. What do you 
> think of this change?
> 
> By the way, I also want to know why the pgstat_reset_all function is called 
> during the recovery process?

We shouldn't rewind flushedUpto backward. The variable notifies how
far recovery (or the startup process) can read WAL content safely.  Once
the startup process reads the beginning of a record, XLogReadRecord tries
to continue fetching *only the rest* of the record, which is
inconsistent with the first part in this scenario. So this fix alone
doesn't work.  And we also need to fix the archive
inconsistency, maybe as part of a fix for this issue.

We are trying to fix this by refraining from archiving (or streaming)
until a record crossing a segment boundary is completely flushed.
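The partial-record continuation hazard described above can be sketched with a toy byte-stream model (Python; all variables here are hypothetical simplifications, not PostgreSQL structures): the startup process has consumed the first part of a record up to flushedUpto, the segment is then re-received with different content, and the continuation fetch mixes old and new bytes.

```python
# Toy model: why rewinding flushedUpto mid-record is unsafe.
# Purely illustrative; these are not PostgreSQL internals.

wal = bytearray(b"AAAA")          # first part of a record, already flushed
flushed_upto = len(wal)

# Startup process reads the record's beginning, up to flushedUpto.
first_part = bytes(wal[:flushed_upto])

# Walreceiver reconnects and re-receives the segment; after the crash the
# record's content differs from what was partially streamed before.
wal = bytearray(b"AABBCC")
flushed_upto = len(wal)

# Continuation fetch: only the *rest* of the record is read and appended.
rest = bytes(wal[len(first_part):flushed_upto])
assembled = first_part + rest     # b"AAAACC" -- mixes old and new content

print(assembled == bytes(wal))    # -> False: the assembled record is inconsistent
```

This is the scenario in which the assembled record is "inconsistent with the first part": neither rewinding alone nor re-reading helps unless the reader discards the partially read record entirely.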

regards.


[1] 
https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-12 Thread Amit Langote
On Mon, Apr 12, 2021 at 6:20 AM Alvaro Herrera  wrote:
> On 2021-Mar-31, Tom Lane wrote:
>
> > diff -U3 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >  
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > --- 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >   2021-03-29 20:14:21.258199311 +0200
> > +++ 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > 2021-03-30 18:54:34.272839428 +0200
> > @@ -324,6 +324,7 @@
> >  1
> >  2
> >  step s1insert: insert into d4_fk values (1);
> > +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> > "d4_fk_a_fkey"
> >  step s1c: commit;
> >
> >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > s1insert s1c
>
> Hmm, actually, looking at this closely, I think the expected output is
> bogus and trilobite is doing the right thing by throwing this error
> here.  The real question is why isn't this case behaving in that way in
> every *other* animal.

Indeed.

I can see a wrong behavior of RelationGetPartitionDesc() in a case
that resembles the above test case.

drop table if exists d4_primary, d4_primary1, d4_fk, d4_pid;
create table d4_primary (a int primary key) partition by list (a);
create table d4_primary1 partition of d4_primary for values in (1);
create table d4_primary2 partition of d4_primary for values in (2);
insert into d4_primary values (1);
insert into d4_primary values (2);
create table d4_fk (a int references d4_primary);
insert into d4_fk values (2);
create table d4_pid (pid int);

Session 1:
begin isolation level serializable;
select * from d4_primary;
 a
---
 1
 2
(2 rows)

Session 2:
alter table d4_primary detach partition d4_primary1 concurrently;


Session 1:
-- can see d4_primary1's detach as pending at this point, yet still scans it!
select * from d4_primary;
 a
---
 1
 2
(2 rows)
insert into d4_fk values (1);
INSERT 0 1
end;

Session 2:

ALTER TABLE

Session 1:
-- FK violated
select * from d4_primary;
 a
---
 2
(1 row)
select * from d4_fk;
 a
---
 1
(1 row)

The 2nd select that session 1 performs adds d4_primary1, whose detach
it *sees* is pending, to the PartitionDesc, but without setting its
includes_detached.  The subsequent insert's RI query, because it uses
that PartitionDesc as-is, returns 1 as being present in d4_primary,
leading to the insert succeeding.  When session 1's transaction
commits, the waiting ALTER proceeds with committing the 2nd part of
the DETACH, without having a chance again to check if the DETACH would
lead to the foreign key of d4_fk being violated.

It seems problematic to me that the logic of setting includes_detached
is oblivious to the special check in find_inheritance_children() that
decides whether to "force"-include a detach-pending child based on
cross-checking its pg_inherits tuple's xmin against the active
snapshot.  Maybe fixing that would help, although I haven't tried that
myself yet.
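The race can be sketched as a toy model (Python; all names and structures are hypothetical simplifications, not PostgreSQL code): a detach-pending child is force-included in the partition descriptor without the descriptor recording that fact, the RI check reuses that descriptor, and the detach then commits without re-checking the foreign key.

```python
# Toy model of the DETACH CONCURRENTLY race described above.
# `partitions` maps a partition name to the key values it holds;
# build_partdesc() mimics find_inheritance_children() force-including a
# detach-pending child when the snapshot cannot yet see the detach.

partitions = {"d4_primary1": {1}, "d4_primary2": {2}}
detach_pending = {"d4_primary1"}

def build_partdesc(partitions, detach_pending, snapshot_sees_detach):
    parts = []
    includes_detached = False
    for name in partitions:
        if name in detach_pending and snapshot_sees_detach:
            continue                      # genuinely excluded
        parts.append(name)                # force-included if detach not visible
        # BUG: includes_detached is never set for the force-included child
    return {"parts": parts, "includes_detached": includes_detached}

# Session 1 builds a descriptor while the detach is pending but not yet
# visible to its snapshot; the RI query reuses it as-is and finds key 1.
desc = build_partdesc(partitions, detach_pending, snapshot_sees_detach=False)
ri_check_passes = any(1 in partitions[p] for p in desc["parts"])

# The waiting ALTER then commits the second part of the DETACH without
# re-checking the foreign key.
del partitions["d4_primary1"]
key_still_reachable = any(1 in vals for vals in partitions.values())

print(ri_check_passes, key_still_reachable)  # -> True False: FK now violated
```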

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Wired if-statement in gen_partprune_steps_internal

2021-04-12 Thread Andy Fan
On Thu, Apr 8, 2021 at 7:59 PM Amit Langote  wrote:

> On Thu, Apr 8, 2021 at 7:41 PM David Rowley  wrote:
> > On Thu, 8 Apr 2021 at 21:04, Amit Langote 
> wrote:
> > > Maybe we should also update the description of the node struct as
> > > follows to consider that last point:
> >>
> > >  * PartitionPruneStepOp - Information to prune using a set of mutually
> ANDed
> > >  *  OpExpr and any IS [ NOT ] NULL clauses
> >
> > I didn't add that. I wasn't really sure if I understood why we'd talk
> > about PartitionPruneStepCombine in the PartitionPruneStepOp. I thought
> > the overview in gen_partprune_steps_internal was ok to link the two
> > together and explain why they're both needed.
>
> Sorry, maybe the way I wrote it was a bit confusing, but I meant to
> suggest that we do what I have quoted above from my last email.  That
> is, we should clarify in the description of PartitionPruneStepOp that
> it contains information derived from OpExprs and in some cases also IS
> [ NOT ] NULL clauses.
>
> Thanks for the commit.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>

Thanks for the patch.

Recently I have been reading the partition pruning code again, and want to
propose some tiny changes.  They were helpful for me, and I hope they are
helpful for others as well, especially for people who are not familiar
with this code.

-- v1-0001-Document-enhancement-for-RelOptInfo.partexprs-nul.patch

Just adds comments for RelOptInfo.partexprs & nullable_partexprs to
remind the reader that nullable_partexprs is just for partition-wise join,
and uses bms_add_member(relinfo->all_partrels, childRTindex) instead of
bms_add_members(relinfo->all_partrels, childrelinfo->relids), which
is more explicit about adding the child RT index to all_partrels.

-- v1-0002-Split-gen_prune_steps_from_exprs-into-some-smalle.patch

Just splits gen_prune_steps_from_opexps into some smaller chunks,
with the usual benefits of smaller functions.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


v1-0001-Document-enhancement-for-RelOptInfo.partexprs-nul.patch
Description: Binary data


v1-0002-Split-gen_prune_steps_from_exprs-into-some-smalle.patch
Description: Binary data


Re: pgsql: Move tablespace path re-creation from the makefiles to pg_regres

2021-04-12 Thread Christoph Berg
Re: Michael Paquier
> http://commitfest.cputube.org/michael-paquier.html
> 
> So it looks like this could be a different answer.

The mkdir() function looks like a sane and clean approach to me.

Christoph




vacuum freeze - possible improvements

2021-04-12 Thread Virender Singla
Hi Postgres Community,

Regarding anti-wraparound vacuums (to freeze tuples), I see that VACUUM has
to scan all the pages which are not marked all-frozen (looking at the
visibility map). That means even if we want to freeze only a small range of
transactions (for example, by increasing the vacuum_freeze_min_age parameter
to 1B), it will still scan all those pages in the visibility map, which is a
time-consuming process.

Can this process be improved so that VACUUM knows which tuples/pages contain
the transactions that need freezing?

The benefit of such an improvement is that if we are approaching a
transaction ID close to 2B (and downtime), we could quickly recover the
database by freeze-vacuuming only a few million rows with a quick lookup,
rather than visiting all the pages listed in the visibility map.

For example, a binary tree structure that records all the rows of a table
together with their transaction IDs, so that we can ask for all tuples
having a transaction ID greater than x and less than y. Yes, that adds extra
overhead on data load, and there are lots of other things to consider.
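The lookup-structure idea can be sketched as a toy model (Python; `PageXidIndex` and everything else here is hypothetical, not PostgreSQL code — the visibility map only records all-visible/all-frozen bits, not per-page xid ranges): if each page's oldest unfrozen xid were tracked, a freeze pass could visit only the pages holding sufficiently old tuples.

```python
# Toy model: per-page oldest-unfrozen-xid tracking for targeted freezing.
# All names are hypothetical illustrations of the proposal above.

class PageXidIndex:
    def __init__(self):
        self.oldest_xid = {}  # page number -> oldest unfrozen xid on that page

    def note_tuple(self, page, xid):
        cur = self.oldest_xid.get(page)
        if cur is None or xid < cur:
            self.oldest_xid[page] = xid

    def pages_needing_freeze(self, cutoff_xid):
        # Only pages whose oldest unfrozen xid predates the cutoff need a
        # visit, instead of every page that is merely not all-frozen.
        return sorted(p for p, x in self.oldest_xid.items() if x < cutoff_xid)

    def mark_frozen(self, page):
        self.oldest_xid.pop(page, None)

idx = PageXidIndex()
idx.note_tuple(0, 100)
idx.note_tuple(0, 50)
idx.note_tuple(1, 900)
idx.note_tuple(2, 120)

# Freeze everything older than xid 500: pages 0 and 2 qualify, page 1 does not.
print(idx.pages_needing_freeze(500))  # -> [0, 2]
```

The trade-off named in the text shows up here too: every tuple insert or update has to maintain the side structure, which is the extra load-time overhead.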


Thanks,
Virender


Re: Problems around compute_query_id

2021-04-12 Thread Julien Rouhaud
On Mon, Apr 12, 2021 at 03:26:33PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
> > 
> > What about log_statement_sample_rate ? Does compute_query_id have the
> > same problem with that?
> 
> No, log_statement_sample_rate samples log_min_duration_statements, not
> log_statements so it works as expected.

While on that topic, it's probably worth mentioning that log_duration is now
much more useful if you have the query ID in your log_line_prefix.  It avoids
logging the full query text while still letting you identify the underlying
normalized query by dumping the contents of pg_stat_statements.
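As a toy illustration of that workflow (Python; the queryid value, the log line, and the prefix format are made up for the example — the only real assumption is that %Q in log_line_prefix emits the query ID), a log parser could recover the normalized text from a pg_stat_statements dump:

```python
import re

# The dict below stands in for a dump of
# "SELECT queryid, query FROM pg_stat_statements"; the value is invented.
pg_stat_statements = {
    6865378226349601843: "SELECT * FROM pgbench_accounts WHERE aid = $1",
}

# Assumed log_line_prefix: '%m [%p] queryid=%Q ' (hypothetical format).
log_line = ("2021-04-12 10:00:00 CEST [12345] queryid=6865378226349601843 "
            "LOG:  duration: 1.234 ms")

# Query IDs are signed 64-bit values, so allow a leading minus sign.
m = re.search(r"queryid=(-?\d+)", log_line)
assert m is not None
normalized = pg_stat_statements.get(int(m.group(1)), "<unknown>")
print(normalized)  # -> SELECT * FROM pgbench_accounts WHERE aid = $1
```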




Re: Replication slot stats misgivings

2021-04-12 Thread Amit Kapila
On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada  wrote:
>
> On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  wrote:
> >
> >
> > It seems Vignesh has changed patches based on the latest set of
> > comments so you might want to rebase.
>
> I've merged my patch into the v6 patch set Vignesh submitted.
>
> I've attached the updated version of the patches. I didn't change
> anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> patch) and patches that add tests.
>

I think we can push 0001. What do you think?

> In 0003 patch I reordered the
> output parameters of pg_stat_replication_slots; showing total number
> of transactions and total bytes followed by statistics for spilled and
> streamed transactions seems appropriate to me.
>

I am not sure about this, because I think we might want to add some
information about stream/spill bytes to the total_bytes description
(something like: stream/spill bytes are not in addition to total_bytes). So
keeping these new counters at the end probably makes more sense to me.

> Since my patch resolved
> the issue of writing stats beyond the end of the array, I've removed
> the patch that writes the number of stats into the stats file
> (v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).
>

Okay, but I think it might be better to keep 0001, 0002, 0003 as
Vignesh had because those are agreed upon changes and are
straightforward. We can push those and then further review HTAB
implementation and also see if Andres has any suggestions on the same.

> Apart from the above updates, the
> contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
> function during testing but I think we can use poll_query_until()
> instead.

+1. Can you please change it in the next version?

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-12 Thread vignesh C
On Sat, Mar 20, 2021 at 9:26 AM Amit Kapila  wrote:
>
> On Sat, Mar 20, 2021 at 12:22 AM Andres Freund  wrote:
> >
> > And then more generally about the feature:
> > - If a slot was used to stream out a large amount of changes (say an
> >   initial data load), but then replication is interrupted before the
> >   transaction is committed/aborted, stream_bytes will not reflect the
> >   many gigabytes of data we may have sent.
> >
>
> We can probably update the stats each time we spilled or streamed the
> transaction data but it was not clear at that stage whether or how
> much it will be useful.
>

I feel we can update the replication slot statistics each time we
spill/stream transaction data, instead of accumulating the
statistics and updating them at the end. I have tried this in the attached
patch, and the statistics are updated as expected.
Thoughts?
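The proposed change can be sketched as a toy model (Python; the dicts and function names are illustrative stand-ins, not the actual C code): the local counters are flushed to the collector at every spill and reset, so the collector view stays current even if decoding is interrupted, while the accumulated totals still match.

```python
# Toy model of reporting slot statistics at each spill/stream instead of
# only at transaction end.  Illustrative only.

collector = {"spill_count": 0, "spill_bytes": 0}
local = {"spill_count": 0, "spill_bytes": 0}

def update_decoding_stats():
    # Analogous to UpdateDecodingStats: send accumulated deltas, then reset.
    if local["spill_count"] == 0 and local["spill_bytes"] == 0:
        return
    collector["spill_count"] += local["spill_count"]
    collector["spill_bytes"] += local["spill_bytes"]
    local["spill_count"] = local["spill_bytes"] = 0

def serialize_txn(nbytes):
    local["spill_count"] += 1
    local["spill_bytes"] += nbytes
    update_decoding_stats()   # report immediately, per the proposed patch

for nbytes in (4096, 8192, 1024):
    serialize_txn(nbytes)

print(collector)  # -> {'spill_count': 3, 'spill_bytes': 13312}
```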

Regards,
Vignesh
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 99f2afc73c..9905e2b8ad 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -748,7 +748,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
@@ -830,7 +830,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 
@@ -887,7 +887,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/* update the decoding stats */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 954dbb5554..00c481e1d2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -219,6 +219,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder->apply_truncate = truncate_cb_wrapper;
 	ctx->reorder->commit = commit_cb_wrapper;
 	ctx->reorder->message = message_cb_wrapper;
+	namestrcpy(&ctx->reorder->slotname, NameStr(ctx->slot->data.name));
 
 	/*
 	 * To support streaming, we require start/stop/abort/commit/change
@@ -1791,15 +1792,14 @@ ResetLogicalStreamingState(void)
  * Report stats for a slot.
  */
 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)
 {
-	ReorderBuffer *rb = ctx->reorder;
-
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
 	 * time the stats has been sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->spillCount <=0 &&
+		rb->streamBytes <= 0 && rb->streamCount <=0)
 		return;
 
 	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
@@ -1811,7 +1811,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
+	pgstat_report_replslot(NameStr(rb->slotname),
 		   rb->spillTxns, rb->spillCount, rb->spillBytes,
 		   rb->streamTxns, rb->streamCount, rb->streamBytes);
 	rb->spillTxns = 0;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 92c7fa7993..b60a864004 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3528,6 +3528,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		UpdateDecodingStats(rb);
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3897,6 +3898,8 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
 
+	UpdateDecodingStats(rb);
+
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
 	Assert(txn->nentries_mem == 0);
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 72f049b347..71d93058d2 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -139,6 +139,6 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 	  TransactionId xid, const char *gid);
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
-extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
+extern void UpdateDecodingStats(ReorderBuffer *rb);
 
 #endif
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 6c9f2c6c77..bcc4788

RE: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
> Then I get a timeout error, and the subscriber worker keeps re-launching
> over and over (you did not mention seeing such errors?)
I tested again and got errors, too. I didn't check the log after the timeout 
in the previous test. 

Regards,
Tang






RE: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
Sorry for sending the wrong mail.  Please ignore it.

> -Original Message-
> From: Shi, Yu/侍 雨
> Sent: Monday, April 12, 2021 6:51 PM
> To: Tang, Haiying/唐 海英 
> Cc: pgsql-hackers@lists.postgresql.org
> Subject: RE: Could you help testing logical replication?
> 
> > Then I get a timeout error, and the subscriber worker keeps
> > re-launching over and over (you did not mention seeing such errors?)
> I tested again and got errors, too. I didn't check the log after the timeout
> in the previous test.
> 
> Regards,
> Tang
> 





Re: Replication slot stats misgivings

2021-04-12 Thread Masahiko Sawada
On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  wrote:
>
> On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada  
> wrote:
> >
> > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  wrote:
> > >
> > >
> > > It seems Vignesh has changed patches based on the latest set of
> > > comments so you might want to rebase.
> >
> > I've merged my patch into the v6 patch set Vignesh submitted.
> >
> > I've attached the updated version of the patches. I didn't change
> > anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> > patch) and patches that add tests.
> >
>
> I think we can push 0001. What do you think?

+1

>
> > In 0003 patch I reordered the
> > output parameters of pg_stat_replication_slots; showing total number
> > of transactions and total bytes followed by statistics for spilled and
> > streamed transactions seems appropriate to me.
> >
>
> I am not sure about this because I think we might want to add some
> info of stream/spill bytes in total_bytes description (something like
> stream/spill bytes are not in addition to total_bytes).

Okay.

> So probably
> keeping these new counters at the end makes more sense to me.

But I think all of those counters are new for users, since the
pg_stat_replication_slots view will be introduced in PG14, no?

>
> > Since my patch resolved
> > the issue of writing stats beyond the end of the array, I've removed
> > the patch that writes the number of stats into the stats file
> > (v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).
> >
>
> Okay, but I think it might be better to keep 0001, 0002, 0003 as
> Vignesh had because those are agreed upon changes and are
> straightforward. We can push those and then further review HTAB
> implementation and also see if Andres has any suggestions on the same.

Makes sense. Maybe I should have written my patch as 0004 (i.e.,
applied on top of the patch that adds total_txns and total_bytes).

>
> > Apart from the above updates, the
> > contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
> > function during testing but I think we can use poll_query_until()
> > instead.
>
> +1. Can you please change it in the next version?

Sure, I'll update the patches.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Replication slot stats misgivings

2021-04-12 Thread Amit Kapila
On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > It seems Vignesh has changed patches based on the latest set of
> > > > comments so you might want to rebase.
> > >
> > > I've merged my patch into the v6 patch set Vignesh submitted.
> > >
> > > I've attached the updated version of the patches. I didn't change
> > > anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> > > patch) and patches that add tests.
> > >
> >
> > I think we can push 0001. What do you think?
>
> +1
>
> >
> > > In 0003 patch I reordered the
> > > output parameters of pg_stat_replication_slots; showing total number
> > > of transactions and total bytes followed by statistics for spilled and
> > > streamed transactions seems appropriate to me.
> > >
> >
> > I am not sure about this because I think we might want to add some
> > info of stream/spill bytes in total_bytes description (something like
> > stream/spill bytes are not in addition to total_bytes).
>
> Okay.
>
> > So probably
> > keeping these new counters at the end makes more sense to me.
>
> But I think all of those counters are new for users since
> pg_stat_replication_slots view will be introduced to PG14, no?
>

Right, I was referring to the total_txns and total_bytes attributes. I
think keeping them at the end, after the spill and stream counters, should
be okay.

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-12 Thread vignesh C
On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > It seems Vignesh has changed patches based on the latest set of
> > > > comments so you might want to rebase.
> > >
> > > I've merged my patch into the v6 patch set Vignesh submitted.
> > >
> > > I've attached the updated version of the patches. I didn't change
> > > anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> > > patch) and patches that add tests.
> > >
> >
> > I think we can push 0001. What do you think?
>
> +1
>
> >
> > > In 0003 patch I reordered the
> > > output parameters of pg_stat_replication_slots; showing total number
> > > of transactions and total bytes followed by statistics for spilled and
> > > streamed transactions seems appropriate to me.
> > >
> >
> > I am not sure about this because I think we might want to add some
> > info of stream/spill bytes in total_bytes description (something like
> > stream/spill bytes are not in addition to total_bytes).
>
> Okay.
>
> > So probably
> > keeping these new counters at the end makes more sense to me.
>
> But I think all of those counters are new for users since
> pg_stat_replication_slots view will be introduced to PG14, no?
>
> >
> > > Since my patch resolved
> > > the issue of writing stats beyond the end of the array, I've removed
> > > the patch that writes the number of stats into the stats file
> > > (v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).
> > >
> >
> > Okay, but I think it might be better to keep 0001, 0002, 0003 as
> > Vignesh had because those are agreed upon changes and are
> > straightforward. We can push those and then further review HTAB
> > implementation and also see if Andres has any suggestions on the same.
>
> Makes sense. Maybe I should have written my patch as 0004 (i.e.,
> applied on top of the patch that adds total_txns and total_bytes).
>
> >
> > > Apart from the above updates, the
> > > contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
> > > function during testing but I think we can use poll_query_until()
> > > instead.
> >
> > +1. Can you please change it in the next version?
>
> Sure, I'll update the patches.

I had started working on poll_query_until comment, I will test and
post a patch for that comment shortly.

Regards,
Vignesh




Re: Replication slot stats misgivings

2021-04-12 Thread Amit Kapila
On Sat, Apr 10, 2021 at 6:51 PM vignesh C  wrote:
>

Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.


+total_bytes bigint
+
+
+Amount of decoded transaction data sent to the decoding output plugin
+while decoding the changes from WAL for this slot. This can be used to
+gauge the total amount of data sent during logical decoding.

Can we slightly extend it to say something like: Note that this
includes the bytes streamed and/or spilled. Similarly, we can extend
it for total_txns.

-- 
With Regards,
Amit Kapila.




Re: TRUNCATE on foreign table

2021-04-12 Thread Fujii Masao




On 2021/04/11 19:15, Bharath Rupireddy wrote:

On Sun, Apr 11, 2021 at 9:47 AM Justin Pryzby  wrote:

Find attached language fixes.


Thanks for the patches.


Thanks for the patches!

0001 patch basically looks good to me.

+ behavior must be specified as
+ DROP_RESTRICT or DROP_CASCADE.
+ If specified as DROP_RESTRICT, the
+ RESTRICT option will be included in the
  TRUNCATE command.
+ If specified as DROP_CASCADE, the
+ CASCADE option will be included.

Maybe "will be included" is confusing? Because FDW might not include
the RESTRICT in the TRUNCATE command that it will issue
when DROP_RESTRICT is specified, for example. To be more precise,
we should document something like "If specified as DROP_RESTRICT,
the RESTRICT option was included in the original TRUNCATE command"?



I'm also proposing to convert an if/else to a switch(), since I don't like
"if/else if" without an "else", and since the compiler can warn about missing
enum values in switch cases.


I think we have a good number of if/else-if (without else) blocks in the
code base, and I'm not sure the compilers have warned about them. Apart
from that, whether to use if-else or switch-case is just a coding choice.
And we have only two values for the DropBehavior enum, i.e. DROP_RESTRICT
and DROP_CASCADE (I'm not sure we will extend this enum to have more
values); if we had more, then a switch case would look cleaner.
But IMO, the existing if and else-if are good. Having said that, it's
up to the committer which one they think is better in this case.


Either works at least for me. Also for now I have no strong opinion
to change the condition so that it uses switch().



You could also write:
| Assert(behavior == DROP_RESTRICT || behavior == DROP_CASCADE)


IMO, we don't need to assert on behavior, as we just carry that
variable from ExecuteTruncateGuts to postgresExecForeignTruncate
without any modifications. And ExecuteTruncateGuts gets it from
the parser, so there is no way it will have a different value than
DROP_RESTRICT or DROP_CASCADE.


Also, you currently test:

+ if (extra & TRUNCATE_REL_CONTEXT_ONLY)


but TRUNCATE_REL_ aren't independent bits, so shouldn't be tested with "&".


You're right.



Yeah, this is an issue. We could just change the #defines to values
0x0001, 0x0002, 0x0004, 0x0008 ... 0x0020 and so on, and then testing
with & would work. This way, more than one option can be multiplexed
into the same int value. To multiplex, we need to think: will there be
a scenario where a single rel in the truncate can have multiple
options at a time, and do we want to distinguish these options while
deparsing?

#define TRUNCATE_REL_CONTEXT_NORMAL     0x0001  /* specified without ONLY clause */
#define TRUNCATE_REL_CONTEXT_ONLY       0x0002  /* specified with ONLY clause */
#define TRUNCATE_REL_CONTEXT_CASCADING  0x0004  /* not specified but truncated */

And I'm not sure what the agreement is on retaining or removing #define
values; currently I see only TRUNCATE_REL_CONTEXT_ONLY being used, the
others are just set but never read. As I said upthread, it would be
good to remove the unused macros/enums and retain only the ones that
are used. Especially TRUNCATE_REL_CONTEXT_CASCADING does not seem
required, because we can add the child partitions that are foreign
tables to relids just like normal foreign tables, with the
TRUNCATE_REL_CONTEXT_ONLY option.


Our current consensus seems to be that TRUNCATE_REL_CONTEXT_NORMAL and
TRUNCATE_REL_CONTEXT_CASCADING should be removed because they are not used.
Since Kaigai-san wants to remove the extra information altogether,
I guess he also agrees to remove both TRUNCATE_REL_CONTEXT_NORMAL
and _CASCADING. If that's right, we should apply the 0003 patch and remove
those two macro values. Or we should make the extra info a boolean value
instead of an int, i.e., indicating whether ONLY was specified or not.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Recall: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
Shi, Yu/侍 雨 would like to recall the message "Could you help testing logical replication?".


Re: Replication slot stats misgivings

2021-04-12 Thread Masahiko Sawada
On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila  wrote:
>
> On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  wrote:
> >
> > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  wrote:
> > >
> > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > >
> > > > > It seems Vignesh has changed patches based on the latest set of
> > > > > comments so you might want to rebase.
> > > >
> > > > I've merged my patch into the v6 patch set Vignesh submitted.
> > > >
> > > > I've attached the updated version of the patches. I didn't change
> > > > anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> > > > patch) and patches that add tests.
> > > >
> > >
> > > I think we can push 0001. What do you think?
> >
> > +1
> >
> > >
> > > > In 0003 patch I reordered the
> > > > output parameters of pg_stat_replication_slots; showing total number
> > > > of transactions and total bytes followed by statistics for spilled and
> > > > streamed transactions seems appropriate to me.
> > > >
> > >
> > > I am not sure about this because I think we might want to add some
> > > info of stream/spill bytes in total_bytes description (something like
> > > stream/spill bytes are not in addition to total_bytes).

BTW, doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? Users would need to compute "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right? The doc says "Amount of
decoded transactions data sent to the decoding output plugin while
decoding the changes from WAL for this slot", but I think we also send
decoded data that had been spilled to the decoding output plugin.

> >
> > Okay.
> >
> > > So probably
> > > keeping these new counters at the end makes more sense to me.
> >
> > But I think all of those counters are new for users since
> > pg_stat_replication_slots view will be introduced to PG14, no?
> >
>
> Right, I was referring to total_txns and total_bytes attributes. I
> think keeping them at end after spill and stream counters should be
> okay.

Okay, understood.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: allow partial union-all and improve parallel subquery costing

2021-04-12 Thread Luc Vlaming

Hi David,

On 15-03-2021 14:09, David Steele wrote:

Hi Luc,

On 12/30/20 8:54 AM, Luc Vlaming wrote:


Created a commitfest entry assuming this is the right thing to do so 
that someone can potentially pick it up during the commitfest.


Providing an updated patch based on latest master.


Looks like you need another rebase: 
http://cfbot.cputube.org/patch_32_2787.log. Marked as Waiting for Author.


You may also want to give a more detailed description of what you have 
done here and why it improves execution plans. This may help draw some 
reviewers.


Regards,


Here's an improved and rebased patch. Hope the description helps some 
people. I will resubmit it to the next commitfest.


Regards,
Luc
>From e918e7cf8c9fe628c7daba2ccf37ad767691e4c7 Mon Sep 17 00:00:00 2001
From: Luc Vlaming 
Date: Mon, 12 Apr 2021 09:55:30 +0200
Subject: [PATCH v4] Add explicit partial UNION ALL path and improve parallel
 subquery rowcounts and costing.

By adding the partial union-all path we get parallel plans whenever
flatten_simple_union_all cannot be applied, e.g. whenever
the column types do not exactly match. A simple testcase shows
this in the regression tests.
Also, for e.g. TPC-DS query 5 we now get a more parallel plan
for the part that processes the csr CTE.
To make it more likely that the improved path is chosen, another
small fix is added which corrects the rowcounts when subquery
nodes are used in parallel plans.
---
 src/backend/optimizer/path/costsize.c | 11 
 src/backend/optimizer/prep/prepunion.c|  4 ++
 .../regress/expected/incremental_sort.out | 10 ++--
 src/test/regress/expected/union.out   | 52 +++
 src/test/regress/sql/union.sql| 37 +
 5 files changed, 108 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8577c7b138..1da6879c6d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1426,6 +1426,17 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 	startup_cost += path->path.pathtarget->cost.startup;
 	run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
 
+	/* Adjust costing for parallelism, if used. */
+	if (path->path.parallel_workers > 0)
+	{
+		double  parallel_divisor = get_parallel_divisor(&path->path);
+
+		path->path.rows = clamp_row_est(path->path.rows / parallel_divisor);
+
+		/* The CPU cost is divided among all the workers. */
+		run_cost /= parallel_divisor;
+	}
+
 	path->path.startup_cost += startup_cost;
 	path->path.total_cost += startup_cost + run_cost;
 }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 037dfaacfd..7d4a6a19c2 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -679,6 +679,10 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 			   NIL, NULL,
 			   parallel_workers, enable_parallel_append,
 			   -1);
+
+		if (op->all && enable_parallel_append)
+			add_partial_path(result_rel, ppath);
+
 		ppath = (Path *)
 			create_gather_path(root, result_rel, ppath,
 			   result_rel->reltarget, NULL, NULL);
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index a417b566d9..a0a31ba053 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1487,14 +1487,12 @@ explain (costs off) select * from t union select * from t order by 1,3;
->  Unique
  ->  Sort
Sort Key: t.a, t.b, t.c
-   ->  Append
- ->  Gather
-   Workers Planned: 2
+   ->  Gather
+ Workers Planned: 2
+ ->  Parallel Append
->  Parallel Seq Scan on t
- ->  Gather
-   Workers Planned: 2
->  Parallel Seq Scan on t t_1
-(13 rows)
+(11 rows)
 
 -- Full sort, not just incremental sort can be pushed below a gather merge path
 -- by generate_useful_gather_paths.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 75f78db8f5..cf7660f524 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1420,3 +1420,55 @@ where (x = 0) or (q1 >= q2 and q1 <= q2);
  4567890123456789 |  4567890123456789 | 1
 (6 rows)
 
+-- Test handling of appendrel with different types which disables the path flattening and
+-- forces a subquery node. for the subquery node ensure the rowcounts are correct.
+create function check_estimated_rows(text) returns table (estimated int)
+language plpgsql as
+$$
+declare
+ln text;
+tmp text[];
+first_row bool := true;
+begin
+for ln in
+execute format('explain %s', $1)
+loop
+tmp := regexp_ma

Re: Lazy JIT IR code generation to increase JIT speed with partitions

2021-04-12 Thread Luc Vlaming

On 18-01-2021 08:47, Luc Vlaming wrote:

Hi everyone, Andres,

On 03-01-2021 11:05, Luc Vlaming wrote:

On 30-12-2020 14:23, Luc Vlaming wrote:

On 30-12-2020 02:57, Andres Freund wrote:

Hi,

Great to see work in this area!


I would like this topic to progress somehow and was wondering what other 
benchmarks / tests would be needed to move it forward. I've so far 
provided benchmarks for small(ish) queries and some TPC-H numbers, 
assuming those would be enough.




On 2020-12-28 09:44:26 +0100, Luc Vlaming wrote:
I would like to propose a small patch to the JIT machinery which 
makes the
IR code generation lazy. The reason for postponing the generation 
of the IR

code is that with partitions we get an explosion in the number of JIT
functions generated as many child tables are involved, each with 
their own
JITted functions, especially when e.g. partition-aware 
joins/aggregates are
enabled. However, only a fraction of those functions is actually 
executed
because the Parallel Append node distributes the workers among the 
nodes.
With the attached patch we get a lazy generation which makes that 
this is no

longer a problem.


I unfortunately don't think this is quite good enough, because it'll
lead to emitting all functions separately, which can also lead to very
substantial increases of the required time (as emitting code is an
expensive step). Obviously that is only relevant in the cases where the
generated functions actually end up being used - which isn't the 
case in

your example.

If you e.g. look at a query like
   SELECT blub, count(*),sum(zap) FROM foo WHERE blarg = 3 GROUP BY 
blub;

on a table without indexes, you would end up with functions for

- WHERE clause (including deforming)
- projection (including deforming)
- grouping key
- aggregate transition
- aggregate result projection

with your patch each of these would be emitted separately, instead of
one go. Which IIRC increases the required time by a significant amount,
especially if inlining is done (where each separate code generation 
ends

up with copies of the inlined code).


As far as I can see you've basically falsified the second part of this
comment (which you moved):


+
+    /*
+ * Don't immediately emit nor actually generate the function.
+ * instead do so the first time the expression is actually 
evaluated.
+ * That allows to emit a lot of functions together, avoiding a 
lot of
+ * repeated llvm and memory remapping overhead. It also helps 
with not
+ * compiling functions that will never be evaluated, as can be 
the case
+ * if e.g. a parallel append node is distributing workers 
between its

+ * child nodes.
+ */



-    /*
- * Don't immediately emit function, instead do so the first 
time the

- * expression is actually evaluated. That allows to emit a lot of
- * functions together, avoiding a lot of repeated llvm and memory
- * remapping overhead.
- */


Greetings,

Andres Freund



Hi,

Happy to help out, and thanks for the info and suggestions.
Also, I should have first searched pgsql-hackers and the like, as I 
just found out there are already discussions about this in [1] and [2].
However, I think the approach I took can be taken independently, and 
other solutions could be added on top.


Assuming I understood all suggestions correctly, the ideas so far are:
1. add a LLVMAddMergeFunctionsPass so that duplicate code is removed 
and not optimized several times (see [1]). Requires all code to be 
emitted in the same module.

2. JIT only parts of the plan, based on cost (see [2]).
3. Cache compilation results to avoid recompilation. This would 
either need a shm-capable optimized IR cache or would not work with 
parallel workers.

4. Lazily jitting (this patch)

An idea that might not have been presented on the mailing list yet(?):
5. Only JIT nodes that process a certain number of rows, assuming 
there is a constant overhead for JITting and the goal is to gain 
runtime.


Going forward I would first try to see if my current approach can 
work out. The only idea that would be counterproductive to my 
solution would be solution 1. Afterwards I'd like to continue with 
either solution 2, 5, or 3 in the hopes that we can reduce JIT 
overhead to a minimum and can therefore apply it more broadly.


To test out why and where the JIT performance decreased with my 
solution I improved the test script and added various queries to 
model some of the cases I think we should care about. I have not 
(yet) done big scale benchmarks as these queries seemed to already 
show enough problems for now. Now there are 4 queries which test 
JITting with/without partitions, and with varying amounts of workers 
and rowcounts. I hope these are indeed a somewhat representative set 
of queries.


As pointed out the current patch does create a degradation in 
performance wrt queries that are not partitioned (basically q3 and 
q4). After looking into those queries I noticed two things:
- 

Re: Replication slot stats misgivings

2021-04-12 Thread vignesh C
On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila  wrote:
>
> On Sat, Apr 10, 2021 at 6:51 PM vignesh C  wrote:
> >
>
> Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.
>
> 
> +total_bytesbigint
> +   
> +   
> +Amount of decoded transactions data sent to the decoding output 
> plugin
> +while decoding the changes from WAL for this slot. This can be used 
> to
> +gauge the total amount of data sent during logical decoding.
>
> Can we slightly extend it to say something like: Note that this
> includes the bytes streamed and or spilled. Similarly, we can extend
> it for total_txns.
>

Thanks for the comments; they are addressed in the attached v8 patch.
Thoughts?

Regards,
Vignesh
From b3288d575d9b8c26381fa8da773960d16ae2a1f3 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v8 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c   | 32 +++
 src/backend/replication/logical/logical.c | 13 ++---
 src/backend/replication/slot.c|  7 -
 src/backend/utils/adt/pgstatfuncs.c   |  2 +-
 src/include/pgstat.h  | 11 +++-
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * --
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-	   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-	   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-	   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlot

"could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-12 Thread Luc Vlaming

Hi,

When trying to run TPC-DS queries 94, 95 and 96 on a SF10 on master 
(but afaik also PG 13) I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away, and then the 
plan for query 94 looks like below. I tried figuring out what the 
problem is, but to be honest I would need some pointers, as the code that 
tries to match equivalence members in prepare_sort_from_pathkeys is 
something I'm really not familiar with.


To reproduce you can either ingest and test using the toolkit I used too 
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or 
alternatively just use the schema (see 
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)


Best,
Luc


 Limit  (cost=229655.62..229655.63 rows=1 width=72)
   ->  Sort  (cost=229655.62..229655.63 rows=1 width=72)
 Sort Key: (count(DISTINCT ws1.ws_order_number))
 ->  Aggregate  (cost=229655.60..229655.61 rows=1 width=72)
   ->  Nested Loop Semi Join  (cost=1012.65..229655.59 
rows=1 width=16)
 ->  Nested Loop  (cost=1012.22..229653.73 rows=1 
width=20)
   Join Filter: (ws1.ws_web_site_sk = 
web_site.web_site_sk)
   ->  Nested Loop  (cost=1012.22..229651.08 
rows=1 width=24)
 ->  Gather  (cost=1011.80..229650.64 
rows=1 width=28)

   Workers Planned: 2
   ->  Nested Loop Anti Join 
(cost=11.80..228650.54 rows=1 width=28)
 ->  Hash Join 
(cost=11.37..227438.35 rows=2629 width=28)
   Hash Cond: 
(ws1.ws_ship_date_sk = date_dim.d_date_sk)
   ->  Parallel Seq 
Scan on web_sales ws1  (cost=0.00..219548.92 rows=3000992 width=32)
   ->  Hash 
(cost=10.57..10.57 rows=64 width=4)
 ->  Index Scan 
using idx_d_date on date_dim  (cost=0.29..10.57 rows=64 width=4)
   Index 
Cond: ((d_date >= '2000-03-01'::date) AND (d_date <= '2000-04-30'::date))
 ->  Index Only Scan using 
idx_wr_order_number on web_returns wr1  (cost=0.42..0.46 rows=2 width=4)
   Index Cond: 
(wr_order_number = ws1.ws_order_number)
 ->  Index Scan using 
customer_address_pkey on customer_address  (cost=0.42..0.44 rows=1 width=4)
   Index Cond: (ca_address_sk = 
ws1.ws_ship_addr_sk)
   Filter: ((ca_state)::text = 
'GA'::text)
   ->  Seq Scan on web_site  (cost=0.00..2.52 
rows=10 width=4)
 Filter: ((web_company_name)::text = 
'pri'::text)
 ->  Index Scan using idx_ws_order_number on 
web_sales ws2  (cost=0.43..1.84 rows=59 width=8)
   Index Cond: (ws_order_number = 
ws1.ws_order_number)

   Filter: (ws1.ws_warehouse_sk <> ws_warehouse_sk)

The top of the stacktrace is:
#0  errfinish (filename=0x5562dc1a5125 "createplan.c", lineno=6186, 
funcname=0x5562dc1a54d0 <__func__.14> "prepare_sort_from_pathkeys") at 
elog.c:514
#1  0x5562dbc2d7de in prepare_sort_from_pathkeys 
(lefttree=0x5562dc5a2f58, pathkeys=0x5562dc4eabc8, relids=0x0, 
reqColIdx=0x0, adjust_tlist_in_place=<optimized out>, 
p_numsortkeys=0x7ffc0b8cda84, p_sortColIdx=0x7ffc0b8cda88, 
p_sortOperators=0x7ffc0b8cda90, p_collations=0x7ffc0b8cda98, 
p_nullsFirst=0x7ffc0b8cdaa0) at createplan.c:6186
#2  0x5562dbe8d695 in make_sort_from_pathkeys (lefttree=<optimized out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6313
#3  0x5562dbe8eba3 in create_sort_plan (flags=1, 
best_path=0x5562dc548d68, root=0x5562dc508cf8) at createplan.c:2118
#4  create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc548d68, 
flags=1) at createplan.c:489
#5  0x5562dbe8f315 in create_gather_merge_plan 
(best_path=0x5562dc5782f8, root=0x5562dc508cf8) at createplan.c:1885
#6  create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5782f8, 
flags=<optimized out>) at createplan.c:541
#7  0x5562dbe8ddad in create_nestloop_plan 
(best_path=0x5562dc585668, root=0x5562dc508cf8) at createplan.c:4237
#8  create_join_plan (best_path=0x5562dc585668, root=0x5562dc508cf8) at 
createplan.c:1062
#9  create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc585668, 
flags=<optimized out>) at createplan.c:418
#10 0x5562dbe8ddad in create_nestloop_plan 
(best_path=0x5562dc5c4428, root=0x5562dc508cf8) at createplan.c:4237
#11 create_join_plan (best_path=0x5562dc5c4428, roo

interaction between csps with dummy tlists and set_customscan_references

2021-04-12 Thread Luc Vlaming

Hi,

While developing a CSP that potentially sits (directly) above e.g. any 
union or anything else with a dummy tlist, we observed some problems: 
set_customscan_references cannot handle dummy tlists and will give 
invalid-varno errors. I was wondering how we can fix this, and also 
what the reason is that there is no callback in the CSP interface 
for set_customscan_references. Can someone maybe 
clarify this for me?


Thanks!

Regards,
Luc




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-12 Thread Amit Langote
On Mon, Apr 12, 2021 at 4:42 PM Amit Langote  wrote:
> On Mon, Apr 12, 2021 at 6:20 AM Alvaro Herrera  
> wrote:
> > On 2021-Mar-31, Tom Lane wrote:
> >
> > > diff -U3 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > >  
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > > --- 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > >   2021-03-29 20:14:21.258199311 +0200
> > > +++ 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > > 2021-03-30 18:54:34.272839428 +0200
> > > @@ -324,6 +324,7 @@
> > >  1
> > >  2
> > >  step s1insert: insert into d4_fk values (1);
> > > +ERROR:  insert or update on table "d4_fk" violates foreign key 
> > > constraint "d4_fk_a_fkey"
> > >  step s1c: commit;
> > >
> > >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > > s1insert s1c
> >
> > Hmm, actually, looking at this closely, I think the expected output is
> > bogus and trilobite is doing the right thing by throwing this error
> > here.  The real question is why isn't this case behaving in that way in
> > every *other* animal.
>
> Indeed.
>
> I can see a wrong behavior of RelationGetPartitionDesc() in a case
> that resembles the above test case.
>
> drop table if exists d4_primary, d4_primary1, d4_fk, d4_pid;
> create table d4_primary (a int primary key) partition by list (a);
> create table d4_primary1 partition of d4_primary for values in (1);
> create table d4_primary2 partition of d4_primary for values in (2);
> insert into d4_primary values (1);
> insert into d4_primary values (2);
> create table d4_fk (a int references d4_primary);
> insert into d4_fk values (2);
> create table d4_pid (pid int);
>
> Session 1:
> begin isolation level serializable;
> select * from d4_primary;
>  a
> ---
>  1
>  2
> (2 rows)
>
> Session 2:
> alter table d4_primary detach partition d4_primary1 concurrently;
> 
>
> Session 1:
> -- can see d4_primary1 as detached at this point, though still scans!
> select * from d4_primary;
>  a
> ---
>  1
>  2
> (2 rows)
> insert into d4_fk values (1);
> INSERT 0 1
> end;
>
> Session 2:
> 
> ALTER TABLE
>
> Session 1:
> -- FK violated
> select * from d4_primary;
>  a
> ---
>  2
> (1 row)
> select * from d4_fk;
>  a
> ---
>  1
> (1 row)
>
> The 2nd select that session 1 performs adds d4_primary1, whose detach
> it *sees* is pending, to the PartitionDesc, but without setting its
> includes_detached.  The subsequent insert's RI query, because it uses
> that PartitionDesc as-is, returns 1 as being present in d4_primary,
> leading to the insert succeeding.  When session 1's transaction
> commits, the waiting ALTER proceeds with committing the 2nd part of
> the DETACH, without having a chance again to check if the DETACH would
> lead to the foreign key of d4_fk being violated.
>
> It seems problematic to me that the logic of setting includes_detached
> is oblivious to the special check in find_inheritance_children() that
> decides whether to "force"-include a detach-pending child based on
> cross-checking its pg_inherits tuple's xmin against the active
> snapshot.  Maybe fixing that would help, although I haven't tried that
> myself yet.

I tried that in the attached.  It is indeed the above failing
isolation test whose output needed to be adjusted.

While at it, I tried rewording the comment around that special
visibility check done to force-include detached partitions, as I got
confused by the way it's worded currently.  Actually, it may be a good
idea to add some comments around the intended include_detached
behavior in the places where PartitionDesc is used; e.g.
set_relation_partition_info() lacks a one-liner on why it's okay for
the planner to not see detached partitions.  Or perhaps, a comment for
includes_detached of PartitionDesc should describe the various cases
in which it is true and the cases in which it is not.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


PartitionDesc-includes_detached-thinko-fix.patch
Description: Binary data


Re: TRUNCATE on foreign table

2021-04-12 Thread Fujii Masao




On 2021/04/09 23:10, Bharath Rupireddy wrote:

On Fri, Apr 9, 2021 at 7:06 PM Fujii Masao  wrote:

  > 4. Tab-completion for TRUNCATE should be updated so that also foreign 
tables are displayed.

 It will be good to have.


Patch attached.


Tab completion patch LGTM and it works as expected.


Thanks for the review! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Replication slot stats misgivings

2021-04-12 Thread Amit Kapila
On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > It seems Vignesh has changed patches based on the latest set of
> > > > > > comments so you might want to rebase.
> > > > >
> > > > > I've merged my patch into the v6 patch set Vignesh submitted.
> > > > >
> > > > > I've attached the updated version of the patches. I didn't change
> > > > > anything in the patch that changes char[NAMEDATALEN] to NameData (0001
> > > > > patch) and patches that add tests.
> > > > >
> > > >
> > > > I think we can push 0001. What do you think?
> > >
> > > +1
> > >
> > > >
> > > > > In 0003 patch I reordered the
> > > > > output parameters of pg_stat_replication_slots; showing total number
> > > > > of transactions and total bytes followed by statistics for spilled and
> > > > > streamed transactions seems appropriate to me.
> > > > >
> > > >
> > > > I am not sure about this because I think we might want to add some
> > > > info of stream/spill bytes in total_bytes description (something like
> > > > stream/spill bytes are not in addition to total_bytes).
>
> BTW doesn't it confuse users that stream/spill bytes are not in
> addition to total_bytes? User will need to do "total_bytes +
> spill/stream_bytes" to know the actual total amount of data sent to
> the decoding output plugin, is that right?
>

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute the total bytes sent to the
output plugin.

-- 
With Regards,
Amit Kapila.




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-12 Thread Tomas Vondra
On 4/12/21 2:24 PM, Luc Vlaming wrote:
> Hi,
> 
> When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
> 95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
> When I disable enable_gathermerge the problem goes away and then the
> plan for query 94 looks like below. I tried figuring out what the
> problem is but to be honest I would need some pointers as the code that
> tries to matching equivalence members in prepare_sort_from_pathkeys is
> something i'm really not familiar with.
> 

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

> To reproduce you can either ingest and test using the toolkit I used too
> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> alternatively just use the schema (see
> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
> 

Thanks, I'll see if I can reproduce that with your schema.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: multi-install PostgresNode fails with older postgres versions

2021-04-12 Thread Jehan-Guillaume de Rorthais
Hi,

On Wed, 7 Apr 2021 20:07:41 +0200
Jehan-Guillaume de Rorthais  wrote:
[...]
> > > Let me know if it worth that I work on an official patch.
> > 
> > Let's give it a try ...  
> 
> OK

So, as promised, here is my attempt to port my previous work onto the
PostgreSQL source tree.

Make check passes with no errors. The new class probably deserves some TAP
tests of its own.

Note that I added a PostgresVersion class for easier and cleaner version
comparisons. This could be an interesting takeaway no matter what.

I still have some more ideas to clean up, revamp and extend the base class, but
I prefer to go incremental to keep things reviewable.

Thanks,
>From 077e447995b3b8bb4c0594a6616e1350acd9d48b Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Mon, 12 Apr 2021 14:45:17 +0200
Subject: [PATCH] Draft: Makes PostgresNode multi-version aware

---
 src/test/perl/PostgresNode.pm| 708 ---
 src/test/perl/PostgresVersion.pm |  65 +++
 2 files changed, 704 insertions(+), 69 deletions(-)
 create mode 100644 src/test/perl/PostgresVersion.pm

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e26b2b3f30..d7a9e54efd 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -102,6 +102,7 @@ use Test::More;
 use TestLib ();
 use Time::HiRes qw(usleep);
 use Scalar::Util qw(blessed);
+use PostgresVersion;
 
 our @EXPORT = qw(
   get_new_node
@@ -376,9 +377,42 @@ sub dump_info
 	return;
 }
 
+# Internal routine to allow streaming replication on a primary node.
+sub allows_streaming
+{
+	my ($self, $allows_streaming) = @_;
+	my $pgdata = $self->data_dir;
+	my $wal_level;
+
+	if ($allows_streaming eq "logical")
+	{
+		$wal_level = "logical";
+	}
+	else
+	{
+		$wal_level = "replica";
+	}
+
+	$self->append_conf( 'postgresql.conf', qq{
+		wal_level = $wal_level
+		max_wal_senders = 10
+		max_replication_slots = 10
+		wal_log_hints = on
+		hot_standby = on
+		# conservative settings to ensure we can run multiple postmasters:
+		shared_buffers = 1MB
+		max_connections = 10
+		# limit disk space consumption, too:
+		max_wal_size = 128MB
+	});
 
-# Internal method to set up trusted pg_hba.conf for replication.  Not
-# documented because you shouldn't use it, it's called automatically if needed.
+	$self->set_replication_conf;
+
+	return;
+}
+
+# Internal method to set up trusted pg_hba.conf for replication.  Not documented
+# because you shouldn't use it, it's called automatically if needed.
 sub set_replication_conf
 {
 	my ($self) = @_;
@@ -398,6 +432,15 @@ sub set_replication_conf
 	return;
 }
 
+# Internal methods to track capabilities depending on the PostgreSQL major
+# version used
+
+sub can_slots   { return 1 }
+sub can_log_hints   { return 1 }
+sub can_restart_after_crash { return 1 }
+sub can_skip_init_fsync { return 1 }
+
+
 =pod
 
 =item $node->init(...)
@@ -429,6 +472,7 @@ sub init
 	my $port   = $self->port;
 	my $pgdata = $self->data_dir;
 	my $host   = $self->host;
+	my @cmd;
 
 	local %ENV = $self->_get_env();
 
@@ -438,19 +482,25 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
+	@cmd = ( 'initdb', '-D', $pgdata, '-A', 'trust' );
+	push @cmd, '-N' if $self->can_skip_init_fsync;
+	push @cmd, @{ $params{extra} } if defined $params{extra};
+
+	TestLib::system_or_bail(@cmd);
 	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata,
 		@{ $params{auth_extra} });
 
 	open my $conf, '>>', "$pgdata/postgresql.conf";
 	print $conf "\n# Added by PostgresNode.pm\n";
 	print $conf "fsync = off\n";
-	print $conf "restart_after_crash = off\n";
+	print $conf "restart_after_crash = off\n" if $self->can_restart_after_crash;
 	print $conf "log_line_prefix = '%m [%p] %q%a '\n";
 	print $conf "log_statement = all\n";
-	print $conf "log_replication_commands = on\n";
-	print $conf "wal_retrieve_retry_interval = '500ms'\n";
+	if ($self->version >= '9.5.0')
+	{
+		print $conf "log_replication_commands = on\n";
+		print $conf "wal_retrieve_retry_interval = '500ms'\n";
+	}
 
 	# If a setting tends to affect whether tests pass or fail, print it after
 	# TEMP_CONFIG.  Otherwise, print it before TEMP_CONFIG, thereby permitting
@@ -463,31 +513,8 @@ sub init
 	# concurrently must not share a stats_temp_directory.
 	print $conf "stats_temp_directory = 'pg_stat_tmp'\n";
 
-	if ($params{allows_streaming})
-	{
-		if ($params{allows_streaming} eq "logical")
-		{
-			print $conf "wal_level = logical\n";
-		}
-		else
-		{
-			print $conf "wal_level = replica\n";
-		}
-		print $conf "max_wal_senders = 10\n";
-		print $conf "max_replication_slots = 10\n";
-		print $conf "wal_log_hints = on\n";
-		print $conf "hot_standby = on\n";
-		# conservative settings to ensure we can run multiple postmasters:
-		print $conf "shared_buffers = 1MB\n";
-		print $conf "max_connections = 10\n";
-		# limit disk space consumption, to

pg_upgrade check for invalid role-specific default config

2021-04-12 Thread Charlie Hornsby
Hi all,

While troubleshooting a failed upgrade from v11 -> v12 I realised I had
encountered a bug previously reported on the pgsql-bugs mailing list:

#14242 Role with a setconfig "role" setting to a nonexistent role causes
pg_upgrade to fail

https://www.postgresql.org/message-id/20160711223641.1426.86096%40wrigleys.postgresql.org

To quote the previous report:

> It is possible to modify the "role" setting in setconfig in the
> pg_db_role_setting table such that it points to a nonexistent role.  When
> this is the case, restoring the output of pg_dumpall will fail due to the
> missing role.

> Steps to reproduce:

> 1. As superuser, execute "create role foo with login password 'test'"
> 2. As foo, execute "alter role foo set role = 'foo'"
> 3. As superuser, execute "alter role foo rename to bar"
> a. At this point, the setconfig entry in pg_db_role_setting for
> 'bar' will contain '{role=foo}', which no longer exists
> 4. Execute pg_upgrade with the recommended steps in
> https://www.postgresql.org/docs/current/static/pgupgrade.html

> During pg_upgrade (more specifically, during the restore of the output from
> pg_dumpall), the "ALTER ROLE "bar" SET "role" TO 'foo'" command generated
> will fail with "ERROR: role "foo" does not exist".

> This issue was identified by Jordan Lange and Nathan Bossart.

The steps in the original report reproduce the problem on all currently
supported pg versions. I appreciate that the invalid role-specific default 
settings are ultimately self-inflicted by the user, but as a third-party 
performing the upgrade this caught me by surprise.

Since it is possible to write a query to identify these cases, would there
be appetite for me to submit a patch to add a check for this to 
pg_upgrade?

First time mailing list user here so many apologies for any missteps I have
made in this message.

Best regards,
Charlie Hornsby



Re: Replication slot stats misgivings

2021-04-12 Thread Masahiko Sawada
On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila  wrote:
>
> On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada  wrote:
> >
> > On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila  wrote:
> > >
> > > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > > It seems Vignesh has changed patches based on the latest set of
> > > > > > > comments so you might want to rebase.
> > > > > >
> > > > > > I've merged my patch into the v6 patch set Vignesh submitted.
> > > > > >
> > > > > > I've attached the updated version of the patches. I didn't change
> > > > > > anything in the patch that changes char[NAMEDATALEN] to NameData 
> > > > > > (0001
> > > > > > patch) and patches that add tests.
> > > > > >
> > > > >
> > > > > I think we can push 0001. What do you think?
> > > >
> > > > +1
> > > >
> > > > >
> > > > > > In 0003 patch I reordered the
> > > > > > output parameters of pg_stat_replication_slots; showing total number
> > > > > > of transactions and total bytes followed by statistics for spilled 
> > > > > > and
> > > > > > streamed transactions seems appropriate to me.
> > > > > >
> > > > >
> > > > > I am not sure about this because I think we might want to add some
> > > > > info of stream/spill bytes in total_bytes description (something like
> > > > > stream/spill bytes are not in addition to total_bytes).
> >
> > BTW doesn't it confuse users that stream/spill bytes are not in
> > addition to total_bytes? User will need to do "total_bytes +
> > spill/stream_bytes" to know the actual total amount of data sent to
> > the decoding output plugin, is that right?
> >
>
> No, total_bytes includes the spill/stream bytes. So, the user doesn't
> need to do any calculation to compute the total_bytes sent to the
> output plugin.

The following test with the latest v8 patch seems to show otherwise:
total_bytes is 1808 whereas spill_bytes is 1320. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 10);
INSERT 0 10
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
 slot_name | total_txns | total_bytes | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
 s         |          0 |           0 |          0 |           0 |           0 |           0 |            0 |            0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
 count
-------
    14
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
 slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
---++-++-+-+-+--+--+-
 s |  2 |1808 |  1 | 202 |
1320 |   0 |0 |0 |
(1 row)

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: multi-install PostgresNode fails with older postgres versions

2021-04-12 Thread Andrew Dunstan


On 4/12/21 8:59 AM, Jehan-Guillaume de Rorthais wrote:
> Hi,
>
> On Wed, 7 Apr 2021 20:07:41 +0200
> Jehan-Guillaume de Rorthais  wrote:
> [...]
 Let me know if it worth that I work on an official patch.
>>> Let's give it a try ...  
>> OK
> So, as promised, here is my attempt to port my previous work onto the
> PostgreSQL source tree.
>
> Make check passes with no errors. The new class probably deserves some TAP
> tests of its own.
>
> Note that I added a PostgresVersion class for easier and cleaner version
> comparisons. This could be an interesting takeaway no matter what.
>
> I still have some more ideas to clean up, revamp and extend the base class, but
> I prefer to go incremental to keep things reviewable.
>

Thanks for this. We have been working somewhat along parallel lines. With
your permission I'm going to take some of what you've done and
incorporate it into the patch I'm working on.


A PostgresVersion class is a good idea - I was already contemplating
something of the kind.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





[GSoC 2021 proposal] pl/julia extension

2021-04-12 Thread Konstantina Skovola
Hello community,

I’m Konstantina, a GSoC candidate for the project “Create Procedural
language extension for the Julia programming language”. The mentors have
already looked at my proposal and I’m attaching the finalized document.
There is still some time for corrections, in case anyone would like to
offer an opinion.

Best regards,

Konstantina Skovola


Konstantina Skovola pl_julia proposal.pdf
Description: Adobe PDF document


Re: psql - add SHOW_ALL_RESULTS option

2021-04-12 Thread Fabien COELHO



Hello Tom,


> It's right: this is dead code because all paths through the if-nest
> starting at line 1373 now leave results = NULL.  Hence, this patch
> has broken the autocommit logic;

Do you mean yet another feature without a single non-regression test? :-(

I tend to rely on non-regression tests to catch bugs in complex
multi-purpose hard-to-maintain functions when the code is modified.

I have submitted a patch to improve psql coverage to about 90%, but given
the lack of enthusiasm, I simply dropped it. Not sure I was right not
to insist.

> it's no longer possible to tell whether we should do anything with our
> savepoint.

> Between this and the known breakage of control-C, it seems clear
> to me that this patch was nowhere near ready for prime time.
> I think shoving it in on the last day before feature freeze was
> ill-advised, and it ought to be reverted.  We can try again later.

The patch has been asleep for quite a while, and was resurrected, possibly
too late in the process. ISTM that fixing it for 14 is manageable,
but this is not my call.


--
Fabien.




Re: multi-install PostgresNode fails with older postgres versions

2021-04-12 Thread Jehan-Guillaume de Rorthais
On Mon, 12 Apr 2021 09:52:24 -0400
Andrew Dunstan  wrote:

> On 4/12/21 8:59 AM, Jehan-Guillaume de Rorthais wrote:
> > Hi,
> >
> > On Wed, 7 Apr 2021 20:07:41 +0200
> > Jehan-Guillaume de Rorthais  wrote:
> > [...]  
>  Let me know if it worth that I work on an official patch.  
> >>> Let's give it a try ...
> >> OK  
> > So, as promised, here is my attempt to port my previous work onto the
> > PostgreSQL source tree.
> >
> > Make check passes with no errors. The new class probably deserves some TAP
> > tests of its own.
> >
> > Note that I added a PostgresVersion class for easier and cleaner version
> > comparisons. This could be an interesting takeaway no matter what.
> >
> > I still have some more ideas to clean up, revamp and extend the base class,
> > but I prefer to go incremental to keep things reviewable.
> > but I prefer to go incremental to keep things review-ability.
> >  
> 
> Thanks for this. We have been working somewhat on parallel lines. With
> your permission I'm going to take some of what you've done and
> incorporate it in the patch I'm working on.

The current context makes my weeks difficult to plan and quite chaotic, which is
why it took me some days to produce the patch I promised.

I'm fine with working from a common code base, thanks. Feel free to merge both,
and we'll trade patches during review. However, I'm not sure what the status of
your patch is, so I cannot judge which would be the easier way to incorporate
it. Mine is tested down to 9.1 (9.0 was meaningless because it lacks
pg_stat_replication) with:

* get_new_node
* init(allows_streaming => 1)
* start
* stop
* backup
* init_from_backup
* wait_for_catchup
* command_checks_all

Note the various changes in my init(), the new method allows_streaming(), etc.

FYI (to avoid duplicate work), the next step on my todo list was to produce
some meaningful TAP tests to exercise the class.

> A PostgresVersion class is a good idea - I was already contemplating
> something of the kind.

Thanks!

Regards,




Contribution to PostgreSQL - please give an advice

2021-04-12 Thread Ian Zagorskikh
Hi all!

I would like to contribute my time and efforts to the PostgreSQL project
development. I have some [hope not too bad] experience in software
development primarily for Linux/BSD/Windows platforms with C/C++ though
almost no experience in RDBMS internals. I have read the "Development
Information" section in wiki and checked the official TODO list. It's
really great but it's a bit huge and "all is so tasty" so I don't know
where to start. Can you please advise me how to proceed? Maybe there are
active tasks or bugs or issues that require man power? I would appreciate
any advice or mentorship. Thank you!

-- 
Best Regards,
Ian Zagorskikh


Re: SQL/JSON: JSON_TABLE

2021-04-12 Thread Erik Rijkers
> On 2021.03.27. 02:12 Nikita Glukhov  wrote:
> Attached 47th version of the patches.

We're past feature freeze for 14 and alas, JSON_TABLE has not made it.

I have tested it quite a bit and, because I didn't find any problems with
functionality or speed, I wanted to at least mention that here once.

I looked at v47, these files
> [0001-SQL-JSON-functions-v47.patch]
> [0002-JSON_TABLE-v47.patch]
> [0003-JSON_TABLE-PLAN-DEFAULT-clause-v47.patch]
> [0004-JSON_TABLE-PLAN-clause-v47.patch]
> [manual_addition_fixed.patch]  # for this see [1], [2]

   (v47 no longer applies, as cfbot shows, but instances can still be built
on top of 6131ffc43ff from 30 March 2021)

I hope it will fare better next round, version 15.

Thanks,

Erik Rijkers

[1] 
https://www.postgresql.org/message-id/69eefc5a-cabc-8dd3-c689-93da038c0d6a%40postgrespro.ru
[2] 
https://www.postgresql.org/message-id/19181987.22943.1617141503618%40webmailclassic.xs4all.nl


> -- 
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company




Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));

2021-04-12 Thread Yulin PEI
HI hackers,
I found that executing the SQL statement `CREATE VIEW v1(c1) AS (SELECT ('4'
COLLATE "C")::INT FROM generate_series(1, 10));` causes a crash in the
PostgreSQL 13.2 release.

The crash happens at view.c:89 and I did some analysis:

```

ColumnDef  *def = makeColumnDef(tle->resname,
                                exprType((Node *) tle->expr),
                                exprTypmod((Node *) tle->expr),
                                exprCollation((Node *) tle->expr));

/*
 * It's possible that the column is of a collatable type but the
 * collation could not be resolved, so double-check.
 */

/*
 * Analysis, for the example ('4' COLLATE "C")::INT:
 *
 * - exprCollation((Node *) tle->expr) is the OID of COLLATE "C",
 *   so def->collOid is valid.
 * - exprType((Node *) tle->expr) is 23, the OID of type int4.
 * - int4 is not collatable, per type_is_collatable().
 */
if (type_is_collatable(exprType((Node *) tle->expr)))
{
    if (!OidIsValid(def->collOid))
        ereport(ERROR,
                (errcode(ERRCODE_INDETERMINATE_COLLATION),
                 errmsg("could not determine which collation to use for view column \"%s\"",
                        def->colname),
                 errhint("Use the COLLATE clause to set the collation explicitly.")));
}
else
    /* So we are here: int is not collatable, yet def->collOid is valid. */
    Assert(!OidIsValid(def->collOid));

```

I am not sure whether to fix this bug in DefineVirtualRelation or in the
parse/analysis phase, so maybe we can discuss.




Best Regard!
Yulin PEI




Re: Replication slot stats misgivings

2021-04-12 Thread vignesh C
On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > It seems Vignesh has changed patches based on the latest set of
> > > > > > > > comments so you might want to rebase.
> > > > > > >
> > > > > > > I've merged my patch into the v6 patch set Vignesh submitted.
> > > > > > >
> > > > > > > I've attached the updated version of the patches. I didn't change
> > > > > > > anything in the patch that changes char[NAMEDATALEN] to NameData 
> > > > > > > (0001
> > > > > > > patch) and patches that add tests.
> > > > > > >
> > > > > >
> > > > > > I think we can push 0001. What do you think?
> > > > >
> > > > > +1
> > > > >
> > > > > >
> > > > > > > In 0003 patch I reordered the
> > > > > > > output parameters of pg_stat_replication_slots; showing total 
> > > > > > > number
> > > > > > > of transactions and total bytes followed by statistics for 
> > > > > > > spilled and
> > > > > > > streamed transactions seems appropriate to me.
> > > > > > >
> > > > > >
> > > > > > I am not sure about this because I think we might want to add some
> > > > > > info of stream/spill bytes in total_bytes description (something 
> > > > > > like
> > > > > > stream/spill bytes are not in addition to total_bytes).
> > >
> > > BTW doesn't it confuse users that stream/spill bytes are not in
> > > addition to total_bytes? User will need to do "total_bytes +
> > > spill/stream_bytes" to know the actual total amount of data sent to
> > > the decoding output plugin, is that right?
> > >
> >
> > No, total_bytes includes the spill/stream bytes. So, the user doesn't
> > need to do any calculation to compute the total_bytes sent to the
> > output plugin.
>
> The following test with the latest v8 patch seems to show otherwise:
> total_bytes is 1808 whereas spill_bytes is 1320. Am I missing
> something?

I will check this issue and post my analysis.

Regards,
Vignesh




Re: Contribution to PostgreSQL - please give an advice

2021-04-12 Thread Pavel Stehule
Hi

po 12. 4. 2021 v 17:22 odesílatel Ian Zagorskikh  napsal:

> Hi all!
>
> I would like to contribute my time and efforts to the PostgreSQL project
> development. I have some [hope not too bad] experience in software
> development primarily for Linux/BSD/Windows platforms with C/C++ though
> almost no experience in RDBMS internals. I have read the "Development
> Information" section in wiki and checked the official TODO list. It's
> really great but it's a bit huge and "all is so tasty" so I don't know
> where to start. Can you please advise me how to proceed? Maybe there are
> active tasks or bugs or issues that require man power? I would appreciate
> any advice or mentorship. Thank you!
>

That's great - any hands and eyes are welcome.

I think the best way to start now (when you don't have your own topic) is to
review one or more patches from the commitfest application:

https://commitfest.postgresql.org/33/

Regards

Pavel



> --
> Best Regards,
> Ian Zagorskikh
>


Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bruce Momjian
On Sun, Apr 11, 2021 at 07:33:34PM -0700, Zhihong Yu wrote:
> Among the previous examples given by Bryn, the following produces the
> correct result with Bruce's patch.
> 
> # select interval '-1.7 years 29.4 months';
>     interval
> 
>  9 mons 12 days

Yes, that change is caused by the rounding fixes, and not by the unit
pushdown adjustments.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Apr 11, 2021 at 11:16 AM Tom Lane  wrote:
>> It wasn't very clear, because I hadn't thought it through very much;
>> but what I'm imagining is that we discard most of the thrashing around
>> all-visible rechecks and have just one such test somewhere very late
>> in heap_update, after we've successfully acquired a target buffer for
>> the update and are no longer going to possibly need to release any
>> buffer lock.  If at that one point we see the page is all-visible
>> and we don't have the vmbuffer, then we have to release all our locks
>> and go back to "l2".  Which is less efficient than some of the existing
>> code paths, but given how hard this problem is to reproduce, it seems
>> clear that optimizing for the occurrence is just not worth it.

> Oh! That sounds way better.

After poking at this for a while, it seems like it won't work very nicely.
The problem is that once we've invoked the toaster, we really don't want
to just abandon that work; we'd leak any toasted out-of-line data that
was created.

So I think we have to stick with the current basic design, and just
tighten things up to make sure that visibility pins are accounted for
in the places that are missing it.

Hence, I propose the attached.  It passes check-world, but that proves
absolutely nothing of course :-(.  I wonder if there is any way to
exercise these code paths deterministically.

(I have realized BTW that I was exceedingly fortunate to reproduce
the buildfarm report here --- I have run hundreds of additional
cycles of the same test scenario without getting a second failure.)

regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 03d4abc938..265b31e981 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3784,7 +3784,7 @@ l2:
 		 * overhead would be unchanged, that doesn't seem necessarily
 		 * worthwhile.
 		 */
-		if (PageIsAllVisible(BufferGetPage(buffer)) &&
+		if (PageIsAllVisible(page) &&
 			visibilitymap_clear(relation, block, vmbuffer,
 VISIBILITYMAP_ALL_FROZEN))
 			cleared_all_frozen = true;
@@ -3846,36 +3846,46 @@ l2:
 		 * first".  To implement this, we must do RelationGetBufferForTuple
 		 * while not holding the lock on the old page, and we must rely on it
 		 * to get the locks on both pages in the correct order.
+		 *
+		 * Another consideration is that we need visibility map page pin(s) if
+		 * we will have to clear the all-visible flag on either page.  If we
+		 * call RelationGetBufferForTuple, we rely on it to acquire any such
+		 * pins; but if we don't, we have to handle that here.  Hence we need
+		 * a loop.
 		 */
-		if (newtupsize > pagefree)
-		{
-			/* Assume there's no chance to put heaptup on same page. */
-			newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
-			   buffer, 0, NULL,
-			   &vmbuffer_new, &vmbuffer);
-		}
-		else
+		for (;;)
 		{
+			if (newtupsize > pagefree)
+			{
+/* It doesn't fit, must use RelationGetBufferForTuple. */
+newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
+   buffer, 0, NULL,
+   &vmbuffer_new, &vmbuffer);
+/* We're all done. */
+break;
+			}
+			/* Acquire VM page pin if needed and we don't have it. */
+			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
+visibilitymap_pin(relation, block, &vmbuffer);
 			/* Re-acquire the lock on the old tuple's page. */
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 			/* Re-check using the up-to-date free space */
 			pagefree = PageGetHeapFreeSpace(page);
-			if (newtupsize > pagefree)
+			if (newtupsize > pagefree ||
+(vmbuffer == InvalidBuffer && PageIsAllVisible(page)))
 			{
 /*
- * Rats, it doesn't fit anymore.  We must now unlock and
- * relock to avoid deadlock.  Fortunately, this path should
- * seldom be taken.
+ * Rats, it doesn't fit anymore, or somebody just now set the
+ * all-visible flag.  We must now unlock and loop to avoid
+ * deadlock.  Fortunately, this path should seldom be taken.
  */
 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
-   buffer, 0, NULL,
-   &vmbuffer_new, &vmbuffer);
 			}
 			else
 			{
-/* OK, it fits here, so we're done. */
+/* We're all done. */
 newbuf = buffer;
+break;
 			}
 		}
 	}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 37a1be4114..ffc89685bf 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -293,9 +293,13 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
  *	happen if space is freed in that page after heap_update finds there's not
  *	enough there).  In that case, the page will be pinned and locked only once.
  *
- *	For the vmbuffer and vmbuffer_other arguments, we avoid deadlock by
- *	locking them only af

Re: psql - add SHOW_ALL_RESULTS option

2021-04-12 Thread Tom Lane
Fabien COELHO  writes:
>> Between this and the known breakage of control-C, it seems clear
>> to me that this patch was nowhere near ready for prime time.
>> I think shoving it in on the last day before feature freeze was
>> ill-advised, and it ought to be reverted.  We can try again later.

> The patch has been asleep for quite a while, and was resurrected, possibly 
> too late in the process. ISTM that fixing it for 14 is manageable, 
> but this is not my call.

I just observed an additional issue that I assume was introduced by this
patch, which is that psql's response to a server crash has gotten
repetitive:

regression=# CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM 
generate_series(1, 10));
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> 

I've never seen that before, and it's not because I don't see
server crashes regularly.

regards, tom lane




Re: Contribution to PostgreSQL - please give an advice

2021-04-12 Thread Justin Pryzby
On Mon, Apr 12, 2021 at 03:21:41PM +, Ian Zagorskikh wrote:
> I would like to contribute my time and efforts to the PostgreSQL project
> development. I have some [hope not too bad] experience in software
> development primarily for Linux/BSD/Windows platforms with C/C++ though
> almost no experience in RDBMS internals. I have read the "Development
> Information" section in wiki and checked the official TODO list. It's
> really great but it's a bit huge and "all is so tasty" so I don't know
> where to start. Can you please advise me how to proceed? Maybe there are
> active tasks or bugs or issues that require man power? I would appreciate
> any advice or mentorship. Thank you!

I think the best way is to start following this mailing list.
Since it's very "busy", it may be easier to follow the web archives.
https://www.postgresql.org/list/pgsql-hackers/2021-04/

When someone reports a problem, you can try to reproduce it to see if they've
provided enough information to confirm the issue, or test any proposed patch.

You can see the patches being proposed for future release:
https://commitfest.postgresql.org/

However, right now, we've just passed the "feature freeze" for v14, so new
development is on hold for a while.  What's most useful is probably testing the
changes that have been committed.  You can check that everything works as
described, that the implemented behavior doesn't have any rough edges, that the
features work together well, and work under your own use-cases/workloads.

You can see a list of commits for v14 like so:
> git log --cherry-pick origin/REL_13_STABLE...origin/master
(Any commits that show up twice are also in v13, so aren't actually "new in
v14", but the patches differ so GIT couldn't figure that out)

-- 
Justin




Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-12 Thread Ranier Vilela
On Mon, Apr 12, 2021 at 3:04 AM, Tom Lane wrote:

> Michael Paquier  writes:
> > On Sun, Apr 11, 2021 at 07:42:20PM -0300, Ranier Vilela wrote:
> >> Em dom., 11 de abr. de 2021 às 16:25, Justin Pryzby <
> pry...@telsasoft.com>
> >>> I think it's cleanest to write:
> >>> |HeapTupleData tmptup = {0};
>
> > I agree that this would be cleaner.
>
> It would be wrong, though, or at least not have the same effect.
>
I think you mean that filling pointers with 0 is not the same as filling
pointers with NULL.


> ItemPointerSetInvalid does not set the target to all-zeroes.
>
Does ItemPointerSetInvalid set the target to all-zeroes, or not?


> (Regardless of that detail, it's generally best to accomplish
> objective X in the same way that existing code does.  Deciding
> that you have a better way is often wrong, and even if you
> are right, you should then submit a patch to change all the
> existing cases.)
>
I am confused here: does the patch follow the existing pattern and fix the
problem, or not?

regards,
Ranier Vilela


Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));

2021-04-12 Thread Tom Lane
Yulin PEI  writes:
> I found it could cause a crash when executing sql statement: `CREATE VIEW 
> v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in 
> postgres 13.2 release.

Nice catch.  I don't think the code in DefineVirtualRelation is wrong:
exprCollation shouldn't report any collation for an expression of a
non-collatable type.  Rather the problem is with an old kluge in
coerce_type(), which will push a type coercion underneath a CollateExpr
... without any mind for the possibility that the coercion result isn't
collatable.  So the right fix is more or less the attached.

regards, tom lane

diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index d5310f27db..228ee8e7d6 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -95,6 +95,7 @@ coerce_to_target_type(ParseState *pstate, Node *expr, Oid exprtype,
 	 * *must* know that to avoid possibly calling hide_coercion_node on
 	 * something that wasn't generated by coerce_type.  Note that if there are
 	 * multiple stacked CollateExprs, we just discard all but the topmost.
+	 * Also, if the target type isn't collatable, we discard the CollateExpr.
 	 */
 	origexpr = expr;
 	while (expr && IsA(expr, CollateExpr))
@@ -114,7 +115,7 @@ coerce_to_target_type(ParseState *pstate, Node *expr, Oid exprtype,
 ccontext, cformat, location,
 (result != expr && !IsA(result, Const)));
 
-	if (expr != origexpr)
+	if (expr != origexpr && type_is_collatable(targettype))
 	{
 		/* Reinstall top CollateExpr */
 		CollateExpr *coll = (CollateExpr *) origexpr;
@@ -384,7 +385,7 @@ coerce_type(ParseState *pstate, Node *node,
 		if (result)
 			return result;
 	}
-	if (IsA(node, CollateExpr))
+	if (IsA(node, CollateExpr) && type_is_collatable(targetTypeId))
 	{
 		/*
 		 * If we have a COLLATE clause, we have to push the coercion


Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-12 Thread Tomas Vondra



On 4/12/21 6:55 PM, Ranier Vilela wrote:
> 
> 
> On Mon, Apr 12, 2021 at 03:04, Tom Lane  wrote:
> 
> Michael Paquier  writes:
> > On Sun, Apr 11, 2021 at 07:42:20PM -0300, Ranier Vilela wrote:
> >> On Sun, Apr 11, 2021 at 16:25, Justin Pryzby
> >>> I think it's cleanest to write:
> >>> HeapTupleData tmptup = {0};
> 
> > I agree that this would be cleaner.
> 
> It would be wrong, though, or at least not have the same effect.
> 
> I think you mean that filling pointers with 0 is not the same as
> filling them with NULL.
>  
> 
> ItemPointerSetInvalid does not set the target to all-zeroes.
> 
> Does ItemPointerSetInvalid set the target to all-zeroes or not?
> 

I'm not sure what exactly you are asking about. What Tom said is that if
you do 'struct = {0}' it sets all fields to 0, but we only want/need to
set the t_self/t_tableOid fields to 0.

> 
> (Regardless of that detail, it's generally best to accomplish
> objective X in the same way that existing code does.  Deciding
> that you have a better way is often wrong, and even if you
> are right, you should then submit a patch to change all the
> existing cases.)
> 
> I'm confused here: does the patch follow the pattern and fix the
> problem, or not?
> 

I believe it does, and it's doing it in the same way as most other
similar places.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-12 Thread Tom Lane
Ranier Vilela  writes:
> On Mon, Apr 12, 2021 at 03:04, Tom Lane  wrote:
>> It would be wrong, though, or at least not have the same effect.

> I think you mean that filling pointers with 0 is not the same as
> filling them with NULL.

No, I mean that InvalidBlockNumber isn't 0.

> I'm confused here: does the patch follow the pattern and fix the problem,
> or not?

Your patch seems fine.  Justin's proposed improvement isn't.

(I'm not real sure whether there's any *actual* bug here --- would we
really be looking at either ctid or tableoid of this temporary tuple?
But it's probably best to ensure that they're valid anyway.)

regards, tom lane




Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-12 Thread Justin Pryzby
On Mon, Apr 12, 2021 at 01:55:13PM -0300, Ranier Vilela wrote:
> On Mon, Apr 12, 2021 at 03:04, Tom Lane  wrote:
> > Michael Paquier  writes:
> > > On Sun, Apr 11, 2021 at 07:42:20PM -0300, Ranier Vilela wrote:
> > >> On Sun, Apr 11, 2021 at 16:25, Justin Pryzby <
> > pry...@telsasoft.com>
> > >>> I think it's cleanest to write:
> > >>> HeapTupleData tmptup = {0};
> >
> > > I agree that this would be cleaner.
> >
> > It would be wrong, though, or at least not have the same effect.
> >
> I think you mean that filling pointers with 0 is not the same as
> filling them with NULL.
> 
> 
> > ItemPointerSetInvalid does not set the target to all-zeroes.
> >
> Does ItemPointerSetInvalid set the target to all-zeroes or not?

I think Tom means that it does:
BlockIdSet(&((pointer)->ip_blkid), InvalidBlockNumber),
(pointer)->ip_posid = InvalidOffsetNumber

but it's not zero, as I thought:

src/include/storage/block.h:#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
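To make the difference concrete, here is a tiny self-contained C sketch. The
struct and macros are simplified stand-ins for the real definitions under
src/include/storage/ (the real ItemPointerData stores a BlockIdData pair
rather than a plain uint32), reduced only for illustration:

```c
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions under discussion. */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber ((OffsetNumber) 0)

typedef struct ItemPointerData
{
    BlockNumber  ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;

#define ItemPointerSetInvalid(pointer) \
    ((pointer)->ip_blkid = InvalidBlockNumber, \
     (pointer)->ip_posid = InvalidOffsetNumber)

/* Returns 1 if "= {0}" initialization matches ItemPointerSetInvalid().
 * With these definitions it returns 0: the invalid block number is
 * all-ones, not zero, so zeroing the struct does not mark it invalid. */
int
zero_init_equals_invalid(void)
{
    ItemPointerData zeroed = {0};
    ItemPointerData invalid;

    ItemPointerSetInvalid(&invalid);
    return zeroed.ip_blkid == invalid.ip_blkid &&
           zeroed.ip_posid == invalid.ip_posid;
}
```

So "= {0}" leaves ip_blkid at 0 while ItemPointerSetInvalid() stores
all-ones there, which is why the two spellings are not interchangeable.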

> > (Regardless of that detail, it's generally best to accomplish
> > objective X in the same way that existing code does.  Deciding
> > that you have a better way is often wrong, and even if you
> > are right, you should then submit a patch to change all the
> > existing cases.)

FYI, I'd gotten the idea from here:

$ git grep 'HeapTupleData.*='
src/backend/executor/execTuples.c:  HeapTupleData tuple = {0};

-- 
Justin




Proposal for working on open source with PostgreSQL

2021-04-12 Thread Nandni Mehla
Hello Sir/Madam,
I'm Nandni Mehla, a sophomore currently pursuing B.Tech in IT from Indira
Gandhi Delhi Technical University for Women, Delhi. I've recently started
working on open source and I think I will be a positive addition to
your organization for working on projects using C and SQL, as I have
experience in these, and I am willing to learn more from you.
I am attaching my proposal in this email for your reference, please guide
me through this.
Regards.

https://docs.google.com/document/d/1H84WmzZbMERPrjsnXbvoQ7W2AaKsM8eJU02SNw7vQBk/edit?usp=sharing


Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 9:19 AM Tom Lane  wrote:
> So I think we have to stick with the current basic design, and just
> tighten things up to make sure that visibility pins are accounted for
> in the places that are missing it.
>
> Hence, I propose the attached.  It passes check-world, but that proves
> absolutely nothing of course :-(.  I wonder if there is any way to
> exercise these code paths deterministically.

This approach seems reasonable to me. At least you've managed to
structure the visibility map page pin check as concomitant with the
existing space recheck.

> (I have realized BTW that I was exceedingly fortunate to reproduce
> the buildfarm report here --- I have run hundreds of additional
> cycles of the same test scenario without getting a second failure.)

In the past I've had luck with RR's chaos mode (most notably with the
Jepsen SSI bug). That didn't work for me here, though I might just
have not persisted with it for long enough. I should probably come up
with a shell script that runs the same thing hundreds of times or more
in chaos mode, while making sure that useless recordings don't
accumulate.

The feature is described here:

https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html

You only have to be lucky once. Once that happens, you're left with a
recording to review and re-review at your leisure. This includes all
Postgres backends, maybe even pg_regress and other scaffolding (if
that's what you're after).

But that's for debugging, not testing. The only way that we'll ever be
able to test stuff like this is with something like Alexander
Korotkov's stop events patch [1]. That infrastructure should be added
sooner rather than later.

[1] 
https://postgr.es/m/capphfdtseohx8dsk9qp+z++i4bgqoffkip6jdwngea+g7z-...@mail.gmail.com
--
Peter Geoghegan




Re: [PATCH] Identify LWLocks in tracepoints

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 14:31:32 +0800, Craig Ringer wrote:
> * There is no easy way to look up the tranche name by ID from outside the
> backend

But it's near trivial to add that.


> It's annoying that we have to pay the cost of computing the tranche name
> though. It never used to matter, but now that T_NAME() expands to
> GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more
> on such a hot path. I might see if I can do a little comparison and see how
> much.  I could add TRACE_POSTGRESQL_<tracepoint>_ENABLED() guards since we
> do in fact build with SDT semaphore support. That adds a branch for each
> tracepoint, but they're already marshalling arguments and making a function
> call that does lots more than a single branch, so that seems pretty
> sensible.

I am against adding any overhead for this feature. I honestly think the
probes we have right now in postgres do not provide a commensurate
benefit.


> (It'd be possible to instead generate probes for each GUC at compile-time
> using the preprocessor and the DTRACE_ macros. But as noted above, that
> doesn't currently work properly in the same compilation unit that a dtrace
> script-generated probes.h is included in. I think it's probably nicer to
> have specific probes for GUCs of high interest, then generic probes that
> capture all GUCs anyway.)
>
> There are a TON of probes I want to add, and I have a tree full of them
> waiting to submit progressively. Yes, ability to probe all GUCs is in
> there. So is detail on walsender, reorder buffer, and snapshot builder
> activity. Heavyweight lock SDTs. A probe that identifies the backend type
> at startup. SDT probe events emitted for every wait-event. Probes in elog.c
> to let probes observe error unwinding, capture error messages,
> etc. [...] A probe that fires whenever debug_query_string
> changes. Lots. But I can't submit them all at once, especially without
> some supporting use cases and scripts that other people can use so
> they can understand why these probes are useful.

-1. This is not scalable. Adding static probes all over has both a
runtime (L1I, branches, code optimization) and maintenance overhead.


> (Those can also be used with systemtap guru mode scripts to do things
> like turn a particular elog(DEBUG) into a PANIC at runtime for
> diagnostic purposes).

Yikes.

Greetings,

Andres Freund




Re: Proposal for working on open source with PostgreSQL

2021-04-12 Thread Laurenz Albe
On Mon, 2021-04-12 at 23:26 +0530, Nandni Mehla wrote:
> I'm Nandni Mehla, a sophomore currently pursuing B.Tech in IT from Indira
> Gandhi Delhi Technical University for Women, Delhi. I've recently started
> working on open source and I think I will be a positive addition to your
> organization for working on projects using C and SQL, as I have experience
> in these, and I am willing to learn more from you.
> I am attaching my proposal in this email for your reference, please guide me
> through this.
> Regards.
> 
> https://docs.google.com/document/d/1H84WmzZbMERPrjsnXbvoQ7W2AaKsM8eJU02SNw7vQBk/edit?usp=sharing

Thanks for your willingness to help with PostgreSQL!

I couldn't see any detailed information about the project in your proposal, except
that the project is called "plsample".  Is there more information somewhere?

If it is a procedural language as the name suggests, you probably don't have
to modify PostgreSQL core code to make it work.

Yours,
Laurenz Albe





Re: psql - add SHOW_ALL_RESULTS option

2021-04-12 Thread Bossart, Nathan
On 4/12/21, 9:25 AM, "Tom Lane"  wrote:
> Fabien COELHO  writes:
>>> Between this and the known breakage of control-C, it seems clear
>>> to me that this patch was nowhere near ready for prime time.
>>> I think shoving it in on the last day before feature freeze was
>>> ill-advised, and it ought to be reverted.  We can try again later.
>
>> The patch has been asleep for quite a while, and was resurrected, possibly
>> too late in the process. ISTM that fixing it for 14 is manageable,
>> but this is not my call.
>
> I just observed an additional issue that I assume was introduced by this
> patch, which is that psql's response to a server crash has gotten
> repetitive:
>
> regression=# CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM 
> generate_series(1, 10));
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
> !?>
>
> I've never seen that before, and it's not because I don't see
> server crashes regularly.

I think I've found another issue with this patch.  If AcceptResult()
returns false in SendQueryAndProcessResults(), it seems to result in
an infinite loop of "unexpected PQresultStatus" messages.  This can be
reproduced by trying to run "START_REPLICATION" via psql.

The following patch seems to resolve the issue, although I'll admit I
haven't dug into this too deeply.  In any case, +1 for reverting the
patch for now.

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 028a357991..abafd41763 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1176,7 +1176,7 @@ SendQueryAndProcessResults(const char *query, double 
*pelapsed_msec, bool is_wat

 /* and switch to next result */
 result = PQgetResult(pset.db);
-continue;
+break;
 }

 /* must handle COPY before changing the current result */

Nathan



Re: psql - add SHOW_ALL_RESULTS option

2021-04-12 Thread Alvaro Herrera
On 2021-Apr-12, Bossart, Nathan wrote:

> The following patch seems to resolve the issue, although I'll admit I
> haven't dug into this too deeply.  In any case, +1 for reverting the
> patch for now.

Please note that there's no "for now" about it -- if the patch is
reverted, the only way to get it back is to wait for PG15.  That's
undesirable.  A better approach is to collect all those bugs and get
them fixed.  There's plenty of time to do that.

I, for one, would prefer to see the feature repaired in this cycle.

-- 
Álvaro Herrera   Valdivia, Chile




Re: Proposal for working on open source with PostgreSQL

2021-04-12 Thread Alvaro Herrera
On 2021-Apr-12, Laurenz Albe wrote:

> I couldn't see any detailed information about the project in your proposal,
> except that the project is called "plsample".  Is there more information
> somewhere?
> 
> If it is a procedural language as the name suggests, you probably don't have
> to modify PostgreSQL core code to make it work.

plsample is in src/test/modules/plsample.  Possible improvements are
handling of DO blocks, support for trigger functions, routine
validation.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andrey Borodin
Hi hackers!

This thread continues the discussion of allowing certain capabilities to
non-superusers; AFAIK the previous thread was [0].

Currently only superuser is allowed to create LEAKPROOF functions because 
leakproof functions can see tuples which have not yet been filtered out by 
security barrier views or row level security policies.

But managed cloud services typically do not provide superuser roles. I'm 
thinking about allowing the database owner or someone with BYPASSRLS flag to 
create these functions. Or, perhaps, pg_read_all_data.

And I'm trying to figure out if there are any security implications. Consider a 
user who already has access to all user data in a DB and the ability to create 
LEAKPROOF functions. Can they gain a superuser role or access something else 
that is available only to a superuser?
Is it possible to relax requirements for the creator of LEAKPROOF functions in 
upstream Postgres?

I'll appreciate any comments. Thanks!


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/CACqFVBbx6PDq%2B%3DvHM0n78kHzn8tvOM-kGO_2q_q0zNAMT%2BTzdA%40mail.gmail.com



Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Tom Lane
Andrey Borodin  writes:
> Currently only superuser is allowed to create LEAKPROOF functions because 
> leakproof functions can see tuples which have not yet been filtered out by 
> security barrier views or row level security policies.

Yeah.

> But managed cloud services typically do not provide superuser roles.

This is not a good argument for relaxing superuser requirements.

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-11 13:55:30 -0400, Tom Lane wrote:
> Either way, it's hard to argue that heap_update hasn't crossed the
> complexity threshold where it's impossible to maintain safely.  We
> need to simplify it.

Yea, I think we're well beyond that point. I can see a few possible
steps to wrangle the existing complexity into an easier to understand
shape:

- Rename heapam.c goto labels, they're useless to understand what is
  happening.

- Move HeapTupleSatisfiesUpdate() call and the related branches
  afterwards into its own function.

- Move "temporarily mark it locked" branch into its own function. It's a
  minimal implementation of tuple locking, so it seems more than
  separate enough.

- Move the "store the new tuple" part into its own function (pretty much
  the critical section).

- Probably worth unifying the exit paths - there's a fair bit of
  duplication by now...

Half related:

- I think we might also need to do something about the proliferation of
  bitmaps in heap_update(). We now separately allocate 5 bitmapsets -
  that strikes me as fairly insane.


However, these would not really address the complexity in itself, just
make it easier to manage.

ISTM that a lot of the complexity is related to needing to retry (and
avoiding doing so unnecessarily), which in turn is related to avoiding
deadlocks. We actually know how to not need that to the same degree -
the (need_toast || newtupsize > pagefree) path locks the tuple and afterwards
has a lot more freedom. We obviously can't just always do that, due to
the WAL logging overhead.

I wonder if we could make that path avoid the WAL logging overhead. We
don't actually need a full blown tuple lock, potentially even with its
own multixact, here.

The relevant comment (in heap_lock_tuple()) says:
	/*
	 * XLOG stuff.  You might think that we don't need an XLOG record because
	 * there is no state change worth restoring after a crash.  You would be
	 * wrong however: we have just written either a TransactionId or a
	 * MultiXactId that may never have been seen on disk before, and we need
	 * to make sure that there are XLOG entries covering those ID numbers.
	 * Else the same IDs might be re-used after a crash, which would be
	 * disastrous if this page made it to disk before the crash.  Essentially
	 * we have to enforce the WAL log-before-data rule even in this case.
	 * (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
	 * entries for everything anyway.)
	 */

But I don't really think that doing full-blown WAL tuple-locking WAL
logging is really the right solution.

- The "next xid" concerns are at least as easily solved by WAL logging a
  distinct "highest xid assigned" record when necessary. Either by
  having a shared memory variable saying "latestLoggedXid" or such, or
  by having end-of-recovery advance nextXid to beyond what ExtendCLOG()
  extended to. That reduces the overhead to at most once-per-xact (and
  commonly smaller) or nothing, respectively.

- While there's obviously a good bit of simplicity ensuring that a
  replica is exactly the same ("Also, in a PITR log-shipping or 2PC
  environment ..."), we don't actually enforce that strictly anyway -
  so I am not sure why it's necessary to pay the price here.
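The first alternative above can be sketched as a toy model. Everything here
(the names, the batch size, the counters) is invented purely for
illustration — this is not PostgreSQL code, just a demonstration of how a
"latestLoggedXid" high-water mark amortizes the logging:

```c
#include <stdint.h>

/* Toy model: instead of WAL-logging every tuple lock just to cover the
 * XID it wrote, keep a "latestLoggedXid" high-water mark and emit a small
 * "highest xid assigned" record only when an assigned XID exceeds it.
 * The 64-XID batch size is arbitrary, chosen only for the demo. */
#define XID_LOG_BATCH 64

static uint32_t nextXid = 100;
static uint32_t latestLoggedXid = 0;
static unsigned walRecordsEmitted = 0;

static void
log_highest_xid_assigned(uint32_t xid)
{
    /* stand-in for emitting a tiny WAL record covering a batch of XIDs */
    latestLoggedXid = xid + XID_LOG_BATCH - 1;
    walRecordsEmitted++;
}

uint32_t
assign_xid(void)
{
    uint32_t xid = nextXid++;

    if (xid > latestLoggedXid)
        log_highest_xid_assigned(xid);
    return xid;
}

/* Assign n XIDs and report how many "covering" records were emitted. */
unsigned
records_for_n_assignments(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        (void) assign_xid();
    return walRecordsEmitted;
}
```

With 128 XID assignments the model emits only two small covering records,
rather than one WAL record per tuple lock.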

But maybe I'm all wet here, I certainly haven't had enough coffee.

Greetings,

Andres Freund




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Tomas Vondra


On 4/12/21 10:37 PM, Tom Lane wrote:
> Andrey Borodin  writes:
>> Currently only superuser is allowed to create LEAKPROOF functions
>> because leakproof functions can see tuples which have not yet been
>> filtered out by security barrier views or row level security
>> policies.
> 
> Yeah.
> 
>> But managed cloud services typically do not provide superuser
>> roles.
> 
> This is not a good argument for relaxing superuser requirements.
> 

I guess for the cloud services it's not an issue - they're mostly
concerned about manageability and restricting access to the OS. It's
unfortunate that we tie the this capability to being superuser, so maybe
the right solution would be to introduce a separate role with this
privilege?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andrey Borodin
Thanks for so quick response, Tom!

> On 12 Apr 2021, at 23:37, Tom Lane  wrote:
> 
>> But managed cloud services typically do not provide superuser roles.
> 
> This is not a good argument for relaxing superuser requirements.
Ok, let's put aside the question of relaxing requirements in upstream.

Do I risk having some extra superusers in my installation if I allow everyone 
to create LEAKPROOF functions?

Thanks!

Best regards, Andrey Borodin.



Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 16:37:01 -0400, Tom Lane wrote:
> Andrey Borodin  writes:
> > Currently only superuser is allowed to create LEAKPROOF functions
> > because leakproof functions can see tuples which have not yet been
> > filtered out by security barrier views or row level security
> > policies.
>
> Yeah.
>
> > But managed cloud services typically do not provide superuser roles.
>
> This is not a good argument for relaxing superuser requirements.

IDK. I may have been adjacent to people operating database-as-a-service
for too long, but ISTM there's decent reasons for (and also against) not
providing full superuser access. Even outside of managed services it
seems like a decent idea to split the "can execute native code" role
from the "administers an application" role. That reduces the impact a
bug in the application can incur.

There's certain things that are pretty intrinsically "can execute native
code", like defining new 'C' functions, arbitrary ALTER SYSTEM,
arbitrary file reads/writes, etc. Splitting them off from superuser is a
fools errand. But it's not at all clear why adding LEAKPROOF to
functions falls into that category?

Greetings,

Andres Freund




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 22:42:03 +0200, Tomas Vondra wrote:
> It's unfortunate that we tie this capability to being superuser,
> so maybe the right solution would be to introduce a separate role with
> this privilege?

Perhaps DB owner + BYPASSRLS would be enough?

Greetings,

Andres Freund




Curious test case added by collation version tracking patch

2021-04-12 Thread Tom Lane
I am wondering what was the intent of this test case added by commit
257836a75:

CREATE INDEX icuidx16_mood ON collate_test(id) WHERE mood > 'ok' COLLATE 
"fr-x-icu";

where "mood" is of an enum type, which surely does not respond to
collations.

The reason I ask is that this case started failing after I fixed
a parse_coerce.c bug that allowed a CollateExpr node to survive
in this WHERE expression, which by rights it should not.  I'm
inclined to think that the test case is wrong and should be removed,
but maybe there's some reason to have a variant of it.

regards, tom lane




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andrey Borodin
Thanks, Tomas!

> On 12 Apr 2021, at 23:42, Tomas Vondra  wrote:
> 
> I guess for the cloud services it's not an issue - they're mostly
> concerned about manageability and restricting access to the OS.
In fact, we would happily give a client access to the OS too. It's the
client's VM after all, and all the software is open source. But it opens a
way to attack the control plane, which in turn opens a way for clients to
attack each other. And we really do not want that.

Thanks!

Best regards, Andrey Borodin.



Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 23:51:02 +0300, Andrey Borodin wrote:
> Do I risk having some extra superusers in my installation if I allow
> everyone to create LEAKPROOF functions?

I think that depends on what you define "superuser" to exactly
be. Defining it as "has a path to executing arbitrary native code", I
don't think, if implemented sensibly, allowing to set LEAKPROOF on new
functions would equate superuser permissions. But you soon after might
hit further limitations where lifting them would have such a risk,
e.g. defining new types with in/out functions.

Greetings,

Andres Freund




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andrey Borodin



> On 13 Apr 2021, at 00:01, Andres Freund  wrote:
> 
> Hi,
> 
> On 2021-04-12 23:51:02 +0300, Andrey Borodin wrote:
>> Do I risk having some extra superusers in my installation if I allow
>> everyone to create LEAKPROOF functions?
> 
> I think that depends on what you define "superuser" to exactly
> be. Defining it as "has a path to executing arbitrary native code", I
> don't think, if implemented sensibly, allowing to set LEAKPROOF on new
> functions would equate superuser permissions.
Thanks!


> But you soon after might
> hit further limitations where lifting them would have such a risk,
> e.g. defining new types with in/out functions.

I think real extensibility of a managed DB service is a very distant challenge.
Currently we just allow-list extensions.

Best regards, Andrey Borodin.



Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Tom Lane
Andres Freund  writes:
> On 2021-04-12 23:51:02 +0300, Andrey Borodin wrote:
>> Do I risk having some extra superusers in my installation if I allow
>> everyone to create LEAKPROOF functions?

> I think that depends on what you define "superuser" to exactly
> be. Defining it as "has a path to executing arbitrary native code", I
> don't think, if implemented sensibly, allowing to set LEAKPROOF on new
> functions would equate superuser permissions. But you soon after might
> hit further limitations where lifting them would have such a risk,
> e.g. defining new types with in/out functions.

I think the issue here is more that superuser = "able to break the
security guarantees of the database".  I doubt that falsely labeling
a function LEAKPROOF can get you more than the ability to read data
you're not supposed to be able to read ... but that ability is then
available to all users, or at least all users who can execute the
function in question.  So it definitely is a fairly serious security
hazard, and one that's not well modeled by role labels.  If you
give somebody e.g. pg_read_all_data privileges, you don't expect
that that means they can give it to other users.

regards, tom lane




Re: pg_upgrade check for invalid role-specific default config

2021-04-12 Thread Bruce Momjian
On Mon, Apr 12, 2021 at 01:28:19PM +, Charlie Hornsby wrote:
> Hi all,
> 
> While troubleshooting a failed upgrade from v11 -> v12 I realised I had
> encountered a bug previously reported on the pgsql-bugs mailing list:
> 
> #14242 Role with a setconfig "role" setting to a nonexistent role causes
> pg_upgrade to fail
> 
> https://www.postgresql.org/message-id/20160711223641.1426.86096%40wrigleys.postgresql.org

...

> Since it is possible to write a query to identify these cases, would there
> be appetite for me to submit a patch to add a check for this to 
> pg_upgrade?
> 
> First time mailing list user here so many apologies for any missteps I have
> made in this message.

Yes, I think a patch would be good, but the fix might be for pg_dump
instead, which pg_upgrade uses.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade check for invalid role-specific default config

2021-04-12 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Apr 12, 2021 at 01:28:19PM +, Charlie Hornsby wrote:
>> While troubleshooting a failed upgrade from v11 -> v12 I realised I had
>> encountered a bug previously reported on the pgsql-bugs mailing list:
>> #14242 Role with a setconfig "role" setting to a nonexistent role causes
>> pg_upgrade to fail
>> https://www.postgresql.org/message-id/20160711223641.1426.86096%40wrigleys.postgresql.org
>> Since it is possible to write a query to identify these cases, would there
>> be appetite for me to submit a patch to add a check for this to 
>> pg_upgrade?

> Yes, I think a patch would be good, but the fix might be for pg_dump
> instead, which pg_upgrade uses.

I'm not sure I buy the premise that "it is possible to write a query
to identify these cases".  It seems to me that the general problem is
that ALTER ROLE/DATABASE SET values might have become incorrect since
they were installed and would thus fail when reloaded in dump/restore.
We're not going to be able to prevent that in the general case, and
it's not obvious to me what special case might be worth going after.

I do find it interesting that we now have two reports of somebody
doing "ALTER ROLE SET role = something".  In the older thread,
I was skeptical that that had any real use-case, so I wonder if
Charlie has a rationale for having done that.

regards, tom lane




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 17:14:20 -0400, Tom Lane wrote:
> I doubt that falsely labeling a function LEAKPROOF can get you more
> than the ability to read data you're not supposed to be able to read
> ... but that ability is then available to all users, or at least all
> users who can execute the function in question.  So it definitely is a
> fairly serious security hazard, and one that's not well modeled by
> role labels.  If you give somebody e.g. pg_read_all_data privileges,
> you don't expect that that means they can give it to other users.

A user with BYPASSRLS can create public security definer functions
returning data. If the concern is a BYPASSRLS user intentionally
exposing data, then there's not a meaningful increase to allow defining
LEAKPROOF functions.

To me the more relevant concern is that it's hard to determine
LEAKPROOF-ness and that many use-cases for BYPASSRLS do not require the
target to have the technical chops to determine if a function actually
is leakproof. But that seems more an argument for providing a separate
control over allowing to specify LEAKPROOF than against separating it
from superuser.

Greetings,

Andres Freund




Re: pg_upgrade check for invalid role-specific default config

2021-04-12 Thread Tom Lane
I wrote:
> I'm not sure I buy the premise that "it is possible to write a query
> to identify these cases".  It seems to me that the general problem is
> that ALTER ROLE/DATABASE SET values might have become incorrect since
> they were installed and would thus fail when reloaded in dump/restore.
> We're not going to be able to prevent that in the general case, and
> it's not obvious to me what special case might be worth going after.

Actually, after thinking about that a bit more, this is a whole lot
like the issues we have with reloading function bodies and function
SET clauses.  The solution we've adopted for that is to allow dumps
to turn off validation via the check_function_bodies GUC.  Maybe there
should be a GUC to disable validation of ALTER ROLE/DATABASE SET values.
If you fat-finger a setting, you might not be able to log in, but you
couldn't log in in the old database either.

Another answer is that maybe the processing of the "role" case
in particular is just broken.  Compare the behavior here:

regression=# create role joe;
CREATE ROLE
regression=# alter role joe set role = 'notthere';
ERROR:  role "notthere" does not exist
regression=# alter role joe set default_text_search_config to 'notthere';
NOTICE:  text search configuration "notthere" does not exist
ALTER ROLE
regression=# \drds
              List of settings
 Role | Database |               Settings
------+----------+-------------------------------------
 joe  |          | default_text_search_config=notthere

despite the fact that a direct SET fails:

regression=# set default_text_search_config to 'notthere';
ERROR:  invalid value for parameter "default_text_search_config": "notthere"

It's intentional that we don't throw a hard error for
default_text_search_config, because that would create
a problematic ordering dependency for pg_dump: the
desired config might just not have been reloaded yet.
Maybe the right answer here is that the processing of
"set role" in particular failed to get that memo.

regards, tom lane




Re: Curious test case added by collation version tracking patch

2021-04-12 Thread Thomas Munro
On Tue, Apr 13, 2021 at 8:59 AM Tom Lane  wrote:
> I am wondering what was the intent of this test case added by commit
> 257836a75:
>
> CREATE INDEX icuidx16_mood ON collate_test(id) WHERE mood > 'ok' COLLATE 
> "fr-x-icu";
>
> where "mood" is of an enum type, which surely does not respond to
> collations.
>
> The reason I ask is that this case started failing after I fixed
> a parse_coerce.c bug that allowed a CollateExpr node to survive
> in this WHERE expression, which by rights it should not.  I'm
> inclined to think that the test case is wrong and should be removed,
> but maybe there's some reason to have a variant of it.

Indeed, this doesn't do anything useful, other than prove that we
record a collation dependency where it is (uselessly) allowed in an
expression.  Since you're not going to allow that anymore, it should
be dropped.




Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bryn Llewellyn
br...@momjian.us wrote:
> 
> z...@yugabyte.com wrote:
>> Among previous examples given by Bryn, the following produces correct result 
>> based on Bruce's patch.
>> 
>> # select interval '-1.7 years 29.4 months';
>>     interval
>> ----------------
>>  9 mons 12 days
> 
> Yes, that change is caused by the rounding fixes, and not by the unit 
> pushdown adjustments.

I showed you all this example a long time ago:

select (
'
  3.853467 years
'::interval
  )::text as i;

This behavior is the same in the env. of Bruce’s patch as in unpatched PG 13.2. 
This is the result.

3 years 10 mons

Notice that "3.853467 years" is "3 years" plus "10.241604 months". This 
explains the "10 mons" in the result. But the 0.241604 months remainder did not 
spill down into days.

Can anybody defend this quirk? An extension of this example with a real number 
of months in the user input is correspondingly yet more quirky. The rules can be 
written down. But they’re too tortuous to allow an ordinary mortal to 
confidently design code that relies on them.

(I was unable to find any rule statement that lets the user predict this in the 
doc. But maybe that’s because of my feeble searching skills.)

If there is no defense (and I cannot imagine one) might Bruce’s patch normalize 
this too to follow this rule:

— convert 'y years m months' to the real number y*12 + m.

— record trunc(y*12 + m) in the "months" field of the internal representation

— flow the remainder down to days (but no further)

After all, you've bitten the bullet now and changed the behavior. This means 
that the semantics of some extant applications will change. So... in for a 
penny, in for a pound?



Re: Curious test case added by collation version tracking patch

2021-04-12 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Apr 13, 2021 at 8:59 AM Tom Lane  wrote:
>> The reason I ask is that this case started failing after I fixed
>> a parse_coerce.c bug that allowed a CollateExpr node to survive
>> in this WHERE expression, which by rights it should not.  I'm
>> inclined to think that the test case is wrong and should be removed,
>> but maybe there's some reason to have a variant of it.

> Indeed, this doesn't do anything useful, other than prove that we
> record a collation dependency where it is (uselessly) allowed in an
> expression.  Since you're not going to allow that anymore, it should
> be dropped.

OK, I'll go clean it up.  Thanks!

regards, tom lane




Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Peter Geoghegan
Recent work from commit 5100010e taught VACUUM that it doesn't have to
do index vacuuming in cases where there are practically zero (not
necessarily exactly zero) tuples to delete from indexes. It also
surfaces, in the logs via the log_autovacuum_min_duration mechanism,
the information used to decide whether or not index vacuuming is skipped.
This log output can be used to get a sense of how effective HOT is
over time.

There is one number of particular interest: the proportion of heap
pages that have one or more LP_DEAD items across successive VACUUMs
(this is expressed as a percentage of the table). The immediate reason
to expose this is that it is crucial to the skipping behavior from
commit 5100010e -- the threshold for skipping is 2% of all heap pages.
But that's certainly not the only reason to pay attention to the
percentage. It can also be used to understand HOT in general. It can
be correlated with workload spikes and stressors that tend to make HOT
less effective.

A number of interesting workload-specific patterns seem to emerge by
focussing on how this number changes/grows over time. I think that
this should be pointed out directly in the docs. What's more, it seems
like a good vehicle for discussing how HOT works in general. Why did
we never really get around to documenting HOT? There should at least
be some handling of how DBAs can get the most out of HOT through
monitoring and through tuning -- especially by lowering heap
fillfactor.

It's very hard to get all UPDATEs to use HOT. It's much easier to get
UPDATEs to mostly use HOT most of the time. How things change over
time seems crucially important.

I'll show one realistic example, just to give some idea of what it
might look like. This is output for 3 successive autovacuums against
the largest TPC-C table:

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 4668405 remain, 0 skipped due to pins, 696997 skipped frozen
tuples: 324571 removed, 186702373 remain, 333888 are dead but not yet
removable, oldest xmin: 7624965
buffer usage: 3969937 hits, 3931997 misses, 1883255 dirtied
index scan bypassed: 42634 pages from table (0.91% of total) have
324364 dead item identifiers
avg read rate: 62.469 MB/s, avg write rate: 29.920 MB/s
I/O Timings: read=42359.501 write=11867.903
system usage: CPU: user: 43.62 s, system: 38.17 s, elapsed: 491.74 s
WAL usage: 4586766 records, 1850599 full page images, 849931 bytes

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 5976391 remain, 0 skipped due to pins, 2516643 skipped frozen
tuples: 759956 removed, 239787517 remain, 1848431 are dead but not yet
removable, oldest xmin: 18489326
buffer usage: 3432019 hits, 3385757 misses, 2426571 dirtied
index scan bypassed: 103941 pages from table (1.74% of total) have
790233 dead item identifiers
avg read rate: 50.338 MB/s, avg write rate: 36.077 MB/s
I/O Timings: read=49252.721 write=17003.987
system usage: CPU: user: 45.86 s, system: 34.47 s, elapsed: 525.47 s
WAL usage: 5598040 records, 2274556 full page images, 10510281959 bytes

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 1
pages: 0 removed, 7483804 remain, 1 skipped due to pins, 4208295 skipped frozen
tuples: 972778 removed, 299295048 remain, 1970910 are dead but not yet
removable, oldest xmin: 30987445
buffer usage: 3384994 hits, 4593727 misses, 2891003 dirtied
index scan needed: 174243 pages from table (2.33% of total) had
1325752 dead item identifiers removed
index "bmsql_order_line_pkey": pages: 1250660 in total, 0 newly
deleted, 0 currently deleted, 0 reusable
avg read rate: 60.505 MB/s, avg write rate: 38.078 MB/s
I/O Timings: read=72881.986 write=21872.615
system usage: CPU: user: 65.24 s, system: 42.24 s, elapsed: 593.14 s
WAL usage: 6668353 records, 2684040 full page images, 12374536939 bytes

These autovacuums occur every 60-90 minutes with the workload in
question (with pretty aggressive autovacuum settings). We see that HOT
works rather well here -- but not so well that index vacuuming can be
avoided consistently, which happens in the final autovacuum (it has
"index scans: 1"). There was slow but steady growth in the percentage
of LP_DEAD-containing heap pages over time here, which is common
enough.

The point of HOT is not to avoid having to do index vacuuming, of
course -- that has it backwards. But framing HOT as doing work in
backends so autovacuum doesn't have to do similar work later on is a
good mental model to encourage users to adopt. There are also
significant advantages to distilling the effectiveness of HOT into this
one number -- HOT must be working well if it's close to 0%, almost
always below 2%, with the occasional aberration that sees it go up to
maybe 5%. But if it ever goes too high (in the absence of DELETEs),
you might have trouble on your hands. It might not go down again.

There are other interesting patterns from other tables within the same
database -- including 

Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bruce Momjian
On Mon, Apr 12, 2021 at 03:09:48PM -0700, Bryn Llewellyn wrote:
> I showed you all this example a long time ago:
> 
> select (
> '
>   3.853467 years
> '::interval
>   )::text as i;
> 
> This behavior is the same in the env. of Bruce’s patch as in unpatched PG 
> 13.2. This is the result.
> 
> 3 years 10 mons
> 
> Notice that "3.853467 years" is "3 years" plus "10.241604 months". This 
> explains the "10 mons" in the result. But the 0.241604 months remainder did 
> not spill down into days.
> 
> Can anybody defend this quirk? An extension of this example with a real 
> number of months in the user input is correspondingly yet more quirky. The 
> rules can be written down. But they’re too tortuous to allow an ordinary 
> mortal to confidently design code that relies on them.
> 
> (I was unable to find any rule statement that lets the user predict this in 
> the doc. But maybe that’s because of my feeble searching skills.)
> 
> If there is no defense (and I cannot imagine one) might Bruce’s patch 
> normalize this too to follow this rule:
> 
> — convert 'y years m months' to the real number y*12 + m.
> 
> — record trunc(y*12 + m) in the "months" field of the internal representation
> 
> — flow the remainder down to days (but no further)
> 
> After all, you've bitten the bullet now and changed the behavior. This means 
> that the semantics of some extant applications will change. So... in for a 
> penny, in for a pound?

The docs now say:

 Field values can have fractional parts;  for example, '1.5
 weeks' or '01:02:03.45'.  The fractional
-->  parts are used to compute appropriate values for the next lower-order
 internal fields (months, days, seconds).

meaning fractional years flows to the next lower internal unit, months,
and no further.  Fractional months would flow to days.  The idea of not
flowing past the next lower-order internal field is that the
approximations between units are not precise enough to flow accurately.

With my patch, the output is now:

SELECT INTERVAL '3 years 10.241604 months';
        interval
-------------------------
 3 years 10 mons 7 days

It used to flow to seconds.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-12 Thread David Rowley
On Sun, 11 Apr 2021 at 10:38, Tomas Vondra
 wrote:
> I wonder what's the relationship between the length of the IN list and
> the minimum number of rows needed for the hash to start winning.

I made the attached spreadsheet which demonstrates the crossover point
using the costs that I coded into cost_qual_eval_walker().

It basically shows, for large arrays, that there are fairly
significant benefits to hashing for just 2 lookups and not hashing
only just wins for 1 lookup.  However, the cost model does not account
for allocating memory for the hash table, which is far from free.

You can adjust the number of items in the IN clause by changing the
value in cell B1. The values in B2 and B3 are what I saw the planner
set when I tested with both INT and TEXT types.

David


cost_comparison_hashed_vs_non-hashed_saops.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 16:11:59 -0700, Peter Geoghegan wrote:
> Recent work from commit 5100010e taught VACUUM that it doesn't have to
> do index vacuuming in cases where there are practically zero (not
> necessarily exactly zero) tuples to delete from indexes.

FWIW, I'd not at all be surprised if this causes some issues. Consider
cases where all index lookups are via bitmap scans (which does not
support killtuples) - if value ranges are looked up often the repeated
heap fetches can absolutely kill query performance. I've definitely had
to make autovacuum more aggressive for cases like this or schedule
manual vacuums, and now that's silently not good enough anymore. Yes, 2%
of the table isn't all that much, but often enough all the updates and
lookups concentrate in one value range.

As far as I can see there's no reasonable way to disable this
"optimization", which scares me.

Greetings,

Andres Freund




Re: psql - add SHOW_ALL_RESULTS option

2021-04-12 Thread Michael Paquier
On Mon, Apr 12, 2021 at 07:08:21PM +, Bossart, Nathan wrote:
> I think I've found another issue with this patch.  If AcceptResult()
> returns false in SendQueryAndProcessResults(), it seems to result in
> an infinite loop of "unexpected PQresultStatus" messages.  This can be
> reproduced by trying to run "START_REPLICATION" via psql.

Yes, that's another problem, and this causes an infinite loop where
we would previously just report a single error :/
--
Michael




Re: pg_upgrade check for invalid role-specific default config

2021-04-12 Thread Tom Lane
I wrote:
> Another answer is that maybe the processing of the "role" case
> in particular is just broken.

After digging around a bit more, I think that that is indeed the
right answer.  Most of the GUC check functions that have
database-state-dependent behavior are programmed to behave specially
when checking a proposed ALTER USER/DATABASE setting; but check_role
and check_session_authorization did not get that memo.  I also
noted that check_temp_buffers would throw an error for no very good
reason.  There don't look to be any other troublesome cases.
So I ended up with the attached.

It feels a bit unsatisfactory to have these various check functions
know about this explicitly.  However, I'm not sure that there's a
good way to centralize it.  Only the check function knows whether
the check it's making is immutable or dependent on DB state --- and
in the former case, not throwing an error wouldn't be an improvement.

Anyway, I'm inclined to think that we should not only apply this
but back-patch it.  Two complaints is enough to suggest we have
an issue.  Plus, I think there is a dump/reload ordering problem
here: as things stand, "alter user joe set role = bob" would work
or not work depending on the order the roles are created in
and/or the order privileges are granted in.

regards, tom lane

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c5cf08b423..0c85679420 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -767,6 +767,17 @@ check_session_authorization(char **newval, void **extra, GucSource source)
 	roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum(*newval));
 	if (!HeapTupleIsValid(roleTup))
 	{
+		/*
+		 * When source == PGC_S_TEST, we don't throw a hard error for a
+		 * nonexistent user name, only a NOTICE.  See comments in guc.h.
+		 */
+		if (source == PGC_S_TEST)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_UNDEFINED_OBJECT),
+	 errmsg("role \"%s\" does not exist", *newval)));
+			return true;
+		}
 		GUC_check_errmsg("role \"%s\" does not exist", *newval);
 		return false;
 	}
@@ -837,10 +848,23 @@ check_role(char **newval, void **extra, GucSource source)
 			return false;
 		}
 
+		/*
+		 * When source == PGC_S_TEST, we don't throw a hard error for a
+		 * nonexistent user name or insufficient privileges, only a NOTICE.
+		 * See comments in guc.h.
+		 */
+
 		/* Look up the username */
 		roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum(*newval));
 		if (!HeapTupleIsValid(roleTup))
 		{
+			if (source == PGC_S_TEST)
+			{
+ereport(NOTICE,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg("role \"%s\" does not exist", *newval)));
+return true;
+			}
 			GUC_check_errmsg("role \"%s\" does not exist", *newval);
 			return false;
 		}
@@ -859,6 +883,14 @@ check_role(char **newval, void **extra, GucSource source)
 		if (!InitializingParallelWorker &&
 			!is_member_of_role(GetSessionUserId(), roleid))
 		{
+			if (source == PGC_S_TEST)
+			{
+ereport(NOTICE,
+		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+		 errmsg("permission will be denied to set role \"%s\"",
+*newval)));
+return true;
+			}
 			GUC_check_errcode(ERRCODE_INSUFFICIENT_PRIVILEGE);
 			GUC_check_errmsg("permission denied to set role \"%s\"",
 			 *newval);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d0a51b507d..2ffefdf943 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11777,8 +11777,9 @@ check_temp_buffers(int *newval, void **extra, GucSource source)
 {
 	/*
 	 * Once local buffers have been initialized, it's too late to change this.
+	 * However, if this is only a test call, allow it.
 	 */
-	if (NLocBuffer && NLocBuffer != *newval)
+	if (source != PGC_S_TEST && NLocBuffer && NLocBuffer != *newval)
 	{
 		GUC_check_errdetail("\"temp_buffers\" cannot be changed after any temporary tables have been accessed in the session.");
 		return false;


Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 4:30 PM Andres Freund  wrote:
> As far as I can see there's no reasonable way to disable this
> "optimization", which scares me.

I'm fine with adding a simple 'off' switch. What I'd like to avoid
doing is making the behavior tunable, since it's likely to change in
Postgres 15 and Postgres 16 anyway.

-- 
Peter Geoghegan




Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-12 Thread David Rowley
On Fri, 9 Apr 2021 at 00:00, David Rowley  wrote:
> I push this with some minor cleanup from the v6 patch I posted earlier.

I realised when working on something unrelated last night that we can
also do hash lookups for NOT IN too.

We'd just need to check if the operator's negator operator is
hashable.  No new fields would need to be added to ScalarArrayOpExpr.
We'd just set the hashfuncid to the correct value and then set the
opfuncid to the negator function.  In the executor, we'd know to check
if the value is in the table or not in the table based on the useOr
value.

I'm not really sure whether lack of NOT IN support is going to be a
source of bug reports for PG14 or not.  If it was, then it might be
worth doing something about that for PG14.   Otherwise, we can just
leave it for future work for PG15 and beyond.  I personally don't have
any strong feelings either way, but I'm leaning towards just writing a
patch and thinking of pushing it sometime after we branch for PG15.

I've included the RMT, just in case they want to voice an opinion on that.

David




Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Apr 12, 2021 at 03:09:48PM -0700, Bryn Llewellyn wrote:
>> After all, you've bitten the bullet now and changed the behavior. This means 
>> that the semantics of some extant applications will change. So... in for a 
>> penny, in for a pound?

> The docs now say:

>  Field values can have fractional parts;  for example, '1.5
>  weeks' or '01:02:03.45'.  The fractional
> -->  parts are used to compute appropriate values for the next lower-order
>  internal fields (months, days, seconds).

> meaning fractional years flows to the next lower internal unit, months,
> and no further.  Fractional months would flow to days.  The idea of not
> flowing past the next lower-order internal field is that the
> approximations between units are not precise enough to flow accurately.

Um, what's the argument for down-converting AT ALL?  The problem is
precisely that any such conversion is mostly fictional.

> With my patch, the output is now:

>   SELECT INTERVAL '3 years 10.241604 months';
>           interval
>   -------------------------
>    3 years 10 mons 7 days

> It used to flow to seconds.

Yeah, that's better than before, but I don't see any principled argument
for it not to be "3 years 10 months", full stop.

regards, tom lane




Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-12 Thread Tom Lane
David Rowley  writes:
> I realised when working on something unrelated last night that we can
> also do hash lookups for NOT IN too.

... and still get the behavior right for nulls?

regards, tom lane




Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-12 Thread David Rowley
On Tue, 13 Apr 2021 at 11:42, Tom Lane  wrote:
>
> David Rowley  writes:
> > I realised when working on something unrelated last night that we can
> > also do hash lookups for NOT IN too.
>
> ... and still get the behavior right for nulls?

Yeah, it will. There are already some special cases for NULLs in the
IN version. Those would need to be adapted for NOT IN.

David




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Michael Paquier
On Mon, Apr 12, 2021 at 04:35:13PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 12, 2021 at 4:30 PM Andres Freund  wrote:
> > As far as I can see there's no reasonable way to disable this
> > "optimization", which scares me.
> 
> I'm fine with adding a simple 'off' switch. What I'd like to avoid
> doing is making the behavior tunable, since it's likely to change in
> Postgres 15 and Postgres 16 anyway.

While going through this commit a couple of days ago, I really began to
wonder why you are controlling this behavior with a hardcoded value, which
I found scary.  Shouldn't this instead be driven by two GUCs, with the
matching reloptions that usually accompany such a feature:
- A threshold, as an integer, to define a number of pages.
- A scale factor to define a percentage of pages.

Also, I am a bit confused by the choice of BYPASS_THRESHOLD_PAGES as
parameter name.  For all the other parameters of autovacuum, we use
"threshold" for a fixed number of items, not a percentage of a given
item.
--
Michael




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 4:52 PM Michael Paquier  wrote:
> While going through this commit a couple of days ago, I really got to
> wonder why you are controlling this stuff with a hardcoded value and I
> found that scary, while what you should be using are two GUCs with the
> reloptions that come with the feature (?):
> - A threshold, as an integer, to define a number of pages.
> - A scale factor to define a percentage of pages.

Why?


-- 
Peter Geoghegan




Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bruce Momjian
On Mon, Apr 12, 2021 at 07:38:21PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, Apr 12, 2021 at 03:09:48PM -0700, Bryn Llewellyn wrote:
> >> After all, you've bitten the bullet now and changed the behavior. This 
> >> means that the semantics of some extant applications will change. So... in 
> >> for a penny, in for a pound?
> 
> > The docs now say:
> 
> >  Field values can have fractional parts;  for example, '1.5
> >  weeks' or '01:02:03.45'.  The fractional
> > -->  parts are used to compute appropriate values for the next lower-order
> >  internal fields (months, days, seconds).
> 
> > meaning fractional years flows to the next lower internal unit, months,
> > and no further.  Fractional months would flow to days.  The idea of not
> > flowing past the next lower-order internal field is that the
> > approximations between units are not precise enough to flow accurately.
> 
> Um, what's the argument for down-converting AT ALL?  The problem is
> precisely that any such conversion is mostly fictional.

True.

> > With my patch, the output is now:
> 
> > SELECT INTERVAL '3 years 10.241604 months';
> >         interval
> > -------------------------
> >  3 years 10 mons 7 days
> 
> > It used to flow to seconds.
> 
> Yeah, that's better than before, but I don't see any principled argument
> for it not to be "3 years 10 months", full stop.

Well, the case was:

SELECT INTERVAL '0.1 months';
 interval
----------
 3 days

SELECT INTERVAL '0.1 months' + interval '0.9 months';
 ?column?
----------
 30 days

If you always truncate, you basically lose the ability to specify
fractional internal units, which I think is useful.  I would say that if
you use fractional values of one of the internal units, you basically
know you are asking for an approximation --- that is not true of '3.5
years', for example.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
> br...@momjian.us writes:
>> b...@yugabyte.com wrote:
>>> After all, you've bitten the bullet now and changed the behavior. This 
>>> means that the semantics of some extant applications will change. So... in 
>>> for a penny, in for a pound?
> 
>> The docs now say:
> 
>> Field values can have fractional parts;  for example, '1.5
>> weeks' or '01:02:03.45'.  The fractional
>> -->  parts are used to compute appropriate values for the next lower-order
>> internal fields (months, days, seconds).
> 
>> meaning fractional years flows to the next lower internal unit, months, and 
>> no further.  Fractional months would flow to days.  The idea of not flowing 
>> past the next lower-order internal field is that the approximations between 
>> units are not precise enough to flow accurately.
> 
> Um, what's the argument for down-converting AT ALL?  The problem is precisely 
> that any such conversion is mostly fictional.
> 
>> With my patch, the output is now:
> 
>>  SELECT INTERVAL '3 years 10.241604 months';
>>          interval
>>  -------------------------
>>   3 years 10 mons 7 days
> 
>> It used to flow to seconds.
> 
> Yeah, that's better than before, but I don't see any principled argument for 
> it not to be "3 years 10 months", full stop.

Tom, I fully support your suggestion to have no flow down at all. Please may 
this happen! However, the new rule must be described in terms of the three 
fields of the internal representation: [months, days, seconds]. This 
representation is already documented.

Don’t forget that '731.42587 hours’ is read back as "731:25:33.132" (or, if you 
prefer, 731 hours, 25 minutes, and 33.132 seconds if you use "extract" and your 
own pretty print). But we don’t think of this as “flow down”. Rather, it’s just 
a conventional representation of the seconds field of the internal 
representation. I could go on. But you all know what I mean.

By the way, I made a nice little demo (for my doc project). It shows that:

(1) if you pick the right date-time just before a DST change, and do the 
addition in the right time zone, then adding 24 hours gets a different answer 
than adding one day.

(2) if you pick the right date-time just before 29-Feb in a leap year, then 
adding 30 days gets a different answer than adding one month.

You all know why. And though the doc could explain and illustrate this better, 
it does tell you to expect this. It also says that the difference in semantics 
that these examples show is the reason for the three-field internal 
representation.

It seems to me that both the age-old behavior that vanilla 13.2 exhibits, and 
the behavior in the regime of Bruce’s patch, are like adding 2.2 oranges to 1.3 
oranges and getting 3 oranges and 21 apples (because 1 orange is conventionally 
the same as 42 apples). Bruce made a step in the right direction by stopping 
oranges from converting all the way down to bananas. But it would be so much 
better to get rid of this false equivalence business altogether.




Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bryn Llewellyn
> On 12-Apr-2021, at 17:00, Bruce Momjian  wrote:
> 
> On Mon, Apr 12, 2021 at 07:38:21PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>>> On Mon, Apr 12, 2021 at 03:09:48PM -0700, Bryn Llewellyn wrote:
 After all, you've bitten the bullet now and changed the behavior. This 
 means that the semantics of some extant applications will change. So... in 
 for a penny, in for a pound?
>> 
>>> The docs now say:
>> 
>>> Field values can have fractional parts;  for example, '1.5
>>> weeks' or '01:02:03.45'.  The fractional
>>> -->  parts are used to compute appropriate values for the next lower-order
>>> internal fields (months, days, seconds).
>> 
>>> meaning fractional years flows to the next lower internal unit, months,
>>> and no further.  Fractional months would flow to days.  The idea of not
>>> flowing past the next lower-order internal field is that the
>>> approximations between units are not precise enough to flow accurately.
>> 
>> Um, what's the argument for down-converting AT ALL?  The problem is
>> precisely that any such conversion is mostly fictional.
> 
> True.
> 
>>> With my patch, the output is now:
>> 
>>> SELECT INTERVAL '3 years 10.241604 months';
>>>         interval
>>> -------------------------
>>>  3 years 10 mons 7 days
>> 
>>> It used to flow to seconds.
>> 
>> Yeah, that's better than before, but I don't see any principled argument
>> for it not to be "3 years 10 months", full stop.
> 
> Well, the case was:
> 
>   SELECT INTERVAL '0.1 months';
>    interval
>   ----------
>    3 days
>
>   SELECT INTERVAL '0.1 months' + interval '0.9 months';
>    ?column?
>   ----------
>    30 days
> 
> If you always truncate, you basically lose the ability to specify
> fractional internal units, which I think is useful.  I would say that if
> you use fractional values of one of the internal units, you basically
> know you are asking for an approximation --- that is not true of '3.5
> years', for example.

I’d argue that the fact that this:

('0.3 months'::interval) + ('0.7 months'::interval)

Is reported as '30 days' and not '1 month' is yet another bug—precisely because 
of what I said in my previous email (sorry that I forked the thread) where I 
referred to the fact that, in the right test, adding 1 month gets a different 
answer than adding 30 days. Yet another convincing reason to get rid of this 
flow down business altogether.

If some application wants to model flow-down, then it can do so with trivial 
programming and full control over its own definition of the rules.





Re: Have I found an interval arithmetic bug?

2021-04-12 Thread Bruce Momjian
On Mon, Apr 12, 2021 at 05:20:43PM -0700, Bryn Llewellyn wrote:
> I’d argue that the fact that this:
>
> ('0.3 months'::interval) + ('0.7 months'::interval)
>
> Is reported as '30 days' and not '1 month' is yet another
> bug—precisely because of what I said in my previous email (sorry
> that I forked the thread) where I referred to the fact that, in the
> right test, adding 1 month gets a different answer than adding 30
> days. 

Flowing _up_ is what these functions do:

\df *justify*
                              List of functions
   Schema   |       Name       | Result data type | Argument data types | Type
------------+------------------+------------------+---------------------+------
 pg_catalog | justify_days     | interval         | interval            | func
 pg_catalog | justify_hours    | interval         | interval            | func
 pg_catalog | justify_interval | interval         | interval            | func
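
The "flow up" these functions perform can be illustrated in isolation
(hypothetical names, not the server implementation); justify_days promotes
every 30 whole days into a month:

```c
/* Sketch of justify_days-style "flow up": whole 30-day chunks are promoted
 * to months.  Standalone illustration, not the PostgreSQL implementation. */
#include <assert.h>

typedef struct
{
	int			months;
	int			days;
} Span;

static Span
justify_days_sketch(int months, int days)
{
	Span		s;

	s.months = months + days / 30;	/* promote whole 30-day chunks */
	s.days = days % 30;
	return s;
}
```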

> Yet another convincing reason to get rid of this flow down
> business altogether.

We can certainly get rid of all downflow, which in the current patch is
only when fractional internal units are specified.

> If some application wants to model flow-down, then it can do so with
> trivial programming and full control over its own definition of the
> rules.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: wal stats questions

2021-04-12 Thread Fujii Masao




On 2021/03/30 20:37, Masahiro Ikeda wrote:

OK, I added the condition to the fast-return check. I noticed that I
misunderstood that the purpose is to avoid expanding a clock check using WAL
stats counters. But, the purpose is to make the conditions stricter, right?


Yes. Currently if the following condition is false even when the WAL counters
are updated, nothing is sent to the stats collector. But with your patch,
in this case the WAL stats are sent.

if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
!have_function_stats && !disconnect)

Thanks for the patch! It now fails to be applied to the master cleanly.
So could you rebase the patch?

pgstat_initialize() should initialize pgWalUsage with zero?

+   /*
+* set the counters related to generated WAL data.
+*/
+   WalStats.m_wal_records = pgWalUsage.wal_records;
+   WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
+   WalStats.m_wal_bytes = pgWalUsage.wal_bytes;

This should be skipped if pgWalUsage.wal_records is zero?

+ * Be careful that the counters are cleared after reporting them to
+ * the stats collector, although you can use WalUsageAccumDiff()
+ * to compute differences from previous values. For backends,
+ * the counters may be reset after a transaction is finished and
+ * pgstat_send_wal() is invoked, so you can compute the difference
+ * in the same transaction only.

On second thought, I'm afraid that this is likely to be a foot-gun
in the future. So I'm now wondering if the current approach (i.e., calculate
the diff between the current and previous pgWalUsage and don't reset it
to zero) is better. Thoughts? Since the similar data structure pgBufferUsage
doesn't have this kind of restriction, I'm afraid that a developer could
easily miss that only pgWalUsage has the restriction.

But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
even when the WAL counters are not updated. Then if that calculated diff is
zero, we skip sending the WAL stats. This logic should be improved. That is,
probably we should be able to check whether the WAL counters are updated
or not, without calculating the diff, because the calculation is not free.
We can implement this by introducing the new flag variable that we discussed
upthread. This flag is set to true whenever the WAL counters are incremented
and used to determine whether the WAL stats need to be sent.

If we do this, another issue is that the data types for wal_records and wal_fpi
in pgWalUsage are long, which may overflow. If so, they should be
replaced with uint64.
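
A minimal sketch of that flag-variable idea (all names hypothetical, not
actual PostgreSQL symbols): set a boolean whenever a WAL counter is bumped,
and test only the flag, rather than computing a diff, before deciding whether
stats need to be sent:

```c
/* Illustrative sketch of the flag-variable approach discussed above.
 * Names are hypothetical and do not correspond to PostgreSQL symbols. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	uint64_t	wal_records;
	uint64_t	wal_fpi;
	uint64_t	wal_bytes;
} WalUsageSketch;

static WalUsageSketch walUsage;
static bool walUsageUpdated = false;	/* set on every counter bump */

static void
count_wal_record(uint64_t bytes)
{
	walUsage.wal_records++;
	walUsage.wal_bytes += bytes;
	walUsageUpdated = true;
}

/* Cheap pre-send check: no diff calculation needed. */
static bool
wal_stats_pending(void)
{
	return walUsageUpdated;
}
```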

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 16:53:47 -0700, Peter Geoghegan wrote:
> On Mon, Apr 12, 2021 at 4:52 PM Michael Paquier  wrote:
> > While going through this commit a couple of days ago, I really got to
> > wonder why you are controlling this stuff with a hardcoded value and I
> > found that scary, while what you should be using are two GUCs with the
> > reloptions that come with the feature (?):
> > - A threshold, as an integer, to define a number of pages.
> > - A scale factor to define a percentage of pages.
> 
> Why?

Well, one argument is that you made a fairly significant behavioural
change, with hard-coded logic for when the optimization kicks in. It's
not at all clear that your constants are the right ones for every
workload. We'll likely only get to know whether they're right in > 1 year
- not having a real out at that point imo is somewhat scary.

That said, adding more and more reloptions has a significant cost, so I
don't think it's clear cut that it's the right decision to add
one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
BYPASS_THRESHOLD_PAGES?

Greetings,

Andres Freund




Re: TRUNCATE on foreign table

2021-04-12 Thread Justin Pryzby
On Sun, Apr 11, 2021 at 03:45:36PM +0530, Bharath Rupireddy wrote:
> On Sun, Apr 11, 2021 at 9:47 AM Justin Pryzby  wrote:
> > Also, you currently test:
> > > + if (extra & TRUNCATE_REL_CONTEXT_ONLY)
> >
> > > but TRUNCATE_REL_ aren't independent bits, so shouldn't be tested with "&".
> 
> Yeah this is an issue. We could just change the #defines to values
> 0x0001, 0x0002, 0x0004, 0x0008 ... 0x0020 and so on and then testing
> with & would work. So, this way, more than one option can be multiplexed
> into the same int value. To multiplex, we need to think: will there be
> a scenario where a single rel in the truncate can have multiple
> options at a time and do we want to distinguish these options while
> deparsing?
> 
> #define TRUNCATE_REL_CONTEXT_NORMAL  0x0001 /* specified without
> ONLY clause */
> #define TRUNCATE_REL_CONTEXT_ONLY 0x0002 /* specified with
> ONLY clause */
> #define TRUNCATE_REL_CONTEXT_CASCADING0x0004   /* not specified
> but truncated
> 
> And I'm not sure what's the agreement on retaining or removing #define
> values, currently I see only TRUNCATE_REL_CONTEXT_ONLY is being used,
> others are just being set but not used. As I said upthread, it will be
> good to remove the unused macros/enums, retain only the ones that are
> used, especially TRUNCATE_REL_CONTEXT_CASCADING this is not required I
> feel, because we can add the child partitions that are foreign tables
> to relids as just normal foreign tables with TRUNCATE_REL_CONTEXT_ONLY
> option.

Converting to "bits" would collapse TRUNCATE_REL_CONTEXT_ONLY and
TRUNCATE_REL_CONTEXT_NORMAL into a single bit.  TRUNCATE_REL_CONTEXT_CASCADING
could optionally be removed.

+1 to convert to bits instead of changing "&" to "==".
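
The hazard with testing non-power-of-two values using "&" can be shown in a
small standalone sketch (only the macro names mirror the patch; the rest is
illustrative):

```c
/* With consecutive integers (1, 2, 3, ...), "extra & FLAG" misfires:
 * the value 3 would appear to contain both 1 and 2.  Power-of-two values
 * make each flag an independent bit.  Standalone sketch, not the patch. */
#include <assert.h>

#define TRUNCATE_REL_CONTEXT_ONLY		0x0001
#define TRUNCATE_REL_CONTEXT_CASCADING	0x0002

static int
has_flag(int extra, int flag)
{
	return (extra & flag) != 0;
}
```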

-- 
Justin
>From 26e1015de344352c4cd73deda28ad59594914067 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 12 Apr 2021 19:07:34 -0500
Subject: [PATCH 1/2] Convert TRUNCATE_REL_CONTEXT_* to bits

The values were defined as consecutive integers, but TRUNCATE_REL_CONTEXT_ONLY
was tested with a bitwise test.  If specified as bits, NORMAL vs ONLY is
redundant (there's only one bit).  And CASCADING was unused.

See also: 8ff1c94649f5c9184ac5f07981d8aea9dfd7ac19
---
 src/backend/commands/tablecmds.c | 9 +++--
 src/backend/replication/logical/worker.c | 4 ++--
 src/include/commands/tablecmds.h | 6 +-
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 096a6f2891..e8215d8dc8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1654,8 +1654,7 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
-		relids_extra = lappend_int(relids_extra, (recurse ?
-  TRUNCATE_REL_CONTEXT_NORMAL :
+		relids_extra = lappend_int(relids_extra, (recurse ? 0 :
   TRUNCATE_REL_CONTEXT_ONLY));
 		/* Log this relation only if needed for logical decoding */
 		if (RelationIsLogicallyLogged(rel))
@@ -1704,8 +1703,7 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 rels = lappend(rels, rel);
 relids = lappend_oid(relids, childrelid);
-relids_extra = lappend_int(relids_extra,
-		   TRUNCATE_REL_CONTEXT_CASCADING);
+relids_extra = lappend_int(relids_extra, 0);
 /* Log this relation only if needed for logical decoding */
 if (RelationIsLogicallyLogged(rel))
 	relids_logged = lappend_oid(relids_logged, childrelid);
@@ -1799,8 +1797,7 @@ ExecuteTruncateGuts(List *explicit_rels,
 truncate_check_activity(rel);
 rels = lappend(rels, rel);
 relids = lappend_oid(relids, relid);
-relids_extra = lappend_int(relids_extra,
-		   TRUNCATE_REL_CONTEXT_CASCADING);
+relids_extra = lappend_int(relids_extra, 0);
 /* Log this relation only if needed for logical decoding */
 if (RelationIsLogicallyLogged(rel))
 	relids_logged = lappend_oid(relids_logged, relid);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fb3ba5c415..986b634525 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1825,7 +1825,7 @@ apply_handle_truncate(StringInfo s)
 		remote_rels = lappend(remote_rels, rel);
 		rels = lappend(rels, rel->localrel);
 		relids = lappend_oid(relids, rel->localreloid);
-		relids_extra = lappend_int(relids_extra, TRUNCATE_REL_CONTEXT_NORMAL);
+		relids_extra = lappend_int(relids_extra, 0);
 		if (RelationIsLogicallyLogged(rel->localrel))
 			relids_logged = lappend_oid(relids_logged, rel->localreloid);
 
@@ -1864,7 +1864,7 @@ apply_handle_truncate(StringInfo s)
 rels = lappend(rels, childrel);
 part_rels = lappend(part_rels, childrel);
 relids = lappend_oid(relids, childrelid);
-relids_extra = lappend_int(relids_extra, TRUNCATE_REL_CONTEXT_CASCADING);
+relids_extra = lappend_int(relids_extra, 0);
 /* Log this relation only if

Re: Possible SSI bug in heap_update

2021-04-12 Thread Thomas Munro
On Mon, Apr 12, 2021 at 10:36 AM Thomas Munro  wrote:
> Yeah.  Patch attached.

Pushed.




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 5:37 PM Andres Freund  wrote:
> Well, one argument is that you made a fairly significant behavioural
> change, with hard-coded logic for when the optimization kicks in. It's
> not at all clear that your constants are the right ones for every
> workload.

(Apparently nobody wants to talk about HOT and the documentation.)

The BYPASS_THRESHOLD_PAGES cutoff was chosen conservatively, so that
it would avoid index vacuuming in truly marginal cases -- and it tends
to only delay it there.

A table-level threshold is not the best way of constraining the
problem. In the future, the table threshold should be treated as only
one factor among several. Plus there will be more than a simple yes/no
question to consider. We should eventually be able to do index
vacuuming for some indexes but not others. Bottom-up index deletion
has totally changed things here, because roughly speaking it makes
index bloat proportionate to the number of logical changes to indexed
columns -- you could have one super-bloated index on the table, but
several others that perfectly retain their original size. You still
need to do heap vacuuming eventually, which necessitates vacuuming
indexes too, but the right strategy is probably to vacuum much more
frequently, vacuuming the bloated index each time. You only do a full
round of index vacuuming when the table starts to accumulate way too
many LP_DEAD items. You need a much more sophisticated model for this.
It might also need to hook into autovacuum scheduling.

One of the dangers of high BYPASS_THRESHOLD_PAGES settings is that
it'll work well for some indexes but not others. To a dramatic degree,
even.

That said, nbtree isn't the only index AM, and it is hard to be
completely sure that you've caught everything. So an off switch seems
like a good idea now.

> We'll likely only get to know whether they're right in > 1 year
> - not having a real out at that point imo is somewhat scary.
>
> That said, adding more and more reloptions has a significant cost, so I
> don't think it's clear cut that it's the right decision to add
> one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
> BYPASS_THRESHOLD_PAGES?

I think that the right way to do this is to generalize INDEX_CLEANUP
to support a mode of operation that disallows vacuumlazy.c from
applying this optimization, as well as any similar optimizations which
will be added in the future.

Even if you don't buy my argument about directly parameterizing
BYPASS_THRESHOLD_PAGES undermining future work, allowing it to be set
much higher than 5% - 10% would be a pretty big footgun. It might
appear to help at first, but risks destabilizing things much later on.


--
Peter Geoghegan




Re: TRUNCATE on foreign table

2021-04-12 Thread Bharath Rupireddy
On Tue, Apr 13, 2021 at 6:27 AM Justin Pryzby  wrote:
>
> On Sun, Apr 11, 2021 at 03:45:36PM +0530, Bharath Rupireddy wrote:
> > On Sun, Apr 11, 2021 at 9:47 AM Justin Pryzby  wrote:
> > > Also, you currently test:
> > > > + if (extra & TRUNCATE_REL_CONTEXT_ONLY)
> > >
> > > but TRUNCATE_REL_ aren't independent bits, so shouldn't be tested with "&".
> >
> > Yeah this is an issue. We could just change the #defines to values
> > 0x0001, 0x0002, 0x0004, 0x0008 ... 0x0020 and so on and then testing
> > with & would work. So, this way, more than one option can be multiplexed
> > into the same int value. To multiplex, we need to think: will there be
> > a scenario where a single rel in the truncate can have multiple
> > options at a time and do we want to distinguish these options while
> > deparsing?
> >
> > #define TRUNCATE_REL_CONTEXT_NORMAL  0x0001 /* specified without
> > ONLY clause */
> > #define TRUNCATE_REL_CONTEXT_ONLY 0x0002 /* specified with
> > ONLY clause */
> > #define TRUNCATE_REL_CONTEXT_CASCADING0x0004   /* not specified
> > but truncated
> >
> > And I'm not sure what's the agreement on retaining or removing #define
> > values, currently I see only TRUNCATE_REL_CONTEXT_ONLY is being used,
> > others are just being set but not used. As I said upthread, it will be
> > good to remove the unused macros/enums, retain only the ones that are
> > used, especially TRUNCATE_REL_CONTEXT_CASCADING this is not required I
> > feel, because we can add the child partitions that are foreign tables
> > to relids as just normal foreign tables with TRUNCATE_REL_CONTEXT_ONLY
> > option.
>
> Converting to "bits" would collapse TRUNCATE_REL_CONTEXT_ONLY and
> TRUNCATE_REL_CONTEXT_NORMAL into a single bit.  TRUNCATE_REL_CONTEXT_CASCADING
> could optionally be removed.
>
> +1 to convert to bits instead of changing "&" to "==".

Thanks for the patches.

I agree to convert to bits and pass it as an int value, which is
extensible, i.e. we can pass more extra parameters across if required.
Also I'm not in favour of removing relids_extra altogether; we might
need it to send some info in the future.

Both 0001 and 0002 (along with the new phrasings) look good to me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Apr 12, 2021 at 9:19 AM Tom Lane  wrote:
>> Hence, I propose the attached.  It passes check-world, but that proves
>> absolutely nothing of course :-(.  I wonder if there is any way to
>> exercise these code paths deterministically.

> This approach seems reasonable to me. At least you've managed to
> structure the visibility map page pin check as concomitant with the
> existing space recheck.

Thanks for looking it over.  Do you have an opinion on whether or not
to back-patch?  As far as we know, these bugs aren't exposed in the
back branches for lack of code that would set the all-visible flag
without superexclusive lock.  But I'd still say that heap_update
is failing to honor its API contract in these edge cases, and that
seems like something that could bite us after future back-patches.
Or there might be third-party code that can set all-visible flags.
So I'm kind of tempted to back-patch, but it's a judgment call.

regards, tom lane



