Re: POC: Cleaning up orphaned files using undo logs

2021-09-21 Thread Antonin Houska
Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > Antonin Houska  wrote:
> >
> > > Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > > > * By throwing `make installcheck` at the patchset I'm getting, from
> > > >   time to time, an error on the restart:
> > > >
> > > > TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
> > > > File: "undorecordset.c", Line: 1098, PID: 6055)
> > > >
> > > >   From what I see XLogReadBufferForRedoExtended finds an invalid buffer 
> > > > and
> > > >   returns BLK_NOTFOUND. The commentary says:
> > > >
> > > >  If the block was not found, then it must be discarded later in
> > > >  the WAL.
> > > >
> > > >   and continues with skip = false, but fails to get a page from an
> > > >   invalid buffer a few lines later. It seems that the skip flag is
> > > >   supposed to be used in this situation; should it also guard the
> > > >   BufferGetPage part?
> > >
> > > I could see this at some point too, but can't reproduce it now. It's also
> > > not clear to me how XLogReadBufferForRedoExtended() can return BLK_NOTFOUND,
> > > as the whole undo log segment is created at once, even if only part of it is
> > > needed - see allocate_empty_undo_segment().
> >
> > I could eventually reproduce the problem. The root cause was that WAL records
> > were created even for temporary / unlogged undo, and thus only empty pages
> > could be found during replay. I've fixed that and also set up a regular test
> > for the BLK_NOTFOUND value. That required a few more fixes to UndoReplay().
> >
> > Attached is a new version.
> 
> Yep, makes sense, thanks. I have a few more questions:
> 
> * The use case with orphaned files is working somewhat differently after
>   the rebase on the latest master, do you observe it as well? The
>   difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
>   an orphaned relation file immediately (only later on checkpoint)
>   because of empty pendingUnlinks. I haven't investigated more yet, but
>   seems like after this commit:
> 
> commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
> Author: Thomas Munro 
> Date:   Mon Aug 2 17:32:20 2021 +1200
> 
> Run checkpointer and bgwriter in crash recovery.
> 
> Start up the checkpointer and bgwriter during crash recovery (except in
> --single mode), as we do for replication.  This wasn't done back in
> commit cdd46c76 out of caution.  Now it seems like a better idea to make
> the environment as similar as possible in both cases.  There may also be
> some performance advantages.
> 
>   something has to be updated (pendingOps are empty right now, so no
>   unlink request is remembered).

I haven't been debugging that part recently, but yes, this commit is relevant,
thanks for pointing that out! Attached is a patch that should fix it. I'll
include it in the next version of the patch series, unless you tell me that
something is still wrong.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/access/undo/undorecordset.c b/src/backend/access/undo/undorecordset.c
index 59eba7dfb6..9d05824141 100644
--- a/src/backend/access/undo/undorecordset.c
+++ b/src/backend/access/undo/undorecordset.c
@@ -2622,14 +2622,6 @@ ApplyPendingUndo(void)
 		}
 	}
 
-	/*
-	 * Some undo actions may unlink files. Since the checkpointer is not
-	 * guaranteed to be up, it seems simpler to process the undo request
-	 * ourselves in the way the checkpointer would do.
-	 */
-	SyncPreCheckpoint();
-	SyncPostCheckpoint();
-
 	/* Cleanup. */
 	chunktable_destroy(sets);
 }


Re: POC: Cleaning up orphaned files using undo logs

2021-09-21 Thread Dmitry Dolgov
On Tue, 21 Sep 2021 09:00 Antonin Houska,  wrote:

> Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > Yep, makes sense, thanks. I have a few more questions:
> >
> > * The use case with orphaned files is working somewhat differently after
> >   the rebase on the latest master, do you observe it as well? The
> >   difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
> >   an orphaned relation file immediately (only later on checkpoint)
> >   because of empty pendingUnlinks. I haven't investigated more yet, but
> >   seems like after this commit:
> >
> > commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
> > Author: Thomas Munro 
> > Date:   Mon Aug 2 17:32:20 2021 +1200
> >
> > Run checkpointer and bgwriter in crash recovery.
> >
> > Start up the checkpointer and bgwriter during crash recovery (except in
> > --single mode), as we do for replication.  This wasn't done back in
> > commit cdd46c76 out of caution.  Now it seems like a better idea to make
> > the environment as similar as possible in both cases.  There may also be
> > some performance advantages.
> >
> >   something has to be updated (pendingOps are empty right now, so no
> >   unlink request is remembered).
>
> I haven't been debugging that part recently, but yes, this commit is
> relevant,
> thanks for pointing that out! Attached is a patch that should fix it. I'll
> include it in the next version of the patch series, unless you tell me that
> something is still wrong.
>

Sure, but I can only take a look in a couple of days.



Re: Logical replication timeout problem

2021-09-21 Thread Fabrice Chapuis
If I understand correctly, the instruction to send a keepalive from the wal
sender has not been reached in the for loop; for what reason?
...
/* Check for replication timeout. */
  WalSndCheckTimeOut();

/* Send keepalive if the time has come */
  WalSndKeepaliveIfNecessary();
...

The data load is performed on a table which is not replicated; I do not
understand why the whole transaction linked to an insert is copied to snap
files, given that the table does not take part in the logical replication.
We are going to do a test by modifying the parameters
wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that
these parameters are global and changing them will also impact the physical
replication.
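
Concretely, the change would amount to something like the following
(wal_sender_timeout on the publisher, wal_receiver_timeout on the subscriber;
both can be changed with a reload, no restart needed):

    ALTER SYSTEM SET wal_sender_timeout = '5min';
    ALTER SYSTEM SET wal_receiver_timeout = '5min';
    SELECT pg_reload_conf();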

Concerning the walsender timeout, when the worker is started again after a
timeout, it will trigger a new walsender associated with it.

postgres 55680 12546  0 Sep20 ?00:00:02 postgres: aq: bgworker:
logical replication worker for subscription 24651602
postgres 55681 12546  0 Sep20 ?00:00:00 postgres: aq: wal sender
process repuser 127.0.0.1(57930) idle

Kind Regards

Fabrice

On Tue, Sep 21, 2021 at 8:38 AM Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 9:43 PM Fabrice Chapuis 
> wrote:
> >
> > By setting the autovacuum parameter to off, the problem did not occur
> > right after loading the table as in our previous tests. However, the
> > timeout occurred later. We have seen an accumulation of .snap files
> > amounting to several GB.
> >
> > ...
> > -rw---. 1 postgres postgres 16791226 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F500.snap
> > -rw---. 1 postgres postgres 16973268 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F600.snap
> > -rw---. 1 postgres postgres 16790984 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F700.snap
> > -rw---. 1 postgres postgres 16988112 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F800.snap
> > -rw---. 1 postgres postgres 16864593 Sep 20 15:26
> xid-1238444701-lsn-2D2B-F900.snap
> > -rw---. 1 postgres postgres 16902167 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FA00.snap
> > -rw---. 1 postgres postgres 16914638 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FB00.snap
> > -rw---. 1 postgres postgres 16782471 Sep 20 15:26
> xid-1238444701-lsn-2D2B-FC00.snap
> > -rw---. 1 postgres postgres 16963667 Sep 20 15:27
> xid-1238444701-lsn-2D2B-FD00.snap
> > ...
> >
>
> Okay, still not sure why the publisher is not sending keep_alive
> messages in between spilling such a big transaction. If you see, we
> have logic in WalSndLoop() wherein each time after sending data we
> check whether we need to send a keep-alive message via function
> WalSndKeepaliveIfNecessary(). I think to debug this problem further
> you need to add some logs in function WalSndKeepaliveIfNecessary() to
> see why it is not sending keep_alive messages when all these files are
> being created.
>
> Did you change the default value of
> wal_sender_timeout/wal_receiver_timeout? What is the value of those
> variables in your environment? Did you see the message "terminating
> walsender process due to replication timeout" in your server logs?
>
> --
> With Regards,
> Amit Kapila.
>


Re: row filtering for logical replication

2021-09-21 Thread Amit Kapila
On Tue, Sep 21, 2021 at 11:16 AM Dilip Kumar  wrote:
>
> On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila  wrote:
> >
> > I think the point is if for some expression some
> > values are in old tuple and others are in new then the idea proposed
> > in the patch seems sane. Moreover, I think in your idea for each tuple
> > we might need to build a new expression and sometimes twice that will
> > beat the purpose of cache we have kept in the patch and I am not sure
> > if it is less costly.
>
> Basically, expression initialization should happen only once in most
> cases so with my suggestion you might have to do it twice.
>

No, the situation will be that we might have to do it twice per update,
whereas now it is just done at the very first operation on a
relation.

>  But the
> overhead of extra expression evaluation is far less than doing
> duplicate evaluation because that will happen for sending each update
> operation right?
>

Expression evaluation has to be done twice because every update can
have a different set of values in the old and new tuple.
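
To make that concrete with the syntax proposed in this patch, as I understand
it (table name, publication name and values below are only illustrative):

    CREATE PUBLICATION pub FOR TABLE t WHERE (a > 10 AND b < 20);

    -- For an update such as
    UPDATE t SET a = 5 WHERE a = 15;
    -- the filter has to be evaluated once against the old tuple (a = 15) and
    -- once against the new tuple (a = 5), and the two results can differ.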

> > See another example where splitting filter might not give desired results:
> >
> > Say filter expression: (a = 10 and b = 20 and c = 30)
> >
> > Now, old_tuple has values for columns a and c and say values are 10
> > and 30. So, the old_tuple will match the filter if we split it as per
> > your suggestion. Now say new_tuple has values (a = 5, b = 15, c = 25).
> > In such a situation dividing the filter will give us the result that
> > the old_tuple is matching but new tuple is not matching which seems
> > incorrect. I think dividing filter conditions among old and new tuples
> > might not retain its sanctity.
>
> Yeah, that is a good example for applying the filter to both tuples; basically,
> some filters might not even get evaluated on new tuples, as in the above
> example, and if we have removed such an expression for the other tuple we
> might break something.
>

Right.

>  Maybe for now this suggests that we might not
> be able to avoid the duplicate execution of the expression
>

So, IIUC, you agreed that let's proceed with the proposed approach and
we can later do optimizations if possible or if we get better ideas.

> > > >
> > > > Even if it were done, there would still be the overhead of deforming 
> > > > the tuple.
> > >
> > > Suppose the filter is just (a > 10 and b < 20) and only a is
> > > updated; if we are able to modify the filter for the old tuple to
> > > be just (a > 10), do we still need to deform?
> > >
> >
> > Without deforming, how will you determine which columns are part of
> > the old tuple?
>
> Okay, then we might have to deform, but at least are we ensuring that
> once we have deformed the tuple for the expression evaluation we
> are not doing that again while sending the tuple?
>

I think this is possible but we might want to be careful not to send
extra unchanged values as we are doing now.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-21 Thread Amit Kapila
On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis  wrote:
>
> If I understand correctly, the instruction to send a keepalive from the wal
> sender has not been reached in the for loop; for what reason?
> ...
> /* Check for replication timeout. */
>   WalSndCheckTimeOut();
>
> /* Send keepalive if the time has come */
>   WalSndKeepaliveIfNecessary();
> ...
>

Are you sure that these functions have not been called? Or is the case
that they are called but for some reason the keep-alive is not
sent? IIUC, these are called after processing each WAL record, so I am not
sure how it is possible in your case that they are not reached.

> The data load is performed on a table which is not replicated; I do not
> understand why the whole transaction linked to an insert is copied to snap
> files, given that the table does not take part in the logical replication.
>

It is because we don't know till the end of the transaction (where we
start sending the data) whether the table will be replicated or not. I
think specifically for this purpose the new 'streaming' feature
introduced in PG-14 will help us to avoid writing data of such tables
to snap/spill files. See 'streaming' option in Create Subscription
docs [1].
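
For example, assuming both servers run PG-14 (the subscription, publication and
connection details below are just placeholders), the option can be enabled with:

    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=appdb'
        PUBLICATION mypub
        WITH (streaming = on);

    -- or, for an existing subscription:
    ALTER SUBSCRIPTION mysub SET (streaming = on);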

> We are going to do a test by modifying parameters 
> wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that 
> these parameters are global and changing them will also impact the physical 
> replication.
>

Do you mean you are planning to change from 1 minute to 5 minutes? I
agree with the global nature of parameters and I think your approach
to finding out the root cause is good here because otherwise, under
some similar or more heavy workload, it might lead to the same
situation.

> Concerning the walsender timeout, when the worker is started again after a 
> timeout, it will trigger a new walsender associated with it.
>

Right, I know that but I was curious to know if the walsender has
exited before walreceiver.

[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html

-- 
With Regards,
Amit Kapila.




Re: POC: Cleaning up orphaned files using undo logs

2021-09-21 Thread Amit Kapila
On Mon, Sep 20, 2021 at 10:24 AM Antonin Houska  wrote:
>
> Amit Kapila  wrote:
>
> > On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > >
> > > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > >
> > > * What happened with the idea of abandoning discard worker for the sake
> > >   of simplicity? From what I see limiting everything to foreground undo
> > >   could reduce the core of the patch series to the first four patches
> > >   (forgetting about test and docs, but I guess it would be enough at
> > >   least for the design review), which is already less overwhelming.
> > >
> >
> > I think the discard worker would be required even if we decide to
> > apply all the undo in the foreground. We need to forget/remove the
> > undo of committed transactions as well which we can't remove
> > immediately after the commit.
>
> I think I proposed foreground discarding at some point, but you reminded me
> that the undo may still be needed for some time even after transaction
> commit. Thus the discard worker is indispensable.
>

Right.

> What we can do without, at least for the cleanup of the orphaned files, is the
> *undo worker*. In this patch series the cleanup is handled by the startup process.
>

Okay, I think various people at different points in time have suggested
that idea. One thing we might need to consider is what to do in case of a
FATAL error: in that case it won't be advisable to execute the undo
immediately, so would we upgrade the error to PANIC? I vaguely remember
that for the cleanup of orphaned files this can happen only rarely, and
someone suggested upgrading the error to PANIC in such a case, but I don't
remember the exact details.

-- 
With Regards,
Amit Kapila.




Re: psql: tab completion differs on semicolon placement

2021-09-21 Thread Dagfinn Ilmari Mannsåker
David Fetter  writes:

> On Mon, Sep 20, 2021 at 08:26:51PM +0100, Dagfinn Ilmari Mannsåker wrote:
>
>> The same applies to any completion after a MatchAny that ends in any
>> of the WORD_BREAKS characters (except whitespace and (), which are
>> handled specially).
>> 
>> #define WORD_BREAKS "\t\n@$><=;|&{() "
>> 
>> IMO a fix should be more principled than just special-casing semicolon
>> and CREATE TABLE.  Maybe get_previous_words() should stop when it sees
>> an unquoted semicolon?
>
> Is there some reason get_previous_words() shouldn't stop for
> everything that's WORD_BREAKS? If not, making that the test might make the
> general rule a little simpler to write, and if WORD_BREAKS ever
> changed, for example to include all space, or all breaking space, or
> similar, the consequences would at least not propagate through
> seemingly unrelated code.

By "stopping" I meant ignoring everything before the last semicolon when
splitting the buffer into words, i.e. not putting them into the
previous_words array, so they're not considered by the
(Tail|Head)?Matches(CS)? macros.  WORD_BREAKS is the list of characters
used for splitting the input buffer into the previous_words array, so it
would need to keep going past those, or you'd only be able to match the
last word when tab completing, rendering the entire exercise pointless.

> At the moment, get_previous_words() does look for everything in
> WORD_BREAKS, and then accounts for double quotes (") and then does
> something clever to account for double quotes and the quoting behavior
> that doubling them ("") accomplishes. Anyhow, that looks like it
> should work in this case, but clearly it's not.

WORD_BREAK characters inside double-quoted identifiers are handled
correctly, but only after you've typed the closing quote.  If you have
an ambiguous prefix that contains a WORD_BREAK character, you can't
tab-complete the rest:

ilmari@[local]:5432 ~=# drop table "foo
"foo$bar"  "foo$zot"  "foo-bar"  "foo-zot"
ilmari@[local]:5432 ~=# drop table "foo-
"foo-bar"  "foo-zot"
ilmari@[local]:5432 ~=# drop table "foo$

ilmari@[local]:5432 ~=# drop table "foo$bar" 
cascade   restrict

Tangentially, I would argue that $ shouldn't be a WORD_BREAK character,
since it's valid in unquoted identifiers (except at the start, just like
numbers).  But you do need to quote such identifiers when
tab-completing, since quote_ident() quotes anything that's not all
lowercase letters, underscores and numbers.
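
For instance, a quick way to see that behaviour (easy to verify in psql):

SELECT quote_ident('foo$bar'), quote_ident('foo_bar');
 quote_ident | quote_ident
-------------+-------------
 "foo$bar"   | foo_bar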

- ilmari




Re: logical replication restrictions

2021-09-21 Thread Marcos Pegoraro
No, I'm talking about the configuration you can have on standby servers:
recovery_min_apply_delay = '8h'
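
On a physical standby that is simply (the value is only an example, and the
setting can be changed with a reload):

    ALTER SYSTEM SET recovery_min_apply_delay = '8h';
    SELECT pg_reload_conf();

There is currently no equivalent per-subscription option on the logical
replication side, which is the restriction being discussed here.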

Kind regards,




On Mon, Sep 20, 2021 at 11:44 PM Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 9:47 PM Marcos Pegoraro  wrote:
> >
> > One thing that is needed and not solved yet is delayed replication for
> > logical replication. It would be interesting to document it on the
> > Restrictions page, right?
> >
>
> What do you mean by delayed replication? Is it that by default we send
> the transactions at commit?
>
> --
> With Regards,
> Amit Kapila.
>


Re: row filtering for logical replication

2021-09-21 Thread Dilip Kumar
On Tue, Sep 21, 2021 at 2:34 PM Amit Kapila  wrote:
>
> On Tue, Sep 21, 2021 at 11:16 AM Dilip Kumar  wrote:
> >
> > On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila  wrote:
> > >
> > > I think the point is if for some expression some
> > > values are in old tuple and others are in new then the idea proposed
> > > in the patch seems sane. Moreover, I think in your idea for each tuple
> > > we might need to build a new expression and sometimes twice that will
> > > beat the purpose of cache we have kept in the patch and I am not sure
> > > if it is less costly.
> >
> > Basically, expression initialization should happen only once in most
> > cases so with my suggestion you might have to do it twice.
> >
>
> No, the situation will be that we might have to do it twice per update,
> whereas now it is just done at the very first operation on a
> relation.

Yeah, right.  Actually, I mean it will not get initialized for decoding
each tuple, so instead of once it will be done twice; but anyway, now
we agree that we cannot proceed in this direction because of the
issue you pointed out.

> >  Maybe for now this suggests that we might not
> > be able to avoid the duplicate execution of the expression
> >
>
> So, IIUC, you agreed that let's proceed with the proposed approach and
> we can later do optimizations if possible or if we get better ideas.

Make sense.

> > Okay, then we might have to deform, but at least are we ensuring that
> > once we have deformed the tuple for the expression evaluation we
> > are not doing that again while sending the tuple?
> >
>
> I think this is possible but we might want to be careful not to send
> extra unchanged values as we are doing now.

Right.

Some more comments,

In pgoutput_row_filter_update(), first we are deforming the tuple into
local datums, then modifying the tuple, and then reforming the tuple.
I think we can surely do better here.  Currently, you are reforming
the tuple so that you can store it in the scan slot by calling
ExecStoreHeapTuple, which will be used for expression evaluation.
Instead of that, what you need to do is to deform the tuple into the
tts_values of the scan slot and later call ExecStoreVirtualTuple().  The
advantages are: 1) you don't need to reform the tuple; 2) the expression
evaluation machinery doesn't need to deform again to fetch the value of
an attribute, instead it can get the value directly from the virtual
tuple.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: mem context is not reset between extended stats

2021-09-21 Thread Tomas Vondra
On 9/21/21 3:37 AM, Justin Pryzby wrote:
> On Tue, Sep 21, 2021 at 02:15:45AM +0200, Tomas Vondra wrote:
>> On 9/15/21 10:09 PM, Justin Pryzby wrote:
>>> Memory allocation appeared to be O(1) WRT the number of statistics objects,
>>> which was not what I expected.  This is true in v13 (and probably back to v10).
> 
> Of course I meant to say that it's O(N) and not O(1) :)
> 

Sure, I got that ;-)

>> In principle we don't expect too many extended statistics on a single table,
> 
> Yes, but note that expression statistics make it more reasonable to have
> multiple extended stats objects.  I noticed this while testing a patch to
> build (I think) 7 stats objects on each of our current month's partitions.
> autovacuum was repeatedly killed on this vm after using 2+GB RAM,
> probably in part because there were multiple autovacuum workers handling the
> most recent batch of inserted tables.
> 
> First, I tried to determine what specifically was leaking so badly, and
> eventually converged to this patch.  Maybe there's additional subcontexts 
> which
> would be useful, but the minimum is to reset between objects.
> 

Agreed.

I don't think there's much we could release, given the current design,
because we evaluate (and process) all expressions at once. We might
evaluate/process them one by one (and release the memory), but only when
no other statistics kinds are requested. That seems pretty futile.
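
For reference, the pattern Justin describes is easy to set up along these
lines (object names are invented; the expression statistics need PG 14).
ANALYZE builds all the statistics objects for a table in one pass, so without
the reset the memory footprint grows with the number of objects:

CREATE TABLE part_2021_09 (a int, b int, c int);
CREATE STATISTICS s1 (ndistinct, dependencies) ON a, b, c FROM part_2021_09;
CREATE STATISTICS s2 ON (a + b), (a * c) FROM part_2021_09;  -- expression stats
-- ... several more statistics objects on the same table ...
ANALYZE part_2021_09;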


>> These issues exist pretty much since PG10, which is where extended stats
>> were introduced, so we'll have to backpatch it. But there's no rush and I
>> don't want to interfere with rc1 at the moment.
> 
> Ack that.  It'd be *nice* if the fix were included in v14.0, but I don't
> know the rules about what can change after rc1.
> 

IMO this is a bugfix, and I'll get it into 14.0 (and backpatch). But I
don't want to interfere with the rc1 tagging and release, so I'll do
that later this week.

>> Attached are two patches - 0001 is your patch (seems fine, but I looked only
>> very briefly) and 0002 is the context reset I proposed.
> 
> I noticed there seems to be a 3rd patch available, which might either be junk
> for testing or a cool new feature I'll hear about later ;)
> 

Haha! Nope, that was just an experiment with doubling the repalloc()
sizes in functional dependencies, instead of growing them in tiny
chunks. But it does not make a measurable difference, so I haven't
included it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Allow multiple recursive self-references

2021-09-21 Thread Denis Hirn
> I studied up on the theory and terminology being discussed here.  I conclude 
> that what the latest version of this patch is doing (allowing multiple 
> recursive references, but only in a linear-recursion way) is sound and useful.

That's great to hear!

> I haven't looked closely at the implementation changes yet.  I'm not very 
> familiar with that part of the code, so it will take a bit longer. Perhaps 
> you could explain what you are doing there, either in email or (even better) 
> in the commit message.

I have extended the commit message. Some more discussion (regarding tree
rotation etc.) can be found in this mail, as well as in the commit message.

> What I think this patch needs is a lot more test cases.  I would like to see 
> more variants of different nestings of the UNION branches, some mixtures of 
> UNION ALL and UNION DISTINCT, joins of the recursive CTE with regular tables 
> (like in the flights-and-trains examples)

You are right, the testing is a bit sparse at the moment. I've added some more
test cases with the new version of this patch. Also I've improved the comments.

> as well as various cases of what is not allowed, namely showing that it can 
> carefully prohibit non-linear recursion.

The regression tests already feature tests against non-linear recursion.
Therefore I did not add more myself.

> Also, in one instance you modify an existing test case.  I'm not sure why.  
> Perhaps it would be better to leave the existing test case alone and write a 
> new one.

I agree that it would be better not to modify the existing test case, but the
modification is unavoidable. Here are the original queries:

> -- non-linear recursion is not allowed
> WITH RECURSIVE foo(i) AS
> (values (1)
> UNION ALL
>(SELECT i+1 FROM foo WHERE i < 10
>   UNION ALL
>SELECT i+1 FROM foo WHERE i < 5)
> ) SELECT * FROM foo;

> WITH RECURSIVE foo(i) AS
> (values (1)
> UNION ALL
>  SELECT * FROM
>(SELECT i+1 FROM foo WHERE i < 10
>   UNION ALL
>SELECT i+1 FROM foo WHERE i < 5) AS t
> ) SELECT * FROM foo;

These two test cases are supposed to trigger the non-linear recursion check,
and they do without this patch. However, with the modifications this patch
introduces, both queries are now valid input. That's because each recursive
reference is placed inside a separate recursive UNION branch. This means that
both queries are linear recursive, and not non-linear recursive as they should 
be.

To make these tests check for non-linear recursion again, at least one
UNION branch must contain more than one recursive reference. That's the
modification I did.



> You also introduce this concept of reshuffling the UNION branches to collect 
> all the recursive terms under one subtree.

Yes, but the reshuffling is only applied in a very specific situation. Example:

>        UNION        --->        UNION
>       /     \                  /     \
>    UNION     C                A     UNION
>    /   \                            /   \
>   A     B                          B     C

A, B, and C are arbitrary SelectStmt nodes and can themselves be deeper-nested
UNION nodes.

A is a non-recursive term in the WITH RECURSIVE query. B and C both contain a
recursive self-reference. The planner expects the top UNION node to contain
the non-recursive term in the larg, and the recursive term in the rarg.
Therefore, the tree shape on the left is invalid and would result in an error
message at the parsing stage. However, by rotating the tree to the right, this
problem can be solved, so that the valid tree shape on the right side is created.

All this does, really, is to re-parenthesize the expression:
(A UNION B) UNION C ---> A UNION (B UNION C)
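
In SQL terms, a query shaped like the left-hand tree (the values are made up)
is currently rejected, but with this patch, if I describe it correctly, the
rotation turns it into the right-hand shape and it is accepted as linear
recursion:

WITH RECURSIVE t(n) AS (
    (VALUES (1)                        -- A: non-recursive term
     UNION ALL
     SELECT n+1 FROM t WHERE n < 5)    -- B: recursive term
    UNION ALL
    SELECT n*10 FROM t WHERE n < 3     -- C: recursive term
) SELECT * FROM t;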



> Also, currently a query like this works [...] but this doesn't:
> 
> WITH RECURSIVE t(n) AS (
>SELECT n+1 FROM t WHERE n < 100
> UNION ALL
>VALUES (1)
> )
> SELECT sum(n) FROM t;
> 
> With your patch, the second should also work, so let's show some tests for 
> that as well.

With just the tree rotation, the second query cannot be fixed. The order of two
nodes is never changed, and I think that this is a good thing. Consider the
following query:

> WITH RECURSIVE t(n) AS (
> VALUES (1)
>   UNION ALL
> SELECT n+1 FROM t WHERE n < 100
>   UNION ALL
> VALUES (2)
> ) SELECT * FROM t LIMIT 100;

If we'd just collect all non-recursive UNION branches, the semantics of the
second query would change. But changing the semantics of a query (or preventing
certain queries from being formulated at all) is not something I think this
patch should do. Therefore, I think it's appropriate that the second query fails.

Best,
  -- Denis



0009-Add-SQL-standard-multiple-linear-self-references-in-.patch
Description: Binary data


Re: row filtering for logical replication

2021-09-21 Thread Dilip Kumar
On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar  wrote:
> Some more comments,
>
> In pgoutput_row_filter_update(), first, we are deforming the tuple in
> local datum, then modifying the tuple, and then reforming the tuple.
> I think we can surely do better here.  Currently, you are reforming
> the tuple so that you can store it in the scan slot by calling
> ExecStoreHeapTuple which will be used for expression evaluation.
> Instead of that what you need to do is to deform the tuple using
> tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
> advantages are 1) you don't need to reform the tuple 2) the expression
> evaluation machinery doesn't need to deform again for fetching the
> value of the attribute, instead it can directly get from the value
> from the virtual tuple.
>

I have one more question. While looking into the
ExtractReplicaIdentity() function, it seems that if any of the "rep
ident key" fields is changed then we write all the key fields into
the WAL as part of the old tuple, not just the changed fields.  That
means either the old tuple will be NULL or it will have all the
key attributes.  So, if we are supporting the filter only on the "rep
ident key" fields, is there any need to copy the fields from the new
tuple to the old tuple?
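
A quick way to see that WAL behaviour (table and index names are invented):

CREATE TABLE t (a int NOT NULL, b int NOT NULL, c int);
CREATE UNIQUE INDEX t_a_b_idx ON t (a, b);
ALTER TABLE t REPLICA IDENTITY USING INDEX t_a_b_idx;

-- Changing a key column logs the complete key (a and b) in the old tuple:
UPDATE t SET a = a + 1 WHERE c = 0;
-- Changing only a non-key column logs no old tuple at all:
UPDATE t SET c = c + 1 WHERE a = 1;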

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
Thanks for the newer set of the patches Robert!

I was wondering if we should change the bbs_buffer_length in bbsink to
be size_t instead of int, because that's how most of the compression
libraries define their length variables.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas  wrote:

> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar  wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: Added schema level support for publication.

2021-09-21 Thread Greg Nancarrow
On Tue, Sep 21, 2021 at 4:12 PM vignesh C  wrote:
>
> > (1)
> > In get_object_address_publication_schema(), the error message:
> >
> > + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> > does not exist",
> >
> > isn't grammatically correct. It should probably be:
> >
> > + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> > not exist",
>
> "does not exist" is used across the file. Should we keep it like that
> to maintain consistency. Thoughts?
>

When it's singular, "does not exist" is correct.
I think currently only this case exists in the publication code.
e.g.
"publication \"%s\" does not exist"
"publication relation \"%s\" in publication \"%s\" does not exist"

But "publication tables" is plural, so it needs to say "do not exist"
rather than "does not exist".

> >
> > In the case of "if (!(*nspname))", the following line should probably
> > be added before returning false:
> >
> >*pubname = NULL;
>
> In case of failure we return false and don't access it. I felt we
> could keep it as it is. Thoughts?
>

OK then, I might be being a bit pedantic.
(I was just thinking that, strictly speaking, we probably shouldn't be
writing into the caller's pass-by-reference parameters in the case where
false is returned.)

> > (5)
> > I'm wondering, in CheckObjSchemaNotAlreadyInPublication(), instead of
> > checking "checkobjtype" each loop iteration, wouldn't it be better to
> > just use the same for-loop in each IF block?
>
> I will be changing it to:
> static void
> CheckObjSchemaNotAlreadyInPublication(List *rels, List *schemaidlist,
>                                       PublicationObjSpecType checkobjtype)
> {
>   ListCell   *lc;
>
>   foreach(lc, rels)
>   {
>     Relation  rel = (Relation) lfirst(lc);
>     Oid       relSchemaId = RelationGetNamespace(rel);
>
>     if (list_member_oid(schemaidlist, relSchemaId))
>     {
>       if (checkobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
>         ereport(ERROR,
>                 errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                 errmsg("cannot add schema \"%s\" to publication",
>                        get_namespace_name(relSchemaId)),
>                 errdetail("Table \"%s\" in schema \"%s\" is already part of the publication, adding the same schema is not supported.",
>                           RelationGetRelationName(rel),
>                           get_namespace_name(relSchemaId)));
>       else if (checkobjtype == PUBLICATIONOBJ_TABLE)
>         ereport(ERROR,
>                 errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                 errmsg("cannot add relation \"%s.%s\" to publication",
>                        get_namespace_name(relSchemaId),
>                        RelationGetRelationName(rel)),
>                 errdetail("Table's schema \"%s\" is already part of the publication.",
>                           get_namespace_name(relSchemaId)));
>     }
>   }
> }
> After the change checkobjtype will be checked only once in case of error.
>

OK.

One thing related to this code is the following:

i)
postgres=# create publication pub1 for all tables in schema sch1,
table sch1.test;
ERROR:  cannot add relation "sch1.test" to publication
DETAIL:  Table's schema "sch1" is already part of the publication.

ii)
postgres=# create publication pub1 for table sch1.test, all tables in
schema sch1;
ERROR:  cannot add relation "sch1.test" to publication
DETAIL:  Table's schema "sch1" is already part of the publication.

Notice that in case (ii), the same error message is used, but the
order of items to be "added" to the publication is the reverse of case
(i), and really implies the table "sch1.test" was added first, but
this is not reflected by the error message. So it seems slightly odd
to say the schema is already part of the publication, when the table
was actually listed first.
I'm wondering if this can be improved?

One idea I had was the following more generic type of message, but I'm
not 100% happy with the wording:

DETAIL:  Schema "sch1" and one of its tables can't separately be
part of the publication.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Daniel Gustafsson
> On 21 Sep 2021, at 08:50, Pavel Stehule  wrote:
> On Mon, Sep 20, 2021 at 14:10 Daniel Gustafsson  wrote:

> +   printf(_("  --filter=FILENAMEdump objects and data based 
> on the filter expressions\n"
> +"   from the filter 
> file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on its own.  If we go with
> the former then we might not want a separate option for just --filter.
> 
> I prefer to keep the two files separate. Although there is some intersection,
> I think it is good to have two simple separate files for two really different
> tasks. This one does filtering, and it should be controlled by the option
> "--filter". When the implementation changes, this option can change too.
> Filtering is just a pg_dump-related feature. Revising the client application
> configuration is a much more generic task, and if we mix the two into one, we
> can end up in a trap. It can be hard to find one good format for both large
> script-generated content and possibly hand-written structured content. For
> practical reasons it can be good to have two files too. Filters and
> configurations can have different life cycles.

I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way.  Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like a problem waiting
to happen, so I think we need to settle how we want to interact with this file
before anything goes in.

Any thoughts from those in the thread who have had strong opinions on config
files etc?

> +if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.  This is
> not a hot path, we can afford the string comparison in case of errors.  Having
> the string comparison done inline here will improve readability, saving the
> reader from jumping to another function to see what it does.
> 
> I agree that this is not a hot path, but I don't feel comfortable making
> a zero-terminated string just for a pg_strcasecmp comparison. The current
> design reduces malloc/free cycles. This pattern is used in more places where
> Postgres parses strings - the SQL parser, the plpgsql parser. I am not sure
> about the benefits and costs - pg_strcasecmp can be more readable, but for any
> keyword I would have to call pstrdup and pfree. Is it necessary? My opinion on
> this part is not too strong - it is a minor issue, and maybe I just weigh the
> benefits and costs a bit differently in this specific case; if you really
> think the benefits of rewriting are higher, I'll do it.

Sorry, I typoed my response.  What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around.  So
basically end up with the below in read_filter_item():

+   /* Now we expect sequence of two keywords */
+   if (pg_strncasecmp(keyword, "include", size) == 0)
+   *is_include = true;

> +initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of 
> pg_dump
> does?
> 
> The StringInfo is used because I use the pg_get_line_buf function, and this 
> function uses this API.

Ah, of course.

A few other comments from another pass over this:

+   exit_nicely(-1);
Why -1?  pg_dump (and all other binaries) exits with 1 on IMO even more serious
errors so I think this should use 1 as well.


+   if (!pg_get_line_buf(fstate->fp, line))
+   {
+   if (ferror(fstate->fp))
+   fatal("could not read from file \"%s\": %m", 
fstate->filename);
+
+   exit_invalid_filter_format(fstate,"unexpected end of file");
+   }
In the ferror() case this codepath isn't running fclose() on the file pointer
(unless stdin) which we do elsewhere, so this should use pg_log_error and
exit_nicely instead.


+   pg_log_fatal("could not read from file \"%s\": %m", fstate->filename);
Based on how other errors are treated in pg_dump I think this should be
downgraded to a pg_log_error.

The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server.  What do you
think about this version?  I'm still not convinced that there aren't more
quoting bugs in the parser, but I've left that intact for now.

--
Daniel Gustafsson   https://vmware.com/



0001-Implement-filter-for-pg_dump.patch
Description: Binary data


Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-09-21 Thread torikoshia

On 2021-09-16 08:40, Tom Lane wrote:

I wrote:

I do not think that patch is a proper solution, but we do need to do
something about this.


I poked into this and decided it's an ancient omission within ruleutils.c.
The reason we've not seen it before is probably that you can't get to the
case through the parser.  The SEARCH stuff is generating a query structure
basically equivalent to

regression=# with recursive cte (x,r) as (
select 42 as x, row(i, 2.3) as r from generate_series(1,3) i
union all
select x, row((c.r).f1, 4.5) from cte c
)
select * from cte;
ERROR:  record type has not been registered

and as you can see, expandRecordVariable fails to figure out what
the referent of "c.r" is.  I think that could be fixed (by looking
into the non-recursive term), but given the lack of field demand,
I'm not feeling that it's urgent.

So the omission is pretty obvious from the misleading comment:
actually, Vars referencing RTE_CTE RTEs can also appear in WorkTableScan
nodes, and we're not doing anything to support that.  But we only reach
this code when trying to resolve a field of a Var of RECORD type, which
is a case that it seems like the parser can't produce.

It doesn't look too hard to fix: we just have to find the RecursiveUnion
that goes with the WorkTableScan, and drill down into that, much as we
would do in the CteScan case.  See attached draft patch.  I'm too tired
to beat on this heavily or add a test case, but I have verified that it
passes check-world and handles the example presented in this thread.

regards, tom lane


Thanks for looking into this and fixing it!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Pavel Stehule
On Tue, Sep 21, 2021 at 14:37 Daniel Gustafsson  wrote:

> > On 21 Sep 2021, at 08:50, Pavel Stehule  wrote:
> > On Mon, Sep 20, 2021 at 14:10 Daniel Gustafsson  wrote:
>
> > +   printf(_("  --filter=FILENAMEdump objects and data
> based on the filter expressions\n"
> > +"   from the filter
> file\n"));
> > Before we settle on --filter I think we need to conclude whether this
> > file is intended to be included from a config file, or used on its own.
> > If we go with the former then we might not want a separate option for
> > just --filter.
> >
> > I prefer to separate two files. Although there is some intersection, I
> think it is good to have two simple separate files for two really different
> tasks.
> > It does filtering, and it should be controlled by option "--filter".
> When the implementation will be changed, then this option can be changed
> too.
> > Filtering is just a pg_dump related feature. Revision of client
> application configuration is a much more generic task, and if we mix it to
> one, we can be
> > in a trap. It can be hard to find one good format for large script
> generated content, and possibly hand written structured content. For
> practical
> > reasons it can be good to have two files too. Filters and configurations
> can have different life cycles.
>
> I'm not convinced that we can/should change or remove a commandline
> parameter
> in a coming version when there might be scripts expecting it to work in a
> specific way.  Having a --filter as well as a --config where the
> configfile can
> refer to the filterfile also passed via --filter sounds like problem
> waiting to
> happen, so I think we need to settle how we want to interact with this file
> before anything goes in.
>
> Any thoughts from those in the thread who have had strong opinions on
> config
> files etc?
>
> > +if (filter_is_keyword(keyword, size, "include"))
> > I would prefer if this function call was replaced by just the
> pg_strcasecmp()
> > call in filter_is_keyword() and the strlen optimization there removed.
> The is
> > not a hot-path, we can afford the string comparison in case of errors.
> Having
> > the string comparison done inline here will improve readability saving
> the
> > reading from jumping to another function to see what it does.
> >
> > I agree that this is not a hot-path, just I don't feel well if I need to
> make a zero end string just for comparison pg_strcasecmp. Current design
> reduces malloc/free cycles. It is used in more places, when Postgres parses
> strings - SQL parser, plpgsql parser. I am not sure about the benefits and
> costs - pg_strcasecmp can be more readable, but for any keyword I have to
> call pstrdup and pfree. Is it necessary? My opinion in this part is not too
> strong - it is a minor issue, maybe I have a little bit different feelings
> about benefits and costs in this specific case, and if you really think the
> benefits of rewriting are higher, I'll do it
>
> Sorry, I typoed my response.  What I meant was to move the pg_strncasecmp
> call
> inline and not do the strlen check, to save readers from jumping around.
> So
> basically end up with the below in read_filter_item():
>
> +   /* Now we expect sequence of two keywords */
> +   if (pg_strncasecmp(keyword, "include", size) == 0)
> +   *is_include = true;
>
>
I don't think that is safe (strict) enough. pg_strncasecmp(..) alone is also
true for keywords like "includex", "includedsss", ...  You should compare the
size too.

Regards

Pavel




Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
>
> >> + /*
> >> + * LZ4F_compressUpdate() returns the number of bytes written into output
> >> + * buffer. We need to keep track of how many bytes have been cumulatively
> >> + * written into the output buffer (bytes_written). But,
> >> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
> >> + * written to output buffer, set autoFlush to 1 to force the writing to the
> >> + * output buffer.
> >> + */
> >> + prefs->autoFlush = 1;
> >>
> >> I don't see why this should be necessary. Elsewhere you have code that
> >> caters to bytes being stuck inside LZ4's buffer, so why do we also
> >> require this?
> >
> > This is needed to know the actual bytes written in the output buffer. If
> > it is set to 0, then LZ4F_compressUpdate() would randomly return 0 or the
> > actual bytes written to the output buffer, depending on whether it has
> > buffered or really flushed data to the output buffer.
>
> The problem is that if we autoflush, I think it will cause the
> compression ratio to be less good. Try un-lz4ing a file that is
> produced this way and then re-lz4 it and compare the size of the
> re-lz4'd file to the original one. Compressors rely on postponing
> decisions about how to compress until they've seen as much of the
> input as possible, and flushing forces them to decide earlier, and
> maybe making a decision that isn't as good as it could have been. So I
> believe we should look for a way of avoiding this. Now I realize
> there's a problem there with doing that and also making sure the
> output buffer is large enough, and I'm not quite sure how we solve
> that problem, but there is probably a way to do it.
>

Yes, you are right here, and I could verify this fact with an experiment.
When autoflush is 1, the file gets less compressed, i.e. the compressed file
is larger than the one generated when autoflush is set to 0.
But, as of now, I can't think of a solution, as we really need to know how
many bytes have been written to the output buffer so that we can keep
writing into it.

Regards,
Jeevan Ladhe


Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Alvaro Herrera
I definitely agree that we should have two files, one for config and
another one for filter, since their purposes are orthogonal and their
formats are likely different; trying to cram the filter specification in
the config file seems unfriendly because it'd force users to write the
filter in whatever alien grammar is used for the config file.  Also, this
would make it easier to use a single config file with a bunch of
different filter files.

On 2021-Sep-21, Daniel Gustafsson wrote:

> I'm not convinced that we can/should change or remove a commandline parameter
> in a coming version when there might be scripts expecting it to work in a
> specific way.  Having a --filter as well as a --config where the configfile 
> can
> refer to the filterfile also passed via --filter sounds like problem waiting 
> to
> happen, so I think we need to settle how we want to interact with this file
> before anything goes in.

I think both the filter and the hypothetical config file are going to
interact (be redundant) with almost all already existing switches, and
there's no need to talk about removing anything (e.g., nobody would
argue for the removal of "-t" even though that's redundant with the
filter file).

I see no problem with the config file specifying a filter file.

AFAICS if the config file specifies a filter and the user also specifies
a filter in the command line, we have two easy options: raise an error
about the redundant option, or have the command line option supersede
the one in the config file.  The latter strikes me as the more useful
behavior, and it's in line with what other tools do in similar cases, so
that's what I propose doing.

(There might be less easy options too, such as somehow combining the two
filters, but offhand I don't see any reason why this is real-world
useful, so I don't propose doing that.)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."(Robert Davidson)
   http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php




Re: PostgreSQL High Precision Support Extension.

2021-09-21 Thread Andrew Dunstan


On 9/20/21 9:29 PM, A Z wrote:
> I have been trying to get a reply or interest in either updating
> PostgreSQL to support the following,
> or for there to be a public, free for any use Extension put out there,
> that will support the following.
> Can someone able and interested please respond to me about the
> following project specification,
> which I am very keen to see happen:


Please stop posting the same thing over and over. It doesn't help you,
in fact it's likely to put off anyone who might be interested in your
project. If you haven't got an answer by now you should conclude that
nobody here is interested.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
Hi Robert,

Here is a patch for lz4 based on the v5 set of patches. The patch adapts to
the bbsink changes, and is now able to provide the required length for the
output buffer using the new callback function bbsink_lz4_begin_backup().

Sample command to take backup:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Please let me know your thoughts.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas  wrote:

> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar  wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


lz4_compress_v2.patch
Description: Binary data


Re: Refactoring postgres_fdw code to rollback remote transaction

2021-09-21 Thread Fujii Masao




On 2021/09/17 15:33, Bharath Rupireddy wrote:

On Fri, Sep 17, 2021 at 8:28 AM Fujii Masao  wrote:


On 2021/09/17 11:40, Zhihong Yu wrote:

+   goto fail;  /* Trouble clearing prepared statements */

The label fail can be removed. Under the above condition,  
entry->changing_xact_state is still true. You can directly return.


Thanks for the review! Yes, you're right. Attached is the updated version of
the patch.


+1 for the code refactoring (1 file changed, 75 insertions(+), 102
deletions(-)).

The v2 patch looks good to me as is.


Thanks for the review! Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: refactoring basebackup.c

2021-09-21 Thread Robert Haas
On Tue, Sep 14, 2021 at 11:30 AM Sergei Kornilov  wrote:
> I found that in 0001 you propose to rename a few options. Perhaps we could
> rename another option for clarity? I think the FAST (is it about some bw
> limits?) and WAIT (wait for what? a checkpoint?) option names are confusing.
> Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with
> WAIT_WAL_ARCHIVED? I think such names would be more descriptive.

I think CHECKPOINT { 'spread' | 'fast' } is probably a good idea; the
options logic for pg_basebackup uses the same convention, and if
somebody ever wanted to introduce a third kind of checkpoint, it would
be a lot easier if you could just make pg_basebackup -cbanana send
CHECKPOINT 'banana' to the server. I don't think renaming WAIT ->
WAIT_WAL_ARCHIVED has much value. The replication grammar isn't really
intended to be consumed directly by end-users, and it's also not clear
that WAIT_WAL_ARCHIVED would attract more support than any of 5 or 10
other possible variants. I'd rather leave it alone.

> -   if (PQserverVersion(conn) >= 10)
> -       /* pg_recvlogical doesn't use an exported snapshot, so suppress */
> -       appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
> +   /* pg_recvlogical doesn't use an exported snapshot, so suppress */
> +   if (use_new_option_syntax)
> +       AppendStringCommandOption(query, use_new_option_syntax,
> +                                 "SNAPSHOT", "nothing");
> +   else
> +       AppendPlainCommandOption(query, use_new_option_syntax,
> +                                "NOEXPORT_SNAPSHOT");
>
> In 0002, it looks like condition for 9.x releases was lost?

Good catch, thanks.

I'll post an updated version of these two patches on the thread
dedicated to those two patches, which can be found at
http://postgr.es/m/CA+Tgmob2cbCPNbqGoixp0J6aib0p00XZerswGZwx-5G=0m+...@mail.gmail.com

> Also my gcc version 8.3.0 is not happy with 
> v5-0007-Support-base-backup-targets.patch and produces:
>
> basebackup.c: In function ‘parse_basebackup_options’:
> basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>errmsg("target '%s' does not accept a target detail",
>^~

OK, I'll fix that. Thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Logical replication timeout problem

2021-09-21 Thread Fabrice Chapuis
> IIUC, these are called after processing each WAL record so not
> sure how is it possible in your case that these are not reached?

I don't know; as you say, to pinpoint the problem we would have to debug
the WalSndKeepaliveIfNecessary function.

> I was curious to know if the walsender has exited before walreceiver

During the last tests we made we didn't observe any timeout of the wal
sender process.

> Do you mean you are planning to change from 1 minute to 5 minutes?

We set wal_sender_timeout/wal_receiver_timeout to 5 minutes and launched a new
test. The result is surprising and rather positive: there is no timeout any
more in the log, and the 20 GB of snap files are removed in less than 5
minutes. How can that behaviour be explained? Why are the snap files suddenly
consumed so quickly?
I chose the values for wal_sender_timeout/wal_receiver_timeout arbitrarily;
are these values appropriate from your point of view?

Best Regards

Fabrice



On Tue, Sep 21, 2021 at 11:52 AM Amit Kapila 
wrote:

> On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis 
> wrote:
> >
> > If I understand, the instruction to send keep alive by the wal sender
> has not been reached in the for loop, for what reason?
> > ...
> > /* Check for replication timeout. */
> >   WalSndCheckTimeOut();
> >
> > /* Send keepalive if the time has come */
> >   WalSndKeepaliveIfNecessary();
> > ...
> >
>
> Are you sure that these functions have not been called? Or the case is
> that these are called but due to some reason the keep-alive is not
> sent? IIUC, these are called after processing each WAL record so not
> sure how is it possible in your case that these are not reached?
>
> > The data load is performed on a table which is not replicated, I do not
> understand why the whole transaction linked to an insert is copied to snap
> files given that table does not take part of the logical replication.
> >
>
> It is because we don't know till the end of the transaction (where we
> start sending the data) whether the table will be replicated or not. I
> think specifically for this purpose the new 'streaming' feature
> introduced in PG-14 will help us to avoid writing data of such tables
> to snap/spill files. See 'streaming' option in Create Subscription
> docs [1].
>
> > We are going to do a test by modifying parameters
> wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that
> these parameters are global and changing them will also impact the physical
> replication.
> >
>
> Do you mean you are planning to change from 1 minute to 5 minutes? I
> agree with the global nature of parameters and I think your approach
> to finding out the root cause is good here because otherwise, under
> some similar or more heavy workload, it might lead to the same
> situation.
>
> > Concerning the walsender timeout, when the worker is started again after
> a timeout, it will trigger a new walsender associated with it.
> >
>
> Right, I know that but I was curious to know if the walsender has
> exited before walreceiver.
>
> [1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html
>
> --
> With Regards,
> Amit Kapila.
>


Re: Estimating HugePages Requirements?

2021-09-21 Thread Bossart, Nathan
On 9/20/21, 6:48 PM, "Michael Paquier"  wrote:
> Thanks.  I have gone through the last patch this morning, did some
> tests on all the platforms I have at hand (including Linux) and
> finished by applying it after doing some small tweaks.  First, I have 
> finished by extending GetHugePageSize() to accept NULL for its first
> argument hugepagesize.  A second thing was in the docs, where it is
> still useful IMO to keep the reference to /proc/meminfo and
> /sys/kernel/mm/hugepages to let users know how the system impacts the
> calculation of the new GUC.
>
> Let's see what the buildfarm thinks about it.

Thanks!

Nathan



Re: Estimating HugePages Requirements?

2021-09-21 Thread Bossart, Nathan
On 9/20/21, 7:29 PM, "Michael Paquier"  wrote:
> On Tue, Sep 21, 2021 at 12:08:22AM +, Bossart, Nathan wrote:
>> Should we also initialize the shared memory GUCs in bootstrap and
>> single-user mode?  I think I missed this in bd17880.
>
> Why would we need that for the bootstrap mode?
>
> While looking at the patch for shared_memory_size, I have looked at
> those code paths to note that some of the runtime GUCs would be set
> thanks to the load of the control file, but supporting this case
> sounded rather limited to me for --single when it came to shared
> memory and huge page estimation and we don't load
> shared_preload_libraries in this context either, which could lead to
> wrong estimations.  Anyway, I am not going to fight hard if people
> would like that for the --single mode, even if it may lead to an
> underestimation of the shmem allocated.

I was looking at this from the standpoint of keeping the startup steps
consistent between the modes.  Looking again, I can't think of
a strong reason to add it to bootstrap mode.  I think the case for
adding it to single-user mode is a bit stronger, as commands like
"SHOW shared_memory_size;" currently return 0.  I lean in favor of
adding it for single-user mode, but it's probably fine either way.

Nathan



Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Tomas Vondra
On 9/21/21 3:28 PM, Alvaro Herrera wrote:
> I definitely agree that we should have two files, one for config and
> another one for filter, since their purposes are orthogonal and their
> formats are likely different; trying to cram the filter specification in
> the config file seems unfriendly because it'd force users to write the
> filter in whatever alien grammar used for the config file.  Also, this
> would make it easier to use a single config file with a bunch of
> different filter files.
> 

+1, that is pretty much exactly what I argued for not too long ago.

> On 2021-Sep-21, Daniel Gustafsson wrote:
> 
>> I'm not convinced that we can/should change or remove a commandline parameter
>> in a coming version when there might be scripts expecting it to work in a
>> specific way.  Having a --filter as well as a --config where the configfile 
>> can
>> refer to the filterfile also passed via --filter sounds like problem waiting 
>> to
>> happen, so I think we need to settle how we want to interact with this file
>> before anything goes in.
> 
> I think both the filter and the hypothetical config file are going to
> interact (be redundant) with almost all already existing switches, and
> there's no need to talk about removing anything (e.g., nobody would
> argue for the removal of "-t" even though that's redundant with the
> filter file).
> 
> I see no problem with the config file specifying a filter file.
> 
> AFAICS if the config file specifies a filter and the user also specifies
> a filter in the command line, we have two easy options: raise an error
> about the redundant option, or have the command line option supersede
> the one in the config file.  The latter strikes me as the more useful
> behavior, and it's in line with what other tools do in similar cases, so
> that's what I propose doing.
> 
> (There might be less easy options too, such as somehow combining the two
> filters, but offhand I don't see any reason why this is real-world
> useful, so I don't propose doing that.)
> 

Well, I think we already have to make decisions like that, because you can
do e.g. this:

pg_dump -T t -t t

So we already do combine the switches, and we do this:

When both -t and -T are given, the behavior is to dump just the
tables that match at least one -t switch but no -T switches. If -T
appears without -t, then tables matching -T are excluded from what
is otherwise a normal dump.

That seems fairly reasonable, and I don't see why we wouldn't use the same
logic for combining patterns no matter where we got them (filter file,
command-line option, etc.).

Just combine everything, and then check whether any "exclude" rule matches.
If yes, we're done - exclude. If not, check whether an "include" rule matches.
If not, still exclude. Otherwise include.
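
To make the precedence explicit, a literal sketch of that decision rule for a
single object could look like this (the function and flag names are made up
for illustration, they are not from any posted patch):

/*
 * Decide whether to dump one object, given which of the combined
 * include/exclude patterns matched it.  A matching exclude always wins;
 * if no include pattern matched, the object is skipped as well.
 */
static bool
should_dump_object(bool matches_exclude, bool matches_include)
{
    if (matches_exclude)
        return false;       /* "if yes, we're done - exclude" */
    if (!matches_include)
        return false;       /* "if not, still exclude" */
    return true;            /* otherwise include */
}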

Seems reasonable and consistent to me, and I don't see why we wouldn't allow
multiple --filter parameters.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Deduplicate code updating ControleFile's DBState.

2021-09-21 Thread Bossart, Nathan
On 9/20/21, 10:07 PM, "Amul Sul"  wrote:
> On Tue, Sep 21, 2021 at 4:44 AM Bossart, Nathan  wrote:
>> On 9/19/21, 11:07 PM, "Amul Sul"  wrote:
>> > I have one additional concern about the way we update the control
>> > file, at every place where doing the update, we need to set control
>> > file update time explicitly, why can't the time update line be moved
>> > to UpdateControlFile() so that time gets automatically updated?
>>
>> I see a few places where UpdateControlFile() is called without
>> updating ControlFile->time.  I haven't found any obvious reason for
>> that, so perhaps it would be okay to move it to update_controlfile().
>>
>
> Ok, thanks, did the same in the attached version.

void
UpdateControlFile(void)
{
+   ControlFile->time = (pg_time_t) time(NULL);
update_controlfile(DataDir, ControlFile, true);
}

Shouldn't we update the time in update_controlfile()?  Also, can you
split this change into two patches (i.e., one for the timestamp change
and another for the refactoring)?  Otherwise, this looks reasonable to
me.

Nathan



Re: Queries that should be canceled will get stuck on secure_write function

2021-09-21 Thread 蔡梦娟(玊于)
Hi all, I would like to know your opinion on this patch, or in what way you think
we should solve this problem?
--
From: 蔡梦娟(玊于)
Sent: Thursday, September 9, 2021 17:38
To: Robert Haas ; Andres Freund ; alvherre ; masao.fujii
Cc: pgsql-hackers
Subject: Re: Queries that should be canceled will get stuck on secure_write function


I changed the implementation for this problem:
a) if the cancel query interrupt comes from the database side for some reason, such as a
recovery conflict, then handle it immediately, and the cancel request is treated as a
terminate request;
b) if the cancel query interrupt comes from the client, then ignore it as before.

In the attached patch, I also added a TAP test that generates a recovery conflict on
a standby while the backend process is stuck on a client write, and checks whether it
can handle the cancel query request caused by the recovery conflict.

What do you think of it? I hope to get your reply.

Thanks & Best Regards




0001-Handle-cancel-interrupts-during-client-readwrite.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Pavel Stehule
Hi


> The above comments are fixed in the attached, as well as a pass over the
> docs
> and extended tests to actually test matching a foreign server.  What do
> think
> about this version?  I'm still not convinced that there aren't more quoting
> bugs in the parser, but I've left that intact for now.
>

The problematic points are double quotes and the newline character. Anything
else is just a sequence of bytes.

I have just one note on your patch. When you use pg_strncasecmp, you have
to check the length too:


char   *xxx = "incl";
int xxx_size = 4;

elog(NOTICE, "%d",
  pg_strncasecmp(xxx, "include", xxx_size) == 0);

result is NOTICE:  1

"incl" is not keyword "include"

Regards

Pavel



>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: extensible options syntax for replication parser?

2021-09-21 Thread Robert Haas
On Fri, Sep 10, 2021 at 3:44 PM Robert Haas  wrote:
> Last call for complaints about either the overall direction or the
> specific implementation choices...

A complaint showed up over at
http://postgr.es/m/979131631633...@mail.yandex.ru and pursuant to that
complaint I have made the new syntax for controlling the checkpoint
type look like CHECKPOINT { 'fast' | 'spread' } rather than just
having an option called FAST. It was suggested over there to also
rename WAIT to WAIT_WAL_ARCHIVED, but I don't like that for reasons
explained on that thread and so have not adopted that proposal.

Sergei also helpfully pointed out that I'd accidentally deleted a
version check in one place, so this version is also updated to not do
that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v6-0001-Flexible-options-for-BASE_BACKUP.patch
Description: Binary data


v6-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-21 Thread Robert Haas
On Tue, Sep 21, 2021 at 9:08 AM Jeevan Ladhe
 wrote:
> Yes, you are right here, and I could verify this fact with an experiment.
> When autoflush is 1, the file gets less compressed i.e. the compressed file
> is of more size than the one generated when autoflush is set to 0.
> But, as of now, I couldn't think of a solution as we need to really advance 
> the
> bytes written to the output buffer so that we can write into the output 
> buffer.

I don't understand why you think we need to do that. What happens if
you just change prefs->autoFlush = 1 to set it to 0 instead? What I
think will happen is that you'll call LZ4F_compressUpdate a bunch of
times without outputting anything, and then suddenly one of the calls
will produce a bunch of output all at once. But so what? I don't see
that anything in bbsink_lz4_archive_contents() would get broken by
that.

It would be a problem if LZ4F_compressUpdate() didn't produce anything
and also didn't buffer the data internally, and expected us to keep
the input around. That we would have difficulty doing, because we
wouldn't be calling LZ4F_compressUpdate() if we didn't need to free up
some space in that sink's input buffer. But if it buffers the data
internally, I don't know why we care.
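
For what it's worth, here is a minimal sketch of the calling pattern I have in
mind (not the patch's code, just an illustration of the lz4frame.h contract;
error reporting is reduced to a comment):

/*
 * With prefs->autoFlush = 0, LZ4F_compressUpdate() may return 0 for many
 * calls while it buffers input internally, and then emit a larger block
 * later.  The caller only needs to advance by whatever it returns.
 */
static size_t
compress_one_chunk(LZ4F_cctx *ctx, char *out, size_t out_avail,
                   const char *in, size_t in_len)
{
    size_t      nwritten;

    nwritten = LZ4F_compressUpdate(ctx, out, out_avail, in, in_len, NULL);
    if (LZ4F_isError(nwritten))
        return 0;               /* caller should report LZ4F_getErrorName(nwritten) */
    return nwritten;            /* frequently 0, and that is fine */
}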

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improve logging when using Huge Pages

2021-09-21 Thread Fujii Masao




On 2021/09/20 17:55, Shinoda, Noriyoshi (PN Japan FSIP) wrote:

I was worried about the same thing as you.
So the attached patch gets the value of the kernel parameter vm.nr_hugepages,
and if it is the default value of zero, the log level is the same as before.
On the other hand, if any value is set, the level is set to LOG.


I think I understand your point. But isn't it more confusing to users,
because whether the log message is output or not changes depending on
the setting of the kernel parameter?  For example, when vm.nr_hugepages=0
and no log message about huge pages is output, users might easily misunderstand
that shared memory was successfully allocated with huge pages because
they saw no such log message.

IMO we should leave the log message "mmap(%zu) with MAP_HUGETLB failed..."
as it is if users don't like additional log message output whenever
the server restarts. In this case, instead, maybe it's better to provide GUC or
something to report whether shared memory was successfully allocated
with huge pages or not.

OTOH, if users can accept such additional log message, I think that it's
less confusing to report something like "disable huge pages ..." whenever
mmap() with huge pages fails. I still prefer "disable huge pages ..." to
"fall back ..." as the log message, but if many people think the latter is
better, I'd follow that.

Another idea is to output "Anonymous shared memory was allocated with
huge pages" when it's successfully allocated with huge pages, and to output
"Anonymous shared memory was allocated without huge pages"
when it's successfully allocated without huge pages. I'm not sure if users
may think even this message is noisy, though.

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: refactoring basebackup.c

2021-09-21 Thread Robert Haas
On Tue, Sep 21, 2021 at 9:35 AM Jeevan Ladhe
 wrote:
> Here is a patch for lz4 based on the v5 set of patches. The patch adapts with 
> the
> bbsink changes, and is now able to make the provision for the required length
> for the output buffer using the new callback function 
> bbsink_lz4_begin_backup().
>
> Sample command to take backup:
> pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
>
> Please let me know your thoughts.

This pretty much looks right, with the exception of the autoFlush
thing about which I sent a separate email. I need to write docs for
all of this, and ideally test cases. It might also be good if
pg_basebackup had an option to un-gzip or un-lz4 archives, but I
haven't thought too hard about what would be required to make that
work.

+ if (opt->compression == BACKUP_COMPRESSION_LZ4)

else if

+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);

I think this is wrong. I think you should be passing bbs_buffer_length
instead of CHUNK_SIZE, and I think you can just delete CHUNK_SIZE. If
you think otherwise, why?

+ * sink's bbs_buffer of length that can accomodate the compressed input

Spelling.

+ * Make it next multiple of BLCKSZ since the buffer length is expected so.

The buffer length is expected to be a multiple of BLCKSZ, so round up.

+ * If we are falling short of available bytes needed by
+ * LZ4F_compressUpdate() per the upper bound that is decided by
+ * LZ4F_compressBound(), send the archived contents to the next sink to
+ * process it further.

If the number of available bytes has fallen below the value computed
by LZ4F_compressBound(), ask the next sink to process the data so that
we can empty the buffer.
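
In other words, something roughly like this (only LZ4F_compressBound() is real
here; the buffer bookkeeping variables and the flush helper are made up for
illustration):

if (LZ4F_compressBound(next_chunk_len, &prefs) > out_capacity - out_used)
{
    flush_to_next_sink(out_buffer, out_used);   /* hypothetical helper */
    out_used = 0;
}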

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: .ready and .done files considered harmful

2021-09-21 Thread Robert Haas
On Mon, Sep 20, 2021 at 4:42 PM Alvaro Herrera  wrote:
> I was going to say that perhaps we can avoid repeated scans by having a
> bitmap of future files that were found by a scan; so if we need to do
> one scan, we keep track of the presence of the next (say) 64 files in
> our timeline, and then we only have to do another scan when we need to
> archive a file that wasn't present the last time we scanned.

There are two different proposed patches on this thread. One of them
works exactly that way, and the other one tries to optimize by
assuming that if we just archived WAL file N, we likely will next
want to archive WAL file N+1. It's been hard to decide which way is
better.
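
Very roughly, the first approach amounts to something like this (a sketch of
the idea only, using the usual XLogSegNo/uint64 typedefs; the struct and
function names, and the 64-segment window, are illustrative rather than taken
from either patch):

typedef struct ArchiveReadyCache
{
    XLogSegNo   base_segno;     /* segment where the remembered window starts */
    uint64      present;        /* bit i set => base_segno + i had a .ready file */
} ArchiveReadyCache;

/*
 * Returns true only if the cached window covers segno and that segment was
 * seen as .ready during the last scan; any false result sends the caller
 * back to a directory scan.
 */
static bool
segment_known_ready(const ArchiveReadyCache *cache, XLogSegNo segno)
{
    if (segno < cache->base_segno || segno >= cache->base_segno + 64)
        return false;
    return (cache->present & (UINT64CONST(1) << (segno - cache->base_segno))) != 0;
}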

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-21 Thread Tom Lane
"Joel Jacobson"  writes:
> [ 0005-regexp-positions.patch ]

I took a quick look at this patch.  I am on board now with the general
idea of returning match positions, but I think there are still
definitional issues to be discussed.

1. The main idea we might perhaps want to adopt from the Oracle-ish
regexp functions is a search-start-position argument.  I'm not sure
that this is exciting --- if we intend this function to be a close
analog of regexp_matches(), it'd be better to leave that off.  But
it deserves explicit consideration.

2. It looks like you modeled this on regexp_matches() to the extent
of returning a set and using the 'g' flag to control whether the
set can actually contain more than one row.  That's pretty ancient
precedent ... but we have revisited that design more than once
because regexp_matches() is just too much of a pain to use.  I think
if we're going to do this, we should learn from history, and provide an
analog to regexp_match() as well as regexp_matches() right off the bat.

3. The API convention you chose (separate start and length arrays)
is perhaps still confusing.  When I first looked at the test case

+SELECT regexp_positions('foobarbequebaz', $re$(bar)(beque)$re$);
+ regexp_positions  
+---
+ ("{4,7}","{3,5}")
+(1 row)

I thought it was all wrong because it seemed to be identifying
the substrings 'barbequ' and 'obarb'.  If there'd been a different
number of matches than capture groups, maybe I'd not have been
confused, but still...  I wonder if we'd be better advised to make
N capture groups produce N two-element arrays, or else mash it all
into one array of N*2 elements.  But this probably depends on which
way is the easiest to work with in SQL.

4. Not sure about the handling of sub-matches.
There are various plausible definitions we could use:

* We return the position/length of the overall match, never mind
about any parenthesized subexpressions.  This is simple but I think
it loses significant functionality.  As an example, you might have
a pattern like 'ab*(c*)d+' where what you actually want to know
is where the 'c's are, but they have to be in the context described
by the rest of the regexp.  Without subexpression-match capability
that's painful to do.

* If there's a capturing subexpression, return the position/length
of the first such subexpression, else return the overall match.
This matches the behavior of substring().

* If there are capturing subexpression(s), return the
positions/lengths of those, otherwise return the overall match.
This agrees with the behavior of regexp_match(es), so I'd tend
to lean to this option, but perhaps it's the hardest to use.

* Return the position/length of the overall match *and* those of
each capturing subexpression.  This is the most flexible choice,
but I give it low marks since it matches no existing behaviors.


As for comments on the patch itself:

* The documentation includes an extraneous entry for regexp_replace,
and it fails to add the promised paragraph to functions-posix-regexp.

* This bit is evidently copied from regexp_matches:

+/* be sure to copy the input string into the multi-call ctx */
+matchctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,

regexp_matches needs to save the input string so that
build_regexp_match_result can copy parts of that.  But
regexp_positions has no such need AFAICS, so I think we
don't need a long-lived copy of the string.

* I wouldn't use these names in build_regexp_positions_result:

+ArrayType  *so_ary;
+ArrayType  *eo_ary;

"so_ary" isn't awful, but it invites confusion with regex's "so"
field, which hasn't got the same semantics (off by one).
"eo_ary" is pretty bad because it isn't an "end offset" at all, but
a length.  I'd go for "start_ary" and "length_ary" or some such.

* Test cases seem a bit thin, notably there's no coverage of the
null-subexpression code path.

regards, tom lane




Re: Added schema level support for publication.

2021-09-21 Thread vignesh C
On Mon, Sep 20, 2021 at 4:20 PM vignesh C  wrote:
>
> On Mon, Sep 20, 2021 at 3:57 PM Amit Kapila  wrote:
> >
> > On Fri, Sep 17, 2021 at 5:40 PM vignesh C  wrote:
> > >
> > > On Thu, Sep 16, 2021 at 9:54 AM Amit Kapila  
> > > wrote:
> > > >
> > > > I think there is one more similar locking problem.
> > > > AlterPublicationSchemas()
> > > > {
> > > > ..
> > > > + if (stmt->action == DEFELEM_ADD)
> > > > + {
> > > > + List *rels;
> > > > +
> > > > + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> > > > + RelSchemaIsMemberOfSchemaList(rels, schemaidlist, true);
> > > > ...
> > > > ...
> > > > }
> > > >
> > > > Here, we don't have a lock on the relation. So, what if the relation
> > > > is concurrently dropped after you get the rel list by
> > > > GetPublicationRelations?
> > >
> > > This works fine without locking even after concurrent drop, I felt
> > > this works because of MVCC.
> > >
> >
> > Can you share the exact scenario you have tested? I think here it can
> > give a wrong error because it might access invalid cache entry, so I
> > think a lock is required here. Also, as said before, this might help
> > to make the rel list consistent in function
> > CheckObjSchemaNotAlreadyInPublication().
>
> This is the scenario I tried:
> create schema sch1;
> create table sch1.t1 (c1 int);
> create publication pub1 for table sch1.t1;
> alter publication pub1 add all tables in schema sch1;  -- concurrently
> drop table sch1.t1 from another session.
>
> I will add the locking and changing of
> CheckObjSchemaNotAlreadyInPublication in the next version.

I have made the changes for the above in the v30 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm3aFtXpkD4m28-ENG9F4faBEVdGNUrEhgKV0pHr2S_C2g%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-21 Thread vignesh C
On Tue, Sep 21, 2021 at 9:03 AM Greg Nancarrow  wrote:
>
> On Fri, Sep 17, 2021 at 10:09 PM vignesh C  wrote:
> >
> > Attached v29 patch has the fixes for the same.
> >
>
> Some minor comments on the v29-0002 patch:
>
> (1)
> In get_object_address_publication_schema(), the error message:
>
> + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> does not exist",
>
> isn't grammatically correct. It should probably be:
>
> + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> not exist",

Modified

> (2)
> getPublicationSchemaInfo() function header.
> "string" should be "strings"
>
> BEFORE:
> + * nspname. Both pubname and nspname are palloc'd string which will be freed 
> by
> AFTER:
> + * nspname. Both pubname and nspname are palloc'd strings which will
> be freed by

Modified

> (3)
> getPublicationSchemaInfo()
>
> In the case of "if (!(*nspname))", the following line should probably
> be added before returning false:
>
>*pubname = NULL;

I left it as it is, since we don't access it when false is returned.

> (4)
> GetAllSchemasPublicationRelations() function header
>
> Shouldn't:
>
> + * Gets the list of all relations published by FOR ALL TABLES IN SCHEMA
> + * publication(s).
>
> actually say:
>
> + * Gets the list of all relations published by a FOR ALL TABLES IN SCHEMA
> + * publication.
>
> since it is for a specified publication?

Modified

> (5)
> I'm wondering, in CheckObjSchemaNotAlreadyInPublication(), instead of
> checking "checkobjtype" each loop iteration, wouldn't it be better to
> just use the same for-loop in each IF block?

Modified

I have made the changes for the above in the v30 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm3aFtXpkD4m28-ENG9F4faBEVdGNUrEhgKV0pHr2S_C2g%40mail.gmail.com

Regards,
Vignesh




Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Ranier Vilela
Hi,

Currently when determining where CoerceToDomainValue can be read,
it evaluates every step in a loop.
But, I think that the expression is immutable and should be solved only
once.

Otherwise the logic is wrong since by the rules of C, even though the
variable is
being initialized in the declaration, it still receives initialization at
each repetition.
What causes palloc running multiple times.

In other words:
Datum   *domainval = NULL;

is the same:
Datum   *domainval;
domainval = NULL;

While at it, the patch also reduces the scope of save_innermost_domainval and
save_innermost_domainnull.
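
To illustrate the intent with a simplified sketch (this is not the actual
execExpr.c code, only the shape of the change): with the pointer declared
once, outside the per-constraint loop, the one-time setup really runs only
once:

Datum      *domainval = NULL;   /* one lifespan for all constraints */
bool       *domainnull = NULL;
ListCell   *lc;

foreach(lc, constraints)        /* one iteration per CHECK constraint */
{
    if (domainval == NULL)
    {
        /* one-time setup: palloc the workspace, emit EEOP_MAKE_READONLY */
        domainval = (Datum *) palloc(sizeof(Datum));
        domainnull = (bool *) palloc(sizeof(bool));
    }

    /* every CHECK expression then reads the same read-only value */
}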

Thoughts?

regards,
Ranier Vilela


fix_eval_expr_once.patch
Description: Binary data


windows build slow due to windows.h includes

2021-09-21 Thread Andres Freund
Hi,

For the AIO stuff I needed to build postgres for windows. And I was a bit
horrified by the long compile times. At first I was ready to blame the MS
compiler for being slow, until I noticed that using mingw gcc from linux to
cross compile to windows is also a *lot* slower than building for linux.

I found some blog-post-documented-only compiler flags [1], most importantly
/d1reportTime. Which shows that the include processing of postgres.h takes
0.6s [2]

Basically all the time in a debug windows build is spent parsing windows.h and
related headers. Argh.

The amount of stuff we include in win32_port.h and declare is pretty absurd
imo. There's really no need to expose the whole backend to all of it. Most of
it should just be needed in a few port/ files and a few select users.

But that's too much work for my taste. As it turns out there's a partial
solution to windows.h being just so damn big, the delightfully named
WIN32_LEAN_AND_MEAN.

This reduces the non-incremental buildtime in my 8 core windows VM from 187s to
140s. Cross compiling from linux it's
master:
real    0m53.807s
user    22m16.930s
sys     2m50.264s
WIN32_LEAN_AND_MEAN
real    0m32.956s
user    12m17.773s
sys     1m52.313s

Still far from !windows compile times, but still not a bad improvement.

Most of the compile time after this is still spent doing parsing /
preprocessing. I sidetracked myself into looking at precompiled headers, but
it's not trivial to do that right unfortunately.


I think it'd be good if win32_port.h were slimmed down, and more of its
contents were moved into fake "port/win32/$name-of-unix-header" style headers
or such.


Greetings,

Andres Freund

[1] https://aras-p.info/blog/2019/01/21/Another-cool-MSVC-flag-d1reportTime/

[2]

postgres.c
Include Headers:
Count: 483
c:\Users\anfreund\src\postgres\src\include\postgres.h: 0.561795s
c:\Users\anfreund\src\postgres\src\include\c.h: 0.556991s
c:\Users\anfreund\src\postgres\src\include\postgres_ext.h: 0.000488s
c:\Users\anfreund\src\postgres\src\include\pg_config_ext.h: 0.000151s
c:\Users\anfreund\src\postgres\src\include\pg_config.h: 0.000551s
c:\Users\anfreund\src\postgres\src\include\pg_config_manual.h: 0.000286s
c:\Users\anfreund\src\postgres\src\include\pg_config_os.h: 0.014283s
C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\crtdefs.h: 0.009727s
...
c:\Users\anfreund\src\postgres\src\include\port\win32_port.h: 0.487469s
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\um\winsock2.h: 0.449373s
...
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\um\windows.h: 0.439666s

diff --git i/src/include/port/win32_port.h w/src/include/port/win32_port.h
index 05c5a534420..b5a44519d98 100644
--- i/src/include/port/win32_port.h
+++ w/src/include/port/win32_port.h
@@ -43,6 +43,8 @@
 #define _WINSOCKAPI_
 #endif
 
+#define WIN32_LEAN_AND_MEAN
+
 #include 
 #include 
 #include 


Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-09-21 Thread Andrew Dunstan


On 9/15/21 10:38 AM, Mark Dilger wrote:
>> On Aug 31, 2021, at 6:41 PM, Mark Dilger  
>> wrote:
>>
>> 
> Synopsis:
>
> The major change in version 7 is a reworking of role ownership and the 
> CREATEROLE attribute to make it no longer a near-superuser privilege.  This 
> new functionality is in v7-0021.
>
> Details:
>
> The changes in version 7 of the patchset are:
>
> v7-0001 is a new patch that introduces a single new regression test covering 
> various aspects of the permissions system surrounding creating, altering, 
> dropping and granting membership in roles.  The functional changes in v7-0021 
> do not cause pre-existing regression test failures, not even when running 
> check-world, despite fundamentally changing how much of this works.  This new 
> test adds coverage for create role, and as each patch in the series 
> introduces changes, is modified to reflect them.
>
> v6-0001 through v6-0019 correspond to v7-0002 through v7-0020 and are mostly 
> unchanged, but are updated to apply cleanly to the current git master, to fix 
> a bug that was present in the v6 patch set, to update the regression tests 
> for security labels where CREATEROLE is used, and to update the create_role 
> regression test from v7-0001 as needed per patch.
>
> v7-0021 redesigns the CREATEROLE attribute to no longer bestow nearly so much 
> power.  The ability to alter or drop a role no longer flows from having the 
> CREATEROLE attribute, but rather from being the role's owner.  The ADMIN 
> option works as before, but role owners implicitly have ADMIN on roles which 
> they own.
>
> Roles with the CREATEROLE attribute may create new roles, but those new roles 
> may not be created with privileges which the creating role lacks.  
> Specifically, SUPERUSER, REPLICATION, BYPASSRLS, CREATEDB, CREATEROLE and 
> LOGIN privilege may not be granted the new role unless the creating role has 
> them.  (This rule is adhered to but trivial in the case of the CREATEROLE 
> privilege, since the creator must necessarily have that one.)  When creating 
> a new role using the IN ROLE, ROLE, or ADMIN clauses, the creating role must 
> have sufficient privileges on the roles named by these clauses to perform the 
> GRANTs these roles entail.  Merely having the CREATEROLE attribute is 
> insufficient to perform arbitrary grants of role memberships.
>
> The INHERIT, VALID UNTIL, and CONNECTION LIMIT attributes are not thought 
> about as privileges in the patch; perhaps they should be?  It would be quite 
> reasonable to say that a role with a finite connection limit should have that 
> limit thought about as a "pool" and should have to assign connection rights 
> from that pool to other roles it creates.  Likewise, a role with a VALID 
> UNTIL limit could be constrained to only create roles with VALID UNTIL less 
> than or equal to its own limit.  Perhaps a NOINHERIT role should only be able 
> to create NOINHERIT roles?  The patch does none of these things, but feedback 
> is much appreciated.
>
> The docs are adjusted, but drop_role.sgml may need to be further adjusted:
>
>  
>   The SQL standard defines DROP ROLE, but it allows
>   only one role to be dropped at a time, and it specifies different
>   privilege requirements than PostgreSQL uses.
>  
>
> I lack a copy of the SQL standard, so I'm uncertain if this patch has, by 
> chance, changed the privilege requirements to match that of the spec?
>
>

This patch set is failing to apply for me - it fails on patch 2.


I haven't dug terribly deeply into it yet, but I notice that there is a
very large increase in test volume, which appears to account for much of
the 44635 lines of the patch set. I think we're probably going to want
to reduce that. We've had complaints in the past from prominent hackers
about adding too much volume to the regression tests.


I do like the basic thrust of reducing the power of CREATEROLE. There's
an old legal maxim I learned in my distant youth that says "nemo dat
quod non habet" - Nobody can give something they don't own. This seems
to be in that spirit, and I approve :-)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: windows build slow due to windows.h includes

2021-09-21 Thread Andrew Dunstan


On 9/21/21 3:30 PM, Andres Freund wrote:
> Hi,
>
> For the AIO stuff I needed to build postgres for windows. And I was a bit
> horrified by the long compile times. At first I was ready to blame the MS
> compiler for being slow, until I noticed that using mingw gcc from linux to
> cross compile to windows is also a *lot* slower than building for linux.
>
> I found some blog-post-documented-only compiler flags [1], most importantly
> /d1reportTime. Which shows that the include processing of postgres.h takes
> 0.6s [2]
>
> Basically all the time in a debug windows build is spent parsing windows.h and
> related headers. Argh.
>
> The amount of stuff we include in win32_port.h and declare is pretty absurd
> imo. There's really no need to expose the whole backend to all of it. Most of
> it should just be needed in a few port/ files and a few select users.
>
> But that's too much work for my taste. As it turns out there's a partial
> solution to windows.h being just so damn big, the delightfully named
> WIN32_LEAN_AND_MEAN.
>
> This reduces the non-incremental buildtime in my 8 core windows VM from 187s 
> to
> 140s. Cross compiling from linux it's
> master:
> real  0m53.807s
> user  22m16.930s
> sys   2m50.264s
> WIN32_LEAN_AND_MEAN
> real  0m32.956s
> user  12m17.773s
> sys   1m52.313s


Nice!


I also see references to VC_EXTRALEAN which defines this and some other
stuff that might make things even faster.


Worth investigating.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Andres Freund
Hi,

On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
> Currently when determining where CoerceToDomainValue can be read,
> it evaluates every step in a loop.
> But, I think that the expression is immutable and should be solved only
> once.

What is immutable here?


> Otherwise the logic is wrong since by the rules of C, even though the
> variable is
> being initialized in the declaration, it still receives initialization at
> each repetition.
> What causes palloc running multiple times.
> 
> In other words:
> Datum   *domainval = NULL;
> 
> is the same:
> Datum   *domainval;
> domainval = NULL;

Obviously?


> Thoughts?

I don't see what this is supposed to achieve. The allocation of
domainval/domainnull happens on every loop iteration with/without your patch.

And it has to, the allocation intentionally is separate for each
constraint. As the comment even explicitly says:
	/*
	 * Since value might be read multiple times, force to R/O
	 * - but only if it could be an expanded datum.
	 */
Greetings,

Andres Freund




Re: prevent immature WAL streaming

2021-09-21 Thread Alvaro Herrera
I made one final pass over the whole thing to be sure it's all commented
as thoroughly as possible, and changed the names of things to make them
all consistent.  So here's the last version which I intend to push to
all branches soon.  (The only difference in back-branches is that
the xlogreader struct needs to be adjusted to have the new fields at the
bottom.)

One thing to note is that this is an un-downgradeable minor; and of
course people should upgrade standbys before primaries.

On 2021-Sep-17, Alvaro Herrera wrote:

> On 2021-Sep-17, Bossart, Nathan wrote:

> > I see.  IMO feels a bit counterintuitive to validate a partial record
> > that you are ignoring anyway, but I suppose it's still valuable to
> > know when the WAL is badly broken.  It's not expensive, and it doesn't
> > add a ton of complexity, either.
> 
> Yeah, we don't have any WAL record history validation other than the
> verifying the LSN of the previous record; I suppose in this particular
> case you could argue that we shouldn't bother with any validation
> either.  But it seems safer to do it.  It doesn't hurt anything anyway.

BTW we do validate the CRC for all records, but we don't have any means
to validate the CRC of a partial record; so in theory if we don't add
CRC validation of the ignored contents, there would be no validation at
all for that record.  I don't think we care, but it's true that there
would be a big blob in WAL that we don't really know anything about.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)
>From 8893e2073365c88449a118bd340a78241afe8f97 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 21 Sep 2021 12:43:15 -0300
Subject: [PATCH v8 1/3] Document XLOG_INCLUDE_XID a little better
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I noticed that commit 0bead9af484c left this flag undocumented in
XLogSetRecordFlags, which led me to discover that the flag doesn't
actually do what the one comment on it said it does.  Improve the
situation by adding some more comments.

Backpatch to 14, where the aforementioned commit appears.

Author: Álvaro Herrera 
---
 src/backend/access/transam/xloginsert.c | 2 ++
 src/include/access/xlog.h   | 2 +-
 src/include/access/xlogrecord.h | 5 +++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index e596a0470a..e329ae5c2a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -409,6 +409,8 @@ XLogRegisterBufData(uint8 block_id, char *data, int len)
  * - XLOG_MARK_UNIMPORTANT, to signal that the record is not important for
  *	 durability, which allows to avoid triggering WAL archiving and other
  *	 background activity.
+ * - XLOG_INCLUDE_XID, a message-passing hack between XLogRecordAssemble
+ *   and XLogResetInsertion.
  */
 void
 XLogSetRecordFlags(uint8 flags)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0a8ede700d..5e2c94a05f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -212,7 +212,7 @@ extern bool XLOG_DEBUG;
  */
 #define XLOG_INCLUDE_ORIGIN		0x01	/* include the replication origin */
 #define XLOG_MARK_UNIMPORTANT	0x02	/* record not important for durability */
-#define XLOG_INCLUDE_XID		0x04	/* include XID of top-level xact */
+#define XLOG_INCLUDE_XID		0x04	/* WAL-internal message-passing hack */
 
 
 /* Checkpoint statistics */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index e06ee92a5e..8d1305eae8 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -215,8 +215,9 @@ typedef struct XLogRecordDataHeaderLong
  * Block IDs used to distinguish different kinds of record fragments. Block
  * references are numbered from 0 to XLR_MAX_BLOCK_ID. A rmgr is free to use
  * any ID number in that range (although you should stick to small numbers,
- * because the WAL machinery is optimized for that case). A couple of ID
- * numbers are reserved to denote the "main" data portion of the record.
+ * because the WAL machinery is optimized for that case). A few ID
+ * numbers are reserved to denote the "main" data portion of the record,
+ * as well as replication-supporting transaction metadata.
  *
  * The maximum is currently set at 32, quite arbitrarily. Most records only
  * need a handful of block references, but there are a few exceptions that
-- 
2.20.1

>From 2daa93570c801f21f5abdacc6d8f77bd62bf5d53 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Sep 2021 17:21:46 -0400
Subject: [PATCH v8 2/3] Implement FIRST_IS_ABORTED_CONTRECORD

---
 src/backend/access/rmgrdesc/xlogdesc.c  |  12 ++
 src/backend/access/transam

Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Tom Lane
Andres Freund  writes:
> On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
>> Currently when determining where CoerceToDomainValue can be read,
>> it evaluates every step in a loop.
>> But, I think that the expression is immutable and should be solved only
>> once.

> What is immutable here?

I think Ranier has a point here.  The clear intent of this bit:

/*
 * If first time through, determine where CoerceToDomainValue
 * nodes should read from.
 */
if (domainval == NULL)
{

is that we only need to emit the EEOP_MAKE_READONLY once when there are
multiple CHECK constraints.  But because domainval has the wrong lifespan,
that test is constant-true, and we'll do it over each time to little
purpose.

> And it has to, the allocation intentionally is separate for each
> constraint. As the comment even explicitly says:
> /*
>  * Since value might be read multiple times, force to R/O
>  * - but only if it could be an expanded datum.
>  */

No, what that's on about is that each constraint might contain multiple
VALUE symbols.  But once we've R/O-ified the datum, we can keep using
it across VALUE symbols in different CHECK expressions, not just one.

(AFAICS anyway)

I'm unexcited by the proposed move of the save_innermost_domainval/null
variables, though.  It adds no correctness and it forces an additional
level of indentation of a good deal of code, as the patch fails to show.

regards, tom lane




Re: PostgreSQL High Precision Support Extension.

2021-09-21 Thread Thomas Munro
On Tue, Sep 21, 2021 at 2:58 PM Thomas Munro  wrote:
> On Tue, Sep 21, 2021 at 1:30 PM A Z  wrote:
> > -A library like GMP, written in C, is an appropriate basis to start from 
> > and to include, for all OS platforms involved.
>
> Are you aware of Daniele Varrazzo's extension
> https://github.com/dvarrazzo/pgmp/ ?  (Never looked into it myself,
> but this seems like the sort of thing you might be looking for?)

[A Z replied off-list and mentioned areas where pgmp falls short, but
I'll reply on-list to try to increase the chance of useful discussion
here...]

It seems to me that there are 3 or 4 different topics here:

1.  Can you find the functions GMP lacks in some other library?  For
example, if I understand correctly, the library "mpfr" provides a
bunch of transcendental functions for libgmp's types.  Are there other
libraries?  Can you share what you already know about the landscape of
relevant libraries and what's good or lacking from your perspective?
Or are you proposing writing entirely new numeric code (in which case
that's getting pretty far away from the topics we're likely to discuss
here...).

2.  Supposing there are suitable libraries that build on top of GMP,
would it be reasonable to make a separate extension that extends pgmp?
 That is, users install the pgmp extension to get the basic types and
functions, and then install a second, new extension "pmpfr" (or
whatever) to get access to more functions?

3.  You mentioned wanting portability and packages for platforms X, Y,
Z.  Packaging is something to worry about later, and not typically
something that the author of an extension has to do personally.  Once
you produce a good extension, it seems very likely that you could
convince the various package maintainers to pick it up (as they've
done for pgmp and many other extensions).  The only question to worry
about initially is how portable the libraries you depend on are.  For
what it's worth, I'd personally start by setting up a CI system for a
bunch of relevant OSes with all the relevant libraries installed, for
exploration; I could provide some pointers on how to do that if you
think that would be interesting.

4.  You talked about whether such types could be in PostgreSQL "core".
In my humble opinion (1) this is a textbook example of something that
belongs in an extension, and (2) things built on GNU/GPL libraries are
complicated and probably couldn't ever be included in core in our
BSD-licensed project anyway.  (In contrast, the SQL:2016 DECFLOAT(n)
types really should be included in the core system, because they're in
the SQL standard and we are a SQL implementation, and AFAIK the only
real thing stopping us from doing that is deciding which library to
use to do it, which is complicated.)




Re: PostgreSQL High Precision Support Extension.

2021-09-21 Thread Jan Wieck

On 9/21/21 6:20 PM, Thomas Munro wrote:

On Tue, Sep 21, 2021 at 2:58 PM Thomas Munro  wrote:

On Tue, Sep 21, 2021 at 1:30 PM A Z  wrote:
> -A library like GMP, written in C, is an appropriate basis to start from and 
to include, for all OS platforms involved.

Are you aware of Daniele Varrazzo's extension
https://github.com/dvarrazzo/pgmp/ ?  (Never looked into it myself,
but this seems like the sort of thing you might be looking for?)


[A Z replied off-list and mentioned areas where pgmp falls short, but
I'll reply on-list to try to increase the chance of useful discussion
here...]


This seems to be becoming a common pattern in open source communities. Not
just PostgreSQL; I have seen it elsewhere. People make vague or merely
specification-level "proposals" and, as soon as anyone replies, try to
drag the discussion into private, off-list/off-forum conversations.


Most of the time this is because they aren't really proposing anything,
but are just looking for someone else to implement what they need for
their own paying customer. They are not willing or able to contribute
anything but requirements.


As the original author of the NUMERIC data type I am definitely 
interested in this sort of stuff. And I would love contributing in the 
design of the on-disk and in-memory structures of these new data types 
created as an EXTENSION.


However, so far I only see a request for someone else to create this 
extension. What exactly is "A Z" (poweruserm) going to contribute to 
this effort?



Regards, Jan

--
Jan Wieck




Re: windows build slow due to windows.h includes

2021-09-21 Thread Andres Freund
Hi,

On 2021-09-21 16:13:55 -0400, Andrew Dunstan wrote:
> I also see references to VC_EXTRALEAN which defines this and some other
> stuff that might make things even faster.

I don't think that's relevant to "us", just mfc apps (which we gladly
aren't). From what I can see we'd have to actually clean up our includes to
not have windows.h everywhere or use precompiled headers to benefit further.

Greetings,

Andres Freund




Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Andres Freund
Hi,

On 2021-09-21 18:21:24 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
> >> Currently when determining where CoerceToDomainValue can be read,
> >> it evaluates every step in a loop.
> >> But, I think that the expression is immutable and should be solved only
> >> once.
> 
> > What is immutable here?
> 
> I think Ranier has a point here.  The clear intent of this bit:
> 
> /*
>  * If first time through, determine where CoerceToDomainValue
>  * nodes should read from.
>  */
> if (domainval == NULL)
> {
> 
> is that we only need to emit the EEOP_MAKE_READONLY once when there are
> multiple CHECK constraints.  But because domainval has the wrong lifespan,
> that test is constant-true, and we'll do it over each time to little
> purpose.

Oh, I clearly re-skimmed the code too quickly. Sorry for that!


> (AFAICS anyway)
> 
> I'm unexcited by the proposed move of the save_innermost_domainval/null
> variables, though.  It adds no correctness and it forces an additional
> level of indentation of a good deal of code, as the patch fails to show.

Yea.

Greetings,

Andres Freund




Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Ranier Vilela
Em ter., 21 de set. de 2021 às 19:21, Tom Lane  escreveu:

> Andres Freund  writes:
> > On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
> >> Currently when determining where CoerceToDomainValue can be read,
> >> it evaluates every step in a loop.
> >> But, I think that the expression is immutable and should be solved only
> >> once.
>
> > What is immutable here?
>
> I think Ranier has a point here.  The clear intent of this bit:
>
> /*
>  * If first time through, determine where
> CoerceToDomainValue
>  * nodes should read from.
>  */
> if (domainval == NULL)
> {
>
> is that we only need to emit the EEOP_MAKE_READONLY once when there are
> multiple CHECK constraints.  But because domainval has the wrong lifespan,
> that test is constant-true, and we'll do it over each time to little
> purpose.
>
Exactly, thanks for the clear explanation.


> > And it has to, the allocation intentionally is separate for each
> > constraint. As the comment even explicitly says:
> > /*
> >  * Since value might be read multiple times, force
> to R/O
> >  * - but only if it could be an expanded datum.
> >  */
>
> No, what that's on about is that each constraint might contain multiple
> VALUE symbols.  But once we've R/O-ified the datum, we can keep using
> it across VALUE symbols in different CHECK expressions, not just one.
>
> (AFAICS anyway)
>
> I'm unexcited by the proposed move of the save_innermost_domainval/null
> variables, though.  It adds no correctness and it forces an additional
> level of indentation of a good deal of code, as the patch fails to show.
>
OK, but I think there is still value in reducing the scope:
save_innermost_domainval and save_innermost_domainnull are only needed
for DOM_CONSTRAINT_CHECK expressions, and both are declared even when
they will not be used.

Anyway, the v1 patch fixes only the expression eval.

regards,
Ranier Vilela


v1_fix_eval_expr_once.patch
Description: Binary data


Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-21 Thread Ranier Vilela
Em ter., 21 de set. de 2021 às 20:00, Andres Freund 
escreveu:

> Hi,
>
> On 2021-09-21 18:21:24 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
> > >> Currently when determining where CoerceToDomainValue can be read,
> > >> it evaluates every step in a loop.
> > >> But, I think that the expression is immutable and should be solved
> only
> > >> once.
> >
> > > What is immutable here?
> >
> > I think Ranier has a point here.  The clear intent of this bit:
> >
> > /*
> >  * If first time through, determine where
> CoerceToDomainValue
> >  * nodes should read from.
> >  */
> > if (domainval == NULL)
> > {
> >
> > is that we only need to emit the EEOP_MAKE_READONLY once when there are
> > multiple CHECK constraints.  But because domainval has the wrong
> lifespan,
> > that test is constant-true, and we'll do it over each time to little
> > purpose.
>
> Oh, I clearly re-skimmed the code too quickly. Sorry for that!
>
No problem, thanks for taking a look.

regards,
Ranier Vilela


Re: windows build slow due to windows.h includes

2021-09-21 Thread Ranier Vilela
Em ter., 21 de set. de 2021 às 16:30, Andres Freund 
escreveu:

> Hi,
>
> For the AIO stuff I needed to build postgres for windows. And I was a bit
> horrified by the long compile times. At first I was ready to blame the MS
> compiler for being slow, until I noticed that using mingw gcc from linux to
> cross compile to windows is also a *lot* slower than building for linux.
>
> I found some blog-post-documented-only compiler flags [1], most importantly
> /d1reportTime. Which shows that the include processing of postgres.h takes
> 0.6s [2]
>
> Basically all the time in a debug windows build is spent parsing windows.h
> and
> related headers. Argh.
>
> The amount of stuff we include in win32_port.h and declare is pretty absurd
> imo. There's really no need to expose the whole backend to all of it. Most
> of
> it should just be needed in a few port/ files and a few select users.
>
> But that's too much work for my taste. As it turns out there's a partial
> solution to windows.h being just so damn big, the delightfully named
> WIN32_LEAN_AND_MEAN.
>
+1
But I did a quick and dirty test here: I removed windows.h from win32_port.h,
and it compiled normally with MSVC 2019 (64-bit). Would this also work with the
mingw cross compile?

regards,
Ranier Vilela


Re: [PATCH] Implement INSERT SET syntax

2021-09-21 Thread Rachel Heaton
> On 4/23/20 8:04 PM, Gareth Palmer wrote:
> >
> > Thank you for the review, attached is v7 of the patch which should
> > apply correcly to HEAD.
> >

Hello Gareth,

This patch no longer applies to HEAD, can you please submit a rebased version?

Thanks,
Rachel




Re: windows build slow due to windows.h includes

2021-09-21 Thread Andres Freund
Hi,

On 2021-09-21 20:26:36 -0300, Ranier Vilela wrote:
> Em ter., 21 de set. de 2021 às 16:30, Andres Freund 
> escreveu:
> > But that's too much work for my taste. As it turns out there's a partial
> > solution to windows.h being just so damn big, the delightfully named
> > WIN32_LEAN_AND_MEAN.
> >
> +1
> But I did a quick dirty test here, and removed windows.h in win32_port.h,
> and compiled normally with msvc 2019 (64 bit), would it work with mingw
> cross compile?

That's likely only because winsock indirectly includes windows.h - because of
that it won't actually reduce compile time. And you can't remove the other
headers that indirectly include windows.h without causing compilation errors.

Greetings,

Andres Freund




Re: Improve logging when using Huge Pages

2021-09-21 Thread Justin Pryzby
On Wed, Sep 22, 2021 at 02:03:11AM +0900, Fujii Masao wrote:
> Another idea is to output "Anonymous shared memory was allocated with
>  huge pages" when it's successfully allocated with huge pages, and to output
>  "Anonymous shared memory was allocated without huge pages"
>  when it's successfully allocated without huge pages. I'm not sure if users
>  may think even this message is noisy, though.

+1

Maybe it could show the page size instead of "with"/"without":
"Shared memory allocated with 4k/1MB/1GB pages."

-- 
Justin




Re: row filtering for logical replication

2021-09-21 Thread Ajin Cherian
On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar  wrote:
>
> On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar  wrote:
> > Some more comments,
> >
> > In pgoutput_row_filter_update(), first, we are deforming the tuple in
> > local datum, then modifying the tuple, and then reforming the tuple.
> > I think we can surely do better here.  Currently, you are reforming
> > the tuple so that you can store it in the scan slot by calling
> > ExecStoreHeapTuple which will be used for expression evaluation.
> > Instead of that what you need to do is to deform the tuple using
> > tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
> > advantages are 1) you don't need to reform the tuple 2) the expression
> > evaluation machinery doesn't need to deform again for fetching the
> > value of the attribute, instead it can directly get from the value
> > from the virtual tuple.
> >
>
> I have one more question, while looking into the
> ExtractReplicaIdentity() function, it seems that if any of the "rep
> ident key" fields is changed then we will write all the key fields in
> the WAL as part of the old tuple, not just the changed fields.  That
> means either the old tuple will be NULL or it will be having all the
> key attributes.  So if we are supporting filter only on the "rep ident
> key fields" then is there any need to copy the fields from the new
> tuple to the old tuple?
>

Yes, I just figured this out while testing. So we don't need to copy fields
from the new tuple to the old tuple.

But there is still the case of your fix for unchanged toasted RI key fields
in the new tuple, which need to be copied from the old tuple to the new tuple.
This particular case seems to violate both rules: that an old tuple will be
present only when RI key fields have changed, and that if there is an old
tuple it will contain all the RI key fields. I think we still need to deform
both the old tuple and the new tuple, just to handle this case.

There is currently logic in ReorderBufferToastReplace() that already deforms
the new tuple to detoast changed toasted fields in the new tuple. I think if
we can enhance this logic for our purpose, then we can avoid an extra deform
of the new tuple. But I think you had earlier indicated that having untoasted
unchanged values in the new tuple can be bothersome.

Any suggestions?

regards,
Ajin Cherian
Fujitsu Australia





Re: Added schema level support for publication.

2021-09-21 Thread Greg Nancarrow
On Wed, Sep 22, 2021 at 4:02 AM vignesh C  wrote:
>
> Attached v30 patch has the fixes for the same.
>

Thanks for all the patch updates.
I have some suggested updates to the v30-0005 documentation patch:

doc/src/sgml/ref/alter_publication.sgml
(1)
I'm thinking it might be better to simplify the description, because
it's a bit wordy and difficult to read with the "all tables in schema"
bits.
Suggested update is below (thoughts?):

BEFORE:
+   The first three variants change which tables and/or all tables in schema are
+   part of the publication.  The SET clause will replace
+   the list of tables and/or all tables in schema in the publication with the
+   specified one, the existing tables and all tables in schema that were
+   present in the publication will be removed.  The ADD
+   clause will add one or more tables and/or all tables in schema to the
+   publication. The DROP clauses will remove one or more
+   tables and/or all tables in schema from the publication.  Note that adding
+   tables and/or all tables in schema to a publication that is already
+   subscribed to will require a ALTER SUBSCRIPTION ...
REFRESH PUBLICATION
+   action on the subscribing side in order to become effective.
AFTER:
+   The first three variants change which tables/schemas are
+   part of the publication.  The SET clause will replace
+   the list of tables/schemas in the publication with the
+   specified list; the existing tables/schemas that were
+   present in the publication will be removed.  The ADD
+   clause will add one or more tables/schemas to the
+   publication. The DROP clauses will remove one or more
+   tables/schemas from the publication.  Note that adding
+   tables/schemas to a publication that is already
+   subscribed to will require a ALTER SUBSCRIPTION ...
REFRESH PUBLICATION
+   action on the subscribing side in order to become effective.


doc/src/sgml/ref/create_publication.sgml
(2)
I suggest an update similar to the following:

BEFORE:
+  Specifying a table that is part of schema specified in
+  FOR ALL TABLES IN SCHEMA option is not supported.
AFTER:
+  Specifying a table that is part of a schema already included in
the publication is not supported.


(3)
I find the following description a little unclear:

+ 
+  Specifying a schema along with schema's table specified as part of
+  FOR TABLE option is not supported.
+ 

Perhaps the following would be better:

+ 
+  Specifying a schema that contains a table already included in the
+  publication is not supported.
+ 

(4)
Minor fix:
BEFORE:
+   rights on the table.  The FOR ALL TABLES and
+   FOR ALL TABLES IN SCHEMA clause requires the invoking
+   user to be a superuser.
AFTER:
+   rights on the table.  The FOR ALL TABLES and
+   FOR ALL TABLES IN SCHEMA clauses require the invoking
+   user to be a superuser.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Unintended interaction between bottom-up deletion and deduplication's single value strategy

2021-09-21 Thread Peter Geoghegan
On Sun, Sep 19, 2021 at 7:47 PM Peter Geoghegan  wrote:
> Attached patch fixes the bug by slightly narrowing the conditions
> under which we'll consider if we should apply deduplication's single
> value strategy.

Pushed this fix a moment ago.

-- 
Peter Geoghegan




RE: Added schema level support for publication.

2021-09-21 Thread houzj.f...@fujitsu.com
On Wednesday, September 22, 2021 2:02 AM vignesh C  wrote:
> Attached v30 patch has the fixes for the same.

Thanks for updating the patches.

I have one comment.
@@ -474,7 +707,75 @@ AlterPublication(ParseState *pstate, AlterPublicationStmt 
*stmt)
...
+   if (list_length(relations))
+   {
...
+   /* remove the existing schemas from the publication */
+   PublicationDropSchemas(pubform->oid, delschemas, false);
...
+   }

After more thought on it, I still don't think dropping all the schemas under
"if (list_length(relations))" is a good idea. I think 1) we'd better keep the
schema and relation code separate, and 2) if we support another type of object
(SEQUENCE) in the future and only SET xx SEQUENCE, the above logic won't work
because both relations and schemaidlist will be NIL.

The same applies to the logic that drops all the tables under
"if (list_length(schemaidlist))".

Thoughts?

Best regards,
Hou zj


Re: Added schema level support for publication.

2021-09-21 Thread Masahiko Sawada
On Wed, Sep 22, 2021 at 3:02 AM vignesh C  wrote:
>
>
> Attached v30 patch has the fixes for the same.
>

Thank you for updating the patches.

Here are random comments on v30-0002 patch:

+
+   if (stmt->action == DEFELEM_SET &&
!list_length(schemaidlist))
+   {
+   delschemas =
GetPublicationSchemas(pubform->oid);
+   LockSchemaList(delschemas);
+   }

I think "list_length(schemaidlist) > 0" would be more readable.

---
This patch introduces some new static functions to publicationcmds.c,
but some of them have function prototypes and some don't. As far as I checked,

ObjectsInPublicationToOids()
CheckObjSchemaNotAlreadyInPublication()
GetAlterPublicationDelRelations()
AlterPublicationSchemas()
CheckPublicationAlterTables()
CheckPublicationAlterSchemas()
LockSchemaList()
OpenReliIdList()
PublicationAddSchemas()
PublicationDropSchemas()

are newly introduced but only four functions:

OpenReliIdList()
LockSchemaList()
PublicationAddSchemas()
PublicationDropSchemas()

have function prototypes. Actually, there already are functions that
don't have their prototype in publicationcmds.c. But it seems better
to clarify what kind of functions don't need to have a prototype at
least in this file.

---
ISTM we can inline the contents of three functions:
GetAlterPublicationDelRelations(), CheckPublicationAlterTables(), and
CheckPublicationAlterSchemas(). These have only one caller each, and ISTM
they make the readability worse. I think it's not necessary to make
separate functions for them.

---
+ * This is dispatcher function for AlterPublicationOptions,
+ * AlterPublicationSchemas and AlterPublicationTables.

As this comment mentions, AlterPublication() calls
AlterPublicationTables() and AlterPublicationSchemas(), but this
function also does a lot of pre-processing, such as building the lists
and doing some checks depending on stmt->action, before calling these
two functions. And those two functions also perform some operations
depending on stmt->action. So ISTM it's better to move that
pre-processing into these two functions and have AlterPublication()
just call them. What do you think?

---
+List *
+GetAllSchemasPublicationRelations(Oid puboid, PublicationPartOpt pub_partopt)

Since this function gets all relations in the schema publication, I
think GetAllSchemaPublicationRelations() would be better as a function
name (removing 's' before 'P').

---
+   if (!IsA(node, String))
+   ereport(ERROR,
+   errcode(ERRCODE_SYNTAX_ERROR),
+   errmsg("invalid schema
name at or near"),
+
parser_errposition(pstate, pubobj->location));

The error message should mention where the invalid schema name is at
or near. Also, in the following example, the error position in the
error message seems not to be where the invalid schema name s.s is:

postgres(1:47707)=# create publication p for all tables in schema s.s;
ERROR:  invalid schema name at or near
LINE 1: create publication p for all tables in schema s.s;
 ^

---
+   if (pubobj->pubobjtype == PUBLICATIONOBJ_TABLE)
+   {
+   if (IsA(node, RangeVar))
+   *rels = lappend(*rels, (RangeVar *) node);
+   else if (IsA(node, String))
+   {
+   RangeVar   *rel = makeRangeVar(NULL,
strVal(node),
+
pubobj->location);
+
+   *rels = lappend(*rels, rel);
+   }
+   }
+   else if (pubobj->pubobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
+   {
(snip)
+   /* Filter out duplicates if user specifies
"sch1, sch1" */
+   *schemas = list_append_unique_oid(*schemas, schemaid);
+   }

Do we need to filter out duplicates also in the PUBLICATIONOBJ_TABLE case
since users can specify things like "TABLE tbl, tbl, tbl"?

---
+   if ((action == DEFELEM_ADD || action == DEFELEM_SET) && !superuser())
+   ereport(ERROR,
+   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+errmsg("must be superuser to add or
set schemas")));

Why do we require the superuser privilege only for ADD and SET but not for DROP?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: row filtering for logical replication

2021-09-21 Thread Amit Kapila
On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian  wrote:
>
> On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar  wrote:
> >
> > On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar  wrote:
> >
> > I have one more question, while looking into the
> > ExtractReplicaIdentity() function, it seems that if any of the "rep
> > ident key" fields is changed then we will write all the key fields in
> > the WAL as part of the old tuple, not just the changed fields.  That
> > means either the old tuple will be NULL or it will be having all the
> > key attributes.  So if we are supporting filter only on the "rep ident
> > key fields" then is there any need to copy the fields from the new
> > tuple to the old tuple?
> >
>
> Yes, I just figured this out while testing. So we don't need to copy fields
> from the new tuple to the old tuple.
>
> But there is still the case of your fix for the unchanged toasted RI
> key fields in the new tuple
> which needs to be copied from the old tuple to the new tuple. This
> particular case
> seems to violate both rules that an old tuple will be present only
> when there are changed
> RI key fields and that if there is an old tuple it will contain all RI
> key fields.
>

Why do you think that the second assumption (if there is an old tuple
it will contain all RI key fields) is broken? It seems to me that even
when we are planning to include unchanged toast as part of old_key, it
will contain all the key columns, isn't that true?

> I think we
> still need to deform both old tuple and new tuple, just to handle this case.
>

Yeah, but we are anyway talking about saving that cost for later, if
we decide to send that tuple. I think we can further try to optimize
it by first checking whether the new tuple has any toasted value; only
if so do we need this extra pass of deforming.
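
A minimal sketch of that check (the helper name and arguments are mine, not
the actual patch; HeapTupleHasExternal() is the existing macro that tests
whether any attribute of the tuple is stored out of line):

#include "postgres.h"
#include "access/htup_details.h"

/*
 * Hypothetical helper: pay for the extra deforming pass only when the new
 * tuple can actually contain out-of-line (toasted) datums that may have to
 * be replaced with unchanged values from the old tuple.
 */
static bool
new_tuple_needs_old_key_merge(HeapTuple oldtuple, HeapTuple newtuple)
{
    if (oldtuple == NULL)
        return false;           /* nothing to copy from */

    /* true if HEAP_HASEXTERNAL is set, i.e. some attribute is toasted */
    return HeapTupleHasExternal(newtuple);
}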

> There is currently logic in ReorderBufferToastReplace() which already
> deforms the new tuple
> to detoast changed toasted fields in the new tuple. I think if we can
> enhance this logic for our
> purpose, then we can avoid an extra deform of the new tuple.
> But I think you had earlier indicated that having untoasted unchanged
> values in  the new tuple
> can be bothersome.
>

I think it will be too costly on the subscriber side during apply
because it will update all the unchanged toasted values which will
lead to extra writes both for WAL and data.

-- 
With Regards,
Amit Kapila.




Re: 回复:Queries that should be canceled will get stuck on secure_write function

2021-09-21 Thread Fujii Masao




On 2021/09/22 1:14, 蔡梦娟(玊于) wrote:

Hi all, I want to know your opinion on this patch, or in what way do you think 
we should solve this problem?


I agree that something like the patch (i.e., introduction of a promotion
from a cancel request to a terminate one) is necessary for the fix. One concern
about the patch is that the cancel request can be promoted to a terminate one
even when secure_write() doesn't actually get stuck. Is this acceptable?
Maybe we should allow setting the duration until the promotion happens?
Or should we introduce a dedicated timer for communication on the socket?
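
To make the idea concrete, here is a rough, self-contained sketch (every name
in it is hypothetical, not PostgreSQL internals or the actual patch): a cancel
request that has been pending longer than a grace period while the backend is
blocked on the socket gets escalated to a terminate.

#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for a pending cancel request. */
typedef struct PendingCancelState
{
    bool    cancel_pending;         /* a cancel request has arrived */
    time_t  cancel_received_at;     /* when it arrived */
} PendingCancelState;

/*
 * Return true once the cancel has been ignored for at least grace_seconds,
 * i.e. the point at which the discussion above would promote it to a
 * terminate request instead of letting the backend stay stuck.
 */
static bool
should_promote_to_terminate(const PendingCancelState *st, int grace_seconds)
{
    if (!st->cancel_pending)
        return false;
    return difftime(time(NULL), st->cancel_received_at) >= grace_seconds;
}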

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Estimating HugePages Requirements?

2021-09-21 Thread Michael Paquier
On Tue, Sep 21, 2021 at 04:06:38PM +, Bossart, Nathan wrote:
> I was looking at this from the standpoint of keeping the startup steps
> consistent between the modes.  Looking again, I can't think of
> a strong reason to add it to bootstrap mode.  I think the case for
> adding it to single-user mode is a bit stronger, as commands like
> "SHOW shared_memory_size;" currently return 0.  I lean in favor of
> adding it for single-user mode, but it's probably fine either way.

I am not sure either as that's a tradeoff between an underestimation
and no information.  The argument of consistency indeed matters.
Let's see if others have any opinion to share on this point.
--
Michael




Re: relation OID in ReorderBufferToastReplace error message

2021-09-21 Thread Amit Kapila
On Wed, Sep 22, 2021 at 2:17 AM Jeremy Schneider  wrote:
>
> On 9/20/21 22:14, Amit Kapila wrote:
> > On Fri, Sep 17, 2021 at 10:53 AM Amit Kapila  
> > wrote:
> >>
> >> I don't think it is a bad idea to print additional information as you
> >> are suggesting but why only for this error? It could be useful to
> >> investigate any other error we get during decoding. I think normally
> >> we add such additional information via error_context. We have recently
> >> added/enhanced it for apply-workers, see commit [1].
> >>
> >> I think here we should just print the relation name in the error
> >> message you pointed out and then work on adding additional information
> >> via error context as a separate patch. What do you think?
> >
> > Attached please find the patch which just modifies the current error
> > message as proposed by you. I am planning to commit it in a day or two
> > unless there are comments or any other suggestions.
>
> Looks good to me. I see that I hadn't used the macro for getting the
> relation name, thanks for fixing that!
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-21 Thread Ajin Cherian
On Wed, Sep 22, 2021 at 1:50 PM Amit Kapila  wrote:
>
> On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian  wrote:
> >
>
> Why do you think that the second assumption (if there is an old tuple
> it will contain all RI key fields.) is broken? It seems to me even
> when we are planning to include unchanged toast as part of old_key, it
> will contain all the key columns, isn't that true?

Yes, I assumed wrongly. Just checked. What you say is correct.

>
> > I think we
> > still need to deform both old tuple and new tuple, just to handle this case.
> >
>
> Yeah, but we are anyway talking about saving that cost for later, if
> we decide to send that tuple. I think we can further try to optimize
> it by first checking whether the new tuple has any toasted value; only
> if so do we need this extra pass of deforming.

Ok, I will go ahead with this approach.

>
> > There is currently logic in ReorderBufferToastReplace() which already
> > deforms the new tuple
> > to detoast changed toasted fields in the new tuple. I think if we can
> > enhance this logic for our
> > purpose, then we can avoid an extra deform of the new tuple.
> > But I think you had earlier indicated that having untoasted unchanged
> > values in  the new tuple
> > can be bothersome.
> >
>
> I think it will be too costly on the subscriber side during apply
> because it will update all the unchanged toasted values which will
> lead to extra writes both for WAL and data.
>

Ok, agreed.

regards,
Ajin Cherian
Fujitsu Australia




Re: logical replication restrictions

2021-09-21 Thread Amit Kapila
On Tue, Sep 21, 2021 at 4:21 PM Marcos Pegoraro  wrote:

> No, I'm talking about that configuration you can have on standby servers
> recovery_min_apply_delay = '8h'
>
>
Oh okay, I think this can be useful in some cases where we want to avoid
data loss, similar to its use for physical standby. For example, if the user
has by mistake truncated a table (or deleted some required data) on the
publisher, we can always recover it from the subscriber if we have such a
feature.

Having said that, I am not sure if we can call it a restriction. It is more
of a TODO kind of thing [1]. It doesn't sound advisable to me to keep growing
the current Restrictions page [2].

[1] - https://wiki.postgresql.org/wiki/Todo
[2] -
https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

-- 
With Regards,
Amit Kapila.


RE: Failed transaction statistics to measure the logical replication progress

2021-09-21 Thread osumi.takami...@fujitsu.com
Hello


Just made some cosmetic changes
and rebased my patch on top of the v14 patch set in [1].

[1] - 
https://www.postgresql.org/message-id/CAD21AoCO_ZYWZEBw7ziiYoX7Zm1P0L9%3Dd7Jj9YsGEGsT9o6wmw%40mail.gmail.com


Best Regards,
Takamichi Osumi
 

extend_subscription_stats_of_transaction_v05.patch
Description: extend_subscription_stats_of_transaction_v05.patch


Re: Added schema level support for publication.

2021-09-21 Thread Amit Kapila
On Wed, Sep 22, 2021 at 8:02 AM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, September 22, 2021 2:02 AM vignesh C  
> wrote:
> > Attached v30 patch has the fixes for the same.
>
> Thanks for updating the patches.
>
> I have one comment.
> @@ -474,7 +707,75 @@ AlterPublication(ParseState *pstate, 
> AlterPublicationStmt *stmt)
> ...
> +   if (list_length(relations))
> +   {
> ...
> +   /* remove the existing schemas from the publication */
> +   PublicationDropSchemas(pubform->oid, delschemas, 
> false);
> ...
> +   }
>
> After more thoughts on it, I still don't think drop all the schemas under " if
> (list_length(relations))" is a good idea. I think 1) we'd better keep schema
> and relation code separate.
>

How do you suggest changing it?

-- 
With Regards,
Amit Kapila.




Re: windows build slow due to windows.h includes

2021-09-21 Thread Noah Misch
On Tue, Sep 21, 2021 at 12:30:35PM -0700, Andres Freund wrote:
> solution to windows.h being just so damn big, the delightfully named
> WIN32_LEAN_AND_MEAN.
> 
> This reduces the non-incremental buildtime in my 8 core windows VM from 187s to
> 140s. Cross compiling from linux it's
> master:
> real  0m53.807s
> user  22m16.930s
> sys   2m50.264s
> WIN32_LEAN_AND_MEAN
> real  0m32.956s
> user  12m17.773s
> sys   1m52.313s

+1, great win for a one-liner.
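
For reference, the usual way this macro is applied is something like the
following (a minimal sketch; exactly which PostgreSQL header would define it,
e.g. src/include/port/win32_port.h, is an assumption on my part):

/*
 * WIN32_LEAN_AND_MEAN excludes rarely used subsystems (cryptography, DDE,
 * RPC, Shell, Windows Sockets 1.1) from <windows.h>, which is what shrinks
 * the preprocessed output and the build time.
 */
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>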




Re: row filtering for logical replication

2021-09-21 Thread Dilip Kumar
On Wed, Sep 22, 2021 at 9:20 AM Amit Kapila  wrote:
>
> On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian  wrote:
> >
> > On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar  wrote:
> > >
> > > On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar  wrote:
> > >
> > > I have one more question, while looking into the
> > > ExtractReplicaIdentity() function, it seems that if any of the "rep
> > > ident key" fields is changed then we will write all the key fields in
> > > the WAL as part of the old tuple, not just the changed fields.  That
> > > means either the old tuple will be NULL or it will be having all the
> > > key attributes.  So if we are supporting filter only on the "rep ident
> > > key fields" then is there any need to copy the fields from the new
> > > tuple to the old tuple?
> > >
> >
> > Yes, I just figured this out while testing. So we don't need to copy fields
> > from the new tuple to the old tuple.
> >
> > But there is still the case of your fix for the unchanged toasted RI
> > key fields in the new tuple
> > which needs to be copied from the old tuple to the new tuple.

Yes, we will have to do that.

> > There is currently logic in ReorderBufferToastReplace() which already
> > deforms the new tuple
> > to detoast changed toasted fields in the new tuple. I think if we can
> > enhance this logic for our
> > purpose, then we can avoid an extra deform of the new tuple.
> > But I think you had earlier indicated that having untoasted unchanged
> > values in  the new tuple
> > can be bothersome.
> >
>
> I think it will be too costly on the subscriber side during apply
> because it will update all the unchanged toasted values which will
> lead to extra writes both for WAL and data.

Right, we should not do that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Added schema level support for publication.

2021-09-21 Thread tanghy.f...@fujitsu.com
On Wednesday, September 22, 2021 11:22 AM Masahiko Sawada 
 wrote:
> 
> ---
> +   if (!IsA(node, String))
> +   ereport(ERROR,
> +   errcode(ERRCODE_SYNTAX_ERROR),
> +   errmsg("invalid schema
> name at or near"),
> +
> parser_errposition(pstate, pubobj->location));
> 
> The error message should mention where the invalid schema name is at
> or near. Also, In the following example, the error position in the
> error message seems not to be where the invalid schemaname s.s is:
> 
> postgres(1:47707)=# create publication p for all tables in schema s.s;
> ERROR:  invalid schema name at or near
> LINE 1: create publication p for all tables in schema s.s;
>  ^
> 

I noticed this, too. And I think it could be fixed by the following change, 
thoughts?

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9681,7 +9681,7 @@ PublicationObjSpec:   TABLE pubobj_expr
{
$$ = $5;
$$->pubobjtype = 
PUBLICATIONOBJ_REL_IN_SCHEMA;
-   $$->location = @1;
+   $$->location = @5;
}
| pubobj_expr
{


Besides, about this change in tab-complete.c:

+   else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD|DROP|SET", 
"SCHEMA"))
+   COMPLETE_WITH_QUERY(Query_for_list_of_schemas
+   " UNION SELECT 
'CURRENT_SCHEMA'");

It should be "ALL TABLES IN SCHEMA" not "SCHEMA" at the first line, right?

Regards
Tang


Re: Added schema level support for publication.

2021-09-21 Thread Amit Kapila
On Tue, Sep 21, 2021 at 11:39 PM vignesh C  wrote:
>
> On Tue, Sep 21, 2021 at 9:03 AM Greg Nancarrow  wrote:
> >
> > On Fri, Sep 17, 2021 at 10:09 PM vignesh C  wrote:
> > >
> > > Attached v29 patch has the fixes for the same.
> > >
> >
> > Some minor comments on the v29-0002 patch:
> >
> > (1)
> > In get_object_address_publication_schema(), the error message:
> >
> > + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> > does not exist",
> >
> > isn't grammatically correct. It should probably be:
> >
> > + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> > not exist",
>
> Modified
>

I still see the old message in v30. But I have a different suggestion
for this message. How about changing it to: "publication schema \"%s\"
in publication \"%s\" does not exist"? This will make it similar to
other messages and I don't see the need here to add 'tables' as we
have it in grammar.

-- 
With Regards,
Amit Kapila.