Re: Single transaction in the tablesync worker?

2021-01-23 Thread Peter Smith
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila  wrote:
>
> On Sat, Jan 23, 2021 at 4:55 AM Peter Smith  wrote:
> >
> > PSA the v18 patch for the Tablesync Solution1.
> >
>
> Few comments:
> =
> 1.
> - *   So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
> - *   CATCHUP -> SYNCDONE -> READY.
> + *   So the state progression is always: INIT -> DATASYNC ->
> + *   (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
>
> I don't think we need to be specific here that sync worker sets
> FINISHEDCOPY state.
>

This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are either known
(assigned and/or used) by *both* kinds of workers. But, I can remove
it if you feel that distinction is not useful.

> 4.
> - /*
> - * To build a slot name for the sync work, we are limited to NAMEDATALEN -
> - * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
> - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
> - * NAMEDATALEN on the remote that matters, but this scheme will also work
> - * reasonably if that is different.)
> - */
> - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
> - slotname = psprintf("%.*s_%u_sync_%u",
> - NAMEDATALEN - 28,
> - MySubscription->slotname,
> - MySubscription->oid,
> - MyLogicalRepWorker->relid);
> + /* Calculate the name of the tablesync slot. */
> + slotname = ReplicationSlotNameForTablesync(
> +MySubscription->oid,
> +MyLogicalRepWorker->relid);
>
> What is the reason for changing the slot name calculation? If there is
> any particular reasons, then we can add a comment to indicate why we
> can't include the subscription's slotname in this calculation.
>

The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.

Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup code paths. I still like it better as a function,
but now it is only called from one place, so we could put that code back
inline if you prefer how it was.


Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sun, Jan 24, 2021 at 11:29 AM Dilip Kumar  wrote:
> > Some comments on the v6 patch:

> > [2] Typo - it's "requested" + * 'paused requested' - if pause is
> > reqested but recovery is not yet paused

Here I meant the typo "reqested" in "if pause is reqested but recovery
is not yet paused" statement from v6 patch.

> > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"
>
> Which code does this refer to? Can you put the snippet from the patch?
> However, I have found 'paused requested' in two places, so I have fixed
> both.

Thanks.

> > [6] As I mentioned upthread, isn't it better to have
> > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?
>
> That is an existing function so I think it's fine to keep the same name.

Personally, I think the function RecoveryIsPaused itself is
unnecessary with the new function GetRecoveryPauseState introduced in
your patch. IMHO, we can remove it. If that's not okay, then while we
are at it, can we at least change the function name to the more
meaningful "IsRecoveryPaused"? Others may have better thoughts than me.

> > [7] Can we have the function variable name "recoveryPause" as "state"
> > or "pauseState? Because that variable context is set by the enum name
> > RecoveryPauseState and the function name.
> >
> > +SetRecoveryPause(RecoveryPauseState recoveryPause)
> >
> > Here as well, "recoveryPauseState" to "state"?
> > +GetRecoveryPauseState(void)
> >  {
> > -boolrecoveryPause;
> > +RecoveryPauseStaterecoveryPauseState;
>
> I don't think it is required but while changing the patch I will see
> whether to change or not.

It will be good to change that. I personally don't like structure
names and variable names to be the same.

> > [6] Function name RecoveryIsPaused and its comment "Check whether the
> > recovery pause is requested." doesn't seem to be matching. Seems like
> > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
> > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?
>
> Code is doing right, I will change the comments.
>
> > Instead of "while (RecoveryIsPaused())", can't we change it to "while
> > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
> > RecoveryIsPaused()?
>
> I think it looks clean with the function

As I said earlier, I see no use of RecoveryIsPaused() with the
introduction of the new function GetRecoveryPauseState(). Others may
have better thoughts than me.

> > [7] Can we change the switch-case in pg_is_wal_replay_paused to
> > something like below?
> >
> > Datum
> > pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
> > {
> > +char   *state;
> > +/* get the recovery pause state */
> > +switch(GetRecoveryPauseState())
> > +{
> > +case RECOVERY_NOT_PAUSED:
> > +state = "not paused";
> > +break;
> > +case RECOVERY_PAUSE_REQUESTED:
> > +state = "pause requested";
> > +break;
> > +case RECOVERY_PAUSED:
> > +state = "paused";
> > +break;
> > +default:
> > +elog(ERROR, "invalid recovery pause state");
> > +}
> > +
> > +PG_RETURN_TEXT_P(cstring_to_text(state));
>
> Why do you think it is better to use an extra variable?

I see nothing wrong with having PG_RETURN_TEXT_P and cstring_to_text
calls in every case statement. But, just to make the code look cleaner,
I suggested that we can have a local state variable and just one
PG_RETURN_TEXT_P(cstring_to_text(state));. See some existing functions
brin_page_type, hash_page_type, json_typeof,
pg_stat_get_backend_activity, pg_stat_get_backend_wait_event_type,
pg_stat_get_backend_wait_event, get_command_type.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  wrote:
> On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
>  wrote:
>>
>> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
>> > Please find the patch for the same.  I haven't added a test case for
>> > this yet.  I mean we can write a test case to pause the recovery and
>> > get the status.  But I am not sure that we can really write a reliable
>> > test case for 'pause requested' and 'paused'.
>>
>> +1 to just show the recovery pause state in the output of
>> pg_is_wal_replay_paused. But, should the function name
>> "pg_is_wal_replay_paused" be something like
>> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
>> in a function, I expect a boolean output. Others may have better
>> thoughts.
>>
>> IIUC the above change, ensuring the recovery is paused after it's
>> requested lies with the user. IMHO, the main problem we are trying to
>> solve is not met
>
>
> Basically, earlier there was no way for the user to know whether the recovery
> is actually paused or not, because it was always returning true after pause
> was requested.  Now, we will return whether pause is requested or actually
> paused.  So a tool designer who wants to wait for recovery to get paused can
> have a loop and wait until the recovery state reaches 'paused'.  That will
> give better control.

I get it and I agree to have that change. My point was whether we can
have a new function pg_wal_replay_pause_and_wait that waits until
recovery is actually paused (along with pg_is_wal_replay_paused
returning the actual state rather than just true/false), so that tool
developers don't need to have the waiting code outside, if at all they
care about it? Others may have better thoughts than me.
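
Without that, I'd expect every tool to end up carrying the same polling
loop itself — a minimal sketch, assuming the pg_get_wal_replay_pause_state()
name proposed above and its 'paused' return value:

DO $$
BEGIN
    PERFORM pg_wal_replay_pause();
    -- poll until the startup process has actually paused replay
    WHILE pg_get_wal_replay_pause_state() <> 'paused' LOOP
        PERFORM pg_sleep(0.1);
    END LOOP;
END $$;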

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Faulty HEAP_XMAX_LOCK_ONLY & HEAP_KEYS_UPDATED hintbit combination

2021-01-23 Thread Julien Rouhaud
Hi,

While working on pg14 compatibility for an extension relying on an apparently
uncommon combination of FOR UPDATE and stored function calls, I hit some new
Asserts introduced in 866e24d47db (Extend amcheck to check heap pages):

+   /*
+* Do not allow tuples with invalid combinations of hint bits to be
+* placed on a page.  These combinations are detected as corruption by
+* the contrib/amcheck logic, so if you disable one or both of these
+* assertions, make corresponding changes there.
+*/
+   Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+(tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));


I attach a simple self-contained script to reproduce the problem; the last
UPDATE triggers the Assert.

I'm not really familiar with this part of the code, so it's not exactly clear
to me if some logic is missing in compute_new_xmax_infomask() /
heap_prepare_insert(), or if this should actually be an allowed combination of
hint bits.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1(id integer, val text);
INSERT INTO t1 SELECT i, 'val' FROM generate_series(1, 500) i;

BEGIN;
SAVEPOINT s1;
SELECT 1 FROM t1 WHERE id = 123 FOR UPDATE;
UPDATE t1 SET val = 'hoho' WHERE id = 123;
RELEASE s1;
SAVEPOINT s1;
SELECT 1 FROM t1 WHERE id = 123 FOR UPDATE;
UPDATE t1 SET val = 'hoho' WHERE id = 123;
COMMIT;
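
To watch the combination the Assert complains about, the raw page can be
inspected with pageinspect — a rough sketch (0x0080 = HEAP_XMAX_LOCK_ONLY
in t_infomask, 0x2000 = HEAP_KEYS_UPDATED in t_infomask2; the block number
holding the row may differ):

SELECT lp, t_ctid,
       (t_infomask  & 128)  <> 0 AS xmax_lock_only,
       (t_infomask2 & 8192) <> 0 AS keys_updated
  FROM heap_page_items(get_raw_page('t1', 0))
 WHERE t_infomask IS NOT NULL;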


Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, Jan 23, 2021 at 4:40 PM Bharath Rupireddy
 wrote:
>
> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> > Please find the patch for the same.  I haven't added a test case for
> > this yet.  I mean we can write a test case to pause the recovery and
> > get the status.  But I am not sure that we can really write a reliable
> > test case for 'pause requested' and 'paused'.
>
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.

I am fine with the name change, but I don't feel that it will be
completely wrong if pg_is_wal_replay_paused returns a different state
of the recovery pause.
So I would like to see what others think, and based on that we can decide.

> IIUC the above change, ensuring the recovery is paused after it's
> requested lies with the user. IMHO, the main problem we are trying to
> solve is not met. Isn't it better if we have a new function(wait
> version) along with the above change to pg_is_wal_replay_paused,
> something like "pg_wal_replay_pause_and_wait" returning true or false?
> The functionality is pg_wal_replay_pause + wait until it's actually
> paused.
>
> Thoughts?

Already replied in the last mail.

> Some comments on the v6 patch:
>
> [1] How about
> + * This function returns the current state of the recovery pause.
> instead of
> + * This api will return the current state of the recovery pause.

Okay

> [2] Typo - it's "requested" + * 'paused requested' - if pause is
> reqested but recovery is not yet paused
>
> [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"

Which code does this refer to? Can you put the snippet from the patch?
However, I have found 'paused requested' in two places, so I have fixed
both.

> [4] Isn't it good to have an example usage and output of the function
> in the documentation?
> +Returns recovery pause status, which is 'not paused' if
> +pause is not requested, 'pause requested' if pause is
> +requested but recovery is not yet paused, and 'paused' if
> +the recovery is actually paused.

I will add.

> [5] Is it
> + * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED.
> instead of
> + * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED.

Ok

> [6] As I mentioned upthread, isn't it better to have
> "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?

That is an existing function so I think it's fine to keep the same name.

> [7] Can we have the function variable name "recoveryPause" as "state"
> or "pauseState? Because that variable context is set by the enum name
> RecoveryPauseState and the function name.
>
> +SetRecoveryPause(RecoveryPauseState recoveryPause)
>
> Here as well, "recoveryPauseState" to "state"?
> +GetRecoveryPauseState(void)
>  {
> -boolrecoveryPause;
> +RecoveryPauseStaterecoveryPauseState;

I don't think it is required but while changing the patch I will see
whether to change or not.

> [6] Function name RecoveryIsPaused and its comment "Check whether the
> recovery pause is requested." doesn't seem to be matching. Seems like
> it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
> Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?

Code is doing right, I will change the comments.

> Instead of "while (RecoveryIsPaused())", can't we change it to "while
> (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
> RecoveryIsPaused()?

I think it looks clean with the function

> [7] Can we change the switch-case in pg_is_wal_replay_paused to
> something like below?
>
> Datum
> pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
> {
> +char   *state;
> +/* get the recovery pause state */
> +switch(GetRecoveryPauseState())
> +{
> +case RECOVERY_NOT_PAUSED:
> +state = "not paused";
> +break;
> +case RECOVERY_PAUSE_REQUESTED:
> +state = "pause requested";
> +break;
> +case RECOVERY_PAUSED:
> +state = "paused";
> +break;
> +default:
> +elog(ERROR, "invalid recovery pause state");
> +}
> +
> +PG_RETURN_TEXT_P(cstring_to_text(state));

Why do you think it is better to use an extra variable?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: simplifying foreign key/RI checks

2021-01-23 Thread Corey Huinker
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu  wrote:

> Hi,
>
> +   for (i = 0; i < riinfo->nkeys; i++)
> +   {
> +   Oid eq_opr = eq_oprs[i];
> +   Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> +   RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
> +
> +   if (pk_nulls[i] != 'n' &&
> OidIsValid(entry->cast_func_finfo.fn_oid))
>
> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
> assignment to the three local variables. That way, ri_HashCompareOp
> wouldn't be called when pk_nulls[i] == 'n'.
>
> +   case TM_Updated:
> +   if (IsolationUsesXactSnapshot())
> ...
> +   case TM_Deleted:
> +   if (IsolationUsesXactSnapshot())
>
> It seems the handling for TM_Updated and TM_Deleted is the same. The cases
> for these two values can be put next to each other (saving one block of
> code).
>
> Cheers
>

I'll pause on reviewing v4 until you've addressed the suggestions above.


Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> > Please find the patch for the same.  I haven't added a test case for
> > this yet.  I mean we can write a test case to pause the recovery and
> > get the status.  But I am not sure that we can really write a reliable
> > test case for 'pause requested' and 'paused'.
>
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.
>
> IIUC the above change, ensuring the recovery is paused after it's
> requested lies with the user. IMHO, the main problem we are trying to
> solve is not met


Basically, earlier there was no way for the user to know whether the
recovery is actually paused or not, because it was always returning true
after pause was requested.  Now, we will return whether pause is requested
or actually paused.  So a tool designer who wants to wait for recovery to
get paused can have a loop and wait until the recovery state reaches
'paused'.  That will give better control.

I will check other comments and respond along with the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: How to expose session vs txn lock info in pg_locks view?

2021-01-23 Thread Andres Freund
Hi,

On 2021-01-19 14:16:07 +0800, Craig Ringer wrote:
> AFAICS it'd be necessary to expand PROCLOCK to expose this in shmem.
> Probably by adding a small bitfield where bit 0 is set if there's a txn
> level lock and bit 1 is set if there's a session level lock. But I'm not
> convinced that expanding PROCLOCK is justifiable for this. sizeof(PROCLOCK)
> is 64 on a typical x64 machine. Adding anything to it increases it to 72
> bytes.

Indeed - I really don't want to increase the size, it's already a
problem.


> It's frustrating to be unable to tell the difference between session-level
> and txn-level locks in diagnostic output.

It'd be useful, I agree.


> And the deadlock detector has no way to tell the difference when
> selecting a victim for a deadlock abort - it'd probably make sense to
> prefer to send a deadlock abort for txn-only lockers.

I'm doubtful this is worth going for.


> But I'm not sure I see a sensible way to add the info - PROCLOCK is
> already free of any padding, and I wouldn't want to use hacks like
> pointer-tagging.

I think there's an easy way to squeeze out space: make groupLeader be an
integer index into allProcs instead. That requires only 4 bytes...

Alternatively, I think it'd be reasonably easy to add the scope as a bit
in LOCKMASK - there's plenty space.

Greetings,

Andres Freund




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2021-01-23 Thread Tomas Vondra

Hi,

I've pushed the fix, after a couple of extra rounds of careful testing.

I noticed that the existing pg_visibility regression tests don't check 
if we freeze the TOAST rows too (failing to do that was one of the 
symptoms). It'd be good to add that, because that would fail even 
without CLOBBER_CACHE_ALWAYS, so attached is a test I propose to add.
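
The gist of the proposed check, as a rough sketch only (the attached patch
has the real thing; the table name here is illustrative, and the data rows —
which must be wide enough to be TOASTed — are elided between COPY and \.):

CREATE TABLE copyfreeze (a int, b text);
BEGIN;
TRUNCATE copyfreeze;                      -- required for COPY ... (FREEZE)
COPY copyfreeze FROM stdin WITH (FREEZE);
\.
COMMIT;
-- every page of the TOAST relation should now be all-frozen:
SELECT count(*) = 0 AS toast_all_frozen
  FROM pg_visibility_map((SELECT reltoastrelid FROM pg_class
                          WHERE relname = 'copyfreeze')::regclass)
 WHERE NOT all_frozen;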


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/pg_visibility/data/toast.data b/contrib/pg_visibility/data/toast.data
new file mode 100644
index 00..0aa5e4eb9a
--- /dev/null
+++ b/contrib/pg_visibility/data/toast.data
@@ -0,0 +1,10 @@
+1	54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+2	

Re: Use pg_pwrite() in pg_test_fsync

2021-01-23 Thread Thomas Munro
On Sun, Jan 10, 2021 at 9:21 AM Thomas Munro  wrote:
> I left the fsync-after-closing and non-sync'd tests using write(),
> because they weren't using lseek().  The latter case is arguably a bit
> odd because it's not overwriting pre-allocated blocks, unlike the
> earlier tests.

On closer inspection, the weird thing about that final test is that
it's opening and closing the file every time.  That doesn't seem to
make any sense.  Perhaps it's a copy and paste error from the previous
test?  In v2 I changed it to pg_pwrite(), and moved the open and close
calls out of the loop.
From 1e6bf5e539d22e12cff133fac6ad2e9ecd24ff3e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 9 Jan 2021 23:37:18 +1300
Subject: [PATCH v2] Use pg_pwrite() in pg_test_fsync.

For consistency with the PostgreSQL behavior we're trying to simulate,
use pwrite() instead of lseek() + write().

Also fix the final "non-sync" test, which was opening and closing the
file for every write.

Discussion: https://postgr.es/m/CA%2BhUKGJjjid2BJsvjMALBTduo1ogdx2SPYaTQL3wAy8y2hc4nw%40mail.gmail.com
---
 src/bin/pg_test_fsync/pg_test_fsync.c | 51 +++
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 3eddd983c6..29ee7c7d6f 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -290,10 +290,11 @@ test_sync(int writes_per_op)
 		for (ops = 0; alarm_triggered == false; ops++)
 		{
 			for (writes = 0; writes < writes_per_op; writes++)
-if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+if (pg_pwrite(tmpfile,
+			  buf,
+			  XLOG_BLCKSZ,
+			  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	die("write failed");
-			if (lseek(tmpfile, 0, SEEK_SET) == -1)
-die("seek failed");
 		}
 		STOP_TIMER;
 		close(tmpfile);
@@ -315,11 +316,12 @@ test_sync(int writes_per_op)
 	for (ops = 0; alarm_triggered == false; ops++)
 	{
 		for (writes = 0; writes < writes_per_op; writes++)
-			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+			if (pg_pwrite(tmpfile,
+		  buf,
+		  XLOG_BLCKSZ,
+		  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 die("write failed");
 		fdatasync(tmpfile);
-		if (lseek(tmpfile, 0, SEEK_SET) == -1)
-			die("seek failed");
 	}
 	STOP_TIMER;
 	close(tmpfile);
@@ -339,12 +341,13 @@ test_sync(int writes_per_op)
 	for (ops = 0; alarm_triggered == false; ops++)
 	{
 		for (writes = 0; writes < writes_per_op; writes++)
-			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+			if (pg_pwrite(tmpfile,
+		  buf,
+		  XLOG_BLCKSZ,
+		  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 die("write failed");
 		if (fsync(tmpfile) != 0)
 			die("fsync failed");
-		if (lseek(tmpfile, 0, SEEK_SET) == -1)
-			die("seek failed");
 	}
 	STOP_TIMER;
 	close(tmpfile);
@@ -362,12 +365,13 @@ test_sync(int writes_per_op)
 	for (ops = 0; alarm_triggered == false; ops++)
 	{
 		for (writes = 0; writes < writes_per_op; writes++)
-			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+			if (pg_pwrite(tmpfile,
+		  buf,
+		  XLOG_BLCKSZ,
+		  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 die("write failed");
 		if (pg_fsync_writethrough(tmpfile) != 0)
 			die("fsync failed");
-		if (lseek(tmpfile, 0, SEEK_SET) == -1)
-			die("seek failed");
 	}
 	STOP_TIMER;
 	close(tmpfile);
@@ -393,8 +397,10 @@ test_sync(int writes_per_op)
 		for (ops = 0; alarm_triggered == false; ops++)
 		{
 			for (writes = 0; writes < writes_per_op; writes++)
-if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
-
+if (pg_pwrite(tmpfile,
+			  buf,
+			  XLOG_BLCKSZ,
+			  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	/*
 	 * This can generate write failures if the filesystem has
 	 * a large block size, e.g. 4k, and there is no support
@@ -402,8 +408,6 @@ test_sync(int writes_per_op)
 	 * size, e.g. XFS.
 	 */
 	die("write failed");
-			if (lseek(tmpfile, 0, SEEK_SET) == -1)
-die("seek failed");
 		}
 		STOP_TIMER;
 		close(tmpfile);
@@ -457,11 +461,12 @@ test_open_sync(const char *msg, int writes_size)
 		for (ops = 0; alarm_triggered == false; ops++)
 		{
 			for (writes = 0; writes < 16 / writes_size; writes++)
-if (write(tmpfile, buf, writes_size * 1024) !=
+if (pg_pwrite(tmpfile,
+			  buf,
+			  writes_size * 1024,
+			  writes * writes_size * 1024) !=
 	writes_size * 1024)
 	die("write failed");
-			if (lseek(tmpfile, 0, SEEK_SET) == -1)
-die("seek failed");
 		}
 		STOP_TIMER;
 		close(tmpfile);
@@ -553,16 +558,16 @@ test_non_sync(void)
 	printf(LABEL_FORMAT, "write");
 	fflush(stdout);
 
+	if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
+		die("could not open output file");
 	START_TIMER;
 	for (ops = 0; alarm_triggered == false; ops++)
 	{
-		if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
-			die("could not open output file");
-		if (write(tmpfile, buf, XLOG_BLCKSZ) != 

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2021-01-23 Thread Andres Freund
Hi,

On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> We found an issue in pg_upgrade on a cluster with a third-party
> background worker. The upgrade goes fine, but the new cluster is then in
> an inconsistent state. The background worker comes from the PoWA
> extension but the issue does not appear to be related to this particular
> code.

Well, it does imply that the bgworker did something, as the mere
existence of a bgworker shouldn't affect anything. Presumably the issue
is that the bgworker actually does transactional writes, which causes
problems because the xids / multixacts from early during pg_upgrade
won't actually be valid after we do pg_resetxlog etc.


> As a solution, it seems that, for similar reasons that we restrict
> socket access to prevent accidental connections (from commit
> f763b77193), we should also prevent background workers to start at this
> step.

I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). Those libraries will then be loaded during the upgrade (due to
the _PG_init() hook being called when functions from the extension are
needed, e.g. the tableam or PL handler).

Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand-loaded library, can trigger bgworkers (or database writes in
some other form) as well.


I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
   in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all wal writes in
   sessions that don't explicitly set a session-level GUC to allow
   writes.
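
To sketch what (a) would look like from a session's point of view (table
name made up; the error is the usual read-only-transaction one):

-- with default_transaction_read_only = on imposed at server start,
-- an unsuspecting bgworker's first write fails:
INSERT INTO some_table VALUES (1);
-- ERROR:  cannot execute INSERT in a read-only transaction

-- while pg_upgrade's own sessions explicitly opt back in:
SET default_transaction_read_only = off;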

Greetings,

Andres Freund




Re: Single transaction in the tablesync worker?

2021-01-23 Thread Peter Smith
FYI - I have done some long-running testing using the current patch [v18].

1. The src/test/subscription TAP tests:
- Subscription TAP tests were executed in a loop X 150 iterations.
- Duration 5 hrs.
- All iterations report "Result: PASS"

2. The postgres "make check" tests:
- make check was executed in a loop X 150 iterations.
- Duration 2 hrs.
- All iterations report "All 202 tests passed"

---
[v18] 
https://www.postgresql.org/message-id/CAHut%2BPvm0R%3DMn_uVN_JhK0scE54V6%2BEDGHJg1WYJx0Q8HX_mkQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Test coverage for contrib/auto_explain

2021-01-23 Thread Tom Lane
I got annoyed about the lack of $SUBJECT.  The attached patch
adds a simple test case, bringing the module's coverage to 84%
according to my results.  (The uncovered lines mostly are in
_PG_fini(), which is unreachable, or else to do with chaining
to additional occupiers of the same hooks.)

Any objections?

regards, tom lane

diff --git a/contrib/auto_explain/.gitignore b/contrib/auto_explain/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/auto_explain/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index 54d6d45d40..efd127d3ca 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -6,6 +6,8 @@ OBJS = \
 	auto_explain.o
 PGFILEDESC = "auto_explain - logging facility for execution plans"
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/auto_explain/t/001_auto_explain.pl b/contrib/auto_explain/t/001_auto_explain.pl
new file mode 100644
index 00..7968be963b
--- /dev/null
+++ b/contrib/auto_explain/t/001_auto_explain.pl
@@ -0,0 +1,52 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 4;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'auto_explain'");
+$node->append_conf('postgresql.conf', "auto_explain.log_min_duration = 0");
+$node->append_conf('postgresql.conf', "auto_explain.log_analyze = on");
+$node->start;
+
+# run a couple of queries
+$node->safe_psql("postgres", "SELECT * FROM pg_class;");
+$node->safe_psql("postgres",
+	"SELECT * FROM pg_proc WHERE proname = 'int4pl';");
+
+# emit some json too
+$node->append_conf('postgresql.conf', "auto_explain.log_format = json");
+$node->reload;
+$node->safe_psql("postgres", "SELECT * FROM pg_proc;");
+$node->safe_psql("postgres",
+	"SELECT * FROM pg_class WHERE relname = 'pg_class';");
+
+$node->stop('fast');
+
+my $log = $node->logfile();
+
+my $log_contents = slurp_file($log);
+
+like(
+	$log_contents,
+	qr/Seq Scan on pg_class/,
+	"sequential scan logged, text mode");
+
+like(
+	$log_contents,
+	qr/Index Scan using pg_proc_proname_args_nsp_index on pg_proc/,
+	"index scan logged, text mode");
+
+like(
+	$log_contents,
+	qr/"Node Type": "Seq Scan"[^}]*"Relation Name": "pg_proc"/s,
+	"sequential scan logged, json mode");
+
+like(
+	$log_contents,
+	qr/"Node Type": "Index Scan"[^}]*"Index Name": "pg_class_relname_nsp_index"/s,
+	"index scan logged, json mode");


Re: Git, diffs, and patches

2021-01-23 Thread Bruce Momjian
On Fri, Jan 15, 2021 at 01:39:49PM -0500, Bruce Momjian wrote:
> I learned a few things when working on the key management patch that I
> want to share here in case it helps anyone:
...
> Maybe everyone else knew these things, but I didn't.  I can provide more
> details if desired.

One more learning is that git diff compares two source trees and outputs
a diff, while format-patch compares two trees with a common commit and
outputs a diff for each commit.  This is why format-patch can output
file name changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: proposal - idea - enhancing plpgsql FOREACH for JSON, jsonb and hstore

2021-01-23 Thread Pavel Stehule
On Sat, Jan 23, 2021 at 19:21, Stephen Frost  wrote:

> Greetings,
>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > jsonb with subscripting support can be used as a dictionary object in
> > plpgsql.
> >
> > Can be nice to have support for iteration over a set of tuples (key,
> > value).
>
> Yes, I agree that this would be useful.
>
> > FOREACH fieldvar [ KEY keyvar] IN DICTIONARY sourceexpr [VALUE
> searchexpr]
> > LOOP
> > END LOOP;
>
> Should we be thinking about using sql/json path for what to search
> for instead of just fieldvar/keyvar..?  Or perhaps support both..
>

I would support both. JSONPath can be specified by a special clause - I
used the keyword VALUE (but can be different).

My primary inspiration and motivation is the possibility to use jsonb as a
collection or dictionary, as in other languages. But if we implement some
"iterators", then enhancing them to support XPath or JSONPath is natural. The
interface should not be as complex as the specialized functions XMLTABLE or
JSON_TABLE, but a simple task should be much faster with a FOREACH statement,
because there is no overhead of SQL or SPI.
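
For example, iteration over a jsonb dictionary could look like this (a
sketch of the proposed, not yet implemented, syntax):

DO $$
DECLARE
    k text;
    v jsonb;
BEGIN
    FOREACH v KEY k IN DICTIONARY '{"a": 1, "b": 2}'::jsonb
    LOOP
        RAISE NOTICE '% => %', k, v;
    END LOOP;
END $$;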


> > and for JSON arrays
> >
> > FOREACH var IN ARRAY jsonval
> > LOOP
> > END LOOP
>
> Presumably we'd also support SLICE with this?
>

if we find good semantics, then why not?

>
> Also, I wonder about having a way to FOREACH through all objects,
> returning top-level ones, which a user could then call jsonb_typeof on
> and then recurse if an object is found, allowing an entire jsonb tree to
> be processed this way.
>

Probably this should be possible via JSONPath iteration.

We need a similar interface to nodeTableFuncscan.c.



> Thanks,
>
> Stephen
>


Re: proposal - idea - enhancing plpgsql FOREACH for JSON, jsonb and hstore

2021-01-23 Thread Stephen Frost
Greetings,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> jsonb with subscripting support can be used as a dictionary object in
> plpgsql.
> 
> Can be nice to have support for iteration over a set of tuples (key,
> value).

Yes, I agree that this would be useful.

> FOREACH fieldvar [ KEY keyvar] IN DICTIONARY sourceexpr [VALUE searchexpr]
> LOOP
> END LOOP;

Should we be thinking about using sql/json path for what to search
for instead of just fieldvar/keyvar..?  Or perhaps support both..

> and for JSON arrays
> 
> FOREACH var IN ARRAY jsonval
> LOOP
> END LOOP

Presumably we'd also support SLICE with this?

Also, I wonder about having a way to FOREACH through all objects,
returning top-level ones, which a user could then call jsonb_typeof on
and then recurse if an object is found, allowing an entire jsonb tree to
be processed this way.

Thanks,

Stephen




Re: a verbose option for autovacuum

2021-01-23 Thread Stephen Frost
Greetings,

On Fri, Jan 22, 2021 at 2:33 PM Tom Lane  wrote:
> Tommy Li  writes:
> > Additionally, is there any interest in exposing more vacuum options to be
> > run by autovac? Right now it runs FREEZE and ANALYZE, which leaves the
> > VERBOSE, SKIP_LOCKED, INDEX_CLEANUP, and TRUNCATE unconfigurable.
>
> To the extent that any of these make sense in autovacuum, I'd say they
> ought to be managed automatically.  I don't see a strong argument for
> users configuring this.  (See also nearby thread about allowing index
> AMs to control some of this.)

I agree that it'd be nice to figure out some way to have these managed
automatically, but it's probably useful to point out to Tommy that you
can set vacuum options on a table level which autovacuum should respect,
such as vacuum_index_cleanup and vacuum_truncate.  For skip locked,
autovacuum will already automatically release its attempt to acquire a
lock if someone backs up behind it for too long.
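
For example (illustrative table name; both reloptions exist today):

ALTER TABLE big_hot_table SET (vacuum_truncate = off,
                               vacuum_index_cleanup = off);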

Until we get something automatic though, I could see being able to set
TRUNCATE, in particular, to be off globally as useful when running a
system with replicas that might end up having queries which block WAL
replay.  If no one is stepping up to build some way to handle that
automatically then I'd be in support of making it something that an
administrator can configure (to avoid having to always remember to do it
for each table created...).

* Tommy Li (to...@coffeemeetsbagel.com) wrote:
> > Seems like that would very soon feel like log spam.  What would be the
> > use case for having this on?  If you want one-off results you could
> > run VACUUM manually.
> 
> In my case I have a fairly large, fairly frequently updated table with a
> large number of indexes where autovacuum's runtime can fluctuate between 12
> and 24 hours. If I want to investigate why autovacuum today is running many
> hours longer than it did last week, the only information I have to go off
> is from pg_stat_progress_vacuum, which reports only progress based on the
> number of blocks completed across _all_ indexes.
> 
> VACUUM VERBOSE's output is nice because it reports runtime per index, which
> would allow me to see if a specific index has bloated more than usual.
> 
> I also have autovacuum throttled much more aggressively than manual
> vacuums, so information from a one-off manual VACUUM isn't comparable.

I tend to agree that this is pretty useful information to have included
when trying to figure out what autovacuum's doing.

> As for log spam, I'm not sure it's a problem as long as the verbose option
> is disabled by default.

While this would be in line with our existing dismal logging defaults,
I'd be happier with a whole bunch more logging enabled by default,
including this, so that we don't have to tell everyone who deploys PG to
go enable this very sensible logging.  Arguments of 'log spam' really
fall down when you're on the receiving end of practically empty PG logs
and trying to figure out what's going on.  Further, log analysis tools
exist to aggregate this data and bring it up to a higher level for
administrators.

Still, I'd much rather have the option, even if disabled by default,
than not have it at all.

Thanks,

Stephen





Re: simplifying foreign key/RI checks

2021-01-23 Thread Zhihong Yu
Hi,

+   for (i = 0; i < riinfo->nkeys; i++)
+   {
+   Oid eq_opr = eq_oprs[i];
+   Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+   RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+   if (pk_nulls[i] != 'n' &&
OidIsValid(entry->cast_func_finfo.fn_oid))

It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment
to the three local variables. That way, ri_HashCompareOp wouldn't be called
when pk_nulls[i] == 'n'.

+   case TM_Updated:
+   if (IsolationUsesXactSnapshot())
...
+   case TM_Deleted:
+   if (IsolationUsesXactSnapshot())

It seems the handling for TM_Updated and TM_Deleted is the same. The cases
for these two values can be put next to each other (saving one block of
code).

Cheers

On Fri, Jan 22, 2021 at 11:10 PM Amit Langote 
wrote:

> On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker 
> wrote:
> >> I decided not to deviate from pk_ terminology so that the new code
> >> doesn't look too different from the other code in the file.  Although,
> >> I guess we can at least call the main function
> >> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> >> changed that.
> >
> > I think that's a nice compromise, it makes the reader aware of the
> concept.
> >>
> >> I've attached the updated patch.
> >
> > Missing "break" added. Check.
> > Comment updated. Check.
> > Function renamed. Check.
> > Attribute mapping matching test (and assertion) added. Check.
> > Patch applies to an as-of-today master, passes make check and check
> world.
> > No additional regression tests required, as no new functionality is
> introduced.
> > No docs required, as there is nothing user-facing.
>
> Thanks for the review.
>
> > Questions:
> > 1. There's a palloc for mapped_partkey_attnums, which is never freed, is
> the prevailing memory context short lived enough that we don't care?
> > 2. Same question for the AtrrMap map, should there be a free_attrmap().
>
> I hadn't checked, but yes, the prevailing context is
> AfterTriggerTupleContext, a short-lived one that is reset for every
> trigger event tuple.  I'm still inclined to explicitly free those
> objects, so changed like that.  While at it, I also changed
> mapped_partkey_attnums to root_partattrs for readability.
>
> Attached v4.
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>


Re: POC: postgres_fdw insert batching

2021-01-23 Thread Tomas Vondra




On 1/23/21 9:31 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
 wrote:

On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right 
fix to
push.


Thanks to the outer check for operation ==  CMD_INSERT, the inner one became 
unnecessary.



Right. I've pushed the fix, hopefully buildfarm will get happy again.


I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions.  Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching.  However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching.  As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1;  -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-update situations where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.



Thanks for catching this. I think it'd be good if the fix included a
regression test. The example seems like a good starting point; I'm not
sure whether it can be simplified further.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-01-23 Thread Zhihong Yu
Hi, Mark:

+ CREATE TABLE races (

Maybe 'racings' is a better name for the table (signifying the activities).

+   if (ARR_NDIM(arr) != 1 ||
+   ARR_DIMS(arr)[0] != numkeys ||
+   ARR_HASNULL(arr) ||
+   ARR_ELEMTYPE(arr) != CHAROID)
+   elog(ERROR, "confreftype is not a 1-D char array");

For the case of ARR_DIMS(arr)[0] != numkeys (while other conditions are
satisfied), maybe refine the error message to mention numkeys so that the
user would get a better idea.

+* XXX this restriction is pretty annoying, considering the effort
+* that's been put into the rest of the RI mechanisms to make them

Is the above going to be addressed in subsequent patches?

+SplitFKColElems(List *fkcolelems, List **names, List **reftypes)

Maybe add assertion that names and reftypes are not NULL.

+* If a foreign key, the reference semantics for each column
+*/
+   char        confreftype[1];

It would be nice to expand 'semantics' a bit to make the comment clearer.
e.g. mention 'Foreign key column reference semantics codes'

Thanks

On Sat, Jan 23, 2021 at 5:37 AM Mark Rofail  wrote:

> > Changelog (since the last version, v8):
> Below are the versions mentioned in the changelog. v12 is the latest
> version.
>
> /Mark
>
> On Sat, Jan 23, 2021 at 2:34 PM Mark Rofail 
> wrote:
>
>> Greetings,
>>
>> I am trying to revive this patch, Foreign Key Arrays. The original
>> proposal from my GSoC 2017 days can be found here:
>>
>> https://www.postgresql.org/message-id/CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9%3DGq3g%40mail.gmail.com
>>
>> Disclaimer, I am not the original author of this patch, I picked up this
>> patch in 2017 to migrate the original patch from 2012 and add a GIN index
>> to make it usable, since performance without a GIN index is unusable
>> beyond 100 rows.
>> The original authors, Tom Lane and Marco Nenciarini, are the ones who did
>> most of the heavy lifting. The original discussion can be found here:
>>
>> https://www.postgresql.org/message-id/flat/1343842863.5162.4.camel%40greygoo.devise-it.lan#1343842863.5162.4.ca...@greygoo.devise-it.lan
>>
>> *In brief, it would be used as follows:*
>> ```sql
>>CREATE TABLE A ( atest1 int PRIMARY KEY, atest2 text );
>>CREATE TABLE B ( btest1 int[], btest2 int );
>> ALTER TABLE B ADD CONSTRAINT FKARRAY FOREIGN KEY (EACH ELEMENT OF
>> btest1) REFERENCES A;
>> ```
>> and now table B references table A as follows:
>> ```sql
>> INSERT INTO B VALUES ('{10,1}', 2);
>> ```
>> where this row references rows 1 and 10 from A without the need of a
>> many-to-many table
>>
>> *Changelog (since the last version, v8):*
>> - v9 (made compatible with Postgresql 11)
>> support `DeconstructFkConstraintRow`
>> support `CloneFkReferencing`
>> support `generate_operator_clause`
>>
>> - v10 (made compatible with Postgresql v12)
>> support `addFkRecurseReferenced` and `addFkRecurseReferencing`
>> support `CloneFkReferenced` and `CloneFkReferencing`
>> migrate tests
>>
>> - v11(make compatible with Postgresql v13)
>> drop `ConvertTriggerToFK`
>> drop `select_common_type_2args` in favor of
>> `select_common_type_from_oids`
>> migrate tests
>>
>> - v12(made compatible with current master, 2021-01-23,
>> commit a8ed6bb8f4cf259b95c1bff5da09a8f4c79dca46)
>> add ELEMENT to `bare_label_keyword`
>> migrate docs
>>
>> *Todo:*
>> - re-add the @>> operator, which allows comparison between an array and
>> an element and returns true iff the element is within the array,
>> to allow easier SELECT statements and lower the overhead of explicitly
>> creating an array within the SELECT statement.
>>
>> ```diff
>> - SELECT * FROM B WHERE btest1 @> ARRAY[5];
>> + SELECT * FROM B WHERE btest1 @>> 5;
>> ```
>>
>> I apologize it took so long to get a new version here (years). However,
>> this is not the first time I tried to migrate the patch, every time a
>> different issue blocked me from doing so.
>> Reviews and suggestions are most welcome, @Joel Jacobson
>>  please review and test as previously agreed.
>>
>> /Mark
>>
>> On Tue, Oct 2, 2018 at 7:13 AM Michael Paquier 
>> wrote:
>>
>>> On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
>>> > I am still having problems rebasing this patch. I can not figure it
>>> out on
>>> > my own.
>>>
>>> Okay, it's been a couple of months since this last email, and nothing
>>> has happened, so I am marking it as returned with feedback.
>>> --
>>> Michael
>>>
>>


Re: POC: postgres_fdw insert batching

2021-01-23 Thread Zhihong Yu
Amit:
Good catch.

bq. ExecInitRoutingInfo() that is in the charge of initialing

Should be 'ExecInitRoutingInfo() that is in charge of initializing'

Cheers

On Sat, Jan 23, 2021 at 12:31 AM Amit Langote 
wrote:

> On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
>  wrote:
> > On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:
> > > From: Tomas Vondra 
> > >> Right, that's pretty much what I ended up doing (without the
> CMD_INSERT
> > >> check it'd add batching info to explain for updates too, for example).
> > >> I'll do a bit more testing on the attached patch, but I think that's
> the right fix to
> > >> push.
> > >
> > > Thanks to the outer check for operation ==  CMD_INSERT, the inner one
> became unnecessary.
> > >
> >
> > Right. I've pushed the fix, hopefully buildfarm will get happy again.
>
> I was looking at this and it looks like we've got a problematic case
> where postgresGetForeignModifyBatchSize() is called from
> ExecInitRoutingInfo().
>
> That case is when the insert is performed as part of a cross-partition
> update of a partitioned table containing postgres_fdw foreign table
> partitions.  Because we don't check the operation in
> ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
> inserts attempt to use batching.  However the ResultRelInfo may be one
> for the original update operation, so ri_FdwState contains a
> PgFdwModifyState with batch_size set to 0, because updates don't
> support batching.  As things stand now,
> postgresGetForeignModifyBatchSize() simply returns that, tripping the
> following Assert in the caller.
>
> Assert(partRelInfo->ri_BatchSize >= 1);
>
> Use this example to see the crash:
>
> create table p (a int) partition by list (a);
> create table p1 (like p);
> create extension postgres_fdw;
> create server lb foreign data wrapper postgres_fdw ;
> create user mapping for current_user server lb;
> create foreign table fp1 (a int) server lb options (table_name 'p1');
> alter table p attach partition fp1 for values in (1);
> create or replace function report_trig_details() returns trigger as $$
> begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
> 'DELETE' then return old; end if; return new; end; $$ language
> plpgsql;
> create trigger trig before update on fp1 for each row execute function
> report_trig_details();
> create table p2 partition of p for values in (2);
> insert into p values (2);
> update p set a = 1;  -- crashes
>
> So let's check mtstate->operation == CMD_INSERT in
> ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
> in cross-update situations where mtstate->operation would be
> CMD_UPDATE.
>
> I've attached a patch.
>
>
>
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-01-23 Thread Mark Rofail
Greetings,

I am trying to revive this patch, Foreign Key Arrays. The original proposal
from my GSoC 2017 days can be found here:
https://www.postgresql.org/message-id/CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9%3DGq3g%40mail.gmail.com

Disclaimer, I am not the original author of this patch, I picked up this
patch in 2017 to migrate the original patch from 2012 and add a GIN index
to make it usable, since performance without a GIN index is unusable
beyond 100 rows.
The original authors, Tom Lane and Marco Nenciarini, are the ones who did
most of the heavy lifting. The original discussion can be found here:
https://www.postgresql.org/message-id/flat/1343842863.5162.4.camel%40greygoo.devise-it.lan#1343842863.5162.4.ca...@greygoo.devise-it.lan

*In brief, it would be used as follows:*
```sql
   CREATE TABLE A ( atest1 int PRIMARY KEY, atest2 text );
   CREATE TABLE B ( btest1 int[], btest2 int );
ALTER TABLE B ADD CONSTRAINT FKARRAY FOREIGN KEY (EACH ELEMENT OF
btest1) REFERENCES A;
```
and now table B references table A as follows:
```sql
INSERT INTO B VALUES ('{10,1}', 2);
```
where this row references rows 1 and 10 from A without the need of a
many-to-many table
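
The GIN index mentioned in the disclaimer above would, for instance, go on
the referencing column (a sketch; see the patch for the actual operator
class used by the RI checks):
```sql
CREATE INDEX ON B USING gin (btest1);
```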

*Changelog (since the last version, v8):*
- v9 (made compatible with Postgresql 11)
support `DeconstructFkConstraintRow`
support `CloneFkReferencing`
support `generate_operator_clause`

- v10 (made compatible with Postgresql v12)
support `addFkRecurseReferenced` and `addFkRecurseReferencing`
support `CloneFkReferenced` and `CloneFkReferencing`
migrate tests

- v11(make compatible with Postgresql v13)
drop `ConvertTriggerToFK`
drop `select_common_type_2args` in favor of `select_common_type_from_oids`
migrate tests

- v12(made compatible with current master, 2021-01-23,
commit a8ed6bb8f4cf259b95c1bff5da09a8f4c79dca46)
add ELEMENT to `bare_label_keyword`
migrate docs

*Todo:*
- re-add the @>> operator, which allows comparison between an array and an
element and returns true iff the element is within the array,
to allow easier SELECT statements and lower the overhead of explicitly
creating an array within the SELECT statement.

```diff
- SELECT * FROM B WHERE btest1 @> ARRAY[5];
+ SELECT * FROM B WHERE btest1 @>> 5;
```

I apologize it took so long to get a new version here (years). However,
this is not the first time I tried to migrate the patch, every time a
different issue blocked me from doing so.
Reviews and suggestions are most welcome, @Joel Jacobson
 please
review and test as previously agreed.

/Mark

On Tue, Oct 2, 2018 at 7:13 AM Michael Paquier  wrote:

> On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
> > I am still having problems rebasing this patch. I can not figure it out
> on
> > my own.
>
> Okay, it's been a couple of months since this last email, and nothing
> has happened, so I am marking it as returned with feedback.
> --
> Michael
>


Re: Single transaction in the tablesync worker?

2021-01-23 Thread Amit Kapila
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith  wrote:
>
> PSA the v18 patch for the Tablesync Solution1.
>

Few comments:
=
1.
- *   So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- *   CATCHUP -> SYNCDONE -> READY.
+ *   So the state progression is always: INIT -> DATASYNC ->
+ *   (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that sync worker sets
FINISHEDCOPY state.

2.
@@ -98,11 +102,16 @@
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "postmaster/interrupt.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
 #include "replication/walreceiver.h"
 #include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a
remnant of the previous approach.

3.
 process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 {
- Assert(IsTransactionState());
+ bool sync_done = false;

  SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);

- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
  {
  TimeLineID tli;

+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);

Why do we need these changes? If you have done it for the
code-readability purpose then we can consider this as a separate patch
because I don't see why these are required w.r.t this patch.

4.
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+MySubscription->oid,
+MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there is
any particular reasons, then we can add a comment to indicate why we
can't include the subscription's slotname in this calculation.

5.
This is WAL
+ * logged for for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.

/for for/for

6.
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for
\"%s\"", originname);

I don't think we need this or the DEBUG1 message in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I'm not sure we need it
here. DropSubscription drops the apply worker's origin without any
such DEBUG message, so I guess we don't need one for tablesync origins
as well.
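
As an aside, origins created with this naming scheme can be listed on
the subscriber; the query below is illustrative, not part of the
patch:

```sql
-- Tablesync origins follow the pg_<suboid>_<relid> pattern built above.
SELECT roident, roname
FROM pg_replication_origin
WHERE roname ~ '^pg_[0-9]+_[0-9]+$';
```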

7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE? Is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where, during the catchup phase, we have received a few commits
and then the tablesync worker crashes/errors out. Basically, check
whether the replication continues from the same point. I understand
that this can only be tested by adding some logs, and we might not be
able to write a test for it.
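
One crude way to observe the progression during such a test is to
poll the subscriber's catalog (a sketch; it assumes the patch surfaces
FINISHEDCOPY as a new srsubstate value):

```sql
-- Watch per-table sync states while the tablesync worker runs,
-- crashes, and restarts.
SELECT srrelid::regclass AS rel, srsubstate, srsublsn
FROM pg_subscription_rel;
```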

-- 
With Regards,
Amit Kapila.




Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> Please find the patch for the same.  I haven't added a test case for
> this yet.  I mean we can write a test case to pause the recovery and
> get the status.  But I am not sure that we can really write a reliable
> test case for 'pause requested' and 'paused'.

+1 to just show the recovery pause state in the output of
pg_is_wal_replay_paused. But should the function name
"pg_is_wal_replay_paused" be something like
"pg_get_wal_replay_pause_state" or some other name? To me, when "is"
appears in a function name, I expect a boolean output. Others may have
better thoughts.

IIUC, with the above change the responsibility of ensuring that
recovery is actually paused after it is requested lies with the user.
IMHO, the main problem we are trying to solve is then not met. Isn't
it better to have a new function (a wait version), along with the
above change to pg_is_wal_replay_paused, something like
"pg_wal_replay_pause_and_wait" returning true or false? The
functionality would be pg_wal_replay_pause + wait until it's actually
paused.
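
Usage could then look like this (a hypothetical function, as proposed
above; it does not exist yet):

```sql
-- Requests the pause and returns true once recovery is actually paused.
SELECT pg_wal_replay_pause_and_wait();
```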

Thoughts?

Some comments on the v6 patch:

[1] How about
+ * This function returns the current state of the recovery pause.
instead of
+ * This api will return the current state of the recovery pause.

[2] Typo - it's "requested" + * 'paused requested' - if pause is
reqested but recovery is not yet paused

[3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"

[4] Isn't it good to have an example usage and output of the function
in the documentation?
+Returns recovery pause status, which is not paused if
+pause is not requested, pause requested if pause is
+requested but recovery is not yet paused and, paused if
+the recovery is actually paused.
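
For instance, the example could look something like this (the output
shown assumes the patched text-returning version, and that recovery
has actually paused by the time of the call):

```sql
postgres=# SELECT pg_wal_replay_pause();
 pg_wal_replay_pause
---------------------

(1 row)

postgres=# SELECT pg_is_wal_replay_paused();
 pg_is_wal_replay_paused
-------------------------
 paused
(1 row)
```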


[5] Is it
+ * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED.
instead of
+ * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED.

[6] As I mentioned upthread, isn't it better to have
"IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?

[7] Can we have the function parameter name "recoveryPause" as "state"
or "pauseState"? Because that variable's context is already given by
the enum name RecoveryPauseState and the function name.

+SetRecoveryPause(RecoveryPauseState recoveryPause)

Here as well, "recoveryPauseState" to "state"?
+GetRecoveryPauseState(void)
 {
-    bool                recoveryPause;
+    RecoveryPauseState  recoveryPauseState;

[8] The function name RecoveryIsPaused and its comment "Check whether the
recovery pause is requested." don't seem to match. It looks like
it returns true for both RECOVERY_PAUSE_REQUESTED and RECOVERY_PAUSED.
Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?

Instead of "while (RecoveryIsPaused())", can't we change it to "while
(GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
RecoveryIsPaused()?

[9] Can we change the switch-case in pg_is_wal_replay_paused to
something like below?

Datum
pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
{
+    char       *state;
+
+    /* get the recovery pause state */
+    switch (GetRecoveryPauseState())
+    {
+        case RECOVERY_NOT_PAUSED:
+            state = "not paused";
+            break;
+        case RECOVERY_PAUSE_REQUESTED:
+            state = "pause requested";
+            break;
+        case RECOVERY_PAUSED:
+            state = "paused";
+            break;
+        default:
+            elog(ERROR, "invalid recovery pause state");
+    }
+
+    PG_RETURN_TEXT_P(cstring_to_text(state));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: SQL/JSON: functions

2021-01-23 Thread Erik Rijkers

On 2021-01-20 03:49, Nikita Glukhov wrote:


[0001-Add-common-SQL-JSON-clauses-v52.patch.gz]
[0002-SQL-JSON-constructors-v52.patch.gz]
[0003-IS-JSON-predicate-v52.patch.gz]
[0004-SQL-JSON-query-functions-v52.patch.gz]
[0005-SQL-JSON-functions-for-json-type-v52.patch.gz]
[0006-GUC-sql_json-v52.patch.gz]


Hi,

I read through the file func.sgml (only that file) and put the 
errors/peculiarities in the attached diff.  (Small stuff; typos really)



Your patch includes a CREATE TABLE my_films + INSERT, to run the 
examples against.  I think this is a great idea and we should do it more 
often.


But the table has a text column to contain the subsequently inserted 
json values. The insert runs fine, but it turns out that some later 
example queries only run against a jsonb column.  So I propose to 
change:

  CREATE TABLE my_films (js text);
to:
  CREATE TABLE my_films (js jsonb);

This change is not yet included in the attached file.  An alternative 
would be to cast the text column in the example queries as js::jsonb.


I also noticed that some error messages given in the sgml file differ 
from those actually produced:



   SELECT JSON_QUERY(js, '$.favorites[*].kind' ERROR ON ERROR) FROM my_films_jsonb;
   (table 'my_films_jsonb' is the same as your 'my_films', but with js as a jsonb column)


manual says: "ERROR: more than one SQL/JSON item"
 in reality: "ERROR: JSON path expression in JSON_QUERY should return singleton item without wrapper"
and:         "HINT: use WITH WRAPPER clause to wrap SQL/JSON item sequence into array"



Thanks,

Erik Rijkers



--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--- ./doc/src/sgml/func.sgml.orig	2021-01-20 14:52:35.564407275 +0100
+++ ./doc/src/sgml/func.sgml	2021-01-23 11:09:11.582465755 +0100
@@ -16968,7 +16968,7 @@
  
 
  
-   All SQL/JSON functions fall into one of the two groups.
+   All SQL/JSON functions fall into one of two groups.
Constructor functions
generate JSON data from values of SQL types.
Query functions
@@ -17034,7 +17034,7 @@
  Description
 
  
-  JSON function generates a JSON
+  The JSON function generates a JSON
   from a text data.
  
 
@@ -17049,7 +17049,7 @@
 
  
   String expression that provides the JSON text data.
-  Accepted any character strings (text, char, etc.)
+  Accepts any character strings (text, char, etc.)
   or binary strings (bytea) in UTF8 encoding.
   For null input, SQL null value is returned.
  
@@ -17110,7 +17110,7 @@
 
  Notes
  
-  Alternatively, you can construct JSON values simply
+  Alternatively, you can construct JSON values by simply
   using PostgreSQL-specific casts to 
   json and jsonb types.
  
@@ -17118,7 +17118,7 @@
 
  Examples
  
-  Construct a JSON the provided strings:
+  Construct a JSON using the provided strings:
  
 
 SELECT JSON('{ "a" : 123, "b": [ true, "foo" ], "a" : "bar" }');
@@ -17173,7 +17173,7 @@
   JSON.
   For null input, SQL  null
   (not a JSON null) value is returned.
-  For any scalar other than a number, a Boolean, the text representation
+  For any scalar other than a number or a Boolean, the text representation
   will be used, with escaping as necessary to make it a valid
   JSON string value.
   For details, see
@@ -17208,7 +17208,7 @@
 
  Examples
  
-  Construct a JSON from the provided values various types:
+  Construct a JSON from the provided values of various types:
  
 
 SELECT JSON_SCALAR(123.45);
@@ -17250,7 +17250,7 @@
  Description
 
  
-  JSON_OBJECT function generates a JSON
+  The JSON_OBJECT function generates a JSON
   object from SQL or JSON data.
  
 
@@ -17463,7 +17463,7 @@
  Description
 
  
-  JSON_OBJECTAGG function aggregates the provided data
+  The JSON_OBJECTAGG function aggregates the provided data
   into a JSON object. You can use this function to combine values
   stored in different table columns into pairs. If you specify a GROUP BY
   or an ORDER BY clause, this function returns a separate JSON object
@@ -17689,7 +17689,7 @@
  Description
 
  
-  JSON_ARRAY function constructs a JSON array from
+  The JSON_ARRAY function constructs a JSON array from
   the provided SQL or JSON data.
  
 
@@ -17855,7 +17855,7 @@
  Description
 
  
-  JSON_ARRAYAGG function aggregates the provided SQL
+  The JSON_ARRAYAGG function aggregates the provided SQL
   or JSON data into a JSON array.
  
 
@@ -18038,7 +18038,7 @@
  Description
 
  
-  JSON_EXISTS function checks whether the provided
+  The JSON_EXISTS function checks whether the provided
   JSON path expression can return any SQL/JSON items.
  
 
@@ -18151,7 +18151,7 @@
  Description
 

Re: doc review for v14

2021-01-23 Thread Michael Paquier
Hi Justin,

On Sun, Dec 27, 2020 at 02:26:05PM -0600, Justin Pryzby wrote:
> Thank you.

I have been looking at 0005, the patch dealing with the docs of the
replication stats, and have some comments.


 Number of times transactions were spilled to disk while decoding changes
-from WAL for this slot. Transactions may get spilled repeatedly, and
-this counter gets incremented on every such invocation.
+from WAL for this slot. A given transaction may be spilled multiple times, and
+this counter is incremented each time.
   
The original can be a bit hard to read, and I don't think that the new
formulation is an improvement.  I actually find confusing that this
mixes in the same sentence that a transaction can be spilled multiple
times and increment this counter each time.  What about splitting that
into two sentences?  Here is an idea:
"This counter is incremented each time a transaction is spilled.  The
same transaction may be spilled multiple times."

-Number of transactions spilled to disk after the memory used by
-logical decoding of changes from WAL for this slot exceeds
+Number of transactions spilled to disk because the memory used by
+logical decoding of changes from WAL for this slot exceeded
What does "logical decoding of changes from WAL" mean?  Here is an
idea to clarify all that:
"Number of transactions spilled to disk once the memory used by
logical decoding to decode changes from WAL has exceeded
logical_decoding_work_mem."

 Number of in-progress transactions streamed to the decoding output plugin
-after the memory used by logical decoding of changes from WAL for this
-slot exceeds logical_decoding_work_mem. Streaming only
+because the memory used by logical decoding of changes from WAL for this
+slot exceeded logical_decoding_work_mem. Streaming only
 works with toplevel transactions (subtransactions can't be streamed
-independently), so the counter does not get incremented for subtransactions
+independently), so the counter is not incremented for subtransactions.
I have the same issue here with "by logical decoding of changes from
WAL".  I'd say "after the memory used by logical decoding to decode
changes from WAL for this slot has exceeded logical_decoding_work_mem".

 output plugin while decoding changes from WAL for this slot. Transactions
-may get streamed repeatedly, and this counter gets incremented on every
-such invocation.
+may be streamed multiple times, and this counter is incremented each time.
I would split this stuff into two sentences:
"This counter is incremented each time a transaction is streamed.  The
same transaction may be streamed multiple times.
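
As an aside, the counters being documented here can be inspected
directly (a sketch, assuming the pg_stat_replication_slots view as it
currently stands in this cycle):

```sql
SELECT slot_name, spill_txns, spill_count, stream_txns, stream_count
FROM pg_stat_replication_slots;
```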

  Resets statistics to zero for a single replication slot, or for all
- replication slots in the cluster.  The argument can be either the name
- of the slot to reset the stats or NULL.  If the argument is NULL, all
- counters shown in the pg_stat_replication_slots
- view for all replication slots are reset.
+ replication slots in the cluster.  The argument can be either NULL or the name
+ of a slot for which stats are to be reset.  If the argument is NULL, all
+ counters in the pg_stat_replication_slots
+ view are reset for all replication slots.
Here also, I find rather confusing that this paragraph tells multiple
times that NULL resets the stats for all the replication slots.  NULL
should use a  markup, and it is cleaner to use "statistics"
rather than "stats" IMO.  So I guess we could simplify things as
follows:
"Resets statistics of the replication slot defined by the argument. If
the argument is NULL, resets statistics for all the replication
slots."
--
Michael


signature.asc
Description: PGP signature


Re: proposal: schema variables

2021-01-23 Thread Pavel Stehule
Hi

only rebase

Regards

Pavel


schema-variables-20200123.patch.gz
Description: application/gzip


Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-23 Thread James Hilliard
On Sat, Jan 23, 2021 at 1:27 AM Sergey Shinderuk
 wrote:
>
> On 23.01.2021 08:02, Sergey Shinderuk wrote:
> >> On the whole it looks like we should recommend installing the CLT
> >> and not bothering with Xcode, which is about 10X the size:
> >>
> >> $ du -hs /Library/Developer/CommandLineTools
> >> 1.1G/Library/Developer/CommandLineTools
> >> $ du -hs /Applications/Xcode.app
> >>   15G/Applications/Xcode.app
> >>
> >
> > Fair.
>
> BTW, Homebrew prefers the CLT SDK:
> https://github.com/Homebrew/brew/blob/master/Library/Homebrew/os/mac.rb#L138
>
># Prefer CLT SDK when both Xcode and the CLT are installed.
># Expected results:
># 1. On Xcode-only systems, return the Xcode SDK.
># 2. On Xcode-and-CLT systems where headers are provided by the
> system, return nil.
># 3. On CLT-only systems with no CLT SDK, return nil.
># 4. On CLT-only systems with a CLT SDK, where headers are
> provided by the system, return nil.
># 5. On CLT-only systems with a CLT SDK, where headers are not
> provided by the system, return the CLT SDK.
>
>
> Here is the relevant discussion:
> https://github.com/Homebrew/brew/pull/7134
>
> I like the example of Git compiled against the wrong
> LIBCURL_VERSION_NUM. Clearly, there are other issues with
> cross-compiling to a newer SDK, besides autoconf probes and weak imports.

From my understanding, Homebrew considers supporting deployment
targets other than that of the build host entirely out of scope. Due
to its nature of being a meta build system, Homebrew probably needs to
sidestep package deployment-target bugs in this way, as they are
likely to be quite common. Homebrew also has to ensure compatibility
with its binary bottles. Homebrew handles backwards compatibility
largely by using host-OS-specific binaries, which is not the typical
way package binaries are distributed on OSX.

So it appears that their reasoning for doing this may not be directly
applicable to the situation we have with postgres, as they have
additional concerns. In theory we should generally not have to worry
about this much as long as deployment-target feature detection is
functional, since either SDK would generally work for producing
binaries that can run on the build host.

>
> --
> Sergey Shinderuk
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company




Re: POC: postgres_fdw insert batching

2021-01-23 Thread Amit Langote
On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
 wrote:
> On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:
> > From: Tomas Vondra 
> >> Right, that's pretty much what I ended up doing (without the CMD_INSERT
> >> check it'd add batching info to explain for updates too, for example).
> >> I'll do a bit more testing on the attached patch, but I think that's the 
> >> right fix to
> >> push.
> >
> > Thanks to the outer check for operation ==  CMD_INSERT, the inner one 
> > became unnecessary.
> >
>
> Right. I've pushed the fix, hopefully buildfarm will get happy again.

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions.  Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching.  However, the ResultRelInfo may be the
one for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching.  As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1;  -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition-update situations, where mtstate->operation would
be CMD_UPDATE.

I've attached a patch.




--
Amit Langote
EDB: http://www.enterprisedb.com


0001-Prevent-FDW-insert-batching-during-cross-partition-u.patch
Description: Binary data


Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-23 Thread Sergey Shinderuk

On 23.01.2021 08:02, Sergey Shinderuk wrote:

On the whole it looks like we should recommend installing the CLT
and not bothering with Xcode, which is about 10X the size:

$ du -hs /Library/Developer/CommandLineTools
1.1G    /Library/Developer/CommandLineTools
$ du -hs /Applications/Xcode.app
  15G    /Applications/Xcode.app



Fair.


BTW, Homebrew prefers the CLT SDK:
https://github.com/Homebrew/brew/blob/master/Library/Homebrew/os/mac.rb#L138

  # Prefer CLT SDK when both Xcode and the CLT are installed.
  # Expected results:
  # 1. On Xcode-only systems, return the Xcode SDK.
  # 2. On Xcode-and-CLT systems where headers are provided by the 
system, return nil.

  # 3. On CLT-only systems with no CLT SDK, return nil.
  # 4. On CLT-only systems with a CLT SDK, where headers are 
provided by the system, return nil.
  # 5. On CLT-only systems with a CLT SDK, where headers are not 
provided by the system, return the CLT SDK.



Here is the relevant discussion:
https://github.com/Homebrew/brew/pull/7134

I like the example of Git compiled against the wrong 
LIBCURL_VERSION_NUM. Clearly, there are other issues with 
cross-compiling to a newer SDK, besides autoconf probes and weak imports.


--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-23 Thread Bharath Rupireddy
On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila  wrote:
>
> On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  wrote:
> >
> > Yes you are right. Looks like the above commit is causing the issue. I
> > reverted that commit and did not see the drop publication bug, see use
> > case [1].
> >
>
> Okay, thanks for confirming the same. I am planning to push the
> attached where I made changes in few comments, combined the code and
> test patch, and made a few changes in the commit message. Let me know
> if you have any suggestions?

Thanks, Amit. Looks like there were some typos in the test case
descriptions; I corrected them.

+# Drop the table from publicaiton
+# Add the table to publicaiton again
+# Rerfresh publication after table is added to publication

Attaching v7 patch. I'm fine with it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v7-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behavior.patch
Description: Binary data


Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, Jan 23, 2021 at 9:56 AM Dilip Kumar  wrote:
>
> On Fri, Jan 22, 2021 at 2:18 AM Robert Haas  wrote:
> >
> > On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA  wrote:
> > > If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> > > I'm ok for the current interface. I don't feel the need of
> > > pg_is_wal_replay_paluse_requeseted().
> >
> > Another idea could be that pg_is_wal_replay_paused() could be changed
> > to text, and the string could be either 'paused' or 'pause requested'
> > or 'not paused'. That way we'd be returning a direct representation of
> > the state we're keeping in memory. Some of the complexity in this
> > discussion seems to come from trying to squeeze 3 possibilities into a
> > Boolean.
> >
> > Let's also consider that we don't really know whether the client wants
> > us to wait or not, and different clients may want different things, or
> > maybe not, but we don't really know at this point. If we provide an
> > interface that waits, and the client doesn't want to wait but just
> > know the current state, they don't necessarily have any great options.
> > If we provide an interface that doesn't wait, and the client wants to
> > wait, it can poll until it gets the answer it wants. Polling can be
> > inefficient, but anybody who is writing a tool that uses this should
> > be able to manage an algorithm with some reasonable back-off behavior
> > (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of
> > that sort), so I'm not sure there's actually any real problem in
> > practice. So to me it seems more likely that an interface that is
> > based on waiting will cause difficulty for tool-writers than one that
> > does not.
> >
> > Other people may feel differently, of course...
>
> I think this is the better way of handling this.  So +1 from my side,
> I will send an updated patch.

Please find the patch for the same.  I haven't added a test case for
this yet.  I mean we can write a test case to pause the recovery and
get the status.  But I am not sure that we can really write a reliable
test case for 'pause requested' and 'paused'.
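
As a side note, the polling-with-back-off approach Robert describes
above could look like this on the client side. This is only a sketch,
and it assumes the text-returning pg_is_wal_replay_paused() from this
patch:

```sql
DO $$
DECLARE
    delay double precision := 0.01;   -- start at 10ms
BEGIN
    PERFORM pg_wal_replay_pause();
    -- Poll until recovery has actually paused, doubling the sleep
    -- each time, capped at 5 seconds.
    WHILE pg_is_wal_replay_paused() <> 'paused' LOOP
        PERFORM pg_sleep(delay);
        delay := least(delay * 2, 5.0);
    END LOOP;
END
$$;
```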

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v6-0001-pg_is_wal_replay_paused-will-return-the-status-of.patch
Description: Binary data