Re: Impact of checkpointer during pg_upgrade

2023-09-05 Thread Dilip Kumar
On Tue, Sep 5, 2023 at 10:55 AM Amit Kapila  wrote:
>
> > Earlier I was thinking that ERRORing out is better so that the user
> > can take the necessary action for the invalidated slots and then retry
> > the upgrade.  But thinking again, I could not find what the advantages
> > of this are: if we error out, users still need to restart the
> > old cluster and drop the corresponding subscriptions; OTOH, if we allow
> > the upgrade by ignoring the slots, the user has to take similar actions
> > on the new cluster.  So what's the advantage of erroring out over
> > upgrading?
> >
>
> The advantage is that we avoid the inconvenience caused to users because
> Drop Subscription will be unsuccessful as the corresponding slots are
> not present. So users first need to disassociate slots from the
> subscription and then drop the subscription.

Yeah, that's a valid argument for erroring out.

> Also, I am not sure
> leaving behind some slots doesn't have any other impact, otherwise,
> why don't we drop such slots from time to time after they are marked
> invalidated during normal operation?

Okay, I am also not sure of that.

> If users really want to leave
> behind such invalidated slots after upgrade, we can even think of
> providing some option like "exclude_invalid_logical_slots".

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread vignesh C
On Tue, 5 Sept 2023 at 09:10, Dilip Kumar  wrote:
>
> On Fri, Sep 1, 2023 at 12:16 PM vignesh C  wrote:
> >
> > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> > >
> > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > All but one. Normally, the idea of marking dirty is to indicate that
> > > > > we will actually write/flush the contents at a later point (except
> > > > > when required for correctness) as even indicated in the comments atop
> > > > > ReplicationSlotMarkDirty(). However, I see your point that we use
> > > > > that protocol at all the current places including CreateSlotOnDisk().
> > > > > So, we can probably do it here as well.
> > > >
> > > > yes
> > > >
> > >
> > > I think we should also ensure that slots are not invalidated
> > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > because we don't allow decoding from such slots, so we shouldn't
> > > include those.
> >
> > Added this check.
> >
> > Apart from this, I have also fixed the following issues that were
> > agreed on: a) setting slots to dirty in CheckPointReplicationSlots
> > instead of in SaveSlotToPath, b) the comments were moved
> > from ReplicationSlot to CheckPointReplicationSlots, c) tests
> > will be run with autovacuum = off, d) updating last_saved_confirmed_flush
> > based on cp.slotdata.confirmed_flush rather than
> > slot->data.confirmed_flush.
> > I have also added the commitfest entry for this at [1].
>
> The overall idea looks fine to me
>
> +
> + /*
> + * We won't ensure that the slot is persisted after the
> + * confirmed_flush LSN is updated as that could lead to frequent
> + * writes.  However, we need to ensure that we do persist the slots at
> + * the time of shutdown whose confirmed_flush LSN is changed since we
> + * last saved the slot to disk. This will help in avoiding retreat of
> + * the confirmed_flush LSN after restart.
> + */
> + if (is_shutdown && SlotIsLogical(s))
> + {
> + SpinLockAcquire(&s->mutex);
> + if (s->data.invalidated == RS_INVAL_NONE &&
> + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> + s->dirty = true;
> + SpinLockRelease(&s->mutex);
> + }
>
> The comments don't mention anything about why we are flushing only at
> the shutdown checkpoint.  I mean, the checkpoint is not that frequent,
> and we already perform a lot of I/O during checkpoints, so isn't it
> wise to flush during every checkpoint?  We may argue that there is no
> extra advantage to that, as we are not optimizing for crash recovery,
> but OTOH there is no reason not to do so for other checkpoints.  Or are
> we worried about concurrency with a parallel walsender running during a
> non-shutdown checkpoint?  If so, better to explain that as well.  If it
> is already discussed in the thread and we have a conclusion on this,
> then maybe we can mention this in the comments?

I felt it is better to do this only during the shutdown checkpoint, as
in other cases the slot is saved periodically as and when replication
happens. Added comments for the same.

> /*
>   * Flush all replication slots to disk.
>   *
> - * This needn't actually be part of a checkpoint, but it's a convenient
> - * location.
> + * is_shutdown is true in case of a shutdown checkpoint.
>   */
>  void
> -CheckPointReplicationSlots(void)
> +CheckPointReplicationSlots(bool is_shutdown)
>
> It seems we have removed two lines from the function header comments;
> is this intentional or accidental?

Modified.
The updated v8 version patch has the changes for the same.

Regards,
Vignesh
From 1e498fb2f13cebe49e8e187c664caf71da2aed5e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v8] Persist logical slots to disk during a shutdown checkpoint
 if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith, Ashutosh Bapat
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=gyqkxox0afnbm1fbp7sy7t4yw...@mail.gmail.com
Discussion: http://postgr.es/m/tyapr01mb58664c81887b3af2eb6b16e3f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c |   2 +-
 src/

Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Nishant Sharma
Hi Jelte,


Please find my reviews below:-
*1)* From what I have understood from the above, pgbouncer fills
be_pid (int, 32 bits) with random bits because it does not have an
associated server connection yet.
With this, I was thinking: isn't this a problem of pgbouncer filling
be_pid with random bits? Maybe it should fill be_pid
with a random positive integer instead of any integer, since it
represents a pid? -- If this makes sense, then maybe the fix
should be in pgbouncer instead of in how be_pid is processed in
pg_basebackup?

*2)* Apart from that, the patch looks straightforward, with these two changes:
"%d" --> "%u" and "(int)" --> "(unsigned)".


Regards,
Nishant.


On Thu, Aug 31, 2023 at 2:43 PM Jelte Fennema  wrote:

> With PgBouncer in the middle PQbackendPID can return negative values
> due to it filling all 32 bits of the be_pid with random bits.
>
> When this happens it results in pg_basebackup generating an invalid slot
> name (when no specific slot name is passed in) and thus throwing an
> error like this:
>
> pg_basebackup: error: could not send replication command
> "CREATE_REPLICATION_SLOT "pg_basebackup_-1201966863" TEMPORARY
> PHYSICAL ( RESERVE_WAL)": ERROR:  replication slot name
> "pg_basebackup_-1201966863" contains invalid character
> HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>
> This patch fixes that problem by formatting the result from PQbackendPID
> as an unsigned integer when creating the temporary replication slot
> name.
>
> I think it would be good to backport this fix too.
>
> Replication connection support for PgBouncer is not merged yet, but
> it's pretty much ready:
> https://github.com/pgbouncer/pgbouncer/pull/876
>
> The reason PgBouncer does not pass on the actual Postgres backend PID
> is that it doesn't have an associated server connection yet when it
> needs to send the startup message to the client. It also cannot use
> its own PID, because that would be the same for all clients, since
> pgbouncer is a single process.
>


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-05 Thread Hayato Kuroda (Fujitsu)
Dear Hou-san,

> Based on this, it's possible that the slots we get each time when checking
> wal_status are different, because they may get changed in between these
> checks. This may not cause serious problems for now, because we will either
> copy all the slots, including ones invalidated, when upgrading, or we report
> an ERROR. But I feel it's better to get a consistent result each time we
> check the slots, to close the possibility of problems in the future. So, I
> feel we could centralize the wal_status check and the slot fetch, so that
> even if some slot's status changes after that, it won't risk affecting our
> check. What do you think?

Thank you for the suggestion! I agree that we should centralize the checks, and
I had already started modifying the patch accordingly. Here is the updated patch.

In this patch all slot info is extracted in get_old_cluster_logical_slot_infos(),
and subsequent functions use it. Based on that change, two attributes,
confirmed_flush and wal_status, were added to LogicalSlotInfo.
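
For illustration only (the field names and types here are my sketch, not
necessarily what the patch uses), the idea is that everything the later
checks need is captured once per slot:

typedef struct LogicalSlotInfo
{
    char       *slotname;           /* slot name */
    char       *plugin;             /* output plugin */
    char       *confirmed_flush;    /* newly added: confirmed_flush LSN, as text */
    char       *wal_status;         /* newly added: pg_replication_slots.wal_status */
} LogicalSlotInfo;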

IIUC we cannot use struct List in the client code, so structures and related
functions were added to function.c. These are used for extracting unique
plugins, but that may be overkill because check_loadable_libraries() handles
duplicated entries. If we can ignore duplicated entries, these functions can be
removed.

Also, to simplify the code, only the first invalidated slot found is reported in
check_old_cluster_for_valid_slots(). Warning messages in the function were
removed. I think this may be enough because check_new_cluster_is_empty() does a
similar thing. Please tell me if it should be reverted...


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v31-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description:  v31-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch


v31-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v31-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: Improve heapgetpage() performance, overhead from serializable

2023-09-05 Thread John Naylor
On Mon, Jul 17, 2023 at 9:58 PM Andres Freund  wrote:

> FWIW, there's more we can do, with some hacky changes I got the time down to
> 273.261, but the tradeoffs start to be a bit more complicated. And 397->320ms
> for something as core as this, is imo worth considering on its own.

Nice!

> On 2023-07-17 09:55:07 +0800, Zhang Mingli wrote:

> > Does it make sense to combine the if/else condition and put it into the
> > inline function's param?
> >
> > Like:
> > scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
> >                                        block, lines, all_visible, check_serializable);
>
> I think that makes it less likely that the compiler actually generates a
> constant-folded version for each of the branches. Perhaps worth some
> experimentation.

Combining this way doesn't do so for me.
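
As a standalone illustration of that point (not the actual heapam code;
"collect" here is just a stand-in for heapgetpage_collect), passing literal
constants from each branch is what lets the compiler fold the flags away,
while passing the combined variables usually does not:

#include <stdbool.h>
#include <stdio.h>

/* stand-in for the always-inline helper in the patch */
static inline int
collect(int nlines, bool all_visible, bool check_serializable)
{
    int     ntup = 0;

    for (int i = 0; i < nlines; i++)
    {
        if (!all_visible && (i % 7 == 0))   /* pretend visibility check */
            continue;
        if (check_serializable)
            (void) 0;                       /* pretend conflict check */
        ntup++;
    }
    return ntup;
}

int
main(void)
{
    int     lines = 100;
    bool    all_visible = true;
    bool    check_serializable = false;
    int     ntup;

    /* constants at each call site -> four specialized loop bodies */
    if (all_visible)
        ntup = check_serializable ? collect(lines, true, true)
            : collect(lines, true, false);
    else
        ntup = check_serializable ? collect(lines, false, true)
            : collect(lines, false, false);

    /*
     * Combining the flags defeats the specialization:
     *   ntup = collect(lines, all_visible, check_serializable);
     */
    printf("%d\n", ntup);
    return 0;
}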

Minor style nit:

+ scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
+   block, lines, 0, 1);

I believe we prefer true/false rather than numbers.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: old_snapshot_threshold bottleneck on replica

2023-09-05 Thread Thomas Munro
On Wed, Jun 14, 2023 at 3:56 PM Thomas Munro  wrote:
> On Tue, Feb 14, 2023 at 9:45 AM Andres Freund  wrote:
> > Indeed. There's a lot of things wrong with it. We have reproducers for
> > creating wrong query results. Nobody has shown interest in fixing the
> > problems, for several years by now. It costs users that *do not* use the
> > feature performance (*).
> >
> > I think we're doing our users a disservice by claiming to have this feature.
> >
> > I don't think a lot of the existing code would survive if we were to create
> > a newer, more maintainable / reliable, version of the feature.
>
> I raised this at the recent developer meeting and the assembled
> hackers agreed.  Does anyone think we *shouldn't* drop the feature?  I
> volunteered to write a removal patch for v17, so here's a first run
> through to find all the traces of this feature.  In this first go I
> removed everything I could think of, but we might want to keep some
> vestiges.  I guess we might want to keep the registered error
> class/code?  Should we invent a place where we keep stuff like #define
> TestForOldSnapshot(...) expanding to nothing for some amount of time,
> for extensions?  I dunno, I bet extensions doing stuff that
> sophisticated already have a bunch of version tests anyway.  I suppose
> keeping the GUC wouldn't really be helpful (if you're using it, you
> probably want to know that it isn't available anymore and think about
> the implications for your application).

Done.

I hope we get "snapshot too old" back one day.




Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Dilip Kumar
On Tue, Sep 5, 2023 at 10:12 AM Amit Kapila  wrote:
>
> On Tue, Sep 5, 2023 at 7:54 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Monday, September 4, 2023 6:15 PM vignesh C  wrote:
> > >
> > > On Mon, 4 Sept 2023 at 15:20, Amit Kapila  wrote:
> > > >
> > > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C  wrote:
> > > > >
> > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila
> > >  wrote:
> > > > > > > >
> > > > > > > > All but one. Normally, the idea of marking dirty is to
> > > > > > > > indicate that we will actually write/flush the contents at a
> > > > > > > > later point (except when required for correctness) as even
> > > > > > > > indicated in the comments atop ReplicationSlotMarkDirty().
> > > > > > > > However, I see your point that we use that protocol at all the
> > > > > > > > current places including CreateSlotOnDisk().
> > > > > > > > So, we can probably do it here as well.
> > > > > > >
> > > > > > > yes
> > > > > > >
> > > > > >
> > > > > > I think we should also ensure that slots are not invalidated
> > > > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > > > > because we don't allow decoding from such slots, so we shouldn't
> > > > > > include those.
> > > > >
> > > > > Added this check.
> > > > >
> > > > > Apart from this I have also fixed the following issues that were
> > > > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> > > > > instead of setting it in SaveSlotToPath
> > > > >
> > > >
> > > > + if (is_shutdown && SlotIsLogical(s)) { SpinLockAcquire(&s->mutex);
> > > > + if (s->data.invalidated == RS_INVAL_NONE &&
> > > > + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> > > > + s->dirty = true;
> > > >
> > > > I think it is better to use ReplicationSlotMarkDirty() as that would
> > > > be consistent with all other usages.
> > >
> > > ReplicationSlotMarkDirty works only on MyReplicationSlot, whereas
> > > CheckpointReplicationSlots loops through all the slots and marks the
> > > appropriate slot as dirty; we might have to change
> > > ReplicationSlotMarkDirty to take the slot as an input parameter, and all
> > > callers should pass MyReplicationSlot.
> >
> > Personally, I feel if we want to centralize the code of marking dirty into a
> > function, we can introduce a new static function MarkSlotDirty(slot) to mark
> > passed slot dirty and let ReplicationSlotMarkDirty and
> > CheckpointReplicationSlots call it. Like:
> >
> > void
> > ReplicationSlotMarkDirty(void)
> > {
> > MarkSlotDirty(MyReplicationSlot);
> > }
> >
> > +static void
> > +MarkSlotDirty(ReplicationSlot *slot)
> > +{
> > +   Assert(slot != NULL);
> > +
> > +   SpinLockAcquire(&slot->mutex);
> > +   slot->just_dirtied = true;
> > +   slot->dirty = true;
> > +   SpinLockRelease(&slot->mutex);
> > +}
> >
> > This is somewhat similar to the relation between ReplicationSlotSave()
> > (serialize my backend's replication slot) and SaveSlotToPath() (save the
> > passed slot).
> >
> > > Another thing is that we have already taken the spinlock to access
> > > last_confirmed_flush_lsn in CheckpointReplicationSlots, so we could set
> > > the dirty flag here itself; else we will have to release the lock and
> > > call ReplicationSlotMarkDirty, which will take the lock again.
> >
> > Yes, this is unavoidable, but maybe it's not a big problem as
> > we only do it at shutdown.
> >
>
> True, but still it doesn't look elegant. I also thought about having a
> (probably inline) function that marks both the just_dirtied and dirty fields.
> However, that requires us to assert that the caller has already
> acquired the spinlock. I see a macro SpinLockFree() that might help, but
> it didn't seem to be used anywhere in the code, so I'm not sure if we can
> rely on it.

Can't we just have code like this?  I mean we will have to make
ReplicationSlotMarkDirty take the slot as an argument, or have another
version which takes the slot as an argument and is called by us
as well as by ReplicationSlotMarkDirty().  And why do we need these
checks (s->data.invalidated == RS_INVAL_NONE &&
s->data.confirmed_flush != s->last_saved_confirmed_flush) under the
mutex?  The walsender is shut down, so the confirmed_flush LSN cannot move
concurrently, and the slot cannot be invalidated either, because that is
done by the checkpointer and we are in the checkpointer.

+ if (is_shutdown && SlotIsLogical(s))
+ {
+ if (s->data.invalidated == RS_INVAL_NONE &&
+ s->data.confirmed_flush != s->last_saved_confirmed_flush)
+ {
+ ReplicationSlotMarkDirty(s);
+ }
+ }


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread John Naylor
On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang 
wrote:

> In the function heapgetpage, if a table is not updated very frequently,
> many actions in the tuple loop are superfluous. For all_visible pages,
> loctup does not need to be assigned, nor does the "valid" variable.
> CheckForSerializableConflictOutNeeded, from the
> HeapCheckForSerializableConflictOut function, needs to be inspected at

Thanks for submitting! A few weeks before this, there was another proposal,
which specializes code for all paths, not just one. That patch also does so
without duplicating the loop:

https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysj...@awork3.anarazel.de

> the beginning of the cycle only once. Using vtune you can clearly see
> the result (attached heapgetpage.jpg).
>
> So by splitting the loop logic into two parts, the vtune results show
> significant improvement (attached heapgetpage-allvis.jpg).

For future reference, it's not clear at all from the screenshots what the
improvement will be for the user. In the above thread, the author shares
testing methodology as well as timing measurements. This is useful for
reproducibility, as well as convincing others that the change is important.

--
John Naylor
EDB: http://www.enterprisedb.com


Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-09-05 Thread tender wang
Hi all,

   I recently ran benchmark [1] on master and found a performance problem,
as shown below:

explain analyze select
  subq_0.c0 as c0,
  subq_0.c1 as c1,
  subq_0.c2 as c2
from
  (select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
  from
public.lineitem as ref_0
  left join public.lineitem as sample_0
  on ((select p_partkey from public.part order by p_partkey limit 1)
 is not NULL)
  where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
   QUERY PLAN

-
 Limit  (cost=78.00..45267050.75 rows=1 width=27) (actual
time=299695.097..299695.099 rows=0 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual
time=0.651..0.652 rows=1 loops=1)
   ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual
time=0.650..0.651 rows=1 loops=1)
 Sort Key: part.p_partkey
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8)
(actual time=0.013..0.428 rows=2000 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..45266972.75 rows=1 width=27)
(actual time=299695.096..299695.096 rows=0 loops=1)
 Join Filter: ($0 IS NOT NULL)
 Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS
NULL))
 Rows Removed by Filter: 3621030625
 ->  Seq Scan on lineitem ref_0  (cost=0.00..1969.75 rows=60175
width=11) (actual time=0.026..6.225 rows=60175 loops=1)
 ->  Materialize  (cost=0.00..2270.62 rows=60175 width=27) (actual
time=0.000..2.554 rows=60175 loops=60175)
   ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75
rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
 Planning Time: 0.172 ms
 Execution Time: 299695.501 ms
(16 rows)

After I set enable_material to off, the same query runs faster, as below:
set enable_material = off;
explain analyze  select
  subq_0.c0 as c0,
  subq_0.c1 as c1,
  subq_0.c2 as c2
from
  (select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
  from
public.lineitem as ref_0
  left join public.lineitem as sample_0
  on ((select p_partkey from public.part order by p_partkey limit 1)
 is not NULL)
  where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
  QUERY
PLAN
---
 Limit  (cost=1078.00..91026185.57 rows=1 width=27) (actual
time=192669.605..192670.425 rows=0 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual
time=0.662..0.663 rows=1 loops=1)
   ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual
time=0.661..0.662 rows=1 loops=1)
 Sort Key: part.p_partkey
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8)
(actual time=0.017..0.430 rows=2000 loops=1)
   ->  Gather  (cost=1000.00..91026107.57 rows=1 width=27) (actual
time=192669.604..192670.422 rows=0 loops=1)
 Workers Planned: 1
 Params Evaluated: $0
 Workers Launched: 1
 ->  Nested Loop Left Join  (cost=0.00..91025107.47 rows=1
width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
   Join Filter: ($0 IS NOT NULL)
   Filter: ((sample_0.l_orderkey IS NULL) AND
(sample_0.l_shipmode IS NULL))
   Rows Removed by Filter: 1810515312
   ->  Parallel Seq Scan on lineitem ref_0  (cost=0.00..1721.97
rows=35397 width=11) (actual time=0.007..3.797 rows=30088 loops=2)
   ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75
rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
 Planning Time: 0.174 ms
 Execution Time: 192670.458 ms
(19 rows)

I debugged the code and found that consider_parallel_nestloop() doesn't
consider the materialized form of the cheapest inner path.
When enable_material = true, we can see that the Material path won in the first
plan, but the Parallel Seq Scan node is not added as the outer path, because
in try_partial_nestloop_path() the cost of the nestloop was computed using the
seq scan path, not the material path.
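
A rough sketch of the direction I have in mind (simplified and untested, not
the attached patch itself) inside consider_parallel_nestloop(), mirroring what
match_unsorted_outer() already does for non-parallel nestloops:

    Path       *inner_cheapest_total = innerrel->cheapest_total_path;
    Path       *matpath = NULL;

    /* also build a materialized form of the cheapest inner path */
    if (enable_material && inner_cheapest_total != NULL &&
        !ExecMaterializesOutput(inner_cheapest_total->pathtype))
        matpath = (Path *) create_material_path(innerrel, inner_cheapest_total);

    ...

        /* inside the loop over the partial outer paths */
        if (matpath != NULL && matpath->parallel_safe)
            try_partial_nestloop_path(root, joinrel, outerpath, matpath,
                                      pathkeys, jointype, extra);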

[1] includes the test table schema and data, so you can reproduce the problem above.

I tried to fix this problem in the attached patch, and I found that PG 12.12
also has this issue.

Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Drouvot, Bertrand

Hi,

On 9/5/23 7:44 AM, Michael Paquier wrote:

On Mon, Sep 04, 2023 at 02:14:58PM +0200, Drouvot, Bertrand wrote:

 # Build the descriptions.  There are in camel-case.
 # LWLock and Lock classes do not need any modifications.

Nit: 2 whitespaces before "There are in camel"


The whitespaces are intentional, 


Oh ok, out of curiosity, why are 2 whitespaces intentional?


Then, the only diff is:

< Client,WalSenderWaitWal,Waiting for WAL to be flushed in WAL sender process
---
> Client,WalSenderWaitForWAL,Waiting for WAL to be flushed in WAL sender process


That said, it looks good to me.


Ah, good catch.  I did not think about cross-checking the data in the
new view before and after the patch set.  This rename needs to happen
in 0001.

Please find v5 attached.  How does that look?


Thanks!

That looks good. I just noticed that v5 did re-introduce the "issue" that
was fixed in 00e49233a9.

Also, v5 needs a rebase due to f691f5b80a.

Attaching v6 taking care of the 2 points mentioned above.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 6a94fc1a00ae784b12e15f498a5afba4020377f9 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 5 Sep 2023 14:32:37 +0900
Subject: [PATCH v6 1/2] Rename wait events with more consistent camelcase
 style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The event
names used the same case-insensitive characters, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c  |  2 +-
 src/backend/replication/walsender.c   |  2 +-
 src/backend/storage/ipc/shm_mq.c  |  6 +-
 .../utils/activity/wait_event_names.txt   | 90 +--
 4 files changed, 50 insertions(+), 50 deletions(-)
   3.2% src/backend/storage/ipc/
  93.3% src/backend/utils/activity/
   3.3% src/backend/

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 253181f47a..38b042804c 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
break;
 
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-WAIT_EVENT_MQ_PUT_MESSAGE);
+WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);
ResetLatch(MyLatch);
CHECK_FOR_INTERRUPTS();
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 46e48492ac..e250b0567e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1654,7 +1654,7 @@ WalSndWaitForWal(XLogRecPtr loc)
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
 
-   WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+   WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL);
}
 
/* reactivate latch so WalSndLoop knows to continue */
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 * cheaper than setting one that has been reset.
 */
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-WAIT_EVENT_MQ_SEND);
+WAIT_EVENT_MESSAGE_QUEUE_SEND);
 
/* Reset the latch so we don't spin. */
ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 * setting one that has been reset.
 */
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-WAIT_EVENT_MQ_RECEIVE);
+WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
 
/* Reset the latch so we don't spin. */
ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)
 
/* Wait to be signaled. */
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-WAIT_EVENT_MQ_INTERNAL);
+WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);
 
/* Reset the latch so we don't spin. */
ResetLatc

Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread Quan Zongliang




On 2023/9/5 16:15, John Naylor wrote:


On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang wrote:


 > In the function heapgetpage, if a table is not updated very frequently,
 > many actions in the tuple loop are superfluous. For all_visible pages,
 > loctup does not need to be assigned, nor does the "valid" variable.
 > CheckForSerializableConflictOutNeeded, from the
 > HeapCheckForSerializableConflictOut function, needs to be inspected at

Thanks for submitting! A few weeks before this, there was another 
proposal, which specializes code for all paths, not just one. That patch 
also does so without duplicating the loop:


https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysj...@awork3.anarazel.de
 



Nice patch. I'm sorry I didn't notice it before.


 > the beginning of the cycle only once. Using vtune you can clearly see
 > the result (attached heapgetpage.jpg).
 >
 > So by splitting the loop logic into two parts, the vtune results show
 > significant improvement (attached heapgetpage-allvis.jpg).

For future reference, it's not clear at all from the screenshots what 
the improvement will be for the user. In the above thread, the author 
shares testing methodology as well as timing measurements. This is 
useful for reproducibility, as well as convincing others that the change
is important.



Here's how I tested it:
   EXPLAIN ANALYZE SELECT * FROM orders;
Maybe the test wasn't good enough. Although the best result with the
modification looks good, it fluctuates a lot, which makes it hard to compare.
That is why I used the vtune results instead.


My patch mainly eliminates:
1. Assignment of the "loctup" struct variable (in vtune you can see that
these 4 lines have a significant overhead: 0.4 1.0 0.2 0.4).

2. Assignment of the "valid" variable (overhead 0.6).
3. The HeapCheckForSerializableConflictOut function call (overhead 0.6).

Although the overhead is not the same from test to test, all of these are
too obvious to ignore. The screenshots are mainly to show the three
improvements mentioned above.


I'll also try Andres Freund's test method next.


--
John Naylor
EDB: http://www.enterprisedb.com 






Re: logfmt and application_context

2023-09-05 Thread Daniel Gustafsson
> On 30 Aug 2023, at 14:36, Étienne BERSAC  wrote:

> ..what do you think of having logfmt output alongside JSON and CSV?

logfmt is widely supported by log ingestion and analysis tools, and has been
for a long enough time (IMHO) to be called mature, which is good.  Less ideal is
that there is no official formal definition of what logfmt is; some consumers
of it (like Splunk) even support it while not calling it logfmt.  If we add
support for it, can we reasonably expect that what we emit is what consumers of
it assume it will look like?  Given the simplicity of it I think it can be
argued, but I'm far from an expert in this area.
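
For readers who haven't seen it, a logfmt-style log line is just space-separated
key=value pairs, along the lines of (illustrative only, not a proposed format):

  ts=2023-09-05T11:39:00.123Z level=LOG pid=12345 db=postgres user=daniel msg="connection authorized"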

--
Daniel Gustafsson





Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Daniel Gustafsson
> On 5 Sep 2023, at 09:09, Nishant Sharma  
> wrote:

> With this, I was thinking, isn't this a problem of pgbouncer filling
> be_pid with random bits? Maybe it should have filled the be_pid
> with a random positive integer instead of any integer as it
> represents a pid?

I'm inclined to agree that anyone sending a value which is supposed to
represent a PID should be expected to send a value which corresponds to the
format of a PID.

--
Daniel Gustafsson





Re: Optionally using a better backtrace library?

2023-09-05 Thread Alvaro Herrera
On 2023-Sep-04, Noah Misch wrote:

> On Mon, Jul 03, 2023 at 11:58:25AM +0200, Alvaro Herrera wrote:

> > Agreed, these backtraces are pretty close to useless.  Not completely,
> > but I haven't found a practical way to use them for actual debugging
> > of production problems.
> 
> For what it's worth, I use the attached script to convert the current
> errbacktrace output to a fully-symbolized backtrace.

Much appreciated!  I can put this to good use.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-09-05 Thread tender wang
After applying the patch, the result is as below:
   QUERY
PLAN

 Limit  (cost=1078.00..26630101.20 rows=1 width=27) (actual
time=160571.005..160571.105 rows=0 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual
time=1.065..1.066 rows=1 loops=1)
   ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual
time=1.064..1.065 rows=1 loops=1)
 Sort Key: part.p_partkey
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8)
(actual time=0.046..0.830 rows=2000 loops=1)
   ->  Gather  (cost=1000.00..26630023.20 rows=1 width=27) (actual
time=160571.003..160571.102 rows=0 loops=1)
 Workers Planned: 1
 Params Evaluated: $0
 Workers Launched: 1
 ->  Nested Loop Left Join  (cost=0.00..26629023.10 rows=1
width=27) (actual time=160549.257..160549.258 rows=0 loops=2)
   Join Filter: ($0 IS NOT NULL)
   Filter: ((sample_0.l_orderkey IS NULL) AND
(sample_0.l_shipmode IS NULL))
   Rows Removed by Filter: 1810515312
   ->  Parallel Seq Scan on lineitem ref_0  (cost=0.00..1721.97
rows=35397 width=11) (actual time=0.010..3.393 rows=30088 loops=2)
   ->  Materialize  (cost=0.00..2270.62 rows=60175 width=27)
(actual time=0.000..2.839 rows=60175 loops=60175)
 ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75
rows=60175 width=27) (actual time=0.008..11.381 rows=60175 loops=2)
 Planning Time: 0.174 ms
 Execution Time: 160571.476 ms
(20 rows)

tender wang wrote on Tue, 5 Sep 2023 at 16:52:

> Hi all,
>
> I recently ran benchmark [1] on master and found a performance problem,
> as shown below:
>
> explain analyze select
>   subq_0.c0 as c0,
>   subq_0.c1 as c1,
>   subq_0.c2 as c2
> from
>   (select
> ref_0.l_shipmode as c0,
> sample_0.l_orderkey as c1,
> sample_0.l_quantity as c2,
> ref_0.l_orderkey as c3,
> sample_0.l_shipmode as c5,
> ref_0.l_shipinstruct as c6
>   from
> public.lineitem as ref_0
>   left join public.lineitem as sample_0
>   on ((select p_partkey from public.part order by p_partkey limit
> 1)
>  is not NULL)
>   where sample_0.l_orderkey is NULL) as subq_0
> where subq_0.c5 is NULL
> limit 1;
>QUERY PLAN
>
>
> -
>  Limit  (cost=78.00..45267050.75 rows=1 width=27) (actual
> time=299695.097..299695.099 rows=0 loops=1)
>InitPlan 1 (returns $0)
>  ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual
> time=0.651..0.652 rows=1 loops=1)
>->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual
> time=0.650..0.651 rows=1 loops=1)
>  Sort Key: part.p_partkey
>  Sort Method: top-N heapsort  Memory: 25kB
>  ->  Seq Scan on part  (cost=0.00..68.00 rows=2000
> width=8) (actual time=0.013..0.428 rows=2000 loops=1)
>->  Nested Loop Left Join  (cost=0.00..45266972.75 rows=1 width=27)
> (actual time=299695.096..299695.096 rows=0 loops=1)
>  Join Filter: ($0 IS NOT NULL)
>  Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode
> IS NULL))
>  Rows Removed by Filter: 3621030625
>  ->  Seq Scan on lineitem ref_0  (cost=0.00..1969.75 rows=60175
> width=11) (actual time=0.026..6.225 rows=60175 loops=1)
>  ->  Materialize  (cost=0.00..2270.62 rows=60175 width=27) (actual
> time=0.000..2.554 rows=60175 loops=60175)
>->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75
> rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
>  Planning Time: 0.172 ms
>  Execution Time: 299695.501 ms
> (16 rows)
>
> After I set enable_material to off, the same query runs faster, as below:
> set enable_material = off;
> explain analyze  select
>   subq_0.c0 as c0,
>   subq_0.c1 as c1,
>   subq_0.c2 as c2
> from
>   (select
> ref_0.l_shipmode as c0,
> sample_0.l_orderkey as c1,
> sample_0.l_quantity as c2,
> ref_0.l_orderkey as c3,
> sample_0.l_shipmode as c5,
> ref_0.l_shipinstruct as c6
>   from
> public.lineitem as ref_0
>   left join public.lineitem as sample_0
>   on ((select p_partkey from public.part order by p_partkey limit
> 1)
>  is not NULL)
>   where sample_0.l_orderkey is NULL) as subq_0
> where subq_0.c5 is NULL
> limit 1;
>   QUERY
> PLAN
>
> --

Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Jelte Fennema
On Tue, 5 Sept 2023 at 11:39, Daniel Gustafsson  wrote:
>
> > On 5 Sep 2023, at 09:09, Nishant Sharma  
> > wrote:
>
> > With this, I was thinking, isn't this a problem of pgbouncer filling
> > be_pid with random bits? Maybe it should have filled the be_pid
> > with a random positive integer instead of any integer as it
> > represents a pid?
>
> I'm inclined to agree that anyone sending a value which is supposed to
> represent a PID should be expected to send a value which corresponds to the
> format of a PID.

When there is a pooler in the middle it already isn't a PID anyway. I
took a look at a few other connection poolers and all the ones I
looked at (Odyssey and pgcat) do the same: They put random bytes in
the be_pid field (and thus can result in negative values). This normally
does not cause any problems, because the be_pid value is simply sent
back verbatim to the server when canceling a query, which is its main
purpose according to the docs:

> This message provides secret-key data that the frontend must save if it wants 
> to be able to issue cancel requests later.

Source: https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.3

For that purpose it's actually more secure to use all bits for random
data, instead of keeping one bit always 0.

Its main other purpose that I know of is displaying it in a psql
prompt, so you know where to attach a debugger. This is completely
broken either way as soon as you have a connection pooler in the
middle, because you would want to display the Postgres backend PID
instead of the random ID that the connection pooler sends back. So if
it's negative that's no issue (it displays fine and it's useless
either way).

So, while I agree that putting a negative value in the process ID field of
BackendData is arguably incorrect, given the simplicity of the fix on
the pg_basebackup side, I think addressing it in pg_basebackup is
easier than fixing this in all connection poolers.

Sidenote: When PgBouncer is run in peering mode it actually uses the
first two bytes of the PID to encode the peer_id into it. That way it
knows to which peer it should forward the cancellation message. Thus
fixing this in PgBouncer would require using other bytes for that.




Remove unnecessary 'always:' from CompilerWarnings task

2023-09-05 Thread Nazir Bilal Yavuz
Hi,

There are multiple 'always:' keywords under the CompilerWarnings task.
Instead of that, we can use one 'always:' and move the instructions
under this. So, I removed unnecessary ones and rearranged indents
according to that change.

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 3bd8f6ff5b8add90dcfe42761a3c1fe81a5c174a Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Tue, 5 Sep 2023 13:04:24 +0300
Subject: [PATCH v1] Remove unnecessary 'always:' from CompilerWarnings task

Warning scripts and upload_caches instructions can be defined under one
'always:' keyword, so remove unnecessary ones and rearrange indents
according to that change.
---
 .cirrus.tasks.yml | 43 ++-
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..e4a81a151b6 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -684,8 +684,8 @@ task:
   # different compilers to build with different combinations of dtrace on/off
   # and cassert on/off.
 
-  # gcc, cassert off, dtrace on
   always:
+# gcc, cassert off, dtrace on
 gcc_warning_script: |
   time ./configure \
 --cache gcc.cache \
@@ -695,8 +695,7 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  # gcc, cassert on, dtrace off
-  always:
+# gcc, cassert on, dtrace off
 gcc_a_warning_script: |
   time ./configure \
 --cache gcc.cache \
@@ -706,8 +705,7 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  # clang, cassert off, dtrace off
-  always:
+# clang, cassert off, dtrace off
 clang_warning_script: |
   time ./configure \
 --cache clang.cache \
@@ -716,8 +714,7 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  # clang, cassert on, dtrace on
-  always:
+# clang, cassert on, dtrace on
 clang_a_warning_script: |
   time ./configure \
 --cache clang.cache \
@@ -728,8 +725,7 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  # cross-compile to windows
-  always:
+# cross-compile to windows
 mingw_cross_warning_script: |
   time ./configure \
 --host=x86_64-w64-mingw32 \
@@ -740,11 +736,10 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  ###
-  # Verify docs can be built
-  ###
-  # XXX: Only do this if there have been changes in doc/ since last build
-  always:
+###
+# Verify docs can be built
+###
+# XXX: Only do this if there have been changes in doc/ since last build
 docs_build_script: |
   time ./configure \
 --cache gcc.cache \
@@ -754,16 +749,15 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} -C doc
 
-  ###
-  # Verify headerscheck / cpluspluscheck succeed
-  #
-  # - Don't use ccache, the files are uncacheable, polluting ccache's
-  #   cache
-  # - Use -fmax-errors, as particularly cpluspluscheck can be very verbose
-  # - XXX have to disable ICU to avoid errors:
-  #   https://postgr.es/m/20220323002024.f2g6tivduzrktgfa%40alap3.anarazel.de
-  ###
-  always:
+###
+# Verify headerscheck / cpluspluscheck succeed
+#
+# - Don't use ccache, the files are uncacheable, polluting ccache's
+#   cache
+# - Use -fmax-errors, as particularly cpluspluscheck can be very verbose
+# - XXX have to disable ICU to avoid errors:
+#   https://postgr.es/m/20220323002024.f2g6tivduzrktgfa%40alap3.anarazel.de
+###
 headers_headerscheck_script: |
   time ./configure \
 ${LINUX_CONFIGURE_FEATURES} \
@@ -775,5 +769,4 @@ task:
 headers_cpluspluscheck_script: |
   time make -s cpluspluscheck EXTRAFLAGS='-fmax-errors=10'
 
-  always:
 upload_caches: ccache
-- 
2.40.1



whether to unlink the existing state.tmp file in SaveSlotToPath

2023-09-05 Thread Sergei Kornilov
Hello
I encountered a very lucky logical decoding error on the publisher:

2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] LOG:  starting logical decoding for slot "pubsub"
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] DETAIL:  Streaming transactions committing after 
0/16AD5F8, reading WAL from 0/16AD5F8.
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 
0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] LOG:  logical decoding found consistent point at 
0/16AD5F8
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] DETAIL:  There are no running transactions.
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 
0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:39.187 UTC 28316 melkij@postgres from [local] [vxid:3/0 
txid:0] [START_REPLICATION] ERROR:  could not create file 
"pg_replslot/pubsub/state.tmp": File exists

As I found out, the disk with the database ran out of space, but it was so 
lucky that postgresql did not go into crash recovery. Doubly lucky that logical 
walsender was able to create state.tmp, but could not write the contents and 
got "ERROR: could not write to file "pg_replslot/pubsub/state.tmp": No space 
left on device". The empty state.tmp remained on disk. When the problem with 
free disk space was solved, the publication remained inoperative. To fix it, 
one needs to restart the database (RestoreSlotFromDisk always deletes state.tmp)
or delete state.tmp manually.

Maybe in SaveSlotToPath (src/backend/replication/slot.c) it's also worth
deleting state.tmp if it already exists? All operations are performed under
an LWLock and there should be no parallel access.
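
Something along these lines in SaveSlotToPath(), just before the
OpenTransientFile() call that uses O_CREAT | O_EXCL, is what I have in mind
(only a sketch, not a tested patch):

	/* remove any leftover temp file from an earlier failed save */
	if (unlink(tmppath) < 0 && errno != ENOENT)
	{
		int			save_errno = errno;

		LWLockRelease(&slot->io_in_progress_lock);
		errno = save_errno;
		ereport(elevel,
				(errcode_for_file_access(),
				 errmsg("could not remove file \"%s\": %m", tmppath)));
		return;
	}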

PS: I reproduced the error on HEAD by adding pg_usleep to SaveSlotToPath before
writing to the file. During that sleep, I filled up the virtual disk.

regards, Sergei




Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread John Naylor
On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang 
wrote:

> Here's how I test it
> EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good for
these kinds of tests.

> I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in this
function, so I suggest testing against that, if you wish to test again.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-09-05 Thread Richard Guo
On Tue, Sep 5, 2023 at 4:52 PM tender wang  wrote:

> I recently ran benchmark [1] on master and found a performance problem,
> as shown below:
> ...
>
> I debug the code and find consider_parallel_nestloop() doesn't consider
> materialized form of the cheapest inner path.
>

Yeah, this seems an omission in commit 45be99f8.  I reviewed the patch
and here are some comments.

* I think we should not consider materializing the cheapest inner path
  if we're doing JOIN_UNIQUE_INNER, because in this case we have to
  unique-ify the inner path.

* I think we can check if it'd be parallel safe before creating the
  material path, thus avoid the creation in unsafe cases.

* I don't think the test case you added works for the code changes.
  Maybe a plan like the one below is better:

explain (costs off)
select * from tenk1, tenk2 where tenk1.two = tenk2.two;
  QUERY PLAN
--
 Gather
   Workers Planned: 4
   ->  Nested Loop
 Join Filter: (tenk1.two = tenk2.two)
 ->  Parallel Seq Scan on tenk1
 ->  Materialize
   ->  Seq Scan on tenk2
(7 rows)

Thanks
Richard


Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Jelte Fennema
I modified the PgBouncer PR to always set the sign bit to 0. But I
still think it makes sense to also address this in pg_basebackup.

On Tue, 5 Sept 2023 at 12:21, Jelte Fennema  wrote:
>
> On Tue, 5 Sept 2023 at 11:39, Daniel Gustafsson  wrote:
> >
> > > On 5 Sep 2023, at 09:09, Nishant Sharma  
> > > wrote:
> >
> > > With this, I was thinking, isn't this a problem of pgbouncer filling
> > > be_pid with random bits? Maybe it should have filled the be_pid
> > > with a random positive integer instead of any integer as it
> > > represents a pid?
> >
> > I'm inclined to agree that anyone sending a value which is supposed to
> > represent a PID should be expected to send a value which corresponds to the
> > format of a PID.
>
> When there is a pooler in the middle it already isn't a PID anyway. I
> took a look at a few other connection poolers and all the ones I
> looked at (Odyssey and pgcat) do the same: They put random bytes in
> the be_pid field (and thus can result in negative values). This normally
> does not cause any problems, because the be_pid value is simply sent
> back verbatim to the server when canceling a query, which is its main
> purpose according to the docs:
>
> > This message provides secret-key data that the frontend must save if it 
> > wants to be able to issue cancel requests later.
>
> Source: 
> https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.3
>
> For that purpose it's actually more secure to use all bits for random
> data, instead of keeping one bit always 0.
>
> Its main other purpose that I know of is displaying it in a psql
> prompt, so you know where to attach a debugger. This is completely
> broken either way as soon as you have a connection pooler in the
> middle, because you would want to display the Postgres backend PID
> instead of the random ID that the connection pooler sends back. So if
> it's negative that's no issue (it displays fine and it's useless
> either way).
>
> So, while I agree that putting a negative value in the process ID field of
> BackendData is arguably incorrect, given the simplicity of the fix on
> the pg_basebackup side, I think addressing it in pg_basebackup is
> easier than fixing this in all connection poolers.
>
> Sidenote: When PgBouncer is run in peering mode it actually uses the
> first two bytes of the PID to encode the peer_id into it. That way it
> knows to which peer it should forward the cancellation message. Thus
> fixing this in PgBouncer would require using other bytes for that.




Re: proposal: psql: show current user in prompt

2023-09-05 Thread Jelte Fennema
On Tue, 5 Sept 2023 at 05:50, Pavel Stehule  wrote:

> I prefer to introduce a new function - it is ten lines of code. The form is
> not important - it can be the full number or the minor number; it doesn't
> matter, I think. But my opinion in this area is not strong, and I would like
> to see feedback from others too. It is true that this feature and interface
> are not fully complete.

I think when using the API it is nicest to have a single function that
returns the full version number. i.e. if we're introducing a new
function I think it should be PQprotocolFullVersion instead of
PQprotocolSubversion. Then instead of doing a version check like this:

if (PQprotocolVersion(pset.db) == 3 && PQprotocolSubversion(pset.db) >= 1)

You can do the simpler:

if (PQprotocolFullVersion(pset.db) >= 301)

This is also in line with how you do version checks for postgres versions.

So I think this patch should at least add that one instead of
PQprotocolSubversion. If we then decide to replace PQprotocolVersion
with this new implementation, that would be a trivial change.
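
A sketch of what I mean, modelled on the existing PQprotocolVersion()
implementation (the exact name and the 100 multiplier are of course up for
discussion):

int
PQprotocolFullVersion(const PGconn *conn)
{
	if (!conn)
		return 0;
	if (conn->status == CONNECTION_BAD)
		return 0;
	return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
		PG_PROTOCOL_MINOR(conn->pversion);
}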

>> +   /* The protocol 3.0 is required */
>> +   if (PG_PROTOCOL_MAJOR(their_version) == 3)
>> +   conn->pversion = their_version;
>>
>> Let's compare against the actual PG_PROTOCOL_EARLIEST and
>> PG_PROTOCOL_LATEST to determine if the version is supported or not.
>
>
> changed

I think we should also check the minor version part. So like this instead:
+   if (their_version < PG_PROTOCOL_EARLIEST || their_version > PG_PROTOCOL_LATEST)


PS. If you use the -v flag of git format-patch a version number is
prepended to your patches. That makes it easier to reference them.




Re: cataloguing NOT NULL constraints

2023-09-05 Thread Peter Eisentraut

On 31.08.23 12:02, Alvaro Herrera wrote:

In constraint_column_usage, you're adding a relkind to the catalog scan
that goes through pg_depend for CHECK constraints.  Here we can rely on
a simple conkey[1] check and a separate UNION ALL arm[q5]; this is also
faster when there are many tables.

The third view definition looks ok.  It's certainly very nice to be able
to delete XXX comments there.


The following information schema views are affected by the not-null 
constraint catalog entries:


1. CHECK_CONSTRAINTS
2. CONSTRAINT_COLUMN_USAGE
3. DOMAIN_CONSTRAINTS
4. TABLE_CONSTRAINTS

Note that 1 and 3 also contain domain constraints.  So as long as the 
domain not-null constraints are not similarly catalogued, we can't 
delete the separate not-null union branch.  (3 never had one, so 
arguably a bit buggy.)


I think we can fix up 4 by just deleting the not-null union branch.

For 2, the simple fix is also easy, but there are some other options, as 
you discuss above.


How do you want to proceed?





RE: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Zhijie Hou (Fujitsu)
On Tuesday, September 5, 2023 4:15 PM Dilip Kumar  wrote:

Hi,

> 
> On Tue, Sep 5, 2023 at 10:12 AM Amit Kapila 
> wrote:
> >
> > On Tue, Sep 5, 2023 at 7:54 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Monday, September 4, 2023 6:15 PM vignesh C
>  wrote:
> > > >
> > > > On Mon, 4 Sept 2023 at 15:20, Amit Kapila 
> wrote:
> > > > >
> > > > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C 
> wrote:
> > > > > >
> > > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila 
> wrote:
> > > > > > > I think we should also ensure that slots are not invalidated
> > > > > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them
> > > > > > > dirty because we don't allow decoding from such slots, so we
> > > > > > > shouldn't include those.
> > > > > >
> > > > > > Added this check.
> > > > > >
> > > > > > Apart from this I have also fixed the following issues that
> > > > > > were agreed on: a) Setting slots to dirty in
> > > > > > CheckPointReplicationSlots instead of setting it in
> > > > > > SaveSlotToPath
> > > > > >
> > > > >
> > > > > + if (is_shutdown && SlotIsLogical(s)) {
> > > > > + SpinLockAcquire(&s->mutex); if (s->data.invalidated ==
> > > > > + RS_INVAL_NONE &&
> > > > > + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> > > > > + s->dirty = true;
> > > > >
> > > > > I think it is better to use ReplicationSlotMarkDirty() as that
> > > > > would be consistent with all other usages.
> > > >
> > > > ReplicationSlotMarkDirty works only on MyReplicationSlot whereas
> > > > CheckpointReplicationSlots loops through all the slots and marks
> > > > the appropriate slot as dirty, we might have to change
> > > > ReplicationSlotMarkDirty to take the slot as input parameter and all 
> > > > caller
> should pass MyReplicationSlot.
> > >
> > > Personally, I feel if we want to centralize the code of marking
> > > dirty into a function, we can introduce a new static function
> > > MarkSlotDirty(slot) to mark passed slot dirty and let
> > > ReplicationSlotMarkDirty and CheckpointReplicationSlots call it. Like:
> > >
> > > void
> > > ReplicationSlotMarkDirty(void)
> > > {
> > > MarkSlotDirty(MyReplicationSlot); }
> > >
> > > +static void
> > > +MarkSlotDirty(ReplicationSlot *slot) {
> > > +   Assert(slot != NULL);
> > > +
> > > +   SpinLockAcquire(&slot->mutex);
> > > +   slot->just_dirtied = true;
> > > +   slot->dirty = true;
> > > +   SpinLockRelease(&slot->mutex); }
> > >
> > > This is somewhat similar to the relation between
> > > ReplicationSlotSave(serialize my backend's replications slot) and
> SaveSlotToPath(save the passed slot).
> > >
> > > > Another thing is we have already taken spin lock to access
> > > > last_confirmed_flush_lsn from CheckpointReplicationSlots, we could
> > > > set dirty flag here itself, else we will have to release the lock
> > > > and call ReplicationSlotMarkDirty which will take lock again.
> > >
> > > Yes, this is unavoidable, but maybe it's not a big problem as we
> > > only do it at shutdown.
> > >
> >
> > True but still it doesn't look elegant. I also thought about having a
> > probably inline function that marks both just_dirty and dirty fields.
> > However, that requires us to assert that the caller has already
> > acquired a spinlock. I see a macro SpinLockFree() that might help but
> > it didn't seem to be used anywhere in the code so not sure if we can
> > rely on it.
> 
> Can't we just have code like this?  I mean we will have to make
> ReplicationSlotMarkDirty take slot as an argument or have another version
> which takes slot as an argument and that would be called by us as well as by
> ReplicationSlotMarkDirty().  I mean why do we need these checks
> (s->data.invalidated == RS_INVAL_NONE &&
> s->data.confirmed_flush != s->last_saved_confirmed_flush) under the
> mutex?  Walsender is shutdown so confirmed flush LSN can not move
> concurrently and slot can not be invalidated as well because that is done by
> checkpointer and we are in checkpointer?

I agree with your analysis that the lock may be unnecessary for now and the
code will work, but I personally feel we'd better take the spinlock.

Firstly, considering our discussion on the potential extension of persisting
the slot for online checkpoints in the future, we anyway need the lock at that
time, so taking the lock here could avoid overlooking the need to update it
later. And the lock also won't cause any performance or concurrency issue.

Additionally, if we don't take the lock, we rely on the assumption that the
walsender will exit before the shutdown checkpoint. Currently, that's true for
the logical walsender, but the physical walsender can exit later than the
checkpointer. So, I am slightly worried that if we change the logical
walsender's exit timing in the future, the assumption may not hold.

Besides, for non-built-in logical replication, if someone creates their own
walsender or other processes to send the changes and the process doesn't exit
before the shutdown checkpoint, it may als

Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Michael Paquier
On Tue, Sep 05, 2023 at 11:06:36AM +0200, Drouvot, Bertrand wrote:
> Oh ok, out of curiosity, why are 2 whitespaces intentional?

That depends on the individual who writes the code, but I recall that
this is some old-school style from the 70's and/or the 80's when
typewriters were still a thing.  I'm just used to this style
after the end of a sentence in a comment.

> That looks good. I just noticed that v5 did re-introduce the "issue" that
> was fixed in 00e49233a9.
> 
> Also, v5 needs a rebase due to f691f5b80a.
> 
> Attaching v6 taking care of the 2 points mentioned above.

Dammit, thanks.  These successive rebases are a bit annoying..  The
data produced is consistent, and the new contents can be grepped, so I
think that I am just going to apply both patches and move on to other
topics.
--
Michael




Re: Commitfest 2023-09 starts soon

2023-09-05 Thread Aleksander Alekseev
Hi,

> I think it would be better to just remove the "consensus" part of your
> mail, and just put down the real reason why each patch is being RfC-ed
> or rejected. That is, don't imply that there are hackers that OK-ed it
> when there are none, and inform patch authors directly about the
> reasons why the patch is being revoked; so without "see consensus in
> [0]".

That's fair enough. I will use "It's been decided" or something like
this next time to avoid any confusion.

-- 
Best regards,
Aleksander Alekseev




Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Daniel Gustafsson
> On 5 Sep 2023, at 12:21, Jelte Fennema  wrote:
> 
> On Tue, 5 Sept 2023 at 11:39, Daniel Gustafsson  wrote:
>> 
>>> On 5 Sep 2023, at 09:09, Nishant Sharma  
>>> wrote:
>> 
>>> With this, I was thinking, isn't this a problem of pgbouncer filling
>>> be_pid with random bits? Maybe it should have filled the be_pid
>>> with a random positive integer instead of any integer as it
>>> represents a pid?
>> 
>> I'm inclined to agree that anyone sending a value which is supposed to
>> represent a PID should be expected to send a value which corresponds to the
>> format of a PID.
> 
> When there is a pooler in the middle it already isn't a PID anyway. I
> took a look at a few other connection poolers and all the ones I
> looked at (Odyssey and pgcat) do the same: They put random bytes in
> the be_pid field (and thus can result in negative values). This normally
> does not cause any problems, because the be_pid value is simply sent
> back verbatim to the server when canceling a query, which is its main
> purpose according to the docs:
> 
>> This message provides secret-key data that the frontend must save if it 
>> wants to be able to issue cancel requests later.
> 
> Source: 
> https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.3
> 
> For that purpose it's actually more secure to use all bits for random
> data, instead of keeping one bit always 0.

If it's common practice to submit a pid which isn't a pid, I wonder if longer
term it's worth inventing a value for be_pid which means "unknown pid" such
that consumers can make informed calls when reading it?  Not the job of this
patch to do so, but maybe food for thought.

> So, while I agree that putting a negative value in the process ID field of
> BackendData is arguably incorrect, given the simplicity of the fix on
> the pg_basebackup side, I think addressing it in pg_basebackup is
> easier than fixing this in all connection poolers.

Since the value in the temporary slot name isn't used to convey meaning, but
merely to ensure uniqueness, I don't think it's unreasonable to guard against
malformed input (i.e. a negative integer).
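
To make that concrete, here is a rough sketch of the kind of guard being
discussed (illustrative only; the buffer name is made up and this is not the
actual pg_basebackup code): format the server-reported backend "PID" as an
unsigned value so that a pooler-supplied negative number still yields a valid
slot name.

	char		tmpslotname[64];	/* hypothetical buffer */

	/* Print the value as unsigned so a random/negative be_pid still
	 * produces a well-formed temporary slot name. */
	snprintf(tmpslotname, sizeof(tmpslotname), "pg_basebackup_%u",
			 (unsigned int) PQbackendPID(conn));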

--
Daniel Gustafsson





Re: GUC for temporarily disabling event triggers

2023-09-05 Thread Daniel Gustafsson
> On 6 Apr 2023, at 00:06, Michael Paquier  wrote:

> If there is no strong
> case for more than a boolean for now, simpler is better.

The attached version of the patch replaces it with a boolean flag for turning
off all event triggers, and I also renamed it to the debug_xxx "GUC namespace"
which seemed more appropriate.

--
Daniel Gustafsson



v6-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Dilip Kumar
On Tue, Sep 5, 2023 at 5:04 PM Zhijie Hou (Fujitsu)
 wrote:
>

> > Can't we just have code like this?  I mean we will have to make
> > ReplicationSlotMarkDirty take slot as an argument or have another version
> > which takes slot as an argument and that would be called by us as well as by
> > ReplicationSlotMarkDirty().  I mean why do we need these checks
> > (s->data.invalidated == RS_INVAL_NONE &&
> > s->data.confirmed_flush != s->last_saved_confirmed_flush) under the
> > mutex?  The walsender is shut down, so the confirmed_flush LSN cannot move
> > concurrently, and the slot cannot be invalidated either, because that is
> > done by the checkpointer and we are in the checkpointer?
>
> I agree with your analysis that the lock may be unnecessary for now and the
> code will work, but I personally feel we'd better take the spinlock.
>
> Firstly, considering our discussion on the potential extension of persisting
> the slot for online checkpoints in the future, we anyway need the lock at that
> time, so taking the lock here could avoid overlooking the need to update it
> later. And the lock also won't cause any performance or concurrency issue.

If we think that we might want to persist on online checkpoints as well,
then it is better to do that now, because this is not an extension of the
feature; rather, our thinking is that it is wise to just persist on the
shutdown checkpoint, and I think that is the conclusion at this point. If
that is the conclusion, there is no point in writing code on the assumption
that we will change our conclusion in the future.

> Additionally, if we don't take the lock, we rely on the assumption that the
> walsender will exit before the shutdown checkpoint. Currently, that's true
> for the logical walsender, but the physical walsender can exit later than
> the checkpointer. So, I am slightly worried that if we change the logical
> walsender's exit timing in the future, the assumption may not hold.
>
> Besides, for non-built-in logical replication, if someone creates their own
> walsender or other processes to send the changes and the process doesn't exit
> before the shutdown checkpoint, it may also be a problem. Although I don't
> have existing examples of such extensions, I feel taking the lock would make
> it more robust.

I think all our logic is based on the walsender having already exited. If
not, then even if you check under the mutex that the confirmed_flush LSN has
not changed, it can change right after you release the lock, and then we
will not be flushing the latest update of the confirmed_flush LSN to disk,
and our logic of comparing checkpoint.redo with the confirmed_flush LSN
might not work?
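
For reference, the per-slot marking that both variants boil down to looks
roughly like this; whether the comparison really needs the spinlock is
exactly the open question above. This is an illustrative sketch only, not
the patch text, and last_saved_confirmed_flush is the field added by the
patch under discussion:

for (int i = 0; i < max_replication_slots; i++)
{
	ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

	if (!s->in_use)
		continue;

	SpinLockAcquire(&s->mutex);
	if (s->data.invalidated == RS_INVAL_NONE &&
		s->data.confirmed_flush != s->last_saved_confirmed_flush)
	{
		s->just_dirtied = true;
		s->dirty = true;
	}
	SpinLockRelease(&s->mutex);
}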

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Extract numeric filed in JSONB more effectively

2023-09-05 Thread jian he
On Mon, Sep 4, 2023 at 10:35 PM Andy Fan  wrote:
>
> Hi,
>
>   v13 attached.  Changes includes:
>
> 1.  fix the bug Jian provides.
> 2.  reduce more code duplication without DirectFunctionCall.
> 3.  add the overlooked  jsonb_path_query and jsonb_path_query_first as 
> candidates
>
>
> --
> Best Regards
> Andy Fan

Based on v13.
IMHO, it might be a good idea to write some comments on
jsonb_object_field_internal, especially on the second boolean argument,
something like: "In some cases we just want to return a JsonbValue rather
than a Jsonb; to return a JsonbValue, set as_jsonb to false."

I am not sure "jsonb_object_field_start" is a good name; so far I have only
come up with "jsonb_object_field_to_jsonbvalues".

linitial(jsonb_start_func->args) =
makeRelabelType(linitial(jsonb_start_func->args),
   INTERNALOID, 0,
   InvalidOid,
   COERCE_IMPLICIT_CAST);

If no typmod is needed, the output typmod is usually -1, so shouldn't this be -1 rather than 0?
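
Spelled out, the suggestion is simply the following (a sketch of the
proposed change, not the actual patch text):

linitial(jsonb_start_func->args) =
	makeRelabelType(linitial(jsonb_start_func->args),
					INTERNALOID, -1,
					InvalidOid,
					COERCE_IMPLICIT_CAST);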

list_make2(jsonb_start_func, makeConst(...)): you can just combine
two different node types into a list, which seems pretty cool...




Re: XLog size reductions: smaller XLRec block header for PG17

2023-09-05 Thread Aleksander Alekseev
Hi,

I noticed that the patch needs review and decided to take a look.

> No meaningful savings in the pgbench workload, mostly due to xlog
> record length MAXALIGNs currently not being favorable in the pgbench
> workload. But, record sizes have dropped by 1 or 2 bytes in several
> cases, as can be seen at the bottom of this mail.

This may not sound like a lot, but it is still valuable IMO if we consider
the reduction in terms of percentages of overall saved disk throughput,
network traffic, etc, not in absolute values per one record. Even if
1-2 bytes are not a bottleneck that can be seen on benchmarks (or the
performance improvement is not that impressive), it's some amount of
money paid on cloud. Considering the fact that the patch is not that
complicated I see no reason not to apply the optimization as long as
it doesn't cause degradations.

I also agree with Matthias' arguments above regarding the lack of
one-size-fits-all variable encoding and the overall desire to keep the
focus. E.g. the code can be refactored if and when we discover that
different subsystems ended up using the same encoding.

All in all the patch looks good to me, but I have a couple of nitpicks:

* The comment for XLogSizeClass seems to be somewhat truncated, as if
Ctrl+S was not pressed before creating the patch. I also suggest
double-checking the grammar.
* `Size written = -1;` in XLogWriteLength() can lead to compiler
warnings some day considering the fact that Size / size_t are
unsigned. Also this assignment doesn't seem to serve any particular
purpose. So I suggest removing it.
* I don't see much value in using the WRITE_OP macro in
XLogWriteLength(). The code is read more often than it's written and I
wouldn't call this code particularly readable (although it's shorter).
* XLogReadLength() - ditto
* `if (read < 0)` in DecodeXLogRecord() is noop since `read` is unsigned
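
As a tiny stand-alone illustration of those last two points (assuming an
ordinary unsigned size_t; this is not code from the patch):

#include <stddef.h>

static void
example(void)
{
	size_t		written = (size_t) -1;	/* wraps to SIZE_MAX, not -1 */

	if (written < 0)			/* always false: size_t is unsigned */
	{
		/* unreachable; many compilers warn about this comparison */
	}
}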

-- 
Best regards,
Aleksander Alekseev




Re: RFC: Logging plan of the running query

2023-09-05 Thread torikoshia

On 2023-08-28 22:47, James Coleman wrote:
On Mon, Aug 28, 2023 at 3:01 AM torikoshia  
wrote:


On 2023-08-26 21:03, James Coleman wrote:
> On Fri, Aug 25, 2023 at 7:43 AM James Coleman  wrote:
>>
>> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
>>  wrote:
>> >
>> > On 2023-06-16 01:34, James Coleman wrote:
>> > > Attached is v28
>> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
>> > > when necessary. Once built with those two patches I'm simply running
>> > > `make check`.
>> >
>> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
>> > huge
>> > amount of memory and under some environments they were terminated by OOM
>> > killer.
>> >
>> > This was because memory was allocated from existing memory contexts and
>> > they
>> > were not freed after ProcessLogQueryPlanInterrupt().
>> > Updated the patch to use dedicated memory context for
>> > ProcessLogQueryPlanInterrupt().
>> >
>> > Applying attached patch and v28-0002 patch, `make check` successfully
>> > completed after 20min and 50GB of logs on my environment.
>> >
>> > >>> On 2023-06-15 01:48, James Coleman wrote:
>> > >>> > The tests have been running since last night, but have been 
apparently
>> > >>> > hung now for many hours.
>> >
>> > I don't know if this has anything to do with the hang you faced, but I
>> > thought
>> > it might be possible that the large amount of memory usage resulted in
>> > swapping, which caused a significant delay in processing.
>>
>> Ah, yes, I think that could be a possible explanation. I was delaying
>> on this thread because I wasn't comfortable with having caused an
>> issue once (even if I couldn't easily reproduce) without at least some
>> theory as to the cause (and a fix).
>>
>> > If possible, I would be very grateful if you could try to reproduce this
>> > with
>> > the v29 patch.
>>
>> I'll kick off some testing.
>>
>
> I don't have time to investigate what's happening here, but 24 hours
> later the first "make check" is still running, and at first glance it
> seems to have the same behavior I'd seen that first time. The test
> output is to this point:
>
> # parallel group (5 tests):  index_including create_view
> index_including_gist create_index create_index_spgist
> ok 66+ create_index26365 ms
> ok 67+ create_index_spgist 27675 ms
> ok 68+ create_view  1235 ms
> ok 69+ index_including  1102 ms
> ok 70+ index_including_gist 1633 ms
> # parallel group (16 tests):  create_aggregate create_cast errors
> roleattributes drop_if_exists hash_func typed_table create_am
> infinite_recurse
>
> and it hasn't progressed past that point since at least ~16 hours ago
> (the first several hours of the run I wasn't monitoring it).
>
> I haven't connected up gdb yet, and won't be able to until maybe
> tomorrow, but here's the ps output for postgres processes that are
> running:
>
> admin3213249  0.0  0.0 196824 20552 ?Ss   Aug25   0:00
> /home/admin/postgresql-test/bin/postgres -D
> /home/admin/postgresql-test-data
> admin3213250  0.0  0.0 196964  7284 ?Ss   Aug25   0:00
> postgres: checkpointer
> admin3213251  0.0  0.0 196956  4276 ?Ss   Aug25   0:00
> postgres: background writer
> admin3213253  0.0  0.0 196956  8600 ?Ss   Aug25   0:00
> postgres: walwriter
> admin3213254  0.0  0.0 198424  7316 ?Ss   Aug25   0:00
> postgres: autovacuum launcher
> admin3213255  0.0  0.0 198412  5488 ?Ss   Aug25   0:00
> postgres: logical replication launcher
> admin3237967  0.0  0.0   2484   516 pts/4S+   Aug25   0:00
> /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> 
PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> 
LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20
> --schedule=./parallel_schedule
> admin3237973  0.0  0.0 197880 20688 pts/4S+   Aug25   0:00
> postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> listen_addresses= -k /tmp/pg_regress-7mmGUa
> admin3237976  0.0  0.1 198332 44608 ?Ss   Aug25   0:00
> postgres: checkpointer
> admin3237977  0.0  0.0 198032  4640 ?Ss   Aug25   0:00
> postgres: background writer
> admin3237979  0.0  0.0 197880  8580 ?Ss   Aug25   0:00
> postgres: walwriter
> admin3237980  0.0  0.0 199484  7608 ?Ss   Aug25   0:00
> postgres: autovacuum launcher
> admin3237981  0.0  0.0 199460  5488 ?Ss   Aug25   0:00
> postgres: logical replication launcher
> admin3243644  0.0  0.2 252400 74656 ?Ss   Aug25   0:01
> postgres: 

Re: Initdb-time block size specification

2023-09-05 Thread Robert Haas
On Sat, Sep 2, 2023 at 3:09 PM Tomas Vondra
 wrote:
> Except that no one is forcing you to actually go 130mph or 32mph, right?
> You make it seem like this patch forces people to use some other page
> size, but that's clearly not what it's doing - it gives you the option
> to use smaller or larger block, if you chose to. Just like increasing
> the speed limit to 130mph doesn't mean you can't keep going 65mph.
>
> The thing is - we *already* allow using different block size, except
> that you have to do custom build. This just makes it easier.

Right. Which is worth doing if it doesn't hurt performance and is
likely to be useful to a lot of people, and is not worth doing if it
will hurt performance and be useful to relatively few people. My bet
is on the latter.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




How to add a new pg oid?

2023-09-05 Thread jacktby jacktby




Re: Initdb-time block size specification

2023-09-05 Thread David Christensen
On Tue, Sep 5, 2023 at 9:04 AM Robert Haas  wrote:
>
> On Sat, Sep 2, 2023 at 3:09 PM Tomas Vondra
>  wrote:
> > Except that no one is forcing you to actually go 130mph or 32mph, right?
> > You make it seem like this patch forces people to use some other page
> > size, but that's clearly not what it's doing - it gives you the option
> > to use smaller or larger block, if you chose to. Just like increasing
> > the speed limit to 130mph doesn't mean you can't keep going 65mph.
> >
> > The thing is - we *already* allow using different block size, except
> > that you have to do custom build. This just makes it easier.
>
> Right. Which is worth doing if it doesn't hurt performance and is
> likely to be useful to a lot of people, and is not worth doing if it
> will hurt performance and be useful to relatively few people. My bet
> is on the latter.

Agreed that this doesn't make sense if there are major performance
regressions, however the goal here is patch evaluation to measure this
against other workloads and see if this is the case; from my localized
testing things were within acceptable noise levels with the latest
version.

I agree with Tomas' earlier thoughts: we already allow different block
sizes, and if there are baked-in algorithmic assumptions about block
size (which there probably are), then identifying those or places in
the code where we need additional work or test coverage will only
improve things overall for those non-standard block sizes.

Best,

David




Re: [RFC] Add jit deform_counter

2023-09-05 Thread Daniel Gustafsson
> On 14 Aug 2023, at 16:36, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> And here is it. The json version of EXPLAIN now looks like this:
> 
> "JIT": {
>  [...]
>   "Timing": {
> "Generation": {
>   "Deform": 0.000,
>   "Total": 0.205
> },
> "Inlining": 0.065,
> "Optimization": 2.465,
> "Emission": 2.337,
> "Total": 5.072
>   }
> },

I've gone over this version of the patch and I think it's ready to go in.  I'm
marking this Ready for Committer and will go ahead with it shortly barring any
objections.

--
Daniel Gustafsson





Re: generic plans and "initial" pruning

2023-09-05 Thread Robert Haas
On Tue, Sep 5, 2023 at 3:13 AM Amit Langote  wrote:
> Attached 0001 removes unnecessary cleanup calls from ExecEnd*() routines.

It also adds a few random Assert()s to verify that unrelated pointers
are not NULL. I suggest that it shouldn't do that.

The commit message doesn't mention the removal of the calls to
ExecDropSingleTupleTableSlot. It's not clear to me why that's OK and I
think it would be nice to mention it in the commit message, assuming
that it is in fact OK.

I suggest changing the subject line of the commit to something like
"Remove obsolete executor cleanup code."

> 0002 adds NULLness checks in ExecEnd*() routines on some pointers that
> may not be initialized by the corresponding ExecInit*() routines in
> the case where it returns early.

I think you should only add these where it's needed. For example, I
think list_free_deep(NIL) is fine.

The changes to ExecEndForeignScan look like they include stuff that
belongs in 0001.

Personally, I prefer explicit NULL-tests i.e. if (x != NULL) to
implicit ones like if (x), but opinions vary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: trying again to get incremental backup

2023-09-05 Thread Robert Haas
On Mon, Sep 4, 2023 at 8:42 AM Dilip Kumar  wrote:
> Can't we think of comparing at the block level, like we can compare
> each block but ignore the content of the hole?

We could do that, but I don't think that's a full solution. I think
I'd end up having to reimplement the equivalent of heap_mask,
btree_mask, et. al. in Perl, which doesn't seem very reasonable. It's
fairly complicated logic even written in C, and doing the right thing
in Perl would be more complex, I think, because it wouldn't have
access to all the same #defines which depend on things like word size
and Endianness and stuff. If we want to allow this sort of comparison,
I feel we should think of changing the C code in some way to make it
work reliably rather than try to paper over the problems in Perl.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How to add a new pg oid?

2023-09-05 Thread jacktby jacktby


> On 5 Sep 2023, at 22:29, jacktby jacktby wrote:
> 
I’m trying to add a new data type for my pg. How to do that? Can you give me 
more details or an example?



Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-05 Thread Robert Haas
On Mon, Sep 4, 2023 at 10:17 AM Aleksander Alekseev
 wrote:
> During the triage of the patches submitted for the September CF a
> consensus was reached [1] to mark this patch as Rejected.

I don't think that's the correct conclusion. You said here that you
didn't think the patch was valuable. Then you said the same thing over
there. You agreeing with yourself is not a consensus.

I think this patch is a good idea and should be committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRUs in the main buffer pool - Page Header definitions

2023-09-05 Thread Robert Haas
On Thu, Aug 24, 2023 at 3:28 PM Stephen Frost  wrote:
> Agreed that we'd certainly want to make sure it's all correct and to do
> performance testing but in terms of how many buffers... isn't much of
> the point of this that we have data in memory because it's being used
> and if it's not then we evict it?  That is, I wouldn't think we'd have
> set parts of the buffer pool for SLRUs vs. regular data but would let
> the actual demand drive what pages are in the cache and I thought that
> was part of this proposal and part of the reasoning behind making this
> change.

I think that it's not quite that simple. In the regular buffer pool,
access to pages is controlled by buffer pins and buffer content locks,
but these mechanisms don't exist in the same way in the SLRU code. But
buffer pins drive usage counts which drive eviction decisions. So if
you move SLRU data into the main buffer pool, you either need to keep
the current locking regime and use some new logic to decide how much
of shared_buffers to bequeath to the SLRU pools, OR you need to make
SLRU access use buffer pins and buffer content locks. If you do the
latter, I think you substantially increase the cost of an uncontended
SLRU buffer access, because you now need to pin the buffer, then take
and release the content lock, and then release the pin; whereas today
you can do all that by just taking and releasing the SLRU's lwlock.
That's more atomic operations, and hence more costly, I
think. But even if not, it could perform terribly if SLRU buffers
become more vulnerable to eviction than they are at present, or
alternatively if they take over too much of the buffer pool and force
other important data out.
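
To sketch the difference in atomic operations being described (illustrative
fragments only, not the actual SLRU or bufmgr code paths; slru_lock, rel and
blkno are stand-in names):

/* Today: an uncontended SLRU page access is covered by a single lwlock. */
LWLockAcquire(slru_lock, LW_EXCLUSIVE);
/* ... read or update the SLRU page ... */
LWLockRelease(slru_lock);

/* Under the main buffer pool: pin, content lock, unlock, unpin. */
buf = ReadBuffer(rel, blkno);			/* pin */
LockBuffer(buf, BUFFER_LOCK_SHARE);		/* content lock */
/* ... access the page ... */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);						/* unpin */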

SLRUs are a performance hotspot, so even relatively minor changes to
their performance characteristics can, I believe, have pretty
noticeable effects on performance overall.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: GUC for temporarily disabling event triggers

2023-09-05 Thread Robert Haas
On Tue, Sep 5, 2023 at 8:12 AM Daniel Gustafsson  wrote:
> > On 6 Apr 2023, at 00:06, Michael Paquier  wrote:
> > If there is no strong
> > case for more than a boolean for now, simpler is better.
>
> The attached version of the patch replaces it with a boolean flag for turning
> off all event triggers, and I also renamed it to the debug_xxx "GUC namespace"
> which seemed more appropriate.

I don't care for the debug_xxx renaming, myself. I think that should
be reserved for things where we believe there's no reason to ever use
it in production/real life, or for things whose purpose is to emit
debugging messages. Neither is the case here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: cataloguing NOT NULL constraints

2023-09-05 Thread Alvaro Herrera
On 2023-Sep-05, Peter Eisentraut wrote:

> The following information schema views are affected by the not-null
> constraint catalog entries:
> 
> 1. CHECK_CONSTRAINTS
> 2. CONSTRAINT_COLUMN_USAGE
> 3. DOMAIN_CONSTRAINTS
> 4. TABLE_CONSTRAINTS
> 
> Note that 1 and 3 also contain domain constraints.  So as long as the domain
> not-null constraints are not similarly catalogued, we can't delete the
> separate not-null union branch.  (3 never had one, so arguably a bit buggy.)
> 
> I think we can fix up 4 by just deleting the not-null union branch.
> 
> For 2, the simple fix is also easy, but there are some other options, as you
> discuss above.
> 
> How do you want to proceed?

I posted as a patch in a separate thread[1].  Let me fix up the
definitions for views 1 and 3 for domains per your comments, and I'll
post in that thread again.

[1] https://postgr.es/m/202309041710.psytrxlsiqex@alvherre.pgsql

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)




Re: Make psql's qeury canceling test simple by using signal() routine of IPC::Run

2023-09-05 Thread Yugo NAGATA
On Mon, 14 Aug 2023 23:37:25 +0900
Yugo NAGATA  wrote:

> On Mon, 14 Aug 2023 08:29:25 +0900
> Michael Paquier  wrote:
> 
> > On Sun, Aug 13, 2023 at 11:22:33AM +0200, Fabien COELHO wrote:
> > > Test run is ok on my Ubuntu laptop.
> > 
> > I have a few comments about this patch.
> > 
> > On HEAD and even after this patch, we still have the following:
> > SKIP:   
> > 
> >{
> > 
> >   skip "cancel test requires a Unix 
> > shell", 2 if $windows_os;
> > 
> > Could the SKIP be removed for $windows_os?  If not, this had better be
> > documented because the reason for the skip becomes incorrect.
> > 
> > The comment at the top of the SKIP block still states the following:
> > # There is, as of this writing, no documented way to get the PID of
> > # the process from IPC::Run.  As a workaround, we have psql print its
> > # own PID (which is the parent of the shell launched by psql) to a
> > # file.
> > 
> > This is also incorrect.
> 
> Thank you for your comments
> 
> I will check whether the test works in Windows and remove SKIP if possible.
> Also, I'll fix the comment in either case.

I checked whether the test using IPC::Run::signal can work on Windows, and
confirmed that it doesn't, because sending SIGINT caused the test itself to
terminate. Here are the results:

t/001_basic.pl ... ok
t/010_tab_completion.pl .. skipped: readline is not supported by this build
t/020_cancel.pl .. Terminating on signal SIGINT(2)

Therefore, this test should be skipped on Windows.

I attached the updated patch. I removed the incorrect comments and
unnecessary lines. Also, I rewrote the test to use "skip_all" instead
of SKIP because we skip the whole test rather than a part of it.

Regards,
Yugo Nagata

> 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA 
> 
> 


-- 
Yugo NAGATA 
diff --git a/src/bin/psql/t/020_cancel.pl b/src/bin/psql/t/020_cancel.pl
index 0765d82b92..cabbf0210a 100644
--- a/src/bin/psql/t/020_cancel.pl
+++ b/src/bin/psql/t/020_cancel.pl
@@ -9,72 +9,39 @@ use PostgreSQL::Test::Utils;
 use Test::More;
 use Time::HiRes qw(usleep);
 
-my $tempdir = PostgreSQL::Test::Utils::tempdir;
+# Test query canceling by sending SIGINT to a running psql
+
+if ($windows_os)
+{
+	plan skip_all => 'sending SIGINT on Windows terminates the test itself';
+}
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->start;
 
-# Test query canceling by sending SIGINT to a running psql
-#
-# There is, as of this writing, no documented way to get the PID of
-# the process from IPC::Run.  As a workaround, we have psql print its
-# own PID (which is the parent of the shell launched by psql) to a
-# file.
-SKIP:
-{
-	skip "cancel test requires a Unix shell", 2 if $windows_os;
+local %ENV = $node->_get_env();
 
-	local %ENV = $node->_get_env();
+my ($stdin, $stdout, $stderr);
+my $h = IPC::Run::start([ 'psql', '-X', '-v', 'ON_ERROR_STOP=1' ],
+	\$stdin, \$stdout, \$stderr);
 
-	my ($stdin, $stdout, $stderr);
+# Send sleep command and wait until the server has registered it
+$stdin = "select pg_sleep($PostgreSQL::Test::Utils::timeout_default);\n";
+pump $h while length $stdin;
+$node->poll_query_until('postgres',
+	q{SELECT (SELECT count(*) FROM pg_stat_activity WHERE query ~ '^select pg_sleep') > 0;}
+) or die "timed out";
 
-	# Test whether shell supports $PPID.  It's part of POSIX, but some
-	# pre-/non-POSIX shells don't support it (e.g., NetBSD).
-	$stdin = "\\! echo \$PPID";
-	IPC::Run::run([ 'psql', '-X', '-v', 'ON_ERROR_STOP=1' ],
-		'<', \$stdin, '>', \$stdout, '2>', \$stderr);
-	$stdout =~ /^\d+$/ or skip "shell apparently does not support \$PPID", 2;
+# Send cancel request
+$h->signal('INT');
 
-	# Now start the real test
-	my $h = IPC::Run::start([ 'psql', '-X', '-v', 'ON_ERROR_STOP=1' ],
-		\$stdin, \$stdout, \$stderr);
+my $result = finish $h;
 
-	# Get the PID
-	$stdout = '';
-	$stderr = '';
-	$stdin = "\\! echo \$PPID >$tempdir/psql.pid\n";
-	pump $h while length $stdin;
-	my $count;
-	my $psql_pid;
-	until (
-		-s "$tempdir/psql.pid"
-		  and ($psql_pid =
-			PostgreSQL::Test::Utils::slurp_file("$tempdir/psql.pid")) =~
-		  /^\d+\n/s)
-	{
-		($count++ < 100 * $PostgreSQL::Test::Utils::timeout_default)
-		  or die "pid file did not appear";
-		usleep(10_000);
-	}
-
-	# Send sleep command and wait until the server has registered it
-	$stdin = "select pg_sleep($PostgreSQL::Test::Utils::timeout_default);\n";
-	pump $h while length $stdin;
-	$node->poll_query_until('postgres',
-		q{SELECT (SELECT count(*) FROM pg_stat_activity WHERE query ~ '^select pg_sleep') > 0;}
-	) or die "timed out";
-
-	# Send cancel request
-	kill 'INT', $psql_pid;
-
-	my $result = fini

Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-05 Thread Aleksander Alekseev
Hi Robert,

> I don't think that's the correct conclusion. You said here that you
> didn't think the patch was valuable. Then you said the same thing over
> there. You agreeing with yourself is not a consensus.

The word "consensus" was a poor choice for sure. The fact that I
suggested to reject the patch and nobody objected straight away is not
a consensus.

> I think this patch is a good idea and should be committed.

No problem, changing status back to "Needs review".

-- 
Best regards,
Aleksander Alekseev




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-05 Thread Aleksander Alekseev
Hi,

> > I think this patch is a good idea and should be committed.
>
> No problem, changing status back to "Needs review".

Now when we continue reviewing the patch, could you please elaborate a
bit on why you think it's worth committing?

-- 
Best regards,
Aleksander Alekseev




Re: information_schema and not-null constraints

2023-09-05 Thread Alvaro Herrera
On 2023-Sep-05, Peter Eisentraut wrote:

> The following information schema views are affected by the not-null
> constraint catalog entries:
> 
> 1. CHECK_CONSTRAINTS
> 2. CONSTRAINT_COLUMN_USAGE
> 3. DOMAIN_CONSTRAINTS
> 4. TABLE_CONSTRAINTS
> 
> Note that 1 and 3 also contain domain constraints.

After looking at what happens for domain constraints in older versions
(I tested 15, but I suppose this applies everywhere), I notice that we
don't seem to handle them anywhere that I can see.  My quick exercise is
just

create domain nnint as int not null;
create table foo (a nnint);

and then verify that this constraint shows nowhere -- it's not in
DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

Did this ever work in the past?  I tested with 9.3 and didn't see
anything there either.

I am hesitant to try to add domain not-null constraint support to
information_schema in the same commit as these changes.  I think this
should be fixed separately.

(Note that if, in older versions, you change the table to be
 create table foo (a nnint NOT NULL);
 then you do get a row in table_constraints, but nothing in
 check_constraints.  With my proposed definition this constraint appears
 in check_constraints, table_constraints and constraint_column_usage.)

On 2023-Sep-04, Tom Lane wrote:

> I object very very strongly to this proposed test method.  It
> completely undoes the work I did in v15 (cc50080a8 and related)
> to make the core regression test scripts mostly independent of each
> other.  Even without considering the use-case of running a subset of
> the tests, the new test's expected output will constantly be needing
> updates as side effects of unrelated changes.

You're absolutely right, this would be disastrous.  A better alternative
is that the new test file creates a few objects for itself, either by
using a separate role or by using a separate schema, and we examine the
information_schema display for those objects only.  Then it'll be better
isolated.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)




Re: sandboxing untrusted code

2023-09-05 Thread Robert Haas
On Fri, Sep 1, 2023 at 5:27 PM Jeff Davis  wrote:
> Which privileges are available in a sandboxed environment, exactly? Is
> it kind of like masking away all privileges except EXECUTE, or are
> other privileges available, like SELECT?

I think I've more or less answered this already -- fully sandboxed
code can't make reference to external data sources, from which it
follows that it can't exercise SELECT (and most other privileges).

> And the distinction that you are drawing between having the privileges
> but them (mostly) not being available, versus not having the privileges
> at all, is fairly subtle. Some examples showing why that distinction is
> important would be helpful.

I view it like this: when Bob tries to insert or update or delete
Alice's table, and Alice has some code attached to it, Alice is
effectively asking Bob to execute that code with his own privileges.
In general, I think we can reasonably expect that Bob WILL be willing
to do this: if he didn't want to modify Alice's table, he
wouldn't have executed a DML statement against it, and executing the
code that Alice has attached to that table is a precondition of being
allowed to perform that modification. It's Alice's table and she gets
to set the rules. However, Bob is also allowed to protect himself. If
he's running Alice's code and it wants to do something with which Bob
isn't comfortable, he can change his mind and refuse to execute it
after all.

I always find it helpful to consider real world examples with similar
characteristics. Let's say that Bob is renting a VRBO from Alice.
Alice leaves behind, in the VRBO, a set of rules which Bob must follow
as a condition of being allowed to rent the VRBO. Those rules include
things that Bob but must do at checkout time, like washing all of his
dishes. As a matter of routine, Bob will follow Alice's checkout
instructions. But if Alice includes in the checkout instructions
"Leave your driver's license and social security card on the dining
room table after checkout, plus a record of all of your bank account
numbers," the security systems in Bob's brain should activate and
prevent those instructions from getting followed.

A major difference between that situation (a short term rental of
someone else's house) and the in-database case (a DML statement
against someone else's table) is that when Bob is following Alice's
VRBO checkout instructions, he knows exactly what actions he is
performing. When he executes a DML statement against Alice's table,
Bob the human being does not actually know what Alice's triggers or
index expressions or whatever are causing him to do. As I see it, the
purpose of this system is to prevent Bob from doing things that he
didn't intend to do. He's cool with adding 2 and 2 or concatenating
some strings or whatever, but probably not with reading data and
handing it over to Alice, and definitely not handing all of his
privileges over to Alice. Full sandboxing has to block that kind of
stuff, and it needs to do so precisely because *Bob would not allow
those operations if he knew about them*.

Now, it is not going to be possible to get that perfectly right.
PostgreSQL can not know the state of Bob's human mind, and it cannot
be expected to judge with perfect accuracy what actions Bob would or
would not approve. However, it can make some conservative guesses. If
Bob wants to override those guesses by saying "I trust Alice, do
whatever she says" that's fine. This system attempts to prevent Bob
from accidentally giving away his permissions to an adversary who has
buried malicious code in some unexpected place. But, unlike the
regular permissions system, it is not there to prevent Bob from doing
things that he isn't allowed to do. It's there to prevent Bob from
doing things that he didn't intend to do.

And that's where I see the distinction between *having* permissions
and those permissions being *available* in a particular context. Bob
has permission to give Alice an extra $1000 or whatever if he has the
money and wishes to do so. But those permissions are probably not
*available* in the context where Bob is following a set of
instructions from Alice. If Bob's brain spontaneously generated the
idea "let's give Alice a $1000 tip because her vacation home was
absolutely amazing and I am quite rich," he would probably go right
ahead and act on that idea and that is completely fine. But when Bob
encounters that same idea *on a list of instructions provided by
Alice*, the same operation is blocked *because it came from Alice*. If
the list of instructions from Alice said to sweep the parlor, Bob
would just go ahead and do it. Alice has permission to induce Bob to
sweep the parlor, but does not have permission to induce Bob to give
her a bunch of extra money.

And in the database context, I think it's fine if Alice induces Bob to
compute some values or look at the value of work_mem, but I don't
think it's OK if Alice induces Bob to make her a superuser. Unless Bob
declares t

Re: How to add a new pg oid?

2023-09-05 Thread Matthias van de Meent
On Tue, 5 Sept 2023 at 18:13, jacktby jacktby  wrote:
>
> I’m trying to add a new data type for my pg. How to do that? Can you give me 
> more details or an example?

You could get started by looking at the documentation on custom SQL
types with https://www.postgresql.org/docs/current/sql-createtype.html,
or look at the comments in pg_type.dat and the comments on TypInfo in
bootstrap.c on how the built-in types are created and managed.

Lastly, you could look at pg_class and the genbki documentation if you
want to add new catalog types.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: PATCH: document for regression test forgets libpq test

2023-09-05 Thread Bruce Momjian
On Fri, Sep  1, 2023 at 08:01:47AM +, Ryo Matsumura (Fujitsu) wrote:
> Hi,
> 
> I found a small mistake in the documentation in 33.1.3. Additional Test Suites.
> 
> > The additional tests that can be invoked this way include:
> The list doesn't include interface/libpq/test.
> 
> I attached patch.

Yes, good point.  I modified the patch, attached, and applied it to all
supported versions.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 675db86e4d..de065c0564 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -196,8 +196,9 @@ make check-world -j8 >/dev/null


 
- Regression tests for the ECPG interface library,
- located in src/interfaces/ecpg/test.
+ Regression tests for the interface libraries,
+ located in src/interfaces/libpq/test and
+ src/interfaces/ecpg/test.
 




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-05 Thread Robert Haas
On Tue, Sep 5, 2023 at 12:05 PM Aleksander Alekseev
 wrote:
> Now when we continue reviewing the patch, could you please elaborate a
> bit on why you think it's worth committing?

Well, why not? The test he's proposing to move earlier doesn't involve
calling a function, so it should be cheaper than the one he's moving
it past, which does.

I mean, I don't know whether the savings are going to be measurable on
a benchmark, but I guess I don't particularly see why it matters. Why
write a function that says "this thing is cheaper so we test it first"
and then perform a cheaper test afterwards? That's just silly. We can
either change the comment to say "we do this first for no reason even
though it would be more sensible to do the cheap test first" or we can
reorder the tests to match the principle set forth in the existing
comment.
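
For reference, the reordering amounts to something like this (a sketch, not
the patch text): test the cheap boolean flag before calling
pathkeys_contained_in() and bms_is_subset().

if ((!require_parallel_safe || path->parallel_safe) &&
	pathkeys_contained_in(pathkeys, path->pathkeys) &&
	bms_is_subset(PATH_REQ_OUTER(path), required_outer))
	matched_path = path;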

I mean, unless there's some reason why it *isn't* cheaper. In that
case we should have a different conversation...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: information_schema and not-null constraints

2023-09-05 Thread Alvaro Herrera
On 2023-Sep-05, Alvaro Herrera wrote:

> After looking at what happens for domain constraints in older versions
> (I tested 15, but I suppose this applies everywhere), I notice that we
> don't seem to handle them anywhere that I can see.  My quick exercise is
> just
> 
> create domain nnint as int not null;
> create table foo (a nnint);
> 
> and then verify that this constraint shows nowhere -- it's not in
> DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
> And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
I admit I'm completely confused about what this view is supposed to
show.  Currently, we show the constraint name and a definition like
"CHECK (column IS NOT NULL)".  But since the table name is not given, it
is not possible to know to what table the column name refers to.  For
domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
indication of what domain it applies to, or anything at all that would
make this useful in any way whatsoever.

So this whole thing seems pretty futile and I'm disinclined to waste
much time on it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Create shorthand for including all extra tests

2023-09-05 Thread Nazir Bilal Yavuz
Hi,

Thanks for the feedback! I updated the patch; the 'needs-private-lo'
option now enables the kerberos, ldap, load_balance and ssl extra tests.

> On Mon, Sep 04, 2023 at 04:30:31PM -0400, Tom Lane wrote:
> > Yeah, I could live with something like that from the security standpoint.
> > Not sure if it helps Nazir's use-case though.  Maybe we could invent
> > categories that can be used in place of individual test names?
> > For now,

Yes, that is not ideal for my use-case but still better.

On Tue, 5 Sept 2023 at 00:09, Noah Misch  wrote:
>
> I could imagine categories for filesystem bytes and RAM bytes.  Also, while
> needs-private-lo has a bounded definition, "slow" doesn't.  If today's one
> "slow" test increases check-world duration by 1.1x, we may not let a
> 100x-increase test use the same keyword.

I agree. I didn't create a new 'slow' category, but I am still open to suggestions.

I am not very familiar with Perl syntax, so I would like to hear your
opinions on how the implementation of the check_extra_text_enabled()
function could be done better.

Regards,
Nazir Bilal Yavuz
Microsoft
From 3a33161eef699e4fbcfeb2d62ec387621a331d78 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Tue, 5 Sep 2023 20:14:15 +0300
Subject: [PATCH v2] Create shorthand for including extra tests that treat the
 loopback interface as private

This patch aims to make running full testsuite easier without having to
keep up with new tests and updates.

Create 'needs-private-lo' option for PG_TEST_EXTRA to enable extra
test suites that treat the loopback interface as private. That is
achieved by creating check_extra_tests_enabled() function in
the Test/Utils.pm file. This function takes the test's name as an
input and checks if PG_TEST_EXTRA contains this extra test or any
option that enables this extra test.
---
 doc/src/sgml/regress.sgml | 10 +
 .../libpq/t/004_load_balance_dns.pl   |  2 +-
 src/test/kerberos/t/001_auth.pl   |  2 +-
 src/test/ldap/t/001_auth.pl   |  2 +-
 src/test/ldap/t/002_bindpasswd.pl |  2 +-
 src/test/modules/Makefile |  2 +-
 .../t/001_mutated_bindpasswd.pl   |  2 +-
 src/test/perl/PostgreSQL/Test/Utils.pm| 45 +++
 src/test/recovery/t/027_stream_regress.pl |  3 +-
 src/test/ssl/t/001_ssltests.pl|  2 +-
 src/test/ssl/t/002_scram.pl   |  2 +-
 src/test/ssl/t/003_sslinfo.pl |  2 +-
 12 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 675db86e4d7..a9ceb0342d3 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -313,6 +313,16 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance'
   
  
 
+
+
+ needs_private_lo
+ 
+  
+   Enables the extra tests that treat the loopback interface as a private.
+   Currently enables kerberos, ldap, load_balance and ssl extra tests.
+  
+ 
+

 
Tests for features that are not supported by the current build
diff --git a/src/interfaces/libpq/t/004_load_balance_dns.pl b/src/interfaces/libpq/t/004_load_balance_dns.pl
index 875070e2120..62eeb21843e 100644
--- a/src/interfaces/libpq/t/004_load_balance_dns.pl
+++ b/src/interfaces/libpq/t/004_load_balance_dns.pl
@@ -6,7 +6,7 @@ use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
 use Test::More;
 
-if ($ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
+if (!PostgreSQL::Test::Utils::check_extra_text_enabled('load_balance'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 0deb9bffc8d..59574178afc 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -28,7 +28,7 @@ if ($ENV{with_gssapi} ne 'yes')
 {
 	plan skip_all => 'GSSAPI/Kerberos not supported by this build';
 }
-elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
+elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('kerberos'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA';
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 3e113fd6ebb..9db07e801e9 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes')
 {
 	plan skip_all => 'LDAP not supported by this build';
 }
-elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/)
+elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA';
diff --git a/src/test/ldap/t/002_bindpasswd.pl b/src/test/ldap/t/002_bindpasswd.pl
index bcd4aa2b742..a1b1bd8c22f 100644
--- a/src/test/ldap/t/002_bindpasswd.pl
+++ b/src/test/ldap/t/002_bindpasswd.pl
@@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes')
 {
 	plan skip_all => 'LDAP not supported by t

Re: Opportunistically pruning page before update

2023-09-05 Thread Melanie Plageman
On Wed, Jun 21, 2023 at 8:51 AM James Coleman  wrote:
> While at PGCon I was chatting with Andres (and I think Peter G. and a
> few others who I can't remember at the moment, apologies) and Andres
> noted that while we opportunistically prune a page when inserting a
> tuple (before deciding we need a new page) we don't do the same for
> updates.
>
> Attached is a patch series to do the following:
>
> 0001: Make it possible to call heap_page_prune_opt already holding an
> exclusive lock on the buffer.
> 0002: Opportunistically prune pages on update when the current tuple's
> page has no free space. If this frees up enough space, then we
> continue to put the new tuple on that page; if not, then we take the
> existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you
provide an example?

With your patch applied, when I create a table, the first time I update
it heap_page_prune_opt() will return before actually doing any pruning
because the page prune_xid hadn't been set (it is set after pruning as
well as later in heap_update() after RelationGetBufferForTuple() is
called).
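
For readers following along, the early return in question is roughly this
check near the top of heap_page_prune_opt() (paraphrased sketch): pruning is
skipped while the page's pd_prune_xid hint has not been set.

	prune_xid = ((PageHeader) page)->pd_prune_xid;
	if (!TransactionIdIsValid(prune_xid))
		return;		/* nothing hinted as prunable on this page yet */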

I actually added an additional parameter to heap_page_prune() and
heap_page_prune_opt() to identify if heap_page_prune() was called from
RelationGetBufferForTuple() and logged a message when this was true.
Running the test suite, I didn't see any UPDATEs executing
heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
other statement types doing so (see RelationGetBufferForTuple()'s other
callers). Was that intended?

> I started to work on benchmarking this, but haven't had time to devote
> properly to that, so I'm wondering if there's anyone who might be
> interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with
benchmarking it, but I don't yet understand the design in its current
form.

- Melanie




missing privilege check after not-null constraint rework

2023-09-05 Thread Alvaro Herrera
Here's a fix to move the privilege check on constraint dropping from
ATExecDropConstraint to dropconstraint_internal.  The former doesn't
recurse anymore, so there's no point in doing that or in fact even
having the 'recursing' argument anymore.

This fixes the following test case

CREATE ROLE alice;
CREATE ROLE bob;

GRANT ALL ON SCHEMA PUBLIC to alice, bob;
GRANT alice TO bob;

SET ROLE alice;
CREATE TABLE parent (a int NOT NULL);

SET ROLE bob;
CREATE TABLE child () INHERITS (parent);

At this point, bob owns the child table, to which alice has no access.
But alice can do this:
ALTER TABLE parent ALTER a DROP NOT NULL;
which is undesirable, because it removes the NOT NULL constraint from
table child, which is owned by bob.


Alternatively, we could say that Alice is allowed to drop the constraint
on her table, and that we should react by marking the constraint on
Bob's child table as 'islocal' instead of removing it.  Now, I'm pretty
sure we don't really care one bit about this case, and the reason is
this: we seem to have no tests for mixed-ownership table hierarchies.
If we did care, we would have some, and this bug would not have occurred
in the first place.  Besides, nobody likes legacy inheritance anyway.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas/ desprovistas, por cierto
 de blandos atenuantes"  (Patricio Vogel)




Re: missing privilege check after not-null constraint rework

2023-09-05 Thread Alvaro Herrera
On 2023-Sep-05, Alvaro Herrera wrote:

> Here's a fix to move the privilege check on constraint dropping from
> ATExecDropConstraint to dropconstraint_internal.


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."
>From f58d4532d39bc39edc3841c1ee5c3e3e9e9d153a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 5 Sep 2023 17:06:23 +0200
Subject: [PATCH] Move privilege check to the right place

Now that ATExecDropConstraint doesn't recurse anymore, there's no point
in testing the privileges during recursion there.  Move the check to
dropconstraint_internal, which is the place where recursion occurs.

In passing, remove now-useless 'recursing' argument to
ATExecDropConstraint.
---
 src/backend/commands/tablecmds.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 47c556669f..8a2c671b66 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -542,8 +542,7 @@ static void GetForeignKeyCheckTriggers(Relation trigrel,
 	   Oid *insertTriggerOid,
 	   Oid *updateTriggerOid);
 static void ATExecDropConstraint(Relation rel, const char *constrName,
- DropBehavior behavior,
- bool recurse, bool recursing,
+ DropBehavior behavior, bool recurse,
  bool missing_ok, LOCKMODE lockmode);
 static ObjectAddress dropconstraint_internal(Relation rel,
 			 HeapTuple constraintTup, DropBehavior behavior,
@@ -5236,7 +5235,7 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 			break;
 		case AT_DropConstraint: /* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
- cmd->recurse, false,
+ cmd->recurse,
  cmd->missing_ok, lockmode);
 			break;
 		case AT_AlterColumnType:	/* ALTER COLUMN TYPE */
@@ -12263,8 +12262,7 @@ createForeignKeyCheckTriggers(Oid myRelOid, Oid refRelOid,
  */
 static void
 ATExecDropConstraint(Relation rel, const char *constrName,
-	 DropBehavior behavior,
-	 bool recurse, bool recursing,
+	 DropBehavior behavior, bool recurse,
 	 bool missing_ok, LOCKMODE lockmode)
 {
 	Relation	conrel;
@@ -12273,10 +12271,6 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 	HeapTuple	tuple;
 	bool		found = false;
 
-	/* At top level, permission check was done in ATPrepCmd, else do it */
-	if (recursing)
-		ATSimplePermissions(AT_DropConstraint, rel, ATT_TABLE | ATT_FOREIGN_TABLE);
-
 	conrel = table_open(ConstraintRelationId, RowExclusiveLock);
 
 	/*
@@ -12302,7 +12296,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 	{
 		List	   *readyRels = NIL;
 
-		dropconstraint_internal(rel, tuple, behavior, recurse, recursing,
+		dropconstraint_internal(rel, tuple, behavior, recurse, false,
 missing_ok, &readyRels, lockmode);
 		found = true;
 	}
@@ -12353,6 +12347,10 @@ dropconstraint_internal(Relation rel, HeapTuple constraintTup, DropBehavior beha
 		return InvalidObjectAddress;
 	*readyRels = lappend_oid(*readyRels, RelationGetRelid(rel));
 
+	/* At top level, permission check was done in ATPrepCmd, else do it */
+	if (recursing)
+		ATSimplePermissions(AT_DropConstraint, rel, ATT_TABLE | ATT_FOREIGN_TABLE);
+
 	conrel = table_open(ConstraintRelationId, RowExclusiveLock);
 
 	con = (Form_pg_constraint) GETSTRUCT(constraintTup);
-- 
2.30.2



Re: How to add a new pg oid?

2023-09-05 Thread David G. Johnston
OIDs don't exist independently of the data they are associated with.  Give
more context if you want a better answer.  Or just go look at the source
code commits for the last time something needing an OID was added to
the core catalog.

David J.


Re: Unlogged relation copy is not fsync'd

2023-09-05 Thread Robert Haas
On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas  wrote:
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
>
> When you recover, the unlogged table is not reset, because it was a
> clean postgres shutdown. But the part of the file that was copied after
> the checkpoint at step 3 was never fsync'd, so part of the file is lost.
> I was able to reproduce with a VM that I kill to simulate step 5.
>
> This is the same scenario that the smgrimmedsync() call above protects
> from for WAL-logged relations. But we got the condition wrong: instead
> of worrying about the init fork, we need to call smgrimmedsync() for all
> *other* forks of an unlogged relation.
>
> Fortunately the fix is trivial, see attached.

The general rule throughout the system is that the init-fork of an
unlogged relation is treated the same as a permanent relation: it is
WAL-logged and fsyncd. But the other forks of an unlogged relation are
neither WAL-logged nor fsync'd ... except in the case of a clean
shutdown, when we fsync even that data.

In other words, somehow it feels like we ought to be trying to defer
the fsync here until a clean shutdown actually occurs, instead of
performing it immediately. Admittedly, the bookkeeping seems like a
problem, so maybe this is the best we can do, but it's clearly worse
than what we do in other cases.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How to add a new pg oid?

2023-09-05 Thread David G. Johnston
On Tue, Sep 5, 2023, 11:17 jacktby jacktby  wrote:

>
> > On 5 Sep 2023, at 22:29, jacktby jacktby wrote:
> >
> I’m trying to add a new data type for my pg. How to do that? Can you give
> me more details or an example
>

Use create type and let the system deal with it.  Otherwise, no, I don't
have that knowledge.

David J.

>


Re: Improve heapgetpage() performance, overhead from serializable

2023-09-05 Thread Andres Freund
Hi,

On 2023-09-05 14:42:57 +0700, John Naylor wrote:
> On Mon, Jul 17, 2023 at 9:58 PM Andres Freund  wrote:
>
> > FWIW, there's more we can do, with some hacky changes I got the time down to
> > 273.261, but the tradeoffs start to be a bit more complicated. And 397->320ms
> > for something as core as this, is imo worth considering on its own.
>
> Nice!
>
> > On 2023-07-17 09:55:07 +0800, Zhang Mingli wrote:
>
> > > Does it make sense to combine the if/else conditions and put them in the
> > > inline function's params?
> > >
> > > Like:
> > > scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
> > >                                        block, lines, all_visible, check_serializable);
> >
> > I think that makes it less likely that the compiler actually generates a
> > constant-folded version for each of the branches. Perhaps worth some
> > experimentation.
>
> Combining this way doesn't do so for me.

Are you saying that the desired constant folding happened after combining the
branches, or that it didn't happen?

Greetings,

Andres Freund




Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-09-05 Thread Jeff Davis
On Mon, 2023-09-04 at 18:01 +0530, Ashutosh Bapat wrote:
> Why do we need to re-check parameters constantly? We will need to
> restart subscriptions which are using the user mapping of FDW when
> user mapping or server options change.

"Constantly" was an exaggeration, but the point is that it's a separate
validation step after the ALTER SERVER or ALTER USER MAPPING has
already happened, so the subscription would start failing.

Perhaps this is OK, but it's not the ideal user experience. Ideally,
the user would get some indication from the ALTER SERVER or ALTER USER
MAPPING that it's about to break a subscription that depends on it.

> I didn't understand your worry about circumventing password_required
> protection.

If the subscription doesn't do its own validation, and if the FDW
doesn't ensure that the password is set, then it could end up creating
a connection string without supplying the password.

> We don't need to if we allow any FDW (even if non-postgreSQL) to be
> specified there.

OK, so we could have a built-in FDW called pg_connection that would do
the right kinds of validation; and then also allow other FDWs but the
subscription would have to do its own validation.

Regards,
Jeff Davis





Re: Initdb-time block size specification

2023-09-05 Thread Hannu Krosing
Something I also asked at this year's Unconference - Do we currently
have Build Farm animals testing with different page sizes?

I'd say that testing all sizes from 4KB up (so 4, 8, 16, 32) should be
done at least before each release if not continuously.

-- Cheers

Hannu


On Tue, Sep 5, 2023 at 4:31 PM David Christensen
 wrote:
>
> On Tue, Sep 5, 2023 at 9:04 AM Robert Haas  wrote:
> >
> > On Sat, Sep 2, 2023 at 3:09 PM Tomas Vondra
> >  wrote:
> > > Except that no one is forcing you to actually go 130mph or 32mph, right?
> > > You make it seem like this patch forces people to use some other page
> > > size, but that's clearly not what it's doing - it gives you the option
> > > to use smaller or larger block, if you chose to. Just like increasing
> > > the speed limit to 130mph doesn't mean you can't keep going 65mph.
> > >
> > > The thing is - we *already* allow using different block size, except
> > > that you have to do custom build. This just makes it easier.
> >
> > Right. Which is worth doing if it doesn't hurt performance and is
> > likely to be useful to a lot of people, and is not worth doing if it
> > will hurt performance and be useful to relatively few people. My bet
> > is on the latter.
>
> Agreed that this doesn't make sense if there are major performance
> regressions; however, the goal here is patch evaluation, to measure this
> against other workloads and see whether that is the case. From my localized
> testing, things were within acceptable noise levels with the latest
> version.
>
> I agree with Tomas' earlier thoughts: we already allow different block
> sizes, and if there are baked-in algorithmic assumptions about block
> size (which there probably are), then identifying those or places in
> the code where we need additional work or test coverage will only
> improve things overall for those non-standard block sizes.
>
> Best,
>
> David
>
>




Re: pg_upgrade fails with in-place tablespace[

2023-09-05 Thread Robert Haas
On Mon, Sep 4, 2023 at 11:29 PM Rui Zhao  wrote:
> Thank you for your response. It is evident that there is a need
> for this feature in our system.
> Firstly, our customers express their desire to utilize tablespaces
> for table management, without necessarily being concerned about
> the directory location of these tablespaces.

That's not the purpose of that feature. I'm not sure that we should be
worrying about users who want to use features for a purpose other than
the one for which they were designed.

> Secondly, currently PG only supports absolute-path tablespaces, but
> in-place tablespaces are very likely to become popular in the future.

That's not really a reason. If you said why this was going to happen,
that might be a reason, but just asserting that it will happen isn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Initdb-time block size specification

2023-09-05 Thread Andres Freund
Hi,

On 2023-09-05 21:52:18 +0200, Hannu Krosing wrote:
> Something I also asked at this year's Unconference - Do we currently
> have Build Farm animals testing with different page sizes?

You can check that yourself as easily as anybody else.

Greetings,

Andres Freund




Fix some wording in WAL docs

2023-09-05 Thread Tristan Partin
I was reading through the page and noticed this portion which didn't 
sound quite right. I am hoping that I captured the original intent 
correctly. Please let me know if something should be changed and/or 
reflowed, since I am not sure what best practices are when editing the 
docs. I did notice that this same wording issue has existed since 
428b1d6.


--
Tristan Partin
Neon (https://neon.tech)
From 6fea7ef8c94022a68a9e14bf577480589e4fa32c Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 5 Sep 2023 15:27:31 -0500
Subject: [PATCH v1] Fix some wording in WAL docs

The sentence did not make sense as it was.
---
 doc/src/sgml/wal.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4aad0e1a07..41f31f5ba7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -601,9 +601,9 @@
   
 
   
-   On Linux and POSIX platforms 
-   allows to force the OS that pages written by the checkpoint should be
-   flushed to disk after a configurable number of bytes.  Otherwise, these
+   On Linux and POSIX platforms, 
+   forces the OS to flush dirty pages written by the checkpoint to disk
+   after writing the configured number of bytes. Otherwise, these
pages may be kept in the OS's page cache, inducing a stall when
fsync is issued at the end of a checkpoint.  This setting will
often help to reduce transaction latency, but it also can have an adverse
-- 
Tristan Partin
Neon (https://neon.tech)



Re: Initdb-time block size specification

2023-09-05 Thread David Christensen
On Tue, Sep 5, 2023 at 2:52 PM Hannu Krosing  wrote:
>
> Something I also asked at this year's Unconference - Do we currently
> have Build Farm animals testing with different page sizes?
>
> I'd say that testing all sizes from 4KB up (so 4, 8, 16, 32) should be
> done at least before each release if not continuously.
>
> -- Cheers
>
> Hannu

The regression tests currently have a lot of breakage when running
against non-standard block sizes, so I would assume the answer here is
no.  I would expect that we will want to add regression test variants
or otherwise normalize results so they will work with differing block
sizes, but have not done that yet.




Re: Add 'worker_type' to pg_stat_subscription

2023-09-05 Thread Peter Smith
On Sat, Sep 2, 2023 at 7:41 AM Nathan Bossart  wrote:

Thanks for your interest in this patch.

> Is there any reason not to spell out the names?  I think that would match
> the other system views better (e.g., backend_type in pg_stat_activity).

I had thought it might be simpler in case someone wanted to query by
type. But your suggestion for consistency is probably better, so I
changed to do it that way. The help is also simplified to match the
other 'backend_type' you cited.

> Also, instead of "tablesync worker", I'd suggest using "synchronization
> worker" to match the name used elsewhere in this table.
>

Changed to "table synchronization worker".

> I see that the table refers to "leader apply workers".  Would those show up
> as parallel apply workers in the view?  Can we add another worker type for
> those?

Internally there are only 3 worker types: A "leader" apply worker is
basically the same as a regular apply worker, except it has other
parallel apply workers associated with it.

I felt that pretending there are 4 types in the view would be
confusing. Instead, I just removed the word "leader". Now there are:
"apply worker"
"parallel apply worker"
"table synchronization worker"

PSA patch v2.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
From bac581d9f6843b3df0dd5fc45e318594a7921ee6 Mon Sep 17 00:00:00 2001
From: Peter Smith 
Date: Mon, 4 Sep 2023 13:58:16 +1000
Subject: [PATCH v2] Add worker_type to pg_stat_subscription

---
 doc/src/sgml/monitoring.sgml   | 12 +++
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/replication/logical/launcher.c | 55 +-
 src/include/catalog/pg_proc.dat|  6 ++--
 src/test/regress/expected/rules.out|  3 +-
 5 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4ff415d..45b9ccf 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2004,6 +2004,18 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
  
   
+   worker_type text
+  
+  
+   Type of the subscription worker process. Possible types are:
+   apply worker,
+   parallel apply worker,
+   table synchronization worker.
+  
+ 
+
+ 
+  
leader_pid integer
   
   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 77b06e2..3b7f5c2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -950,6 +950,7 @@ CREATE VIEW pg_stat_subscription AS
 su.oid AS subid,
 su.subname,
 st.pid,
+			st.worker_type,
 st.leader_pid,
 st.relid,
 st.received_lsn,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7882fc9..0e2fbaf 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid)
 Datum
 pg_stat_get_subscription(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_SUBSCRIPTION_COLS	9
+#define PG_STAT_GET_SUBSCRIPTION_COLS	10
 	Oid			subid = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
 	int			i;
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
@@ -1304,40 +1304,61 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS)
 		if (OidIsValid(subid) && worker.subid != subid)
 			continue;
 
+		values[0] = ObjectIdGetDatum(worker.subid);
+
 		worker_pid = worker.proc->pid;
+		values[1] = Int32GetDatum(worker_pid);
+
+		switch (worker.type)
+		{
+			case WORKERTYPE_APPLY:
+values[2] = CStringGetTextDatum("apply worker");
+break;
+			case WORKERTYPE_PARALLEL_APPLY:
+values[2] = CStringGetTextDatum("parallel apply worker");
+break;
+			case WORKERTYPE_TABLESYNC:
+values[2] = CStringGetTextDatum("table synchronization worker");
+break;
+			case WORKERTYPE_UNKNOWN: /* should not be possible */
+nulls[2] = true;
+		}
 
-		values[0] = ObjectIdGetDatum(worker.subid);
-		if (isTablesyncWorker(&worker))
-			values[1] = ObjectIdGetDatum(worker.relid);
-		else
-			nulls[1] = true;
-		values[2] = Int32GetDatum(worker_pid);
 
 		if (isParallelApplyWorker(&worker))
 			values[3] = Int32GetDatum(worker.leader_pid);
 		else
 			nulls[3] = true;
 
-		if (XLogRecPtrIsInvalid(worker.last_lsn))
+		if (isTablesyncWorker(&worker))
+			values[4] = ObjectIdGetDatum(worker.relid);
+		else
 			nulls[4] = true;
+
+		if (XLogRecPtrIsInvalid(worker.last_lsn))
+			nulls[5] = true;
 		else
-			values[4] = LSNGetDatum(worker.last_lsn);
+			values[5] = LSNGetDatum(worker.last_lsn);
+
 		if (worker.last_send_time == 0)
-			nulls[5] = true;
+			nulls[6] = true;
 		else
-			values[5] = TimestampTzGetDatum(worker.last_send_time);
+			values[6] = TimestampTzGetDatum(worker.last_send_time);
+
 		if (worker.last_recv_time == 0)
-			nulls[6] = true;
+			nul

Re: Replace known_assigned_xids_lck by memory barrier

2023-09-05 Thread Nathan Bossart
On Fri, Sep 01, 2023 at 03:53:54PM -0400, Robert Haas wrote:
> I'm not an expert on this code but I looked at this patch briefly and
> it seems OK to me.

Thanks for taking a look.  Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Initdb-time block size specification

2023-09-05 Thread Hannu Krosing
Sure, I was just hoping that somebody already knew without needing to
specifically check :)

And as I see in David's response, the tests are actually broken for other sizes.

I'll see if I can (convince somebody to) set this up.

Cheers
Hannu

On Tue, Sep 5, 2023 at 10:23 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-09-05 21:52:18 +0200, Hannu Krosing wrote:
> > Something I also asked at this year's Unconference - Do we currently
> > have Build Farm animals testing with different page sizes?
>
> You can check that yourself as easily as anybody else.
>
> Greetings,
>
> Andres Freund




Re: Add 'worker_type' to pg_stat_subscription

2023-09-05 Thread Nathan Bossart
On Wed, Sep 06, 2023 at 09:02:21AM +1200, Peter Smith wrote:
> On Sat, Sep 2, 2023 at 7:41 AM Nathan Bossart  
> wrote:
>> I see that the table refers to "leader apply workers".  Would those show up
>> as parallel apply workers in the view?  Can we add another worker type for
>> those?
> 
> Internally there are only 3 worker types: A "leader" apply worker is
> basically the same as a regular apply worker, except it has other
> parallel apply workers associated with it.
> 
> I felt that pretending there are 4 types in the view would be
> confusing. Instead, I just removed the word "leader". Now there are:
> "apply worker"
> "parallel apply worker"
> "table synchronization worker"

Okay.  Should we omit "worker" for each of the types?  Since these are the
values for the "worker_type" column, it seems a bit redundant.  For
example, we don't add "backend" to the end of each value for backend_type
in pg_stat_activity.

I wonder if we could add the new field to the end of
pg_stat_get_subscription() so that we could simplify this patch a bit.  At
the moment, a big chunk of it is dedicated to reordering the values.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: information_schema and not-null constraints

2023-09-05 Thread Vik Fearing

On 9/5/23 19:15, Alvaro Herrera wrote:
> On 2023-Sep-05, Alvaro Herrera wrote:
>
> Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
> I admit I'm completely confused about what this view is supposed to
> show.  Currently, we show the constraint name and a definition like
> "CHECK (column IS NOT NULL)".  But since the table name is not given, it
> is not possible to know to what table the column name refers to.  For
> domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
> indication of what domain it applies to, or anything at all that would
> make this useful in any way whatsoever.

Constraint names are supposed to be unique per schema[1] so the view
contains the minimum required information to identify the constraint.

> So this whole thing seems pretty futile and I'm disinclined to waste
> much time on it.


Until PostgreSQL either
  A) obeys the spec on this uniqueness, or
  B) decides to deviate from the information_schema spec;
this view will be completely useless for actually getting any useful 
information.


I would like to see us do A because it is the right thing to do.  Our 
autogenerated names obey this rule, but who knows how many duplicate 
names per schema are out there in the wild from people specifying their 
own names.


I don't know what the project would think about doing B.


[1] SQL:2023-2 11.4  Syntax Rule 4
--
Vik Fearing





Re: information_schema and not-null constraints

2023-09-05 Thread David G. Johnston
On Tue, Sep 5, 2023 at 2:50 PM Vik Fearing  wrote:

> On 9/5/23 19:15, Alvaro Herrera wrote:
> > On 2023-Sep-05, Alvaro Herrera wrote:
> >
> > Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
> > I admit I'm completely confused about what this view is supposed to
> > show.  Currently, we show the constraint name and a definition like
> > "CHECK (column IS NOT NULL)".  But since the table name is not given, it
> > is not possible to know to what table the column name refers to.  For
> > domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
> > indication of what domain it applies to, or anything at all that would
> > make this useful in any way whatsoever.
>
> Constraint names are supposed to be unique per schema[1] so the view
> contains the minimum required information to identify the constraint.
>

I'm presuming that the view constraint_column_usage [1] is an integral part
of all this though I haven't taken the time to figure out exactly how we
are implementing it today.

I'm not all that for either A or B since the status quo seems workable.
Though ideally if the system has unique names per schema then everything
should just work - having the views produce duplicated information (as
opposed to nothing) if they are used when the DBA doesn't enforce the
standard's requirements seems plausible.

David J.

[1]
https://www.postgresql.org/docs/current/infoschema-constraint-column-usage.html


Re: sandboxing untrusted code

2023-09-05 Thread Jeff Davis
On Tue, 2023-09-05 at 12:25 -0400, Robert Haas wrote:
> I think I've more or less answered this already -- fully sandboxed
> code can't make reference to external data sources, from which it
> follows that it can't exercise SELECT (and most other privileges).

By what principle are we allowing EXECUTE but not SELECT? In theory, at
least, a function could hold secrets in the code, e.g.:

  CREATE FUNCTION answer_to_ultimate_question() RETURNS INT
LANGUAGE plpgsql AS $$ BEGIN RETURN 42; END; $$;

Obviously that's a bad idea in plpgsql, because anyone can just read
pg_proc. And maybe C would be handled differently somehow, so maybe it
all works.

But it feels like something is wrong there: it's fine to execute the
answer_to_ultimate_question() not because Bob has an EXECUTE privilege,
but because the sandbox renders any security concerns with *anyone*
executing the function moot. So why bother checking the EXECUTE
privilege at all?

> And that's where I see the distinction between *having* permissions
> and those permissions being *available* in a particular context. Bob
> has permission to give Alice an extra $1000 or whatever if he has the
> money and wishes to do so. But those permissions are probably not
> *available* in the context where Bob is following a set of
> instructions from Alice. If Bob's brain spontaneously generated the
> idea "let's give Alice a $1000 tip because her vacation home was
> absolutely amazing and I am quite rich," he would probably go right
> ahead and act on that idea and that is completely fine. But when Bob
> encounters that same idea *on a list of instructions provided by
> Alice*, the same operation is blocked *because it came from Alice*.
> If
> the list of instructions from Alice said to sweep the parlor, Bob
> would just go ahead and do it. Alice has permission to induce Bob to
> sweep the parlor, but does not have permission to induce Bob to give
> her a bunch of extra money.

In the real world example, sweeping the parlor has a (slight) cost to
the person doing it and it (slightly) matters who does it. In Postgres,
we don't do any CPU accounting per user, and it's all executed under
the same PID, so it really doesn't matter.

So it raises the question: why would we not simply say that this list
of instructions should be executed by the person who wrote it, in which
case the existing privilege mechanism would work just fine?

> And in the database context, I think it's fine if Alice induces Bob
> to
> compute some values or look at the value of work_mem, but I don't
> think it's OK if Alice induces Bob to make her a superuser.

If all the code can do is compute some values or look at work_mem,
perhaps the function needs no privileges at all (or some minimal
privileges)?

You explained conceptually where you're coming from, but I still don't
see much of a practical difference between having privileges but being
in a context where they won't be used, and dropping the privileges
entirely during that time. I suppose the answer is that the EXECUTE
privilege will still be used, but as I said above, that doesn't
entirely make sense to me, either.

Regards,
Jeff Davis





Re: information_schema and not-null constraints

2023-09-05 Thread Vik Fearing

On 9/6/23 00:14, David G. Johnston wrote:
> I'm not all that for either A or B since the status quo seems workable.

Pray tell, how is it workable?  The view does not identify a specific
constraint because we don't obey the rules on one side and we do obey
the rules on the other side.  It is completely useless and unworkable.

> Though ideally if the system has unique names per schema then everything
> should just work - having the views produce duplicated information (as
> opposed to nothing) if they are used when the DBA doesn't enforce the
> standard's requirements seems plausible.

Let us not engage in victim blaming.  Postgres is the problem here.
--
Vik Fearing





Re: Replace known_assigned_xids_lck by memory barrier

2023-09-05 Thread Michail Nikolaev
Thanks everyone for help!


Re: should frontend tools use syncfs() ?

2023-09-05 Thread Nathan Bossart
I've committed 0001 for now.  I plan to commit the rest in the next couple
of days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: information_schema and not-null constraints

2023-09-05 Thread Tom Lane
Vik Fearing  writes:
> On 9/6/23 00:14, David G. Johnston wrote:
>> I'm not all that for either A or B since the status quo seems workable.

> Pray tell, how is it workable?  The view does not identify a specific 
> constraint because we don't obey the rules on one side and we do obey 
> the rules on the other side.  It is completely useless and unworkable.

What solution do you propose?  Starting to enforce the spec's rather
arbitrary requirement that constraint names be unique per-schema is
a complete nonstarter.  Changing the set of columns in a spec-defined
view is also a nonstarter, or at least we've always taken it as such.

If you'd like to see some forward progress in this area, maybe you
could lobby the SQL committee to make constraint names unique per-table
not per-schema, and then make the information_schema changes that would
be required to support that.

In general though, the fact that we have any DDL extensions at all
compared to the standard means that there will be Postgres databases
that are not adequately represented by the information_schema views.
I'm not sure it's worth being more outraged about constraint names
than anything else.  Or do you also want us to rip out (for starters)
unique indexes on expressions, or unique partial indexes?

regards, tom lane




Re: RFC: Logging plan of the running query

2023-09-05 Thread James Coleman
On Tue, Sep 5, 2023 at 9:59 AM torikoshia  wrote:
>
> On 2023-08-28 22:47, James Coleman wrote:
> > On Mon, Aug 28, 2023 at 3:01 AM torikoshia 
> > wrote:
> >>
> >> On 2023-08-26 21:03, James Coleman wrote:
> >> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman  wrote:
> >> >>
> >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
> >> >>  wrote:
> >> >> >
> >> >> > On 2023-06-16 01:34, James Coleman wrote:
> >> >> > > Attached is v28
> >> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> >> >> > > when necessary. Once built with those two patches I'm simply running
> >> >> > > `make check`.
> >> >> >
> >> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes 
> >> >> > consume
> >> >> > huge
> >> >> > amount of memory and under some environments they were terminated by 
> >> >> > OOM
> >> >> > killer.
> >> >> >
> >> >> > This was because memory was allocated from existing memory contexts 
> >> >> > and
> >> >> > they
> >> >> > were not freed after ProcessLogQueryPlanInterrupt().
> >> >> > Updated the patch to use dedicated memory context for
> >> >> > ProcessLogQueryPlanInterrupt().
> >> >> >
> >> >> > Applying attached patch and v28-0002 patch, `make check` successfully
> >> >> > completed after 20min and 50GB of logs on my environment.
> >> >> >
> >> >> > >>> On 2023-06-15 01:48, James Coleman wrote:
> >> >> > >>> > The tests have been running since last night, but have been 
> >> >> > >>> > apparently
> >> >> > >>> > hung now for many hours.
> >> >> >
> >> >> > I don't know if this has anything to do with the hung you faced, but I
> >> >> > thought
> >> >> > it might be possible that the large amount of memory usage resulted in
> >> >> > swapping, which caused a significant delay in processing.
> >> >>
> >> >> Ah, yes, I think that could be a possible explanation. I was delaying
> >> >> on this thread because I wasn't comfortable with having caused an
> >> >> issue once (even if I couldn't easily reproduce) without at least some
> >> >> theory as to the cause (and a fix).
> >> >>
> >> >> > If possible, I would be very grateful if you could try to reproduce 
> >> >> > this
> >> >> > with
> >> >> > the v29 patch.
> >> >>
> >> >> I'll kick off some testing.
> >> >>
> >> >
> >> > I don't have time to investigate what's happening here, but 24 hours
> >> > later the first "make check" is still running, and at first glance it
> >> > seems to have the same behavior I'd seen that first time. The test
> >> > output is to this point:
> >> >
> >> > # parallel group (5 tests):  index_including create_view
> >> > index_including_gist create_index create_index_spgist
> >> > ok 66+ create_index26365 ms
> >> > ok 67+ create_index_spgist 27675 ms
> >> > ok 68+ create_view  1235 ms
> >> > ok 69+ index_including  1102 ms
> >> > ok 70+ index_including_gist 1633 ms
> >> > # parallel group (16 tests):  create_aggregate create_cast errors
> >> > roleattributes drop_if_exists hash_func typed_table create_am
> >> > infinite_recurse
> >> >
> >> > and it hasn't progressed past that point since at least ~16 hours ago
> >> > (the first several hours of the run I wasn't monitoring it).
> >> >
> >> > I haven't connected up gdb yet, and won't be able to until maybe
> >> > tomorrow, but here's the ps output for postgres processes that are
> >> > running:
> >> >
> >> > admin3213249  0.0  0.0 196824 20552 ?Ss   Aug25   0:00
> >> > /home/admin/postgresql-test/bin/postgres -D
> >> > /home/admin/postgresql-test-data
> >> > admin3213250  0.0  0.0 196964  7284 ?Ss   Aug25   0:00
> >> > postgres: checkpointer
> >> > admin3213251  0.0  0.0 196956  4276 ?Ss   Aug25   0:00
> >> > postgres: background writer
> >> > admin3213253  0.0  0.0 196956  8600 ?Ss   Aug25   0:00
> >> > postgres: walwriter
> >> > admin3213254  0.0  0.0 198424  7316 ?Ss   Aug25   0:00
> >> > postgres: autovacuum launcher
> >> > admin3213255  0.0  0.0 198412  5488 ?Ss   Aug25   0:00
> >> > postgres: logical replication launcher
> >> > admin3237967  0.0  0.0   2484   516 pts/4S+   Aug25   0:00
> >> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> >> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> >> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> >> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> >> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> >> > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20
> >> > --schedule=./parallel_schedule
> >> > admin3237973  0.0  0.0 197880 20688 pts/4S+   Aug25   0:00
> >> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> >> > listen_addresses= -k /tmp/pg_

Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Erik Wienhold
On 05/09/2023 13:50 CEST Michael Paquier  wrote:

> On Tue, Sep 05, 2023 at 11:06:36AM +0200, Drouvot, Bertrand wrote:
> > Oh ok, out of curiosity, why are 2 whitespaces intentional?
>
> That depends on the individual who write the code, but I recall that
> this is some old-school style from the 70's and/or the 80's when
> typing machines were still something.  I'm just used to this style
> after the end of a sentence in a comment.

FYI: https://en.wikipedia.org/wiki/Sentence_spacing

--
Erik




Re: information_schema and not-null constraints

2023-09-05 Thread Vik Fearing

On 9/6/23 02:53, Tom Lane wrote:
> Vik Fearing  writes:
>> On 9/6/23 00:14, David G. Johnston wrote:
>>> I'm not all that for either A or B since the status quo seems workable.
>>
>> Pray tell, how is it workable?  The view does not identify a specific
>> constraint because we don't obey the rules on one side and we do obey
>> the rules on the other side.  It is completely useless and unworkable.
>
> What solution do you propose?  Starting to enforce the spec's rather
> arbitrary requirement that constraint names be unique per-schema is
> a complete nonstarter.  Changing the set of columns in a spec-defined
> view is also a nonstarter, or at least we've always taken it as such.

I both semi-agree and semi-disagree that these are nonstarters.  One of
them has to give.

> If you'd like to see some forward progress in this area, maybe you
> could lobby the SQL committee to make constraint names unique per-table
> not per-schema, and then make the information_schema changes that would
> be required to support that.

I could easily do that; but now you are asking to denormalize the
standard, because the constraints could be from tables, domains, or
assertions.

I don't think that will go over well, starting with my own opinion.

And for this reason, I do not believe that this is a "rather arbitrary
requirement".

> In general though, the fact that we have any DDL extensions at all
> compared to the standard means that there will be Postgres databases
> that are not adequately represented by the information_schema views.

Sure.

> I'm not sure it's worth being more outraged about constraint names
> than anything else.  Or do you also want us to rip out (for starters)
> unique indexes on expressions, or unique partial indexes?

Indexes of any kind are not part of the standard so these examples are
basically invalid.

SQL:2023-11 Schemata is not the part I am most familiar with, but I
don't even see where regular multi-column unique constraints are listed
out, so that is both a lack in the standard and a knockdown of this
argument.  I am happy to be shown wrong about this.

--
Vik Fearing





Re: psql - pager support - using invisible chars for signalling end of report

2023-09-05 Thread Thomas Munro
On Sat, Apr 25, 2020 at 3:53 PM Pavel Stehule  wrote:
> so 25. 4. 2020 v 2:12 odesílatel Tom Lane  napsal:
>> Pavel Stehule  writes:
>> > pá 24. 4. 2020 v 21:33 odesílatel Tom Lane  napsal:
>> >> And what will happen when those characters are in the data?
>>
> > It will be used on the pager side as a signal that the previous row was
> > really the last row of the result, and the new row will be related to a
> > new result.
>>
>> In other words, it will misbehave badly if those characters appear
>> in the query result.  Doesn't sound acceptable to me.
>
>
> It should not be a problem, I think:
>
> a) it can be applied as a special char only when the row before was empty
> b) psql formats this char inside the query result, so it should not be
> possible to find this binary value inside the result.
>
> postgres=# select e'AHOJ' || chr(5) || '';
> ┌──┐
> │   ?column?   │
> ╞══╡
> │ AHOJ\x05 │
> └──┘
> (1 row)

This sounds better than the QUERY_SEPARATOR hack from commit
664d757531e, and similar kludges elsewhere.  I think Pavel and David
are right about NUL being impossible in psql query output, no?




Re: Improve heapgetpage() performance, overhead from serializable

2023-09-05 Thread John Naylor
On Wed, Sep 6, 2023 at 1:38 AM Andres Freund  wrote:

> > > I think that makes it less likely that the compiler actually generates a
> > > constant-folded version for each of the branches. Perhaps worth some
> > > experimentation.
> >
> > Combining this way doesn't do so for me.
>
> Are you saying that the desired constant folding happened after combining the
> branches, or that it didn't happen?

Constant folding did not happen.
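
(For readers following along, the two shapes being compared look roughly like
this -- a sketch reconstructed from the call quoted upthread, not verbatim
patch code:)

	/* combined: flags computed at run time, a single call site */
	scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
										   block, lines, all_visible,
										   check_serializable);

	/*
	 * branched: constant flags, one call site per combination, which is what
	 * gives the compiler the chance to emit a constant-folded copy of the
	 * inlined helper for each branch
	 */
	if (all_visible)
	{
		if (check_serializable)
			scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page,
												   buffer, block, lines,
												   true, true);
		else
			scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page,
												   buffer, block, lines,
												   true, false);
	}
	else
	{
		if (check_serializable)
			scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page,
												   buffer, block, lines,
												   false, true);
		else
			scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page,
												   buffer, block, lines,
												   false, false);
	}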

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Disabling Heap-Only Tuples

2023-09-05 Thread Laurenz Albe
On Wed, 2023-08-30 at 09:31 -0400, Robert Haas wrote:
> On Wed, Aug 30, 2023 at 9:01 AM Matthias van de Meent
>  wrote:
> > I've reworked the patch a bit to remove the "excessive bloat with low
> > fillfactors when local space is available" issue that this parameter
> > could cause - local updates are now done if the selected page we would
> > be inserting into is after the old tuple's page and the old tuple's
> > page still (or: now) has space available.
> > 
> > Does that alleviate your concerns?
> 
> That seems like a good change, but my core concern is around people
> having to micromanage local_update_limit, and probably either not
> knowing how to do it properly, or not being able or willing to keep
> updating it as things change.
> 
> In a way, this parameter is a lot like work_mem, which is notoriously
> very difficult to tune.

I don't think that is a good comparison.  While most people probably
never need to touch "local_update_limit", "work_mem" is something everybody
has to consider.

And it is not so hard to tune: the setting would be the desired table
size, and you could use pgstattuple to find a good value.

I don't know what other use cases come to mind, but I see it as a tool to
shrink a table after it has grown big holes, perhaps after a mass delete.
Today, you can only VACUUM (FULL) or play with the likes of pg_squeeze and
pg_repack.

I think this is useful.

To alleviate your concerns, perhaps it would help to describe the use case
and ideas for a good setting in the documentation.

Yours,
Laurenz Albe




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-05 Thread Zhijie Hou (Fujitsu)
On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Dear Hou-san,
> 
> > Based on this, it’s possible that the slots we get each time when
> > checking wal_status are different, because they may get changed in between
> > these checks.
> > This may not cause serious problems for now, because we will either
> > copy all the slots including ones invalidated when upgrading or we
> > report ERROR. But I feel it's better to get a consistent result each
> > time we check the slots, to close the possibility for problems in the
> > future. So, I feel we could centralize the check for wal_status and
> > the slots fetch, so that even if some slot status changes after that,
> > it won't have a risk of affecting our check. What do you think?
> 
> Thank you for the suggestion! I agreed to centralize the checks, and I
> had already started to modify the patch. Here is the updated version.
> 
> In this patch all slot infos are extracted in
> get_old_cluster_logical_slot_infos(), and the upcoming functions use them.
> Based on the change, two attributes confirmed_flush and wal_status were
> added to LogicalSlotInfo.
> 
> IIUC we cannot use struct List in the client code, so structures and related
> functions are added in function.c. These are used for extracting unique
> plugins, but it may be overkill because check_loadable_libraries() handles
> duplicated entries. If we can ignore duplicated entries, these functions can
> be removed.
> 
> Also, for simplicity, only the first invalidated slot found is reported by
> check_old_cluster_for_valid_slots(). Warning messages in the function were
> removed. I think it may be enough because check_new_cluster_is_empty() does
> a similar thing. Please tell me if it should be reverted...

Thanks for updating the patch! Here are a few comments.

1.

+   res = executeQueryOrDie(conn, "SHOW wal_level;");
+   wal_level = PQgetvalue(res, 0, 0);

+   res = executeQueryOrDie(conn, "SHOW wal_level;");
+   wal_level = PQgetvalue(res, 0, 0);

I think it would be better to do a sanity check using PQntuples() before
calling PQgetvalue() in the above places.
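Something like this (an illustrative sketch only):

	res = executeQueryOrDie(conn, "SHOW wal_level;");

	if (PQntuples(res) != 1)
		pg_fatal("could not determine wal_level");

	wal_level = PQgetvalue(res, 0, 0);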

2.

+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+   WALAvailability status = WALAVAIL_INVALID_LSN;
+
+   if (strcmp(str, "reserved") == 0)
+   status = WALAVAIL_RESERVED;

Not a comment on the code itself, but I am wondering if we could use the
conflicting field to do this check, so that we could avoid the new
conversion function and the structure movement. What do you think?


3.

+   curr->confirmed_flush = strtoLSN(
+   
 PQgetvalue(res,
+   
slotnum,
+   
i_confirmed_flush),
+   
 &have_error);

The indentation looks a bit unusual.

4.
+* XXX: As mentioned in comments atop get_output_plugins(), we may not
+* have to consider the uniqueness of entries. If so, we can use
+* count_old_cluster_logical_slots() instead of plugin_list_length().
+*/

I think check_loadable_libraries() will avoid loading the same library, so it
seems fine to skip deduplicating the plugins here, and we can save some code.


/* Did the library name change?  Probe it. */
if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 
1].name) != 0)


But if we do think deduplicating them is better, I feel we could use a
SimpleStringList to store the plugin names and skip duplicates.
get_output_plugins() could return an array of string lists, where each list
holds the plugin names for one database. I shared a rough POC patch to show
how it works; the intention is to avoid introducing a new plugin list API of
our own.
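A rough sketch of that idea (illustrative only; the attached POC may look
different):

	/* one deduplicated list of plugin names per database */
	SimpleStringList *plugins = pg_malloc0(old_cluster.dbarr.ndbs *
										   sizeof(SimpleStringList));

	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
	{
		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;

		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
		{
			const char *plugin = slot_arr->slots[slotnum].plugin;

			if (!simple_string_list_member(&plugins[dbnum], plugin))
				simple_string_list_append(&plugins[dbnum], plugin);
		}
	}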

5.

+   os_info.libraries = (LibraryInfo *) pg_malloc(
+   
  (totaltups + plugin_list_length(output_plugins)) * 
sizeof(LibraryInfo));

If we think this looks too long, maybe using pg_malloc_array can help.

Best Regards,
Hou zj



0001-use-simple-ptr-list_topup_patch
Description: 0001-use-simple-ptr-list_topup_patch


Re: information_schema and not-null constraints

2023-09-05 Thread Tom Lane
Vik Fearing  writes:
> On 9/6/23 02:53, Tom Lane wrote:
>> What solution do you propose?  Starting to enforce the spec's rather
>> arbitrary requirement that constraint names be unique per-schema is
>> a complete nonstarter.  Changing the set of columns in a spec-defined
>> view is also a nonstarter, or at least we've always taken it as such.

> I both semi-agree and semi-disagree that these are nonstarters.  One of 
> them has to give.

[ shrug... ] if you stick to a SQL-compliant schema setup, then the
information_schema views will serve for introspection.  If you don't,
they won't, and you'll need to look at Postgres-specific catalog data.
This compromise has served for twenty years or so, and I'm not in a
hurry to change it.  I think the odds of changing to the spec's
restriction without enormous pushback are nil, and I do not think
that the benefit could possibly be worth the ensuing pain to users.
(It's not even the absolute pain level that is a problem, so much
as the asymmetry: the pain would fall exclusively on users who get
no benefit, because they weren't relying on these views anyway.
If you think that's an easy sell, you're mistaken.)

It could possibly be a little more palatable to add column(s) to the
information_schema views, but I'm having a hard time seeing how that
moves the needle.  The situation would still be precisely describable
as "if you stick to a SQL-compliant schema setup, then the standard
columns of the information_schema views will serve for introspection.
If you don't, they won't, and you'll need to look at Postgres-specific
columns".  That doesn't seem like a big improvement.  Also, given your
point about normalization, how would we define the additions exactly?

regards, tom lane




Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Michael Paquier
On Tue, Sep 05, 2023 at 11:06:36AM +0200, Drouvot, Bertrand wrote:
> Also, v5 needs a rebase due to f691f5b80a.
> 
> Attaching v6 taking care of the 2 points mentioned above.

Thanks.  This time I have correctly checked the consistency of the
data produced across all these commits using pg_wait_events, and
that's OK.  So applied both.
--
Michael




Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Michael Paquier
On Wed, Sep 06, 2023 at 04:20:23AM +0200, Erik Wienhold wrote:
> FYI: https://en.wikipedia.org/wiki/Sentence_spacing

That was an interesting read.  Thanks.
--
Michael




Re: document the need to analyze partitioned tables

2023-09-05 Thread Laurenz Albe
Sorry for dropping the ball on this; I'll add it to the next commitfest.

On Wed, 2023-01-25 at 21:43 +1300, David Rowley wrote:
> > I think your first sentence is a bit clumsy and might be streamlined to
> > 
> >   Partitioned tables do not directly store tuples and consequently do not
> >   require autovacuum to perform any VACUUM operations.
> 
> That seems better than what I had.

Ok, I went with it.

> > Also, I am a little bit unhappy about
> > 
> > 1. Your paragraph states that partitioned tables need no autovacuum,
> >    but doesn't state unmistakably that they will never be treated
> >    by autovacuum.
> 
> hmm. I assume the reader realises from the text that lack of any
> tuples means VACUUM is not required.  The remaining part of what
> autovacuum does not do is explained when the text goes on to say that
> ANALYZE operations are also not performed on partitioned tables. I'm
> not sure what is left that's mistakable there.

I rewrote the paragraph a little so that it looks clearer to me.
I hope it is OK for you as well.

> > 2. You make a distinction between table partitions and "normal tables",
> >    but really there is no distinction.
> 
> We may have different mental models here. This relates to the part
> that I wasn't keen on in your patch, i.e:
> 
> +    The partitions of a partitioned table are normal tables and get processed
> +    by autovacuum
> 
> While I agree that the majority of partitions are likely to be
> relkind='r', which you might ordinarily consider a "normal table", you
> just might change your mind when you try to INSERT or UPDATE records
> that would violate the partition constraint. Some partitions might
> also be themselves partitioned tables and others might be foreign
> tables. That does not really matter much when it comes to what
> autovacuum does or does not do, but I'm not really keen to imply in
> our documents that partitions are "normal tables".

Agreed, there are differences between partitions and normal tables.
And this is not the place in the documentation where we would like to
get into detail about the differences.

Attached is the next version of my patch.

Yours,
Laurenz Albe
From 33ef30888b5f5b57c776a1bd00065e0c94daccdb Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 6 Sep 2023 05:43:30 +0200
Subject: [PATCH] Improve autovacuum doc on partitioned tables

The documentation mentioned that autovacuum doesn't process
partitioned tables, but it was unclear about the impact.
The old wording could be interpreted to mean that there are
problems with dead tuple cleanup on partitioned tables.
Clarify that the only potential problem is autoanalyze, and
that statistics for the partitions will be gathered.

Author: Laurenz Albe
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/1fd81ddc7710a154834030133c6fea41e55c8efb.camel%40cybertec.at
---
 doc/src/sgml/maintenance.sgml | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..10b1f211e8 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -861,10 +861,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu

 

-Partitioned tables are not processed by autovacuum.  Statistics
-should be collected by running a manual ANALYZE when it is
-first populated, and again whenever the distribution of data in its
-partitions changes significantly.
+Partitioned tables do not directly store tuples and consequently do not
+require autovacuum to perform any VACUUM operations.
+Autovacuum performs a VACUUM on the partitioned
+table's partitions the same as it does with other tables.  While that
+works fine, the fact that autovacuum doesn't process partitioned tables
+also means that it doesn't run ANALYZE on them, and this
+can be problematic, as there are various places in the query planner that
+attempt to make use of table statistics for partitioned tables when
+partitioned tables are queried.  For now, you can work around this problem
+by running a manual ANALYZE command on the partitioned
+table when the partitioned table is first populated, and again whenever
+the distribution of data in its partitions changes significantly.

 

-- 
2.41.0



Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Amit Kapila
On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:
>
> On Tue, Sep 5, 2023 at 5:04 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
>
> > > Can't we just have code like this?  I mean we will have to make
> > > ReplicationSlotMarkDirty take the slot as an argument or have another
> > > version which takes the slot as an argument, and that would be called by
> > > us as well as by ReplicationSlotMarkDirty().  I mean why do we need these
> > > checks (s->data.invalidated == RS_INVAL_NONE &&
> > > s->data.confirmed_flush != s->last_saved_confirmed_flush) under the
> > > mutex?  The walsender is shut down so the confirmed flush LSN cannot move
> > > concurrently, and the slot cannot be invalidated either, because that is
> > > done by the checkpointer and we are in the checkpointer?
> >
>
...
> > Additionally, if we don't take the lock, we rely on the assumption that the
> > walsender will exit before the shutdown checkpoint. Currently, that's true
> > for logical walsenders, but a physical walsender can exit later than the
> > checkpointer. So, I am slightly worried that if we change the logical
> > walsender's exit timing in the future, the assumption may not hold.
> >
> > Besides, for non-built-in logical replication, if someone creates their own
> > walsender or other processes to send the changes and the process doesn't
> > exit before the shutdown checkpoint, it may also be a problem. Although I
> > don't have existing examples of such extensions, I feel taking the lock
> > would make it more robust.
>
> I think all our logic is based on the walsender having already exited.
> If not, then even if you check under the mutex that the confirmed flush
> LSN has not changed, it can change right after you release the lock, and
> then we will not be flushing the latest update of the confirmed flush
> LSN to disk, and our logic of comparing checkpoint.redo with the
> confirmed flush LSN might not work?
>

Right, it can change and in that case, the check related to
confirm_flush LSN will fail during the upgrade. However, the point is
that if we don't take spinlock, we need to properly write comments on
why unlike in other places it is safe here to check these values
without spinlock. We can do that but I feel we have to be careful for
all future usages of these variables, so, having spinlock makes them
follow the normal coding pattern which I feel makes it more robust.
Yes, marking dirty via common function also has merits but personally,
I find it better to follow the normal coding practice of checking the
required fields under spinlock. The other possibility is to first
check if we need to mark the slot dirty under spinlock, then release
the spinlock, and then call the common MarkDirty function, but again
that will be under the assumption that these flags won't change.
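
To be concrete, that last alternative would look roughly like this (a sketch
only, and it assumes a MarkDirty variant that takes the slot as an argument,
as discussed upthread):

	bool		is_dirty;

	SpinLockAcquire(&s->mutex);
	is_dirty = (s->data.invalidated == RS_INVAL_NONE &&
				s->data.confirmed_flush != s->last_saved_confirmed_flush);
	SpinLockRelease(&s->mutex);

	if (is_dirty)
		ReplicationSlotMarkDirtySlot(s);	/* hypothetical slot-taking variant */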

-- 
With Regards,
Amit Kapila.




Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Dilip Kumar
On Wed, Sep 6, 2023 at 9:47 AM Amit Kapila  wrote:
>
> On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:

>
> Right, it can change and in that case, the check related to
> confirm_flush LSN will fail during the upgrade. However, the point is
> that if we don't take spinlock, we need to properly write comments on
> why unlike in other places it is safe here to check these values
> without spinlock.

I agree with that, but even now it is not true that we always read this
under the spinlock; for example in [1][2], we can see that we read it
without the spinlock.
[1]
StartLogicalReplication
{
/*
* Report the location after which we'll send out further commits as the
* current sentPtr.
*/
sentPtr = MyReplicationSlot->data.confirmed_flush;
}
[2]
LogicalIncreaseRestartDecodingForSlot
{
/* candidates are already valid with the current flush position, apply */
if (updated_lsn)
LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
}

> We can do that but I feel we have to be careful for
> all future usages of these variables, so, having spinlock makes them
> follow the normal coding pattern which I feel makes it more robust.
> Yes, marking dirty via common function also has merits but personally,
> I find it better to follow the normal coding practice of checking the
> required fields under spinlock. The other possibility is to first
> check if we need to mark the slot dirty under spinlock, then release
> the spinlock, and then call the common MarkDirty function, but again
> that will be under the assumption that these flags won't change.

That's true, but we are already making the assumption, because now also
we are taking the spinlock and making the decision to mark the slot
dirty.  And after that we are releasing the spinlock, and if we do not
have a guarantee that it cannot change concurrently then many things can
go wrong, no?

Anyway, having said that, I do not have any strong objection to what we
are doing now.  There was discussion around making the code use a common
function, and I was suggesting how that could be achieved, but I am not
against the current way either.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Autogenerate some wait events code and documentation

2023-09-05 Thread Drouvot, Bertrand

Hi,

On 9/6/23 5:44 AM, Michael Paquier wrote:
> On Wed, Sep 06, 2023 at 04:20:23AM +0200, Erik Wienhold wrote:
>> FYI: https://en.wikipedia.org/wiki/Sentence_spacing
>
> That was an interesting read.  Thanks.


+1, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Eager page freeze criteria clarification

2023-09-05 Thread Andres Freund
Hi,

On 2023-08-28 12:26:01 -0400, Robert Haas wrote:
> On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
>  wrote:
> > For the second goal, I've relied on past data to predict future
> > behavior, so I tried several criteria to estimate the likelihood that a
> > page will not be imminently modified. What was most effective was
> > Andres' suggestion of comparing the page LSN to the insert LSN at the
> > end of the last vacuum of that table; this approximates whether the page
> > has been recently modified, which is a decent proxy for whether it'll be
> > modified in the future. To do this, we need to save that insert LSN
> > somewhere. In the attached WIP patch, I saved it in the table stats, for
> > now -- knowing that those are not crash-safe.
>
> I wonder what the real plan here is for where to store this. It's not
> obvious that we need this to be crash-safe; it's after all only for
> use by a heuristic, and there's no actual breakage if the heuristic
> goes wrong. At the same time, it doesn't exactly feel like a
> statistic.

I'm not certain either. This is generally something that's not satisfying
right now - although IMO not necessarily for the reason you mention. Given
that we already store, e.g., the time of the last autovacuum in the stats, I
don't see a problem also storing a corresponding LSN. My issue is more that
this kind of information not being crashsafe is really problematic - it's a
well known issue that autovacuum just doesn't do anything for a while after a
crash-restart (or pitr restore or ...), for example.

Given that all the other datapoints are stored in the stats, I think just
storing the LSNs alongside is reasonable.


> Then there's the question of whether it's the right metric. My first
> reaction is to think that it sounds pretty good. One thing I really
> like about it is that if the table is being vacuumed frequently, then
> we freeze less aggressively, and if the table is being vacuumed
> infrequently, then we freeze more aggressively. That seems like a very
> desirable property. It also seems broadly good that this metric
> doesn't really care about reads. If there are a lot of reads on the
> system, or no reads at all, it doesn't really change the chances that
> a certain page is going to be written again soon, and since reads
> don't change the insert LSN, here again it seems to do the right
> thing. I'm a little less clear about whether it's good that it doesn't
> really depend on wall-clock time.

Yea, it'd be useful to have a reasonably approximate wall clock time for the
last modification of a page. We just don't have infrastructure for determining
that. We'd need an LSN->time mapping (xid->time wouldn't be particularly
useful, I think).

A very rough approximate modification time can be computed by assuming an even
rate of WAL generation, and using the LSN at the time of the last vacuum and
the time of the last vacuum, to compute the approximate age.

For a while I thought that'd not give us anything that just using LSNs gives
us, but I think it might allow coming up with a better cutoff logic: Instead
of using a cutoff like "page LSN is older than 10% of the LSNs since the last
vacuum of the table", it would allow us to approximate "page has not been
modified in the last 15 seconds" or such.  I think that might help avoid
unnecessary freezing on tables with very frequent vacuuming.
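
Roughly, as a back-of-the-envelope sketch (all variable names here are
illustrative, not from any patch):

	/* assume WAL was generated at a constant rate since the last vacuum */
	wal_bytes_per_sec = (double) (current_insert_lsn - last_vacuum_lsn) /
		Max(1, now_secs - last_vacuum_secs);

	/* translate a page LSN into an approximate age in seconds */
	page_age_secs = (double) (current_insert_lsn - page_lsn) / wal_bytes_per_sec;

	/* e.g. freeze only pages that appear unmodified for ~15 seconds */
	freeze_eagerly = (page_age_secs >= 15.0);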


> Certainly, that's desirable from the point of view of not wanting to have to
> measure wall-clock time in places where we otherwise wouldn't have to, which
> tends to end up being expensive.

IMO the bigger issue is that we don't want to store a timestamp on each page.


> >Page Freezes/Page Frozen (less is better)

As, I think, Robert mentioned downthread, I'm not sure this is a useful thing
to judge the different heuristics by. If the number of pages frozen is small,
the ratio quickly can be very large, without the freezing having a negative
effect.

I suspect interesting top-level figures to compare would be:

1) WAL volume (to judge the amount of unnecessary FPIs)

2) data reads + writes (to see the effect of repeated vacuuming of the same
   blocks)

3) number of vacuums and/or time spent vacuuming (freezing less aggressively
   might increase the number of vacuums due to anti-wrap vacuums, at the same
   time, freezing too aggressively could lead to vacuums taking too long)

4) throughput of the workload (to see potential regressions due to vacuuming
   overhead)

5) for transactional workloads: p99 latency (to see if vacuuming increases
   commit latency and such, just using average tends to hide too much)



Greetings,

Andres Freund




[Regression] Incorrect filename in test case comment

2023-09-05 Thread Suraj Kharage
Hi,

While browsing the test cases, I found an incorrect filename in a test case
comment.
The commit below added the custom hash opclass in insert.sql:

--




commit fafec4cce814b9b15991b62520dc5e5e84655a8a
Author: Alvaro Herrera
Date:   Fri Apr 13 12:27:22 2018 -0300

    Use custom hash opclass for hash partition pruning
--

and later below commit moved those to test_setup.sql

--




commit cc50080a828dd4791b43539f5a0f976e535d147c
Author: Tom Lane
Date:   Tue Feb 8 15:30:38 2022 -0500

    Rearrange core regression tests to reduce cross-script dependencies.
--

but the filename was not updated in the other test cases' comments.
The attached patch does that.


-- 
Thanks & Regards,
Suraj Kharage
edbpostgres.com


Fix_file_name_in_test_case.patch
Description: Binary data


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-05 Thread Peter Smith
Hi, here are some comments for patch v31-0002.

==
src/bin/pg_upgrade/controldata.c

1. get_control_data

+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {
+ bool have_error = false;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ cluster->controldata.chkpnt_latest =
+ strtoLSN(p, &have_error);

1a.
The initialization of 'have_error' at its declaration is redundant because it
gets overwritten before it is checked anyhow.

~

1b.
IMO that first check logic should also be shifted to be *inside*
strtoLSN, which would just set have_error to true. This eliminates
having two pg_fatal calls that serve the same purpose.
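
A hypothetical sketch of how the caller could then look, assuming strtoLSN()
itself rejects NULL or too-short input by setting *have_error (the exact split
of the validation is my assumption, not something in the patch):

	bool		have_error;

	p = strchr(p, ':');
	if (p != NULL)
		p = strpbrk(p + 1, "0123456789ABCDEF");

	cluster->controldata.chkpnt_latest = strtoLSN(p, &have_error);
	if (have_error)
		pg_fatal("%d: controldata retrieval problem", __LINE__);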

~~~

2. strtoLSN

+/*
+ * Convert String to XLogRecPtr.
+ *
+ * This function is ported from pg_lsn_in_internal(). The function cannot be
+ * called from client binaries.
+ */
+XLogRecPtr
+strtoLSN(const char *str, bool *have_error)

SUGGESTION (comment wording)
This function is ported from pg_lsn_in_internal() which cannot be
called from client binaries.

==
src/bin/pg_upgrade/function.c

3. struct plugin_list

+typedef struct plugin_list
+{
+ int dbnum;
+ char*plugin;
+ struct plugin_list *next;
+} plugin_list;

I found that name confusing. IMO it should be something like 'plugin_list_elem'.

e.g. it reads strangely in subsequent code:
+ plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));

~~~

4. is_plugin_unique

+/* Has the given plugin already been listed? */
+static bool
+is_plugin_unique(plugin_list_head *listhead, const char *plugin)
+{
+ plugin_list *point;
+
+ /* Quick return if the head is NULL */
+ if (listhead == NULL)
+ return true;
+
+ /* Seek the plugin list */
+ for (point = listhead->head; point; point = point->next)
+ {
+ if (strcmp(point->plugin, plugin) == 0)
+ return false;
+ }
+
+ return true;
+}

What's the meaning of the name 'point'? Maybe something generic like
'cur' or similar is better?

~~~

5. get_output_plugins

+/*
+ * Load the list of unique output plugins.
+ *
+ * XXX: Currently, we extract the list of unique output plugins, but this may
+ * be overkill. The list is used for two purposes - 1) to allocate the minimal
+ * memory for the library list and 2) to skip storing duplicated plugin names.
+ * However, the consumer check_loadable_libraries() can avoid double checks for
+ * the same library. The above means that we can arrange output plugins without
+ * considering their uniqueness, so that we can remove this function.
+ */
+static plugin_list_head *
+get_output_plugins(void)
+{
+ plugin_list_head *head = NULL;
+ int dbnum;
+
+ /* Quick return if there are no logical slots to be migrated. */
+ if (count_old_cluster_logical_slots() == 0)
+ return NULL;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+ /* Add to the list if the plugin has not been listed yet */
+ if (is_plugin_unique(head, slot->plugin))
+ add_plugin_list_item(&head, dbnum, slot->plugin);
+ }
+ }
+
+ return head;
+}

About the XXX: yeah, since uniqueness is checked later anyway, all this
extra code seems like overkill. Instead of the extra code you just need a
comment mentioning how it will be sorted and checked later.

But even if you prefer to keep it, I thought those 2 functions
'is_plugin_unique()' and 'add_plugin_list_item()' could have been
combined to just have 'add_plugin_list_unique_item()'. Since order
does not matter, such a function would just add items to the end of
the list (after finding uniqueness) instead of to the head.
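
Roughly like this (a rough sketch only; I have dropped the separate list-head
struct and used the element name suggested in #3, so the details are
assumptions rather than the patch's actual structures):

static void
add_plugin_list_unique_item(plugin_list_elem **head, int dbnum,
							const char *plugin)
{
	plugin_list_elem *cur;
	plugin_list_elem **tail = head;
	plugin_list_elem *newentry;

	/* walk the list, remembering where the tail's next pointer lives */
	for (cur = *head; cur != NULL; cur = cur->next)
	{
		if (strcmp(cur->plugin, plugin) == 0)
			return;				/* already listed, nothing to do */
		tail = &cur->next;
	}

	newentry = (plugin_list_elem *) pg_malloc(sizeof(plugin_list_elem));
	newentry->dbnum = dbnum;
	newentry->plugin = pg_strdup(plugin);
	newentry->next = NULL;
	*tail = newentry;			/* append at the end; order does not matter */
}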

~~~

6. get_loadable_libraries

  FirstNormalObjectId);
+
  totaltups += PQntuples(ress[dbnum]);
~

The extra blank line in the existing code is not needed in this patch.

~~~

7. get_loadable_libraries

  int rowno;
+ plugin_list *point;

~

Same as a prior comment #4. What's the meaning of the name 'point'?

~~~

8. get_loadable_libraries
+
+ /*
+ * If the old cluster has logical replication slots, plugins used by
+ * them must be also stored. It must be done only once, so do it at
+ * dbnum == 0 case.
+ */
+ if (output_plugins == NULL)
+ continue;
+
+ if (dbnum != 0)
+ continue;

This logic seems misplaced. If this "must be done only once", why is it
within the db loop in the first place? Shouldn't this be done separately,
outside the loop?

==
src/bin/pg_upgrade/info.c

9.
+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)

Should this be forward declared like the other static functions are?

~~~

10. get_old_cluster_logical_slot_infos

+ for (slotnum = 0; slotnum < num_slots; slot

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-05 Thread Peter Smith
On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Also, to simplify the code, only the first invalidated slot encountered is
> output in check_old_cluster_for_valid_slots(). Warning messages in the
> function were removed. I think it may be enough because
> check_new_cluster_is_empty() does a similar thing. Please tell me if it
> should be reverted...
>

Another possible idea is to show all the WARNINGS but only when in verbose mode.
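
For example, something along these lines in the loop that finds invalidated
slots (the slot/database field names are assumptions based on the patch, not
code I have checked):

	/* hypothetical: list every invalidated slot, but only in verbose mode */
	if (log_opts.verbose)
		pg_log(PG_WARNING,
			   "logical replication slot \"%s\" in database \"%s\" is invalidated",
			   slot->slotname, active_db->db_name);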

---
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-05 Thread Zhijie Hou (Fujitsu)
On Wednesday, September 6, 2023 11:18 AM Zhijie Hou (Fujitsu) 
 wrote:
> 
> On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人
>  wrote:
> 
> 4.
> +  * XXX: As mentioned in comments atop get_output_plugins(), we may
> not
> +  * have to consider the uniqueness of entries. If so, we can use
> +  * count_old_cluster_logical_slots() instead of plugin_list_length().
> +  */
> 
> I think check_loadable_libraries() will avoid loading the same library, so it 
> seems
> fine to skip duplicating the plugins and we can save some codes.

Sorry, there is a typo, I mean "deduplicating" instead of " duplicating "

> 
> 
>   /* Did the library name change?  Probe it. */
>   if (libnum == 0 || strcmp(lib, os_info.libraries[libnum -
> 1].name) != 0)
> 
> 
> But if we think duplicating them would be better, I feel we could use the

Here also " duplicating " should be "deduplicating".

Best Regards,
Hou zj



Re: pg_basebackup: Always return valid temporary slot names

2023-09-05 Thread Nishant Sharma
On Tue, Sep 5, 2023 at 4:40 PM Jelte Fennema  wrote:

> I modified the PgBouncer PR to always set the sign bit to 0. But I
> still think it makes sense to also address this in pg_basebackup.


Sounds good to me. Thank you!


On Tue, Sep 5, 2023 at 5:36 PM Daniel Gustafsson  wrote:

> Since the value in the temporary slotname isn't used to convey meaning, but
> merely to ensure uniqueness, I don't think it's unreasonable to guard against
> malformed input (i.e. a negative integer).
>

 Ok. In this case, I also agree.


+1 to the patch from my side. Thank you!


Regards,
Nishant.


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-09-05 Thread John Naylor
On Mon, Aug 28, 2023 at 9:44 PM Masahiko Sawada 
wrote:
>
> I've attached the v42 patch set. I improved the tidstore regression test code
> in addition to incorporating the above comments.

Seems fine at a glance, thanks. I will build on this to implement
variable-length values. I have already finished one prerequisite, which is
having the public APIs pass pointers to values.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Amit Kapila
On Wed, Sep 6, 2023 at 9:57 AM Dilip Kumar  wrote:
>
> On Wed, Sep 6, 2023 at 9:47 AM Amit Kapila  wrote:
> >
> > On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:
>
> >
> > Right, it can change and in that case, the check related to
> > confirm_flush LSN will fail during the upgrade. However, the point is
> > that if we don't take spinlock, we need to properly write comments on
> > why unlike in other places it is safe here to check these values
> > without spinlock.
>
> I agree with that, but even now it is not true that we always read this
> under the spinlock; for example, in [1][2] we can see we are reading this
> without the spinlock.
> [1]
> StartLogicalReplication
> {
> /*
> * Report the location after which we'll send out further commits as the
> * current sentPtr.
> */
> sentPtr = MyReplicationSlot->data.confirmed_flush;
> }
> [2]
> LogicalIncreaseRestartDecodingForSlot
> {
> /* candidates are already valid with the current flush position, apply */
> if (updated_lsn)
> LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
> }
>

These are accessed only in walsender and confirmed_flush is always
updated by walsender. So, this is clearly okay.

>  We can do that but I feel we have to be careful for
> > all future usages of these variables, so, having spinlock makes them
> > follow the normal coding pattern which I feel makes it more robust.
> > Yes, marking dirty via common function also has merits but personally,
> > I find it better to follow the normal coding practice of checking the
> > required fields under spinlock. The other possibility is to first
> > check if we need to mark the slot dirty under spinlock, then release
> > the spinlock, and then call the common MarkDirty function, but again
> > that will be under the assumption that these flags won't change.
>
> That's true, but we are already making that assumption, because we currently
> take the spinlock and decide whether to mark the slot dirty.  After that we
> release the spinlock, and if we have no guarantee that it cannot change
> concurrently then many things can go wrong, no?
>

Also, note that the invalidated field could be updated by the startup process,
but that is only possible on a standby, so it is safe; still, that would be
another assumption.
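
In other words, the pattern being discussed boils down to roughly the
following sketch (last_saved_confirmed_flush is the field added by the patch;
the rest uses the existing slot fields, but take it as an illustration rather
than the patch's exact code):

	/* check and dirty the slot under the same spinlock */
	SpinLockAcquire(&s->mutex);
	if (s->data.invalidated == RS_INVAL_NONE &&
		s->data.confirmed_flush != s->last_saved_confirmed_flush)
	{
		s->just_dirtied = true;
		s->dirty = true;
	}
	SpinLockRelease(&s->mutex);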

> Anyway, having said that, I do not have any strong objection to what we
> are doing now.  There was discussion around restructuring the code so that
> it can use a common function, and I was suggesting how that could be
> achieved, but I am not against the current way either.
>

Okay, thanks for looking into it.

-- 
With Regards,
Amit Kapila.




Re: persist logical slots to disk during shutdown checkpoint

2023-09-05 Thread Dilip Kumar
On Wed, Sep 6, 2023 at 12:01 PM Amit Kapila  wrote:
>
> On Wed, Sep 6, 2023 at 9:57 AM Dilip Kumar  wrote:
> >
> > On Wed, Sep 6, 2023 at 9:47 AM Amit Kapila  wrote:
> > >
> > > On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:
> >
> > >
> > > Right, it can change and in that case, the check related to
> > > confirm_flush LSN will fail during the upgrade. However, the point is
> > > that if we don't take spinlock, we need to properly write comments on
> > > why unlike in other places it is safe here to check these values
> > > without spinlock.
> >
> > I agree with that, but even now it is not true that we always read this
> > under the spinlock; for example, in [1][2] we can see we are reading this
> > without the spinlock.
> > [1]
> > StartLogicalReplication
> > {
> > /*
> > * Report the location after which we'll send out further commits as the
> > * current sentPtr.
> > */
> > sentPtr = MyReplicationSlot->data.confirmed_flush;
> > }
> > [2]
> > LogicalIncreaseRestartDecodingForSlot
> > {
> > /* candidates are already valid with the current flush position, apply */
> > if (updated_lsn)
> > LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
> > }
> >
>
> These are accessed only in walsender and confirmed_flush is always
> updated by walsender. So, this is clearly okay.

Hmm, that's a valid point.

> >  We can do that but I feel we have to be careful for
> > > all future usages of these variables, so, having spinlock makes them
> > > follow the normal coding pattern which I feel makes it more robust.
> > > Yes, marking dirty via common function also has merits but personally,
> > > I find it better to follow the normal coding practice of checking the
> > > required fields under spinlock. The other possibility is to first
> > > check if we need to mark the slot dirty under spinlock, then release
> > > the spinlock, and then call the common MarkDirty function, but again
> > > that will be under the assumption that these flags won't change.
> >
> > That's true, but we are already making that assumption, because we currently
> > take the spinlock and decide whether to mark the slot dirty.  After that we
> > release the spinlock, and if we have no guarantee that it cannot change
> > concurrently then many things can go wrong, no?
> >
>
> Also, note that the invalidated field could be updated by the startup process,
> but that is only possible on a standby, so it is safe; still, that would be
> another assumption.

Okay, so I also agree to go with the current patch, because, as you said
above, accessing this without the spinlock outside the walsender would make
a new exception, and I agree with the decision not to make that exception.

Other than that the patch LGTM.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



