Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-06-12 Thread Nathan Bossart
On Sat, May 27, 2023 at 03:17:21PM -0700, Gurjeet Singh wrote:
> Upon further testing, I saw that the server will fail to start only if it is
> unable to open the port on _all_ the interfaces/addresses. If it's
> able to open the port on at least one, the server will start.

This surprised me.  I would've expected the server to fail to start if it
failed for anything in listen_addresses.  After some digging, I found what
I believe is the original justification [0] as well as a follow-up thread
[1] that seems to call out kernel support for IPv6 as the main objection.
Perhaps it is time to reevaluate this decision.
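To make the current behavior concrete (an illustrative example, with a
hypothetical second address):

    listen_addresses = 'localhost, 192.0.2.10'

If binding 192.0.2.10 fails but the localhost socket can be opened, the
server logs a warning for the failed address and starts anyway; it
refuses to start only when every listed address fails.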

> If listen_addresses is empty, then the server won't try to open any TCP/IP
> ports. The patch does not change any language related to that.

Your proposed change notes that the server only starts if it can listen on
at least one TCP/IP address, which I worry might lead folks to think that
the server won't start if listen_addresses is empty.

[0] https://postgr.es/m/6739.1079384078%40sss.pgh.pa.us
[1] https://postgr.es/m/200506281149.51696.peter_e%40gmx.net

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: RFC: Adding \history [options] [filename] to psql (Snippets and Shared Queries)

2023-06-12 Thread Gurjeet Singh
On Mon, Jun 5, 2023 at 8:58 AM Kirk Wolak  wrote:
>
> Everyone,
>   After recently deep diving on some readline features and optimizing my bash 
> environment to have a static set of "snippets" that I can always find...
>
>   it takes just a couple of history API calls to add some interesting 
> features for those who want them.  The result of adding 3-4 such commands 
> (all under \history, and with compatible flags):
>
> - Saving your current history without exiting (currently doable as \s 
> :HISTFILE)
> - Reloading your history file (so you can easily share something across 
> sessions) w/o exiting.
> - Stack Loading of specific history (like a shared snippets library, and a 
> personal snippets library) [clearing your history, then loading them in a 
> custom order]
>
>   The upside is really about clearly identifying and sharing permanent 
> snippets, while having that list be editable externally.  Again, bringing 
> teams online who don't always know the PG way of doing things (Waits, Locks, 
> Space, High CPU queries, Running Queries).
>
>   My intention is to leverage the way PSQL keeps the Comment above the SQL 
> with the SQL.
> Then I can step backwards searching for "context" markers (Ctrl-R) or
> --  [F8] {history-search-backward}
>
>   To rip through my snippets
>
> Kirk...
> PS: I could do all of this under \s [options] [filename] it's just less 
> clear...

Understandably, there doesn't seem to be a lot of enthusiasm for this.
If you could show others a sample/demo session of what the UI and UX
would look like, maybe others can chime in with either their opinion
of the behaviour, or perhaps a better/different way of achieving that.

Best regards,
Gurjeet
http://Gurje.et




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2023-06-12 Thread Gurjeet Singh
On Mon, Jun 5, 2023 at 4:24 AM Ranier Vilela  wrote:
> Em seg., 5 de jun. de 2023 às 08:06, Ranier Vilela  
> escreveu:
>> Em dom., 4 de jun. de 2023 às 23:37, Richard Guo  
>> escreveu:
>>> On Sun, Jun 4, 2023 at 8:42 PM Ranier Vilela  wrote:
 Hi,

 Per Coverity.

 The function ExtendBufferedRelShared has an always-true test:
 eb.rel is dereferenced one line above, so the subsequent
 if (eb.rel) test is always true.

 I think it's worth removing the test, because Coverity raises dozens of 
 alerts thinking eb.rel might be NULL.
 Besides, one less test is one less branch.
>>>
>>>
>>> This also happens in ExtendBufferedRelTo, and the comment there explains
>>> that the eb.rel 'could have been closed while waiting for lock'.
>>
>> Well, RelationGetSmgr also dereferences eb.rel.
>> If eb.rel could be closed while waiting for lock,
>> anyone who references eb.rel below takes a risk?
>>
>> static inline SMgrRelation
>> RelationGetSmgr(Relation rel)
>> {
>>     if (unlikely(rel->rd_smgr == NULL))
>>         smgrsetowner(&(rel->rd_smgr), smgropen(rel->rd_locator, rel->rd_backend));
>>     return rel->rd_smgr;
>> }
>
> Sorry Richard, nevermind.
>
> My fault, I withdraw this patch.

I'm not quite convinced of the reasoning provided; either the reason
is not good enough, or my C is rusty. In either case, I'd like a
resolution.

The code in question:

>   LockRelationForExtension(eb.rel, ExclusiveLock);
>
>   /* could have been closed while waiting for lock */
>   if (eb.rel)
>   eb.smgr = RelationGetSmgr(eb.rel);

eb.rel is passed by value to LockRelationForExtension() on the first
line, so even if the relation is closed while waiting for the lock, the
local value of eb.rel cannot change between that call and the 'if' test
two lines later. So a code verification tool complaining that the 'if'
condition is always true is quite right, IMO.
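To make the pass-by-value point concrete, here is a tiny self-contained
C example (mine, for illustration; the names only loosely mimic the
bufmgr code):

    #include <stdio.h>

    typedef struct
    {
        void   *rel;
    } ExtendBufferedWhat;

    /* receives a *copy* of the struct; changes don't affect the caller */
    static void
    lock_and_maybe_close(ExtendBufferedWhat eb)
    {
        eb.rel = NULL;
    }

    int
    main(void)
    {
        int     dummy;
        ExtendBufferedWhat eb = {&dummy};

        lock_and_maybe_close(eb);
        if (eb.rel)             /* always true in the caller */
            printf("eb.rel unchanged: the callee only modified its copy\n");
        return 0;
    }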

To verify my assumptions, I removed those checks and ran `make
{check,check-world,installcheck}`, and all those tests passed.

The only way I can think of for the value of eb.rel to change between
those lines is if 'eb' were a shared-memory data structure and some
other process changed the 'rel' member in shared memory. And I don't
think 'eb' is in shared memory in this case.

To me, it looks like these checks are a result of code being
copy-pasted from somewhere else, where the check might have been
necessary. The checks are surely not necessary at these spots.

Please see attached v2 of the patch; it includes both occurrences of
the spurious checks identified in this thread.

Best regards,
Gurjeet
http://Gurje.et


v2-0001-Remove-always-true-checks.patch
Description: Binary data


Re: Improve logging when using Huge Pages

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 02:37:15PM -0700, Nathan Bossart wrote:
> Fair enough.  I know I've been waffling in the GUC versus function
> discussion, but FWIW v7 of the patch looks reasonable to me.

+   Assert(strcmp("unknown", GetConfigOption("huge_pages_status", false, false)) != 0);

Not sure that there is anything to gain with this assertion in
CreateSharedMemoryAndSemaphores(), because this is pretty much what
check_GUC_init() looks after?

@@ -627,6 +627,9 @@ CreateAnonymousSegment(Size *size)
}
 #endif
 
+   SetConfigOption("huge_pages_status", ptr == MAP_FAILED ? "off" : "on",
+   PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);

The choice of this location is critical, because this is just *after*
we've tried huge pages, but *before* doing the fallback mmap() call
when the huge page allocation was on "try".  I think that this
deserves a comment.
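Perhaps something along these lines (the comment wording is only a
suggestion):

    /*
     * Report whether huge pages are in use.  This must happen here, after
     * the huge-page mmap() attempt but before the fallback mmap() below,
     * so that huge_pages=try correctly reports "off" when we fall back to
     * regular pages.
     */
    SetConfigOption("huge_pages_status", ptr == MAP_FAILED ? "off" : "on",
                    PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);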

@@ -327,6 +327,10 @@ retry:
Sleep(1000);
continue;
}
+
+   SetConfigOption("huge_pages_status", (flProtect & SEC_LARGE_PAGES) ?
+   "on" : "off", PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);

Hmm.  I still think that it is cleaner to move that to the end of
PGSharedMemoryCreate() for the WIN32 case.  There are also a few FATALs
in between, which would make SetConfigOption() useless if there is an
in-flight problem.

Don't we need to update save_backend_variables() and add an entry
in BackendParameters to make other processes launched by EXEC_BACKEND
inherit the status value set?
--
Michael




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-06-12 Thread Masahiko Sawada
On Tue, Jun 6, 2023 at 2:13 PM John Naylor  wrote:
>
> On Mon, Jun 5, 2023 at 5:32 PM Masahiko Sawada  wrote:
> >
> > > Sometime in the not-too-distant future, I will start a new thread 
> > > focusing on bitmap heap scan, but for now, I just want to share some 
> > > progress on making the radix tree usable not only for that, but hopefully 
> > > a wider range of applications, while making the code simpler and the 
> > > binary smaller. The attached patches are incomplete (e.g. no iteration) 
> > > and quite a bit messy, so tar'd and gzip'd for the curious (should apply 
> > > on top of v32 0001-03 + 0007-09 ).
> > >
> >
> > Thank you for making progress on this. I agree with these directions
> > overall. I have some comments and questions:
>
> Glad to hear it and thanks for looking!
>
> > > * Other nodes will follow in due time, but only after I figure out how to 
> > > do it nicely (ideas welcome!) -- currently node32's two size classes work 
> > > fine for growing, but the code should be simplified before extending to 
> > > other cases.)
> >
> > Within the size class, we just alloc a new node of lower size class
> > and do memcpy(). I guess it will be almost same as what we do for
> > growing.
>
> Oh, the memcpy part is great, very simple. I mean the (compile-time) "class 
> info" table lookups are a bit awkward. I'm thinking the hard-coded numbers 
> like this:
>
> .fanout = 3,
> .inner_size = sizeof(RT_NODE_INNER_3) + 3 * sizeof(RT_PTR_ALLOC),
>
> ...may be better with a #defined symbol that can also be used elsewhere.

FWIW, exposing these definitions would be good in terms of testing too
since we can use them in regression tests.
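For example (symbol names hypothetical), something like:

    #define RT_FANOUT_3    3

    ...
    .fanout = RT_FANOUT_3,
    .inner_size = sizeof(RT_NODE_INNER_3) + RT_FANOUT_3 * sizeof(RT_PTR_ALLOC),

would let the regression tests and the grow/shrink code share one
constant.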

>
> > I don't think
> > shrinking class-3 to class-1 makes sense.
>
> Agreed. The smallest kind should just be freed when empty.
>
> > > Limited support for "combined pointer-value slots". At compile-time, 
> > > choose either that or "single-value leaves" based on the size of the 
> > > value type template parameter. Values that are pointer-sized or less can 
> > > fit in the last-level child slots of nominal "inner nodes" without 
> > > duplicated leaf-node code. Node256 now must act like the previous 
> > > 'node256 leaf', since zero is a valid value. Aside from that, this was a 
> > > small change.
> >
> > Yes, but it also means that we use pointer-sized value anyway even if
> > the value size is less than that, which wastes the memory, no?
>
> At a low-level, that makes sense, but I've found an interesting global effect 
> showing the opposite: _less_ memory, which may compensate:
>
> psql -c "select * from bench_search_random_nodes(1*1000*1000)"
> num_keys = 992660
>
> (using a low enough number that the experimental change n125->n63 doesn't 
> affect anything)
> height = 4, n3 = 375258, n15 = 137490, n32 = 0, n63 = 0, n256 = 1025
>
> v31:
>  mem_allocated | load_ms | search_ms
> ---------------+---------+-----------
>       47800768 |     253 |       134
>
> (unreleased code "similar" to v33, but among other things restores the 
> separate "extend down" function)
>  mem_allocated | load_ms | search_ms
> ---------------+---------+-----------
>       42926048 |     221 |       127
>
> I'd need to make sure, but apparently just going from 6 non-empty memory 
> contexts to 3 (remember all values are embedded here) reduces memory 
> fragmentation significantly in this test. (That should also serve as a 
> demonstration that additional size classes have both runtime costs as well as 
> benefits. We need to have a balance.)

Interesting. The results would probably vary if we change the slab
block sizes. I'd like to experiment once the code is available.

>
> So, I'm inclined to think the only reason to prefer "multi-value leaves" is 
> if 1) the value type is _bigger_ than a pointer 2) there is no convenient 
> abbreviation (like tid bitmaps have) and 3) the use case really needs to 
> avoid another memory access. Under those circumstances, though, the new code 
> plus lazy expansion etc might suit and be easier to maintain.

Indeed.

>
> > > What I've shared here could work (in principal, since it uses uint64 
> > > values) for tidstore, possibly faster (untested) because of better code 
> > > density, but as mentioned I want to shoot for higher. For tidbitmap.c, I 
> > > want to extend this idea and branch at run-time on a per-value basis, so 
> > > that a page-table entry that fits in a pointer can go there, and if not, 
> > > it'll be a full leaf. (This technique enables more flexibility in 
> > > lossifying pages as well.) Run-time info will require e.g. an additional 
> > > bit per slot. Since the node header is now 3 bytes, we can spare one more 
> > > byte in the node3 case. In addition, we can and should also bump it back 
> > > up to node4, still keeping the metadata within 8 bytes (no struct 
> > > padding).
> >
> > Sounds good.
>
> The additional bit per slot would require per-node logic and additional 
> branches, which is not great. I'm now thinking a much 

Re: Support logical replication of DDLs

2023-06-12 Thread Amit Kapila
On Thu, Jun 8, 2023 at 5:32 PM shveta malik  wrote:
>
> On Thu, Jun 8, 2023 at 10:36 AM Amit Kapila  wrote:
>
> Please find new set of patches addressing below:
> a) Issue mentioned by Wang-san in [1],
> b) Comments from Peter given in [2]
> c) Comments from Amit given in the last 2 emails.
>

As mentioned previously [1], I think there is value in allowing options
not given in the original statement (say, tablespace) to be appended,
but it would be better to provide that additional information under
some subscription option like expanded_mode = on/off. With
expanded_mode = off, we should only WAL-log the information required
to construct the original DDL statement.

I think we can now remove the ObjTree part of the code, as we have seen
that we can form the required jsonb without it as well.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B3ac2qXZZYfdiobuOF17e60v-fiFMG7HfJx93WbEkFhQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: add non-option reordering to in-tree getopt_long

2023-06-12 Thread Nathan Bossart
On Tue, Jun 13, 2023 at 12:00:01PM +0900, Kyotaro Horiguchi wrote:
> POSIXLY_CORRECT appears to be intended for debugging or feature
> validation. If we know we can always rearrange argv on those
> platforms, we don't need it.  I would suggest that we turn on the new
> feature at the compile time on those platforms where we know this
> rearrangement works, instead of switching at runtime.

I'd be okay with leaving it out wherever possible.  I'm curious whether any
supported systems do not allow this.

> As far as I can see, getopt_long on Rocky9 does *not* rearrange argv
> until it reaches the end of the array. But it won't matter much.

Do you mean that it rearranges argv once all the options have been
returned, or that it doesn't rearrange argv at all?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Atomic ops for unlogged LSN

2023-06-12 Thread Nathan Bossart
On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote:
> * Nathan Bossart (nathandboss...@gmail.com) wrote:
>> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
>> obvious to me how it is debugging-only.  If it is, then I think this patch
>> ought to clearly document it so that nobody else tries to use it for
>> non-debugging-only stuff.
> 
> I don't see it as a debugging value.  I'm not sure where that came
> from...?  We do use it in places and if anything, I expect it to be used
> more, not less, in the future as a persistent, generally increasing
> value (it could go backwards on a crash-restart, but in such cases all
> unlogged tables are truncated).

This is my understanding as well.

>> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
>> might see an old value of unloggedLSN.  From the following note in
>> README.barrier, it sounds like this would be a problem even if we ensured
>> full barrier semantics:

Never mind.  I think I'm forgetting that the atomics support in Postgres
deals with cache coherency.

> The concern around unlogged LSN, imv anyway, is less about shared memory
> access and more about making sure that all callers understand that it can
> move backwards on a crash/restart.  I don't think that's an issue for
> current users, but we just need to make sure to comment sufficiently so
> that new users understand that about this particular value.

Right.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_io for the startup process

2023-06-12 Thread Kyotaro Horiguchi
At Mon, 12 Jun 2023 21:13:35 -0700, Andres Freund  wrote in 
> Thanks for looking - I already had pushed the commit by the time I read your
> email, otherwise I'd have mentioned you reviewing it...

Oops! But I anticipated that, and that's no problem (I should have
checked the commits). Thanks for the reply!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: index prefetching

2023-06-12 Thread Dilip Kumar
On Thu, Jun 8, 2023 at 9:10 PM Tomas Vondra
 wrote:

> We already do prefetching for bitmap index scans, where the bitmap heap
> scan prefetches future pages based on effective_io_concurrency. I'm not
> sure why exactly was prefetching implemented only for bitmap scans, but
> I suspect the reasoning was that it only helps when there's many
> matching tuples, and that's what bitmap index scans are for. So it was
> not worth the implementation effort.

One of the reasons IMHO is that in a bitmap scan, the TIDs are already
sorted in heap block order before the heap fetch starts.  So it is
quite obvious that once we prefetch a heap block, most of the
subsequent TIDs will fall on that block, i.e. each prefetch will
satisfy many immediate requests.  OTOH, in an index scan the I/O
requests are very random, so we might have to prefetch many blocks
just to satisfy the requests for TIDs falling on one index page.  I
agree that prefetching with an index scan will definitely help in
reducing the random I/O, but my guess is that prefetching with a
bitmap scan appeared more natural, and that would have been one of
the reasons for implementing this only for bitmap scans.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2023-06-12 Thread Zhijie Hou (Fujitsu)
On Tuesday, June 13, 2023 12:04 PM Amit Kapila  wrote:
> 
> On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra
>  wrote:
> >
> >
> > Well, I think the issue is pretty clear - we end up with an initial
> > snapshot that's in between the ASSIGNMENT and NEW_CID, and because
> > NEW_CID has both xact and subxact XID it fails because we add two TXNs
> > with the same LSN, not realizing one of them is subxact.
> >
> > That's obviously wrong, although somewhat benign in production because
> > it only fails because of hitting an assert.
> >
> 
> Doesn't this indicate that we can end up decoding a partial transaction when
> we restore a snapshot? Won't that be a problem even for production?

Yes, I think it can cause the problem that only partial changes of a 
transaction are streamed.
I tried to reproduce this and here are the steps. Note, to make sure the test
won't be affected by other running_xact WALs, I changed LOG_SNAPSHOT_INTERVAL_MS
to a bigger number.

session 1:
-
create table test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot_1', 
'test_decoding');
-

session 2:
-
- Start a transaction
BEGIN;
INSERT INTO test VALUES(1);
-

session 3:
-
- Create another slot isolation_slot_2; it should choose a restart_point which
- is after the changes that happened in session 2. Note, to let the current slot
- restore another snapshot, we need to use gdb to block the current backend at
- SnapBuildFindSnapshot(), the backend should have logged the running_xacts WAL
- before reaching SnapBuildFindSnapshot.

SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot_2', 
'test_decoding');
-

session 1:
-
- Since there is a running_xacts record which session 3 logged, the current
- backend will serialize the snapshot when decoding the running_xacts WAL, and
- the snapshot can be used by other slots (e.g. isolation_slot_2)

SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL, 
'skip-empty-xacts', '1', 'include-xids', '0');
-

session 2:
-
- Insert some different data and commit the transaction.

INSERT INTO test VALUES(2);
INSERT INTO test VALUES(3);
INSERT INTO test VALUES(4);
COMMIT
-

session 3:
-
- Release the process and try to stream the changes. Since the restart point is
- in the middle of the transaction, it will stream partial changes of the
- transaction which was committed in session 2:

SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2', NULL, NULL, 
'skip-empty-xacts', '1', 'include-xids', '0');
-

Results (partial streamed changes):
postgres=# SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2', 
NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
                  data
-----------------------------------------
 BEGIN
 table public.test: INSERT: a[integer]:2
 table public.test: INSERT: a[integer]:3
 table public.test: INSERT: a[integer]:4
 COMMIT
(5 rows)

> 
> > Regular builds are likely to
> > just ignore it, although I haven't checked if the COMMIT cleanup (I
> > wonder if we remove the subxact from the toplevel list on commit).
> >
> > I think the problem is we just grab an existing snapshot, before all
> > running xacts complete. Maybe we should fix that, and leave the
> > needs_full_snapshot alone.
> >
> 
> It is not clear what exactly you have in mind to fix this because if there is 
> no
> running xact, we don't even need to restore the snapshot because of a prior
> check "if (running->oldestRunningXid ==
> running->nextXid)". I think the main problem is that we started
> decoding immediately from the point where we restored a snapshot as at that
> point we could have some partial running xacts.


Best Regards,
Hou zj


Re: pg_stat_io for the startup process

2023-06-12 Thread Andres Freund
Hi,

On 2023-06-13 12:54:19 +0900, Kyotaro Horiguchi wrote:
> I think that the reason is that we only pass true only when we're in a
> sort of idle time. Addition to that XLOG_RUNNING_XACTS comes so
> infrequently to cause noticeable degradation. If it causes
> sufficiently frequent updates, we will be satisfied with it.
> 
> The patch is quite simple and looks good to me. (At least way better
> than always using GetCurrentTimestamp():)

Thanks for looking - I already had pushed the commit by the time I read your
email, otherwise I'd have mentioned you reviewing it...

Greetings,

Andres Freund




Re: Views no longer in rangeTabls?

2023-06-12 Thread Amit Langote
On Sat, Jun 10, 2023 at 10:27 PM Tom Lane  wrote:
> Julien Rouhaud  writes:
> > On Sat, Jun 10, 2023 at 08:56:47AM -0400, Tom Lane wrote:
> >> -   rte->relkind = 0;
>
> > and also handle that field in (read|out)funcs.c
>
> Oh, right.  Ugh, that means a catversion bump.  It's not like
> we've never done that during beta, but it's kind of an annoying
> cost for a detail as tiny as this.

OK, so how about the attached?

I considered adding Assert(relkind == RELKIND_VIEW) in all places that
have the "rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)"
condition, but that seemed like overkill, so I only added one in the
#ifdef USE_ASSERT_CHECKING block in ExecCheckPermissions() that
f75cec4fff877 added.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
From f7390a898b7e156d75372d28ea5698d2ced9795b Mon Sep 17 00:00:00 2001
From: Amit Langote 
Date: Tue, 13 Jun 2023 12:52:47 +0900
Subject: [PATCH v1] Retain relkind too in RTE_SUBQUERY entries for views.

47bb9db75 modified the ApplyRetrieveRule()'s conversion of a view's
original RTE_RELATION entry into an RTE_SUBQUERY one to retain relid,
rellockmode, and perminfoindex so that the executor can lock the view
and check its permissions.  It seems better to also retain
relkind for cross-checking that the exception of an
RTE_SUBQUERY entry being allowed to carry relation details only
applies to views, so do so.

Bump catversion because this changes the output format of
RTE_SUBQUERY RTEs.

Suggested-by: David Steele 
Discussion: https://postgr.es/m/3953179e-9540-e5d1-a743-4bef368785b0%40pgmasters.net
---
 src/backend/executor/execMain.c  |  3 +++
 src/backend/nodes/outfuncs.c |  1 +
 src/backend/nodes/readfuncs.c|  1 +
 src/backend/rewrite/rewriteHandler.c |  7 +++
 src/include/catalog/catversion.h |  2 +-
 src/include/nodes/parsenodes.h   | 14 +++---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c76fdf59ec..7aed5e7b36 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -595,6 +595,9 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
 		if (rte->perminfoindex != 0)
 		{
 			/* Sanity checks */
+			Assert(rte->rtekind == RTE_RELATION ||
+   (rte->rtekind == RTE_SUBQUERY &&
+	rte->relkind == RELKIND_VIEW));
 			(void) getRTEPermissionInfo(rteperminfos, rte);
 			/* Many-to-one mapping not allowed */
 			Assert(!bms_is_member(rte->perminfoindex, indexset));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 			WRITE_BOOL_FIELD(security_barrier);
 			/* we re-use these RELATION fields, too: */
 			WRITE_OID_FIELD(relid);
+			WRITE_CHAR_FIELD(relkind);
 			WRITE_INT_FIELD(rellockmode);
 			WRITE_UINT_FIELD(perminfoindex);
 			break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 597e5b3ea8..a136ae1d60 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -503,6 +503,7 @@ _readRangeTblEntry(void)
 			READ_BOOL_FIELD(security_barrier);
 			/* we re-use these RELATION fields, too: */
 			READ_OID_FIELD(relid);
+			READ_CHAR_FIELD(relkind);
 			READ_INT_FIELD(rellockmode);
 			READ_UINT_FIELD(perminfoindex);
 			break;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 0e4f76efa8..0967be762c 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1849,11 +1849,10 @@ ApplyRetrieveRule(Query *parsetree,
 
 	/*
 	 * Clear fields that should not be set in a subquery RTE.  Note that we
-	 * leave the relid, rellockmode, and perminfoindex fields set, so that the
-	 * view relation can be appropriately locked before execution and its
-	 * permissions checked.
+	 * leave the relid, rellockmode, relkind, and perminfoindex fields set, so
+	 * that the view relation can be appropriately locked before execution and
+	 * its permissions checked.
 	 */
-	rte->relkind = 0;
 	rte->tablesample = NULL;
 	rte->inh = false;			/* must not be set for a subquery */
 
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index c784937a0e..fe70d8396d 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202305211
+#define CATALOG_VERSION_NO	202306141
 
 #endif
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0ca298f5a1..786692781d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1056,13 +1056,13 @@ typedef struct RangeTblEntry
 	 * this RTE in the containing struct's list of same; 0 if permissions need
 	 * not be checked for this RTE.
 	 *
-	 * As a 

Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2023-06-12 Thread Amit Kapila
On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra
 wrote:
>
>
> Well, I think the issue is pretty clear - we end up with an initial
> snapshot that's in between the ASSIGNMENT and NEW_CID, and because
> NEW_CID has both xact and subxact XID it fails because we add two TXNs
> with the same LSN, not realizing one of them is subxact.
>
> That's obviously wrong, although somewhat benign in production because
> it only fails because of hitting an assert.
>

Doesn't this indicate that we can end up decoding a partial
transaction when we restore a snapshot? Won't that be a problem even
for production?

> Regular builds are likely to
> just ignore it, although I haven't checked if the COMMIT cleanup (I
> wonder if we remove the subxact from the toplevel list on commit).
>
> I think the problem is we just grab an existing snapshot, before all
> running xacts complete. Maybe we should fix that, and leave the
> needs_full_snapshot alone.
>

It is not clear what exactly you have in mind to fix this because if
there is no running xact, we don't even need to restore the snapshot
because of a prior check "if (running->oldestRunningXid ==
running->nextXid)". I think the main problem is that we started
decoding immediately from the point where we restored a snapshot as at
that point we could have some partial running xacts.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_io for the startup process

2023-06-12 Thread Kyotaro Horiguchi
At Fri, 9 Jun 2023 21:28:23 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2023-05-22 13:36:42 +0900, Kyotaro Horiguchi wrote:
> > At Sun, 21 May 2023 15:14:23 -0700, Andres Freund  
> > wrote in 
> > > Hi,
> > > 
> > > I don't really feel confident we're going to get the timeout approach 
> > > solid
> > > enough for something going into the tree around beta 1.
> > > 
> > > How about this, imo a lot simpler, approach: We flush stats whenever 
> > > replaying
> > > a XLOG_RUNNING_XACTS record. Unless the primary is idle, it will log 
> > > those at
> > > a regular interval. If the primary is idle, we don't need to flush stats 
> > > in
> > > the startup process, because we'll not have done any io.
> > > 
> > > We only log XLOG_RUNNING_XACTS when wal_level >= replica, so stats 
> > > wouldn't
> > > get regularly flushed if wal_level = minimal - but in that case the stats 
> > > are
> > > also not accessible, so that's not a problem.
> > 
> > I concur with the three aspects, interval regularity, reliability and
> > about the case of the minimal wal_level. While I can't confidently
> > claim it is the best, its simplicilty gives us a solid reason to
> > proceed with it.  If necessary, we can switch to alternative timing
> > sources in the future without causing major disruptions for users, I
> > believe.
> > 
> > > It's not the prettiest solution, but I think the simplicity is worth a 
> > > lot.
> > > 
> > > Greetings,
> > 
> > +1.
> 
> Attached a patch implementing this approach.
> 
> It's imo always a separate bug that we were using
> GetCurrentTransactionStopTimestamp() when force was passed in - that timestamp
> can be quite out of date in some cases. But I don't immediately see a
> noticeable consequence, so ...

Thanks!

I think the reason is that we only pass true when we're in a sort of
idle time. In addition, XLOG_RUNNING_XACTS comes too infrequently to
cause noticeable degradation. If it causes sufficiently frequent
updates, we will be satisfied with it.

The patch is quite simple and looks good to me. (At least way better
than always using GetCurrentTimestamp():)
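For readers following along, the shape of the change is roughly this (a
sketch of the idea only; the exact placement and comment in the commit
may differ):

    /* while replaying an XLOG_RUNNING_XACTS record in the startup process */
    ...
    /*
     * The primary logs XLOG_RUNNING_XACTS at a regular interval unless it
     * is idle, so this is a convenient, cheap place for the startup
     * process to flush its accumulated pgstats.
     */
    pgstat_report_stat(true);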

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

2023-06-12 Thread Tom Lane
Richard Guo  writes:
> Oh, wait ... It occurred to me that we may have this same issue with
> Memoize cache keys.  In get_memoize_path we collect the cache keys from
> innerpath's ppi_clauses and innerrel's lateral_vars, and the latter may
> contain nullingrel markers that need adjustment.  As an example,
> consider the query below

> explain (costs off)
> select * from onek t1
> left join onek t2 on true
> left join lateral
>   (select * from onek t3 where t3.two = t2.two offset 0) s
>   on t2.unique1 = 1;
> ERROR:  wrong varnullingrels (b 3) (expected (b)) for Var 2/3

Good catch --- I'll take a closer look tomorrow.

regards, tom lane




Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

2023-06-12 Thread Richard Guo
On Mon, Jun 12, 2023 at 10:02 PM Tom Lane  wrote:

> Richard Guo  writes:
> > Yeah, that makes sense.  process_subquery_nestloop_params is a better
> > place to do this adjustments.  +1 to v2 patch.
>
> Pushed, then.


Oh, wait ... It occurred to me that we may have this same issue with
Memoize cache keys.  In get_memoize_path we collect the cache keys from
innerpath's ppi_clauses and innerrel's lateral_vars, and the latter may
contain nullingrel markers that need adjustment.  As an example,
consider the query below

explain (costs off)
select * from onek t1
left join onek t2 on true
left join lateral
  (select * from onek t3 where t3.two = t2.two offset 0) s
  on t2.unique1 = 1;
ERROR:  wrong varnullingrels (b 3) (expected (b)) for Var 2/3

Attached is a patch that does the same adjustments to innerrel's
lateral_vars before they are added to MemoizePath->param_exprs.

I was wondering if there are more places that need this kind of
adjustment.  After some thought, I believe the Memoize cache keys
should be the last one regarding adjustments to nestloop parameters.
AFAICS the lateral references in the original query go to two places:
one is plan_params and the other is lateral_vars.  And now we've
handled both of them.

Thanks
Richard


v3-0001-Fix-nulling-bitmap-for-Memoize-cache-keys.patch
Description: Binary data


Re: add non-option reordering to in-tree getopt_long

2023-06-12 Thread Kyotaro Horiguchi
At Fri, 9 Jun 2023 16:22:57 -0700, Nathan Bossart  
wrote in 
> While working on 2dcd157, I noticed cfbot failures for Windows due to tests
> with commands that had non-options specified before options.  For example,
> "createuser myrole -a myadmin" specified a non-option (myrole) before the
> option/argument pair (-a admin).  To get the tests passing for Windows,
> non-options must be placed at the end (e.g., "createuser -a myadmin
> myrole").  This same problem was encountered while working on 08951a7 [0],
> but it was again fortunately caught with cfbot.  Others have not been so
> lucky [1] [2] [3].

While I don't see it as a reason to change the behavior, I do believe
the change could be beneficial from a user's perspective.

> The reason for this discrepancy is because Windows uses the in-tree
> implementation of getopt_long(), which differs from the other
> implementations I've found in that it doesn't reorder non-options to the
> end of argv by default.  Instead, it returns -1 as soon as the first
> non-option is found, even if there are other options listed afterwards.  By
> moving non-options to the end of argv, getopt_long() can parse all
> specified options and return -1 when only non-options remain.  The
> implementations I reviewed all reorder argv unless the POSIXLY_CORRECT
> environment variable is set or the "optstring" argument begins with '+'.
>
> The best reasons I can think of to keep the current behavior are 1)
> reordering involves writing to the original argv array, which could be
> risky (as noted by Tom [4]) and 2) any systems with a getopt_long()
> implementation that doesn't reorder non-options could begin failing tests
> that take advantage of this behavior.  However, this quirk in the in-tree
> getopt_long() is periodically missed by hackers, the only systems I'm aware
> of that use it are Windows and AIX, and every other implementation of
> getopt_long() I saw reorders non-options by default.  Furthermore, C99
> omits const decorations in main()'s signature, so modifying argv might not
> be too scary.

POSIXLY_CORRECT appears to be intended for debugging or feature
validation. If we know we can always rearrange argv on those
platforms, we don't need it.  I would suggest that we turn on the new
feature at the compile time on those platforms where we know this
rearrangement works, instead of switching at runtime.

> Thus, I propose introducing this non-option reordering behavior but
> allowing it to be disabled via the POSIXLY_CORRECT environment variable.
> The attached patch is my first attempt at implementing this proposal.  I
> don't think we need to bother with handling '+' at the beginning of
> "optstring" since it seems unlikely to be used in PostgreSQL, but it would
> probably be easy enough to add if folks want it.
> 
> I briefly looked at getopt() and concluded that we should probably retain
> its POSIX-compliant behavior for non-options, as reordering support seems
> much less universal than with getopt_long().  AFAICT all client utilities
> use getopt_long(), anyway.
> 
> Thoughts?

As far as I can see, getopt_long on Rocky9 does *not* rearrange argv
until it reaches the end of the array. But it won't matter much.
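A small self-contained test program (mine, for checking a given
platform's behavior) makes this easy to observe:

    #include <getopt.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        static struct option longopts[] = {
            {"admin", required_argument, NULL, 'a'},
            {NULL, 0, NULL, 0}
        };
        int     c;

        /* consume all options */
        while ((c = getopt_long(argc, argv, "a:", longopts, NULL)) != -1)
            ;

        /*
         * With GNU getopt_long, "prog myrole -a admin" prints myrole last,
         * because the non-option was permuted to the end of argv.
         */
        for (int i = 0; i < argc; i++)
            printf("argv[%d] = %s\n", i, argv[i]);
        return 0;
    }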

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-06-12 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> At PGCon and other places we have discussed time-delayed logical
> replication, but we now understand that there is no easy way. The
> following is our analysis.

At this point, I do not plan to develop the PoC any further, unless a
better idea or infrastructure comes along.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Momjian,

Thank you for the advice.

> From: Bruce Momjian 
> Sent: Monday, June 12, 2023 10:38 PM
> > I understand what the problem is. I will put a mechanism maintaining 
> > compatibility into the patch.
> > I believe there are three approaches.
> > Approach 1-1 is preferable because it does not require additional options 
> > for postgres_fdw.
> > I will revise the patch according to Approach 1-1, unless otherwise 
> > commented.
> >
> > Approach 1:
> > I ensure that postgres_fdw retrieves the version of each remote server
> > and does not push down partial aggregates if the server version is less
> > than 17.
> > There are two approaches to obtaining remote server versions.
> > Approach 1-1: postgres_fdw connects to a remote server and uses
> > PQserverVersion().
> > Approach 1-2: Adding a postgres_fdw option about a remote server version
> > (like "server_version").
> >
> > Approach 2:
> > Adding a postgres_fdw option for whether partial aggregate pushdown is
> > enabled (like enable_partial_aggregate_pushdown).
> 
> These are good questions.  Adding a postgres_fdw option called
> enable_partial_aggregate_pushdown helps make the purpose of the option
> clear, but remote_version can be used for future breakage as well.
> 
> I think remote_version is the best idea, and in the documentation for the
> option, let's explicitly say it is useful to disable partial aggregates
> pushdown on pre-PG 17 servers.  If we need to use the option for other
> cases, we can just update the documentation.  When the option is blank,
> the default, everything is pushed down.
> 
> I see remote_version as a logical addition to match our "extensions"
> option that controls what extension functions can be pushed down.

Thank you for your perspective.
So, of the approaches I have presented, you think that Approach 1-2 is
preferable, with remote_version as the option name?
Indeed, an option holding the remote version may have other uses.
However, since this information can be obtained by connecting to the
remote server, I'm concerned that some people may find this option
redundant.

Is the problem with Approach 1-1 that the user cannot decide whether to
include the compatibility check in the decision to do partial aggregate
pushdown?
# If Approach 1-1 is taken, the problem is that this feature cannot be
# used for any built-in aggregate functions when the remote server is
# older than PG17.
If so, Approach 1-3 below seems more desirable.
Would it be possible for us to hear your thoughts?

Approach 1-3: We add a postgres_fdw option for a compatibility check for
partial aggregate pushdown (like
"enable_aggpartialfunc_compatibility_check"). This option is false by
default.
When this option is true, postgres_fdw obtains the remote server version
by connecting to the remote server and using PQserverVersion().
If the remote server version is older than PG17, the partial aggregate
pushdown feature is then disabled for all built-in aggregate functions;
otherwise it is enabled for all of them.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




Re: Shouldn't construct_array_builtin and deconstruct_array_builtin agree on types?

2023-06-12 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jun 12, 2023 at 11:06:18PM +0200, Peter Eisentraut wrote:
>> They only support the types that they were actually being used with.  If you
>> need another type, feel free to add it.

> FWIW, I agree with Tomas that this is an oversight that should be
> fixed in v16, saving from the need to have a copy of
> deconstruct_array_builtin() in extensions.

We don't want to bloat these functions indefinitely, so I understand
Peter's approach of only adding the cases actually being used.
At the same time, it's reasonable to take some thought for extensions
that might want slightly more functionality than the core code
happens to need at any given moment.

The idea of making both functions support the same set of types
does seem like a reasonable compromise.
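For instance, making deconstruct_array_builtin() accept INT4OID should
just be one more switch arm alongside the existing cases (a sketch based
on the pattern the function already uses):

    case INT4OID:
        elmlen = sizeof(int32);
        elmbyval = true;
        elmalign = TYPALIGN_INT;
        break;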

regards, tom lane




Re: query_id, pg_stat_activity, extended query protocol

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:
> I have noticed that if a query comes from the PostgreSQL JDBC Driver, then
> query_id is not present in pg_stat_activity.  Erik Wienhold figured out that
> the reason can be the extended query protocol (
> https://www.postgresql.org/message-id/1391613709.939460.1684777418...@office.mailbox.org
> )
> My question is: is it expected or is it a bug that there is no query_id in
> pg_stat_activity for a running query under the extended query protocol?

Well, you could say a bit of both, I guess.  The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that happens when using the execution phase of the
extended query protocol, though it should be possible to access the
Query nodes in the cached plans and their assigned query IDs.

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list.  I have not looked in
detail at how much could be achieved, TBH.
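A very rough sketch of that idea (untested, and glossing over
multi-statement sources):

    ListCell   *lc;

    foreach(lc, psrc->query_list)
    {
        Query      *query = lfirst_node(Query, lc);

        if (query->queryId != UINT64CONST(0))
        {
            pgstat_report_query_id(query->queryId, false);
            break;
        }
    }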
--
Michael




Re: Setting restrictedtoken in pg_regress

2023-06-12 Thread Nathan Bossart
On Tue, Jun 13, 2023 at 08:29:19AM +0900, Michael Paquier wrote:
> I am actually a bit confused with the return value of
> CreateRestrictedProcess() on failures in restricted_token.c.  Wouldn't
> it be cleaner to return INVALID_HANDLE_VALUE rather than 0 in these
> cases?

My suspicion is that this was chosen to align with CreateProcess and to
allow things like

if (!CreateRestrictedProcess(...))

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-12 Thread Nathan Bossart
On Sat, Jun 03, 2023 at 06:35:00PM -0400, Michael Paquier wrote:
> It does not have to be complicated, but I definitely agree that we'd
> better spend some efforts in improving it as a whole especially
> knowing that this is mentioned on the docs as an example that one
> could rely on.

+1.  I know I've used worker_spi as a reference for writing background
workers before.

IMHO it'd be better if the patch documented the places where the ordering
really does matter instead of hoping extension authors will understand the
reasoning behind the proposed reordering.  I agree that the current code
could lead folks to think that PushActiveSnapshot must go after
SPI_connect, but wouldn't the reverse ordering just give folks the opposite
impression?
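For reference, the transaction cycle in worker_spi currently looks like
this; AFAIK only the constraint noted in the comment is a hard ordering
requirement, which is exactly the sort of thing the patch could
document:

    StartTransactionCommand();
    SPI_connect();
    PushActiveSnapshot(GetTransactionSnapshot());   /* needed before running queries */

    /* ... run queries via SPI_execute() ... */

    SPI_finish();
    PopActiveSnapshot();
    CommitTransactionCommand();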

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Setting restrictedtoken in pg_regress

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 04:12:22PM -0700, Nathan Bossart wrote:
> On Tue, Aug 30, 2022 at 03:02:54PM +0200, Daniel Gustafsson wrote:
>> In pg_regress we set restrictedToken when calling CreateRestrictedProcess, 
>> but
>> we never seem to use that anywhere.  Not being well versed in Windows I might
>> be missing something, but is it needed or is it a copy/pasteo from 
>> fa1e5afa8a2
>> which does that in restricted_token.c?  If not needed, removing it makes the
>> code more readable IMO.
> 
> Looks reasonable to me.

Indeed, looks like a copy-pasto to me.

I am actually a bit confused with the return value of
CreateRestrictedProcess() on failures in restricted_token.c.  Wouldn't
it be cleaner to return INVALID_HANDLE_VALUE rather than 0 in these
cases?
--
Michael




Re: Atomic ops for unlogged LSN

2023-06-12 Thread Stephen Frost
Greetings,

* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote:
> > On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> >> So we indeed lose some "barrier strength" - but I think that's fine. For 
> >> one,
> >> it's a debugging-only value.
> 
> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
> obvious to me how it is debugging-only.  If it is, then I think this patch
> ought to clearly document it so that nobody else tries to use it for
> non-debugging-only stuff.

I don't see it as a debugging value.  I'm not sure where that came
from...?  We do use it in places and if anything, I expect it to be used
more, not less, in the future as a persistent, generally increasing
value (it could go backwards on a crash-restart, but in such cases all
unlogged tables are truncated).

> >> But more importantly, I don't see what reordering
> >> the barrier could prevent - a barrier is useful for things like sequencing 
> >> two
> >> memory accesses to happen in the intended order - but unloggedLSN doesn't
> >> interact with another variable that's accessed within the ControlFileLock
> >> afaict.
> > 
> > This stuff is usually tricky enough that I am never completely sure
> > whether it is fine without reading again README.barrier, which is
> > where unloggedLSN is saved in the control file under its LWLock.
> > Something that I find confusing in the patch is that it does not
> > document the reason why this is OK.
> 
> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
> might see an old value of unloggedLSN.  From the following note in
> README.barrier, it sounds like this would be a problem even if we ensured
> full barrier semantics:
> 
> 3. No ordering guarantees.  While memory barriers ensure that any given
> process performs loads and stores to shared memory in order, they don't
> guarantee synchronization.  In the queue example above, we can use memory
> barriers to be sure that readers won't see garbage, but there's nothing to
> say whether a given reader will run before or after a given writer.  If 
> this
> matters in a given situation, some other mechanism must be used instead of
> or in addition to memory barriers.
> 
> IIUC we know that shared memory accesses cannot be reordered to precede
> acquisition or follow release of a spinlock (thanks to 0709b7e), which is
> why this isn't a problem in the current implementation.

The concern around unlogged LSN, imv anyway, is less about shared memory
access and more about making sure that all callers understand that it can
move backwards on a crash/restart.  I don't think that's an issue for
current users, but we just need to make sure to comment sufficiently so
that new users understand that about this particular value.

Thanks,

Stephen




Re: Shouldn't construct_array_builtin and deconstruct_array_builtin agree on types?

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 11:06:18PM +0200, Peter Eisentraut wrote:
> They only support the types that they were actually being used with.  If you
> need another type, feel free to add it.

FWIW, I agree with Tomas that this is an oversight that should be
fixed in v16, saving from the need to have a copy of
deconstruct_array_builtin() in extensions.  Same argument here:
couldn't it be possible that an extension does multiple array
constructs and deconstructs in a single code path?  We have patterns
like that in core as well.  For instance, see
extension_config_remove().
--
Michael




Re: Setting restrictedtoken in pg_regress

2023-06-12 Thread Nathan Bossart
On Tue, Aug 30, 2022 at 03:02:54PM +0200, Daniel Gustafsson wrote:
> In pg_regress we set restrictedToken when calling CreateRestrictedProcess, but
> we never seem to use that anywhere.  Not being well versed in Windows I might
> be missing something, but is it needed or is it a copy/pasteo from fa1e5afa8a2
> which does that in restricted_token.c?  If not needed, removing it makes the
> code more readable IMO.

Looks reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Atomic ops for unlogged LSN

2023-06-12 Thread Nathan Bossart
On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote:
> On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
>> So we indeed lose some "barrier strength" - but I think that's fine. For 
>> one,
>> it's a debugging-only value.

Is it?  I see uses in GiST indexing (62401db), so it's not immediately
obvious to me how it is debugging-only.  If it is, then I think this patch
ought to clearly document it so that nobody else tries to use it for
non-debugging-only stuff.

>> But more importantly, I don't see what reordering
>> the barrier could prevent - a barrier is useful for things like sequencing 
>> two
>> memory accesses to happen in the intended order - but unloggedLSN doesn't
>> interact with another variable that's accessed within the ControlFileLock
>> afaict.
> 
> This stuff is usually tricky enough that I am never completely sure
> whether it is fine without reading again README.barrier, which is
> where unloggedLSN is saved in the control file under its LWLock.
> Something that I find confusing in the patch is that it does not
> document the reason why this is OK.

My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
might see an old value of unloggedLSN.  From the following note in
README.barrier, it sounds like this would be a problem even if we ensured
full barrier semantics:

3. No ordering guarantees.  While memory barriers ensure that any given
process performs loads and stores to shared memory in order, they don't
guarantee synchronization.  In the queue example above, we can use memory
barriers to be sure that readers won't see garbage, but there's nothing to
say whether a given reader will run before or after a given writer.  If this
matters in a given situation, some other mechanism must be used instead of
or in addition to memory barriers.

IIUC we know that shared memory accesses cannot be reordered to precede
aquisition or follow release of a spinlock (thanks to 0709b7e), which is
why this isn't a problem in the current implementation.
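For context, the patch essentially replaces the spinlock-protected
counter with an atomic fetch-and-add, something like this (a sketch, not
the exact diff):

    /* XLogCtl->unloggedLSN becomes a pg_atomic_uint64 */
    XLogRecPtr
    GetFakeLSNForUnloggedRel(void)
    {
        return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
    }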

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 12:24:30PM -0700, Andres Freund wrote:
> Those points seems pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:
> 
> 1) We'd gain from being able to share state more efficiently (using normal
>pointers) and more dynamically (not needing to pre-allocate). That'd remove
>a good amount of complexity. As an example, consider the work we need to do
>to ferry tuples from one process to another. Even if we just continue to
>use shm_mq, in a threading world we could just put a pointer in the queue,
>but have the tuple data be shared between the processes etc.
> 
>Eventually this could include removing the 1:1 connection<->process/thread
>model. That's possible to do with processes as well, but considerably
>harder.
> 
> 2) Making context switches cheaper / sharing more resources at the OS and
>hardware level.

Yes.  FWIW, while reading the thread, parallel workers struck me as
the first area that would benefit from all that.  Could it be easier
to figure out the incremental pieces by working on a new node doing a
Gather based on threads, for instance?
--
Michael




Re: Meson build updates

2023-06-12 Thread Tristan Partin
On Mon Jun 12, 2023 at 4:24 PM CDT, Peter Eisentraut wrote:
> On 12.06.23 20:48, Tristan Partin wrote:
> > Attached you will find a v3 with the offending commits removed. I did
> > leave the overrides in since you didn't mention it in your last email.
>
> Patches 1-14 look ok to me.  (But I didn't test anything, so some 
> caveats about whether the non-cosmetic patches actually work apply.)  If 
> we're fine-tuning the capitalization of the options descriptions, let's 
> also turn "tap tests" into "TAP tests" and "TCL" into "Tcl".

I'll get to that.

> Patch 15 about the wrap integration, I'm not sure.  I share the concerns 
> about whether this is worth maintaining.  Maybe put this patch into the 
> commitfest separately, to allow for further study and testing.  (The 
> other patches, if they are acceptable, ought to go into PG16, I think.)

Ok. I will split it off for further discussion. Expect a v4 tomorrow
with a few extra changes with regard to another review from a Meson
maintainer.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Improve logging when using Huge Pages

2023-06-12 Thread Nathan Bossart
On Tue, May 02, 2023 at 11:17:50AM +0900, Michael Paquier wrote:
> On Thu, Apr 20, 2023 at 02:16:17PM -0700, Nathan Bossart wrote:
>> AFAICT this would involve adding a bool to BackendParameters and using it
>> in save_backend_variables() and restore_backend_variables(), which is an
>> additional 3 lines of code.  That doesn't sound too bad to me, but perhaps
>> I am missing something.
> 
> Appending more information to BackendParameters would be OK, still
> this would require the extra SQL function to access it, which is
> something that pg_settings is able to equally offer access to if
> using a GUC.

Fair enough.  I know I've been waffling in the GUC versus function
discussion, but FWIW v7 of the patch looks reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-12 Thread Joe Conway

On 6/12/23 10:44, Joe Conway wrote:

> 1/ how do we fix the misbehavior reported due to libperl in existing
> stable branches

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.


Hmm, browsing through the perl source I came across a reference to this 
(from https://perldoc.perl.org/perllocale):


---
PERL_SKIP_LOCALE_INIT

This environment variable, available starting in Perl v5.20, if set 
(to any value), tells Perl to not use the rest of the environment 
variables to initialize with. Instead, Perl uses whatever the current 
locale settings are. This is particularly useful in embedded 
environments, see "Using embedded Perl with POSIX locales" in perlembed.

---

Seems we ought to be using that.
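Presumably something as simple as this, set before the interpreter is
created (exact placement in plperl's interpreter setup would need to be
verified):

    /* Tell libperl not to reset the locale from the environment (Perl >= 5.20) */
    setenv("PERL_SKIP_LOCALE_INIT", "1", 1);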

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: index prefetching

2023-06-12 Thread Tomasz Rybak
On Thu, 2023-06-08 at 17:40 +0200, Tomas Vondra wrote:
> Hi,
> 
> At pgcon unconference I presented a PoC patch adding prefetching for
> indexes, along with some benchmark results demonstrating the (pretty
> significant) benefits etc. The feedback was quite positive, so let me
> share the current patch more widely.
> 

I added an entry to
https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference
based on notes I took during that session.
Hope it helps.

-- 
Tomasz Rybak, Debian Developer 
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C




Re: Meson build updates

2023-06-12 Thread Peter Eisentraut

On 12.06.23 20:48, Tristan Partin wrote:

> Attached you will find a v3 with the offending commits removed. I did
> leave the overrides in since you didn't mention it in your last email.


Patches 1-14 look ok to me.  (But I didn't test anything, so some 
caveats about whether the non-cosmetic patches actually work apply.)  If 
we're fine-tuning the capitalization of the options descriptions, let's 
also turn "tap tests" into "TAP tests" and "TCL" into "Tcl".


Patch 15 about the wrap integration, I'm not sure.  I share the concerns 
about whether this is worth maintaining.  Maybe put this patch into the 
commitfest separately, to allow for further study and testing.  (The 
other patches, if they are acceptable, ought to go into PG16, I think.)






Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Heikki Linnakangas

On 10/06/2023 21:01, Hannu Krosing wrote:

On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas  wrote:


If there are no major objections, I'm going to update the developer FAQ,
removing the excuses there for why we don't use threads [1].


I think it is not wise to start the wholesale removal of the objections there.

But I think it is worthwhile to revisit the section about threads and
maybe split out the historic part which is no longer true, and provide
both pros and cons for these.



I started with this short summary from the discussion in this thread,
feel free to expand, argue, fix :)
* is current excuse
-- is counterargument or ack


Thanks, that's a good idea.


* Speed improvements using threads are small compared to the remaining
backend startup time.
-- we now have some measurements that show significant performance
improvements not related to startup time


Also, I don't expect much performance gain directly from switching to 
threads. The point is that switching to a multi-threaded model makes 
possible, or at least greatly simplifies, a lot of other development. 
Which can then help with the backend startup time, among other things. 
For example, a shared catalog cache.



* The backend code would be more complex.
-- this is still the case


I don't quite buy that. A multi-threaded model isn't inherently more 
complex than a multi-process model. Just different. Sure, the transition 
period will be more complex, when we need to support both models. But in 
the long run, if we can remove the multi-process mode, we can make a lot 
of things *simpler*.



-- even more worrisome is that all extensions also need to be rewritten


"rewritten" is an exaggeration. Yes, extensions will need adapt, similar 
to the core code. But I hope it will be pretty mechanical work, marking 
global variables as thread-local and such. Many extensions will work 
with little to no changes.
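
For example, much of that mechanical work could look like this kind of
change (a sketch, not from any actual patch):

    /* before: process-global state */
    static int  counter = 0;

    /* after: per-thread state, e.g. using C11 _Thread_local */
    static _Thread_local int counter = 0;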



-- and many incompatibilities will be silent and take potentially years to find


IMO this is the most scary part of all this. I'm optimistic that we can 
have enough compiler support and tooling to catch most issues. But we 
don't know for sure at this point.



* Terminating backend processes allows the OS to cleanly and quickly
free all resources, protecting against memory and file descriptor
leaks and making backend shutdown cheaper and faster
-- still true


Yep. I'm not too worried about PostgreSQL code, our memory contexts and 
resource owners are very good at stopping leaks. But 3rd party libraries 
could pose hard problems. IIRC we still have a leak with the LLVM JIT 
code, for example. We should fix that anyway, of course, but the 
multi-process model is more forgiving with leaks like that.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Shouldn't construct_array_builtin and deconstruct_array_builtin agree on types?

2023-06-12 Thread Peter Eisentraut

On 12.06.23 22:38, Tomas Vondra wrote:

I noticed that these two functions, introduced in d746021de18b, disagree
on what types they support.



I ran into this for INT4OID. Sure, I can just look up the stuff and use
the regular deconstruct_array, but the asymmetry seems a bit strange. Is
that intentional?


They only support the types that they were actually being used with.  If 
you need another type, feel free to add it.
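
For instance, adding INT4OID to deconstruct_array_builtin() is presumably
just another case in its switch, following the existing pattern there
(sketch based on the shape of the d746021de18b code):

    case INT4OID:
        elmlen = sizeof(int32);
        elmbyval = true;
        elmalign = TYPALIGN_INT;
        break;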






Re: Order changes in PG16 since ICU introduction

2023-06-12 Thread Peter Eisentraut

On 09.06.23 02:36, Jeff Davis wrote:

Patches 0001, 0002:

These patches implement the built-in provider and automatically change
provider=icu to provider=builtin when the locale is C.


I object to adding a new provider for PG16 (patch 0001).  This is 
clearly a new feature, which wasn't even contemplated a few weeks ago.



  * Switch to the libc provider for the C locale: would make the libc
provider even more complicated and had some potential for confusion,
and also has catalog representation problems when --locale is specified
along with --lc-ctype.


I don't follow this concern.  This could be done entirely within initdb. 
 I mean, just change the default for --locale-provider if --locale=C is 
given.  That's like 10 lines of code in initdb.c.
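
Roughly along these lines, say (purely illustrative; the variable names
are made up, not taken from initdb.c):

    /* default to the libc provider when the C/POSIX locale is requested */
    if (locale_provider == NULL && locale != NULL &&
        (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0))
        locale_provider = "libc";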


I don't think I want CREATE DATABASE or CREATE COLLATION to have that 
logic, nor do they really need it.



Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case of
the libc provider. I believe most people favor this patch and I haven't
seen recent objections.


This seems reasonable.


1. If you specify --locale-provider=builtin at initdb time, you *must*
specify --locale=C/POSIX, otherwise you get an error.


Shouldn't the --locale option be ignored (or rejected) in that case? 
Why insist on it being specified?



2. Patch 0004 is possibly out of scope for 16, but it felt consistent
with the other UI changes and low risk. Please try with/without before
objecting.


Also clearly a new feature.  Also the implications of various upgrade, 
dump/restore scenarios are not fully explored.


I think it's an interesting idea, to make CREATE DATABASE and CREATE 
COLLATION also default to icu if the respective higher scope has been 
set to icu.  In fact, this makes me wonder now whether changing the 
default to icu in *only* initdb is sensible.  But again, we'd need to 
see the full matrix of upgrade scenarios laid out here.



3. Daniel Verite felt that we should only change the provider from ICU
to "builtin" for the C locale if the provider is defaulting to ICU; not
if it's specified as ICU.


Correct, we shouldn't override what was explicitly specified.


I did not differentiate between specifying
ICU and defaulting to ICU because:
   a. "libc" unconditionally uses the built-in memcmp() logic for C, it
never actually uses libc
   b. If a user really wants the root locale or the en-US-u-va-posix
locale, they can specify those directly
   c. I don't see any plausible case where it helps a user to keep
provider=icu when locale=C.


If the user specifies that, that's up to them to deal with the outcomes. 
 Just changing it to something different seems wrong.



4. Joe Conway and Peter Eisentraut both felt that C.UTF-8 with
provider=icu should not be changed to use the builtin provider, and
instead passed on to ICU. I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider; but CREATE DATABASE/COLLATION
will pass it along to ICU (which may support it as en-US-u-va-posix in
some versions, or may throw an error in other versions). My reasoning
is that initdb is pulling from the environment, and we should try
harder to succeed on any reasonable environmental settings (otherwise
initdb with default settings could fail); whereas we can be more strict
with CREATE DATABASE/COLLATION.


I'm not objecting to changing anything about C.UTF-8, but I am objecting 
to changing anything substantial in PG16.



5. For the built-in provider, initdb defaults to UTF-8 rather than
SQL_ASCII. Otherwise, you would be unable to use ICU at all later,
because ICU doesn't support SQL_ASCII.


Also a behavior change that is not appropriate for PG16 at this stage.




Shouldn't construct_array_builtin and deconstruct_array_builtin agree on types?

2023-06-12 Thread Tomas Vondra
Hi,

I noticed that these two functions, introduced in d746021de18b, disagree
on what types they support. For construct_array_builtin, we support
these types:

 - CHAROID:
 - CSTRINGOID:
 - FLOAT4OID:
 - INT2OID:
 - INT4OID:
 - INT8OID:
 - NAMEOID:
 - OIDOID:
 - REGTYPEOID:
 - TEXTOID:
 - TIDOID:

while deconstruct_array_builtin supports just a subset:

 - CHAROID:
 - CSTRINGOID:
 - FLOAT8OID:
 - INT2OID:
 - OIDOID:
 - TEXTOID:
 - TIDOID:

I ran into this for INT4OID. Sure, I can just look up the stuff and use
the regular deconstruct_array, but the asymmetry seems a bit strange. Is
that intentional?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Mon, Jun 12, 2023, at 21:58, Tomas Vondra wrote:
> But those are numbers for large keys - if you restrict the input to
> 4B-16B (which is what we planned to do by focusing on int, bigint and
> uuid), there's no significant difference:

Oh, sorry, I completely failed to read the meaning of the columns.

> lookup3:
>
> Small key speed test -     4-byte keys -    30.17 cycles/hash
> Small key speed test -     8-byte keys -    31.00 cycles/hash
> Small key speed test -    16-byte keys -    49.00 cycles/hash
>
> xxh3low:
>
> Small key speed test -     4-byte keys -    29.00 cycles/hash
> Small key speed test -     8-byte keys -    29.58 cycles/hash
> Small key speed test -    16-byte keys -    37.00 cycles/hash

The winner of the "Small key speed test" competition seems to be:

ahash64 "ahash 64bit":
Small key speed test -     4-byte keys -    24.00 cycles/hash
Small key speed test -     8-byte keys -    24.00 cycles/hash
Small key speed test -    16-byte keys -    26.98 cycles/hash

Looks like it's a popular one, e.g. it's used by Rust in their 
std::collections::HashSet.

Another notable property of ahash64 is that it's "DOS resistant",
but that isn't crucial for our use case, since we exclusively target
auto-generated primary keys which are not influenced by end-users.

> Not sure what you mean by "optimizing for space" - I imagined the
> on-disk format would just use the same hash table with tiny amount of
> free space (say 10% and not ~50%).

With "optimizing for space" I meant trying to find some alternative or
intermediate data structure that is more likely to be compressible,
like your idea of sorting the data.
What I wondered was if the on-disk format would be affected by
the choice of hash function. I guess it wouldn't, if the hashset
is created by adding the elements one by one while iterating
through the elements read from the on-disk format.
But I thought maybe something more advanced could be
done, where conversion between the on-disk format
and the in-memory format could be done without naively
iterating through all elements, i.e. something less complex
than O(n).
No idea what that mechanism would be though.

> My suggestion is to be lazy, just use the lookup3 we have in hashfn.c
> (through hash_bytes or something), and at the same time make it possible
> to switch to a different function in the future. I'd store an ID of the
> hash function in the set, so that we can support a different algorithm
> in the future, if we choose to.

Sounds good to me.

Smart idea to include the hash function algorithm ID in the set,
to allow implementing a different one in the future!
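
As a sketch, the set's header could carry that ID explicitly (field names
made up for illustration, not from the actual patch):

    typedef struct hashset_t
    {
        int32   vl_len_;         /* varlena header */
        uint8   format_version;  /* on-disk format version */
        uint8   hashfn_id;       /* 1 = lookup3, others reserved */
        int32   maxelements;     /* allocated capacity */
        int32   nelements;       /* elements currently stored */
        char    data[FLEXIBLE_ARRAY_MEMBER];  /* bitmap + values */
    } hashset_t;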

/Joel




Re: Documentation for building with meson

2023-06-12 Thread Peter Eisentraut

On 10.06.23 06:00, Andres Freund wrote:

 From c5e637a54c2b83e5bd8c4155784d97e82937eb51 Mon Sep 17 00:00:00 2001
From: Samay Sharma
Date: Mon, 6 Feb 2023 16:09:42 -0800
Subject: [PATCH v9 2/5] Add data layout options sub-section in installation
  docs

This commit separates out blocksize, segsize and wal_blocksize
options into a separate Data layout options sub-section in both
the make and meson docs. They were earlier in a miscellaneous
section which included several unrelated options. This change
also helps reduce the repetition of the warnings that changing
these parameters breaks on-disk compatibility.

I still like this change, but ISTM that the "Data Layout" section should
follow the "PostgreSQL Features" section, rather than follow "Anti Features",
"Build Process Details" and "Miscellaneous". I realize some of these are
reorganized later on, but even then "Build Process Details"

Would anybody mind if I swapped these around?


I don't mind a Data Layout section in principle, but I wonder whether 
it's worth changing now.  The segsize option is proposed to be turned 
into a run-time option (and/or removed).  For the WAL block size, I had 
previously mentioned, I don't think it is correct that pg_upgrade should 
actually care about it.  So I wouldn't spend too much time trying to 
carefully refactor the notes on the data layout options if we're going 
to have to change them around before long again anyway.







Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra



On 6/12/23 19:34, Joel Jacobson wrote:
> On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote:
 1) better hash table implementation
> 
> I noticed src/include/common/hashfn.h provides an implementation
> of the Jenkins/lookup3 hash function, and thought maybe
> we could simply use it in hashset?
> 
> However, I noticed that according to SMHasher [1],
> Jenkins/lookup3 has some quality problems:
> "UB, 28% bias, collisions, 30% distr, BIC"
> 
> Not sure if that's true or maybe not a problem in the PostgreSQL 
> implementation?
> > According to SMHasher, the two fastest 32/64-bit hash functions
> for non-cryptographic purposes without any quality problems
> that are also portable seems to be these two:
> 
> wyhash v4.1 (64-bit) [2]
> MiB/sec: 22513.04
> cycl./hash: 29.01
> size: 474
> 
> xxh3low (xxHash v3, 64-bit, low 32-bits part) [3]
> MiB/sec: 20722.94
> cycl./hash: 30.26
> size: 756
> 
> [1] https://github.com/rurban/smhasher
> [2] https://github.com/wangyi-fudan/wyhash
> [3] https://github.com/Cyan4973/xxHash
> 

But those are numbers for large keys - if you restrict the input to
4B-16B (which is what we planned to do by focusing on int, bigint and
uuid), there's no significant difference:

lookup3:

Small key speed test -     4-byte keys -    30.17 cycles/hash
Small key speed test -     8-byte keys -    31.00 cycles/hash
Small key speed test -    16-byte keys -    49.00 cycles/hash

xxh3low:

Small key speed test -     4-byte keys -    29.00 cycles/hash
Small key speed test -     8-byte keys -    29.58 cycles/hash
Small key speed test -    16-byte keys -    37.00 cycles/hash

But you can try doing some measurements, of course. Or just do profiling
to see how much time we spend in the hash function - I'd bet it's pretty
tiny fraction of the total time.

As for the "quality" issues - it's the same algorithm in Postgres, so it
has the same issues. I don't know if that has a measurable impact, though. I'd
guess it does not, particularly for "reasonably small" sets.

 5) more efficient storage format, with versioning etc.
>> I think the main question is whether to serialize the hash table as is,
>> or compact it in some way. The current code just uses the same thing for
>> both cases - on-disk format and in-memory representation (aggstate).
>> That's simple, but it also means the on-disk value is likely not well
>> compressible (because it's ~50% random data).
>>
>> I've thought about serializing just the values (as a simple array), but
>> that defeats the whole purpose of fast membership checks. I have two ideas:
>>
>> a) sort the data, and use binary search for this compact variant (and
>> then expand it into "full" hash table if needed)
>>
>> b) use a more compact hash table (with load factor much closer to 1.0)
>>
>> Not sure which of these options is the right one; each has a cost for
>> converting between formats (but large on-disk value is not free either).
>>
>> That's roughly what I did for the tdigest extension.
> 
> Is the choice of hash function (and its in-memory representation)
> independent of the on-disk format question, i.e. could we work
> on experimenting and evaluating different hash functions first,
> to optimise for speed and quality, and then when done, proceed
> and optimise for space, or are the two intertwined somehow?
> 

Not sure what you mean by "optimizing for space" - I imagined the
on-disk format would just use the same hash table with tiny amount of
free space (say 10% and not ~50%).


My suggestion is to be lazy, just use the lookup3 we have in hashfn.c
(through hash_bytes or something), and at the same time make it possible
to switch to a different function in the future. I'd store an ID of the
hash function in the set, so that we can support a different algorithm
in the future, if we choose to.
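
For instance (a minimal sketch; hash_bytes() is the existing lookup3
wrapper declared in src/include/common/hashfn.h):

    #include "common/hashfn.h"

    /* hash one fixed-width element, e.g. an int32 key */
    static uint32
    hashset_hash_int32(int32 value)
    {
        return hash_bytes((const unsigned char *) &value, sizeof(value));
    }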


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Andres Freund
Hi,

On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.

False.


> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.

False.


> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?

False.

Those points seems pretty much unrelated to the potential gains from switching
to a threading model. The main advantages are:

1) We'd gain from being able to share state more efficiently (using normal
   pointers) and more dynamically (not needing to pre-allocate). That'd remove
   a good amount of complexity. As an example, consider the work we need to do
   to ferry tuples from one process to another. Even if we just continue to
   use shm_mq, in a threading world we could just put a pointer in the queue,
   but have the tuple data be shared between the processes etc.

   Eventually this could include removing the 1:1 connection<->process/thread
   model. That's possible to do with processes as well, but considerably
   harder.

2) Making context switches cheaper / sharing more resources at the OS and
   hardware level.

Greetings,

Andres Freund




Re: Meson build updates

2023-06-12 Thread Tristan Partin
On Mon Jun 12, 2023 at 12:20 PM CDT, Andres Freund wrote:
> Hi,
>
> On 2023-06-12 11:54:42 -0500, Tristan Partin wrote:
> > On Fri Jun 9, 2023 at 1:36 PM CDT, Andres Freund wrote:
> > > The biggest change to make them more usable would be to properly 
> > > reconfigure
> > > when the contents of machine file change. IIRC configure is rerun, but the
> > > changes aren't taken into account.
> > 
> > I could not reproduce this. Perhaps you were testing with an older Meson
> > where that was the case
> > 
> > # meson.build
> > project('mytest')
> > 
> > myprog = find_program('myprog')
> > message(myprog.full_path())
> > 
> > test('dummy', find_program('echo'), args: [myprog.full_path()])
> > 
> > # file.ini
> > [binaries]
> > myprog = '/usr/bin/python3'
> > 
> > # CLI
> > meson setup build
> > meson test -C build
> > sed -i 's/python3/python2/' file.ini
> > meson test -C build
>
> It's possible that it doesn't happen in all contexts. I just reproduced the
> problem I had, changing
>
> [binaries]
> llvm-config = '/usr/bin/llvm-config-13'
>
> to
>
> [binaries]
> llvm-config = '/usr/bin/llvm-config-14'
>
> does not change which version is used in an existing build tree, but does
> change what's used in a new build tree.
>
> Same with e.g. changing the C compiler version in a machine file. That also
> only takes effect in a new tree.
>
>
> This is with meson HEAD, updated earlier today.

Mind opening a Meson issue if one doesn't exist already?

> > > In the end, I also just don't see a meaningful benefit in forcing the use 
> > > of
> > > machine files.
> > 
> > I think it is best to use patterns tools want you to use.
>
> Sometimes. I'd perhaps have a different view if we weren't migrating from
> autoconf, where overwriting binaries was trivially possible...

I'll see what I can advocate for, regardless. The following things seem
relevant. Might be useful to track in your meta issue on your fork.

https://github.com/mesonbuild/meson/issues/7755
https://github.com/mesonbuild/meson/pull/11561
https://github.com/mesonbuild/meson/issues/6180
https://github.com/mesonbuild/meson/issues/11294

Attached you will find a v3 with the offending commits removed. I did
leave the overrides in since you didn't mention it in your last email.

-- 
Tristan Partin
Neon (https://neon.tech)
From 3467aa4c4f9a239d349d84f26f4662fd37ec605f Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 07:55:03 -0500
Subject: [PATCH v3 01/15] Remove triple-quoted strings

Triple-quoted strings are for multiline strings in Meson. None of the
descriptions that got changed were multiline and the entire file uses
single-line descriptions.
---
 meson_options.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 5b44a8829d..1ea9729dc2 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -10,19 +10,19 @@ option('blocksize', type : 'combo',
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
   value: '8',
-  description : '''WAL block size, in kilobytes''')
+  description : 'WAL block size, in kilobytes')
 
 option('segsize', type : 'integer', value : 1,
-  description : '''Segment size, in gigabytes''')
+  description : 'Segment size, in gigabytes')
 
 option('segsize_blocks', type : 'integer', value: 0,
-  description : '''Segment size, in blocks''')
+  description : 'Segment size, in blocks')
 
 
 # Miscellaneous options
 
 option('krb_srvnam', type : 'string', value : 'postgres',
-  description : '''Default Kerberos service principal for GSSAPI''')
+  description : 'Default Kerberos service principal for GSSAPI')
 
 option('system_tzdata', type: 'string', value: '',
   description: 'use system time zone data in specified directory')
@@ -32,7 +32,7 @@ option('system_tzdata', type: 'string', value: '',
 
 option('pgport', type : 'integer', value : 5432,
   min: 1, max: 65535,
-  description : '''Default port number for server and clients''')
+  description : 'Default port number for server and clients')
 
 
 # Developer options
-- 
Tristan Partin
Neon (https://neon.tech)

From 8e911a487443b0fa40c5e7033746d97ee9a741f1 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 08:03:31 -0500
Subject: [PATCH v3 02/15] Use consistent casing in Meson option descriptions

Meson itself uses capital letters for option descriptions, so follow
that.
---
 meson_options.txt | 90 +++
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 1ea9729dc2..fa823fd088 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -5,7 +5,7 @@
 option('blocksize', type : 'combo',
   choices : ['1', '2', '4', '8', '16', '32'],
   value : '8',
-  description: 'set relation block size in kB')
+  description: 'Set relation block size in kB')
 
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
@@ -25,7 

query_id, pg_stat_activity, extended query protocol

2023-06-12 Thread kaido vaikla
I have noticed that if a query comes from the PostgreSQL JDBC Driver, then
query_id is not present in pg_stat_activity.  Erik Wienhold figured out that
the reason may be the extended query protocol (
https://www.postgresql.org/message-id/1391613709.939460.1684777418...@office.mailbox.org
)
My question is: is it expected behavior or a bug that, with the extended
query protocol, there is no query_id in pg_stat_activity for a running query?

br
Kaido


Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote:
>>> 1) better hash table implementation

I noticed src/include/common/hashfn.h provides an implementation
of the Jenkins/lookup3 hash function, and thought maybe
we could simply use it in hashset?

However, I noticed that according to SMHasher [1],
Jenkins/lookup3 has some quality problems:
"UB, 28% bias, collisions, 30% distr, BIC"

Not sure if that's true or maybe not a problem in the PostgreSQL implementation?

According to SMHasher, the two fastest 32/64-bit hash functions
for non-cryptographic purposes without any quality problems
that are also portable seems to be these two:

wyhash v4.1 (64-bit) [2]
MiB/sec: 22513.04
cycl./hash: 29.01
size: 474

xxh3low (xxHash v3, 64-bit, low 32-bits part) [3]
MiB/sec: 20722.94
cycl./hash: 30.26
size: 756

[1] https://github.com/rurban/smhasher
[2] https://github.com/wangyi-fudan/wyhash
[3] https://github.com/Cyan4973/xxHash

>>> 5) more efficient storage format, with versioning etc.
> I think the main question is whether to serialize the hash table as is,
> or compact it in some way. The current code just uses the same thing for
> both cases - on-disk format and in-memory representation (aggstate).
> That's simple, but it also means the on-disk value is likely not well
> compressible (because it's ~50% random data).
>
> I've thought about serializing just the values (as a simple array), but
> that defeats the whole purpose of fast membership checks. I have two ideas:
>
> a) sort the data, and use binary search for this compact variant (and
> then expand it into "full" hash table if needed)
>
> b) use a more compact hash table (with load factor much closer to 1.0)
>
> Not sure which of these options is the right one; each has a cost for
> converting between formats (but large on-disk value is not free either).
>
> That's roughly what I did for the tdigest extension.

Is the choice of hash function (and its in-memory representation)
independent of the on-disk format question, i.e. could we work
on experimenting and evaluating different hash functions first,
to optimise for speed and quality, and then when done, proceed
and optimise for space, or are the two intertwined somehow?

/Joel




Re: Meson build updates

2023-06-12 Thread Andres Freund
Hi,

On 2023-06-12 11:54:42 -0500, Tristan Partin wrote:
> On Fri Jun 9, 2023 at 1:36 PM CDT, Andres Freund wrote:
> > The biggest change to make them more usable would be to properly reconfigure
> > when the contents of machine file change. IIRC configure is rerun, but the
> > changes aren't taken into account.
> 
> I could not reproduce this. Perhaps you were testing with an older Meson
> where that was the case
> 
> # meson.build
> project('mytest')
> 
> myprog = find_program('myprog')
> message(myprog.full_path())
> 
> test('dummy', find_program('echo'), args: [myprog.full_path()])
> 
> # file.ini
> [binaries]
> myprog = '/usr/bin/python3'
> 
> # CLI
> meson setup build
> meson test -C build
> sed -i 's/python3/python2/' file.ini
> meson test -C build

It's possible that it doesn't happen in all contexts. I just reproduced the
problem I had, changing

[binaries]
llvm-config = '/usr/bin/llvm-config-13'

to

[binaries]
llvm-config = '/usr/bin/llvm-config-14'

does not change which version is used in an existing build tree, but does
change what's used in a new build tree.

Same with e.g. changing the C compiler version in a machine file. That also
only takes effect in a new tree.


This is with meson HEAD, updated earlier today.


> > In the end, I also just don't see a meaningful benefit in forcing the use of
> > machine files.
> 
> I think it is best to use patterns tools want you to use.

Sometimes. I'd perhaps have a different view if we weren't migrating from
autoconf, where overwriting binaries was trivially possible...

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - v12

2023-06-12 Thread Andres Freund
Hi,

Small update for posterity:

On 2022-09-09 16:58:36 -0700, Andres Freund wrote:
> I've run this through enough attempts by now that I'm quite confident that the
> problem does not occur when the errormode does not include
> SEM_NOOPENFILEERRORBOX. I'll want a few more runs to be certain, but...
> 
> 
> Given that the problem appears to happen after _exit() is called, and only
> when SEM_NOOPENFILEERRORBOX is not set, it seems likely to be an OS / C
> runtime bug. Presumably it's related to something that python does first, but
> I don't see how anything could justify crashing only if SEM_NOOPENFILEERRORBOX
> is set (rather than the opposite).

These SEM_NOOPENFILEERRORBOX references should have been SEM_NOGPFAULTERRORBOX
- I guess after staring at these names for a while, I couldn't quite see the
difference anymore.

Greetings,

Andres Freund




Re: Meson build updates

2023-06-12 Thread Tristan Partin
On Fri Jun 9, 2023 at 1:36 PM CDT, Andres Freund wrote:
> Hi,
>
> On 2023-06-09 13:15:27 -0500, Tristan Partin wrote:
> > On Fri Jun 9, 2023 at 11:41 AM CDT, Andres Freund wrote:
> > > I like the current version better - depending on the meson version makes 
> > > it
> > > easy to find what needs to be removed when we upgrade the meson minimum
> > > version.
> > 
> > I think the savings of not looking up selinux/systemd on non-Linux
> > systems is pretty big. Would you accept a change of something like:
>
> For selinux it's default disabled, so it doesn't make a practical
> difference. Outside of linux newer versions of meson are more common IME, so
> I'm not really worried about it for systemd.
>
>
> > if meson.version().version_compare('>=0.59')
> >   # do old stuff
> > else if host_system == 'linux'
> >   # do new stuff
> > endif
> > 
> > Otherwise, I am happy to remove the patch.
>
> Hm, I don't quite know how this would end up looking like.

Actually, nevermind. I don't know what I was talking about. I will just
go ahead and remove this patch from the set.

> > > > From 189d3ac3d5593ce3e475813e4830a29bb4e96f70 Mon Sep 17 00:00:00 2001
> > > > From: Tristan Partin 
> > > > Date: Wed, 17 May 2023 10:36:52 -0500
> > > > Subject: [PATCH postgres v1 16/17] Add Meson overrides
> > > > 
> > > > Meson has the ability to do transparent overrides when projects are used
> > > > as subprojects. For instance, say I am building a Postgres extension. I
> > > > can define Postgres to be a subproject of my extension given the
> > > > following wrap file:
> > > > 
> > > > [wrap-git]
> > > > url = https://git.postgresql.org/git/postgresql.git
> > > > revision = master
> > > > depth = 1
> > > > 
> > > > [provide]
> > > > dependency_names = libpq
> > > > 
> > > > Then in my extension (root project), I can have the following line
> > > > snippet:
> > > > 
> > > > libpq = dependency('libpq')
> > > > 
> > > > This will tell Meson to transparently compile libpq prior to it
> > > > compiling my extension (because I depend on libpq) if libpq isn't found
> > > > on the host system.
> > > > 
> > > > I have also added overrides for the various public-facing exectuables.
> > > > Though I don't expect them to get much usage, might as well go ahead and
> > > > override them. They can be used by adding the following line to the
> > > > aforementioned wrap file:
> > >
> > > I think adding more boilerplate to all these places does have some cost. 
> > > So
> > > I'm not really convinced this is worth doign.
> > 
> > Could you explain more about what costs you foresee?
>
> Repetitive code that needs to be added to each further binary we add. I don't
> mind doing that if it has a use case, but I'm not sure I see the use case for
> random binaries...

I want to make sure I am only adding it to things that are user-facing.
So if I have added the line to executables that are for internal use
only, please correct me. In general, I override for all user-facing
programs/dependencies because I never know how some end-user might use
the binaries.

Perhaps what might sway you is that the old method (libpq = libpq_dep)
is very error prone because variable names become part of the public API
of your build description which no one likes. In the new way, which is
only possible when specifying overrides, as long as binary names or
pkg-config names don't change, you get stability no matter how you name
your variables.

> > > > From 5ee13f09e4101904dbc9887bd4844eb5f1cb4fea Mon Sep 17 00:00:00 2001
> > > > From: Tristan Partin 
> > > > Date: Wed, 17 May 2023 10:54:53 -0500
> > > > Subject: [PATCH postgres v1 17/17] Remove Meson program options for 
> > > > specifying
> > > >  paths
> > > > 
> > > > Meson has a built-in way to override paths without polluting project
> > > > build options called machine files.
> > >
> > > I think meson machine files are just about unusable. For one, you can't
> > > actually change any of the options without creating a new build dir. For
> > > another, it's not something that easily can be done on the commandline.
> > >
> > > So I really don't want to remove these options.
> > 
> > I felt like this would be the most controversial change. What could be
> > done in upstream Meson to make this a more enjoyable experience?
>
> I think *requiring* separate files is a really poor experience when you come
> from some other system, where those could trivially be overwritten on the
> commandline.

Hmm. I could maybe ask around for a --program command line option. Could
you provide the syntax for what autotools does? That way I can come to
the discussion in the Meson channel with prior art. I personally don't
find the files that annoying, but to each their own.

> The biggest change to make them more usable would be to properly reconfigure
> when the contents of machine file change. IIRC configure is rerun, but the
> changes aren't taken into account.

I could not reproduce this. Perhaps you were testing with an older Meson
where that was the case.

Re: Inconsistent results with libc sorting on Windows

2023-06-12 Thread Juan José Santamaría Flecha
On Fri, Jun 9, 2023 at 11:18 AM Daniel Verite 
wrote:

> Juan José Santamaría Flecha wrote:
>
> > Just to make sure we are all seeing the same problem, does the attached
> > patch fix your test?
>
> The problem of the random changes in sorting disappears for all libc
> locales in pg_collation, so this is very promising.
>

Great to hear. I'll try to push the patch towards something reviewable, as
per the attached.

> However it persists for the default collation.

We can apply a similar approach there.

Regards,

Juan José Santamaría Flecha


0001-WIN32-Inconsistent-results-with-libc-utf8-sorting.patch
Description: Binary data


Re: RFC: Logging plan of the running query

2023-06-12 Thread James Coleman
On Sun, Jun 11, 2023 at 11:07 PM torikoshia  wrote:
>
> On 2023-06-06 03:26, James Coleman wrote:
> > On Mon, Jun 5, 2023 at 4:30 AM torikoshia 
> > wrote:
> >>
> >> On 2023-06-03 02:51, James Coleman wrote:
> >> > Hello,
> >> >
> >> > Thanks for working on this patch!
> >
> > Sure thing! I'm *very interested* in seeing this available, and I
> > think it paves the way for some additional features later on...
> >
> >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia 
> >> ...
> >> > To put it positively: we believe that, for example, catalog accesses
> >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> >> > an existing valid transaction/query state, as it would be for this
> >> > patch -- are safe. If there were problems, then those problems are
> >> > likely bugs we already have in other CFI cases.
> >>
> >> Thanks a lot for the discussion and sharing it!
> >> I really appreciate it.
> >>
> >> BTW I'm not sure whether all the CFI are called in valid transaction,
> >> do you think we should check each of them?
> >
> > I kicked off the regressions tests with a call to
> > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
> > call. Several hours and 52 GB of logs later I have confirmed that
> > (with the attached revision) at the very least the regression test
> > suite can't trigger any kind of failures regardless of when we trigger
> > this. The existing code in the patch handles that by only running the
> > explain when there's an active query.
>
> Thanks for the testing!
>
> > I've attached v27. The important change here in 0001 is that it
> > guarantees the interrupt handler is re-entrant, since that was a bug
> > exposed by my testing. I've also included 0002 which is only meant for
> > testing (it attempts to log in the plan in every
> > CHECK_FOR_INTERRUPTS() call).
>
> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> ProcessLogQueryPlanInterruptActive can remain true.
> After that, pg_log_query_plan() does nothing but just returns.
>
> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
> pg_log_query_plan() is running, but it doesn't cause errors or critical
> problem.
>
> Considering the problem solved and introduced by v27, I think v26 might
> be fine.
> How do you think?

The testing I did with calling this during every CFI is what uncovered
the re-entrancy problem. IIRC (without running that test again) the
problem was a stack overflow. Now, to be sure, this is a particularly
degenerate case, because in real-world usage it'd be impossible in
practice, I think, to trigger that many calls to this function (and by
extension the interrupt handler).

If SIGINT is the only concern we could reset
ProcessLogQueryPlanInterruptActive in error handling code. I admit
that part of my thought process here is thinking ahead to an
additional patch I'd like to see on top of this, which is logging a
query plan before cleaning up when statement timeout occurs. The
re-entrancy issue becomes more interesting then, I think, since we
would then have automated calling of the logging code. BTW: I'd
thought that would make a nice follow-up patch for this, but if you'd
prefer I could add it as another patch in the series here.
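
A minimal sketch of the reset-on-error idea, reusing the flag and function
names from the patch (the actual body is elided):

    void
    ProcessLogQueryPlanInterrupt(void)
    {
        if (ProcessLogQueryPlanInterruptActive)
            return;             /* already logging a plan; bail out */
        ProcessLogQueryPlanInterruptActive = true;

        PG_TRY();
        {
            /* ... build and log the plan of the active query ... */
        }
        PG_FINALLY();
        {
            /* reset even if we error out or get interrupted partway */
            ProcessLogQueryPlanInterruptActive = false;
        }
        PG_END_TRY();
    }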

What do you think about resetting the flag versus just not having it?

Regards,
James Coleman




Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Sat, Jun 10, 2023, at 17:46, Andrew Dunstan wrote:
> Maybe you can post a full patch as well as incremental?

Attached patch is based on tvondra's last commit 375b072.

> Stylistically I think you should reduce reliance on magic numbers (like 13). 
> Probably need some #define's?

Great idea, fixed, I've added a HASHSET_STEP definition (set to the value 13).

On Sat, Jun 10, 2023, at 17:51, jian he wrote:
> int32 value = strtol(str, &endptr, 10);
> there is no int32 value range check? 
> imitate src/backend/utils/adt/int.c. the following way is what I came up with.
>
>
> int64 value = strtol(str, &endptr, 10);
>
> if (errno == ERANGE || value < INT_MIN || value > INT_MAX)

Thanks, fixed as suggested, except I used PG_INT32_MIN and PG_INT32_MAX,
which explicitly represent the minimum and maximum values for a 32-bit integer,
regardless of the platform or C implementation.
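
Presumably the check now reads something like this (a sketch; the endptr
variable and the error-reporting details are assumptions):

    int64 value = strtol(str, &endptr, 10);

    if (errno == ERANGE || value < PG_INT32_MIN || value > PG_INT32_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value \"%s\" is out of range for type integer", str)));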

> also it will loop infinitely in hashset_in if you supply a malformed value,
> for example: select '{1,2s}'::hashset;
> I need kill -9 to kill the process. 

Thanks. I've added a new test, `sql/invalid.sql` with that example query.

Here is a summary of all other changes:
 
* README.md: Added sections Usage, Data types, Functions and Aggregate Functions

* Added test/ directory with some tests.

* Added "not production-ready" notice at top of README, warning for breaking
changes and no migration scripts, until our first release.

* Change version from 1.0.0 to 0.0.1 to indicate current status.

* Added CEIL_DIV macro

* Implemented hashset_in(), hashset_out()
  The syntax for the serialized format is comma-separated integer values
  wrapped in curly braces, e.g. '{1,2,3}'

* Implemented hashset_recv() to match the existing hashset_send()

* Removed/rewrote some tdigest related comments

/Joel

hashset-0.0.1-a8a282a.patch
Description: Binary data


Re: Add support for AT LOCAL

2023-06-12 Thread Alvaro Herrera
On 2023-Jun-06, Laurenz Albe wrote:

> At a quick glance, it looks like you resolve "timezone" at the time
> the query is parsed.  Shouldn't the resolution happen at query
> execution time?

Sounds like it -- consider the case where the timestamp value is a
partition key and one of the partition boundaries falls in between two
timezone offsets for some particular ts value; then you use a prepared
query to read from a view defined with AT LOCAL.  Partition pruning
would need to compute partitions to read from at runtime, not plan time.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: abi-compliance-checker

2023-06-12 Thread Tristan Partin
On Mon Jun 12, 2023 at 10:10 AM CDT, Tristan Partin wrote:
> On Sat Jun 10, 2023 at 9:17 AM CDT, Peter Eisentraut wrote:
> > I have rearranged this a bit.  There are now two build options, called 
> > abidw and abidiff.  The abidw option produces the xml file, that you 
> > would then at appropriate times commit into the tree as the base.  The 
> > abidiff option enables the abidiff tests.  This doesn't actually require 
> > abidw, since abidiff can compare the binary directly against the 
> > recorded XML file.  So these options are distinct and nonoverlapping.
> >
> > Note that in this setup, you still need a conditional around the abidiff 
> > test() call, because otherwise meson setup will fail if the base file 
> > doesn't exist (yet), so it would be impossible to bootstrap this system.
>
> Could you speak more to the workflow you see with managing the checked
> in diff files?

Just saw your other email which talks about the workflow.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: abi-compliance-checker

2023-06-12 Thread Tristan Partin
On Sat Jun 10, 2023 at 9:17 AM CDT, Peter Eisentraut wrote:
> I have rearranged this a bit.  There are now two build options, called 
> abidw and abidiff.  The abidw option produces the xml file, that you 
> would then at appropriate times commit into the tree as the base.  The 
> abidiff option enables the abidiff tests.  This doesn't actually require 
> abidw, since abidiff can compare the binary directly against the 
> recorded XML file.  So these options are distinct and nonoverlapping.
>
> Note that in this setup, you still need a conditional around the abidiff 
> test() call, because otherwise meson setup will fail if the base file 
> doesn't exist (yet), so it would be impossible to bootstrap this system.

Could you speak more to the workflow you see with managing the checked
in diff files?

At my previous job, I had tried to do something similar with regard to
making sure we didn't break ABI[0], but I took a different approach
where instead of hooking into the Meson test infrastructure, I used a CI
job where I checked out the previous major version of the code and the
current version of the code, built both, and checked the built binaries.
The benefit of this workflow is that you don't check anything into the
source repo.

I think the same approach might be better here, but instead of writing
it all into the CI file like I did, use a perl script. Then once you
have the perl script, it could be possible to then hook into the Meson
test infrastructure.

> There is something weird going on where the cirrus linux/meson job 
> doesn't upload the produced abidw artifacts, even though they are 
> apparently built, and the equivalent works for the freebsd job.  Maybe 
> someone can see something that I'm not seeing there.

Nothing obvious is wrong to me. Was the failure maybe just a fluke?

[0]: 
https://github.com/hse-project/hse/blob/master/.github/workflows/abicheck.yaml

-- 
Tristan Partin
Neon (https://neon.tech)




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-12 Thread Joe Conway

(moving to hackers)

On 6/12/23 05:13, Heikki Linnakangas wrote:

On 10/06/2023 22:28, Joe Conway wrote:

On 6/10/23 15:07, Joe Conway wrote:

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway  writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?


The call in PGLC_localeconv seems *very* oddly placed.  Why not
do that before it does any other locale calls?  Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.



As far as I can tell it really only affects localeconv(), so I tried to
place it close to those calls. But I am fine with moving it up.


This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?


The man page for uselocale(LC_GLOBAL_LOCALE) says: "The calling thread's
current locale is set to the global locale determined by setlocale(3)."
Does that undo the effect of calling uselocale() previously, so if you
later call setlocale(), the new locale takes effect in the thread too?
Or is it equivalent to "uselocale(LC_ALL, setlocale(NULL))", so that it
sets the thread's locale to the current global locale, but later
setlocale() calls have no effect on it?


setlocale() changes the global locale, but uselocale() changes the 
locale that is currently active, as I understand it.
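
To illustrate the difference with the POSIX API (a sketch, not code from
the patch):

    #include <locale.h>

    setlocale(LC_ALL, "sv_SE.UTF-8");    /* changes the global locale */

    locale_t loc = newlocale(LC_ALL_MASK, "de_DE.UTF-8", (locale_t) 0);
    locale_t old = uselocale(loc);       /* this thread only, from now on */
    /* localeconv() etc. now reflect de_DE in this thread */
    uselocale(old);                      /* restore the previous setting */
    freelocale(loc);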


Also note that the uselocale man page says "Unlike setlocale(3), uselocale() 
does not allow selective replacement of individual locale  categories. 
To employ a locale that differs in only a few categories from the 
current locale, use calls to duplocale(3) and newlocale(3) to obtain a 
locale object equivalent to the current locale and modify the desired 
categories in that object."



In any case, this still doesn't feel like the right place. We have many
more setlocale() calls. Shouldn't we sprinkle them all with
uselocale(LC_GLOBAL_LOCALE)? cache_locale_time() for example. Or rather,
all the places where we use any functions that depend on the current locale.

How about we replace all setlocale() calls with uselocale()?


I don't see us backpatching something that invasive. It might be the 
right thing to do for pg17, or even pg16, but I think that is a 
different discussion



Shouldn't we restore the old thread-specific locale after the calls? I'm
not sure why libperl calls uselocale(), but we are now overwriting the
locale that it sets.


That is a good question. Though arguably perl is doing the wrong thing 
by not resetting the global locale when it is used embedded.



We have a few other uselocale() calls in pg_locale.c, and we take
care to restore the old locale in those.


I think as long as we are relying on setlocale rather than uselocale in 
general (see above), the global locale is where we want things left.



There are a few uselocale() calls in ecpg, and they are protected by
HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but
they are protected by HAVE_LOCALE_T. Seems a little inconsistent.


Possibly something we should clean up, but I think that is separate from 
this fix.


In general I think we have 2 or possibly three distinct things here:

1/ how do we fix the misbehavior reported due to libperl in existing 
stable branches


2/ what makes most sense going forward (and does that mean pg16 or pg17)

3/ misc code cleanups

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of 
discussion.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

2023-06-12 Thread Melanie Plageman
On Fri, Jun 9, 2023 at 3:28 AM Gregory Smith  wrote:
> On Thu, Jun 8, 2023 at 9:21 PM Andres Freund  wrote:
>>
>> You might need to add --no-children to the perf report invocation, otherwise
>> it'll show you the call graph inverted.
>
>
> My problem was not writing kernel symbols out, I was only getting addresses 
> for some reason.  This worked:
>
>   sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
>   perf report --stdio

Do you know why using phys-data would have solved the problem in your
particular setup? I find figuring out what perf options I need
mystifying.
I end up trying random things from
https://wiki.postgresql.org/wiki/Profiling_with_perf,  the perf man
page, and https://www.brendangregg.com/perf.html

The pg wiki page actually has a lot of detail. If you think your
particular problem is something others would encounter, it could be
good to add it there.

FWIW, I think it is helpful to have hackers threads like this where
people work through unexplained performance results with others in the
community.

- Melanie




Re: Wrong results from Parallel Hash Full Join

2023-06-12 Thread Melanie Plageman
On Sun, Jun 11, 2023 at 11:24 PM Michael Paquier  wrote:
>
> On Wed, Jun 07, 2023 at 05:16:12PM -0400, Melanie Plageman wrote:
> > On Fri, May 19, 2023 at 8:05 PM Tom Lane  wrote:
> >> Considering that this is a parallel plan, I don't think there's any
> >> mystery about why an ORDER-BY-less query might have unstable output
> >> order; the only mystery is why more of the buildfarm hasn't failed.
> >> Can we just add "ORDER BY t1.id" to this query?  It looks like you
> >> get the same PHJ plan, although now underneath Sort/Gather Merge.
> >
> > Yes, this was an oversight on my part. Attached is the patch that does
> > just what you suggested.
>
> Confirmed that adding an ORDER BY adds a Sort node between a Gather
> Merge and a Parallel Hash Full Join, not removing coverage.
>
> This has fallen through the cracks and conchuela has failed again
> today, so I went ahead and applied the fix on HEAD.  Thanks!

Thanks!




Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

2023-06-12 Thread Tom Lane
Richard Guo  writes:
> Yeah, that makes sense.  process_subquery_nestloop_params is a better
> place to do this adjustments.  +1 to v2 patch.

Pushed, then.

regards, tom lane




Re: Partial aggregates pushdown

2023-06-12 Thread Bruce Momjian
On Mon, Jun 12, 2023 at 08:51:30AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Bruce, Mr.Pyhalov, hackers.
> 
> Thank you for the comments. I will try to respond to both of your comments as 
> follows.
> I plan to start revising the patch next week. If you have any comments on the 
> following
> responses, I would appreciate it if you could give them to me this week.
> 
> > From: Bruce Momjian 
> > Sent: Saturday, June 10, 2023 1:44 AM
> > I agree that this feature is designed for built-in sharding, but it is 
> > possible people could be using aggregates on partitions
> > backed by foreign tables without sharding.  Adding a requirement for 
> > non-sharding setups to need PG 17+ servers might
> > be unreasonable.
> Indeed, it is possible to use the partial aggregate pushdown feature for 
> purposes other than sharding.
> The description of the section "F.38.6. Built-in sharding in PostgreSQL" 
> assumes the use of
> Built-in sharding and will be modified to eliminate this assumption.
> The title of this section should be changed to something like "Aggregate on 
> partitioned table".

Sounds good.

> > From: Bruce Momjian 
> > Sent: Saturday, June 10, 2023 1:44 AM
> > Looking at previous release note incompatibilities, we don't normally 
> > change non-administrative functions in a way that
> > causes errors if run on older servers.  Based on Alexander's observations, 
> > I wonder if we need to re-add the postgres_fdw
> > option to control partial aggregate pushdown, and default it to enabled.
> > 
> > If we ever add more function breakage we might need more postgres_fdw 
> > options.  Fortunately, such changes are rare.
> 
> I understand what the problem is. I will put a mechanism for maintaining 
> compatibility into the patch.
> I believe there are three approaches.
> Approach 1-1 is preferable because it does not require additional options for 
> postgres_fdw.
> I will revise the patch according to Approach 1-1, unless otherwise commented.
> 
> Approach1:
> I will ensure that postgres_fdw retrieves the version of each remote server
> and does not push down partial aggregates if the server version is less than 
> 17.
> There are two approaches to obtaining remote server versions.
> Approach1-1: postgres_fdw connects to a remote server and uses PQserverVersion().
> Approach1-2: Adding a postgres_fdw option about a remote server version (like 
> "server_version").
> 
> Approach2:
> Adding a postgres_fdw option to enable or disable partial aggregate pushdown
> (like enable_partial_aggregate_pushdown).

These are good questions.  Adding a postgres_fdw option called
enable_partial_aggregate_pushdown helps make the purpose of the option
clear, but remote_version can be used for future breakage as well.

I think remote_version is the best idea, and in the documentation for the
option, let's explicitly say it is useful to disable partial aggregates
pushdown on pre-PG 17 servers.  If we need to use the option for other
cases, we can just update the documentation.  When the option is blank,
the default, everything is pushed down.
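
For illustration, the Approach 1-1 version gate could end up looking
something like this (PQserverVersion() is existing libpq API; the
surrounding names are made up):

    /* Partial aggregate pushdown is only safe against PG 17+ servers. */
    static bool
    partial_agg_pushdown_ok(PGconn *conn)
    {
        /* PQserverVersion() returns e.g. 170000 for PostgreSQL 17.0 */
        return PQserverVersion(conn) >= 170000;
    }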

I see remote_version as a logical addition to match our "extensions" option
that controls what extension functions can be pushed down.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Do we want a hashset type?

2023-06-12 Thread Andrew Dunstan


On 2023-06-12 Mo 09:00, Tomas Vondra wrote:



What was the json/jsonb story, was it ever an extension before
being included in core?

I don't recall what the exact story was, but I guess the "json" type was
added to core very long ago (before we started to push back a bit), and
we added some SQL grammar stuff too, which can't be done from extension.
So when we added jsonb (much later than json), there wasn't much point
in not having it in core.




Not quite.

The json type was added in 9.2 (Sept 2012) and jsonb in 9.4 (Dec 2014). I 
wouldn't call those far apart or very long ago. Neither included any 
grammar changes AFAIR.


But if they had been added as extensions we'd probably be in a whole lot 
more trouble now implementing SQL/JSON, so whether that was foresight or 
laziness, I think we landed on our feet there.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra



On 6/12/23 14:46, Andrew Dunstan wrote:
> 
> On 2023-06-11 Su 16:15, Joel Jacobson wrote:
>> On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote:
>>> On 6/11/23 12:26, Joel Jacobson wrote:
 I think there are some good arguments that speak in favour of including 
 it in core:
>> ...
>>> I agree with all of that, but ...
>>>
>>> It's just past feature freeze, so the earliest release this could appear
>>> in is 17, about 15 months away.
>>>
>>> Once stuff gets added to core, it's tied to the release cycle, so no new
>>> features in between.
>>>
>>> Presumably people would like to use the extension in the release they
>>> already use, without backporting.
>>>
>>> Postgres is extensible for a reason, exactly so that we don't need to
>>> have everything in core.
>> Interesting, I've never thought about this one before:
>> What if something is deemed to be fundamental and therefore qualify for core 
>> inclusion,
>> and at the same time is suitable to be made an extension.
>> Would it be possible to ship such extension as pre-installed?
>>
>> What was the json/jsonb story, was it ever an extension before
>> being included in core?
> 
> 
> No, and the difficulty is that an in-core type and associated functions
> will have different oids, so migrating from one to the other would be at
> best painful.
> 
> So it's a kind of now or never decision. I think extensions are
> excellent for specialized types. But I don't regard a set type in that
> light.
> 

Perhaps. So you're proposing to have this as a regular built-in type?
It's hard for me to judge how popular this feature would be, but I guess
people often use arrays while they actually want set semantics ...

If we do that, I wonder if we could do something similar to arrays, with
the polymorphism and SQL grammar support. Does SQL have any concept of
sets (or arrays, for that matter) as data types?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra



On 6/11/23 22:15, Joel Jacobson wrote:
> On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote:
>> On 6/11/23 12:26, Joel Jacobson wrote:
>>> I think there are some good arguments that speak in favour of including it 
>>> in core:
> ...
>>
>> I agree with all of that, but ...
>>
>> It's just past feature freeze, so the earliest release this could appear
>> in is 17, about 15 months away.
>>
>> Once stuff gets added to core, it's tied to the release cycle, so no new
>> features in between.
>>
>> Presumably people would like to use the extension in the release they
>> already use, without backporting.
>>
>> Postgres is extensible for a reason, exactly so that we don't need to
>> have everything in core.
> 
> Interesting, I've never thought about this one before:
> What if something is deemed to be fundamental and therefore qualify for core 
> inclusion,
> and at the same time is suitable to be made an extension.
> Would it be possible to ship such extension as pre-installed?
> 

I think it's always a matter of judgment - I don't think there's some
clear rule set in stone. If something is "fundamental" and can be done in
an extension, there's always the option to have it in contrib (with all
the limitations I mentioned).

FWIW I'm not strictly against adding it to contrib, if there's agreement
it's worth it. But if we consider it to be a fundamental data structure
and widely useful, maybe we should consider making it a built-in data
type, with fixed OID etc. That'd allow support at the SQL grammar level,
and perhaps also from the proposed SQL/PGQ (and GQL?). AFAIK moving data
types from an extension (even if from contrib) to core is not painless.

Either way it might be a nice / useful first patch, I guess.
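
To make the discussion concrete, here is a toy sketch - not taken from any
patch, all names invented - of the core structure such a type would wrap: a
fixed-capacity set of int4 values using open addressing with linear probing:

#include <stdbool.h>
#include <stdint.h>

#define HASHSET_CAPACITY 64     /* power of two; assume it never fills up */

typedef struct IntHashSet
{
    bool        used[HASHSET_CAPACITY];
    int32_t     values[HASHSET_CAPACITY];
} IntHashSet;

static uint32_t
hash_int32(int32_t v)
{
    uint32_t    h = (uint32_t) v;

    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h;
}

/* returns true if v was newly added, false if it was already present */
static bool
hashset_add(IntHashSet *set, int32_t v)
{
    uint32_t    pos = hash_int32(v) & (HASHSET_CAPACITY - 1);

    while (set->used[pos])
    {
        if (set->values[pos] == v)
            return false;       /* set semantics: no duplicates */
        pos = (pos + 1) & (HASHSET_CAPACITY - 1);
    }
    set->used[pos] = true;
    set->values[pos] = v;
    return true;
}

static bool
hashset_contains(const IntHashSet *set, int32_t v)
{
    uint32_t    pos = hash_int32(v) & (HASHSET_CAPACITY - 1);

    while (set->used[pos])
    {
        if (set->values[pos] == v)
            return true;
        pos = (pos + 1) & (HASHSET_CAPACITY - 1);
    }
    return false;
}

A real hashset type would of course also need a varlena on-disk form, table
growth, null handling, and input/output functions on top of this.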

> What was the json/jsonb story, was it ever an extension before
> being included in core?

I don't recall what the exact story was, but I guess the "json" type was
added to core very long ago (before we started to push back a bit), and
we added some SQL grammar stuff too, which can't be done from an extension.
So when we added jsonb (much later than json), there wasn't much point
in not having it in core.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Do we want a hashset type?

2023-06-12 Thread Andrew Dunstan


On 2023-06-11 Su 16:15, Joel Jacobson wrote:

On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote:

On 6/11/23 12:26, Joel Jacobson wrote:

I think there are some good arguments that speak in favour of including it in
core:

...

I agree with all of that, but ...

It's just past feature freeze, so the earliest release this could appear
in is 17, about 15 months away.

Once stuff gets added to core, it's tied to the release cycle, so no new
features in between.

Presumably people would like to use the extension in the release they
already use, without backporting.

Postgres is extensible for a reason, exactly so that we don't need to
have everything in core.

Interesting, I've never thought about this one before:
What if something is deemed to be fundamental and therefore qualifies for core
inclusion,
and at the same time is suitable to be made an extension.
Would it be possible to ship such an extension as pre-installed?

What was the json/jsonb story, was it ever an extension before
being included in core?



No, and the difficulty is that an in-core type and associated functions 
will have different oids, so migrating from one to the other would be at 
best painful.


So it's a kind of now or never decision. I think extensions are 
excellent for specialized types. But I don't regard a set type in that 
light.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Pavel Borisov
Is the following true or not?

1. If we switch from processes to threads but leave the number of
session-local variables unchanged, there would be hardly any performance gain.
2. If we move some of a backend's local variables into shared memory, then
the performance gain would be very close to what we'd get with threads
having an equal number of session-local variables.

In other words, the overall goal in principle is to gain from less
memory copying wherever it doesn't add the burden of locks for
concurrent variable access?

Regards,
Pavel Borisov,
Supabase




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Joel Jacobson
On Mon, Jun 12, 2023, at 13:53, Tomas Vondra wrote:
> In a way, I think this "split into independently beneficial steps"
> strategy is the only option with a meaningful chance of success.

+1

/Joel




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Tomas Vondra



On 6/10/23 13:20, Dave Cramer wrote:
> 
> 
> On Fri, 9 Jun 2023 at 18:29, Stephen Frost wrote:
> 
> Greetings,
> 
> * Dave Cramer (davecramer@postgres.rocks) wrote:
> > One thing I can think of is upgrading. AFAIK dump and restore is the only
> > way to change the on disk format.
> > Presuming that eventually we will be forced to change the on disk format it
> > would be nice to be able to do so in a manner which does not force long
> > down times
> 
> There is an ongoing effort moving in this direction.  The $subject isn't
> great, but this patch set (which we are currently working on
> updating...): https://commitfest.postgresql.org/43/3986/ attempts
> changing a lot of currently compile-time block-size pieces to be
> run-time which would open up the possibility to have a different page
> format for, eg, different tablespaces.  Possibly even different block
> sizes.  We'd certainly welcome discussion from others who are
> interested.
> 
> Thanks,
> 
> Stephen
> 
> 
> Upgrading was just one example of difficult problems that need to be 
> addressed. My thought was that before we commit to something as
> potentially resource intensive as changing the threading model we
> compile a list of other "big issues" and prioritize.
> 

I doubt anyone expects the community to commit to the threading switch
in this sense - drop everything else and just start working on this
(pretty massive) change. Not going to happen.

> I realize open source is more of a scratch your itch kind of development
> model, but I'm not convinced the random walk that entails is the
> appropriate way to move forward. At the very least I'd like us to
> question it.

I may be missing something, but it's not clear to me whether you argue
for the open source approach or against it. I personally think it's
perfectly fine for people to work on scratching their itch and focus on
stuff that yields value to them (or their customers).

And I think the only way to succeed at the threading switch is within
this very framework - split it into (much) smaller steps that are
beneficial on their own and scratch some other itch.

For example, we have issues with large number of connections and we've
discussed stuff like built-in connection pooling etc. for a very long
time (including this thread). But we have session state in various
places in process private memory, which makes it borderline impossible
and thus we don't have anything built-in. IIUC the threading work would need
to isolate/define the session state anyway, so perhaps it could do it in
a way that'd also work for the connection pooling (with processes)?

Which would mean this particular change is immediately beneficial even
without the threading switch (which I'd expect to take considerable
amount of time).

In a way, I think this "split into independently beneficial steps"
strategy is the only option with a meaningful chance of success.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-06-12 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

At PGCon and other places we have discussed time-delayed logical replication,
and we now understand that there is no easy way to implement it. The following
is our analysis.

# Abstract

To implement time-delayed logical replication properly, the worker must
serialize all the received messages into permanent files.
But PostgreSQL does not have good infrastructure for this purpose, so
substantial engineering work is needed.

## Review: problems with the without-file approach

In the without-file approach, the apply worker process sleeps while delaying
the application.
This approach was chosen in earlier versions like [1], but it has problems,
which were shared by Sawada-san [2]. They can lead to a PANIC error due to
the disk filling up.

 A) WALs cannot be recycled on the publisher because they are not flushed on
the subscriber.
 B) Moreover, vacuuming cannot remove dead tuples on the publisher.

## Alternative approach: serializing messages to files

To prevent these issues, the worker should serialize all incoming messages
to a permanent file, like the physical walreceiver does.
Here, messages are first written into files at the beginning of a transaction
and then flushed at its end.
This approach would solve problems A) and B), but it still has many
considerations and difficulties.

### How to separate messages into files?

There are two possibilities for dividing messages into files, but neither of 
them is ideal.

1. Create a file per received transaction (a rough sketch appears below).

In this case a file is removed once the delay period has elapsed and the
transaction has been applied.
This is the simplest approach, but the number of files can balloon.

2. Use one large file, or segmented files (like WAL).

This can reduce the number of files, but we must consider further things:

 A) Purge – We must purge applied transactions, but we do not have a good way
 to remove a single transaction from the large file.

 B) 2PC – It is likely that the segment which contains the actual transaction
 differs from the segment which contains its COMMIT PREPARED.
 Hence the worker must scan all the segments to find the relevant messages.

 C) Streamed in-progress transactions - chunks of a transaction are separated
 into several segments. Hence the worker must scan all the segments to find
 the chunk messages, same as above.
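
For illustration only, a heavily simplified sketch of option 1 using plain
POSIX stdio - none of these names or paths exist in PostgreSQL, and real code
would use the backend's file and error-handling machinery:

#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

static void
spool_path(char *buf, size_t len, unsigned int xid)
{
    /* one permanent file per remote transaction, named after its xid */
    snprintf(buf, len, "pg_logical/delay/%08X.spool", xid);
}

static void
delay_spool_message(unsigned int xid, const char *msg, size_t msglen,
                    bool is_commit)
{
    char    path[64];
    FILE   *f;

    spool_path(path, sizeof(path), xid);
    f = fopen(path, "ab");
    if (f == NULL)
        return;                 /* real code would ereport(ERROR) */
    fwrite(msg, 1, msglen, f);
    if (is_commit)
    {
        /*
         * Make the spooled transaction durable at commit, so the flush
         * position reported to the publisher can advance and its WAL can
         * be recycled.
         */
        fflush(f);
        fsync(fileno(f));
    }
    fclose(f);
}

Once the configured delay has elapsed and the transaction has been applied,
the file is simply unlinked - which is exactly where the file-count bloat of
this option comes from.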

### Handling the case where a file exceeds the file system's limit

Regardless of the option chosen from the ones mentioned above, there is a 
possibility
that the file size could exceed the file system's limit. This can occur as the
publisher can send transactions of any length.
PostgreSQL provides a mechanism for working with such large files - the
BufFile data structure - but it cannot be used as-is, for several reasons:

 A) It only supports buffered I/O. A read or write of the low-level File
 occurs only when the buffer is filled or emptied, so we cannot control when
 data is persisted.

 B) It can be used only for temporary purposes. Internally, BufFile creates
 physical files under the $PGDATA/base/pgsql_tmp directories, and files in
 those subdirectories are removed when the postmaster restarts.

 C) It has no mechanism for restoring its state after a restart. BufFile
 tracks virtual positions such as the file index and offset, but these fields
 live in an in-memory structure, so a BufFile forgets the ordering of its
 files and its initial/final position across restarts.

 D) It cannot remove a part of a virtual file. Even if a large file is split
 into multiple physical files and all transactions in one physical file have
 already been applied, BufFile cannot remove just that part.

[1]: 
https://www.postgresql.org/message-id/f026292b-c9ee-472e-beaa-d32c5c3a2ced%40www.fastmail.com
[2]: 
https://www.postgresql.org/message-id/CAD21AoAeG2+RsUYD9+mEwr8-rrt8R1bqpe56T2D=euo-qs-...@mail.gmail.com

Acknowledgement:

Amit, Peter, Sawada-san
Thank you for discussing with me off-list.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Order changes in PG16 since ICU introduction

2023-06-12 Thread Daniel Verite
Jeff Davis wrote:

> I guess where I'm confused is: why would a user actually want their
> database collation to be C.UTF-8? It's slower than C, our
> implementation doesn't properly version it (as you pointed out), and
> the semantics don't seem great ('Z' < 'a').

Because when LC_CTYPE=C, characters outside US-ASCII are not
categorized properly: upper/lower/regexp matching/... produce
incorrect results.

> But if they don't specify the provider, isn't it much more likely they
> just don't care much about the locale, and would be happier with C? 

Consider a pre-existing script doing initdb --locale=C.UTF-8
Surely it does care about the locale, otherwise it would not specify
it.
Assuming that it would be better off with C is assuming that a
non-Unicode-aware locale is better than the Unicode-aware locale
they're asking for. I don't think that's reasonable.

> The user can easily get libc behavior by specifying --locale-
> provider=libc, so I don't see how you reached this conclusion.

What would be user hostile is forcing users that don't need an ICU
locale to change their invocations of initdb/createdb to avoid
regressions with v16. Most people would discover this after
it breaks their apps.

> It looks like you are fine with 0003 applying LOCALE to whatever
> provider is chosen, but you'd like to be smarter about choosing the
> provider and to choose libc in at least some cases.
> 
> That is actually very much like option #2 in the list I presented
> here[2], and has the same problems. How should the following behave?
> 
>   initdb --locale=C --lc-collate=fr_FR.utf8
>   initdb --locale=en --lc-collate=fr_FR.utf8

The same as v15.

> If we switch to libc in the first case, then --locale will be ignored
> and the collation will be fr_FR.utf8.

$ initdb --locale=C --lc-collate=fr_FR.utf8
v15 does that:

  The database cluster will be initialized with this locale configuration:
    provider:    libc
    LC_COLLATE:  fr_FR.utf8
    LC_CTYPE:    C
    LC_MESSAGES: C
    LC_MONETARY: C
    LC_NUMERIC:  C
    LC_TIME:     C
  The default database encoding has accordingly been set to "SQL_ASCII".

--locale is not ignored, it's overridden for LC_COLLATE only.

> But we will leave the second case as ICU and the collation will be
> "en".

Yes. To me the rule for "ICU is the default" in v16 should be: if the
--locale argument points to a locale that we know ICU does not provide,
we fall back to the v15 behavior down to every detail, otherwise we let
ICU be the provider.

> You also suggested that we consider switching the provider to libc any
> time ICU doesn't support something. I'm not sure whether you meant a
> static list (C, C.UTF-8, POSIX, ...?) or some kind of dynamic test.

C, C.*, POSIX. I'm not sure if there are other cases.

> I'm also not clear whether you think we should abandon the built-in
> provider, or still select it for C/POSIX.

I see it as going in v17, because it came after feature freeze and
is not strictly necessary in v16.



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Large files for relations

2023-06-12 Thread David Steele

On 5/28/23 08:48, Thomas Munro wrote:


Alright, since I had some time to kill in an airport, here is a
starter patch for initdb --rel-segsize.  


I've gone through this patch and it looks pretty good to me. A few things:

+* rel_setment_size, we will truncate the K+1st segment to 0 length

rel_setment_size -> rel_segment_size

+* We used a phony GUC with a custome show function, because we don't

custome -> custom

+   if (strcmp(endptr, "kB") == 0)

Why kB here instead of KB to match MB, GB, TB below?

+   int64   relseg_size;    /* blocks per segment of large relation */

This will require PG_CONTROL_VERSION to be bumped -- but you are 
probably waiting until commit time to avoid annoying conflicts, though I 
don't think it is as likely as with CATALOG_VERSION_NO.



Some random thoughts:

Another potential option name would be --segsize, if we think we're
going to use this for temp files too eventually.


I feel like temp file segsize should be separately configurable for the 
same reason that we are leaving it as 1GB for now.



Maybe it's not so beautiful to have that global variable
rel_segment_size (which replaces REL_SEGSIZE everywhere).  


Maybe not, but it is the way these things are done in general, e.g.
wal_segment_size, so I don't think it will be too controversial.



Another
idea would be to make it static in md.c and call smgrsetsegmentsize(),
or something like that.  That could be a nice place to compute the
"shift" value up front, instead of computing it each time in
blockno_to_segno(), but that's probably not worth bothering with (?).
BSR/LZCNT/CLZ instructions are pretty fast on modern chips.  That's
about the only place where someone could say that this change makes
things worse for people not interested in the new feature, so I was
careful to get rid of / and % operations with no-longer-constant RHS.


Right -- not sure we should be troubling ourselves with trying to 
optimize away ops that are very fast, unless they are computed trillions 
of times.
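
For concreteness, the "compute the shift up front" idea mentioned above might
look roughly like the sketch below; the names only approximate the thread (not
the patch), and __builtin_ctzll assumes GCC or Clang:

#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's typedef */

static int rel_segment_shift;   /* computed once, at startup */

static void
set_rel_segment_size(uint64_t blocks_per_segment)
{
    /* the segment size is validated elsewhere to be a power of two */
    rel_segment_shift = __builtin_ctzll(blocks_per_segment);
}

static inline uint32_t
blockno_to_segno(BlockNumber blkno)
{
    /* widen first: a 32 TB segment of 8 kB blocks means a shift of 32 */
    return (uint32_t) ((uint64_t) blkno >> rel_segment_shift);
}

static inline BlockNumber
blockno_within_segment(BlockNumber blkno)
{
    return (BlockNumber) (blkno & ((UINT64_C(1) << rel_segment_shift) - 1));
}

static inline uint64_t
blockno_to_seekpos(BlockNumber blkno, uint32_t blcksz)
{
    return (uint64_t) blockno_within_segment(blkno) * blcksz;
}

The widening casts keep the shifts well-defined even in the extreme case where
the shift reaches the width of BlockNumber.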



I had to promote segment size to int64 (global variable, field in
control file), because otherwise it couldn't represent
--rel-segsize=32TB (it'd be too big by one).  Other ideas would be to
store the shift value instead of the size, or store the max block
number, eg subtract one, or use InvalidBlockNumber to mean "no limit"
(with more branches to test for it).  The only problem I ran into with
the larger type was that 'SHOW segment_size' now needs a custom show
function because we don't have int64 GUCs.


A custom show function seems like a reasonable solution here.
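
As an illustration of the shape of such a hook - core GUC show hooks take no
arguments and return const char *; the names and buffer handling here are
hypothetical:

#include <stdio.h>
#include <stdint.h>

static int64_t rel_segment_size;    /* would come from the control file */
static char show_buf[32];

/* custom show hook: render the int64 value, since there are no int64 GUCs */
static const char *
show_segment_size(void)
{
    snprintf(show_buf, sizeof(show_buf), "%lld",
             (long long) rel_segment_size);
    return show_buf;
}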


A C type confusion problem that I noticed: some code uses BlockNumber
and some code uses int for segment numbers.  It's not really a
reachable problem for practical reasons (you'd need over 2 billion
directories and VFDs to reach it), but it's wrong to use int if
segment size can be set as low as BLCKSZ (one file per block); you
could have more segments than an int can represent.  We could go for
uint32, BlockNumber or create SegmentNumber (which I think I've
proposed before, and lost track of...).  We can address that
separately (perhaps by finding my old patch...)


I think addressing this separately is fine, though maybe enforcing some 
reasonable minimum in initdb would be a good idea for this patch. For my 
2c SEGSIZE == BLOCKSZ just makes very little sense.


Lastly, I think the blockno_to_segno(), blockno_within_segment(), and 
blockno_to_seekpos() functions add enough readability that they should 
be committed regardless of how this patch proceeds.


Regards,
-David




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Bruce, Mr. Pyhalov, hackers.

Thank you for your comments. I will try to respond to both of your comments
below.
I plan to start revising the patch next week. If you have any comments on the
following responses, I would appreciate receiving them this week.

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> I agree that this feature is designed for built-in sharding, but it is 
> possible people could be using aggregates on partitions
> backed by foreign tables without sharding.  Adding a requirement for 
> non-sharding setups to need PG 17+ servers might
> be unreasonable.
Indeed, it is possible to use the partial aggregate pushdown feature for
purposes other than sharding.
The description in the section "F.38.6. Built-in sharding in PostgreSQL"
assumes the use of built-in sharding and will be modified to eliminate this
assumption.
The title of this section should be changed to something like "Aggregate on 
partitioned table".

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> Looking at previous release note incompatibilities, we don't normally change 
> non-administrative functions in a way that
> causes errors if run on older servers.  Based on Alexander's observations, I 
> wonder if we need to re-add the postgres_fdw
> option to control partial aggregate pushdown, and default it to enabled.
> 
> If we ever add more function breakage we might need more postgres_fdw 
> options.  Fortunately, such changes are rare.

I understand what the problem is. I will put a mechanism for maintaining
compatibility into the patch.
I believe there are three approaches.
Approach 1-1 is preferable because it does not require additional options for 
postgres_fdw.
I will revise the patch according to Approach 1-1, unless otherwise commented.

Approach 1:
I will ensure that postgres_fdw retrieves the version of each remote server
and does not push down partial aggregates if the server version is less than
17 (a sketch follows below).
There are two ways to obtain the remote server version.
Approach 1-1: postgres_fdw connects to the remote server and uses
PQserverVersion().
Approach 1-2: Add a postgres_fdw option for the remote server version (like
"server_version").

Approach 2:
Add a postgres_fdw option controlling whether partial aggregate pushdown is
enabled (like enable_partial_aggregate_pushdown).
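
For reference, a minimal sketch of the Approach 1-1 check - PQserverVersion()
is the real libpq call, while the surrounding wrapper is a hypothetical name:

#include <stdbool.h>
#include <libpq-fe.h>

static bool
remote_supports_partial_agg_pushdown(const PGconn *conn)
{
    /* PQserverVersion() returns e.g. 170002 for version 17.2, 0 on error */
    return PQserverVersion(conn) >= 170000;
}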

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




Re: check_strxfrm_bug()

2023-06-12 Thread Heikki Linnakangas

On 12/06/2023 01:15, Thomas Munro wrote:

There are still at least a couple
of functions that lack XXX_l variants in the standard: mbstowcs() and
wcstombs() (though we use the non-standard _l variants if we find them
in ), but that's OK because we use uselocale() and not
setlocale(), because uselocale() is required to be thread-local.


Right, mbstowcs() and wcstombs() are already thread-safe, that's why 
there are no _l variants of them.



The use of setlocale() to set up the per-backend/per-database
default locale would have to be replaced with uselocale().
This recent bug report is also related to that: 
https://www.postgresql.org/message-id/17946-3e84cb577e9551c3%40postgresql.org. 
In a nutshell, libperl calls uselocale(), which overrides the setting we 
try to set with setlocale().
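
For illustration, switching to the thread-local API might look like the
sketch below; newlocale()/uselocale() are standard POSIX.1-2008, everything
else here is hypothetical:

#define _GNU_SOURCE             /* exposes locale_t on glibc */
#include <locale.h>
#include <stdbool.h>

static locale_t backend_locale = (locale_t) 0;

static bool
set_backend_locale(const char *name)
{
    locale_t    loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);

    if (loc == (locale_t) 0)
        return false;           /* real code would report an error */
    backend_locale = loc;

    /*
     * Unlike setlocale(), this affects only the calling thread - which is
     * also why libperl calling uselocale() clobbers what setlocale() set.
     */
    uselocale(backend_locale);
    return true;
}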


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

2023-06-12 Thread Richard Guo
On Mon, Jun 12, 2023 at 12:06 PM David Rowley wrote:

> On Fri, 9 Jun 2023 at 20:57, Richard Guo wrote:
> > On Fri, Jun 9, 2023 at 8:13 AM David Rowley wrote:
> >> It might be possible to make adjustments in nodeWindowAgg.c to have
> >> the equality checks come out as true when there is no ORDER BY.
> >> update_frameheadpos() is one location that would need to be adjusted.
> >> It would need further study to ensure we don't accidentally break
> >> anything. I've not done that study, so won't be adjusting the patch
> >> for now.
> >
> > I'm also not sure if doing that is safe in all cases.  Hmm, do you think
> > we can instead check wc->frameOptions to see if it is the RANGE OFFSET
> > case in make_pathkeys_for_window(), and decide to not remove or remove
> > redundant ORDER BY items according to whether it is or not RANGE OFFSET?
>
> I think ideally, we'd not have to add special cases to the planner to
> disable the optimisation for certain cases. I'd much rather see
> adjustments in the executor to handle cases where we've removed ORDER
> BY columns (e.g adjust update_frameheadpos() to assume rows are equal
> when there are no order by columns.)  That of course would require
> that there are no cases where removing ORDER BY columns would change
> the actual query results.  I can't currently think of any reason why
> the results would change, but I'm not overly familiar with the RANGE
> option, so I'd need to spend a bit longer looking at it than I have
> done so far to feel confident in making the patch process ORDER BY
> columns too.
>
> I'm ok with just doing the PARTITION BY stuff as step one.  The ORDER
> BY stuff is more complex and risky which seems like a good reason to
> tackle separately.


I see your point.  Agreed that the ORDER BY stuff might be better done in a
separate patch.  So now the v2 patch looks good to me.

Thanks
Richard


Re: Allow pg_archivecleanup to remove backup history files

2023-06-12 Thread Michael Paquier
On Fri, Jun 09, 2023 at 12:32:15AM +0900, Fujii Masao wrote:
> +   
> + Remove backup history files.
> 
> Isn't it better to document clearly which backup history files are to be removed?
> For example, "In addition to removing WAL files, remove backup history files 
> with prefixes logically preceding the oldestkeptwalfile.".

I've written about this part at the beginning of this one, where this
sounds like a duplicated description of the Description section:
https://www.postgresql.org/message-id/zg1nq13v411y4...@paquier.xyz

>   printf(_("  -n, --dry-run   dry run, show the names of the files that would be removed\n"));
> + printf(_("  -b, --clean-backup-history  clean up files including backup history files\n"));
> 
> Shouldn't the -b option be placed in alphabetical order?

+1.
--
Michael

