Re: Patch: Add parse_type Function

2024-02-10 Thread Erik Wienhold
Let me comment on some issues since I wrote the very first version of
parse_type() on which David's patch is based.

> On 2024-02-01 01:00 +0100, jian he  wrote:
> 
> cosmetic issue. Looking around other error handling code, the format
> should be something like:
> `
> if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
> ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("function returning record called in"
>  "context that cannot accept type record")));
> `

+1

> `PG_FUNCTION_INFO_V1(parse_type);`
> is unnecessary?
> based on the error information:  https://cirrus-ci.com/task/5899457502380032
> not sure why it only fails on windows.

Yeah, it's not necessary for internal functions per [1].  It's a
leftover from when this function was part of the pgtap extension.

> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
> +#undef PARSE_TYPE_STRING_COLS
> Are these necessary?
> given that the comments already say return the OID and type modifier.

Not sure what the convention here is but I found this in a couple of
places, maybe even a tutorial on writing C functions.  See
`git grep '_COLS\s\+[1-9]'` for those instances.  It's in line with my
habit of avoiding magic constants.

> +( typid oid,
> +  typmod int4 )
> here `int4` should be `integer`?

+1

> commit message:
> `Also accounts for data typs that require the SQL grammar to be parsed:`
> except the typo issue, this sentence is still hard for me to understand.

Yes, the sentence is rather handwavy.  What is meant here is that this
function also resolves types whose typmod is determined by the SQL
parser or some step after that.  There are types whose typmod is
whatever value is found inside the parenthesis, e.g. bit(13) has typmod
13.  Our first idea before coming up with that function was to do some
string manipulation and extract the typmod from inside the parenthesis.
But we soon found out that other typmods don't translate one to one,
e.g.  varchar(13) has typmod 17.  The interval type is also special
because the typmod is some bitmask encoding of e.g. 'second(0)'.  Hence
the mention of the SQL grammar.

> +   
> +Parses a string representing an SQL type declaration as used in a
> +CREATE TABLE statement, optionally 
> schema-qualified.
> +Returns a record with two fields, typid and
> +typmod, representing the OID and
> modifier for the
> +type. These correspond to the parameters to pass to the
> +format_type
> function.
> +   
> 
> can be simplified:
> +   
> +Parses a string representing an SQL data type, optionally
> schema-qualified.
> +Returns a record with two fields, typid and
> +typmod, representing the OID and
> modifier for the
> +type. These correspond to the parameters to pass to the
> +format_type
> function.
> +   
> (I don't have a strong opinion though)

Sounds better.  The CREATE TABLE reference is superfluous.

[1] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-V1-CALL-CONV

-- 
Erik




035_standby_logical_decoding unbounded hang

2024-02-10 Thread Noah Misch
Coincidentally, one of my buildfarm animals hanged several weeks in a
different test, 035_standby_logical_decoding.pl.  A LOG_SNAPSHOT_INTERVAL_MS
reduction was part of making it reproducible:

On Fri, Feb 02, 2024 at 04:01:45PM -0800, Noah Misch wrote:
> On Fri, Feb 02, 2024 at 02:30:03PM -0800, Noah Misch wrote:
> > On Fri, Feb 02, 2024 at 05:07:14PM -0500, Tom Lane wrote:
> > > If you look at the buildfarm's failures page and filter down to
> > > just subscriptionCheck failures, what you find is that all of the
> > > last 6 such failures are in 031_column_list.pl:

> > https://www.postgresql.org/message-id/flat/16d6d9cc-f97d-0b34-be65-425183ed3721%40gmail.com
> > reported a replacement BgWriterDelay value reproducing it.
> 
> Correction: the recipe changes LOG_SNAPSHOT_INTERVAL_MS in addition to
> BgWriterDelay.

I'm reusing this thread just in case there's overlap with the
031_column_list.pl cause and fix.  The 035_standby_logical_decoding.pl hang is
a race condition arising from an event sequence like this:

- Test script sends CREATE SUBSCRIPTION to subscriber, which loses the CPU.
- Test script calls pg_log_standby_snapshot() on primary.  Emits 
XLOG_RUNNING_XACTS.
- checkpoint_timeout makes a primary checkpoint finish.  Emits 
XLOG_RUNNING_XACTS.
- bgwriter executes LOG_SNAPSHOT_INTERVAL_MS logic.  Emits XLOG_RUNNING_XACTS.
- CREATE SUBSCRIPTION wakes up and sends CREATE_REPLICATION_SLOT to standby.

Other test code already has a solution for this, so the attached patches add a
timeout and copy the existing solution.  I'm also attaching the hack that
makes it 100% reproducible.
Author: Noah Misch 
Commit: Noah Misch 

[not for commit] Demonstrate 035_standby_logical_decoding.pl hang.

There are three way XLOG_RUNNING_XACTS can show up:
- SELECT pg_log_standby_snapshot()
- CHECKPOINT
- bgwriter LOG_SNAPSHOT_INTERVAL_MS

An idle primary won't do the second two.  If you're unlucky enough for
all three to happen before CREATE_REPLICATION_SLOT's
DecodingContextFindStartpoint() call starts watching, then
CREATE_REPLICATION_SLOT will hang indefinitely.

diff --git a/src/backend/postmaster/bgwriter.c 
b/src/backend/postmaster/bgwriter.c
index f11ce27..07f987c 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -70,7 +70,7 @@ int   BgWriterDelay = 200;
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
  */
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1
 
 /*
  * LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c 
b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 9270d7b..0f170a4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -1085,6 +1085,7 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char 
*slotname,
appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
}
 
+   pg_usleep(1000 * 1000);
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
pfree(cmd.data);
 
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index cebfa52..c371369 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -467,6 +467,7 @@ $psql_subscriber{run}->pump_nb();
 
 # Speed up the subscription creation
 $node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+$node_primary->safe_psql('postgres', "CHECKPOINT");
 
 # Explicitly shut down psql instance gracefully - to avoid hangs
 # or worse on windows
Author: Noah Misch 
Commit: Noah Misch 

Bound waits in 035_standby_logical_decoding.pl.

One IPC::Run::start() used an IPC::Run::timer() without checking for
expiration.  The other used no timeout or timer.  Back-patch to v16,
which introduced the test.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index c371369..61da915 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -21,7 +21,6 @@ my $node_cascading_standby =
   PostgreSQL::Test::Cluster->new('cascading_standby');
 my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
-my $psql_timeout = IPC::Run::timer($default_timeout);
 my $res;
 
 # Name for the physical slot on primary
@@ -90,7 +89,8 @@ sub make_slot_active
'>',
$to_stdout,
'2>',
-   $to_stderr);
+   $to_stderr,
+   IPC::Run::tim

Re: What about Perl autodie?

2024-02-10 Thread Tom Lane
Andrew Dunstan  writes:
> On 2024-02-08 Th 11:08, Daniel Gustafsson wrote:
>> On 8 Feb 2024, at 16:53, Tom Lane  wrote:
>>> 2. Don't wait, migrate them all now.  This would mean requiring
>>> Perl 5.10.1 or later to run the TAP tests, even in back branches.
>>> I think #2 might not be all that radical.  We have nothing older
>>> than 5.14.0 in the buildfarm, so we don't really have much grounds
>>> for claiming that 5.8.3 will work today.  And 5.10.1 came out in
>>> 2009, so how likely is it that anyone cares anymore?

>> I would vote for this option, if we don't run the trailing edge anywhere 
>> where
>> breakage is visible to developers then it is like you say, far from 
>> guaranteed
>> to work.

> +1 from me too. We kept 5.8 going for a while because it was what the 
> Msys (v1) DTK perl was, but that doesn't matter any more I think.

I've reconfigured longfin, which was using perl 5.14.0 on all
branches, to use 5.10.1 on the pre-v16 branches (and it did pass).
This seems like a good change even if we don't pull the trigger on
the above proposal --- although if we don't, maybe I should see
if I can get 5.8.3 to build on that machine.

regards, tom lane




Re: Sequence Access Methods, round two

2024-02-10 Thread Michael Paquier
On Thu, Feb 08, 2024 at 04:06:36PM +0100, Peter Eisentraut wrote:
> On 19.01.24 00:27, Michael Paquier wrote:
>> The reason why this stuff has bumped into my desk is that we have no
>> good solution in-core for globally-distributed transactions for
>> active-active deployments.  First, anything we have needs to be
>> plugged into default expressions of attributes like with [1] or [2],
>> or a tweak is to use sequence values that are computed with different
>> increments to avoid value overlaps across nodes.  Both of these
>> require application changes, which is meh for a bunch of users.
> 
> I don't follow how these require "application changes".  I guess it depends
> on where you define the boundary of the "application".

Yep.  There's a dependency to that.

> The cited solutions
> require that you specify a different default expression for "id" columns.
> Is that part of the application side?  How would your solution work on that
> level?  AFAICT, you'd still need to specify the sequence AM when you create
> the sequence or identity column.  So you'd need to modify the DDL code in
> any case.

One idea is to rely on a GUC to control what is the default sequence
AM when taking the DefineRelation() path, so as the sequence AM
attached to a sequence is known for any DDL operation that may create
one internally, including generated columns.  The patch set does that
with default_sequence_access_method, including support for
pg_dump[all] and pg_restore to give the possibility to one to force a
new default or just dump data without a specific AM (this uses SET
commands in-between the CREATE/ALTER commands).
--
Michael


signature.asc
Description: PGP signature


Re: Patch: Add parse_type Function

2024-02-10 Thread jian he
+ /*
+ * Parse type-name argument to obtain type OID and encoded typmod. We don't
+ * need to check for parseTypeString failure, but just let the error be
+ * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+ * and the `bool missing_ok` arg in 9.4-15.
+ */
+ (void) parseTypeString(type, &typid, &typmod, 0);
if you are wondering around other code deal with NULL, ErrorSaveContext.
NULL would be more correct?
`(void) parseTypeString(type, &typid, &typmod, NULL);`

also parseTypeString already explained the third argument, our
comments can be simplified as:
/*
* Parse type-name argument to obtain type OID and encoded typmod. We don't
* need to handle parseTypeString failure, just let the error be
* raised.
*/

cosmetic issue. Looking around other error handling code, the format
should be something like:
`
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in"
 "context that cannot accept type record")));
`

`PG_FUNCTION_INFO_V1(parse_type);`
is unnecessary?
based on the error information:  https://cirrus-ci.com/task/5899457502380032
not sure why it only fails on windows.

+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+#undef PARSE_TYPE_STRING_COLS
Are these necessary?
given that the comments already say return the OID and type modifier.


+( typid oid,
+  typmod int4 )
here `int4` should be `integer`?

commit message:
`Also accounts for data typs that require the SQL grammar to be parsed:`
except the typo issue, this sentence is still hard for me to understand.

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`.

suitabld should be suitable.


+   
+Parses a string representing an SQL type declaration as used in a
+CREATE TABLE statement, optionally schema-qualified.
+Returns a record with two fields, typid and
+typmod, representing the OID and
modifier for the
+type. These correspond to the parameters to pass to the
+format_type
function.
+   

can be simplified:
+   
+Parses a string representing an SQL data type, optionally
schema-qualified.
+Returns a record with two fields, typid and
+typmod, representing the OID and
modifier for the
+type. These correspond to the parameters to pass to the
+format_type
function.
+   
(I don't have a strong opinion though)




Re: Popcount optimization using AVX512

2024-02-10 Thread Noah Misch
On Fri, Feb 09, 2024 at 08:33:23PM -0800, Andres Freund wrote:
> On 2024-02-09 15:27:57 -0800, Noah Misch wrote:
> > On Fri, Feb 09, 2024 at 10:24:32AM -0800, Andres Freund wrote:
> > > On 2024-01-26 07:42:33 +0100, Alvaro Herrera wrote:
> > > > This suggests that finding a way to make the ifunc stuff work (with good
> > > > performance) is critical to this work.
> > > 
> > > Ifuncs are effectively implemented as a function call via a pointer, 
> > > they're
> > > not magic, unfortunately. The sole trick they provide is that you don't
> > > manually have to use the function pointer.
> > 
> > The IFUNC creators introduced it so glibc could use arch-specific memcpy 
> > with
> > the instruction sequence of a non-pointer, extern function call, not the
> > instruction sequence of a function pointer call.
> 
> My understanding is that the ifunc mechanism just avoid the need for repeated
> indirect calls/jumps to implement a single function call, not the use of
> indirect function calls at all. Calls into shared libraries, like libc, are
> indirected via the GOT / PLT, i.e. an indirect function call/jump.  Without
> ifuncs, the target of the function call would then have to dispatch to the
> resolved function. Ifuncs allow to avoid this repeated dispatch by moving the
> dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to
> point to the right function. Thus ifuncs are an optimization when calling a
> function in a shared library that's then dispatched depending on the cpu
> capabilities.
> 
> However, in our case, where the code is in the same binary, function calls
> implemented in the main binary directly (possibly via a static library) don't
> go through GOT/PLT. In such a case, use of ifuncs turns a normal direct
> function call into one going through the GOT/PLT, i.e. makes it indirect. The
> same is true for calls within a shared library if either explicit symbol
> visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore
> there's no efficiency gain of ifuncs over a call via function pointer.
> 
> 
> This isn't because ifunc is implemented badly or something - the reason for
> this is that dynamic relocations aren't typically implemented by patching all
> callsites (".text relocations"), which is what you would need to avoid the
> need for an indirect call to something that fundamentally cannot be a constant
> address at link time. The reason text relocations are disfavored is that
> they can make program startup quite slow, that they require allowing
> modifications to executable pages which are disliked due to the security
> implications, and that they make the code non-shareable, as the in-memory
> executable code has to differ from the on-disk code.
> 
> 
> I actually think ifuncs within the same binary are a tad *slower* than plain
> function pointer calls, unless -fno-plt is used. Without -fno-plt, an ifunc is
> called by 1) a direct call into the PLT, 2) loading the target address from
> the GOT, 3) making an an indirect jump to that address.  Whereas a "plain
> indirect function call" is just 1) load target address from variable 2) making
> an indirect jump to that address. With -fno-plt the callsites themselves load
> the address from the GOT.

That sounds more accurate than what I wrote.  Thanks.




Re: glibc qsort() vulnerability

2024-02-10 Thread Nathan Bossart
On Sat, Feb 10, 2024 at 08:59:06AM +0100, Mats Kindahl wrote:
> Split the code into two patches: one that just adds the functions
> (including the new pg_cmp_size()) to common/int.h and one that starts using
> them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since
> "_t" is usually used as a suffix for types.
> 
> I added a comment to the (a > b) - (a < b) return and have also added casts
> to (int32) for the int16 and uint16 functions (we need a signed int for
> uin16 since we need to be able to get a negative number).
> 
> Changed the type of two instances that had an implicit cast from size_t to
> int and used the new pg_,cmp_size() function.
> 
> Also fixed the missed replacements in the "contrib" directory.

Thanks for the new patches.  I think the comparison in resowner.c is
backwards, and I think we should expand on some of the commentary in int.h.
For example, the comment at the top of int.h seems very tailored to the
existing functions and should probably be adjusted.  And the "comparison
routines for integers" comment might benefit from some additional details
about the purpose and guarantees of the new functions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: backend *.c #include cleanup (IWYU)

2024-02-10 Thread Nathan Bossart
On Sat, Feb 10, 2024 at 08:40:43AM +0100, Peter Eisentraut wrote:
> The purpose of this patch is twofold: One, it's of course a nice cleanup.
> Two, this is a test how well IWYU might work for us.  If we find either by
> human interpretation that a change doesn't make sense, or something breaks
> on some platform, then that would be useful feedback (perhaps to addressed
> by more pragma annotations or more test coverage).
> 
> (Interestingly, IWYU has been mentioned in src/tools/pginclude/README since
> 2012.  Has anyone else played with it?  Was it not mature enough back then?)

I haven't played with it at all, but the topic reminds me of this thread:


https://postgr.es/m/flat/CALDaNm1B9naPDTm3ox1m_yZvOm3KA5S4kZQSWWAeLHAQ%3D3gV1Q%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add native windows on arm64 support

2024-02-10 Thread Andrew Dunstan


On 2024-02-10 Sa 12:20, Dave Cramer wrote:



On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan  wrote:


On 2024-02-09 Fr 14:23, Dave Cramer wrote:


Dave Cramer
www.postgres.rocks 


On Fri, 9 Feb 2024 at 07:18, Dave Cramer
  wrote:





On Fri, 9 Feb 2024 at 00:26, Michael Paquier
 wrote:

On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.

Only with the version posted at [1]? Interesting, that's
the same
contents as v8 posted upthread, minus src/tools/ because
we don't need
to care about them anymore.

Andrew, what's happening on the test side?  It does not
seem you've
mentioned any details about what is going wrong, or I
have just missed
them.

> I'll try the buildfarm next.

[1]:

https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5...@dunslane.net


interestingly meson test does not produce any error
The buildfarm produces the following error for me:

-SELECT relname, attname, coltypes, get_columns_length(coltypes)
- FROM check_columns
- WHERE get_columns_length(coltypes) % 8 != 0 OR
-       'name'::regtype::oid = ANY(coltypes);
- relname | attname | coltypes | get_columns_length
--+-+--+
-(0 rows)
-
+server closed the connection unexpectedly
+This probably means the server terminated abnormally
+before or while processing the request.
+connection to server was lost


Actually digging some more, here is the actual error

2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  server
process (PID 11204) was terminated by exception 0xC005
2024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL:  Failed
process was running: VACUUM;
2024-02-09 13:31:11.008 -05 postmaster[10672] HINT:  See C
include file "ntstatus.h" for a description of the hexadecimal value.
2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  terminating
any other active server processes
2024-02-09 13:31:11.013 -05 postmaster[10672] LOG:  all server
processes terminated; reinitializing
2024-02-09 13:31:11.034 -05 startup[6152] LOG:  database system
was interrupted; last known up at 2024-02-09 13:31:01 -05






So how does one debug this ?

Also if I `run meson test` I don't see this error. What does the 
buildfarm do differently?



First it does this:


   meson test -C $pgsql --no-rebuild --suite setup


Then it does this (jflag is for the number of jobs):


   meson test -t $meson_test_timeout $jflag -C $pgsql --logbase
   checklog --print-errorlogs --no-rebuild --suite regress
   --test-args=--no-locale


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


RE: Psql meta-command conninfo+

2024-02-10 Thread Maiquel Grassi
>Database   | postgres
>[...]
>Host   | 127.0.0.1
>Encryption | SSL
>Protocol   | PQsslAttribute(protocol)
>Cipher | PQsslAttribute(cipher)
>Compression| PQsslAttribute(compression)
>
>When GSS, like this
>
>Database   | postgres
>[...]
>Host   | 127.0.0.1
>Encryption | GSS

--//--

Hi PgHackers,

Columns were added for SSL and GSS.


For SSL, I conducted some tests as follows. For GSS, I will perform
them and intend to provide a sample here in the next interaction.

If anyone can and wants to test GSSAPI as well, I appreciate it.

[postgres@localhost bin]$ ./psql -h localhost -p 5432 -x

psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "localhost" 
(address "::1") at port "5432".
postgres=# \conninfo+
Current Connection Information
-[ RECORD 1 ]--+--
Database   | postgres
Authenticated User | postgres
System User|
Current User   | postgres
Session User   | postgres
Backend PID| 15809
Server Address | ::1
Server Port| 5432
Client Address | ::1
Client Port| 56890
Socket Directory   |
Host   | localhost

postgres=# \q
[postgres@localhost bin]$ ./psql -h localhost -p 5433 -x
psql (17devel, server 15.6)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, 
compression: off)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "localhost" 
(address "::1") at port "5433".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, 
compression: off)
postgres=# \conninfo+
Current Connection Information
-[ RECORD 1 ]--+
Database   | postgres
Authenticated User | postgres
Current User   | postgres
Session User   | postgres
Backend PID| 15811
Server Address | ::1
Server Port| 5433
Client Address | ::1
Client Port| 40622
Socket Directory   |
Host   | localhost
Encryption | SSL
Protocol   | TLSv1.2
Cipher | ECDHE-RSA-AES256-GCM-SHA384
Compression| off

Regards,
Maiquel Grassi.


v13-0001-psql-meta-command-conninfo-plus.patch
Description: v13-0001-psql-meta-command-conninfo-plus.patch


Re: recently added jsonpath method change jsonb_path_query, jsonb_path_query_first immutability

2024-02-10 Thread Andrew Dunstan


On 2024-02-08 Th 21:02, Jeevan Chalke wrote:



On Thu, Feb 8, 2024 at 2:22 PM jian he  
wrote:


On Thu, Feb 8, 2024 at 1:27 PM Jeevan Chalke
 wrote:
>
>
>
> On Wed, Feb 7, 2024 at 9:13 PM jian he
 wrote:
>>
>> On Wed, Feb 7, 2024 at 7:36 PM Jeevan Chalke
>>  wrote:
>> > Added checkTimezoneIsUsedForCast() check where ever we are
casting timezoned to non-timezoned types and vice-versa.
>>
>> https://www.postgresql.org/docs/devel/functions-json.html
>> above Table 9.51. jsonpath Filter Expression Elements, the Note
>> section, do we also need to rephrase it?
>
>
> OK. Added a line for the same.
>

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8..37ae2d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18240,7 +18240,11 @@ ERROR:  jsonpath member accessor can only be
applied to an object
       timestamptz, and time to
timetz.
       However, all but the first of these conversions depend on
the current
        setting, and thus can only
be performed
-      within timezone-aware jsonpath functions.
+      within timezone-aware jsonpath functions. 
Similarly, other
+      date/time-related methods that convert string to the
date/time types
+      also do the casting and may involve the current
+      .  To preserve the
immutability, those can
+      only be performed within timezone-aware
jsonpath functions.
      
     

my proposed minor changes:
-      within timezone-aware jsonpath functions.
+      within timezone-aware jsonpath functions.
Similarly, other
+      date/time-related methods that convert string to the
date/time types
+      also do the casting and may involve the current
+       setting. Those conversions can
+      only be performed within timezone-aware
jsonpath functions.
I don't have a strong opinion, though.


That seems fine as well. Let's leave that to the committer.


I edited slightly to my taste, and committed the patch. Thanks.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: [PATCH] Add native windows on arm64 support

2024-02-10 Thread Dave Cramer
On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan  wrote:

>
> On 2024-02-09 Fr 14:23, Dave Cramer wrote:
>
>
> Dave Cramer
> www.postgres.rocks
>
>
> On Fri, 9 Feb 2024 at 07:18, Dave Cramer 
>  wrote:
>
>>
>>
>>
>>
>> On Fri, 9 Feb 2024 at 00:26, Michael Paquier  wrote:
>>
>>> On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
>>> > Thanks, this patch works and
>>> > testing with meson passes.
>>>
>>> Only with the version posted at [1]?  Interesting, that's the same
>>> contents as v8 posted upthread, minus src/tools/ because we don't need
>>> to care about them anymore.
>>>
>>> Andrew, what's happening on the test side?  It does not seem you've
>>> mentioned any details about what is going wrong, or I have just missed
>>> them.
>>>
>>> > I'll try the buildfarm next.
>>>
>>> [1]:
>>> https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5...@dunslane.net
>>
>>
>> interestingly meson test does not produce any error
>> The buildfarm produces the following error for me:
>>
>> -SELECT relname, attname, coltypes, get_columns_length(coltypes)
>> - FROM check_columns
>> - WHERE get_columns_length(coltypes) % 8 != 0 OR
>> -   'name'::regtype::oid = ANY(coltypes);
>> - relname | attname | coltypes | get_columns_length
>> --+-+--+
>> -(0 rows)
>> -
>> +server closed the connection unexpectedly
>> + This probably means the server terminated abnormally
>> + before or while processing the request.
>> +connection to server was lost
>>
>
> Actually digging some more, here is the actual error
>
> 2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  server process (PID
> 11204) was terminated by exception 0xC005
> 2024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL:  Failed process was
> running: VACUUM;
> 2024-02-09 13:31:11.008 -05 postmaster[10672] HINT:  See C include file
> "ntstatus.h" for a description of the hexadecimal value.
> 2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  terminating any other
> active server processes
> 2024-02-09 13:31:11.013 -05 postmaster[10672] LOG:  all server processes
> terminated; reinitializing
> 2024-02-09 13:31:11.034 -05 startup[6152] LOG:  database system was
> interrupted; last known up at 2024-02-09 13:31:01 -05
>
>
>
>
>
So how does one debug this ?

Also if I `run meson test` I don't see this error. What does the buildfarm
do differently?

Dave

> Yes, this is pretty much what I saw.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: [PATCH] Add native windows on arm64 support

2024-02-10 Thread Andrew Dunstan


On 2024-02-09 Fr 14:23, Dave Cramer wrote:


Dave Cramer
www.postgres.rocks


On Fri, 9 Feb 2024 at 07:18, Dave Cramer  
wrote:






On Fri, 9 Feb 2024 at 00:26, Michael Paquier 
wrote:

On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.

Only with the version posted at [1]?  Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we
don't need
to care about them anymore.

Andrew, what's happening on the test side?  It does not seem
you've
mentioned any details about what is going wrong, or I have
just missed
them.

> I'll try the buildfarm next.

[1]:

https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5...@dunslane.net


interestingly meson test does not produce any error
The buildfarm produces the following error for me:

-SELECT relname, attname, coltypes, get_columns_length(coltypes)
- FROM check_columns
- WHERE get_columns_length(coltypes) % 8 != 0 OR
-       'name'::regtype::oid = ANY(coltypes);
- relname | attname | coltypes | get_columns_length
--+-+--+
-(0 rows)
-
+server closed the connection unexpectedly
+This probably means the server terminated abnormally
+before or while processing the request.
+connection to server was lost


Actually digging some more, here is the actual error

2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  server process 
(PID 11204) was terminated by exception 0xC005
2024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL:  Failed process 
was running: VACUUM;
2024-02-09 13:31:11.008 -05 postmaster[10672] HINT:  See C include 
file "ntstatus.h" for a description of the hexadecimal value.
2024-02-09 13:31:11.008 -05 postmaster[10672] LOG:  terminating any 
other active server processes
2024-02-09 13:31:11.013 -05 postmaster[10672] LOG:  all server 
processes terminated; reinitializing
2024-02-09 13:31:11.034 -05 startup[6152] LOG:  database system was 
interrupted; last known up at 2024-02-09 13:31:01 -05






Yes, this is pretty much what I saw.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2024-02-10 Thread Andrew Dunstan


On 2024-02-07 We 05:37, Gabriele Bartolini wrote:

Hi Joel,

On Wed, 7 Feb 2024 at 10:00, Joel Jacobson  wrote:

On Fri, Sep 8, 2023, at 16:17, Gabriele Bartolini wrote:
> ```
> postgres=# ALTER SYSTEM SET wal_level TO minimal;
> ERROR:  could not open file "postgresql.auto.conf": Permission
denied
> ```

+1 to simply mark postgresql.auto.conf file as not being writeable.

To improve the UX experience, how about first checking if the file
is not writeable, or catch EACCESS, and add a user-friendly hint?

```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf": Permission denied
HINT: The ALTER SYSTEM command is effectively disabled as the
configuration file is set to read-only.
```


This would do - provided we fix the issue with pg_rewind not handling 
read-only files in PGDATA.




This seems like the simplest solution. And maybe we should be fixing 
pg_rewind regardless of this issue?



cheers


andrew


--

Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: What about Perl autodie?

2024-02-10 Thread Andrew Dunstan



On 2024-02-08 Th 11:08, Daniel Gustafsson wrote:

On 8 Feb 2024, at 16:53, Tom Lane  wrote:
2. Don't wait, migrate them all now.  This would mean requiring
Perl 5.10.1 or later to run the TAP tests, even in back branches.

I think #2 might not be all that radical.  We have nothing older
than 5.14.0 in the buildfarm, so we don't really have much grounds
for claiming that 5.8.3 will work today.  And 5.10.1 came out in
2009, so how likely is it that anyone cares anymore?

I would vote for this option, if we don't run the trailing edge anywhere where
breakage is visible to developers then it is like you say, far from guaranteed
to work.



+1 from me too. We kept 5.8 going for a while because it was what the 
Msys (v1) DTK perl was, but that doesn't matter any more I think.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Synchronizing slots from primary to standby

2024-02-10 Thread Amit Kapila
On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada  wrote:
>
> On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu)
>  wrote:
>
> > Another alternative is to register the callback when calling slotsync 
> > functions
> > and unregister it after the function call. And register the callback in
> > slotsyncworkmain() for the slotsync worker patch, although this may adds a 
> > few
> > more codes.
>
> Another idea is that SyncReplicationSlots() calls synchronize_slots()
> in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure
> to clear the flag in case of ERROR or FATAL. And the slotsync worker
> uses the before_shmem_callback to clear the flag.
>

+1. This sounds like a better way to clear the flag.

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-02-10 Thread John Naylor
On Tue, Feb 6, 2024 at 9:58 AM Masahiko Sawada  wrote:
>
> On Fri, Feb 2, 2024 at 8:47 PM John Naylor  wrote:

> > My todo:
> > - benchmark tid store / vacuum again, since we haven't since varlen
> > types and removing unnecessary locks.

I ran a vacuum benchmark similar to the one in [1] (unlogged tables
for reproducibility), but smaller tables (100 million records),
deleting only the last 20% of the table, and including a parallel
vacuum test. Scripts attached.

monotonically ordered int column index:

master:
system usage: CPU: user: 4.27 s, system: 0.41 s, elapsed: 4.70 s
system usage: CPU: user: 4.23 s, system: 0.44 s, elapsed: 4.69 s
system usage: CPU: user: 4.26 s, system: 0.39 s, elapsed: 4.66 s

v-59:
system usage: CPU: user: 3.10 s, system: 0.44 s, elapsed: 3.56 s
system usage: CPU: user: 3.07 s, system: 0.35 s, elapsed: 3.43 s
system usage: CPU: user: 3.07 s, system: 0.36 s, elapsed: 3.44 s

uuid column index:

master:
system usage: CPU: user: 18.22 s, system: 1.70 s, elapsed: 20.01 s
system usage: CPU: user: 17.70 s, system: 1.70 s, elapsed: 19.48 s
system usage: CPU: user: 18.48 s, system: 1.59 s, elapsed: 20.43 s

v-59:
system usage: CPU: user: 5.18 s, system: 1.18 s, elapsed: 6.45 s
system usage: CPU: user: 6.56 s, system: 1.39 s, elapsed: 7.99 s
system usage: CPU: user: 6.51 s, system: 1.44 s, elapsed: 8.05 s

int & uuid indexes in parallel:

master:
system usage: CPU: user: 4.53 s, system: 1.22 s, elapsed: 20.43 s
system usage: CPU: user: 4.49 s, system: 1.29 s, elapsed: 20.98 s
system usage: CPU: user: 4.46 s, system: 1.33 s, elapsed: 20.50 s

v59:
system usage: CPU: user: 2.09 s, system: 0.32 s, elapsed: 4.86 s
system usage: CPU: user: 3.76 s, system: 0.51 s, elapsed: 8.92 s
system usage: CPU: user: 3.83 s, system: 0.54 s, elapsed: 9.09 s

Over all, I'm pleased with these results, although I'm confused why
sometimes with the patch the first run reports running faster than the
others. I'm curious what others get. Traversing a tree that lives in
DSA has some overhead, as expected, but still comes out way ahead of
master.

There are still some micro-benchmarks we could do on tidstore, and
it'd be good to find out worse-case memory use (1 dead tuple each on
spread-out pages), but this is decent demonstration.

> > I'm not sure what the test_node_types_* functions are testing that
> > test_basic doesn't. They have a different, and confusing, way to stop
> > at every size class and check the keys/values. It seems we can replace
> > all that with two more calls (asc/desc) to test_basic, with the
> > maximum level.

v58-0008:

+ /* borrowed from RT_MAX_SHIFT */
+ const int max_shift = (pg_leftmost_one_pos64(UINT64_MAX) /
BITS_PER_BYTE) * BITS_PER_BYTE;

This is harder to read than "64 - 8", and doesn't really help
maintainability either.
Maybe "(sizeof(uint64) - 1) * BITS_PER_BYTE" is a good compromise.

+ /* leaf nodes */
+ test_basic(test_info, 0);

+ /* internal nodes */
+ test_basic(test_info, 8);
+
+ /* max-level nodes */
+ test_basic(test_info, max_shift);

This three-way terminology is not very informative. How about:

+   /* a tree with one level, i.e. a single node under the root node. */
 ...
+   /* a tree with two levels */
 ...
+   /* a tree with the maximum number of levels */

+static void
+test_basic(rt_node_class_test_elem *test_info, int shift)
+{
+ elog(NOTICE, "testing node %s with shift %d", test_info->class_name, shift);
+
+ /* Test nodes while changing the key insertion order */
+ do_test_basic(test_info->nkeys, shift, false);
+ do_test_basic(test_info->nkeys, shift, true);

Adding a level of indirection makes this harder to read, and do we
still know whether a test failed in asc or desc keys?

> > My earlier opinion was that "handle" was a nicer variable name, but
> > this brings back the typedef and also keeps the variable name I didn't
> > like, but pushes it down into the function. I'm a bit confused, so
> > I've kept these not-squashed for now.
>
> I misunderstood your comment. I've changed to use a variable name
> rt_handle and removed the TidStoreHandle type. 0013 patch.

(diff against an earlier version)
-   pvs->shared->dead_items_handle = TidStoreGetHandle(dead_items);
+   pvs->shared->dead_items_dp = TidStoreGetHandle(dead_items);

Shall we use "handle" in vacuum_parallel.c as well?

> > I'm pretty sure there's an
> > accidental memset call that crept in there, but I'm running out of
> > steam today.

I have just a little bit of work to add for v59:

v59-0009 - set_offset_bitmap_at() will call memset if it needs to zero
any bitmapwords. That can only happen if e.g. there is an offset > 128
and there are none between 64 and 128, so not a huge deal but I think
it's a bit nicer in this patch.

> > >  * WIP: notes about traditional radix tree trading off span vs height...
> > >
> > > Are you going to write it?
> >
> > Yes, when I draft a rough commit message, (for next time).

I haven't gotten to the commit message, but:

v59-0004 - I did some rew

Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-02-10 Thread Tomas Vondra
On 2/8/24 14:27, wenhui qiu wrote:
> Hi Heikki Linnakangas
> I think the larger shared buffer  higher the probability of multiple
> backend processes accessing the same bucket slot BufMappingLock
> simultaneously, (   InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS); When I
> have free time, I want to do this test. I have seen some tests, but the
> result report is in Chinese
> 

I think Heikki is right this is unrelated to the amount of RAM. The
partitions are meant to reduce the number of lock collisions when
multiple processes try to map a buffer concurrently. But the machines
got much larger in this regard too - in 2006 the common CPUs had maybe
2-4 cores, now it's common to have CPUs with ~100 cores, and systems
with multiple of them. OTOH the time spent holing the partition lock
should be pretty low, IIRC we pretty much just pin the buffer before
releasing it, and the backend should do plenty other expensive stuff. So
who knows how many backends end up doing the locking at the same time.

OTOH, with 128 partitions it takes just 14 backends to have 50% chance
of a conflict, so with enough cores ... But how many partitions would be
enough? With 1024 partitions it still takes only 38 backends to get 50%
chance of a collision. Better, but considering we now have hundreds of
cores, not sure if sufficient.

(Obviously, we probably want much lower probability of a collision, I
only used 50% to illustrate the changes).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Synchronizing slots from primary to standby

2024-02-10 Thread Masahiko Sawada
On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, February 9, 2024 2:44 PM Masahiko Sawada  
> wrote:
> >
> > On Thu, Feb 8, 2024 at 8:01 PM shveta malik  wrote:
> > >
> > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith 
> > wrote:
> > > >
> > > > Here are some review comments for patch v80_2-0001.
> > >
> > > Thanks for the feedback Peter. Addressed the comments in v81.
> > > Attached patch001 for early feedback. Rest of the patches need
> > > rebasing and thus will post those later.
> > >
> > > It also addresses comments by Amit in [1].
> >
> > Thank you for updating the patch! Here are random comments:
>
> Thanks for the comments!
>
> >
> > ---
> > +
> > +/*
> > + * Register the callback function to clean up the shared memory of
> > slot
> > + * synchronization.
> > + */
> > +SlotSyncInitialize();
> >
> > I think it would have a wider impact than expected. IIUC this callback is 
> > needed
> > only for processes who calls synchronize_slots(). Why do we want all 
> > processes
> > to register this callback?
>
> I think the current style is similar to the ReplicationSlotInitialize() above 
> it. For backend,
> both of them can only be used when user calls slot SQL functions. So, I think 
> it could be fine to
> register it at the general place which can also avoid registering the same 
> again for the later
> slotsync worker patch.

Yes, but it seems to be a legitimate case since replication slot code
involves many functions that need the callback to clear the flag. On
the other hand, in the slotsync code, only one function,
SyncReplicationSlots(), needs the callback at least in 0001 patch.

> Another alternative is to register the callback when calling slotsync 
> functions
> and unregister it after the function call. And register the callback in
> slotsyncworkmain() for the slotsync worker patch, although this may adds a few
> more codes.

Another idea is that SyncReplicationSlots() calls synchronize_slots()
in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure
to clear the flag in case of ERROR or FATAL. And the slotsync worker
uses the before_shmem_callback to clear the flag.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-02-10 Thread Zhijie Hou (Fujitsu)
On Saturday, February 10, 2024 11:37 AM Zhijie Hou (Fujitsu) 
 wrote:
> 
> Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3]
> comments. Only 0001 is sent in this version, we will send other patches after
> rebasing.
> 
> [1]
> https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B
> 3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBs
> TS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com
> [3]
> https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUa
> mR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com

I noticed one cfbot failure that the slot is not synced when the standby is
lagging behind the subscriber. I have modified the test to disable the sub
before syncing to avoid this failure. Attach the V83_2 patch, no other code
changes are included in this version.

Best Regards,
Hou zj


v83_2-0001-Add-a-slot-synchronization-function.patch
Description: v83_2-0001-Add-a-slot-synchronization-function.patch