Re: Perform streaming logical transactions by background workers and parallel apply

2022-09-24 Thread Amit Kapila
On Thu, Sep 22, 2022 at 3:41 PM Amit Kapila  wrote:
>
> On Thu, Sep 22, 2022 at 8:59 AM wangw.f...@fujitsu.com
>  wrote:
> >
>
> Few comments on v33-0001
> ===
>

Some more comments on v33-0001
==============================
1.
+ /* Information from the corresponding LogicalRepWorker slot. */
+ uint16 logicalrep_worker_generation;
+
+ int logicalrep_worker_slot_no;
+} ParallelApplyWorkerShared;

Both these variables are read/changed by the leader/parallel workers
without using any lock (mutex). It seems there is currently no problem
because of the way the patch uses in_parallel_apply_xact, but I don't
think it is a good idea to rely on that. I suggest using a mutex to
operate on these variables, and also checking that the slot_no is in a
valid range after reading it in parallel_apply_free_worker, otherwise
erroring out using elog.
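A rough self-contained sketch of the suggested locking (all names here are
illustrative, not from the patch; PostgreSQL's slock_t with
SpinLockAcquire/SpinLockRelease is replaced by a C11 atomic_flag spinlock,
and elog(ERROR, ...) by a plain report, so the sketch compiles on its own):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_WORKER_SLOTS 4		/* illustrative stand-in for the slot-array size */

/*
 * Illustrative stand-in for ParallelApplyWorkerShared: the two fields the
 * review asks to protect, plus the suggested mutex.
 */
typedef struct WorkerShared
{
	atomic_flag mutex;			/* protects the fields below */
	unsigned short generation;
	int			slot_no;
} WorkerShared;

static void
shared_lock(WorkerShared *ws)
{
	while (atomic_flag_test_and_set(&ws->mutex))
		;						/* spin until the flag is acquired */
}

static void
shared_unlock(WorkerShared *ws)
{
	atomic_flag_clear(&ws->mutex);
}

/*
 * Read slot_no under the lock, then validate it before use, as suggested
 * above.  In the real patch the failure path would be elog(ERROR, ...).
 */
static int
get_slot_no_checked(WorkerShared *ws)
{
	int			slot_no;

	shared_lock(ws);
	slot_no = ws->slot_no;
	shared_unlock(ws);

	if (slot_no < 0 || slot_no >= MAX_WORKER_SLOTS)
	{
		fprintf(stderr, "invalid worker slot number: %d\n", slot_no);
		exit(1);
	}
	return slot_no;
}
```

The point of the range check is that a torn or stale read can otherwise be
used to index the worker-slot array silently.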

2.
 static void
 apply_handle_stream_stop(StringInfo s)
 {
- if (!in_streamed_transaction)
+ ParallelApplyWorkerInfo *winfo = NULL;
+ TransApplyAction apply_action;
+
+ if (!am_parallel_apply_worker() &&
+ (!in_streamed_transaction && !stream_apply_worker))
  ereport(ERROR,
  (errcode(ERRCODE_PROTOCOL_VIOLATION),
  errmsg_internal("STREAM STOP message without STREAM START")));

This check won't be able to detect missing stream start messages for
parallel apply workers beyond the first start/stop pair. I thought of
adding an in_remote_transaction check along with
am_parallel_apply_worker() to detect the same, but that also won't work
because the parallel worker doesn't reset it at the stop message.
Another possibility is to introduce yet another variable for this, but
that doesn't seem worth it. I would like to keep this check simple.
Can you think of any better way?

3. I think we can skip sending start/stop messages from the leader to
the parallel worker because, unlike the leader apply worker, it will
process only one transaction at a time. However, it is not clear whether
that is worth the effort, because these messages are sent only once
logical_decoding_work_mem is exceeded. For now, I have added a comment
for this in the attached patch, but let me know if I am missing
something or if I am wrong.

4.
postgres=# select pid, leader_pid, application_name, backend_type from
pg_stat_activity;
  pid  | leader_pid | application_name |         backend_type
-------+------------+------------------+------------------------------
 27624 |            |                  | logical replication launcher
 17336 |            | psql             | client backend
 26312 |            |                  | logical replication worker
 26376 |            | psql             | client backend
 14004 |            |                  | logical replication worker

Here, the second "logical replication worker" entry is for the parallel
worker. Isn't it better to distinguish it by reporting the type as
"logical replication parallel worker"? I think for this you need to
change bgw_type in logicalrep_worker_launch().
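A minimal sketch of the bgw_type change this suggests (the boolean flag is
hypothetical; the real logicalrep_worker_launch() would derive it from the
worker's role, and BGW_MAXLEN comes from bgworker.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BGW_MAXLEN 96			/* same limit as PostgreSQL's bgworker.h */

/*
 * Pick a distinct bgw_type string for parallel apply workers so that
 * pg_stat_activity.backend_type can tell them apart from the leader
 * apply worker.  The flag is an illustrative stand-in.
 */
static void
set_bgw_type(char *bgw_type, bool is_parallel_apply_worker)
{
	if (is_parallel_apply_worker)
		snprintf(bgw_type, BGW_MAXLEN, "logical replication parallel worker");
	else
		snprintf(bgw_type, BGW_MAXLEN, "logical replication worker");
}
```

With something along these lines, the pg_stat_activity output above would
show two distinct backend_type values instead of two identical ones.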

5. Can we rename parallel_apply_subxact_info_add() to
parallel_apply_start_subtrans()?

Apart from the above, I have added/edited a few comments and made a
few other cosmetic changes in the attached.

-- 
With Regards,
Amit Kapila.


changes_atop_v33_1_amit.patch
Description: Binary data


Re: HOT chain validation in verify_heapam()

2022-09-24 Thread Himanshu Upadhyaya
On Tue, Sep 20, 2022 at 6:43 PM Robert Haas  wrote:

> I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message
> claiming that the CID has a certain value when that's actually a combo
> CID is misleading, so at least a different message wording is needed
> in such cases. But it's also not clear to me that the newer update has
> to have a higher combo CID, because combo CIDs can be reused. If you
> have multiple cursors open in the same transaction, the updates can be
> interleaved, and it seems to me that it might be possible for an older
> CID to have created a certain combo CID after a newer CID, and then
> both cursors could update the same page in succession and end up with
> combo CIDs out of numerical order. Unless somebody can make a
> convincing argument that this isn't possible, I think we should just
> skip this check for cases where either tuple has a combo CID.
>
Here our objective is to validate that, when the predecessor's xmin and
the current tuple's xmin are the same, the predecessor's cmin is less
than the current tuple's cmin. When both tuples' xmins are the same, I
think the predecessor's t_cid will always hold a combo CID. Then one or
both tuples will always have a combo CID, and skipping the check
whenever "either tuple has a combo CID" would make this if condition
always evaluate to false.


> +if (TransactionIdEquals(pred_xmin, curr_xmin) &&
> +(TransactionIdEquals(curr_xmin, curr_xmax) ||
> + !TransactionIdIsValid(curr_xmax)) && pred_cmin >=
> curr_cmin)
>
> I don't understand the reason for the middle part of this condition --
> TransactionIdEquals(curr_xmin, curr_xmax) ||
> !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
> explain this, but I still don't get it. If a tuple with XMIN 12345
> CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
> corruption, regardless of what the XMAX of the second tuple may happen
> to be.
>

 tuple | t_xmin | t_xmax | t_cid | t_ctid |            tuple_data_split             | heap_tuple_infomask_flags
-------+--------+--------+-------+--------+-----------------------------------------+---------------------------
     1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
     2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x0200"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
     3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})

In the above case, tuples 1->3->4 are inserted and updated by xid 971,
and tuple 4 is then updated by xid 972. Here the t_cid of tuple 4 is 0
whereas its predecessor's t_cid is 1, because tuple 4's t_cid holds the
command ID of the deleting transaction (cmax); that is why we need to
check the xmax of the tuple.

Please correct me if I am missing anything here.
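For reference, the combo-CID skip under discussion could be sketched as
follows. The typedefs and the HEAP_COMBOCID value are simplified stand-ins
for the real definitions in c.h and htup_details.h, and the actual
verify_heapam() check carries more conditions than this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's types. */
typedef uint32_t TransactionId;
typedef uint32_t CommandId;

#define HEAP_COMBOCID 0x0020	/* t_cid is a combo CID */

/*
 * Only treat pred_cmin >= curr_cmin as corruption when both tuples share
 * an xmin and neither t_cid is a combo CID: combo CIDs can be reused, so
 * they are not numerically ordered and cannot be compared this way.
 */
static bool
cmin_order_is_corrupt(TransactionId pred_xmin, TransactionId curr_xmin,
					  CommandId pred_cmin, CommandId curr_cmin,
					  uint16_t pred_infomask, uint16_t curr_infomask)
{
	if (pred_xmin != curr_xmin)
		return false;			/* different transactions: nothing to compare */

	if ((pred_infomask & HEAP_COMBOCID) || (curr_infomask & HEAP_COMBOCID))
		return false;			/* combo CIDs are not ordered; skip the check */

	return pred_cmin >= curr_cmin;
}
```

Note that under this formulation the tuple-1 -> tuple-3 pair from the table
above (both combo CIDs) is skipped, which is exactly the behavior being
debated here.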
-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com


Re: Pluggable toaster

2022-09-24 Thread Nikita Malakhov
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased
one, rebased onto the current master (from today). The third patch
contains the documentation package, and the second one contains a large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking the patch sources I have a strong opinion that it
needs some refactoring: move all files related to the TOAST
implementation into a new folder, /backend/access/toast, where the
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with
reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster
API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov  wrote:

> Hi hackers!
>
> Cfbot is not happy with previous patchset, so I'm attaching new one,
> rebased onto current master
> (15b4). Also providing patch with documentation package, and the second
> one contains large
> README.toastapi file providing additional in-depth docs for developers.
>
> Comments would be greatly appreciated.
>
> Also, after checking patch sources I have a strong opinion that it needs
> some refactoring -
> move all files related to TOAST implementation into new folder
> /backend/access/toast where
> Generic (default) Toaster resides.
>
> Patchset consists of:
> v15-0001-toaster-interface.patch - Pluggable TOAST API interface along
> with reference TOAST mechanics;
> v15-0002-toaster-default.patch - Default TOAST re-implemented using
> Toaster API;
> v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>
> On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion 
> wrote:
>
>> On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov 
>> wrote:
>> > It would be more clear for complex data types like JSONB, where
>> developers could
>> > need some additional functionality to work with internal representation
>> of data type,
>> > and its full potential is revealed in our JSONB toaster extension. The
>> JSONB toaster
>> > is still in development but we plan to make it available soon.
>>
>> Okay. It'll be good to have that, because as it is now it's hard to
>> see the whole picture.
>>
>> > On installing dummy_toaster contrib: I've just checked it by making a
>> patch from commit
>> > and applying onto my clone of master and 2 patches provided in previous
>> email without
>> > any errors and all checks passed - applying with git am, configure with
>> debug, cassert,
>> > depend and enable-tap-tests flags and run checks.
>> > Please advise what would cause such behavior?
>>
>> I don't think the default pg_upgrade tests will upgrade contrib
>> objects (there are instructions in src/bin/pg_upgrade/TESTING that
>> cover manual dumps, if you prefer that method). My manual steps were
>> roughly
>>
>> =# CREATE EXTENSION dummy_toaster;
>> =# CREATE TABLE test (t TEXT
>> STORAGE external
>> TOASTER dummy_toaster_handler);
>> =# \q
>> $ initdb -D newdb
>> $ pg_ctl -D olddb stop
>> $ pg_upgrade -b /bin -B /bin -d
>> ./olddb -D ./newdb
>>
>> (where /bin is on the PATH, so we're using the right
>> binaries).
>>
>> Thanks,
>> --Jacob
>>
>
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> https://postgrespro.ru/
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


v16-0003-toaster-docs.patch.gz
Description: GNU Zip compressed data


v16-0002-toaster-default.patch.gz
Description: GNU Zip compressed data


v16-0001-toaster-interface.patch.gz
Description: GNU Zip compressed data


Re: pg_stat_statements and "IN" conditions

2022-09-24 Thread Dmitry Dolgov
> On Fri, Sep 16, 2022 at 09:25:13PM +0300, Sergei Kornilov wrote:
> Hello!
>
> Unfortunately the patch needs another rebase due to the recent split of guc.c 
> (0a20ff54f5e66158930d5328f89f087d4e9ab400)
>
> I'm reviewing a patch on top of a previous commit and noticed a failed test:
>
> #   Failed test 'no parameters missing from postgresql.conf.sample'
> #   at t/003_check_guc.pl line 82.
> #  got: '1'
> # expected: '0'
> # Looks like you failed 1 test of 3.
> t/003_check_guc.pl ..
>
> The new option has not been added to the postgresql.conf.sample
>
> PS: I would also like to have such a feature. It's hard to increase 
> pg_stat_statements.max or lose some entries just because some ORM sends 
> requests with a different number of parameters.

Thanks! I'll post the rebased version soon.




Re: tweak to a few index tests to hits ambuildempty() routine.

2022-09-24 Thread Noah Misch
On Wed, Sep 21, 2022 at 02:10:42PM +0700, a.kozhemya...@postgrespro.ru wrote:
> After analyzing this, I found out why we don't reach that Assert but we have
> coverage shown - firstly, it reached via another test, vacuum; secondly, it
> depends on the gcc optimization flag. We reach that Assert only when using
> -O0.
> If we build with -O2 or -Og that function is not reached (due to different
> results of the heap_prune_satisfies_vacuum() check inside
> heap_page_prune()).

With "make check MAX_CONNECTIONS=1", does that difference between -O0 and -O2
still appear?  Compiler optimization shouldn't consistently change pruning
decisions.  It could change pruning decisions probabilistically, by changing
which parallel actions overlap.  If the difference disappears under
MAX_CONNECTIONS=1, the system is likely fine.




Re: [BUG] Logical replica crash if there was an error in a function.

2022-09-24 Thread Tom Lane
"Anton A. Melnikov"  writes:
> [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ]

I took a quick look at this.  I think you're solving the
problem in the wrong place.  The real issue is why are
we not setting up ActivePortal correctly when running
user-defined code in a logrep worker?  There is other code
that expects that to be set, eg EnsurePortalSnapshotExists.

regards, tom lane




Re: identifying the backend that owns a temporary schema

2022-09-24 Thread Tom Lane
Nathan Bossart  writes:
> On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote:
>> Alternately should pg_stat_activity show the actual temp schema name
>> instead of the id? I don't recall if it's visible outside the backend
>> but if it is, could pg_stat_activity show whether the temp schema is
>> actually attached or not?

> I'm open to adding the backend ID or the temp schema name to
> pg_stat_activity, but I wouldn't be surprised to learn that others aren't.

FWIW, I'd vote against adding the temp schema per se.  We can see from
outside whether the corresponding temp schema exists, but we can't readily
tell whether the session has decided to use it, so attributing it to the
session is a bit dangerous.  Maybe there is an argument for having
sessions report it to pgstats when they do adopt a temp schema, but I
think there'd be race conditions, rollback after error, and other issues
to contend with there.

The proposed patch seems like an independent first step in any case.

One thing I don't like about it documentation-wise is that it leaves
the concept of backend ID pretty much completely undefined.

regards, tom lane




Re: [RFC] building postgres with meson - v13

2022-09-24 Thread Tom Lane
... btw, shouldn't the CF entry [1] get closed now?
The cfbot's unhappy that the last patch no longer applies.

regards, tom lane

[1] https://commitfest.postgresql.org/39/3395/




Re: cpluspluscheck complains about use of register

2022-09-24 Thread Andres Freund
Hi,

On 2022-03-08 10:59:02 -0800, Andres Freund wrote:
> On 2022-03-08 13:46:36 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > When running cpluspluscheck I get many many complaints like
> > > /tmp/pg-test-repo/src/include/port/atomics/arch-x86.h:143:23: warning: 
> > > ISO C++17 does not allow ‘register’ storage class specifier [-Wregister]
> >
> > Interesting, I don't see that here.
> 
> Probably a question of the gcc version. I think starting with 11 g++ defaults
> to C++ 17.
> 
> 
> > > It seems we should just remove the use of register?
> >
> > I have a vague idea that it was once important to say "register" if
> > you are going to use the variable in an asm snippet that requires it
> > to be in a register.  That might be wrong, or it might be obsolete
> > even if once true.  We could try taking these out and seeing if the
> > buildfarm complains.
> 
> We have several inline asm statements not using register despite using
> variables in a register (e.g. pg_atomic_compare_exchange_u32_impl()), so I
> wouldn't expect a problem with compilers we support.
> 
> Should we make configure test for -Wregister? There's at least one additional
> use of register that we'd have to change (pg_regexec).
> 
> 
> > (If so, maybe -Wno-register would help?)
> 
> That's what I did to work around the flood of warnings locally, so it'd
> work.

I hit this again while porting cpluspluscheck to be invoked by meson as
well. ISTM that we should just remove the uses of register. Yes, some very
old compilers might generate worse code without register, but I don't
think we need to care about peak efficiency with neolithic compilers.

Fabien raised the concern that removing register might lead to
accidentally adding pointers to such variables. I don't find that
convincing, because a) such code is typically inside an inline helper
anyway, and b) we don't use register widely enough to ensure this.


Attached is a patch removing uses of register. The use in regexec.c could
remain, since we only try to keep headers C++ clean. But there really doesn't
seem to be a good reason to use register in that spot.

I tried to use -Wregister to keep us honest going forward, but unfortunately
it only works with a C++ compiler...

I tested this by redefining register to something else, and I grepped for
non-comment uses of register. Entirely possible that I missed something.

Greetings,

Andres Freund
>From 03bf971d2dc701d473705fd00891028d140dd5ae Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 24 Sep 2022 12:01:06 -0700
Subject: [PATCH v1] Remove uses of register due to incompatibility with C++17
 and up

The use in regexec.c could remain, since we only try to keep headers C++
clean. But there really doesn't seem to be a good reason to use register in
that spot.

Discussion: https://postgr.es/m/20220308185902.ibdqmasoaunzj...@alap3.anarazel.de
---
 src/include/port/atomics/arch-x86.h |  2 +-
 src/include/storage/s_lock.h| 14 +++---
 src/backend/regex/regexec.c |  2 +-
 .cirrus.yml |  4 +---
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/include/port/atomics/arch-x86.h b/src/include/port/atomics/arch-x86.h
index cef1ba724c9..6c0b917f12e 100644
--- a/src/include/port/atomics/arch-x86.h
+++ b/src/include/port/atomics/arch-x86.h
@@ -140,7 +140,7 @@ pg_spin_delay_impl(void)
 static inline bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-	register char _res = 1;
+	char		_res = 1;
 
 	__asm__ __volatile__(
 		"	lock			\n"
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 65aa66c5984..4225d9b7fc3 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -142,7 +142,7 @@ typedef unsigned char slock_t;
 static __inline__ int
 tas(volatile slock_t *lock)
 {
-	register slock_t _res = 1;
+	slock_t		_res = 1;
 
 	/*
 	 * Use a non-locking test before asserting the bus lock.  Note that the
@@ -223,7 +223,7 @@ typedef unsigned char slock_t;
 static __inline__ int
 tas(volatile slock_t *lock)
 {
-	register slock_t _res = 1;
+	slock_t		_res = 1;
 
 	__asm__ __volatile__(
 		"	lock			\n"
@@ -356,7 +356,7 @@ typedef unsigned char slock_t;
 static __inline__ int
 tas(volatile slock_t *lock)
 {
-	register slock_t _res;
+	slock_t		_res;
 
 	/*
 	 *	See comment in src/backend/port/tas/sunstudio_sparc.s for why this
@@ -511,9 +511,9 @@ typedef unsigned int slock_t;
 static __inline__ int
 tas(volatile slock_t *lock)
 {
-	register volatile slock_t *_l = lock;
-	register int _res;
-	register int _tmp;
+	volatile slock_t *_l = lock;
+	int			_res;
+	int			_tmp;
 
 	__asm__ __volatile__(
 		"   .set push   \n"
@@ -574,7 +574,7 @@ static __inline__ int
 tas(volatile slock_t *lock)
 {
 	volatile int *lockword = TAS_ACTIVE_WORD(lock);
-	register int lockval;
+	int			lockval;
 
 	/*
 	 * The LDCWX instruction atomically clears the target word and
diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c

Re: cpluspluscheck complains about use of register

2022-09-24 Thread Peter Geoghegan
On Sat, Sep 24, 2022 at 12:11 PM Andres Freund  wrote:
> I hit this again while porting cpluspluscheck to be invoked by meson as
> well. ISTM that we should just remove the uses of register. Yes, some very old
> compilers might generate worse code without register, but I don't think we
> need to care about peak efficiency with neolithic compilers.

+1. I seem to recall reading that the register keyword was basically
useless as long as 15 years ago.

-- 
Peter Geoghegan




Re: cpluspluscheck complains about use of register

2022-09-24 Thread Tom Lane
Andres Freund  writes:
> I hit this again while porting cpluspluscheck to be invoked by meson as
> well. ISTM that we should just remove the uses of register.

OK by me.

> I tried to use -Wregister to keep us honest going forward, but unfortunately
> it only works with a C++ compiler...

I think we only really care about stuff that cpluspluscheck would spot,
so I don't feel a need to mess with the standard compilation flags.

regards, tom lane




Re: cpluspluscheck complains about use of register

2022-09-24 Thread Andres Freund
Hi,

On 2022-09-24 16:01:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I hit this again while porting cpluspluscheck to be invoked by meson as
> > well. ISTM that we should just remove the uses of register.
> 
> OK by me.

Done. Thanks Tom, Peter.




Re: Pluggable toaster

2022-09-24 Thread Nikita Malakhov
Hi hackers!
The last patchset has an invalid patch file, v16-0003-toaster-docs.patch.
Here's a corrected patchset; sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov  wrote:

> Hi hackers!
>
> Cfbot is still not happy with the patchset, so I'm attaching a rebased
> one, rebased onto the current
> master (from today). The third patch contains documentation package, and
> the second one contains large
> README.toastapi file providing additional in-depth docs for developers.
>
> Comments would be greatly appreciated.
>
> Again, after checking patch sources I have a strong opinion that it needs
> some refactoring -
> move all files related to TOAST implementation into new folder
> /backend/access/toast where
> Generic (default) Toaster resides.
>
> Patchset consists of:
> v16-0001-toaster-interface.patch - Pluggable TOAST API interface along
> with reference TOAST mechanics;
> v16-0002-toaster-default.patch - Default TOAST re-implemented using
> Toaster API;
> v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>
> Actual GitHub branch resides at
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
>
> On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov 
> wrote:
>
>> Hi hackers!
>>
>> Cfbot is not happy with previous patchset, so I'm attaching new one,
>> rebased onto current master
>> (15b4). Also providing patch with documentation package, and the second
>> one contains large
>> README.toastapi file providing additional in-depth docs for developers.
>>
>> Comments would be greatly appreciated.
>>
>> Also, after checking patch sources I have a strong opinion that it needs
>> some refactoring -
>> move all files related to TOAST implementation into new folder
>> /backend/access/toast where
>> Generic (default) Toaster resides.
>>
>> Patchset consists of:
>> v15-0001-toaster-interface.patch - Pluggable TOAST API interface along
>> with reference TOAST mechanics;
>> v15-0002-toaster-default.patch - Default TOAST re-implemented using
>> Toaster API;
>> v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>>
>> On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion 
>> wrote:
>>
>>> On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov 
>>> wrote:
>>> > It would be more clear for complex data types like JSONB, where
>>> developers could
>>> > need some additional functionality to work with internal
>>> representation of data type,
>>> > and its full potential is revealed in our JSONB toaster extension. The
>>> JSONB toaster
>>> > is still in development but we plan to make it available soon.
>>>
>>> Okay. It'll be good to have that, because as it is now it's hard to
>>> see the whole picture.
>>>
>>> > On installing dummy_toaster contrib: I've just checked it by making a
>>> patch from commit
>>> > and applying onto my clone of master and 2 patches provided in
>>> previous email without
>>> > any errors and all checks passed - applying with git am, configure
>>> with debug, cassert,
>>> > depend and enable-tap-tests flags and run checks.
>>> > Please advise what would cause such behavior?
>>>
>>> I don't think the default pg_upgrade tests will upgrade contrib
>>> objects (there are instructions in src/bin/pg_upgrade/TESTING that
>>> cover manual dumps, if you prefer that method). My manual steps were
>>> roughly
>>>
>>> =# CREATE EXTENSION dummy_toaster;
>>> =# CREATE TABLE test (t TEXT
>>> STORAGE external
>>> TOASTER dummy_toaster_handler);
>>> =# \q
>>> $ initdb -D newdb
>>> $ pg_ctl -D olddb stop
>>> $ pg_upgrade -b /bin -B /bin -d
>>> ./olddb -D ./newdb
>>>
>>> (where /bin is on the PATH, so we're using the right
>>> binaries).
>>>
>>> Thanks,
>>> --Jacob
>>>
>>
>>
>> --
>> Regards,
>> Nikita Malakhov
>> Postgres Professional
>> https://postgrespro.ru/
>>
>
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> https://postgrespro.ru/
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


v17-0003-toaster-docs.patch.gz
Description: GNU Zip compressed data


v17-0002-toaster-default.patch.gz
Description: GNU Zip compressed data


v17-0001-toaster-interface.patch.gz
Description: GNU Zip compressed data


Re: [RFC] building postgres with meson - v13

2022-09-24 Thread Peter Geoghegan
On Thu, Sep 22, 2022 at 2:50 PM Andres Freund  wrote:
> meson:
>
> time meson test
> real0m42.178s
> user7m8.533s
> sys 2m17.711s

I find that a more or less comparable test run on my workstation
(which has a Ryzen 9 5950X) takes just over 38 seconds. I think that
the improvement is far more pronounced on that machine compared to a
much older workstation.

One more question about this, that wasn't covered by the Wiki page: is
there some equivalent to "make installcheck" with meson builds?

-- 
Peter Geoghegan




Re: pg_stat_statements and "IN" conditions

2022-09-24 Thread Dmitry Dolgov
> On Sat, Sep 24, 2022 at 04:07:14PM +0200, Dmitry Dolgov wrote:
> > On Fri, Sep 16, 2022 at 09:25:13PM +0300, Sergei Kornilov wrote:
> > Hello!
> >
> > Unfortunately the patch needs another rebase due to the recent split of 
> > guc.c (0a20ff54f5e66158930d5328f89f087d4e9ab400)
> >
> > I'm reviewing a patch on top of a previous commit and noticed a failed test:
> >
> > #   Failed test 'no parameters missing from postgresql.conf.sample'
> > #   at t/003_check_guc.pl line 82.
> > #  got: '1'
> > # expected: '0'
> > # Looks like you failed 1 test of 3.
> > t/003_check_guc.pl ..
> >
> > The new option has not been added to the postgresql.conf.sample
> >
> > PS: I would also like to have such a feature. It's hard to increase 
> > pg_stat_statements.max or lose some entries just because some ORM sends 
> > requests with a different number of parameters.
>
> Thanks! I'll post the rebased version soon.

And here it is.
>From 327673290bfad8cebc00f9706a25d11034d2245d Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sun, 24 Jul 2022 11:43:25 +0200
Subject: [PATCH v9] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on number of parameters, because every element of ArrayExpr is
jumbled. In certain situations it's undesirable, especially if the list
becomes too large.

Make Const expressions contribute nothing to the jumble hash if they're
a part of an ArrayExpr, which length is larger than specified threshold.
Allow to configure the threshold via the new GUC const_merge_threshold
with the default value zero, which disables this feature.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane
Tested-by: Chengxi Sun
---
 .../expected/pg_stat_statements.out   | 412 ++
 .../pg_stat_statements/pg_stat_statements.c   |  33 +-
 .../sql/pg_stat_statements.sql| 107 +
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/utils/misc/guc_tables.c   |  13 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/backend/utils/misc/queryjumble.c  | 105 -
 src/include/utils/queryjumble.h   |   5 +-
 9 files changed, 712 insertions(+), 19 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out 
b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index ff0166fb9d..858cf49e66 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1102,4 +1102,416 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query 
LIKE '%SELECT GROUPING%';
  2
 (1 row)
 
+--
+-- Consts merging
+--
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query 
 | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6) 
 | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7) 
 | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8) 
 | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) 
 | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, 
$10) | 1
+ SELECT pg_stat_statements_reset() 
 | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"
 | 0
+(7 rows)
+
+-- Normal
+SET const_merge_threshold = 5;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | 
calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)  | 1

Re: [RFC] building postgres with meson - v13

2022-09-24 Thread Andres Freund
Hi,

On 2022-09-24 16:56:20 -0700, Peter Geoghegan wrote:
> On Thu, Sep 22, 2022 at 2:50 PM Andres Freund  wrote:
> > meson:
> >
> > time meson test
> > real0m42.178s
> > user7m8.533s
> > sys 2m17.711s
> 
> I find that a more or less comparable test run on my workstation
> (which has a Ryzen 9 5950X) takes just over 38 seconds. I think that
> the improvement is far more pronounced on that machine compared to a
> much older workstation.

Cool!


> One more question about this, that wasn't covered by the Wiki page: is
> there some equivalent to "make installcheck" with meson builds?

Not yet. Nothing impossible, just not done yet. Partially because installcheck
is so poorly defined (run against an already running server for pg_regress vs
using "system" installed binaries for tap tests).

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - v13

2022-09-24 Thread Peter Geoghegan
On Sat, Sep 24, 2022 at 5:13 PM Andres Freund  wrote:
> > One more question about this, that wasn't covered by the Wiki page: is
> > there some equivalent to "make installcheck" with meson builds?
>
> Not yet. Nothing impossible, just not done yet. Partially because installcheck
> is so poorly defined (run against an already running server for pg_regress vs
> using "system" installed binaries for tap tests).

Got it. I can work around that by just having an old autoconf-based
vpath build directory. I'll need to do this when I run Valgrind.

My workaround would be annoying if I needed to run "installcheck"
anywhere near as frequently as I run "make check-world". But that
isn't the case. meson delivers a significant improvement in the metric
that really matters to me, so I can't really complain.

-- 
Peter Geoghegan




Re: Consider parallel for lateral subqueries with limit

2022-09-24 Thread James Coleman
On Thu, Sep 22, 2022 at 5:19 PM James Coleman  wrote:
>
> On Mon, Sep 19, 2022 at 4:29 PM Robert Haas  wrote:
> >
> > On Mon, Sep 19, 2022 at 3:58 PM James Coleman  wrote:
> > > But in the case where there's correlation via LATERAL we already don't
> > > guarantee unique executions for a given set of params into the lateral
> > > subquery execution, right? For example, suppose we have:
> > >
> > >   select *
> > >   from foo
> > >   left join lateral (
> > > select n
> > > from bar
> > > where bar.a = foo.a
> > > limit 1
> > >   ) on true
> > >
> > > and suppose that foo.a (in the table scan) returns these values in
> > > order: 1, 2, 1. In that case we'll execute the lateral subquery 3
> > > separate times rather than attempting to order the results of foo.a
> > > such that we can re-execute the subquery only when the param changes
> > > to a new unique value (and we definitely don't cache the results to
> > > guarantee unique subquery executions).
> >
> > I think this is true, but I don't really understand why we should
> > focus on LATERAL here. What we really need, and I feel like we've
> > talked about this before, is a way to reason about where parameters
> > are set and used.
>
> Yes, over in the thread "Parallelize correlated subqueries that
> execute within each worker" [1] which was meant to build on top of
> this work (though is technically separate). Your bringing it up here
> too is making me wonder if we can combine the two and instead of
> always allowing subqueries with LIMIT instead (like the other patch
> does) delay final determination of parallel safety of rels and paths
> (perhaps, as that thread is discussing, until gather/gather merge path
> creation).

Upon further investigation and thought I believe it *might* be
possible to do what I'd wondered about above (delay the final
determination of a rel's parallel safety until later in planning by
marking rels as tentatively safe and re-evaluating as we go), as my
original patch on the referenced thread did. However, that thread
ended up with a much simpler proposed approach: it still moves the
final parallel safety determination later in the planner, but does so
by checking (in generate_gather_paths and
generate_useful_gather_paths) whether that point in the plan tree
supplies the params required by the partial path.

So the current approach in that other thread is orthogonal to (though
complementary with, for some queries) the current question, which is:
must we immediately disallow parallelism on a rel that has a limit?

Tom's concern with my argument for special-casing LATERAL was:

> This argument seems to be assuming that Y is laterally dependent on X,
> but the patch as written will take *any* lateral dependency as a
> get-out-of-jail-free card.  If what we have is "Limit(Y sub Z)" where
> Z is somewhere else in the query tree, it's not apparent to me that
> your argument holds.

I do see now that there was an obvious flaw in the original patch:
rte->lateral may well be set to true even in cases where there's no
actual lateral dependency. That being said, I don't see a need to
determine whether the current subtree provides the required lateral
dependency, for the following reasons:

1. We don't want to set rel->consider_parallel to false immediately
because that will then poison everything higher in the tree, despite
the fact that it may well be that it's only higher up the plan tree
that the lateral dependency is provided. Regardless of the level in
the plan tree at which the param is provided we are going to execute
the subquery (even in serial execution) once per outer tuple at the
point in the join tree where the lateral join lies.
2. We're *not* at this point actually checking parallel safety of a
given path (i.e., is the path parallel safe at this point given the
params currently provided), we're only checking to see if the rel
itself should be entirely excluded from consideration for parallel
plans at any point in the future.
3. We already know that the question of whether or not a param is
provided can't be the concern here since it isn't under consideration
in the existing code path. That is, if a subquery doesn't have a limit
then this particular check won't determine that the subquery's
existence should result in setting rel->consider_parallel to false
because of any params it may or may not contain.
4. It is already the case that a subplan using exec params in the
where clause will not be considered parallel safe in path generation.
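As a concrete illustration of point 4, consider a correlated subquery
of this shape (the table and column names here are hypothetical, just
for illustration):

```sql
-- Hypothetical schema.
CREATE TABLE foo (a int, c int);
CREATE TABLE bar (b int, c int);

-- The subquery references foo.c from the outer query, so it is planned
-- with an exec param; such a subplan is already not considered
-- parallel safe during path generation today.
SELECT *
FROM foo
WHERE foo.a IN (SELECT bar.b FROM bar WHERE bar.c = foo.c);
```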

I believe the proper check is (as in the attached patch) to exclude a
subquery with a LIMIT clause from parallelism only when
rel->lateral_relids is empty. Here are several example queries and
plans intended to validate that hypothesis:

 select *
  from foo
  left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
  ) on true;

 Nested Loop Left Join  (cost=0.00..8928.05 rows=2550 width=8)
   ->  Seq Scan o

Re: Allow foreign keys to reference a superset of unique columns

2022-09-24 Thread James Coleman
On Fri, Sep 2, 2022 at 5:42 AM Wolfgang Walther  wrote:
>
> Kaiting Chen:
> > I'd like to propose a change to PostgreSQL to allow the creation of a 
> > foreign
> > key constraint referencing a superset of uniquely constrained columns.
>
> +1
>
> Tom Lane:
> > TBH, I think this is a fundamentally bad idea and should be rejected
> > outright.  It fuzzes the semantics of the FK relationship, and I'm
> > not convinced that there are legitimate use-cases.  Your example
> > schema could easily be dismissed as bad design that should be done
> > some other way.
>
> I had to add quite a few unique constraints on a superset of already
> uniquely constrained columns in the past, just to be able to support FKs
> to those columns. I think those cases most often come up when dealing
> with slightly denormalized schemas, e.g. for efficiency.
>
> One other use-case I had recently was along the following lines, in
> abstract terms:
>
> CREATE TABLE classes (class INT PRIMARY KEY, ...);
>
> CREATE TABLE instances (
>instance INT PRIMARY KEY,
>class INT REFERENCES classes,
>...
> );
>
> Think about classes and instances as in OOP. So the table classes
> contains some definitions for different types of object and the table
> instances realizes them into concrete objects.
>
> Now, assume you have some property of a class that is best modeled as a
> table like this:
>
> CREATE TABLE classes_prop (
>property INT PRIMARY KEY,
>    class INT REFERENCES classes,
>...
> );
>
> Now, assume you need to store data for each of those classes_prop rows
> for each instance. You'd do the following:
>
> CREATE TABLE instances_prop (
>instance INT REFERENCES instances,
>property INT REFERENCES classes_prop,
>...
> );
>
> However, this does not ensure that the instance and the property you're
> referencing in instances_prop are actually from the same class, so you
> add a class column:
>
> CREATE TABLE instances_prop (
>instance INT,
>class INT,
>property INT,
>FOREIGN KEY (instance, class) REFERENCES instances,
>FOREIGN KEY (property, class) REFERENCES classes_prop,
>...
> );
>
> But this won't work, without creating some UNIQUE constraints on those
> supersets of the PK column first.

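Concretely, I read the workaround described above as amounting to
something like this (just a sketch against the quoted schema):

```sql
-- Add unique constraints on supersets of the primary key columns so
-- that the composite foreign keys have something to reference.
ALTER TABLE instances    ADD UNIQUE (instance, class);
ALTER TABLE classes_prop ADD UNIQUE (property, class);

CREATE TABLE instances_prop (
    instance INT,
    class    INT,
    property INT,
    FOREIGN KEY (instance, class) REFERENCES instances (instance, class),
    FOREIGN KEY (property, class) REFERENCES classes_prop (property, class)
);
```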
If I'm following properly, this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".

That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.

> > For one example of where the semantics get fuzzy, it's not
> > very clear how the extra-baggage columns ought to participate in
> > CASCADE updates.  Currently, if we have
> > CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> > then an update that changes only foo.b doesn't need to update
> > referencing tables, and I think we even have optimizations that
> > assume that if no unique-key columns are touched then RI checks
> > need not be made.  But if you did
> > CREATE TABLE bar (x integer, y integer,
> >   FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE 
> > CASCADE);
> > then perhaps you expect bar.y to be updated ... or maybe you don't?
>
> In all use-cases I had so far, I would expect bar.y to be updated, too.
>
> I think it would not even be possible to NOT update bar.y, because the
> FK would then not match anymore. foo.a is the PK, so the value in bar.x
> already forces bar.y to be the same as foo.b at all times.
>
> bar.y is a little bit like a generated value in that sense, it should
> always match foo.b. I think it would be great, if we could actually go a
> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> exists, I would like to set bar.y automatically to the new foo.b.
> Otherwise those kind of updates always have to either query foo before,
> or add a trigger to do the same.

Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above, an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values. You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint. If I'm understanding the example correctly,
that seems like a *very* bad idea.
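For reference, the current cascade behavior I'm describing can be
sketched like this (the composite UNIQUE constraint on foo(a, b) is
assumed so that the foreign key is legal today):

```sql
CREATE TABLE foo (a integer PRIMARY KEY, b integer, UNIQUE (a, b));
CREATE TABLE bar (
    x integer,
    y integer,
    FOREIGN KEY (x, y) REFERENCES foo (a, b) ON UPDATE CASCADE
);

-- Today, an UPDATE of foo rewrites the matching bar rows, using the old
-- (x, y) pair to find them; there is no re-matching of bar rows against
-- a new value of only one column of the constraint.
```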

James Coleman




Re: WIP: WAL prefetch (another approach)

2022-09-24 Thread Thomas Munro
On Wed, Apr 13, 2022 at 8:05 AM Thomas Munro  wrote:
> On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker
>  wrote:
> > Simon Riggs  writes:
> > > This is a nice feature if it is safe to turn off full_page_writes.

> > > When is it safe to do that? On which platform?
> > >
> > > I am not aware of any released software that allows full_page_writes
> > > to be safely disabled. Perhaps something has been released recently
> > > that allows this? I think we have substantial documentation about
> > > safety of other settings, so we should carefully document things here
> > > also.
> >
> > Our WAL reliability docs claim that ZFS is safe against torn pages:
> >
> > https://www.postgresql.org/docs/current/wal-reliability.html:
> >
> > If you have file-system software that prevents partial page writes
> > (e.g., ZFS), you can turn off this page imaging by turning off the
> > full_page_writes parameter.
>
> Unfortunately, posix_fadvise(WILLNEED) doesn't do anything on ZFS
> right now :-(.

Update: OpenZFS now has this working in its master branch (Linux only
for now), so fingers crossed for the next release.