Re: Extending USING [heap | mytam | yourtam] grammar and behavior
> On Jun 15, 2022, at 8:51 PM, Michael Paquier wrote: > > I am not sure I see why this would be something users would actually > use in prod. That means to pick up something else than what the > server thinks is the best default AM but where somebody does not want > to trust the default, while generating an error if specifying the > default AM in the USING NOT clause. Sorry for the lack of clarity. I do not suggest raising an error. If you say "USING NOT foo", and foo is the default table access method, then you get the same behavior as a "USING heap" would have gotten you; otherwise, you get the same behavior as not providing any USING clause at all. In the future, we might want to create a list of fallback tams rather than just hardcoding "heap" as the one and only fallback, but I haven't run into an actual need for that. If you're wondering what "USING NOT heap" falls back to, I think that could error, or it could just use heap anyway. Whatever. That's why I'm still soliciting comments at this phase rather than posting a patch. > On top of that > default_table_access_method is user-settable. Yeah, but specifying a "USING foo" clause is also open to any user, so I don't see why this matters. "USING NOT foo" is just shorthand for checking the current default_table_access_method, and then either appending a "USING heap" clause or appending no clause. Since the user can do this anyway, what's the security implication in some syntactic sugar? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
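The fallback rule being proposed can be spelled out in a few lines. This is an illustrative Python sketch of the semantics only (PostgreSQL itself is written in C; the function and constant names here are hypothetical, and the table AM names are the made-up ones used in this thread):

```python
# Sketch of the proposed "USING NOT foo" resolution rule: if foo is the
# current default table access method, fall back to the one hardcoded
# fallback ("heap"); otherwise behave as if no USING clause was given,
# i.e. use the default AM.
FALLBACK_AM = "heap"

def resolve_table_am(excluded_am, default_am):
    """Return the table AM a 'USING NOT excluded_am' clause would select."""
    if excluded_am == default_am:
        # Same behavior as an explicit "USING heap" clause.
        return FALLBACK_AM
    # Same behavior as providing no USING clause at all.
    return default_am
```

So a test written with "USING NOT chicago" keeps exercising whatever non-chicago AM is the default in a given build, instead of being pinned to heap forever.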
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
On 15/6/2022 21:45, Imseih (AWS), Sami wrote: Adding a plan_id to pg_stat_activity allows users to determine if a plan for a particular statement has changed and if the new plan is performing better or worse for a particular statement. There are several ways the plan_id in pg_stat_activity [...] In general, your idea is quite useful. But, as discussed earlier [1], extensions could implement many useful things if they could attach some custom data to a plan. Maybe implement your feature with a private list of nodes in the plan structure instead of a single-purpose plan_id field? [1] https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru -- regards, Andrey Lepikhov Postgres Professional
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 8:18 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 20:10:30 -0700, Mark Dilger wrote: >>> On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: But it's up to the TAM what it wants to do with that boolean, if in fact it does anything at all based on that. A TAM could decide to do the exact opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized shuffle, or >>> <... behavior here>. >>> >>> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, >>> that's possible with all extension APIs. >> >> I don't think it's "stupid stuff". A major motivation, perhaps the only >> useful motivation, for implementing a TAM is to get a non-trivial >> performance boost (relative to heap) for some target workload, almost >> certainly at the expense of worse performance for another workload. To >> optimize any particular workload sufficiently to make it worth the bother, >> you've pretty much got to do something meaningfully different than what heap >> does. > > Sure. I just don't see what that has to do with doing something widely > differing in VACUUM FULL. Which AM can't support that? I guess there's some > where implementing the full visibility semantics isn't feasible, but that's > afaics OK. The problem isn't that the TAM can't do it. Any TAM can probably copy its contents verbatim from the OldHeap to the NewHeap. But it's really stupid to have to do that if you're not going to change anything along the way, and I think if you divorce your thinking from how heap-AM works sufficiently, you might start to see how other TAMs might have nothing useful to do at this step. So there's a strong motivation to not be forced into a "move all the data uselessly" step. I don't really want to discuss the proprietary details of any TAMs I'm developing, so I'll use some made-up examples. Imagine you have decided to trade off the need to vacuum against the cost of storing 64bit xids.
That doesn't mean that compaction isn't maybe still a good thing, but you don't need to vacuum for anti-wraparound purposes anymore. Now imagine you've also decided to trade off insert speed for select speed, and you sort the entire table on every insert, and to keep indexes happy, you keep an "externally visible TID" to "internal actual location" mapping in a separate fork. Let's say you also don't support UPDATE or DELETE at all. Those immediately draw an error, "not implemented by this tam". At this point, you have a fully sorted and completely compacted table at all times. It's simply an invariant of the TAM. So, now what exactly is VACUUM FULL or CLUSTER supposed to do? Seems like the answer is "diddly squat", and yet you seem to propose requiring the TAM to do it. I don't like that. From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are synonyms. Or am I missing something? >>> >>> The callback gets passed use_sort. So just implement it use_sort = false and >>> error out if use_sort = true? >> >> I'm not going to say that your idea is unreasonable for a TAM that you might >> choose to implement, but I don't see why that should be required of all TAMs >> anybody might ever implement. > >> The callback that gets use_sort is called from copy_table_data(). By that >> time, it's too late to avoid the >> >>/* >> * Open the relations we need. >> */ >>NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); >>OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); >> >> code that has already happened in cluster.c's copy_table_data() function, >> and unless I raise an error, after returning from my TAM's callback, the >> cluster code will replace the old table with the new one. I'm left no >> choices but to copy my data over, lose my data, or abort the command. None >> of those are OK options for me. > > I think you need to do a bit more explaining of what you're actually trying to > achieve here.
You're just saying "I don't want to", which doesn't really help > me to understand the set of useful options. I'm trying to opt out of cluster/vacfull. >> I'm open to different solutions. If a simple callback like >> relation_supports_cluster(Relation rel) is too simplistic, and more >> parameters with more context information is wanted, then fine, let's do >> that. > > FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the > relation_copy_for_cluster optional. > > I still think it's a terrible idea to silently ignore tables in CLUSTER or > VACUUM FULL. I'm not entirely against you on that, but it makes me cringe that we impose design decisions like that on any and all future TAMs. It seems better to me to let the TAM author decide to emit an error, warning, notice, or whatever, as they see fit. >> But I can't really complete my work with the interface as it stands >> now. > > Since you've not described that work
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Hi, On Wed, Jun 15, 2022 at 06:45:38PM +, Imseih (AWS), Sami wrote: > Adding a plan_id to pg_stat_activity allows users > to determine if a plan for a particular statement > has changed and if the new plan is performing better > or worse for a particular statement. > [...] > Attached is a POC patch that computes the plan_id > and presents the top-level plan_id in pg_stat_activity. AFAICS you're proposing to add an identifier for a specific plan, but no way to know what that plan was? How are users supposed to use the information if they know something changed but don't know what changed exactly? > - In the POC, the compute_query_id GUC determines if a > plan_id is to be computed. Should this be a separate GUC? Probably, as computing it will likely be quite expensive. Some benchmark on various workloads would be needed here. I only had a quick look at the patch, but I see that you have some code to avoid storing the query text multiple times with different planid. How does it work exactly, and does it ensure that the query text is only removed once the last entry that uses it is removed? It seems that you identify a specific query text by queryid, but that seems wrong as collision can (easily?) happen in different databases. The real identifier of a query text should be (dbid, queryid). Note that this problem already exists, as the query texts are now stored per (userid, dbid, queryid, istoplevel). Maybe this part could be split in a different commit as it could already be useful without a planid.
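The keying and cleanup scheme being asked about could work roughly like this. An illustrative Python sketch of reference-counted query-text storage keyed by (dbid, queryid) — not the actual patch code, and the class and method names are hypothetical:

```python
# Sketch: several plan entries can share one query text; the text is
# keyed by (dbid, queryid) so that equal queryids in different databases
# do not collide, and it is removed only when the last entry
# referencing it is dropped.
class QueryTextStore:
    def __init__(self):
        self._texts = {}  # (dbid, queryid) -> [refcount, text]

    def add_entry(self, dbid, queryid, text):
        key = (dbid, queryid)
        if key in self._texts:
            self._texts[key][0] += 1      # another planid reuses this text
        else:
            self._texts[key] = [1, text]  # first reference stores the text

    def remove_entry(self, dbid, queryid):
        key = (dbid, queryid)
        self._texts[key][0] -= 1
        if self._texts[key][0] == 0:
            del self._texts[key]          # last user gone: drop the text

    def get(self, dbid, queryid):
        entry = self._texts.get((dbid, queryid))
        return entry[1] if entry else None
```

The point of the (dbid, queryid) key is exactly the collision concern raised above: queryid alone is not a safe identifier across databases.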
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
> On Jun 15, 2022, at 8:51 PM, Michael Paquier wrote: > > However, > your problem is basically that you develop multiple AMs, but you want > to have regression tests that do checks across more than one table AM > at the same time. It is true that I test multiple table AMs at the same time, but that's a somewhat different concern. > Am I getting that right? Not exactly. > Why is a grammar > extension necessary for what looks like a test structure problem when > there are interdependencies across multiple AMs developed? Ok, I didn't want to get into my exact process, because it involves other changes that I don't expect -hackers to want. But basically what I do is: ./configure --with-default-tam=chicago && make && make check-world That fails for a few tests, and I manually change the create table statements in tests that are not chicago-compatible to "using not chicago". Then ./configure --with-default-tam=detroit && make && make check-world That fails for some other set of tests, but note that the tests with "using not chicago" are still using detroit in this second run. That wouldn't be true if I'd fixed up the tests in the first run "using heap". Then I can also add my own tests which might make some chicago-backed tables plus some detroit-backed tables and see how they interact. But that's superfluous to the issue of just trying to leverage the existing tests as much as I can without having to reinvent tests to cover "chicago", and then reinvent again to cover "detroit", and so forth. If you develop enough TAMs in parallel, and go with the "using heap" solution, you eventually have zero coverage for any of the TAMs, because you'll eventually be "using heap" in all the tables of all the tests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Replica Identity check of partition table on subscriber
On Wed, Jun 15, 2022 8:30 PM Amit Kapila wrote: > > I have pushed the first bug-fix patch today. > Thanks. Attached the remaining patches which are rebased. Regards, Shi yu v9-0002-fix-memory-leak-about-attrmap.patch Description: v9-0002-fix-memory-leak-about-attrmap.patch v9-0001-Check-partition-table-replica-identity-on-subscri.patch Description: v9-0001-Check-partition-table-replica-identity-on-subscri.patch
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Wed, May 25, 2022 at 11:48 AM Masahiko Sawada wrote: > > On Tue, May 10, 2022 at 6:58 PM John Naylor > wrote: > > > > On Tue, May 10, 2022 at 8:52 AM Masahiko Sawada > > wrote: > > > > > > Overall, radix tree implementations have good numbers. Once we got an > > > agreement on moving in this direction, I'll start a new thread for > > > that and move the implementation further; there are many things to do > > > and discuss: deletion, API design, SIMD support, more tests etc. > > > > +1 > > > > Thanks! > > I've attached an updated version patch. It is still WIP but I've > implemented deletion and improved test cases and comments. I've attached an updated version patch that changes the configure script. I'm still studying how to support AVX2 on msvc build. Also, added more regression tests. The integration with lazy vacuum and parallel vacuum is missing for now. In order to support parallel vacuum, we need to have the radix tree support to be created on DSA. Added this item to the next CF. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index d3562d6fee..a56d6e89da 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -676,3 +676,27 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_ARMV8_CRC32C_INTRINSICS
+
+# PGAC_AVX2_INTRINSICS
+#
+# Check if the compiler supports the Intel AVX2 instructions.
+#
+# If the intrinsics are supported, sets pgac_avx2_intrinsics, and CFLAGS_AVX2.
+AC_DEFUN([PGAC_AVX2_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx2_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm256_set1_epi8 _mm256_cmpeq_epi8 _mm256_movemask_epi8 CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>],
+  [__m256i vec = _mm256_set1_epi8(0);
+   __m256i cmp = _mm256_cmpeq_epi8(vec, vec);
+   return _mm256_movemask_epi8(cmp) > 0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_AVX2="$1"
+  pgac_avx2_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_AVX2_INTRINSICS
diff --git a/configure b/configure
index 7dec6b7bf9..6ebc15a8c1 100755
--- a/configure
+++ b/configure
@@ -645,6 +645,7 @@ XGETTEXT
 MSGMERGE
 MSGFMT_FLAGS
 MSGFMT
+CFLAGS_AVX2
 PG_CRC32C_OBJS
 CFLAGS_ARMV8_CRC32C
 CFLAGS_SSE42
@@ -18829,6 +18830,82 @@ $as_echo "slicing-by-8" >&6; }
 fi
 
+# Check for Intel AVX2 intrinsics.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm256i CFLAGS=" >&5
+$as_echo_n "checking for _mm256i CFLAGS=... " >&6; }
+if ${pgac_cv_avx2_intrinsics_+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS "
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <immintrin.h>
+int
+main ()
+{
+__m256i vec = _mm256_set1_epi8(0);
+  __m256i cmp = _mm256_cmpeq_epi8(vec, vec);
+  return _mm256_movemask_epi8(cmp) > 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_avx2_intrinsics_=yes
+else
+  pgac_cv_avx2_intrinsics_=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_avx2_intrinsics_" >&5
+$as_echo "$pgac_cv_avx2_intrinsics_" >&6; }
+if test x"$pgac_cv_avx2_intrinsics_" = x"yes"; then
+  CFLAGS_AVX2=""
+  pgac_avx2_intrinsics=yes
+fi
+
+if test x"$pgac_avx2_intrinsics" != x"yes"; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm256i CFLAGS=-mavx2" >&5
+$as_echo_n "checking for _mm256i CFLAGS=-mavx2... " >&6; }
+if ${pgac_cv_avx2_intrinsics__mavx2+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS -mavx2"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <immintrin.h>
+int
+main ()
+{
+__m256i vec = _mm256_set1_epi8(0);
+  __m256i cmp = _mm256_cmpeq_epi8(vec, vec);
+  return _mm256_movemask_epi8(cmp) > 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_avx2_intrinsics__mavx2=yes
+else
+  pgac_cv_avx2_intrinsics__mavx2=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_avx2_intrinsics__mavx2" >&5
+$as_echo "$pgac_cv_avx2_intrinsics__mavx2" >&6; }
+if test x"$pgac_cv_avx2_intrinsics__mavx2" = x"yes"; then
+  CFLAGS_AVX2="-mavx2"
+  pgac_avx2_intrinsics=yes
+fi
+
+fi
+
 # Select semaphore implementation type.
 if test "$PORTNAME" != "win32"; then
diff --git a/configure.ac b/configure.ac
index d093fb88dd..6b6d095306 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2300,6 +2300,12 @@ else
 fi
 AC_SUBST(PG_CRC32C_OBJS)
 
+# Check for Intel AVX2 intrinsics.
+PGAC_AVX2_INTRINSICS([])
+if test x"$pgac_avx2_intrinsics" != x"yes"; then
+
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
On Wed, Jun 15, 2022 at 8:51 PM Michael Paquier wrote: > On top of that > default_table_access_method is user-settable. > > FWIW this proposal acknowledges that and basically leverages it to the hilt, turning it into something like search_path. I strongly dislike the idea of any workflow that depends on a GUC in this manner. The fact that it is user-settable is, IMO, a flaw, not a feature, at least as far as production settings are concerned. It is a novel API for PostgreSQL to rely upon setting a GUC then attaching "unless" configurations to individual objects to ignore it. And what would be chosen (ultimately fallback is heap?), or whether it would simply error, is presently, as you say, undefined. In production this general behavior becomes useful only under the condition that among the various named access methods some of them don't even exist on the server in question, but that a fallback option would be acceptable in that case. But that suggests extending "USING" to accept multiple names, not inventing a "NOT USING". That all said, I can understand that testing presents its own special needs. But testing is probably where GUCs shine. So why not implement this capability as a GUC that is set just before the table is created instead of extending the grammar for it? Add it to "developer options" and call it a day. Dump/Restore no longer has to care about it, and its value once the table exists is basically zero anyway. David J.
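The "extend USING to accept multiple names" alternative suggested above would resolve like ordinary fallback selection. An illustrative Python sketch of first-available-AM selection (the function name and the final "heap" fallback are assumptions, not settled design):

```python
# Sketch: "USING mytam, yourtam" would mean "use the first of these AMs
# that is actually installed on this server", rather than naming which
# AM NOT to use.
def choose_am(preferred, installed, fallback="heap"):
    """Pick the first preferred AM present on the server, else the fallback."""
    for am in preferred:
        if am in installed:
            return am
    return fallback
```

This keeps the DDL self-describing: the statement lists acceptable AMs in order of preference, with no dependency on what default_table_access_method happens to be.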
Re: bogus: logical replication rows/cols combinations
On Mon, Jun 13, 2022 at 8:54 AM Amit Kapila wrote: > > I would like to close the Open item listed corresponding to this > thread [1] as the fix for the reported issue is committed > (fd0b9dcebd). Do let me know if you or others think otherwise? > Seeing no objections, I have closed this item. -- With Regards, Amit Kapila.
Re: Skipping schema changes in publication
On Tue, Jun 14, 2022 at 9:10 AM houzj.f...@fujitsu.com wrote: > > On Wednesday, June 8, 2022 7:04 PM Amit Kapila > wrote: > > > > On Fri, Jun 3, 2022 at 3:37 PM vignesh C wrote: > > > > > > Thanks for the comments, the attached v8 patch has the changes for the > > same. > > > > > > > AFAICS, the summary of this proposal is that we want to support > > exclude of certain objects from publication with two kinds of > > variants. The first variant is to add support to exclude specific > > tables from ALL TABLES PUBLICATION. Without this feature, users need > > to manually add all tables for a database even when she wants to avoid > > only a handful of tables from the database say because they contain > > sensitive information or are not required. We have seen that other > > database like MySQL also provides similar feature [1] (See > > REPLICATE_WILD_IGNORE_TABLE). The proposed syntax for this is as > > follows: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2; > > or > > ALTER PUBLICATION pub1 ADD ALL TABLES EXCEPT TABLE t1,t2; > > > > This will allow us to publish all the tables in the current database > > except t1 and t2. Now, I see that pg_dump has a similar option > > provided by switch --exclude-table but that allows tables matching > > patterns which is not the case here. I am not sure if we need a > > similar variant here. > > > > Then users will be allowed to reset the publication by: > > ALTER PUBLICATION pub1 RESET; > > > > This will reset the publication to the default state which includes > > resetting the publication parameters, setting the ALL TABLES flag to > > false, and dropping the relations and schemas that are associated with > > the publication. I don't know if we want to go further with allowing > > to RESET specific parameters and if so which parameters and what would > > its syntax be? > > > > The second variant is to add support to exclude certain columns of a > > table while publishing a particular table. 
Currently, users need to > > list all required columns' names even if they don't want to hide most > > of the columns in the table (for example Create Publication pub For > > Table t1 (c1, c2)). Consider that a user doesn't want to publish the 'salary' > > or other sensitive information of executives/employees but would like > > to publish all other columns. I feel in such cases it will be a lot of > > work for the user especially when the table has many columns. I see > > that Oracle has a similar feature [2]. I think without this it will be > > difficult for users to use this feature in some cases. The patch for > > this is not proposed but I would imagine syntax for it to be something > > like "Create Publication pub For Table t1 Except (c3)" and similar > > variants for Alter Publication. > > I think the feature to exclude certain columns of a table would be useful. > > In some production scenarios, we usually do not want to replicate > sensitive fields (columns) in the table. Although we already can achieve > this by specifying all replicated columns in the list[1], that seems like > hard work when the table has hundreds of columns. > > [1] > CREATE TABLE test(a int, b int, c int,..., sensitive text); > CREATE PUBLICATION pub FOR TABLE test(a,b,c,...); > > In addition, it's not easy to maintain the column list like above. Because > we sometimes need to add new fields or delete fields due to business > needs. Every time we add a column (or delete a column in the column list), we > need to update the column list. > > If we support Except: > CREATE PUBLICATION pub FOR TABLE test EXCEPT (sensitive); > > We don't need to update the column list in most cases. > Right, this is a valid point and I think it makes sense for me to support such a feature for column list and also to exclude a particular table(s) from the ALL TABLES publication. Peter E., Euler, and others, do you have any objections to supporting the above-mentioned two cases? -- With Regards, Amit Kapila.
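The EXCEPT column-list semantics discussed above reduce to simple set subtraction — an illustrative Python sketch (hypothetical helper, not server code):

```python
# Sketch of "FOR TABLE t EXCEPT (cols)": publish every column except the
# listed ones, preserving table column order. Newly added columns are
# published automatically without touching the publication definition.
def published_columns(all_columns, except_list):
    excluded = set(except_list)
    return [c for c in all_columns if c not in excluded]
```

This is why the maintenance argument above holds: adding a non-sensitive column to the table changes the published set without any ALTER PUBLICATION.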
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
On Wed, Jun 15, 2022 at 06:16:21PM -0700, Mark Dilger wrote: > When developing two or more TAMs, this falls apart. Some tests may > be worth fixing up (perhaps with alternate output files) for > "mytam", but not for "columnar_tam". That might be because the test > is checking fundamentally row-store-ish properties of the table, > which has no applicability to your column-store-ish TAM. In that > case, "USING NOT columnar_tam" fixes the test failure when columnar > is the default, without preventing the test from testing "mytam" > when it happens to be the default. I think that it is very important for the in-core test suite to remain transparent in terms of options used for table AMs (or compression), and this has improved a lot over the last years with options like HIDE_TABLEAM and HIDE_TOAST_COMPRESSION. Tests could actually have more ORDER BY clauses to ensure more ordering of the results, as long as the tests don't want to stress a specific planning path. However, your problem is basically that you develop multiple AMs, but you want to have regression tests that do checks across more than one table AM at the same time. Am I getting that right? Why is a grammar extension necessary for what looks like a test structure problem when there are interdependencies across multiple AMs developed? > Once you have enough TAMs developed and deployed, this USING NOT > business becomes useful in production. You might have different > defaults on different servers, or for different customers, etc., and > for a given piece of DDL that you want to release you only want to > say which TAMs not to use, not to nail down which TAM must be used. I am not sure I see why this would be something users would actually use in prod. That means to pick up something else than what the server thinks is the best default AM but where somebody does not want to trust the default, while generating an error if specifying the default AM in the USING NOT clause.
On top of that default_table_access_method is user-settable. -- Michael signature.asc Description: PGP signature
Re: Modest proposal to extend TableAM API for controlling cluster commands
On Wed, Jun 15, 2022 at 8:18 PM Andres Freund wrote: > > If a simple callback like > > relation_supports_cluster(Relation rel) is too simplistic > Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples] Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"? If yes, then naming the table explicitly should elicit an error. Having the table chosen implicitly should provoke a warning. For ALTER TABLE CLUSTER there should be an error, which makes the implicit CLUSTER command a non-factor. However, given that, should the table structure change, it is imperative that the Table AM be capable of producing a new physical relation with the correct data, which will have been compacted as a side-effect, it seems like, explicit or implicit, expecting any Table AM to do that when faced with VACUUM FULL is reasonable. Which leaves deciding how to allow a table with a given TAM to prevent itself from being added to the CLUSTER roster. And deciding whether an opt-out feature for implicit VACUUM FULL is something we should offer as well. I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture of PostgreSQL could avoid this basic contract between the system and its users. David J.
Re: warn if GUC set to an invalid shared library
I've started to think that we should really WARN whenever a (set of) GUC is set in a manner that the server will fail to start - not just for shared libraries. In particular, I think the server should also warn if it's going to fail to start like this: 2022-06-15 22:48:34.279 CDT postmaster[20782] FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical" -- Justin
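A sketch of what such cross-GUC validation could look like, in illustrative Python (the real check would live in the GUC machinery in C; the settings dictionary, defaults, and function name here are assumptions for the example):

```python
# Sketch: warn at SET/reload time whenever a combination of settings is
# one the postmaster would refuse to start with, instead of only
# discovering the FATAL at the next restart. The wal_level check mirrors
# the startup error quoted above.
def startup_warnings(settings):
    """Return warnings for setting combinations that would prevent startup."""
    warnings = []
    if settings.get("max_wal_senders", 10) > 0 and \
       settings.get("wal_level", "replica") not in ("replica", "logical"):
        warnings.append('WAL streaming (max_wal_senders > 0) requires '
                        'wal_level "replica" or "logical"')
    # ... further inter-GUC consistency checks would accumulate here
    return warnings
```

The same list of checks could then be shared between "validate on SET" and the existing startup-time errors, so the two can't drift apart.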
Re: [Proposal] pg_rewind integration into core
On Wed, Mar 23, 2022 at 05:13:47PM +0530, RKN Sai Krishna wrote: > Considering the scenarios where primary is ahead of sync standbys, upon > promotion of a standby, pg_rewind is needed on the old primary if it has to > be up as a standby. Similarly in the scenarios where async standbys (in > physical replication context) go ahead of sync standbys, and upon promotion > of a standby, there is need for pg_rewind to be performed on the async > standbys which are ahead of the sync standby being promoted. > With these scenarios under consideration, integrating pg_rewind into > postgres core might be a better option IMO. We could optionally choose to > have a pg_rewind dry run performed during the standby startup and, based on > the need, perform the rewind and have the standby in sync with the primary. pg_rewind is already part of the core code as a binary tool, but what you mean is to integrate a portion of it in the backend code, as part of the startup sequence (with the node to rewind using primary_conninfo for the source?). One thing that we would need to be careful about is that no assumptions a rewind relies on are messed up in any way at the step where the rewind begins. One such thing is that the standby has achieved crash recovery correctly, so you would need to complicate the startup sequence further, which is already a complicated and sensitive piece of logic, with more internal dependencies between each piece. I am not really convinced that we need to add more technical debt in this area, particularly now that pg_rewind is able to enforce recovery on the target node first so that it has a clean state when the rewind can begin, so the assumptions around crash recovery and rewind have a clear frontier. -- Michael signature.asc Description: PGP signature
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 20:10:30 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: > >> But it's up to the TAM what it wants to do with that boolean, if in fact it > >> does anything at all based on that. A TAM could decide to do the exact > >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort > >> on CLUSTER, or perhaps perform a randomized shuffle, or >> <... behavior here>. > > > > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, > > that's possible with all extension APIs. > > I don't think it's "stupid stuff". A major motivation, perhaps the only > useful motivation, for implementing a TAM is to get a non-trivial > performance boost (relative to heap) for some target workload, almost > certainly at the expense of worse performance for another workload. To > optimize any particular workload sufficiently to make it worth the bother, > you've pretty much got to do something meaningfully different than what heap > does. Sure. I just don't see what that has to do with doing something widely differing in VACUUM FULL. Which AM can't support that? I guess there's some where implementing the full visibility semantics isn't feasible, but that's afaics OK. > >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are > >> synonyms. Or am I missing something? > > > > The callback gets passed use_sort. So just implement it use_sort = false and > > error out if use_sort = true? > > I'm not going to say that your idea is unreasonable for a TAM that you might > choose to implement, but I don't see why that should be required of all TAMs > anybody might ever implement. > The callback that gets use_sort is called from copy_table_data(). By that > time, it's too late to avoid the > > /* > * Open the relations we need. 
> */ > NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); > OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); > > code that has already happened in cluster.c's copy_table_data() function, > and unless I raise an error, after returning from my TAM's callback, the > cluster code will replace the old table with the new one. I'm left no > choices but to copy my data over, lose my data, or abort the command. None > of those are OK options for me. I think you need to do a bit more explaining of what you're actually trying to achieve here. You're just saying "I don't want to", which doesn't really help me to understand the set of useful options. > I'm open to different solutions. If a simple callback like > relation_supports_cluster(Relation rel) is too simplistic, and more > parameters with more context information is wanted, then fine, let's do > that. FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the relation_copy_for_cluster optional. I still think it's a terrible idea to silently ignore tables in CLUSTER or VACUUM FULL. > But I can't really complete my work with the interface as it stands > now. Since you've not described that work to a meaningful degree... Greetings, Andres Freund
Re: warn if GUC set to an invalid shared library
On Wed, Mar 23, 2022 at 03:02:23PM -0400, Tom Lane wrote: > I agree with Maciek's concerns about these HINTs being emitted > inappropriately, but I also have a stylistic gripe: they're only > halfway hints. Given that we fix things so they only print when they > should, the complaint about the server not starting is not a hint, > it's a fact, which per style guidelines means it should be errdetail. > So I think this ought to look more like > > WARNING: could not access file "xyz" > DETAIL: The server will fail to start with this setting. > HINT: If the server is shut down, it will be necessary to manually edit the > postgresql.auto.conf file to allow it to start again. > > I adjusted the wording a bit too --- YMMV, but I think my text is clearer. It seems to me that there is no objection to the proposed patch, but an update is required. Justin? -- Michael signature.asc Description: PGP signature
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: > >> It's effectively a synonym which determines whether the "bool use_sort" >> parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM >> plays along and sorts or not based on that. > > Hardly a synonym if it behaves differently? I don't think this point is really worth arguing. We don't have to call it a synonym, and the rest of the discussion remains the same. >> But it's up to the TAM what it wants to do with that boolean, if in fact it >> does anything at all based on that. A TAM could decide to do the exact >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort >> on CLUSTER, or perhaps perform a randomized shuffle, or > <some other behavior here>. > > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, > that's possible with all extension APIs. I don't think it's "stupid stuff". A major motivation, perhaps the only useful motivation, for implementing a TAM is to get a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of worse performance for another workload. To optimize any particular workload sufficiently to make it worth the bother, you've pretty much got to do something meaningfully different than what heap does. >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are >> synonyms. Or am I missing something? > > The callback gets passed use_sort. So just implement it use_sort = false and > error out if use_sort = true? I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why that should be required of all TAMs anybody might ever implement. The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the /* * Open the relations we need. 
*/ NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning from my TAM's callback, the cluster code will replace the old table with the new one. I'm left with no choice but to copy my data over, lose my data, or abort the command. None of those are OK options for me. I'm open to different solutions. If a simple callback like relation_supports_cluster(Relation rel) is too simplistic, and more parameters with more context information is wanted, then fine, let's do that. But I can't really complete my work with the interface as it stands now. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
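[Editor's illustration] The skip-instead-of-error behavior proposed in this thread can be sketched as a standalone mock. All names here are invented for illustration; this is not PostgreSQL source code, just a model of a multi-relation CLUSTER loop that skips (with a warning) any relation whose AM declines via an optional relation_supports_cluster-style callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mock of a table AM routine table with the proposed
 * optional callback.  NULL means "never participates in CLUSTER". */
typedef struct TableAmRoutine
{
    const char *name;
    bool      (*relation_supports_cluster) (const char *relname);
} TableAmRoutine;

static bool
heap_supports_cluster(const char *relname)
{
    (void) relname;
    return true;                /* heap always participates */
}

static TableAmRoutine heap_am = {"heap", heap_supports_cluster};
static TableAmRoutine exotic_am = {"exotic", NULL};   /* declines */

typedef struct Relation
{
    const char *relname;
    TableAmRoutine *am;
} Relation;

/* Sketch of the multi-relation CLUSTER loop: skip declining relations
 * with a warning instead of aborting the whole command.  Returns the
 * number of relations actually processed. */
static int
cluster_all(Relation *rels, int nrels)
{
    int         done = 0;

    for (int i = 0; i < nrels; i++)
    {
        Relation   *rel = &rels[i];

        if (rel->am->relation_supports_cluster == NULL ||
            !rel->am->relation_supports_cluster(rel->relname))
        {
            printf("WARNING: skipping \"%s\": AM \"%s\" does not support CLUSTER\n",
                   rel->relname, rel->am->name);
            continue;
        }
        done++;                 /* a real loop would call cluster_rel() */
    }
    return done;
}
```

The open design question in the thread is exactly the `continue` branch: whether a declining relation should be skipped with a warning (as modeled here) or raise an error.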
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Thu, 16 Jun 2022 10:34:22 +0900 (JST), Kyotaro Horiguchi wrote in > PQgetResult() resets the state to IDLE while in pipeline mode. > > fe-exec.c:2171 > > > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > > { > > /* > > * We're about to send the results of the > > current query. Set > > * us idle now, and ... > > */ > > conn->asyncStatus = PGASYNC_IDLE; > > And actually that code let the connection state enter to IDLE before > CloseComplete. In the test case I posted, the following happens. > > PQsendQuery(conn, "SELECT 1;"); > PQsendFlushRequest(conn); > PQgetResult(conn); // state enters IDLE, reads down to > > PQgetResult(conn); // reads > PQpipelineSync(conn); // sync too late > > Pipeline feature seems intending to allow PQgetResult called before > PQpipelineSync. And also seems allowing to call QPpipelineSync() after > PQgetResult(). > > I haven't come up with a valid *fix* of this flow.. The attached is a crude patch to separate the state for PIPELINE-IDLE from PGASYNC_IDLE. I haven't found a better way.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index 0ff563f59a..261ccc3ed4 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -968,15 +968,27 @@ test_prepared(PGconn *conn) fprintf(stderr, "ok\n"); } +static int n_notice; +static void +notice_processor(void *arg, const char *message) +{ + n_notice++; + fprintf(stderr, "%s", message); +} + static void test_simple_pipeline(PGconn *conn) { PGresult *res = NULL; const char *dummy_params[1] = {"1"}; Oid dummy_param_oids[1] = {INT4OID}; - + PQnoticeProcessor oldproc; + fprintf(stderr, "simple pipeline... "); + n_notice = 0; + oldproc = PQsetNoticeProcessor(conn, notice_processor, NULL); + /* * Enter pipeline mode and dispatch a set of operations, which we'll then * process the results of as they come in. 
@@ -1052,6 +1064,51 @@ test_simple_pipeline(PGconn *conn) if (PQpipelineStatus(conn) != PQ_PIPELINE_OFF) pg_fatal("Exiting pipeline mode didn't seem to work"); + if (n_notice > 0) + pg_fatal("unexpected notice"); + + /* Try the same thing with PQsendQuery */ + if (PQenterPipelineMode(conn) != 1) + pg_fatal("failed to enter pipeline mode: %s", PQerrorMessage(conn)); + + if (PQsendQuery(conn, "SELECT 1;") != 1) + pg_fatal("failed to send query: %s", PQerrorMessage(conn)); + PQsendFlushRequest(conn); + + res = PQgetResult(conn); + if (res == NULL) + pg_fatal("PQgetResult returned null when there's a pipeline item: %s", + PQerrorMessage(conn)); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + pg_fatal("Unexpected result code %s from first pipeline item", + PQresStatus(PQresultStatus(res))); + PQclear(res); + + res = PQgetResult(conn); + if (res != NULL) + pg_fatal("expected NULL result"); + + if (PQpipelineSync(conn) != 1) + pg_fatal("pipeline sync failed: %s", PQerrorMessage(conn)); + res = PQgetResult(conn); + if (res == NULL) + pg_fatal("PQgetResult returned null when there's a pipeline item: %s", + PQerrorMessage(conn)); + if (PQresultStatus(res) != PGRES_PIPELINE_SYNC) + pg_fatal("Unexpected result code %s instead of PGRES_PIPELINE_SYNC, error: %s", + PQresStatus(PQresultStatus(res)), PQerrorMessage(conn)); + PQclear(res); + res =NULL; + + if (PQexitPipelineMode(conn) != 1) + pg_fatal("attempt to exit pipeline mode failed when it should've succeeded: %s", + PQerrorMessage(conn)); + + if (n_notice > 0) + pg_fatal("unexpected notice"); + + PQsetNoticeProcessor(conn, oldproc, NULL); + fprintf(stderr, "ok\n"); } diff --git a/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace b/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace index 5c94749bc1..1612c371c0 100644 --- a/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace +++ b/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace @@ -9,4 +9,18 @@ B 33 RowDescription 1 "?column?" 
0 4 -1 0 B 11 DataRow 1 1 '1' B 13 CommandComplete "SELECT 1" B 5 ReadyForQuery I +F 17 Parse "" "SELECT 1;" 0 +F 12 Bind "" "" 0 0 0 +F 6 Describe P "" +F 9 Execute "" 0 +F 6 Close P "" +F 4 Flush +B 4 ParseComplete +B 4 BindComplete +B 33 RowDescription 1 "?column?" 0 4 -1 0 +B 11 DataRow 1 1 '1' +B 13 CommandComplete "SELECT 1" +F 4 Sync +B 4 CloseComplete +B 5 ReadyForQuery I F 4 Terminate diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index 919cf5741d..a811d7 100644 --- a/src/interfaces/libpq/fe-exec.c +++ b/src/interfaces/libpq/fe-exec.c @@ -1380,7 +1380,7 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry) * itself
Re: fix stats_fetch_consistency value in postgresql.conf.sample
On Sat, Jun 11, 2022 at 09:41:37AM -0500, Justin Pryzby wrote: > Note that this gives: > > guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function > [-Wmaybe-uninitialized] > > with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2) > > I wonder whether you'd consider renaming pg_normalize_config_value() to > pg_pretty_config_value() or similar. I have looked at the patch, and I am not convinced that we need a function that does an integer -> integer-with-unit conversion for the purpose of this test. One thing is that it can be unstable with the unit in the conversion where values are close to a given threshold (aka for cases like 2048kB/2MB). On top of that, this overlaps with the existing system function in charge of converting values with bytes as the size unit, while this stuff handles more unit types and all GUC types. I think that there could be some room for doing the opposite conversion, feeding the value from postgresql.conf.sample to something and comparing it directly with boot_val. That's solvable at SQL level, still a system function may be more user-friendly. Extending the tests to check the values is something worth doing, but I think that I would limit the checks to the parameters that do not have units for now, until we figure out which interface would be more adapted for doing the normalization of the parameter values. -#syslog_facility = 'LOCAL0' +#syslog_facility = 'local0' Those changes should not be necessary in postgresql.conf.sample. The test should be in charge of applying the lower() conversion, in the same way as guc.c does internally, and that's a mode supported by the parameter parsing. Using an upper-case value in the sample file is actually meaningful sometimes (for example, syslog can use upper-case strings to refer to LOCAL0~7). -- Michael
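[Editor's illustration] The threshold instability mentioned above (the 2048kB/2MB case) can be seen in a standalone sketch. This is not guc.c's actual conversion code, just a model of a "pick the largest unit that divides the value evenly" normalization, showing why two adjacent raw values render with different units and why a straight string comparison against postgresql.conf.sample is fragile:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical normalization of a raw GUC value given in kB: use the
 * largest unit that divides it evenly.  Values just on either side of
 * a threshold (2047kB vs 2048kB) come out with different units. */
static void
normalize_kb(long kb, char *buf, size_t buflen)
{
    if (kb != 0 && kb % (1024L * 1024) == 0)
        snprintf(buf, buflen, "%ldGB", kb / (1024L * 1024));
    else if (kb != 0 && kb % 1024 == 0)
        snprintf(buf, buflen, "%ldMB", kb / 1024);
    else
        snprintf(buf, buflen, "%ldkB", kb);
}
```

A sample file spelling a default as "2048kB" would then never match the normalized "2MB", which is the instability being objected to; converting the sample-file string down to base units instead (the "opposite conversion") avoids the problem.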
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 19:21:42 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 7:14 PM, Andres Freund wrote: > > On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: > >>> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > >>> > >>> I think nothing would happen in this case - only pre-clustered tables get > >>> clustered in an argumentless CLUSTER. What am I missing? > >> > >> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target > >> is pre-clustered > > > > VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the > > implementation > > is shared, VACUUM FULL doesn't order the table contents. I see no reason > > why > > an AM shouldn't support VACUUM FULL? > > It's effectively a synonym which determines whether the "bool use_sort" > parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM > plays along and sorts or not based on that. Hardly a synonym if it behaves differently? > But it's up to the TAM what it wants to do with that boolean, if in fact it > does anything at all based on that. A TAM could decide to do the exact > opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort > on CLUSTER, or perhaps perform a randomized shuffle, or <some other behavior here>. That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, that's possible with all extension APIs. > From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are > synonyms. Or am I missing something? The callback gets passed use_sort. So just implement it use_sort = false and error out if use_sort = true? > >> , and both will run against the table if the user has run an ALTER > >> TABLE..CLUSTER ON. > > > > If a user does that for a table that doesn't support clustering, well, I > > don't > > see what's gained by not erroring out. 
> > Perhaps they want to give the TAM information about which index to use for > sorting, on those occasions when the TAM's logic dictates that sorting is > appropriate, but not in response to a cluster command. I have little sympathy to randomly misusing catalog contents and then complaining that those catalog contents have an effect. Greetings, Andres Freund
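[Editor's illustration] The "implement it with use_sort = false and error out if use_sort = true" suggestion from this exchange can be sketched as a standalone mock. This is not real tableam code (in a real AM the refusal would be an ereport(ERROR)); it just models a relation_copy_for_cluster-style callback that performs the unsorted rewrite VACUUM FULL asks for but refuses the sorted rewrite CLUSTER asks for:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical copy-for-cluster callback for an AM ("mytam", invented
 * here) that supports VACUUM FULL (use_sort = false) but not CLUSTER
 * (use_sort = true).  Returns false and sets *errmsg instead of
 * erroring, so the behavior is observable in a standalone program. */
static bool
mytam_copy_for_cluster(bool use_sort, const char **errmsg)
{
    if (use_sort)
    {
        *errmsg = "mytam does not support CLUSTER (ordered rewrite)";
        return false;
    }
    /* ... copy tuples to the new relfilenode without ordering ... */
    *errmsg = NULL;
    return true;
}
```

The counter-argument in the thread is that by the time this callback runs, copy_table_data() has already opened and prepared the new heap, so erroring here aborts the whole (possibly multi-relation) command.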
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:21 PM, Mark Dilger wrote: > >> If a user does that for a table that doesn't support clustering, well, I >> don't >> see what's gained by not erroring out. > > Perhaps they want to give the TAM information about which index to use for > sorting, on those occasions when the TAM's logic dictates that sorting is > appropriate, but not in response to a cluster command. I should admit that this is a bit hack-ish, but TAM authors haven't been left a lot of options here. Index AMs allow for custom storage parameters, but Table AMs don't, so getting information to the TAM about how to behave takes more than a little sleight of hand. Simon's proposal from a while back (don't have the link just now) to allow TAMs to define custom storage parameters would go some distance here. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:14 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: >>> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: >>> >>> I think nothing would happen in this case - only pre-clustered tables get >>> clustered in an argumentless CLUSTER. What am I missing? >> >> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target >> is pre-clustered > > VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation > is shared, VACUUM FULL doesn't order the table contents. I see no reason why > an AM shouldn't support VACUUM FULL? It's effectively a synonym which determines whether the "bool use_sort" parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM plays along and sorts or not based on that. But it's up to the TAM what it wants to do with that boolean, if in fact it does anything at all based on that. A TAM could decide to do the exact opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized shuffle, or <some other behavior here>. From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are synonyms. Or am I missing something? >> , and both will run against the table if the user has run an ALTER >> TABLE..CLUSTER ON. > > If a user does that for a table that doesn't support clustering, well, I don't > see what's gained by not erroring out. Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's logic dictates that sorting is appropriate, but not in response to a cluster command. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > > > > I think nothing would happen in this case - only pre-clustered tables get > > clustered in an argumentless CLUSTER. What am I missing? > > The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target > is pre-clustered VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation is shared, VACUUM FULL doesn't order the table contents. I see no reason why an AM shouldn't support VACUUM FULL? > , and both will run against the table if the user has run an ALTER > TABLE..CLUSTER ON. If a user does that for a table that doesn't support clustering, well, I don't see what's gained by not erroring out. Greetings, Andres Freund
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > > I think nothing would happen in this case - only pre-clustered tables get > clustered in an argumentless CLUSTER. What am I missing? The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target is pre-clustered, and both will run against the table if the user has run an ALTER TABLE..CLUSTER ON. Now, we could try to catch that latter command with a utility hook, but since the VACUUM FULL is still problematic, it seems cleaner to just add the callback I am proposing. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 18:24:45 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 6:01 PM, Andres Freund wrote: > > On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: > >> While developing various Table Access Methods, I have wanted a callback for > >> determining if CLUSTER (and VACUUM FULL) should be run against a table > >> backed by a given TAM. The current API contains a callback for doing the > >> guts of the cluster, but by that time, it's a bit too late to cleanly back > >> out. For single relation cluster commands, raising an error from that > >> callback is probably not too bad. For multi-relation cluster commands, > >> that > >> aborts the clustering of other yet to be processed relations, which doesn't > >> seem acceptable. > > > > Why not? What else do you want to do in that case? Silently ignoring > > non-clusterable tables doesn't seem right either. What's the use-case for > > swallowing the error? > > Imagine you develop a TAM for which the concept of "clustering" doesn't have > any defined meaning. Perhaps you've arranged the data in a way that has no > similarity to heap, and now somebody runs a CLUSTER command (with no > arguments.) I think nothing would happen in this case - only pre-clustered tables get clustered in an argumentless CLUSTER. What am I missing? Greetings, Andres Freund
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Thu, 16 Jun 2022 10:34:22 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 15 Jun 2022 14:56:42 -0400, Tom Lane wrote in > > Alvaro Herrera writes: > > > So, git archaeology led me to this thread > > > https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql > > > which is why we added that message in the first place. > > > > Um. Good thing you looked. I doubt we want to revert that change now. > > > > > Alternatives: > > > - Have the client not complain if it gets CloseComplete in idle state. > > > (After all, it's a pretty useless message, since we already do nothing > > > with it if we get it in BUSY state.) > > > > ISTM the actual problem here is that we're reverting to IDLE state too > > soon. I didn't try to trace down exactly where that's happening, but > > Yes. I once visited that fact but also I thought that in the > comparison with non-pipelined PQsendQuery, the three messages look > extra. Thus I concluded (at the time) that removing Close is enough > here. > > > I notice that in the non-pipeline case we don't go to IDLE till we've > > seen 'Z' (Sync). Something in the pipeline logic must be jumping the > > gun on that state transition. > - PQgetResult() resets the state to IDLE when not in pipeline mode. D... the "not" should not be there. + PQgetResult() resets the state to IDLE while in pipeline mode. > fe-exec.c:2171 > > > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > > { > > /* > > * We're about to send the results of the > > current query. Set > > * us idle now, and ... > > */ > > conn->asyncStatus = PGASYNC_IDLE; > > And actually that code let the connection state enter to IDLE before > CloseComplete. In the test case I posted, the following happens. > > PQsendQuery(conn, "SELECT 1;"); > PQsendFlushRequest(conn); > PQgetResult(conn); // state enters IDLE, reads down to > > PQgetResult(conn); // reads > PQpipelineSync(conn); // sync too late > > Pipeline feature seems intending to allow PQgetResult called before > PQpipelineSync. 
And also seems allowing to call PQpipelineSync() after > PQgetResult(). > > I haven't come up with a valid *fix* of this flow.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Wed, 15 Jun 2022 14:56:42 -0400, Tom Lane wrote in > Alvaro Herrera writes: > > So, git archaeology led me to this thread > > https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql > > which is why we added that message in the first place. > > Um. Good thing you looked. I doubt we want to revert that change now. > > > Alternatives: > > - Have the client not complain if it gets CloseComplete in idle state. > > (After all, it's a pretty useless message, since we already do nothing > > with it if we get it in BUSY state.) > > ISTM the actual problem here is that we're reverting to IDLE state too > soon. I didn't try to trace down exactly where that's happening, but Yes. I once visited that fact but also I thought that in the comparison with non-pipelined PQsendQuery, the three messages look extra. Thus I concluded (at the time) that removing Close is enough here. > I notice that in the non-pipeline case we don't go to IDLE till we've > seen 'Z' (Sync). Something in the pipeline logic must be jumping the > gun on that state transition. PQgetResult() resets the state to IDLE when not in pipeline mode. fe-exec.c:2171 > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > { > /* >* We're about to send the results of the > current query. Set >* us idle now, and ... >*/ > conn->asyncStatus = PGASYNC_IDLE; And actually that code let the connection state enter to IDLE before CloseComplete. In the test case I posted, the following happens. PQsendQuery(conn, "SELECT 1;"); PQsendFlushRequest(conn); PQgetResult(conn); // state enters IDLE, reads down to PQgetResult(conn); // reads PQpipelineSync(conn); // sync too late Pipeline feature seems intending to allow PQgetResult called before PQpipelineSync. And also seems allowing to call PQpipelineSync() after PQgetResult(). I haven't come up with a valid *fix* of this flow.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
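[Editor's illustration] The state problem described in this thread can be reduced to a tiny standalone model. This is not libpq's real enum or logic, just a sketch: if PQgetResult() flips the connection to plain IDLE before the CloseComplete generated by PQsendQuery arrives, that message looks "unexpected"; a dedicated pipeline-idle state, as tried in the crude patch upthread, keeps it expected and silently discarded:

```c
#include <assert.h>

/* Hypothetical reduction of libpq's async-state handling. */
typedef enum
{
    ASYNC_BUSY,                 /* reading a query's results */
    ASYNC_IDLE,                 /* nothing expected from the server */
    ASYNC_PIPELINE_IDLE         /* idle, but pipeline traffic may trail */
} AsyncState;

/* Returns 1 if a CloseComplete ('3') message is consumed quietly,
 * 0 if it would trigger the "unexpected response" notice. */
static int
close_complete_ok(AsyncState state)
{
    return state == ASYNC_BUSY || state == ASYNC_PIPELINE_IDLE;
}
```

In the problem sequence, the second PQgetResult() has already moved the mock state from ASYNC_BUSY to ASYNC_IDLE when CloseComplete arrives; separating out ASYNC_PIPELINE_IDLE is the shape of the fix being floated.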
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 6:01 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: >> While developing various Table Access Methods, I have wanted a callback for >> determining if CLUSTER (and VACUUM FULL) should be run against a table >> backed by a given TAM. The current API contains a callback for doing the >> guts of the cluster, but by that time, it's a bit too late to cleanly back >> out. For single relation cluster commands, raising an error from that >> callback is probably not too bad. For multi-relation cluster commands, that >> aborts the clustering of other yet to be processed relations, which doesn't >> seem acceptable. > > Why not? What else do you want to do in that case? Silently ignoring > non-clusterable tables doesn't seem right either. What's the use-case for > swallowing the error? Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning. Perhaps you've arranged the data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.) It's reasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a table using your exotic TAM shouldn't interfere with that. Then what? You are forced to copy all the data from your OldHeap (badly named) to the NewHeap (also badly named), or to raise an error. That doesn't seem ok. >> I tried fixing this with a ProcessUtility_hook, but that >> fires before the multi-relation cluster command has compiled the list of >> relations to cluster, meaning the ProcessUtility_hook doesn't have access to >> the necessary information. (It can be hacked to compile the list of >> relations itself, but that duplicates both code and effort, with the usual >> risks that the code will get out of sync.) >> >> For my personal development, I have declared a new hook, bool >> (*relation_supports_cluster) (Relation rel). 
It gets called from >> commands/cluster.c in both the single-relation and multi-relation code >> paths, with warning or debug log messages output for relations that decline >> to be clustered, respectively. > > Do you actually need to dynamically decide whether CLUSTER is supported? > Otherwise we could just make the existing cluster callback optional and error > out if a table is clustered that doesn't have the callback. Same as above, I don't know why erroring would be the right thing to do. As a comparison, consider that we don't attempt to cluster a partitioned table, but rather just silently skip it. Imagine if, when we introduced the concept of partitioned tables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table. >> Before posting a patch, do people think this sounds useful? Would you like >> the hook function signature to differ in some way? Is a simple "yes this >> relation may be clustered" vs. "no this relation may not be clustered" >> interface overly simplistic? > > It seems overly complicated, if anything ;) For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just as easily have used (void). But that seems to fence in what other TAM authors could do in the future. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Extending USING [heap | mytam | yourtam] grammar and behavior
Hackers, I have extended the grammar to allow "USING NOT method [, ...]" to exclude one or more TAMs in a CREATE TABLE statement. This may sound like a weird thing to do, but it is surprisingly useful when developing new Table Access Methods, particularly when you are developing two or more, not just one. To explain: Developing a new TAM takes an awful lot of testing, and much of it is duplicative of the existing core regression test suite. Leveraging the existing tests saves an awful lot of test development. When developing just one TAM, leveraging the existing tests isn't too hard. Without much work*, you can set default_table_access_method=mytam for the duration of the check-world. You'll get a few test failures this way. Some will be in tests that probe the catalogs to verify that /heap/ is stored there, and instead /mytam/ is found. Others will be tests that are sensitive to the number of rows that fit per page, etc. But a surprising number of tests just pass, at least after you get the TAM itself debugged. When developing two or more TAMs, this falls apart. Some tests may be worth fixing up (perhaps with alternate output files) for "mytam", but not for "columnar_tam". That might be because the test is checking fundamentally row-store-ish properties of the table, which has no applicability to your column-store-ish TAM. In that case, "USING NOT columnar_tam" fixes the test failure when columnar is the default, without preventing the test from testing "mytam" when it happens to be the default. Once you have enough TAMs developed and deployed, this USING NOT business becomes useful in production. You might have different defaults on different servers, or for different customers, etc., and for a given piece of DDL that you want to release you only want to say which TAMs not to use, not to nail down which TAM must be used. Thoughts? I'll hold off posting a patch until the general idea is debated. [*] It takes some extra work to get the TAP tests to play along. 
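[Editor's illustration] Drawing on the semantics clarified later in this thread ("USING NOT foo" falls back to heap when foo is the default, and otherwise behaves as if no USING clause were given), the resolution rule can be sketched standalone. All names here are invented; this is not the actual grammar or DDL code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical resolution of a "USING NOT a [, b ...]" clause: if the
 * current default_table_access_method appears in the excluded list,
 * fall back to the hardcoded "heap"; otherwise keep the default, i.e.
 * behave as if no USING clause had been given. */
static const char *
resolve_using_not(const char *default_am,
                  const char *const *not_list, int n_not)
{
    for (int i = 0; i < n_not; i++)
    {
        if (strcmp(default_am, not_list[i]) == 0)
            return "heap";      /* the one and only fallback, for now */
    }
    return default_am;
}
```

So a test file carrying "USING NOT columnar_tam" still exercises "mytam" whenever mytam is the configured default, which is the test-reuse property being argued for.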
— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: > While developing various Table Access Methods, I have wanted a callback for > determining if CLUSTER (and VACUUM FULL) should be run against a table > backed by a given TAM. The current API contains a callback for doing the > guts of the cluster, but by that time, it's a bit too late to cleanly back > out. For single relation cluster commands, raising an error from that > callback is probably not too bad. For multi-relation cluster commands, that > aborts the clustering of other yet to be processed relations, which doesn't > seem acceptable. Why not? What else do you want to do in that case? Silently ignoring non-clusterable tables doesn't seem right either. What's the use-case for swallowing the error? > I tried fixing this with a ProcessUtility_hook, but that > fires before the multi-relation cluster command has compiled the list of > relations to cluster, meaning the ProcessUtility_hook doesn't have access to > the necessary information. (It can be hacked to compile the list of > relations itself, but that duplicates both code and effort, with the usual > risks that the code will get out of sync.) > > For my personal development, I have declared a new hook, bool > (*relation_supports_cluster) (Relation rel). It gets called from > commands/cluster.c in both the single-relation and multi-relation code > paths, with warning or debug log messages output for relations that decline > to be clustered, respectively. Do you actually need to dynamically decide whether CLUSTER is supported? Otherwise we could just make the existing cluster callback optional and error out if a table is clustered that doesn't have the callback. > Before posting a patch, do people think this sounds useful? Would you like > the hook function signature to differ in some way? Is a simple "yes this > relation may be clustered" vs. "no this relation may not be clustered" > interface overly simplistic? 
It seems overly complicated, if anything ;) Greetings, Andres Freund
Re:
On Thu, Jun 9, 2022 at 11:50 PM Tom Lane wrote: > > Peter Eisentraut writes: > > Initially, that chapter did not document any system views. > > Maybe we could make the system views a separate chapter? +1 -- Kind Regards, Peter Smith. Fujitsu Australia
Re: tablesync copy ignores publication actions
On Wed, Jun 15, 2022 at 5:05 PM shiy.f...@fujitsu.com wrote: > ... > Thanks for updating the patch. Two comments: > > 1. > + it means the copied table t3 contains all rows even > when > + they do not patch the row filter of publication > pub3b. > > Typo. I think "they do not patch the row filter" should be "they do not match > the row filter", right? > > 2. > @@ -500,7 +704,6 @@ > > > > - > > > > > It seems we should remove this change. > Thank you for your review comments. Those reported mistakes are fixed in the attached patch v3. -- Kind Regards, Peter Smith. Fujitsu Australia v3-0001-PGDOCS-tablesync-ignores-publish-operation.patch Description: Binary data
Modest proposal to extend TableAM API for controlling cluster commands
Hackers, While developing various Table Access Methods, I have wanted a callback for determining if CLUSTER (and VACUUM FULL) should be run against a table backed by a given TAM. The current API contains a callback for doing the guts of the cluster, but by that time, it's a bit too late to cleanly back out. For single relation cluster commands, raising an error from that callback is probably not too bad. For multi-relation cluster commands, that aborts the clustering of other yet to be processed relations, which doesn't seem acceptable. I tried fixing this with a ProcessUtility_hook, but that fires before the multi-relation cluster command has compiled the list of relations to cluster, meaning the ProcessUtility_hook doesn't have access to the necessary information. (It can be hacked to compile the list of relations itself, but that duplicates both code and effort, with the usual risks that the code will get out of sync.) For my personal development, I have declared a new hook, bool (*relation_supports_cluster) (Relation rel). It gets called from commands/cluster.c in both the single-relation and multi-relation code paths, with warning or debug log messages output for relations that decline to be clustered, respectively. Before posting a patch, do people think this sounds useful? Would you like the hook function signature to differ in some way? Is a simple "yes this relation may be clustered" vs. "no this relation may not be clustered" interface overly simplistic? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Small TAP improvements
On Wed, Jun 15, 2022 at 07:59:10AM -0400, Andrew Dunstan wrote: > Why would we do that? If you want a map don't pass any switches. But as > written you could do: > > my ($incdir, $localedir, $sharedir) = $node->config_data(qw(--includedir > --localedir --sharedir)); > > No map needed to get what you want, in fact a map would get in the > way. Nice, I didn't know you could do that. That's a pattern worth mentioning in the perldoc part as an example, in my opinion. -- Michael
Re: "buffer too small" or "path too long"?
On Wed, Jun 15, 2022 at 02:02:03PM -0400, Tom Lane wrote:
> Yeah, that was what was bugging me about this proposal. Removing
> one function's dependency on MAXPGPATH isn't much of a step forward.

This comes down to out-of-memory vs path length at the end. Changing only the paths of make_outputdirs() without touching all the paths in check.c and the one in function.c does not sound good to me, as this increases the risk of failing pg_upgrade in the middle, and that's what we should avoid, as said upthread.

> I note also that the patch leaks quite a lot of memory (a kilobyte or
> so per pathname, IIRC). That's probably negligible in this particular
> context, but anyplace that was called more than once per program run
> would need to be more tidy.

Surely.
--
Michael
Re: better page-level checksums
On Wed, Jun 15, 2022 at 1:27 PM Robert Haas wrote:
> I think what will happen, depending on the encryption mode, is probably
> that either (a) the page will decrypt to complete garbage or (b) the
> page will fail some kind of verification and you won't be able to
> decrypt it at all. Either way, you won't be able to infer anything about
> what caused the problem. All you'll know is that something is wrong.
> That sucks - a lot - and I don't have a lot of good ideas as to what can
> be done about it. The idea that an encrypted page is unintelligible and
> that small changes to either the encrypted or unencrypted data should
> result in large changes to the other is intrinsic to the nature of
> encryption. It's more or less un-debuggable by design.

It's pretty clear that there must be a lot of truth to that. But that doesn't mean that there aren't meaningful gradations beyond that.

I think that it's worth doing the following exercise (humor me): Why wouldn't it be okay to just encrypt the tuple space and the line pointer array, leaving both the page header and page special area unencrypted? What kind of user would find that trade-off to be unacceptable, and why? What's the nuance of it?

For all I know you're right (about encrypting the whole page, metadata and all). I just want to know why that is. I understand that this whole area is one where in general we may have to live with a certain amount of uncertainty about what really matters.

> With extended checksums, I don't think the issues are anywhere near as
> bad. I'm not deeply opposed to setting a page-level flag but I expect
> nominal benefits.

I also expect only a small benefit. But that isn't a particularly important factor in my mind. Let's suppose that it turns out to be significantly more useful than we originally expected, for whatever reason. Assuming all that, what else can be said about it now? Isn't it now *relatively* likely that including that status bit metadata will be *extremely* valuable, and not merely somewhat more valuable?

I guess it doesn't matter much now (since you have all but conceded that using a bit for this makes sense), but FWIW that's the main reason why I almost took it for granted that we'd need to use a status bit (or bits) for this.

--
Peter Geoghegan
Re: better page-level checksums
On Tue, Jun 14, 2022 at 10:30 PM Peter Geoghegan wrote:
> Basically I think that this is giving up rather a lot. For example,
> isn't it possible that we'd have corruption that could be a bug in
> either the checksum code, or in recovery?
>
> I'd feel a lot better about it if there was some sense of both the
> costs and the benefits.

I think that, if and when we get TDE, debuggability is likely to be a huge issue. Something will go wrong for someone at some point, and when it does, what they'll have is a supposedly-encrypted page that cannot be decrypted, and it will be totally unclear what has gone wrong. Did the page get corrupted on disk by a random bit flip? Is there a bug in the algorithm? Torn page? As things stand today, when a page gets corrupted, a human being can look at the page and make an educated guess about what has gone wrong and whether PostgreSQL or some other system is to blame, and if it's PostgreSQL, perhaps have some ideas as to where to look for the bug. If the pages are encrypted, that's a lot harder.

I think what will happen, depending on the encryption mode, is probably that either (a) the page will decrypt to complete garbage or (b) the page will fail some kind of verification and you won't be able to decrypt it at all. Either way, you won't be able to infer anything about what caused the problem. All you'll know is that something is wrong. That sucks - a lot - and I don't have a lot of good ideas as to what can be done about it. The idea that an encrypted page is unintelligible and that small changes to either the encrypted or unencrypted data should result in large changes to the other is intrinsic to the nature of encryption. It's more or less un-debuggable by design.

With extended checksums, I don't think the issues are anywhere near as bad. I'm not deeply opposed to setting a page-level flag but I expect nominal benefits. A human being looking at the page isn't going to have a ton of trouble figuring out whether or not the extended checksum is present unless the page is horribly, horribly garbled, and even if that happens, will debugging that problem really be any worse than debugging a horribly, horribly garbled page today? I don't think so. I likewise expect that pg_filedump could use heuristics to figure out what's going on just by looking at the page, even if no external information is available. You are probably right when you say that there's no need to be so parsimonious with pd_flags space as all that, but I believe that if we did decide to set no bit in pd_flags, whoever maintains pg_filedump these days would not have huge difficulty inventing a suitable heuristic. A page with an extended checksum is basically still an intelligible page, and we shouldn't understate the value of that.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: pg_upgrade (12->14) fails on aggregate
Justin Pryzby writes:
> But Petr has a point - pg_upgrade should aspire to catch errors in --check,
> rather than starting and then leaving a mess behind for the user to clean up

Agreed; pg_upgrade has historically tried to find problems similar to this. However, it's not just aggregates that are at risk. People might also have built user-defined plain functions, or operators, atop these functions. How far do we want to go in looking?

As for the query, I think it could be simplified quite a bit by relying on regprocedure literals, that is something like

    WHERE ... a.aggtransfn IN
      ('array_append(anyarray,anyelement)'::regprocedure,
       'array_prepend(anyelement,anyarray)'::regprocedure, ...)

Not sure if it's necessary to stick explicit "pg_catalog." schema qualifications into this --- IIRC pg_upgrade runs with restrictive search_path, so that this would be safe as-is.

Also, I think you need to check aggfinalfn too.

Also, I'd be inclined to reject system-provided objects by checking for OID >= 16384 rather than hard-wiring assumptions about things being in pg_catalog or not.

regards, tom lane
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Alvaro Herrera writes:
> So, git archaeology led me to this thread
> https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql
> which is why we added that message in the first place.

Um. Good thing you looked. I doubt we want to revert that change now.

> Alternatives:
> - Have the client not complain if it gets CloseComplete in idle state.
>   (After all, it's a pretty useless message, since we already do nothing
>   with it if we get it in BUSY state.)

ISTM the actual problem here is that we're reverting to IDLE state too soon. I didn't try to trace down exactly where that's happening, but I notice that in the non-pipeline case we don't go to IDLE till we've seen 'Z' (Sync). Something in the pipeline logic must be jumping the gun on that state transition.

regards, tom lane
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
On 2022-Jun-10, Kyotaro Horiguchi wrote:

> (Moved to -hackers)
>
> At Wed, 8 Jun 2022 17:08:47 +0200, Alvaro Herrera wrote in
> > What that Close message is doing is closing the unnamed portal, which
> > is otherwise closed implicitly when the next one is opened. That's how
> > single-query mode works: if you run a single portal, it'll be kept open.
> >
> > I believe that the right fix is to not send that Close message in
> > PQsendQuery.
>
> Agreed. At least the Close message in that context is useless and
> PQsendQueryGuts doesn't send it. And removing the Close message surely
> fixes the issue.

So, git archaeology led me to this thread
https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql
which is why we added that message in the first place.

I was about to push the attached patch (a merged version of Kyotaro's and mine), but now I'm wondering if that's the right approach. Alternatives:

- Have the client not complain if it gets CloseComplete in idle state.
  (After all, it's a pretty useless message, since we already do nothing
  with it if we get it in BUSY state.)

- Have the server not send CloseComplete at all in the cases where the
  client is not expecting it. Not sure how this would be implemented.

- Other ideas?

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense." (Paul Thomas)

>From d9f0ff57fe8aa7f963a9411741bb1d68082cc31a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 15 Jun 2022 19:56:41 +0200
Subject: [PATCH v2] PQsendQuery: Don't send Close in pipeline mode

In commit 4efcf47053ea, we modified PQsendQuery to send a Close message
when in pipeline mode. But now we discover that that's not a good thing:
under certain circumstances, it causes the server to deliver a
CloseComplete message at a time when the client is not expecting it. We
failed to notice it because the tests don't have any scenario where the
problem is hit.

Remove the offending Close, and add a test case that tickles the
problematic scenario.

Co-authored-by: Kyotaro Horiguchi
Reported-by: Daniele Varrazzo
Discussion: https://postgr.es/m/ca+mi_8bvd0_cw3sumgwpvwdnzxy32itog_16tdyru_1s2gv...@mail.gmail.com
---
 src/interfaces/libpq/fe-exec.c                |  5 --
 .../modules/libpq_pipeline/libpq_pipeline.c   | 62 +++
 .../traces/pipeline_abort.trace               |  2 -
 .../traces/simple_pipeline.trace              | 12
 4 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 4180683194..e2df3a3480 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1403,11 +1403,6 @@ PQsendQueryInternal(PGconn *conn, const char *query, bool newQuery)
 		pqPutInt(0, 4, conn) < 0 ||
 		pqPutMsgEnd(conn) < 0)
 		goto sendFailed;
-	if (pqPutMsgStart('C', conn) < 0 ||
-		pqPutc('P', conn) < 0 ||
-		pqPuts("", conn) < 0 ||
-		pqPutMsgEnd(conn) < 0)
-		goto sendFailed;
 	entry->queryclass = PGQUERY_EXTENDED;
 	entry->query = strdup(query);
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index c27c4e0ada..e24fbfe1cc 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -968,15 +968,29 @@ test_prepared(PGconn *conn)
 	fprintf(stderr, "ok\n");
 }

+/* Notice processor: print them, and count how many we got */
+static void
+notice_processor(void *arg, const char *message)
+{
+	int		   *n_notices = (int *) arg;
+
+	(*n_notices)++;
+	fprintf(stderr, "NOTICE %d: %s", *n_notices, message);
+}
+
 static void
 test_simple_pipeline(PGconn *conn)
 {
 	PGresult   *res = NULL;
 	const char *dummy_params[1] = {"1"};
 	Oid			dummy_param_oids[1] = {INT4OID};
+	PQnoticeProcessor oldproc;
+	int			n_notices = 0;

 	fprintf(stderr, "simple pipeline... ");

+	oldproc = PQsetNoticeProcessor(conn, notice_processor, &n_notices);
+
 	/*
 	 * Enter pipeline mode and dispatch a set of operations, which we'll then
 	 * process the results of as they come in.
@@ -1052,6 +1066,54 @@ test_simple_pipeline(PGconn *conn)
 	if (PQpipelineStatus(conn) != PQ_PIPELINE_OFF)
 		pg_fatal("Exiting pipeline mode didn't seem to work");

+	if (n_notices > 0)
+		pg_fatal("unexpected notice");
+
+	/* Try the same thing with PQsendQuery */
+	if (PQenterPipelineMode(conn) != 1)
+		pg_fatal("failed to enter pipeline mode: %s", PQerrorMessage(conn));
+
+	if (PQsendQuery(conn, "SELECT 1") != 1)
+		pg_fatal("failed to send query: %s", PQerrorMessage(conn));
+	PQsendFlushRequest(conn);
+	res = PQgetResult(conn);
+	if (res == NULL)
+		pg_fatal("PQgetResult returned null when there's a pipeline item: %s",
+				 PQerrorMessage(conn));
+	if (PQresultStatus(res) !=
Re: "buffer too small" or "path too long"?
Robert Haas writes:
> On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut wrote:
>> We have this problem of long file names being silently truncated all
>> over the source code. Instead of equipping each one of them with a
>> length check, why don't we get rid of the fixed-size buffers and
>> allocate dynamically, as in the attached patch.

> I don't know how much we gain by fixing one place and not all the
> others, but maybe it would set a trend.

Yeah, that was what was bugging me about this proposal. Removing one function's dependency on MAXPGPATH isn't much of a step forward.

I note also that the patch leaks quite a lot of memory (a kilobyte or so per pathname, IIRC). That's probably negligible in this particular context, but anyplace that was called more than once per program run would need to be more tidy.

regards, tom lane
Re: docs: mention "pg_read_all_stats" in "track_activities" description
On Tue, Jun 07, 2022 at 10:08:21PM +0900, Ian Lawrence Barwick wrote:
> A little late to the party, but as an alternative suggestion for the last
> part:
>
>   "... and users who either own the session being reported on, or who have
>   privileges of the role to which the session belongs,"
>
> so the whole sentence would read:
>
>   Note that even when enabled, this information is only visible to
>   superusers, roles with privileges of the pg_read_all_stats role, and
>   users who either own the session being reported on or who have
>   privileges of the role to which the session belongs, so it should not
>   represent a security risk.

This seems clearer to me.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Skipping logical replication transactions on subscriber side
On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada wrote:
> > AFAICS, we could do that by:
> >
> > 1. De-supporting platforms that have this problem, or
> > 2. Introducing new typalign values, as Noah proposed back on April 2, or
> > 3. Somehow forcing values that are sometimes 4-byte aligned and
> >    sometimes 8-byte aligned to be 8-byte alignment on all platforms
>
> Introducing new typalign values seems a good idea to me as it's more
> future-proof. This will be an item for PG16, right? The main concern
> seems to be that what this test case enforces would be a nuisance when
> introducing a new system catalog or a new column to an existing catalog,
> but given we're post PG15-beta1 it is unlikely to happen in PG15.

I agree that we're not likely to introduce a new typalign value any sooner than v16. There are a couple of things that bother me about that solution. One is that I don't know how many different behaviors exist out there in the wild. If we distinguish the alignment of double from the alignment of int8, is that good enough, or are there other data types whose properties aren't necessarily the same as either of those? The other is that 32-bit systems are already relatively rare and probably will become more rare until they disappear completely. It doesn't seem like a ton of fun to engineer solutions to problems that may go away by themselves with the passage of time. On the other hand, if the alternative is to live with this kind of ugliness for another 5 years, maybe the time it takes to craft a solution is effort well spent.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: "buffer too small" or "path too long"?
On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut wrote:
> We have this problem of long file names being silently truncated all
> over the source code. Instead of equipping each one of them with a
> length check, why don't we get rid of the fixed-size buffers and
> allocate dynamically, as in the attached patch.

I've always wondered why we rely on MAXPGPATH instead of dynamic allocation. It seems pretty lame. I don't know how much we gain by fixing one place and not all the others, but maybe it would set a trend.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier wrote:
> Handle is more consistent with the other types of interruptions in the
> SIGUSR1 handler, so the name proposed in the patch is not that
> confusing to me. And so does Process, in spirit with
> ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt().
> While on it, is postgres.c the best home for
> HandleRecoveryConflictInterrupt()? That's a very generic file, for
> starters. Not related to the actual bug, just asking.

Yeah, there's existing precedent for this kind of split in, for example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I think the idea is that "process" is supposed to sound like the more involved part of the operation, whereas "handle" is supposed to sound like the initial response to the signal. I'm not sure it's the clearest possible naming, but naming things is hard, and this patch is apparently not inventing a new way to do it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: better page-level checksums
On Wed, Jun 15, 2022 at 4:54 AM Peter Eisentraut wrote:
> It's hard to get any definite information about what size of checksum is
> "good enough", since after all it depends on what kinds of errors you
> expect and what kinds of probabilities you want to accept. But the best
> I could gather so far is that 16-bit CRC are good until about 16 kB
> block size.

Not really. There's a lot of misinformation on this topic floating around on this mailing list, and some of that misinformation is my fault. I keep learning more about this topic. However, I'm pretty confident that, on the one hand, there's no hard limit on the size of the data that can be effectively validated via a CRC, and on the other hand, CRC isn't a particularly great algorithm, although it does have certain interesting advantages for certain purposes.

For example, according to https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks#Error_detection_strength a CRC is guaranteed to detect all single-bit errors. This property is easy to achieve: for example, a parity bit has this property. According to the same source, a CRC is guaranteed to detect two-bit errors only if the distance between them is less than some limit that gets larger as the CRC gets wider.

Imagine that you have a CRC-16 of a message 64k+1 bits in length. Suppose that an error in the first bit changes the result from v to v'. Can we, by flipping a second bit later in the message, change the final result from v' back to v? The calculation only has 64k possible answers, and we have 64k bits we can flip to try to get the desired answer. If every one of those bit flips produces a different answer, then one of those answers must be v -- which means detection of two-bit errors is not guaranteed. If at least two of those bit flips produce the same answer, then consider the messages produced by those two different bit flips. They differ from each other by exactly two bits and yet produce the same CRC, so detection of two-bit errors is still not guaranteed.

On the other hand, it's still highly likely. If a message of length 2^16+1 bits contains two bit errors, one of which is in the first bit, the chances that the other one is in exactly the right place to cancel out the first error are about 2^-16. That's not zero, but it's just as good as our chances of detecting a replacement of the entire message with some other message chosen completely at random.

I think the reason why discussion of CRCs tends to focus on the types of bit errors that it can detect is that the algorithm was designed when people were doing stuff like sending files over a modem. It's easy to understand how individual bits could get garbled without anybody noticing, while large-scale corruption would be less likely, but the risks are not necessarily the same for a PostgreSQL data file. Lower levels of the stack are probably already using checksums to try to detect errors at the level of the physical medium. I'm sure some stuff slips through the cracks, but in practice we also see failure modes where the filesystem substitutes 8kB of data from an unrelated file, or where a torn write in combination with unreliable fsync results in half of the page contents being from an older version of the page. These kinds of large-scale replacements aren't what CRCs are designed to detect, and the chances that we will detect them are roughly 1-2^-bits, whether we use a CRC or something else.

Of course, that partly depends on the algorithm quality. If an algorithm is more likely to generate some results than others, then its actual error detection rate will not be as good as the number of output bits would suggest. If the result doesn't depend equally on every input bit, then the actual error detection rate will not be as good as the number of output bits would suggest.
And CRC-32 is apparently not great by modern standards: https://github.com/rurban/smhasher

Compare the results for CRC-32 with, say, Spooky32. Apparently the latter is faster yet produces better output. So maybe we would've been better off if we'd made Spooky32 the default algorithm for backup manifest checksums rather than CRC-32.

> The benefits of doubling the checksum size are exponential rather than
> linear, so there is no significant benefit of using a 64-bit checksum
> over a 32-bit one, for supported block sizes (current max is 32 kB).

I'm still unconvinced that the block size is very relevant here.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Remove trailing newlines from pg_upgrade's messages
Kyotaro Horiguchi writes:
> Also leading newlines and just "\n" bug me when I edit message
> catalogues with poedit. I might want a line-spacing function like
> pg_log_newline(PG_REPORT) if we need line-breaks in the ends of a
> message.

Yeah, that is sort of the inverse problem. I think those are there to ensure that the text appears on a fresh line even if the current line has transient status on it. We could get rid of those perhaps if we teach pg_log_v to remember whether it ended the last output with a newline or not, and then put out a leading newline only if necessary, rather than hard-wiring one into the message texts.

This might take a little bit of fiddling to make it work, because we'd not want the extra newline when completing an incomplete line by adding status. That would mean that report_status would have to do something special, plus we'd have to be sure that all such cases do go through report_status rather than calling pg_log directly. (I'm fairly sure that the code is sloppy about that today :-(.) It seems probably do-able, though.

regards, tom lane
Re: support for MERGE
Pushed this.

I noticed that the paragraph that described MERGE in the read-committed subsection had been put in the middle of some other paras that describe interactions with other transactions, so I moved it up so that it appears together with all the other paras that describe the behavior of specific commands, which is where I believe it belongs.

Thanks!

--
Álvaro Herrera        PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)
Re: CREATE TABLE ( .. STORAGE ..)
Hi hackers,

I noticed that cfbot is not entirely happy with the patch, so I rebased it.

> I see that COMPRESSION and STORAGE now are handled slightly
> differently in the grammar. Maybe we could standardize that a bit
> more; so that we have only one `STORAGE [kind]` definition in the
> grammar?
>
> As I'm new to the grammar files; would you know the difference between
> `name` and `ColId`, and why you would change from one to the other in
> ALTER COLUMN STORAGE?

Good point, Matthias. I addressed this in 0002. Does it look better now?

--
Best regards,
Aleksander Alekseev

Attachments:
  v3-0001-CREATE-TABLE-.-STORAGE.patch
  v3-0002-Handle-SET-STORAGE-and-SET-COMPRESSION-similarly.patch
Re: [PATCH] Add sortsupport for range types and btree_gist
Hi Christoph!

> On 15 Jun 2022, at 15:45, Christoph Heiss wrote:
>
> By sorting the data before inserting it into the index results in a much
> better index structure, leading to significant performance improvements.

Here's my version of the very similar idea [0]. It lacks range types support. On a quick glance your version lacks support of abbreviated sort, so I think benchmarks can be pushed even further :) Let's merge our efforts and create a combined patch?

Please, create a new entry for the patch on Commitfest. Thank you!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/37/2824/
Re: replacing role-level NOINHERIT with a grant-level option
On Wed, Jun 15, 2022 at 5:23 AM Peter Eisentraut wrote:
> > Consider a user who in general prefers the NOINHERIT behavior but also
> > wants to use predefined roles. Perhaps user 'peter' is to be granted
> > both 'paul' and 'pg_execute_server_programs'. If role 'peter' is set
> > to INHERIT, Peter will be sad, because his love for NOINHERIT probably
> > means that he doesn't want to exercise Paul's privileges
> > automatically. However, he needs to inherit the privileges of
> > 'pg_execute_server_programs' or they are of no use to him. Peter
> > presumably wants to use COPY TO/FROM program to put data into a table
> > owned by 'peter', not a table owned by 'pg_execute_server_programs'.
> > If so, being able to SET ROLE to 'pg_execute_server_programs' is of no
> > use to him at all, but inheriting the privilege is useful.
>
> That's because our implementation of SET ROLE is bogus. We should have
> a SET ROLE that is separate from SET SESSION AUTHORIZATION, where the
> current user can keep their current user-ness and additionally enable
> (non-inherited) roles.

It would help me to have a better description of what you think the behavior ought to be. I've always thought there was something funny about SET ROLE and SET SESSION AUTHORIZATION, because it seems like they are too similar to each other. But it would surprise me if SET ROLE added additional privileges to my session while leaving the old ones intact, too, much as I'd be surprised if SET work_mem = '8MB' followed by SET work_mem = '1GB' somehow left both values partly in effect at the same time. It feels to me like SET is describing an action that changes the session state, rather than adding to it.

> I'm mainly concerned that (AAIU), you propose to remove the current
> INHERIT/NOINHERIT attribute of roles. I wouldn't like that. If you
> want a feature that allows overriding that per-grant, maybe that's okay.

Yeah, I want to remove it and replace it with something more fine-grained. I don't yet understand why that's a problem for anything you want to do.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: [PATCH] Compression dictionaries for JSONB
Hi Matthias,

> The bulk of the patch should still be usable, but I think that the way
> it interfaces with the CREATE TABLE (column ...) APIs would need
> reworking to build on top of the APIs of the "pluggable toaster"
> patches (so, creating toasters instead of types). I think that would
> allow for an overall better user experience and better performance due
> to decreased need for fully decompressed type casting.

Many thanks for the feedback. The "pluggable TOASTer" patch looks very interesting indeed. I'm currently trying to make heads or tails of it and trying to figure out if it can be used as a base for compression dictionaries, especially for implementing the partial decompression. Hopefully I will be able to contribute to it and to the dependent patch [1] in the upcoming CF, at least as a tester/reviewer. Focusing our efforts on [1] for now seems to be a good strategy.

My current impression of your idea is somewhat mixed at this point, though. Teodor's goal is to allow creating _extensions_ that implement alternative TOAST strategies, which use alternative compression algorithms and/or use the knowledge of the binary representation of the particular type. For sure, this would be a nice thing to have. However, during the discussion of the "compression dictionaries" RFC the consensus was reached that the community wants to see it as a _built_in_ functionality rather than an extension. Otherwise we could simply add ZSON to /contrib/ as it was originally proposed.

So if we are going to keep "compression dictionaries" a built-in functionality, putting artificial constraints on its particular implementation, or adding artificial dependencies between two rather complicated patches, is arguably a controversial idea. Especially considering the fact that it was shown that the feature can be implemented without these dependencies, in a very non-invasive way.

These are just my initial thoughts I would like to share, though. I may change my mind after diving deeper into the "pluggable TOASTer" patch. I cc:'ed Teodor in case he would like to share his insights on the topic.

[1]: https://commitfest.postgresql.org/38/3479/

--
Best regards,
Aleksander Alekseev
Re: Replica Identity check of partition table on subscriber
On Wed, Jun 15, 2022 at 8:52 AM shiy.f...@fujitsu.com wrote:
> On Tue, Jun 14, 2022 8:57 PM Amit Kapila wrote:
> > > v4-0002 looks good btw, except the nitpick about the test comment
> > > similar to my earlier comment regarding v5-0001:
> > >
> > > +# Change the column order of table on publisher
> > >
> > > I think it might be better to say something specific to describe the
> > > test intent, like:
> > >
> > > Test that replication into partitioned target table continues to work
> > > correctly when the published table is altered
> >
> > Okay, changed this and slightly modified the comments and commit
> > message. I am just attaching the HEAD patches for the first two issues.
>
> Thanks for updating the patch.
>
> Attached the new patch set which ran pgindent, and the patches for pg14
> and pg13. (Only the first two patches of the patch set.)

I have pushed the first bug-fix patch today.

--
With Regards,
Amit Kapila.
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Jun 14, 2022 at 9:07 AM wangw.f...@fujitsu.com wrote: > > > Attach the new patches. > Only changed patches 0001, 0004 and added new separate patch 0005. > Few questions/comments on 0001 === 1. In the commit message, I see: "We also need to allow stream_stop to complete by the apply background worker to avoid deadlocks because T-1's current stream of changes can update rows in conflicting order with T-2's next stream of changes." Thinking about this, won't the T-1 and T-2 deadlock on the publisher node as well if the above statement is true? 2. + +The apply background workers are taken from the pool defined by +max_logical_replication_workers. + + +The default value is 3. This parameter can only be set in the +postgresql.conf file or on the server command +line. + Is there a reason to choose this number as 3? Why not 2 similar to max_sync_workers_per_subscription? 3. + + + Setting streaming mode to apply could export invalid LSN + as finish LSN of failed transaction. Changing the streaming mode and making + the same conflict writes the finish LSN of the failed transaction in the + server log if required. + How will the user identify that this is an invalid LSN value and she shouldn't use it to SKIP the transaction? Can we change the second sentence to: "User should change the streaming mode to 'on' if they would instead wish to see the finish LSN on error. Users can use finish LSN to SKIP applying the transaction." I think we can give reference to docs where the SKIP feature is explained. 4. + * This file contains routines that are intended to support setting up, using, + * and tearing down a ApplyBgworkerState. + * Refer to the comments in file header of logical/worker.c to see more + * informations about apply background worker. Typo. /informations/information. Consider having an empty line between the above two lines. 5. +ApplyBgworkerState * +apply_bgworker_find_or_start(TransactionId xid, bool start) { ... ... 
+ if (!TransactionIdIsValid(xid)) + return NULL; + + /* + * We don't start new background worker if we are not in streaming apply + * mode. + */ + if (MySubscription->stream != SUBSTREAM_APPLY) + return NULL; + + /* + * We don't start new background worker if user has set skiplsn as it's + * possible that user want to skip the streaming transaction. For + * streaming transaction, we need to spill the transaction to disk so that + * we can get the last LSN of the transaction to judge whether to skip + * before starting to apply the change. + */ + if (start && !XLogRecPtrIsInvalid(MySubscription->skiplsn)) + return NULL; + + /* + * For streaming transactions that are being applied in apply background + * worker, we cannot decide whether to apply the change for a relation + * that is not in the READY state (see should_apply_changes_for_rel) as we + * won't know remote_final_lsn by that time. So, we don't start new apply + * background worker in this case. + */ + if (start && !AllTablesyncsReady()) + return NULL; ... ... } Can we move some of these starting checks to a separate function like canstartapplybgworker()? -- With Regards, Amit Kapila.
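For illustration, the consolidated guard function suggested in point 5 could be structured like this sketch (a Python stand-in for the C logic; the argument names are hypothetical and just mirror the checks quoted above):

```python
def can_start_apply_bgworker(xid_valid, stream_mode, skiplsn_set,
                             all_tablesyncs_ready, start):
    """Consolidated early-exit checks mirroring apply_bgworker_find_or_start().

    Each False return corresponds to one of the 'return NULL' guards
    quoted above; the xid and stream-mode checks apply unconditionally,
    the other two only when asked to start a new worker.
    """
    if not xid_valid:
        return False
    # Only streaming mode 'apply' uses background workers.
    if stream_mode != "apply":
        return False
    # A configured skiplsn means the transaction may need to be skipped,
    # which requires spilling it to disk first.
    if start and skiplsn_set:
        return False
    # Until all tablesyncs are READY, remote_final_lsn is unknown.
    if start and not all_tablesyncs_ready:
        return False
    return True
```

Pulling the guards into one predicate like this would also make it obvious which checks depend on the `start` flag and which do not.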
Re: Small TAP improvements
On 2022-06-14 Tu 19:13, Michael Paquier wrote: > On Tue, Jun 14, 2022 at 12:20:56PM -0400, Tom Lane wrote: >> Andrew Dunstan writes: >>> The second changes the new GUCs TAP test to check against the installed >>> postgresql.conf.sample rather than the one in the original source >>> location. There are probably arguments both ways, but if we ever decided >>> to postprocess the file before installation, this would do the right thing. >> Seems like a good idea, especially since it also makes the test code >> shorter and more robust(-looking). > It seems to me that you did not look at the git history very closely. > The first version of 003_check_guc.pl did exactly what 0002 is > proposing to do, see b0a55f4. That's also why config_data() has been > introduced in the first place. This original logic has been reverted > once shortly after, as of 52377bb, per a complaint by Christoph Berg > because this broke some of the assumptions the custom patches of > Debian relied on: > https://www.postgresql.org/message-id/ygyw25oxv5men...@msg.df7cb.de Quite right, I missed that. Still, it now seems to be moot, given what Christoph said at the bottom of the thread. If I'd seen the thread I would probably have been inclined to say that if Debian can patch pg_config they can also patch the test :-) > > And it was also pointed out that we'd better use the version in the > source tree rather than a logic that depends on finding the path from > the output of pg_config with an installation tree assumed to exist > (there should be one for installcheck anyway), as of: > https://www.postgresql.org/message-id/2023925.1644591...@sss.pgh.pa.us > > If the change of 0002 is applied, we will just loop back to the > original issue with Debian. So I am adding Christoph in CC, as he has > also mentioned that the patch applied to PG for Debian that > manipulates the installation paths has been removed, but I may be > wrong in assuming that it is the case. Honestly, I don't care all that much. 
I noticed these issues when dealing with something for EDB that turned out not to be related to these things. I can see arguments both ways on this one. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Small TAP improvements
On 2022-06-14 Tu 19:24, Michael Paquier wrote: > On Tue, Jun 14, 2022 at 05:08:28PM -0400, Tom Lane wrote: >> Andrew Dunstan writes: >>> OK, here's a more principled couple of patches. For config_data, if you >>> give multiple options it gives you back the list of values. If you don't >>> specify any, in scalar context it just gives you back all of pg_config's >>> output, but in array context it gives you a map, so you should be able >>> to say things like: >>> my %node_config = $node->config_data; >> Might be overkill, but since you wrote it already, looks OK to me. > + # exactly one option: hand back the output (minus LF) > + return $stdout if (@options == 1); > + my @lines = split(/\n/, $stdout); > + # more than one option: hand back the list of values; > + return @lines if (@options); > + # no options, array context: return a map > + my @map; > + foreach my $line (@lines) > + { > + my ($k,$v) = split (/ = /,$line,2); > + push(@map, $k, $v); > + } > > This patch is able to handle the case of no option and one option > specified by the caller of the routine. However, pg_config is able to > return a set of values when specifying multiple switches, respecting > the order of the switches, so wouldn't it be better to return a map > made of ($option, $line)? For example, on a command like `pg_config > --sysconfdir --`, we would get back: > (('--sysconfdir', sysconfdir_val), ('--localedir', localedir_val)) > > If this is not worth the trouble, I think that you'd better die() hard > if the caller specifies more than two option switches. Why would we do that? If you want a map, don't pass any switches. But as written you could do: my ($incdir, $localedir, $sharedir) = $node->config_data(qw(--includedir --localedir --sharedir)); No map needed to get what you want, in fact a map would get in the way. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
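For anyone following along, the three calling conventions being discussed (one switch, several switches, no switches) are easy to see in a Python stand-in for config_data() — this is only an illustration of the parsing, not the real TAP helper:

```python
def parse_pg_config(stdout, options=()):
    """Rough Python analogue of the TAP config_data() helper.

    - exactly one option: pg_config prints just the value, so return it
    - several options: return the list of values, in switch order
    - no options: pg_config prints 'KEY = value' lines, so return a dict
    """
    lines = stdout.rstrip("\n").split("\n")
    if len(options) == 1:
        return lines[0]
    if options:
        return lines
    return dict(line.split(" = ", 1) for line in lines)
```

As in the Perl version, the map form only makes sense when no switches are passed, since single-switch output carries no key to build a map from.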
Re: Add header support to text format and matching feature
Julien Rouhaud wrote: > Maybe that's just me but I understand "not supported" as "this makes > sense, but this is currently a limitation that might be lifted > later". Looking at ProcessCopyOptions(), there are quite a few invalid combinations of options that produce ERRCODE_FEATURE_NOT_SUPPORTED currently: - HEADER in binary mode - FORCE_QUOTE outside of csv - FORCE_QUOTE outside of COPY TO - FORCE_NOT_NULL outside of csv - FORCE_NOT_NULL outside of COPY FROM - ESCAPE outside of csv - delimiter appearing in the NULL specification - csv quote appearing in the NULL specification FORCE_QUOTE and FORCE_NOT_NULL are options that only make sense in one direction, so the errors when using these in the wrong direction are comparable to the "HEADER MATCH outside of COPY FROM" error that we want to add. In that sense, ERRCODE_FEATURE_NOT_SUPPORTED would be consistent. The other errors in the list above are more about the format itself, with options that make sense only for one format. So the way we're supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these other cases is that such format does not support such feature, but without implying that it's a limitation. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
Re: [PATCH] Completed unaccent dictionary with many missing characters
Two fixes (bad comment and fixed Latin-ASCII.xml). Michael Paquier wrote on 17.05.2022 09:11: On Thu, May 05, 2022 at 09:44:15PM +0200, Przemysław Sztoch wrote: Tom, I disagree with you because many similar numerical conversions are already taking place, e.g. 1/2, 1/4... This part sounds like a valid argument to me. unaccent.rules does already the conversion of some mathematical signs, and the additions proposed in the patch don't look that weird to me. I agree with Peter and Przemysław that this is reasonable. -- Michael -- Przemysław Sztoch | Mobile +48 509 99 00 66 commit 5ea0f226e1d0d7f0cc7b32fa4bd1bc6ed120c194 Author: Przemyslaw Sztoch Date: Wed May 4 18:28:59 2022 +0200 Update unnaccent rules generator. diff --git a/contrib/unaccent/generate_unaccent_rules.py b/contrib/unaccent/generate_unaccent_rules.py index c405e231b3..d94a2a3bf6 100644 --- a/contrib/unaccent/generate_unaccent_rules.py +++ b/contrib/unaccent/generate_unaccent_rules.py @@ -40,6 +40,7 @@ sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) # language knowledge. 
PLAIN_LETTER_RANGES = ((ord('a'), ord('z')), # Latin lower case (ord('A'), ord('Z')), # Latin upper case + (ord('0'), ord('9')), # Digits (0x03b1, 0x03c9), # GREEK SMALL LETTER ALPHA, GREEK SMALL LETTER OMEGA (0x0391, 0x03a9)) # GREEK CAPITAL LETTER ALPHA, GREEK CAPITAL LETTER OMEGA @@ -139,17 +140,17 @@ def get_plain_letter(codepoint, table): return codepoint # Should not come here -assert(False) +assert False, 'Codepoint U+%0.2X' % codepoint.id def is_ligature(codepoint, table): """Return true for letters combined with letters.""" -return all(is_letter(table[i], table) for i in codepoint.combining_ids) +return all(i in table and is_letter(table[i], table) for i in codepoint.combining_ids) def get_plain_letters(codepoint, table): """Return a list of plain letters from a ligature.""" -assert(is_ligature(codepoint, table)) +assert is_ligature(codepoint, table), 'Codepoint U+%0.2X' % codepoint.id return [get_plain_letter(table[id], table) for id in codepoint.combining_ids] @@ -248,7 +249,7 @@ def main(args): # walk through all the codepoints looking for interesting mappings for codepoint in all: if codepoint.general_category.startswith('L') and \ - len(codepoint.combining_ids) > 1: + len(codepoint.combining_ids) > 0: if is_letter_with_marks(codepoint, table): charactersSet.add((codepoint.id, chr(get_plain_letter(codepoint, table).id))) @@ -257,6 +258,13 @@ def main(args): "".join(chr(combining_codepoint.id) for combining_codepoint in get_plain_letters(codepoint, table +elif codepoint.general_category.startswith('N') and \ + len(codepoint.combining_ids) > 0 and \ + args.noLigaturesExpansion is False and is_ligature(codepoint, table): +charactersSet.add((codepoint.id, + "".join(chr(combining_codepoint.id) + for combining_codepoint + in get_plain_letters(codepoint, table elif is_mark_to_remove(codepoint): charactersSet.add((codepoint.id, None)) diff --git a/contrib/unaccent/unaccent.rules b/contrib/unaccent/unaccent.rules index 3030166ed6..ff484d09d7 100644 --- 
a/contrib/unaccent/unaccent.rules +++ b/contrib/unaccent/unaccent.rules @@ -1,9 +1,15 @@ ¡ ! © (C) +ª a « << - ® (R) ± +/- +² 2 +³ 3 +µ μ +¹ 1 +º o » >> ¼ 1/4 ½ 1/2 @@ -402,6 +408,11 @@ ʦ ts ʪ ls ʫ lz +ʰ h +ʲ j +ʳ r +ʷ w +ʸ y ʹ ' ʺ " ʻ ' @@ -417,6 +428,9 @@ ˖ + ˗ - ˜ ~ +ˡ l +ˢ s +ˣ x ̀ ́ ̂ @@ -536,6 +550,17 @@ ό ο ύ υ ώ ω +ϐ β +ϑ θ +ϒ Υ +ϕ φ +ϖ π +ϰ κ +ϱ ρ +ϲ ς +ϴ Θ +ϵ ε +Ϲ Σ Ё Е ё е ᴀ A @@ -556,6 +581,50 @@ ᴠ V ᴡ W ᴢ Z +ᴬ A +ᴮ B +ᴰ D +ᴱ E +ᴳ G +ᴴ H +ᴵ I +ᴶ J +ᴷ K +ᴸ L +ᴹ M +ᴺ N +ᴼ O +ᴾ P +ᴿ R +ᵀ T +ᵁ U +ᵂ W +ᵃ a +ᵇ b +ᵈ d +ᵉ e +ᵍ g +ᵏ k +ᵐ m +ᵒ o +ᵖ p +ᵗ t +ᵘ u +ᵛ v +ᵝ β +ᵞ γ +ᵟ δ +ᵠ φ +ᵡ χ +ᵢ i +ᵣ r +ᵤ u +ᵥ v +ᵦ β +ᵧ γ +ᵨ ρ +ᵩ φ +ᵪ χ ᵫ ue ᵬ b ᵭ d @@ -592,6 +661,10 @@ ᶓ e ᶖ i ᶙ u +ᶜ c +ᶠ f +ᶻ z +ᶿ θ Ḁ A ḁ a Ḃ B @@ -947,12 +1020,19 @@ Ὦ
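As an aside, the `i in table` guard added to is_ligature() above is what prevents a KeyError when a combining codepoint is absent from the table; the idiom in isolation, using a toy table rather than the real UnicodeData (all names below are illustrative):

```python
def is_letter(codepoint, table):
    # Toy stand-in: treat single-character entries as plain letters.
    return len(codepoint["chars"]) == 1

def is_ligature(codepoint, table):
    """Return True for letters combined solely out of letters present
    in the table; an unknown combining id no longer raises KeyError."""
    return all(i in table and is_letter(table[i], table)
               for i in codepoint["combining_ids"])

# Toy table keyed by codepoint id.
table = {
    0x61: {"chars": "a", "combining_ids": []},
    0x65: {"chars": "e", "combining_ids": []},
}
ae = {"chars": "ae", "combining_ids": [0x61, 0x65]}      # both ids known
odd = {"chars": "x", "combining_ids": [0x61, 0x9999]}    # 0x9999 unknown
```

Without the membership check, `odd` would crash the generator instead of simply being rejected as a ligature.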
[PATCH] Add sortsupport for range types and btree_gist
Hi all! The motivation behind this is that incrementally building up a GiST index for certain input data can create a terrible tree structure. Furthermore, exclusion constraints are commonly implemented using GiST indices and for that use case, data is mostly orderable. Sorting the data before inserting it into the index results in a much better index structure, leading to significant performance improvements. Testing was done using the following setup, with about 50 million rows: CREATE EXTENSION btree_gist; CREATE TABLE t (id uuid, block_range int4range); CREATE INDEX ON t USING GIST (id, block_range); COPY t FROM '..' DELIMITER ',' CSV HEADER; using SELECT * FROM t WHERE id = '..' AND block_range && '..' as the test query, using an unpatched instance and one with the patch applied. Some stats for fetching 10,000 random rows using the query above, 100 iterations to get good averages. The benchmarking was done on an unpatched instance compiled using the exact same options as with the patch applied. [ Results are noted in an unpatched -> patched fashion. ] The first set of results is after the initial CREATE TABLE, CREATE INDEX and a COPY to the table, thereby incrementally building the index. Shared Hit Blocks (average): 110.97 -> 78.58 Shared Read Blocks (average): 58.90 -> 47.42 Execution Time (average): 1.10 -> 0.83 ms I/O Read Time (average): 0.19 -> 0.15 ms After a REINDEX on the table, the results improve even more: Shared Hit Blocks (average): 84.24 -> 8.54 Shared Read Blocks (average): 49.89 -> 0.74 Execution Time (average): 0.84 -> 0.065 ms I/O Read Time (average): 0.16 -> 0.004 ms Additionally, the time a REINDEX takes also improves significantly: 672407.584 ms (11:12.408) -> 130670.232 ms (02:10.670) Most of the sortsupport for btree_gist was implemented by re-using already existing infrastructure. For the few remaining types (bit, bool, cash, enum, interval, macaddr8 and time) I manually implemented them directly in btree_gist. 
It might make sense to move them into the backend for uniformity, but I wanted to get other opinions on that first. `make check-world` reports no regressions. Attached below, besides the patch, are also two scripts for benchmarking. `bench-gist.py` to benchmark the actual patch, example usage of this would be e.g. `./bench-gist.py -o results.csv public.table`. This expects a local instance with no authentication and default `postgres` user. The port can be set using the `--port` option. `plot.py` prints average values (as used above) and creates boxplots for each statistic from the result files produced with `bench-gist.py`. Depends on matplotlib and pandas. Additionally, if needed, the sample dataset used to benchmark this is available to independently verify the results [1]. Thanks, Christoph Heiss --- [1] https://drive.google.com/file/d/1SKRiUYd78_zl7CeD8pLDoggzCCh0wj39
From 22e1b60929c39309e06c2477d8d027e60ad49b9d Mon Sep 17 00:00:00 2001 From: Christoph Heiss Date: Thu, 9 Jun 2022 17:07:46 +0200 Subject: [PATCH 1/1] Add sortsupport for range types and btree_gist Incrementally building up a GiST index can result in a sub-optimal index structure for range types. Sorting the data before inserting it into the index results in a much better index. This can provide sizeable improvements in query execution time, I/O read time and shared block hits/reads. 
Signed-off-by: Christoph Heiss --- contrib/btree_gist/Makefile | 3 +- contrib/btree_gist/btree_bit.c | 19 contrib/btree_gist/btree_bool.c | 22 contrib/btree_gist/btree_cash.c | 22 contrib/btree_gist/btree_enum.c | 19 contrib/btree_gist/btree_gist--1.7--1.8.sql | 105 contrib/btree_gist/btree_gist.control | 2 +- contrib/btree_gist/btree_interval.c | 19 contrib/btree_gist/btree_macaddr8.c | 19 contrib/btree_gist/btree_time.c | 19 src/backend/utils/adt/rangetypes_gist.c | 70 + src/include/catalog/pg_amproc.dat | 3 + src/include/catalog/pg_proc.dat | 3 + 13 files changed, 323 insertions(+), 2 deletions(-) create mode 100644 contrib/btree_gist/btree_gist--1.7--1.8.sql diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile index 48997c75f6..4096de73ea 100644 --- a/contrib/btree_gist/Makefile +++ b/contrib/btree_gist/Makefile @@ -33,7 +33,8 @@ EXTENSION = btree_gist DATA = btree_gist--1.0--1.1.sql \ btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \ btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \ - btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql + btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \ + btree_gist--1.7--1.8.sql PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes" REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp
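The averages quoted earlier (shared hits/reads, execution time, I/O read time) can be recomputed from bench-gist.py result files without pandas. A minimal sketch of the averaging step, assuming comma-separated files with a header row — the column names below are made up for the example; pass `open(path)` to read a real file:

```python
import csv
import statistics

def average_stats(csv_lines, columns):
    """Average the named numeric columns of a benchmark result CSV.

    csv_lines may be any iterable of lines (an open file works);
    the column names are assumptions, not bench-gist.py's real ones.
    """
    rows = list(csv.DictReader(csv_lines))
    return {c: statistics.fmean(float(r[c]) for r in rows) for c in columns}
```

This reproduces only the "prints average values" part of plot.py; the boxplots would still need matplotlib.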
Re: replacing role-level NOINHERIT with a grant-level option
On 13.06.22 20:00, Robert Haas wrote: I don't think this creates any more of a conflict than we've already got. In fact, I'd go so far as to say it resolves a problem that we currently have. As far as I can see, we are stuck with a situation where we have to support both the INHERIT behavior and the NOINHERIT behavior. Removing either one would be a pretty major compatibility break. And even if some people were willing to endorse that, it seems clear from previous discussions that there are people who like the NOINHERIT behavior and would object to its removal, and other people (like me!) who like the INHERIT behavior and would object to removing that. If you think it's feasible to get rid of either of these behaviors, I'd be interested in hearing your thoughts on that, but to me it looks like we are stuck with supporting both. From my point of view, the question is how to make the best of that situation. I think we want to keep both. Consider a user who in general prefers the NOINHERIT behavior but also wants to use predefined roles. Perhaps user 'peter' is to be granted both 'paul' and 'pg_execute_server_programs'. If role 'peter' is set to INHERIT, Peter will be sad, because his love for NOINHERIT probably means that he doesn't want to exercise Paul's privileges automatically. However, he needs to inherit the privileges of 'pg_execute_server_programs' or they are of no use to him. Peter presumably wants to use COPY TO/FROM program to put data into a table owned by 'peter', not a table owned by 'pg_execute_server_programs'. If so, being able to SET ROLE to 'pg_execute_server_programs' is of no use to him at all, but inheriting the privilege is useful. That's because our implementation of SET ROLE is bogus. We should have a SET ROLE that is separate from SET SESSION AUTHORIZATION, where the current user can keep their current user-ness and additionally enable (non-inherited) roles. 
I don't think I'm proposing to break anything or go in a totally opposite direction from anything, and to be honest I'm kind of confused as to why you think that what I'm proposing would have that effect. As far as I can see, the changes that I'm proposing are upward-compatible and would permit easy migration to new releases via either pg_dump or pg_upgrade with no behavior changes at all. Some syntax would be a bit different on the new releases and that would unlock some new options we don't currently have, but there's no behavior that you can get today which you wouldn't be able to get any more under this proposal. I'm mainly concerned that (AFAIU) you propose to remove the current INHERIT/NOINHERIT attribute of roles. I wouldn't like that. If you want a feature that allows overriding that per-grant, maybe that's okay.
Re: Add header support to text format and matching feature
On 14.06.22 11:13, Julien Rouhaud wrote: There is no need for the extra comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED. Is there any rule for what error code should be used? Maybe that's just me but I understand "not supported" as "this makes sense, but this is currently a limitation that might be lifted later". I tend to agree with that interpretation. Also, when you consider the way SQL rules and error codes are set up, errors that are detected during parse analysis should be a subclass of "syntax error or access rule violation".
Re: Handle infinite recursion in logical replication setup
PSA a test script that demonstrates all the documented steps for setting up n-way bidirectional replication. These steps are the same as those documented [1] on the new page "Bidirectional logical replication". This script works using the current latest v20* patch set. Each of the sections of 31.11.1 - 31.11.5 (see below) can be run independently (just edit the script and at the bottom uncomment the part you want to test): 31.11.1. Setting bidirectional replication between two nodes 31.11.2. Adding a new node when there is no data in any of the nodes 31.11.3. Adding a new node when data is present in the existing nodes 31.11.4. Adding a new node when data is present in the new node 31.11.5. Generic steps for adding a new node to an existing set of nodes ~~ Some sample output is also attached. -- [1] https://www.postgresql.org/message-id/attachment/134464/v20-0004-Document-bidirectional-logical-replication-steps.patch Kind Regards, Peter Smith. Fujitsu Australia Clean up pg_ctl: directory "data_N1" does not exist pg_ctl: directory "data_N2" does not exist pg_ctl: directory "data_N3" does not exist pg_ctl: directory "data_N4" does not exist rm: cannot remove ‘data_N1’: No such file or directory rm: cannot remove ‘data_N2’: No such file or directory rm: cannot remove ‘data_N3’: No such file or directory rm: cannot remove ‘data_N4’: No such file or directory Set up The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale "en_AU.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory data_N1 ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default time zone ... 
Australia/Sydney creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok syncing data to disk ... ok initdb: warning: enabling "trust" authentication for local connections initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb. Success. You can now start the database server using: pg_ctl -D data_N1 -l logfile start
Re: better page-level checksums
On 13.06.22 20:20, Robert Haas wrote: If the user wants 16-bit checksums, the feature we've already got seems good enough -- and, as you say, it doesn't use any extra disk space. This proposal is just about making people happy if they want a bigger checksum. It's hard to get any definite information about what size of checksum is "good enough", since after all it depends on what kinds of errors you expect and what kinds of probabilities you want to accept. But the best I could gather so far is that a 16-bit CRC is good up to about a 16 kB block size. Which leads to the question of whether there is really a lot of interest in catering to larger block sizes. The recent thread about the performance impact of different block sizes might renew interest in this. But unless we really want to encourage playing with the block sizes (and if my claim above is correct), then a larger checksum size might not be needed. On the topic of which algorithm to use, I'd be inclined to think that it is going to be more useful to offer checksums that are 64 bits or more, since IMHO 32 is not all that much more than 16, and I still think there are going to be alignment issues. Beyond that I don't have anything against your specific suggestions, but I'd like to hear what other people think. Again, gathering some vague information ... The benefits of doubling the checksum size are exponential rather than linear, so there is no significant benefit of using a 64-bit checksum over a 32-bit one, for supported block sizes (current max is 32 kB).
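To make the "exponential rather than linear" point concrete: under the usual idealization that an n-bit checksum behaves like a random hash, the chance that random corruption goes undetected is about 2^-n, so each doubling of the width squares the failure probability rather than halving it (this deliberately ignores CRC-specific burst-error guarantees):

```python
def undetected_error_rate(bits):
    """Probability that random corruption collides with the stored
    checksum, modeling the checksum as a random n-bit hash."""
    return 2.0 ** -bits

# Widening from 16 to 32 bits improves detection by a factor of 65536,
# not by a factor of 2 -- hence "exponential rather than linear".
improvement = undetected_error_rate(16) / undetected_error_rate(32)
```

By the same reasoning, going from 32 to 64 bits buys another factor of about 4 billion, which is why the marginal value of a 64-bit checksum over a 32-bit one is small at these block sizes.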
Re: Typo in ro.po file?
On 14.06.22 05:34, Peter Smith wrote: FWIW, I stumbled on this obscure possible typo (?) in src/pl/plperl/po/ro.po: ~~~ #: plperl.c:788 msgid "while parsing Perl initialization" msgstr "în timpul parsing inițializării Perl" #: plperl.c:793 msgid "while running Perl initialization" msgstr "în timpul rulării intializării Perl" ~~~ (Notice the missing 'i' - "inițializării" versus "intializării") Fixed in translations repository. Thanks.
RE: tablesync copy ignores publication actions
On Tue, Jun 14, 2022 3:36 PM Peter Smith wrote: > > PSA v2 of the patch, based on all feedback received. > > ~~~ > > Main differences from v1: > > * Rewording and more explanatory text. > > * The examples were moved to the "Subscription" [1] page and also > extended to show some normal replication and row filter examples, from > [Amit]. > > * Added some text to CREATE PUBLICATION 'publish' param [2], from > [Euler][Amit]. > > * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san]. > > * Added some text to the "Publication page" [4] to say the 'publish' > is only for DML operations. > > * I changed the note in "Row Filter - Initial Data Synchronization" > [5] to be a warning because I felt users could be surprised to see > data exposed by the initial copy, which a DML operation would not > expose. > Thanks for updating the patch. Two comments: 1. + it means the copied table t3 contains all rows even when + they do not patch the row filter of publication pub3b. Typo. I think "they do not patch the row filter" should be "they do not match the row filter", right? 2. @@ -500,7 +704,6 @@ - It seems we should remove this change. Regards, Shi yu
Re: Remove trailing newlines from pg_upgrade's messages
On 14.06.22 20:57, Tom Lane wrote: I'll stick this in the CF queue, but I wonder if there is any case for squeezing it into v15 instead of waiting for v16. Let's stick this into 16 and use it as a starting point of tidying all this up in pg_upgrade.
Re: "buffer too small" or "path too long"?
On 14.06.22 03:55, Michael Paquier wrote: On Tue, Jun 14, 2022 at 09:52:52AM +0900, Kyotaro Horiguchi wrote: At Tue, 14 Jun 2022 09:48:26 +0900 (JST), Kyotaro Horiguchi wrote in Yeah, I feel so and it is what I wondered about recently when I saw some complete error messages. Is that because of the length of the subject? And I found that it is already done. Thanks! I have noticed this thread and 4e54d23 as a result this morning. If you want to spread this style more, wouldn't it be better to do that in all the places of pg_upgrade where we store paths to files? I can see six code paths with log_opts.basedir that could do the same, as of the attached. The hardcoded file names have various lengths, and some of them are quite long, making the generated paths more exposed to being cut in the middle. We have this problem of long file names being silently truncated all over the source code. Instead of equipping each one of them with a length check, why don't we get rid of the fixed-size buffers and allocate dynamically, as in the attached patch.
From 1a21434c584319b28fd483444aa24b7b54b4f949 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 15 Jun 2022 07:25:07 +0200 Subject: [PATCH] pg_upgrade: Use psprintf in make_outputdirs --- src/bin/pg_upgrade/pg_upgrade.c | 42 - 1 file changed, 10 insertions(+), 32 deletions(-) diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c index 265d829490..f7e3461cfe 100644 --- a/src/bin/pg_upgrade/pg_upgrade.c +++ b/src/bin/pg_upgrade/pg_upgrade.c @@ -216,19 +216,13 @@ main(int argc, char **argv) static void make_outputdirs(char *pgdata) { - FILE *fp; - char **filename; time_t run_time = time(NULL); - char filename_path[MAXPGPATH]; + char *filename_path; char timebuf[128]; struct timeval time; time_t tt; - int len; - log_opts.rootdir = (char *) pg_malloc0(MAXPGPATH); - len = snprintf(log_opts.rootdir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR); - if (len >= MAXPGPATH) - pg_fatal("directory path for new cluster 
is too long\n"); + log_opts.rootdir = psprintf("%s/%s", pgdata, BASE_OUTPUTDIR); /* BASE_OUTPUTDIR/$timestamp/ */ gettimeofday(&time, NULL); @@ -237,25 +231,13 @@ make_outputdirs(char *pgdata) /* append milliseconds */ snprintf(timebuf + strlen(timebuf), sizeof(timebuf) - strlen(timebuf), ".%03d", (int) (time.tv_usec / 1000)); - log_opts.basedir = (char *) pg_malloc0(MAXPGPATH); - len = snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir, - timebuf); - if (len >= MAXPGPATH) - pg_fatal("directory path for new cluster is too long\n"); + log_opts.basedir = psprintf("%s/%s", log_opts.rootdir, timebuf); /* BASE_OUTPUTDIR/$timestamp/dump/ */ - log_opts.dumpdir = (char *) pg_malloc0(MAXPGPATH); - len = snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir, - timebuf, DUMP_OUTPUTDIR); - if (len >= MAXPGPATH) - pg_fatal("directory path for new cluster is too long\n"); + log_opts.dumpdir = psprintf("%s/%s/%s", log_opts.rootdir, timebuf, DUMP_OUTPUTDIR); /* BASE_OUTPUTDIR/$timestamp/log/ */ - log_opts.logdir = (char *) pg_malloc0(MAXPGPATH); - len = snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir, - timebuf, LOG_OUTPUTDIR); - if (len >= MAXPGPATH) - pg_fatal("directory path for new cluster is too long\n"); + log_opts.logdir = psprintf("%s/%s/%s", log_opts.rootdir, timebuf, LOG_OUTPUTDIR); /* * Ignore the error case where the root path exists, as it is kept the @@ -270,21 +252,17 @@ make_outputdirs(char *pgdata) if (mkdir(log_opts.logdir, pg_dir_create_mode) < 0) pg_fatal("could not create directory \"%s\": %m\n", log_opts.logdir); - len = snprintf(filename_path, sizeof(filename_path), "%s/%s", - log_opts.logdir, INTERNAL_LOG_FILE); - if (len >= sizeof(filename_path)) - pg_fatal("directory path for new cluster is too long\n"); + filename_path = psprintf("%s/%s", log_opts.logdir, INTERNAL_LOG_FILE); if ((log_opts.internal = fopen_priv(filename_path, "a")) == NULL) pg_fatal("could not open log file \"%s\": %m\n", filename_path); /* label 
start of upgrade in logfiles */ - for (filename = output_files; *filename != NULL; filename++) + for (char **filename = output_files; *filename != NULL; filename++) { - len = snprintf(filename_path, sizeof(filename_path), "%s/%s", -
RE: Support logical replication of DDLs
On Wednesday, June 15, 2022 8:14 AM Zheng Li wrote: > > > Thanks for providing this idea. > > > > I looked at the string that is used for replication: > > > > """ > > {ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo > > :inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD > > :subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b > > :typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof > > false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location > > 29} :compression <> :inhcount 0 :is_local true :is_not_null false > > :is_from_type false :storage <> :raw_default <> :cooked_default <> > > :identity <> :identitySequence <> :generated <> :collClause <> :collOid 0 > > :constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok > > false}) :objtype 41 :missing_ok false} > > """ > > > > I think the converted parsetree string includes lots of internal > > objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are > > unnecessary stuff for replication and we cannot make sure all the internal > > stuff are consistent among pub/sub. So I am not sure whether replicating > > this string is better. > > > > Besides, replicating the string from nodetostring() means we would need to > > deal with the structure difference between the publisher and the > > subscriber if any related structure has been changed which seems not good. > > Yeah, this existing format is not designed to be portable between different > major versions. So it can't directly be used for replication without > serious modification. 
> > IMO, the advantages of the deparsing approach (as implemented in the POC patch set [1]) are:
> >
> > 1) We can generate a command representation that can be parsed/processed/transformed arbitrarily by the subscriber using generic rules (for example, the user can easily replace the schema name in it), while the result of nodeToString() is not a standard JSON string, so I am not sure the user can reuse it without traversing the parse tree again.
> >
> > 2) With event_trigger + deparser, we can filter the unpublished objects more easily. For example, "DROP TABLE table_pub, table_unpub;" can be deparsed into two commands, "DROP TABLE table_pub" and "DROP TABLE table_unpub", and only the first one published.
> >
> > 3) With the deparser, we are able to query the catalog in the deparser to build a complete command (filled with schemaname...), so the user doesn't need to do any other work for it. We don't need to force the subscriber to set the same search_path as the publisher, which gives the user more flexibility.
> >
> > 4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the help of the deparser and event trigger. This avoids executing the subquery on the subscriber.
> >
> > 5) For the ALTER TABLE command, we might want to filter out DDL that uses volatile functions, as discussed in [2]. We can achieve this more easily by extending the deparser to check the functions used. We can even rebuild a command without the unsupported functions to replicate, by using the deparser.
> >
> > There may be more cases I am missing, as we are still analyzing other DDLs.
>
> How does the deparser deparse a CREATE FUNCTION statement? Will it schema qualify objects inside the function definition?

The current deparser doesn't schema qualify objects inside the function source, as we won't know the schema of inner objects until the function is executed.
The deparser will only schema qualify the objects around the function declaration, like:

CREATE FUNCTION [public].test_func(i [pg_catalog].int4) RETURNS [pg_catalog].int4 LANGUAGE plpgsql

Best regards,
Hou zj
Re: Handle infinite recursion in logical replication setup
Here are some review comments for patch v20-0004. == 1. General I thought that it is better to refer to the subscription/publications/table "on" the node, rather than "in" the node. Most of the review comments below are related to this point. == 2. Commit message a) Creating a two-node bidirectional replication when there is no data in both nodes. b) Adding a new node when there is no data in any of the nodes. c) Adding a new node when data is present in the existing nodes. "in both nodes" -> "on both nodes" "in any of the nodes" -> "on any of the nodes" "in the existing nodes" -> "on the existing nodes" == 3. doc/src/sgml/logical-replication.sgml - Setting bidirectional replication between two nodes 3a. + +The following steps demonstrate how to create a two-node bidirectional +replication when there is no table data present in both nodes +node1 and node2: + -> "on both nodes" 3b. +Create a publication in node1: -> "on" 3c. +Create a publication in node2: -> "on" 3d. + +Lock the table t1 in node1 and +node2 in EXCLUSIVE mode until the +setup is completed. + -> "on node1" 3e. +Create a subscription in node2 to subscribe to -> "on" 3f. +Create a subscription in node1 to subscribe to +node2: -> "on" ~~~ 4. doc/src/sgml/logical-replication.sgml - Adding a new node when there is no data in any of the nodes 4a. + Adding a new node when there is no data in any of the nodes SUGGESTION Adding a new node when there is no table data on any of the nodes 4b. + +The following steps demonstrate adding a new node node3 +to the existing node1 and node2 when +there is no t1 data in any of the nodes. This requires +creating subscriptions in node1 and +node2 to replicate the data from +node3 and creating subscriptions in +node3 to replicate data from node1 +and node2. Note: These steps assume that the +bidirectional logical replication between node1 and +node2 is already completed. 
+ "data in any of the nodes" -> "data on any of the nodes" "creating subscriptions in node1" -> "creating subscriptions on node1" "creating subscriptions in node3" -> "creating subscriptions on node3" 4c. +Create a publication in node3: -> "on" 4d. +Lock table t1 in all the nodes -> "on" 4e. +Create a subscription in node1 to subscribe to +node3: -> "on" 4f. +Create a subscription in node2 to subscribe to +node3: -> "on" 4g. +Create a subscription in node3 to subscribe to +node1: -> "on" 4h. +Create a subscription in node3 to subscribe to +node2: 4i. +node3. Incremental changes made in any node will be +replicated to the other two nodes. "in any node" -> "on any node" ~~~ 5. doc/src/sgml/logical-replication.sgml - Adding a new node when data is present in the existing nodes 5a. + Adding a new node when data is present in the existing nodes SUGGESTION Adding a new node when table data is present on the existing nodes 5b. + during initial data synchronization. Note: These steps assume that the + bidirectional logical replication between node1 and + node2 is already completed, and the pre-existing data + in table t1 is already synchronized in both those + nodes. + "in both those nodes" -> "on both those nodes" 5c. +Create a publication in node3 -> "on" 5d. +Lock table t1 in node2 and -> "on" 5e. +Create a subscription in node1 to subscribe to +node3: -> "on" 5f. +Create a subscription in node2 to subscribe to +node3: -> "on" 5g. +Create a subscription in node3 to subscribe to +node1. Use copy_data = force so that +the existing table data is copied during initial sync: -> "on" 5h. +Create a subscription in node3 to subscribe to +node2. Use copy_data = off -> "on" 5i. +node3. Incremental changes made in any node will be +replicated to the other two nodes. "in any node" -> "on any node" ~~~ 6. 
doc/src/sgml/logical-replication.sgml - Adding a new node when data is present in the new node + Adding a new node when data is present in the new node SUGGESTION Adding a new node when table data is present on the new node ~~~ 7. doc/src/sgml/logical-replication.sgml - Generic steps for adding a new node to an existing set of nodes 7a. + +Step-2: Lock the required tables of the new node in EXCLUSIVE mode until +the setup is complete. (This lock is necessary to prevent any modifications +from happening in the new node because if data modifications occurred after +Step-3, there is a chance that the modifications will be published to the +first node and then synchronized back to the new node while creating the +subscription in Step-5. This would result in inconsistent data). + "happening in the new node" -> "happening on the new node" 7b. +not be synchronized to the new node. This would result in inconsistent +data. There is no need to lock the
Re: Handle infinite recursion in logical replication setup
Here are some review comments for patch v20-0003. == 1. Commit message In case, table t1 has a unique key, it will lead to a unique key violation and replication won't proceed. SUGGESTION If table t1 has a unique key, this will lead to a unique key violation, and replication won't proceed. ~~~ 2. Commit message This problem can be solved by using... SUGGESTION This problem can be avoided by using... ~~~ 3. Commit message step 3: Create a subscription in node1 to subscribe to node2. Use 'copy_data = on' so that the existing table data is copied during initial sync: node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION '' node1-# PUBLICATION pub_node2 WITH (copy_data = off, origin = local); CREATE SUBSCRIPTION This is wrong information. The table on node2 has no data, so talking about 'copy_data = on' is inappropriate here. == 4. Commit message IMO it might be better to refer to subscription/publication/table "on" nodeXXX, instead of saying "in" nodeXXX. 4a. "the publication tables were also subscribing data in the publisher from other publishers." -> "the publication tables were also subscribing from other publishers. 4b. "After the subscription is created in node2" -> "After the subscription is created on node2" 4c. "step 3: Create a subscription in node1 to subscribe to node2." -> "step 3: Create a subscription on node1 to subscribe to node2." 4d. "step 4: Create a subscription in node2 to subscribe to node1." -> "step 4: Create a subscription on node2 to subscribe to node1." == 5. doc/src/sgml/ref/create_subscription.sgml @@ -383,6 +397,15 @@ CREATE SUBSCRIPTION subscription_name + + If the subscription is created with origin = local and + copy_data = on, it will check if the publisher tables are + being subscribed to any other publisher and, if so, then throw an error to + prevent possible non-local data from being copied. The user can override + this check and continue with the copy operation by specifying + copy_data = force. 
+ Perhaps it is better here to say 'copy_data = true' instead of 'copy_data = on', simply because the value 'true' was mentioned earlier on this page (but this would be the first mention of 'on'). == 6. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed + errhint("Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force.")); Saying "off or force" is not consistent with the other message wording in this patch, which used "/" for multiple enums. (e.g. "connect = false", "copy_data = true/force"). So perhaps this errhint should be worded similarly: "Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force." -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Handle infinite recursion in logical replication setup
Here are some review comments for patch v20-0002 == 1. General comment Something subtle but significant changed since I last reviewed v18*. Now the describe.c is changed so that the catalog will never display a NULL origin column; it would always be "any". But now I am not sure if it is a good idea to still allow the NULL in this catalog column while at the same time you are pretending it is not there. I felt it might be less confusing, and would simplify the code (e.g. remove quite a few null checks) to have just used a single concept of the default - e.g. Just assign the default as "any" everywhere. The column would be defined as NOT NULL. Most of the following review comments are related to this point. == 2. doc/src/sgml/catalogs.sgml + + Possible origin values are local, + any, or NULL if none is specified. + If local, the subscription will request the + publisher to only send changes that originated locally. If + any (or NULL), the publisher sends + any changes regardless of their origin. + Is NULL still possible? Perhaps it would be better if it was not and the default "any" was always written instead. == 3. src/backend/catalog/pg_subscription.c + if (!isnull) + sub->origin = TextDatumGetCString(datum); + else + sub->origin = NULL; + Maybe better to either disallow NULL in the first place or assign the "any" here instead of NULL. == 4. src/backend/commands/subscriptioncmds.c - parse_subscription_options @@ -137,6 +139,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options, opts->twophase = false; if (IsSet(supported_opts, SUBOPT_DISABLE_ON_ERR)) opts->disableonerr = false; + if (IsSet(supported_opts, SUBOPT_ORIGIN)) + opts->origin = NULL; If opt->origin was assigned to "any" then other code would be simplified. ~~~ 5. 
src/backend/commands/subscriptioncmds.c - CreateSubscription @@ -607,6 +626,11 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, LOGICALREP_TWOPHASE_STATE_PENDING : LOGICALREP_TWOPHASE_STATE_DISABLED); values[Anum_pg_subscription_subdisableonerr - 1] = BoolGetDatum(opts.disableonerr); + if (opts.origin) + values[Anum_pg_subscription_suborigin - 1] = + CStringGetTextDatum(opts.origin); + else + values[Anum_pg_subscription_suborigin - 1] = CStringGetTextDatum(LOGICALREP_ORIGIN_ANY); If NULL was not possible then this would just be one line: values[Anum_pg_subscription_suborigin - 1] = CStringGetTextDatum(opts.origin); == 6. src/backend/replication/logical/worker.c @@ -276,6 +276,10 @@ static TransactionId stream_xid = InvalidTransactionId; static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr; #define is_skipping_changes() (unlikely(!XLogRecPtrIsInvalid(skip_xact_finish_lsn))) +/* Macro for comparing string fields that might be NULL */ +#define equalstr(a, b) \ + (((a) != NULL && (b) != NULL) ? (strcmp((a), (b)) == 0) : (a) == (b)) + If the NULL was not allowed in the first place then I think this macro would just become redundant. 7. src/backend/replication/logical/worker.c - ApplyWorkerMain @@ -3741,6 +3746,11 @@ ApplyWorkerMain(Datum main_arg) options.proto.logical.streaming = MySubscription->stream; options.proto.logical.twophase = false; + if (MySubscription->origin) + options.proto.logical.origin = pstrdup(MySubscription->origin); + else + options.proto.logical.origin = NULL; + Can't the if/else be avoided if you always assigned the "any" default in the first place? == 8. 
src/backend/replication/pgoutput/pgoutput.c - parse_output_parameters @@ -287,11 +289,13 @@ parse_output_parameters(List *options, PGOutputData *data) bool messages_option_given = false; bool streaming_given = false; bool two_phase_option_given = false; + bool origin_option_given = false; data->binary = false; data->streaming = false; data->messages = false; data->two_phase = false; + data->origin = NULL; Consider assigning default "any" here instead of NULL. == 9. src/bin/pg_dump/pg_dump.c - getSubscriptions + /* FIXME: 15 should be changed to 16 later for PG16. */ + if (fout->remoteVersion >= 15) + appendPQExpBufferStr(query, " s.suborigin\n"); + else + appendPQExpBufferStr(query, " NULL AS suborigin\n"); + Maybe say: 'any' AS suborigin? ~~~ 10. src/bin/pg_dump/pg_dump.c - getSubscriptions @@ -4517,6 +4525,11 @@ getSubscriptions(Archive *fout) subinfo[i].subdisableonerr = pg_strdup(PQgetvalue(res, i, i_subdisableonerr)); + if (PQgetisnull(res, i, i_suborigin)) + subinfo[i].suborigin = NULL; + else + subinfo[i].suborigin = pg_strdup(PQgetvalue(res, i, i_suborigin)); + If you disallow the NULL in the first place this condition maybe is no longer needed. ~~~ 11. src/bin/pg_dump/pg_dump.c - dumpSubscription @@ -4589,6 +4602,9 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo) if (strcmp(subinfo->subdisableonerr, "t") == 0) appendPQExpBufferStr(query, ", disable_on_error = true"); + if
Reply: Remove useless param for create_groupingsets_path
Hi, Richard

You are right, the patch was incorrect, and I have generated the patch once more. It is sent as an attachment named new.patch; please check. Thanks!

Best regards!
Zxuejing

From: Richard Guo
Date: 2022-06-15 12:12
To: XueJing Zhao <zxuej...@vmware.com>
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Remove useless param for create_groupingsets_path

On Wed, Jun 15, 2022 at 11:33 AM XueJing Zhao <zxuej...@vmware.com> wrote:

> Recently I work on grouping sets and I find the last param numGroups of create_groupingsets_path is not used. In create_groupingsets_path we use rollup->numGroups to do cost_agg.

Yes indeed. The param 'numGroups' was used originally when we first introduced create_groupingsets_path(), and then all its references inside that function were removed and replaced with the numGroups inside RollupData in b5635948.

> I generate a diff.patch, which is sent as an attachment.

BTW, the patch looks weird to me in that it seems to operate in the inverse direction, i.e. it's adding the param 'numGroups', not removing it.

Thanks
Richard

Attachment: new.diff