PG15 beta1 sort performance regression due to Generation context change

2022-05-19 Thread David Rowley
Hackers,

Over the past few days I've been gathering some benchmark results
together to show the sort performance improvements in PG15 [1].

One of the test cases I did was to demonstrate Heikki's change to use
a k-way merge (65014000b).

The test I did to try this out was along the lines of:

set max_parallel_workers_per_gather = 0;
create table t (a bigint not null, b bigint not null, c bigint not
null, d bigint not null, e bigint not null, f bigint not null);

insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
vacuum freeze t;

The query I ran was:

select * from t order by a offset 140247142;

I tested various sizes of work_mem starting at 4MB and doubled that
all the way to 16GB. For many of the smaller values of work_mem the
performance is vastly improved by Heikki's change, however for
work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
seconds and PG15 beta 1 took 29 seconds!

I've been trying to get to the bottom of this today and finally have
discovered this is due to the tuple size allocations in the sort being
exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
to store tuples in sorts) the tuple for the sort would be stored in an
aset context. After 40af10b57 we'll use a generation context.  The
idea behind that change is that the generation context does no
power-of-2 round-ups for allocations, so we save memory in most cases.
However, because this particular test has a tuple size of 64 bytes,
there was no power-of-2 wastage with aset.

The problem is that generation chunks have a larger chunk header than
aset chunks do, since they must store a pointer to the block the chunk
belongs to so that GenerationFree() can increment the nfree count in
the block. aset.c does not require this, as freed chunks simply go onto
a freelist that's global to the entire context.

Basically, for my test query, the slowdown is because instead of being
able to store 620702 tuples per tape over 226 tapes with an aset
context, we can now only store 576845 tuples per tape resulting in
requiring 244 tapes when using the generation context.

If I had added column "g" to make the tuple size 72 bytes causing
aset's code to round allocations up to 128 bytes and generation.c to
maintain the 72 bytes then the sort would have stored 385805 tuples
over 364 batches for aset and 538761 tuples over 261 batches using the
generation context. That would have been a huge win.

So it basically looks like I discovered a very bad case that causes a
significant slowdown.  Yet other cases that are not an exact power of
2 stand to gain significantly from this change.

One thing 40af10b57 does is stop those terrible performance jumps
when the tuple size crosses a power-of-2 boundary. The performance
should now be more closely aligned to the size of the data being sorted...
Unfortunately, that seems to mean regressions for large sorts with
power-of-2 sized tuples.

I'm unsure exactly what I should do about this right now.

David

[1] 
https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/speeding-up-sort-performance-in-postgres-15/ba-p/3396953#change4




Re: Skipping schema changes in publication

2022-05-19 Thread Peter Smith
Below are my review comments for v6-0002.

==

1. Commit message.
The psql \d family of commands to display excluded tables.

SUGGESTION
The psql \d family of commands can now display excluded tables.

~~~

2. doc/src/sgml/ref/alter_publication.sgml

@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
ALTER PUBLICATION name ADD publication_object [, ...]
+ALTER PUBLICATION name ADD ALL TABLES [ EXCEPT [ TABLE ] exception_object [, ... ] ]

The "exception_object" font is wrong. It should look the same as
"publication_object".

~~~

3. doc/src/sgml/ref/alter_publication.sgml - Examples

@@ -214,6 +220,14 @@ ALTER PUBLICATION sales_publication ADD ALL
TABLES IN SCHEMA marketing, sales;
 
   

+  
+   Alter publication production_publication to publish
+   all tables except users and
+   departments tables:
+
+ALTER PUBLICATION production_publication ADD ALL TABLES EXCEPT TABLE users, departments;
+

Consider using "EXCEPT" instead of "EXCEPT TABLE" because that will
show TABLE keyword is optional.

~~~

4. doc/src/sgml/ref/create_publication.sgml

An SGML tag error caused building the docs to fail. My fix was
previously reported [1].

~~~

5. doc/src/sgml/ref/create_publication.sgml

@@ -22,7 +22,7 @@ PostgreSQL documentation
  
 
 CREATE PUBLICATION name
-[ FOR ALL TABLES
+[ FOR ALL TABLES [ EXCEPT [ TABLE ] exception_object [, ... ] ]

The "exception_object" font is wrong. It should look the same as
"publication_object".

~~~

6. doc/src/sgml/ref/create_publication.sgml - Examples

@@ -351,6 +366,15 @@ CREATE PUBLICATION production_publication FOR
TABLE users, departments, ALL TABL
 CREATE PUBLICATION sales_publication FOR ALL TABLES IN SCHEMA marketing, sales;
 

+  
+   Create a publication that publishes all changes in all the tables except for
+   the changes of users and
+   departments table:
+
+CREATE PUBLICATION mypublication FOR ALL TABLE EXCEPT TABLE users, departments;
+
+  
+

6a.
Typo: "FOR ALL TABLE" -> "FOR ALL TABLES"

6b.
Consider using "EXCEPT" instead of "EXCEPT TABLE" because that will
show TABLE keyword is optional.

~~~

7. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication

@@ -316,18 +316,25 @@ GetTopMostAncestorInPublication(Oid puboid, List
*ancestors, int *ancestor_level
  }
  else
  {
- aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
- if (list_member_oid(aschemaPubids, puboid))
+ List*aschemapubids = NIL;
+ List*aexceptpubids = NIL;
+
+ aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor));
+ aexceptpubids = GetRelationPublications(ancestor, true);
+ if (list_member_oid(aschemapubids, puboid) ||
+ (puballtables && !list_member_oid(aexceptpubids, puboid)))
  {

You could re-write this as multiple conditions instead of one. That
could avoid always assigning the 'aexceptpubids', so it might be a
more efficient way to write this logic.

~~~

8. src/backend/catalog/pg_publication.c - CheckPublicationDefValues

+/*
+ * Check if the publication has default values
+ *
+ * Check the following:
+ * Publication is having default options
+ *  Publication is not associated with relations
+ *  Publication is not associated with schemas
+ *  Publication is not set with "FOR ALL TABLES"
+ */
+static bool
+CheckPublicationDefValues(HeapTuple tup)

8a.
Remove the tab. Replace with spaces.

8b.
It might be better if this comment order is the same as the logic order.
e.g.

* Check the following:
*  Publication is not set with "FOR ALL TABLES"
*  Publication is having default options
*  Publication is not associated with schemas
*  Publication is not associated with relations

~~~

9. src/backend/catalog/pg_publication.c - AlterPublicationSetAllTables

+/*
+ * Reset the publication.
+ *
+ * Reset the publication options, publication relations and
publication schemas.
+ */
+static void
+AlterPublicationSetAllTables(Relation rel, HeapTuple tup)

The function comment and the function name do not seem to match here;
something looks like a cut/paste error ??

~~~

10. src/backend/catalog/pg_publication.c - AlterPublicationSetAllTables

+ /* set all tables option */
+ values[Anum_pg_publication_puballtables - 1] = BoolGetDatum(true);
+ replaces[Anum_pg_publication_puballtables - 1] = true;

SUGGEST (comment)
/* set the ALL TABLES flag */

~~~

11. src/backend/catalog/pg_publication.c - AlterPublication

@@ -1501,6 +1579,20 @@ AlterPublication(ParseState *pstate,
AlterPublicationStmt *stmt)
  aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_PUBLICATION,
 stmt->pubname);

+ if (stmt->for_all_tables)
+ {
+ bool isdefault = CheckPublicationDefValues(tup);
+
+ if (!isdefault)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("Setting ALL TABLES requires publication \"%s\" to have
default values",
+stmt->pubname),
+ errhint("Use ALTER PUBLICATION ... RESET to reset the publication"));

The errmsg should start with a lowercase letter.

~~~

12. src/backend/catalog/pg_publication.c - AlterPublication

@@ -1501,6 +1579,20 @@ 

Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Tom Lane
Andres Freund  writes:
> On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
>> There's some fallout in the expected-file, of course, but this
>> does seem to fix it (20 consecutive successful runs now at
>> 100/2).  Don't see why though ...

> I think what might be happening is that the transactional stats updates get
> reported by s2 *before* the non-transactional stats updates come in from
> s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> report, because the machine is slow enough for it to be "time to report stats
> again". Then s1 reports its non-transactional stats.

Sounds plausible.  And I left the test loop running, and it's now past
100 consecutive successes, so I think this change definitely "fixes" it.

> It looks like our stats maintenance around truncation isn't quite "concurrency
> safe". That code hasn't meaningfully changed, but it'd not be surprising if
> it's not 100% precise...

Yeah.  Probably not something to try to improve post-beta, especially
since it's not completely clear how transactional and non-transactional
cases *should* interact.  Maybe non-transactional updates should be
pushed immediately?  But I'm not sure if that's fully correct, and
it definitely sounds expensive.

I'd be good with tweaking this test case as you suggest, and maybe
revisiting the topic later.

Kyotaro-san worried about whether any other places in stats.spec
have the same issue.  I've not seen any evidence of that in my
tests, but perhaps some other machine with different timing
could find it.

regards, tom lane




Re: Tracking notnull attributes inside Var

2022-05-19 Thread Andy Fan
Hi Ashutosh:

   Nice to see you again!

On Tue, May 17, 2022 at 8:50 PM Ashutosh Bapat 
wrote:

> On Sun, May 15, 2022 at 8:41 AM Andy Fan  wrote:
>
> >
> > The var in RelOptInfo->reltarget should have nullable = 0 but the var in
> > RelOptInfo->baserestrictinfo should have nullable = 1;  The beauty of
> this
> > are: a). It can distinguish the two situations perfectly b). Whenever we
> want
> > to know the nullable attribute of a Var for an expression, it is super
> easy to
> > know. In summary, we need to maintain the nullable attribute at 2
> different
> > places. one is the before the filters are executed(baserestrictinfo,
> joininfo,
> > ec_list at least).  one is after the filters are executed
> (RelOptInfo.reltarget
> > only?)
>
> Thanks for identifying this. What you have written makes sense and it
> might open a few optimization opportunities. But let me put down some
> other thoughts here. You might want to take those into consideration
> when designing your solution.
>

Thanks.


>
> Do we want to just track nullable and non-nullable. May be we want
> expand this class to nullable (var may be null), non-nullable (Var is
> definitely non-NULL), null (Var will be always NULL).
>
>
Currently it doesn't support "Var will be always NULL".  Do you have any
use cases for this?  I can't think of many cases where we can get
such information, except something like "SELECT a FROM t WHERE a
IS NULL".

> But the other way to look at this is along the lines of equivalence
> classes. Equivalence classes record the expressions which are equal in
> the final result of the query. The equivalence class members are not
> equal at all the stages of query execution.  But because they are
> equal in the final result, we can impose that restriction on the lower
> levels as well. Can we think of nullable in that fashion? If a Var is
> non-nullable in the final result, we can impose that restriction on
> the intermediate stages since rows with NULL values for that Var will
> be filtered out somewhere. Similarly we could argue for null Var. But
> knowledge that a Var is nullable in the final result does not impose a
> NULL, non-NULL restriction on the intermediate stages. If we follow
> this thought process, we don't need to differentiate Var at different
> stages in query.
>

I agree this is an option.  If so, we need to track it in the PlannerInfo
struct, but it would not be as fine-grained as my previous proposal.
Without the intermediate information, we can't know whether a UniqueKey
contains multiple NULLs. That would not be an issue for the "mark
Distinct as a no-op" case, but I'm not sure it is OK for other UniqueKey
use cases.  So my current idea is that I still prefer to maintain the
intermediate information, unless we find it costs too much or is too
complex to implement, which I don't think is the case, for now at least.
So if you have time to look at the attached patch, that would be super
great as well.

-- 
Best Regards
Andy Fan


Re: Tracking notnull attributes inside Var

2022-05-19 Thread Andy Fan
Hi Tom:

Thanks for your attention!

On Wed, May 18, 2022 at 1:25 AM Tom Lane  wrote:

> Andy Fan  writes:
> > notnulls discussion is forked from UniqueKey stuff, you can see the
> > attachment for the UniqueKey introduction. Tom raised his opinion to
> > track the nullability inside Var [1][2][3]; this thread would start
> > from there based on my understanding.
>
> I'm pretty certain that I never suggested this:
>
> > struct Var
> > {
> > ...;
> >   int nullable;  // -1 unknown,  0 - not nullable.  1 - nullable
> > };
>
> You're free to pursue it if you like, but I think it will be a dead end.
>

OK, there was a big misunderstanding here.  I had my own solution at
the beginning, but I thought you wanted to go in this direction, and
since I found it really hard to understand, I started this thread to
make things clear.  It is great that the gap is filled now.

> The fundamental problem as you note is that equalVar() cannot do anything
> sane with a field defined that way.  Also, we'd have to generate Vars
> initially with nullable = unknown (else, for example, ALTER SET/DROP NOT
> NULL breaks stored views referring to the column).  It'd be on the planner
> to run through the tree and replace that with "nullable" or "not
> nullable".  It's hard to see how that's more advantageous than just
> keeping the info in the associated RelOptInfo.
>

Agreed.


>
> Also, I think you're confusing two related but distinct issues.  For
> certain optimization issues, we'd like to keep track of whether a column
> stored in a table is known NOT NULL.  However, that's not the same thing
> as the question that I've muttered about, which is how to treat a Var
> that's been possibly forced to null due to null-extension of an outer
> join.  That is a different value from the Var as read from the table,
> but we currently represent it the same within the planner, which causes
> various sorts of undesirable complication. We cannot fix that by setting
> Var.nullable = true in above-the-join instances, because it might also
> be true in below-the-join instances.  "Known not null in the table" is
> not the inverse of "potentially nulled by an outer join".  Moreover, we
> probably need to know *which* join is the one potentially nulling the Var,
> so a bool is not likely enough anyway.
>

I read the above paragraph several times, but *I think my code can
probably express this better than my words can*.  It would be great if
you could have a look at it.  Just one point to mention now: you didn't
seem to mention the case where the NULL values are filtered out by a
qual; I'm not sure whether that is negligible or an oversight.

CREATE TABLE t(a int);
SELECT * FROM t WHERE a > 1;

My patch implements my previous solution, not the inside-Var one.


> The schemes I've been toying with tend to look more like putting a
> PlaceHolderVar-ish wrapper around the Var or expression that represents
> the below-the-join value.  The wrapper node could carry any join ID
> info that we find necessary.  The thing that I'm kind of stalled on is
> how to define this struct so that it's not a big headache for join
> strength reduction (which could remove the need for a wrapper altogether)
> or outer-join reordering (which makes it a bit harder to define which
> join we think is the one nulling the value).
>
>
Not sure whether the "NULL values are filtered by qual" case matters in
this solution, and I'm pretty open on the direction.  But to avoid
further misunderstanding on my part, I would like to fill in more of
the gaps first by posting my patch now and continuing the discussion
in this direction.

-- 
Best Regards
Andy Fan


v1-0001-Introduce-notnull_attrs-for-RelOptInfo.patch
Description: Binary data


Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Andres Freund
Hi,

On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Hm. Does the instability vanish if you switch s2_commit_prepared_a and 
> > s1_ff?
> 
> Like this?

Yea.


> diff --git a/src/test/isolation/specs/stats.spec 
> b/src/test/isolation/specs/stats.spec
> index be4ae1f4ff..70be29b207 100644
> --- a/src/test/isolation/specs/stats.spec
> +++ b/src/test/isolation/specs/stats.spec
> @@ -562,8 +562,9 @@ permutation
>s1_table_insert_k1 # should be counted
>s1_table_update_k1 # dito
>s1_prepare_a
> +  s1_ff
>s2_commit_prepared_a
> -  s1_ff s2_ff
> +  s2_ff
>s1_table_stats
>  
>  # S1 prepares, S1 aborts prepared
> 
> There's some fallout in the expected-file, of course, but this
> does seem to fix it (20 consecutive successful runs now at
> 100/2).  Don't see why though ...

I think what might be happening is that the transactional stats updates get
reported by s2 *before* the non-transactional stats updates come in from
s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
report, because the machine is slow enough for it to be "time to report stats
again". Then s1 reports its non-transactional stats.

It looks like our stats maintenance around truncation isn't quite "concurrency
safe". That code hasn't meaningfully changed, but it'd not be surprising if
it's not 100% precise...

Greetings,

Andres Freund




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Tom Lane
Andres Freund  writes:
> Hm. Does the instability vanish if you switch s2_commit_prepared_a and s1_ff?

Like this?

diff --git a/src/test/isolation/specs/stats.spec 
b/src/test/isolation/specs/stats.spec
index be4ae1f4ff..70be29b207 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -562,8 +562,9 @@ permutation
   s1_table_insert_k1 # should be counted
   s1_table_update_k1 # dito
   s1_prepare_a
+  s1_ff
   s2_commit_prepared_a
-  s1_ff s2_ff
+  s2_ff
   s1_table_stats
 
 # S1 prepares, S1 aborts prepared

There's some fallout in the expected-file, of course, but this
does seem to fix it (20 consecutive successful runs now at
100/2).  Don't see why though ...

regards, tom lane




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Kyotaro Horiguchi
At Thu, 19 May 2022 22:58:14 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> Notice that these markers can only delay reporting of the completion
> of a step, not the launch of a step.  The isolationtester will launch
> the next step in a permutation as soon as (A) all prior steps of the
> same session are done, and (B) the immediately preceding step in the
> permutation is done or deemed blocked.  For this purpose, "deemed
> blocked" means that it has been seen to be waiting on a database lock,
> or that it is complete but the report of its completion is delayed by
> one of these markers.
> 
> There's no "waiting..." reports in the test output, nor do we have any
> condition markers in stats.spec right now, so I don't think any steps
> have been "deemed blocked".

Mmm... Thanks. I misunderstood the effect of it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Andres Freund
Hi,

On 2022-05-19 22:58:14 -0400, Tom Lane wrote:
> What I am wondering about at this point is whether the effects of
> pg_stat_force_next_flush() could somehow be delayed until after we
> have told the client the command is complete.

It shouldn't - it just forces pg_stat_report_stat() to flush (rather than
doing so based on the time of the last report). And pg_stat_report_stat()
happens before ReadyForQuery().

Hm. Does the instability vanish if you switch s2_commit_prepared_a and s1_ff?

Greetings,

Andres Freund




Re: Build-farm - intermittent error in 031_column_list.pl

2022-05-19 Thread Amit Kapila
On Fri, May 20, 2022 at 6:58 AM Kyotaro Horiguchi
 wrote:
>
> > > the apply worker will use the existing slot and replication origin
> > > corresponding to the subscription. Now, it is possible that before
> > > restart the origin has not been updated and the WAL start location
> > > points to a location prior to where PUBLICATION pub9 exists which can
> > > lead to such an error. Once this error occurs, apply worker will never
> > > be able to proceed and will always return the same error. Does this
> > > make sense?
>
> Wow. I didn't think along that line. That theory explains the silence and
> makes sense even though I don't see LSN transitions that clearly
> support it.  I dimly remember a similar kind of problem..
>
> > > Unless you or others see a different theory, this seems to be the
> > > existing problem in logical replication which is manifested by this
> > > test. If we just want to fix these test failures, we can create a new
> > > subscription instead of altering the existing publication to point to
> > > the new publication.
> > >
> >
> > If the above theory is correct then I think allowing the publisher to
> > catch up with "$node_publisher->wait_for_catchup('sub1');" before
> > ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> > both publisher and subscriber are in sync then the new publication
> > should be visible to WALSender.
>
> It looks right to me.
>

Let's wait for Tomas or others working in this area to share their thoughts.

>  That time travel seems unintuitive, but it's the
> (current) way it works.
>

I have thought about it but couldn't come up with a good way to change
the way it currently works. Moreover, I think it is easy to hit this
in other ways as well. Say you first create a subscription with a
non-existent publication, and then perform an operation on any
unrelated table on the publisher before creating the required
publication; we will hit exactly this problem of "publication does not
exist". So I think we may need to live with this behavior and write
tests carefully.

-- 
With Regards,
Amit Kapila.




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Tom Lane
Andres Freund  writes:
> On 2022-05-19 21:42:31 -0400, Tom Lane wrote:
>> Even more interesting, the repeatability varies with the settings
>> of max_connections and max_prepared_transactions.

> That is indeed quite odd.

>> At low values (resp. 20 and 0) I've not been able to make it happen at all,
>> but at 100 and 2 it happens circa three times out of four.

> Is this the only permutation that fails?

No, but those values definitely seem to affect the probability of
failure.  I just finished a more extensive series of runs, and got:

successes/tries with max_connections/max_prepared_transactions:
5/5 OK with 20/2
5/5 OK with 100/0
3/5 OK with 100/2
4/5 OK with 200/2
5/6 OK with 100/10
5/6 OK with 50/2
6/10 OK with 100/2

It seems like the probability decreases again if I raise either
number further.  So I'm now guessing that this is purely a timing
issue and that somehow 100/2 is near the sweet spot for hitting
the timing window.  Why those settings would affect pgstats at
all is unclear, though.

regards, tom lane




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Andres Freund
Hi,

On 2022-05-19 21:42:31 -0400, Tom Lane wrote:
> >  
> > seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
> >  
> > ++-+-+-+--+--+
> > -   3|   9|5|1|0| 1| 1| 
> >   0
> > +   3|   9|5|1|0| 4| 1| 
> >   0
> >  (1 row)

> Even more interesting, the repeatability varies with the settings
> of max_connections and max_prepared_transactions.

That is indeed quite odd.


> At low values (resp. 20 and 0) I've not been able to make it happen at all,
> but at 100 and 2 it happens circa three times out of four.

Is this the only permutation that fails?

Greetings,

Andres Freund




Re: bogus: logical replication rows/cols combinations

2022-05-19 Thread Amit Kapila
On Thu, May 19, 2022 at 7:54 PM Tom Lane  wrote:
>
> Justin Pryzby  writes:
> > On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:
> >> I have committed the first patch after fixing this part. It seems Tom
> >> is not very happy doing this after beta-1 [1]. The reason we get this
> >> information via this view (and underlying function) is that it
> >> simplifies the queries on the subscriber-side as you can see in the
> >> second patch. The query change is as below:
> >> [1] - 
> >> https://www.postgresql.org/message-id/91075.1652929852%40sss.pgh.pa.us
>
> > I think Tom's concern is that adding information to a view seems like 
> > adding a
> > feature that hadn't previously been contemplated.
> > (Catalog changes themselves are not prohibited during the beta period).
>
> It certainly smells like a new feature, but my concern was more around the
> post-beta catalog change.  We do those only if really forced to, and the
> explanation in the commit message didn't satisfy me as to why it was
> necessary.  This explanation isn't much better --- if we're trying to
> prohibit a certain class of publication definitions, what good does it do
> to check that on the subscriber side?
>

It is required on the subscriber side because prohibition is only for
the cases where multiple publications are combined. We disallow the
cases where the column list is different for the same table when
combining publications. For example:

Publisher-side:
Create table tab(c1 int, c2 int);
Create Publication pub1 for table tab(c1);
Create Publication pub2 for table tab(c2);

Subscriber-side:
Create Subscription sub1 Connection 'dbname=postgres' Publication pub1, pub2;

We want to prohibit such cases. So, it would be better to check at the
time of 'Create Subscription' to validate such combinations and
prohibit them. To achieve that we extended the existing function
pg_get_publication_tables() and view pg_publication_tables to expose
the column list and verify such a combination. We primarily need
column list information for this prohibition but it appeared natural
to expose the row filter.

As mentioned in my previous email, we can fetch the required
information directly from system table pg_publication_rel and extend
the query in fetch_table_list to achieve the desired purpose but
extending the existing function/view for this appears to be a simpler
way.

>  Even more to the point, how can we
> have a subscriber do that by relying on view columns that don't exist in
> older versions?
>

We need a version check like (if
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 15)) for that.

>  I'm also quite concerned about anything that involves
> subscribers examining row filter conditions; that sounds like a pretty
> direct route to bugs involving unsolvability and the halting problem.
>

We examine only the column list for the purpose of this prohibition.

-- 
With Regards,
Amit Kapila.




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-19 Thread Kyotaro Horiguchi
At Thu, 19 May 2022 17:16:03 -0400, Tom Lane  wrote in 
> Justin Pryzby  writes:
> > ./tmp_install/usr/local/pgsql/bin/postgres -D 
> > ./src/test/regress/tmp_check/data -c min_dynamic_shared_memory=1MB
> > TRAP: FailedAssertion("val > base", File: 
> > "../../../../src/include/utils/relptr.h", Line: 67, PID: 21912)
> 
> Yeah, I see it too.
> 
> > It looks like this may be pre-existing problem exposed by
> > commit e07d4ddc55fdcf82082950b3eb0cd8f728284c9d
> 
> Agreed.  Here I see
> 
> #5  FreePageManagerInitialize (fpm=fpm@entry=0x7f34b3ddd300, 
> base=base@entry=0x7f34b3ddd300 "") at freepage.c:187
> #6  0x0082423e in dsm_shmem_init () at dsm.c:473
> 
> so that where we do
> 
>   relptr_store(base, fpm->self, fpm);
> 
> the "relative" pointer value would have to be zero, making the case
> indistinguishable from a NULL pointer.  We can either change the
> caller so that these addresses aren't the same, or give up the
> ability to store NULL in relptrs ... doesn't seem like a hard call.
> 
> One interesting question I didn't look into is why it takes a nondefault
> value of min_dynamic_shared_memory to expose this bug.

That path is taken only when a valid value is given to the parameter.
If we don't use preallocated DSM, the memory is initialized elsewhere.
In those cases the first bytes of the base address (the second
parameter of FreePageManagerInitialize) are used for a
dsa_segment_header, so the relptr won't be zero (i.e., won't look like
NULL).

It can be silenced by wasting the first MAXALIGN bytes of
dsm_main_space_begin..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 20 May 2022 11:02:21 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
>> Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?
> s/needs/need/;
> If I removed the step s2_ff, I see the same difference. So I think it
> is that.  Some other permutations seem to need the same.

Hmm ... it does seem like the answer might be somewhere around there,
but it's not there exactly.  This won't fix it:

-  s1_table_stats
+  s1_table_stats(s2_ff)

That sort of marker only stabilizes cases where two different steps might
be reported as completing in either order.  In this case, that's clearly
not the problem.  What we do need is to ensure that s1_table_stats doesn't
*launch* before s2_ff is done.  However, it doesn't look to me like that's
what's happening.  isolation/README explains that

Notice that these markers can only delay reporting of the completion
of a step, not the launch of a step.  The isolationtester will launch
the next step in a permutation as soon as (A) all prior steps of the
same session are done, and (B) the immediately preceding step in the
permutation is done or deemed blocked.  For this purpose, "deemed
blocked" means that it has been seen to be waiting on a database lock,
or that it is complete but the report of its completion is delayed by
one of these markers.

There's no "waiting..." reports in the test output, nor do we have any
condition markers in stats.spec right now, so I don't think any steps
have been "deemed blocked".

What I am wondering about at this point is whether the effects of
pg_stat_force_next_flush() could somehow be delayed until after we
have told the client the command is complete.  I've not poked into
the code in that area, but if that could happen it'd explain this
behavior.

regards, tom lane




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Kyotaro Horiguchi
At Fri, 20 May 2022 11:02:21 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?

s/needs/need/;

If I removed the step s2_ff, I see the same difference. So I think it
is that.  Some other permutations seem to need the same.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index be4ae1f4ff..441dd7d3b8 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -417,7 +417,7 @@ permutation
   s2_table_update_k1
   s1_table_drop
   s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 permutation
   s1_table_select
@@ -461,7 +461,7 @@ permutation
   s2_table_select
   s1_track_counts_on
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
   # but can count again after
   s1_table_select
   s1_table_update_k1
@@ -478,7 +478,7 @@ permutation
   s2_table_select
   s1_track_counts_on
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
   s1_table_select
   s1_table_update_k1
   s1_ff
@@ -509,7 +509,7 @@ permutation
   s2_commit_prepared_a
   s1_table_select
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 # S1 prepares, S2 commits prepared
 permutation
@@ -533,7 +533,7 @@ permutation
   s2_rollback_prepared_a
   s1_table_select
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 
 ### 2PC: truncate handling
@@ -564,7 +564,7 @@ permutation
   s1_prepare_a
   s2_commit_prepared_a
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 # S1 prepares, S1 aborts prepared
 permutation
@@ -592,7 +592,7 @@ permutation
   s1_prepare_a
   s2_rollback_prepared_a
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 
 ### 2PC: rolled back drop maintains live / dead counters
@@ -627,7 +627,7 @@ permutation
   s1_prepare_a
   s2_rollback_prepared_a
   s1_ff s2_ff
-  s1_table_stats
+  s1_table_stats(s2_ff)
 
 
 ##


Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Kyotaro Horiguchi
At Thu, 19 May 2022 21:42:31 -0400, Tom Lane  wrote in 
> Christoph Berg  writes:
> > Debian unstable mips (the old 32-bit one):
> 
> > --- /<>/src/test/isolation/expected/stats.out  2022-05-16 
> > 21:10:42.0 +
> > +++ /<>/build/src/test/isolation/output_iso/results/stats.out  
> > 2022-05-18 23:26:56.573000536 +
> > @@ -2854,7 +2854,7 @@
> 
> >  
> > seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
> >  
> > ++-+-+-+--+--+
> > -   3|   9|5|1|0| 1| 1| 
> >   0
> > +   3|   9|5|1|0| 4| 1| 
> >   0
> >  (1 row)
> 
> I have just discovered that I can reproduce this identical symptom
> fairly repeatably on an experimental lashup that I've been running
> with bleeding-edge NetBSD on my ancient HPPA box.  (You didn't think
> I was just going to walk away from that hardware, did you?)
> 
> Even more interesting, the repeatability varies with the settings
> of max_connections and max_prepared_transactions.  At low values
> (resp. 20 and 0) I've not been able to make it happen at all, but
> at 100 and 2 it happens circa three times out of four.
> 
> I have no idea where to start looking, but this is clearly an issue
> in the new stats code ... or else the hoped-for goal of removing
> flakiness from the stats tests is just as far away as ever.

Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Tom Lane
Christoph Berg  writes:
> Debian unstable mips (the old 32-bit one):

> --- /<>/src/test/isolation/expected/stats.out2022-05-16 
> 21:10:42.0 +
> +++ /<>/build/src/test/isolation/output_iso/results/stats.out
> 2022-05-18 23:26:56.573000536 +
> @@ -2854,7 +2854,7 @@

>  
> seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
>  
> ++-+-+-+--+--+
> -   3|   9|5|1|0| 1| 1|   
> 0
> +   3|   9|5|1|0| 4| 1|   
> 0
>  (1 row)

I have just discovered that I can reproduce this identical symptom
fairly repeatably on an experimental lashup that I've been running
with bleeding-edge NetBSD on my ancient HPPA box.  (You didn't think
I was just going to walk away from that hardware, did you?)

Even more interesting, the repeatability varies with the settings
of max_connections and max_prepared_transactions.  At low values
(resp. 20 and 0) I've not been able to make it happen at all, but
at 100 and 2 it happens circa three times out of four.

I have no idea where to start looking, but this is clearly an issue
in the new stats code ... or else the hoped-for goal of removing
flakiness from the stats tests is just as far away as ever.

regards, tom lane




Re: Build-farm - intermittent error in 031_column_list.pl

2022-05-19 Thread Kyotaro Horiguchi
At Thu, 19 May 2022 16:42:31 +0530, Amit Kapila  wrote 
in 
> On Thu, May 19, 2022 at 3:16 PM Amit Kapila  wrote:
> >
> > This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> > probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> > apply worker (which we can see in LOGS as well) and after the restart,

Yes.

> > the apply worker will use the existing slot and replication origin
> > corresponding to the subscription. Now, it is possible that before
> > restart the origin has not been updated and the WAL start location
> > points to a location prior to where PUBLICATION pub9 exists which can
> > lead to such an error. Once this error occurs, apply worker will never
> > be able to proceed and will always return the same error. Does this
> > make sense?

Wow. I didn't think along that line. That theory explains the silence and
makes sense even though I don't see LSN transitions that clearly
support it.  I dimly remember a similar kind of problem..

> > Unless you or others see a different theory, this seems to be the
> > existing problem in logical replication which is manifested by this
> > test. If we just want to fix these test failures, we can create a new
> > subscription instead of altering the existing publication to point to
> > the new publication.
> >
> 
> If the above theory is correct then I think allowing the publisher to
> catch up with "$node_publisher->wait_for_catchup('sub1');" before
> ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> both publisher and subscriber are in sync then the new publication
> should be visible to WALSender.

It looks right to me.  That time travel seems unintuitive, but it's the
(current) way it works.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Skipping schema changes in publication

2022-05-19 Thread Peter Smith
On Fri, May 20, 2022 at 10:19 AM Peter Smith  wrote:
>
> FYI, although the v6-0002 patch applied cleanly, I found that the SGML
> was malformed and so the pg docs build fails.
>
> ~~~
> e.g.
>
> [postgres@CentOS7-x64 sgml]$ make STYLE=website html
> { \
>   echo ""; \
>   echo ""; \
> } > version.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl YES
> ../../../src/backend/catalog/sql_feature_packages.txt
> ../../../src/backend/catalog/sql_features.txt >
> features-supported.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl NO
> ../../../src/backend/catalog/sql_feature_packages.txt
> ../../../src/backend/catalog/sql_features.txt >
> features-unsupported.sgml
> '/usr/bin/perl' ./generate-errcodes-table.pl
> ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
> '/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> ref/create_publication.sgml:171: parser error : Opening and ending tag
> mismatch: varlistentry line 166 and listitem
> 
>^
> ref/create_publication.sgml:172: parser error : Opening and ending tag
> mismatch: variablelist line 60 and varlistentry
>
>   ^
> ref/create_publication.sgml:226: parser error : Opening and ending tag
> mismatch: refsect1 line 57 and variablelist
>   
>  ^
> ...
>
> I will work around it locally, but for future patches please check the
> SGML builds ok before posting.

FYI, I rewrote the bad SGML fragment like this:

   
EXCEPT TABLE

 
  This clause specifies a list of tables to exclude from the publication. It
  can only be used with FOR ALL TABLES.
 

   

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Addition of PostgreSQL::Test::Cluster::pg_version()

2022-05-19 Thread Michael Paquier
On Thu, May 19, 2022 at 07:28:53AM -0400, Andrew Dunstan wrote:
> Looks ok. PostgreSQL::Version is designed so that the object behaves
> sanely in comparisons and when interpolated into a string.

I saw that, and that's kind of nice when it comes to write
version-specific code paths in the tests ;) 
--
Michael


signature.asc
Description: PGP signature


Re: Skipping schema changes in publication

2022-05-19 Thread Peter Smith
FYI, although the v6-0002 patch applied cleanly, I found that the SGML
was malformed and so the pg docs build fails.

~~~
e.g.

[postgres@CentOS7-x64 sgml]$ make STYLE=website html
{ \
  echo ""; \
  echo ""; \
} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES
../../../src/backend/catalog/sql_feature_packages.txt
../../../src/backend/catalog/sql_features.txt >
features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO
../../../src/backend/catalog/sql_feature_packages.txt
../../../src/backend/catalog/sql_features.txt >
features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl
../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
'/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
ref/create_publication.sgml:171: parser error : Opening and ending tag
mismatch: varlistentry line 166 and listitem

   ^
ref/create_publication.sgml:172: parser error : Opening and ending tag
mismatch: variablelist line 60 and varlistentry
   
  ^
ref/create_publication.sgml:226: parser error : Opening and ending tag
mismatch: refsect1 line 57 and variablelist
  
 ^
...

I will work around it locally, but for future patches please check the
SGML builds ok before posting.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-19 Thread David Rowley
On Thu, 19 May 2022 at 11:20, Tom Lane  wrote:
> The thing that makes this a bit more difficult than it might be is
> the special cases we have for known-aligned and so on targets, which
> are particularly critical for palloc0 and makeNode etc.  So there's
> more than one case to look into.  But I'd argue that those special
> cases are actually what we want to worry about the most: zeroing
> relatively small, known-aligned node structs is THE use case.

I think the makeNode() trickery would be harder to get rid of, or for
that matter, anything where the size/alignment is unknown at compile
time.  I think the more interesting ones that we might be able to get
rid of are the ones where the alignment and size *are* known at
compile time. Also probably anything that passes a compile-time const
that's not 0 will fallback on memset anyway, so might as well be
removed to tidy things up.

It just all seems a bit untidy when you look at functions like
ExecStoreAllNullTuple() which use a mix of memset and MemSet without
any apparent explanation of why. That particular one is likely that
way due to the first size guaranteed to be multiples of sizeof(Datum)
and the latter not.

Naturally, we'd need to run enough benchmarks to prove to ourselves
that we're not causing any slowdowns.  The intention of memset.c was
to try to put something out there that people could test so we could
get an idea if there are any machines/compilers that we might need to
be concerned about.

David




Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Tom Lane
I wrote:
> I see that the gcc farm[1] has another mips64 machine running Debian
> buster, so I've started a build there to see what happens.

Many kilowatt-hours later, I've entirely failed to reproduce this
on gcc230.  Not sure how to investigate further.  Given that your
original build machine is so slow, could it be timing-related?
Hard to see how, given the location of the crash, but ...

regards, tom lane




Re: A qsort template

2022-05-19 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, May 19, 2022 at 1:12 PM Justin Pryzby  wrote:
>> Should these debug lines be removed ?
>> 
>> elog(DEBUG1, "qsort_tuple");

> I agree -- DEBUG1 seems too chatty for something like this. DEBUG2
> would be more appropriate IMV. Though I don't feel very strongly about
> it.

Given the lack of context identification, I'd put the usefulness of
these in production at close to zero.  +1 for removing them
altogether, or failing that, downgrade to DEBUG5 or so.

regards, tom lane




Re: A qsort template

2022-05-19 Thread Peter Geoghegan
On Thu, May 19, 2022 at 1:12 PM Justin Pryzby  wrote:
> Should these debug lines be removed ?
>
> elog(DEBUG1, "qsort_tuple");

I agree -- DEBUG1 seems too chatty for something like this. DEBUG2
would be more appropriate IMV. Though I don't feel very strongly about
it.

--
Peter Geoghegan




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-19 Thread Tom Lane
Justin Pryzby  writes:
> ./tmp_install/usr/local/pgsql/bin/postgres -D 
> ./src/test/regress/tmp_check/data -c min_dynamic_shared_memory=1MB
> TRAP: FailedAssertion("val > base", File: 
> "../../../../src/include/utils/relptr.h", Line: 67, PID: 21912)

Yeah, I see it too.

> It looks like this may be pre-existing problem exposed by
> commit e07d4ddc55fdcf82082950b3eb0cd8f728284c9d

Agreed.  Here I see

#5  FreePageManagerInitialize (fpm=fpm@entry=0x7f34b3ddd300, 
base=base@entry=0x7f34b3ddd300 "") at freepage.c:187
#6  0x0082423e in dsm_shmem_init () at dsm.c:473

so that where we do

relptr_store(base, fpm->self, fpm);

the "relative" pointer value would have to be zero, making the case
indistinguishable from a NULL pointer.  We can either change the
caller so that these addresses aren't the same, or give up the
ability to store NULL in relptrs ... doesn't seem like a hard call.

One interesting question I didn't look into is why it takes a nondefault
value of min_dynamic_shared_memory to expose this bug.

regards, tom lane




Re: parallel not working

2022-05-19 Thread Robert Haas
On Thu, May 19, 2022 at 8:05 AM huangning...@yahoo.com <
huangning...@yahoo.com> wrote:

> I would like to know how can I get it to work properly?
>

I suppose you have a bug in your C code. Try hooking up a debugger to one
of the sessions that is hung and see what it's doing e.g.

gdb -p 26130
bt

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Re: A qsort template

2022-05-19 Thread Justin Pryzby
On Fri, Apr 22, 2022 at 11:37:29AM +0700, John Naylor wrote:
> On Fri, Apr 22, 2022 at 11:13 AM David Rowley  wrote:
> >
> > On Thu, 21 Apr 2022 at 19:09, John Naylor  
> > wrote:
> > > I intend to commit David's v2 fix next week, unless there are
> > > objections, or unless he beats me to it.
> >
> > I wasn't sure if you wanted to handle it or not, but I don't mind
> > doing it, so I just pushed it after a small adjustment to a comment.
> 
> Thank you!

Should these debug lines be removed ?

elog(DEBUG1, "qsort_tuple");

Perhaps if I ask for debug output, I shouldn't be surprised if it changes
between major releases - but I still found this surprising.

I'm sure it's useful during development and maybe during beta.  It could even
make sense if it were shown during regression tests (preferably at DEBUG2).
But right now it's not.  Is that intentional?

ts=# \dt
DEBUG:  qsort_tuple
List of relations

-- 
Justin




Re: Zstandard support for toast compression

2022-05-19 Thread Robert Haas
On Thu, May 19, 2022 at 4:20 AM Michael Paquier  wrote:
> Btw, shouldn't we have something a bit more, err, extensible for the
> design of an extensible varlena header?  If we keep it down to some
> bitwise information, we'd be fine for a long time but it would be
> annoying to review again an extended design if we need to extend it
> with more data.

What do you have in mind?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-19 Thread Justin Pryzby
./tmp_install/usr/local/pgsql/bin/postgres -D ./src/test/regress/tmp_check/data 
-c min_dynamic_shared_memory=1MB

TRAP: FailedAssertion("val > base", File: 
"../../../../src/include/utils/relptr.h", Line: 67, PID: 21912)
./tmp_install/usr/local/pgsql/bin/postgres(ExceptionalCondition+0xa0)[0x55af5c9c463e]
./tmp_install/usr/local/pgsql/bin/postgres(FreePageManagerInitialize+0x94)[0x55af5c9f4478]
./tmp_install/usr/local/pgsql/bin/postgres(dsm_shmem_init+0x87)[0x55af5c841532]
./tmp_install/usr/local/pgsql/bin/postgres(CreateSharedMemoryAndSemaphores+0x8d)[0x55af5c843f30]
./tmp_install/usr/local/pgsql/bin/postgres(+0x41805c)[0x55af5c7c605c]
./tmp_install/usr/local/pgsql/bin/postgres(PostmasterMain+0x959)[0x55af5c7ca8e7]
./tmp_install/usr/local/pgsql/bin/postgres(main+0x229)[0x55af5c70af7f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f736d6e1b97]
./tmp_install/usr/local/pgsql/bin/postgres(_start+0x2a)[0x55af5c4794fa]

It looks like this may be pre-existing problem exposed by

commit e07d4ddc55fdcf82082950b3eb0cd8f728284c9d
Author: Tom Lane 
Date:   Sat Mar 26 14:29:29 2022 -0400

Suppress compiler warning in relptr_store().





Inquiring about my GSoC Proposal.

2022-05-19 Thread Israa Odeh
Greetings!

I was wondering if you could provide some initial feedback on my GSoC 
proposal, along with any comments you may have about it. Also, would it be 
possible to know whether I have been accepted as a contributor?

Best Regards,
Israa Odeh.



Re: Privileges on PUBLICATION

2022-05-19 Thread Antonin Houska
Euler Taveira  wrote:

> On Wed, May 18, 2022, at 6:16 AM, Antonin Houska wrote:
> 
>  The patch is attached to this message.
> 
> Great. Add it to the next CF. I'll review it when I have some spare time.

https://commitfest.postgresql.org/38/3641/

Thanks!

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog

2022-05-19 Thread Matthias van de Meent
On Mon, 16 May 2022 at 00:09, Tomas Vondra 
wrote:
>
> On 5/15/22 21:55, Matthias van de Meent wrote:
> > Note: I am not (currently) planning on implementing this rough idea,
> > just putting it up to share and document the idea, on request of Tomas
> > (cc-ed).
> >
> > The excellent pgconf.de presentation on PostgreSQL's extended
> > statistics system by Tomas Vondra [0] talked about how the current
> > default statistics assume the MCVs of columns to be fully independent,
> > i.e. values of column A do not imply any value of columns B and C, and
> > that for accurate data on correlated values the user needs to
> > manually create statistics on the combined columns (by either
> > STATISTICS or by INDEX).
> >
> > This is said to be due to limitations in our statistics collector: to
> > determine the fraction of the table that contains the value, we store
> > the N most common values with the fraction of their occurrence in the
> > table. This value is quite exact, but combining these values proves
> > difficult: there is nothing in the stored value that can confidently
> > include or exclude parts of the table from a predicate using that MCV,
> > so we can only assume that the values of two columns are independent.
> >
> > After the presentation it came to me that if we were to add an
> > estimator for the number of rows with that value to the MCV lists in
> > the form of HLL sketches (in addition to or replacing the current
> > most_common_elem_freqs fractions), we would be able to make better
> > estimates for multi-column filters by combining the HLL row
> > cardinality sketches for filters that filter on these MCVs. This would
> > remove the immediate need for manual statistics with a Cartesian
> > product of the MCVs of those columns with their occurrence fractions,
> > which significantly reduces the need for the creation of manual
> > statistics - the need that exists due to planner mis-estimates in
> > correlated columns. Custom statistics will still be required for
> > expression statistics, but column correlation estimations _within
> > MCVs_ is much improved.
> >
> > How I imagine this would work is that for each value in the MCV, an
> > HLL is maintained that estimates the amount of distinct tuples
> > containing that value. This can be h(TID) or h(PK), or anything else
> > that would uniquely identify returned tuples. Because the keyspace of
> > all HLLs that are generated are on the same table, you can apply join
> > and intersection operations on the HLLs of the MCVs (for OR and
> > AND-operations respectively), and provide fairly accurately estimates
> > for the amount of tuples that would be returned by the filter on that
> > table.
> >
> > The required size of the HLL sketches can be determined by the amount
> > of tuples scanned during analyze, potentially reducing the size
> > required to store these HLL sketches from the usual 1.5kB per sketch
> > to something smaller - we'll only ever need to count nTuples distinct
> > values, so low values for default_statistics_target would allow for
> > smaller values for m in the HLL sketches, whilst still providing
> > fairly accurate result estimates.
> >
>
> I think it's an interesting idea. In principle it allows deducing the
> multi-column MCV for arbitrary combination of columns, not determined in
> advance. We'd have the MCV with HLL instead of frequencies for columns
> A, B and C:
>
> (a1, hll(a1))
> (a2, hll(a2))
> (...)
> (aK, hll(aK))
>
>
> (b1, hll(b1))
> (b2, hll(b2))
> (...)
> (bL, hll(bL))
>
> (c1, hll(c1))
> (c2, hll(c2))
> (...)
> (cM, hll(cM))
>
> and from this we'd be able to build MCV for any combination of those
> three columns.
>
> And in some sense it might even be more efficient/accurate, because the
> MCV on (A,B,C) might have up to K*L*M items. if there's 100 items in
> each column, that'd be 1,000,000 combinations, which we can't really
> store (target is up to 10k). And even if we could, it'd be 1M
> combinations with frequencies (so ~8-16B per combination).
>
> While with the MCV/HLL, we'd have 300 items and HLL. Assuming 256-512B
> HLL would be enough, that's still way smaller than the multi-column MCV.

HLLs for statistics_target=100 could use 4 bits per bucket, but any target
above 218 should use 5 bits: nbits = ceil(log2(log2(target * 300))), and
this saves only 20% on storage.

Accuracy increases with root(m), so while we can shrink the amount of
buckets, this is only OK if we're accepting the corresponding decrease in
accuracy.

> Even with target=10k it'd still be cheaper to store the separate MCV
> with HLL values, if I count right, and there'd be no items omitted from
> the MCV.

There are more options, though: count-min sketches have been proposed as a
replacement for MCV lists, and they provide a guaranteed upper bound on the
count of each value. If, instead of an actual counter, each bucket held an
HLL, we'd not only have occurrence estimates (though not actual max-count
limits) for all values, but we'd also be 

Re: Zstandard support for toast compression

2022-05-19 Thread Nikolay Shaplov
In a message of Tuesday, 17 May 2022 23:01:07 MSK, Tom Lane 
wrote:

Hi! I came to this branch looking for a patch to review, but I guess I would 
join the discussion instead of reading the code.

> >> Yeah - I think we had better reserve the fourth bit pattern for
> >> something extensible e.g. another byte or several to specify the
> >> actual method, so that we don't have a hard limit of 4 methods. But
> >> even with such a system, the first 3 methods will always and forever
> >> be privileged over all others, so we'd better not make the mistake of
> >> adding something silly as our third algorithm.
> > 
> > In such a situation, would they really end up being properly distinct
> > when it comes to what our users see..?  I wouldn't really think so.
> 
> It should be transparent to users, sure, but the point is that the
> first three methods will have a storage space advantage over others.
> Plus we'd have to do some actual work to create that extension mechanism.

Postgres is well known for extensibility: one can write one's own 
implementation of almost everything and make it an extension.
One would hardly need more than one (or two) additional compression 
methods, but which method will really be needed is hard to say. 

So I guess it would be much better to create an API for defining and 
registering one's own compression methods, and to provide a built-in Zstd 
compression method that can be used (or optionally not used) via that API.

Moreover, I guess this API (maybe with some modifications) could also be used 
for seamless data encryption, for example. 

So I guess it would be better to make it extensible from the start, use 
this precious bit for a compression method chosen by the user, and maybe 
later extend it with another byte of compression-method bits if needed.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Christoph Berg
Re: To Tom Lane
> > Is the crash 100% reproducible for you?
> 
> I have scheduled a rebuild now, we'll know in a few hours...

The build was much faster this time (different machine), and worked.

https://buildd.debian.org/status/logs.php?pkg=postgresql-15=mips64el

I'll also start a test build on the mips64el porterbox I have access to.

Christoph




Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane
>> Hmm, so what's different between this and buildfarm member topminnow?

> That one is running Debian jessie (aka oldoldoldoldstable), uses
> -mabi=32 with gcc 4.9, and runs a kernel from 2015.
> The Debian buildd is this: 
> https://db.debian.org/machines.cgi?host=mipsel-aql-01
> The host should be running Debian buster, with the build done in an
> unstable chroot. I don't know what "LS3A-RS780-1w (Quad Core Loongson 3A)"
> means, but it's probably much newer hardware than the other one.

I see that the gcc farm[1] has another mips64 machine running Debian
buster, so I've started a build there to see what happens.

regards, tom lane

[1] https://cfarm.tetaneutral.net/machines/list/




Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-05-19 Thread Dagfinn Ilmari Mannsåker
Dagfinn Ilmari Mannsåker  writes:

> Hi hackers,
>
> Prompted by a question on IRC, here's a patch to add a result_types
> column to the pg_prepared_statements view, so that one can see the types
> of the columns returned by a prepared statement, not just the parameter
> types.

Added to the 2022-07 commitfest: https://commitfest.postgresql.org/38/3644/

- ilmari




[PATCH] Add result_types column to pg_prepared_statements view

2022-05-19 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

Prompted by a question on IRC, here's a patch to add a result_types
column to the pg_prepared_statements view, so that one can see the types
of the columns returned by a prepared statement, not just the parameter
types.

I'm not quite sure about the column name, suggestions welcome.

- ilmari

>From 5045cd5a173fefb5346ed81d355ba35c1c922105 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 19 May 2022 15:54:56 +0100
Subject: [PATCH] Add result_types column to pg_prepared_statements view

Containing the types of the columns returned by the prepared
statement.

Prompted by question from IRC user mlvzk.
---
 doc/src/sgml/catalogs.sgml| 12 +
 src/backend/commands/prepare.c| 19 +--
 src/include/catalog/pg_proc.dat   |  6 +--
 src/test/regress/expected/prepare.out | 72 +--
 src/test/regress/expected/rules.out   |  3 +-
 src/test/regress/sql/prepare.sql  | 12 ++---
 6 files changed, 73 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d96c72e531..ae7627ae48 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11492,6 +11492,18 @@
   
  
 
+ 
+  
+   result_types regtype[]
+  
+  
+   The types of the columns returned by the prepared statement in the
+   form of an array of regtype. The OID corresponding
+   to an element of this array can be obtained by casting the
+   regtype value to oid.
+  
+ 
+
  
   
from_sql bool
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..7c8537fbd2 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -683,8 +683,16 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
		hash_seq_init(&hash_seq, prepared_queries);
		while ((prep_stmt = hash_seq_search(&hash_seq)) != NULL)
 		{
-			Datum		values[7];
-			bool		nulls[7];
+			Datum		values[8];
+			bool		nulls[8];
+			TupleDesc	result_desc;
+			Oid		   *result_types;
+
+			result_desc = prep_stmt->plansource->resultDesc;
+			result_types = (Oid *) palloc(result_desc->natts * sizeof(Oid));
+
+			for (int i = 0; i < result_desc->natts; i++)
+result_types[i] = result_desc->attrs[i].atttypid;
 
 			MemSet(nulls, 0, sizeof(nulls));
 
@@ -693,9 +701,10 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 			values[2] = TimestampTzGetDatum(prep_stmt->prepare_time);
 			values[3] = build_regtype_array(prep_stmt->plansource->param_types,
 			prep_stmt->plansource->num_params);
-			values[4] = BoolGetDatum(prep_stmt->from_sql);
-			values[5] = Int64GetDatumFast(prep_stmt->plansource->num_generic_plans);
-			values[6] = Int64GetDatumFast(prep_stmt->plansource->num_custom_plans);
+			values[4] = build_regtype_array(result_types, result_desc->natts);
+			values[5] = BoolGetDatum(prep_stmt->from_sql);
+			values[6] = Int64GetDatumFast(prep_stmt->plansource->num_generic_plans);
+			values[7] = Int64GetDatumFast(prep_stmt->plansource->num_custom_plans);
 
 			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
  values, nulls);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a33..cb953c6411 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8025,9 +8025,9 @@
   proname => 'pg_prepared_statement', prorows => '1000', proretset => 't',
   provolatile => 's', proparallel => 'r', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{text,text,timestamptz,_regtype,bool,int8,int8}',
-  proargmodes => '{o,o,o,o,o,o,o}',
-  proargnames => '{name,statement,prepare_time,parameter_types,from_sql,generic_plans,custom_plans}',
+  proallargtypes => '{text,text,timestamptz,_regtype,_regtype,bool,int8,int8}',
+  proargmodes => '{o,o,o,o,o,o,o,o}',
+  proargnames => '{name,statement,prepare_time,parameter_types,result_types,from_sql,generic_plans,custom_plans}',
   prosrc => 'pg_prepared_statement' },
 { oid => '2511', descr => 'get the open cursors for this session',
   proname => 'pg_cursor', prorows => '1000', proretset => 't',
diff --git a/src/test/regress/expected/prepare.out b/src/test/regress/expected/prepare.out
index 3306c696b1..faf07f620b 100644
--- a/src/test/regress/expected/prepare.out
+++ b/src/test/regress/expected/prepare.out
@@ -1,9 +1,9 @@
 -- Regression tests for prepareable statements. We query the content
 -- of the pg_prepared_statements view as prepared statements are
 -- created and removed.
-SELECT name, statement, parameter_types FROM pg_prepared_statements;
- name | statement | parameter_types 
---+---+-
+SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
+ name | statement | parameter_types | result_types 
+--+---+-+--
 (0 rows)
 
 PREPARE q1 AS SELECT 1 AS a;
@@ -13,10 +13,10 @@ EXECUTE q1;
  1
 (1 row)
 
-SELECT name, 

Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Christoph Berg
Re: Tom Lane
> Christoph Berg  writes:
> > Debian unstable mips64el:
> 
> Hmm, so what's different between this and buildfarm member topminnow?

That one is running Debian jessie (aka oldoldoldoldstable), uses
-mabi=32 with gcc 4.9, and runs a kernel from 2015.

The Debian buildd is this: https://db.debian.org/machines.cgi?host=mipsel-aql-01
The host should be running Debian buster, with the build done in an
unstable chroot. I don't know what "LS3A-RS780-1w (Quad Core Loongson 3A)"
means, but it's probably much newer hardware than the other one.

Christoph




15beta1 test failure on mips in isolation/expected/stats

2022-05-19 Thread Christoph Berg
Debian unstable mips (the old 32-bit one):

test vacuum-conflict  ... ok 2170 ms
test vacuum-skip-locked   ... ok 2445 ms
test stats... FAILED38898 ms
test horizons ... ok 4543 ms
test predicate-hash   ... ok22419 ms

 build/src/test/isolation/output_iso/regression.diffs 
diff -U3 /<>/src/test/isolation/expected/stats.out 
/<>/build/src/test/isolation/output_iso/results/stats.out
--- /<>/src/test/isolation/expected/stats.out  2022-05-16 
21:10:42.0 +
+++ /<>/build/src/test/isolation/output_iso/results/stats.out  
2022-05-18 23:26:56.573000536 +
@@ -2854,7 +2854,7 @@

 
seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 
++-+-+-+--+--+
-   3|   9|5|1|0| 1| 1| 
  0
+   3|   9|5|1|0| 4| 1| 
  0
 (1 row)

Full build log:
https://buildd.debian.org/status/fetch.php?pkg=postgresql-15&arch=mipsel&ver=15%7Ebeta1-1&stamp=1652916588&raw=0

(I'll try rescheduling this build as well, the last one took 4h before
it failed.)

Christoph




Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Christoph Berg
Re: Tom Lane
> Christoph Berg  writes:
> > Debian unstable mips64el:
> 
> Hmm, so what's different between this and buildfarm member topminnow?
> 
> Is the crash 100% reproducible for you?

I have scheduled a rebuild now, we'll know in a few hours...

Christoph




Re: 15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Tom Lane
Christoph Berg  writes:
> Debian unstable mips64el:

Hmm, so what's different between this and buildfarm member topminnow?

Is the crash 100% reproducible for you?

regards, tom lane




15beta1 crash on mips64el in pg_regress/triggers

2022-05-19 Thread Christoph Berg
Debian unstable mips64el:

2022-05-18 22:57:34.436 UTC client backend[19222] pg_regress/triggers 
STATEMENT:  drop trigger trg1 on trigpart3;
...
2022-05-18 22:57:39.110 UTC postmaster[7864] LOG:  server process (PID 19222) 
was terminated by signal 11: Segmentation fault
2022-05-18 22:57:39.110 UTC postmaster[7864] DETAIL:  Failed process was 
running: SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull,
  (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type 
t
   WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND 
a.attcollation <> t.typcollation) AS attcollation,
  a.attidentity,
  a.attgenerated
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '21816' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;

 build/src/test/regress/tmp_check/data/core 

warning: Can't open file /dev/shm/PostgreSQL.387042440 during file-backed 
mapping note processing

warning: Can't open file /dev/shm/PostgreSQL.4014890228 during file-backed 
mapping note processing

warning: Can't open file /dev/zero (deleted) during file-backed mapping note 
processing

warning: Can't open file /SYSV035e8a2e (deleted) during file-backed mapping 
note processing
[New LWP 19222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library 
"/lib/mips64el-linux-gnuabi64/libthread_db.so.1".
Core was generated by `postgres: buildd regression [local] SELECT   
 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00ffd000565c in ?? ()
#0  0x00ffd000565c in ?? ()
No symbol table info available.
#1  0x00aaad76b730 in ExecEvalExprSwitchContext (isNull=0xfffb9e85e7, 
econtext=0xaab20e9f90, state=0xaab20ea108) at 
./build/../src/include/executor/executor.h:343
retDatum = 
oldContext = 0xaab1fabb10
retDatum = 
oldContext = 
#2  ExecProject (projInfo=0xaab20ea100) at 
./build/../src/include/executor/executor.h:377
econtext = 0xaab20e9f90
state = 0xaab20ea108
slot = 0xaab20ea5b0
isnull = false
#3  ExecScan (node=0xaab20ea100, accessMtd=0xaaad78b6d0 , 
recheckMtd=0xaaad78bf08 ) at 
./build/../src/backend/executor/execScan.c:238
slot = 
econtext = 
qual = 
projInfo = 0xaab20ea100
#4  0x00aaad76b730 in ExecEvalExprSwitchContext (isNull=0xaab20ea450, 
econtext=0xaab20e9f90, state=0x1208) at 
./build/../src/include/executor/executor.h:343
retDatum = 
oldContext = 0xaab20ea6c8
retDatum = 
oldContext = 
#5  ExecProject (projInfo=0x1200) at 
./build/../src/include/executor/executor.h:377
econtext = 0xaab20e9f90
state = 0x1208
slot = 0xaab20ea638
isnull = false
#6  ExecScan (node=0xaab20ea6d0, accessMtd=0xaab20ea6cd, 
recheckMtd=0xaab20ea310) at ./build/../src/backend/executor/execScan.c:238
slot = 
econtext = 
qual = 
projInfo = 0x1200
#7  0x in ?? ()
No symbol table info available.
Backtrace stopped: frame did not save the PC


Full build log:
https://buildd.debian.org/status/fetch.php?pkg=postgresql-15&arch=mips64el&ver=15%7Ebeta1-1&stamp=1652916002&raw=0

Christoph




Re: JSON Functions and Operators Docs for v15

2022-05-19 Thread Andrew Dunstan


On 2022-05-19 Th 08:19, Justin Pryzby wrote:
>> +   same level are considerd to be siblings,
> considered
>
>> +PostgreSQL specific functions detailed in
> postgresql hyphen specific (as in the original)
>
> These would all be easier to read with commas:
>
> +expression is NULL an
> +If WITH UNIQUE is specified the
> +If the input is NULL an SQL NULL is returned. If the input is a 
> number
> +If WITH UNIQUE is specified there must not
> +is strict an error is generated if it yields no 
> items.
>
> There's a few instances of "space space" that could be condensed:
>
> +   json_scalar functions, this needs to be  either 
> json or
> +which  must be a SELECT query returning a single column. If



Thanks, pushed now.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: bogus: logical replication rows/cols combinations

2022-05-19 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:
>> I have committed the first patch after fixing this part. It seems Tom
>> is not very happy doing this after beta-1 [1]. The reason we get this
>> information via this view (and underlying function) is that it
>> simplifies the queries on the subscriber-side as you can see in the
>> second patch. The query change is as below:
>> [1] - https://www.postgresql.org/message-id/91075.1652929852%40sss.pgh.pa.us

> I think Tom's concern is that adding information to a view seems like adding a
> feature that hadn't previously been contemplated.
> (Catalog changes themselves are not prohibited during the beta period).

It certainly smells like a new feature, but my concern was more around the
post-beta catalog change.  We do those only if really forced to, and the
explanation in the commit message didn't satisfy me as to why it was
necessary.  This explanation isn't much better --- if we're trying to
prohibit a certain class of publication definitions, what good does it do
to check that on the subscriber side?  Even more to the point, how can we
have a subscriber do that by relying on view columns that don't exist in
older versions?  I'm also quite concerned about anything that involves
subscribers examining row filter conditions; that sounds like a pretty
direct route to bugs involving unsolvability and the halting problem.

(But I've not read very much of this thread ... been a bit under the
weather the last couple weeks.  Maybe this actually is a sane solution.
It just doesn't sound like one at this level of detail.)

regards, tom lane




Re: JSON Functions and Operators Docs for v15

2022-05-19 Thread Justin Pryzby
> +   same level are considerd to be siblings,

considered

> +PostgreSQL specific functions detailed in

postgresql hyphen specific (as in the original)

These would all be easier to read with commas:

+expression is NULL an
+If WITH UNIQUE is specified the
+If the input is NULL an SQL NULL is returned. If the input is a number
+If WITH UNIQUE is specified there must not
+is strict an error is generated if it yields no 
items.

There's a few instances of "space space" that could be condensed:

+   json_scalar functions, this needs to be  either 
json or
+which  must be a SELECT query returning a single column. If




Re: bogus: logical replication rows/cols combinations

2022-05-19 Thread Justin Pryzby
On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:
> I have committed the first patch after fixing this part. It seems Tom
> is not very happy doing this after beta-1 [1]. The reason we get this
> information via this view (and underlying function) is that it
> simplifies the queries on the subscriber-side as you can see in the
> second patch. The query change is as below:
> [1] - https://www.postgresql.org/message-id/91075.1652929852%40sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

> a. Revert the change in view (and underlying function) as done in
> commit 0ff20288e1 and consider the alternate way (using a slightly
> complex query) to fix. Then maybe for PG-16, we can simplify it by
> changing the underlying function and view.

But, ISTM that it makes no sense to do it differently for v15 just to avoid the
appearance of adding a new feature, only to re-do it in 2 weeks for v16...
So (from a passive observer) +0.1 to keep the current patch.

I have some minor language fixes to that patch.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d96c72e5310..82aa84e96e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9691,7 +9691,7 @@ SCRAM-SHA-256$iteration 
count:
 
  
   pg_publication_tables
-  publications and information of their associated tables
+  publications and information about their associated tables
  
 
  
@@ -11635,7 +11635,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts 
ppx
 
   
The view pg_publication_tables provides
-   information about the mapping between publications and information of
+   information about the mapping between publications and information about
tables they contain.  Unlike the underlying catalog
pg_publication_rel,
this view expands publications defined as FOR ALL TABLES
@@ -11695,7 +11695,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts 
ppx
   
   
Names of table columns included in the publication. This contains all
-   the columns of the table when the user didn't specify the column list
+   the columns of the table when the user didn't specify a column list
for the table.
   
  
diff --git a/src/backend/catalog/pg_publication.c 
b/src/backend/catalog/pg_publication.c
index 8c7fca62de3..2f706f638ce 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1077,7 +1077,7 @@ get_publication_name(Oid pubid, bool missing_ok)
 }
 
 /*
- * Returns information of tables in a publication.
+ * Returns information about tables in a publication.
  */
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a331..86f13293090 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11673,7 +11673,7 @@
   prosrc => 'pg_show_replication_origin_status' },
 
 # publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => 'text',
   proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => 
'{i,o,o,o}',




parallel not working

2022-05-19 Thread huangning...@yahoo.com
Hi:
I wrote a C function, defined as follows:
create function st_geosotgrid(geom geometry, level integer) returns geosotgrid[]
immutable
strict
parallel safe
language c
as
$$
begin
-- missing source code
end;
$$;




At the same time, I set the relevant parameters:
force_parallel_mode: off
max_parallel_maintenance_workers: 4
max_parallel_workers: 8
max_parallel_workers_per_gather: 2
max_worker_processes: 8
min_parallel_index_scan_size: 64
min_parallel_table_scan_size: 1024
parallel_leader_participation: on

set parallel_setup_cost = 10;
set parallel_tuple_cost = 0.001;

sql:
select st_geosotgrid(geom,20) from t_polygon_gis;

and the explain as follows:

Gather  (cost=10.00..5098.67 rows=20 width=32)
  Workers Planned: 2
  ->  Parallel Seq Scan on t_polygon_gis  (cost=0.00..4888.67 rows=8 width=32)


When I run EXPLAIN ANALYZE, the parallel workers appear to hang.

I would like to know how I can get it to work properly.

Thank You!




Re: Removing unneeded self joins

2022-05-19 Thread Ronan Dunklau
Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit :
> On 5/17/22 19:14, Ronan Dunklau wrote:
> > Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> >> New version of the feature.
> >> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> >> when uniqueness of an inner deduced not from join quals, but from a
> >> baserestrictinfo clauses 'x=const', where x - unique field.
> >> Code, dedicated to solve second issue is controversial, so i attached
> >> delta.txt for quick observation.
> >> Maybe we should return to previous version of code, when we didn't split
> >> restriction list into join quals and base quals?
> > 
> > Hello,
> > 
> > I tried to find problematic cases, which would make the planning time grow
> > unacceptably, and couldn't devise it.
> > 
> > The worst case scenario I could think of was as follows:
> >   - a query with many different self joins
> >   - an abundance of unique indexes on combinations of this table columns
> >   to
> > 
> > consider
> > 
> >   - additional predicates on the where clause on columns.
> 
> Looking into the patch I can imagine, that the most difficult case is
> when a set of relations with the same OID is huge, but only small part
> of them (or nothing) can be removed.
> Also, removing a clause from restrictinfo list or from equivalence class
> adds non-linear complexity. So, you can dig this way ).
> 
> > The base table I used for this was a table with 40 integers. 39 unique
> > indexes were defined on every combination of (c1, cX) with cX being
> > columns c2 to c40.
> > 
> > I turned geqo off, set from_collapse_limit and join_collapse_limit to
> > unreasonably high values (30), and tried to run queries of the form:
> > 
> > SELECT * FROM test_table t1
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> > ...
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
> > 
> > So no self join can be eliminated in that case.
> 
> I think, you should compare t1.cX with tX.cX to eliminate self-join.
> Cross-unique-index proof isn't supported now.

Yes, that's the point. I wanted to try to introduce as much complexity as I 
could, without actually performing any self join elimination. The idea was to 
try to come up with the worst case scenario.
> 
> > The performance was very similar with or without the GUC enabled. I tested
> > the same thing without the patch, since the test for uniqueness has been
> > slightly altered and incurs a new allocation, but it doesn't seem to
> > change.
> > 
> > One interesting side effect of this patch is that removing any unneeded
> > self join cuts down the planning time very significantly, as we
> > lower the number of combinations to consider.
> 
> Even more - removing a join we improve cardinality estimation.
> 
> > As for the code:
> >   - Comments on relation_has_unique_index_ext and
> >   relation_has_unique_index_for> 
> > should be rewritten, as relation_has_unique_index_for is now just a
> > special
> > case of relation_has_unique_index_ext. By the way, the comment would
> > probably be better read as: "but if extra_clauses isn't NULL".
> > 
> >   - The whole thing about "extra_clauses", ie, baserestrictinfos which
> >   were
> > 
> > used to determine uniqueness, is not very clear. Most functions where the
> > new argument has been added have not seen an update in their comments,
> > and the name itself doesn't really convey the intended meaning: perhaps
> > required_non_join_clauses ?
> > 
> > The way this works should be explained a bit more thoroughly, for example
> > in remove_self_joins_one_group the purpose of uclauses should be
> > explained. The fact that degenerate_case returns true when we don't have
> > any additional base restrict info is also confusing, as well as the
> > degenerate_case name.
> Agree,
> but after this case thoughts wander in my head: should we make one step
> back to pre-[1] approach? It looks like we have quite similar changes,
> but without special function for a 'degenerate case' detection and
> restrictlist splitting.

I'll take a look at that one. 

> 
> > I'll update if I think of more interesting things to add.
> 
> Thank you for your efforts!
> 
> See in attachment next version which fixes mistakes detected by
> z...@yugabyte.com.
> 
> [1]
> https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW
> 5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com


-- 
Ronan Dunklau






Re: Addition of PostgreSQL::Test::Cluster::pg_version()

2022-05-19 Thread Andrew Dunstan


On 2022-05-18 We 21:38, Michael Paquier wrote:
> Hi all,
> (Added Andrew in CC.)
>
> While working more on expanding the tests of pg_upgrade for
> cross-version checks, I have noticed that we don't expose a routine
> able to get back _pg_version from a node, which should remain a
> private field of Cluster.pm.  We already do that for install_path, as
> of case added by 87076c4.
>
> Any objections or comments about the addition of a routine to get the
> PostgreSQL::Version, as of the attached?
>


Looks ok. PostgreSQL::Version is designed so that the object behaves
sanely in comparisons and when interpolated into a string.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Build-farm - intermittent error in 031_column_list.pl

2022-05-19 Thread Amit Kapila
On Thu, May 19, 2022 at 3:16 PM Amit Kapila  wrote:
>
> On Thu, May 19, 2022 at 12:28 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 19 May 2022 14:26:56 +1000, Peter Smith  
> > wrote in
> > > Hi hackers.
> > >
> > > FYI, I saw that there was a recent Build-farm error on the "grison" 
> > > machine [1]
> > > [1] 
> > > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison&br=HEAD
> > >
> > > The error happened during "subscriptionCheck" phase in the TAP test
> > > t/031_column_list.pl
> > > This test file was added by this [2] commit.
> > > [2] 
> > > https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5
> >
> > What is happening for all of them looks like that the name of a
> > publication created by CREATE PUBLICATION without a failure report is
> > missing for a walsender came later. It seems like CREATE PUBLICATION
> > can silently fail to create a publication, or walsender somehow failed
> > to find existing one.
> >
>
> Do you see anything in LOGS which indicates CREATE SUBSCRIPTION has failed?
>
> >
> > > ~~
> > >
> >
> > 2022-04-17 00:16:04.278 CEST [293659][client 
> > backend][4/270:0][031_column_list.pl] LOG:  statement: CREATE PUBLICATION 
> > pub9 FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);
> > 2022-04-17 00:16:04.279 CEST [293659][client 
> > backend][:0][031_column_list.pl] LOG:  disconnection: session time: 
> > 0:00:00.002 user=bf database=postgres host=[local]
> >
> > "CREATE PUBLICATION pub9" is executed at 00:16:04.278 on 293659 then
> > the session has been disconnected. But the following request for the
> > same publication fails due to the absense of the publication.
> >
> > 2022-04-17 00:16:08.147 CEST [293856][walsender][3/0:0][sub1] STATEMENT:  
> > START_REPLICATION SLOT "sub1" LOGICAL 0/153DB88 (proto_version '3', 
> > publication_names '"pub9"')
> > 2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] ERROR:  
> > publication "pub9" does not exist
> >
>
> This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> apply worker (which we can see in LOGS as well) and after the restart,
> the apply worker will use the existing slot and replication origin
> corresponding to the subscription. Now, it is possible that before
> restart the origin has not been updated and the WAL start location
> points to a location prior to where PUBLICATION pub9 exists which can
> lead to such an error. Once this error occurs, apply worker will never
> be able to proceed and will always return the same error. Does this
> make sense?
>
> Unless you or others see a different theory, this seems to be the
> existing problem in logical replication which is manifested by this
> test. If we just want to fix these test failures, we can create a new
> subscription instead of altering the existing publication to point to
> the new publication.
>

If the above theory is correct then I think allowing the publisher to
catch up with "$node_publisher->wait_for_catchup('sub1');" before
ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
both publisher and subscriber are in sync then the new publication
should be visible to WALSender.

-- 
With Regards,
Amit Kapila.




Re: Removing unneeded self joins

2022-05-19 Thread Andrey Lepikhov

On 5/17/22 19:14, Ronan Dunklau wrote:

Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :

New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?


Hello,

I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

  - a query with many different self joins
  - an abundance of unique indexes on combinations of this table columns to
consider
  - additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is 
when a set of relations with the same OID is huge, but only small part 
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class 
adds non-linear complexity. So, you can dig this way ).


The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join. 
Cross-unique-index proof isn't supported now.

The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.

One interesting side effect of this patch is that removing any unneeded self
join cuts down the planning time very significantly, as we lower the number
of combinations to consider.

Even more - removing a join we improve cardinality estimation.


As for the code:

  - Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
  - The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intended meaning: perhaps
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.

Agree,
but after this case thoughts wander in my head: should we make one step 
back to pre-[1] approach? It looks like we have quite similar changes, 
but without special function for a 'degenerate case' detection and 
restrictlist splitting.


I'll update if I think of more interesting things to add.

Thank you for your efforts!

See in attachment next version which fixes mistakes detected by 
z...@yugabyte.com.


[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com


--
Regards
Andrey Lepikhov
Postgres Professional

From 1774b4bcfe337b151b76c7f357dc6748755bf1d9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

The Self Join Removal (SJR) feature removes an inner join of a plain table to
itself in a query plan if it can be proved that the join can be replaced with
a scan. The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner relation has the
same clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some 

Re: Zstandard support for toast compression

2022-05-19 Thread Amit Kapila
On Wed, May 18, 2022 at 9:47 PM Robert Haas  wrote:
>
> I do understand that Zstandard is a good compression algorithm, and if
> we had an extensibility mechanism here where one of the four possible
> bit patterns then indicates that the next byte (or two or four) stores
> the real algorithm type, then what about adding Zstandard that way
> instead of consuming one of our four primary bit patterns? That way
> we'd have this option for people who want it, but we'd have more
> options for the future instead of fewer.
>
> i.e. something like:
>
> 00 = PGLZ
> 01 = LZ4
> 10 = reserved for future emergencies
> 11 = extended header with additional type byte (1 of 256 possible
> values reserved for Zstandard)
>

+1 for such an extensible mechanism if we decide to go with Zstandard
compression algorithm. To decide that won't it make sense to see some
numbers as Michael already has a patch for the new algorithm?

-- 
With Regards,
Amit Kapila.




Re: Intermittent buildfarm failures on wrasse

2022-05-19 Thread Alvaro Herrera
I was looking at backpatching this to pg13.  That made me realize that
commit dc7420c2c927 changed things in 14; and before that commit, the
bitmask that is checked is PROCARRAY_FLAGS_VACUUM, which has a
definition independently from whatever proc.h says.  As far as I can
tell, there's no problem with the patches I post here (the backpatched
version for pg13 and p14).  But's it's something to be aware of; and if
we do want to add the additional bits to the bitmask, we should do that
in a separate master-only commit.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
From 6595af9af208de33ab431b502e2f178f7d0f8007 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Sat, 14 May 2022 16:51:23 +0200
Subject: [PATCH] Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS

This is a slight, convenient semantics change from what commit
0f0cfb494004 introduced that lets us simplify the coding in the one
place where it is used.
---
 src/backend/storage/ipc/procarray.c | 11 ++-
 src/include/storage/proc.h  |  7 +++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 6ff8d8699b..447b6e3de7 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1916,12 +1916,13 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
 		TransactionIdIsNormal(xid) &&
 		TransactionIdPrecedesOrEquals(xid, xmin))
 	{
-		/* Install xmin */
+		/*
+		 * Install xmin and propagate the vacuumFlags that affect how the
+		 * value is interpreted by vacuum.
+		 */
 		MyPgXact->xmin = TransactionXmin = xmin;
-
-		/* Flags being copied must be valid copy-able flags. */
-		Assert((pgxact->vacuumFlags & (~PROC_COPYABLE_FLAGS)) == 0);
-		MyPgXact->vacuumFlags = pgxact->vacuumFlags;
+		MyPgXact->vacuumFlags = (MyPgXact->vacuumFlags & ~PROC_XMIN_FLAGS) |
+			(pgxact->vacuumFlags & PROC_XMIN_FLAGS);
 
 		result = true;
 	}
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9cf9684b41..7c85b5645b 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -63,11 +63,10 @@ struct XidCache
 	(PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
- * Flags that are valid to copy from another proc, the parallel leader
- * process in practice.  Currently, a flag that is set during parallel
- * vacuum is allowed.
+ * Xmin-related flags. Make sure any flags that affect how the process' Xmin
+ * value is interpreted by VACUUM are included here.
  */
-#define		PROC_COPYABLE_FLAGS (PROC_IN_VACUUM)
+#define		PROC_XMIN_FLAGS (PROC_IN_VACUUM)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.30.2

From 35197d187fe3e7c9e4bd563396c9c7459e39a7ff Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Sat, 14 May 2022 16:51:23 +0200
Subject: [PATCH] Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS

This is a slight, convenient semantics change from what commit
0f0cfb494004 introduced that lets us simplify the coding in the one
place where it is used.
---
 src/backend/storage/ipc/procarray.c | 11 ++-
 src/include/storage/proc.h  |  7 +++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 127be9c017..08053a7e8f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2686,12 +2686,13 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
 		TransactionIdIsNormal(xid) &&
 		TransactionIdPrecedesOrEquals(xid, xmin))
 	{
-		/* Install xmin */
+		/*
+		 * Install xmin and propagate the statusFlags that affect how the
+		 * value is interpreted by vacuum.
+		 */
 		MyProc->xmin = TransactionXmin = xmin;
-
-		/* Flags being copied must be valid copy-able flags. */
-		Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0);
-		MyProc->statusFlags = proc->statusFlags;
+		MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) |
+			(proc->statusFlags & PROC_XMIN_FLAGS);
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 
 		result = true;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 11b514c9ae..1464fad9b9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -66,11 +66,10 @@ struct XidCache
 	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
- * Flags that are valid to copy from another proc, the parallel leader
- * process in practice.  Currently, flags that are set during parallel
- * vacuum and parallel index creation are allowed.
+ * Xmin-related flags. Make sure any flags that affect how the process' Xmin
+ * value is interpreted by VACUUM are included here.
  */
-#define		PROC_COPYABLE_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC)
+#define		PROC_XMIN_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC)
 
 /*
  * We allow a small number of "weak" relation locks 

Re: Build-farm - intermittent error in 031_column_list.pl

2022-05-19 Thread Amit Kapila
On Thu, May 19, 2022 at 12:28 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 19 May 2022 14:26:56 +1000, Peter Smith  wrote 
> in
> > Hi hackers.
> >
> > FYI, I saw that there was a recent Build-farm error on the "grison" machine 
> > [1]
> > [1] 
> > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison=HEAD
> >
> > The error happened during "subscriptionCheck" phase in the TAP test
> > t/031_column_list.pl
> > This test file was added by this [2] commit.
> > [2] 
> > https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5
>
> What seems to be happening in all of these cases is that a publication
> created by CREATE PUBLICATION, with no failure reported, is missing when
> a walsender looks for it later. It seems like CREATE PUBLICATION can
> silently fail to create a publication, or the walsender somehow failed
> to find the existing one.
>

Do you see anything in LOGS which indicates CREATE PUBLICATION has failed?

>
> > ~~
> >
>
> 2022-04-17 00:16:04.278 CEST [293659][client 
> backend][4/270:0][031_column_list.pl] LOG:  statement: CREATE PUBLICATION 
> pub9 FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);
> 2022-04-17 00:16:04.279 CEST [293659][client backend][:0][031_column_list.pl] 
> LOG:  disconnection: session time: 0:00:00.002 user=bf database=postgres 
> host=[local]
>
> "CREATE PUBLICATION pub9" is executed at 00:16:04.278 in backend 293659,
> and then the session disconnects. But a later request for the same
> publication fails due to the absence of the publication.
>
> 2022-04-17 00:16:08.147 CEST [293856][walsender][3/0:0][sub1] STATEMENT:  
> START_REPLICATION SLOT "sub1" LOGICAL 0/153DB88 (proto_version '3', 
> publication_names '"pub9"')
> 2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] ERROR:  
> publication "pub9" does not exist
>

This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
probable theory is that ALTER SUBSCRIPTION will lead to restarting of
apply worker (which we can see in LOGS as well) and after the restart,
the apply worker will use the existing slot and replication origin
corresponding to the subscription. Now, it is possible that before
restart the origin has not been updated and the WAL start location
points to a location prior to where PUBLICATION pub9 exists which can
lead to such an error. Once this error occurs, apply worker will never
be able to proceed and will always return the same error. Does this
make sense?

Unless you or others see a different theory, this seems to be an
existing problem in logical replication which is manifested by this
test. If we just want to fix these test failures, we can create a new
subscription instead of altering the existing subscription to point to
the new publication.

Note: Added Tomas to get his views, as he committed this test.

-- 
With Regards,
Amit Kapila.




Re: First draft of the PG 15 release notes

2022-05-19 Thread Amit Langote
On Thu, May 19, 2022 at 2:56 PM David Rowley  wrote:
> On Thu, 19 May 2022 at 14:41, Amit Langote  wrote:
> > Though a bit late given beta is now wrapped, I have another partition
> > item wording improvement suggestion:
> >
> > -Previously, a partitioned table with any LIST partition containing
> > multiple values could not be used for ordered partition scans.  Now
> > only non-pruned LIST partitions are checked.  This also helps with
> > -partitioned tables with DEFAULT partitions.
> >
> > +Previously, an ordered partition scan would not be considered for a
> > LIST-partitioned table with any partition containing multiple values,
> > nor for partitioned tables with DEFAULT partition.
>
> I think your proposed wording does not really improve things.  The
> "Now only non-pruned LIST partitions are checked" is important and I
> think Bruce did the right thing to mention that. Prior to this change,
> ordered scans were not possible if there was a DEFAULT or if any LIST
> partition allowed >1 value. Now, if the default partition is pruned
> and there are no non-pruned partitions that allow Datum values that
> are inter-mixed with ones from another non-pruned partition, then an
> ordered scan can be performed.
>
> For example, non-pruned partition a allows IN(1,3), and non-pruned
> partition b allows IN(2,4), we cannot do the ordered scan. With
> IN(1,2), IN(3,4), we can.

I think that's what I understood this change to be about.  Before this
change, partitions_are_ordered() only returned true if *all*
partitions of a parent are known to be ordered, which they're not in
the presence of the default partition and of a list partition
containing out-of-order values.  It didn't matter to
partitions_are_ordered() that the caller might not care about those
partitions being present in the PartitionDesc because of having been
pruned by the query, but that information was not readily available.
So, you added PartitionBoundInfo.interleaved_parts to record indexes
of partitions containing out-of-order values and RelOptInfo.live_parts
to record non-pruned partitions, which made it more feasible for
partitions_are_ordered() to address those cases.  I suppose you think
it's better to be verbose by mentioning that partitions_are_ordered()
now considers only non-pruned partitions which allows supporting more
cases, but I see that as mentioning implementation details
unnecessarily.

Or maybe we could mention that but use a wording that doesn't make it
sound like an implementation detail, like:

+Previously, an ordered partition scan could not be used for a
LIST-partitioned table with any partition containing multiple values,
nor for partitioned tables with DEFAULT partition.  Now it can be used
in those cases at least for queries in which such partitions are
pruned.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


reword-ordered-partition-scan-item_v2.diff
Description: Binary data


RE: Handle infinite recursion in logical replication setup

2022-05-19 Thread shiy.f...@fujitsu.com
On Wed, May 4, 2022 2:47 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v13 patch has the changes for the same.
>

Here are some comments on v13-0002 patch.

1)
+* Throw an error so that the user can take care of the initial 
data
+* copying and then create subscription with copy_data off.

Should "with copy_data off" be changed to "with copy_data off or force"?

2)
case ALTER_SUBSCRIPTION_ADD_PUBLICATION:
case ALTER_SUBSCRIPTION_DROP_PUBLICATION:
...
/*
 * See ALTER_SUBSCRIPTION_REFRESH for 
details why this is
 * not allowed.
 */
if (sub->twophasestate == 
LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)

I think we need some changes here, too. Should it be modified to:
if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && 
IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))

Regards,
Shi yu


Re: Zstandard support for toast compression

2022-05-19 Thread Michael Paquier
On Wed, May 18, 2022 at 12:17:16PM -0400, Robert Haas wrote:
> i.e. something like:
> 
> 00 = PGLZ
> 01 = LZ4
> 10 = reserved for future emergencies
> 11 = extended header with additional type byte (1 of 256 possible
> values reserved for Zstandard)

Btw, shouldn't we have something a bit more, err, extensible for the
design of an extensible varlena header?  If we keep it down to some
bitwise information, we'd be fine for a long time, but it would be
annoying to have to revisit the design again if we need to extend it
with more data.
--
Michael


signature.asc
Description: PGP signature


Re: Skipping schema changes in publication

2022-05-19 Thread Peter Smith
Below are my review comments for v6-0001.

==

1. General.

The patch failed 'publication' tests in the make check phase.

Please add this work to the commit-fest so that the 'cfbot' can report
such errors sooner.

~~~

2. src/backend/commands/publicationcmds.c - AlterPublicationReset

+/*
+ * Reset the publication.
+ *
+ * Reset the publication options, publication relations and
publication schemas.
+ */
+static void
+AlterPublicationReset(ParseState *pstate, AlterPublicationStmt *stmt,
+ Relation rel, HeapTuple tup)

SUGGESTION (Make the comment similar to the sgml text instead of
repeating "publication" 4x !)
/*
 * Reset the publication options, set the ALL TABLES flag to false, and
 * drop all relations and schemas that are associated with the publication.
 */

~~~

3. src/test/regress/expected/publication.out

make check failed. The diff is below:

@@ -1716,7 +1716,7 @@
 -- Verify that only superuser can reset a publication
 ALTER PUBLICATION testpub_reset OWNER TO regress_publication_user2;
 SET ROLE regress_publication_user2;
-ALTER PUBLICATION testpub_reset RESET; -- fail
+ALTER PUBLICATION testpub_reset RESET; -- fail - must be superuser
 ERROR:  must be superuser to RESET publication
 SET ROLE regress_publication_user;
 DROP PUBLICATION testpub_reset;


--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Limiting memory allocation

2022-05-19 Thread Dmitry Dolgov
> On Wed, May 18, 2022 at 04:49:24PM -0400, Joe Conway wrote:
> On 5/18/22 16:20, Alvaro Herrera wrote:
> > On 2022-May-18, Joe Conway wrote:
> >
> > > On 5/18/22 11:11, Alvaro Herrera wrote:
> >
> > > > Apparently, if the cgroup goes over the "high" limit, the processes are
> > > > *throttled*.  Then if the group goes over the "max" limit, OOM-killer is
> > > > invoked.
> >
> > > You may be misinterpreting "throttle" in this context. From [1]:
> > >
> > >   The memory.high boundary on the other hand can be set
> > >   much more conservatively. When hit, it throttles
> > >   allocations by forcing them into direct reclaim to
> > >   work off the excess, but it never invokes the OOM
> > >   killer.
> >
> > Well, that means the backend processes don't do their expected task
> > (process some query) but instead they have to do "direct reclaim".  I
> > don't know what that is, but it sounds like we'd need to add
> > Linux-specific code in order for this to fix anything.
>
> Postgres does not need to do anything. The kernel just does its thing (e.g.
> clearing page cache or swapping out anon memory) more aggressively than
> normal to clear up some space for the impending allocation.
>
> > And what would we do in such a situation anyway?  Seems like our
> > best hope would be to> get malloc() to return NULL and have the
> > resulting transaction abort free enough memory that things in other
> > backends can continue to run.
>
> With the right hooks an extension could detect the memory pressure in an OS
> specific way and return null.
>
> > *If* there is a way to have cgroups make Postgres do that, then that
> > would be useful enough.
>
> Memory accounting under cgroups (particularly v2) can provide the signal
> needed for a Linux specific extension to do that.

To elaborate a bit on this, the Linux PSI feature (for cgroups v2 only,
in the context of containers) [1] would allow a userspace application to
register a trigger that fires when memory pressure exceeds some
threshold. The pressure here is not exactly how much memory is
allocated, but rather memory stall time, and the whole machinery would
involve polling -- but it still sounds interesting in the context of
this thread.

[1]: https://www.kernel.org/doc/Documentation/accounting/psi.rst




Re: Addition of PostgreSQL::Test::Cluster::pg_version()

2022-05-19 Thread Daniel Gustafsson
> On 19 May 2022, at 03:38, Michael Paquier  wrote:

> Any objections or comments about the addition of a routine to get the
> PostgreSQL::Version, as of the attached?

I haven't tested the patch, but +1 on the idea.

--
Daniel Gustafsson   https://vmware.com/





Re: Build-farm - intermittent error in 031_column_list.pl

2022-05-19 Thread Kyotaro Horiguchi
At Thu, 19 May 2022 14:26:56 +1000, Peter Smith  wrote 
in 
> Hi hackers.
> 
> FYI, I saw that there was a recent Build-farm error on the "grison" machine 
> [1]
> [1] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison=HEAD
> 
> The error happened during "subscriptionCheck" phase in the TAP test
> t/031_column_list.pl
> This test file was added by this [2] commit.
> [2] 
> https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

What seems to be happening in all of these cases is that a publication
created by CREATE PUBLICATION, with no failure reported, is missing when
a walsender looks for it later. It seems like CREATE PUBLICATION can
silently fail to create a publication, or the walsender somehow failed
to find the existing one.


> ~~
> 
> I checked the history of fails for that TAP test t/031_column_list.pl
> and found that this same error seems to have been happening
> intermittently for at least the last 50 days.
> 
> Details of similar previous errors from the BF are listed below.
> 
> ~~~
> 
> 1. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-05-18 18:11:45
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison=2022-05-18%2018%3A11%3A45
> 
> [22:02:08] t/029_on_error.pl .. ok    25475 ms ( 0.01
> usr  0.00 sys + 15.39 cusr  5.59 csys = 20.99 CPU)
> # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> #  FROM pg_catalog.pg_stat_replication
> #  WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # Tests were run but no plan was declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 22.
> [22:09:25] t/031_column_list.pl ...
> ...
> [22:02:47.887](1.829s) ok 22 - partitions with different replica
> identities not replicated correctly Waiting for replication conn
> sub1's replay_lsn to pass 0/1530588 on publisher
> [22:09:25.395](397.508s) # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> #  FROM pg_catalog.pg_stat_replication
> #  WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> timed out waiting for catchup at t/031_column_list.pl line 728.
> ### Stopping node "publisher" using mode immediate

2022-04-17 00:16:04.278 CEST [293659][client 
backend][4/270:0][031_column_list.pl] LOG:  statement: CREATE PUBLICATION pub9 
FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);
2022-04-17 00:16:04.279 CEST [293659][client backend][:0][031_column_list.pl] 
LOG:  disconnection: session time: 0:00:00.002 user=bf database=postgres 
host=[local]

"CREATE PUBLICATION pub9" is executed at 00:16:04.278 in backend 293659,
and then the session disconnects. But a later request for the same
publication fails due to the absence of the publication.

2022-04-17 00:16:08.147 CEST [293856][walsender][3/0:0][sub1] STATEMENT:  
START_REPLICATION SLOT "sub1" LOGICAL 0/153DB88 (proto_version '3', 
publication_names '"pub9"')
2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] ERROR:  
publication "pub9" does not exist


> ~~~
> 
> 2. Details for system "xenodermus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-16 21:00:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus=2022-04-16%2021%3A00%3A04

The same. pub9 is missing after creation.

> ~~~
> 
> 3. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus=2022-04-05%2017%3A30%3A04

The same happens for pub7.

> 4. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus=2022-04-05%2017%3A30%3A04

Same. pub7 is missing.

> 5. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-04-03 18:11:39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison=2022-04-03%2018%3A11%3A39

Same. pub7 is missing.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-19 Thread Peter Smith
Here are my review comments for v6-0002.

==

1. src/test/subscription/t/015_stream.pl

+
+# Test using streaming mode 'on'
+
 $node_subscriber->safe_psql('postgres',
  "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub WITH (streaming = on)"
 );
-
 $node_publisher->wait_for_catchup($appname);
-
 # Also wait for initial table sync to finish
 my $synced_query =
   "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
 $node_subscriber->poll_query_until('postgres', $synced_query)
   or die "Timed out while waiting for subscriber to synchronize data";
-
 my $result =
   $node_subscriber->safe_psql('postgres',
  "SELECT count(*), count(c), count(d = 999) FROM test_tab");
 is($result, qq(2|2|2), 'check initial data was copied to subscriber');

1a.
Several blank lines were removed by the patch. IMO it was
better (i.e. less squished) how it looked originally.

1b.
Maybe some more blank lines should be added to the 'apply' test part
too, to match 1a.

~~~

2. src/test/subscription/t/015_stream.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";

Should that say "... after changing SUBSCRIPTION"?

~~~

3. src/test/subscription/t/016_stream_subxact.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";
+

Should that say "... after changing SUBSCRIPTION"?

~~~

4. src/test/subscription/t/017_stream_ddl.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";
+

Should that say "... after changing SUBSCRIPTION"?

~~~

5. .../t/018_stream_subxact_abort.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";

Should that say "... after changing SUBSCRIPTION" ?

~~~

6. .../t/019_stream_subxact_ddl_abort.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";
+

Should that say "... after changing SUBSCRIPTION"?

~~~

7. .../subscription/t/023_twophase_stream.pl

###
# Check initial data was copied to subscriber
###

Perhaps the above comment now looks a bit out-of-place with the extra #.

Looks better now as just:
# Check initial data was copied to the subscriber

~~~

8. .../subscription/t/023_twophase_stream.pl

+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname' AND state = 'streaming';"
+) or die "Timed out while waiting for apply to restart after changing
PUBLICATION";

Should that say "... after changing SUBSCRIPTION"?

--
Kind Regards,
Peter Smith.
Fujitsu Australia