Re: abi-compliance-checker

2023-05-29 Thread Peter Eisentraut

On 27.05.23 02:52, Peter Geoghegan wrote:

Attached is a html report that was generated by a tool called
abi-compliance-checker/abi-dumper [1][2] (by using
"abi-compliance-checker -l libTest ... ") .


I have been using the libabigail library/set of tools (abidiff, abidw) 
for this.  I was not familiar with the tool you used.  The nice thing 
about abidiff is that it gives you text output and a meaningful exit 
status, so you can make it part of the build or test process.  You can 
also write suppression files to silence specific warnings.


I think the way to use this would be to compute the ABI for the .0 
release (or rc1 or something like that) and commit it into the tree. 
And then compute the current ABI and compare that against the recorded 
base ABI.


Here is the workflow:

# build REL_11_0
abidw src/backend/postgres > src/tools/postgres-abi-REL_11_0.xml
# build REL_11_20
abidw src/backend/postgres > src/tools/postgres-abi.xml
abidiff --no-added-syms src/tools/postgres-abi-REL_11_0.xml 
src/tools/postgres-abi.xml


This prints

Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 14 Removed, 0 Added (85 filtered out) 
function symbols not referenced by debug info
Variable symbols changes summary: 1 Removed, 0 Added (3 filtered out) 
variable symbols not referenced by debug info


14 Removed function symbols not referenced by debug info:

  [D] RelationHasUnloggedIndex
  [D] assign_nestloop_param_placeholdervar
  [D] assign_nestloop_param_var
  [D] logicalrep_typmap_gettypname
  [D] logicalrep_typmap_update
  [D] pqsignal_no_restart
  [D] rb_begin_iterate
  [D] rb_create
  [D] rb_delete
  [D] rb_find
  [D] rb_insert
  [D] rb_iterate
  [D] rb_leftmost
  [D] transformCreateSchemaStmt

1 Removed variable symbol not referenced by debug info:

  [D] wrconn

This appears to be similar to what your tool produced, but I haven't 
checked it in detail.






Re: Support logical replication of DDLs

2023-05-29 Thread shveta malik
On Mon, May 29, 2023 at 6:16 PM vignesh C  wrote:
>

> >
> > I found few comments while making some changes to the patch:
> > 1) Now that objtree is removed, these comments should be modified:
> >  * Deparse object tree is created by using:
> >  * a) new_objtree("know contents") where the complete tree content is known 
> > or
> >  * the initial tree content is known.
> >  * b) new_objtree("") for the syntax where the object tree will be derived
> >  * based on some conditional checks.
> >  * c) new_objtree_VA where the complete tree can be derived using some fixed
> >  * content or by using the initial tree content along with some variable
> >  * arguments.
> >  *
>
> Modified
>
> >  2) isgrant can be removed as it is not used anymore:
> > +/*
> > + * Return the given object type as a string.
> > + *
> > + * If isgrant is true, then this function is called while deparsing GRANT
> > + * statement and some object names are replaced.
> > + */
> > +const char *
> > +stringify_objtype(ObjectType objtype, bool isgrant)
> > +{
> > +   switch (objtype)
> > +   {
> > +   case OBJECT_TABLE:
> > +   return "TABLE";
> > +   default:
> > +   elog(ERROR, "unsupported object type %d", objtype);
> > +   }
> > +
> > +   return "???";   /* keep compiler quiet */
> > +}
>
> Modified
>
> > 3) This statement is not being handled currently, it should be implemented:
> > "alter table all in tablespace tbs1 set tablespace"
>
> Modified
>

With the above fix, are the below commented tests in alter_table.sql
supposed to work? If so, shall these be uncommented?
-- ALTER TABLE ALL IN TABLESPACE pg_default SET TABLESPACE pg_default;
-- ALTER TABLE ALL IN TABLESPACE pg_default OWNED BY ddl_testing_role
SET TABLESPACE pg_default;

thanks
Shveta




Re: ERROR: no relation entry for relid 6

2023-05-29 Thread Richard Guo
On Sat, May 27, 2023 at 12:16 AM Tom Lane  wrote:

> Ah.  I realized that we could make the problem testable by adding
> assertions that a joinclause we're not removing doesn't contain
> any surviving references to the target rel or join.  That turns
> out to fire (without the bug fix) in half a dozen existing test
> cases, so I felt that we didn't need to add another one.
>
> I did the other refactoring we discussed and pushed it.
> Thanks for the report and review!


Thanks for pushing it!

I've managed to find some problems on master with the help of the new
assertions.  First the query below would trigger the new assertions.

create table t (a int unique, b int);

explain (costs off)
select 1 from t t1 left join
(select t2.a, 1 as c from
 t t2 left join t t3 on t2.a = t3.a) s
on true left join t t4 on true where s.a < s.c;
server closed the connection unexpectedly

Note that 's.c' would be wrapped in PHV so the qual 's.a < s.c' is
actually 't2.a < PHV(1)', and it is one of t3's joinquals.  When we're
removing the t2/t3 join, we decide that this PHV is no longer needed so
we remove it entirely rather than just remove references from it.  But
actually its phrels still have references to t3 and t2/t3 join.  So when
we put back the qual 's.a < s.c', we will trigger the new assertions.

At first glance I thought we can just remove the new assertions.  But
then I figured out that the problem is more complex than that.  If the
PHV contains lateral references, removing the PHV entirely would cause
us to lose the information about the lateral references, and that may
cause wrong query results in some cases.  Consider the query below
(after removing the two new assertions, or in a non-assert build).

explain (costs off)
select 1 from t t1 left join
lateral (select t2.a, coalesce(t1.a, 1) as c from
 t t2 left join t t3 on t2.a = t3.a) s
on true left join t t4 on true where s.a < s.c;
QUERY PLAN
--
 Nested Loop Left Join
   ->  Nested Loop
 ->  Seq Scan on t t1
 ->  Materialize
   ->  Seq Scan on t t2
 Filter: (a < COALESCE(a, 1))
   ->  Materialize
 ->  Seq Scan on t t4
(8 rows)

The PHV of 'coalesce(t1.a, 1)' has lateral reference to t1 but we'd lose
this information because we've removed this PHV entirely in
remove_rel_from_query.  As a consequence, we'd fail to extract the
lateral dependency for t2 and fail to build the nestloop parameters for
the t1/t2 join.  And that causes wrong query results.  We can see that
if we insert some data into the table.

insert into t select 1,1;
insert into t select 2,2;

On v15 the query above gives

 ?column?
--
1
1
(2 rows)

but on master it gives

 ?column?
--
(0 rows)

I haven't thought through how to fix it, but I suspect that we may need
to do more checking before we decide to remove PHVs in
remove_rel_from_query.

Thanks
Richard


Re: PG 16 draft release notes ready

2023-05-29 Thread Bruce Momjian
On Wed, May 24, 2023 at 04:13:14PM +1200, David Rowley wrote:
> On Wed, 24 May 2023 at 15:54, Bruce Momjian  wrote:
> > First, I need to mention parallel _hash_ join.  Second, I think this
> > item is saying that the _inner_ side of a parallel hash join can be an
> > OUTER or FULL join.  How about?
> >
> > Allow hash joins to be parallelized where the inner side is
> > processed as an OUTER or FULL join (Melanie Plageman, Thomas Munro)
> >
> > In this case, the inner side is the hashed side.
> 
> I think Jonathan's text is safe to swap OUTER to RIGHT as it mentions
> "execution". For the release notes, maybe the mention of it can be
> moved away from "E.1.3.1.1. Optimizer" and put under "E.1.3.1.2.
> General Performance" and ensure we mention that we're talking about
> the executor?
> 
> I'm thinking it might be confusing if we claim that this is something
> that we switched on in the planner. It was a limitation with the
> executor which the planner was just onboard with not producing plans
> for.

Well, I try to keep plan changes in the optimizer section because that
is where the decisions are made, and how people think of plans since
EXPLAIN makes them visible.  I agree it is an executor change but I
think that distinction will be more confusing than helpful.

Frankly, almost all the optimizer items are really executor changes. 
Maybe the "Optimizer" title needs to be changed, but I do think it is
good to group plan changes together.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-29 Thread Bruce Momjian
On Tue, May 23, 2023 at 11:54:30PM -0400, Bruce Momjian wrote:
> On Wed, May 24, 2023 at 08:37:45AM +1200, David Rowley wrote:
> > On Mon, 22 May 2023 at 07:05, Jonathan S. Katz  wrote:
> > > * Parallel execution of queries that use `FULL` and `OUTER` joins
> > 
> > I think this should be `RIGHT` joins rather than `OUTER` joins.
> > 
> > LEFT joins have been parallelizable I think for a long time now.
> 
> Well, since we can swap left/right easily, why would we not have just
> have swappted the tables and done the join in the past?  I think there
> are two things missing in my description.
> 
> First, I need to mention parallel _hash_ join.  Second, I think this
> item is saying that the _inner_ side of a parallel hash join can be an
> OUTER or FULL join.  How about?
> 
>   Allow hash joins to be parallelized where the inner side is
>   processed as an OUTER or FULL join (Melanie Plageman, Thomas Munro)
> 
> In this case, the inner side is the hashed side.

I went with this text:

Allow parallelization of FULL and internal right OUTER hash joins
(Melanie Plageman, Thomas Munro)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-29 Thread Bruce Momjian
On Mon, May 29, 2023 at 10:08:41AM -0400, Masahiko Sawada wrote:
> On Sun, May 21, 2023 at 10:47 PM Bruce Momjian  wrote:
> >
> > Okay new merged item is:
> >
> > 
> >
> > 
> > 
> > Allow logical decoding on standbys (Bertrand Drouvot, Andres 
> > Freund, Amit Khandekar, Bertrand Drouvot)
> > 
> > 
> >
> > 
> > 
> > New function pg_log_standby_snapshot() forces creation of WAL 
> > snapshots.
> > Snapshots are required for logical slot creation so this function 
> > speeds their creation on standbys.
> > 
> > 
> >
> 
> Bertrand Drouvot is mentioned two times in this item and commit
> 0fdab27ad is listed two times. Is it intentional?

Thanks, fixed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: abi-compliance-checker

2023-05-29 Thread Peter Geoghegan
On Sun, May 28, 2023 at 9:34 AM Peter Geoghegan  wrote:
> I'll try to come up with a standard abi-compliance-checker Postgres
> workflow once I'm back from pgCon.

Ideally, we'd be able to produce reports that cover an entire stable
release branch at once, including details about how things changed
over time. It turns out that there is a tool from the same family of
tools as abi-compliance-checker, that can do just that:

https://github.com/lvc/abi-tracker

There is an abi-tracker example report, here:

https://abi-laboratory.pro/?view=timeline=glib

It's exactly the same presentation as the report I posted recently,
once you drill down. That seems ideal.

-- 
Peter Geoghegan




Re: benchmark results comparing versions 15.2 and 16

2023-05-29 Thread Peter Geoghegan
On Sun, May 28, 2023 at 2:42 PM David Rowley  wrote:
> c6e0fe1f2 might have helped improve some of that performance, but I
> suspect there must be something else as ~3x seems much more than I'd
> expect from reducing the memory overheads.  Testing versions before
> and after that commit might give a better indication.

I'm virtually certain that this is due to the change in default
collation provider, from libc to ICU. Mostly due to the fact that ICU
is capable of using abbreviated keys, and  the system libc isn't
(unless you go out of your way to define TRUST_STRXFRM when building
Postgres).

Many individual test cases involving larger non-C collation text sorts
showed similar improvements back when I worked on this. Offhand, I
believe that 3x - 3.5x improvements in execution times were common
with high entropy abbreviated keys on high cardinality input columns
at that time (this was with glibc). Low cardinality inputs were more
like 2.5x.

I believe that ICU is faster than glibc in general -- even with
TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
most important factor here, by far.

-- 
Peter Geoghegan




Re: PG 16 draft release notes ready

2023-05-29 Thread Masahiko Sawada
On Sun, May 21, 2023 at 10:47 PM Bruce Momjian  wrote:
>
> Okay new merged item is:
>
> 
>
> 
> 
> Allow logical decoding on standbys (Bertrand Drouvot, Andres Freund, 
> Amit Khandekar, Bertrand Drouvot)
> 
> 
>
> 
> 
> New function pg_log_standby_snapshot() forces creation of WAL 
> snapshots.
> Snapshots are required for logical slot creation so this function 
> speeds their creation on standbys.
> 
> 
>

Bertrand Drouvot is mentioned two times in this item and commit
0fdab27ad is listed two times. Is it intentional?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: vector search support

2023-05-29 Thread Giuseppe Broccolo
Hi Nathan,

I noticed you implemented a closest_vector function which returns the
closest vector to a given one using the
Euclidean distance: would it make sense to change the implementation in
order to include also different distance
definitions rather than the Euclidean one (for instance, cosine
similarity)? Depending on the use cases, some
metrics could make more sense than others.

Giuseppe.

On 4/22/23 1:07 AM, Nathan Bossart  wrote:

> Attached is a proof-of-concept/work-in-progress patch set that adds
> functions for "vectors" repreѕented with one-dimensional float8 arrays.
> These functions may be used in a variety of applications, but I am
> proposing them with the AI/ML use-cases in mind.  I am posting this early
> in the v17 cycle in hopes of gathering feedback prior to PGCon.
>
> With the accessibility of AI/ML tools such as large language models (LLMs),
> there has been a demand for storing and manipulating high-dimensional
> vectors in PostgreSQL, particularly around nearest-neighbor queries.  Many
> of these vectors have more than 1500 dimensions.  The cube extension [0]
> provides some of the distance functionality (e.g., taxicab, Euclidean, and
> Chebyshev), but it is missing some popular functions (e.g., cosine
> similarity, dot product), and it is limited to 100 dimensions.  We could
> extend cube to support more dimensions, but this would require reworking
> its indexing code and filling in gaps between the cube data type and the
> array types.  For some previous discussion about using the cube extension
> for this kind of data, see [1].
>
> float8[] is well-supported and allows for effectively unlimited dimensions
> of data.  float8 matches the common output format of many AI embeddings,
> and it allows us or extensions to implement indexing methods around these
> functions.  This patch set does not yet contain indexing support, but we
> are exploring using GiST or GIN for the use-cases in question.  It might
> also be desirable to add support for other linear algebra operations (e.g.,
> operations on matrices).  The attached patches likely only scratch the
> surface of the "vector search" use-case.
>
> The patch set is broken up as follows:
>
>  * 0001 does some minor refactoring of dsqrt() in preparation for 0002.
>  * 0002 adds several vector-related functions, including distance functions
>and a kmeans++ implementation.
>  * 0003 adds support for optionally using the OpenBLAS library, which is an
>implementation of the Basic Linear Algebra Subprograms [2]
>specification.  Basic testing with this library showed a small
>performance boost, although perhaps not enough to justify giving this
>patch serious consideration.
>
> Of course, there are many open questions.  For example, should PostgreSQL
> support this stuff out-of-the-box in the first place?  And should we
> introduce a vector data type or SQL domains for treating float8[] as
> vectors?  IMHO these vector search use-cases are an exciting opportunity
> for the PostgreSQL project, so I am eager to hear what folks think.
>
> [0] https://www.postgresql.org/docs/current/cube.html
> [1] https://postgr.es/m/2271927.1593097400%40sss.pgh.pa.us
> [2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


Re: make_ctags: use -I option to ignore pg_node_attr macro

2023-05-29 Thread Masahiko Sawada
Hi,

On Tue, Feb 14, 2023 at 8:15 PM Tatsuo Ishii  wrote:
>
> >> I fixed the above issues and refactored the code.
> >> Attached is the updated version of the patch. Thought?
> >
> > Thank you! Looks good to me.
>
> Fix pushed. Thank you!

In my Mac environment where non-Exuberant ctags and emacs 28.2 are
installed, the generated etags file cannot be loaded by emacs due to
file format error. The generated TAGS file is:

% head -10 TAGS

) /
sizeof(BlockNumber)sizeof(BlockNumber)117,3750
my
@newa newa395,10443

variadic array[1,2]:array[1,2]56,1803

variadic array[]::inarray[]::i72,2331

variadic array[array64,2111

variadic array[array68,

variadic array[array76,2441
  (2 *  (2 53,1353
my $fn fn387,10147
startblock 101,4876

Since the etags files consist of multiple sections[1] we cannot sort
the generated etags file. With the attached patch, make_etags (with
non-Exuberant ctags) generates a correct format etags file and it
works:

% head -10 TAGS

/Users/masahiko/pgsql/source/postgresql/GNUmakefile,1187
subdir 7,56
top_builddir 8,65
docs:docs13,167
world-contrib-recurse:world-contrib-recurse19,273
world-bin-contrib-recurse:world-bin-contrib-recurse24,394
html man:html man26,444
install-docs:install-docs29,474
install-world-contrib-recurse:install-world-contrib-recurse35,604

BTW regarding the following comment, as far as I can read the
Wikipedia page for ctags[1], Exuberant ctags file doesn't have a
header section.

# Exuberant tags has a header that we cannot sort in with the other entries
# so we skip the sort step
# Why are we sorting this?  I guess some tag implementation need this,
# particularly for append mode.  bjm 2012-02-24
if [ ! "$IS_EXUBERANT" ]

Instead, the page says that sorting non-Exuberant tags file allows for
faster searching on of the tags file. I've fixed the comment
accordingly too.

Regards,

[1] https://en.wikipedia.org/wiki/Ctags#Etags_2

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_make_ctags.patch
Description: Binary data


Re: vector search support

2023-05-29 Thread Giuseppe Broccolo
Hi Jonathan,

On 5/26/23 3:38 PM, Jonathan S. Katz  wrote:

> On 4/26/23 9:31 AM, Giuseppe Broccolo wrote:
> > We finally opted for ElasticSearch as search engine, considering that it
> > was providing what we needed:
> >
> > * support to store dense vectors
> > * support for kNN searches (last version of ElasticSearch allows this)
>
> I do want to note that we can implement indexing techniques with GiST
> that perform K-NN searches with the "distance" support function[1], so
> adding the fundamental functions to help with this around known vector
> search techniques could add this functionality. We already have this
> today with "cube", but as Nathan mentioned, it's limited to 100 dims.
>

Yes, I was aware of this. It would be enough to define the required support
functions for GiST
indexing (I was a bit in the loop when it was tried to add PG14 presorting
support to GiST indexing
in PostGIS[1]). That would be really helpful indeed. I was just mentioning
it because I know about
other teams using ElasticSearch as a storage of dense vectors only for this.


> > An internal benchmark showed us that we were able to achieve the
> > expected performance, although we are still lacking some points:
> >
> > * clustering of vectors (this has to be done outside the search engine,
> > using DBScan for our use case)
>
>  From your experience, have you found any particular clustering
> algorithms better at driving a good performance/recall tradeoff?
>

Nope, it really depends on the use case: the point of using DBScan above
was mainly because it's a way of clustering without knowing a priori the
number
of clusters the algorithm should be able to retrieve, which is actually a
parameter
needed for Kmeans. Depending on the use case, DBScan might have better
performance in noisy datasets (i.e. entries that really do not belong to a
cluster in
particular). Noise in vectors obtained with embedding models is quite
normal,
especially when the embedding model is not properly tuned/trained.

In our use case, DBScan was more or less the best choice, without biasing
the
expected clusters.

Also PostGIS includes an implementation of DBScan for its geometries[2].


> > * concurrency in updating the ElasticSearch indexes storing the dense
> > vectors
>
> I do think concurrent updates of vector-based indexes is one area
> PostgreSQL can ultimately be pretty good at, whether in core or in an
> extension.


Oh, it would save a lot of overhead in updating indexed vectors! It's
something needed
when embedding models are re-trained, vectors are re-generated and indexes
need to
be updated.

Regards,
Giuseppe.

[1]
https://github.com/postgis/postgis/blob/a4f354398e52ad7ed3564c47773701e4b6b87ae8/doc/release_notes.xml#L284
[2]
https://github.com/postgis/postgis/blob/ce75a0e81aec2e8a9fad2649ff7b230327acb64b/postgis/lwgeom_window.c#L117


Re: BF animal dikkop reported a failure in 035_standby_logical_decoding

2023-05-29 Thread Drouvot, Bertrand

Hi,

On 5/29/23 1:03 PM, Tom Lane wrote:

"Drouvot, Bertrand"  writes:

On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:

Is it possible that the vacuum command didn't remove tuples and then the
conflict was not triggered?



The flush_wal table added by Andres should guarantee that the WAL is flushed, so
the only reason I can think about is indeed that the vacuum did not remove 
tuples (
but I don't get why/how that could be the case).


This test is broken on its face:

   CREATE TABLE conflict_test(x integer, y text);
   DROP TABLE conflict_test;
   VACUUM full pg_class;

There will be something VACUUM can remove only if there were no other
transactions holding back global xmin --- and there's not even a delay
here to give any such transaction a chance to finish.

Background autovacuum is the most likely suspect for breaking that,


Oh right, I did not think autovacuum could start during this test, but yeah 
there
is no reasons it could not.


but I wouldn't be surprised if something in the logical replication
mechanism itself could be running a transaction at the wrong instant.

Some of the other recovery tests set
autovacuum = off
to try to control such problems, but I'm not sure how much of
a solution that really is.


One option I can think of is to:

1) set autovacuum = off (as it looks like the usual suspect).
2) trigger the vacuum in verbose mode (as suggested by Shi-san) and
depending of its output run the "invalidation" test or: re-launch the vacuum, 
re-check the output
and so on.. (n times max). If n is reached, then skip this test.

As this test is currently failing randomly (and it looks like there is more 
success that failures, even without
autovacuum = off), then the test should still validate that the invalidation 
works as expected for the large
majority of runs (and skipping the test should be pretty rare then).

Would that make sense?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Making Vars outer-join aware

2023-05-29 Thread Anton A. Melnikov

Hello and sorry for the big delay in reply!

On 04.05.2023 15:22, Tom Lane wrote:

AFAICS the change you propose would serve only to mask bugs.


Yes, it's a bad decision.


Under what circumstances would you be trying to inject INDEX_VAR
into a nullingrel set?  Only outer-join relids should ever appear there.


The thing is that i don't try to push INDEX_VAR into a nullingrel set at all,
i just try to replace the existing rt_index that equals to INDEX_VAR in Var 
nodes with
the defined positive indexes by using ChangeVarNodes_walker() function call. It 
checks
if the nullingrel contains the existing rt_index and does it need to be updated 
too.
It calls bms_is_member() for that, but the last immediately throws an error
if the value to be checked is negative like INDEX_VAR.

But we are not trying to corrupt the existing nullingrel with this bad index,
so it doesn't seem like an serious error.
And this index certainly cannot be a member of the Bitmapset.

Therefore it also seems better and more logical to me in the case of an index 
that
cannot possibly be a member of the Bitmapset, immediately return false.

Here is a patch like that.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit cc1724759b898efc703867a83d38173e4b2794b5
Author: Anton A. Melnikov 
Date:   Mon May 29 13:52:42 2023 +0300

Return false from bms_is_member() if checked value is negative.

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 7ba3cf635b..3e1db5fda2 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -446,9 +446,9 @@ bms_is_member(int x, const Bitmapset *a)
 	int			wordnum,
 bitnum;
 
-	/* XXX better to just return false for x<0 ? */
+	/* bitmapset member cannot be negative */
 	if (x < 0)
-		elog(ERROR, "negative bitmapset member not allowed");
+		return false;
 	if (a == NULL)
 		return false;
 	wordnum = WORDNUM(x);


Re: BF animal dikkop reported a failure in 035_standby_logical_decoding

2023-05-29 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
>> Is it possible that the vacuum command didn't remove tuples and then the
>> conflict was not triggered? 

> The flush_wal table added by Andres should guarantee that the WAL is flushed, 
> so
> the only reason I can think about is indeed that the vacuum did not remove 
> tuples (
> but I don't get why/how that could be the case).

This test is broken on its face:

  CREATE TABLE conflict_test(x integer, y text);
  DROP TABLE conflict_test;
  VACUUM full pg_class;

There will be something VACUUM can remove only if there were no other
transactions holding back global xmin --- and there's not even a delay
here to give any such transaction a chance to finish.

Background autovacuum is the most likely suspect for breaking that,
but I wouldn't be surprised if something in the logical replication
mechanism itself could be running a transaction at the wrong instant.

Some of the other recovery tests set
autovacuum = off
to try to control such problems, but I'm not sure how much of
a solution that really is.

regards, tom lane




RE: BF animal dikkop reported a failure in 035_standby_logical_decoding

2023-05-29 Thread Yu Shi (Fujitsu)
On Monday, May 29, 2023 5:22 PM Drouvot, Bertrand 
 wrote:
> 
> Hi,
> 
> On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
> > Hi hackers,
> >
> > I saw a buildfarm failure on "dikkop"[1]. It failed in
> > 035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot
> and
> > row_removal_activeslot are not invalidated after vacuum.
> 
> Thanks for sharing!
> 
> >
> > regress_log_035_standby_logical_decoding:
> > ```
> > [12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged 
> > with
> vacuum on pg_class
> > [12:15:05.945](0.003s)
> > [12:15:05.946](0.000s) #   Failed test 'inactiveslot slot invalidation is 
> > logged
> with vacuum on pg_class'
> > #   at t/035_standby_logical_decoding.pl line 238.
> > [12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged 
> > with
> vacuum on pg_class
> > [12:15:05.949](0.001s)
> > [12:15:05.950](0.000s) #   Failed test 'activeslot slot invalidation is 
> > logged with
> vacuum on pg_class'
> > #   at t/035_standby_logical_decoding.pl line 244.
> > [13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
> > # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts 
> > where
> datname = 'testdb'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > [13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
> > [13:38:26.982](0.002s)
> > [13:38:26.982](0.000s) #   Failed test 'confl_active_logicalslot updated'
> > #   at t/035_standby_logical_decoding.pl line 251.
> > Timed out waiting confl_active_logicalslot to be updated at
> t/035_standby_logical_decoding.pl line 251.
> > ```
> >
> > 035_standby_logical_decoding.pl:
> > ```
> > # This should trigger the conflict
> > $node_primary->safe_psql(
> > 'testdb', qq[
> >CREATE TABLE conflict_test(x integer, y text);
> >DROP TABLE conflict_test;
> >VACUUM pg_class;
> >INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
> > ]);
> >
> > $node_primary->wait_for_replay_catchup($node_standby);
> >
> > # Check invalidation in the logfile and in pg_stat_database_conflicts
> > check_for_invalidation('row_removal_', $logstart, 'with vacuum on 
> > pg_class');
> > ```
> >
> > Is it possible that the vacuum command didn't remove tuples and then the
> > conflict was not triggered?
> 
> The flush_wal table added by Andres should guarantee that the WAL is flushed,
> so
> the only reason I can think about is indeed that the vacuum did not remove
> tuples (
> but I don't get why/how that could be the case).
> 
> > It seems we can't confirm this because there is not
> > enough information.
> 
> Right, and looking at its status history most of the time the test is green 
> (making
> it
> even more difficult to diagnose).
> 
> > Maybe "vacuum verbose" can be used to provide more
> > information.
> 
> I can see that dikkop "belongs" to Tomas (adding Tomas to this thread).
> Tomas, do you think it would be possible to run some
> 035_standby_logical_decoding.pl
> manually with "vacuum verbose" in the test mentioned above? (or any other
> way you can think
> about that would help diagnose this random failure?).
> 

Thanks for your reply.

I saw another failure on "drongo" [1], which looks like a similar problem. 

Maybe a temporary patch can be committed to dump the result of "vacuum verbose".
And we can check this when the test fails.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo=2023-05-26%2018%3A05%3A57

Regards,
Shi Yu


Re: BF animal dikkop reported a failure in 035_standby_logical_decoding

2023-05-29 Thread Drouvot, Bertrand

Hi,

On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:

Hi hackers,

I saw a buildfarm failure on "dikkop"[1]. It failed in
035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot and
row_removal_activeslot are not invalidated after vacuum.


Thanks for sharing!



regress_log_035_standby_logical_decoding:
```
[12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged 
with vacuum on pg_class
[12:15:05.945](0.003s)
[12:15:05.946](0.000s) #   Failed test 'inactiveslot slot invalidation is 
logged with vacuum on pg_class'
#   at t/035_standby_logical_decoding.pl line 238.
[12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged with 
vacuum on pg_class
[12:15:05.949](0.001s)
[12:15:05.950](0.000s) #   Failed test 'activeslot slot invalidation is logged 
with vacuum on pg_class'
#   at t/035_standby_logical_decoding.pl line 244.
[13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
# select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where 
datname = 'testdb'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
[13:38:26.982](0.002s)
[13:38:26.982](0.000s) #   Failed test 'confl_active_logicalslot updated'
#   at t/035_standby_logical_decoding.pl line 251.
Timed out waiting confl_active_logicalslot to be updated at 
t/035_standby_logical_decoding.pl line 251.
```

035_standby_logical_decoding.pl:
```
# This should trigger the conflict
$node_primary->safe_psql(
'testdb', qq[
   CREATE TABLE conflict_test(x integer, y text);
   DROP TABLE conflict_test;
   VACUUM pg_class;
   INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
]);

$node_primary->wait_for_replay_catchup($node_standby);

# Check invalidation in the logfile and in pg_stat_database_conflicts
check_for_invalidation('row_removal_', $logstart, 'with vacuum on pg_class');
```

Is it possible that the vacuum command didn't remove tuples and then the
conflict was not triggered? 


The flush_wal table added by Andres should guarantee that the WAL is flushed, so
the only reason I can think about is indeed that the vacuum did not remove 
tuples (
but I don't get why/how that could be the case).


It seems we can't confirm this because there is not
enough information. 


Right, and looking at its status history most of the time the test is green 
(making it
even more difficult to diagnose).


Maybe "vacuum verbose" can be used to provide more
information.


I can see that dikkop "belongs" to Tomas (adding Tomas to this thread).
Tomas, do you think it would be possible to run some 
035_standby_logical_decoding.pl
manually with "vacuum verbose" in the test mentioned above? (or any other way 
you can think
about that would help diagnose this random failure?).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-29 Thread Drouvot, Bertrand

Hi,

On 5/26/23 7:28 PM, Peter Geoghegan wrote:

On Fri, May 26, 2023 at 1:56 AM Alvaro Herrera  wrote:

I suppose you're not thinking of applying this to current master, but
instead just leave it for when pg17 opens, right?  I mean, clearly it
seems far too invasive to put it in after beta1.


I was planning on targeting 16 with this. Although I only posted a
patch recently, I complained about the problems in this area shortly
after the code first went in. It's fairly obvious to me that the
changes made to nbtree went quite a bit further than they needed to.


Thanks Peter for the call out and the follow up on this!

As you already mentioned in this thread, all the changes I've done in
61b313e47e were purely "mechanical" as the main goal was to move forward the
logical decoding on standby thread and..


Admittedly that's partly because I'm an expert on this particular
code.



it was not obvious to me (as I'm not an expert as you are in this area) that
many of those changes were "excessive".


Now, to be fair to Bertrand, it *looks* more complicated than it
really is, in large part due to the obscure case where VACUUM finishes
an interrupted page split (during page deletion), which itself goes on
to cause a page split one level up.


Thanks ;-)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Support logical replication of DDLs

2023-05-29 Thread Yu Shi (Fujitsu)

On Mon, May 22, 2023 1:57 PM shveta malik  wrote:
> 
> Please find the new set of patches for object-tree Removal.  The new
> changes are in patch 0008 only. The new changes address object tree
> removal for below commands.
> 
> create sequence
> alter sequence
> alter object owner to
> alter object set schema
> alter object rename
> 
> In this patch 0008, ddldeparse.c is now object-tree free for all the
> table related commands. Index related commands are yet to be done.
> 

Thanks for updating the patch. Here are some comments.

0001 patch
-
1.
+   colname = get_attname(ownerId, depform->refobjsubid, false);
+   if (colname == NULL)
+   continue;

missing_ok is false when calling get_attname(), so is there any case that
colname is NULL?

2.
+   case AT_SetStatistics:
+   {
+   Assert(IsA(subcmd->def, Integer));
+   if (subcmd->name)
+   tmp_obj = new_objtree_VA("ALTER 
COLUMN %{column}I SET STATISTICS %{statistics}n", 3,
+   
"type", ObjTypeString, "set statistics",
+   
"column", ObjTypeString, subcmd->name,
+   
"statistics", ObjTypeInteger,
+   
intVal((Integer *) subcmd->def));
+   else
+   tmp_obj = new_objtree_VA("ALTER 
COLUMN %{column}n SET STATISTICS %{statistics}n", 3,
+   
"type", ObjTypeString, "set statistics",
+   
"column", ObjTypeInteger, subcmd->num,
+   
"statistics", ObjTypeInteger,
+   
intVal((Integer *) subcmd->def));
+   subcmds = lappend(subcmds, 
new_object_object(tmp_obj));
+   }
+   break;

I think subcmd->name will be NULL only if relation type is index. So should it
be removed because currently only table commands are supported?

0002 patch
-
3.
+   /* Skip adding constraint for inherits 
table sub command */
+   if (!constrOid)
+   continue;

Would it be better to use OidIsValid() here?

0008 patch
-
4.
case AT_AddColumn:
/* XXX need to set the "recurse" bit somewhere? 
*/
Assert(IsA(subcmd->def, ColumnDef));
-   tree = deparse_ColumnDef(rel, dpcontext, false,
-   
 (ColumnDef *) subcmd->def, true, );
 
mark_function_volatile(context, expr);

After this change, `expr` is not assigned a value when mark_function_volatile 
is called.

Some problems I saw :
-
5.
create table p1(f1 int);
create table p1_c1() inherits(p1);
alter table p1 add constraint inh_check_constraint1 check (f1 > 0);
alter table p1_c1 add constraint inh_check_constraint1 check (f1 > 0);

The re-formed command of the last command is "ALTER TABLE  public.p1_c1", which
seems to be wrong.

6.
SET allow_in_place_tablespaces = true;
CREATE TABLESPACE ddl_tblspace LOCATION '';
RESET allow_in_place_tablespaces;
CREATE TABLE tbl_index_tblspe (a int, PRIMARY KEY(a) USING INDEX TABLESPACE 
ddl_tblspace) ;

The re-formed command of the last command seems incorrect:
CREATE  TABLE  public.tbl_index_tblspe (a pg_catalog.int4 STORAGE PLAIN  , 
USING INDEX TABLESPACE ddl_tblspace)

7.
CREATE TABLE part2_with_multiple_storage_params(
id int,
name varchar
) WITH (autovacuum_enabled);

re-formed command: CREATE  TABLE  public.part2_with_multiple_storage_params (id 
pg_catalog.int4 STORAGE PLAIN  , name pg_catalog."varchar" STORAGE EXTENDED 
 COLLATE pg_catalog."default")WITH (vacuum_index_cleanup = 'on', 
autovacuum_vacuum_scale_factor = '0.2', vacuum_truncate = 'true', 
autovacuum_enabled = 'TRUE')

When the option is not specified, re-formed command used uppercase letters. The
reloptions column in pg_class of the original command is 
"{autovacuum_enabled=true}", but that of the re-formed command is
"{autovacuum_enabled=TRUE}". I tried to add this case to