Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-21 Thread Önder Kalacı
Hi,

Thanks for the explanation.

As described in the commit message, we assume that extensions use the
> hook in a similar way to FDWs


I'm not sure if it is fair to assume that extensions use any hook in any
way.

So my question is: does the Citus extension use the hook like this?
> (Sorry, I do not fully understand Onder's explanation.)
>
>
I haven't gone into detail about how Citus uses this hook, but I don't
think we should need to explain it. In general, Citus uses many hooks, and
many other extensions use this specific hook. We haven't seen this kind of
behavior change in a minor version upgrade before.

In general, Citus relies on this hook for collecting information about
joins across
relations/ctes/subqueries. So, its scope is bigger than a single join for
Citus.

The extension assigns special marker(s) to RTE relations, and then checks
whether all relations with these special markers are joined transitively
across subqueries, so that it can decide to push down the whole or some
parts of the (sub)query.

I must admit, I have not yet looked into whether we can fix the problem
within the extension.
Maybe we can, maybe not.

But the bigger issue is that there has usually been a clear line between
extensions and PostgreSQL itself when it comes to hooks across minor
version upgrades. Sadly, this change breaks that line. We wanted to share
our worries here and find out what others think.

> Except that this was only noticed after it was released in a set of minor
> versions, I would say that 6f80a8d9c should just straight up be reverted.


I am not in a position to ask for a commit in PG to be reverted, but I
think doing so would be fair. We kindly ask those who handle this to
consider it.

Thanks,
Onder


Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-16 Thread Önder Kalacı
Hi Etsuro,

Thanks for the response!


> Maybe we could do so by leaving to extensions the decision whether
> they replace joins with pseudoconstant clauses, but I am not sure that
> that is a good idea, because that would require the authors to modify
> and recompile their extensions to fix the issue...


I cannot easily follow this argument. The decision to push down the join
(or not) doesn't seem to be related to calling set_join_pathlist_hook. It
seems like the extension should decide what to do with the hook.

That seems to be the general theme of the hooks that Postgres provides. For
example, an extension is allowed to override even the whole
planner/executor, and there is no condition that would prevent it from
happening. In other words, an extension can easily return wrong results by
taking the wrong actions with the hooks, and that should be the
responsibility of the extension, not Postgres.


> I am not familiar with the Citus extension, but such pseudoconstant
> clauses are handled within the Citus extension?
>
>
As I noted earlier, Citus relies on this hook for collecting information
about all the joins that Postgres knows about; there is nothing specific to
pseudoconstants. Some parts of creating the (distributed) plan rely on the
information gathered from this hook. So, if information about some of the
joins is not passed to the extension, then the decisions that the extension
makes are broken (and as a result the queries are broken).

Thanks,
Onder


Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-15 Thread Önder Kalacı
Hi Etsuro, all

The commit[1] seems to break some queries in Citus[2], which is an
extension that relies on set_join_pathlist_hook.

Although the comment says "/* Finally, give extensions a chance to
manipulate the path list. */", we use it to extract lots of information
about the joins and do the planning based on that information.

Now, for some joins where consider_join_pushdown=false, we cannot get the
information that we used to get, which prevents us from doing distributed
planning for certain queries.

We wonder whether it is possible to allow extensions to access the join
info under all circumstances, as it used to be. Basically, that means
removing the additional check:

diff --git a/src/backend/optimizer/path/joinpath.c
b/src/backend/optimizer/path/joinpath.c
index 03b3185984..080e76cbe9 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -349,8 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	/*
 	 * 6. Finally, give extensions a chance to manipulate the path list.
 	 */
-	if (set_join_pathlist_hook &&
-		consider_join_pushdown)
+	if (set_join_pathlist_hook)
 		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
							   jointype, &extra);
 }



Thanks,
Onder

[1]:
https://github.com/postgres/postgres/commit/b0e390e6d1d68b92e9983840941f8f6d9e083fe0
[2]: https://github.com/citusdata/citus/issues/7119


Etsuro Fujita wrote on Tue, 15 Aug 2023 at 11:05:

> On Tue, Aug 8, 2023 at 6:30 PM Richard Guo  wrote:
> > On Tue, Aug 8, 2023 at 4:40 PM Etsuro Fujita 
> wrote:
> >> I modified the code a bit further to use an if-test to avoid a useless
> >> function call, and added/tweaked comments and docs further.  Attached
> >> is a new version of the patch.  I am planning to commit this, if there
> >> are no objections.
>
> > +1 to the v4 patch.  It looks good to me.
>
> Pushed after some copy-and-paste editing of comments/documents.
>
> Thanks!
>
> Best regards,
> Etsuro Fujita
>
>
>


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-19 Thread Önder Kalacı
Hi Masahiko, Amit, all

I've updated the patch.
>

I think the flow is much nicer now compared to HEAD. I don't have any
comments regarding the accuracy of the code changes; all looks good to me.
Overall, I cannot see any behavioral changes, as you already alluded to.

Maybe a few minor notes regarding the comments:

 /*
> + * And must reference the remote relation column. This is because if it
> + * doesn't, the sequential scan is favorable over index scan in most
> + * cases..
> + */


I think the reader might lose the context here (say, in the future, after
another refactor). So maybe start with:

/* And the leftmost index field must refer to the  ...


Also, now in IsIndexUsableForReplicaIdentityFull() some of the conditions
have comments and some don't. Should we comment on the missing ones as
well? For example:

> /* partial indexes are not supported */
> if (indexInfo->ii_Predicate != NIL)
>
and,

> /* all indexes must have at least one attribute */
> Assert(indexInfo->ii_NumIndexAttrs >= 1);




>
>>
>> BTW, IsIndexOnlyExpression() is not necessary but the current code
>> still works fine. So do we need to backpatch it to PG16? I'm thinking
>> we can apply it to only HEAD.
>
> Either way is fine but I think if we backpatch it then the code
> remains consistent and the backpatching would be easier.
>

Yeah, I also have a slight preference for backporting. It could make the
code easier to maintain in the future in case of further backports, at the
cost of making it slightly harder for you now :)

Thanks,
Onder


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-17 Thread Önder Kalacı
Hi,


> I've attached a draft patch for this idea.


I think it needs a rebase after edca3424342da323499a1998d18a888283e52ac7.

Also, as discussed in [1], I think one improvement we had was to
keep IsIndexUsableForReplicaIdentityFull() in a way that it is easier to
read & better documented in the code. So, it would be nice to stick to that.

Overall, the proposed changes make sense to me as well. Once the patch is
ready, I'm happy to review & test in detail.

Thanks,
Onder


[1]
https://www.postgresql.org/message-id/CAA4eK1Jcyrxt_84wt2%3DQnOcwwJEC2et%2BtCLjAuTXzE6N3FXqQw%40mail.gmail.com


Re: doc: clarify the limitation for logical replication when REPLICA IDENTITY is FULL

2023-07-17 Thread Önder Kalacı
Hi,

>
> > The last line seems repetitive to me. So, I have removed it. Apart
> > from that patch looks good to me. Sergie, Peter, and others, any
> > thoughts?
>
> The v5 patch LGTM.
>
>
Overall looks good to me as well. Please consider the following as an
optional improvement.

My only minor concern here is the use of the term "default operator class".
It is accurate, but as far as I know, not many users can follow it easily.
I think the "pkey/repl full" suggestion gives some hint, but I wonder if we
could add something like the following to the text so that users can
understand better:

 do not have a default operator class for B-tree or Hash.

> + If there is no default operator class, usually the type does not have an
> equality operator.

However, this limitation ..


Thanks,
Onder


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-13 Thread Önder Kalacı
Hi Amit, Hayato, all

> +1 for the readability. I think we can as you suggest or I feel it
> > would be better if we return false as soon as we found any condition
> > is false. The current patch has a mixed style such that for
> > InvalidStrategy, it returns immediately but for others, it does a
> > combined check.
> >
>
> I have followed the later style in the attached.
>

Looks good to me!

I agree with your note that the confusion was mostly around two different
styles (e.g., some checks return early, others wait until the final
return). Now, the code is pretty easy to follow.


> > The other point we should consider in this regard is
> > the below assert check:
> >
> > +#ifdef USE_ASSERT_CHECKING
> > + {
> > + /* Check that the given index access method has amgettuple routine */
> > + IndexAmRoutine *amroutine = GetIndexAmRoutineByAmId(indexInfo->ii_Am,
> > + false);
> > + Assert(amroutine->amgettuple != NULL);
> > + }
> > +#endif
> >
> > Apart from having an assert, we have the following two options (a)
> > check this condition as well and return false if am doesn't support
> > amgettuple (b) report elog(ERROR, ..) in this case.
>

I think with the current state of the patch (i.e., we only support btree
and hash), the Assert looks reasonable.

If, in the future, a hypothetical index type fulfills the "if" checks but
fails on amgettuple, we could change the code to return false so that the
other indexes get a chance to satisfy the condition.


Let me know what you think of the attached.
>
>
Looks good to me. I have also done some testing, which works as
documented/expected.

Thanks,
Onder


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Önder Kalacı
Hi Hayato, all


> > - return is_btree && !is_partial && !is_only_on_expression;
> > + /* Check whether the index is supported or not */
> > + is_suitable_type = ((get_equal_strategy_number_for_am(indexInfo->ii_Am)
> > + != InvalidStrategy));
> > +
> > + is_partial = (indexInfo->ii_Predicate != NIL);
> > + is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> > +
> > + return is_suitable_type && !is_partial && !is_only_on_expression;
> >
>

I don't want to repeat this too much, as it is a minor note. Just
sharing my perspective here.

As discussed in the other email [1], I feel like keeping
IsIndexUsableForReplicaIdentityFull()  function readable is useful
for documentation purposes as well.

So, I'm more inclined to see something like your old code, maybe with
a different variable name.

bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);

to

> bool has_equal_strategy = get_equal_strategy_number_for_am...
>
> return has_equal_strategy && !is_partial && !is_only_on_expression;



> 4a.
> > The code can be written more optimally, because if it is deemed
> > already that 'is_suitable_type' is false, then there is no point to
> > continue to do the other checks and function calls.


Sure, there are maybe a few CPU cycles to be gained, but this code is
executed infrequently, and I don't see much value in optimizing it. I think
keeping it slightly more readable is nicer.

Other than that, I think the code/test looks good. For the
comments/documentation,
I think Amit and Peter have already given quite a bit of useful feedback,
so nothing
much to add from my end.

Thanks,
Onder

[1]:
https://www.postgresql.org/message-id/CACawEhUWH1qAZ8QNeCve737Qe1_ye%3DvTW9P22ePiFssT7%2BHaaQ%40mail.gmail.com


Hayato Kuroda (Fujitsu) wrote on Wed, 12 Jul 2023 at 10:07:

> Dear Amit,
>
> Thanks for checking my patch! The patch can be available at [1].
>
> > > > ==
> > > > .../subscription/t/032_subscribe_use_index.pl
> > > >
> > > > 11.
> > > > AFAIK there is no test to verify that the leftmost index field must
> be
> > > > a column (e.g. cannot be an expression). Yes, there are some tests
> for
> > > > *only* expressions like (expr, expr, expr), but IMO there should be
> > > > another test for an index like (expr, expr, column) which should also
> > > > fail because the column is not the leftmost field.
> > >
> > > I'm not sure they should be fixed in the patch. You have raised these
> points
> > > at the Sawada-san's thread[1] and not sure he has done.
> > > Furthermore, these points are not related with our initial motivation.
> > > Fork, or at least it should be done by another patch.
> > >
> >
> > I think it is reasonable to discuss the existing code-related
> > improvements in a separate thread rather than trying to change this
> > patch.
>
> OK, so I will not touch in this thread.
>
> > I have some other comments about your patch:
> >
> > 1.
> > + * 1) Other indexes do not have a fixed set of strategy numbers.
> > + * In build_replindex_scan_key(), the operator which corresponds to
> "equality
> > + * comparison" is searched by using the strategy number, but it is
> difficult
> > + * for such indexes. For example, GiST index operator classes for
> > + * two-dimensional geometric objects have a comparison "same", but its
> > strategy
> > + * number is different from the gist_int4_ops, which is a GiST index
> operator
> > + * class that implements the B-tree equivalent.
> > + *
> > ...
> > + */
> > +int
> > +get_equal_strategy_number_for_am(Oid am)
> > ...
> >
> > I think this comment is slightly difficult to understand. Can we make
> > it a bit generic by saying something like: "The index access methods
> > other than BTREE and HASH don't have a fixed strategy for equality
> > operation. Note that in other index access methods, the support
> > routines of each operator class interpret the strategy numbers
> > according to the operator class's definition. So, we return
> > InvalidStrategy number for index access methods other than BTREE and
> > HASH."
>
> Sounds better. Fixed with some adjustments.
>
> > 2.
> > + * 2) Moreover, BRIN and GIN indexes do not implement "amgettuple".
> > + * RelationFindReplTupleByIndex() assumes that tuples will be fetched
> one by
> > + * one via index_getnext_slot(), but this currently requires the
> "amgetuple"
> > + * function. To make it use index_getbitmap() must be used, which
> fetches all
> > + * the tuples at once.
> > + */
> > +int
> > +get_equal_strategy_number_for_am(Oid am)
> > {
> > ..
> >
> > I don't think this is a good place for such a comment. We can probably
> > move this atop IsIndexUsableForReplicaIdentityFull(). I think you need
> > to mention two reasons in IsIndexUsableForReplicaIdentityFull() why we
> > support only BTREE and HASH index access methods (a) Refer to comments
> > atop get_equal_strategy_number_for_am(); (b) mention the reason
> > related to tuples_equal() as discussed in email [1]. Then you can say
> > that we also need to ensure 

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Önder Kalacı
Hi Amit, all

Amit Kapila wrote on Wed, 12 Jul 2023 at 13:09:

> On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada 
> wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith 
> wrote:
> > >
> >
> > I don't think we have concluded any action for it. I agree that
> > IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> > index fields actually. I've attached a draft patch. It removes
> > IsIndexOnlyOnExpression() and merges
> > RemoteRelContainsLeftMostColumnOnIdx() to
> > FindUsableIndexForReplicaIdentityFull. One concern is that we no
> > longer do the assertion check with
> > IsIndexUsableForReplicaIdentityFull(). What do you think?
> >
>
> I think this is a valid concern. Can't we move all the checks
> (including the remote attrs check) inside
> IsIndexUsableForReplicaIdentityFull() and then call it from both
> places? Won't we have attrmap information available in the callers of
> FindReplTupleInLocalRel() via ApplyExecutionData?
>
>
>
I think such an approach is slightly better than the proposed changes in
remove_redundant_check.patch.

I think one reason we ended up with IsIndexUsableForReplicaIdentityFull()
is that it is a nice way of documenting the requirements in the code.

However, as you also alluded to in the
thread, RemoteRelContainsLeftMostColumnOnIdx()
breaks this documentation.

I agree that it is nice to have all the logic to be in the same place. I
think remove_redundant_check.patch
does that by inlining IsIndexUsableForReplicaIdentityFull
and RemoteRelContainsLeftMostColumnOnIdx
into FindUsableIndexForReplicaIdentityFull().

As Amit noted, the other way around might be more interesting: we expand
IsIndexUsableForReplicaIdentityFull() such that it also includes
RemoteRelContainsLeftMostColumnOnIdx(). With that, readers of
IsIndexUsableForReplicaIdentityFull() can follow the requirements more
easily.

Though, I'm not sure yet whether we can get all the necessary information
for the Assert via ApplyExecutionData in FindReplTupleInLocalRel. Perhaps
yes.

Thanks,
Onder


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-10 Thread Önder Kalacı
Hi,

I also think so. If this is true, how can we think of supporting
> indexes other than hash like GiST, and SP-GiST as mentioned by you in
> your latest email? As per my understanding if we don't have PK or
> replica identity then after the index scan, we do tuples_equal which
> will fail for GIST or SP-GIST. Am, I missing something?
>

I also don't think we can support anything other than btree, hash and brin,
as the other index access methods lack equality operators.

And, for BRIN, Hayato brought up the amgettuple issue, which is fair. So,
overall, as far as I can see, we can easily add hash indexes, but all
others are either very hard or impossible to add.

I think if someone one day works on supporting primary keys using other
index types, we can use it here :p

Thanks,
Onder


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-06 Thread Önder Kalacı
Hi Hayato,


> BTW, I have doubt that the restriction is not related with your commit.
> In other words, if the table has attributes which the datatype is not for
> operator
> class of Btree, we could not use REPLICA IDENTITY FULL. IIUC it is not
> documented.
> Please see attached script to reproduce that. The final DELETE statement
> cannot be
> replicated at the subscriber on my env.
>
>
Yes, I agree, it is (and was before my patch as well) an undocumented
limitation of REPLICA IDENTITY FULL. And, as far as I can see, my patch
didn't have any impact on the limitation. The unsupported cases are still
unsupported, but now the same error is thrown in a slightly different
place.

I think that is a minor limitation, but maybe it should be listed in [1]?

>
> For the specific notes you raised about strategy numbers / operator
> classes, I need to
> study a bit :) Though, I'll be available to do that early next week.
> >
>
> Thanks! I'm looking forward to see your opinions...
>

For this one, I did some research in the code, but I'm not very comfortable
with the answer. Still, I wanted to share my observations in case they are
useful for the discussion.

First, I checked if the function get_op_btree_interpretation() could be
used here. But, I think that is again btree-only and I couldn't find
anything generic that does something similar.

Then, I tried to come up with a SQL query, actually based on the link [2]
you shared. I think we should always require an "equality" strategy
(strategies like "same", "overlaps", or "contains" sound wrong to me).

And, it seems btree, hash and brin support "equal". So, a query like the
following seems to provide the list of (index type, strategy_number,
data_type) that we might be allowed to use.

SELECT
    am.amname AS index_type,
    amop.amoplefttype::regtype,
    amop.amoprighttype::regtype,
    op.oprname AS operator,
    amop.amopstrategy AS strategy_number
FROM
    pg_amop amop
    JOIN pg_am am ON am.oid = amop.amopmethod
    JOIN pg_operator op ON op.oid = amop.amopopr
WHERE
    (am.amname = 'btree' AND amop.amopstrategy = 3) OR
    (am.amname = 'hash' AND amop.amopstrategy = 1) OR
    (am.amname = 'brin' AND amop.amopstrategy = 3)
ORDER BY
    index_type,
    strategy_number;


What do you think?


[1]
https://www.postgresql.org/docs/current/logical-replication-restrictions.html

[2] https://www.postgresql.org/docs/devel/xindex.html#XINDEX-STRATEGIES


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-06-26 Thread Önder Kalacı
Hi Hayato, all

>
>
> This is a follow-up thread of [1]. The commit allowed subscribers to use
> indexes
> other than PK and REPLICA IDENTITY when REPLICA IDENTITY is FULL on
> publisher,
> but the index must be a B-tree. In this proposal, I aim to extend this
> functionality to allow
> for hash indexes and potentially other types.
>
>
Cool, thanks for taking the time to work on this.


> # Current problem
>
> The current limitation comes from the function build_replindex_scan_key(),
> specifically these lines:
>

When I last dealt with the same issue, I was examining it from a slightly
broader perspective. I think my conclusion was that
RelationFindReplTupleByIndex() is designed for the constraints of UNIQUE
INDEX and Primary Key. Hence the btree limitation.

So, my main point is that it might be useful to check
RelationFindReplTupleByIndex() once more in detail
to see if there is anything else that is specific to btree indexes.

build_replindex_scan_key() is definitely one of the major culprits but see
below as well.

I think we should also be mindful of the tuples_equal() function. When an
index returns more than one tuple, we rely on tuples_equal() to make sure
non-relevant tuples are skipped.

For btree indexes, it was safe to rely on that function, as the columns
indexed using btree always have an equality operator. I think we can safely
assume the same for hash indexes.

However, say we indexed the "point" type using a "gist" index. Then, if we
let this logic kick in, I think tuples_equal() would fail, saying that no
equality operator exists.

One might argue that it is already the case for RelationFindReplTupleSeq(),
or when you have an index but on a different column. But still, it seems
useful to make sure we are aware of this limitation as well.


>
> ## Current difficulties
>
> The challenge with supporting other indexes is that they lack a fixed set
> of strategies,
> making it difficult to choose the correct strategy number based on the
> index
> access method. Even within the same index type, different operator classes
> can
> use different strategy numbers for the same operation.
> E.g. [2] shows that number 6 can be used for the purpose, but other
> operator classes
> added by btree_gist [3] seem to use number 3 for the equality comparison.
>
>
Also, build_replindex_scan_key() seems like too late a place to check this.
I mean, if there is no equality operator, how should the code react? It
effectively becomes RelationFindReplTupleSeq(), so maybe it is better to
follow that route upfront?

In other words, that decision should maybe happen in
IsIndexUsableForReplicaIdentityFull()?

For the specific notes you raised about strategy numbers / operator
classes, I need to
study a bit :) Though, I'll be available to do that early next week.

Thanks,
Onder


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-22 Thread Önder Kalacı
Hi Shi Yu,


>
> Is there any reasons why we drop column here? Dropped column case has been
> tested on table dropped_cols. The generated column problem can be detected
> without dropping columns on my machine.
>

We don't really need to; if you check the first patch, we don't have a DROP
for the generated case. I mostly wanted to make the test a little more
interesting, but it also seems to be a little confusing.

Now attaching v2, where we do not drop the columns. I don't have a strong
preference on which patch to proceed with; I mostly wanted to attach this
version to progress faster (in case you/Amit consider this one better).

Thanks,
Onder


v2-0001-REL_12-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v2-0001-HEAD-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v2-0001-REL_15-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v2-0001-REL_14-REL_13-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread Önder Kalacı
Hi Shi Yu,

>
>
> 1.
>  $node_publisher->safe_psql(
> 'postgres', qq(
> ALTER TABLE dropped_cols DROP COLUMN b_drop;
> +   ALTER TABLE generated_cols DROP COLUMN b_gen;
>  ));
>  $node_subscriber->safe_psql(
> 'postgres', qq(
> ALTER TABLE dropped_cols DROP COLUMN b_drop;
> +   ALTER TABLE generated_cols DROP COLUMN b_gen;
>  ));
>
> I think we want to test generated columns, so we don't need to drop
> columns.
> Otherwise the generated column problem can't be detected.
>
>
Ow, what a mistake. Now changed (and ensured that without the patch
the test fails).



> 2.
> # The bug was that when the REPLICA IDENTITY FULL is used with dropped
> columns,
> # we fail to apply updates and deletes
>
> Maybe we should mention generated columns in comment of the test.
>
> makes sense


> 3.
> I ran pgindent and it modified some lines. Maybe we can improve the patch
> as the following.
>
> @@ -292,8 +292,8 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot
> *slot2,
> att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
>
> /*
> -* Ignore dropped and generated columns as the publisher
> -* doesn't send those
> +* Ignore dropped and generated columns as the publisher
> doesn't send
> +* those
>  */
> if (att->attisdropped || att->attgenerated)
> continue;
>
>  fixed


Attached patches again.


Thanks,
Onder KALACI


v1-0001-REL_12-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_15-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_14-REL_13-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-HEAD-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread Önder Kalacı
Hi Amit, Shi Yu

Now attaching the similar patches for generated columns.

Thanks,
Onder KALACI



Amit Kapila wrote on Tue, 21 Mar 2023 at 09:07:

> On Mon, Mar 20, 2023 at 6:28 PM Amit Kapila 
> wrote:
> >
> > On Mon, Mar 20, 2023 at 2:58 PM Önder Kalacı 
> wrote:
> > >
> > >
> > > I applied all patches for all the versions, and re-run the
> subscription tests,
> > > all looks good to me.
> > >
> >
> > LGTM. I'll push this tomorrow unless there are more comments.
> >
>
> Pushed.
>
> --
> With Regards,
> Amit Kapila.
>


v1-0001-HEAD-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_15-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_14-REL_13-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


v1-0001-REL_12-Ignore-generated-columns-during-apply-of-update-d.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-20 Thread Önder Kalacı
Hi Shi Yu, all

Thanks for updating the patches. It seems you forgot to attach the patches
> of
> dropped columns for HEAD and pg15, I think they are the same as v2.
>
>
Yes, it seems I forgot. And, yes they were the same as v2.


> On HEAD, we can re-use clusters in other test cases, which can save some
> time.
> (see fccaf259f22f4a)
>
>
 Thanks for noting.


> In the patches for pg12 and pg11, I am not sure why not add the test at
> end of
> the file 100_bugs.pl. I think it would be better to be consistent with
> other
> versions.
>

I applied the same patch that I created for HEAD, and then the "patch"
command
created this version. Given that we are creating new patches per version, I
think
you are right, we should put them at the end.



> The attached patches modify these two points. Besides, I made some minor
> changes, ran pgindent and pgperltidy.


oh, I didn't know about pgperltidy. And, thanks a lot for making changes and
making it ready for the committer.


> These are patches for dropped columns,
> because I think this would be submitted first, and we can discuss the fix
> for
> generated columns later.
>
>
Makes sense. Even now we have 5 different patches; let's work on the
generated columns when this one is fixed.

I applied all patches for all the versions, and re-run the subscription
tests,
all looks good to me.


Thanks,
Onder KALACI


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread Önder Kalacı
Hi Shi Yu,

Thanks for the review, really appreciate it!


> I couldn't apply
> v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
> cleanly in v13 and v14. It looks the patch needs some changes in these
> versions.
>
>

> ```
> Checking patch src/backend/executor/execReplication.c...
> Hunk #1 succeeded at 243 (offset -46 lines).
> Hunk #2 succeeded at 263 (offset -46 lines).
> Checking patch src/test/subscription/t/100_bugs.pl...
> error: while searching for:
> $node_publisher->stop('fast');
> $node_subscriber->stop('fast');
>
> done_testing();
>
> error: patch failed: src/test/subscription/t/100_bugs.pl:373
> Applied patch src/backend/executor/execReplication.c cleanly.
> Applying patch src/test/subscription/t/100_bugs.pl with 1 reject...
> Rejected hunk #1.
> ```
>
>
Hmm, interesting, it behaves differently on macOS and Linux. Now attaching
new patches that should apply. Can you please try?


Besides, I tried v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch in v12.
> The
> test failed and here's some information.
>
> ```
> Can't locate object method "new" via package "PostgreSQL::Test::Cluster"
> (perhaps you forgot to load "PostgreSQL::Test::Cluster"?) at t/100_bugs.pl
> line 74.
> # Looks like your test exited with 2 just after 1.
> ```
>
> +my $node_publisher_d_cols =
> PostgreSQL::Test::Cluster->new('node_publisher_d_cols');
>
> It seems this usage is not supported in v12 and we should use
> get_new_node()
> like other test cases.
>
>
Thanks for sharing. Fixed


This time I was able to run all the tests with all the patches applied.

Again, the generated column fix also has some minor differences
per version. So, overall we have 6 patches with very minor
differences :)


Thanks,
Onder


v3-0001-Ignore-dropped-columns-REL_11.patch
Description: Binary data


v3-0001-Ignore-dropped-columns-REL_12.patch
Description: Binary data


v3-0001-Ignore-dropped-columns-REL_13-REL_14.patch
Description: Binary data


v3-0001-Ignore-generated-columns-HEAD-REL_15.patch
Description: Binary data


v3-0001-Ignore-generated-columns-REL_14-REL_13.patch
Description: Binary data


v3-0001-Ignore-generated-columns-REL_12.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread Önder Kalacı
Hi Amit, all


> You can first submit the fix for dropped columns with patches till
> v10. Once that is committed, then you can send the patches for
> generated columns.
>
>
Alright, attaching 2 patches for dropped columns; the names of the files
show which versions the patch can be applied to:
v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch

And, then on top of that, you can apply the patch for generated columns on
all applicable
versions (HEAD, 15, 14, 13 and 12). It applies cleanly.  The name of the
file
is: v2-0001-Ignore-generated-columns.patch


But unfortunately I couldn't test the patch with PG 12 and below. I'm
getting some unrelated compile errors, and Postgres CI is not available on
these versions. I'll try to fix that, but I thought it would still be good
to share the patches, as you might already have the environment to run the
tests.


Don't worry about v10 --- that's out of support and shouldn't
> get patched for this.


Given this information, I skipped the v10 patch.

Thanks,
Onder KALACI


v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch
Description: Binary data


v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
Description: Binary data


v2-0001-Ignore-generated-columns.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-16 Thread Önder Kalacı
Hi Amit, Shi Yu,

> Generated column is introduced in PG12, and I reproduced generated column
problem
in PG12~PG15.
> For dropped column problem, I reproduced it in PG10~PG15. (Logical
replication
was introduced in PG10)

So, I'm planning to split the changes into two commits. The first one fixes
the dropped columns issue, and the second one adds the generated columns check/test.

Is that the right approach for such a case?

>  and backpatch the fix for dropped column to PG10.

Still, even the first commit fails to apply cleanly to PG12 (and below).

What is the procedure here? Should I be creating multiple patches per
version? Or does the committer prefer to handle the conflicts? Depending on
your reply, I can work on the follow-up.

I'm still attaching the dropped column patch for reference.


Thanks,
Onder


On Thu, 16 Mar 2023 at 13:38, shiy.f...@fujitsu.com wrote:

> On Thu, Mar 16, 2023 5:23 PM Amit Kapila  wrote:
> >
> > On Mon, Mar 13, 2023 at 6:26 PM Önder Kalacı 
> > wrote:
> > >
> > > Attaching v2
> > >
> >
> > Can we change the comment to: "Ignore dropped and generated columns as
> > the publisher doesn't send those."? After your change, att =
> > TupleDescAttr(slot1->tts_tupleDescriptor, attrnum); is done twice in
> > the same function.
> >
> > In test cases, let's change the comment to: "The bug was that when the
> > REPLICA IDENTITY FULL is used with dropped or generated columns, we
> > fail to apply updates and deletes.". Also, I think we don't need to
> > provide the email link as anyway commit message will have a link to
> > the discussion.
> >
> > Did you check this in the back branches?
> >
>
> I tried to reproduce this bug in backbranch.
>
> Generated column is introduced in PG12, and I reproduced generated column
> problem
> in PG12~PG15.
>
> For dropped column problem, I reproduced it in PG10~PG15. (Logical
> replication
> was introduced in PG10)
>
> So I think we should backpatch the fix for generated column to PG12, and
> backpatch the fix for dropped column to PG10.
>
> Regards,
> Shi Yu
>
From d25b2feac3b26646034bf3ace6ab93056c3c29af Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Thu, 16 Mar 2023 18:06:54 +0300
Subject: [PATCH v1 1/2] Ignore dropped columns when REPLICA IDENTITY FULL

Dropped columns are filled with NULL values on slot_store_data()
but not on table_scan_getnextslot(). With this commit, we skip
such columns while checking tuple equality.
---
 src/backend/executor/execReplication.c |  9 -
 src/test/subscription/t/100_bugs.pl| 51 ++
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index fa8628e3e1..7c8b0d2667 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -289,6 +289,13 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
 		Form_pg_attribute att;
 		TypeCacheEntry *typentry;
 
+		/*
+		 * Ignore dropped columns as the publisher doesn't send those
+		 */
+		att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
+		if (att->attisdropped)
+			continue;
+
 		/*
 		 * If one value is NULL and other is not, then they are certainly not
 		 * equal
@@ -302,8 +309,6 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
 		if (slot1->tts_isnull[attrnum] || slot2->tts_isnull[attrnum])
 			continue;
 
-		att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
-
 		typentry = eq[attrnum];
 		if (typentry == NULL)
 		{
diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 036576acab..255746094c 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -373,4 +373,55 @@ is( $result, qq(1
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
+# The bug was that when the REPLICA IDENTITY FULL is used with dropped columns,
+# we fail to apply updates and deletes
+my $node_publisher_d_cols = PostgreSQL::Test::Cluster->new('node_publisher_d_cols');
+$node_publisher_d_cols->init(allows_streaming => 'logical');
+$node_publisher_d_cols->start;
+
+my $node_subscriber_d_cols = PostgreSQL::Test::Cluster->new('node_subscriber_d_cols');
+$node_subscriber_d_cols->init(allows_streaming => 'logical');
+$node_subscriber_d_cols->start;
+
+$node_publisher_d_cols->safe_psql(
+	'postgres', qq(
+	CREATE TABLE dropped_cols (a int, b_drop int, c int);
+	ALTER TABLE dropped_cols REPLICA IDENTITY FULL;
+	CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols;
+	-- some initial data
+	INSERT INTO dropped_cols VALUES (1, 1, 1);
+));
+
+$node_subscriber_d_cols->safe_psql(
+	'postgres', qq(
+	 CREATE TABLE dropped_cols (a int, b_

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-14 Thread Önder Kalacı
>
>
>
> Thanks for the updated patch.
> Few minor comments:
> 1) The extra line break after IsIndexOnlyOnExpression function can be
> removed:
>

removed


>
>
> 2) Generally we don't terminate with "." for single line comments
> +
> + /*
> + * Simple case, we already have a primary key or a replica identity index.
> + */
> + idxoid = GetRelationIdentityOrPK(localrel);
> + if (OidIsValid(idxoid))
> + return idxoid;
>
> Well, there are several "."  for single line comments even in the same
file such as:

/* 0 means it's a dropped attribute.  See comments atop AttrMap. */

I really don't have any preference on this, but I suspect that if I change
it, I'll get another review suggesting I conform to the existing style in
the same file. So, I'm skipping this suggestion for now, unless you have
objections.


v51_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-14 Thread Önder Kalacı
On Tue, 14 Mar 2023 at 11:59, Amit Kapila wrote:

> On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı 
> wrote:
> >>
> >> 2.
> >> +# make sure that the subscriber has the correct data after the update
> UPDATE
> >>
> >> "update UPDATE" seems to be a typo.
> >>
> >
> > thanks, fixed
> >
> >>
> >> 3.
> >> +# now, drop the index with the expression, and re-create index on
> column lastname
> >>
> >> The comment says "re-create index on column lastname" but it seems we
> didn't do
> >> that, should it be modified to something like:
> >> # now, drop the index with the expression, we will use sequential scan
> >>
> >>
> >
> > Thanks, fixed
> >
> > I'll add the changes to v49 in the next e-mail.
> >
>
> It seems you forgot to address these last two comments in the latest
> version.
>
>
Oops, sorry. I think when I merged your test changes, I somehow overrode
these changes in my local copy.


v50_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-14 Thread Önder Kalacı
Hi Amit, all


On Tue, 14 Mar 2023 at 09:50, Amit Kapila wrote:

> On Mon, Mar 13, 2023 at 7:46 PM Önder Kalacı 
> wrote:
> >
> > Attaching v47.
> >
>
> I have made the following changes in the attached patch (a) removed
> the function IsIdxSafeToSkipDuplicates() and used the check directly
> in the caller


Should be fine; we can re-introduce this function when I work on the
non-pkey/RI unique index improvement as a follow-up to this.


> ; (b) changed a few comments in the patch;


Thanks, looks good.


> (c) the test
> file was inconsistently using ';' while executing statement with
> safe_psql, changed it to remove ';'.
>
>
Alright, thanks.

And as a self-review: when I write regression tests next time, I'll spend a
lot more time on style/consistency/comments, etc. During this review, the
reviewers had to spend many cycles on that area, which is something I
should have done better.

Attaching v49 with some minor changes Shi Yu noted earlier.

Thanks,
Onder KALACI


v49-0001-Allow-the-use-of-indexes-other-than-PK-and-REPLI.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-14 Thread Önder Kalacı
Hi Shi Yu,


> in RemoteRelContainsLeftMostColumnOnIdx():
>
> +   if (indexInfo->ii_NumIndexAttrs < 1)
> +   return false;
>
> Did you see any cases that the condition is true? I think there is at
> least one
> column in the index. If so, we can use an Assert().
>

Actually, it was mostly to guard against any edge cases. I thought that,
similar to tables, we could have zero-column indexes, but it turns out that
is not possible. Also, index_create seems to check that, so changing it to
an Assert makes sense:

>
>  /*
> * check parameters
> */
> if (indexInfo->ii_NumIndexAttrs < 1)
> elog(ERROR, "must index at least one column");




>
> +   if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
> +   return false;
>
> Similarly, I think `attrmap->maplen` is the number of columns and it is
> always
> greater than keycol. If you agree, we can check it with an Assert().


At this point, I'm really hesitant to make any assumptions. Logical
replication is pretty flexible, and who knows, maybe dropped columns will
be treated differently at some point and this assumption will change?

I feel more comfortable keeping this as-is. We call this function very
infrequently anyway.


> Besides, It
> seems we don't need AttrNumberGetAttrOffset().
>
>
Why not? We are accessing the AttrNumberGetAttrOffset(keycol) element
of the attnums array.


> 2.
> +# make sure that the subscriber has the correct data after the update
> UPDATE
>
> "update UPDATE" seems to be a typo.
>
>
thanks, fixed


> 3.
> +# now, drop the index with the expression, and re-create index on column
> lastname
>
> The comment says "re-create index on column lastname" but it seems we
> didn't do
> that, should it be modified to something like:
> # now, drop the index with the expression, we will use sequential scan
>
>
>
Thanks, fixed

I'll add the changes to v49 in the next e-mail.

Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-13 Thread Önder Kalacı
Hi Amit, all

> >
> > For that test, my goal was to ensure/show that the invalidation callback
> > is triggered after `DROP / CREATE INDEX` commands.
> >
>
> Fair point. I suggest in that case just keep one of the tests for Drop
> Index such that after that it will pick up a sequence scan. However,
> just do the poll for the number of index scans stat once. I think that
> will cover the case you are worried about without having a noticeable
> impact on test timing.
>
>
So, after dropping the index, it is not possible to poll for the idxscan
stat.

But I think, after the index drop, it is enough to check whether the
modification is applied properly on the target (wait_for_catchup +
safe_psql). If the worker were to cache the indexOid, the update/delete
would fail anyway.

Attaching v47.


Thanks,
Onder KALACI


v47_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-13 Thread Önder Kalacı
Hi Shi Yu,


> 1.
> @@ -243,6 +243,17 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot
> *slot2,
> Form_pg_attribute att;
> TypeCacheEntry *typentry;
>
> +
> +   Form_pg_attribute attr =
> TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
> +
>
> I think we can use "att" instead of a new variable. They have the same
> value.
>

ah, of course :)


>
> 2.
> +# The bug was that when when the REPLICA IDENTITY FULL is used with
> dropped
>
> There is an extra "when".
>
>
Fixed, thanks


Attaching v2
From caae511206ee904db54d0a9300ecef04c86aadb6 Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Sat, 11 Mar 2023 12:09:31 +0300
Subject: [PATCH v2] Skip dropped and generated columns when REPLICA IDENTITY
 FULL

Dropped columns and generated columns are filled with NULL values on
slot_store_data() but not on table_scan_getnextslot(). With this commit,
we skip such columns while checking tuple equality.
---
 src/backend/executor/execReplication.c | 10 
 src/test/subscription/t/100_bugs.pl| 67 ++
 2 files changed, 77 insertions(+)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 4f5083a598..9d447b3c00 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -243,6 +243,16 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
 		Form_pg_attribute att;
 		TypeCacheEntry *typentry;
 
+		att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
+
+		/*
+		 * Dropped columns and generated columns are filled with NULL values on
+		 * slot_store_data() but not on table_scan_getnextslot, so ignore
+		 * for the comparison.
+		 */
+		if (att->attisdropped || att->attgenerated)
+			continue;
+
 		/*
 		 * If one value is NULL and other is not, then they are certainly not
 		 * equal
diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 036576acab..5f902d26b6 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -373,4 +373,71 @@ is( $result, qq(1
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
+# https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com
+
+# The bug was that when the REPLICA IDENTITY FULL is used with dropped
+# or generated columns, the equality checks for the tuples were trying to
+# compare NULL Datum values (coming from slot_store_data()) with the
+# non-visible actual Datum values (coming from table_scan_getnextslot()).
+my $node_publisher_d_cols = PostgreSQL::Test::Cluster->new('node_publisher_d_cols');
+$node_publisher_d_cols->init(allows_streaming => 'logical');
+$node_publisher_d_cols->start;
+
+my $node_subscriber_d_cols = PostgreSQL::Test::Cluster->new('node_subscriber_d_cols');
+$node_subscriber_d_cols->init(allows_streaming => 'logical');
+$node_subscriber_d_cols->start;
+
+$node_publisher_d_cols->safe_psql(
+	'postgres', qq(
+	CREATE TABLE dropped_cols (a int, b_drop int, c int);
+	ALTER TABLE dropped_cols REPLICA IDENTITY FULL;
+
+	CREATE TABLE generated_cols (a int, b_gen int GENERATED ALWAYS AS (5 * a) STORED, c int);
+	ALTER TABLE generated_cols REPLICA IDENTITY FULL;
+
+	CREATE PUBLICATION pub_dropped_cols FOR TABLE dropped_cols, generated_cols;
+
+	-- some initial data
+	INSERT INTO dropped_cols VALUES (1, 1, 1);
+	INSERT INTO generated_cols (a,c) VALUES (1, 1);
+));
+
+$node_subscriber_d_cols->safe_psql(
+	'postgres', qq(
+	 CREATE TABLE dropped_cols (a int, b_drop int, c int);
+	 CREATE TABLE generated_cols (a int, b_gen int GENERATED ALWAYS AS (5 * a) STORED, c int);
+
+));
+
+my $publisher_connstr_d_cols = $node_publisher_d_cols->connstr . ' dbname=postgres';
+$node_subscriber_d_cols->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub_dropped_cols CONNECTION '$publisher_connstr_d_cols' PUBLICATION pub_dropped_cols"
+);
+$node_subscriber_d_cols->wait_for_subscription_sync;
+
+$node_publisher_d_cols->safe_psql(
+	'postgres', qq(
+		ALTER TABLE dropped_cols DROP COLUMN b_drop;
+));
+$node_subscriber_d_cols->safe_psql(
+	'postgres', qq(
+		ALTER TABLE dropped_cols DROP COLUMN b_drop;
+));
+
+$node_publisher_d_cols->safe_psql(
+	'postgres', qq(
+		UPDATE dropped_cols SET a = 100;
+		UPDATE generated_cols SET a = 100;
+));
+$node_publisher_d_cols->wait_for_catchup('sub_dropped_cols');
+
+is($node_subscriber_d_cols->safe_psql('postgres', "SELECT count(*) FROM dropped_cols WHERE a = 100"),
+	qq(1), 'replication with RI FULL and dropped columns');
+
+is($node_subscriber_d_cols->safe_psql('postgres', "SELECT count(*) FROM generated_cols WHERE a = 100"),
+	qq(1), 'replication with RI FULL and generated columns');
+
+$node_publisher_d_cols->stop('fast');
+$node_subscriber_d_cols->stop('fast');
+
 done_testing();
-- 
2.34.1



Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-13 Thread Önder Kalacı
Hi Amit, Peter, all


> > If the reason for the stats polling was only to know if some index is
> > chosen or not, I was wondering if you can just convey the same
> > information to the TAP test via some conveniently placed (DEBUG?)
> > logging.
> >
>
> I had thought about it but didn't convince myself that it would be a
> better approach because it would LOG a lot of messages for bulk
> updates/deletes.


I'm also hesitant to add any log messages for testing purposes, especially
something like this one, where a single UPDATE on the source table could
lead to an unbounded number of log messages.


>
> 1.
> +# subscriber gets the missing table information
> +$node_subscriber->safe_psql('postgres',
> + "ALTER SUBSCRIPTION tap_sub_rep_full REFRESH PUBLICATION");
>
> This and the follow-on test was not required after we have removed
> Dropped columns test.
>
>
Right, I kept it with the idea that we might get the dropped column changes
in earlier, so that I could rebase and add the dropped column tests.

But, sure, we can add that later to other tests.


> 2. Reduce the number of updates/deletes in the first test to two rows.
>

We don't have any particular reason to have more tuples. Given the
time constraints, I don't have any objections to changing this.


>
> 3. Removed the cases for dropping the index. This ensures that after
> dropping the index on the table we switch to either an index scan (if
> a new index is created) or to a sequence scan. It doesn't seem like a
> very interesting case to me.
>

For that test, my goal was to ensure/show that the invalidation callback
is triggered after `DROP / CREATE INDEX` commands.

Can we always assume that this would never change? Because if this behavior
ever changes, users would be stuck with the wrong/old index until VACUUM
happens.


>
> Apart from the above, I have removed the explicit setting of
> 'wal_retrieve_retry_interval = 1ms' as the same is not done for any
> other subscription tests. I know setting wal_retrieve_retry_interval
> avoids the launcher sometimes taking more time to launch apply worker
> but it is better to be consistent


Hmm, I cannot remember why I added that. It was probably to make
poll_query_until/wait_for_catchup happen faster.

But, running the test with and without this setting, I cannot observe any
noticeable difference. So, it is probably fine to remove.


> . See the changes in
> changes_amit_1.patch, if you agree with the same then please include
> them in the next version.
>

Included all, but I'm not very sure about removing (3). If you think we
have coverage for that in other cases, I'm fine with it.


>
> After doing the above, the test time on my machine is closer to what
> other tests take which is ~5s.
>
> Yes, same for me.

Thanks, attaching v46
From ad08f37ac79d5b68f0335cb785820db50c55b85a Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Wed, 22 Feb 2023 14:12:56 +0300
Subject: [PATCH v46] Allow the use of indexes other than PK and REPLICA
 IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml |   9 +-
 src/backend/executor/execReplication.c| 111 +++--
 src/backend/replication/logical/relation.c| 227 -
 src/backend/replication/logical/worker.c  |  56 +--
 src/include/replication/logicalrelation.h |   5 +
 src/test/subscription/meson.build |   1 +
 .../subscription/t/032_subscribe_use_index.pl | 435 ++
 7 files changed, 775 insertions(+), 69 deletions(-)
 create mode 100644 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-13 Thread Önder Kalacı
Hi Hou zj,  Shi-san, all


> In this function, it used the local column number(keycol) to match the
> remote
> column number(attkeys), I think it will cause problem if the column order
> between pub/sub doesn't match. Like:
>
> ---
> - pub
> CREATE TABLE test_replica_id_full (x int, y int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> - sub
> CREATE TABLE test_replica_id_full (z int, y int, x int);
> CREATE unique INDEX idx ON test_replica_id_full(z);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
> ---
>
> I think we need to use the attrmap->attnums to convert the column number
> before
> comparing. Just for reference, attach a diff(0001) which I noted down when
> trying to
> fix the problem.
>

I'm always afraid of these types of last-minute additions to the patch, and
here we have this issue in one of the latest additions :(

Thanks for reporting the problem and also providing guidance on the fix.
After reading the code around attrMap and debugging this case further, I
think your suggestion makes sense.

I only made some small changes, and included them in the patch.


> Besides, I also look at the "WIP: Optimize for non-pkey / non-RI unique
> indexes" patch, I think it also had a similar problem about the column
> matching


Right, I'll incorporate this fix to that one as well.


> And another thing I think we can improve in this WIP patch is that
> we can cache the result of IsIdxSafeToSkipDuplicates() instead of doing it
> for each UPDATE, because the cost of this function becomes bigger after
> applying this patch


Yes, it makes sense.


>
> Thanks for Shi-san for helping to finish these fixes.
>
> Thank you both!


Onder Kalaci
From 115999a134635ff6bef4caab434e6ece5d77c1b8 Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Wed, 22 Feb 2023 14:12:56 +0300
Subject: [PATCH v45] Allow the use of indexes other than PK and REPLICA
 IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml |   9 +-
 src/backend/executor/execReplication.c| 111 ++--
 src/backend/replication/logical/relation.c| 227 +++-
 src/backend/replication/logical/worker.c  |  56 +-
 src/include/replication/logicalrelation.h |   5 +
 src/test/subscription/meson.build |   1 +
 .../subscription/t/032_subscribe_use_index.pl | 514 ++
 7 files changed, 854 insertions(+), 69 deletions(-)
 create mode 100644 src/test/subscription/t/032_subscribe_use_index.pl

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 1bd5660c87..6b0e300adc 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -132,7 +132,14 @@
certain additional requirements) can also be set to be the replica
identity.  If the table does not have any suitable key, then it can be set
to replica identity full, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
+   the key.  When replica identity full is specified,
+   indexes can be used on the subscriber side for searching the rows.  Candidate
+   indexes must be btree, non-partial, and have at least one column reference
+   (i.e. cannot consist of only expressions).  These restrictions
+   on the non-unique index properties adhere to some of the restrictions that
+   are enforced for 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-13 Thread Önder Kalacı
Hi Shi Yu,


> > >
> > >
> > > Reading [1], I think I can follow what you suggest. So, basically,
> > > if the leftmost column is not filtered, we have the following:
> > >
> > >>  but the entire index would have to be scanned, so in most cases the
> planner
> > would prefer a sequential table scan over using the index.
> > >
> > >
> > > So, in our case, we could follow a similar approach. If the leftmost
> column of
> > the index
> > > is not sent over the wire from the pub, we can prefer the sequential
> scan.
> > >
> > > Is my understanding of your suggestion accurate?
> > >
> >
> > Yes. I request an opinion from Shi-San who has reported the problem.
> >
>
> I also agree with this.
> And I think we can mention this in the comments if we do so.
>
>
Already commented on FindUsableIndexForReplicaIdentityFull() on v44.


Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-12 Thread Önder Kalacı
Hi Amit, all


> >> I think we can add such a test (which relies on existing buggy
> >> behavior) later after fixing the existing bug. For now, it would be
> >> better to remove that test and add it after we fix dropped columns
> >> issue in HEAD.
> >
> >
> > Alright, when I push the next version (hopefully tomorrow), I'll follow
> this suggestion.
> >
>
> Okay, thanks. See, if you can also include your changes in the patch
> wip_for_optimize_index_column_match (after discussed modification).
> Few other minor comments:
>

Sure, done. Please check RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that in SOME NULL VALUES AND MISSING
COLUMN. Previously we'd check whether test_replica_id_full_idy is used.
Now, we don't, because it is not used anymore :) I initially used
poll_query_until with idx_scan=0, but that also seems confusing to read in
the test. It looks like it could be prone to race conditions, as
poll_query_until with idx_scan=0 does not guarantee anything.


>
> 1.
> +   are enforced for primary keys.  Internally, we follow a similar
> approach for
> +   supporting index scans within logical replication scope.  If there are
> no
>
> I think we can remove the above line: "Internally, we follow a similar
> approach for supporting index scans within logical replication scope."
> This didn't seem useful for users.
>
>
removed


> 2.
> diff --git a/src/backend/executor/execReplication.c
> b/src/backend/executor/execReplication.c
> index bc6409f695..646e608eb7 100644
> --- a/src/backend/executor/execReplication.c
> +++ b/src/backend/executor/execReplication.c
> @@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
> rel, Relation idxrel,
> if (!AttributeNumberIsValid(table_attno))
> {
> /*
> -* XXX: For a non-primary/unique index with an
> additional
> -* expression, we do not have to continue at
> this point. However,
> -* the below code assumes the index scan is
> only done for simple
> -* column references. If we can relax the
> assumption in the below
> -* code-block, we can also remove the continue.
> +* XXX: Currently, we don't support
> expressions in the scan key,
> +* see code below.
>  */
>
>
> I have tried to simplify the above comment. See, if that makes sense to
> you.
>

Makes sense


>
> 3.
> /*
> + * We only need to allocate once. This is allocated within per
> + * tuple context -- ApplyMessageContext -- hence no need to
> + * explicitly pfree().
> + */
>
> We normally don't write why we don't need to explicitly pfree. It is
> good during the review but not sure if it is a good idea to keep it in
> the final code.
>
>
Sounds good, applied

4. I have modified the proposed commit message as follows, see if that
> makes sense to you, and let me know if I missed anything especially
> the review/author credit.
>
> Allow the use of indexes other than PK and REPLICA IDENTITY on the
> subscriber.
>
> Using REPLICA IDENTITY FULL on the publisher can lead to a full table
> scan per tuple change on the subscription when REPLICA IDENTITY or PK
> index is not available. This makes REPLICA IDENTITY FULL impractical
> to use apart from some small number of use cases.
>
> This patch allows using indexes other than PRIMARY KEY or REPLICA
> IDENTITY on the subscriber during apply of update/delete. The index
> that can be used must be a btree index, not a partial index, and it
> must have at least one column reference (i.e. cannot consist of only
> expressions). We can uplift these restrictions in the future. There is
> no smart mechanism to pick the index. If there is more than one index
> that
> satisfies these requirements, we just pick the first one. We discussed
> using some of the optimizer's low-level APIs for this but ruled it out
> as that can be a maintenance burden in the long run.
>
> This patch improves the performance in the vast majority of cases and
> the improvement is proportional to the amount of data in the table.
> However, there could be some regression in a small number of cases
> where the indexes have a lot of duplicate and dead rows. It was
> discussed that those are mostly impractical cases but we can provide a
> table or subscription level option to disable this feature if
> required.
>
> Author: Onder Kalaci
> Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
> Hayato, Amit Kapila
> Discussion:
> https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqoniigz7g73hfys7+ng...@mail.gmail.com
>
>
I also see two mails/reviews from Wang wei, but I'm not sure what qualifies
as a "reviewer" for this group. Should we add that name as well? I think you
can guide us on this.

Other than that, I only fixed an extra newline between "that" and
"satisfies". Overall, it looks pretty good!


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-12 Thread Önder Kalacı
Hi Amit, all

>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
> +# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY
>
> No need to use Delete test separate for this.
>

Yeah, there is really no difference between update/delete for this patch, so
it makes sense. I initially added it for completeness of coverage, but as it
adds perf overhead to the tests, I agree that we could drop some of those.


>
> 2.
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
> +
> +# check if the index is used even when the index has NULL values
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates test_replica_id_full table";
>
> Here, I think only one update is sufficient.
>

Done. I guess you requested this change so that we wait for idx_scan=1
rather than idx_scan=2, which could help.


> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE INDEX people_last_names ON people(lastname)");
> +
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'people_last_names';}
> +) or die "Timed out while waiting for creating index people_last_names";
>
> I don't think we need this poll.
>

True, not sure why I had this. None of the other tests has it anyway.


> 4.
> +# update 2 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_1';");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_3' AND lastname = 'last_name_3';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates two rows via index scan with index on
> expressions and columns";
> +
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM people WHERE firstname = 'no-name';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full deletes two rows via index scan with index on
> expressions and columns";
>
> I think having one update or delete should be sufficient.
>

So, I dropped the second update but kept one update and one delete. The
latter deletes the tuple updated by the former, which seems like an
interesting test to keep.

Still, I dropped one of the extra poll_query_until() calls, which is probably
good enough for this one? Let me know if you think otherwise.


>
> 5.
> +# update rows, moving them to other partitions
> +$node_publisher->safe_psql('postgres',
> + "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for updates on partitioned table with
> index";
> +
> +# delete rows from different partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates partitioned table";
> +
>
> Can we combine these two polls?
>

Looking at it closely, the first poll seems unnecessary anyway. We can simply
check idx_scan at the end of the test; I don't see value in checking earlier.


>
> 6.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS,
> ALSO
> +# DROPS COLUMN
>
> In this test, let's try to update/delete 2-3 rows instead of 20. And
> after drop columns, let's keep just one of the update or delete.
>

Changed to 3 rows.


>
> 7. Apart from the above, I think it is better to use
> wait_for_catchup() consistently before trying to verify the data on
> the subscriber. We always use it in other tests. I guess here you are
> relying on the poll for index scans to ensure that data is replicated
> but I feel it may still be better to use wait_for_catchup().
>

Yes, that 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-11 Thread Önder Kalacı
Hi Amit, all


> > This triggers tuples_equal to fail. To fix that, I improved the
> tuples_equal
> > such that it skips the dropped columns.
> >
>
>  By any chance, have you tried with generated columns?


Yes, it shows the same behavior.


> See
> logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
> send anything for dropped columns nor for generated columns.

Similarly, on receiving side, in logicalrep_rel_open() and
> slot_store_data(), we seem to be using NULL for such columns.
>
>
Thanks for the explanation, it helps a lot.


>
> Yes, it would be better to report and discuss this in a separate thread,
>

Done via [1]

>
> > Attached as v40_0001 on the patch.
> >
> > Note that I need to have that commit as 0001 so that 0002 patch
> > passes the tests.
> >
>
> I think we can add such a test (which relies on existing buggy
> behavior) later after fixing the existing bug. For now, it would be
> better to remove that test and add it after we fix dropped columns
> issue in HEAD.
>

Alright, when I push the next version (hopefully tomorrow), I'll follow
this suggestion.

Thanks,
Onder KALACI

[1]
https://www.postgresql.org/message-id/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3%2Bv8q32AWYsdpGg%40mail.gmail.com


Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-11 Thread Önder Kalacı
Hi all,

(cc'ed Amit as he has the context)

While working on [1], I realized that on HEAD there is a problem with the
$subject.  Here is the relevant discussion on the thread [2]. Quoting my
own notes on that thread below;

I realized that the dropped columns also get into the tuples_equal()
> function. And,
> the remote sends NULL to for the dropped columns(i.e., remoteslot), but
> index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
> columns on the outslot. So, the dropped columns are not NULL in the outslot


Amit also suggested checking generated columns, which indeed has the same
problem.

Here are the steps to repro the problem with dropped columns:

- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM
generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───┐
│ count │
├───┤
│ 0 │
└───┘
(1 row)

-- drop columns on the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───┐
│ count │
├───┤
│ 0 │
└───┘
(1 row)


-- target, OOPS: wrong result
SELECT count(*) FROM test WHERE x = 1;
┌───┐
│ count │
├───┤
│ 1 │
└───┘
(1 row)



Attaching a patch that could possibly solve the problem.

Thanks,
Onder KALACI


[1]
https://www.postgresql.org/message-id/flat/CACawEhUN%3D%2BvjY0%2B4q416-rAYx6pw-nZMHQYsJZCftf9MjoPN3w%40mail.gmail.com#2f7fa76f9e4496e3b52a9be6736e5b43
[2]
https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com


v1-0001-Skip-dropped-and-generated-columns-when-REPLICA-I.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Amit, all


> I'll look for more opportunities and reply to the thread. I wanted to send
> this mail so that you can have a look at (1) earlier.
>
>
> I merged SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
into SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS.

Also, merged SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA and
 A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

So, we have 6 test cases left. I'm starting to feel that merging further
would hurt readability. Do you have any other easy test-case merge
suggestions?

I think one option could be to drop some cases altogether, but I'm not sure
we'd want that.

As a semi-related question: are you aware of any setting that would make
pg_stat_all_indexes reflect changes sooner? It is hard to pin down the
bottleneck in the tests, but I suspect the several poll_query_until() calls
on pg_stat_all_indexes might be the reason.

Attaching v43.


v43_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v43_0001_Skip-dropped_columns_for_tuples_equal.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Amit, all


> wip_for_optimize_index_column_match
> +static bool
> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
> + LogicalRepRelation  *remoterel)
> +{
> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
> + {
>
> Wouldn't it be better to just check if the first column is not part of
> the remote column then we can skip that index?
>

Reading [1], I think I can follow what you suggest. So, basically,
if the leftmost column is not filtered, we have the following:

 but the entire index would have to be scanned, so in most cases the
> planner would prefer a sequential table scan over using the index.


So, in our case, we could follow a similar approach. If the leftmost column
of the index
is not sent over the wire from the pub, we can prefer the sequential scan.

Is my understanding of your suggestion accurate?


>
> In wip_optimize_for_non_pkey_non_ri_unique_index patch, irrespective
> of whether we want to retain this set of changes, the function name
> IsIdxSafeToSkipDuplicates() sounds better than
> IdxIsRelationIdentityOrPK() and we can even change the check to
> GetRelationIdentityOrPK() instead of separate checks replica index and
> PK. So, it would be better to include this part of the change (a.
> change the function name to IsIdxSafeToSkipDuplicates() and (b) change
> the check to use GetRelationIdentityOrPK()) in the main patch.
>
>
>
I agree that it is a good change. Added to v42

Thanks,
Onder KALACI



[1] https://www.postgresql.org/docs/current/indexes-multicolumn.html


v42_0001_Skip-dropped_columns_for_tuples_equal.patch
Description: Binary data


v42_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Peter, all


> src/backend/replication/logical/relation.c
>
> 1. FindUsableIndexForReplicaIdentityFull
>
> +static Oid
> +FindUsableIndexForReplicaIdentityFull(Relation localrel)
> +{
> + List*indexlist = RelationGetIndexList(localrel);
> + Oid usableIndex = InvalidOid;
> + ListCell   *lc;
> +
> + foreach(lc, indexlist)
> + {
> + Oid idxoid = lfirst_oid(lc);
> + bool isUsableIndex;
> + Relation indexRelation = index_open(idxoid, AccessShareLock);
> + IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
> +
> + isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
> +
> + index_close(indexRelation, AccessShareLock);
> +
> + if (isUsableIndex)
> + {
> + /* we found one eligible index, don't need to continue */
> + usableIndex = idxoid;
> + break;
> + }
> + }
> +
> + return usableIndex;
> +}
>
> This comment is not functional -- if you prefer the code as-is, then
> ignore this comment.
>
> But, personally I would:
> - Move some of that code from the declarations. I feel it would be
> better if the index_open/index_close were both in the code-body
> instead of half in declarations and half not.
> - Remove the 'usableIndex' variable, and just return directly.
> - Shorten all the long names (and use consistent 'idx' instead of
> sometimes 'idx' and sometimes 'index')
>
> SUGGESTION (YMMV)
>
> static Oid
> FindUsableIndexForReplicaIdentityFull(Relation localrel)
> {
> List*idxlist = RelationGetIndexList(localrel);
> ListCell   *lc;
>
> foreach(lc, idxlist)
> {
> Oid idxoid = lfirst_oid(lc);
> bool isUsableIdx;
> Relation idxRel;
> IndexInfo  *idxInfo;
>
> idxRel = index_open(idxoid, AccessShareLock);
> idxInfo = BuildIndexInfo(idxRel);
> isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
> index_close(idxRel, AccessShareLock);
>
> /* Return the first eligible index found */
> if (isUsableIdx)
> return idxoid;
> }
>
> return InvalidOid;
> }
>

Applied your suggestion. I think it made the code slightly easier to follow.


>
> ==
> .../subscription/t/032_subscribe_use_index.pl
>
> 2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
>
> 2a.
> # Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> #
> # This test ensures that after CREATE INDEX, the subscriber can
> automatically
> # use one of the indexes (provided that it fulfils the requirements).
> # Similarly, after DROP index, the subscriber can automatically switch to
> # sequential scan
>
>
> The last sentence is missing full-stop.
>
>
fixed


> ~
>
> 2b.
> # now, create index and see that the index is used
> $node_subscriber->safe_psql('postgres',
> "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
>
> Don't say "and see that the index is used". Yes, that is what this
> whole test is doing, but it is not what the psql following this
> comment is doing.
>

True.


> ~
>
> 2c.
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
> $node_publisher->wait_for_catchup($appname);
>
>
> # wait until the index is used on the subscriber
>
> The double blank lines here should be single.
>
> ~
>

Fixed.


>
> 2d.
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index it is not possible for user to control
> # which index to use
>
> This sentence should break at "it".
>
> Aso "user" --> "the user"

SUGGESTION
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index; it is not possible for the user to
> control
> # which index to use
>
>
looks good, thanks


> ~
>
> 2e.
> # let's also test dropping test_replica_id_full_idy and
> # hence use test_replica_id_full_idx
>
>
> I think you ought to have dropped the other (first) index because we
> already know that the first index had been used (from earlier), but we
> are not 100% sure if the 'y' index has been chosen yet.
>

Makes sense. Though, in general, it is hard to check pg_stat_all_indexes for
any of the indexes in this test, as we don't know the exact number of tuples
for each. Just wanted to note that explicitly.



> 
>
> 3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
>
> 3a.
> # deletes 20 rows
> $node_publisher->safe_psql('postgres',
> "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
>
> # updates 20 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1,
> 2);");
>
>
> "deletes" --> "delete"
>
> "updates" --> "update"
>

Well, I thought the command *deletes*, but I agree "delete" reads better.


>
> ~~~
>
> 4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS
>
> # updates 200 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");
>
>
> "updates" --> "update"
>
> "200 rows" ??? is that right --  20 maybe ???
>
>
I guess this is from an earlier version of the patch; I fixed these kinds of
errors.


> ~~~
>
> 5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES
>
> 5a.

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-09 Thread Önder Kalacı
Hi Amit,


> >
> >>
> >> 4.
> >> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
> >>
> >> As we have removed enable_indexscan check, you should remove this test.
> >
> >
> > Hmm, I think my rebase problems are causing confusion here, which v38
> fixes.
> >
>
> I think it is still not fixed in v38 as the test is still present in 0001.
>

Ah, yes, sorry again for the noise. v39 will drop that.


>
> > In the first commit, we have ENABLE_INDEXSCAN checks. In the second
> commit,
> > I changed the same test to use enable_replica_identity_full_index_scan.
> >
> > If we are going to only consider the first patch to get into the master
> branch,
> > I can probably drop the test. In that case, I'm not sure what is our
> perspective
> > on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code
> > (hence the test)?
> >
>
> I am not sure what we are going to do on this because I feel we need
> some storage option as you have in 0002 patch but you and Andres
> thinks that is not required. So, we can discuss that a bit more after
> 0001 is committed but if there is no agreement then we need to
> probably drop it. Even if drop it, I don't think using enable_index
> makes sense. I think for now you don't need to send 0002 patch, let's
> first try to get 0001 patch and then we can discuss about 0002.
>

sounds good, when needed I'll rebase 0002.


> >
> > Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH
> MULTIPLE COLUMNS
> > already covers  SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS.
> >
> > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE
> ROWS AND COLUMNS
> > and dropped the second one. Let me know if it does not make sense to
> you. If I try, there are few more
> > opportunities to squeeze in some more tests together, but those would
> start to complicate readability.
> >
>
> I still want to reduce the test time and will think about it. Which of
> the other tests do you think can be combined?
>
>
I'll follow your suggestion in the next e-mail [2], and focus on further
improvements.

BTW, did you consider updating the patch based on my yesterday's email [1]?
>
>
Yes, replied to that one just now with some wip commits [1]


> It appears you are using inconsistent spacing. It may be better to use
> a single empty line wherever required.
>
>
Sure, let me fix those.

attached v39.

[1]
https://www.postgresql.org/message-id/CACawEhXGnk6v7UOHrxuJjjybHvvq33Zv666ouy4UzjPfJM6tBw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAA4eK1LSYWrthA3xjbrZvZVmwuha10HtM3-QRrVMD7YBt4t3pg%40mail.gmail.com


v39_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-09 Thread Önder Kalacı
Hi,

Amit Kapila , 8 Mar 2023 Çar, 14:42 tarihinde şunu
yazdı:

> On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı  wrote:
> >
> >
> >>
> >> I just share this case and then we
> >> can discuss should we pick the index which only contain the extra
> columns on the
> >> subscriber.
> >>
> >
> > I think its performance implications come down to the discussion on [1].
> Overall, I prefer
> > avoiding adding any additional complexity in the code for some edge
> cases. The code
> > can handle this sub-optimal user pattern, with a sub-optimal performance.
> >
>
> It is fine to leave this and Hou-San's case if they make the patch
> complex. However, it may be better to give it a try and see if this or
> other regression/optimization can be avoided without adding much
> complexity to the patch. You can prepare a top-up patch and then we
> can discuss it.
>
>
>
Alright, I did some basic prototypes for the problems mentioned, just to show
that they can be solved without too much hassle. However, the patches are not
complete: some tests fail, there are no comments or tests, and some values
should be cached, etc. I'm mostly sharing this as a heads-up on the progress,
given that I have not yet responded to this specific mail. I'll update these
when I have some extra time after replying on the 0001 patch.

 Thanks,
Onder


wip_optimize_for_non_pkey_non_ri_unique_index.patch
Description: Binary data


wip_for_optimize_index_column_match.diff
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-09 Thread Önder Kalacı
Hi Amit, all


> >
>
> This new test takes ~9s on my machine whereas most other tests in
> subscription/t take roughly 2-5s. I feel we should try to reduce the
> test timing without sacrificing much of the functionality or code
> coverage.


Alright, that is reasonable.


> I think if possible we should try to reduce setup/teardown
> cost for each separate test by combining them where possible. I have a
> few comments on tests which also might help to optimize these tests.
>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX
> +#
> +# Basic test where the subscriber uses index
> +# and only updates 1 row and deletes
> +# 1 other row
> ...
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> +#
> +# Basic test where the subscriber uses index
> +# and updates 50 rows
>
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> +#
> +# Basic test where the subscriber uses index
> +# and deletes 200 rows
>
> I think to a good extent these tests overlap each other. I think we
> can have just one test for the index with multiple columns that
> updates multiple rows and have both updates and deletes.
>

Alright, dropped SUBSCRIPTION USES INDEX, and expanded SUBSCRIPTION USES
INDEX WITH MULTIPLE COLUMNS with an UPDATE that touches multiple rows.


> 2.
> +# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
> ...
> ...
> +# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS
>
> Instead of having separate tests where we do all setups again, I think
> it would be better if we create both the indexes in one test and show
> that none of them is used.
>

Makes sense


>
> 3.
> +# now, the update could either use the test_replica_id_full_idy or
> test_replica_id_full_idy index
> +# it is not possible for user to control which index to use
>
> The name of the second index is wrong in the above comment.
>

thanks, fixed


>
> 4.
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>
> As we have removed enable_indexscan check, you should remove this test.
>

Hmm, I think my rebase problems are causing confusion here, which v38 fixes.

In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit,
I changed the same test to use enable_replica_identity_full_index_scan.

If we are going to only consider the first patch to get into the master
branch,
I can probably drop the test. In that case, I'm not sure what is our
perspective
on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code
(hence the test)?


>
> 5. In general, the line length seems to vary a lot for different
> multi-line comments. Though we are not strict in that it will look
> better if there is consistency in that (let's have ~80 cols line
> length for each comment in a single line).
>
>
Went over the tests and made them ~80 cols. There is one exception: in the
first commit, the test for enable_indexscan is still shorter, but I failed to
format that properly. I'll try to fix that as well, but I didn't want to
block progress because of it.

Also, you have not noted it, but I think SUBSCRIPTION USES INDEX WITH
MULTIPLE COLUMNS already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS.

So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS
AND COLUMNS and dropped the second one. Let me know if that does not make
sense to you. If I try, there are a few more opportunities to squeeze some
more tests together, but those would start to complicate readability.

Attached v38.

Thanks,
Onder KALACI


v38_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v38_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-09 Thread Önder Kalacı
Hi Vignesh C,


> > Hmm, can you please elaborate more on this? The declaration
> > and assignment are already on different lines.
> >
> > ps: pgindent changed this line a bit. Does that look better?
>
> I thought of changing it to something like below:
> bool isUsableIndex;
> Oid idxoid = lfirst_oid(lc);
> Relation indexRelation = index_open(idxoid, AccessShareLock);
> IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
>
> isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
>
>
Alright, this looks slightly better. I made a small change to your
suggestion, basically keeping lfirst_oid as the first statement in the loop.

I'll attach the changes on v38 in the next e-mail.


Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-09 Thread Önder Kalacı
Hi Peter,



> 1.
> In my previous review [1] (comment #1) I wrote that only some of the
> "should" were misleading and gave examples where to change. But I
> didn't say that *every* usage of that word was wrong, so your global
> replace of "should" to "must" has modified a couple of places in
> unexpected ways.
>
> Details are in subsequent review comments below -- see #2b, #3, #5.
>

Ah, that was my mistake. Thanks for the thorough review on this.


>
> ==
> doc/src/sgml/logical-replication.sgml
>
> 2.
> A published table must have a “replica identity” configured in order
> to be able to replicate UPDATE and DELETE operations, so that
> appropriate rows to update or delete can be identified on the
> subscriber side. By default, this is the primary key, if there is one.
> Another unique index (with certain additional requirements) can also
> be set to be the replica identity. If the table does not have any
> suitable key, then it can be set to replica identity “full”, which
> means the entire row becomes the key. When replica identity “full” is
> specified, indexes can be used on the subscriber side for searching
> the rows. Candidate indexes must be btree, non-partial, and have at
> least one column reference (i.e. cannot consist of only expressions).
> These restrictions on the non-unique index properties adheres to some
> of the restrictions that are enforced for primary keys. Internally, we
> follow a similar approach for supporting index scans within logical
> replication scope. If there are no such suitable indexes, the search
> on the subscriber side can be very inefficient, therefore replica
> identity “full” must only be used as a fallback if no other solution
> is possible. If a replica identity other than “full” is set on the
> publisher side, a replica identity comprising the same or fewer
> columns must also be set on the subscriber side. See REPLICA IDENTITY
> for details on how to set the replica identity. If a table without a
> replica identity is added to a publication that replicates UPDATE or
> DELETE operations then subsequent UPDATE or DELETE operations will
> cause an error on the publisher. INSERT operations can proceed
> regardless of any replica identity.
>
> ~~
>
> 2a.
> My previous review [1] (comment #2)  was not fixed quite as suggested.
>
> Please change:
> "adheres to" --> "adhere to"
>
>
Oops, it seems I only got the "to" part of your suggestion and missed the "s".

Done now.


> ~~
>
> 2b. should/must
>
> This should/must change was OK as it was before, because here it is only
> advice.
>
> Please change this back how it was:
> "must only be used as a fallback" --> "should only be used as a fallback"
>

Thanks, changed.


>
> ==
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>  /*
>   * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
> that
>   * is setup to match 'rel' (*NOT* idxrel!).
>   *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns must be used for the index scan.
> + *
>
> ~
>
> This should/must change does not seem quite right.
>
> SUGGESTION (reworded)
> Returns how many columns to use for the index scan.
>

Fixed.

(I wish we had a simpler process to incorporate such
comments.)



>
> ~~~
>
> 4. build_replindex_scan_key
>
> >
> > Based on the discussions below, I kept as-is. I really don't want to do
> unrelated
> > changes in this patch, as I also got several feedback for not doing it,
> >
>
> Hmm, although this code pre-existed I don’t consider this one as
> "unrelated changes" because the patch introduced the new "if
> (!AttributeNumberIsValid(table_attno))" which changed things.  As I
> wrote to Amit yesterday [2] IMO it would be better to do the 'opttype'
> assignment *after* the potential 'continue' otherwise there is every
> chance that the assignment is just redundant. And if you move the
> assignment where it belongs, then you might as well declare the
> variable in the more appropriate place at the same time – i.e. with
> 'opfamily' declaration. Anyway, I've given my reason a couple of times
> now, so if you don't want to change it I won't about it debate
> anymore.
>

Alright, given both you and Amit [1] agree on this, I'll follow that.



>
> ==
> src/backend/replication/logical/relation.c
>
> 5. FindUsableIndexForReplicaIdentityFull
>
> + * XXX: There are no fundamental problems for supporting non-btree
> indexes.
> + * We mostly need to relax the limitations in
> RelationFindReplTupleByIndex().
> + * For partial indexes, the required changes are likely to be larger. If
> + * none of the tuples satisfy the expression for the index scan, we must
> + * fall-back to sequential execution, which might not be a good idea in
> some
> + * cases.
>
> The should/must change (the one in the XXX comment) does not seem quite
> right.
>
> SUGGESTION
> "we must fall-back to sequential execution" --> "we fallback to
> sequential execution"
>

fixed, thanks.


>
> ==
> 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Vignesh C,


>
> Few comments
> 1) Maybe this change is not required:
> fallback if no other solution is possible.  If a replica identity other
> than full is set on the publisher side, a replica
> identity
> -   comprising the same or fewer columns must also be set on the subscriber
> -   side.  See  for
> details on
> +   comprising the same or fewer columns must also be set on the
> subscriber side.
> +   See  for details on
>

Yes, fixed.

>
> 2) Variable declaration and the assignment can be split so that the
> readability is better:
> +
> +   boolisUsableIndex =
> +   IsIndexUsableForReplicaIdentityFull(indexInfo);
> +
> +   index_close(indexRelation, AccessShareLock);
> +


Hmm, can you please elaborate more on this? The declaration
and assignment are already on different lines.

ps: pgindent changed this line a bit. Does that look better?


> 3) Since there is only one statement within the if condition, the
> braces can be removed
> +   if (is_btree && !is_partial && !is_only_on_expression)
> +   {
> +   return true;
> +   }
>
>
Fixed in a newer version of the patch. Now it is only:

    return is_btree && !is_partial && !is_only_on_expression;


> 4) There is minor indentation issue in this, we could run pgindent to fix
> it:
> +static Oid FindLogicalRepLocalIndex(Relation localrel,
> +
>LogicalRepRelation *remoterel);
> +
>
>
Yes, pgindent fixed it, thanks.


Attached v37

Thanks,
Onder KALACI


v37_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v37_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Shi Yu, all



> e.g.
> -- pub
> CREATE TABLE test_replica_id_full (x int, y int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> -- sub
> CREATE TABLE test_replica_id_full (x int, y int, z int);
> CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
>
> I didn't see in any cases the behavior changed after applying the patch,
> which
> looks good. Besides, I tested the performance for such case.
>

Thanks for testing this edge case. I thought we had a test for this, but it
seems to be missing.


>
> For the cases that all values of extra columns on the subscriber are NULL,
> index
> scan can't do better than sequential scan. This is not a real scenario and
> I
> think it only degrades when there are many NULL values in the index
> column, so
> this is probably not a case to worry about.


I also debugged this case, and don't see any problems with it
either. But I think this is a valid test case, given that at some point we
might forget about this case and somehow break it.

So, I'll add a new test where the publication lacks the column on the
subscriber's index in v36.



> I just share this case and then we
> can discuss whether we should pick the index which only contains the extra
> columns on the subscriber.
>
>
>
I think its performance implications come down to the discussion in [1].
Overall, I prefer to avoid adding any additional complexity to the code for
such edge cases. The code can handle this sub-optimal user pattern, just
with sub-optimal performance.

Still, happy to hear other thoughts on this.

Thanks,
Onder KALACI


[1]
https://www.postgresql.org/message-id/20230307195119.ars36cx6gwqftoen%40awork3.anarazel.de


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Hou zj, all

>
> IIRC, it invokes tuples_equal for all cases unless we are using a replica
> identity key or primary key to scan. But there seem to be some other cases
> where the tuples_equal looks unnecessary.
>
> For example, if the table on the subscriber doesn't have a PK or RI key
> but has a not-null, non-deferrable, unique key. And if the apply worker
> chooses this index to do the scan, it seems we can skip the tuples_equal
> as well.
>
>
Yeah, that's right. I also spotted this earlier; see "# Testcase start:
Unique index that is not primary key or replica identity".

I'm thinking that we should not add any code complexity for this case, at
least with this patch. I have a few small follow-up ideas, like this one,
or allowing non-btree indexes, but I'd rather not fold those optional
improvements into this patch, if that makes sense.



Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Peter, all


>
> 1.
> Saying the index "should" or "should not" do this or that sounds like
> it is still OK but just not recommended. TO remove this ambigity IMO
> most of the "should" ought to be changed to "must" because IIUC this
> patch will simply not consider indexes which do not obey all your
> rules.
>
> This comment applies to a few places (search for "should")
>
> e.g.1. - Commit Message
> e.g.2. - /* There should always be at least one attribute for the index
> scan. */
> e.g.3. - The function comment for
> FindUsableIndexForReplicaIdentityFull(Relation localrel)
>
> ==
> doc/src/sgml/logical-replication.sgml
>

I'm definitely not an expert on this subject (or a native speaker), so I'm
following your suggestion.


>
> 2.
> A published table must have a “replica identity” configured in order
> to be able to replicate UPDATE and DELETE operations, so that
> appropriate rows to update or delete can be identified on the
> subscriber side. By default, this is the primary key, if there is one.
> Another unique index (with certain additional requirements) can also
> be set to be the replica identity. If the table does not have any
> suitable key, then it can be set to replica identity “full”, which
> means the entire row becomes the key. When replica identity “full” is
> specified, indexes can be used on the subscriber side for searching
> the rows. Candidate indexes must be btree, non-partial, and have at
> least one column reference (i.e. cannot consist of only expressions).
> These restrictions on the non-unique index properties adheres some of
> the restrictions that are enforced for primary keys. Internally, we
> follow a similar approach for supporting index scans within logical
> replication scope. If there are no such suitable indexes, the search
> on the subscriber side can be very inefficient, therefore replica
> identity “full” should only be used as a fallback if no other solution
> is possible. If a replica identity other than “full” is set on the
> publisher side, a replica identity comprising the same or fewer
> columns must also be set on the subscriber side. See REPLICA IDENTITY
> for details on how to set the replica identity. If a table without a
> replica identity is added to a publication that replicates UPDATE or
> DELETE operations then subsequent UPDATE or DELETE operations will
> cause an error on the publisher. INSERT operations can proceed
> regardless of any replica identity.
>
> ~
>
> "adheres some of" --> "adhere to some of" ?
>

sounds right, changed
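
As a concrete illustration of the documented behavior (object names are made up for this sketch):

```sql
-- Publisher: no suitable key, so fall back to replica identity full
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;

-- Subscriber: any btree, non-partial index with at least one column
-- reference can now be picked by the apply worker to locate rows for
-- UPDATE/DELETE, instead of a sequential scan per replicated row
CREATE INDEX test_replica_id_full_x_idx ON test_replica_id_full (x);
```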


>
> ==
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>   {
>   Oid operator;
>   Oid opfamily;
>   RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>
> These variable declarations might look tidier if you kept all the Oids
> together.
>
> ==
> src/backend/replication/logical/relation.c
>

Based on the discussions below, I kept it as-is. I really don't want to
make unrelated changes in this patch, as I also got several pieces of
feedback against doing that.



>
> 4. logicalrep_partition_open
>
> + /*
> + * Finding a usable index is an infrequent task. It occurs when an
> + * operation is first performed on the relation, or after invalidation of
> + * the relation cache entry (such as ANALYZE or CREATE/DROP index on the
> + * relation).
> + *
> + * We also prefer to run this code on the oldctx such that we do not
> + * leak anything in the LogicalRepPartMapContext (hence
> + * CacheMemoryContext).
> + */
> + entry->localindexoid = FindLogicalRepLocalIndex(partrel, remoterel)
>
> "such that" --> "so that" ?
>
>
fixed


> ~~~
>
> 5. IsIndexUsableForReplicaIdentityFull
>
> +bool
> +IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
> +{
> + bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> + bool is_partial = (indexInfo->ii_Predicate != NIL);
> + bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> +
> + if (is_btree && !is_partial && !is_only_on_expression)
> + {
> + return true;
> + }
> +
> + return false;
> +}
>
> SUGGESTION (no need for 2 returns here)
> return is_btree && !is_partial && !is_only_on_expression;
>

true, done
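
To illustrate the rules the function encodes, a hedged sketch (index names are made up; only the first index satisfies all three conditions):

```sql
CREATE TABLE t (a int, b text);

-- Usable: btree, non-partial, has at least one column reference
CREATE INDEX t_a_idx ON t (a);

-- Not usable: consists of only expressions
CREATE INDEX t_expr_idx ON t (lower(b));

-- Not usable: partial index
CREATE INDEX t_partial_idx ON t (a) WHERE a > 10;

-- Not usable: not a btree index
CREATE INDEX t_hash_idx ON t USING hash (a);
```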


>
> ==
> src/backend/replication/logical/worker.c
>
> 6. FindReplTupleInLocalRel
>
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
> LogicalRepRelation *remoterel,
> Oid localidxoid,
> TupleTableSlot *remoteslot,
> TupleTableSlot **localslot)
> {
> bool found;
>
> /*
> * Regardless of the top-level operation, we're performing a read here, so
> * check for SELECT privileges.
> */
> TargetPrivilegesCheck(localrel, ACL_SELECT);
>
*localslot = table_slot_create(localrel, &estate->es_tupleTable);
>
> Assert(OidIsValid(localidxoid) ||
>(remoterel->replident == REPLICA_IDENTITY_FULL));
>
> if 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Amit, all


> You have not given complete steps to reproduce the problem where
> instead of the index scan, a sequential scan would be picked. I have
> tried to reproduce by extending your steps but didn't see the problem.
> Let me know if I am missing something.
>

I think the steps you shared are what I had in mind.


>
> I have debugged the above example and it uses an index scan during
> apply without your latest change which is what I expected. AFAICS, the
> use of IdxIsRelationIdentityOrPK() is to decide whether we will do
> tuples_equal() or not during the index scan and I see it gives the
> correct results with the example you provided.
>
>
Right, I got confused. IdxIsRelationIdentityOrPK is only called within
RelationFindReplTupleByIndex(). And, yes, it only impacts tuples_equal.

But still, it feels safer to keep it as in the current patch if we don't
change the name of the function.

I really don't have a strong opinion either way, only a slight preference
for keeping it as in v35 so that future callers don't get confused as we
did here.

Let me know how you prefer this.


Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread Önder Kalacı
Hi Amit, Peter


> > > Let me give an example to demonstrate why I thought something is fishy
> here:
> > >
> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=.
> > > Imagine the same rel has a PRIMARY KEY with Oid=.
> > >
>

Hmm, alright, this is syntactically possible, but I'm not sure any user
would do this. Still, thanks for catching this.

And, you are right, if a user has created such a schema,
IdxIsRelationIdentityOrPK()
would return the wrong result and we'd use sequential scan instead of index
scan.
This would be a regression. I think we should change the function.


Here is the example:
DROP TABLE tab1;
CREATE TABLE tab1 (a int NOT NULL);
CREATE UNIQUE INDEX replica_unique ON tab1(a);
ALTER TABLE tab1 REPLICA IDENTITY USING INDEX replica_unique;
ALTER TABLE tab1 ADD CONSTRAINT pkey PRIMARY KEY (a);


I'm attaching v35.

Does that make sense to you Amit?


v35_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v35_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread Önder Kalacı
Hi Amit, all

>
> Few comments:
> =
> 1.
> +get_usable_indexoid(ApplyExecutionData *edata, ResultRelInfo *relinfo)
> {
> ...
> + if (targetrelkind == RELKIND_PARTITIONED_TABLE)
> + {
> + /* Target is a partitioned table, so find relmapentry of the partition */
> + TupleConversionMap *map = ExecGetRootToChildMap(relinfo, edata->estate);
> + AttrMap*attrmap = map ? map->attrMap : NULL;
> +
> + relmapentry =
> + logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
> +   attrmap);
>
>
> When will we hit this part of the code? As per my understanding, for
> partitioned tables, we always follow apply_handle_tuple_routing()
> which should call logicalrep_partition_open(), and do the necessary
> work for us.
>
>
Looking closer, there is really no need for that. I changed the
code such that we pass usableLocalIndexOid. It looks simpler
to me. Do you agree?



> 2. In logicalrep_partition_open(), it would be better to set
> localrelvalid after finding the required index. The entry should be
> marked valid after initializing/updating all the required members. I
> have changed this in the attached.
>
>
makes sense


> 3.
> @@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
>   Relation localrel; /* relcache entry (NULL when closed) */
>   AttrMap*attrmap; /* map of local attributes to remote ones */
>   bool updatable; /* Can apply updates/deletes? */
> + Oid usableIndexOid; /* which index to use, or InvalidOid if none */
>
> Would it be better to name this new variable as localindexoid to match
> it with the existing variable localreloid? Also, the camel case for
> this variable appears odd.
>

yes, both make sense


>
> 4. If you agree with the above, then we should probably change the
> name of functions get_usable_indexoid() and
> FindLogicalRepUsableIndex() accordingly.
>

I dropped get_usable_indexoid() as noted in (1).

Changed FindLogicalRepUsableIndex->FindLogicalRepLocalIndex



> 5.
> + {
> + /*
> + * If we had a primary key or relation identity with a unique index,
> + * we would have already found and returned that oid. At this point,
> + * the remote relation has replica identity full.
>
> These comments are not required as this just states what the code just
> above is doing.
>

I don't have a strong opinion on adding this comment, so I applied your
suggestion.


>
> Apart from the above, I have made some modifications in the other
> comments. If you are convinced with those, then kindly include them in
> the next version.
>
>
Sure, they all look good. I think I have lost (and caused the reviewers to
lose)
quite a bit of time on the comment reviews. Next time, I'll try to be more
prepared
for the comments.

Attached v34

Thanks,
Onder KALACI


v34_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v34_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread Önder Kalacı
Hi Shi Yu, all


> Thanks for updating the patch. Here are some comments on v33-0001 patch.
>
> 1.
> +   if (RelationReplicaIdentityFullIndexScanEnabled(localrel) &&
> +   remoterel->replident == REPLICA_IDENTITY_FULL)
>
> RelationReplicaIdentityFullIndexScanEnabled() is introduced in 0002 patch,
> so
> the call to it should be moved to 0002 patch I think.
>

Ah, sure, a rebase issue, fixed in v34


>
> 2.
> +#include "optimizer/cost.h"
>
> Do we need this in the latest patch? I tried and it looks it could be
> removed
> from src/backend/replication/logical/relation.c.
>

Hmm, probably an artifact of the initial versions of the patch where we
needed some
costing functionality.


>
> 3.
> +# now, create a unique index and set the replica
> +$node_publisher->safe_psql('postgres',
> +   "CREATE UNIQUE INDEX test_replica_id_full_unique ON
> test_replica_id_full(x);");
> +$node_subscriber->safe_psql('postgres',
> +   "CREATE UNIQUE INDEX test_replica_id_full_unique ON
> test_replica_id_full(x);");
> +
>
> Should the comment be "now, create a unique index and set the replica
> identity"?
>

yes, fixed


>
> 4.
> +$node_publisher->safe_psql('postgres',
> +   "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX
> test_replica_id_full_unique;");
> +$node_subscriber->safe_psql('postgres',
> +   "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX
> test_replica_id_full_unique;");
> +
> +# wait for the synchronization to finish
> +$node_subscriber->wait_for_subscription_sync;
>
> There's no new tables to need to be synchronized here, should we remove
> the call
> to wait_for_subscription_sync?
>

right, probably a copy & paste typo, thanks for spotting.


I'll attach v34 with the next e-mail, given that the comments here only
touch small parts of the patch.



Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread Önder Kalacı
Hi Andres, Amit, all

> I think the case in which the patch regresses performance is irrelevant
> in practice.

This is similar to what I think in this context.

I appreciate the effort from Shi Yu, which gives us a clear understanding
of the overhead. But the tests in [1] where we observe the regression are
largely synthetic test cases that aim to spot the overhead.

> And having an index over thousands
> of non-differing values will generally perform badly, not just in this
> context.


Similarly, maybe there are some eccentric usage patterns like this. But
even if such patterns exist, I doubt they are really performance-sensitive.


Thanks,
Onder KALACI

[1]
https://www.postgresql.org/message-id/OSZPR01MB63103A4AFBBA56BAF8AE7FAAFDA39%40OSZPR01MB6310.jpnprd01.prod.outlook.com


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-03-06 Thread Önder Kalacı
Hi all,

Thanks for working on this.


> I imagine that a typical use case would be to set min_send_delay to
> several hours to days. I'm concerned that it could not be an
> acceptable trade-off for many users that the system cannot collect any
> garbage during that.
>

I'm not too worried about the WAL recycling; that mostly looks like
a documentation issue to me. It is not a problem that many PG users
are unfamiliar with. Also, even if creating/altering subscriptions is
one day relaxed to be allowed for regular users, one option could be to
require that this setting be changed by a superuser. That would alleviate
my concern regarding WAL recycling: a superuser should be able to monitor
the system and adjust the settings/hardware accordingly.

However, VACUUM being blocked by replication after a configuration
change on the subscription sounds more concerning to me. Blocking
VACUUM for hours could quickly escalate into performance problems.

On the other hand, we already have a similar problem with
recovery_min_apply_delay combined with hot_standby_feedback [1].
So, that probably is an acceptable trade-off for pgsql-hackers.
If you use this feature, you should be even more careful.
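
For context, a hedged sketch of the analogous physical-replication setup (the delay value is made up for illustration):

```sql
-- On a physical standby: delay apply by hours while hot_standby_feedback
-- is on. The standby's feedback holds back VACUUM cleanup on the primary
-- for the whole delay window -- the same class of trade-off discussed
-- above for min_send_delay.
ALTER SYSTEM SET recovery_min_apply_delay = '4h';
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();
```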


> I think we can have the apply process write the decoded changes
> somewhere on the disk (as not temporary files) and return the flush
> LSN so that the apply worker can apply them later and the publisher
> can advance slot's LSN. The feature would be more complex but from the
> user perspective it would be better.
>

Yes, this would probably be one of the ideal solutions to the problem at
hand. But my current guess is that it'd be a non-trivial change with many
concurrency/failure scenarios. So, I'm not sure that is going to be a
realistic patch to pursue.


Thanks,
Onder KALACI



[1] PostgreSQL: Documentation: 15: 20.6. Replication



Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread Önder Kalacı
Hi Amit, all


Amit Kapila , 6 Mar 2023 Pzt, 12:40 tarihinde şunu
yazdı:

> On Fri, Mar 3, 2023 at 6:40 PM Önder Kalacı  wrote:
> >
> > Hi Vignesh,
> >
> > Thanks for the review
> >
> >>
> >> 1) We are currently calling RelationGetIndexList twice, once in
> >> FindUsableIndexForReplicaIdentityFull function and in the caller too,
> >> we could avoid one of the calls by passing the indexlist to the
> >> function or removing the check here, index list check can be handled
> >> in FindUsableIndexForReplicaIdentityFull.
> >> +   if (remoterel->replident == REPLICA_IDENTITY_FULL &&
> >> +   RelationGetIndexList(localrel) != NIL)
> >> +   {
> >> +   /*
> >> +* If we had a primary key or relation identity with a
> >> unique index,
> >> +* we would have already found and returned that oid.
> >> At this point,
> >> +* the remote relation has replica identity full and
> >> we have at least
> >> +* one local index defined.
> >> +*
> >> +* We are looking for one more opportunity for using
> >> an index. If
> >> +* there are any indexes defined on the local
> >> relation, try to pick
> >> +* a suitable index.
> >> +*
> >> +* The index selection safely assumes that all the
> >> columns are going
> >> +* to be available for the index scan given that
> >> remote relation has
> >> +* replica identity full.
> >> +*/
> >> +   return FindUsableIndexForReplicaIdentityFull(localrel);
> >> +   }
> >> +
> >
> > makes sense, done
> >
>
> Today, I was looking at this comment and the fix for it. It seems to
> me that it would be better to not add the check (indexlist != NIL)
> here and rather get the indexlist in
> FindUsableIndexForReplicaIdentityFull(). It will anyway return
> InvalidOid, if there is no index and that way code will look a bit
> cleaner.
>
>
Yeah, it seems easier to follow to me as well. I reflected it in the
comment too.


Thanks,
Onder
From 42d1ccde2a3e904c31f47aa33bbb006bf33626dd Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Fri, 3 Mar 2023 18:20:57 +0300
Subject: [PATCH v33 2/2] Optionally disable index scan when replica identity
 is full

When replica identity is full, the logical replication apply worker
for a subscription is capable of using index scans. However, index scans
might have some overhead compared to a sequential scan. The main use case
for disabling the index scan is a table with many dead
tuples as well as many duplicates, where an index scan might be slower
than a sequential scan.

We add a new table storage parameter so that the user can
control the behavior.
---
 doc/src/sgml/ref/create_table.sgml| 16 +-
 src/backend/access/common/reloptions.c| 11 +++
 src/include/utils/rel.h   | 12 +++
 .../subscription/t/032_subscribe_use_index.pl | 31 +++
 4 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index a03dee4afe..89eb154730 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1812,7 +1812,21 @@ WITH ( MODULUS numeric_literal, REM
 

 
-   
+  
+enable_replica_identity_full_index_scan (boolean)
+
+ enable_replica_identity_full_index_scan storage parameter
+
+
+
+ 
 	  Controls using available indexes on the apply worker when replica identity is full. See
+   for details. The default is true.
+ 
+
+   
+
+  
 
   
  
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 14c23101ad..3e54e4a52b 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -168,6 +168,15 @@ static relopt_bool boolRelOpts[] =
 		},
 		true
 	},
+	{
+		{
+			"enable_replica_identity_full_index_scan",
+			"Enables index scan on logical replication apply worker",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		},
+		true
+	},
 	/* list terminator */
 	{{NULL}}
 };
@@ -1877,6 +1886,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"user_catalog_table", RELOPT_TYPE_BOOL,
 		offsetof(StdRdOptions, user_catalog_table)},
+		{"enable_replica_identity_full_index_scan", 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread Önder Kalacı
Hi Peter, all


>
> 1.
> A published table must have a replica identity configured in order to
> be able to replicate UPDATE and DELETE operations, so that appropriate
> rows to update or delete can be identified on the subscriber side. By
> default, this is the primary key, if there is one. Another unique
> index (with certain additional requirements) can also be set to be the
> replica identity. When replica identity FULL is specified, indexes can
> be used on the subscriber side for searching the rows. These indexes
> should be btree, non-partial and have at least one column reference
> (e.g., should not consist of only expressions). These restrictions on
> the non-unique index properties are in essence the same restrictions
> that are enforced for primary keys. Internally, we follow the same
> approach for supporting index scans within logical replication scope.
> If there are no such suitable indexes, the search on the subscriber
> side can be very inefficient, therefore replica identity FULL should
> only be used as a fallback if no other solution is possible. If a
> replica identity other than full is set on the publisher side, a
> replica identity comprising the same or fewer columns must also be set
> on the subscriber side. See REPLICA IDENTITY for details on how to set
> the replica identity. If a table without a replica identity is added
> to a publication that replicates UPDATE or DELETE operations then
> subsequent UPDATE or DELETE operations will cause an error on the
> publisher. INSERT operations can proceed regardless of any replica
> identity.
>
> ~
>
> 1a.
> Changes include:
> "should" --> "must"
> "e.g." --> "i.e."
>
>
makes sense


> BEFORE
> These indexes should be btree, non-partial and have at least one
> column reference (e.g., should not consist of only expressions).
>
> SUGGESTION
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e., cannot consist of only expressions).
>
> ~
>
> 1b.
> The fix for my v27 review comment #2b (changing "full" to FULL) was
> not made correctly. It should be uppercase FULL, not full:
> "other than full" --> "other than FULL"
>

Right, changed that.


>
> ==
> src/backend/executor/execReplication.c
>
> 2.
>  /*
>   * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
> that
>   * is setup to match 'rel' (*NOT* idxrel!).
>   *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns should be used for the index scan.
> + *
> + * This is not generic routine, it expects the idxrel to be
> + * a btree, non-partial and have at least one column
> + * reference (e.g., should not consist of only expressions).
>   *
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * By definition, replication identity of a rel meets all
> + * limitations associated with that. Note that any other
> + * index could also meet these limitations.
>   */
> -static bool
> +static int
>  build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
>   TupleTableSlot *searchslot)
>
> ~
>
> "(e.g., should not consist of only expressions)" --> "(i.e., cannot
> consist of only expressions)"
>
>
fixed


> ==
> src/backend/replication/logical/relation.c
>
> 3. FindUsableIndexForReplicaIdentityFull
>
> +/*
> + * Returns the oid of an index that can be used via the apply
> + * worker. The index should be btree, non-partial and have at
> + * least one column reference (e.g., should not consist of
> + * only expressions). The limitations arise from
> + * RelationFindReplTupleByIndex(), which is designed to handle
> + * PK/RI and these limitations are inherent to PK/RI.
>
> The 2nd sentence of this comment should match the same changes in the
> Commit message --- "must not" instead of "should not", "i.e." instead
> of "e.g.", etc. See the review comment #1a above.
>
>
Isn't "cannot" better than "must not"? You also seem to suggest "cannot"
just above.

I changed it to "cannot" in all places.



> ~~~
>
> 4. IdxIsRelationIdentityOrPK
>
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's replica identity index or relation's
> + * primary key's index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return GetRelationIdentityOrPK(rel) == idxoid;
> +}
>
> I think you've "simplified" this function in v28 but AFAICT now it has
> a different logic to v27.
>
> PREVIOUSLY it was coded like
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>
> But now in that scenario, it won't
> even check the PK if there was a valid RI. So it might return false
> when previously it returned true. Is it deliberate?
>
>
Thanks for the detailed review/investigation on this. But I also agree that
there is no difference in terms of 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread Önder Kalacı
Hi Amit, all


>
> >
> > I think you've "simplified" this function in v28 but AFAICT now it has
> > a different logic to v27.
> >
> > PREVIOUSLY it was coded like
> > + return RelationGetReplicaIndex(rel) == idxoid ||
> > + RelationGetPrimaryKeyIndex(rel) == idxoid;
> >
> > You can see if 'idxoid' did NOT match RI but if it DID match PK
> > previously it would return true. But now in that scenario, it won't
> > even check the PK if there was a valid RI. So it might return false
> > when previously it returned true. Is it deliberate?
> >
>
> I don't see any problem with this because by default PK will be a
> replica identity. So only if the user specifies the replica identity
> full or changes the replica identity to some other index, we will try
> to get PK which seems valid for this case. Am, I missing something
> which makes this code do something bad?
>

I also re-investigated the code, and I also don't see any issues with that.

See my comment to Peter's original review on this.


>
> Few other comments on latest code:
> 
> 1.
>
> -   A published table must have a replica identity
> configured in
> +   A published table must have a replica
> identity configured in
>
> How the above change is related to this patch?
>

As you suggest, I'm undoing this change.


>
> 2.
> certain additional requirements) can also be set to be the replica
> -   identity.  If the table does not have any suitable key, then it can be
> set
> +   identity. If the table does not have any suitable key, then it can be
> set
>
> I think we should avoid changing the spacing of existing docs (two spaces
> after fullstop to one space), and that too inconsistently. I suggest
> adding new changes with the same spacing as the existing doc. If you are
> adding an entirely new section then we can consider differently.
>

Alright, so I changed this whole section to use two spaces after a full stop.



> 3.
> to replica identity full, which means the entire row
> becomes
> -   the key.  This, however, is very inefficient and should only be used
> as a
> -   fallback if no other solution is possible.  If a replica identity other
> -   than full is set on the publisher side, a replica
> identity
> -   comprising the same or fewer columns must also be set on the subscriber
> -   side.  See  for
> details on
> +   the key. When replica identity FULL is specified,
> +   indexes can be used on the subscriber side for searching the rows.
> These
>
> Shouldn't specifying FULL be consistent wih existing
> docs?
>
>
>
Considering the discussion below, I'm switching everything back
to full. Let's
be consistent with the existing code. Peter already suggested improving
that with a follow-up
patch. If that lands, I can reflect the changes in this patch as well.

Given that the changes are small, I'll incorporate them into v33 in my
next e-mail.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread Önder Kalacı
Hi Amit, all


> >
> > Given Amit's suggestion on [1], I'm planning to drop this check
> altogether, and
> > rely on table storage parameters.
> >
>
> This still seems to be present in the latest version. I think we can
> just remove this and then add the additional check as suggested by you
> as part of the second patch.
>
>
Now attaching the second patch as well, which implements a new
storage parameter as
you suggested earlier.

I'm open to naming suggestions; I wanted to make the name explicit, so it
is a little long.

I'm also not very familiar with the sgml format. I mostly followed the
existing docs and
built the docs for inspection, but it would be good to look into that part
a little further in case I missed something important.



> Few other comments on latest version:
> ==
> 1.
> +/*
> + * Returns true if the index is usable for replica identity full. For
> details,
> + * see FindUsableIndexForReplicaIdentityFull.
> + */
> +bool
> +IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
> +{
> + bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> + bool is_partial = (indexInfo->ii_Predicate != NIL);
> + bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> +
> + if (is_btree && !is_partial && !is_only_on_expression)
> + {
> + return true;
> ...
> ...
> +/*
> + * Returns the oid of an index that can be used via the apply worker. The
> index
> + * should be btree, non-partial and have at least one column reference
> (e.g.,
> + * should not consist of only expressions). The limitations arise from
> + * RelationFindReplTupleByIndex(), which is designed to handle PK/RI and
> these
> + * limitations are inherent to PK/RI.
>
> By these two, the patch infers that it picks an index that adheres to
> the limitations of PK/RI. Apart from unique, the other properties of
> RI are "not partial, not deferrable, and include only columns marked
> NOT NULL". See ATExecReplicaIdentity() for corresponding checks. We
> don't try to ensure the last two from the list. It is fine to do so if
> we document the reasons for the same in comments or we can even try to
> enforce the remaining restrictions as well. For example, it should be
> okay to allow NULL column values because we anyway compare the entire
> tuple after getting the value from the index.
>

I think this is a documentation issue of this patch. I improved the wording
a bit
more. Does that look better?

I also went over the code/docs to see if we have
any other such documentation issues, and also
updated logical-replication.sgml.

I'd prefer to support NULL values, as there is no harm in doing that and it
is a pretty useful feature (we also have tests covering it).

To my knowledge, there are no problems with deferrable constraints, as we
are only interested in the indexes, not the constraints. Still, if you see
any, I can add a check for that. (Here, the user could still have a unique
index that is associated with a constraint, but I still don't see any edge
cases regarding deferrable constraints.)



> 2.
> + {
> + /*
> + * This attribute is an expression, and
> + * FindUsableIndexForReplicaIdentityFull() was called earlier
> + * when the index for subscriber was selected. There, the indexes
> + * comprising *only* expressions have already been eliminated.
> + *
> + * Also, because PK/RI can't include expressions we
> + * sanity check the index is neither of those kinds.
> + */
> + Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));
>
> This comment doesn't make much sense after you have moved the
> corresponding Assert in RelationFindReplTupleByIndex(). Either we
> should move or remove this Assert as well or at least update the
> comments to reflect the latest code.
>

I think removing that Assert is fine after having a more generic
Assert in RelationFindReplTupleByIndex().

I mostly left that comment so that the meaning of
AttributeNumberIsValid() is easier for readers to follow. But now
I'm also leaning towards removing both the comment and the Assert.


>
> 3. When FindLogicalRepUsableIndex() is invoked from
> logicalrep_partition_open(), the current memory context would be
> LogicalRepPartMapContext which would be a long-lived context and we
> allocate memory for indexes in FindLogicalRepUsableIndex() which can
> accumulate over a period of time. So, I think it would be better to
> switch to the old context in logicalrep_partition_open() before
> invoking FindLogicalRepUsableIndex() provided that is not a long-lived
> context.
>
>
>
Hmm, makes sense, that'd avoid any potential leaks that this patch
might bring. Applied your suggestion. Also, looking at the same function
call in logicalrep_rel_open(), that already seems safe regarding leaks. Do
you see any problems with that?



Attached v32. I'll continue replying to the e-mails on this thread with
different
patches. I'm assuming this is easier for you to review such that we have
different
patches for each review. If not, please let me know, I 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-03 Thread Önder Kalacı
Hi Hayato, Amit, all


>
>
> ```
> +   /* Build scankey for every non-expression attribute in the index.
> */
> ```
>
> I think that single line comments should not terminated by ".".
>

Hmm, looking into execReplication.c, many of the single-line comments
are terminated by ".".  Also, on HEAD, the same comment is a single-line
comment. So, I'd rather stick to that?



>
> ```
> +   /* There should always be at least one attribute for the index
> scan. */
> ```
>
> Same as above.
>

Same as above :)


>
>
> ```
> +#ifdef USE_ASSERT_CHECKING
> +   IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> +
> +   Assert(!IsIndexOnlyOnExpression(indexInfo));
> +#endif
> ```
>
> I may misunderstand, but the condition of usable index has alteady been
> checked
> when the oid was set but anyway the you confirmed the condition again
> before
> really using that, right?
> So is it OK not to check another assumption that the index is b-tree,
> non-partial,
> and one column reference?

> IIUC we can do that by adding new function like
> IsIndexUsableForReplicaIdentityFull()
> that checks these condition, and then call at
> RelationFindReplTupleByIndex() if
> idxIsRelationIdentityOrPK is false.
>

I think adding a function like IsIndexUsableForReplicaIdentityFull() is
useful. I can use it inside
FindUsableIndexForReplicaIdentityFull() and assert here. It is also good for
readability.

So, I mainly moved this assert to a more generic place, with a more generic
check, in RelationFindReplTupleByIndex().


>
> 032_subscribe_use_index.pl
>
> ```
> +# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> ...
> +# Testcase end: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX
> ```
>
> There is still non-consistent case.
>
>
Fixed, thanks

Anyway, I see your point, so
> if you want to keep the naming as you proposed at least don't change
> the ordering for get_opclass_input_type() call because that looks odd
> to me.


(A small comment from Amit's previous e-mail)

Sure, applied now.

Attaching v31.

Thanks for the reviews!
Onder KALACI


v31_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-03 Thread Önder Kalacı
Hi Vignesh,

Thanks for the review


> 1) We are currently calling RelationGetIndexList twice, once in
> FindUsableIndexForReplicaIdentityFull function and in the caller too,
> we could avoid one of the calls by passing the indexlist to the
> function or removing the check here, index list check can be handled
> in FindUsableIndexForReplicaIdentityFull.
> +   if (remoterel->replident == REPLICA_IDENTITY_FULL &&
> +   RelationGetIndexList(localrel) != NIL)
> +   {
> +   /*
> +* If we had a primary key or relation identity with a
> unique index,
> +* we would have already found and returned that oid.
> At this point,
> +* the remote relation has replica identity full and
> we have at least
> +* one local index defined.
> +*
> +* We are looking for one more opportunity for using
> an index. If
> +* there are any indexes defined on the local
> relation, try to pick
> +* a suitable index.
> +*
> +* The index selection safely assumes that all the
> columns are going
> +* to be available for the index scan given that
> remote relation has
> +* replica identity full.
> +*/
> +   return FindUsableIndexForReplicaIdentityFull(localrel);
> +   }
> +
>

makes sense, done


>
> 2) Copyright year should be mentioned as 2023
> diff --git a/src/test/subscription/t/032_subscribe_use_index.pl
> b/src/test/subscription/t/032_subscribe_use_index.pl
> new file mode 100644
> index 00..db0a7ea2a0
> --- /dev/null
> +++ b/src/test/subscription/t/032_subscribe_use_index.pl
> @@ -0,0 +1,861 @@
> +# Copyright (c) 2021-2022, PostgreSQL Global Development Group
> +
> +# Test logical replication behavior with subscriber uses available index
> +use strict;
> +use warnings;
> +use PostgreSQL::Test::Cluster;
> +use PostgreSQL::Test::Utils;
> +use Test::More;
> +
>

I changed it to "Copyright (c) 2022-2023", but I'm not sure whether it should
be only 2023 or a range like this.


>
> 3) Many of the tests are using the same tables, we need not
> drop/create publication/subscription for each of the team, we could
> just drop and create required indexes and verify the update/delete
> statements.
> +# 
> +# Testcase start: SUBSCRIPTION USES INDEX
> +#
> +# Basic test where the subscriber uses index
> +# and only updates 1 row and deletes
> +# 1 other row
> +#
> +
> +# create tables pub and sub
> +$node_publisher->safe_psql('postgres',
> +   "CREATE TABLE test_replica_id_full (x int)");
> +$node_publisher->safe_psql('postgres',
> +   "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
> +$node_subscriber->safe_psql('postgres',
> +   "CREATE TABLE test_replica_id_full (x int)");
> +$node_subscriber->safe_psql('postgres',
> +   "CREATE INDEX test_replica_id_full_idx ON
> test_replica_id_full(x)");
>
> +# 
> +# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> +#
> +# This test ensures that after CREATE INDEX, the subscriber can
> automatically
> +# use one of the indexes (provided that it fulfils the requirements).
> +# Similarly, after DROP index, the subscriber can automatically switch to
> +# sequential scan
> +
> +# create tables pub and sub
> +$node_publisher->safe_psql('postgres',
> +   "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
> +$node_publisher->safe_psql('postgres',
> +   "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
> +$node_subscriber->safe_psql('postgres',
> +   "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
>

Well, not all the tables are exactly the same; there are 4-5 different
tables. Mostly the table names are the same.

Plus, the overhead does not seem large enough to justify complicating
the test. Many of the src/test/subscription/t files follow this pattern.

Do you have strong opinions on changing this?


> 4) These additional blank lines can be removed to keep it consistent:
> 4.a)
> +# Testcase end: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
> +# 
> +
> +
> +# 
> +# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS
>
> 4.b)
> +# Testcase end: Unique index that is not primary key or replica identity
> +# 
> +
> +
> +
> +# 
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>

Thanks, fixed.

Attached v30
From 29ec8ec8f9e69db3850fa1f7af83e06baa334910 Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Wed, 22 Feb 2023 14:12:56 +0300
Subject: [PATCH 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-03 Thread Önder Kalacı
Hi Amit, all


> Few comments on 0001
> 
> 1.
> +   such suitable indexes, the search on the subscriber s ide can be
> very inefficient,
>
> unnecessary space in 'side'
>

Fixed in v28


>
> 2.
> -   identity.  If the table does not have any suitable key, then it can be
> set
> -   to replica identity full, which means the entire row
> becomes
> -   the key.  This, however, is very inefficient and should only be used
> as a
> -   fallback if no other solution is possible.  If a replica identity other
> +   identity. When replica identity FULL is specified,
> +   indexes can be used on the subscriber side for searching the rows.
>
> I think it is better to retain the first sentence (If the table does
> not ... entire row becomes the key.) as that says what will be part of
> the key.
>
>
right, that sentence looks useful, added back.


> 3.
> -   comprising the same or fewer columns must also be set on the subscriber
> -   side.  See  for
> details on
> -   how to set the replica identity.  If a table without a replica
> identity is
> -   added to a publication that replicates UPDATE
> +   comprising the same or fewer columns must also be set on the
> subscriber side.
> +   See  for
> +   details on how to set the replica identity.  If a table without a
> replica
> +   identity is added to a publication that replicates
> UPDATE
>
> I don't see any change in this except line length. If so, I don't
> think we should change it as part of this patch.
>
>
Yes, fixed. But the first line (starting with See ...)

> 4.
>  /*
>   * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
> that
>   * is setup to match 'rel' (*NOT* idxrel!).
>   *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns should be used for the index scan.
> + *
> + * This is not generic routine, it expects the idxrel to be
> + * a btree, non-partial and have at least one column
> + * reference (e.g., should not consist of only expressions).
>   *
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * By definition, replication identity of a rel meets all
> + * limitations associated with that. Note that any other
> + * index could also meet these limitations.
>
> The comment changes look quite asymmetric to me. Normally, we break
> the line if the line length goes beyond 80 cols. Please check and
> change other places in the patch if they have a similar symptom.
>

Went over the patch and wrapped each line at ~80 chars.

I'm guessing 80 is not a hard limit; in some cases I went to 81-82 chars.


>
> 5.
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes.
>
> Can we mention partial indexes in the above comment? It seems to me
> that because the required tuple may not satisfy the expression (in the
> case of partial indexes) it may not be easy to support it.
>

Expanded the comment and explained the differences a little further.


>
> 6.
> build_replindex_scan_key()
> {
> ...
> + for (index_attoff = 0; index_attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel);
> + index_attoff++)
> ...
> ...
> +#ifdef USE_ASSERT_CHECKING
> + IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> +
> + Assert(!IsIndexOnlyOnExpression(indexInfo));
> +#endif
> ...
> }
>
> We can avoid building index info multiple times. This can be either
> checked at the beginning of the function outside attribute offset loop
> or we can probably cache it. I understand this is for assert builds
> but seems easy to avoid it doing multiple times and it also looks odd
> to do it multiple times for the same index.
>

Applied your suggestions. Although I do not have strong opinions, I think
it was easier to follow when building the indexInfo in each iteration.


>
> 7.
> - /* Build scankey for every attribute in the index. */
> - for (attoff = 0; attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
> + /* Build scankey for every non-expression attribute in the index. */
> + for (index_attoff = 0; index_attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel);
> + index_attoff++)
>   {
>   Oid operator;
>   Oid opfamily;
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>   RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
>
> I don't think here we need to change variable names if we retain
> mainattno as it is instead of changing it to table_attno. The current
> naming doesn't seem bad for the current usage of the patch.
>

Hmm, I'm actually not convinced that the variable naming on HEAD is good for
the current patch. The main difference is that now we allow indexes like:
   * CREATE INDEX idx ON table(foo(col), col_2)*

(See # Testcase start: SUBSCRIPTION CAN USE INDEXES WITH
EXPRESSIONS AND 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-02 Thread Önder Kalacı
Hi Peter, all

>
> ==
> Commit Message
>
> 1.
> There is no smart mechanism to pick the index. Instead, we choose
> the first index that fulfils the requirements mentioned above.
>
> ~
>
> 1a.
> I think this paragraph should immediately follow the earlier one
> ("With this patch...") which talked about the index requirements.
>
>
makes sense


>
> 1b.
> Slight rewording
>
> SUGGESTION
> If there is more than one index that satisfies these requirements, we
> just pick the first one.
>
>
applied


> ==
> doc/src/sgml/logical-replication.sgml
>
> 2.
> A published table must have a “replica identity” configured in order
> to be able to replicate UPDATE and DELETE operations, so that
> appropriate rows to update or delete can be identified on the
> subscriber side. By default, this is the primary key, if there is one.
> Another unique index (with certain additional requirements) can also
> be set to be the replica identity. When replica identity FULL is
> specified, indexes can be used on the subscriber side for searching
> the rows. These indexes should be btree, non-partial and have at least
> one column reference (e.g., should not consist of only expressions).
> These restrictions on the non-unique index properties are in essence
> the same restrictions that are enforced for primary keys. Internally,
> we follow the same approach for supporting index scans within logical
> replication scope. If there are no such suitable indexes, the search
> on the subscriber s ide can be very inefficient, therefore replica
> identity FULL should only be used as a fallback if no other solution
> is possible. If a replica identity other than “full” is set on the
> publisher side, a replica identity comprising the same or fewer
> columns must also be set on the subscriber side. See REPLICA IDENTITY
> for details on how to set the replica identity. If a table without a
> replica identity is added to a publication that replicates UPDATE or
> DELETE operations then subsequent UPDATE or DELETE operations will
> cause an error on the publisher. INSERT operations can proceed
> regardless of any replica identity.
>
> ~
>
> 2a.
> IMO the replica identity in the first sentence should
> be changed to be replica identity
>



>
> ~
>
> 2b.
> Typo: "subscriber s ide" --> "subscriber side"
>

fixed


> 2c.
> There is still one remaining "full" in this text. I think ought to be
> changed to FULL to match the others.
>
>
changed


> ==
> src/backend/executor/execReplication.c
>
> 3. IdxIsRelationIdentityOrPK
>
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's primary key's index or relation's
> + * replica identity index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>  }
>
> ~
>
> Since the function name mentions RI (1st) and then PK (2nd), and since
> the implementation also has the same order, I think the function
> comment should use the same consistent order when describing what it
> does.
>

alright, done


>
> ==
> src/backend/replication/logical/relation.c
>
> 4. FindUsableIndexForReplicaIdentityFull
>
> +/*
> + * Returns an index oid if there is an index that can be used
> + * via the apply worker. The index should be btree, non-partial
> + * and have at least one column reference (e.g., should
> + * not consist of only expressions). The limitations arise from
> + * RelationFindReplTupleByIndex(), which is designed to handle
> + * PK/RI and these limitations are inherent to PK/RI.
> + *
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes. We should mostly relax the limitations
> + * in RelationFindReplTupleByIndex().
> + *
> + * If no suitable index is found, returns InvalidOid.
> + *
> + * Note that this is not a generic function, it expects REPLICA
> + * IDENTITY FULL for the remote relation.
> + */
>
> ~
>
> 4a.
> Minor rewording of 1st sentence.
>
> BEFORE
> Returns an index oid if there is an index that can be used via the apply
> worker.
>
> SUGGESTION
> Returns the oid of an index that can be used via the apply worker.
>
>
looks better, applied


>
> 4b.
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes. We should mostly relax the limitations
> + * in RelationFindReplTupleByIndex().
>
> I think this paragraph should come later in the comment (just before
> the Note) and should also have "XXX" prefix to indicate it is some
> implementation note for future versions.
>
>
>
done


>
> 5. GetRelationIdentityOrPK
>
> +/*
> + * Get replica identity index or if it is not defined a primary key.
> + *
> + * If neither is defined, returns InvalidOid
> + */
> +Oid
> +GetRelationIdentityOrPK(Relation rel)
> +{
> + Oid idxoid;
> +
> + idxoid = RelationGetReplicaIndex(rel);
> +
> + if 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-02 Thread Önder Kalacı
Hi Hou zj, all


>
> 1.
> +   usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
> +
> "usableIndexContext",
> +
> ALLOCSET_DEFAULT_SIZES);
> +   oldctx = MemoryContextSwitchTo(usableIndexContext);
> +
> +   /* Get index list of the local relation */
> +   indexlist = RelationGetIndexList(localrel);
> +   Assert(indexlist != NIL);
> +
> +   foreach(lc, indexlist)
>
> Is it necessary to create a memory context here ? I thought the memory
> will be
> freed after this apply action soon.
>

Yeah, probably not useful anymore, removed.

In the earlier versions of this patch, this code block was relying on some
planner functions. In that case, it felt safer to use a memory context. Now,
it seems useless.



> 2.
>
> +   /*
> +* Furthermore, because primary key and unique key
> indexes can't
> +* include expressions we also sanity check the
> index is neither
> +* of those kinds.
> +*/
> +   Assert(!IdxIsRelationIdentityOrPK(rel,
> idxrel->rd_id));
>
> It seems you mean "replica identity key" instead of "unique key" in the
> comments.
>

Right, I fixed this comment. Though, are you referring to multiple comments?
I couldn't see any other in the patch. Let me know if you do.


>
>
> 3.
> --- a/src/include/replication/logicalrelation.h
> +++ b/src/include/replication/logicalrelation.h
> ...
> +extern bool IsIndexOnlyOnExpression(IndexInfo *indexInfo);
>
> The definition function seems better to be placed in execReplication.c
>

Hmm, why do you think so? IsIndexOnlyOnExpression() is used in
logical/relation.c, and used for an assertion in execReplication.c.

I think it is better suited to relation.c, but let me know your
perspective as well.


>
> 4.
>
> +extern Oid GetRelationIdentityOrPK(Relation rel);
>
> The function is only used in relation.c, so we can make it a static
> function.
>
>
In the recent iteration of the patch (I think v27), we also use this
function in check_relation_updatable() in logical/worker.c.

One could argue that we could move the definition back to worker.c,
but it feels better suited to relation.c, as the parameter of the
function
is a Relation, and the function is looking for a property of a relation.

Let me know if you think otherwise, I don't have strong opinions
on this.


>
> 5.
>
> +   /*
> +* If index scans are disabled, use a sequential scan.
> +*
> +* Note that we do not use index scans below when enable_indexscan
> is
> +* false. Allowing primary key or replica identity even when index
> scan is
> +* disabled is the legacy behaviour. So we hesitate to move the
> below
> +* enable_indexscan check to be done earlier in this function.
> +*/
> +   if (!enable_indexscan)
> +   return InvalidOid;
>
> Since the document of enable_indexscan says "Enables or disables the query
> planner's use of index-scan plan types. The default is on.", and we don't
> use
> planner here, so I am not sure should we allow/disallow index scan in apply
> worker based on this GUC.
>
>
Given Amit's suggestion on [1], I'm planning to drop this check altogether,
and
rely on table storage parameters.

(I'll incorporate these changes with a patch that I'm going to reply
to Peter's e-mail).

Thanks,
Onder

[1]:
https://www.postgresql.org/message-id/CAA4eK1KP-sV4aER51J-2mELjNzq_zVSLf1%2BW90Vu0feo-thVNA%40mail.gmail.com


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-02 Thread Önder Kalacı
Hi Amit, Shi Yu


> >
> > b. Executed SQL.
> > I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you
> did the
> > same, or just executed 50 consecutive UPDATEs. If the latter one, there
> would be
> > lots of old tuples and this might have a bigger impact on sequential
> scan. I
> > tried this case (which executes 50 consecutive UPDATEs) and also saw
> that the
> > overhead is smaller than before.
>

Alright, I'll do similarly, execute truncate/insert before each update.


> In the above profile number of calls to index_fetch_heap(),
> heapam_index_fetch_tuple() explains the reason for the regression you
> are seeing with the index scan. Because the update will generate dead
> tuples in the same transaction and those dead tuples won't be removed,
> we get those from the index and then need to perform
> index_fetch_heap() to find out whether the tuple is dead or not. Now,
> for sequence scan also we need to scan those dead tuples but there we
> don't need to do back-and-forth between index and heap.


Thanks for the insights, I think what you describe makes a lot of sense.



> I think we can
> once check with more number of tuples (say with 2, 5, etc.)
> for case-1.
>
>
As we'd expect, this test made the performance regression more visible.

I quickly ran case-1 for 50 times with 5 as Shi Yu does, and got
the following results. I'm measuring end-to-end times for running the
whole set of commands:

seq_scan:   00 hr 24 min 42 sec
index_scan: 01 hr 04 min 54 sec


But, I'm still not sure whether we should focus on this regression too
much. In the end, what we are talking about is a case (e.g., all or many
rows are duplicated) where using an index is not a good idea anyway. So,
I doubt users would have such indexes.


>  The quadratic apply performance
> the sequential scans cause, are a much bigger hazard for users than some
> apply performance regression.

Quoting Andres' note, I personally think that the regression for this case
is not a big concern.

> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes. I don't think an option
defaulting
> to false makes sense.

I think we figured out the cause of the performance regression. I don't
think it is small enough to ignore in some scenarios like the above. But
those scenarios seem like synthetic test cases, without much user-facing
impact. Still, I think you are better suited to comment on this.

If you consider this a significant issue, we could also consider the second
patch, so that users could disable index scans for this unlikely scenario.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-26 Thread Önder Kalacı
Hi Amit, all


Wouldn't a table-level option like 'apply_index_scan' be better than a
> subscription-level option with a default value as false? Anyway, the
> bigger point is that we don't see a better way to proceed here than to
> introduce some option to control this behavior.
>

What would be a good API for adding such an option for table-level?
To be more specific, I cannot see any table level sub/pub options in the
docs.

My main motivation for doing it at the subscription level is that (a) it
might be
too much work for users to control the behavior if it is table-level, and
(b) I couldn't
find a good API for table-level, and inventing a new one seemed
like a big change.

Overall, I think it makes sense to disable the feature by default. It is
currently enabled by default, which is good for test coverage for now, but
let me disable it when I push the next version.


>
> I see this as a way to provide this feature for users but I would
> prefer to proceed with this if we can get some more buy-in from senior
> community members (at least one more committer) and some user(s) if
> possible. So, I once again request others to chime in and share their
> opinion.
>
>
Agreed, it would be great to hear some other perspectives on this.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-21 Thread Önder Kalacı
Hi Amit, all

Amit Kapila , 15 Şub 2023 Çar, 07:37 tarihinde
şunu yazdı:

> On Wed, Feb 15, 2023 at 9:23 AM shiy.f...@fujitsu.com
>  wrote:
> >
> > On Sat, Feb 4, 2023 7:24 PM Amit Kapila  wrote:
> > >
> > > On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı 
> wrote:
> > > >
> > > > On v23, I dropped the planner support for picking the index.
> Instead, it simply
> > > > iterates over the indexes and picks the first one that is suitable.
> > > >
> > > > I'm currently thinking on how to enable users to override this
> decision.
> > > > One option I'm leaning towards is to add a syntax like the following:
> > > >
> > > > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> > > >
> > > > Though, that should probably be a seperate patch. I'm going to work
> > > > on that, but still wanted to share v23 given picking the index sounds
> > > > complementary, not strictly required at this point.
> > > >
> > >
> > > I agree that it could be a separate patch. However, do you think we
> > > need some way to disable picking the index scan? This is to avoid
> > > cases where sequence scan could be better or do we think there won't
> > > exist such a case?
> > >
> >
> > I think such a case exists. I tried the following cases based on v23
> patch.
> >
> ...
> > # Result
> > The time executing update (the average of 3 runs is taken, the unit is
> > milliseconds):
> >
> > ++-+-+
> > || patched |  master |
> > ++-+-+
> > | case 1 | 3933.68 | 1298.32 |
> > | case 2 | 1803.46 | 1294.42 |
> > | case 3 | 1380.82 | 1299.90 |
> > | case 4 | 1042.60 | 1300.20 |
> > | case 5 |  691.69 | 1297.51 |
> > | case 6 |  578.50 | 1300.69 |
> > | case 7 |  566.45 | 1302.17 |
> > ++-+-+
> >
> > In cases 1~3, there's overhead after applying the patch. In the other
> > cases, the patch improved the performance. The more duplicate values,
> > the greater the overhead after applying the patch.
> >
>
> I think this overhead seems to be mostly due to the need to perform
> tuples_equal multiple times for duplicate values. I don't know if
> there is any simple way to avoid this without using the planner stuff
> as was used in the previous approach. So, this brings us to the
> question of whether just providing a way to disable/enable the use of
> index scan for such cases is sufficient or if we need any other way.
>
> Tom, Andres, or others, do you have any suggestions on how to move
> forward with this patch?
>
>
Thanks for the feedback and testing. Due to personal circumstances,
I could not reply to the thread in the last two weeks, but I'll be more
active going forward.

 I also agree that we should have a way to control the behavior.

I created another patch (v24_0001_optionally_disable_index.patch) which can be
applied on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch.

The new patch adds a new *subscription parameter*, named *enable_index_scan*,
for both CREATE and ALTER SUBSCRIPTION. The setting takes effect only when
REPLICA IDENTITY is full.

What do you think about such a patch to control the behavior? It does not
give per-relation control, but it is still useful for many cases.

(Note that I'll be working on the other feedback in the email thread; I
wanted to send this earlier to hear some early thoughts on
v24_0001_optionally_disable_index.patch.)


v24_0001_optionally_disable_index.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-02 Thread Önder Kalacı
Hi all,

Thanks for the feedback!

I agree that it won't be a very convenient option for the user but how
> about along with asking for an index from the user (when the user
> didn't provide an index), we also allow to make use of any unique
> index over a subset of the transmitted columns,


To be honest, I cannot see why you would use REPLICA IDENTITY FULL if you can
already create a unique index. Aren't you supposed to use REPLICA IDENTITY ..
USING INDEX in that case (if not simply the primary key)?

That seems like a potential expansion of this patch, but I don't consider it
essential. Given that it is hard to get even small commits in, I'd rather
wait to see what you think before making such a change.


> and if there's more
> than one candidate index pick any one. Additionally, we can allow
> disabling the use of an index scan for this particular case. If we are
> too worried about API change for allowing users to specify the index
> then we can do that later or as a separate patch.
>
>
On v23, I dropped the planner support for picking the index. Instead, it
simply iterates over the indexes and picks the first one that is suitable.

I'm currently thinking about how to enable users to override this decision.
One option I'm leaning towards is to add a syntax like the following:

*ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...*

Though, that should probably be a separate patch. I'm going to work
on that, but still wanted to share v23, given that picking the index is
complementary, not strictly required at this point.

Thanks,
Onder


v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-27 Thread Önder Kalacı
Hi Marco, Tom,

> But still it doesn't seem to me to be appropriate to use the planner to
find a suitable index.

As Marco noted, here we are trying to pick an index that is non-unique. We
could pick the index based on information extracted from pg_index (or similar
catalogs), but that would be a premature selection. Before sending the patch
to pgsql-hackers, I initially tried to find a suitable one with such an
approach.

But then, I still ended up using costing functions (and some other low-level
functions). Overall, it felt like the planner is the module that makes this
decision best. Why would we try to invent another, immature way of doing
this? With that reasoning, I ended up using the related planner functions
directly.

However, I assume the current approach of using low-level functions in the
> common case was chosen for performance reasons.
>

That's partially the reason. If you look at the patch, we use the planner
(or the low-level functions) infrequently. It is only called when the
logical replication relation cache is rebuilt. As far as I can see, that
happens with (auto)ANALYZE, DDL, etc. I expect these to be infrequent
operations. Still, I wanted to make sure we do not create too much overhead
even if there are frequent invalidations.

The main reason for using the low level functions over the planner itself
is to have some more control over the decision. For example, due to the
execution limitations, we currently cannot allow an index that consists of
only expressions (similar to pkey restriction). With the current approach,
we can easily filter those out.

Also, another minor reason is that, if we use planner, we'd get a
PlannedStmt back. It also felt weird to check back the index used from a
PlannedStmt. In the current patch, we iterate over Paths, which seems more
intuitive to me.


> I suppose the options are:
> 1. use regular planner uniformly
> 2. use regular planner only when there's no replica identity (or
> configurable?)
> 3. only use low-level functions
> 4. keep using sequential scans for every single updated row
> 5. introduce a hidden logical row identifier in the heap that is
> guaranteed unique within a table and can be used as a replica identity when
> no unique index exists
>

One other option I considered was to explicitly ask the user for the index on
the subscriber side when REPLICA IDENTITY is FULL. But that is a pretty hard
choice for any user; even the planner sometimes fails to pick the right
index :)  Also, it is probably controversial to change any of the APIs for
this purpose.

I'd be happy to hear from more experienced hackers on the trade-offs for
the above, and I'd be open to work on that if there is a clear winner. For
me (3) is a decent solution for the problem.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-09 Thread Önder Kalacı
Hi,

Thank you for the useful comments!


> I took a very brief look through this.  I'm not too pleased with
> this whole line of development TBH.  It seems to me that the core
> design of execReplication.c and related code is "let's build our
> own half-baked executor and much-less-than-half-baked planner,
> because XXX".  (I'm not too sure what XXX was, really, but apparently
> somebody managed to convince people that that is a sane and
> maintainable design.)


This gave me a broader perspective on the whole of execReplication.c.
Before your comment, I had not thought about why there is specific
logic for the execution of logical replication.

I tried to read the initial commit that adds execReplication.c
(665d1fad99e7b11678b0d5fa24d2898424243cd6)
and the main relevant mail thread (Logical Replication WIP), but I couldn't
find any rationale for this decision. Maybe I'm missing something?

Regarding the planner: as far as I can speculate, before my patch there was
probably no need for any planner infrastructure. The reason seems to be that
logical replication needs either a sequential scan for REPLICA IDENTITY FULL
or an index scan for the primary key / unique index.  I'm not suggesting
that we shouldn't use the planner at all, just trying to understand the
design choices that were made earlier.


Now this patch has decided that it *will*
> use the real planner, or at least portions of it in some cases.
> If we're going to do that ISTM we ought to replace all the existing
> not-really-a-planner logic, but this has not done that; instead
> we have a large net addition to the already very duplicative
> replication code, with weird restrictions because it doesn't want
> to make changes to the half-baked executor.
>

That sounds like one good perspective on the restrictions that this patch
adds. From my perspective, I wanted to fit into the existing
execReplication.c, which only works for primary keys / unique keys. And, if
you look closely, the restrictions I suggest are the same as (or similar to)
the restrictions of REPLICA IDENTITY ... USING INDEX.
I hope/assume this is no surprise to you and not too hard to explain to
the users.


>
> I think we should either live within the constraints set by this
> overarching design, or else nuke execReplication.c from orbit and
> start using the real planner and executor.  Perhaps the foreign
> key enforcement mechanisms could be a model --- although if you
> don't want to buy into using SPI as well, you probably should look
> at Amit L's work at [1].
>

This sounds like a good long-term plan to me. Are you also suggesting doing
that before this patch?

I think such a change is a non-trivial / XL project, which I could likely
not achieve by myself in a reasonable time frame.


>
> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
> sanely defined in the first place?  It looks to me that
> RelationFindReplTupleSeq assumes without proof that there is a unique
> full-tuple match, but that is only reasonable to assume if there is at
> least one unique index (and maybe not even then, if nulls are involved).
>

In general, it is fine for RelationFindReplTupleSeq not to match any tuples.
So, I'm not sure uniqueness is required.

Even if there are multiple matches, RelationFindReplTupleSeq does only one
change at
a time. My understanding is that if there are multiple matches on the
source, they are
generated as different messages, and each message triggers
RelationFindReplTupleSeq.



> If there is a unique index, why can't that be chosen as replica identity?
> If there isn't, what semantics are we actually providing?
>

I'm not sure I can fully follow this question. In this patch, I'm trying to
allow non-unique
indexes to be used in the subscription. And, the target could have multiple
indexes.

So, the semantics is that we automatically allow users to be able to use
non-unique
indexes on the subscription side even if the replica identity is full on
the source.

The reasons we (a) use the planner and (b) do not ask users which index to
use are that it would be very inconvenient for any user to pick among
multiple indexes on the subscription.

If there is a unique index, the expectation is that the user would pick
REPLICA IDENTITY .. USING INDEX or just make it the primary key.
In those cases, this patch would not interfere with the existing logic.


> What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
> defined as "the subscriber can pick any unique index to match on,
> and is allowed to fail if the table has none".  Or if "fail" is a bridge
> too far for you, we could fall back to the existing seqscan logic.
> But thumbing through the existing indexes to find a non-expression unique
> index wouldn't require invoking the full planner.  Any candidate index
> would result in a plan 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-12-12 Thread Önder Kalacı
Hi,

Thanks for the heads-up.


> Needs another rebase, I think:
>
> https://cirrus-ci.com/task/559244463758
>
> [05:44:22.102] FAILED:
> src/backend/postgres_lib.a.p/replication_logical_worker.c.o
> [05:44:22.102] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include
> -I../src/include -Isrc/include/storage -Isrc/include/utils
> -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes
> -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute
> -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local
> -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation
> -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ
> src/backend/postgres_lib.a.p/replication_logical_worker.c.o -MF
> src/backend/postgres_lib.a.p/replication_logical_worker.c.o.d -o
> src/backend/postgres_lib.a.p/replication_logical_worker.c.o -c
> ../src/backend/replication/logical/worker.c
> [05:44:22.102] ../src/backend/replication/logical/worker.c: In function
> ‘get_usable_indexoid’:
> [05:44:22.102] ../src/backend/replication/logical/worker.c:2101:36: error:
> ‘ResultRelInfo’ has no member named ‘ri_RootToPartitionMap’
> [05:44:22.102]  2101 |   TupleConversionMap *map =
> relinfo->ri_RootToPartitionMap;
> [05:44:22.102]   |^~
>
>
Yes, it seems commit fb958b5da86da69651f6fb9f540c2cfb1346cdc5 broke
the build and commit a61b1f74823c9c4f79c95226a461f1e7a367764b broke the
tests. But the fixes were trivial. All tests pass again.

Attached v22.

Onder KALACI


v22_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-11-11 Thread Önder Kalacı
Hi hackers,

I rebased the changes onto the current master branch, applied pgindent
suggestions, and also made a few minor style changes.

I also tested the patch in combination with a few new PG 15 features (such
as row/column filters in logical replication, NULLS NOT DISTINCT indexes,
etc.), as well as some things that I hadn't tested before, such
as publish_via_partition_root.

I have not added those tests to the regression tests, as the existing tests
of this patch are already bulky and I don't see a specific reason to add
all combinations. Still, if anyone thinks it is a good idea to add
more tests, I can do that. For reference, here are the tests that I ran
manually: More Replication Index Tests:
https://gist.github.com/onderkalaci/fa91688dea968e4024623feb4ddb627f

Attached v21.

Onder KALACI





v21_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [Proposal] Add foreign-server health checks infrastructure

2022-11-11 Thread Önder Kalacı
Hi Hayato Kuroda,


> I (and my company) worried about overnight batch processing that
> contains some accesses to foreign servers. If a transaction is opened
> overnight and one of the foreign servers crashes during it, the
> transaction must be rolled back. But there is a possibility that DBAs
> do not recognize the crash, and they waste time until the morning.
> This problem may affect the customer's business.
> (It may not be sufficient to check the status from another server;
> DBAs must also check the network between the databases, and that may
> be overlooked.) This was our motivation.
>

Thanks for the clarification, I agree that this might be a valid concern
for systems.


>
> So how about implementing a check function as an SQL function once and
> update incrementally?
>

I think a SQL function makes sense.


> This still satisfy our motivation and it can avoid overhead because we can
> reduce the number of calling it.
>

Yes, it makes sense. Also, it allows other extensions to utilize the new
libpq function, which is neat.


> If we decide that we establish a new connection in the checking function,
> we can refactor the it.
> And if we decide that we introduce health-check BGworker, then we can add
> a process that calls implemented function periodically.
>

Right, your approach doesn't conflict with a more sophisticated approach,
in fact is a useful building block.


>
> Note that poll() is used here, it means that currently this function can
> be used on some limited platforms.
>
>
I think more experienced hackers could guide us here. I don't see a problem
with that, but checking other functions like pqSocketPoll(), I see that
Postgres uses the select system call. I wonder if we can have something
similar with select? It seems not, but I wanted to bring it up in case you
know more about that.


> I have added a parameter check_all that controls the scope of
> to-be-checked servers,
> But this is not related with my motivation so we can remove if not needed.
>

I guess it kind of makes sense to have the flexibility to check all
connections vs. only the connections involved in a transaction.

Though, maybe we should follow a similar approach
to postgres_fdw_disconnect(servername) / postgres_fdw_disconnect_all()

postgres_fdw_verify_connection_states(servername) /
postgres_fdw_verify_connection_states_all()

That seems like a more natural API considering the other UDFs. You can also
use in combination with postgres_fdw_get_connections()


> (I have not implemented another version that uses epoll() or kqueue(),
> because they do not seem to be called at the libpq layer. Do you know
> of any reason?)
>
>
Hmm, I don't know the reason. Perhaps epoll is available on a smaller number
of platforms, and libpq, being a client library, needs to run on more
platforms?

Now, some comments regarding the v18:

>
> +static int
> +pqConncheck_internal(int sock)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + struct pollfd input_fd;
> +
> + input_fd.fd = sock;
> + input_fd.events = POLLRDHUP;
> + input_fd.revents = 0;
> +
+ poll(&input_fd, 1, 0);
> +
> + return input_fd.revents & POLLRDHUP;


WaitEventSetWaitBlock's* defined(WAIT_USE_POLL)* branch uses the following
check to find WL_SOCKET_CLOSED

#ifdef POLLRDHUP
if ((cur_event->events & WL_SOCKET_CLOSED) &&
(cur_pollfd->revents & (POLLRDHUP | errflags)))
{
/* remote peer closed, or error */
occurred_events->events |= WL_SOCKET_CLOSED;
}
#endif

Where *errflags = POLLHUP | POLLERR | POLLNVAL;*

So, should we also be using all these flags like: input_fd.events =
POLLRDHUP | *errflags*;  ?


+ ereport(ERROR,
> + (errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not connect to server \"%s\"",
> + server->servername),
> + errdetail("Socket close is detected."),
> + errhint("Please check the health of the server.")));


Is erroring out always necessary? Maybe we should just return true/false
and let the caller decide what to do. For example, one might want to roll
back to a savepoint and retry. If the caller wants to roll back the whole
transaction, that is possible anyway.

Maybe similar to postgres_fdw_disconnect(), we can warn if the connection
is involved in a tx (e.g., depth > 0)?

new file mode 100644
> index 00..3ce169c837
> --- /dev/null
> +++ b/contrib/postgres_fdw/expected/postgres_fdw_1.out
> @@ -0,0 +1,11615 @@
> +-- ===
> +-- create FDW objects
> +-- ===
> +CREATE EXTENSION postgres_fdw;
> +CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
> +DO $d$
> +BEGIN


Do we really need this new file, or is it just an oversight in your patch?

Thanks,
Onder KALACI

Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-21 Thread Önder Kalacı
Hi Shi yu, all


> In execReplication.c:
>
> +   TypeCacheEntry **eq = NULL; /* only used when the index is not
> unique */
>
> Maybe the comment here should be changed. Now it is used when the index is
> not
> primary key or replica identity index.
>
>
makes sense, updated


> 2.
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates one row via index";
>
> The message doesn't seem right,  should it be changed to "Timed out while
> waiting for creating index test_replica_id_full_idx"?
>

yes, updated


>
> 3.
> +# now, ingest more data and create index on column y which has higher
> cardinality
> +# then create an index on column y so that future commands uses the index
> on column
> +$node_publisher->safe_psql('postgres',
> +   "INSERT INTO test_replica_id_full SELECT 50, i FROM
> generate_series(0,3100)i;");
>
> The comment say "create (an) index on column y" twice, maybe it can be
> changed
> to:
>
> now, ingest more data and create index on column y which has higher
> cardinality,
> so that future commands will use the index on column y
>
>
fixed


> 4.
> +# deletes 200 rows
> +$node_publisher->safe_psql('postgres',
> +   "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates 200 rows via index";
>
> It would be better to call wait_for_catchup() after DELETE. (And some other
> places in this file.)
>

Hmm, I cannot follow this easily.

Why do you think wait_for_catchup() should be called? In general, I tried
to follow a pattern where we call poll_query_until() so that we are sure
that all the changes are replicated via the index. And then, an
additional check with `is($result, ..` such that we also verify the
correctness of the data.

One alternative could be to use wait_for_catchup() and then have multiple
`is($result, ..` to check both pg_stat_all_indexes and the correctness of
the data.

One minor advantage I see with the current approach is that every
`is($result, ..)` adds one step to the test. So, if I used `is($result, ..)`
for the pg_stat_all_indexes queries, I'd be adding multiple steps for a
single test. It felt more natural/common to test roughly once with
`is($result, ..)` in each test, or at least not to add additional ones for
the pg_stat_all_indexes checks.



> Besides, the "updates" in the message should be "deletes".
>
>
fixed


> 5.
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes
> where indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates partitioned table";
>
> Maybe we should say "updates partitioned table with index" in this message.
>
>
Fixed

Attached v20.

Thanks!

Onder KALACI


v20_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-19 Thread Önder Kalacı
Hi,

> > As far as I can think of, it should probably be a single background task
> > checking whether the server is down. If so, sending an invalidation
> > message to all the backends such that related backends could act on the
> > invalidation and throw an error. This is to cover the use-case you
> > described on [1].
>
> Indeed your approach covers the use case I said, but I'm not sure whether
> it is really good.
> In your approach, the background worker process will manage all foreign
> servers at once.
> It may be OK if there are a few servers, but if there are hundreds of
> servers, the interval between checks will become longer.
>

I expect users will typically have many more backends than servers. We could
have a threshold for spinning up a new bg worker (e.g., every 10 servers
gets a new bg worker). Still, I think that'd be an optimization that is
probably not necessary for the majority of users.


> Currently, each FDW can decide whether we do health checks or not per the
> backend.
> For example, we can skip health checks if the foreign server is not used
> now.
> The background worker cannot control such a way.
> Based on the above, I do not agree that we introduce a new background
> worker and make it to do a health check.
>

Again, the definition of "health check" is probably different for me. I'd
expect the health check to happen continuously, ideally keeping track of
how many consecutive times it succeeded and/or last time it
failed/succeeded etc.

A transaction failing with a bad error message (or holding some resources
locally until the transaction is committed) doesn't sound essential to me.
Is there any specific workload you are referring to that would be optimized
by rolling back a transaction earlier if a remote server dies? What kind of
workload would benefit from that? Maybe there is one, but it is not clear to
me and I haven't seen it discussed on the thread (sorry if I missed it).

I'm trying to understand if we are trying to solve a problem that does not
really exist. I bring this up because I often deal with architectures where
a local node and a remote transaction live on different Postgres servers,
and I have not encountered many (or any) patterns that would benefit much
from this change. In fact, I think, on the contrary, this might add
significant overhead for OLTP-type, high-query-throughput systems.


> Moreover, methods to connect to foreign servers and check health are
> different per FDW.
> In terms of mysql_fdw [1], we must do mysql_init() and
> mysql_real_connect().
> About file_fdw, we do not have to connect, but developers may want to
> calculate checksum and compare.
> Therefore, we must provide callback functions anyway.
>
>
I think providing callback functions is useful in any case. Each FDW (or,
more generally, each extension) should be able to provide its own "health
check" function.

Thanks,
Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-18 Thread Önder Kalacı
Hi Wang, all


> And I have another confusion about function
> GetCheapestReplicaIdentityFullPath:
> If rel->pathlist is NIL, could we return NULL directly from this function,
> and
> then set idxoid to InvalidOid in function
> FindUsableIndexForReplicaIdentityFull
> in that case?
>
>
We could, but then we would need to move some other checks elsewhere.
I find the current flow easier to follow, where everything happens
via cheapest_total_path, which is a natural field for this purpose.

Do you have a strong opinion on this?


> ===
>
> Here are some comments for test file  032_subscribe_use_index.pl on v18
> patch:
>
> 1.
> ```
> +# Basic test where the subscriber uses index
> +# and only updates 1 row for and deletes
> +# 1 other row
> ```
> There seems to be an extra "for" here.
>

 Fixed


> 2. Typos for subscription name in the error messages.
> tap_sub_rep_full_0 -> tap_sub_rep_full
>
>
Fixed


> 3. Typo in comments
> ```
> +# use the newly created index (provided that it fullfils the
> requirements).
> ```
> fullfils -> fulfils
>
>
Fixed


> 4. Some extra single quotes at the end of the error message ('").
> For example:
> ```
> # wait until the index is used on the subscriber
> $node_subscriber->poll_query_until(
> 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates 200 rows via index'";
> ```
>

All fixed, thanks



>
> 5. The column names in the error message appear to be a typo.
> ```
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-1";
> ...
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-3";
> ...
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-4";
> ```
> It seems that we need to do the following change: 'column-3' -> 'column-1'
> and
> 'column-4' -> 'column-2'.
> Or we could use the column names directly like this: 'column-1' -> 'column
> a',
> 'column_3' -> 'column a' and 'column_4' -> 'column b'.
>

I think the latter is easier to follow, thanks.


>
> 6. DELETE action is missing from the error message.
> ```
> +# 2 rows from first command, another 2 from the second command
> +# overall index_on_child_1_a is used 4 times
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
> indexrelname = 'index_on_child_1_a';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates child_1 table'";
> ```
> I think we execute both UPDATE and DELETE for child_1 here. Could we add
> DELETE
> action to this error message?
>
>
makes sense, added


> 7. Table name in the error message.
> ```
> # check if the index is used even when the index has NULL values
> $node_subscriber->poll_query_until(
> 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates parent table'";
> ```
> It seems to be "test_replica_id_full" here instead of "parent'".
>
fixed as well.


Attached v19.

Thanks,
Onder KALACI


v19_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-14 Thread Önder Kalacı
Hi,

Thanks for the review!


Here are some comments on v17 patch.
>
> 1.
> -LogicalRepRelMapEntry *
> +LogicalRepPartMapEntry *
>  logicalrep_partition_open(LogicalRepRelMapEntry *root,
>   Relation partrel,
> AttrMap *map)
>  {
>
> Is there any reason to change the return type of
> logicalrep_partition_open()? It
> seems ok without this change.
>

I think you are right, I probably needed that in some of my
earlier iterations of the patch, but now it seems redundant. Reverted back
to the original version.


>
> 2.
>
> +* of the relation cache entry (e.g., such as ANALYZE or
> +* CREATE/DROP index on the relation).
>
> "e.g." and "such as" mean the same. I think we remove one of them.
>

fixed


>
> 3.
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
> +
>
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idy';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
>
>
> "for'check" -> "for check"
>

fixed


>
> 3.
> +$node_subscriber->safe_psql('postgres',
> +   "SELECT pg_reload_conf();");
> +
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
> +# 
> +
> +$node_subscriber->stop('fast');
> +$node_publisher->stop('fast');
> +
>
> "Testcase start" in the comment should be "Testcase end".
>
>
fixed


> 4.
> There seems to be a problem in the following scenario, which results in
> inconsistent data between publisher and subscriber.
>
> -- publisher
> CREATE TABLE test_replica_id_full (x int, y int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
>
> -- subscriber
> CREATE TABLE test_replica_id_full (x int, y int);
> CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
>
> -- publisher
> INSERT INTO test_replica_id_full VALUES (NULL,1);
> INSERT INTO test_replica_id_full VALUES (NULL,2);
> INSERT INTO test_replica_id_full VALUES (NULL,3);
> update test_replica_id_full SET x=1 where y=2;
>
> The data in publisher:
> postgres=# select * from test_replica_id_full order by y;
>  x | y
> ---+---
>| 1
>  1 | 2
>| 3
> (3 rows)
>
> The data in subscriber:
> postgres=# select * from test_replica_id_full order by y;
>  x | y
> ---+---
>| 2
>  1 | 2
>| 3
> (3 rows)
>
> There is no such problem on master branch.
>
>
Uff, the second problem reported regarding NULL values for this patch (both
by you). First, v18 contains the fix for the problem. It turns out that my
idea of treating all unique indexes (pkey, replica identity and unique
regular indexes) the same proved to be wrong.  The former two require all
the involved columns to have NOT NULL. The latter not.

This resulted in RelationFindReplTupleByIndex() to skip tuples_equal() for
regular unique indexes (e.g., non pkey/replid). Hence, the first NULL value
is considered the matching tuple. Instead, we should be doing a full tuple
equality check (e.g., tuples_equal). This is what v18 does. Also, add the
above scenario as a test.

I think we can probably skip tuples_equal() for unique indexes that consist
of only NOT NULL columns. However, that seems like an over-optimization. If
you have such a unique index, why not create a primary key anyway?  That's
why I don't see much value in complicating the code for that use case.
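To make the failure mode concrete, here is a small stand-alone simulation (Python, purely illustrative — not the patch's C code, and all names are invented) of why the subscriber must re-check index matches with a full NULL-aware tuple comparison:

```python
# Stand-alone simulation (invented names; not PostgreSQL code) of the bug:
# a plain unique index may contain several NULL entries, so the first index
# match is not necessarily the tuple the publisher changed. A full
# NULL-aware comparison (the role tuples_equal() plays) picks the right one.

def index_candidates(rows, col, key):
    """Simulate an index scan on one column; a NULL key matches NULL entries
    (as an IS NULL scan would)."""
    return [r for r in rows if (r[col] is None and key is None) or r[col] == key]

def tuples_equal(a, b):
    """Full-tuple check: equal when every column matches, treating a pair of
    NULLs in the same column as a match."""
    return all((x is None and y is None) or x == y for x, y in zip(a, b))

# Subscriber rows (x, y) with a unique index on x (which allows many NULLs).
rows = [(None, 1), (None, 2), (None, 3)]
remote_old = (None, 2)  # the tuple the publisher actually updated

buggy_pick = index_candidates(rows, 0, remote_old[0])[0]          # no re-check
fixed_pick = next(t for t in index_candidates(rows, 0, remote_old[0])
                  if tuples_equal(t, remote_old))                 # re-checked

assert buggy_pick == (None, 1)   # wrong row would get the update
assert fixed_pick == (None, 2)   # the row that actually changed
```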

Thanks for the review & testing. I'll focus more on the NULL values on my
own testing as well. Still, I wanted to push my changes so that you can
also have a look if possible.

Attach v18.

Onder KALACI


v18_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-12 Thread Önder Kalacı
Hi,




> ~~~
> 01. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT
> USES AFTER ANALYZE
>
> ```
> # show that index_b is not used
> $node_subscriber->poll_query_until(
> 'postgres', q{select idx_scan=0 from pg_stat_all_indexes where
> indexrelname = 'index_b';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-2";
> ```
>
> poll_query_until() is still remained here, it should be replaced to is().
>
>
>
Updated

02. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>
> ```
> # show that the unique index on replica identity is used even when
> enable_indexscan=false
> $result = $node_subscriber->safe_psql('postgres',
> "select idx_scan from pg_stat_all_indexes where indexrelname =
> 'test_replica_id_full_idx'");
> is($result, qq(0), 'ensure subscriber has not used index with
> enable_indexscan=false');
> ```
>
> Is the comment wrong? The index test_replica_id_full_idx is not used here.
>
>
Yeah, the comment is wrong. It is a copy & paste error from the other test.
Fixed now


>
>
> 03. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH
> ENABLE_INDEXSCAN
>
> ```
> $node_publisher->safe_psql('postgres',
> "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX
> test_replica_id_full_unique;");
> ```
>
> I was not sure why ALTER TABLE REPLICA IDENTITY USING INDEX was done on
> the publisher side.
> IIUC this feature works when REPLICA IDENTITY FULL is specified on a
> publisher,
> so it might not be altered here. If so, an index does not have to define
> on the publisher too.
>
>
Yes, it is not strictly necessary, but it is often the case that the
subscriber and the publisher have similar schemas when a unique index/pkey
is used. For example, see t/028_row_filter.pl where we follow this pattern.

Still, I manually tried that without the index on the publisher (e.g.,
replica identity full), that works as expected. But given that the majority
of the tests already have that approach and this test focuses on
enable_indexscan, I think I'll keep it as is - unless it is confusing?


> 04. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH
> ENABLE_INDEXSCAN
>
> ```
> $node_subscriber->poll_query_until(
> 'postgres', q{select (idx_scan=1) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_unique'}
> ) or die "Timed out while waiting ensuring subscriber used unique index as
> replica identity even with enable_indexscan=false'";
> ```
>
> 03 comment should be added here.
>

Yes, done that as well.


Attached v17 now. Thanks for the review!


v17_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-12 Thread Önder Kalacı
Hi,

Sounds reasonable. Do you mean that we can add additional GUC like
> "postgres_fdw.initial_check",
> wait WL_SOCKET_CLOSED if the conneciton is found in the hash table, and do
> reconnection if it might be closed, right?
>
>
Alright, it took me some time to realize that postgres_fdw already has a
retry mechanism if the first command fails: postgres_fdw: reestablish new
connection if cached one is detected as… · postgres/postgres@32a9c0b
(github.com)



Still, the reestablish mechanism can be further simplified with
WL_SOCKET_CLOSED event such as the following (where we should probably
rename pgfdw_connection_check_internal):

 /*
  * If the connection needs to be remade due to invalidation, disconnect as
  * soon as we're out of all transactions.
  */
+bool remoteSocketIsClosed = entry->conn != NULL ?
+    pgfdw_connection_check_internal(entry->conn) : false;

 if (entry->conn != NULL && (entry->invalidated || remoteSocketIsClosed) &&
     entry->xact_depth == 0)
 {
     elog(DEBUG3, "closing connection %p for option changes to take effect",
          entry->conn);
     disconnect_pg_server(entry);
 }
+else if (remoteSocketIsClosed && entry->xact_depth > 0)
+    error("Remote Server is down ...");


In other words, a variation of pgfdw_connection_check_internal()
could potentially go into interfaces/libpq/libpq-fe.h
(backend/libpq/pqcomm.c or src/interfaces/libpq/fe-connect.c).
Then, GetConnection() in postgres_fdw, it can force to reconnect as it is
already done for some cases or error properly:


>
>  Based on off and on discussions, I modified my patch.
>
>
I still think that it is probably too much work/code to detect the
mentioned use-case you described on [1]. Each backend constantly
calling CallCheckingRemoteServersCallbacks() for this purpose doesn't sound
the optimal way to approach the "check whether server down" problem. You
typically try to decide whether a server is down by establishing a
connection (or ping etc), not going over all the existing connections.

As far as I can think of, it should probably be a single background task
checking whether the server is down. If so, sending an invalidation message
to all the backends such that related backends could act on the
invalidation and throw an error. This is to cover the use-case you
described on [1].

Also, maybe we could have a new catalog table like pg_foreign_server_health
or such, where we can keep the last time the check succeeded (and/or
failed), and how many times the check  succeeded (and/or failed).

This is of course how I would approach this problem. I think some other
perspectives on this would be very useful to hear.

Thanks,
Onder KALACI


  [1]
https://www.postgresql.org/message-id/TYAPR01MB58662809E678253B90E82CE5F5889%40TYAPR01MB5866.jpnprd01.prod.outlook.com


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-11 Thread Önder Kalacı
Hi Kuroda Hayato,


> ===
> 01. relation.c - GetCheapestReplicaIdentityFullPath
>
> ```
>  * The reason that the planner would not pick partial indexes and
> indexes
>  * with only expressions based on the way currently
> baserestrictinfos are
>  * formed (e.g., col_1 = $1 ... AND col_N = $2).
> ```
>
> Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1
> ... AND attrN = $N".
>
>
Yes, it is a typo, fixed now.


> ===
> 02. 032_subscribe_use_index.pl
>
> If a table has a primary key on the subscriber, it will be used even if
> enable_indexscan is false(legacy behavior).
> Should we test it?
>
>
Yes, good idea. I added two tests, one test that we cannot use regular
indexes when index scan is disabled, and another one that we use replica
identity index when index scan is disabled. This is useful to make sure if
someone changes the behavior can see the impact.


> ~~~
> 03. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER
> CREATE/DROP INDEX
>
> I think this test seems to be not trivial, so could you write down the
> motivation?
>

makes sense, done


>
> ~~~
> 04. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER
> CREATE/DROP INDEX
>
> ```
> # wait until the index is created
> $node_subscriber->poll_query_until(
> 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0
> updates one row via index";
> ```
>
> CREATE INDEX is a synchronous behavior, right? If so we don't have to wait
> here.
> ...And the comment in case of die may be wrong.
> (There are some cases like this)
>

It is not about CREATE INDEX being async. It is about pg_stat_all_indexes
being async. If we do not wait, the tests become flaky, because sometimes
the update has not been reflected in the view immediately.

This is explained here: PostgreSQL: Documentation: 14: 28.2. The Statistics
Collector 

*When using the statistics to monitor collected data, it is important to
> realize that the information does not update instantaneously. Each
> individual server process transmits new statistical counts to the collector
> just before going idle; so a query or transaction still in progress does
> not affect the displayed totals. Also, the collector itself emits a new
> report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless
> altered while building the server). So the displayed information lags
> behind actual activity. However, current-query information collected by
> track_activities is always up-to-date.*
>



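Since the statistics lag behind the activity, the tests have to retry the check instead of asserting once. A minimal sketch of that poll-until pattern (in Python, purely for exposition — the actual tests use the TAP framework's poll_query_until):

```python
import threading
import time

def poll_until(check, timeout=5.0, interval=0.05):
    """Retry check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Simulated lagging statistics view: the counter is only "reported" later,
# just as pg_stat_all_indexes trails the actual index scans.
stats = {"idx_scan": 0}
threading.Timer(0.3, lambda: stats.update(idx_scan=2)).start()

assert poll_until(lambda: stats["idx_scan"] == 2)  # passes once stats catch up
```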
>
> ~~~
> 05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE
> ROWS
>
> ```
> # Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> #
> # Basic test where the subscriber uses index
> # and touches 50 rows with UPDATE
> ```
>
> "touches 50 rows with UPDATE" -> "updates 50 rows", per other tests.
>

fixed


> ~~~
> 06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT
> USES AFTER ANALYZE
>
> I think this test seems to be not trivial, so could you write down the
> motivation?
> (Same as Re-calclate)
>

sure, done


>
> ~~~
> 07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT
> USES AFTER ANALYZE
>
> ```
> # show that index_b is not used
> $node_subscriber->poll_query_until(
> 'postgres', q{select idx_scan=0 from pg_stat_all_indexes where
> indexrelname = 'index_b';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-2";
> ```
>
> I think we don't have to wait here, is() should be used instead.
> poll_query_until() should be used only when idx_scan>0 is checked.
> (There are some cases like this)
>

Yes, makes sense


>
> ~~~
> 08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED
> TABLES
>
> ```
> # make sure that the subscriber has the correct data
> $node_subscriber->poll_query_until(
> 'postgres', q{select sum(user_id+value_1+value_2)=550070 AND
> count(DISTINCT(user_id,value_1, value_2))=981 from users_table_part;}
> ) or die "ensure subscriber has the correct data at the end of the test";
> ```
>
>
Ah, for this case, we already have is() checks for the same results; this
is just a left-over from the earlier iterations.


> I think we can replace it to wait_for_catchup() and is()...
> Moreover, we don't have to wait here because in above line we wait until
> the index is used on the subscriber.
> (There are some cases like this)
>

Fixed a few more such cases.

Thanks for the review! Attached v16.

Onder KALACI
From 1ebc170f821f290283aadd10c8c04480ad203580 Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Tue, 17 May 2022 10:47:39 +0200
Subject: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full
 on the publisher

Using 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-07 Thread Önder Kalacı
Hi Kuroda Hayato,


> In your patch:
>
> ```
> +   /* Simple case, we already have a primary key or a replica
> identity index */
> +   idxoid = GetRelationIdentityOrPK(localrel);
> +   if (OidIsValid(idxoid))
> +   return idxoid;
> +
> +   /*
> +* If index scans are disabled, use a sequential scan.
> +*
> +* Note that we still allowed index scans above when there is a
> primary
> +* key or unique index replica identity, but that is the legacy
> behaviour
> +* (even when enable_indexscan is false), so we hesitate to move
> this
> +* enable_indexscan check to be done earlier in this function.
> +*/
> ```
>
> And the paragraph " Note that we..." should be at above of
> GetRelationIdentityOrPK().
> Future readers will read the function from top to bottom,
> and when they read around GetRelationIdentityOrPK() they may be confused.
>
>
Ah, makes sense, now I applied your feedback (with some different wording).


> The inlined comment in the function has a similar comment. Is that clear
> > enough?
> > /* * Generate restrictions for all columns in the form of col_1 = $1 AND
> *
> > col_2 = $2 ... */
>
> Actually I missed it, but I still think that whole of emulated SQL should
> be clarified.


Alright, it makes sense. I added the emulated SQL to the function comment
of GetCheapestReplicaIdentityFullPath.


> And followings are the comments for v14. They are mainly about comments.
>
> ===
> 01. relation.c - logicalrep_rel_open
>
> ```
> +   /*
> +* Finding a usable index is an infrequent task. It occurs
> when an
> +* operation is first performed on the relation, or after
> invalidation
> +* of the relation cache entry (e.g., such as ANALYZE).
> +*/
> +   entry->usableIndexOid =
> FindLogicalRepUsableIndex(entry->localrel, remoterel);
> ```
>
> I thought you can mention CREATE INDEX in the comment.
>
> According to your analysis [1] the relation cache will be invalidated if
> users do CREATE INDEX
> At that time the hash entry will be removed
> (logicalrep_relmap_invalidate_cb) and "usable" index
> will be checked again.
>

Yes, that is right. I think it makes sense to mention that as well. In
fact, I also decided to add such a test.

I realized that all tests use ANALYZE for re-calculation of the index. Now,
I added an explicit test that uses CREATE/DROP index to re-calculate the
index.

see # Testcase start: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP
INDEX.



> ~~~
> 02. relation.c - logicalrep_partition_open
>
> ```
> +   /*
> +* Finding a usable index is an infrequent task. It occurs when an
> +* operation is first performed on the relation, or after
> invalidation of
> +* the relation cache entry (e.g., such as ANALYZE).
> +*/
> +   entry->usableIndexOid = FindLogicalRepUsableIndex(partrel,
> remoterel);
> +
> ```
>
> Same as above
>
>
done


> ~~~
> 03. relation.c - GetIndexOidFromPath
>
> ```
> +   if (path->pathtype == T_IndexScan || path->pathtype ==
> T_IndexOnlyScan)
> +   {
> +   IndexPath  *index_sc = (IndexPath *) path;
> +
> +   return index_sc->indexinfo->indexoid;
> +   }
> ```
>
> I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite
> trivial?
>

Looking at the PG code, I couldn't see any place that asserts the
information. That seems like fundamental information that is never invalid.

Btw, even if it returns InvalidOid for some reason, we'd not be crashing.
Only not able to use any indexes, fall back to seq. scan.


>
> ~~~
> 04. relation.c - IndexOnlyOnExpression
>
> This method just returns "yes" or "no", so the name of method should be
> start "Has" or "Is".
>

Yes, it seems like that is a common convention.


> ~~~
> 05. relation.c - SuitablePathsForRepIdentFull
>
> ```
> +/*
> + * Iterates over the input path list and returns another
> + * path list that includes index [only] scans where paths
> + * with non-btree indexes, partial indexes or
> + * indexes on only expressions have been removed.
> + */
> ```
>
> These lines seems to be around 60 columns. Could you expand around 80?
>

done


>
> ~~~
> 06. relation.c - SuitablePathsForRepIdentFull
>
> ```
> +   RelationindexRelation;
> +   IndexInfo  *indexInfo;
> +   boolis_btree;
> +   boolis_partial;
> +   boolis_only_on_expression;
> +
> +   indexRelation = index_open(idxoid,
> AccessShareLock);
> +   indexInfo = BuildIndexInfo(indexRelation);
> +   is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> +   is_partial = (indexInfo->ii_Predicate != NIL);
> +   is_only_on_expression =
> IndexOnlyOnExpression(indexInfo);
> +  

Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,



> If the checking function is called not periodically but GetConnection(),
> it means that the health of foreign servers will be check only when remote
> connections are used.
> So following workload described in [1] cannot handle the issue.
>
> BEGIN --- remote operations--- local operations --- COMMIT
>
>
As far as I can see this patch is mostly useful for detecting the failures
on the initial remote command. This is especially common when the remote
server does a failover/switchover and postgres_fdw uses a cached connection
to access to the remote server.

The remote server failures within a transaction block sounds like a much
less common problem. Still, you can give a good error message before COMMIT
is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure
if it is worthwhile to fix such a problem.


> > Can we have this function/logic on Postgres core, so that other
> extensions
> > can also use?
>
> I was not sure about any use-case, but I think it can because it does
> quite general things.
> Is there any good motivation to do that?
>

I think any extension that deals with multiple Postgres nodes can benefit
from such logic. In fact, the reason I noticed this patch is that in the
Citus extension we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets
can benefit from such a function.


> > Do you see any performance implications of creating/freeing waitEventSets
> > per check? I wonder if we can somehow re-use the same waitEventSet by
> > modifyWaitEvent? I guess no, but still, if this check causes a
> performance
> > implication, can we somehow cache 1 waitEventSet per connection?
>
> I have not tested yet, but I agreed this will be caused performance
> decrease.
> In next version first I will re-use the event set anyway, and it must be
> considered later.
> Actually I'm not sure your suggestion,
> but you mean to say that we can add a hash table that associates  PGconn
> and WaitEventSet,  right?
>
>
I think it also depends on where you decide to put
pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API
for pgfdw_connection_check_internal() -- or equivalent function -- could be
discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
else use what is passed to the function? Not sure, maybe you can come up
with a better API.

Thanks,
Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci ·
Pull Request #6259 · citusdata/citus (github.com)

[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by
marcocitus · Pull Request #6283 · citusdata/citus (github.com)



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,

Thanks for the patch. I think there are some non-fdw extensions out there
which could benefit from this logic. That's why I want to first learn some
more about high-level design/goals of the patch a little more.

+/*
> + * Call callbacks for checking remote servers.
> + */
> +void
> +CallCheckingRemoteServersCallbacks(void)


Why do we run this periodically, but not when a specific connection is
going to be used? Wouldn't running it periodically prevent detecting some
already-closed sockets at the time of the connection used (e.g., we checked
10 seconds ago, the server died 5 seconds ago)?

In other words, what is the trade-off for calling
pgfdw_connection_check_internal() inside GetConnection() when we are about
to use a "cached" connection? I think that might simplify the patch as well.

+/*
> + * Helper function for pgfdw_connection_check
> + */
> +static bool
> +pgfdw_connection_check_internal(PGconn *conn)
> +{


Can we have this function/logic on Postgres core, so that other extensions
can also use?

 + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);


What if PQsocket(conn) returns -1? Maybe we move certain connection state
checks into pgfdw_connection_check_internal() such that it is more generic?
I can think of checks like: conn!=NULL, PQsocket(conn) != PGINVALID_SOCKET,
PQstatus == CONNECTION_OK


+ eventset = CreateWaitEventSet(CurrentMemoryContext, 1);
> + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);
> +
> + WaitEventSetWait(eventset, 0, &events, 1, 0);
> +
> + if (events.events & WL_SOCKET_CLOSED)
> + {
> + FreeWaitEventSet(eventset);
> + return false;
> + }
> + FreeWaitEventSet(eventset);


Do you see any performance implications of creating/freeing waitEventSets
per check? I wonder if we can somehow re-use the same waitEventSet by
modifyWaitEvent? I guess no, but still, if this check causes a performance
implication, can we somehow cache 1 waitEventSet per connection?
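For intuition, a WL_SOCKET_CLOSED-style check boils down to a zero-timeout poll of the socket. Here is a stand-alone sketch (Python, not the WaitEventSet API) of the underlying idea; a real check must also cope with data still pending on a closed connection:

```python
# Illustrative only: detect whether the peer end of a socket has been
# closed, using a zero-timeout readiness check. If the socket is readable
# but a peek returns zero bytes, the peer has closed the connection.
import select
import socket

def remote_socket_is_closed(sock):
    """Return True if the peer end of sock has been closed."""
    readable, _, _ = select.select([sock], [], [], 0)  # zero timeout
    if not readable:
        return False  # nothing to read: connection looks alive
    # Readable with no data means EOF, i.e. the peer closed the socket.
    # (A real check must also handle pending unread data.)
    return sock.recv(1, socket.MSG_PEEK) == b""

a, b = socket.socketpair()
assert not remote_socket_is_closed(a)  # peer still open
b.close()
assert remote_socket_is_closed(a)      # peer end closed
```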


Thanks,
Onder KALACI
Developing the Citus extension @Microsoft


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Önder Kalacı
Hi David, Tom, all,


> I've just pushed the disable byref Datums patch I posted earlier. I
> only made a small adjustment to make use of the TupleDescAttr() macro.
> Önder, thank you for the report.
>
>
With this commit, I re-run the query patterns where we observed the
problem, all looks good now. Wanted to share this information as fyi.

Thanks for the quick turnaround!

Onder KALACI


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-29 Thread Önder Kalacı
Hi Hayato Kuroda,

Thanks for the review!


> 
> 01 relation.c - general
>
> Many files are newly included.
> I was not sure but some codes related with planner may be able to move to
> src/backend/optimizer/plan.
> How do you and any other one think?
>
>
My thinking on those functions is that they should probably stay
in src/backend/replication/logical/relation.c. My main motivation is that
those functions are so much tailored to the purposes of this file that I
cannot see any use-case for these functions in any other context.

Still, at some point, I considered maybe doing something similar
to src/backend/executor/execReplication.c, where I create a new file,
say, src/backend/optimizer/plan/planReplication.c or such as you noted. I'm
a bit torn on this.

Does anyone have any strong opinions for moving to
src/backend/optimizer/plan/planReplication.c? (or another name)


> 02 relation.c - FindLogicalRepUsableIndex
>
> ```
> +/*
> + * Returns an index oid if we can use an index for the apply side. If not,
> + * returns InvalidOid.
> + */
> +static Oid
> +FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation
> *remoterel)
> ```
>
> I grepped files, but I cannot find the word "apply side". How about
> "subscriber" instead?
>

Yes, it makes sense. I guess I made up the "apply side" as there is the
concept of "apply worker". But, yes, subscribers sound better, updated.


>
> 03 relation.c - FindLogicalRepUsableIndex
>
> ```
> +   /* Simple case, we already have an identity or pkey */
> +   idxoid = GetRelationIdentityOrPK(localrel);
> +   if (OidIsValid(idxoid))
> +   return idxoid;
> +
> +   /*
> +* If index scans are disabled, use a sequential scan.
> +*
> +* Note that we still allowed index scans above when there is a
> primary
> +* key or unique index replica identity, but that is the legacy
> behaviour
> +* (even when enable_indexscan is false), so we hesitate to move
> this
> +* enable_indexscan check to be done earlier in this function.
> +*/
> +   if (!enable_indexscan)
> +   return InvalidOid;
> ```
>
> a.
> I think "identity or pkey" should be "replica identity key or primary key"
> or "RI or PK"
>

Looking into other places, it seems "replica identity index" is favored
over "replica identity key". So, I used that term.

You can see this pattern in RelationGetReplicaIndex()


>
> b.
> Later part should be at around GetRelationIdentityOrPK.
>

Hmm, I cannot follow this comment. Can you please clarify?


>
>
> 04 relation.c - FindUsableIndexForReplicaIdentityFull
>
> ```
> +   MemoryContext usableIndexContext;
> ...
> +   usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
> +
> "usableIndexContext",
> +
> ALLOCSET_DEFAULT_SIZES);
> ```
>
> I grepped other sources, and I found that the name like "tmpcxt" is used
> for the temporary MemoryContext.
>

I think there are also several contexts that are named more specifically,
such as new_pdcxt, perTupCxt, anl_context, cluster_context and many others.

So, I think it is better to have specific names, no?


>
> 05 relation.c - SuitablePathsForRepIdentFull
>
> ```
> +   indexRelation = index_open(idxoid,
> AccessShareLock);
> +   indexInfo = BuildIndexInfo(indexRelation);
> +   is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> +   is_partial = (indexInfo->ii_Predicate != NIL);
> +   is_only_on_expression =
> IndexOnlyOnExpression(indexInfo);
> +   index_close(indexRelation, NoLock);
> ```
>
> Why the index is closed with NoLock? AccessShareLock is acquired, so
> shouldn't same lock be released?
>

Hmm, yes you are right. Keeping the lock seems unnecessary and wrong. It
could actually have prevented dropping an index. However, given
that RelationFindReplTupleByIndex() also closes this index at the end, the
apply worker releases the lock. Hence, no problem observed.

Anyway, I'm changing it to release the lock.

Also note that as soon as any index is dropped on the relation, the cache
is invalidated and suitable indexes are re-calculated. That's why it seems
fine to release the lock.


>
>
> 06 relation.c - GetCheapestReplicaIdentityFullPath
>
> IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND
> attrN = $N" is emulated, right?
> you can write explicitly it as comment
>
>
The inlined comment in the function has a similar comment. Is that clear
enough?

/* * Generate restrictions for all columns in the form of col_1 = $1 AND *
col_2 = $2 ... */

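The shape of that emulated query can be sketched mechanically; the helper below is a hypothetical illustration (invented name, not the patch's code) of the per-column equality restrictions being described:

```python
# Hypothetical helper (invented name) showing the restriction list built
# for a REPLICA IDENTITY FULL relation: one equality per column, i.e. the
# emulated query SELECT ... FROM tbl WHERE col_1 = $1 AND ... AND col_N = $N.
def emulated_restrictions(columns):
    """Build the WHERE clause text for the emulated per-column equalities."""
    return " AND ".join(f"{col} = ${i}" for i, col in enumerate(columns, start=1))

assert emulated_restrictions(["x", "y", "z"]) == "x = $1 AND y = $2 AND z = $3"
```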

> 07 relation.c - GetCheapestReplicaIdentityFullPath
>
> ```
> +   Path   *path = (Path *) lfirst(lc);
> +   Oid idxoid = GetIndexOidFromPath(path);
> +
> +   if (!OidIsValid(idxoid))
> +   {
> +   /* Unrelated Path, 

Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Önder Kalacı
Hi,

Thanks for replying so quickly!

> I run your test here with a fix attached.
>
> Can you retake your test with the patch attached?

Unfortunately, with the patch, I still see the memory usage increase and
get the OOMs.

Thanks,
Onder KALACI


A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Önder Kalacı
Hi hackers,

With PG 15 (rc1 or beta4), I'm observing an interesting memory pattern. I
have not seen a similar discussion on the mailing list. If I missed that,
please refer me there. The problem that I'm going to explain does not
happen on PG 13/14.

It seems like there is a memory leak(?) with $title. Still, I'm not sure
what is going on, and thought it'd be useful to share at least my initial
investigation.

After running the query and waiting a few minutes (see steps to repro
below), use pg_log_backend_memory_contexts() to get the contexts of the
backend executing the command. See that it goes beyond 100GB. And depending
on vm.overcommit_memory, you get an OOM error or OOM crash eventually.

```
2022-09-28 17:33:38.155 CEST [32224] LOG:  level: 2; PortalContext: 1024
total in 1 blocks; 592 free (0 chunks); 432 used: 
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 3; ExecutorState:
*114923929600* total in 13710 blocks; 7783264 free (3 chunks); 114916146336
used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 4; TupleSort main: 8192
total in 1 blocks; 3928 free (0 chunks); 4264 used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 5; TupleSort sort: 295096
total in 8 blocks; 256952 free (67 chunks); 38144 used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 6; Caller tuples: 8192
total in 1 blocks (0 chunks); 7992 free (0 chunks); 200 used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 4; TupleSort main: 8192
total in 1 blocks; 3928 free (0 chunks); 4264 used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 5; TupleSort sort:
4309736 total in 18 blocks; 263864 free (59 chunks); 4045872 used
2022-09-28 17:33:38.159 CEST [32224] LOG:  level: 6; Caller tuples: 8192
total in 1 blocks (0 chunks); 7992 free (0 chunks); 200 used
...
2022-09-28 17:33:38.160 CEST [32224] LOG:  Grand total: *114930446784*
bytes in 13972 blocks; 8802248 free (275 chunks); 114921644536 used
```

I observed this with a merge join involving a table and a set-returning
function. To simulate the problem with two tables, I use the following
steps:

```
CREATE TABLE t1 (a text);
CREATE TABLE t2 (a text);

-- make the text a little large by adding 1000
INSERT INTO t1 SELECT (1000+i%1000)::text FROM
generate_series(0,1000) i;

-- make the text a little large by adding 1000
INSERT INTO t2 SELECT (1000+i%1)::text FROM
generate_series(0,1000) i;

-- to simplify the explain plan, not strictly necessary
SET max_parallel_workers_per_gather TO 0;

-- these two settings are necessary to trigger the problem:
-- they force a Merge Join and prevent materialization
SET enable_hashjoin TO false;
SET enable_material TO false;

-- the join is on a TEXT column
-- when the join is on INT column with a similar setup, I do not observe
this problem
SELECT count(*) FROM t1 JOIN t2 USING (a);
```


The explain output for the query like the following:
```
explain SELECT count(*) FROM t1 JOIN t2 USING (a);
┌──────────────────────────────────────────────────────────────────────────────┐
│                                  QUERY PLAN                                   │
├──────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=177735283.36..177735283.37 rows=1 width=8)                   │
│   ->  Merge Join  (cost=2556923.81..152703372.24 rows=10012764448 width=0)    │
│         Merge Cond: (t1.a = t2.a)                                             │
│         ->  Sort  (cost=1658556.19..1683556.63 rows=1175 width=13)            │
│               Sort Key: t1.a                                                  │
│               ->  Seq Scan on t1  (cost=0.00..154056.75 rows=1175 width=13)   │
│         ->  Sort  (cost=1658507.28..1683506.93 rows=861 width=13)             │
│               Sort Key: t2.a                                                  │
│               ->  Seq Scan on t2  (cost=0.00..154053.61 rows=861 width=13)    │
└──────────────────────────────────────────────────────────────────────────────┘
(9 rows)
```

In the end, my investigation mostly got me to the following palloc(), where
we seem to allocate memory over and over again as memory grows:
```
(gdb) bt
#0  __GI___libc_malloc (bytes=bytes@entry=8388608) at malloc.c:3038
#1  0x5589f3c55444 in AllocSetAlloc (context=0x5589f4896300, size=14)
at aset.c:920
#2  0x5589f3c5d763 in palloc (size=size@entry=14) at mcxt.c:1082
#3  0x5589f3b1f553 in datumCopy (value=94051002161216,
typByVal=typByVal@entry=false,
typLen=) at datum.c:162
#4  0x5589f3c6ed0b in tuplesort_getdatum (state=state@entry
=0x5589f49274e0,
forward=forward@entry=true, val=0x5589f48d7860, isNull=0x5589f48d7868,
abbrev=abbrev@entry=0x0)
at tuplesort.c:2675
#5  0x5589f3947925 in ExecSort (pstate=0x5589f48d0a38) at nodeSort.c:200
#6  0x5589f393d74c in ExecProcNode (node=0x5589f48d0a38)
at ../../../src/include/executor/executor.h:259
#7  ExecMergeJoin (pstate=0x5589f4896cc8) at nodeMergejoin.c:871
#8  0x5589f391fbc8 in ExecProcNode (node=0x5589f4896cc8)
at ../../../src/include/executor/executor.h:259
#9  fetch_input_tuple 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-22 Thread Önder Kalacı
Hi Wang wei,

>
> 1. In the function GetCheapestReplicaIdentityFullPath.
> +   if (rel->pathlist == NIL)
> +   {
> +   /*
> +* A sequential scan could have been dominated by by an
> index scan
> +* during make_one_rel(). We should always have a
> sequential scan
> +* before set_cheapest().
> +*/
> +   Path   *seqScanPath = create_seqscan_path(root, rel,
> NULL, 0);
> +
> +   add_path(rel, seqScanPath);
> +   }
>
> This is a question I'm not sure about:
> Do we need this part to add sequential scan?
>
> I think in our case, the sequential scan seems to have been added by the
> function make_one_rel (see function set_plain_rel_pathlist).


Yes, the sequential scan is added during make_one_rel.


> If I am missing
> something, please let me know. BTW, there is a typo in above comment: `by
> by`.
>

As the comment mentions, the sequential scan could have been dominated &
removed by index scan, see add_path():

> * We also remove from the rel's pathlist any old paths that are dominated
> * by new_path --- that is, new_path is cheaper, at least as well ordered,
> * generates no more rows, requires no outer rels not required by the old
> * path, and is no less parallel-safe.

Still, I agree that the comment could be improved, which I pushed.


> 2. In the file execReplication.c.
> +#ifdef USE_ASSERT_CHECKING
> +#include "catalog/index.h"
> +#endif
>  #include "commands/trigger.h"
>  #include "executor/executor.h"
>  #include "executor/nodeModifyTable.h"
>  #include "nodes/nodeFuncs.h"
>  #include "parser/parse_relation.h"
>  #include "parser/parsetree.h"
> +#ifdef USE_ASSERT_CHECKING
> +#include "replication/logicalrelation.h"
> +#endif
>
> I think it's fine to only add `logicalrelation.h` here, because `index.h`
> has
> been added by `logicalrelation.h`.
>
>
Makes sense, removed thanks.

Attached v13.


v13_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-22 Thread Önder Kalacı
Hi Peter, Kuroda

kuroda.hay...@fujitsu.com wrote on Wed, 21 Sep 2022 at 04:21:

> > One last thing - do you think there is any need to mention this
> > behaviour in the pgdocs, or is OK just to be a hidden performance
> > improvement?
>
> FYI - I put my opinion.
> We have following sentence in the logical-replication.sgml:
>
> ```
> ...
> If the table does not have any suitable key, then it can be set
>to replica identity full, which means the entire row
> becomes
>the key.  This, however, is very inefficient and should only be used as
> a
>fallback if no other solution is possible.
> ...
> ```
>
> Here the word "very inefficient" may mean that sequential scans will be
> executed every time.
> I think some descriptions can be added around here.
>

Making a small edit in that file makes sense. I'll attach v13 in the next
email that also includes this change.

Also, do you think this is a good time for me to mark the patch "Ready for
Committer" in the commit fest? I'm not sure when or who should change the
state, but it seems I can change it. I couldn't find any documentation on
how that process should work.

Thanks!


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-20 Thread Önder Kalacı
Hi Peter,



>
> 1. src/backend/executor/execReplication.c - build_replindex_scan_key
>
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * This is not generic routine, it expects the idxrel to be an index
> + * that planner would choose if the searchslot includes all the columns
> + * (e.g., REPLICA IDENTITY FULL on the source).
>   */
> -static bool
> +static int
>  build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
>   TupleTableSlot *searchslot)
>
>
> (I know this is not caused by your patch but maybe fix it at the same
> time?)
>
> "This is not generic routine, it expects..." -> "This is not a generic
> routine - it expects..."
>
>
Fixed


>
> 2.
>
> + IndexInfo  *indexInfo PG_USED_FOR_ASSERTS_ONLY;
> +
> + /*
> + * This attribute is an expression, and
> + * SuitablePathsForRepIdentFull() was called earlier while the
> + * index for subscriber is selected. There, the indexes comprising
> + * *only* expressions have already been eliminated.
> + *
> + * We sanity check this now.
> + */
> +#ifdef USE_ASSERT_CHECKING
> + indexInfo = BuildIndexInfo(idxrel);
> + Assert(!IndexOnlyOnExpression(indexInfo));
> +#endif
>
> 2a.
> "while the index for subscriber is selected..." -> "when the index for
> the subscriber was selected...”
>
>
fixed


> ~
>
> 2b.
> Because there is only one declaration in this code block you could
> simplify this a bit if you wanted to.
>
> SUGGESTION
> /*
>  * This attribute is an expression, and
>  * SuitablePathsForRepIdentFull() was called earlier while the
>  * index for subscriber is selected. There, the indexes comprising
>  * *only* expressions have already been eliminated.
>  *
>  * We sanity check this now.
>  */
> #ifdef USE_ASSERT_CHECKING
> IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> Assert(!IndexOnlyOnExpression(indexInfo));
> #endif
>
>
Makes sense, no reason to declare above


> ~~~
>
> 3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>
> + /* Start an index scan. */
> + scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
>  retry:
>   found = false;
>
> It might be better to have a blank line before that ‘retry’ label,
> like in the original code.
>

agreed, fixed


>
> ==
>
> 4. src/backend/replication/logical/relation.c
>
> +/* see LogicalRepPartMapEntry for details in logicalrelation.h */
>  static HTAB *LogicalRepPartMap = NULL;
>
> Personally, I'd word that something like:
> "/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
> logicalrelation.h */"
>
> but YMMV.
>

I also don't have any strong opinions on that, updated to your suggestion.


>
> ~~~
>
> 5. src/backend/replication/logical/relation.c -
> GenerateDummySelectPlannerInfoForRelation
>
> +/*
> + * This is not a generic function, helper function for
> + * GetCheapestReplicaIdentityFullPath. The function creates
> + * a dummy PlannerInfo for the given relationId as if the
> + * relation is queried with SELECT command.
> + */
> +static PlannerInfo *
> +GenerateDummySelectPlannerInfoForRelation(Oid relationId)
>
> (mentioned this one in my previous post)
>
> "This is not a generic function, helper function" -> "This is not a
> generic function. It is a helper function"
>

Yes, applied.


>
> ~~~
>
> 6. src/backend/replication/logical/relation.c -
> GetCheapestReplicaIdentityFullPath
>
> +/*
> + * Generate all the possible paths for the given subscriber relation,
> + * for the cases that the source relation is replicated via REPLICA
> + * IDENTITY FULL. The function returns the cheapest Path among the
> + * eligible paths, see SuitablePathsForRepIdentFull().
> + *
> + * The function guarantees to return a path, because it adds sequential
> + * scan path if needed.
> + *
> + * The function assumes that all the columns will be provided during
> + * the execution phase, given that REPLICA IDENTITY FULL guarantees
> + * that.
> + */
> +static Path *
> +GetCheapestReplicaIdentityFullPath(Relation localrel)
>
>
> "for the cases that..." -> "for cases where..."
>
>
sounds good


> ~~~
>
> 7.
>
> + /*
> + * Currently it is not possible for planner to pick a partial index or
> + * indexes only on expressions. We still want to be explicit and eliminate
> + * such paths proactively.
>
> "for planner..." -> "for the planner..."
>

fixed


>
> ==
>
> 8. .../subscription/t/032_subscribe_use_index.pl - general
>
> 8a.
> (remove the 'we')
> "# we wait until..." -> "# wait until..." X many occurrences
>
> ~
>
> 8b.
> (remove the 'we')
> "# we show that..." -> “# show that..." X many occurrences
>

Ok, removed all "we"s in the test

>
> ~~~
>
> 9.
>
> There is inconsistent wording for some of your test case start/end comments
>
> 9a.
> e.g.
> start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS
>
> ~
>
> 9b.
> e.g.
> start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> end: SUBSCRIPTION USES INDEX 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-20 Thread Önder Kalacı
Hi Peter,

Thanks for the quick response.


> 1. Commit message
>
> It looks like some small mistake happened. You wrote [1] that my
> previous review comments about the commit message were fixed, but it
> seems the v11 commit message is unchanged since v10.
>
>
Oops, yes you are right, I forgot to push commit message changes. I'll
incorporate all these suggestions on v12.



> ==
>
> 2. src/backend/replication/logical/relation.c -
> GenerateDummySelectPlannerInfoForRelation
>
> +/*
> + * This is not a generic function, helper function for
> + * GetCheapestReplicaIdentityFullPath. The function creates
> + * a dummy PlannerInfo for the given relationId as if the
> + * relation is queried with SELECT command.
> + */
> +static PlannerInfo *
> +GenerateDummySelectPlannerInfoForRelation(Oid relationId)
>
> "generic function, helper function" -> "generic function. It is a
> helper function"
>
>
Fixed.

I'll attach the changes in the next email with v12.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-19 Thread Önder Kalacı
Hi Peter,

Thanks again for the review, see my comments below:


>
> ==
>
> 1. Commit message
>
> It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
> because it leads to full table scan per tuple change on the subscription.
> This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
> some small number of use cases.
>
> ~
>
> The "often not feasible" part seems repeated by the "impracticable" part.
>


> SUGGESTION
> Using `REPLICA IDENTITY FULL` on the publication leads to a full table
> scan per tuple change on the subscription. This makes `REPLICA
> IDENTITY FULL` impracticable -- probably other than some small number
> of use cases.
>
> ~~~
>

Sure, this is easier to follow, updated.


>
> 2.
>
> The Majority of the logic on the subscriber side already exists in
> the code.
>
> "Majority" -> "majority"
>
>
fixed


> ~~~
>
> 3.
>
> The ones familiar
> with this part of the code could realize that the sequential scan
> code on the subscriber already implements the `tuples_equal()`
> function.
>
> SUGGESTION
> Anyone familiar with this part of the code might recognize that...
>
> ~~~
>

Yes, this is better, applied


>
> 4.
>
> In short, the changes on the subscriber is mostly
> combining parts of (unique) index scan and sequential scan codes.
>
> "is mostly" -> "are mostly"
>
> ~~~
>
>
applied


> 5.
>
> From the performance point of view, there are few things to note.
>
> "are few" -> "are a few"
>
>
applied


> ==
>
> 6. src/backend/executor/execReplication.c - build_replindex_scan_key
>
> +static int
>  build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
>   TupleTableSlot *searchslot)
>  {
> - int attoff;
> + int index_attoff;
> + int scankey_attoff = 0;
>
> Should it be called 'skey_attoff' for consistency with the param 'skey'?
>
>
That looks better, updated


> ~~~
>
> 7.
>
>   Oid operator;
>   Oid opfamily;
>   RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>
> Maybe the 'optype' should be adjacent to the other Oid opXXX
> declarations just to keep them all together?
>

I do not have any preference on this. Although I do not see such a strong
pattern in the code, I have no objection to doing so; changed.

~~~
>
> 8.
>
> + if (!AttributeNumberIsValid(table_attno))
> + {
> + IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
> +
> + /*
> + * There are two cases to consider. First, if the index is a primary or
> + * unique key, we cannot have any indexes with expressions. So, at this
> + * point we are sure that the index we are dealing with is not these.
> + */
> + Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
> +RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
> +
> + /*
> + * At this point, we are also sure that the index is not consisting
> + * of only expressions.
> + */
> +#ifdef USE_ASSERT_CHECKING
> + indexInfo = BuildIndexInfo(idxrel);
> + Assert(!IndexOnlyOnExpression(indexInfo));
> +#endif
>
> I was a bit confused by the comment. IIUC the code has already called
> the FilterOutNotSuitablePathsForReplIdentFull some point prior so all
> the unwanted indexes are already filtered out. Therefore these
> assertions are just for no reason, other than sanity checking that
> fact, right? If my understand is correct perhaps a simpler single
> comment is possible:
>

Yes, these are for sanity checks.


>
> SUGGESTION (or something like this)
> This attribute is an expression, however
> FilterOutNotSuitablePathsForReplIdentFull was called earlier during
> [...] and the indexes comprising only expressions have already been
> eliminated. We sanity check this now. Furthermore, because primary key
> and unique key indexes can't include expressions we also sanity check
> the index is neither of those kinds.
>
> ~~~
>

I agree that we can improve comments here. I incorporated your suggestion
as well.


>
> 9.
> - return hasnulls;
> + /* We should always use at least one attribute for the index scan */
> + Assert (scankey_attoff > 0);
>
> SUGGESTION
> There should always be at least one attribute for the index scan.
>

applied


>
> ~~~
>
> 10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>
> ScanKeyData skey[INDEX_MAX_KEYS];
> IndexScanDesc scan;
> SnapshotData snap;
> TransactionId xwait;
> Relation idxrel;
> bool found;
> TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
> bool indisunique;
> int scankey_attoff;
>
> 10a.
> Should 'scankey_attoff' be called 'skey_attoff' for consistency with
> the 'skey' array?
>

Yes, it makes sense as you suggested on build_replindex_scan_key

>
> ~
>
> 10b.
> Also, it might be tidier to declare the 'skey_attoff' adjacent to the
> 'skey'.
>

moved

>
> ==
>
> 11. 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-19 Thread Önder Kalacı
Hi Hayato Kuroda,

Thanks for the review, please see my reply below:


> ===
> For execRelation.c
>
> 01. RelationFindReplTupleByIndex()
>
> ```
> /* Start an index scan. */
> InitDirtySnapshot(snap);
> -   scan = index_beginscan(rel, idxrel, &snap,
> -
> IndexRelationGetNumberOfKeyAttributes(idxrel),
> -  0);
>
> /* Build scan key. */
> -   build_replindex_scan_key(skey, rel, idxrel, searchslot);
> +   scankey_attoff = build_replindex_scan_key(skey, rel, idxrel,
> searchslot);
>
> +   scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
> ```
>
> I think "/* Start an index scan. */" should be just above
> index_beginscan().
>

moved there


>
> ===
> For worker.c
>
> 02. sable_indexoid_internal()
>
> ```
> + * Note that if the corresponding relmapentry has InvalidOid
> usableIndexOid,
> + * the function returns InvalidOid.
> + */
> +static Oid
> +usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo
> *relinfo)
> ```
>
> "InvalidOid usableIndexOid" should be "invalid usableIndexOid,"
>

makes sense, updated


>
> 03. check_relation_updatable()
>
> ```
>  * We are in error mode so it's fine this is somewhat slow. It's
> better to
>  * give user correct error.
>  */
> -   if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
> +   if (OidIsValid(rel->usableIndexOid))
> {
> ```
>
> Shouldn't we change the above comment to? The check is no longer slow.
>

Hmm, I hadn't noticed this comment earlier. So you suggest "slow" here
refers to the additional function call "GetRelationIdentityOrPK"? If so,
yes, I'll update that.


>
> ===
> For relation.c
>
> 04. GetCheapestReplicaIdentityFullPath()
>
> ```
> +static Path *
> +GetCheapestReplicaIdentityFullPath(Relation localrel)
> +{
> +   PlannerInfo *root;
> +   Query  *query;
> +   PlannerGlobal *glob;
> +   RangeTblEntry *rte;
> +   RelOptInfo *rel;
> +   int attno;
> +   RangeTblRef *rt;
> +   List *joinList;
> +   Path *seqScanPath;
> ```
>
> I think the part that constructs dummy-planner state can be move to
> another function
> because that part is not meaningful for this.
> Especially line 824-846 can.
>
>
Makes sense, simplified the function. Though, it is always hard to pick
good names for these kinds of helper functions. I
picked GenerateDummySelectPlannerInfoForRelation(), does that sound good to
you as well?


>
> ===
> For 032_subscribe_use_index.pl
>
> 05. general
>
> ```
> +# insert some initial data within the range 0-1000
> +$node_publisher->safe_psql('postgres',
> +   "INSERT INTO test_replica_id_full SELECT i%20 FROM
> generate_series(0,1000)i;"
> +);
> ```
>
> It seems that the range of initial data seems [0, 19].
> Same mistake-candidates are found many place.
>

Ah, several copy & paste errors. Fixed (hopefully) all.


>
> 06. general
>
> ```
> +# updates 1000 rows
> +$node_publisher->safe_psql('postgres',
> +   "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
> ```
>
> Only 50 tuples are modified here.
> Same mistake-candidates are found many place.
>

Alright, yes there were several wrong comments in the tests. I went over
the tests once more to fix those and improve comments.


>
> 07. general
>
> ```
> +# we check if the index is used or not
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3
> updates 200 rows via index";
> ```
> The query will be executed until the index scan is finished, but it may be
> not commented.
> How about changing it to "we wait until the index used on the
> subscriber-side." or something?
> Same comments are found in many place.
>

Makes sense, updated


>
> 08. test related with ANALYZE
>
> ```
> +# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE
> - PARTITIONED TABLE
> +# 
> ```
>
> "Testcase start:" should be "Testcase end:" here.
>

thanks, fixed


>
> 09. general
>
> In some tests results are confirmed but in other test they are not.
> I think you can make sure results are expected in any case if there are no
> particular reasons.
>
>
Alright, yes I also don't see a reason not to do that. Added to all cases.


I'll attach the patch with the next email as I also want to incorporate the
other comments. Hope this is not going to be confusing.

Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-14 Thread Önder Kalacı
Hi,


> >
> > Oh, I haven't considered inherited tables. That seems right, the
> statistics of the children are not updated when the parent is analyzed.
> >
> >>
> >> Now, the point I was worried about was what if the changes in child
> >> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
> >> we may not invalidate child rel entries, so how will logical
> >> replication behave for updates/deletes on child tables? There may not
> >> be any problem here but it is better to do some analysis of such cases
> >> to see how it behaves.
> >
> >
> > I also haven't observed any specific issues. In the end, when the user
> (or autovacuum) does ANALYZE on the child, it is when the statistics are
> updated for the child.
> >
>
> Right, I also think that should be the behavior but I have not
> verified it. However, I think it should be easy to verify if
> autovacuum updates the stats for child tables when we operate on only
> one of such tables and whether that will invalidate the cache for our
> case.
>
>
I already added a regression test for this with the title: # Testcase
start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED
TABLE

I realized that the comments on the test case were confusing, and clarified
those. Attached the new version also rebased onto the master branch.

Thanks,
Onder


v10_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-01 Thread Önder Kalacı
Hi again,


> ==
>
> 1. Commit message
>
> 1a.
> With this patch, I'm proposing the following change: If there is an
> index on the subscriber, use the index as long as the planner
> sub-modules picks any index over sequential scan. The index should be
> a btree index, not a partital index. Finally, the index should have at
> least one column reference (e.g., cannot consists of only
> expressions).
>
> SUGGESTION
> With this patch, I'm proposing the following change: If there is any
> index on the subscriber, let the planner sub-modules compare the costs
> of index versus sequential scan and choose the cheapest. The index
> should be a btree index, not a partial index, and it should have at
> least one column reference (e.g., cannot consist of only expressions).
>
>
makes sense.


> ~
>
> 1b.
> The Majority of the logic on the subscriber side exists in the code.
>
> "exists" -> "already exists"
>

fixed

>
> ~
>
> 1c.
> psql -c "truncate pgbench_accounts;" -p 9700 postgres
>
> "truncate" -> "TRUNCATE"
>

fixed


> ~
>
> 1d.
> Try to wrap this message text at 80 char width.
>

fixed


>
> ==
>
> 2. src/backend/replication/logical/relation.c - logicalrep_rel_open
>
> + /*
> + * Finding a usable index is an infrequent task. It is performed
> + * when an operation is first performed on the relation, or after
> + * invalidation of the relation cache entry (e.g., such as ANALYZE).
> + */
> + entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel,
> remoterel);
>
> Seemed a bit odd to say "performed" 2x in the same sentence.
>
> "It is performed when..." -> "It occurs when...” (?)
>
>
fixed


> ~~~
>
> 3. src/backend/replication/logical/relation.c - logicalrep_partition_open
>
> + /*
> + * Finding a usable index is an infrequent task. It is performed
> + * when an operation is first performed on the relation, or after
> + * invalidation of the relation cache entry (e.g., such as ANALYZE).
> + */
> + part_entry->relmapentry.usableIndexOid =
> + LogicalRepUsableIndex(partrel, remoterel);
>
> 3a.
> Same as comment #2 above.
>

done


>
> ~
>
> 3b.
> The jumping between 'part_entry' and 'entry' is confusing. Since
> 'entry' is already assigned to be &part_entry->relmapentry, can't you
> use that here?
>
> SUGGESTION
> entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);
>
Yes, sure, it makes sense.


> ~~~
>
> 4. src/backend/replication/logical/relation.c - GetIndexOidFromPath
>
> +/*
> + * Returns a valid index oid if the input path is an index path.
> + * Otherwise, return invalid oid.
> + */
> +static Oid
> +GetIndexOidFromPath(Path *path)
>
> Perhaps may this function comment more consistent with others (like
> GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to the
> InvalidOid.
>
> SUGGESTION
> /*
>  * Returns a valid index oid if the input path is an index path.
>  *
>  * Otherwise, returns InvalidOid.
>  */
>
sounds good


> ~~~
>
> 5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression
>
> +bool
> +IndexOnlyOnExpression(IndexInfo *indexInfo)
> +{
> + int i;
> + for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
> + {
> + AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
> + if (AttributeNumberIsValid(attnum))
> + return false;
> + }
> +
> + return true;
> +}
>
> 5a.
> Add a blank line after those declarations.
>
>
Done, also went over all the functions and ensured we don't have this
anymore


> ~
>
> 5b.
> AFAIK the C99 style for loop declarations should be OK [1] for new
> code, so declaring like below would be cleaner:
>
> for (int i = 0; ...
>
Done

> ~~~
>
> 6. src/backend/replication/logical/relation.c -
> FilterOutNotSuitablePathsForReplIdentFull
>
> +/*
> + * Iterates over the input path list and returns another path list
> + * where paths with non-btree indexes, partial indexes or
> + * indexes on only expressions are eliminated from the list.
> + */
> +static List *
> +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
>
> "are eliminated from the list." -> "have been removed."
>
Done


> ~~~
>
> 7.
>
> + foreach(lc, pathlist)
> + {
> + Path*path = (Path *) lfirst(lc);
> + Oid indexOid = GetIndexOidFromPath(path);
> + Relation indexRelation;
> + IndexInfo *indexInfo;
> + bool is_btree;
> + bool is_partial;
> + bool is_only_on_expression;
> +
> + if (!OidIsValid(indexOid))
> + {
> + /* Unrelated Path, skip */
> + suitableIndexList = lappend(suitableIndexList, path);
> + }
> + else
> + {
> + indexRelation = index_open(indexOid, AccessShareLock);
> + indexInfo = BuildIndexInfo(indexRelation);
> + is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> + is_partial = (indexInfo->ii_Predicate != NIL);
> + is_only_on_expression = IndexOnlyOnExpression(indexInfo);
> + index_close(indexRelation, NoLock);
> +
> + if (is_btree && !is_partial && !is_only_on_expression)
> + suitableIndexList = lappend(suitableIndexList, path);
> + }
> + }
>
> I think most of those variables are only used in the "else" block so
> maybe it's better to declare them 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-01 Thread Önder Kalacı
Hi Peter,

Thanks for the reviews! I'll reply to both of your reviews separately.


> >> 10.
> >>
> >> + /* we only need to allocate once */
> >> + if (eq == NULL)
> >> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
> >>
> >> But shouldn't you also free this 'eq' before the function returns, to
> >> prevent leaking memory?
> >>
> >
> > Two notes here. First, this is allocated inside ApplyMessageContext,
> which seems to be reset per tuple change. So, that seems like a good
> boundary to keep this allocation in memory.
> >
>
> OK, fair enough. Is it worth adding a comment to say that or not?
>

Yes, sounds good. Added 1 sentence comment, I'll push this along with my
other changes on v10.


Thanks,
Onder


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-23 Thread Önder Kalacı
Hi,

Thanks for the review!


>
> 1.
> In FilterOutNotSuitablePathsForReplIdentFull(), is
> "nonPartialIndexPathList" a
> good name for the list? Indexes on only expressions are also be filtered.
>
> +static List *
> +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> +{
> +   ListCell   *lc;
> +   List *nonPartialIndexPathList = NIL;
>
>
Yes, true. Initially we only filtered out the non-partial ones. Now
changed to *suitableIndexList*, does that look right?


> 2.
> +typedef struct LogicalRepPartMapEntry
> +{
> +   Oid partoid;/*
> LogicalRepPartMap's key */
> +   LogicalRepRelMapEntry relmapentry;
> +   Oid usableIndexOid; /* which index to use?
> (Invalid when no index
> +* used) */
> +} LogicalRepPartMapEntry;
>
> For partition tables, is it possible to use relmapentry->usableIndexOid to
> mark
> which index to use? Which means we don't need to add "usableIndexOid" to
> LogicalRepPartMapEntry.
>
>
My intention was to make this explicit so that it is clear that partitions
can explicitly own indexes.

But I tried your suggested refactor, which looks good. So, I changed it.

Also, I realized that I do not have a test where the partition has an index
(not inherited from the parent), which I also added now.


> 3.
> It looks we should change the comment for FindReplTupleInLocalRel() in this
> patch.
>
> /*
>  * Try to find a tuple received from the publication side (in
> 'remoteslot') in
>  * the corresponding local relation using either replica identity index,
>  * primary key or if needed, sequential scan.
>  *
>  * Local tuple, if found, is returned in '*localslot'.
>  */
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
>
>
I made a small change, just adding "index". Do you expect a larger change?


> 4.
> @@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData
> *edata,
>  {
> EState *estate = edata->estate;
> Relationlocalrel = relinfo->ri_RelationDesc;
> -   LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
> +   LogicalRepRelMapEntry *targetRel = edata->targetRel;
> +   LogicalRepRelation *remoterel = &targetRel->remoterel;
> EPQStateepqstate;
> TupleTableSlot *localslot;
>
> Do we need this change? I didn't see any place to use the variable
> targetRel
> afterwards.
>

Seems so, changed it back.


> 5.
> +   if (!AttributeNumberIsValid(mainattno))
> +   {
> +       /*
> +        * There are two cases to consider. First, if the index is a primary
> +        * or unique key, we cannot have any indexes with expressions. So, at
> +        * this point we are sure that the index we deal is not these.
> +        */
> +       Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
> +              RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
> +
> +       /*
> +        * For a non-primary/unique index with an expression, we are sure that
> +        * the expression cannot be used for replication index search. The
> +        * reason is that we create relevant index paths by providing column
> +        * equalities. And, the planner does not pick expression indexes via
> +        * column equality restrictions in the query.
> +        */
> +       continue;
> +   }
>
> Is it possible that it is a usable index with an expression? I think
> indexes
> with an expression has been filtered in
> FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> with
> an expression, maybe we shouldn't use "continue" here.
>

Ok, I think there are some confusing comments in the code, which I updated.
Also, added one more explicit Assert to make the code a little more
readable.

We can support indexes involving expressions but not indexes that are only
consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
filters out the latter, see IndexOnlyOnExpression().

So, for example, if we have an index as below, we are skipping the
expression while building the index scan keys:

CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
sub_id));

We can consider removing `continue`, but that'd mean also adjusting the
following code block to handle indexprs. To me, that seems like an edge
case to implement at this point, given that such an index is probably not
common. Do you think I should try to use the indexprs as well while
building the scan key?

I'm mostly trying to keep the complexity small. If you suggest this
limitation should be lifted, I can give it a shot. I think the limitation I
leave here is with a 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-20 Thread Önder Kalacı
Hi,

I'm a little late to catch up with your comments, but here are my replies:

> > My answer for the above assumes that your question is regarding what
> > happens if you ANALYZE on a partitioned table. If your question is
> > something different, please let me know.
>
> I was talking about inheritance cases, something like:
> create table tbl1 (a int);
> create table tbl1_part1 (b int) inherits (tbl1);
> create table tbl1_part2 (c int) inherits (tbl1);
>
> What we do in such cases is documented as: "if the table being
> analyzed has inheritance children, ANALYZE gathers two sets of
> statistics: one on the rows of the parent table only, and a second
> including rows of both the parent table and all of its children. This
> second set of statistics is needed when planning queries that process
> the inheritance tree as a whole. The child tables themselves are not
> individually analyzed in this case."


Oh, I hadn't considered inherited tables. That seems right: the
statistics of the children are not updated when the parent is analyzed.


> Now, the point I was worried about was what if the changes in child
> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
> we may not invalidate child rel entries, so how will logical
> replication behave for updates/deletes on child tables? There may not
> be any problem here but it is better to do some analysis of such cases
> to see how it behaves.
>

I haven't observed any specific issues either. In the end, the statistics
for a child are updated only when the user (or autovacuum) runs ANALYZE on
that child. Although I do not have much experience with inherited tables,
this sounds like the expected behavior?

I also pushed a test covering inherited tables. First, a basic test on the
parent. Then, show that updates on the parent can also use indexes of the
children. Also, after an ANALYZE on the child, we can re-calculate the
index and use the index with a higher cardinality column.


> > Also, for the majority of the use-cases, I think we'd probably expect an
> index on a column with high cardinality -- hence use index scan. So, bitmap
> index scans are probably not going to be that much common.
> >
>
> You are probably right here but I don't think we can make such
> assumptions. I think the safest way to avoid any regression here is to
> choose an index when the planner selects an index scan. We can always
> extend it later to bitmap scans if required. We can add a comment
> indicating the same.
>
>
Alright, I got rid of the bitmap scans.

Though, it caused a few of the new tests to fail. I think because of the data
size/distribution, the planner picks bitmap scans. To make the tests
consistent and small, I added `enable_bitmapscan = off` for this new test
file. Does that sound OK to you? Or should we change the tests to make
sure they genuinely use index scans?
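For illustration, the setting I mean is roughly the following (the table and
column names are made up, not the ones in the actual test file):

```sql
-- Hypothetical sketch: pin the planner to plain index scans so the test
-- output stays stable regardless of data size/distribution.
SET enable_bitmapscan = off;

-- With bitmap scans disabled, an equality restriction of the kind the
-- patch builds should show up as a regular Index Scan:
EXPLAIN (COSTS OFF)
SELECT * FROM test_replica_id_full WHERE x = 15;
```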

*
> + /*
> + * For insert-only workloads, calculating the index is not necessary.
> + * As the calculation is not expensive, we are fine to do here (instead
> + * of during first update/delete processing).
> + */
>
> I think here instead of talking about cost, we should mention that it
> is quite an infrequent operation i.e performed only when we first time
> performs an operation on the relation or after invalidation. This is
> because I think the cost is relative.
>

Changed, does that look better?

+
> + /*
> + * Although currently it is not possible for planner to pick a
> + * partial index or indexes only on expressions,
>
> It may be better to expand this comment by describing a bit why it is
> not possible in our case. You might want to give the function
> reference where it is decided.

Makes sense, added some more information.

Thanks,
Onder


v7_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-12 Thread Önder Kalacı
Hi,


>
> FYI, I noticed that v5 causes cfbot failure in [1].
> Could you please fix it in the next version ?
>

Thanks for letting me know!


>
> [19:44:38.420] execReplication.c: In function
> ‘RelationFindReplTupleByIndex’:
> [19:44:38.420] execReplication.c:186:24: error: ‘eq’ may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
> [19:44:38.420]   186 |   if (!indisunique && !tuples_equal(outslot,
> searchslot, eq))
> [19:44:38.420]   |
> ^
> [19:44:38.420] cc1: all warnings being treated as errors
>
>
It is kind of interesting that the compiler cannot understand that `eq` is
only used when *!indisunique*. Anyway, I've now sent v6 with a slight
refactor that avoids the warning.

Thanks,
Onder KALACI


v6_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Materialized view rewrite is broken when there is an event trigger

2022-08-09 Thread Önder Kalacı
Hi,

I should be able to grab a little bit of time next week to look at
> what you have.
>

Thanks!


>
> Please note that we should not add an event in create_am.sql even if
> it is empty, as it gets run in parallel of other tests so there could
> be interferences.  I think that this had better live in
> sql/event_trigger.sql, even if it requires an extra table AM to check
> this specific case.
> --
>


Moved the test to event_trigger.sql.

> parallel group (2 tests, in groups of 1):  event_trigger oidjoins

Though, it also seems to run in parallel; I assume the parallel test group
already works fine with concurrent event triggers?

Thanks,
Onder


v2-Allow-MATERIALIZED-VIEW-Rewrite-when-event-triggers.patch
Description: Binary data


Materialized view rewrite is broken when there is an event trigger

2022-08-09 Thread Önder Kalacı
Hi,

Attached is a patch with a fix as well. If the patch looks good to you,
could you consider getting this into PG 15?

Steps to repro:
-- some basic examples from src/test/regress/sql/create_am.sql
CREATE TABLE heaptable USING heap AS
SELECT a, repeat(a::text, 100) FROM generate_series(1,9) AS a;
CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
CREATE MATERIALIZED VIEW heapmv USING heap AS SELECT * FROM heaptable;

-- altering MATERIALIZED
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap;

-- setup event trigger
CREATE OR REPLACE FUNCTION empty_event_trigger()
  RETURNS event_trigger AS $$
DECLARE
BEGIN
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER empty_trigger ON sql_drop EXECUTE PROCEDURE
empty_event_trigger();

-- now, after creating an event trigger, ALTER MATERIALIZED VIEW fails
unexpectedly
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ERROR:  unexpected command tag "ALTER MATERIALIZED VIEW"

Thanks,
Onder Kalaci


v1-Allow-MATERIALIZED-VIEW-Rewrite-when-event-triggers.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-08 Thread Önder Kalacı
Hi,

Thanks for the feedback, see my reply below.

>
> > It turns out that we already invalidate the relevant entries in
> LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any
> of the statistics in pg_class.
> >
> > The call-stack for analyze is roughly:
> > do_analyze_rel()
> >-> vac_update_relstats()
> >  -> heap_inplace_update()
> >  -> if needs to apply any statistical change
> >  -> CacheInvalidateHeapTuple()
> >

> Yeah, it appears that this will work but I see that we don't update
> here for inherited stats, how does it work for such cases?


There, the expansion of the relation list to partitions happens one level
above on the call stack. So, the call stack looks like the following:

autovacuum_do_vac_analyze() (or ExecVacuum())
  -> vacuum()
     -> expand_vacuum_rel()
        -> rel_list = parent + child partitions
     -> for rel in rel_list
        -> analyze_rel()
           -> do_analyze_rel()
           ... (and the same call stack as above)

I also added one variation of a similar test for partitioned tables, which
I earlier added for non-partitioned tables as well:

Added a test which triggers this behavior. The test is as follows:
> - Create two indexes on the target, on column_a and column_b
> - Initially load data such that the column_a has a high cardinality
> - Show that we use the index on column_a on a *child *table
> - Load more data such that the column_b has higher cardinality
> - ANALYZE on the *parent* table
> - Show that we use the index on column_b afterwards on the *child* table


My answer for the above assumes that your question is regarding what
happens if you ANALYZE on a partitioned table. If your question is
something different, please let me know.
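To make the quoted scenario concrete, a rough SQL sketch of the partitioned
test could look like this (all names are illustrative, not the patch's
actual test file):

```sql
-- Hypothetical sketch of the partitioned test scenario quoted above.
CREATE TABLE target (column_a int, column_b int) PARTITION BY RANGE (column_a);
CREATE TABLE target_child PARTITION OF target DEFAULT;
-- indexes created on the parent cascade to the child partition
CREATE INDEX ON target (column_a);
CREATE INDEX ON target (column_b);
ALTER TABLE target REPLICA IDENTITY FULL;

-- initially column_a has the high cardinality; updates routed to
-- target_child should use the child's index on column_a
INSERT INTO target SELECT i, i % 5 FROM generate_series(1, 10000) i;
ANALYZE target;

-- now column_b gets the higher cardinality; ANALYZE on the parent recurses
-- into the partitions, and the resulting invalidation lets the subscriber
-- switch to the child's index on column_b
INSERT INTO target SELECT i % 5, i FROM generate_series(10001, 1000000) i;
ANALYZE target;
```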


> >> BTW, do we want to consider partial indexes for the scan in this
> >> context? I mean it may not have data of all rows so how that would be
> >> usable?
> >>
> >
> > As far as I can see, check_index_predicates() never picks a partial
> index for the baserestrictinfos we create in
> CreateReplicaIdentityFullPaths(). The reason is that we have roughly the
> following call stack:
> >
> > check_index_predicates()
> >  -> predicate_implied_by()
> >   -> predicate_implied_by_recurse()
> >    -> predicate_implied_by_simple_clause()
> >     -> operator_predicate_proof()
> >
> > And, inside operator_predicate_proof(), there is never going to be an
> equality. Because, we push `Param`s to the baserestrictinfos whereas the
> index predicates are always `Const`.
> >
>
> I agree that the way currently baserestrictinfos are formed by patch,
> it won't select the partial path, and chances are that that will be
> true in future as well but I think it is better to be explicit in this
> case to avoid creating a dependency between two code paths.
>
>
Yes, it makes sense. So, I changed the Assert into a function that filters
out partial indexes and indexes only on expressions, so that we do not
create such dependencies between the planner and this code.

If one day the planner supports using column values on indexes with
expressions, this code would simply not use the optimization until we make
some improvements in this code path. I think that seems like a fair
trade-off for now.

Few other comments:
> ==
> 1. Why is it a good idea to choose the index selected even for the
> bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
> during update/delete, so not sure how we can conclude to use index for
> bitmap paths.
>

In our case, during update/delete we are searching for a single tuple on
the target. And, it seems like using an index is probably going to be
cheaper for finding the single tuple. In general, I thought we should use
an index if the planner ever decides to use it with the given restrictions.

Also, for the majority of the use-cases, I think we'd probably expect an
index on a column with high cardinality -- hence use index scan. So, bitmap
index scans are probably not going to be that much common.

Still, I don't see a problem with using such indexes. Of course, it is
possible that I might be missing something. Do you have any specific
concerns in this area?


>
> 2. The index info is built even on insert, so workload, where there
> are no updates/deletes or those are not published then this index
> selection work will go waste. Will it be better to do it at first
> update/delete? One can say that it is not worth the hassle as anyway
> it will be built the first time we perform an operation on the
> relation or after the relation gets invalidated.


With the current approach, the index (re)calculation is coupled with
(in)validation of the relevant cache entries. So I'd argue that, for the
simplicity of the code, we can afford this small overhead. According to my
local measurements, especially for large tables, the index oid calculation
is mostly insignificant compared to the rest of the steps. Does that sound
OK to you?



> If we think so, then
> 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-01 Thread Önder Kalacı
Hi,

As far as I can see, the following is the answer to the only remaining open
discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides
> >> which index to use the first time it opens the rel (or if the rel gets
> >> invalidated) on subscriber and then for all consecutive operations it
> >> uses the same index. It is quite possible that after some more
> >> operations on the table, using the same index will actually be
> >> costlier than a sequence scan or some other index scan
> >
> >
> > Regarding (b), yes that is a concern I share. And, I was actually
> considering sending another patch regarding this.
> >
> > Currently, I can see two options and happy to hear your take on these
> (or maybe another idea?)
> >
> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE
> or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
> re-create the cache entries. In this case, as far as I can see, we need a
> callback that is called when table "ANALYZE"d, because that is when the
> statistics change. That is the time picking a new index makes sense.
> > However, that seems like adding another dimension to this patch, which I
> can try but also see that committing becomes even harder.
> >
>
> This idea sounds worth investigating. I see that this will require
> more work but OTOH, we can't allow the existing system to regress
> especially because depending on workload it might regress badly. We
> can create a patch for this atop the base patch for easier review/test
> but I feel we need some way to address this point.
>
>
It turns out that we already invalidate the relevant entries
in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates
any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
   -> vac_update_relstats()
 -> heap_inplace_update()
 -> if needs to apply any statistical change
 -> CacheInvalidateHeapTuple()

And, we register for those invalidations already:
logicalrep_relmap_init() / logicalrep_partmap_init()
   -> CacheRegisterRelcacheCallback()



Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a
- Load more data such that the column_b has higher cardinality
- ANALYZE on the target table
- Show that we use the index on column_b afterwards
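In SQL terms, that test could be sketched roughly as follows (names are
made up; the actual test file may differ):

```sql
-- Hypothetical sketch of the ANALYZE-driven index re-selection test.
CREATE TABLE target (column_a int, column_b int);
CREATE INDEX ON target (column_a);
CREATE INDEX ON target (column_b);
ALTER TABLE target REPLICA IDENTITY FULL;

-- column_a starts with the high cardinality, so the subscriber should
-- pick the index on column_a for UPDATE/DELETE lookups
INSERT INTO target SELECT i, i % 10 FROM generate_series(1, 10000) i;
ANALYZE target;

-- now column_b has the higher cardinality; the relcache invalidation
-- fired by ANALYZE makes the subscriber redo the planner logic and
-- pick the index on column_b instead
INSERT INTO target SELECT i % 10, i FROM generate_series(10001, 1000000) i;
ANALYZE target;
```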

Thanks,
Onder KALACI
From 890fd74d57979eb75a976b2131ee619581dc225f Mon Sep 17 00:00:00 2001
From: Onder Kalaci 
Date: Tue, 17 May 2022 10:47:39 +0200
Subject: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full
 on the publisher

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other than some small number of use cases.

With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index as long as the planner sub-modules picks any index over sequential scan.

Majority of the logic on the subscriber side has already existed in the code. The subscriber is already capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function. In short, the changes on the subscriber is mostly combining parts of (unique) index scan and sequential scan codes.

The decision on whether to use an index (or which index) is mostly derived from planner infrastructure. The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let the planner sub-module -- `create_index_paths()` -- to give us the relevant  index `Path`s. On top of that, add the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.

From the performance point of view, there are a few things to note. First, the patch aims not to
change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
It mostly depends on the data size, index and data distribution.

Still, below I try to showcase the potential improvements using an index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up in around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.

// init source db
pgbench -i -s 100 -p 5432 postgres

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-07-29 Thread Önder Kalacı
Hi,

2.
>> @@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
>> Relation idxrel,
>>   int2vector *indkey = &idxrel->rd_index->indkey;
>>   bool hasnulls = false;
>>
>> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
>> -RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
>>
>> You have removed this assertion but there is a comment ("This is not
>> generic routine, it expects the idxrel to be replication identity of a
>> rel and meet all limitations associated with that.") atop this
>> function which either needs to be changed/removed and probably we
>> should think if the function needs some change after removing that
>> restriction.
>>
>>
> Ack, I can see your point. I think, for example, we should skip index
> attributes that are not simple column references. And, probably whatever
> other restrictions that PRIMARY has, should be here.
>

Primary keys require:
- Unique: We don't need uniqueness, that's the point of this patch
- Valid index: Should not be an issue in this case, because the planner
would not pick an invalid index anyway.
- Non-Partial index: As discussed earlier in this thread, I really don't
see any problems with partial indexes for this use-case. Please let me know
if there is anything I'm missing.
- Deferrable - Immediate: As far as I can see, there are no such concepts
for regular indexes, so this does not apply here
- Indexes with no expressions: This is the point where we require some
minor changes inside/around `build_replindex_scan_key`. Previously,
indexes on expressions could not be replica indexes. And, with this patch
they can. However, the expressions cannot be used for filtering the tuples
because of the way we create the restrictinfos. We essentially create
`WHERE col_1 = $1 AND col_2 = $2 .. col_n = $n` for the columns with
equality operators available. In the case of expressions on the indexes,
the planner would never pick such indexes with these restrictions. I
changed `build_replindex_scan_key` to reflect that, added a new assert, and
pushed tests with the following schema, making sure the code behaves as
expected:

CREATE TABLE people (firstname text, lastname text);
CREATE INDEX people_names_expr_only ON people ((firstname || ' ' ||
lastname));
CREATE INDEX people_names_expr_and_columns ON people ((firstname || ' ' ||
lastname), firstname, lastname);

Also did similar tests with indexes on jsonb fields. Does that address
your concerns regarding indexes with expressions?

I'll work on one of the other open items in the thread (e.g., analyze
invalidation callback) separately.

Thanks,
Onder KALACI


v3_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-07-22 Thread Önder Kalacı
Hi,

>
> >
> > One another idea could be to re-calculate the index, say after N
> updates/deletes for the table. We may consider using subscription_parameter
> for getting N -- with a good default, or even hard-code into the code. I
> think the cost of re-calculating should really be pretty small compared to
> the other things happening during logical replication. So, a sane default
> might work?
> >
>
> One difficulty in deciding the value of N for the user or choosing a
> default would be that we need to probably consider the local DML
> operations on the table as well.
>
>
Fair enough, it is not easy to find a good default.


> > If you think the above doesn't work, I can try to work on a separate
> patch which adds something like "analyze invalidation callback".
> >
>
> I suggest we should give this a try and if this turns out to be
> problematic or complex then we can think of using some heuristic as
> you are suggesting above.
>

Alright, I'll try this and respond shortly back.


> >
> > OR use a complex heuristic with a reasonable index pick. And, the latter
> approach converged to the planner code in the patch. Do you think the
> former approach is acceptable?
> >
>
> In this regard, I was thinking in which cases a sequence scan can be
> better than the index scan (considering one is available). I think if
> a certain column has a lot of duplicates (for example, a column has a
> boolean value) then probably doing a sequence scan is better. Now,
> considering this even though your other approach sounds simpler but
> could lead to unpredictable results. So, I think the latter approach
> is preferable.
>

Yes, it makes sense. I also considered this during the development of the
patch, but forgot to mention :)


>
> BTW, do we want to consider partial indexes for the scan in this
> context? I mean it may not have data of all rows so how that would be
> usable?
>
>
As far as I can see, check_index_predicates() never picks a partial index
for the baserestrictinfos we create in CreateReplicaIdentityFullPaths().
The reason is that we have roughly the following call stack:

check_index_predicates()
 -> predicate_implied_by()
  -> predicate_implied_by_recurse()
   -> predicate_implied_by_simple_clause()
    -> operator_predicate_proof()

And, inside operator_predicate_proof(), there is never going to be an
equality, because we push `Param`s into the baserestrictinfos whereas the
index predicates are always `Const`s.

If we want to make it even more explicit, I can filter out `Path`s with
partial indexes. But that seems redundant to me. For now, I pushed the
commit with an assertion that we never pick partial indexes and also added
a test.

If you think it is better to explicitly filter out partial indexes, I can
do that as well.



> Few comments:
> ===
> 1.
> static List *
> +CreateReplicaIdentityFullPaths(Relation localrel)
> {
> ...
> + /*
> + * Rather than doing all the pushups that would be needed to use
> + * set_baserel_size_estimates, just do a quick hack for rows and width.
> + */
> + rel->rows = rel->tuples;
>
> Is it a good idea to set rows without any selectivity estimation?
> Won't this always set the entire rows in a relation? Also, if we don't
> want to use set_baserel_size_estimates(), how will we compute
> baserestrictcost which will later be used in the costing of paths (for
> example, costing of seqscan path (cost_seqscan) uses it)?
>
> In general, I think it will be better to consider calling some
> top-level planner functions even for paths. Can we consider using
> make_one_rel() instead of building individual paths?


Thanks, this looks like a good suggestion/simplification. I wanted to use
the least amount of code possible, and make_one_rel() does either exactly
what I need or slightly more, which is great.

Note that make_one_rel() also follows the same call stack that I noted
above. So, I cannot spot any problems with partial indexes. Am I maybe
missing something here?


> On similar lines,
> in function PickCheapestIndexPathIfExists(), can we use
> set_cheapest()?
>
>
Yes, make_one_rel() + set_cheapest() sounds better. Changed.


> 2.
> @@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
> Relation idxrel,
>   int2vector *indkey = &idxrel->rd_index->indkey;
>   bool hasnulls = false;
>
> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
> -RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
>
> You have removed this assertion but there is a comment ("This is not
> generic routine, it expects the idxrel to be replication identity of a
> rel and meet all limitations associated with that.") atop this
> function which either needs to be changed/removed and probably we
> should think if the function needs some change after removing that
> restriction.
>
>
Ack, I can see your point. I think, for example, we should skip index
attributes that are not simple column references. And, probably whatever
other restrictions that PRIMARY has, should be here.

