Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Önder Kalacı Sun, 12 Mar 2023 08:07:51 -0700

Hi Amit, all

>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
> +# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY
>
> No need to use Delete test separate for this.
>


Yeah, there is really no difference between update/delete for this patch,
so it makes sense. I initially added it for completeness for the coverage,
but as it has the perf overhead for the tests, I agree that we could
drop some of those.


>
> 2.
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
> +
> +# check if the index is used even when the index has NULL values
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates test_replica_id_full table";
>
> Here, I think only one update is sufficient.
>

done. I guess you requested this change so that we would wait
for idx_scan=1 not idx_scan=2, which could help.


> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE INDEX people_last_names ON people(lastname)");
> +
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'people_last_names';}
> +) or die "Timed out while waiting for creating index people_last_names";
>
> I don't think we need this poll.
>

 true, not sure why I have this. none of the tests has this anyway.


> 4.
> +# update 2 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_1';");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_3' AND lastname = 'last_name_3';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates two rows via index scan with index on
> expressions and columns";
> +
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM people WHERE firstname = 'no-name';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full deletes two rows via index scan with index on
> expressions and columns";
>
> I think having one update or delete should be sufficient.
>

So, I dropped the 2nd update, but kept 1 update and 1 delete.
The latter deletes the tuple updated by the former. Seems like
an interesting test to keep.

Still, I dropped one of the extra poll_query_until, which is probably
good enough for this one? Let me know if you think otherwise.


>
> 5.
> +# update rows, moving them to other partitions
> +$node_publisher->safe_psql('postgres',
> + "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for updates on partitioned table with
> index";
> +
> +# delete rows from different partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates partitioned table";
> +
>
> Can we combine these two polls?
>

Looking at it closely, the first one seems like an unnecessary poll anyway.
We can simply check the idxscan at the end of the test, I don't see
value in checking earlier.


>
> 6.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS,
> ALSO
> +# DROPS COLUMN
>
> In this test, let's try to update/delete 2-3 rows instead of 20. And
> after drop columns, let's keep just one of the update or delete.
>

changed to 3 rows


>
> 7. Apart from the above, I think it is better to use
> wait_for_catchup() consistently before trying to verify the data on
> the subscriber. We always use it in other tests. I guess here you are
> relying on the poll for index scans to ensure that data is replicated
> but I feel it may still be better to use wait_for_catchup().
>

Yes, that was my understanding & expectation. I'm not convinced that
 wait_for_catchup() is strictly needed, as without catching up, how could
pg_stat_all_indexes be updated? Still, it is good to be consistent
with the test suite. So, applied your suggestion.

Similarly, wait_for_subscription_sync uses the publisher name and
> appname in other tests, so it is better to be consistent. It can avoid
> random failures by ensuring data is synced.
>

makes sense.

I'll attach a new patch in the next e-mail, along with your
other comments.


Thanks,
Onder KALACI

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Reply via email to