Re: why there is not VACUUM FULL CONCURRENTLY?

Junwang Zhao Fri, 30 Aug 2024 03:18:46 -0700

Hi,

On Tue, Aug 27, 2024 at 8:01 PM Antonin Houska <a...@cybertec.at> wrote:
>
> Attached is version 2, the feature itself is now in 0004.
>
> Unlike version 1, it contains some regression tests (0006) and a new GUC to
> control how long the AccessExclusiveLock may be held (0007).
>
> Kirill Reshke <reshkekir...@gmail.com> wrote:
>
> > On Fri, 2 Aug 2024 at 11:09, Antonin Houska <a...@cybertec.at> wrote:
> > >
> > > Kirill Reshke <reshkekir...@gmail.com> wrote:
> > > > However, in general, the 3rd patch is really big, very hard to
> > > > comprehend.  Please consider splitting this into smaller (and
> > > > reviewable) pieces.
> > >
> > > I'll try to move some preparation steps into separate diffs, but not sure 
> > > if
> > > that will make the main diff much smaller. I prefer self-contained 
> > > patches, as
> > > also explained in [3].
> >
> > Thanks for sharing [3], it is a useful link.
> >
> > There is actually one more case when ACCESS EXCLUSIVE is held: during
> > table rewrite (AT set TAM, AT set Tablespace and AT alter column type
> > are some examples).
> > This can be done CONCURRENTLY too, using the same logical replication
> > approach, or do I miss something?
>
> Yes, the logical replication can potentially be used in other cases.
>
> > I'm not saying we must do it immediately, this should be a separate
> > thread, but we can do some preparation work here.
> >
> > I can see that a bunch of functions which are currently placed in
> > cluster.c can be moved to something like
> > logical_rewrite_heap.c. ConcurrentChange struct and
> > apply_concurrent_insert function  is one example of such.
> >
> > So, if this is the case, 0003 patch can be splitted in two:
> > The first one is general utility code for logical table rewrite
> > The second one with actual VACUUM CONCURRENTLY feature.
>
> > What do you think?
>
> I can imagine moving the function process_concurrent_changes() and subroutines
> to a different file (e.g. rewriteheap.c), but moving it into a separate diff
> that does not contain any call of the function makes little sense to me. Such
> a diff would not add any useful functionality and could not be considered
> refactoring either.
>
> So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
> more if I find sensible opportunity in the future.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>


Thanks for working on this, I think this is a very useful feature.

The patch doesn't compile in the debug build with errors:

../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
2771 | TupleDesc td_src, td_dst;
| ^~~~~~
../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
declaration is here
2741 | TupleDesc td_src = RelationGetDescr(rel);

you forgot the meson build for pgoutput_cluster

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 78c5726814..0f9141a4ac 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_cluster')

I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
in the codebase, but I remember someone proposed all changes to lockmode, how
about sticking to lockmode in your patch?

0004:

+      sure that the old files do not change during the processing because the
+      chnages would get lost due to the swap.
typo

+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short.

I remember pg_squeeze also did some logical decoding after getting the exclusive
lock, if that is still true, I guess the doc above is not precise.

+      Note that <command>CLUSTER</command> with the
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the clustering started.

Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
does not order rows at all, why bother implementing it?

+ errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));

errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
instead of parameter *stmt*.

+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ /*
+ * In the CONCURRENT case, do the planning in a subtrensaction so that
typo

I did not see VacuumStmt changes in gram.y, how do we suppose to
use the vacuum full concurrently? I tried the following but no success.

[local] postgres@demo:5432-36097=# vacuum (concurrently) aircrafts_data;
ERROR:  CONCURRENTLY can only be specified with VACUUM FULL

[local] postgres@demo:5432-36097=# vacuum full (concurrently) full
aircrafts_data;
ERROR:  syntax error at or near "("
LINE 1: vacuum full (concurrently) full aircrafts_data;


-- 
Regards
Junwang Zhao

Re: why there is not VACUUM FULL CONCURRENTLY?

Reply via email to