On Tue, Apr 1, 2025, at 10:31 AM, Antonin Houska wrote: > One more version, hopefully to make cfbot happy (I missed the bug because I > did not set the RELCACHE_FORCE_RELEASE macro in my environment.)
I started reviewing this patch. It was in my radar to review it but I didn't have spare time until now. The syntax is fine for me. It is really close to CLUSTER syntax but it adds one detail: INDEX keyword. It is a good approach because USING clause can be expanded in the future if required. + <refnamediv> + <refname>REPACK</refname> + <refpurpose>cluster a table according to an index</refpurpose> + </refnamediv> This description is not accurate because the index is optional. It means if the index is not specified, it is a "rewrite" instead of a "cluster". One suggestion is to use "rewrite" because it says nothing about the tuple order. A "rewrite a table" or "rewrite a table to reclaim space" are good candidates. On the other hand, the command is called "repack" and it should be a strong candidate for the verb in this description. (I'm surprised that repack is not a recent term [1]). It seems a natural choice. +<synopsis> +REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX<replaceable class="parameter">index_name</replaceable> ] ] You missed a space after INDEX. + <para> + Because the planner records statistics about the ordering of tables, it is + advisable to + run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the + newly repacked table. Otherwise, the planner might make poor choices of + query plans. + </para> If we decide for another term (other than "repacked") then it should reflect here and in some other parts ("repacking" is also used) too. + <para> + When an index scan or a sequential scan without sort is used, a temporary + copy of the table is created that contains the table data in the index + order. That's true for CLUSTER but not for REPACK. Index is optional. + Prints a progress report as each table is clustered + at <literal>INFO</literal> level. s/clustered/repacked/? + Repacking a partitioned table repacks each of its partitions. If an index + is specified, each partition is clustered using the partition of that + index. <command>REPACK</command> on a partitioned table cannot be executed + inside a transaction block. Ditto. + <para> + Cluster the table <literal>employees</literal> on the basis of its + index <literal>employees_ind</literal>: +<programlisting> +REPACK employees USING INDEX employees_ind; +</programlisting> + </para> It sounds strange to use "Repack" in the other examples but this one it says "Cluster". Let's use the same terminology. + + <warning> + <para> + The <command>FULL</command> parameter is deprecated in favor of + <xref linkend="sql-repack"/>. + </para> + </warning> + The warnings, notes, and tips are usually placed *after* the description. --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap, Is it worth to rename table_relation_copy_for_cluster() and heapam_relation_copy_for_cluster() to replace cluster with repack? + SELECT + S.pid AS pid, + S.datid AS datid, + D.datname AS datname, + S.relid AS relid, + CASE S.param1 WHEN 1 THEN 'REPACK' + END AS command, Do you really need command? IIUC REPACK is the only command that will used by this view. There is no need to differentiate commands here. + * + * 'cmd' indicates which commands is being executed. REPACK should be the only + * caller of this function in the future. command. + * + * REPACK does not set indisclustered. XXX Not sure I understand the + * comment above: how can an attribute be set "only in the current + * database"? */ pg_index is a local catalog. To be consistent while clustering a shared catalog, it should set indisclustered in all existent databases because in each pg_index table there is a tuple for the referred index. As the comment says it is not possible. euler=# select relname, relkind, pg_relation_filepath(oid) from pg_class where relname = 'pg_index'; relname | relkind | pg_relation_filepath ----------+---------+---------------------- pg_index | r | base/16424/2610 (1 row) euler=# select indexrelid::regclass, indexrelid::regclass, indisclustered from pg_index where indrelid = 'pg_database'::regclass; indexrelid | indexrelid | indisclustered ---------------------------+---------------------------+---------------- pg_database_datname_index | pg_database_datname_index | f pg_database_oid_index | pg_database_oid_index | f (2 rows) euler=# \c postgres You are now connected to database "postgres" as user "euler". postgres=# select relname, relkind, pg_relation_filepath(oid) from pg_class where relname = 'pg_index'; relname | relkind | pg_relation_filepath ----------+---------+---------------------- pg_index | r | base/5/2610 (1 row) postgres=# select indexrelid::regclass, indexrelid::regclass, indisclustered from pg_index where indrelid = 'pg_database'::regclass; indexrelid | indexrelid | indisclustered ---------------------------+---------------------------+---------------- pg_database_datname_index | pg_database_datname_index | f pg_database_oid_index | pg_database_oid_index | f (2 rows) - if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared) + if (cmd == CLUSTER_COMMAND_CLUSTER && OldHeap->rd_rel->relisshared) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a shared catalog"))); + errmsg("cannot %s a shared catalog", cmd_str))); I'm confused about this change. Why is it required? If it prints this message only for CLUSTER command, you don't need to have a generic message. This kind of message is not good for translation. If you need multiple verbs here, I advise you to break it into multiple messages. - { - if (OidIsValid(indexOid)) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster temporary tables of other sessions"))); - else - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot vacuum temporary tables of other sessions"))); - } + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot %s temporary tables of other sessions", + cmd_str))); Ditto. - CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM"); + CheckTableNotInUse(OldHeap, asc_toupper(cmd_str, strlen(cmd_str))); If the idea is to remove CLUSTER and VACUUM from this routine in the future, I wouldn't include formatting.h just for asc_toupper(). Instead, I would use an if condition. I think it will be easy to remove this code path when the time comes. - errmsg("cannot cluster on index \"%s\" because access method does not support clustering", - RelationGetRelationName(OldIndex)))); + errmsg("cannot %s on index \"%s\" because access method does not support clustering", + cmd_str, RelationGetRelationName(OldIndex)))); Ditto. I don't think check_index_is_clusterable() should be changed. The action is "cluster" independently of the command. You can keep "cluster" until we completely remove CLUSTER command and then we can replace this term with "repack". It also applies to cluster_is_permitted_for_relation(). - errmsg("cannot cluster on partial index \"%s\"", + errmsg("cannot %s on partial index \"%s\"", + cmd_str, RelationGetRelationName(OldIndex)))); Ditto. - errmsg("cannot cluster on invalid index \"%s\"", - RelationGetRelationName(OldIndex)))); + errmsg("cannot %s on invalid index \"%s\"", + cmd_str, RelationGetRelationName(OldIndex)))); Ditto. - (errmsg("clustering \"%s.%s\" using index scan on \"%s\"", + (errmsg("%sing \"%s.%s\" using index scan on \"%s\"", + cmd_str, nspname, RelationGetRelationName(OldHeap), RelationGetRelationName(OldIndex)))); This is bad for translation. Use complete sentences. - (errmsg("clustering \"%s.%s\" using sequential scan and sort", + (errmsg("%sing \"%s.%s\" using sequential scan and sort", + cmd_str, nspname, RelationGetRelationName(OldHeap)))); Ditto. - (errmsg("vacuuming \"%s.%s\"", + (errmsg("%sing \"%s.%s\"", + cmd_str, nspname, RelationGetRelationName(OldHeap)))); Ditto. /* - * Given an index on a partitioned table, return a list of RelToCluster for + * Like get_tables_to_cluster(), but do not care about indexes. + */ Since the goal is to remove CLUSTER in the future, provide a comment that doesn't mention routines that will certainly be removed. Hence, there is no need to fix them in the future. + /* + * Get all indexes that have indisclustered set and that the current user + * has the appropriate privileges for. + */ This comment is not true. ereport(WARNING, - (errmsg("permission denied to cluster \"%s\", skipping it", + (errmsg("permission denied to %s \"%s\", skipping it", + CLUSTER_COMMAND_STR(cmd), get_rel_name(relid)))); Fix for translation. + if (stmt->relation != NULL) + { + rel = process_single_relation(stmt->relation, stmt->indexname, + CLUSTER_COMMAND_REPACK, ¶ms, + &indexOid); + if (rel == NULL) + return; + } This code path is confusing. It took me some time (after reading process_single_relation() that could have a better name) to understand it. I don't have a good suggestion but it should have at least one comment explaining what the purpose is. +/* + * REPACK a single relation. + * + * Return NULL if done, relation reference if the caller needs to process it + * (because the relation is partitioned). + */ This comment should be expanded. As I said in the previous hunk, there isn't sufficient information to understand how process_single_relation() works. + | REPACK + { + RepackStmt *n = makeNode(RepackStmt); + + n->relation = NULL; + n->indexname = NULL; + n->params = NIL; + $$ = (Node *) n; + } + + | REPACK '(' utility_option_list ')' + { + RepackStmt *n = makeNode(RepackStmt); + + n->relation = NULL; + n->indexname = NULL; + n->params = $3; + $$ = (Node *) n; + } I'm wondering if there is an easy way to avoid these rules. PROGRESS_COMMAND_VACUUM, PROGRESS_COMMAND_ANALYZE, PROGRESS_COMMAND_CLUSTER, + PROGRESS_COMMAND_REPACK, PROGRESS_COMMAND_CREATE_INDEX, PROGRESS_COMMAND_BASEBACKUP, PROGRESS_COMMAND_COPY, It is just a matter of style but I have the habit to include new stuff at the end. +-- Yet another code path: REPACK w/o index. +REPACK clstr_tst USING INDEX clstr_tst_c; +-- Verify that inheritance link still works You forgot to remove the USING INDEX here. I'm still review the other patches (that is basically the implementation of CONCURRENTLY) and to avoid a long review, I'm sending the 0001 review. Anyway, 0001 is independent of the other patches and should be applied separately. [1] https://www.merriam-webster.com/dictionary/repack -- Euler Taveira EDB https://www.enterprisedb.com/