Re: Add Index-level REINDEX with multiple jobs
On Mon, Mar 25, 2024 at 4:47 AM Richard Guo wrote: > On Mon, Mar 25, 2024 at 10:07 AM Tom Lane wrote: >> >> Alexander Korotkov writes: >> > Here goes the revised patch. I'm going to push this if there are no >> > objections. >> >> Quite a lot of the buildfarm is complaining about this: >> >> reindexdb.c: In function 'reindex_one_database': >> reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized >> in this function [-Werror=maybe-uninitialized] >> 434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0) >> | ~~~^ >> >> I noticed it first on mamba, which is set up with -Werror, but a >> scrape of the buildfarm logs shows many other animals reporting this >> as a warning. > > > I noticed the similar warning on cfbot: > https://cirrus-ci.com/task/6298504306360320?logs=gcc_warning#L448 > > reindexdb.c: In function ‘reindex_one_database’: > reindexdb.c:437:24: error: ‘indices_tables_cell’ may be used uninitialized in > this function [-Werror=maybe-uninitialized] > 437 |indices_tables_cell = indices_tables_cell->next; > |^~~ > > Although it's complaining on line 437 not 434, I think they are the same > issue. > >> >> I think they are right. Even granting that the >> compiler realizes that "parallel && process_type == REINDEX_INDEX" is >> enough to reach the one place where indices_tables_cell is >> initialized, that's not really enough, because that place is >> >> if (indices_tables_list) >> indices_tables_cell = indices_tables_list->head; >> >> So I believe this code will crash if get_parallel_object_list returns >> an empty list. Initializing indices_tables_cell to NULL in its >> declaration would stop the compiler warning, but if I'm right it >> will do nothing to prevent that crash. This needs a bit more effort. > > > Agreed. And the comment of get_parallel_object_list() says that it may > indeed return NULL. > > BTW, on line 373, it checks 'process_list' and bails out if this list is > NULL. But it seems to me that 'process_list' cannot be NULL in this > case, because it's initialized to be 'user_list' and we have asserted > that user_list is not NULL on line 360. I wonder if we should check > indices_tables_list instead of process_list on line 373. Thank you. I'm going to deal with this in the next few hours. -- Regards, Alexander Korotkov
Re: Add Index-level REINDEX with multiple jobs
On Mon, Mar 25, 2024 at 10:07 AM Tom Lane wrote: > Alexander Korotkov writes: > > Here goes the revised patch. I'm going to push this if there are no > objections. > > Quite a lot of the buildfarm is complaining about this: > > reindexdb.c: In function 'reindex_one_database': > reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized > in this function [-Werror=maybe-uninitialized] > 434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0) > | ~~~^ > > I noticed it first on mamba, which is set up with -Werror, but a > scrape of the buildfarm logs shows many other animals reporting this > as a warning. I noticed the similar warning on cfbot: https://cirrus-ci.com/task/6298504306360320?logs=gcc_warning#L448 reindexdb.c: In function ‘reindex_one_database’: reindexdb.c:437:24: error: ‘indices_tables_cell’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 437 |indices_tables_cell = indices_tables_cell->next; |^~~ Although it's complaining on line 437 not 434, I think they are the same issue. > I think they are right. Even granting that the > compiler realizes that "parallel && process_type == REINDEX_INDEX" is > enough to reach the one place where indices_tables_cell is > initialized, that's not really enough, because that place is > > if (indices_tables_list) > indices_tables_cell = indices_tables_list->head; > > So I believe this code will crash if get_parallel_object_list returns > an empty list. Initializing indices_tables_cell to NULL in its > declaration would stop the compiler warning, but if I'm right it > will do nothing to prevent that crash. This needs a bit more effort. Agreed. And the comment of get_parallel_object_list() says that it may indeed return NULL. BTW, on line 373, it checks 'process_list' and bails out if this list is NULL. But it seems to me that 'process_list' cannot be NULL in this case, because it's initialized to be 'user_list' and we have asserted that user_list is not NULL on line 360. I wonder if we should check indices_tables_list instead of process_list on line 373. Thanks Richard
Re: Add Index-level REINDEX with multiple jobs
Alexander Korotkov writes: > Here goes the revised patch. I'm going to push this if there are no > objections. Quite a lot of the buildfarm is complaining about this: reindexdb.c: In function 'reindex_one_database': reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized in this function [-Werror=maybe-uninitialized] 434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0) | ~~~^ I noticed it first on mamba, which is set up with -Werror, but a scrape of the buildfarm logs shows many other animals reporting this as a warning. I think they are right. Even granting that the compiler realizes that "parallel && process_type == REINDEX_INDEX" is enough to reach the one place where indices_tables_cell is initialized, that's not really enough, because that place is if (indices_tables_list) indices_tables_cell = indices_tables_list->head; So I believe this code will crash if get_parallel_object_list returns an empty list. Initializing indices_tables_cell to NULL in its declaration would stop the compiler warning, but if I'm right it will do nothing to prevent that crash. This needs a bit more effort. regards, tom lane
Re: Add Index-level REINDEX with multiple jobs
On Wed, Mar 20, 2024 at 7:19 PM Alexander Korotkov wrote: > On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov wrote: > > On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: > >> The problem may be actually trickier than that, no? Could there be > >> other factors to take into account for their classification, like > >> their sizes (typically, we'd want to process the biggest one first, I > >> guess)? > > > > > > Sorry for a late reply. Thanks for an explanation. This is sounds > > reasonable to me. > > Svetlana had addressed this in the patch v2. > > I think this patch is a nice improvement. But it doesn't seem to be > implemented in the right way. There is no guarantee that > get_parallel_object_list() will return tables in the same order as > indexes. Especially when there is "ORDER BY idx.relpages". Also, > sort_indices_by_tables() has quadratic complexity (probably OK since > input list shouldn't be too lengthy) and a bit awkward. > > I've revised the patchset. Now appropriate ordering is made in SQL > query. The original list of indexes is modified to match the list of > tables. The tables are ordered by the size of its greatest index, > within table indexes are ordered by size. > > I'm going to further revise this patch, mostly comments and the commit > message. Here goes the revised patch. I'm going to push this if there are no objections. -- Regards, Alexander Korotkov v4-0001-Add-the-index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: Add Index-level REINDEX with multiple jobs
On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov wrote: > On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: >> The problem may be actually trickier than that, no? Could there be >> other factors to take into account for their classification, like >> their sizes (typically, we'd want to process the biggest one first, I >> guess)? > > > Sorry for a late reply. Thanks for an explanation. This is sounds > reasonable to me. > Svetlana had addressed this in the patch v2. I think this patch is a nice improvement. But it doesn't seem to be implemented in the right way. There is no guarantee that get_parallel_object_list() will return tables in the same order as indexes. Especially when there is "ORDER BY idx.relpages". Also, sort_indices_by_tables() has quadratic complexity (probably OK since input list shouldn't be too lengthy) and a bit awkward. I've revised the patchset. Now appropriate ordering is made in SQL query. The original list of indexes is modified to match the list of tables. The tables are ordered by the size of its greatest index, within table indexes are ordered by size. I'm going to further revise this patch, mostly comments and the commit message. -- Regards, Alexander Korotkov v3-0001-Add-Index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: Add Index-level REINDEX with multiple jobs
On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: > > The problem may be actually trickier than that, no? Could there be > other factors to take into account for their classification, like > their sizes (typically, we'd want to process the biggest one first, I > guess)? Sorry for a late reply. Thanks for an explanation. This is sounds reasonable to me. Svetlana had addressed this in the patch v2. -- Best regards, Maxim Orlov.
Re: Add Index-level REINDEX with multiple jobs
Andrey M. Borodin писал(а) 2024-03-04 09:26: On 6 Feb 2024, at 11:21, Michael Paquier wrote: The problem may be actually trickier than that, no? Could there be other factors to take into account for their classification, like their sizes (typically, we'd want to process the biggest one first, I guess)? Maxim, what do you think about it? Best regards, Andrey Borodin. I agree that, as is the case with tables in REINDEX SCHEME, indices should be sorted accordingly to their size. Attaching the second version of patch, with indices being sorted by size. Best regards, Svetlana DerevyankoFrom 3c2f0fcbcf382cc2cb9786e26ad8027558937ea2 Mon Sep 17 00:00:00 2001 From: Svetlana Derevyanko Date: Fri, 29 Dec 2023 08:20:54 +0300 Subject: [PATCH v2] Add Index-level REINDEX with multiple jobs Author: Maxim Orlov , Svetlana Derevyanko --- doc/src/sgml/ref/reindexdb.sgml| 3 +- src/bin/scripts/reindexdb.c| 159 ++--- src/bin/scripts/t/090_reindexdb.pl | 8 +- 3 files changed, 151 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml index 8d9ced212f..dfb5e534fb 100644 --- a/doc/src/sgml/ref/reindexdb.sgml +++ b/doc/src/sgml/ref/reindexdb.sgml @@ -187,8 +187,7 @@ PostgreSQL documentation setting is high enough to accommodate all connections. -Note that this option is incompatible with the --index -and --system options. +Note that this option is incompatible with the --system option. diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c index 6ae30dff31..63d99e7441 100644 --- a/src/bin/scripts/reindexdb.c +++ b/src/bin/scripts/reindexdb.c @@ -251,14 +251,6 @@ main(int argc, char *argv[]) } else { - /* - * Index-level REINDEX is not supported with multiple jobs as we - * cannot control the concurrent processing of multiple indexes - * depending on the same relation. - */ - if (concurrentCons > 1 && indexes.head != NULL) - pg_fatal("cannot use multiple jobs to reindex indexes"); - if (dbname == NULL) { if (getenv("PGDATABASE")) @@ -279,7 +271,7 @@ main(int argc, char *argv[]) if (indexes.head != NULL) reindex_one_database(, REINDEX_INDEX, , progname, echo, verbose, - concurrently, 1, tablespace); + concurrently, concurrentCons, tablespace); if (tables.head != NULL) reindex_one_database(, REINDEX_TABLE, , @@ -299,6 +291,67 @@ main(int argc, char *argv[]) exit(0); } +static SimpleStringList * +sort_indices_by_tables(PGconn *conn, SimpleStringList *user_list, SimpleStringList *indices_tables_list, + SimpleStringList ***indices_process_list, bool echo) +{ + SimpleStringList *process_list = pg_malloc0(sizeof(SimpleStringList)); + SimpleStringListCell *idx_cell, + *idx_table, + *cell; + int ipl_len = 10, +current_tables_num = 0, +i; + + *indices_process_list = pg_malloc0(sizeof(*indices_process_list) * ipl_len); + + for (idx_cell = user_list->head, idx_table = indices_tables_list->head; + idx_cell; + idx_cell = idx_cell->next, idx_table = idx_table->next) + { + SimpleStringList *list_to_add = NULL; + + if (!current_tables_num) + { + simple_string_list_append(process_list, idx_table->val); + (*indices_process_list)[0] = pg_malloc0(sizeof(SimpleStringList)); + current_tables_num++; + } + + cell = process_list->head; + for (i = 0; i < current_tables_num; i++) + { + if (!strcmp(idx_table->val, cell->val)) + { +list_to_add = (*indices_process_list)[i]; +break; + } + + } + + if (!list_to_add) + { + simple_string_list_append(process_list, idx_table->val); + + if (current_tables_num >= ipl_len) + { +ipl_len *= 2; +*indices_process_list = pg_realloc(*indices_process_list, + sizeof(*indices_process_list) * ipl_len); + } + + (*indices_process_list)[current_tables_num] = pg_malloc0(sizeof(SimpleStringList)); + list_to_add = (*indices_process_list)[current_tables_num]; + current_tables_num++; + } + + simple_string_list_append(list_to_add, idx_cell->val); + + } + + return process_list; +} + static void reindex_one_database(ConnParams *cparams, ReindexType type, SimpleStringList *user_list, @@ -310,10 +363,13 @@ reindex_one_database(ConnParams *cparams, ReindexType type, SimpleStringListCell *cell; bool parallel = concurrentCons > 1; SimpleStringList *process_list = user_list; + SimpleStringList **indices_process_list = NULL; + SimpleStringList *indices_tables_list = NULL; ReindexType process_type = type; ParallelSlotArray *sa; bool failed = false; - int items_count = 0; + int items_count = 0, +i = 0; conn = connectDatabase(cparams, progname, echo, false, true); @@ -383,8 +439,23 @@ reindex_one_database(ConnParams *cparams, ReindexType type, return;
Re: Add Index-level REINDEX with multiple jobs
> On 6 Feb 2024, at 11:21, Michael Paquier wrote: > > The problem may be actually trickier than that, no? Could there be > other factors to take into account for their classification, like > their sizes (typically, we'd want to process the biggest one first, I > guess)? Maxim, what do you think about it? Best regards, Andrey Borodin.
Re: Add Index-level REINDEX with multiple jobs
On Fri, Dec 29, 2023 at 04:15:35PM +0300, Maxim Orlov wrote: > Recently, one of our customers came to us with the question: why do reindex > utility does not support multiple jobs for indices (-i opt)? > And, of course, it is because we cannot control the concurrent processing > of multiple indexes on the same relation. This was > discussed somewhere in [0], I believe. So, customer have to make a shell > script to do his business and so on. Yep, that should be the correct thread. As far as I recall, one major reason was code simplicity because dealing with parallel jobs at table level is a no-brainer on the client side (see 0003): we know that relations with physical storage will never interact with each other. > But. This seems to be not that complicated to split indices by parent > tables and do reindex in multiple jobs? Or I miss something? > PFA patch implementing this. + appendPQExpBufferStr(_query, + "WITH idx as (\n" + " SELECT c.relname, ns.nspname\n" + " FROM pg_catalog.pg_class c,\n" + " pg_catalog.pg_namespace ns\n" + " WHERE c.relnamespace OPERATOR(pg_catalog.=) ns.oid AND\n" + "c.oid OPERATOR(pg_catalog.=) ANY(ARRAY['\n"); The problem may be actually trickier than that, no? Could there be other factors to take into account for their classification, like their sizes (typically, we'd want to process the biggest one first, I guess)? -- Michael signature.asc Description: PGP signature
Add Index-level REINDEX with multiple jobs
Hi! Recently, one of our customers came to us with the question: why do reindex utility does not support multiple jobs for indices (-i opt)? And, of course, it is because we cannot control the concurrent processing of multiple indexes on the same relation. This was discussed somewhere in [0], I believe. So, customer have to make a shell script to do his business and so on. But. This seems to be not that complicated to split indices by parent tables and do reindex in multiple jobs? Or I miss something? PFA patch implementing this. As always, any opinions are very welcome! [0] https://www.postgresql.org/message-id/flat/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg%40mail.gmail.com -- Best regards, Maxim Orlov. v1-0001-Add-Index-level-REINDEX-with-multiple-jobs.patch Description: Binary data