Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Le 23 juil. 2015 19:27, Alvaro Herrera alvhe...@2ndquadrant.com a écrit : Laurent Laborde wrote: Friendly greetings ! What's the status of parallel clusterdb please ? I'm having fun (and troubles) applying the vacuumdb patch to clusterdb. This thread also talk about unifying code for parallelizing clusterdb and reindex. Was anything done about it ? Because i can't see it and my work currently involve a lot of copy/pasting from vacuumdb to clusterdb :) Honestly, I have to wonder whether there are really valid use cases for clusterdb. Are you actually using it and want to see it improved, or is this just an academical exercise? Purely academical. I don't use it. And no, (i'm pretty sure) i don't have the required postgresql knowledge to do this unification if it isn't done already. You may or may not lack it *now*, but that doesn't mean you will continue to lack it forever. That's why i'm working on it :) -- Laurent ker2x Laborde DBA \o/ gandi.net
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jan 2, 2015 at 3:18 PM, Amit Kapila amit.kapil...@gmail.com wrote: Okay, I have marked this patch as Ready For Committer Notes for Committer - There is one behavioural difference in the handling of --analyze-in-stages switch, when individual tables (by using -t option) are analyzed by using this switch, patch will process (in case of concurrent jobs) all the given tables for stage-1 and then for stage-2 and so on whereas in the unpatched code it will process all the three stages table by table (table-1 all three stages, table-2 all three stages and so on). I think the new behaviour is okay as the same is done when this utility does vacuum for whole database. As there was no input from any committer on this point, I thought it is better to get the same rather than waiting more just for one point. Friendly greetings ! What's the status of parallel clusterdb please ? I'm having fun (and troubles) applying the vacuumdb patch to clusterdb. This thread also talk about unifying code for parallelizing clusterdb and reindex. Was anything done about it ? Because i can't see it and my work currently involve a lot of copy/pasting from vacuumdb to clusterdb :) And no, (i'm pretty sure) i don't have the required postgresql knowledge to do this unification if it isn't done already. Thank you :) (And sorry about the thread-necromancy) -- Laurent ker2x Laborde DBA \o/ Gandi.net
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Laurent Laborde wrote: Friendly greetings ! What's the status of parallel clusterdb please ? I'm having fun (and troubles) applying the vacuumdb patch to clusterdb. This thread also talk about unifying code for parallelizing clusterdb and reindex. Was anything done about it ? Because i can't see it and my work currently involve a lot of copy/pasting from vacuumdb to clusterdb :) Honestly, I have to wonder whether there are really valid use cases for clusterdb. Are you actually using it and want to see it improved, or is this just an academical exercise? And no, (i'm pretty sure) i don't have the required postgresql knowledge to do this unification if it isn't done already. You may or may not lack it *now*, but that doesn't mean you will continue to lack it forever. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
2015-01-29 10:28 GMT+01:00 Fabrízio de Royes Mello fabriziome...@gmail.com : Em quinta-feira, 29 de janeiro de 2015, Pavel Stehule pavel.steh...@gmail.com escreveu: Hi I am testing this feature on relative complex schema (38619 tables in db) and I got deadlock [pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4 vacuumdb: vacuuming database test2 vacuumdb: vacuuming of database test2 failed: ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. HINT: See server log for query details. ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; HINT: See server log for query details. STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; LOG: could not send data to client: Broken pipe STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; FATAL: connection to client lost LOG: could not send data to client: Broken pipe ERROR: canceling statement due to user request FATAL: connection to client lost Schema | Name | Type | Owner |Size| Description +-+---+--++- pg_catalog | pg_attribute| table | postgres | 439 MB | pg_catalog | pg_rewrite | table | postgres | 314 MB | pg_catalog | pg_proc | table | postgres | 136 MB | pg_catalog | pg_depend | table | postgres | 133 MB | pg_catalog | pg_class| table | postgres | 69 MB | pg_catalog | pg_attrdef | table | postgres | 55 MB | pg_catalog | pg_trigger | table | postgres | 47 MB | pg_catalog | pg_type | table | postgres | 31 MB | pg_catalog | pg_description | table | postgres | 23 MB | pg_catalog | pg_index| table | postgres | 20 MB | pg_catalog | pg_constraint | table | postgres | 17 MB | pg_catalog | pg_shdepend | table | postgres | 17 MB | pg_catalog | pg_statistic| table | postgres | 928 kB | pg_catalog | pg_operator | table | postgres | 552 kB | pg_catalog | pg_collation| table | postgres | 232 kB | should not be used a pessimist controlled locking instead? Regards Pavel Regards, Fabrízio -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Em quinta-feira, 29 de janeiro de 2015, Pavel Stehule pavel.steh...@gmail.com escreveu: Hi I am testing this feature on relative complex schema (38619 tables in db) and I got deadlock [pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4 vacuumdb: vacuuming database test2 vacuumdb: vacuuming of database test2 failed: ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. HINT: See server log for query details. ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; HINT: See server log for query details. STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; LOG: could not send data to client: Broken pipe STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; FATAL: connection to client lost LOG: could not send data to client: Broken pipe ERROR: canceling statement due to user request FATAL: connection to client lost Schema | Name | Type | Owner |Size| Description +-+---+--++- pg_catalog | pg_attribute| table | postgres | 439 MB | pg_catalog | pg_rewrite | table | postgres | 314 MB | pg_catalog | pg_proc | table | postgres | 136 MB | pg_catalog | pg_depend | table | postgres | 133 MB | pg_catalog | pg_class| table | postgres | 69 MB | pg_catalog | pg_attrdef | table | postgres | 55 MB | pg_catalog | pg_trigger | table | postgres | 47 MB | pg_catalog | pg_type | table | postgres | 31 MB | pg_catalog | pg_description | table | postgres | 23 MB | pg_catalog | pg_index| table | postgres | 20 MB | pg_catalog | pg_constraint | table | postgres | 17 MB | pg_catalog | pg_shdepend | table | postgres | 17 MB | pg_catalog | pg_statistic| table | postgres | 928 kB | pg_catalog | pg_operator | table | postgres | 552 kB | pg_catalog | pg_collation| table | postgres | 232 kB | There are a warning in the docs to be careful to use the -f (full) option and -j. Regards, Fabrízio -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Pavel Stehule wrote: should not be used a pessimist controlled locking instead? Patches welcome. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Hi I am testing this feature on relative complex schema (38619 tables in db) and I got deadlock [pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4 vacuumdb: vacuuming database test2 vacuumdb: vacuuming of database test2 failed: ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. HINT: See server log for query details. ERROR: deadlock detected DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690. Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689. Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; HINT: See server log for query details. STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class; ERROR: canceling statement due to user request STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; LOG: could not send data to client: Broken pipe STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc; FATAL: connection to client lost LOG: could not send data to client: Broken pipe ERROR: canceling statement due to user request FATAL: connection to client lost Schema | Name | Type | Owner |Size| Description +-+---+--++- pg_catalog | pg_attribute| table | postgres | 439 MB | pg_catalog | pg_rewrite | table | postgres | 314 MB | pg_catalog | pg_proc | table | postgres | 136 MB | pg_catalog | pg_depend | table | postgres | 133 MB | pg_catalog | pg_class| table | postgres | 69 MB | pg_catalog | pg_attrdef | table | postgres | 55 MB | pg_catalog | pg_trigger | table | postgres | 47 MB | pg_catalog | pg_type | table | postgres | 31 MB | pg_catalog | pg_description | table | postgres | 23 MB | pg_catalog | pg_index| table | postgres | 20 MB | pg_catalog | pg_constraint | table | postgres | 17 MB | pg_catalog | pg_shdepend | table | postgres | 17 MB | pg_catalog | pg_statistic| table | postgres | 928 kB | pg_catalog | pg_operator | table | postgres | 552 kB | pg_catalog | pg_collation| table | postgres | 232 kB | Regards Pavel Stehule 2015-01-27 3:26 GMT+01:00 Dilip kumar dilip.ku...@huawei.com: On 23 January 2015 21:10, Alvaro Herrera Wrote, In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) Thanks for your effort, I will take it up for next commitfest.. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 21:10, Alvaro Herrera Wrote, In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) Thanks for your effort, I will take it up for next commitfest.. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 23:55, Alvaro Herrera, -j1 is now the same as not specifying anything, and vacuum_one_database uses more common code in the parallel and not-parallel cases: the not- parallel case is just the parallel case with a single connection, so the setup and shutdown is mostly the same in both cases. I pushed the result. Please test, particularly on Windows. If you can use configure --enable-tap-tests and run them (make check in the src/bin/scripts subdir) that would be good too .. not sure whether that's expected to work on Windows. I have tested in windows, its working fine, Not sure how to enable tap test in windows, I will check it and run if possible. Thanks, Dilip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Alvaro Herrera wrote: I'm tweaking your v24 a bit more now, thanks -- main change is to make vacuum_one_database be called only to run one analyze stage, so it never iterates for each stage; callers must iterate calling it multiple times in those cases. (There's only one callsite that needs changing anyway.) I made some more changes, particularly so that the TAP test pass (we were missing the semicolon when a table name was not specified to prepare_vacuum_command). I reordered the code in a more sensible manner, remove the vacuum_database_stage layer (which was pretty useless), and changed the analyze-in-stages mode: if we pass a valid stage number, run that stage, if not, then we're not in analyze-in-stage mode. So I got rid of the boolean flag altogether. I also moved the per-stage commands and messages back into a struct inside a function, since there's no need to have them be file-level variables anymore. -j1 is now the same as not specifying anything, and vacuum_one_database uses more common code in the parallel and not-parallel cases: the not-parallel case is just the parallel case with a single connection, so the setup and shutdown is mostly the same in both cases. I pushed the result. Please test, particularly on Windows. If you can use configure --enable-tap-tests and run them (make check in the src/bin/scripts subdir) that would be good too .. not sure whether that's expected to work on Windows. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Andres Freund wrote: On 2014-12-31 18:35:38 +0530, Amit Kapila wrote: + PQsetnonblocking(connSlot[0].connection, 1); + + for (i = 1; i concurrentCons; i++) + { + connSlot[i].connection = connectDatabase(dbname, host, port, username, + prompt_password, progname, false); + + PQsetnonblocking(connSlot[i].connection, 1); + connSlot[i].isFree = true; + connSlot[i].sock = PQsocket(connSlot[i].connection); + } Are you sure about this global PQsetnonblocking()? This means that you might not be able to send queries... And you don't seem to be waiting for sockets waiting for writes in the select loop - which means you might end up being stuck waiting for reads when you haven't submitted the query. I think you might need a more complex select() loop. On nonfree connections also wait for writes if PQflush() returns != 0. I removed the PQsetnonblocking() calls. They were a misunderstanding, I think. +/* + * GetIdleSlot + * Process the slot list, if any free slot is available then return + * the slotid else perform the select on all the socket's and wait + * until atleast one slot becomes available. + */ +static int +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, + const char *progname, bool completedb) +{ + int i; + fd_set slotset; Hm, you probably need to limit -j to FD_SETSIZE - 1 or so. I tried without the check to use 1500 connections, and select() didn't even blink -- everything worked fine vacuuming 1500 tables in parallel on a set of 2000 tables. Not sure what's the actual limit but my FD_SETSIZE says 1024, so I'm clearly over the limit. (I tried to run it with -j2000 but the server didn't start with that many connections. I didn't try any intermediate numbers.) Anyway I added the check. I fixed some more minor issues and pushed. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 22 January 2015 23:16, Alvaro Herrera Wrote, Here's v23. There are two things that continue to bother me and I would like you, dear patch author, to change them before committing this patch: 1. I don't like having vacuum_one_database() and a separate vacuum_one_database_parallel(). I think we should merge them into one function, which does either thing according to parameters. There's plenty in there that's duplicated. 2. in particular, the above means that run_parallel_vacuum can no longer exist as it is. Right now vacuum_one_database_parallel relies on run_parallel_vacuum to do the actual job parallellization. I would like to have that looping in the improved vacuum_one_database() function instead. Looking forward to v24, Thanks you for your effort, I have tried to change the patch as per your instructions and come up with v24, Changes: 1. In current patch vacuum_one_database (for table list), have the table loop outside and analyze_stage loop inside, so it will finish All three stage for one table first and then pick the next table. But vacuum_one_database_parallel will do the stage loop outside and will call run_parallel_vacuum, Which will have table loop, so for one stage all the tables will be vacuumed first, then go to next stage. So for merging two function both functions behaviors should be identical, I think if user have given a list of tables in analyze-in-stages, than doing all the table Atleast for one stage and then picking next stage will be better solution so I have done it that way. 2. in select_loop For WIN32 TranslateSocketError function I replaced with if (WSAGetLastError() == WSAEINTR) errno == EINTR; otherwise I have to expose TranslateSocketError function from socket and include extra header. I have tested in windows also its working fine. Regards, Dilip vacuumdb_parallel_v24.patch Description: vacuumdb_parallel_v24.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Dilip kumar wrote: Changes: 1. In current patch vacuum_one_database (for table list), have the table loop outside and analyze_stage loop inside, so it will finish All three stage for one table first and then pick the next table. But vacuum_one_database_parallel will do the stage loop outside and will call run_parallel_vacuum, Which will have table loop, so for one stage all the tables will be vacuumed first, then go to next stage. So for merging two function both functions behaviors should be identical, I think if user have given a list of tables in analyze-in-stages, than doing all the table Atleast for one stage and then picking next stage will be better solution so I have done it that way. Yeah, I think the stages loop should be outermost, as discussed upthread somewhere -- it's better to have initial stats for all tables as soon as possible, and improve them later, than have some tables/dbs with no stats for a longer period while full stats are computed for some specific tables/database. I'm tweaking your v24 a bit more now, thanks -- main change is to make vacuum_one_database be called only to run one analyze stage, so it never iterates for each stage; callers must iterate calling it multiple times in those cases. (There's only one callsite that needs changing anyway.) 2. in select_loop For WIN32 TranslateSocketError function I replaced with if (WSAGetLastError() == WSAEINTR) errno == EINTR; otherwise I have to expose TranslateSocketError function from socket and include extra header. Grumble. Don't like this bit, but moving TranslateSocketError to src/common is outside the scope of this patch, so okay. (pg_dump's parallel stuff has the same issue anyway.) In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) I have tested in windows also its working fine. Great, thanks. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Here's v23. I reworked a number of things. First, I changed trivial stuff like grouping all the vacuuming options in a struct, to avoid passing an excessive number of arguments to functions. full, freeze, analyze_only, and_analyze and verbose are all in a single struct now. Also, the stage_commands and stage_messages was duplicated by your patch; I moved them to a file-level static struct. I made prepare_command reset the string buffer and receive an optional table name, so that it can append it to the generated command, and append the semicolon as well. Forcing the callers to reset the string before calling, and having them add the table name and semicolon afterwards was awkward and unnecessarily verbose. You had a new in_abort() function in common.c which seems an unnecessary layer; in its place I just exported the inAbort boolean flag it was returning, and renamed to CancelRequested. I was then troubled by the fact that vacuum_one_database() was being called in a loop by main() when multiple tables are vacuumed, but vacuum_parallel() was doing the loop internally. I found this discrepancy confusing, so I renamed that new function to vacuum_one_database_parallel and modified the original vacuum_one_database to do the loop internally as well. Now they are, in essence, a mirror of each other, one doing the parallel stuff and one doing it serially. This seems to make more sense to me -- but see below. I also modified some underlying stuff like GetIdleSlot returning a ParallelSlot pointer instead of an array index. Since its caller always has to dereference the array with the given index, it makes more sense to return the right element pointer instead, so I made it do that. Also, that way, instead of returning NO_SLOT in case of error it can just return NULL; no need for extra cognitive burden. I also changed select_loop. In your patch it had two implementations, one WIN32 and another one for the rest. It looks nicer to me to have only one with small exceptions in the places that need it. (I haven't tested the WIN32 path.) Also, instead of returning ERROR_IN_ABORT I made it set a boolean flag in case of error, which seems cleaner. I changed GetQueryResult as I described in a previous message. There are two things that continue to bother me and I would like you, dear patch author, to change them before committing this patch: 1. I don't like having vacuum_one_database() and a separate vacuum_one_database_parallel(). I think we should merge them into one function, which does either thing according to parameters. There's plenty in there that's duplicated. 2. in particular, the above means that run_parallel_vacuum can no longer exist as it is. Right now vacuum_one_database_parallel relies on run_parallel_vacuum to do the actual job parallellization. I would like to have that looping in the improved vacuum_one_database() function instead. Looking forward to v24, -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml index 3ecd999..211235a 100644 --- a/doc/src/sgml/ref/vacuumdb.sgml +++ b/doc/src/sgml/ref/vacuumdb.sgml @@ -204,6 +204,25 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-j replaceable class=parameterjobs/replaceable/option/term + termoption--jobs=replaceable class=parameternjobs/replaceable/option/term + listitem + para +This option will enable the vacuum operation to run on concurrent +connections. Maximum number of tables can be vacuumed concurrently +is equal to number of jobs. If number of jobs given is more than +number of tables then number of jobs will be set to number of tables. + /para + para +applicationvacuumdb/application will open +replaceable class=parameter njobs/replaceable connections to the +database, so make sure your xref linkend=guc-max-connections +setting is high enough to accommodate all connections. + /para + /listitem + /varlistentry + + varlistentry termoption--analyze-in-stages/option/term listitem para diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c index d942a75..1bf7611 100644 --- a/src/bin/pg_dump/parallel.c +++ b/src/bin/pg_dump/parallel.c @@ -1160,7 +1160,7 @@ select_loop(int maxFd, fd_set *workerset) i = select(maxFd + 1, workerset, NULL, NULL, NULL); /* - * If we Ctrl-C the master process , it's likely that we interrupt + * If we Ctrl-C the master process, it's likely that we interrupt * select() here. The signal handler will set wantAbort == true and * the shutdown journey starts from here. Note that we'll come back * here later when we tell all workers to terminate and read their diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c index 6bfe2e6..da142aa
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Jan 22, 2015 at 8:22 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. IIRC it is done to match the existing behaviour where such errors are ignored we use this utility to vacuum database. I think that's fine, but we should do it always, not just in whole-database mode. I've been hacking this patch today BTW; hope to have something to show tomorrow. Great! With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila wrote: On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. IIRC it is done to match the existing behaviour where such errors are ignored we use this utility to vacuum database. I think that's fine, but we should do it always, not just in whole-database mode. I've been hacking this patch today BTW; hope to have something to show tomorrow. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. Finally, I think it's better to report the missing relation error, even if we're going to return true to continue processing other tables. That makes the situation clearer to the user. So the function would end up looking like this: /* * GetQueryResult * * Process the query result. Returns true if there's no error, false * otherwise -- but errors about trying to vacuum a missing relation are * ignored. */ static bool GetQueryResult(PGconn *conn, errorOptions *erropts) { PGresult*result = NULL; SetCancelConn(conn); while ((result = PQgetResult(conn)) != NULL) { /* * If errors are found, report them. Errors about a missing table are * harmless so we continue processing, but die for other errors. */ if (PQresultStatus(result) != PGRES_COMMAND_OK) { char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE); fprintf(stderr, _(%s: vacuuming of database \%s\ failed: %s), erropts-progname, erropts-dbname, PQerrorMessage(conn)); if (sqlState strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0) { PQclear(result); return false; } } PQclear(result); } ResetCancelConn(); return true; } -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Michael Paquier wrote: Andres, this patch needs more effort from the author, right? So marking it as returned with feedback. I will give this patch a look in the current commitfest, if you can please set as 'needs review' instead with me as reviewer, so that I don't forget, I would appreciate it. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jan 16, 2015 at 12:53 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: Andres, this patch needs more effort from the author, right? So marking it as returned with feedback. I will give this patch a look in the current commitfest, if you can please set as 'needs review' instead with me as reviewer, so that I don't forget, I would appreciate it. Fine for me, done this way. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-12-31 18:35:38 +0530, Amit Kapila wrote: + termoption-j replaceable class=parameterjobs/replaceable/option/term + termoption--jobs=replaceable class=parameternjobs/replaceable/option/term + listitem + para +Number of concurrent connections to perform the operation. +This option will enable the vacuum operation to run on asynchronous +connections, at a time one table will be operated on one connection. +So at one time as many tables will be vacuumed parallely as number of +jobs. If number of jobs given are more than number of tables then +number of jobs will be set to number of tables. asynchronous connections isn't a very well defined term. Also, the second part of that sentence doesn't seem to be gramattically correct. + /para + para +applicationvacuumdb/application will open +replaceable class=parameter njobs/replaceable connections to the +database, so make sure your xref linkend=guc-max-connections +setting is high enough to accommodate all connections. + /para Isn't it njobs+1? @@ -141,6 +199,7 @@ main(int argc, char *argv[]) } } + optind++; Hm, where's that coming from? + PQsetnonblocking(connSlot[0].connection, 1); + + for (i = 1; i concurrentCons; i++) + { + connSlot[i].connection = connectDatabase(dbname, host, port, username, + prompt_password, progname, false); + + PQsetnonblocking(connSlot[i].connection, 1); + connSlot[i].isFree = true; + connSlot[i].sock = PQsocket(connSlot[i].connection); + } Are you sure about this global PQsetnonblocking()? This means that you might not be able to send queries... And you don't seem to be waiting for sockets waiting for writes in the select loop - which means you might end up being stuck waiting for reads when you haven't submitted the query. I think you might need a more complex select() loop. On nonfree connections also wait for writes if PQflush() returns != 0. +/* + * GetIdleSlot + * Process the slot list, if any free slot is available then return + * the slotid else perform the select on all the socket's and wait + * until atleast one slot becomes available. + */ +static int +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, + const char *progname, bool completedb) +{ + int i; + fd_set slotset; Hm, you probably need to limit -j to FD_SETSIZE - 1 or so. + int firstFree = -1; + pgsocket maxFd; + + for (i = 0; i max_slot; i++) + if (pSlot[i].isFree) + return i; + FD_ZERO(slotset); + + maxFd = pSlot[0].sock; + + for (i = 0; i max_slot; i++) + { + FD_SET(pSlot[i].sock, slotset); + if (pSlot[i].sock maxFd) + maxFd = pSlot[i].sock; + } So we're waiting for idle connections? I think you'll have to have to use two fdsets here, and set the write set based on PQflush() != 0. +/* + * A select loop that repeats calling select until a descriptor in the read + * set becomes readable. On Windows we have to check for the termination event + * from time to time, on Unix we can just block forever. + */ Should a) mention why we have to check regularly on windows b) that on linux we don't have to because we send a cancel event from the signal handler. +static int +select_loop(int maxFd, fd_set *workerset) +{ + int i; + fd_set saveSet = *workerset; +#ifdef WIN32 + /* should always be the master */ Hm? I have to say, this is a fairly large patch for such a minor feature... Andres, this patch needs more effort from the author, right? So marking it as returned with feedback. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 2014-12-31 18:35:38 +0530, Amit Kapila wrote: + termoption-j replaceable class=parameterjobs/replaceable/option/term + termoption--jobs=replaceable class=parameternjobs/replaceable/option/term + listitem + para +Number of concurrent connections to perform the operation. +This option will enable the vacuum operation to run on asynchronous +connections, at a time one table will be operated on one connection. +So at one time as many tables will be vacuumed parallely as number of +jobs. If number of jobs given are more than number of tables then +number of jobs will be set to number of tables. asynchronous connections isn't a very well defined term. Also, the second part of that sentence doesn't seem to be gramattically correct. + /para + para +applicationvacuumdb/application will open +replaceable class=parameter njobs/replaceable connections to the +database, so make sure your xref linkend=guc-max-connections +setting is high enough to accommodate all connections. + /para Isn't it njobs+1? @@ -141,6 +199,7 @@ main(int argc, char *argv[]) } } + optind++; Hm, where's that coming from? + PQsetnonblocking(connSlot[0].connection, 1); + + for (i = 1; i concurrentCons; i++) + { + connSlot[i].connection = connectDatabase(dbname, host, port, username, + prompt_password, progname, false); + + PQsetnonblocking(connSlot[i].connection, 1); + connSlot[i].isFree = true; + connSlot[i].sock = PQsocket(connSlot[i].connection); + } Are you sure about this global PQsetnonblocking()? This means that you might not be able to send queries... And you don't seem to be waiting for sockets waiting for writes in the select loop - which means you might end up being stuck waiting for reads when you haven't submitted the query. I think you might need a more complex select() loop. On nonfree connections also wait for writes if PQflush() returns != 0. +/* + * GetIdleSlot + * Process the slot list, if any free slot is available then return + * the slotid else perform the select on all the socket's and wait + * until atleast one slot becomes available. + */ +static int +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, + const char *progname, bool completedb) +{ + int i; + fd_set slotset; Hm, you probably need to limit -j to FD_SETSIZE - 1 or so. + int firstFree = -1; + pgsocket maxFd; + + for (i = 0; i max_slot; i++) + if (pSlot[i].isFree) + return i; + FD_ZERO(slotset); + + maxFd = pSlot[0].sock; + + for (i = 0; i max_slot; i++) + { + FD_SET(pSlot[i].sock, slotset); + if (pSlot[i].sock maxFd) + maxFd = pSlot[i].sock; + } So we're waiting for idle connections? I think you'll have to have to use two fdsets here, and set the write set based on PQflush() != 0. +/* + * A select loop that repeats calling select until a descriptor in the read + * set becomes readable. On Windows we have to check for the termination event + * from time to time, on Unix we can just block forever. + */ Should a) mention why we have to check regularly on windows b) that on linux we don't have to because we send a cancel event from the signal handler. +static int +select_loop(int maxFd, fd_set *workerset) +{ + int i; + fd_set saveSet = *workerset; +#ifdef WIN32 + /* should always be the master */ Hm? I have to say, this is a fairly large patch for such a minor feature... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jan 2, 2015 at 11:47 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 31 December 2014 18:36, Amit Kapila Wrote, The patch looks good to me. I have done couple of cosmetic changes (spelling mistakes, improve some comments, etc.), check the same once and if you are okay, we can move ahead. Thanks for review and changes, changes looks fine to me.. Okay, I have marked this patch as Ready For Committer Notes for Committer - There is one behavioural difference in the handling of --analyze-in-stages switch, when individual tables (by using -t option) are analyzed by using this switch, patch will process (in case of concurrent jobs) all the given tables for stage-1 and then for stage-2 and so on whereas in the unpatched code it will process all the three stages table by table (table-1 all three stages, table-2 all three stages and so on). I think the new behaviour is okay as the same is done when this utility does vacuum for whole database. As there was no input from any committer on this point, I thought it is better to get the same rather than waiting more just for one point. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila amit.kapil...@gmail.com wrote: Notes for Committer - There is one behavioural difference in the handling of --analyze-in-stages switch, when individual tables (by using -t option) are analyzed by using this switch, patch will process (in case of concurrent jobs) all the given tables for stage-1 and then for stage-2 and so on whereas in the unpatched code it will process all the three stages table by table (table-1 all three stages, table-2 all three stages and so on). I think the new behaviour is okay as the same is done when this utility does vacuum for whole database. IMV, the change is for the better. The whole point of --analyze-in-stages is to get minimal statistics so that good enough plans will be built for most queries to allow a production database to be brought back on-line quickly, followed by generating increasing granularity (which takes longer but should help ensure best plan) while the database is in use with the initial statistics. Doing all three levels for one table before generating the rough statistics for the others doesn't help with that, so I see this change as fixing a bug. Is it feasible to break that part out as a separate patch? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jan 2, 2015 at 8:34 PM, Kevin Grittner kgri...@ymail.com wrote: Amit Kapila amit.kapil...@gmail.com wrote: Notes for Committer - There is one behavioural difference in the handling of --analyze-in-stages switch, when individual tables (by using -t option) are analyzed by using this switch, patch will process (in case of concurrent jobs) all the given tables for stage-1 and then for stage-2 and so on whereas in the unpatched code it will process all the three stages table by table (table-1 all three stages, table-2 all three stages and so on). I think the new behaviour is okay as the same is done when this utility does vacuum for whole database. IMV, the change is for the better. The whole point of --analyze-in-stages is to get minimal statistics so that good enough plans will be built for most queries to allow a production database to be brought back on-line quickly, followed by generating increasing granularity (which takes longer but should help ensure best plan) while the database is in use with the initial statistics. Doing all three levels for one table before generating the rough statistics for the others doesn't help with that, so I see this change as fixing a bug. Is it feasible to break that part out as a separate patch? Currently as the patch stands the fix (or new behaviour) is only implemented for the multiple jobs option, however to fix this in base code a separate patch is required. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 29, 2014 at 11:10 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 29 December 2014 10:22 Amit Kapila Wrote, I think nothing more to be handled from my side, you can go ahead with review.. The patch looks good to me. I have done couple of cosmetic changes (spelling mistakes, improve some comments, etc.), check the same once and if you are okay, we can move ahead. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com vacuumdb_parallel_v21.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Dec 24, 2014 at 4:00 PM, Dilip kumar dilip.ku...@huawei.com wrote: Case1:In Case for CompleteDB: In base code first it will process all the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at least up to certain stage. But If we process all the stages for one table first, and then take the other table for processing the stage 1, then it may happen that for some table all the stages are processed, but others are waiting for even first stage to be processed, this will affect the functionality for analyze-in-stages. Case2: In case for independent tables like –t “t1” –t “t2” In base code also currently we are processing all the stages for first table and processing same for next table and so on. I think, if user is giving multiple tables together then his purpose might be to analyze those tables together stage by stage, but in our code we analyze table1 in all stages and then only considering the next table. So basically you want to say that currently the processing for tables with --analyze-in-stages switch is different when the user executes vacuumdb for whole database versus when it does for individual tables (multiple tables together). In the proposed patch the processing for tables will be same for either cases (whole database or independent tables). I think your point has merit, so lets proceed with this as it is in your patch. Do you have anything more to handle in patch or shall I take one another look and pass it to committer if it is ready for the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 29 December 2014 10:22 Amit Kapila Wrote, Case1:In Case for CompleteDB: In base code first it will process all the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at least up to certain stage. But If we process all the stages for one table first, and then take the other table for processing the stage 1, then it may happen that for some table all the stages are processed, but others are waiting for even first stage to be processed, this will affect the functionality for analyze-in-stages. Case2: In case for independent tables like –t “t1” –t “t2” In base code also currently we are processing all the stages for first table and processing same for next table and so on. I think, if user is giving multiple tables together then his purpose might be to analyze those tables together stage by stage, but in our code we analyze table1 in all stages and then only considering the next table. So basically you want to say that currently the processing for tables with --analyze-in-stages switch is different when the user executes vacuumdb for whole database versus when it does for individual tables (multiple tables together). In the proposed patch the processing for tables will be same for either cases (whole database or independent tables). I think your point has merit, so lets proceed with this as it is in your patch. Do you have anything more to handle in patch or shall I take one another look and pass it to committer if it is ready for the same. I think nothing more to be handled from my side, you can go ahead with review.. Regards, Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 19 December 2014 16:41, Amit Kapila Wrote, One idea is to send all the stages and corresponding Analyze commands to server in one go which means something like BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0; Analyze t1; COMMIT; BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay; Analyze t1; COMMIT; BEGIN; RESET default_statistics_target; Analyze t1; COMMIT; Case1:In Case for CompleteDB: In base code first it will process all the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at least up to certain stage. But If we process all the stages for one table first, and then take the other table for processing the stage 1, then it may happen that for some table all the stages are processed, but others are waiting for even first stage to be processed, this will affect the functionality for analyze-in-stages. Case2: In case for independent tables like –t “t1” –t “t2” In base code also currently we are processing all the stages for first table and processing same for next table and so on. I think, if user is giving multiple tables together then his purpose might be to analyze those tables together stage by stage, but in our code we analyze table1 in all stages and then only considering the next table. So for tables also it should be like Stage1: T1 T2 .. Stage2: T1 T2 … Thoughts? Now, still parallel operations in other backends could lead to page misses, but I think the impact will be minimized. Regards, Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 15, 2014 at 4:18 PM, Dilip kumar dilip.ku...@huawei.com wrote: On December 2014 17:31 Amit Kapila Wrote, Hmm, theoretically I think new behaviour could lead to more I/O in certain cases as compare to existing behaviour. The reason for more I/O is that in the new behaviour, while doing Analyze for a particular table at different targets, in-between it has Analyze of different table as well, so the pages in shared buffers or OS cache for a particular table needs to be reloded again for a new target whereas currently it will do all stages of Analyze for a particular table in one-go which means that each stage of Analyze could get benefit from the pages of a table loaded by previous stage. If you agree, then we should try to avoid this change in new behaviour. I will work on this comment and post the updated patch.. One idea is to send all the stages and corresponding Analyze commands to server in one go which means something like BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0; Analyze t1; COMMIT; BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay; Analyze t1; COMMIT; BEGIN; RESET default_statistics_target; Analyze t1; COMMIT; Now, still parallel operations in other backends could lead to page misses, but I think the impact will be minimized. I will move this patch to the latest commitfest. By the way, I think this patch should be in Waiting On Author stage. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On December 2014 17:31 Amit Kapila Wrote, I suggest rather than removing, edit the comment to indicate the idea behind code at that place. Done Okay, I think this part of code is somewhat similar to what is done in pg_dump/parallel.c with some differences related to handling of inAbort. One thing I have noticed here which could lead to a problem is that caller of select_loop() function assumes that return value is less than zero only if there is a cancel request which I think is wrong, because select system call could also return -1 in case of error. I am referring below code in above context: + if (i 0) + { + /* + * This can only happen if user has sent the cancel request using + * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result. + */ + + GetQueryResult(pSlot[0].connection, dbname, progname, +completedb); Now for abort case I am using special error code, and other than that case we will assert, this behavior is same as in pg_dump. Hmm, theoretically I think new behaviour could lead to more I/O in certain cases as compare to existing behaviour. The reason for more I/O is that in the new behaviour, while doing Analyze for a particular table at different targets, in-between it has Analyze of different table as well, so the pages in shared buffers or OS cache for a particular table needs to be reloded again for a new target whereas currently it will do all stages of Analyze for a particular table in one-go which means that each stage of Analyze could get benefit from the pages of a table loaded by previous stage. If you agree, then we should try to avoid this change in new behaviour. I will work on this comment and post the updated patch.. I will move this patch to the latest commitfest. Regards, Dilip vacuumdb_parallel_v20.patch Description: vacuumdb_parallel_v20.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Sat, Dec 6, 2014 at 9:01 PM, Amit Kapila amit.kapil...@gmail.com wrote: If you agree, then we should try to avoid this change in new behaviour. Still seeing many concerns about this patch, so marking it as returned with feedback. If possible, switching it to the next CF would be fine I guess as this work is still being continued. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 06 December 2014 20:01 Amit Kapila Wrote I wanted to understand what exactly the above loop is doing. a. first of all the comment on top of it says Some of the slot are free, ..., if some slot is free, then why do you want to process the results? (Do you mean to say that *None* of the slot is free?) This comment is wrong, I will remove this. b. IIUC, you have called function select_loop(maxFd, slotset) to check if socket descriptor is readable, if yes then why in do..while loop the same maxFd is checked always, don't you want to check different socket descriptors? I am not sure if I am missing something here select_loop(maxFd, slotset) maxFd is the max descriptor among all SETS, and slotset contains all the descriptor, so if any of the descriptor get some message select_loop will come out, and once select loop come out, we need to check how many descriptor have got the message from server so we loop and process the results. So it’s not only for a maxFd, it’s for all the descriptors. And it’s in do..while loop, because it possible that select_loop come out because of some intermediate message on any of the socket but still query is not complete, and if none of the socket is still free (that we check in below for loop), then go to select_loop again. c. After checking the socket descriptor for maxFd why you want to run run the below for loop for all slots? for (i = 0; i max_slot; i++) After Select loop is out, it’s possible that we might have got result on multiple connections, so consume input and check if still busy, then nothing to do, but if finished process the result and mark the connection free. And if any of the connection is free, then we will break the do..while loop. From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: 06 December 2014 20:01 To: Dilip kumar Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ] On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar dilip.ku...@huawei.commailto:dilip.ku...@huawei.com wrote: On 24 November 2014 11:29, Amit Kapila Wrote, I have verified that all previous comments are addressed and the new version is much better than previous version. here we are setting each target once and doing for all the tables.. Hmm, theoretically I think new behaviour could lead to more I/O in certain cases as compare to existing behaviour. The reason for more I/O is that in the new behaviour, while doing Analyze for a particular table at different targets, in-between it has Analyze of different table as well, so the pages in shared buffers or OS cache for a particular table needs to be reloded again for a new target whereas currently it will do all stages of Analyze for a particular table in one-go which means that each stage of Analyze could get benefit from the pages of a table loaded by previous stage. If you agree, then we should try to avoid this change in new behaviour. Please provide you opinion. I have few questions regarding function GetIdleSlot() + static int + GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, +const char *progname, bool completedb) { .. +/* +* Some of the slot are free, Process the results for slots whichever +* are free +*/ + +do +{ +SetCancelConn(pSlot[0].connection); + +i = select_loop(maxFd, slotset); + +ResetCancelConn(); + +if (i 0) +{ +/* + * This can only happen if user has sent the cancel request using +* Ctrl+C, Cancel is handled by 0th slot, so fetch the error result. +*/ + + GetQueryResult(pSlot[0].connection, dbname, progname, + completedb); +return NO_SLOT; +} + +Assert(i != 0); + + for (i = 0; i max_slot; i++) +{ +if (!FD_ISSET(pSlot[i].sock, slotset)) +continue; + +PQconsumeInput(pSlot[i].connection); + if (PQisBusy(pSlot[i].connection)) +continue; + + pSlot[i].isFree = true; + +if (!GetQueryResult(pSlot[i].connection, dbname, progname, + completedb)) + return NO_SLOT; + +if (firstFree 0) +firstFree = i; + } +}while(firstFree 0); } I wanted to understand what exactly the above loop is doing. a. first of all the comment on top
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 8, 2014 at 7:33 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 06 December 2014 20:01 Amit Kapila Wrote I wanted to understand what exactly the above loop is doing. a. first of all the comment on top of it says Some of the slot are free, ..., if some slot is free, then why do you want to process the results? (Do you mean to say that *None* of the slot is free?) This comment is wrong, I will remove this. I suggest rather than removing, edit the comment to indicate the idea behind code at that place. b. IIUC, you have called function select_loop(maxFd, slotset) to check if socket descriptor is readable, if yes then why in do..while loop the same maxFd is checked always, don't you want to check different socket descriptors? I am not sure if I am missing something here select_loop(maxFd, slotset) So it’s not only for a maxFd, it’s for all the descriptors. And it’s in do..while loop, because it possible that select_loop come out because of some intermediate message on any of the socket but still query is not complete, Okay, I think this part of code is somewhat similar to what is done in pg_dump/parallel.c with some differences related to handling of inAbort. One thing I have noticed here which could lead to a problem is that caller of select_loop() function assumes that return value is less than zero only if there is a cancel request which I think is wrong, because select system call could also return -1 in case of error. I am referring below code in above context: + if (i 0) + { + /* + * This can only happen if user has sent the cancel request using + * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result. + */ + + GetQueryResult(pSlot[0].connection, dbname, progname, +completedb); With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 24 November 2014 11:29, Amit Kapila Wrote, I have verified that all previous comments are addressed and the new version is much better than previous version. here we are setting each target once and doing for all the tables.. Hmm, theoretically I think new behaviour could lead to more I/O in certain cases as compare to existing behaviour. The reason for more I/O is that in the new behaviour, while doing Analyze for a particular table at different targets, in-between it has Analyze of different table as well, so the pages in shared buffers or OS cache for a particular table needs to be reloded again for a new target whereas currently it will do all stages of Analyze for a particular table in one-go which means that each stage of Analyze could get benefit from the pages of a table loaded by previous stage. If you agree, then we should try to avoid this change in new behaviour. Please provide you opinion. I have few questions regarding function GetIdleSlot() + static int + GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, + const char *progname, bool completedb) { .. + /* + * Some of the slot are free, Process the results for slots whichever + * are free + */ + + do + { + SetCancelConn(pSlot[0].connection); + + i = select_loop(maxFd, slotset); + + ResetCancelConn(); + + if (i 0) + { + /* + * This can only happen if user has sent the cancel request using + * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result. + */ + + GetQueryResult(pSlot[0].connection, dbname, progname, + completedb); + return NO_SLOT; + } + + Assert(i != 0); + + for (i = 0; i max_slot; i++) + { + if (!FD_ISSET(pSlot[i].sock, slotset)) + continue; + + PQconsumeInput(pSlot[i].connection); + if (PQisBusy(pSlot[i].connection)) + continue; + + pSlot[i].isFree = true; + + if (!GetQueryResult(pSlot[i].connection, dbname, progname, + completedb)) + return NO_SLOT; + + if (firstFree 0) + firstFree = i; + } + }while(firstFree 0); } I wanted to understand what exactly the above loop is doing. a. first of all the comment on top of it says Some of the slot are free, ..., if some slot is free, then why do you want to process the results? (Do you mean to say that *None* of the slot is free?) b. IIUC, you have called function select_loop(maxFd, slotset) to check if socket descriptor is readable, if yes then why in do..while loop the same maxFd is checked always, don't you want to check different socket descriptors? I am not sure if I am missing something here c. After checking the socket descriptor for maxFd why you want to run run the below for loop for all slots? for (i = 0; i max_slot; i++) With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 24 November 2014 11:29, Amit Kapila Wrote, I think still some of the comments given upthread are not handled: a. About cancel handling Your Actual comment was -- One other related point is that I think still cancel handling mechanism is not completely right, code is doing that when there are not enough number of freeslots, but otherwise it won't handle the cancel request, basically I am referring to below part of code: run_parallel_vacuum() { .. for (cell = tables-head; cell; cell = cell-next) { .. free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname, completedb); … PQsendQuery(slotconn, sql.data); resetPQExpBuffer(sql); } 1. I think only if connection is going for Slot wait, it will be in blocking, or while GetQueryResult, so we have Handle SetCancleRequest both places. 2. Now a case (as you mentioned), when there are enough slots, and and above for loop is running if user do Ctrl+C then this will not break, This I have handled by checking inAbort Mode inside the for loop before sending the new command, I think this we cannot do by setting the SetCancel because only when query receive some data it will realize that it canceled and it will fail, but until connection is not going to receive data it will not see the failure. So I have handled inAbort directly. b. Setting of inAbort flag for case where PQCancel is successful Your Actual comment was -- Yeah, user has tried to terminate, however utility will emit the message: Could not send cancel request in such a case and still silently tries to cancel and disconnect all connections. You are right, I have fixed the code, now in case of failure we need not to set inAbort Flag.. c. Performance data of --analyze-in-stages switch Performance Data -- CPU 8 cores RAM = 16GB checkpoint_segments=256 Before each test, run the test.sql (attached) Un-patched - dilip@linux-ltr9:/home/dilip/9.4/install/bin time ./vacuumdb -p 9005 --analyze-in-stages -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics real0m0.843s user0m0.000s sys 0m0.000s Patched dilip@linux-ltr9:/home/dilip/9.4/install/bin time ./vacuumdb -p 9005 --analyze-in-stages -j 2 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics real0m0.593s user0m0.004s sys 0m0.004s dilip@linux-ltr9:/home/dilip/9.4/install/bin time ./vacuumdb -p 9005 --analyze-in-stages -j 4 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics real0m0.421s user0m0.004s sys 0m0.004s I think in 2 connections we can get 30% improvement. d. Have one pass over the comments in patch. I could still some wrong multiline comments. Refer below: + /* Otherwise, we got a stage from vacuum_all_databases(), so run + * only that one. */ Checked all, and fixed.. While testing, I found one more different behavior compare to base code, Base Code: dilip@linux-ltr9:/home/dilip/9.4/install/bin time ./vacuumdb -p 9005 -t t1 -t t2 -t t3 -t t4 --analyze-in-stages -d Postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics real0m0.605s user0m0.004s sys 0m0.000s I think it should be like, SET default_statistics_target=1; do for all three tables SET default_statistics_target=10; do for all three tables so on.. With Patched dilip@linux-ltr9:/home/dilip/9.4/install/bin time ./vacuumdb -p 9005 -t t1 -t t2 -t t3 -t t4 --analyze-in-stages -j 2 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics real0m0.395s user0m0.000s sys 0m0.004s here we are setting each target once and doing for all the tables.. Please provide you opinion. Regards, Dilip Kumar test.sql Description: test.sql vacuumdb_parallel_v19.patch Description: vacuumdb_parallel_v19.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 November 2014 14:45, Amit Kapila Wrote Thanks a lot for debugging and fixing the issue.. The stacktrace of crash is as below: #0 0x0080108cf3a4 in .strlen () from /lib64/libc.so.6 #1 0x0080108925bc in ._IO_vfprintf () from /lib64/libc.so.6 #2 0x0080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from /lib64/libc.so.6 #3 0x0fff7e581590 in .appendPQExpBufferVA () from /data/akapila/workspace/master/installation/lib/libpq.so.5 #4 0x0fff7e581774 in .appendPQExpBuffer () from /data/akapila/workspace/master/installation/lib/libpq.so.5 #5 0x10003748 in .run_parallel_vacuum () #6 0x10003f60 in .vacuum_parallel () #7 0x10002ae4 in .main () (gdb) f 5 #5 0x10003748 in .run_parallel_vacuum () So now the real reason here is that the list of tables passed to function is corrupted. The below code seems to be the real culprit: vacuum_parallel() { .. if (!tables || !tables-head) { SimpleStringList dbtables = {NULL, NULL}; ... .. tables = dbtables; } .. } In above code dbtables is local to if loop and code is using the address of same to assign it to tables which is used out of if block scope, moving declaration to the outer scope fixes the problem in my environment. Find the updated patch that fixes this problem attached with this mail. Let me know your opinion about the same. Yes, that’s the reason of corruption, this must be causing both the issues, sending corrupted query to server as well as crash at client side. While looking at this problem, I have noticed couple of other improvements: a. In prepare_command() function, patch is doing init of sql buffer (initPQExpBuffer(sql);) which I think is not required as both places from where this function is called, it is done by caller. I think this will lead to memory leak. Fixed.. b. In prepare_command() function, for fixed strings you can use appendPQExpBufferStr() which is what used in original code as well. Changed as per comment.. c. run_parallel_vacuum() { .. prepare_command(connSlot[free_slot].connection, full, verbose, and_analyze, analyze_only, freeze, sql); appendPQExpBuffer(sql, %s, cell-val); .. } I think it is better to end command with ';' by using appendPQExpBufferStr(sql, ;); in above code. Done Latest patch is attached, please have a look. Regards, Dilip Kumar vacuumdb_parallel_v18.patch Description: vacuumdb_parallel_v18.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Nov 24, 2014 at 7:34 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 23 November 2014 14:45, Amit Kapila Wrote Thanks a lot for debugging and fixing the issue.. Latest patch is attached, please have a look. I think still some of the comments given upthread are not handled: a. About cancel handling b. Setting of inAbort flag for case where PQCancel is successful c. Performance data of --analyze-in-stages switch d. Have one pass over the comments in patch. I could still some wrong multiline comments. Refer below: + /* Otherwise, we got a stage from vacuum_all_databases(), so run + * only that one. */ With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Nov 17, 2014 at 8:55 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 13 November 2014 15:35 Amit Kapila Wrote, As mentioned by you offlist that you are not able reproduce this issue, I have tried again and what I observe is that I am able to reproduce it only on *release* build and some cases work without this issue as well, example: ./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics So to me, it looks like this is a timing issue and please notice why in error the statement looks like ANALYZE ng minimal optimizer statistics (1 target). I think this is not a valid statement. Let me know if you still could not reproduce it. I will review the code and try to find the same, meanwhile if you can find some time to debug this, it will be really helpful. I think I have found the problem and fix for the same. The stacktrace of crash is as below: #0 0x0080108cf3a4 in .strlen () from /lib64/libc.so.6 #1 0x0080108925bc in ._IO_vfprintf () from /lib64/libc.so.6 #2 0x0080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from /lib64/libc.so.6 #3 0x0fff7e581590 in .appendPQExpBufferVA () from /data/akapila/workspace/master/installation/lib/libpq.so.5 #4 0x0fff7e581774 in .appendPQExpBuffer () from /data/akapila/workspace/master/installation/lib/libpq.so.5 #5 0x10003748 in .run_parallel_vacuum () #6 0x10003f60 in .vacuum_parallel () #7 0x10002ae4 in .main () (gdb) f 5 #5 0x10003748 in .run_parallel_vacuum () So now the real reason here is that the list of tables passed to function is corrupted. The below code seems to be the real culprit: vacuum_parallel() { .. if (!tables || !tables-head) { SimpleStringList dbtables = {NULL, NULL}; ... .. tables = dbtables; } .. } In above code dbtables is local to if loop and code is using the address of same to assign it to tables which is used out of if block scope, moving declaration to the outer scope fixes the problem in my environment. Find the updated patch that fixes this problem attached with this mail. Let me know your opinion about the same. While looking at this problem, I have noticed couple of other improvements: a. In prepare_command() function, patch is doing init of sql buffer (initPQExpBuffer(sql);) which I think is not required as both places from where this function is called, it is done by caller. I think this will lead to memory leak. b. In prepare_command() function, for fixed strings you can use appendPQExpBufferStr() which is what used in original code as well. c. run_parallel_vacuum() { .. prepare_command(connSlot[free_slot].connection, full, verbose, and_analyze, analyze_only, freeze, sql); appendPQExpBuffer(sql, %s, cell-val); .. } I think it is better to end command with ';' by using appendPQExpBufferStr(sql, ;); in above code. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com vacuumdb_parallel_v17.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 13 November 2014 15:35 Amit Kapila Wrote, As mentioned by you offlist that you are not able reproduce this issue, I have tried again and what I observe is that I am able to reproduce it only on *release* build and some cases work without this issue as well, example: ./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics So to me, it looks like this is a timing issue and please notice why in error the statement looks like ANALYZE ng minimal optimizer statistics (1 target). I think this is not a valid statement. Let me know if you still could not reproduce it. Thank you for looking into it once again.. I have tried with the release mode, but could not reproduce the same. By looking at server LOG sent by you “ANALYZE ng minimal optimizer statistics (1 target). ”, seems like some corruption. So actually looks like two issues here. 1. Query string sent to server is getting corrupted. 2. Client is getting crashed. I will review the code and try to find the same, meanwhile if you can find some time to debug this, it will be really helpful. Regards, Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Oct 27, 2014 at 5:26 PM, Amit Kapila amit.kapil...@gmail.com wrote: Going further with verification of this patch, I found below issue: Run the testcase.sql file at below link: http://www.postgresql.org/message-id/4205e661176a124faf891e0a6ba9135266347...@szxeml509-mbs.china.huawei.com ./vacuumdb --analyze-in-stages -j 8 -d postgres Generating minimal optimizer statistics (1 target) Segmentation fault Server Log: ERROR: syntax error at or near minimal at character 12 STATEMENT: ANALYZE ng minimal optimizer statistics (1 target) LOG: could not receive data from client: Connection reset by peer As mentioned by you offlist that you are not able reproduce this issue, I have tried again and what I observe is that I am able to reproduce it only on *release* build and some cases work without this issue as well, example: ./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres Generating minimal optimizer statistics (1 target) Generating medium optimizer statistics (10 targets) Generating default (full) optimizer statistics So to me, it looks like this is a timing issue and please notice why in error the statement looks like ANALYZE ng minimal optimizer statistics (1 target). I think this is not a valid statement. Let me know if you still could not reproduce it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Oct 28, 2014 at 9:27 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 28 October 2014 09:18, Amit Kapila Wrote, I am worried about the case if after setting the inAbort flag, PQCancel() fails (returns error). If select(maxFd + 1, workerset, NULL, NULL, tv); come out, we can know whether it came out because of cancel query and handle it accordingly. Yeah, it is fine for the case when PQCancel() is successful, what if it fails? I think even if select comes out due to any other reason, it will behave as if it came out due to Cancel, even though actually Cancel is failed, how are planning to handle that case? I think If PQcancel fails then also there is no problem, because we are setting inAbort flag in handle_sigint handler, it means user have tried to terminate. Yeah, user has tried to terminate, however utility will emit the message: Could not send cancel request in such a case and still silently tries to cancel and disconnect all connections. One other related point is that I think still cancel handling mechanism is not completely right, code is doing that when there are not enough number of freeslots, but otherwise it won't handle the cancel request, basically I am referring to below part of code: run_parallel_vacuum() { .. for (cell = tables-head; cell; cell = cell-next) { /* * This will give the free connection slot, if no slot is free it will * wait for atleast one slot to get free. */ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname, completedb); if (free_slot == NO_SLOT) { error = true; goto fail; } prepare_command(connSlot[free_slot].connection, full, verbose, and_analyze, analyze_only, freeze, sql); appendPQExpBuffer(sql, %s, cell-val); connSlot[free_slot].isFree = false; slotconn = connSlot[free_slot].connection; PQsendQuery(slotconn, sql.data); resetPQExpBuffer(sql); } .. } I am wondering if it would be better to setcancelconn in above loop. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Sat, Oct 25, 2014 at 5:52 PM, Amit Kapila amit.kapil...@gmail.com wrote: *** *** 358,363 handle_sigint(SIGNAL_ARGS) --- 358,364 /* Send QueryCancel if we are processing a database query */ if (cancelConn != NULL) { + inAbort = true; if (PQcancel(cancelConn, errbuf, sizeof(errbuf))) fprintf(stderr, _(Cancel request sent\n)); else Do we need to set inAbort flag incase PQcancel is successful? Basically if PQCancel fails due to any reason, I think behaviour can be undefined as the executing thread can assume that cancel is done. *** 391,396 consoleHandler(DWORD dwCtrlType) --- 392,399 EnterCriticalSection (cancelConnLock); if (cancelConn != NULL) { + inAbort = true; + You have set this flag in case of windows handler, however the same is never used incase of windows, are you expecting any use of this flag for windows? Going further with verification of this patch, I found below issue: Run the testcase.sql file at below link: http://www.postgresql.org/message-id/4205e661176a124faf891e0a6ba9135266347...@szxeml509-mbs.china.huawei.com ./vacuumdb --analyze-in-stages -j 8 -d postgres Generating minimal optimizer statistics (1 target) Segmentation fault Server Log: ERROR: syntax error at or near minimal at character 12 STATEMENT: ANALYZE ng minimal optimizer statistics (1 target) LOG: could not receive data from client: Connection reset by peer Fixed below issues and attached an updated patch with mail: 1. make check for docs gives below errors: { \ echo !ENTITY version \9.5devel\; \ echo !ENTITY majorversion \9.5\; \ } version.sgml '/usr/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt features-supported.sgml '/usr/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt features-unsupported.sgml '/usr/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt errcodes-table.sgml onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -s postgres.sgml onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for LISTITEM omitted, but OMITTAG NO was specified onsgmls:ref/vacuumdb.sgml:209:8: start tag was here onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for VARLISTENTRY omitted, but OMITTAG NO was specified onsgmls:ref/vacuumdb.sgml:206:5: start tag was here onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for VARIABLELIST omitted, but OMITTAG NO was specified onsgmls:ref/vacuumdb.sgml:79:4: start tag was here onsgmls:ref/vacuumdb.sgml:225:18:E: end tag for element LISTITEM which is not open onsgmls:ref/vacuumdb.sgml:226:21:E: end tag for element VARLISTENTRY which is not open onsgmls:ref/vacuumdb.sgml:228:18:E: document type does not allow element VARLISTENTRY here; assuming missing VARIABLELIST start-tag onsgmls:ref/vacuumdb.sgml:260:9:E: end tag for element PARA which is not open make: *** [check] Error 1 Fixed. 2. Below multi-line comment is wrong: /* Otherwise, we got a stage from vacuum_all_databases(), so run * only that one. */ Fixed. 3. fprintf(stderr, _(%s: vacuuming of database \%s\ failed: %s), progname, dbname, PQerrorMessage(conn)); indentation of fprintf is not proper. Fixed. 4. /* This can only happen if user has sent the cacle request using * Ctrl+C, Cancle is handled by 0th slot, so fetch the error result */ spelling of cancel is wrong and multi-line comment is not proper. Fixed 5. /* This can only happen if user has sent the cacle request using * Ctrl+C, Cancle is handled by 0th slot, so fetch the error result */ GetQueryResult(pSlot[0].connection, dbname, progname, completedb); indentation of completedb parameter is wrong. Fixed. 6. /* * vacuum_parallel * This function will open the multiple concurrent connections as * suggested by used, it will derive the table list using server call * if table list is not given by user and perform the vacuum call */ s/used/user Fixed. In general, I think you can once go through all the comments and code to see if similar issues exist at other places as well. I have done some performance test with the patch, data for which is as below: Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections = 128 checkpoint_segments=256 checkpoint_timeout=15min shared_buffers = 1GB Before each test, run the testcase.sql file at below link: http://www.postgresql.org/message-id/4205e661176a124faf891e0a6ba9135266347...@szxeml509-mbs.china.huawei.com Un-patched - time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -d postgres real 0m2.454s user 0m0.002s sys 0m0.006s Patched - time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 4 -d postgres real 0m1.691s user 0m0.001s sys 0m0.004s time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres real 0m1.496s user
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 25 October 2014 17:52, Amit Kapila Wrote, *** *** 358,363 handle_sigint(SIGNAL_ARGS) --- 358,364 /* Send QueryCancel if we are processing a database query */ if (cancelConn != NULL) { + inAbort = true; if (PQcancel(cancelConn, errbuf, sizeof(errbuf))) fprintf(stderr, _(Cancel request sent\n)); else Do we need to set inAbort flag incase PQcancel is successful? Basically if PQCancel fails due to any reason, I think behaviour can be undefined as the executing thread can assume that cancel is done. *** 391,396 consoleHandler(DWORD dwCtrlType) --- 392,399 EnterCriticalSection (cancelConnLock); if (cancelConn != NULL) { + inAbort = true; + In “handle_sigint” function if we are going to cancel the query that time I am setting the flag inAbort (even when it is success), so that in “select_loop” function If select(maxFd + 1, workerset, NULL, NULL, tv); come out, we can know whether it came out because of cancel query and handle it accordingly. i = select(maxFd + 1, workerset, NULL, NULL, NULL); if (in_abort()) //loop break because of cancel query, so return fail… { return -1; } if (i 0 errno == EINTR) continue; Regards, Dilip Kumar
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Oct 28, 2014 at 9:03 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 25 October 2014 17:52, Amit Kapila Wrote, *** *** 358,363 handle_sigint(SIGNAL_ARGS) --- 358,364 /* Send QueryCancel if we are processing a database query */ if (cancelConn != NULL) { + inAbort = true; if (PQcancel(cancelConn, errbuf, sizeof(errbuf))) fprintf(stderr, _(Cancel request sent\n)); else Do we need to set inAbort flag incase PQcancel is successful? Basically if PQCancel fails due to any reason, I think behaviour can be undefined as the executing thread can assume that cancel is done. *** 391,396 consoleHandler(DWORD dwCtrlType) --- 392,399 EnterCriticalSection (cancelConnLock); if (cancelConn != NULL) { + inAbort = true; + In “handle_sigint” function if we are going to cancel the query that time I am setting the flag inAbort (even when it is success), so that in “select_loop” function I am worried about the case if after setting the inAbort flag, PQCancel() fails (returns error). If select(maxFd + 1, workerset, NULL, NULL, tv); come out, we can know whether it came out because of cancel query and handle it accordingly. Yeah, it is fine for the case when PQCancel() is successful, what if it fails? I think even if select comes out due to any other reason, it will behave as if it came out due to Cancel, even though actually Cancel is failed, how are planning to handle that case? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 28 October 2014 09:18, Amit Kapila Wrote, I am worried about the case if after setting the inAbort flag, PQCancel() fails (returns error). If select(maxFd + 1, workerset, NULL, NULL, tv); come out, we can know whether it came out because of cancel query and handle it accordingly. Yeah, it is fine for the case when PQCancel() is successful, what if it fails? I think even if select comes out due to any other reason, it will behave as if it came out due to Cancel, even though actually Cancel is failed, how are planning to handle that case? I think If PQcancel fails then also there is no problem, because we are setting inAbort flag in handle_sigint handler, it means user have tried to terminate. So in this case as well we will find that inAbort is set so we return error, and in error case we finally call DisconnectDatabase, and in this function we will send the PQcancel for all active connection and then only we disconnect. Regards, DIlip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Oct 7, 2014 at 11:10 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 26 September 2014 12:24, Amit Kapila Wrote, I don't think this can handle cancel requests properly because you are just setting it in GetIdleSlot() what if the cancel request came during GetQueryResult() after sending sql for all connections (probably thats the reason why Jeff is not able to cancel the vacuumdb when using parallel option). You are right, I have fixed, it in latest patch, please check latest patch @ ( 4205e661176a124faf891e0a6ba9135266363...@szxeml509-mbs.china.huawei.com) *** *** 358,363 handle_sigint(SIGNAL_ARGS) --- 358,364 /* Send QueryCancel if we are processing a database query */ if (cancelConn != NULL) { + inAbort = true; if (PQcancel(cancelConn, errbuf, sizeof(errbuf))) fprintf(stderr, _(Cancel request sent\n)); else Do we need to set inAbort flag incase PQcancel is successful? Basically if PQCancel fails due to any reason, I think behaviour can be undefined as the executing thread can assume that cancel is done. *** 391,396 consoleHandler(DWORD dwCtrlType) --- 392,399 EnterCriticalSection (cancelConnLock); if (cancelConn != NULL) { + inAbort = true; + You have set this flag in case of windows handler, however the same is never used incase of windows, are you expecting any use of this flag for windows? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs si...@2ndquadrant.com wrote: On 16 October 2014 15:09, Amit Kapila amit.kapil...@gmail.com wrote: I think doing anything on the server side can have higher complexity like: a. Does this function return immediately after sending request to autovacuum, if yes then the behaviour of this new functionality will be different as compare to vacuumdb which user might not expect? b. We need to have some way to handle errors that can occur in autovacuum (may be need to find a way to pass back to user), vacuumdb or Vacuum can report error to user. c. How does nworkers input relates to autovacuum_max_workers which is needed at start for shared memory initialization and in calc of maxbackends. d. How to handle database wide vacuum which is possible via vacuumdb e. What is the best UI (a command like above or via config parameters)? c) seems like the only issue that needs any thought. I don't think its going to be that hard. I don't see any problems with the other points. You can make a function wait, if you wish. It is quite possible, but still I think to accomplish such a function, we need to have some mechanism where it can inform auto vacuum and then some changes in auto vacuum to receive/read that information and reply back. I don't think any such mechanism exists. I think we can find a way for above and may be if any other similar things needs to be taken care, but still I think it is better that we have this feature via vacuumdb rather than adding complexity in server code. Also the current algorithm used in patch is discussed and agreed upon in this thread and if now we want to go via some other method (auto vacuum), it might take much more time to build consensus on all the points. Well, I read Alvaro's point from earlier in the thread and agreed with it. All we really need is an instruction to autovacuum to say be aggressive. Just because somebody added something to the TODO list doesn't make it a good idea. I apologise to Dilip for saying this, it is not anything against him, just the idea. Perhaps we just accept the patch and change AV in the future. So shall we move ahead with review of this patch and make a note of changing AV in future (may be TODO)? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 17 October 2014 12:52, Amit Kapila amit.kapil...@gmail.com wrote: It is quite possible, but still I think to accomplish such a function, we need to have some mechanism where it can inform auto vacuum and then some changes in auto vacuum to receive/read that information and reply back. I don't think any such mechanism exists. That's exactly how the CHECKPOINT command works, in conjunction with the checkpointer process. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila wrote: On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs si...@2ndquadrant.com wrote: On 16 October 2014 15:09, Amit Kapila amit.kapil...@gmail.com wrote: c) seems like the only issue that needs any thought. I don't think its going to be that hard. I don't see any problems with the other points. You can make a function wait, if you wish. It is quite possible, but still I think to accomplish such a function, we need to have some mechanism where it can inform auto vacuum and then some changes in auto vacuum to receive/read that information and reply back. I don't think any such mechanism exists. You're right, it doesn't. I think we have plenty more infrastructure for that than we had when autovacuum was initially developed. It shouldn't be that hard. Of course, this is a task that requires much more thinking, design, and discussion than just adding multi-process capability to vacuumdb ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 17 October 2014 14:05, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Of course, this is a task that requires much more thinking, design, and discussion than just adding multi-process capability to vacuumdb ... Yes, please proceed with this patch as originally envisaged. No more comments from me. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 16 October 2014 06:05, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote: I've been trying to review this thread with the thought what does this give me?. I am keen to encourage contributions and also keen to extend our feature set, but I do not wish to complicate our code base. Dilip's developments do seem to be good quality; what I question is whether we want this feature. This patch seems to allow me to run multiple VACUUMs at once. But I can already do this, with autovacuum. Is there anything this patch can do that cannot be already done with autovacuum? The difference lies in the fact that vacuumdb (or VACUUM) gives the option to user to control the vacuum activity for cases when autovacuum doesn't suffice the need, one of the example is to perform vacuum via vacuumdb after pg_upgrade or some other maintenance activity (as mentioned by Jeff upthread). So I think in all such cases having parallel option can give benefit in terms of performance which is already shown by Dilip upthread by running some tests (with and without patch). Why do we need 2 ways to do the same thing? Why not ask autovacuum to do this for you? Just send a message to autovacuum to request an immediate action. Let it manage the children and the tasks. SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables); Request would allocate an additional N workers and immediately begin vacuuming the stated tables. vacuumdb can still issue the request, but the guts of this are done by the server, not a heavily modified client. If we are going to heavily modify a client then it needs to be able to run more than just one thing. Parallel psql would be nice. pg_batch? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Oct 16, 2014 at 1:56 PM, Simon Riggs si...@2ndquadrant.com wrote: On 16 October 2014 06:05, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote: This patch seems to allow me to run multiple VACUUMs at once. But I can already do this, with autovacuum. Is there anything this patch can do that cannot be already done with autovacuum? The difference lies in the fact that vacuumdb (or VACUUM) gives the option to user to control the vacuum activity for cases when autovacuum doesn't suffice the need, one of the example is to perform vacuum via vacuumdb after pg_upgrade or some other maintenance activity (as mentioned by Jeff upthread). So I think in all such cases having parallel option can give benefit in terms of performance which is already shown by Dilip upthread by running some tests (with and without patch). Why do we need 2 ways to do the same thing? Isn't the multiple ways to do the same already exists like via vacuumdb | Vaccum and autovaccum? Why not ask autovacuum to do this for you? Just send a message to autovacuum to request an immediate action. Let it manage the children and the tasks. SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables); Request would allocate an additional N workers and immediately begin vacuuming the stated tables. I think doing anything on the server side can have higher complexity like: a. Does this function return immediately after sending request to autovacuum, if yes then the behaviour of this new functionality will be different as compare to vacuumdb which user might not expect? b. We need to have some way to handle errors that can occur in autovacuum (may be need to find a way to pass back to user), vacuumdb or Vacuum can report error to user. c. How does nworkers input relates to autovacuum_max_workers which is needed at start for shared memory initialization and in calc of maxbackends. d. How to handle database wide vacuum which is possible via vacuumdb e. What is the best UI (a command like above or via config parameters)? I think we can find a way for above and may be if any other similar things needs to be taken care, but still I think it is better that we have this feature via vacuumdb rather than adding complexity in server code. Also the current algorithm used in patch is discussed and agreed upon in this thread and if now we want to go via some other method (auto vacuum), it might take much more time to build consensus on all the points. vacuumdb can still issue the request, but the guts of this are done by the server, not a heavily modified client. If we are going to heavily modify a client then it needs to be able to run more than just one thing. Parallel psql would be nice. pg_batch? Could you be more specific in this point, I am not able to see how vacuumdb utility has anything to do with parallel psql? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 16 October 2014 15:09, Amit Kapila amit.kapil...@gmail.com wrote: Just send a message to autovacuum to request an immediate action. Let it manage the children and the tasks. SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables); Request would allocate an additional N workers and immediately begin vacuuming the stated tables. I think doing anything on the server side can have higher complexity like: a. Does this function return immediately after sending request to autovacuum, if yes then the behaviour of this new functionality will be different as compare to vacuumdb which user might not expect? b. We need to have some way to handle errors that can occur in autovacuum (may be need to find a way to pass back to user), vacuumdb or Vacuum can report error to user. c. How does nworkers input relates to autovacuum_max_workers which is needed at start for shared memory initialization and in calc of maxbackends. d. How to handle database wide vacuum which is possible via vacuumdb e. What is the best UI (a command like above or via config parameters)? c) seems like the only issue that needs any thought. I don't think its going to be that hard. I don't see any problems with the other points. You can make a function wait, if you wish. I think we can find a way for above and may be if any other similar things needs to be taken care, but still I think it is better that we have this feature via vacuumdb rather than adding complexity in server code. Also the current algorithm used in patch is discussed and agreed upon in this thread and if now we want to go via some other method (auto vacuum), it might take much more time to build consensus on all the points. Well, I read Alvaro's point from earlier in the thread and agreed with it. All we really need is an instruction to autovacuum to say be aggressive. Just because somebody added something to the TODO list doesn't make it a good idea. I apologise to Dilip for saying this, it is not anything against him, just the idea. Perhaps we just accept the patch and change AV in the future. vacuumdb can still issue the request, but the guts of this are done by the server, not a heavily modified client. If we are going to heavily modify a client then it needs to be able to run more than just one thing. Parallel psql would be nice. pg_batch? Could you be more specific in this point, I am not able to see how vacuumdb utility has anything to do with parallel psql? That's my point. All this code in vacuumdb just for this one isolated use case? Twice the maintenance burden. A more generic utility to run commands in parallel would be useful. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 27 September 2014 03:55, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Gavin Flower wrote: Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Feasible: no. Useful: maybe, we don't really know. (You could just as well have a worker at double the speed, i.e. double vacuum_cost_limit). Vacuum_cost_delay is already 0 by default. So unless you changed that, vacuum_cost_limit does not take effect under vacuumdb. It is pretty easy for vacuum to be CPU limited, and even easier for analyze to be CPU limited (It does a lot of sorting). I think analyzing is the main use case for this patch, to shorten the pg_upgrade window. At least, that is how I anticipate using it. I've been trying to review this thread with the thought what does this give me?. I am keen to encourage contributions and also keen to extend our feature set, but I do not wish to complicate our code base. Dilip's developments do seem to be good quality; what I question is whether we want this feature. This patch seems to allow me to run multiple VACUUMs at once. But I can already do this, with autovacuum. Is there anything this patch can do that cannot be already done with autovacuum? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote: I've been trying to review this thread with the thought what does this give me?. I am keen to encourage contributions and also keen to extend our feature set, but I do not wish to complicate our code base. Dilip's developments do seem to be good quality; what I question is whether we want this feature. This patch seems to allow me to run multiple VACUUMs at once. But I can already do this, with autovacuum. Is there anything this patch can do that cannot be already done with autovacuum? The difference lies in the fact that vacuumdb (or VACUUM) gives the option to user to control the vacuum activity for cases when autovacuum doesn't suffice the need, one of the example is to perform vacuum via vacuumdb after pg_upgrade or some other maintenance activity (as mentioned by Jeff upthread). So I think in all such cases having parallel option can give benefit in terms of performance which is already shown by Dilip upthread by running some tests (with and without patch). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 26 September 2014 01:24, Jeff Janes Wrote, I think you have an off-by-one error in the index into the array of file handles. Actually the problem is that the socket for the master connection was not getting initialized, see my one line addition here. connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot)); connSlot[0].connection = conn; + connSlot[0].sock = PQsocket(conn); Thanks for the review, I have fixed this. However, I don't think it is good to just ignore errors from the select call (like the EBADF) and go into a busy loop instead, so there are more changes needed than this. Actually this select_loop function I have implemented same as other client application are handling, i.e pg_dum in parallel.c, however parallel.c is handling the case if process is in abort (if Ctrl+c is recieved), And we need to handle the same, so I have fixed this in attached patch. Also, cancelling the run (by hitting ctrl-C in the shell that invoked it) does not seem to work on linux. I get a message that says Cancel request sent, but then it continues to finish the job anyway. Apart from above mentioned reason, GetQueryResult was also not setting “SetCancelConn” as Amit has pointed, now this is also fixed. Regards, Dilip Kumar vacuumdb_parallel_v15.patch Description: vacuumdb_parallel_v15.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 26 September 2014 12:24, Amit Kapila Wrote, I don't think this can handle cancel requests properly because you are just setting it in GetIdleSlot() what if the cancel request came during GetQueryResult() after sending sql for all connections (probably thats the reason why Jeff is not able to cancel the vacuumdb when using parallel option). You are right, I have fixed, it in latest patch, please check latest patch @ (4205e661176a124faf891e0a6ba9135266363...@szxeml509-mbs.china.huawei.comhttp://www.postgresql.org/message-id/4205e661176a124faf891e0a6ba9135266363...@szxeml509-mbs.china.huawei.com) dilip@linux-ltr9:/home/dilip/9.4/install/bin ./vacuumdb -z -a -j 8 -p 9005 vacuumdb: vacuuming database db1 vacuumdb: vacuuming database postgres Cancel request sent vacuumdb: vacuuming of database postgres failed: ERROR: canceling statement due to user request Few other points 1. + vacuum_parallel(const char *dbname, bool full, bool verbose, { .. +connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot)); +connSlot[0].connection = conn; Fixed a. Does above memory gets freed anywhere, if not isn't it good idea to do the same b. For slot 0, you are not seeting it as PQsetnonblocking, where as I think it can be used to run commands like any other connection. Yes, this was missing in the code, I have fixed it.. 2. +/* +* If user has given the vacuum of complete db, then if +* any of the object vacuum failed it can be ignored and vacuuming +* of other object can be continued, this is the same behavior as +* vacuuming of complete db is handled without --jobs option +*/ s/object/object's FIXED 3. +if(!completedb || +(sqlState strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)) +{ + +fprintf(stderr, _(%s: vacuuming of database \%s\ failed: %s), +progname, dbname, PQerrorMessage (conn)); Indentation on both places is wrong. Check other palces for similar issues. FIXED 4. +bool analyze_only, bool freeze, int numAsyncCons, In code still there is reference to AsyncCons, as decided lets change it to concurrent_connections | conc_cons FIXED Regards, Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Sep 24, 2014 at 3:18 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 24 August 2014 11:33, Amit Kapila Wrote 7. I think in new mechanism cancel handler will not work. In single connection vacuum it was always set/reset in function executeMaintenanceCommand(). You might need to set/reset it in function run_parallel_vacuum(). Good catch, Now i have called SetCancelConn(pSlot[0].connection), on first connection. this will enable cancle handler to cancle the query on first connection so that select loop will come out. I don't think this can handle cancel requests properly because you are just setting it in GetIdleSlot() what if the cancel request came during GetQueryResult() after sending sql for all connections (probably thats the reason why Jeff is not able to cancel the vacuumdb when using parallel option). 1. I could see one shortcoming in the way the patch has currently parallelize the work for --analyze-in-stages. Basically patch is performing the work for each stage for multiple tables in concurrent connections that seems okay for the cases when number of parallel connections is less than equal to number of tables, but for the case when user has asked for more number of connections than number of tables, then I think this strategy will not be able to use the extra connections. I think --analyze-in-stages should maintain the order. Yes, you are right. So lets keep the code as it is for this case. 2. Similarly for the case of multiple databases, currently it will not be able to use connections more than number of tables in each database because the parallelizing strategy is to just use the conncurrent connections for tables inside single database. I think for multiple database doing the parallel execution we need to maintain the multiple connection with multiple databases. And we need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost, as we need to create a multiple connection and disconnect for preparing the list Yeah probably startup cost will be bit higher, but that cost we will anyway incur during overall operation. and i think it will increase the complexity also. I understand that complexity of code might increase and the strategy to parallelize can also be different incase we want to parallelize for all databases case, so lets leave as it is unless someone else raises voice for the same. Today while again thinking about the startegy used in patch to parallelize the operation (vacuum database), I think we can improve the same for cases when number of connections are lesser than number of tables in database (which I presume will normally be the case). Currently we are sending command to vacuum one table per connection, how about sending multiple commands (example Vacuum t1; Vacuum t2) on one connection. It seems to me there is extra roundtrip for cases when there are many small tables in database and few large tables. Do you think we should optimize for any such cases? Few other points 1. + vacuum_parallel(const char *dbname, bool full, bool verbose, { .. + connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot)); + connSlot[0].connection = conn; a. Does above memory gets freed anywhere, if not isn't it good idea to do the same b. For slot 0, you are not seeting it as PQsetnonblocking, where as I think it can be used to run commands like any other connection. 2. + /* + * If user has given the vacuum of complete db, then if + * any of the object vacuum failed it can be ignored and vacuuming + * of other object can be continued, this is the same behavior as + * vacuuming of complete db is handled without --jobs option + */ s/object/object's 3. + if(!completedb || + (sqlState strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)) + { + + fprintf(stderr, _(%s: vacuuming of database \%s\ failed: %s), + progname, dbname, PQerrorMessage (conn)); Indentation on both places is wrong. Check other palces for similar issues. 4. + bool analyze_only, bool freeze, int numAsyncCons, In code still there is reference to AsyncCons, as decided lets change it to concurrent_connections | conc_cons With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila wrote: Today while again thinking about the startegy used in patch to parallelize the operation (vacuum database), I think we can improve the same for cases when number of connections are lesser than number of tables in database (which I presume will normally be the case). Currently we are sending command to vacuum one table per connection, how about sending multiple commands (example Vacuum t1; Vacuum t2) on one connection. It seems to me there is extra roundtrip for cases when there are many small tables in database and few large tables. Do you think we should optimize for any such cases? I don't think this is a good idea; at least not in a first cut of this patch. It's easy to imagine that a table you initially think is small enough turns out to have grown much larger since last analyze. In that case, putting one worker to process that one together with some other table could end up being bad for parallelism, if later it turns out that some other worker has no table to process. (Table t2 in your example could grown between the time the command is sent and t1 is vacuumed.) It's simpler to have workers do one thing at a time only. I don't think it's a very good idea to call pg_relation_size() on every table in the database from vacuumdb. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Sep 26, 2014 at 7:06 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: Today while again thinking about the startegy used in patch to parallelize the operation (vacuum database), I think we can improve the same for cases when number of connections are lesser than number of tables in database (which I presume will normally be the case). Currently we are sending command to vacuum one table per connection, how about sending multiple commands (example Vacuum t1; Vacuum t2) on one connection. It seems to me there is extra roundtrip for cases when there are many small tables in database and few large tables. Do you think we should optimize for any such cases? I don't think this is a good idea; at least not in a first cut of this patch. It's easy to imagine that a table you initially think is small enough turns out to have grown much larger since last analyze. That could be possible, but currently it vacuum's even system tables one by one (where I think chances of growing up would be comparatively less) which was the main reason I thought it might be worth to consider if the current work distribution strategy is good enough. In that case, putting one worker to process that one together with some other table could end up being bad for parallelism, if later it turns out that some other worker has no table to process. (Table t2 in your example could grown between the time the command is sent and t1 is vacuumed.) It's simpler to have workers do one thing at a time only. Yeah probably that is best at least for initial patch. I don't think it's a very good idea to call pg_relation_size() on every table in the database from vacuumdb. You might be right, however I was bit skeptical about the current strategy where the work unit is one object and each object is considered same irrespective of it's size/bloat. OTOH I agree with you that it is good to keep the first version simpler. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 27/09/14 01:36, Alvaro Herrera wrote: Amit Kapila wrote: Today while again thinking about the startegy used in patch to parallelize the operation (vacuum database), I think we can improve the same for cases when number of connections are lesser than number of tables in database (which I presume will normally be the case). Currently we are sending command to vacuum one table per connection, how about sending multiple commands (example Vacuum t1; Vacuum t2) on one connection. It seems to me there is extra roundtrip for cases when there are many small tables in database and few large tables. Do you think we should optimize for any such cases? I don't think this is a good idea; at least not in a first cut of this patch. It's easy to imagine that a table you initially think is small enough turns out to have grown much larger since last analyze. In that case, putting one worker to process that one together with some other table could end up being bad for parallelism, if later it turns out that some other worker has no table to process. (Table t2 in your example could grown between the time the command is sent and t1 is vacuumed.) It's simpler to have workers do one thing at a time only. I don't think it's a very good idea to call pg_relation_size() on every table in the database from vacuumdb. Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Gavin Flower wrote: Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Feasible: no. Useful: maybe, we don't really know. (You could just as well have a worker at double the speed, i.e. double vacuum_cost_limit). -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 9/26/14, 2:38 PM, Gavin Flower wrote: Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Not really feasible without a major overhaul. It might be mildly useful in one rare case. Occasionally I'll find very hot single tables that vacuum is constantly processing, despite mostly living in RAM because the server has a lot of memory. You can set vacuum_cost_page_hit=0 in order to get vacuum to chug through such a table as fast as possible. However, the speed at which that happens will often then be limited by how fast a single core can read from memory, for things in shared_buffers. That is limited by the speed of memory transfers from a single NUMA memory bank. Which bank you get will vary depending on the core that owns that part of shared_buffers' memory, but it's only one at a time. On large servers, that can be only a small fraction of the total memory bandwidth the server is able to reach. I've attached a graph showing how this works on a system with many NUMA banks of RAM, and this is only a medium sized system. This server can hit 40GB/s of memory transfers in total; no one process will ever see more than 8GB/s. If we had more vacuum processes running against the same table, there would then be more situations where they were doing work against different NUMA memory banks at the same time, therefore making faster progress through the hits in shared_buffers possible. In the real world, this situation is rare enough compared to disk-bound vacuum work that I doubt it's worth getting excited over. Systems with lots of RAM where performance is regularly dominated by one big ugly table are common though, so I wouldn't just rule the idea out as not useful either. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Gavin Flower wrote: Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Feasible: no. Useful: maybe, we don't really know. (You could just as well have a worker at double the speed, i.e. double vacuum_cost_limit). Vacuum_cost_delay is already 0 by default. So unless you changed that, vacuum_cost_limit does not take effect under vacuumdb. It is pretty easy for vacuum to be CPU limited, and even easier for analyze to be CPU limited (It does a lot of sorting). I think analyzing is the main use case for this patch, to shorten the pg_upgrade window. At least, that is how I anticipate using it. Cheers, Jeff
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 27/09/14 11:36, Gregory Smith wrote: On 9/26/14, 2:38 PM, Gavin Flower wrote: Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? The could each start at a different position in the file. Not really feasible without a major overhaul. It might be mildly useful in one rare case. Occasionally I'll find very hot single tables that vacuum is constantly processing, despite mostly living in RAM because the server has a lot of memory. You can set vacuum_cost_page_hit=0 in order to get vacuum to chug through such a table as fast as possible. However, the speed at which that happens will often then be limited by how fast a single core can read from memory, for things in shared_buffers. That is limited by the speed of memory transfers from a single NUMA memory bank. Which bank you get will vary depending on the core that owns that part of shared_buffers' memory, but it's only one at a time. On large servers, that can be only a small fraction of the total memory bandwidth the server is able to reach. I've attached a graph showing how this works on a system with many NUMA banks of RAM, and this is only a medium sized system. This server can hit 40GB/s of memory transfers in total; no one process will ever see more than 8GB/s. If we had more vacuum processes running against the same table, there would then be more situations where they were doing work against different NUMA memory banks at the same time, therefore making faster progress through the hits in shared_buffers possible. In the real world, this situation is rare enough compared to disk-bound vacuum work that I doubt it's worth getting excited over. Systems with lots of RAM where performance is regularly dominated by one big ugly table are common though, so I wouldn't just rule the idea out as not useful either. Thanks for the very detailed reply of yours, and the comments from others. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 24 August 2014 11:33, Amit Kapila Wrote Thanks for you comments, i have worked on both the review comment lists, sent on 19 August, and 24 August. Latest patch is attached with the mail.. Hi Dilip, I think you have an off-by-one error in the index into the array of file handles. vacuumdb runs at full CPU, and if I run: strace -ttt -T -f ../parallel_vac/bin/vacuumdb -z -a -j 8 I get the select returning immediately with a bad file descriptor error: 1411663937.641177 select(55, [4 5 6 7 8 9 10 54], NULL, NULL, NULL) = -1 EBADF (Bad file descriptor) 0.12 1411663937.641232 recvfrom(3, 0x104e3f0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.12 1411663937.641279 recvfrom(4, 0x1034bc0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.11 1411663937.641326 recvfrom(5, 0x10017c0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.14 1411663937.641415 recvfrom(6, 0x10097e0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.20 1411663937.641487 recvfrom(7, 0x1012330, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.13 1411663937.641538 recvfrom(8, 0x101af00, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.11 1411663937.641584 recvfrom(9, 0x1023af0, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.12 1411663937.641631 recvfrom(10, 0x1054600, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 0.12 Cheers, Jeff
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Sep 25, 2014 at 10:00 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 24 August 2014 11:33, Amit Kapila Wrote Thanks for you comments, i have worked on both the review comment lists, sent on 19 August, and 24 August. Latest patch is attached with the mail.. Hi Dilip, I think you have an off-by-one error in the index into the array of file handles. Actually the problem is that the socket for the master connection was not getting initialized, see my one line addition here. connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot)); connSlot[0].connection = conn; + connSlot[0].sock = PQsocket(conn); However, I don't think it is good to just ignore errors from the select call (like the EBADF) and go into a busy loop instead, so there are more changes needed than this. Also, cancelling the run (by hitting ctrl-C in the shell that invoked it) does not seem to work on linux. I get a message that says Cancel request sent, but then it continues to finish the job anyway. Cheers, Jeff
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 24 August 2014 11:33, Amit Kapila Wrote Thanks for you comments, i have worked on both the review comment lists, sent on 19 August, and 24 August. Latest patch is attached with the mail.. on 19 August: You can compare against SQLSTATE by using below API. val = PQresultErrorField(res, PG_DIAG_SQLSTATE); You need to handle *42P01* SQLSTATE, also please refer below usage where we are checking SQLSTATE. fe-connect.c PQresultErrorField(conn-result, PG_DIAG_SQLSTATE), ERRCODE_INVALID_PASSWORD) == 0) DONE Few more comments: 1. * If user has not given the vacuum of complete db, then if I think here you have said reverse of what code is doing. You don't need *not* in above sentence. DONE 2. + appendPQExpBuffer(sql, \%s\.\%s\, nspace, relName); I think here you need to use function fmtQualifiedId() or fmtId() or something similar to handle quotes appropriately. DONE 3. + */ + if (!r !completedb) Here the usage of completedb variable is reversed which means that it goes into error path when actually whole database is getting vacuumed and the reason is that you are setting it to false in below code: + /* Vaccuming full database*/ + vacuum_tables = false; FIXED 4. Functions prepare_command() and vacuum_one_database() contain duplicate code, is there any problem in using prepare_command() function in vacuum_one_database(). Another point in this context is that I think it is better to name function prepare_command() as append_vacuum_options() or something on that lines, also it will be better if you can write function header for this function as well. DONE 5. + if (error) + { + for (i = 0; i max_slot; i++) + { + DisconnectDatabase(connSlot[i]); + } Here why do we need DisconnectDatabase() type of function? Why can't we simply call PQfinish() as in base code? Beacause base code jut need to handle main connection, and when sending the PQfinish, means either its completed or error, In both the cases, control is with client, But in multi connection case, if one connection fails then we need to send cancle to on other connection that wwhat is done DisconnectDatabase. 6. + /* + * if table list is not provided then we need to do vaccum for whole DB + * get the list of all tables and prpare the list + */ spelling of prepare is wrong. I have noticed spell mistake in comments at some other place as well, please check all comments once FIXED 7. I think in new mechanism cancel handler will not work. In single connection vacuum it was always set/reset in function executeMaintenanceCommand(). You might need to set/reset it in function run_parallel_vacuum(). Good catch, Now i have called SetCancelConn(pSlot[0].connection), on first connection. this will enable cancle handler to cancle the query on first connection so that select loop will come out. 24 August - 1. I could see one shortcoming in the way the patch has currently parallelize the work for --analyze-in-stages. Basically patch is performing the work for each stage for multiple tables in concurrent connections that seems okay for the cases when number of parallel connections is less than equal to number of tables, but for the case when user has asked for more number of connections than number of tables, then I think this strategy will not be able to use the extra connections. I think --analyze-in-stages should maintain the order. 2. Similarly for the case of multiple databases, currently it will not be able to use connections more than number of tables in each database because the parallelizing strategy is to just use the conncurrent connections for tables inside single database. I think for multiple database doing the parallel execution we need to maintain the multiple connection with multiple databases. And we need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost, as we need to create a multiple connection and disconnect for preparing the list and i think it will increase the complexity also. I am not completely sure whether current strategy is good enough or we should try to address the above problems. What do you think? 3. + do + { + i = select_loop(maxFd, slotset); + Assert(i != 0); Could you explain the reason of using this loop, I think you want to wait for data on socket descriptor, but why for maxFd? Also it is better if you explain this logic in comments. We are sending vacuum job to the connection and when non of the connection slot is free, we are waiting on all the socket, and
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Aug 19, 2014 at 4:27 PM, Amit Kapila amit.kapil...@gmail.com wrote: Few more comments: Some more comments: 1. I could see one shortcoming in the way the patch has currently parallelize the work for --analyze-in-stages. Basically patch is performing the work for each stage for multiple tables in concurrent connections that seems okay for the cases when number of parallel connections is less than equal to number of tables, but for the case when user has asked for more number of connections than number of tables, then I think this strategy will not be able to use the extra connections. 2. Similarly for the case of multiple databases, currently it will not be able to use connections more than number of tables in each database because the parallelizing strategy is to just use the conncurrent connections for tables inside single database. I am not completely sure whether current strategy is good enough or we should try to address the above problems. What do you think? 3. + do + { + i = select_loop(maxFd, slotset); + Assert(i != 0); Could you explain the reason of using this loop, I think you want to wait for data on socket descriptor, but why for maxFd? Also it is better if you explain this logic in comments. 4. + for (i = 0; i max_slot; i++) + { + if (!FD_ISSET(pSlot[i].sock, slotset)) + continue; + + PQconsumeInput(pSlot[i].connection); + if (PQisBusy(pSlot[i].connection)) + continue; I think it is better to call PQconsumeInput() only if you find connection is busy. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 21 August 2014 08:31, Amit Kapila Wrote, Not sure. How about *concurrent* or *multiple*? multiple isn't right, but we could say concurrent. I also find concurrent more appropriate. Dilip, could you please change it to concurrent in doc updates, variables, functions unless you see any objection for the same. Ok, I will take care along with other comments fix.. Regards, Dilip Kumar
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com wrote: 1. +Number of parallel connections to perform the operation. This option will enable the vacuum +operation to run on parallel connections, at a time one table will be operated on one connection. a. How about describing w.r.t asynchronous connections instead of parallel connections? I don't think asynchronous is a good choice of word. Agreed. Maybe simultaneous? Not sure. How about *concurrent* or *multiple*? multiple isn't right, but we could say concurrent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Aug 21, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com wrote: a. How about describing w.r.t asynchronous connections instead of parallel connections? I don't think asynchronous is a good choice of word. Agreed. Maybe simultaneous? Not sure. How about *concurrent* or *multiple*? multiple isn't right, but we could say concurrent. I also find concurrent more appropriate. Dilip, could you please change it to concurrent in doc updates, variables, functions unless you see any objection for the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Aug 13, 2014 at 4:01 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 11 August 2014 10:29, Amit kapila wrote, 5. res = executeQuery(conn, select relname, nspname from pg_class c, pg_namespace ns where (relkind = \'r\' or relkind = \'m\') and c.relnamespace = ns.oid order by relpages desc, progname, echo); a. Here you need to use SQL keywords in capital letters, refer one of the other caller of executeQuery() in vacuumdb.c b. Why do you need this condition c.relnamespace = ns.oid in above query? IT IS POSSIBLE THAT, TWO NAMESPACE HAVE THE SAME TABLE NAME, SO WHEN WE ARE SENDING COMMAND FROM CLIENT WE NEED TO GIVE NAMESPACE WISE BECAUSE WE NEED TO VACUUM ALL THE TABLES.. (OTHERWISE TWO TABLE WITH SAME NAME FROM DIFFERENT NAMESPACE WILL BE TREATED SAME.) Thats right, however writing query in below way might make it more understandable + SELECT relname, nspname FROM pg_class c, pg_namespace ns SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns 7. Here we are getting message string, I think if we need to find the error code then we need to parse the string, and after that we need to compare with error codes. Is there any other way to do this ? You can compare against SQLSTATE by using below API. val = PQresultErrorField(res, PG_DIAG_SQLSTATE); You need to handle *42P01* SQLSTATE, also please refer below usage where we are checking SQLSTATE. fe-connect.c PQresultErrorField(conn-result, PG_DIAG_SQLSTATE), ERRCODE_INVALID_PASSWORD) == 0) Few more comments: 1. * If user has not given the vacuum of complete db, then if I think here you have said reverse of what code is doing. You don't need *not* in above sentence. 2. + appendPQExpBuffer(sql, \%s\.\%s\, nspace, relName); I think here you need to use function fmtQualifiedId() or fmtId() or something similar to handle quotes appropriately. 3. + */ + if (!r !completedb) Here the usage of completedb variable is reversed which means that it goes into error path when actually whole database is getting vacuumed and the reason is that you are setting it to false in below code: + /* Vaccuming full database*/ + vacuum_tables = false; 4. Functions prepare_command() and vacuum_one_database() contain duplicate code, is there any problem in using prepare_command() function in vacuum_one_database(). Another point in this context is that I think it is better to name function prepare_command() as append_vacuum_options() or something on that lines, also it will be better if you can write function header for this function as well. 5. + if (error) + { + for (i = 0; i max_slot; i++) + { + DisconnectDatabase(connSlot[i]); + } Here why do we need DisconnectDatabase() type of function? Why can't we simply call PQfinish() as in base code? 6. + /* + * if table list is not provided then we need to do vaccum for whole DB + * get the list of all tables and prpare the list + */ spelling of prepare is wrong. I have noticed spell mistake in comments at some other place as well, please check all comments once 7. I think in new mechanism cancel handler will not work. In single connection vacuum it was always set/reset in function executeMaintenanceCommand(). You might need to set/reset it in function run_parallel_vacuum(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com wrote: 1. +Number of parallel connections to perform the operation. This option will enable the vacuum +operation to run on parallel connections, at a time one table will be operated on one connection. a. How about describing w.r.t asynchronous connections instead of parallel connections? I don't think asynchronous is a good choice of word. Agreed. Maybe simultaneous? Not sure. How about *concurrent* or *multiple*? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com wrote: 1. +Number of parallel connections to perform the operation. This option will enable the vacuum +operation to run on parallel connections, at a time one table will be operated on one connection. a. How about describing w.r.t asynchronous connections instead of parallel connections? I don't think asynchronous is a good choice of word. Maybe simultaneous? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 11 August 2014 10:29, Amit kapila wrote, 1. I have fixed all the review comments except few, and modified patch is attached. 2. For not fixed comments, find inline reply in the mail.. 1. +Number of parallel connections to perform the operation. This option will enable the vacuum +operation to run on parallel connections, at a time one table will be operated on one connection. a. How about describing w.r.t asynchronous connections instead of parallel connections? b. It is better to have line length as lesser than 80. c. As you are using multiple connections to achieve parallelism, I suggest you add a line in your description indicating user should verify max_connections parameters. Something similar to pg_dump: 2. +So at one time as many tables will be vacuumed parallely as number of jobs. can you briefly mention about the case when number of jobs is more than number of tables? 1 and 2 ARE FIXED in DOC. 3. + /* When user is giving the table list, and list is smaller then + * number of tables + */ + if (tbl_count (parallel tbl_count)) + parallel = tbl_count; + Again here multiline comments are wrong. Some other instances are as below: a. /* This will give the free connection slot, if no slot is free it will * wait for atleast one slot to get free. */ b. /* if table list is not provided then we need to do vaccum for whole DB * get the list of all tables and prpare the list */ c. /* Some of the slot are free, Process the results for slots whichever * are free */ COMMENTS are FIXED 4. src/bin/scripts/vacuumdb.c:51: indent with spaces. + bool analyze_only, bool freeze, PQExpBuffer sql); src/bin/scripts/vacuumdb.c:116: indent with spaces. +int parallel = 0; src/bin/scripts/vacuumdb.c:198: indent with spaces. +optind++; src/bin/scripts/vacuumdb.c:299: space before tab in indent. + vacuum_one_database(dbname, full, verbose, and_analyze, There are lot of redundant whitespaces, check with git diff --check and fix them. ALL are FIXED 5. res = executeQuery(conn, select relname, nspname from pg_class c, pg_namespace ns where (relkind = \'r\' or relkind = \'m\') and c.relnamespace = ns.oid order by relpages desc, progname, echo); a. Here you need to use SQL keywords in capital letters, refer one of the other caller of executeQuery() in vacuumdb.c b. Why do you need this condition c.relnamespace = ns.oid in above query? IT IS POSSIBLE THAT, TWO NAMESPACE HAVE THE SAME TABLE NAME, SO WHEN WE ARE SENDING COMMAND FROM CLIENT WE NEED TO GIVE NAMESPACE WISE BECAUSE WE NEED TO VACUUM ALL THE TABLES.. (OTHERWISE TWO TABLE WITH SAME NAME FROM DIFFERENT NAMESPACE WILL BE TREATED SAME.) I think to get the list of required objects from pg_class, you don't need to have a join with pg_namespace. DONE 6. vacuum_parallel() { .. if (!tables || !tables-head) { .. tbl_count++; } .. } a. Here why you need a separate variable (tbl_count) to verify number asynchronous/parallel connections you want, why can't we use ntuple? b. there is a warning in code (I have compiled it on windows) as well related to this variable. vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count' used Variable REMOVED 7. Fix for one of my previous comment is as below: GetQueryResult() { .. if (!r !completedb) .. } Here I think some generic errors like connection broken or others will also get ignored. Is it possible that we can ignore particular error which we want to ignore without complicating the code? Also in anycase add comments to explain why you are ignoring error for particular case. Here we are getting message string, I think if we need to find the error code then we need to parse the string, and after that we need to compare with error codes. Is there any other way to do this ? Comments are added 8. + fprintf(stderr, _(%s: Number of parallel \jobs\ should be at least 1\n), + progname); formatting of 2nd line progname is not as per standard (you can refer other fprintf in the same file). DONE 9. + int parallel = 0; I think it is better to name it as numAsyncCons or something similar. CHANGED as PER SUGGESTION 10. It is better if you can add function header for newly added functions. ADDED Test2: (as per your scenario, where actual vacuum time is very less) Vacuum done for complete DB 8 tables all with 1 records and few dead tuples I think this test is missing in attached file. Few means? Can you try with 0.1%, 1% of dead tuples in table and try to take time in milliseconds if it is taking less time to complete the test. TESTED with 1%, 0.1% and 0.01 % and results are as follows 1. With 1% (file test1%.sql) Base Code : 22.26 s 2 Threads : 12.82 s 4 Threads : 9.19s 8 Threads : 8.89s 2. With 0.1% Base Code : 3.83.26 s 2 Threads : 2.01 s 4 Threads : 2.02s 8 Threads : 2.25s 3.
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 31 July 2014 10:59, Amit kapila Wrote, Thanks for the review and valuable comments. I have fixed all the comments and attached the revised patch. I have again looked into your revised patch and would like to share my findings with you. 1. +Number of parallel connections to perform the operation. This option will enable the vacuum +operation to run on parallel connections, at a time one table will be operated on one connection. a. How about describing w.r.t asynchronous connections instead of parallel connections? b. It is better to have line length as lesser than 80. c. As you are using multiple connections to achieve parallelism, I suggest you add a line in your description indicating user should verify max_connections parameters. Something similar to pg_dump: pg_dump will open njobs + 1 connections to the database, so make sure your max_connections setting is high enough to accommodate all connections. 2. +So at one time as many tables will be vacuumed parallely as number of jobs. can you briefly mention about the case when number of jobs is more than number of tables? 3. + /* When user is giving the table list, and list is smaller then + * number of tables + */ + if (tbl_count (parallel tbl_count)) + parallel = tbl_count; + Again here multiline comments are wrong. Some other instances are as below: a. /* This will give the free connection slot, if no slot is free it will * wait for atleast one slot to get free. */ b. /* if table list is not provided then we need to do vaccum for whole DB * get the list of all tables and prpare the list */ c. /* Some of the slot are free, Process the results for slots whichever * are free */ 4. src/bin/scripts/vacuumdb.c:51: indent with spaces. + bool analyze_only, bool freeze, PQExpBuffer sql); src/bin/scripts/vacuumdb.c:116: indent with spaces. +int parallel = 0; src/bin/scripts/vacuumdb.c:198: indent with spaces. +optind++; src/bin/scripts/vacuumdb.c:299: space before tab in indent. + vacuum_one_database(dbname, full, verbose, and_analyze, There are lot of redundant whitespaces, check with git diff --check and fix them. 5. res = executeQuery(conn, select relname, nspname from pg_class c, pg_namespace ns where (relkind = \'r\' or relkind = \'m\') and c.relnamespace = ns.oid order by relpages desc, progname, echo); a. Here you need to use SQL keywords in capital letters, refer one of the other caller of executeQuery() in vacuumdb.c b. Why do you need this condition c.relnamespace = ns.oid in above query? I think to get the list of required objects from pg_class, you don't need to have a join with pg_namespace. 6. vacuum_parallel() { .. if (!tables || !tables-head) { .. tbl_count++; } .. } a. Here why you need a separate variable (tbl_count) to verify number asynchronous/parallel connections you want, why can't we use ntuple? b. there is a warning in code (I have compiled it on windows) as well related to this variable. vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count' used 7. Fix for one of my previous comment is as below: GetQueryResult() { .. if (!r !completedb) .. } Here I think some generic errors like connection broken or others will also get ignored. Is it possible that we can ignore particular error which we want to ignore without complicating the code? Also in anycase add comments to explain why you are ignoring error for particular case. 8. + fprintf(stderr, _(%s: Number of parallel \jobs\ should be at least 1\n), + progname); formatting of 2nd line progname is not as per standard (you can refer other fprintf in the same file). 9. + int parallel = 0; I think it is better to name it as numAsyncCons or something similar. 10. It is better if you can add function header for newly added functions. Test2: (as per your scenario, where actual vacuum time is very less) Vacuum done for complete DB 8 tables all with 1 records and few dead tuples I think this test is missing in attached file. Few means? Can you try with 0.1%, 1% of dead tuples in table and try to take time in milliseconds if it is taking less time to complete the test. PS - It is better if you mention against each review comment/suggestion what you have done, because in some cases it will help reviewer to understand your fix easily and as author you will also be sure that all got fixed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 31 July 2014 10:59, Amit kapila Wrote, Thanks for the review and valuable comments. I have fixed all the comments and attached the revised patch. As per your suggestion I have taken the performance report also… Test1: Machine Configuration: Core : 8 (Intel(R) Xeon(R) CPU E5520 @ 2.27GHz) RAM: 48GB Test Scenario: 8 tables all with 1M+ records. [many records are deleted and inserted using some pattern, (files is attached in the mail)] Test Result Base Code: 43.126s Parallel Vacuum Code 2 Threads : 29.687s 8 Threads : 14.647s Test2: (as per your scenario, where actual vacuum time is very less) Vacuum done for complete DB 8 tables all with 1 records and few dead tuples Test Result Base Code: 0.59s Parallel Vacuum Code 2 Threads : 0.50s 4 Threads : 0.29s 8 Threads : 0.18s Regards, Dilip Kumar From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: 31 July 2014 10:59 To: Dilip kumar Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ] On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar dilip.ku...@huawei.commailto:dilip.ku...@huawei.com wrote: On 16 July 2014 12:13, Magnus Hagander Wrote, Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...) I have modified the patch as per the suggestion, Now in beginning we create all connections, and first connection we use for getting table list in beginning, After that all connections will be involved in vacuum task. Please have a look and provide your opinion… 1. +connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot)); +for (i = 0; i parallel; i++) +{ +connSlot[i].connection = connectDatabase(dbname, host, port, username, + prompt_password, progname, false); + +PQsetnonblocking(connSlot[i].connection, 1); +connSlot[i].isFree = true; +connSlot[i].sock = PQsocket(connSlot[i].connection); +} Here it seems to me that you are opening connections before getting or checking tables list, so in case you have lesser number of tables, won't the extra connections be always idle. Simple case to erify the same is with below example vacuumdb -t t1 -d postgres -j 4 2. + res = executeQuery(conn, + select relname, nspname from pg_class c, pg_namespace ns + where relkind= \'r\' and c.relnamespace = ns.oid + order by relpages desc, + progname, echo); Here it is just trying to get the list of relations, however Vacuum command processes materialized views as well, so I think here the list should include materialized views as well unless you have any specific reason for not including those. 3. In function vacuum_parallel(), if user has not provided list of tables, then it is retrieving all the tables in database and then in run_parallel_vacuum(), it tries to do Vacuum for each of table using Async mechanism, now consider a case when after getting list if any table is dropped by user from some other session, then patch will error out. However without patch or Vacuum command will internally ignore such a case and complete the Vacuum for other tables. Don't you think the patch should maintain the existing behaviour? 4. +termoption-j replaceable class=parameterjobs/replaceable//term +Number of parallel process to perform the operation. Change this description as per new implementation. Also I think there is a need of some explanation for this new option. 5. It seems there is no change in below function decalration: static void vacuum_one_database(const char *dbname, bool full, bool verbose, ! bool and_analyze, bool analyze_only, bool analyze_in_stages, ! bool freeze, const char *table, const char *host, ! const char *port, const char *username, ! enum trivalue prompt_password
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 16 July 2014 12:13, Magnus Hagander Wrote, Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...) I have modified the patch as per the suggestion, Now in beginning we create all connections, and first connection we use for getting table list in beginning, After that all connections will be involved in vacuum task. Please have a look and provide your opinion… 1. + connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot)); + for (i = 0; i parallel; i++) + { + connSlot[i].connection = connectDatabase(dbname, host, port, username, + prompt_password, progname, false); + + PQsetnonblocking(connSlot[i].connection, 1); + connSlot[i].isFree = true; + connSlot[i].sock = PQsocket(connSlot[i].connection); + } Here it seems to me that you are opening connections before getting or checking tables list, so in case you have lesser number of tables, won't the extra connections be always idle. Simple case to erify the same is with below example vacuumdb -t t1 -d postgres -j 4 2. + res = executeQuery(conn, + select relname, nspname from pg_class c, pg_namespace ns + where relkind= \'r\' and c.relnamespace = ns.oid + order by relpages desc, + progname, echo); Here it is just trying to get the list of relations, however Vacuum command processes materialized views as well, so I think here the list should include materialized views as well unless you have any specific reason for not including those. 3. In function vacuum_parallel(), if user has not provided list of tables, then it is retrieving all the tables in database and then in run_parallel_vacuum(), it tries to do Vacuum for each of table using Async mechanism, now consider a case when after getting list if any table is dropped by user from some other session, then patch will error out. However without patch or Vacuum command will internally ignore such a case and complete the Vacuum for other tables. Don't you think the patch should maintain the existing behaviour? 4. +termoption-j replaceable class=parameterjobs/replaceable//term +Number of parallel process to perform the operation. Change this description as per new implementation. Also I think there is a need of some explanation for this new option. 5. It seems there is no change in below function decalration: static void vacuum_one_database(const char *dbname, bool full, bool verbose, ! bool and_analyze, bool analyze_only, bool analyze_in_stages, ! bool freeze, const char *table, const char *host, ! const char *port, const char *username, ! enum trivalue prompt_password, const char *progname, bool echo); 6. + printf(_( -j, --jobs=NUM use this many parallel jobs to vacuum\n)); Change the description as per new implementation. 7. /* This will give the free connection slot, if no slot is free it will wait for atleast one slot to get free.*/ Multiline comments should be written like (refer other places) /* * This will give the free connection slot, if no slot is free it will * wait for atleast one slot to get free. */ Kindly correct at other places if similar instance exist in patch. 8. Isn't it a good idea to check performance of this new patch especially for some worst cases like when there is not much to vacuum in the tables inside a database. The reason I wanted to check is that because with new algorithm (for a vacuum of database, now it will get the list of tables and perform vacuum on individual tables) we have to repeat certain actions in server side like allocation/deallocataion of context, sending stats which would have been otherwise done once. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Jul 16, 2014 at 5:30 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 16 July 2014 12:13 Magnus Hagander Wrote, Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...) Oh, I got your point, I will update my patch and send, Now we can completely remove vac_parallel.h file and no need of refactoring also:) Thanks Regards, Dilip Kumar Should we push the refactoring through anyway? I have a hard time believing that pg_dump is going to be the only client program we ever have that will need process-level parallelism, even if this feature itself does not need it. Why make the next person who comes along re-invent that re-factoring of this wheel? Cheers, Jeff
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Jeff Janes wrote: Should we push the refactoring through anyway? I have a hard time believing that pg_dump is going to be the only client program we ever have that will need process-level parallelism, even if this feature itself does not need it. Why make the next person who comes along re-invent that re-factoring of this wheel? I gave the refactoring patch a look some days ago, and my conclusion was that it is reasonably sound but it needed quite some cleanup in order for it to be committable. Without any immediate use case, it's hard to justify going through all that effort. Maybe we can add a TODO item and have it point to the posted patch, so that if in the future we see a need for another parallel client program we can easily rebase the current patch. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 16 July 2014 12:13, Magnus Hagander Wrote, Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...) I have modified the patch as per the suggestion, Now in beginning we create all connections, and first connection we use for getting table list in beginning, After that all connections will be involved in vacuum task. Please have a look and provide your opinion… Thanks Regards, Dilip Kumar vacuumdb_parallel_v11.patch Description: vacuumdb_parallel_v11.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Jul 16, 2014 7:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Dilip kumar dilip.ku...@huawei.com writes: On 15 July 2014 19:01, Magnus Hagander Wrote, I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, It's difficult to equally divide the jobs b/w multiple independent connections. That argument seems like complete nonsense. You're confusing work allocation strategy with the implementation technology for the multiple working threads. I see no reason why a good allocation strategy couldn't work with either approach; indeed, I think it would likely be easier to do some things *without* client-side physical parallelism, because that makes it much simpler to handle feedback between the results of different operational threads. So you would have one initial connection, which generates a task list; then open N libpq connections. Launch one vacuum on each, and then sleep on select() on the three sockets. Whenever one returns read-ready, the vacuuming is done and we send another item from the task list. Repeat until tasklist is empty. No need to fork anything. Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...) /Magnus
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 16 July 2014 12:13 Magnus Hagander Wrote, Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...) Oh, I got your point, I will update my patch and send, Now we can completely remove vac_parallel.h file and no need of refactoring also:) Thanks Regards, Dilip Kumar From: Magnus Hagander [mailto:mag...@hagander.net] Sent: 16 July 2014 12:13 To: Alvaro Herrera Cc: Dilip kumar; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ] On Jul 16, 2014 7:05 AM, Alvaro Herrera alvhe...@2ndquadrant.commailto:alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Dilip kumar dilip.ku...@huawei.commailto:dilip.ku...@huawei.com writes: On 15 July 2014 19:01, Magnus Hagander Wrote, I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, It's difficult to equally divide the jobs b/w multiple independent connections. That argument seems like complete nonsense. You're confusing work allocation strategy with the implementation technology for the multiple working threads. I see no reason why a good allocation strategy couldn't work with either approach; indeed, I think it would likely be easier to do some things *without* client-side physical parallelism, because that makes it much simpler to handle feedback between the results of different operational threads. So you would have one initial connection, which generates a task list; then open N libpq connections. Launch one vacuum on each, and then sleep on select() on the three sockets. Whenever one returns read-ready, the vacuuming is done and we send another item from the task list. Repeat until tasklist is empty. No need to fork anything. Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable... (and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...) /Magnus
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 01 July 2014 03:48, Alvaro Wrote, In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... Thanks for looking into the patch, I think if we use shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task, If we directly divide table b/w multiple process, it may happen some process get very big tables then it will be as good as one process is doing operation. In this patch at a time we assign only one table to each process and whichever process finishes fast, we assign new table, this way all process get equal sharing of the task. I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? We need it for things like pg_dump/pg_restore because they can themselvse benefit from parallelism at the client level, but for something like this, might the code become a lot simpler if we just use multiple database connections and async queries? That would also bring the benefit of less platform dependent code, less cleanup needs etc. (Oh, and for some reason at my quick review i also noticed - you added quoting of the table name, but forgot to do it for the schema name. You should probably also look at using something like quote_identifier(), that'll make things easier). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Dilip kumar dilip.ku...@huawei.com writes: On 15 July 2014 19:01, Magnus Hagander Wrote, I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, It's difficult to equally divide the jobs b/w multiple independent connections. That argument seems like complete nonsense. You're confusing work allocation strategy with the implementation technology for the multiple working threads. I see no reason why a good allocation strategy couldn't work with either approach; indeed, I think it would likely be easier to do some things *without* client-side physical parallelism, because that makes it much simpler to handle feedback between the results of different operational threads. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Tom Lane wrote: Dilip kumar dilip.ku...@huawei.com writes: On 15 July 2014 19:01, Magnus Hagander Wrote, I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, It's difficult to equally divide the jobs b/w multiple independent connections. That argument seems like complete nonsense. You're confusing work allocation strategy with the implementation technology for the multiple working threads. I see no reason why a good allocation strategy couldn't work with either approach; indeed, I think it would likely be easier to do some things *without* client-side physical parallelism, because that makes it much simpler to handle feedback between the results of different operational threads. So you would have one initial connection, which generates a task list; then open N libpq connections. Launch one vacuum on each, and then sleep on select() on the three sockets. Whenever one returns read-ready, the vacuuming is done and we send another item from the task list. Repeat until tasklist is empty. No need to fork anything. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar dilip.ku...@huawei.com wrote: On 27 June 2014 02:57, Jeff Wrote, Based on that, I find most importantly that it doesn't seem to correctly vacuum tables which have upper case letters in the name, because it does not quote the table names when they need quotes. Thanks for your comments There are two problem First - When doing the vacuum of complete database that time if any table with upper case letter, it was giving error --FIXED by adding quotes for table name Second - When user pass the table using -t option, and if it has uppercase letter --This is the existing problem (without parallel implementation), Just for the record, I don't think the second one is actually a bug. If someone uses -t option from the command line, they are required to provide the quotes if quotes are needed, just like they would need to in psql. That can be annoying to do from a shell, as you then need to protect the quotes themselves from the shell, but that is the way it is. vacuumdb -t 'CrAzY QuOtE' or vacuumdb -t \CrAzY\ QuOtE\ Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jul 4, 2014 at 1:15 AM, Dilip kumar dilip.ku...@huawei.com wrote: In attached patch, I have moved pgpipe, piperead functions to src/port/pipe.c If we want to consider proceeding with this approach, you should probably separate this into a refactoring patch that doesn't do anything but move code around and a feature patch that applies on top of it. (As to whether this is the right approach, I'm not sure.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Jul 2, 2014 at 11:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Jeff Janes wrote: I would only envision using the parallel feature for vacuumdb after a pg_upgrade or some other major maintenance window (that is the only time I ever envision using vacuumdb at all). I don't think autovacuum can be expected to handle such situations well, as it is designed to be a smooth background process. That's a fair point. One thing that would be pretty neat but I don't think I would get anyone to implement it, is having the user control the autovacuum launcher in some way. For instance please vacuum this set of tables as quickly as possible, and it would launch as many workers are configured. It would take months to get a UI settled for this, however. This sounds to be a better way to have multiple workers working on vacuuming tables. For vacuum as we already have some sort of infrastructure (vacuum workers) to perform tasks in parallel, why not to leverage that instead of inventing a new one even if we assume that we can reduce the duplicate code. I don't know how to calibrate the number of lines that is worthwhile. If you write in C and need to have cross-platform compatibility and robust error handling, it seems to take hundreds of lines to do much of anything. The code duplication is a problem, but I don't think just raw line count is, especially since it has already been written. Well, there are (at least) two types of duplicate code: first you have these common routines such as pgpipe that are duplicates for no good reason. Just move them to src/port or something and it's all good. But the OP said there is code that cannot be shared even though it's very similar in both incarnations. That means we cannot (or it's difficult to) just have one copy, which means as they fix bugs in one copy we need to update the other. I checked briefly the duplicate code among both versions and I think, we might be able to reduce it to a significant amount by making common functions and use AH where passed (as an example, I have checked function ParallelBackupStart() which is more than 100 lines). If you see code duplication as a major point for which you don't prefer this patch, then I think that can be ameliorated or atleast it is worth a try to do so. However I think it might be better to achieve in a way suggested by you using autovacuum launcher. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Jun 30, 2014 at 3:17 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Jeff Janes wrote: In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... I would only envision using the parallel feature for vacuumdb after a pg_upgrade or some other major maintenance window (that is the only time I ever envision using vacuumdb at all). I don't think autovacuum can be expected to handle such situations well, as it is designed to be a smooth background process. I guess the ideal solution would be for manual VACUUM to have a PARALLEL option, then vacuumdb could just invoke that one table at a time. That way you would get within-table parallelism which would be important if one table dominates the entire database cluster. But I don't foresee that happening any time soon. I don't know how to calibrate the number of lines that is worthwhile. If you write in C and need to have cross-platform compatibility and robust error handling, it seems to take hundreds of lines to do much of anything. The code duplication is a problem, but I don't think just raw line count is, especially since it has already been written. The trend in this project seems to be for shell scripts to eventually get converted into C programs. In fact, src/bin/scripts now has no scripts at all. Also it is important to vacuum/analyze tables in the same database at the same time, otherwise you will not get much speed-up in the ordinary case where there is only one meaningful database. Doing that in a shell script would be fairly hard. It should be pretty easy in Perl (at least for me--I'm sure others disagree), but that also doesn't seem to be the way we do things for programs intended for end users. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Jeff Janes wrote: I would only envision using the parallel feature for vacuumdb after a pg_upgrade or some other major maintenance window (that is the only time I ever envision using vacuumdb at all). I don't think autovacuum can be expected to handle such situations well, as it is designed to be a smooth background process. That's a fair point. One thing that would be pretty neat but I don't think I would get anyone to implement it, is having the user control the autovacuum launcher in some way. For instance please vacuum this set of tables as quickly as possible, and it would launch as many workers are configured. It would take months to get a UI settled for this, however. I guess the ideal solution would be for manual VACUUM to have a PARALLEL option, then vacuumdb could just invoke that one table at a time. That way you would get within-table parallelism which would be important if one table dominates the entire database cluster. But I don't foresee that happening any time soon. I see this as a completely different feature, which might also be pretty neat, at least if you're open to spending more I/O bandwidth processing a single table: have several processes scanning the heap simultaneously. Since I think vacuum is mostly I/O bound at the moment, I'm not sure there is much point in this currently. I don't know how to calibrate the number of lines that is worthwhile. If you write in C and need to have cross-platform compatibility and robust error handling, it seems to take hundreds of lines to do much of anything. The code duplication is a problem, but I don't think just raw line count is, especially since it has already been written. Well, there are (at least) two types of duplicate code: first you have these common routines such as pgpipe that are duplicates for no good reason. Just move them to src/port or something and it's all good. But the OP said there is code that cannot be shared even though it's very similar in both incarnations. That means we cannot (or it's difficult to) just have one copy, which means as they fix bugs in one copy we need to update the other. This is bad -- witness the situation with ecpg's copy of date/time code, where there are bugs fixed in the backend version but the ecpg version does not have the fix. It's difficult to keep track of these things. The trend in this project seems to be for shell scripts to eventually get converted into C programs. In fact, src/bin/scripts now has no scripts at all. Also it is important to vacuum/analyze tables in the same database at the same time, otherwise you will not get much speed-up in the ordinary case where there is only one meaningful database. Doing that in a shell script would be fairly hard. It should be pretty easy in Perl (at least for me--I'm sure others disagree), but that also doesn't seem to be the way we do things for programs intended for end users. Yeah, shipping shell scripts doesn't work very well for us. I'm thinking perhaps we can have sample scripts in which we show how to use parallel(1) to run multiple vacuumdb's in parallel in Unix and some similar mechanism in Windows, and that's it. So we wouldn't provide the complete toolset, but the platform surely has ways to make it happen. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Wed, Jul 2, 2014 at 2:27 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 01 July 2014 22:17, Sawada Masahiko Wrote, I have executed latest patch. One question is that how to use --jobs option is correct? $ vacuumdb -d postgres --jobs=30 I got following error. vacuumdb: unrecognized option '--jobs=30' Try vacuumdb --help for more information. Thanks for comments, Your usage are correct, but there are some problem in code and I have fixed the same in attached patch. This patch allows to set 0 to -j option? When I set 0 to -j option, I think that the behavior of this is same as when I set to 1. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Jul 1, 2014 at 1:25 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 01 July 2014 03:48, Alvaro Wrote, In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... Thanks for looking into the patch, I think if we use shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task, If we directly divide table b/w multiple process, it may happen some process get very big tables then it will be as good as one process is doing operation. In this patch at a time we assign only one table to each process and whichever process finishes fast, we assign new table, this way all process get equal sharing of the task. Thanks Regards, Dilip Kumar I have executed latest patch. One question is that how to use --jobs option is correct? $ vacuumdb -d postgres --jobs=30 I got following error. vacuumdb: unrecognized option '--jobs=30' Try vacuumdb --help for more information. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar dilip.ku...@huawei.com wrote: ... Updated patch is attached in the mail.. Thanks Dilip. I get a compiler warning when building on Windows. When I started looking into that, I see that two files have too much code duplication between them: src/bin/scripts/vac_parallel.c (new file) src/bin/pg_dump/parallel.c (existing file) In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and copied it into another one, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication. In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I got a long list of selects with 0 time outs: select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout) I have not tried to track down the code that causes it. I did notice that vacuumdb spends an awful lot of time at the top of the Linux top output, and this is probably why. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Jeff Janes wrote: In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 01 July 2014 03:31, Jeff Janes Wrote, I get a compiler warning when building on Windows. When I started looking into that, I see that two files have too much code duplication between them: Thanks for Reviewing, src/bin/scripts/vac_parallel.c (new file) src/bin/pg_dump/parallel.c (existing file) In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) When I started doing this patch, I thought of sharing the common code b/w vacuumdb and pg_dump, But if we notice Pg_dump code is tightly coupled with ArchiveHandle, almost all function take this parameter as input or they operate on this, and other functions uses some structure like ParallelState or ParallelSlot which has ArchiveHandle member. I think making this code common may need to change complete code of Parallel pg_dump. However there are some function which are independent of Archive Handle and can directly move to common code, As you mention pg_pipe, piperead, readMessageFromPipe, select_loop. For moving them to common place we need to decide where the common file to be placed. Thoughts ? Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and copied it into another one, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication. In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I got a long list of selects with 0 time outs: select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout) I have not tried to track down the code that causes it. I did notice that vacuumdb spends an awful lot of time at the top of the Linux top output, and this is probably why. I will look into these and fix.. Thanks Regards, Dilip Kumar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 01 July 2014 03:48, Alvaro Wrote, In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... Thanks for looking into the patch, I think if we use shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task, If we directly divide table b/w multiple process, it may happen some process get very big tables then it will be as good as one process is doing operation. In this patch at a time we assign only one table to each process and whichever process finishes fast, we assign new table, this way all process get equal sharing of the task. Thanks Regards, Dilip Kumar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar dilip.ku...@huawei.com wrote: Thank you for giving your time, Please review the updated patch attached in the mail. 1. Rebased the patch 2. Implemented parallel execution for new option --analyze-in-stages Hi Dilip, Thanks for rebasing. I haven't done an architectural or code review on it, I just applied it and used it a little on Linux. Based on that, I find most importantly that it doesn't seem to correctly vacuum tables which have upper case letters in the name, because it does not quote the table names when they need quotes. Of course that needs to be fixed, but taking it as it is, the resulting error message to the console is just: : Execute failed Which is not very informative. I get the same error if I do a pg_ctl shutdown -mi while running the parallel vacuumdb. Without the -j option it produces a more descriptive error message FATAL: terminating connection due to administrator command, so something about the new feature suppresses the informative error messages. I get some compiler warnings with the new patch: vac_parallel.c: In function 'parallel_msg_master': vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute vac_parallel.c: In function 'exit_horribly': vac_parallel.c:1071: warning: 'noreturn' function does return In the usage message, the string has a tab embedded within it (immediately before use) that should be converted to literal spaces, otherwise the output of --help gets misaligned: printf(_( -j, --jobs=NUM use this many parallel jobs to vacuum\n)); Thanks for the work on this. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Hi, I got following FAILED when I patched v3 to HEAD. $ patch -d. -p1 ../patch/vacuumdb_parallel_v3.patch patching file doc/src/sgml/ref/vacuumdb.sgml Hunk #1 succeeded at 224 (offset 20 lines). patching file src/bin/scripts/Makefile Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines). patching file src/bin/scripts/vac_parallel.c patching file src/bin/scripts/vac_parallel.h patching file src/bin/scripts/vacuumdb.c Hunk #3 succeeded at 61 with fuzz 2. Hunk #4 succeeded at 87 (offset 2 lines). Hunk #5 succeeded at 143 (offset 2 lines). Hunk #6 succeeded at 158 (offset 5 lines). Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines). Hunk #8 FAILED at 223. Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines). Hunk #10 FAILED at 360. Hunk #11 FAILED at 387. 3 out of 11 hunks FAILED -- saving rejects to file src/bin/scripts/vacuumdb.c.rej --- Sawada Masahiko On Friday, March 21, 2014, Dilip kumar dilip.ku...@huawei.com wrote: On 16 January 2014 19:53, Euler Taveira Wrote, For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up processing slots/jobs when they get free with smaller tables one after the other would safe overall execution time. Good point, I have made the change and attached the modified patch. Don't you submit it for a CF, do you? Is it too late for this CF? Attached the latest updated patch 1. Rebased the patch to current GIT head. 2. Doc is updated. 3. Supported parallel execution for all db option also. Same I will add to current open commitfest.. Regards, Dilip -- Regards, --- Sawada Masahiko