On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar <dilip.ku...@huawei.com> wrote:
>
> On 31 July 2014 10:59, Amit kapila Wrote,
>
> Thanks for the review and valuable comments.
> I have fixed all the comments and attached the revised patch.
I have again looked into your revised patch and would like to share my
findings with you.

1.
+      Number of parallel connections to perform the operation. This option will enable the vacuum
+      operation to run on parallel connections, at a time one table will be operated on one connection.

a. How about describing it w.r.t. asynchronous connections instead of
   parallel connections?
b. It is better to keep line length under 80 characters.
c. As you are using multiple connections to achieve parallelism, I suggest
   you add a line to your description indicating that the user should verify
   the max_connections parameter.  Something similar to pg_dump:
   "pg_dump will open njobs + 1 connections to the database, so make sure
   your max_connections setting is high enough to accommodate all
   connections."

2.
+      So at one time as many tables will be vacuumed parallely as number of jobs.

Can you briefly mention the case where the number of jobs is greater than
the number of tables?

3.
+    /* When user is giving the table list, and list is smaller then
+     * number of tables
+     */
+    if (tbl_count && (parallel > tbl_count))
+        parallel = tbl_count;
+

Here again the multiline comment style is wrong.  Some other instances are
as below:

a.
    /* This will give the free connection slot, if no slot is free it will
     * wait for atleast one slot to get free.
     */
b.
    /* if table list is not provided then we need to do vaccum for whole DB
     * get the list of all tables and prpare the list
     */
c.
    /* Some of the slot are free, Process the results for slots whichever
     * are free
     */

4.
src/bin/scripts/vacuumdb.c:51: indent with spaces.
+                         bool analyze_only, bool freeze, PQExpBuffer sql);
src/bin/scripts/vacuumdb.c:116: indent with spaces.
+    int        parallel = 0;
src/bin/scripts/vacuumdb.c:198: indent with spaces.
+            optind++;
src/bin/scripts/vacuumdb.c:299: space before tab in indent.
+        vacuum_one_database(dbname, full, verbose, and_analyze,

There are a lot of redundant whitespace issues; check with git diff --check
and fix them.

5.
res = executeQuery(conn,
        "select relname, nspname from pg_class c, pg_namespace ns"
        " where (relkind = \'r\' or relkind = \'m\')"
        " and c.relnamespace = ns.oid order by relpages desc",
        progname, echo);

a. Here you need to use SQL keywords in capital letters; refer to one of
   the other callers of executeQuery() in vacuumdb.c.
b. Why do you need the condition c.relnamespace = ns.oid in the above
   query?  I think to get the list of required objects from pg_class, you
   don't need a join with pg_namespace.

6.
vacuum_parallel()
{
..
    if (!tables || !tables->head)
    {
..
        tbl_count++;
    }
..
}

a. Why do you need a separate variable (tbl_count) to track the number of
   asynchronous/parallel connections you want?  Why can't we use ntuple?
b. There is a warning in the code (I have compiled it on Windows) related
   to this variable:
   vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count' used

7. The fix for one of my previous comments is as below:

GetQueryResult()
{
..
    if (!r && !completedb)
..
}

Here I think some generic errors, like a broken connection, will also get
ignored.  Is it possible to ignore only the particular error we want to
ignore, without complicating the code?  Also, in any case, add comments to
explain why you are ignoring the error for this particular case.

8.
+        fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+        progname);

The formatting of progname on the second line is not as per standard (you
can refer to other fprintf calls in the same file).

9.
+    int        parallel = 0;

I think it is better to name it numAsyncCons or something similar.

10. It is better if you can add a function header for each newly added
function.

> > Test2: (as per your scenario, where actual vacuum time is very less)
> > Vacuum done for complete DB
> > 8 tables all with 10000 records and few dead tuples

I think this test is missing in the attached file.  "Few" means how many?
Can you try with 0.1% and 1% dead tuples in the table, and measure the time
in milliseconds if the test completes quickly?

PS - It is better if you mention against each review comment/suggestion
what you have done, because in some cases it will help the reviewer
understand your fix easily, and as the author you will also be sure that
everything got fixed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com