On 31 July 2014 10:59, Amit Kapila wrote:

Thanks for the review and the valuable comments.
I have addressed all of them and attached the revised patch.

As per your suggestion, I have also collected the performance numbers…

Test1:

Machine configuration:

    Cores: 8 (Intel(R) Xeon(R) CPU E5520 @ 2.27GHz)
    RAM: 48GB

Test scenario:

    8 tables, each with 1M+ records. [Many records are deleted and inserted
    using a pattern; the test file is attached to this mail.]



Test result:

    Base code:              43.126s
    Parallel vacuum code:
        2 threads:          29.687s
        8 threads:          14.647s


Test2: (as per your scenario, where the actual vacuum time is very short)

    Vacuum performed on the complete DB.
    8 tables, each with 10000 records and only a few dead tuples.



Test result:

    Base code:              0.59s
    Parallel vacuum code:
        2 threads:          0.50s
        4 threads:          0.29s
        8 threads:          0.18s


Regards,
Dilip Kumar





From: Amit Kapila [mailto:amit.kapil...@gmail.com]
Sent: 31 July 2014 10:59
To: Dilip kumar
Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; 
PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar <dilip.ku...@huawei.com> wrote:
> On 16 July 2014 12:13, Magnus Hagander wrote:
> >Yeah, those are exactly my points. I think it would be significantly simpler 
> >to do it that way, rather than forking and threading. And also easier to 
> >make portable...
>
> >(and as an optimization on Alvaro's suggestion, you can of course reuse the
> >initial connection as one of the workers, as long as you got the full list of
> >tasks from it up front, which I think you do anyway in order to do the
> >sorting of tasks...)
>
> I have modified the patch as per the suggestion,
>
> Now all the connections are created at the beginning, and the first connection
> is used to get the table list; after that, all the connections take part in the
> vacuum task.
>
> Please have a look and provide your opinion…

1.
+        connSlot = (ParallelSlot *) pg_malloc(parallel * sizeof(ParallelSlot));
+        for (i = 0; i < parallel; i++)
+        {
+            connSlot[i].connection = connectDatabase(dbname, host, port, username,
+                                                     prompt_password, progname, false);
+
+            PQsetnonblocking(connSlot[i].connection, 1);
+            connSlot[i].isFree = true;
+            connSlot[i].sock = PQsocket(connSlot[i].connection);
+        }
Here it seems to me that you are opening the connections before getting or
checking the table list, so if there are fewer tables than connections, won't
the extra connections always remain idle?  A simple case to verify this is the
example below:

vacuumdb -t t1 -d postgres -j 4
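
As one way to address this, here is a rough sketch (not code from the patch;
ntables is a hypothetical count derived from the table list, which would have
to be fetched through the already-established first connection before the
extra connections are opened):

    /*
     * Sketch only: clamp the number of worker connections to the number of
     * tables to be vacuumed, so that e.g. "vacuumdb -t t1 -d postgres -j 4"
     * does not leave three connections permanently idle.
     */
    if (parallel > ntables)
        parallel = ntables;

    connSlot = (ParallelSlot *) pg_malloc(parallel * sizeof(ParallelSlot));
    for (i = 0; i < parallel; i++)
    {
        connSlot[i].connection = connectDatabase(dbname, host, port, username,
                                                 prompt_password, progname, false);

        PQsetnonblocking(connSlot[i].connection, 1);
        connSlot[i].isFree = true;
        connSlot[i].sock = PQsocket(connSlot[i].connection);
    }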

2.
+         res = executeQuery(conn,
+             "select relname, nspname from pg_class c, pg_namespace ns"
+             " where relkind= \'r\' and c.relnamespace = ns.oid"
+             " order by relpages desc",
+             progname, echo);

Here the query only retrieves ordinary relations; however, the VACUUM command
processes materialized views as well, so I think the list should include
materialized views too, unless you have a specific reason for not including
them.
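
For example, the query could be extended roughly like this (just a sketch of
the idea; relkind 'm' is the pg_class entry for a materialized view):

    res = executeQuery(conn,
        "select c.relname, ns.nspname from pg_class c, pg_namespace ns"
        " where c.relkind in ('r', 'm') and c.relnamespace = ns.oid"
        " order by c.relpages desc",
        progname, echo);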

3. In the function vacuum_parallel(), if the user has not provided a list of
tables, it retrieves all the tables in the database, and then in
run_parallel_vacuum() it vacuums each table using the async mechanism.  Now
consider the case where, after the list has been built, a table is dropped by a
user from another session: the patch will error out.  Without the patch, the
VACUUM command internally ignores such a case and completes the vacuum of the
other tables.  Don't you think the patch should maintain the existing
behaviour?
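
To illustrate the idea (the identifiers around the libpq calls below are made
up for this sketch and are not taken from the patch): when the per-table
VACUUM fails because the relation was dropped concurrently, the error carries
SQLSTATE 42P01 (undefined_table), and the client could downgrade just that
case to a warning and move on to the next table:

    PGresult   *result = PQgetResult(slot->connection);

    if (result != NULL && PQresultStatus(result) != PGRES_COMMAND_OK)
    {
        const char *sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);

        /*
         * "42P01" = undefined_table: the relation was dropped after the
         * list was built, so skip it instead of aborting the whole run.
         */
        if (sqlstate != NULL && strcmp(sqlstate, "42P01") == 0)
            fprintf(stderr, "%s: skipping \"%s\": relation no longer exists\n",
                    progname, tabname);
        else
        {
            fprintf(stderr, "%s: vacuuming of table \"%s\" failed: %s",
                    progname, tabname, PQerrorMessage(slot->connection));
            exit(1);
        }
    }
    PQclear(result);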


4.
+        <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+        Number of parallel process to perform the operation.

Change this description to match the new implementation.  I also think the new
option needs some additional explanation in the documentation.


5.
It seems there is no change in the function declaration below:

static void vacuum_one_database(const char *dbname, bool full, bool verbose,
!                               bool and_analyze, bool analyze_only, bool analyze_in_stages,
!                               bool freeze, const char *table, const char *host,
!                               const char *port, const char *username,
!                               enum trivalue prompt_password,
                                const char *progname, bool echo);

6.
+        printf(_("  -j, --jobs=NUM                  use this many parallel jobs to vacuum\n"));

Change the description to match the new implementation.



7.
/* This will give the free connection slot, if no slot is free it will
                      wait for atleast one slot to get free.*/

Multi-line comments should be written as follows (refer to other places in the code):
/*
 * This will give the free connection slot, if no slot is free it will
 * wait for atleast one slot to get free.
 */
Kindly correct other places in the patch where similar instances exist.

8.
Isn't it a good idea to check the performance of this new patch, especially for
worst cases such as when there is not much to vacuum in the tables of a
database?  The reason I want to check this is that with the new algorithm (for
a database-wide vacuum, the client now gets the list of tables and performs the
vacuum on individual tables), certain actions on the server side, such as the
allocation/deallocation of memory contexts and the sending of stats, have to be
repeated for every table, whereas they would otherwise be done only once.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: testcase.sql
Description: testcase.sql

Attachment: vacuumdb_parallel_v12.patch
Description: vacuumdb_parallel_v12.patch
