On Fri, Aug 23, 2019 at 3:18 PM Asim R P <aprav...@pivotal.io> wrote:

> Hi Asif
>
> Interesting proposal.  Bulk of the work in a backup is transferring files
> from source data directory to destination.  Your patch is breaking this
> task down in multiple sets of files and transferring each set in parallel.
> This seems correct, however, your patch is also creating a new process to
> handle each set.  Is that necessary?  I think we should try to achieve this
> using multiple asynchronous libpq connections from a single basebackup
> process.  That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup.  On the server
> side, it may still result in multiple backend processes per connection, and
> an attempt should be made to avoid that as well, but it seems complicated.
>
> What do you think?
>
The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I am asking because
multiple pieces of hardware are involved while taking a backup
(network/CPU/disk). If we have already saturated the disk then there is no
need to add parallelism, because we will be blocked on disk I/O anyway.  I
implemented parallel backup in a separate application and got very good
results. I just skimmed through the code and have a reservation that
creating a separate process only for copying data is overkill. There are
two options: non-blocking libpq calls (a rough sketch follows), or a pool
of worker threads (sketched at the end of this mail).
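
Just to make the non-blocking option concrete, here is a rough sketch (not
pg_basebackup code; the connection parameters are only illustrative) of how
a connection can be opened with PQconnectStartParams()/PQconnectPoll()
instead of PQconnectdbParams(). A real implementation would drive several
such connections from a single select()/poll() loop instead of finishing
them one at a time:

/*
 * Sketch only: open a libpq connection without blocking, using
 * PQconnectStartParams() and PQconnectPoll().  The parameters below
 * (dbname/replication) are illustrative, not what pg_basebackup sends.
 */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

/* Drive one non-blocking connection attempt to completion with select(). */
static int
connect_nonblocking(PGconn *conn)
{
	/* Per the libpq docs, start as if PQconnectPoll() last said "writing". */
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		fd_set		rfds, wfds;
		int			sock = PQsocket(conn);	/* may change while connecting */

		FD_ZERO(&rfds);
		FD_ZERO(&wfds);
		if (st == PGRES_POLLING_READING)
			FD_SET(sock, &rfds);
		else
			FD_SET(sock, &wfds);

		if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
			return -1;

		st = PQconnectPoll(conn);
	}
	return (st == PGRES_POLLING_OK) ? 0 : -1;
}

int
main(void)
{
	const char *keywords[] = {"dbname", "replication", NULL};
	const char *values[] = {"postgres", "true", NULL};
	PGconn	   *conn = PQconnectStartParams(keywords, values, 0);

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
	{
		fprintf(stderr, "startup failed: %s\n",
				conn ? PQerrorMessage(conn) : "out of memory");
		return 1;
	}
	if (connect_nonblocking(conn) != 0)
	{
		fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* From here the connection could request its share of the files. */
	printf("connected\n");
	PQfinish(conn);
	return 0;
}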
But before doing that, we need to identify pg_basebackup's actual
bottleneck; after that we can decide on the best way to solve it. Some
numbers would help us understand the real benefit.
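
For completeness, the worker-thread option could look roughly like this;
the file list, the thread count and copy_one_file() are all placeholders.
The point is just that the file sets are partitioned among a fixed pool of
threads inside a single pg_basebackup process:

/*
 * Sketch only: partition the files to copy among NUM_WORKERS threads.
 * In reality each worker would hold its own connection (or share a
 * non-blocking I/O loop) and stream its slice of the data directory.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4

struct worker_arg
{
	const char **files;			/* this worker's slice of the file list */
	int			nfiles;
};

static void
copy_one_file(const char *path)
{
	/* Placeholder: fetch "path" from the server and write it locally. */
	printf("copying %s\n", path);
}

static void *
worker_main(void *arg)
{
	struct worker_arg *wa = arg;

	for (int i = 0; i < wa->nfiles; i++)
		copy_one_file(wa->files[i]);
	return NULL;
}

int
main(void)
{
	const char *files[] = {"base/1/1255", "base/1/1259",
						   "base/1/2619", "pg_wal/000000010000000000000001"};
	int			nfiles = 4;
	int			per = (nfiles + NUM_WORKERS - 1) / NUM_WORKERS;
	pthread_t	tid[NUM_WORKERS];
	struct worker_arg args[NUM_WORKERS];

	/* Give each worker a contiguous slice of the file list. */
	for (int w = 0; w < NUM_WORKERS; w++)
	{
		int			start = w * per;

		args[w].files = files + (start < nfiles ? start : nfiles);
		args[w].nfiles = (start >= nfiles) ? 0 :
			(nfiles - start < per ? nfiles - start : per);
		pthread_create(&tid[w], NULL, worker_main, &args[w]);
	}
	for (int w = 0; w < NUM_WORKERS; w++)
		pthread_join(tid[w], NULL);
	return 0;
}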


-- 
Ibrar Ahmed
