Re: WIP/PoC for parallel backup

Ibrar Ahmed Fri, 23 Aug 2019 10:59:10 -0700

On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfr...@snowman.net> wrote:


> Greetings,
>
> * Asif Rehman (asifr.reh...@gmail.com) wrote:
> > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <aprav...@pivotal.io> wrote:
> > > Interesting proposal.  The bulk of the work in a backup is
> > > transferring files from the source data directory to the
> > > destination.  Your patch breaks this task down into multiple sets
> > > of files and transfers each set in parallel.  This seems correct;
> > > however, your patch also creates a new process to handle each set.
> > > Is that necessary?  I think we should try to achieve this using
> > > multiple asynchronous libpq connections from a single basebackup
> > > process.  That is, use the PQconnectStartParams() interface instead
> > > of PQconnectdbParams(), which is currently used by basebackup.  On
> > > the server side, it may still result in multiple backend processes
> > > per connection, and an attempt should be made to avoid that as
> > > well, but it seems complicated.
> >
> > Thanks, Asim, for the feedback. This is a good suggestion. The main
> > idea I wanted to discuss is the design where we can open multiple
> > backend connections to get the data instead of a single connection.
> > On the client side we can have multiple approaches: one is to use
> > asynchronous APIs (as suggested by you), and the other could be to
> > decide between multi-process and multi-thread. The main point was
> > that we can extract a lot of performance benefit by using multiple
> > connections, and I built this POC to float the idea of how parallel
> > backup can work, since the core logic of getting the files using
> > multiple connections will remain the same, whether we use
> > asynchronous, multi-process, or multi-threaded.
> >
> > I am going to address the division of files so that they are
> > distributed evenly among multiple workers based on file sizes. That
> > would allow us to get some concrete numbers, and it will also allow
> > us to gauge the benefits of the async versus multi-process/thread
> > approaches on the client side.
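
For illustration only, one possible way to divide files among workers by
size is a greedy "largest file first" assignment. This is a rough sketch
and not what the patch does; the function names assign_files and
worker_loads, and the idea of passing a name-to-size mapping, are
assumptions made up for this example.

```python
import heapq


def assign_files(file_sizes, num_workers):
    """Greedy split: give each file, largest first, to the least-loaded worker.

    file_sizes is a hypothetical mapping of file name -> size in bytes.
    Returns one list of file names per worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (bytes assigned so far, worker index).
    heap = [(0, w) for w in range(num_workers)]
    assignments = [[] for _ in range(num_workers)]

    # Sort by size descending; break ties by name so the result is stable.
    ordered = sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, w = heapq.heappop(heap)
        assignments[w].append(name)
        heapq.heappush(heap, (load + size, w))
    return assignments


def worker_loads(file_sizes, assignments):
    """Total bytes assigned to each worker."""
    return [sum(file_sizes[n] for n in files) for files in assignments]


if __name__ == "__main__":
    # Made-up file names and sizes, purely for demonstration.
    sizes = {"a": 900, "b": 500, "c": 400, "d": 300, "e": 8}
    plan = assign_files(sizes, 2)
    print(plan, worker_loads(sizes, plan))
```

The greedy split is not guaranteed to be optimal, but it is cheap to
compute and keeps a single very large relation file from landing on the
same worker as many other large files.
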
>
> I would expect you to quickly want to support compression on the server
> side, before the data is sent across the network, and possibly
> encryption, and so it'd likely make sense to just have independent
> processes and connections through which to do that.
>

+1 for compression and encryption, but I think parallelism will give us
a benefit both with and without compression.

> Thanks,
>
> Stephen


-- 
Ibrar Ahmed
