On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfr...@snowman.net> wrote:
> Greetings,
>
> * Asif Rehman (asifr.reh...@gmail.com) wrote:
> > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <aprav...@pivotal.io> wrote:
> > > Interesting proposal. The bulk of the work in a backup is transferring
> > > files from the source data directory to the destination. Your patch
> > > breaks this task down into multiple sets of files and transfers each
> > > set in parallel. This seems correct; however, your patch also creates a
> > > new process to handle each set. Is that necessary? I think we should try
> > > to achieve this using multiple asynchronous libpq connections from a
> > > single basebackup process, that is, by using the PQconnectStartParams()
> > > interface instead of PQconnectdbParams(), which is currently used by
> > > basebackup. On the server side, this may still result in multiple
> > > backend processes, one per connection, and an attempt should be made to
> > > avoid that as well, but it seems complicated.
> >
> > Thanks, Asim, for the feedback. This is a good suggestion. The main idea I
> > wanted to discuss is a design where we open multiple backend connections
> > to get the data instead of a single connection.
> > On the client side we can take several approaches: one is to use
> > asynchronous APIs (as you suggested), and another is to choose between
> > multi-process and multi-threaded designs. The main point is that we can
> > extract a lot of performance benefit by using multiple connections. I
> > built this POC to float the idea of how parallel backup can work, since
> > the core logic of fetching the files over multiple connections remains
> > the same whether we use asynchronous, multi-process, or multi-threaded
> > clients.
> >
> > Next I am going to distribute the files evenly among the workers based
> > on file sizes. That will give us some concrete numbers, and it will also
> > let us gauge the benefits of the async approach versus the
> > multi-process/multi-threaded approach on the client side.
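A minimal sketch of one way to split the file list evenly by size, assuming a greedy "largest file first" heuristic: sort the files by size, biggest first, and hand each one to the worker that has the fewest bytes so far. The names (`assign_files_by_size`, `FileEntry`) are hypothetical and are not part of pg_basebackup or the POC patch; the thread does not say which balancing scheme the patch will use.

```c
#include <stdlib.h>

/* Hypothetical type for illustration only; not part of pg_basebackup. */
typedef struct
{
    long size;   /* file size in bytes */
    int  index;  /* position in the caller's file list */
} FileEntry;

/* qsort comparator: larger files first, ties broken by original index. */
static int
compare_size_desc(const void *a, const void *b)
{
    const FileEntry *fa = (const FileEntry *) a;
    const FileEntry *fb = (const FileEntry *) b;

    if (fa->size != fb->size)
        return (fa->size < fb->size) ? 1 : -1;
    return fa->index - fb->index;
}

/*
 * Hypothetical helper: greedy "largest file first" assignment.
 * Each file goes to the worker whose byte total is currently smallest.
 * owner[i] receives the worker number for file i; load[w] receives the
 * total bytes assigned to worker w.
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 */
int
assign_files_by_size(const long *sizes, int nfiles, int nworkers,
                     int *owner, long *load)
{
    FileEntry *entries;

    if (nfiles < 0 || nworkers <= 0)
        return -1;

    for (int w = 0; w < nworkers; w++)
        load[w] = 0;
    if (nfiles == 0)
        return 0;

    entries = malloc(sizeof(FileEntry) * (size_t) nfiles);
    if (entries == NULL)
        return -1;

    for (int i = 0; i < nfiles; i++)
    {
        entries[i].size = sizes[i];
        entries[i].index = i;
    }
    qsort(entries, (size_t) nfiles, sizeof(FileEntry), compare_size_desc);

    for (int i = 0; i < nfiles; i++)
    {
        int best = 0;

        /* Pick the least-loaded worker (lowest worker number on ties). */
        for (int w = 1; w < nworkers; w++)
            if (load[w] < load[best])
                best = w;

        owner[entries[i].index] = best;
        load[best] += entries[i].size;
    }

    free(entries);
    return 0;
}
```

For file sizes {100, 40, 60, 30, 70} and two workers, this places 100 and 40 on worker 0 and 70, 60 and 30 on worker 1. Balancing on bytes rather than file counts matters because a data directory typically mixes many small files with a few large relation segments.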
>
> I would expect you to quickly want to support compression on the server
> side, before the data is sent across the network, and possibly
> encryption, and so it'd likely make sense to just have independent
> processes and connections through which to do that.
>

+1 for compression and encryption, but I think parallelism will give us a
benefit both with and without compression.

> Thanks,
>
> Stephen

--
Ibrar Ahmed