Hi,

On 2020-04-21 10:20:01 +0530, Amit Kapila wrote:
> It is quite likely that compression can benefit more from parallelism
> as compared to the network I/O as that is mostly a CPU intensive
> operation but I am not sure if we can just ignore the benefit of
> utilizing the network bandwidth. In our case, after copying from the
> network we do write that data to disk, so during filesystem I/O the
> network can be used if there is some other parallel worker processing
> other parts of data.
Well, as I said, network and FS IO as done by server / pg_basebackup are
both fully buffered by the OS. Unless the OS throttles the userland
process, a large chunk of the work will be done by the kernel, in
separate kernel threads.

My workstation and my laptop can, in a single thread each, get close to
20GBit/s of network IO (bidirectional 10GBit, I don't have faster - it's
a thunderbolt 10GbE card) and iperf3 is at 55% CPU while doing so. Just
connecting locally it's 45GBit/s. Or over 8GByte/s of buffered
filesystem IO. And it doesn't even have that high a per-core clock
speed.

I just don't see this being the bottleneck for now.

> Also, there may be some users who don't want their data to be
> compressed due to some reason like the overhead of decompression is so
> high that restore takes more time and they are not comfortable with
> that as for them faster restore is much more critical than compressed
> or fast back up. So, for such things, the parallelism during backup
> as being discussed in this thread will still be helpful.

I am not even convinced it'll be helpful in a large fraction of
cases. The added overhead of more connections / processes isn't free.

I believe there are some cases where it'd help. E.g. if there are
multiple tablespaces on independent storage, parallelism as described
here could end up with significantly better utilization of the
different tablespaces. But that'd require sorting work between
processes appropriately.

> OTOH, I think without some measurements it is difficult to say that we
> have significant benefit by parallelizing the backup without compression.
> I have scanned the other thread [1] where the patch for parallel
> backup was discussed and didn't find any performance numbers, so
> probably having some performance data with that patch might give us a
> better understanding of introducing parallelism in the backup.

Agreed, we need some numbers.

Greetings,

Andres Freund
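To make the "sorting work between processes" point concrete, here is a minimal sketch of one way it could look, assuming one worker per tablespace. It is not taken from the patch under discussion; the function name, the (tablespace, path, size) tuple layout and the example paths are all hypothetical.

```python
from collections import defaultdict


def assign_by_tablespace(files):
    """Build one work queue per tablespace.

    `files` is a hypothetical list of (tablespace, path, size_in_bytes)
    tuples. Giving each worker a single queue means it only reads from
    one storage device, so independent devices are used concurrently.
    """
    queues = defaultdict(list)
    for tablespace, path, size in files:
        queues[tablespace].append((path, size))

    # Largest files first, so long transfers start early in each queue.
    for queue in queues.values():
        queue.sort(key=lambda item: item[1], reverse=True)

    return dict(queues)


if __name__ == "__main__":
    example = [
        ("pg_default", "base/1/100", 10),
        ("ts_fast", "16384/200", 30),
        ("pg_default", "base/1/101", 20),
    ]
    for tablespace, queue in assign_by_tablespace(example).items():
        print(tablespace, queue)
```

Whether this split actually beats a single stream is exactly the kind of thing that would need to be measured.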