Re: Parallel copy

Andres Freund Fri, 10 Apr 2020 11:27:26 -0700

Hi,

On 2020-04-10 07:40:06 -0400, Robert Haas wrote:
> On Thu, Apr 9, 2020 at 4:00 PM Andres Freund <and...@anarazel.de> wrote:
> > Imo, yes, there should be only one process doing the chunking. For ILP, 
> > cache efficiency, but also because the leader is the only process with 
> > access to the network socket. It should load input data into one large 
> > buffer that's shared across processes. There should be a separate 
> > ringbuffer with tuple/partial tuple (for huge tuples) offsets. Worker 
> > processes should grab large chunks of offsets from the offset ringbuffer. 
> > If the ringbuffer is not full, the worker chunks should be reduced in size.
> 
> My concern here is that it's going to be hard to avoid processes going
> idle. If the leader does nothing at all once the ring buffer is full,
> it's wasting time that it could spend processing a chunk. But if it
> picks up a chunk, then it might not get around to refilling the buffer
> before other processes are idle with no work to do.


An idle process doesn't cost much. Processes that use CPU inefficiently
however...


> Still, it might be the case that having the process that is reading
> the data also find the line endings is so fast that it makes no sense
> to split those two tasks. After all, whoever just read the data must
> have it in cache, and that helps a lot.

Yea. And if it's not fast enough to split lines, then we have a problem
regardless of which process does the splitting.
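
A rough sketch of the scheme discussed above, in C: the leader finds line
endings in the input buffer and publishes line-start offsets into a ring,
and workers claim batches that shrink when the ring is not full. This is
only an illustration under stated assumptions, not PostgreSQL code. The
names OffsetRing, leader_split_lines and worker_claim and the tiny
RING_SIZE are hypothetical, and it runs in a single process: a real
version would put the ring in shared memory and protect head/tail with
atomics or a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8				/* hypothetical capacity, kept tiny for illustration */

/* Hypothetical ring of line-start offsets into the shared input buffer. */
typedef struct OffsetRing
{
	size_t		offsets[RING_SIZE];
	size_t		head;			/* total entries published by the leader */
	size_t		tail;			/* total entries claimed by workers */
} OffsetRing;

static size_t
ring_used(const OffsetRing *r)
{
	return r->head - r->tail;
}

/*
 * Leader side: scan buf from *pos for newlines and publish the start offset
 * of every complete line, until the ring is full or no complete line is
 * left.  Returns the number of lines published and advances *pos past them.
 */
static size_t
leader_split_lines(OffsetRing *r, const char *buf, size_t len, size_t *pos)
{
	size_t		published = 0;

	while (*pos < len && ring_used(r) < RING_SIZE)
	{
		const char *nl = memchr(buf + *pos, '\n', len - *pos);

		if (nl == NULL)
			break;				/* partial line, wait for more input */

		r->offsets[r->head % RING_SIZE] = *pos;
		r->head++;
		*pos = (size_t) (nl - buf) + 1;
		published++;
	}
	return published;
}

/*
 * Worker side: claim a batch of offsets.  When the ring is full, take up to
 * max_out entries; otherwise take only a 1/nworkers share (rounded up), so
 * the remaining work is spread across the other workers.
 */
static size_t
worker_claim(OffsetRing *r, size_t nworkers, size_t *out, size_t max_out)
{
	size_t		avail = ring_used(r);
	size_t		want;
	size_t		i;

	assert(nworkers > 0);

	want = (avail == RING_SIZE) ? max_out : (avail + nworkers - 1) / nworkers;
	if (want > avail)
		want = avail;
	if (want > max_out)
		want = max_out;

	for (i = 0; i < want; i++)
		out[i] = r->offsets[(r->tail + i) % RING_SIZE];
	r->tail += want;
	return want;
}
```

With two workers and three lines queued, the first claim takes two offsets
and the next takes one, which is the "reduce the chunk size when the
ringbuffer is not full" behaviour from the quoted proposal.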

Greetings,

Andres Freund
