On Fri, Feb 14, 2020 at 3:36 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Fri, Feb 14, 2020 at 9:12 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > This work is to parallelize the copy command and in particular "Copy
> > <table_name> from 'filename' Where <condition>;" command.
> Nice project, and a great stepping stone towards parallel DML.


> > The first idea is that we allocate each chunk to a worker and once the
> > worker has finished processing the current chunk, it can start with
> > the next unprocessed chunk.  Here, we need to see how to handle the
> > partial tuples at the end or beginning of each chunk.  We can read the
> > chunks in dsa/dsm instead of in local buffer for processing.
> > Alternatively, if we think that accessing shared memory can be costly
> > we can read the entire chunk in local memory, but copy the partial
> > tuple at the beginning of a chunk (if any) to a dsa.  We mainly need
> > partial tuple in the shared memory area. The worker which has found
> > the initial part of the partial tuple will be responsible to process
> > the entire tuple. Now, to detect whether there is a partial tuple at
> > the beginning of a chunk, we always start reading one byte prior to
> > the start of the current chunk and if that byte is not a terminating
> > line byte, we know that it is a partial tuple.  Now, while processing
> > the chunk, we will ignore this first line and start after the first
> > terminating line.
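
To make that boundary rule concrete, here is a stand-alone sketch of how
a worker could decide where to start forming tuples in its chunk (names
and details are invented, not from any patch, and it ignores the
complication of newlines embedded inside quoted CSV fields):

#include <stddef.h>

/*
 * Illustrative only: "data" is the whole input (mapped or already read),
 * "len" its length, and [chunk_start, chunk_start + chunk_size) the
 * chunk assigned to this worker.  Returns the offset at which the worker
 * should start forming tuples.  If the byte just before chunk_start is
 * not a line terminator, the chunk begins with a partial tuple, so we
 * skip up to and including the first newline; the worker owning the
 * previous chunk is responsible for that tuple.
 */
static size_t
chunk_tuple_start(const char *data, size_t len,
                  size_t chunk_start, size_t chunk_size)
{
    size_t      end = chunk_start + chunk_size;
    size_t      pos = chunk_start;

    if (end > len)
        end = len;

    /* Chunk 0, or the previous byte terminates a line: start right here. */
    if (chunk_start == 0 || data[chunk_start - 1] == '\n')
        return chunk_start;

    /* Otherwise ignore the partial first line. */
    while (pos < end && data[pos] != '\n')
        pos++;

    return (pos < end) ? pos + 1 : end;
}
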
> That's quite similar to the approach I took with a parallel file_fdw
> patch[1], which mostly consisted of parallelising the reading part of
> copy.c, except that...
> > To connect the partial tuple in two consecutive chunks, we need to
> > have another data structure (for the ease of reference in this email,
> > I call it CTM (chunk-tuple-map)) in shared memory where we store some
> > per-chunk information like the chunk-number, dsa location of that
> > chunk and a variable which indicates whether we can free/reuse the
> > current entry.  Whenever we encounter the partial tuple at the
> > beginning of a chunk we note down its chunk number, and dsa location
> > in CTM.  Next, whenever we encounter any partial tuple at the end of
> > the chunk, we search CTM for the next chunk-number and read from
> > corresponding dsa location till we encounter terminating line byte.
> > Once we have read and processed this partial tuple, we can mark the
> > entry as available for reuse.  There are some loose ends here like how
> > many entries shall we allocate in this data structure.  It depends on
> > whether we want to allow the worker to start reading the next chunk
> > before the partial tuple of the previous chunk is processed.  To keep
> > it simple, we can allow the worker to process the next chunk only when
> > the partial tuple in the previous chunk is processed.  This will allow
> > us to keep the number of entries equal to the number of workers in CTM.  I think
> > we can easily improve this if we want but I don't think it will matter
> > too much, as in most cases, by the time we have processed the tuples in that
> > chunk, the partial tuple would have been consumed by the other worker.
> ... I didn't use a shm 'partial tuple' exchanging mechanism, I just
> had each worker follow the final tuple in its chunk into the next
> chunk, and have each worker ignore the first tuple in chunk after
> chunk 0 because it knows someone else is looking after that.  That
> means that there was some double reading going on near the boundaries,

Right, and especially if the part of the tuple that falls in the second
chunk is bigger, we might need to read most of the second chunk.

> and considering how much I've been complaining about bogus extra
> system calls on this mailing list lately, yeah, your idea of doing a
> bit more coordination is a better idea.  If you go this way, you might
> at least find the copy.c part of the patch I wrote useful as stand-in
> scaffolding code in the meantime while you prototype the parallel
> writing side, if you don't already have something better for this?

No, I haven't started writing anything yet, but I have some ideas on
how to achieve this.  I quickly skimmed through your patch and I think
it can be used as a starting point, though if we decide to go with
accumulating the partial tuple (or all the data) in shm, things might
differ.
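
On the shm side, to be a bit more concrete about the CTM idea above,
the per-chunk entry I am imagining would be something along these lines
(field names are invented; in a real patch the location would presumably
be a dsa_pointer, here it is a plain offset so the sketch stands alone):

#include <stdbool.h>
#include <stdint.h>

/*
 * One chunk-tuple-map (CTM) entry in shared memory, filled in by the
 * worker that finds a partial tuple at the beginning of its chunk.  A
 * worker that ends on a partial tuple looks up the entry for the next
 * chunk number and reads from that location until the terminating line
 * byte.  With one entry per worker (and workers not moving on until the
 * previous chunk's partial tuple is done), a small fixed-size array of
 * these should be enough.
 */
typedef struct CtmEntry
{
    uint64_t    chunk_number;   /* chunk that starts with a partial tuple */
    uint64_t    chunk_location; /* offset of that chunk's data (dsa_pointer in reality) */
    uint32_t    chunk_length;   /* number of bytes available at that location */
    bool        reusable;       /* set once the partial tuple has been consumed */
} CtmEntry;
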

> > Another approach that came up during an offlist discussion with Robert
> > is that we have one dedicated worker for reading the chunks from file
> > and it copies the complete tuples of one chunk in the shared memory
> > and once that is done, it hands over that chunk to another worker which
> > can process tuples in that area.  We can imagine that the reader
> > worker is responsible for forming some sort of work queue that can be
> > processed by the other workers.  In this idea, we won't be able to get
> > the benefit of initial tokenization (forming tuple boundaries) via
> > parallel workers and might need some additional memory processing as,
> > after the reader worker has handed over the initial shared memory segment, we
> > need to somehow identify tuple boundaries and then process them.
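
To illustrate the shape of that work queue, a rough sketch (names
invented; the locking or atomics needed around the state transitions
are omitted):

#include <stdint.h>

/*
 * Work-queue entry for the "one dedicated reader" scheme: the reader
 * copies complete tuples into shared memory, fills in an EMPTY slot and
 * marks it READY; an idle worker claims a READY slot, processes the
 * tuples, and marks it DONE so the reader can recycle it.
 */
typedef enum ChunkState
{
    CHUNK_EMPTY,        /* slot free, reader may fill it */
    CHUNK_READY,        /* reader finished copying tuples into shm */
    CHUNK_IN_PROGRESS,  /* claimed by a worker */
    CHUNK_DONE          /* worker finished; slot can be recycled */
} ChunkState;

typedef struct ChunkQueueEntry
{
    ChunkState  state;
    uint64_t    data_offset;    /* where the complete tuples start in shm */
    uint32_t    data_length;    /* number of bytes of complete tuples */
} ChunkQueueEntry;
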
> Yeah, I have also wondered about something like this in a slightly
> different context.  For parallel query in general, I wondered if there
> should be a Parallel Scatter node, that can be put on top of any
> parallel-safe plan, and it runs it in a worker process that just
> pushes tuples into a single-producer multi-consumer shm queue, and
> then other workers read from that whenever they need a tuple.

The idea sounds great, but past experience shows that shoving all the
tuples through a queue might add significant overhead.  However, how
exactly are you planning to use it?
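
Just to spell out where I would expect that overhead to come from, here
is a stand-alone single-producer multi-consumer queue sketch (pthreads
instead of PostgreSQL shared memory, initialization omitted, purely
illustrative): every push and pop takes the lock, and paying that per
tuple is what worries me.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 1024

typedef struct ScatterQueue
{
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    void       *items[QUEUE_SLOTS];
    size_t      head;
    size_t      tail;
    size_t      count;
    bool        finished;       /* producer has no more items */
} ScatterQueue;

/* Producer: wait for room, then enqueue one tuple (or raw line). */
static void
scatter_push(ScatterQueue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Producer: signal that no more items will arrive. */
static void
scatter_finish(ScatterQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->finished = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: dequeue one item, or NULL once the producer is done. */
static void *
scatter_pop(ScatterQueue *q)
{
    void       *item = NULL;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->finished)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0)
    {
        item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}
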

>  Hmm,
> but for COPY, I suppose you'd want to push the raw lines with minimal
> examination, not tuples, into a shm queue, so I guess that's a bit
> different.


> > Another thing we need to figure out is how many workers to use for
> > the copy command.  I think we can decide it based on the file size, which
> > needs some experiments, or maybe based on user input.
> It seems like we don't even really have a general model for that sort
> of thing in the rest of the system yet, and I guess some kind of
> fairly dumb explicit system would make sense in the early days...

Makes sense.
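
For the "fairly dumb explicit system", something as simple as the
following would probably do as a first cut; the 64MB-per-worker figure
and the cap are invented, purely something to experiment with:

#include <stdint.h>

/*
 * Pick a worker count from the input size.  Both the per-worker byte
 * target and the upper bound are placeholders to be tuned by experiment
 * (or overridden by an explicit user option).
 */
static int
copy_parallel_workers(uint64_t file_size, int max_workers)
{
    const uint64_t bytes_per_worker = 64 * 1024 * 1024;    /* arbitrary 64MB */
    uint64_t    nworkers = file_size / bytes_per_worker;

    if (nworkers < 1)
        nworkers = 1;
    if (nworkers > (uint64_t) max_workers)
        nworkers = (uint64_t) max_workers;
    return (int) nworkers;
}
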

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
