On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <a...@cybertec.at> wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > This is something similar to what I had also in mind for this idea. I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide). The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory. And, the tokenization (finding
> > > line boundaries) would be serial. I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> >
>
> I am talking about access to shared memory instead of the process
> local memory. I understand that an extra copy won't be required.
>
> > The reader can directly
> > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> >
>
> I am slightly confused here. AFAIU, the for(;;) loop in
> CopyReadLineText is about finding the line endings which we thought
> that the reader process will do.

Indeed, I somehow misread the code while scanning over it. So
CopyReadLineText currently copies data from cstate->raw_buf to the
StringInfo in cstate->line_buf. In parallel mode it would copy it from
the shared data buffer to local line_buf until it hits the line end
found by the data reader. The amount of copying done is still exactly
the same as it is now.

Regards,
Ants Aasma
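
To illustrate the copy described above, here is a minimal sketch of a worker copying one line out of a shared chunk into its local line buffer, up to a line end that the reader has already found. It is not PostgreSQL code: SharedChunk, LineBuf and copy_line_from_shared() are hypothetical names, and the fixed-size LineBuf only stands in for the growable StringInfo in cstate->line_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the chunk the reader fills in shared memory. */
typedef struct SharedChunk
{
    const char *data;   /* raw input bytes placed there by the reader */
    size_t      len;    /* number of valid bytes in data */
} SharedChunk;

/* Hypothetical stand-in for the worker-local line buffer. */
typedef struct LineBuf
{
    char        data[256];  /* fixed size here; the real buffer grows */
    size_t      len;        /* bytes stored, excluding the terminator */
} LineBuf;

/*
 * Copy the bytes [start, end) of the shared chunk into the local line
 * buffer.  "end" is the offset of the line terminator, which in this
 * sketch is assumed to have been found already by the reader.
 *
 * Returns 0 on success, -1 if the line does not fit in the local buffer.
 */
static int
copy_line_from_shared(const SharedChunk *chunk, size_t start, size_t end,
                      LineBuf *line)
{
    size_t      n;

    assert(start <= end && end <= chunk->len);

    n = end - start;
    if (n >= sizeof(line->data))
        return -1;

    /* One copy per line: shared buffer -> local line buffer. */
    memcpy(line->data, chunk->data + start, n);
    line->data[n] = '\0';
    line->len = n;
    return 0;
}
```

Each line is still copied exactly once in this sketch; only the source of the copy changes from a process-local buffer to the shared one.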