On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma <a...@cybertec.at> wrote:
>
> On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <a...@cybertec.at> wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > This is something similar to what I had also in mind for this idea. I
> > > > had thought of handing over complete chunk (64K or whatever we
> > > > decide). The one thing that slightly bothers me is that we will add
> > > > some additional overhead of copying to and from shared memory which
> > > > was earlier from local process memory. And, the tokenization (finding
> > > > line boundaries) would be serial. I think that tokenization should be
> > > > a small part of the overall work we do during the copy operation, but
> > > > will do some measurements to ascertain the same.
> > >
> > > I don't think any extra copying is needed.
> > >
> >
> > I am talking about access to shared memory instead of the process
> > local memory. I understand that an extra copy won't be required.
> >
> > > The reader can directly
> > > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> > >
> >
> > I am slightly confused here. AFAIU, the for(;;) loop in
> > CopyReadLineText is about finding the line endings which we thought
> > that the reader process will do.
>
> Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
> currently copies data from cstate->raw_buf to the StringInfo in
> cstate->line_buf. In parallel mode it would copy it from the shared data
> buffer to local line_buf until it hits the line end found by the data
> reader. The amount of copying done is still exactly the same as it is now.
>
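To make the split described above concrete, here is a rough sketch. It is not the patch and not PostgreSQL code: SharedChunk, reader_find_line_ends() and worker_copy_line() are hypothetical names, and a plain struct stands in for the shared memory segment. It only looks for '\n' and ignores the CSV quoting, escape and "\r\n" handling that CopyReadLineText has to deal with. The point is only that the reader does the serial scan for line boundaries, while each worker does one copy per line from the shared buffer into its local line buffer, much as raw_buf is copied into line_buf today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 65536        /* "64K or whatever we decide" */
#define MAX_LINES  1024         /* arbitrary cap for this sketch */

/* Hypothetical stand-in for one chunk living in shared memory. */
typedef struct SharedChunk
{
    char    data[CHUNK_SIZE];       /* raw input, filled by the reader */
    size_t  used;                   /* number of valid bytes in data */
    size_t  line_end[MAX_LINES];    /* offset one past each '\n' */
    size_t  nlines;                 /* number of complete lines found */
} SharedChunk;

/*
 * Reader side: serial tokenization. Records where each complete line
 * ends; a trailing partial line is left for the next chunk.
 */
static size_t
reader_find_line_ends(SharedChunk *chunk)
{
    chunk->nlines = 0;
    for (size_t i = 0; i < chunk->used && chunk->nlines < MAX_LINES; i++)
    {
        if (chunk->data[i] == '\n')
            chunk->line_end[chunk->nlines++] = i + 1;
    }
    return chunk->nlines;
}

/*
 * Worker side: copy line n out of the shared chunk into a local buffer,
 * stopping at the boundary the reader found. Returns the line length
 * without the newline. This is the only copy made per line.
 */
static size_t
worker_copy_line(const SharedChunk *chunk, size_t n,
                 char *line_buf, size_t line_buf_size)
{
    assert(n < chunk->nlines);

    size_t start = (n == 0) ? 0 : chunk->line_end[n - 1];
    size_t len = chunk->line_end[n] - start - 1;    /* drop the '\n' */

    assert(len < line_buf_size);
    memcpy(line_buf, chunk->data + start, len);
    line_buf[len] = '\0';
    return len;
}
```

With an input of "1,alpha\n2,beta\n3,gam" the reader would report two complete lines, and a worker asking for line 1 would get "2,beta" in its local buffer. The unterminated "3,gam" would have to be carried over to the next chunk, which this sketch does not attempt.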
Yeah, on a broader level it will be something like that, but the actual
details might vary during implementation. BTW, have you given any
thought to the other approach I shared above [1]? We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.

[1] - https://www.postgresql.org/message-id/CAA4eK1LyAyPCtBk4rkwomeT6%3DyTse5qWws-7i9EFwnUFZhvu5w%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com