On Thu, Apr 9, 2020 at 4:20 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmh...@gmail.com> wrote:
> >
> > On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <a...@cybertec.at> wrote:
> > >
> > > With option 1 it's not possible to read input data into shared memory
> > > and there needs to be an extra memcpy in the time critical sequential
> > > flow of the leader. With option 2 data could be read directly into the
> > > shared memory buffer. With future async io support, reading and
> > > looking for tuple boundaries could be performed concurrently.
> >
> > But option 2 still seems significantly worse than your proposal above,
> > right?
> >
> > I really think we don't want a single worker in charge of finding
> > tuple boundaries for everybody. That adds a lot of unnecessary
> > inter-process communication and synchronization. Each process should
> > just get the next tuple starting after where the last one ended, and
> > then advance the end pointer so that the next process can do the same
> > thing. Vignesh's proposal involves having a leader process that has to
> > switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> > switch roles at the right time, performance will be impacted. If the
> > leader doesn't get scheduled in time to refill the queue before it
> > runs completely empty, workers will have to wait. Ants's scheme avoids
> > that risk: whoever needs the next tuple reads the next line. There's
> > no need to ever wait for the leader because there is no leader.
> >
>
> Hmm, I think in his scheme also there is a single reader process. See
> the email above [1] where he described how it should work.
>
Oops, I forgot to include the link to the email. See

[1] - https://www.postgresql.org/message-id/CANwKhkO87A8gApobOz_o6c9P5auuEG1W2iCz0D5CfOeGgAnk3g%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com