On Thu, Oct 20, 2016 at 12:03 AM, Peter Geoghegan <p...@heroku.com> wrote: > On Wed, Oct 19, 2016 at 7:39 AM, Robert Haas <robertmh...@gmail.com> wrote: >> Gather Merge can't emit a tuple unless it has buffered at least one >> tuple from every producer; otherwise, the next tuple it receives from >> one of those producers might proceed whichever tuple it chooses to >> emit.
Right. Now, after again looking at Gather Merge patch, I think I can better understand how it performs merging. >> However, it doesn't need to wait until all of the workers are >> completely done. The leader only needs to be at least slightly ahead >> of the slowest worker. I'm not sure how that compares to Peter's >> approach. > > I don't think that eager merging will prove all that effective, > however it's implemented. I see a very I/O bound system when parallel > CREATE INDEX merges serially. There is no obvious reason why you'd > have a straggler worker process with CREATE INDEX, really. > >> What I'm worried about is that we're implementing two separate systems >> to do the same thing, and that the parallel sort approach is actually >> a lot less general. I think it's possible to imagine a Parallel Sort >> implementation which does things Gather Merge can't. If all of the >> workers collaborate to sort all of the data rather than each worker >> sorting its own data, then you've got something which Gather Merge >> can't match. But this is not that. > > It's not that yet, certainly. I think I've sketched a path forward for > making partitioning a part of logtape.c that is promising. The sharing > of ranges within tapes and so on will probably have a significant > amount in common with what I've come up with. > > I don't think that any executor infrastructure is a particularly good > model when *batch output* is needed -- the tuple queue mechanism will > be a significant bottleneck, particularly because it does not > integrate read-ahead, etc. > Tuple queue mechanism might not be super-efficient for *batch output* (cases where many tuples needs to be read and written), but I see no reason why it will be slower than disk I/O which I think you are using in the patch. IIUC, in the patch each worker including leader does the tape sort for it's share of tuples and then finally leader merges and populates the index. I am not sure if the mechanism used in patch can be useful as compare to using tuple queue, if the workers can finish their part of sorting in-memory. > > The bottom line is that it's inherently difficult for me to refute the > idea that Gather Merge could do just as well as what I have here, > because proving that involves adding a significant amount of new > infrastructure (e.g., to teach the executor about IndexTuples). > I think, there could be a simpler way, like we can force the gather merge node when all the tuples needs to be sorted and compute the time till it merges all tuples. Similarly, with your patch, we can wait till final merge is completed. However, after doing initial study of both the patches, I feel one can construct cases where Gather Merge can win and also there will be cases where your patch can win. In particular, the Gather Merge can win where workers needs to perform sort mostly in-memory. I am not sure if it's easy to get best of both the worlds. Your patch needs rebase and I noticed one warning. sort\logtape.c(1422): warning C4700: uninitialized local variable 'lt' used -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers