On 3/6/2025 15:38, Alexander Korotkov wrote:
On Tue, Jun 3, 2025 at 4:23 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
To establish a stable foundation for discussion, I conducted simple
tests - see, for example, a couple of queries in the attachment. As I
see it, Sort->Append works faster: on my test bench it takes 1250ms on
average versus 1430ms, and it also has lower costs - the same holds for
data with and without massive numbers of duplicates. Varying the input
sizes, I see the same behaviour.
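
To give the idea without opening the attachment, a hypothetical sketch
of this kind of test could look like the following (table and column
names are invented; the real queries are in the attachment):

CREATE TABLE t (id int, payload int) PARTITION BY HASH (id);
CREATE TABLE t_0 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE t_1 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE t_2 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE t_3 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 3);
INSERT INTO t SELECT i, i % 100 FROM generate_series(1, 1000000) i;
VACUUM ANALYZE t;
-- Sorting on a non-partitioning column: the planner can pick either
-- Sort over Append or MergeAppend over per-partition Sorts.
EXPLAIN (ANALYZE, COSTS) SELECT * FROM t ORDER BY payload;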

I ran your tests.  For the Sort(Append()) case I got actual
time=811.047..842.473.  For the MergeAppend case I got actual
time=723.678..967.004.  That looks interesting.  At some point
we probably should teach our Sort node to start returning tuples before
finishing the last merge stage.

However, I think the costs are not adequate to the timings.  Our cost
model predicts that the startup cost of MergeAppend is less than the
startup cost of Sort(Append()).  And that's correct.  However, in fact
the total time of MergeAppend is bigger than the total time of
Sort(Append()).  The differences in these two cases are comparable.  I
think we need to adjust our cost_sort() to reflect that.
Could you explain your idea? As I see it (and have shown in the previous
message), the total cost of Sort->Append is lower than that of
MergeAppend->Sort. Additionally, as I mentioned earlier, the primary
reason for choosing MergeAppend in the regression test was a slight
total-cost difference that triggered the startup-cost comparison. Could
you show the query and its EXPLAIN output that concerns you?
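
For reference, my rough reading of the relevant formulas in costsize.c
(simplified to the in-memory case, so the exact terms may be off) is:

  Sort (T input tuples):
    startup ~ input_total_cost + 2 * cpu_operator_cost * T * log2(T)
    total   ~ startup + cpu_operator_cost * T

  MergeAppend (N sorted input streams, T tuples in total):
    startup ~ input_startup_cost + 2 * cpu_operator_cost * N * log2(N)
    total   ~ input_total_cost + 2 * cpu_operator_cost * N * log2(N)
              + 2 * cpu_operator_cost * T * log2(N)
              + 0.5 * cpu_tuple_cost * T

With these formulas, MergeAppend's startup cost depends only on N while
Sort's grows with T * log2(T), so whenever the total costs land within
the planner's fuzz factor, the startup-cost comparison favours
MergeAppend; that seems consistent with the regression-test behaviour
described above.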

--
regards, Andrei Lepikhov

