Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

Ranier Vilela Tue, 06 Jul 2021 06:36:34 -0700

Em ter., 6 de jul. de 2021 às 10:19, James Coleman <jtc...@gmail.com>
escreveu:


> Adding David since this patch is likely a precondition for [1].
>
> On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau <ronan.dunk...@aiven.io>
> wrote:
> >
> > Hello,
> >
> > While testing the patch "Add proper planner support for ORDER BY /
> DISTINCT
> > aggregates" [0] I discovered the performance penalty from adding a sort
> node
> > essentially came from not using the single-datum tuplesort optimization
> in
> > ExecSort (contrary to the sorting done in ExecAgg).
> >
> > I originally proposed this patch as a companion in the same thread [1],
> but
> > following James suggestion I'm making a separate thread just for this as
> the
> > optimization is worthwhile independently of David's patch: it looks like
> we
> > can expect a 2x speedup on a "select a single ordered column" case.
> >
> > The patch aimed to be as simple as possible: we only turn this
> optimization on
> > when the tuple being sorted has only one attribute, it is "byval" (so as
> not
> > to incur copies which would be hard to track in the execution tree) and
> > unbound (again, not having to deal with copying borrowed datum anywhere).
>
> Thanks again for finding this and working up a patch.
>
> I've taken a look, and while I haven't dug into testing it yet, I have
> a few comments.
>
> First, the changes are lacking any explanatory comments. Probably we
> should follow how nodeAgg does this and add both comments to the
> ExecSort function header as well as specific comments above the "if"
> around the new tuplesort_begin_datum explaining the specific
> conditions that are required for the optimization to be useful and
> safe.
>
> That leads to a question I had: I don't follow why bounded mode (when
> using byval) needs to be excluded. Comments should be added if there's
> a good reason (as noted above), but maybe it's a case we can handle
> safely?
>
> A second question: at first glance it's intuitively the case we might
> not be able to handle byref values. But nodeAgg doesn't seem to have
> that restriction. What's the difference here?
>
> A few small code observations:
> - In my view the addition of unlikely() in ExecSort is unlikely to be
> of benefit because it's a single call for the entire node's execution
> (not in the tuple loop).
>
No objection. And I agree that testing is complex and needs to remain as it
is.

- It seems clearer to change the "if (!node->is_single_val)" to flip
> the true/false cases so we don't need the negation.
>
I think yes, it can be better.

regards,
Ranier Vilela

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

Reply via email to