On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
On Wed, Jun 5, 2019 at 12:14 PM Rafia Sabih <rafia.pghack...@gmail.com> wrote:
> > 2) Provide some fallback at execution time. For example, we might watch
> > the size of the group, and if we run into an unexpectedly large one we
> > might abandon the incremental sort and switch to a "full sort" mode.
>
> Are there good examples of our doing this in other types of nodes
> (whether the fallback is an entirely different algorithm/node type)? I
> like this idea in theory, but I also think it's likely it would add a
> very significant amount of complexity. The other problem is knowing
> where to draw the line: you end up creating these kinds of cliffs
> where pulling one more tuple through the incremental sort would give
> you your batch and result in not having to pull many more tuples in a
> regular sort node, but the fallback logic kicks in anyway.
>
What about having some simple mechanism for this: if we encounter a
group with more tuples than estimated, simply switch to a normal sort
for the remaining tuples, since the estimates do not hold true anyway.
At the very least that avoids the regression of incremental sort being
much worse than a normal sort.
I mean something like this: populate the tuplesortstate while counting
the number of tuples in the group; if they stay within budget, call
tuplesort_performsort as usual, otherwise put all the remaining tuples
into the tuplesort and then call tuplesort_performsort once. We could
add a field to IncrementalSortState to save the estimated size of each
group. I am assuming we use MCV lists to better approximate the group
sizes, as suggested above by Tomas.

I think the first thing to do is get some concrete numbers on performance if we:

1. Only sort one group at a time.
2. Update the costing to prefer traditional sort unless we have very
high confidence we'll win with incremental sort.

It'd be nice not to have to add additional complexity if at all possible.


+1 to that

> Unrelated to all of the above: if I read the patch properly it
> intentionally excludes backwards scanning. I don't see any particular
> reason why that ought to be the case, and it seems like an odd
> limitation for the feature should it be merged. Should that be a
> blocker to merging?

Regarding this, I came across this comment:
/*
  * Incremental sort can't be used with either EXEC_FLAG_REWIND,
  * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we hold only current
  * bucket in tuplesortstate.
  */
I think that is quite understandable. How are you planning to support
backward scans for this? In other words, when will incremental sort be
useful for a backward scan?

For some reason I was thinking we'd need it to support backwards scans
to be able to handle DESC sort on the index, but I've tested and
confirmed that already works. I suppose that's because the index scan
provides that ordering and the sort node doesn't need to reverse the
direction of what's provided to it. That's not particularly obvious to
someone newer to the codebase; I'm not sure if that's documented
anywhere.


Yeah, backward scans are not about ASC/DESC, it's about being able to walk
back through the result. And we can't do that with incremental sorts
without materialization.

On a different note, I can't stop imagining this operator along the
lines of Parallel Append, wherein multiple workers sort the different
groups independently at the same time.

That is an interesting idea. I suppose it'd be particularly valuable
if there were somehow a node already generating each batch in
parallel, though I'm not sure offhand what kind of query or node would
result in that. I also wonder if (assuming that weren't the case) it
would be much of an improvement, since a single thread would have to
generate each batch anyway; I'm not sure whether the overhead of
farming each batch out to a worker would actually pay off, or if the
real bottleneck is the base scan.

At the very least it's an interesting idea.


I kinda doubt that'd be very valuable. Or more precisely, we kinda already
have that capability because we can do things like this:

  -> Gather Merge
     -> Sort
        -> ... scan ...

so I imagine we'd just do an Incremental Sort here. Granted, it's not
distributed by prefix groups (I assume that's what you mean by batches
here), but I don't think that's a big problem.

---

I've been writing down notes here, and I realized that my test case
from far upthread is actually a useful setup to see how much overhead
is involved in sorting each batch individually, since it sets up data
with each batch only containing 1 tuple. That particular case is one
we could easily optimize anyway in the code and skip sorting
altogether -- might be a useful enhancement.
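As a hypothetical sketch of that enhancement (again plain Python with
made-up names, not PostgreSQL code): when a just-finished prefix group
turns out to hold a single tuple, it is trivially sorted already, so it
can be emitted directly without touching the sort machinery at all.

```python
# Sketch of skipping the sort for singleton prefix groups.
# rows must already arrive ordered by their first prefix_len columns.
def emit_sorted(rows, prefix_len):
    group = []
    for row in rows:
        if group and row[:prefix_len] != group[0][:prefix_len]:
            # Flush the finished group; a singleton is already sorted.
            yield from (group if len(group) == 1 else sorted(group))
            group = []
        group.append(row)
    if group:
        yield from (group if len(group) == 1 else sorted(group))
```

In the degenerate one-tuple-per-batch test case, every group takes the
singleton path, so the per-batch sort overhead disappears entirely.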

I hope to do some more testing and then report back with the results.

James Coleman

OK.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


