On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan <p...@heroku.com> wrote:
> I think we might be better off if a tuplesort function was called
> shortly after tuplesort_begin_heap() is called. How top-n heap sorts
> work is something that largely lives in tuplesort's head. Today, we
> call tuplesort_set_bound() to hint to tuplesort "By the way, this is a
> top-n heap sort applicable sort". I think that with this patch, we
> should then hint (where applicable) "by the way, you won't actually be
> required to sort those first n indexed attributes; rather, you can
> expect to scan those in logical order. You may work the rest out
> yourself, and may be clever about exploiting the sorted-ness of the
> first few columns". The idea of managing a bunch of tiny sorts from
> within ExecSort(), and calling the new function tuplesort_reset() seems
> questionable. tuplesortstate is supposed to be private/opaque to
> nodeSort.c, and the current design strains that.
>
> I'd like to keep nodeSort.c simple. I think it's pretty clear that the
> guts of this do not belong within ExecSort(), in any case. Also, the
> additions there should be much better commented, wherever they finally
> end up.

As I understand it, you propose encapsulating the partial sort algorithm
in tuplesort. However, in that case we would still need a significant
change to its interface: tuplesort would have to decide when it is able
to return a tuple. Otherwise, we would miss a significant part of the
LIMIT clause optimization. tuplesort_set_bound() can't cover all the
cases, because there could be other planner nodes between the partial
sort and the LIMIT.

------
With best regards,
Alexander Korotkov.
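
To make the LIMIT point concrete, here is a rough Python sketch of the idea. It is not PostgreSQL code: partial_sort(), counting() and the sample rows are hypothetical names made up for illustration. It assumes the input is already ordered by the leading column and shows a sort that returns tuples as soon as each prefix group has been sorted, so a consumer that stops early never pulls the rest of the input.

```python
from itertools import groupby, islice

def partial_sort(rows, prefix_key, rest_key):
    """Yield rows in (prefix, rest) order.

    Assumption: rows already arrive ordered by prefix_key, so only one
    prefix group has to be buffered and sorted at a time.
    """
    for _, group in groupby(rows, key=prefix_key):
        # Sort just this group by the remaining key, then hand the
        # tuples out before touching the next group.
        for row in sorted(group, key=rest_key):
            yield row

def counting(rows, counter):
    """Hypothetical helper: count how many input rows are actually pulled."""
    for row in rows:
        counter[0] += 1
        yield row

# Input is ordered by the first column only.
rows = [(1, 'c'), (1, 'a'), (2, 'b'), (2, 'a'), (3, 'z'), (3, 'y')]
consumed = [0]
source = counting(rows, consumed)

# islice stands in for a LIMIT 2 somewhere above the sort.
top2 = list(islice(partial_sort(source, lambda r: r[0], lambda r: r[1]), 2))
print(top2, consumed[0])
```

In this sketch the consumer gets its two rows after the sort has read only the first group plus one look-ahead row. A sort that can only return tuples after reading all of its input would not allow this, whatever sits between it and the LIMIT.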