Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-29 Thread Jeremy Harris
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.
-- 
Cheers,
 Jeremy




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-29 Thread Simon Riggs
On 29 September 2015 at 13:20, Jeremy Harris  wrote:

> On 27/09/15 21:58, Gavin Wahl wrote:
> > Somewhat harder but still possible would be using BRIN indexes to
> > accelerate ORDER BY. This would require a sorting algorithm that can take
> > advantage of mostly-sorted inputs. You would sort the page ranges by
> their
> > minimum or maximum value, then feed the sorting algorithm in that order.
>
> An internal merge sort does well with partially-sorted input.
>

Yes, the $SUBJECT would be a valid use for BRIN.

And also in general for sorts, leading to an optimization of merge joins
using BRIN indexes.

All this was foreseen in the original design; if it really was trivial it
would be in 9.5 already...

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-28 Thread Marti Raudsepp
Hi Gavin

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.

> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

Unfortunatley this work has stalled.

Regards,
Marti

On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl  wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one. Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-28 Thread Heikki Linnakangas

On 09/28/2015 05:28 PM, Marti Raudsepp wrote:

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.


*partial-knn-1.patch*

KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.


Unfortunatley this work has stalled.


No, that was actually committed for 9.5. It's this item in the release 
notes:



Allow queries to perform accurate distance filtering of
bounding-box-indexed objects (polygons, circles) using GiST indexes
(Alexander Korotkov, Heikki Linnakangas)

Previously, a common table expression was required to return a large
number of rows ordered by bounding-box distance, and then filtered
further with a more accurate non-bounding-box distance calculation.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Tom Lane
Alvaro Herrera  writes:
> Gavin Wahl wrote:
>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> just find the page range with the largest/smallest value, and then only
>> scan that one. Would that be hard to implement? I'm interested in working
>> on it if someone can give me some pointers.

> I think this means you need to represent this operation as a specific
> Path in some way.  See build_minmax_path() and its callers in planagg;
> you probably need to tweak preprocess_minmax_aggregates() to consider
> this.

> This doesn't look like a simple project to me, mind.

>> Somewhat harder but still possible would be using BRIN indexes to
>> accelerate ORDER BY. This would require a sorting algorithm that can take
>> advantage of mostly-sorted inputs. You would sort the page ranges by their
>> minimum or maximum value, then feed the sorting algorithm in that order.

> I wouldn't know where to start for this.  Maybe once Tom is done with
> planner rejiggering it would make sense to consider looking at how to do
> it.

Yeah.  I would urgently recommend that people *not* try to build new
things like planagg.c right now.  A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.

(And yes, I am working on that.  Honest.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Alvaro Herrera
Gavin Wahl wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only
> scan that one. Would that be hard to implement? I'm interested in working
> on it if someone can give me some pointers.

I think this means you need to represent this operation as a specific
Path in some way.  See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.

This doesn't look like a simple project to me, mind.

> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

I wouldn't know where to start for this.  Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Gavin Wahl
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Thomas Munro
On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl  wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one.

You might need to scan more than that if you don't find any rows that
are visible.

> Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.

Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a
Sort node (quicksort or external sort).  So the sort is already
receiving data sorted in BRIN-block order, isn't it?  Are you saying
that the sort node should switch to something like insertion sort in
this case?

http://www.sorting-algorithms.com/nearly-sorted-initial-order

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Gavin Wahl
> Yeah.  I would urgently recommend that people *not* try to build new
> things like planagg.c right now.  A large part of the point of upper
> planner path-ification is to have a less grotty way of dealing with
> things like specialized aggregate implementations.

Ok. I will wait and ask again later.