On 06/17/2014 09:16 PM, Andres Freund wrote:
On 2014-06-17 12:14:00 -0400, Robert Haas wrote:
On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund <and...@2ndquadrant.com> wrote:
Well, I'm not the guy who does things with geometric data, but I don't
want to ignore the significant percentage of our users who are.  As
you must surely know, the GIST implementations for geometric data
types store bounding boxes on internal pages, and that seems to be
useful to people.  What is your reason for thinking that it would be
any less useful in this context?

For me minmax indexes are helpful because they allow to generate *small*
'coarse' indexes over large volumes of data. From my pov that's possible
possible because they don't contain item pointers for every contained
row.
That'ill imo work well if there are consecutive rows in the table that
can be summarized into one min/max range. That's quite likely to happen
for common applications of number of scalar datatypes. But the
likelihood of placing sufficiently many rows with very similar bounding
boxes close together seems much less relevant in practice. And I think
that's generally likely for operations which can't be well represented
as btree opclasses - the substructure that implies inside a Datum will
make correlation between consecutive rows less likely.

Well, I don't know: suppose you're loading geospatial data showing the
location of every building in some country.  It might easily be the
case that the data is or can be loaded in an order that provides
pretty good spatial locality, leading to tight bounding boxes over
physically consecutive data ranges.

Well, it might be doable to correlate them along one axis, but along
both?  That's more complicated... And even alongside one axis you
already get into problems if your geometries are irregularly sized.

Sure, there are cases where it would be useless. But it's easy to imagine scenarios where it would work well, where points are loaded in clusters and points that are close to each other also end up physically close to each other.

Asingle large polygon will completely destroy indexability for anything
stored physically close by because suddently the minmax range will be
huge... So you'll need to cleverly sort for that as well.

That's an inherent risk with minmax indexes: insert a few rows to the "wrong" locations in the heap, and the selectivity of the index degrades rapidly.

The main problem with using it for geometric types is that you can't easily CLUSTER the table to make the minmax index effective again. But there are ways around that.

But I'm not trying to say that we absolutely have to support that kind
of thing; what I am trying to say is that there should be a README or
a mailing list post or some such that says: "We thought about how
generic to make this.  We considered A, B, and C.  We rejected C as
too narrow, and A because if we made it that general it would have
greatly enlarged the disk footprint for the following reasons.
Therefore we selected B."

Isn't 'simpler implementation' a valid reason that's already been
discussed onlist? Obviously simpler implementation doesn't trump
everything, but it's one valid reason...
Note that I have zap to do with the design of this feature. I work for
the same company as Alvaro, but that's pretty much it...

Without some analysis (e.g implementing it and comparing), I don't buy that it makes the implementation simpler to restrict it in this way. Maybe it does, but often it's actually simpler to solve the general case.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to