On Thu, Apr 12, 2018 at 11:20 PM, Darafei "Komяpa" Praliaskouski <
>> Another thing that could be done for PostGIS geometries is just another
>> opclass which stores geometries "as is" in leaves. As far as I know,
>> geometries contain their MBRs themselves, so there is no need to store an
>> extra MBR. I think the reason why PostGIS doesn't have such an opclass yet
>> is that geometries are frequently large and can exceed the maximum size
>> of an index tuple.
> The geometry datatype layout was designed with TOASTing in mind: most of the
> data is stored in the header, including type, SRID, box and some other flags, so
> reading just the first several bytes tells you a lot.
> PostGIS datasets are often of mixed binary length: in buildings, for
> example, it is quite common to have a lot of four-corner houses, and just
> one mapped as a circle, which the digitizing software decided to render as a
> 720-point polygon. Since reading it from TOAST for an index would require a
> seek of some kind, it may be just as efficient to look it up in the table.
> This way a lossy decompress function can help with index-only scans: up to
> some binary length, try to store the original geometry in the index. Beyond
> that, store a shape that has fewer points in it but covers a slightly larger
> area, plus a flag that it's not precise. There are a lot of ways to
> generate a covering shape with fewer points: the obvious one is a box; less
> obvious are a non-axis-aligned box, a collection of boxes for a multipart
> shape, an outer ring for an area with lots of holes, a convex hull or a
> smallest enclosing k-gon.
> In GIS there is the problem of the border of Russia: the country crosses the
> 180th meridian and has a complex border shape. If you take a box of it, it
> spans from -180 to 180. If you query any spot in the US or in Europe, you'll
> have it intersecting with your area, requiring a full recheck, complete
> detoast and decompression, and then "no, it's not a thing we need, skip".
> Allowing anything better than a box would help. If we're allowing a complex
> shape, we've got the container for it: geometry.
> If we're storing geometry in the index and the original is small, why not
> allow a complete Index Only Scan on it, and let it skip the recheck? :)
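
To make the bounding-box problem quoted above concrete, here is a minimal Python sketch. The coordinates are hypothetical and heavily simplified (two rectangles standing in for a country that crosses the 180th meridian), and the helper names are made up for illustration; this is not PostGIS code. It only shows why a single box produces a false positive that a collection of boxes avoids.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (longitude, latitude) in degrees
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(points: List[Point]) -> Box:
    """Axis-aligned bounding box of a list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical two-part shape split at the 180th meridian.
west_part = [(20.0, 42.0), (180.0, 42.0), (180.0, 78.0), (20.0, 78.0)]
east_part = [(-180.0, 62.0), (-169.0, 62.0), (-169.0, 72.0), (-180.0, 72.0)]

# One box over everything spans the whole longitude range.
single_box = bounding_box(west_part + east_part)

# One box per part stays tight around each part.
per_part_boxes = [bounding_box(west_part), bounding_box(east_part)]

# Hypothetical query window somewhere in North America.
query_box: Box = (-100.0, 45.0, -90.0, 50.0)

single_hit = boxes_intersect(single_box, query_box)
multi_hit = any(boxes_intersect(b, query_box) for b in per_part_boxes)

print(single_box)   # the single box covers longitudes -180..180
print(single_hit)   # the single box matches, so a recheck would be needed
print(multi_hit)    # the per-part boxes do not match, so no recheck
```

With the single box, the index reports a candidate and the full geometry has to be detoasted just to reject it; with per-part boxes the candidate is never produced.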
So, as I understand it, the idea is that geometries have very different
sizes, from very small to very large, and it would be nice to store small
geometries in the index "as is". Then for small geometries we can do
an index-only scan and the recheck from the index alone, while for large
geometries we have to visit the heap to fetch the geometry.
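
A rough Python sketch of that idea, under stated assumptions: the threshold MAX_EXACT_BYTES, the IndexKey type and the compress/fetch functions are hypothetical names, a geometry is modelled as a plain list of points, its binary length is approximated as 16 bytes per point, and the covering shape is simply a box. It is not the GiST opclass API, just an illustration of the per-key decision.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]

MAX_EXACT_BYTES = 128  # hypothetical size limit for storing a geometry "as is"
BYTES_PER_POINT = 16   # two 8-byte doubles; headers are ignored in this sketch


@dataclass(frozen=True)
class IndexKey:
    points: Tuple[Point, ...]  # original ring, or the corners of a covering box
    exact: bool                # False means "not precise, must visit the heap"


def compress(polygon: List[Point]) -> IndexKey:
    """Store small geometries as is; replace large ones with a covering box."""
    if len(polygon) * BYTES_PER_POINT <= MAX_EXACT_BYTES:
        return IndexKey(tuple(polygon), exact=True)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    box = ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))
    return IndexKey(box, exact=False)


def fetch(key: IndexKey, heap_lookup: Callable[[], List[Point]]) -> List[Point]:
    """Return the original geometry, touching the heap only for lossy keys."""
    if key.exact:
        return list(key.points)
    return heap_lookup()


# A four-corner house and a circle digitized as a 720-point polygon.
house = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
circle = [(math.cos(2 * math.pi * k / 720), math.sin(2 * math.pi * k / 720))
          for k in range(720)]

house_key = compress(house)
circle_key = compress(circle)

heap_visits = []  # records which geometries needed a heap lookup


def lookup_circle() -> List[Point]:
    heap_visits.append("circle")
    return circle


def lookup_house() -> List[Point]:
    heap_visits.append("house")
    return house


fetched_house = fetch(house_key, lookup_house)     # served from the index key
fetched_circle = fetch(circle_key, lookup_circle)  # falls back to the heap
```

The point of the exact flag is that the decision is made per key, which is exactly what the current optimizer cannot express, as discussed next.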
That can be done in a custom opclass without work in PostgreSQL
core, except for the index-only scan part. Right now the optimizer expects
an index to be either always capable of returning the original datum for
some column, or never capable of doing so. It doesn't allow this decision to
be made at runtime. Allowing this would require a patch to PostgreSQL
core. This patch shouldn't be hard on the executor side. But the
optimizer part of this patch seems hard, because I don't know
how to estimate the fraction of index keys that can be used to
reconstruct the original datums. Probably that would require the
GiST compress method to return some statistics...
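
As a very rough illustration of what such statistics could look like, here is a Python sketch. Everything in it is an assumption: CompressStats and expected_heap_fetches are hypothetical names, nothing like this exists in PostgreSQL today, and the estimate ignores everything else the planner considers (visibility checks, selectivity, page costs). It only shows the fraction mentioned above being collected and used.

```python
from dataclasses import dataclass


@dataclass
class CompressStats:
    """Hypothetical counters a compress method could maintain while keys are built."""
    total_keys: int = 0
    exact_keys: int = 0

    def record(self, exact: bool) -> None:
        """Count one compressed key; exact means the datum can be reconstructed."""
        self.total_keys += 1
        if exact:
            self.exact_keys += 1

    def exact_fraction(self) -> float:
        """Fraction of keys from which the original datum can be returned."""
        if self.total_keys == 0:
            return 0.0
        return self.exact_keys / self.total_keys


def expected_heap_fetches(matching_rows: float, stats: CompressStats) -> float:
    """Rows that would still need a heap visit because their key is lossy."""
    return matching_rows * (1.0 - stats.exact_fraction())


stats = CompressStats()
for exact in (True, True, True, False):  # three small geometries, one large
    stats.record(exact)

fraction = stats.exact_fraction()
fetches = expected_heap_fetches(100, stats)
print(fraction, fetches)
```

A single fraction is the simplest possible statistic; whether it is good enough depends on how sizes correlate with the queried region, which this sketch does not model.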
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company