Just to put a nail in this one: the last time our selectivity estimates for Contains/Within were good was version 2.1, which used the && operator instead of hte ~/@ operators in the SQL wrapper functions.
> On Dec 13, 2022, at 9:04 AM, Paul Ramsey <[email protected]> wrote: > > On Mon, Dec 12, 2022 at 7:17 PM Regina Obe <[email protected]> wrote: >> >> Paul, >> >> Is it the same story with ST_CoveredBy and ST_Covers. I care more about >> those than ST_Contains and ST_Within. > > Yes, it's the same story. The only place we use a real selectivity > calculation is in the && operator. > >> Maybe part of the reason it hasn't been brought up if it is a really old >> issue, is most people just use ST_Intersects or ST_DWithin. > > I assume that's a lot of it. Though I know that there's a certain > gravitation towards contains/within for point-in-polygon work (people > "feel" the containment more for points in polygons... ironically it's > a case with no performance advantage to using containment, while > poly/poly tests actually get a small boost on walking the index for > pure containment. > >> >> This ticket looks like it might be a possible complaint of the issue. >> https://trac.osgeo.org/postgis/ticket/4754 >> That suggests it's an issue of PostGIS 3+ > > I don't think so..., just eyeballing 2.5 we use constant selectivity > on all those operators too, which would seem to indicate that this has > been the case pretty much forever. And the wrapper for 2.5 ST_Contains > does use the containment (~) operator, not the overlaps (&&) operator. > > CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry) > RETURNS boolean > AS 'SELECT $1 OPERATOR(@extschema@.~) $2 AND > @extschema@._ST_Contains($1,$2)' > LANGUAGE 'sql' IMMUTABLE _PARALLEL; > > It is possible that at some earlier stage we used && in all the > wrappers? Then we'd get correct selectivity on those plans, at the > expense of slightly less efficient index searching. > >> I thought we had others too about ST_covers but can't find any open ones. > > Hard to believe, but this might just be a very very very old bug. > > P. > > >> >> >>> -----Original Message----- >>> From: postgis-users [mailto:[email protected]] On >> Behalf >>> Of Paul Ramsey >>> Sent: Monday, December 12, 2022 5:57 PM >>> To: PostGIS Users Discussion <[email protected]> >>> Subject: Re: [postgis-users] Row estimations >>> >>> From what I've been able to find out today, this bug may have existed for >> a >>> very long time. Firstly, the selectivity calculation is correct (using >> your test >>> setup). >>> >>> =# select _postgis_selectivity('test'::regclass, 'p', 'POLYGON((0 >>> 0,0.8 0,0.8 0.8,0 0.8,0 0))'); >>> _postgis_selectivity >>> ---------------------- >>> 0.7886187126157185 >>> >>> That seems to be the selectivity applied in the ST_Intersects plan, but >> not in >>> the ST_Contains plan. >>> >>> Now, both ST_Intersects and ST_Contains are "index assisted" >>> functions. Under the covers, we add index operators to them before they >> are >>> planned. Before PgSQL 12, this was done with a SQL wrapper, and since then >>> with a "support function". In both cases though, the effect is to add an >>> operator to the query. >>> >>> For ST_Intersects, && is added. For ST_Contains, @ is added. And here we >> get >>> into the problem... if you look at the definitions of && and @, you'll see >> this. >>> >>> CREATE OPERATOR && ( >>> LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = >>> geometry_overlaps, >>> COMMUTATOR = '&&', >>> RESTRICT = gserialized_gist_sel_2d, >>> JOIN = gserialized_gist_joinsel_2d >>> ); >>> >>> CREATE OPERATOR @ ( >>> LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = >>> geometry_within, >>> COMMUTATOR = '~', >>> RESTRICT = contsel, JOIN = contjoinsel ); >>> >>> The && operator is using the PostGIS selectivity calculator to estimate >> rows >>> returned. The @ operator is using an in-built function that returns a >> constant >>> (small) selectivity. Effectively pushing the planner to use the index >> almost all >>> the time. >>> >>> For many (most) spatial queries, a selectivity that pushes the plan into >> using >>> the index is a Good and Correct Thing (tm). But as queries get more >> complex >>> and involve more tables, the odds that this is the Wrong Thing, go up. >>> >>> I'm not sure how long this situation has existed, but I'm inclined to >> thing a Very >>> Long Time. So I'm surprised we haven't had reports about it before. I'm >> also a >>> little surprised we didn't apply selectivity calculations to >> contains/within a long >>> time ago. A simple and effective fix is to replace contsel with >>> gserialized_gist_sel_2d for the ~ and @ operators. Unfortunately it's a >> hard fix >>> to apply in in-place upgrades and in batch patches. >>> >>> Anyone who feels like researching the history of selectivity in @/~ by >> looking >>> at old releases, I'd be interested in knowing how long this situation has >> existed. >>> (Forever?) >>> >>> P. >>> >>> >>> On Fri, Dec 9, 2022 at 10:14 AM Paul Ramsey <[email protected]> >>> wrote: >>>> >>>> >>>> >>>>> On Dec 9, 2022, at 10:11 AM, Igor ALBUQUERQUE SILVA <i.albuquerque- >>> [email protected]> wrote: >>>>> >>>>> In this example ST_Contains estimates 785 rows, while ST_Intersect >>> estimates 615469 which is closer to the real 621571 rows. Using the >> bounding >>> box operation I also obtain good estimations, and I also obtain good >>> estimations if I don't use the gist index. >>>>> >>>> >>>> This still seems Really Bad (tm) in that the contains estimate is still >> tragically >>> wrong. However, we have at least some clue as to where to hunt for this >> bug. >>> Thanks for continuing to push the testing forward. >>>> >>>> P. >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> https://lists.osgeo.org/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
