On Fri, 14 Jul 2023 at 16:21, Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > > > > On 7/11/23 13:20, Matthias van de Meent wrote: >> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra >> <tomas.von...@enterprisedb.com> wrote: >>> Approximation in what sense? My understanding was we'd get a range of >>> distances that we know covers all rows in that range. So the results >>> should be accurate, no? >> >> The distance is going to be accurate only to the degree that the >> summary can produce accurate distances for the datapoints it >> represents. That can be quite imprecise due to the nature of the >> contained datapoints: a summary of the points (-1, -1) and (1, 1) will >> have a minimum distance of 0 to the origin, where the summary (-1, 0) >> and (-1, 0.5) would have a much more accurate distance of 1. > > Ummm, I'm probably missing something, or maybe my mental model of this > is just wrong, but why would the distance for the second summary be more > accurate? Or what does "more accurate" mean? > > Is that about the range of distances for the summary? For the first > range the summary is a bounding box [(-1,1), (1,1)] so all we know the > points may have distance in range [0, sqrt(2)]. While for the second > summary it's [1, sqrt(1.25)].
Yes; I was trying to refer to the difference between what results you get from the summary vs what results you get from the actual datapoints: In this case, for finding points which are closest to the origin, the first bounding box has a less accurate estimate than the second. > > The point I was making is that the summary can only approximate the > > distance, and that approximation is fine w.r.t. the BRIN sort > > algoritm. > > > > I think as long as the approximation (whatever it means) does not cause > differences in results (compared to not using an index), it's OK. Agreed. Kind regards, Matthias van de Meent Neon (https://neon.tech)