On Mon, Nov 16, 2009 at 9:20 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Mon, Nov 16, 2009 at 8:23 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>> One of the other things I think we are going to need is a cache for
>> functions that are used this way. For instance, in the geo case, it is
>> likely that we would both filter and score by distance,
>
> Filtering (bounding box) should be a separate, more efficient
> operation than calculating distance, so I don't think any sort of
> generic cache is needed for geo.
Actually, you're right. I was thinking of filtering by a bounding box, but people will also want to filter by a radius (which should presumably use bounding boxes first to limit the number of points we calculate the distance for). If someone then also sorts, the distance calculation won't be reused. I don't know a good way around that currently... a full cache would be pretty expensive memory-wise.

Actually, perhaps there wouldn't be too much wasted calculation after all? It seems like additional optimizations could limit how many points need a distance calculated for filtering. Consider a bounding box for a particular radius: one could also find a box that lies completely within that radius. Only points inside the bigger box but outside the smaller box need to have a distance calculated.

Also, if one is sorting by distance anyway, a straight bounding-box filter may be sufficient (i.e., users should have the option of the cheaper or the more expensive filter).

-Yonik
http://www.lucidimagination.com
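The outer/inner box idea in the message above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: points are planar (Euclidean) x/y coordinates rather than lat/lon with great-circle distance, and `radius_filter` is a hypothetical helper, not an existing Lucene or Solr API. The outer box circumscribes the circle (half-side `r`), the inner box is the square inscribed in it (half-side `r / sqrt(2)`), and only points in the band between the two get an exact distance check.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def radius_filter(points: Iterable[Point], center: Point,
                  radius: float) -> Tuple[List[Point], int]:
    """Return points within `radius` of `center`, plus the number of
    exact distance computations performed.

    Hypothetical helper; assumes planar x/y coordinates, not lat/lon.
    """
    cx, cy = center
    outer = radius                   # half-side of the box circumscribing the circle
    inner = radius / math.sqrt(2)    # half-side of the square inscribed in the circle
    r2 = radius * radius
    hits: List[Point] = []
    distance_calcs = 0
    for x, y in points:
        dx, dy = abs(x - cx), abs(y - cy)
        if dx > outer or dy > outer:
            continue                 # outside the bounding box: rejected without a distance calc
        if dx <= inner and dy <= inner:
            hits.append((x, y))      # inside the inner box: always within the radius
            continue
        distance_calcs += 1          # only the band between the two boxes needs this
        if dx * dx + dy * dy <= r2:
            hits.append((x, y))
    return hits, distance_calcs


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.5), (0.9, 0.0), (0.8, 0.8), (2.0, 0.0)]
    print(radius_filter(pts, (0.0, 0.0), 1.0))
```

With the sample points, only `(0.9, 0.0)` and `(0.8, 0.8)` fall between the boxes, so two distance calculations are done instead of four (one per point in the bounding box). For real geo data the boxes would have to be computed in lat/lon, where the longitude width of a box depends on latitude.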