On Mon, Nov 16, 2009 at 9:20 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Mon, Nov 16, 2009 at 8:23 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>> One of the other things I think we are going to need is a cache for 
>> functions that are used this way.  For instance, in the geo case, it is 
>> likely that we would both filter and score by distance,
>
> Filtering (bounding box) should be a separate, more efficient
> operation than calculating distance, so I don't think any sort of
> generic cache is needed for geo.

Actually, you're right.
I was thinking of filtering by a bounding box, but people will also
want to filter by a radius (which should presumably use bounding boxes
first to limit the number of points that we calculate the distance
for).

If someone then also sorts, the distance calculation won't be reused.
I don't know a good way around that currently... a full cache would be
pretty expensive memory-wise.

Actually, perhaps there wouldn't be too much wasted calculation after all.
It seems like additional optimizations could limit how many points need
a distance calculated for filtering.

Consider a bounding box for a particular radius... one could also find
a box that lies completely within that radius.  Points inside the
smaller box are known to match and points outside the bigger box are
known not to, so only points inside the bigger box but outside the
smaller box need to have a distance calculated.
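
A rough sketch of the two-box idea, in Python for brevity rather than
anything taken from Solr. It assumes flat (planar) coordinates and
Euclidean distance; real geo code would need great-circle distance and
latitude-aware boxes. The name radius_filter is made up for
illustration.

```python
import math

def radius_filter(points, cx, cy, r):
    """Return (matches, distance_calcs) for points within r of (cx, cy).

    Assumption: planar coordinates, Euclidean distance.
    """
    # Half-side of the largest axis-aligned square that fits inside the circle.
    inner = r / math.sqrt(2)
    matches = []
    distance_calcs = 0
    for (x, y) in points:
        dx, dy = abs(x - cx), abs(y - cy)
        if dx > r or dy > r:
            continue  # outside the outer bounding box: cannot match
        if dx <= inner and dy <= inner:
            matches.append((x, y))  # inside the inner box: must match
            continue
        # Only the band between the two boxes needs a real distance.
        distance_calcs += 1
        if math.hypot(dx, dy) <= r:
            matches.append((x, y))
    return matches, distance_calcs
```

The second return value is only there to show how many points actually
reached the distance calculation.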

Also, if one is sorting by distance anyway, a straight bounding box
filter may be sufficient (i.e. users should have the option of the
cheaper or more expensive filter).
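
For the cheaper option, a minimal sketch under the same assumptions as
before (planar coordinates, Euclidean distance, hypothetical function
name): filter with the plain bounding box, then let the sort be the
only place a distance is calculated.

```python
import math

def box_filter_then_sort(points, cx, cy, r):
    """Keep points inside the bounding box of radius r, nearest first.

    Assumption: planar coordinates, Euclidean distance.
    """
    # Cheap filter: two comparisons per point, no distance calculation.
    in_box = [(x, y) for (x, y) in points
              if abs(x - cx) <= r and abs(y - cy) <= r]
    # sorted() calls the key function once per element, so each
    # surviving point has its distance calculated exactly once.
    return sorted(in_box, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
```

Points in the corners of the box can be farther away than r, but they
sort to the end of the results.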

-Yonik
http://www.lucidimagination.com
