On Jan 4, 2010, at 5:30 PM, Yonik Seeley wrote:

> On Mon, Jan 4, 2010 at 5:07 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> 
>> On Jan 4, 2010, at 4:19 PM, Yonik Seeley wrote:
>> 
>>> On Mon, Jan 4, 2010 at 2:29 PM,  <gsing...@apache.org> wrote:
>>>> +  public static final double KM_TO_MILES = 0.621371192;
>>>> +  public static final double MILES_TO_KM = 1.609344;
>>> 
>>> I don't care if these exist, but what are your plans for actually using 
>>> them?
>> 
>> Probably premature to commit on my part; I was working on SOLR-1568 and was 
>> allowing the user to pass in the units for the distance value.
> 
> I still think it's no simpler for a client, and more complex overall.
> You either must require units to be passed in (yuck) or decide on
> default units.  Once you have decided on default units, extra
> parameters for different units just add complexity for something that
> is just as trivial for the client to do.  They either have to know
> the code for the units they are using or they have to know how to
> convert to the standard units - about the same amount of complexity.
> 
>>> For spatial search, it seems like we should simply standardize on
>>> something, probably either meters or kilometers and be done with it.
>>> It's trivial for clients to convert (and clients aren't end-users),
>>> and will reduce confusion about how to specify units, etc.
>>> 
>>> Likewise for points/locations - they should simply be lat,lon in
>>> degrees.  No need to specify if it's in radians or degrees when
>>> degrees is more of an external standard and it's as simple for a
>>> client to convert as it is to specify.
>> 
>> Possibly, except you can save a few operations per document if you just 
>> store radians when using haversine.
> 
> A single multiply (~3 cycles?).  If that's worth saving, we should just
> index it that way for the user...

Sure, the point type could have an init parameter, I suppose, that specifies 
whether to convert.  Or the user can just send it in radians to begin with.  
What I want as a designer is to specify it up front based on the accuracy I 
want out of my distances.  To me, that's what it all comes back to.  
The app designer doing spatial asks: how accurate do my calculations need to 
be?  Then they make decisions about data structures based on that.
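To make the "store radians" point concrete, here is a minimal Java sketch, assuming the point is converted to radians once (at index time, or by the client) so the per-document haversine does no degree conversion. The class and method names and the 6371 km mean Earth radius are illustrative assumptions, not Solr code; KM_TO_MILES is the constant from the commit being discussed.

```java
// Hypothetical sketch, not Solr's implementation: convert degrees to radians
// once, then run haversine on values that are already in radians.
public final class HaversineSketch {
    // Assumed mean Earth radius in km; other radii are also in common use.
    static final double EARTH_RADIUS_KM = 6371.0;
    static final double KM_TO_MILES = 0.621371192;

    // Done once per document (index time) or once per query, not per comparison.
    static double[] toRadians(double latDeg, double lonDeg) {
        return new double[] { Math.toRadians(latDeg), Math.toRadians(lonDeg) };
    }

    // Haversine great-circle distance in km; all arguments are in radians.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = lat2 - lat1;
        double dLon = lon2 - lon1;
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double[] query = toRadians(0.0, 0.0); // converted once per query
        double[] doc = toRadians(0.0, 1.0);   // assumed stored in radians at index time
        double km = haversineKm(query[0], query[1], doc[0], doc[1]);
        // Any unit conversion happens at the edge, e.g. on the client.
        System.out.printf("%.3f km (%.3f miles)%n", km, km * KM_TO_MILES);
    }
}
```

With these assumptions, main prints about 111.2 km (69.1 miles), one degree of longitude at the equator.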

> but given the computational cost of
> haversine, it's really in the noise... we should figure out other ways
> to speed things up.
> 

Times 20-100 million records to score/filter?  Not a huge amount of savings, 
but it could still be worthwhile for some applications under high load and with 
lots of docs, without costing anyone else anything.
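A back-of-the-envelope version of that trade-off, assuming two degree-to-radian multiplies saved per document (latitude and longitude), Yonik's rough ~3 cycles per multiply, N = 10^8 documents (the top of the 20-100 million range), and a hypothetical single 3 GHz core:

```latex
\text{cycles saved per query} \approx N \cdot 2 \cdot 3
  = 10^{8} \cdot 6 = 6 \times 10^{8},
\qquad
t \approx \frac{6 \times 10^{8}\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} = 0.2\ \text{s}
```

This ignores pipelining (which makes multiplies cheaper in practice) and the much larger cost of the trig calls in haversine itself, so it likely overstates the real saving.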


> A location in the XML, when using our built-in field types, should be
> unambiguously degrees in lat,lon format.  How it's indexed to increase
> speed, save space, etc. is up to the field type and its
> configuration.

Actually, it is unambiguous as x,y(,z...).  As of now we have points in an 
n-dimensional space, but we can add lat/lon specifically if that helps.
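As a rough illustration of the x,y(,z...) format, a minimal Java sketch that parses such a value into an n-dimensional point. PointParser and parsePoint are hypothetical names, not Solr classes; whether the first value is read as lat or lon is left to the field type's convention.

```java
import java.util.Arrays;

// Hypothetical sketch: parse "x,y(,z...)" into a double[] of any dimension >= 2.
public final class PointParser {
    static double[] parsePoint(String value) {
        // limit -1 keeps trailing empty fields, so "1,2," is rejected below
        String[] parts = value.split(",", -1);
        if (parts.length < 2) {
            throw new IllegalArgumentException("need at least 2 dimensions: " + value);
        }
        double[] coords = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // NumberFormatException (an IllegalArgumentException) on bad input
            coords[i] = Double.parseDouble(parts[i].trim());
        }
        return coords;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePoint("37.77,-122.42")));
        System.out.println(Arrays.toString(parsePoint("1.0, 2.0, 3.0")));
    }
}
```

The parser itself does not care how many dimensions there are; a lat/lon-specific type would add range checks on top.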

> 
>> I'm just not sure I see this as a big deal.  Technically, we could hide all 
>> the complexity of numerics from the user too, yet we offer ints, floats 
>> and doubles (we could parse them on our side and figure out which is which).
> 
> But we do hide the complexity of numerics from the user (clients) as
> much as we can.  popularity:10 and popularity:[5 TO 10] both work without
> the client knowing what kind of numeric field is being used (with the
> exception of plain numerics, which are offered only for compatibility
> with existing Lucene indexes).

I just don't get why normal spatial calculations are any different from other 
function queries, with the exception, right now, of tiles.  Perhaps if in the 
future we have other complex types that require one-offs, then we can unify on 
hiding all of this, but for now the _only_ thing that doesn't work out of the 
box is tiles.  All the rest can be handled through function queries and the 
FunctionRangeQParser.  I don't see much benefit in writing/maintaining code 
that is only marginally more readable than function queries, especially when 
the query will be templated in an application anyway and then left to do its job.
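The function-range idea can be sketched in plain Java: evaluate a distance function per document and keep documents whose value falls in [lower, upper]. This is a hypothetical illustration of the concept only, not Solr's FunctionRangeQParser API or its query syntax; it uses Euclidean distance over n-dimensional points to match the x,y(,z...) point type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a function range filter; not Solr code.
public final class FunctionRangeSketch {
    // Euclidean distance between two points of the same dimension.
    static double dist(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns indices of docs whose function value lies in [lower, upper], inclusive.
    static List<Integer> rangeFilter(List<double[]> docs, ToDoubleFunction<double[]> fn,
                                     double lower, double upper) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            double v = fn.applyAsDouble(docs.get(i));
            if (v >= lower && v <= upper) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        List<double[]> docs = List.of(new double[]{3, 4}, new double[]{1, 1}, new double[]{6, 8});
        // Keep docs within distance 5 of the center.
        System.out.println(rangeFilter(docs, p -> dist(p, center), 0.0, 5.0));
    }
}
```

Swapping `dist` for a haversine function gives a geographic radius filter without any spatial-specific query code.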

-Grant
