I compared locallucene to spatial search and saw a performance
degradation, even using geohash queries, though perhaps I indexed things
wrong? Locallucene across 6 machines handles 150 queries per second fine,
but using geofilt and geohash I got lots of timeouts even when I was doing
only 50 queries per second. Has anybody done a formal comparison of
locallucene with spatial search and latlontype, pointtype and geohash?
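For concreteness, the two kinds of spatial filters under comparison can be sketched as query parameters like this. This is a minimal sketch of Solr's geofilt/bbox parameters; the field name "store" and the coordinates are assumptions, not taken from the actual setup:

```python
# Sketch of a {!geofilt} radius filter versus a {!bbox} filter in Solr.
# {!geofilt} filters by exact distance; {!bbox} uses the cheaper
# bounding box around the same circle. Field name "store" is assumed.
from urllib.parse import urlencode

def spatial_params(parser, lat, lon, radius_km, field="store"):
    """Build Solr spatial filter params; parser is 'geofilt' or 'bbox'."""
    return {
        "q": "*:*",
        "fq": "{!%s}" % parser,
        "sfield": field,              # the spatial field to filter on
        "pt": "%s,%s" % (lat, lon),   # center point as lat,lon
        "d": radius_km,               # radius in kilometers
    }

print(urlencode(spatial_params("geofilt", 52.52, 13.40, 5)))
```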
On 2/8/12 2:20 PM, "Ryan McKinley" wrote:
>Hi Matthias-
>
>I'm trying to understand how you have your data indexed so we can give
>reasonable direction.
>
>What field type are you using for your locations? Is it using the
>solr spatial field types? What do you see when you look at the debug
>information from &debugQuery=true?
>
>From my experience, there is no single best practice for spatial
>queries -- it will depend on your data density and distribution.
>
>You may also want to look at:
>http://code.google.com/p/lucene-spatial-playground/
>but note this is off lucene trunk -- the geohash queries are super fast
>though
>
>ryan
>
>
>
>
>2012/2/8 Matthias Käppler :
>> Hi Erick,
>>
>> if we're not doing geo searches, we filter by "location tags" that we
>> attach to places. This is simply a hierarchical regional id, which is
>> simple to filter for, but much less flexible. We use that on the Web a
>> lot, but not on mobile, where we want to perform searches in
>> arbitrary radii around arbitrary positions. For those location tag
>> kind of queries, the average time spent in SOLR is 43msec (I'm looking
>> at the New Relic snapshot of the last 12 hours). I have disabled our
>> "optimization" again just yesterday, so for the bbox queries we're now
>> at an avg of 220ms (same time window). That's a five-fold increase in
>> response time, and in peak hours it's worse than that.
>>
>> I've also found a blog post from 3 years ago which outlines the inner
>> workings of the SOLR spatial indexing and searching:
>> http://www.searchworkings.org/blog/-/blogs/23842
>> From that it seems as if SOLR already performs an optimization similar
>> to the one we had in mind during the index step, so if I understand
>> correctly, it
>> doesn't even search over all records, only those that were mapped to
>> the grid box identified during indexing.
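The grid-box idea can be illustrated with a minimal geohash encoder: nearby points share a common string prefix, so a prefix match restricts the search to a single grid cell. This is a sketch of the general technique only, not SOLR's actual index code:

```python
# Minimal geohash encoder. Bits alternately halve the longitude and
# latitude ranges; every 5 bits become one base-32 character. Points
# in the same grid cell produce the same prefix.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    even = True          # even-numbered bits encode longitude
    ch, bit, result = 0, 0, []
    while len(result) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid  # keep the upper half of the range
        else:
            rng[1] = mid  # keep the lower half
        even = not even
        bit += 1
        if bit == 5:      # 5 bits collected -> emit one character
            result.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(result)

print(geohash(57.64911, 10.40744))
```

A prefix query against such encoded values only has to visit documents whose hash starts with the cell's prefix, which is the "doesn't search over all records" behavior described above.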
>>
>> What I would love to see is what the suggested way is to perform a geo
>> query on SOLR, considering that they're so difficult to cache and
>> expensive to run. Is the best approach to restrict the candidate set
>> as much as possible using cheap filter queries, so that SOLR merely
>> has to do the geo search against these subsets? How does the query
>> planner work here? I see there's a cost attached to a filter query,
>> but one can only set it when cache is set to false? Are cached geo
>> queries executed last when there are cheaper filter queries to cut
>> down on documents? If you have a real world practical setup to share,
>> one that performs well in a production environment that serves
>> requests in the millions per day, that would be great.
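One common answer to the question above (in Solr versions that support post-filtering via the cost local param) is to let the cheap cached filters run first and mark the geo filter as a non-cached, high-cost filter so it only checks documents that survive the cheaper filters. A sketch, with hypothetical field names and values:

```python
# Sketch of the "cheap filters first" pattern: cached filter queries
# narrow the candidate set, and the geo filter is sent with
# cache=false and a high cost so (where post-filtering is supported)
# it runs last. Field names and values are hypothetical.
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("fq", "category:restaurant"),   # cheap, cached filter
    ("fq", "region_tag:de-berlin"),  # cheap, cached filter
    ("fq", "{!geofilt cache=false cost=200 sfield=location "
           "pt=52.52,13.40 d=5}"),   # expensive, evaluated last
]
print(urlencode(params))
```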
>>
>> I'd love to contribute documentation by the way, if you knew me you'd
>> know I'm an avid open source contributor and actually run several open
>> source projects myself. But tell me, how can I possibly contribute
>> answers to questions I don't have an answer to? That's why I'm here,
>> remember :) So please, these kinds of snippy replies are not helping
>> anyone.
>>
>> Thanks
>> -Matthias
>>
>> On Tue, Feb 7, 2012 at 3:06 PM, Erick Erickson
>> wrote:
>>> So the obvious question is "what is your
>>> performance like without the distance filters?"
>>>
>>> Without that knowledge, we have no clue whether
>>> the modifications you've made had any hope of
>>> speeding up your response times.
>>>
>>> As for the docs, any improvements you'd like to
>>> contribute would be happily received.
>>>
>>> Best
>>> Erick
>>>
>>> 2012/2/6 Matthias Käppler :
>>> Hi,
>>>
>>> we need to perform fast geo lookups on an index of ~13M places, and
>>> were running into performance problems here with SOLR. We haven't done
>>> a lot of query optimization / SOLR tuning up until now so there's
>>> probably a lot of things we're missing. I was wondering if you could
>>> give me some feedback on the way we do things, whether they make
>>> sense, and especially why a supposed optimization we implemented
>>> recently seems to have no effect, when we actually thought it would
>>> help a lot.
>>>
>>> What we do is this: our API is built on a Rails stack and talks to
>>> SOLR via a Ruby wrapper. We have a few filters that almost always
>>> apply, which we put in filter queries. Filter cache hit rate is
>>> excellent, about 97%, and cache size caps at 10k filters (max size is
>>> 32k, but it never seems to reach that many, probably because we
>>> replicate / delta update every few minutes). Still, geo queries are
>>> slow, about 250-500msec on average. We send them with cache=false, so
>>> as to not flood the fq cache and cause undesirab