solr warmup and reading the index into memory on startup?

2012-02-24 Thread Nicolas Flacco
I'm seeing some problems warming up Solr on startup. Currently warmup
consists of two parts: running queries programmatically on startup, and
then running a script to perform queries. The programmatic warmup seems to
warm up Solr fine for queries made via the Solr admin tool, but when I do
a query programmatically, the first query takes 2-3 minutes, during which
time I see tons of Lucene index loading activity. I'm assuming that the
Lucene index is not getting loaded into memory during warmup, so the
loading happens on the first query.

Is there a way to force Solr to load the index into memory on startup
apart from doing a query and waiting?

Programmatic warmup:

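// warmupRequests is a placeholder for "a bunch of queries", each wrapped
// in a SolrQueryRequest; how they are built is not shown in this post.
for (SolrQueryRequest req : warmupRequests) {
  SolrQueryResponse rsp = new SolrQueryResponse();
  core.execute(req, rsp);
  NamedList values = rsp.getValues();
  // and iterate through the docs in these values
}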

I also tried adding a *:* query to the warmup listener; this didn't help
either.

[Listener configuration; the XML markup was stripped by the archive. Surviving values:]
 warmup_queries.txt
*:*
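
One thing that can be tried besides running queries is to read the index
files straight from disk, so that the operating system caches them. Below
is a minimal sketch in Java. It assumes the slow first query comes from a
cold OS page cache (the post does not confirm this) and that the index
sits in a single flat directory. The class name IndexFileWarmer and the
directory argument are made up for illustration; none of this is part of
Solr.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not a Solr API: reads every regular file in a
// directory once so the OS has a chance to keep the pages cached.
final class IndexFileWarmer {

    /** Reads all regular files directly under indexDir; returns bytes read. */
    static long readAll(Path indexDir) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
        long total = 0;
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(p)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n; // the data itself is discarded
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: IndexFileWarmer <index directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.println("read " + readAll(dir) + " bytes from " + dir);
    }
}
```

This only touches the OS file cache, not the JVM heap or Solr's own
caches, and it can only help if the machine has enough free RAM to hold
the files that were read.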





Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Nicolas Flacco
I compared LocalLucene to Solr's spatial search and saw a performance
degradation, even using geohash queries, though perhaps I indexed things
wrong? LocalLucene across 6 machines handles 150 queries per second fine,
but using geofilt and geohash I got lots of timeouts even when I was doing
only 50 queries per second. Has anybody done a formal comparison of
LocalLucene with spatial search using LatLonType, PointType and geohash?
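
For reference, this is roughly how a geofilt filter of the kind discussed
in this thread can be sent as a non-cached filter with an explicit cost,
next to a cheap cached filter. It is a sketch only: the field names
location and region_id, the values, and the class name GeoFilterQuery are
hypothetical, and the cache and cost local params assume Solr 3.4 or
later. Whether a cost of 100 or more actually makes the filter run after
the cheaper ones depends on the field type and Solr version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

// Hypothetical helper that only builds request parameters as strings;
// it does not talk to Solr.
final class GeoFilterQuery {

    /** Builds the raw (unencoded) geofilt filter with caching disabled. */
    static String geofilt(String field, double lat, double lon, double km, int cost) {
        return String.format(Locale.ROOT,
                "{!geofilt cache=false cost=%d sfield=%s pt=%s,%s d=%s}",
                cost, field, lat, lon, km);
    }

    /** URL-encodes q and each filter into a query string. */
    static String queryString(String q, String... filters) {
        try {
            StringBuilder sb = new StringBuilder("q=").append(URLEncoder.encode(q, "UTF-8"));
            for (String fq : filters) {
                sb.append("&fq=").append(URLEncoder.encode(fq, "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String cheap = "region_id:1234";                         // cached filter (assumed field)
        String geo = geofilt("location", 45.15, -93.85, 5.0, 100); // 5 km radius, not cached
        System.out.println(queryString("*:*", cheap, geo));
    }
}
```

The idea is that the cheap filter stays in the filter cache while the geo
filter, which rarely repeats with the same point and radius, is kept out
of it.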

On 2/8/12 2:20 PM, "Ryan McKinley"  wrote:

>Hi Matthias-
>
>I'm trying to understand how you have your data indexed so we can give
>reasonable direction.
>
>What field type are you using for your locations?  Is it using the
>solr spatial field types?  What do you see when you look at the debug
>information from &debugQuery=true?
>
>From my experience, there is no single best practice for spatial
>queries -- it will depend on your data density and distribution.
>
>You may also want to look at:
>http://code.google.com/p/lucene-spatial-playground/
>but note this is off lucene trunk -- the geohash queries are super fast
>though
>
>ryan
>
>
>
>
>2012/2/8 Matthias Käppler :
>> Hi Erick,
>>
>> if we're not doing geo searches, we filter by "location tags" that we
>> attach to places. This is simply a hierarchical regional id, which is
>> simple to filter for, but much less flexible. We use that on Web a
>> lot, but not on mobile, where we want to perform searches in
>> arbitrary radii around arbitrary positions. For those location tag
>> kind of queries, the average time spent in SOLR is 43msec (I'm looking
>> at the New Relic snapshot of the last 12 hours). I have disabled our
>> "optimization" again just yesterday, so for the bbox queries we're now
>> at an avg of 220ms (same time window). That's a 5-fold increase in
>> response time, and in peak hours it's worse than that.
>>
>> I've also found a blog post from 3 years ago which outlines the inner
>> workings of the SOLR spatial indexing and searching:
>> http://www.searchworkings.org/blog/-/blogs/23842
>> From that it seems as if SOLR already performs a similar optimization
>> we had in mind during the index step, so if I understand correctly, it
>> doesn't even search over all records, only those that were mapped to
>> the grid box identified during indexing.
>>
>> What I would love to see is what the suggested way is to perform a geo
>> query on SOLR, considering that they're so difficult to cache and
>> expensive to run. Is the best approach to restrict the candidate set
>> as much as possible using cheap filter queries, so that SOLR merely
>> has to do the geo search against these subsets? How does the query
>> planner work here? I see there's a cost attached to a filter query,
>> but one can only set it when cache is set to false? Are cached geo
>> queries executed last when there are cheaper filter queries to cut
>> down on documents? If you have a real world practical setup to share,
>> one that performs well in a production environment that serves
>> requests in the millions per day, that would be great.
>>
>> I'd love to contribute documentation by the way, if you knew me you'd
>> know I'm an avid open source contributor and actually run several open
>> source projects myself. But tell me, how can I possibly contribute
>> answers to questions I don't have an answer to? That's why I'm here,
>> remember :) So please, these kinds of snippy replies are not helping
>> anyone.
>>
>> Thanks
>> -Matthias
>>
>> On Tue, Feb 7, 2012 at 3:06 PM, Erick Erickson
>> wrote:
>>> So the obvious question is "what is your
>>> performance like without the distance filters?"
>>>
>>> Without that knowledge, we have no clue whether
>>> the modifications you've made had any hope of
>>> speeding up your response times
>>>
>>> As for the docs, any improvements you'd like to
>>> contribute would be happily received
>>>
>>> Best
>>> Erick
>>>
>>> 2012/2/6 Matthias Käppler :
>>>> Hi,
>>>>
>>>> we need to perform fast geo lookups on an index of ~13M places, and
>>>> were running into performance problems here with SOLR. We haven't done
>>>> a lot of query optimization / SOLR tuning up until now so there's
>>>> probably a lot of things we're missing. I was wondering if you could
>>>> give me some feedback on the way we do things, whether they make
>>>> sense, and especially why a supposed optimization we implemented
>>>> recently seems to have no effect, when we actually thought it would
>>>> help a lot.
>>>>
>>>> What we do is this: our API is built on a Rails stack and talks to
>>>> SOLR via a Ruby wrapper. We have a few filters that almost always
>>>> apply, which we put in filter queries. Filter cache hit rate is
>>>> excellent, about 97%, and cache size caps at 10k filters (max size is
>>>> 32k, but it never seems to reach that many, probably because we
>>>> replicate / delta update every few minutes). Still, geo queries are
>>>> slow, about 250-500msec on average. We send them with cache=false, so
>>>> as to not flood the fq cache and cause undesirab

spatial search performance - latlontype vs pointtype?

2012-02-01 Thread Nicolas Flacco
I've switched my index to use PointType instead of LatLonType for spatial search
queries. Unfortunately I'm seeing much worse performance, and I was wondering
if anybody else knew of any issues between the two types. I would expect the flat
space calculation of PointType to be better than the spherical calculation of
LatLonType… is this an incorrect assumption?

I saw a message from last July on this subject that hinted at LatLonType being
preferred, but that was as far as it went:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201107.mbox/%3ccah0thkb6lg5pu-r-iulgf77tnzbjlpxisostb_nzqz-ublu...@mail.gmail.com%3E

Thanks,
Nick
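
To make the flat versus spherical comparison in this message concrete,
here is a small Java sketch of the two kinds of distance math. It is an
illustration only and not Solr's implementation: the class name
FlatVsSphere, the sample coordinates, and the mean Earth radius of 6371 km
are assumptions chosen for the example.

```java
// Illustrative only: plain Euclidean distance versus great-circle
// (haversine) distance. Not taken from Solr or Lucene.
final class FlatVsSphere {

    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    /** Straight-line distance in whatever units the inputs use. */
    static double flat(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    /** Great-circle distance in km between two lat/lon points in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Two nearby sample points (hypothetical values).
        System.out.println("flat (degrees): " + flat(45.15, -93.85, 45.17, -93.87));
        System.out.println("sphere (km):    " + haversineKm(45.15, -93.85, 45.17, -93.87));
    }
}
```

The spherical version needs several trigonometric calls per pair of
points, which is why a flat calculation looks cheaper on paper. The
per-distance arithmetic may not be what dominates query time, though, so
this alone does not explain the slowdown reported above.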