Re: Real Time Search and External File Fields

Erick Erickson Sat, 08 Oct 2016 20:34:24 -0700

I chose 16 as a place to start. You usually reach diminishing returns
pretty quickly, and I think it's a mistake to set your autowarm counts to,
say, 256 (I've seen them in the thousands) unless you have some proof
that going higher actually helps.

But certainly, if you set them to 16 and see intolerable latency spikes
just after a new searcher is opened, feel free to make them larger.
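
A rough way to picture what an autowarm count of N buys you is the toy Python model below. It is not Solr's code, and every name in it is made up for illustration. A new searcher's LRU cache is seeded by re-executing the N most recently used keys of the old searcher's cache. Each regenerated entry costs a query against the new searcher, so warming time grows with N while the benefit flattens out.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any


class ToyLRUCache:
    """Toy LRU cache: the most recently used key is kept at the end."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used


def autowarm(old: ToyLRUCache, autowarm_count: int,
             regenerate: Callable[[Hashable], Any]) -> ToyLRUCache:
    """Seed a new cache with the autowarm_count most recently used keys of
    the old cache, recomputing each value via regenerate (i.e. against the
    new searcher). Work done grows linearly with autowarm_count."""
    new = ToyLRUCache(old.size)
    if autowarm_count <= 0:
        return new  # autowarm count of 0: the new searcher starts cold
    for key in list(old.entries)[-autowarm_count:]:
        new.put(key, regenerate(key))  # oldest first keeps the LRU order
    return new


old_cache = ToyLRUCache(size=512)
for q in ["q1", "q2", "q3", "q4"]:
    old_cache.put(q, f"old results for {q}")
old_cache.get("q1")  # q1 becomes the most recently used entry

warmed = autowarm(old_cache, autowarm_count=2,
                  regenerate=lambda q: f"new results for {q}")
print(list(warmed.entries))  # ['q4', 'q1']
```

Only recency decides what gets carried over. Nothing guarantees that those entries are the ones your next queries will need.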

You've hit on exactly why newSearcher and firstSearcher are there.
The theory behind autowarm counts is that the last N entries used are
likely to be useful in the near future. There's no guarantee at all that
this is true, whereas newSearcher/firstSearcher queries are certain to
exercise what _you_ think is most important.
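
For people finding this thread later: the advice here roughly maps onto the `<query>` section of solrconfig.xml. The Python sketch below builds such a fragment with ElementTree so the structure is easy to see. Several parts are assumptions. The field name `court_rank_eff` is hypothetical. The cache class and sizes are placeholders. Whether a `{!func}` query on an external file field is enough to trigger the file load in your Solr version is something to verify.

```python
import xml.etree.ElementTree as ET

# Hypothetical external file field name; substitute your own.
EFF_FIELD = "court_rank_eff"


def cache_element(name: str, autowarm_count: int) -> ET.Element:
    # class/size/initialSize are placeholder values for this sketch.
    return ET.Element(name, {
        "class": "solr.LRUCache",
        "size": "512",
        "initialSize": "512",
        "autowarmCount": str(autowarm_count),
    })


def warming_listener(event: str, queries: list[dict[str, str]]) -> ET.Element:
    # A QuerySenderListener runs each <lst> of request params as a query
    # against the searcher being warmed, before it is registered.
    listener = ET.Element("listener",
                          {"event": event, "class": "solr.QuerySenderListener"})
    arr = ET.SubElement(listener, "arr", {"name": "queries"})
    for params in queries:
        lst = ET.SubElement(arr, "lst")
        for name, value in params.items():
            ET.SubElement(lst, "str", {"name": name}).text = value
    return listener


def query_section(autowarm_count: int = 16) -> str:
    query = ET.Element("query")
    query.append(cache_element("filterCache", autowarm_count))
    query.append(cache_element("queryResultCache", autowarm_count))
    # A query that touches the external file field, so warming loads it.
    warm_queries = [{"q": "{!func}" + EFF_FIELD, "rows": "1"}]
    query.append(warming_listener("newSearcher", warm_queries))
    ET.indent(query)  # Python 3.9+
    return ET.tostring(query, encoding="unicode")


print(query_section())
```

Adding a second listener with `event="firstSearcher"` and the same queries would cover warming a freshly started Solr as well.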

As for why autowarm counts are set to 0 in the examples, there's no
overarching reason. Certainly if the soft commit interval is 1 second,
autowarming is largely useless, so leaving the counts at 0 makes sense.
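
The soft-commit point can be made concrete with a back-of-the-envelope model. It assumes that warming time is roughly the number of entries to regenerate times an average regeneration cost, plus the time spent on newSearcher queries. It ignores overlapping warming searchers. The inputs below are made-up illustration values, not measurements.

```python
from dataclasses import dataclass


@dataclass
class WarmingModel:
    """Crude model: each soft commit opens a new searcher, which warms for
    warm_time_s() seconds and then serves until the next soft commit."""
    soft_commit_interval_s: float
    autowarm_entries: int            # total autowarm count across caches
    avg_regen_s: float               # assumed cost to regenerate one entry
    listener_queries_s: float = 0.0  # assumed cost of newSearcher queries

    def __post_init__(self) -> None:
        if self.soft_commit_interval_s <= 0:
            raise ValueError("soft commit interval must be positive")

    def warm_time_s(self) -> float:
        return self.autowarm_entries * self.avg_regen_s + self.listener_queries_s

    def warmed_fraction(self) -> float:
        """Fraction of each commit interval served by a fully warmed searcher."""
        serving_s = self.soft_commit_interval_s - self.warm_time_s()
        return max(0.0, serving_s / self.soft_commit_interval_s)


# Illustrative numbers only: 16 entries in each of two caches, 50 ms each.
one_second = WarmingModel(soft_commit_interval_s=1, autowarm_entries=32, avg_regen_s=0.05)
one_minute = WarmingModel(soft_commit_interval_s=60, autowarm_entries=32, avg_regen_s=0.05)
print(f"{one_second.warmed_fraction():.2f}")  # 0.00
print(f"{one_minute.warmed_fraction():.2f}")  # 0.97
```

With a 1-second interval, the next commit arrives before warming could finish, so the warming work is mostly thrown away.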

Best,
Erick

On Sat, Oct 8, 2016 at 12:31 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> With time-oriented data, you can use an old trick (goes back to Infoseek
> in 1995).
>
> Make a “today” collection that is very fresh. Nightly, migrate new
> documents to the “not today” collection. The today collection will be
> small and can be updated quickly. The archive collection will be large
> and slow to update, but who cares?
>
> You can also send all docs to both collections and de-dupe.
>
> Every night, you start over with the “today” collection.
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Oct 8, 2016, at 12:18 PM, Mike Lissner <mliss...@michaeljaylissner.com> wrote:
>>
>> On Fri, Oct 7, 2016 at 8:18 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> What you haven't mentioned is how often you add new docs. Is it once a
>>> day? Steadily from 8:00 to 17:00?
>>>
>>
>> Alas, it's a steady trickle during business hours. We're ingesting court
>> documents as they're posted on court websites, then sending alerts as soon
>> as possible.
>>
>>
>>> Either way, your soft commit interval really should be longer than the
>>> time autowarming takes. Configure autowarming to run queries
>>> (firstSearcher or newSearcher events, or autowarm counts in
>>> queryResultCache and filterCache; say 16 in each of the latter for a
>>> start) such that they cause the external file to load. That _should_
>>> prevent any queries from being blocked, since the autowarming will
>>> happen in the background, and while it's happening incoming queries
>>> will be served by the old searcher.
>>>
>>
>> I want to make sure I understand this properly and document this for future
>> people that may find this thread. Here's what I interpret your advice to be:
>>
>> 0. Slacken my auto soft commit interval to something more like a minute.
>>
>> 1. Set up a query in the newSearcher listener that uses my external file
>> field.
>> 1a. Do the same in firstSearcher if I want a newly started Solr to warm up
>> before getting queries (this doesn't matter to me, so I'm skipping this).
>>
>> and/or
>>
>> 2. Set autowarmCount in queryResultCache and filterCache to 16 so that the
>> top 16 query results from the previous searcher are regenerated in the new
>> searcher.
>>
>> Doing #1 seems like a safe strategy since it's guaranteed to hit the
>> external file field. #2 feels like a bonus.
>>
>> I'm a bit confused about the example autowarmCount for the caches, which is
>> 0. Why not set this to something higher? I guess it's a RAM utilization vs.
>> speed tradeoff? A low number like 16 seems like it'd have minimal impact on
>> RAM?
>>
>> Thanks for all the great replies and for everything you do for Solr. I
>> truly appreciate your efforts.
>>
>> Mike
>
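
Walter's today/archive trick quoted above can be sketched in a few lines of Python. The collections are modeled as in-memory lists and dicts, and every name here is hypothetical. When docs are sent to both collections, results are merged and de-duplicated by id, with the fresher "today" copy winning. A nightly rollover moves everything into the archive and empties "today".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    source: str  # "today" or "archive"


def merge_dedupe(today: list[Hit], archive: list[Hit], rows: int) -> list[Hit]:
    """Merge hits from both collections, keeping one hit per doc_id.
    If a doc is in both collections, the fresher "today" copy wins."""
    best: dict[str, Hit] = {h.doc_id: h for h in archive}
    for hit in today:
        best[hit.doc_id] = hit  # overwrite any archive copy
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:rows]


def nightly_rollover(today: dict[str, dict], archive: dict[str, dict]) -> None:
    """Move every doc from the today store into the archive, then start over."""
    archive.update(today)  # today's version replaces any older archive copy
    today.clear()


today_hits = [Hit("a", 2.0, "today"), Hit("c", 0.5, "today")]
archive_hits = [Hit("a", 1.5, "archive"), Hit("b", 1.8, "archive")]
merged = merge_dedupe(today_hits, archive_hits, rows=10)
print([(h.doc_id, h.source) for h in merged])
# [('a', 'today'), ('b', 'archive'), ('c', 'today')]

today_docs = {"x": {"rev": 2}}
archive_docs = {"x": {"rev": 1}, "y": {"rev": 1}}
nightly_rollover(today_docs, archive_docs)
```

Relevance scores from two separate indexes are not strictly comparable, because their term statistics differ. For time-oriented alert queries, merging on a date field instead of score avoids that issue.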
