Re: Realtime search and facets with very frequent commits

Jason Rutherglen Thu, 18 Feb 2010 08:38:16 -0800

Janne,

I don't think there's any activity happening there.


SOLR-1606 is the tracking issue for moving to per-segment facets and
docsets. I haven't had an immediate commercial need to implement
those.

Jason

On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta
<janne.majara...@gmail.com> wrote:
> Hi Otis,
>
> Ok, now I'm confused ;)
> There does seem to be a bit of activity, though, judging by the "last updated"
> timestamps on the Google Code project wiki:
> http://code.google.com/p/oceansearch/w/list
>
> The Tag Index feature sounds very interesting.
>
> -Janne
>
>
> 2010/2/18 Otis Gospodnetic <otis_gospodne...@yahoo.com>
>
>> Hi Janne,
>>
>> I *think* Ocean Realtime Search has been superseded by Lucene NRT search.
>>
>>  Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>
>>
>> ----- Original Message ----
>> > From: Janne Majaranta <janne.majara...@gmail.com>
>> > To: solr-user@lucene.apache.org
>> > Sent: Thu, February 18, 2010 2:12:37 AM
>> > Subject: Re: Realtime search and facets with very frequent commits
>> >
>> > Hi,
>> >
>> > Yes, I did play with mergeFactor.
>> > I didn't play with mergePolicy.
>> >
>> > Wouldn't that affect indexing speed and possibly memory usage?
>> > I don't have any problems with indexing speed (1000-2000 docs/sec via
>> > the standard HTTP API).
>> >
>> > My problem is that I need very warm caches to get fast faceting, and the
>> > autowarming of the caches takes too long compared to the frequency of
>> > my commits.
>> > So a commit every minute leaves less than a minute to warm the caches.
>> >
>> > To give you an idea of what kind of queries need to be autowarmed in my
>> > app: the log events indexed as documents have timestamps at different
>> > granularities, which are used for faceting.
>> > For example, to get the count of log events for every hour using
>> > faceting, there's a timestamp field with the format yyyymmddhh (for
>> > example, 2010021808 meaning 2010-02-18 8am).
>> > One use case is to get hourly counts over the whole index. A non-cached
>> > query computing the hourly counts over the 40M-document index takes a
>> > while.
>> > And as I understand it, autowarming means that this kind of query is
>> > basically re-executed against a cold cache. That's probably not exactly
>> > how it works, but it "feels" like it.
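A minimal sketch of the hourly field described above, assuming the yyyymmddhh value is computed client-side from each log event's timestamp before indexing. The function names are hypothetical, and the in-memory Counter only illustrates what a facet count over such a field returns; it is not how Solr computes facets.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> str:
    """Format a timestamp at hourly granularity as yyyymmddhh, e.g. 2010021808."""
    return ts.strftime("%Y%m%d%H")


def hourly_counts(timestamps):
    """Count events per hourly bucket, i.e. the value -> count pairs a
    facet on a yyyymmddhh field would report."""
    return Counter(hour_bucket(ts) for ts in timestamps)


if __name__ == "__main__":
    events = [
        datetime(2010, 2, 18, 8, 5),
        datetime(2010, 2, 18, 8, 47),
        datetime(2010, 2, 18, 9, 12),
    ]
    print(hourly_counts(events))  # Counter({'2010021808': 2, '2010021809': 1})
```

Storing a pre-truncated string per granularity (hour, day, and so on) keeps each facet a plain term count, at the cost of one extra field per granularity.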
>> >
>> > Moving the commits to a smaller index, while using sharding to give the
>> > client app a transparent view of the index, seems to solve my problem.
>> >
>> > I'm not sure whether the (upcoming?) NRT features would keep the caches
>> > more persistent; probably not in an environment where docs get frequent
>> > updates/deletes.
>> >
>> > Also, I'm closely following the Ocean Realtime Search project and its
>> > SOLR integration. It sounds like it has the "dream features" for enabling
>> > realtime updates to the index.
>> >
>> > -Janne
>> >
>> >
>> > 2010/2/18 Jan Høydahl / Cominvent
>> >
>> > > Hi,
>> > >
>> > > Have you tried playing with mergeFactor or even mergePolicy?
>> > >
>> > > --
>> > > Jan Høydahl  - search architect
>> > > Cominvent AS - www.cominvent.com
>> > >
>> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
>> > >
>> > > > Hey Dipti,
>> > > >
>> > > > Basically query optimizations + setting cache sizes to a very high level.
>> > > > Other than that, the config is about the same as the out-of-the-box
>> > > > config that comes with the Solr download.
>> > > >
>> > > > I haven't found a magic switch to get very fast query responses + facet
>> > > > counts at my commit frequency using a single SOLR instance.
>> > > > Adding the top queries for a certain type of user as static warming
>> > > > queries just shifted the autowarming time to the time it took to warm
>> > > > the caches with the static queries.
>> > > > I've been staging a setup where a small SOLR instance receives all
>> > > > the updates and a large instance doesn't receive the live feed of
>> > > > updates.
>> > > > The small index will be merged into the large index periodically (once
>> > > > a week or once a month).
>> > > > The client app sees the two instances as one instance through the
>> > > > sharding features of SOLR.
>> > > > The instances run on the same server, each inside its own JVM/Jetty.
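A minimal sketch of how a client might query the small and large instances as one index, assuming Solr's `shards` request parameter (a comma-separated list of host:port/path entries) together with the standard `facet`, `facet.field` and `facet.limit` parameters. The host/port values, field names and function name are hypothetical; `facet.limit=5` mirrors the TOP5 facets mentioned in this thread.

```python
from urllib.parse import urlencode

# Hypothetical addresses for the two instances on the same server:
# a small "live" index that receives updates and a large, static index.
LARGE_SHARD = "localhost:8983/solr"
SMALL_SHARD = "localhost:8984/solr"


def build_sharded_facet_url(query, facet_fields, top_n=5):
    """Build a select URL that fans out over both shards and asks for the
    top_n facet values of each field."""
    params = [
        ("q", query),
        ("rows", "0"),  # only the facet counts are needed, not documents
        ("shards", f"{LARGE_SHARD},{SMALL_SHARD}"),
        ("facet", "true"),
        ("facet.limit", str(top_n)),
    ]
    # facet.field is repeated once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    # Either instance can receive the request and act as the aggregator.
    return f"http://{LARGE_SHARD}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sharded_facet_url("*:*", ["host", "level"]))
```

Sending the request to the large instance is an arbitrary choice here; the instance that receives a `shards` request merges the per-shard responses.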
>> > > >
>> > > > In this setup the caches are very HOT for the large index and queries
>> > > > are extremely fast, and the small index is small enough to give
>> > > > extremely fast queries without having to warm up the caches much.
>> > > >
>> > > > Basically I'm able to have a commit frequency of 10 seconds on a 40M-doc
>> > > > index while counting TOP5 facets over 14 fields in 200ms.
>> > > > In reality, the 10-second commit frequency comes from the fact that the
>> > > > updates go into a 1M-2M document index, and the fast facet counts come
>> > > > from the fact that the 38M document index has hot caches and doesn't
>> > > > receive any updates.
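With sharding, the TOP5 counts for a field combine results from both indexes. Below is a simplified, hypothetical sketch of that merge for one field: it sums the per-shard counts and keeps the top N. It assumes every shard returns counts for every value, which is not how Solr's distributed faceting works; Solr only sees each shard's top values, so it over-requests from each shard and then refines the counts. The input numbers are made up for illustration.

```python
from collections import Counter


def merge_top_facets(shard_counts, top_n=5):
    """Sum per-shard facet counts for one field and return the global top_n
    as (value, count) pairs, highest count first.

    Simplification: assumes each shard reported a count for every value.
    """
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing
    return total.most_common(top_n)


if __name__ == "__main__":
    # Made-up counts for a hypothetical "level" field.
    large = {"ERROR": 120000, "WARN": 80000, "INFO": 5000000}
    small = {"ERROR": 3000, "WARN": 9000, "INFO": 150000, "DEBUG": 40000}
    print(merge_top_facets([large, small], top_n=3))
    # [('INFO', 5150000), ('ERROR', 123000), ('WARN', 89000)]
```

The cheap part of this setup is that the large shard's contribution comes from hot caches, so the merge mostly waits on the small shard.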
>> > > >
>> > > > Also, not running updates against the large index means that the SOLR
>> > > > instance reading the large index uses about half the memory it used
>> > > > when it was receiving updates. At least it does on Win2k3.
>> > > >
>> > > > -Janne
>> > > >
>> > > >
>> > > > 2010/2/15 dipti khullar
>> > > >
>> > > >> Hey Janne
>> > > >>
>> > > >> Can you please let me know what other optimizations you are talking
>> > > >> about here? In our application we commit about every 5 mins, but the
>> > > >> response time is still poor and at times there are also some
>> > > >> connection timeouts.
>> > > >>
>> > > >> Just wanted to confirm whether you have made any major configuration
>> > > >> changes that have proved beneficial.
>> > > >>
>> > > >> Thanks
>> > > >> Dipti
>> > > >>
>> > > >>
>> > >
>> > >
>>
>>
>
