Re: Realtime search and facets with very frequent commits

dipti khullar Mon, 15 Feb 2010 03:29:51 -0800

Hey Janne

Can you please let me know which other optimizations you are talking about
here? In our application we commit about every 5 minutes, but the response
time is still very poor and at times there are also connection timeouts.


Just wanted to confirm if you have done some major configuration changes
which have proved beneficial.

Thanks
Dipti

On Fri, Feb 12, 2010 at 3:03 AM, Janne Majaranta
<janne.majara...@gmail.com> wrote:

> Ok,
>
> Thanks Yonik and Otis.
> I already had static warming queries with facets turned on and autowarming
> at zero.
> There were a lot of other optimizations after that however, so I'll try
> with zero autowarming and static warming queries again.
>
> If that doesn't work, I'll go with 3 instances on the same server.
>
> BTW, does it sound normal that when running updates every minute to a
> 36M index it takes all the available heap size after about 5 commits,
> although not a single query is executed against the index and autowarming
> is set to zero? Just curious.
>
> -Janne
>
>
> 2010/2/11 Otis Gospodnetic <otis_gospodne...@yahoo.com>
>
> > Janne,
> >
> > The answers to your last 2 questions are both yes.  I've seen that done a
> > few times and it works.  I don't have the answer to the always-hot cache
> > question.
> >
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> >
> >
> > ----- Original Message ----
> > > From: Janne Majaranta <janne.majara...@gmail.com>
> > > To: solr-user@lucene.apache.org
> > > Sent: Thu, February 11, 2010 12:35:20 PM
> > > Subject: Realtime search and facets with very frequent commits
> > >
> > > Hello,
> > >
> > > I have a log-search-like application which requires indexed log events
> > > to be searchable within a minute and uses facets and the StatsComponent.
> > >
> > > Some stats:
> > > - The log events are indexed every 10 seconds with a "commitWithin" of
> > >   60 seconds.
> > > - 1M events / day (~75% are updates to previous events).
> > > - Faceting over 14 fields (strings). Usually TOP5 by numdocs but facets
> > >   for all 14 fields at the same time.
> > > - Heavy use of StatsComponent (stats over facets of ~36M documents).
> > >
> > >
> > > The application is running a single Solr instance. All updates and
> > > queries are sent to the same instance.
> > > Faceting and the StatsComponent are both amazingly fast with that
> > > amount of documents *when* the caches are warm.
> > >
> > > The problem I'm now facing is that keeping the caches warm is too heavy
> > > compared to the frequency of updates.
> > > It takes over 60 seconds to warm up the caches to the level where
> > > facets and stats are returned in milliseconds.
> > >
> > > I have tested putting a second Solr instance on the same server and
> > > sending the updates to that new instance.
> > > Warming up the new small instance is very fast while the large instance
> > > has very hot caches.
> > >
> > > I also put a third (empty) Solr instance on the same server which
> > > passes the queries to the two instances with the "shards" parameter.
> > > This is mainly so that the client app doesn't have to know anything
> > > about the shards.
> > >
> > > The setup was easy to configure, responses are back in milliseconds,
> > > and the updates are visible in seconds.
> > > That is, responses in milliseconds over 40M documents and an update
> > > frequency of 15 seconds on a single physical server.
> > > The (lab) server has 16 GB RAM and it is running win2k3.
> > >
> > > Also, what I found out is that using the sharded setup I only need
> > > half the memory for the large instance.
> > > When indexing to the large instance the memory usage goes up very fast
> > > to the maximum allocated heap size and never goes down.
> > >
> > > My question is, is there a magic switch in SOLR to have that kind of
> > > update frequency while having the caches on fire?
> > > Or is it just impossible to achieve facet counts and queries in
> > > milliseconds while updating the index every minute?
> > >
> > > The second question is: the setup with an empty SOLR as a
> > > "coordinating" instance, a large SOLR instance with hot caches and a
> > > small SOLR instance with immediate updates, all on the same physical
> > > server, does it sound like a durable solution (until the small
> > > instance gets big) or is it something braindead?
> > >
> > > And the third question is, would it be a good idea to merge the small
> > > and the large index periodically so that a fresh and empty small
> > > instance would be available after the merge?
> > >
> > > Any ideas ?
> > >
> > > Best Regards,
> > >
> > > Janne Majaranta
> >
> >
>
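
For readers following the thread, here is a minimal sketch of the two
mechanisms the original message relies on: a coordinating instance that fans
queries out with the "shards" parameter, and updates sent with "commitWithin".
The hosts, ports, field names, and helper function names below are
assumptions made up for illustration; none of them appear in the thread. The
sketch only builds the request URL and the XML update body and does not
contact a server.

```python
from urllib.parse import urlencode, parse_qs, urlsplit
from xml.etree import ElementTree as ET

# Hypothetical endpoints: one empty coordinating instance plus the large
# (hot caches) and small (fresh updates) instances on the same machine.
COORDINATOR = "http://localhost:8983/solr"
SHARDS = ["localhost:8984/solr", "localhost:8985/solr"]


def distributed_query_url(query, facet_fields, facet_limit=5):
    """Build a select URL for the coordinator that fans out via `shards`."""
    params = [
        ("q", query),
        # Shard entries are host:port/path, comma-separated, without a scheme.
        ("shards", ",".join(SHARDS)),
        ("facet", "true"),
        ("facet.limit", str(facet_limit)),
    ]
    # One facet.field parameter per faceted field.
    params += [("facet.field", name) for name in facet_fields]
    return COORDINATOR + "/select?" + urlencode(params)


def add_message(docs, commit_within_ms=60000):
    """Build an XML <add> body asking for a commit within the given time."""
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Field names "host" and "level" are invented for this example.
    print(distributed_query_url("level:ERROR", ["host", "level"]))
    print(add_message([{"id": "1", "level": "ERROR"}]))
```

In this arrangement the client only ever talks to the coordinator URL, while
the update body would be posted to the small instance's update handler.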
