Re: 99.9% uptime requirement

Walter Underwood Thu, 06 Aug 2009 09:35:58 -0700

Design so that you can handle the load with one server down (N+1 sizing), then take one server out for any maintenance. Simple and works fine.
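
As a rough illustration of N+1 sizing, the sketch below computes how many servers to provision so that peak load is still covered with one server out. The function name and the traffic numbers are hypothetical, chosen only for the example; real per-server capacity has to come from load testing.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, spares: int = 1) -> int:
    """Return the server count that still covers peak load with `spares` servers down."""
    if peak_qps < 0 or qps_per_server <= 0:
        raise ValueError("peak_qps must be >= 0 and qps_per_server must be > 0")
    # N: the minimum number of servers that can carry the peak load (at least one).
    base = max(1, math.ceil(peak_qps / qps_per_server))
    # N+1 (or N+spares): headroom for maintenance or a single failure.
    return base + spares


# Hypothetical numbers: 450 queries/sec at peak, 200 queries/sec per server.
print(servers_needed(peak_qps=450, qps_per_server=200))
```

With these made-up numbers, three servers carry the load, so four are provisioned and any one of them can be taken out for maintenance.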


wunder


On Aug 6, 2009, at 9:25 AM, Robert Petersen wrote:

Here is another idea.  With solr multicore you can dynamically spin up
extra cores and bring them online.  I'm not sure how well this would
work for us since we have hard coded the names of the cores we are
hitting in our config files.

-----Original Message-----
From: Brian Klippel [mailto:br...@theport.com]
Sent: Thursday, August 06, 2009 8:38 AM
To: solr-user@lucene.apache.org
Subject: RE: 99.9% uptime requirement

You could create a new "working" core, then call the swap command once
it is ready. Then remove the work core and delete the appropriate index
folder at your convenience.
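
A minimal sketch of that sequence, assuming the Solr CoreAdmin handler is reachable at the conventional /admin/cores path. The base URL and the core names "live" and "work" are assumptions for illustration. The code only builds the request URLs; it does not send them.

```python
from urllib.parse import urlencode


def core_admin_url(base_url: str, action: str, **params: str) -> str:
    """Build a CoreAdmin request URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


BASE = "http://localhost:8983/solr"  # assumed host, port, and context path

# 1. Create the working core (instanceDir is a hypothetical directory name).
create = core_admin_url(BASE, "CREATE", name="work", instanceDir="work")
# 2. Once the working core is ready, swap it with the live core.
swap = core_admin_url(BASE, "SWAP", core="live", other="work")
# 3. After the swap, the name "work" refers to the old index; unload it.
unload = core_admin_url(BASE, "UNLOAD", core="work")

for url in (create, swap, unload):
    print(url)
```

Because SWAP exchanges the names of the two cores, clients that have core names hard coded in their config keep querying "live" and do not need to change.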


-----Original Message-----
From: Robert Petersen [mailto:rober...@buy.com]
Sent: Wednesday, August 05, 2009 6:41 PM
To: solr-user@lucene.apache.org
Subject: RE: 99.9% uptime requirement

Maintenance Questions:  In a two slave one master setup where the two
slaves are behind load balancers what happens if I have to restart solr?
If I have to restart solr say for a schema update where I have added a
new field then what is the recommended procedure?

If I can guarantee no commits or optimizes happen on the master during
the schema update so no new snapshots become available then can I safely
leave rsyncd enabled?  When I stop and start a slave server, should I
first pull it out of the load balancers list or will solr gracefully
release connections as it shuts down so no searches are lost?

What do you guys do to push out updates?

Thanks for any thoughts,
Robi


-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, August 04, 2009 8:57 AM
To: solr-user@lucene.apache.org
Subject: Re: 99.9% uptime requirement

Right. You don't get to 99.9% by assuming that an 8 hour outage is OK.
Design for continuous uptime, with plans for how long it takes to
patch around a single point of failure. For example, if your load
balancer is a single point of failure, make sure that you can redirect
the front end servers to a single Solr server in much less than 8 hours.
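
To make the arithmetic concrete, here is a small sketch of the downtime budget that 99.9% availability allows. The function name is made up for the example, and the choice of a 365-day year and a 30-day month as measurement periods is an assumption; an actual SLA defines its own period.

```python
def downtime_budget_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime allowed in a period at the given availability percentage."""
    return (1 - availability_pct / 100.0) * period_hours


year = downtime_budget_hours(99.9, 365 * 24)   # budget over a 365-day year
month = downtime_budget_hours(99.9, 30 * 24)   # budget over a 30-day month

print(f"{year:.2f} hours per year")
print(f"{month * 60:.1f} minutes per 30-day month")
print(f"an 8 hour outage uses {8 / year:.0%} of the yearly budget")
```

The yearly budget comes to about 8.76 hours, so a single 8 hour outage consumes nearly all of it, and it exceeds a monthly budget of about 43 minutes many times over.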


Also, think about your SLA. Can the search index be more than 8 hours
stale? How quickly do you need to be able to replace a failed indexing
server? You might be able to run indexing locally on each search
server if they are lightly loaded.

wunder

On Aug 4, 2009, at 7:11 AM, Norberto Meijome wrote:

On Mon, 3 Aug 2009 13:15:44 -0700
"Robert Petersen" <rober...@buy.com> wrote:

Thanks all, I figured there would be more talk about daemontools if there
were really a need.  I appreciate the input and for starters we'll put two
slaves behind a load balancer and grow it from there.


Robert,
not taking away from daemon tools, but daemon tools won't help you if your
whole server goes down.

don't put all your eggs in one basket - several
servers, load balancer (hardware load balancers x 2, haproxy, etc)

and sure, use daemon tools to keep your services running within each
server...

B
_________________________
{Beto|Norberto|Numard} Meijome

"Why do you sit there looking like an envelope without any address on it?"
Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
