Re: High-Availability deployment

2007-10-09 Thread Daniel Alheiros
Hi Hoss,

Yes, I know that, but I want to have a proper dummy backup (something that
could be kept in a very controlled environment). I thought about using this
approach (a slave just for this purpose), but if I'm using it just as a
backup node, there's no reason not to use a proper backup structure (as I
have all the needed infrastructure in place for that). It's just an extra
redundancy level, as I'm going to have a Master/Slaves structure and the
index is replicated amongst them anyway.

Yes, I got it. I have implemented ways to re-index content incrementally,
so I can re-index just a slice of my content (based on dates or IDs),
which should be enough to bring my index up to date quickly after a
possible disaster.
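A minimal sketch of that date-based slicing (illustrative only; the function
name and window size are assumptions, not Daniel's actual code):

```python
from datetime import date, timedelta

def date_slices(start, end, days=7):
    """Split [start, end] into consecutive (from, to) windows of `days` days.
    Each window can then be re-indexed independently, e.g. by querying the
    source database for documents modified in that range."""
    slices = []
    cur = start
    while cur <= end:
        nxt = min(cur + timedelta(days=days - 1), end)
        slices.append((cur, nxt))
        cur = nxt + timedelta(days=1)
    return slices

# Example: rebuild one month of content in weekly slices
for lo, hi in date_slices(date(2007, 9, 1), date(2007, 9, 30), days=7):
    print(lo, hi)  # feed each window to the indexer
```

The same idea works with ID ranges instead of dates; only the slice of
content touched since the failure needs to be re-pushed.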

Thank you for your considerations,
Daniel


On 8/10/07 18:29, Chris Hostetter [EMAIL PROTECTED] wrote:

 : I'm setting up a backup task to keep a copy of my master index, just to
 : avoid having to re-build my index from scratch. And another important issue is
 
 every slave is a backup of the master, so you don't usually need a
 separate backup mechanism.
 
 re-building the index is more about peace of mind when asking: why did it
 crash? what did/didn't get written to the index before it crashed?
 
 
 
 
 -Hoss
 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



High-Availability deployment

2007-10-08 Thread Daniel Alheiros
Hi

I'm about to deploy SOLR in a production environment and so far I'm a bit
concerned about availability.

I have a system that is responsible for fetching data from a database and
then pushing it to SOLR using its XML/HTTP interface.
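That XML/HTTP push can be sketched roughly as follows (the update URL and
field names are assumptions for illustration, not the actual application):

```python
import urllib.request
from xml.sax.saxutils import escape

def build_add_xml(doc):
    """Render a dict of fields as a Solr <add><doc> update message."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields

def post_update(solr_url, xml):
    """POST an update message to Solr, e.g. http://localhost:8983/solr/update."""
    req = urllib.request.Request(
        solr_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    return urllib.request.urlopen(req).read()

xml = build_add_xml({"id": "article-42", "title": "Example article"})
# post_update("http://localhost:8983/solr/update", xml)  # followed by <commit/>
```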

So I'm going to deploy N instances of my application so it's going to be
redundant enough.

And I'm deploying SOLR in a Master/Slaves structure, using the slave
nodes to keep my index replicated and to serve my queries. But my problem
lies on the indexing side of things. Is there a good alternative, like a
Master/Master structure, that I could use so that if my current master dies
I can automatically switch to my secondary master while keeping my index
integrity? Or would a manual index merge be needed after this switchover so
I can redefine my primary master server?

Thanks,
Daniel  





Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
 I'm about to deploy SOLR in a production environment

Cool, can you share exactly what it will be used for?

 and so far I'm a bit
 concerned about availability.

 I have a system that is responsible for fetching data from a database and
 then pushing it to SOLR using its XML/HTTP interface.

 So I'm going to deploy N instances of my application so it's going to be
 redundant enough.

 And I'm deploying SOLR in a Master / Slaves structure, so I'm using the
 slaves nodes as a way to keep my index replicated and to be able to use them
 to serve my queries. But my problem lies on the indexing side of things. Is
 there a good alternative like a Master/Master structure that I could use so
 if my current master dies I can automatically switch to my secondary master
 keeping my index integrity?

In all the setups I've dealt with, master redundancy wasn't an issue.
If something bad happens to corrupt the index, shut off replication to
the slaves and do a complete rebuild on the master.  If the master
hardware dies, reconfigure one of the slaves to be the new master.
These are manual steps, and they assume it's not the end of the world
if your search is stale for a couple of hours.  A schema change that
required reindexing would also cause this window of staleness.

If your index build takes a long time, you could set up a secondary
master to pull from the primary (just like another slave).  But
there's no support for automatically switching over slaves, and the
secondary wouldn't have stuff between the last commit and the primary
crash... so something would need to update it... (query for latest doc
and start from there).
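That "query for the latest doc" catch-up step could look roughly like this,
assuming documents carry an indexed timestamp field (field name and URL are
illustrative assumptions):

```python
import urllib.parse

def latest_doc_query(solr_base, timestamp_field="timestamp"):
    """Build the URL that asks Solr for the most recently indexed document,
    so the indexer knows where to resume after promoting a secondary master."""
    params = {
        "q": "*:*",
        "sort": "%s desc" % timestamp_field,
        "rows": "1",
        "fl": "id,%s" % timestamp_field,
    }
    return "%s/select?%s" % (solr_base, urllib.parse.urlencode(params))

url = latest_doc_query("http://localhost:8983/solr")
# Fetch `url`, read the timestamp of the newest document, then re-index
# everything newer than that from the source database.
```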

You could also have two search tiers... another copy of the master and
multiple slaves.  If one was down, being upgraded, or being rebuilt,
you could direct search traffic to the other set of servers.

-Yonik


Re: High-Availability deployment

2007-10-08 Thread Walter Underwood
We run multiple, identical, independent copies. No master/slave
dependencies. Yes, we run indexing N times for N servers, but
that's what CPU is for and I sleep better at night. It makes
testing and deployment trivial, too.

wunder
==
Walter Underwood
Search Guy, Netflix


On 10/8/07 4:05 AM, Daniel Alheiros [EMAIL PROTECTED] wrote:

 Hi
 
 I'm about to deploy SOLR in a production environment and so far I'm a bit
 concerned about availability.
 
 I have a system that is responsible for fetching data from a database and
 then pushing it to SOLR using its XML/HTTP interface.
 
 So I'm going to deploy N instances of my application so it's going to be
 redundant enough.
 
 And I'm deploying SOLR in a Master/Slaves structure, using the slave
 nodes to keep my index replicated and to serve my queries. But my problem
 lies on the indexing side of things. Is there a good alternative, like a
 Master/Master structure, that I could use so that if my current master dies
 I can automatically switch to my secondary master while keeping my index
 integrity? Or would a manual index merge be needed after this switchover so
 I can redefine my primary master server?
 
 Thanks,
 Daniel  
 
 
 



Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
 Well, I believe I can live with some staleness at certain moments, but it's
 not good, as users are expected to need it 24x7. So the common practice is to
 make one of the slaves the new master, switch things over to it, and
 after the outage put them in sync again and do the proper switch back? OK,
 I'll follow this, but I'm still concerned about the number of manual steps
 to be done...

That was the plan - never needed it though... (never had a master
completely die that I know of).  Having the collection not be updated
for an hour or so while the ops folks fixed things always worked fine.

 And another important issue is
 how frequently have you seen indexes getting corrupted?

Just once I think - no idea of the cause (and I think it was quite an
old version of lucene).

 If I try to run a
 commit or optimize on a Solr master instance and its index got corrupted,
 will it run the command?

Almost all of the cases I've seen of a master failing were OOM
errors, often during segment merging (again, older versions of Lucene,
and someone forgot to change the JVM heap size from the default).
This could cause a situation where you added a document but the old
one was not deleted (overwritten).  Not corrupted at the Lucene
level, but if the JVM died at the wrong spot, search results could
possibly return two documents for the same unique key.  We normally
just rebuilt after a crash.
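For reference, raising the heap from the JVM default at startup is a
one-flag change (values illustrative; the right size depends on the index):

```shell
# Start the Solr example Jetty with an explicit heap instead of the JVM default
java -Xms1024m -Xmx1536m -jar start.jar
```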

 And more importantly, will it run the
 postOptimize/postCommit scripts generating snapshots and then possibly
 propagating the bad index?

Normally not, I think... the JVM crash/restart left the Lucene write
lock acquired on the index, and further attempts to modify it failed.
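The postCommit/postOptimize hooks in question are RunExecutableListener
entries in solrconfig.xml; in a Solr 1.x setup the snapshot hook typically
looks like this (script name and paths follow the shipped example config
and may differ per installation):

```xml
<!-- solrconfig.xml: run the snapshooter script after each commit -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
```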

-Yonik


Re: High-Availability deployment

2007-10-08 Thread Daniel Alheiros
Hi Yonik.

It looks pretty good.

I hope I'm not the one who will post a very odd crash after a while. :)
OK, so it's very unlikely that an OOM is going to happen, as I've set my JVM
heap size to 1.5G.

Hmm, is there any exception thrown in case the index gets corrupted (if it's
not caused by an OOM and the JVM crashes)? The document uniqueness SOLR offers
is one of the many reasons I'm using it, and it would be excellent to know when
it's gone. :)
Does it mean that after recovering from a JVM crash it's recommended to
rebuild my indexes instead of just restarting them?

Thanks again,
Daniel


On 8/10/07 17:30, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
 Well, I believe I can live with some staleness at certain moments, but it's
 not good, as users are expected to need it 24x7. So the common practice is to
 make one of the slaves the new master, switch things over to it, and
 after the outage put them in sync again and do the proper switch back? OK,
 I'll follow this, but I'm still concerned about the number of manual steps
 to be done...
 
 That was the plan - never needed it though... (never had a master
 completely die that I know of).  Having the collection not be updated
 for an hour or so while the ops folks fixed things always worked fine.
 
 And another important issue is
 how frequently have you seen indexes getting corrupted?
 
 Just once I think - no idea of the cause (and I think it was quite an
 old version of lucene).
 
 If I try to run a
 commit or optimize on a Solr master instance and its index got corrupted,
 will it run the command?
 
 Almost all of the cases I've seen of a master failing were OOM
 errors, often during segment merging (again, older versions of Lucene,
 and someone forgot to change the JVM heap size from the default).
 This could cause a situation where you added a document but the old
 one was not deleted (overwritten).  Not corrupted at the Lucene
 level, but if the JVM died at the wrong spot, search results could
 possibly return two documents for the same unique key.  We normally
 just rebuilt after a crash.
 
 And more importantly, will it run the
 postOptimize/postCommit scripts generating snapshots and then possibly
 propagating the bad index?
 
 Normally not, I think... the JVM crash/restart left the Lucene write
 lock acquired on the index, and further attempts to modify it failed.
 
 -Yonik





Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
 Hmm, is there any exception thrown in case the index gets corrupted (if it's
 not caused by an OOM and the JVM crashes)? The document uniqueness SOLR offers
 is one of the many reasons I'm using it, and it would be excellent to know when
 it's gone. :)
 Does it mean that after recovering from a JVM crash it's recommended to
 rebuild my indexes instead of just restarting them?

Yes, it's safer to do so.
I think in a future release we will be able to guarantee document
uniqueness even in the face of a crash.

-Yonik


Re: High-Availability deployment

2007-10-08 Thread Daniel Alheiros
OK, I'll define it as a procedure in my disaster recovery plan.

That would be great. I'm looking forward to it.

Thanks,
Daniel

On 8/10/07 18:07, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
 Hmm, is there any exception thrown in case the index gets corrupted (if it's
 not caused by an OOM and the JVM crashes)? The document uniqueness SOLR offers
 is one of the many reasons I'm using it, and it would be excellent to know when
 it's gone. :)
 Does it mean that after recovering from a JVM crash it's recommended to
 rebuild my indexes instead of just restarting them?
 
 Yes, it's safer to do so.
 I think in a future release we will be able to guarantee document
 uniqueness even in the face of a crash.
 
 -Yonik





Re: High-Availability deployment

2007-10-08 Thread Chris Hostetter
: I'm setting up a backup task to keep a copy of my master index, just to
: avoid having to re-build my index from scratch. And another important issue is

every slave is a backup of the master, so you don't usually need a
separate backup mechanism.

re-building the index is more about peace of mind when asking: why did it
crash? what did/didn't get written to the index before it crashed?




-Hoss