Re: High-Availability deployment
Hi Hoss,

Yes, I know that, but I want a proper dummy backup (something that can be kept in a very controlled environment). I thought about using this approach (a slave dedicated to this purpose), but if I'm using it only as a backup node there is no reason not to use a proper backup structure, as I already have all the needed infrastructure in place for that. It's just an extra redundancy level, since I'm going to have a master/slaves structure and the index is replicated amongst them anyway.

Yes, I got it. I have implemented ways to re-index content incrementally, so I can re-index just a slice of my content (based on dates or ids), which should be enough to bring my index up to date quickly after a possible disaster.

Thank you for your considerations,
Daniel

On 8/10/07 18:29, Chris Hostetter [EMAIL PROTECTED] wrote:
> : I'm setting up a backup task to keep a copy of my master index, just to
> : avoid having to re-build my index from scratch.
>
> The other important point is that every slave is a backup of the master, so
> you don't usually need a separate backup mechanism. Re-building the index is
> more about peace of mind when asking: why did it crash? What did or didn't
> get written to the index before it crashed?
>
> -Hoss

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
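The incremental re-indexing Daniel describes (re-indexing a slice of content based on dates or ids) amounts to partitioning the id space into ranges. A minimal sketch, assuming numeric ids; the batch size and the `fetch_from_db`/`post_to_solr` helper names are illustrative, not part of any real API:

```python
def id_slices(min_id, max_id, batch_size):
    """Split an inclusive id range into (start, end) slices so each
    slice can be fetched from the database and re-posted to Solr."""
    slices = []
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        slices.append((start, end))
        start = end + 1
    return slices

# After a disaster, re-index only the slices newer than the last snapshot:
# for start, end in id_slices(90000, 100000, 1000):
#     rows = fetch_from_db(start, end)   # hypothetical DB helper
#     post_to_solr(rows)                 # hypothetical indexing helper
```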
High-Availability deployment
Hi,

I'm about to deploy Solr in a production environment, and so far I'm a bit concerned about availability.

I have a system that is responsible for fetching data from a database and then pushing it to Solr using its XML/HTTP interface. I'm going to deploy N instances of my application, so that side will be redundant enough. And I'm deploying Solr in a master/slaves structure, using the slave nodes to keep my index replicated and to serve my queries.

But my problem lies on the indexing side of things. Is there a good alternative, like a master/master structure, that I could use so that if my current master dies I can automatically switch to my secondary master while keeping my index integrity? Or would a manual index merge be needed after this switch-over so I can redefine my primary master server?

Thanks,
Daniel
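For context, pushing documents through Solr's XML/HTTP interface looks roughly like this. A minimal sketch in Python; the host name and field names are placeholders, and a real feeder would add error handling and batching:

```python
import urllib.request
from xml.sax.saxutils import escape

def add_doc_xml(fields):
    """Build a Solr <add> update message for a single document."""
    body = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body

def post_update(solr_url, xml):
    """POST an XML update message (add, commit, optimize...) to Solr."""
    req = urllib.request.Request(
        solr_url + "/update",
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    return urllib.request.urlopen(req).read()

# Hypothetical usage against a master node:
# post_update("http://master:8983/solr", add_doc_xml({"id": "42", "title": "HA test"}))
# post_update("http://master:8983/solr", "<commit/>")
```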
Re: High-Availability deployment
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
> I'm about to deploy SOLR in a production environment

Cool, can you share exactly what it will be used for?

> and so far I'm a bit concerned about availability. I have a system that is
> responsible for fetching data from a database and then pushing it to SOLR
> using its XML/HTTP interface. So I'm going to deploy N instances of my
> application so it's going to be redundant enough. And I'm deploying SOLR in
> a Master/Slaves structure, so I'm using the slave nodes as a way to keep my
> index replicated and to be able to use them to serve my queries. But my
> problem lies on the indexing side of things. Is there a good alternative
> like a Master/Master structure that I could use so if my current master
> dies I can automatically switch to my secondary master keeping my index
> integrity?

In all the setups I've dealt with, master redundancy wasn't an issue. If something bad happens to corrupt the index, shut off replication to the slaves and do a complete rebuild on the master. If the master hardware dies, reconfigure one of the slaves to be the new master. These are manual steps, and they assume it's not the end of the world if your search is stale for a couple of hours. A schema change that required reindexing would also cause this window of staleness.

If your index build takes a long time, you could set up a secondary master to pull from the primary (just like another slave). But there's no support for automatically switching over slaves, and the secondary wouldn't have whatever arrived between the last commit and the primary crash... so something would need to update it (query for the latest doc and start from there).

You could also have two search tiers: another copy of the master and multiple slaves. If one was down, being upgraded, or being rebuilt, you could direct search traffic to the other set of servers.

-Yonik
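Yonik's "query for the latest doc and start from there" catch-up step can be sketched as follows. This assumes a sortable last-modified field in the schema ("last_modified" is an illustrative name) and a JSON response writer; adjust both to your setup:

```python
import json
import urllib.parse
import urllib.request

def latest_indexed(solr_url, field="last_modified"):
    """Ask the promoted master for its newest document's timestamp,
    i.e. the point up to which the index is known to be current."""
    params = urllib.parse.urlencode({
        "q": "*:*", "sort": field + " desc",
        "rows": 1, "fl": field, "wt": "json",
    })
    with urllib.request.urlopen(solr_url + "/select?" + params) as resp:
        docs = json.load(resp)["response"]["docs"]
    return docs[0][field] if docs else None

def rows_to_catch_up(db_rows, since, key="last_modified"):
    """Pick the source rows the new master is missing: everything
    modified after the last value that made it into the index."""
    if since is None:
        return list(db_rows)
    return [row for row in db_rows if row[key] > since]
```

The missing rows are then re-posted through the normal indexing path, closing the gap between the secondary's last replication pull and the primary's crash.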
Re: High-Availability deployment
We run multiple, identical, independent copies. No master/slave dependencies. Yes, we run indexing N times for N servers, but that's what CPU is for, and I sleep better at night. It makes testing and deployment trivial, too.

wunder
==
Walter Underwood
Search Guy, Netflix

On 10/8/07 4:05 AM, Daniel Alheiros [EMAIL PROTECTED] wrote:
> I'm about to deploy SOLR in a production environment and so far I'm a bit
> concerned about availability. [...] Is there a good alternative like a
> Master/Master structure that I could use so if my current master dies I can
> automatically switch to my secondary master keeping my index integrity?
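Walter's setup (N identical, independent copies) removes the single master entirely: the feeder sends every update to every server. A hedged sketch of that fan-out; `post` stands in for whatever HTTP call performs the actual update, and real code would retry or queue failures rather than just collecting them:

```python
def broadcast_update(server_urls, xml, post):
    """Send the same update to every independent Solr copy.
    Failures are collected rather than raised, so one dead server
    doesn't stop the others from being updated."""
    failed = []
    for url in server_urls:
        try:
            post(url, xml)
        except Exception:
            failed.append(url)
    return failed  # servers that need the update replayed later
```

A server left in `failed` can have the missed updates replayed once it is back, for instance via the incremental date/id re-indexing Daniel describes elsewhere in the thread.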
Re: High-Availability deployment
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
> Well, I believe I can live with some staleness at certain moments, but it's
> not good, as users are supposed to need it 24x7. So is the common practice
> to make one of the slaves the new master, switch things over to it, and
> after the outage put them back in sync and switch back? OK, I'll follow
> this, but I'm still concerned about the amount of manual steps to be done...

That was the plan; never needed it though (never had a master completely die that I know of). Having the collection not be updated for an hour or so while the ops folks fixed things always worked fine.

> Another important issue is: how frequently have you seen indexes getting
> corrupted?

Just once, I think; no idea of the cause (and I think it was quite an old version of Lucene).

> If I try to run a commit or optimize on a Solr master instance and its
> index got corrupted, will it run the command?

Almost all of the cases I've seen of a master failing were an OOM error, often during segment merging (again, older versions of Lucene, and someone forgot to change the JVM heap size from the default). This could cause a situation where you added a document but the old one was not deleted (overwritten). Not corrupted at the Lucene level, but if the JVM died at the wrong spot, search results could possibly return two documents for the same unique key. We normally just rebuilt after a crash.

> And more importantly, will it run the postOptimize/postCommit scripts,
> generating snapshots and then possibly propagating the bad index?

Normally not, I think... the JVM crash/restart left the Lucene write lock acquired on the index, and further attempts to modify it failed.

-Yonik
Re: High-Availability deployment
Hi Yonik,

It looks pretty good. I hope I'm not the one who ends up posting about a very odd crash after a while. :)

OK, so it's very unlikely that an OOM is going to happen, as I've set my JVM heap size to 1.5 GB.

Hmm, is there any exception thrown in case the index gets corrupted (if it's not caused by an OOM and the JVM crashes)? The document uniqueness Solr offers is one of the many reasons I'm using it, and it would be excellent to know when it's gone. :)

Does it mean that after recovering from a JVM crash it's recommended to rebuild my indexes instead of just restarting?

Thanks again,
Daniel

On 8/10/07 17:30, Yonik Seeley [EMAIL PROTECTED] wrote:
> Almost all of the cases I've seen of a master failing were an OOM error,
> often during segment merging (again, older versions of Lucene, and someone
> forgot to change the JVM heap size from the default). [...] Not corrupted
> at the Lucene level, but if the JVM died at the wrong spot, search results
> could possibly return two documents for the same unique key. We normally
> just rebuilt after a crash.
>
> -Yonik
Re: High-Availability deployment
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote:
> Hmm, is there any exception thrown in case the index gets corrupted (if
> it's not caused by an OOM and the JVM crashes)? The document uniqueness
> Solr offers is one of the many reasons I'm using it, and it would be
> excellent to know when it's gone. :)
>
> Does it mean that after recovering from a JVM crash it's recommended to
> rebuild my indexes instead of just restarting?

Yes, it's safer to do so. I think in a future release we will be able to guarantee document uniqueness even in the face of a crash.

-Yonik
Re: High-Availability deployment
OK, I'll define it as a procedure in my disaster recovery plan.

That would be great; I'm looking forward to it.

Thanks,
Daniel

On 8/10/07 18:07, Yonik Seeley [EMAIL PROTECTED] wrote:
> > Does it mean that after recovering from a JVM crash it's recommended to
> > rebuild my indexes instead of just restarting?
>
> Yes, it's safer to do so. I think in a future release we will be able to
> guarantee document uniqueness even in the face of a crash.
>
> -Yonik
Re: High-Availability deployment
: I'm setting up a backup task to keep a copy of my master index, just to
: avoid having to re-build my index from scratch.

The other important point is that every slave is a backup of the master, so you don't usually need a separate backup mechanism. Re-building the index is more about peace of mind when asking: why did it crash? What did or didn't get written to the index before it crashed?

-Hoss