On 10/8/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> Well I believe I can live with some staleness at certain moments, but it's
> not good as users are supposed to need it 24x7. So the common practice is to
> make one of the slaves the new master and switch things over to it, and
> after the outage put them in sync again and do the proper switch back? OK,
> I'll follow this, but I'm still concerned about the amount of manual steps
> to be done...
That was the plan - never needed it, though (never had a master completely
die that I know of). Having the collection not be updated for an hour or
so while the ops folks fixed things always worked fine.

> And other important issue is
> how frequently have you seen indexes getting corrupted?

Just once, I think - no idea of the cause (and I think it was quite an old
version of Lucene).

> If I try to run a
> commit or optimize on a Solr master instance and its index got corrupted,
> will it run the command?

Almost all of the cases I've seen of a master failing were OOM errors,
often during segment merging (again, older versions of Lucene, and someone
forgot to change the JVM heap size from the default). This could cause a
situation where you added a document but the old one was not deleted
(overwritten). Not "corrupted" at the Lucene level, but if the JVM died at
the wrong spot, search results could possibly return two documents for the
same unique key. We normally just rebuilt after a crash.

> And more importantly, will it run the
> postOptimize/postCommit scripts generating snapshots and then possibly
> propagating the bad index?

Normally not, I think... the JVM crash/restart left the Lucene write lock
acquired on the index and further attempts to modify it failed.

-Yonik
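[Editor's note: one small piece of the manual switch-over discussed above is deciding which slave to promote; a reasonable rule is to pick the slave holding the most recent snapshot. A minimal sketch, assuming the Solr 1.x rsync-based distribution scripts, where snapshots are directories named `snapshot.yyyymmddHHMMSS` so lexicographic order matches time order. The helper name `latest_snapshot` is hypothetical, not part of the shipped scripts.]

```shell
# Pick the newest snapshot from a list of snapshot names.
# Snapshot names embed a timestamp (snapshot.yyyymmddHHMMSS),
# so a plain lexicographic sort puts the newest one last.
latest_snapshot() {
  printf '%s\n' "$@" | sort | tail -n 1
}

# Example: the slave holding this snapshot is the best promotion candidate.
latest_snapshot snapshot.20071008093000 snapshot.20071008120000
# → snapshot.20071008120000
```

After picking the candidate, the remaining manual steps (repointing the indexer and the other slaves' snappullers at the new master) are site-specific and not shown here.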
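[Editor's note: the write-lock behavior mentioned above can be checked before restarting an indexer. A minimal sketch, assuming an older Lucene configuration where the lock is a `write.lock` file kept in the index directory itself (lock location varied across Lucene versions, so treat the path as an assumption); the function name `check_lock` is hypothetical.]

```shell
# Report whether a Lucene write lock file is present in an index directory.
# After a JVM crash this file can be left behind, making further
# modification attempts fail until it is cleaned up.
check_lock() {
  if [ -e "$1/write.lock" ]; then
    echo "stale lock present"
  else
    echo "no lock"
  fi
}

# Example usage (path is hypothetical):
# check_lock /var/solr/data/index
```

Only remove a leftover lock after confirming no live process still holds the index open.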