I think the repeaters are misleading you a bit here. The purpose of a repeater is usually to replicate across a slow network, say to a remote data center, so slaves at that center can get more timely updates. I don't think they add anything to your disaster recovery scenario.
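For what it's worth, a repeater is nothing special: it's just a node whose replication handler is configured as both a master and a slave at once. A rough sketch of the solrconfig.xml section (host names and intervals here are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- serve the index to downstream slaves after each commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <!-- ...while also pulling the index from the real master -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```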
So I'll ignore repeaters for the moment. The only difference between a master and a slave is a bit of configuration, and usually you'll allocate resources (memory, say) differently on the two machines when you start the JVM. You might disable caches on the master, since they're only used for searching, and so on.

Let's say I have master M and slaves S1, S2, S3. The slaves have an up-to-date index as of the last replication (just like your repeater would have). If any slave goes down, you can simply bring up another machine as a slave, point it at your master, wait for replication on that slave, and then let your load balancer know it's there. This is the HOST2-4 failure you outlined.

Should the master fail, you have two choices, depending upon how long you can wait for *new* content to be searchable.

Let's say you can wait half a day in this situation. Spin up a new machine and copy the index over from one of the slaves (via a simple copy or by replicating). Point your indexing process at the new master, point your slaves at it for replication, and you're done.

Let's say you can't wait very long at all (and remember, this had better be quite a rare event). Then you could take a slave (say S1) out of the loop that serves searches. Copy your master configuration files onto it, point the indexer and the searchers at it, and you're done. Now spin up a new slave as above and your old configuration is back.

Note that in two of these cases, you temporarily have 2 slaves doing the work that 3 used to, so a bit of over-capacity may be in order.

But a really good question here is how to be sure all your data is in your index. After all, the slaves (and the repeater, for that matter) are only current up to the last replication. The simplest thing to do is re-index everything from the last known commit point. Assuming you have a <uniqueKey> defined, if you index documents that are already in the index, they'll just be replaced, no harm done.
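To make the "just a bit of configuration" point concrete: promoting S1 amounts to swapping its slave section for a master section in solrconfig.xml and restarting. Roughly (host names and intervals are placeholders):

```xml
<!-- On a slave: poll the master every 10 minutes -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>

<!-- On the master (or a freshly promoted slave): serve the index after commits -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

Once the remaining slaves point at the new master, you don't have to wait for the next poll; hitting the replication handler with command=fetchindex on a slave kicks off a pull immediately, and command=details will show you each node's last replication time.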
So let's say your replication interval is 10 minutes (picking a number from thin air). When your system is back and you restart your indexer, restart indexing from, say, one hour before the time you noticed your master went down. You can be more deterministic than this by examining the log on the machine you're using to replace the master, noting the last replication time, and subtracting your hour (or whatever) from that.

Anyway, hope I haven't confused you unduly! The take-away is that a slave can be made into a master as fast as a repeater can, the replication process is the same, and I just don't see what a repeater buys you in the scenario you described.

Best
Erick

On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson <parker_john...@gap.com> wrote:

> I am hoping to get some feedback on the architecture I've been planning
> for a medium to high volume site. This is my first time working
> with Solr, so I want to be sure what I'm planning isn't totally weird,
> unsupported, etc.
>
> We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts will
> be repeaters (master+slave), and 2 of those hosts will be pure slaves. One
> of the F5 vips, "Index-vip", will have members HOST1 and HOST2, but HOST2
> will be "downed" and not taking traffic from that vip. The second vip,
> "Search-vip", will have 3 members: HOST2, HOST3, and HOST4. The
> "Index-vip" is intended to be used to post and commit index changes. The
> "Search-vip" is intended to be customer facing.
>
> Here is some ASCII art. The line with the "X"'s thru it denotes a
> "downed" member of a vip, one that isn't taking any traffic. The "M:"
> denotes the value in the solrconfig.xml that the host uses as the master.
>
>         Index-vip               Search-vip
>         /       \              /    |    \
>        /         X            /     |     \
>       /           \          /      |      \
>    HOST1         HOST2      HOST3       HOST4
>   REPEATER      REPEATER    SLAVE       SLAVE
>  M:Index-vip  M:Index-vip  M:Index-vip  M:Index-vip
>
> I've been working through a couple failure scenarios. Recovering from a
> failure of HOST2, HOST3, or HOST4 is pretty straightforward. Losing
> HOST1 is my major concern. My plan for recovering from a failure of HOST1
> is as follows: enable HOST2 as a member of the Index-vip, while disabling
> member HOST1. HOST2 effectively becomes the master. HOST2, 3, and 4
> continue fielding customer requests and pulling indexes from "Index-vip."
> Since HOST2 is now in charge of crunching indexes and fielding customer
> requests, I assume load will increase on that box.
>
> When we recover HOST1, we will simply make sure it has replicated against
> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
> disable HOST2.
>
> Hopefully this makes sense. If all goes correctly, I've managed to keep
> all services up and running without losing any index data.
>
> So, I have a few questions:
>
> 1. Has anyone else tried this dual-repeater approach?
> 2. Am I going to have any semaphore/blocking issues if a repeater is
> pulling index data from itself?
> 3. Is there a better way to do this?
>
> Thanks,
> Parker