Otis and Erick, Thanks for the responses and for thinking over my potential scenarios.
The big draw of the two-repeater idea for me is that I can:

1. Maximize my hardware. I don't need a standby master. Instead, I can
   use the "second" repeater to field customer requests.
2. After a failure of the primary repeater, avoid both fumbling with
   multiple solrconfig.xml edits (we're also using cores) and manually
   replicating or copying indexes around.

In a sense, although perhaps not by design, a repeater solves those
problems.

We considered centralized storage and a standby master with access to a
shared filesystem, but what are you using for a shared filesystem? (NFS?
Egh...)

-Parker

On 4/12/11 6:19 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>I think the repeaters are misleading you a bit here. The purpose of a
>repeater is usually to replicate across a slow network, say to a remote
>data center, so that slaves at that center can get more timely updates.
>I don't think they add anything to your disaster recovery scenario.
>
>So I'll ignore repeaters for a bit here. The only difference between a
>master and a slave is a bit of configuration, and usually you'll
>allocate, say, memory differently on the two machines when you start
>the JVM. You might disable caches on the master (since they're used for
>searching). You may......
>
>Let's say I have master M, and slaves S1, S2, S3. The slaves have an
>up-to-date index as of the last replication (just like your repeater
>would have). If any slave goes down, you can simply bring up another
>machine as a slave, point it at your master, wait for replication on
>that slave and then let your load balancer know it's there. This is the
>HOST2-4 failure you outlined....
>
>Should the master fail you have two choices, depending upon how long
>you can wait for *new* content to be searchable. Let's say you can wait
>half a day in this situation. Spin up a new machine, copy the index
>over from one of the slaves (via a simple copy or by replicating).
>Point your indexing process at the new master, point your slaves at the
>new master for replication and you're done.
>
>Let's say you can't wait very long at all (and remember this better be
>quite a rare event). Then you could take a slave (let's say S1) out of
>the loop that serves searches. Copy in the configuration files you use
>for your masters to it, point the indexer and searchers at it and
>you're done. Now spin up a new slave as above and your old
>configuration is back.
>
>Note that in two of these cases, you temporarily have 2 slaves doing
>the work that 3 used to, so a bit of over-capacity may be in order.
>
>But a really good question here is how to be sure all your data is in
>your index. After all, the slaves (and repeater for that matter) are
>only current up to the last replication. The simplest thing to do is
>simply re-index everything from the last known commit point. Assuming
>you have a <uniqueKey> defined, if you index documents that are already
>in the index, they'll just be replaced, no harm done.
>
>So let's say your replication interval is 10 minutes (picking a number
>from thin air). When your system is back and you restart your indexer,
>restart indexing from, say, one hour before the time you noticed your
>master went down. You can be more deterministic than this by examining
>the log on the machine you're using to replace the master, noting the
>last replication time, and subtracting your hour (or whatever) from
>that.
>
>Anyway, hope I haven't confused you unduly! The take-away is that a
>slave can be made into a master as fast as a repeater can, the
>replication process is the same and I just don't see what a repeater
>buys you in the scenario you described.
>
>Best
>Erick
>
>
>On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
><parker_john...@gap.com> wrote:
>
>> I am hoping to get some feedback on the architecture I've been planning
>> for a medium to high volume site.
>> This is my first time working with Solr, so I want to be sure what
>> I'm planning isn't totally weird, unsupported, etc.
>>
>> We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts
>> will be repeaters (master+slave), and 2 of those hosts will be pure
>> slaves. One of the F5 vips, "Index-vip", will have members HOST1 and
>> HOST2, but HOST2 will be "downed" and not taking traffic from that
>> vip. The second vip, "Search-vip", will have 3 members: HOST2, HOST3,
>> and HOST4. The "Index-vip" is intended to be used to post and commit
>> index changes. The "Search-vip" is intended to be customer facing.
>>
>> Here is some ASCII art. The line with the "X"'s thru it denotes a
>> "downed" member of a vip, one that isn't taking any traffic. The "M:"
>> denotes the value in the solrconfig.xml that the host uses as the
>> master.
>>
>>
>>          Index-vip         Search-vip
>>             / \            /    |    \
>>            /   X          /     |     \
>>           /     \        /      |      \
>>          /       X      /       |       \
>>         /         \    /        |        \
>>        /           X  /         |         \
>>       /             \/          |          \
>>    HOST1          HOST2       HOST3       HOST4
>>   REPEATER      REPEATER      SLAVE       SLAVE
>> M:Index-vip    M:Index-vip M:Index-vip M:Index-vip
>>
>>
>> I've been working through a couple of failure scenarios. Recovering
>> from a failure of HOST2, HOST3, or HOST4 is pretty straightforward.
>> Losing HOST1 is my major concern. My plan for recovering from a
>> failure of HOST1 is as follows: Enable HOST2 as a member of the
>> Index-vip, while disabling member HOST1. HOST2 effectively becomes
>> the Master. HOST2, 3, and 4 continue fielding customer requests and
>> pulling indexes from "Index-vip." Since HOST2 is now in charge of
>> crunching indexes and fielding customer requests, I assume load will
>> increase on that box.
>>
>> When we recover HOST1, we will simply make sure it has replicated
>> against "Index-vip" and then re-enable HOST1 as a member of the
>> Index-vip and disable HOST2.
>>
>> Hopefully this makes sense. If all goes correctly, I've managed to
>> keep all services up and running without losing any index data.
>>
>> So, I have a few questions:
>>
>> 1. Has anyone else tried this dual repeater approach?
>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>>    pulling index data from itself?
>> 3. Is there a better way to do this?
>>
>> Thanks,
>> Parker
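
For reference, Erick's suggestion in the quoted reply (restart the indexer from the last replication time minus an hour or so) comes down to simple date arithmetic. Below is a minimal Python sketch of that calculation. The function name `reindex_restart_point`, the one-hour default margin, and the sample timestamp are hypothetical and chosen only for illustration; in practice the last replication time would be read from the log on the machine replacing the master.

```python
from datetime import datetime, timedelta


def reindex_restart_point(last_replication, safety_margin=timedelta(hours=1)):
    """Return the timestamp from which to re-feed documents to the new master.

    last_replication: time of the last successful replication, as noted
        from the log of the machine that is replacing the master.
    safety_margin: how far to step back before that time (an assumed value;
        pick whatever margin is comfortable for the replication interval).
    """
    return last_replication - safety_margin


if __name__ == "__main__":
    # Hypothetical timestamp, for illustration only.
    last_replication = datetime(2011, 4, 12, 17, 50)
    print(reindex_restart_point(last_replication))  # 2011-04-12 16:50:00
```

Stepping back further than strictly necessary is harmless as long as a <uniqueKey> is defined, since documents that are already in the index are simply replaced.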