I think the repeaters are misleading you a bit here. The purpose of a
repeater is usually to replicate across a slow network, say to a remote
data center, so that slaves at that center can get more timely updates.
I don't think they add anything to your disaster recovery scenario.

So I'll ignore repeaters for a bit here. The only difference between a
master and a slave is a bit of configuration, and usually you'll allocate
resources (say, memory) differently on the two machines when you start the
JVM. You might also disable caches on the master, since they're only used
for searching.
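For concreteness, here's roughly what that configuration difference looks like in solrconfig.xml (this is the standard ReplicationHandler; the hostname, port, and 10-minute poll interval below are just placeholders):

```xml
<!-- On the master: serve the index to slaves after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- optionally ship config files along with the index -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```

Everything else in solrconfig.xml can be identical on both, which is what makes swapping roles cheap.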

Let's say I have master M, and slaves S1, S2, S3. The slaves have an
up-to-date index as of the last replication (just like your repeater
would have). If any slave goes down, you can simply bring up another
machine as a slave, point it at your master, wait for replication to
complete on that slave, and then let your load balancer know it's there.
This is the HOST2-4 failure you outlined.

Should the master fail, you have two choices, depending upon how long you
can wait for *new* content to be searchable. Let's say you can wait half a
day in this situation. Spin up a new machine and copy the index over from
one of the slaves (via a simple copy or by replicating). Point your
indexing process at the new master, point your slaves at it for
replication, and you're done.
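Note that "pointing the slaves at the master" is just the masterUrl in each slave's replication config. If you make that a stable name (such as a load-balancer VIP, as in your setup), promoting a new master doesn't require touching the slaves at all. A sketch, with "index-vip" as an illustrative name:

```xml
<!-- On each slave: masterUrl names whatever machine currently plays
     the master role. Pointing it at a VIP or stable DNS name means
     a master swap is invisible to the slaves. -->
<lst name="slave">
  <str name="masterUrl">http://index-vip:8983/solr/replication</str>
  <str name="pollInterval">00:10:00</str>
</lst>
```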

Let's say you can't wait very long at all (and remember, this had better be
quite a rare event). Then you could take a slave (say S1) out of the loop
that serves searches. Copy in the configuration files you use for your
master, point the indexer at it and the remaining slaves at it for
replication, and you're done. Now spin up a new slave as above and your
old configuration is back.

Note that in two of these cases, you temporarily have 2 slaves doing the
work
that 3 used to, so a bit of over-capacity may be in order.

But a really good question here is how to be sure all your data is in your
index. After all, the slaves (and repeater, for that matter) are only
current up to the last replication. The simplest thing to do is re-index
everything from the last known commit point. Assuming you have a
<uniqueKey> defined, if you index documents that are already in the index,
they'll just be replaced; no harm done. So let's say your replication
interval is 10 minutes (picking a number from thin air). When your system
is back and you restart your indexer, restart indexing from, say, an hour
before the time you noticed your master went down. You can be more
deterministic than this by examining the log on the machine you're using
to replace the master, noting the last replication time, and subtracting
your hour (or whatever) from that.
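The reason re-indexing the overlap is safe is the <uniqueKey>: re-adding a document with the same key replaces the earlier copy rather than duplicating it. A minimal sketch (the "id" field name and document values are just illustrative):

```xml
<!-- schema.xml: declare the unique key -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

<!-- An update message re-sending doc 42 simply overwrites the
     version already in the index -->
<add>
  <doc>
    <field name="id">42</field>
    <field name="title">Some re-indexed document</field>
  </doc>
</add>
```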

Anyway, hope I haven't confused you unduly! The take-away is that a slave
can be made into a master as fast as a repeater can, the replication
process is the same, and I just don't see what a repeater buys you in the
scenario you described.

Best
Erick


On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson <parker_john...@gap.com> wrote:

>
>
> I am hoping to get some feedback on the architecture I've been planning
> for a medium to high volume site.  This is my first time working
> with Solr, so I want to be sure what I'm planning isn't totally weird,
> unsupported, etc.
>
> We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts will
> be repeaters (master+slave), and 2 of those hosts will be pure slaves. One
> of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
> will be "downed" and not taking traffic from that vip.  The second vip,
> "Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
> "Index-vip" is intended to be used to post and commit index changes.  The
> "Search-vip" is intended to be customer facing.
>
> Here is some ASCII art.  The line with the "X"'s thru it denotes a
> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
> denotes the value in the solrconfig.xml that the host uses as the master.
>
>
>              Index-vip         Search-vip
>                 / \             /   |   \
>                /   X           /    |    \
>               /     \         /     |     \
>              /       X       /      |      \
>             /         \     /       |       \
>            /           X   /        |        \
>           /             \ /         |         \
>         HOST1          HOST2      HOST3      HOST4
>       REPEATER        REPEATER    SLAVE      SLAVE
>      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
>
>
> I've been working through a couple failure scenarios.  Recovering from a
> failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
> HOST1 is my major concern.  My plan for recovering from a failure of HOST1
> is as follows: Enable HOST2 as a member of the Index-vip, while disabling
> member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
> continue fielding customer requests and pulling indexes from "Index-vip."
> Since HOST2 is now in charge of crunching indexes and fielding customer
> requests, I assume load will increase on that box.
>
> When we recover HOST1, we will simply make sure it has replicated against
> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
> disable HOST2.
>
> Hopefully this makes sense.  If all goes correctly, I've managed to keep
> all services up and running without losing any index data.
>
> So, I have a few questions:
>
> 1. Has anyone else tried this dual repeater approach?
> 2. Am I going to have any semaphore/blocking issues if a repeater is
> pulling index data from itself?
> 3. Is there a better way to do this?
>
>
> Thanks,
> Parker
>
