Re: Vetting Our Architecture: 2 Repeaters and Slaves.

Parker Johnson Thu, 14 Apr 2011 12:36:47 -0700

Otis and Erick,

Thanks for the responses and for thinking over my potential scenarios.


The big draw of the 2-repeater idea for me is that I can:

1. Maximize my hardware.  I don't need a standby master.  Instead, I can
use the "second" repeater to field customer requests.
2. After a primary repeater failure, I need neither fumble with
multiple solrconfig.xml edits (we're also using cores) nor worry about
manually replicating or copying indexes around.

In a sense, although perhaps not by design, a repeater solves those
problems.

We considered centralized storage and a standby master with access to a
shared filesystem, but what are you using for a shared filesystem? (NFS?
Egh...)

-Parker

On 4/12/11 6:19 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>I think the repeaters are misleading you a bit here. The purpose of a
>repeater is usually to replicate across a slow network, say to a remote
>data center, so that slaves at that center can get more timely updates.
>I don't think they add anything to your disaster recovery scenario.
>
>So I'll ignore repeaters for a bit here. The only difference between a
>master and a slave is a bit of configuration, and usually you'll
>allocate, say, memory differently on the two machines when you start
>the JVM. You might disable caches on the master (since they're used for
>searching). You may......
>
>Let's say I have master M, and slaves S1, S2, S3. The slaves have an
>up-to-date index as of the last replication (just like your repeater
>would have). If any slave goes down, you can simply bring up another
>machine as a slave, point it at your master, wait for replication on that
>slave and then let your load balancer know it's there. This is the
>HOST2-4 failure you outlined....
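A minimal sketch of the "wait for replication, then tell the load balancer" step described above. It assumes the replication handler is mounted at the conventional "/replication" path under each core and uses its "indexversion" command; the function names, the polling defaults, and the injected version readers are hypothetical, and the actual HTTP fetch and response parsing are left to the caller.

```python
import time
from typing import Callable


def replication_url(core_base_url: str, command: str) -> str:
    """Build a replication handler URL (assumes the handler lives at /replication)."""
    return f"{core_base_url.rstrip('/')}/replication?command={command}"


def wait_until_caught_up(
    master_version: Callable[[], int],
    slave_version: Callable[[], int],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the new slave reports the same index version as the master.

    The two callables are supplied by the caller (for example, by fetching
    replication_url(base, "indexversion") and parsing the response).
    Returns True once the versions match, False if max_polls is exhausted.
    """
    for _ in range(max_polls):
        if slave_version() == master_version():
            return True
        sleep(poll_seconds)
    return False
```

Only when this returns True would the new slave be added to the load balancer pool.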
>
>Should the master fail you have two choices, depending upon how long you
>can wait for *new* content to be searchable. Let's say you can wait half
>a day in this situation. Spin up a new machine, copy the index over from
>one of the slaves (via a simple copy or by replicating). Point your
>indexing process at the new master, point your slaves at the new master
>for replication and you're done.
>
>Let's say you can't wait very long at all (and remember this better be
>quite a rare event). Then you could take a slave (let's say S1) out of
>the loop that serves searches. Copy in the configuration files you use
>for your masters to it, point the indexer and searchers at it and you're
>done. Now spin up a new slave as above and your old configuration is back.
>
>Note that in two of these cases, you temporarily have 2 slaves doing the
>work that 3 used to, so a bit of over-capacity may be in order.
>
>But a really good question here is how to be sure all your data is in your
>index. After all, the slaves (and repeater for that matter) are only
>current up to the last replication. The simplest thing to do is simply
>re-index everything from the last known commit point. Assuming you have a
><uniqueKey> defined, if you index documents that are already in the index,
>they'll just be replaced, no harm done. So let's say your replication
>interval is 10 minutes (picking a number from thin air). When your system
>is back and you restart your indexer, restart indexing from, say, the time
>you noticed your master went down minus 1 hour. You can be more
>deterministic than this by examining the log on the machine you're using
>to replace the master, noting the last replication time, and subtracting
>your hour (or whatever) from that.
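A minimal sketch of the restart-point arithmetic described above, plus a toy model of why re-indexing overlapping documents is harmless when a <uniqueKey> is defined. The function names, the "id" key field, and the dict standing in for the index are all hypothetical; this is not Solr code, only an illustration of the reasoning.

```python
from datetime import datetime, timedelta


def restart_point(last_replication: datetime,
                  safety_margin: timedelta = timedelta(hours=1)) -> datetime:
    """Return the timestamp from which the indexer should resume.

    last_replication is the last replication time read from the log of the
    machine that replaces the master; safety_margin is the "hour (or
    whatever)" subtracted to cover anything that may have been missed.
    """
    return last_replication - safety_margin


def reindex(index: dict, docs: list, key: str = "id") -> None:
    """Toy model of uniqueKey semantics: a document with an existing key
    replaces the old one instead of creating a duplicate."""
    for doc in docs:
        index[doc[key]] = doc


# Example: last replication at 18:00, so resume indexing from 17:00.
resume_from = restart_point(datetime(2011, 4, 12, 18, 0))

# Re-sending a document that is already indexed leaves one copy.
index = {}
reindex(index, [{"id": "a", "v": 1}, {"id": "b", "v": 1}])
reindex(index, [{"id": "b", "v": 2}, {"id": "c", "v": 1}])
```

The margin only needs to exceed the replication interval plus however long it took to notice the failure; an hour is simply a comfortable value for a 10-minute interval.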
>
>Anyway, hope I haven't confused you unduly! The take-away is that a
>slave can be made into a master as fast as a repeater can, the replication
>process is the same and I just don't see what a repeater buys you in the
>scenario you described.
>
>Best
>Erick
>
>
>On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
><parker_john...@gap.com> wrote:
>
>>
>>
>> I am hoping to get some feedback on the architecture I've been planning
>> for a medium to high volume site.  This is my first time working
>> with Solr, so I want to be sure what I'm planning isn't totally weird,
>> unsupported, etc.
>>
>> We've got a pair of F5 load balancers and 4 hosts.  2 of those hosts
>> will be repeaters (master+slave), and 2 of those hosts will be pure
>> slaves.  One of the F5 vips, "Index-vip", will have members HOST1 and
>> HOST2, but HOST2 will be "downed" and not taking traffic from that vip.
>> The second vip, "Search-vip", will have 3 members: HOST2, HOST3, and
>> HOST4.  The "Index-vip" is intended to be used to post and commit index
>> changes.  The "Search-vip" is intended to be customer facing.
>>
>> Here is some ASCII art.  The line with the "X"s through it denotes a
>> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
>> denotes the value in the solrconfig.xml that the host uses as the
>> master.
>>
>>
>>              Index-vip         Search-vip
>>                 / \             /   |   \
>>                /   X           /    |    \
>>               /     \         /     |     \
>>              /       X       /      |      \
>>             /         \     /       |       \
>>            /           X   /        |        \
>>           /             \ /         |         \
>>         HOST1          HOST2      HOST3      HOST4
>>       REPEATER        REPEATER    SLAVE      SLAVE
>>      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
>>
>>
>> I've been working through a couple of failure scenarios.  Recovering from
>> a failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
>> HOST1 is my major concern.  My plan for recovering from a failure of
>> HOST1 is as follows: Enable HOST2 as a member of the Index-vip, while
>> disabling member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3,
>> and 4 continue fielding customer requests and pulling indexes from
>> "Index-vip."  Since HOST2 is now in charge of crunching indexes and
>> fielding customer requests, I assume load will increase on that box.
>>
>> When we recover HOST1, we will simply make sure it has replicated against
>> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
>> disable HOST2.
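A minimal sketch of the failover and recovery plan above, modeled as plain state changes on the two vips. The dict layout and function names are hypothetical and nothing here talks to an F5; it only makes explicit which members are enabled before failover, after HOST1 fails, and after HOST1 is restored.

```python
def make_vips() -> dict:
    """Initial state: HOST2 is a member of Index-vip but "downed"."""
    return {
        "Index-vip": {"HOST1": True, "HOST2": False},
        "Search-vip": {"HOST2": True, "HOST3": True, "HOST4": True},
    }


def active_members(vips: dict, vip: str) -> list:
    """Hosts currently taking traffic from the given vip."""
    return sorted(host for host, enabled in vips[vip].items() if enabled)


def fail_over_master(vips: dict) -> None:
    """HOST1 has failed: disable it and enable HOST2 on Index-vip.

    HOST2 stays in Search-vip, so it now indexes and serves searches.
    """
    vips["Index-vip"]["HOST1"] = False
    vips["Index-vip"]["HOST2"] = True


def restore_master(vips: dict) -> None:
    """HOST1 is back and has replicated from Index-vip: swap roles again."""
    vips["Index-vip"]["HOST1"] = True
    vips["Index-vip"]["HOST2"] = False


vips = make_vips()
fail_over_master(vips)
during_failure = active_members(vips, "Index-vip")
restore_master(vips)
after_recovery = active_members(vips, "Index-vip")
```

In both transitions exactly one host is enabled on Index-vip at the end, which is what keeps every slave pulling from a single master.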
>>
>> Hopefully this makes sense.  If all goes correctly, I've managed to keep
>> all services up and running without losing any index data.
>>
>> So, I have a few questions:
>>
>> 1. Has anyone else tried this dual repeater approach?
>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>> pulling index data from itself?
>> 3. Is there a better way to do this?
>>
>>
>> Thanks,
>> Parker
>>