It's a pretty common misconception that since Solr scales, you can just
spin up new nodes and be done. Amazon ElasticSearch and older SolrCloud
getting-started docs encourage this misconception, as does the HDFS-only
autoAddReplicas flag.

I agree that auto-scaling should be approached carefully, and
per-collection, but the question comes up a lot, so the lack of
available/blessed Solr tools hurts, in my opinion.
https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
helps, but you still need to decide when to scale and how many replicas to add.
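
For example, a placement rule can be attached when the collection is created.
This is a rough sketch (the host, config name, and rule value are placeholders);
the rule below keeps any shard from having two replicas on the same node:
    // Hypothetical CREATE call using rule-based replica placement
    // "shard:*,replica:<2,node:*" = fewer than 2 replicas of any shard per node
    curl "http://solr1.example.com:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&collection.configName=myconf&rule=shard:*,replica:<2,node:*"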

Anyway, the last time this came up
(https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201510.mbox/%3Ccae+cwktalxicdfc2zlfxvxregvnt-56yyqct3u-onhqchxa...@mail.gmail.com%3E)
I suggested this as a place to get started - it's a tool that knows where to
look: https://github.com/whitepages/solrcloud_manager

Using this, scaling up a collection is pretty easy. Add some nodes, then:
    // Adds replicas as "cluster space" allows (assumes you're not using built-in rule-based placement).
    // Respects the current maxShardsPerNode for the collection, prevents adding
    // replicas that already exist on a node, and adds replicas for shards with a
    // lower replication factor first.
    java -jar solrcloud_manager-assembly-1.4.0.jar fill -z zk0.example.com:2181/myapp -c collection1
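
For reference, the rough manual equivalent (picking the shards and nodes yourself -
the host, shard, and node names here are placeholders) is ADDREPLICA via the
Collections API:
    // Add one replica of shard1 on a specific new node
    curl "http://solr1.example.com:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=newnode.example.com:8983_solr"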

Scaling down is also a one-liner. Turn off some nodes and run:
    // Removes any replicas in the collection that are not marked "active" in the cluster state
    java -jar solrcloud_manager-assembly-1.4.0.jar cleancollection -z zk0.example.com:2181/myapp -c collection1
BUT, you still need to take care not to take down all of the nodes with a
given shard at once. This can be tricky to figure out if your collection
has shards spread across many nodes.
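
One way to check before shutting anything down is to ask the Collections API for
the cluster state and see which nodes each shard's replicas live on (the host name
here is a placeholder):
    // Shows, per shard, which nodes currently hold replicas
    curl "http://solr1.example.com:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1"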


Another downscaling option would be to proactively delete the replicas off
of a node before turning it off:
    // Deletes all replicas for the collection on the given node(s),
    // but refuses to delete a replica if that would bring the replicationFactor
    // for that shard below 2
    java -jar solrcloud_manager-assembly-1.4.0.jar clean -z zk0.example.com:2181/myapp -c collection1 --nodes abc.example.com --safetyFactor 2
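
Again, the rough manual equivalent is DELETEREPLICA via the Collections API, run
once per replica on the node you're retiring (the shard and replica names here are
placeholders, and unlike the tool it won't enforce a safety factor for you):
    // Remove one specific replica of shard1
    curl "http://solr1.example.com:8983/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard1&replica=core_node3"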



On 12/9/15, 12:09 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>bq: As a side note, we do this for our
>customers as that's baked into our cloud provisioning software,
>
>Exactly, nothing OOB is there, but all the data is available, you
>"just" have to write a tool that knows where to look ;) That said,
>this would certainly be something that would have to be optional
>and, IMO, not on by default. It'd need some design effort, as I'd
>guess it'd be on a per-collection basis thus turned on in solrconfig.xml.
>Hmmm, or perhaps this would be some kind of list of nodes
>in Zookeeper. Or..
>
>So when a new Solr node came up for the first time, it'd need
>to find out which collections were configured with this option
>either through enumerating all the collections and checking
>their states or looking in their solrconfig files or enumerating
>a list of children in Zookeeper or.... Plus you'd need some
>kind of safeguards in place to handle bringing up, say, 10 new
>Solr instances one at a time so all the new replicas didn't
>end up on the first node you added. And so on....
>
>The more I think about all that the less I like it; it seems that
>custom utilities on a per-collection basis make more sense.
>
>And yeah, autoAddReplicas is exactly for that on HDFS. Using
>autoAddReplicas for non shared filesystems doesn't really
>make sense IMO. The supposition is that the Solr node has
>gone away. How would some _other_ Solr instance on some _other_
>node know where to look for the index?
>
>Best,
>Erick
>
>On Wed, Dec 9, 2015 at 11:37 AM, Sameer Maggon
><sam...@measuredsearch.com> wrote:
>> Erick,
>>
>> Typically, while creating collections, a replicationFactor is specified.
>> Thus, the meta data about the collection does have information about
>>what
>> the "desired" replicationFactor is for the collection. If that's the
>>case,
>> when a Solr node joins the cluster, there could be a pro-active
>>add-replica
>> operation that can be initiated if the Solr detects that the current
>> replicas are less than the desired replicationFactor and pull the
>> collection data from the leader.
>>
>> Isn't that what the attribute "autoAddReplicas" does for HDFS - can
>>this be
>> done for non-shared filesystem? As a side note, we do this for our
>> customers as that's baked into our cloud provisioning software, but it
>> would be nice if Solr supports that OOTB. Are there any underlying
>>flaws of
>> doing that?
>>
>> Thanks,
>> --
>>
>> *Sameer Maggon*
>> www.measuredsearch.com
>> Deploy, Scale & Manage Solr in the cloud of your choice.
>>
>>
>> On Wed, Dec 9, 2015 at 11:19 AM, Erick Erickson
>><erickerick...@gmail.com>
>> wrote:
>>
>>> Not that I know of. The two systems are somewhat disconnected.
>>> AWS doesn't know that Solr lives on those nodes, it's just spinning
>>> one up, right? Albeit with Solr running.
>>>
>>> There's nothing in Solr that auto-detects the  existence of a new
>>> Solr node and automagically assigns collections and/or replicas.
>>>
>>> How would either system intuit that this new node is replacing
>>> something else and "do the right thing"?
>>>
>>> I'll tell you how, by interrogating Zookeeper and seeing that for some
>>> specific collection, shardX had fewer replicas than other shards and
>>> issuing the Collections API ADDREPLICA command.
>>>
>>> But now there are _three_ systems that need to be coordinated and
>>> doing the right thing in your situation would be the wrong thing in
>>> another. The last thing many sys ops want is having replicas started
>>> without their knowledge.
>>>
>>> And on top of that, I have doubts about the model. Having AWS
>>> elastically spin up a new replica is a heavyweight operation from
>>> Solr's perspective. I mean this potentially copies a many G set of
>>> index files from one place to another which could take a long time,
>>> is that really what's desired here?
>>>
>>> I have seen some folks spin up/down Solr instances based on a
>>> schedule if they know roughly when the peak load will be, but again
>>> there's nothing built in to handle this.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Dec 9, 2015 at 10:15 AM, Ugo Matrangolo
>>> <ugo.matrang...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I was trying to setup a SolrCloud cluster in AWS backed by an ASG
>>>(auto
>>> > scaling group) serving a replicated collection. I have just came
>>>across a
>>> > case when one of the Solr node became unresponsive with AWS killing
>>>it
>>> and
>>> > spinning a new one.
>>> >
>>> > Unfortunately, this new Solr node did not join as a replica of the
>>> existing
>>> > collection requiring human intervention to configure it as a new
>>>replica.
>>> >
>>> > I was wondering if there is around something that will make this
>>>process
>>> > fully automated by detecting that a new node just joined the cluster
>>>and
>>> > instructing it (e.g. via Collections API) to join as a replica of a
>>>given
>>> > collection.
>>> >
>>> > Best
>>> > Ugo
>>>
