Our project built a custom "admin" webapp that we use for various O&M activities, so I went ahead and added the ability to upload a Zip distribution, which then uses SolrJ to forward the extracted contents to ZK. The package is built and uploaded via a Gradle build task, which makes life easy on us by allowing us to jam stuff into ZK sitting in a private network (local VPC) without necessarily needing to be on a ZK machine.

We then moved on to creating a collection (trivial) and adding/removing replicas. As for adding replicas, I am rather confused as to why I would need to specify a specific shard for replica placement; before, when I threw down a core.properties file, the machine would automatically come up and figure out which shard it should join based on reasonable assumptions - why wouldn't the same logic apply here? I then saw that a Rule-based Replica Placement <https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement> feature was added, which I thought would be reasonable, but after looking at the tests <https://issues.apache.org/jira/browse/SOLR-7577> it appears to still require a shard parameter when adding a replica, which seems to defeat the entire purpose.

After getting bummed out about that, I took a look at the delete replica request, since machines are coming/going and we need to start dropping them. I found that deleting a replica requires a collection, shard, and replica name, and if all I have is the name of the machine, it appears the only way to figure out what to remove is to walk the clusterstate tree for all collections and determine which replicas are candidates for removal, which seems unnecessarily complicated.
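For reference, the clusterstate walk I'm describing ends up looking roughly like this with SolrJ (a rough sketch only on my part - the zkHost and node name are placeholders, and I'm assuming the 5.x-era CloudSolrClient/ClusterState accessors):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class DropNodeReplicas {
        public static void main(String[] args) throws Exception {
            String zkHost = "zk1:2181,zk2:2181,zk3:2181";   // placeholder: our ensemble
            String nodeToRemove = "10.0.1.23:8983_solr";    // placeholder: node name as ZK knows it

            try (CloudSolrClient client = new CloudSolrClient(zkHost)) {
                client.connect();
                ClusterState state = client.getZkStateReader().getClusterState();

                // Walk every collection/shard/replica to find the ones living on the dead node.
                for (String collName : state.getCollections()) {
                    DocCollection coll = state.getCollection(collName);
                    for (Slice slice : coll.getSlices()) {
                        for (Replica replica : slice.getReplicas()) {
                            if (nodeToRemove.equals(replica.getNodeName())) {
                                // Issue a raw DELETEREPLICA call for each match.
                                ModifiableSolrParams p = new ModifiableSolrParams();
                                p.set("action", "DELETEREPLICA");
                                p.set("collection", collName);
                                p.set("shard", slice.getName());
                                p.set("replica", replica.getName());
                                QueryRequest req = new QueryRequest(p);
                                req.setPath("/admin/collections");
                                client.request(req);
                            }
                        }
                    }
                }
            }
        }
    }

That's a lot of machinery just to say "this machine is gone, clean it up."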
Hopefully I don't come off as complaining, but rather as looking at it from a client perspective: the Collections API doesn't seem simple to use, and really the only reason I am messing around with it now is that there are repeated threats to make "zk as truth" the default in the 5.x branch at some point in the future. I would personally advocate that something like autoManageReplicas <https://issues.apache.org/jira/browse/SOLR-5748> be introduced to make life much simpler on clients, as this appears to be the very thing I am trying to implement externally. If anyone has happened to build a system to orchestrate Solr for cloud infrastructure and has some pointers, they would be greatly appreciated.

Thanks,

-Steve

On Thu, Sep 24, 2015 at 10:15 AM, Dan Davis <dansm...@gmail.com> wrote:

> ant is very good at this sort of thing, and easier for Java devs to learn
> than Make. Python has a module called fabric that is also very fine, but
> for my dev. ops. it is another thing to learn.
> I tend to divide things into three categories:
>
>    - Things that have to do with system setup, and need to be run as root.
>      For this I write a bash script (I should learn puppet, but...)
>    - Things that have to do with one-time installation as a solr admin user
>      with /bin/bash, including upconfig. For this I use an ant build.
>    - Normal operational procedures. For this, I typically use Solr admin or
>      scripts, but I wish I had time to create a good webapp (or money to
>      purchase Fusion).
>
> On Thu, Sep 24, 2015 at 12:39 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > bq: What tools do you use for the "auto setup"? How do you get your
> > config automatically uploaded to zk?
> >
> > Both uploading the config to ZK and creating collections are one-time
> > operations, usually done manually. Currently uploading the config set is
> > accomplished with zkCli (yes, it's a little clumsy). There's a JIRA to
> > put this into solr/bin as a command though. They'd be easy enough to
> > script in any given situation though with a shell script or wizard....
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 7:33 PM, Steve Davids <sdav...@gmail.com> wrote:
> >
> > > What tools do you use for the "auto setup"? How do you get your config
> > > automatically uploaded to zk?
> > >
> > > On Tue, Sep 22, 2015 at 2:35 PM, Gili Nachum <gilinac...@gmail.com>
> > > wrote:
> > >
> > > > Our auto setup sequence is:
> > > > 1. Deploy 3 zk nodes.
> > > > 2. Deploy solr nodes and start them connecting to zk.
> > > > 3. Upload collection config to zk.
> > > > 4. Call create collection rest api.
> > > > 5. Done. SolrCloud ready to work.
> > > >
> > > > Don't yet have automation for replacing or adding a node.
> > > >
> > > > On Sep 22, 2015 18:27, "Steve Davids" <sdav...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to come up with a repeatable process for deploying a
> > > > > Solr Cloud cluster from scratch along with the appropriate security
> > > > > groups, auto scaling groups, and custom Solr plugin code. I saw
> > > > > that LucidWorks created a Solr Scale Toolkit but that seems to be
> > > > > more of a one-shot deal than really setting up your environment for
> > > > > the long haul. Here is where we are at right now:
> > > > >
> > > > >    1. ZooKeeper ensemble is easily brought up via a Cloud Formation
> > > > >       Script
> > > > >    2. We have an RPM built to lay down the Solr distribution +
> > > > >       Custom plugins + Configuration
> > > > >    3. Solr machines come up and connect to ZK
> > > > >
> > > > > Now, we are using Puppet, which could easily create the
> > > > > core.properties file for the corresponding core and have ZK get
> > > > > bootstrapped, but that seems to be a no-no these days... So, can
> > > > > anyone think of a way to get ZK bootstrapped automatically with
> > > > > pre-configured Collection configurations? Also, is there a
> > > > > recommendation on how to deal with machines that are coming/going?
> > > > > As I see it, machines will be getting spun up and terminated from
> > > > > time to time, and we need a process for dealing with that. The
> > > > > first idea was to just use a common node name, so if a machine was
> > > > > terminated a new one can come up and replace that particular node,
> > > > > but on second thought that would seem to require an auto scaling
> > > > > group *per* node (so it knows which node name it is). For a large
> > > > > cluster this seems crazy from a maintenance perspective, especially
> > > > > if you want to be elastic with regard to the number of live
> > > > > replicas for peak times. So, the next idea was to have some outside
> > > > > observer listen for new EC2 instances being created or terminated
> > > > > (via CloudWatch SQS) and make the appropriate API calls to either
> > > > > add the replica or delete it; this seems doable but perhaps not the
> > > > > simplest solution that could work.
> > > > >
> > > > > I was hoping others have already gone through this and have
> > > > > valuable advice to give; we are trying to set up Solr Cloud the
> > > > > "right way" so we don't get nickel-and-dimed to death from an O&M
> > > > > perspective.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Steve
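For what it's worth, the Collections API calls I have in mind for the one-time bootstrap and for the "outside observer" look roughly like the sketch below. It assumes SolrJ 5.x, placeholder collection/configset/node names, a configset already uploaded to ZK (zkCli upconfig or my webapp), and it leaves the CloudWatch/SQS listening piece out entirely:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class CloudOrchestrationCalls {

        // One-time bootstrap: create the collection against a configset that
        // is already sitting in ZK.
        static void createCollection(CloudSolrClient client) throws Exception {
            ModifiableSolrParams p = new ModifiableSolrParams();
            p.set("action", "CREATE");
            p.set("name", "mycollection");              // placeholder collection name
            p.set("numShards", 3);
            p.set("replicationFactor", 2);
            p.set("collection.configName", "myconf");   // placeholder configset name in ZK
            QueryRequest req = new QueryRequest(p);
            req.setPath("/admin/collections");
            client.request(req);
        }

        // Called by the "outside observer" when a new EC2 instance comes up.
        static void addReplica(CloudSolrClient client, String shard, String newNodeName)
                throws Exception {
            ModifiableSolrParams p = new ModifiableSolrParams();
            p.set("action", "ADDREPLICA");
            p.set("collection", "mycollection");        // placeholder collection name
            p.set("shard", shard);                      // still has to be picked by the caller
            p.set("node", newNodeName);                 // e.g. "10.0.1.42:8983_solr"
            QueryRequest req = new QueryRequest(p);
            req.setPath("/admin/collections");
            client.request(req);
        }

        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
                client.connect();
                createCollection(client);
                addReplica(client, "shard1", "10.0.1.42:8983_solr");
            }
        }
    }

The calls themselves are easy enough; it's picking the shard and mapping terminated instances back to replicas that I'd rather Solr figured out on its own.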