On 7/28/2016 9:38 AM, Andy C wrote:
> Would it make sense to use the embedded Zookeeper instance in this
> situation? I have seen warnings that the embedded Zookeeper should not
> be used in production deployments, but the reason generally given is
> that if Solr goes down Zookeeper will also go down, which doesn't seem
> relevant here. Are there other reasons not to use the embedded Zookeeper?

The embedded ZooKeeper uses code copied from a fairly old version of
ZooKeeper and slightly modified. This was needed at the time SolrCloud
was created because that version of ZooKeeper would fail to start if
the "myid" file was missing or didn't contain a valid server ID. In
order for Solr to be able to control the embedded ZK sufficiently, it
wasn't possible to include the myid file with Solr, so the hack was
needed.

Because SolrCloud uses copied code to parse the zoo.cfg file and start
the embedded ZooKeeper, it will not support ZK features added after
3.2, like snapshot auto-purge.

Recently, ZooKeeper was changed so it will work without a myid file if
there are no "server" lines in the config, so the code hack in
SolrCloud is no longer required. It will take some time for Solr's
code to be changed to take advantage of this.

As far as functionality goes, the embedded ZooKeeper will do fine for
non-HA deployments, but it does mean there will be differences between
your production and non-HA environments in *doing* the deployment, and
in how Solr is configured/started. If that's acceptable to you, and
you do not need advanced ZK features, then the embedded ZK would be
good enough for non-HA environments.

I personally would still use standalone ZK even for a dev environment,
just to reduce the number of things that are different from
production.

Thanks,
Shawn
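
For reference, a standalone (non-HA) ZooKeeper of the kind discussed
above could be configured roughly as in the sketch below. The dataDir
path and the retention values are placeholders chosen for illustration,
not recommendations. The autopurge settings are the snapshot auto-purge
feature that the embedded ZK's copied config parser does not support.

```
# zoo.cfg for a single standalone ZooKeeper node (illustrative values)
tickTime=2000
# Hypothetical path; point this at a real data directory
dataDir=/var/lib/zookeeper
clientPort=2181

# Snapshot auto-purge: keep the 3 most recent snapshots and run the
# purge task every 24 hours (a purgeInterval of 0 disables purging)
autopurge.snapRetainCount=3
autopurge.purgeInterval=24

# No "server.N" lines, so this node runs standalone rather than as
# part of an ensemble
```

Solr would then be pointed at this instance through its zkHost setting,
for example "bin/solr start -c -z localhost:2181", assuming ZooKeeper
runs on the same machine with the clientPort shown above.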