I would not run Zookeeper in a container. That seems like a very bad idea.
Each Zookeeper node has an identity. They are not interchangeable.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 31, 2018, at 11:14 AM, Jack Schlederer 
> <jack.schlede...@directsupply.com> wrote:
> 
> Thanks Erick. After some more testing, I'd like to correct the failure case
> we're seeing. It's not when 2 ZK nodes are killed that we have trouble
> recovering, but rather when all 3 ZK nodes that came up when the cluster
> was initially started get killed at some point. Even if it's one at a time,
> and we wait for a new one to spin up and join the cluster before killing
> the next one, we get into a bad state when none of the 3 nodes that were in
> the cluster initially are there anymore, even though they've been replaced
> by our cloud provider spinning up new ZKs. We assign DNS names to the
> ZooKeepers as they spin up, with a 10 second TTL, and those are what get
> set as the ZK_HOST environment variable on the Solr hosts (i.e., ZK_HOST=
> zk1.foo.com:2182,zk2.foo.com:2182,zk3.foo.com:2182). Our working hypothesis
> is that Solr's JVM is caching the IP addresses for the ZK hosts' DNS names
> when it starts up, and for some reason doesn't re-query DNS when it finds
> that the cached IP address is no longer reachable (i.e., when a ZooKeeper node
> dies and spins up at a different IP). Our current trajectory has us finding
> a way to assign known static IPs to the ZK nodes upon startup, and
> assigning those IPs to the ZK_HOST env var, so we can take DNS lookups out
> of the picture entirely.
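> 
> If the cached-DNS theory holds, one thing we may try before going all the
> way to static IPs is shrinking the JVM-level DNS cache on the Solr side.
> A minimal sketch (the TTL values are just guesses, and it has to run before
> the ZooKeeper client resolves anything):
> 
>     import java.security.Security;
> 
>     public class LowerDnsCacheTtl {
>         public static void main(String[] args) {
>             // The JVM caches successful lookups per networkaddress.cache.ttl
>             // (typically 30s without a SecurityManager, forever with one).
>             // Lowering it forces periodic re-resolution of the ZK hostnames.
>             Security.setProperty("networkaddress.cache.ttl", "30");
>             Security.setProperty("networkaddress.cache.negative.ttl", "10");
>             System.out.println(Security.getProperty("networkaddress.cache.ttl"));
>         }
>     }
> 
> The same properties can also be set in the JDK's java.security file, which
> avoids touching code at all.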
> 
> We can reproduce this in our cloud environment, as each ZK node has its own
> IP and DNS name, but it's difficult to reproduce locally due to all the
> ZooKeeper containers having the same IP when running locally (127.0.0.1).
> 
> Please let us know if you have insight into this issue.
> 
> Thanks,
> Jack
> 
> On Fri, Aug 31, 2018 at 10:40 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Jack:
>> 
>> Is it possible to reproduce this "manually"? By that I mean without the
>> chaos testing, by doing the following (there's a small ZK-client sketch
>> after the steps that you can run alongside):
>> 
>> - Start 3 ZK nodes
>> - Create a multi-node, multi-shard Solr collection.
>> - Sequentially stop and start the ZK nodes, waiting for the ZK quorum
>> to recover between restarts.
>> - Solr does not reconnect to the restarted ZK node and will think it's
>> lost quorum after the second node is restarted.
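>> 
>> To see whether the client side ever re-establishes its session, you could
>> run something like this alongside Solr while you bounce the ZK nodes, using
>> your example connect string (rough, untested sketch; it just logs session
>> state changes):
>> 
>>     import java.util.concurrent.CountDownLatch;
>>     import org.apache.zookeeper.WatchedEvent;
>>     import org.apache.zookeeper.ZooKeeper;
>> 
>>     public class ZkSessionWatcher {
>>         public static void main(String[] args) throws Exception {
>>             String connect =
>>                 "zk1.foo.com:2182,zk2.foo.com:2182,zk3.foo.com:2182";
>>             // The default watcher gets SyncConnected / Disconnected /
>>             // Expired events for the session as ensemble members come and go.
>>             ZooKeeper zk = new ZooKeeper(connect, 15000, (WatchedEvent e) ->
>>                 System.out.println(System.currentTimeMillis() + " " + e.getState()));
>>             new CountDownLatch(1).await(); // run until killed, watch the output
>>         }
>>     }
>> 
>> If it keeps reporting Disconnected across the restarts instead of going back
>> to SyncConnected, that points at the client side rather than at Solr's
>> cluster-state handling.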
>> 
>> bq. Kill 2, however, and we lose the quorum and we have
>> collections/replicas that appear as "gone" on the Solr Admin UI's
>> cloud graph display.
>> 
>> It's odd that replicas appear as "gone", and suggests that your ZK
>> ensemble is possibly not correctly configured, although exactly how is
>> a mystery. Solr pulls its picture of the topology of the network from
>> ZK, establishes watches and the like. For most operations, Solr
>> doesn't even ask ZooKeeper for anything, since its picture of the
>> cluster is stored locally. ZK's job is to inform the various Solr nodes
>> when the topology changes, i.e. _Solr_ nodes change state. For
>> querying and indexing, there's no ZK involved at all. Even if _all_
>> ZooKeeper nodes disappear, Solr should still be able to talk to other
>> Solr nodes and shouldn't show them as down just because it can't talk
>> to ZK. Indeed, querying should be OK although indexing will fail if
>> quorum is lost.
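>> 
>> A quick way to check whether search itself is still alive while ZK is down
>> is to query one Solr node directly over HTTP instead of through a ZK-aware
>> client. Rough sketch, host and collection names made up:
>> 
>>     import org.apache.solr.client.solrj.SolrQuery;
>>     import org.apache.solr.client.solrj.impl.HttpSolrClient;
>>     import org.apache.solr.client.solrj.response.QueryResponse;
>> 
>>     public class DirectQuery {
>>         public static void main(String[] args) throws Exception {
>>             // Talks to a single Solr node; ZooKeeper is not involved in
>>             // routing this request.
>>             try (HttpSolrClient solr = new HttpSolrClient.Builder(
>>                     "http://solr1.foo.com:8983/solr/mycollection").build()) {
>>                 QueryResponse rsp = solr.query(new SolrQuery("*:*"));
>>                 System.out.println("numFound=" + rsp.getResults().getNumFound());
>>             }
>>         }
>>     }
>> 
>> If that works while the admin UI shows replicas as "gone", that would
>> suggest the cluster-state picture is stale rather than the data path being
>> broken.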
>> 
>> But you say you see the restarted ZK nodes rejoin the ZK ensemble, so
>> the ZK config seems right. Is there any chance your chaos testing
>> "somehow" restarts the ZK nodes with any changes to the configs?
>> Shooting in the dark here.
>> 
>> For a replica to be "gone", the host node should _also_ be removed
>> from the "live_nodes" znode. Hmmm. I do wonder if what you're
>> observing is a consequence of both killing ZK nodes and Solr nodes.
>> I'm not saying this is what _should_ happen, just trying to understand
>> what you're reporting.
>> 
>> My theory here is that your chaos testing kills some Solr nodes and
>> that fact is correctly propagated to the remaining Solr nodes. Then
>> your ZK nodes are killed and somehow Solr doesn't reconnect to ZK
>> appropriately, so its picture of the cluster has the node as
>> permanently down. Then you restart the Solr node and that information
>> isn't propagated to the Solr nodes since they didn't reconnect. If
>> that were the case, then I'd expect the admin UI to correctly show the
>> state of the cluster when hit on a Solr node that has never been
>> restarted.
>> 
>> As you can tell, I'm using something of a scattergun approach here b/c
>> this isn't what _should_ happen given what you describe.
>> Theoretically, all the ZK nodes should be able to go away and come
>> back and Solr reconnect...
>> 
>> As an aside, if you are ever in the code you'll see that for a replica
>> to be usable, it must both have its state set to "active" _and_ have its
>> node present in the live_nodes ephemeral znode.
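>> 
>> Roughly, from the client side, that check looks like this (untested sketch;
>> the collection name is made up and the SolrJ builder details may differ a
>> bit across 7.x versions):
>> 
>>     import java.util.Arrays;
>>     import java.util.Optional;
>>     import java.util.Set;
>>     import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>     import org.apache.solr.common.cloud.ClusterState;
>>     import org.apache.solr.common.cloud.DocCollection;
>>     import org.apache.solr.common.cloud.Replica;
>> 
>>     public class ReplicaUsability {
>>         public static void main(String[] args) throws Exception {
>>             try (CloudSolrClient solr = new CloudSolrClient.Builder(
>>                     Arrays.asList("zk1.foo.com:2182", "zk2.foo.com:2182",
>>                             "zk3.foo.com:2182"),
>>                     Optional.empty()).build()) {
>>                 solr.connect();
>>                 ClusterState state = solr.getZkStateReader().getClusterState();
>>                 Set<String> liveNodes = state.getLiveNodes();
>>                 DocCollection coll = state.getCollection("mycollection");
>>                 for (Replica r : coll.getReplicas()) {
>>                     // Usable only if ACTIVE *and* its node is in live_nodes.
>>                     boolean usable = r.getState() == Replica.State.ACTIVE
>>                             && liveNodes.contains(r.getNodeName());
>>                     System.out.println(r.getName() + " usable=" + usable);
>>                 }
>>             }
>>         }
>>     }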
>> 
>> Is there any chance you could try the manual steps above (AWS isn't
>> necessary here) and let us know what happens? And if we can get a
>> reproducible set of steps, feel free to open a JIRA.
>> On Thu, Aug 30, 2018 at 10:11 PM Jack Schlederer
>> <jack.schlede...@directsupply.com> wrote:
>>> 
>>> We run a 3 node ZK cluster, but I'm not concerned about 2 nodes failing at
>>> the same time. Our chaos process only kills approximately one node per
>>> hour, and our cloud service provider automatically spins up another ZK node
>>> when one goes down. All 3 ZK nodes are back up within 2 minutes, talking to
>>> each other and syncing data. It's just that Solr doesn't seem to recognize
>>> it. We'd have to restart Solr to get it to recognize the new Zookeepers,
>>> which we can't do without taking downtime or losing data that's stored on
>>> non-persistent disk within the container.
>>> 
>>> The ZK_HOST environment variable lists all 3 ZK nodes.
>>> 
>>> We're running ZooKeeper version 3.4.13.
>>> 
>>> Thanks,
>>> Jack
>>> 
>>> On Thu, Aug 30, 2018 at 4:12 PM Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>> 
>>>> How many Zookeeper nodes in your ensemble? A quorum needs a strict
>>>> majority (floor(N/2) + 1), so you need five nodes to handle two failures;
>>>> three only tolerates one.
>>>> 
>>>> Are your Solr instances started with a zkHost that lists all five
>>>> Zookeeper nodes?
>>>> 
>>>> What version of Zookeeper?
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On Aug 30, 2018, at 1:45 PM, Jack Schlederer
>>>>> <jack.schlede...@directsupply.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> My team is attempting to spin up a SolrCloud cluster with an external
>>>>> ZooKeeper ensemble. We're trying to engineer our solution to be HA and
>>>>> fault-tolerant such that we can lose either 1 Solr instance or 1 ZooKeeper
>>>>> and not take downtime. We use chaos engineering to randomly kill instances
>>>>> to test our fault-tolerance. Killing Solr instances seems to be solved, as
>>>>> we use a high enough replication factor and Solr's built-in autoscaling to
>>>>> ensure that new Solr nodes added to the cluster get the replicas that were
>>>>> lost from the killed node. However, ZooKeeper seems to be a different
>>>>> story. We can kill 1 ZooKeeper instance and still maintain quorum, and
>>>>> everything is good. It comes back and starts participating in leader
>>>>> elections, etc. Kill 2, however, and we lose the quorum and we have
>>>>> collections/replicas that appear as "gone" on the Solr Admin UI's cloud
>>>>> graph display, and we get Java errors in the log reporting that collections
>>>>> can't be read from ZK. This means we aren't servicing search requests. We
>>>>> found an open JIRA that reports this same issue, but its only affected
>>>>> version is 5.3.1. We are experiencing this problem in 7.3.1. Has there been
>>>>> any progress or potential workarounds on this issue since?
>>>>> 
>>>>> Thanks,
>>>>> Jack
>>>>> 
>>>>> Reference:
>>>>> https://issues.apache.org/jira/browse/SOLR-8868
>>>> 
>>>> 
>> 
