Re: Client fails due to single Zookeeper node failure
Specifically: https://issues.apache.org/jira/browse/ACCUMULO-3218

-Eric

On Fri, Sep 11, 2015 at 8:49 AM, Christopher wrote:

> I believe this was one of the bugs fixed in either 1.6.2 or 1.6.3. There
> was an error parsing the configuration as a list.
>
> On Fri, Sep 11, 2015, 08:28 Brendan Mahoney wrote:
>
>> Hi,
>> We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5
>> cluster. One of the Zookeeper nodes crashed and all Accumulo client
>> connections (including the shell) now fail with:
>>
>> ERROR: java.lang.RuntimeException: Failed to connect to zookeeper
>> (node_11:2181) within 2x zookeeper timeout period 3.
>>
>> If we move the bad Zookeeper node (node_11) to the end of the Zookeeper
>> node list in accumulo-site.xml, clients connect successfully. Is the first
>> Zookeeper node in the list a single-point-of-failure for our Accumulo
>> cluster?
>>
>> Thanks,
>> Brendan
Re: imbalance in number of zookeeper clients
i think ACCUMULO-3218 may have bit us two days ago, due to the first zookeeper in the list being unavailable.

but, after a full restart, i don't think that situation applies - the lighter-load zookeeper was in the middle of the list - 600, 600, 250, 600, 600.

thanks,

On 9/10/15 8:02 PM, dlmarion wrote:

Hey Jeff,

Take a look at [1] and see if the zookeeper balance issue mentioned is applicable.

Dave

[1] https://accumulo.apache.org/release_notes/1.6.2.html

Original message
From: Jeff Turner
Date: 09/10/2015 7:42 PM (GMT-05:00)
To: user@accumulo.apache.org
Subject: imbalance in number of zookeeper clients

sorry if this is a faq. i can't come up with a good google query to find the answer.

how bad is it that four of our five zookeepers have 600-700 clients, and one has about 250?

i assumed that zookeeper or accumulo has some sort of natural rebalancing property, so it will all work itself out.

i've been resisting a full accumulo/zk restart. and restarting the one zookeeper to see what happens has a big unpleasant wake, too.

so
- will they eventually rebalance
- if not, how bad is it that four of them are working harder

thanks,
jeff
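The per-server client counts discussed above (600, 600, 250, 600, 600) can be read directly from each ZooKeeper server. The following is a minimal sketch, assuming the `mntr` four-letter command is reachable on the client port (it exists in ZooKeeper 3.4.0 and later; newer releases may require it to be whitelisted). The helper names `four_letter`, `parse_mntr`, and `connection_count` are hypothetical and not part of Accumulo or ZooKeeper.

```python
import socket


def four_letter(host, port, command, timeout=3.0):
    """Send a ZooKeeper four-letter command and return the full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse tab-separated 'key<TAB>value' lines from `mntr` into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def connection_count(host, port=2181):
    """Return the number of live client connections reported by one server."""
    stats = parse_mntr(four_letter(host, port, b"mntr"))
    return int(stats["zk_num_alive_connections"])
```

Calling `connection_count(host)` for each server in the ensemble gives the same kind of numbers quoted in this thread, so the balance can be rechecked after an upgrade or restart.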
Re: Client fails due to single Zookeeper node failure
I believe this was one of the bugs fixed in either 1.6.2 or 1.6.3. There was an error parsing the configuration as a list.

On Fri, Sep 11, 2015, 08:28 Brendan Mahoney wrote:

> Hi,
> We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5
> cluster. One of the Zookeeper nodes crashed and all Accumulo client
> connections (including the shell) now fail with:
>
> ERROR: java.lang.RuntimeException: Failed to connect to zookeeper
> (node_11:2181) within 2x zookeeper timeout period 3.
>
> If we move the bad Zookeeper node (node_11) to the end of the Zookeeper
> node list in accumulo-site.xml, clients connect successfully. Is the first
> Zookeeper node in the list a single-point-of-failure for our Accumulo
> cluster?
>
> Thanks,
> Brendan
Client fails due to single Zookeeper node failure
Hi,

We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5 cluster. One of the Zookeeper nodes crashed and all Accumulo client connections (including the shell) now fail with:

ERROR: java.lang.RuntimeException: Failed to connect to zookeeper (node_11:2181) within 2x zookeeper timeout period 3.

If we move the bad Zookeeper node (node_11) to the end of the Zookeeper node list in accumulo-site.xml, clients connect successfully. Is the first Zookeeper node in the list a single-point-of-failure for our Accumulo cluster?

Thanks,
Brendan
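To see which entries in the Zookeeper node list are actually serving, each one can be probed independently of Accumulo. Below is a minimal sketch, assuming the list is a plain comma-separated `host:port` string with no chroot suffix and that the `ruok` four-letter command is enabled on the servers. The function names `parse_zk_hosts` and `is_serving` are hypothetical.

```python
import socket


def parse_zk_hosts(connect_string, default_port=2181):
    """Split a comma-separated ZooKeeper host list into (host, port) pairs."""
    servers = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        # Fall back to the default client port when none is given.
        servers.append((host, int(port) if sep else default_port))
    return servers


def is_serving(host, port, timeout=3.0):
    """Send 'ruok'; a healthy ZooKeeper server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Looping over `parse_zk_hosts(...)` with the value from accumulo-site.xml and calling `is_serving(host, port)` on each pair shows whether the crashed node is the only one down. It does not change how the 1.6.1 client itself picks a server from the list.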