Re: Client fails due to single Zookeeper node failure

2015-09-11 Thread Eric Newton
Specifically: https://issues.apache.org/jira/browse/ACCUMULO-3218

-Eric


On Fri, Sep 11, 2015 at 8:49 AM, Christopher  wrote:

> I believe this was one of the bugs fixed in either 1.6.2 or 1.6.3. There
> was an error parsing the configuration as a list.
>
> On Fri, Sep 11, 2015, 08:28 Brendan Mahoney  wrote:
>
>> Hi,
>>   We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5
>> cluster.   One of the Zookeeper nodes crashed and all Accumulo client
>> connections (including the shell) now fail with:
>>
>> ERROR: java.lang.RuntimeException: Failed to connect to zookeeper
>> (node_11:2181) within 2x zookeeper timeout period 3.
>>
>>  If we move the bad Zookeeper node (node_11) to the end of the Zookeeper
>> node list in accumulo-site.xml, clients connect successfully.  Is the first
>> Zookeeper node in the list a single-point-of-failure for our Accumulo
>> cluster?
>>
>> Thanks,
>>Brendan
>>
>


Re: imbalance in number of zookeeper clients

2015-09-11 Thread Jeff Turner
i think ACCUMULO-3218 may have bitten us two days ago, due to the first zookeeper
in the list being unavailable.

but, after a full restart, i don't think that situation applies - the lighter-loaded zookeeper
was in the middle of the list - 600, 600, 250, 600, 600.

thanks,


On 9/10/15 8:02 PM, dlmarion wrote:

Hey Jeff,

Take a look at [1] and see if the zookeeper balance issue mentioned is applicable.


Dave

[1] https://accumulo.apache.org/release_notes/1.6.2.html


 Original message 
From: Jeff Turner 
Date: 09/10/2015 7:42 PM (GMT-05:00)
To: user@accumulo.apache.org
Subject: imbalance in number of zookeeper clients

sorry if this is a faq.  i can't come up with a good google query to
find the answer.

how bad is it that four of our five zookeepers have 600-700 clients, and one
has about 250?

i assumed that zookeeper or accumulo has some sort of natural rebalancing
property, so it will all work itself out.

i've been resisting a full accumulo/zk restart.
and restarting the one zookeeper to see what happens has a big
unpleasant wake, too.

so
   - will they eventually rebalance
   - if not, how bad is it that four of them are working harder

thanks,
jeff
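A rough way to reason about this question: to our understanding, a ZooKeeper 3.4 client handed the full ensemble list shuffles it, picks one server for its session, and stays there until that connection drops, so counts only shift as clients reconnect. The sketch below is a minimal simulation under those assumptions, not Accumulo or ZooKeeper code; the host names zk1 to zk5 and the 3000-client total are made up. It contrasts random selection with the ACCUMULO-3218 symptom, where every client effectively sees only the first host.

```python
import random
from typing import Callable, Dict, List

# Hypothetical ensemble; these names are illustrative, not from the thread.
HOSTS = ["zk1", "zk2", "zk3", "zk4", "zk5"]

def pick_random(hosts: List[str], rng: random.Random) -> str:
    """Assumed healthy case: each client lands on a uniformly random server."""
    return rng.choice(hosts)

def pick_first(hosts: List[str], rng: random.Random) -> str:
    """Buggy case: the client only ever sees the first host in the list."""
    return hosts[0]

def distribute(num_clients: int, hosts: List[str],
               pick: Callable[[List[str], random.Random], str],
               seed: int = 0) -> Dict[str, int]:
    """Count how many simulated client sessions end up on each server."""
    rng = random.Random(seed)
    counts: Dict[str, int] = {}
    for _ in range(num_clients):
        host = pick(hosts, rng)
        counts[host] = counts.get(host, 0) + 1
    return counts

if __name__ == "__main__":
    print("random:", distribute(3000, HOSTS, pick_random))  # each count near 600
    print("first :", distribute(3000, HOSTS, pick_first))   # {'zk1': 3000}
```

Neither model produces one light server in the middle of the list, which fits Jeff's later conclusion that ACCUMULO-3218 did not explain the pattern after the restart.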


Re: Client fails due to single Zookeeper node failure

2015-09-11 Thread Christopher
I believe this was one of the bugs fixed in either 1.6.2 or 1.6.3. There
was an error parsing the configuration as a list.

On Fri, Sep 11, 2015, 08:28 Brendan Mahoney  wrote:

> Hi,
>   We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5
> cluster.   One of the Zookeeper nodes crashed and all Accumulo client
> connections (including the shell) now fail with:
>
> ERROR: java.lang.RuntimeException: Failed to connect to zookeeper
> (node_11:2181) within 2x zookeeper timeout period 3.
>
>  If we move the bad Zookeeper node (node_11) to the end of the Zookeeper
> node list in accumulo-site.xml, clients connect successfully.  Is the first
> Zookeeper node in the list a single-point-of-failure for our Accumulo
> cluster?
>
> Thanks,
>Brendan
>
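Christopher's explanation can be illustrated with a small, hypothetical sketch of how a comma-separated host list can be mangled when a configuration layer splits it into a list and a caller then reads it back as a single value. This is not Accumulo's code; the exact mechanism inside 1.6.1 is an assumption here, chosen because it reproduces Brendan's symptom of only the first host being used. The function names and the hosts after node_11 are made up.

```python
from typing import List

def parse_as_list(raw: str) -> List[str]:
    """Split a comma-separated property value into trimmed entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]

def read_hosts_buggy(raw: str) -> str:
    # Hypothetical bug: the value was stored as a list, but the caller
    # treats it as a scalar and keeps only the first element.
    return parse_as_list(raw)[0]

def read_hosts_fixed(raw: str) -> str:
    # Re-join every entry so the ZooKeeper client receives the whole ensemble.
    return ",".join(parse_as_list(raw))

if __name__ == "__main__":
    # node_12 to node_15 are hypothetical; only node_11 appears in the thread.
    value = "node_11:2181, node_12:2181, node_13:2181, node_14:2181, node_15:2181"
    print(read_hosts_buggy(value))  # node_11:2181
    print(read_hosts_fixed(value))  # node_11:2181,node_12:2181,node_13:2181,node_14:2181,node_15:2181
```

With the buggy reader, node_11 is the only server the client ever tries, which matches the report that moving node_11 to the end of the list restored connectivity.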


Client fails due to single Zookeeper node failure

2015-09-11 Thread Brendan Mahoney
Hi,
  We have an Accumulo v1.6.1 cluster with a 5-node Zookeeper v3.4.5
cluster.   One of the Zookeeper nodes crashed and all Accumulo client
connections (including the shell) now fail with:

ERROR: java.lang.RuntimeException: Failed to connect to zookeeper
(node_11:2181) within 2x zookeeper timeout period 3.

 If we move the bad Zookeeper node (node_11) to the end of the Zookeeper
node list in accumulo-site.xml, clients connect successfully.  Is the first
Zookeeper node in the list a single-point-of-failure for our Accumulo
cluster?

Thanks,
   Brendan
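Brendan's single-point-of-failure question comes down to what the client is actually given. A simplified sketch, assuming a client that shuffles its server list (as the ZooKeeper 3.4 client does with its connect string, to our understanding) and tries each entry once; the real client also retries and applies timeouts, which are omitted here. With the full list, losing node_11 is survivable; with a one-entry list, it is fatal. connect_to_ensemble and the extra host names are hypothetical.

```python
import random
from typing import List, Set

def connect_to_ensemble(hosts: List[str], down: Set[str], rng: random.Random) -> str:
    """Return the first reachable host after shuffling; raise if none respond."""
    candidates = list(hosts)
    rng.shuffle(candidates)  # spread sessions across the ensemble
    for host in candidates:
        if host not in down:
            return host
    raise RuntimeError("Failed to connect to zookeeper (" + ",".join(hosts) + ")")

if __name__ == "__main__":
    rng = random.Random(42)
    down = {"node_11:2181"}  # the crashed server from the report
    full = ["node_11:2181", "node_12:2181", "node_13:2181", "node_14:2181", "node_15:2181"]
    print(connect_to_ensemble(full, down, rng))  # some host other than node_11
    try:
        connect_to_ensemble(["node_11:2181"], down, rng)
    except RuntimeError as err:
        print(err)  # Failed to connect to zookeeper (node_11:2181)
```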