On Fri, Mar 7, 2014 at 12:15 PM, Josh Elser <[email protected]> wrote:

> On 3/7/14, 12:01 PM, Terry P. wrote:
>
>> Greetings folks,
>> It seems network woes will never go away for this Accumulo 1.4.2 project
>> :-(
>>
>> They rebooted one of the two "redundant switches" last night, but of
>> course zero redundancy actually took place and the Master lost his
>> zookeeper lock as did one of the Datanodes after 60 seconds and shut
>> itself down.
>>
>
> By datanode you mean tserver? Hadoop datanodes don't communicate with
> ZooKeeper.
>
>
>> The 60 second period is odd, because I see that
>> instance.zookeeper.timeout is actually set to 30s, but I do recall that
>> often by default zookeeper clients retry 2 times before bailing so maybe
>> that's why.
>>
>
> It won't always be 30s before it's seen; I've seen it much quicker too.
> I'm not sure about the retries off the top of my head.


Most likely you were seeing the effects of ACCUMULO-1572, in which a
ZooKeeper disconnect causes an Accumulo failure before the session has
expired.  This is fixed in 1.5.1 and in the to-be-released 1.4.5.  If you
think you're seeing something else, it would be good to hear about it.
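
To make the disconnect-versus-expiration distinction concrete, here is a
rough Python sketch. It is not Accumulo's code and not the ZooKeeper client
API: SessionTracker, on_disconnect, on_reconnect and lock_lost are
hypothetical names, and the 30 second default only mirrors the
instance.zookeeper.timeout value quoted earlier in the thread. It is also a
simplification, since a real ZooKeeper client only learns that its session
expired after it reconnects to the ensemble. The point is just that a
disconnect by itself need not be fatal; only a full timeout without
reconnecting should be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionTracker:
    """Toy model of a client session; every name here is hypothetical."""
    timeout_s: float = 30.0                  # assumed, mirrors the 30s timeout
    disconnected_at: Optional[float] = None  # None while connected

    def on_disconnect(self, now: float) -> None:
        # Remember when the connection dropped, keeping the earliest drop.
        if self.disconnected_at is None:
            self.disconnected_at = now

    def on_reconnect(self, now: float) -> None:
        # Reconnecting within the timeout keeps the session (and lock) alive.
        if not self.lock_lost(now):
            self.disconnected_at = None

    def lock_lost(self, now: float) -> bool:
        # A disconnect alone is not fatal; only a full timeout without
        # reconnecting is treated as losing the session.
        if self.disconnected_at is None:
            return False
        return now - self.disconnected_at >= self.timeout_s


tracker = SessionTracker()
tracker.on_disconnect(now=0.0)
print(tracker.lock_lost(now=10.0))   # brief outage, shorter than the timeout
tracker.on_reconnect(now=12.0)
tracker.on_disconnect(now=100.0)
print(tracker.lock_lost(now=131.0))  # outage longer than the timeout
```

Timestamps are passed in explicitly so the behaviour can be checked without
waiting on a real clock.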
