On 5/20/14, 10:21 AM, thomasa wrote:
I was worried about how many connections would be open on the larger cloud, so I significantly reduced the number of YARN processes. Side question: does each worker node have a connection with every other node?
Are you referring to the YARN processes or Accumulo processes? For YARN, I believe the container will primarily be communicating back to the RM for MapReduce, but a custom app could be doing anything.
For Accumulo, a tserver will mostly be communicating only with the master. I know this isn't entirely true, though. For example, tservers will communicate with other tservers as a part of bulk-importing.
If they did, my guess was that there would be significantly more open connections on a 150+ node cloud than a 40 node cloud. For that reason, I only have 2 YARN processes with 2gb memory each on the larger cloud that is seeing the issues. My thought was that each YARN process needs a core, the tablet server needs a core, and OS stuff could probably use a core.
Yes, you should most definitely be leaving headroom on a system for the operating system. A core and 1G of RAM is probably a good starting point, but YMMV.
To increase the ZooKeeper timeout, you can try the following, but it will have other implications, such as failure detection/recovery being slower:
In accumulo-site.xml: set instance.zookeeper.timeout equal to something like 45s or 60s (default is 30s as Dave mentioned earlier).
In zoo.cfg: set maxSessionTimeout to the same value, but in milliseconds, e.g. 45000 or 60000.
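Concretely, the two changes above would look something like this (the 60s value is just an illustration; pick whatever timeout fits your cluster):

```xml
<!-- accumulo-site.xml: raise Accumulo's ZooKeeper session timeout
     (60s is an illustrative value; default is 30s) -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>60s</value>
</property>
```

```
# zoo.cfg: allow the ZooKeeper server to grant sessions up to the
# same timeout, expressed in milliseconds
maxSessionTimeout=60000
```

Note that both sides need to agree: if zoo.cfg's maxSessionTimeout is lower than what Accumulo requests, ZooKeeper will negotiate the session down to its own maximum.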
