On 5/20/14, 10:21 AM, thomasa wrote:
I was worried about how many connections would be open on the larger cloud,
so I significantly reduced the number of YARN processes. Side question: does
each worker node have a connection with every other node?

Are you referring to the YARN processes or Accumulo processes? For YARN, I believe the container will primarily be communicating back to the RM for MapReduce, but a custom app could be doing anything.

For Accumulo, a tserver will mostly be communicating only with the master. This isn't entirely true, though: for example, tservers will communicate with other tservers as part of bulk imports.

If they did, my
guess was that there would be significantly more open connections on a 150+
node cloud than a 40 node cloud. For that reason, I only have 2 YARN
processes with 2gb memory each on the larger cloud that is seeing the
issues. My thought was that each YARN process needs a core, the tablet
server needs a core, and OS stuff could probably use a core.

Yes, you should most definitely be leaving headroom on a system for the operating system. A core and 1G of RAM is probably a good starting point, but YMMV.
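If you want to enforce that headroom rather than just plan for it, you can cap what the NodeManager is allowed to hand out to containers. A minimal yarn-site.xml sketch (the values below are illustrative, assuming an 8GB/4-core node where you reserve roughly a core and 1G for the OS and tserver):

```xml
<!-- yarn-site.xml: cap NodeManager container resources below the
     physical totals so the OS and tserver keep their share.
     Values are examples, not recommendations. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
```

YARN will then never schedule more than 4G/2 vcores of containers on that node, regardless of what jobs request.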


To increase the ZooKeeper timeout, you can try this, but it will have other implications, such as failure detection/recovery being slower:

In accumulo-site.xml: set instance.zookeeper.timeout equal to something like 45s or 60s (default is 30s as Dave mentioned earlier).
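Concretely, that accumulo-site.xml entry would look something like this (using 60s as an example value):

```xml
<!-- accumulo-site.xml: how long a tserver's ZooKeeper session can be
     unresponsive before it is considered dead (default 30s). -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>60s</value>
</property>
```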

In zoo.cfg: set maxSessionTimeout equal to the above, but in milliseconds, e.g. 45000 or 60000.
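And the matching zoo.cfg line, keeping the two values in sync (ZooKeeper will silently clamp session timeouts that exceed maxSessionTimeout, so if this is left at its default the Accumulo setting above won't take full effect):

```properties
# zoo.cfg: upper bound on client session timeouts, in milliseconds.
# Must be >= the instance.zookeeper.timeout configured in Accumulo.
maxSessionTimeout=60000
```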
