Re: Losing tservers - Unusually high Last Contact times

Keith Turner Mon, 19 May 2014 18:21:00 -0700

On Mon, May 19, 2014 at 6:56 PM, <[email protected]> wrote:

> You are hitting the zookeeper timeout, default 30s I believe. You said you
> are not oversubscribed for memory, but what about CPU? Are you running YARN
> processes on the same nodes as the tablet servers? Is the tablet server
> being pushed into swap or starved of CPU?
>


Also check on the zookeeper server nodes.  Is Java GC pausing tservers or
zookeeper servers?
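
One rough way to check for GC pauses, assuming the tservers and zookeeper servers were started with -XX:+PrintGCDetails so each GC event logs a "real=... secs" wall-clock time. The function name and the 10 second threshold are just illustrative choices, not anything from Accumulo. This is a sketch that flags any pause large enough to eat a good share of the 30s timeout:

```python
import re
import sys

# Matches the wall-clock part of a HotSpot -XX:+PrintGCDetails line, e.g.
# "[Times: user=0.12 sys=0.01, real=0.05 secs]"
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_gc_pauses(lines, threshold_secs=10.0):
    """Return (line_number, seconds) for each GC event at or above the threshold.

    threshold_secs is an arbitrary illustrative value; pick something that is
    a meaningful fraction of your zookeeper session timeout.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                hits.append((number, seconds))
    return hits


if __name__ == "__main__":
    # Usage: python gc_pauses.py /path/to/gc.log
    with open(sys.argv[1]) as log:
        for number, seconds in long_gc_pauses(log):
            print("line %d: %.2f s pause" % (number, seconds))
```

If this turns up pauses near or above the timeout on either the tservers or the zookeeper servers, that would point at GC rather than the network.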


>
> -----Original Message-----
> From: thomasa [mailto:[email protected]]
> Sent: Monday, May 19, 2014 4:22 PM
> To: [email protected]
> Subject: Losing tservers - Unusually high Last Contact times
>
> Hello all,
>
> I am having issues with tablet servers going down due to poor contact times
> (my hypothesis at least). In the past I have had stability success with
> smaller clouds (20-40 nodes), but have run into issues with a larger number
> of nodes (150+). Each node is a datanode, nodemanager, and tablet server.
> There is a master node that is running the hadoop namenode, hadoop resource
> manager and accumulo master, monitor, etc. There are three zookeeper nodes.
> All nodes are vms. This same setup is used on the smaller, stable clouds as
> well.
>
> I do not believe memory allocation is an issue as I have only given
> hadoop/yarn (2.2.0) and accumulo (1.5.1) less than half of the available
> memory. The FATAL errors I have seen are:
>
> Lost tablet server lock (reason = SESSION_EXPIRED), exiting
>
> Lost ability to monitor tablet server lock, exiting
>
> Other than bumping up the rpc timeout (which I have done, but I would rather
> find the root cause of the problem instead), I have run out of ideas on how
> to solve this issue.
>
> Does anyone have any insight into why I would be seeing such bad response
> times between nodes? Are there any configuration parameters I can play with
> to fix this?
>
> I realize this is a very general question, so let me know if there is any
> information I can provide to help clarify the issue.
>
> Thank you in advance for your time.
>
> Thomas
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Losing-tservers-Unusually-high-Last-Contact-times-tp9950.html
> Sent from the Users mailing list archive at Nabble.com.
>
>
