From: Michael Chen <[email protected]>
Sent: Wednesday, August 16, 2017 3:47 PM
To: [email protected]; [email protected]; [email protected]
Subject: Re: Error connecting to ZooKeeper server
Also, the cluster is on AWS. Security group set to allow all inbound and
outbound traffic...
MG>Can you verify that ALL inbound ports and ALL outbound ports are enabled,
and confirm with netstat -lpn which ports are actually listening?
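
A complementary check from the slave side, as a minimal sketch only: the script below opens a TCP connection to a given host on port 2181 and sends ZooKeeper's "ruok" four-letter command, to which a healthy server normally answers "imok". The function name probe_zookeeper and the host names passed on the command line are placeholders, not anything taken from this cluster, and newer ZooKeeper releases may require "ruok" to be whitelisted before they answer.

```python
import socket
import sys


def probe_zookeeper(host, port=2181, timeout=3.0):
    """Return the server's reply to 'ruok', or a short 'failed: ...' string."""
    try:
        # create_connection resolves the name and applies the timeout
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")   # ZooKeeper four-letter health command
            reply = sock.recv(16)   # a healthy server is expected to send b"imok"
            return reply.decode("ascii", errors="replace") or "connected, empty reply"
    except OSError as exc:          # refused, timed out, DNS failure, ...
        return f"failed: {exc}"


if __name__ == "__main__":
    # Usage (host names are placeholders):
    #   python probe_zk.py localhost <master-host>
    for target in sys.argv[1:]:
        print(target, "->", probe_zookeeper(target))
```

Run on a slave, a "failed: ... Connection refused" result for localhost together with "imok" for the master would suggest the network path is fine and the client is simply being pointed at the wrong address.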
Any ideas?...
MG>To eliminate AWS as the culprit, what happens when you disable the
problematic AWS Security Group?
https://groups.google.com/forum/#!topic/chronos-scheduler/ys77mol0aWQ
(Google Groups thread: "AWS Security Group settings for Chronos Cluster")
On 08/16/2017 12:37 PM, Michael Chen wrote:
>
> Hi,
>
> I've run into a ZooKeeper connection error while running a
> Nutch Hadoop job. The tasks stall on a connection error to the ZooKeeper
> server. Here's what I know:
>
> 1. The ZK connection error is the only known problem; other logs report no
> issues.
>
> 2. The error message from the YARN NodeManager on one of the slaves is:
>
> 2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)]
> org.apache.zookeeper.ClientCnxn: Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error)
> 2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)]
> org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected
> error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>
> The connection keeps failing until it hits the 10-minute limit and the
> task fails.
>
> 3. The ZooKeeper server is deployed only on the master.
>
> 4. The cluster is managed by Cloudera Manager 5.12.
>
> Could a configuration on the Nutch side or the Cloudera Manager side be
> missing? There are no ZK servers on the slaves, so the NodeManager
> should be connecting to the ZK server on the master instead of
> localhost:2181.
>
> Any suggestion or help is greatly appreciated!
>
> Thank you,
>
> Michael
>