Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Now I'm seeing following error in controller log. Restarting the controller
fixed the issue. We are time to time seeing this in controller with zk
connection issues. Is this also something to do with zk client version?

2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN
 o.apache.zookeeper.ClientCnxn  - Session 0x16b0ebbee1d000e for server
localhost/127.0.0.1:2181, unexpected error, closing socket connection and
attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)

Thanks
Dimuthu

On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha 
wrote:

> Hi Lei,
>
> We use 0.8.2. We initially had 0.8.4 but it contains an issue with task
> retry logic so we downgraded to 0.8.2. We are planning to go into
> production with 0.8.2 by next week so can you please advice a better way to
> solve this without upgrading to 0.8.4.
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 1:04 PM Lei Xia  wrote:
>
>> Which Helix version do you use?  This may caused by this Zookeeper bug (
>> https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
>> ZkClient in later Helix versions.
>>
>>
>> Lei
>>
>> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com> wrote:
>>
>>> Hi Folks,
>>>
>>> I'm getting following error in controller log and seems like controller
>>> is
>>> not moving froward after that point
>>>
>>> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
>>> Starting helix controller
>>> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
>>> Settings loaded from
>>>
>>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
>>> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>>> Connection to helix cluster : AiravataDemoCluster with name :
>>> helixcontroller2
>>> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>>> Zookeeper connection string localhost:2181
>>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
>>> o.a.h.c.GenericHelixController  - Exception while executing
>>> DEFAULTpipeline:
>>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
>>> cluster .AiravataDemoCluster. Will not continue to next pipeline
>>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
>>> get
>>> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
>>> at
>>>
>>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
>>> at
>>>
>>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
>>> at
>>>
>>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
>>> at
>>>
>>> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
>>> at
>>>
>>> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
>>> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
>>> at
>>>
>>> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
>>> at
>>>
>>> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
>>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
>>> Fail to read nodes for
>>> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
>>> at
>>>
>>> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
>>> at
>>>
>>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
>>> at
>>>
>>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
>>> ... 7 common frames omitted
>>>
>>> In the zookeeper log I can see following warning getting printed
>>> continuously. What could be the reason for that? I'm using helix 0.8.2
>>> and
>>> zookeeper 3.4.8
>>>
>>> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
>>> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f
>>> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
>>> connection
>>> from /127.0.0.1:57984
>>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
>>> 

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Lei,

We use 0.8.2. We initially had 0.8.4 but it contains an issue with task
retry logic so we downgraded to 0.8.2. We are planning to go into
production with 0.8.2 by next week so can you please advice a better way to
solve this without upgrading to 0.8.4.

Thanks
Dimuthu

On Fri, May 31, 2019 at 1:04 PM Lei Xia  wrote:

> Which Helix version do you use?  This may caused by this Zookeeper bug (
> https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
> ZkClient in later Helix versions.
>
>
> Lei
>
> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I'm getting following error in controller log and seems like controller is
>> not moving froward after that point
>>
>> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
>> Starting helix controller
>> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
>> Settings loaded from
>>
>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
>> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> Connection to helix cluster : AiravataDemoCluster with name :
>> helixcontroller2
>> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> Zookeeper connection string localhost:2181
>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
>> o.a.h.c.GenericHelixController  - Exception while executing
>> DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for
>> cluster .AiravataDemoCluster. Will not continue to next pipeline
>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
>> get
>> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
>> at
>>
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
>> at
>>
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
>> at
>>
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
>> at
>>
>> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
>> at
>>
>> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
>> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
>> at
>>
>> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
>> at
>>
>> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
>> Fail to read nodes for
>> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
>> at
>>
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
>> at
>>
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
>> at
>>
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
>> ... 7 common frames omitted
>>
>> In the zookeeper log I can see following warning getting printed
>> continuously. What could be the reason for that? I'm using helix 0.8.2 and
>> zookeeper 3.4.8
>>
>> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
>> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f
>> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
>> connection
>> from /127.0.0.1:57984
>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
>> session 0x16b0e59877f at /127.0.0.1:57984
>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
>> 0x16b0e59877f with negotiated timeout 3 for client /
>> 127.0.0.1:57984
>> 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>> EndOfStreamException: Unable to read additional data from client sessionid
>> 0x16b0e59877f, likely client has closed socket
>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
>> at
>>
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> Thanks
>> Dimuthu
>>
>
>
> --
> Lei Xia
>


Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread Lei Xia
Which Helix version do you use?  This may caused by this Zookeeper bug (
https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
ZkClient in later Helix versions.


Lei

On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha 
wrote:

> Hi Folks,
>
> I'm getting following error in controller log and seems like controller is
> not moving froward after that point
>
> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
> Starting helix controller
> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
> Settings loaded from
>
> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> Connection to helix cluster : AiravataDemoCluster with name :
> helixcontroller2
> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> Zookeeper connection string localhost:2181
> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
> o.a.h.c.GenericHelixController  - Exception while executing
> DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for
> cluster .AiravataDemoCluster. Will not continue to next pipeline
> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get
> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> at
>
> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> at
>
> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> at
>
> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> at
>
> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
> Fail to read nodes for
> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
> ... 7 common frames omitted
>
> In the zookeeper log I can see following warning getting printed
> continuously. What could be the reason for that? I'm using helix 0.8.2 and
> zookeeper 3.4.8
>
> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f
> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
> connection
> from /127.0.0.1:57984
> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
> session 0x16b0e59877f at /127.0.0.1:57984
> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
> 0x16b0e59877f with negotiated timeout 3 for client /
> 127.0.0.1:57984
> 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x16b0e59877f, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> at java.lang.Thread.run(Thread.java:748)
>
> Thanks
> Dimuthu
>


-- 
Lei Xia


Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Kishore,

Adding -Djute.maxbuffer=49107800 fixed the issue but now I can see a whole
lot of logs printing with following line and participant is executing a
bulk of Tasks once in a while with around 5 minute delay in between.

2019-05-31 12:45:58,804 [GenericHelixController-event_process] WARN
 o.a.h.t.a.AssignableInstance  - AssignableInstance does not have enough
capacity for quotaType: DEFAULT. Task:
Task_TASK_57f87021-bedb-415a-bf1f-ddfd17641f11, quotaType: DEFAULT,
Instance name: helixparticipant3. Current capacity: 40 capacity needed to
schedule: 40

Thanks
Dimuthu

On Fri, May 31, 2019 at 12:25 PM DImuthu Upeksha 
wrote:

> Hi Kishore,
>
> Please find below log and I think the issue is "Packet len4194362 is out
> of range!". Currently we have around 1 unprocessed workflows. Will that
> be the reason for that?
>
> 2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
> 2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN
>  o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
> localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection
> and attempting reconnect
> java.io.IOException: Packet len4194362 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
> 2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)
> 2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Opening socket connection to server
> localhost/127.0.0.1:2181
> 2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Socket connection established to
> localhost/127.0.0.1:2181, initiating session
> 2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Session establishment complete on server
> localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated
> timeout = 3
> 2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
> 2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN
>  o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
> localhost/127.0.0.1:2181, unexpected error, closing socket connection and
> attempting reconnect
> java.io.IOException: Packet len4194362 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
> 2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 11:55 AM kishore g  wrote:
>
>> can you grep for zookeeper state in controller log.
>>
>> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com>
>> wrote:
>>
>> > Hi Folks,
>> >
>> > I'm getting following error in controller log and seems like controller
>> is
>> > not moving froward after that point
>> >
>> > 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
>> > Starting helix controller
>> > 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
>> > Settings loaded from
>> >
>> >
>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
>> > 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> > Connection to helix cluster : AiravataDemoCluster with name :
>> > helixcontroller2
>> > 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> > Zookeeper connection string localhost:2181
>> > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
>> > o.a.h.c.GenericHelixController  - Exception while executing
>> > DEFAULTpipeline:
>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
>> > cluster .AiravataDemoCluster. Will not continue to next pipeline
>> > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
>> get
>> > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Kishore,

Please find below log and I think the issue is "Packet len4194362 is out of
range!". Currently we have around 1 unprocessed workflows. Will that be
the reason for that?

2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO
 o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN
 o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection
and attempting reconnect
java.io.IOException: Packet len4194362 is out of range!
at
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO
 o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)
2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO
 o.apache.zookeeper.ClientCnxn  - Opening socket connection to server
localhost/127.0.0.1:2181
2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO
 o.apache.zookeeper.ClientCnxn  - Socket connection established to
localhost/127.0.0.1:2181, initiating session
2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO
 o.apache.zookeeper.ClientCnxn  - Session establishment complete on server
localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated timeout
= 3
2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO
 o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN
 o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
localhost/127.0.0.1:2181, unexpected error, closing socket connection and
attempting reconnect
java.io.IOException: Packet len4194362 is out of range!
at
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO
 o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)

Thanks
Dimuthu

On Fri, May 31, 2019 at 11:55 AM kishore g  wrote:

> can you grep for zookeeper state in controller log.
>
> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com>
> wrote:
>
> > Hi Folks,
> >
> > I'm getting following error in controller log and seems like controller
> is
> > not moving froward after that point
> >
> > 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
> > Starting helix controller
> > 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
> > Settings loaded from
> >
> >
> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> > 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> > Connection to helix cluster : AiravataDemoCluster with name :
> > helixcontroller2
> > 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> > Zookeeper connection string localhost:2181
> > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
> > o.a.h.c.GenericHelixController  - Exception while executing
> > DEFAULTpipeline:
> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
> > cluster .AiravataDemoCluster. Will not continue to next pipeline
> > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
> get
> > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> > at
> >
> >
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> > at
> >
> >
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> > at
> >
> >
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> > at
> >
> >
> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> > at
> >
> >
> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> > at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> > at
> >
> >
> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> > at
> >
> >
> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
> > Fail to read nodes for
> > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> > at
> >
> >
> 

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread kishore g
can you grep for zookeeper state in controller log.

On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha 
wrote:

> Hi Folks,
>
> I'm getting following error in controller log and seems like controller is
> not moving froward after that point
>
> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
> Starting helix controller
> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
> Settings loaded from
>
> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> Connection to helix cluster : AiravataDemoCluster with name :
> helixcontroller2
> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> Zookeeper connection string localhost:2181
> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
> o.a.h.c.GenericHelixController  - Exception while executing
> DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for
> cluster .AiravataDemoCluster. Will not continue to next pipeline
> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get
> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> at
>
> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> at
>
> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> at
>
> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> at
>
> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
> Fail to read nodes for
> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
> ... 7 common frames omitted
>
> In the zookeeper log I can see following warning getting printed
> continuously. What could be the reason for that? I'm using helix 0.8.2 and
> zookeeper 3.4.8
>
> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f
> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
> connection
> from /127.0.0.1:57984
> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
> session 0x16b0e59877f at /127.0.0.1:57984
> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
> 0x16b0e59877f with negotiated timeout 3 for client /
> 127.0.0.1:57984
> 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x16b0e59877f, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> at java.lang.Thread.run(Thread.java:748)
>
> Thanks
> Dimuthu
>


Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Folks,

I'm getting following error in controller log and seems like controller is
not moving froward after that point

2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
Starting helix controller
2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
Settings loaded from
file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
Connection to helix cluster : AiravataDemoCluster with name :
helixcontroller2
2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
Zookeeper connection string localhost:2181
2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
o.a.h.c.GenericHelixController  - Exception while executing
DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for
cluster .AiravataDemoCluster. Will not continue to next pipeline
org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get
full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
at
org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
at
org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
at
org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
at
org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
at
org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
at
org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
Fail to read nodes for
[/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
at
org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
... 7 common frames omitted

In the zookeeper log I can see following warning getting printed
continuously. What could be the reason for that? I'm using helix 0.8.2 and
zookeeper 3.4.8

2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f
2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection
from /127.0.0.1:57984
2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
session 0x16b0e59877f at /127.0.0.1:57984
2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
0x16b0e59877f with negotiated timeout 3 for client /127.0.0.1:57984
2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x16b0e59877f, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)

Thanks
Dimuthu