Re: Zookeeper connection errors in Helix Controller
Now I'm seeing following error in controller log. Restarting the controller fixed the issue. We are time to time seeing this in controller with zk connection issues. Is this also something to do with zk client version? 2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0ebbee1d000e for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) Thanks Dimuthu On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha wrote: > Hi Lei, > > We use 0.8.2. We initially had 0.8.4 but it contains an issue with task > retry logic so we downgraded to 0.8.2. We are planning to go into > production with 0.8.2 by next week so can you please advice a better way to > solve this without upgrading to 0.8.4. > > Thanks > Dimuthu > > On Fri, May 31, 2019 at 1:04 PM Lei Xia wrote: > >> Which Helix version do you use? This may caused by this Zookeeper bug ( >> https://issues.apache.org/jira/browse/ZOOKEEPER-706). We have upgraded >> ZkClient in later Helix versions. >> >> >> Lei >> >> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha < >> dimuthu.upeks...@gmail.com> wrote: >> >>> Hi Folks, >>> >>> I'm getting following error in controller log and seems like controller >>> is >>> not moving froward after that point >>> >>> 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - >>> Starting helix controller >>> 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - >>> Settings loaded from >>> >>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties >>> 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - >>> Connection to helix cluster : AiravataDemoCluster with name : >>> helixcontroller2 >>> 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - >>> Zookeeper connection string localhost:2181 >>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR >>> o.a.h.c.GenericHelixController - Exception while executing >>> DEFAULTpipeline: >>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for >>> cluster .AiravataDemoCluster. Will not continue to next pipeline >>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to >>> get >>> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT >>> at >>> >>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) >>> at >>> >>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) >>> at >>> >>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) >>> at >>> >>> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) >>> at >>> >>> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) >>> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) >>> at >>> >>> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) >>> at >>> >>> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) >>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: >>> Fail to read nodes for >>> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] >>> at >>> >>> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) >>> at >>> >>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) >>> at >>> >>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) >>> ... 7 common frames omitted >>> >>> In the zookeeper log I can see following warning getting printed >>> continuously. What could be the reason for that? I'm using helix 0.8.2 >>> and >>> zookeeper 3.4.8 >>> >>> 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: >>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for >>> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f >>> 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: >>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket >>> connection >>> from /127.0.0.1:57984 >>> 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: >>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew >>>
Re: Zookeeper connection errors in Helix Controller
Hi Lei, We use 0.8.2. We initially had 0.8.4 but it contains an issue with task retry logic so we downgraded to 0.8.2. We are planning to go into production with 0.8.2 by next week so can you please advice a better way to solve this without upgrading to 0.8.4. Thanks Dimuthu On Fri, May 31, 2019 at 1:04 PM Lei Xia wrote: > Which Helix version do you use? This may caused by this Zookeeper bug ( > https://issues.apache.org/jira/browse/ZOOKEEPER-706). We have upgraded > ZkClient in later Helix versions. > > > Lei > > On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha < > dimuthu.upeks...@gmail.com> wrote: > >> Hi Folks, >> >> I'm getting following error in controller log and seems like controller is >> not moving froward after that point >> >> 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - >> Starting helix controller >> 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - >> Settings loaded from >> >> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties >> 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - >> Connection to helix cluster : AiravataDemoCluster with name : >> helixcontroller2 >> 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - >> Zookeeper connection string localhost:2181 >> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR >> o.a.h.c.GenericHelixController - Exception while executing >> DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for >> cluster .AiravataDemoCluster. Will not continue to next pipeline >> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to >> get >> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT >> at >> >> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) >> at >> >> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) >> at >> >> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) >> at >> >> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) >> at >> >> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) >> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) >> at >> >> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) >> at >> >> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) >> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: >> Fail to read nodes for >> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] >> at >> >> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) >> at >> >> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) >> at >> >> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) >> ... 7 common frames omitted >> >> In the zookeeper log I can see following warning getting printed >> continuously. What could be the reason for that? I'm using helix 0.8.2 and >> zookeeper 3.4.8 >> >> 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for >> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f >> 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket >> connection >> from /127.0.0.1:57984 >> 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew >> session 0x16b0e59877f at /127.0.0.1:57984 >> 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session >> 0x16b0e59877f with negotiated timeout 3 for client / >> 127.0.0.1:57984 >> 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception >> EndOfStreamException: Unable to read additional data from client sessionid >> 0x16b0e59877f, likely client has closed socket >> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) >> at >> >> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) >> at java.lang.Thread.run(Thread.java:748) >> >> Thanks >> Dimuthu >> > > > -- > Lei Xia >
Re: Zookeeper connection errors in Helix Controller
Which Helix version do you use? This may caused by this Zookeeper bug ( https://issues.apache.org/jira/browse/ZOOKEEPER-706). We have upgraded ZkClient in later Helix versions. Lei On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha wrote: > Hi Folks, > > I'm getting following error in controller log and seems like controller is > not moving froward after that point > > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - > Starting helix controller > 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - > Settings loaded from > > file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties > 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - > Connection to helix cluster : AiravataDemoCluster with name : > helixcontroller2 > 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - > Zookeeper connection string localhost:2181 > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR > o.a.h.c.GenericHelixController - Exception while executing > DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for > cluster .AiravataDemoCluster. Will not continue to next pipeline > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) > at > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) > at > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) > at > > org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) > at > > org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) > at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) > at > > org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) > at > > org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: > Fail to read nodes for > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) > ... 7 common frames omitted > > In the zookeeper log I can see following warning getting printed > continuously. What could be the reason for that? I'm using helix 0.8.2 and > zookeeper 3.4.8 > > 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for > client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f > 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket > connection > from /127.0.0.1:57984 > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew > session 0x16b0e59877f at /127.0.0.1:57984 > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session > 0x16b0e59877f with negotiated timeout 3 for client / > 127.0.0.1:57984 > 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception > EndOfStreamException: Unable to read additional data from client sessionid > 0x16b0e59877f, likely client has closed socket > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:748) > > Thanks > Dimuthu > -- Lei Xia
Re: Zookeeper connection errors in Helix Controller
Hi Kishore, Adding -Djute.maxbuffer=49107800 fixed the issue but now I can see a whole lot of logs printing with following line and participant is executing a bulk of Tasks once in a while with around 5 minute delay in between. 2019-05-31 12:45:58,804 [GenericHelixController-event_process] WARN o.a.h.t.a.AssignableInstance - AssignableInstance does not have enough capacity for quotaType: DEFAULT. Task: Task_TASK_57f87021-bedb-415a-bf1f-ddfd17641f11, quotaType: DEFAULT, Instance name: helixparticipant3. Current capacity: 40 capacity needed to schedule: 40 Thanks Dimuthu On Fri, May 31, 2019 at 12:25 PM DImuthu Upeksha wrote: > Hi Kishore, > > Please find below log and I think the issue is "Packet len4194362 is out > of range!". Currently we have around 1 unprocessed workflows. Will that > be the reason for that? > > 2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO > o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) > 2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN > o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server > localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection > and attempting reconnect > java.io.IOException: Packet len4194362 is out of range! > at > org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) > 2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO > o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) > 2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO > o.apache.zookeeper.ClientCnxn - Opening socket connection to server > localhost/127.0.0.1:2181 > 2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO > o.apache.zookeeper.ClientCnxn - Socket connection established to > localhost/127.0.0.1:2181, initiating session > 2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO > o.apache.zookeeper.ClientCnxn - Session establishment complete on server > localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated > timeout = 3 > 2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO > o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) > 2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN > o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server > localhost/127.0.0.1:2181, unexpected error, closing socket connection and > attempting reconnect > java.io.IOException: Packet len4194362 is out of range! > at > org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) > 2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO > o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) > > Thanks > Dimuthu > > On Fri, May 31, 2019 at 11:55 AM kishore g wrote: > >> can you grep for zookeeper state in controller log. >> >> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha < >> dimuthu.upeks...@gmail.com> >> wrote: >> >> > Hi Folks, >> > >> > I'm getting following error in controller log and seems like controller >> is >> > not moving froward after that point >> > >> > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - >> > Starting helix controller >> > 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - >> > Settings loaded from >> > >> > >> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties >> > 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - >> > Connection to helix cluster : AiravataDemoCluster with name : >> > helixcontroller2 >> > 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - >> > Zookeeper connection string localhost:2181 >> > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR >> > o.a.h.c.GenericHelixController - Exception while executing >> > DEFAULTpipeline: >> org.apache.helix.controller.pipeline.Pipeline@408d6d26for >> > cluster .AiravataDemoCluster. Will not continue to next pipeline >> > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to >> get >> > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT >> > at >> > >> > >> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) >> > at >> > >> > >> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) >> > at >> > >> > >> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
Re: Zookeeper connection errors in Helix Controller
Hi Kishore, Please find below log and I think the issue is "Packet len4194362 is out of range!". Currently we have around 1 unprocessed workflows. Will that be the reason for that? 2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) 2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len4194362 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) 2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) 2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181 2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session 2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated timeout = 3 2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) 2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len4194362 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) 2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) Thanks Dimuthu On Fri, May 31, 2019 at 11:55 AM kishore g wrote: > can you grep for zookeeper state in controller log. > > On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha < > dimuthu.upeks...@gmail.com> > wrote: > > > Hi Folks, > > > > I'm getting following error in controller log and seems like controller > is > > not moving froward after that point > > > > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - > > Starting helix controller > > 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - > > Settings loaded from > > > > > file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties > > 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - > > Connection to helix cluster : AiravataDemoCluster with name : > > helixcontroller2 > > 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - > > Zookeeper connection string localhost:2181 > > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR > > o.a.h.c.GenericHelixController - Exception while executing > > DEFAULTpipeline: > org.apache.helix.controller.pipeline.Pipeline@408d6d26for > > cluster .AiravataDemoCluster. Will not continue to next pipeline > > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to > get > > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT > > at > > > > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) > > at > > > > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) > > at > > > > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) > > at > > > > > org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) > > at > > > > > org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) > > at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) > > at > > > > > org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) > > at > > > > > org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) > > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: > > Fail to read nodes for > > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] > > at > > > > >
Re: Zookeeper connection errors in Helix Controller
can you grep for zookeeper state in controller log. On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha wrote: > Hi Folks, > > I'm getting following error in controller log and seems like controller is > not moving froward after that point > > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - > Starting helix controller > 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - > Settings loaded from > > file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties > 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - > Connection to helix cluster : AiravataDemoCluster with name : > helixcontroller2 > 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - > Zookeeper connection string localhost:2181 > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR > o.a.h.c.GenericHelixController - Exception while executing > DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for > cluster .AiravataDemoCluster. Will not continue to next pipeline > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) > at > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) > at > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) > at > > org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) > at > > org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) > at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) > at > > org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) > at > > org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: > Fail to read nodes for > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) > at > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) > ... 7 common frames omitted > > In the zookeeper log I can see following warning getting printed > continuously. What could be the reason for that? I'm using helix 0.8.2 and > zookeeper 3.4.8 > > 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for > client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f > 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket > connection > from /127.0.0.1:57984 > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew > session 0x16b0e59877f at /127.0.0.1:57984 > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session > 0x16b0e59877f with negotiated timeout 3 for client / > 127.0.0.1:57984 > 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception > EndOfStreamException: Unable to read additional data from client sessionid > 0x16b0e59877f, likely client has closed socket > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:748) > > Thanks > Dimuthu >
Zookeeper connection errors in Helix Controller
Hi Folks, I'm getting following error in controller log and seems like controller is not moving froward after that point 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - Starting helix controller 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - Settings loaded from file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - Connection to helix cluster : AiravataDemoCluster with name : helixcontroller2 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - Zookeeper connection string localhost:2181 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR o.a.h.c.GenericHelixController - Exception while executing DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for cluster .AiravataDemoCluster. Will not continue to next pipeline org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) at org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) at org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: Fail to read nodes for [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) ... 7 common frames omitted In the zookeeper log I can see following warning getting printed continuously. What could be the reason for that? I'm using helix 0.8.2 and zookeeper 3.4.8 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:57984 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew session 0x16b0e59877f at /127.0.0.1:57984 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session 0x16b0e59877f with negotiated timeout 3 for client /127.0.0.1:57984 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x16b0e59877f, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) at java.lang.Thread.run(Thread.java:748) Thanks Dimuthu