Hi Tushar,

Thanks for the reply. I am attaching the detailed log below for your
reference. I do not see the "Shutdown after reaching failure threshold for"
message in the log. One more piece of information: a colleague who looked at
my logs says the problem is due to a security bug that prevents the
application from connecting to HDFS with a username and password. That fits
with what you said, namely that some operator in the DAG is failing
continuously and the application is getting killed because of it. Yes, I
have an HDFS operator that writes our final output to HDFS. It looks like it
is failing, and that may be why the application is getting killed.
This could be one of the reasons, because of the following warning:
*
2017-03-02 12:18:08,861 WARN  ipc.Client
(Client.java:handleConnectionFailure(886)) - Failed to connect to server:
d-3zkvk02.target.com/10.66.241.46:8030: retries get failed due to exceeded
maximum allowed retries number: 0
java.net.ConnectException: Connection refused*

I am getting this warning message throughout the log.
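
A quick way to check how often each message really appears is to tally the
log entries by level and logger. The following is only a rough sketch: the
function name tally_entries and the inline sample are made up for
illustration, and the regular expression assumes every entry starts with a
"yyyy-MM-dd HH:mm:ss,SSS LEVEL logger" prefix like the entries in this log.

```python
import re
from collections import Counter

# Start of a log entry: timestamp, level, then logger name.
# \s+ also spans a newline, so entries wrapped by the mail client still match.
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(INFO|WARN|ERROR|DEBUG)\s+(\S+)",
    re.MULTILINE,
)


def tally_entries(log_text):
    """Count log entries per (level, logger) pair (hypothetical helper)."""
    return Counter(ENTRY.findall(log_text))


# Shortened sample in the same layout as the log in this mail.
sample = """2017-03-02 21:38:39,095 WARN  ipc.Client
(Client.java:handleConnectionFailure(886)) - Failed to connect to server
2017-03-02 21:38:40,111 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
2017-03-02 21:38:41,115 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
2017-03-02 21:38:57,187 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
"""

# Print the most frequent (level, logger) pairs first.
for (level, logger), count in tally_entries(sample).most_common():
    print(level, logger, count)
```

Run against the full application master log, the same tally would show
whether the ipc.Client connection warning or some other message dominates.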


Full log, for your reference:
2017-03-02 21:38:39,094 INFO  hdfs.DFSClient
(DFSClient.java:getDelegationToken(1043)) - Created HDFS_DELEGATION_TOKEN
token 5002831 for SVFFLHDS on ha-hdfs:littleredns
2017-03-02 21:38:39,095 WARN  ipc.Client
(Client.java:handleConnectionFailure(886)) - Failed to connect to server:
d-3zkvk02.target.com/10.66.241.46:8032: retries get failed due to exceeded
maximum allowed retries number: 0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
        at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
        at org.apache.hadoop.ipc.Client.call(Client.java:1449)
        at org.apache.hadoop.ipc.Client.call(Client.java:1396)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy93.getDelegationToken(Unknown Source)
        at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:310)
        at sun.reflect.GeneratedMethodAccessor150.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
        at com.sun.proxy.$Proxy94.getDelegationToken(Unknown Source)
        at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:531)
        at
com.datatorrent.stram.client.StramClientUtils$ClientRMHelper.addRMDelegationToken(StramClientUtils.java:281)
        at
com.datatorrent.stram.security.StramUserLogin$1.run(StramUserLogin.java:114)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at
com.datatorrent.stram.security.StramUserLogin.refreshTokens(StramUserLogin.java:98)
        at
com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:751)
        at
com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:647)
        at
com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:104)
2017-03-02 21:38:39,095 INFO  client.ConfiguredRMFailoverProxyProvider
(ConfiguredRMFailoverProxyProvider.java:performFailover(100)) - Failing over
to rm2
2017-03-02 21:38:39,107 INFO  client.StramClientUtils$ClientRMHelper
(StramClientUtils.java:addRMDelegationToken(287)) - Yarn Resource Manager HA
is enabled
2017-03-02 21:38:39,107 INFO  client.StramClientUtils$ClientRMHelper
(StramClientUtils.java:getRMHAToken(265)) - Yarn Resource Manager id: rm1
2017-03-02 21:38:39,107 INFO  client.StramClientUtils$ClientRMHelper
(StramClientUtils.java:getRMHAToken(265)) - Yarn Resource Manager id: rm2
2017-03-02 21:38:39,107 INFO  client.StramClientUtils$ClientRMHelper
(StramClientUtils.java:addRMDelegationToken(298)) - RM dt Kind:
RM_DELEGATION_TOKEN, Service: 10.66.241.46:8032,10.66.241.14:8032, Ident:
([email protected], renewer=yarn, realUser=,
issueDate=1488512319104, maxDate=1489117119104, sequenceNumber=1309582,
masterKeyId=8466)
2017-03-02 21:38:40,111 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:41,115 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:42,119 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:43,124 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:44,128 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:45,132 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:46,136 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:47,140 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:48,143 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:49,147 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:50,151 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:51,154 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:52,160 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:53,164 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:54,167 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:55,168 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:monitorHeartbeat(789)) - Container
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
heartbeat timeout (30795 ms).
2017-03-02 21:38:55,171 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:56,172 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1119)) - Requested stop
container container_e3076_1487635160678_27652_01_001538
2017-03-02 21:38:56,172 INFO  impl.NMClientAsyncImpl
(NMClientAsyncImpl.java:run(536)) - Processing Event EventType:
STOP_CONTAINER for Container container_e3076_1487635160678_27652_01_001538
2017-03-02 21:38:56,172 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:monitorHeartbeat(789)) - Container
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
heartbeat timeout (31799 ms).
2017-03-02 21:38:56,178 INFO  impl.NMClientImpl
(NMClientImpl.java:stopContainer(242)) - ok, stopContainerInternal..
container_e3076_1487635160678_27652_01_001538
2017-03-02 21:38:56,178 INFO  impl.ContainerManagementProtocolProxy
(ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy :
brdn1164.target.com:45454
2017-03-02 21:38:56,183 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:57,184 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:execute(929)) - Completed
containerId=container_e3076_1487635160678_27652_01_001538, state=COMPLETE,
exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2017-03-02 21:38:57,185 INFO 
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:cancelToken(520)) - Token
cancelation requested for identifier: owner=SVFFLHDS, renewer=, realUser=,
issueDate=1488512300792, maxDate=4611687506939688695, sequenceNumber=1508,
masterKeyId=2
2017-03-02 21:38:57,185 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:scheduleContainerRestart(1122)) - Initiating
recovery for
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
2017-03-02 21:38:57,185 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:scheduleContainerRestart(1138)) - Affected
operators [PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]]
2017-03-02 21:38:57,187 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.io.IOException: Connection is not open
        at
com.datatorrent.stram.util.PubSubWebSocketClient.assertUsable(PubSubWebSocketClient.java:264)
        at
com.datatorrent.stram.util.PubSubWebSocketClient.publish(PubSubWebSocketClient.java:287)
        at
com.datatorrent.stram.util.SharedPubSubWebSocketClient.publish(SharedPubSubWebSocketClient.java:120)
        at
com.datatorrent.stram.FSEventRecorder$EventRecorderThread.run(FSEventRecorder.java:79)
2017-03-02 21:38:57,203 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        ... 8 more
2017-03-02 21:38:57,206 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:58,206 INFO  stram.ResourceRequestHandler
(ResourceRequestHandler.java:getHost(240)) - Strict anti-affinity = [] for
container with operators PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]
2017-03-02 21:38:58,206 INFO  stram.ResourceRequestHandler
(ResourceRequestHandler.java:getHost(272)) - Found host null
2017-03-02 21:38:58,206 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1103)) - Asking RM for
containers: [Capability[<memory:10240, vCores:1>]Priority[1508]]
2017-03-02 21:38:58,207 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1105)) - Requested
container: Capability[<memory:10240, vCores:1>]Priority[1508] on host:
[null]
2017-03-02 21:38:58,210 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:38:59,211 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:execute(864)) - Got new container.,
containerId=container_e3076_1487635160678_27652_01_001539,
containerNode=brdn1164.target.com:45454,
containerNodeURI=brdn1164.target.com:8042, containerResourceMemory10240,
priority1508
2017-03-02 21:38:59,211 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:assignContainer(1256)) - Removing container
agent container_e3076_1487635160678_27652_01_001538
2017-03-02 21:38:59,212 INFO 
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating
password for identifier: owner=SVFFLHDS, renewer=, realUser=,
issueDate=1488512339212, maxDate=4611687506939727115, sequenceNumber=1509,
masterKeyId=2, currentKey: 2
2017-03-02 21:38:59,212 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:run(150)) - Setting up container launch
context for containerid=container_e3076_1487635160678_27652_01_001539
2017-03-02 21:38:59,212 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:setClasspath(125)) - CLASSPATH:
./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2017-03-02 21:38:59,239 INFO  util.BasicContainerOptConfigurator
(BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for
operator {-Xmx=9216m, Generic=null}
2017-03-02 21:38:59,239 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts 
-Xmx9663676416  for container container_e3076_1487635160678_27652_01_001539
2017-03-02 21:38:59,239 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:run(191)) - Launching on node:
brdn1164.target.com:45454 command: $JAVA_HOME/bin/java  -Xmx9663676416 
-Ddt.attr.APPLICATION_PATH=hdfs://littleredns/user/SVFFLHDS/datatorrent/apps/application_1487635160678_27652
-Djava.io.tmpdir=$PWD/tmp
-Ddt.cid=container_e3076_1487635160678_27652_01_001539
-Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=<LOG_DIR>
com.datatorrent.stram.engine.StreamingContainer 1><LOG_DIR>/stdout
2><LOG_DIR>/stderr  
2017-03-02 21:38:59,239 INFO  impl.NMClientAsyncImpl
(NMClientAsyncImpl.java:run(536)) - Processing Event EventType:
START_CONTAINER for Container container_e3076_1487635160678_27652_01_001539
2017-03-02 21:38:59,239 INFO  impl.ContainerManagementProtocolProxy
(ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy :
brdn1164.target.com:45454
2017-03-02 21:38:59,240 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        ... 8 more
2017-03-02 21:38:59,244 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:00,247 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:01,251 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:01,361 INFO  ipc.Server (Server.java:saslProcess(1538)) -
Auth successful for SVFFLHDS (auth:TOKEN)
2017-03-02 21:39:01,365 INFO  authorize.ServiceAuthorizationManager
(ServiceAuthorizationManager.java:authorize(137)) - Authorization successful
for SVFFLHDS (auth:TOKEN) for protocol=interface
com.datatorrent.stram.api.StreamingContainerUmbilicalProtocol
2017-03-02 21:39:01,535 INFO  stram.StreamingContainerParent
(StreamingContainerParent.java:log(166)) - child msg:
[container_e3076_1487635160678_27652_01_001539] Entering heartbeat loop..
context:
PTContainer[id=15(container_e3076_1487635160678_27652_01_001539),state=ALLOCATED,operators=[PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]]]
2017-03-02 21:39:02,255 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:02,734 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:processHeartbeat(1459)) - Container
container_e3076_1487635160678_27652_01_001539 buffer server:
brdn1164.target.com:60138
2017-03-02 21:39:03,259 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:04,264 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:05,267 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:06,272 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:07,278 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:08,283 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:09,288 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:10,293 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:11,297 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:12,302 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:13,307 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:14,311 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:15,314 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:16,319 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:17,323 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:18,327 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:19,331 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:20,335 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:21,338 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:22,342 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:23,346 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:24,351 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:25,355 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:26,359 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:27,362 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:28,366 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:29,370 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:30,375 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:31,379 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:32,383 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:33,383 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:monitorHeartbeat(789)) - Container
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
heartbeat timeout (30649 ms).
2017-03-02 21:39:33,386 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:34,386 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1119)) - Requested stop
container container_e3076_1487635160678_27652_01_001539
2017-03-02 21:39:34,386 INFO  impl.NMClientAsyncImpl
(NMClientAsyncImpl.java:run(536)) - Processing Event EventType:
STOP_CONTAINER for Container container_e3076_1487635160678_27652_01_001539
2017-03-02 21:39:34,388 INFO  impl.NMClientImpl
(NMClientImpl.java:stopContainer(242)) - ok, stopContainerInternal..
container_e3076_1487635160678_27652_01_001539
2017-03-02 21:39:34,388 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:monitorHeartbeat(789)) - Container
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
heartbeat timeout (31654 ms).
2017-03-02 21:39:34,388 INFO  impl.ContainerManagementProtocolProxy
(ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy :
brdn1164.target.com:45454
2017-03-02 21:39:34,391 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:35,392 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:execute(929)) - Completed
containerId=container_e3076_1487635160678_27652_01_001539, state=COMPLETE,
exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2017-03-02 21:39:35,392 INFO 
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:cancelToken(520)) - Token
cancelation requested for identifier: owner=SVFFLHDS, renewer=, realUser=,
issueDate=1488512339212, maxDate=4611687506939727115, sequenceNumber=1509,
masterKeyId=2
2017-03-02 21:39:35,392 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:scheduleContainerRestart(1122)) - Initiating
recovery for
container_e3076_1487635160678_27652_01_001...@brdn1164.target.com:45454
2017-03-02 21:39:35,392 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:scheduleContainerRestart(1138)) - Affected
operators [PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]]
2017-03-02 21:39:35,395 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.io.IOException: Connection is not open
        at
com.datatorrent.stram.util.PubSubWebSocketClient.assertUsable(PubSubWebSocketClient.java:264)
        at
com.datatorrent.stram.util.PubSubWebSocketClient.publish(PubSubWebSocketClient.java:287)
        at
com.datatorrent.stram.util.SharedPubSubWebSocketClient.publish(SharedPubSubWebSocketClient.java:120)
        at
com.datatorrent.stram.FSEventRecorder$EventRecorderThread.run(FSEventRecorder.java:79)
2017-03-02 21:39:35,409 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
        at
org.apache.apex.shaded.ning19.com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.66.18.168:9090
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at
org.apache.apex.shaded.ning19.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        ... 8 more
2017-03-02 21:39:35,412 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:36,412 INFO  stram.ResourceRequestHandler
(ResourceRequestHandler.java:getHost(240)) - Strict anti-affinity = [] for
container with operators PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]
2017-03-02 21:39:36,413 INFO  stram.ResourceRequestHandler
(ResourceRequestHandler.java:getHost(272)) - Found host null
2017-03-02 21:39:36,413 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1103)) - Asking RM for
containers: [Capability[<memory:10240, vCores:1>]Priority[1509]]
2017-03-02 21:39:36,413 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:sendContainerAskToRM(1105)) - Requested
container: Capability[<memory:10240, vCores:1>]Priority[1509] on host:
[null]
2017-03-02 21:39:36,417 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:37,418 INFO  stram.StreamingAppMasterService
(StreamingAppMasterService.java:execute(864)) - Got new container.,
containerId=container_e3076_1487635160678_27652_01_001540,
containerNode=brdn1164.target.com:45454,
containerNodeURI=brdn1164.target.com:8042, containerResourceMemory10240,
priority1509
2017-03-02 21:39:37,418 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:assignContainer(1256)) - Removing container
agent container_e3076_1487635160678_27652_01_001539
2017-03-02 21:39:37,419 INFO 
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating
password for identifier: owner=SVFFLHDS, renewer=, realUser=,
issueDate=1488512377419, maxDate=4611687506939765322, sequenceNumber=1510,
masterKeyId=2, currentKey: 2
2017-03-02 21:39:37,419 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:run(150)) - Setting up container launch
context for containerid=container_e3076_1487635160678_27652_01_001540
2017-03-02 21:39:37,419 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:setClasspath(125)) - CLASSPATH:
./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2017-03-02 21:39:42,215 INFO  util.BasicContainerOptConfigurator
(BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for
operator {-Xmx=9216m, Generic=null}
2017-03-02 21:39:42,229 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts 
-Xmx9663676416  for container container_e3076_1487635160678_27652_01_001540
2017-03-02 21:39:42,230 INFO  stram.LaunchContainerRunnable
(LaunchContainerRunnable.java:run(191)) - Launching on node:
brdn1164.target.com:45454 command: $JAVA_HOME/bin/java  -Xmx9663676416 
-Ddt.attr.APPLICATION_PATH=hdfs://littleredns/user/SVFFLHDS/datatorrent/apps/application_1487635160678_27652
-Djava.io.tmpdir=$PWD/tmp
-Ddt.cid=container_e3076_1487635160678_27652_01_001540
-Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=<LOG_DIR>
com.datatorrent.stram.engine.StreamingContainer 1><LOG_DIR>/stdout
2><LOG_DIR>/stderr  
2017-03-02 21:39:42,230 INFO  impl.NMClientAsyncImpl
(NMClientAsyncImpl.java:run(536)) - Processing Event EventType:
START_CONTAINER for Container container_e3076_1487635160678_27652_01_001540
2017-03-02 21:39:42,230 INFO  impl.ContainerManagementProtocolProxy
(ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy :
brdn1164.target.com:45454
2017-03-02 21:39:42,231 ERROR stram.FSEventRecorder
(FSEventRecorder.java:run(85)) - Caught Exception
java.io.IOException: Connection is not open
        at
com.datatorrent.stram.util.PubSubWebSocketClient.assertUsable(PubSubWebSocketClient.java:264)
        at
com.datatorrent.stram.util.PubSubWebSocketClient.publish(PubSubWebSocketClient.java:287)
        at
com.datatorrent.stram.util.SharedPubSubWebSocketClient.publish(SharedPubSubWebSocketClient.java:120)
        at
com.datatorrent.stram.FSEventRecorder$EventRecorderThread.run(FSEventRecorder.java:79)
2017-03-02 21:39:42,234 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:43,238 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:44,122 INFO  ipc.Server (Server.java:saslProcess(1538)) -
Auth successful for SVFFLHDS (auth:TOKEN)
2017-03-02 21:39:44,127 INFO  authorize.ServiceAuthorizationManager
(ServiceAuthorizationManager.java:authorize(137)) - Authorization successful
for SVFFLHDS (auth:TOKEN) for protocol=interface
com.datatorrent.stram.api.StreamingContainerUmbilicalProtocol
2017-03-02 21:39:44,242 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:44,267 INFO  stram.StreamingContainerParent
(StreamingContainerParent.java:log(166)) - child msg:
[container_e3076_1487635160678_27652_01_001540] Entering heartbeat loop..
context:
PTContainer[id=15(container_e3076_1487635160678_27652_01_001540),state=ALLOCATED,operators=[PTOperator[id=13,name=Retriggered_Recs_Hdfs_Write]]]
2017-03-02 21:39:45,246 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:45,440 INFO  stram.StreamingContainerManager
(StreamingContainerManager.java:processHeartbeat(1459)) - Container
container_e3076_1487635160678_27652_01_001540 buffer server:
brdn1164.target.com:50773
2017-03-02 21:39:46,251 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:47,255 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:48,259 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:49,262 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:50,266 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:51,270 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:52,280 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:53,285 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:54,288 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:55,292 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:56,295 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:57,299 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:58,311 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map
2017-03-02 21:39:59,316 WARN  stram.StreamingContainerManager
(StreamingContainerManager.java:calculateEndWindowStats(823)) - Some
operators are behind for more than 1000 windows! Trimming the end window
stats map


Thanks again for the reply.

Thanks,
Rishi Mishra



--
View this message in context: 
http://apache-apex-users-list.78494.x6.nabble.com/Apex-application-getting-killed-at-regular-interval-tp1421p1424.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.
