Stephan Ewen created FLINK-1835:
-----------------------------------

             Summary: Spurious failure of YARN tests
                 Key: FLINK-1835
                 URL: https://issues.apache.org/jira/browse/FLINK-1835
             Project: Flink
          Issue Type: Bug
          Components: YARN Client
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Robert Metzger
             Fix For: 0.9


The failure was caused by an exception detected in the log.

Stack trace of the exception (extracted from the log) below:

{code}
21:18:29,555 WARN  org.apache.hadoop.util.NativeCodeLoader                      
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
21:18:29,806 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - YARN daemon runs as travis setting user to execute Flink 
ApplicationMaster/JobManager to travis
21:18:29,808 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - 
--------------------------------------------------------------------------------
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  Starting YARN ApplicationMaster/JobManager (Version: 0.9-SNAPSHOT, 
Rev:d2020b5, Date:06.04.2015 @ 18:00:21 UTC)
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  Current user: travis
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.31-b07
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  Maximum heap size: 393 MiBytes
21:18:29,826 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  JAVA_HOME: /usr/lib/jvm/java-8-oracle
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  JVM Options:
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -     -Xmx409M
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -     
-Dlog.file=/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1428355034517_0004/container_1428355034517_0004_01_000001/jobmanager-main.log
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -     -Dlogback.configurationFile=file:logback.xml
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -     -Dlog4j.configuration=file:log4j.properties
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 -  Program Arguments: (none)
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - 
--------------------------------------------------------------------------------
21:18:29,828 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - registered UNIX signal handlers for [TERM, HUP, INT]
21:18:29,843 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Starting JobManager for YARN
21:18:29,845 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Loading config from: 
/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001
21:18:30,388 INFO  akka.event.slf4j.Slf4jLogger                                 
 - Slf4jLogger started
21:18:30,450 INFO  Remoting                                                     
 - Starting remoting
21:18:30,637 INFO  Remoting                                                     
 - Remoting started; listening on addresses 
:[akka.tcp://flink@172.17.0.176:34023]
21:18:30,651 INFO  org.apache.flink.runtime.blob.BlobServer                     
 - Created BLOB server storage directory 
/tmp/blobStore-e34b86da-094c-4a4e-aa02-7b0556e8af93
21:18:30,655 INFO  org.apache.flink.runtime.blob.BlobServer                     
 - Started BLOB server at 0.0.0.0:33717 - max concurrent requests: 50 - max 
backlog: 1000
21:18:30,670 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Starting Job Manger web frontend.
21:18:30,673 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer        
 - Setting up web info server, using web-root directory 
jar:file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/filecache/12/flink-dist-0.9-SNAPSHOT.jar!/web-docs-infoserver.
21:18:30,705 INFO  org.apache.flink.runtime.jobmanager.JobManager               
 - Starting JobManager at akka://flink/user/jobmanager#395299512.
21:18:31,184 INFO  org.eclipse.jetty.util.log                                   
 - jetty-0.9-SNAPSHOT
21:18:31,269 INFO  org.eclipse.jetty.util.log                                   
 - Started SelectChannelConnector@0.0.0.0:49867
21:18:31,270 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer        
 - Started web info server for JobManager on 0.0.0.0:49867
21:18:31,270 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Generate configuration file for application master.
21:18:31,283 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Starting YARN session on Job Manager.
21:18:31,284 INFO  org.apache.flink.yarn.ApplicationMaster$                     
 - Application Master properly initiated. Awaiting termination of actor system.
21:18:31,287 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Start yarn session.
21:18:31,489 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Requesting 1 TaskManagers. Tolerating 1 failed TaskManagers
21:18:31,815 INFO  org.apache.hadoop.yarn.client.RMProxy                        
 - Connecting to ResourceManager at /0.0.0.0:8030
21:18:31,914 INFO  
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
yarn.client.max-cached-nodemanagers-proxies : 0
21:18:31,915 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Registering ApplicationMaster with tracking url 
http://testing-worker-linux-docker-2f4f6c00-3426-linux-13.prod.travis-ci.org:49867.
21:18:32,255 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Requesting initial TaskManager container 0.
21:18:32,283 INFO  org.apache.flink.yarn.Utils                                  
 - Copying from 
file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001/flink-conf-modified.yaml
 to 
file:/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml
21:18:32,458 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Prepared local resource for modified yaml: resource { scheme: "file" port: 
-1 file: 
"/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml"
 } size: 3393 timestamp: 1428355112000 type: FILE visibility: APPLICATION
21:18:32,461 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Create container launch context.
21:18:32,483 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Starting TM with command=$JAVA_HOME/bin/java -Xmx819m  
-Dlog.file="<LOG_DIR>/taskmanager.log" 
-Dlogback.configurationFile=file:logback.xml 
-Dlog4j.configuration=file:log4j.properties 
org.apache.flink.yarn.appMaster.YarnTaskManagerRunner --configDir . 1> 
<LOG_DIR>/taskmanager-stdout.log 2> <LOG_DIR>/taskmanager-stderr.log
21:18:33,077 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - The user requested 1 containers, 0 running. 1 containers missing
21:18:33,631 ERROR akka.actor.OneForOneStrategy                                 
 - Application attempt appattempt_1428355034517_0004_000001 doesn't exist in 
ApplicationMasterService cache.
        at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
        at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: 
Application attempt appattempt_1428355034517_0004_000001 doesn't exist in 
ApplicationMasterService cache.
        at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
        at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy8.allocate(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
        at 
org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:190)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
        at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:91)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
 Application attempt appattempt_1428355034517_0004_000001 doesn't exist in 
ApplicationMasterService cache.
        at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
        at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy7.allocate(Unknown Source)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
        ... 26 more
21:18:33,646 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1   
 - Stopping JobManager akka://flink/user/jobmanager#395299512.
21:18:33,701 ERROR org.apache.flink.yarn.ApplicationMaster$                     
 - RECEIVED SIGNAL 15: SIGTERM
21:19:52,986 INFO  org.apache.flink.yarn.YarnTestBase                           
 - Shutting down MiniYarn cluster
{code}
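
For context, a minimal sketch of the kind of log check that can produce this failure: scan the collected log lines for a prohibited marker and fail if any line matches. The class name LogScanSketch, the method findProhibited, and the whitelist parameter are hypothetical and are not taken from the Flink test code. Whether tolerating this particular exception is the right fix is not stated in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual Flink YARN test utility.
class LogScanSketch {

    // Returns every line that contains a prohibited marker and no whitelisted marker.
    static List<String> findProhibited(List<String> lines,
                                       List<String> prohibited,
                                       List<String> whitelisted) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            boolean flagged = prohibited.stream().anyMatch(line::contains);
            boolean tolerated = whitelisted.stream().anyMatch(line::contains);
            if (flagged && !tolerated) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two lines shaped like the log excerpt above (shortened).
        List<String> lines = Arrays.asList(
            " - The user requested 1 containers, 0 running. 1 containers missing",
            "org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: ");
        List<String> prohibited = Collections.singletonList("Exception");

        // Without a whitelist the exception line is reported and a test would fail.
        System.out.println(
            findProhibited(lines, prohibited, Collections.<String>emptyList()).size());

        // Assumed option: whitelisting the exception name suppresses the report.
        System.out.println(
            findProhibited(lines, prohibited,
                Collections.singletonList("ApplicationAttemptNotFoundException")).size());
    }
}
```

With such a check, any exception line written while the ApplicationMaster is being torn down would be enough to fail an otherwise passing test, which would match the spurious nature of the failure.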



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
