Thank you. The YARN job now starts, but the Flink job itself ends up in a
bad state.

The Flink UI keeps showing status CREATED for all sub-tasks and nothing
seems to be happening.

(For the record, this is what I did: export HADOOP_CLASSPATH=`hadoop classpath`, as found at
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html)
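
In full, the submit sequence now looks roughly like this (the flink run flags are the same ones from my launch script, quoted in my earlier mail below; this is a sketch of my setup, not a general recipe):

  export HADOOP_CLASSPATH=`hadoop classpath`
  flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn ${NODE_COUNT} -ys ${SLOT_COUNT} -yjm ${JOB_MANAGER_MEMORY} -ytm ${TASK_MANAGER_MEMORY} -p ${PARALLELISM} $@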

I found this in the JobManager log:

2018-03-28 15:26:17,449 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 20, slots allocated: 8
at org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$36(ExecutionGraph.java:984)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:551)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:789)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)

After this there was:

2018-03-28 15:26:17,521 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Restarting the job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93).

And some time after that:

2018-03-28 15:27:39,125 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - GET operation failed
java.io.EOFException: Premature end of GET request
at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:275)
at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)

The TaskManager logs don't show any errors.

Is that BlobServerConnection error severe enough to make the job get
stuck like this? How can I debug this further?
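
For what it's worth, my understanding of the slot accounting: the job needs roughly one slot per parallel pipeline, i.e. -p slots, while the YARN session only provides NODE_COUNT * SLOT_COUNT slots (-yn times -ys). The values below are made up purely to illustrate how "required: 20, allocated: 8" could come about; they are not my actual launch parameters:

  # hypothetical example values, not my real settings
  NODE_COUNT=4      # -yn: TaskManagers requested from YARN
  SLOT_COUNT=2      # -ys: slots per TaskManager
  PARALLELISM=20    # -p: slots the job asks for
  echo $((NODE_COUNT * SLOT_COUNT))   # prints 8: only 8 slots offered while 20 are required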

Thanks!

On Wed, Mar 28, 2018 at 5:56 PM, Gary Yao <g...@data-artisans.com> wrote:

> Hi Juho,
>
> Can you try submitting with HADOOP_CLASSPATH=`hadoop classpath` set? [1]
> For example:
>   HADOOP_CLASSPATH=`hadoop classpath` flink-${FLINK_VERSION}/bin/flink run
> [...]
>
> Best,
> Gary
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths
>
>
> On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio <juho.au...@rovio.com> wrote:
>
>> I built a new Flink distribution from release-1.5 branch today.
>>
>> I tried running a job, but I get this error:
>> java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
>>
>> I use yarn-cluster mode.
>>
>> The jersey-core jar is found in the hadoop lib on my EMR cluster, but it
>> seems like it's not used any more.
>>
>> I checked that jersey-core classes are not included in the new
>> distribution, but they were not included in my previously built flink
>> 1.5-SNAPSHOT either, which works. Has something changed recently to
>> cause this?
>>
>> Is this a Flink bug, or should I fix it by explicitly telling the Flink
>> YARN app to use the Hadoop lib now?
>>
>> More details below if needed.
>>
>> Thanks,
>> Juho
>>
>>
>> My launch command is basically:
>>
>> flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn ${NODE_COUNT} -ys ${SLOT_COUNT} -yjm ${JOB_MANAGER_MEMORY} -ytm ${TASK_MANAGER_MEMORY} -yst -yD restart-strategy=fixed-delay -yD restart-strategy.fixed-delay.attempts=3 -yD "restart-strategy.fixed-delay.delay=30 s" -p ${PARALLELISM} $@
>>
>>
>> I'm also setting this to fix a classloading error (with the previous
>> build, which still works):
>> -yD classloader.resolve-order=parent-first
>>
>>
>> Error stack trace:
>>
>> java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
>> at java.lang.ClassLoader.defineClass1(Native Method)
>> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
>> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
>> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
>> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.getClusterDescriptor(FlinkYarnSessionCli.java:971)
>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:273)
>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:449)
>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:92)
>> at org.apache.fli
>> Command exiting with ret '31'
>>
>>
>
