Can you give me the stacktrace that is logged in the JobManager logs?
On 13.03.2019 10:57, Wouter Zorgdrager wrote:
Hi Chesnay,
Unfortunately this is not true when I run the Flink 1.7.2 docker
images. The response is still:
{
"errors": [
"org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error."
]
}
Regards,
Wouter Zorgdrager
Op wo 13 mrt. 2019 om 10:42 schreef Chesnay Schepler
<[email protected] <mailto:[email protected]>>:
You should get the full stacktrace if you upgrade to 1.7.2 .
On 13.03.2019 09:55, Wouter Zorgdrager wrote:
Hey all!
I'm looking for some advice on the following; I'm working on an
abstraction on top of Apache Flink to 'pipeline' Flink
applications using Kafka. For deployment this means that all
these Flink jobs are embedded into one jar and each job is
started using an program argument (e.g. "--stage
'FirstFlinkJob'". To ease deploying a set of interconnected Flink
jobs onto a cluster I wrote a Python script which basically
communicates with the REST client of the JobManager. So you can
do things like "pipeline start --jar 'JarWithThePipeline.jar'"
and this would deploy every Flink application separately.
However, this script was written a while ago against Flink
version "1.4.2". This week I tried to upgrade it to Flink latest
version but I noticed a change in the REST responses. In order to
get the "pipeline start" command working,we need to know all the
Flink jobs that are in the jar (we call these Flink jobs
'stages') because we need to know the stage names as argument for
the jar. For the 1.4.2 version we used a dirty trick; we ran the
jar with '--list --asException' as program arguments which
basically runs the jar file and immediately throws an exception
with the stage names. These are then parsed and used to start
every stage separately. The error message that Flink threw looked
something like this:
java.util.concurrent.CompletionException:
org.apache.flink.util.FlinkException: Could not run the jar.
at
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not run
the jar.
... 9 more
Caused by:
org.apache.flink.client.program.ProgramInvocationException: The
main method caused an error.
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:542)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
at
org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)
at
org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:87)
at
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)
... 8 more
Caused by: org.codefeedr.pipeline.PipelineListException:
["org.codefeedr.plugin.twitter.stages.TwitterStatusInput","mongo_tweets","elasticsearch_tweets"]
at org.codefeedr.pipeline.Pipeline.showList(Pipeline.scala:114)
at org.codefeedr.pipeline.Pipeline.start(Pipeline.scala:100)
at nl.wouterr.Main$.main(Main.scala:23)
at nl.wouterr.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
However, for 1.7.0 this trick doesn't work anymore because
instead of returning the full stack trace, it only returns the
following:
org.apache.flink.client.program.ProgramInvocationException: The
program caused an error:
In the console of the JobManager it does give the full stack
trace though. So first of all I'm wondering if there might be a
way to enable more detailed stacktraces for Flink 1.7 in the REST
responses. If not, do you have any suggestions on how to tackle
this problem. I know, in the end this isn't really a Flink
problem however you might know a workaround in the Flink REST
client to achieve the same.
Some solutions I already considered:
- Running the jar with the "--list --asException" locally through
the Python script; however Flink and Scala are not provided in
the jar. Technically I could add them both to the classpath, but
this would require users to have the Flink jar locally (and also
Scala somewhere, but I assume most have).
- Let users provide a list of stage names for all their
(interconnected) Flink jobs. This is not really an option,
because the (main) idea behind this framework is to reduce the
boilerplate and cumbersome of setting up complex stream
processing architectures.
Any help is appreciated. Thanks in advance!
Kind regards,
Wouter Zorgdrager