I investigated further:
As Flink 1.9.1 works and Flink 1.9.2 does not, I simply tried a rollout
of a vanilla flink:1.9.2-scala_2.11 image to K8s and that worked. So the
issue must be in my image or the JAR I am attaching:
ARG FLINK_VERSION=1.9.2
ARG SCALA_VERSION=2.11
FROM flink:${FLINK_VERSION}-scala_${SCALA_VERSION}

# Re-declare the ARGs: values set before FROM are only visible to FROM itself
ARG FLINK_VERSION
ARG SCALA_VERSION

COPY --chown=flink:flink conf/log4j-console.properties /opt/flink/conf/log4j-console.properties

ADD --chown=flink:flink https://repo1.maven.org/maven2/org/apache/flink/flink-metrics-prometheus_${SCALA_VERSION}/${FLINK_VERSION}/flink-metrics-prometheus_${SCALA_VERSION}-${FLINK_VERSION}.jar /opt/flink/lib/flink-metrics-prometheus_${SCALA_VERSION}-${FLINK_VERSION}.jar

ADD --chown=flink:flink https://repo1.maven.org/maven2/org/apache/flink/flink-statebackend-rocksdb_${SCALA_VERSION}/${FLINK_VERSION}/flink-statebackend-rocksdb_${SCALA_VERSION}-${FLINK_VERSION}.jar /opt/flink/lib/flink-statebackend-rocksdb_${SCALA_VERSION}-${FLINK_VERSION}.jar

ADD --chown=flink:flink deployment/run.sh /opt/flink/run.sh
RUN chmod +x /opt/flink/run.sh

COPY --from=builder --chown=flink:flink /build/target/di-beam-bundled.jar /opt/flink/lib/beam_pipelines.jar
Commenting out the last COPY step - to leave out the fat Beam JAR -
did the trick. So I think my Beam JAR contains something Flink does
not like. The next thing I tried was building a vanilla Beam JAR via:
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=2.19.0 \
  -DgroupId=org.example \
  -DartifactId=word-count-beam \
  -Dversion="0.1" \
  -Dpackage=org.apache.beam.examples \
  -DinteractiveMode=false
and then ran mvn package -Pflink-runner. Even with this JAR the
webinterface of the jobmanager does not come up.
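[Editor's note] Since removing the fat JAR from /opt/flink/lib fixed the UI, one way to narrow this down is to list the entries the bundled JAR ships that could shadow Flink's own classes or web resources. A minimal sketch - the helper name and grep patterns are illustrative, not from the thread; a jar is a plain zip archive, so no JDK is required:

```shell
#!/bin/sh
# List entries in a fat JAR that could shadow classes or web resources
# of the Flink distribution. Helper name and patterns are illustrative.
list_suspects() {
  jar="$1"
  # A jar is a zip archive, so Python's zipfile module can list it
  # without needing a JDK on the machine.
  python3 -m zipfile -l "$jar" \
    | awk '{print $1}' \
    | grep -E '^(org/apache/flink|web/|META-INF/services/)' || true
}
```

Anything under org/apache/flink or a bundled web/ directory inside the fat JAR sits in /opt/flink/lib next to Flink's own artifacts and is a good first suspect.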
Best,
Tobias
On Fri, Feb 28, 2020 at 11:36 AM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
Hi Max,
I am surprised that Stop and Cancel had the same action in Flink 1.8
and lower. In the past, when I clicked Stop, the pipeline shut down
each operator one after the other; after a couple of minutes it was
done and all outstanding bundles were imported correctly (my pipeline
goes from Kafka to BigQuery). When I clicked Cancel it stopped
immediately, leaving some data (bundles) in the air. Was my
understanding wrong? From an operational point of view: when I click
Cancel and start the pipeline again, will it resume without data loss?
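[Editor's note] The two web-UI buttons correspond to two Flink CLI commands with exactly this difference: stop drains the pipeline and takes a savepoint first, while cancel stops it immediately. A minimal sketch assuming Flink 1.9's flag names (check flink stop --help on your version); the functions only assemble the command strings, and the job id and savepoint path are placeholders:

```shell
#!/bin/sh
# Sketch: CLI equivalents of the web UI's Stop and Cancel buttons.
# Flag names assume Flink 1.9; the functions only build the command
# strings so the mapping between the two operations is easy to see.
stop_cmd() {
  # graceful: drain operators, take a savepoint, then shut down
  echo "flink stop -p ${2:-/tmp/savepoints} $1"
}
cancel_cmd() {
  # immediate: in-flight bundles are dropped
  echo "flink cancel $1"
}
```

Restarting after a cancel is only lossless if the job resumes from a checkpoint or savepoint; without one, in-flight bundles are gone.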
Here is my mail to the Flink mailing list explaining my issue with
1.9.2:
I enabled DEBUG/TRACE logging and it looks ok to me when I make the
request after the jobmanager pod has been started:
2020-02-27 10:11:42,439 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/ Method:GET
2020-02-27 10:11:42,440 DEBUG org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Responding with file '/tmp/flink-web-19e3b6a9-f464-49bf-ad94-3237ae898bc7/flink-web-ui/index.html'
2020-02-27 10:11:42,578 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/styles.30d0912c1ece284d8d9a.css Method:GET
2020-02-27 10:11:42,579 DEBUG org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Responding with file '/tmp/flink-web-19e3b6a9-f464-49bf-ad94-3237ae898bc7/flink-web-ui/styles.30d0912c1ece284d8d9a.css'
2020-02-27 10:11:42,648 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/polyfills.b37850e8279bc3caafc9.js Method:GET
2020-02-27 10:11:42,649 DEBUG org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Responding with file '/tmp/flink-web-19e3b6a9-f464-49bf-ad94-3237ae898bc7/flink-web-ui/polyfills.b37850e8279bc3caafc9.js'
2020-02-27 10:11:42,651 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/runtime.440aa244803781d5f83e.js Method:GET
2020-02-27 10:11:42,652 DEBUG org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Responding with file '/tmp/flink-web-19e3b6a9-f464-49bf-ad94-3237ae898bc7/flink-web-ui/runtime.440aa244803781d5f83e.js'
2020-02-27 10:11:42,658 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/main.177039bdbab11da4f8ac.js Method:GET
2020-02-27 10:11:42,658 DEBUG org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Responding with file '/tmp/flink-web-19e3b6a9-f464-49bf-ad94-3237ae898bc7/flink-web-ui/main.177039bdbab11da4f8ac.js'
2020-02-27 10:11:42,922 TRACE org.apache.flink.runtime.rest.FileUploadHandler - Received request. URL:/config Method:GET
I tried Chrome (newest version) and Safari, both in Incognito mode as
well. The webserver returns JavaScript and HTML to the browser, so I
am very puzzled about what could be wrong here.
Content type looks ok:
curl -i localhost:8081
HTTP/1.1 200 OK
Content-Type: text/html
Date: Thu, 27 Feb 2020 12:48:08 GMT
Expires: Thu, 27 Feb 2020 12:53:08 GMT
Cache-Control: private, max-age=300
Last-Modified: Thu, 27 Feb 2020 12:43:36 GMT
Connection: keep-alive
content-length: 2137
Best,
Tobias
On Fri, Feb 28, 2020 at 10:29 AM Maximilian Michels <m...@apache.org> wrote:

The stop functionality has been removed in Beam. It was semantically
identical to using cancel, so we decided to drop support for it.

> Both Flink 1.9.2 and 1.10.0 are not supported yet on Beam, probably
> they will be part of the 2.21.0 release

1.9.2 should be supported, as it is just a patch release. Generally, all
1.x.y releases are compatible with Beam's Flink 1.x Runner.

> I can confirm that the pipeline behaves as expected with
> 2.20.0-SNAPSHOT and Flink 1.9.1 - I also tried Flink 1.9.2 but
> the webinterface didn't show up (just a blank page - javascript
> was being loaded though).

I'm surprised about that. If you have more information that would be great.

-Max
On 28.02.20 07:46, Kaymak, Tobias wrote:
> What I found so far is that the "Stop" button next to the "Cancel"
> button is missing when I run my Beam 2.19.0/2.20.0-SNAPSHOT streaming
> pipeline in Flink 1.9.1's web interface. I couldn't figure out yet if
> it has been removed by the Flink team on purpose or if that is
> something "missing" in the Beam translation layer.
>
> Best,
> Tobias
>
> On Thu, Feb 27, 2020 at 1:44 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> Both Flink 1.9.2 and 1.10.0 are not supported yet on Beam, probably
> they will be part of the 2.21.0 release.
> You can follow the progress on both issues (and help us with early
> testing once in master):
>
> BEAM-9295 Add Flink 1.10 build target and Make FlinkRunner
> compatible with Flink 1.10
> https://issues.apache.org/jira/browse/BEAM-9295
>
> BEAM-9299 Upgrade Flink Runner to 1.8.3 and 1.9.2
> https://issues.apache.org/jira/browse/BEAM-9299
>
> Regards,
> Ismaël
>
> On Thu, Feb 27, 2020 at 11:53 AM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
>
> Copy paste error, sorry:
>
> 2.20.0-SNAPSHOT in combination with beam-runners-flink-1.10
> or beam-runners-flink-1.10-SNAPSHOT didn't work either for me.
>
> On Thu, Feb 27, 2020 at 11:50 AM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
>
> I can confirm that the pipeline behaves as expected with
> 2.20.0-SNAPSHOT and Flink 1.9.1 - I also tried Flink 1.9.2
> but the webinterface didn't show up (just a blank page -
> javascript was being loaded though).
> I emptied my cache and investigated the log and asked on the
> Flink mailing list if this is known - maybe it's also because
> of one of the dependencies in my fat Beam jar. I am still
> investigating this.
>
> How can I test the Flink 1.10 runners? (The following POM is
> not resolvable by maven)
>
> <dependency>
>   <groupId>org.apache.beam</groupId>
>   <artifactId>beam-runners-flink-1.10</artifactId>
>   <version>2.20-SNAPSHOT</version>
> </dependency>
>
> Best,
> Tobi
>
> On Wed, Feb 26, 2020 at 5:07 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> Since it was merged yesterday you can test with the
> 2.20.0-SNAPSHOT until the first candidate is out.
>
> On Wed, Feb 26, 2020 at 4:37 PM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
>
> If I am not running in detached mode (so that my pipeline
> starts) I am unable to Stop it in the webinterface. The only
> option available is to cancel it. Is this expected?
>
> [attachment: Screenshot 2020-02-26 at 16.34.08.png]
>
> On Wed, Feb 26, 2020 at 4:16 PM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
>
> Hello,
>
> we fixed the issue and are ready to test :) - is there a RC
> already available?
>
> Best,
> Tobi
>
> On Wed, Feb 26, 2020 at 12:59 PM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
>
> Hello,
>
> happy to help testing! I am currently fixing a networking
> issue between our dev cluster for integration tests and the
> Kafka it is consuming from. After that I would be ready to
> spin it up and test.
>
> Best,
> Tobi
>
> On Mon, Feb 24, 2020 at 10:13 PM Maximilian Michels <m...@apache.org> wrote:
>
> Thank you for reporting / filing / collecting the issues.
>
> There is a fix pending:
> https://github.com/apache/beam/pull/10950
>
> As for the upgrade issues, the 1.8 and 1.9 upgrade is
> trivial. I will check out the Flink 1.10 PR tomorrow.
>
> Cheers,
> Max
>
> On 24.02.20 09:26, Ismaël Mejía wrote:
> > We are cutting the release branch for 2.20.0 next Wednesday, so
> > not sure if these tickets will make it, but hopefully.
> >
> > For ref,
> > BEAM-9295 Add Flink 1.10 build target and Make FlinkRunner
> > compatible with Flink 1.10
> > BEAM-9299 Upgrade Flink Runner to 1.8.3 and 1.9.2
> >
> > In any case, if you have cycles to help test any of the related
> > tickets' PRs, that would help too.
> >
> > On Mon, Feb 24, 2020 at 8:47 AM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
> >
> > Hi Kyle,
> >
> > thank you for creating the JIRA ticket, I think my best option
> > right now is to wait for a Beam version that is running on
> > Flink 1.10 then - unless there is a new Beam release around the
> > corner :)
> >
> > Best,
> > Tobi
> >
> > On Thu, Feb 20, 2020 at 11:52 PM Kyle Weaver <kcwea...@google.com> wrote:
> >
> > Hi Tobi,
> >
> > This seems like a bug with Beam 2.19. I filed
> > https://issues.apache.org/jira/browse/BEAM-9345 to track the issue.
> >
> > > What puzzles me is that the session cluster should be allowed
> > > to have multiple environments in detached mode - or am I wrong?
> >
> > It looks like that check is removed in Flink 1.10:
> > https://issues.apache.org/jira/browse/FLINK-15201
> >
> > Thanks for reporting.
> > Kyle
> >
> > On Thu, Feb 20, 2020 at 4:10 AM Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote:
> >
> > Hello,
> >
> > I am trying to upgrade from a Flink session cluster 1.8 to 1.9 and
> > from Beam 2.16.0 to 2.19.0. Everything went quite smoothly; the
> > local runner and the local Flink runner work flawlessly.
> >
> > However, when I:
> > 1. Generate a Beam jar for the FlinkRunner via maven
> >    (mvn package -PFlinkRunner)
> > 2. Glue that into a Flink 1.9 docker image
> > 3. Start the image as a Standalone Session Cluster
> >
> > and then try to launch the first pipeline, I get the following
> > exception:
> >
> > org.apache.flink.client.program.ProgramInvocationException: The main
> > method caused an error: Failed to construct instance from factory
> > method FlinkRunner#fromOptions(interface
> > org.apache.beam.sdk.options.PipelineOptions)
> >     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:593)
> >     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
> >     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
> >     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
> >     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
> >     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
> >     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
> >     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
> >     at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> >     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
> > Caused by: java.lang.RuntimeException: Failed to construct instance
> > from factory method FlinkRunner#fromOptions(interface
> > org.apache.beam.sdk.options.PipelineOptions)
> >     at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:224)
> >     at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:155)
> >     at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
> >     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
> >     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
> >     at ch.ricardo.di.beam.KafkaToBigQuery.main(KafkaToBigQuery.java:180)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
> >     ... 9 more
> > Caused by: java.lang.reflect.InvocationTargetException
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
> >     ... 19 more
> > Caused by: org.apache.flink.api.common.InvalidProgramException:
> > Multiple environments cannot be created in detached mode
> >     at org.apache.flink.client.program.ContextEnvironmentFactory.createExecutionEnvironment(ContextEnvironmentFactory.java:67)
> >     at java.util.Optional.map(Optional.java:215)
> >     at org.apache.flink.api.java.ExecutionEnvironment.getExecutionEnvironment(ExecutionEnvironment.java:1068)
> >     at org.apache.beam.runners.flink.translation.utils.Workarounds.restoreOriginalStdOutAndStdErrIfApplicable(Workarounds.java:43)
> >     at org.apache.beam.runners.flink.FlinkRunner.<init>(FlinkRunner.java:96)
> >     at org.apache.beam.runners.flink.FlinkRunner.fromOptions(FlinkRunner.java:90)
> >     ... 24 more
> >
> > I've checked the release notes and the issues and couldn't find
> > anything that relates to this. What puzzles me is that the session
> > cluster should be allowed to have multiple environments in detached
> > mode - or am I wrong?
> >
> > Best,
> > Tobi
> >
> > --
> > Tobias Kaymak
> > Data Engineer
> > Data Intelligence
> >
> > tobias.kay...@ricardo.ch
> > www.ricardo.ch
> > Theilerstrasse 1a, 6300 Zug