Hi

I'm struggling with updating a Flink application. During development I used 
a dev application configuration (application.conf). After switching 
application.conf to the production settings and rerunning the job in production 
mode, I got an exception stating that the dev configuration was loaded. It looks 
like application.conf was somehow cached in the cluster.
The application configuration includes a section with a dynamic list of 
RichMapFunctions. These map steps are assembled into a pipeline before the job 
starts. The dev configuration has some map steps that are not enabled in the 
production configuration.
For example, the dev pipeline looks like:
source -> dmpkit-cp-base -> dmpkit-cp-attribute-mapper -> dmpkit-cp-tm -> sink
The production pipeline lacks the first mapper:
source -> dmpkit-cp-attribute-mapper -> dmpkit-cp-tm -> sink
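
To make the setup concrete, here is a minimal sketch of how such a config-driven pipeline could be assembled. It uses plain java.util.function.Function values as stand-ins for the RichMapFunctions; the class name PipelineSketch, the STEPS registry and the string-tagging steps are hypothetical and are not the real application code. The point is only that the list of step names visible at assembly time fully determines the resulting pipeline.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PipelineSketch {

    // Hypothetical registry: step name -> transformation.
    // Each Function stands in for one RichMapFunction of the real job.
    static final Map<String, Function<String, String>> STEPS = new LinkedHashMap<>();
    static {
        STEPS.put("dmpkit-cp-base", s -> s + "|base");
        STEPS.put("dmpkit-cp-attribute-mapper", s -> s + "|attribute-mapper");
        STEPS.put("dmpkit-cp-tm", s -> s + "|tm");
    }

    // Chains the enabled steps in the order given by the configuration list.
    static Function<String, String> assemble(List<String> enabled) {
        Function<String, String> pipeline = Function.identity();
        for (String name : enabled) {
            Function<String, String> step = STEPS.get(name);
            if (step == null) {
                throw new IllegalArgumentException("Unknown step: " + name);
            }
            pipeline = pipeline.andThen(step);
        }
        return pipeline;
    }

    public static void main(String[] args) {
        List<String> dev = Arrays.asList(
            "dmpkit-cp-base", "dmpkit-cp-attribute-mapper", "dmpkit-cp-tm");
        List<String> prod = Arrays.asList(
            "dmpkit-cp-attribute-mapper", "dmpkit-cp-tm");
        System.out.println(assemble(dev).apply("event"));
        System.out.println(assemble(prod).apply("event"));
    }
}
```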

Changing application.conf from dev to prod and rerunning the job still results 
in the dev configuration being used, which causes errors.
The job is submitted with the following parameters:
sudo -Eu dmpkit /usr/lib/flink/bin/flink run \
-m yarn-cluster \
--detached \
--yarnname cleverdata-dmpkit-customerjourney-http-job \
--yarnship /tmp/tmp.uYNTPFfAtm-cleverdata-dmpkit-customerjourney-http-job \
--class ru.cleverdata.dmpkit.customerjourney.http.job.impl.ApplicationBootstrap \
-C file:///tmp/tmp.uYNTPFfAtm-cleverdata-dmpkit-customerjourney-http-job/jackson-databind-2.9.5.jar \
.... (other dependencies jars) \
-C file:///tmp/tmp.uYNTPFfAtm-cleverdata-dmpkit-customerjourney-http-job/etc.zip \
/tmp/tmp.uYNTPFfAtm-cleverdata-dmpkit-customerjourney-http-job/cleverdata-dmpkit-customerjourney-http-job-impl_2.11-2.18.0-93.jar

application.conf is zipped into the etc.zip archive and shipped both in the 
--yarnship /tmp/... directory and as the -C file:///.../etc.zip file.
Inspecting all deployed jars and configuration files shows no dev 
configuration. Yet the started application reports that the dev configuration 
was loaded, and the app crashes.
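
The inspection described above can be sketched roughly as follows. This is a minimal, hypothetical helper (the class name ConfScan and its method names are illustrative, not part of the job): it walks a directory, opens every .jar and .zip it finds, and reports each *.conf entry that contains a marker string such as dmpkit-cp-base. It assumes the configuration files are UTF-8 text; it does not open nested archives and does not look at loose files outside archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ConfScan {

    // Returns "archive!entry" for each *.conf entry inside a .jar/.zip under dir
    // whose text contains the marker. Nested archives are not opened.
    static List<String> findMarker(Path dir, String marker) throws IOException {
        List<Path> archives;
        try (Stream<Path> files = Files.walk(dir)) {
            archives = files
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".jar") || p.toString().endsWith(".zip"))
                .sorted()
                .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path archive : archives) {
            try (ZipFile zip = new ZipFile(archive.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    if (entry.isDirectory() || !entry.getName().endsWith(".conf")) {
                        continue;
                    }
                    try (InputStream in = zip.getInputStream(entry)) {
                        String text = new String(readAll(in), StandardCharsets.UTF_8);
                        if (text.contains(marker)) {
                            hits.add(archive + "!" + entry.getName());
                        }
                    }
                }
            }
        }
        return hits;
    }

    // Reads the whole stream into memory (kept explicit for Java 8 compatibility).
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory to scan, args[1]: marker string, e.g. dmpkit-cp-base
        for (String hit : findMarker(Paths.get(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

Running it against the shipped /tmp/... directory, and separately against any other directory whose jars end up on the classpath, would show whether a stale copy of the configuration is hiding in an archive.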

However, some traces of the dev configuration can be found in the HDFS 
directory of the deployed job: a tmp file there indicates that the dev pipeline 
map functions are enabled.

hdfs dfs -cat /user/dmpkit/.flink/application_1586040264218_59021/application_1586040264218_590217815615186225961213.tmp | strings | less
The output contains mentions of dmpkit-cp-base, which is a dev-only map step.
I have no idea how this map function got into the pipeline when, as I said 
earlier, it is not enabled in the configuration and none of the jars or *.conf 
files contain it.
I'm stuck trying to find where this invalid configuration could be cached.
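
One check that might narrow this down, assuming application.conf is read from the classpath (as, for example, the Typesafe Config library does by default), is to list every copy of the resource that a class loader can see. The sketch below relies only on the standard ClassLoader.getResources call; the class name ConfLocator is hypothetical. Since the pipeline is assembled before the job starts, it would presumably be most informative to run this in the same JVM, and with the same classpath, as the code that builds the pipeline.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ConfLocator {

    // Returns the URL of every resource with the given name visible to the loader,
    // in the order the loader reports them.
    static List<URL> locate(ClassLoader loader, String resource) throws IOException {
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "application.conf";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : locate(loader, name)) {
            System.out.println(url);
        }
    }
}
```

More than one URL in the output, or a URL pointing somewhere other than etc.zip, would indicate a second copy of the configuration on the classpath.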

Cheers,
Dmitry.
