Thank you, Xintong. While tracking down bash-java-utils.jar, I found a bug in my CI scripts that was building the wrong version of Flink. I fixed this and then added a -Xmx value.
env:
  - name: FLINK_ENV_JAVA_OPTS
    value: "-Xmx{{ .Values.analytics.flink.taskManagerHeapSize }}"

It's running perfectly now!

Thank you again,
Clay

On Fri, Jun 12, 2020 at 5:13 AM Xintong Song <tonysong...@gmail.com> wrote:

> Hi Clay,
>
> Could you verify that the "taskmanager.sh" used is the same script shipped
> with Flink-1.10.1? Or is a custom script used? Also, does the jar file
> "bash-java-utils.jar" exist in your Flink bin directory?
>
> In Flink 1.10, the memory configuration for a TaskManager works as follows.
>
>    - "taskmanager.sh" executes "bash-java-utils.jar" for the memory
>    calculations
>    - "bash-java-utils.jar" will read your "flink-conf.yaml" and all the
>    "-D" arguments, and calculate memory sizes accordingly
>    - "bash-java-utils.jar" will then return the memory calculation
>    results as two strings, for JVM parameters ("-Xmx", "-Xms", etc.) and
>    dynamic configurations ("-D") respectively
>       - At this step, all the detailed memory sizes should be determined
>       - That means, even for memory sizes not configured by you, there
>       should be an exact value generated in the returned dynamic
>       configuration
>       - That also means, for memory components configured in ranges
>       (e.g., network memory configured through a pair of [min, max]),
>       a deterministic value should be decided and both min/max configuration
>       options should already have been overwritten to that value
>    - "taskmanager.sh" starts the task manager JVM process with the
>    returned JVM parameters, and passes the dynamic configurations as
>    arguments into the task manager process. These dynamic configurations
>    will be read by the Flink task manager so that memory will be managed
>    accordingly.
>
> The Flink task manager expects all the memory configurations to be already
> set (thus network min/max should have the same value) before it's started.
> In your case, it seems such configurations are missing. Same for the cpu
> cores.
>
> Thank you~
>
> Xintong Song
>
>
> On Fri, Jun 12, 2020 at 12:58 AM Clay Teeter <clay.tee...@maalka.com>
> wrote:
>
>> Hi flink fans,
>>
>> I'm hoping for an easy solution. I'm trying to upgrade my 1.9.3 cluster to
>> Flink 1.10.1, but I'm running into memory configuration errors.
>>
>> Such as:
>> *Caused by: org.apache.flink.configuration.IllegalConfigurationException:
>> The network memory min (64 mb) and max (1 gb) mismatch, the network memory
>> has to be resolved and set to a fixed value before task executor starts*
>>
>> *Caused by: org.apache.flink.configuration.IllegalConfigurationException:
>> The required configuration option Key: 'taskmanager.cpu.cores' , default:
>> null (fallback keys: []) is not set*
>>
>> I was able to fix a cascade of errors by explicitly setting these values:
>>
>> taskmanager.memory.managed.size: {{ .Values.analytics.flink.taskManagerManagedSize }}
>> taskmanager.memory.task.heap.size: {{ .Values.analytics.flink.taskManagerHeapSize }}
>> taskmanager.memory.jvm-metaspace.size: 500m
>> taskmanager.cpu.cores: 4
>>
>> So, the documentation implies that Flink will default many of these
>> values; however, my 1.10.1 cluster doesn't seem to be doing this. 1.9.3
>> worked great!
>>
>> Do I really have to set all the memory (even network) values? If not,
>> what am I missing?
>>
>> If I do have to set all the memory parameters, how do I resolve "The
>> network memory min (64 mb) and max (1 gb) mismatch"?
>>
>>
>> My cluster runs standalone jobs on kube.
>>
>> flink-config.yaml:
>> state.backend: rocksdb
>> state.backend.incremental: true
>> state.checkpoints.num-retained: 1
>> taskmanager.memory.managed.size: {{ .Values.analytics.flink.taskManagerManagedSize }}
>> taskmanager.memory.task.heap.size: {{ .Values.analytics.flink.taskManagerHeapSize }}
>> taskmanager.memory.jvm-metaspace.size: 500m
>> taskmanager.cpu.cores: 4
>> taskmanager.numberOfTaskSlots: {{ .Values.analytics.task.numberOfTaskSlots }}
>> parallelism.default: {{ .Values.analytics.flink.parallelism }}
>>
>>
>> JobManager:
>>   command: ["/opt/flink/bin/standalone-job.sh"]
>>   args: ["start-foreground", "-j={{ .Values.analytics.flinkRunnable }}", ...
>>
>> TaskManager:
>>   command: ["/opt/flink/bin/taskmanager.sh"]
>>   args: [
>>     "start-foreground",
>>     "-Djobmanager.rpc.address=localhost",
>>     "-Dmetrics.reporter.prom.port=9430"]
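
For readers of this thread, the range-resolution step Xintong describes (a [min, max] network memory range being fixed to a single value, with both options then overwritten) can be illustrated with a small sketch. This is not Flink's implementation and not the code in bash-java-utils.jar. The function names, the 0.1 fraction, and the 4096 MB total are assumptions chosen for illustration; only the 64 MB / 1 GB bounds come from the error message quoted above.

```python
# Illustrative sketch only: NOT Flink's actual code.
# It mimics resolving a [min, max] network memory range to one fixed value
# and emitting it as "-D" dynamic configuration arguments.


def resolve_network_memory(total_flink_mb, fraction, min_mb, max_mb):
    """Derive network memory from a fraction of the total, clamp it into
    [min_mb, max_mb], and return min and max both pinned to that value."""
    if min_mb > max_mb:
        raise ValueError("network memory min must not exceed max")
    derived = int(total_flink_mb * fraction)
    fixed = max(min_mb, min(derived, max_mb))
    return {
        "taskmanager.memory.network.min": f"{fixed}m",
        "taskmanager.memory.network.max": f"{fixed}m",
    }


def to_dynamic_args(options):
    """Render resolved options as "-Dkey=value" strings."""
    return [f"-D{key}={value}" for key, value in options.items()]


# Assumed inputs: 4096 MB total, fraction 0.1, bounds of 64 MB and 1024 MB.
resolved = resolve_network_memory(4096, 0.1, 64, 1024)
args = to_dynamic_args(resolved)
print(args)
```

After resolution, min and max carry the same value, which is the condition the "network memory min ... and max ... mismatch" check expects to hold before the task executor starts.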