Hi all,

I have several questions regarding Flink :) Let me give you some
background on what I have done so far.

*Description*
I am using Flink 1.11.2. My job does data enrichment: data is consumed
from 6 different Kafka topics and joined via multiple
CoProcessFunctions. On a daily basis the job handles ~20 million
events from the source Kafka topics.
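
To make the join logic concrete, here is a stripped-down sketch of what one
enrichment stage looks like (just an illustration of the pattern, not my
actual code; Tuple2<key, payload> stands in for my real POJOs and the
unmatched-event buffering is omitted):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// One enrichment stage: events on input 1, a slowly changing dimension on
// input 2, both streams keyed by the same join key.
public class EnrichmentJoin
        extends CoProcessFunction<Tuple2<String, String>, Tuple2<String, String>, String> {

    // Latest dimension payload per key, kept in keyed (RocksDB) state.
    private transient ValueState<String> dimState;

    @Override
    public void open(Configuration parameters) {
        dimState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("dimension", String.class));
    }

    @Override
    public void processElement1(Tuple2<String, String> event, Context ctx,
                                Collector<String> out) throws Exception {
        String dim = dimState.value();
        if (dim != null) {
            out.collect(event.f1 + "|" + dim);
        }
        // The real job buffers unmatched events and uses timers; omitted here.
    }

    @Override
    public void processElement2(Tuple2<String, String> dim, Context ctx,
                                Collector<String> out) throws Exception {
        dimState.update(dim.f1);
    }
}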

*Configuration*
These are the settings I am using:

jobmanager.memory.process.size: 4096m
jobmanager.memory.off-heap.size: 512m
taskmanager.memory.process.size: 12000m
taskmanager.memory.task.off-heap.size: 512m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 5
taskmanager.rpc.port: 6122
jobmanager.execution.failover-strategy: region
state.backend: rocksdb
state.backend.incremental: true
state.backend.rocksdb.localdir: /opt/flink/rocksdb
state.backend.rocksdb.memory.managed: true
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
state.backend.rocksdb.block.cache-size: 64mb
state.checkpoints.dir: s3://bucket/checkpoints
state.savepoints.dir: s3://bucket/savepoints
s3.access-key: AWS_ACCESS_KEY_ID
s3.secret-key: AWS_SECRET_ACCESS_KEY
s3.endpoint: http://<internal_url>
s3.path.style.access: true
s3.entropy.key: _entropy_
s3.entropy.length: 8
presto.s3.socket-timeout: 10m
client.timeout: 60min
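
For reference, the RocksDB-related part of the configuration above
corresponds roughly to the following programmatic setup in the 1.11 API
(same paths and values as in the config; just a sketch for the discussion,
in practice I rely on flink-conf.yaml):

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Incremental checkpoints to S3 (state.backend.incremental: true).
        RocksDBStateBackend backend =
                new RocksDBStateBackend("s3://bucket/checkpoints", true);
        backend.setDbStoragePath("/opt/flink/rocksdb");
        backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

        env.setStateBackend(backend);
        env.setParallelism(5);
    }
}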

*Deployment setup*
Flink is deployed in k8s in Per-Job mode with 1 job manager and 5 task
managers. I have a daily cron job which triggers a savepoint in order to
have a fresh copy of the whole state.
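
The cron job essentially calls the JobManager's savepoint REST endpoint; a
rough sketch of the equivalent request is below (host, port and job id are
placeholders). This is also what question 3 further down refers to.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerSavepoint {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust host, port and job id to your cluster.
        String jobId = "00000000000000000000000000000000";
        String body = "{\"target-directory\":\"s3://bucket/savepoints\",\"cancel-job\":false}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://jobmanager:8081/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response contains a trigger id that can be polled under
        // /jobs/:jobid/savepoints/:triggerid to check completion.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}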

*Problems with the existing setup*
1. I observe that savepoints cause Flink to consume more than the allowed
memory. I see the behavior described in this Stack Overflow post
<https://stackoverflow.com/questions/64172881/flink-1-11-1-off-heap-memory-growing-using-rocksdbstatebackend>
(which seems to be fixed in 1.12.x, if I understand it correctly).
2. I cannot achieve high availability with Per-Job mode, so I ended up
taking a regular savepoint on a daily basis.

*Questions*
1. Is it a good idea to have regular savepoints (say on a daily basis)?
2. Is it possible to have high availability with Per-Job mode? Or should I
go with Session mode instead and make sure that my Flink cluster runs a
single job?
3. Let's assume that savepoints should be triggered only before a job
update/deployment. How can I trigger a savepoint if my job is already
consuming more than 80% of the allowed memory per pod in k8s? My
observations show that k8s kills the task manager pods and I need to
retry the savepoint a couple of times.
4. Should I consider upgrading to version 1.12.3?
5. Should I consider switching off the state.backend.rocksdb.memory.managed
property even in version 1.12.3?
6. How do I decide when the job parallelism should be increased? Are there
specific metrics that would indicate it is time to increase the
parallelism?

Best Regards,
Rado
