Hello Flink community! I'm having some trouble running Flink on YARN. We have servers with many SSD data disks (16 per server). When running jobs on YARN, Flink configures the temporary directories, for example:
2025-06-05 17:13:54,133 INFO org.apache.flink.runtime.clusterframework.BootstrapTools [] - Setting directories for temporary files to: /data1/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data2/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data3/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data4/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data5/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data6/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data7/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data8/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data9/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data10/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data11/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data12/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data13/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data14/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data15/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549, /data16/hadoop-yarn/nm-local/usercache/flink/appcache/application_1746738689296_320549

But for RocksDB, Flink uses only one random disk (folder). This behaviour has two drawbacks:

1. A single disk fills up very quickly (we have heavy streaming jobs with large state).
2. That one disk is overloaded while the other 15 sit idle.

I cannot find any option to change this behaviour, other than setting state.backend.rocksdb.localdir manually (a rough sketch of what I mean is in the P.S. below). But in that case, when Flink is restarted or killed, garbage is left on the disks that I have to clean up by hand. With many clusters running on YARN it is very hard to tell which directories are still alive and required and which are garbage.

Is there any good solution?

--
With best regards,
Artem Syrovatskiy
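P.S. For illustration, this is roughly what I mean by configuring the RocksDB local directories by hand. It is just a sketch against the DataStream API; the /dataN/flink-rocksdb paths are placeholders matching our disk layout (Flink does not create or manage them), and as far as I understand Flink only picks one of the listed directories per RocksDB instance, so different operators/subtasks land on different disks:

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbMultiDiskJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
            // Spread RocksDB working files over several data disks; Flink chooses
            // among these directories for each RocksDB instance instead of putting
            // everything on one disk.
            backend.setDbStoragePaths(
                    "/data1/flink-rocksdb",
                    "/data2/flink-rocksdb",
                    // ... one entry per data disk ...
                    "/data16/flink-rocksdb");
            env.setStateBackend(backend);

            // job topology goes here ...
            env.execute("rocksdb-multi-disk-example");
        }
    }

The same can be done with state.backend.rocksdb.localdir in flink-conf.yaml using a comma-separated list of directories. Either way, these directories live outside the YARN application cache, so they are not cleaned up automatically when the container is killed, which is exactly the garbage problem I described above.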