anishshri-db opened a new pull request, #37196: URL: https://github.com/apache/spark/pull/37196
### What changes were proposed in this pull request?

Some large users of stateful queries that keep a lot of RocksDB-related files open run into IO exceptions around "too many open files".

```
Job aborted due to stage failure: ... : org.rocksdb.RocksDBException: While open a file for random read: ... XXX.sst: Too many open files
```

This change allows configuring the max_open_files property for the underlying RocksDB instance.

### Why are the changes needed?

By default, the value of maxOpenFiles is -1, which means that the DB keeps every file it has opened open. In some cases this hits the OS limit on open files and crashes the process. As part of this change, we provide a state store config option for RocksDB that sets this to a finite value, so that the number of open files can be bounded per RocksDB instance.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests to validate that the config is passed through a RocksDB conf as well as through the Spark session.

```
[info] - RocksDB confs are passed correctly from SparkSession to db instance (2 seconds, 377 milliseconds)
12:54:57.927 WARN org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite: ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.streaming.state.RocksDBStateStoreSuite, threads: rpc-boss-3-1 (daemon=true), shuffle-boss-6-1 (daemon=true) =====
[info] Run completed in 4 seconds, 24 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

```
[info] RocksDBSuite:
12:55:56.165 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - SPARK-39781: adding valid max_open_files=-1 config property for RocksDB state store instance should succeed (1 second, 553 milliseconds)
[info] - SPARK-39781: adding valid max_open_files=100 config property for RocksDB state store instance should succeed (664 milliseconds)
[info] - SPARK-39781: adding valid max_open_files=1000 config property for RocksDB state store instance should succeed (558 milliseconds)
[info] - SPARK-39781: adding invalid max_open_files=test config property for RocksDB state store instance should fail (9 milliseconds)
[info] - SPARK-39781: adding invalid max_open_files=true config property for RocksDB state store instance should fail (8 milliseconds)
[info] Run completed in 3 seconds, 815 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
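For illustration, here is a minimal sketch of how a user might bound the number of open files through the Spark session once this option is available. This message does not spell out the config key, so the key `spark.sql.streaming.stateStore.rocksdb.maxOpenFiles` is an assumption based on the prefix used by the other RocksDB state store options. The app name and local master are placeholders, and 1000 is simply one of the values exercised by the tests above.

```scala
import org.apache.spark.sql.SparkSession

object MaxOpenFilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rocksdb-max-open-files-sketch") // placeholder name
      .master("local[2]")                       // placeholder master for a local run
      // Use the RocksDB-backed state store provider for stateful queries.
      .config(
        "spark.sql.streaming.stateStore.providerClass",
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
      // Assumed key name: bound the open files per RocksDB instance.
      // -1 (the default described above) keeps every opened file open.
      .config("spark.sql.streaming.stateStore.rocksdb.maxOpenFiles", "1000")
      .getOrCreate()

    // Read the value back to confirm it was set on the session.
    println(spark.conf.get("spark.sql.streaming.stateStore.rocksdb.maxOpenFiles"))

    spark.stop()
  }
}
```

A finite value trades some read performance for a bounded file descriptor count, since RocksDB has to close and reopen files once the limit is reached. The right number depends on the OS limit and on how many RocksDB instances share an executor.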
