FileSource enumerateSplits recursively call S3 list - high number of calls

William Wallace Tue, 15 Jul 2025 03:27:40 -0700

We are using the Flink file connector to continuously scan S3 paths. We use
FileSource, which uses NonSplittingRecursiveEnumerator to enumerate the paths.
For each parent path, the enumerateSplits function recursively issues an S3
list call for each S3 “sub-directory”, so the number of calls to S3 grows
with the number of sub-directories. This can be far slower than calling the
S3 ListObjectsV2 API directly on the parent prefix.



For example, if the parent path is /test and there are 1000 sub-directories
under /test, this results in roughly 1000 list calls to S3 instead of 1.
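
To make the arithmetic concrete, here is a minimal sketch that only counts
list calls against an in-memory set of keys. It does not use Flink or the AWS
SDK. The class name ListingCallCount, its helper methods, and the key layout
test/dirN/fileM are all hypothetical, and it assumes that each directory
listing costs one call and that a flat prefix listing returns up to 1000 keys
per page.

```java
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class ListingCallCount {

    // Models per-directory enumeration: one list call for the prefix itself,
    // then one recursive descent for every immediate "sub-directory".
    static int recursiveCalls(NavigableSet<String> keys, String prefix) {
        int calls = 1;
        Set<String> subDirs = new TreeSet<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // keys sharing a prefix are contiguous in sorted order
            }
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash >= 0) {
                subDirs.add(prefix + rest.substring(0, slash + 1));
            }
        }
        for (String subDir : subDirs) {
            calls += recursiveCalls(keys, subDir);
        }
        return calls;
    }

    // Models a flat prefix listing: one call per page of matching keys
    // (assumption: pageSize keys per page, at least one call even if empty).
    static int flatCalls(NavigableSet<String> keys, String prefix, int pageSize) {
        int matching = 0;
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            matching++;
        }
        return Math.max(1, (matching + pageSize - 1) / pageSize);
    }

    // Builds hypothetical keys of the form test/dir<i>/file<j>.
    static NavigableSet<String> sampleBucket(int subDirs, int filesPerDir) {
        NavigableSet<String> keys = new TreeSet<>();
        for (int i = 0; i < subDirs; i++) {
            for (int j = 0; j < filesPerDir; j++) {
                keys.add("test/dir" + i + "/file" + j);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = sampleBucket(1000, 1);
        System.out.println(recursiveCalls(keys, "test/"));  // prints 1001
        System.out.println(flatCalls(keys, "test/", 1000)); // prints 1
    }
}
```

In this model the recursive count is 1001 rather than 1000 because it also
counts the listing of /test itself. The flat count stays at 1 only while the
prefix holds at most one page of keys. With more objects it grows with the
number of pages rather than with the number of sub-directories.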


Could you let us know whether this behavior is expected and, if so, whether
it could be optimized to reduce the number of list calls? This is a blocker
for us.


Thank you.
