EnricoMi opened a new pull request, #54268: URL: https://github.com/apache/spark/pull/54268
### What changes were proposed in this pull request?

This introduces a generic `FileSystemSegmentManagedBuffer`, which wraps a segment of a file on a Hadoop `FileSystem`. It is then used by the `FallbackStorage` to read block data lazily.

### Why are the changes needed?

The `ShuffleBlockFetcherIterator` iterates over various sources of block data: local, host-local, push-merged local, remote, and fallback storage blocks. It goes to great lengths to keep the memory consumed during iteration low. On creation of the iterator, `ShuffleBlockFetcherIterator.initialize()` creates a `ManagedBuffer` for each local, host-local, and push-merged local block. Only on `ShuffleBlockFetcherIterator.next()` does the `ManagedBuffer` actually read the data of the next block. Remote blocks are fetched synchronously and only up to a limited number of bytes at a time.

Currently, `FallbackStorage.read` returns a `ManagedBuffer` that already stores the data. Therefore, fallback storage blocks are fully read in `ShuffleBlockFetcherIterator.initialize()`, and all shuffle data of the iterator that originates from the fallback storage is held in memory before the iteration starts.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests for `FileSystemSegmentManagedBuffer` and `ShuffleBlockFetcherIterator`. The latter now explicitly tests `ShuffleBlockFetcherIterator` with fallback storage blocks.

### Was this patch authored or co-authored using generative AI tooling?

No.
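To make the lazy-read idea concrete, below is a rough sketch of a buffer over a file segment on a Hadoop `FileSystem` that defers all I/O until the data is consumed. The class name, constructor parameters, and path in the example are illustrative assumptions, not the PR's actual code; in the PR this behaviour would sit behind Spark's `ManagedBuffer` interface, whose wiring is omitted here for brevity. Only the Hadoop `FileSystem`/`FSDataInputStream` and commons-io calls are taken from existing APIs.

```scala
import java.io.InputStream
import java.nio.ByteBuffer

import org.apache.commons.io.input.BoundedInputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: a buffer over the [offset, offset + length) segment of a
// file on a Hadoop FileSystem that performs no I/O until the data is requested.
class LazyFileSegment(fs: FileSystem, path: Path, offset: Long, length: Long) {

  // Constant-time: no file access happens when the buffer is created, which is
  // what keeps ShuffleBlockFetcherIterator.initialize() cheap.
  def size: Long = length

  // Opens the file and positions it at the segment start; invoked only when the
  // block is actually consumed, e.g. from ShuffleBlockFetcherIterator.next().
  def createInputStream(): InputStream = {
    val in = fs.open(path)
    in.seek(offset)
    // Bound the stream to the segment length (BoundedInputStream is commons-io).
    new BoundedInputStream(in, length)
  }

  // Eager variant: reads the whole segment into memory only when a ByteBuffer
  // is explicitly requested.
  def nioByteBuffer(): ByteBuffer = {
    val bytes = new Array[Byte](length.toInt)
    val in = fs.open(path)
    try in.readFully(offset, bytes) finally in.close()
    ByteBuffer.wrap(bytes)
  }
}

// Usage example: wrapping a segment of a shuffle data file on the fallback
// storage path (path and offsets are made up). Creating the buffer reads nothing.
object LazyFileSegmentExample {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val segment =
      new LazyFileSegment(fs, new Path("/tmp/fallback/shuffle_0_0_0.data"), 128L, 4096L)
    println(s"segment size: ${segment.size}, no bytes read yet")
  }
}
```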
