hvanhovell opened a new pull request #34632: URL: https://github.com/apache/spark/pull/34632
### What changes were proposed in this pull request?

This PR moves the `BlockInfoManager` from a single mutex per instance to much finer-grained locking at the block level. Concretely, this PR makes the following changes:

- The use of `this.synchronized` for guarding against concurrent creation of a block has been replaced with a striped lock. This effectively replaces a single coarse-grained lock with a lock per block.
- The use of `this.synchronized` for `wait()` and `notifyAll()` has been replaced with per-block `Condition`s.
- Common logic has been extracted from `lockForWriting` and `lockForReading` into an `acquireLock` helper method. This deduplication is important given the size of the changes this PR needed to make to the shared code.
- Optimization: `currentTaskAttemptId` is now called only once per method, with its result stored in a local variable.

Hedged sketches of the striped-lock scheme and the `acquireLock` deduplication follow at the end of this description.

This PR is the first in a series. The next one will add group-based locking and removal so we can remove broadcasts and cached RDDs in a constant-time operation.

### Why are the changes needed?

The main motivation for this change is to increase and stabilize the throughput of clusters that run a significant number of concurrent queries. In the current situation these queries fight over the `BlockInfoManager` lock when they create broadcasts. The problem is worst after a full GC, which triggers clean-up of unused broadcasts. Under load, the number of broadcasts in the system can be significant (>> 10K), and we need to acquire the lock once per broadcast block. We end up with all the block manager worker threads, the `DAGScheduler` thread, and the query threads fighting for a single lock. This causes throughput to drop to close to 0 during these clean-up periods.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Functionally this is covered by existing tests. The performance has been checked by running 32 concurrent streams and hammering a cluster with queries. That shows the throughput drops are mostly gone now. I am not sure how well we can capture this in a non-invasive benchmark.
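---

Below is a minimal, hypothetical sketch of the striped-lock and per-block-`Condition` idea described above. This is not the PR's actual code: the names `StripedBlockLocks`, `numStripes`, and `withBlockLock` are illustrative, and block ids are simplified to strings.

```scala
import java.util.concurrent.locks.{Condition, ReentrantLock}
import scala.collection.concurrent.TrieMap

// Sketch: block ids hash onto a fixed array of lock stripes, and each block
// gets its own Condition for wait/notify, so threads touching different
// blocks no longer contend on one manager-wide mutex.
class StripedBlockLocks(numStripes: Int = 1024) {
  private val stripes = Array.fill(numStripes)(new ReentrantLock())
  // Per-block Conditions, created lazily under the owning stripe's lock.
  private val conditions = TrieMap.empty[String, Condition]

  private def stripeFor(blockId: String): ReentrantLock =
    stripes((blockId.hashCode & Integer.MAX_VALUE) % numStripes)

  // Run `body` while holding only the stripe that guards `blockId`.
  def withBlockLock[T](blockId: String)(body: Condition => T): T = {
    val lock = stripeFor(blockId)
    lock.lock()
    try {
      val cond = conditions.getOrElseUpdate(blockId, lock.newCondition())
      body(cond)
    } finally {
      lock.unlock()
    }
  }
}
```

In this scheme, a thread that finds a block unavailable calls `cond.await()` instead of `wait()`, and a thread releasing a lock calls `cond.signalAll()` instead of `notifyAll()`. Since a block always hashes to the same stripe, its `Condition` is always used under the lock that owns it.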

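And a hedged sketch of the `acquireLock` deduplication mentioned above, built on the `StripedBlockLocks` class from the previous sketch. `BlockInfo` here is a simplified stand-in for Spark's internal class, and its fields, the method signatures, and the lock-compatibility rules are assumptions for illustration only.

```scala
import scala.collection.concurrent.TrieMap

// Simplified stand-in for Spark's internal per-block bookkeeping.
class BlockInfo {
  var writerTask: Long = -1L // task attempt holding the write lock, or -1
  var readerCount: Int = 0   // number of outstanding read locks
}

class BlockLockManager {
  private val locks = new StripedBlockLocks()
  private val blocks = TrieMap.empty[String, BlockInfo]

  // Shared wait/acquire loop: `f` decides, under the block's lock, whether
  // the caller may take the lock, and performs the bookkeeping if so.
  private def acquireLock(blockId: String, blocking: Boolean)
                         (f: BlockInfo => Boolean): Option[BlockInfo] =
    locks.withBlockLock(blockId) { cond =>
      val info = blocks.getOrElseUpdate(blockId, new BlockInfo)
      var acquired = f(info)
      while (!acquired && blocking) {
        cond.await() // replaces wait() on the manager-wide mutex
        acquired = f(info)
      }
      if (acquired) Some(info) else None
    }

  // The two public lock methods now differ only in their compatibility test.
  def lockForReading(blockId: String, blocking: Boolean = true): Option[BlockInfo] =
    acquireLock(blockId, blocking) { info =>
      if (info.writerTask == -1L) { info.readerCount += 1; true } else false
    }

  def lockForWriting(blockId: String, taskId: Long,
                     blocking: Boolean = true): Option[BlockInfo] =
    acquireLock(blockId, blocking) { info =>
      if (info.writerTask == -1L && info.readerCount == 0) {
        info.writerTask = taskId
        true
      } else false
    }

  // Releasing a lock signals only the waiters for this particular block.
  def unlock(blockId: String, taskId: Long): Unit =
    locks.withBlockLock(blockId) { cond =>
      blocks.get(blockId).foreach { info =>
        if (info.writerTask == taskId) info.writerTask = -1L
        else if (info.readerCount > 0) info.readerCount -= 1
      }
      cond.signalAll() // replaces notifyAll() on the manager-wide mutex
    }
}
```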