sunchao commented on PR #56293: URL: https://github.com/apache/spark/pull/56293#issuecomment-4628304268
@viirya Thanks for the careful review. I addressed the two concrete maintainability points in `f72fa825533`: - `MemoryConsumer.spill()` now documents that implementations must release memory synchronously on the calling thread and must not acquire task memory, directly or from another thread. - The PR description now explicitly calls out the extra pointer-capacity check on shuffle insertion instead of saying successful allocations are entirely unchanged. I kept the `ThreadLocal` guard rather than switching to a consumer-keyed set. The current guard intentionally covers any nested page allocation on the synchronous recovery call stack, including allocations by a consumer other than the one currently being spilled. A consumer-keyed set would require predicting and registering every possible nested consumer, while allowing an asynchronous spill worker to acquire task memory would conflict with the spill contract and can reintroduce blocking/deadlock behavior. The two guard helpers also cannot currently nest: each checks the guard before setting it, and the recovery loop invokes only one guarded path at a time. This update also fixes an adversarial shuffle regression found during follow-up validation: with `spark.shuffle.sort.initialBufferSize=1`, the prior head could create one spill per record. Successful pointer growth is now retained, with a one-time minimum-array fallback if it starves the first data page. The 4,097-record test now produces 12 TimSort spills and 14 radix-sort spills instead of 4,097, and radix no longer OOMs. The focused and broader memory/shuffle suites pass (68 tests), as does Checkstyle. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
