zirtoshka opened a new pull request, #53282:
URL: https://github.com/apache/spark/pull/53282
Scheduling-related performance issues such as **data skew** and **load
imbalance** remain difficult to diagnose automatically in open-source systems
like Spark.
Current limitations:
- detection of skew requires **manual inspection** of the Spark UI and logs;
- Spark does not emit a clear **real-time signal** indicating that
  *“this stage is suffering from scheduling-related performance problems”*;
- automated remediation tools cannot act without such a signal.
This project implements a small but practical step towards **automated
remediation**:
> A lightweight driver-side skew detector that emits a structured event
whenever scheduling-related issues occur.
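To make the idea concrete, here is a minimal sketch of such a detector. It is not the code from this PR. It assumes skew is judged only from task durations within a single stage: a stage is flagged when its slowest task runs at least `ratio_threshold` times longer than the median task *and* the absolute gap is at least `min_gap_ms`, so that stages made of very short tasks do not trigger on noise. It is written in Python for brevity. In Spark itself the hooks would more likely live in a Scala `SparkListener` overriding `onTaskEnd` and `onStageCompleted`. The names `SkewDetector` and `StageSkewEvent`, and both thresholds with their defaults, are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StageSkewEvent:
    """Hypothetical structured event emitted when a stage looks skewed."""
    stage_id: int
    num_tasks: int
    median_ms: float
    max_ms: float
    skew_ratio: float


@dataclass
class SkewDetector:
    """Driver-side sketch: collects per-task durations and evaluates each
    stage once it completes. Threshold names and defaults are assumptions."""
    ratio_threshold: float = 5.0   # flag if max >= ratio_threshold * median
    min_gap_ms: float = 1000.0     # ...and max - median >= min_gap_ms
    min_tasks: int = 2             # one task has no peers to be skewed against
    _durations: Dict[int, List[float]] = field(
        default_factory=dict, init=False, repr=False)

    def on_task_end(self, stage_id: int, duration_ms: float) -> None:
        # Analogous to SparkListener.onTaskEnd: record one finished task.
        self._durations.setdefault(stage_id, []).append(duration_ms)

    def on_stage_completed(self, stage_id: int) -> Optional[StageSkewEvent]:
        # Analogous to SparkListener.onStageCompleted: evaluate, then drop
        # the stage's state so memory does not grow with the number of stages.
        durations = self._durations.pop(stage_id, [])
        if len(durations) < self.min_tasks:
            return None
        med = median(durations)
        worst = max(durations)
        ratio = worst / med if med > 0 else float("inf")
        if ratio >= self.ratio_threshold and worst - med >= self.min_gap_ms:
            return StageSkewEvent(stage_id, len(durations), med, worst, ratio)
        return None


# Example: four ~100 ms tasks and one 6 s straggler in stage 3.
detector = SkewDetector()
for d in [100, 110, 95, 105, 6000]:
    detector.on_task_end(stage_id=3, duration_ms=d)
event = detector.on_stage_completed(3)
print(event)
```

Comparing against the median rather than the mean keeps a single straggler from inflating its own baseline. The "factor of the median" test is similar in spirit to Spark AQE's skew-join heuristic (`spark.sql.adaptive.skewJoin.skewedPartitionFactor`), although that setting applies to shuffle partition sizes rather than task durations.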