itholic opened a new pull request, #39387: URL: https://github.com/apache/spark/pull/39387
### What changes were proposed in this pull request?

This PR proposes to introduce `pyspark.errors` and error classes to unify and improve the errors generated by PySpark under a single path.

To summarize, this PR includes the changes below:
- Add `python/pyspark/errors/error-classes.json` to support error classes for PySpark.
- Add `ErrorClassesJsonReader` to manage `error-classes.json`.
- Add `PySparkException` to handle the errors generated by PySpark.
- Add `check_error` for error class testing.

This is an initial PR introducing an error framework for PySpark, to facilitate error management and provide better, more consistent error messages to users.

While active work is being done on the [SQL side to improve error messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there has been no work to improve error messages in PySpark. So I expect this PR to also initiate the error message improvement effort on the PySpark side.

**Next up** after this PR:
- Migrate more errors into `PySparkException` across all modules (e.g., Spark Connect, pandas API on Spark).
- Migrate more error tests into error class tests by using `check_error`.
- Define more error classes in `error-classes.json`.
- Add documentation.

### Why are the changes needed?

Centralizing error messages and introducing identified error classes provides the following benefits:
- Errors are searchable via unique class names and properly classified.
- The cost of future maintenance for PySpark errors is reduced.
- Users get consistent, actionable error messages.
- Translating error messages into different languages becomes easier.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UTs and ran the existing static analysis tools (`dev/lint-python`).
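
As a rough illustration of how the pieces listed in the PR summary could fit together, here is a minimal, self-contained sketch. It is not the code from this PR: the JSON layout, the `{name}` placeholder syntax, and the names `SketchErrorClassReader`, `SketchPySparkException`, and `sketch_check_error` are all assumptions made for this example.

```python
import json

# Hypothetical stand-in for the contents of an error-classes.json file.
# The real file's schema and placeholder syntax may differ.
ERROR_CLASSES_JSON = """
{
  "COLUMN_NOT_FOUND": {
    "message": ["Column '{name}' does not exist."]
  }
}
"""


class SketchErrorClassReader:
    """Loads error classes from JSON text and renders message templates."""

    def __init__(self, json_text):
        self._classes = json.loads(json_text)

    def get_message(self, error_class, message_parameters):
        # Join the template lines, then fill in the named placeholders.
        template = "\n".join(self._classes[error_class]["message"])
        return template.format(**message_parameters)


READER = SketchErrorClassReader(ERROR_CLASSES_JSON)


class SketchPySparkException(Exception):
    """Exception identified by an error class instead of free-form text."""

    def __init__(self, error_class, message_parameters):
        self.error_class = error_class
        self.message_parameters = message_parameters
        message = READER.get_message(error_class, message_parameters)
        super().__init__(f"[{error_class}] {message}")


def sketch_check_error(exception, error_class, message_parameters):
    """Test helper: compare the error class and parameters, not the text."""
    assert exception.error_class == error_class
    assert exception.message_parameters == message_parameters


try:
    raise SketchPySparkException("COLUMN_NOT_FOUND", {"name": "age"})
except SketchPySparkException as e:
    sketch_check_error(e, "COLUMN_NOT_FOUND", {"name": "age"})
    rendered = str(e)
```

Because the test helper compares only the error class and its parameters, the message wording in the JSON file can change (or be translated) without breaking tests, which is the maintenance benefit the PR description points to.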
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
