itholic opened a new pull request, #39387:
URL: https://github.com/apache/spark/pull/39387

   ### What changes were proposed in this pull request?
   
   This PR proposes to introduce `pyspark.errors` and error classes to unify 
and improve errors generated by PySpark under a single path.
   
   To summarize, this PR includes the changes below:
   - Add `python/pyspark/errors/error-classes.json` to support error classes for 
PySpark.
   - Add `ErrorClassesJsonReader` to manage the `error-classes.json`.
   - Add `PySparkException` to handle the errors generated by PySpark.
   - Add `check_error` for error class testing.
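
As a rough illustration of how the pieces listed above could fit together, here is a minimal, self-contained sketch. It does not reproduce the code in this PR: `ToyErrorClassesReader`, `ToyPySparkException`, the `EXAMPLE_WRONG_TYPE` class, and the `<name>` placeholder syntax are all assumptions made for the example, and the JSON string stands in for `error-classes.json`.

```python
import json

# Hypothetical stand-in for the contents of error-classes.json.
# The error class name and message template are invented for this sketch.
ERROR_CLASSES_JSON = """
{
  "EXAMPLE_WRONG_TYPE": {
    "message": ["Argument <arg_name> should be a <expected>, got <actual>."]
  }
}
"""


class ToyErrorClassesReader:
    """Simplified stand-in for a reader over the error-class JSON."""

    def __init__(self, json_str):
        self.error_info_map = json.loads(json_str)

    def get_error_message(self, error_class, message_parameters):
        # Join the template lines, then substitute each <name> placeholder.
        template = "\n".join(self.error_info_map[error_class]["message"])
        for name, value in message_parameters.items():
            template = template.replace(f"<{name}>", str(value))
        return template


class ToyPySparkException(Exception):
    """Simplified stand-in for an exception that carries an error class."""

    def __init__(self, error_class, message_parameters):
        self.error_class = error_class
        self.message_parameters = message_parameters
        reader = ToyErrorClassesReader(ERROR_CLASSES_JSON)
        message = reader.get_error_message(error_class, message_parameters)
        super().__init__(f"[{error_class}] {message}")


rendered = None
try:
    raise ToyPySparkException(
        "EXAMPLE_WRONG_TYPE",
        {"arg_name": "cols", "expected": "list", "actual": "int"},
    )
except ToyPySparkException as e:
    rendered = str(e)

print(rendered)
```

Keeping the template in one JSON document and passing only a class name plus parameters at the raise site is what lets the wording change without touching the calling code.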
   
   This is an initial PR for introducing an error framework for PySpark to 
facilitate error management and provide better, more consistent error messages 
to users.
   
   While active work is being done on the [SQL side to improve error 
messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there has 
been no work to improve error messages in PySpark.
   
   So I expect this PR to also initiate the effort to improve error messages 
on the PySpark side.
   
   **Next steps** following this PR include:
   - Migrate more errors into `PySparkException` across all modules (e.g., Spark 
Connect, pandas API on Spark...).
   - Migrate more error tests into error class tests by using `check_error`.
   - Define more error classes in `error-classes.json`.
   - Add documentation.
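
To make the idea of an error class test concrete, the sketch below shows one way such a check might look. It is an assumption-laden illustration, not the `check_error` helper from this PR, whose signature is not shown here: `ToyError`, `check_toy_error`, `parse_port`, and the `EXAMPLE_NOT_A_NUMBER` class are hypothetical names.

```python
class ToyError(Exception):
    """Hypothetical exception carrying an error class and its parameters."""

    def __init__(self, error_class, message_parameters):
        super().__init__(f"[{error_class}] {message_parameters}")
        self.error_class = error_class
        self.message_parameters = message_parameters


def check_toy_error(exception, error_class, message_parameters):
    """Compare the structured fields of the error, not its rendered text."""
    assert isinstance(exception, ToyError), type(exception)
    assert exception.error_class == error_class, exception.error_class
    assert exception.message_parameters == message_parameters, (
        exception.message_parameters
    )


def parse_port(value):
    # Hypothetical function under test.
    if not value.isdigit():
        raise ToyError(
            "EXAMPLE_NOT_A_NUMBER", {"arg_name": "port", "value": value}
        )
    return int(value)


caught = None
try:
    parse_port("80a")
except ToyError as e:
    caught = e

check_toy_error(
    caught,
    error_class="EXAMPLE_NOT_A_NUMBER",
    message_parameters={"arg_name": "port", "value": "80a"},
)
```

Because the assertion targets the class name and parameters, a test written this way keeps passing when the human-readable message is reworded.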
   
   ### Why are the changes needed?
   
   Centralizing error messages and introducing identified error classes provides 
the following benefits:
   - Makes errors searchable via unique class names and properly classified.
   - Reduces the cost of future maintenance for PySpark errors.
   - Provides consistent and actionable error messages to users.
   - Facilitates translating error messages into different languages.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added unit tests and ran the existing static analysis tools (`dev/lint-python`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

