nchammas opened a new pull request, #44971:
URL: https://github.com/apache/spark/pull/44971
### What changes were proposed in this pull request?
Consolidate all error documentation into a single page of error states and
conditions. Each condition and sub-condition will have a link anchor that
allows for direct references.
Here are some examples:
```
sql-error-conditions.html#cannot-update-field
sql-error-conditions.html#cannot-update-field-array-type
sql-error-conditions.html#cannot-update-field-interval-type
sql-error-conditions.html#cannot-update-field-map-type
```
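As a rough sketch of how such anchors could be derived, lower-casing the condition name and replacing underscores with hyphens reproduces the examples above. The helper name `anchor_for` and the exact normalization are assumptions for illustration, not necessarily what `docs/util/build-error-docs.py` does.

```python
def anchor_for(condition, sub_condition=None):
    """Hypothetical helper: build a link anchor for an error condition.

    Assumes anchors are the lower-cased names with underscores replaced by
    hyphens, and that a sub-condition is appended to its parent's anchor.
    """
    parts = [condition] if sub_condition is None else [condition, sub_condition]
    return "-".join(part.lower().replace("_", "-") for part in parts)


print(anchor_for("CANNOT_UPDATE_FIELD"))
print(anchor_for("CANNOT_UPDATE_FIELD", "ARRAY_TYPE"))
```

Prefixing the sub-condition with its parent keeps anchors unique even when two parents share a sub-condition name.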
The table is styled to make it easier to read:
- Sub-conditions are indented relative to their parent condition.
- Long condition names like
`UDTF_INVALID_ALIAS_IN_REQUESTED_ORDERING_STRING_FROM_ANALYZE_METHOD` wrap in a
visually pleasing manner.
The new documentation is generated dynamically via
`docs/util/build-error-docs.py` as part of the documentation build. No
generated files will be tracked in git.
TODO:
- [ ] Figure out what, if anything, we will do about the links we are
breaking by deleting the old error pages.
- [ ] Make sure we are using the correct terminology once SPARK-46810 is
resolved.
- [ ] Add opening prose for the main error conditions page.
- [ ] Update the main Jekyll build to generate the new documentation on the
fly.
### Why are the changes needed?
The current error documentation has several problems that make it difficult
to maintain and difficult to use:
1. The current documentation consists of many Markdown files that are both
programmatically generated and checked in to git. This combination lets the
checked-in files drift out of sync with the source they are generated from,
causes unnecessary code churn, and leads to problems like the one that
precipitated #44847.
2. The individual pages like
`docs/sql-error-conditions-cannot-load-state-store-error-class.md` are sparse
and don't add much value as standalone pages. They also clutter the top-level
namespace. There are around 60 of these individual pages, one for each error
condition that has sub-conditions.
The number of pages to manage leads to other, more subtle problems as
well. Though there are around 60 individual pages, only 22 of them are listed
in `docs/_data/menu-sql.yaml`.
3. The current process to generate this documentation is awkward. We are
using a) a Spark Scala test that requires b) a special environment variable to
be set so that c) we read some JSON, in order to d) generate Markdown files,
which in turn e) get compiled into HTML.
Fundamentally, we are just generating HTML from JSON. It seems
inappropriate to couple this process to the Spark build.
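To illustrate how small the JSON-to-HTML step can be, here is a minimal sketch. The JSON layout (a `message` list and an optional `subClass` mapping per condition), the function names, the CSS class names, and the sample data are all assumptions for illustration; the actual script in this PR may differ.

```python
import html
import json


def render_rows(conditions):
    """Hypothetical: yield one <tr> per condition and per sub-condition."""
    for name, info in sorted(conditions.items()):
        yield _row(name, name, info, css_class="error-condition")
        for sub_name, sub_info in sorted(info.get("subClass", {}).items()):
            # Sub-condition anchors are prefixed with the parent's name.
            yield _row(f"{name}_{sub_name}", sub_name, sub_info,
                       css_class="error-sub-condition")


def _row(full_name, label, info, css_class):
    anchor = full_name.lower().replace("_", "-")
    # Escape the message so placeholders like <fieldName> render as text.
    message = html.escape(" ".join(info.get("message", [])))
    return (f'<tr id="{anchor}" class="{css_class}">'
            f"<td><code>{html.escape(label)}</code></td>"
            f"<td>{message}</td></tr>")


# Made-up sample data in the assumed layout; not copied from Spark.
sample = json.loads("""
{
  "CANNOT_UPDATE_FIELD": {
    "message": ["Example parent message for <fieldName>."],
    "subClass": {
      "ARRAY_TYPE": {"message": ["Example sub-condition message."]}
    }
  }
}
""")

rows = list(render_rows(sample))
print("\n".join(rows))
```

Nothing here needs a JVM, a test harness, or an environment variable, which is the point of decoupling this step from the Spark build.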
### Does this PR introduce _any_ user-facing change?
Yes, it overhauls the error documentation. Unless we add redirects, several
links that work against the Spark 3.5 documentation will no longer work against
the Spark 4.0 documentation.
### How was this patch tested?
I built the docs and reviewed the results in my browser.
<img width="400"
src="https://github.com/apache/spark/assets/1039369/effdeea9-a5f8-4471-873f-9746480e1b97">
<br>
<img width="400"
src="https://github.com/apache/spark/assets/1039369/278af3de-a6a3-4872-84ee-97371f92232e">
<br>
<img width="400"
src="https://github.com/apache/spark/assets/1039369/e11dc3bf-bfda-4d70-bd3d-605d48d06d26">
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]