nchammas opened a new pull request, #44756:
URL: https://github.com/apache/spark/pull/44756
### What changes were proposed in this pull request?
Enable Spark configs to be assigned to documentation groups. These groups
will be used to automatically build config tables for display in our
documentation.
Instead of having to maintain [large blocks of HTML tables][1] throughout
our documentation, config tables can simply be included as follows:
```liquid
{% include_api_gen _generated/config_tables/sql-tuning-caching-data.html %}
```
This approach covers both SQL and non-SQL config docs and, if accepted, will
replace `sql/gen-sql-config-docs.py`.
This proposal is an alternative to #44300 that does not require modifying
`ConfigEntry` or `ConfigBuilder` to add a new field. Instead, the groups are
defined completely outside of Spark's core.
[1]:
https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/docs/sql-performance-tuning.md?plain=1#L37-L56
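To make the idea of building tables from groups concrete, here is a minimal sketch of how a group could be turned into an HTML table. It is not the script in this PR: the config names, the metadata fields, the `GROUPS` and `CONFIGS` dictionaries, and the `render_group_table` helper are all hypothetical, and plain dictionaries stand in for the parsed YAML file and for the config metadata read from Spark.

```python
from html import escape

# Hypothetical stand-in for config metadata extracted from Spark.
# Names, defaults, and docs here are placeholders, not real Spark configs.
CONFIGS = {
    "spark.example.alpha": {"default": "true", "doc": "Enables <alpha> handling."},
    "spark.example.beta": {"default": "10", "doc": "Maximum number of beta slots."},
}

# Hypothetical stand-in for the parsed YAML file: group name -> config names.
GROUPS = {"example-group": ["spark.example.alpha", "spark.example.beta"]}


def render_group_table(group_name, groups, configs):
    """Return an HTML table for the configs assigned to one group."""
    rows = []
    for name in groups[group_name]:
        meta = configs[name]
        # Escape every value so doc text cannot break the generated HTML.
        rows.append(
            "<tr>"
            f"<td><code>{escape(name)}</code></td>"
            f"<td>{escape(meta['default'])}</td>"
            f"<td>{escape(meta['doc'])}</td>"
            "</tr>"
        )
    header = "<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>"
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


table_html = render_group_table("example-group", GROUPS, CONFIGS)
```

Because the docs are rendered from the same metadata the code uses, the table text cannot drift from the definitions in the source.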
### Why are the changes needed?
With this approach, we can accomplish several goals at once:
- Eliminate thousands of lines of manually maintained HTML tables of Spark
configs.
- Ensure that internal configs (e.g. `spark.sql.files.openCostInBytes`) are not
accidentally documented publicly.
- Ensure that configs are documented publicly exactly as they are in the
code (e.g. `spark.sql.autoBroadcastJoinThreshold`).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I manually ran the new script to generate config tables and confirmed the
following:
- The desired config tables are generated.
- If the YAML file references a config that cannot be found, the script
errors.
- If the YAML file defines a config group with a reserved name, the script
errors.
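The two error cases above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the reserved name `"all"`, the exception types, the `validate_groups` helper, and the example config names are hypothetical, and a plain dictionary stands in for the parsed YAML file.

```python
# Hypothetical set of reserved group names; the real script may reserve others.
RESERVED_GROUP_NAMES = {"all"}


def validate_groups(groups, known_configs):
    """Raise if a group uses a reserved name or references an unknown config."""
    for group_name, config_names in groups.items():
        if group_name in RESERVED_GROUP_NAMES:
            raise ValueError(f"Group name is reserved: {group_name}")
        # Any config listed in the group must exist among the known configs.
        missing = sorted(set(config_names) - set(known_configs))
        if missing:
            raise LookupError(
                f"Unknown configs in group '{group_name}': {missing}"
            )


# Placeholder config names standing in for those defined in Spark.
KNOWN = {"spark.example.alpha", "spark.example.beta"}

# A valid definition passes silently.
validate_groups({"example-group": ["spark.example.alpha"]}, KNOWN)
```

Failing fast on both conditions keeps a typo in the group definitions from silently producing an incomplete table.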
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]