nchammas opened a new pull request, #44756:
URL: https://github.com/apache/spark/pull/44756

   ### What changes were proposed in this pull request?
   
   Enable Spark configs to be assigned to documentation groups. These groups 
will be used to automatically build config tables for display in our 
documentation.
   
   Instead of having to maintain [large blocks of HTML tables][1] throughout 
our documentation, config tables can simply be included as follows:
   
   ```liquid
   {% include_api_gen _generated/config_tables/sql-tuning-caching-data.html %}
   ```
   
   This approach covers both SQL and non-SQL config docs and, if accepted, will 
replace `sql/gen-sql-config-docs.py`.
   
   This proposal is an alternative to #44300 that does not require modifying 
`ConfigEntry` or `ConfigBuilder` to add a new field. Instead, the groups are 
defined completely outside of Spark's core.
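
As a rough illustration of groups living outside Spark's core, here is a minimal Python sketch. It assumes a hypothetical mapping from group name to config keys and a hypothetical dictionary standing in for config metadata read from Spark. The names `GROUPS`, `CONFIGS`, and `render_config_table`, the table columns, and the placeholder descriptions are all assumptions for illustration, not the PR's actual script or data format.

```python
from html import escape

# Hypothetical group definitions, kept outside Spark's core.
GROUPS = {
    "sql-tuning-caching-data": [
        "spark.sql.inMemoryColumnarStorage.compressed",
        "spark.sql.inMemoryColumnarStorage.batchSize",
    ],
}

# Stand-in for config metadata read from Spark. The defaults and
# descriptions here are illustrative placeholders only.
CONFIGS = {
    "spark.sql.inMemoryColumnarStorage.compressed": {
        "default": "true",
        "doc": "Placeholder description.",
    },
    "spark.sql.inMemoryColumnarStorage.batchSize": {
        "default": "10000",
        "doc": "Placeholder description.",
    },
}


def render_config_table(group, groups=GROUPS, configs=CONFIGS):
    """Build an HTML table for one documentation group."""
    header = "<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>"
    rows = []
    for key in groups[group]:
        entry = configs[key]
        rows.append(
            "<tr><td><code>{}</code></td><td>{}</td><td>{}</td></tr>".format(
                escape(key), escape(entry["default"]), escape(entry["doc"])
            )
        )
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"
```

Under these assumptions, `render_config_table("sql-tuning-caching-data")` returns the HTML for a single table, which a build step could write to a file for the include shown earlier.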
   
   [1]: https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/docs/sql-performance-tuning.md?plain=1#L37-L56
   
   ### Why are the changes needed?
   
   Using this approach, we can accomplish several goals at once:
   
   - Eliminate thousands of lines of manually maintained HTML tables of Spark 
configs.
   - Ensure that internal configs are not accidentally documented publicly 
(e.g. `spark.sql.files.openCostInBytes`).
   - Ensure that configs are documented publicly exactly as they are in the 
code (e.g. `spark.sql.autoBroadcastJoinThreshold`).
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   I manually ran the new script to generate config tables and confirmed the 
following:
   
   - The desired config tables are generated.
   - If a config is mentioned in the YAML file but is not found, the script 
raises an error.
   - If the YAML file defines a config group that uses a reserved name, 
the script raises an error.
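
The two error cases above could be checked along these lines. This is a minimal Python sketch and not the PR's script: the function name `validate_groups`, the exception type, and the reserved names are hypothetical, and the groups are assumed to have already been loaded from the YAML file into a plain dictionary.

```python
# Hypothetical reserved group names; the actual reserved names are not
# listed in this description.
RESERVED_GROUP_NAMES = frozenset({"sql-runtime", "sql-static"})


class ConfigGroupError(ValueError):
    """Raised when the group definitions are invalid."""


def validate_groups(groups, known_configs, reserved=RESERVED_GROUP_NAMES):
    """Check parsed group definitions against the known configs.

    groups: dict mapping a group name to a list of config keys
            (assumed to be parsed from the YAML file beforehand).
    known_configs: set of config keys that exist in Spark.
    """
    for name, keys in groups.items():
        # A group must not use a reserved name.
        if name in reserved:
            raise ConfigGroupError(f"Group name is reserved: {name}")
        # Every config listed in a group must actually exist.
        missing = [key for key in keys if key not in known_configs]
        if missing:
            raise ConfigGroupError(
                f"Group {name} references unknown configs: {missing}"
            )
```

Failing fast in both cases keeps a typo in the group definitions from silently producing an incomplete or misnamed table.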
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

