nchammas opened a new pull request, #44865:
URL: https://github.com/apache/spark/pull/44865

   ### What changes were proposed in this pull request?
   
   As [suggested here][1], this change improves the documentation build so that 
it builds Spark at most once, regardless of which API docs are requested in 
the build.
   
   [1]: https://github.com/apache/spark/pull/44791#discussion_r1459233153
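   The at-most-once idea can be illustrated with a small shell sketch. This is a hypothetical illustration, not the code in this PR: the names `build_spark_once`, `gen_scala_docs`, `gen_sql_docs`, `SPARK_BUILT`, and `BUILD_COUNT` are made up, and the real Spark build command is replaced by a counter. Each doc generator calls a guard function, and the guard runs the build only if a flag shows it has not already run.
   
   ```sh
   #!/usr/bin/env bash
   # Hypothetical sketch: run an expensive build step at most once,
   # no matter how many doc generators ask for it.
   
   SPARK_BUILT=0   # flag recording whether the build has already run
   BUILD_COUNT=0   # counts real builds, for illustration only
   
   build_spark_once() {
     if [ "$SPARK_BUILT" -eq 0 ]; then
       # Placeholder for the real (slow) Spark build command.
       BUILD_COUNT=$((BUILD_COUNT + 1))
       SPARK_BUILT=1
     fi
   }
   
   gen_scala_docs() {
     build_spark_once   # builds Spark only on the first call
     echo "generating Scala docs"
   }
   
   gen_sql_docs() {
     build_spark_once   # reuses the earlier build if one exists
     echo "generating SQL docs"
   }
   
   gen_scala_docs
   gen_sql_docs
   echo "Spark was built $BUILD_COUNT time(s)"
   ```
   
   Running it prints the two generator messages followed by `Spark was built 1 time(s)`, even though both generators requested a build.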
   
   ### Why are the changes needed?
   
   There is no need to build Spark multiple times when generating docs. 
Currently, building both the Scala and Python docs, or both the Scala and SQL 
docs, causes Spark to be built twice.
   
   Fixing this saves a couple of minutes each time the docs are built.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   I built the docs on both `master` and this branch as follows:
   
   ```sh
   time SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build
   ```
   
   The timing results are as follows:
   
   ```
   Before this change
   ------------------
   real    6m48.815s
   user    23m17.943s
   sys     1m29.578s
   
   After this change
   -----------------
   real    4m10.672s
   user    14m10.130s
   sys     1m0.773s
   ```
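   To put the saving in numbers, the sketch below converts the `time` values above to seconds and computes the absolute and relative difference. The helper names `to_seconds` and `compare_timings` are illustrative, and the script assumes `bash` and a POSIX `awk` are available.
   
   ```sh
   #!/usr/bin/env bash
   # Convert a `time` value such as "6m48.815s" into seconds (e.g. 408.815).
   to_seconds() {
     echo "$1" | awk -Fm '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
   }
   
   # Print the absolute and relative saving between a "before" and "after" timing.
   compare_timings() {
     local before after
     before=$(to_seconds "$1")
     after=$(to_seconds "$2")
     awk -v a="$before" -v b="$after" \
       'BEGIN { printf "saved %.3f s (%.1f%%)\n", a - b, 100 * (a - b) / a }'
   }
   
   compare_timings 6m48.815s 4m10.672s    # real
   compare_timings 23m17.943s 14m10.130s  # user
   ```
   
   For the `real` timings it prints `saved 158.143 s (38.7%)`, about 2m38s of wall-clock time; for the `user` timings it prints `saved 547.813 s (39.2%)`.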
   
   Additionally, I diffed the generated `_site/` dir across `master` and this 
branch and confirmed they are essentially identical except for some general SQL 
example files.
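   A comparison like this can be done with `diff -rq`, which recurses into both trees and reports only which files differ or exist on one side. The sketch assumes each build's `_site/` output was first copied to its own directory; the paths, the `<pr-branch>` placeholder, and the `compare_sites` helper are hypothetical.
   
   ```sh
   #!/usr/bin/env bash
   # Hypothetical preparation (paths are illustrative):
   #   git checkout master && SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build \
   #     && cp -r _site /tmp/site-master
   #   git checkout <pr-branch> && SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build \
   #     && cp -r _site /tmp/site-branch
   #   compare_sites /tmp/site-master /tmp/site-branch
   
   # List files that differ, or exist on only one side, between two site trees.
   compare_sites() {
     diff -rq "$1" "$2"
     # diff exits 0 (identical), 1 (differences found) or 2 (trouble);
     # only 2 is treated as a failure here.
     [ $? -le 1 ]
   }
   ```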
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

