cloud-fan opened a new pull request, #56538: URL: https://github.com/apache/spark/pull/56538
### What changes were proposed in this pull request?

Spark publishes both a Scaladoc and a Javadoc API site. The Javadoc is generated from Scala sources by genjavadoc, and it currently exposes a large number of internal types that the Scaladoc correctly hides.

The root cause: a top-level `private[x]` Scala type (e.g. `private[spark] trait SupportsDelegationToken`) compiles to a JVM-`public` symbol. genjavadoc emits a `public` Java stub for it even with `-P:genjavadoc:strictVisibility=true`, and the Javadoc `-public` option can't filter it because the stub genuinely is public. Scaladoc, by contrast, honors the access qualifier and drops these types.

This PR adds a filter to `JavaUnidoc / unidoc / unidocAllSources` (alongside the existing `ignoreUndocumentedPackages`) that drops a generated stub `<module>/target/java/<pkg>/<Name>.java` **iff every top-level Scala declaration of `<Name>` in that package is `private[...]`**. A public class with a `private[...]` companion object (e.g. `SparkConf`, which has a public `class` and a `private[spark] object`) is kept, since the class itself is public.

### Why are the changes needed?

The published Javadoc lists ~1.3k internal types (e.g. `BarrierCoordinator`, `ContextCleaner`, `ExecutorAllocationManager`, scheduler RPC messages, `SupportsDelegationToken`) that are `private[spark]` in source and are absent from the Scaladoc. This both misleads Java users about the public API surface and makes the two API docs disagree on which types are public. Filtering them aligns the Java API doc with the Scala one (format still differs, coverage now matches) without touching genuinely Java-authored public APIs.

### Does this PR introduce _any_ user-facing change?

No code/runtime change. The only user-facing effect is on the generated Javadoc site: top-level `private[spark]` (and other qualified-private) Scala types no longer appear as public Java classes. Genuinely public APIs, including Java-authored ones (`src/main/java`, e.g. the DataSource V2 connector interfaces) and Java-friendly wrappers like `org.apache.spark.api.java.JavaRDD`, are unaffected.

### How was this patch tested?

- Validated the filter selects exactly the package-private stubs against the already-generated `*/target/java` stubs across `core`, `sql/core`, `sql/api`, `sql/catalyst`, `mllib`, `streaming`: it drops the `private[spark]` leaks (`SupportsDelegationToken`, `StructuredStreamingIdAwareSchedulerLogging`, `InternalAccumulator`, ~1.3k total) while keeping public types and public-class-with-private-companion cases (`SparkConf`, `SparkContext`, `TaskContext`, `RDD`).
- Confirmed the build definition compiles via `build/sbt reload`.
- A full `build/sbt unidoc` run is the end-to-end integration check; relying on CI's docs build for that.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Isaac)

This pull request and its description were written by Isaac.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
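
For illustration only, the "drop iff every top-level declaration is `private[...]`" rule from the first section could be sketched roughly as follows. This is not the code in the PR: `StubFilterSketch`, `allDeclarationsPrivate`, and the regex are hypothetical names and heuristics chosen for this sketch, and the regex only recognizes declarations that start at column 0 with the `private[...]` qualifier written before any other modifier.

```scala
import scala.util.matching.Regex

// Hypothetical sketch, not the PR's implementation.
object StubFilterSketch {
  // Heuristic: a top-level declaration starts at column 0.
  // Group 1 captures an optional `private[...]` qualifier (empty if absent),
  // group 2 captures the declared type name.
  private val TopLevelDecl: Regex =
    """(?m)^((?:private\[\w+\]\s+)?)(?:(?:abstract|final|sealed|implicit|case)\s+)*(?:class|trait|object)\s+(\w+)""".r

  /** True iff `name` is declared at least once in `sources` (the Scala source
    * texts of one package) and every such declaration is `private[...]`. */
  def allDeclarationsPrivate(name: String, sources: Seq[String]): Boolean = {
    val isPrivateFlags = sources.flatMap { src =>
      TopLevelDecl
        .findAllMatchIn(src)
        .filter(_.group(2) == name)
        .map(_.group(1).nonEmpty)
        .toList
    }
    // An unknown name is never dropped; a single public declaration keeps the stub.
    isPrivateFlags.nonEmpty && isPrivateFlags.forall(identity)
  }
}
```

Under these assumptions, a source containing only `private[spark] trait SupportsDelegationToken` would mark that stub for removal, while a source with a public `class SparkConf` next to a `private[spark] object SparkConf` would keep it, matching the companion-object case described above.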
