cloud-fan opened a new pull request, #56538:
URL: https://github.com/apache/spark/pull/56538

   ### What changes were proposed in this pull request?
   
   Spark publishes both a Scaladoc and a Javadoc API site. The Javadoc is 
generated from Scala sources by genjavadoc, and it currently exposes a large 
number of internal types that the Scaladoc correctly hides.
   
   The root cause: a top-level `private[x]` Scala type (e.g. `private[spark] 
trait SupportsDelegationToken`) compiles to a JVM-`public` symbol. genjavadoc 
emits a `public` Java stub for it even with 
`-P:genjavadoc:strictVisibility=true`, and the Javadoc `-public` option can't 
filter it because the stub genuinely is public. Scaladoc, by contrast, honors 
the access qualifier and drops these types.
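   The visibility gap can be checked with a small, self-contained Scala program. This is an illustrative sketch and not part of the PR: the package `demo` and the trait `SupportsDelegationTokenLike` are hypothetical stand-ins for `org.apache.spark` and `SupportsDelegationToken`. It uses `java.lang.reflect.Modifier` to ask whether the compiled class carries the JVM `public` flag. With scalac this is expected to print `true`, even though `private[demo]` hides the trait from Scala code outside `demo`.

   ```scala
// Sketch (hypothetical names): a qualified-private top-level Scala type
// is still emitted with the JVM `public` access flag.
package demo {

  // Hidden from Scala callers outside `demo`, but compiled to a public interface.
  private[demo] trait SupportsDelegationTokenLike

  object VisibilityCheck {
    import java.lang.reflect.Modifier

    // True if the runtime class has the JVM ACC_PUBLIC flag set.
    def isJvmPublic(c: Class[_]): Boolean = Modifier.isPublic(c.getModifiers)

    def main(args: Array[String]): Unit =
      println(isJvmPublic(classOf[SupportsDelegationTokenLike]))
  }
}
   ```

   Because the flag is genuinely `public` in bytecode, any tool that reads compiled or stub visibility (as genjavadoc and Javadoc `-public` do) sees a public type.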
   
   This PR adds a filter to `JavaUnidoc / unidoc / unidocAllSources` (alongside 
the existing `ignoreUndocumentedPackages`) that drops a generated stub 
`<module>/target/java/<pkg>/<Name>.java` **iff every top-level Scala 
declaration of `<Name>` in that package is `private[...]`**. A public class 
with a `private[...]` companion object (e.g. `SparkConf` — public `class`, 
`private[spark] object`) is kept, since the class itself is public.
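   The selection rule can be sketched in Scala as follows. This is a hypothetical illustration, not the code added by this PR: `PrivateStubFilter`, `TopLevelDecl`, `parse`, `stubsToDrop` and `stubPath` are invented names. Declarations are found with a regex that assumes top-level `class`/`trait`/`object` declarations start in column 0 (nested members are indented), that annotations come first, and that files use plain `package` clauses (package objects and `package x { ... }` blocks are not handled). A name's stub is dropped only when every top-level declaration of it in the package carries a `private[...]` qualifier, so a public class with a `private[spark]` companion survives.

   ```scala
// Hypothetical sketch of the "drop iff every top-level declaration is private[...]" rule.
object PrivateStubFilter {

  // One top-level declaration: package, simple name, and whether it is private[...].
  final case class TopLevelDecl(pkg: String, name: String, qualifiedPrivate: Boolean)

  // `package a.b.c` clauses at line start; package objects are skipped.
  private val PackageRe = """(?m)^package\s+(?!object\b)([\w.]+)""".r

  // Column-0 class/trait/object, optionally preceded by annotations and modifiers.
  // Group 1 = modifiers, group 2 = declared name.
  private val DeclRe =
    """(?m)^(?:@[\w.]+(?:\([^)\n]*\))?\s+)*((?:private\[\w+\]\s+|sealed\s+|abstract\s+|final\s+|case\s+)*)(?:class|trait|object)\s+(\w+)""".r

  // Extract top-level declarations from one Scala source file.
  def parse(source: String): List[TopLevelDecl] = {
    // Chained clauses (`package org.apache` + `package spark`) are joined with dots.
    val pkg = PackageRe.findAllMatchIn(source).map(_.group(1)).mkString(".")
    DeclRe.findAllMatchIn(source).map { m =>
      TopLevelDecl(pkg, m.group(2), m.group(1).contains("private["))
    }.toList
  }

  // (package, name) keys whose generated Java stub should be dropped.
  def stubsToDrop(decls: Seq[TopLevelDecl]): Set[(String, String)] =
    decls
      .groupBy(d => (d.pkg, d.name))
      .filter { case (_, ds) => ds.forall(_.qualifiedPrivate) }
      .keySet

  // Stub location in the `<module>/target/java/<pkg>/<Name>.java` layout.
  def stubPath(module: String, pkg: String, name: String): String =
    s"$module/target/java/${pkg.replace('.', '/')}/$name.java"

  def main(args: Array[String]): Unit = {
    val src =
      """package org.apache.spark
        |
        |private[spark] trait SupportsDelegationToken
        |
        |class SparkConf
        |private[spark] object SparkConf
        |""".stripMargin
    stubsToDrop(parse(src))
      .map { case (pkg, name) => stubPath("core", pkg, name) }
      .foreach(println)
  }
}
   ```

   Running `main` prints only `core/target/java/org/apache/spark/SupportsDelegationToken.java`; `SparkConf` is kept because its class declaration is public. Grouping by `(package, name)` before applying `forall` is what lets the public-class-with-private-companion case survive.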
   
   ### Why are the changes needed?
   
   The published Javadoc lists ~1.3k internal types (e.g. `BarrierCoordinator`, 
`ContextCleaner`, `ExecutorAllocationManager`, scheduler RPC messages, 
`SupportsDelegationToken`) that are `private[spark]` in source and are absent 
from the Scaladoc. This both misleads Java users about the public API surface 
and makes the two API docs disagree on which types are public. Filtering them 
aligns the Java API doc with the Scala one (the formats still differ, but the 
set of documented types now matches) without touching genuinely Java-authored 
public APIs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No code/runtime change. The only user-facing effect is on the generated 
Javadoc site: top-level `private[spark]` (and other qualified-private) Scala 
types no longer appear as public Java classes. Genuinely public APIs — 
including Java-authored ones (`src/main/java`, e.g. the DataSource V2 connector 
interfaces) and Java-friendly wrappers like `org.apache.spark.api.java.JavaRDD` 
— are unaffected.
   
   ### How was this patch tested?
   
   - Validated, against the already-generated `*/target/java` stubs across 
`core`, `sql/core`, `sql/api`, `sql/catalyst`, `mllib` and `streaming`, that the 
filter selects exactly the package-private stubs: it drops the `private[spark]` 
leaks (`SupportsDelegationToken`, `StructuredStreamingIdAwareSchedulerLogging`, 
`InternalAccumulator`, ~1.3k in total) while keeping public types and 
public-class-with-private-companion cases (`SparkConf`, `SparkContext`, 
`TaskContext`, `RDD`).
   - Confirmed the build definition compiles via `build/sbt reload`.
   - A full `build/sbt unidoc` run is the end-to-end integration check; this PR 
relies on CI's docs build for it.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Isaac)
   
   This pull request and its description were written by Isaac.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
