haiyangsun-db opened a new pull request, #56273:
URL: https://github.com/apache/spark/pull/56273
## Title: [SPARK-56413][UDF][BUILD] Confine gRPC to a dedicated udf-worker-grpc module
### What changes were proposed in this pull request?
This PR extracts the gRPC-based UDF worker transport into a new
`udf/worker/grpc` Maven/SBT module, sibling to the existing `udf/worker/proto`
and `udf/worker/core` modules, so that gRPC is no longer pulled onto the shared
Spark classpath.
Concretely:
- **New module `spark-udf-worker-grpc`** — generates the gRPC service stubs
(`UdfWorkerGrpc`) from the `.proto` definitions in `udf-worker-proto`
(`compile-custom` / grpc-java only), and owns the gRPC runtime dependencies
(`grpc-api`, `grpc-protobuf`, `grpc-stub`, plus `grpc-inprocess` for tests).
- **`udf-worker-proto`** now generates only protobuf-java message classes
(dropped the grpc-java codegen goal and the `grpc-*` dependencies).
- **`udf-worker-core`** no longer depends on gRPC (the `grpc-inprocess` test
dependency was removed).
- **`EchoProtocolSuite`** (the gRPC protocol test) moved from
`udf-worker-core` to the new `udf-worker-grpc` module and re-packaged to
`org.apache.spark.udf.worker.grpc`.
- Registered the module in the root `pom.xml` and in
`project/SparkBuild.scala` (new `udfWorkerGrpc` project, `UDFWorkerGrpc`
settings for grpc-stub-only codegen, and `UDFWorkerProto` restricted to
message-only codegen).
- Regenerated `dev/deps/spark-deps-hadoop-3-hive-2.3`, which drops
`grpc-api`, `grpc-protobuf`, `grpc-protobuf-lite`, `grpc-stub`,
`proto-google-common-protos`, `animal-sniffer-annotations`, and
`error_prone_annotations` from the assembly classpath.
Module dependency shape after this change:
```
  udf-worker-proto   (protobuf-java messages only)
     ^         ^
     |         |
     |   core/catalyst/sql-core  -- use message types + worker abstractions (NO gRPC)
     |
  udf-worker-core    (worker abstractions, no gRPC)
     ^
     |
  udf-worker-grpc    (gRPC service stubs + gRPC runtime -- confined here)
```
### Why are the changes needed?
Introducing the language-agnostic UDF worker framework made
`spark-udf-worker-proto`/`-core` compile dependencies of `core`, `catalyst`,
and `sql/core`. Because the proto module carried the gRPC stack as
compile-scope dependencies (needed to compile its generated gRPC service
stubs), this dragged `grpc-api`, `grpc-protobuf{,-lite}`, `grpc-stub`, and
`proto-google-common-protos` transitively onto the widely-shared Spark
core/assembly classpath. Spark has historically kept gRPC isolated to Spark
Connect (relocated/shaded) to avoid `io.grpc`/protobuf version clashes on that
classpath.
No code on the runtime classpath uses the gRPC stubs yet; their only consumer
was the `EchoProtocolSuite` test. Confining gRPC to its own module removes the
unnecessary footprint from `core`/`catalyst`/`sql-core` while keeping the
framework's message types and worker abstractions available to them.
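
The classpath effect described above can be illustrated with a toy transitive-closure computation. This is a minimal sketch, not Spark build code: the graphs are a simplified reading of this description, and the edges from `udf-worker-core` to `udf-worker-proto` and from `udf-worker-grpc` to the other two worker modules are assumptions made for illustration.

```python
from collections import deque

def transitive_deps(graph, root):
    """Return every node reachable from `root` by following dependency edges."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen

# A subset of the gRPC artifacts named in this description.
GRPC = {"grpc-api", "grpc-protobuf", "grpc-stub"}

# Before: the proto module carries gRPC as compile-scope dependencies.
before = {
    "sql-core": ["udf-worker-proto", "udf-worker-core"],
    "udf-worker-core": ["udf-worker-proto"],  # assumed edge
    "udf-worker-proto": ["protobuf-java", *sorted(GRPC)],
}

# After: gRPC is owned by udf-worker-grpc, which sql-core does not depend on.
after = {
    "sql-core": ["udf-worker-proto", "udf-worker-core"],
    "udf-worker-core": ["udf-worker-proto"],  # assumed edge
    "udf-worker-proto": ["protobuf-java"],
    "udf-worker-grpc": ["udf-worker-proto", "udf-worker-core", *sorted(GRPC)],  # assumed edges
}

leaked_before = transitive_deps(before, "sql-core") & GRPC
leaked_after = transitive_deps(after, "sql-core") & GRPC
```

In this model `leaked_before` contains all three gRPC artifacts, while `leaked_after` is empty: the gRPC artifacts remain reachable only from `udf-worker-grpc`.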
### Does this PR introduce _any_ user-facing change?
No. This is a build/module reorganization; the affected UDF worker framework
is experimental and not yet consumed at runtime.
### How was this patch tested?
- Existing tests, relocated: `EchoProtocolSuite` now runs under
`udf-worker-grpc`.
- Verified with SBT that `udf-worker-grpc/Test`, `udf-worker-core/Test`,
`catalyst`, `core`, and `sql` compile, and confirmed the codegen split on disk
(proto -> `generated-sources/protobuf/java` messages only; grpc ->
`generated-sources/protobuf/grpc-java/UdfWorkerGrpc.java`).
- Regenerated and validated the dependency manifest via
`./dev/test-dependencies.sh --replace-manifest`.
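
As a complement to the manifest regeneration, a reviewer could scan a dependency manifest for the artifacts this PR is meant to drop. The following is a hedged sketch, not part of the patch: it assumes each manifest line has the form `artifact/version/classifier/jar-name`, and the function name `lingering_artifacts` and the sample lines (with placeholder `0.0.0` versions) are hypothetical.

```python
# Artifacts this description lists as dropped from the assembly classpath.
DROPPED = {
    "grpc-api",
    "grpc-protobuf",
    "grpc-protobuf-lite",
    "grpc-stub",
    "proto-google-common-protos",
    "animal-sniffer-annotations",
    "error_prone_annotations",
}

def lingering_artifacts(manifest_text, dropped=DROPPED):
    """Return dropped artifact names that still appear in the manifest text.

    Assumes one entry per line, with the artifact name before the first '/'.
    """
    found = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        artifact = line.split("/", 1)[0]
        if artifact in dropped:
            found.append(artifact)
    return found

# Hypothetical sample with placeholder versions, for illustration only.
sample = """\
grpc-api/0.0.0//grpc-api-0.0.0.jar
protobuf-java/0.0.0//protobuf-java-0.0.0.jar
"""

remaining = lingering_artifacts(sample)
```

An empty result for the regenerated `dev/deps/spark-deps-hadoop-3-hive-2.3` would be consistent with the drop described in this PR; for the sample above, `remaining` is `["grpc-api"]`.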
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)