rangadi commented on code in PR #42236:
URL: https://github.com/apache/spark/pull/42236#discussion_r1287455898
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala:
##########
@@ -3233,14 +3233,15 @@ class PlanGenerationTestSuite
"connect/common/src/test/resources/protobuf-tests/common.desc"
test("from_protobuf messageClassName") {
- binary.select(pbFn.from_protobuf(fn.col("bytes"),
classOf[StorageLevel].getName))
+ binary.select(
+ pbFn.from_protobuf(fn.col("bytes"),
"org.apache.spark.sql.protobuf.protos.TestProtoObj"))
Review Comment:
> In order to use this sql function, users have no choice but to relocate
the content of the Java PB description files used in their business according
to Spark's project rules(replcace all com.google.protobuf.
toorg.sparkproject.spark_protobuf.protobuf.
Moving the discussion about "why do users need to shade their java classes?"
here in a comment thread. cc: @LuciferYang, @advancedxy, @HyukjinKwon.
There is no other option. Spark at runtime includes unshaded Protobuf
library (2.5.x) in its class path. This is was as of a few months back. Don't
know if that changed. Its been this way for years. There is no way to support
protobuf Java classes for current versions of Protobuf library without shading.
I agree it is not convenient for the customers. That is why descriptor file
is documented as the primary API (both for Scala and python).
We will soon support Python classes that will help Python customers.
FWIW I have a small gitbuf project to help customers create shaded Java
classes : https://github.com/rangadi/shaded-protobuf-classes
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala:
##########
@@ -3233,14 +3233,15 @@ class PlanGenerationTestSuite
"connect/common/src/test/resources/protobuf-tests/common.desc"
test("from_protobuf messageClassName") {
Review Comment:
This is great! Thank you.
##########
connector/connect/server/src/test/java/org/apache/spark/sql/protobuf/protos/TestProto.java:
##########
@@ -0,0 +1,57 @@
+// Generated by the protocol buffer compiler. DO NOT EDIT!
+// source: test.proto
+
+package org.apache.spark.sql.protobuf.protos;
+
+/**
+ * SPARK-43646: This is generated by the
`connector/connect/server/src/test/resources/test.proto`
+ * to test the spark protobuf uber jar. If you need to regenerate this file:
+ * 1. Modify `connector/connect/server/src/test/resources/test.proto` to
generate Java files
+ * and replace the current file.
+ * 2. Replace all `com.google.protobuf` in the file with
`org.sparkproject.spark_protobuf.protobuf.`
+ * * <p>
+ * * TODO(SPARK-44606): Generate this file and replace the package names in
the file when testing.
+ */
+public final class TestProto {
Review Comment:
Could you update the doc comment to say why this is included (the current
comment talks about a fix, but not the problem). Looks like this uses shaded
version of protobuf library.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]