SandishKumarHN commented on code in PR #39550:
URL: https://github.com/apache/spark/pull/39550#discussion_r1167987988
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala:
##########
@@ -72,51 +69,129 @@ object functions {
}
/**
- * Converts a binary column of Protobuf format into its corresponding catalyst value. The
- * specified Protobuf class must match the data, otherwise the behavior is
- * undefined: it may fail or return arbitrary result. The jar containing Java class should be
+ * Converts a binary column of Protobuf format into its corresponding catalyst value.
+ * `messageClassName` points to Protobuf Java class. The jar containing Java class should be
* shaded. Specifically, `com.google.protobuf.*` should be shaded to
* `org.sparkproject.spark-protobuf.protobuf.*`.
+ * https://github.com/rangadi/shaded-protobuf-classes is useful to create shaded jar from
+ * Protobuf files.
*
* @param data
* the binary column.
- * @param shadedMessageClassName
- * The Protobuf class name. E.g. <code>org.spark.examples.protobuf.ExampleEvent</code>.
+ * @param messageClassName
+ * The full name for Protobuf Java class. E.g. <code>com.example.protos.ExampleEvent</code>.
* The jar with these classes needs to be shaded as described above.
* @since 3.4.0
*/
@Experimental
- def from_protobuf(data: Column, shadedMessageClassName: String): Column = {
- new Column(ProtobufDataToCatalyst(data.expr, shadedMessageClassName))
+ def from_protobuf(data: Column, messageClassName: String): Column = {
+ new Column(ProtobufDataToCatalyst(data.expr, messageClassName))
}
/**
- * Converts a column into binary of protobuf format.
+ * Converts a binary column of Protobuf format into its corresponding catalyst value.
+ * `messageClassName` points to Protobuf Java class. The jar containing Java class should be
+ * shaded. Specifically, `com.google.protobuf.*` should be shaded to
+ * `org.sparkproject.spark-protobuf.protobuf.*`.
+ * https://github.com/rangadi/shaded-protobuf-classes is useful to create shaded jar from
+ * Protobuf files.
+ *
+ * @param data
+ * the binary column.
+ * @param messageClassName
+ * The full name for Protobuf Java class. E.g. <code>com.example.protos.ExampleEvent</code>.
+ * The jar with these classes needs to be shaded as described above.
+ * @param options
+ * @since 3.4.0
+ */
+ @Experimental
Review Comment:
@ericsun95 Protobuf files can be either .proto files or binary files. .proto
files contain the schema for the Protobuf data, while binary files contain the
actual serialized data. If your Protobuf files are in binary format, you can use
Spark's binaryFile data source to read them.
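
A rough sketch of what that could look like, combining the binaryFile source with
the `from_protobuf(data, messageClassName)` overload from this diff. The input
path and the object name are hypothetical, and the class name is the placeholder
from the Scaladoc above. It assumes each file holds exactly one serialized
message and that the shaded jar with the message class is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.protobuf.functions.from_protobuf

object ReadBinaryProtobuf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-binary-protobuf")
      .getOrCreate()

    // binaryFile produces one row per file with the columns
    // path, modificationTime, length and content (the raw bytes).
    val raw = spark.read
      .format("binaryFile")
      .load("/data/events/*.bin") // hypothetical location

    // Decode the bytes of each file as a single Protobuf message.
    // "com.example.protos.ExampleEvent" is a placeholder class name.
    val events = raw.select(
      col("path"),
      from_protobuf(col("content"), "com.example.protos.ExampleEvent").as("event"))

    events.printSchema()
    spark.stop()
  }
}
```

Since binaryFile loads each file as one binary value, files that contain several
length-delimited messages would need to be split into individual messages before
calling `from_protobuf`.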
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]