rangadi commented on code in PR #41192:
URL: https://github.com/apache/spark/pull/41192#discussion_r1197260139


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala:
##########
@@ -148,8 +212,38 @@ object functions {
     messageName: String,
     descFilePath: String,
     options: java.util.Map[String, String]): Column = {
+    val fileContent = ProtobufUtils.readDescriptorFileContent(descFilePath)

Review Comment:
   Didn't want to carry the file path further down in those classes. This 
avoids even accidental read of those files.
   Let me know if you see any advantages. 
   It might simplify changes to test suite files in this PR, but this is just 
one time cost. 



##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:
##########
@@ -26,14 +26,14 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None,
+    fileDescriptorSetBytesOpt: Option[Array[Byte]] = None,

Review Comment:
   @advancedxy suggested using broadcast for distributing this buffer ([in the 
comment 
here](https://github.com/apache/spark/pull/41192#issuecomment-1551260315)): 
   
   Moving that discussion to code comment since it has better threading.
   @advancedxy most of the time size would be in kilobytes. It is possible some 
customers have huge protobuf repo and they bundled everything. 
   Using broadcast is a good idea. I need to read up more on it. Can we do that 
as an implementation detail (so that users don't need be aware of it). We also 
need to make sure it is cleaned up.
   
   I will add a comment. Do you think we can do that an enhancement later?
   If we do it as implementation detail, it should work fine with Spark-Connect 
too. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to