advancedxy commented on code in PR #41192:
URL: https://github.com/apache/spark/pull/41192#discussion_r1197287073


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:
##########
@@ -26,14 +26,14 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None,
+    fileDescriptorSetBytesOpt: Option[Array[Byte]] = None,

Review Comment:
   > most of the time size would be in kilobytes
   
   Yes. Most of the time, we don't need to bother the size of protobuf 
messages. But there are certain cases, huge message and huge protobuf repo is 
used, such as ML featuring engineering: they could define thousands, or even 
tens of thousands fields in one message.
   
   > Can we do that as an implementation detail (so that users don't need be 
aware of it).
   
   Of course, we should do that as an implementation details, users should be 
no aware how the bytes are distributed.  And we can be adaptive here, only the 
size exceeds a threshold, say 512K, we should broadcast instead of serializing.
   
   I just checked the related spark connect impl, seems functions are just 
placeholders on the client side, it will work fine with spark connect.
   
   > I will add a comment. Do you think we can do that an enhancement later?
   
   A comment is sufficient for now. Since we can improve it via broadcast, this 
enhancement could be addressed in later prs.
   
   > We also need to make sure it is cleaned up.
   
   P.S. if we are using it correctly, it should be cleaned up automatically. A 
lot of places uses broadcast.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to