vicennial commented on code in PR #52973:
URL: https://github.com/apache/spark/pull/52973#discussion_r2510551815


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -6003,6 +6003,19 @@ object SQLConf {
       .bytesConf(ByteUnit.BYTE)
       .createWithDefaultString("3GB")
 
+  val LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES =
+    buildConf(SqlApiConfHelper.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES_KEY)

Review Comment:
   We could update the name here to be a bit more explicit: it pertains to 
the maximum number of bytes that we will materialise in memory for the 
**specific local relation**
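
   A possible shape, as a rough sketch (the key string and val name below 
are illustrative only, not a concrete proposal):

```scala
// Illustrative rename sketch: makes explicit that this caps the bytes
// materialised in memory at once for a single local relation.
val LOCAL_RELATION_MAX_IN_MEMORY_BATCH_SIZE_BYTES =
  buildConf("spark.sql.session.localRelation.maxInMemoryBatchSizeBytes")
    .internal()
    .doc("Maximum number of bytes of chunks that the client materialises " +
      "in memory at once for a single local relation before uploading.")
    .version("4.1.0")
    .longConf
    .checkValue(_ > 0, "The batch size in bytes must be positive")
```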



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -6003,6 +6003,19 @@ object SQLConf {
       .bytesConf(ByteUnit.BYTE)
       .createWithDefaultString("3GB")
 
+  val LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES =
+    buildConf(SqlApiConfHelper.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES_KEY)
+      .internal()
+      .doc("The batch size in bytes for uploading chunks of local relations to 
the server. " +
+        "The client collects multiple chunks into a single batch until this 
limit is reached, " +
+        "then uploads the batch of chunks to the server. This helps reduce 
memory pressure on " +
+        "the client when dealing with large local relations because the client 
does not have to " +
+        "materialize all chunks in memory.")
+      .version("4.1.0")
+      .longConf
+      .checkValue(_ > 0, "The batch size in bytes must be positive")

Review Comment:
   We should check that this value is greater than the chunk size value as 
an initial step
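
   `checkValue` only sees this conf's own value, so the comparison against 
the chunk size would likely need to live where both confs are read; a 
minimal sketch (the chunk-size conf name here is assumed, not taken from 
the PR):

```scala
// Read-site guard comparing the two confs (names assumed for illustration):
val chunkSize = conf.getConf(SQLConf.LOCAL_RELATION_CHUNK_SIZE_BYTES)
val batchSize = conf.getConf(SQLConf.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES)
require(batchSize >= chunkSize,
  s"The batch size ($batchSize bytes) must not be smaller than the chunk " +
    s"size ($chunkSize bytes)")
```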



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -6003,6 +6003,19 @@ object SQLConf {
       .bytesConf(ByteUnit.BYTE)
       .createWithDefaultString("3GB")
 
+  val LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES =
+    buildConf(SqlApiConfHelper.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES_KEY)

Review Comment:
   (specific because multi-threading can result in multiple artifacts being 
materialised at once)



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -6003,6 +6003,19 @@ object SQLConf {
       .bytesConf(ByteUnit.BYTE)
       .createWithDefaultString("3GB")
 
+  val LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES =
+    buildConf(SqlApiConfHelper.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES_KEY)
+      .internal()
+      .doc("The batch size in bytes for uploading chunks of local relations to 
the server. " +
+        "The client collects multiple chunks into a single batch until this 
limit is reached, " +
+        "then uploads the batch of chunks to the server. This helps reduce 
memory pressure on " +
+        "the client when dealing with large local relations because the client 
does not have to " +
+        "materialize all chunks in memory.")
+      .version("4.1.0")
+      .longConf
+      .checkValue(_ > 0, "The batch size in bytes must be positive")

Review Comment:
   In the case of conflicts, this conf should be respected and the 
operation should error out, as we wouldn't want to bypass an explicitly 
set max materialisation size (to avoid system failures)
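
   As a sketch of the failure mode at the upload path (identifiers are 
illustrative, not the PR's actual names):

```scala
// Respect the configured cap rather than silently exceeding it when a
// single chunk alone would not fit into one batch.
if (nextChunkSizeBytes > maxBatchSizeBytes) {
  throw new IllegalStateException(
    s"A single chunk of $nextChunkSizeBytes bytes exceeds the configured " +
      s"batch size limit of $maxBatchSizeBytes bytes; failing the operation " +
      "instead of bypassing the explicitly set materialisation limit.")
}
```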



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

