Re: [PR] [SPARK-49673] Increase CONNECT_GRPC_ARROW_MAX_BATCH_SIZE to 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE [spark]

via GitHub Tue, 17 Sep 2024 08:51:11 -0700


dillitz commented on code in PR #48122:
URL: https://github.com/apache/spark/pull/48122#discussion_r1763488768



##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -1566,6 +1566,25 @@ class ClientE2ETestSuite
     val result = df.select(trim(col("col"), " ").as("trimmed_col")).collect()
     assert(result sameElements Array(Row("a"), Row("b"), Row("c")))
   }
+
+  test("SPARK-49673: new batch size, multiple batches") {
+    val maxBatchSize = spark.conf.get("spark.connect.grpc.arrow.maxBatchSize").dropRight(1).toInt

Review Comment:
   Sorry, I'm not sure what you mean here. I've decreased both the 
GRPC_MAX_MESSAGE_SIZE in the client and the maxBatchSize on the server to 10MiB 
to simulate batches that are close to the message size limit without having to 
process `range(128*1024*1024)` on the cluster. Are you asking me to replace 
this with spark.connect.grpc.arrow.maxBatchSize = 
CONNECT_GRPC_MAX_MESSAGE_SIZE = 0.8 * 128MiB?
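
   For illustration only, here is a minimal sketch of the size arithmetic 
discussed above. It is not code from the PR: `BatchSizeSketch` and its helpers 
are hypothetical names, and the sketch assumes the conf value is rendered as a 
byte count followed by a one-character unit suffix (for example `10485760b`), 
which is what the `dropRight(1)` in the test appears to rely on. The batch 
count is an idealized estimate that ignores Arrow and gRPC framing overhead.

```scala
// Hypothetical helper object; none of these names exist in Spark or in the PR.
object BatchSizeSketch {
  val MiB: Long = 1024L * 1024L

  // Assumption: the conf string is "<bytes><one-char suffix>", e.g. "10485760b".
  // Drops the suffix and parses the rest as a byte count.
  def parseBytes(confValue: String): Long = confValue.dropRight(1).toLong

  // Batch size cap expressed as a fraction of the message size limit,
  // truncated to whole bytes.
  def scaledBatchSize(maxMessageSize: Long, factor: Double): Long =
    (maxMessageSize * factor).toLong

  // Idealized number of batches needed for totalBytes when each batch
  // holds at most maxBatchSize bytes (ceiling division, no overhead).
  def batchesNeeded(totalBytes: Long, maxBatchSize: Long): Long =
    (totalBytes + maxBatchSize - 1) / maxBatchSize
}
```

   With both limits lowered to 10MiB, `parseBytes("10485760b")` yields the 
byte count used by the test, and a payload of 25MiB would need at least 
`batchesNeeded(25 * MiB, 10 * MiB)` batches, which is how a small range can 
still exercise the multi-batch path.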


