[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

via GitHub Fri, 18 Aug 2023 04:20:20 -0700


cloud-fan commented on code in PR #41782:
URL: https://github.com/apache/spark/pull/41782#discussion_r1298336899



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -487,6 +487,25 @@ object SQLConf {
     .intConf
     .createWithDefault(10000)
 
+  val VECTORIZED_HUGE_VECTOR_RESERVE_RATIO =
+    buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorReserveRatio")
+      .doc("spark will reserve requiredCapacity * this ratio memory next time. This is only " +
+        "effective when spark.sql.inMemoryColumnarStorage.hugeVectorThreshold > 0 and required " +
+        "memory larger than that threshold.")
+      .version("3.5.0")
+      .doubleConf
+      .createWithDefault(1.2)
+
+  val VECTORIZED_HUGE_VECTOR_THRESHOLD =
+    buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorThreshold")
+      .doc("When the in memory column vector is larger than this, spark will reserve " +
+        s"requiredCapacity * ${VECTORIZED_HUGE_VECTOR_RESERVE_RATIO.key} memory next time and " +
+        "free this column vector before reading next batch data. -1 means disabling the " +
+        "optimization.")
+      .version("3.5.0")
+      .bytesConf(ByteUnit.BYTE)
+      .createWithDefault(-1)

Review Comment:
   Can we set this to `1` and see whether there are any test failures? If not, we can
change it back to `-1` and merge it.
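
For readers following the thread, the rule described by the two doc strings in the diff can be sketched roughly as below. This is only an illustration of the documented behavior, not the code in the PR: `HugeVectorReserveSketch` and `capacityToReserve` are hypothetical names, and treating `requiredCapacity` and the threshold as directly comparable numbers is an assumption. The growth policy used when the rule does not apply is not described in the diff, so the sketch returns `None` in that case.

```scala
// Hypothetical sketch of the rule stated in the doc strings; not Spark's implementation.
object HugeVectorReserveSketch {

  /**
   * Returns Some(capacity to reserve) when the huge-vector rule applies:
   * the threshold is positive (a value of -1 disables the optimization)
   * and the required capacity exceeds it. Returns None otherwise, because
   * the diff does not describe the default growth policy.
   */
  def capacityToReserve(
      requiredCapacity: Long,
      hugeVectorThreshold: Long,
      hugeVectorReserveRatio: Double): Option[Long] = {
    val enabled = hugeVectorThreshold > 0
    if (enabled && requiredCapacity > hugeVectorThreshold) {
      // Reserve requiredCapacity * ratio, truncated to a whole number.
      Some((requiredCapacity * hugeVectorReserveRatio).toLong)
    } else {
      None
    }
  }
}
```

Under this reading, temporarily setting the threshold default to `1` makes the rule apply to essentially every vector, which is why it exercises the new code path across the existing test suite, while `-1` leaves the behavior unchanged.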


