[jira] [Updated] (SPARK-46943) Support for configuring ShuffledHashJoin plan size Threshold
[ https://issues.apache.org/jira/browse/SPARK-46943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dzcxzl updated SPARK-46943:
---------------------------
Description:

When `spark.sql.join.preferSortMergeJoin=false` is enabled, the following error may occur:

{code:java}
org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to build hash relation, got 478549889 bytes
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
	at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
	at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:813)
	at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:761)
	at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:1064)
	at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
	at org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:75)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.init(Unknown Source)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:775)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:771)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
{code}

This happens because, when converting a sort-merge join (SMJ) to a shuffled hash join (SHJ), the optimizer only checks whether the plan's size is smaller than `conf.autoBroadcastJoinThreshold * conf.numShufflePartitions`. When `numShufflePartitions` is configured large enough, almost any plan passes this check and is converted to SHJ, and building the hash relation on the executor then fails with insufficient memory.
[https://github.com/apache/spark/blob/223afea9960c7ef1a4c8654e043e860f6c248185/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L505-L513]
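To see why a large `numShufflePartitions` makes the conversion so permissive, the size check described above can be sketched as follows. This is an illustrative Python sketch, not the Spark implementation; the function name and parameter names are stand-ins for `plan.stats.sizeInBytes`, `conf.autoBroadcastJoinThreshold`, and `conf.numShufflePartitions`.

```python
# Hedged sketch of the SMJ -> SHJ conversion check described above.
# Names are illustrative stand-ins for the Spark config/plan values.

def can_convert_to_shuffled_hash_join(plan_size_bytes: int,
                                      auto_broadcast_join_threshold: int,
                                      num_shuffle_partitions: int) -> bool:
    """The plan only needs to be smaller than threshold * partitions,
    so a large partition count lets very large plans qualify."""
    return plan_size_bytes <= auto_broadcast_join_threshold * num_shuffle_partitions

# With the default 10 MiB autoBroadcastJoinThreshold and 2000 shuffle
# partitions, even a ~15 GiB plan passes the check, although a single
# partition's hash relation may still exceed executor memory:
threshold = 10 * 1024 * 1024   # 10 MiB
partitions = 2000
plan_size = 15 * 1024 ** 3     # ~15 GiB

print(can_convert_to_shuffled_hash_join(plan_size, threshold, partitions))  # True
```

With few partitions the same plan would not qualify, which is why the failure mode only surfaces once `numShufflePartitions` is tuned up.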
> Support for configuring ShuffledHashJoin plan size Threshold
> ------------------------------------------------------------
>
>                 Key: SPARK-46943
>                 URL: https://issues.apache.org/jira/browse/SPARK-46943
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: dzcxzl
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Updated] (SPARK-46943) Support for configuring ShuffledHashJoin plan size Threshold
[ https://issues.apache.org/jira/browse/SPARK-46943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46943:
-----------------------------------
    Labels: pull-request-available  (was: )

--
This message was sent by Atlassian Jira
(v8.20.10#820010)