[GitHub] [spark] wangyum commented on a change in pull request #31984: [SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint

GitBox Mon, 05 Apr 2021 17:12:46 -0700


wangyum commented on a change in pull request #31984:
URL: https://github.com/apache/spark/pull/31984#discussion_r607405046




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -868,7 +868,7 @@ abstract class DynamicPartitionPruningSuiteBase
           |ON f.store_id = s.store_id WHERE s.country = 'DE'
         """.stripMargin)
 
-      checkPartitionPruningPredicate(df, true, false)
+      checkPartitionPruningPredicate(df, false, false)

Review comment:
       We can add a broadcast hint to enable DPP:
   ```scala
       Given("disable broadcast hash join and enable query duplication")
       withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST_ONLY.key -> "false",
         SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
         SQLConf.DYNAMIC_PARTITION_PRUNING_USE_STATS.key -> "true") {
         val df = sql(
           """
             |SELECT /*+ BROADCAST(s) */ f.date_id, f.product_id, f.units_sold, f.store_id FROM fact_stats f
             |JOIN dim_stats s
             |ON f.store_id = s.store_id WHERE s.country = 'DE'
           """.stripMargin)
   
         checkPartitionPruningPredicate(df, false, true)
   
         checkAnswer(df,
           Row(1030, 2, 10, 3) ::
           Row(1040, 2, 50, 3) ::
           Row(1050, 2, 50, 3) ::
           Row(1060, 2, 50, 3) :: Nil
         )
       }
   ```
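
       To make the expectation above concrete, here is a minimal toy model of the rule named in the PR title (the filtering side must be broadcastable by size or by hint). This is only a sketch and not Spark's actual implementation: `FilteringSide`, `canBroadcast`, and the 1024-byte size are hypothetical names and values chosen for illustration.
   ```scala
   // Hypothetical toy model; none of these names come from Spark itself.
   object DppBroadcastSketch {
     // Assumed description of the filtering (dimension) side of the join.
     final case class FilteringSide(sizeInBytes: Long, hasBroadcastHint: Boolean)

     // The side qualifies if it carries a broadcast hint, or if size-based
     // broadcast is enabled (threshold >= 0) and the side fits under it.
     def canBroadcast(side: FilteringSide, autoBroadcastThreshold: Long): Boolean =
       side.hasBroadcastHint ||
         (autoBroadcastThreshold >= 0 && side.sizeInBytes <= autoBroadcastThreshold)

     def main(args: Array[String]): Unit = {
       val threshold = -1L // mirrors AUTO_BROADCASTJOIN_THRESHOLD -> "-1" in the test
       val noHint = FilteringSide(sizeInBytes = 1024L, hasBroadcastHint = false)
       val withHint = noHint.copy(hasBroadcastHint = true)

       println(canBroadcast(noHint, threshold))
       println(canBroadcast(withHint, threshold))
     }
   }
   ```
       Under this model, with the threshold at -1 only the hinted side qualifies, which matches the idea of using `/*+ BROADCAST(s) */` in the test query.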




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



