ulysses-you commented on code in PR #41100:
URL: https://github.com/apache/spark/pull/41100#discussion_r1188375423


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala:
##########
@@ -228,10 +228,12 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
     assert(planned.exists(_.isInstanceOf[TakeOrderedAndProjectExec]))
   }
 
-  test("CollectLimit can appear in the middle of a plan when caching is used") {
+  test("CollectLimit can not appear in the middle of a plan when caching is used") {
     val query = testData.select($"key", $"value").limit(2).cache()
     val planned = query.queryExecution.optimizedPlan.asInstanceOf[InMemoryRelation]
-    assert(planned.cachedPlan.isInstanceOf[CollectLimitExec])
+    assert(planned.cachedPlan.isInstanceOf[TableCacheExec])
+    assert(planned.cachedPlan.children.head.isInstanceOf[WholeStageCodegenExec])
+    assert(planned.cachedPlan.children.head.children.head.isInstanceOf[GlobalLimitExec])

Review Comment:
   This is a small behavior change: before, we would plan the limit in a cached plan to `CollectLimitExec`, but now it is planned to `GlobalLimitExec`. The reason is that we now insert a `TableCache` node between `ReturnAnswer` and `Limit`.
   
   I think `GlobalLimitExec` is actually better than `CollectLimitExec` here:
   1. we always call `execute` on `CollectLimitExec` for a cached plan, so there is no difference between them
   2. `GlobalLimitExec` supports codegen but `CollectLimitExec` does not
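   To illustrate why the planning result changes: the planner's special case for `CollectLimitExec` pattern-matches a `Limit` sitting directly under `ReturnAnswer`, and the newly inserted `TableCache` node breaks that shape. A minimal self-contained sketch of this idea, using hypothetical stand-in case classes rather than Spark's real plan nodes:
   
   ```scala
   // Stand-ins for the relevant plan operators (hypothetical, not Spark's classes).
   sealed trait Plan
   case class Relation() extends Plan
   case class Limit(child: Plan) extends Plan
   case class TableCache(child: Plan) extends Plan
   case class ReturnAnswer(child: Plan) extends Plan
   
   // Sketch of the planner's special case: it only fires when Limit is the
   // direct child of ReturnAnswer.
   def plansCollectLimit(p: Plan): Boolean = p match {
     case ReturnAnswer(Limit(_)) => true  // would become CollectLimitExec
     case _                      => false // Limit falls through to GlobalLimitExec
   }
   
   // Before this PR: ReturnAnswer(Limit(...)) -> special case matches.
   println(plansCollectLimit(ReturnAnswer(Limit(Relation()))))              // true
   
   // After: TableCache sits between ReturnAnswer and Limit -> no match.
   println(plansCollectLimit(ReturnAnswer(TableCache(Limit(Relation())))))  // false
   ```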



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

