ulysses-you commented on code in PR #41100:
URL: https://github.com/apache/spark/pull/41100#discussion_r1188375423


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala:
##########
@@ -228,10 +228,12 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
     assert(planned.exists(_.isInstanceOf[TakeOrderedAndProjectExec]))
   }
 
-  test("CollectLimit can appear in the middle of a plan when caching is used") {
+  test("CollectLimit can not appear in the middle of a plan when caching is used") {
     val query = testData.select($"key", $"value").limit(2).cache()
     val planned = query.queryExecution.optimizedPlan.asInstanceOf[InMemoryRelation]
-    assert(planned.cachedPlan.isInstanceOf[CollectLimitExec])
+    assert(planned.cachedPlan.isInstanceOf[TableCacheExec])
+    assert(planned.cachedPlan.children.head.isInstanceOf[WholeStageCodegenExec])
+    assert(planned.cachedPlan.children.head.children.head.isInstanceOf[GlobalLimitExec])

Review Comment:
   This is a small behavior change: before, we would plan the limit in a cached plan to `CollectLimitExec`, but now it is planned to `GlobalLimitExec`. The reason is that we now insert a `TableCache` node between `ReturnAnswer` and `Limit`.
   
   I think `GlobalLimitExec` is actually better than `CollectLimitExec` here:
   1. we always call `execute` on `CollectLimitExec` for a cached plan, so there is no difference between them
   2. `GlobalLimitExec` supports codegen but `CollectLimitExec` does not
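   To illustrate why the planning result changes: the planner's special case for `CollectLimitExec` pattern-matches a `Limit` sitting directly under `ReturnAnswer`, and the newly inserted `TableCache` node breaks that shape. A minimal self-contained sketch of this idea, using hypothetical stand-in case classes rather than Spark's real plan nodes:
   
   ```scala
   // Stand-ins for the relevant plan operators (hypothetical, not Spark's classes).
   sealed trait Plan
   case class Relation() extends Plan
   case class Limit(child: Plan) extends Plan
   case class TableCache(child: Plan) extends Plan
   case class ReturnAnswer(child: Plan) extends Plan
   
   // Sketch of the planner's special case: it only fires when Limit is the
   // direct child of ReturnAnswer.
   def plansCollectLimit(p: Plan): Boolean = p match {
     case ReturnAnswer(Limit(_)) => true  // would become CollectLimitExec
     case _                      => false // Limit falls through to GlobalLimitExec
   }
   
   // Before this PR: ReturnAnswer(Limit(...)) -> special case matches.
   println(plansCollectLimit(ReturnAnswer(Limit(Relation()))))              // true
   
   // After: TableCache sits between ReturnAnswer and Limit -> no match.
   println(plansCollectLimit(ReturnAnswer(TableCache(Limit(Relation())))))  // false
   ```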



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

