dtenedor commented on code in PR #35975:
URL: https://github.com/apache/spark/pull/35975#discussion_r852434711
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala:
##########
@@ -85,22 +85,48 @@ abstract class SparkStrategies extends
QueryPlanner[SparkPlan] {
case ReturnAnswer(rootPlan) => rootPlan match {
case Limit(IntegerLiteral(limit), Sort(order, true, child))
if limit < conf.topKSortFallbackThreshold =>
- TakeOrderedAndProjectExec(limit, order, child.output,
planLater(child)) :: Nil
+ TakeOrderedAndProjectExec(limit, 0, order, child.output,
planLater(child)) :: Nil
case Limit(IntegerLiteral(limit), Project(projectList, Sort(order,
true, child)))
if limit < conf.topKSortFallbackThreshold =>
- TakeOrderedAndProjectExec(limit, order, projectList,
planLater(child)) :: Nil
+ TakeOrderedAndProjectExec(limit, 0, order, projectList,
planLater(child)) :: Nil
case Limit(IntegerLiteral(limit), child) =>
- CollectLimitExec(limit, planLater(child)) :: Nil
+ CollectLimitExec(limit, 0, planLater(child)) :: Nil
+ case GlobalLimitAndOffset(IntegerLiteral(limit),
IntegerLiteral(offset),
+ Sort(order, true, child)) if limit + offset <
conf.topKSortFallbackThreshold =>
+ TakeOrderedAndProjectExec(limit, offset, order, child.output,
planLater(child)) :: Nil
+ case GlobalLimitAndOffset(IntegerLiteral(limit),
IntegerLiteral(offset),
+ Project(projectList, Sort(order, true, child)))
+ if limit + offset < conf.topKSortFallbackThreshold =>
+ TakeOrderedAndProjectExec(limit, offset, order, projectList,
planLater(child)) :: Nil
+ case GlobalLimitAndOffset(IntegerLiteral(limit),
IntegerLiteral(offset), child) =>
+ CollectLimitExec(limit, offset, planLater(child)) :: Nil
case Tail(IntegerLiteral(limit), child) =>
CollectTailExec(limit, planLater(child)) :: Nil
case other => planLater(other) :: Nil
}
case Limit(IntegerLiteral(limit), Sort(order, true, child))
if limit < conf.topKSortFallbackThreshold =>
- TakeOrderedAndProjectExec(limit, order, child.output,
planLater(child)) :: Nil
+ TakeOrderedAndProjectExec(limit, 0, order, child.output,
planLater(child)) :: Nil
Review Comment:
It seems like there is some duplication between this and the cases under the
`ReturnAnswer` block in L86-L105. Could we deduplicate them int once place? It
could help us make sure the numbers are correct when adding the limit and
offset together and comparing against the `conf.topKSortFallbackThreshold`.
##########
sql/core/src/test/resources/sql-tests/inputs/postgreSQL/limit.sql:
##########
@@ -12,25 +12,24 @@ SELECT '' AS five, unique1, unique2, stringu1
SELECT '' AS two, unique1, unique2, stringu1
FROM onek WHERE unique1 > 60 AND unique1 < 63
ORDER BY unique1 LIMIT 5;
--- [SPARK-28330] ANSI SQL: Top-level <result offset clause> in <query
expression>
--- SELECT '' AS three, unique1, unique2, stringu1
--- FROM onek WHERE unique1 > 100
--- ORDER BY unique1 LIMIT 3 OFFSET 20;
--- SELECT '' AS zero, unique1, unique2, stringu1
--- FROM onek WHERE unique1 < 50
--- ORDER BY unique1 DESC LIMIT 8 OFFSET 99;
--- SELECT '' AS eleven, unique1, unique2, stringu1
--- FROM onek WHERE unique1 < 50
--- ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
+SELECT '' AS three, unique1, unique2, stringu1
+ FROM onek WHERE unique1 > 100
+ ORDER BY unique1 LIMIT 3 OFFSET 20;
Review Comment:
Could we also have a few test cases with ORDER BY and LIMIT + OFFSET, where
the sum of the limit and offset are just below and just above the
`conf.topKSortFallbackThreshold`? If we can include EXPLAIN output in the
tests, we can show that the top-K sort gets planned as intended.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]