Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22630#discussion_r223182488
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -345,6 +345,27 @@ trait CodegenSupport extends SparkPlan {
    * don't require shouldStop() in the loop of producing rows.
    */
   def needStopCheck: Boolean = parent.needStopCheck
+
+  /**
+   * A sequence of checks which evaluate to true if the downstream Limit operators have not yet
+   * received enough records to reach the limit. If the current node is a data-producing node, it
+   * can leverage this information to stop producing data and complete the data flow early.
+   * Common data-producing nodes are leaf nodes like Range and Scan, and blocking nodes like
+   * Sort and Aggregate. These checks should be put into the loop condition of the data-producing
+   * loop.
+   */
+  def limitNotReachedChecks: Seq[String] = parent.limitNotReachedChecks
+
+  /**
+   * A helper method to generate the data-producing loop condition according to the
+   * limit-not-reached checks.
+   */
+  final def limitNotReachedCond: String = {
+    if (parent.limitNotReachedChecks.isEmpty) {
--- End diff ---
It's not very useful to enforce that. The consequence is so minor that I
don't think it's worth the complexity. I want to have a simple and robust
framework for the limit optimization first.
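
For context, here is a minimal sketch of how a data-producing node could fold
`limitNotReachedCond` into its generated loop. It is modeled loosely on an
input-iterator leaf; the `inputs[0]` setup and the variable names are
illustrative assumptions, not the exact Range or Scan codegen:

```scala
// Minimal sketch, assuming limitNotReachedCond renders either as "" or as a
// prefix such as "!stopEarly_0 && ", so it composes directly with the
// iterator's hasNext() check. Not the exact Spark implementation.
override protected def doProduce(ctx: CodegenContext): String = {
  // Keep the upstream iterator in a mutable field of the generated class
  // (the inputs[0] initializer is assumed for this sketch).
  val input = ctx.addMutableState("scala.collection.Iterator", "input",
    v => s"$v = inputs[0];")
  val row = ctx.freshName("row")
  // The limit-not-reached checks go into the loop condition, so the loop
  // exits as soon as every downstream Limit has received enough records.
  s"""
     |while ($limitNotReachedCond$input.hasNext()) {
     |  InternalRow $row = (InternalRow) $input.next();
     |  ${consume(ctx, null, row).trim}
     |  if (shouldStop()) return;
     |}
   """.stripMargin
}
```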
---