Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22630#discussion_r223182488
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -345,6 +345,27 @@ trait CodegenSupport extends SparkPlan {
    * don't require shouldStop() in the loop of producing rows.
    */
   def needStopCheck: Boolean = parent.needStopCheck
+
+  /**
+   * A sequence of checks which evaluate to true if the downstream Limit operators have not yet
+   * received enough records to reach the limit. If the current node is a data-producing node, it
+   * can leverage this information to stop producing data and complete the data flow early.
+   * Common data-producing nodes are leaf nodes like Range and Scan, and blocking nodes like
+   * Sort and Aggregate. These checks should be put into the loop condition of the data-producing
+   * loop.
+   */
+  def limitNotReachedChecks: Seq[String] = parent.limitNotReachedChecks
+
+  /**
+   * A helper method to generate the data-producing loop condition according to the
+   * limit-not-reached checks.
+   */
+  final def limitNotReachedCond: String = {
+    if (parent.limitNotReachedChecks.isEmpty) {
--- End diff ---
It's not very useful to enforce that. The consequence is so minor that I
don't think it's worth the complexity. I want to have a simple and robust
framework for the limit optimization first.
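
For context, here is a minimal sketch of how a data-producing node could fold
`limitNotReachedCond` into its generated loop. It is modeled loosely on an
input-iterator leaf; the `inputs[0]` setup and the variable names are
illustrative assumptions, not the exact Range or Scan codegen:

```scala
// Minimal sketch, assuming limitNotReachedCond renders either as "" or as a
// prefix such as "!stopEarly_0 && ", so it composes directly with the
// iterator's hasNext() check. Not the exact Spark implementation.
override protected def doProduce(ctx: CodegenContext): String = {
  // Keep the upstream iterator in a mutable field of the generated class
  // (the inputs[0] initializer is assumed for this sketch).
  val input = ctx.addMutableState("scala.collection.Iterator", "input",
    v => s"$v = inputs[0];")
  val row = ctx.freshName("row")
  // The limit-not-reached checks go into the loop condition, so the loop
  // exits as soon as every downstream Limit has received enough records.
  s"""
     |while ($limitNotReachedCond$input.hasNext()) {
     |  InternalRow $row = (InternalRow) $input.next();
     |  ${consume(ctx, null, row).trim}
     |  if (shouldStop()) return;
     |}
   """.stripMargin
}
```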
---