[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

mgaido91 Fri, 05 Oct 2018 04:54:02 -0700

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22630#discussion_r222956995
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -345,6 +345,27 @@ trait CodegenSupport extends SparkPlan {
        * don't require shouldStop() in the loop of producing rows.
        */
       def needStopCheck: Boolean = parent.needStopCheck
    +
    +  /**
    +   * A sequence of checks which evaluate to true if the downstream Limit operators have not received
    +   * enough records and reached the limit. If current node is a data producing node, it can leverage
    +   * this information to stop producing data and complete the data flow earlier. Common data
    +   * producing nodes are leaf nodes like Range and Scan, and blocking nodes like Sort and Aggregate.
    +   * These checks should be put into the loop condition of the data producing loop.
    +   */
    +  def limitNotReachedChecks: Seq[String] = parent.limitNotReachedChecks
    +
    +  /**
    +   * A helper method to generate the data producing loop condition according to the
    +   * limit-not-reached checks.
    +   */
    +  final def limitNotReachedCond: String = {
    +    if (parent.limitNotReachedChecks.isEmpty) {
    +      ""
    +    } else {
    +      parent.limitNotReachedChecks.mkString(" && ", " && ", "")
    --- End diff --
    
    Here we are assuming that the result is going to be ANDed with an already
    existing condition. I don't see a case in which this may be used in a
    different context as of now, but what about producing just the conditions
    here and adding the leading `&&` outside of this method? It may be easier
    to reuse this. WDYT?
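
A minimal sketch of what this suggestion could look like, written as stand-alone functions rather than as members of `CodegenSupport`. The object name `LimitCondSketch`, the `Option[String]` return type, and the `loopCondition` helper are assumptions made for illustration only; none of them are taken from the pull request.

```scala
// Hypothetical stand-alone sketch; not the actual CodegenSupport trait.
object LimitCondSketch {

  // Shape used in the diff: the result already starts with " && ",
  // so it is only valid when appended to an existing condition.
  def limitNotReachedCondWithPrefix(checks: Seq[String]): String =
    if (checks.isEmpty) "" else checks.mkString(" && ", " && ", "")

  // Suggested shape: only join the checks; no leading operator.
  // None means there is no downstream limit to respect.
  def limitNotReachedCond(checks: Seq[String]): Option[String] =
    if (checks.isEmpty) None else Some(checks.mkString(" && "))

  // Caller side (assumed): the data producing loop decides how to combine
  // its own condition with the limit checks.
  def loopCondition(base: String, checks: Seq[String]): String =
    limitNotReachedCond(checks).fold(base)(cond => s"$base && $cond")
}
```

With this split, a caller that has no pre-existing loop condition could use the joined checks directly, while a caller that does have one adds the `&&` itself.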



