Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22698#discussion_r224659380
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -506,18 +513,18 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
           |       $numElementsTodo = 0;
           |       if ($nextBatchTodo == 0) break;
           |     }
    -      |     $numOutput.add($nextBatchTodo);
    -      |     $inputMetrics.incRecordsRead($nextBatchTodo);
           |     $batchEnd += $nextBatchTodo * ${step}L;
           |   }
           |
           |   int $localEnd = (int)(($batchEnd - $nextIndex) / ${step}L);
           |   for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
           |     long $value = ((long)$localIdx * ${step}L) + $nextIndex;
           |     ${consume(ctx, Seq(ev))}
    -      |     $shouldStop
    +      |     $stopCheck
           |   }
           |   $nextIndex = $batchEnd;
    +      |   $numOutput.add($localEnd);
    --- End diff ---
    
    More background: the stop check for the limit is done at batch granularity, while the stop check for the result buffer is done at row granularity.
    
    As a result, even if the limit is smaller than the batch size, the range operator still physically outputs an entire batch.
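    
    To make the granularity difference concrete, here is a minimal, runnable Scala sketch (not the actual generated code; `batchSize` and `limit` are hypothetical values chosen for illustration):
    
    ```scala
    // Sketch: the limit is only re-checked between batches (batch
    // granularity), so the first batch is emitted in full even though
    // limit < batchSize. A result-buffer check would instead run per row.
    object GranularitySketch {
      def main(args: Array[String]): Unit = {
        val batchSize = 1000
        val limit = 10      // hypothetical limit, smaller than the batch size
        var produced = 0
        var nextIndex = 0L
    
        while (produced < limit) {        // batch-granularity stop check
          val batchEnd = nextIndex + batchSize
          var i = nextIndex
          while (i < batchEnd) {          // no per-row limit check here
            produced += 1                 // every row is physically produced
            i += 1
          }
          nextIndex = batchEnd
        }
        println(s"rows physically produced: $produced") // prints 1000, not 10
      }
    }
    ```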

