Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22698#discussion_r224659380
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -506,18 +513,18 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
           |       $numElementsTodo = 0;
           |       if ($nextBatchTodo == 0) break;
           |     }
    -      |     $numOutput.add($nextBatchTodo);
    -      |     $inputMetrics.incRecordsRead($nextBatchTodo);
           |     $batchEnd += $nextBatchTodo * ${step}L;
           |   }
           |
           |   int $localEnd = (int)(($batchEnd - $nextIndex) / ${step}L);
           |   for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
           |     long $value = ((long)$localIdx * ${step}L) + $nextIndex;
           |     ${consume(ctx, Seq(ev))}
    -      |     $shouldStop
    +      |     $stopCheck
           |   }
           |   $nextIndex = $batchEnd;
    +      |   $numOutput.add($localEnd);
    --- End diff ---
    
    More background: the stop check for the limit is done at batch granularity, while the stop check for the result buffer is done at row granularity.
    
    As a result, even if the limit is smaller than the batch size, the range operator still physically outputs an entire batch.
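    
    To make the granularity difference concrete, here is a minimal, runnable Scala sketch (not the actual generated code; `batchSize` and `limit` are hypothetical values chosen for illustration):
    
    ```scala
    // Sketch: the limit is only re-checked between batches (batch
    // granularity), so the first batch is emitted in full even though
    // limit < batchSize. A result-buffer check would instead run per row.
    object GranularitySketch {
      def main(args: Array[String]): Unit = {
        val batchSize = 1000
        val limit = 10      // hypothetical limit, smaller than the batch size
        var produced = 0
        var nextIndex = 0L
    
        while (produced < limit) {        // batch-granularity stop check
          val batchEnd = nextIndex + batchSize
          var i = nextIndex
          while (i < batchEnd) {          // no per-row limit check here
            produced += 1                 // every row is physically produced
            i += 1
          }
          nextIndex = batchEnd
        }
        println(s"rows physically produced: $produced") // prints 1000, not 10
      }
    }
    ```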

