Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10605#discussion_r48900944
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
    @@ -661,29 +719,35 @@ private[execution] final class 
UnboundedFollowingWindowFunctionFrame(
         lbound: BoundOrdering) extends WindowFunctionFrame {
     
       /** Rows of the partition currently being processed. */
    -  private[this] var input: ArrayBuffer[InternalRow] = null
    +  private[this] var input: ExternalRowBuffer = null
     
       /** Index of the first input row with a value equal to or greater than 
the lower bound of the
        * current output row. */
       private[this] var inputIndex = 0
     
    -  /** Index of the row we are currently writing. */
    -  private[this] var outputIndex = 0
    -
       /** Prepare the frame for calculating a new partition. */
    -  override def prepare(rows: ArrayBuffer[InternalRow]): Unit = {
    +  override def prepare(rows: ExternalRowBuffer): Unit = {
         input = rows
         inputIndex = 0
    -    outputIndex = 0
       }
     
       /** Write the frame columns for the current row to the given target row. 
*/
    -  override def write(): Unit = {
    -    var bufferUpdated = outputIndex == 0
    +  override def write(index: Int, current: InternalRow): Unit = {
    +    var bufferUpdated = index == 0
    +
    +    // Duplicate the input to have a new iterator
    +    val tmp = input.copy()
     
         // Drop all rows from the buffer for which the input row value is 
smaller than
         // the output row lower bound.
    -    while (inputIndex < input.size && lbound.compare(input, inputIndex, 
outputIndex) < 0) {
    +    var i = 0
    +    while (i < inputIndex) {
    +      tmp.next()
    --- End diff --
    
    We are skipping all the records we have already processed here; this 
shouldn't be to expensive; but it still adds some complexity (```O(n * (n - 1) 
/2)``` towards ```O(n * n)```). I'd like to see/do a few benchmarks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to