Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10605#discussion_r48902027
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
    @@ -661,29 +719,35 @@ private[execution] final class 
UnboundedFollowingWindowFunctionFrame(
         lbound: BoundOrdering) extends WindowFunctionFrame {
     
       /** Rows of the partition currently being processed. */
    -  private[this] var input: ArrayBuffer[InternalRow] = null
    +  private[this] var input: ExternalRowBuffer = null
     
       /** Index of the first input row with a value equal to or greater than 
the lower bound of the
        * current output row. */
       private[this] var inputIndex = 0
     
    -  /** Index of the row we are currently writing. */
    -  private[this] var outputIndex = 0
    -
       /** Prepare the frame for calculating a new partition. */
    -  override def prepare(rows: ArrayBuffer[InternalRow]): Unit = {
    +  override def prepare(rows: ExternalRowBuffer): Unit = {
         input = rows
         inputIndex = 0
    -    outputIndex = 0
       }
     
       /** Write the frame columns for the current row to the given target row. 
*/
    -  override def write(): Unit = {
    -    var bufferUpdated = outputIndex == 0
    +  override def write(index: Int, current: InternalRow): Unit = {
    +    var bufferUpdated = index == 0
    +
    +    // Duplicate the input to have a new iterator
    +    val tmp = input.copy()
     
         // Drop all rows from the buffer for which the input row value is 
smaller than
         // the output row lower bound.
    -    while (inputIndex < input.size && lbound.compare(input, inputIndex, 
outputIndex) < 0) {
    +    var i = 0
    +    while (i < inputIndex) {
    +      tmp.next()
    --- End diff --
    
    Yes, it doubles the time. If N is large, it's kind of like double of 
infinity is still infinity :-) 
    
    Actually I tried to add a `skip(n)` API for UnsafeSortIterator to improve 
this a little bit, but it seems not worth that complexity, so removed.
    
    I think the next step could be have a fast path for those functions that 
has communitativity, we could reduce the complexity to `O(N * 2)`
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to