Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/10605#discussion_r48902027
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
@@ -661,29 +719,35 @@ private[execution] final class
UnboundedFollowingWindowFunctionFrame(
lbound: BoundOrdering) extends WindowFunctionFrame {
/** Rows of the partition currently being processed. */
- private[this] var input: ArrayBuffer[InternalRow] = null
+ private[this] var input: ExternalRowBuffer = null
/** Index of the first input row with a value equal to or greater than
the lower bound of the
* current output row. */
private[this] var inputIndex = 0
- /** Index of the row we are currently writing. */
- private[this] var outputIndex = 0
-
/** Prepare the frame for calculating a new partition. */
- override def prepare(rows: ArrayBuffer[InternalRow]): Unit = {
+ override def prepare(rows: ExternalRowBuffer): Unit = {
input = rows
inputIndex = 0
- outputIndex = 0
}
/** Write the frame columns for the current row to the given target row.
*/
- override def write(): Unit = {
- var bufferUpdated = outputIndex == 0
+ override def write(index: Int, current: InternalRow): Unit = {
+ var bufferUpdated = index == 0
+
+ // Duplicate the input to have a new iterator
+ val tmp = input.copy()
// Drop all rows from the buffer for which the input row value is
smaller than
// the output row lower bound.
- while (inputIndex < input.size && lbound.compare(input, inputIndex,
outputIndex) < 0) {
+ var i = 0
+ while (i < inputIndex) {
+ tmp.next()
--- End diff --
Yes, it doubles the time. If N is large, it's kind of like double of
infinity is still infinity :-)
Actually I tried to add a `skip(n)` API for UnsafeSortIterator to improve
this a little bit, but it seems not worth that complexity, so removed.
I think the next step could be have a fast path for those functions that
has communitativity, we could reduce the complexity to `O(N * 2)`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]