beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r513138536
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ########## @@ -151,10 +173,93 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The unbounded offset window frame is an internal window frame just used to optimize the + * performance for the window function that returns the value of the input column offset + * by a number of rows within the partition and has specified ROWS BETWEEN UNBOUNDED PRECEDING + * AND UNBOUNDED FOLLOWING. The internal window frame is not a popular window frame and cannot be + * specified or used directly by the users. + * The unbounded offset window frame calculates frames containing NTH_VALUE statements. + * The unbounded offset window frame returns the same value for all rows in the window partition. + */ +class UnboundedOffsetWindowFunctionFrame( + target: InternalRow, + ordinal: Int, + expressions: Array[OffsetWindowSpec], + inputSchema: Seq[Attribute], + newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, + offset: Int) + extends OffsetWindowFunctionFrameBase( + target, ordinal, expressions, inputSchema, newMutableProjection, offset) { - override def currentUpperBound(): Int = throw new UnsupportedOperationException() + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + input = rows + if (offset > input.length) { + fillDefaultValue(EmptyRow) + } else { + inputIterator = input.generateIterator() + // drain the first few rows if offset is larger than one + inputIndex = 0 + while (inputIndex < offset - 1) { + if (inputIterator.hasNext) inputIterator.next() + inputIndex += 1 + } + val r = WindowFunctionFrame.getNextOrNull(inputIterator) + projection(r) + } + } + + override def write(index: Int, current: InternalRow): Unit = { + // The results are the same for each row in the partition, and have been evaluated in prepare. 
+ // Don't need to recalculate here. + } +} + +/** + * The unbounded preceding offset window frame is an internal window frame just used to optimize + * the performance for the window function that returns the value of the input column offset + * by a number of rows within the partition and has specified ROWS BETWEEN UNBOUNDED PRECEDING + * AND CURRENT ROW. The internal window frame is not a popular window frame and cannot be specified + * or used directly by the users. + * The unbounded preceding offset window frame calculates frames containing NTH_VALUE statements. + * The unbounded preceding offset window frame returns the same value for rows whose index + * (starting from 1) is equal to or greater than offset in the window partition. + */ +class UnboundedPrecedingOffsetWindowFunctionFrame( + target: InternalRow, + ordinal: Int, + expressions: Array[OffsetWindowSpec], + inputSchema: Seq[Attribute], + newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, + offset: Int) + extends OffsetWindowFunctionFrameBase( + target, ordinal, expressions, inputSchema, newMutableProjection, offset) { + + var selectedRow: UnsafeRow = null + + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + input = rows + inputIterator = input.generateIterator() + // drain the first few rows if offset is larger than one + inputIndex = 0 + while (inputIndex < offset - 1) { + if (inputIterator.hasNext) inputIterator.next() + inputIndex += 1 + } + if (inputIndex >= 0 && inputIndex < input.length) { Review comment: Yes. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org