[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r513153232



##
File path: sql/core/src/test/resources/sql-tests/results/window.sql.out
##
@@ -479,6 +479,38 @@ Anthony Bow6627Gerard Bondur
 Leslie Thompson5186Gerard Bondur
 
 
+-- !query
+SELECT
+employee_name,
+salary,
+nth_value(employee_name, 2) OVER (
+  ORDER BY salary DESC
+  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) second_highest_salary
+FROM
+basic_pays
+ORDER BY salary DESC
+-- !query schema
+struct
+-- !query output
+Larry Bott 11798   NULL
+Gerard Bondur  11472   Gerard Bondur
+Pamela Castillo11303   Gerard Bondur

Review comment:
   The output comes from `hiveResultString`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r513138536



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +173,93 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame is an internal window frame just used to 
optimize the
+ * performance for the window function that returns the value of the input 
column offset
+ * by a number of rows within the partition and has specified ROWS BETWEEN 
UNBOUNDED PRECEDING
+ * AND UNBOUNDED FOLLOWING. The internal window frame is not a popular window 
frame cannot be
+ * specified and used directly by the users.
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+input = rows
+if (offset > input.length) {
+  fillDefaultValue(EmptyRow)
+} else {
+  inputIterator = input.generateIterator()
+  // drain the first few rows if offset is larger than one
+  inputIndex = 0
+  while (inputIndex < offset - 1) {
+if (inputIterator.hasNext) inputIterator.next()
+inputIndex += 1
+  }
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)
+  projection(r)
+}
+  }
+
+  override def write(index: Int, current: InternalRow): Unit = {
+// The results are the same for each row in the partition, and have been 
evaluated in prepare.
+// Don't need to recalculate here.
+  }
+}
+
+/**
+ * The unbounded preceding offset window frame is an internal window frame 
just used to optimize
+ * the performance for the window function that returns the value of the input 
column offset
+ * by a number of rows within the partition and has specified ROWS BETWEEN 
UNBOUNDED PRECEDING
+ * AND CURRENT ROW. The internal window frame is not a popular window frame 
cannot be specified
+ * and used directly by the users.
+ * The unbounded preceding offset window frame calculates frames containing 
NTH_VALUE statements.
+ * The unbounded preceding offset window frame return the same value for rows 
which index
+ * (starting from 1) equal to or greater than offset in the window partition.
+ */
+class UnboundedPrecedingOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+
+  var selectedRow: UnsafeRow = null
+
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+input = rows
+inputIterator = input.generateIterator()
+// drain the first few rows if offset is larger than one
+inputIndex = 0
+while (inputIndex < offset - 1) {
+  if (inputIterator.hasNext) inputIterator.next()
+  inputIndex += 1
+}
+if (inputIndex >= 0 && inputIndex < input.length) {

Review comment:
   Yes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r513133285



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala
##
@@ -57,8 +57,12 @@ import org.apache.spark.sql.types.{CalendarIntervalType, 
DateType, IntegerType,
  * 3. CURRENT ROW AND 1 FOLLOWING
  * 4. 1 PRECEDING AND 1 FOLLOWING
  * 5. 1 FOLLOWING AND 2 FOLLOWING
- * - Offset frame: The frame consist of one row, which is an offset number of 
rows away from the
- *   current row. Only [[OffsetWindowFunction]]s can be processed in an offset 
frame.
+ * - Offset frame: The frame consist of one row, which is an offset number of 
rows. There are three
+ * implement of offset frame.
+ * 1. [[FrameLessOffsetWindowFunction]] returns the value of the input 
column offset by a number
+ *of rows according to the current row.
+ * 2. [[UnboundedOffsetWindowFunctionFrame]] and 
[[UnboundedPrecedingOffsetWindowFunctionFrame]]
+ *   returns the value of the input column offset by a number of rows 
within the partition.

Review comment:
   Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512481955



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512478210



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
+child.output,
+(expressions, schema) =>
+  MutableProjection.create(expressions, schema),
+offset - 1)
+}
+  case ("UNBOUNDED_PRECEDING_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedPrecedingOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512477822



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala
##
@@ -58,7 +58,7 @@ import org.apache.spark.sql.types.{CalendarIntervalType, 
DateType, IntegerType,
  * 4. 1 PRECEDING AND 1 FOLLOWING
  * 5. 1 FOLLOWING AND 2 FOLLOWING
  * - Offset frame: The frame consist of one row, which is an offset number of 
rows away from the
- *   current row. Only [[OffsetWindowFunction]]s can be processed in an offset 
frame.
+ *   current row. Only [[FrameLessOffsetWindowFunction]]s can be processed in 
an offset frame.

Review comment:
   I will update the doc.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512468539



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,36 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not 
exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * A frameless offset window function is a window function that cannot specify 
window frame and
+ * returns the value of the input column offset by a number of rows within the 
partition.

Review comment:
   I make a mistake.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-26 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512387980



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1

Review comment:
   `nth_value` not use `FrameLessOffsetWindowFunctionFrame`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-26 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512387456



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)
+  projection(r)
+} else {
+  fillDefaultValue(EmptyRow)
+}
+  }
+
+  override def write(index: Int, current: InternalRow): Unit = {
+// The results are the same for each row in the partition, and have been 
evaluated in prepare.
+// Don't need to recalculate here.
+  }
+}
+
+/**
+ * The unbounded preceding offset window frame calculates frames containing 
NTH_VALUE statements.
+ * The unbounded preceding offset window frame return the same value for rows 
which index
+ * (starting from 1) equal to or greater than offset in the window partition.
+ */
+class UnboundedPrecedingOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+
+  var selectedRow: UnsafeRow = null
+
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  selectedRow = WindowFunctionFrame.getNextOrNull(inputIterator)
+}
+  }
+
+  override def write(index: Int, current: InternalRow): Unit = {
+if (index >= inputIndex && inputIndex < input.length && selectedRow != 
null) {

Review comment:
   Thanks! Good eyesight!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-26 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512381721



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -172,17 +179,41 @@ trait WindowExecBase extends UnaryExecNode {
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
   // Offset Frame

Review comment:
   OK. I update the comment of `FrameLessOffsetWindowFunctionFrame`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-26 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512376963



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: FrameLessOffsetWindowFunction => 
collect("RELATIVE_OFFSET", frame, e, f)

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-26 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512376471



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,35 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not 
exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * An offset window function is a window function that returns the value of 
the input column offset

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-23 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r510808186



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,35 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not 
exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean = true
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * An offset window function is a window function that returns the value of 
the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-23 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r510791572



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowFunction => collect("RELATIVE_OFFSET", frame, 
e, f)
+case f: OffsetWindowSpec if !f.ignoreNulls &&
+  frame.frameType == RowFrame && frame.lower == UnboundedPreceding 
=>
+  frame.upper match {
+case UnboundedFollowing => collect("UNBOUNDED_OFFSET", 
f.fakeFrame, e, f)
+case CurrentRow => collect("UNBOUNDED_PRECEDING_OFFSET", 
f.fakeFrame, e, f)
+case _ => collect("AGGREGATE", frame, e, f)

Review comment:
   OK.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-23 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r510781235



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,35 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not 
exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean = true

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-22 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r509983342



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowSpec if f.isRelative =>
+  collect("RELATIVE_OFFSET", frame, e, f)
+case f: OffsetWindowSpec if !f.ignoreNulls &&
+  frame.frameType == RowFrame && frame.lower == UnboundedPreceding 
=>
+  frame.upper match {
+case UnboundedFollowing => collect("UNBOUNDED_OFFSET", 
f.fakeFrame, e, f)
+case _ => collect("UNBOUNDED_PRECEDING_OFFSET", f.fakeFrame, 
e, f)

Review comment:
   I see.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-22 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r509957584



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowSpec if f.isRelative =>

Review comment:
   Good idea.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-22 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r509933808



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -327,25 +327,14 @@ object WindowFunctionType {
   }
 }
 
-
-/**
- * An offset window function is a window function that returns the value of 
the input column offset
- * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
- * offset -2, will get the value of x 2 rows back in the partition.
- */
-abstract class OffsetWindowFunction
-  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+trait OffsetWindowSpec extends Expression {
   /**
* Input expression to evaluate against a row which a number of rows below 
or above (depending on
-   * the value and sign of the offset) the current row.
+   * the value and sign of the offset) the starting row (current row if 
isRelative=true, or the
+   * first row of the window frame otherwise).
*/
   val input: Expression
 
-  /**
-   * Default result value for the function when the `offset`th row does not 
exist.
-   */
-  val default: Expression

Review comment:
   Because for `Lead` and` Lag`, `default` is the third parameter.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-13 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r503782816



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -327,38 +327,56 @@ object WindowFunctionType {
   }
 }
 
-
-/**
- * An offset window function is a window function that returns the value of 
the input column offset
- * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
- * offset -2, will get the value of x 2 rows back in the partition.
- */
-abstract class OffsetWindowFunction
-  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+trait OffsetWindowSpec extends Expression {
   /**
* Input expression to evaluate against a row which a number of rows below 
or above (depending on
-   * the value and sign of the offset) the current row.
+   * the value and sign of the offset) the starting row (current row if 
isRelative=true, or the
+   * first row of the window frame otherwise).
*/
   val input: Expression
 
+  /**
+   * (Foldable) expression that contains the number of rows between the 
starting row (current row
+   * if isRelative=true, or the first row of the window frame otherwise) and 
the row where the
+   * input expression is evaluated.
+   */
+  val inputOffset: Expression
+
   /**
* Default result value for the function when the `offset`th row does not 
exist.
*/
   val default: Expression
 
   /**
-   * (Foldable) expression that contains the number of rows between the 
current row and the row
-   * where the input expression is evaluated.
+   * A new expression based on inputOffset, considering the direction of the 
`offset`.
+   */
+  val offsetExpr: Expression

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r503179138



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -347,18 +341,51 @@ abstract class OffsetWindowFunction
   val default: Expression
 
   /**
-   * (Foldable) expression that contains the number of rows between the 
current row and the row
-   * where the input expression is evaluated.
+   * (Foldable) expression that contains the number of rows between the 
starting row (current row
+   * if isRelative=true, or the first row of the window frame otherwise) and 
the row where the
+   * input expression is evaluated.
*/
-  val offset: Expression
+  val offsetExpr: Expression
 
   /**
-   * Direction of the number of rows between the current row and the row where 
the input expression
-   * is evaluated.
+   * Direction of the number of rows between the current row or the first row 
of the entire
+   * window partition and the row where the input expression is evaluated.
*/
   val direction: SortDirection
 
-  override def children: Seq[Expression] = Seq(input, offset, default)
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean = true
+
+  lazy val boundary = direction match {
+case Ascending => offsetExpr
+case Descending => UnaryMinus(offsetExpr) match {
+  case e: Expression if e.foldable => Literal.create(e.eval(EmptyRow), 
e.dataType)
+  case o => o
+}
+  }
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, boundary, boundary)

Review comment:
   Just tell every one that this frame don't play the real role of the 
frame, just a fake.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -347,18 +341,51 @@ abstract class OffsetWindowFunction
   val default: Expression
 
   /**
-   * (Foldable) expression that contains the number of rows between the 
current row and the row
-   * where the input expression is evaluated.
+   * (Foldable) expression that contains the number of rows between the 
starting row (current row
+   * if isRelative=true, or the first row of the window frame otherwise) and 
the row where the
+   * input expression is evaluated.
*/
-  val offset: Expression
+  val offsetExpr: Expression
 
   /**
-   * Direction of the number of rows between the current row and the row where 
the input expression
-   * is evaluated.
+   * Direction of the number of rows between the current row or the first row 
of the entire
+   * window partition and the row where the input expression is evaluated.
*/
   val direction: SortDirection
 
-  override def children: Seq[Expression] = Seq(input, offset, default)
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean = true
+
+  lazy val boundary = direction match {

Review comment:
   I got it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r503178031



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -617,10 +639,16 @@ case class CumeDist() extends RowNumberLike with 
SizeBasedWindowFunction {
   group = "window_funcs")
 // scalastyle:on line.size.limit line.contains.tab
 case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: 
Boolean)
-extends AggregateWindowFunction with ImplicitCastInputTypes {
+extends AggregateWindowFunction with OffsetWindowSpec with 
ImplicitCastInputTypes {
 
   def this(child: Expression, offset: Expression) = this(child, offset, false)
 
+  override lazy val default = Literal.create(null, input.dataType)
+
+  override val direction = Ascending
+
+  override val isRelative = false
+
   override def children: Seq[Expression] = input :: offsetExpr :: Nil
 
   override val frame: WindowFrame = UnspecifiedFrame

Review comment:
   `NthValue` extends `AggregateWindowFunction` and later override frame as:
   `override val frame: WindowFrame = SpecifiedWindowFrame(RowFrame, 
UnboundedPreceding, CurrentRow)`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r502165124



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowSpec if f.isRelative =>
+  collect("RELATIVE_OFFSET", frame, e, f)
+case f: OffsetWindowSpec if !f.ignoreNulls &&
+  frame.frameType == RowFrame && frame.lower == UnboundedPreceding 
=>
+  frame.upper match {
+case UnboundedFollowing => collect("UNBOUNDED_OFFSET", 
f.fakeFrame, e, f)

Review comment:
   I use the name `UNBOUNDED_OFFSET` because there is `UNBOUNDED PRECEDING 
and UNBOUNDED FOLLOWING`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-09 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r502165124



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowSpec if f.isRelative =>
+  collect("RELATIVE_OFFSET", frame, e, f)
+case f: OffsetWindowSpec if !f.ignoreNulls &&
+  frame.frameType == RowFrame && frame.lower == UnboundedPreceding 
=>
+  frame.upper match {
+case UnboundedFollowing => collect("UNBOUNDED_OFFSET", 
f.fakeFrame, e, f)

Review comment:
   I use the name `UNBOUNDED_OFFSET` because there is unbounded before and 
after.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)
+  projection(r)
+} else {
+  fillDefaultValue(EmptyRow)
+}
+  }
+
+  override def write(index: Int, current: InternalRow): Unit = {
+// The results are the same for each row in the partition, and have been 
evaluated in prepare.
+// Don't need to recalculate here.
+  }
+}
+
+/**
+ * The unbounded preceding offset window frame calculates frames containing 
NTH_VALUE statements.
+ * The unbounded preceding offset window frame return the same value for rows 
which index
+ * (starting from 1) equal to or greater than offset in the window partition.
+ */
+class UnboundedPrecedingOffsetWindowFunctionFrame(

Review comment:
   Because the current `WindowFunctionFrame` is for data row by row.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-08 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r502165838



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)
+  projection(r)
+} else {
+  fillDefaultValue(EmptyRow)
+}
+  }
+
+  override def write(index: Int, current: InternalRow): Unit = {
+// The results are the same for each row in the partition, and have been 
evaluated in prepare.
+// Don't need to recalculate here.
+  }
+}
+
+/**
+ * The unbounded preceding offset window frame calculates frames containing 
NTH_VALUE statements.
+ * The unbounded preceding offset window frame return the same value for rows 
which index
+ * (starting from 1) equal to or greater than offset in the window partition.
+ */
+class UnboundedPrecedingOffsetWindowFunctionFrame(

Review comment:
   Because the current `WindowFunctionFrame` is for data row by row.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-08 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r502165124



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -136,8 +136,15 @@ trait WindowExecBase extends UnaryExecNode {
   val frame = 
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame]
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
+case f: OffsetWindowSpec if f.isRelative =>
+  collect("RELATIVE_OFFSET", frame, e, f)
+case f: OffsetWindowSpec if !f.ignoreNulls &&
+  frame.frameType == RowFrame && frame.lower == UnboundedPreceding 
=>
+  frame.upper match {
+case UnboundedFollowing => collect("UNBOUNDED_OFFSET", 
f.fakeFrame, e, f)

Review comment:
   I use the name `UNBOUNDED_OFFSET` because there is unbounded before and 
after.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org