[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476308921 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,46 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The absolute offset window frame based on the first row calculates frames containing + * NTH_VALUE statements. The absolute offset window frame based on the first row return + * the same value for all rows in the window partition. Review comment: I will update it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476307662 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -352,17 +353,24 @@ abstract class OffsetWindowFunction val default: Expression /** - * (Foldable) expression that contains the number of rows between the current row and the row - * where the input expression is evaluated. + * (Foldable) expression that contains the number of rows between the current row or the first + * row of the entire window partition and the row where the input expression is evaluated. */ val offset: Expression /** - * Direction of the number of rows between the current row and the row where the input expression - * is evaluated. + * Direction of the number of rows between the current row or the first row of the entire + * window partition and the row where the input expression is evaluated. */ val direction: SortDirection + /** + * Whether the offset is start with the current row. If `isRelative` is true, `offset` means Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476307341 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -352,17 +353,24 @@ abstract class OffsetWindowFunction val default: Expression /** - * (Foldable) expression that contains the number of rows between the current row and the row - * where the input expression is evaluated. + * (Foldable) expression that contains the number of rows between the current row or the first + * row of the entire window partition and the row where the input expression is evaluated. */ val offset: Expression /** - * Direction of the number of rows between the current row and the row where the input expression - * is evaluated. + * Direction of the number of rows between the current row or the first row of the entire + * window partition and the row where the input expression is evaluated. */ val direction: SortDirection + /** + * Whether the offset is start with the current row. If `isRelative` is true, `offset` means + * the offset is start with the current row. otherwise, the offset is start with the first + * row of the entire window partition. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476307262 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -352,17 +353,24 @@ abstract class OffsetWindowFunction val default: Expression /** - * (Foldable) expression that contains the number of rows between the current row and the row - * where the input expression is evaluated. + * (Foldable) expression that contains the number of rows between the current row or the first Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476291241 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -342,7 +342,8 @@ abstract class OffsetWindowFunction extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes { /** * Input expression to evaluate against a row which a number of rows below or above (depending on - * the value and sign of the offset) the current row. + * the value and sign of the offset) the current row or the first row of the entire window Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476195407 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.startWithCurrentRow) { + collect("OFFSET", frame, e, f) +} else { + collect("WHOLE_OFFSET", frame, e, f) Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476159993 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.startWithCurrentRow) { + collect("OFFSET", frame, e, f) +} else { + collect("WHOLE_OFFSET", frame, e, f) Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476153597 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -77,31 +77,30 @@ object WindowFunctionFrame { * @param newMutableProjection function used to create the projection. * @param offset by which rows get moved within a partition. */ -final class OffsetWindowFunctionFrame( Review comment: We can't uniform the behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476152901 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -363,6 +363,12 @@ abstract class OffsetWindowFunction */ val direction: SortDirection + /** + * Whether the offset is start with the current row. If `startWithCurrentRow` is false, `offset` + * means the offset is start with the first row of the entire window partition. + */ Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476096152 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +480,54 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the `offset`th row from beginning of the + * window frame. Offsets start at 1. When the value of `input` is null at the `offset`th row or + * there is no such an `offset`th row, null is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row + from beginning of the window frame. Offsets start at 1. If the value of `input` at the + `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the + offset is 10, size of the window frame less than 10), null is returned. + """, + since = "3.1.0") +case class NthValue(input: Expression, offset: Expression) +extends OffsetWindowFunction { + + override val default = Literal(null) + + override val startWithCurrentRow = false + + override val direction = Ascending + + override def checkInputDataTypes(): TypeCheckResult = { +val check = super.checkInputDataTypes() +if (check.isFailure) { + check +} else if (offset.foldable) { + offset.dataType match { +case IntegerType => + offset.eval().asInstanceOf[Int] match { +case i: Int if i <= 0 => TypeCheckFailure( + s"The 'offset' argument of nth_value must be greater than zero but it is $i.") +case _ => TypeCheckSuccess + } +case _ => TypeCheckFailure( + s"The 'offset' parameter must be a int literal but it is ${offset.dataType}.") Review comment: postgresql: integer oracle:integer redshift:integer vertica:integer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476090110 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +480,54 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the `offset`th row from beginning of the + * window frame. Offsets start at 1. When the value of `input` is null at the `offset`th row or Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r474380349 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -184,6 +188,18 @@ trait WindowExecBase extends UnaryExecNode { MutableProjection.create(expressions, schema), offset) + case ("WHOLE_OFFSET", _, IntegerLiteral(offset), _) => +target: InternalRow => + new FixedOffsetWindowFunctionFrame( Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470474638 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,54 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the `offset`th row of beginning Review comment: OK ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,54 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the `offset`th row of beginning + * the window partition (counting from 1). Offsets start at 1. When the value of `input` is null Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470473071 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -363,6 +363,11 @@ abstract class OffsetWindowFunction */ val direction: SortDirection + /** + * Whether the offset is based on the entire frame. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470379655 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.isWholeBased) { Review comment: According to the plan, `first_value` and `last_value` need to be realized. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470377487 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is Review comment: ``offset`th row of the window` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470375984 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the Review comment: Sorry! I made a mistake. `Offsets start at 1` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r469835363 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The fixed offset window frame calculates frames containing + * NTH_VALUE statements. + * The fixed offset window frame return the same value for all rows in the window partition. + */ +class FixedOffsetWindowFunctionFrame( +target: InternalRow, +ordinal: Int, +expressions: Array[OffsetWindowFunction], +inputSchema: Seq[Attribute], +newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, +offset: Int) + extends OffsetWindowFunctionFrameBase( +target, ordinal, expressions, inputSchema, newMutableProjection, offset) { - override def currentUpperBound(): Int = throw new UnsupportedOperationException() + var maybeRow: Option[UnsafeRow] = None + + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +super.prepare(rows) +if (inputIndex >= 0 && inputIndex < input.length) { + val r = WindowFunctionFrame.getNextOrNull(inputIterator) + maybeRow = Some(r) +} + } + + override def write(index: Int, current: InternalRow): Unit = { Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r469674334 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is + null, null is returned. If there is no such an offset row (e.g., when the offset is 10, + size of the window frame less than 10), null is returned. + """, + since = "3.1.0") +case class NthValue(input: Expression, offset: Expression) +extends OffsetWindowFunction { + + override val default = Literal(null) + + override val isWholeBased = true + + override val direction = Ascending + + override def checkInputDataTypes(): TypeCheckResult = { +val check = super.checkInputDataTypes() +if (check.isFailure) { + check +} else if (offset.foldable) { + offset.eval() match { +case i: Int if i <= 0 => TypeCheckFailure( + s"The 'offset' argument of nth_value must be greater than zero but it is $i.") +case i: Int => TypeCheckSuccess +case other => TypeCheckFailure( + s"The 'offset' parameter must be a int literal but it is $other.") Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r469673892 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is + null, null is returned. If there is no such an offset row (e.g., when the offset is 10, + size of the window frame less than 10), null is returned. Review comment: `If there is no such an offset row, null is returned.` means `If the offset greater than the number of values in the window, null is returned.` `PostgreSQL`,`Vertica`,`Oracle` and `Presto` all clearly stated this requirement. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r469670262 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r447462884 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -363,6 +363,11 @@ abstract class OffsetWindowFunction */ val direction: SortDirection + /** + * Whether the offset is based on the entire frame. + */ + val isWholeBased: Boolean = false Review comment: @HyukjinKwon Thanks for your review. But I can't understand your means. Could you tell me more? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446590900 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql ## @@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10; -- unique1, four -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); --- [SPARK-27951] ANSI SQL: NTH_VALUE function +-- [SPARK-30708] first_value/last_value window function throws ParseException Review comment: Because #25082 has reverted, SPARK-30708 not need. I updated this comments with SPARK-28310 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446590900 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql ## @@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10; -- unique1, four -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); --- [SPARK-27951] ANSI SQL: NTH_VALUE function +-- [SPARK-30708] first_value/last_value window function throws ParseException Review comment: Because #25082 has reverted, SPARK-30708 not need. I will delete this line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446593095 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -363,6 +363,11 @@ abstract class OffsetWindowFunction */ val direction: SortDirection + /** + * Whether the offset is based on the entire frame. + */ + val isWholeBased: Boolean = false Review comment: I added this flag used to distinguish `OffsetWindowFunctionFrame` and `FixedOffsetWindowFunctionFrame`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591904 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The fixed offset window frame calculates frames containing + * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements. + * The fixed offset windwo frame return the same value for all rows in the window partition. + */ +class FixedOffsetWindowFunctionFrame( +target: InternalRow, +ordinal: Int, +expressions: Array[OffsetWindowFunction], +inputSchema: Seq[Attribute], +newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, +offset: Int) + extends OffsetWindowFunctionFrameBase( +target, ordinal, expressions, inputSchema, newMutableProjection, offset) { - override def currentUpperBound(): Int = throw new UnsupportedOperationException() + var rowOption: Option[UnsafeRow] = None Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591742 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The fixed offset window frame calculates frames containing + * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements. Review comment: I will open other PR to support first_value and last_value. I will delete first_value and last_value temporarily, until the other PR support them. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591742 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The fixed offset window frame calculates frames containing + * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements. Review comment: I will open other PR to support first_value and last_value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591681 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame( } inputIndex += 1 } +} - override def currentLowerBound(): Int = throw new UnsupportedOperationException() +/** + * The fixed offset window frame calculates frames containing + * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements. + * The fixed offset windwo frame return the same value for all rows in the window partition. Review comment: Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591593 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -993,6 +993,30 @@ object functions { Lead(e.expr, Literal(offset), Literal(defaultValue)) } + /** + * Window function: returns the value that is the `offset`th row of the window frame + * (counting from 1), and `null` if the size of window frame is less than `offset` rows. + * + * This is equivalent to the nth_value function in SQL. + * + * @group window_funcs + * @since 3.1.0 + */ + def nth_value(columnName: String, offset: Int): Column = { +nth_value(Column(columnName), offset) + } + + /** + * Window function: returns the value that is the `offset`th row of the window frame + * (counting from 1), and `null` if the size of window frame is less than `offset` rows. + * + * This is equivalent to the nth_value function in SQL. + * + * @group window_funcs + * @since 3.0.0 Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446590900 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql ## @@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10; -- unique1, four -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); --- [SPARK-27951] ANSI SQL: NTH_VALUE function +-- [SPARK-30708] first_value/last_value window function throws ParseException Review comment: Because #25082 has reverted, SPARK-30708 not need. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r439184634 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -993,6 +993,30 @@ object functions { Lead(e.expr, Literal(offset), Literal(defaultValue)) } + /** + * Window function: returns the value that is the `offset`th row of the window frame + * (counting from 1), and `null` if the size of window frame is less than `offset` rows. + * + * This is equivalent to the nth_value function in SQL. + * + * @group window_funcs + * @since 3.0.0 Review comment: @HyukjinKwon Thanks for your reminder. I'm wating @hvanhovell here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r438556665 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is + null, null is returned. If there is no such an offset row (e.g., when the offset is 10, + size of the window frame less than 10), null is returned. + """, + since = "3.0.0") Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r437999658 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -993,6 +993,30 @@ object functions { Lead(e.expr, Literal(offset), Literal(defaultValue)) } + /** + * Window function: returns the value that is the `offset`th row of the window frame + * (counting from 1), and `null` if the size of window frame is less than `offset` rows. + * + * This is equivalent to the nth_value function in SQL. + * + * @group window_funcs + * @since 3.0.0 + */ + def nth_value(columnName: String, offset: Int): Column = { Review comment: They are existing two differences. The first is: The offset of `lag` based on the current row in the window. The offset of `nth_value` based on the first row in the window. The second is. The offset of `lag` increases from back to front. The offset of `nth_value` increases from front to back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org